HARVEST: Unlocking the Dark Bioactivity Data of Pharmaceutical Patents via Agentic AI
DOI:
Literature link:
Other information:
V Shepard, A Musin, K Chebykina, NA Zeninskaya…
bioRxiv, 2026
biorxiv.org
Pharmaceutical patents contain vast Structure-Activity Relationship tables documenting protein-ligand binding data that are technically public yet computationally inaccessible, rendering this wealth of data effectively dark: trapped in unstructured archives that no existing database has systematically captured. We present HARVEST, a multi-agent large language model pipeline that autonomously extracts structured bioactivity records from USPTO patent archives at $0.11 per document. Applied to 164,877 patents, HARVEST produced 3.36 …

