PubMedBERT
Microsoft Research
BERT encoder pre-trained exclusively on PubMed abstracts with a domain-specific vocabulary. A common baseline for biomedical NER, relation extraction, and classification.
Best For
Fine-tuning for custom biomedical NLP tasks; drug-target interaction extraction from literature
License
Open Source (check repo)
Strengths
- ~2.5M monthly Hugging Face downloads
- Reported to outperform GPT-4 on biomedical NER by 15-31 F1 points after fine-tuning
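For readers unfamiliar with how NER gaps are quoted in "F1 points", the following is a minimal sketch of entity-level F1 over BIO-tagged sequences. It assumes exact-match span scoring, which is a common convention but not necessarily the protocol behind the figure above; the function names `bio_to_spans` and `entity_f1` and the example tags are hypothetical and chosen only for illustration.

```python
from typing import List, Set, Tuple

# (entity type, start index, end index exclusive)
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence.

    An "I-" tag that does not continue a span of the same type is ignored.
    """
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last span
        inside = tag.startswith("I-") and etype == tag[2:]
        if not inside:
            if etype is not None:
                spans.add((etype, start, i))  # close the open span
            if tag.startswith("B-"):
                start, etype = i, tag[2:]  # open a new span
            else:
                start, etype = None, None
    return spans


def entity_f1(gold: List[str], pred: List[str]) -> float:
    """Exact-match entity-level F1 (assumed scoring convention)."""
    gold_spans, pred_spans = bio_to_spans(gold), bio_to_spans(pred)
    tp = len(gold_spans & pred_spans)  # spans matching in type and boundaries
    if tp == 0:
        return 0.0
    precision = tp / len(pred_spans)
    recall = tp / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


# Illustrative tags only, not real benchmark data.
gold = ["B-Chemical", "I-Chemical", "O", "B-Disease", "O"]
pred = ["B-Chemical", "I-Chemical", "O", "O", "B-Disease"]
print(entity_f1(gold, pred))  # one of two spans matches on each side: 0.5
```

F1 points are this score multiplied by 100, so a 15-point gap corresponds to a difference of 0.15 on the scale returned here.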
Limitations
- Encoder-only (not generative)
- Requires task-specific fine-tuning
- Static weights
Related Tools
PubTator3
NCBI / NIH
AI-powered literature resource providing automated NER and relation annotations across ~36M PubMed abstracts and ~6M PMC full-text articles. Updated weekly.
INDRA
Gyori Lab, Harvard Medical School
Automated knowledge assembly system that collects causal statements from NLP reading systems and databases, standardizes them, and assembles them into executable mechanistic models.
More in Target Discovery
Open Targets Platform
EBI / Genentech / GSK / MSD / Pfizer / Sanofi / Wellcome Sanger
Integrates 23+ public data sources to systematically score and rank target-disease associations. Provides target prioritization based on clinical precedence and tractability.
DisGeNET
IMIM / DisGeNET (commercial entity)
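Comprehensive gene-disease and variant-disease association database. Contains >2M gene-disease associations (GDAs), >4M variant-disease associations (VDAs), and >20M disease-disease associations (DDAs). Integrates curated repositories, GWAS, animal models, and NLP-extracted evidence.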
STRING v12.5
EMBL / SIB / CPR
Functional protein-protein association networks across 12,535 organisms. v12.5 added a regulatory network layer capturing directionality via LLM-parsed literature.