

COVID-19 HPO Dataset

To provide a symptom-oriented view of the current literature, expressed as Human Phenotype Ontology (HPO) concepts, we have annotated all abstracts on COVID-19 published since October 2019 using our HPO concept recognizer (https://track.health/2020/03/09/hpo-concept-recognition). Details of the dataset are given below. The data can be downloaded from: https://github.com/pryzm-health-org/covid19-data.

Data entries:

  • Term CURIE
  • Term label
  • Abstract count
  • List of PMIDs

Format:

CURIE | Label | Count | PMID list
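
As a rough illustration of reading this format, the sketch below parses one pipe-delimited record into its four fields. It rests on assumptions not stated above: that the PMIDs inside the last field are comma-separated, and that whitespace around the pipes can be ignored. The names HpoEntry and parse_line are hypothetical, and the sample record uses placeholder values rather than real data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HpoEntry:
    curie: str        # HPO term CURIE, e.g. "HP:0001945"
    label: str        # human-readable term label
    count: int        # number of abstracts annotated with the term
    pmids: List[str]  # PubMed IDs of those abstracts


def parse_line(line: str, pmid_sep: str = ",") -> HpoEntry:
    """Parse one 'CURIE | Label | Count | PMID list' record.

    ASSUMPTION: PMIDs in the last field are separated by `pmid_sep`
    (a comma by default); adjust it if the real files differ.
    """
    # Split on at most three pipes so the PMID list stays in one piece.
    parts = [part.strip() for part in line.split("|", 3)]
    if len(parts) != 4:
        raise ValueError(f"expected 4 pipe-separated fields, got {len(parts)}")
    curie, label, count, pmid_field = parts
    pmids = [p.strip() for p in pmid_field.split(pmid_sep) if p.strip()]
    return HpoEntry(curie, label, int(count), pmids)


# Placeholder record for illustration only; the count and PMIDs are made up.
sample = "HP:0001945 | Fever | 2 | 11111111, 22222222"
print(parse_line(sample))
```

If the files separate PMIDs differently (for example with semicolons or spaces), passing another pmid_sep should be the only change needed.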

High-level stats:

  • Last updated: 29 March 2020
  • Current count of COVID-19 abstracts parsed: 36,922
  • Current count of COVID-19 abstracts with annotations: 19,446
  • Number of unique HPO terms: 2,797

Top-level term distribution: