

HPO Concept Recognition

Deep clinical phenotyping using the Human Phenotype Ontology (HPO; https://hpo.jax.org/app/) has been shown to have a significant impact on decision making in rare disease genomic medicine [1-4]. A key challenge in the phenotyping process is curating a structured patient phenotype profile. Typically, this requires a clinician to manually select the HPO concepts relevant to the patient, an activity that is tedious and disruptive because it usually runs against the standard clinical workflow.

Clinicians write notes, observations, referral letters and reports. These artefacts hold invaluable knowledge that must be externalised before it can be used as part of the existing rare disease knowledge graph (see the Monarch Initiative, http://monarchinitiative.org/, for more on building and analysing comprehensive cross-species phenotype knowledge graphs). Phenotype concept recognition, the process of bridging the gap between clinical free text and HPO concepts, is one of the key development areas for Pryzm Health.

The task presents the usual challenges of concept recognition and adds some complexities of its own: metaphorical expressions, such as ‘bell-shaped thorax’ or ‘hitchhiker thumb’; term coordination, e.g., ‘short and broad thumbs’; and complex intrinsic structure, i.e., canonical vs non-canonical forms. The concept recognition engine developed by Pryzm Health aims to address these challenges and, via its API, to enable seamless integration into any application.
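The term-coordination challenge can be illustrated with a minimal sketch. This is not the Pryzm Health engine: the function name `expand_coordination` and the regular expression are hypothetical, and they cover only the simplest pattern, two single-word modifiers joined by "and" or "or" in front of a shared head noun.

```python
import re
from typing import List

# Hypothetical pattern: "<modifier> and|or <modifier> <head>".
# Real clinical text needs far richer handling; this is illustration only.
COORDINATION = re.compile(
    r"^(?P<first>\w[\w-]*)\s+(?:and|or)\s+(?P<second>\w[\w-]*)\s+(?P<head>.+)$"
)


def expand_coordination(phrase: str) -> List[str]:
    """Split a coordinated phrase into one candidate phrase per modifier."""
    normalised = phrase.strip().lower()
    match = COORDINATION.match(normalised)
    if match is None:
        # No coordination detected: return the phrase unchanged.
        return [normalised]
    head = match.group("head")
    return [f"{match.group('first')} {head}", f"{match.group('second')} {head}"]


if __name__ == "__main__":
    for text in ("short and broad thumbs", "hitchhiker thumb"):
        print(text, "->", expand_coordination(text))
```

Each expanded phrase could then be matched against ontology labels separately, which a single lookup of the coordinated phrase would miss.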

We evaluated the performance of our new HPO concept recognition engine (PH) on the up-to-date version of the corpus initially published by Groza et al. [5] and later extended by Lobo et al. [6]. Results are reported below against existing state-of-the-art methods, IHP [6] and NCR [7].

Method | Precision | Recall | F1
PH | 0.953 | 0.906 | 0.928
IHP | 0.872 | 0.854 | 0.863
NCR | 0.803 | 0.624 | 0.702
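If the three figures per method are read as precision, recall and F1 in that order, the third value should be the harmonic mean of the first two. The sketch below checks this for the reported numbers. The column interpretation is an assumption, and `f1_score` is a helper defined here, not part of any of the evaluated tools.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the results above; the column meaning is assumed.
REPORTED = {
    "PH": (0.953, 0.906, 0.928),
    "IHP": (0.872, 0.854, 0.863),
    "NCR": (0.803, 0.624, 0.702),
}

if __name__ == "__main__":
    for method, (precision, recall, reported_f1) in REPORTED.items():
        computed = f1_score(precision, recall)
        print(f"{method}: computed F1 = {computed:.3f}, reported = {reported_f1:.3f}")
```

Small differences in the last digit are expected, since the published precision and recall are themselves rounded.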

The API is now available in beta; details can be found at https://track.health/api/.
We welcome feedback and collaboration opportunities. To get in touch, email Tudor Groza (tudor.groza – at – pryzm – dot – health).


[1] Zhang XA et al. Semantic integration of clinical laboratory tests from electronic health records for deep phenotyping and biomarker discovery. NPJ Digit Med. 2019;2. pii: 32. doi: 10.1038/s41746-019-0110-4.
[2] Kernohan KD et al. Evaluation of exome filtering techniques for the analysis of clinically relevant genes. Hum Mutat. 2018 Feb;39(2):197-201. doi: 10.1002/humu.23374.
[3] Schubach M et al. Imbalance-Aware Machine Learning for Predicting Rare and Common Disease-Associated Non-Coding Variants. Sci Rep. 2017 Jun 7;7(1):2959. doi: 10.1038/s41598-017-03011-5.
[4] Smedley D et al. A Whole-Genome Analysis Framework for Effective Identification of Pathogenic Regulatory Variants in Mendelian Disease. Am J Hum Genet. 2016 Sep 1;99(3):595-606. doi: 10.1016/j.ajhg.2016.07.005.
[5] Groza T, Köhler S, Doelken S, Collier N, Oellrich A, Smedley D, Couto FM, Baynam G, Zankl A, Robinson PN. Automatic concept recognition using the human phenotype ontology reference and test suite corpora. Database (Oxford). 2015 Feb 27;2015. pii: bav005.
[6] Lobo M, Lamurias A, Couto FM. Identifying Human Phenotype Terms by Combining Machine Learning and Validation Rules. Biomed Res Int. 2017;2017:8565739.
[7] Arbabi A, Adams DR, Fidler S, Brudno M. Identifying Clinical Terms in Medical Text Using Ontology-Guided Machine Learning. JMIR Med Inform. 2019 Apr-Jun;7(2):e12596.