TY - JOUR
T1 - Normalization of drug and therapeutic concepts with Thera-Py
AU - Cannon, Matthew
AU - Stevenson, James
AU - Kuzma, Kori
AU - Kiwala, Susanna
AU - Warner, Jeremy L.
AU - Griffith, Obi L.
AU - Griffith, Malachi
AU - Wagner, Alex H.
N1 - Publisher Copyright:
© 2023 Oxford University Press. All rights reserved.
PY - 2023/12/1
Y1 - 2023/12/1
N2 - Objective: The diversity of nomenclature and naming strategies makes therapeutic terminology difficult to manage and harmonize. As the number and complexity of available therapeutic ontologies continue to increase, the need for harmonized cross-resource mappings is becoming increasingly apparent. This study creates harmonized concept mappings that enable the linking together of like-concepts despite source-dependent differences in data structure or semantic representation. Materials and Methods: For this study, we created Thera-Py, a Python package and web API that constructs searchable concepts for drugs and therapeutic terminologies using 9 public resources and thesauri. By using a directed graph approach, Thera-Py captures commonly used aliases, trade names, annotations, and associations for any given therapeutic and combines them under a single concept record. Results: We highlight the creation of 16 069 unique merged therapeutic concepts from 9 distinct sources using Thera-Py and observe an increase in overlap of therapeutic concepts in 2 or more knowledge bases after harmonization using Thera-Py (9.8%-41.8%). Conclusion: We observe that Thera-Py tends to normalize therapeutic concepts to their underlying active ingredients (excluding nondrug therapeutics, eg, radiation therapy, biologics), and unifies all available descriptors regardless of ontological origin. Lay Summary: Working with therapeutic terminology in medicine is challenging due to the ambiguity associated with different naming strategies. A therapeutic can have many different types of identifiers across many vocabularies: natural product names, chemical structures, development codes, generic names, brand names, product formulations, or treatment regimens. This diversity of nomenclature makes therapeutic terminology uniquely difficult to manage, and the need for harmonized cross-resource mappings is becoming increasingly apparent.
To support these mappings, we introduce Thera-Py, a Python package and web API that constructs stable, searchable therapeutic concepts for drugs and therapeutic terminology. By using a directed graph approach, Thera-Py captures commonly used aliases, trade names, annotations, and associations for any given therapeutic and harmonizes them under a single merged concept record. Using this approach, we found that Thera-Py tends to normalize therapeutic concepts to their underlying active ingredients (excluding nondrug therapeutics, eg, radiation therapy, biologics) and unifies all available descriptors regardless of ontological origin. In this report, we highlight the creation of 16 069 unique merged therapeutic concepts from 9 distinct sources and observe an increased overlap of therapeutic concepts in commonly used knowledge bases after harmonization using Thera-Py.
AB - Objective: The diversity of nomenclature and naming strategies makes therapeutic terminology difficult to manage and harmonize. As the number and complexity of available therapeutic ontologies continue to increase, the need for harmonized cross-resource mappings is becoming increasingly apparent. This study creates harmonized concept mappings that enable the linking together of like-concepts despite source-dependent differences in data structure or semantic representation. Materials and Methods: For this study, we created Thera-Py, a Python package and web API that constructs searchable concepts for drugs and therapeutic terminologies using 9 public resources and thesauri. By using a directed graph approach, Thera-Py captures commonly used aliases, trade names, annotations, and associations for any given therapeutic and combines them under a single concept record. Results: We highlight the creation of 16 069 unique merged therapeutic concepts from 9 distinct sources using Thera-Py and observe an increase in overlap of therapeutic concepts in 2 or more knowledge bases after harmonization using Thera-Py (9.8%-41.8%). Conclusion: We observe that Thera-Py tends to normalize therapeutic concepts to their underlying active ingredients (excluding nondrug therapeutics, eg, radiation therapy, biologics), and unifies all available descriptors regardless of ontological origin. Lay Summary: Working with therapeutic terminology in medicine is challenging due to the ambiguity associated with different naming strategies. A therapeutic can have many different types of identifiers across many vocabularies: natural product names, chemical structures, development codes, generic names, brand names, product formulations, or treatment regimens. This diversity of nomenclature makes therapeutic terminology uniquely difficult to manage, and the need for harmonized cross-resource mappings is becoming increasingly apparent.
To support these mappings, we introduce Thera-Py, a Python package and web API that constructs stable, searchable therapeutic concepts for drugs and therapeutic terminology. By using a directed graph approach, Thera-Py captures commonly used aliases, trade names, annotations, and associations for any given therapeutic and harmonizes them under a single merged concept record. Using this approach, we found that Thera-Py tends to normalize therapeutic concepts to their underlying active ingredients (excluding nondrug therapeutics, eg, radiation therapy, biologics) and unifies all available descriptors regardless of ontological origin. In this report, we highlight the creation of 16 069 unique merged therapeutic concepts from 9 distinct sources and observe an increased overlap of therapeutic concepts in commonly used knowledge bases after harmonization using Thera-Py.
KW - biological ontologies
KW - health information interoperability
KW - knowledge bases
KW - medical informatics
KW - therapeutics
UR - http://www.scopus.com/inward/record.url?scp=85178031144&partnerID=8YFLogxK
U2 - 10.1093/jamiaopen/ooad093
DO - 10.1093/jamiaopen/ooad093
M3 - Article
C2 - 37954974
AN - SCOPUS:85178031144
SN - 2574-2531
VL - 6
JO - JAMIA Open
JF - JAMIA Open
IS - 4
M1 - ooad093
ER -