Abstract
The Unified Medical Language System (UMLS) is the largest thesaurus in the biomedical informatics domain. Previous works have shown that knowledge constructs comprised of transitively-associated UMLS concepts are effective for discovering potentially novel biomedical hypotheses. However, the extremely large size of the UMLS becomes a major challenge for these applications. To address this problem, we designed a k-neighborhood Decentralization Labeling Scheme (. kDLS) for the UMLS, and the corresponding method to effectively evaluate the kDLS indexing results. kDLS provides a comprehensive solution for indexing the UMLS for very efficient large scale knowledge discovery. We demonstrated that it is highly effective to use kDLS paths to prioritize disease-gene relations across the whole genome, with extremely high fold-enrichment values. To our knowledge, this is the first indexing scheme capable of supporting efficient large scale knowledge discovery on the UMLS as a whole. Our expectation is that kDLS will become a vital engine for retrieving information and generating hypotheses from the UMLS for future medical informatics applications.
Original language | English |
---|---|
Pages (from-to) | 323-336 |
Number of pages | 14 |
Journal | Journal of Biomedical Informatics |
Volume | 45 |
Issue number | 2 |
DOIs | |
State | Published - Apr 2012 |
Keywords
- Disease gene prioritization
- Fold enrichment
- Graph database
- Knowledge discovery
- UMLS