The Unified Medical Language System (UMLS) is the largest thesaurus in the biomedical informatics domain. Previous work has shown that knowledge constructs composed of transitively associated UMLS concepts are effective for discovering potentially novel biomedical hypotheses. However, the extremely large size of the UMLS poses a major challenge for these applications. To address this problem, we designed a k-neighborhood Decentralization Labeling Scheme (kDLS) for the UMLS, together with a corresponding method for effectively evaluating the kDLS indexing results. kDLS provides a comprehensive solution for indexing the UMLS for highly efficient large-scale knowledge discovery. We demonstrated that kDLS paths are highly effective for prioritizing disease-gene relations across the whole genome, with extremely high fold-enrichment values. To our knowledge, this is the first indexing scheme capable of supporting efficient large-scale knowledge discovery on the UMLS as a whole. We expect that kDLS will become a vital engine for retrieving information and generating hypotheses from the UMLS in future medical informatics applications.

Original language: English
Pages (from-to): 323-336
Number of pages: 14
Journal: Journal of Biomedical Informatics
Issue number: 2
State: Published - Apr 2012


  • Disease gene prioritization
  • Fold enrichment
  • Graph database
  • Knowledge discovery
  • UMLS


Title: K-Neighborhood decentralization: A comprehensive solution to index the UMLS for large scale knowledge discovery
