MDL hierarchical clustering with incomplete data

  • Po Hsiang Lai
  • Joseph A. O'Sullivan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Scopus citations

Abstract

The goal of stemmatology, a crucial part of textual criticism, is to reconstruct a family tree of the variants of a text that result from imperfect copying. In practice, historians often have incomplete data: some variants have not yet been discovered, and the available variants are missing portions due to physical damage. Stemmatology is similar to molecular phylogenetics, where biologists aim to reconstruct the evolutionary history of species from genetic or protein sequences. Adoption of phylogenetic methods has led to encouraging results in automatic stemmatology. We discuss and demonstrate the potential application of minimum description length (MDL) concepts to stemmatology. Our method is applied to a realistic dataset and outperforms major existing methods.

Original language: English
Title of host publication: 2010 Information Theory and Applications Workshop, ITA 2010 - Conference Proceedings
Pages: 369-373
Number of pages: 5
State: Published - 2010
Event: 2010 Information Theory and Applications Workshop, ITA 2010 - San Diego, CA, United States
Duration: Jan 31 2010 to Feb 5 2010

Publication series

Name: 2010 Information Theory and Applications Workshop, ITA 2010 - Conference Proceedings

Conference

Conference: 2010 Information Theory and Applications Workshop, ITA 2010
Country/Territory: United States
City: San Diego, CA
Period: 01/31/10 to 02/5/10
