TY - GEN
T1 - MDL hierarchical clustering with incomplete data
AU - Lai, Po Hsiang
AU - O'Sullivan, Joseph A.
PY - 2010
Y1 - 2010
N2 - The goal of stemmatology is to reconstruct a family tree of different variants of a text resulting from imperfect copying, which is a crucial part of textual criticism. In reality, historians often have incomplete data because some variants have not yet been discovered and available variants have missing portions due to physical damage. Stemmatology is similar to molecular phylogenetics, where biologists aim to reconstruct the evolutionary history of species based on genetic or protein sequences. Adoption of phylogenetic methods has led to encouraging results in automatic stemmatology. We discuss and demonstrate the potential application of minimum description length (MDL) concepts to stemmatology. Our method is applied to a realistic dataset and outperforms major existing methods.
AB - The goal of stemmatology is to reconstruct a family tree of different variants of a text resulting from imperfect copying, which is a crucial part of textual criticism. In reality, historians often have incomplete data because some variants have not yet been discovered and available variants have missing portions due to physical damage. Stemmatology is similar to molecular phylogenetics, where biologists aim to reconstruct the evolutionary history of species based on genetic or protein sequences. Adoption of phylogenetic methods has led to encouraging results in automatic stemmatology. We discuss and demonstrate the potential application of minimum description length (MDL) concepts to stemmatology. Our method is applied to a realistic dataset and outperforms major existing methods.
UR - https://www.scopus.com/pages/publications/77952722725
U2 - 10.1109/ITA.2010.5454099
DO - 10.1109/ITA.2010.5454099
M3 - Conference contribution
AN - SCOPUS:77952722725
SN - 9781424470143
T3 - 2010 Information Theory and Applications Workshop, ITA 2010 - Conference Proceedings
SP - 369
EP - 373
BT - 2010 Information Theory and Applications Workshop, ITA 2010 - Conference Proceedings
T2 - 2010 Information Theory and Applications Workshop, ITA 2010
Y2 - 31 January 2010 through 5 February 2010
ER -