TY - JOUR
T1 - A fully automatic evolutionary classification of protein folds
T2 - Dali domain dictionary version 3
AU - Dietmann, Sabine
AU - Park, Jong
AU - Notredame, Cedric
AU - Heger, Andreas
AU - Lappe, Michael
AU - Holm, Liisa
PY - 2001/1/1
Y1 - 2001/1/1
N2 - The Dali Domain Dictionary (http://www.ebi.ac.uk/dali/domain) is a numerical taxonomy of all known structures in the Protein Data Bank (PDB). The taxonomy is derived fully automatically from measurements of structural, functional and sequence similarities. Here, we report the extension of the classification to match the traditional four hierarchical levels corresponding to: (i) supersecondary structural motifs (attractors in fold space), (ii) the topology of globular domains (fold types), (iii) remote homologues (functional families) and (iv) homologues with sequence identity above 25% (sequence families). The computational definitions of attractors and functional families are new. In September 2000, the Dali classification contained 10 531 PDB entries comprising 17 101 chains, which were partitioned into five attractor regions, 1375 fold types, 2582 functional families and 3724 domain sequence families. Sequence families were further associated with 99 582 unique homologous sequences in the HSSP database, which increases the number of effectively known structures several-fold. The resulting database contains the description of protein domain architecture, the definition of structural neighbours around each known structure, the definition of structurally conserved cores and a comprehensive library of explicit multiple alignments of distantly related protein families.
AB - The Dali Domain Dictionary (http://www.ebi.ac.uk/dali/domain) is a numerical taxonomy of all known structures in the Protein Data Bank (PDB). The taxonomy is derived fully automatically from measurements of structural, functional and sequence similarities. Here, we report the extension of the classification to match the traditional four hierarchical levels corresponding to: (i) supersecondary structural motifs (attractors in fold space), (ii) the topology of globular domains (fold types), (iii) remote homologues (functional families) and (iv) homologues with sequence identity above 25% (sequence families). The computational definitions of attractors and functional families are new. In September 2000, the Dali classification contained 10 531 PDB entries comprising 17 101 chains, which were partitioned into five attractor regions, 1375 fold types, 2582 functional families and 3724 domain sequence families. Sequence families were further associated with 99 582 unique homologous sequences in the HSSP database, which increases the number of effectively known structures several-fold. The resulting database contains the description of protein domain architecture, the definition of structural neighbours around each known structure, the definition of structurally conserved cores and a comprehensive library of explicit multiple alignments of distantly related protein families.
UR - http://www.scopus.com/inward/record.url?scp=0035167572&partnerID=8YFLogxK
U2 - 10.1093/nar/29.1.55
DO - 10.1093/nar/29.1.55
M3 - Article
C2 - 11125048
AN - SCOPUS:0035167572
SN - 0305-1048
VL - 29
SP - 55
EP - 57
JO - Nucleic acids research
JF - Nucleic acids research
IS - 1
ER -