TY - JOUR
T1 - Protein quantification across hundreds of experimental conditions
AU - Khan, Zia
AU - Bloom, Joshua S.
AU - Garcia, Benjamin A.
AU - Singh, Mona
AU - Kruglyak, Leonid
PY - 2009/9/15
Y1 - 2009/9/15
N2 - Quantitative studies of protein abundance rarely span more than a small number of experimental conditions and replicates. In contrast, quantitative studies of transcript abundance often span hundreds of experimental conditions and replicates. This situation exists, in part, because extracting quantitative data from large proteomics datasets is significantly more difficult than reading quantitative data from a gene expression microarray. To address this problem, we introduce two algorithmic advances in the processing of quantitative proteomics data. First, we use space-partitioning data structures to handle the large size of these datasets. Second, we introduce techniques that combine graph-theoretic algorithms with space-partitioning data structures to collect relative protein abundance data across hundreds of experimental conditions and replicates. We validate these algorithmic techniques by analyzing several datasets and computing both internal and external measures of quantification accuracy. We demonstrate the scalability of these techniques by applying them to a large dataset that comprises a total of 472 experimental conditions and replicates.
AB - Quantitative studies of protein abundance rarely span more than a small number of experimental conditions and replicates. In contrast, quantitative studies of transcript abundance often span hundreds of experimental conditions and replicates. This situation exists, in part, because extracting quantitative data from large proteomics datasets is significantly more difficult than reading quantitative data from a gene expression microarray. To address this problem, we introduce two algorithmic advances in the processing of quantitative proteomics data. First, we use space-partitioning data structures to handle the large size of these datasets. Second, we introduce techniques that combine graph-theoretic algorithms with space-partitioning data structures to collect relative protein abundance data across hundreds of experimental conditions and replicates. We validate these algorithmic techniques by analyzing several datasets and computing both internal and external measures of quantification accuracy. We demonstrate the scalability of these techniques by applying them to a large dataset that comprises a total of 472 experimental conditions and replicates.
KW - Kd-tree
KW - Orthogonal range query
KW - Quantitative proteomics
KW - Space partitioning data structures
KW - Tandem mass spectrometry
UR - http://www.scopus.com/inward/record.url?scp=70349436819&partnerID=8YFLogxK
U2 - 10.1073/pnas.0904100106
DO - 10.1073/pnas.0904100106
M3 - Article
C2 - 19717460
AN - SCOPUS:70349436819
SN - 0027-8424
VL - 106
SP - 15544
EP - 15548
JO - Proceedings of the National Academy of Sciences of the United States of America
JF - Proceedings of the National Academy of Sciences of the United States of America
IS - 37
ER -