Abstract
We present tools and workflows for latent space exploration across datasets. scCoGAPS is an implementation of NNMF that is specifically suited for large, sparse scRNA-seq datasets. ProjectR implements a transfer-learning framework that rapidly projects new data into learned latent spaces. We demonstrate the utility of this approach for de novo annotation of new datasets, cross-species analysis, linking genomic regulatory and transcriptional signatures, and exploration of features across a catalog of cell types.
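The two ideas in the abstract, learning a non-negative latent space from one dataset and then projecting a second dataset into it, can be illustrated with a small sketch. This is not the scCoGAPS algorithm and not projectR's API: it swaps in plain multiplicative-update NMF, written in dependency-free Python. The function names (`nmf`, `project`, `update_h`, `update_w`) and the toy matrices are assumptions made only for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def sq_error(v, w, h):
    """Squared Frobenius reconstruction error ||V - WH||^2."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw))


def update_h(v, w, h, eps=1e-12):
    """One multiplicative update of H (patterns x cells) with W held fixed."""
    wt = transpose(w)
    num = matmul(wt, v)
    den = matmul(matmul(wt, w), h)
    return [[h_ij * n / (d + eps) for h_ij, n, d in zip(hr, nr, dr)]
            for hr, nr, dr in zip(h, num, den)]


def update_w(v, w, h, eps=1e-12):
    """One multiplicative update of W (genes x patterns) with H held fixed."""
    ht = transpose(h)
    num = matmul(v, ht)
    den = matmul(w, matmul(h, ht))
    return [[w_ij * n / (d + eps) for w_ij, n, d in zip(wr, nr, dr)]
            for wr, nr, dr in zip(w, num, den)]


def random_matrix(rows, cols, rng):
    """Strictly positive starting values so the updates cannot get stuck at zero."""
    return [[rng.uniform(0.1, 1.0) for _ in range(cols)] for _ in range(rows)]


def nmf(v, k, iters=200, seed=0):
    """Factor a non-negative genes x cells matrix V into W (genes x k) and H (k x cells)."""
    rng = random.Random(seed)
    w = random_matrix(len(v), k, rng)
    h = random_matrix(k, len(v[0]), rng)
    for _ in range(iters):
        h = update_h(v, w, h)
        w = update_w(v, w, h)
    return w, h


def project(w, v_new, iters=200, seed=0):
    """Estimate pattern weights for new cells while keeping the learned W fixed."""
    rng = random.Random(seed)
    h = random_matrix(len(w[0]), len(v_new[0]), rng)
    for _ in range(iters):
        h = update_h(v_new, w, h)
    return h


# Toy reference dataset: 4 genes x 4 cells (hypothetical values).
reference = [[5.0, 4.0, 0.0, 0.0],
             [4.0, 5.0, 0.0, 1.0],
             [0.0, 0.0, 3.0, 4.0],
             [0.0, 1.0, 4.0, 3.0]]
# Toy new dataset measured on the same 4 genes: 4 genes x 2 cells.
new_data = [[6.0, 0.0],
            [5.0, 0.0],
            [0.0, 2.0],
            [1.0, 3.0]]

w, h = nmf(reference, k=2)        # learn the latent space
h_new = project(w, new_data)      # transfer it to the new cells
```

In this sketch the transfer step reuses the gene weights `w` learned from the reference data and only fits the per-cell pattern weights for the new data, which is why the two datasets must share the same gene rows.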
Original language | English |
---|---|
Pages (from-to) | 395-411.e8 |
Journal | Cell Systems |
Volume | 8 |
Issue number | 5 |
DOIs | 10.1016/j.cels.2019.04.004 |
State | Published - May 22 2019 |
Keywords
- NMF
- developmental biology
- dimension reduction
- integrated analysis
- latent spaces
- retina
- scRNA-seq
- single cells
- transfer learning
In: Cell Systems, Vol. 8, No. 5, 22.05.2019, p. 395-411.e8.
Research output: Contribution to journal › Article › peer-review
TY - JOUR
T1 - Decomposing Cell Identity for Transfer Learning across Cellular Measurements, Platforms, Tissues, and Species
AU - Stein-O'Brien, Genevieve L.
AU - Clark, Brian S.
AU - Sherman, Thomas
AU - Zibetti, Cristina
AU - Hu, Qiwen
AU - Sealfon, Rachel
AU - Liu, Sheng
AU - Qian, Jiang
AU - Colantuoni, Carlo
AU - Blackshaw, Seth
AU - Goff, Loyal A.
AU - Fertig, Elana J.
N1 - Funding Information: This work was supported by grants from the NIH (R01EY020560 and U01EY027267 to S.B., F32EY024201 and K99EY027844 to B.S.C., R01CA177669, U01CA196390, U01CA212007, and P30CA006973 to E.J.F.); the NSF (IOS-1656592 to L.A.G.); the Chan-Zuckerberg Initiative DAF (2018-182718 to Q.H., 2018-183445 to L.A.G., and 2018-183444 to E.J.F.), an advised fund of Silicon Valley Community Foundation; the Johns Hopkins University Catalyst (E.F. and L.A.G.) and Discovery awards (E.J.F.); and the Johns Hopkins University School of Medicine Synergy Award (S.B., L.A.G., and E.J.F.). Q.H. would like to thank J. Taroni for discussions on transfer learning and low-dimensional representations. The authors would like to thank C.A. Berlinicke and D.J. Zack for assistance with FACS analysis; A. Wolf and F. Theis from the Helmholtz Center, Munich, Germany for productive discussions and introductory scanpy code; the Johns Hopkins Genetic Resources Core Facility for use of the 10× Genomics Single Cell system; the Hopkins microarray and Deep Sequencing Core for assistance with sequencing; the CZI Jamboree, C. Greene, K. Korthauer, and A. V. Favorov for invaluable collaborations and discussions; and A. Battle, V. Yegnasubramanian, and J. Bader for comments on the manuscript. G.L.S.-O.’B., B.S.C., S.B., L.A.G., and E.J.F. conceived and directed the study. B.S.C. generated scRNA-seq data. G.L.S.-O.’B., B.S.C., L.A.G., and E.J.F. analyzed scRNA-seq data, with L.A.G. and E.J.F. as senior bioinformaticians. C.Z. and B.S.C. generated the bulk RNA-seq data, and C.Z. generated the ATAC-seq data. G.L.S.-O.’B., T.S., L.A.G., and E.J.F. developed scCoGAPS. G.L.S.-O.’B., R.S., C.C., L.A.G., and E.J.F. contributed to the development of projectR. Q.H. and G.L.S.-O.’B. developed the random forest classifier for the projections of bulk GWCoGAPS patterns. R.S. and G.L.S.-O.’B. developed the AUC evaluation method for projected pattern weights included in the projectR package. S.L., C.Z., J.Q., and G.L.S.-O.’B. analyzed the ATAC-seq data. G.L.S.-O.’B., B.S.C., S.B., L.A.G., and E.J.F. wrote the paper with input from all co-authors. The authors declare no competing interests. Publisher Copyright: © 2019 The Author(s)
PY - 2019/5/22
Y1 - 2019/5/22
N2 - We present tools and workflows for latent space exploration across datasets. scCoGAPS is an implementation of NNMF that is specifically suited for large, sparse scRNA-seq datasets. ProjectR implements a transfer-learning framework that rapidly projects new data into learned latent spaces. We demonstrate the utility of this approach for de novo annotation of new datasets, cross-species analysis, linking genomic regulatory and transcriptional signatures, and exploration of features across a catalog of cell types.
KW - NMF
KW - developmental biology
KW - dimension reduction
KW - integrated analysis
KW - latent spaces
KW - retina
KW - scRNA-seq
KW - single cells
KW - transfer learning
UR - http://www.scopus.com/inward/record.url?scp=85065625379&partnerID=8YFLogxK
U2 - 10.1016/j.cels.2019.04.004
DO - 10.1016/j.cels.2019.04.004
M3 - Article
C2 - 31121116
AN - SCOPUS:85065625379
SN - 2405-4712
VL - 8
SP - 395-411.e8
JO - Cell Systems
JF - Cell Systems
IS - 5
ER -