A deconvolution framework that uses single-cell sequencing plus a small benchmark data set for accurate analysis of cell type ratios in complex tissue samples

Wenyi Wang · 2025-01-22

Bulk deconvolution with single-cell/nucleus RNA-seq data is critical for understanding heterogeneity in complex biological samples, yet the technological discrepancy across sequencing platforms limits deconvolution accuracy. To address this, we use an experimental design that matches biological signals across platforms, thereby exposing the technological discrepancy, and then develop a deconvolution framework called DeMixSC using this well-matched (that is, benchmark) data. Built upon a novel weighted nonnegative least-squares framework, DeMixSC identifies and adjusts genes with high technological discrepancy and aligns the benchmark data with large patient cohorts of the matched tissue type for large-scale deconvolution. Our results using two benchmark data sets, one of healthy retinas and one of ovarian cancer tissues, suggest much-improved deconvolution accuracy. Leveraging tissue-specific benchmark data sets, we applied DeMixSC to a large cohort of 453 age-related macular degeneration patients and a cohort of 30 ovarian cancer patients with various responses to neoadjuvant chemotherapy. Only DeMixSC successfully unveiled biologically meaningful differences across patient groups, demonstrating its broad applicability in diverse real-world clinical scenarios. Our findings reveal the impact of technological discrepancy on deconvolution performance and underscore the importance of a well-matched data set to resolve this challenge. The developed DeMixSC framework is generally applicable for accurately deconvolving large cohorts of disease tissues, including cancers, when a well-matched benchmark data set is available.
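To make the weighted nonnegative least-squares idea named in the abstract concrete, here is a minimal Python sketch. It is not the DeMixSC implementation. The reference matrix, the simulated proportions, the threefold inflation of one gene, and the hand-set weights are all hypothetical, and the solver is a plain coordinate-descent routine chosen only to keep the example dependency-free. The sketch fits cell type proportions to a bulk profile while giving a gene with high technological discrepancy a weight of zero.

```python
def weighted_nnls(reference, bulk, weights, sweeps=500):
    """Minimise sum_g w_g * (bulk_g - sum_k reference[g][k] * p_k)**2
    subject to p >= 0 by cyclic coordinate descent, then rescale p to sum to 1.
    Illustrative solver only; not the optimiser used by DeMixSC."""
    n_genes, n_types = len(reference), len(reference[0])
    # Normal-equation terms: H = S^T W S and b = S^T W y.
    H = [[sum(weights[g] * reference[g][i] * reference[g][j]
              for g in range(n_genes))
          for j in range(n_types)] for i in range(n_types)]
    b = [sum(weights[g] * reference[g][i] * bulk[g] for g in range(n_genes))
         for i in range(n_types)]
    p = [0.0] * n_types
    for _ in range(sweeps):
        for k in range(n_types):
            if H[k][k] == 0.0:
                continue  # this cell type has no weighted signal; keep it at 0
            rest = sum(H[k][j] * p[j] for j in range(n_types) if j != k)
            # Exact one-dimensional minimiser, clipped at zero.
            p[k] = max(0.0, (b[k] - rest) / H[k][k])
    total = sum(p)
    return [x / total for x in p] if total > 0 else p


# Hypothetical reference profiles: rows are genes, columns are cell types.
reference = [
    [10.0, 1.0, 0.0],
    [8.0, 0.0, 1.0],
    [1.0, 9.0, 0.0],
    [0.0, 7.0, 2.0],
    [0.0, 1.0, 12.0],
    [5.0, 5.0, 5.0],  # gene assumed to suffer from platform discrepancy
]
truth = [0.6, 0.3, 0.1]  # simulated proportions, not real data

# Simulate a bulk profile, then inflate the last gene threefold to mimic
# a platform-specific distortion.
bulk = [sum(s * t for s, t in zip(row, truth)) for row in reference]
bulk[5] *= 3.0

plain = weighted_nnls(reference, bulk, [1.0] * 6)
adjusted = weighted_nnls(reference, bulk, [1.0, 1.0, 1.0, 1.0, 1.0, 0.0])

plain_error = max(abs(a - t) for a, t in zip(plain, truth))
adjusted_error = max(abs(a - t) for a, t in zip(adjusted, truth))
print("equal weights:", [round(x, 3) for x in plain])
print("down-weighted:", [round(x, 3) for x in adjusted])
```

In this toy setting the fit that ignores the distorted gene recovers the simulated proportions, while the equally weighted fit is pulled away from them. How the weights are derived from benchmark data in practice is specific to the method and is not shown here.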

Funding
Acquisition of the Fluidigm system to accelerate functional genomics research
Cancer Center Support Grant
Statistical methods for genomic analysis of heterogeneous tumors
High Throughput Genomic Sequencer at BCM Core Facility
Administrative Core
Training Program in Biostatistics for Cancer Research
Acquisition of 10X Genomics Chromium Instrument to Accelerate Genomic and Single Cell Transcriptomic Research
Statistical methods and tools for cancer risk prediction in families with germline mutations in TP53
Tumor Biology
Molecular Basis of Human Visual System Disorders
High-Throughput Functional Genomics to Guide Precision Oncology in Gastrointestinal Tumors
BD Biosciences Special Order LSRII
Genetics of early onset retinal diseases
Human Cell Atlas Seed Network-Breast, Chan Zuckerberg Institute, MD Anderson Colorectal Cancer Funding
US Department of Defense Funding
DoD Grant PC210079
Cancer Prevention and Research Institute of Texas Funding
CPRIT Funding
Human Cell Atlas Seed Network-Retina, Chan Zuckerberg Institute Grant CZF2019-02425
Retinal Research Foundation Funding
CPRIT Grant RP180684
Baylor College of Medicine Funding
CPRIT Core Facility Support Award Grant CPRIT-RP180672
CPRIT Comprehensive Cancer Epigenomics Core Facility Grant RP200504

NIH HHS

S10 OD018033

NCI NIH HHS

P30 CA016672

NCI NIH HHS

R01 CA268380

NIH HHS

S10 OD023469

NEI NIH HHS

P30 EY002520

NCI NIH HHS

T32 CA096520

NIH HHS

S10 OD025240

NCI NIH HHS

R01 CA239342

NCI NIH HHS

P30 CA125123

NEI NIH HHS

R01 EY022356

NCI NIH HHS

K22 CA234406

NCRR NIH HHS

S10 RR024574

NEI NIH HHS

R01 EY018571
