Investigator
Northeast Forestry University
Graph-based deep learning for integrating single-cell and bulk transcriptomic data to identify clinical cancer subtypes
Abstract: The integration of single-cell RNA sequencing (scRNA-seq) and bulk transcriptomic data has become essential for deciphering the complex heterogeneity of cancer and identifying clinical cancer subtypes. However, the high dimensionality, sparsity, and noise of scRNA-seq data have significantly hindered its clinical translation. To address these limitations, we introduce single-cell and bulk transcriptomic graph deep learning (scBGDL), a graph-based deep learning method that integrates scRNA-seq and bulk transcriptomic data to identify cancer subtypes and predict clinical outcomes. scBGDL constructs sample-specific gene graphs that model complex gene–gene interactions and cellular relationships. The architecture employs Graph Attention Networks for feature aggregation, MinCutPool layers for dimensionality reduction, and Transformer modules to capture high-order biological dependencies. Validated independently in each of 16 cancer types from The Cancer Genome Atlas, scBGDL significantly outperformed existing methods in prognostic accuracy (mean C-index: 0.7060 versus 0.6709 for the best competing method), demonstrating robustness and generalizability across diverse transcriptional architectures. To demonstrate clinical versatility, we further evaluated scBGDL in three therapeutic contexts using multicenter cohorts: lung adenocarcinoma survival prediction (n = 1099), epithelial ovarian cancer response to platinum-based chemotherapy (n = 762), and skin cutaneous melanoma immunotherapy outcome (n = 305). scBGDL consistently delivered robust risk stratification (log-rank P < 0.05 across cohorts), identified key driver edges, and uncovered clinically relevant biological interpretations. By enabling multimodal data integration and interpretable biological insights, scBGDL advances precision oncology for prognosis prediction, therapy optimization, and biomarker discovery.
The source code for the scBGDL model is available online (https://github.com/NEFLab/scBGDL).
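The prognostic accuracy in the abstract is reported as a concordance index (C-index). As a reading aid, the following is a minimal sketch of Harrell's C-index in plain Python. It is an illustration only, not the scBGDL evaluation code: the function name `concordance_index`, the tie handling, and the toy patient data are all assumptions made for this example.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[int],
    risks: Sequence[float],
) -> float:
    """Harrell's C-index (hypothetical helper, not taken from scBGDL).

    times  : follow-up time per patient
    events : 1 if the event (e.g. death) was observed, 0 if censored
    risks  : predicted risk score; higher means worse expected outcome

    A pair (i, j) is comparable when patient i had an observed event
    strictly before patient j's follow-up time. The pair is concordant
    when the model assigns patient i the higher risk. Tied risk scores
    count as half-concordant; pairs with tied times are skipped.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs; C-index is undefined")
    return concordant / comparable


# Toy data (invented for illustration): four patients, one censored.
toy_times = [5.0, 10.0, 15.0, 20.0]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.7, 0.4, 0.2]  # risk decreases as survival lengthens
toy_c_index = concordance_index(toy_times, toy_events, toy_risks)
```

On this scale, 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk, which gives context for the reported means of 0.7060 and 0.6709.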
Computational identification of DNA damage-relevant lncRNAs for predicting therapeutic efficacy and clinical outcomes in cancer