Investigator

Guohua Wang

Full professor · Northeast Forestry University, Information and computer engineering college

Collaborators (2)
Yixin Liu, Yuming Zhao
Institutions (2)
Northeast Forestry Un…, Harbin Medical Univer…

Papers

Graph-based deep learning for integrating single-cell and bulk transcriptomic data to identify clinical cancer subtypes

Abstract. The integration of single-cell RNA sequencing (scRNA-seq) and bulk transcriptomic data has become essential for deciphering the complex heterogeneity of cancer and identifying clinical cancer subtypes. However, the high dimensionality, sparsity, and noise of scRNA-seq data have significantly hindered its widespread clinical translation. To address these limitations, we introduce single-cell and bulk transcriptomic graph deep learning (scBGDL), a graph-based deep learning method that integrates scRNA-seq and bulk transcriptomic data to identify cancer subtypes and predict clinical outcomes. scBGDL constructs sample-specific gene graphs that model complex gene–gene interactions and cellular relationships. The architecture employs Graph Attention Networks for feature aggregation, MinCutPool layers for dimensionality reduction, and Transformer modules to capture high-order biological dependencies. Validated independently in each of 16 distinct cancer types from The Cancer Genome Atlas, scBGDL significantly outperformed existing methods in prognostic accuracy (mean C-index: 0.7060 versus 0.6709 for the best competing method), demonstrating robustness and generalizability across diverse transcriptional architectures. To demonstrate clinical versatility, we further evaluated scBGDL in three therapeutic contexts using multicenter cohorts: lung adenocarcinoma survival prediction (n = 1099), epithelial ovarian cancer platinum-based chemotherapy response (n = 762), and skin cutaneous melanoma immunotherapy outcome (n = 305). scBGDL consistently delivered robust risk stratification (log-rank P < 0.05 across cohorts), identified key driver edges, and uncovered clinically relevant biological interpretations. By enabling multimodal data integration and interpretable biological insights, scBGDL advances precision oncology for prognosis prediction, therapy optimization, and biomarker discovery.
The source code for the scBGDL model is available online (https://github.com/NEFLab/scBGDL).
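
The prognostic accuracy above is reported as a C-index. As a rough illustration only, the sketch below computes Harrell's concordance index in plain Python. The abstract does not say which C-index variant or implementation was used, so the function name `concordance_index`, its arguments, and the toy inputs are all assumptions for illustration and are not taken from the scBGDL code.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index (illustrative sketch, not the scBGDL implementation).

    times       -- observed follow-up times
    events      -- 1/True if the event was observed, 0/False if censored
    risk_scores -- predicted risk; higher means shorter expected survival
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the sample with the shorter time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only when the times differ and the
        # shorter time corresponds to an observed event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # higher risk failed first: concordant
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to uninformative predictions and 1.0 to a perfect ordering of comparable pairs, which is the scale on which the reported means (0.7060 versus 0.6709) should be read.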

Application and Clinical Value of Machine Learning‐Based Cervical Cancer Diagnosis and Prediction Model in Adjuvant Chemotherapy for Cervical Cancer: A Single‐Center, Controlled, Non‐Arbitrary Size Case‐Control Study

Objective. A case‐control study was conducted to explore the application and clinical value of a machine learning‐based cervical cancer (CC) diagnosis and prediction model in adjuvant chemotherapy for CC. Methods. From August 2019 to August 2021, 46 patients with stage IA CC (study group) and 55 patients with high‐grade squamous intraepithelial lesions (HSIL) (control group) were retrospectively analyzed. All patients completed routine MRI examinations; the ADC values of CC lesions, normal cervix, and cervical tissues at different stages were compared, and the changes in ADC values of CC tissues before and after chemotherapy were analyzed. The patients were divided into a training set (IA = 37, HSIL = 44) and a test set (IA = 9, HSIL = 11) at a ratio of 4 : 1. The preoperative MRI images were collected and uploaded to the radiomics cloud platform after preprocessing, and the cervix was manually delineated layer by layer on OSag‐T2WI, OAx‐T1WI, and OAx‐T2FS, respectively, to obtain a three‐dimensional volume of interest (VOI) of the cervix from which omics features were extracted. Variance Threshold analysis, univariate feature selection (SelectKBest), and the least absolute shrinkage and selection operator (LASSO) were used to reduce the dimensionality of the data and select features. A random forest model was used for machine learning, ROC curves were drawn, and the diagnostic performance of the omics models for the different sequences was analyzed. Results. In the comparison of ADC values for stage IA CC and HSIL, the ADC value of CC was remarkably lower than that of normal cervix (P < 0.05). The ROC curve analysis of the ADC value for differentiating CC from normal cervix indicated that the AUC was 0.838 and the 95% confidence interval was 0.721–0.955. According to the maximum Youden index of 0.848, the optimal critical value of ADC was 1.267 × 10⁻³ mm²/s, and the sensitivity and specificity were 92.21% and 9.48%, respectively. All results are shown in Table 2.
After CC treatment, chemotherapy was effective in 12 patients (CR + PR) and ineffective in 4 patients (PD + SD). When the b value was 1000 s/mm², the ADC value of the effective patients after the second chemotherapy was significantly higher than that after the first chemotherapy and before treatment (P < 0.05). There was no significant difference between the ADC value after the first chemotherapy and that before treatment (P > 0.05). There was no significant difference in ADC value among the ineffective patients before treatment and after the first and second chemotherapy (P > 0.05). A total of 8 omics features were extracted based on OSag‐T2WI, all of which were wavelet features, including 7 texture features and 1 first‐order feature. A total of 10 omics features were extracted based on OAx‐T1WI, including 6 wavelet first‐order features, 2 gradient first‐order features, and 2 wavelet texture features. Based on OAx‐T2FS, 6 omics features were extracted, including 3 wavelet texture features, 2 original shape features, and 1 logarithmic first‐order feature. Based on OSag‐T2WI&OAx‐T2FS, 9 omics features were extracted, 4 from OSag‐T2WI and 5 from OAx‐T2FS. The diagnostic performance of the four random forest models is shown in Table 1, and the ROC curves are shown in Figure 6. The diagnostic performance of the omics model based on OSag‐T2WI&OAx‐T2FS was the best in both the training set and the test set. The AUC of the training set was 0.991 (95% CI (0.94, 1.00)), and the accuracy rate was 0.925. The AUC of the test set was 0.894 (95% CI (0.75, 1.00)), and the accuracy rate was 0.835. In contrast, the diagnostic performance of the omics model based on OAx‐T1WI was the worst in both the training set and the test set. The AUC of the training set was 0.713 (95% CI (0.52, 0.92)), and the accuracy rate was 0.71. The AUC of the test set was 0.513 (95% CI (0.24, 0.77)), and the accuracy rate was 0.56, which has no practical clinical significance.
Conclusion. A CC diagnosis and prediction model based on machine learning can better distinguish stage IA CC from HSIL in the absence of clear lesions, which is of great significance for reducing invasive examination before surgery, guiding surgical procedures and adjuvant chemotherapy for CC.
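
The Results above choose the ADC cutoff by maximizing the Youden index, J = sensitivity + specificity − 1. The sketch below illustrates that selection rule in plain Python under stated assumptions: a sample is called positive when its value is at or below the cutoff (lower ADC suggesting cancer), and the function name `youden_cutoff` and the example values are hypothetical. It is not the study's code or data.

```python
def youden_cutoff(values, labels):
    """Pick the cutoff maximizing Youden's J = sensitivity + specificity - 1.

    Illustrative sketch: a sample is predicted positive when value <= cutoff.
    values -- measurements (e.g. ADC values; the example numbers are made up)
    labels -- 1/True for disease, 0/False for control
    Returns (cutoff, J, sensitivity, specificity).
    """
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative sample")

    best = None
    # Every distinct observed value is a candidate cutoff.
    for cutoff in sorted(set(values)):
        true_pos = sum(1 for v, y in zip(values, labels) if y and v <= cutoff)
        true_neg = sum(1 for v, y in zip(values, labels) if not y and v > cutoff)
        sensitivity = true_pos / positives
        specificity = true_neg / negatives
        j = sensitivity + specificity - 1
        # Keep the first cutoff that attains the largest J.
        if best is None or j > best[1]:
            best = (cutoff, j, sensitivity, specificity)
    return best


# Hypothetical, perfectly separable example values (not study data).
cutoff, j, sens, spec = youden_cutoff(
    [0.9, 1.0, 1.1, 1.3, 1.4, 1.5], [1, 1, 1, 0, 0, 0]
)
```

Scanning only the observed values keeps the sketch short; a fuller ROC analysis would also report the confidence interval of the AUC, as the abstract does.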

74 Works
2 Papers
2 Collaborators
Neoplasms · Breast Neoplasms · Tumor Microenvironment · Liver Neoplasms · Prognosis · Carcinoma, Pancreatic Ductal · Antigens, Neoplasm

Positions

2020–

Full professor

Northeast Forestry University · Information and computer engineering college

2014–

Full Professor

Harbin Institute of Technology · School of Computer Science and Technology

2014–

Post Doc Fellow

Johns Hopkins University School of Medicine · Wilmer Ophthalmologic Institute

2009–

Associate Professor

Harbin Institute of Technology · School of Computer Science and Technology

2004–

Assistant Professor

Harbin Institute of Technology · School of Computer Science and Technology

2006–

Visiting Research Associate

Indiana University School of Medicine · Department of Medicine

1999–

Teaching Assistant

Harbin Institute of Technology · School of Computer Science and Technology

Education

2009

Ph.D.

Harbin Institute of Technology

2003

M.A.

Harbin Institute of Technology

Country

CN