Journal

BMC Bioinformatics

Papers (10)

Integrating genetic and gene expression data in network-based stratification analysis of cancers

Cancers are complex diseases that have heterogeneous genetic drivers and varying clinical outcomes. A critical area of cancer research is organizing patient cohorts into subtypes and associating subtypes with clinical and biological outcomes for more effective prognosis and treatment. Large-scale studies have collected a plethora of omics data across multiple tumor types, providing an extensive dataset for stratifying patient cohorts. Network-based stratification (NBS) approaches have been presented to classify cancer tumors using somatic mutation data. A challenge in cancer stratification is integrating omics data to yield clinically meaningful subtypes. In this study, we investigate a novel approach to the NBS framework by integrating somatic mutation data with RNA sequencing data and evaluating the effectiveness of integrated NBS on three cancers: ovarian, bladder, and uterine cancer. We show that integrated NBS subtypes are more significantly associated with overall survival or histology than single-data type NBS subtypes. Specifically, we observe that integrated NBS subtypes for ovarian and bladder cancer were more significantly associated with patient survival than single-data type NBS subtypes, even when accounting for covariates. In addition, we show that integrated NBS subtypes for bladder and uterine cancer are more significantly associated with tumor histology than single-data type NBS subtypes. Integrated NBS networks also reveal highly influential genes that span multiple integrated NBS subtypes, as well as subtype-specific genes. Pathway enrichment analysis of integrated NBS subtypes reveals overarching biological differences between subtypes. These genes and pathways are involved in a heterogeneous set of cell functions, including ubiquitin homeostasis, p53 regulation, cytokine and chemokine signaling, and cell proliferation, emphasizing the importance of identifying not only cancer-specific gene drivers but also subtype-specific tumor drivers.
Our study highlights the value of integrating multi-omics data within the NBS framework to improve cancer subtyping, with implications for personalized prognosis and treatment strategies. These insights contribute to the ongoing advancement of computational subtyping methods to uncover more targeted and effective therapeutic treatments while facilitating the discovery of cancer driver genes.
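For orientation, NBS as originally described smooths each patient's sparse mutation profile over a gene interaction network before clustering. The sketch below illustrates only that smoothing step (a random walk with restart) on a toy three-gene network. The abstract does not say how RNA sequencing data enter the integrated variant, so that part is not modelled, and the function name, parameter values, and toy network are assumptions made for illustration.

```python
def propagate(adjacency, profile, alpha=0.7, iterations=50):
    """Smooth a per-gene profile over a network (random walk with restart).

    adjacency: symmetric 0/1 matrix as a list of lists (hypothetical toy input).
    profile:   per-gene starting values, e.g. 1 for mutated genes, 0 otherwise.
    alpha:     fraction of signal passed to neighbours at each step (assumed value).
    """
    n = len(profile)
    degree = [sum(row) for row in adjacency]
    current = list(profile)
    for _ in range(iterations):
        updated = []
        for i in range(n):
            # Each neighbour j shares its signal equally among its own neighbours.
            spread = sum(
                adjacency[j][i] * current[j] / degree[j]
                for j in range(n)
                if degree[j] > 0
            )
            # Mix the propagated signal with the original profile (the restart term).
            updated.append(alpha * spread + (1 - alpha) * profile[i])
        current = updated
    return current


# Toy path network gene0 - gene1 - gene2, with a single mutation in gene0.
toy_network = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
smoothed = propagate(toy_network, [1.0, 0.0, 0.0])
```

After smoothing, the unmutated neighbours of gene0 carry non-zero signal, which is what lets patients with different mutations in the same network neighbourhood look similar to a downstream clustering step.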

Network hub-node prioritization of gene regulation with intra-network association

Abstract Background To identify and prioritize the influential hub genes in a gene-set or biological pathway, most analyses rely on calculation of marginal effects or tests of statistical significance. These procedures may be inappropriate since hub nodes are common connection points and therefore may interact with other nodes more often than non-hub nodes do. Such dependence among gene nodes can be conjectured based on the topology of the pathway network or the correlation between them. Results Here we develop a pathway activity score incorporating the marginal (local) effects of gene nodes as well as intra-network affinity measures. This score summarizes the expression levels in a gene-set/pathway for each sample, with weights on local and network information, respectively. The score is then used to examine the impact of each node through a leave-one-out evaluation. To illustrate the procedure, two cancer studies, one involving RNA-Seq data from breast cancer patients with high-grade ductal carcinoma in situ and one involving microarray expression data from ovarian cancer patients, are used to assess the performance of the procedure and to compare it with existing methods, both those that do and those that do not take correlation and network information into account. The hub nodes identified by the proposed procedure in the two cancer studies are known influential genes; some have been included in standard treatments and some are currently being considered in clinical trials for targeted therapy. The results from simulation studies show that when marginal effects are mild or weak, the proposed procedure can still identify causal nodes, whereas methods relying only on marginal effect size cannot. Conclusions The NetworkHub procedure proposed in this research can effectively utilize the network information in combination with local effects derived from marker values, and provide a useful and complementary list of recommendations for prioritizing causal hubs.
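The leave-one-out idea can be illustrated with a deliberately simplified score. This is a minimal sketch, not the NetworkHub procedure: the weighting formula, the group-separation measure, and all names and numbers below are assumptions, chosen only to show how a node with a weak local effect but strong network weight can still rank first.

```python
from statistics import mean


def pathway_scores(expression, local, network, lam=0.5, skip=None):
    """Per-sample weighted sum of expression; `skip` leaves one gene out.

    The weight of each gene mixes a local (marginal) term and a network term;
    this particular mixing rule is a hypothetical stand-in.
    """
    genes = [g for g in local if g != skip]
    weights = {g: lam * local[g] + (1 - lam) * network[g] for g in genes}
    return [sum(weights[g] * sample[g] for g in genes) for sample in expression]


def separation(scores, labels):
    """Absolute difference in mean score between cases (1) and controls (0)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    return abs(mean(cases) - mean(controls))


def leave_one_out_impact(expression, labels, local, network, lam=0.5):
    """Drop in group separation when each gene is removed from the score."""
    full = separation(pathway_scores(expression, local, network, lam), labels)
    return {
        g: full - separation(pathway_scores(expression, local, network, lam, skip=g), labels)
        for g in local
    }


# Toy data: gene A has a small local weight but a large network weight.
samples = [
    {"A": 2.0, "B": 1.0, "C": 1.0},
    {"A": 2.0, "B": 1.0, "C": 1.0},
    {"A": 1.0, "B": 1.0, "C": 1.0},
    {"A": 1.0, "B": 1.0, "C": 1.0},
]
labels = [1, 1, 0, 0]
local = {"A": 0.1, "B": 0.5, "C": 0.5}
network = {"A": 1.0, "B": 0.2, "C": 0.2}
impact = leave_one_out_impact(samples, labels, local, network)
```

In this toy setting, removing A erases the separation between the groups, so A receives the largest impact even though its local weight is the smallest.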

DAGBagM: learning directed acyclic graphs of mixed variables with an application to identify protein biomarkers for treatment response in ovarian cancer

Abstract Background Applying directed acyclic graph (DAG) models to proteogenomic data has been shown to be effective for detecting causal biomarkers of complex diseases. However, there remain unsolved challenges in DAG learning to jointly model binary clinical outcome variables and continuous biomarker measurements. Results In this paper, we propose a new tool, DAGBagM, to learn DAGs with both continuous and binary nodes. By using appropriate models, DAGBagM allows for either continuous or binary nodes to be parent or child nodes. It employs a bootstrap aggregating strategy to reduce false positives in edge inference. At the same time, the aggregation procedure provides a flexible framework to robustly incorporate prior information on edges. Conclusions Through extensive simulation experiments, we demonstrate that DAGBagM has superior performance compared to alternative strategies for modeling mixed types of nodes. In addition, DAGBagM is computationally more efficient than two competing methods. When applying DAGBagM to proteogenomic datasets from ovarian cancer studies, we identify potential protein biomarkers for platinum refractory/resistant response in ovarian cancer. DAGBagM is available as a GitHub repository at https://github.com/jie108/dagbagM.
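The bootstrap aggregating idea can be sketched generically: learn a graph on many resampled datasets and keep only the edges that recur. The code below is an illustration under stated assumptions, not DAGBagM's algorithm or API. The `learn_edges` callback, the toy agreement-based learner, and the 0.5 threshold are all hypothetical, and the sketch neither orients edges nor enforces acyclicity, which a real DAG aggregation step would have to do.

```python
import random
from collections import Counter
from itertools import combinations


def bagged_edges(data, learn_edges, n_boot=100, threshold=0.5, seed=0):
    """Return edges whose selection frequency across bootstrap resamples
    is at least `threshold`, mapped to that frequency."""
    rng = random.Random(seed)
    counts = Counter()
    n = len(data)
    for _ in range(n_boot):
        # Resample rows with replacement and relearn the structure.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        counts.update(set(learn_edges(resample)))
    return {e: c / n_boot for e, c in counts.items() if c / n_boot >= threshold}


def toy_learner(rows, min_agreement=0.8):
    """Hypothetical stand-in learner: link two binary variables when they
    agree in at least `min_agreement` of the rows."""
    edges = []
    for a, b in combinations(sorted(rows[0]), 2):
        agreement = sum(r[a] == r[b] for r in rows) / len(rows)
        if agreement >= min_agreement:
            edges.append((a, b))
    return edges


# Toy data: y always equals x, while z is unrelated to both.
rows = [{"x": i % 2, "y": i % 2, "z": (i // 2) % 2} for i in range(20)]
stable = bagged_edges(rows, toy_learner)
```

Edges that appear only because of sampling noise in a few resamples fall below the frequency threshold, which is the mechanism by which aggregation reduces false positives.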

Machine learning-based prediction of survival prognosis in cervical cancer

Abstract Background Accurately forecasting prognosis could improve cervical cancer management; however, the clinical features currently in use do not provide enough information. The aim of this study is to improve forecasting capability by developing a miRNA-based machine learning survival prediction model. Results miRNA expression characteristics were chosen as features for model development. Cervical cancer miRNA expression data were obtained from The Cancer Genome Atlas database. Preprocessing, including removal of unquantified data, missing value imputation, sample normalization, log transformation, and feature scaling, was performed. In total, 42 survival-related miRNAs were identified by Cox proportional hazards analysis. The patients were optimally clustered into four groups with three different 5-year survival outcomes (≥ 90%, ≈ 65%, ≤ 40%) by the K-means clustering algorithm based on the top 10 survival-related miRNAs. Based on the K-means clustering result, a prediction model with high performance was established. Pathway analysis indicated that the miRNAs used are involved in the regulation of cancer stem cells. Conclusion A miRNA-based machine learning cervical cancer survival prediction model was developed that robustly stratifies cervical cancer patients into high survival rate (5-year survival rate ≥ 90%), moderate survival rate (5-year survival rate ≈ 65%), and low survival rate (5-year survival rate ≤ 40%) groups.
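The clustering step can be pictured with a bare-bones K-means (Lloyd's algorithm). This is a minimal sketch on made-up two-dimensional points with fixed starting centroids; the study clusters patients on ten miRNA features into four groups, and its initialization and software are not stated in the abstract, so everything below is illustrative.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
        for p in points
    ]


def kmeans(points, centroids, iterations=20):
    """Alternate between assigning points and recomputing centroid means."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        labels = assign(points, centroids)
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[k] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assign(points, centroids), centroids


# Two well-separated toy groups of "patients" described by two features.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, centers = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
# labels -> [0, 0, 1, 1]
```

In a survival setting, the resulting cluster labels would then be compared on outcome, for example by estimating 5-year survival within each cluster.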

Exploring the dynamics and interplay of human papillomavirus and cervical tumorigenesis by integrating biological data into a mathematical model

Abstract Background Cervical cancer is the fourth most common tumor in women worldwide and mostly results from persistent infection with high-risk human papillomavirus (HR-HPV). Results The main findings are as follows: (i) A total of 16.64% of the individuals were positive for HR-HPV infection, with 13.04% having a single HR-HPV type and 3.60% having multiple HR-HPV types. (ii) Cluster analysis showed that the infection rate trends of HPV31 and HPV33 in all infections, as well as those of HPV33 and HPV35 in single infections, were very similar in precancerous stages. (iii) The proportions of single and multiple HR-HPV infections showed that the rate of multiple infections increased as the disease developed. Conclusions The HR-HPV prevalence in outpatients was 16.64%, and the predominant HR-HPV types in the study were HPV52, HPV58 and HPV16. HR-HPV subtypes with common biological properties had similar infection rate trends in precancerous stages. In particular, as precancerous disease progressed, defense against HPV infection broke down and the potential for further HPV infection increased, which resulted in an increase in multiple HPV infections.

expHRD: an individualized, transcriptome-based prediction model for homologous recombination deficiency assessment in cancer

Abstract Background Homologous recombination deficiency (HRD) stands as a clinical indicator for discerning responsive outcomes to platinum-based chemotherapy and poly ADP-ribose polymerase (PARP) inhibitors. One of the conventional approaches to HRD prognostication has generally centered on identifying deleterious mutations within the BRCA1/2 genes, along with quantifying genomic scars, such as Genomic Instability Score (GIS) estimation with scarHRD. However, the scarHRD method has limitations for tumors that lack corresponding germline data. Although several RNA-seq-based HRD prediction algorithms have been developed, they mainly support cohort-wise classification, yielding an HRD status without a quantitative metric comparable to scarHRD. This study introduces the expHRD method, a novel transcriptome-based framework tailored to n-of-1-style HRD scoring. Results The prediction model was established using the elastic net regression method on The Cancer Genome Atlas (TCGA) pan-cancer training set. The HRD gene set used for the expHRD calculation was derived with a bootstrap technique. expHRD demonstrated a notable correlation with scarHRD and superior performance in predicting HRD-high samples. We also performed intra- and extra-cohort evaluations for clinical feasibility in the TCGA-OV and the Genomic Data Commons (GDC) ovarian cancer cohort, respectively. A web service designed for ease of use is intended to extend HRD prediction across diverse malignancies, with ovarian cancer as a representative example. Conclusions Our novel approach leverages transcriptome data, enabling the prediction of HRD status with remarkable precision. This method addresses the challenges associated with limited available data, opening new avenues for utilizing transcriptomics to inform clinical decisions.
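Elastic net regression combines an L1 penalty, which drives some coefficients to exactly zero, with an L2 penalty, which shrinks the rest. The sketch below fits it by coordinate descent on a tiny made-up dataset. It assumes no intercept and uses the common objective (1/2n)·RSS + lam·(alpha·|b|₁ + (1 − alpha)/2·|b|₂²). It is not the expHRD training code, and the data and parameter values are illustrative only.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values within [-gamma, gamma] become 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(X, y, lam, alpha=0.5, sweeps=200):
    """Coordinate descent for elastic net without an intercept.

    lam:   overall penalty strength.
    alpha: mixing between L1 (alpha=1) and L2 (alpha=0) penalties.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            # Correlation of feature j with the residual that excludes feature j.
            rho = 0.0
            for i in range(n):
                partial = y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * partial
            rho /= n
            scale = sum(X[i][j] ** 2 for i in range(n)) / n + lam * (1 - alpha)
            if scale == 0:
                continue  # all-zero feature with no ridge term: leave at 0
            beta[j] = soft_threshold(rho, lam * alpha) / scale
    return beta


# Toy data where the response is exactly twice the single feature.
X = [[1.0], [2.0], [3.0]]
y = [2.0, 4.0, 6.0]
unpenalised = elastic_net(X, y, lam=0.0)            # close to [2.0]
fully_shrunk = elastic_net(X, y, lam=10.0, alpha=1.0)  # [0.0]
```

With many candidate genes, the L1 part is what yields a compact gene set, while the L2 part stabilises the fit when genes are correlated.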

Completing a genomic characterisation of microscopic tumour samples with copy number

Abstract Background Genomic insights in settings where tumour sample sizes are limited to just hundreds or even tens of cells hold great clinical potential, but also present significant technical challenges. We previously developed the DigiPico sequencing platform to accurately identify somatic mutations from such samples. Results Here, we complete this genomic characterisation with copy number. We present a novel protocol, PicoCNV, to call allele-specific somatic copy number alterations from picogram quantities of tumour DNA. We find that PicoCNV provides exactly accurate copy number in 84% of the genome for even the smallest samples, and demonstrate its clinical potential in maintenance therapy. Conclusions PicoCNV complements our existing platform, allowing for accurate and comprehensive genomic characterisations of cancers in settings where only microscopic samples are available.

Random forests for the analysis of matched case–control studies

Abstract Background Conditional logistic regression trees have been proposed as a flexible alternative to the standard method of conditional logistic regression for the analysis of matched case–control studies. While they avoid the strict assumption of linearity and automatically incorporate interactions, conditional logistic regression trees may suffer from relatively high variability. Further machine learning methods for the analysis of matched case–control studies are missing because conventional machine learning methods cannot handle the matched structure of the data. Results A random forest method for the analysis of matched case–control studies based on conditional logistic regression trees is proposed, which overcomes the issue of high variability. It provides an accurate estimation of exposure effects while being more flexible in the functional form of covariate effects. The efficacy of the method is illustrated in a simulation study and in an application to real-world data from a matched case–control study on the effect of regular participation in cervical cancer screening on the development of cervical cancer. Conclusions The proposed random forest method is a promising addition to the toolbox for the analysis of matched case–control studies and addresses the need for machine-learning methods in this field. It provides a more flexible approach than both the standard method of conditional logistic regression and conditional logistic regression trees. It allows for non-linearity and the automatic inclusion of interaction effects and is suitable for both exploratory and explanatory analyses.
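The matched structure is handled in conditional logistic regression by comparing each case only with the controls in its own matched set, so stratum-specific baseline risks cancel out. The sketch below evaluates that conditional log-likelihood for sets with one case each. It covers only this underlying likelihood, not the tree or forest construction, and the data layout and function name are assumptions.

```python
import math


def conditional_log_likelihood(strata, beta):
    """Conditional log-likelihood for matched sets with one case each.

    strata: list of (case_covariates, [control_covariates, ...]) tuples.
    beta:   coefficient vector, one entry per covariate.
    """
    total = 0.0
    for case, controls in strata:
        eta_case = sum(b * x for b, x in zip(beta, case))
        etas = [eta_case] + [sum(b * x for b, x in zip(beta, c)) for c in controls]
        # Log-sum-exp over the matched set, shifted by the maximum for stability.
        top = max(etas)
        log_denominator = top + math.log(sum(math.exp(e - top) for e in etas))
        total += eta_case - log_denominator
    return total


# Two 1:1 matched pairs in which only the case is exposed (covariate = 1).
pairs = [((1.0,), [(0.0,)]), ((1.0,), [(0.0,)])]
null_fit = conditional_log_likelihood(pairs, [0.0])      # equals -2 * log(2)
exposure_fit = conditional_log_likelihood(pairs, [1.0])  # higher than null_fit
```

Because no term for the matched set itself appears in the expression, methods built on this likelihood respect the matching without estimating one parameter per set.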

ClassifieR 2.0: expanding interactive gene expression-based stratification to prostate and high-grade serous ovarian cancer

Abstract Background Advances in transcriptional profiling methods have enabled the discovery of molecular subtypes within and across traditional tissue-based cancer classifications. Such molecular subgroups hold potential for improving patient outcomes by guiding treatment decisions and revealing physiological distinctions and targetable pathways. Computational methods for stratifying transcriptomic data into molecular subgroups are increasingly abundant. However, assigning samples to these subtypes and other transcriptionally inferred predictions is time-consuming and requires significant bioinformatics expertise. To address this need, we recently reported “ClassifieR,” a flexible, interactive cloud application for the functional annotation of colorectal and breast cancer transcriptomes. Here, we report “ClassifieR 2.0” which introduces additional modules for the molecular subtyping of prostate and high-grade serous ovarian cancer (HGSOC). Results ClassifieR 2.0 introduces ClassifieRp and ClassifieRov, two specialised modules specifically designed to address the challenges of prostate and HGSOC molecular classification. ClassifieRp includes sigInfer, a method we developed to infer commercial prognostic prostate gene expression signatures from publicly available gene-lists or indeed any user-uploaded gene-list. ClassifieRov utilizes consensus molecular subtyping methods for HGSOC, including tools like consensusOV, for accurate ovarian cancer stratification. Both modules include functionalities present in the original ClassifieR framework for estimating cellular composition, predicting transcription factor (TF) activity and single sample gene set enrichment analysis (ssGSEA). 
Conclusions ClassifieR 2.0 combines molecular subtyping of prostate cancer and HGSOC with commonly used sample annotation tools in a single, user-friendly platform, allowing scientists without bioinformatics training to explore prostate and HGSOC transcriptional data without manual data handling or the need to operate multiple packages. Our sigInfer method within ClassifieRp enables the inference of commercially available gene signatures for prostate cancer, while ClassifieRov incorporates consensus molecular subtyping for HGSOC. Overall, ClassifieR 2.0 aims to make molecular subtyping more accessible to the wider research community. This is crucial for better understanding the molecular heterogeneity of these cancers and for developing personalised treatment strategies.

In silico design of a multi-epitope vaccine against HPV16/18

Abstract Background Cervical cancer is the fourth most common cancer affecting women and is caused by sexually transmitted human papillomavirus (HPV) infections. Commercially available prophylactic vaccines have been shown to protect vaccinated individuals against HPV infections; however, these vaccines have no therapeutic effect in those who are already infected with the virus. The aim of the current study was to use immunoinformatics to develop a multi-epitope vaccine with therapeutic potential against cervical cancer. Results In this study, T-cell epitopes from the E5 and E7 proteins of HPV16/18 were predicted. These epitopes were evaluated and chosen based on their antigenicity, allergenicity, toxicity, and induction of IFN-γ production (only in helper T lymphocytes). Then, the selected epitopes were sequentially linked by appropriate linkers. In addition, a C-terminal fragment of Mycobacterium tuberculosis heat shock protein 70 (HSP70) was used as an adjuvant for the vaccine construct. The physicochemical parameters of the vaccine construct were acceptable. Furthermore, the vaccine was soluble, highly antigenic, and non-allergenic. The vaccine’s 3D model was predicted, and the structural improvement after refinement was confirmed using the Ramachandran plot and ProSA-web. The vaccine’s B-cell epitopes were predicted. Molecular docking analysis showed that the vaccine's refined 3D model had a strong interaction with Toll-like receptor 4. The structural stability of the vaccine construct was confirmed by molecular dynamics simulation. Codon adaptation was performed in order to achieve efficient vaccine expression in Escherichia coli (E. coli) strain K12. Subsequently, in silico cloning of the multi-epitope vaccine into the pET-28a(+) expression vector was conducted. Conclusions According to the results of bioinformatics analyses, the multi-epitope vaccine is structurally stable, as well as a non-allergenic and non-toxic antigen.
However, in vitro and in vivo studies are needed to validate the vaccine’s efficacy and safety. If satisfactory results are obtained from in vitro and in vivo studies, the vaccine designed in this study may be effective as a therapeutic vaccine against cervical cancer.

Publisher

Springer Science and Business Media LLC

ISSN

1471-2105