Journal

Bioinformatics

Papers (9)

ImmunoPepper: extracting personalized peptides from complex splicing graphs

Abstract Motivation RNA sequencing enables the characterization of a cell’s transcript isoforms in healthy and disease conditions. In the context of cancer, local transcript variability may translate into splicing-derived, tumor-associated peptides that are recognized by the immune system. A software tool that extracts such candidate peptides is of great interest for personalized cancer therapy. Results We present the open-source software tool ImmunoPepper, which extracts a set of biologically plausible peptides from a splicing graph derived from a set of RNA-seq datasets. This peptide set can be personalized with germline and somatic variation and takes novel RNA splice variants into account. ImmunoPepper supports several filtering options, including subtraction of the normal tissue background, prediction of MHC-binding affinity, and mass spectrometry (MassSpec)-based validation of the identified peptides. We analyzed 32 ovarian cancer (TCGA-OV) and 31 breast invasive carcinoma (TCGA-BRCA) samples with a strict cancer-specific filtering configuration and obtained, on average, 834 and 569 cancer-specific predicted MHC-I-binding 9-mers per sample for the two cohorts, respectively. MassSpec validation with the target-decoy competition approach Subset-Neighbor-Search (SNS) showed an average validation rate of 4.5% per TCGA-OV sample and 5.3% per TCGA-BRCA sample. This corresponded, on average, to 25 MHC-I-binding 9-mers per TCGA-OV sample and 20 MHC-I-binding 9-mers per TCGA-BRCA sample. Finally, we draw conclusions about the best framework for generating splicing-derived neoepitopes: we recommend using joint data structures when a cancer cohort and a normal cohort are processed homogeneously, and focusing on the reproducibility of candidates across generation pipelines. Availability and implementation ImmunoPepper is implemented in Python 3 and is available as open-source software at https://github.com/ratschlab/immunopepper. The online documentation can be found at https://immunopepper.readthedocs.io/en/latest/.
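The normal-background subtraction step can be illustrated with a minimal sketch, assuming that candidate peptides are compared as sets of 9-mers and that any 9-mer also found in a normal sample is discarded. This is a generic k-mer set difference, not ImmunoPepper's implementation (which operates on splicing graphs and offers further filters); the function names and toy sequences are hypothetical.

```python
def kmers(peptide: str, k: int = 9) -> set[str]:
    """Return every contiguous k-mer of a peptide (empty if the peptide is shorter than k)."""
    return {peptide[i:i + k] for i in range(len(peptide) - k + 1)}


def cancer_specific_kmers(tumor_peptides, normal_peptides, k: int = 9) -> set[str]:
    """Hypothetical helper: tumor k-mers minus every k-mer seen in the normal background."""
    tumor = set().union(*(kmers(p, k) for p in tumor_peptides))
    normal = set().union(*(kmers(p, k) for p in normal_peptides))
    return tumor - normal


# Toy example: the tumor peptide extends a normal peptide across a novel junction,
# so only the 9-mers overlapping the new sequence survive the subtraction.
tumor = ["MKTAYIAKQRQISFVKSHFSRQ"]
normal = ["MKTAYIAKQRQISF"]
print(sorted(cancer_specific_kmers(tumor, normal)))
```

Using sets makes the subtraction exact-match only; a real pipeline would typically also apply the MHC-binding and MassSpec filters described above.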

GammaGateR: semi-automated marker gating for single-cell multiplexed imaging

Abstract Motivation Multiplexed immunofluorescence (mIF) is an emerging assay for multichannel protein imaging that can decipher cell-level spatial features in tissues. However, existing automated cell phenotyping methods, such as clustering, struggle to achieve consistency across experiments and often require subjective evaluation. As a result, mIF analyses often revert to marker gating based on manual thresholding of raw imaging data. Results To address the need for an evaluable semi-automated algorithm, we developed GammaGateR, an R package for interactive marker gating designed specifically for segmented cell-level data from mIF images. Based on a novel closed-form gamma mixture model, GammaGateR provides estimates of marker-positive cell proportions and a soft clustering of marker-positive cells. The model incorporates user-specified constraints that yield a consistent but slide-specific model fit. We compared GammaGateR against the newest unsupervised approach for annotating mIF data, using two colon datasets and one ovarian cancer dataset for the evaluation. We showed that GammaGateR produces results highly similar to a silver standard established through manual annotation. Furthermore, we demonstrated its effectiveness in identifying biological signals by mapping known spatial interactions between CD68 and MUC5AC cells in the colon and by accurately predicting survival in ovarian cancer patients using the phenotype probabilities as input to machine learning methods. GammaGateR is a highly efficient tool that can improve the replicability of marker gating results while reducing the time spent on manual segmentation. Availability and implementation The R package is available at https://github.com/JiangmeiRubyXiong/GammaGateR.
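Soft clustering under a gamma mixture means assigning each cell a posterior probability of belonging to the marker-positive component. A minimal sketch follows, assuming a two-component mixture whose weights, shapes, and rates have already been estimated; GammaGateR's closed-form estimation and user constraints are not reproduced, and the parameter values and function names here are hypothetical.

```python
import math


def gamma_logpdf(x: float, shape: float, rate: float) -> float:
    """Log density of Gamma(shape, rate) at x; x must be strictly positive."""
    return shape * math.log(rate) - math.lgamma(shape) + (shape - 1) * math.log(x) - rate * x


def marker_positive_probability(x, weights, shapes, rates) -> float:
    """Posterior probability that intensity x belongs to component 1 (assumed marker-positive)."""
    logs = [math.log(w) + gamma_logpdf(x, a, b) for w, a, b in zip(weights, shapes, rates)]
    m = max(logs)  # log-sum-exp trick: subtract the max before exponentiating
    probs = [math.exp(v - m) for v in logs]
    return probs[1] / sum(probs)


# Hypothetical fit: a marker-negative mode near 0.5 and a marker-positive mode near 5.
weights, shapes, rates = [0.7, 0.3], [2.0, 10.0], [4.0, 2.0]
for intensity in (0.2, 2.0, 6.0):
    print(intensity, round(marker_positive_probability(intensity, weights, shapes, rates), 4))
```

Returning a probability rather than a hard label is what allows downstream models to use phenotype probabilities as features, as in the survival analysis above.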

MetDecode: methylation-based deconvolution of cell-free DNA for noninvasive multi-cancer typing

Abstract Motivation Circulating cell-free DNA (cfDNA) is widely explored as a noninvasive biomarker for cancer screening and diagnosis. The ability to decode the cells of origin in cfDNA would provide biological insights into pathophysiological mechanisms, aiding cancer characterization and guiding clinical management and follow-up. Results We developed a DNA methylation signature-based deconvolution algorithm, MetDecode, for identifying the tissue of origin of cancer. We built a reference atlas from de novo and published whole-genome methylation sequencing data for colorectal, breast, ovarian, and cervical cancer, and for blood-cell-derived entities. MetDecode models the contributors absent from the atlas with methylation patterns learned on the fly from the input cfDNA methylation profiles. In addition, our model accounts for the coverage of each marker region to alleviate potential sources of noise. In silico experiments showed a limit of detection down to 2.88% tumor tissue contribution in cfDNA. MetDecode produced Pearson correlation coefficients above 0.95 and outperformed other methods in simulations (P < 0.001, one-sided t-test). In plasma cfDNA profiles from cancer patients, MetDecode assigned the correct tissue of origin in 84.2% of cases. In conclusion, MetDecode can unravel alterations in the components of the cfDNA pool by accurately estimating the contributions of multiple tissues, even when supplied with an imperfect reference atlas. Availability and implementation MetDecode is available at https://github.com/JorisVermeeschLab/MetDecode.
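Reference-based deconvolution is often posed as finding non-negative tissue proportions p such that atlas @ p approximates the observed cfDNA methylation profile. The sketch below solves that non-negative least-squares problem by projected gradient descent, assuming a complete atlas and equal weighting of regions. It is a plain baseline, not MetDecode: the on-the-fly modeling of unknown contributors and the coverage weighting described above are omitted, and the function name and toy atlas are hypothetical.

```python
import numpy as np


def deconvolve(atlas: np.ndarray, mixture: np.ndarray, n_iter: int = 5000) -> np.ndarray:
    """Estimate non-negative proportions p with atlas @ p ~= mixture, then rescale to sum to 1.

    atlas:   (n_regions, n_tissues) methylation fractions of each reference tissue
    mixture: (n_regions,) methylation fractions observed in the cfDNA sample
    """
    n_tissues = atlas.shape[1]
    p = np.full(n_tissues, 1.0 / n_tissues)
    gram = atlas.T @ atlas
    target = atlas.T @ mixture
    step = 1.0 / np.linalg.eigvalsh(gram).max()  # 1/L, with L the Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = gram @ p - target  # gradient of 0.5 * ||atlas @ p - mixture||^2
        p = np.maximum(p - step * grad, 0.0)  # gradient step, then project onto p >= 0
    return p / p.sum()  # assumes at least one tissue receives a positive weight


# Toy atlas: 5 marker regions x 3 hypothetical tissues, mixed 70/20/10.
atlas = np.array([[0.9, 0.1, 0.2],
                  [0.1, 0.8, 0.3],
                  [0.2, 0.2, 0.9],
                  [0.8, 0.3, 0.1],
                  [0.1, 0.9, 0.7]])
print(np.round(deconvolve(atlas, atlas @ np.array([0.7, 0.2, 0.1])), 3))
```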

Prediction of molecular subtypes for endometrial cancer based on hierarchical foundation model

Abstract Motivation Endometrial cancer is a prevalent gynecological malignancy that requires accurate identification of its molecular subtypes for effective diagnosis and treatment. Four molecular subtypes with different clinical outcomes have been identified: POLE mutation, mismatch repair deficient, p53 abnormal, and no specific molecular profile. However, determining these subtypes typically relies on expensive gene sequencing. To overcome this limitation, we propose a novel method that uses hematoxylin and eosin-stained whole slide images to predict endometrial cancer molecular subtypes. Results Our approach uses a hierarchical foundation model as its backbone, fine-tuned from the UNI computational pathology foundation model, to extract tissue embeddings at different scales. We achieved promising results in extensive experiments on the Fudan University Shanghai Cancer Center cohort (N = 364). Our model achieves a macro-average AUROC of 0.879 (95% CI, 0.853–0.904) in five-fold cross-validation. Compared with the current state-of-the-art method for predicting endometrial cancer molecular subtypes, our method achieves better predictive accuracy and computational efficiency. Moreover, our method is highly reproducible, which eases implementation and widespread adoption. This study aims to address the cost and time constraints associated with traditional gene sequencing techniques. By providing a reliable and accessible alternative to gene sequencing, our method has the potential to revolutionize endometrial cancer diagnosis and improve patient outcomes. Availability and implementation The code and data used to generate the results in this study are available on GitHub at https://github.com/HaoyuCui/hi-UNI and on Zenodo at https://doi.org/10.5281/zenodo.14627478.
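For a four-subtype classifier, a macro-average AUROC is commonly computed as the unweighted mean of one-vs-rest AUROCs, one per subtype. The sketch below assumes that definition (the abstract does not spell out the averaging); each binary AUROC is the probability that a random positive case outranks a random negative one, with ties counted as one half. Function names are hypothetical.

```python
def binary_auroc(scores, labels) -> float:
    """AUROC via pairwise comparison; requires at least one positive and one negative label."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_auroc(prob_rows, true_classes, n_classes: int) -> float:
    """Unweighted mean of one-vs-rest AUROCs across all classes."""
    aucs = []
    for c in range(n_classes):
        scores = [row[c] for row in prob_rows]  # predicted probability of class c
        labels = [t == c for t in true_classes]  # class c vs. all other classes
        aucs.append(binary_auroc(scores, labels))
    return sum(aucs) / n_classes


print(binary_auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 3 of 4 positive/negative pairs ranked correctly
```

Because every class contributes equally, a rare subtype such as POLE mutation weighs as much in the macro average as a common one.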

PLCOjs, a FAIR GWAS web SDK for the NCI Prostate, Lung, Colorectal and Ovarian Cancer Genetic Atlas project

Abstract Motivation The Division of Cancer Epidemiology and Genetics (DCEG) and the Division of Cancer Prevention (DCP) at the National Cancer Institute (NCI) have recently generated genome-wide association study (GWAS) data for multiple traits in the Prostate, Lung, Colorectal, and Ovarian (PLCO) Genomic Atlas project. The GWAS included 110 000 participants. The main motivation for developing the open-source JavaScript software development kit (SDK) reported here is to disseminate the genetic association data through a data portal called GWAS Explorer, in a manner that meets modern expectations of FAIR reusability by data scientists and engineers. Results The PLCO GWAS Explorer resource relies on a public, stateless HTTP application programming interface (API) deployed as the sole backend service for both the landing page’s web application and third-party analytical workflows. The core PLCOjs SDK is mapped to each of the API methods and to each of the reference graphic visualizations in the GWAS Explorer, and a few additional visualization methods extend it. As is the norm with web SDKs, no download or installation is needed, and modularization supports targeted code injection for web applications, reactive notebooks (Observable), and node-based web services. Availability and implementation Code at https://github.com/episphere/plco; project page at https://episphere.github.io/plco

X-intNMF: a cross- and intra-omics regularized NMF framework for multi-omics integration

Abstract Motivation The rapid accumulation of multi-omics data presents a valuable opportunity to advance our understanding of complex diseases and biological systems, driving the development of integrative computational methods. However, the complexity of biological processes, spanning multiple molecular layers and involving intricate regulatory interactions, requires models that can capture both intra- and cross-omics relationships. Most existing integration methods primarily focus on sample-level similarities or intra-omics feature interactions, often neglecting the interactions across different omics layers. This limitation can result in the loss of critical biological information and suboptimal performance. To address this gap, we propose X-intNMF, a network-regularized non-negative matrix factorization (NMF) framework that simultaneously integrates intra- and cross-omics feature interactions into a shared low-dimensional representation (see Fig. 1). By modeling these multi-layered relationships, X-intNMF enhances the representation of biological interactions and improves integration quality and prediction accuracy. Results For evaluation, we applied X-intNMF to predict breast cancer phenotypes and classify clinical outcomes in lung and ovarian cancers using mRNA expression, microRNA expression, and DNA methylation data from TCGA. The results show that X-intNMF consistently outperforms state-of-the-art methods. Ablation studies confirm that incorporating both cross-omics and intra-omics interactions contributes significantly to the model’s improved performance. Additionally, survival analysis on 25 TCGA cancer datasets demonstrates that the integrated multi-omics representation offers strong prognostic value for both overall survival and disease-free status. These findings highlight X-intNMF’s ability to effectively model multi-layered molecular interactions while maintaining interpretability, robustness, and scalability within the NMF framework. 
Availability and implementation The source code and datasets supporting this study are publicly available at GitHub (https://github.com/compbiolabucf/X-intNMF) and archived on Zenodo (https://doi.org/10.5281/zenodo.18238385).
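A shared low-dimensional representation across omics layers can be illustrated with joint NMF: each omics block X_i (features by samples) is factored as W_i H with one common H, which is equivalent to factoring the row-wise stack of all blocks. The minimal sketch below uses the standard Lee-Seung multiplicative updates for the Frobenius objective. It deliberately omits the intra- and cross-omics network-regularization terms that distinguish X-intNMF, so it is a baseline rather than the authors' method; the function name and toy data are hypothetical.

```python
import numpy as np


def joint_nmf(blocks, rank, n_iter=500, seed=0, eps=1e-10):
    """Factor each non-negative block X_i (features_i x samples) as W_i @ H with a shared H."""
    X = np.vstack(blocks)  # stacking rows forces all blocks to share the sample factor H
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], rank))
    H = rng.random((rank, X.shape[1]))
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)  # multiplicative update keeps H non-negative
        W *= (X @ H.T) / (W @ H @ H.T + eps)  # multiplicative update keeps W non-negative
    splits = np.cumsum([b.shape[0] for b in blocks])[:-1]
    return np.split(W, splits), H  # per-omics loadings W_i and the shared sample representation


# Toy data: two hypothetical omics layers (12 and 18 features) over the same 20 samples.
rng = np.random.default_rng(1)
H_true = rng.random((3, 20))
blocks = [rng.random((12, 3)) @ H_true, rng.random((18, 3)) @ H_true]
Ws, H = joint_nmf(blocks, rank=3, n_iter=2000)
print([w.shape for w in Ws], H.shape)
```

The columns of H can then serve as sample features for downstream classification or survival models, which is the role the integrated representation plays in the evaluation above.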

Publisher

Oxford University Press (OUP)

ISSN

1367-4811