Journal
A pedigree-based prediction model identifies carriers of deleterious de novo mutations in families with Li-Fraumeni syndrome
De novo mutations (DNMs) are increasingly recognized as causal factors in rare diseases. Identifying DNM carriers will allow researchers to study the likely distinct molecular mechanisms of DNMs. We developed Famdenovo to predict DNM status (DNM or familial mutation [FM]) of deleterious autosomal dominant germline mutations for any syndrome. We introduce Famdenovo.TP53 for Li-Fraumeni syndrome (LFS) and analyze 324 LFS family pedigrees from four US cohorts: a validation set of 186 pedigrees and a discovery set of 138 pedigrees. The concordance index for Famdenovo.TP53 prediction was 0.95 (95% CI: [0.92, 0.98]). Forty individuals (95% CI: [30, 50]) were predicted as DNM carriers, increasing the total number from 42 to 82. We compared clinical and biological features of FM versus DNM carriers: (1) cancer and mutation spectra along with parental ages were similarly distributed; (2) ascertainment criteria such as early-onset breast cancer (age 20–35 yr) provide a condition for an unbiased estimate of the DNM rate: 48% (23 DNMs vs. 25 FMs); and (3) hotspot mutation R248W was not observed in DNMs, although it was as prevalent as hotspot mutation R248Q in FMs. Furthermore, we introduce Famdenovo.BRCA for hereditary breast and ovarian cancer syndrome and apply it to a small set of family data from the Cancer Genetics Network. In summary, we introduce a novel statistical approach to systematically evaluate deleterious DNMs in inherited cancer syndromes. Our approach may serve as a foundation for future studies evaluating how new deleterious mutations can be established in the germline, such as those in TP53.
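As a rough illustration of the metrics quoted above, the sketch below computes a concordance index for a binary DNM/FM outcome and rechecks the abstract's arithmetic. It assumes the concordance index is the usual pairwise definition (equivalent to the area under the ROC curve for a binary outcome), which may differ from the authors' exact computation. The function name and the scores and labels are hypothetical, not taken from Famdenovo or the study data.

```python
from itertools import product


def concordance_index(scores, labels):
    """Share of (DNM, FM) pairs in which the DNM carrier gets the higher
    predicted probability; tied pairs count as half."""
    dnm = [s for s, y in zip(scores, labels) if y == 1]
    fm = [s for s, y in zip(scores, labels) if y == 0]
    if not dnm or not fm:
        raise ValueError("need at least one DNM and one FM carrier")
    concordant = 0.0
    for d, f in product(dnm, fm):
        if d > f:
            concordant += 1.0
        elif d == f:
            concordant += 0.5
    return concordant / (len(dnm) * len(fm))


# Hypothetical predicted DNM probabilities (1 = DNM, 0 = FM); not study data.
scores = [0.9, 0.8, 0.35, 0.4, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
c_index = concordance_index(scores, labels)

# Arithmetic quoted in the abstract.
dnm_rate = 23 / (23 + 25)  # early-onset breast cancer subset
total_dnm = 42 + 40        # known plus newly predicted DNM carriers
```

With these toy inputs, eight of the nine DNM/FM pairs are ranked correctly. The last two lines reproduce the 48% rate and the total of 82 carriers stated in the abstract.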
The SeqSplice multiplexed minigene splicing assay for characterization and quantitation of variant-induced BRCA1 and BRCA2 splice isoforms
BRCA1 and BRCA2 germline variant classification is vital for clinical management of families with hereditary breast and ovarian cancer. However, clinical classification of rare variants outside of the splice donor/acceptor ±1,2-dinucleotides remains challenging, particularly for variants that induce new or cryptic splice site usage. Here, we present SeqSplice, a high-throughput RNA splicing methodology utilizing barcoded minigene constructs together with a bespoke bioinformatics pipeline for identifying and quantifying the impacts of splice-altering variants. SeqSplice exhibits excellent reproducibility across cDNA input and PCR cycle differences and is able to identify and quantitate transcripts that differ by a single base. Of the 193 BRCA1 and 72 BRCA2 variants profiled, 89% (237/265) had no publicly available RNA splicing data. Complete or near complete impact owing to splice site gain/loss is observed for 42 variants, with 30 (71%) producing alternative transcripts owing to new or cryptic splice sites. These findings are used to update our aberration-type predictor, the SpliceAI-10k calculator, resulting in 94% specificity and 90% sensitivity for major alternative transcripts (>50% proportion). Comparison of SeqSplice findings for 28 variants with published data shows the value and limitations of using construct-based results for variant classification. Overall, our findings inform use of construct-derived data for clinical variant classification. We show that construct-derived results for variants showing low or no splicing impact provide reliable evidence against variant pathogenicity, whereas, for variants demonstrating splicing impact, construct design and naturally occurring alternative splicing are important considerations for assigning and weighting evidence towards pathogenicity.
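The sensitivity and specificity figures above can be read as standard confusion-matrix ratios, with a transcript counted as "major" when its observed proportion exceeds 50%. The following minimal sketch assumes that reading. The function name, the predictions, and the observed proportions are hypothetical and are not SeqSplice or SpliceAI-10k output.

```python
def sensitivity_specificity(predicted_major, observed_proportion, threshold=0.5):
    """Compare predicted major-transcript calls with observed isoform
    proportions; a transcript is 'major' when its proportion > threshold."""
    observed_major = [p > threshold for p in observed_proportion]
    pairs = list(zip(predicted_major, observed_major))
    tp = sum(1 for pred, obs in pairs if pred and obs)
    fn = sum(1 for pred, obs in pairs if not pred and obs)
    tn = sum(1 for pred, obs in pairs if not pred and not obs)
    fp = sum(1 for pred, obs in pairs if pred and not obs)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical predictor calls and assay proportions for six transcripts.
predicted = [True, True, False, False, False, True]
observed = [0.95, 0.60, 0.10, 0.00, 0.55, 0.30]
sensitivity, specificity = sensitivity_specificity(predicted, observed)

# Percentages quoted in the abstract.
n_variants = 193 + 72              # BRCA1 plus BRCA2 variants profiled
pct_no_public_data = 100 * 237 / n_variants
pct_new_or_cryptic = 100 * 30 / 42
```

In this toy set, two of three truly major transcripts are predicted (sensitivity 2/3) and two of three non-major transcripts are correctly excluded (specificity 2/3). The final lines recompute the 89% and 71% figures from the counts given in the abstract.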
Unified integration of spatial transcriptomics across platforms with LLOKI
Spatial transcriptomics (ST) has transformed our understanding of tissue architecture and cellular interactions, but integrating ST data across platforms remains challenging due to differences in gene panels, data sparsity, and technical variability. Here, we introduce LLOKI, a novel framework for integrating imaging-based ST data from diverse platforms without requiring shared gene panels. LLOKI addresses ST integration through two key alignment tasks: feature alignment across technologies and batch alignment across data sets. Optimal transport-guided feature propagation adjusts data sparsity to match scRNA-seq references through graph-based imputation, enabling single-cell foundation models such as scGPT to generate unified features. Batch alignment then refines scGPT-transformed embeddings, mitigating batch effects while preserving biological variability. Evaluations on mouse brain samples from five different technologies demonstrate that LLOKI outperforms existing methods and is effective for cross-technology spatial gene program identification and tissue slice alignment. Applying LLOKI to five ovarian cancer data sets, we identify an integrated gene program indicative of tumor-infiltrating T cells across gene panels. Together, LLOKI provides a robust foundation for cross-platform ST studies, with the potential to scale to large atlas data sets, enabling deeper insights into cellular organization and tissue environments.
A deconvolution framework that uses single-cell sequencing plus a small benchmark data set for accurate analysis of cell type ratios in complex tissue samples
Bulk deconvolution with single-cell/nucleus RNA-seq data is critical for understanding heterogeneity in complex biological samples, yet the technological discrepancy across sequencing platforms limits deconvolution accuracy. To address this, we utilize an experimental design to match inter-platform biological signals, hence revealing the technological discrepancy, and then develop a deconvolution framework called DeMixSC using these well-matched (benchmark) data. Built upon a novel weighted nonnegative least-squares framework, DeMixSC identifies and adjusts genes with high technological discrepancy and aligns the benchmark data with large patient cohorts of matched tissue type for large-scale deconvolution. Our results using two benchmark data sets of healthy retinas and ovarian cancer tissues suggest much-improved deconvolution accuracy. Leveraging tissue-specific benchmark data sets, we applied DeMixSC to a large cohort of 453 age-related macular degeneration patients and a cohort of 30 ovarian cancer patients with various responses to neoadjuvant chemotherapy. Only DeMixSC successfully unveiled biologically meaningful differences across patient groups, demonstrating its broad applicability in diverse real-world clinical scenarios. Our findings reveal the impact of technological discrepancy on deconvolution performance and underscore the importance of a well-matched data set to resolve this challenge. The developed DeMixSC framework is generally applicable for accurately deconvolving large cohorts of disease tissues, including cancers, when a well-matched benchmark data set is available.
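To make the idea of weighted nonnegative least squares concrete, the sketch below estimates cell type proportions from a bulk profile while down-weighting one gene assumed to carry high technological discrepancy. This is a generic illustration, not the DeMixSC implementation: the projected-gradient solver, the weights, the signature matrix, and all names are assumptions chosen for the example.

```python
def weighted_nnls(S, b, w, n_iter=500):
    """Minimize 0.5 * sum_g w[g] * (S[g] . x - b[g])**2 subject to x >= 0
    using projected gradient descent (a generic solver, chosen for brevity)."""
    n_genes, n_types = len(S), len(S[0])
    # Step size 1 / trace(S^T W S), which never exceeds 1 / (largest eigenvalue).
    trace = sum(w[g] * S[g][k] ** 2 for g in range(n_genes) for k in range(n_types))
    step = 1.0 / trace
    x = [0.0] * n_types
    for _ in range(n_iter):
        resid = [sum(S[g][k] * x[k] for k in range(n_types)) - b[g]
                 for g in range(n_genes)]
        grad = [sum(w[g] * S[g][k] * resid[g] for g in range(n_genes))
                for k in range(n_types)]
        # Gradient step followed by projection onto the nonnegative orthant.
        x = [max(0.0, x[k] - step * grad[k]) for k in range(n_types)]
    return x


def to_proportions(x):
    """Rescale nonnegative coefficients so that they sum to one."""
    total = sum(x)
    return [v / total for v in x]


# Hypothetical signature matrix: 4 genes (rows) x 2 cell types (columns).
signature = [[10.0, 1.0], [1.0, 8.0], [5.0, 5.0], [2.0, 9.0]]
# Bulk profile built from true proportions (0.3, 0.7); the third gene is
# distorted (9.0 instead of 5.0) to mimic a platform-specific discrepancy.
bulk = [3.7, 5.9, 9.0, 6.9]

unweighted = to_proportions(weighted_nnls(signature, bulk, [1.0, 1.0, 1.0, 1.0]))
weighted = to_proportions(weighted_nnls(signature, bulk, [1.0, 1.0, 0.01, 1.0]))
```

In this toy case, equal weights let the distorted gene pull the first proportion to roughly 0.35, whereas down-weighting that gene brings the estimate back close to the true 0.30. How DeMixSC actually selects and adjusts such genes is not specified in the abstract.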
De novo detection of somatic variants in high-quality long-read single-cell RNA sequencing data
In cancer, genetic and transcriptomic variations generate clonal heterogeneity, leading to treatment resistance. Long-read single-cell RNA sequencing (LR scRNA-seq) has the potential to detect genetic and transcriptomic variations simultaneously. Here, we present LongSom, a computational workflow leveraging high-quality LR scRNA-seq data to call de novo somatic single-nucleotide variants (SNVs), including in mitochondria (mtSNVs), copy number alterations (CNAs), and gene fusions, to reconstruct the tumor clonal heterogeneity. Before somatic variant calling, LongSom reannotates marker gene-based cell types using cell mutational profiles. LongSom distinguishes somatic SNVs from noise and germline polymorphisms by applying an extensive set of hard filters and statistical tests. Applying LongSom to human ovarian cancer samples, we detected clinically relevant somatic SNVs that were validated against matched DNA samples. Leveraging somatic SNVs and fusions, LongSom found subclones with different predicted treatment outcomes. In summary, LongSom enables de novo variant detection without the need for normal samples, facilitating the study of cancer evolution, clonal heterogeneity, and treatment resistance.
Estrogen-induced chromatin looping changes identify a subset of functional regulatory elements
Transcriptional enhancers can regulate individual or multiple genes through long-range three-dimensional (3D) genome interactions, and these interactions are commonly altered in cancer. Yet, the functional relationship between changes in 3D genome interactions associated with regulatory regions and differential gene expression appears context-dependent. In this study, we used HiChIP to capture changes in 3D genome interactions between active regulatory regions of endometrial cancer cells in response to estrogen treatment and uncovered significant differential long-range interactions strongly enriched for estrogen receptor alpha (ER, also known as ESR1)–bound sites (ERBSs). The ERBSs anchoring differential chromatin loops with either a gene's promoter or distal regions were correlated with larger transcriptional responses to estrogen compared with ERBSs not involved in differential 3D genome interactions. To functionally test this observation, CRISPR-based Enhancer-i was used to deactivate specific ERBSs, which revealed a wide range of effects on the transcriptional response to estrogen. However, these effects were only subtly, and not significantly, stronger for ERBSs in differential chromatin loops. In addition, we observed an enrichment of 3D genome interactions between the promoters of estrogen-upregulated genes and found that looped promoters can work together cooperatively. Overall, our work reveals that estrogen treatment causes large changes in 3D genome structure in endometrial cancer cells; however, these changes are not required for a regulatory region to contribute to an estrogen transcriptional response.
Rearrangements of viral and human genomes at human papillomavirus integration events and their allele-specific impacts on cancer genome regulation
Human papillomavirus (HPV) integration has been implicated in transforming HPV infection into cancer. To resolve genome dysregulation associated with HPV integration, we performed Oxford Nanopore Technologies long-read sequencing on 72 cervical cancer genomes from a Ugandan data set that was previously characterized using short-read sequencing. We find recurrent structural rearrangement patterns at HPV integration events, which we categorize as del(etion)-like, dup(lication)-like, translocation, multi-breakpoint, or repeat region integrations. Integrations involving amplified HPV–human concatemers, particularly multi-breakpoint events, frequently harbor heterogeneous forms and copy numbers of the viral genome. Transcriptionally active integrants are characterized by unmethylated regions in both the viral and human genomes downstream from the viral transcription start site, resulting in HPV–human fusion transcripts. In contrast, integrants without evidence of expression lack consistent methylation patterns. Furthermore, whereas transcriptional dysregulation is limited to genes within 200 kb of an HPV integrant, dysregulation of the human epigenome in the form of allelic differentially methylated regions affects megabase expanses of the genome, irrespective of the integrant's transcriptional status. By elucidating the structural, epigenetic, and allele-specific impacts of HPV integration, we provide insight into the role of integrated HPV in cervical cancer.
Cold Spring Harbor Laboratory
1088-9051