Investigator

Pei-Chen Peng

Assistant Professor · Cedars-Sinai Medical Center, Computational Biomedicine

Papers (4)
Measuring the accurac…
Rare germline genetic…
Characterizing somati…
Ovarian Cancer Risk V…
Collaborators (10)
Paul D P Pharoah, Jonathan Tyrer, Suzana A M Ezquina, Dennis Hazelett, Ed Dicks, Rosario I Corona, Kate Lawrenson, Michelle Jones, James D. Brenton, Simon A. Gayther
Institutions (3)
Cedars Sinai Medical …
University Of Cambrid…
University Of Califor…

Papers

Measuring the accuracy of electronic health record-based phenotyping in the All of Us Research Program to optimize statistical power for genetic association testing

Abstract

Objective: Accurate phenotyping is an essential task for researchers utilizing electronic health record (EHR)-linked biobank programs like the All of Us Research Program to study human genetics. However, little guidance is available on how to select an EHR-based phenotyping procedure that maximizes downstream statistical power. This study aims to estimate accuracy of three phenotype definitions of ovarian, female breast, and colorectal cancers in All of Us (v7 release) and determine which is most likely to optimize downstream statistical power for genetic association testing.

Materials and Methods: We used empirical carrier frequencies of deleterious variants in known risk genes to estimate the accuracy of each phenotype definition and compute statistical power after accounting for the probability of outcome misclassification.

Results: We found that the choice of phenotype definition can have a substantial impact on statistical power for association testing and that no approach was optimal across all tested diseases. The impact on power was particularly acute for rarer diseases and target risk alleles of moderate penetrance or low frequency. Additionally, our results suggest that the accuracy of higher-complexity phenotyping algorithms is inconsistent across Black and non-Hispanic White participants in All of Us, highlighting the potential for case ascertainment biases to impact downstream association testing.

Discussion: EHR-based phenotyping presents a bottleneck for maximizing power to detect novel risk alleles in All of Us, as well as a potential source of differential outcome misclassification that researchers should be aware of. We discuss the implications of this as well as potential mitigation strategies.

Rare germline genetic variation in PAX8 transcription factor binding sites and susceptibility to epithelial ovarian cancer

Abstract Common genetic variation throughout the genome and rare coding variants identified to date explain about half of the inherited genetic component of epithelial ovarian cancer risk. It is likely that rare variation in the noncoding genome will explain some of the unexplained heritability, but identifying such variants is challenging. The primary problem is a lack of statistical power to identify individual risk variants by association, as power is a function of sample size, effect size, and allele frequency. Power can be increased by using burden tests, which test for the association of carriers of any variant in a specified genomic region. This has the effect of increasing the putative effect allele frequency. PAX8 is a transcription factor that plays a critical role in tumor progression, migration, and invasion. Furthermore, regulatory elements proximal to target genes of PAX8 are enriched for common ovarian cancer risk variants. We hypothesized that rare variation in PAX8 binding sites is also associated with ovarian cancer risk but unlikely to be associated with risk of breast, colorectal, or endometrial cancer. We have used publicly available, whole-genome sequencing data from the UK 100,000 Genomes Project to evaluate the burden of rare variation in PAX8 binding sites across the genome. Data were available for 522 ovarian cancers, 2984 breast cancers, 2696 colorectal cancers, 836 endometrial cancers, and 2253 noncancer controls. Active binding sites were defined using data from multiple PAX8 and H3K27 chromatin immunoprecipitation sequencing experiments. We found no association between the burden of rare variation in PAX8 binding sites (defined in several ways) and risk of ovarian, breast, or endometrial cancer. An apparent association with colorectal cancer was likely to be a technical artifact as a similar association was also detected for rare variation in random regions of the genome. 
Despite the null result, this study provides a proof-of-principle for using burden testing to identify rare, noncoding germline genetic variation associated with disease. Larger sample sizes available from large-scale sequencing projects, together with improved understanding of the function of the noncoding genome, will increase the potential of similar studies in the future.
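
As an illustration of the burden-testing idea described in this abstract, the following is a minimal sketch and not the study's actual pipeline. It collapses per-site rare-variant genotypes into a single carrier indicator per individual and compares carrier counts between cases and controls. The function names, the toy genotypes, and the choice of a one-sided Fisher exact test are all assumptions made for illustration; the abstract does not state which test statistic was used.

```python
from math import comb

def collapse_to_carriers(genotypes):
    """Collapse per-site alternate-allele counts into one carrier flag per person.

    genotypes: list of individuals, each a list of alt-allele counts per site
    (hypothetical toy format). A person is a carrier if any site is non-zero.
    """
    return [any(count > 0 for count in person) for person in genotypes]

def fisher_one_sided(case_carriers, case_total, control_carriers, control_total):
    """One-sided Fisher exact p-value for carrier enrichment in cases.

    Sums hypergeometric probabilities of seeing at least `case_carriers`
    carriers among the cases, given the table margins.
    """
    total = case_total + control_total
    carriers = case_carriers + control_carriers
    max_k = min(carriers, case_total)
    numerator = 0
    for k in range(case_carriers, max_k + 1):
        numerator += comb(carriers, k) * comb(total - carriers, case_total - k)
    return numerator / comb(total, case_total)

# Toy data (invented): 4 cases and 4 controls, 3 rare-variant sites in a region.
cases = [[0, 1, 0], [0, 0, 0], [1, 0, 0], [0, 0, 1]]
controls = [[0, 0, 0], [0, 0, 0], [0, 1, 0], [0, 0, 0]]

case_flags = collapse_to_carriers(cases)
control_flags = collapse_to_carriers(controls)

# No single site has more than one case carrier, but the region has three.
p_value = fisher_one_sided(sum(case_flags), len(cases),
                           sum(control_flags), len(controls))
print(sum(case_flags), sum(control_flags), round(p_value, 4))
```

In this toy example each individual variant is seen in at most one case, so a per-variant test has almost nothing to work with, while the collapsed region-level indicator pools three case carriers. That pooling is what the abstract means by increasing the putative effect allele frequency.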

Characterizing somatic mutations in ovarian cancer germline risk regions

Abstract Epithelial ovarian cancer (EOC) genetics research has been focused on germline or somatic mutations independently. Emerging evidence suggests that the somatic mutational landscape can be shaped by the germline genetic background. In this study, we aim to unravel the role of somatic alterations within EOC germline susceptibility regions by incorporating functional annotations. We investigate somatic events, including mutational signatures, point mutations, copy number alterations, and transcription factor binding disruptions, within 33 EOC germline susceptibility regions. Our analysis identifies significant associations between candidate germline susceptibility genes and somatic mutational signatures known to be key risk factors for EOC, such as mismatch repair deficiency, age-related mutagenesis, and homologous recombination deficiency. In addition, we find somatic point mutations and copy number alterations are significantly enriched in histotype-specific active enhancers and promoters within EOC risk loci. Furthermore, we examine the impact of germline variants and somatic mutations on transcription factor binding sites, identifying cancer developmental transcription factor motifs frequently affected by both types of mutations. Overall, our study highlights the importance of integrating germline and somatic mutations with regulatory and epigenomic data to gain insights into the genetic basis of EOC.

18 Works
4 Papers
10 Collaborators
Ovarian Neoplasms; Carcinoma, Ovarian Epithelial; Colorectal Neoplasms; Breast Neoplasms

Positions

2023–

Assistant Professor

Cedars-Sinai Medical Center · Computational Biomedicine

2019–

Postdoctoral Scientist

Cedars-Sinai Medical Center · Biomedical Sciences

Education

2018

Ph.D.

University of Illinois at Urbana-Champaign · Computer Science

2013

M.S.

National Taiwan University · Computer Science

2011

B.S.

National Taiwan University · Computer Science

Country

US

Keywords
Cancer Genomics & Genetics; Computational Biology