Journal
Measuring the accuracy of electronic health record-based phenotyping in the All of Us Research Program to optimize statistical power for genetic association testing
Abstract. Objective: Accurate phenotyping is an essential task for researchers utilizing electronic health record (EHR)-linked biobank programs like the All of Us Research Program to study human genetics. However, little guidance is available on how to select an EHR-based phenotyping procedure that maximizes downstream statistical power. This study aims to estimate accuracy of three phenotype definitions of ovarian, female breast, and colorectal cancers in All of Us (v7 release) and determine which is most likely to optimize downstream statistical power for genetic association testing. Materials and Methods: We used empirical carrier frequencies of deleterious variants in known risk genes to estimate the accuracy of each phenotype definition and compute statistical power after accounting for the probability of outcome misclassification. Results: We found that the choice of phenotype definition can have a substantial impact on statistical power for association testing and that no approach was optimal across all tested diseases. The impact on power was particularly acute for rarer diseases and target risk alleles of moderate penetrance or low frequency. Additionally, our results suggest that the accuracy of higher-complexity phenotyping algorithms is inconsistent across Black and non-Hispanic White participants in All of Us, highlighting the potential for case ascertainment biases to impact downstream association testing. Discussion: EHR-based phenotyping presents a bottleneck for maximizing power to detect novel risk alleles in All of Us, as well as a potential source of differential outcome misclassification that researchers should be aware of. We discuss the implications of this as well as potential mitigation strategies.
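As a rough illustration of why outcome misclassification erodes power in a carrier-frequency comparison, the following is a minimal Python sketch. It is not the authors' procedure. It assumes that a fraction `ppv` of labeled cases are true cases, that misclassified cases carry the variant at the control frequency, and that controls are correctly classified. Power comes from a plain two-sided two-proportion z-test under a normal approximation. All function names and example numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def observed_case_frequency(p_case, p_control, ppv):
    """Carrier frequency among labeled cases when only a fraction `ppv`
    of them are true cases (assumption: the rest look like controls)."""
    return ppv * p_case + (1.0 - ppv) * p_control


def two_proportion_power(p1, p2, n1, n2, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test."""
    z = NormalDist()
    z_crit = z.inv_cdf(1.0 - alpha / 2.0)
    # Pooled standard error under the null, unpooled under the alternative.
    p_bar = (n1 * p1 + n2 * p2) / (n1 + n2)
    se_null = sqrt(p_bar * (1.0 - p_bar) * (1.0 / n1 + 1.0 / n2))
    se_alt = sqrt(p1 * (1.0 - p1) / n1 + p2 * (1.0 - p2) / n2)
    delta = abs(p1 - p2)
    upper = z.cdf((delta - z_crit * se_null) / se_alt)
    lower = z.cdf((-delta - z_crit * se_null) / se_alt)
    return upper + lower


def power_under_misclassification(p_case, p_control, n_cases, n_controls,
                                  ppv, alpha=0.05):
    """Power after diluting the case carrier frequency by (1 - ppv)."""
    p_obs = observed_case_frequency(p_case, p_control, ppv)
    return two_proportion_power(p_obs, p_control, n_cases, n_controls, alpha)


# Hypothetical inputs: 5% carriers among true cases, 1% among controls.
for ppv in (1.0, 0.8, 0.6):
    power = power_under_misclassification(0.05, 0.01, 500, 5000, ppv)
    print(f"ppv={ppv:.1f} power={power:.3f}")
```

Under these assumptions, lowering `ppv` pulls the observed case frequency toward the control frequency, which shrinks the detectable difference. A fuller treatment would also model missed cases among controls and the trade-off between case count and accuracy.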
Mitigation of outcome conflation in predicting patient outcomes using electronic health records
Abstract. Objectives: Artificial intelligence (AI) models utilizing electronic health record data for disease prediction can enhance risk stratification but may lack specificity, which is crucial for reducing the economic and psychological burdens associated with false positives. This study aims to evaluate the impact of confounders on the specificity of single-outcome prediction models and assess the effectiveness of a multi-class architecture in mitigating outcome conflation. Materials and Methods: We evaluated a state-of-the-art model predicting pancreatic cancer from disease code sequences in an independent cohort of 2.3 million patients and compared this single-outcome model with a multi-class model designed to predict multiple cancer types simultaneously. Additionally, we conducted a clinical simulation experiment to investigate the impact of confounders on the specificity of single-outcome prediction models. Results: While we were able to independently validate the pancreatic cancer prediction model, we found that its prediction scores were also correlated with ovarian cancer, suggesting conflation of outcomes due to underlying confounders. Building on this observation, we demonstrate that the specificity of single-outcome prediction models is impaired by confounders using a clinical simulation experiment. Introducing a multi-class architecture improves specificity in predicting cancer types compared to the single-outcome model while preserving performance, mitigating the conflation of outcomes in both the real-world and simulated contexts. Discussion: Our results highlight the risk of outcome conflation in single-outcome AI prediction models and demonstrate the effectiveness of a multi-class approach in mitigating this issue. Conclusion: The number of predicted outcomes needs to be carefully considered when employing AI disease risk prediction models.
Electronic health records contain dispersed risk factor information that could be used to prevent breast and ovarian cancer
Abstract. Objective: The genetic testing for hereditary breast cancer that is most helpful in high-risk women is underused. Our objective was to quantify the risk factors for heritable breast and ovarian cancer contained in the electronic health record (EHR), to determine how many women meet national guidelines for referral to a cancer genetics professional but have no record of a referral. Methods and Materials: We reviewed EHR records of a random sample of women to determine the presence and location of risk-factor information meeting National Comprehensive Cancer Network (NCCN) guidelines for a further genetic risk evaluation for breast and/or ovarian cancer, and determine whether the women were referred for such an evaluation. Results: A thorough review of the EHR records of 299 women revealed that 24 (8%) met the NCCN criteria for referral for a further genetic risk evaluation; of these, 12 (50%) had no referral to a medical genetics clinic. Conclusions: Half of the women whose EHR records contain risk-factor information meeting the criteria for further genetic risk evaluation for heritable forms of breast and ovarian cancer were not referred.
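The percentages reported in the Results above follow directly from the stated counts. The short Python check below reproduces them; the variable names are illustrative only and the counts are taken from the abstract.

```python
# Counts as stated in the abstract.
women_reviewed = 299
met_nccn_criteria = 24
no_referral_recorded = 12

# Share of reviewed women meeting NCCN referral criteria.
share_meeting_criteria = met_nccn_criteria / women_reviewed
# Share of those women with no recorded referral.
share_without_referral = no_referral_recorded / met_nccn_criteria

print(f"Met criteria: {share_meeting_criteria:.0%}")
print(f"No referral among those: {share_without_referral:.0%}")
```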
Electronic health records-based algorithms to screen for U.S. Centers for Disease Control and Prevention tier 1 genetic diseases: a scoping review
Abstract. Objective: Missed diagnosis of genetic conditions is a persistent challenge in clinical care, particularly for familial hypercholesterolemia (FH), hereditary breast and ovarian cancer (HBOC), and Lynch syndrome, three conditions designated by the U.S. Centers for Disease Control and Prevention (CDC) as Tier 1 genomic applications. This scoping review summarizes evidence on the use of electronic health record (EHR)-based algorithms to identify individuals with these conditions. Materials and Methods: We conducted a scoping review using the JBI Manual for Evidence Synthesis and reported results according to PRISMA-ScR guidelines. We searched Ovid MEDLINE, Embase, and Web of Science through October 2024 for studies evaluating EHR-based algorithms to identify individuals with FH, HBOC, or Lynch syndrome. Eligible studies addressed (1) performance of algorithms in detecting clinically or genetically confirmed cases or (2) outcomes from the implementation of algorithms in unselected populations with follow-up to identify new diagnoses. Results: Of 598 articles screened, 22 met inclusion criteria. Most studies (20/22) focused on FH. Fourteen FH studies assessed algorithm performance, and 7 reported prospective implementation. FH algorithm performance varied widely (AUROC range 0.78-0.95), with machine learning models outperforming rule-based approaches. Implementation studies reported positive predictive values ranging from 11% to 67%. Only two studies addressed HBOC or Lynch syndrome, both using rule-based algorithms with limited sensitivity. Discussion: Machine learning models consistently outperform rule-based algorithms relying on clinical criteria, but limited evidence exists for HBOC and Lynch syndrome. Conclusions: Early identification of CDC Tier 1 genetic conditions through EHR-based screening algorithms holds promise but will require both technical and implementation advances to realize improved patient care and outcomes.
Oxford University Press (OUP)
1067-5027