Machine learning prediction of germline BRCA1/2 pathogenic variants in patients with ovarian cancer

Giovanni Innella

doi:10.1136/bmjhci-2025-101751

Objectives

To assess the performance of machine learning (ML) algorithms to predict the presence of germline BRCA1/2 pathogenic variants in ovarian cancer (OC) patients based on clinical–pathological features.

Methods

Clinical–pathological features of 648 patients with OC tested for BRCA1/2 were analysed using three supervised ML algorithms: random forest, boosting and support vector machine.

Results

In the ‘test’ sample, boosting proved to be the most effective algorithm (accuracy: 84.5%; precision: 80.0%; recall: 3.1%; area under the curve (AUC): 78.8%), followed by support vector machine (accuracy: 81.4%; precision: 72.7%; recall: 27.6%; AUC: 62.3%) and random forest (accuracy: 74.4%; precision: 55.6%; recall: 14.7%; AUC: 71.3%). In the ‘validation’ sample, accuracy was 79.8% for boosting, 81.7% for support vector machine and 80.8% for random forest.
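For readers less familiar with the metrics reported above, the following is a minimal sketch of how accuracy, precision and recall are derived from a binary confusion matrix. The counts are invented purely for illustration and are not taken from the study; the class and function names are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for a binary classifier (positive = predicted carrier)."""
    tp: int  # carriers correctly predicted as carriers
    fp: int  # non-carriers wrongly predicted as carriers
    fn: int  # carriers missed by the model
    tn: int  # non-carriers correctly predicted as non-carriers


def accuracy(c: ConfusionCounts) -> float:
    """Share of all predictions that are correct."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)


def precision(c: ConfusionCounts) -> float:
    """Share of predicted carriers that are true carriers."""
    predicted_positive = c.tp + c.fp
    return c.tp / predicted_positive if predicted_positive else 0.0


def recall(c: ConfusionCounts) -> float:
    """Share of true carriers that the model detects."""
    actual_positive = c.tp + c.fn
    return c.tp / actual_positive if actual_positive else 0.0


# Hypothetical counts (NOT study data): 100 patients, 30 of them carriers.
example = ConfusionCounts(tp=8, fp=2, fn=22, tn=68)

print(f"accuracy:  {accuracy(example):.3f}")
print(f"precision: {precision(example):.3f}")
print(f"recall:    {recall(example):.3f}")
```

With imbalanced classes, a model can reach high accuracy and precision while still missing most true carriers, which is why recall is worth reading alongside the other figures.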

In the most effective algorithm (boosting), family history of OC showed the highest relative influence (52.9), followed by histotype (19.5), personal history of breast cancer (BC) (17.1), age at diagnosis (8.4) and family history of BC (2.2), while International Federation of Gynecology and Obstetrics (FIGO) stage had no influence.
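The relative influence values above can be ranked and summed with a few lines of code. The sketch below uses only the figures reported in this abstract and assumes they lie on a scale normalised to roughly 100 (the abstract does not state this); the variable and function names are hypothetical.

```python
# Relative influence values as reported for the boosting model.
relative_influence = {
    "family history of OC": 52.9,
    "histotype": 19.5,
    "personal history of BC": 17.1,
    "age at diagnosis": 8.4,
    "family history of BC": 2.2,
    "FIGO stage": 0.0,  # reported as having no influence
}

# Rank features from most to least influential.
ranked = sorted(relative_influence.items(), key=lambda kv: kv[1], reverse=True)

# Assumption: values are on a ~100 scale, so the total should be close to 100.
total = sum(relative_influence.values())


def cumulative_share(top_n: int) -> float:
    """Fraction of the total influence carried by the top_n features."""
    return sum(value for _, value in ranked[:top_n]) / total


for name, value in ranked:
    print(f"{name:<24} {value:5.1f}")
print(f"total: {total:.1f}")
print(f"share of top two features: {cumulative_share(2):.3f}")
```

Read this way, the two leading features (family history of OC and histotype) carry most of the model's total influence.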

Discussion

We identified the predictive algorithm that best estimates the a priori likelihood of being a carrier of germline BRCA1/2 pathogenic variants in patients with OC. These findings support a role for ML approaches in predicting BRCA1/2 status in patients with OC, but accuracy and precision are still suboptimal for clinical use, suggesting the need for additional research.

Conclusions

Results support the selection of relevant clinical features for predictive purposes, which could have significant implications for the clinical management of patients with OC.
