Machine learning prediction of germline BRCA1/2 pathogenic variants in patients with ovarian cancer

Giovanni Innella · 2025-12-31

Objectives

To assess the performance of machine learning (ML) algorithms in predicting the presence of germline BRCA1/2 pathogenic variants in patients with ovarian cancer (OC) based on clinical–pathological features.

Methods

Clinical–pathological features of 648 patients with OC tested for BRCA1/2 were analysed using three supervised ML algorithms: random forest, boosting and support vector machine.

Results

In the ‘test’ sample, boosting proved to be the most effective algorithm (accuracy: 84.5%; precision: 80.0%; recall: 3.1%; area under the curve (AUC): 78.8%), followed by support vector machine (accuracy: 81.4%; precision: 72.7%; recall: 27.6%; AUC: 62.3%) and random forest (accuracy: 74.4%; precision: 55.6%; recall: 14.7%; AUC: 71.3%). In the ‘validation’ sample, accuracy was 79.8% for boosting, 81.7% for support vector machine and 80.8% for random forest.
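
For readers less familiar with these metrics, the following minimal sketch shows how accuracy, precision and recall are derived from confusion-matrix counts, and why a classifier can score well on accuracy while detecting few carriers when carriers are the minority class. The function name and the counts are hypothetical illustrations; they are not the study's data or code.

```python
import math


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Return accuracy, precision and recall from confusion-matrix counts.

    tp/fn: carriers predicted as carriers / as non-carriers
    fp/tn: non-carriers predicted as carriers / as non-carriers
    A metric with a zero denominator is returned as NaN.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else math.nan,
        "precision": tp / (tp + fp) if (tp + fp) else math.nan,
        "recall": tp / (tp + fn) if (tp + fn) else math.nan,
    }


# Hypothetical counts (NOT the study's data): 28 carriers among 129 patients.
example = classification_metrics(tp=4, fp=1, tn=100, fn=24)
for name, value in example.items():
    print(f"{name}: {value:.1%}")
```

With these illustrative counts, accuracy and precision are both around 80% even though only 4 of the 28 hypothetical carriers are identified, which is why recall should be read alongside the other two metrics.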

In the most effective algorithm (boosting), family history of OC showed the highest relative influence (52.9), followed by histotype (19.5), personal history of breast cancer (BC) (17.1), age at diagnosis (8.4) and family history of BC (2.2), while International Federation of Gynecology and Obstetrics (FIGO) stage had no influence.

Discussion

We identified the predictive algorithm that best estimates the a priori likelihood of being a carrier of germline BRCA1/2 pathogenic variants in patients with OC. These findings support a role for ML approaches in predicting BRCA1/2 status in patients with OC, but accuracy and precision are still suboptimal for clinical use, suggesting the need for additional research.

Conclusions

Results support the selection of relevant clinical features for predictive purposes, which could have significant implications for the clinical management of patients with OC.