Simpler predictive models provide higher accuracy for ovarian cancer detection

doi:10.7717/peerj.20525

Ovarian cancer remains a danger to women’s health, and accurate screening tests would likely increase survival. Two established protein biomarkers, CA125 and HE4, have been shown to work well in isolation but achieve even higher accuracy when combined using logistic regression (LR). This LR-based combination of protein concentrations distinguishes healthy samples from cancer samples with high accuracy (area under the curve (AUC) = 0.99). The dataset we use was obtained from a previous publication that described DELFI-Pro, an LR model combining features derived from cell-free DNA (cfDNA) with the two proteins’ concentrations. We show that many of DELFI-Pro’s cfDNA features are affected by confounding technical variation within the training data, which impacts the previously reported results. A minority of the training data’s cancer samples (42 of 94) have chromosomal copy number values that are markedly different from those of the other samples used to evaluate the DELFI-Pro screening model. After removing those 42 samples from the training data, we find that, even in cross validation, DELFI-Pro does not outperform CA125 or CA125+HE4 protein-only screening classifiers, including a two-protein model published alongside the DELFI-Pro model. We conclude that DELFI-Pro does not adequately justify the inclusion of its cfDNA features. Our results are in line with the principle that simpler machine learning models tend to generalize better to new data.
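
As a rough illustration of the kind of two-protein LR classifier and AUC evaluation the abstract describes, the following is a minimal sketch in plain Python. It is not the authors' code or data: the samples are synthetic, the two features are assumed to be standardized log-concentrations of CA125 and HE4, and the helper names (`auc`, `fit_logistic`, `predict`, `sample`) as well as the group means, learning rate, and train/test split are all hypothetical choices made for the example.

```python
import math
import random

def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(w, X):
    """Predicted cancer probability for each row; w = [bias, w_ca125, w_he4]."""
    return [sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) for xi in X]

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit logistic regression weights by batch gradient descent on the log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi, pi in zip(X, y, predict(w, X)):
            err = pi - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

# Synthetic stand-in data (assumption): two features per sample, playing the
# role of standardized log CA125 and log HE4. Label 0 = healthy, 1 = cancer.
rng = random.Random(0)

def sample(n, mean_ca125, mean_he4, label):
    return [([rng.gauss(mean_ca125, 1.0), rng.gauss(mean_he4, 1.0)], label)
            for _ in range(n)]

data = sample(100, 0.0, 0.0, 0) + sample(100, 2.0, 1.5, 1)
rng.shuffle(data)
train, test = data[:150], data[150:]
X_train, y_train = [x for x, _ in train], [y for _, y in train]
X_test, y_test = [x for x, _ in test], [y for _, y in test]

# Fit the two-protein model and compare held-out AUCs.
w = fit_logistic(X_train, y_train)
auc_combined = auc(predict(w, X_test), y_test)
auc_ca125_only = auc([x[0] for x in X_test], y_test)
auc_he4_only = auc([x[1] for x in X_test], y_test)

print(f"CA125 only: {auc_ca125_only:.3f}")
print(f"HE4 only:   {auc_he4_only:.3f}")
print(f"CA125+HE4:  {auc_combined:.3f}")
```

The sketch uses a single held-out split for brevity; the abstract refers to cross validation, which would repeat the fit and evaluation over several folds. The AUC values printed here reflect only the synthetic data and say nothing about the reported result of 0.99.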
