Optimized cancer gene selection using armadillo optimization algorithm and support vector machine

Mohd Asif Shah · 2025-10-25

Feature selection is a crucial step in improving classification performance on high-dimensional cancer datasets, which often contain many irrelevant or redundant features that reduce model accuracy and increase computational load. The proposed hybrid method combines the armadillo optimization algorithm (AOA) with a support vector machine (SVM) and reduces the gene pool to a small subset of highly informative features by exploiting AOA's capacity for efficient local optimization and diversity maintenance. Within the AOA framework, gene selection is refined through local optimization within smaller subgroups, followed by a shuffling phase that preserves solution diversity. This dual-phase strategy identifies key genes that best distinguish cancerous from healthy tissue. The selected gene subsets are then classified using SVM. The approach was evaluated on three cancer datasets: leukaemia (AML, ALL), ovarian, and CNS. On the ovarian dataset, the model achieved 99.12 % accuracy and an AUC-ROC score of 98.83 % with only 15 selected genes. On the leukaemia dataset, it achieved perfect classification, with 100 % accuracy and AUC-ROC, using 34 genes. On the CNS dataset, it maintained 100 % accuracy using 43 genes. These results support the effectiveness of the AOA-SVM method in producing compact, informative gene subsets that enable high-precision classification. The AOA-SVM hybrid is an accurate and computationally efficient tool for cancer diagnostics, with potential for application in precision medicine through the identification of minimal, biologically relevant gene markers.
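
The dual-phase idea described above (local optimization within subgroups, then shuffling) can be illustrated with a minimal sketch over binary gene masks. This is not the authors' AOA: the abstract does not give the update equations, so the move rule (mix the subgroup's worst member with its best, then flip one bit), the function names `select_features` and `toy_fitness`, and all parameter values are assumptions made for illustration. The fitness function is pluggable; a toy one stands in here for the classifier-based score.

```python
import random


def select_features(n_features, fitness, pop_size=12, n_groups=3,
                    local_steps=5, n_rounds=20, seed=0):
    """Search binary feature masks by subgroup local search plus shuffling.

    fitness(mask) -> float, higher is better. Hypothetical sketch only;
    assumes n_groups <= pop_size so that no subgroup is empty.
    """
    rng = random.Random(seed)
    # Random initial population of binary masks (1 = gene kept).
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    scores = [fitness(mask) for mask in population]

    for _ in range(n_rounds):
        # Shuffling phase: reassign individuals to subgroups at random.
        order = list(range(pop_size))
        rng.shuffle(order)
        groups = [order[g::n_groups] for g in range(n_groups)]

        # Local phase: improve the worst member of each subgroup.
        for group in groups:
            for _ in range(local_steps):
                best = max(group, key=lambda i: scores[i])
                worst = min(group, key=lambda i: scores[i])
                # Assumed move rule: take each bit from the subgroup best
                # with probability 0.5, then flip one random bit.
                candidate = [b if rng.random() < 0.5 else w
                             for b, w in zip(population[best],
                                             population[worst])]
                j = rng.randrange(n_features)
                candidate[j] = 1 - candidate[j]
                cand_score = fitness(candidate)
                # Keep the candidate only if it beats the member it replaces.
                if cand_score > scores[worst]:
                    population[worst] = candidate
                    scores[worst] = cand_score

    top = max(range(pop_size), key=lambda i: scores[i])
    return population[top], scores[top]


def toy_fitness(mask, informative=frozenset({0, 3, 7})):
    """Toy stand-in: reward informative genes, penalize subset size."""
    hits = sum(mask[i] for i in informative)
    return hits - 0.1 * sum(mask)


if __name__ == "__main__":
    mask, score = select_features(10, toy_fitness, seed=1)
    print("selected genes:", [i for i, bit in enumerate(mask) if bit])
    print("fitness:", score)
```

In a setting like the one described, `toy_fitness` would presumably be replaced by a score derived from an SVM trained on the selected genes (for example, cross-validated accuracy with a penalty on subset size); that choice is likewise an assumption, not something the abstract specifies.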