MBLEformer: Multi-Scale Bidirectional Lesion Enhancement Transformer for Cervical Cancer Image Segmentation

Shuhui Li

Background:

Accurate segmentation of lesion areas from Lugol's Iodine Staining images is crucial for screening pre-cancerous cervical lesions. However, in underdeveloped regions lacking skilled clinicians, this method may lead to misdiagnosis and missed diagnoses. In recent years, deep learning methods have been widely applied to assist in medical image segmentation.

Objective:

This study aims to improve the accuracy of cervical cancer lesion segmentation by addressing the limitations of Convolutional Neural Networks (CNNs) and attention mechanisms in capturing global features and refining upsampling details.

Methods:

This paper presents a Multi-Scale Bidirectional Lesion Enhancement Network, named MBLEformer, which employs the Swin Transformer encoder to extract image features at multiple stages and utilizes a multi-scale attention mechanism to capture semantic features from different perspectives. Additionally, a bidirectional lesion enhancement upsampling strategy is introduced to refine the edge details of lesion areas.

Results:

Experimental results demonstrate that the proposed model exhibits superior segmentation performance on a proprietary cervical cancer colposcopic dataset, outperforming other medical image segmentation methods, with a mean Intersection over Union (mIoU) of 82.5%, accuracy, and specificity of 94.9% and 83.6%.

Conclusion:

MBLEformer significantly improves the accuracy of lesion segmentation in iodine-stained cervical cancer images, with the potential to enhance the efficiency and accuracy of pre-cancerous lesion diagnosis and help address the issue of imbalanced medical resources.