
Although deep learning-based fusion methods have made great progress, several challenges remain, such as low contrast, weak feature preservation, loss of global information, and poor color fidelity.
A multimodal medical image fusion method based on the Swin Transformer and self-supervised contrastive learning is proposed. The Swin Transformer exploits hierarchical attention mechanisms to model feature dependencies at different scales and to capture both global and local information effectively. Guided by four defined loss functions, self-supervised contrastive learning maximizes the similarity between positive samples and minimizes the similarity between positive and negative samples, so that the fused images remain closer to the source images.
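The paper does not spell out the loss in code form; the following is a minimal PyTorch sketch of a generic contrastive term of this kind, in which the embeddings `fused`, `positive`, and `negative`, the temperature `tau`, and the helper `contrastive_loss` are illustrative assumptions rather than the authors' implementation.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(fused, positive, negative, tau=0.07):
    """Illustrative contrastive term: pull the fused-image embedding toward the
    source-image (positive) embedding and push it away from the negative one.
    All tensors are (batch, dim) feature vectors; tau is a temperature."""
    fused = F.normalize(fused, dim=1)
    positive = F.normalize(positive, dim=1)
    negative = F.normalize(negative, dim=1)

    # Cosine similarities scaled by the temperature.
    sim_pos = torch.sum(fused * positive, dim=1) / tau
    sim_neg = torch.sum(fused * negative, dim=1) / tau

    # InfoNCE-style objective: the positive pair is treated as class 0,
    # so minimizing cross-entropy maximizes positive similarity and
    # minimizes negative similarity.
    logits = torch.stack([sim_pos, sim_neg], dim=1)
    labels = torch.zeros(fused.size(0), dtype=torch.long, device=fused.device)
    return F.cross_entropy(logits, labels)
```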
Compared with seven state-of-the-art methods, the proposed fusion method effectively handles darkness, brightness imbalance, edge artifacts, and pseudo-color distortion. For MRI-CT fusion, the mean SSIM, CC, STD, and QCB are increased by 11.29%, 3.09%, 20.4%, and 17.3%, respectively; for MRI-PET fusion, it achieves the highest value on all six objective indicators, with average increases of 10.96% in EN, 19.30% in QAB/F, 10.07% in SSIM, 4.40% in CC, 15.52% in STD, and 15.84% in QCB.
The experimental results show clear advantages in both subjective and objective evaluation. The proposed method maintains image brightness, detail sharpness, and edge information, and effectively integrates structural and functional information across modalities. Objective indicators such as SSIM, CC, STD, QAB/F, and QCB are significantly improved, especially for MRI-PET fusion, where all indicators reach the highest values. Overall, the method significantly enhances image detail and texture while preserving contrast and brightness.
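As a concrete illustration of how the reference-based indicators could be computed for a fused image against its two source images, the sketch below uses scikit-image for SSIM and NumPy for CC and STD; the function names and the averaging over the two source images are assumptions for illustration, not the evaluation protocol of the paper (QAB/F and QCB require dedicated fusion-quality implementations and are omitted here).

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

def correlation_coefficient(a, b):
    """Pearson correlation coefficient (CC) between two images."""
    a = a.astype(np.float64).ravel()
    b = b.astype(np.float64).ravel()
    return np.corrcoef(a, b)[0, 1]

def evaluate_fusion(fused, source_a, source_b):
    """Illustrative evaluation: SSIM and CC averaged over both source images,
    plus the standard deviation (STD) of the fused image as a contrast measure."""
    data_range = float(fused.max() - fused.min())
    ssim_score = 0.5 * (ssim(fused, source_a, data_range=data_range)
                        + ssim(fused, source_b, data_range=data_range))
    cc_score = 0.5 * (correlation_coefficient(fused, source_a)
                      + correlation_coefficient(fused, source_b))
    std_score = np.std(fused.astype(np.float64))
    return {"SSIM": ssim_score, "CC": cc_score, "STD": std_score}
```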