
Although deep learning-based fusion methods have made great progress, several challenges remain, such as low contrast, weak feature preservation, loss of global information, and poor color fidelity.
A multimodal medical image fusion method based on the Swin Transformer and self-supervised contrastive learning is proposed. The Swin Transformer exploits hierarchical attention mechanisms to model feature dependencies at different scales and to capture both global and local information effectively. Guided by four defined loss functions, self-supervised contrastive learning maximizes the similarity between positive samples and minimizes the similarity between positive and negative samples, so that the fused images remain closer to the source images.
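The paper does not spell out the loss in code form; the following is a minimal PyTorch sketch of a generic contrastive term of this kind, in which the embeddings `fused`, `positive`, and `negative`, the temperature `tau`, and the helper `contrastive_loss` are illustrative assumptions rather than the authors' implementation.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(fused, positive, negative, tau=0.07):
    """Illustrative contrastive term: pull the fused-image embedding toward the
    source-image (positive) embedding and push it away from the negative one.
    All tensors are (batch, dim) feature vectors; tau is a temperature."""
    fused = F.normalize(fused, dim=1)
    positive = F.normalize(positive, dim=1)
    negative = F.normalize(negative, dim=1)

    # Cosine similarities scaled by the temperature.
    sim_pos = torch.sum(fused * positive, dim=1) / tau
    sim_neg = torch.sum(fused * negative, dim=1) / tau

    # InfoNCE-style objective: the positive pair is treated as class 0,
    # so minimizing cross-entropy maximizes positive similarity and
    # minimizes negative similarity.
    logits = torch.stack([sim_pos, sim_neg], dim=1)
    labels = torch.zeros(fused.size(0), dtype=torch.long, device=fused.device)
    return F.cross_entropy(logits, labels)
```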
Compared with seven state-of-the-art methods, the proposed fusion method effectively handles darkness, brightness imbalance, edge artifacts, and pseudo-color distortion. For MRI-CT fusion, the mean SSIM, CC, STD, and QCB are increased by 11.29%, 3.09%, 20.4%, and 17.3%, respectively; for MRI-PET fusion, it achieves the highest value on all six objective indicators, with average increases of 10.96% in EN, 19.30% in QAB/F, 10.07% in SSIM, 4.40% in CC, 15.52% in STD, and 15.84% in QCB.
The experimental results show clear advantages in both subjective and objective evaluation. The proposed method maintains image brightness, detail sharpness, and edge information, and effectively integrates structural and functional information across modalities. Objective indicators such as SSIM, CC, STD, QAB/F, and QCB are significantly improved, especially for MRI-PET fusion, where all indicators reach the highest values. Overall, the method significantly enhances image detail and texture while preserving contrast and brightness.
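As a concrete illustration of how the reference-based indicators could be computed for a fused image against its two source images, the sketch below uses scikit-image for SSIM and NumPy for CC and STD; the function names and the averaging over the two source images are assumptions for illustration, not the evaluation protocol of the paper (QAB/F and QCB require dedicated fusion-quality implementations and are omitted here).

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

def correlation_coefficient(a, b):
    """Pearson correlation coefficient (CC) between two images."""
    a = a.astype(np.float64).ravel()
    b = b.astype(np.float64).ravel()
    return np.corrcoef(a, b)[0, 1]

def evaluate_fusion(fused, source_a, source_b):
    """Illustrative evaluation: SSIM and CC averaged over both source images,
    plus the standard deviation (STD) of the fused image as a contrast measure."""
    data_range = float(fused.max() - fused.min())
    ssim_score = 0.5 * (ssim(fused, source_a, data_range=data_range)
                        + ssim(fused, source_b, data_range=data_range))
    cc_score = 0.5 * (correlation_coefficient(fused, source_a)
                      + correlation_coefficient(fused, source_b))
    std_score = np.std(fused.astype(np.float64))
    return {"SSIM": ssim_score, "CC": cc_score, "STD": std_score}
```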