MDA-Unet: A Multi-Scale Dilated Attention U-Net for Medical Image Segmentation

Alyaa Amer; Tryphon Lambrou; Xujiong Ye

doi:10.3390/app12073676

Alyaa Amer^* (Corresponding Author), Tryphon Lambrou, Xujiong Ye^* (Corresponding Author)

^*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

20 Citations (Scopus)

5 Downloads (Pure)

Abstract

Recent advances in deep learning have significantly improved medical image segmentation. Encoder–decoder networks such as U-Net have addressed some of the challenges in this task with outstanding performance, which has made them the dominant deep learning architecture in this domain. Despite this performance, we argue that they still have shortcomings. First, U-Net’s skip connections combine incompatible features: there is a semantic gap between the lightly processed encoder features and the highly processed decoder features, which adversely affects the final prediction. Second, U-Net does not capture multi-scale context information and ignores the contribution of all semantic information through the segmentation process. We therefore propose MDA-Unet, a novel multi-scale deep learning segmentation model. MDA-Unet improves upon U-Net and enhances its performance in segmenting medical images in which the region of interest varies in shape and size. The model integrates a multi-scale spatial attention module, in which spatial attention maps are derived from a hybrid hierarchical dilated convolution module that captures multi-scale context information. To ease training and reduce the vanishing gradient problem, residual blocks replace the basic U-Net blocks. Through a channel attention mechanism, the high-level decoder features guide the low-level encoder features to promote the selection of meaningful context information, thus ensuring effective fusion. We evaluated our model on two datasets, each with its own challenges: a lung dataset of 2628 axial CT images and an echocardiographic dataset of 2000 images. Our model achieves a significant gain in performance over the basic U-Net with only a slight increase in the number of trainable parameters, reaching a Dice score of 98.3% on the lung dataset and 96.7% on the echocardiographic dataset, compared with 94.2% and 93.9%, respectively, for the basic U-Net.
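The abstract reports results as Dice scores without stating the formula. As a rough illustration only, the sketch below computes the standard Dice coefficient, 2|A∩B| / (|A| + |B|), for two binary masks given as nested lists. The function name `dice_score` and the convention of returning 1.0 when both masks are empty are assumptions made for this example; this is not the authors' evaluation code.

```python
from typing import Sequence


def dice_score(pred: Sequence[Sequence[int]], target: Sequence[Sequence[int]]) -> float:
    """Dice coefficient 2|A∩B| / (|A| + |B|) for two binary masks of equal shape."""
    if len(pred) != len(target) or any(len(p) != len(t) for p, t in zip(pred, target)):
        raise ValueError("masks must have the same shape")

    intersection = 0  # pixels set in both masks
    pred_area = 0     # pixels set in the predicted mask
    target_area = 0   # pixels set in the reference mask
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            p, t = bool(p), bool(t)
            intersection += p and t
            pred_area += p
            target_area += t

    if pred_area + target_area == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / (pred_area + target_area)


# Hypothetical 2x3 masks: each has 3 foreground pixels, 2 of which overlap.
pred = [[0, 1, 1],
        [0, 1, 0]]
target = [[0, 1, 0],
          [0, 1, 1]]
print(f"Dice: {dice_score(pred, target):.3f}")
```

On this scale, the reported 98.3% corresponds to a coefficient of 0.983 averaged over the test images, assuming the usual per-image averaging.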

Original language	English
Article number	3676
Number of pages	18
Journal	Applied Sciences
Volume	12
Issue number	7
Early online date	6 Apr 2022
DOIs	https://doi.org/10.3390/app12073676
Publication status	Published - 6 Apr 2022

Keywords

deep learning
U-Net
medical images
segmentation
computed tomography
echocardiography

Access to Document

10.3390/app12073676 (Licence: CC BY)

Amer_AS_MDA_Unet_A_VoR
Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Final published version, 4.03 MB (Licence: CC BY)

Cite this

@article{0c59e98e77a043b794f959fc9ce004b7,
  title = "MDA-Unet: A Multi-Scale Dilated Attention U-Net for Medical Image Segmentation",
  abstract = "The advanced development of deep learning methods has recently made significant improvements in medical image segmentation. Encoder–decoder networks, such as U-Net, have addressed some of the challenges in medical image segmentation with an outstanding performance, which has promoted them to be the most dominating deep learning architecture in this domain. Despite their outstanding performance, we argue that they still lack some aspects. First, there is incompatibility in U-Net{\textquoteright}s skip connection between the encoder and decoder features due to the semantic gap between low-processed encoder features and highly processed decoder features, which adversely affects the final prediction. Second, it lacks capturing multi-scale context information and ignores the contribution of all semantic information through the segmentation process. Therefore, we propose a model named MDA-Unet, a novel multi-scale deep learning segmentation model. MDA-Unet improves upon U-Net and enhances its performance in segmenting medical images with variability in the shape and size of the region of interest. The model is integrated with a multi-scale spatial attention module, where spatial attention maps are derived from a hybrid hierarchical dilated convolution module that captures multi-scale context information. To ease the training process and reduce the gradient vanishing problem, residual blocks are deployed instead of the basic U-net blocks. Through a channel attention mechanism, the high-level decoder features are used to guide the low-level encoder features to promote the selection of meaningful context information, thus ensuring effective fusion. We evaluated our model on 2 different datasets: a lung dataset of 2628 axial CT images and an echocardiographic dataset of 2000 images, each with its own challenges. Our model has achieved a significant gain in performance with a slight increase in the number of trainable parameters in comparison with the basic U-Net model, providing a dice score of 98.3\% on the lung dataset and 96.7\% on the echocardiographic dataset, where the basic U-Net has achieved 94.2\% on the lung dataset and 93.9\% on the echocardiographic dataset.",
  keywords = "deep learning, U-Net, medical images, segmentation, computed tomography, echocardiography",
  author = "Alyaa Amer and Tryphon Lambrou and Xujiong Ye",
  year = "2022",
  month = apr,
  day = "6",
  doi = "10.3390/app12073676",
  language = "English",
  volume = "12",
  journal = "Applied Sciences",
  issn = "2076-3417",
  publisher = "MDPI AG",
  number = "7",
}

TY  - JOUR
T1  - MDA-Unet
T2  - A Multi-Scale Dilated Attention U-Net for Medical Image Segmentation
AU  - Amer, Alyaa
AU  - Lambrou, Tryphon
AU  - Ye, Xujiong
PY  - 2022/4/6
Y1  - 2022/4/6
AB  - The advanced development of deep learning methods has recently made significant improvements in medical image segmentation. Encoder–decoder networks, such as U-Net, have addressed some of the challenges in medical image segmentation with an outstanding performance, which has promoted them to be the most dominating deep learning architecture in this domain. Despite their outstanding performance, we argue that they still lack some aspects. First, there is incompatibility in U-Net’s skip connection between the encoder and decoder features due to the semantic gap between low-processed encoder features and highly processed decoder features, which adversely affects the final prediction. Second, it lacks capturing multi-scale context information and ignores the contribution of all semantic information through the segmentation process. Therefore, we propose a model named MDA-Unet, a novel multi-scale deep learning segmentation model. MDA-Unet improves upon U-Net and enhances its performance in segmenting medical images with variability in the shape and size of the region of interest. The model is integrated with a multi-scale spatial attention module, where spatial attention maps are derived from a hybrid hierarchical dilated convolution module that captures multi-scale context information. To ease the training process and reduce the gradient vanishing problem, residual blocks are deployed instead of the basic U-net blocks. Through a channel attention mechanism, the high-level decoder features are used to guide the low-level encoder features to promote the selection of meaningful context information, thus ensuring effective fusion. We evaluated our model on 2 different datasets: a lung dataset of 2628 axial CT images and an echocardiographic dataset of 2000 images, each with its own challenges. Our model has achieved a significant gain in performance with a slight increase in the number of trainable parameters in comparison with the basic U-Net model, providing a dice score of 98.3% on the lung dataset and 96.7% on the echocardiographic dataset, where the basic U-Net has achieved 94.2% on the lung dataset and 93.9% on the echocardiographic dataset.
KW  - deep learning
KW  - U-Net
KW  - medical images
KW  - segmentation
KW  - computed tomography
KW  - echocardiography
U2  - 10.3390/app12073676
DO  - 10.3390/app12073676
M3  - Article
SN  - 2076-3417
VL  - 12
JO  - Applied Sciences
JF  - Applied Sciences
IS  - 7
M1  - 3676
ER  -
