Image super-resolution method based on the interactive fusion of transformer and CNN features

Jianxin Wang, Yongsong Zou, Osama Alfarraj, Pradip Kumar Sharma, Wael Said, Jin Wang*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Recently, Transformers have achieved outstanding performance in the field of computer vision, where the ability to capture global context is crucial for image super-resolution (SR) reconstruction. Unlike convolutional neural networks (CNNs), however, Transformers lack a mechanism for exchanging information within local regions. To address this problem, we propose a U-Net-based network (TCSR) built on the interactive fusion of Transformer and CNN features, with skip-connections for learning local and global semantic features. TCSR uses Transformer blocks as the basic modules of the U-Net architecture, gradually extracting multi-scale feature information while modeling global long-range dependencies. First, we propose an efficient multi-head shift transposed attention mechanism that improves the internal structure of the Transformer and thus recovers sufficient texture details. In addition, a feature enhancement module is inserted at the skip-connection positions to capture local structural information at different levels. Finally, to further exploit the contextual information in the features, we replace the feed-forward network in each Transformer block with a locally enhanced feed-forward layer, which incorporates local feature representations into the global context. Powered by these designs, TCSR captures both local and global dependencies for image SR reconstruction. Extensive experiments show that, compared with other state-of-the-art SR algorithms, our method effectively recovers image details, yielding significant improvements in both visual effect and image quality.
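The abstract does not include implementation details, so the following PyTorch sketch only illustrates what the two named components commonly look like in the literature: "transposed attention" computed across channels rather than spatial positions (as in Restormer, which keeps cost linear in image size), and a "locally enhanced feed-forward layer" that places a depthwise 3x3 convolution between the two pointwise projections. The class names, the omission of the shift mechanism, and all hyperparameters are assumptions for illustration, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TransposedAttention(nn.Module):
    """Multi-head transposed self-attention: the (C/heads x C/heads)
    attention map lets channels attend to channels, so cost grows
    linearly with the number of pixels. Sketch only; the paper's
    shift mechanism is not reproduced here."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.num_heads = num_heads
        # Learnable per-head temperature, typical for channel attention.
        self.temperature = nn.Parameter(torch.ones(num_heads, 1, 1))
        self.qkv = nn.Conv2d(dim, dim * 3, kernel_size=1)
        # Depthwise conv injects local context into Q, K, V.
        self.qkv_dw = nn.Conv2d(dim * 3, dim * 3, kernel_size=3,
                                padding=1, groups=dim * 3)
        self.project = nn.Conv2d(dim, dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q, k, v = self.qkv_dw(self.qkv(x)).chunk(3, dim=1)

        # Reshape to (batch, heads, channels-per-head, pixels).
        def split_heads(t):
            return t.reshape(b, self.num_heads, c // self.num_heads, h * w)

        q, k, v = map(split_heads, (q, k, v))
        q = F.normalize(q, dim=-1)
        k = F.normalize(k, dim=-1)
        attn = (q @ k.transpose(-2, -1)) * self.temperature  # channel map
        attn = attn.softmax(dim=-1)
        out = (attn @ v).reshape(b, c, h, w)
        return self.project(out)

class LocallyEnhancedFFN(nn.Module):
    """Feed-forward layer with a depthwise 3x3 convolution between the
    two pointwise projections, mixing local structure into each token."""

    def __init__(self, dim: int, expansion: int = 2):
        super().__init__()
        hidden = dim * expansion
        self.net = nn.Sequential(
            nn.Conv2d(dim, hidden, kernel_size=1),
            nn.GELU(),
            nn.Conv2d(hidden, hidden, kernel_size=3, padding=1, groups=hidden),
            nn.GELU(),
            nn.Conv2d(hidden, dim, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

if __name__ == "__main__":
    x = torch.randn(1, 64, 48, 48)  # (batch, channels, H, W) feature map
    block = nn.Sequential(TransposedAttention(64), LocallyEnhancedFFN(64))
    print(block(x).shape)  # torch.Size([1, 64, 48, 48])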

Original language: English
Number of pages: 13
Journal: Visual Computer
Early online date: 3 Nov 2023
DOIs
Publication status: E-pub ahead of print - 3 Nov 2023

Bibliographical note

Funding Information:
This work was supported by the Scientific Research Fund of the Hunan Provincial Education Department (Grant No. 22C0171), the Traffic Science and Technology Project of Hunan Province (Grant No. 202042), the Research Foundation of the Education Bureau of Hunan Province (Grant No. 21B0287), and the Researchers Support Project (Grant No. RSP2023R102), King Saud University, Riyadh, Saudi Arabia.

Data Availability Statement

All data generated or analyzed during this study are included in this published article (and its supplementary information files).

Keywords

  • Feed-forward network
  • Multi-head shift transposed attention
  • Super-resolution
  • Transformer
