Joint Training of Hierarchical GANs and Semantic Segmentation for Expression Translation

Rumeysa Bodur; Binod Bhattarai; Tae Kyun Kim

doi:10.1109/ICASSP49357.2023.10097243

Computing Science

Research output: Chapter in Book/Report/Conference proceeding › Published conference contribution

Abstract

Manipulating images by changing only specific attributes is a long-standing research problem. Existing methods that rely solely on a global generator often change unwanted attributes along with the desired ones. Although hierarchical networks consisting of global and local networks have shown success, they extract local regions using bounding boxes, an operation that is non-differentiable and yields inaccurate, unrealistic regions. As a result, the solution becomes suboptimal and introduces unwanted artifacts. A recent study has shown a strong correlation between facial attributes and local regions. To exploit this correlation, we designed a unified architecture that combines semantic segmentation and hierarchical GANs. One advantage of our end-to-end differentiable framework is that the segmentation network conditions the GANs during the forward pass, and gradients from the GANs are propagated to the segmentation network during the backward pass, allowing both networks to benefit from each other. We evaluated our method on two challenging expression translation benchmarks, AffectNet and RaFD, and a segmentation benchmark, CelebAMask-HQ, validating its effectiveness over existing methods.
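
The contrast the abstract draws between bounding-box extraction and segmentation-based conditioning can be illustrated with a toy example. The sketch below is not the authors' implementation: it uses a four-pixel 1-D "image", a plain squared-error loss standing in for the GAN losses, and hypothetical names (`blend_soft`, `blend_box`, `grad_wrt_logits`). It only shows why a soft mask lets a downstream loss send gradients back to the mask predictor, whereas an integer bounding box offers no such path.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def blend_soft(mask_logits, local_out, global_out):
    # Soft mask m = sigmoid(logit); every output pixel depends smoothly on its logit.
    return [sigmoid(z) * l + (1.0 - sigmoid(z)) * g
            for z, l, g in zip(mask_logits, local_out, global_out)]


def blend_box(box, local_out, global_out):
    # Hard bounding box [start, end): integer indices give the loss no slope to follow.
    start, end = box
    return [l if start <= i < end else g
            for i, (l, g) in enumerate(zip(local_out, global_out))]


def squared_error(pixels, target):
    return sum((p - t) ** 2 for p, t in zip(pixels, target))


def grad_wrt_logits(mask_logits, local_out, global_out, target):
    # Chain rule: dL/dz = 2 (p - t) * (l - g) * m (1 - m), with p the blended pixel.
    grads = []
    for z, l, g, t in zip(mask_logits, local_out, global_out, target):
        m = sigmoid(z)
        p = m * l + (1.0 - m) * g
        grads.append(2.0 * (p - t) * (l - g) * m * (1.0 - m))
    return grads


# Toy 1-D "image": the local branch should only be used on the two middle pixels.
local_out = [1.0, 1.0, 1.0, 1.0]
global_out = [0.0, 0.0, 0.0, 0.0]
target = [0.0, 1.0, 1.0, 0.0]
logits = [0.0, 0.0, 0.0, 0.0]

before = squared_error(blend_soft(logits, local_out, global_out), target)
grads = grad_wrt_logits(logits, local_out, global_out, target)
logits = [z - 1.0 * g for z, g in zip(logits, grads)]  # one gradient step, learning rate 1.0
after = squared_error(blend_soft(logits, local_out, global_out), target)
print(before, after)  # the loss decreases after the step
```

In this toy setting the gradient pushes the mask up on the middle pixels and down on the outer ones, which is the kind of feedback from the generation loss to the mask that a fixed box crop cannot receive.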

Original language	English
Title of host publication	ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing
Publisher	IEEE Computer Society
Number of pages	5
ISBN (Electronic)	978-1-7281-6327-7
DOIs	https://doi.org/10.1109/ICASSP49357.2023.10097243
Publication status	Published - 5 May 2023
Event	48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023 - Rhodes Island, Greece
Duration	4 Jun 2023 → 10 Jun 2023

Publication series

Name	ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Publisher	Institute of Electrical and Electronics Engineers Inc.
ISSN (Print)	1520-6149

Conference

Conference	48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023
Country/Territory	Greece
City	Rhodes Island
Period	4/06/23 → 10/06/23

Keywords

expression manipulation
generative adversarial networks
semantic segmentation

Access to Document

10.1109/ICASSP49357.2023.10097243
Licence: Unspecified

Cite this

Bodur, R., Bhattarai, B., & Kim, T. K. (2023). Joint Training of Hierarchical GANs and Semantic Segmentation for Expression Translation. In ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings). IEEE Computer Society. https://doi.org/10.1109/ICASSP49357.2023.10097243

Joint Training of Hierarchical GANs and Semantic Segmentation for Expression Translation. / Bodur, Rumeysa; Bhattarai, Binod; Kim, Tae Kyun.
ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE Computer Society, 2023. (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings).

Bodur, R, Bhattarai, B & Kim, TK 2023, Joint Training of Hierarchical GANs and Semantic Segmentation for Expression Translation. in ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, IEEE Computer Society, 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023, Rhodes Island, Greece, 4/06/23. https://doi.org/10.1109/ICASSP49357.2023.10097243

Bodur R, Bhattarai B, Kim TK. Joint Training of Hierarchical GANs and Semantic Segmentation for Expression Translation. In ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE Computer Society. 2023. (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings). doi: 10.1109/ICASSP49357.2023.10097243

@inproceedings{fa54e5aea82d4f05bad5dcdc7be90ef2,
  title = "Joint Training of Hierarchical GANs and Semantic Segmentation for Expression Translation",
  abstract = "Manipulating images by changing only specific attributes is a long-standing research problem. Existing methods that rely solely on a global generator often change unwanted attributes along with the desired ones. Although hierarchical networks consisting of global and local networks have shown success, they extract local regions using bounding boxes, an operation that is non-differentiable and yields inaccurate, unrealistic regions. As a result, the solution becomes suboptimal and introduces unwanted artifacts. A recent study has shown a strong correlation between facial attributes and local regions. To exploit this correlation, we designed a unified architecture that combines semantic segmentation and hierarchical GANs. One advantage of our end-to-end differentiable framework is that the segmentation network conditions the GANs during the forward pass, and gradients from the GANs are propagated to the segmentation network during the backward pass, allowing both networks to benefit from each other. We evaluated our method on two challenging expression translation benchmarks, AffectNet and RaFD, and a segmentation benchmark, CelebAMask-HQ, validating its effectiveness over existing methods.",
  keywords = "expression manipulation, generative adversarial networks, semantic segmentation",
  author = "Rumeysa Bodur and Binod Bhattarai and Kim, {Tae Kyun}",
  year = "2023",
  month = may,
  day = "5",
  doi = "10.1109/ICASSP49357.2023.10097243",
  language = "English",
  series = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",
  publisher = "IEEE Computer Society",
  booktitle = "ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing",
  note = "48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023 ; Conference date: 04-06-2023 Through 10-06-2023",
}

TY - GEN
T1 - Joint Training of Hierarchical GANs and Semantic Segmentation for Expression Translation
AU - Bodur, Rumeysa
AU - Bhattarai, Binod
AU - Kim, Tae Kyun
PY - 2023/5/5
Y1 - 2023/5/5
AB - Manipulating images by changing only specific attributes is a long-standing research problem. Existing methods that rely solely on a global generator often change unwanted attributes along with the desired ones. Although hierarchical networks consisting of global and local networks have shown success, they extract local regions using bounding boxes, an operation that is non-differentiable and yields inaccurate, unrealistic regions. As a result, the solution becomes suboptimal and introduces unwanted artifacts. A recent study has shown a strong correlation between facial attributes and local regions. To exploit this correlation, we designed a unified architecture that combines semantic segmentation and hierarchical GANs. One advantage of our end-to-end differentiable framework is that the segmentation network conditions the GANs during the forward pass, and gradients from the GANs are propagated to the segmentation network during the backward pass, allowing both networks to benefit from each other. We evaluated our method on two challenging expression translation benchmarks, AffectNet and RaFD, and a segmentation benchmark, CelebAMask-HQ, validating its effectiveness over existing methods.
KW - expression manipulation
KW - generative adversarial networks
KW - semantic segmentation
UR - http://www.scopus.com/inward/record.url?scp=85180419092&partnerID=8YFLogxK
U2 - 10.1109/ICASSP49357.2023.10097243
DO - 10.1109/ICASSP49357.2023.10097243
M3 - Published conference contribution
AN - SCOPUS:85180419092
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
BT - ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing
PB - IEEE Computer Society
T2 - 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023
Y2 - 4 June 2023 through 10 June 2023
ER -
