TY - GEN
T1 - Joint Training of Hierarchical GANs and Semantic Segmentation for Expression Translation
AU - Bodur, Rumeysa
AU - Bhattarai, Binod
AU - Kim, Tae Kyun
PY - 2023/5/5
Y1 - 2023/5/5
N2 - Manipulating images by changing only specific attributes has been a long-standing research problem. Existing methods that rely solely on a global generator often change unwanted attributes along with the desired ones. Although hierarchical networks consisting of global and local networks have shown success, they extract local regions using bounding boxes, a step that is non-differentiable, inaccurate, and unrealistic. As a result, the solution becomes suboptimal and introduces unwanted artifacts. A recent study has shown a strong correlation between facial attributes and local regions. To exploit this correlation, we have designed a unified architecture that combines semantic segmentation and hierarchical GANs. One advantage of our end-to-end differentiable framework is that the segmentation network conditions the GANs during the forward pass, and gradients from the GANs are propagated to the segmentation network during the backward pass, allowing both architectures to benefit from each other. We evaluated our method on two challenging expression translation benchmarks, AffectNet and RaFD, and a segmentation benchmark, CelebAMask-HQ, validating its effectiveness over existing methods.
AB - Manipulating images by changing only specific attributes has been a long-standing research problem. Existing methods that rely solely on a global generator often change unwanted attributes along with the desired ones. Although hierarchical networks consisting of global and local networks have shown success, they extract local regions using bounding boxes, a step that is non-differentiable, inaccurate, and unrealistic. As a result, the solution becomes suboptimal and introduces unwanted artifacts. A recent study has shown a strong correlation between facial attributes and local regions. To exploit this correlation, we have designed a unified architecture that combines semantic segmentation and hierarchical GANs. One advantage of our end-to-end differentiable framework is that the segmentation network conditions the GANs during the forward pass, and gradients from the GANs are propagated to the segmentation network during the backward pass, allowing both architectures to benefit from each other. We evaluated our method on two challenging expression translation benchmarks, AffectNet and RaFD, and a segmentation benchmark, CelebAMask-HQ, validating its effectiveness over existing methods.
KW - expression manipulation
KW - generative adversarial networks
KW - semantic segmentation
UR - http://www.scopus.com/inward/record.url?scp=85180419092&partnerID=8YFLogxK
U2 - 10.1109/ICASSP49357.2023.10097243
DO - 10.1109/ICASSP49357.2023.10097243
M3 - Published conference contribution
AN - SCOPUS:85180419092
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
BT - ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing
PB - IEEE Computer Society
T2 - 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023
Y2 - 4 June 2023 through 10 June 2023
ER -