Face Parsing from RGB and Depth Using Cross-Domain Mutual Learning

Jihyun Lee; Binod Bhattarai; Tae-Kyun Kim

doi:10.1109/CVPRW53098.2021.00166

Computing Science

Research output: Chapter in Book/Report/Conference proceeding › Published conference contribution

Abstract

Existing methods of face parsing have proven effective at classifying each pixel of an RGB image into different facial components. However, there is a lack of face parsing research that utilizes depth domain. To the best of our knowledge, we present the first study to exploit 2.5D data for face parsing. We introduce a novel framework to jointly learn (1) RGB face parsing, (2) depth face parsing and (3) RGB-to-depth domain translation, which can be effective even when only a small amount of annotated depth data is available for training. To this end, we also create the first RGB-D face parsing benchmarks based on CelebAMask-HQ, LaPa and Helen by utilizing an off-the-shelf 3D head reconstruction model. Overall, our approach makes two main contributions. First, our method leverages mutual learning between RGB and depth face parsing, which enables bidirectional knowledge distillation between the two data domains. Second, our method utilizes end-to-end learning of RGB-to-depth domain translation and depth face parsing, which can help overcome the scarcity of annotated depth data. We perform extensive experiments to validate the effectiveness of our method, in which we achieve state-of-the-art results in RGB face parsing. As far as we know, we also report the first results on face parsing from depth data. All experiments are conducted on our new RGB-D face parsing datasets, which are publicly available at https://github.com/jyunlee/CelebAMask-HQ-D_LaPa-D_Helen-D.
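The abstract names mutual learning between the RGB and depth parsing networks but does not give the loss. As a rough illustration only, the sketch below uses the generic deep-mutual-learning formulation: each network is trained with per-pixel cross-entropy against the labels plus a KL term that pulls its prediction toward the other network's prediction. This formulation is an assumption, not the authors' confirmed objective. The function names, the `weight` parameter, and the toy inputs are hypothetical, and the RGB-to-depth translation branch is omitted.

```python
import math

EPS = 1e-12  # guards against log(0) if a probability underflows


def softmax(logits):
    """Numerically stable softmax over one pixel's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Negative log-likelihood of the ground-truth class."""
    return -math.log(max(probs[label], EPS))


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / max(qi, EPS)) for pi, qi in zip(p, q) if pi > 0)


def mutual_learning_losses(rgb_logits, depth_logits, labels, weight=1.0):
    """Return (rgb_loss, depth_loss), averaged over pixels.

    rgb_logits, depth_logits: per-pixel lists of class logits (hypothetical
    outputs of the RGB and depth parsing networks for the same face).
    labels: ground-truth class index per pixel.
    weight: assumed scalar balancing supervision against the mimicry term.
    """
    n = len(labels)
    rgb_loss = 0.0
    depth_loss = 0.0
    for r, d, y in zip(rgb_logits, depth_logits, labels):
        p_rgb, p_depth = softmax(r), softmax(d)
        # Each network is supervised by the label and mimics its peer.
        rgb_loss += cross_entropy(p_rgb, y) + weight * kl_divergence(p_depth, p_rgb)
        depth_loss += cross_entropy(p_depth, y) + weight * kl_divergence(p_rgb, p_depth)
    return rgb_loss / n, depth_loss / n


if __name__ == "__main__":
    # Two pixels, three facial-component classes (toy values).
    rgb = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]]
    depth = [[1.0, 1.0, 0.2], [0.1, 2.0, 0.4]]
    print(mutual_learning_losses(rgb, depth, labels=[0, 1]))
```

Because the KL term appears in both losses with the roles of the two predictions swapped, knowledge can flow in both directions, which is the sense in which the distillation is bidirectional.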

Original language	English
Title of host publication	2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
Publisher	IEEE Xplore
Pages	1501-1510
Number of pages	10
ISBN (Electronic)	978-1-6654-4899-4
ISBN (Print)	978-1-6654-4900-7
DOIs	https://doi.org/10.1109/CVPRW53098.2021.00166
Publication status	Published - 1 Sept 2021

Publication series

Name	IEEE Computer Society Conference on Computer Vision and Pattern Recognition workshops
ISSN (Print)	2160-7508
ISSN (Electronic)	2160-7516

Access to Document

10.1109/CVPRW53098.2021.00166
Licence: Unspecified

Cite this

Face Parsing from RGB and Depth Using Cross-Domain Mutual Learning. / Lee, Jihyun; Bhattarai, Binod; Kim, Tae-Kyun.
2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). IEEE Xplore, 2021. p. 1501-1510 (IEEE Computer Society Conference on Computer Vision and Pattern Recognition workshops).

@inproceedings{0f9d3095c685489395d0e1458dfcc625,

title = "Face Parsing from RGB and Depth Using Cross-Domain Mutual Learning",

abstract = "Existing methods of face parsing have proven effective at classifying each pixel of an RGB image into different facial components. However, there is a lack of face parsing research that utilizes depth domain. To the best of our knowledge, we present the first study to exploit 2.5D data for face parsing. We introduce a novel framework to jointly learn (1) RGB face parsing, (2) depth face parsing and (3) RGB-to-depth domain translation, which can be effective even when only a small amount of annotated depth data is available for training. To this end, we also create the first RGB-D face parsing benchmarks based on CelebAMask-HQ, LaPa and Helen by utilizing an off-the-shelf 3D head reconstruction model. Overall, our approach makes two main contributions. First, our method leverages mutual learning between RGB and depth face parsing, which enables bidirectional knowledge distillation between the two data domains. Second, our method utilizes end-to-end learning of RGB-to-depth domain translation and depth face parsing, which can help overcome the scarcity of annotated depth data. We perform extensive experiments to validate the effectiveness of our method, in which we achieve state-of-the-art results in RGB face parsing. As far as we know, we also report the first results on face parsing from depth data. All experiments are conducted on our new RGB-D face parsing datasets, which are publicly available at https://github.com/jyunlee/CelebAMask-HQ-D_LaPa-D_Helen-D.",

author = "Jihyun Lee and Binod Bhattarai and Tae-Kyun Kim",

year = "2021",

month = sep,

day = "1",

doi = "10.1109/CVPRW53098.2021.00166",

language = "English",

isbn = "978-1-6654-4900-7",

series = "IEEE Computer Society Conference on Computer Vision and Pattern Recognition workshops",

publisher = "IEEE Xplore",

pages = "1501--1510",

booktitle = "2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)",

}

TY - GEN

T1 - Face Parsing from RGB and Depth Using Cross-Domain Mutual Learning

AU - Lee, Jihyun

AU - Bhattarai, Binod

AU - Kim, Tae-Kyun

PY - 2021/9/1

Y1 - 2021/9/1

AB - Existing methods of face parsing have proven effective at classifying each pixel of an RGB image into different facial components. However, there is a lack of face parsing research that utilizes depth domain. To the best of our knowledge, we present the first study to exploit 2.5D data for face parsing. We introduce a novel framework to jointly learn (1) RGB face parsing, (2) depth face parsing and (3) RGB-to-depth domain translation, which can be effective even when only a small amount of annotated depth data is available for training. To this end, we also create the first RGB-D face parsing benchmarks based on CelebAMask-HQ, LaPa and Helen by utilizing an off-the-shelf 3D head reconstruction model. Overall, our approach makes two main contributions. First, our method leverages mutual learning between RGB and depth face parsing, which enables bidirectional knowledge distillation between the two data domains. Second, our method utilizes end-to-end learning of RGB-to-depth domain translation and depth face parsing, which can help overcome the scarcity of annotated depth data. We perform extensive experiments to validate the effectiveness of our method, in which we achieve state-of-the-art results in RGB face parsing. As far as we know, we also report the first results on face parsing from depth data. All experiments are conducted on our new RGB-D face parsing datasets, which are publicly available at https://github.com/jyunlee/CelebAMask-HQ-D_LaPa-D_Helen-D.

U2 - 10.1109/CVPRW53098.2021.00166

DO - 10.1109/CVPRW53098.2021.00166

M3 - Published conference contribution

SN - 978-1-6654-4900-7

T3 - IEEE Computer Society Conference on Computer Vision and Pattern Recognition workshops

SP - 1501

EP - 1510

BT - 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

PB - IEEE Xplore

ER -
