Gradient Remedy for Multi-Task Learning in End-to-End Noise-Robust Speech Recognition

Yuchen Hu; Chen Chen; Ruizhe Li; Qiushi Zhu; Eng Siong Chng

doi:10.48550/ARXIV.2302.11362

Research output: Chapter in Book/Report/Conference proceeding › Published conference contribution

10 Citations (Scopus)

Abstract

Speech enhancement (SE) has proved effective at reducing noise in noisy speech signals for downstream automatic speech recognition (ASR), where a multi-task learning strategy is employed to jointly optimize the two tasks. However, the enhanced speech learned with the SE objective does not always yield good ASR results. From an optimization perspective, the gradients of the SE and ASR tasks sometimes interfere with each other, which can hinder multi-task learning and ultimately lead to sub-optimal ASR performance. In this paper, we propose a simple yet effective approach called gradient remedy (GR) to resolve interference between task gradients in noise-robust speech recognition, in terms of both angle and magnitude. Specifically, we first project the SE task's gradient onto a dynamic surface that is at an acute angle to the ASR gradient, in order to remove the conflict between them and assist ASR optimization. Furthermore, we adaptively rescale the magnitudes of the two gradients to prevent the dominant ASR task from being misled by the SE gradient. Experimental results show that the proposed approach effectively resolves the gradient interference and achieves relative word error rate (WER) reductions of 9.3% and 11.1% over the multi-task learning baseline on the RATS and CHiME-4 datasets, respectively. Our code is available at GitHub.
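
The two steps named in the abstract (an angle fix followed by a magnitude fix) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the formulas, so the fixed target angle `theta_deg`, the norm cap `max_ratio`, and all function names below are assumptions chosen only for illustration. Gradients are plain Python lists, and the SE gradient is modified only when it conflicts with the ASR gradient (negative dot product).

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def remedy_angle(g_se, g_asr, theta_deg=80.0):
    """Hypothetical angle step: if g_se is at an obtuse angle to g_asr,
    return a vector at theta_deg (< 90) to g_asr that keeps the part of
    g_se orthogonal to g_asr. theta_deg is an assumed fixed value."""
    asr_sq = dot(g_asr, g_asr)
    if asr_sq == 0.0 or dot(g_se, g_asr) >= 0.0:
        return list(g_se)  # no conflict: leave the SE gradient unchanged
    coef = dot(g_se, g_asr) / asr_sq
    # Component of g_se orthogonal to g_asr.
    g_perp = [s - coef * a for s, a in zip(g_se, g_asr)]
    # Length along g_asr such that tan(angle) = |g_perp| / parallel_len.
    parallel_len = norm(g_perp) / math.tan(math.radians(theta_deg))
    asr_norm = math.sqrt(asr_sq)
    return [p + parallel_len * a / asr_norm for p, a in zip(g_perp, g_asr)]

def remedy_magnitude(g_se, g_asr, max_ratio=1.0):
    """Hypothetical magnitude step: shrink g_se so that its norm is at
    most max_ratio times the norm of g_asr (an assumed rule)."""
    se_norm, asr_norm = norm(g_se), norm(g_asr)
    if se_norm == 0.0 or se_norm <= max_ratio * asr_norm:
        return list(g_se)
    scale = max_ratio * asr_norm / se_norm
    return [scale * s for s in g_se]

def combined_gradient(g_se, g_asr, theta_deg=80.0, max_ratio=1.0):
    """Apply both steps to g_se, then add the untouched ASR gradient."""
    g = remedy_angle(g_se, g_asr, theta_deg)
    g = remedy_magnitude(g, g_asr, max_ratio)
    return [a + s for a, s in zip(g_asr, g)]

if __name__ == "__main__":
    g_asr = [1.0, 0.0]
    g_se = [-1.0, 1.0]  # conflicts with g_asr: dot product is negative
    print(combined_gradient(g_se, g_asr))
```

With `theta_deg` close to 90 the angle step approaches a plain projection onto the plane orthogonal to the ASR gradient; smaller values tilt the SE gradient further toward the ASR direction.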

Original language	English
Title of host publication	ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Publisher	IEEE Xplore
Number of pages	5
ISBN (Electronic)	978-1-7281-6327-7
ISBN (Print)	978-1-7281-6328-4
DOIs	https://doi.org/10.48550/ARXIV.2302.11362 https://doi.org/10.1109/ICASSP49357.2023.10096615
Publication status	Published - 4 Jun 2023
Event	2023 IEEE International Conference on Acoustics, Speech and Signal Processing: 48th ICASSP - Rodos Palace Luxury Convention Resort, Rhodes Island, Greece; Duration: 4 Jun 2023 → 10 Jun 2023; Conference number: 48th; https://2023.ieeeicassp.org/

Conference

Conference	2023 IEEE International Conference on Acoustics, Speech and Signal Processing
Country/Territory	Greece
City	Rhodes Island
Period	4/06/23 → 10/06/23
Internet address	https://2023.ieeeicassp.org/

Bibliographical note

This research is supported by National Research Foundation Singapore under its AI Singapore Programme (Award Number: AISG2-100E-2022-10).

Keywords

Gradient remedy
Multi-task learning
Speech enhancement
Noise-robust speech recognition
Gradient interference

Access to Document

10.48550/ARXIV.2302.11362 (Licence: Unspecified)
10.1109/ICASSP49357.2023.10096615 (Licence: Unspecified)

Embargoed Document

Hu_etal_IEEE_Gradient_Remedy_For_AAM
Accepted author manuscript, 713 KB
Licence: Unspecified
Embargo ends: 16/03/25

Cite this

Hu, Y, Chen, C, Li, R, Zhu, Q & Chng, ES 2023, Gradient Remedy for Multi-Task Learning in End-to-End Noise-Robust Speech Recognition. in ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE Xplore, 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, Rhodes Island, Greece, 4/06/23. https://doi.org/10.48550/ARXIV.2302.11362, https://doi.org/10.1109/ICASSP49357.2023.10096615

@inproceedings{12049ed28b644a9188f98fac39da81fe,

title = "Gradient Remedy for Multi-Task Learning in End-to-End Noise-Robust Speech Recognition",

abstract = "Speech enhancement (SE) is proved effective in reducing noise from noisy speech signals for downstream automatic speech recognition (ASR), where multi-task learning strategy is employed to jointly optimize these two tasks. However, the enhanced speech learned by SE objective may not always yield good ASR results. From the optimization view, there sometimes exists interference between the gradients of SE and ASR tasks, which could hinder the multi-task learning and finally lead to sub-optimal ASR performance. In this paper, we propose a simple yet effective approach called gradient remedy (GR) to solve interference between task gradients in noise-robust speech recognition, from perspectives of both angle and magnitude. Specifically, we first project the SE task's gradient onto a dynamic surface that is at acute angle to ASR gradient, in order to remove the conflict between them and assist in ASR optimization. Furthermore, we adaptively rescale the magnitude of two gradients to prevent the dominant ASR task from being misled by SE gradient. Experimental results show that the proposed approach well resolves the gradient interference and achieves relative word error rate (WER) reductions of 9.3% and 11.1% over multi-task learning baseline, on RATS and CHiME-4 datasets, respectively. Our code is available at GitHub.",

keywords = "Gradient remedy, Multi-task learning, speech enhancement, noise-robust speech recognition, gradient interference",

author = "Yuchen Hu and Chen Chen and Ruizhe Li and Qiushi Zhu and Chng, {Eng Siong}",

note = "This research is supported by National Research Foundation Singapore under its AI Singapore Programme (Award Number: AISG2-100E-2022-10). ; 2023 IEEE International Conference on Acoustics, Speech and Signal Processing : 48th ICASSP ; Conference date: 04-06-2023 Through 10-06-2023",

year = "2023",

month = jun,

day = "4",

doi = "10.48550/ARXIV.2302.11362",

language = "English",

isbn = "978-1-7281-6328-4",

booktitle = "ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)",

publisher = "IEEE Xplore",

url = "https://2023.ieeeicassp.org/",

}

TY - GEN

T1 - Gradient Remedy for Multi-Task Learning in End-to-End Noise-Robust Speech Recognition

AU - Hu, Yuchen

AU - Chen, Chen

AU - Li, Ruizhe

AU - Zhu, Qiushi

AU - Chng, Eng Siong

N1 - Conference code: 48th

PY - 2023/6/4

Y1 - 2023/6/4

N2 - Speech enhancement (SE) is proved effective in reducing noise from noisy speech signals for downstream automatic speech recognition (ASR), where multi-task learning strategy is employed to jointly optimize these two tasks. However, the enhanced speech learned by SE objective may not always yield good ASR results. From the optimization view, there sometimes exists interference between the gradients of SE and ASR tasks, which could hinder the multi-task learning and finally lead to sub-optimal ASR performance. In this paper, we propose a simple yet effective approach called gradient remedy (GR) to solve interference between task gradients in noise-robust speech recognition, from perspectives of both angle and magnitude. Specifically, we first project the SE task's gradient onto a dynamic surface that is at acute angle to ASR gradient, in order to remove the conflict between them and assist in ASR optimization. Furthermore, we adaptively rescale the magnitude of two gradients to prevent the dominant ASR task from being misled by SE gradient. Experimental results show that the proposed approach well resolves the gradient interference and achieves relative word error rate (WER) reductions of 9.3% and 11.1% over multi-task learning baseline, on RATS and CHiME-4 datasets, respectively. Our code is available at GitHub.

AB - Speech enhancement (SE) is proved effective in reducing noise from noisy speech signals for downstream automatic speech recognition (ASR), where multi-task learning strategy is employed to jointly optimize these two tasks. However, the enhanced speech learned by SE objective may not always yield good ASR results. From the optimization view, there sometimes exists interference between the gradients of SE and ASR tasks, which could hinder the multi-task learning and finally lead to sub-optimal ASR performance. In this paper, we propose a simple yet effective approach called gradient remedy (GR) to solve interference between task gradients in noise-robust speech recognition, from perspectives of both angle and magnitude. Specifically, we first project the SE task's gradient onto a dynamic surface that is at acute angle to ASR gradient, in order to remove the conflict between them and assist in ASR optimization. Furthermore, we adaptively rescale the magnitude of two gradients to prevent the dominant ASR task from being misled by SE gradient. Experimental results show that the proposed approach well resolves the gradient interference and achieves relative word error rate (WER) reductions of 9.3% and 11.1% over multi-task learning baseline, on RATS and CHiME-4 datasets, respectively. Our code is available at GitHub.

KW - Gradient remedy

KW - Multi-task learning

KW - speech enhancement

KW - noise-robust speech recognition

KW - gradient interference

U2 - 10.48550/ARXIV.2302.11362

DO - 10.48550/ARXIV.2302.11362

M3 - Published conference contribution

SN - 978-1-7281-6328-4

BT - ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

PB - IEEE Xplore

T2 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing

Y2 - 4 June 2023 through 10 June 2023

ER -
