Abstract
To be trusted and perceived as natural and coherent, conversational systems must adapt to the language of their users. While personalized dialogue is a promising direction, controlling generation for fine-grained language features remains a challenge in this approach. A recent line of research has shown the effectiveness of leveraging pre-trained language models to adapt to a text's topic or sentiment. In this study, we build on these approaches and focus on a higher-level dimension of language variation: speakers' age. We frame the task as dialogue response generation, and test methods based on bag-of-words (BoW) and neural discriminators (Disc) to condition the output of GPT-2 and DialoGPT without altering the parameters of the language models. We show that Disc models achieve a higher degree of detectable control than BoW models based on automatic evaluation. In contrast, humans can partially detect age differences in BoW but not Disc responses. Since humans judge BoW responses to be better than Disc ones, simple controllable methods appear to offer a better tradeoff between adaptation and language quality. Our work confirms the challenges of adapting to higher-level dimensions of language variation. Moreover, it highlights the need to evaluate natural language generation thoroughly.
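To illustrate the core idea of BoW-style conditioning without touching the language model's parameters, the toy sketch below reweights a model's next-token distribution toward an attribute word list at decoding time. This is a deliberately minimal illustration, not the paper's implementation: the vocabulary, logits, and "younger speaker" word bag are hypothetical, and the fixed additive bias stands in for more sophisticated attribute-model scoring.

```python
import math

def bow_biased_sample(logits, vocab, bag, bias=3.0):
    """Boost the logits of bag-of-words tokens, then pick the argmax.

    Toy illustration of BoW-conditioned decoding: the language model's
    parameters are untouched; only its output distribution is reweighted
    toward the attribute word list.
    """
    biased = [
        logit + (bias if token in bag else 0.0)
        for token, logit in zip(vocab, logits)
    ]
    # Softmax over the biased logits (for inspection; greedy decoding
    # only needs the argmax).
    m = max(biased)
    probs = [math.exp(x - m) for x in biased]
    z = sum(probs)
    probs = [p / z for p in probs]
    best = max(range(len(vocab)), key=lambda i: biased[i])
    return vocab[best], probs

# Hypothetical next-token logits from a language model.
vocab = ["the", "awesome", "literally", "indeed", "perhaps"]
logits = [2.0, 1.0, 0.5, 1.2, 0.8]
bag = {"literally", "awesome"}  # toy "younger speaker" word bag
token, probs = bow_biased_sample(logits, vocab, bag)
```

With the bias applied, a bag word overtakes the otherwise most likely token, which is the basic mechanism a BoW attribute model exploits during generation.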
Original language | English |
---|---|
Title of host publication | Proceedings of the 2nd Workshop on Natural Language Generation, Evaluation, and Metrics (GEM) |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 172-188 |
Number of pages | 17 |
ISBN (Electronic) | 9781959429128 |
DOIs | |
Publication status | Published - 8 Dec 2022 |
Event | 2nd Workshop on Natural Language Generation, Evaluation, and Metrics, GEM 2022, as part of EMNLP 2022 - Abu Dhabi, United Arab Emirates. Duration: 7 Dec 2022 → 7 Dec 2022 |
Publication series
Name | ACL Anthology |
---|---|
Publisher | Association for Computational Linguistics |
Conference
Conference | 2nd Workshop on Natural Language Generation, Evaluation, and Metrics, GEM 2022, as part of EMNLP 2022 |
---|---|
Country/Territory | United Arab Emirates |
City | Abu Dhabi |
Period | 7/12/22 → 7/12/22 |
Bibliographical note
Funding Information: We would like to thank the four anonymous GEM reviewers for their valuable feedback and the participants of our crowdsourcing experiments. The work received funding from the University of Amsterdam’s Research Priority Area Human(e) AI and from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 819455).