Abstract
Reflection is a crucial counselling skill in which the therapist conveys to the client their interpretation of what the client said. Language models have recently been used to generate reflections automatically, but human evaluation is challenging, particularly due to the cost of hiring experts. Laypeople-based evaluation is less expensive and easier to scale, but its quality is unknown for reflections. Therefore, we explore whether laypeople can be an alternative to experts in evaluating a fundamental quality aspect: coherence and context-consistency. We do so by asking a group of laypeople and a group of experts to annotate both synthetic reflections and human reflections from actual therapists. We find that both laypeople and experts are reliable annotators and that the two groups show moderate-to-strong correlation, suggesting that laypeople can be trusted for such evaluations. We also find that GPT-3 mostly produces coherent and consistent reflections, and we examine how evaluation results change when the source of synthetic reflections shifts from the less powerful GPT-2 to GPT-3.
Original language | English
---|---
Title of host publication | Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Editors | Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Place of Publication | Toronto, Canada
Publisher | Association for Computational Linguistics
Pages | 6906–6930
Number of pages | 25
Publication status | Published - 2023
Event | The 61st Annual Meeting of the Association for Computational Linguistics, Toronto, Canada, 9 Jul 2023 – 14 Jul 2023 (Conference number: 61), https://2023.aclweb.org/
Conference

Conference | The 61st Annual Meeting of the Association for Computational Linguistics
---|---
Country/Territory | Canada
City | Toronto
Period | 9/07/23 → 14/07/23