Evaluating factual accuracy in complex data-to-text

Craig Thomson; Ehud Reiter; Barkavi Sundararajan

doi:10.1016/j.csl.2023.101482

Craig Thomson* (Corresponding Author), Ehud Reiter, Barkavi Sundararajan

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

4 Citations (Scopus)

Abstract

It is essential that data-to-text Natural Language Generation (NLG) systems produce texts which are factually accurate. We examine accuracy issues in the task of generating summaries of basketball games, including what accuracy means in this context, how accuracy errors can be detected by human annotators, as well as the types of accuracy mistakes made by both neural NLG systems and human authors. We also look at the effectiveness of automatic metrics in measuring factual accuracy.

Original language	English
Article number	101482
Number of pages	20
Journal	Computer Speech & Language
Volume	80
Early online date	30 Jan 2023
DOIs	https://doi.org/10.1016/j.csl.2023.101482
Publication status	Published - 1 May 2023

Bibliographical note

We are very grateful for the hard work of the Mechanical Turk annotators who did excellent work and provided helpful feedback. We would like to thank all of the participants in the shared task; the combination of their hard work and diverse approaches has been essential to furthering understanding of the factual accuracy problem in NLG. We would also like to thank Sam Wiseman, Ratish Puduppully, and Clément Rebuffel for providing outputs from their respective systems. The constructive and insightful feedback from the two anonymous reviewers was very helpful and we greatly appreciate their input. We would also like to thank Anya Belz for checking the German translation, as well as Moray Greig, our basketball domain expert. Finally, we would like to thank members of the Aberdeen CLAN group for their advice and feedback. Craig Thomson’s work on this project was supported under an EPSRC NPIF studentship grant (EP/R512412/1).

Data Availability Statement

Data will be made available on request.

Keywords

Natural Language Generation
Complex data-to-text
Evaluation
Annotation
Factual accuracy
Neural data-to-text

Access to Document

10.1016/j.csl.2023.101482
Licence: Unspecified

Thomson_etal_CSL_Evaluating_factual_accuracy_AAM
© 2023. This manuscript version is made available under the CC-BY-NC-ND 4.0 license https://creativecommons.org/licenses/by-nc-nd/4.0/
Accepted author manuscript, 809 KB
Licence: CC BY-NC-ND

https://www.sciencedirect.com/science/article/pii/S0885230823000013

Cite this

@article{214727c6d0bc42558010a8e11333439e,
  title = "Evaluating factual accuracy in complex data-to-text",
  abstract = "It is essential that data-to-text Natural Language Generation (NLG) systems produce texts which are factually accurate. We examine accuracy issues in the task of generating summaries of basketball games, including what accuracy means in this context, how accuracy errors can be detected by human annotators, as well as the types of accuracy mistakes made by both neural NLG systems and human authors. We also look at the effectiveness of automatic metrics in measuring factual accuracy.",
  keywords = "Natural Language Generation, Complex data-to-text, Evaluation, Annotation, Factual accuracy, Neural data-to-text",
  author = "Craig Thomson and Ehud Reiter and Barkavi Sundararajan",
  note = "We are very grateful for the hard work of the Mechanical Turk annotators who did excellent work and provided helpful feedback. We would like to thank all of the participants in the shared task; the combination of their hard work and diverse approaches has been essential to furthering understanding of the factual accuracy problem in NLG. We would also like to thank Sam Wiseman, Ratish Puduppully, and Cl{\'e}ment Rebuffel for providing outputs from their respective systems. The constructive and insightful feedback from the two anonymous reviewers was very helpful and we greatly appreciate their input. We would also like to thank Anya Belz for checking the German translation, as well as Moray Greig, our basketball domain expert. Finally, we would like to thank members of the Aberdeen CLAN group for their advice and feedback. Craig Thomson{\textquoteright}s work on this project was supported under an EPSRC NPIF studentship grant (EP/R512412/1).",
  year = "2023",
  month = may,
  day = "1",
  doi = "10.1016/j.csl.2023.101482",
  language = "English",
  volume = "80",
  journal = "Computer Speech \& Language",
  issn = "0885-2308",
  publisher = "Academic Press Inc.",
}

TY  - JOUR
T1  - Evaluating factual accuracy in complex data-to-text
AU  - Thomson, Craig
AU  - Reiter, Ehud
AU  - Sundararajan, Barkavi
N1  - We are very grateful for the hard work of the Mechanical Turk annotators who did excellent work and provided helpful feedback. We would like to thank all of the participants in the shared task; the combination of their hard work and diverse approaches has been essential to furthering understanding of the factual accuracy problem in NLG. We would also like to thank Sam Wiseman, Ratish Puduppully, and Clément Rebuffel for providing outputs from their respective systems. The constructive and insightful feedback from the two anonymous reviewers was very helpful and we greatly appreciate their input. We would also like to thank Anya Belz for checking the German translation, as well as Moray Greig, our basketball domain expert. Finally, we would like to thank members of the Aberdeen CLAN group for their advice and feedback. Craig Thomson’s work on this project was supported under an EPSRC NPIF studentship grant (EP/R512412/1).
PY  - 2023/5/1
Y1  - 2023/5/1
N2  - It is essential that data-to-text Natural Language Generation (NLG) systems produce texts which are factually accurate. We examine accuracy issues in the task of generating summaries of basketball games, including what accuracy means in this context, how accuracy errors can be detected by human annotators, as well as the types of accuracy mistakes made by both neural NLG systems and human authors. We also look at the effectiveness of automatic metrics in measuring factual accuracy.
AB  - It is essential that data-to-text Natural Language Generation (NLG) systems produce texts which are factually accurate. We examine accuracy issues in the task of generating summaries of basketball games, including what accuracy means in this context, how accuracy errors can be detected by human annotators, as well as the types of accuracy mistakes made by both neural NLG systems and human authors. We also look at the effectiveness of automatic metrics in measuring factual accuracy.
KW  - Natural Language Generation
KW  - Complex data-to-text
KW  - Evaluation
KW  - Annotation
KW  - Factual accuracy
KW  - Neural data-to-text
U2  - 10.1016/j.csl.2023.101482
DO  - 10.1016/j.csl.2023.101482
M3  - Article
SN  - 0885-2308
VL  - 80
JO  - Computer Speech & Language
JF  - Computer Speech & Language
M1  - 101482
ER  - 
