Evaluating factual accuracy in complex data-to-text

Craig Thomson* (Corresponding Author), Ehud Reiter, Barkavi Sundararajan

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review


It is essential that data-to-text Natural Language Generation (NLG) systems produce texts which are factually accurate. We examine accuracy issues in the task of generating summaries of basketball games, including what accuracy means in this context, how accuracy errors can be detected by human annotators, as well as the types of accuracy mistakes made by both neural NLG systems and human authors. We also look at the effectiveness of automatic metrics in measuring factual accuracy.
Original language: English
Article number: 101482
Number of pages: 20
Journal: Computer Speech & Language
Early online date: 30 Jan 2023
Publication status: Published - 1 May 2023

Bibliographical note

We are very grateful for the hard work of the Mechanical Turk annotators, who did excellent work and provided helpful feedback. We would like to thank all of the participants in the shared task; the combination of their hard work and diverse approaches has been essential to furthering understanding of the factual accuracy problem in NLG. We would also like to thank Sam Wiseman, Ratish Puduppully, and Clément Rebuffel for providing outputs from their respective systems. The constructive and insightful feedback from the two anonymous reviewers was very helpful, and we greatly appreciate their input. We would also like to thank Anya Belz for checking the German translation, as well as Moray Greig, our basketball domain expert. Finally, we would like to thank members of the Aberdeen CLAN group for their advice and feedback. Craig Thomson’s work on this project was supported under an EPSRC NPIF studentship grant (EP/R512412/1).

Data Availability Statement

Data will be made available on request.


  • Natural Language Generation
  • Complex data-to-text
  • Evaluation
  • Annotation
  • Factual accuracy
  • Neural data-to-text

