It is essential that data-to-text Natural Language Generation (NLG) systems produce texts that are factually accurate. We examine accuracy issues in the task of generating summaries of basketball games. This includes what accuracy means in this context, how human annotators can detect accuracy errors, and the types of accuracy mistakes made by both neural NLG systems and human authors. We also look at how effectively automatic metrics measure factual accuracy.
Bibliographical note
We are very grateful to the Mechanical Turk annotators, who did excellent work and provided helpful feedback. We would like to thank all of the participants in the shared task; the combination of their hard work and diverse approaches has been essential to furthering understanding of the factual accuracy problem in NLG. We would also like to thank Sam Wiseman, Ratish Puduppully, and Clément Rebuffel for providing outputs from their respective systems. The constructive and insightful feedback from the two anonymous reviewers was very helpful, and we greatly appreciate their input. We would also like to thank Anya Belz for checking the German translation, as well as Moray Greig, our basketball domain expert. Finally, we would like to thank members of the Aberdeen CLAN group for their advice and feedback. Craig Thomson’s work on this project was supported under an EPSRC NPIF studentship grant (EP/R512412/1).
Data Availability Statement
Data will be made available on request.
- Natural Language Generation
- Complex data-to-text
- Factual accuracy
- Neural data-to-text