The Accuracy Evaluation Shared Task as a Retrospective Reproduction Study

Craig Thomson, Ehud Reiter

Research output: Chapter in Book/Report/Conference proceeding › Published conference contribution



We investigate the data collected for the Accuracy Evaluation Shared Task as a retrospective reproduction study. The shared task was based upon errors found by human annotation of computer generated summaries of basketball games. Annotation was performed in three separate stages, with texts taken from the same three systems and checked for errors by the same three annotators. We show that the mean count of errors was consistent at the highest level for each experiment, with increased variance when looking at per-system and/or per-error-type breakdowns.
Original language: English
Title of host publication: Proceedings of the 15th International Conference on Natural Language Generation: Generation Challenges
Place of Publication: Waterville, Maine, USA and virtual meeting
Publisher: Association for Computational Linguistics
Number of pages: 9
Publication status: Published - 1 Jul 2022

