Towards objectively evaluating the quality of generated medical summaries

Francesco Moramarco, Aleksandar Savkov, Damir Juric, Ehud Reiter

Research output: Chapter in Book/Report/Conference proceeding > Published conference contribution


We propose a method for evaluating the quality of generated text by asking evaluators to count facts and then computing precision, recall, F-score, and accuracy from the raw counts. We believe this approach leads to a more objective and more easily reproducible evaluation. We apply it to the task of medical report summarisation, where measuring quality and accuracy objectively is of paramount importance.
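
As a rough illustration of how such scores can be derived from evaluator counts, the sketch below assumes each evaluator reports three numbers per summary: facts stated correctly, facts stated incorrectly, and facts that should have appeared but were omitted. This three-way split, the `FactCounts` class, and the function names are assumptions made for the example and are not taken from the paper. Accuracy is left out because its definition in terms of these counts is not given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FactCounts:
    """Raw counts from one evaluator for one summary (hypothetical schema)."""
    correct: int    # facts in the summary judged correct
    incorrect: int  # facts in the summary judged wrong
    omitted: int    # expected facts that are missing from the summary


def precision(c: FactCounts) -> float:
    """Share of the facts stated in the summary that are correct."""
    stated = c.correct + c.incorrect
    return c.correct / stated if stated else 0.0


def recall(c: FactCounts) -> float:
    """Share of the expected facts that the summary states correctly."""
    expected = c.correct + c.omitted
    return c.correct / expected if expected else 0.0


def f_score(c: FactCounts, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta is 1)."""
    p, r = precision(c), recall(c)
    if p == 0.0 and r == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)


if __name__ == "__main__":
    counts = FactCounts(correct=6, incorrect=2, omitted=6)
    print(precision(counts), recall(counts), f_score(counts))
```

With 6 correct, 2 incorrect, and 6 omitted facts, precision is 6/8, recall is 6/12, and F1 is their harmonic mean. Because the inputs are plain counts, two evaluators can be compared directly on the numbers they recorded rather than on an overall impression score.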
Original language: English
Title of host publication: Proceedings of the Workshop on Human Evaluation of NLP Systems (HumEval)
Subtitle of host publication: EACL 2021
Editors: Anya Belz, Shubham Agarwal, Yvette Graham, Ehud Reiter, Anastasia Shimorina
Publisher: ACL Anthology
Number of pages: 6
ISBN (Electronic): 978-1-954085-10-7
Publication status: Published - 19 Apr 2021
Event: Workshop on Human Evaluation of NLP Systems - virtual
Duration: 19 Apr 2021 to 19 Apr 2021

