Abstract
We propose a method for evaluating the quality of generated text by asking evaluators to count facts, then computing precision, recall, F-score, and accuracy from the raw counts. We believe this approach leads to a more objective and easier-to-reproduce evaluation. We apply it to medical report summarisation, where objective measurement of quality and accuracy is of paramount importance.
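As an illustration of the metrics named in the abstract, the sketch below computes precision, recall, F-score, and one plausible reading of accuracy from raw fact counts. The function name, the three count categories (correct, incorrect, missing), and the accuracy definition are assumptions for illustration, not the paper's exact counting protocol.

```python
def metrics_from_counts(correct: int, incorrect: int, missing: int) -> dict:
    """Compute the evaluation metrics from raw fact counts.

    correct   -- facts in the generated text that are supported by the source
    incorrect -- facts in the generated text that are wrong or unsupported
    missing   -- facts in the source that the generated text omits
    (Hypothetical count categories; the paper defines its own counting protocol.)
    """
    generated = correct + incorrect   # facts the system actually produced
    reference = correct + missing     # facts it should have produced

    precision = correct / generated if generated else 0.0
    recall = correct / reference if reference else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    # One possible reading of "accuracy": correct facts over all counted facts.
    total = correct + incorrect + missing
    accuracy = correct / total if total else 0.0

    return {"precision": precision, "recall": recall,
            "f_score": f_score, "accuracy": accuracy}


# Example: 12 correct, 3 incorrect, 5 missing facts in one summary.
print(metrics_from_counts(correct=12, incorrect=3, missing=5))
```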
Original language | English |
---|---|
Title of host publication | Proceedings of the Workshop on Human Evaluation of NLP Systems (HumEval) |
Subtitle of host publication | EACL 2021 |
Editors | Anya Belz, Shubham Agarwal, Yvette Graham, Ehud Reiter, Anastasia Shimorina |
Publisher | ACL Anthology |
Pages | 56-61 |
Number of pages | 6 |
ISBN (Electronic) | 978-1-954085-10-7 |
Publication status | Published - 19 Apr 2021 |
Event | Workshop on Human Evaluation of NLP Systems - virtual |
Duration | 19 Apr 2021 → 19 Apr 2021 |
Internet address | https://www.virtual2021.eacl.org/workshop_WS-5.html |
Workshop
Workshop | Workshop on Human Evaluation of NLP Systems |
---|---|
Period | 19/04/21 → 19/04/21 |
Internet address | https://www.virtual2021.eacl.org/workshop_WS-5.html |