SportSett: Basketball - A robust and maintainable dataset for Natural Language Generation

Craig Thomson, Ehud Reiter, Somayajulu Sripada

Research output: Contribution to conference › Unpublished paper › peer-review


Data2Text Natural Language Generation is a complex and varied task. We investigate the data requirements for the difficult real-world problem of generating statistic-focused summaries of basketball games. This problem has recently been tackled using the Rotowire and Rotowire-FG datasets of paired data and text. It can, however, be difficult to filter, query, and maintain such large volumes of data. In this resource paper, we introduce the SportSett:Basketball database. This easy-to-use resource allows simple scripts to be written that generate data in formats suitable for a variety of systems. Building on the existing data, we provide additional attributes across multiple dimensions, increasing the overlap of content between data and text. We also highlight and resolve issues of training, validation and test partition contamination in these previous datasets.
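One simple form of partition contamination is the same item (for example, the same game or the same summary) appearing in more than one of the training, validation and test partitions. The sketch below is a generic illustration of how such overlap could be detected, not the method used in the paper; the function name `find_cross_partition_overlap` and the game IDs are hypothetical, and it assumes each partition can be reduced to a list of hashable keys.

```python
from collections import defaultdict


def find_cross_partition_overlap(partitions):
    """Return keys that appear in more than one partition.

    Hypothetical helper: `partitions` maps a partition name (e.g. "train")
    to an iterable of hashable keys, such as game IDs or normalised
    summary texts.
    """
    seen = defaultdict(set)  # key -> names of the partitions containing it
    for name, keys in partitions.items():
        for key in keys:
            seen[key].add(name)
    # Keep only keys shared by two or more partitions (i.e. contamination)
    return {key: sorted(names) for key, names in seen.items() if len(names) > 1}


if __name__ == "__main__":
    # Hypothetical game IDs, for illustration only
    splits = {
        "train": ["g001", "g002", "g003"],
        "valid": ["g004", "g002"],
        "test": ["g005", "g003", "g006"],
    }
    print(find_cross_partition_overlap(splits))
    # → {'g002': ['train', 'valid'], 'g003': ['test', 'train']}
```

An empty result means no key is shared between partitions; any non-empty result lists exactly which items would need to be moved or deduplicated before training and evaluation.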
Original language: English
Publication status: Accepted/In press - 17 Aug 2020
Event: IntelLanG: Intelligent Information Processing and Natural Language Generation - Santiago de Compostela, Spain
Duration: 7 Sept 2020 - 7 Sept 2020


City: Santiago de Compostela