TY - GEN
T1 - Auto-generating textual data stories using data science pipelines
AU - Wang, Ruilin
AU - Gowri Sripada, Somayajulu
AU - Beacham, Nigel
PY - 2021/12/25
Y1 - 2021/12/25
N2 - Understanding a dataset directly is challenging, but transforming the results of data analysis into data stories could help people build mental models and understand the dataset more easily. In this paper, we present a new framework for data-to-text NLG that generates data stories for specific personas. To assess the feasibility of this method and whether human-generated stories are consistent with the story generated by the data science pipeline, we present two experiments: a data story study with 3 financial experts, 4 Ph.D. students, and 20 Amazon Mechanical Turk workers, which yielded several human-written data stories; and a validation study in which 39 Amazon Mechanical Turk workers assessed the usability and understandability of 9 high-quality data stories written by humans and by machine. We conduct a qualitative analysis of the human-written data stories to determine what people consider when writing data stories and whether the human-generated stories are consistent with the one generated by the data science pipeline. The experimental results show that readers comprehend machine-written data stories as well as they comprehend human-written data stories.
AB - Understanding a dataset directly is challenging, but transforming the results of data analysis into data stories could help people build mental models and understand the dataset more easily. In this paper, we present a new framework for data-to-text NLG that generates data stories for specific personas. To assess the feasibility of this method and whether human-generated stories are consistent with the story generated by the data science pipeline, we present two experiments: a data story study with 3 financial experts, 4 Ph.D. students, and 20 Amazon Mechanical Turk workers, which yielded several human-written data stories; and a validation study in which 39 Amazon Mechanical Turk workers assessed the usability and understandability of 9 high-quality data stories written by humans and by machine. We conduct a qualitative analysis of the human-written data stories to determine what people consider when writing data stories and whether the human-generated stories are consistent with the one generated by the data science pipeline. The experimental results show that readers comprehend machine-written data stories as well as they comprehend human-written data stories.
KW - Data science
KW - Data sensemaking
KW - Data storytelling
KW - NLP
UR - http://www.scopus.com/inward/record.url?scp=85125891701&partnerID=8YFLogxK
U2 - 10.1145/3508546.3508642
DO - 10.1145/3508546.3508642
M3 - Published conference contribution
AN - SCOPUS:85125891701
T3 - ACM International Conference Proceeding Series
BT - ACAI '21: Proceedings of the 2021 4th International Conference on Algorithms, Computing and Artificial Intelligence
PB - Association for Computing Machinery
T2 - 4th International Conference on Algorithms, Computing and Artificial Intelligence, ACAI 2021
Y2 - 22 December 2021 through 24 December 2021
ER -