Auto-generating textual data stories using data science pipelines

Research output: Chapter in Book/Report/Conference proceeding (Published conference contribution)

Abstract

Understanding a dataset directly is challenging, but transforming the results of data analysis into data stories can help people build mental models and understand the dataset more easily. In this paper, we present a new framework for data-to-text natural language generation (NLG) that generates data stories for specific personas. To assess the feasibility of this method and whether human-generated stories are consistent with those generated by data science pipelines, we present two experiments: a data story study with 3 financial experts, 4 Ph.D. students, and 20 Amazon Mechanical Turk workers, which yielded several human-written data stories; and a validation study in which 39 Amazon Mechanical Turk workers assessed the usability and understandability of 9 high-quality data stories written by humans and by machine. We conduct a qualitative analysis of the human-written data stories to determine what people consider when writing data stories and whether the human-generated stories are consistent with the one generated by the data science pipeline. The experimental results show that readers comprehend machine-written data stories as well as they comprehend human-written ones.

Original language: English
Title of host publication: ACAI '21: Proceedings of the 2021 4th International Conference on Algorithms, Computing and Artificial Intelligence
Publisher: Association for Computing Machinery
Number of pages: 8
ISBN (Electronic): 9781450385053
Publication status: Published - 25 Dec 2021
Event: 4th International Conference on Algorithms, Computing and Artificial Intelligence, ACAI 2021 - Sanya, China
Duration: 22 Dec 2021 to 24 Dec 2021

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 4th International Conference on Algorithms, Computing and Artificial Intelligence, ACAI 2021
Country/Territory: China
City: Sanya
Period: 22/12/21 to 24/12/21

Keywords

  • Data science
  • Data sensemaking
  • Data storytelling
  • NLP
