How redundant is it? An empirical analysis on linked datasets

Honghan Wu*, Boris Villazon-Terrazas, Jeff Z. Pan, Jose Manuel Gomez-Perez

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Published conference contribution

3 Citations (Scopus)
11 Downloads (Pure)


Data redundancy resides in most, if not all, information systems, and Linked Data is no exception. Existing approaches try to avoid data redundancy by proposing compression techniques or succinct data structures. However, data redundancy in Linked Data is sometimes useful; for example, ontology-based data access can make use of A-Box redundancies to avoid unnecessary query rewritings. Whether you want to avoid redundancy or make use of it, a good understanding of it will facilitate the task, e.g., identifying the exact redundant parts that could be utilised, or choosing the most effective technique to compress a particular dataset. Unfortunately, little effort has been put into making data redundancy explicit to data users. In this paper, we introduce a systematic categorisation of Linked Data redundancy and propose a graph-pattern-based approach for analysing it efficiently. Analysis results on representative datasets lead to one main conclusion: redundancy-aware techniques are needed.

Original language: English
Title of host publication: Proceedings of the 5th International Workshop on Consuming Linked Data (COLD 2014) co-located with the 13th International Semantic Web Conference (ISWC 2014), Riva del Garda, Italy, October 20, 2014
Editors: Olaf Hartig, Aidan Hogan, Juan Sequeda
Number of pages: 12
Publication status: Published, 7 Oct 2014
Event: 5th International Workshop on Consuming Linked Data, Riva del Garda, Italy
Duration: 20 Oct 2014 to 20 Oct 2014

Publication series

Name: CEUR Workshop Proceedings
ISSN (Electronic): 1613-0073


Workshop: 5th International Workshop on Consuming Linked Data
Abbreviated title: COLD 2014
City: Riva del Garda

