Complementing privacy and utility trade-off with self-organising maps

Kabiru Mohammed* (Corresponding Author), Aladdin Ayesh, Eerke Boiten

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

5 Citations (Scopus)
31 Downloads (Pure)

Abstract

In recent years, data-enabled technologies have intensified the rate and scale at which organisations collect and analyse data. Data mining techniques are applied to realise the full potential of large-scale data analysis. These techniques are highly efficient in sifting through big data to extract hidden knowledge and assist evidence-based decisions, offering significant benefits to their adopters. However, this capability is constrained by important legal, ethical and reputational concerns. These concerns arise because data mining techniques can be exploited to draw inferences about sensitive data, thus posing severe threats to individuals’ privacy. Studies have shown that Privacy-Preserving Data Mining (PPDM) can adequately address this privacy risk while still permitting knowledge extraction in mining processes. Several published works in this area have used clustering techniques to enforce anonymisation models on private data; these work by grouping the data into clusters using a quality measure and generalising the data in each group separately to achieve an anonymisation threshold. However, existing approaches do not work well with high-dimensional data, since it is difficult to develop good groupings without incurring excessive information loss. Our work aims to complement this balancing act by optimising utility in PPDM processes. To illustrate this, we propose a hybrid approach that combines self-organising maps with conventional privacy-based clustering algorithms. We demonstrate through experimental evaluation that our approach produces more utility for data mining tasks and outperforms conventional privacy-based clustering algorithms. This approach can significantly enable large-scale analysis of data in a privacy-preserving and trustworthy manner.
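The group-then-generalise step that the abstract attributes to clustering-based anonymisation can be illustrated with a minimal sketch. This is not the authors' hybrid method and involves no self-organising map: a plain sort-and-chunk grouping stands in for a real clustering algorithm, the threshold is assumed to be k-anonymity (one of the listed keywords), and the record layout `(age, hours_per_week)` and all function names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical quasi-identifiers per record: (age, hours_per_week).
Record = Tuple[int, int]
Generalised = Tuple[Tuple[int, int], ...]


def group_records(records: List[Record], k: int) -> List[List[Record]]:
    """Sort the records and cut them into consecutive groups of at least k.

    Assumption: sorting is a crude stand-in for a clustering algorithm.
    """
    if k < 1 or len(records) < k:
        raise ValueError("need k >= 1 and at least k records")
    ordered = sorted(records)
    groups = [ordered[i:i + k] for i in range(0, len(ordered), k)]
    # Fold an undersized final group into its neighbour so that
    # every group holds at least k records.
    if len(groups) > 1 and len(groups[-1]) < k:
        tail = groups.pop()
        groups[-1].extend(tail)
    return groups


def generalise(group: List[Record]) -> List[Generalised]:
    """Replace every attribute value by the (min, max) range of its group."""
    columns = list(zip(*group))
    ranges = tuple((min(col), max(col)) for col in columns)
    return [ranges for _ in group]


def anonymise(records: List[Record], k: int) -> List[Generalised]:
    """Return generalised records (in sorted order, not input order)."""
    result: List[Generalised] = []
    for group in group_records(records, k):
        result.extend(generalise(group))
    return result


if __name__ == "__main__":
    sample = [(25, 40), (27, 38), (52, 45), (49, 50), (33, 20)]
    for row in anonymise(sample, k=2):
        print(row)
```

Each generalised record is shared by all members of its group, so no record is distinguishable from fewer than k - 1 others on these attributes. Wider ranges mean more information loss, which is the cost the abstract describes as hard to control when the data has many dimensions.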

Original language: English
Article number: 20
Journal: Cryptography
Volume: 5
Issue number: 3
Early online date: 17 Aug 2021
Publication status: Published - Sept 2021
Externally published: Yes

Bibliographical note

This research received no external funding.

Data Availability Statement

Publicly available datasets were analyzed in this study. This data can be found here: UCI Machine Learning Repository [https://archive.ics.uci.edu/ml/datasets/Adult], accessed on 7 January 2021.

Keywords

  • Clustering
  • K-anonymity
  • Privacy preserving data mining
  • Self-organising map
