Can Complexity Measures and Instance Hardness Measures Reflect the Actual Complexity of Microarray Data?

Omaimah Al Hosni, Andrew Starkey* (Corresponding Author)

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding (published conference contribution)

Abstract

Despite the research community's significant contributions to Microarray data analysis, little attention has been paid to understanding Microarray dataset characteristics using Complexity Measures and Instance Hardness Measures; this study therefore aims to examine the performance of both kinds of measures on datasets with Microarray properties. The study assumes that, because these measures are data dependent, they might also be negatively affected by complex data characteristics, just as the classification algorithm is, and so provide values that do not reflect the actual data complexity. To investigate this, we adopt a different experiment strategy from other works in this context: a controlled environment with synthetic data that match Microarray properties, used to assess the effect of each data challenge individually without relying on the classification algorithm's performance. The study argues that the strategy adopted by others, which correlates the classification algorithm's performance with the performance of the measures, is not a good independent indicator for validating how well the measures estimate the actual data difficulty, nor for showing the causes of the poor prediction of the learning algorithm’s performance, as both are data dependent. The experiment outcomes indicate that the 35 measures covered in this study responded differently to each data challenge, owing to the different assumptions they adopt and their differing sensitivity to the data challenges. Thus, the study confirms that complex data characteristics result in the measures not reflecting the actual data complexity.

Original language: English
Title of host publication: Machine Learning, Optimization, and Data Science - 9th International Conference, LOD 2023
Editors: Giuseppe Nicosia, Varun Ojha, Emanuele La Malfa, Gabriele La Malfa, Panos M. Pardalos, Renato Umeton
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 445-462
Number of pages: 18
ISBN (Print): 9783031539688
Publication status: Published - 16 Feb 2024
Event: 9th International Conference on Machine Learning, Optimization, and Data Science, LOD 2023 - Grasmere, United Kingdom
Duration: 22 Sept 2023 to 26 Sept 2023

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 14505 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 9th International Conference on Machine Learning, Optimization, and Data Science, LOD 2023
Country/Territory: United Kingdom
City: Grasmere
Period: 22/09/23 to 26/09/23

Keywords

  • Complexity Measures
  • High Dimensionality
  • Imbalanced Classes
  • Instance Hardness Measures
  • Small Sample Size
