Investigating the Performance of Data Complexity & Instance Hardness Measures as A Meta-Feature in Overlapping Classes Problem

Omaimah Al Hosni; Andrew Starkey

doi:10.1145/3616131.3616132

Corresponding author: Omaimah Al Hosni

Research output: Chapter in Book/Report/Conference proceeding › Published conference contribution


Abstract

Because the quality of a meta-learning recommendation depends on the quality of the meta-features it is based on, a common problem in meta-learning is establishing a good collection of meta-features that best represents the dataset properties. Many meta-feature measures and methods have therefore been proposed during the last decade to describe the characteristics of the data. However, little attention has been paid to validating whether the meta-features reflect the actual data properties. In particular, it is unclear whether meta-feature analysis is negatively affected by complex data characteristics, such as class overlap caused by the distortion that noisy features impose at the decision boundary of the classes, and thereby produces biased meta-learning recommendations that do not match the actual data characteristics (by either overestimating or underestimating the complexity). This issue is crucial to the success of the meta-learning model, since the selection of the learning algorithm is based on meta-feature analysis. In this work, we investigate it by assessing the performance of Complexity Measures (global, data-level measures) and Instance Hardness Measures (local, instance-level measures) as meta-features in reflecting the actual data complexity associated with a high degree of class overlap. We focus on the overlapping classes problem because several studies have shown that this issue, which is associated with most real-world datasets, contributes significantly to degrading prediction accuracy.
We chose these measures from among the meta-feature methods proposed in the literature because they were mainly proposed to estimate data complexity from geometrical descriptions that focus on the class overlap imposed by feature values, which matches the data problem this study investigates.

Original language	English
Title of host publication	ICCBDC '23
Subtitle of host publication	Proceedings of the 2023 7th International Conference on Cloud and Big Data Computing
Place of Publication	New York
Publisher	Association for Computing Machinery
Pages	1-9
Number of pages	9
ISBN (Electronic)	9798400707339
DOIs	https://doi.org/10.1145/3616131.3616132
Publication status	Published - 17 Aug 2023
Event	2023 7th International Conference on Cloud and Big Data Computing, ICCBDC 2023 - Manchester, United Kingdom; Duration: 17 Aug 2023 → 19 Aug 2023

Publication series

Name	ACM International Conference Proceeding Series

Conference

Conference	2023 7th International Conference on Cloud and Big Data Computing, ICCBDC 2023
Country/Territory	United Kingdom
City	Manchester
Period	17/08/23 → 19/08/23

Bibliographical note

Open Access via the ACM Agreement

Keywords

Class Overlapping
Data Complexity Measure
Instance Hardness Measures
Meta-Feature
Meta-Learning

Access to Document

10.1145/3616131.3616132 (Licence: CC BY)

Hosni_etal_ACM_Investigating_performance_data_VOR
Copyright © 2023 Owner/Author This work is licensed under a Creative Commons Attribution International 4.0 License. https://creativecommons.org/licenses/by/4.0/
Final published version, 597 KB (Licence: CC BY)

Cite this

Al Hosni, O., & Starkey, A. (2023). Investigating the Performance of Data Complexity & Instance Hardness Measures as A Meta-Feature in Overlapping Classes Problem. In ICCBDC '23: Proceedings of the 2023 7th International Conference on Cloud and Big Data Computing (pp. 1-9). (ACM International Conference Proceeding Series). Association for Computing Machinery. https://doi.org/10.1145/3616131.3616132


@inproceedings{ec6154fa713348f49c6a607ffe03f1bf,
title = "Investigating the Performance of Data Complexity & Instance Hardness Measures as A Meta-Feature in Overlapping Classes Problem",
keywords = "Class Overlapping, Data Complexity Measure, Instance Hardness Measures, Meta-Feature, Meta-Learning",
author = "{Al Hosni}, Omaimah and Andrew Starkey",
note = "Open Access via the ACM Agreement; 2023 7th International Conference on Cloud and Big Data Computing, ICCBDC 2023 ; Conference date: 17-08-2023 Through 19-08-2023",
year = "2023",
month = aug,
day = "17",
doi = "10.1145/3616131.3616132",
language = "English",
series = "ACM International Conference Proceeding Series",
publisher = "Association for Computing Machinery",
pages = "1--9",
booktitle = "ICCBDC '23",
}

TY - GEN
T1 - Investigating the Performance of Data Complexity & Instance Hardness Measures as A Meta-Feature in Overlapping Classes Problem
AU - Al Hosni, Omaimah
AU - Starkey, Andrew
N1 - Open Access via the ACM Agreement
PY - 2023/8/17
Y1 - 2023/8/17
KW - Class Overlapping
KW - Data Complexity Measure
KW - Instance Hardness Measures
KW - Meta-Feature
KW - Meta-Learning
UR - http://www.scopus.com/inward/record.url?scp=85176012085&partnerID=8YFLogxK
U2 - 10.1145/3616131.3616132
DO - 10.1145/3616131.3616132
M3 - Published conference contribution
AN - SCOPUS:85176012085
T3 - ACM International Conference Proceeding Series
SP - 1
EP - 9
BT - ICCBDC '23
PB - Association for Computing Machinery
CY - New York
T2 - 2023 7th International Conference on Cloud and Big Data Computing, ICCBDC 2023
Y2 - 17 August 2023 through 19 August 2023
ER -
