Towards a Transparent and an Environmental-Friendly Approach for Short Text Topic Detection: A Comparison of Methods for Performance, Transparency, and Carbon Footprint

Sami Al Sulaimani; Andrew Starkey

doi:10.12720/jait.14.6.1240-1253

Sami Al Sulaimani^* (Corresponding Author), Andrew Starkey

^*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Online social media platforms have contributed significantly to the dissemination of user-generated information. Many studies have proposed various techniques to analyze publicly available short texts to automatically extract topics. The majority of these works have mainly focused on the competitive performance of the proposed approaches. In this paper, our main focus is on how to tackle this problem by incorporating two other important qualities: Transparency and Carbon Footprint. These two pillars are cornerstones to fulfill the emerging international demands and to adhere to the new regulations, such as “Right to Explanation” and “Green AI”. Based on these three qualities, this paper compares the most prominent algorithms in this field (specifically within the category of unsupervised-retrospective learning), such as: Latent Dirichlet Allocation, Non-Negative Matrix Factorization, and K-Means, as well as two most recent approaches, such as: BERTopic and Contextual Analysis. By using two different datasets, the methods were evaluated for Performance. On average, the results show that BERTopic is the best-performing approach overall in terms of Performance. However, Contextual Analysis achieves the best Performance in one of the two datasets used. When considering the three qualities together, the results demonstrate the effectiveness and the benefits of the Contextual Analysis method towards a more transparent and greener approach for the topic detection task.

Original language	English
Pages (from-to)	1240-1253
Number of pages	14
Journal	Journal of Advances in Information Technology
Volume	14
Issue number	6
Early online date	22 Nov 2023
DOIs	https://doi.org/10.12720/jait.14.6.1240-1253
Publication status	Published - 2023

Keywords

carbon footprint
contextual analysis
explainability
text analysis
topic detection
transparency
unsupervised machine learning

Access to Document

10.12720/jait.14.6.1240-1253 (Licence: CC BY-NC-ND)

Al-Sulamani_etStarkey_JAIT_Towards_a_Transparent_VOR
This is an open access article distributed under the Creative Commons Attribution License (CC BY-NC-ND 4.0), which permits use, distribution and reproduction in any medium, provided that the article is properly cited, the use is non-commercial and no modifications or adaptations are made.
Final published version, 1.9 MB (Licence: CC BY-NC-ND)

Cite this

Towards a Transparent and an Environmental-Friendly Approach for Short Text Topic Detection: A Comparison of Methods for Performance, Transparency, and Carbon Footprint. / Al Sulaimani, Sami (Corresponding Author); Starkey, Andrew.
In: Journal of Advances in Information Technology, Vol. 14, No. 6, 2023, p. 1240-1253.

@article{c6217f7b09584dd1a1b17e4cc232a43e,

title = "Towards a Transparent and an Environmental-Friendly Approach for Short Text Topic Detection: A Comparison of Methods for Performance, Transparency, and Carbon Footprint",

abstract = "Online social media platforms have contributed significantly to the dissemination of user-generated information. Many studies have proposed various techniques to analyze publicly available short texts to automatically extract topics. The majority of these works have mainly focused on the competitive performance of the proposed approaches. In this paper, our main focus is on how to tackle this problem by incorporating two other important qualities: Transparency and Carbon Footprint. These two pillars are cornerstones to fulfill the emerging international demands and to adhere to the new regulations, such as “Right to Explanation” and “Green AI”. Based on these three qualities, this paper compares the most prominent algorithms in this field (specifically within the category of unsupervised-retrospective learning), such as: Latent Dirichlet Allocation, Non-Negative Matrix Factorization, and K-Means, as well as two most recent approaches, such as: BERTopic and Contextual Analysis. By using two different datasets, the methods were evaluated for Performance. On average, the results show that BERTopic is the best-performing approach overall in terms of Performance. However, Contextual Analysis achieves the best Performance in one of the two datasets used. When considering the three qualities together, the results demonstrate the effectiveness and the benefits of the Contextual Analysis method towards a more transparent and greener approach for the topic detection task.",

keywords = "carbon footprint, contextual analysis, explainability, text analysis, topic detection, transparency, unsupervised machine learning",

author = "{Al Sulaimani}, Sami and Starkey, Andrew",

year = "2023",

doi = "10.12720/jait.14.6.1240-1253",

language = "English",

volume = "14",

pages = "1240--1253",

journal = "Journal of Advances in Information Technology",

issn = "1798-2340",

publisher = "Engineering and Technology Publishing",

number = "6",

}

TY - JOUR

T1 - Towards a Transparent and an Environmental-Friendly Approach for Short Text Topic Detection

T2 - A Comparison of Methods for Performance, Transparency, and Carbon Footprint

AU - Al Sulaimani, Sami

AU - Starkey, Andrew

PY - 2023

Y1 - 2023

N2 - Online social media platforms have contributed significantly to the dissemination of user-generated information. Many studies have proposed various techniques to analyze publicly available short texts to automatically extract topics. The majority of these works have mainly focused on the competitive performance of the proposed approaches. In this paper, our main focus is on how to tackle this problem by incorporating two other important qualities: Transparency and Carbon Footprint. These two pillars are cornerstones to fulfill the emerging international demands and to adhere to the new regulations, such as “Right to Explanation” and “Green AI”. Based on these three qualities, this paper compares the most prominent algorithms in this field (specifically within the category of unsupervised-retrospective learning), such as: Latent Dirichlet Allocation, Non-Negative Matrix Factorization, and K-Means, as well as two most recent approaches, such as: BERTopic and Contextual Analysis. By using two different datasets, the methods were evaluated for Performance. On average, the results show that BERTopic is the best-performing approach overall in terms of Performance. However, Contextual Analysis achieves the best Performance in one of the two datasets used. When considering the three qualities together, the results demonstrate the effectiveness and the benefits of the Contextual Analysis method towards a more transparent and greener approach for the topic detection task.

AB - Online social media platforms have contributed significantly to the dissemination of user-generated information. Many studies have proposed various techniques to analyze publicly available short texts to automatically extract topics. The majority of these works have mainly focused on the competitive performance of the proposed approaches. In this paper, our main focus is on how to tackle this problem by incorporating two other important qualities: Transparency and Carbon Footprint. These two pillars are cornerstones to fulfill the emerging international demands and to adhere to the new regulations, such as “Right to Explanation” and “Green AI”. Based on these three qualities, this paper compares the most prominent algorithms in this field (specifically within the category of unsupervised-retrospective learning), such as: Latent Dirichlet Allocation, Non-Negative Matrix Factorization, and K-Means, as well as two most recent approaches, such as: BERTopic and Contextual Analysis. By using two different datasets, the methods were evaluated for Performance. On average, the results show that BERTopic is the best-performing approach overall in terms of Performance. However, Contextual Analysis achieves the best Performance in one of the two datasets used. When considering the three qualities together, the results demonstrate the effectiveness and the benefits of the Contextual Analysis method towards a more transparent and greener approach for the topic detection task.

KW - carbon footprint

KW - contextual analysis

KW - explainability

KW - text analysis

KW - topic detection

KW - transparency

KW - unsupervised machine learning

UR - http://www.scopus.com/inward/record.url?scp=85177661188&partnerID=8YFLogxK

U2 - 10.12720/jait.14.6.1240-1253

DO - 10.12720/jait.14.6.1240-1253

M3 - Article

AN - SCOPUS:85177661188

SN - 1798-2340

VL - 14

SP - 1240

EP - 1253

JO - Journal of Advances in Information Technology

JF - Journal of Advances in Information Technology

IS - 6

ER -
