TY - JOUR
T1 - Towards a Transparent and an Environmental-Friendly Approach for Short Text Topic Detection
T2 - A Comparison of Methods for Performance, Transparency, and Carbon Footprint
AU - Al Sulaimani, Sami
AU - Starkey, Andrew
PY - 2023
Y1 - 2023
N2 - Online social media platforms have contributed significantly to the dissemination of user-generated information. Many studies have proposed various techniques to analyze publicly available short texts to automatically extract topics. The majority of these works have mainly focused on the competitive performance of the proposed approaches. In this paper, our main focus is on how to tackle this problem by incorporating two other important qualities: Transparency and Carbon Footprint. These two pillars are cornerstones to fulfill the emerging international demands and to adhere to the new regulations, such as “Right to Explanation” and “Green AI”. Based on these three qualities, this paper compares the most prominent algorithms in this field (specifically within the category of unsupervised-retrospective learning), such as: Latent Dirichlet Allocation, Non-Negative Matrix Factorization, and K-Means, as well as two most recent approaches, such as: BERTopic and Contextual Analysis. By using two different datasets, the methods were evaluated for Performance. On average, the results show that BERTopic is the best-performing approach overall in terms of Performance. However, Contextual Analysis achieves the best Performance in one of the two datasets used. When considering the three qualities together, the results demonstrate the effectiveness and the benefits of the Contextual Analysis method towards a more transparent and greener approach for the topic detection task.
AB - Online social media platforms have contributed significantly to the dissemination of user-generated information. Many studies have proposed various techniques to analyze publicly available short texts to automatically extract topics. The majority of these works have mainly focused on the competitive performance of the proposed approaches. In this paper, our main focus is on how to tackle this problem by incorporating two other important qualities: Transparency and Carbon Footprint. These two pillars are cornerstones to fulfill the emerging international demands and to adhere to the new regulations, such as “Right to Explanation” and “Green AI”. Based on these three qualities, this paper compares the most prominent algorithms in this field (specifically within the category of unsupervised-retrospective learning), such as: Latent Dirichlet Allocation, Non-Negative Matrix Factorization, and K-Means, as well as two most recent approaches, such as: BERTopic and Contextual Analysis. By using two different datasets, the methods were evaluated for Performance. On average, the results show that BERTopic is the best-performing approach overall in terms of Performance. However, Contextual Analysis achieves the best Performance in one of the two datasets used. When considering the three qualities together, the results demonstrate the effectiveness and the benefits of the Contextual Analysis method towards a more transparent and greener approach for the topic detection task.
KW - carbon footprint
KW - contextual analysis
KW - explainability
KW - text analysis
KW - topic detection
KW - transparency
KW - unsupervised machine learning
UR - http://www.scopus.com/inward/record.url?scp=85177661188&partnerID=8YFLogxK
U2 - 10.12720/jait.14.6.1240-1253
DO - 10.12720/jait.14.6.1240-1253
M3 - Article
AN - SCOPUS:85177661188
SN - 1798-2340
VL - 14
SP - 1240
EP - 1253
JO - Journal of Advances in Information Technology
JF - Journal of Advances in Information Technology
IS - 6
ER -