TOPIC ANALYSIS OF STUDENT FEEDBACK ON LEARNING MANAGEMENT SYSTEMS USING BERTOPIC: A COMPARATIVE STUDY OF INDOBERT, DISTILBERT, AND SBERT

Aldi Aditya Perdana; Sajarwo Anggai; Winarni

doi:10.15575/jb.v4i2.54192

Authors

Aldi Aditya Perdana, Universitas Pamulang, Tangerang Selatan, Indonesia
Sajarwo Anggai, Universitas Pamulang, Tangerang Selatan, Indonesia
Winarni, Universitas Pamulang, Tangerang Selatan, Indonesia

Abstract

The widespread adoption of Learning Management Systems (LMS) in digital education has generated large volumes of student feedback as unstructured free text, making manual analysis increasingly impractical. This study identifies the dominant themes in student feedback on LMS platforms and compares the performance of three Transformer-based embedding models for topic modeling. The approach uses BERTopic with IndoBERT, DistilBERT, and Sentence-BERT (SBERT) embeddings. Student feedback was collected from an institutional LMS and processed in three stages: text preprocessing, embedding generation, and topic modeling. Model performance was evaluated with four coherence metrics (c_v, c_npmi, u_mass, and c_uci), topic diversity, and the proportion of outlier documents. The IndoBERT-family model achieves the highest coherence scores, particularly on c_v and c_npmi, suggesting that its topics are the most semantically consistent. DistilBERT produces the lowest proportion of outliers but yields fewer topics, while SBERT balances topic quality and thematic diversity. These findings show that the choice of embedding model significantly influences topic modeling quality for Indonesian-language student feedback.
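Two of the evaluation measures named above, topic diversity and the proportion of outlier documents, are simple enough to sketch directly. The abstract does not give the authors' exact definitions, so the version below is an assumption: topic diversity is taken as the share of unique words among the top-k words of all topics, and outliers are taken to be documents labelled -1, the label BERTopic assigns to documents that fall outside every topic. The function names and the toy data are hypothetical.

```python
from typing import Sequence


def topic_diversity(topic_words: Sequence[Sequence[str]], top_k: int = 10) -> float:
    """Share of unique words among the top_k words of every topic.

    1.0 means no word is shared between topics; lower values mean more overlap.
    This is a common definition, assumed here rather than taken from the paper.
    """
    top_words = [word for words in topic_words for word in words[:top_k]]
    if not top_words:
        return 0.0
    return len(set(top_words)) / len(top_words)


def outlier_proportion(labels: Sequence[int], outlier_label: int = -1) -> float:
    """Fraction of documents carrying the outlier label (-1 by assumption)."""
    if not labels:
        return 0.0
    outliers = sum(1 for label in labels if label == outlier_label)
    return outliers / len(labels)


# Hypothetical toy data: two topics that share one word, and eight documents,
# two of which were not assigned to any topic.
example_topics = [["login", "error", "akses"], ["materi", "video", "akses"]]
example_labels = [0, 1, -1, 0, -1, 1, 0, 0]

diversity = topic_diversity(example_topics, top_k=3)
outliers = outlier_proportion(example_labels)
print(f"topic diversity: {diversity:.3f}")
print(f"outlier proportion: {outliers:.3f}")
```

Under these assumptions, a model that assigns nearly every document to a topic but produces only a few topics would score well on outlier proportion while offering limited thematic coverage, which is the trade-off the abstract reports for DistilBERT.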
