A Performance Study of TF-IDF and Word2Vec in Cyberbullying Sentiment Analysis
DOI: https://doi.org/10.62951/router.v2i2.76
Keywords: Cyberbullying, Sentiment Analysis, SVM, TF-IDF, Word2Vec
Abstract
As of August 14, 2023, Indonesia had approximately 228 million social media users, a figure projected to reach 267 million by 2028. Social media can spread both positive and negative information, and cyberbullying is one of its negative effects. Consequently, much machine learning research has focused on sentiment analysis. A crucial step in sentiment analysis is word weighting, and the two most common word weighting methods are TF-IDF and Word2Vec. Comparing them shows which one yields better classification results, so that cyberbullying sentiment on social media can be detected more accurately. Across nine test scenarios, TF-IDF performed better than Word2Vec in this study, with an accuracy of 84%.
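To make the word weighting step concrete, the following is a minimal sketch of TF-IDF in plain Python. It is not the study's implementation: the abstract does not give the exact formula, so the sketch assumes one common textbook variant (term frequency divided by document length, and idf = ln(N / df)). The function name `tf_idf` and the toy corpus are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    The study's exact formula is not stated in the abstract.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus, already lowercased and tokenized.
corpus = [
    ["you", "are", "stupid"],
    ["you", "are", "kind"],
    ["have", "a", "nice", "day"],
]
weights = tf_idf(corpus)
```

In this toy corpus, "stupid" appears in only one document and so receives a higher weight in the first document than "you", which appears in two. The resulting sparse weights could then be passed to a classifier such as the SVM named in the keywords. Word2Vec, by contrast, learns dense vectors from word context and is not shown here.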
License
Copyright (c) 2024 Router : Jurnal Teknik Informatika dan Terapan

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.