Studi Performa TF-IDF dan Word2Vec Pada Analisis Sentimen Cyberbullying (A Performance Study of TF-IDF and Word2Vec in Cyberbullying Sentiment Analysis)

Authors

  • Ahmad Hilman Dani, Universitas Pembangunan Nasional “Veteran” Jawa Timur
  • Eva Yulia Puspaningrum, Universitas Pembangunan Nasional “Veteran” Jawa Timur
  • Retno Mumpuni, Universitas Pembangunan Nasional “Veteran” Jawa Timur

DOI:

https://doi.org/10.62951/router.v2i2.76

Keywords:

Cyberbullying, Sentiment Analysis, SVM, TF-IDF, Word2Vec

Abstract

As of August 14, 2023, Indonesia had approximately 228 million social media users, a number expected to reach 267 million by 2028. Social media can spread both positive and negative information, and cyberbullying is one of its negative effects. Consequently, much machine learning research has been devoted to sentiment analysis. One crucial step in sentiment analysis is word weighting, which converts text into numeric features, and two of the most common methods are TF-IDF and Word2Vec. This study compares the two to determine which produces better classification results, so that cyberbullying sentiment on social media can be detected more accurately. Across nine test scenarios, TF-IDF performed better than Word2Vec, reaching an accuracy of 84%.
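
To illustrate the two kinds of features the abstract compares, here is a minimal sketch in plain Python. It is not the authors' pipeline: the comments, the 2-dimensional vectors, and the function names `tf_idf` and `mean_vector` are all made up for illustration. The TF-IDF variant shown is the textbook one (term frequency times the log of inverse document frequency); libraries often use smoothed variants. Averaging word vectors is one common way to turn Word2Vec-style embeddings into a document vector, and is assumed here rather than taken from the paper.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            # Relative term frequency times log inverse document frequency.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def mean_vector(tokens, embeddings):
    """Average the vectors of the tokens that have an embedding."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return []
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# Toy, made-up comments (not from the study's dataset).
docs = [
    "you are so stupid".split(),
    "you are so kind".split(),
    "stupid stupid post".split(),
]
weights = tf_idf(docs)

# Made-up 2-dimensional vectors standing in for trained Word2Vec embeddings.
toy_embeddings = {"stupid": [1.0, 0.0], "kind": [0.0, 1.0], "you": [0.5, 0.5]}
doc_vec = mean_vector(docs[0], toy_embeddings)
```

In this toy corpus, "kind" appears in only one comment and so receives a higher TF-IDF weight in the second comment than "you", which appears in two. Either representation could then be passed to a classifier such as an SVM, which the keywords list mentions.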

Published

2024-06-04

How to Cite

Ahmad Hilman Dani, Eva Yulia Puspaningrum, & Retno Mumpuni. (2024). Studi Performa TF-IDF dan Word2Vec Pada Analisis Sentimen Cyberbullying. Router : Jurnal Teknik Informatika Dan Terapan, 2(2), 94–106. https://doi.org/10.62951/router.v2i2.76
