Implementing XGBoost Model for Predicting Customer Churn in E-Commerce Platforms

Authors

  • Andy Hermawan, Universitas Indraprasta PGRI
  • Aji Saputra, Universitas Khairun
  • Muhammad Dhika Rafi, Purwadhika Digital Technology School
  • Syafiq Basmallah, Purwadhika Digital Technology School
  • Yilmaz Trigumari Syah Putra, Purwadhika Digital Technology School
  • Wafa Nabila, Purwadhika Digital Technology School

DOI:

https://doi.org/10.62951/repeater.v3i2.401

Keywords:

Churn Prediction, E-commerce, Machine Learning, XGBoost

Abstract

Customer churn is a major challenge in e-commerce, directly affecting revenue and profit. This study develops a machine learning model based on XGBoost to predict the probability that a customer will churn. To handle class imbalance, SMOTE was applied as a resampling method, and hyperparameter tuning was performed to improve performance. The model was evaluated using the F2-score, which weights recall more heavily than precision while still accounting for precision. After hyperparameter tuning, the XGBoost model with SMOTE achieves an F2-score of 0.849 on the test data. The model can help businesses identify at-risk customers early, enabling proactive retention strategies.
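
To make the evaluation metric concrete, the following is a minimal sketch of the general F-beta score, of which the F2-score is the beta = 2 case. The function name `fbeta_from_counts` and the confusion-matrix counts are hypothetical illustrations, not code or figures from the study; the formula itself is the standard definition.

```python
def fbeta_from_counts(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """Compute the F-beta score from confusion-matrix counts.

    F_beta = (1 + beta^2) * TP / ((1 + beta^2) * TP + beta^2 * FN + FP)

    With beta = 2, a missed churner (false negative) costs four times as
    much in the denominator as a false alarm (false positive).
    """
    b2 = beta * beta
    denominator = (1 + b2) * tp + b2 * fn + fp
    # Return 0.0 when there are no positives at all, to avoid dividing by zero.
    return (1 + b2) * tp / denominator if denominator else 0.0


# Hypothetical counts (NOT results from the study):
# 80 churners caught, 40 false alarms, 20 churners missed.
tp, fp, fn = 80, 40, 20

f1 = fbeta_from_counts(tp, fp, fn, beta=1.0)
f2 = fbeta_from_counts(tp, fp, fn, beta=2.0)

print(f"F1 = {f1:.3f}")  # F1 = 0.727
print(f"F2 = {f2:.3f}")  # F2 = 0.769
```

In this illustrative case recall (0.80) exceeds precision (about 0.67), so the F2-score is higher than the F1-score. This is the behavior wanted in churn prediction, where missing an at-risk customer is usually costlier than contacting a loyal one.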

Published

2025-03-12

How to Cite

Andy Hermawan, Aji Saputra, Muhammad Dhika Rafi, Syafiq Basmallah, Yilmaz Trigumari Syah Putra, & Wafa Nabila. (2025). Implementing XGBoost Model for Predicting Customer Churn in E-Commerce Platforms. Repeater : Publikasi Teknik Informatika Dan Jaringan, 3(2), 17–31. https://doi.org/10.62951/repeater.v3i2.401
