Implementing XGBoost Model for Predicting Customer Churn in E-Commerce Platforms
DOI: https://doi.org/10.62951/repeater.v3i2.401
Keywords: Churn Prediction, E-commerce, Machine Learning, XGBoost
Abstract
Customer churn is a major challenge in e-commerce because it directly reduces revenue and profit. This study develops a machine learning model based on XGBoost to predict the probability that a customer will churn. To handle class imbalance, SMOTE was applied as a resampling method, and hyperparameter tuning was performed to improve performance. The model was evaluated using the F2-score, which weights recall more heavily than precision while still accounting for both. The results show that the XGBoost model with SMOTE performs well: after tuning, it achieves an F2-score of 0.849 on the test data. The model can help businesses identify at-risk customers early, enabling proactive retention strategies.
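To make the evaluation metric concrete, the following is a minimal sketch of how an F2-score is computed from precision and recall, or directly from confusion-matrix counts. The function names and the example numbers are illustrative assumptions, not values or code from the study; only the general F-beta definition with beta = 2 is standard.

```python
def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """F-beta score from precision and recall; beta=2 gives the F2-score."""
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0:
        return 0.0  # convention when precision and recall are both zero
    return (1 + b2) * precision * recall / denom


def f_beta_from_counts(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """Same score computed from true positives, false positives, false negatives."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    if denom == 0:
        return 0.0
    return (1 + b2) * tp / denom


# Hypothetical churn classifier (numbers invented for illustration):
# 90 churners caught, 60 false alarms, 10 churners missed.
tp, fp, fn = 90, 60, 10
precision = tp / (tp + fp)  # 0.6
recall = tp / (tp + fn)     # 0.9

f1 = f_beta(precision, recall, beta=1.0)
f2 = f_beta(precision, recall, beta=2.0)
f2_counts = f_beta_from_counts(tp, fp, fn, beta=2.0)
print(f"F1={f1:.3f}  F2={f2:.3f}  F2 from counts={f2_counts:.3f}")
```

With high recall and moderate precision, the F2-score comes out higher than the F1-score, which reflects why a recall-weighted metric suits churn prediction: missing a customer who is about to leave is typically costlier than contacting one who would have stayed.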
License
Copyright (c) 2025 Repeater : Publikasi Teknik Informatika dan Jaringan

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.