Implementation of an ETL Pipeline and ARIMA Predictive Modeling to Map Consumer Purchasing Patterns in a Marketplace Dataset
DOI: https://doi.org/10.62951/repeater.v4i1.799
Keywords: ARIMA, Data Engineering, ETL, Forecasting, Marketplace
Abstract
In the rapidly evolving digital economy, the ability to anticipate transaction surges is a strategic asset that helps marketplace platforms maintain operational efficiency. This research aims to build an accurate daily transaction volume forecasting system by combining an Extract, Transform, and Load (ETL) pipeline with Autoregressive Integrated Moving Average (ARIMA) predictive modeling. The dataset, dataset_olshop.csv, contains transaction history for the entire year of 2025. The ETL stage focused on data cleaning and handling missing values. Time series analysis began with the Augmented Dickey-Fuller (ADF) stationarity test, which yielded a p-value of 0.000006; the unit-root hypothesis is therefore rejected, which is consistent with the differencing order d = 0 of the selected model. The model parameters were selected with the auto_arima algorithm, which identified ARIMA(2,0,0) as the best configuration. On the test data the model shows fairly stable performance, with a Root Mean Squared Error (RMSE) of 2.002 and a Mean Absolute Error (MAE) of 1.704. The findings show a slightly higher purchasing level during the mid-month and end-of-month periods, with an average of 5.52 daily transactions, compared with 5.48 at the beginning of the month. The 30-day forecast gives online store managers a basis for proactively adjusting inventory and logistics workforce allocation. This research concludes that integrating data engineering techniques with statistical analysis can provide predictive solutions for the dynamics of the digital market.
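As an illustration of the modeling and evaluation steps described in the abstract, the sketch below fits an ARIMA(2,0,0) model, which is an AR(2) model with a constant, and scores a 30-day recursive forecast with RMSE and MAE. It is a minimal sketch under stated assumptions, not the authors' implementation: the coefficients are estimated by ordinary least squares rather than by auto_arima, the helper names (`fit_ar2`, `forecast_ar2`, `rmse`, `mae`) are hypothetical, and the series is synthetic Poisson data standing in for dataset_olshop.csv, so the printed metrics will not match the reported values.

```python
import numpy as np


def fit_ar2(y):
    """Estimate y[t] = c + phi1*y[t-1] + phi2*y[t-2] by ordinary least squares."""
    y = np.asarray(y, dtype=float)
    # Design matrix columns: intercept, lag 1, lag 2 (rows are t = 2 .. n-1).
    X = np.column_stack([np.ones(len(y) - 2), y[1:-1], y[:-2]])
    coef, *_ = np.linalg.lstsq(X, y[2:], rcond=None)
    return coef  # array [c, phi1, phi2]


def forecast_ar2(coef, history, steps):
    """Recursive multi-step forecast: each prediction feeds the next one."""
    c, phi1, phi2 = coef
    prev2, prev1 = float(history[-2]), float(history[-1])
    out = []
    for _ in range(steps):
        nxt = c + phi1 * prev1 + phi2 * prev2
        out.append(nxt)
        prev2, prev1 = prev1, nxt
    return np.array(out)


def rmse(actual, predicted):
    """Root Mean Squared Error."""
    diff = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def mae(actual, predicted):
    """Mean Absolute Error."""
    diff = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(diff)))


# ASSUMPTION: synthetic stand-in for one year of daily transaction counts.
# This is NOT the paper's data; it only makes the sketch runnable.
rng = np.random.default_rng(0)
daily_counts = rng.poisson(lam=5.5, size=365).astype(float)

# Hold out the last 30 days as test data (the split used in the paper is not stated).
train, test = daily_counts[:-30], daily_counts[-30:]
coef = fit_ar2(train)
pred = forecast_ar2(coef, train, steps=len(test))

print("coefficients [c, phi1, phi2]:", coef)
print("RMSE:", rmse(test, pred), "MAE:", mae(test, pred))
```

RMSE is never smaller than MAE on the same errors, so a gap between the two, as in the reported 2.002 versus 1.704, indicates that some individual days are missed by more than the typical error.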
Copyright (c) 2026 Repeater : Publikasi Teknik Informatika dan Jaringan

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.


