Enhancing Forecast Accuracy: The Impact of Data Transformation in Time Series Models
DOI: https://doi.org/10.15611/eada.2025.4.03
Keywords: forecasting, stochastic process models, recurrent neural networks
Abstract
Aim: The aim of the article was to formulate recommendations on which preprocessing method is preferable for various forecasting algorithms, including machine learning approaches, with particular focus on forecasting stock values.
Methodology: An empirical study of stock value prediction for a sample of 10 typical NYSE-listed companies, comparing five data-preparation scenarios.
Results: The results confirm the theoretical assumptions and support recommendations for the proper design of benchmark studies and real-world forecasting models.
Implications and recommendations: As stated in the literature on the subject, for models based on stochastic processes, such as ARIMA and GARCH, transforming the data to rates of return (a form of differencing) is the preferred approach. For machine learning models, especially recurrent neural networks such as the Long Short-Term Memory (LSTM) network and the Gated Recurrent Unit (GRU), min-max normalisation should be applied. For exponential smoothing and Brownian-motion methods, the best results were achieved with non-transformed (raw) data. Guidance relevant to benchmark studies and real forecasting models is presented in the final section of the paper. The central thesis, illustrated through the example of stock value forecasting, is that proper benchmark studies and real-life applications should be designed so that the appropriate preprocessing is applied to each model; using the same preprocessing for different models may yield misleading results.
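The two named transformations can be sketched as follows. This is an illustrative sketch only, not the paper's implementation; the exact preprocessing scenarios compared in the study may differ in detail (e.g. in how edge values or the normalisation range are handled).

```python
def to_returns(prices):
    """Convert a price series to simple rates of return,
    r_t = (p_t - p_{t-1}) / p_{t-1} -- a form of differencing
    that removes the price level, as suggested for ARIMA/GARCH."""
    return [(p1 - p0) / p0 for p0, p1 in zip(prices, prices[1:])]

def min_max_normalise(series, lo=0.0, hi=1.0):
    """Linearly rescale a series into [lo, hi], the transformation
    suggested for recurrent networks such as LSTM and GRU."""
    s_min, s_max = min(series), max(series)
    scale = (hi - lo) / (s_max - s_min)
    return [lo + (x - s_min) * scale for x in series]

prices = [100.0, 102.0, 101.0, 104.0]
print(to_returns(prices))         # [0.02, -0.00980..., 0.02970...]
print(min_max_normalise(prices))  # [0.0, 0.5, 0.25, 1.0]
```

Note that the returns series is one element shorter than the input, and that min-max normalisation depends on the observed minimum and maximum, so in a real forecasting pipeline the scaling parameters should be fitted on the training window only.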
Originality/value: The topic of data preparation and transformation, although commonly discussed in the literature on the subject, is rarely verified by research studies on real datasets. To the best of the author's knowledge, this type of analysis has not previously been conducted on real data.
References
Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31(3), 307-327.
Box, G. E. P., & Jenkins, G. M. (1970). Time series analysis: Forecasting and control. Holden-Day.
Box, G. E. P., Jenkins, G. M., Reinsel, G. C., & Ljung, G. M. (2016). Time series analysis: Forecasting and control (5th ed.). Wiley.
Brownlee, J. (2019). Deep learning for time series forecasting: Predict the future with MLPs, CNNs and LSTMs in Python. Machine Learning Mastery.
Cho, K., van Merriënboer, B., Bahdanau, D., & Bengio, Y. (2014). On the properties of neural machine translation: Encoder–decoder approaches. In Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation (pp. 103-111). Association for Computational Linguistics. https://doi.org/10.3115/v1/W14-4012
Chollet, F., & Allaire, J. J. (2018). Deep learning with R. Manning.
Einstein, A. (1905). On the movement of small particles suspended in stationary liquids required by the molecular-kinetic theory of heat. Annalen der Physik, 17, 549-560.
Enders, W. (2015). Applied econometric time series (4th ed.). Wiley.
Engle, R. F. (1982). Autoregressive conditional heteroscedasticity with estimates of the variance of UK inflation. Econometrica, 50(4), 987-1007.
Fischer, T., & Krauss, C. (2018). Deep learning with long short-term memory networks for financial market predictions. European Journal of Operational Research, 270(2), 654-669. https://doi.org/10.1016/j.ejor.2017.11.054
Francq, C., & Zakoïan, J.-M. (2019). GARCH models: Structure, statistical inference and financial applications (2nd ed.). Wiley.
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT Press.
Hamilton, J. D. (1994). Time series analysis. Princeton University Press.
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735-1780. https://doi.org/10.1162/neco.1997.9.8.1735
Holt, C. C. (1957). Forecasting seasonals and trends by exponentially weighted moving averages (ONR Memorandum, Vol. 52). Carnegie Institute of Technology.
Tsay, R. S. (2010). Analysis of financial time series (3rd ed.). Wiley.
Wiener, N. (1923). Differential space. Journal of Mathematics and Physics, 2(1), 131-174.
Winters, P. R. (1960). Forecasting sales by exponentially weighted moving averages. Management Science, 6(3), 324-342.
Zhang, D., Chen, S., & He, Z. (2020). A deep learning framework for financial time series using stacked autoencoders and long–short term memory. PLOS ONE, 15(1), e0226997. https://doi.org/10.1371/journal.pone.0226997
License
Copyright (c) 2025 Andrzej Dudek

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Accepted 2025-11-21
Published 2025-12-17