Forecasting the Spread of Dengue Outbreaks with a Synthesis of Machine Learning Models Utilizing Exogenous Variables

Authors

  • Amulya Gottipati, Academies of Loudoun
  • Sreeja Iragavarapu, Academies of Loudoun

DOI:

https://doi.org/10.37266/ISER.2025v12i1.pp13-28

Keywords:

Dengue Fever, Forecasting, Exogenous Variables, Stagnant Water, Machine Learning

Abstract

Dengue fever, a viral mosquito-borne disease, puts about four billion people worldwide at risk and imposes substantial economic and health burdens. No antiviral drugs are available to treat dengue infections, so patients must rely solely on palliative treatment. Forecasting dengue cases ahead of future epidemics would help public officials implement mitigation efforts in time. The purpose of this study was to develop a machine learning model that forecasts the incidence of dengue outbreaks temporally and geographically using eco-climatic and socioeconomic factors. Monthly dengue case, precipitation, temperature, and socioeconomic datasets from seven countries (2014 to 2023) were preprocessed, and a principal component analysis was then performed. A novel topographical feature added to the model was stagnant water, a critical breeding ground for mosquitoes. A ridge regression technique was used to manage multicollinearity within the data before the data were passed to a seasonal autoregressive integrated moving average with exogenous variables (SARIMAX) model, which accounts for the seasonality of the variables examined. Overall, the forecasting algorithm accurately predicted dengue incidence at least six months in advance, with a mean absolute error of 2.420e-6. When the stagnant water feature was removed from the datasets, prediction accuracy at the same six-month horizon decreased significantly, demonstrating the importance of this feature for forecasting dengue. This algorithm can therefore help public health officials plan proactive measures that could substantially reduce economic stress and dengue transmission and improve quality of life in dengue-endemic countries.
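The abstract does not say exactly how the principal component analysis and ridge regression were combined. The numpy-only sketch below is a rough illustration under stated assumptions. It does not reproduce the authors' pipeline. It shows how each step relates to multicollinearity when a stagnant-water feature closely tracks rainfall. All series, the feature names, and the penalty value of 10.0 are hypothetical. PCA on standardized features shows how the variance concentrates. A large condition number flags multicollinearity. Ridge regression's L2 penalty shrinks the unstable least-squares coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 120  # hypothetical: 10 years of monthly observations

# Synthetic exogenous drivers (assumptions, not the study's data).
precip = rng.normal(200, 50, n)                    # monthly rainfall
temp = 0.02 * precip + rng.normal(27, 1.0, n)      # loosely tied to rainfall
stagnant = 0.8 * precip + rng.normal(0, 5, n)      # strongly collinear with rainfall
socio = rng.normal(0, 1, n)                        # hypothetical socioeconomic index
X = np.column_stack([precip, temp, stagnant, socio])
y = 0.01 * stagnant + 0.5 * temp + rng.normal(0, 1, n)  # hypothetical incidence

# Standardize so PCA and the ridge penalty treat every feature on the same scale.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
yc = y - y.mean()

# PCA via SVD: squared singular values are proportional to each component's variance.
U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
explained = s**2 / np.sum(s**2)
print("variance explained per component:", np.round(explained, 3))

# Ratio of largest to smallest singular value; large values signal multicollinearity.
cond = s[0] / s[-1]
print("condition number:", round(cond, 1))


def ridge(Xs, yc, lam):
    """Closed-form ridge estimate (X^T X + lam*I)^-1 X^T y; lam=0 gives ordinary least squares."""
    p = Xs.shape[1]
    return np.linalg.solve(Xs.T @ Xs + lam * np.eye(p), Xs.T @ yc)


beta_ols = ridge(Xs, yc, 0.0)
beta_ridge = ridge(Xs, yc, 10.0)  # penalty chosen arbitrarily for illustration
print("OLS coefficients:  ", np.round(beta_ols, 3))
print("ridge coefficients:", np.round(beta_ridge, 3))
```

In practice the penalty strength would normally be tuned, for example with time-ordered cross-validation, rather than fixed as it is here.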

Published

2025-03-05

How to Cite

Gottipati, A., & Iragavarapu, S. (2025). Forecasting the Spread of Dengue Outbreaks with a Synthesis of Machine Learning Models Utilizing Exogenous Variables. Industrial and Systems Engineering Review, 12(1), 13-28. https://doi.org/10.37266/ISER.2025v12i1.pp13-28