Data Sciencing Covid-19 with Ensemble Models

Njapa, Christelle

Files

Njpa_81941900_2022.pdf

Closed access
Adobe PDF
8.97 MB

Details

Supervisors: Pircalabelu, Eugen
Faculty: Faculté des sciences
Degree label: Master [120] en science des données, orientation statistique, à finalité spécialisée
Abstract: The entire world has been affected by the COVID-19 pandemic, which we first heard about in late 2019. The virus has been responsible for many millions of confirmed cases and deaths across the world, and it has severely affected the economy and public health in many countries. The disease has radically changed human life on several levels, for example at work, in restaurants, in public places and within families, and we have changed many of our habits. Although the outbreak has been part of our lives for some time, its spread remains difficult to anticipate, and many models are used to produce short-term forecasts of, for example, new cases and deaths due to COVID-19. Such models are tools that help stakeholders and authorities define rules for the population, for example to prevent new waves. They can help governments plan policies to limit the spread of the virus and help hospitals plan the number of beds needed for new patients. Forecasting is therefore a crucial step in this outbreak because, at the time of writing this thesis, no vaccine gives people immunity against contracting the disease: current vaccines only help people avoid developing its severe forms, so a vaccinated person can still contract it. As there is no cure available for the disease, it becomes important to estimate the number of potential cases that may occur using available data. Many models are used to forecast new COVID-19 cases and deaths, but there is no consistent conclusion on which ones are better. A single model might not be accurate, but recent studies suggest that combining multiple single models may yield better performance. Using this approach, we can produce a combined predictive model called an "ensemble model".
In this study, we conduct a comparative assessment of the performance of four popular ensemble methods: Bagging, Boosting, Voting and Stacking, based on ten base learners from three approaches: Machine Learning models (Decision Tree, Support Vector Machine, K-Nearest Neighbors), Deep Learning models (Recurrent Neural Network, Long Short-Term Memory, Bidirectional Long Short-Term Memory, Convolutional Neural Network, Gated Recurrent Units, Multi-Layer Perceptron) and statistical models (Seasonal and non-seasonal Autoregressive Integrated Moving Average, Autoregressive Models). To compare the accuracy of the single models and the ensemble models, the Root Mean Square Error (RMSE) was selected as the performance measure. Our models have been applied to COVID-19 time series data for confirmed cases and deaths at three levels: first, at the national level, with Belgium as a case study and data provided by the Belgian institute for health (Sciensano); second, at the regional level within Belgium (Brussels, Flanders, Wallonia), also with data provided by Sciensano; and third, at the international level (France, Germany, India, Italy, Luxembourg, USA), with data provided by the World Health Organisation (WHO). The time series run from March 01, 2020, to June 30, 2022. Experimental results reveal that our ensemble models are generally among the top three models at each level of comparison and that, in terms of Root Mean Square Error, they generally outperform most of the single models for both confirmed cases and deaths.
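As an illustration of the comparison described above, the following minimal sketch computes the RMSE of two forecasts and of their simple average (the most basic form of a voting ensemble). It is not taken from the thesis: the function names `rmse` and `voting_forecast`, the two stand-in "models" and all numbers are hypothetical, and the data are synthetic.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root Mean Square Error between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def voting_forecast(forecasts):
    """Average the forecasts of several models, one row per model."""
    return np.mean(np.asarray(forecasts, dtype=float), axis=0)


# Synthetic daily counts (assumption: illustrative values only).
y_true = np.array([100.0, 120.0, 140.0, 160.0])

# Two hypothetical base learners with opposite biases.
model_a = y_true + 10.0  # systematically over-predicts
model_b = y_true - 10.0  # systematically under-predicts

ensemble = voting_forecast([model_a, model_b])

rmse_a = rmse(y_true, model_a)
rmse_b = rmse(y_true, model_b)
rmse_ens = rmse(y_true, ensemble)

print(f"RMSE model A:  {rmse_a:.2f}")
print(f"RMSE model B:  {rmse_b:.2f}")
print(f"RMSE ensemble: {rmse_ens:.2f}")
```

In this contrived case the errors of the two base learners cancel exactly, so the averaged forecast has a lower RMSE than either one. On real data the gain, if any, depends on how correlated the base learners' errors are.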
