Big Data Mining and Analytics


COVID-19, regression, correlation, machine learning, prediction


The novel coronavirus outbreak was first reported in late December 2019; since then, more than 7 million people have been infected worldwide and over 0.40 million have lost their lives. In India, the first case was diagnosed on 30 January 2020, and the number of cases crossed 0.24 million as of 6 June 2020. This paper presents a detailed study of recently developed forecasting models and predicts the number of confirmed, recovered, and death cases in India caused by COVID-19. Correlation coefficients and multiple linear regression are applied for prediction, and autocorrelation and autoregression are used to improve the accuracy. The predicted numbers of cases show good agreement with the actual values, with an R-squared score of 0.9992. The findings suggest that lockdown and social distancing are two important factors that can help suppress the increasing spread rate of COVID-19.
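
To make the techniques named in the abstract concrete, the following is a minimal sketch of an autoregression fitted as a multiple linear regression on lagged values, with a lag autocorrelation and an R-squared score. It is not the authors' implementation: the function names (`lag_autocorrelation`, `fit_autoregression`, `r_squared`), the choice of three lags, and the synthetic series are all assumptions made for illustration, and the numbers it prints bear no relation to the results reported in the paper.

```python
import numpy as np


def lag_autocorrelation(series, lag):
    """Pearson correlation between the series and itself shifted by `lag` steps (lag >= 1)."""
    y = np.asarray(series, dtype=float)
    return np.corrcoef(y[:-lag], y[lag:])[0, 1]


def fit_autoregression(series, lags):
    """Fit y[t] = b0 + b1*y[t-1] + ... + b_lags*y[t-lags] by ordinary least squares."""
    y = np.asarray(series, dtype=float)
    # Column k holds the value (k + 1) steps before each target observation.
    lagged = [y[lags - k - 1 : len(y) - k - 1] for k in range(lags)]
    X = np.column_stack([np.ones(len(y) - lags)] + lagged)  # intercept + lag columns
    target = y[lags:]
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef, X, target


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - np.mean(actual)) ** 2)
    return 1.0 - ss_res / ss_tot


# SYNTHETIC cumulative counts (an assumption for illustration, not real case data).
rng = np.random.default_rng(0)
days = np.arange(60)
cases = 100.0 * np.exp(0.08 * days) + rng.normal(0.0, 5.0, size=days.size)

coef, X, target = fit_autoregression(cases, lags=3)
fitted = X @ coef
print("lag-1 autocorrelation:", lag_autocorrelation(cases, 1))
print("in-sample R-squared:", r_squared(target, fitted))

# One-step-ahead forecast from the three most recent observations
# (ordered most recent first, matching the lag columns).
next_day = coef[0] + coef[1:] @ cases[-1:-4:-1]
print("one-step-ahead forecast:", next_day)
```

A high lag autocorrelation is what motivates using past values as regressors; the R-squared here is computed in-sample, so a held-out evaluation would be needed before reading it as forecast accuracy.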


Tsinghua University Press