*This post was originally published by Sharmistha Chatterjee at Hacker Noon*

**Here in this blog, we try to model Single-Step Time Series Prediction, using Deep Learning Models, on the basis of Medical Information available for different states of India.****Motivation**

*Predict Number of Active Cases by Covid-19 Pandemic based on Medical Facilities (Volume of Testing, ICU beds, Ventilators, Isolation Units, etc)**using* *Multi-variate LSTM based forecasting*

Considering all these factors, it becomes important to have a predictive model that can predict the Number of Active Cases, Deaths and Recoveries based on the change in Medical Facilities as well as other changes in infrastructure.

## Single Step Time Series Prediction

One step time series prediction is a supervised machine learning task that comes with the functionality where the **previous n-values** are available when the next value in the time-series is predicted. In contrast, multi-step prediction involves prediction for **x future steps.**

The following figure depicts the different life cycle stages of time-series model training and prediction.

- Feeding
**Multi-variate data**from a single source or from aggregated sources available directly from the cloud or other 3rd-party providers into the ML modeling data ingestion system. - Cleaning, preprocessing, and feature engineering of the data involving
**scaling**and**normalization**.

- Conversion of the data to a
**supervised time-series**. - Feeding the data to a deep learning training source that can train different time-series models like
**LSTM, CNN, BI-LSTM, CNN+LSTM**using different combinations of**hidden layers, neurons, batch-size, and other hyper-parameters.** - Forecasting based on
**near term**or**far distant term**in future either using**Single-Step or Multi-Step Forecasting respectively.** - Evaluation of some of the error metrics like (
**MAPE, MAE, ME, RMSE, MPE**) by comparing it with the actual data, when it comes inRe-training the**model and continuous improvements**when the threshold of error exceeds.

## Data Loading and Selecting Features

As Delhi had high Covid-19 cases, here we model different DL models for the **“DELHI” State (National Capital of India).** Further**,** we keep the scope of dates from 25th March to 6th June 2020. Data till 29th April has been used for Training, whereas from 30th April to 6th June has been used for testing/prediction.

*Here, we have selected features mostly related to the availability of medical facilities like hospitals, ICU beds, the amount of testing facilities, and a number of cured/discharged/migrated/quarantined centers.*

```
stateName = unique_states[34]
dataset =list_state_all[34]
dataset = dataset.sort_values(by='Date', ascending=True)
dataset = dataset[(dataset['Date'] >= '2020-03-25') & (dataset['Date'] <= '2020-06-06')]
daterange = dataset['Date'].values
no_Dates = len(daterange)
dateStart = daterange[0]
dateEnd = daterange[no_Dates - 1]
dataset = dataset[['Total Confirmed cases','Death',
'Cured/Discharged/Migrated', 'coronaenquirycalls',
'cumulativepeopleinquarantine', 'negative', 'numcallsstatehelpline',
'numicubeds', 'numisolationbeds', 'numventilators',
'populationncp2019projection', 'positive',
'testpositivityrate',
'testspermillion', 'testsperpositivecase', 'testsperthousand',
'totaln95masks', 'totalpeoplecurrentlyinquarantine',
'totalpeoplereleasedfromquarantine', 'totalppe', 'totaltested',
'unconfirmed', 'Active Cases']]
```

As we have 22 features in total, we ensure each of the input features are initially scaled and then are ** time-shifted by one unit (t+1) th output for t th input** to yield in 22 input features plus one output predicted outcome, i.e. The

**Number of Active Cases**. The rest of the columns are dropped. The below code snippet explains that in detail.

## Feature Scaling

This becomes very important given, as in this current problem scope the features vary in the range too much, (10 to 1000000)

```
#no_features = 22
no_features = np.shape(dataset)[1]-1
print("No of features", no_features)
values = dataset.values
# ensure all data is float
values = values.astype('float32')
print(np.shape(values))
# normalize features
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(values)
reframed = series_to_supervised(scaled, 1, 1)
# drop columns we don't want to predict
print(np.shape(reframed))
```

## Convert Time-Series to a Supervised DataSet

This procedure is known as a **one-step prediction** in time series which uses lagged (one) observations (e.g. t-1) as input variables to forecast the current time step (t). This ensures all series are stationary with differencing and seasonal adjustment.

```
# # convert series to supervised learning
def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
n_vars = 1 if type(data) is list else data.shape[1]
df = pd.DataFrame(data)
cols, names = list(), list()
# input sequence (t-n, ... t-1)
for i in range(n_in, 0, -1):
cols.append(df.shift(i))
names += [('var%d(t-%d)' % (j + 1, i)) for j in range(n_vars)]
# forecast sequence (t, t+1, ... t+n)
for i in range(0, n_out):
cols.append(df.shift(-i))
if i == 0:
names += [('var%d(t)' % (j + 1)) for j in range(n_vars)]
else:
names += [('var%d(t+%d)' % (j + 1, i)) for j in range(n_vars)]
# put it all together
agg = pd.concat(cols, axis=1)
agg.columns = names
# drop rows with NaN values
if dropnan:
agg.dropna(inplace=True)
return agg
```

After the redundant/un-necessary columns are dropped (24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45) the entire dataset is split into training and testing dataset in the **ratio of 60%:40%,** and then we apply different deep learning techniques.

As we train only on the basis of 22 features and predict one output, columns starting from 24 to 45 are dropped.

```
reframed.drop(reframed.columns[[24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45]], axis=1, inplace=True)
# split into train and test sets
values = reframed.values
split_factor = int(dataset.shape[0]*0.6)
print(split_factor)
train = values[:split_factor, :]
test = values[split_factor:, :]
print(np.shape(train))
print(np.shape(test))
# split into input and outputs
train_X, train_y = train[:, :-1], train[:, -1]
test_X, test_y = test[:, :-1], test[:, -1]
print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)
# reshape input to be 3D [samples, timesteps, features]
train_X = train_X.reshape((train_X.shape[0], 1, train_X.shape[1]))
test_X = test_X.reshape((test_X.shape[0], 1, test_X.shape[1]))
print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)
print(train_X.shape[1], train_X.shape[2])
```

The following code snippet demonstrates how we train an **LSTM model** , plot the t**raining and validation loss,** before making a prediction.

```
# design Stacked LSTM networks
model = Sequential()
model.add(LSTM(units=50, return_sequences= True, input_shape=(train_X.shape[1], train_X.shape[2])))
model.add(LSTM(units=50, return_sequences=True))
model.add(LSTM(units=50))
model.add(Dense(units=1))
model.compile(loss='mae', optimizer='adam')
# fit network
history = model.fit(train_X, train_y, epochs=1500, batch_size=72, validation_data=(test_X, test_y), verbose=2, shuffle=False)
# plot history
plt.figure(figsize=(14,12))
plt.plot(history.history['loss'], label='train')
plt.plot(history.history['val_loss'], label='test')
plt.legend()
plt.show()
# make a prediction
y_predict = model.predict(test_X)
test_X = test_X.reshape((test_X.shape[0], test_X.shape[2]))
print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)
```

Training vs Validation Loss

This code snippet shows mechanism to compute the error metrics and inverse scale the predicted outcome.

```
plt.figure(figsize=(14,12))
plt.plot(test_y, label='actual')
plt.plot(y_predict, label='predicted')
plt.title('Scaled LSTM based Time Series Active Cases Prediction for state ' + stateName)
plt.legend()
plt.show()
rmse = np.sqrt(mean_squared_error(test_y, y_predict))
print('Test RMSE: %.3f' % rmse)
inv_y_predict = concatenate((y_predict, test_X[:, -(no_features):]), axis=1)
inv_y_predict = scaler.inverse_transform(inv_y_predict)
inv_y_predict = inv_y_predict[:, 0]
# invert scaling for actual
test_y = test_y.reshape((len(test_y), 1))
inv_y = concatenate((test_y, test_X[:, 1:]), axis=1)
inv_y = scaler.inverse_transform(inv_y)
inv_y = inv_y[:, 0]
# calculate RMSE
rmse = np.sqrt(mean_squared_error(inv_y, inv_y_predict))
print('Test RMSE: %.3f' % rmse)
pred_len = len(inv_y_predict)
print(pred_len)
dateEnd = daterange[split_factor+1]
print(dateEnd)
pred_index= pd.date_range(start=dateEnd, periods=pred_len, freq='D')
print(pred_index)
inv_y_actual = pd.Series(inv_y, pred_index)
inv_y_predicted = pd.Series(inv_y_predict, pred_index)
plt.figure(figsize=(14,12))
plt.plot(inv_y_actual, label='actual')
plt.plot(inv_y_predicted, label='predicted')
plt.title('LSTM based Time Series Active Cases Prediction for state ' + stateName)
plt.legend()
plt.show()
```

The below figure illustrates the **Actual vs Predicted Outcome of LSTM model**, after the predicted outcome has been inverse -transformed (to remove the effect of scaling).

## Bi-directional LSTM

As we know LSTM (Uni-directional) preserves information from inputs to the outputs that have already passed through it using the hidden state.

On the contrary, bidirectional will run inputs in two ways, one from past to future and one from future to past. This kind of LSTM that runs backwards to preserve information from the **future** and using the two hidden states combined, it is able in any point in time to preserve information from **both past and future**

The following code snippet demonstrates how we train a **Bi-LSTM model**, plot the t**raining and validation loss,** before making a prediction.

```
train = values[:split_factor, :]
test = values[split_factor:, :]
print(np.shape(train))
print(np.shape(test))
# split into input and outputs
train_X, train_y = train[:, :-1], train[:, -1]
test_X, test_y = test[:, :-1], test[:, -1]
print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)
# reshape input to be 3D [samples, timesteps, features]
train_X = train_X.reshape((train_X.shape[0], 1, train_X.shape[1]))
test_X = test_X.reshape((test_X.shape[0], 1, test_X.shape[1]))
print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)
print(train_X.shape[1], train_X.shape[2])
# design Stacked LSTM networks/Bi-directional LSTM networks
model = Sequential()
model.add(Bidirectional(LSTM(50, activation='relu'), input_shape=(train_X.shape[1], train_X.shape[2])))
model.add(Dense(1))
model.compile(loss='mae', optimizer='adam')
# fit network
history = model.fit(train_X, train_y, epochs=1500, batch_size=72, validation_data=(test_X, test_y), verbose=2, shuffle=False)
```

The below figure illustrates the **Actual vs Predicted Outcome of Bi-LSTM model** , after the predicted outcome has been inverse -transformed (to remove the effect of scaling).

## CNN (Convolution Neural Network)

We also used CNN for evaluating the model performance for single-step time-series prediction.

The following code snippet demonstrates how we train a **CNN model**, plot the t**raining and validation loss,** before making a prediction.

```
train_X, train_y = train[:, :-1], train[:, -1]
test_X, test_y = test[:, :-1], test[:, -1]
print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)
# reshape input to be 3D [samples, timesteps, features]
train_X = train_X.reshape((train_X.shape[0], train_X.shape[1], 1))
test_X = test_X.reshape((test_X.shape[0], test_X.shape[1], 1))
print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)
#CNN
model = Sequential()
model.add(Conv1D(filters=64, kernel_size=2, activation='relu', input_shape=(train_X.shape[1], train_X.shape[2])))
model.add(MaxPooling1D(pool_size=2))
model.add(Conv1D(filters=64, kernel_size=2, activation='relu', input_shape=(train_X.shape[1], train_X.shape[2])))
model.add(MaxPooling1D(pool_size=2))
model.add(Flatten())
model.add(Dense(100, activation='relu'))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')
model.summary()
#fit model
history =model.fit(train_X, train_y, epochs=1500, batch_size=72, validation_data=(test_X, test_y), verbose=2,shuffle=False)
```

Use the **trained model for prediction**

```
# make a prediction
y_predict = model.predict(test_X)
test_X = test_X.reshape((test_X.shape[0], test_X.shape[1]))
print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)
```

## Inverse Transform Predicted and Computation of Error Metrics

```
inv_y_predict = concatenate((y_predict, test_X[:, -(no_features):]), axis=1)
inv_y_predict = scaler.inverse_transform(inv_y_predict)
inv_y_predict = inv_y_predict[:, 0]
# invert scaling for actual
test_y = test_y.reshape((len(test_y), 1))
inv_y = concatenate((test_y, test_X[:, 1:]), axis=1)
inv_y = scaler.inverse_transform(inv_y)
inv_y = inv_y[:, 0]
# calculate RMSE
rmse = np.sqrt(mean_squared_error(inv_y, inv_y_predict))
print('Test RMSE: %.3f' % rmse)
pred_len = len(inv_y_predict)
print(pred_len)
dateEnd = daterange[split_factor+1]
print(dateEnd)
pred_index= pd.date_range(start=dateEnd, periods=pred_len, freq='D')
#print(pred_index)
inv_y_actual = pd.Series(inv_y, pred_index)
inv_y_predicted = pd.Series(inv_y_predict, pred_index)
```

The below figure illustrates the **Actual vs Predicted Outcome of CNN model**, after the predicted outcome has been inverse -transformed (to remove the effect of scaling).

## CNN + LSTM

Here we have used **Conv1d with TimeDistributed Layer,** which is then fed to a **single layer of LSTM**, to predicted different sequences, as illustrated by the figure below.

TimeDistributed Layer is primarily used to present several sets of data (say images) that are chronologically ordered to detect movements, actions, directions.

```
# split into input and outputs
train_X, train_y = train[:, :-1], train[:, -1]
test_X, test_y = test[:, :-1], test[:, -1]
print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)
#LSTM + CNN
subsequences = 1
timesteps = train_X.shape[1]
X_train_series_sub = train_X.reshape((train_X.shape[0], subsequences, timesteps, 1))
X_valid_series_sub = test_X.reshape((test_X.shape[0], subsequences, timesteps, 1))
print('Train set shape', X_train_series_sub.shape)
print('Validation set shape', X_valid_series_sub.shape)
model = Sequential()
model.add(TimeDistributed(Conv1D(filters=64, kernel_size=1, activation='relu'), input_shape=(None, X_train_series_sub.shape[2], X_valid_series_sub.shape[3])))
model.add(TimeDistributed(MaxPooling1D(pool_size=2)))
model.add(TimeDistributed(Flatten()))
model.add(LSTM(50, activation='relu'))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')
history = model.fit(X_train_series_sub, train_y, validation_data=(X_valid_series_sub, test_y), epochs=1500, verbose=2)
```

The prediction and inverse scaling help to yield the actual predicted outcomes.

```
#Prediction (LSTM + CNN)
yhat = model.predict(X_valid_series_sub)
print(yhat)
test_X = X_valid_series_sub.reshape((X_valid_series_sub.shape[0], X_valid_series_sub.shape[2]))
inv_y_predict = concatenate((y_predict, test_X[:, -(no_features):]), axis=1)
inv_y_predict = scaler.inverse_transform(inv_y_predict)
inv_y_predict = inv_y_predict[:, 0]
# invert scaling for actual
test_y = test_y.reshape((len(test_y), 1))
inv_y = concatenate((test_y, test_X[:, 1:]), axis=1)
inv_y = scaler.inverse_transform(inv_y)
inv_y = inv_y[:, 0]
# calculate RMSE
rmse = np.sqrt(mean_squared_error(inv_y, inv_y_predict))
print('Test RMSE: %.3f' % rmse)
pred_len = len(inv_y_predict)
print(pred_len)
dateEnd = daterange[split_factor+1]
print(dateEnd)
pred_index= pd.date_range(start=dateEnd, periods=pred_len, freq='D')
#print(pred_index)
inv_y_actual = pd.Series(inv_y, pred_index)
inv_y_predicted = pd.Series(inv_y_predict, pred_index)
```

The below figure illustrates the **Actual vs Predicted Outcome of stacked LSTM and CNN model** after the predicted outcome has been inverse -transformed (to remove the effect of scaling).

```
Epoch 1494/1500
58/58 - 0s - loss: 3.2615e-06 - val_loss: 0.0056
Epoch 1495/1500
58/58 - 0s - loss: 3.3479e-06 - val_loss: 0.0056
Epoch 1496/1500
58/58 - 0s - loss: 3.3705e-06 - val_loss: 0.0053
Epoch 1497/1500
58/58 - 0s - loss: 3.2291e-06 - val_loss: 0.0054
Epoch 1498/1500
58/58 - 0s - loss: 3.0793e-06 - val_loss: 0.0056
Epoch 1499/1500
58/58 - 0s - loss: 3.8484e-06 - val_loss: 0.0055
Epoch 1500/1500
58/58 - 0s - loss: 3.8213e-06 - val_loss: 0.0054
```

The following table depicts the computed RMSE metrics for each of the deep learning models.

## Conclusion

Here we see **bi-directional LSTM works the best**, followed by **multiple stacked layers of LSTM and single LSTM layer**. This is just a basic study and results might differ based on the dataset. In the next blog (series 2 ) we will see different multi-step prediction results.

More extensive hyper-parameter tuning is needed along with **dynamic data** featuring a **change in medical facilities and supplies.**

For complete source code check out https://github.com/sharmi1206/covid-19-analysis

## References

- https://arxiv.org/pdf/1801.02143.pdfhttps://machinelearningmastery.com/multi-step-time-series-forecasting/
- https://machinelearningmastery.com/multi-step-time-series-forecasting-with-machine-learning-models-for-household-electricity-consumption/
- https://machinelearningmastery.com/how-to-develop-lstm-models-for-multi-step-time-series-forecasting-of-household-power-consumption/
- https://machinelearningmastery.com/convert-time-series-supervised-learning-problem-python/
- https://www.tensorflow.org/tutorials/structured_data/time_series

*This post was originally published by Sharmistha Chatterjee at Hacker Noon*