Handling Missing Data in Time Series

Duration: 5 min

This module covers how to handle missing data in time series forecasting. Missing values can bias a model's estimates and reduce forecast accuracy, and many modeling tools expect a complete, regularly spaced series. Understanding why values are missing, and choosing an appropriate way to handle them, is essential for reliable forecasts.

Understanding Missing Data Types

Missing data can be classified by the mechanism that produces it into three types: Missing Completely at Random (MCAR), Missing at Random (MAR), and Missing Not at Random (MNAR). MCAR occurs when missingness is independent of both observed and unobserved values. MAR occurs when missingness depends only on observed data (for example, the time of day or another recorded variable) and not on the missing values themselves. MNAR occurs when missingness depends on the unobserved values themselves, for example when a sensor stops reporting readings above its range.
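To make the three mechanisms concrete, here is a minimal sketch that applies each one to a synthetic daily series and compares the mean of the values that remain with the true mean. Everything in it is hypothetical: the trend-plus-noise series, the 10% dropout rate for MCAR, a logger that skips every fifth reading in the second half of the year (MAR, because the day number is always recorded), and a sensor that reports nothing above its top decile (MNAR).

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic series: a linear upward trend plus Gaussian noise
rng = np.random.default_rng(0)
n = 365
dates = pd.date_range("2020-01-01", periods=n, freq="D")
value = pd.Series(np.linspace(10, 20, n) + rng.normal(0, 1, n), index=dates)
t = np.arange(n)  # day number; always observed

masks = {
    # MCAR: every point has the same 10% chance of being missing
    "MCAR": rng.random(n) < 0.1,
    # MAR: missingness depends only on the observed day number
    # (hypothetical logger that skips every 5th reading in the second half)
    "MAR": (t % 5 == 0) & (t >= n // 2),
    # MNAR: missingness depends on the unobserved value itself
    # (hypothetical sensor that reports nothing above its top decile)
    "MNAR": (value > value.quantile(0.9)).to_numpy(),
}

true_mean = value.mean()
observed_means = {}
for name, mask in masks.items():
    observed = value.mask(mask)  # masked points become NaN
    observed_means[name] = observed.mean()  # mean() skips NaN by default
    print(f"{name}: {mask.sum()} missing, observed mean "
          f"{observed_means[name]:.2f} vs true mean {true_mean:.2f}")
```

In this setup MCAR leaves the observed mean close to the true mean, while MAR and MNAR both pull it down, because the points they remove are higher than average. The MAR pattern is fully explained by the day number, which is recorded, so a method that models the value as a function of time can in principle correct for it; the MNAR pattern depends on values that were never recorded, so the observed data alone generally cannot reveal it.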

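The following example applies forward fill, which carries the last observed value forward into each gap, to a small daily series with three missing values:

import pandas as pd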
import numpy as np

# Create a sample time series with missing values
data = {'date': pd.date_range(start='1/1/2020', periods=10),
         'value': [1, 2, np.nan, 4, 5, np.nan, 7, 8, np.nan, 10]}
df = pd.DataFrame(data)

# Print the original DataFrame
print('Original DataFrame:')
print(df)

# Forward fill: carry the last observed value forward into each gap
# (DataFrame.ffill() replaces the deprecated fillna(method='ffill'))
df_filled = df.ffill()

# Print the DataFrame after filling missing values
print('\nDataFrame after filling missing values:')
print(df_filled)

Output:

Original DataFrame:
        date  value
0 2020-01-01    1.0
1 2020-01-02    2.0
2 2020-01-03    NaN
3 2020-01-04    4.0
4 2020-01-05    5.0
5 2020-01-06    NaN
6 2020-01-07    7.0
7 2020-01-08    8.0
8 2020-01-09    NaN
9 2020-01-10   10.0

DataFrame after filling missing values:
        date  value
0 2020-01-01    1.0
1 2020-01-02    2.0
2 2020-01-03    2.0
3 2020-01-04    4.0
4 2020-01-05    5.0
5 2020-01-06    5.0
6 2020-01-07    7.0
7 2020-01-08    8.0
8 2020-01-09    8.0
9 2020-01-10   10.0
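Forward fill has two behaviors the sample data does not exercise: a gap at the very start of a series has no earlier value to copy, and a long run of missing values is filled with one repeated value. The sketch below, on a small hypothetical series, shows pandas' `limit` argument for capping how many consecutive values are filled, with backward fill (`bfill`) for comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical series: a leading gap and a run of three missing values
s = pd.Series([np.nan, 1.0, np.nan, np.nan, np.nan, 5.0],
              index=pd.date_range("2020-01-01", periods=6))

# Plain forward fill: the leading NaN has no earlier value, so it stays NaN
ffilled = s.ffill()

# limit=1 fills at most one consecutive NaN per gap; the rest stay NaN
ffilled_limited = s.ffill(limit=1)

# Backward fill copies the next valid value, which covers the leading gap
bfilled = s.bfill()

print(pd.DataFrame({"original": s, "ffill": ffilled,
                    "ffill(limit=1)": ffilled_limited, "bfill": bfilled}))
```

Backward fill covers the leading gap, but it copies a later observation backward in time. In a forecasting pipeline that leaks future information into the training data, so use it with care.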

Advanced Techniques for Handling Missing Data

Beyond simple fills such as forward fill, common techniques include interpolation, regression imputation, and model-based approaches. Interpolation estimates each missing value from the observations on either side of it, for example by drawing a straight line between them. Regression imputation fits a regression model on the observed rows and uses it to predict the missing values. Model-based approaches use algorithms such as K-Nearest Neighbors (KNN) or other machine learning models to estimate missing values from similar observations.
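Interpolation can be sketched with pandas' `Series.interpolate`. The small series below is hypothetical and deliberately has an uneven spacing (January 5 is absent from the index), which shows why the `method` argument matters for time series.

```python
import numpy as np
import pandas as pd

# Hypothetical observations with an uneven gap: Jan 5 is absent from the index
idx = pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03",
                      "2020-01-04", "2020-01-06"])
s = pd.Series([1.0, np.nan, 3.0, np.nan, 6.0], index=idx)

# 'linear' treats the rows as equally spaced and ignores the timestamps
linear = s.interpolate(method="linear")

# 'time' weights each estimate by the actual time elapsed between observations
by_time = s.interpolate(method="time")

print(pd.DataFrame({"original": s, "linear": linear, "time": by_time}))
```

The January 4 gap becomes 4.5 with `linear` (halfway between rows) but 4.0 with `time` (one day into a three-day gap). On a regularly spaced index, such as the daily sample series used in this module, the two methods agree; for example, `df['value'].interpolate()` fills its three gaps with 3, 6 and 9.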

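The following example fills the gaps with scikit-learn's KNNImputer. KNNImputer finds neighbors by comparing the other columns of each row, so the example adds the number of days since the start as a feature; the nearest neighbors of a missing value are then the observations closest to it in time.

from sklearn.impute import KNNImputer
import pandas as pd
import numpy as np

# Create a sample time series with missing values
data = {'date': pd.date_range(start='1/1/2020', periods=10),
        'value': [1, 2, np.nan, 4, 5, np.nan, 7, 8, np.nan, 10]}
df = pd.DataFrame(data)

# Build the feature matrix: days since the first date, plus the value itself.
# With 'value' alone, a missing row has no observed feature to measure
# distance with, and KNNImputer falls back to the column mean for every gap.
features = pd.DataFrame({
    'days': (df['date'] - df['date'].min()).dt.days,
    'value': df['value'],
})

# Apply KNNImputer: each gap becomes the mean of its 2 nearest observed rows
imputer = KNNImputer(n_neighbors=2)
filled = imputer.fit_transform(features)

# Rebuild the DataFrame with the original dates and the imputed values
df_filled = pd.DataFrame({'date': df['date'], 'value': filled[:, 1]})

# Print the DataFrame after filling missing values
# (each gap here is filled with the mean of the values one day before and after it)
print('DataFrame after filling missing values using KNNImputer:')
print(df_filled)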
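Regression imputation can be sketched with scikit-learn's `LinearRegression`, using a single assumed predictor (days since the first date): fit the model on the observed rows, then predict the missing ones. Because the module's sample values happen to lie on a straight line, the filled values are 3, 6 and 9; on real data the predictors would usually be richer, for example lagged values, calendar features, or related series.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Same sample series as in the module's examples
df = pd.DataFrame({'date': pd.date_range(start='1/1/2020', periods=10),
                   'value': [1, 2, np.nan, 4, 5, np.nan, 7, 8, np.nan, 10]})

# Assumed predictor: days elapsed since the first date, as a 2-D feature array
X = (df['date'] - df['date'].min()).dt.days.to_numpy().reshape(-1, 1)
y = df['value'].to_numpy()
observed = ~np.isnan(y)

# Fit the regression on the observed rows only
model = LinearRegression().fit(X[observed], y[observed])

# Predict the missing rows and write the predictions into a copy
df_filled = df.copy()
df_filled.loc[~observed, 'value'] = model.predict(X[~observed])
print(df_filled)
```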

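💡 Tip: When using KNNImputer, choose the number of neighbors (n_neighbors) deliberately. Too few neighbors make each imputed value depend heavily on a single, possibly noisy observation; too many average over distant rows and smooth away real structure such as peaks or seasonal swings. A common practice is to start small, increase n_neighbors step by step, and compare the imputation error on values you have deliberately hidden. Because KNN relies on distances, also put features on comparable scales before imputing.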
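One way to act on this tip is to hide values you already know, impute them, and measure the error for several candidate n_neighbors. The sketch below does this on a hypothetical noisy sine series with a 30-day cycle; the series, the 10% hidden fraction, and the candidate values of k are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Hypothetical complete series: a 30-day sine cycle plus noise, one value per day
rng = np.random.default_rng(42)
n = 200
t = np.arange(n)
truth = np.sin(2 * np.pi * t / 30) + rng.normal(0, 0.1, n)

# Hide 10% of the known values so the imputation error can be measured
hidden = rng.choice(n, size=n // 10, replace=False)
y = truth.copy()
y[hidden] = np.nan
X = np.column_stack([t, y])  # time-step feature + value

errors = {}
for k in [1, 2, 3, 5, 10, 20]:
    filled = KNNImputer(n_neighbors=k).fit_transform(X)[:, 1]
    # Mean absolute error on the hidden points only
    errors[k] = np.mean(np.abs(filled[hidden] - truth[hidden]))
    print(f"n_neighbors={k:>2}: MAE {errors[k]:.3f}")

best_k = min(errors, key=errors.get)
print("Lowest error with n_neighbors =", best_k)
```

On this series, large values of n_neighbors average over a large share of the 30-day cycle and flatten it, so the error grows. The best value on real data depends on the sampling rate and on how quickly the series changes.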

❓ What are the three types of missing data in time series?

MCAR, MAR, MNAR
MCAR, MAD, MNAR
MCAR, MAR, MCAR
MCAR, MAR, MRAD

❓ Which method fills each missing entry by averaging the values of the k most similar rows?

Forward fill
Backward fill
KNNImputer
Linear interpolation

Key Concepts

Concept	Description
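Trend	The long-run increase or decrease in the level of a series. Forward fill ignores it, while interpolation follows it between observed points.
Seasonality	A pattern that repeats over a fixed period, such as a day, week, or year. Gaps in strongly seasonal series may be better estimated from the same point in earlier cycles than from adjacent observations alone.
Stationarity	A series is stationary when its statistical properties, such as mean and variance, do not change over time. Filling gaps with a single global statistic, such as the overall mean, implicitly assumes this.
Autocorrelation	The correlation between a series and lagged copies of itself. Strong autocorrelation is what makes neighboring observations useful for filling gaps.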

Check Your Understanding

❓ How does forward fill handle a missing value in the first row of a series?

Leaves it missing
Fills it with the column mean
Removes the row
Fills it with the next observed value

❓ What is the time complexity of forward fill on a series of length n?

O(n)
O(n²)
O(log n)
Depends on implementation

❓ Which KNNImputer hyperparameter sets how many neighboring rows are averaged to fill each missing value?

n_neighbors
weights
metric
missing_values
