Module 3 of 25 · AI & Machine Learning Fundamentals · Beginner

Data Preprocessing and Feature Engineering

Duration: 5 min

This module covers data preprocessing and feature engineering, the steps that turn raw data into inputs a machine learning model can use. Done well, they can improve model performance, reduce training time, and make models easier to interpret.

Data Cleaning and Handling Missing Values

Data cleaning involves identifying and correcting errors, inconsistencies, and inaccuracies in a dataset. Handling missing values is a critical part of this process. Common techniques include removing rows or columns with missing data, imputing missing values using statistical methods, or using machine learning algorithms to predict missing values.

import pandas as pd

# Sample dataset with missing values
data = {'A': [1, 2, None, 4], 'B': [None, 2, 3, 4]}
df = pd.DataFrame(data)

# Handling missing values by filling with the mean
df.fillna(df.mean(), inplace=True)

print(df)


          A    B
0  1.000000  3.0
1  2.000000  2.0
2  2.333333  3.0
3  4.000000  4.0
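Mean imputation is only one of the techniques listed above. As a rough sketch of two others, the snippet below drops incomplete rows with `dropna` and fills gaps with the column median using scikit-learn's `SimpleImputer`. It reuses the toy data from the example above; the variable names `dropped` and `imputed` are illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Same toy data as above; np.nan marks the missing entries
df = pd.DataFrame({'A': [1, 2, np.nan, 4], 'B': [np.nan, 2, 3, 4]})

# Option 1: drop every row that contains at least one missing value
dropped = df.dropna()

# Option 2: fill each column's gaps with that column's median
imputer = SimpleImputer(strategy='median')
imputed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)

print(dropped)
print(imputed)
```

Dropping rows is simple but discards data (here, half of the rows), while the median is less sensitive to outliers than the mean. Which option fits depends on how much data is missing and why.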

Feature Scaling

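Feature scaling brings the independent variables (features) of a dataset onto a comparable range. It is often called data normalization. Many machine learning algorithms, particularly those based on distances or gradient descent such as k-nearest neighbors, support vector machines, and neural networks, perform better when features are on similar scales. The example below uses standardization, which rescales each feature to zero mean and unit variance.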

from sklearn.preprocessing import StandardScaler
import numpy as np

# Sample dataset
data = np.array([[1, 2], [2, 3], [3, 4], [4, 5]])

# Applying feature scaling
scaler = StandardScaler()
scaled_data = scaler.fit_transform(data)

print(scaled_data)
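To make the transformation less of a black box, here is a small sketch that reproduces `StandardScaler` by hand with NumPy and compares it with min-max scaling, another common approach. It assumes the same sample array as above; the names `manual`, `standardized`, and `min_max` are illustrative.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

data = np.array([[1, 2], [2, 3], [3, 4], [4, 5]], dtype=float)

# Standardization by hand: subtract the column mean, then divide by the
# column standard deviation (population form, ddof=0, as StandardScaler does)
manual = (data - data.mean(axis=0)) / data.std(axis=0)
standardized = StandardScaler().fit_transform(data)

# Min-max scaling maps each column onto the range [0, 1]
min_max = MinMaxScaler().fit_transform(data)

print(np.allclose(manual, standardized))  # True
print(min_max)
```

Standardization is a reasonable default when features are roughly bell-shaped, while min-max scaling is useful when a bounded range is required. Both are sensitive to outliers.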

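💡 Tip: Handle missing values before scaling, and split the dataset into training and testing sets before fitting the scaler. Fit the scaler on the training set only, then apply the same fitted transformation to the test set. Fitting on the full dataset leaks test-set statistics into training.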
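A minimal sketch of a leakage-safe order of operations: split first, fit the scaler on the training rows, and reuse the fitted scaler on the test rows. The data here is a made-up toy array, and the 70/30 split and `random_state=0` are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: 10 samples, 2 features, binary labels
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.array([0, 1] * 5)

# Split first so the test rows never influence the scaler's statistics
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from training rows
X_test_scaled = scaler.transform(X_test)        # reuse those statistics unchanged

print(X_train_scaled.mean(axis=0))  # close to 0 by construction
print(X_test_scaled.mean(axis=0))   # generally not 0
```

The test set's scaled mean is usually not exactly zero, and that is expected: the test rows are treated like unseen data, transformed with statistics learned elsewhere.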

❓ What is the primary purpose of handling missing values in data preprocessing?

❓ What is a common technique for feature scaling?
