Supervised Learning: Classification

Duration: 5 min

This module covers supervised learning with a focus on classification. You will learn the core concepts, algorithms, and techniques needed to build and evaluate classification models. Classification underpins many real-world applications, from spam detection to medical diagnosis.

Understanding Classification Algorithms

Classification algorithms are supervised learning techniques that predict categorical labels. They learn from labeled training data and then predict labels for unseen data. Common choices include Logistic Regression, Decision Trees, and Support Vector Machines (SVM). Each has strengths and weaknesses that suit it to different problems: Logistic Regression is fast and interpretable but learns a linear decision boundary, Decision Trees capture non-linear rules but overfit easily, and SVMs handle non-linear boundaries through kernels but can be slow to train on large datasets.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load the Iris dataset
data = load_iris()
X, y = data.data, data.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train the Logistic Regression model
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')

Output:

Accuracy: 1.00
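
Because each algorithm suits different problems, it can help to compare several on the same data. The following is a minimal sketch, not a definitive benchmark: it assumes the same Iris dataset as above, default hyperparameters apart from `max_iter` and `random_state`, and 5-fold cross-validation as the comparison method. The `models` and `scores` names are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Load features and labels directly as arrays
X, y = load_iris(return_X_y=True)

# Candidate classifiers (hyperparameters are assumptions for illustration)
models = {
    "Logistic Regression": LogisticRegression(max_iter=200),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "SVM": SVC(),
}

# Mean accuracy over 5 cross-validation folds for each model
scores = {}
for name, model in models.items():
    fold_scores = cross_val_score(model, X, y, cv=5)
    scores[name] = fold_scores.mean()
    print(f"{name}: {scores[name]:.2f}")
```

Cross-validation averages over several train/test splits, so it is usually a steadier basis for comparison than a single split on a dataset this small.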

Feature Engineering for Classification

Feature engineering is the process of selecting, transforming, and creating features to improve the performance of machine learning models. In classification, effective feature engineering can significantly improve model accuracy. Common techniques include scaling numeric features, encoding categorical variables, and creating interaction terms.

import pandas as pd
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier

# Sample DataFrame
data = {'feature1': [1, 2, 3, 4], 'feature2': ['A', 'B', 'A', 'B'], 'target': [0, 1, 0, 1]}
df = pd.DataFrame(data)

# Define preprocessing for numeric and categorical features
numeric_features = ['feature1']
categorical_features = ['feature2']

numeric_transformer = StandardScaler()
categorical_transformer = OneHotEncoder(drop='first')

preprocessor = ColumnTransformer(transformers=[
    ('num', numeric_transformer, numeric_features),
    ('cat', categorical_transformer, categorical_features)
])

# Create a pipeline that preprocesses the data and then fits a Random Forest model
# (random_state is fixed so repeated runs give the same forest)
pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('classifier', RandomForestClassifier(random_state=42))
])

# Split the data into features and target
X = df.drop('target', axis=1)
y = df['target']

# Fit the pipeline
pipeline.fit(X, y)

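# Predict using the pipeline
# (these are the training rows, so this only demonstrates the API;
# evaluate on held-out data in practice)
y_pred = pipeline.predict(X)
print(y_pred)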

💡 Tip: Always evaluate the performance of your classification model using appropriate metrics such as accuracy, precision, recall, and F1-score. Relying solely on accuracy can be misleading, especially on imbalanced datasets, where a model that mostly predicts the majority class can still score high accuracy.
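
A small worked example illustrates the tip. The labels below are made up for illustration (95 negatives, 5 positives) and do not come from a real model. The sketch uses scikit-learn's metric functions to show how accuracy can look strong while recall stays low.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Hypothetical imbalanced labels: 95 negatives followed by 5 positives
y_true = [0] * 95 + [1] * 5

# Hypothetical predictions: 1 false positive, 2 of the 5 positives found
y_pred = [0] * 94 + [1] + [1, 1, 0, 0, 0]

accuracy = accuracy_score(y_true, y_pred)    # 96 correct out of 100
precision = precision_score(y_true, y_pred)  # 2 true positives / 3 predicted positives
recall = recall_score(y_true, y_pred)        # 2 true positives / 5 actual positives
f1 = f1_score(y_true, y_pred)                # harmonic mean of precision and recall

print(f"Accuracy:  {accuracy:.2f}")   # 0.96
print(f"Precision: {precision:.2f}")  # 0.67
print(f"Recall:    {recall:.2f}")     # 0.40
print(f"F1-score:  {f1:.2f}")         # 0.50
```

The classifier is 96% accurate yet misses three of the five positive cases, which is the failure mode that precision, recall, and F1-score expose.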

❓ Which algorithm is used in the first code example for classification?

Decision Tree
K-Nearest Neighbors
Logistic Regression
Support Vector Machine

❓ What is the purpose of feature engineering in classification?

To reduce model complexity
To improve model accuracy
To decrease training time
To visualize data
