Python for Machine Learning

Duration: 8 min

This module covers using Python for machine learning, a core skill for anyone who wants to build intelligent systems. Libraries such as TensorFlow and scikit-learn make this practical; the worked examples here use scikit-learn and show how Python's concise syntax keeps model training and evaluation short and readable.

Supervised Learning with Scikit-learn

Supervised learning involves training a model on a labeled dataset, which consists of input-output pairs. Scikit-learn is a powerful library that provides simple and efficient tools for data mining and data analysis. It is particularly useful for implementing various supervised learning algorithms like linear regression, logistic regression, and support vector machines.
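Scikit-learn estimators share the same fit and predict interface, so switching algorithms usually means changing one line. The following is a minimal sketch of that idea for two of the classifiers named above. The synthetic dataset from make_classification, the 25% test split, and the names models and scores are assumptions made for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic labeled data (assumption: stands in for a real dataset)
X, y = make_classification(n_samples=200, n_features=4, random_state=42)

# Hold out a quarter of the input-output pairs for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Two supervised classifiers with the same fit/predict interface
models = {
    "logistic regression": LogisticRegression(),
    "support vector machine": SVC(),
}

scores = {}
for name, clf in models.items():
    clf.fit(X_train, y_train)        # learn from labeled training pairs
    y_pred = clf.predict(X_test)     # predict labels for unseen inputs
    scores[name] = accuracy_score(y_test, y_pred)
    print(f"{name}: accuracy = {scores[name]:.2f}")
```

The accuracy values depend on the synthetic data and are not meant as a benchmark of either algorithm.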

example1.py

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import numpy as np

# Generate some sample data
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
y = np.dot(X, np.array([1, 2])) + 3

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a linear regression model
model = LinearRegression()

# Train the model
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Calculate the mean squared error
mse = mean_squared_error(y_test, y_pred)
print(f'Mean Squared Error: {mse}')

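Output:

Mean Squared Error: [value] (Note: the sample data follows an exact linear relationship with no noise, so the value should be at or very close to zero, up to floating-point error. Because random_state is fixed, the split and the result are reproducible between runs.)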
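The sample data in example1.py is generated as y = 1*x1 + 2*x2 + 3, so a fitted linear model should recover those numbers. The sketch below checks this by fitting on all four points (an assumption made to keep the check simple, in place of the train/test split) and by computing the mean squared error by hand next to scikit-learn's helper. The name mse_manual is hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Same sample data as example1.py
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
y = np.dot(X, np.array([1, 2])) + 3

# Fit on every point (assumption: no split, to inspect the learned parameters)
model = LinearRegression().fit(X, y)
print("coefficients:", model.coef_)     # expected to be close to [1, 2]
print("intercept:", model.intercept_)   # expected to be close to 3

# Mean squared error is the average of the squared residuals
y_pred = model.predict(X)
mse_manual = np.mean((y - y_pred) ** 2)
mse_sklearn = mean_squared_error(y, y_pred)
print("manual MSE:", mse_manual)
print("sklearn MSE:", mse_sklearn)
```

With real, noisy data the coefficients would only approximate the underlying relationship and the error would be larger than it is here.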

Unsupervised Learning with K-Means Clustering

Unsupervised learning deals with unlabeled data and aims to learn the patterns and structure from the data without predefined labels. K-Means clustering is a popular unsupervised learning algorithm that partitions the data into K distinct clusters based on feature similarity. It is widely used for customer segmentation, image compression, and more.

example2.py

from sklearn.cluster import KMeans
import numpy as np
import matplotlib.pyplot as plt

# Generate sample data
X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]])

# Create a KMeans model (n_init is set explicitly because its default differs across scikit-learn versions)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)

# Fit the model to the data
kmeans.fit(X)

# Get the cluster centers and labels
centroids = kmeans.cluster_centers_
labels = kmeans.labels_

# Plot the data points and centroids
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis')
plt.scatter(centroids[:, 0], centroids[:, 1], s=300, c='red', marker='X')
plt.show()
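To see what "partitioning by feature similarity" means in practice, here is a rough NumPy sketch of the two steps K-Means alternates between: assigning each point to its nearest centroid, then moving each centroid to the mean of its points. This is a simplified illustration, not scikit-learn's implementation. The starting centroids are hypothetical values picked by hand, and the sketch does not handle empty clusters.

```python
import numpy as np

# Same sample data as example2.py
X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]], dtype=float)

# Hypothetical starting centroids, chosen by hand for this sketch
centroids = np.array([[0.0, 1.0], [5.0, 3.0]])

for iteration in range(10):
    # Assignment step: squared distance from every point to every centroid
    distances = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = distances.argmin(axis=1)

    # Update step: move each centroid to the mean of its assigned points
    new_centroids = np.array(
        [X[labels == k].mean(axis=0) for k in range(len(centroids))]
    )

    # Stop once the centroids no longer move
    if np.allclose(new_centroids, centroids):
        break
    centroids = new_centroids

print("labels:", labels)
print("centroids:", centroids)
```

The result depends on where the centroids start. Other starting points can settle on a different, worse partition of the same data, which is why library implementations typically run several initializations and keep the best one.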

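💡 Tip: K-Means requires you to choose the number of clusters in advance, and a poor choice gives misleading groupings. The Elbow Method is a common heuristic: fit the model for a range of K values, plot the within-cluster sum of squared distances (exposed by scikit-learn as inertia_) against K, and pick the point where the curve bends and adding more clusters yields only small improvements.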
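A minimal sketch of the Elbow Method on the sample data from example2.py follows. The range of K values (1 to 5) and the names k_values and inertias are assumptions for illustration. The values are printed here, but they could equally be plotted with matplotlib.

```python
import numpy as np
from sklearn.cluster import KMeans

# Same sample data as example2.py
X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]])

k_values = range(1, 6)  # assumption: a small range suits six data points
inertias = []

for k in k_values:
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=42)
    kmeans.fit(X)
    # inertia_ is the sum of squared distances of points to their closest centroid
    inertias.append(kmeans.inertia_)

for k, inertia in zip(k_values, inertias):
    print(f"K={k}: inertia={inertia:.2f}")
```

Inertia generally keeps falling as K grows, so the goal is not the minimum value but the K after which the drop becomes small.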

❓ What is the primary purpose of using train_test_split in the first code example?

A. To visualize the data
B. To split the data into training and testing sets
C. To train the model
D. To evaluate the model's performance

❓ In the second code example, what does the KMeans algorithm do?

A. It trains a model on labeled data
B. It partitions the data into clusters based on feature similarity
C. It predicts the output for new data
D. It evaluates the model's accuracy
