Python for Machine Learning
Duration: 8 min
This module covers Python for machine learning, a core skill for anyone who wants to build intelligent systems. It introduces libraries such as TensorFlow and scikit-learn and shows how Python's syntax and features support building machine learning models. The examples below use scikit-learn.
Supervised Learning with Scikit-learn
Supervised learning trains a model on a labeled dataset of input-output pairs so that it can predict outputs for inputs it has not seen. Scikit-learn provides simple and efficient tools for data mining and data analysis, including implementations of supervised learning algorithms such as linear regression, logistic regression, and support vector machines.
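To make the idea of input-output pairs concrete, here is a minimal classification sketch using logistic regression, one of the algorithms mentioned above. The dataset (hours studied paired with a pass/fail label) and the variable names are hypothetical and chosen only for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical labeled dataset: each input (hours studied) is paired
# with an output label (0 = fail, 1 = pass)
X = np.array([[0.5], [1.0], [1.5], [2.0], [4.0], [4.5], [5.0], [5.5]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Fit a classifier on the labeled pairs
clf = LogisticRegression()
clf.fit(X, y)

# Predict labels for two inputs, one from each side of the gap in the data
predictions = clf.predict(np.array([[1.0], [5.0]]))
print(predictions)
```

The classifier is trained and queried with the same fit and predict calls that the regression example below uses, which is why switching between scikit-learn estimators usually takes little code.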
example1.py
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import numpy as np
# Generate some sample data
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
y = np.dot(X, np.array([1, 2])) + 3
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create a linear regression model
model = LinearRegression()
# Train the model
model.fit(X_train, y_train)
# Make predictions
y_pred = model.predict(X_test)
# Calculate the mean squared error
mse = mean_squared_error(y_test, y_pred)
print(f'Mean Squared Error: {mse}')
Mean Squared Error: [value] (Note: because this sample data is exactly linear, the value is zero up to floating-point rounding; on noisy data it would depend on the random state and the split of the data)
Unsupervised Learning with K-Means Clustering
Unsupervised learning works with unlabeled data and aims to discover patterns and structure without predefined labels. K-Means clustering is a popular unsupervised learning algorithm that partitions the data into K distinct clusters, assigning each point to the cluster whose center (centroid) is nearest. It is widely used for customer segmentation, image compression, and more.
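The partitioning described above can be sketched in plain NumPy as two alternating steps: assign each point to its nearest centroid, then move each centroid to the mean of its points. This is an illustrative sketch only, not scikit-learn's implementation; the function name kmeans_sketch, its parameters, and the sample points are assumptions made for this example.

```python
import numpy as np

def kmeans_sketch(X, k, n_iters=10, seed=0):
    """Basic K-Means iteration (illustrative, not scikit-learn's code)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iters):
        # Assignment step: distance from every point to every centroid
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    # Final assignment so the labels match the returned centroids
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return centroids, distances.argmin(axis=1)

# Hypothetical data: two well-separated groups of three points each
points = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 1.5],
                   [8.0, 8.0], [8.5, 9.0], [9.0, 8.0]])
centroids, labels = kmeans_sketch(points, k=2)
print(labels)
print(centroids)
```

Which group receives label 0 and which receives label 1 depends on the random starting centroids, so only the grouping itself is meaningful. Library implementations add refinements such as smarter initialization and convergence checks.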
example2.py
from sklearn.cluster import KMeans
import numpy as np
import matplotlib.pyplot as plt
# Generate sample data
X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]])
# Create a KMeans model (n_init is set explicitly because its default differs across scikit-learn versions)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
# Fit the model to the data
kmeans.fit(X)
# Get the cluster centers and labels
centroids = kmeans.cluster_centers_
labels = kmeans.labels_
# Plot the data points and centroids
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis')
plt.scatter(centroids[:, 0], centroids[:, 1], s=300, c='red', marker='X')
plt.show()
💡 Tip: When using K-Means clustering, it's important to choose the right number of clusters. The Elbow Method is a useful technique for this: plot the within-cluster sum of squared distances (exposed by scikit-learn as inertia_) against the number of clusters and look for the "elbow", the point after which adding clusters yields only small improvements.
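The Elbow Method from the tip can be sketched as follows. This sketch reuses the six sample points from example2.py and fits one model per candidate K; the range of K values and the variable names are assumptions for illustration, and the values are printed rather than plotted to keep the example short.

```python
import numpy as np
from sklearn.cluster import KMeans

# Same sample data as in example2.py
X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]])

# Fit one model per candidate number of clusters and record its inertia
# (the within-cluster sum of squared distances to the nearest centroid)
k_values = range(1, 6)
inertias = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=42)
    model.fit(X)
    inertias.append(model.inertia_)

for k, inertia in zip(k_values, inertias):
    print(f"k={k}: inertia={inertia:.2f}")
```

To draw the curve, pass the two sequences to plt.plot(list(k_values), inertias) and look for the K after which the curve flattens. On a dataset this small the elbow is only suggestive; the technique is more informative on larger data.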
❓ What is the primary purpose of using train_test_split in the first code example?
❓ In the second code example, what does the KMeans algorithm do?