Module 6 of 16 · Maths and Statistics in AI · Beginner

Data Visualization Techniques

Duration: 5 min

Data visualization transforms complex data into visual representations that reveal patterns, trends, and insights. Effective visualizations communicate data stories and facilitate better decision-making.

Scatter Plots

Purpose

Show relationships between two continuous variables and identify correlations, trends, and outliers.

Example: Positive Correlation

Scatter Plot: Height vs Weight

Weight (kg)
    │
 90 │          ●
    │        ●  ●
 80 │      ●  ●
    │    ●  ●
 70 │  ●  ●
    │ ●
 60 └─────────────
    160  170  180
    Height (cm)

Interpretation: The points rise from lower left to upper right, so taller people tend to weigh more
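
The strength of a linear relationship like this one can be quantified with the Pearson correlation coefficient r, which ranges from -1 (perfect negative) through 0 (no linear relationship) to +1 (perfect positive). The sketch below computes it from scratch. The height and weight values are made up for illustration, and `pearson_r` is a hypothetical helper name, not a library function.

```python
import math

# Hypothetical height (cm) and weight (kg) measurements, invented for this example.
heights = [160, 165, 170, 175, 180]
weights = [62, 66, 71, 77, 84]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (proportional to the covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to the variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


r = pearson_r(heights, weights)
print(f"r = {r:.3f}")
```

For this made-up sample, r comes out close to +1. With NumPy arrays, `np.corrcoef(x, y)[0, 1]` computes the same quantity.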

Example: No Correlation

Scatter Plot: Age vs Shoe Size

Shoe Size
    │
  12│  ●     ●
    │    ●  ●
  10│  ●   ●
    │   ●    ●
   8│●        ●
    │
   6└─────────────
    20   40   60
    Age (years)

Interpretation: The points show no upward or downward pattern, so there is no clear relationship between age and shoe size

Python Code

import matplotlib.pyplot as plt
import numpy as np

# Generate correlated data
x = np.random.rand(50) * 100
y = x + np.random.randn(50) * 10

# Create scatter plot
plt.scatter(x, y, alpha=0.6, s=100, color='blue')
plt.xlabel('Variable X')
plt.ylabel('Variable Y')
plt.title('Scatter Plot: Relationship Analysis')
plt.grid(True, alpha=0.3)
plt.show()

Histograms

Purpose

Display the distribution of a single variable across bins to identify patterns, skewness, and outliers.

Example: Normal Distribution

Histogram: Test Scores

Frequency
    │
 30 │        ╱╲
    │       ╱  ╲
 20 │      ╱    ╲
    │     ╱      ╲
 10 │    ╱        ╲
    │   ╱          ╲
  0 └──────────────────
    40  50  60  70  80  90
    Score

Interpretation: Bell-shaped, most scores around 65
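
Under the hood, a histogram simply counts how many values fall into each bin. The sketch below does that counting by hand and prints a text histogram. The scores and the bin width of 10 are assumptions chosen for illustration.

```python
# Hypothetical test scores, invented for this example.
scores = [42, 55, 58, 61, 63, 64, 66, 67, 68, 71, 74, 79, 88]
bin_width = 10  # assumed bin width

# Map each score to the lower edge of its bin and count occurrences.
counts = {}
for score in scores:
    bin_start = (score // bin_width) * bin_width
    counts[bin_start] = counts.get(bin_start, 0) + 1

# Print one row of '#' marks per bin, lowest bin first.
for bin_start in sorted(counts):
    bin_end = bin_start + bin_width - 1
    print(f"{bin_start}-{bin_end}: {'#' * counts[bin_start]}")
```

Changing `bin_width` changes how smooth or jagged the shape looks, which is why the `bins` argument in `plt.hist` is worth experimenting with.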

Example: Skewed Distribution

Histogram: Income Distribution

Frequency
    │
 50 │ ╱
    │ │
 40 │ │
    │ │
 30 │ │
    │ │  ╱
 20 │ │  │
    │ │  │  ╱
 10 │ │  │  │
    │ │  │  │  ╱
  0 └─────────────
    0  20  40  60  80
    Income ($1000s)

Interpretation: Right-skewed; most people earn lower incomes, with a long tail of higher earners
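
A quick numerical check for right skew is to compare the mean with the median: a few very large values pull the mean upward while the median barely moves. A minimal sketch, using made-up income figures:

```python
import statistics

# Hypothetical incomes in $1000s, invented for this example.
# Most values are low; one high earner forms the long right tail.
incomes = [18, 20, 22, 24, 25, 27, 30, 34, 45, 95]

mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)

print(f"mean   = {mean_income}")
print(f"median = {median_income}")

# In a right-skewed sample the mean typically exceeds the median.
print("right-skewed" if mean_income > median_income else "not right-skewed")
```

This is a rule of thumb rather than a formal test, but it often matches what the histogram shows.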

Python Code

import matplotlib.pyplot as plt
import numpy as np

# Generate data
data = np.random.normal(loc=70, scale=15, size=1000)

# Create histogram
plt.hist(data, bins=30, alpha=0.7, color='green', edgecolor='black')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.title('Histogram: Distribution Analysis')
plt.grid(True, alpha=0.3, axis='y')
plt.show()

Line Charts

Purpose

Show how a value changes over time or along another ordered, continuous axis.

Example: Time Series

Line Chart: Stock Price Over Time

Price ($)
    │
 150│        ╱╲
    │       ╱  ╲
 140│      ╱    ╲╱╲
    │     ╱        ╲
 130│    ╱          ╲
    │   ╱            ╲
 120└──────────────────
    Jan Feb Mar Apr May
    Month

Interpretation: Price rises then falls

Python Code

import matplotlib.pyplot as plt
import numpy as np

# Monthly sales figures (hard-coded time series data)
months = np.arange(1, 13)
sales = np.array([100, 120, 115, 140, 160, 155, 180, 190, 175, 200, 210, 220])

# Create line chart
plt.plot(months, sales, marker='o', linewidth=2, markersize=8, color='red')
plt.xlabel('Month')
plt.ylabel('Sales ($1000s)')
plt.title('Line Chart: Sales Trend')
plt.grid(True, alpha=0.3)
plt.xticks(months)
plt.show()
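
When monthly values fluctuate, a moving average can make the underlying trend easier to read. The sketch below smooths the sales figures from the line chart example with a trailing 3-month window; the window size is an assumption, not something the example prescribes.

```python
# Monthly sales ($1000s), same figures as the line chart example.
sales = [100, 120, 115, 140, 160, 155, 180, 190, 175, 200, 210, 220]
window = 3  # assumed window size

# Average each run of `window` consecutive months.
moving_avg = [
    sum(sales[i:i + window]) / window
    for i in range(len(sales) - window + 1)
]

# Label each average with the last month it covers (a trailing average).
for month, value in enumerate(moving_avg, start=window):
    print(f"Month {month:2d}: {value:6.1f}")
```

Plotting `moving_avg` on the same axes as the raw series shows the steady upward trend behind the month-to-month dips.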

Bar Charts

Purpose

Compare values across different categories.

Example: Category Comparison

Bar Chart: Sales by Region

Sales ($1000s)
    │
 300│ ┌─┐
    │ │ │
 200│ │ │ ┌─┐
    │ │ │ │ │ ┌─┐
 100│ │ │ │ │ │ │
    │ │ │ │ │ │ │
   0└─┴─┴─┴─┴─┴─┴─
       N   S   E
    Region

Interpretation: North region has highest sales

Python Code

import matplotlib.pyplot as plt

# Data
regions = ['North', 'South', 'East', 'West']
sales = [300, 150, 200, 180]

# Create bar chart
plt.bar(regions, sales, color=['red', 'blue', 'green', 'orange'])
plt.xlabel('Region')
plt.ylabel('Sales ($1000s)')
plt.title('Bar Chart: Regional Sales Comparison')
plt.grid(True, alpha=0.3, axis='y')
plt.show()

Box Plots

Purpose

Summarize a distribution through its median, quartiles, and outliers, and compare distributions across groups.

Example: Box Plot Structure

Box Plot: Data Distribution

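Value
    │
 100│    ●      (outlier)
    │    ┬      (upper whisker)
  80│ ┌──┴──┐   (Q3)
    │ │     │
  60│ ├─────┤   (median)
    │ │     │
  40│ └──┬──┘   (Q1)
    │    ┴      (lower whisker)
  20│    ●      (outlier)
    │
   0└─────────
    Data

Components:
- Box: Q1 to Q3 (the interquartile range, or IQR, covering the middle 50% of the data)
- Line in box: Median
- Whiskers: Extend to the most extreme values within 1.5 × IQR of the box (matplotlib's default); they reach the min/max only when there are no outliers
- Dots: Outliers, the points beyond the whiskers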
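
The numbers behind a box plot can be computed directly. The sketch below uses made-up data and the common 1.5 × IQR rule (matplotlib's default) to find the quartiles, the whisker ends, and the outliers. `statistics.quantiles` requires Python 3.8 or later, and its `inclusive` method interpolates linearly between sorted data points.

```python
import statistics

# Hypothetical measurements, invented for this example.
data = sorted([20, 45, 50, 55, 60, 62, 65, 70, 75, 100])

# Quartiles: Q1, median (Q2), and Q3.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Fences: values beyond these limits are treated as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in data if v < lower_fence or v > upper_fence]

# Whiskers end at the most extreme values that are still inside the fences.
inside = [v for v in data if lower_fence <= v <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print(f"whiskers: {whisker_low} to {whisker_high}")
print(f"outliers: {outliers}")
```

Here 20 and 100 fall outside the fences, so they would be drawn as individual dots while the whiskers stop at 45 and 75.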

Python Code

import matplotlib.pyplot as plt
import numpy as np

# Generate data
data = [np.random.normal(50, 15, 100) for _ in range(4)]

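# Create box plot; group names are set with xticks because the boxplot
# `labels` keyword is deprecated in newer Matplotlib releases
plt.boxplot(data)
plt.xticks([1, 2, 3, 4], ['Group A', 'Group B', 'Group C', 'Group D'])
plt.ylabel('Value')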
plt.title('Box Plot: Distribution Comparison')
plt.grid(True, alpha=0.3, axis='y')
plt.show()

Heatmaps

Purpose

Show intensity/magnitude of values in a 2D matrix using colors.

Example: Correlation Heatmap

Heatmap: Correlation Matrix

        X    Y    Z
    ┌─────────────┐
  X │ 1.0  0.8  0.2│
    ├─────────────┤
  Y │ 0.8  1.0  0.5│
    ├─────────────┤
  Z │ 0.2  0.5  1.0│
    └─────────────┘

Color intensity: Red (high) → White (low)
Interpretation: X and Y are highly correlated (0.8), while X and Z are only weakly correlated (0.2)
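
The core idea of a heatmap is a mapping from each value to a color intensity. The sketch below illustrates that mapping in plain text, using shade characters in place of colors and the correlation matrix from the example above. The `shade` helper, the `SHADES` scale, and the 0 to 1 value range are assumptions made for this illustration.

```python
labels = ["X", "Y", "Z"]
corr = [
    [1.0, 0.8, 0.2],
    [0.8, 1.0, 0.5],
    [0.2, 0.5, 1.0],
]

# Five intensity levels, from lightest (low) to darkest (high).
SHADES = " ░▒▓█"


def shade(value, vmin=0.0, vmax=1.0):
    """Map a value in [vmin, vmax] to one of the shade characters."""
    fraction = (value - vmin) / (vmax - vmin)
    fraction = max(0.0, min(1.0, fraction))  # clamp out-of-range values
    return SHADES[round(fraction * (len(SHADES) - 1))]


# Print the matrix as a grid of shades with row and column labels.
print("  " + " ".join(labels))
for label, row in zip(labels, corr):
    print(label + " " + " ".join(shade(v) for v in row))
```

A plotting library does the same thing with a colormap: `vmin` and `vmax` fix the ends of the scale, and every cell is colored by where its value falls between them.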

Python Code

import matplotlib.pyplot as plt
import numpy as np

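# Create correlation matrix
# (the three columns are independent random samples, so the
# off-diagonal correlations will be close to 0)
data = np.random.randn(100, 3)
corr = np.corrcoef(data.T)  # transpose so each row is one variable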

# Create heatmap
plt.imshow(corr, cmap='RdYlBu_r', vmin=-1, vmax=1)
plt.colorbar(label='Correlation')
plt.xticks([0, 1, 2], ['X', 'Y', 'Z'])
plt.yticks([0, 1, 2], ['X', 'Y', 'Z'])
plt.title('Heatmap: Correlation Matrix')
plt.show()

Best Practices

  1. Choose the right chart type for your data
  2. Label axes clearly with units
  3. Use colors meaningfully (not just for aesthetics)
  4. Avoid clutter - remove unnecessary elements
  5. Provide context with titles and legends
  6. Consider your audience - simplify for general viewers

❓ What is the primary purpose of a scatter plot?

❓ Which chart type is best for showing distribution?

❓ What does a box plot show?

❓ Which chart type is best for showing trends over time?
