Data Visualization Techniques
Duration: 5 min
Data visualization transforms complex data into visual representations that reveal patterns, trends, and insights. Effective visualizations communicate data stories and facilitate better decision-making.
Scatter Plots
Purpose
Show relationships between two continuous variables and identify correlations, trends, and outliers.
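To put a number on what a scatter plot shows, one option is the Pearson correlation coefficient. The sketch below is only an illustration: the helper name `pearson_r` and the sample heights and weights are assumptions made up for this example, not data from the lesson.

```python
import numpy as np


def pearson_r(x, y):
    """Return the Pearson correlation coefficient of two 1-D sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # np.corrcoef returns a 2x2 matrix; the off-diagonal entry is r(x, y)
    return float(np.corrcoef(x, y)[0, 1])


# Hypothetical sample values, chosen only to illustrate an upward trend
heights_cm = [160, 165, 170, 175, 180]
weights_kg = [62, 68, 71, 79, 85]

r = pearson_r(heights_cm, weights_kg)
print(f"Pearson r = {r:.2f}")
```

Values of r near +1 or -1 suggest a strong linear relationship, and values near 0 suggest little or no linear relationship. A scatter plot is still worth drawing, because r alone can hide outliers and curved patterns.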
Example: Positive Correlation
Scatter Plot: Height vs Weight
Weight (kg)
   │
90 │            ●
   │        ●  ●
80 │      ●   ●
   │   ●   ●
70 │ ●  ●
   │
60 └──────────────
    160  170  180
     Height (cm)
Interpretation: Taller people tend to weigh more
Example: No Correlation
Scatter Plot: Age vs Shoe Size
Shoe Size
  │
12│  ●        ●
  │       ●     ●
10│ ●    ●
  │    ●       ●
 8│  ●      ●
  │
 6└──────────────
   20   40   60
    Age (years)
Interpretation: No clear relationship
Python Code
import matplotlib.pyplot as plt
import numpy as np
# Generate correlated data
x = np.random.rand(50) * 100
y = x + np.random.randn(50) * 10
# Create scatter plot
plt.scatter(x, y, alpha=0.6, s=100, color='blue')
plt.xlabel('Variable X')
plt.ylabel('Variable Y')
plt.title('Scatter Plot: Relationship Analysis')
plt.grid(True, alpha=0.3)
plt.show()
Histograms
Purpose
Display the distribution of a single variable across bins to identify patterns, skewness, and outliers.
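The binning and the idea of skewness can also be checked numerically. The following is a minimal sketch, assuming synthetic data: the helper `sample_skewness` and both generated samples are illustrative and not part of the lesson's dataset.

```python
import numpy as np


def sample_skewness(values):
    """Return the (biased) sample skewness: the mean of the cubed z-scores."""
    values = np.asarray(values, dtype=float)
    centered = values - values.mean()
    std = values.std()  # population standard deviation (ddof=0)
    return float(np.mean(centered ** 3) / std ** 3)


rng = np.random.default_rng(seed=0)  # fixed seed so the sketch is repeatable
symmetric = rng.normal(loc=70, scale=15, size=1000)   # bell-shaped sample
right_skewed = rng.exponential(scale=20, size=1000)   # long right tail

# np.histogram returns the count in each bin plus the bin edges
counts, edges = np.histogram(symmetric, bins=30)
print(counts.sum(), len(edges))  # 1000 values in total, 31 edges for 30 bins

print(f"symmetric sample skewness:    {sample_skewness(symmetric):.2f}")
print(f"right-skewed sample skewness: {sample_skewness(right_skewed):.2f}")
```

A skewness near 0 is consistent with a symmetric shape, while a clearly positive value matches a right-skewed histogram with a long tail toward high values.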
Example: Normal Distribution
Histogram: Test Scores
Frequency
   │
30 │        ╱╲
   │       ╱  ╲
20 │      ╱    ╲
   │     ╱      ╲
10 │    ╱        ╲
   │   ╱          ╲
 0 └──────────────────
    40 50 60 70 80 90
          Score
Interpretation: Bell-shaped, most scores around 65
Example: Skewed Distribution
Histogram: Income Distribution
Frequency
│
50 │ ╱
│ │
40 │ │
│ │
30 │ │
│ │ ╱
20 │ │ │
│ │ │ ╱
10 │ │ │ │
│ │ │ │ ╱
0 └─────────────
0 20 40 60 80
Income ($1000s)
Interpretation: Right-skewed, most earn less
Python Code
import matplotlib.pyplot as plt
import numpy as np
# Generate data
data = np.random.normal(loc=70, scale=15, size=1000)
# Create histogram
plt.hist(data, bins=30, alpha=0.7, color='green', edgecolor='black')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.title('Histogram: Distribution Analysis')
plt.grid(True, alpha=0.3, axis='y')
plt.show()
Line Charts
Purpose
Show trends over time or continuous data progression.
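When a series is noisy, a simple moving average is one common way to make the underlying trend easier to see before plotting it. This is a sketch under assumptions: the helper name `moving_average`, the window size, and the monthly values are all illustrative.

```python
import numpy as np


def moving_average(values, window):
    """Return the simple moving average over a sliding window."""
    values = np.asarray(values, dtype=float)
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    kernel = np.ones(window) / window
    # mode="valid" keeps only positions where the window fits entirely
    return np.convolve(values, kernel, mode="valid")


# Illustrative monthly values (in $1000s)
monthly = [100, 120, 115, 140, 160, 155]
smoothed = moving_average(monthly, window=3)
print(len(smoothed))  # 4 averaged points from 6 inputs with window=3
print(smoothed)
```

Plotting both the raw and the smoothed series on the same axes lets the reader see the trend without losing the original detail.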
Example: Time Series
Line Chart: Stock Price Over Time
Price ($)
   │
150│      ╱╲
   │     ╱  ╲
140│    ╱    ╲╱╲
   │   ╱        ╲
130│  ╱          ╲
   │ ╱            ╲
120└──────────────────
    Jan Feb Mar Apr May
          Month
Interpretation: Price rises then falls
Python Code
import matplotlib.pyplot as plt
import numpy as np
# Generate time series data
months = np.arange(1, 13)
sales = np.array([100, 120, 115, 140, 160, 155, 180, 190, 175, 200, 210, 220])
# Create line chart
plt.plot(months, sales, marker='o', linewidth=2, markersize=8, color='red')
plt.xlabel('Month')
plt.ylabel('Sales ($1000s)')
plt.title('Line Chart: Sales Trend')
plt.grid(True, alpha=0.3)
plt.xticks(months)
plt.show()
Bar Charts
Purpose
Compare values across different categories.
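Comparisons are often easier to read when the bars are ordered by value. This is a common convention, not something this lesson requires. The sketch below assumes illustrative regional totals and only prepares the data; the variable names are hypothetical.

```python
# Illustrative category totals (in $1000s)
sales_by_region = {"North": 300, "South": 150, "East": 200, "West": 180}

# Rank categories from largest to smallest value
ranked = sorted(sales_by_region.items(), key=lambda item: item[1], reverse=True)
regions = [name for name, _ in ranked]
sales = [value for _, value in ranked]

print(regions)  # ['North', 'East', 'West', 'South']
print(sales)    # [300, 200, 180, 150]
```

The two sorted lists can then be passed to `plt.bar(regions, sales)`. For categories with a natural order, such as months or age groups, keeping that order is usually the better choice.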
Example: Category Comparison
Bar Chart: Sales by Region
Sales ($1000s)
│
300│ ┌─┐
│ │ │
200│ │ │ ┌─┐
│ │ │ │ │ ┌─┐
100│ │ │ │ │ │ │
│ │ │ │ │ │ │
0└─┴─┴─┴─┴─┴─┴─
N S E W NE SE
Region
Interpretation: North region has highest sales
Python Code
import matplotlib.pyplot as plt
# Data
regions = ['North', 'South', 'East', 'West']
sales = [300, 150, 200, 180]
# Create bar chart
plt.bar(regions, sales, color=['red', 'blue', 'green', 'orange'])
plt.xlabel('Region')
plt.ylabel('Sales ($1000s)')
plt.title('Bar Chart: Regional Sales Comparison')
plt.grid(True, alpha=0.3, axis='y')
plt.show()
Box Plots
Purpose
Show distribution, median, quartiles, and outliers.
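The quantities a box plot draws can be computed directly. The sketch below assumes the 1.5 × IQR whisker rule, which is Matplotlib's default (`whis=1.5`). The helper name `box_plot_stats` and the sample values are illustrative assumptions.

```python
import numpy as np


def box_plot_stats(values, whisker=1.5):
    """Return quartiles, whisker ends, and outliers using the 1.5 * IQR rule."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    lower_fence = q1 - whisker * iqr
    upper_fence = q3 + whisker * iqr
    # Whiskers end at the most extreme data points that lie inside the fences
    inside = values[(values >= lower_fence) & (values <= upper_fence)]
    outliers = values[(values < lower_fence) | (values > upper_fence)]
    return {
        "q1": float(q1),
        "median": float(median),
        "q3": float(q3),
        "whisker_low": float(inside.min()),
        "whisker_high": float(inside.max()),
        "outliers": sorted(outliers.tolist()),
    }


# Hypothetical sample with one low and one high extreme value
sample = [20, 45, 50, 55, 60, 62, 65, 70, 75, 100]
stats = box_plot_stats(sample)
print(stats["q1"], stats["median"], stats["q3"])    # 51.25 61.0 68.75
print(stats["whisker_low"], stats["whisker_high"])  # 45.0 75.0
print(stats["outliers"])                            # [20.0, 100.0]
```

In this sample the fences fall at 25 and 95, so 20 and 100 are drawn as separate points and the whiskers stop at 45 and 75.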
Example: Box Plot Structure
Box Plot: Data Distribution
Value
│
100│ ● (outlier)
│
80│ ┌─────┐
│ │ │ (Q3)
60│ ├─────┤ (median)
│ │ │ (Q1)
40│ └─────┘
│ │
20│ ● (outlier)
│
0└─────
Data
Components:
- Box: Q1 to Q3 (middle 50%)
- Line in box: Median
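- Whiskers: Extend to the most extreme values within 1.5 × IQR of the box (Matplotlib's default); they reach the min/max only when there are no outliers
- Dots: Outliers
Python Code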
import matplotlib.pyplot as plt
import numpy as np
# Generate data
data = [np.random.normal(50, 15, 100) for _ in range(4)]
# Create box plot
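plt.boxplot(data)
# Label the groups via xticks (boxes sit at positions 1..N); the `labels`
# keyword is deprecated since Matplotlib 3.9 in favor of `tick_labels`
plt.xticks([1, 2, 3, 4], ['Group A', 'Group B', 'Group C', 'Group D'])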
plt.ylabel('Value')
plt.title('Box Plot: Distribution Comparison')
plt.grid(True, alpha=0.3, axis='y')
plt.show()
Heatmaps
Purpose
Show intensity/magnitude of values in a 2D matrix using colors.
Example: Correlation Heatmap
Heatmap: Correlation Matrix
X Y Z
┌─────────────┐
X │ 1.0 0.8 0.2│
├─────────────┤
Y │ 0.8 1.0 0.5│
├─────────────┤
Z │ 0.2 0.5 1.0│
└─────────────┘
Color intensity: Red (high) → White (low)
Interpretation: X and Y are highly correlated
Python Code
import matplotlib.pyplot as plt
import numpy as np
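# Generate 100 samples of 3 independent random variables
# (off-diagonal correlations will therefore be close to 0)
data = np.random.randn(100, 3)
# Correlation matrix between the 3 columns
corr = np.corrcoef(data.T)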
# Create heatmap
plt.imshow(corr, cmap='RdYlBu_r', vmin=-1, vmax=1)
plt.colorbar(label='Correlation')
plt.xticks([0, 1, 2], ['X', 'Y', 'Z'])
plt.yticks([0, 1, 2], ['X', 'Y', 'Z'])
plt.title('Heatmap: Correlation Matrix')
plt.show()
Best Practices
- Choose the right chart type for your data
- Label axes clearly with units
- Use colors meaningfully (not just for aesthetics)
- Avoid clutter - remove unnecessary elements
- Provide context with titles and legends
- Consider your audience - simplify for general viewers
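Several of the practices above can be applied together in one small figure. The sketch below is one possible arrangement, not a prescribed template: the function name `make_sales_chart` and the sample values are assumptions, and the `Agg` backend is selected only so the code runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt


def make_sales_chart(months, sales):
    """Build a line chart with labeled axes, units, a title, and a legend."""
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.plot(months, sales, marker="o", label="Monthly sales")
    ax.set_xlabel("Month")
    ax.set_ylabel("Sales ($1000s)")  # units stated on the axis
    ax.set_title("Sales Trend")
    ax.legend()
    # Reduce clutter: hide the top and right borders
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)
    ax.set_xticks(months)
    return fig, ax


# Illustrative values only
fig, ax = make_sales_chart([1, 2, 3, 4], [100, 120, 115, 140])
print(ax.get_title(), "|", ax.get_xlabel(), "|", ax.get_ylabel())
```

In an interactive session the backend line can be dropped and `plt.show()` called instead. Calling `fig.savefig(...)` writes the figure to a file.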
❓ What is the primary purpose of a scatter plot?
❓ Which chart type is best for showing distribution?
❓ What does a box plot show?
❓ Which chart type is best for showing trends over time?