Neural Networks Fundamentals
Duration: 5 min
Neural networks are inspired by biological neurons in the brain. They consist of interconnected layers of nodes that process information through weighted connections and activation functions.
Biological Neuron vs Artificial Neuron
Biological Neuron
Biological Neuron:
Dendrites (inputs)
│ │ │
└─┼─┘
│
Soma (cell body)
│
Axon (output)
│
Synapse (connection)
Artificial Neuron (Perceptron)
Artificial Neuron:
x₁ ──w₁──┐
x₂ ──w₂──┼─→ Σ ──f(·)── y
x₃ ──w₃──┤
b ───────┘
Where:
- x₁, x₂, x₃ = inputs
- w₁, w₂, w₃ = weights
- b = bias
- Σ = summation
- f(·) = activation function
- y = output
Perceptron: The Simplest Neural Network
How It Works
Step 1: Weighted Sum
z = w₁x₁ + w₂x₂ + ... + wₙxₙ + b
Step 2: Apply Activation Function
y = f(z)
Step 3: Output
If y > threshold: output = 1 (class A)
If y ≤ threshold: output = 0 (class B)
Example: AND Gate
Truth Table:
x₁ x₂ | Output
─────────┼────────
0 0 | 0
0 1 | 0
1 0 | 0
1 1 | 1
Perceptron with w₁=0.5, w₂=0.5, b=-0.75:
z = 0.5x₁ + 0.5x₂ - 0.75
If z > 0: output = 1, else output = 0
Verification:
(0,0): z = -0.75 → 0 ✓
(0,1): z = -0.25 → 0 ✓
(1,0): z = -0.25 → 0 ✓
(1,1): z = 0.25 → 1 ✓
Python Code
import numpy as np

def perceptron(inputs, weights, bias, threshold=0):
    # Weighted sum of the inputs plus the bias
    z = np.dot(inputs, weights) + bias
    # Step activation: fire only when z exceeds the threshold
    return 1 if z > threshold else 0

# AND gate
weights = np.array([0.5, 0.5])
bias = -0.75
test_cases = [(0,0), (0,1), (1,0), (1,1)]
for x1, x2 in test_cases:
    output = perceptron([x1, x2], weights, bias)
    print(f"({x1}, {x2}) → {output}")
Activation Functions
Why Activation Functions?
Without activation: y = w₁x₁ + w₂x₂ + b (linear)
With activation: y = f(w₁x₁ + w₂x₂ + b) (non-linear)
A stack of purely linear layers collapses into a single linear map, so it cannot learn non-linear patterns no matter how many layers it has.
Activation functions enable learning non-linear relationships.
Sigmoid Function
Formula: σ(z) = 1 / (1 + e^(-z))
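The formula can be checked numerically. This is a minimal sketch using only the standard library; the function name `sigmoid` and the sample points are illustrative choices, and the direct form shown here can overflow for very large negative z.

```python
import math

def sigmoid(z):
    """Logistic sigmoid: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

# Evaluate at a few illustrative points
for z in (-5, 0, 5):
    print(f"sigmoid({z}) = {sigmoid(z):.4f}")
# sigmoid(-5) = 0.0067
# sigmoid(0) = 0.5000
# sigmoid(5) = 0.9933
```

The outputs stay strictly between 0 and 1 and cross 0.5 at z = 0.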
Graph: an S-shaped curve that rises from near 0 at z = -5, through 0.5 at z = 0, to near 1 at z = 5.
Properties:
- Output range: (0, 1)
- Smooth and differentiable everywhere, but the gradient shrinks toward 0 for large |z|
- Used in output layer for binary classification
ReLU (Rectified Linear Unit)
Formula: ReLU(z) = max(0, z)
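A minimal sketch of the formula and its derivative, using plain Python. The helper name `relu_grad` is hypothetical, and treating the derivative at exactly z = 0 as 0 is a convention assumed here, not something the lesson specifies.

```python
def relu(z):
    """ReLU(z) = max(0, z)."""
    return max(0.0, z)

def relu_grad(z):
    """Derivative of ReLU: 1 for z > 0, otherwise 0 (value at z == 0 is a convention)."""
    return 1.0 if z > 0 else 0.0

for z in (-2.0, 0.0, 3.0):
    print(z, relu(z), relu_grad(z))
# -2.0 0.0 0.0
# 0.0 0.0 0.0
# 3.0 3.0 1.0
```

For negative inputs both the output and the gradient are 0, so no learning signal flows back through the neuron for those inputs.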
Graph: flat at 0 for z ≤ 0, then a straight line of slope 1 for z > 0.
Properties:
- Output range: [0, ∞)
- Computationally efficient
- Most popular in hidden layers
- Problem: Dying ReLU (a neuron whose input stays negative always outputs 0, receives zero gradient, and stops learning)
Tanh Function
Formula: tanh(z) = (e^z - e^(-z)) / (e^z + e^(-z))
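The sketch below implements the formula directly and compares its slope at z = 0 with that of sigmoid. The helper names (`tanh_grad`, `sigmoid_grad`) are illustrative; the derivative identities used, 1 - tanh²(z) and σ(z)(1 - σ(z)), are standard results rather than something stated in the lesson.

```python
import math

def tanh(z):
    """tanh(z) = (e^z - e^(-z)) / (e^z + e^(-z))."""
    ez, enz = math.exp(z), math.exp(-z)
    return (ez - enz) / (ez + enz)

def tanh_grad(z):
    """Derivative of tanh: 1 - tanh(z)^2."""
    return 1.0 - tanh(z) ** 2

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    """Derivative of sigmoid: s * (1 - s)."""
    s = sigmoid(z)
    return s * (1.0 - s)

# Slopes at the origin, where both derivatives peak
print(tanh_grad(0.0), sigmoid_grad(0.0))  # 1.0 0.25
```

In practice the built-in `math.tanh` is the safer choice, since the direct formula overflows for large |z|.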
Graph: an S-shaped curve that rises from near -1 at z = -5, through 0 at z = 0, to near 1 at z = 5.
Properties:
- Output range: (-1, 1)
- Zero-centered output, which often makes optimization easier
- Stronger gradients than sigmoid (maximum slope 1 at z = 0, versus 0.25 for sigmoid)
Multi-Layer Neural Network
Architecture
Input Layer      Hidden Layer 1      Hidden Layer 2      Output Layer
x₁ ────────────→ h₁ ───────────────→ o₁ ───────────────→ y₁
x₂ ────────────→ h₂ ───────────────→ o₂ ───────────────→ y₂
x₃ ────────────→ h₃ ───────────────→ o₃
(3 inputs)       (3 neurons)         (3 neurons)         (2 outputs)
Connections: Fully connected (dense). Every node feeds every node in the next layer; the diagram draws only one arrow per node for readability.
Forward Propagation
Layer 1 (Input → Hidden):
h = f(W₁ · x + b₁)
Layer 2 (Hidden → Hidden):
o = f(W₂ · h + b₂)
Layer 3 (Hidden → Output):
y = f(W₃ · o + b₃)
Where:
- W = weight matrix
- b = bias vector
- f = activation function
Backpropagation: How Networks Learn
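Learning begins with a forward pass, so it may help to see the three layer equations as code first. The following is a minimal sketch for the 3-3-3-2 architecture, assuming sigmoid as the activation f in every layer, zero biases, and randomly drawn placeholder weights. None of these choices come from the lesson, and the names `forward`, `W1`, `b1`, and so on are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)  # fixed seed; the weights are arbitrary placeholders

# Each weight matrix has shape (neurons in this layer, neurons in previous layer)
W1, b1 = rng.normal(size=(3, 3)), np.zeros(3)  # input    -> hidden 1
W2, b2 = rng.normal(size=(3, 3)), np.zeros(3)  # hidden 1 -> hidden 2
W3, b3 = rng.normal(size=(2, 3)), np.zeros(2)  # hidden 2 -> output

def forward(x):
    h = sigmoid(W1 @ x + b1)  # h = f(W1 · x + b1)
    o = sigmoid(W2 @ h + b2)  # o = f(W2 · h + b2)
    y = sigmoid(W3 @ o + b3)  # y = f(W3 · o + b3)
    return y

x = np.array([0.5, -1.0, 2.0])  # one example with 3 input features
y = forward(x)
print(y.shape)  # (2,)
```

Because the weights are random, the two output values carry no meaning yet; training is what adjusts `W1`, `W2`, and `W3` so that the outputs become useful.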
The Learning Process
1. Forward Pass
Input → Hidden → Output
2. Calculate Loss
Loss = (predicted - actual)²
3. Backward Pass
Calculate gradients using chain rule
4. Update Weights
w_new = w_old - learning_rate × gradient
5. Repeat until convergence
Gradient Descent Visualization
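The five steps can be traced on a toy problem. The sketch below trains a single sigmoid neuron on the AND gate with the squared-error loss and the update rule w_new = w_old - learning_rate × gradient. The zero initialization, the learning rate of 0.5, the 5000 epochs, and summing gradients over all four examples are assumptions made for this illustration only.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# AND-gate data: ((x1, x2), target)
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

w = [0.0, 0.0]       # weights (zero start keeps the run reproducible)
b = 0.0              # bias
learning_rate = 0.5  # assumed value for this toy problem

def total_loss():
    # Sum of (predicted - actual)^2 over the four examples
    return sum((sigmoid(w[0] * x1 + w[1] * x2 + b) - t) ** 2
               for (x1, x2), t in data)

initial_loss = total_loss()

for epoch in range(5000):
    grad_w = [0.0, 0.0]
    grad_b = 0.0
    for (x1, x2), t in data:
        # Steps 1-2: forward pass; the loss for this example is (y - t)^2
        y = sigmoid(w[0] * x1 + w[1] * x2 + b)
        # Step 3: chain rule, dL/dz = dL/dy * dy/dz = 2(y - t) * y(1 - y)
        dz = 2.0 * (y - t) * y * (1.0 - y)
        grad_w[0] += dz * x1
        grad_w[1] += dz * x2
        grad_b += dz
    # Step 4: move each parameter against its gradient
    w[0] -= learning_rate * grad_w[0]
    w[1] -= learning_rate * grad_w[1]
    b -= learning_rate * grad_b

final_loss = total_loss()
predictions = [round(sigmoid(w[0] * x1 + w[1] * x2 + b)) for (x1, x2), _ in data]
print(f"loss: {initial_loss:.3f} -> {final_loss:.3f}")
print(predictions)
```

The initial loss is 1.0 because every prediction starts at 0.5. With these settings the loss should fall well below that, and the rounded predictions should reproduce the AND truth table.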
Loss Function:
Plotted against a weight w, the loss forms a curve with a valley; the lowest point of the valley is the minimum.
Gradient descent finds a minimum:
Start at a random point → Follow the slope downward → Reach a minimum (possibly a local one)
Common Network Architectures
Feedforward Network (MLP)
Simple, fully connected layers
Used for: Classification, regression
Convolutional Neural Network (CNN)
Convolutional layers for spatial features
Used for: Image recognition, computer vision
Recurrent Neural Network (RNN)
Loops for sequential data
Used for: Time series, NLP, speech
Transformer
Attention mechanisms for sequences
Used for: NLP, language models
Key Concepts Summary
| Concept | Purpose |
|---|---|
| Neuron | Basic processing unit |
| Weight | Strength of connection |
| Bias | Shift activation threshold |
| Activation | Introduce non-linearity |
| Layer | Group of neurons |
| Forward Pass | Compute output |
| Backprop | Calculate gradients |
| Loss | Measure error |
| Gradient Descent | Optimize weights |
❓ What is the primary function of a perceptron?
❓ Why do we need activation functions?
❓ What is the output range of the sigmoid function?
❓ What does backpropagation do?