Module 7 of 16 · Maths and Statistics in AI · Beginner

Neural Networks Fundamentals

Duration: 5 min

Neural networks are inspired by biological neurons in the brain. They consist of interconnected layers of nodes that process information through weighted connections and activation functions.

Biological Neuron vs Artificial Neuron

Biological Neuron

Biological Neuron:
    Dendrites (inputs)
         │ │ │
         └─┼─┘
           │
        Soma (cell body)
           │
        Axon (output)
           │
        Synapse (connection)

Artificial Neuron (Perceptron)

Artificial Neuron:
    x₁ ──w₁──┐
    x₂ ──w₂──┼─→ Σ ──f(·)── y
    x₃ ──w₃──┤
    b ───────┘

Where:
- x₁, x₂, x₃ = inputs
- w₁, w₂, w₃ = weights
- b = bias
- Σ = summation
- f(·) = activation function
- y = output

Perceptron: The Simplest Neural Network

How It Works

Step 1: Weighted Sum
z = w₁x₁ + w₂x₂ + ... + wₙxₙ + b

Step 2: Apply Activation Function
y = f(z)
For the classic perceptron, f is a step function with a threshold (usually 0).

Step 3: Output
If z > threshold: y = 1 (class A)
If z ≤ threshold: y = 0 (class B)

Example: AND Gate

Truth Table:
x₁  x₂  | Output
─────────┼────────
 0   0  |   0
 0   1  |   0
 1   0  |   0
 1   1  |   1

Perceptron with w₁=0.5, w₂=0.5, b=-0.75:
z = 0.5x₁ + 0.5x₂ - 0.75
If z > 0: output = 1, else output = 0

Verification:
(0,0): z = -0.75 → 0 ✓
(0,1): z = -0.25 → 0 ✓
(1,0): z = -0.25 → 0 ✓
(1,1): z = 0.25 → 1 ✓

Python Code

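import numpy as np

def perceptron(inputs, weights, bias, threshold=0):
    """Return 1 if the weighted sum of inputs plus bias exceeds threshold, else 0."""
    z = np.dot(inputs, weights) + bias  # weighted sum
    return 1 if z > threshold else 0    # step activation

# AND gate: weights and bias from the example above
weights = np.array([0.5, 0.5])
bias = -0.75

# Check every row of the truth table
test_cases = [(0,0), (0,1), (1,0), (1,1)]
for x1, x2 in test_cases:
    output = perceptron([x1, x2], weights, bias)
    print(f"({x1}, {x2}) → {output}")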

Activation Functions

Why Activation Functions?

Without activation: y = w₁x₁ + w₂x₂ + b (linear)
With activation: y = f(w₁x₁ + w₂x₂ + b) (non-linear)

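Stacking linear layers only produces another linear function, so a network without activations cannot learn complex patterns, however many layers it has.
Activation functions enable learning non-linear relationships.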
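
To make the point about linearity concrete, here is a minimal sketch showing that two layers without an activation collapse into one linear layer. The layer sizes (2 → 3 → 1), the random weights, and the names W_combined and b_combined are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two "layers" with no activation function (hypothetical sizes: 2 -> 3 -> 1).
W1, b1 = rng.normal(size=(3, 2)), rng.normal(size=3)
W2, b2 = rng.normal(size=(1, 3)), rng.normal(size=1)

def two_linear_layers(x):
    return W2 @ (W1 @ x + b1) + b2

# The same mapping written as a single linear layer.
W_combined = W2 @ W1
b_combined = W2 @ b1 + b2

def one_linear_layer(x):
    return W_combined @ x + b_combined

x = np.array([0.3, -1.2])
print(np.allclose(two_linear_layers(x), one_linear_layer(x)))  # True
```

Placing a non-linear f between the two layers breaks this equivalence, which is what lets deeper networks represent more than a single linear map.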

Sigmoid Function

Formula: σ(z) = 1 / (1 + e^(-z))

Graph:
    σ(z)
      │
    1 ├──────────────
      │      ╱
  0.5 ├─────╱
      │    ╱
    0 └───╱──────────
      -5    0    5
            z

(σ(0) = 0.5; the curve flattens toward 0 for large negative z and toward 1 for large positive z.)

Properties:
- Output range: (0, 1)
- Smooth gradient
- Used in output layer for binary classification
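
A minimal sketch of the sigmoid formula in NumPy; the sample inputs are arbitrary and chosen only to show the output staying inside (0, 1).

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-5.0, 0.0, 5.0])
print(sigmoid(z))  # approximately 0.0067, 0.5, 0.9933
```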

ReLU (Rectified Linear Unit)

Formula: ReLU(z) = max(0, z)

Graph:
    ReLU(z)
      │
      │        ╱
      │       ╱
      │      ╱
    0 └─────╱────────
      -5    0    5
            z

(ReLU is 0 for z < 0 and equals z for z ≥ 0.)

Properties:
- Output range: [0, ∞)
- Computationally efficient
- Most popular in hidden layers
- Problem: Dying ReLU (a neuron whose input stays negative always outputs 0, receives zero gradient, and stops learning)
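
A minimal sketch of ReLU and its slope, which also illustrates the dying ReLU issue: wherever z is negative, both the output and the gradient are 0. The helper name relu_grad and the choice of slope 0 at exactly z = 0 are assumptions (a common convention).

```python
import numpy as np

def relu(z):
    """ReLU(z) = max(0, z), applied element-wise."""
    return np.maximum(0, z)

def relu_grad(z):
    """Slope of ReLU: 1 where z > 0, otherwise 0 (0 at z = 0 by convention)."""
    return (z > 0).astype(float)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(z))       # values: 0, 0, 0, 0.5, 2
print(relu_grad(z))  # values: 0, 0, 0, 1, 1
```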

Tanh Function

Formula: tanh(z) = (e^z - e^(-z)) / (e^z + e^(-z))

Graph:
    tanh(z)
      │
    1 ├─────────────
      │      ╱
    0 ├─────╱
      │   ╱
   -1 ├──╱─────────
      │
      └─────────────
      -5  0  5
      z

Properties:
- Output range: (-1, 1)
- Zero-centered output (often makes optimization easier than with sigmoid)
- Stronger gradients than sigmoid (maximum slope 1 at z = 0, versus 0.25 for sigmoid)
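
A minimal sketch that implements the tanh formula directly, checks it against NumPy's built-in np.tanh, and compares the slopes of tanh and sigmoid at z = 0. The variable names are illustrative; the derivative formulas used are the standard ones (1 − tanh²(z) and σ(z)(1 − σ(z))).

```python
import numpy as np

def tanh(z):
    """tanh(z) = (e^z - e^(-z)) / (e^z + e^(-z))."""
    return (np.exp(z) - np.exp(-z)) / (np.exp(z) + np.exp(-z))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, 0.0, 2.0])
print(np.allclose(tanh(z), np.tanh(z)))  # True

# Slopes at z = 0, where both functions are steepest.
tanh_grad_at_0 = 1 - tanh(0.0) ** 2                       # 1.0
sigmoid_grad_at_0 = sigmoid(0.0) * (1 - sigmoid(0.0))     # 0.25
print(tanh_grad_at_0, sigmoid_grad_at_0)
```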

Multi-Layer Neural Network

Architecture

Input Layer    Hidden Layer 1    Hidden Layer 2    Output Layer
    │                │                 │                │
   x₁ ─────────────→ h₁ ─────────────→ o₁ ─────────────→ y₁
    │                │                 │                │
   x₂ ─────────────→ h₂ ─────────────→ o₂ ─────────────→ y₂
    │                │                 │                │
   x₃ ─────────────→ h₃ ─────────────→ o₃
    │                │                 │
   (3 inputs)    (3 neurons)      (3 neurons)      (2 outputs)

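Connections: Fully connected (dense). Every node feeds every node in the next layer; the diagram shows one arrow per node for readability.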

Forward Propagation

Layer 1 (Input → Hidden):
h = f(W₁ · x + b₁)

Layer 2 (Hidden → Hidden):
o = f(W₂ · h + b₂)

Layer 3 (Hidden → Output):
y = f(W₃ · o + b₃)

Where:
- W = weight matrix
- b = bias vector
- f = activation function
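
The three layer equations above can be sketched in NumPy as follows. The layer sizes (3 → 3 → 3 → 2) come from the architecture diagram; the random weights, zero biases, and the choice of sigmoid for f are assumptions made only so the example runs.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(42)

# Weight matrices have shape (neurons in this layer, neurons in previous layer).
W1, b1 = rng.normal(size=(3, 3)), np.zeros(3)  # input    -> hidden 1
W2, b2 = rng.normal(size=(3, 3)), np.zeros(3)  # hidden 1 -> hidden 2
W3, b3 = rng.normal(size=(2, 3)), np.zeros(2)  # hidden 2 -> output

def forward(x):
    h = sigmoid(W1 @ x + b1)  # Layer 1
    o = sigmoid(W2 @ h + b2)  # Layer 2
    y = sigmoid(W3 @ o + b3)  # Layer 3
    return y

x = np.array([0.5, -1.0, 2.0])
y = forward(x)
print(y.shape)  # (2,)
```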

Backpropagation: How Networks Learn

The Learning Process

1. Forward Pass
   Input → Hidden → Output

2. Calculate Loss
   Loss = (predicted - actual)²   (squared error, one common choice)

3. Backward Pass
   Calculate gradients using chain rule

4. Update Weights
   w_new = w_old - learning_rate × gradient

5. Repeat until convergence
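
The five steps can be traced on a deliberately tiny problem. This is a sketch under stated assumptions, not a full backpropagation implementation: a single linear neuron y = w · x with no bias, one made-up training example, and a hand-derived gradient.

```python
# Hypothetical toy problem: one linear neuron y = w * x, one training example.
x, actual = 2.0, 4.0      # the ideal weight is 2.0
w = 0.0                   # initial weight
learning_rate = 0.1

for step in range(20):                        # 5. repeat
    predicted = w * x                         # 1. forward pass
    loss = (predicted - actual) ** 2          # 2. calculate loss
    # 3. backward pass (chain rule):
    #    dLoss/dw = dLoss/dpredicted * dpredicted/dw = 2 * (predicted - actual) * x
    gradient = 2 * (predicted - actual) * x
    w = w - learning_rate * gradient          # 4. update weight

print(round(w, 4))  # 2.0
```

In a multi-layer network the same chain rule is applied layer by layer, from the output back toward the input.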

Gradient Descent Visualization

Loss Function:
    Loss
      │
      │╲            ╱
      │ ╲          ╱
      │  ╲        ╱
      │   ╲      ╱
      │    ╲____╱  ← minimum
      └────────────────
          weight (w)

Gradient descent finds minimum:
Start at random point → Follow slope downward → Reach minimum
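
A minimal sketch of that descent on a bowl-shaped loss. The loss function (w − 3)², its minimum at w = 3, the learning rate, and the step count are all assumptions picked for illustration.

```python
import numpy as np

def loss(w):
    return (w - 3.0) ** 2      # hypothetical bowl-shaped loss, minimum at w = 3

def gradient(w):
    return 2.0 * (w - 3.0)     # slope of the loss at w

rng = np.random.default_rng(0)
w = rng.uniform(-10.0, 10.0)   # start at a random point
learning_rate = 0.1
losses = [loss(w)]

for _ in range(100):
    w -= learning_rate * gradient(w)   # follow the slope downward
    losses.append(loss(w))

print(abs(w - 3.0) < 1e-6)  # True: w has reached the minimum
```

Each update moves w a fraction of the way toward the minimum, so the recorded losses never increase. A learning rate that is too large would overshoot instead.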

Common Network Architectures

Feedforward Network (MLP)

Simple, fully connected layers
Used for: Classification, regression

Convolutional Neural Network (CNN)

Convolutional layers for spatial features
Used for: Image recognition, computer vision

Recurrent Neural Network (RNN)

Loops for sequential data
Used for: Time series, NLP, speech

Transformer

Attention mechanisms for sequences
Used for: NLP, language models

Key Concepts Summary

Concept          | Purpose
─────────────────┼───────────────────────────
Neuron           | Basic processing unit
Weight           | Strength of connection
Bias             | Shift activation threshold
Activation       | Introduce non-linearity
Layer            | Group of neurons
Forward Pass     | Compute output
Backprop         | Calculate gradients
Loss             | Measure error
Gradient Descent | Optimize weights

❓ What is the primary function of a perceptron?

❓ Why do we need activation functions?

❓ What is the output range of the sigmoid function?

❓ What does backpropagation do?
