Docker & Containerization

Duration: 120 min

Docker revolutionized application deployment by packaging applications and their dependencies into containers. Containers are lightweight, portable, and consistent across environments. This module covers Docker fundamentals, building images (including multi-stage builds), networking, volumes, Docker Compose, registries, and debugging.

What is Docker?

Docker is a containerization platform that packages applications with all dependencies into a standardized unit called a container. Containers are:

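Lightweight: They share the host OS kernel instead of each booting a full guest OS as virtual machines do
Portable: An image runs the same way on any Docker host with a matching CPU architecture (e.g. amd64 or arm64)
Isolated: Each container has its own filesystem, process tree, and network stack; Linux namespaces provide the isolation and cgroups limit resource usage
Reproducible: Every container started from the same image begins with identical files and configuration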

Docker Architecture

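Docker uses a client-server architecture:

Docker Client: The docker CLI, which sends requests to the daemon through the Docker Engine REST API
Docker Daemon (dockerd): Background service that builds images and creates, runs, and manages containers
Docker Registry: Service for storing and sharing images (Docker Hub, Amazon ECR, etc.)

The daemon works with two main kinds of objects:

Images: Read-only templates for creating containers
Containers: Runnable instances of images, each with a thin writable layer on top of the image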
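Because the client reaches the daemon through a REST API, the docker CLI is only one possible client. The sketch below is an illustration, not a recommended tool. It asks the daemon for its version in two ways: once directly over the API and once through the CLI. It assumes a Linux host with the daemon on the default socket /var/run/docker.sock, curl installed, and read access to the socket (typically root or membership in the docker group). The function names are hypothetical.

```shell
# Talk to the Docker daemon directly over its REST API, bypassing the CLI.
# Assumes the default Linux socket path; set DOCKER_SOCK to override it.
DOCKER_SOCK="${DOCKER_SOCK:-/var/run/docker.sock}"

# GET /version is an Engine API endpoint; it returns JSON describing the daemon.
daemon_version_json() {
  curl --silent --show-error --unix-socket "$DOCKER_SOCK" http://localhost/version
}

# The same request made by the CLI, which prints one field of that JSON.
daemon_version_cli() {
  docker version --format '{{.Server.Version}}'
}
```

Both functions should describe the same daemon. The CLI call simply extracts one field from the JSON that the API returns.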

Creating Docker Images

Basic Dockerfile

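# Use a base image
FROM ubuntu:22.04

# Set working directory
WORKDIR /app

# Install system packages in one layer and delete the apt cache afterwards
RUN apt-get update && apt-get install -y --no-install-recommends \
    python3 \
    python3-pip \
    && rm -rf /var/lib/apt/lists/*

# Copy only the dependency list first so the pip layer stays cached
# until requirements.txt changes
COPY requirements.txt .

# Install Python dependencies
RUN python3 -m pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY . .

# Document the port the app listens on (publishing still requires -p)
EXPOSE 8000

# Set environment variables (FLASK_APP is read by `flask run`, not by python3 app.py)
ENV FLASK_APP=app.py

# Run the application
CMD ["python3", "app.py"]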

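Build and run it. Inside the container the app must listen on 0.0.0.0 rather than 127.0.0.1; otherwise, connections arriving through the published port never reach it: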

# Build the image
docker build -t myapp:1.0 .

# Run a container in the background (-d), publishing host port 8000 to container port 8000
docker run -d -p 8000:8000 --name myapp-container myapp:1.0

# View running containers
docker ps

# View logs
docker logs myapp-container

# Stop the container
docker stop myapp-container
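Running `docker run --name myapp-container` again fails while any container with that name exists, even a stopped one. Iterating on an image therefore usually means remove, rebuild, and run. The following is a minimal sketch of that loop. `redeploy` is a hypothetical helper name, and the image tag, container name, and port mapping default to the values used above.

```shell
# Rebuild the image in the current directory and replace the container.
# Arguments (optional): image tag, container name.
redeploy() {
  image="${1:-myapp:1.0}"
  name="${2:-myapp-container}"
  # A stopped container still reserves its name; rm -f stops it if running
  # and removes it. Ignore the error when no such container exists.
  docker rm -f "$name" >/dev/null 2>&1 || true
  docker build -t "$image" . || return 1
  docker run -d -p 8000:8000 --name "$name" "$image"
}
```

Usage: `redeploy` with no arguments, or `redeploy myapp:1.1 myapp-staging`.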

Multi-Stage Builds

Multi-stage builds use multiple FROM statements. Early stages compile the application with a full toolchain, and the final stage copies in only the artifacts it needs:

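# Stage 1: Build
FROM golang:1.21 AS builder
WORKDIR /app
COPY . .
# CGO_ENABLED=0 produces a static binary; a binary linked against glibc
# through cgo can fail to start on Alpine, which uses musl
RUN CGO_ENABLED=0 go build -o myapp .

# Stage 2: Runtime (pin a version rather than latest for repeatable builds)
FROM alpine:3.19
WORKDIR /app
COPY --from=builder /app/myapp .
EXPOSE 8080
CMD ["./myapp"]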

This approach:

Keeps build tools out of final image
Reduces image size significantly: the runtime image holds a small Alpine base plus one binary instead of the full Go toolchain
Improves security by minimizing attack surface
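One way to check the size claim is to build the builder stage on its own with `--target` and compare it with the final image. This sketch assumes that the multi-stage Dockerfile above is in the current directory. The function name and the `myapp:builder` and `myapp:runtime` tags are hypothetical.

```shell
# Build the "builder" stage alone, then the full Dockerfile, and print
# each image's size. .Size from `docker image inspect` is in bytes.
compare_stage_sizes() {
  docker build --target builder -t myapp:builder . || return 1
  docker build -t myapp:runtime . || return 1
  for img in myapp:builder myapp:runtime; do
    printf '%s %s\n' "$img" "$(docker image inspect --format '{{.Size}}' "$img")"
  done
}
```

Expect the builder image to be far larger, since it contains the whole Go toolchain. The exact numbers depend on your project.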

Docker Networking

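Containers attached to the same user-defined network can reach each other by container name, which Docker's embedded DNS server resolves to the container's IP address. The default bridge network does not provide this name resolution: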

# Create a custom network
docker network create mynetwork

# Run containers on the network (the postgres image exits at startup without a password)
docker run -d --name web --network mynetwork nginx:latest
docker run -d --name db --network mynetwork -e POSTGRES_PASSWORD=secret postgres:latest

# Containers resolve each other by name: this prints db's IP address on mynetwork
docker exec web getent hosts db

# Inspect network
docker network inspect mynetwork
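A started db container does not mean that Postgres is already accepting connections, so scripts often poll before starting anything that depends on it. This sketch assumes the mynetwork network and a running Postgres container named db. It runs `pg_isready` (shipped in the postgres image) from a throwaway container on the same network, so a success also confirms that the name db resolves. `wait_for_db`, the 30-attempt default, and the 1-second delay are arbitrary, hypothetical choices.

```shell
# Poll until Postgres on the named host accepts connections.
# Arguments (optional): network, host name, maximum attempts.
wait_for_db() {
  network="${1:-mynetwork}"
  host="${2:-db}"
  tries="${3:-30}"
  i=1
  while [ "$i" -le "$tries" ]; do
    # pg_isready exits 0 once the server accepts connections
    if docker run --rm --network "$network" postgres:latest \
         pg_isready -h "$host" -U postgres >/dev/null 2>&1; then
      echo "$host is ready after $i attempt(s)"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "$host not ready after $tries attempts" >&2
  return 1
}
```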

Docker Volumes

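Volumes store data outside a container's writable layer, so the data survives when the container is removed. Named volumes are managed by Docker; bind mounts map a host directory into the container: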

# Create a named volume
docker volume create mydata

# Run container with volume
docker run -d -v mydata:/data --name myapp myapp:1.0

# Mount host directory (bind mount); a second container needs its own name
docker run -d -v /host/path:/container/path --name myapp-dev myapp:1.0

# View volumes
docker volume ls
docker volume inspect mydata

# Remove volume (fails while any container, even a stopped one, still uses it)
docker volume rm mydata
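You do not manage a named volume as a host directory of your own. A common way to back one up is therefore to mount it into a short-lived container next to a bind-mounted host directory and run tar there. The following is a sketch of that pattern. The function names are hypothetical, alpine:3.19 is an arbitrary small image that includes tar, and the host directory should be an absolute path.

```shell
# Archive a named volume to <dir>/<volume>.tar.gz on the host.
# Arguments: volume name, host directory (defaults to the current directory).
backup_volume() {
  volume="$1"
  dest="${2:-$PWD}"
  docker run --rm \
    -v "$volume":/data:ro \
    -v "$dest":/backup \
    alpine:3.19 \
    tar czf "/backup/$volume.tar.gz" -C /data .
}

# Extract that archive into a (possibly new, empty) volume of the same name.
restore_volume() {
  volume="$1"
  src="${2:-$PWD}"
  docker run --rm \
    -v "$volume":/data \
    -v "$src":/backup:ro \
    alpine:3.19 \
    tar xzf "/backup/$volume.tar.gz" -C /data
}
```

Usage: `backup_volume mydata "$PWD"` writes `mydata.tar.gz` to the current directory, and `restore_volume mydata "$PWD"` reads it back.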

Docker Compose

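Docker Compose defines a multi-container application in a single YAML file, conventionally compose.yaml (docker-compose.yml is also recognized). All services share a project network on which each one is reachable by its service name, so web reaches the database at host db: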

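services:
  web:
    build: .
    ports:
      - "8000:8000"
    environment:
      - DATABASE_URL=postgresql://postgres:secret@db:5432/myapp
    # Controls start order only; it does not wait for Postgres to accept connections
    depends_on:
      - db
    # Development only: bind-mount the source over the copy baked into the image
    volumes:
      - ./app:/app

  db:
    image: postgres:15
    environment:
      - POSTGRES_DB=myapp
      - POSTGRES_PASSWORD=secret
    volumes:
      - postgres_data:/var/lib/postgresql/data

volumes:
  postgres_data: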

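Run with Compose v2, which is invoked as docker compose (the older standalone docker-compose command is no longer maintained):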

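# Start all services in the background
docker compose up -d

# Follow the logs of the web service
docker compose logs -f web

# Stop and remove the containers and the default network
docker compose down

# Also remove named volumes (deletes the database data)
docker compose down -v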
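In scripts, it helps to validate the file first and to wait until services are up before running anything against them. This sketch assumes Compose v2 (the docker compose plugin) and a Compose file in the current directory; `compose_up` is a hypothetical name. `config --quiet` only validates the file. `up --wait` implies detached mode and returns once the services are running, or healthy when they define a healthcheck.

```shell
# Validate the Compose file, then start all services and block until they
# are running (or healthy, for services with a healthcheck).
compose_up() {
  docker compose config --quiet || { echo "invalid compose file" >&2; return 1; }
  docker compose up --wait
}
```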

Best Practices for Docker Images

Minimize Layer Count

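# Bad: separate RUN commands create separate layers, and a cached
# "apt-get update" layer can leave later installs using a stale package index
RUN apt-get update
RUN apt-get install -y python3
RUN apt-get install -y python3-pip

# Good: one RUN chained with &&, cleaning up in the same layer so the
# deleted files never reach the image
RUN apt-get update && apt-get install -y \
    python3 \
    python3-pip \
    && rm -rf /var/lib/apt/lists/*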
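To see the effect of combining RUN instructions, inspect the layers of a built image. The sketch below uses two hypothetical helpers. `layer_count` reads the number of filesystem layers from the image metadata, and `show_history` lists one row per build step. Steps such as ENV or CMD still appear in the history, with a size of 0B, because they change only metadata.

```shell
# Number of filesystem layers recorded in the image's RootFS.
layer_count() {
  docker image inspect --format '{{len .RootFS.Layers}}' "$1"
}

# One line per build step: size, then the full instruction that created it.
show_history() {
  docker history --no-trunc --format '{{.Size}} {{.CreatedBy}}' "$1"
}
```

Usage: build the "Bad" and "Good" variants under different tags and compare `layer_count` for each.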

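Use .dockerignore

A .dockerignore file in the root of the build context keeps matching files out of the context sent to the daemon. This speeds up builds and keeps secrets such as .env out of the image when the Dockerfile runs COPY . .: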

# .dockerignore
node_modules
.git
.env
*.log
__pycache__
.pytest_cache
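A .dockerignore only helps if it actually lists the sensitive paths, so a quick check before building can catch omissions. This is a minimal sketch. `check_dockerignore` is a hypothetical helper, and the required entries (.git and .env) are only an example list. It matches whole lines literally and does not evaluate the wildcard and `!` exception rules that Docker applies.

```shell
# Fail if <dir>/.dockerignore is missing or does not list .git and .env.
# Literal whole-line matching only: a simplification of Docker's rules.
check_dockerignore() {
  dir="${1:-.}"
  file="$dir/.dockerignore"
  if [ ! -f "$file" ]; then
    echo "missing $file" >&2
    return 1
  fi
  status=0
  for entry in .git .env; do
    # -x: whole-line match; -F: fixed string, so "." is not a regex wildcard
    if ! grep -qxF "$entry" "$file"; then
      echo "$file does not exclude $entry" >&2
      status=1
    fi
  done
  return "$status"
}
```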

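Non-Root User

Run the application as an unprivileged user so that a compromised process does not have root privileges inside the container:

FROM ubuntu:22.04

# Create an unprivileged user with a home directory
RUN useradd -m appuser

WORKDIR /app
# COPY writes files as root:root unless --chown is given
COPY --chown=appuser:appuser . .

# Later instructions and the container's process run as appuser
USER appuser

CMD ["./app"]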
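To confirm that an image really starts as a non-root user, run `id -u` in place of its normal command. This is a sketch. `assert_non_root` is a hypothetical helper, and it assumes that the image contains the id utility (Ubuntu and Alpine do; scratch or distroless images do not).

```shell
# Print the default UID of an image and fail if it is 0 (root).
# --entrypoint id overrides any ENTRYPOINT; the trailing -u goes to id.
assert_non_root() {
  uid=$(docker run --rm --entrypoint id "$1" -u) || return 1
  if [ "$uid" = "0" ]; then
    echo "$1 runs as root" >&2
    return 1
  fi
  echo "$1 runs as uid $uid"
}
```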

Docker Registry and Image Management

Push to AWS ECR

# Create ECR repository
aws ecr create-repository --repository-name myapp --region us-east-1

# Get login token (123456789012 stands in for your 12-digit AWS account ID)
aws ecr get-login-password --region us-east-1 | \
  docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com

# Tag image
docker tag myapp:1.0 123456789012.dkr.ecr.us-east-1.amazonaws.com/myapp:1.0

# Push image
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/myapp:1.0

# Pull image
docker pull 123456789012.dkr.ecr.us-east-1.amazonaws.com/myapp:1.0
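The login, tag, and push steps above can be wrapped in a reusable function. This is a sketch. `ecr_registry` and `ecr_push` are hypothetical names, the account ID and region are parameters rather than fixed values, and it assumes a configured AWS CLI and an existing repository.

```shell
# Private ECR registries follow <account-id>.dkr.ecr.<region>.amazonaws.com
ecr_registry() {
  printf '%s.dkr.ecr.%s.amazonaws.com\n' "$1" "$2"
}

# Log in, tag a local image (name:tag) with the registry prefix, and push it.
ecr_push() {
  account="$1" region="$2" image="$3"
  registry=$(ecr_registry "$account" "$region")
  aws ecr get-login-password --region "$region" |
    docker login --username AWS --password-stdin "$registry" || return 1
  docker tag "$image" "$registry/$image" || return 1
  docker push "$registry/$image"
}
```

Usage: `ecr_push 123456789012 us-east-1 myapp:1.0`.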

Image Tagging Strategy

# Tag with version
docker tag myapp:latest myapp:1.0.0

# Tag with git commit
docker tag myapp:latest myapp:$(git rev-parse --short HEAD)

# Tag with timestamp
docker tag myapp:latest myapp:$(date +%Y%m%d-%H%M%S)

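# Push every local tag of the repository (the name must point at a registry
# you can push to; plain "myapp" means docker.io/library/myapp)
docker push --all-tags myapp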
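The tagging commands above can be combined into a release step. This sketch assumes that it runs inside the git repository that produced the image. `tag_and_push` is a hypothetical helper, and the target repository must be a registry-qualified name that you are allowed to push to. Pushing each new tag explicitly, instead of using `--all-tags`, avoids uploading unrelated tags that happen to exist locally.

```shell
# Tag a local image with a version, the short commit SHA and a UTC
# timestamp, pushing each tag. Arguments: source image, target repo, version.
tag_and_push() {
  src="$1" repo="$2" version="$3"
  sha=$(git rev-parse --short HEAD) || return 1
  stamp=$(date -u +%Y%m%d-%H%M%S)
  for tag in "$version" "$sha" "$stamp"; do
    docker tag "$src" "$repo:$tag" || return 1
    docker push "$repo:$tag" || return 1
  done
}
```

Usage: `tag_and_push myapp:latest registry.example.com/team/myapp 1.0.0`, where the registry hostname is a placeholder.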

Debugging Containers

# Open an interactive shell in a running container (use sh if the image has no bash)
docker exec -it myapp-container bash

# Inspect container details
docker inspect myapp-container

# View resource usage
docker stats myapp-container

# View container processes
docker top myapp-container

# Copy files from container
docker cp myapp-container:/app/data.txt ./data.txt
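Not every image ships bash; Alpine-based images have only BusyBox sh. The following sketch shows a hypothetical `dshell` helper that picks bash when it exists and falls back to sh otherwise. It still requires sh, which minimal images such as distroless ones may lack.

```shell
# Open an interactive shell in a running container, preferring bash.
dshell() {
  docker exec -it "$1" sh -c 'if command -v bash >/dev/null 2>&1; then exec bash; else exec sh; fi'
}
```

Usage: `dshell myapp-container`.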

❓ What is the main advantage of containers over virtual machines?

Containers are lightweight and share the host OS kernel
Containers provide better security isolation
Containers can run multiple operating systems
Containers don't require Docker to run

❓ What is the purpose of a multi-stage Docker build?

To run multiple containers simultaneously
To support multiple programming languages
To reduce final image size by excluding build tools
To speed up Docker build process

❓ What does a Docker volume do?

Limits container memory usage
Persists data beyond container lifecycle
Encrypts container data
Manages container networking

❓ How do containers on the same Docker network communicate?

By container name using DNS resolution
Only through exposed ports
They cannot communicate directly
Through environment variables only

❓ What is the best practice for reducing Docker image layers?

Use multiple FROM statements
Create separate Dockerfiles
Chain RUN commands with && to combine layers
Use COPY instead of ADD