MLOps Pipeline from Scratch

Data, training, deployment, monitoring—production ML in 100 lines of code

Published July 1, 2026 · 14 min read

Definition: MLOps = applying DevOps principles to machine learning. Automate training, deployment, and monitoring so teams ship models reliably.

The Problem: From Notebook to Production

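Data scientists train models in Jupyter notebooks. Then what? A notebook on its own gives you no repeatable training run, no versioned artifact, no deployed service, and no warning when accuracy degrades.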

MLOps solves this by automating the entire lifecycle.

MLOps Pipeline Architecture

1. Data Pipeline (fetch, clean, validate)
           ↓
2. Training (hyperparameter tuning, cross-validation)
           ↓
3. Testing (evaluate on holdout test set)
           ↓
4. Versioning (save model artifact with metadata)
           ↓
5. Deployment (containerize, push to registry)
           ↓
6. Monitoring (track performance, detect drift)
           ↓
7. Retraining (automatically if performance drops)
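
To make the flow above concrete, here is a minimal sketch of a runner that executes stages in order and passes a shared context dict from one to the next. The names `run_pipeline` and `Stage`, and the toy stage functions, are hypothetical and exist only for illustration; in practice an orchestrator usually takes over this job.

```python
from typing import Callable, Dict, List, Tuple

# A stage receives the shared context and returns an updated copy of it.
Stage = Tuple[str, Callable[[Dict], Dict]]

def run_pipeline(stages: List[Stage]) -> Dict:
    """Run stages in order; an exception in any stage stops the pipeline there."""
    context: Dict = {"completed": []}
    for name, step in stages:
        context = step(context)
        context["completed"].append(name)
    return context

# Toy stages standing in for the real data/training/testing steps (illustrative only)
def fetch_data(ctx: Dict) -> Dict:
    return {**ctx, "rows": 1000}

def train_model(ctx: Dict) -> Dict:
    return {**ctx, "model": f"trained on {ctx['rows']} rows"}

def evaluate(ctx: Dict) -> Dict:
    return {**ctx, "test_accuracy": 0.9}  # placeholder value, not a real result

if __name__ == "__main__":
    result = run_pipeline([
        ("data", fetch_data),
        ("training", train_model),
        ("testing", evaluate),
    ])
    print(result["completed"])
```

Because each stage only sees the context, a stage can be swapped or rerun without touching the others, which is the property the rest of this post relies on.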

Build Your First Pipeline

Step 1: Set Up Project

mkdir ml-pipeline
cd ml-pipeline
python -m venv venv
source venv/bin/activate
pip install scikit-learn pandas pydantic fastapi uvicorn pytest
pip freeze > requirements.txt  # read later by the Dockerfile and the CI workflow

Step 2: Data Module

# data.py
import pandas as pd
from sklearn.model_selection import train_test_split

def load_and_prepare():
    # Load dataset
    df = pd.read_csv("data.csv")
    
    # Clean
    df = df.dropna()
    
    # Split
    X_train, X_test, y_train, y_test = train_test_split(
        df.drop("target", axis=1),
        df["target"],
        test_size=0.2,
        random_state=42
    )
    
    return X_train, X_test, y_train, y_test
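
The architecture lists a validate step, but the data module above only drops missing rows. Below is a minimal sketch of what a validation function could look like, assuming a `target` column and numeric features as in data.py. The function name `validate` and the specific checks are assumptions for illustration, not a complete schema check.

```python
import pandas as pd

def validate(df: pd.DataFrame, target: str = "target") -> pd.DataFrame:
    """Fail fast on problems that would otherwise surface during training."""
    if df.empty:
        raise ValueError("dataset is empty")
    if target not in df.columns:
        raise ValueError(f"missing required column: {target!r}")
    if df[target].nunique() < 2:
        raise ValueError("target needs at least two classes")

    # RandomForestClassifier expects numeric features, so flag anything else
    features = df.drop(columns=[target])
    non_numeric = features.select_dtypes(exclude="number").columns
    if len(non_numeric) > 0:
        raise ValueError(f"non-numeric feature columns: {list(non_numeric)}")

    return df
```

One way to use it would be `df = validate(df.dropna())` inside `load_and_prepare()`, just before the split.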

Step 3: Training Module

# train.py
from sklearn.ensemble import RandomForestClassifier
import joblib
import json
from data import load_and_prepare

def train():
    X_train, X_test, y_train, y_test = load_and_prepare()
    
    # Train
    model = RandomForestClassifier(n_estimators=100, random_state=42)  # Fixed seed for reproducible runs
    model.fit(X_train, y_train)
    
    # Evaluate
    train_score = model.score(X_train, y_train)
    test_score = model.score(X_test, y_test)
    
    # Save model
    joblib.dump(model, "model.pkl")
    
    # Save metrics
    metrics = {
        "train_accuracy": train_score,
        "test_accuracy": test_score
    }
    with open("metrics.json", "w") as f:
        json.dump(metrics, f)
    
    print(f"Model trained. Test accuracy: {test_score:.3f}")
    
    return model

if __name__ == "__main__":
    train()
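
The versioning stage in the architecture calls for saving the model artifact with metadata, while train.py writes only model.pkl and metrics.json. The sketch below shows one possible way to record which file a set of metrics belongs to. The function name `write_model_metadata`, the output file name, and the metadata fields are all assumptions, not an established format.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def write_model_metadata(model_path, metrics, out_path="model_metadata.json"):
    """Write a small JSON record tying a model file to its metrics."""
    model_file = Path(model_path)
    metadata = {
        "model_file": model_file.name,
        # The content hash identifies the exact artifact, even if it is renamed
        "sha256": hashlib.sha256(model_file.read_bytes()).hexdigest(),
        "created_at": datetime.now(timezone.utc).isoformat(),
        "metrics": metrics,
    }
    Path(out_path).write_text(json.dumps(metadata, indent=2))
    return metadata
```

Calling `write_model_metadata("model.pkl", metrics)` at the end of `train()` would be enough to start with; dedicated tools such as those in the table further down cover the same ground more thoroughly.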

Step 4: Serving Module (FastAPI)

# serve.py
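from fastapi import FastAPI
from pydantic import BaseModel
import joblib
import numpy as np

app = FastAPI()
model = joblib.load("model.pkl")  # Loaded once, when the server starts

class PredictRequest(BaseModel):
    features: list[float]  # One row of feature values, in training column order

# POST, so the feature vector arrives as a validated JSON body
@app.post("/predict")
def predict(request: PredictRequest):
    """Predict on new data"""
    X = np.array([request.features])
    prediction = model.predict(X)[0]
    probability = model.predict_proba(X).max()
    return {
        "prediction": int(prediction),
        "confidence": float(probability)
    }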

@app.get("/health")
def health():
    """Health check"""
    return {"status": "ok"}

Step 5: Docker Container

# Dockerfile
FROM python:3.10-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install -r requirements.txt

COPY model.pkl serve.py ./

CMD ["uvicorn", "serve:app", "--host", "0.0.0.0", "--port", "8000"]

Step 6: GitHub Actions Automation

# .github/workflows/pipeline.yml
name: ML Pipeline

on:
  schedule:
    - cron: "0 0 * * 0"  # Weekly
  push:
    branches: [main]

jobs:
  train:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      
      - name: Install dependencies
        run: pip install -r requirements.txt
      
      - name: Train model
        run: python train.py
      
      - name: Test model
        run: python -m pytest tests/
      
      - name: Upload artifact
        uses: actions/upload-artifact@v4
        with:
          name: model
          path: model.pkl
      
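      - name: Deploy to production
        run: |
          # Assumes the runner is already authenticated to the registry
          # Build with the full registry tag so the push finds the image
          docker build -t gcr.io/my-project/ml-model .
          docker push gcr.io/my-project/ml-model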

Monitoring and Alerts

# monitor.py
import json

def check_model_drift():
    """Alert if model performance drops"""
    
    # Load current metrics (written by train.py)
    with open("metrics.json") as f:
        current = json.load(f)
    
    # Load historical baseline (an earlier metrics.json, same keys)
    with open("baseline_metrics.json") as f:
        baseline = json.load(f)
    
    # train.py stores accuracy under "test_accuracy"
    accuracy_drop = baseline["test_accuracy"] - current["test_accuracy"]
    
    if accuracy_drop > 0.05:  # Alert on a drop of more than 5 percentage points
        print("⚠️ MODEL DRIFT DETECTED!")
        print(f"Accuracy dropped from {baseline['test_accuracy']:.3f} to {current['test_accuracy']:.3f}")
        # Send alert to Slack, email, etc.
        return False
    
    return True
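
Comparing accuracy requires fresh labels, which often arrive late in production. A complementary check looks at the inputs alone. The sketch below uses the Population Stability Index (PSI), a different technique from the accuracy comparison above, to compare a feature's training distribution with recent values. The function name `psi`, the bin count, and the epsilon are assumptions; thresholds somewhere around 0.1 to 0.25 are often quoted as rules of thumb and should be tuned rather than trusted.

```python
import math

def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a reference sample and a new sample."""
    lo, hi = min(expected), max(expected)
    if hi == lo:
        raise ValueError("reference sample has no spread")
    width = (hi - lo) / bins

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Values outside the reference range are clamped into the edge bins
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # eps keeps empty bins from producing log(0) or division by zero
        return [max(c / len(values), eps) for c in counts]

    p = proportions(expected)
    q = proportions(actual)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))

if __name__ == "__main__":
    reference = [i / 100 for i in range(100)]      # stand-in for a training feature
    recent = [0.5 + i / 200 for i in range(100)]   # stand-in for shifted live data
    print(f"PSI: {psi(reference, recent):.2f}")
```

Running this per feature on a schedule gives an early signal that retraining may be needed, before the accuracy check has labels to work with.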

Key MLOps Tools

Layer          Tools                 Purpose
Versioning     DVC, MLflow           Track data/model versions
Orchestration  Airflow, Prefect      Schedule pipelines
Training       Kubeflow, Ray         Distributed training
Deployment     Docker, Kubernetes    Containerize and serve
Monitoring     Prometheus, Datadog   Track metrics and alerts

Common Pitfalls

Next Steps

Grow from this foundation:
