How-To Guide

Running Llama 3 Locally with Ollama

Free LLMs on your machine: no API costs, full control, offline capable

Published July 1, 2026 • 8 min read

TL;DR: Download Ollama (5 min), run `ollama pull llama3:13b`, then `ollama run llama3:13b`. Chat with a powerful LLM for free, locally.

Why Run Models Locally?

🔒 Privacy: Your data never leaves your machine
💰 Free: No API costs after initial download
⚡ Fast: No network latency
🔌 Offline: Works without internet after download
🎮 Full control: Modify behavior, use custom prompts, batch processing

Step 1: Install Ollama

Mac (M1/M2/M3)

# Download and install from ollama.ai
# Or use Homebrew:
brew install ollama

Linux

curl -fsSL https://ollama.ai/install.sh | sh

Windows

Download installer from ollama.ai

Step 2: Download Llama 3

# Start Ollama service (runs in background)
ollama serve

# In another terminal, pull the model
ollama pull llama3:13b

# Or the larger version
ollama pull llama3:70b

First download takes 5-15 minutes depending on internet speed. Ollama caches models locally.

Model	Size	RAM	Speed
llama3:7b	3.7GB	8GB	~50 tokens/sec (MacBook)
llama3:13b	7.4GB	16GB	~30 tokens/sec
llama3:70b	40GB	64GB	~5 tokens/sec

Step 3: Run and Chat

ollama run llama3:13b

You're now in an interactive chat. Type and press Enter:

>> Explain quantum computing in one sentence

Quantum computers harness the strange properties of quantum 
mechanics—where particles exist in multiple states simultaneously—
to perform computations exponentially faster than classical computers 
for specific problem types.

>>> /bye  # Exit the chat

Advanced: Use Ollama Programmatically

Python

import requests
import json

def query_ollama(prompt):
    response = requests.post("http://localhost:11434/api/generate", 
        json={
            "model": "llama3:13b",
            "prompt": prompt,
            "stream": False
        }
    )
    return response.json()["response"]

answer = query_ollama("What's 2+2?")
print(answer)

cURL

curl http://localhost:11434/api/generate -d '{
  "model": "llama3:13b",
  "prompt": "Why is the sky blue?",
  "stream": false
}'

Other Models Available

Ollama supports many open-source models:

Llama: ollama pull llama3:70b or ollama pull llama2:13b
Mistral: ollama pull mistral:latest
Qwen: ollama pull qwen:latest
Neural Chat: ollama pull neural-chat:latest

Performance Tips

Mac M1/M2: Models run with GPU acceleration. 13B is recommended sweet spot.
Linux: Add NVIDIA GPU support: `ollama run ollama/ollama:latest-gpu`
Windows: Limited GPU support as of 2026. CPU-only recommended.
Reduce latency: Smaller models (7B) are 2-3x faster than 13B.
Batch requests: Send multiple prompts to amortize startup cost.

Learn More: Local LLM Architecture

Master local LLM deployment with our comprehensive course covering:

Ollama setup and optimization
Custom model fine-tuning locally
Building production pipelines
Cost analysis: cloud vs. local
Real-world case studies

Master Local LLM Architecture

Learn to build, deploy, and optimize LLMs on your own hardware.

Start Local LLM Course →