Back to Blog
How-To Guide

Running Llama 3 Locally with Ollama

Free LLMs on your machine: no API costs, full control, offline capable

Published July 1, 2026 8 min read

TL;DR: Download Ollama (5 min), run `ollama pull llama3:13b`, then `ollama run llama3:13b`. Chat with a powerful LLM for free, locally.

Why Run Models Locally?

Step 1: Install Ollama

Mac (M1/M2/M3)

# Download and install from ollama.ai
# Or use Homebrew:
brew install ollama

Linux

curl -fsSL https://ollama.ai/install.sh | sh

Windows

Download installer from ollama.ai

Step 2: Download Llama 3

# Start Ollama service (runs in background)
ollama serve

# In another terminal, pull the model
ollama pull llama3:13b

# Or the larger version
ollama pull llama3:70b

First download takes 5-15 minutes depending on internet speed. Ollama caches models locally.

Model Size RAM Speed
llama3:7b 3.7GB 8GB ~50 tokens/sec (MacBook)
llama3:13b 7.4GB 16GB ~30 tokens/sec
llama3:70b 40GB 64GB ~5 tokens/sec

Step 3: Run and Chat

ollama run llama3:13b

You're now in an interactive chat. Type and press Enter:

>> Explain quantum computing in one sentence

Quantum computers harness the strange properties of quantum 
mechanics—where particles exist in multiple states simultaneously—
to perform computations exponentially faster than classical computers 
for specific problem types.

>>> /bye  # Exit the chat

Advanced: Use Ollama Programmatically

Python

import requests
import json

def query_ollama(prompt):
    response = requests.post("http://localhost:11434/api/generate", 
        json={
            "model": "llama3:13b",
            "prompt": prompt,
            "stream": False
        }
    )
    return response.json()["response"]

answer = query_ollama("What's 2+2?")
print(answer)

cURL

curl http://localhost:11434/api/generate -d '{
  "model": "llama3:13b",
  "prompt": "Why is the sky blue?",
  "stream": false
}'

Other Models Available

Ollama supports many open-source models:

Performance Tips

  • Mac M1/M2: Models run with GPU acceleration. 13B is recommended sweet spot.
  • Linux: Add NVIDIA GPU support: `ollama run ollama/ollama:latest-gpu`
  • Windows: Limited GPU support as of 2026. CPU-only recommended.
  • Reduce latency: Smaller models (7B) are 2-3x faster than 13B.
  • Batch requests: Send multiple prompts to amortize startup cost.

Learn More: Local LLM Architecture

Master local LLM deployment with our comprehensive course covering:

Master Local LLM Architecture

Learn to build, deploy, and optimize LLMs on your own hardware.

Start Local LLM Course →