Vector Databases Explained

The backbone of RAG: semantic search at scale with embeddings

Published July 1, 2026 • 12 min read

Quick Explanation: Vector databases store embeddings (numerical representations of text/images) and answer "find documents semantically similar to this" in milliseconds. Essential for RAG, semantic search, and AI applications.

Why Vector Databases?

Traditional databases search by keyword. Vector databases search by meaning.

Traditional SQL Query

SELECT * FROM docs 
WHERE text LIKE '%python%'

❌ Misses documents about coding or scripting that never contain the word "python", and can still match the irrelevant "Python is a snake"
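The keyword limitation can be reproduced without a database. The sketch below uses a plain substring filter as a rough stand-in for SQL LIKE; the sample documents and the keyword_search helper are made up for illustration.

```python
# Hypothetical mini-corpus; only the first document contains the word "python".
docs = [
    "Python is a snake found in Asia and Africa",
    "Writing scripts to automate programming tasks",
    "How to debug code in a Jupyter notebook",
]


def keyword_search(query: str, corpus: list[str]) -> list[str]:
    """Case-insensitive substring match, similar in spirit to LIKE '%query%'."""
    needle = query.lower()
    return [doc for doc in corpus if needle in doc.lower()]


python_hits = keyword_search("python", docs)  # matches only the snake document
coding_hits = keyword_search("coding", docs)  # matches nothing at all

print(python_hits)
print(coding_hits)
```

Neither query surfaces the two documents that are actually about programming, which is the gap semantic search is meant to close.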

Vector Query

vector_db.search(
  embedding("python programming"),
  top_k=5
)

✅ Finds docs about Python, coding, scripts, programming languages

How Vector Databases Work

Step 1: Convert Text to Embeddings

Embeddings are vectors (lists of numbers) that represent meaning:

Text: "The cat sat on the mat"
Embedding: [0.23, -0.45, 0.12, ..., 0.89]  (1536 dimensions)

Semantically similar sentences have similar embeddings:

"The cat sat on the mat" → [0.23, -0.45, 0.12, ...]
"A feline rested on the rug" → [0.24, -0.46, 0.11, ...]
                      ↑ Very close vectors!
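"Close" is usually measured with cosine similarity, which is near 1 for vectors pointing in almost the same direction. Here is a minimal sketch using only the first three components shown above; real embeddings have far more dimensions, and the third vector is an invented, unrelated example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


cat = [0.23, -0.45, 0.12]       # "The cat sat on the mat" (truncated)
feline = [0.24, -0.46, 0.11]    # "A feline rested on the rug" (truncated)
unrelated = [-0.60, 0.10, 0.75]  # hypothetical vector for an unrelated sentence

sim_close = cosine_similarity(cat, feline)
sim_far = cosine_similarity(cat, unrelated)

print(f"cat vs feline:    {sim_close:.4f}")  # very close to 1
print(f"cat vs unrelated: {sim_far:.4f}")    # much lower
```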

Step 2: Store Vectors (Indexed)

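Vector DBs use specialized indexes such as HNSW (Hierarchical Navigable Small World) for fast nearest-neighbor search:

Can typically search around 1M vectors in milliseconds, depending on hardware, vector dimensions, and index settings
Uses approximate nearest neighbor (ANN) search rather than an exact scan of every vector
Trade-off: speed vs. accuracy; a well-tuned index usually misses only a small fraction of the true nearest neighbors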
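The accuracy side of this trade-off is commonly reported as recall@k: the fraction of the true top-k neighbors that the approximate index actually returned. The following is a small sketch of that metric; the document IDs are invented, and in practice the exact list would come from a brute-force scan over the same data.

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-k neighbors found by the approximate search."""
    if not exact_ids:
        raise ValueError("exact_ids must not be empty")
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


# Hypothetical results for one query with k = 5.
exact = ["doc7", "doc2", "doc9", "doc4", "doc1"]   # from an exhaustive scan
approx = ["doc7", "doc2", "doc4", "doc1", "doc8"]  # from an ANN index

recall = recall_at_k(approx, exact)
print(f"recall@5 = {recall:.2f}")  # prints "recall@5 = 0.80"
```

Averaging this value over a sample of real queries gives a concrete number to weigh against latency when tuning index parameters.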

Step 3: Query with New Embedding

Query text: "kitten napping on carpet"
Query embedding: [0.22, -0.47, 0.11, ...]

Find vectors closest to this embedding
Result: docs about cats, animals, resting, furniture
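Stripped of indexing, the query step is "score every stored vector against the query and keep the best k". The sketch below does this with an exhaustive scan, which is the exact search that ANN indexes approximate. TinyVectorStore, its method names, and the three-component vectors are all illustrative assumptions, not any real database's API.

```python
import heapq
import math


class TinyVectorStore:
    """Hypothetical in-memory store that scans every vector (exact search)."""

    def __init__(self):
        self._items = []  # list of (doc_id, unit-length vector)

    @staticmethod
    def _normalize(vec):
        norm = math.sqrt(sum(x * x for x in vec))
        return [x / norm for x in vec]

    def add(self, doc_id, vec):
        # Store unit vectors so a dot product equals cosine similarity.
        self._items.append((doc_id, self._normalize(vec)))

    def search(self, query, top_k=3):
        q = self._normalize(query)
        scored = (
            (sum(a * b for a, b in zip(q, vec)), doc_id)
            for doc_id, vec in self._items
        )
        # Keep the top_k highest-scoring documents, best first.
        return [(doc_id, score) for score, doc_id in heapq.nlargest(top_k, scored)]


store = TinyVectorStore()
store.add("cat-on-mat", [0.23, -0.45, 0.12])
store.add("feline-on-rug", [0.24, -0.46, 0.11])
store.add("stock-report", [-0.60, 0.10, 0.75])  # invented unrelated document

hits = store.search([0.22, -0.47, 0.11], top_k=2)
top_ids = [doc_id for doc_id, _ in hits]
print(top_ids)  # the two cat-related documents
```

A scan like this costs time proportional to the number of stored vectors, which is why production systems switch to an index such as HNSW once collections grow.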

Vector Databases Comparison

Database	Type	Best For	Setup
Pinecone	Managed cloud	Production SaaS	5 min, $0.04/hour
Weaviate	Self-hosted + cloud	Hybrid, GraphQL API	Docker, 15 min
Milvus	Self-hosted open-source	High scale, cost-conscious	Docker, Kubernetes
ChromaDB	Embedded/lightweight	Dev, prototyping	pip install, 1 min
Qdrant	Self-hosted, API-first	Production, performance	Docker, Rust-based

Real Example: Building a RAG Chatbot

from pinecone import Pinecone
from openai import OpenAI

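# Initialize (assumes an index named "documents" already exists with
# dimension 1536, the default output size of text-embedding-3-small)
pc = Pinecone(api_key="your-key")
index = pc.Index("documents")
client = OpenAI()  # reads OPENAI_API_KEY from the environment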

# Step 1: Store documents (one-time)
documents = [
    "Paris is the capital of France",
    "The Eiffel Tower is 330 meters tall",
    "France has a population of 68 million"
]

for doc in documents:
    embedding = client.embeddings.create(
        model="text-embedding-3-small",
        input=doc
    ).data[0].embedding
    
    index.upsert([(doc[:50], embedding, {"text": doc})])

# Step 2: Answer questions by searching
question = "How tall is the Eiffel Tower?"
q_embedding = client.embeddings.create(
    model="text-embedding-3-small",
    input=question
).data[0].embedding

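# Request metadata explicitly; it is not returned by default
results = index.query(vector=q_embedding, top_k=3, include_metadata=True)
context = "\n".join(match.metadata["text"] for match in results.matches)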

# Step 3: Generate answer with LLM
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "user", "content": f"""
Context: {context}
Question: {question}
Answer:"""}
    ]
)

print(response.choices[0].message.content)
# Example output (exact wording may vary): The Eiffel Tower is 330 meters tall.
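How the retrieved text is laid out in the prompt affects answer quality. One possible approach, sketched here with a hypothetical build_prompt helper, is to number each chunk and tell the model to stay within the supplied context. The instruction wording is an assumption, not a recommendation from any vendor.

```python
def build_prompt(context_chunks, question):
    """Join retrieved chunks into a numbered context block for the LLM."""
    numbered = "\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(context_chunks, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_prompt(
    ["The Eiffel Tower is 330 meters tall", "Paris is the capital of France"],
    "How tall is the Eiffel Tower?",
)
print(prompt)
```

The returned string can be passed as the content of the user message in the chat completion call.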

When to Use a Vector Database

✅ Building RAG systems for document Q&A
✅ Semantic search across large text corpora
✅ Recommendation systems based on similarity
✅ Anomaly detection with embeddings
❌ Structured queries (use PostgreSQL instead)
❌ High-frequency transactions

Limitations and Trade-offs

Approximate results: ANN does not always return the exact nearest neighbors, but the results are close enough for most uses
Embedding quality: Results are only as good as your embedding model
Curse of dimensionality: Higher-dimensional vectors (1500+ dims) cost more memory and compute per comparison, and distances become less discriminative as dimensions grow
Cost scaling: Managed services such as Pinecone charge more as storage and query volume grow
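A back-of-the-envelope estimate makes the dimensionality and cost points concrete. This sketch assumes each value is stored as a 4-byte float32 and counts only the raw vectors; index structures, metadata, and replicas add to the total.

```python
def raw_vector_bytes(num_vectors, dims, bytes_per_value=4):
    """Storage for the raw vectors alone, assuming float32 values by default."""
    return num_vectors * dims * bytes_per_value


size = raw_vector_bytes(1_000_000, 1536)
print(f"{size / 1e9:.2f} GB")  # prints "6.14 GB"
```

Halving the dimension count halves this figure, which is one reason smaller embedding sizes are worth evaluating for large collections.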

Quick Start: ChromaDB (Local Development)

from chromadb import Client

client = Client()
collection = client.create_collection(name="documents")

# Add documents (Chroma embeds them with its default embedding model)
collection.add(
    ids=["1", "2", "3"],
    documents=[
        "Apple is a fruit",
        "Apple is a tech company",
        "Banana is a fruit"
    ]
)

# Query
results = collection.query(
    query_texts=["What fruit is yellow?"],
    n_results=1
)
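# query() returns a dict of lists, one inner list per query text
print(results["documents"][0])  # Likely ['Banana is a fruit']; depends on the embedding model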

Learn RAG at Scale

Master vector databases and RAG architecture with hands-on projects:

Understanding embeddings: OpenAI vs Hugging Face vs custom
Choosing and deploying vector databases
Building production RAG pipelines
Optimizing retrieval quality
Cost analysis: cloud vs self-hosted
