Vector Databases Explained

The backbone of RAG: semantic search at scale with embeddings

Published July 1, 2026 · 12 min read

Quick Explanation: Vector databases store embeddings (numerical representations of text/images) and answer "find documents semantically similar to this" in milliseconds. Essential for RAG, semantic search, and AI applications.

Why Vector Databases?

Traditional databases search by keyword. Vector databases search by meaning.

Traditional SQL Query

SELECT * FROM docs 
WHERE text LIKE '%python%'

❌ Matches only the literal string "python": it returns "Python is a snake", and a search for "programming" misses Python docs that never use that word

Vector Query

vector_db.search(
  embedding("python programming"),
  top_k=5
)

✅ Finds docs about Python, coding, scripts, programming languages
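
To make the keyword limitation concrete, here is a minimal sketch that imitates the LIKE query with a plain substring match in Python. The three documents and the keyword_search helper are invented for illustration, and real databases differ in details such as case sensitivity.

```python
# Hypothetical mini-corpus, made up for this illustration.
docs = [
    "Python is a snake",
    "Write scripts in Python to automate tasks",
    "Learn coding with Java and Go",
]

def keyword_search(docs, term):
    """Case-insensitive substring match, roughly what LIKE '%term%' does."""
    term = term.lower()
    return [d for d in docs if term in d.lower()]

python_hits = keyword_search(docs, "python")
programming_hits = keyword_search(docs, "programming")

print(python_hits)       # includes the snake sentence, because it contains the word
print(programming_hits)  # empty: no document contains "programming" literally
```

Two of the three documents are about programming, yet the keyword search for "programming" finds neither. That gap is what searching by meaning is meant to close.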

How Vector Databases Work

Step 1: Convert Text to Embeddings

Embeddings are vectors (lists of numbers) that represent meaning:

Text: "The cat sat on the mat"
Embedding: [0.23, -0.45, 0.12, ..., 0.89]  (1536 dimensions)

Semantically similar sentences have similar embeddings:

"The cat sat on the mat" → [0.23, -0.45, 0.12, ...]
"A feline rested on the rug" → [0.24, -0.46, 0.11, ...]
                      ↑ Very close vectors!
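
One common way to put a number on "very close" is cosine similarity. The sketch below uses only the first three components shown above as stand-ins for full embeddings, plus an invented "unrelated" vector, so the exact scores are illustrative rather than what a real model would produce.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Truncated toy vectors; real embeddings have hundreds or thousands of dimensions.
cat = [0.23, -0.45, 0.12]        # "The cat sat on the mat"
feline = [0.24, -0.46, 0.11]     # "A feline rested on the rug"
unrelated = [-0.50, 0.30, 0.70]  # invented vector for an unrelated sentence

sim_close = cosine_similarity(cat, feline)
sim_far = cosine_similarity(cat, unrelated)

print(f"cat vs feline:    {sim_close:.4f}")
print(f"cat vs unrelated: {sim_far:.4f}")
```

The two cat sentences score close to 1.0, while the unrelated vector scores far lower.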

Step 2: Store Vectors (Indexed)

Vector DBs use specialized approximate nearest-neighbor indexes such as HNSW (Hierarchical Navigable Small World), so a query does not have to be compared against every stored vector.
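
The core idea behind graph indexes can be sketched in a few lines: link each vector to a handful of neighbors, then walk the graph greedily toward the query. This is a deliberately simplified, single-layer illustration and not HNSW itself (real HNSW adds multiple layers, candidate lists, and incremental construction). The function names and the toy 2-D points are assumptions made for the example.

```python
import math

def build_knn_graph(points, k):
    """Link every point to its k nearest neighbors (built by brute force here)."""
    graph = {}
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        graph[i] = others[:k]
    return graph

def greedy_search(points, graph, query, entry=0):
    """Hop to whichever neighbor is closer to the query until none improves."""
    current = entry
    current_dist = math.dist(points[current], query)
    hops = 0
    while True:
        best, best_dist = current, current_dist
        for n in graph[current]:
            d = math.dist(points[n], query)
            if d < best_dist:
                best, best_dist = n, d
        if best == current:
            return current, hops  # local minimum: no neighbor is closer
        current, current_dist = best, best_dist
        hops += 1

# Toy data: ten points on a line, each linked to its two nearest neighbors.
points = [(float(i), 0.0) for i in range(10)]
graph = build_knn_graph(points, k=2)
nearest, hops = greedy_search(points, graph, query=(7.2, 0.0))
print(nearest, hops)  # index of the point the walk ends on, and hops taken
```

On this toy line the walk ends at point 7, the true nearest neighbor. In general a greedy walk can stop at a local minimum, which is why such indexes are called approximate.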

Step 3: Query with New Embedding

Query text: "kitten napping on carpet"
Query embedding: [0.22, -0.47, 0.11, ...]

Find the stored vectors closest to this embedding
Result: docs about cats, animals, resting, furniture
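
Stripped of the index, the query step is "rank stored vectors by distance to the query and keep the top k". Here is a minimal exhaustive-scan sketch using Euclidean distance. The document ids, the toy 3-D vectors, and the top_k helper are made up for illustration. A real vector database would use an index instead of scanning everything, and many use cosine similarity instead.

```python
import heapq
import math

def top_k(query, store, k):
    """Return the k (doc_id, distance) pairs closest to the query vector."""
    scored = ((doc_id, math.dist(query, vec)) for doc_id, vec in store.items())
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])

# Hypothetical store: id -> truncated toy embedding.
store = {
    "cat-mat": [0.23, -0.45, 0.12],
    "feline-rug": [0.24, -0.46, 0.11],
    "stock-prices": [-0.50, 0.30, 0.70],
}

query = [0.22, -0.47, 0.11]  # "kitten napping on carpet"
hits = top_k(query, store, k=2)

for doc_id, distance in hits:
    print(f"{doc_id}: {distance:.4f}")
```

Both cat-related entries come back and the unrelated one is left out. The scan costs one distance computation per stored vector, which is the work an index like HNSW avoids.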

Vector Databases Comparison

Database | Type | Best For | Setup
Pinecone | Managed cloud | Production SaaS | 5 min, $0.04/hour
Weaviate | Self-hosted + cloud | Hybrid, GraphQL API | Docker, 15 min
Milvus | Self-hosted open-source | High scale, cost-conscious | Docker, Kubernetes
ChromaDB | Embedded/lightweight | Dev, prototyping | pip install, 1 min
Qdrant | Self-hosted, API-first | Production, performance | Docker, Rust-based

Real Example: Building a RAG Chatbot

from pinecone import Pinecone
from openai import OpenAI

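# Initialize (assumes a Pinecone index named "documents" already exists
# with dimension 1536, the output size of text-embedding-3-small)
pc = Pinecone(api_key="your-key")
index = pc.Index("documents")
client = OpenAI()  # reads OPENAI_API_KEY from the environment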

# Step 1: Store documents (one-time)
documents = [
    "Paris is the capital of France",
    "The Eiffel Tower is 330 meters tall",
    "France has a population of 68 million"
]

for doc in documents:
    embedding = client.embeddings.create(
        model="text-embedding-3-small",
        input=doc
    ).data[0].embedding
    
    index.upsert([(doc[:50], embedding, {"text": doc})])

# Step 2: Answer questions by searching
question = "How tall is the Eiffel Tower?"
q_embedding = client.embeddings.create(
    model="text-embedding-3-small",
    input=question
).data[0].embedding

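# Ask for metadata explicitly; it is not returned by default
results = index.query(vector=q_embedding, top_k=3, include_metadata=True)
context = "\n".join(m.metadata["text"] for m in results.matches)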

# Step 3: Generate answer with LLM
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "user", "content": f"""
Context: {context}
Question: {question}
Answer:"""}
    ]
)

print(response.choices[0].message.content)
# Example output (model responses vary): The Eiffel Tower is 330 meters tall.

When to Use a Vector Database

Limitations and Trade-offs

Quick Start: ChromaDB (Local Development)

from chromadb import Client

client = Client()
collection = client.create_collection(name="documents")

# Add documents
collection.add(
    ids=["1", "2", "3"],
    documents=[
        "Apple is a fruit",
        "Apple is a tech company",
        "Banana is a fruit"
    ]
)

# Query
results = collection.query(
    query_texts=["What fruit is yellow?"],
    n_results=1
)
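# results is a dict; the matched texts are in results["documents"][0].
# The closest match here is likely "Banana is a fruit".
print(results["documents"][0])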
