Vector Databases Explained
The backbone of RAG: semantic search at scale with embeddings
Quick Explanation: Vector databases store embeddings (numerical representations of text/images) and answer "find documents semantically similar to this" in milliseconds. Essential for RAG, semantic search, and AI applications.
Why Vector Databases?
Traditional databases search by keyword. Vector databases search by meaning.
Traditional SQL Query
SELECT * FROM docs
WHERE text LIKE '%python%'
❌ Matches only the literal string "python": it misses documents about "coding" or "scripting" that never use the word, yet still matches "Python is a snake"
Vector Query
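# Illustrative pseudocode: vector_db and embedding() are placeholders for a real client and embedding model
vector_db.search(
    embedding("python programming"),
    top_k=5
)
✅ Finds docs about Python, coding, scripts, and other programming languages, even when they don't contain the exact keyword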
How Vector Databases Work
Step 1: Convert Text to Embeddings
Embeddings are vectors (lists of numbers) that represent meaning:
Text: "The cat sat on the mat"
Embedding: [0.23, -0.45, 0.12, ..., 0.89] (1536 dimensions, e.g. OpenAI's text-embedding-3-small)
Semantically similar sentences have similar embeddings:
"The cat sat on the mat" → [0.23, -0.45, 0.12, ...]
"A feline rested on the rug" → [0.24, -0.46, 0.11, ...]
↑ Very close vectors!
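Closeness is usually measured with cosine similarity: the cosine of the angle between two vectors. A value of 1.0 means they point the same way, values near 0 mean unrelated, and negative values mean they point apart. The sketch below uses only the first three illustrative values from the example above. It also adds a made-up vector for an unrelated sentence. Real embeddings have hundreds or thousands of dimensions, but the formula is the same.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# First three (illustrative) dimensions of the two example embeddings
cat_on_mat = [0.23, -0.45, 0.12]
feline_on_rug = [0.24, -0.46, 0.11]
# Hypothetical vector for an unrelated sentence (made-up values)
stock_report = [-0.30, 0.40, 0.50]

print(round(cosine_similarity(cat_on_mat, feline_on_rug), 4))  # 0.9997 (nearly identical direction)
print(round(cosine_similarity(cat_on_mat, stock_report), 4))   # -0.5146 (points a different way)
```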
Step 2: Store Vectors (Indexed)
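Vector DBs use specialized indexes such as HNSW (Hierarchical Navigable Small World) for fast nearest-neighbor search:
- Typically search millions of vectors in milliseconds (exact latency depends on hardware, dimensionality, and index settings)
- Use approximate nearest neighbor (ANN) search, not exact search
- Trade-off: speed vs. accuracy (recall); with well-tuned parameters, results are usually very close to those of exact search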
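The sketch below makes the speed-vs-accuracy trade-off concrete. It compares exact brute-force search, which scores every stored vector, with a toy approximate index. The toy index is random-hyperplane LSH (locality-sensitive hashing), not HNSW. LSH is a much simpler ANN technique, chosen here only because it fits in a few lines. It scores just the vectors that land in the same hash bucket as the query, so it does less work but can miss true neighbors. Recall@k measures how many of the exact top-k it still recovers. The class and function names are hypothetical, and the data is random.

```python
import math
import random

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def exact_top_k(query, vectors, k):
    """Exact search: score every stored vector (N comparisons per query)."""
    order = sorted(range(len(vectors)), key=lambda i: cosine(query, vectors[i]), reverse=True)
    return order[:k]

class ToyLSHIndex:
    """Hypothetical toy ANN index using random-hyperplane LSH (not HNSW)."""

    def __init__(self, dim, n_planes=6, seed=0):
        rng = random.Random(seed)
        # Each random hyperplane contributes one bit to a vector's bucket key
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]
        self.vectors = []
        self.buckets = {}  # bucket key (tuple of bools) -> list of vector ids

    def _key(self, v):
        # Which side of each hyperplane v falls on
        return tuple(sum(p * x for p, x in zip(plane, v)) >= 0 for plane in self.planes)

    def add(self, v):
        self.buckets.setdefault(self._key(v), []).append(len(self.vectors))
        self.vectors.append(v)

    def candidates(self, query):
        # Only vectors in the query's bucket are considered
        return self.buckets.get(self._key(query), [])

    def search(self, query, k):
        # Score the small candidate set only: fewer comparisons, but true neighbors can be missed
        return sorted(self.candidates(query), key=lambda i: cosine(query, self.vectors[i]), reverse=True)[:k]

def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k that the approximate search also returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

rng = random.Random(42)
dim, n = 32, 2000
data = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
index = ToyLSHIndex(dim)
for vec in data:
    index.add(vec)

# Query: a slightly perturbed copy of vector 0, so vector 0 is its true nearest neighbor
query = [x + rng.gauss(0.0, 0.1) for x in data[0]]
exact = exact_top_k(query, data, k=5)
approx = index.search(query, k=5)
print("exact top-5: ", exact)
print("approx top-5:", approx, "recall@5 =", recall_at_k(approx, exact))
print("vectors scored:", len(index.candidates(query)), "of", n)
```

Raising `n_planes` makes buckets smaller, which means faster queries but lower recall. Production indexes such as HNSW expose the same kind of trade-off through their own tuning parameters.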
Step 3: Query with New Embedding
Query embedding: "kitten napping on carpet"
→ [0.22, -0.47, 0.11, ...]
Find vectors closest to this embedding
Result: docs about cats, animals, resting, furniture
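Steps 1 to 3 can be sketched as a tiny in-memory store. This is a minimal illustration, not how a production vector database is built. It does exact (brute-force) search instead of using an ANN index. The names (`InMemoryVectorStore`, `upsert`, `query`, `Match`) are hypothetical, loosely modeled on the upsert/query style of hosted services. Vectors are normalized on insert, so cosine similarity reduces to a dot product.

```python
import math
from dataclasses import dataclass

@dataclass
class Match:
    doc_id: str
    score: float  # cosine similarity, higher = closer
    metadata: dict

class InMemoryVectorStore:
    """Hypothetical toy store: exact (brute-force) cosine search, no ANN index."""

    def __init__(self):
        self._items = {}  # doc_id -> (unit-length vector, metadata)

    @staticmethod
    def _normalize(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    def upsert(self, doc_id, vector, metadata=None):
        # Insert or overwrite; storing unit vectors makes cosine similarity a plain dot product
        self._items[doc_id] = (self._normalize(vector), metadata or {})

    def query(self, vector, top_k=3):
        q = self._normalize(vector)
        matches = [
            Match(doc_id, sum(a * b for a, b in zip(q, v)), meta)
            for doc_id, (v, meta) in self._items.items()
        ]
        matches.sort(key=lambda m: m.score, reverse=True)
        return matches[:top_k]

store = InMemoryVectorStore()
# Steps 1-2: store illustrative 3-dimensional "embeddings" (made-up values)
store.upsert("cat", [0.23, -0.45, 0.12], {"text": "The cat sat on the mat"})
store.upsert("feline", [0.24, -0.46, 0.11], {"text": "A feline rested on the rug"})
store.upsert("stocks", [-0.30, 0.40, 0.50], {"text": "Stocks fell sharply today"})

# Step 3: query with the illustrative "kitten napping on carpet" vector
for match in store.query([0.22, -0.47, 0.11], top_k=2):
    print(match.doc_id, round(match.score, 4), match.metadata["text"])
# The two cat sentences rank first; the stock report is not in the top 2
```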
Vector Databases Comparison
| Database | Type | Best For | Setup |
|---|---|---|---|
| Pinecone | Managed cloud | Production SaaS | Minutes; usage-based pricing (check current rates) |
| Weaviate | Self-hosted + cloud | Hybrid (keyword + vector) search, GraphQL API | Docker, 15 min |
| Milvus | Self-hosted open-source | High scale, cost-conscious | Docker, Kubernetes |
| ChromaDB | Embedded/lightweight | Dev, prototyping | pip install, 1 min |
| Qdrant | Self-hosted, API-first (written in Rust) | Production, performance | Docker |
Real Example: Building a RAG Chatbot
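from pinecone import Pinecone
from openai import OpenAI

# Initialize clients (assumes an existing Pinecone index named "documents"
# created with dimension 1536 to match text-embedding-3-small)
pc = Pinecone(api_key="your-key")
index = pc.Index("documents")
client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Step 1: Store documents (one-time)
documents = [
    "Paris is the capital of France",
    "The Eiffel Tower is 330 meters tall",
    "France has a population of 68 million"
]
# Embed all documents in a single API call
doc_embeddings = client.embeddings.create(
    model="text-embedding-3-small",
    input=documents
)
# Each record is (id, vector, metadata); the original text is kept as metadata
index.upsert(vectors=[
    (f"doc-{i}", item.embedding, {"text": doc})
    for i, (doc, item) in enumerate(zip(documents, doc_embeddings.data))
])
# Note: newly upserted vectors can take a few seconds to become queryable

# Step 2: Answer questions by searching
question = "How tall is the Eiffel Tower?"
q_embedding = client.embeddings.create(
    model="text-embedding-3-small",
    input=question
).data[0].embedding
# Pass arguments by keyword; include_metadata=True returns the stored text
results = index.query(vector=q_embedding, top_k=3, include_metadata=True)
context = "\n".join(match.metadata["text"] for match in results.matches)

# Step 3: Generate answer with LLM
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "user", "content": f"""
Context: {context}
Question: {question}
Answer:"""}
    ]
)
print(response.choices[0].message.content)
# Example output (LLM wording may vary): The Eiffel Tower is 330 meters tall.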
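In the example above, the retrieved chunks are pasted into the prompt as-is. A common refinement is to number the chunks and tell the model to answer only from them, which makes unsupported answers easier to spot. The helper below is a hypothetical sketch of that prompt-assembly step. The instruction wording and the `max_chunks` cap are assumptions, not requirements of Pinecone or OpenAI.

```python
def build_rag_prompt(question, chunks, max_chunks=3):
    """Hypothetical helper: format retrieved chunks plus the question into one prompt."""
    # Number each chunk so the answer can be traced back to its source
    numbered = "\n".join(
        f"[{i}] {chunk.strip()}" for i, chunk in enumerate(chunks[:max_chunks], start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

prompt = build_rag_prompt(
    "How tall is the Eiffel Tower?",
    ["The Eiffel Tower is 330 meters tall", "Paris is the capital of France"],
)
print(prompt)
```

The resulting string could be passed as the `content` of the user message in the chat completion call.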
When to Use a Vector Database
- ✅ Building RAG systems for document Q&A
- ✅ Semantic search across large text corpora
- ✅ Recommendation systems based on similarity
- ✅ Anomaly detection with embeddings
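- ❌ Exact-match or structured queries such as filters, joins, and aggregations (use a relational database like PostgreSQL instead)
- ❌ High-frequency transactional workloads that need ACID guarantees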
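For the anomaly-detection use case, one simple approach (among several) is to score each item by how far it sits from its nearest neighbors. Embeddings of normal items tend to cluster together, so an item whose k nearest neighbors are all distant is a candidate outlier. The sketch below uses brute-force search over made-up 2-D points. In practice the neighbor lookup would be a vector-database query, and the threshold is a hypothetical value you would tune on your own data.

```python
import math

def knn_anomaly_scores(vectors, k=2):
    """Score = mean Euclidean distance to the k nearest other vectors (higher = more unusual)."""
    scores = []
    for i, v in enumerate(vectors):
        dists = sorted(math.dist(v, w) for j, w in enumerate(vectors) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

# Made-up 2-D "embeddings": a tight cluster plus one outlier at the end
points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [3.0, 3.0]]
scores = knn_anomaly_scores(points, k=2)
threshold = 1.0  # hypothetical cut-off; tune on real data
print([i for i, s in enumerate(scores) if s > threshold])  # [4]
```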
Limitations and Trade-offs
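- Approximate results: ANN can miss the true nearest neighbor; recall is usually high, and index parameters (such as HNSW's search breadth, ef) let you trade speed for recall
- Embedding quality: Results are only as good as your embedding model
- Dimensionality: Higher-dimensional vectors (e.g., 1536 dims) need more memory and compute per comparison, and in very high dimensions distances become less discriminative (the "curse of dimensionality"), which makes ANN indexing harder
- Cost scaling: Managed services such as Pinecone charge by storage and query volume, so costs grow with corpus size and traffic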
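Two quick calculations make the dimensionality and cost bullets concrete. First, raw storage grows linearly with dimension: float32 vectors take 4 bytes per dimension, and index and metadata overhead come on top of that. Second, a small simulation illustrates the curse of dimensionality. For random points, the gap between the nearest and farthest neighbor shrinks relative to the nearest distance as dimension grows. The function names are illustrative. The simulation uses uniform random data, not real embeddings, which usually have more structure.

```python
import math
import random

def raw_vector_bytes(num_vectors, dims, bytes_per_value=4):
    """Raw float storage only (float32 = 4 bytes); ignores index and metadata overhead."""
    return num_vectors * dims * bytes_per_value

def relative_contrast(dims, n_points=500, seed=0):
    """(farthest - nearest) / nearest distance from a random query to random points."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dims)]
    dists = [
        math.dist(query, [rng.random() for _ in range(dims)])
        for _ in range(n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)

print(raw_vector_bytes(1_000_000, 1536) / 1e9, "GB")  # 6.144 GB for 1M x 1536-dim float32
for d in (2, 32, 1536):
    # Contrast shrinks as d grows: "nearest" and "farthest" become harder to tell apart
    print(d, round(relative_contrast(d), 2))
```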
Quick Start: ChromaDB (Local Development)
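from chromadb import Client

# In-memory client; data is lost when the process exits
client = Client()
# With no embedding function specified, Chroma embeds documents and queries
# with its default embedding model (downloaded on first use)
collection = client.create_collection(name="documents")

# Add documents
collection.add(
    ids=["1", "2", "3"],
    documents=[
        "Apple is a fruit",
        "Apple is a tech company",
        "Banana is a fruit"
    ]
)

# Query
results = collection.query(
    query_texts=["What fruit is yellow?"],
    n_results=1
)
# results["documents"] holds one list of matches per query text
print(results["documents"][0][0])  # Most likely "Banana is a fruit" (depends on the embedding model)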
Learn RAG at Scale
Master vector databases and RAG architecture with hands-on projects:
- Understanding embeddings: OpenAI vs Hugging Face vs custom
- Choosing and deploying vector databases
- Building production RAG pipelines
- Optimizing retrieval quality
- Cost analysis: cloud vs self-hosted