Knowledge Bases & RAG
Duration: 60 min
Retrieval-Augmented Generation (RAG) combines foundation models with your own data to answer questions grounded in specific context. This module covers Bedrock Knowledge Bases, vector embeddings, chunking strategies, and retrieval patterns.
What is RAG?
RAG reduces hallucinations by grounding the model's answers in your own data. It does this by:
- Converting your documents into vector embeddings
- Storing embeddings in a vector database
- Retrieving relevant documents when answering questions
- Passing retrieved context to the model
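The last of these steps, passing retrieved context to the model, largely comes down to prompt assembly. A minimal sketch, assuming the retrieved chunks are already available as plain strings; `build_grounded_prompt`, the prompt wording, and the sample policy sentences are all illustrative and not part of any Bedrock API:

```python
def build_grounded_prompt(question, retrieved_chunks):
    """Combine retrieved context with the user's question (hypothetical helper)."""
    # Number each chunk so the model can refer back to its sources.
    context = "\n\n".join(
        f"[{i}] {chunk.strip()}" for i, chunk in enumerate(retrieved_chunks, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Made-up example chunks standing in for real retrieval results.
prompt = build_grounded_prompt(
    "How many vacation days do employees get?",
    ["Employees accrue 20 vacation days per year.", "Unused days roll over once."],
)
print(prompt)
```

Telling the model to admit when the context lacks an answer is one common way to keep it from falling back on guesses.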
User Question
↓
Vector Embedding
↓
Search Vector DB
↓
Retrieve Top-K Documents
↓
Combine with Prompt
↓
Send to Model
↓
Grounded Answer
Bedrock Knowledge Bases
Bedrock Knowledge Bases automate the RAG pipeline:
import boto3
import json
client = boto3.client('bedrock-agent', region_name='us-east-1')
# Create a knowledge base backed by an OpenSearch Serverless vector index
response = client.create_knowledge_base(
    name='company-docs-kb',
    description='Company policies and procedures',
    roleArn='arn:aws:iam::ACCOUNT:role/BedrockKBRole',
    knowledgeBaseConfiguration={
        'type': 'VECTOR',
        'vectorKnowledgeBaseConfiguration': {
            # The embedding model is referenced by its full ARN
            'embeddingModelArn': 'arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v2:0'
        }
    },
    storageConfiguration={
        'type': 'OPENSEARCH_SERVERLESS',
        'opensearchServerlessConfiguration': {
            'collectionArn': 'arn:aws:aoss:us-east-1:ACCOUNT:collection/...',
            # Placeholder index and field names: use the ones defined in your collection
            'vectorIndexName': 'company-docs-index',
            'fieldMapping': {
                'vectorField': 'embedding',
                'textField': 'text',
                'metadataField': 'metadata'
            }
        }
    }
)
kb_id = response['knowledgeBase']['knowledgeBaseId']
print(f"Knowledge Base ID: {kb_id}")
Data Sources
Add documents to your knowledge base:
# Create a data source
response = client.create_data_source(
    knowledgeBaseId=kb_id,
    name='s3-policies',
    description='Company policies from S3',
    dataSourceConfiguration={
        'type': 'S3',
        's3Configuration': {
            'bucketArn': 'arn:aws:s3:::my-company-docs',
            'inclusionPrefixes': ['policies/']
        }
    }
)
data_source_id = response['dataSource']['dataSourceId']
# Ingest documents (the job runs asynchronously; wait for it to finish before querying)
response = client.start_ingestion_job(
    knowledgeBaseId=kb_id,
    dataSourceId=data_source_id
)
print(f"Ingestion Job: {response['ingestionJob']['ingestionJobId']}")
Vector Embeddings
Embeddings convert text into numerical vectors for similarity search:
import boto3
import json
bedrock = boto3.client('bedrock-runtime', region_name='us-east-1')
# Generate an embedding for a piece of text
response = bedrock.invoke_model(
    modelId='amazon.titan-embed-text-v2:0',
    body=json.dumps({
        "inputText": "AWS Bedrock is a managed service for foundation models"
    })
)
result = json.loads(response['body'].read())
embedding = result['embedding']  # List of 1024 floats (the model's default dimension)
print(f"Embedding dimension: {len(embedding)}")
print(f"First 5 values: {embedding[:5]}")
Document Chunking
Split large documents into manageable chunks:
def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into overlapping chunks"""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap  # How far the window advances each time
    for i in range(0, len(text), step):
        chunks.append(text[i:i + chunk_size])
        if i + chunk_size >= len(text):
            break  # The last chunk already reaches the end of the text
    return chunks
# Example
document = """
AWS Bedrock is a fully managed service that provides access to foundation models.
It supports multiple models from different providers including Anthropic, Meta, Mistral, and Stability AI.
Bedrock handles scaling, availability, and security automatically.
You can use Bedrock for various tasks including content generation, code assistance, and RAG.
"""
chunks = chunk_text(document, chunk_size=200, overlap=50)
for i, chunk in enumerate(chunks):
    print(f"Chunk {i}: {chunk[:50]}...")
Retrieval Patterns
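Before turning to the managed retrieval APIs, it may help to see what semantic search does underneath: it ranks stored chunks by how close their vectors are to the query vector. A minimal sketch using cosine similarity, assuming tiny hand-written 3-dimensional vectors in place of real embeddings; `cosine_similarity`, `top_k`, and the toy index are illustrative names, not Bedrock functions:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed_chunks, k=2):
    """Return the k (score, text) pairs most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed_chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


# Toy vectors standing in for real embeddings (assumption for illustration only).
index = [
    ("vacation policy", [0.9, 0.1, 0.0]),
    ("expense policy", [0.1, 0.9, 0.0]),
    ("remote work policy", [0.7, 0.0, 0.7]),
]
results = top_k([1.0, 0.0, 0.0], index, k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

A vector database performs the same ranking, typically with approximate nearest-neighbor indexes so that it scales beyond a linear scan.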
Basic Retrieval
import boto3
bedrock_agent_runtime = boto3.client('bedrock-agent-runtime', region_name='us-east-1')
# Retrieve documents from knowledge base
response = bedrock_agent_runtime.retrieve(
    knowledgeBaseId=kb_id,
    retrievalQuery={
        'text': 'What is the vacation policy?'
    },
    retrievalConfiguration={
        'vectorSearchConfiguration': {
            'numberOfResults': 5,
            'overrideSearchType': 'SEMANTIC'
        }
    }
)
# Process results
for result in response['retrievalResults']:
    print(f"Score: {result['score']}")
    print(f"Content: {result['content']['text'][:200]}...")
Retrieve and Generate
# Retrieve documents and generate answer
response = bedrock_agent_runtime.retrieve_and_generate(
    input={
        'text': 'What is the remote work policy?'
    },
    retrieveAndGenerateConfiguration={
        'type': 'KNOWLEDGE_BASE',
        'knowledgeBaseConfiguration': {
            'knowledgeBaseId': kb_id,
            'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0',
            'retrievalConfiguration': {
                'vectorSearchConfiguration': {
                    'numberOfResults': 5
                }
            }
        }
    }
)
answer = response['output']['text']
print(f"Answer: {answer}")
# See which documents were used: each citation lists the references behind one part of the answer
for citation in response['citations']:
    for reference in citation['retrievedReferences']:
        print(f"Source: {reference['location']}")
Chunking Strategies
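Chunk size and overlap also determine how many chunks end up in the index, which affects storage and ingestion cost. As a rough aid when comparing strategies, the number of sliding-window chunks can be estimated with a little arithmetic. `estimate_chunk_count` is a hypothetical helper, and it counts characters rather than tokens:

```python
import math


def estimate_chunk_count(text_length, chunk_size, overlap=0):
    """Estimate how many sliding-window chunks cover text_length characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    if text_length <= chunk_size:
        return 1 if text_length > 0 else 0
    step = chunk_size - overlap
    # One full first chunk, then one more chunk per step until the end is covered.
    return 1 + math.ceil((text_length - chunk_size) / step)


without_overlap = estimate_chunk_count(10_000, 1_000, overlap=0)
with_overlap = estimate_chunk_count(10_000, 1_000, overlap=200)
print(without_overlap, with_overlap)
```

For a 10,000-character document split into 1,000-character chunks, this gives 10 chunks without overlap and 13 with a 200-character overlap, so the overlap costs about 30% more embeddings in exchange for better coverage at chunk boundaries.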
# Strategy 1: Fixed-size chunks
def fixed_chunks(text, size=500):
    return [text[i:i+size] for i in range(0, len(text), size)]
# Strategy 2: Sentence-based chunks
import re
def sentence_chunks(text, sentences_per_chunk=5):
    # Split after sentence-ending punctuation followed by whitespace
    sentences = re.split(r'(?<=[.!?])\s+', text)
    chunks = []
    for i in range(0, len(sentences), sentences_per_chunk):
        chunk = ' '.join(sentences[i:i+sentences_per_chunk])
        chunks.append(chunk)
    return chunks
# Strategy 3: Paragraph-based chunks (a simple stand-in for semantic chunking; no embeddings are used)
def semantic_chunks(text, max_chunk_size=1000):
    """Split at natural boundaries"""
    # Split by paragraphs first
    paragraphs = text.split('\n\n')
    chunks = []
    current_chunk = ""
    for para in paragraphs:
        # Keep adding paragraphs while the chunk stays under the size limit
        if len(current_chunk) + len(para) < max_chunk_size:
            current_chunk += para + "\n\n"
        else:
            if current_chunk:
                chunks.append(current_chunk)
            current_chunk = para + "\n\n"
    if current_chunk:
        chunks.append(current_chunk)
    return chunks
Best Practices
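Metadata attached at ingestion time pays off at query time, because retrieval can be restricted to matching documents. A minimal sketch of a filtered retrieval configuration, assuming the documents were ingested with a `category` metadata attribute like the one in the metadata practice in this section; `build_filtered_retrieval_config` is a hypothetical helper that only builds the request dictionary:

```python
def build_filtered_retrieval_config(category, number_of_results=5):
    """Build a retrievalConfiguration dict restricted to one metadata category."""
    return {
        "vectorSearchConfiguration": {
            "numberOfResults": number_of_results,
            # Only chunks whose metadata has category == <category> are considered.
            "filter": {"equals": {"key": "category", "value": category}},
        }
    }


config = build_filtered_retrieval_config("pricing")
print(config)
```

The resulting dictionary would be passed as `retrievalConfiguration=config` to the `retrieve` call shown under Basic Retrieval.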
# ✅ Good: Metadata for filtering
chunk_with_metadata = {
    "text": "AWS Bedrock pricing...",
    "metadata": {
        "source": "pricing-guide.pdf",
        "date": "2024-01-15",
        "category": "pricing"
    }
}
# ✅ Good: Appropriate chunk size
# Too small: Loses context
# Too large: Retrieves irrelevant info
# Sweet spot: often 500-1000 tokens (note that the chunking examples in this module count characters, not tokens)
# ✅ Good: Overlap between chunks
# Prevents losing information at boundaries
# Typical overlap: 10-20% of chunk size
# ✅ Good: Regular re-indexing
# Update knowledge base when documents change
# Monitor ingestion job status
❓ What is the main problem that RAG solves?
❓ What is the purpose of document chunking?
❓ What does an embedding represent?
❓ What is a typical optimal chunk size?