Knowledge Bases & RAG

Duration: 60 min

Retrieval-Augmented Generation (RAG) combines foundation models with your own data to answer questions grounded in specific context. This module covers Bedrock Knowledge Bases, vector embeddings, chunking strategies, and retrieval patterns.

What is RAG?

RAG reduces hallucinations by grounding the model's answers in your own data. It works by:

Converting your documents into vector embeddings
Storing embeddings in a vector database
Retrieving relevant documents when answering questions
Passing retrieved context to the model

User Question
    ↓
Vector Embedding
    ↓
Search Vector DB
    ↓
Retrieve Top-K Documents
    ↓
Combine with Prompt
    ↓
Send to Model
    ↓
Grounded Answer
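The flow above can be traced end to end in a few lines of plain Python. This is a toy sketch, not how Bedrock works internally: the `embed` function is a hypothetical bag-of-words stand-in for a real embedding model, the in-memory list stands in for a vector database, and the policy sentences are made-up examples.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: bag-of-words term counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question, documents, k=2):
    """Search step: return the k documents most similar to the question."""
    q = embed(question)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(question, context_docs):
    """Combine step: put the retrieved context in front of the question."""
    context = "\n".join(f"- {d}" for d in context_docs)
    return (
        "Answer using only the context below. If the answer is not there, say so.\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

# Hypothetical document collection
docs = [
    "Employees receive 20 days of paid vacation per year.",
    "Remote work is allowed up to three days per week.",
    "Expense reports are due by the fifth of each month.",
]
question = "How many vacation days do employees get?"
top = retrieve(question, docs, k=1)
print(build_prompt(question, top))  # this prompt is what would be sent to the model
```

A real pipeline swaps `embed` for an embedding model and the list for a vector store, but the retrieve-then-combine shape stays the same.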

Bedrock Knowledge Bases

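Bedrock Knowledge Bases automate the RAG pipeline: ingestion, chunking, embedding, storage, and retrieval. Creating one through the API requires an IAM service role and an existing vector store; the example below uses an OpenSearch Serverless collection: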

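import boto3
import json

client = boto3.client('bedrock-agent', region_name='us-east-1')

# Create a knowledge base (replace ACCOUNT and the collection ARN with your own values)
response = client.create_knowledge_base(
    name='company-docs-kb',
    description='Company policies and procedures',
    roleArn='arn:aws:iam::ACCOUNT:role/BedrockKBRole',
    knowledgeBaseConfiguration={
        'type': 'VECTOR',
        'vectorKnowledgeBaseConfiguration': {
            # The embedding model is referenced by its foundation-model ARN
            'embeddingModelArn': 'arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v2:0'
        }
    },
    storageConfiguration={
        'type': 'OPENSEARCH_SERVERLESS',
        'opensearchServerlessConfiguration': {
            'collectionArn': 'arn:aws:aoss:us-east-1:ACCOUNT:collection/...',
            # Example names: these must match the vector index you created in the collection
            'vectorIndexName': 'company-docs-index',
            'fieldMapping': {
                'vectorField': 'embedding',
                'textField': 'text',
                'metadataField': 'metadata'
            }
        }
    }
)

kb_id = response['knowledgeBase']['knowledgeBaseId']
print(f"Knowledge Base ID: {kb_id}")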
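Knowledge base creation is asynchronous, so an ID can come back before the knowledge base is ready to use. A minimal polling sketch, assuming the `bedrock-agent` client's `get_knowledge_base` call and the `status` field in its response; the helper name, timeout, and poll interval are illustrative choices:

```python
import time

def wait_for_knowledge_base(client, kb_id, timeout=300, poll_seconds=5):
    """Poll get_knowledge_base until the knowledge base is ACTIVE, fails, or times out."""
    deadline = time.monotonic() + timeout
    while True:
        status = client.get_knowledge_base(knowledgeBaseId=kb_id)['knowledgeBase']['status']
        if status == 'ACTIVE':
            return status
        if status == 'FAILED':
            raise RuntimeError(f"Knowledge base {kb_id} failed to create")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Knowledge base {kb_id} still {status} after {timeout}s")
        time.sleep(poll_seconds)  # back off before asking again
```

Call it as `wait_for_knowledge_base(client, kb_id)` before adding data sources.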

Data Sources

Connect a data source (here, an S3 prefix) and start an ingestion job, which chunks, embeds, and indexes the documents:

# Create a data source that points at an S3 prefix
response = client.create_data_source(
    knowledgeBaseId=kb_id,
    name='s3-policies',
    description='Company policies from S3',
    dataSourceConfiguration={
        'type': 'S3',
        's3Configuration': {
            'bucketArn': 'arn:aws:s3:::my-company-docs',
            'inclusionPrefixes': ['policies/']
        }
    },
    # Optional: control how documents are chunked during ingestion
    vectorIngestionConfiguration={
        'chunkingConfiguration': {
            'chunkingStrategy': 'FIXED_SIZE',
            'fixedSizeChunkingConfiguration': {
                'maxTokens': 500,
                'overlapPercentage': 20
            }
        }
    }
)

data_source_id = response['dataSource']['dataSourceId']

# Start an ingestion job (it runs asynchronously)
response = client.start_ingestion_job(
    knowledgeBaseId=kb_id,
    dataSourceId=data_source_id
)

print(f"Ingestion Job: {response['ingestionJob']['ingestionJobId']}")
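Because `start_ingestion_job` returns immediately, queries run right afterwards may not see the new documents. A minimal sketch for waiting on the job, assuming the `bedrock-agent` client's `get_ingestion_job` call and its `status` field; the helper name, timeout, and poll interval are illustrative:

```python
import time

# Ingestion job states after which the job will not change again
TERMINAL_STATES = {'COMPLETE', 'FAILED', 'STOPPED'}

def wait_for_ingestion(client, kb_id, data_source_id, job_id, timeout=1800, poll_seconds=10):
    """Poll get_ingestion_job until the job reaches a terminal state; return the job dict."""
    deadline = time.monotonic() + timeout
    while True:
        job = client.get_ingestion_job(
            knowledgeBaseId=kb_id,
            dataSourceId=data_source_id,
            ingestionJobId=job_id,
        )['ingestionJob']
        if job['status'] in TERMINAL_STATES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Ingestion job {job_id} still {job['status']} after {timeout}s")
        time.sleep(poll_seconds)
```

For example, `job = wait_for_ingestion(client, kb_id, data_source_id, response['ingestionJob']['ingestionJobId'])`, then inspect `job['status']` and `job.get('statistics', {})` to see how many documents were indexed or failed.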

Vector Embeddings

Embeddings convert text into numerical vectors. Texts with similar meaning map to nearby vectors, which is what makes similarity search possible:

import boto3
import json

bedrock = boto3.client('bedrock-runtime', region_name='us-east-1')

# Generate embeddings for text
response = bedrock.invoke_model(
    modelId='amazon.titan-embed-text-v2:0',
    body=json.dumps({
        "inputText": "AWS Bedrock is a managed service for foundation models"
    })
)

result = json.loads(response['body'].read())
embedding = result['embedding']  # List of 1024 floats

print(f"Embedding dimension: {len(embedding)}")
print(f"First 5 values: {embedding[:5]}")
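"Nearby" is usually measured with cosine similarity: the cosine of the angle between two vectors, close to 1.0 for texts with similar meaning. A minimal pure-Python sketch (the function name is illustrative); in practice you would call it on two `embedding` lists produced by the code above:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)
```

If both vectors are already normalized to unit length, the dot product alone equals the cosine similarity, which is why many vector stores compare normalized embeddings with a plain dot product.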

Document Chunking

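Split large documents into smaller, overlapping chunks before embedding. Overlap keeps text that straddles a boundary intact in at least one chunk. This version counts characters, not tokens: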

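def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into overlapping character-based chunks."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end; avoid redundant tail chunks
    return chunks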

# Example
document = """
AWS Bedrock is a fully managed service that provides access to foundation models.
It supports multiple models from different providers including Anthropic, Meta, Mistral, and Stability AI.
Bedrock handles scaling, availability, and security automatically.
You can use Bedrock for various tasks including content generation, code assistance, and RAG.
"""

chunks = chunk_text(document, chunk_size=200, overlap=50)
for i, chunk in enumerate(chunks):
    print(f"Chunk {i}: {chunk[:50]}...")
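Chunk-size guidance is usually given in model tokens, while the function above counts characters. A rough alternative is to chunk by whitespace-separated words; this is an assumption, since words only approximate tokens, and exact counts would need the embedding model's own tokenizer. The function name and defaults are illustrative:

```python
def chunk_words(text, max_words=300, overlap_words=50):
    """Split text into overlapping chunks of whitespace-separated words."""
    if not 0 <= overlap_words < max_words:
        raise ValueError("overlap_words must be >= 0 and smaller than max_words")
    words = text.split()
    step = max_words - overlap_words
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last chunk already covers the remaining words
    return chunks

numbers = " ".join(str(i) for i in range(10))
print(chunk_words(numbers, max_words=4, overlap_words=1))
# → ['0 1 2 3', '3 4 5 6', '6 7 8 9']
```

Each chunk repeats the last word of the previous one, which is the overlap at work.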

Retrieval Patterns

Basic Retrieval

import boto3

bedrock_agent_runtime = boto3.client('bedrock-agent-runtime', region_name='us-east-1')

# Retrieve documents from knowledge base
response = bedrock_agent_runtime.retrieve(
    knowledgeBaseId=kb_id,
    retrievalQuery={
        'text': 'What is the vacation policy?'
    },
    retrievalConfiguration={
        'vectorSearchConfiguration': {
            'numberOfResults': 5,
            'overrideSearchType': 'SEMANTIC'  # or 'HYBRID' (vector + keyword) where the vector store supports it
        }
    }
)

# Process results
for result in response['retrievalResults']:
    print(f"Score: {result['score']}")
    print(f"Content: {result['content']['text'][:200]}...")
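`retrieve` returns chunks only; if you want to generate the answer yourself (for example with a custom prompt), you still need the "Combine with Prompt" step. A minimal sketch that keeps the best-scoring chunks and joins them into one context string. The `min_score` and `max_chars` defaults are arbitrary assumptions, since score scales depend on the vector store and search type:

```python
def build_context(retrieval_results, min_score=0.5, max_chars=4000):
    """Keep results above a score threshold and join their text into one context string."""
    parts, used = [], 0
    for result in sorted(retrieval_results, key=lambda r: r['score'], reverse=True):
        if result['score'] < min_score:
            continue  # too weakly related to the question
        text = result['content']['text']
        if used + len(text) > max_chars:
            break  # stay within a rough prompt-size budget
        parts.append(text)
        used += len(text)
    return "\n\n---\n\n".join(parts)
```

With the response above, `context = build_context(response['retrievalResults'])` gives text you can place before the question in your own prompt.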

Retrieve and Generate

# Retrieve documents and generate answer
response = bedrock_agent_runtime.retrieve_and_generate(
    input={
        'text': 'What is the remote work policy?'
    },
    retrieveAndGenerateConfiguration={
        'type': 'KNOWLEDGE_BASE',
        'knowledgeBaseConfiguration': {
            'knowledgeBaseId': kb_id,
            'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0',
            'retrievalConfiguration': {
                'vectorSearchConfiguration': {
                    'numberOfResults': 5
                }
            }
        }
    }
)

answer = response['output']['text']
print(f"Answer: {answer}")

# See which source documents support each part of the answer
for citation in response['citations']:
    for ref in citation['retrievedReferences']:
        print(f"Source: {ref['location']}")
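Each citation pairs a `generatedResponsePart` (the cited piece of the answer) with the `retrievedReferences` that back it. A small sketch that turns this into (answer text, source URIs) pairs; it assumes an S3 data source, so other location types (with different keys) are simply skipped. The helper name is illustrative:

```python
def answer_sources(citations):
    """Pair each cited part of the generated answer with the S3 URIs it was grounded in."""
    pairs = []
    for citation in citations:
        part = citation['generatedResponsePart']['textResponsePart']['text']
        uris = []
        for ref in citation.get('retrievedReferences', []):
            uri = ref.get('location', {}).get('s3Location', {}).get('uri')
            if uri and uri not in uris:  # skip non-S3 locations and duplicate chunks
                uris.append(uri)
        pairs.append((part, uris))
    return pairs
```

For example, `for part, uris in answer_sources(response['citations']): print(part, uris)` shows which files each sentence of the answer came from.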

Chunking Strategies

# Strategy 1: Fixed-size chunks
def fixed_chunks(text, size=500):
    return [text[i:i+size] for i in range(0, len(text), size)]

# Strategy 2: Sentence-based chunks
import re

def sentence_chunks(text, sentences_per_chunk=5):
    sentences = re.split(r'(?<=[.!?])\s+', text)
    chunks = []
    for i in range(0, len(sentences), sentences_per_chunk):
        chunk = ' '.join(sentences[i:i+sentences_per_chunk])
        chunks.append(chunk)
    return chunks

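# Strategy 3: Paragraph-based chunks (splits at natural boundaries; no embeddings involved)
def paragraph_chunks(text, max_chunk_size=1000):
    """Pack whole paragraphs into chunks of at most max_chunk_size characters.

    A single paragraph longer than max_chunk_size becomes its own (oversized) chunk.
    """
    paragraphs = [p.strip() for p in text.split('\n\n') if p.strip()]
    chunks = []
    current_chunk = ""

    for para in paragraphs:
        # Size the chunk as it would be after adding this paragraph, separator included
        candidate = f"{current_chunk}\n\n{para}" if current_chunk else para
        if len(candidate) <= max_chunk_size:
            current_chunk = candidate
        else:
            if current_chunk:
                chunks.append(current_chunk)
            current_chunk = para

    if current_chunk:
        chunks.append(current_chunk)

    return chunks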
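The third strategy above splits on paragraph boundaries and does not actually compare embeddings. One common way to chunk semantically is to embed each sentence and start a new chunk wherever two adjacent sentences are dissimilar. A minimal sketch of that idea: `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding model, and the 0.2 threshold is an arbitrary assumption you would tune. (Knowledge Bases can also chunk for you through the data source's chunking configuration.)

```python
import math
import re
from collections import Counter

def toy_embed(sentence):
    """Hypothetical stand-in for a real embedding model: bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def embedding_chunks(text, threshold=0.2, embed=toy_embed):
    """Start a new chunk wherever adjacent sentences fall below the similarity threshold."""
    sentences = [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]
    if not sentences:
        return []
    chunks, current = [], [sentences[0]]
    prev_vec = embed(sentences[0])
    for sentence in sentences[1:]:
        vec = embed(sentence)
        if cosine(prev_vec, vec) < threshold:
            chunks.append(" ".join(current))  # topic shift: close the current chunk
            current = []
        current.append(sentence)
        prev_vec = vec
    chunks.append(" ".join(current))
    return chunks

text = "Cats purr when happy. Cats sleep a lot. Rockets need fuel to launch. Rockets fly to orbit."
print(embedding_chunks(text))
# → ['Cats purr when happy. Cats sleep a lot.', 'Rockets need fuel to launch. Rockets fly to orbit.']
```

Comparing only adjacent sentences keeps the sketch simple; variants compare each sentence against the running chunk's embedding instead.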

Best Practices

# ✅ Good: Metadata for filtering
chunk_with_metadata = {
    "text": "AWS Bedrock pricing...",
    "metadata": {
        "source": "pricing-guide.pdf",
        "date": "2024-01-15",
        "category": "pricing"
    }
}

# ✅ Good: Appropriate chunk size
# Too small: Loses context
# Too large: Retrieves irrelevant info
# Common starting range: 500-1000 tokens; tune by testing retrieval on real questions

# ✅ Good: Overlap between chunks
# Prevents losing information at boundaries
# Typical overlap: 10-20% of chunk size

# ✅ Good: Regular re-indexing
# Re-run the ingestion job (start_ingestion_job) when documents change
# Monitor ingestion job status
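To use metadata like the `chunk_with_metadata` example with an S3 data source, Knowledge Bases read a sidecar JSON file next to each document, assumed here to follow the `<document>.metadata.json` naming convention (for example `pricing-guide.pdf.metadata.json`) with a top-level `metadataAttributes` object. At query time, a `filter` in the retrieval configuration restricts results to matching chunks. A minimal sketch; the helper names are illustrative:

```python
import json

def metadata_sidecar(attributes):
    """Body of a <document>.metadata.json file placed next to the document in S3."""
    return json.dumps({"metadataAttributes": attributes}, indent=2)

def retrieval_config_with_filter(key, value, number_of_results=5):
    """retrievalConfiguration that only returns chunks whose metadata key equals value."""
    return {
        'vectorSearchConfiguration': {
            'numberOfResults': number_of_results,
            'filter': {'equals': {'key': key, 'value': value}},
        }
    }

print(metadata_sidecar({"source": "pricing-guide.pdf", "date": "2024-01-15", "category": "pricing"}))
config = retrieval_config_with_filter('category', 'pricing')
```

Pass `config` as `retrievalConfiguration` to `bedrock_agent_runtime.retrieve(...)`; the metadata files must be in place before ingestion for the filter to match anything.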

❓ What is the main problem that RAG solves?

Slow API response times
Model hallucinations by grounding answers in retrieved documents
High token costs
Model access limitations

❓ What is the purpose of document chunking?

Reduce storage costs
Improve document security
Split documents into manageable pieces for embedding and retrieval
Compress document size

❓ What does an embedding represent?

A numerical vector representation of text for similarity search
A compressed version of a document
A hash of the document content
A metadata tag for documents

❓ What is a common starting chunk size?

100-200 tokens
500-1000 tokens
2000-5000 tokens
Chunk size doesn't matter