Prompt Engineering for Bedrock

Duration: 50 min

Prompt engineering is the art of crafting inputs to get the best outputs from foundation models. This module covers system prompts, temperature, top_p, stop sequences, and practical techniques to improve model responses.

System Prompts

A system prompt sets the context and behavior for the model. It works like a briefing given to an assistant before it starts working.

import boto3
import json

client = boto3.client('bedrock-runtime', region_name='us-east-1')

# Without system prompt
# (Converse expects each message's content as a list of content blocks)
response = client.converse(
    modelId='anthropic.claude-3-sonnet-20240229-v1:0',
    messages=[
        {"role": "user", "content": [{"text": "What is machine learning?"}]}
    ]
)

# With system prompt (system is also a list of text blocks)
response = client.converse(
    modelId='anthropic.claude-3-sonnet-20240229-v1:0',
    messages=[
        {"role": "user", "content": [{"text": "What is machine learning?"}]}
    ],
    system=[{"text": "You are a beginner-friendly AI tutor. Explain concepts simply with examples."}]
)

print(response['output']['message']['content'][0]['text'])
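
As a minimal sketch, the request can be assembled by a small helper before it is passed to client.converse. The helper name build_converse_request and its parameters are hypothetical and not part of boto3. The shape it produces (a list of content blocks for content, a list of text blocks for system) follows the Converse request format.

```python
def build_converse_request(user_text, system_text=None,
                           model_id='anthropic.claude-3-sonnet-20240229-v1:0'):
    """Build keyword arguments for client.converse (hypothetical helper)."""
    request = {
        "modelId": model_id,
        "messages": [
            # Each message carries a list of content blocks, not a bare string
            {"role": "user", "content": [{"text": user_text}]}
        ],
    }
    if system_text:
        # The system prompt is likewise a list of text blocks
        request["system"] = [{"text": system_text}]
    return request


request = build_converse_request(
    "What is machine learning?",
    system_text="You are a beginner-friendly AI tutor. Explain concepts simply with examples.",
)
print(sorted(request.keys()))
```

With a configured client, the call would then be response = client.converse(**request).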

Temperature Parameter

Temperature controls randomness. Lower values (0.0-0.5) produce more deterministic, focused responses. Higher values (0.7-1.0) produce more creative, varied responses.

# Deterministic response (good for Q&A, classification)
response = client.converse(
    modelId='anthropic.claude-3-sonnet-20240229-v1:0',
    messages=[
        {"role": "user", "content": [{"text": "What is the capital of France?"}]}
    ],
    inferenceConfig={
        "temperature": 0.0,  # Near-deterministic: the answer rarely changes between runs
        "maxTokens": 100
    }
)

# Creative response (good for brainstorming, content generation)
response = client.converse(
    modelId='anthropic.claude-3-sonnet-20240229-v1:0',
    messages=[
        {"role": "user", "content": [{"text": "Write a creative story about AI"}]}
    ],
    inferenceConfig={
        "temperature": 0.9,  # Varied, creative responses
        "maxTokens": 500
    }
)
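
To see why low temperatures give focused output, the sketch below applies the textbook softmax-with-temperature formula to made-up token scores. Bedrock does not expose the model's internals, so this shows how the parameter is commonly defined, not the service's actual implementation. The function name and the scores are assumptions for illustration only.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after dividing them by temperature."""
    if temperature <= 0:
        # Temperature 0 is usually treated as "always pick the top token";
        # this illustration only covers positive values.
        raise ValueError("temperature must be > 0 for this illustration")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(score - peak) for score in scaled]
    total = sum(exps)
    return [value / total for value in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
low = softmax_with_temperature(logits, 0.2)
high = softmax_with_temperature(logits, 1.0)

print([round(p, 3) for p in low])   # top token takes almost all the probability
print([round(p, 3) for p in high])  # probability is spread more evenly
```

At the lower temperature the top-scoring token dominates, so repeated calls tend to give the same answer. At the higher temperature the alternatives keep a meaningful share, so outputs vary more.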

Top-P (Nucleus Sampling)

Top-P controls diversity by restricting sampling to the smallest set of most likely tokens whose cumulative probability reaches P. Values range from 0.0 to 1.0, and lower values narrow the pool of candidate tokens.

# Conservative (top_p=0.5): sample only from the most likely tokens that together cover 50% of the probability
response = client.converse(
    modelId='anthropic.claude-3-sonnet-20240229-v1:0',
    messages=[
        {"role": "user", "content": [{"text": "Classify this email as spam or not: 'Buy cheap watches now!'"}]}
    ],
    inferenceConfig={
        "topP": 0.5,
        "temperature": 0.3,
        "maxTokens": 50
    }
)

# Diverse (top_p=0.95): Consider more token options
response = client.converse(
    modelId='anthropic.claude-3-sonnet-20240229-v1:0',
    messages=[
        {"role": "user", "content": [{"text": "Generate 3 creative product names for a coffee app"}]}
    ],
    inferenceConfig={
        "topP": 0.95,
        "temperature": 0.8,
        "maxTokens": 200
    }
)
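
The filtering step behind Top-P can be sketched locally on a toy distribution. The function name nucleus_filter and the probabilities are made up for illustration. This shows the general nucleus-sampling idea, not Bedrock's internal code.

```python
def nucleus_filter(token_probs, top_p):
    """Keep the most likely tokens until their cumulative probability reaches top_p."""
    ranked = sorted(token_probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= top_p:
            break
    total = sum(prob for _, prob in kept)
    # Renormalize so the surviving probabilities sum to 1
    return {token: prob / total for token, prob in kept}


# Made-up next-token distribution
probs = {"spam": 0.55, "not": 0.25, "likely": 0.12, "unsure": 0.08}

conservative = nucleus_filter(probs, 0.5)
diverse = nucleus_filter(probs, 0.95)

print(list(conservative))  # ['spam']
print(list(diverse))       # ['spam', 'not', 'likely', 'unsure']
```

Note that top_p=0.5 does not mean "half of the vocabulary": here a single token already covers 50% of the probability, so it is the only candidate left.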

Stop Sequences

Stop sequences tell the model when to stop generating. They are useful for structured outputs, where a known delimiter marks the end of the part you want.

# Stop at newline to get single-line responses
# (if the model rejects a whitespace-only stop sequence, use a visible marker such as "###")
response = client.invoke_model(
    modelId='anthropic.claude-3-sonnet-20240229-v1:0',
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 100,
        "stop_sequences": ["\n"],
        "messages": [
            {"role": "user", "content": "List one benefit of AWS Bedrock"}
        ]
    })
)

# Stop at specific marker
response = client.invoke_model(
    modelId='anthropic.claude-3-sonnet-20240229-v1:0',
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 500,
        "stop_sequences": ["</answer>"],
        "messages": [
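            {"role": "user", "content": "What is RAG? Write your answer inside <answer></answer> tags."}
        ]
    })
)

# invoke_model returns the payload as a stream; read and decode it
result = json.loads(response['body'].read())
print(result['content'][0]['text'])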
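
The effect of a stop sequence can be imitated locally. The helper below is a hypothetical illustration that runs on plain strings, not a Bedrock call. It assumes the matched stop sequence is left out of the returned text. The sample string is made up.

```python
def apply_stop_sequences(text, stop_sequences):
    """Cut text just before the earliest stop sequence (local simulation only)."""
    cut = len(text)
    for stop in stop_sequences:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


# Made-up model output that keeps going after the closing tag
raw = "<answer>RAG combines retrieval with generation.</answer> Extra commentary."

print(apply_stop_sequences(raw, ["</answer>"]))
# <answer>RAG combines retrieval with generation.
```

The Converse API accepts the same setting through inferenceConfig, using the stopSequences key.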

Prompt Techniques

Few-Shot Prompting

Provide examples to guide the model's behavior:

prompt = """
Classify the sentiment of these reviews:

Example 1: "This product is amazing!" → Positive
Example 2: "Terrible quality, waste of money" → Negative
Example 3: "It's okay, nothing special" → Neutral

Now classify: "Best purchase I've made all year!"
"""

response = client.converse(
    modelId='anthropic.claude-3-sonnet-20240229-v1:0',
    messages=[
        {"role": "user", "content": [{"text": prompt}]}
    ],
    inferenceConfig={"maxTokens": 50, "temperature": 0.0}
)
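
When the examples live in a list, the prompt can be generated, which keeps the format identical for every example. This is a minimal sketch. The helper build_few_shot_prompt is hypothetical, and the examples are the ones from the prompt above.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble a few-shot classification prompt from (text, label) pairs."""
    lines = [instruction, ""]
    for number, (text, label) in enumerate(examples, start=1):
        lines.append(f'Example {number}: "{text}" → {label}')
    lines.extend(["", f'Now classify: "{query}"'])
    return "\n".join(lines)


examples = [
    ("This product is amazing!", "Positive"),
    ("Terrible quality, waste of money", "Negative"),
    ("It's okay, nothing special", "Neutral"),
]

prompt = build_few_shot_prompt(
    "Classify the sentiment of these reviews:",
    examples,
    "Best purchase I've made all year!",
)
print(prompt)
```

Covering each label at least once, as the three examples do here, gives the model a pattern for every outcome it may need to produce.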

Chain-of-Thought Prompting

Ask the model to work through its reasoning step by step before giving the final answer:

prompt = """
Solve this step by step:
If a train travels 100 miles in 2 hours, how far does it travel in 5 hours?

Think through:
1. What is the speed?
2. How far in 5 hours?
"""

response = client.converse(
    modelId='anthropic.claude-3-sonnet-20240229-v1:0',
    messages=[
        {"role": "user", "content": [{"text": prompt}]}
    ],
    inferenceConfig={"maxTokens": 300}
)
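
A step-by-step reply is easy to check when the expected result is known. The sketch below works the two steps from the prompt and compares them with the last number in a reply. Both helpers are hypothetical, and sample_reply is a made-up reply, not real model output.

```python
import re


def expected_distance(miles, hours, target_hours):
    """Work the two steps from the prompt: speed first, then distance."""
    speed = miles / hours        # step 1: 100 miles / 2 hours = 50 mph
    return speed * target_hours  # step 2: 50 mph * 5 hours


def last_number(text):
    """Return the last number mentioned in text, or None if there is none."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text)
    return float(matches[-1]) if matches else None


# Made-up example of a step-by-step reply
sample_reply = "Speed is 100 / 2 = 50 mph. In 5 hours the train travels 50 * 5 = 250 miles."

print(expected_distance(100, 2, 5))                               # 250.0
print(last_number(sample_reply) == expected_distance(100, 2, 5))  # True
```

Taking the last number is a rough heuristic. It only works when the reply ends with the final answer, so real use may call for a stricter output format.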

Role-Based Prompting

Assign a role to get specialized responses:

response = client.converse(
    modelId='anthropic.claude-3-sonnet-20240229-v1:0',
    messages=[
        {"role": "user", "content": [{"text": "Explain quantum computing"}]}
    ],
    system=[{"text": "You are a quantum physics professor explaining to undergraduate students. Use analogies and avoid jargon."}],
    inferenceConfig={"maxTokens": 500}
)

Structured Output

Describe the JSON format you want in the prompt to get structured responses:

prompt = """
Extract information from this text and return as JSON:
"John Smith works at Acme Corp as a Software Engineer. He has 5 years of experience."

Return format:
{
  "name": "...",
  "company": "...",
  "role": "...",
  "experience_years": ...
}
"""

response = client.converse(
    modelId='anthropic.claude-3-sonnet-20240229-v1:0',
    messages=[
        {"role": "user", "content": [{"text": prompt}]}
    ],
    inferenceConfig={"maxTokens": 200, "temperature": 0.0}
)

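# Parse JSON response (json is already imported at the top of the module)
text = response['output']['message']['content'][0]['text']
try:
    result = json.loads(text)
    print(result['name'])
except json.JSONDecodeError:
    # The model may wrap the JSON in extra prose; inspect the raw text
    print("Response was not valid JSON:", text)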
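
Models sometimes surround the JSON with a sentence or two, which makes a direct json.loads call fail. One possible workaround, sketched here with a hypothetical helper and a made-up reply, is to parse only the span between the first opening brace and the last closing brace.

```python
import json


def extract_json(text):
    """Parse the first {...} span found in text (hypothetical helper)."""
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    return json.loads(text[start:end + 1])


# Made-up reply with prose around the JSON object
sample = (
    'Here is the result:\n'
    '{"name": "John Smith", "company": "Acme Corp", '
    '"role": "Software Engineer", "experience_years": 5}\n'
    'Let me know if you need more.'
)

result = extract_json(sample)
print(result["name"])              # John Smith
print(result["experience_years"])  # 5
```

This is a heuristic and can still fail, for example when the surrounding prose itself contains braces, so the json.JSONDecodeError case should still be handled by the caller.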

Best Practices

# ✅ Good: Clear, specific, with context
good_prompt = """
You are a Python code reviewer. Review this function for bugs, performance issues, and style.
Focus on correctness first, then optimization.

def calculate_average(numbers):
    return sum(numbers) / len(numbers)

Provide:
1. Any bugs found
2. Performance concerns
3. Style improvements
"""

# ❌ Bad: Vague, no context
bad_prompt = "Review this code"

# ✅ Good: Specify output format
structured_prompt = """
Summarize this article in JSON format:
{
  "title": "...",
  "main_points": ["...", "..."],
  "conclusion": "..."
}

Article: [text here]
"""

# ✅ Good: Use temperature appropriately
# For factual tasks: temperature=0.0-0.3
# For creative tasks: temperature=0.7-1.0
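
The temperature guidance above can be captured as presets so that each call site picks a task type, not a raw number. The names TASK_PRESETS and inference_config_for are hypothetical. The temperatures are picked from the ranges above, and the maxTokens values are illustrative assumptions.

```python
# Hypothetical presets; temperatures fall inside the ranges recommended above
TASK_PRESETS = {
    "factual": {"temperature": 0.2, "maxTokens": 200},
    "creative": {"temperature": 0.9, "maxTokens": 500},
}


def inference_config_for(task_type):
    """Return a copy of the preset inferenceConfig for a task type."""
    try:
        return dict(TASK_PRESETS[task_type])
    except KeyError:
        raise ValueError(f"unknown task type: {task_type!r}") from None


print(inference_config_for("factual"))
print(inference_config_for("creative"))
```

The returned dictionary can be passed as the inferenceConfig argument of client.converse.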

❓ What temperature should you use for factual Q&A tasks?

0.0-0.3 (low, deterministic)
0.5-0.7 (medium)
0.8-1.0 (high, creative)
Temperature doesn't matter for Q&A

❓ What is the purpose of a system prompt?

To increase response speed
To reduce token usage
To set context and guide the model's behavior
To enable streaming responses

❓ What does top_p=0.5 mean?

Use the top 50% of models
Only consider tokens with cumulative probability up to 50%
Reduce output length by 50%
Use 50% of available tokens

❓ Which technique provides examples to guide model behavior?

Few-shot prompting
Chain-of-thought prompting
Role-based prompting
Stop sequence prompting