User Simulation
Overview
User simulation enables realistic multi-turn conversation evaluation by simulating end users interacting with your agents. With the ActorSimulator class configured for user simulation, you can generate dynamic, goal-oriented conversations that test your agent’s ability to handle real user interactions.
The from_case_for_user_simulator() factory method automatically configures the simulator with user-appropriate profiles and behaviors:
```python
from strands_evals import ActorSimulator, Case

case = Case(
    input="I need to book a flight to Paris",
    metadata={"task_description": "Flight booking confirmed"}
)

# Automatically configured for user simulation
user_sim = ActorSimulator.from_case_for_user_simulator(
    case=case,
    max_turns=10
)
```
Key Features
- Realistic Actor Simulation: Generates human-like responses based on actor profiles
- Multi-turn Conversations: Maintains context across multiple conversation turns
- Automatic Profile Generation: Creates actor profiles from test cases
- Goal-Oriented Behavior: Tracks and evaluates goal completion
- Flexible Configuration: Supports custom profiles, prompts, and tools
- Conversation Control: Automatic stopping based on goal completion or turn limits
- Integration with Evaluators: Works seamlessly with trace-based evaluators
When to Use
Use user simulation when you need to:
- Evaluate agents in multi-turn user conversations
- Test how agents handle realistic user behavior
- Assess goal completion from the user’s perspective
- Generate diverse user interaction patterns
- Evaluate agents without predefined conversation scripts
- Test conversational flow and context maintenance with users
Basic Usage
Simple User Simulation
```python
from strands import Agent
from strands_evals import Case, ActorSimulator

# Create test case
case = Case(
    name="flight-booking",
    input="I need to book a flight to Paris next week",
    metadata={"task_description": "Flight booking confirmed"}
)

# Create user simulator
user_sim = ActorSimulator.from_case_for_user_simulator(
    case=case,
    max_turns=5  # Limits conversation length; simulator may stop earlier if goal is achieved
)

# Create target agent to evaluate
agent = Agent(
    system_prompt="You are a helpful travel assistant.",
    callback_handler=None
)

# Run multi-turn conversation
user_message = case.input
conversation_log = []

while user_sim.has_next():
    # Agent responds
    agent_response = agent(user_message)
    agent_message = str(agent_response)
    conversation_log.append({"role": "agent", "message": agent_message})

    # User simulator generates next message
    user_result = user_sim.act(agent_message)
    user_message = str(user_result.structured_output.message)
    conversation_log.append({"role": "user", "message": user_message})

print(f"Conversation completed in {len(conversation_log) // 2} turns")
```
Actor Profiles
Actor profiles define the characteristics, context, and goals of the simulated actor.
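To make the shape of a profile concrete, here is a minimal stand-in sketch. `ProfileSketch` is a hypothetical dataclass, not the library's `ActorProfile`; it only mirrors the three fields named on this page (`traits`, `context`, `actor_goal`). Filling the `{actor_profile}` placeholder with `str.format` is likewise an assumption about how a prompt template might be rendered, not a description of the library's internals.

```python
from dataclasses import dataclass, field


@dataclass
class ProfileSketch:
    """Hypothetical stand-in mirroring the fields of an actor profile."""
    traits: dict = field(default_factory=dict)
    context: str = ""
    actor_goal: str = ""

    def describe(self) -> str:
        # Flatten the profile into plain text suitable for a prompt.
        trait_text = ", ".join(f"{k}={v}" for k, v in self.traits.items())
        return f"Traits: {trait_text}\nContext: {self.context}\nGoal: {self.actor_goal}"


def render_prompt(template: str, profile: ProfileSketch) -> str:
    # Assumption: the template exposes a single {actor_profile} placeholder.
    return template.format(actor_profile=profile.describe())


profile = ProfileSketch(
    traits={"patience_level": "low"},
    context="A customer waiting on a delayed order.",
    actor_goal="Order status resolved and customer satisfied",
)
prompt = render_prompt("You are simulating: {actor_profile}", profile)
print(prompt)
```

Printing the rendered prompt this way is a quick check that traits, context, and goal all reach the simulated actor in readable form.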
Automatic Profile Generation
The simulator can automatically generate realistic profiles from test cases:
```python
from strands_evals import Case, ActorSimulator

case = Case(
    input="My order hasn't arrived yet",
    metadata={"task_description": "Order status resolved and customer satisfied"}
)

# Profile is automatically generated from input and task_description
user_sim = ActorSimulator.from_case_for_user_simulator(case=case)

# Access the generated profile
print(user_sim.actor_profile.traits)
print(user_sim.actor_profile.context)
print(user_sim.actor_profile.actor_goal)
```
Custom Actor Profiles
For more control, create custom profiles:
```python
from strands_evals.simulation import ActorSimulator
from strands_evals.types.simulation import ActorProfile

# Define custom profile
profile = ActorProfile(
    traits={
        "expertise_level": "expert",
        "communication_style": "technical",
        "patience_level": "low",
        "detail_preference": "high"
    },
    context="A software engineer debugging a production memory leak issue.",
    actor_goal="Identify the root cause and get actionable steps to resolve the memory leak."
)

# Create simulator with custom profile
simulator = ActorSimulator(
    actor_profile=profile,
    initial_query="Our service is experiencing high memory usage in production.",
    system_prompt_template="You are simulating: {actor_profile}",
    max_turns=10
)
```
Integration with Evaluators
With Trace-Based Evaluators
```python
from strands import Agent
from strands_evals import Case, Experiment, ActorSimulator
from strands_evals.evaluators import HelpfulnessEvaluator
from strands_evals.mappers import StrandsInMemorySessionMapper
from strands_evals.telemetry import StrandsEvalsTelemetry

# Setup telemetry
telemetry = StrandsEvalsTelemetry().setup_in_memory_exporter()
memory_exporter = telemetry.in_memory_exporter

def task_function(case: Case) -> dict:
    # Create simulator
    user_sim = ActorSimulator.from_case_for_user_simulator(
        case=case,
        max_turns=5
    )

    # Create target agent
    agent = Agent(
        trace_attributes={
            "gen_ai.conversation.id": case.session_id,
            "session.id": case.session_id
        },
        system_prompt="You are a helpful assistant.",
        callback_handler=None
    )

    # Collect spans across all turns
    all_spans = []
    user_message = case.input

    while user_sim.has_next():
        # Agent responds
        agent_response = agent(user_message)
        agent_message = str(agent_response)

        # User simulator responds
        user_result = user_sim.act(agent_message)
        user_message = str(user_result.structured_output.message)

    all_spans = memory_exporter.get_finished_spans()

    # Map spans to session
    mapper = StrandsInMemorySessionMapper()
    session = mapper.map_to_session(all_spans, session_id=case.session_id)

    return {"output": agent_message, "trajectory": session}

# Create test cases
test_cases = [
    Case(
        name="booking-1",
        input="I need to book a flight to Paris",
        metadata={"task_description": "Flight booking confirmed"}
    )
]

# Run evaluation
evaluators = [HelpfulnessEvaluator()]
experiment = Experiment(cases=test_cases, evaluators=evaluators)
reports = experiment.run_evaluations(task_function)
reports[0].run_display()
```
Conversation Control
Automatic Stopping
The simulator automatically stops when:
- Goal Completion: The actor includes the <stop/> token in its message
- Turn Limit: The maximum number of turns is reached
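The two stop conditions above can be illustrated with a small stand-in. `ScriptedSimulator` is a hypothetical class that replays canned replies; it is not the library's implementation and only mimics the `has_next()` / `act()` surface used on this page, so the stopping logic can be exercised without any model calls.

```python
STOP_TOKEN = "<stop/>"


class ScriptedSimulator:
    """Hypothetical stand-in that replays canned user replies."""

    def __init__(self, replies, max_turns):
        self._replies = list(replies)
        self._max_turns = max_turns
        self.turns = 0
        self.goal_reached = False

    def has_next(self) -> bool:
        # Continue only while neither stop condition has fired.
        return not self.goal_reached and self.turns < self._max_turns

    def act(self, agent_message: str) -> str:
        # Cycle through the script; a real simulator would react to agent_message.
        reply = self._replies[self.turns % len(self._replies)]
        self.turns += 1
        if STOP_TOKEN in reply:
            self.goal_reached = True
        return reply


def run(sim) -> str:
    """Drive the loop and report which condition ended it."""
    while sim.has_next():
        sim.act("agent reply")
    return "goal" if sim.goal_reached else "turn_limit"


by_goal = ScriptedSimulator(["Any morning flights?", "Booked, thanks! <stop/>"], max_turns=10)
by_limit = ScriptedSimulator(["Still not what I asked for."], max_turns=3)
print(run(by_goal), by_goal.turns)
print(run(by_limit), by_limit.turns)
```

A stand-in like this is also convenient for unit-testing harness code (logging, span collection) before spending tokens on real simulated conversations.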
```python
user_sim = ActorSimulator.from_case_for_user_simulator(
    case=case,
    max_turns=10  # Stop after 10 turns
)

# Check if conversation should continue
while user_sim.has_next():
    # ... conversation logic ...
    pass
```
Manual Turn Tracking
```python
turn_count = 0
max_turns = 5

while user_sim.has_next() and turn_count < max_turns:
    agent_response = agent(user_message)
    user_result = user_sim.act(str(agent_response))
    user_message = str(user_result.structured_output.message)
    turn_count += 1

print(f"Conversation ended after {turn_count} turns")
```
Actor Response Structure
Each actor response includes reasoning and the actual message. The reasoning field provides insight into the simulator’s decision-making process, helping you understand why it responded in a particular way and whether it’s behaving realistically:
```python
user_result = user_sim.act(agent_message)

# Access structured output
reasoning = user_result.structured_output.reasoning
message = user_result.structured_output.message

print(f"Actor's reasoning: {reasoning}")
print(f"Actor's message: {message}")

# Example output:
# Actor's reasoning: "The agent provided flight options but didn't ask for my preferred time.
# I should specify that I prefer morning flights to move the conversation forward."
# Actor's message: "Thanks! Do you have any morning flights available?"
```
The reasoning is particularly useful for:
- Debugging: Understanding why the simulator isn’t reaching the goal
- Validation: Ensuring the simulator is behaving realistically
- Analysis: Identifying patterns in how users respond to agent behavior
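As a sketch of the debugging use case, the helper below walks a conversation history and pairs each simulated-user message with its reasoning. The entry shape (`role`, `message`, `reasoning`) follows the `conversation_history` dictionaries built in the complete example on this page; `reasoning_trail` is a hypothetical helper, and the sample data is invented for illustration.

```python
def reasoning_trail(history):
    """Return (turn_number, message, reasoning) for each simulated-user entry."""
    trail = []
    for entry in history:
        if entry.get("role") != "user":
            continue  # Skip agent entries; only the simulator records reasoning.
        trail.append((len(trail) + 1, entry["message"], entry.get("reasoning", "")))
    return trail


# Invented sample data in the conversation_history shape.
history = [
    {"role": "agent", "message": "Here are three flights to Paris."},
    {"role": "user", "message": "Any morning departures?",
     "reasoning": "The agent did not ask for my preferred time."},
    {"role": "agent", "message": "Yes, 08:15 and 09:40."},
    {"role": "user", "message": "Book the 08:15, please.",
     "reasoning": "A morning option exists, so I can move toward booking."},
]

for turn, message, reasoning in reasoning_trail(history):
    print(f"Turn {turn}: {message!r} because: {reasoning}")
```

Reading the trail turn by turn makes it easier to spot where the simulated user stalls, repeats itself, or drifts away from the goal.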
Advanced Usage
Custom System Prompts
```python
custom_prompt = """You are simulating a user with the following profile:
{actor_profile}

Guidelines:
- Be concise and direct
- Ask clarifying questions when needed
- Express satisfaction when goals are met
- Include <stop/> when your goal is achieved
"""

user_sim = ActorSimulator.from_case_for_user_simulator(
    case=case,
    system_prompt_template=custom_prompt,
    max_turns=10
)
```
Adding Custom Tools
```python
from strands import tool

@tool
def check_order_status(order_id: str) -> str:
    """Check the status of an order."""
    return f"Order {order_id} is in transit"

user_sim = ActorSimulator.from_case_for_user_simulator(
    case=case,
    tools=[check_order_status],  # Additional tools for the simulator
    max_turns=10
)
```
Different Model for Simulation
```python
user_sim = ActorSimulator.from_case_for_user_simulator(
    case=case,
    model="anthropic.claude-3-5-sonnet-20241022-v2:0",  # Specific model
    max_turns=10
)
```
Complete Example: Customer Service Evaluation
```python
from strands import Agent
from strands_evals import Case, Experiment, ActorSimulator
from strands_evals.evaluators import HelpfulnessEvaluator, GoalSuccessRateEvaluator
from strands_evals.mappers import StrandsInMemorySessionMapper
from strands_evals.telemetry import StrandsEvalsTelemetry

# Setup telemetry
telemetry = StrandsEvalsTelemetry().setup_in_memory_exporter()
memory_exporter = telemetry.in_memory_exporter

def customer_service_task(case: Case) -> dict:
    """Simulate customer service interaction."""

    # Create user simulator
    user_sim = ActorSimulator.from_case_for_user_simulator(
        case=case,
        max_turns=8
    )

    # Create customer service agent
    agent = Agent(
        trace_attributes={
            "gen_ai.conversation.id": case.session_id,
            "session.id": case.session_id
        },
        system_prompt="""
        You are a helpful customer service agent.
        - Be empathetic and professional
        - Gather necessary information
        - Provide clear solutions
        - Confirm customer satisfaction
        """,
        callback_handler=None
    )

    # Run conversation
    all_spans = []
    user_message = case.input
    conversation_history = []

    while user_sim.has_next():
        memory_exporter.clear()

        # Agent responds
        agent_response = agent(user_message)
        agent_message = str(agent_response)
        conversation_history.append({
            "role": "agent",
            "message": agent_message
        })

        # Collect spans
        turn_spans = list(memory_exporter.get_finished_spans())
        all_spans.extend(turn_spans)

        # User responds
        user_result = user_sim.act(agent_message)
        user_message = str(user_result.structured_output.message)
        conversation_history.append({
            "role": "user",
            "message": user_message,
            "reasoning": user_result.structured_output.reasoning
        })

    # Map to session
    mapper = StrandsInMemorySessionMapper()
    session = mapper.map_to_session(all_spans, session_id=case.session_id)

    return {
        "output": agent_message,
        "trajectory": session,
        "conversation_history": conversation_history
    }

# Create diverse test cases
test_cases = [
    Case(
        name="order-issue",
        input="My order #12345 hasn't arrived and it's been 2 weeks",
        metadata={
            "category": "order_tracking",
            "task_description": "Order status checked, issue resolved, customer satisfied"
        }
    ),
    Case(
        name="product-return",
        input="I want to return a product that doesn't fit",
        metadata={
            "category": "returns",
            "task_description": "Return initiated, return label provided, customer satisfied"
        }
    ),
    Case(
        name="billing-question",
        input="I was charged twice for my last order",
        metadata={
            "category": "billing",
            "task_description": "Billing issue identified, refund processed, customer satisfied"
        }
    )
]

# Run evaluation with multiple evaluators
evaluators = [
    HelpfulnessEvaluator(),
    GoalSuccessRateEvaluator()
]

experiment = Experiment(cases=test_cases, evaluators=evaluators)
reports = experiment.run_evaluations(customer_service_task)

# Display results
for report in reports:
    print(f"\n{'='*60}")
    print(f"Evaluator: {report.evaluator_name}")
    print(f"{'='*60}")
    report.run_display()
```
Best Practices
1. Clear Task Descriptions
```python
# Good: Specific, measurable goal
case = Case(
    input="I need to book a flight",
    metadata={
        "task_description": "Flight booked with confirmation number, dates confirmed, payment processed"
    }
)

# Less effective: Vague goal
case = Case(
    input="I need to book a flight",
    metadata={"task_description": "Help with booking"}
)
```
2. Appropriate Turn Limits
```python
# Simple queries: 3-5 turns
user_sim = ActorSimulator.from_case_for_user_simulator(
    case=simple_case,
    max_turns=5
)

# Complex tasks: 8-15 turns
user_sim = ActorSimulator.from_case_for_user_simulator(
    case=complex_case,
    max_turns=12
)
```
3. Clear Span Collection
```python
# Always clear before agent calls to avoid capturing simulator traces
all_spans = []
while user_sim.has_next():
    memory_exporter.clear()  # Clear simulator traces
    agent_response = agent(user_message)
    turn_spans = list(memory_exporter.get_finished_spans())  # Only agent spans
    all_spans.extend(turn_spans)
    user_result = user_sim.act(str(agent_response))
    user_message = str(user_result.structured_output.message)
```
4. Conversation Logging
```python
import json

# Log conversations for analysis
conversation_log = []

while user_sim.has_next():
    agent_response = agent(user_message)
    agent_message = str(agent_response)

    user_result = user_sim.act(agent_message)
    user_message = str(user_result.structured_output.message)

    conversation_log.append({
        # One entry is appended per turn, so the turn number is the entry count plus one
        "turn": len(conversation_log) + 1,
        "agent": agent_message,
        "user": user_message,
        "user_reasoning": user_result.structured_output.reasoning
    })

# Save for review
with open("conversation_log.json", "w") as f:
    json.dump(conversation_log, f, indent=2)
```
Common Patterns
Pattern 1: Goal Completion Testing
```python
from strands import Agent
from strands_evals import Case, ActorSimulator

def test_goal_completion(case: Case) -> bool:
    user_sim = ActorSimulator.from_case_for_user_simulator(case=case)
    agent = Agent(system_prompt="Your agent prompt")

    user_message = case.input
    goal_completed = False

    while user_sim.has_next():
        agent_response = agent(user_message)
        user_result = user_sim.act(str(agent_response))
        user_message = str(user_result.structured_output.message)

        # Check for stop token
        if "<stop/>" in user_message:
            goal_completed = True
            break

    return goal_completed
```
Pattern 2: Multi-Evaluator Assessment
```python
def comprehensive_evaluation(case: Case) -> dict:
    # ... run conversation with simulator ...

    return {
        "output": final_message,
        "trajectory": session,
        "turns_taken": turn_count,
        "goal_completed": "<stop/>" in last_user_message
    }

evaluators = [
    HelpfulnessEvaluator(),
    GoalSuccessRateEvaluator(),
    FaithfulnessEvaluator()
]

experiment = Experiment(cases=cases, evaluators=evaluators)
reports = experiment.run_evaluations(comprehensive_evaluation)
```
Pattern 3: Conversation Analysis
```python
def analyze_conversation(case: Case) -> dict:
    user_sim = ActorSimulator.from_case_for_user_simulator(case=case)
    agent = Agent(system_prompt="Your prompt")

    metrics = {
        "turns": 0,
        "agent_messages": [],
        "user_messages": [],
        "user_reasoning": []
    }

    user_message = case.input
    while user_sim.has_next():
        agent_response = agent(user_message)
        agent_message = str(agent_response)
        metrics["agent_messages"].append(agent_message)

        user_result = user_sim.act(agent_message)
        user_message = str(user_result.structured_output.message)
        metrics["user_messages"].append(user_message)
        metrics["user_reasoning"].append(user_result.structured_output.reasoning)
        metrics["turns"] += 1

    return metrics
```
Troubleshooting
Issue: Simulator Stops Too Early
Solution: Increase max_turns or check that the task_description is clear
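Before raising the limit, it may help to confirm why a conversation ended. This is a minimal sketch, assuming the simulated user's final message carries the `<stop/>` token when the goal was reached (the same check the patterns on this page use); `end_reason` is a hypothetical helper, not part of strands_evals.

```python
def end_reason(user_messages, max_turns):
    """Classify why a simulated conversation ended."""
    if user_messages and "<stop/>" in user_messages[-1]:
        return "goal_reached"
    if len(user_messages) >= max_turns:
        return "turn_limit"
    # Neither condition matched: the loop was probably exited by caller code.
    return "stopped_early"


print(end_reason(["Thanks, that's all. <stop/>"], max_turns=5))
print(end_reason(["Hi", "Still waiting", "Any update?"], max_turns=3))
print(end_reason(["Hi"], max_turns=5))
```

Frequent `turn_limit` results suggest raising `max_turns`, while `goal_reached` after only one or two turns points to a task_description that is too easy to satisfy.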
```python
user_sim = ActorSimulator.from_case_for_user_simulator(
    case=case,
    max_turns=15  # Increase limit
)
```
Issue: Simulator Doesn’t Stop
Solution: Ensure the task_description is achievable and clear
```python
# Make goal specific and achievable
case = Case(
    input="I need help",
    metadata={
        "task_description": "Specific, measurable goal that can be completed"
    }
)
```
Issue: Unrealistic Responses
Solution: Use a custom profile or adjust the system prompt
```python
custom_prompt = """You are simulating a realistic user with: {actor_profile}

Be natural and human-like:
- Don't be overly formal
- Ask follow-up questions naturally
- Express emotions appropriately
- Include <stop/> only when truly satisfied
"""

user_sim = ActorSimulator.from_case_for_user_simulator(
    case=case,
    system_prompt_template=custom_prompt
)
```
Issue: Capturing Simulator Traces
Solution: Always clear the exporter before agent calls
```python
while user_sim.has_next():
    memory_exporter.clear()  # Critical: clear before agent call
    agent_response = agent(user_message)
    spans = list(memory_exporter.get_finished_spans())
    # ... rest of logic ...
```
Related Documentation
- Simulators Overview: Learn about the ActorSimulator and simulator framework
- Quickstart Guide: Get started with Strands Evals
- Helpfulness Evaluator: Evaluate conversation helpfulness
- Goal Success Rate Evaluator: Assess goal completion