User Simulation

Overview

User simulation enables realistic multi-turn conversation evaluation by simulating end-users interacting with your agents. Using the ActorSimulator class configured for user simulation, you can generate dynamic, goal-oriented conversations that test your agent’s ability to handle real user interactions.

The from_case_for_user_simulator() factory method automatically configures the simulator with user-appropriate profiles and behaviors:

from strands_evals import ActorSimulator, Case

case = Case(
    input="I need to book a flight to Paris",
    metadata={"task_description": "Flight booking confirmed"}
)

# Automatically configured for user simulation
user_sim = ActorSimulator.from_case_for_user_simulator(
    case=case,
    max_turns=10
)

Key Features

Realistic Actor Simulation: Generates human-like responses based on actor profiles
Multi-turn Conversations: Maintains context across multiple conversation turns
Automatic Profile Generation: Creates actor profiles from test cases
Goal-Oriented Behavior: Tracks and evaluates goal completion
Flexible Configuration: Supports custom profiles, prompts, and tools
Conversation Control: Automatic stopping based on goal completion or turn limits
Integration with Evaluators: Works seamlessly with trace-based evaluators

When to Use

Use user simulation when you need to:

Evaluate agents in multi-turn user conversations
Test how agents handle realistic user behavior
Assess goal completion from the user’s perspective
Generate diverse user interaction patterns
Evaluate agents without predefined conversation scripts
Test conversational flow and context maintenance with users

Basic Usage

Simple User Simulation

from strands import Agent
from strands_evals import Case, ActorSimulator

# Create test case
case = Case(
    name="flight-booking",
    input="I need to book a flight to Paris next week",
    metadata={"task_description": "Flight booking confirmed"}
)

# Create user simulator
user_sim = ActorSimulator.from_case_for_user_simulator(
    case=case,
    max_turns=5  # Limits conversation length; simulator may stop earlier if goal is achieved
)

# Create target agent to evaluate
agent = Agent(
    system_prompt="You are a helpful travel assistant.",
    callback_handler=None
)

# Run multi-turn conversation
user_message = case.input
conversation_log = []

while user_sim.has_next():
    # Agent responds
    agent_response = agent(user_message)
    agent_message = str(agent_response)
    conversation_log.append({"role": "agent", "message": agent_message})

    # User simulator generates next message
    user_result = user_sim.act(agent_message)
    user_message = str(user_result.structured_output.message)
    conversation_log.append({"role": "user", "message": user_message})

print(f"Conversation completed in {len(conversation_log) // 2} turns")
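
The loop above can be packaged as a reusable helper. The following is a minimal sketch, not part of the library: `run_conversation`, `ScriptedUser`, and `echo_agent` are hypothetical names, and the stand-in classes only mimic the `has_next()` / `act()` / `structured_output.message` shape used in this guide so the helper can be exercised without a model.

```python
from types import SimpleNamespace


def run_conversation(agent, user_sim, first_message):
    """Drive the agent/simulator loop and return the transcript."""
    log = []
    user_message = first_message
    while user_sim.has_next():
        # Agent responds to the latest user message
        agent_message = str(agent(user_message))
        log.append({"role": "agent", "message": agent_message})

        # Simulator produces the next user message
        user_result = user_sim.act(agent_message)
        user_message = str(user_result.structured_output.message)
        log.append({"role": "user", "message": user_message})
    return log


class ScriptedUser:
    """Hypothetical stand-in for a simulator: replays canned replies."""

    def __init__(self, replies):
        self._replies = list(replies)

    def has_next(self):
        return bool(self._replies)

    def act(self, agent_message):
        reply = self._replies.pop(0)
        return SimpleNamespace(structured_output=SimpleNamespace(message=reply))


def echo_agent(message):
    """Hypothetical stand-in for an agent: echoes its input."""
    return f"You said: {message}"


log = run_conversation(
    echo_agent,
    ScriptedUser(["Morning flights?", "Great, thanks"]),
    "Book a flight",
)
print(f"Conversation completed in {len(log) // 2} turns")
```

In real use, the same helper should accept the `agent` and `user_sim` objects created above, since it relies only on the calls already shown in this section.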

Actor Profiles

Actor profiles define the characteristics, context, and goals of the simulated actor.

Automatic Profile Generation

The simulator can automatically generate realistic profiles from test cases:

from strands_evals import Case, ActorSimulator

case = Case(
    input="My order hasn't arrived yet",
    metadata={"task_description": "Order status resolved and customer satisfied"}
)

# Profile is automatically generated from input and task_description
user_sim = ActorSimulator.from_case_for_user_simulator(case=case)

# Access the generated profile
print(user_sim.actor_profile.traits)
print(user_sim.actor_profile.context)
print(user_sim.actor_profile.actor_goal)

Custom Actor Profiles

For more control, create custom profiles:

from strands_evals.simulation import ActorSimulator
from strands_evals.types.simulation import ActorProfile

# Define custom profile
profile = ActorProfile(
    traits={
        "expertise_level": "expert",
        "communication_style": "technical",
        "patience_level": "low",
        "detail_preference": "high"
    },
    context="A software engineer debugging a production memory leak issue.",
    actor_goal="Identify the root cause and get actionable steps to resolve the memory leak."
)

# Create simulator with custom profile
simulator = ActorSimulator(
    actor_profile=profile,
    initial_query="Our service is experiencing high memory usage in production.",
    system_prompt_template="You are simulating: {actor_profile}",
    max_turns=10
)

Integration with Evaluators

With Trace-Based Evaluators

from strands import Agent
from strands_evals import Case, Experiment, ActorSimulator
from strands_evals.evaluators import HelpfulnessEvaluator
from strands_evals.mappers import StrandsInMemorySessionMapper
from strands_evals.telemetry import StrandsEvalsTelemetry

# Setup telemetry
telemetry = StrandsEvalsTelemetry().setup_in_memory_exporter()
memory_exporter = telemetry.in_memory_exporter

def task_function(case: Case) -> dict:
    # Create simulator
    user_sim = ActorSimulator.from_case_for_user_simulator(
        case=case,
        max_turns=5
    )

    # Create target agent
    agent = Agent(
        trace_attributes={
            "gen_ai.conversation.id": case.session_id,
            "session.id": case.session_id
        },
        system_prompt="You are a helpful assistant.",
        callback_handler=None
    )

    # Collect spans across all turns
    all_spans = []
    user_message = case.input

    while user_sim.has_next():
        # Clear first so simulator spans from the previous turn are not captured
        memory_exporter.clear()

        # Agent responds
        agent_response = agent(user_message)
        agent_message = str(agent_response)

        # Collect only this turn's agent spans
        all_spans.extend(memory_exporter.get_finished_spans())

        # User simulator responds
        user_result = user_sim.act(agent_message)
        user_message = str(user_result.structured_output.message)

    # Map spans to session
    mapper = StrandsInMemorySessionMapper()
    session = mapper.map_to_session(all_spans, session_id=case.session_id)

    return {"output": agent_message, "trajectory": session}

# Create test cases
test_cases = [
    Case(
        name="booking-1",
        input="I need to book a flight to Paris",
        metadata={"task_description": "Flight booking confirmed"}
    )
]

# Run evaluation
evaluators = [HelpfulnessEvaluator()]
experiment = Experiment(cases=test_cases, evaluators=evaluators)
reports = experiment.run_evaluations(task_function)
reports[0].run_display()

Conversation Control

Automatic Stopping

The simulator automatically stops when:

Goal Completion: Actor includes <stop/> token in message
Turn Limit: Maximum number of turns is reached

user_sim = ActorSimulator.from_case_for_user_simulator(
    case=case,
    max_turns=10  # Stop after 10 turns
)

# Check if conversation should continue
while user_sim.has_next():
    # ... conversation logic ...
    pass
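
To make the two stop conditions concrete, here is a toy model of the bookkeeping behind `has_next()`. It is an illustrative sketch only: `ToyTurnTracker` and `stop_reason()` are hypothetical and do not reflect how `ActorSimulator` is actually implemented.

```python
STOP_TOKEN = "<stop/>"


class ToyTurnTracker:
    """Hypothetical model of the two stop conditions described above."""

    def __init__(self, max_turns):
        self.max_turns = max_turns
        self.turns = 0
        self.goal_reached = False

    def record(self, actor_message):
        """Count one actor turn and note whether it carried the stop token."""
        self.turns += 1
        if STOP_TOKEN in actor_message:
            self.goal_reached = True

    def has_next(self):
        """Continue only while the goal is open and turns remain."""
        return not self.goal_reached and self.turns < self.max_turns

    def stop_reason(self):
        """Report which condition ended the conversation, if any."""
        if self.goal_reached:
            return "goal_completed"
        if self.turns >= self.max_turns:
            return "turn_limit"
        return None


tracker = ToyTurnTracker(max_turns=3)
tracker.record("Do you have morning flights?")
tracker.record("Perfect, thanks! <stop/>")
print(tracker.has_next(), tracker.stop_reason())
```

Distinguishing the two reasons is useful when reading results: a conversation that ends on the turn limit usually means the goal was not reached.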

Manual Turn Tracking

turn_count = 0
max_turns = 5

while user_sim.has_next() and turn_count < max_turns:
    agent_response = agent(user_message)
    user_result = user_sim.act(str(agent_response))
    user_message = str(user_result.structured_output.message)
    turn_count += 1

print(f"Conversation ended after {turn_count} turns")

Actor Response Structure

Each actor response includes reasoning and the actual message. The reasoning field provides insight into the simulator’s decision-making process, helping you understand why it responded in a particular way and whether it’s behaving realistically:

user_result = user_sim.act(agent_message)

# Access structured output
reasoning = user_result.structured_output.reasoning
message = user_result.structured_output.message

print(f"Actor's reasoning: {reasoning}")
print(f"Actor's message: {message}")

# Example output:
# Actor's reasoning: "The agent provided flight options but didn't ask for my preferred time.
#                     I should specify that I prefer morning flights to move the conversation forward."
# Actor's message: "Thanks! Do you have any morning flights available?"

The reasoning is particularly useful for:

Debugging: Understanding why the simulator isn’t reaching the goal
Validation: Ensuring the simulator is behaving realistically
Analysis: Identifying patterns in how users respond to agent behavior
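
As one way to use the reasoning for debugging, the sketch below flags turns where the simulated user repeats an earlier message, which often indicates the conversation is stuck. It assumes you have collected each turn's `message` and `reasoning` into a list of dicts yourself; `find_repeated_turns` is a hypothetical helper, not a library function.

```python
def find_repeated_turns(turns):
    """Return (turn_number, reasoning) for user messages already seen earlier."""
    seen = set()
    repeats = []
    for number, turn in enumerate(turns, start=1):
        # Normalize case and whitespace so trivial variations still match
        key = " ".join(turn["message"].lower().split())
        if key in seen:
            repeats.append((number, turn["reasoning"]))
        seen.add(key)
    return repeats


# Hypothetical sample data for illustration
turns = [
    {"message": "Where is my order?", "reasoning": "I want a status update."},
    {"message": "It is order 12345.", "reasoning": "The agent asked for the order ID."},
    {"message": "where is  my order?", "reasoning": "The agent still has not given a status."},
]

for number, reasoning in find_repeated_turns(turns):
    print(f"Turn {number} repeats an earlier message. Reasoning: {reasoning}")
```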

Advanced Usage

Custom System Prompts

custom_prompt = """
You are simulating a user with the following profile:
{actor_profile}

Guidelines:
- Be concise and direct
- Ask clarifying questions when needed
- Express satisfaction when goals are met
- Include <stop/> when your goal is achieved
"""

user_sim = ActorSimulator.from_case_for_user_simulator(
    case=case,
    system_prompt_template=custom_prompt,
    max_turns=10
)
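
Before passing a custom template, it can help to check that it contains the `{actor_profile}` placeholder and no other stray fields. The sketch below uses the standard library's `string.Formatter` and assumes the template is filled with `str.format`-style substitution, which this page does not confirm; `template_fields` is a hypothetical helper.

```python
from string import Formatter


def template_fields(template):
    """Return the set of replacement field names found in a format-style template."""
    return {
        field_name
        for _, field_name, _, _ in Formatter().parse(template)
        if field_name is not None
    }


candidate_prompt = """
You are simulating a user with the following profile:
{actor_profile}

Guidelines:
- Be concise and direct
- Include <stop/> when your goal is achieved
"""

fields = template_fields(candidate_prompt)
missing = {"actor_profile"} - fields
unexpected = fields - {"actor_profile"}
print(f"missing={sorted(missing)} unexpected={sorted(unexpected)}")
```

Under the same assumption, literal braces in a template (for example, a JSON snippet) would need to be doubled as `{{` and `}}` so they are not parsed as fields.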

Adding Custom Tools

from strands import tool

@tool
def check_order_status(order_id: str) -> str:
    """Check the status of an order."""
    return f"Order {order_id} is in transit"

user_sim = ActorSimulator.from_case_for_user_simulator(
    case=case,
    tools=[check_order_status],  # Additional tools for the simulator
    max_turns=10
)

Different Model for Simulation

user_sim = ActorSimulator.from_case_for_user_simulator(
    case=case,
    model="anthropic.claude-3-5-sonnet-20241022-v2:0",  # Specific model
    max_turns=10
)

Complete Example: Customer Service Evaluation

from strands import Agent
from strands_evals import Case, Experiment, ActorSimulator
from strands_evals.evaluators import HelpfulnessEvaluator, GoalSuccessRateEvaluator
from strands_evals.mappers import StrandsInMemorySessionMapper
from strands_evals.telemetry import StrandsEvalsTelemetry

# Setup telemetry
telemetry = StrandsEvalsTelemetry().setup_in_memory_exporter()
memory_exporter = telemetry.in_memory_exporter

def customer_service_task(case: Case) -> dict:
    """Simulate customer service interaction."""

    # Create user simulator
    user_sim = ActorSimulator.from_case_for_user_simulator(
        case=case,
        max_turns=8
    )

    # Create customer service agent
    agent = Agent(
        trace_attributes={
            "gen_ai.conversation.id": case.session_id,
            "session.id": case.session_id
        },
        system_prompt="""
        You are a helpful customer service agent.
        - Be empathetic and professional
        - Gather necessary information
        - Provide clear solutions
        - Confirm customer satisfaction
        """,
        callback_handler=None
    )

    # Run conversation
    all_spans = []
    user_message = case.input
    conversation_history = []

    while user_sim.has_next():
        memory_exporter.clear()

        # Agent responds
        agent_response = agent(user_message)
        agent_message = str(agent_response)
        conversation_history.append({
            "role": "agent",
            "message": agent_message
        })

        # Collect spans
        turn_spans = list(memory_exporter.get_finished_spans())
        all_spans.extend(turn_spans)

        # User responds
        user_result = user_sim.act(agent_message)
        user_message = str(user_result.structured_output.message)
        conversation_history.append({
            "role": "user",
            "message": user_message,
            "reasoning": user_result.structured_output.reasoning
        })

    # Map to session
    mapper = StrandsInMemorySessionMapper()
    session = mapper.map_to_session(all_spans, session_id=case.session_id)

    return {
        "output": agent_message,
        "trajectory": session,
        "conversation_history": conversation_history
    }

# Create diverse test cases
test_cases = [
    Case(
        name="order-issue",
        input="My order #12345 hasn't arrived and it's been 2 weeks",
        metadata={
            "category": "order_tracking",
            "task_description": "Order status checked, issue resolved, customer satisfied"
        }
    ),
    Case(
        name="product-return",
        input="I want to return a product that doesn't fit",
        metadata={
            "category": "returns",
            "task_description": "Return initiated, return label provided, customer satisfied"
        }
    ),
    Case(
        name="billing-question",
        input="I was charged twice for my last order",
        metadata={
            "category": "billing",
            "task_description": "Billing issue identified, refund processed, customer satisfied"
        }
    )
]

# Run evaluation with multiple evaluators
evaluators = [
    HelpfulnessEvaluator(),
    GoalSuccessRateEvaluator()
]

experiment = Experiment(cases=test_cases, evaluators=evaluators)
reports = experiment.run_evaluations(customer_service_task)

# Display results
for report in reports:
    print(f"\n{'='*60}")
    print(f"Evaluator: {report.evaluator_name}")
    print(f"{'='*60}")
    report.run_display()

Best Practices

1. Clear Task Descriptions

# Good: Specific, measurable goal
case = Case(
    input="I need to book a flight",
    metadata={
        "task_description": "Flight booked with confirmation number, dates confirmed, payment processed"
    }
)

# Less effective: Vague goal
case = Case(
    input="I need to book a flight",
    metadata={"task_description": "Help with booking"}
)

2. Appropriate Turn Limits

# Simple queries: 3-5 turns
user_sim = ActorSimulator.from_case_for_user_simulator(
    case=simple_case,
    max_turns=5
)

# Complex tasks: 8-15 turns
user_sim = ActorSimulator.from_case_for_user_simulator(
    case=complex_case,
    max_turns=12
)

3. Clear Span Collection

# Always clear before agent calls to avoid capturing simulator traces
while user_sim.has_next():
    memory_exporter.clear()  # Clear simulator traces
    agent_response = agent(user_message)
    turn_spans = list(memory_exporter.get_finished_spans())  # Only agent spans
    all_spans.extend(turn_spans)
    user_result = user_sim.act(str(agent_response))
    user_message = str(user_result.structured_output.message)
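
The effect of clearing can be seen with a toy exporter. This is an illustrative sketch: `ToyExporter` is a hypothetical stand-in that only mimics the `clear()` and `get_finished_spans()` calls used above, and the span names are invented strings rather than real spans.

```python
class ToyExporter:
    """Hypothetical in-memory exporter that accumulates spans until cleared."""

    def __init__(self):
        self._spans = []

    def export(self, span):
        self._spans.append(span)

    def clear(self):
        self._spans.clear()

    def get_finished_spans(self):
        return tuple(self._spans)


def collect(clear_before_agent):
    """Run two fake turns and return the spans gathered for evaluation."""
    exporter = ToyExporter()
    all_spans = []
    for turn in range(2):
        if clear_before_agent:
            exporter.clear()
        exporter.export(f"agent-turn-{turn}")         # agent call
        all_spans.extend(exporter.get_finished_spans())
        exporter.export(f"simulator-turn-{turn}")     # simulator call
    return all_spans


print(collect(clear_before_agent=True))
print(collect(clear_before_agent=False))
```

Without clearing, the second turn picks up the first turn's agent span again along with the simulator span, so the collected list contains duplicates and simulator activity.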

4. Conversation Logging

# Log conversations for analysis
conversation_log = []

while user_sim.has_next():
    agent_response = agent(user_message)
    agent_message = str(agent_response)

    user_result = user_sim.act(agent_message)
    user_message = str(user_result.structured_output.message)

    conversation_log.append({
        "turn": len(conversation_log) + 1,  # one log entry per turn
        "agent": agent_message,
        "user": user_message,
        "user_reasoning": user_result.structured_output.reasoning
    })

# Save for review
import json
with open("conversation_log.json", "w") as f:
    json.dump(conversation_log, f, indent=2)
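
A saved log can be read back and rendered as a transcript for review. The sketch below is self-contained and uses hypothetical sample data in the same shape as `conversation_log`; `format_transcript` is an illustrative helper, and a temporary directory is used so nothing is left on disk.

```python
import json
import os
import tempfile


def format_transcript(entries):
    """Render logged turns as alternating agent/user lines."""
    lines = []
    for entry in entries:
        lines.append(f"[turn {entry['turn']}] agent: {entry['agent']}")
        lines.append(f"[turn {entry['turn']}] user: {entry['user']}")
    return "\n".join(lines)


# Hypothetical sample data for illustration
sample_log = [
    {"turn": 1, "agent": "Which dates?", "user": "Next Monday",
     "user_reasoning": "The agent needs travel dates."},
    {"turn": 2, "agent": "Booked.", "user": "Thanks!",
     "user_reasoning": "The booking is confirmed."},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "conversation_log.json")
    with open(path, "w") as f:
        json.dump(sample_log, f, indent=2)
    with open(path) as f:
        loaded = json.load(f)

transcript = format_transcript(loaded)
print(transcript)
```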

Common Patterns

Pattern 1: Goal Completion Testing

def test_goal_completion(case: Case) -> bool:
    user_sim = ActorSimulator.from_case_for_user_simulator(case=case)
    agent = Agent(system_prompt="Your agent prompt")

    user_message = case.input
    goal_completed = False

    while user_sim.has_next():
        agent_response = agent(user_message)
        user_result = user_sim.act(str(agent_response))
        user_message = str(user_result.structured_output.message)

        # Check for stop token
        if "<stop/>" in user_message:
            goal_completed = True
            break

    return goal_completed
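
Because the simulator is model-driven, a single run may not be representative. One option is to repeat the check and report a completion rate. This is a minimal sketch: `completion_rate` is a hypothetical helper, and the scripted outcomes below stand in for real calls such as `lambda: test_goal_completion(case)`.

```python
def completion_rate(run_once, runs=5):
    """Call run_once() `runs` times and return the fraction of truthy results."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    results = [bool(run_once()) for _ in range(runs)]
    return sum(results) / runs


# Scripted outcomes standing in for repeated simulated conversations
outcomes = iter([True, False, True, True])
rate = completion_rate(lambda: next(outcomes), runs=4)
print(f"Goal completed in {rate:.0%} of runs")
```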

Pattern 2: Multi-Evaluator Assessment

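from strands_evals import Case, Experiment
from strands_evals.evaluators import (
    FaithfulnessEvaluator,
    GoalSuccessRateEvaluator,
    HelpfulnessEvaluator,
)

def comprehensive_evaluation(case: Case) -> dict:
    # ... run conversation with simulator ...
    # The elided loop is expected to define final_message, session,
    # turn_count, and last_user_message.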

    return {
        "output": final_message,
        "trajectory": session,
        "turns_taken": turn_count,
        "goal_completed": "<stop/>" in last_user_message
    }

evaluators = [
    HelpfulnessEvaluator(),
    GoalSuccessRateEvaluator(),
    FaithfulnessEvaluator()
]

experiment = Experiment(cases=cases, evaluators=evaluators)
reports = experiment.run_evaluations(comprehensive_evaluation)

Pattern 3: Conversation Analysis

def analyze_conversation(case: Case) -> dict:
    user_sim = ActorSimulator.from_case_for_user_simulator(case=case)
    agent = Agent(system_prompt="Your prompt")

    metrics = {
        "turns": 0,
        "agent_messages": [],
        "user_messages": [],
        "user_reasoning": []
    }

    user_message = case.input
    while user_sim.has_next():
        agent_response = agent(user_message)
        agent_message = str(agent_response)
        metrics["agent_messages"].append(agent_message)

        user_result = user_sim.act(agent_message)
        user_message = str(user_result.structured_output.message)
        metrics["user_messages"].append(user_message)
        metrics["user_reasoning"].append(user_result.structured_output.reasoning)
        metrics["turns"] += 1

    return metrics
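
The collected metrics can then be reduced to a few summary numbers. The sketch below assumes a `metrics` dict in the shape returned above; `summarize_metrics` is a hypothetical helper and the sample values are invented for illustration.

```python
def summarize_metrics(metrics):
    """Reduce per-message metrics to a small summary dict."""

    def average_length(messages):
        # Average character count, or 0.0 when there are no messages
        return sum(len(m) for m in messages) / len(messages) if messages else 0.0

    return {
        "turns": metrics["turns"],
        "avg_agent_chars": average_length(metrics["agent_messages"]),
        "avg_user_chars": average_length(metrics["user_messages"]),
        "longest_agent_message": max(metrics["agent_messages"], key=len, default=""),
    }


# Hypothetical sample data for illustration
sample_metrics = {
    "turns": 2,
    "agent_messages": ["Hello there", "Done"],
    "user_messages": ["Hi", "Thanks"],
    "user_reasoning": ["Greeting the agent.", "Goal met."],
}

summary = summarize_metrics(sample_metrics)
print(summary)
```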

Troubleshooting

Issue: Simulator Stops Too Early

Solution: Increase max_turns, or make task_description more specific so the simulator does not consider the goal met prematurely

user_sim = ActorSimulator.from_case_for_user_simulator(
    case=case,
    max_turns=15  # Increase limit
)

Issue: Simulator Doesn’t Stop

Solution: Ensure task_description is achievable and clear

# Make goal specific and achievable
case = Case(
    input="I need help",
    metadata={
        "task_description": "Specific, measurable goal that can be completed"
    }
)

Issue: Unrealistic Responses

Solution: Use custom profile or adjust system prompt

custom_prompt = """
You are simulating a realistic user with: {actor_profile}

Be natural and human-like:
- Don't be overly formal
- Ask follow-up questions naturally
- Express emotions appropriately
- Include <stop/> only when truly satisfied
"""

user_sim = ActorSimulator.from_case_for_user_simulator(
    case=case,
    system_prompt_template=custom_prompt
)

Issue: Capturing Simulator Traces

Solution: Always clear exporter before agent calls

while user_sim.has_next():
    memory_exporter.clear()  # Critical: clear before agent call
    agent_response = agent(user_message)
    spans = list(memory_exporter.get_finished_spans())
    # ... rest of logic ...

Related Documentation

Simulators Overview: Learn about the ActorSimulator and simulator framework
Quickstart Guide: Get started with Strands Evals
Helpfulness Evaluator: Evaluate conversation helpfulness
Goal Success Rate Evaluator: Assess goal completion