
Training Loop Overview

The Training Loop system automatically captures and evaluates AI decisions so you can understand and improve your AI assistant's performance over time.

What is the Training Loop?

Think of the Training Loop as a "flight recorder" for your AI assistant. Just like airplanes record flight data to improve safety and performance, the Training Loop records:

  • What your AI decided (messages sent, actions taken, decisions made)
  • What information it had when making the decision
  • What happened afterward (did the customer reply? was there an escalation?)
  • How well it performed (was the decision helpful? correct?)

This data helps you:

  • Understand AI behavior: See exactly what your AI did and why
  • Improve performance: Identify what works and what doesn't
  • Compare versions: Test new AI models or prompts against past decisions
  • Make data-driven decisions: Use real outcomes to guide improvements

Key Features

Automatic Capture

Every AI decision is automatically recorded:

  • Message generation: When your AI sends a message
  • Tool calls: When your AI uses tools (like booking appointments or searching products)
  • Playbook decisions: When your AI makes decisions based on playbook rules

All sensitive information (phone numbers, emails, etc.) is automatically removed before storage to protect privacy.
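
The exact redaction rules are internal to the system, but conceptually they work like the minimal sketch below: simple pattern matching that replaces contact details with placeholders before anything is stored. The patterns and function name here are illustrative, not the production implementation.

```python
import re

# Illustrative patterns only; the real redaction rules cover more than this.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_PATTERN = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def redact_pii(text: str) -> str:
    """Replace email addresses and phone numbers with placeholders before storage."""
    text = EMAIL_PATTERN.sub("[EMAIL]", text)
    text = PHONE_PATTERN.sub("[PHONE]", text)
    return text

print(redact_pii("Reach me at jane@example.com or +1 555 123 4567."))
# -> Reach me at [EMAIL] or [PHONE].
```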

Outcome Tracking

The system automatically detects what happened after each AI decision:

  • Customer replied: Did the customer respond within 24 hours?
  • No further inbound: Did the conversation end without more messages?
  • Escalation triggered: Was the conversation escalated to a human?

These outcomes are detected automatically—no manual tracking required.
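
You never need to configure the detector yourself; conceptually it is a simple classification over timestamps and escalation flags, along the lines of this sketch (the function and field names are hypothetical):

```python
from datetime import datetime, timedelta

REPLY_WINDOW = timedelta(hours=24)

def detect_outcome(decision_time: datetime,
                   next_inbound_time: datetime | None,
                   escalated: bool) -> str:
    """Classify what happened after an AI decision (illustrative heuristic only)."""
    if escalated:
        return "escalation_triggered"
    if next_inbound_time and next_inbound_time - decision_time <= REPLY_WINDOW:
        return "customer_replied"
    return "no_further_inbound"

print(detect_outcome(datetime(2024, 5, 1, 9, 0),
                     datetime(2024, 5, 1, 15, 30),
                     escalated=False))
# -> customer_replied
```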

Safe Replay

Replay past AI decisions with new models or prompts to see if they would have done better:

  • No side effects: Replays happen in "dry-run" mode—no emails sent, no appointments booked
  • Compare versions: See how different AI models or playbook versions would have handled the same situation
  • Performance metrics: Get detailed comparisons showing agreement, safety changes, and intent shifts
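
The replay interface itself is not covered here; the sketch below only illustrates the dry-run idea, comparing a stored decision against a freshly generated one without executing any tools. All names are hypothetical stand-ins.

```python
from dataclasses import dataclass

@dataclass
class ReplayResult:
    original_message: str
    replayed_message: str
    agreement: bool  # did the replayed decision match the stored one?

def generate_message(context: str, model: str, dry_run: bool) -> str:
    """Stand-in for the model call; a real replay would invoke the new model here."""
    return f"[{model} draft based on: {context}]"

def replay_decision(record: dict, new_model: str) -> ReplayResult:
    """Re-run a stored decision with a different model, with side effects disabled."""
    replayed = generate_message(record["context"], model=new_model, dry_run=True)
    return ReplayResult(
        original_message=record["message"],
        replayed_message=replayed,
        agreement=replayed.strip() == record["message"].strip(),
    )

result = replay_decision(
    {"context": "customer asked for opening hours", "message": "We open at 9am."},
    new_model="gpt-4o-mini",
)
print(result.agreement)  # False in this toy example
```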

Human Feedback

Provide simple feedback on AI decisions:

  • Thumbs up/down: Quick feedback on whether decisions were helpful
  • Optional reasons: Specify why (helpful, unhelpful, incorrect, inappropriate, other)
  • Tracked over time: See feedback trends to identify improvement areas
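
Conceptually, a piece of feedback is just a small record tied to a decision. The sketch below shows one plausible shape for it, using the reason codes listed above; the field names are illustrative, not the actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Reason codes mirror the options listed above; everything else is illustrative.
FEEDBACK_REASONS = {"helpful", "unhelpful", "incorrect", "inappropriate", "other"}

@dataclass
class Feedback:
    record_id: str
    thumbs_up: bool
    reason: str | None = None
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def __post_init__(self) -> None:
        if self.reason is not None and self.reason not in FEEDBACK_REASONS:
            raise ValueError(f"unknown reason: {self.reason}")

feedback = Feedback(record_id="rec_123", thumbs_up=False, reason="incorrect")
print(feedback)
```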

Cohort Analysis

Group and compare AI decisions by version or experiment:

  • Automatic tagging: Records are automatically tagged with version information
  • Compare cohorts: See how different AI versions perform
  • Analytics: Get detailed statistics on outcomes, feedback, and performance
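
As a rough illustration of what cohort comparison means in practice, the sketch below groups toy records by their cohort tag and computes a customer-reply rate per cohort; the record fields are hypothetical.

```python
from collections import defaultdict

# Toy records: each decision carries its cohort tag and a simple outcome flag.
records = [
    {"cohort": "playbook-v1", "customer_replied": True},
    {"cohort": "playbook-v1", "customer_replied": False},
    {"cohort": "playbook-v2", "customer_replied": True},
    {"cohort": "playbook-v2", "customer_replied": True},
]

def reply_rate_by_cohort(records: list[dict]) -> dict[str, float]:
    """Group records by cohort tag and compute the reply rate for each cohort."""
    grouped: dict[str, list[bool]] = defaultdict(list)
    for record in records:
        grouped[record["cohort"]].append(record["customer_replied"])
    return {cohort: sum(replies) / len(replies) for cohort, replies in grouped.items()}

print(reply_rate_by_cohort(records))
# -> {'playbook-v1': 0.5, 'playbook-v2': 1.0}
```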

How It Works

1. Automatic Recording

When your AI makes a decision:

  1. The system captures the decision automatically
  2. Sensitive information is removed
  3. Version metadata is recorded (which model, which playbook version, etc.)
  4. The record is stored for later analysis

You don't need to do anything—this happens automatically in the background.
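
Put together, the capture path looks roughly like this sketch: redact, tag with version metadata, then store. The helper names and record fields are illustrative, not the actual internals.

```python
def redact(decision: dict) -> dict:
    """Stand-in for the PII redaction step described earlier."""
    return {k: v for k, v in decision.items() if k != "customer_contact"}

def capture_decision(decision: dict, versions: dict, store: list) -> None:
    """Illustrative capture pipeline for a single AI decision."""
    record = {
        "decision": redact(decision),   # 2. sensitive information removed
        "versions": versions,           # 3. version metadata recorded
        "outcome_signals": [],          # filled in later by the outcome worker
    }
    store.append(record)                # 4. stored for later analysis

store: list = []
capture_decision(
    {"message": "Your appointment is confirmed.", "customer_contact": "+1 555 0100"},
    {"model": "gpt-4o", "playbook_version": 7},
    store,
)
print(store[0]["decision"])  # -> {'message': 'Your appointment is confirmed.'}
```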

2. Outcome Detection

A background worker periodically:

  1. Finds training records without outcome signals
  2. Checks for customer replies, escalations, etc.
  3. Records outcome signals automatically

This happens automatically—no manual tracking needed.
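
In outline, the worker behaves like the following sketch: poll for records that are missing signals, detect what happened, and write the signals back. All names here are illustrative stand-ins; in production this would run as a scheduled background job.

```python
import time

def find_records_without_signals() -> list[dict]:
    """Stand-in query: return stored records that have no outcome signals yet."""
    return []  # a real worker would query the training-record store here

def detect_signals(record: dict) -> list[str]:
    """Stand-in detection: check for replies, escalations, etc. on one record."""
    return []

def run_outcome_worker(poll_interval_seconds: int = 300) -> None:
    """Periodically backfill outcome signals on records that are missing them."""
    while True:
        for record in find_records_without_signals():
            record["outcome_signals"] = detect_signals(record)
        time.sleep(poll_interval_seconds)
```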

3. Analysis & Improvement

You can:

  • View training records: See what decisions were made
  • Provide feedback: Give thumbs up/down on decisions
  • Replay decisions: Test new models/prompts on past decisions
  • Compare cohorts: See how different versions perform
  • Use analytics: Get insights into AI performance

Privacy & Security

Data Protection

  • PII Redaction: All sensitive data (phone numbers, emails, etc.) is automatically removed before storage
  • Tenant Isolation: Your data is completely isolated from other tenants
  • Secure Storage: All data is stored securely with proper access controls

Safe Replay

  • Dry-Run Mode: Replay operations never cause side effects
  • No Tool Execution: Tool calls are not executed during replay
  • No Training Capture: Replays don't create new training records

What Gets Recorded

AI Decisions

The system records:

  • Message generation: What message the AI sent and why
  • Tool calls: What tools were used and their results
  • Playbook decisions: What playbook rules were triggered

Context Information

For each decision, the system records:

  • Conversation history: Previous messages in the conversation
  • System prompts: The instructions given to the AI
  • Available tools: What tools were available
  • Playbook context: Relevant playbook rules and settings

Version Metadata

Each record includes:

  • AI model: Which model was used (GPT-4o, GPT-4o Mini, etc.)
  • Playbook version: Which version of the playbook was active
  • Prompt version: Which version of prompts was used
  • Code version: Which version of the system code was running
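
One way to picture this metadata is as a small, immutable set of tags attached to every record, as in this illustrative sketch (the field names are assumptions, not the actual schema):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VersionMetadata:
    """Version tags attached to each training record (field names illustrative)."""
    model: str             # e.g. "gpt-4o" or "gpt-4o-mini"
    playbook_version: str  # playbook version active at decision time
    prompt_version: str    # prompt template revision
    code_version: str      # system build or commit identifier

tags = VersionMetadata(
    model="gpt-4o-mini",
    playbook_version="2024-05-01",
    prompt_version="v12",
    code_version="a1b2c3d",
)
print(tags)
```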

Use Cases

Understanding AI Behavior

  • Review decisions: See exactly what your AI did in specific situations
  • Identify patterns: Find common behaviors or issues
  • Debug problems: Understand why certain decisions were made

Improving Performance

  • A/B testing: Compare different AI versions to see which performs better
  • Iterative improvement: Use feedback and outcomes to guide improvements
  • Data-driven optimization: Make decisions based on real performance data

Quality Assurance

  • Review problematic decisions: Replay decisions that led to escalations
  • Test improvements: See if new models/prompts would have done better
  • Monitor quality: Track feedback and outcomes over time

Best Practices

  1. Review regularly: Check training records periodically to understand AI behavior
  2. Provide feedback: Give feedback on decisions to help improve the system
  3. Use cohorts: Tag experiments to compare different versions
  4. Replay strategically: Replay important decisions to test improvements
  5. Monitor outcomes: Watch outcome signals to identify trends

Next Steps
