Training Loop Overview
The Training Loop system automatically captures and evaluates AI decisions to help you understand and improve your AI assistant's performance over time.
What is the Training Loop?
Think of the Training Loop as a "flight recorder" for your AI assistant. Just as airplanes record flight data to improve safety and performance, the Training Loop records:
- What your AI decided (messages sent, actions taken, decisions made)
- What information it had when making the decision
- What happened afterward (did the customer reply? was there an escalation?)
- How well it performed (was the decision helpful? correct?)
This data helps you:
- Understand AI behavior: See exactly what your AI did and why
- Improve performance: Identify what works and what doesn't
- Compare versions: Test new AI models or prompts against past decisions
- Make data-driven decisions: Use real outcomes to guide improvements
Key Features
Automatic Capture
Every AI decision is automatically recorded:
- Message generation: When your AI sends a message
- Tool calls: When your AI uses tools (like booking appointments or searching products)
- Playbook decisions: When your AI makes decisions based on playbook rules
All sensitive information (phone numbers, emails, etc.) is automatically removed before storage to protect privacy.
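To make this concrete, here is a rough TypeScript sketch of what a captured record could look like. The type names and fields below are illustrative assumptions, not the actual schema:

```ts
// Illustrative only: these names are assumptions, not the real schema.
type DecisionPayload =
  | { kind: "message"; text: string }                            // message generation
  | { kind: "tool_call"; tool: string; args: unknown }           // tool usage
  | { kind: "playbook_decision"; ruleId: string; action: string }; // playbook rule

interface TrainingRecord {
  id: string;
  tenantId: string;          // records never cross tenant boundaries
  conversationId: string;
  capturedAt: string;        // ISO timestamp
  payload: DecisionPayload;  // PII is redacted before this is stored
}
```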
Outcome Tracking
The system automatically detects what happened after each AI decision:
- Customer replied: Did the customer respond within 24 hours?
- No further inbound: Did the conversation end without more messages?
- Escalation triggered: Was the conversation escalated to a human?
These outcomes are detected automatically—no manual tracking required.
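The detection rule can be pictured as a small classifier over the events that follow a decision. The sketch below uses assumed types, and it gives escalation precedence over a reply, which is an illustrative choice rather than documented behavior:

```ts
type OutcomeSignal = "customer_replied" | "no_further_inbound" | "escalation_triggered";

interface ConversationEvent {
  type: "inbound_message" | "escalation";
  at: Date;
}

// Classify the outcome of a decision from the events that came after it.
function detectOutcome(decisionAt: Date, laterEvents: ConversationEvent[]): OutcomeSignal {
  const replyWindowMs = 24 * 60 * 60 * 1000; // the 24-hour reply window

  // Escalation wins over a reply here; that precedence is an assumption.
  if (laterEvents.some((e) => e.type === "escalation")) return "escalation_triggered";
  if (laterEvents.some((e) =>
      e.type === "inbound_message" &&
      e.at.getTime() - decisionAt.getTime() <= replyWindowMs)) return "customer_replied";
  return "no_further_inbound";
}
```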
Safe Replay
Replay past AI decisions with new models or prompts to see if they would have done better:
- No side effects: Replays happen in "dry-run" mode—no emails sent, no appointments booked
- Compare versions: See how different AI models or playbook versions would have handled the same situation
- Performance metrics: Get detailed comparisons showing agreement, safety changes, and intent shifts
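A dry-run replay can be thought of as "generate, but never act." The following sketch uses assumed names (`replayDecision`, `candidateModel`), and the real comparison metrics are richer than the simple equality check shown here:

```ts
// A sketch of the dry-run contract; every name is an assumption.
interface ReplayResult {
  original: string;  // what the recorded AI actually did
  replayed: string;  // what the candidate model would have done
  agrees: boolean;   // crude equality; real metrics also track safety and intent
}

async function replayDecision(
  record: { context: string; payload: string },
  candidateModel: (context: string) => Promise<string>,
): Promise<ReplayResult> {
  // Dry run: generate only. No emails, no bookings, no tool execution,
  // and no new training record is captured.
  const replayed = await candidateModel(record.context);
  return { original: record.payload, replayed, agrees: replayed === record.payload };
}
```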
Human Feedback
Provide simple feedback on AI decisions:
- Thumbs up/down: Quick feedback on whether decisions were helpful
- Optional reasons: Specify why (helpful, unhelpful, incorrect, inappropriate, other)
- Tracked over time: See feedback trends to identify improvement areas
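A feedback submission might be as small as the payload below. The reason values mirror the list above; the rest of the shape is assumed:

```ts
// Hypothetical feedback payload; field names are illustrative.
type FeedbackReason = "helpful" | "unhelpful" | "incorrect" | "inappropriate" | "other";

interface DecisionFeedback {
  recordId: string;         // which training record this rates
  rating: "up" | "down";    // thumbs up / thumbs down
  reason?: FeedbackReason;  // optional explanation
  submittedAt: string;      // ISO timestamp, used for trend tracking
}
```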
Cohort Analysis
Group and compare AI decisions by version or experiment:
- Automatic tagging: Records are automatically tagged with version information
- Compare cohorts: See how different AI versions perform
- Analytics: Get detailed statistics on outcomes, feedback, and performance
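As an illustration of cohort comparison, the function below groups records by an assumed `cohort` tag and computes reply and escalation rates. The field names are placeholders:

```ts
// Illustrative aggregation over already-tagged records.
interface TaggedRecord {
  cohort: string; // e.g. a model or playbook version tag
  outcome: "customer_replied" | "no_further_inbound" | "escalation_triggered";
}

function compareCohorts(
  records: TaggedRecord[],
): Map<string, { replyRate: number; escalationRate: number }> {
  // Group records by cohort tag.
  const byCohort = new Map<string, TaggedRecord[]>();
  for (const r of records) {
    const bucket = byCohort.get(r.cohort) ?? [];
    bucket.push(r);
    byCohort.set(r.cohort, bucket);
  }

  // Compute simple outcome rates per cohort.
  const stats = new Map<string, { replyRate: number; escalationRate: number }>();
  for (const [cohort, rs] of byCohort) {
    stats.set(cohort, {
      replyRate: rs.filter((r) => r.outcome === "customer_replied").length / rs.length,
      escalationRate: rs.filter((r) => r.outcome === "escalation_triggered").length / rs.length,
    });
  }
  return stats;
}
```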
How It Works
1. Automatic Recording
When your AI makes a decision:
- The system captures the decision automatically
- Sensitive information is removed
- Version metadata is recorded (which model, which playbook version, etc.)
- The record is stored for later analysis
You don't need to do anything—this happens automatically in the background.
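A compressed sketch of that background flow, with in-memory stand-ins for the real pipeline:

```ts
// All names and values here are illustrative stand-ins.
const storedRecords: object[] = []; // placeholder for secure storage

function recordDecision(tenantId: string, decision: string): void {
  const redacted = decision; // PII redaction happens here (see Privacy & Security)
  storedRecords.push({
    tenantId,
    payload: redacted,
    versions: { model: "gpt-4o", playbookVersion: "v3" }, // version metadata attached
    capturedAt: new Date().toISOString(),                 // stored for later analysis
  });
}
```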
2. Outcome Detection
A background worker periodically:
- Finds training records without outcome signals
- Checks for customer replies, escalations, etc.
- Records outcome signals automatically
The worker runs on its own schedule, so no manual tracking is needed.
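The worker can be pictured as a simple polling loop. Everything here (names, cadence, the in-memory store) is assumed for illustration:

```ts
// Hypothetical background worker with in-memory stand-ins.
interface PendingRecord { id: string; decisionAt: Date; }

const pendingQueue: PendingRecord[] = [];
const outcomes = new Map<string, string>();

async function findRecordsWithoutOutcome(): Promise<PendingRecord[]> {
  return pendingQueue.filter((r) => !outcomes.has(r.id));
}

async function outcomeWorkerTick(): Promise<void> {
  for (const record of await findRecordsWithoutOutcome()) {
    // The real system inspects the conversation for replies or
    // escalations (see the detection sketch above).
    outcomes.set(record.id, "no_further_inbound");
  }
}

// Poll on a fixed interval; the real cadence is an implementation detail.
setInterval(() => void outcomeWorkerTick(), 5 * 60_000);
```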
3. Analysis & Improvement
You can:
- View training records: See what decisions were made
- Provide feedback: Give thumbs up/down on decisions
- Replay decisions: Test new models/prompts on past decisions
- Compare cohorts: See how different versions perform
- Use analytics: Get insights into AI performance
Privacy & Security
Data Protection
- PII Redaction: All sensitive data (phone numbers, emails, etc.) is automatically removed before storage
- Tenant Isolation: Your data is completely isolated from other tenants
- Secure Storage: All data is stored securely with proper access controls
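For intuition, a minimal redaction pass could look like the regex-based sketch below. The production redactor is presumably more thorough; these patterns are illustrative only:

```ts
// Minimal sketch: strip obvious emails and phone numbers before storage.
function redactPii(text: string): string {
  return text
    .replace(/[\w.+-]+@[\w-]+\.[\w.]+/g, "[redacted-email]")
    .replace(/\+?\d[\d\s().-]{7,}\d/g, "[redacted-phone]");
}

// redactPii("Call me at +1 (555) 010-2000 or jane@example.com")
// -> "Call me at [redacted-phone] or [redacted-email]"
```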
Safe Replay
- Dry-Run Mode: Replay operations never cause side effects
- No Tool Execution: Tool calls are not executed during replay
- No Training Capture: Replays don't create new training records
What Gets Recorded
AI Decisions
The system records:
- Message generation: What message the AI sent and why
- Tool calls: What tools were used and their results
- Playbook decisions: What playbook rules were triggered
Context Information
For each decision, the system records:
- Conversation history: Previous messages in the conversation
- System prompts: The instructions given to the AI
- Available tools: What tools were available
- Playbook context: Relevant playbook rules and settings
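Sketched as a type, that context might look like this (all names assumed):

```ts
// Assumed shape for the captured decision context.
interface DecisionContext {
  conversationHistory: { role: "customer" | "assistant"; text: string }[];
  systemPrompt: string;                                   // instructions given to the AI
  availableTools: string[];                               // tool names offered for this turn
  playbookContext: { ruleId: string; summary: string }[]; // relevant rules and settings
}
```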
Version Metadata
Each record includes:
- AI model: Which model was used (GPT-4o, GPT-4o Mini, etc.)
- Playbook version: Which version of the playbook was active
- Prompt version: Which version of prompts was used
- Code version: Which version of the system code was running
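Again as an assumed type, the version metadata could be as simple as:

```ts
// Assumed shape for version metadata; field names are illustrative.
interface VersionMetadata {
  model: string;           // e.g. "gpt-4o" or "gpt-4o-mini"
  playbookVersion: string; // active playbook revision
  promptVersion: string;   // prompt template revision
  codeVersion: string;     // e.g. a git SHA of the running service
}
```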
Use Cases
Understanding AI Behavior
- Review decisions: See exactly what your AI did in specific situations
- Identify patterns: Find common behaviors or issues
- Debug problems: Understand why certain decisions were made
Improving Performance
- A/B testing: Compare different AI versions to see which performs better
- Iterative improvement: Use feedback and outcomes to guide improvements
- Data-driven optimization: Make decisions based on real performance data
Quality Assurance
- Review problematic decisions: Replay decisions that led to escalations
- Test improvements: See if new models/prompts would have done better
- Monitor quality: Track feedback and outcomes over time
Best Practices
- Review regularly: Check training records periodically to understand AI behavior
- Provide feedback: Give feedback on decisions to help improve the system
- Use cohorts: Tag experiments to compare different versions
- Replay strategically: Replay important decisions to test improvements
- Monitor outcomes: Watch outcome signals to identify trends
Next Steps
- Learn about Providing Feedback - How to give feedback on AI decisions
- Learn about Analytics & Cohorts - How to analyze and compare AI performance
- Review Training Context - How AI uses your business context

