
Training Loop Overview

The Training Loop system automatically captures and evaluates AI decisions so you can understand and improve your AI assistant's performance over time.

What is the Training Loop?

Think of the Training Loop as a "flight recorder" for your AI assistant. Just like airplanes record flight data to improve safety and performance, the Training Loop records:

  • What your AI decided (messages sent, actions taken, decisions made)
  • What information it had when making the decision
  • What happened afterward (did the customer reply? was there an escalation?)
  • How well it performed (was the decision helpful? correct?)

This data helps you:

  • Understand AI behavior: See exactly what your AI did and why
  • Improve performance: Identify what works and what doesn't
  • Compare versions: Test new AI models or prompts against past decisions
  • Make data-driven decisions: Use real outcomes to guide improvements

Key Features

Automatic Capture

Every AI decision is automatically recorded:

  • Message generation: When your AI sends a message
  • Tool calls: When your AI uses tools (like booking appointments or searching products)
  • Playbook decisions: When your AI makes decisions based on playbook rules

All sensitive information (phone numbers, emails, etc.) is automatically removed before storage to protect privacy.
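
The exact redaction rules are internal to the system, but conceptually they work like the minimal sketch below: simple pattern matching that replaces contact details with placeholders before anything is stored. The patterns and function name here are illustrative, not the production implementation.

```python
import re

# Illustrative patterns only; the real redaction rules cover more than this.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_PATTERN = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def redact_pii(text: str) -> str:
    """Replace email addresses and phone numbers with placeholders before storage."""
    text = EMAIL_PATTERN.sub("[EMAIL]", text)
    text = PHONE_PATTERN.sub("[PHONE]", text)
    return text

print(redact_pii("Reach me at jane@example.com or +1 555 123 4567."))
# -> Reach me at [EMAIL] or [PHONE].
```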

Outcome Tracking

The system automatically detects what happened after each AI decision:

  • Customer replied: Did the customer respond within 24 hours?
  • No further inbound: Did the conversation end without more messages?
  • Escalation triggered: Was the conversation escalated to a human?

These outcomes are detected automatically—no manual tracking required.
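
You never need to configure the detector yourself; conceptually it is a simple classification over timestamps and escalation flags, along the lines of this sketch (the function and field names are hypothetical):

```python
from datetime import datetime, timedelta

REPLY_WINDOW = timedelta(hours=24)

def detect_outcome(decision_time: datetime,
                   next_inbound_time: datetime | None,
                   escalated: bool) -> str:
    """Classify what happened after an AI decision (illustrative heuristic only)."""
    if escalated:
        return "escalation_triggered"
    if next_inbound_time and next_inbound_time - decision_time <= REPLY_WINDOW:
        return "customer_replied"
    return "no_further_inbound"

print(detect_outcome(datetime(2024, 5, 1, 9, 0),
                     datetime(2024, 5, 1, 15, 30),
                     escalated=False))
# -> customer_replied
```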

Safe Replay

Replay past AI decisions with new models or prompts to see if they would have done better:

  • No side effects: Replays happen in "dry-run" mode—no emails sent, no appointments booked
  • Compare versions: See how different AI models or playbook versions would have handled the same situation
  • Performance metrics: Get detailed comparisons showing agreement, safety changes, and intent shifts
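
The replay interface itself is not covered here; the sketch below only illustrates the dry-run idea, comparing a stored decision against a freshly generated one without executing any tools. All names are hypothetical stand-ins.

```python
from dataclasses import dataclass

@dataclass
class ReplayResult:
    original_message: str
    replayed_message: str
    agreement: bool  # did the replayed decision match the stored one?

def generate_message(context: str, model: str, dry_run: bool) -> str:
    """Stand-in for the model call; a real replay would invoke the new model here."""
    return f"[{model} draft based on: {context}]"

def replay_decision(record: dict, new_model: str) -> ReplayResult:
    """Re-run a stored decision with a different model, with side effects disabled."""
    replayed = generate_message(record["context"], model=new_model, dry_run=True)
    return ReplayResult(
        original_message=record["message"],
        replayed_message=replayed,
        agreement=replayed.strip() == record["message"].strip(),
    )

result = replay_decision(
    {"context": "customer asked for opening hours", "message": "We open at 9am."},
    new_model="gpt-4o-mini",
)
print(result.agreement)  # False in this toy example
```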

Human Feedback

Provide simple feedback on AI decisions:

  • Thumbs up/down: Quick feedback on whether decisions were helpful
  • Optional reasons: Specify why (helpful, unhelpful, incorrect, inappropriate, other)
  • Tracked over time: See feedback trends to identify improvement areas
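
Conceptually, a piece of feedback is just a small record tied to a decision. The sketch below shows one plausible shape for it, using the reason codes listed above; the field names are illustrative, not the actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Reason codes mirror the options listed above; everything else is illustrative.
FEEDBACK_REASONS = {"helpful", "unhelpful", "incorrect", "inappropriate", "other"}

@dataclass
class Feedback:
    record_id: str
    thumbs_up: bool
    reason: str | None = None
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def __post_init__(self) -> None:
        if self.reason is not None and self.reason not in FEEDBACK_REASONS:
            raise ValueError(f"unknown reason: {self.reason}")

feedback = Feedback(record_id="rec_123", thumbs_up=False, reason="incorrect")
print(feedback)
```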

Cohort Analysis

Group and compare AI decisions by version or experiment:

  • Automatic tagging: Records are automatically tagged with version information
  • Compare cohorts: See how different AI versions perform
  • Analytics: Get detailed statistics on outcomes, feedback, and performance
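
As a rough illustration of what cohort comparison means in practice, the sketch below groups toy records by their cohort tag and computes a customer-reply rate per cohort; the record fields are hypothetical.

```python
from collections import defaultdict

# Toy records: each decision carries its cohort tag and a simple outcome flag.
records = [
    {"cohort": "playbook-v1", "customer_replied": True},
    {"cohort": "playbook-v1", "customer_replied": False},
    {"cohort": "playbook-v2", "customer_replied": True},
    {"cohort": "playbook-v2", "customer_replied": True},
]

def reply_rate_by_cohort(records: list[dict]) -> dict[str, float]:
    """Group records by cohort tag and compute the reply rate for each cohort."""
    grouped: dict[str, list[bool]] = defaultdict(list)
    for record in records:
        grouped[record["cohort"]].append(record["customer_replied"])
    return {cohort: sum(replies) / len(replies) for cohort, replies in grouped.items()}

print(reply_rate_by_cohort(records))
# -> {'playbook-v1': 0.5, 'playbook-v2': 1.0}
```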

How It Works

1. Automatic Recording

When your AI makes a decision:

  1. The system captures the decision automatically
  2. Sensitive information is removed
  3. Version metadata is recorded (which model, which playbook version, etc.)
  4. The record is stored for later analysis

You don't need to do anything—this happens automatically in the background.
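
Put together, the capture path looks roughly like this sketch: redact, tag with version metadata, then store. The helper names and record fields are illustrative, not the actual internals.

```python
def redact(decision: dict) -> dict:
    """Stand-in for the PII redaction step described earlier."""
    return {k: v for k, v in decision.items() if k != "customer_contact"}

def capture_decision(decision: dict, versions: dict, store: list) -> None:
    """Illustrative capture pipeline for a single AI decision."""
    record = {
        "decision": redact(decision),   # 2. sensitive information removed
        "versions": versions,           # 3. version metadata recorded
        "outcome_signals": [],          # filled in later by the outcome worker
    }
    store.append(record)                # 4. stored for later analysis

store: list = []
capture_decision(
    {"message": "Your appointment is confirmed.", "customer_contact": "+1 555 0100"},
    {"model": "gpt-4o", "playbook_version": 7},
    store,
)
print(store[0]["decision"])  # -> {'message': 'Your appointment is confirmed.'}
```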

2. Outcome Detection

A background worker periodically:

  1. Finds training records without outcome signals
  2. Checks for customer replies, escalations, etc.
  3. Records outcome signals automatically

This happens automatically—no manual tracking needed.
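
In outline, the worker behaves like the following sketch: poll for records that are missing signals, detect what happened, and write the signals back. All names here are illustrative stand-ins; in production this would run as a scheduled background job.

```python
import time

def find_records_without_signals() -> list[dict]:
    """Stand-in query: return stored records that have no outcome signals yet."""
    return []  # a real worker would query the training-record store here

def detect_signals(record: dict) -> list[str]:
    """Stand-in detection: check for replies, escalations, etc. on one record."""
    return []

def run_outcome_worker(poll_interval_seconds: int = 300) -> None:
    """Periodically backfill outcome signals on records that are missing them."""
    while True:
        for record in find_records_without_signals():
            record["outcome_signals"] = detect_signals(record)
        time.sleep(poll_interval_seconds)
```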

3. Analysis & Improvement

You can:

  • View training records: See what decisions were made
  • Provide feedback: Give thumbs up/down on decisions
  • Replay decisions: Test new models/prompts on past decisions
  • Compare cohorts: See how different versions perform
  • Use analytics: Get insights into AI performance

Privacy & Security

Data Protection

  • PII Redaction: All sensitive data (phone numbers, emails, etc.) is automatically removed before storage
  • Tenant Isolation: Your data is completely isolated from other tenants
  • Secure Storage: All data is stored securely with proper access controls

Safe Replay

  • Dry-Run Mode: Replay operations never cause side effects
  • No Tool Execution: Tool calls are not executed during replay
  • No Training Capture: Replays don't create new training records

What Gets Recorded

AI Decisions

The system records:

  • Message generation: What message the AI sent and why
  • Tool calls: What tools were used and their results
  • Playbook decisions: What playbook rules were triggered

Context Information

For each decision, the system records:

  • Conversation history: Previous messages in the conversation
  • System prompts: The instructions given to the AI
  • Available tools: What tools were available
  • Playbook context: Relevant playbook rules and settings

Version Metadata

Each record includes:

  • AI model: Which model was used (GPT-4o, GPT-4o Mini, etc.)
  • Playbook version: Which version of the playbook was active
  • Prompt version: Which version of prompts was used
  • Code version: Which version of the system code was running
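
One way to picture this metadata is as a small, immutable set of tags attached to every record, as in this illustrative sketch (the field names are assumptions, not the actual schema):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VersionMetadata:
    """Version tags attached to each training record (field names illustrative)."""
    model: str             # e.g. "gpt-4o" or "gpt-4o-mini"
    playbook_version: str  # playbook version active at decision time
    prompt_version: str    # prompt template revision
    code_version: str      # system build or commit identifier

tags = VersionMetadata(
    model="gpt-4o-mini",
    playbook_version="2024-05-01",
    prompt_version="v12",
    code_version="a1b2c3d",
)
print(tags)
```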

Use Cases

Understanding AI Behavior

  • Review decisions: See exactly what your AI did in specific situations
  • Identify patterns: Find common behaviors or issues
  • Debug problems: Understand why certain decisions were made

Improving Performance

  • A/B testing: Compare different AI versions to see which performs better
  • Iterative improvement: Use feedback and outcomes to guide improvements
  • Data-driven optimization: Make decisions based on real performance data

Quality Assurance

  • Review problematic decisions: Replay decisions that led to escalations
  • Test improvements: See if new models/prompts would have done better
  • Monitor quality: Track feedback and outcomes over time

Best Practices

  1. Review regularly: Check training records periodically to understand AI behavior
  2. Provide feedback: Give feedback on decisions to help improve the system
  3. Use cohorts: Tag experiments to compare different versions
  4. Replay strategically: Replay important decisions to test improvements
  5. Monitor outcomes: Watch outcome signals to identify trends

Next Steps
