Training Loop Analytics & Cohorts

The Training Loop system provides powerful analytics to help you understand, compare, and improve your AI assistant's performance. Use cohorts to group and compare different AI versions or experiments.

Overview

Analytics in the Training Loop system help you:

  • Compare AI versions: See how different models or playbook versions perform
  • Track outcomes: Monitor customer replies, escalations, and other outcomes
  • Analyze feedback: Understand what users think about AI decisions
  • Identify trends: Spot patterns in performance over time
  • Make data-driven decisions: Use real data to guide improvements

Cohorts

What are Cohorts?

Cohorts are groups of training records that share common characteristics. Records are automatically tagged with cohorts based on:

  • Playbook version: Which version of the playbook was used
  • Prompt version: Which version of prompts was used
  • AI model: Which AI model was used (GPT-4o, GPT-4o Mini, etc.)
  • Code version: Which version of the system code was running

Automatic Tagging

Training records are automatically tagged with cohorts when they're created. The cohort tag includes version information, making it easy to:

  • Group by version: See all decisions from a specific AI version
  • Compare experiments: Compare different versions side-by-side
  • Track changes: See how performance changes with new versions
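
As a rough sketch of the idea, a cohort tag can be derived from the version fields recorded with each decision. The example below is illustrative Python; the field names (playbook_version, prompt_version, model, code_version) and the tag format are assumptions, not the system's actual schema.

```python
from dataclasses import dataclass

@dataclass
class TrainingRecord:
    # Hypothetical version fields; the real record schema may differ.
    playbook_version: str
    prompt_version: str
    model: str
    code_version: str

def cohort_tag(record: TrainingRecord) -> str:
    """Compose a cohort tag from the version fields on a record."""
    return (
        f"playbook_{record.playbook_version}"
        f"_prompt_{record.prompt_version}"
        f"_{record.model}"
        f"_code_{record.code_version}"
    )

print(cohort_tag(TrainingRecord("v3.2", "v1.4", "gpt4o-mini", "2024.06.01")))
# -> playbook_v3.2_prompt_v1.4_gpt4o-mini_code_2024.06.01
```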

Cohort Examples

Example cohort tags:

  • playbook_v3.2_control: Control group for playbook version 3.2
  • playbook_v3.2_experiment: Experiment group for playbook version 3.2
  • gpt4o_vs_gpt4o_mini: Comparison cohort for GPT-4o versus GPT-4o Mini

Analytics Metrics

Record Counts

See how many training records are in each cohort:

  • Total records: Total number of AI decisions recorded
  • By action type: Breakdown by message generation, tool calls, playbook decisions
  • By time period: Records created in specific date ranges
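
For illustration, the sketch below computes these counts over an in-memory list of records. The records, the action_type values, and the created_at field are assumptions for the example; in practice the dashboard does this aggregation for you.

```python
from collections import Counter
from datetime import datetime

# Illustrative records; real counts come from the Training Loop data store.
records = [
    {"action_type": "message_generation", "created_at": datetime(2024, 6, 3)},
    {"action_type": "message_generation", "created_at": datetime(2024, 6, 4)},
    {"action_type": "tool_call",          "created_at": datetime(2024, 6, 9)},
    {"action_type": "playbook_decision",  "created_at": datetime(2024, 5, 20)},
]

# Restrict to a date range, then count totals and the breakdown by action type.
start, end = datetime(2024, 6, 1), datetime(2024, 6, 30)
in_range = [r for r in records if start <= r["created_at"] <= end]

print("Total records:", len(in_range))                                 # -> 3
print("By action type:", Counter(r["action_type"] for r in in_range))
```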

Outcome Signals

Track automatic outcome detection:

  • Customer replied: Percentage of decisions that led to customer replies
  • No further inbound: Percentage where conversations ended
  • Escalation triggered: Percentage that led to escalations
  • Response latency: Average time until customer reply
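
The sketch below shows how these signals reduce to simple aggregates over a batch of records. The outcome and reply_latency_seconds fields are illustrative assumptions rather than the actual schema.

```python
# Illustrative records with an automatically detected outcome per decision.
records = [
    {"outcome": "customer_replied",     "reply_latency_seconds": 95},
    {"outcome": "customer_replied",     "reply_latency_seconds": 140},
    {"outcome": "escalation_triggered", "reply_latency_seconds": None},
    {"outcome": "no_further_inbound",   "reply_latency_seconds": None},
]

def rate(outcome: str) -> float:
    """Share of decisions that ended in the given outcome."""
    return sum(r["outcome"] == outcome for r in records) / len(records)

latencies = [r["reply_latency_seconds"] for r in records
             if r["reply_latency_seconds"] is not None]

print(f"Customer replied:     {rate('customer_replied'):.0%}")      # 50%
print(f"No further inbound:   {rate('no_further_inbound'):.0%}")    # 25%
print(f"Escalation triggered: {rate('escalation_triggered'):.0%}")  # 25%
print(f"Avg reply latency:    {sum(latencies) / len(latencies):.0f}s")
```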

Feedback Statistics

Analyze human feedback:

  • Thumbs up/down: Overall feedback ratio
  • Feedback reasons: Breakdown by reason (helpful, unhelpful, incorrect, etc.)
  • Feedback trends: How feedback changes over time
  • Feedback by cohort: Compare feedback across different versions
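
As a sketch, the thumbs up/down ratio and the breakdown by reason are simple counts over feedback entries; the value and reason fields below are assumed for the example.

```python
from collections import Counter

# Illustrative feedback entries attached to training records.
feedback = [
    {"value": "up",   "reason": "helpful"},
    {"value": "up",   "reason": "helpful"},
    {"value": "down", "reason": "incorrect"},
    {"value": "down", "reason": "unhelpful"},
    {"value": "up",   "reason": "helpful"},
]

thumbs_up = sum(f["value"] == "up" for f in feedback)
print(f"Thumbs-up ratio: {thumbs_up / len(feedback):.0%}")     # -> 60%
print("By reason:", Counter(f["reason"] for f in feedback))
```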

Performance Metrics

Compare AI versions:

  • Agreement rates: How often different versions make the same decision on the same conversation
  • Safety deltas: Changes in risk levels between versions
  • Intent shifts: How intent classification changes
  • Quality scores: Overall performance metrics
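
A minimal sketch of the agreement-rate idea, assuming both cohorts produced one decision label for each conversation in a shared evaluation set (the conversation IDs and labels below are made up):

```python
# Decision labels per conversation for two cohorts over the same conversations.
control    = {"conv-1": "reply", "conv-2": "escalate", "conv-3": "reply"}
experiment = {"conv-1": "reply", "conv-2": "reply",    "conv-3": "reply"}

shared = control.keys() & experiment.keys()
agreement = sum(control[c] == experiment[c] for c in shared) / len(shared)
print(f"Agreement rate: {agreement:.0%}")  # -> 67%
```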

Using Analytics

Viewing Cohort Statistics

  1. Select a cohort: Choose which cohort to analyze
  2. View metrics: See record counts, outcomes, and feedback
  3. Compare cohorts: Select multiple cohorts to compare
  4. Filter by date: Analyze specific time periods

Comparing Cohorts

When comparing cohorts, you can see:

  • Side-by-side metrics: Compare key metrics across cohorts
  • Differences: See where versions differ
  • Trends: Track how each cohort's metrics move over time to see which version performs better
  • Recommendations: Get suggestions based on data

Filtering Data

Filter analytics by:

  • Date range: Specific time periods
  • Action type: Message generation, tool calls, playbook decisions
  • Outcome type: Customer replied, escalation, etc.
  • Feedback value: Positive, negative, or all feedback
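
The sketch below illustrates filtering as a predicate applied to each record; the matches helper and the field names are hypothetical, shown only to make the combination of filters concrete.

```python
from datetime import datetime

def matches(record, start=None, end=None, action_type=None, outcome=None, feedback=None):
    """Return True if the record passes every filter that was supplied."""
    if start and record["created_at"] < start:
        return False
    if end and record["created_at"] > end:
        return False
    if action_type and record["action_type"] != action_type:
        return False
    if outcome and record["outcome"] != outcome:
        return False
    if feedback and record.get("feedback") != feedback:
        return False
    return True

records = [
    {"created_at": datetime(2024, 6, 5), "action_type": "tool_call",
     "outcome": "customer_replied", "feedback": "up"},
    {"created_at": datetime(2024, 6, 6), "action_type": "message_generation",
     "outcome": "escalation_triggered", "feedback": "down"},
]

positive_june = [r for r in records
                 if matches(r, start=datetime(2024, 6, 1), feedback="up")]
print(len(positive_june))  # -> 1
```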

Use Cases

A/B Testing

Compare two AI versions:

  1. Create cohorts: Tag records with different version identifiers
  2. Run experiment: Let both versions handle conversations
  3. Compare results: Use analytics to see which performs better
  4. Make decision: Choose the better-performing version
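
When the metric being compared is a rate (for example, the customer-reply rate), a standard two-proportion z-test is one way to judge whether the difference between cohorts is larger than random noise. This is a generic statistical sketch with made-up numbers, not necessarily the method the dashboard uses.

```python
from math import sqrt, erf

def two_proportion_z(successes_a: int, n_a: int, successes_b: int, n_b: int):
    """Two-proportion z-test; returns (rate difference, two-sided p-value)."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided normal tail
    return p_b - p_a, p_value

# Illustrative reply counts: control vs. experiment cohort, 600 decisions each.
diff, p = two_proportion_z(successes_a=412, n_a=600, successes_b=455, n_b=600)
print(f"Reply-rate lift: {diff:+.1%}, p = {p:.3f}")
```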

Version Rollout

Monitor new versions:

  1. Tag new version: New records are automatically tagged with the new version's cohort
  2. Monitor metrics: Watch outcomes and feedback
  3. Compare to previous: See if new version is better
  4. Rollback if needed: Revert if performance degrades
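
One simple guardrail during a rollout is to compare the new cohort's metrics against the previous cohort's baseline and flag anything that degrades beyond a tolerance. The metric names, numbers, and thresholds below are illustrative assumptions.

```python
# Baseline metrics from the previous cohort vs. the new cohort (illustrative).
BASELINE = {"reply_rate": 0.71, "escalation_rate": 0.08, "thumbs_up_ratio": 0.86}
NEW      = {"reply_rate": 0.69, "escalation_rate": 0.12, "thumbs_up_ratio": 0.84}

# Allowed movement per metric: negative limits guard drops, positive guard rises.
TOLERANCE = {"reply_rate": -0.03, "escalation_rate": +0.02, "thumbs_up_ratio": -0.03}

def degraded(metric: str) -> bool:
    delta = NEW[metric] - BASELINE[metric]
    limit = TOLERANCE[metric]
    return delta < limit if limit < 0 else delta > limit

alerts = [m for m in BASELINE if degraded(m)]
print("Rollback candidates:", alerts)  # -> ['escalation_rate']
```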

Performance Tracking

Track improvements over time:

  1. Baseline: Establish baseline metrics
  2. Track changes: Monitor metrics as you make improvements
  3. Identify trends: See if performance is improving
  4. Validate changes: Confirm improvements are working
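
As a small illustration, step 3 can be as simple as comparing each week's aggregate against the baseline from step 1; the reply-rate figures below are made up.

```python
baseline_reply_rate = 0.70                    # step 1: the established baseline
weekly_reply_rate = [0.69, 0.71, 0.73, 0.74]  # illustrative weekly aggregates

for week, rate in enumerate(weekly_reply_rate, start=1):
    delta = rate - baseline_reply_rate
    print(f"Week {week}: {rate:.0%} ({delta:+.1%} vs. baseline)")
```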

Analytics Dashboard

Key Metrics

The analytics dashboard shows:

  • Total records: Number of AI decisions recorded
  • Outcome rates: Percentage of decisions that reached each outcome
  • Feedback ratio: Positive vs negative feedback
  • Performance trends: How metrics change over time

Visualizations

Charts and graphs show:

  • Outcome distribution: Pie charts of outcome types
  • Feedback trends: Line graphs of feedback over time
  • Cohort comparison: Bar charts comparing cohorts
  • Performance metrics: Various visualizations of key metrics

Best Practices

Regular Monitoring

  • Check weekly: Review analytics regularly to catch issues early
  • Track trends: Watch for changes in metrics over time
  • Compare versions: Always compare new versions to previous ones
  • Set alerts: Get notified when metrics change significantly

Effective Cohort Management

  • Clear naming: Use descriptive cohort names
  • Consistent tagging: Tag records consistently
  • Document experiments: Note what each cohort represents
  • Archive old cohorts: Remove or archive outdated cohorts

Data-Driven Decisions

  • Use multiple metrics: Don't rely on a single metric
  • Consider context: Understand the context behind the data
  • Validate findings: Confirm findings with additional analysis
  • Act on insights: Use analytics to guide improvements

Privacy & Security

  • Tenant isolation: Analytics are scoped to your tenant and never visible to other tenants
  • No PII: Analytics don't include customer information
  • Secure access: Only authorized users can view analytics
  • Audit logging: All analytics access is logged

Limitations

What Analytics Can't Do

  • Predict the future: Analytics describe past performance; they don't forecast future results
  • Explain everything: Some patterns may not have clear explanations
  • Replace judgment: Use analytics to inform, not replace, human judgment
  • Guarantee results: Better metrics don't guarantee better outcomes

Understanding Metrics

  • Context matters: Metrics need context to be meaningful
  • Sample size: Small samples may not be representative
  • Correlation vs causation: Correlation doesn't mean causation
  • Multiple factors: Many factors affect outcomes, not just AI version
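
To make the sample-size point concrete, the sketch below computes a 95% Wilson confidence interval for a feedback rate: the same 80% thumbs-up rate is far less certain from 10 records than from 1,000. This is a generic statistical illustration, not a dashboard feature.

```python
from math import sqrt

def wilson_interval(successes: int, n: int, z: float = 1.96):
    """95% Wilson score interval for a rate; wide intervals flag small samples."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    margin = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - margin, centre + margin

print(wilson_interval(8, 10))      # roughly (0.49, 0.94)
print(wilson_interval(800, 1000))  # roughly (0.77, 0.82)
```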

Next Steps

Return to the autoch.at Documentation for more on the Training Loop system.