Training Loop Analytics & Cohorts
The Training Loop system provides powerful analytics to help you understand, compare, and improve your AI assistant's performance. Use cohorts to group and compare different AI versions or experiments.
Overview
Analytics in the Training Loop system help you:
- Compare AI versions: See how different models or playbook versions perform
- Track outcomes: Monitor customer replies, escalations, and other outcomes
- Analyze feedback: Understand what users think about AI decisions
- Identify trends: Spot patterns in performance over time
- Make data-driven decisions: Use real data to guide improvements
Cohorts
What are Cohorts?
Cohorts are groups of training records that share common characteristics. Records are automatically tagged with cohorts based on:
- Playbook version: Which version of the playbook was used
- Prompt version: Which version of prompts was used
- AI model: Which AI model was used (GPT-4o, GPT-4o Mini, etc.)
- Code version: Which version of the system code was running
Automatic Tagging
Training records are automatically tagged with cohorts when they're created. The cohort tag includes version information, making it easy to:
- Group by version: See all decisions from a specific AI version
- Compare experiments: Compare different versions side-by-side
- Track changes: See how performance changes with new versions
Cohort Examples
Example cohort tags:
- playbook_v3.2_control: Control group for playbook version 3.2
- playbook_v3.2_experiment: Experiment group for playbook version 3.2
- gpt4o_vs_gpt4o_mini: Comparison between two models
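The exact tag format is determined by the system; as a rough sketch of the idea (the class and field names below are illustrative, not the actual schema), a tag can be assembled from the version fields recorded with each decision:

```python
from dataclasses import dataclass

@dataclass
class CohortInfo:
    """Illustrative version fields attached to each training record."""
    playbook_version: str            # e.g. "3.2"
    prompt_version: str              # e.g. "7"
    ai_model: str                    # e.g. "gpt-4o-mini"
    code_version: str                # e.g. "2024.06.1"
    experiment_arm: str = "control"  # "control" or "experiment"

    def tag(self) -> str:
        # Compose a readable cohort tag such as "playbook_v3.2_control".
        return f"playbook_v{self.playbook_version}_{self.experiment_arm}"

print(CohortInfo("3.2", "7", "gpt-4o-mini", "2024.06.1", "experiment").tag())
# playbook_v3.2_experiment
```

Records sharing the resulting tag form one cohort, so every decision made by the same configuration can be grouped and compared.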
Analytics Metrics
Record Counts
See how many training records are in each cohort:
- Total records: Total number of AI decisions recorded
- By action type: Breakdown by message generation, tool calls, playbook decisions
- By time period: Records created in specific date ranges
Outcome Signals
Track automatic outcome detection:
- Customer replied: Percentage of decisions that led to customer replies
- No further inbound: Percentage where conversations ended
- Escalation triggered: Percentage that led to escalations
- Response latency: Average time until customer reply
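As a minimal sketch of how these signals become numbers (the record fields below are assumptions, not the real schema), each rate is a simple proportion over a cohort's records, and latency is averaged over the records that actually received a reply:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RecordOutcome:
    """Assumed outcome fields on a training record (illustrative only)."""
    customer_replied: bool
    escalated: bool
    reply_latency_s: Optional[float]  # None when the customer never replied

def outcome_rates(records: list[RecordOutcome]) -> dict[str, float]:
    n = len(records)
    latencies = [r.reply_latency_s for r in records if r.reply_latency_s is not None]
    return {
        "customer_replied_pct": 100 * sum(r.customer_replied for r in records) / n,
        "no_further_inbound_pct": 100 * sum(not r.customer_replied for r in records) / n,
        "escalation_pct": 100 * sum(r.escalated for r in records) / n,
        "avg_reply_latency_s": sum(latencies) / len(latencies) if latencies else 0.0,
    }

demo = [RecordOutcome(True, False, 120.0), RecordOutcome(False, True, None)]
print(outcome_rates(demo))
```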
Feedback Statistics
Analyze human feedback:
- Thumbs up/down: Overall feedback ratio
- Feedback reasons: Breakdown by reason (helpful, unhelpful, incorrect, etc.)
- Feedback trends: How feedback changes over time
- Feedback by cohort: Compare feedback across different versions
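A small sketch of the feedback breakdown, assuming each feedback entry carries a thumbs value and a reason (the data shape is illustrative only):

```python
from collections import Counter

# Each tuple is (thumbs_up, reason) - illustrative data, not the real schema.
feedback = [
    (True, "helpful"),
    (True, "helpful"),
    (False, "incorrect"),
    (False, "unhelpful"),
]

ups = sum(1 for up, _ in feedback if up)
downs = len(feedback) - ups
print(f"thumbs up/down: {ups}:{downs}")
print("by reason:", Counter(reason for _, reason in feedback))
```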
Performance Metrics
Compare AI versions:
- Agreement rates: How often different versions make the same decision
- Safety deltas: Changes in risk levels between versions
- Intent shifts: How intent classification changes
- Quality scores: Overall performance metrics
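For example, an agreement rate can be estimated by taking the conversations both versions handled and counting how often they chose the same action; the decision labels below are hypothetical:

```python
def agreement_rate(decisions_a: dict[str, str], decisions_b: dict[str, str]) -> float:
    """Fraction of shared conversations where both versions chose the same action."""
    shared = decisions_a.keys() & decisions_b.keys()
    if not shared:
        return 0.0
    return sum(decisions_a[c] == decisions_b[c] for c in shared) / len(shared)

version_a = {"conv1": "reply", "conv2": "escalate", "conv3": "reply"}
version_b = {"conv1": "reply", "conv2": "reply", "conv3": "reply"}
print(f"{agreement_rate(version_a, version_b):.2f}")  # 0.67
```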
Using Analytics
Viewing Cohort Statistics
- Select a cohort: Choose which cohort to analyze
- View metrics: See record counts, outcomes, and feedback
- Compare cohorts: Select multiple cohorts to compare
- Filter by date: Analyze specific time periods
Comparing Cohorts
When comparing cohorts, you can see:
- Side-by-side metrics: Compare key metrics across cohorts
- Differences: See where versions differ
- Trends: See how each cohort's metrics move over time and which version performs better
- Recommendations: Get suggestions based on data
Filtering Data
Filter analytics by:
- Date range: Specific time periods
- Action type: Message generation, tool calls, playbook decisions
- Outcome type: Customer replied, escalation, etc.
- Feedback value: Positive, negative, or all feedback
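These filters compose; a sketch of applying a date range and an action-type filter to an in-memory list of records (field names are assumed for illustration):

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Record:
    created: date
    action_type: str        # "message", "tool_call", or "playbook_decision"
    customer_replied: bool

def filter_records(records: list[Record], start: date, end: date,
                   action_type: Optional[str] = None) -> list[Record]:
    return [
        r for r in records
        if start <= r.created <= end
        and (action_type is None or r.action_type == action_type)
    ]

data = [
    Record(date(2024, 6, 1), "message", True),
    Record(date(2024, 6, 15), "tool_call", False),
]
print(filter_records(data, date(2024, 6, 1), date(2024, 6, 30), "message"))
```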
Use Cases
A/B Testing
Compare two AI versions:
- Create cohorts: Tag records with different version identifiers
- Run experiment: Let both versions handle conversations
- Compare results: Use analytics to see which performs better
- Make decision: Choose the better-performing version
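The "Compare results" step boils down to the difference in a chosen metric between the two cohorts. A sketch using customer reply rate and made-up numbers:

```python
def reply_rate(replies: list[bool]) -> float:
    """Share of decisions that led to a customer reply."""
    return sum(replies) / len(replies)

control = [True, False, True, True, False, True]     # e.g. playbook_v3.2_control
experiment = [True, True, True, False, True, True]   # e.g. playbook_v3.2_experiment

diff = reply_rate(experiment) - reply_rate(control)
print(f"control={reply_rate(control):.2f}  experiment={reply_rate(experiment):.2f}  diff={diff:+.2f}")
```

In practice, also check that each cohort has enough records before acting on the difference (see Limitations below).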
Version Rollout
Monitor new versions:
- Tag new version: New records automatically get the new cohort tag
- Monitor metrics: Watch outcomes and feedback
- Compare to previous: See whether the new version outperforms the previous one
- Rollback if needed: Revert if performance degrades
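For the "Rollback if needed" step, a simple guard can flag a regression when the new cohort's metric drops more than a tolerance below the previous cohort's; the threshold and values below are illustrative:

```python
def should_rollback(previous: float, current: float, tolerance: float = 0.05) -> bool:
    """Flag a rollback when the new version is worse by more than the tolerance."""
    return current < previous - tolerance

# e.g. customer-reply rate per cohort (made-up values)
print(should_rollback(previous=0.72, current=0.61))  # True  -> investigate or revert
print(should_rollback(previous=0.72, current=0.70))  # False -> within tolerance
```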
Performance Tracking
Track improvements over time:
- Baseline: Establish baseline metrics
- Track changes: Monitor metrics as you make improvements
- Identify trends: See if performance is improving
- Validate changes: Confirm improvements are working
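One way to track a baseline and its trend is to bucket records by week and watch a metric per bucket; a sketch assuming a simple (date, thumbs_up) shape for feedback:

```python
from collections import defaultdict
from datetime import date

# (date, thumbs_up) pairs - illustrative data only.
feedback = [
    (date(2024, 6, 3), True), (date(2024, 6, 4), False),
    (date(2024, 6, 11), True), (date(2024, 6, 12), True),
]

by_week = defaultdict(list)
for day, thumbs_up in feedback:
    iso = day.isocalendar()
    by_week[f"{iso.year}-W{iso.week:02d}"].append(thumbs_up)

for week, values in sorted(by_week.items()):
    print(week, f"{100 * sum(values) / len(values):.0f}% positive")
```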
Analytics Dashboard
Key Metrics
The analytics dashboard shows:
- Total records: Number of AI decisions recorded
- Outcome rates: Percentage of different outcomes
- Feedback ratio: Positive vs negative feedback
- Performance trends: How metrics change over time
Visualizations
Charts and graphs show:
- Outcome distribution: Pie charts of outcome types
- Feedback trends: Line graphs of feedback over time
- Cohort comparison: Bar charts comparing cohorts
- Performance metrics: Various visualizations of key metrics
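As an illustration of the cohort-comparison chart, here is how a similar bar chart could be drawn with matplotlib from a few made-up reply rates:

```python
import matplotlib.pyplot as plt

cohorts = ["playbook_v3.1", "playbook_v3.2_control", "playbook_v3.2_experiment"]
reply_rates = [0.64, 0.68, 0.73]  # illustrative values

plt.bar(cohorts, reply_rates)
plt.ylabel("Customer reply rate")
plt.title("Cohort comparison")
plt.tight_layout()
plt.show()
```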
Best Practices
Regular Monitoring
- Check weekly: Review analytics regularly to catch issues early
- Track trends: Watch for changes in metrics over time
- Compare versions: Always compare new versions to previous ones
- Set alerts: Get notified when metrics change significantly
Effective Cohort Management
- Clear naming: Use descriptive cohort names
- Consistent tagging: Tag records consistently
- Document experiments: Note what each cohort represents
- Archive old cohorts: Remove or archive outdated cohorts
Data-Driven Decisions
- Use multiple metrics: Don't rely on a single metric
- Consider context: Understand the context behind the data
- Validate findings: Confirm findings with additional analysis
- Act on insights: Use analytics to guide improvements
Privacy & Security
- Tenant isolation: Your analytics are private to your tenant
- No PII: Analytics don't include personally identifiable customer information
- Secure access: Only authorized users can view analytics
- Audit logging: All analytics access is logged
Limitations
What Analytics Can't Do
- Predict the future: Analytics show past performance, not future results
- Explain everything: Some patterns may not have clear explanations
- Replace judgment: Use analytics to inform, not replace, human judgment
- Guarantee results: Better metrics don't guarantee better outcomes
Understanding Metrics
- Context matters: Metrics need context to be meaningful
- Sample size: Small samples may not be representative
- Correlation vs causation: Correlation doesn't mean causation
- Multiple factors: Many factors affect outcomes, not just AI version
Next Steps
- Learn about Providing Feedback - How feedback affects analytics
- Review Training Loop Overview - Understand the full system
- Check Training Context - How AI uses your business context

