This agent automatically monitors your critical data tables to detect when data stops flowing. It checks timestamps on key tables, identifies stale data before dashboards break, and sends intelligent alerts to your team with diagnostic information to help quickly resolve pipeline failures.

Simple Example

name: freshness-monitor
description: Alert when critical tables go stale

triggers:
  - type: scheduled
    cron: "0 */2 * * *"  # Every 2 hours
    timezone: "America/New_York"

tools:
  preset: safe  # Read-only

prompt: |
  Check freshness of orders table:
  
  SELECT MAX(created_at) as last_order
  FROM production.orders
  
  If last_order is more than 6 hours old:
    Send Slack alert to #data-incidents
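
The prompt above already contains the query the agent runs; the staleness condition can also be computed directly in SQL so the result comes back as a single row. A minimal sketch, assuming a Snowflake-style `DATEDIFF` (the same form used in the robust example below); adjust the date arithmetic for your warehouse dialect:

```sql
-- Sketch only: the freshness check described in the simple example's prompt.
-- Assumes Snowflake-style DATEDIFF; production.orders and created_at come from
-- the example configuration above.
SELECT
  MAX(created_at)                                          AS last_order,
  DATEDIFF('hour', MAX(created_at), CURRENT_TIMESTAMP)     AS hours_since_last_order,
  DATEDIFF('hour', MAX(created_at), CURRENT_TIMESTAMP) > 6 AS is_stale
FROM production.orders;
```

If `is_stale` comes back true, the agent proceeds to the Slack alert step in the prompt.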

More Robust Example

Production-ready with multiple tables and progressive alerting:
name: freshness-monitor
description: Monitor data freshness across critical tables

triggers:
  - type: scheduled
    cron: "0 * * * *"  # Every hour
    timezone: "America/New_York"

tools:
  preset: safe  # Read-only monitoring

context:
  # Define tables and their freshness SLAs
  tables:
    - name: "production.orders"
      max_age_hours: 2
      severity: "critical"
      timestamp_column: "created_at"
    
    - name: "production.customers"
      max_age_hours: 24
      severity: "warning"
      timestamp_column: "updated_at"
    
    - name: "production.sessions"
      max_age_hours: 1
      severity: "critical"
      timestamp_column: "session_start"
    
    - name: "production.revenue_daily"
      max_age_hours: 12
      severity: "warning"
      timestamp_column: "date"

prompt: |
  Monitor data freshness for production tables.
  
  ## 1. Check each table
  For each table in the context configuration:
  
  Query:
  ```sql
  SELECT 
    '[table_name]' as table_name,
    MAX([timestamp_column]) as last_update,
    DATEDIFF('hour', MAX([timestamp_column]), CURRENT_TIMESTAMP) as hours_since_update,
    COUNT(*) as total_rows
  FROM [table_name]
  ```

  ## 2. Assess freshness

  For each table result, calculate a status:
  - FRESH: hours_since_update < max_age_hours
  - STALE: hours_since_update >= max_age_hours
  - CRITICAL: hours_since_update > (max_age_hours * 2)

  ## 3. Alert based on severity

  For CRITICAL issues (data > 2x SLA):
  - Send Slack to #data-incidents:
    "🚨 CRITICAL: [table_name] data is [hours] hours old
    Expected: < [max_age] hours
    Last update: [last_update]
    Possible causes:
    - Pipeline failure
    - Source API down
    - Warehouse connection issue
    Action: Investigate immediately
    Runbook: [link to debugging guide]"
  - Create a GitHub issue if one doesn't exist
  - Page on-call if > 4 hours stale

  For WARNING issues (stale but < 2x SLA):
  - Send Slack to #data-pipeline:
    "⚠️ WARNING: [table_name] is [hours] hours old
    Expected: < [max_age] hours
    Monitor closely."

  For FRESH tables:
  - No alert (all good)
  - Log status for audit

  ## 4. Summary report

  After checking all tables, send summary to #data-pipeline:
  "📊 Hourly Freshness Check
  ✓ Fresh: [fresh_count]/[total_count]
  ⚠️ Stale: [warning_count]
  🚨 Critical: [critical_count]
  [if critical_count > 0] Immediate attention needed!
  Next check: [next_run_time]"

  Log results to identify patterns:
  - Which tables fail most often?
  - What time of day do failures occur?
  - Are issues getting worse?

  ## 5. Include diagnostics

  When data is stale, run diagnostics:
  1. Check dbt Cloud job status (recent failures?)
  2. Check Fivetran connector (sync successful?)
  3. Check source count: SELECT COUNT(*) FROM raw.table
  4. Compare to warehouse: is the raw table updated?

  Include the diagnostics in the alert to help debugging.
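
The FRESH / STALE / CRITICAL rules in step 2 can also be pushed into the warehouse so each check returns a ready-made status. A minimal sketch for the `production.orders` entry, assuming the same Snowflake-style `DATEDIFF` as the prompt's query; the SLA literal comes from that table's `max_age_hours` in the `context` block:

```sql
-- Sketch only: status classification for one table (production.orders,
-- max_age_hours = 2), mirroring the FRESH / STALE / CRITICAL rules in step 2.
-- Substitute each table's timestamp_column and SLA from the context block.
WITH freshness AS (
  SELECT
    'production.orders'                                   AS table_name,
    MAX(created_at)                                       AS last_update,
    DATEDIFF('hour', MAX(created_at), CURRENT_TIMESTAMP)  AS hours_since_update
  FROM production.orders
)
SELECT
  table_name,
  last_update,
  hours_since_update,
  CASE
    WHEN hours_since_update > 2 * 2  THEN 'CRITICAL'  -- more than 2x the 2-hour SLA
    WHEN hours_since_update >= 2     THEN 'STALE'     -- past the SLA
    ELSE 'FRESH'
  END AS status
FROM freshness;
```

Running one such query per configured table gives the agent everything it needs for the per-severity alerts in step 3 and the summary counts in step 4.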