Simple Example
More Robust Example
Production-ready, with multiple event types and intelligent responses. For each downstream model:
- Check if it explicitly selects affected columns
- Determine if it will break or continue working
If it will break:
- Create HIGH priority issue:
- Send urgent Slack alert:
- DO NOT auto-create PR (too risky, needs human review)

If it will continue working:
- Create PR to update staging model:
- Send Slack notification:
- Create issue suggesting update:
  Context:
  - Appears in % of recent records
  - Sample values:
- Send informational Slack:
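The branching above can be sketched as a small routing function. This is a minimal sketch: the `selects_affected_columns` field and the action strings are illustrative assumptions, not Buster's actual API.

```python
def plan_schema_change_response(downstream_models):
    """Return the actions to take for a schema-change event.

    downstream_models: list of dicts with a 'selects_affected_columns'
    bool. (Field name is an illustrative assumption.)
    """
    breaking = [m for m in downstream_models if m["selects_affected_columns"]]
    if breaking:
        # Breaking change: escalate to humans, never auto-create a PR.
        return ["create_high_priority_issue", "send_urgent_slack"]
    # Non-breaking: safe to propose an update for human review.
    return ["create_pr_to_update_staging", "send_slack_notification"]
```

The key design point is that the riskier path produces strictly fewer automated actions: escalation only, no code changes.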
.buster/events/schema_changes.jsonl:
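The contents of that file aren't shown here; a plausible JSONL record (all field names and values are assumptions, not Buster's actual schema) might look like:

```json
{"event_type": "schema_change", "table": "raw.orders", "change": "column_dropped", "column": "discount_code", "detected_at": "2024-01-01T00:00:00Z"}
```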
Handler 2: dbt Run Failure
Event Data Available:
- event.job_id: dbt Cloud job ID
- event.run_id: Specific run ID
- event.environment: prod/staging/dev
- event.failed_models: List of failed models
- event.error_messages: Error details
- event.run_url: Link to dbt Cloud
Process:
1. Classify failure type. Analyze error messages:
- SQL Error: Syntax, missing ref, column not found
- Data Quality: Test failure (unique, not_null, etc.)
- Compilation: Macro issue, config error
- Infrastructure: Timeout, warehouse connection
- Permission: Access denied
2. Determine if auto-fixable:
- Missing ref: If model was renamed, update references
- Test failure (data): May self-resolve, retry
- Timeout: Retry with longer timeout

Not auto-fixable:
- SQL syntax error: Needs code fix
- Data quality: Needs investigation
- Permission error: Needs admin

3. If auto-fixable:
- Attempt fix
- Trigger retry
- Report outcome
If not auto-fixable:
- Create detailed issue:
  "Got 3 duplicate customer_id values"
- Send Slack alert (severity based on environment):
If the failure recurs:
- Flag as unstable
- Suggest adding better error handling
- Consider splitting into smaller models
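The classify-then-route flow above can be sketched as follows. The regex patterns are illustrative guesses at common dbt error text, not an exhaustive ruleset; real messages vary by adapter.

```python
import re

# (failure_type, pattern, auto_fixable) - order matters: first match wins.
# Patterns are assumptions based on typical dbt error wording.
FAILURE_TYPES = [
    ("permission", r"access denied|permission", False),       # needs admin
    ("infrastructure", r"timeout|connection", True),          # retry
    ("missing_ref", r"depends on a node named .* which was not found", True),
    ("data_quality", r"unique|not_null|test fail", False),    # investigate
    ("sql_error", r"syntax error|column .* does not exist", False),
]

def classify_failure(error_message):
    """Return (failure_type, auto_fixable) for one error message."""
    msg = error_message.lower()
    for name, pattern, auto_fixable in FAILURE_TYPES:
        if re.search(pattern, msg):
            return name, auto_fixable
    # Unknown failures are treated as not auto-fixable and escalated.
    return "unknown", False
```

A failed run would map each entry of `event.error_messages` through this function and only trigger a retry when every failure is auto-fixable.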
Handler 3: Freshness Failure
Event Data Available:
- event.source: Which source
- event.table: Which table
- event.max_loaded_at: Last data timestamp
- event.threshold: Expected freshness
- event.status: warn or error
Process:
1. Check if expected. Some freshness failures are known:
- Weekend data loads
- Scheduled maintenance windows
- Holiday periods
.buster/config/expected_delays.yml:
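The shape of that config isn't given here; one plausible layout (all keys and values are assumptions) is:

```yaml
# Hypothetical shape for .buster/config/expected_delays.yml
expected_delays:
  - source: salesforce
    table: opportunities
    reason: weekend loads pause
    days_of_week: [saturday, sunday]
  - source: billing
    table: invoices
    reason: scheduled maintenance window
    dates: ["2024-07-04"]
```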
2. Diagnose:
- Is data completely missing or just delayed?
- Is this a source-side issue or a pipeline issue?
- Are other tables from the same source affected?

3. Respond:
- Create issue for tracking
- Escalate if critical source
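The check/diagnose/respond steps can be sketched as one decision function. The 2x-threshold escalation cutoff and the `expected_delay` flag (a lookup against .buster/config/expected_delays.yml, not shown) are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

def evaluate_freshness(max_loaded_at, threshold_hours, expected_delay=False):
    """Decide how to treat a freshness failure.

    max_loaded_at: last data timestamp (timezone-aware datetime).
    expected_delay: True when the config file covers this source/table
    right now (lookup not shown here).
    """
    if expected_delay:
        return "suppress"  # known delay (weekend load, maintenance): no alert
    age = datetime.now(timezone.utc) - max_loaded_at
    if age > timedelta(hours=threshold_hours) * 2:
        return "escalate"  # far past threshold: likely source-side outage
    return "create_issue"  # merely delayed: track it
```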
Handler 4: Data Quality Failure
Event Data Available:
- event.check_name: Which quality check
- event.table: Affected table
- event.severity: critical/high/medium/low
- event.metric: What failed (null_rate, duplicate_count, etc.)
- event.threshold: Expected value
- event.actual: Actual value
Process:
1. Run diagnostic queries. Based on the failure type, investigate. For example, a high null rate:
2. Classify severity:
- Critical: Core business metric affected, blocks reporting
- High: Important quality issue, needs fix soon
- Medium: Degraded quality, investigate when possible
- Low: Minor anomaly, monitor
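The severity buckets above can be sketched as a classifier on the event's `actual` vs `threshold` values. The ratio cutoffs and the `core_metric` flag are illustrative assumptions, not Buster defaults.

```python
def classify_severity(metric, actual, threshold, core_metric=False):
    """Map a quality-check failure to a severity bucket.

    The ratio cutoffs (10x, 2x) are illustrative assumptions.
    """
    if core_metric:
        return "critical"       # core business metric: blocks reporting
    ratio = actual / threshold if threshold else float("inf")
    if ratio >= 10:
        return "high"           # badly out of bounds: needs fix soon
    if ratio >= 2:
        return "medium"         # degraded: investigate when possible
    return "low"                # minor anomaly: monitor
```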
Handler 5: Health Check (Scheduled)
Run proactive checks every 6 hours:
1. Check warehouse connection

Error Handling
For any handler:
- If event data is malformed: Log error, alert #data-platform
- If can’t determine severity: Default to HIGH, escalate
- If action fails: Report failure, create issue for manual handling
- Always log every event processed for audit trail
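These rules can be sketched as a wrapper around any handler. The logger name, the audit-record shape, and the malformed-event check are assumptions for illustration.

```python
import json
import logging
import time

logger = logging.getLogger("buster.events")  # logger name is an assumption

def process_event(handler, event):
    """Run a handler with the error-handling rules above applied."""
    audit = {"ts": time.time(), "event": event}
    try:
        # Malformed event: reject, log, and alert (channel per the rules).
        if not isinstance(event, dict) or "event_type" not in event:
            raise ValueError("malformed event")
        # Unknown severity: default to HIGH and let the handler escalate.
        event.setdefault("severity", "HIGH")
        result = handler(event)
        audit["status"] = "ok"
        return result
    except Exception as exc:
        # Failed action: report it; an issue for manual handling goes here.
        audit["status"] = "error"
        audit["error"] = str(exc)
        logger.error("event failed, alerting #data-platform: %s", exc)
        return None
    finally:
        # Audit trail: every event processed is logged, success or not.
        logger.info("audit %s", json.dumps(audit, default=str))
```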