Testing agents before deployment and debugging issues when they arise are critical skills for building reliable automation. This guide covers local testing strategies, validation tools, debugging techniques, and best practices for iterating on agent configurations.

Testing Locally with the CLI

The Buster CLI provides a local testing environment that simulates agent execution without affecting production systems.

Installation

CLI Installation
setup
Install the Buster CLI globally using your preferred package manager:
npm install -g @buster/cli
Verify installation:
buster --version

Basic Testing

buster test
command
Execute an agent in a local sandbox environment.
buster test agents/my-agent.yaml
What happens:
  • Agent runs in isolated sandbox
  • Reads from your local repository
  • Connects to your warehouse using local credentials
  • All tools available (respects restrictions)
  • Dry-run mode (no actual pushes to GitHub)
Output includes:
  • Files read and modified
  • SQL queries executed with results
  • Tool calls and their outputs
  • Agent reasoning and decision-making
  • Actions attempted (PRs, comments, etc.)
  • Any errors or warnings
Review the complete execution trace to understand agent behavior before deploying.

Testing Pull Request Triggers

--mock-pr
flag
Simulate a pull request without opening a real one.
buster test agents/pr-reviewer.yaml --mock-pr \
  --files "models/marts/customers.sql,models/staging/orders.sql" \
  --pr-title "feat: Add new customer metrics" \
  --pr-author "alice"
Available mock options:
  • --files - Comma-separated list of changed files
  • --pr-title - Simulated PR title
  • --pr-author - Simulated PR author
  • --pr-number - Simulated PR number
  • --base-branch - Target branch (default: main)
  • --head-branch - Source branch
The agent will behave as if these files changed in a real PR, allowing you to test path filters and PR-specific logic.
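It's also worth running a mock PR that should not match your filters, to confirm the agent stays quiet (the file path below is illustrative):
buster test agents/pr-reviewer.yaml --mock-pr \
  --files "README.md" \
  --pr-title "chore: Update readme"
The execution trace should show that the path filter didn't match and that no actions were attempted.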

Testing Scheduled Triggers

--context
flag
Provide custom context variables for scheduled agent testing.
buster test agents/daily-audit.yaml \
  --context lookback_hours=24 \
  --context audit_scope=marts
Context usage:
  • Pass any key-value pairs defined in your trigger context
  • Access in prompts via template variables
  • Test different time windows and configurations
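As an illustration, a scheduled trigger might declare these context variables and reference them in the prompt. The trigger layout and templating syntax below are assumptions; match them to your actual agent configuration:
triggers:
  - type: schedule
    cron: "0 6 * * *"
    context:
      lookback_hours: 24
      audit_scope: marts
prompt: |
  Audit models in the {{ audit_scope }} layer, looking back {{ lookback_hours }} hours.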

Testing Event Triggers

--mock-event
flag
Simulate data stack events locally.
buster test agents/schema-change-handler.yaml --mock-event \
  --event-name schema_change_detected \
  --event-source fivetran \
  --event-data '{"table":"accounts","change_type":"column_added","columns":["last_activity_date"]}'
Event simulation:
  • Specify event type and source
  • Provide event-specific data as JSON
  • Test event filters and conditional logic
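For reference, the mock event above is the kind of payload an event trigger with filters would receive. The trigger fields below are illustrative and should mirror your real configuration:
triggers:
  - type: event
    event_name: schema_change_detected
    source: fivetran
    filters:
      change_type: column_added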

Validation

Validate agent configurations before testing execution.

YAML Syntax Validation

buster validate
command
Check agent configuration for syntax and structural errors.
buster validate agents/my-agent.yaml
Validates:
  • YAML syntax correctness
  • Required fields present (name, triggers, prompt)
  • Trigger configuration validity
  • Tool preset and tool list compatibility
  • Restriction consistency
  • No conflicting configurations
Example output:
✓ Valid YAML syntax
✓ Required fields present
✓ Trigger configuration valid
✗ Error: tools.preset 'safe' conflicts with tools.include 'delete_file'
✗ Warning: restrictions.files.allow has no effect without file operation tools
Run validation before every test to catch configuration errors early.
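For example, the preset conflict above could be resolved either by dropping the conflicting tool or by moving to a preset that allows it (a sketch; whether a given preset permits delete_file depends on your preset definitions):
# Option A: keep the safe preset and remove the conflicting tool
tools:
  preset: safe

# Option B: keep delete_file under a broader preset (assuming that preset allows it)
tools:
  preset: standard
  include: ["delete_file"]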

Batch Validation

buster validate [directory]
command
Validate all agents in a directory at once.
buster validate agents/
Checks every .yaml and .yml file in the directory and reports errors collectively.
Integrate this into your CI/CD pipeline to prevent invalid configurations from reaching production.
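For example, a CI step could block merges when any agent configuration fails validation. This sketch uses GitHub Actions purely as an illustration; adapt it to your CI system:
# .github/workflows/validate-agents.yml (illustrative)
name: Validate agents
on:
  pull_request:
    paths: ["agents/**"]
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm install -g @buster/cli
      - run: buster validate agents/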

Schema Validation

--schema
flag
Validate against the full YAML schema with detailed error messages.
buster validate agents/my-agent.yaml --schema
Provides more detailed validation including:
  • Type checking for all fields
  • Enum value validation
  • Pattern matching for specific fields
  • Nested object validation

Debugging Strategies

Reading Agent Run Logs

When an agent doesn’t behave as expected, the run logs provide complete visibility into its execution.
Navigate to Runs in the web app to see all agent executions. Each run includes:
  1. Trigger Context - What caused the execution
    • PR number, author, changed files (for PR triggers)
    • Timestamp, cron expression (for scheduled triggers)
    • Event name, source, data (for event triggers)
  2. Files Accessed - Complete list of files read or modified
    • File paths and operation types
    • Content before/after for modifications
  3. SQL Queries - All queries executed
    • Full SQL text
    • Execution time
    • Rows returned
    • Any errors
  4. Tool Calls - Every tool invocation
    • Tool name and parameters
    • Output or return value
    • Execution duration
  5. Agent Reasoning - Decision-making process
    • Why certain actions were taken
    • How conclusions were reached
    • Alternative approaches considered
  6. Actions Taken - Final outcomes
    • PRs created with links
    • Comments posted
    • Files modified
    • Notifications sent
  7. Errors - Any failures
    • Error messages
    • Stack traces
    • Failed tool calls
Use filters to find specific runs:
  • By agent name
  • By trigger type
  • By status (success, failure, timeout)
  • By date range
  • By changed files
Search within run logs:
  • Search for specific file paths
  • Find SQL queries mentioning tables
  • Locate error messages
Export run logs for offline analysis or sharing:
buster logs get-run <run-id> --output run-log.json
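Once exported, the JSON can be inspected with standard command-line tools. The field names below are illustrative; check the structure of your own export:
# Pretty-print the exported run log
jq '.' run-log.json
# Example: show only failed tool calls (assumes a top-level tool_calls array with an error field)
jq '.tool_calls[] | select(.error != null)' run-log.json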

Common Issues and Solutions

Agent doesn't trigger when expected
Possible causes:
  1. Path filters don’t match
    # Check if files actually match the pattern
    paths:
      include: ["models/**/*.sql"]  # Won't match .yml files
    
  2. Branch filters exclude the PR
    branches:
      - main  # Won't trigger on PRs to develop
    
  3. Conditions not satisfied
    conditions:
      - type: pr_labels
        any_of: ["needs-review"]  # PR must have this label
    
  4. Event filters too restrictive
    filters:
      schema: raw.salesforce  # Won't trigger for other schemas
    
Solution: Review the run logs to see whether the trigger conditions were evaluated and which one failed.
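For instance, if the agent should also react to schema YAML changes, broaden the path filter so both file types match (a sketch using the same filter syntax as above):
paths:
  include: ["models/**/*.sql", "models/**/*.yml"]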
Agent runs but doesn't do what you expected
Possible causes:
  1. Insufficient tool permissions
    • Agent tried to use a tool not in its preset or include list
    • Restrictions prevented the operation
  2. Ambiguous prompt
    • Agent interpreted instructions differently than intended
    • Missing conditional logic
  3. File/directory permissions
    • Agent couldn’t access required files
    • Path outside allowed directories
Solution:
  1. Check tool calls in logs to see which tools were attempted
  2. Review agent reasoning to understand its interpretation
  3. Make prompt more specific with numbered steps and examples
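If the logs show a blocked tool call (cause 1 above), one fix is to grant the tool explicitly or widen the file restrictions. The tool name below is hypothetical; use the one reported in the run log:
tools:
  preset: standard
  include: ["create_pull_request"]  # hypothetical tool name; copy it from the run log
restrictions:
  files:
    allow: ["models/**", "docs/**"]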
Agent run fails with an error
Common errors:
  1. Permission denied
    Error: Cannot modify file 'dbt_project.yml' - in critical_files list
    
    Fix: Remove from critical_files or add approval gate
  2. File not found
    Error: File 'models/staging/customers.sql' does not exist
    
    Fix: Verify file paths, check for typos
  3. SQL timeout
    Error: Query exceeded timeout of 120 seconds
    
    Fix: Optimize query or increase restrictions.sql.timeout_seconds
  4. Rate limit
    Error: Agent has reached max_runs_per_hour limit (10)
    
    Fix: Adjust rate limits or fix trigger logic causing excessive runs
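For the timeout and rate-limit fixes above, the relevant restriction keys can be adjusted in the agent configuration (values here are illustrative):
restrictions:
  sql:
    timeout_seconds: 300      # raised from the 120 seconds shown in the error
  rate_limits:
    max_runs_per_hour: 20     # raise only after confirming the trigger isn't misfiring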
Agent takes the wrong action
Diagnosis:
  1. Review agent reasoning section to understand why it chose that action
  2. Check if prompt contains conditional logic that may have misfired
  3. Look for missing context or examples in the prompt
Fixes:
  1. Add explicit conditions:
    prompt: |
      If column count changed by more than 5:
        Create a PR
      Else if only column order changed:
        Post a comment only
      Else:
        Do nothing
    
  2. Provide examples:
    context:
      files:
        - "docs/NAMING_CONVENTIONS.md"
        - "examples/good_model.sql"
    
  3. Break into steps:
    prompt: |
      1. First, identify all changed models
      2. Then, for each model, check if...
      3. Only after checking all models, decide whether to...
    

Iterating on Prompts

Most debugging involves refining the agent prompt. Start broad and add specificity based on test results.

Iteration Strategy

1. Start with a high-level goal

prompt: |
  Update documentation for changed models.
Result: Too vague; the agent doesn't know what to document or how.
2. Add specific steps

prompt: |
  Update documentation for changed models.

  For each changed model:
    1. Use retrieve_metadata to get column statistics
    2. Update the YAML file with model and column descriptions
    3. Run dbt parse to validate
    4. Create a PR with the changes
Result: Better, but descriptions may not follow standards.
3. Add detailed requirements

prompt: |
  Update documentation for changed models.

  For each changed model:
    1. Use retrieve_metadata to profile:
       - Total row count
       - Column data types and null percentages
       - Distinct value counts
       - Min/max for numeric columns

    2. Update the YAML file:
       - Model description: purpose, grain, approximate row count
       - Column descriptions:
         - Business meaning (infer from name and usage)
         - Data type
         - Null rate as percentage
         - For categoricals: common values if <10 unique
         - For numerics: typical range

    3. Follow documentation standards:
       - Sentence case for descriptions
       - Include units for numeric columns
       - Explain expected NULL values
       - Link to related models when relevant

    4. Run `dbt parse` to validate YAML syntax
    5. Create PR titled "docs: Update model documentation"
       - Include list of models updated in PR body
       - Add labels: "documentation", "auto-generated"
Result: Comprehensive; the agent knows exactly what to do.

Testing Edge Cases

Test your agent with diverse scenarios:
Minimal changes:
  • PRs with no relevant file changes
  • Single file changes
  • Whitespace-only changes
Ensure the agent handles these gracefully without errors.
Large changes:
  • PRs with 50+ files changed
  • Multiple model layers affected
  • Complex SQL with CTEs and joins
Verify the agent doesn't time out or produce incomplete results.
Error conditions:
  • Invalid SQL syntax
  • Missing required files
  • Database connection failures
  • Git conflicts
Check that the agent provides helpful error messages.
Unusual data:
  • Tables with all NULLs
  • Empty tables
  • Wide tables (many columns)
  • Tables with unusual data types
Ensure the agent adapts to various data characteristics.

Dry Run Mode

Test agents without making actual changes to your systems.
testing.dry_run
boolean
Enable simulation mode for testing.
testing:
  dry_run: true
Dry run behavior:
  • ✓ Files read normally
  • ✓ SQL queries execute (read-only enforced)
  • ✗ Git operations simulated (no actual commits/pushes)
  • ✗ PRs logged but not created
  • ✗ Comments logged but not posted
  • ✗ Notifications logged but not sent
Agents won't take real actions in dry-run mode, so remember to remove dry_run: true before deploying to production.

Rate Limiting

Prevent runaway agents and control execution frequency.
restrictions.rate_limits
object
Configure rate limiting to prevent excessive agent runs.
restrictions:
  rate_limits:
    max_runs_per_hour: 10
    max_runs_per_day: 50
    cooldown_minutes: 5
Options:
  • max_runs_per_hour - Maximum executions in any 60-minute window
  • max_runs_per_day - Maximum executions in any 24-hour period
  • cooldown_minutes - Minimum time between runs
Use rate limits during initial testing to prevent accidental cost or resource issues.

Best Practices

Pre-Deployment Testing Checklist

1. Validate configuration

buster validate agents/my-agent.yaml --schema
Ensure no syntax or structural errors.
2. Test with mock data

buster test agents/my-agent.yaml --mock-pr --files "path/to/test/file.sql"
Verify behavior with controlled input.
3. Review execution logs

Check that agent:
  • Read correct files
  • Executed expected queries
  • Made intended decisions
  • Produced desired output
4. Test edge cases

Run with empty PRs, large PRs, error conditions.
5. Enable dry run initially

testing:
  dry_run: true
Deploy with dry run first to observe without side effects.
6. Monitor first real runs

Watch the first few production executions closely, and be ready to disable the agent if needed.

Start Restrictive

Begin with the safe preset and limited scope. Expand permissions as you gain confidence.
# Initial version
tools:
  preset: safe
restrictions:
  files:
    allow: ["models/staging/**"]

# After testing, expand
tools:
  preset: standard
restrictions:
  files:
    allow: ["models/**"]

Version Control for Agents

Track changes to agent configurations:
metadata:
  version: "1.3.0"
  changelog:
    - "Added baseline comparison for drift detection"
    - "Improved Slack message formatting"
    - "Fixed edge case with empty tables"
  last_updated: "2024-11-07"
Commit agent configurations to git with meaningful commit messages. This enables rollback if issues arise.
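For example (the commit message is illustrative; follow your team's conventions):
git add agents/daily-audit.yaml
git commit -m "agents: bump daily-audit to 1.3.0 (add baseline comparison for drift detection)"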

Progressive Rollout

Deploy agents gradually:
  1. Test locally - Verify with buster test
  2. Deploy with dry run - Observe without actions
  3. Deploy with manual trigger - Test on-demand
  4. Enable for specific paths - Limit scope initially
  5. Full deployment - Enable all triggers and paths
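For example, step 4 can be implemented by starting with a narrow path filter and widening it once the agent has proven reliable (a sketch reusing the paths syntax shown earlier):
# Initial limited scope
paths:
  include: ["models/staging/**/*.sql"]

# After confidence is established
paths:
  include: ["models/**/*.sql"]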

Monitoring and Alerts

Set up monitoring for agent health:
restrictions:
  alerts:
    notify_on_failure: true
    notify_on_timeout: true
    notification_channel: "#agent-monitoring"

monitoring:
  success_rate_threshold: 0.95  # Alert if success rate drops below 95%
  avg_duration_threshold: 300   # Alert if average run time exceeds 5 minutes

Troubleshooting Tools

Debug Mode

--debug
flag
Run tests with verbose debug output.
buster test agents/my-agent.yaml --debug
Shows additional information:
  • Internal agent state
  • Detailed tool call parameters
  • Intermediate reasoning steps
  • Cache hits/misses

Replay Mode

buster replay
command
Replay a previous run locally for debugging.
buster replay <run-id>
Re-executes the agent with the same trigger context, allowing you to test fixes against historical runs.

Diff Mode

--show-diff
flag
Display file changes made by the agent.
buster test agents/my-agent.yaml --show-diff
Shows before/after for all file modifications in unified diff format.

Getting Help