Testing agents before deployment and debugging issues when they arise are critical skills for building reliable automation. This guide covers local testing strategies, validation tools, debugging techniques, and best practices for iterating on agent configurations.

Testing Locally with the CLI

The Buster CLI provides a local testing environment that simulates agent execution without affecting production systems.

Installation

CLI Installation
setup
Install the Buster CLI globally using your preferred package manager:
npm install -g @buster/cli
Verify installation:
buster --version

Basic Testing

buster test
command
Execute an agent in a local sandbox environment.
buster test agents/my-agent.yaml
What happens:
  • Agent runs in isolated sandbox
  • Reads from your local repository
  • Connects to your warehouse using local credentials
  • All tools available (respects restrictions)
  • Dry-run mode (no actual pushes to GitHub)
Output includes:
  • Files read and modified
  • SQL queries executed with results
  • Tool calls and their outputs
  • Agent reasoning and decision-making
  • Actions attempted (PRs, comments, etc.)
  • Any errors or warnings
Review the complete execution trace to understand agent behavior before deploying.

Testing Pull Request Triggers

--mock-pr
flag
Simulate a pull request without opening a real one.
buster test agents/pr-reviewer.yaml --mock-pr \
  --files "models/marts/customers.sql,models/staging/orders.sql" \
  --pr-title "feat: Add new customer metrics" \
  --pr-author "alice"
Available mock options:
  • --files - Comma-separated list of changed files
  • --pr-title - Simulated PR title
  • --pr-author - Simulated PR author
  • --pr-number - Simulated PR number
  • --base-branch - Target branch (default: main)
  • --head-branch - Source branch
The agent will behave as if these files changed in a real PR, allowing you to test path filters and PR-specific logic.
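It's also worth running a mock PR that should not match your filters, to confirm the agent stays quiet (the file path below is illustrative):
buster test agents/pr-reviewer.yaml --mock-pr \
  --files "README.md" \
  --pr-title "chore: Update readme"
The execution trace should show that the path filter didn't match and that no actions were attempted.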

Testing Scheduled Triggers

--context
flag
Provide custom context variables for scheduled agent testing.
buster test agents/daily-audit.yaml \
  --context lookback_hours=24 \
  --context audit_scope=marts
Context usage:
  • Pass any key-value pairs defined in your trigger context
  • Access in prompts via template variables
  • Test different time windows and configurations
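As an illustration, a scheduled trigger might declare these context variables and reference them in the prompt. The trigger layout and templating syntax below are assumptions; match them to your actual agent configuration:
triggers:
  - type: schedule
    cron: "0 6 * * *"
    context:
      lookback_hours: 24
      audit_scope: marts
prompt: |
  Audit models in the {{ audit_scope }} layer, looking back {{ lookback_hours }} hours.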

Testing Event Triggers

--mock-event
flag
Simulate data stack events locally.
buster test agents/schema-change-handler.yaml --mock-event \
  --event-name schema_change_detected \
  --event-source fivetran \
  --event-data '{"table":"accounts","change_type":"column_added","columns":["last_activity_date"]}'
Event simulation:
  • Specify event type and source
  • Provide event-specific data as JSON
  • Test event filters and conditional logic
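For reference, the mock event above is the kind of payload an event trigger with filters would receive. The trigger fields below are illustrative and should mirror your real configuration:
triggers:
  - type: event
    event_name: schema_change_detected
    source: fivetran
    filters:
      change_type: column_added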

Validation

Validate agent configurations before testing execution.

YAML Syntax Validation

buster validate
command
Check agent configuration for syntax and structural errors.
buster validate agents/my-agent.yaml
Validates:
  • YAML syntax correctness
  • Required fields present (name, triggers, prompt)
  • Trigger configuration validity
  • Tool preset and tool list compatibility
  • Restriction consistency
  • No conflicting configurations
Example output:
✓ Valid YAML syntax
✓ Required fields present
✓ Trigger configuration valid
✗ Error: tools.preset 'safe' conflicts with tools.include 'delete_file'
✗ Warning: restrictions.files.allow has no effect without file operation tools
Run validation before every test to catch configuration errors early.
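For example, the preset conflict above could be resolved either by dropping the conflicting tool or by moving to a preset that allows it (a sketch; whether a given preset permits delete_file depends on your preset definitions):
# Option A: keep the safe preset and remove the conflicting tool
tools:
  preset: safe

# Option B: keep delete_file under a broader preset (assuming that preset allows it)
tools:
  preset: standard
  include: ["delete_file"]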

Batch Validation

buster validate [directory]
command
Validate all agents in a directory at once.
buster validate agents/
Checks every .yaml and .yml file in the directory and reports errors collectively.
Integrate this into your CI/CD pipeline to prevent invalid configurations from reaching production.
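For example, a CI step could block merges when any agent configuration fails validation. This sketch uses GitHub Actions purely as an illustration; adapt it to your CI system:
# .github/workflows/validate-agents.yml (illustrative)
name: Validate agents
on:
  pull_request:
    paths: ["agents/**"]
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm install -g @buster/cli
      - run: buster validate agents/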

Schema Validation

--schema
flag
Validate against the full YAML schema with detailed error messages.
buster validate agents/my-agent.yaml --schema
Provides more detailed validation including:
  • Type checking for all fields
  • Enum value validation
  • Pattern matching for specific fields
  • Nested object validation

Debugging Strategies

Reading Agent Run Logs

When an agent doesn’t behave as expected, the run logs provide complete visibility into its execution.
Navigate to Runs in the web app to see all agent executions. Each run includes:
  1. Trigger Context - What caused the execution
    • PR number, author, changed files (for PR triggers)
    • Timestamp, cron expression (for scheduled triggers)
    • Event name, source, data (for event triggers)
  2. Files Accessed - Complete list of files read or modified
    • File paths and operation types
    • Content before/after for modifications
  3. SQL Queries - All queries executed
    • Full SQL text
    • Execution time
    • Rows returned
    • Any errors
  4. Tool Calls - Every tool invocation
    • Tool name and parameters
    • Output or return value
    • Execution duration
  5. Agent Reasoning - Decision-making process
    • Why certain actions were taken
    • How conclusions were reached
    • Alternative approaches considered
  6. Actions Taken - Final outcomes
    • PRs created with links
    • Comments posted
    • Files modified
    • Notifications sent
  7. Errors - Any failures
    • Error messages
    • Stack traces
    • Failed tool calls
Use filters to find specific runs:
  • By agent name
  • By trigger type
  • By status (success, failure, timeout)
  • By date range
  • By changed files
Search within run logs:
  • Search for specific file paths
  • Find SQL queries mentioning tables
  • Locate error messages
Export run logs for offline analysis or sharing:
buster logs get-run <run-id> --output run-log.json
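Once exported, the JSON can be inspected with standard command-line tools. The field names below are illustrative; check the structure of your own export:
# Pretty-print the exported run log
jq '.' run-log.json
# Example: show only failed tool calls (assumes a top-level tool_calls array with an error field)
jq '.tool_calls[] | select(.error != null)' run-log.json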

Common Issues and Solutions

Agent doesn't trigger when expected
Possible causes:
  1. Path filters don’t match
    # Check if files actually match the pattern
    paths:
      include: ["models/**/*.sql"]  # Won't match .yml files
    
  2. Branch filters exclude the PR
    branches:
      - main  # Won't trigger on PRs to develop
    
  3. Conditions not satisfied
    conditions:
      - type: pr_labels
        any_of: ["needs-review"]  # PR must have this label
    
  4. Event filters too restrictive
    filters:
      schema: raw.salesforce  # Won't trigger for other schemas
    
Solution: Review the run logs to see whether the trigger conditions were evaluated and which one failed.
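For instance, if the agent should also react to schema YAML changes, broaden the path filter so both file types match (a sketch using the same filter syntax as above):
paths:
  include: ["models/**/*.sql", "models/**/*.yml"]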
Agent runs but doesn't do what you expected
Possible causes:
  1. Insufficient tool permissions
    • Agent tried to use a tool not in its preset or include list
    • Restrictions prevented the operation
  2. Ambiguous prompt
    • Agent interpreted instructions differently than intended
    • Missing conditional logic
  3. File/directory permissions
    • Agent couldn’t access required files
    • Path outside allowed directories
Solution:
  1. Check tool calls in logs to see which tools were attempted
  2. Review agent reasoning to understand its interpretation
  3. Make prompt more specific with numbered steps and examples
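If the logs show a blocked tool call (cause 1 above), one fix is to grant the tool explicitly or widen the file restrictions. The tool name below is hypothetical; use the one reported in the run log:
tools:
  preset: standard
  include: ["create_pull_request"]  # hypothetical tool name; copy it from the run log
restrictions:
  files:
    allow: ["models/**", "docs/**"]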
Agent run fails with an error
Common errors:
  1. Permission denied
    Error: Cannot modify file 'dbt_project.yml' - in critical_files list
    
    Fix: Remove from critical_files or add approval gate
  2. File not found
    Error: File 'models/staging/customers.sql' does not exist
    
    Fix: Verify file paths, check for typos
  3. SQL timeout
    Error: Query exceeded timeout of 120 seconds
    
    Fix: Optimize query or increase restrictions.sql.timeout_seconds
  4. Rate limit
    Error: Agent has reached max_runs_per_hour limit (10)
    
    Fix: Adjust rate limits or fix trigger logic causing excessive runs
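For the timeout and rate-limit fixes above, the relevant restriction keys can be adjusted in the agent configuration (values here are illustrative):
restrictions:
  sql:
    timeout_seconds: 300      # raised from the 120 seconds shown in the error
  rate_limits:
    max_runs_per_hour: 20     # raise only after confirming the trigger isn't misfiring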
Agent takes the wrong action
Diagnosis:
  1. Review agent reasoning section to understand why it chose that action
  2. Check if prompt contains conditional logic that may have misfired
  3. Look for missing context or examples in the prompt
Fixes:
  1. Add explicit conditions:
    prompt: |
      If column count changed by more than 5:
        Create a PR
      Else if only column order changed:
        Post a comment only
      Else:
        Do nothing
    
  2. Provide examples:
    context:
      files:
        - "docs/NAMING_CONVENTIONS.md"
        - "examples/good_model.sql"
    
  3. Break into steps:
    prompt: |
      1. First, identify all changed models
      2. Then, for each model, check if...
      3. Only after checking all models, decide whether to...
    

Iterating on Prompts

Most debugging involves refining the agent prompt. Start broad and add specificity based on test results.

Iteration Strategy

1. Start with a high-level goal

prompt: |
  Update documentation for changed models.
Result: Too vague; the agent doesn't know what to document or how.
2. Add specific steps

prompt: |
  Update documentation for changed models.

  For each changed model:
    1. Use retrieve_metadata to get column statistics
    2. Update the YAML file with model and column descriptions
    3. Run dbt parse to validate
    4. Create a PR with the changes
Result: Better, but descriptions may not follow standards.
3. Add detailed requirements

prompt: |
  Update documentation for changed models.

  For each changed model:
    1. Use retrieve_metadata to profile:
       - Total row count
       - Column data types and null percentages
       - Distinct value counts
       - Min/max for numeric columns

    2. Update the YAML file:
       - Model description: purpose, grain, approximate row count
       - Column descriptions:
         - Business meaning (infer from name and usage)
         - Data type
         - Null rate as percentage
         - For categoricals: common values if <10 unique
         - For numerics: typical range

    3. Follow documentation standards:
       - Sentence case for descriptions
       - Include units for numeric columns
       - Explain expected NULL values
       - Link to related models when relevant

    4. Run `dbt parse` to validate YAML syntax
    5. Create PR titled "docs: Update model documentation"
       - Include list of models updated in PR body
       - Add labels: "documentation", "auto-generated"
Result: Comprehensive; the agent knows exactly what to do.

Testing Edge Cases

Test your agent with diverse scenarios:
Minimal changes:
  • PRs with no relevant file changes
  • Single file changes
  • Whitespace-only changes
Ensure the agent handles these gracefully without errors.
Large changes:
  • PRs with 50+ files changed
  • Multiple model layers affected
  • Complex SQL with CTEs and joins
Verify the agent doesn't time out or produce incomplete results.
Error conditions:
  • Invalid SQL syntax
  • Missing required files
  • Database connection failures
  • Git conflicts
Check that the agent provides helpful error messages.
Unusual data:
  • Tables with all NULLs
  • Empty tables
  • Wide tables (many columns)
  • Tables with unusual data types
Ensure the agent adapts to various data characteristics.

Dry Run Mode

Test agents without making actual changes to your systems.
testing.dry_run
boolean
Enable simulation mode for testing.
testing:
  dry_run: true
Dry run behavior:
  • ✓ Files read normally
  • ✓ SQL queries execute (read-only enforced)
  • ✗ Git operations simulated (no actual commits/pushes)
  • ✗ PRs logged but not created
  • ✗ Comments logged but not posted
  • ✗ Notifications logged but not sent
Agents won't take real actions in dry-run mode, so remember to remove dry_run: true before deploying to production.

Rate Limiting

Prevent runaway agents and control execution frequency.
restrictions.rate_limits
object
Configure rate limiting to prevent excessive agent runs.
restrictions:
  rate_limits:
    max_runs_per_hour: 10
    max_runs_per_day: 50
    cooldown_minutes: 5
Options:
  • max_runs_per_hour - Maximum executions in any 60-minute window
  • max_runs_per_day - Maximum executions in any 24-hour period
  • cooldown_minutes - Minimum time between runs
Use rate limits during initial testing to prevent accidental cost or resource issues.

Best Practices

Pre-Deployment Testing Checklist

1. Validate configuration

buster validate agents/my-agent.yaml --schema
Ensure no syntax or structural errors.
2. Test with mock data

buster test agents/my-agent.yaml --mock-pr --files "path/to/test/file.sql"
Verify behavior with controlled input.
3. Review execution logs

Check that agent:
  • Read correct files
  • Executed expected queries
  • Made intended decisions
  • Produced desired output
4. Test edge cases

Run with empty PRs, large PRs, error conditions.
5. Enable dry run initially

testing:
  dry_run: true
Deploy with dry run first to observe without side effects.
6. Monitor first real runs

Watch the first few production executions closely, and be ready to disable the agent if needed.

Start Restrictive

Begin with the safe preset and limited scope. Expand permissions as you gain confidence.
# Initial version
tools:
  preset: safe
restrictions:
  files:
    allow: ["models/staging/**"]

# After testing, expand
tools:
  preset: standard
restrictions:
  files:
    allow: ["models/**"]

Version Control for Agents

Track changes to agent configurations:
metadata:
  version: "1.3.0"
  changelog:
    - "Added baseline comparison for drift detection"
    - "Improved Slack message formatting"
    - "Fixed edge case with empty tables"
  last_updated: "2024-11-07"
Commit agent configurations to git with meaningful commit messages. This enables rollback if issues arise.
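For example (the commit message is illustrative; follow your team's conventions):
git add agents/daily-audit.yaml
git commit -m "agents: bump daily-audit to 1.3.0 (add baseline comparison for drift detection)"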

Progressive Rollout

Deploy agents gradually:
  1. Test locally - Verify with buster test
  2. Deploy with dry run - Observe without actions
  3. Deploy with manual trigger - Test on-demand
  4. Enable for specific paths - Limit scope initially
  5. Full deployment - Enable all triggers and paths
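For example, step 4 can be implemented by starting with a narrow path filter and widening it once the agent has proven reliable (a sketch reusing the paths syntax shown earlier):
# Initial limited scope
paths:
  include: ["models/staging/**/*.sql"]

# After confidence is established
paths:
  include: ["models/**/*.sql"]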

Monitoring and Alerts

Set up monitoring for agent health:
restrictions:
  alerts:
    notify_on_failure: true
    notify_on_timeout: true
    notification_channel: "#agent-monitoring"

monitoring:
  success_rate_threshold: 0.95  # Alert if success rate drops below 95%
  avg_duration_threshold: 300   # Alert if average run time exceeds 5 minutes

Troubleshooting Tools

Debug Mode

--debug
flag
Run tests with verbose debug output.
buster test agents/my-agent.yaml --debug
Shows additional information:
  • Internal agent state
  • Detailed tool call parameters
  • Intermediate reasoning steps
  • Cache hits/misses

Replay Mode

buster replay
command
Replay a previous run locally for debugging.
buster replay <run-id>
Re-executes the agent with the same trigger context, allowing you to test fixes against historical runs.

Diff Mode

--show-diff
flag
Display file changes made by the agent.
buster test agents/my-agent.yaml --show-diff
Shows before/after for all file modifications in unified diff format.

Getting Help