An agent is a YAML configuration file that defines when and how to automate a specific data engineering task. Agents respond to triggers like pull requests, schedules, or pipeline failures, and can take actions like creating PRs, posting comments, running SQL, and sending Slack notifications. This guide covers everything you need to create, configure, and deploy effective agents.

Getting Started

To create an agent, you need three core elements: a name, at least one trigger, and a prompt. The simplest agent looks like this:
my-agent.yml
name: my-first-agent

prompt: |
  Review this PR for dbt best practices.
  Comment if you find any issues.

triggers:
  - event: pull_request
Start with a single, specific task. You can always create additional agents for other workflows.

Enable Schema Validation

Add type safety and autocomplete to your agent files by including a JSON schema reference at the top of your YAML file:
agent.yaml
# yaml-language-server: $schema=https://platform.buster.so/schemas/v1/custom-agent.schema.json

name: my-first-agent
triggers:
  - event: pull_request
prompt: |
  Review this PR for dbt best practices.
  Comment if you find any issues.
VS Code users: You MUST have the YAML extension by Red Hat installed for schema validation and autocomplete to work.
The schema reference enables real-time validation and autocomplete in your IDE, so you can catch configuration errors before deployment.

Project Setup

Agents are referenced from your buster.yml configuration file:
buster.yml
projects:
  - name: analytics
    data_source: my_datasource
    schema: public

agents:
  - .buster/agents/my-agent.yml
Then deploy with:
buster deploy

Reference

Core Fields

name (string, required)
Unique identifier for your agent. Used in logs, the web interface, and GitHub Check Runs.
Example: documentation_agent, pr-reviewer
description (string)
A brief explanation of what the agent does. Appears in the web interface and helps team members understand the agent’s purpose.
Example: Automatically maintains dbt model documentation
prompt (string, required)
Instructions for the agent in natural language. Write this as if you were instructing a colleague: be clear about goals but let the agent determine implementation details.
The agent has access to:
  • Your repository files (read, write, edit)
  • Your data warehouse (run SQL queries, retrieve metadata)
  • dbt project context (if configured)
  • Git operations (commits, branches, PRs via bash/gh commands)
  • Slack (if slack_tool is enabled)
Structure your prompt with clear steps when you want a specific workflow. Use open-ended instructions when you want the agent to figure out the best approach.
triggers (array, required)
Defines when the agent runs. You must specify at least one trigger. See Triggers & Scheduling for complete details.
triggers:
  - event: pull_request
    types: ['opened', 'synchronize']
    branches: ['main']
    includes:
      - "models/**/*.sql"
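Because triggers is an array, a single agent can presumably list more than one entry. The following is a minimal sketch under that assumption: the agent name and prompt are hypothetical, and it assumes each trigger in the array fires independently of the others.

```yaml
# Hypothetical agent: name and prompt are illustrative only.
name: model-reviewer

prompt: |
  Review the changed dbt models for naming-convention violations.
  Report any violations you find.

triggers:
  # First trigger: new or updated pull requests.
  - event: pull_request
    types: ['opened', 'synchronize']
    branches: ['main']
    includes:
      - "models/**/*.sql"
  # Second trigger (assumed to fire independently): pushes to main.
  - event: push
    branches: ['main']
    includes:
      - "models/**/*.sql"
```

If the two workflows need different instructions, separate agents with one trigger each are likely easier to reason about.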
tools (object)
Configure additional tools for the agent.

Triggers

Agents support four trigger types:
| Event | Description |
| --- | --- |
| pull_request | Runs when PRs are opened or updated |
| push | Runs when commits are pushed to branches |
| scheduled | Runs on a cron schedule |
| airflow | Runs when Airflow pipelines fail |

Pull Request Trigger

triggers:
  - event: pull_request
    branches: ['main', 'develop']
    types: ['opened', 'synchronize', 'reopened']
    includes:
      - "models/**/*.sql"
      - "models/**/*.yml"
    auto_commit: false
event ("pull_request", required)
Trigger type for pull request events.
branches (array)
Branch patterns to match. Defaults to ["*"] (all branches).
types (array)
PR event types that trigger the agent:
  • opened — PR was created
  • synchronize — New commits pushed to PR
  • reopened — PR was reopened after being closed
includes (array)
Glob patterns for file filtering. The agent triggers only if at least one changed file matches a pattern.
includes:
  - "models/**"
  - "schemas/**/*.yml"
auto_commit (boolean)
Controls how the agent commits changes:
  • false (default) — Creates a new branch from the PR’s head and opens a new PR with changes. Safe for collaboration.
  • true — Commits directly to the PR’s branch. Can cause conflicts if others are also pushing.
Use auto_commit: true with caution. It can cause merge conflicts if multiple people are working on the same PR.
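As an illustration of the trade-off above, here is a sketch of a trigger that opts into direct commits. The agent name and prompt are hypothetical. Limiting types to opened is one possible way to reduce how often the agent pushes to a branch others are working on; it is a design choice for this example, not a documented requirement.

```yaml
# Hypothetical agent that commits fixes straight onto the PR's branch.
name: sql-formatter

prompt: |
  Fix formatting problems in the changed SQL models.
  Commit the fixes and post a short summary comment on the PR.

triggers:
  - event: pull_request
    types: ['opened']        # run once when the PR is created, not on every new commit
    includes:
      - "models/**/*.sql"
    auto_commit: true        # commits land directly on the PR's branch
```

Leaving auto_commit at its default of false keeps the agent's changes in a separate PR that can be reviewed before merging.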

Push Trigger

triggers:
  - event: push
    branches: ['main']
    includes:
      - "**/*.sql"
Fires when commits are pushed directly to a branch.
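For context, a push trigger sits inside a complete agent file like any other trigger. The sketch below uses only fields described in this guide; the agent name, description, and prompt are hypothetical.

```yaml
# Hypothetical post-merge agent built around the push trigger.
name: post-merge-docs-check
description: Checks model documentation after changes land on main

prompt: |
  1. Identify the SQL models changed in this push
  2. Check that each one has a description in its YAML file
  3. If any descriptions are missing, open a PR that adds them

triggers:
  - event: push
    branches: ['main']       # only pushes to main
    includes:
      - "models/**/*.sql"    # only when a model file changed
```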

Scheduled Trigger

triggers:
  - event: scheduled
    cron: "0 9 * * 1"  # Every Monday at 9 AM UTC
    branches: ['main']
    includes:
      - "models/**"
cron (string, required)
Cron expression (5 fields): minute hour day month weekday
Common patterns:
  • 0 * * * * — Every hour
  • 0 9 * * * — Daily at 9 AM UTC
  • 0 9 * * 1 — Every Monday at 9 AM UTC
  • */15 * * * * — Every 15 minutes
All schedules use UTC. The agent runs only if files matching the includes patterns have changed since the last run.
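The five fields can also express ranges in standard cron syntax. The sketch below assumes the scheduler accepts a weekday range such as 1-5, which this guide does not confirm; the agent name and prompt are hypothetical.

```yaml
# Hypothetical scheduled audit agent.
name: weekday-docs-audit

prompt: |
  Audit documentation coverage for the dbt models that changed.
  Open a PR that fills in any missing column descriptions.

triggers:
  - event: scheduled
    # minute=0, hour=9, any day of month, any month, weekday=1-5
    # i.e. Monday through Friday at 9 AM UTC (range support is an assumption)
    cron: "0 9 * * 1-5"
    branches: ['main']
    includes:
      - "models/**"
```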

Airflow Trigger

triggers:
  - event: airflow
    type: dag_run_failed
    includes:
      - "critical_pipeline"
      - "data_quality_dag"
type (string, required)
Airflow event type:
  • dag_run_failed — DAG run failed
  • task_instance_failed — Individual task failed
includes (array)
DAG or task names to match. The agent triggers only for failures in a matching DAG or task.
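A task-level variant might look like the sketch below, which pairs the airflow trigger with slack_tool so the agent can report what it finds. The agent name, prompt, and channel name are hypothetical, and the example assumes includes matches on the DAG name for task_instance_failed events as well.

```yaml
# Hypothetical failure-response agent.
name: pipeline-failure-responder

prompt: |
  An Airflow task has failed.
  1. Use RunSql to check whether the affected tables are stale or incomplete
  2. Send a short summary of your findings to the #data-alerts channel

triggers:
  - event: airflow
    type: task_instance_failed   # fires on individual task failures
    includes:
      - "critical_pipeline"      # assumed to match the DAG name

tools:
  include:
    - slack_tool                 # needed for the Slack step in the prompt
```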

Available Tools

Agents have access to these tools by default:
| Tool | Description |
| --- | --- |
| Read | Read file contents |
| Write | Create or overwrite files |
| Edit | Edit existing files with search-and-replace |
| Bash | Execute shell commands (git, dbt, python, etc.) |
| Grep | Search file contents |
| Glob | Find files by pattern |
| LS | List directory contents |
| RunSql | Execute SQL queries against your data warehouse |
| RetrieveMetadata | Get column statistics, distributions, and cardinality |

Optional Tools

Enable additional tools with tools.include:
| Tool | Description |
| --- | --- |
| slack_tool | Send messages to Slack channels |
tools:
  include:
    - slack_tool

Git Operations

Agents perform Git operations through the Bash tool using git and gh (GitHub CLI) commands:
prompt: |
  After making changes:
  1. Create a new branch: git checkout -b agent/docs-update
  2. Commit changes: git add . && git commit -m "docs: Update model documentation"
  3. Push and create PR: gh pr create --title "Documentation update" --body "Auto-generated by Buster"

Examples

documentation-agent.yml
name: documentation_agent
description: Automatically updates dbt model documentation

prompt: |
  When triggered by a pull request:
  
  1. Identify all changed SQL model files
  2. For each changed model:
     - Use RetrieveMetadata to get column statistics
     - Update the corresponding YAML file with:
       - Model description (purpose, grain, approximate row count)
       - Column descriptions with data types and null rates
  3. Run `dbt parse` to validate changes
  4. If valid, commit changes and post a summary comment
  5. If invalid, post the error as a PR comment

triggers:
  - event: pull_request
    branches: ['main', 'develop']
    types: ['opened', 'synchronize']
    includes:
      - "models/**/*.sql"
    auto_commit: false

tools:
  include:
    - slack_tool

Best Practices

Writing Effective Prompts

Instead of “check for issues,” specify what issues to look for and what to do when found.
# Vague
prompt: Check this PR

# Specific
prompt: |
  Check if new columns in staging models follow our naming convention:
  - Boolean columns must start with "is_" or "has_"
  - Date columns must end with "_date" or "_at"
  - Amount columns must end with "_amount" or "_value"
  
  If violations are found, comment on the PR with specific file and line numbers.
For multi-step workflows, use numbered lists to clarify sequence and dependencies.
prompt: |
  1. First, identify all changed models in this PR
  2. For each model, use RetrieveMetadata to get column stats
  3. Update YAML documentation based on the metadata
  4. Run dbt parse to validate syntax
  5. If validation passes, commit changes
  6. Post a summary comment listing what was updated
Tell the agent how to present results—should it create a PR, post a comment, send a Slack message?
prompt: |
  Generate a documentation coverage report.
  
  Format the report as a Slack message with:
  - Summary statistics at the top
  - Emoji indicators (✅ good, ⚠️ warning, ❌ critical)
  - Bulleted list of models needing attention
  
  Send to #data-documentation channel.
Define what the agent should do when things go wrong.
prompt: |
  Update documentation for changed models.
  
  If dbt parse fails:
  - Do NOT commit the changes
  - Post the error message as a PR comment
  - Tag @data-platform-team for review
  
  If unable to connect to the warehouse:
  - Post a comment explaining the issue
  - Do not block the PR

Choosing the Right Trigger

Match your trigger to the workflow pattern:
  • pull_request — Code review, validation, documentation updates
  • push — Post-merge automation, deployment triggers
  • scheduled — Regular audits, monitoring, batch processing
  • airflow — Pipeline failure response

Testing Agents

Before deploying to production:
  1. Validate configuration:
    buster deploy --dry-run --verbose
    
  2. Test with a sample PR: Create a test branch and open a PR to see how the agent behaves.
  3. Monitor initial runs: Check the Runs tab in the web app to see execution logs and outputs.
  4. Iterate on the prompt: Refine based on actual behavior—add specificity where the agent made unexpected decisions.

Deployment

After creating your agent:
  1. Reference it in buster.yml:
    agents:
      - .buster/agents/my-agent.yml
    
  2. Deploy:
    buster deploy
    
  3. Verify: Check the Buster web app to confirm your agent appears in the list.
The agent will start responding to its configured triggers immediately after deployment.
Every agent execution creates a GitHub Check Run and is logged in the Buster web app for full transparency and debugging.