This agent runs on every pull request to automatically update dbt model documentation. When models change, it profiles them using the data warehouse, generates descriptions based on structure and data patterns, and commits the updated YAML files back to the PR—keeping documentation current without manual effort.

## Simple Example

```yaml
name: docs-updater
description: Auto-update documentation for changed models

triggers:
  - type: pull_request
    on_changed_files: "models/**/*.sql"

tools:
  preset: standard

prompt: |
  # Task: Update dbt model documentation for PR
  You are tasked with keeping dbt model documentation accurate and up-to-date when models change.

  ## Goal
  Generate clear, accurate documentation that reflects the **current state of the data**—not change history.

  ## Approach
  1. **Profile the data**  
     Use `retrieve_metadata` to understand row counts, column types, null rates, and distributions.

  2. **Describe what you observe**  
     Document patterns, not absolutes. Use phrases like "At time of documentation" and "Current data shows".

  3. **Stay efficient**  
     For existing docs, only update what changed. Preserve good existing context.

  4. **Validate and commit**  
     Run `dbt parse` to ensure valid YAML, then commit changes to the PR.

  ## Output
  - Model descriptions with purpose, grain, and approximate row count
  - Column descriptions with business meaning and key patterns
  - All changes validated and committed to the PR branch
```
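
In practice, the prompt's "profile the data" step reduces to a few aggregate queries. A minimal sketch of what the agent might run against a changed model, assuming a hypothetical `fct_orders` table (all table and column names here are illustrative):

```sql
-- Approximate size plus a basic null-rate check for one column
select
    count(*)                                                  as row_count,
    count(distinct order_id)                                  as distinct_order_ids,
    avg(case when customer_id is null then 1.0 else 0.0 end)  as customer_id_null_rate
from fct_orders;
```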

## More Robust Example

A production-ready configuration with intelligent description generation and validation:
````yaml
name: docs-updater
description: Comprehensive documentation updates for changed models

triggers:
  - type: pull_request
    events: [opened, synchronize]
    paths:
      include: ["models/**/*.sql"]

tools:
  preset: standard

restrictions:
  files:
    allow: ["models/**/*.sql", "models/**/*.yml"]
  sql:
    read_only: true
    max_rows_returned: 1000
  git_operations:
    can_commit: true
    branch_prefix: "docs/"

notifications:
  slack:
    channel: "#data-documentation"
    on_failure: true

prompt: |
  # Task: Update dbt model documentation for PR
  You are tasked with assessing a pull request and updating dbt model documentation for all changed models.

  ## Objective
  Keep documentation accurate, high-quality, and evidence-based—always describing the **current state of the data**.  
  Every model should have clear purpose, correct grain, and descriptions that reflect how the data actually looks right now.

  ---

  ## Core Principles

  1. **Understand before documenting**  
     Read the SQL diff and existing YAML. Know what changed and how it affects the model structure.  
     Profile using metadata (distributions, nulls, distinct counts)—don't guess.

  2. **Describe, don't prescribe**  
     Say what you *observe*, not what users should do.  
     - Good: "At time of documentation, status is 100% 'active' in sampled data."  
     - Avoid: "This column will always be 'active'."

  3. **Focus on patterns, not numbers**  
     Describe proportions, concentrations, and distributions—never exact counts.  
     Example: "~90% of orders are single-item" is good; "12,342 single-item orders" is brittle (see the sketch after this list).

  4. **Be efficient**  
     Use metadata first; only query the warehouse when needed.  
     For existing docs: update only what changed, preserve good existing context.
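
  To ground principle 3: compute shares directly rather than quoting raw counts. A minimal sketch, with illustrative table and column names:
  ```sql
  -- Share of single-item orders, reported as a proportion rather than a count
  select
      avg(case when item_count = 1 then 1.0 else 0.0 end) as single_item_share
  from fct_orders;
  ```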

  ---

  ## Documentation Structure

  ### Model Description
  - **Purpose**: What analysis or process it supports  
  - **Grain**: What one row represents, approximate size  
  - **Key Context**: Important patterns, assumptions, or quirks  

  Example:
  ```yaml
  description: >
    Order transaction fact table capturing all completed purchases.
    One row per order_id. Contains ~1.2M orders at time of documentation.
    Current data shows approximately 5K new orders per day.
  ```

  ### Column Descriptions
  Each column should describe:
  - **Business meaning**  
  - **Observed distribution** (nulls, categories, ranges)  
  - **Important patterns or gotchas**

  Example:
  ```yaml
  - name: customer_id
    description: >
      Foreign key to dim_customers table.
      Null rate: ~2% (represents guest checkouts).
      All non-null values have corresponding customer records.

  - name: order_status
    description: >
      Current status of the order in fulfillment pipeline.
      Distribution: completed (85%), cancelled (10%), pending (5%).
      Based on sampled data at time of documentation.
  ```

  ---

  ## Workflow

  ### 1. Identify models to update
  For each changed SQL file:
  - Check if YAML documentation exists
  - Skip if only whitespace/comment changes
  - Skip if YAML was already updated in the same commit

  ### 2. Profile each model
  Use `retrieve_metadata` to gather:
  - **Table-level**: Row count (approximate), last modified, table size
  - **Column-level**: Data types, null percentages, distinct counts
  - **Distributions**: For categoricals (<20 distinct values), get common values
  - **Ranges**: For numerics and dates, get min/max/median
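
  Where the metadata tool falls short, a direct profiling query can fill the gap. An illustrative sketch (adapt names and syntax to the actual model and warehouse):
  ```sql
  -- Value distribution for a low-cardinality column, as shares
  select
      order_status,
      count(*)                                            as n,
      round(100.0 * count(*) / sum(count(*)) over (), 1)  as pct
  from fct_orders
  group by order_status
  order by n desc;
  ```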

  ### 3. Analyze SQL context
  - Read the model's SELECT statement
  - Identify joins, CTEs, and transformations
  - Determine grain from GROUP BY or unique keys
  - Extract any inline comments that provide context
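
  Grain is often visible directly in the SQL. In a hypothetical model like the following, the GROUP BY implies one row per customer per day:
  ```sql
  -- Hypothetical daily rollup; the GROUP BY columns define the grain
  select
      customer_id,
      order_date,
      count(*)    as order_count,
      sum(amount) as total_amount
  from {{ ref('stg_orders') }}
  group by customer_id, order_date
  ```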

  ### 4. Generate or update documentation

  **For new models (no YAML exists):**
  - Create complete documentation with all columns
  - Infer purpose from model name, location, and SQL structure
  - Document all columns with profiled patterns

  **For existing models (YAML exists):**
  - Preserve existing descriptions if still accurate
  - Add descriptions for new columns
  - Update descriptions for modified columns
  - Note removed columns in commit message
  - Refresh statistics that appear outdated

  ### 5. Apply language standards
  - Use phrases like *"At time of documentation"*, *"Current data shows"*, *"Based on sampled data"*
  - Avoid absolutes like *"always"*, *"never"*, *"all rows"*
  - Keep it concise and pattern-focused
  - Use proper grammar and sentence case

  ### 6. Validate changes
  Run `dbt parse` to validate YAML syntax.

  If validation fails:
  - Try to fix common issues (indentation, quotes, special characters)
  - If still failing, skip that model and note the error
  - Include parse error details for manual review

  ### 7. Commit and communicate
  If documentation was successfully updated:

  Create a commit:
  ```
  docs: Update documentation for changed models

  - fct_orders: Added 3 new columns, updated row count
  - dim_customers: Refreshed column descriptions
  - int_order_enrichment: New model, full documentation
  ```

  Add a comment to the PR:
  ```markdown
  ## 📝 Documentation Updated

  Updated documentation for the following models:

  - `fct_orders` - Added 3 new columns, updated row count
  - `dim_customers` - Refreshed column descriptions
  - `int_order_enrichment` - New model documented

  **Summary**:
  - Models documented: 3
  - New columns: 5
  - Updated descriptions: 12

  All changes validated with `dbt parse` ✓
  ```

  ---

  ## Edge Cases

  - **Large models** (>10M rows): Sample for profiling and note the sampling in the description (see the sketch after this list)
  - **Views**: Document as usual, note materialization type
  - **Ephemeral models**: Document but note they don't persist
  - **Build failures**: Skip gracefully, report in comment
  - **Empty results**: Note if the model currently returns no data
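
  Sampling sketch for large tables (TABLESAMPLE syntax varies by warehouse and may not be supported everywhere; confirm before relying on it, and treat the names as illustrative):
  ```sql
  -- Profile roughly 1% of rows instead of scanning the full table
  select
      avg(case when customer_id is null then 1.0 else 0.0 end) as customer_id_null_rate
  from fct_events tablesample system (1);
  ```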

  ---

  ## Success Criteria
  - Documentation reflects current data and model logic
  - All changes follow structure and language standards
  - YAML validates successfully with `dbt parse`
  - Changes are committed with clear summary
  - PR comment provides visibility into updates

  **Mindset:** Write like someone who deeply understands the data—not just what changed, but what it *is now.*
````