This agent runs on every pull request to automatically update stack documentation. When datasets or pipelines change, it profiles them using the data warehouse, generates descriptions based on structure and data patterns, and commits the updated documentation to the context layer repository—keeping documentation current without manual effort.
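The profiling step can be reduced to a single aggregate query per table. Below is a minimal sketch of how such a query might be assembled; the table and column names are illustrative, and the real agent would use its retrieve_metadata tool rather than hand-written SQL:

```python
def build_profile_query(table: str, columns: list[str]) -> str:
    """Build one aggregate query returning the row count plus, for each
    column, its null rate and distinct count."""
    parts = ["COUNT(*) AS row_count"]
    for col in columns:
        # Null rate as a fraction, computed by averaging a 0/1 indicator.
        parts.append(
            f"AVG(CASE WHEN {col} IS NULL THEN 1.0 ELSE 0.0 END) AS {col}_null_rate"
        )
        parts.append(f"COUNT(DISTINCT {col}) AS {col}_distinct_count")
    return f"SELECT {', '.join(parts)} FROM {table}"

print(build_profile_query("fct_orders", ["order_id", "customer_id"]))
```

One query per model keeps warehouse load predictable, and the resulting null rates and distinct counts feed directly into the generated descriptions.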

Simple Example

name: docs-updater
description: Auto-update documentation for changed models

triggers:
  - type: pull_request
    on_changed_files: "models/**/*.sql"

tools:
  preset: standard

prompt: |
  # Task: Update stack documentation for PR
  You are tasked with keeping stack documentation accurate and up-to-date when datasets or pipelines change.

  ## Goal
  Generate clear, accurate documentation that reflects the **current state of the data**—not change history.

  ## Approach
  1. **Profile the data**  
     Use retrieve_metadata to understand row counts, column types, null rates, and distributions.

  2. **Describe what you observe**  
     Document patterns, not absolutes. Use phrases like "At time of documentation" and "Current data shows".

  3. **Stay efficient**  
     For existing docs, only update what changed. Preserve good existing context.

  4. **Validate and commit**  
     Validate the documentation format, then commit changes to the context layer repository.

  ## Output
  - Model descriptions with purpose, grain, and approximate row count
  - Column descriptions with business meaning and key patterns
  - All changes validated and committed to the PR branch
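Turning profile numbers into the hedged, pattern-focused phrasing the prompt asks for is largely mechanical. A sketch under stated assumptions (the function name and inputs are made up for illustration, not part of the agent's toolset):

```python
def describe_column(null_rate, top_values=None):
    """Render profile stats as a hedged description: approximate
    percentages only, with an explicit sampling caveat."""
    lines = []
    if null_rate > 0:
        lines.append(f"Null rate: ~{round(null_rate * 100)}%.")
    if top_values:
        # top_values maps category -> observed proportion (0..1).
        dist = ", ".join(f"{v} ({round(p * 100)}%)" for v, p in top_values.items())
        lines.append(f"Distribution: {dist}.")
    lines.append("Based on sampled data at time of documentation.")
    return " ".join(lines)

print(describe_column(0.0, {"completed": 0.85, "cancelled": 0.10, "pending": 0.05}))
```

Rounding to whole percentages and appending the sampling caveat keeps descriptions stable across runs, so small data drift does not churn the docs.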

More Robust Example

A production-ready configuration with intelligent description generation and validation:

name: docs-updater
description: Comprehensive documentation updates for changed models

triggers:
  - type: pull_request
    events: [opened, synchronize]
    paths:
      include: ["models/**/*.sql"]

tools:
  preset: standard

restrictions:
  files:
    allow: ["models/**/*.sql", "models/**/*.yml"]
  sql:
    read_only: true
    max_rows_returned: 1000
  git_operations:
    can_commit: true
    branch_prefix: "docs/"

notifications:
  slack:
    channel: "#data-documentation"
    on_failure: true

prompt: |
  # Task: Update stack documentation for PR
  You are tasked with assessing a pull request and updating stack documentation for all changed datasets and pipelines.

  ## Objective
  Keep documentation accurate, high-quality, and evidence-based—always describing the **current state of the data**.  
  Every model should have clear purpose, correct grain, and descriptions that reflect how the data actually looks right now.

  ---

  ## Core Principles

  1. **Understand before documenting**  
     Read the SQL diff and existing YAML. Know what changed and how it affects the model structure.  
     Profile using metadata (distributions, nulls, distinct counts)—don't guess.

  2. **Describe, don't prescribe**  
     Say what you *observe*, not what users should do.  
     - Good: "At time of documentation, status is 100% 'active' in sampled data."  
     - Avoid: "This column will always be 'active'."

  3. **Focus on patterns, not numbers**  
     Describe proportions, concentrations, and distributions—never exact counts.  
     Example: "~90% of orders are single-item" is good; "12,342 single-item orders" is brittle.

  4. **Be efficient**  
     Use metadata first; only query the warehouse when needed.  
     For existing docs: update only what changed, preserve good existing context.

  ---

  ## Documentation Structure

  ### Model Description
  - **Purpose**: What analysis or process it supports  
  - **Grain**: What one row represents, approximate size  
  - **Key Context**: Important patterns, assumptions, or quirks  

  Example:
  ```yaml
  description: >
    Order transaction fact table capturing all completed purchases.
    One row per order_id. Contains ~1.2M orders at time of documentation.
    Current data shows approximately 5K new orders per day.
  ```

  ### Column Descriptions
  Each column should describe:
  - **Business meaning**  
  - **Observed distribution** (nulls, categories, ranges)  
  - **Important patterns or gotchas**

  Example:
  ```yaml
  - name: customer_id
    description: >
      Foreign key to dim_customers table.
      Null rate: ~2% (represents guest checkouts).
      All non-null values have corresponding customer records.

  - name: order_status
    description: >
      Current status of the order in fulfillment pipeline.
      Distribution: completed (85%), cancelled (10%), pending (5%).
      Based on sampled data at time of documentation.
  ```

  ---

  ## Workflow

  ### 1. Identify models to update
  For each changed SQL file:
  - Check if YAML documentation exists
  - Skip if only whitespace/comment changes
  - Skip if YAML was already updated in the same commit

  ### 2. Profile each model
  Use retrieve_metadata to gather:
  - **Table-level**: Row count (approximate), last modified, table size
  - **Column-level**: Data types, null percentages, distinct counts
  - **Distributions**: For categoricals (<20 distinct values), get common values
  - **Ranges**: For numerics and dates, get min/max/median

  ### 3. Analyze SQL context
  - Read the model's SELECT statement
  - Identify joins, CTEs, and transformations
  - Determine grain from GROUP BY or unique keys
  - Extract any inline comments that provide context

  ### 4. Generate or update documentation

  **For new models (no YAML exists):**
  - Create complete documentation with all columns
  - Infer purpose from model name, location, and SQL structure
  - Document all columns with profiled patterns

  **For existing models (YAML exists):**
  - Preserve existing descriptions if still accurate
  - Add descriptions for new columns
  - Update descriptions for modified columns
  - Note removed columns in commit message
  - Refresh statistics that appear outdated

  ### 5. Apply language standards
  - Use phrases like *"At time of documentation"*, *"Current data shows"*, *"Based on sampled data"*
  - Avoid absolutes like *"always"*, *"never"*, *"all rows"*
  - Keep it concise and pattern-focused
  - Use proper grammar and sentence case

  ### 6. Validate changes
  Validate the documentation format and structure.

  If validation fails:
    - Try to fix common issues (indentation, quotes, special characters)
    - If still failing, skip that component and note the error
    - Include validation error details for manual review

  ### 7. Commit and communicate
  If documentation was successfully updated:

  Create a commit:
  ```
  docs: Update documentation for changed models

  - fct_orders: Added 3 new columns, updated row count
  - dim_customers: Refreshed column descriptions
  - int_order_enrichment: New model, full documentation
  ```

  Add a comment to the PR:
  ```markdown
  ## 📝 Documentation Updated

  Updated documentation for the following components:

  - `fct_orders` - Added 3 new columns, updated row count
  - `dim_customers` - Refreshed column descriptions
  - `int_order_enrichment` - New component documented

  **Summary**:
  - Components documented: 3
  - New columns: 5
  - Updated descriptions: 12

  All changes validated and committed to context layer repository ✓
  ```

  ---

  ## Edge Cases

  - **Large datasets** (>10M rows): Sample for profiling, note in description
  - **Views**: Document as usual, note materialization type
  - **Ephemeral components**: Document but note they don't persist
  - **Build failures**: Skip gracefully, report in comment
  - **Empty results**: Note if dataset returns no data currently

  ---

  ## Success Criteria
  - Documentation reflects current data and component logic
  - All changes follow structure and language standards
  - Documentation format validates successfully
  - Changes are committed to context layer repository with clear summary
  - PR comment provides visibility into updates

  **Mindset:** Write like someone who deeply understands the data—not just what changed, but what it *is now.*
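The language standards in workflow step 5 are concrete enough to lint automatically. A hedged sketch of such a check (the banned-phrase list and function name are illustrative, not a prescribed implementation):

```python
import re

# Absolute phrases the language standards ask generated docs to avoid.
ABSOLUTES = ["always", "never", "all rows", "guaranteed"]

def find_absolutes(description):
    """Return any banned absolute phrases in a description, matched
    case-insensitively on word boundaries."""
    lowered = description.lower()
    return [
        phrase
        for phrase in ABSOLUTES
        if re.search(rf"\b{re.escape(phrase)}\b", lowered)
    ]

assert find_absolutes("This column will always be 'active'.") == ["always"]
assert find_absolutes("At time of documentation, status is 100% 'active'.") == []
```

Running a check like this before the validation step catches absolute claims early, so the agent can rephrase them instead of committing brittle documentation.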