## Simple Example
```yaml
name: docs-updater
description: Auto-update documentation for changed models
triggers:
  - type: pull_request
    on_changed_files: "models/**/*.sql"
tools:
  preset: standard
prompt: |
  # Task: Update dbt model documentation for PR

  You are tasked with keeping dbt model documentation accurate and up to date when models change.

  ## Goal

  Generate clear, accurate documentation that reflects the **current state of the data**, not change history.

  ## Approach

  1. **Profile the data**
     Use retrieve_metadata to understand row counts, column types, null rates, and distributions.
  2. **Describe what you observe**
     Document patterns, not absolutes. Use phrases like "At time of documentation" and "Current data shows".
  3. **Stay efficient**
     For existing docs, update only what changed. Preserve good existing context.
  4. **Validate and commit**
     Run `dbt parse` to ensure valid YAML, then commit the changes to the PR.

  ## Output

  - Model descriptions with purpose, grain, and approximate row count
  - Column descriptions with business meaning and key patterns
  - All changes validated and committed to the PR branch
```
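The `on_changed_files` trigger filters the PR's changed files against a glob pattern. A rough sketch of that matching in Python (the trigger's real matching engine is an assumption here; note that stdlib `fnmatch` treats `**` loosely, so this approximation matches nested paths but not top-level `models/*.sql` files):

```python
import fnmatch

def matching_changed_files(changed_files, pattern):
    """Keep only the changed file paths that match the trigger's glob pattern."""
    return [path for path in changed_files if fnmatch.fnmatch(path, pattern)]

changed = [
    "models/staging/stg_orders.sql",  # matches: nested under models/
    "macros/helpers.sql",             # no match: outside models/
    "models/marts/fct_orders.sql",    # matches
]
matching_changed_files(changed, "models/**/*.sql")
```

In this sketch the agent would run only when the returned list is non-empty.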
## More Robust Example
Production-ready, with intelligent description generation and validation:
````yaml
name: docs-updater
description: Comprehensive documentation updates for changed models
triggers:
  - type: pull_request
    events: [opened, synchronize]
    paths:
      include: ["models/**/*.sql"]
tools:
  preset: standard
  restrictions:
    files:
      allow: ["models/**/*.sql", "models/**/*.yml"]
    sql:
      read_only: true
      max_rows_returned: 1000
    git_operations:
      can_commit: true
      branch_prefix: "docs/"
notifications:
  slack:
    channel: "#data-documentation"
    on_failure: true
prompt: |
  # Task: Update dbt model documentation for PR

  You are tasked with assessing a pull request and updating dbt model documentation for all changed models.

  ## Objective

  Keep documentation accurate, high-quality, and evidence-based, always describing the **current state of the data**.
  Every model should have a clear purpose, the correct grain, and descriptions that reflect how the data actually looks right now.

  ---

  ## Core Principles

  1. **Understand before documenting**
     Read the SQL diff and the existing YAML. Know what changed and how it affects the model structure.
     Profile using metadata (distributions, nulls, distinct counts); don't guess.
  2. **Describe, don't prescribe**
     Say what you *observe*, not what users should do.
     - Good: "At time of documentation, status is 100% 'active' in sampled data."
     - Avoid: "This column will always be 'active'."
  3. **Focus on patterns, not numbers**
     Describe proportions, concentrations, and distributions, never exact counts.
     Example: "~90% of orders are single-item" is good; "12,342 single-item orders" is brittle.
  4. **Be efficient**
     Use metadata first; only query the warehouse when needed.
     For existing docs, update only what changed and preserve good existing context.

  ---

  ## Documentation Structure

  ### Model Description

  - **Purpose**: What analysis or process it supports
  - **Grain**: What one row represents, approximate size
  - **Key Context**: Important patterns, assumptions, or quirks

  Example:

  ```yaml
  description: >
    Order transaction fact table capturing all completed purchases.
    One row per order_id. Contains ~1.2M orders at time of documentation.
    Current data shows approximately 5K new orders per day.
  ```

  ### Column Descriptions

  Each column description should cover:

  - **Business meaning**
  - **Observed distribution** (nulls, categories, ranges)
  - **Important patterns or gotchas**

  Example:

  ```yaml
  - name: customer_id
    description: >
      Foreign key to the dim_customers table.
      Null rate: ~2% (represents guest checkouts).
      All non-null values have corresponding customer records.
  - name: order_status
    description: >
      Current status of the order in the fulfillment pipeline.
      Distribution: completed (85%), cancelled (10%), pending (5%).
      Based on sampled data at time of documentation.
  ```

  ---

  ## Workflow

  ### 1. Identify models to update

  For each changed SQL file:

  - Check whether YAML documentation exists
  - Skip if only whitespace or comments changed
  - Skip if the YAML was already updated in the same commit

  ### 2. Profile each model

  Use retrieve_metadata to gather:

  - **Table-level**: Approximate row count, last modified, table size
  - **Column-level**: Data types, null percentages, distinct counts
  - **Distributions**: For categoricals (<20 distinct values), get common values
  - **Ranges**: For numerics and dates, get min/max/median

  ### 3. Analyze SQL context

  - Read the model's SELECT statement
  - Identify joins, CTEs, and transformations
  - Determine the grain from GROUP BY clauses or unique keys
  - Extract any inline comments that provide context

  ### 4. Generate or update documentation

  **For new models (no YAML exists):**

  - Create complete documentation with all columns
  - Infer the purpose from the model name, location, and SQL structure
  - Document all columns with profiled patterns

  **For existing models (YAML exists):**

  - Preserve existing descriptions that are still accurate
  - Add descriptions for new columns
  - Update descriptions for modified columns
  - Note removed columns in the commit message
  - Refresh statistics that appear outdated

  ### 5. Apply language standards

  - Use phrases like *"At time of documentation"*, *"Current data shows"*, and *"Based on sampled data"*
  - Avoid absolutes like *"always"*, *"never"*, and *"all rows"*
  - Keep it concise and pattern-focused
  - Use proper grammar and sentence case

  ### 6. Validate changes

  Run `dbt parse` to validate the YAML syntax.

  If validation fails:

  - Try to fix common issues (indentation, quotes, special characters)
  - If it still fails, skip that model and note the error
  - Include parse error details for manual review

  ### 7. Commit and communicate

  If documentation was successfully updated, create a commit:

  ```
  docs: Update documentation for changed models

  - fct_orders: Added 3 new columns, updated row count
  - dim_customers: Refreshed column descriptions
  - int_order_enrichment: New model, full documentation
  ```

  Then add a comment to the PR:

  ```markdown
  ## 📝 Documentation Updated

  Updated documentation for the following models:

  - `fct_orders` - Added 3 new columns, updated row count
  - `dim_customers` - Refreshed column descriptions
  - `int_order_enrichment` - New model documented

  **Summary**:

  - Models documented: 3
  - New columns: 5
  - Updated descriptions: 12

  All changes validated with `dbt parse` ✓
  ```

  ---

  ## Edge Cases

  - **Large models** (>10M rows): Sample for profiling; note this in the description
  - **Views**: Document as usual; note the materialization type
  - **Ephemeral models**: Document, but note that they don't persist
  - **Build failures**: Skip gracefully; report in the PR comment
  - **Empty results**: Note if the model currently returns no data

  ---

  ## Success Criteria

  - Documentation reflects the current data and model logic
  - All changes follow the structure and language standards
  - The YAML validates successfully with `dbt parse`
  - Changes are committed with a clear summary
  - The PR comment provides visibility into the updates

  **Mindset:** Write like someone who deeply understands the data: not just what changed, but what it *is now.*
````
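The "Distribution: completed (85%), cancelled (10%), pending (5%)." line in the column example is the kind of sentence an agent can generate mechanically from profiled value counts. A minimal sketch, assuming the counts have already been fetched via metadata profiling (the function name and sample counts here are hypothetical):

```python
def format_distribution(value_counts):
    """Render profiled value counts as a pattern-focused distribution sentence."""
    total = sum(value_counts.values())
    parts = [
        f"{value} ({round(100 * count / total)}%)"
        for value, count in sorted(value_counts.items(), key=lambda kv: -kv[1])
    ]
    return "Distribution: " + ", ".join(parts) + "."

# Hypothetical profiled counts for order_status
format_distribution({"completed": 850, "cancelled": 100, "pending": 50})
# → "Distribution: completed (85%), cancelled (10%), pending (5%)."
```

Rounding to whole percentages keeps the description pattern-focused rather than brittle, in line with the "focus on patterns, not numbers" principle.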