Skip to main content
This agent automatically reviews SQL and data model changes in pull requests. When a PR modifies a query or model, the agent compares the original and modified versions, runs data diffs to quantify the impact, investigates downstream dependencies, and posts a structured review directly on the PR.

Simple Example

name: pr_reviewer
description: Review SQL changes and assess data impact

prompt: |
  Review this SQL change:
  
  1. Compare ORIGINAL vs MODIFIED SQL
     - What joins, filters, or columns changed?
  2. Run data comparison
     - How many rows added/removed?
     - Which values changed?
  3. Post review comment with findings

triggers:
  - event: pull_request
    types: ['opened', 'synchronize']
    includes:
      - "models/**/*.sql"
      - "queries/**/*.sql"

Production Example

Full data impact analysis with downstream dependency checks:
name: pr_reviewer
description: Comprehensive SQL change review with data impact analysis

prompt: |
  Review this SQL change for correctness and impact.
  
  ## 1. Analyze SQL changes
  
  Compare the ORIGINAL and MODIFIED SQL:
  
  **Structural changes:**
  - Joins: tables added/removed, join type changes (LEFT → INNER)
  - Filters: new WHERE clauses, modified conditions
  - Columns: added, removed, or modified expressions
  - Aggregations: GROUP BY changes, new window functions
  
  **Semantic changes (critical):**
  - Changes in what "counts" (e.g., events vs sessions)
  - Changes in how records are attributed to categories
  - Changes in metric definitions or calculations
  
  ## 2. Run data comparison
  
  Execute both queries and compare results:
  
  a) **Row-level diff:**
     - Total row count change (absolute + percentage)
     - Rows only in original (removed)
     - Rows only in modified (added)
     - Rows with changed values
  
  b) **Column-level analysis:**
     - For numeric columns: avg change, distribution shifts
     - For categorical columns: value distribution changes
     - Null rate changes
  
  c) **Sample differences:**
     - Show example rows that changed
     - Explain WHY they changed based on SQL logic
  
  ## 3. Check downstream impact
  
  For each downstream model that depends on this table:
  
  a) Find dependencies:
     - Search for ref('model_name') in all models
     - Check information_schema for dependent views
  
  b) Diff downstream models:
     - Compare downstream output before/after this change
     - Flag unexpected cascading effects
     - Note any metric shifts in aggregated models
  
  ## 4. Classify risk
  
  **HIGH RISK:**
  - Rows removed unexpectedly
  - Downstream models show data changes
  - Semantic changes in metric definitions
  
  **MEDIUM RISK:**
  - Row count changes > 5%
  - Column type changes
  - Filter logic modified
  
  **LOW RISK:**
  - Columns added (non-breaking)
  - Performance optimizations only
  - Formatting/style changes
  
  ## 5. Generate PR review
  
  Post structured comment:
  
  **📊 SQL Change Review**
  
  **Summary:**
  • [2-3 bullet points on intent and impact]
  
  **SQL Changes:**
  • [List key structural changes]
  • [Flag any semantic/logic changes]
  
  **Data Impact:**
  • Row count: [X] → [Y] ([+/-Z]%)
  • Rows added: [N]
  • Rows removed: [N]
  • Rows changed: [N]
  
  **Sample Changes:**
  [Show 2-3 example rows with explanation]
  
  **Downstream Impact:**
  • [List affected models]
  • [Note any unexpected changes]
  
  **Risk Assessment:** [HIGH/MEDIUM/LOW]
  • [Explain key risks]
  • [Suggest mitigations if needed]
  
  ## 6. Alert on high risk
  
  If HIGH risk detected:
  - Request changes on PR
  - Send Slack to #data-reviews:
    "⚠️ High-risk SQL change needs review
     PR: [link]
     Risk: [summary]"

triggers:
  - event: pull_request
    types: ['opened', 'synchronize']
    includes:
      - "models/**/*.sql"
      - "queries/**/*.sql"
      - "dbt/**/*.sql"
    branches: ['main', 'staging', 'production']

tools:
  include:
    - slack_tool