This agent compares data output between the PR branch and the main branch for changed dbt models, surfacing logic errors and unexpected data changes that compilation checks and tests can miss. It runs queries against both versions and reports meaningful differences in row counts, column values, or distributions.

## Simple Example

```yaml
name: data-diff-checker
description: Compare data output for changed models

triggers:
  - type: pull_request
    on_changed_files: "models/**/*.sql"

tools:
  preset: standard

prompt: |
  For each changed model:
  1. Run the model on PR branch
  2. Compare with production version
  3. Report row count changes and sample differences
  4. Comment on PR with findings
```

## More Robust Example

Production-ready with statistical comparison and intelligent diff analysis:
```yaml
name: data-diff-checker
description: Comprehensive data comparison for changed dbt models

triggers:
  - type: pull_request
    events: [opened, synchronize]
    paths:
      include: ["models/**/*.sql"]
    conditions:
      - type: pr_labels
        none_of: ["skip-diff"]

tools:
  preset: standard

restrictions:
  sql:
    read_only: false  # Need to create temp tables for comparison
    allowed_schemas: ["raw", "staging", "marts", "dbt_pr_build"]
    max_query_time: 600  # 10 min for large comparisons
  git_operations:
    can_commit: false  # Read-only, just comment

notifications:
  slack:
    channel: "#data-quality"
    on_failure: true

prompt: |
```
  ## 1. Identify models to compare
  
  For each SQL file changed in this PR:
  
  a) Skip if:
     - Model is brand new (no prod version to compare)
     - Changes are only to comments or whitespace
     - PR has "skip-diff" label
     - Model is in seeds/ or snapshots/
  
  b) Prioritize by model type:
     - **High priority**: Mart models (fct_, dim_)
     - **Medium priority**: Intermediate models
     - **Low priority**: Staging models (usually 1-1 with source)
  
  c) Check if model is safe to run:
     - No dependencies on unreleased models
     - Not incremental (or can run full-refresh)
     - Not too expensive (check for table size hints)
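
The skip and safety rules above can be sketched as a small filter. This is a hypothetical helper; the function name and signature are illustrative, not part of the agent spec:

```python
def should_compare(path: str, is_new: bool = False, labels: frozenset = frozenset()) -> bool:
    """Apply the skip rules above to a single changed file."""
    if is_new or "skip-diff" in labels:
        # New models have no prod version; the label opts the PR out entirely
        return False
    if path.startswith(("seeds/", "snapshots/")):
        # Seeds and snapshots are out of scope
        return False
    # Only SQL model files under models/ are comparable
    return path.startswith("models/") and path.endswith(".sql")
```

For example, `should_compare("models/marts/fct_orders.sql")` returns `True`, while a seed file or a PR carrying the `skip-diff` label is skipped.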
  
  ## 2. Build PR version
  
  For each model to compare:
  
  a) Check if already built in PR schema:
     ```sql
     SELECT COUNT(*) 
     FROM information_schema.tables 
     WHERE table_schema = 'dbt_pr_build_{pr_number}'
     AND table_name = '{model_name}'
   ```

b) If not built, run dbt:
   ```bash
   dbt build --select {model_name} --target pr_build --full-refresh
   ```

c) Verify build succeeded before comparing

## 3. Run comparison queries

For each model, run comprehensive comparisons:

A. Row Count Comparison

```sql
SELECT
  'production' as version,
  COUNT(*) as row_count
FROM production.{schema}.{model_name}

UNION ALL

SELECT
  'pr' as version,
  COUNT(*) as row_count
FROM dbt_pr_build_{pr_number}.{schema}.{model_name}
```

Calculate:
- Absolute difference
- Percentage change
- Flag if change > 10% (configurable threshold)
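
The threshold logic can be sketched as follows. This is a hypothetical helper; the 10% default mirrors the configurable threshold mentioned above:

```python
def flag_row_count_change(prod_count: int, pr_count: int, threshold: float = 0.10):
    """Return (absolute difference, fractional change, flagged?) for a row-count diff."""
    abs_diff = pr_count - prod_count
    # Guard against an empty production table
    pct_change = abs_diff / prod_count if prod_count else float("inf")
    return abs_diff, pct_change, abs(pct_change) > threshold
```

With the report example's `fct_orders` numbers, `flag_row_count_change(1245890, 1248103)` yields a +0.18% change and no flag.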

B. Column-Level Statistics

For each column, compare:
```sql
WITH prod_stats AS (
  SELECT
    '{column}' as column_name,
    COUNT(*) as total_rows,
    COUNT({column}) as non_null_count,
    COUNT(DISTINCT {column}) as distinct_count,
    -- For numerics
    MIN({column}) as min_val,
    MAX({column}) as max_val,
    AVG({column}) as avg_val,
    STDDEV({column}) as stddev_val
  FROM production.{schema}.{model_name}
),
pr_stats AS (
  SELECT
    '{column}' as column_name,
    COUNT(*) as total_rows,
    COUNT({column}) as non_null_count,
    COUNT(DISTINCT {column}) as distinct_count,
    MIN({column}) as min_val,
    MAX({column}) as max_val,
    AVG({column}) as avg_val,
    STDDEV({column}) as stddev_val
  FROM dbt_pr_build_{pr_number}.{schema}.{model_name}
)
SELECT
  p.*,
  pr.*
FROM prod_stats p
CROSS JOIN pr_stats pr
```

Flag significant changes:
- Null rate change > 5%
- Distinct count change > 20%
- Avg value change > 15% (for numerics)
- Min/max outside expected range
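
One way to sketch these checks in code (the dict keys mirror the stat columns in the query above; the function name is illustrative):

```python
def flag_column_stats(prod: dict, pr: dict) -> list:
    """Compare per-column stats from the two builds against the thresholds above."""
    flags = []
    prod_null_rate = 1 - prod["non_null_count"] / prod["total_rows"]
    pr_null_rate = 1 - pr["non_null_count"] / pr["total_rows"]
    if abs(pr_null_rate - prod_null_rate) > 0.05:
        flags.append("null_rate")
    if prod["distinct_count"] and abs(pr["distinct_count"] / prod["distinct_count"] - 1) > 0.20:
        flags.append("distinct_count")
    # avg_val is only present for numeric columns
    if prod.get("avg_val") and abs(pr["avg_val"] / prod["avg_val"] - 1) > 0.15:
        flags.append("avg_val")
    return flags
```

The min/max range check needs expected bounds per column, so it is left out of this sketch.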

C. Sample Row Comparison

Find rows that exist in one version but not the other:
```sql
-- Rows in PROD but not in PR (deleted)
SELECT 'deleted' as diff_type, p.*
FROM production.{schema}.{model_name} p
LEFT JOIN dbt_pr_build_{pr_number}.{schema}.{model_name} pr
  ON p.{primary_key} = pr.{primary_key}
WHERE pr.{primary_key} IS NULL
LIMIT 100

-- Rows in PR but not in PROD (added)
SELECT 'added' as diff_type, pr.*
FROM dbt_pr_build_{pr_number}.{schema}.{model_name} pr
LEFT JOIN production.{schema}.{model_name} p
  ON pr.{primary_key} = p.{primary_key}
WHERE p.{primary_key} IS NULL
LIMIT 100

-- Rows with changed values
SELECT
  'modified' as diff_type,
  p.{primary_key},
  -- Show key columns that changed
  p.{column} as prod_value,
  pr.{column} as pr_value
FROM production.{schema}.{model_name} p
JOIN dbt_pr_build_{pr_number}.{schema}.{model_name} pr
  ON p.{primary_key} = pr.{primary_key}
WHERE p.{column} != pr.{column}
   OR (p.{column} IS NULL AND pr.{column} IS NOT NULL)
   OR (p.{column} IS NOT NULL AND pr.{column} IS NULL)
LIMIT 100
```

D. Aggregate Checks

For fact tables, compare key metrics:
```sql
-- Compare revenue totals, for example
SELECT
  'production' as version,
  SUM(revenue) as total_revenue,
  AVG(revenue) as avg_revenue,
  COUNT(DISTINCT customer_id) as unique_customers
FROM production.{schema}.{model_name}

UNION ALL

SELECT
  'pr' as version,
  SUM(revenue) as total_revenue,
  AVG(revenue) as avg_revenue,
  COUNT(DISTINCT customer_id) as unique_customers
FROM dbt_pr_build_{pr_number}.{schema}.{model_name}
```
Flag if business metrics changed unexpectedly.

## 4. Analyze differences

For each detected difference, determine severity:

**Expected Changes (Low Severity):**
- New column added (not in prod)
- Formatting changes (trim, case)
- Rounding differences (0.0001 tolerance)
- Timestamp precision changes
- Order changes (if model not ordered)

**Potentially Breaking (Medium Severity):**
- Row count change >10%
- Null rate change >5%
- Column removed
- Significant change in distinct counts

**Likely Bugs (High Severity):**
- Row count change >50%
- Key metric change (revenue or counts off significantly)
- Many rows deleted
- All values changed for an important column
- Primary key no longer unique
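
The buckets above reduce to a simple precedence check. This is a sketch; a real classifier would weigh more signals than these four:

```python
def classify_severity(pct_row_change: float, null_rate_change: float = 0.0,
                      column_removed: bool = False, pk_unique: bool = True) -> str:
    """Bucket a model's diff into low/medium/high per the rules above."""
    if abs(pct_row_change) > 0.50 or not pk_unique:
        return "high"
    if abs(pct_row_change) > 0.10 or null_rate_change > 0.05 or column_removed:
        return "medium"
    return "low"
```

On these rules alone the report example's -16.2% row-count change is medium; the example report escalates it to HIGH, presumably weighing the shifted business metrics as well.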

## 5. Investigate root causes

For unexpected differences, try to determine why:

a) Compare SQL changes:
   - Check git diff for changed joins
   - Look for modified WHERE clauses
   - Find changed aggregations or GROUP BY

b) Check for data dependency changes:
   - Did upstream models change?
   - Is source data different in time windows?

c) Verify it's not an infrastructure issue:
   - Did the PR build use correct refs?
   - Are there data quality issues in staging?

## 6. Generate comparison report

Post detailed comment on PR:
## ๐Ÿ“Š Data Diff Report

Compared **{model_count}** changed models between production and this PR.

---

### โœ… Models with Expected Changes

**`fct_orders`** - Row count increased
- **Production**: 1,245,890 rows
- **PR Branch**: 1,248,103 rows (+2,213 rows, +0.18%)
- **Analysis**: Increase expected from new orders in time window ✓

---

### โš ๏ธ  Models with Unexpected Differences

**`fct_customer_lifetime_value`** - Row count decreased significantly
- **Production**: 45,678 rows
- **PR Branch**: 38,290 rows (-7,388 rows, **-16.2%**)
- **Severity**: HIGH

**Detailed Analysis:**

| Metric | Production | PR | Change |
|--------|------------|----|----|
| Total Customers | 45,678 | 38,290 | -16.2% |
| Avg LTV | $3,245 | $3,892 | +19.9% |
| Total LTV | $148.2M | $149.0M | +0.5% |
| Null LTV Count | 0 | 0 | - |

**Root Cause Investigation:**

Looking at the SQL diff, you added a filter:
```sql
WHERE total_orders >= 2  -- NEW FILTER
```

This excludes customers with only one order, removing 7,388 customers (16.2%). Impact:
- Average LTV increased because low-value customers were removed
- Total LTV barely changed (one-time customers contributed little)
- This appears intentional based on the PR description

**Sample Removed Customers (first 5):**

| customer_id | total_orders | revenue |
|-------------|--------------|---------|
| cust_100234 | 1 | $45.00 |
| cust_100891 | 1 | $23.50 |
| cust_101245 | 1 | $67.25 |

**Recommendation**: ✓ Verify this filter aligns with the business definition of "LTV customers"
**`dim_products`** - Column value changes detected
- **Severity**: MEDIUM
- **Changes**: 234 products have different category values

**Sample Changes:**

| product_id | prod_category | pr_category |
|------------|---------------|-------------|
| prod_1001 | Electronics | Consumer Electronics |
| prod_1002 | Electronics | Consumer Electronics |
| prod_2031 | Home & Garden | Home Goods |

**Root Cause**: Updated category mapping logic in staging layer

**Downstream Impact:**
- 5 models use dim_products.category for grouping
- Dashboards will show new category names
- Historical comparisons may be affected

**Recommendation**: ⚠️ Consider creating a category mapping to maintain historical continuity

๐Ÿ” Technical Details


๐Ÿ“ˆ Summary

- **Models Compared**: 6
- **No Differences**: 3
- **Expected Differences**: 1
- **Unexpected Differences**: 2 (1 high, 1 medium)

**Action Items:**
1. ⚠️ Verify the customer filter in `fct_customer_lifetime_value` is intentional
2. ⚠️ Review category name changes in `dim_products` and downstream impact
3. ✅ Other changes look good!

💡 **Tip**: Add the "skip-diff" label to skip data comparison for this PR.

## 7. Handle edge cases

- **Large models**: Sample comparison instead of full table (>10M rows)
- **Slow models**: Set timeout, partial results if exceeded
- **Build failures**: Report failure, don't compare
- **No primary key**: Use row hashing for comparison
- **Too many differences**: Provide summary, link to full results
- **Incremental models**: Compare full-refresh version or skip
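
The no-primary-key fallback can be sketched with full-row hashing, shown here with Python's standard library (a hypothetical approach; function names are illustrative):

```python
import hashlib

def row_hash(row: tuple) -> str:
    """Hash an entire row to act as a surrogate key when no primary key exists."""
    # NULLs map to empty string; the delimiter keeps ("ab","c") distinct from ("a","bc")
    canonical = "|".join("" if v is None else str(v) for v in row)
    return hashlib.sha256(canonical.encode()).hexdigest()

def diff_by_hash(prod_rows, pr_rows):
    """Return (deleted, added) hash sets between the two result sets."""
    prod_hashes = {row_hash(r) for r in prod_rows}
    pr_hashes = {row_hash(r) for r in pr_rows}
    return prod_hashes - pr_hashes, pr_hashes - prod_hashes
```

In practice the hashing would happen in the warehouse (e.g. hashing concatenated columns in SQL) rather than in Python, but the set-difference idea is the same.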

## 8. Clean up

After comparison:
- Drop temporary tables created for comparison
- Clear PR build schema if not needed
- Log execution time and costs for monitoring