Simple Example
More Robust Example
Production-ready, with statistical comparison and intelligent diff analysis.

3. Run comparison queries

For each model, run comprehensive comparisons:

A. Row Count Comparison
- Absolute difference
- Percentage change
- Flag if change > 10% (configurable threshold)
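
The row count check above can be sketched roughly as follows. This is a minimal illustration, not the workflow's actual implementation: the names `RowCountDiff`, `compare_row_counts`, and `threshold_pct` are hypothetical, and the two counts are assumed to come from a `COUNT(*)` against the production and PR builds of the same model.

```python
from dataclasses import dataclass


@dataclass
class RowCountDiff:
    prod_rows: int
    pr_rows: int
    absolute_diff: int   # pr_rows - prod_rows (negative means rows were lost)
    pct_change: float    # signed change relative to production, in percent
    flagged: bool        # True when |pct_change| exceeds the threshold


def compare_row_counts(prod_rows: int, pr_rows: int,
                       threshold_pct: float = 10.0) -> RowCountDiff:
    """Compare row counts of the prod and PR builds of one model."""
    absolute_diff = pr_rows - prod_rows
    if prod_rows == 0:
        # Percentage change is undefined against an empty table; treat any
        # new rows as an infinite change so the result is always flagged.
        pct_change = 0.0 if pr_rows == 0 else float("inf")
    else:
        pct_change = absolute_diff / prod_rows * 100.0
    flagged = abs(pct_change) > threshold_pct
    return RowCountDiff(prod_rows, pr_rows, absolute_diff, pct_change, flagged)


if __name__ == "__main__":
    print(compare_row_counts(1000, 1150))
```

Keeping the threshold as a parameter mirrors the "configurable threshold" note: a volatile staging model might tolerate a looser limit than a core fact table.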
B. Column-Level Statistics
For each column, compare:

- Null rate change > 5%
- Distinct count change > 20%
- Avg value change > 15% (for numerics)
- Min/max outside expected range
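
One possible way to apply the column-level thresholds above is sketched below. Several things here are assumptions: the stats are assumed to arrive as plain dictionaries with hypothetical keys (`null_rate`, `distinct_count`, `avg`, `min`, `max`), the null rate is assumed to be a fraction in [0, 1] whose change is measured in percentage points, and the other changes are assumed to be relative. The source does not say which interpretation is intended.

```python
from typing import Dict, List, Optional, Tuple


def relative_change_pct(old: float, new: float) -> float:
    """Unsigned relative change from old to new, as a percentage."""
    if old == 0:
        return 0.0 if new == 0 else float("inf")
    return abs(new - old) / abs(old) * 100.0


def compare_column_stats(
    prod: Dict[str, Optional[float]],
    pr: Dict[str, Optional[float]],
    expected_range: Optional[Tuple[float, float]] = None,
) -> List[str]:
    """Return the names of the checks that one column fails."""
    flags: List[str] = []

    # Null rate: absolute change in percentage points (assumed interpretation).
    if abs(pr["null_rate"] - prod["null_rate"]) * 100.0 > 5.0:
        flags.append("null_rate")

    # Distinct count: relative change.
    if relative_change_pct(prod["distinct_count"], pr["distinct_count"]) > 20.0:
        flags.append("distinct_count")

    # Average: only meaningful for numeric columns, so skip when missing.
    if prod.get("avg") is not None and pr.get("avg") is not None:
        if relative_change_pct(prod["avg"], pr["avg"]) > 15.0:
            flags.append("avg")

    # Min/max: compare the PR build against a caller-supplied expected range.
    if expected_range is not None:
        low, high = expected_range
        pr_min, pr_max = pr.get("min"), pr.get("max")
        if (pr_min is not None and pr_min < low) or (
            pr_max is not None and pr_max > high
        ):
            flags.append("range")

    return flags
```

For example, a column whose null rate moves from 1% to 8% and whose average moves from 50 to 60 would come back with both `null_rate` and `avg` flagged.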
C. Sample Row Comparison
Find rows that exist in one version but not the other.

D. Aggregate Checks

For fact tables, compare key metrics.

4. Analyze differences

For each detected difference, determine severity:

Expected Changes (Low Severity):
- New column added (not in prod)
- Formatting changes (trim, case)
- Rounding differences (0.0001 tolerance)
- Timestamp precision changes
- Order changes (if model not ordered)
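
To separate these cosmetic differences from real ones, values can be normalized before they are compared. The helper below is a hedged sketch: `values_equivalent` is a hypothetical name, the 0.0001 tolerance is taken from the list above, and treating trimmed, case-folded strings as equal is an assumption about what "formatting changes" covers.

```python
import math


def values_equivalent(prod_value, pr_value, abs_tol: float = 1e-4) -> bool:
    """Return True when two cell values differ only cosmetically."""
    # NULLs only match other NULLs.
    if prod_value is None or pr_value is None:
        return prod_value is None and pr_value is None

    # Strings: ignore surrounding whitespace and letter case.
    if isinstance(prod_value, str) and isinstance(pr_value, str):
        return prod_value.strip().casefold() == pr_value.strip().casefold()

    # Numbers: allow a small absolute rounding difference.
    if isinstance(prod_value, (int, float)) and isinstance(pr_value, (int, float)):
        return math.isclose(prod_value, pr_value, rel_tol=0.0, abs_tol=abs_tol)

    # Anything else must match exactly.
    return prod_value == pr_value
```

Rows whose every differing column passes this check could then be reported as low severity rather than as value changes.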
Potentially Breaking (Medium Severity):
- Row count change >10%
- Null rate change >5%
- Column removed
- Distinct count significant change
Likely Bugs (High Severity):
- Row count change >50%
- Key metric change (revenue, counts off significantly)
- Many rows deleted
- All values changed for important column
- Primary key no longer unique
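
The severity rules above could be encoded along these lines. This is an illustrative sketch only: the `ModelDiff` fields and the `classify_severity` function are hypothetical, and only a subset of the listed signals is modelled (row count, null rate, added or removed columns, key metrics, primary key uniqueness).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ModelDiff:
    """Summary of differences between the prod and PR builds of one model."""
    row_count_pct_change: float = 0.0       # signed, in percent
    max_null_rate_change_pts: float = 0.0   # worst column, percentage points
    columns_added: List[str] = field(default_factory=list)
    columns_removed: List[str] = field(default_factory=list)
    key_metric_changed: bool = False        # e.g. revenue off significantly
    primary_key_unique: bool = True


def classify_severity(diff: ModelDiff) -> str:
    """Map a diff summary to HIGH, MEDIUM, LOW, or NONE."""
    # Likely bugs: check these first so they are never downgraded.
    if (
        not diff.primary_key_unique
        or diff.key_metric_changed
        or abs(diff.row_count_pct_change) > 50.0
    ):
        return "HIGH"

    # Potentially breaking changes.
    if (
        abs(diff.row_count_pct_change) > 10.0
        or diff.max_null_rate_change_pts > 5.0
        or diff.columns_removed
    ):
        return "MEDIUM"

    # Expected changes, such as a column that only exists in the PR build.
    if diff.columns_added:
        return "LOW"

    return "NONE"
```

Checking the high-severity conditions first means a model that both gains a column and loses half its rows is still reported as HIGH.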
5. Investigate root causes
For unexpected differences, try to determine why:

a) Compare SQL changes:

- Check git diff for changed joins
- Look for modified WHERE clauses
- Find changed aggregations or GROUP BY
- Did upstream models change?
- Is source data different in time windows?
- Did PR build use correct refs?
- Are there data quality issues in staging?
6. Generate comparison report
Post a detailed comment on the PR:

- Average LTV increased because we removed low-value customers
- Total LTV barely changed (one-time customers contributed little)
- This appears intentional based on the PR description
| customer_id | total_orders | revenue |
|---|---|---|
| cust_100234 | 1 | $45.00 |
| cust_100891 | 1 | $23.50 |
| cust_101245 | 1 | $67.25 |
dim_products - Column value changes detected
- Severity: MEDIUM
- Changes: 234 products have different `category` values
| product_id | prod_category | pr_category |
|---|---|---|
| prod_1001 | Electronics | Consumer Electronics |
| prod_1002 | Electronics | Consumer Electronics |
| prod_2031 | Home & Garden | Home Goods |
- 5 models use `dim_products.category` for grouping
- Dashboards will show new category names
- Historical comparisons may be affected
Technical Details
Summary
- Models Compared: 6
- No Differences: 3
- Expected Differences: 1
- Unexpected Differences: 2 (1 high, 1 medium)
- ⚠️ Verify the customer filter in `fct_customer_lifetime_value` is intentional
- ⚠️ Review category name changes in `dim_products` and downstream impact
- ✅ Other changes look good!
💡 Tip: Add the "skip-diff" label to skip data comparison for this PR.