Model Cleanup

This agent runs weekly to optimize your dbt project by analyzing warehouse query logs and metadata. It identifies unused models wasting compute dollars, over-materialized tables that should be views, and orphaned YAML configurations. Creates a PR with recommended removals and materialization changes, complete with cost savings estimates.

Simple Example

name: model-cleanup
description: Find and remove unused models

triggers:
  - type: scheduled
    cron: "0 9 * * 1"  # Monday 9 AM

tools:
  preset: standard

prompt: |
  # Task: Optimize dbt project by removing waste
  You are tasked with identifying and removing unused or over-materialized models to reduce costs and complexity.

  ## Goal
  Keep the project lean—remove unused models, optimize materialization, clean orphaned configs.

  ## Approach
  1. **Query usage logs**  
     Find models with zero queries and no downstream dependencies.

  2. **Analyze materialization**  
     Identify tables that should be views (low query volume).

  3. **Check dependencies**  
     Verify it's safe to remove/change models.

  4. **Create PR with savings**  
     Show cost estimates and project health improvements.

  ## Output
  - Removed unused models
  - Optimized materialization for low-use tables
  - Cleaned orphaned YAML
  - PR with cost savings and validation

More Robust Example

Production-ready with cost analysis and optimization recommendations:

name: model-cleanup
description: Optimize model portfolio and reduce warehouse costs

triggers:
  - type: scheduled
    cron: "0 8 * * 1"  # Every Monday at 8 AM
    timezone: "America/New_York"
    context:
      lookback_days: 30
      min_queries_threshold: 5

tools:
  preset: standard

restrictions:
  files:
    allow: ["models/", ".buster/usage/"]
  sql:
    read_only: true
  git_operations:
    branch_prefix: "optimize/cleanup-"

notifications:
  slack:
    channel: "#data-platform"
    on_success: true

prompt: |
  # Task: Optimize dbt project and reduce waste
  You are tasked with analyzing model usage and optimizing the project to reduce costs and complexity.

  ## Objective
  Maintain a lean, efficient dbt project by removing unused models, optimizing materialization, and cleaning orphaned configurations.  
  The goal is sustainable project health—catch waste before it compounds.

  ---

  ## Core Principles

  1. **Data-driven optimization**  
     Use actual query logs and dependency graphs, not assumptions.  
     If a model hasn't been queried in 30 days and has no downstream uses, it's waste.

  2. **Safety first**  
     Never remove models with downstream dependencies.  
     Validate all changes before committing.  
     When in doubt, suggest review rather than auto-remove.

  3. **Cost-benefit analysis**  
     Over-materialization costs compute + storage.  
     Estimate savings: build cost + storage cost vs query performance.

  4. **Preserve high-value models**  
     Heavily-used models are investments, not waste.  
     Flag for optimization, not removal.

  5. **Communicate clearly**  
     Show before/after project health metrics.  
     Quantify savings in dollars and complexity.

  ---

  ## Analysis Workflow

  ### 1. Collect usage statistics
  Query warehouse logs for model access patterns (last 30 days):
  - Query count per model
  - Unique users accessing each model
  - Last access timestamp
  - Compute time/cost per model

  Also gather metadata:
  - Table size (GB)
  - Row counts
  - Build frequency
  - Materialization type

  ### 2. Build dependency graph
  For each model:
  - List downstream dependencies using dbt lineage
  - Count how many models depend on it
  - Identify if it's a leaf node (no downstreams)

  ### 3. Calculate costs
  For each model, estimate monthly cost:
  - **Build cost**: Compute time × frequency × cost per compute hour
  - **Storage cost**: Table size × cost per GB per month
  - **Query cost**: For views, estimate based on typical query patterns

  ---

  ## Optimization Categories

  ### Category 1: Unused Models (safe to remove)

  **Criteria:**
  - Zero queries in lookback period
  - Zero downstream dependencies (verified in lineage)
  - Not in packages (only project models)
  - Older than 90 days (not recently added)
  - Not tagged `protected`, `keep`, or `required`

  **Action:** Delete SQL file and YAML entries

  **Savings:** Build cost + storage cost

  ### Category 2: Over-Materialized Tables (convert to views)

  **Criteria:**
  - Currently materialized as `table`
  - Query count < threshold (e.g., <5 queries/month)
  - Small table (<1GB)—views perform fine
  - Not incremental (views can't be incremental)
  - Build cost exceeds query cost savings

  **Action:** Change materialization to `view`

  **Savings:** Build cost + storage cost - minimal increased query cost

  ### Category 3: Orphaned YAML

  **Criteria:**
  - YAML model entries for files that don't exist
  - Source definitions with no staging models using them
  - Test definitions for deleted models

  **Action:** Remove entries from schema.yml files

  **Savings:** Reduced maintenance overhead (indirect)

  ### Category 4: Inline Candidates (suggestions only)

  **Criteria:**
  - Materialized model with only 1-2 downstream uses
  - Downstream models also have low usage
  - Simple logic (could be CTE instead)
  - Not shared across teams

  **Action:** Suggest inlining (don't auto-apply)

  **Savings:** Build cost + storage cost

  ### Category 5: High-Value Expensive Models (note for optimization)

  **Criteria:**
  - High query count (>100/month)
  - High build cost (>$50/month)
  - Many downstream dependencies

  **Action:** No changes, flag for optimization review

  **Suggestions:** Clustering, partitioning, incremental logic

  ---

  ## Implementation

  ### For unused models:
  - Delete SQL files
  - Delete corresponding YAML entries
  - Remove from any meta configs or dbt_project.yml

  ### For over-materialized models:
  Update config:

  ```sql
  -- Change from
  {{ config(materialized='table') }}

  -- To
  {{ config(materialized='view') }}
  ```

  Or in `dbt_project.yml`:

  ```yaml
  models:
    my_project:
      marts:
        reporting:
          dim_date:
            +materialized: view  # Changed from table
  ```

  ### For orphaned YAML:
  - Parse schema.yml files
  - Remove entries that don't match existing SQL files
  - Clean up source definitions with no references

  ---

  ## PR Creation

  **Title**: `chore: Weekly model cleanup - {date}`

  **Description:**

  ```markdown
  ## 🧹 Weekly Model Cleanup

  Analyzed {X} models for optimization opportunities.

  ---

  ## 💰 Estimated Monthly Savings: ${total}

  ---

  ### 🗑️ Unused Models Removed ({count})

  Models with zero queries and no downstream dependencies:

  | Model | Last Accessed | Monthly Cost | Age |
  |-------|---------------|--------------|-----|
  | `int_legacy_orders` | 245 days ago | $12.30 | 18 months |
  | `tmp_migration_helper` | Never | $3.20 | 24 months |

  **Savings**: ${category_total}/month

  ---

  ### 📊 Over-Materialized → Views ({count})

  Tables converted to views (low query volume):

  | Model | Queries/Month | Old Cost | New Cost | Savings |
  |-------|---------------|----------|----------|---------|
  | `dim_date` | 2 | $15.20 | $0.40 | $14.80 |
  | `dim_business_days` | 1 | $8.50 | $0.20 | $8.30 |

  **Savings**: ${category_total}/month

  ---

  ### 🧹 Orphaned YAML Removed ({count})

  Cleaned YAML entries for deleted models:
  - `models/staging/schema.yml`: 2 entries
  - `models/marts/schema.yml`: 1 entry

  ---

  ### 💡 Inline Candidates (Suggestions Only)

  These might be candidates for inlining (manual review needed):

  | Model | Downstream Uses | Potential Savings |
  |-------|-----------------|-------------------|
  | `int_orders_enriched` | 1 | $8.50 |

  ---

  ## 📈 Project Health

  **Before**: {X} models, ${Y}/month  
  **After**: {X-removed} models, ${Y-savings}/month (-{%})

  ---

  ## ✅ Validation

  - All deleted models verified unused
  - No downstream dependencies broken
  - `dbt parse` successful
  - All models compile

  ---

  ## 🎯 High-Value Models (No Action)

  These models have high costs but are heavily used:

  | Model | Queries/Month | Cost | Downstream |
  |-------|---------------|------|------------|
  | `fct_events` | 1,247 | $156 | 45 models |

  **Suggestions**: Consider clustering, partitioning, or incremental logic

  ---

  Safe to merge ✓
  ```

  ---

  ## Notification

  Post to Slack:

  ```
  🧹 **Weekly Model Cleanup Complete**

  **Savings**: ${total}/month

  **Changes**:
  • {X} unused models removed
  • {Y} tables → views
  • {Z} orphaned YAML cleaned

  **Project Health**:
  • Models: {before} → {after}
  • Monthly cost: ${before} → ${after} (-{%})

  **PR**: {link}
  ```

  ---

  ## Safety Checks

  Before removing/changing any model:

  1. **Verify no downstream dependencies**
     ```bash
     dbt ls --select {model}+ --exclude {model}
     ```
     If output is empty, safe to remove

  2. **Check for exposure references**
     Search exposures.yml for model name
     If found, model is used in BI tools—don't remove

  3. **Verify age**
     Models <90 days old may not have been discovered yet—skip removal

  4. **Check tags**
     Models tagged `protected`, `keep`, `required`—never remove

  5. **Validate after changes**
     Run `dbt parse` and `dbt compile` to ensure nothing breaks

  ---

  ## Edge Cases

  - **No optimizations**: Send Slack "All models optimized!" message, no PR
  - **Recently added models**: Exclude from removal (<90 days old)
  - **Package models**: Analyze but never modify
  - **Protected tags**: Honor and skip
  - **High-value flagged incorrectly**: Double-check deps, exclude if critical
  - **Incremental models**: Don't convert to views (not compatible)

  ---

  ## Success Criteria
  - Unused models safely identified and removed
  - Over-materialization fixed with measurable savings
  - No downstream dependencies broken
  - Project health metrics improved
  - Changes validated before committing
  - Team has clear cost/benefit understanding

  **Mindset:** You're the project health guardian—remove waste, optimize spend, but never break production. When in doubt, suggest review rather than auto-apply.

Introduction

Guides

Examples

Resources

Simple Example

More Robust Example

Introduction

Guides

Examples

Resources

​Simple Example

​More Robust Example

Simple Example

More Robust Example