Skip to main content
This agent automatically monitors your warehouse for expensive queries, detects cost anomalies, and alerts your team before your bill explodes. It identifies runaway queries, inefficient patterns like full table scans or Cartesian joins, and sends detailed alerts with optimization recommendations.

Simple Example

name: cost_monitor
description: Alert on expensive queries

prompt: |
  Find yesterday's most expensive queries:
  
  SELECT 
    query_text,
    total_elapsed_time/1000 as seconds,
    bytes_scanned/1e12 as TB_scanned
  FROM snowflake.account_usage.query_history
  WHERE start_time >= DATEADD('day', -1, CURRENT_DATE())
  ORDER BY bytes_scanned DESC
  LIMIT 10
  
  Send Slack report to #data-costs with top 10.

triggers:
  - event: scheduled
    cron: "0 9 * * *"  # Daily at 9 AM UTC

tools:
  include:
    - slack_tool

Production Example

Anomaly detection and real-time monitoring:
name: warehouse_cost_monitor
description: Monitor and prevent warehouse cost explosions

prompt: |
  Monitor warehouse costs and detect anomalies.
  
  ## 1. Get recent query costs
  
  Run this query to get recent expensive queries:
  
  SELECT 
    query_id,
    query_text,
    user_name,
    warehouse_name,
    start_time,
    total_elapsed_time/1000 as seconds,
    bytes_scanned/1e12 as tb_scanned,
    credits_used_cloud_services,
    (credits_used_cloud_services * 2) as estimated_cost_usd
  FROM snowflake.account_usage.query_history
  WHERE start_time >= DATEADD('hour', -1, CURRENT_TIMESTAMP())
  ORDER BY estimated_cost_usd DESC
  LIMIT 100
  
  ## 2. Identify expensive queries
  
  For each query, classify:
  
  **CRITICAL** (> $50):
  - Single query cost > $50
  - Immediate action needed
  
  **WARNING** (> $10):
  - Single query cost > $10
  - Should optimize
  
  **INEFFICIENT** (large scans):
  - Scanned > 1 TB
  - Might have optimization opportunities
  
  ## 3. Detect patterns
  
  Look for:
  
  a) **Runaway queries**:
     - Same query running repeatedly
     - Check: GROUP BY query_text, COUNT(*)
     - If > 100 times/hour: Likely broken
  
  b) **Full table scans**:
     - No WHERE clause on large tables
     - Pattern: SELECT * FROM large_table
  
  c) **Cartesian joins**:
     - JOIN without ON condition
     - Result: Massive row explosion
  
  d) **Dashboards gone wild**:
     - User queries > 500 times/hour
     - Likely auto-refresh set too aggressive
  
  ## 4. Alert based on severity
  
  For CRITICAL issues, send Slack to #data-incidents:
  "🚨 EXPENSIVE QUERY ALERT
   
   Cost: $[cost] ([credits] credits)
   User: [user]
   Warehouse: [warehouse]
   Duration: [seconds]s
   Data scanned: [tb_scanned] TB
   
   Query (truncated):
   [query_text, first 500 chars]
   
   Action required:
   1. Check if query is necessary
   2. Consider killing: CALL SYSTEM$CANCEL_QUERY('[query_id]')
   3. Optimize query (add filters, use materialized views)
   4. Contact user: [user]"
  
  For WARNING issues:
  Send to #data-costs with optimization suggestions.
  
  ## 5. Daily cost report (9 AM trigger)
  
  Calculate yesterday's totals and send to #data-costs:
  "📊 Daily Cost Report - [date]
   
   💰 Total cost: $[total_cost] ([total_credits] credits)
   📈 Usage: [total_queries] queries, [total_tb_scanned] TB scanned
   
   🔥 Top 5 expensive queries:
   1. $[cost] - [user] - [truncated_query]
   ...
   
   🚀 Top users by cost:
   - [user]: $[cost]
   ...
   
   💡 Optimization opportunities:
   - [count] queries doing full table scans
   - [count] queries without LIMIT clauses"

triggers:
  # Daily report
  - event: scheduled
    cron: "0 9 * * *"
  
  # Real-time monitoring (every 15 min during business hours)
  - event: scheduled
    cron: "*/15 9-17 * * 1-5"

tools:
  include:
    - slack_tool