Cursor Rules, Claude Code, etc.
Buster AI Agent Guidelines
This document provides guidance for AI agents (Claude, Windsurf, Cursor, etc.) when working with Buster configuration files and semantic models.
Note for AI Agents: All Buster documentation is available via HTTP requests. Simply:
- Access any page from https://docs.buster.so/ by appending
.md
to the URL to get markdown format- For example: https://docs.buster.so/references/semantic-models.md
- Use tools like
curl
orfetch
to retrieve documentation as needed- Two specialized endpoints are available:
- https://docs.buster.so/llms.txt (concise overview)
- https://docs.buster.so/llms-full.txt (comprehensive reference)
What is Buster?
Buster is a platform for building and deploying AI data analysts that connects to various data sources and uses a semantic layer to understand your data structure and business logic. The platform enables natural language querying of your data warehouse with accurate, relevant responses.
Core Components
buster.yml
This is the primary configuration file that defines your Buster project settings.
Basic Structure
Multi-Project Structure
Buster.yml now supports configuring multiple projects in a single file:
Semantic Models
Semantic models define a business-friendly abstraction layer over your data warehouse. They are organized in YAML files and consist of these key components:
Model Structure
Key Concepts for Effective Models
Models
- Purpose: Define meaningful business concepts or entities
- Types:
- Entity-focused: Representing business objects (customers, products)
- Event-focused: Representing occurrences (orders, page views)
- Metric-focused: Centered around core business KPIs
- Best Practices:
- Use snake_case for naming
- Provide clear, concise descriptions
- Organize by business domain in separate files
- Name files after the main model they contain
Data Model Components Hierarchy
Buster’s semantic layer organizes data into a logical hierarchy:
Dimensions
- Definition: The lowest level properties about a data object (non-numeric attributes)
- Purpose: Attributes used for filtering, grouping, or display
- Examples: Names, dates, IDs, categories, locations, statuses
- Common Types:
string
: Text values (e.g., customer_name, product_category)timestamp
: Date and time values (e.g., created_at, order_date)boolean
: True/false values (e.g., is_active, is_completed)
- Properties:
name
: Identifier (required) - Should match the exact column name in your databasedescription
: Explanation (required)type
: Data type (recommended)primary_key
: Boolean designating unique identifier (optional)searchable
: Boolean for natural language search (optional)expr
: SQL expression (optional) - If not provided, the dimension name is used as the column identifier
- Best Practices:
- Use column names as dimension names for direct mapping to your database
- Include time-based dimensions for temporal analysis
- Mark primary keys for joining and uniqueness
- Mark searchable fields for important attributes
- Use expr only when transformation is needed from raw data
Measures
- Definition: The lowest level numeric properties about a data object
- Purpose: Raw quantitative values that can be aggregated
- Examples: Amounts, counts, durations, prices, quantities
- Common Types:
decimal
: Precise numeric values (e.g., price, revenue)integer
: Whole numbers (e.g., quantity, views)
- Aggregation Types:
sum
: Add values togetheravg
: Calculate averagecount
: Count occurrencesmin
: Find minimum valuemax
: Find maximum valuecount_distinct
: Count unique values
- Properties:
name
: Identifier (required) - Should match the exact column name in your databasedescription
: Explanation (required)type
: Data type (recommended)agg
: Aggregation type (required)expr
: SQL expression (optional) - If not provided, the measure name is used as the column identifier
- Best Practices:
- Use column names as measure names for direct mapping to your database
- Choose appropriate aggregation for business meaning
- Use clear, specific names describing the quantity
- Include unit of measurement in description when applicable
- Use expr only when transformation is needed from raw data
Metrics
- Definition: High-level abstractions or calculations derived from dimensions and measures
- Purpose: Complex business KPIs and calculations
- Examples: Average order value, conversion rate, retention rate, year-over-year growth
- Properties:
name
: Identifier (required) - Descriptive name for the calculationdescription
: Explanation (required)expr
: SQL expression (optional) - Must always return a numeric value- If not provided, uses the column with the same name as the metric
args
: Parameters for dynamic metrics (optional)
- Syntax:
- Use standard SQL expressions
- Reference fields from the same model directly by name
- Reference fields from other models using dot notation
- Best Practices:
- Centralize business logic in metrics
- Use descriptive names indicating calculation
- If referencing a single column directly, name the metric after the column
- Document formula and business significance
- Parameterize metrics when appropriate
- Ensure all metrics return numeric values
Filters
- Definition: High-level abstractions that provide reusable filtering conditions
- Purpose: Pre-defined conditions that must resolve to a boolean
- Examples: is_active_customer, recent_orders, high_value_transaction
- Properties:
name
: Identifier (required) - Descriptive name for the filterdescription
: Explanation (required)expr
: SQL expression (optional) - Must always resolve to TRUE or FALSE- If not provided, uses the boolean column with the same name as the filter
- Syntax:
- Use standard SQL boolean expressions
- Can reference dimensions and measures from the model directly by name
- Common patterns include comparison operators (equals, greater than, less than, IN, BETWEEN)
- Best Practices:
- Define common filtering patterns without assuming specific values in the data
- If referencing a single boolean column directly, name the filter after the column
- Use for time-based filters (e.g., recent_period, current_year)
- Create concept-based filters that describe what the filter does, not specific values
- Ensure expressions always evaluate to boolean values
- Avoid assuming specific values exist in a column; consult actual data first
Relationships
- Definition: Connections between different data models
- Purpose: Define how models relate to each other
- Examples: users-to-orders, products-to-reviews
- Properties:
name
: Relationship name (required)primary_key
: Unique identifier field (required)foreign_key
: Reference field in formatmodel.field
(required)cardinality
: Type of relationship (recommended)join_type
: SQL join type (optional, defaults to LEFT)
- Cardinality Types:
one_to_one
: Each record connects to exactly one recordone_to_many
: Parent record connects to multiple child recordsmany_to_one
: Multiple records connect to one parent recordmany_to_many
: Many-to-many relationship
- Best Practices:
- Only define relationships when there’s evidence they exist in the data
- Use descriptive relationship names that reflect business concepts
- Document the business meaning of relationships
- Define relationships bidirectionally when appropriate
- Verify column data types match between related fields
Tips for AI Agents
When generating or modifying Buster configuration files:
-
Start with Basic Structure:
- Define required fields first (data_source_name, schema, database)
- Then add optional configurations as needed
-
Ensure Semantic Clarity:
- Use descriptive names that reflect business concepts
- Include clear descriptions for all components
- Maintain consistent naming conventions (snake_case)
-
Model Organization:
- Organize models by business domain
- Keep related models in the same file
- Split large model collections into multiple files
-
Relationship Mapping:
- Ensure primary and foreign keys are correctly specified
- Define appropriate cardinality for each relationship
- Consider join type implications for queries
-
Metric Definition:
- Use SQL expressions that match the target database dialect
- Ensure calculations are mathematically sound
- Reference fields using correct notation
-
Field Types:
- Specify appropriate data types for dimensions and measures
- Choose correct aggregation types for measures
- Mark primary keys and searchable fields appropriately
-
Validation Checks:
- Ensure all required fields are present
- Verify that references between models are valid
- Check that expressions use correct field names and syntax
-
Avoid Data Assumptions:
- Never make assumptions about specific values in columns
- Use generic pattern descriptions rather than assuming particular statuses, categories, or values
- Consult the actual data or schema documentation before creating filters or metrics
- Create semantic models that adapt to the user’s actual data values, not presumed values
- Don’t assume relationships between models unless clearly evident in the schema
-
Relationship Discovery Guidelines:
- Only create relationships/joins when you have evidence from column names or schema analysis
- Acceptable approaches include:
- Finding matching column patterns (e.g.,
id
in one table,user_id
in another) - Analyzing foreign key constraints from database metadata
- Following explicit instructions or documentation about table relationships
- Finding matching column patterns (e.g.,
- When analyzing multiple models at once, look for conventional patterns like:
- Primary key matches foreign key name (e.g.,
users.id
→orders.user_id
) - Table name incorporated in foreign key (e.g.,
user_id
referencing theusers
table)
- Primary key matches foreign key name (e.g.,
- Always verify relationship assumptions with the user when possible
-
Use dbt Metadata Commands:
- Leverage dbt’s metadata commands to gather accurate schema information without affecting production:
dbt ls
- List all models in the projectdbt docs generate
- Generate documentation (then examine metadata files)dbt parse
- Parse project files without running modelsdbt compile
- Compile SQL without executing itdbt describe
- View upstream/downstream dependencies of a model
- IMPORTANT: Never run commands that execute models (
dbt run
,dbt build
, etc.) - Extract valuable information from dbt metadata such as:
- Column names, types, and descriptions
- Model dependencies and lineage
- Table properties and configurations
- Existing documentation and tests
- Use this metadata to make informed decisions when creating semantic models
Following these guidelines will help create effective Buster configurations that accurately represent business concepts and enable powerful natural language querying of data warehouses.
SQL Data Modeling with dbt
Buster works best when the underlying data models align well with the semantic layer. Using dbt as the primary modeling tool helps create a clean, modular data structure that maps effectively to Buster’s semantic models.
Entity-Based Modeling Approach
The SQL data models should mirror the entity-focused approach of the semantic layer, with tables structured to represent business entities and their relationships.
dbt Model Organization
- Sources: Raw data from warehouse (src_*)
- Staging: Cleaned, typed data (stg_*)
- Intermediate: Business logic processing (int_*)
- Entity Models: Core business entities (e.g., users, products, orders)
- Derived Models: Advanced calculations and aggregations (derived_*)
Recommended File Structure
Entity Model Pattern
Entity Relationship Model Pattern
Derived Model Pattern
Aligning SQL Models with Semantic Layer
When designing SQL models to work optimally with Buster’s semantic layer:
-
Entity-Based Alignment:
- Create entity-based models that directly mirror your semantic models
- Include both dimensional attributes and measures on the entity model
- Structure models around business concepts rather than database structure
-
Field Naming Consistency:
- Use consistent field names between SQL models and semantic models
- Follow snake_case naming convention for all database objects
- Include business-friendly names that accurately describe the data concepts
-
Data Organization:
- Group related entities by business domain
- Keep the model structure flat and intuitive
- Put attributes directly on the relevant entity rather than normalizing excessively
-
Data Type Considerations:
- Choose appropriate SQL data types that align with semantic types
- Use TIMESTAMP types for all date-time fields
- Consider using DECIMAL types with appropriate precision for monetary values
-
Relationship Implementation:
- Ensure entity models contain the necessary keys for relationships
- Use consistent naming for relationship fields (e.g., entity_id pattern)
- Create derived models to materialize complex relationships when needed
-
Performance Optimization:
- Create appropriate indexes for frequently queried fields
- Consider materializing important entities with calculated attributes
- Pre-compute complex derivations to improve query performance
-
Documentation and Testing:
- Include column descriptions in dbt model files
- Add documentation that explains business logic for complex calculations
- Implement dbt tests for primary keys, relationships, and business rules
-
Utilize dbt Metadata for Better Modeling:
- Run
dbt docs generate
to extract rich metadata about the existing models - Examine model structure, column types, and relationships from manifest.json
- Use
dbt compile
to see how models are transformed without executing them - Analyze upstream and downstream dependencies with
dbt describe
- Access descriptions and documentation from existing dbt projects
- Run
By aligning your SQL data modeling approach with Buster’s semantic layer, you create a cohesive data ecosystem where each entity model directly maps to business concepts in the semantic layer, making it intuitive for users to query through Buster.