AI Documentation
This document provides comprehensive guidance for AI agents (Claude, GPT, Cursor, etc.) when working with Buster’s semantic layer and configuration files.
Note for AI Agents: All Buster documentation is available via HTTP requests:
- Access any page from https://docs.buster.so/ by appending .md to the URL to get it in Markdown format. For example: https://docs.buster.so/references/semantic-models.md
- Use tools like curl or fetch to retrieve documentation as needed
- Two specialized endpoints are available:
  - https://docs.buster.so/llms.txt (concise overview)
  - https://docs.buster.so/llms-full.txt (comprehensive reference)
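The URL rule above is simple enough to express in code. Below is a minimal Python sketch that uses the standard library's urllib.request in place of curl or fetch. The helper names markdown_url and fetch_text are hypothetical, and fetch_text assumes the pages are served as UTF-8.

```python
from urllib.request import urlopen

# Specialized endpoints listed above.
LLMS_TXT = "https://docs.buster.so/llms.txt"
LLMS_FULL_TXT = "https://docs.buster.so/llms-full.txt"


def markdown_url(page_url: str) -> str:
    """Return the Markdown variant of a docs page by appending '.md'."""
    if page_url.endswith(".md"):
        return page_url  # already points at the Markdown version
    return page_url + ".md"


def fetch_text(url: str, timeout: float = 10.0) -> str:
    """Download a page as text (performs a network request; assumes UTF-8)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, fetch_text(markdown_url("https://docs.buster.so/references/semantic-models")) would download the semantic models reference as Markdown, and fetch_text(LLMS_TXT) would download the concise overview.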
What is Buster?
Buster is a platform for building and deploying AI data analysts that connects to various data sources and uses a semantic layer to understand your data structure and business logic. The platform enables natural language querying of your data warehouse with accurate, relevant responses.
Semantic Layer Overview
The semantic layer in Buster is a collection of YAML files that define the following components:
- Models: Core business entities that represent objects for analysis (orders, products, customers)
- Dimensions: Non-numeric attributes used for grouping, filtering, or segmenting (dates, categories, statuses)
- Measures: Quantifiable attributes that can be aggregated (amounts, counts, durations)
- Metrics: Calculations and business logic combining measures and dimensions (revenue, conversion rates)
- Filters: Named boolean conditions for common query constraints (is_active, recent_orders)
- Relationships: Connections between different models (customer-to-orders, orders-to-products)
These components provide Buster with the understanding needed to generate accurate SQL, interpret business metrics consistently, and deliver reliable insights.
Core Semantic Components in Detail
Models
Models define meaningful business entities or concepts in your data. They are the foundation of your semantic layer.
Best Practices for Models:
- Use snake_case for naming
- Provide clear, concise descriptions
- Organize by business domain in separate files
- Name files after the main model they contain
- At least one of dimensions, measures, or metrics is required
Types of Models:
- Entity-focused: Represent business objects (customers, products)
- Event-focused: Represent occurrences (orders, page views)
- Metric-focused: Centered around core business KPIs
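The naming and content rules above can be checked mechanically before a model is written to YAML. This Python sketch assumes a model has already been loaded into a dict whose keys mirror the YAML fields (name, description, dimensions, measures, metrics). check_model is a hypothetical helper, not part of the Buster CLI.

```python
import re

# snake_case: lowercase words of letters/digits joined by single underscores.
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")


def check_model(model: dict) -> list:
    """Return human-readable problems with a model definition (empty if none)."""
    problems = []
    name = model.get("name", "")
    if not SNAKE_CASE.fullmatch(name):
        problems.append(f"model name {name!r} is not snake_case")
    if not model.get("description"):
        problems.append(f"model {name!r} has no description")
    # At least one of dimensions, measures, or metrics must be non-empty.
    if not any(model.get(key) for key in ("dimensions", "measures", "metrics")):
        problems.append(f"model {name!r} needs a dimension, measure, or metric")
    return problems
```

A model such as {"name": "orders", "description": "One row per order", "measures": [{"name": "amount"}]} passes with an empty list, while {"name": "OrderItems", "description": ""} reports all three problems.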
Dimensions
Dimensions are the lowest-level descriptive properties of a data object. They are usually non-numeric attributes; numeric dimensions are used only as identifiers or categories.
Types of Dimensions:
- string: Text values (customer_name, product_category)
- timestamp: Date and time values (created_at, order_date)
- boolean: True/false values (is_active, is_completed)
- date: Date without time component
- number/integer: Numeric values used as identifiers or categories
Best Practices for Dimensions:
- Use descriptive column names that clearly indicate what the field represents
- Write descriptions that explain:
  - What the column is (its contents and meaning)
  - Patterns of values that might appear in it
  - Its utility in analysis (how/why it’s used)
- Include time-based dimensions for temporal analysis
- Mark searchable fields for important attributes
- Add descriptive information about the data type and meaning
Measures
Measures are the lowest-level numeric properties of a data object that can be aggregated in analytics.
Types of Measures:
- decimal/number: Precise numeric values (e.g., price, revenue)
- integer: Whole numbers (e.g., quantity, views)
Important Note: The type field for measures represents the raw data type from the database, not an aggregation type. Aggregation methods (SUM, AVG, etc.) are determined at query time or within metric definitions, not in the measure definition itself.
Best Practices for Measures:
- Use descriptive column names that clearly indicate what the measure represents
- Write descriptions that explain:
  - What the measure is (its contents and meaning)
  - How it was calculated (if derived)
  - Its utility in analysis (how/why it’s used)
- Include unit of measurement in description when applicable (e.g., “in USD”, “in minutes”)
- Specify the correct raw data type that matches your database schema
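To make the distinction between raw type and aggregation concrete, here is a Python sketch in which the measure only records its raw type and the aggregation is chosen when the query is built. The helper aggregate_sql and the set of supported aggregations are illustrative assumptions, not Buster's query generator.

```python
# Raw data types a measure may declare (from the list above).
MEASURE_TYPES = {"decimal", "number", "integer"}
# Aggregations applied at query time; an illustrative subset.
AGGREGATIONS = {"SUM", "AVG", "MIN", "MAX", "COUNT"}


def aggregate_sql(model: str, measure: dict, aggregation: str) -> str:
    """Build a model-prefixed aggregate expression for a measure."""
    if measure["type"] not in MEASURE_TYPES:
        raise ValueError(
            f"measure type {measure['type']!r} must be a raw data type, "
            "not an aggregation"
        )
    agg = aggregation.upper()
    if agg not in AGGREGATIONS:
        raise ValueError(f"unsupported aggregation {aggregation!r}")
    return f"{agg}({model}.{measure['name']})"


amount = {"name": "amount", "type": "decimal"}  # hypothetical measure
total_sql = aggregate_sql("orders", amount, "sum")    # "SUM(orders.amount)"
average_sql = aggregate_sql("orders", amount, "avg")  # "AVG(orders.amount)"
```

The same measure serves both a total and an average, which is why the aggregation does not belong in its type field; a definition like {"type": "sum"} is rejected.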
Metrics
Metrics are high-level calculations derived from dimensions and measures that represent business KPIs.
Best Practices for Metrics:
- Centralize business logic in metrics
- Use descriptive names indicating calculation
- Document formula and business significance
- Parameterize metrics when appropriate
- Ensure all metrics return numeric values
- Reference fields using consistent notation
- Use table prefixes (e.g., orders.amount) for clarity when referencing fields across models
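A rough way to enforce the table-prefix convention is to scan a metric expression for known field names that appear without a model prefix. The Python sketch below uses a regular expression rather than a real SQL parser, so it is only a heuristic: it does not skip string literals and does not understand schema-qualified names. The function unqualified_refs is hypothetical.

```python
import re

# An identifier, optionally followed by ".identifier" (a model-prefixed reference).
REFERENCE = re.compile(r"\b([A-Za-z_]\w*)(?:\.([A-Za-z_]\w*))?")


def unqualified_refs(expression: str, field_names: set) -> list:
    """Return known field names that appear without a model prefix."""
    found = []
    for match in REFERENCE.finditer(expression):
        head, qualified = match.groups()
        # A bare identifier that matches a field name is missing its prefix.
        if qualified is None and head in field_names:
            found.append(head)
    return found


# Flags order_id but accepts orders.amount.
flagged = unqualified_refs(
    "SUM(orders.amount) / COUNT(DISTINCT order_id)", {"amount", "order_id"}
)  # ['order_id']
```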
Filters
Filters are high-level abstractions that provide reusable filtering conditions.
Best Practices for Filters:
- Define common filtering patterns
- Create concept-based filters that describe what the filter does
- Use for time-based filters (e.g., recent_period, current_year)
- Ensure expressions always evaluate to boolean values
- Avoid assumptions about specific values existing in a column
- Document any required joins or relationships
Relationships
Relationships define connections between different data models.
Cardinality Types (kebab-case):
- one-to-one: Each record connects to exactly one record
- one-to-many: Parent record connects to multiple child records
- many-to-one: Multiple records connect to one parent record
- many-to-many: Many-to-many relationship (usually requires a junction model)
Join Types (kebab-case):
- left: Left join (default)
- inner: Inner join
- right: Right join
- full-outer: Full outer join
Important Note: Cardinality and join type fields help AI agents understand how models should be joined together in queries. If the type field is left unspecified, the relationship can be joined in multiple ways depending on the query requirements. These fields provide guidance on the most appropriate join strategy for this relationship.
Best Practices for Relationships:
- Only define relationships when there’s evidence they exist in the data
- Use descriptive relationship names that reflect business concepts
- Document the business meaning of relationships and their analytical utility
- Define relationships bidirectionally when appropriate
- Verify column data types match between related fields
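The cardinality and join-type values above can be validated and turned into a JOIN clause. In this Python sketch, the relationship keys source_col, ref_col, cardinality and type follow the fields named in this document, while ref_model (the name of the joined model) and join_clause are hypothetical. When type is missing, the sketch falls back to left, the listed default, although an agent may choose another join when the query calls for it.

```python
CARDINALITIES = {"one-to-one", "one-to-many", "many-to-one", "many-to-many"}
JOIN_KEYWORDS = {
    "left": "LEFT JOIN",
    "inner": "INNER JOIN",
    "right": "RIGHT JOIN",
    "full-outer": "FULL OUTER JOIN",
}


def join_clause(source_model: str, rel: dict) -> str:
    """Render a SQL JOIN for a relationship defined on source_model."""
    cardinality = rel.get("cardinality")
    if cardinality is not None and cardinality not in CARDINALITIES:
        raise ValueError(f"unknown cardinality {cardinality!r}; use kebab-case")
    join_type = rel.get("type") or "left"  # fall back to the default join type
    if join_type not in JOIN_KEYWORDS:
        raise ValueError(f"unknown join type {join_type!r}; use kebab-case")
    ref = rel["ref_model"]  # hypothetical key naming the joined model
    return (f"{JOIN_KEYWORDS[join_type]} {ref} "
            f"ON {source_model}.{rel['source_col']} = {ref}.{rel['ref_col']}")
```

For a relationship {"ref_model": "customers", "source_col": "customer_id", "ref_col": "id", "cardinality": "many-to-one"} defined on orders, this renders LEFT JOIN customers ON orders.customer_id = customers.id. A snake_case value such as many_to_one is rejected.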
Parameter Arguments
Arguments are used to parameterize Metric and Filter expressions, allowing for dynamic input at query time.
Argument Types:
- string: Text values
- number: Decimal or floating-point values
- integer: Whole number values
- date: Date values
- boolean: True/false values
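Before an argument value is substituted into a metric or filter expression, it can be checked against the declared type. The mapping from argument types to Python types below is an assumption made for illustration, as is the check_argument helper; here number accepts integers as well as floats and decimals.

```python
from datetime import date, datetime
from decimal import Decimal

ARGUMENT_TYPES = {"string", "number", "integer", "date", "boolean"}


def check_argument(arg_type: str, value: object) -> bool:
    """Return True if value fits the declared argument type."""
    if arg_type not in ARGUMENT_TYPES:
        raise ValueError(f"unknown argument type {arg_type!r}")
    if arg_type == "boolean":
        return isinstance(value, bool)
    if isinstance(value, bool):
        return False  # bool subclasses int in Python, so keep it out of numeric types
    if arg_type == "string":
        return isinstance(value, str)
    if arg_type == "integer":
        return isinstance(value, int)
    if arg_type == "number":
        return isinstance(value, (int, float, Decimal))
    # arg_type == "date": datetime subclasses date, so reject values with a time part.
    return isinstance(value, date) and not isinstance(value, datetime)
```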
Guidelines for AI Agents
When generating or modifying Buster configuration files:
Structure and Organization
- Start with Basic Structure:
  - Define required fields first (name, description)
  - Then add optional configurations as needed
- Ensure Semantic Clarity:
  - Use descriptive names that reflect business concepts
  - Include clear descriptions for all components
  - Maintain consistent naming conventions (snake_case for most identifiers, kebab-case for cardinality and join types)
- Model Organization:
  - Organize models by business domain
  - Keep related models in the same file
  - Split large model collections into multiple files
Relationship and Field Handling
- Relationship Mapping:
  - Ensure source_col and ref_col are correctly specified
  - Define appropriate cardinality for each relationship using kebab-case (one-to-one, many-to-one, etc.)
  - Use kebab-case for join types (left, inner, right, full-outer)
  - Remember that cardinality and type are guidance for how AI should approach joins
- Metric Definition:
  - Use SQL expressions that match the target database dialect
  - Ensure calculations are mathematically sound
  - Reference fields using correct notation
- Field Types:
  - Specify appropriate data types for dimensions and measures
  - Remember that measure types represent raw data types, not aggregation methods
  - Mark primary keys and searchable fields appropriately
Data Integrity and Validation
- Validation Checks:
  - Ensure all required fields are present
  - Verify that references between models are valid
  - Check that expressions use correct field names and syntax
- Avoid Data Assumptions:
  - Never make assumptions about specific values in columns
  - Use generic pattern descriptions rather than assuming particular statuses, categories, or values
  - Consult the actual data or schema documentation before creating filters or metrics
  - Create semantic models that adapt to the user’s actual data values, not presumed values
  - Don’t assume relationships between models unless clearly evident in the schema
- Relationship Discovery Guidelines:
  - Only create relationships/joins when you have evidence from column names or schema analysis
  - Acceptable approaches include:
    - Finding matching column patterns (e.g., id as a source_col in one table, user_id as a ref_col in another)
    - Analyzing foreign key constraints from database metadata
    - Following explicit instructions or documentation about table relationships
  - Always verify relationship assumptions with the user when possible
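The column-pattern approach above can be sketched as a small heuristic that proposes candidate relationships from table and column names alone. This Python sketch assumes tables are given as a hypothetical {table_name: [column_names]} dict and matches a column such as user_id to an id column in a table named user or users. It reports each pair from the side that holds the foreign key; the same relationship can also be defined in the other direction, with id as the source_col. The output is only a list of candidates to confirm with the user or with foreign key metadata.

```python
def candidate_relationships(tables: dict) -> list:
    """Suggest (source_table, source_col, ref_table, ref_col) tuples from column names."""
    candidates = []
    for table, columns in tables.items():
        for column in columns:
            if not column.endswith("_id"):
                continue
            entity = column[: -len("_id")]  # "user_id" -> "user"
            # Try the singular and a naive plural of the entity as table names.
            for ref_table in (entity, entity + "s"):
                if ref_table != table and "id" in tables.get(ref_table, []):
                    candidates.append((table, column, ref_table, "id"))
    return candidates
```

With users(id, email), orders(id, user_id, coupon_code) and events(id, session_id), the only candidate is ("orders", "user_id", "users", "id"); session_id is left alone because no sessions table exists.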
Description Best Practices
- Write Comprehensive Descriptions:
  - For dimensions and measures, explain:
    - What the field represents (its content and meaning)
    - How it was calculated or derived (if applicable)
    - Common patterns of values (if relevant)
    - Its analytical utility (how/why it’s used)
  - For metrics, explain:
    - What the metric measures
    - Its business significance
    - How it should be interpreted
  - For relationships, explain:
    - What business connection it represents
    - How it enables specific types of analysis
Using dbt with Buster
Buster integrates with dbt to leverage your existing data modeling work and transform it into a powerful semantic layer. This integration allows you to maintain a single source of truth for both data transformation and semantic definitions.
dbt Metadata Commands for Buster Integration
When working with dbt and Buster together, use these metadata commands to extract schema information without affecting production data:
- dbt ls - Lists all models in your project
- dbt docs generate - Creates metadata files about your models and columns
- dbt parse - Validates dbt project files without running models
- dbt compile - Compiles SQL without executing it
- dbt describe - Shows model dependencies
IMPORTANT: Never run commands that execute models (dbt run, dbt build, etc.) as these could affect production data. Stick to metadata commands that only provide information.
Buster CLI Tools for dbt Integration
Buster provides tools to bridge dbt models with semantic layers:
- buster generate - Scaffolds semantic models based on your dbt project’s metadata
- buster parse - Validates the syntax and integrity of your Buster semantic models
- buster deploy - Deploys your semantic layer
Best Practices for dbt + Buster Workflow
For effective integration between dbt and Buster:
- Define Rich Metadata in dbt:
  - Add detailed descriptions to columns in your dbt models
  - Implement tests for primary/foreign keys
  - Use consistent naming patterns for join keys
- Develop Iteratively:
  - Use buster generate to create initial semantic models
  - Enhance them with additional metrics and filters
  - Validate changes with buster parse
- Maintain Consistency:
  - Keep naming conventions aligned between dbt and Buster
  - Update semantic models when dbt models change
  - Consider automation to sync descriptions and relationships
SQL Data Modeling Best Practices for Buster
To create SQL models in dbt that translate effectively to Buster’s semantic layer:
Model Organization
Structure your dbt models to represent business entities and concepts:
- Staging Models: Clean and standardize raw data
- Intermediate Models: Transform and join data from multiple sources
- Entity Models: Represent core business objects like customers, orders, products
- Metrics Models: Pre-calculate complex business metrics
Column Naming Best Practices
- Use descriptive names that clearly indicate column content and purpose
- Follow snake_case naming convention consistently
- Include units in names when relevant (e.g., amount_usd, duration_minutes)
- Use consistent patterns for similar concepts (e.g., all date fields end with _date)
- Avoid generic names like id, name, or value without context
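Some of these conventions are easy to lint automatically. This Python sketch checks snake_case, flags the generic names listed above, and enforces the _date suffix for date columns. The lint_column helper and its (name, data_type) inputs are hypothetical, and rules such as including units in names still need human review.

```python
import re

SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")
GENERIC_NAMES = {"id", "name", "value"}


def lint_column(name: str, data_type: str) -> list:
    """Return naming warnings for one column (empty list if it looks fine)."""
    warnings = []
    if not SNAKE_CASE.fullmatch(name):
        warnings.append(f"{name}: use snake_case")
    if name in GENERIC_NAMES:
        warnings.append(f"{name}: generic name; add the entity or context it describes")
    if data_type == "date" and not name.endswith("_date"):
        warnings.append(f"{name}: date columns should end with _date")
    return warnings
```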
Documentation Best Practices
- Write clear, comprehensive descriptions for models and columns
- Explain what each column represents, how it’s calculated, and its business purpose
- Document any data quality issues or caveats
- Note when columns are derived, calculated, or have special handling
Entity Model Design Best Practices
- Include attributes that business users commonly need for analysis
- Pre-calculate common metrics directly in entity models for performance
- Ensure each model has clear primary and foreign keys
- Include descriptive attributes along with measures in the same model
- Denormalize judiciously to improve query performance and simplify the semantic model
Relationship Design Best Practices
- Create clear, intuitive joins between business entities
- Use consistent key naming patterns across models
- Document relationship cardinality (one-to-many, etc.)
- Test relationship integrity with dbt tests
- Consider bidirectional relationship documentation to improve semantic understanding
From dbt Model to Buster Semantic Model
Well-structured dbt models map clearly to Buster semantic components:
- Non-numeric columns become dimensions
- Numeric columns become measures
- Foreign key columns indicate relationships
- Model descriptions provide context for semantic models
- Tests help identify primary keys and relationships
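The first rules of this mapping can be sketched in Python. The input is a hypothetical {column_name: data_type} dict, such as one assembled from the column types dbt records when you run dbt docs generate; classify_columns and the set of numeric type names are assumptions. Numeric identifier columns (id and names ending in _id) are kept as dimensions, consistent with the number/integer dimension type described earlier.

```python
NUMERIC_TYPES = {
    "integer", "int", "bigint", "smallint", "numeric", "decimal",
    "float", "double", "double precision", "real", "number",
}


def classify_columns(columns: dict) -> dict:
    """Split {column_name: data_type} into dimension and measure name lists."""
    result = {"dimensions": [], "measures": []}
    for name, data_type in columns.items():
        # Normalize types like "NUMERIC(12,2)" to "numeric".
        base_type = data_type.lower().split("(")[0].strip()
        is_numeric = base_type in NUMERIC_TYPES
        # Numeric identifiers describe rather than measure, so keep them as dimensions.
        is_identifier = name == "id" or name.endswith("_id")
        if is_numeric and not is_identifier:
            result["measures"].append(name)
        else:
            result["dimensions"].append(name)
    return result
```

Foreign key columns such as customer_id end up as dimensions here; they are also the starting point for the relationships described above.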
By following these best practices, your dbt models will provide an excellent foundation for Buster’s semantic layer, enabling powerful natural language querying with accurate results.