AI Documentation

This document provides comprehensive guidance for AI agents (Claude, GPT, Cursor, etc.) when working with Buster’s semantic layer and configuration files.

Note for AI Agents: All Buster documentation is available via HTTP requests:

Access any page from https://docs.buster.so/ by appending .md to the URL to get markdown format

For example: https://docs.buster.so/references/semantic-models.md

Use tools like curl or fetch to retrieve documentation as needed

Two specialized endpoints are available:

https://docs.buster.so/llms.txt (concise overview)

https://docs.buster.so/llms-full.txt (comprehensive reference)

What is Buster?

Buster is a platform for building and deploying AI data analysts that connects to various data sources and uses a semantic layer to understand your data structure and business logic. The platform enables natural language querying of your data warehouse with accurate, relevant responses.

Semantic Layer Overview

The semantic layer in Buster is a collection of YAML files that define the following components:

Models: Core business entities that represent objects for analysis (orders, products, customers)
Dimensions: Non-numeric attributes used for grouping, filtering, or segmenting (dates, categories, statuses)
Measures: Quantifiable attributes that can be aggregated (amounts, counts, durations)
Metrics: Calculations and business logic combining measures and dimensions (revenue, conversion rates)
Filters: Named boolean conditions for common query constraints (is_active, recent_orders)
Relationships: Connections between different models (customer-to-orders, orders-to-products)

These components provide Buster with the understanding needed to generate accurate SQL, interpret business metrics consistently, and deliver reliable insights.

Core Semantic Components in Detail

Models

Models define meaningful business entities or concepts in your data. They are the foundation of your semantic layer.

name: orders              # Required: Unique identifier (snake_case)
description: "Order transactions from our e-commerce platform"  # Required: Clear explanation
dimensions:               # Optional: Attributes for slicing/filtering
  # ... dimension definitions ...
measures:                 # Optional: Quantifiable aggregations
  # ... measure definitions ...
metrics:                  # Optional: Derived KPIs/calculations
  # ... metric definitions ...
filters:                  # Optional: Reusable query conditions
  # ... filter definitions ...
relationships:            # Optional: Connections to other models
  # ... relationship definitions ...

Best Practices for Models:

Use snake_case for naming
Provide clear, concise descriptions
Organize by business domain in separate files
Name files after the main model they contain
At least one of dimensions, measures, or metrics is required

Types of Models:

Entity-focused: Represent business objects (customers, products)
Event-focused: Represent occurrences (orders, page views)
Metric-focused: Centered around core business KPIs

Dimensions

Dimensions are the lowest level properties about a data object (non-numeric attributes).

dimensions:
  - name: order_id        # Required: Matches column name in database
    description: "Unique identifier for each order. Used as primary key and for joining to line items."  # Required
    type: string          # Recommended: Data type
    searchable: true      # Optional: For natural language search

Types of Dimensions:

string: Text values (customer_name, product_category)
timestamp: Date and time values (created_at, order_date)
boolean: True/false values (is_active, is_completed)
date: Date without time component
number/integer: Numeric values used as identifiers or categories

Best Practices for Dimensions:

Use descriptive column names that clearly indicate what the field represents
Write descriptions that explain:
1. What the column is (its contents and meaning)
2. Patterns of values that might appear in it
3. Its utility in analysis (how/why it’s used)
Include time-based dimensions for temporal analysis
Mark searchable fields for important attributes
Add descriptive information about the data type and meaning

Measures

Measures are the lowest level numeric properties about a data object that can be aggregated in analytics.

measures:
  - name: revenue         # Required: Matches column name in database
    description: "Total order value in USD. Calculated as price × quantity - discount. Used for revenue calculations."  # Required
    type: decimal         # Required: Raw data type from database

Types of Measures:

decimal/number: Precise numeric values (e.g., price, revenue)
integer: Whole numbers (e.g., quantity, views)

Important Note: The type field for measures represents the raw data type from the database, not an aggregation type. Aggregation methods (SUM, AVG, etc.) are determined at query time or within metric definitions, not in the measure definition itself. Best Practices for Measures:

Use descriptive column names that clearly indicate what the measure represents
Write descriptions that explain:
1. What the measure is (its contents and meaning)
2. How it was calculated (if derived)
3. Its utility in analysis (how/why it’s used)
Include unit of measurement in description when applicable (e.g., “in USD”, “in minutes”)
Specify the correct raw data type that matches your database schema

Metrics

Metrics are high-level calculations derived from dimensions and measures that represent business KPIs.

metrics:
  - name: average_order_value  # Required: Descriptive name
    description: "Average $ amount per order. Key indicator of purchasing behavior."  # Required
    expr: "sum(revenue) / count(order_id)"  # Required: SQL expression
  - name: revenue_by_period
    description: "Revenue over a specified time period. Used for time-based performance analysis."
    expr: "sum(case when created_at >= dateadd(day, -{{days}}, current_date()) then revenue else 0 end)"
    args:
      - name: days
        type: integer
        description: "Number of days to include"

Best Practices for Metrics:

Centralize business logic in metrics
Use descriptive names indicating calculation
Document formula and business significance
Parameterize metrics when appropriate
Ensure all metrics return numeric values
Reference fields using consistent notation
Use table prefixes (e.g., orders.amount) for clarity when referencing fields across models

Filters

Filters are high-level abstractions that provide reusable filtering conditions.

filters:
  - name: is_complete
    description: "Filter for completed orders. Used to analyze finalized transactions."
    expr: "status = 'complete'"  # Required: Boolean expression
  - name: orders_above_amount
    description: "Orders with revenue above threshold. Useful for high-value transaction analysis."
    expr: "revenue > {{min_amount}}"
    args:
      - name: min_amount
        type: number
        description: "Minimum order amount"

Best Practices for Filters:

Define common filtering patterns
Create concept-based filters that describe what the filter does
Use for time-based filters (e.g., recent_period, current_year)
Ensure expressions always evaluate to boolean values
Avoid assumptions about specific values existing in a column
Document any required joins or relationships

Relationships

Relationships define connections between different data models.

relationships:
  - name: customer        # Required: Name of the *other* model to link to
    description: "The customer who placed this order. Enables customer-centric order analysis."  # Required
    source_col: customer_id  # Required: Join key column in *this* model
    ref_col: id           # Required: Join key column in the *related* model
    cardinality: many-to-one  # Optional: Relationship type
    type: left            # Optional: Join type, defaults to left (use kebab-case)

Cardinality Types (kebab-case):

one-to-one: Each record connects to exactly one record
one-to-many: Parent record connects to multiple child records
many-to-one: Multiple records connect to one parent record
many-to-many: Many-to-many relationship (usually requires a junction model)

Join Types (kebab-case):

left: Left join (default)
inner: Inner join
right: Right join
full-outer: Full outer join

Important Note: Cardinality and join type fields help AI agents understand how models should be joined together in queries. If the type is left unspecified, it means the relationship can be joined in multiple ways depending on the query requirements. These fields provide guidance on the most appropriate join strategy for this relationship. Best Practices for Relationships:

Only define relationships when there’s evidence they exist in the data
Use descriptive relationship names that reflect business concepts
Document the business meaning of relationships and their analytical utility
Define relationships bidirectionally when appropriate
Verify column data types match between related fields

Parameter Arguments

Arguments are used to parameterize Metric and Filter expressions, allowing for dynamic input at query time.

args:
  - name: days          # Required: Name used in expression
    type: integer       # Required: Data type
    description: "Number of days to include in the calculation. Controls analysis timeframe."  # Required
    default: 30         # Optional: Default value

Argument Types:

string: Text values
number: Decimal or floating-point values
integer: Whole number values
date: Date values
boolean: True/false values

Guidelines for AI Agents

When generating or modifying Buster configuration files:

Structure and Organization

Start with Basic Structure:
- Define required fields first (name, description)
- Then add optional configurations as needed
Ensure Semantic Clarity:
- Use descriptive names that reflect business concepts
- Include clear descriptions for all components
- Maintain consistent naming conventions (snake_case for most identifiers, kebab-case for relationship types)
Model Organization:
- Organize models by business domain
- Keep related models in the same file
- Split large model collections into multiple files

Relationship and Field Handling

Relationship Mapping:
- Ensure source_col and ref_col are correctly specified
- Define appropriate cardinality for each relationship using kebab-case (one-to-one, many-to-one, etc.)
- Use kebab-case for join types (left, inner, right, full-outer)
- Remember that cardinality and type are guidance for how AI should approach joins
Metric Definition:
- Use SQL expressions that match the target database dialect
- Ensure calculations are mathematically sound
- Reference fields using correct notation
Field Types:
- Specify appropriate data types for dimensions and measures
- Remember that measure types represent raw data types, not aggregation methods
- Mark primary keys and searchable fields appropriately

Data Integrity and Validation

Validation Checks:
- Ensure all required fields are present
- Verify that references between models are valid
- Check that expressions use correct field names and syntax
Avoid Data Assumptions:
- Never make assumptions about specific values in columns
- Use generic pattern descriptions rather than assuming particular statuses, categories, or values
- Consult the actual data or schema documentation before creating filters or metrics
- Create semantic models that adapt to the user’s actual data values, not presumed values
- Don’t assume relationships between models unless clearly evident in the schema
Relationship Discovery Guidelines:
- Only create relationships/joins when you have evidence from column names or schema analysis
- Acceptable approaches include:
  - Finding matching column patterns (e.g., id as a source_col in one table, user_id as a ref_col in another)
  - Analyzing foreign key constraints from database metadata
  - Following explicit instructions or documentation about table relationships
- Always verify relationship assumptions with the user when possible

Description Best Practices

Write Comprehensive Descriptions:
- For dimensions and measures, explain:
  1. What the field represents (its content and meaning)
  2. How it was calculated or derived (if applicable)
  3. Common patterns of values (if relevant)
  4. Its analytical utility (how/why it’s used)
- For metrics, explain:
  1. What the metric measures
  2. Its business significance
  3. How it should be interpreted
- For relationships, explain:
  1. What business connection it represents
  2. How it enables specific types of analysis

Using dbt with Buster

Buster integrates with dbt to leverage your existing data modeling work and transform it into a powerful semantic layer. This integration allows you to maintain a single source of truth for both data transformation and semantic definitions.

dbt Metadata Commands for Buster Integration

When working with dbt and Buster together, use these metadata commands to extract schema information without affecting production data:

dbt ls - Lists all models in your project
dbt docs generate - Creates metadata files about your models and columns
dbt parse - Validates dbt project files without running models
dbt compile - Compiles SQL without executing it
dbt describe - Shows model dependencies

IMPORTANT: Never run commands that execute models (dbt run, dbt build, etc.) as these could affect production data. Stick to metadata commands that only provide information.

Buster CLI Tools for dbt Integration

Buster provides tools to bridge dbt models with semantic layers:

buster generate - Scaffolds semantic models based on your dbt project’s metadata
buster parse - Validates the syntax and integrity of your Buster semantic models
buster deploy - Deploys your semantic layer

Best Practices for dbt + Buster Workflow

For effective integration between dbt and Buster:

Define Rich Metadata in dbt:
- Add detailed descriptions to columns in your dbt models
- Implement tests for primary/foreign keys
- Use consistent naming patterns for join keys
Develop Iteratively:
- Use buster generate to create initial semantic models
- Enhance them with additional metrics and filters
- Validate changes with buster parse
Maintain Consistency:
- Keep naming conventions aligned between dbt and Buster
- Update semantic models when dbt models change
- Consider automation to sync descriptions and relationships

SQL Data Modeling Best Practices for Buster

To create SQL models in dbt that translate effectively to Buster’s semantic layer:

Model Organization

Structure your dbt models to represent business entities and concepts:

Staging Models: Clean and standardize raw data
Intermediate Models: Transform and join data from multiple sources
Entity Models: Represent core business objects like customers, orders, products
Metrics Models: Pre-calculate complex business metrics

Column Naming Best Practices

Use descriptive names that clearly indicate column content and purpose
Follow snake_case naming convention consistently
Include units in names when relevant (e.g., amount_usd, duration_minutes)
Use consistent patterns for similar concepts (all date fields end with _date, etc.)
Avoid generic names like id, name, or value without context

Documentation Best Practices

Write clear, comprehensive descriptions for models and columns
Explain what each column represents, how it’s calculated, and its business purpose
Document any data quality issues or caveats
Note when columns are derived, calculated, or have special handling

Entity Model Design Best Practices

Include attributes that business users commonly need for analysis
Pre-calculate common metrics directly in entity models for performance
Ensure each model has clear primary and foreign keys
Include descriptive attributes along with measures in the same model
Denormalize judiciously to improve query performance and simplify the semantic model

Relationship Design Best Practices

Create clear, intuitive joins between business entities
Use consistent key naming patterns across models
Document relationship cardinality (one-to-many, etc.)
Test relationship integrity with dbt tests
Consider bidirectional relationship documentation to improve semantic understanding

From dbt Model to Buster Semantic Model

Well-structured dbt models map clearly to Buster semantic components:

Non-numeric columns become dimensions
Numeric columns become measures
Foreign key columns indicate relationships
Model descriptions provide context for semantic models
Tests help identify primary keys and relationships

By following these best practices, your dbt models will provide an excellent foundation for Buster’s semantic layer, enabling powerful natural language querying with accurate results.

GETTING STARTED

SEMANTIC LAYER

CONNECTING TO DATA

USING THE CLI

RESOURCES FOR AI AGENTS

DEPLOYMENT

REFERENCES

AI Documentation

What is Buster?

Semantic Layer Overview

Core Semantic Components in Detail

Models

Dimensions

Measures

Metrics

Filters

Relationships

Parameter Arguments

Guidelines for AI Agents

Structure and Organization

Relationship and Field Handling

Data Integrity and Validation

Description Best Practices

Using dbt with Buster

dbt Metadata Commands for Buster Integration

Buster CLI Tools for dbt Integration

Best Practices for dbt + Buster Workflow

SQL Data Modeling Best Practices for Buster

Model Organization

Column Naming Best Practices

Documentation Best Practices

Entity Model Design Best Practices

Relationship Design Best Practices

From dbt Model to Buster Semantic Model

GETTING STARTED

SEMANTIC LAYER

CONNECTING TO DATA

USING THE CLI

RESOURCES FOR AI AGENTS

DEPLOYMENT

REFERENCES

​What is Buster?

​Semantic Layer Overview

​Core Semantic Components in Detail

​Models

​Dimensions

​Measures

​Metrics

​Filters

​Relationships

​Parameter Arguments

​Guidelines for AI Agents

​Structure and Organization

​Relationship and Field Handling

​Data Integrity and Validation

​Description Best Practices

​Using dbt with Buster

​dbt Metadata Commands for Buster Integration

​Buster CLI Tools for dbt Integration

​Best Practices for dbt + Buster Workflow

​SQL Data Modeling Best Practices for Buster

​Model Organization

​Column Naming Best Practices

​Documentation Best Practices

​Entity Model Design Best Practices

​Relationship Design Best Practices

​From dbt Model to Buster Semantic Model

What is Buster?

Semantic Layer Overview

Core Semantic Components in Detail

Models

Dimensions

Measures

Metrics

Filters

Relationships

Parameter Arguments

Guidelines for AI Agents

Structure and Organization

Relationship and Field Handling

Data Integrity and Validation

Description Best Practices

Using dbt with Buster

dbt Metadata Commands for Buster Integration

Buster CLI Tools for dbt Integration

Best Practices for dbt + Buster Workflow

SQL Data Modeling Best Practices for Buster

Model Organization

Column Naming Best Practices

Documentation Best Practices

Entity Model Design Best Practices

Relationship Design Best Practices

From dbt Model to Buster Semantic Model