This document provides comprehensive guidance for AI agents (Claude, GPT, Cursor, etc.) when working with Buster’s semantic layer and configuration files.

Note for AI Agents: All Buster documentation is available via HTTP requests:

What is Buster?

Buster is a platform for building and deploying AI data analysts that connects to various data sources and uses a semantic layer to understand your data structure and business logic. The platform enables natural language querying of your data warehouse with accurate, relevant responses.

Semantic Layer Overview

The semantic layer in Buster is a collection of YAML files that define the following components:

  • Models: Core business entities that represent objects for analysis (orders, products, customers)
  • Dimensions: Non-numeric attributes used for grouping, filtering, or segmenting (dates, categories, statuses)
  • Measures: Quantifiable attributes that can be aggregated (amounts, counts, durations)
  • Metrics: Calculations and business logic combining measures and dimensions (revenue, conversion rates)
  • Filters: Named boolean conditions for common query constraints (is_active, recent_orders)
  • Relationships: Connections between different models (customer-to-orders, orders-to-products)

These components provide Buster with the understanding needed to generate accurate SQL, interpret business metrics consistently, and deliver reliable insights.

Core Semantic Components in Detail

Models

Models define meaningful business entities or concepts in your data. They are the foundation of your semantic layer.

name: orders              # Required: Unique identifier (snake_case)
description: "Order transactions from our e-commerce platform"  # Required: Clear explanation
dimensions:               # Optional: Attributes for slicing/filtering
  # ... dimension definitions ...
measures:                 # Optional: Quantifiable aggregations
  # ... measure definitions ...
metrics:                  # Optional: Derived KPIs/calculations
  # ... metric definitions ...
filters:                  # Optional: Reusable query conditions
  # ... filter definitions ...
relationships:            # Optional: Connections to other models
  # ... relationship definitions ...

Best Practices for Models:

  • Use snake_case for naming
  • Provide clear, concise descriptions
  • Organize by business domain in separate files
  • Name files after the main model they contain
  • At least one of dimensions, measures, or metrics is required

Types of Models:

  1. Entity-focused: Represent business objects (customers, products)
  2. Event-focused: Represent occurrences (orders, page views)
  3. Metric-focused: Centered around core business KPIs

Dimensions

Dimensions are the lowest level properties about a data object (non-numeric attributes).

dimensions:
  - name: order_id        # Required: Matches column name in database
    description: "Unique identifier for each order. Used as primary key and for joining to line items."  # Required
    type: string          # Recommended: Data type
    searchable: true      # Optional: For natural language search

Types of Dimensions:

  • string: Text values (customer_name, product_category)
  • timestamp: Date and time values (created_at, order_date)
  • boolean: True/false values (is_active, is_completed)
  • date: Date without time component
  • number/integer: Numeric values used as identifiers or categories

Best Practices for Dimensions:

  • Use descriptive column names that clearly indicate what the field represents
  • Write descriptions that explain:
    1. What the column is (its contents and meaning)
    2. Patterns of values that might appear in it
    3. Its utility in analysis (how/why it’s used)
  • Include time-based dimensions for temporal analysis
  • Mark searchable fields for important attributes
  • Add descriptive information about the data type and meaning

Measures

Measures are the lowest level numeric properties about a data object that can be aggregated in analytics.

measures:
  - name: revenue         # Required: Matches column name in database
    description: "Total order value in USD. Calculated as price × quantity - discount. Used for revenue calculations."  # Required
    type: decimal         # Required: Raw data type from database

Types of Measures:

  • decimal/number: Precise numeric values (e.g., price, revenue)
  • integer: Whole numbers (e.g., quantity, views)

Important Note: The type field for measures represents the raw data type from the database, not an aggregation type. Aggregation methods (SUM, AVG, etc.) are determined at query time or within metric definitions, not in the measure definition itself.

Best Practices for Measures:

  • Use descriptive column names that clearly indicate what the measure represents
  • Write descriptions that explain:
    1. What the measure is (its contents and meaning)
    2. How it was calculated (if derived)
    3. Its utility in analysis (how/why it’s used)
  • Include unit of measurement in description when applicable (e.g., “in USD”, “in minutes”)
  • Specify the correct raw data type that matches your database schema

Metrics

Metrics are high-level calculations derived from dimensions and measures that represent business KPIs.

metrics:
  - name: average_order_value  # Required: Descriptive name
    description: "Average $ amount per order. Key indicator of purchasing behavior."  # Required
    expr: "sum(revenue) / count(order_id)"  # Required: SQL expression
  - name: revenue_by_period
    description: "Revenue over a specified time period. Used for time-based performance analysis."
    expr: "sum(case when created_at >= dateadd(day, -{{days}}, current_date()) then revenue else 0 end)"
    args:
      - name: days
        type: integer
        description: "Number of days to include"

Best Practices for Metrics:

  • Centralize business logic in metrics
  • Use descriptive names indicating calculation
  • Document formula and business significance
  • Parameterize metrics when appropriate
  • Ensure all metrics return numeric values
  • Reference fields using consistent notation
  • Use table prefixes (e.g., orders.amount) for clarity when referencing fields across models

Filters

Filters are high-level abstractions that provide reusable filtering conditions.

filters:
  - name: is_complete
    description: "Filter for completed orders. Used to analyze finalized transactions."
    expr: "status = 'complete'"  # Required: Boolean expression
  - name: orders_above_amount
    description: "Orders with revenue above threshold. Useful for high-value transaction analysis."
    expr: "revenue > {{min_amount}}"
    args:
      - name: min_amount
        type: number
        description: "Minimum order amount"

Best Practices for Filters:

  • Define common filtering patterns
  • Create concept-based filters that describe what the filter does
  • Use for time-based filters (e.g., recent_period, current_year)
  • Ensure expressions always evaluate to boolean values
  • Avoid assumptions about specific values existing in a column
  • Document any required joins or relationships

Relationships

Relationships define connections between different data models.

relationships:
  - name: customer        # Required: Name of the *other* model to link to
    description: "The customer who placed this order. Enables customer-centric order analysis."  # Required
    source_col: customer_id  # Required: Join key column in *this* model
    ref_col: id           # Required: Join key column in the *related* model
    cardinality: many-to-one  # Optional: Relationship type
    type: left            # Optional: Join type, defaults to left (use kebab-case)

Cardinality Types (kebab-case):

  • one-to-one: Each record connects to exactly one record
  • one-to-many: Parent record connects to multiple child records
  • many-to-one: Multiple records connect to one parent record
  • many-to-many: Many-to-many relationship (usually requires a junction model)

Join Types (kebab-case):

  • left: Left join (default)
  • inner: Inner join
  • right: Right join
  • full-outer: Full outer join

Important Note: Cardinality and join type fields help AI agents understand how models should be joined together in queries. If the type is left unspecified, it means the relationship can be joined in multiple ways depending on the query requirements. These fields provide guidance on the most appropriate join strategy for this relationship.

Best Practices for Relationships:

  • Only define relationships when there’s evidence they exist in the data
  • Use descriptive relationship names that reflect business concepts
  • Document the business meaning of relationships and their analytical utility
  • Define relationships bidirectionally when appropriate
  • Verify column data types match between related fields

Parameter Arguments

Arguments are used to parameterize Metric and Filter expressions, allowing for dynamic input at query time.

args:
  - name: days          # Required: Name used in expression
    type: integer       # Required: Data type
    description: "Number of days to include in the calculation. Controls analysis timeframe."  # Required
    default: 30         # Optional: Default value

Argument Types:

  • string: Text values
  • number: Decimal or floating-point values
  • integer: Whole number values
  • date: Date values
  • boolean: True/false values

Guidelines for AI Agents

When generating or modifying Buster configuration files:

Structure and Organization

  1. Start with Basic Structure:

    • Define required fields first (name, description)
    • Then add optional configurations as needed
  2. Ensure Semantic Clarity:

    • Use descriptive names that reflect business concepts
    • Include clear descriptions for all components
    • Maintain consistent naming conventions (snake_case for most identifiers, kebab-case for relationship types)
  3. Model Organization:

    • Organize models by business domain
    • Keep related models in the same file
    • Split large model collections into multiple files

Relationship and Field Handling

  1. Relationship Mapping:

    • Ensure source_col and ref_col are correctly specified
    • Define appropriate cardinality for each relationship using kebab-case (one-to-one, many-to-one, etc.)
    • Use kebab-case for join types (left, inner, right, full-outer)
    • Remember that cardinality and type are guidance for how AI should approach joins
  2. Metric Definition:

    • Use SQL expressions that match the target database dialect
    • Ensure calculations are mathematically sound
    • Reference fields using correct notation
  3. Field Types:

    • Specify appropriate data types for dimensions and measures
    • Remember that measure types represent raw data types, not aggregation methods
    • Mark primary keys and searchable fields appropriately

Data Integrity and Validation

  1. Validation Checks:

    • Ensure all required fields are present
    • Verify that references between models are valid
    • Check that expressions use correct field names and syntax
  2. Avoid Data Assumptions:

    • Never make assumptions about specific values in columns
    • Use generic pattern descriptions rather than assuming particular statuses, categories, or values
    • Consult the actual data or schema documentation before creating filters or metrics
    • Create semantic models that adapt to the user’s actual data values, not presumed values
    • Don’t assume relationships between models unless clearly evident in the schema
  3. Relationship Discovery Guidelines:

    • Only create relationships/joins when you have evidence from column names or schema analysis
    • Acceptable approaches include:
      • Finding matching column patterns (e.g., id as a source_col in one table, user_id as a ref_col in another)
      • Analyzing foreign key constraints from database metadata
      • Following explicit instructions or documentation about table relationships
    • Always verify relationship assumptions with the user when possible

Description Best Practices

  1. Write Comprehensive Descriptions:
    • For dimensions and measures, explain:
      1. What the field represents (its content and meaning)
      2. How it was calculated or derived (if applicable)
      3. Common patterns of values (if relevant)
      4. Its analytical utility (how/why it’s used)
    • For metrics, explain:
      1. What the metric measures
      2. Its business significance
      3. How it should be interpreted
    • For relationships, explain:
      1. What business connection it represents
      2. How it enables specific types of analysis

Using dbt with Buster

Buster integrates with dbt to leverage your existing data modeling work and transform it into a powerful semantic layer. This integration allows you to maintain a single source of truth for both data transformation and semantic definitions.

dbt Metadata Commands for Buster Integration

When working with dbt and Buster together, use these metadata commands to extract schema information without affecting production data:

  • dbt ls - Lists all models in your project
  • dbt docs generate - Creates metadata files about your models and columns
  • dbt parse - Validates dbt project files without running models
  • dbt compile - Compiles SQL without executing it
  • dbt describe - Shows model dependencies

IMPORTANT: Never run commands that execute models (dbt run, dbt build, etc.) as these could affect production data. Stick to metadata commands that only provide information.

Buster CLI Tools for dbt Integration

Buster provides tools to bridge dbt models with semantic layers:

  • buster generate - Scaffolds semantic models based on your dbt project’s metadata
  • buster parse - Validates the syntax and integrity of your Buster semantic models
  • buster deploy - Deploys your semantic layer

Best Practices for dbt + Buster Workflow

For effective integration between dbt and Buster:

  1. Define Rich Metadata in dbt:

    • Add detailed descriptions to columns in your dbt models
    • Implement tests for primary/foreign keys
    • Use consistent naming patterns for join keys
  2. Develop Iteratively:

    • Use buster generate to create initial semantic models
    • Enhance them with additional metrics and filters
    • Validate changes with buster parse
  3. Maintain Consistency:

    • Keep naming conventions aligned between dbt and Buster
    • Update semantic models when dbt models change
    • Consider automation to sync descriptions and relationships

SQL Data Modeling Best Practices for Buster

To create SQL models in dbt that translate effectively to Buster’s semantic layer:

Model Organization

Structure your dbt models to represent business entities and concepts:

  • Staging Models: Clean and standardize raw data
  • Intermediate Models: Transform and join data from multiple sources
  • Entity Models: Represent core business objects like customers, orders, products
  • Metrics Models: Pre-calculate complex business metrics

Column Naming Best Practices

  • Use descriptive names that clearly indicate column content and purpose
  • Follow snake_case naming convention consistently
  • Include units in names when relevant (e.g., amount_usd, duration_minutes)
  • Use consistent patterns for similar concepts (all date fields end with _date, etc.)
  • Avoid generic names like id, name, or value without context

Documentation Best Practices

  • Write clear, comprehensive descriptions for models and columns
  • Explain what each column represents, how it’s calculated, and its business purpose
  • Document any data quality issues or caveats
  • Note when columns are derived, calculated, or have special handling

Entity Model Design Best Practices

  • Include attributes that business users commonly need for analysis
  • Pre-calculate common metrics directly in entity models for performance
  • Ensure each model has clear primary and foreign keys
  • Include descriptive attributes along with measures in the same model
  • Denormalize judiciously to improve query performance and simplify the semantic model

Relationship Design Best Practices

  • Create clear, intuitive joins between business entities
  • Use consistent key naming patterns across models
  • Document relationship cardinality (one-to-many, etc.)
  • Test relationship integrity with dbt tests
  • Consider bidirectional relationship documentation to improve semantic understanding

From dbt Model to Buster Semantic Model

Well-structured dbt models map clearly to Buster semantic components:

  • Non-numeric columns become dimensions
  • Numeric columns become measures
  • Foreign key columns indicate relationships
  • Model descriptions provide context for semantic models
  • Tests help identify primary keys and relationships

By following these best practices, your dbt models will provide an excellent foundation for Buster’s semantic layer, enabling powerful natural language querying with accurate results.