# BigQuery
Source: https://docs.buster.so/data_sources/bigquery
To connect Buster to your BigQuery project, you will need the following credentials:
| Field | Description | Required |
| -------------------- | --------------------------------------------------- | -------- |
| `credentials_json` | Your Google Cloud service account key JSON. | Yes |
| `default_project_id` | The default Google Cloud Project ID to use. | Yes |
| `default_dataset_id` | The default BigQuery Dataset ID within the project. | Yes |
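The `credentials_json` value is the service account key file you download from Google Cloud IAM when creating a key for the service account. Its general shape looks like the following (abbreviated placeholder values; your actual key will contain real identifiers):

```json
{
  "type": "service_account",
  "project_id": "my-gcp-project",
  "private_key_id": "abc123",
  "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
  "client_email": "buster-readonly@my-gcp-project.iam.gserviceaccount.com",
  "client_id": "1234567890",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token"
}
```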
You may need to allowlist Buster's IPs: `44.219.39.124`, `34.230.173.35`, `100.26.25.127`
# ClickHouse
Source: https://docs.buster.so/data_sources/clickhouse
To connect Buster to your ClickHouse database (using the MySQL protocol), you will need the following credentials:
| Field | Description | Required |
| ------------------ | ------------------------------------------------------------------------ | -------- |
| `host` | The hostname of your database. | Yes |
| `port` | The port your database listens on (typically `9004` for MySQL protocol). | Yes |
| `username` | The username for database access. | Yes |
| `password` | The password for database access. | Yes |
| `default_database` | The name of the database to connect to. | Yes |
You may need to allowlist Buster's IPs: `44.219.39.124`, `34.230.173.35`, `100.26.25.127`
# Databricks
Source: https://docs.buster.so/data_sources/databricks
To connect Buster to your Databricks workspace, you will need the following credentials:
| Field | Description | Required |
| ----------------- | ------------------------------------------------------------------------ | -------- |
| `host` | The Databricks workspace URL (e.g., `dbc-a1b234c.cloud.databricks.com`). | Yes |
| `api_key` | Your Databricks personal access token (PAT). | Yes |
| `warehouse_id` | The HTTP path of your Databricks SQL Warehouse. | Yes |
| `default_catalog` | The default catalog to use within Databricks. | Yes |
| `default_schema` | The default schema to use within the catalog. | Yes |
You may need to allowlist Buster's IPs: `44.219.39.124`, `34.230.173.35`, `100.26.25.127`
# MySQL
Source: https://docs.buster.so/data_sources/mysql
To connect Buster to your MySQL database, you will need the following credentials:
| Field | Description | Required |
| ------------------ | --------------------------------------- | -------- |
| `host` | The hostname of your database. | Yes |
| `port` | The port your database listens on. | Yes |
| `username` | The username for database access. | Yes |
| `password` | The password for database access. | Yes |
| `default_database` | The name of the database to connect to. | Yes |
You may need to allowlist Buster's IPs: `44.219.39.124`, `34.230.173.35`, `100.26.25.127`
# Overview
Source: https://docs.buster.so/data_sources/overview
## Connecting to Data Sources
Buster supports a wide range of databases and data warehouses. Typically, you'll need to:
1. Create a Buster user with Read-Only access to the data source.
2. Create the connection in Buster via CLI or the Buster UI.
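For step 1, on a Postgres-compatible database the read-only user might be created like this (a sketch with placeholder names; adapt the SQL to your warehouse's dialect and your actual database and schema):

```sql
-- Placeholder names: adjust the database, schema, and password for your environment
CREATE USER buster_readonly WITH PASSWORD 'choose-a-strong-password';
GRANT CONNECT ON DATABASE analytics TO buster_readonly;
GRANT USAGE ON SCHEMA public TO buster_readonly;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO buster_readonly;
-- Make sure tables created later are readable as well
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO buster_readonly;
```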
## Setup Guides
* [BigQuery](bigquery): Learn how to connect Buster with your Google BigQuery warehouse.
* [ClickHouse](clickhouse): Get started with connecting Buster to your ClickHouse instance.
* [Databricks](databricks): See how to link Buster with your Databricks Lakehouse Platform.
* [MySQL](mysql): Learn the process of connecting Buster to your MySQL database.
* [Postgres](postgres): See how to set up Buster with your PostgreSQL database.
* [Redshift](redshift): Follow our guide to integrate Buster with your Amazon Redshift warehouse.
* [Snowflake](snowflake): Discover the steps to connect Buster to your Snowflake Data Cloud.
* [StarRocks](starrocks): Find out how to integrate Buster with your StarRocks environment.
* [Supabase](supabase): Set up a connection between Buster and your Supabase Postgres database.
# Postgres
Source: https://docs.buster.so/data_sources/postgres
To connect Buster to your Postgres database, you will need the following credentials:
| Field | Description | Required |
| ------------------ | ---------------------------------------------- | -------- |
| `host` | The hostname of your database. | Yes |
| `port` | The port your database listens on. | Yes |
| `username` | The username for database access. | Yes |
| `password` | The password for database access. | Yes |
| `default_database` | The name of the database to connect to. | Yes |
| `default_schema` | The default schema to use within the database. | Yes |
You may need to allowlist Buster's IPs: `44.219.39.124`, `34.230.173.35`, `100.26.25.127`
# Redshift
Source: https://docs.buster.so/data_sources/redshift
To connect Buster to your Redshift cluster, you will need the following credentials:
| Field | Description | Required |
| ------------------ | ---------------------------------------------- | -------- |
| `host` | The endpoint of your cluster. | Yes |
| `port` | The port your cluster listens on. | Yes |
| `username` | The username for database access. | Yes |
| `password` | The password for database access. | Yes |
| `default_database` | The name of the database to connect to. | Yes |
| `default_schema` | The default schema to use within the database. | Yes |
You may need to allowlist Buster's IPs: `44.219.39.124`, `34.230.173.35`, `100.26.25.127`
# Snowflake
Source: https://docs.buster.so/data_sources/snowflake
To connect Buster to your Snowflake account, you will need the following credentials:
| Field | Description | Required |
| ------------------ | ---------------------------------------------- | -------- |
| `account_id` | Your Snowflake account identifier. | Yes |
| `warehouse_id` | The name of the Snowflake warehouse to use. | Yes |
| `username` | The username for Snowflake access. | Yes |
| `password` | The password for Snowflake access. | Yes |
| `role`             | The Snowflake role to use.                     | No       |
| `default_database` | The default database to use within Snowflake. | Yes |
| `default_schema` | The default schema to use within the database. | Yes |
You may need to allowlist Buster's IPs: `44.219.39.124`, `34.230.173.35`, `100.26.25.127`
# StarRocks
Source: https://docs.buster.so/data_sources/starrocks
To connect Buster to your StarRocks database (which uses the MySQL protocol), you will need the following credentials:
| Field | Description | Required |
| ------------------ | ------------------------------------------------------------------------ | -------- |
| `host` | The hostname of your database. | Yes |
| `port` | The port your database listens on (typically `9030` for MySQL protocol). | Yes |
| `username` | The username for database access. | Yes |
| `password` | The password for database access. | Yes |
| `default_database` | The name of the database to connect to. | Yes |
You may need to allowlist Buster's IPs: `44.219.39.124`, `34.230.173.35`, `100.26.25.127`
# Supabase
Source: https://docs.buster.so/data_sources/supabase
To connect Buster to your Supabase project (which uses Postgres), you will need the following credentials:
| Field | Description | Required |
| ------------------ | ---------------------------------------------- | -------- |
| `host` | The hostname of your database. | Yes |
| `port` | The port your database listens on. | Yes |
| `username` | The username for database access. | Yes |
| `password` | The password for database access. | Yes |
| `default_database` | The name of the database to connect to. | Yes |
| `default_schema` | The default schema to use within the database. | Yes |
You may need to allowlist Buster's IPs: `44.219.39.124`, `34.230.173.35`, `100.26.25.127`
# Cursor Rules, Claude Code, etc.
Source: https://docs.buster.so/docs/ai-resources/ai-rules-doc
# Buster AI Agent Guidelines
This document provides guidance for AI agents (Claude, Windsurf, Cursor, etc.) when working with Buster configuration files and semantic models.
> **Note for AI Agents**: All Buster documentation is available via HTTP requests. Simply:
>
> * Access any page from [https://docs.buster.so/](https://docs.buster.so/) by appending `.md` to the URL to get markdown format
> * For example: [https://docs.buster.so/references/semantic-models.md](https://docs.buster.so/references/semantic-models.md)
> * Use tools like `curl` or `fetch` to retrieve documentation as needed
> * Two specialized endpoints are available:
> * [https://docs.buster.so/llms.txt](https://docs.buster.so/llms.txt) (concise overview)
> * [https://docs.buster.so/llms-full.txt](https://docs.buster.so/llms-full.txt) (comprehensive reference)
## What is Buster?
Buster is a platform for building and deploying AI data analysts. It connects to your data sources and uses a semantic layer to understand your data structure and business logic, enabling natural language querying of your data warehouse with accurate, relevant responses.
## Core Components
### buster.yml
This is the primary configuration file that defines your Buster project settings.
#### Basic Structure
```yaml
data_source_name: demo_db
schema: analytics
database: buster
model_paths:
- models/
semantic_model_paths: # Optional: Custom paths for semantic models
- semantic_models/
```
#### Multi-Project Structure
`buster.yml` also supports configuring multiple projects in a single file:
```yaml
projects:
- path: .
data_source_name: new_one
schema: ont-ont
database: postgres
model_paths:
- models/
semantic_model_paths: # Optional: Custom paths for semantic models
- semantic_models/
- path: ./another-project
data_source_name: another_db
schema: analytics
database: warehouse
model_paths:
- models/
semantic_model_paths:
- semantic_models/
```
### Semantic Models
Semantic models define a business-friendly abstraction layer over your data warehouse. They are organized in YAML files and consist of these key components:
#### Model Structure
```yaml
models:
- name: orders # Required: Unique identifier (snake_case)
description: "Order transactions from our e-commerce platform" # Required: Clear explanation
dimensions: # Optional: Attributes for slicing/filtering
- name: order_id
description: "Unique identifier for each order"
type: string
primary_key: true
- name: created_at
description: "When the order was created"
type: timestamp
measures: # Optional: Quantifiable aggregations
- name: revenue
description: "Total order value"
type: decimal
agg: sum
metrics: # Optional: Derived KPIs/calculations
- name: average_order_value
description: "Average $ amount per order"
expr: "sum(revenue) / count(order_id)"
- name: revenue_by_period
description: "Revenue over a specified time period"
expr: "sum(case when created_at >= dateadd(day, -{{days}}, current_date()) then revenue else 0 end)"
args:
- name: days
type: integer
description: "Number of days to include"
default: 30
filters: # Optional: Reusable query conditions
- name: is_complete
description: "Filter for completed orders"
expr: "status = 'complete'" # The exact value depends on your data
- name: orders_above_amount
description: "Orders with revenue above threshold"
expr: "revenue > {{min_amount}}"
args:
- name: min_amount
type: number
description: "Minimum order amount"
default: 100
relationships: # Optional: Relationships to other models (previously called entities)
- name: customer
description: "The customer who placed this order"
primary_key: id # Field in the related customer model
foreign_key: customer_id # Field in this orders model
cardinality: many_to_one # Many orders to one customer
type: LEFT # LEFT JOIN is the default
```
## Key Concepts for Effective Models
### Models
* **Purpose**: Define meaningful business concepts or entities
* **Types**:
* **Entity-focused**: Representing business objects (customers, products)
* **Event-focused**: Representing occurrences (orders, page views)
* **Metric-focused**: Centered around core business KPIs
* **Best Practices**:
* Use snake\_case for naming
* Provide clear, concise descriptions
* Organize by business domain in separate files
* Name files after the main model they contain
## Data Model Components Hierarchy
Buster's semantic layer organizes data into a logical hierarchy:
### Dimensions
* **Definition**: The lowest level properties about a data object (non-numeric attributes)
* **Purpose**: Attributes used for filtering, grouping, or display
* **Examples**: Names, dates, IDs, categories, locations, statuses
* **Common Types**:
* `string`: Text values (e.g., customer\_name, product\_category)
* `timestamp`: Date and time values (e.g., created\_at, order\_date)
* `boolean`: True/false values (e.g., is\_active, is\_completed)
* **Properties**:
* `name`: Identifier (required) - **Should match the exact column name in your database**
* `description`: Explanation (required)
* `type`: Data type (recommended)
* `primary_key`: Boolean designating unique identifier (optional)
* `searchable`: Boolean for natural language search (optional)
* `expr`: SQL expression (optional) - **If not provided, the dimension name is used as the column identifier**
* **Best Practices**:
* Use column names as dimension names for direct mapping to your database
* Include time-based dimensions for temporal analysis
* Mark primary keys for joining and uniqueness
* Mark searchable fields for important attributes
* Use expr only when transformation is needed from raw data
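As an illustration of the last point, a dimension that requires a transformation of a raw column might look like this (a minimal sketch; the `created_at` column and the `date_trunc` syntax are assumptions to adapt to your warehouse):

```yaml
dimensions:
  - name: order_month
    description: "Calendar month in which the order was created"
    type: date
    expr: "date_trunc('month', created_at)" # transformation of the raw created_at column
```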
### Measures
* **Definition**: The lowest level numeric properties about a data object
* **Purpose**: Raw quantitative values that can be aggregated
* **Examples**: Amounts, counts, durations, prices, quantities
* **Common Types**:
* `decimal`: Precise numeric values (e.g., price, revenue)
* `integer`: Whole numbers (e.g., quantity, views)
* **Aggregation Types**:
* `sum`: Add values together
* `avg`: Calculate average
* `count`: Count occurrences
* `min`: Find minimum value
* `max`: Find maximum value
* `count_distinct`: Count unique values
* **Properties**:
* `name`: Identifier (required) - **Should match the exact column name in your database**
* `description`: Explanation (required)
* `type`: Data type (recommended)
* `agg`: Aggregation type (required)
* `expr`: SQL expression (optional) - **If not provided, the measure name is used as the column identifier**
* **Best Practices**:
* Use column names as measure names for direct mapping to your database
* Choose appropriate aggregation for business meaning
* Use clear, specific names describing the quantity
* Include unit of measurement in description when applicable
* Use expr only when transformation is needed from raw data
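Similarly, a measure should only use `expr` when the raw data needs transformation (a sketch; the `gross_amount` and `discount_amount` columns are illustrative):

```yaml
measures:
  - name: net_revenue
    description: "Order revenue net of discounts, in USD"
    type: decimal
    agg: sum
    expr: "gross_amount - discount_amount" # expr needed because the value is derived
```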
### Metrics
* **Definition**: High-level abstractions or calculations derived from dimensions and measures
* **Purpose**: Complex business KPIs and calculations
* **Examples**: Average order value, conversion rate, retention rate, year-over-year growth
* **Properties**:
* `name`: Identifier (required) - Descriptive name for the calculation
* `description`: Explanation (required)
* `expr`: SQL expression (optional) - **Must always return a numeric value**
* **If not provided, uses the column with the same name as the metric**
* `args`: Parameters for dynamic metrics (optional)
* **Syntax**:
* Use standard SQL expressions
* Reference fields from the same model directly by name
* Reference fields from other models using dot notation
* **Best Practices**:
* Centralize business logic in metrics
* Use descriptive names indicating calculation
* If referencing a single column directly, name the metric after the column
* Document formula and business significance
* Parameterize metrics when appropriate
* Ensure all metrics return numeric values
### Filters
* **Definition**: High-level abstractions that provide reusable filtering conditions
* **Purpose**: Pre-defined conditions that must resolve to a boolean
* **Examples**: is\_active\_customer, recent\_orders, high\_value\_transaction
* **Properties**:
* `name`: Identifier (required) - Descriptive name for the filter
* `description`: Explanation (required)
* `expr`: SQL expression (optional) - **Must always resolve to TRUE or FALSE**
* **If not provided, uses the boolean column with the same name as the filter**
* **Syntax**:
* Use standard SQL boolean expressions
* Can reference dimensions and measures from the model directly by name
* Common patterns include comparison operators (equals, greater than, less than, IN, BETWEEN)
* **Best Practices**:
* Define common filtering patterns without assuming specific values in the data
* If referencing a single boolean column directly, name the filter after the column
* Use for time-based filters (e.g., recent\_period, current\_year)
* Create concept-based filters that describe what the filter does, not specific values
* Ensure expressions always evaluate to boolean values
* Avoid assuming specific values exist in a column; consult actual data first
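To illustrate, a concept-based filter describes what the condition means rather than presuming values (the status values below are placeholders to verify against the actual data):

```yaml
filters:
  - name: completed_orders
    description: "Orders that have reached a terminal, successful state"
    expr: "status IN ('shipped', 'delivered')" # placeholder values; confirm against your data
```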
### Relationships
* **Definition**: Connections between different data models
* **Purpose**: Define how models relate to each other
* **Examples**: users-to-orders, products-to-reviews
* **Properties**:
* `name`: Relationship name (required)
* `primary_key`: Unique identifier field (required)
* `foreign_key`: Reference field in format `model.field` (required)
* `cardinality`: Type of relationship (recommended)
* `join_type`: SQL join type (optional, defaults to LEFT)
* **Cardinality Types**:
* `one_to_one`: Each record connects to exactly one record
* `one_to_many`: Parent record connects to multiple child records
* `many_to_one`: Multiple records connect to one parent record
* `many_to_many`: Many-to-many relationship
* **Best Practices**:
* Only define relationships when there's evidence they exist in the data
* Use descriptive relationship names that reflect business concepts
* Document the business meaning of relationships
* Define relationships bidirectionally when appropriate
* Verify column data types match between related fields
## Tips for AI Agents
When generating or modifying Buster configuration files:
1. **Start with Basic Structure**:
* Define required fields first (data\_source\_name, schema, database)
* Then add optional configurations as needed
2. **Ensure Semantic Clarity**:
* Use descriptive names that reflect business concepts
* Include clear descriptions for all components
* Maintain consistent naming conventions (snake\_case)
3. **Model Organization**:
* Organize models by business domain
* Keep related models in the same file
* Split large model collections into multiple files
4. **Relationship Mapping**:
* Ensure primary and foreign keys are correctly specified
* Define appropriate cardinality for each relationship
* Consider join type implications for queries
5. **Metric Definition**:
* Use SQL expressions that match the target database dialect
* Ensure calculations are mathematically sound
* Reference fields using correct notation
6. **Field Types**:
* Specify appropriate data types for dimensions and measures
* Choose correct aggregation types for measures
* Mark primary keys and searchable fields appropriately
7. **Validation Checks**:
* Ensure all required fields are present
* Verify that references between models are valid
* Check that expressions use correct field names and syntax
8. **Avoid Data Assumptions**:
* Never make assumptions about specific values in columns
* Use generic pattern descriptions rather than assuming particular statuses, categories, or values
* Consult the actual data or schema documentation before creating filters or metrics
* Create semantic models that adapt to the user's actual data values, not presumed values
* Don't assume relationships between models unless clearly evident in the schema
9. **Relationship Discovery Guidelines**:
* Only create relationships/joins when you have evidence from column names or schema analysis
* Acceptable approaches include:
* Finding matching column patterns (e.g., `id` in one table, `user_id` in another)
* Analyzing foreign key constraints from database metadata
* Following explicit instructions or documentation about table relationships
* When analyzing multiple models at once, look for conventional patterns like:
* Primary key matches foreign key name (e.g., `users.id` → `orders.user_id`)
* Table name incorporated in foreign key (e.g., `user_id` referencing the `users` table)
* Always verify relationship assumptions with the user when possible
10. **Use dbt Metadata Commands**:
* Leverage dbt's metadata commands to gather accurate schema information without affecting production:
* `dbt ls` - List all models in the project
* `dbt docs generate` - Generate documentation (then examine metadata files)
* `dbt parse` - Parse project files without running models
* `dbt compile` - Compile SQL without executing it
* `dbt ls --select +model_name+` - View upstream and downstream dependencies of a model
* **IMPORTANT**: Never run commands that execute models (`dbt run`, `dbt build`, etc.)
* Extract valuable information from dbt metadata such as:
* Column names, types, and descriptions
* Model dependencies and lineage
* Table properties and configurations
* Existing documentation and tests
* Use this metadata to make informed decisions when creating semantic models
Following these guidelines will help create effective Buster configurations that accurately represent business concepts and enable powerful natural language querying of data warehouses.
## SQL Data Modeling with dbt
Buster works best when the underlying data models align well with the semantic layer. Using dbt as the primary modeling tool helps create a clean, modular data structure that maps effectively to Buster's semantic models.
### Entity-Based Modeling Approach
The SQL data models should mirror the entity-focused approach of the semantic layer, with tables structured to represent business entities and their relationships.
#### dbt Model Organization
* **Sources**: Raw data from the warehouse (`src_*`)
* **Staging**: Cleaned, typed data (`stg_*`)
* **Intermediate**: Business logic processing (`int_*`)
* **Entity Models**: Core business entities (e.g., users, products, orders)
* **Derived Models**: Advanced calculations and aggregations (`derived_*`)
#### Recommended File Structure
```
models/
├── domains/ # Business domain models
│ ├── customers/ # Customer domain
│ │ ├── users.sql # User entity model
│ │ ├── accounts.sql # Account entity model
│ │ └── ...
│ ├── sales/ # Sales domain
│ │ ├── orders.sql # Order entity model
│ │ ├── products.sql # Product entity model
│ │ └── ...
│ └── ...
├── derived/ # Derived metrics and aggregations
│ ├── derived_user_metrics.sql
│ ├── derived_sales_performance.sql
│ └── ...
├── intermediate/ # Transformation steps
│ ├── int_order_items.sql
│ └── ...
└── staging/ # Cleaned source data
├── stg_source1.sql
├── stg_source2.sql
└── ...
```
#### Entity Model Pattern
```sql
-- users.sql
WITH source AS (
SELECT * FROM {{ ref('stg_users') }}
),
-- Additional transformation/enrichment CTEs as needed
final AS (
SELECT
id, -- Primary key
first_name, -- Identity attributes
last_name,
email,
segment, -- Categorization attributes
created_at, -- Temporal attributes
updated_at,
-- Include both dimensional and measure fields directly on the entity
lifetime_value, -- Measure: calculated or aggregated values
is_active, -- Dimension: status flags
last_login_date -- Dimension: activity timestamps
-- All attributes related to this entity in one model
FROM source
LEFT JOIN {{ ref('int_user_activity') }} USING (id) -- Join intermediate calculations if needed
)
SELECT * FROM final
```
#### Entity Relationship Model Pattern
```sql
-- orders.sql
WITH order_source AS (
SELECT * FROM {{ ref('stg_orders') }}
),
order_items AS (
SELECT * FROM {{ ref('stg_order_items') }}
),
-- Join or additional transformation CTEs
final AS (
SELECT
id, -- Primary key
user_id, -- Relationship to users entity
created_at, -- Event timestamp
status, -- State attributes
-- Include both dimensional and measure fields
items_count, -- Measure
order_total, -- Measure
tax_amount, -- Measure
shipping_amount, -- Measure
payment_method, -- Dimension
channel, -- Dimension
is_first_purchase -- Calculated dimension
FROM order_source
LEFT JOIN order_items USING (id)
-- Additional joins as needed
)
SELECT * FROM final
```
#### Derived Model Pattern
```sql
-- derived_sales_performance.sql
WITH user_orders AS (
SELECT
user_id,
SUM(order_total) as total_spent,
COUNT(id) as order_count,
MIN(created_at) as first_order_date,
MAX(created_at) as last_order_date
FROM {{ ref('orders') }} -- Reference to entity model
WHERE status = 'complete' -- Status depends on your data
GROUP BY 1
),
user_attributes AS (
SELECT
id as user_id,
segment,
created_at as joined_date
FROM {{ ref('users') }} -- Reference to entity model
),
final AS (
SELECT
user_attributes.user_id,
segment,
joined_date,
-- Entity fields from orders
total_spent,
order_count,
first_order_date,
last_order_date,
-- Derived calculations
total_spent / NULLIF(order_count, 0) as average_order_value,
DATEDIFF('day', first_order_date, last_order_date) as customer_tenure_days,
DATEDIFF('day', last_order_date, CURRENT_DATE()) as days_since_last_order,
-- Complex derived metrics
CASE
WHEN days_since_last_order <= 30 THEN 'active'
WHEN days_since_last_order <= 90 THEN 'at_risk'
ELSE 'inactive'
END as activity_status
FROM user_attributes
LEFT JOIN user_orders USING (user_id)
)
SELECT * FROM final
```
### Aligning SQL Models with Semantic Layer
When designing SQL models to work optimally with Buster's semantic layer:
1. **Entity-Based Alignment**:
* Create entity-based models that directly mirror your semantic models
* Include both dimensional attributes and measures on the entity model
* Structure models around business concepts rather than database structure
2. **Field Naming Consistency**:
* Use consistent field names between SQL models and semantic models
* Follow snake\_case naming convention for all database objects
* Include business-friendly names that accurately describe the data concepts
3. **Data Organization**:
* Group related entities by business domain
* Keep the model structure flat and intuitive
* Put attributes directly on the relevant entity rather than normalizing excessively
4. **Data Type Considerations**:
* Choose appropriate SQL data types that align with semantic types
* Use TIMESTAMP types for all date-time fields
* Consider using DECIMAL types with appropriate precision for monetary values
5. **Relationship Implementation**:
* Ensure entity models contain the necessary keys for relationships
* Use consistent naming for relationship fields (e.g., entity\_id pattern)
* Create derived models to materialize complex relationships when needed
6. **Performance Optimization**:
* Create appropriate indexes for frequently queried fields
* Consider materializing important entities with calculated attributes
* Pre-compute complex derivations to improve query performance
7. **Documentation and Testing**:
* Include column descriptions in dbt model files
* Add documentation that explains business logic for complex calculations
* Implement dbt tests for primary keys, relationships, and business rules
8. **Utilize dbt Metadata for Better Modeling**:
* Run `dbt docs generate` to extract rich metadata about the existing models
* Examine model structure, column types, and relationships from manifest.json
* Use `dbt compile` to see how models are transformed without executing them
* Analyze upstream and downstream dependencies with `dbt ls --select +model_name+`
* Access descriptions and documentation from existing dbt projects
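Returning to point 7, a dbt schema file that documents and tests an entity model might look like the following sketch (standard dbt tests; the path and column names are illustrative):

```yaml
# models/domains/sales/orders.yml (illustrative path)
version: 2
models:
  - name: orders
    description: "Order entity model, one row per order"
    columns:
      - name: id
        description: "Primary key for the order"
        tests:
          - unique
          - not_null
      - name: user_id
        description: "Foreign key to the users entity"
        tests:
          - relationships:
              to: ref('users')
              field: id
```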
By aligning your SQL data modeling approach with Buster's semantic layer, you create a cohesive data ecosystem where each entity model directly maps to business concepts in the semantic layer, making it intuitive for users to query through Buster.
# How does it work?
Source: https://docs.buster.so/docs/core-concepts/how-does-buster-work

Buster operates through a systematic process that mirrors how a data analyst might approach a request, leveraging a specialized data catalog and an intelligent execution engine. Here's a breakdown of the key stages:
### 1. Data Catalog Foundation

At its core, Buster relies on a self-maintained **Data Catalog**. This isn't just a passive inventory; it's the knowledge base Buster uses to understand your data landscape.
* **Catalog Components:** The catalog integrates various data artifacts, primarily:
* **Models:** These often originate from data modeling tools like dbt, representing the structured tables and views containing your data. Buster offers [native integration with dbt](/docs/getting-started/dbt-integration) to seamlessly transform your dbt models into semantic models.
* **Semantic Definitions:** Accompanying these models are definition files (commonly YAML). These files enrich the raw models with business context and semantic meaning, defining elements like:
* Dimensions and Measures
* Metrics
* Filters
* Enumerations (valid values for fields)
* Other common semantic layer components.
* **Deployment:** This rich context, forming the data catalog, is actively deployed *to* Buster. Currently, this is done using a Command Line Interface (CLI) tool. Typically, you'll run this deployment from within the repository where you manage your data models (e.g., your dbt project directory), ensuring Buster always has the latest understanding of your data structure and semantics.
### 2. Answering User Questions

When presented with a user request, Buster follows a structured approach to find and plan the necessary actions:
* **Intelligent Search:** Buster doesn't just do keyword matching. It performs a **semantic search** across its data catalog. This process is designed to mimic how a human analyst would intuitively search for the most relevant tables, columns, metrics, and definitions needed to address the specific question asked.
* **Planning Phase:** Once Buster identifies the potentially relevant information from the catalog, it enters a **planning mode**.
* **Analysis & Clarification:** Buster analyzes the user's request against the context it found. At this critical juncture, it might determine that more information is needed and **ask clarifying questions** to the user. It can also recognize if the required context isn't available in its catalog and will explicitly state that it cannot answer the question.
* **Plan Formulation:** If Buster determines it *can* answer the question, it formulates a step-by-step plan. This plan acts as an internal **"todo list"**, outlining the specific actions required. Examples include "build metric X using dimension Y and measure Z," or "create a new dashboard and add metrics A, B, and C."
### 3. Execution Loop

With a clear plan in place, Buster enters an iterative execution loop to fulfill the request:
* **Iterative Creation:** Buster systematically works through its todo list. This involves generating the necessary data artifacts:
* **Visualizations (Metrics):** What Buster refers to as "Metrics" are often akin to configured visualizations or specific data calculations ready for display.
* **Dashboards & Reports:** Buster constructs dashboards or reports to present the generated metrics and insights.
* **Safe & Semantic Query Generation:** When accessing data, Buster intelligently parses SQL. It leverages the semantic layer definitions to enforce correct **join paths** between tables and inject predefined **expressions** (like custom dimensions or measures). Furthermore, it incorporates safeguards to protect against potentially **malicious queries** and enforces **access controls**, ensuring users only see data they are permitted to access based on dataset permissions.
* **Version Controlled Artifacts:** Crucially, Buster creates these metrics and dashboards as **files**. This output allows them to be easily integrated into standard version control systems (like Git), enabling tracking, collaboration, and reproducibility.
* **Review & Progress:** During the execution loop, Buster includes an intermediary **review step**. It checks its todo list, identifies the tasks it has successfully completed, and marks them off.
* **Completion:** Buster continues this cycle of creation and review until every item on its initial plan (the todo list) has been successfully accomplished.
# Dimensions and Measures
Source: https://docs.buster.so/docs/core-concepts/modeling/dimensions-and-measures
Define fields and their properties in your semantic layer
Dimensions and measures form the foundation of your semantic layer, defining the fields that can be used for filtering, grouping, and aggregation.
## Dimensions
Dimensions represent the descriptive attributes or characteristics of your data. They are typically non-numeric or categorical fields used to slice, dice, group, and filter data. Think of them as the "who, what, where, when, why" of your analysis. Examples include dates, user IDs, product categories, geographical locations, or status flags.
```yaml
dimensions:
- name: order_date # Required - dimension name
description: Date ordered # Required - description
type: date # Optional - data type (string by default)
searchable: true # Optional - whether dimension can be searched (false by default)
options: ["A", "B", "C"] # Optional - list of allowed values
```
### Required Fields
* **name**: The field name, typically matching a column name in your database. It should represent a specific attribute of the model's entity.
* **description**: A clear explanation of what the dimension represents. This is mandatory.
### Optional Fields
* **type**: The data type of the dimension. Defining the correct type is crucial for proper handling and analysis. Supported values:
  - `string` (default)
  - `number`
  - `integer`
  - `boolean`
  - `date`
  - `timestamp`
* **searchable**: Whether this dimension can be used in search queries. Defaults to `false`.
* **options**: A list of allowed values for this dimension, useful for categorical data.
**Dimension vs. Filter:** While a boolean dimension (e.g., `is_active`) can be used for filtering, its primary purpose is to describe an attribute *of* the entity (e.g., the user *is* active). A dedicated Filter, on the other hand, defines a specific *condition* or *subset* (e.g., `active_users_last_30_days`). If a boolean concept represents a core attribute, define it as a dimension. If it represents a common filtering condition, especially one involving logic beyond a single column, define it as a Filter. See [Defining Filters](/docs/core-concepts/modeling/filters).
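In YAML, the contrast might look like this (a sketch assuming a boolean `is_active` column and a `last_login` timestamp exist on the model):

```yaml
dimensions:
  - name: is_active # a stored attribute of the user
    description: Whether the user account is currently active
    type: boolean
filters:
  - name: active_last_30_days # a reusable condition
    description: Users who logged in within the last 30 days
    expr: "last_login >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)"
```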
## Measures
Measures are quantitative fields that can be aggregated, such as quantities or amounts.
```yaml
measures:
- name: amount # Required - measure name
description: Order amount in USD # Required - description
type: number # Optional - data type (inferred if omitted)
```
### Required Fields
* **name**: The field name, typically matching a column name in your database
* **description**: A clear explanation of what the measure represents. This is mandatory.
### Optional Fields
* **type**: The data type of the measure
## Best Practices
* Use clear, business-oriented names that convey meaning.
* Include detailed descriptions to help Buster understand the context.
* Set types explicitly to ensure proper handling of values.
* Mention units of measurement in descriptions (e.g., "in USD", "in days").
## Example
```yaml
models:
- name: orders
description: Customer orders
dimensions:
- name: id
description: Unique order identifier
type: string
- name: order_date
description: Date the order was placed
type: date
searchable: true
- name: status
description: Current order status
options: ["pending", "shipped", "delivered", "cancelled"]
measures:
- name: amount
description: Order amount in USD
type: number
- name: quantity
description: Number of items ordered
type: integer
```
## Next Steps
Now that you've defined dimensions and measures, you can:
1. [Create metrics](/docs/core-concepts/modeling/metrics) for business calculations
2. [Establish relationships](/docs/core-concepts/modeling/relationships) between models
# Examples
Source: https://docs.buster.so/docs/core-concepts/modeling/examples
View complete examples of Buster semantic layer definitions
These examples demonstrate complete semantic layer definitions for common business scenarios.
## E-Commerce Example
This example shows a semantic layer for an e-commerce application with orders, customers, products, and order items.
```yaml
models:
- name: orders
description: Customer orders
dimensions:
- name: id
description: Order ID
type: string
- name: order_date
description: Date the order was placed
type: date
searchable: true
- name: status
description: Order status
options: ["pending", "shipped", "delivered", "cancelled"]
measures:
- name: amount
description: Order amount in USD
type: number
- name: tax
description: Tax amount in USD
type: number
metrics:
- name: total_revenue
expr: "SUM(amount)"
description: Total revenue from orders
- name: average_order_value
expr: "AVG(amount)"
description: Average order value
- name: order_count
expr: "COUNT(*)"
description: Number of orders
filters:
- name: completed_orders
expr: "status IN ('shipped', 'delivered')"
description: Orders that have been shipped or delivered
- name: recent_orders
expr: "order_date >= DATE_SUB(CURRENT_DATE(), INTERVAL {{days}} DAY)"
description: Orders from the last N days
args:
- name: days
type: integer
description: Number of days to look back
entities:
- name: customers
primary_key: id
foreign_key: customer_id
cardinality: many-to-one
description: Customer who placed the order
- name: order_items
primary_key: id
foreign_key: order_id
cardinality: one-to-many
description: Items included in this order
- name: customers
description: Customer accounts
dimensions:
- name: id
description: Customer ID
type: string
- name: name
description: Customer name
searchable: true
- name: email
description: Customer email address
searchable: true
- name: sign_up_date
description: Date customer signed up
type: date
measures:
- name: lifetime_value
description: Customer lifetime value in USD
type: number
metrics:
- name: customer_count
expr: "COUNT(*)"
description: Number of customers
- name: average_lifetime_value
expr: "AVG(lifetime_value)"
description: Average lifetime value of customers
entities:
- name: orders
primary_key: customer_id
foreign_key: id
cardinality: one-to-many
description: Orders placed by this customer
- name: products
description: Products available for sale
dimensions:
- name: id
description: Product ID
type: string
- name: name
description: Product name
searchable: true
- name: category
description: Product category
options: ["electronics", "clothing", "home", "books"]
measures:
- name: price
description: Product price in USD
type: number
- name: cost
description: Product cost in USD
type: number
metrics:
- name: product_count
expr: "COUNT(*)"
description: Number of products
- name: average_price
expr: "AVG(price)"
description: Average product price
- name: margin
expr: "SUM(price - cost) / SUM(price)"
description: Average product margin
entities:
- name: order_items
primary_key: product_id
foreign_key: id
cardinality: one-to-many
description: Order items containing this product
- name: order_items
description: Individual items within orders
dimensions:
- name: id
description: Order item ID
type: string
measures:
- name: quantity
description: Number of items ordered
type: integer
- name: unit_price
description: Price per unit in USD
type: number
metrics:
- name: total_items
expr: "SUM(quantity)"
description: Total number of items sold
- name: average_quantity
expr: "AVG(quantity)"
description: Average quantity per order item
entities:
- name: orders
primary_key: id
foreign_key: order_id
cardinality: many-to-one
description: Order containing this item
- name: products
primary_key: id
foreign_key: product_id
cardinality: many-to-one
description: Product in this order item
```
## SaaS Application Example
This example shows a semantic layer for a SaaS application with subscriptions, users, and usage data.
```yaml
models:
- name: subscriptions
description: Customer subscriptions
dimensions:
- name: id
description: Subscription ID
type: string
- name: start_date
description: Date subscription started
type: date
- name: end_date
description: Date subscription ended (null if active)
type: date
- name: plan
description: Subscription plan
options: ["free", "basic", "pro", "enterprise"]
- name: billing_interval
description: Billing frequency
options: ["monthly", "annual"]
measures:
- name: monthly_amount
description: Monthly subscription amount in USD
type: number
metrics:
- name: mrr
expr: "SUM(monthly_amount)"
description: Monthly recurring revenue
- name: active_subscriptions
expr: "COUNT(CASE WHEN end_date IS NULL THEN 1 END)"
description: Number of active subscriptions
- name: churn_rate
expr: "COUNT(CASE WHEN end_date BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY) AND CURRENT_DATE() THEN 1 END) / NULLIF(COUNT(CASE WHEN end_date IS NULL OR end_date > DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY) THEN 1 END), 0)"
description: 30-day subscription churn rate
filters:
- name: active
expr: "end_date IS NULL"
description: Active subscriptions
- name: plan_tier
expr: "plan = {{plan}}"
description: Subscriptions on a specific plan
args:
- name: plan
type: string
description: Subscription plan
entities:
- name: users
primary_key: id
foreign_key: user_id
cardinality: many-to-one
description: User who owns this subscription
- name: users
description: Application users
dimensions:
- name: id
description: User ID
type: string
- name: email
description: User email
searchable: true
- name: signup_date
description: Date user signed up
type: date
- name: last_login
description: Date of last login
type: timestamp
metrics:
- name: user_count
expr: "COUNT(*)"
description: Number of users
- name: active_users
expr: "COUNT(CASE WHEN last_login >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY) THEN 1 END)"
description: Users active in the last 30 days
entities:
- name: subscriptions
primary_key: user_id
foreign_key: id
cardinality: one-to-many
description: Subscriptions owned by this user
- name: usage_events
primary_key: user_id
foreign_key: id
cardinality: one-to-many
description: Usage events by this user
- name: usage_events
description: User activity and usage data
dimensions:
- name: id
description: Event ID
type: string
- name: event_time
description: Time the event occurred
type: timestamp
- name: event_type
description: Type of event
options: ["login", "feature_used", "export", "api_call"]
- name: feature
description: Feature that was used (for feature_used events)
type: string
measures:
- name: duration
description: Duration of the event in seconds
type: number
metrics:
- name: event_count
expr: "COUNT(*)"
description: Number of events
- name: average_duration
expr: "AVG(duration)"
description: Average event duration
- name: feature_usage
expr: "COUNT(CASE WHEN event_type = 'feature_used' AND feature = {{feature_name}} THEN 1 END)"
description: Usage count for a specific feature
args:
- name: feature_name
type: string
description: Name of the feature
entities:
- name: users
primary_key: id
foreign_key: user_id
cardinality: many-to-one
description: User who triggered this event
```
## Best Practices
* Define models at an appropriate level of granularity, neither too detailed nor too abstract.
* Ensure all important relationships between models are defined.
* Add comprehensive descriptions to help Buster understand your data model.
* Define business-relevant metrics rather than just exposing raw database fields.
## Next Steps
Now that you've seen complete examples of semantic layer definitions, you can:
1. [Deploy your model](/docs/core-concepts/deploying-models-to-buster/overview)
2. [Connect your data sources](/data_sources/overview)
3. [Start querying your data](/docs/getting-started/quickstart)
# Filters
Source: https://docs.buster.so/docs/core-concepts/modeling/filters
Learn how to define reusable filters in your semantic models
## 1. Introduction to Filters
Filters define named, reusable **conditions** that resolve to `true` or `false`. They encapsulate common query constraints, allowing users to easily apply predefined criteria without rewriting the logic repeatedly. Think of them as saved segments or common `WHERE` clauses.
Using filters makes queries cleaner, more consistent, and easier to understand, focusing the analysis on specific questions rather than boilerplate filtering logic.
### Basic Structure
Filters are defined under the `filters:` key within a model:
```yaml
filters:
- name: filter_name # Required: Unique identifier (snake_case)
description: "Business definition" # Required: Clear explanation
expr: "SQL_boolean_expression" # Optional if name matches a boolean dimension
args: # Optional: For parameterized filters
- name: param_name
type: param_type
# ... other parameter fields (e.g., default, description)
```
### Filters vs. Dimensions/Measures
It's crucial to distinguish Filters from Dimensions and Measures:
* **Dimensions:** Describe *attributes* (e.g., `country`, `status`). Used for grouping, slicing, and direct filtering (`WHERE country = 'USA'`). A **boolean Dimension** (e.g., `is_active`) represents a fundamental `true`/`false` attribute stored in the data.
* **Measures:** Quantifiable, aggregatable values (e.g., `total_revenue`). Used in aggregation functions (`SUM`) and `HAVING` clauses.
* **Filters:** Define reusable *conditions* or *subsets* (e.g., `last_30_days`, `high_value_customers`). They *must* resolve to boolean and generate `WHERE` clause logic.
**Guideline:**
* If a boolean concept describes a direct attribute (e.g., a user's stored active status), use a **boolean Dimension**.
* If you need a reusable condition (especially involving calculations, date logic, combined attributes, or related data), use a **Filter**.
## 2. Defining Basic Filters
Basic filters define simple, non-parameterized conditions.
### Required Fields
* **`name`**: A unique identifier (snake\_case) for the filter within the model.
* **`description`**: Explains what condition the filter represents. This is mandatory.
### The `expr` Field
* **Purpose:** Defines the filter's logic as a SQL expression (compatible with your data warehouse) that evaluates to `true` or `false`.
* **Optionality:** `expr` is **optional only if** the filter's `name` exactly matches the name of a **boolean dimension** defined within the same model. In this case, the filter implicitly means `dimension_name = true`.
* **Convention:** Name filters that map directly to boolean dimensions the same as the dimension (e.g., filter `is_active` for dimension `is_active`).
* **Content:** The `expr` can reference:
* Dimensions in the model's underlying table/view.
* SQL functions and operators.
* Literals (strings, numbers, booleans).
### Examples: Basic Filters
```yaml
# In a 'users' model with dimensions: user_id, created_at, country, is_active (boolean)
filters:
# Filter mapping directly to a boolean dimension (expr omitted)
- name: is_active
description: Users who are currently marked as active.
# Filter based on a single dimension expression
- name: is_us_based
description: Users located in the USA.
expr: "country = 'USA'" # Expression required
# Filter using a time-based expression
- name: recent_signups
description: Users who signed up in the last 90 days.
expr: "created_at >= DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY)"
# Filter combining conditions within the *same* model
# Assumes a measure 'lifetime_value' and dimension 'last_login_date' exist
- name: high_value_inactive_users
description: Users with LTV > 500 who haven't logged in for 60 days.
expr: "lifetime_value > 500 AND last_login_date < DATE_SUB(CURRENT_DATE(), INTERVAL 60 DAY)"
```
## 3. Advanced Filter Types
Filters can incorporate parameters or reference data from related models.
### Parameterized Filters
Create dynamic filters that accept user input via `args`. Reference parameters using `{{parameter_name}}`.
* **`args`**: A list defining each parameter.
* `name`: The parameter name used in the `expr` (e.g., `{{min_amount}}`) (required).
* `type`: Data type (`string`, `integer`, `number`, `date`, `boolean`) (required).
* `description`: Explanation for the user (required).
* `default` (Optional): A default value if the user doesn't provide one.
```yaml
# In an 'orders' model with dimension 'amount'
filters:
- name: high_value_orders
description: Orders with amount greater than the specified minimum.
expr: "orders.amount > {{min_amount}}" # Note: Using table prefix for clarity
args:
- name: min_amount
type: number
description: The minimum order amount to filter by.
default: 100
```
### Cross-Model Filters (Conceptual)
Filters can define conditions based on data in related models, provided an `entity` relationship exists. The semantic layer uses these relationships to potentially generate necessary joins or subqueries when the filter is applied *in a query that also includes the required join*.
```yaml
# In a 'users' model with an entity 'orders' linking to an 'orders' model
filters:
# Example: Filter users based on related order data
# The exact SQL generated depends on the semantic layer engine and the query context.
- name: ordered_specific_product_recently
description: "Users who ordered Product SKU 'XYZ-123' in the last 30 days. Requires joining the 'orders' entity."
# This conceptual expr assumes the engine can translate it based on the 'orders' entity.
# A possible SQL realization involves a subquery or join:
expr: "users.user_id IN (SELECT o.user_id FROM orders o WHERE o.product_sku = 'XYZ-123' AND o.order_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))"
# NOTE: This filter only works if the QUERY using it also joins the 'orders' entity.
```
Filters involving related models rely on **explicit joins** being defined in the query itself. The filter definition alone does not cause the join. See Best Practices below.
## 4. Best Practices & Considerations
Follow these guidelines for effective filter definition.
### General Best Practices
* **Ensure Boolean Output:** The `expr` *must* evaluate to a boolean (`true`/`false`).
* **Clear Naming & Descriptions:** Use descriptive `name`s (snake\_case) and mandatory, clear `description`s.
* **Match Boolean Dimensions:** For filters representing `dimension = true`, name the filter identically to the boolean dimension and omit the `expr`.
* **Document Dependencies:** Clearly state required entities/joins or parameters in the `description`.
### Expression Best Practices
* **Use Table Prefixes:** Strongly recommended, especially when `expr` involves measures or might be used in contexts with joins (even implicit ones via entities). Use `model_name.column_name` (e.g., `users.country`, `orders.amount`).
* **Clarity and Readability:** Write clear `expr` logic. Use parentheses `()` for complex conditions to ensure correct operator precedence.
* **Null Handling:** Be mindful of nulls. Use `COALESCE`, `IS NULL`, `IS NOT NULL` etc., within `expr` as needed for robust logic.
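For instance, a null-safe filter using `COALESCE` might look like this (a sketch; the `discount_amount` column is illustrative):

```yaml
filters:
  - name: undiscounted_orders
    description: "Orders with no discount applied (treats a NULL discount as zero)."
    expr: "COALESCE(orders.discount_amount, 0) = 0"
```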
### Parameter Best Practices
* **Explicit Typing:** Always define the `type` for each parameter (`string`, `integer`, `number`, `date`, `boolean`) to enable validation.
* **Placeholders:** Use double curly braces `{{parameter_name}}` to reference parameters in the `expr`.
* **Defaults:** Provide `default` values for optional parameters.
* **Descriptions:** Clearly explain each parameter's purpose in its `description`.
### Handling Joins and Related Models
* **Primary Model Association:** Filters are defined within a specific model.
* **Explicit Joins Required in Query:** If a filter's `expr` references fields from a *related* model (via an `entity`), the query using that filter *must* explicitly include the necessary join to that related model (using the entity).
* **No Automatic Joins:** The semantic layer does **not** automatically add joins based solely on a filter's `expr`. It validates that joins used in a *query* correspond to defined `entities`.
* **Document Join Requirements:** The filter's `description` must state if it relies on data from related models and which entities/joins are required. Example: "Filters for users whose latest order was cancelled. Requires joining the 'latest\_order' entity."
**Example: Filter Requiring a Join**
```yaml
# In the 'users' model
models:
- name: users
# ... dimensions: user_id, etc. ...
entities:
- name: latest_order # Relationship to 'orders' model, filtered for latest
# ... relationship definition linking users.user_id to orders.user_id ...
filters:
- name: latest_order_cancelled
description: "Filters for users whose most recent order was cancelled. Requires joining the 'latest_order' entity."
# Assumes 'latest_order' entity provides access to the related order's status
expr: "latest_order.status = 'cancelled'" # References related model via entity
```
## 5. How Filters are Used
When a query includes a defined filter (e.g., `WHERE filter: recent_signups` or selected in a BI tool), the semantic layer incorporates its logic into the final SQL query's `WHERE` clause:
* **Mapped Boolean Filter (`is_active`):** Adds `WHERE users.is_active = true` (or equivalent).
* **Filter with `expr` (`recent_signups`):** Adds `WHERE users.created_at >= DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY)`.
* **Parameterized Filter (`high_value_orders` with `min_amount=200`):** Adds `WHERE orders.amount > 200`.
* **Cross-Model Filter (`latest_order_cancelled`):** *If the query also includes the `latest_order` join*, adds `WHERE latest_order.status = 'cancelled'`. If the join is missing, the query will likely fail validation.
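To make this concrete, a query that groups users by country and applies both `recent_signups` and `is_us_based` might compile to SQL along these lines (the exact output depends on the engine and dialect):

```sql
SELECT users.country, COUNT(*) AS user_count
FROM users
WHERE users.created_at >= DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY) -- recent_signups
  AND users.country = 'USA'                                         -- is_us_based
GROUP BY users.country;
```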
## 6. Next Steps
* Explore how to define [Dimensions and Measures](/docs/core-concepts/modeling/dimensions-and-measures).
* Learn about creating [Metrics](/docs/core-concepts/modeling/metrics).
* Understand how to set up [Relationships (Entities)](/docs/core-concepts/modeling/relationships) between models, which are crucial for cross-model filters.
# Metrics
Source: https://docs.buster.so/docs/core-concepts/modeling/metrics
Define calculations and key performance indicators in your semantic layer
## 1. Introduction to Metrics
Metrics are calculations based on measures or dimensions within your semantic models. They represent key business performance indicators (KPIs) and other frequently used calculations, encapsulating business logic centrally.
Here's the basic structure for defining a metric:
```yaml
metrics:
- name: metric_name # Required - A unique identifier for the metric
expr: "SQL_expression" # Required - SQL-like expression defining the calculation
description: "Business definition" # Required - Clear explanation of the metric
args: # Optional - Used for parameterized metrics
- name: parameter_name
type: parameter_type
description: "Parameter description"
# ... other parameter fields (e.g., default)
```
## 2. Defining Basic Metrics
At their core, metrics combine a name, a calculation expression (`expr`), and a description.
### Required Fields
* **`name`**: A unique identifier for the metric within the model. Use clear, descriptive names.
* **`expr`**: A SQL-like expression that defines the calculation. This expression forms the basis of the metric's value.
* **`description`**: A clear explanation of what the metric represents in business terms. This is mandatory and crucial for understanding.
### Writing the `expr`
The `expr` field is where you define the metric's calculation using SQL syntax compatible with your data warehouse.
* **Referencing Fields:** You can directly reference dimensions and measures defined within the **same model** (e.g., `amount`, `user_id`).
* **Aggregations**: Use standard SQL aggregation functions: `SUM`, `AVG`, `COUNT`, `COUNT(DISTINCT ...)`, `MIN`, `MAX`, etc.
* **Operators**: Arithmetic (`+`, `-`, `*`, `/`), logical (`AND`, `OR`, `NOT`), comparison (`=`, `>`, `<`, etc.) are supported.
* **Functions**: Most standard SQL functions like `CASE`, `COALESCE`, `NULLIF`, date functions, string functions, etc., can be used.
When referencing fields within the same model in the `expr`, you typically don't need a table prefix (e.g., `SUM(amount)` is usually sufficient if `amount` is a measure in the current model). However, using prefixes (e.g., `SUM(orders.amount)`) can enhance clarity, especially in complex models. See Best Practices for more on prefixes when using related models.
**Example: Basic Metric**
```yaml
metrics:
- name: total_revenue
expr: "SUM(amount)"
description: Total revenue from all orders in the queried context.
- name: order_count
expr: "COUNT(DISTINCT order_id)"
description: The total number of unique orders.
```
## 3. Advanced Metric Types
Beyond basic calculations, you can create more dynamic and interconnected metrics.
### Parameterized Metrics
Create flexible metrics that accept user inputs at query time. Define parameters using the `args` field and reference them in the `expr` using double curly braces `{{parameter_name}}`.
* **`args`**: A list defining each parameter.
* `name`: The parameter name used in the `expr` (e.g., `{{days}}`) (required).
* `type`: Data type (`string`, `integer`, `number`, `date`, `boolean`) (required).
* `description`: Explanation for the user (required).
* `default` (Optional): A default value if the user doesn't provide one.
**Example: Parameterized Metric**
```yaml
metrics:
- name: revenue_in_last_n_days
description: Calculates total revenue over the last N days, specified by the user.
# Assumes 'order_date' and 'amount' are fields in the current model
expr: "SUM(CASE WHEN order_date >= DATE_SUB(CURRENT_DATE(), INTERVAL {{days}} DAY) THEN amount ELSE 0 END)"
args:
- name: days
type: integer
description: Number of days to include in the revenue calculation.
default: 30 # Example default value
```
### Cross-Model Metrics
Reference fields (dimensions or measures) from related models using dot notation (`related_model_name.field_name`). This requires an `entity` relationship to be defined between the models involved.
**Example: Cross-Model Metric**
```yaml
# In the 'orders' model, assuming an entity named 'customer' links to a 'customers' model
metrics:
- name: revenue_per_customer
# Assumes 'amount' is in 'orders', 'customer.id' references the 'id' in the related 'customers' model
expr: "SUM(amount) / NULLIF(COUNT(DISTINCT customer.id), 0)" # Use NULLIF for safety
description: "Average revenue per unique customer. Requires joining with the 'customer' entity."
```
Using cross-model references requires careful consideration of joins when querying. See "Handling Joins and Related Models" below.
## 4. Best Practices & Considerations
Follow these guidelines to create robust, understandable, and maintainable metrics.
### General Best Practices
* Define metrics that represent meaningful business KPIs, translating raw data into actionable insights.
* Use consistent, descriptive names and provide mandatory, clear descriptions explaining the calculation and business purpose.
* Build metrics by referencing existing measures and dimensions within the model rather than duplicating complex logic.
* If a metric relies on parameters or data from related models (via entities), clearly state these requirements in its `description`.
### Expression Best Practices
* **Type Consistency:** Ensure metric expressions resolve to appropriate data types (usually numeric) suitable for aggregation or further calculations.
* **Use Table Prefixes for Related Models:** *Always* use table-qualified names (e.g., `customer.id`, `orders.amount`) in your `expr` when referencing fields from related models (via entities) to prevent ambiguity. Prefixes are optional but recommended for clarity even for fields in the primary model.
* **SQL Conventions:**
* Wrap complex expressions or calculations in parentheses `()` to ensure correct operator precedence.
* Handle potential division-by-zero errors using `NULLIF(denominator, 0)`. Example: `SUM(profit) / NULLIF(SUM(revenue), 0)`.
* Be mindful of nulls. Use `COALESCE` or other relevant SQL functions to handle them appropriately within calculations, especially when dealing with outer joins.
* **Avoid Circular References:** Do not define metrics that refer back to themselves directly or indirectly.
**Good Expression Example (Parameterized):**
```yaml
metrics:
- name: avg_order_value_by_date
description: Average order value for orders within a specified date range. Requires 'orders.created_at' and 'orders.amount' fields.
# Uses table prefixes for clarity
expr: "SUM(CASE WHEN orders.created_at BETWEEN '{{start_date}}' AND '{{end_date}}' THEN orders.amount ELSE 0 END) / NULLIF(COUNT(CASE WHEN orders.created_at BETWEEN '{{start_date}}' AND '{{end_date}}' THEN orders.id ELSE NULL END), 0)"
args:
- name: start_date
type: date
description: The start date (YYYY-MM-DD). # Required
- name: end_date
type: date
description: The end date (YYYY-MM-DD). # Required
```
### Parameter Best Practices
* **Explicit Typing:** Always define the `type` for each parameter (`string`, `integer`, `number`, `date`, `boolean`). This enables validation and correct query generation.
* **Clear Documentation:** Provide a meaningful `description` for each parameter.
* **Placeholders:** Reference parameters in the `expr` using double curly braces: `{{parameter_name}}`.
* **Default Values:** Provide `default` values for optional parameters to simplify usage.
* **Validation Notes:**
* `string`: Will be automatically quoted if needed during query generation.
* `number`/`integer`: Must be valid numeric values.
* `date`: Should ideally be provided in ISO format (`YYYY-MM-DD`).
* `boolean`: Should be `true`/`false` (or potentially `1`/`0` depending on backend).
**Parameter Definition Pattern to Avoid:**
```yaml
# Bad: Parameter type is missing, hindering validation and usage.
args:
- name: min_amount
# type: number <-- Missing!
description: Minimum order amount to consider.
default: 100
```
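The corrected version of the same definition adds the explicit `type`:
```yaml
# Good: Explicit type enables validation and correct query generation.
args:
  - name: min_amount
    type: number
    description: Minimum order amount to consider.
    default: 100
```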
### Handling Joins and Related Models
* **Primary Association:** Each metric belongs primarily to the model where it's defined.
* **Explicit Joins Required in Query:** If a metric's `expr` references fields from a *related* model (e.g., `customer.id`), the query using that metric *must* explicitly include the necessary join to that related model via its defined `entity`.
* **No Automatic Joins:** The semantic layer does *not* automatically infer or add joins based solely on the metric's `expr`. It validates that joins used in a *query* correspond to defined `entities`.
* **Robustness with Joins:** When dealing with related models (especially via outer joins), consider using `CASE` statements or checking for nulls on fields from those models to make the metric resilient to missing data; see the sketch below.
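As a minimal sketch (assuming the `customer` entity and the `orders.amount` measure from the examples above), a cross-model metric can guard against `NULL`s introduced by a `LEFT` join:
```yaml
metrics:
  - name: known_customer_revenue # Hypothetical metric for illustration
    # Counts an order's amount only when the joined customer record exists
    expr: "SUM(CASE WHEN customer.id IS NOT NULL THEN orders.amount ELSE 0 END)"
    description: "Revenue from orders with a matching customer record. Requires the 'customer' entity join."
```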
## 5. Comprehensive Example
Here's an example within an `orders` model context, incorporating various metric types and practices:
```yaml
models:
- name: orders
description: Customer order data.
dimensions:
- name: order_id
type: string
- name: order_date # Renamed for clarity
type: date
- name: status
type: string
- name: customer_id # Foreign key to a customers model
type: string
measures:
- name: amount
description: Order amount in USD
type: number
- name: item_count
description: Number of items in the order
type: integer
entities:
- name: customer # Defines relationship to a 'customers' model
primary_key: id # Assumes 'id' is the PK in 'customers'
foreign_key: customer_id # FK in 'orders'
# ... other relationship details (e.g., join type)
metrics:
# Basic Metrics
- name: total_revenue
expr: "SUM(orders.amount)" # Prefix added for consistency
description: Total revenue from all orders in the queried context.
- name: average_order_value
expr: "AVG(orders.amount)" # Prefix added
description: The average value of a single order.
- name: order_count
expr: "COUNT(DISTINCT orders.order_id)" # Prefix added
description: The total number of unique orders.
- name: average_items_per_order
expr: "AVG(orders.item_count)" # Prefix added
description: Average number of items included in each order.
# Parameterized Metric
- name: revenue_by_period
expr: "SUM(CASE WHEN orders.order_date >= DATE_SUB(CURRENT_DATE(), INTERVAL {{days}} DAY) THEN orders.amount ELSE 0 END)"
description: Revenue for a specified time period (N days back from today).
args:
- name: days
type: integer
description: Number of days to look back.
default: 30 # Default to 30 days
# Cross-Model Metric
- name: revenue_per_customer
expr: "SUM(orders.amount) / NULLIF(COUNT(DISTINCT customer.id), 0)" # Prefixes used
description: "Average revenue per unique customer. Requires joining with the 'customer' entity."
# More Complex Parameterized Metric
- name: revenue_growth_vs_prior_periods
description: "Calculates the percentage growth in revenue compared to a specified number of prior periods. Requires 'orders.order_date' and 'orders.amount'. Note: Exact LAG syntax depends on SQL dialect."
expr: "(SUM(orders.amount) - LAG(SUM(orders.amount), {{periods}}) OVER (ORDER BY orders.order_date)) / NULLIF(LAG(SUM(orders.amount), {{periods}}) OVER (ORDER BY orders.order_date), 0)"
args:
- name: periods
type: integer
description: Number of periods to look back for comparison (e.g., 1 for previous period). # Required
```
## 6. Next Steps
With your metrics defined, consider:
1. [Defining Filters](/docs/core-concepts/modeling/filters) for reusable query conditions.
2. [Establishing Relationships (Entities)](/docs/core-concepts/modeling/relationships) to link different models together, enabling cross-model metrics.
3. Reviewing [Complete Examples](/docs/core-concepts/modeling/examples) of semantic model definitions.
By following these structures and practices, your metrics will be powerful, understandable, and easier to manage within your semantic layer.
# Models
Source: https://docs.buster.so/docs/core-concepts/modeling/models
Learn how to define models in your semantic layer
## Purpose of Models
Semantic models provide a flexible analytical layer on top of your data warehouse, aiming to create a **digital twin** of your business processes and concepts. They allow you to define business logic, metrics, and relationships in a centralized, governed place. This practice, often referred to as **"shifting left"**, involves moving analytical logic *out of* downstream BI tools or ad-hoc query generation (like that performed by AI agents) and *into* this upstream modeling layer. By doing so, you ensure logic is versioned, testable, and consistently applied across all uses.
Instead of embedding complex logic in individual queries or reports, or worse, allowing an AI to make assumptions about raw data (where infinite interpretations exist), you define key business concepts like "total revenue" or "active user" **once** in the model. **A key benefit is that if a metric's definition changes, you update it in one place (the model), and that change automatically propagates to all analyses and tools using the model.** This ensures consistency and reliability, dramatically reducing the risk of inconsistent answers or technical debt. For AI agents like Buster, this pre-defined logic is crucial; it allows the AI to leverage your curated business understanding, delivering trustworthy results instead of potentially flawed interpretations based on ambiguous raw data.
## Model Structure
Models represent business entities or concepts, typically mapping to a table or view in your database. Each model is defined with a name and can optionally include descriptions, dimensions, measures, metrics, filters, and entities (relationships).
```yaml
models:
- name: orders # Required - model identifier
description: Order data # Required - description
dimensions: # Optional - data attributes
# Dimension definitions
measures: # Optional - aggregations
# Measure definitions
metrics: # Optional - key performance indicators
# Metric definitions
filters: # Optional - predefined query filters
# Filter definitions
entities: # Optional - defines relationships
# Relationship definitions
```
### Required Fields
* **name**: A unique identifier for the model, typically matching the table name in your database.
* **description**: A clear explanation of what the model represents. This field is mandatory to ensure clarity and context for all users and downstream tools like AI agents.
### Optional Fields
All other fields within a model definition besides `name` and `description` are optional. You can create a model with just these two required fields, or include any combination of `dimensions`, `measures`, `metrics`, `filters`, and `entities` as needed.
* **dimensions**: Attributes used for slicing and dicing data (e.g., `user_country`, `product_category`).
* **measures**: Aggregations performed on data (e.g., `total_revenue`, `average_order_value`).
* **metrics**: Key performance indicators derived from measures and dimensions (e.g., `conversion_rate`).
* **filters**: Named, reusable boolean conditions (e.g., `is_active`, `last_30_days`) to simplify common query constraints. See [Defining Filters](/docs/core-concepts/modeling/filters).
* **entities**: Define relationships between this model and others.
## Modeling Approaches
While the structure above defines *how* to build a model, it's helpful to consider *what* you are modeling. There are a few common approaches:
### 1. Entity-Focused Models
These models represent core business objects or concepts. They often map directly to dimension tables in a traditional data warehouse.
* **Examples:** `customers`, `products`, `employees`, `stores`, `invoices`, `aircraft`.
* **Characteristics:** Primarily consist of descriptive `dimensions` (customer name, product category, employee department, store location, invoice status). May also include some `measures` directly related to the entity (e.g., `credit_limit` for a customer) or summary/rollup `metrics` (e.g., `total_invoiced_amount` on an invoice model).
* **Filters:** Often include filters based on the entity's attributes (e.g., `is_premium_customer`, `product_is_discontinued`).
```yaml
# Example: Entity-Focused Model
models:
- name: products
description: Represents individual products sold.
dimensions:
- name: product_id
type: string
primary_key: true # Indicates this is a primary key
- name: product_name
type: string
- name: category
type: string
- name: launch_date
type: date
- name: is_discontinued
type: boolean
measures:
- name: unit_cost
type: number
agg: avg # Example measure directly on the product
filters:
- name: is_discontinued
description: Products no longer actively sold.
# expr implicitly uses the is_discontinued dimension
- name: launched_this_year
description: Products launched in the current calendar year.
expr: "EXTRACT(YEAR FROM launch_date) = EXTRACT(YEAR FROM CURRENT_DATE())"
```
### 2. Event-Focused Models
These models represent occurrences or interactions that happen over time. They often map to fact tables or event stream data.
* **Examples:** `page_views`, `orders`, `transactions`, `login_events`, `support_tickets`, `security_incidents`.
* **Characteristics:** Almost always include a timestamp `dimension`. They contain `dimensions` describing the event (e.g., URL for page view, order status, transaction type, user agent for login). They typically contain `measures` quantifying the event (e.g., `order_amount`, `view_duration_seconds`, `transaction_value`). Summary `metrics` related to the event itself might be included.
* **Relationships:** Event models usually have relationships (`entities`) linking back to multiple Entity-Focused models (e.g., an `orders` event links to `customers`, `products`, and `employees`).
* **Filters:** Common filters relate to time (`last_7_days`, `this_quarter`) or event attributes (`successful_logins`, `high_priority_tickets`).
```yaml
# Example: Event-Focused Model (Simplified Order)
models:
- name: orders
description: Represents a customer placing an order.
dimensions:
- name: order_id
type: string
primary_key: true
- name: order_timestamp
type: timestamp
- name: order_status
type: string # e.g., 'pending', 'shipped', 'delivered'
measures:
- name: order_total_amount
type: number
agg: sum
- name: item_quantity
type: integer
agg: sum
metrics:
- name: average_order_value
expr: "AVG(order_total_amount)" # Metric summarizing the event itself
filters:
- name: shipped_orders
description: Orders that have been shipped.
expr: "order_status = 'shipped'"
- name: orders_last_30_days
description: Orders placed in the last 30 days.
expr: "order_timestamp >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)"
entities:
- name: customers # Relationship to an entity model
# ... relationship definition
- name: products # Another relationship
# ... relationship definition
```
### 3. Metric-Focused Models (Fact Models)
These models center around a core business metric or Key Performance Indicator (KPI). They often correspond to fact tables in a star or snowflake schema and serve as the central point for analyzing a specific metric across various business dimensions.
* **Examples:** `monthly_revenue`, `daily_user_activity`, `campaign_performance_facts`.
* **Characteristics:** Primarily contain `measures` that contribute to the core `metric`(s). The `dimensions` are typically foreign keys (e.g., `date_id`, `customer_id`, `product_id`, `region_id`) that link to separate Entity-Focused models. They might also include degenerate dimensions (attributes directly on the fact record).
* **Metrics:** The main purpose is often to define one or more significant `metrics` (e.g., `total_revenue`, `active_users`, `conversion_rate`) calculated from the base measures.
* **Relationships:** Heavily rely on `entities` to connect the foreign key dimensions back to their respective dimension/entity models.
* **Filters:** Often include time-based filters (`last_quarter`, `year_to_date`) or filters related to the metric calculation itself.
```yaml
# Example: Metric-Focused Model (Monthly Revenue Fact)
models:
- name: monthly_revenue
description: Central fact model for analyzing monthly revenue across different business dimensions.
dimensions:
# Foreign Keys linking to dimension/entity models
- name: month_start_date # Links to a date dimension model
type: date
- name: region_id # Links to a regions model
type: string
- name: customer_segment_id # Links to a customer_segments model
type: string
- name: sales_rep_id # Links to a sales_reps model
type: string
# You might also have direct measures if the metric uses them
measures:
- name: revenue_amount
description: The base revenue amount for a given record.
type: number
agg: sum
metrics:
- name: total_revenue
description: The total calculated revenue for the selected dimensions.
expr: "SUM(revenue_amount)" # Metric derived from the measure
filters:
- name: revenue_last_quarter
description: Revenue generated in the previous full quarter.
# Example expr assumes a date dimension model provides quarter info
expr: "month_start_date >= DATE_TRUNC('QUARTER', DATE_SUB(CURRENT_DATE(), INTERVAL 3 MONTH)) AND month_start_date < DATE_TRUNC('QUARTER', CURRENT_DATE())"
entities:
# Define relationships to the dimension/entity models
- name: date_dimension
primary_key: date_id # Assuming date_id in the date dimension model
foreign_key: month_start_date # Field in this model
# ... other relationship details
- name: regions
primary_key: region_id # PK in regions model
foreign_key: region_id # FK in this model
# ... other relationship details
- name: customer_segments
primary_key: segment_id # PK in segments model
foreign_key: customer_segment_id # FK in this model
# ... other relationship details
- name: sales_reps
primary_key: rep_id # PK in reps model
foreign_key: sales_rep_id # FK in this model
# ... other relationship details
```
**Best Practices & Combining Approaches:**
* **Start with Entities and Events:** Most semantic layers are built primarily around Entity-Focused and Event-Focused models, as these directly represent the core nouns and verbs of your business.
* **Use Relationships:** Define `entities` (relationships) clearly to connect your models. This is how Buster understands how events relate to entities (e.g., which customer placed which order).
* **Mix and Match:** Real-world data often requires a combination. An `orders` model (Event) will link to `customers` and `products` (Entities).
* **Keep Models Focused:** Aim for each model to represent a single, well-defined concept (one entity or one type of event).
* **Metric-Focused for Performance:** Consider Metric-Focused models if you have very large datasets and need pre-aggregation for performance, or if a specific, central metric links many otherwise disparate entities.
## Organizing Your Model Files
You have flexibility in how you structure your model definition files:
* **Single File:** You can define all your models within a single YAML file under the top-level `models:` key.
* **Multiple Files:** You can split your model definitions across multiple YAML files. Buster will discover and load models from all `.yml` files within the specified model paths in your `buster.yml`.
**Recommendation:** We recommend organizing models into separate files, potentially grouped into subdirectories based on business domain (e.g., `models/finance/`, `models/marketing/`, `models/product/`). This improves discoverability and maintainability as your semantic layer grows.
```yaml
# Example: All models in one file (models.yml)
models:
- name: orders
# ... definition ...
- name: customers
# ... definition ...
- name: products
# ... definition ...
```
```yaml
# Example: Models split by domain
# models/marketing/campaigns.yml
models:
- name: marketing_campaigns
# ... definition ...
# models/sales/orders.yml
models:
- name: orders
# ... definition ...
# models/product/users.yml
models:
- name: users
# ... definition ...
```
## Naming Conventions
* Use `snake_case` for model names (e.g., `order_items`, not `OrderItems` or `order-items`).
* Choose names that reflect the business entity or concept, not necessarily the exact technical table name.
* Be consistent with naming patterns across your semantic layer.
## Best Practices
* Always include descriptive text that explains what the model represents in business terms, even if brief.
* Begin with a core model definition (just the required `name` and `description`) and add dimensions, measures, etc., incrementally as analysis requires.
* Generally, each model should align conceptually with a database table or view, but it's not a strict requirement.
* Use separate files and potentially subdirectories to group related models logically (e.g., by business domain like finance or marketing).
## Example
Here's a more detailed example of a `page_views` model, demonstrating various optional components:
```yaml
# models/web_analytics/page_views.yml
models:
- name: page_views
description: Records each time a user views a page on the website. Event-based model.
dimensions:
- name: view_id
description: Unique identifier for the page view event.
type: string
- name: view_timestamp
description: Timestamp when the page view occurred.
type: timestamp
searchable: true # Allows searching/filtering on this dimension
- name: url
description: The URL of the page viewed.
type: string
- name: viewed_within_7_days_of_signup
description: Indicates if the page view occurred within 7 days of the user signing up.
type: boolean
measures:
- name: view_duration_seconds
description: How long the user spent on the page in seconds.
type: integer
agg: sum # Example specifying aggregation type
metrics:
- name: total_page_views
expr: "COUNT(view_id)"
description: The total count of page views.
- name: average_view_duration
        expr: "AVG(view_duration_seconds)" # References the view_duration_seconds measure defined above
description: The average duration of a page view in seconds.
filters:
- name: views_last_7_days
expr: "view_timestamp >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)"
description: Page views that occurred in the last 7 days.
entities: # Define relationships to other models
- name: users # Corresponds to a 'users' model
primary_key: user_id # Field in the 'users' model
foreign_key: user_id # Field in this 'page_views' model
cardinality: many-to-one
description: The user who performed the page view.
- name: products # Corresponds to a 'products' model
primary_key: product_id # Field in the 'products' model
foreign_key: product_id # Field in this 'page_views' model
cardinality: many-to-one # Assumes a page view relates to one product max
description: The product associated with the page view, if any.
```
## Next Steps
After defining your model structure, you'll need to add the specific components:
1. [Add dimensions and measures](/docs/core-concepts/modeling/dimensions-and-measures)
2. [Create metrics](/docs/core-concepts/modeling/metrics)
3. [Define filters](/docs/core-concepts/modeling/filters)
4. [Establish relationships (entities)](/docs/core-concepts/modeling/relationships)
# Overview
Source: https://docs.buster.so/docs/core-concepts/modeling/overview
Understanding Buster's semantic layer for analytical data modeling
The semantic layer helps Buster understand your data structure, relationships, and business logic.
## What is a Semantic Layer?
A semantic layer acts as a translation layer between your raw data and Buster's AI. It defines:
* Business entities (users, orders, products)
* Dimensions (attributes used for grouping, filtering, or segmenting data like dates, categories, statuses, or IDs)
* Measures (quantifiable data)
* Metrics (calculations)
* Relationships between models
By centralizing these definitions, the semantic layer allows Buster to generate accurate SQL, understand business metrics consistently, and provide reliable insights from your data, rather than relying on potentially inconsistent logic defined elsewhere.
## Why Use a Semantic Layer?
* Define business logic and metrics once in a versioned, testable modeling layer ('shifting left'), ensuring consistent reuse across all analyses.
* Map connections between tables so Buster can navigate your data model.
* Add descriptions and metadata to help Buster understand what your data represents.
* Control which data is accessible and how it can be used.
## Semantic Layer Components
Buster's semantic layer is defined using YAML files that describe:
1. **Models**: Business entities that typically map to database tables
2. **Dimensions**: Attributes used for grouping, filtering, or segmenting data (like dates, categories, statuses, or IDs)
3. **Measures**: Quantitative fields that can be aggregated
4. **Metrics**: Calculations based on measures and dimensions
5. **Filters**: Reusable, named boolean conditions for common query constraints
6. **Relationships (Entities)**: Connections between different models
## Getting Started
To build your semantic layer:
1. [Generate models from dbt catalog](/docs/core-concepts/modeling/generate) (optional)
2. [Define your models](/docs/core-concepts/modeling/models)
3. [Add dimensions and measures](/docs/core-concepts/modeling/dimensions-and-measures)
4. [Create metrics](/docs/core-concepts/modeling/metrics)
5. [Define filters](/docs/core-concepts/modeling/filters)
6. [Establish relationships](/docs/core-concepts/modeling/relationships)
7. [View complete examples](/docs/core-concepts/modeling/examples)
# Relationships
Source: https://docs.buster.so/docs/core-concepts/modeling/relationships
Define connections between models in your semantic layer
## 1. Introduction to Relationships
Relationships define how different semantic models connect to each other, similar to specifying join conditions in SQL. Defining these connections is crucial for enabling queries that combine data from multiple models.
By defining relationships, you allow the semantic layer to understand how to join data, enabling metrics, dimensions, and filters that reference fields across different models.
### Basic Structure
Relationships are defined within the `relationships:` list inside a model definition. Each item in the list represents a link to another model:
```yaml
models:
- name: current_model_name
# ... dimensions, measures, metrics ...
relationships:
- name: related_model_name # Required: Name of the model being linked TO
primary_key: pk_field # Required: Key field in the *related* model
foreign_key: fk_field # Required: Key field in the *current* model
type: LEFT # Optional: Join type (LEFT, INNER). Default: LEFT
cardinality: many-to-one # Optional: Relationship cardinality (see below)
description: "Explanation" # Optional: Business context for the relationship
```
## 2. Defining Relationships
Each relationship entry requires specifying how the current model links to the related model.
### Required Fields
* **`name`**: The name of the **related** semantic model you are connecting to.
* **`primary_key`**: The name of the field (usually a dimension) in the **related** model that acts as the join key on that side.
* **`foreign_key`**: The name of the field (usually a dimension) in the **current** model that acts as the join key on this side.
### Optional Fields
* **`description`**: A clear explanation of the relationship in business terms (e.g., "Customer who placed the order"). Highly recommended for clarity.
* **`type`**: The join type to use when connecting the models in a query. (See "Join Types" below).
* **`cardinality`**: The nature of the relationship between the models. (See "Cardinality Types" below).
### Simple Example: Orders to Customers
In an `orders` model, linking to a `customers` model:
```yaml
models:
- name: orders
# ... dimensions: order_id, customer_id, order_date ...
relationships:
- name: customers # Link to the 'customers' model
primary_key: id # The PK in the 'customers' model is 'id'
foreign_key: customer_id # The FK in the current 'orders' model is 'customer_id'
cardinality: many-to-one # Many orders belong to one customer
description: Customer associated with this order
```
## 3. Relationship Details
Understanding cardinality and join types is key to defining accurate relationships.
### Cardinality Types
Cardinality describes the numerical relationship between records in the two connected models. Specify this using the `cardinality` field.
* `one-to-one`: Each record in the current model corresponds to at most one record in the related model.
* `one-to-many`: Each record in the current model corresponds to potentially multiple records in the related model.
* `many-to-one` (Common): Multiple records in the current model can correspond to a single record in the related model (e.g., many orders belong to one customer).
* `many-to-many`: Records in the current model can correspond to multiple records in the related model, and vice-versa. Requires a junction model (see below).
### Join Types
The `type` field specifies the SQL join type used when this relationship is invoked in a query:
* **`LEFT`** (Default): Includes all records from the current model (the one defining the relationship), and matching records from the related model. If no match is found in the related model, fields from the related model will be `NULL`.
* **`INNER`**: Includes only those records where a match exists in **both** the current and related models based on the specified keys.
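For instance, to include only orders that have a matching customer record, you could set an `INNER` join on the relationship (a sketch reusing the orders-to-customers example above):
```yaml
relationships:
  - name: customers
    primary_key: id
    foreign_key: customer_id
    type: INNER # Orders without a matching customer are excluded from results
    cardinality: many-to-one
```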
### Many-to-Many Relationships
Direct many-to-many relationships are typically modeled using an intermediary **junction model**. Each primary model has a one-to-many relationship *to* the junction model.
**Example: Products and Orders via `order_items`**
```yaml
models:
- name: products
# ... dimensions: id, product_name ...
relationships:
# Link from Product to the junction table
- name: order_items
primary_key: product_id # Key in junction table
foreign_key: id # Key in products table
cardinality: one-to-many # One product can be in many order items
- name: orders
# ... dimensions: id, order_date, customer_id ...
relationships:
# Link from Order to the junction table
- name: order_items
primary_key: order_id # Key in junction table
foreign_key: id # Key in orders table
cardinality: one-to-many # One order can have many order items
- name: order_items # The Junction Model
# ... dimensions: id, order_id, product_id, quantity ...
relationships:
# Link from junction table back to Products
- name: products
primary_key: id # Key in products table
foreign_key: product_id # Key in junction table
cardinality: many-to-one # Many items can be the same product
# Link from junction table back to Orders
- name: orders
primary_key: id # Key in orders table
foreign_key: order_id # Key in junction table
cardinality: many-to-one # Many items can be in the same order
```
## 4. Using Relationships: Cross-Model References
Once relationships are defined, you can reference dimensions and measures from related models within metrics and filters using dot notation: `related_model_name.field_name`.
```yaml
# In the 'orders' model (which has a relationship defined for 'customers')
metrics:
- name: revenue_per_customer
# 'amount' is from orders, 'customers.id' references 'id' from the related customers model
    expr: "SUM(orders.amount) / NULLIF(COUNT(DISTINCT customers.id), 0)"
description: "Average revenue per unique customer. Assumes 'customers' relationship exists."
```
A query using `revenue_per_customer` must explicitly include the `customers` relationship/join for this reference to work. The semantic layer uses the relationship definition to execute the join correctly when requested in the query.
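To make this concrete, a query using `revenue_per_customer` with the `customers` relationship included might execute SQL along these lines (a sketch assuming the default `LEFT` join and the keys defined earlier, not literal engine output):
```sql
-- Hypothetical compiled query for revenue_per_customer
SELECT
  SUM(orders.amount) / NULLIF(COUNT(DISTINCT customers.id), 0) AS revenue_per_customer
FROM orders
LEFT JOIN customers ON customers.id = orders.customer_id;
```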
## 5. Best Practices
* For clarity and completeness, define relationships in both related models (e.g., `orders` links to `customers`, and `customers` links back to `orders`).
* Ensure `primary_key`, `foreign_key`, and `cardinality` accurately reflect the underlying data structure; incorrect definitions lead to wrong query results.
* Use the optional `description` field to explain the business meaning of each relationship. This greatly improves model understandability.
* Choose `LEFT` or `INNER` based on whether records *must* exist in the related model for the query logic.
## 6. Comprehensive Example
Showing bidirectional relationships between `orders` and `customers`:
```yaml
models:
- name: orders
description: Customer transaction data.
# ... dimensions: order_id, customer_id, order_date ...
# ... measures: amount ...
relationships:
# Link from Orders TO Customers
- name: customers
primary_key: id # Customer's PK
foreign_key: customer_id # Order's FK pointing to customer
cardinality: many-to-one
description: Customer who placed the order
# Potentially link to order_items (junction model)
# - name: order_items
# ...
- name: customers
description: Customer account information.
# ... dimensions: id, name, signup_date ...
relationships:
# Link from Customers TO Orders
- name: orders
primary_key: customer_id # Order's FK pointing back to customer
foreign_key: id # Customer's PK
cardinality: one-to-many
description: Orders placed by this customer
```
## 7. Next Steps
After defining relationships between your models, you can:
1. Leverage these relationships to define cross-model [Metrics](/docs/core-concepts/modeling/metrics) and [Filters](/docs/core-concepts/modeling/filters).
2. [View complete examples](/docs/core-concepts/modeling/examples) of semantic layer definitions incorporating relationships.
3. [Deploy your model](/docs/core-concepts/deploying-models-to-buster/overview).
# Prompting Best Practices
Source: https://docs.buster.so/docs/core-concepts/prompting-best-practices
Effective prompting is key to getting the most out of Buster. By understanding how Buster interprets requests, you can ensure you receive accurate, relevant, and helpful responses.
## Be Direct and Clear for Ad-Hoc Questions
Buster excels at answering specific, unambiguous questions about your data. When you have a clear question in mind, state it directly.
**Good Examples:**
* "How many customers do we have?"
* "What was the total revenue last month?"
* "Show me monthly social media reactions for the last 12 months."
These types of questions are targeted, allowing Buster to understand the exact information you need.
**Avoid Vague Requests:**
Try to avoid overly broad or ambiguous terms.
* **Instead of:** "Sales"
**Try:** "What were the total sales figures for Q2?" or "Show me the sales trend over the last year."
* **Instead of:** "Who's the best?"
**Try:** "Which salesperson had the highest revenue last quarter?" or "Which product category generated the most profit?"
* **Instead of:** "Find order 12345"
**Try:** "Find the order with internal order ID 12345" or "Find the order with Shopify order ID 12345" (Especially important if multiple ID types exist).
While Buster can sometimes handle nuanced requests if the nuances are well-defined within your data or previous interactions, directness and clarity generally yield the best results.
## Provide Context for Exploratory Analysis
When using Buster for more investigative or exploratory tasks, providing context is crucial. Buster needs direction to perform meaningful analysis aligned with your business goals.
**Why Context Matters:**
Without context about your business objectives, Key Performance Indicators (KPIs), relevant timelines, specific goals, or topics of interest, Buster might generate analyses based on metrics that aren't relevant to you.
**Example Scenario:** Investigating a Sales Drop
* **Less Effective:** "Figure out why sales dropped." (Buster will make assumptions about what metrics to investigate)
* **More Effective:** "Figure out why sales dropped last quarter. Did it correlate with our recent website changes? Check performance in the Northeast region compared to others. Our main KPI is conversion rate."
**The more context you provide (hypotheses, relevant events, business specifics), the better Buster can tailor its analysis to your needs.**
## Specify Visualization Styling
You can instruct Buster on how to style the visualizations it creates. If you have specific preferences for colors, chart types, or other visual elements, include those details in your prompt.
**Example:**
* "Show monthly revenue for the past year as a line chart, and make the line green."
* "Update the 'Sales Performance' dashboard to use a dark theme."
Being specific about styling ensures the output matches your requirements.
## Understand Buster's Limitations
Currently, Buster's primary capabilities involve:
* Building data visualizations
* Creating and modifying dashboards
* Answering questions based on the connected data sources
Buster is **not** currently capable of:
* Performing web searches
* Conducting advanced statistical analysis (e.g., correlation, regression)
* Forecasting future trends
Keeping these limitations in mind will help you set appropriate expectations for your requests.
# Integrating with dbt
Source: https://docs.buster.so/docs/getting-started/dbt-integration
How Buster integrates with dbt projects to leverage your existing data models
## Overview
Buster is designed to seamlessly integrate with [dbt (data build tool)](https://www.getdbt.com/) projects, allowing you to leverage your existing data models and transform them into semantic models with minimal effort. This integration creates a powerful workflow that combines dbt's transformation capabilities with Buster's AI-powered analytics.
## How the Integration Works
The Buster CLI offers native integration with dbt that works through these key mechanisms:
### 1. Project Discovery
When you run `buster init` in a directory containing a dbt project:
* Buster automatically detects the presence of `dbt_project.yml`
* It uses the dbt project name as the default data source name
* It identifies model paths based on your dbt project configuration
* It configures database connections using settings from your dbt profiles
```bash
buster init
# Automatically detects dbt project settings
```
### 2. Catalog Utilization
Buster uses dbt's catalog.json as the source of truth for your data models:
* It looks for existing catalog.json files generated by `dbt docs generate`
* If no catalog exists, Buster can trigger the catalog generation for you
* The catalog provides comprehensive metadata about your models, columns, and documentation
```bash
# Buster uses this catalog data
dbt docs generate
buster generate
```
### 3. Semantic Model Generation
When transforming dbt models into Buster semantic models:
* Each dbt model becomes a semantic model in Buster
* Column types are intelligently mapped to dimensions or measures:
* Numeric columns become measures with appropriate aggregations
* Text, date, boolean columns become dimensions
* Descriptions from dbt are preserved in the semantic layer
* Primary keys and relationships are inferred where possible
## Setup Process
To set up a Buster project with your existing dbt project:
1. **Navigate to your dbt project directory**:
```bash
cd your-dbt-project
```
2. **Ensure you have a current dbt catalog**:
```bash
dbt docs generate
```
This creates/updates the catalog.json file that Buster will use.
3. **Initialize your Buster project**:
```bash
buster init
```
The CLI will automatically detect your dbt project and suggest appropriate settings.
4. **Generate semantic models**:
```bash
buster generate
```
This transforms your dbt models into Buster semantic models.
## Directory Structure Integration
Buster is flexible in how it integrates with your dbt directory structure:
```
your-dbt-project/
├── dbt_project.yml
├── models/
│ ├── marts/
│ │ └── core/
│ │ ├── orders.sql
│ │ └── orders.yml # dbt YAML
│ └── staging/
│ └── ...
├── buster.yml # Created by buster init
└── semantic_models/ # Created by buster generate
├── marts/
│ └── core/
│ └── orders.yml # Buster semantic model
└── staging/
└── ...
```
You can choose to:
* Generate semantic models alongside your SQL files
* Use a separate directory for semantic models
* Customize paths to fit your project structure
## Updating Models
When your dbt models change:
1. **Update your dbt models and run your dbt pipeline**:
```bash
dbt run
dbt docs generate # Update the catalog
```
2. **Regenerate or update your semantic models**:
```bash
buster generate
```
This preserves your custom enhancements while adding new columns or models.
3. **Deploy your updated models**:
```bash
buster deploy
```
## Debugging dbt Integration
If you encounter issues with the dbt integration:
* Ensure your dbt catalog is up-to-date with `dbt docs generate`
* Check that your dbt profiles.yml has valid database credentials
* Review the schema and database settings in both dbt and Buster configurations
## Example Workflow
Here's an example of a complete workflow integrating dbt and Buster:
```bash
# 1. Update your dbt models
vim models/marts/core/orders.sql
# 2. Run your dbt pipeline
dbt run --select orders
dbt test --select orders
dbt docs generate
# 3. Update your semantic models
buster generate
# 4. Enhance your semantic models with custom metrics
vim semantic_models/marts/core/orders.yml
# 5. Deploy to Buster
buster deploy
```
## Benefits of Integration
The integration between Buster and dbt provides several advantages:
* **No Duplicate Work**: Leverage existing dbt models and their documentation
* **Automatic Updates**: Keep your semantic layer in sync with schema changes
* **Consistent Naming**: Maintain consistency between transformation and semantic layers
* **Familiar Workflow**: Use tools that complement your existing data engineering practices
* **Version Control**: Keep both dbt and Buster models in the same repository
## Next Steps
After setting up the integration:
1. Enhance your generated semantic models with [metrics](../core-concepts/modeling/metrics.mdx) and [relationships](../core-concepts/modeling/relationships.mdx)
2. Deploy your semantic layer with [`buster deploy`](../using-the-cli/deploy-command.mdx)
3. Start asking questions of your data using Buster's AI capabilities
## Related Documentation
* [Initialize Projects](../using-the-cli/init-command.mdx)
* [Generate Models](../using-the-cli/generate-command.mdx)
* [Deploy Command](../using-the-cli/deploy-command.mdx)
* [Buster.yml Reference](../references/buster-yml.mdx)
# Overview
Source: https://docs.buster.so/docs/getting-started/overview
Buster is an open-source platform for building and deploying AI data analysts. You can use these analysts to analyze data, answer ad-hoc questions, build reports, and more.
## Why Buster?
* Integration between semantic layer and AI agents for reliable querying.
* Beautiful interface with charts designed from the ground up for AI.
* Continuous feedback between BI layer, data catalog layer, and modeling layer.
* Built-in access control and governance features for secure and manageable deployments.
# Quickstart
Source: https://docs.buster.so/docs/getting-started/quickstart
This guide walks you through setting up Buster with an existing [dbt](https://www.getdbt.com/) project. Buster is designed to work with dbt, leveraging your existing models and catalog to create a powerful semantic layer.
Buster works best with well-structured data models from dbt. If you already have a dbt project, Buster will automatically detect and use existing models to build a semantic layer for your data.
If you're not using dbt, please [contact us](https://buster.so/contact) for personalized assistance with your specific data stack.
## Step 1: Install the CLI
```bash
brew tap buster-so/buster
brew install buster
```
Don't use Brew? Check out the [installation guide](../using-the-cli/install-the-cli) for other options.
## Step 2: Initialize Your Project & Connect Data Source
Let's initialize your Buster project and connect to your data source in one step using the `buster init` command:
```bash
buster init
```
This interactive command:
1. Asks you to select your data warehouse type (Postgres, BigQuery, Snowflake, etc.)
2. Prompts for connection details like host, port, credentials, etc.
3. Tests the connection to ensure everything works
4. Detects dbt project configurations if present
* Recognizes your dbt project structure
* Finds model paths automatically
* Discovers your dbt catalog for semantic model generation
5. Creates a `buster.yml` file with your project configuration:
```yaml
projects:
- path: .
data_source_name: demo_db
schema: analytics
database: buster
model_paths:
- models
semantic_model_paths:
- semantic_models # Optional path for semantic model YAML files
```
The command handles both project initialization and data source onboarding in a single workflow, making setup much easier.
See our [Data Sources guide](../../data_sources/overview.mdx) for specific database connection instructions and our [Init Command documentation](../using-the-cli/init-command.mdx) for more details on the setup process.
By default, Buster does not store any of your data. If you have questions about data handling, please see our [FAQ](../faqs/faq.mdx).
## Step 3: Create Semantic Models
If you didn't generate semantic models during initialization, you can create them now. These YAML files define how Buster understands your data. Use the `buster generate` command to automatically create them from your dbt catalog:
```bash
buster generate
```
For detailed information about the generate command, see our [CLI documentation](../using-the-cli/generate-command.mdx).
Buster analyzes your SQL models (like the examples below) and creates semantic model files for each one:
```sql
-- Example model: models/page_views.sql
WITH base_page_views AS (
SELECT * FROM {{ ref('stg_page_views') }}
),
users AS (
SELECT user_id, user_created_at FROM {{ ref('users') }}
)
SELECT
bpv.view_id,
bpv.user_id,
bpv.product_id,
bpv.timestamp AS view_timestamp,
bpv.url,
bpv.view_duration_seconds,
(bpv.timestamp <= TIMESTAMP_ADD(u.user_created_at, INTERVAL 7 DAY)) AS viewed_within_7_days_of_signup
FROM
base_page_views bpv
LEFT JOIN
users u ON bpv.user_id = u.user_id
```
```sql
-- Example model: models/users.sql
SELECT
user_id,
email,
created_at AS user_created_at,
country,
total_purchase_amount AS lifetime_value
FROM
{{ ref('stg_users') }}
```
These semantic model files add business context to your SQL models. Buster will create a separate YAML file for each model:
```yaml
# models/users.yml
- name: users
  description: Information about registered users.
  dimensions:
    - name: user_id
      description: Unique identifier for the user.
      type: string
    - name: email
      description: User's email address.
      type: string
    - name: user_created_at
      description: Timestamp when the user account was created.
      type: timestamp
      searchable: true
    - name: country
      description: The country the user is registered in.
      type: string
  measures:
    - name: lifetime_value
      description: Total purchase amount attributed to the user in USD.
      type: number
  relationships:
    - name: page_views
      primary_key: user_id
      foreign_key: user_id
      cardinality: one-to-many
      description: Page views generated by this user.
```
Note that each model is defined as an item in a list (with a leading dash), and Buster creates separate YAML files for each model instead of grouping them under a `models:` key.
We recommend defining semantic models for each of your key business entities and events. See our [Semantic Models Reference](../references/semantic-models.mdx) for more details.
## Step 4: Deploy Your Project
Once your data models and semantic layer are ready, deploy your project:
```bash
buster deploy
```
This command publishes your semantic models to the Buster platform, making them available for natural language querying.
For more information about deployment options and troubleshooting, see our [Deploy Command documentation](../using-the-cli/deploy-command.mdx).
## Step 5: Start Analyzing Your Data
Congratulations! You've successfully built and deployed your first Buster project. You can now use natural language to analyze your data through the Buster interface.
# buster.yml
Source: https://docs.buster.so/docs/references/buster-yml
The `buster.yml` file is the primary configuration file for your Buster project. It defines how Buster connects to your data sources and where to find your semantic models.
## Basic Structure
Buster.yml supports a multi-project configuration, allowing you to define multiple projects within a single file. This is useful when you need to connect to multiple data sources or schemas within the same Buster deployment.
```yaml
# Single project configuration
data_source_name: demo_db
schema: analytics
database: buster
model_paths:
- models/
# OR
# Multi-project configuration
projects:
- path: .
    data_source_name: demo_db
    schema: analytics
database: postgres
model_paths:
- models/
```
## Configuration Options
### Single Project Configuration
For simple setups with a single data source:
| Field | Required | Description |
| ------------------ | -------- | ------------------------------------------------------------------------------------- |
| `data_source_name` | Yes | Name of the data source connection defined in Buster |
| `schema` | Yes | Database schema containing your tables |
| `database` | Yes | Database name to connect to |
| `model_paths` | No | List of paths to directories containing semantic model files (relative to buster.yml) |
### Multi-Project Configuration
For advanced setups with multiple data sources:
| Field | Required | Description |
| ----------------------------- | -------- | --------------------------------------------------------------------------------------- |
| `projects` | Yes | List of project configurations |
| `projects[].path` | Yes | Project path (relative to buster.yml location) |
| `projects[].data_source_name` | Yes | Name of the data source connection defined in Buster |
| `projects[].schema` | Yes | Database schema containing your tables |
| `projects[].database` | Yes | Database name to connect to |
| `projects[].model_paths` | No | List of paths to directories containing semantic model files (relative to project path) |
## Examples
### Single Project
```yaml
data_source_name: snowflake_prod
schema: analytics
database: business_data
```
### Multi-Project
```yaml
projects:
- path: ./marketing
data_source_name: snowflake_marketing
schema: marketing_analytics
database: marketing_data
model_paths:
- models/
- path: ./finance
data_source_name: postgres_finance
schema: finance
database: financial_data
model_paths:
- models/
- legacy_models/
```
# Overview
Source: https://docs.buster.so/docs/references/overview
[Dallin](https://x.com/dalbentley) will have this page up in the next ~48 hours.
# Semantic Models
Source: https://docs.buster.so/docs/references/semantic-models
[Dallin](https://x.com/dalbentley) will have this page up in the next ~48 hours.
# Deploy Projects
Source: https://docs.buster.so/docs/using-the-cli/deploy-command
How to use the deploy command to publish your Buster project
## Overview of the `buster deploy` Command
The `buster deploy` command publishes your Buster project to the platform, making your semantic models available for data analysis.
## Basic Usage
To deploy your Buster project:
```bash
buster deploy
```
This command:
1. Validates your buster.yml configuration
2. Verifies your semantic model definitions
3. Publishes your models to the Buster platform
4. Makes them available for natural language querying
## Deployment Process
When you run `buster deploy`, the following happens:
1. **Configuration Validation**: Checks your buster.yml file for correctness
2. **Model Validation**: Verifies that all semantic models are properly structured
3. **Data Source Verification**: Confirms that the configured data sources are accessible
4. **Model Publishing**: Uploads your semantic model definitions to the Buster platform
5. **Indexing**: Processes your models to enable natural language understanding
## Command Options
The following options are available:
* `--project-file FILE`: Specify a custom buster.yml file location
* `--dry-run`: Validate your project without actually deploying
* `--verbose`: Show detailed information during deployment
* `--force`: Force deployment even if there are minor validation issues
## Example
Deploy a project with the default configuration:
```bash
buster deploy
```
Deploy a project with a specific configuration file:
```bash
buster deploy --project-file ./configs/prod-buster.yml
```
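To validate a project without actually publishing it, use the `--dry-run` flag described above:
```bash
buster deploy --dry-run
```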
## Deployment Lifecycle
Your project goes through several stages during deployment:
1. **Validation**: Initial checks of your configuration and model definitions
2. **Compilation**: Processing your semantic models into Buster's internal format
3. **Publishing**: Making your models available on the platform
4. **Activation**: Enabling your models for querying
## Troubleshooting Deployment
If you encounter issues during deployment:
1. **Validation Errors**: Check your buster.yml and semantic model files for syntax errors
2. **Connection Issues**: Verify that your data source connection details are correct
3. **Permission Problems**: Ensure the database user has the necessary permissions
4. **Timeout Errors**: For large projects, try increasing the deployment timeout
## Best Practices
1. **Test Before Production**: Use the `--dry-run` flag to validate your project before actual deployment
2. **Regular Deployments**: Deploy frequently as you make changes to keep your models up to date
3. **Versioning**: Use git or another version control system to track changes to your semantic models
4. **Validation**: Run `buster validate` before deployment to catch issues early
## Next Steps
After deploying your project:
1. Test your models with natural language queries
2. Monitor query performance and accuracy
3. Refine your semantic models based on usage patterns
4. Add more detailed metrics and relationships as needed
## Related Commands
* [`buster init`](init-command.mdx) - Initialize a new Buster project
* [`buster generate`](generate-command.mdx) - Generate semantic models from your dbt catalog
* [`buster validate`](validate-command.mdx) - Validate your project configuration
# Generate Models
Source: https://docs.buster.so/docs/using-the-cli/generate-command
How to use the generate command to create semantic models from dbt catalogs
## Overview of the `buster generate` Command
The `buster generate` command automatically creates semantic models from your dbt catalog. This significantly reduces the time and effort required to set up your semantic layer, especially for projects with many tables.
## Basic Usage
To generate semantic models from your dbt catalog:
```bash
buster generate [path] [--output-dir DIR]
```
Where:
* `[path]` is an optional path to your dbt catalog file
* `--output-dir DIR` specifies where to save the generated models
## How It Works
The `generate` command:
1. Locates your dbt catalog.json file (either at the specified path or in the default location)
2. Parses the catalog to extract table schemas and column metadata
3. Intelligently determines whether each column should be a dimension or measure based on its data type
4. Creates YAML semantic model files following Buster's best practices
5. Places these files either alongside your SQL files or in a specified output directory
## Type Classification
The command uses these rules to classify your columns:
* **Dimensions** are created for:
* String/text types
* Date/time types (timestamp, date, time)
* Boolean types
* UUID types
* Enum types
* **Measures** are created for:
* Numeric types (integer, decimal, float)
* Money types
Additionally, the command applies sensible defaults for:
* Identifying primary keys
* Setting appropriate aggregation types
* Adding searchable flags to common searchable fields
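As an illustration, a table with a text `status` column and a numeric `amount` column (hypothetical names) might yield a fragment like this:
```yaml
dimensions:
  - name: status # Text type, classified as a dimension
    type: string
measures:
  - name: amount # Numeric type, classified as a measure with a default aggregation
    type: number
    agg: sum
```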
## Command Options
The following options are available:
* `--output-dir DIR`: Save generated models to a specific directory
* `--force`: Overwrite existing models, even if they were manually modified
* `--dry-run`: Show what would be generated without writing files
* `--verbose`: Show detailed information about the generation process
## Examples
### Basic Generation
Generate models from the default dbt catalog location:
```bash
buster generate
```
### Specify Catalog Path
Generate models from a specific dbt catalog file:
```bash
buster generate /path/to/catalog.json
```
### Custom Output Directory
Save generated models to a specific directory:
```bash
buster generate --output-dir ./semantic_models
```
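These options can be combined. For example, to preview what would be generated from a specific catalog into a custom directory, without writing any files:
```bash
buster generate /path/to/catalog.json --output-dir ./semantic_models --dry-run
```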
## Handling Existing Models
When generating models for an existing project:
1. **New Models**: Created with sensible defaults
2. **Existing Models**: Updated while preserving your manual edits
* New columns will be added as dimensions or measures
* Existing dimension/measure definitions will be preserved
* Custom metrics, filters, and relationships won't be affected
This means you can safely regenerate models as your schema evolves without losing your customizations.
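To preview exactly how a regeneration would affect your existing files, you can pair the `--dry-run` and `--verbose` options described above:
```bash
# List the files that would be created or updated, without writing anything
buster generate --dry-run --verbose
```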
## Output Example
A generated model file will look like this:
```yaml
# models/orders.yml
models:
  - name: orders
    description: Orders table from data warehouse
    dimensions:
      - name: order_id
        description: Unique identifier for the order
        type: string
        primary_key: true
      - name: customer_id
        description: Foreign key to the customer
        type: string
      - name: order_date
        description: Date when the order was placed
        type: timestamp
      - name: status
        description: Current order status
        type: string
    measures:
      - name: total_amount
        description: Total amount of the order
        type: number
        agg: sum
      - name: item_count
        description: Number of items in the order
        type: integer
        agg: sum
```
## Integration with `buster init`
When initializing a new project with `buster init`, you have the option to automatically generate semantic models from your dbt catalog as part of the setup process. This provides a seamless workflow from project initialization to having a functional semantic layer.
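If you prefer a single command, the `--generate-models` flag on `buster init` (listed in that command's options) triggers this generation step during setup:
```bash
# Initialize a project and generate semantic models in one pass
buster init --generate-models
```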
## Best Practices
1. **Initial Generation**: Use `buster generate` when first setting up your semantic layer to quickly scaffold models.
2. **Enrich Generated Models**: After generation, enhance the models with:
* More detailed descriptions
* Custom metrics
* Useful filters
* Proper relationships between models
3. **Regenerate When Schemas Change**: Update your semantic models when new columns are added to your dbt models.
## Next Steps
After generating your semantic models:
1. Review and enhance the generated models
2. Define relationships between models
3. Add metrics and filters
4. Deploy your project
## Related Commands
* [`buster init`](init-command.mdx) - Initialize a new Buster project
* [`buster deploy`](deploy-command.mdx) - Deploy your project to the Buster platform
* [Integrating with dbt](../getting-started/dbt-integration.mdx) - Learn more about Buster's integration with dbt
# Initialize Projects
Source: https://docs.buster.so/docs/using-the-cli/init-command
How to use the init command to set up Buster projects
## Overview of the `buster init` Command
The `buster init` command streamlines the process of setting up a new Buster project. It creates a `buster.yml` configuration file and helps connect your data warehouse in a single guided workflow.
## Basic Usage
To initialize a new Buster project:
```bash
buster init
```
Running this command will start an interactive setup process that:
1. Creates a `buster.yml` file in your current directory
2. Helps you connect to your data warehouse
3. Configures model paths and database settings
4. Optionally generates semantic models from your dbt catalog
## Connecting a Data Source
When you run `buster init`, you'll be guided through connecting your data warehouse:
1. You'll first be asked to select your data warehouse type from the supported options:
* Postgres
* BigQuery
* Snowflake
* Redshift
* MySQL
* SQL Server
* Databricks
* ClickHouse
* StarRocks
* Supabase
2. Based on your selection, you'll be prompted for the required connection details:
* Host/endpoint
* Port
* Database name
* Schema
* Username and password (or other authentication details)
3. The command will test the connection to ensure everything is working properly
4. Upon successful connection, the credentials will be securely stored
## Configuring Your Project
After connecting your data source, you'll be asked to configure your project:
1. **Data Source Name**: A unique identifier for this connection
2. **Database**: The database to connect to
3. **Schema**: The schema containing your models
4. **Model Paths**: Directories containing your dbt models or SQL files
5. **Semantic Model Paths** (Optional): Custom directories for your semantic model YAML files
## Discovering dbt Projects
The `init` command automatically detects and integrates with dbt projects:
1. It looks for common dbt configuration files (`dbt_project.yml`, etc.)
2. It identifies model paths based on dbt project settings
3. It finds your dbt catalog files for semantic model generation
For more details on working with dbt, see our complete guide to [Integrating with dbt](../getting-started/dbt-integration.mdx).
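For context, a minimal `dbt_project.yml` like the sketch below is enough for the detection described above; the project name and profile here are placeholders, and `model-paths` is the standard dbt key that `init` can use to infer where your models live:
```yaml
# dbt_project.yml (minimal sketch; name and profile are placeholders)
name: my_analytics
version: "1.0.0"
profile: my_analytics
model-paths: ["models"]
```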
## Generating Semantic Models
The `init` command integrates with the `generate` command, letting you:
1. Automatically discover your dbt catalog
2. Generate initial semantic models based on your database schema and dbt models
3. Place these models alongside your SQL files or in a dedicated directory
## Example Flow
Here's an example of initializing a project with a Postgres database:
```
$ buster init
✨ Creating a new Buster project ✨
Let's connect to your data warehouse.
Which database are you using? [postgres/bigquery/snowflake/...]: postgres
Enter connection details for Postgres:
Host: localhost
Port: 5432
Database: mydatabase
Username: myuser
Password: ********
Testing connection... Success!
Enter project configuration:
Data Source Name: my_analytics
Schema: public
Model Paths (comma-separated): models
Generate semantic models from dbt catalog? [y/N]: y
Searching for dbt catalog...
Found catalog with 15 models and 120 columns.
Generating semantic models... Done!
Project configured successfully! Your buster.yml file has been created.
```
## Configuration File
The `init` command creates a `buster.yml` file with your configuration:
```yaml
projects:
  - path: .
    data_source_name: my_analytics
    schema: public
    database: mydatabase
    model_paths:
      - models
    semantic_model_paths:
      - semantic_models
```
## Additional Options
The `init` command supports several flags to customize its behavior (see the combined example after this list):
* `--no-interactive`: Run in non-interactive mode (useful for scripts)
* `--project-file FILE`: Specify a custom location for the `buster.yml` file
* `--data-source NAME`: Pre-select a data source name
* `--database-type TYPE`: Pre-select a database type
* `--generate-models`: Automatically generate semantic models
* `--skip-connection-test`: Skip testing the database connection
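For instance, a scripted (CI-friendly) setup might combine several of these flags; the data source name and database type below are placeholders, so substitute your own values:
```bash
# Non-interactive initialization; values shown are illustrative
buster init --no-interactive \
  --data-source my_analytics \
  --database-type postgres \
  --generate-models
```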
## Next Steps
After initializing your project:
1. Review the generated `buster.yml` file
2. Examine any generated semantic models and enhance them as needed
3. Run `buster deploy` to deploy your project
## Related Commands
* [`buster generate`](generate-command.mdx) - Generate semantic models from your dbt catalog
* [`buster deploy`](deploy-command.mdx) - Deploy your project to the Buster platform
# Install the CLI
Source: https://docs.buster.so/docs/using-the-cli/install-the-cli
Install the Buster CLI on your operating system.
You can install the Buster CLI in several ways, depending on your operating system.
## macOS
### Homebrew (Recommended)
If you use [Homebrew](https://brew.sh/), this is the simplest method:
```bash
brew tap buster-so/buster
brew install buster
```
### Manual Installation (x86\_64)
```bash
mkdir -p ~/.local/bin && curl -L https://github.com/buster-so/buster/releases/latest/download/buster-cli-darwin-x86_64.tar.gz | tar xz && mv buster-cli ~/.local/bin/buster && chmod +x ~/.local/bin/buster
```
### Manual Installation (ARM/Apple Silicon)
```bash
mkdir -p ~/.local/bin && curl -L https://github.com/buster-so/buster/releases/latest/download/buster-cli-darwin-arm64.tar.gz | tar xz && mv buster-cli ~/.local/bin/buster && chmod +x ~/.local/bin/buster
```
## Linux (x86\_64)
```bash
mkdir -p ~/.local/bin && curl -L https://github.com/buster-so/buster/releases/latest/download/buster-cli-linux-x86_64.tar.gz | tar xz && mv buster-cli ~/.local/bin/buster && chmod +x ~/.local/bin/buster
```
**macOS & Linux PATH Configuration**: After a manual installation, make sure `~/.local/bin` is on your `PATH`. Add this line to your shell's configuration file (e.g., `.bashrc`, `.zshrc`):
```bash
export PATH="$HOME/.local/bin:$PATH"
```
You may need to restart your terminal session for the changes to take effect.
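You can then confirm the binary is discoverable with a standard shell lookup:
```bash
# Should print the install location, e.g. ~/.local/bin/buster
which buster
```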
## Windows (x86\_64)
1. **Download the Windows binary using PowerShell:**
```powershell
Invoke-WebRequest -Uri https://github.com/buster-so/buster/releases/latest/download/buster-cli-windows-x86_64.zip -OutFile buster.zip
```
2. **Extract and install using PowerShell:**
```powershell
Expand-Archive -Path buster.zip -DestinationPath $env:USERPROFILE\buster
Move-Item -Path $env:USERPROFILE\buster\buster-cli.exe -Destination $env:LOCALAPPDATA\Microsoft\WindowsApps\buster.exe
```
This places `buster.exe` in a location typically already included in the system's PATH for user applications. You might need to restart your terminal.