ClickHouse

# BigQuery Source: https://docs.buster.so/data_sources/bigquery To connect Buster to your BigQuery project, you will need the following credentials: | Field | Description | Required | | -------------------- | --------------------------------------------------- | -------- | | `credentials_json` | Your Google Cloud service account key JSON. | Yes | | `default_project_id` | The default Google Cloud Project ID to use. | Yes | | `default_dataset_id` | The default BigQuery Dataset ID within the project. | Yes | You may need to allowlist Buster's IPs: `44.219.39.124`, `34.230.173.35`, `100.26.25.127` # ClickHouse Source: https://docs.buster.so/data_sources/clickhouse To connect Buster to your ClickHouse database (using the MySQL protocol), you will need the following credentials: | Field | Description | Required | | ------------------ | ------------------------------------------------------------------------ | -------- | | `host` | The hostname of your database. | Yes | | `port` | The port your database listens on (typically `9004` for MySQL protocol). | Yes | | `username` | The username for database access. | Yes | | `password` | The password for database access. | Yes | | `default_database` | The name of the database to connect to. | Yes | You may need to allowlist Buster's IPs: `44.219.39.124`, `34.230.173.35`, `100.26.25.127` # Databricks Source: https://docs.buster.so/data_sources/databricks To connect Buster to your Databricks workspace, you will need the following credentials: | Field | Description | Required | | ----------------- | ------------------------------------------------------------------------ | -------- | | `host` | The Databricks workspace URL (e.g., `dbc-a1b234c.cloud.databricks.com`). | Yes | | `api_key` | Your Databricks personal access token (PAT). | Yes | | `warehouse_id` | The HTTP path of your Databricks SQL Warehouse. | Yes | | `default_catalog` | The default catalog to use within Databricks. | Yes | | `default_schema` | The default schema to use within the catalog. | Yes | You may need to allowlist Buster's IPs: `44.219.39.124`, `34.230.173.35`, `100.26.25.127` # MySQL Source: https://docs.buster.so/data_sources/mysql To connect Buster to your MySQL database, you will need the following credentials: | Field | Description | Required | | ------------------ | --------------------------------------- | -------- | | `host` | The hostname of your database. | Yes | | `port` | The port your database listens on. | Yes | | `username` | The username for database access. | Yes | | `password` | The password for database access. | Yes | | `default_database` | The name of the database to connect to. | Yes | You may need to allowlist Buster's IPs: `44.219.39.124`, `34.230.173.35`, `100.26.25.127` # Overview Source: https://docs.buster.so/data_sources/overview ## Connecting to Data Sources Buster supports a wide range of databases and data warehouses. Typically, you'll need to: 1. Create a Buster user with Read-Only access to the data source. 2. Create the connection in Buster via CLI or the Buster UI. ## Setup Guides } title="BigQuery" href="bigquery" > Learn how to connect Buster with your Google BigQuery warehouse. Clickhouse Streamline Icon: https://streamlinehq.comClickHouse} title="ClickHouse" href="clickhouse"> Get started with connecting Buster to your ClickHouse instance. } title="Databricks" href="databricks" > See how to link Buster with your Databricks Lakehouse Platform. } title="MySQL" href="mysql" > Learn the process of connecting Buster to your MySQL database. } title="Postgres" href="postgres" > See how to set up Buster with your PostgreSQL database. } title="Redshift" href="redshift" > Follow our guide to integrate Buster with your Amazon Redshift warehouse. } title="Snowflake" href="snowflake" > Discover the steps to connect Buster to your Snowflake Data Cloud. } title="StarRocks" href="starrocks" > Find out how to integrate Buster with your StarRocks environment. } title="Supabase" href="supabase" > Set up a connection between Buster and your Supabase Postgres database. # Postgres Source: https://docs.buster.so/data_sources/postgres To connect Buster to your Postgres database, you will need the following credentials: | Field | Description | Required | | ------------------ | ---------------------------------------------- | -------- | | `host` | The hostname of your database. | Yes | | `port` | The port your database listens on. | Yes | | `username` | The username for database access. | Yes | | `password` | The password for database access. | Yes | | `default_database` | The name of the database to connect to. | Yes | | `default_schema` | The default schema to use within the database. | Yes | You may need to allowlist Buster's IPs: `44.219.39.124`, `34.230.173.35`, `100.26.25.127` # Redshift Source: https://docs.buster.so/data_sources/redshift To connect Buster to your Redshift cluster, you will need the following credentials: | Field | Description | Required | | ------------------ | ---------------------------------------------- | -------- | | `host` | The endpoint of your cluster. | Yes | | `port` | The port your cluster listens on. | Yes | | `username` | The username for database access. | Yes | | `password` | The password for database access. | Yes | | `default_database` | The name of the database to connect to. | Yes | | `default_schema` | The default schema to use within the database. | Yes | You may need to allowlist Buster's IPs: `44.219.39.124`, `34.230.173.35`, `100.26.25.127` # Snowflake Source: https://docs.buster.so/data_sources/snowflake To connect Buster to your Snowflake account, you will need the following credentials: | Field | Description | Required | | ------------------ | ---------------------------------------------- | -------- | | `account_id` | Your Snowflake account identifier. | Yes | | `warehouse_id` | The name of the Snowflake warehouse to use. | Yes | | `username` | The username for Snowflake access. | Yes | | `password` | The password for Snowflake access. | Yes | | `role` | The Snowflake role to use (optional). | No | | `default_database` | The default database to use within Snowflake. | Yes | | `default_schema` | The default schema to use within the database. | Yes | You may need to allowlist Buster's IPs: `44.219.39.124`, `34.230.173.35`, `100.26.25.127` # StarRocks Source: https://docs.buster.so/data_sources/starrocks To connect Buster to your StarRocks database (which uses the MySQL protocol), you will need the following credentials: | Field | Description | Required | | ------------------ | ------------------------------------------------------------------------ | -------- | | `host` | The hostname of your database. | Yes | | `port` | The port your database listens on (typically `9030` for MySQL protocol). | Yes | | `username` | The username for database access. | Yes | | `password` | The password for database access. | Yes | | `default_database` | The name of the database to connect to. | Yes | You may need to allowlist Buster's IPs: `44.219.39.124`, `34.230.173.35`, `100.26.25.127` # Supabase Source: https://docs.buster.so/data_sources/supabase To connect Buster to your Supabase project (which uses Postgres), you will need the following credentials: | Field | Description | Required | | ------------------ | ---------------------------------------------- | -------- | | `host` | The hostname of your database. | Yes | | `port` | The port your database listens on. | Yes | | `username` | The username for database access. | Yes | | `password` | The password for database access. | Yes | | `default_database` | The name of the database to connect to. | Yes | | `default_schema` | The default schema to use within the database. | Yes | You may need to allowlist Buster's IPs: `44.219.39.124`, `34.230.173.35`, `100.26.25.127` # Cursor Rules, Claude Code, etc. Source: https://docs.buster.so/docs/ai-resources/ai-rules-doc # Buster AI Agent Guidelines This document provides guidance for AI agents (Claude, Windsurf, Cursor, etc.) when working with Buster configuration files and semantic models. > **Note for AI Agents**: All Buster documentation is available via HTTP requests. Simply: > > * Access any page from [https://docs.buster.so/](https://docs.buster.so/) by appending `.md` to the URL to get markdown format > * For example: [https://docs.buster.so/references/semantic-models.md](https://docs.buster.so/references/semantic-models.md) > * Use tools like `curl` or `fetch` to retrieve documentation as needed > * Two specialized endpoints are available: > * [https://docs.buster.so/llms.txt](https://docs.buster.so/llms.txt) (concise overview) > * [https://docs.buster.so/llms-full.txt](https://docs.buster.so/llms-full.txt) (comprehensive reference) ## What is Buster? Buster is a platform for building and deploying AI data analysts that connects to various data sources and uses a semantic layer to understand your data structure and business logic. The platform enables natural language querying of your data warehouse with accurate, relevant responses. ## Core Components ### buster.yml This is the primary configuration file that defines your Buster project settings. #### Basic Structure ```yaml data_source_name: demo_db schema: analytics database: buster model_paths: - models/ semantic_model_paths: # Optional: Custom paths for semantic models - semantic_models/ ``` #### Multi-Project Structure Buster.yml now supports configuring multiple projects in a single file: ```yaml projects: - path: . data_source_name: new_one schema: ont-ont database: postgres model_paths: - models/ semantic_model_paths: # Optional: Custom paths for semantic models - semantic_models/ - path: ./another-project data_source_name: another_db schema: analytics database: warehouse model_paths: - models/ semantic_model_paths: - semantic_models/ ``` ### Semantic Models Semantic models define a business-friendly abstraction layer over your data warehouse. They are organized in YAML files and consist of these key components: #### Model Structure ```yaml models: - name: orders # Required: Unique identifier (snake_case) description: "Order transactions from our e-commerce platform" # Required: Clear explanation dimensions: # Optional: Attributes for slicing/filtering - name: order_id description: "Unique identifier for each order" type: string primary_key: true - name: created_at description: "When the order was created" type: timestamp measures: # Optional: Quantifiable aggregations - name: revenue description: "Total order value" type: decimal agg: sum metrics: # Optional: Derived KPIs/calculations - name: average_order_value description: "Average $ amount per order" expr: "sum(revenue) / count(order_id)" - name: revenue_by_period description: "Revenue over a specified time period" expr: "sum(case when created_at >= dateadd(day, -{{days}}, current_date()) then revenue else 0 end)" args: - name: days type: integer description: "Number of days to include" default: 30 filters: # Optional: Reusable query conditions - name: is_complete description: "Filter for completed orders" expr: "status = 'complete'" # The exact value depends on your data - name: orders_above_amount description: "Orders with revenue above threshold" expr: "revenue > {{min_amount}}" args: - name: min_amount type: number description: "Minimum order amount" default: 100 relationships: # Optional: Relationships to other models (previously called entities) - name: customer description: "The customer who placed this order" primary_key: id # Field in the related customer model foreign_key: customer_id # Field in this orders model cardinality: many_to_one # Many orders to one customer type: LEFT # LEFT JOIN is the default ``` ## Key Concepts for Effective Models ### Models * **Purpose**: Define meaningful business concepts or entities * **Types**: * **Entity-focused**: Representing business objects (customers, products) * **Event-focused**: Representing occurrences (orders, page views) * **Metric-focused**: Centered around core business KPIs * **Best Practices**: * Use snake\_case for naming * Provide clear, concise descriptions * Organize by business domain in separate files * Name files after the main model they contain ## Data Model Components Hierarchy Buster's semantic layer organizes data into a logical hierarchy: ### Dimensions * **Definition**: The lowest level properties about a data object (non-numeric attributes) * **Purpose**: Attributes used for filtering, grouping, or display * **Examples**: Names, dates, IDs, categories, locations, statuses * **Common Types**: * `string`: Text values (e.g., customer\_name, product\_category) * `timestamp`: Date and time values (e.g., created\_at, order\_date) * `boolean`: True/false values (e.g., is\_active, is\_completed) * **Properties**: * `name`: Identifier (required) - **Should match the exact column name in your database** * `description`: Explanation (required) * `type`: Data type (recommended) * `primary_key`: Boolean designating unique identifier (optional) * `searchable`: Boolean for natural language search (optional) * `expr`: SQL expression (optional) - **If not provided, the dimension name is used as the column identifier** * **Best Practices**: * Use column names as dimension names for direct mapping to your database * Include time-based dimensions for temporal analysis * Mark primary keys for joining and uniqueness * Mark searchable fields for important attributes * Use expr only when transformation is needed from raw data ### Measures * **Definition**: The lowest level numeric properties about a data object * **Purpose**: Raw quantitative values that can be aggregated * **Examples**: Amounts, counts, durations, prices, quantities * **Common Types**: * `decimal`: Precise numeric values (e.g., price, revenue) * `integer`: Whole numbers (e.g., quantity, views) * **Aggregation Types**: * `sum`: Add values together * `avg`: Calculate average * `count`: Count occurrences * `min`: Find minimum value * `max`: Find maximum value * `count_distinct`: Count unique values * **Properties**: * `name`: Identifier (required) - **Should match the exact column name in your database** * `description`: Explanation (required) * `type`: Data type (recommended) * `agg`: Aggregation type (required) * `expr`: SQL expression (optional) - **If not provided, the measure name is used as the column identifier** * **Best Practices**: * Use column names as measure names for direct mapping to your database * Choose appropriate aggregation for business meaning * Use clear, specific names describing the quantity * Include unit of measurement in description when applicable * Use expr only when transformation is needed from raw data ### Metrics * **Definition**: High-level abstractions or calculations derived from dimensions and measures * **Purpose**: Complex business KPIs and calculations * **Examples**: Average order value, conversion rate, retention rate, year-over-year growth * **Properties**: * `name`: Identifier (required) - Descriptive name for the calculation * `description`: Explanation (required) * `expr`: SQL expression (optional) - **Must always return a numeric value** * **If not provided, uses the column with the same name as the metric** * `args`: Parameters for dynamic metrics (optional) * **Syntax**: * Use standard SQL expressions * Reference fields from the same model directly by name * Reference fields from other models using dot notation * **Best Practices**: * Centralize business logic in metrics * Use descriptive names indicating calculation * If referencing a single column directly, name the metric after the column * Document formula and business significance * Parameterize metrics when appropriate * Ensure all metrics return numeric values ### Filters * **Definition**: High-level abstractions that provide reusable filtering conditions * **Purpose**: Pre-defined conditions that must resolve to a boolean * **Examples**: is\_active\_customer, recent\_orders, high\_value\_transaction * **Properties**: * `name`: Identifier (required) - Descriptive name for the filter * `description`: Explanation (required) * `expr`: SQL expression (optional) - **Must always resolve to TRUE or FALSE** * **If not provided, uses the boolean column with the same name as the filter** * **Syntax**: * Use standard SQL boolean expressions * Can reference dimensions and measures from the model directly by name * Common patterns include comparison operators (equals, greater than, less than, IN, BETWEEN) * **Best Practices**: * Define common filtering patterns without assuming specific values in the data * If referencing a single boolean column directly, name the filter after the column * Use for time-based filters (e.g., recent\_period, current\_year) * Create concept-based filters that describe what the filter does, not specific values * Ensure expressions always evaluate to boolean values * Avoid assuming specific values exist in a column; consult actual data first ### Relationships * **Definition**: Connections between different data models * **Purpose**: Define how models relate to each other * **Examples**: users-to-orders, products-to-reviews * **Properties**: * `name`: Relationship name (required) * `primary_key`: Unique identifier field (required) * `foreign_key`: Reference field in format `model.field` (required) * `cardinality`: Type of relationship (recommended) * `join_type`: SQL join type (optional, defaults to LEFT) * **Cardinality Types**: * `one_to_one`: Each record connects to exactly one record * `one_to_many`: Parent record connects to multiple child records * `many_to_one`: Multiple records connect to one parent record * `many_to_many`: Many-to-many relationship * **Best Practices**: * Only define relationships when there's evidence they exist in the data * Use descriptive relationship names that reflect business concepts * Document the business meaning of relationships * Define relationships bidirectionally when appropriate * Verify column data types match between related fields ## Tips for AI Agents When generating or modifying Buster configuration files: 1. **Start with Basic Structure**: * Define required fields first (data\_source\_name, schema, database) * Then add optional configurations as needed 2. **Ensure Semantic Clarity**: * Use descriptive names that reflect business concepts * Include clear descriptions for all components * Maintain consistent naming conventions (snake\_case) 3. **Model Organization**: * Organize models by business domain * Keep related models in the same file * Split large model collections into multiple files 4. **Relationship Mapping**: * Ensure primary and foreign keys are correctly specified * Define appropriate cardinality for each relationship * Consider join type implications for queries 5. **Metric Definition**: * Use SQL expressions that match the target database dialect * Ensure calculations are mathematically sound * Reference fields using correct notation 6. **Field Types**: * Specify appropriate data types for dimensions and measures * Choose correct aggregation types for measures * Mark primary keys and searchable fields appropriately 7. **Validation Checks**: * Ensure all required fields are present * Verify that references between models are valid * Check that expressions use correct field names and syntax 8. **Avoid Data Assumptions**: * Never make assumptions about specific values in columns * Use generic pattern descriptions rather than assuming particular statuses, categories, or values * Consult the actual data or schema documentation before creating filters or metrics * Create semantic models that adapt to the user's actual data values, not presumed values * Don't assume relationships between models unless clearly evident in the schema 9. **Relationship Discovery Guidelines**: * Only create relationships/joins when you have evidence from column names or schema analysis * Acceptable approaches include: * Finding matching column patterns (e.g., `id` in one table, `user_id` in another) * Analyzing foreign key constraints from database metadata * Following explicit instructions or documentation about table relationships * When analyzing multiple models at once, look for conventional patterns like: * Primary key matches foreign key name (e.g., `users.id` → `orders.user_id`) * Table name incorporated in foreign key (e.g., `user_id` referencing the `users` table) * Always verify relationship assumptions with the user when possible 10. **Use dbt Metadata Commands**: * Leverage dbt's metadata commands to gather accurate schema information without affecting production: * `dbt ls` - List all models in the project * `dbt docs generate` - Generate documentation (then examine metadata files) * `dbt parse` - Parse project files without running models * `dbt compile` - Compile SQL without executing it * `dbt describe` - View upstream/downstream dependencies of a model * **IMPORTANT**: Never run commands that execute models (`dbt run`, `dbt build`, etc.) * Extract valuable information from dbt metadata such as: * Column names, types, and descriptions * Model dependencies and lineage * Table properties and configurations * Existing documentation and tests * Use this metadata to make informed decisions when creating semantic models Following these guidelines will help create effective Buster configurations that accurately represent business concepts and enable powerful natural language querying of data warehouses. ## SQL Data Modeling with dbt Buster works best when the underlying data models align well with the semantic layer. Using dbt as the primary modeling tool helps create a clean, modular data structure that maps effectively to Buster's semantic models. ### Entity-Based Modeling Approach The SQL data models should mirror the entity-focused approach of the semantic layer, with tables structured to represent business entities and their relationships. #### dbt Model Organization * **Sources**: Raw data from warehouse (src\_\*) * **Staging**: Cleaned, typed data (stg\_\*) * **Intermediate**: Business logic processing (int\_\*) * **Entity Models**: Core business entities (e.g., users, products, orders) * **Derived Models**: Advanced calculations and aggregations (derived\_\*) #### Recommended File Structure ``` models/ ├── domains/ # Business domain models │ ├── customers/ # Customer domain │ │ ├── users.sql # User entity model │ │ ├── accounts.sql # Account entity model │ │ └── ... │ ├── sales/ # Sales domain │ │ ├── orders.sql # Order entity model │ │ ├── products.sql # Product entity model │ │ └── ... │ └── ... ├── derived/ # Derived metrics and aggregations │ ├── derived_user_metrics.sql │ ├── derived_sales_performance.sql │ └── ... ├── intermediate/ # Transformation steps │ ├── int_order_items.sql │ └── ... └── staging/ # Cleaned source data ├── stg_source1.sql ├── stg_source2.sql └── ... ``` #### Entity Model Pattern ```sql -- users.sql WITH source AS ( SELECT * FROM {{ ref('stg_users') }} ), -- Additional transformation/enrichment CTEs as needed final AS ( SELECT id, -- Primary key first_name, -- Identity attributes last_name, email, segment, -- Categorization attributes created_at, -- Temporal attributes updated_at, -- Include both dimensional and measure fields directly on the entity lifetime_value, -- Measure: calculated or aggregated values is_active, -- Dimension: status flags last_login_date -- Dimension: activity timestamps -- All attributes related to this entity in one model FROM source LEFT JOIN {{ ref('int_user_activity') }} USING (id) -- Join intermediate calculations if needed ) SELECT * FROM final ``` #### Entity Relationship Model Pattern ```sql -- orders.sql WITH order_source AS ( SELECT * FROM {{ ref('stg_orders') }} ), order_items AS ( SELECT * FROM {{ ref('stg_order_items') }} ), -- Join or additional transformation CTEs final AS ( SELECT id, -- Primary key user_id, -- Relationship to users entity created_at, -- Event timestamp status, -- State attributes -- Include both dimensional and measure fields items_count, -- Measure order_total, -- Measure tax_amount, -- Measure shipping_amount, -- Measure payment_method, -- Dimension channel, -- Dimension is_first_purchase -- Calculated dimension FROM order_source LEFT JOIN order_items USING (id) -- Additional joins as needed ) SELECT * FROM final ``` #### Derived Model Pattern ```sql -- derived_sales_performance.sql WITH user_orders AS ( SELECT user_id, SUM(order_total) as total_spent, COUNT(id) as order_count, MIN(created_at) as first_order_date, MAX(created_at) as last_order_date FROM {{ ref('orders') }} -- Reference to entity model WHERE status = 'complete' -- Status depends on your data GROUP BY 1 ), user_attributes AS ( SELECT id as user_id, segment, created_at as joined_date FROM {{ ref('users') }} -- Reference to entity model ), final AS ( SELECT user_attributes.user_id, segment, joined_date, -- Entity fields from orders total_spent, order_count, first_order_date, last_order_date, -- Derived calculations total_spent / NULLIF(order_count, 0) as average_order_value, DATEDIFF('day', first_order_date, last_order_date) as customer_tenure_days, DATEDIFF('day', last_order_date, CURRENT_DATE()) as days_since_last_order, -- Complex derived metrics CASE WHEN days_since_last_order <= 30 THEN 'active' WHEN days_since_last_order <= 90 THEN 'at_risk' ELSE 'inactive' END as activity_status FROM user_attributes LEFT JOIN user_orders USING (user_id) ) SELECT * FROM final ``` ### Aligning SQL Models with Semantic Layer When designing SQL models to work optimally with Buster's semantic layer: 1. **Entity-Based Alignment**: * Create entity-based models that directly mirror your semantic models * Include both dimensional attributes and measures on the entity model * Structure models around business concepts rather than database structure 2. **Field Naming Consistency**: * Use consistent field names between SQL models and semantic models * Follow snake\_case naming convention for all database objects * Include business-friendly names that accurately describe the data concepts 3. **Data Organization**: * Group related entities by business domain * Keep the model structure flat and intuitive * Put attributes directly on the relevant entity rather than normalizing excessively 4. **Data Type Considerations**: * Choose appropriate SQL data types that align with semantic types * Use TIMESTAMP types for all date-time fields * Consider using DECIMAL types with appropriate precision for monetary values 5. **Relationship Implementation**: * Ensure entity models contain the necessary keys for relationships * Use consistent naming for relationship fields (e.g., entity\_id pattern) * Create derived models to materialize complex relationships when needed 6. **Performance Optimization**: * Create appropriate indexes for frequently queried fields * Consider materializing important entities with calculated attributes * Pre-compute complex derivations to improve query performance 7. **Documentation and Testing**: * Include column descriptions in dbt model files * Add documentation that explains business logic for complex calculations * Implement dbt tests for primary keys, relationships, and business rules 8. **Utilize dbt Metadata for Better Modeling**: * Run `dbt docs generate` to extract rich metadata about the existing models * Examine model structure, column types, and relationships from manifest.json * Use `dbt compile` to see how models are transformed without executing them * Analyze upstream and downstream dependencies with `dbt describe` * Access descriptions and documentation from existing dbt projects By aligning your SQL data modeling approach with Buster's semantic layer, you create a cohesive data ecosystem where each entity model directly maps to business concepts in the semantic layer, making it intuitive for users to query through Buster. {/* Note: AI agents should never run commands that execute models (dbt run, dbt build, etc.) as these could affect production data. Stick to metadata commands that only provide information. */} # How does it work? Source: https://docs.buster.so/docs/core-concepts/how-does-buster-work ![Diagram: How Buster Works](https://mintlify.s3.us-west-1.amazonaws.com/buster-docs/docs/core-concepts/placeholder-diagram.png) Buster operates through a systematic process that mirrors how a data analyst might approach a request, leveraging a specialized data catalog and an intelligent execution engine. Here's a breakdown of the key stages: ### 1. Data Catalog Foundation ![Placeholder: Data Catalog Foundation](https://mintlify.s3.us-west-1.amazonaws.com/buster-docs/docs/core-concepts/placeholder-catalog.png) At its core, Buster relies on a self-maintained **Data Catalog**. This isn't just a passive inventory; it's the knowledge base Buster uses to understand your data landscape. * **Catalog Components:** The catalog integrates various data artifacts, primarily: * **Models:** These often originate from data modeling tools like dbt, representing the structured tables and views containing your data. Buster offers [native integration with dbt](/docs/getting-started/dbt-integration) to seamlessly transform your dbt models into semantic models. * **Semantic Definitions:** Accompanying these models are definition files (commonly YAML). These files enrich the raw models with business context and semantic meaning, defining elements like: * Dimensions and Measures * Metrics * Filters * Enumerations (valid values for fields) * Other common semantic layer components. * **Deployment:** This rich context, forming the data catalog, is actively deployed *to* Buster. Currently, this is done using a Command Line Interface (CLI) tool. Typically, you'll run this deployment from within the repository where you manage your data models (e.g., your dbt project directory), ensuring Buster always has the latest understanding of your data structure and semantics. ### 2. Answering User Questions ![Placeholder: Answering User Questions](https://mintlify.s3.us-west-1.amazonaws.com/buster-docs/docs/core-concepts/placeholder-answering.png) When presented with a user request, Buster follows a structured approach to find and plan the necessary actions: * **Intelligent Search:** Buster doesn't just do keyword matching. It performs a **semantic search** across its data catalog. This process is designed to mimic how a human analyst would intuitively search for the most relevant tables, columns, metrics, and definitions needed to address the specific question asked. * **Planning Phase:** Once Buster identifies the potentially relevant information from the catalog, it enters a **planning mode**. * **Analysis & Clarification:** Buster analyzes the user's request against the context it found. At this critical juncture, it might determine that more information is needed and **ask clarifying questions** to the user. It can also recognize if the required context isn't available in its catalog and will explicitly state that it cannot answer the question. * **Plan Formulation:** If Buster determines it *can* answer the question, it formulates a step-by-step plan. This plan acts as an internal **"todo list"**, outlining the specific actions required. Examples include "build metric X using dimension Y and measure Z," or "create a new dashboard and add metrics A, B, and C." ### 3. Execution Loop ![Placeholder: Execution Loop](https://mintlify.s3.us-west-1.amazonaws.com/buster-docs/docs/core-concepts/placeholder-execution.png) With a clear plan in place, Buster enters an iterative execution loop to fulfill the request: * **Iterative Creation:** Buster systematically works through its todo list. This involves generating the necessary data artifacts: * **Visualizations (Metrics):** What Buster refers to as "Metrics" are often akin to configured visualizations or specific data calculations ready for display. * **Dashboards & Reports:** Buster constructs dashboards or reports to present the generated metrics and insights. * **Safe & Semantic Query Generation:** When accessing data, Buster intelligently parses SQL. It leverages the semantic layer definitions to enforce correct **join paths** between tables and inject predefined **expressions** (like custom dimensions or measures). Furthermore, it incorporates safeguards to protect against potentially **malicious queries** and enforces **access controls**, ensuring users only see data they are permitted to access based on dataset permissions. * **Version Controlled Artifacts:** Crucially, Buster creates these metrics and dashboards as **files**. This output allows them to be easily integrated into standard version control systems (like Git), enabling tracking, collaboration, and reproducibility. * **Review & Progress:** During the execution loop, Buster includes an intermediary **review step**. It checks its todo list, identifies the tasks it has successfully completed, and marks them off. * **Completion:** Buster continues this cycle of creation and review until every item on its initial plan (the todo list) has been successfully accomplished. # Dimensions and Measures Source: https://docs.buster.so/docs/core-concepts/modeling/dimensions-and-measures Define fields and their properties in your semantic layer Dimensions and measures form the foundation of your semantic layer, defining the fields that can be used for filtering, grouping, and aggregation. ## Dimensions Dimensions represent the descriptive attributes or characteristics of your data. They are typically non-numeric or categorical fields used to slice, dice, group, and filter data. Think of them as the "who, what, where, when, why" of your analysis. Examples include dates, user IDs, product categories, geographical locations, or status flags. ```yaml dimensions: - name: order_date # Required - dimension name description: Date ordered # Required - description type: date # Optional - data type (string by default) searchable: true # Optional - whether dimension can be searched (false by default) options: ["A", "B", "C"] # Optional - list of allowed values ``` ### Required Fields * **name**: The field name, typically matching a column name in your database. It should represent a specific attribute of the model's entity. * **description**: A clear explanation of what the dimension represents. This is mandatory. ### Optional Fields * **type**: The data type of the dimension. Defining the correct type (e.g., `date`, `timestamp`, `boolean`, `string`) is crucial for proper handling and analysis. * **searchable**: Whether this dimension can be used in search queries. Defaults to `false`. * **options**: A list of allowed values for this dimension, useful for categorical data. - `string` (default) - `number` - `integer` - `boolean` - `date` - `timestamp` **Dimension vs. Filter:** While a boolean dimension (e.g., `is_active`) can be used for filtering, its primary purpose is to describe an attribute *of* the entity (e.g., the user *is* active). A dedicated Filter, on the other hand, defines a specific *condition* or *subset* (e.g., `active_users_last_30_days`). If a boolean concept represents a core attribute, define it as a dimension. If it represents a common filtering condition, especially one involving logic beyond a single column, define it as a Filter. See [Defining Filters](/docs/core-concepts/modeling/filters). ## Measures Measures are quantitative fields that can be aggregated, such as quantities or amounts. ```yaml measures: - name: amount # Required - measure name description: Order amount in USD # Required - description type: number # Optional - data type (inferred if omitted) ``` ### Required Fields * **name**: The field name, typically matching a column name in your database * **description**: A clear explanation of what the measure represents. This is mandatory. ### Optional Fields * **type**: The data type of the measure ## Best Practices Use clear, business-oriented names that convey meaning. Include detailed descriptions to help Buster understand the context. Explicitly setting types ensures proper handling of values. Mention units of measurement in descriptions (e.g., "in USD", "in days"). ## Example ```yaml models: - name: orders description: Customer orders dimensions: - name: id description: Unique order identifier type: string - name: order_date description: Date the order was placed type: date searchable: true - name: status description: Current order status options: ["pending", "shipped", "delivered", "cancelled"] measures: - name: amount description: Order amount in USD type: number - name: quantity description: Number of items ordered type: integer ``` ## Next Steps Now that you've defined dimensions and measures, you can: 1. [Create metrics](/docs/core-concepts/modeling/metrics) for business calculations 2. [Establish relationships](/docs/core-concepts/modeling/relationships) between models # Examples Source: https://docs.buster.so/docs/core-concepts/modeling/examples View complete examples of Buster semantic layer definitions These examples demonstrate complete semantic layer definitions for common business scenarios. ## E-Commerce Example This example shows a semantic layer for an e-commerce application with orders, customers, products, and order items. ```yaml models: - name: orders description: Customer orders dimensions: - name: id description: Order ID type: string - name: order_date description: Date the order was placed type: date searchable: true - name: status description: Order status options: ["pending", "shipped", "delivered", "cancelled"] measures: - name: amount description: Order amount in USD type: number - name: tax description: Tax amount in USD type: number metrics: - name: total_revenue expr: "SUM(amount)" description: Total revenue from orders - name: average_order_value expr: "AVG(amount)" description: Average order value - name: order_count expr: "COUNT(*)" description: Number of orders filters: - name: completed_orders expr: "status IN ('shipped', 'delivered')" description: Orders that have been shipped or delivered - name: recent_orders expr: "order_date >= DATE_SUB(CURRENT_DATE(), INTERVAL {{days}} DAY)" description: Orders from the last N days args: - name: days type: integer description: Number of days to look back entities: - name: customers primary_key: id foreign_key: customer_id cardinality: many-to-one description: Customer who placed the order - name: order_items primary_key: id foreign_key: order_id cardinality: one-to-many description: Items included in this order - name: customers description: Customer accounts dimensions: - name: id description: Customer ID type: string - name: name description: Customer name searchable: true - name: email description: Customer email address searchable: true - name: sign_up_date description: Date customer signed up type: date measures: - name: lifetime_value description: Customer lifetime value in USD type: number metrics: - name: customer_count expr: "COUNT(*)" description: Number of customers - name: average_lifetime_value expr: "AVG(lifetime_value)" description: Average lifetime value of customers entities: - name: orders primary_key: customer_id foreign_key: id cardinality: one-to-many description: Orders placed by this customer - name: products description: Products available for sale dimensions: - name: id description: Product ID type: string - name: name description: Product name searchable: true - name: category description: Product category options: ["electronics", "clothing", "home", "books"] measures: - name: price description: Product price in USD type: number - name: cost description: Product cost in USD type: number metrics: - name: product_count expr: "COUNT(*)" description: Number of products - name: average_price expr: "AVG(price)" description: Average product price - name: margin expr: "SUM(price - cost) / SUM(price)" description: Average product margin entities: - name: order_items primary_key: product_id foreign_key: id cardinality: one-to-many description: Order items containing this product - name: order_items description: Individual items within orders dimensions: - name: id description: Order item ID type: string measures: - name: quantity description: Number of items ordered type: integer - name: unit_price description: Price per unit in USD type: number metrics: - name: total_items expr: "SUM(quantity)" description: Total number of items sold - name: average_quantity expr: "AVG(quantity)" description: Average quantity per order item entities: - name: orders primary_key: id foreign_key: order_id cardinality: many-to-one description: Order containing this item - name: products primary_key: id foreign_key: product_id cardinality: many-to-one description: Product in this order item ``` ## SaaS Application Example This example shows a semantic layer for a SaaS application with subscriptions, users, and usage data. ```yaml models: - name: subscriptions description: Customer subscriptions dimensions: - name: id description: Subscription ID type: string - name: start_date description: Date subscription started type: date - name: end_date description: Date subscription ended (null if active) type: date - name: plan description: Subscription plan options: ["free", "basic", "pro", "enterprise"] - name: billing_interval description: Billing frequency options: ["monthly", "annual"] measures: - name: monthly_amount description: Monthly subscription amount in USD type: number metrics: - name: mrr expr: "SUM(monthly_amount)" description: Monthly recurring revenue - name: active_subscriptions expr: "COUNT(CASE WHEN end_date IS NULL THEN 1 END)" description: Number of active subscriptions - name: churn_rate expr: "COUNT(CASE WHEN end_date BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY) AND CURRENT_DATE() THEN 1 END) / NULLIF(COUNT(CASE WHEN end_date IS NULL OR end_date > DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY) THEN 1 END), 0)" description: 30-day subscription churn rate filters: - name: active expr: "end_date IS NULL" description: Active subscriptions - name: plan_tier expr: "plan = {{plan}}" description: Subscriptions on a specific plan args: - name: plan type: string description: Subscription plan entities: - name: users primary_key: id foreign_key: user_id cardinality: many-to-one description: User who owns this subscription - name: users description: Application users dimensions: - name: id description: User ID type: string - name: email description: User email searchable: true - name: signup_date description: Date user signed up type: date - name: last_login description: Date of last login type: timestamp metrics: - name: user_count expr: "COUNT(*)" description: Number of users - name: active_users expr: "COUNT(CASE WHEN last_login >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY) THEN 1 END)" description: Users active in the last 30 days entities: - name: subscriptions primary_key: user_id foreign_key: id cardinality: one-to-many description: Subscriptions owned by this user - name: usage_events primary_key: user_id foreign_key: id cardinality: one-to-many description: Usage events by this user - name: usage_events description: User activity and usage data dimensions: - name: id description: Event ID type: string - name: event_time description: Time the event occurred type: timestamp - name: event_type description: Type of event options: ["login", "feature_used", "export", "api_call"] - name: feature description: Feature that was used (for feature_used events) type: string measures: - name: duration description: Duration of the event in seconds type: number metrics: - name: event_count expr: "COUNT(*)" description: Number of events - name: average_duration expr: "AVG(duration)" description: Average event duration - name: feature_usage expr: "COUNT(CASE WHEN event_type = 'feature_used' AND feature = {{feature_name}} THEN 1 END)" description: Usage count for a specific feature args: - name: feature_name type: string description: Name of the feature entities: - name: users primary_key: id foreign_key: user_id cardinality: many-to-one description: User who triggered this event ``` ## Best Practices Define models at an appropriate level of granularity - neither too detailed nor too abstract. Ensure all important relationships between models are defined. Add comprehensive descriptions to help Buster understand your data model. Define business-relevant metrics rather than just exposing raw database fields. ## Next Steps Now that you've seen complete examples of semantic layer definitions, you can: 1. [Deploy your model](/docs/core-concepts/deploying-models-to-buster/overview) 2. [Connect your data sources](/data_sources/overview) 3. [Start querying your data](/docs/getting-started/quickstart) # Filters Source: https://docs.buster.so/docs/core-concepts/modeling/filters Learn how to define reusable filters in your semantic models ## 1. Introduction to Filters Filters define named, reusable **conditions** that resolve to `true` or `false`. They encapsulate common query constraints, allowing users to easily apply predefined criteria without rewriting the logic repeatedly. Think of them as saved segments or common `WHERE` clauses. Using filters makes queries cleaner, more consistent, and easier to understand, focusing the analysis on specific questions rather than boilerplate filtering logic. ### Basic Structure Filters are defined under the `filters:` key within a model: ```yaml filters: - name: filter_name # Required: Unique identifier (snake_case) description: "Business definition" # Required: Clear explanation expr: "SQL_boolean_expression" # Optional if name matches a boolean dimension args: # Optional: For parameterized filters - name: param_name type: param_type # ... other parameter fields (e.g., default, description) ``` ### Filters vs. Dimensions/Measures It's crucial to distinguish Filters from Dimensions and Measures: * **Dimensions:** Describe *attributes* (e.g., `country`, `status`). Used for grouping, slicing, and direct filtering (`WHERE country = 'USA'`). A **boolean Dimension** (e.g., `is_active`) represents a fundamental `true`/`false` attribute stored in the data. * **Measures:** Quantifiable, aggregatable values (e.g., `total_revenue`). Used in aggregation functions (`SUM`) and `HAVING` clauses. * **Filters:** Define reusable *conditions* or *subsets* (e.g., `last_30_days`, `high_value_customers`). They *must* resolve to boolean and generate `WHERE` clause logic. **Guideline:** * If a boolean concept describes a direct attribute (e.g., a user's stored active status), use a **boolean Dimension**. * If you need a reusable condition (especially involving calculations, date logic, combined attributes, or related data), use a **Filter**. ## 2. Defining Basic Filters Basic filters define simple, non-parameterized conditions. ### Required Fields * **`name`**: A unique identifier (snake\_case) for the filter within the model. * **`description`**: Explains what condition the filter represents. This is mandatory. ### The `expr` Field * **Purpose:** Defines the filter's logic as a SQL expression (compatible with your data warehouse) that evaluates to `true` or `false`. * **Optionality:** `expr` is **optional only if** the filter's `name` exactly matches the name of a **boolean dimension** defined within the same model. In this case, the filter implicitly means `dimension_name = true`. * **Convention:** Name filters that map directly to boolean dimensions the same as the dimension (e.g., filter `is_active` for dimension `is_active`). * **Content:** The `expr` can reference: * Dimensions in the model's underlying table/view. * SQL functions and operators. * Literals (strings, numbers, booleans). ### Examples: Basic Filters ```yaml # In a 'users' model with dimensions: user_id, created_at, country, is_active (boolean) filters: # Filter mapping directly to a boolean dimension (expr omitted) - name: is_active description: Users who are currently marked as active. # Filter based on a single dimension expression - name: is_us_based description: Users located in the USA. expr: "country = 'USA'" # Expression required # Filter using a time-based expression - name: recent_signups description: Users who signed up in the last 90 days. expr: "created_at >= DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY)" # Filter combining conditions within the *same* model # Assumes a measure 'lifetime_value' and dimension 'last_login_date' exist - name: high_value_inactive_users description: Users with LTV > 500 who haven't logged in for 60 days. expr: "lifetime_value > 500 AND last_login_date < DATE_SUB(CURRENT_DATE(), INTERVAL 60 DAY)" ``` ## 3. Advanced Filter Types Filters can incorporate parameters or reference data from related models. ### Parameterized Filters Create dynamic filters that accept user input via `args`. Reference parameters using `{{parameter_name}}`. * **`args`**: A list defining each parameter. * `name`: The parameter name used in the `expr` (e.g., `{{min_amount}}`) (required). * `type`: Data type (`string`, `integer`, `number`, `date`, `boolean`) (required). * `description`: Explanation for the user (required). * `default` (Optional): A default value if the user doesn't provide one. ```yaml # In an 'orders' model with dimension 'amount' filters: - name: high_value_orders description: Orders with amount greater than the specified minimum. expr: "orders.amount > {{min_amount}}" # Note: Using table prefix for clarity args: - name: min_amount type: number description: The minimum order amount to filter by. default: 100 ``` ### Cross-Model Filters (Conceptual) Filters can define conditions based on data in related models, provided an `entity` relationship exists. The semantic layer uses these relationships to potentially generate necessary joins or subqueries when the filter is applied *in a query that also includes the required join*. ```yaml # In a 'users' model with an entity 'orders' linking to an 'orders' model filters: # Example: Filter users based on related order data # The exact SQL generated depends on the semantic layer engine and the query context. - name: ordered_specific_product_recently description: "Users who ordered Product SKU 'XYZ-123' in the last 30 days. Requires joining the 'orders' entity." # This conceptual expr assumes the engine can translate it based on the 'orders' entity. # A possible SQL realization involves a subquery or join: expr: "users.user_id IN (SELECT o.user_id FROM orders o WHERE o.product_sku = 'XYZ-123' AND o.order_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))" # NOTE: This filter only works if the QUERY using it also joins the 'orders' entity. ``` Filters involving related models rely on **explicit joins** being defined in the query itself. The filter definition alone does not cause the join. See Best Practices below. ## 4. Best Practices & Considerations Follow these guidelines for effective filter definition. ### General Best Practices * **Ensure Boolean Output:** The `expr` *must* evaluate to a boolean (`true`/`false`). * **Clear Naming & Descriptions:** Use descriptive `name`s (snake\_case) and mandatory, clear `description`s. * **Match Boolean Dimensions:** For filters representing `dimension = true`, name the filter identically to the boolean dimension and omit the `expr`. * **Document Dependencies:** Clearly state required entities/joins or parameters in the `description`. ### Expression Best Practices * **Use Table Prefixes:** Strongly recommended, especially when `expr` involves measures or might be used in contexts with joins (even implicit ones via entities). Use `model_name.column_name` (e.g., `users.country`, `orders.amount`). * **Clarity and Readability:** Write clear `expr` logic. Use parentheses `()` for complex conditions to ensure correct operator precedence. * **Null Handling:** Be mindful of nulls. Use `COALESCE`, `IS NULL`, `IS NOT NULL` etc., within `expr` as needed for robust logic. ### Parameter Best Practices * **Explicit Typing:** Always define the `type` for each parameter (`string`, `integer`, `number`, `date`, `boolean`) to enable validation. * **Placeholders:** Use double curly braces `{{parameter_name}}` to reference parameters in the `expr`. * **Defaults:** Provide `default` values for optional parameters. * **Descriptions:** Clearly explain each parameter's purpose in its `description`. ### Handling Joins and Related Models * **Primary Model Association:** Filters are defined within a specific model. * **Explicit Joins Required in Query:** If a filter's `expr` references fields from a *related* model (via an `entity`), the query using that filter *must* explicitly include the necessary join to that related model (using the entity). * **No Automatic Joins:** The semantic layer does **not** automatically add joins based solely on a filter's `expr`. It validates that joins used in a *query* correspond to defined `entities`. * **Document Join Requirements:** The filter's `description` must state if it relies on data from related models and which entities/joins are required. Example: "Filters for users whose latest order was cancelled. Requires joining the 'latest\_order' entity." **Example: Filter Requiring a Join** ```yaml # In the 'users' model models: - name: users # ... dimensions: user_id, etc. ... entities: - name: latest_order # Relationship to 'orders' model, filtered for latest # ... relationship definition linking users.user_id to orders.user_id ... filters: - name: latest_order_cancelled description: "Filters for users whose most recent order was cancelled. Requires joining the 'latest_order' entity." # Assumes 'latest_order' entity provides access to the related order's status expr: "latest_order.status = 'cancelled'" # References related model via entity ``` ## 5. How Filters are Used When a query includes a defined filter (e.g., `WHERE filter: recent_signups` or selected in a BI tool), the semantic layer incorporates its logic into the final SQL query's `WHERE` clause: * **Mapped Boolean Filter (`is_active`):** Adds `WHERE users.is_active = true` (or equivalent). * **Filter with `expr` (`recent_signups`):** Adds `WHERE users.created_at >= DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY)`. * **Parameterized Filter (`high_value_orders` with `min_amount=200`):** Adds `WHERE orders.amount > 200`. * **Cross-Model Filter (`latest_order_cancelled`):** *If the query also includes the `latest_order` join*, adds `WHERE latest_order.status = 'cancelled'`. If the join is missing, the query will likely fail validation. ## 6. Next Steps * Explore how to define [Dimensions and Measures](/docs/core-concepts/modeling/dimensions-and-measures). * Learn about creating [Metrics](/docs/core-concepts/modeling/metrics). * Understand how to set up [Relationships (Entities)](/docs/core-concepts/modeling/relationships) between models, which are crucial for cross-model filters. # Metrics Source: https://docs.buster.so/docs/core-concepts/modeling/metrics Define calculations and key performance indicators in your semantic layer ## 1. Introduction to Metrics Metrics are calculations based on measures or dimensions within your semantic models. They represent key business performance indicators (KPIs) and other frequently used calculations, encapsulating business logic centrally. Here's the basic structure for defining a metric: ```yaml metrics: - name: metric_name # Required - A unique identifier for the metric expr: "SQL_expression" # Required - SQL-like expression defining the calculation description: "Business definition" # Required - Clear explanation of the metric args: # Optional - Used for parameterized metrics - name: parameter_name type: parameter_type description: "Parameter description" # ... other parameter fields (e.g., default) ``` ## 2. Defining Basic Metrics At their core, metrics combine a name, a calculation expression (`expr`), and a description. ### Required Fields * **`name`**: A unique identifier for the metric within the model. Use clear, descriptive names. * **`expr`**: A SQL-like expression that defines the calculation. This expression forms the basis of the metric's value. * **`description`**: A clear explanation of what the metric represents in business terms. This is mandatory and crucial for understanding. ### Writing the `expr` The `expr` field is where you define the metric's calculation using SQL syntax compatible with your data warehouse. * **Referencing Fields:** You can directly reference dimensions and measures defined within the **same model** (e.g., `amount`, `user_id`). * **Aggregations**: Use standard SQL aggregation functions: `SUM`, `AVG`, `COUNT`, `COUNT DISTINCT`, `MIN`, `MAX`, etc. * **Operators**: Arithmetic (`+`, `-`, `*`, `/`), logical (`AND`, `OR`, `NOT`), comparison (`=`, `>`, `<`, etc.) are supported. * **Functions**: Most standard SQL functions like `CASE`, `COALESCE`, `NULLIF`, date functions, string functions, etc., can be used. When referencing fields within the same model in the `expr`, you typically don't need a table prefix (e.g., `SUM(amount)` is usually sufficient if `amount` is a measure in the current model). However, using prefixes (e.g., `SUM(orders.amount)`) can enhance clarity, especially in complex models. See Best Practices for more on prefixes when using related models. **Example: Basic Metric** ```yaml metrics: - name: total_revenue expr: "SUM(amount)" description: Total revenue from all orders in the queried context. - name: order_count expr: "COUNT(DISTINCT order_id)" description: The total number of unique orders. ``` ## 3. Advanced Metric Types Beyond basic calculations, you can create more dynamic and interconnected metrics. ### Parameterized Metrics Create flexible metrics that accept user inputs at query time. Define parameters using the `args` field and reference them in the `expr` using double curly braces `{{parameter_name}}`. * **`args`**: A list defining each parameter. * `name`: The parameter name used in the `expr` (e.g., `{{days}}`) (required). * `type`: Data type (`string`, `integer`, `number`, `date`, `boolean`) (required). * `description`: Explanation for the user (required). * `default` (Optional): A default value if the user doesn't provide one. **Example: Parameterized Metric** ```yaml metrics: - name: revenue_in_last_n_days description: Calculates total revenue over the last N days, specified by the user. # Assumes 'order_date' and 'amount' are fields in the current model expr: "SUM(CASE WHEN order_date >= DATE_SUB(CURRENT_DATE(), INTERVAL {{days}} DAY) THEN amount ELSE 0 END)" args: - name: days type: integer description: Number of days to include in the revenue calculation. default: 30 # Example default value ``` ### Cross-Model Metrics Reference fields (dimensions or measures) from related models using dot notation (`related_model_name.field_name`). This requires an `entity` relationship to be defined between the models involved. **Example: Cross-Model Metric** ```yaml # In the 'orders' model, assuming an entity named 'customer' links to a 'customers' model metrics: - name: revenue_per_customer # Assumes 'amount' is in 'orders', 'customer.id' references the 'id' in the related 'customers' model expr: "SUM(amount) / NULLIF(COUNT(DISTINCT customer.id), 0)" # Use NULLIF for safety description: "Average revenue per unique customer. Requires joining with the 'customer' entity." ``` Using cross-model references requires careful consideration of joins when querying. See "Handling Joins and Related Models" below. ## 4. Best Practices & Considerations Follow these guidelines to create robust, understandable, and maintainable metrics. ### General Best Practices Define metrics that represent meaningful business KPIs, translating raw data into actionable insights. Use consistent, descriptive names and provide mandatory, clear descriptions explaining the calculation and business purpose. Build metrics by referencing existing measures and dimensions within the model rather than duplicating complex logic. If a metric relies on parameters or data from related models (via entities), clearly state these requirements in its `description`. ### Expression Best Practices * **Type Consistency:** Ensure metric expressions resolve to appropriate data types (usually numeric) suitable for aggregation or further calculations. * **Use Table Prefixes for Related Models:** *Always* use table-qualified names (e.g., `customer.id`, `orders.amount`) in your `expr` when referencing fields from related models (via entities) to prevent ambiguity. Prefixes are optional but recommended for clarity even for fields in the primary model. * **SQL Conventions:** * Wrap complex expressions or calculations in parentheses `()` to ensure correct operator precedence. * Handle potential division-by-zero errors using `NULLIF(denominator, 0)`. Example: `SUM(profit) / NULLIF(SUM(revenue), 0)`. * Be mindful of nulls. Use `COALESCE` or other relevant SQL functions to handle them appropriately within calculations, especially when dealing with outer joins. * **Avoid Circular References:** Do not define metrics that refer back to themselves directly or indirectly. **Good Expression Example (Parameterized):** ```yaml metrics: - name: avg_order_value_by_date description: Average order value for orders within a specified date range. Requires 'orders.created_at' and 'orders.amount' fields. # Uses table prefixes for clarity expr: "SUM(CASE WHEN orders.created_at BETWEEN '{{start_date}}' AND '{{end_date}}' THEN orders.amount ELSE 0 END) / NULLIF(COUNT(CASE WHEN orders.created_at BETWEEN '{{start_date}}' AND '{{end_date}}' THEN orders.id ELSE NULL END), 0)" args: - name: start_date type: date description: The start date (YYYY-MM-DD). # Required - name: end_date type: date description: The end date (YYYY-MM-DD). # Required ``` ### Parameter Best Practices * **Explicit Typing:** Always define the `type` for each parameter (`string`, `integer`, `number`, `date`, `boolean`). This enables validation and correct query generation. * **Clear Documentation:** Provide a meaningful `description` for each parameter. * **Placeholders:** Reference parameters in the `expr` using double curly braces: `{{parameter_name}}`. * **Default Values:** Provide `default` values for optional parameters to simplify usage. * **Validation Notes:** * `string`: Will be automatically quoted if needed during query generation. * `number`/`integer`: Must be valid numeric values. * `date`: Should ideally be provided in ISO format (`YYYY-MM-DD`). * `boolean`: Should be `true`/`false` (or potentially `1`/`0` depending on backend). **Parameter Definition Pattern to Avoid:** ```yaml # Bad: Parameter type is missing, hindering validation and usage. args: - name: min_amount # type: number <-- Missing! description: Minimum order amount to consider. default: 100 ``` ### Handling Joins and Related Models * **Primary Association:** Each metric belongs primarily to the model where it's defined. * **Explicit Joins Required in Query:** If a metric's `expr` references fields from a *related* model (e.g., `customer.id`), the query using that metric *must* explicitly include the necessary join to that related model via its defined `entity`. * **No Automatic Joins:** The semantic layer does *not* automatically infer or add joins based solely on the metric's `expr`. It validates that joins used in a *query* correspond to defined `entities`. * **Robustness with Joins:** When dealing with related models (especially via outer joins), consider using `CASE` statements or checking for nulls on fields from those models to make the metric resilient to missing data. ## 5. Comprehensive Example Here's an example within an `orders` model context, incorporating various metric types and practices: ```yaml models: - name: orders description: Customer order data. dimensions: - name: order_id type: string - name: order_date # Renamed for clarity type: date - name: status type: string - name: customer_id # Foreign key to a customers model type: string measures: - name: amount description: Order amount in USD type: number - name: item_count description: Number of items in the order type: integer entities: - name: customer # Defines relationship to a 'customers' model primary_key: id # Assumes 'id' is the PK in 'customers' foreign_key: customer_id # FK in 'orders' # ... other relationship details (e.g., join type) metrics: # Basic Metrics - name: total_revenue expr: "SUM(orders.amount)" # Prefix added for consistency description: Total revenue from all orders in the queried context. - name: average_order_value expr: "AVG(orders.amount)" # Prefix added description: The average value of a single order. - name: order_count expr: "COUNT(DISTINCT orders.order_id)" # Prefix added description: The total number of unique orders. - name: average_items_per_order expr: "AVG(orders.item_count)" # Prefix added description: Average number of items included in each order. # Parameterized Metric - name: revenue_by_period expr: "SUM(CASE WHEN orders.order_date >= DATE_SUB(CURRENT_DATE(), INTERVAL {{days}} DAY) THEN orders.amount ELSE 0 END)" description: Revenue for a specified time period (N days back from today). args: - name: days type: integer description: Number of days to look back. default: 30 # Default to 30 days # Cross-Model Metric - name: revenue_per_customer expr: "SUM(orders.amount) / NULLIF(COUNT(DISTINCT customer.id), 0)" # Prefixes used description: "Average revenue per unique customer. Requires joining with the 'customer' entity." # More Complex Parameterized Metric - name: revenue_growth_vs_prior_periods description: "Calculates the percentage growth in revenue compared to a specified number of prior periods. Requires 'orders.order_date' and 'orders.amount'. Note: Exact LAG syntax depends on SQL dialect." expr: "(SUM(orders.amount) - LAG(SUM(orders.amount), {{periods}}) OVER (ORDER BY orders.order_date)) / NULLIF(LAG(SUM(orders.amount), {{periods}}) OVER (ORDER BY orders.order_date), 0)" args: - name: periods type: integer description: Number of periods to look back for comparison (e.g., 1 for previous period). # Required ``` ## 6. Next Steps With your metrics defined, consider: 1. [Defining Filters](/docs/core-concepts/modeling/filters) for reusable query conditions. 2. [Establishing Relationships (Entities)](/docs/core-concepts/modeling/relationships) to link different models together, enabling cross-model metrics. 3. Reviewing [Complete Examples](/docs/core-concepts/modeling/examples) of semantic model definitions. By following these structures and practices, your metrics will be powerful, understandable, and easier to manage within your semantic layer. # Models Source: https://docs.buster.so/docs/core-concepts/modeling/models Learn how to define models in your semantic layer ## Purpose of Models Semantic models provide a flexible analytical layer on top of your data warehouse, aiming to create a **digital twin** of your business processes and concepts. They allow you to define business logic, metrics, and relationships in a centralized, governed place. This practice, often referred to as **"shifting left"**, involves moving analytical logic *out of* downstream BI tools or ad-hoc query generation (like that performed by AI agents) and *into* this upstream modeling layer. By doing so, you ensure logic is versioned, testable, and consistently applied across all uses. Instead of embedding complex logic in individual queries or reports, or worse, allowing an AI to make assumptions about raw data (where infinite interpretations exist), you define key business concepts like "total revenue" or "active user" **once** in the model. **A key benefit is that if a metric's definition changes, you update it in one place (the model), and that change automatically propagates to all analyses and tools using the model.** This ensures consistency and reliability, dramatically reducing the risk of inconsistent answers or technical debt. For AI agents like Buster, this pre-defined logic is crucial; it allows the AI to leverage your curated business understanding, delivering trustworthy results instead of potentially flawed interpretations based on ambiguous raw data. ## Model Structure Models represent business entities or concepts, typically mapping to a table or view in your database. Each model is defined with a name and can optionally include descriptions, dimensions, measures, metrics, filters, and entities (relationships). ```yaml models: - name: orders # Required - model identifier description: Order data # Required - description dimensions: # Optional - data attributes # Dimension definitions measures: # Optional - aggregations # Measure definitions metrics: # Optional - key performance indicators # Metric definitions filters: # Optional - predefined query filters # Filter definitions entities: # Optional - defines relationships # Relationship definitions ``` ### Required Fields * **name**: A unique identifier for the model, typically matching the table name in your database. * **description**: A clear explanation of what the model represents. This field is mandatory to ensure clarity and context for all users and downstream tools like AI agents. ### Optional Fields All other fields within a model definition besides `name` and `description` are optional. You can create a model with just these two required fields, or include any combination of `dimensions`, `measures`, `metrics`, `filters`, and `entities` as needed. * **dimensions**: Attributes used for slicing and dicing data (e.g., `user_country`, `product_category`). * **measures**: Aggregations performed on data (e.g., `total_revenue`, `average_order_value`). * **metrics**: Key performance indicators derived from measures and dimensions (e.g., `conversion_rate`). * **filters**: Named, reusable boolean conditions (e.g., `is_active`, `last_30_days`) to simplify common query constraints. See [Defining Filters](/docs/core-concepts/modeling/filters). * **entities**: Define relationships between this model and others. ## Modeling Approaches While the structure above defines *how* to build a model, it's helpful to consider *what* you are modeling. There are a few common approaches: ### 1. Entity-Focused Models These models represent core business objects or concepts. They often map directly to dimension tables in a traditional data warehouse. * **Examples:** `customers`, `products`, `employees`, `stores`, `invoices`, `aircraft`. * **Characteristics:** Primarily consist of descriptive `dimensions` (customer name, product category, employee department, store location, invoice status). May also include some `measures` directly related to the entity (e.g., `credit_limit` for a customer) or summary/rollup `metrics` (e.g., `total_invoiced_amount` on an invoice model). * **Filters:** Often include filters based on the entity's attributes (e.g., `is_premium_customer`, `product_is_discontinued`). ```yaml # Example: Entity-Focused Model models: - name: products description: Represents individual products sold. dimensions: - name: product_id type: string primary_key: true # Indicates this is a primary key - name: product_name type: string - name: category type: string - name: launch_date type: date - name: is_discontinued type: boolean measures: - name: unit_cost type: number agg: avg # Example measure directly on the product filters: - name: is_discontinued description: Products no longer actively sold. # expr implicitly uses the is_discontinued dimension - name: launched_this_year description: Products launched in the current calendar year. expr: "EXTRACT(YEAR FROM launch_date) = EXTRACT(YEAR FROM CURRENT_DATE())" ``` ### 2. Event-Focused Models These models represent occurrences or interactions that happen over time. They often map to fact tables or event stream data. * **Examples:** `page_views`, `orders`, `transactions`, `login_events`, `support_tickets`, `security_incidents`. * **Characteristics:** Almost always include a timestamp `dimension`. They contain `dimensions` describing the event (e.g., URL for page view, order status, transaction type, user agent for login). They typically contain `measures` quantifying the event (e.g., `order_amount`, `view_duration_seconds`, `transaction_value`). Summary `metrics` related to the event itself might be included. * **Relationships:** Event models usually have relationships (`entities`) linking back to multiple Entity-Focused models (e.g., an `orders` event links to `customers`, `products`, and `employees`). * **Filters:** Common filters relate to time (`last_7_days`, `this_quarter`) or event attributes (`successful_logins`, `high_priority_tickets`). ```yaml # Example: Event-Focused Model (Simplified Order) models: - name: orders description: Represents a customer placing an order. dimensions: - name: order_id type: string primary_key: true - name: order_timestamp type: timestamp - name: order_status type: string # e.g., 'pending', 'shipped', 'delivered' measures: - name: order_total_amount type: number agg: sum - name: item_quantity type: integer agg: sum metrics: - name: average_order_value expr: "AVG(order_total_amount)" # Metric summarizing the event itself filters: - name: shipped_orders description: Orders that have been shipped. expr: "order_status = 'shipped'" - name: orders_last_30_days description: Orders placed in the last 30 days. expr: "order_timestamp >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)" entities: - name: customers # Relationship to an entity model # ... relationship definition - name: products # Another relationship # ... relationship definition ``` ### 3. Metric-Focused Models (Fact Models) These models center around a core business metric or Key Performance Indicator (KPI). They often correspond to fact tables in a star or snowflake schema and serve as the central point for analyzing a specific metric across various business dimensions. * **Examples:** `monthly_revenue`, `daily_user_activity`, `campaign_performance_facts`. * **Characteristics:** Primarily contain `measures` that contribute to the core `metric`(s). The `dimensions` are typically foreign keys (e.g., `date_id`, `customer_id`, `product_id`, `region_id`) that link to separate Entity-Focused models. They might also include degenerate dimensions (attributes directly on the fact record). * **Metrics:** The main purpose is often to define one or more significant `metrics` (e.g., `total_revenue`, `active_users`, `conversion_rate`) calculated from the base measures. * **Relationships:** Heavily rely on `entities` to connect the foreign key dimensions back to their respective dimension/entity models. * **Filters:** Often include time-based filters (`last_quarter`, `year_to_date`) or filters related to the metric calculation itself. ```yaml # Example: Metric-Focused Model (Monthly Revenue Fact) models: - name: monthly_revenue description: Central fact model for analyzing monthly revenue across different business dimensions. dimensions: # Foreign Keys linking to dimension/entity models - name: month_start_date # Links to a date dimension model type: date - name: region_id # Links to a regions model type: string - name: customer_segment_id # Links to a customer_segments model type: string - name: sales_rep_id # Links to a sales_reps model type: string # You might also have direct measures if the metric uses them measures: - name: revenue_amount description: The base revenue amount for a given record. type: number agg: sum metrics: - name: total_revenue description: The total calculated revenue for the selected dimensions. expr: "SUM(revenue_amount)" # Metric derived from the measure filters: - name: revenue_last_quarter description: Revenue generated in the previous full quarter. # Example expr assumes a date dimension model provides quarter info expr: "month_start_date >= DATE_TRUNC('QUARTER', DATE_SUB(CURRENT_DATE(), INTERVAL 3 MONTH)) AND month_start_date < DATE_TRUNC('QUARTER', CURRENT_DATE())" entities: # Define relationships to the dimension/entity models - name: date_dimension primary_key: date_id # Assuming date_id in the date dimension model foreign_key: month_start_date # Field in this model # ... other relationship details - name: regions primary_key: region_id # PK in regions model foreign_key: region_id # FK in this model # ... other relationship details - name: customer_segments primary_key: segment_id # PK in segments model foreign_key: customer_segment_id # FK in this model # ... other relationship details - name: sales_reps primary_key: rep_id # PK in reps model foreign_key: sales_rep_id # FK in this model # ... other relationship details ``` **Best Practices & Combining Approaches:** * **Start with Entities and Events:** Most semantic layers are built primarily around Entity-Focused and Event-Focused models, as these directly represent the core nouns and verbs of your business. * **Use Relationships:** Define `entities` (relationships) clearly to connect your models. This is how Buster understands how events relate to entities (e.g., which customer placed which order). * **Mix and Match:** Real-world data often requires a combination. An `orders` model (Event) will link to `customers` and `products` (Entities). * **Keep Models Focused:** Aim for each model to represent a single, well-defined concept (one entity or one type of event). * **Metric-Focused for Performance:** Consider Metric-Focused models if you have very large datasets and need pre-aggregation for performance, or if a specific, central metric links many otherwise disparate entities. ## Organizing Your Model Files You have flexibility in how you structure your model definition files: * **Single File:** You can define all your models within a single YAML file under the top-level `models:` key. * **Multiple Files:** You can split your model definitions across multiple YAML files. Buster will discover and load models from all `.yml` files within the specified model paths in your `buster.yml`. **Recommendation:** We recommend organizing models into separate files, potentially grouped into subdirectories based on business domain (e.g., `models/finance/`, `models/marketing/`, `models/product/`). This improves discoverability and maintainability as your semantic layer grows. ```yaml # Example: All models in one file (models.yml) models: - name: orders # ... definition ... - name: customers # ... definition ... - name: products # ... definition ... ``` ```yaml # Example: Models split by domain # models/marketing/campaigns.yml models: - name: marketing_campaigns # ... definition ... # models/sales/orders.yml models: - name: orders # ... definition ... # models/product/users.yml models: - name: users # ... definition ... ``` ## Naming Conventions * Use snake\_case for model names (e.g., `order_items`, not `OrderItems` or `order-items`). * Choose names that reflect the business entity or concept, not necessarily the exact technical table name. * Be consistent with naming patterns across your semantic layer. ## Best Practices Always include descriptive text that explains what the model represents in business terms, even if brief. Begin with a core model definition (just the `name`) and add dimensions, measures, etc., incrementally as analysis requires. Generally, each model should align conceptually with a database table or view, but it's not a strict requirement. Use separate files and potentially subdirectories to group related models logically (e.g., by business domain like finance, marketing). ## Example Here's a more detailed example of a `page_views` model, demonstrating various optional components: ```yaml # models/web_analytics/page_views.yml models: - name: page_views description: Records each time a user views a page on the website. Event-based model. dimensions: - name: view_id description: Unique identifier for the page view event. type: string - name: view_timestamp description: Timestamp when the page view occurred. type: timestamp searchable: true # Allows searching/filtering on this dimension - name: url description: The URL of the page viewed. type: string - name: viewed_within_7_days_of_signup description: Indicates if the page view occurred within 7 days of the user signing up. type: boolean measures: - name: view_duration_seconds description: How long the user spent on the page in seconds. type: integer agg: sum # Example specifying aggregation type metrics: - name: total_page_views expr: "COUNT(view_id)" description: The total count of page views. - name: average_view_duration expr: "AVG(view_duration_seconds)" # Assumes view_duration_seconds is a measure or dimension description: The average duration of a page view in seconds. filters: - name: views_last_7_days expr: "view_timestamp >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)" description: Page views that occurred in the last 7 days. entities: # Define relationships to other models - name: users # Corresponds to a 'users' model primary_key: user_id # Field in the 'users' model foreign_key: user_id # Field in this 'page_views' model cardinality: many-to-one description: The user who performed the page view. - name: products # Corresponds to a 'products' model primary_key: product_id # Field in the 'products' model foreign_key: product_id # Field in this 'page_views' model cardinality: many-to-one # Assumes a page view relates to one product max description: The product associated with the page view, if any. ``` ## Next Steps After defining your model structure, you'll need to add the specific components: 1. [Add dimensions and measures](/docs/core-concepts/modeling/dimensions-and-measures) 2. [Create metrics](/docs/core-concepts/modeling/metrics) 3. [Define filters](/docs/core-concepts/modeling/filters) 4. [Establish relationships (entities)](/docs/core-concepts/modeling/relationships) # Overview Source: https://docs.buster.so/docs/core-concepts/modeling/overview Understanding Buster's semantic layer for analytical data modeling The semantic layer helps Buster understand your data structure, relationships, and business logic. ## What is a Semantic Layer? A semantic layer acts as a translation layer between your raw data and Buster's AI. It defines: * Business entities (users, orders, products) * Dimensions (attributes used for grouping, filtering, or segmenting data like dates, categories, statuses, or IDs) * Measures (quantifiable data) * Metrics (calculations) * Relationships between models By centralizing these definitions, the semantic layer allows Buster to generate accurate SQL, understand business metrics consistently, and provide reliable insights from your data, rather than relying on potentially inconsistent logic defined elsewhere. ## Why Use a Semantic Layer? Define business logic and metrics once in a versioned, testable modeling layer ('shifting left'), ensuring consistent reuse across all analyses. Map connections between tables so Buster can navigate your data model. Add descriptions and metadata to help Buster understand what your data represents. Control which data is accessible and how it can be used. ## Semantic Layer Components Buster's semantic layer is defined using YAML files that describe: 1. **Models**: Business entities that typically map to database tables 2. **Dimensions**: Attributes used for grouping, filtering, or segmenting data (like dates, categories, statuses, or IDs) 3. **Measures**: Quantitative fields that can be aggregated 4. **Metrics**: Calculations based on measures and dimensions 5. **Filters**: Reusable, named boolean conditions for common query constraints 6. **Relationships (Entities)**: Connections between different models ## Getting Started To build your semantic layer: 1. [Generate models from dbt catalog](/docs/core-concepts/modeling/generate) (optional) 2. [Define your models](/docs/core-concepts/modeling/models) 3. [Add dimensions and measures](/docs/core-concepts/modeling/dimensions-and-measures) 4. [Create metrics](/docs/core-concepts/modeling/metrics) 5. [Define filters](/docs/core-concepts/modeling/filters) 6. [Establish relationships](/docs/core-concepts/modeling/relationships) 7. [View complete examples](/docs/core-concepts/modeling/examples) # Relationships Source: https://docs.buster.so/docs/core-concepts/modeling/relationships Define connections between models in your semantic layer ## 1. Introduction to Relationships Relationships define how different semantic models connect to each other, similar to specifying join conditions in SQL. Defining these connections is crucial for enabling queries that combine data from multiple models. By defining relationships, you allow the semantic layer to understand how to join data, enabling metrics, dimensions, and filters that reference fields across different models. ### Basic Structure Relationships are defined within the `relationships:` list inside a model definition. Each item in the list represents a link to another model: ```yaml models: - name: current_model_name # ... dimensions, measures, metrics ... relationships: - name: related_model_name # Required: Name of the model being linked TO primary_key: pk_field # Required: Key field in the *related* model foreign_key: fk_field # Required: Key field in the *current* model type: LEFT # Optional: Join type (LEFT, INNER). Default: LEFT cardinality: many-to-one # Optional: Relationship cardinality (see below) description: "Explanation" # Optional: Business context for the relationship ``` ## 2. Defining Relationships Each relationship entry requires specifying how the current model links to the related model. ### Required Fields * **`name`**: The name of the **related** semantic model you are connecting to. * **`primary_key`**: The name of the field (usually a dimension) in the **related** model that acts as the join key on that side. * **`foreign_key`**: The name of the field (usually a dimension) in the **current** model that acts as the join key on this side. ### Optional Fields * **`description`**: A clear explanation of the relationship in business terms (e.g., "Customer who placed the order"). Highly recommended for clarity. * **`type`**: The join type to use when connecting the models in a query. (See "Join Types" below). * **`cardinality`**: The nature of the relationship between the models. (See "Cardinality Types" below). ### Simple Example: Orders to Customers In an `orders` model, linking to a `customers` model: ```yaml models: - name: orders # ... dimensions: order_id, customer_id, order_date ... relationships: - name: customers # Link to the 'customers' model primary_key: id # The PK in the 'customers' model is 'id' foreign_key: customer_id # The FK in the current 'orders' model is 'customer_id' cardinality: many-to-one # Many orders belong to one customer description: Customer associated with this order ``` ## 3. Relationship Details Understanding cardinality and join types is key to defining accurate relationships. ### Cardinality Types Cardinality describes the numerical relationship between records in the two connected models. Specify this using the `cardinality` field. * `one-to-one`: Each record in the current model corresponds to at most one record in the related model. * `one-to-many`: Each record in the current model corresponds to potentially multiple records in the related model. * `many-to-one` (Common): Multiple records in the current model can correspond to a single record in the related model (e.g., many orders belong to one customer). * `many-to-many`: Records in the current model can correspond to multiple records in the related model, and vice-versa. Requires a junction model (see below). ### Join Types The `type` field specifies the SQL join type used when this relationship is invoked in a query: * **`LEFT`** (Default): Includes all records from the current model (the one defining the relationship), and matching records from the related model. If no match is found in the related model, fields from the related model will be `NULL`. * **`INNER`**: Includes only those records where a match exists in **both** the current and related models based on the specified keys. ### Many-to-Many Relationships Direct many-to-many relationships are typically modeled using an intermediary **junction model**. Each primary model has a one-to-many relationship *to* the junction model. **Example: Products and Orders via `order_items`** ```yaml models: - name: products # ... dimensions: id, product_name ... relationships: # Link from Product to the junction table - name: order_items primary_key: product_id # Key in junction table foreign_key: id # Key in products table cardinality: one-to-many # One product can be in many order items - name: orders # ... dimensions: id, order_date, customer_id ... relationships: # Link from Order to the junction table - name: order_items primary_key: order_id # Key in junction table foreign_key: id # Key in orders table cardinality: one-to-many # One order can have many order items - name: order_items # The Junction Model # ... dimensions: id, order_id, product_id, quantity ... relationships: # Link from junction table back to Products - name: products primary_key: id # Key in products table foreign_key: product_id # Key in junction table cardinality: many-to-one # Many items can be the same product # Link from junction table back to Orders - name: orders primary_key: id # Key in orders table foreign_key: order_id # Key in junction table cardinality: many-to-one # Many items can be in the same order ``` ## 4. Using Relationships: Cross-Model References Once relationships are defined, you can reference dimensions and measures from related models within metrics and filters using dot notation: `related_model_name.field_name`. ```yaml # In the 'orders' model (which has a relationship defined for 'customers') metrics: - name: revenue_per_customer # 'amount' is from orders, 'customers.id' references 'id' from the related customers model expr: "SUM(orders.amount) / COUNT(DISTINCT customers.id)" description: "Average revenue per unique customer. Assumes 'customers' relationship exists." ``` A query using `revenue_per_customer` must explicitly include the `customers` relationship/join for this reference to work. The semantic layer uses the relationship definition to execute the join correctly when requested in the query. ## 5. Best Practices For clarity and completeness, define relationships in both related models (e.g., `orders` links to `customers`, and `customers` links back to `orders`). Ensure `primary_key`, `foreign_key`, and `cardinality` accurately reflect the underlying data structure. Incorrect definitions lead to wrong query results. Use the optional `description` field to explain the business meaning of each relationship. This greatly improves model understandability. Select `LEFT` or `INNER` based on whether records *must* exist in the related model for the query logic. ## 6. Comprehensive Example Showing bidirectional relationships between `orders` and `customers`: ```yaml models: - name: orders description: Customer transaction data. # ... dimensions: order_id, customer_id, order_date ... # ... measures: amount ... relationships: # Link from Orders TO Customers - name: customers primary_key: id # Customer's PK foreign_key: customer_id # Order's FK pointing to customer cardinality: many-to-one description: Customer who placed the order # Potentially link to order_items (junction model) # - name: order_items # ... - name: customers description: Customer account information. # ... dimensions: id, name, signup_date ... relationships: # Link from Customers TO Orders - name: orders primary_key: customer_id # Order's FK pointing back to customer foreign_key: id # Customer's PK cardinality: one-to-many description: Orders placed by this customer ``` ## 7. Next Steps After defining relationships between your models, you can: 1. Leverage these relationships to define cross-model [Metrics](/docs/core-concepts/modeling/metrics) and [Filters](/docs/core-concepts/modeling/filters). 2. [View complete examples](/docs/core-concepts/modeling/examples) of semantic layer definitions incorporating relationships. 3. [Deploy your model](/docs/core-concepts/deploying-models-to-buster/overview). # Prompting Best Practices Source: https://docs.buster.so/docs/core-concepts/prompting-best-practices Effective prompting is key to getting the most out of Buster. By understanding how Buster interprets requests, you can ensure you receive accurate, relevant, and helpful responses. ## Be Direct and Clear for Ad-Hoc Questions Buster excels at answering specific, unambiguous questions about your data. When you have a clear question in mind, state it directly. **Good Examples:** * "How many customers do we have?" * "What was the total revenue last month?" * "Show me monthly social media reactions for the last 12 months." These types of questions are targeted, allowing Buster to understand the exact information you need. **Avoid Vague Requests:** Try to avoid overly broad or ambiguous terms. * **Instead of:** "Sales" **Try:** "What were the total sales figures for Q2?" or "Show me the sales trend over the last year." * **Instead of:** "Who's the best?" **Try:** "Which salesperson had the highest revenue last quarter?" or "Which product category generated the most profit?" * **Instead of:** "Find order 12345" **Try:** "Find the order with internal order ID 12345" or "Find the order with Shopify order ID 12345" (Especially important if multiple ID types exist). While Buster can sometimes handle nuanced requests if the nuances are well-defined within your data or previous interactions, directness and clarity generally yield the best results. ## Provide Context for Exploratory Analysis When using Buster for more investigative or exploratory tasks, providing context is crucial. Buster needs direction to perform meaningful analysis aligned with your business goals. **Why Context Matters:** Without context about your business objectives, Key Performance Indicators (KPIs), relevant timelines, specific goals, or topics of interest, Buster might generate analyses based on metrics that aren't relevant to you. **Example Scenario:** Investigating a Sales Drop * **Less Effective:** "Figure out why sales dropped." (Buster will make assumptions about what metrics to investigate) * **More Effective:** "Figure out why sales dropped last quarter. Did it correlate with our recent website changes? Check performance in the Northeast region compared to others. Our main KPI is conversion rate." **The more context you provide (hypotheses, relevant events, business specifics), the better Buster can tailor its analysis to your needs.** ## Specify Visualization Styling You can instruct Buster on how to style the visualizations it creates. If you have specific preferences for colors, chart types, or other visual elements, include those details in your prompt. **Example:** * "Show monthly revenue for the past year as a line chart, and make the line green." * "Update the 'Sales Performance' dashboard to use a dark theme." Being specific about styling ensures the output matches your requirements. ## Understand Buster's Limitations Currently, Buster's primary capabilities involve: * Building data visualizations * Creating and modifying dashboards * Answering questions based on the connected data sources Buster is **not** currently capable of: * Performing web searches * Conducting advanced statistical analysis (e.g., correlation, regression) * Forecasting future trends Keeping these limitations in mind will help you set appropriate expectations for your requests. # Integrating with dbt Source: https://docs.buster.so/docs/getting-started/dbt-integration How Buster integrates with dbt projects to leverage your existing data models ## Overview Buster is designed to seamlessly integrate with [dbt (data build tool)](https://www.getdbt.com/) projects, allowing you to leverage your existing data models and transform them into semantic models with minimal effort. This integration creates a powerful workflow that combines dbt's transformation capabilities with Buster's AI-powered analytics. ## How the Integration Works The Buster CLI offers native integration with dbt that works through these key mechanisms: ### 1. Project Discovery When you run `buster init` in a directory containing a dbt project: * Buster automatically detects the presence of `dbt_project.yml` * It uses the dbt project name as the default data source name * It identifies model paths based on your dbt project configuration * It configures database connections using settings from your dbt profiles ```bash buster init # Automatically detects dbt project settings ``` ### 2. Catalog Utilization Buster uses dbt's catalog.json as the source of truth for your data models: * It looks for existing catalog.json files generated by `dbt docs generate` * If no catalog exists, Buster can trigger the catalog generation for you * The catalog provides comprehensive metadata about your models, columns, and documentation ```bash # Buster uses this catalog data dbt docs generate buster generate ``` ### 3. Semantic Model Generation When transforming dbt models into Buster semantic models: * Each dbt model becomes a semantic model in Buster * Column types are intelligently mapped to dimensions or measures: * Numeric columns become measures with appropriate aggregations * Text, date, boolean columns become dimensions * Descriptions from dbt are preserved in the semantic layer * Primary keys and relationships are inferred where possible ## Setup Process To set up a Buster project with your existing dbt project: 1. **Navigate to your dbt project directory**: ```bash cd your-dbt-project ``` 2. **Ensure you have a current dbt catalog**: ```bash dbt docs generate ``` This creates/updates the catalog.json file that Buster will use. 3. **Initialize your Buster project**: ```bash buster init ``` The CLI will automatically detect your dbt project and suggest appropriate settings. 4. **Generate semantic models**: ```bash buster generate ``` This transforms your dbt models into Buster semantic models. ## Directory Structure Integration Buster is flexible in how it integrates with your dbt directory structure: ``` your-dbt-project/ ├── dbt_project.yml ├── models/ │ ├── marts/ │ │ └── core/ │ │ ├── orders.sql │ │ └── orders.yml # dbt YAML │ └── staging/ │ └── ... ├── buster.yml # Created by buster init └── semantic_models/ # Created by buster generate ├── marts/ │ └── core/ │ └── orders.yml # Buster semantic model └── staging/ └── ... ``` You can choose to: * Generate semantic models alongside your SQL files * Use a separate directory for semantic models * Customize paths to fit your project structure ## Updating Models When your dbt models change: 1. **Update your dbt models and run your dbt pipeline**: ```bash dbt run dbt docs generate # Update the catalog ``` 2. **Regenerate or update your semantic models**: ```bash buster generate ``` This preserves your custom enhancements while adding new columns or models. 3. **Deploy your updated models**: ```bash buster deploy ``` ## Debugging dbt Integration If you encounter issues with the dbt integration: * Ensure your dbt catalog is up-to-date with `dbt docs generate` * Check that your dbt profiles.yml has valid database credentials * Review the schema and database settings in both dbt and Buster configurations ## Example Workflow Here's an example of a complete workflow integrating dbt and Buster: ```bash # 1. Update your dbt models vim models/marts/core/orders.sql # 2. Run your dbt pipeline dbt run --select orders dbt test --select orders dbt docs generate # 3. Update your semantic models buster generate # 4. Enhance your semantic models with custom metrics vim semantic_models/marts/core/orders.yml # 5. Deploy to Buster buster deploy ``` ## Benefits of Integration The integration between Buster and dbt provides several advantages: * **No Duplicate Work**: Leverage existing dbt models and their documentation * **Automatic Updates**: Keep your semantic layer in sync with schema changes * **Consistent Naming**: Maintain consistency between transformation and semantic layers * **Familiar Workflow**: Use tools that complement your existing data engineering practices * **Version Control**: Keep both dbt and Buster models in the same repository ## Next Steps After setting up the integration: 1. Enhance your generated semantic models with [metrics](../core-concepts/modeling/metrics.mdx) and [relationships](../core-concepts/modeling/relationships.mdx) 2. Deploy your semantic layer with [`buster deploy`](../using-the-cli/deploy-command.mdx) 3. Start asking questions of your data using Buster's AI capabilities ## Related Documentation * [Initialize Projects](../using-the-cli/init-command.mdx) * [Generate Models](../using-the-cli/generate-command.mdx) * [Deploy Command](../using-the-cli/deploy-command.mdx) * [Buster.yml Reference](../references/buster-yml.mdx) # Overview Source: https://docs.buster.so/docs/getting-started/overview Buster is an open-source platform for building and deploying AI data analysts. You can use these analysts to analyze data, answer ad-hoc questions, build reports, and more.