Models
Learn how to define models in your semantic layer
Purpose of Models
Semantic models provide a flexible analytical layer on top of your data warehouse, aiming to create a digital twin of your business processes and concepts. They allow you to define business logic, metrics, and relationships in a centralized, governed place. This practice, often referred to as “shifting left”, involves moving analytical logic out of downstream BI tools or ad-hoc query generation (like that performed by AI agents) and into this upstream modeling layer. By doing so, you ensure logic is versioned, testable, and consistently applied across all uses.
Instead of embedding complex logic in individual queries or reports, or worse, allowing an AI to make assumptions about raw data (where infinite interpretations exist), you define key business concepts like “total revenue” or “active user” once in the model. A key benefit is that if a metric’s definition changes, you update it in one place (the model), and that change automatically propagates to all analyses and tools using the model. This ensures consistency and reliability, dramatically reducing the risk of inconsistent answers or technical debt. For AI agents like Buster, this pre-defined logic is crucial; it allows the AI to leverage your curated business understanding, delivering trustworthy results instead of potentially flawed interpretations based on ambiguous raw data.
Model Structure
Models represent business entities or concepts, typically mapping to a table or view in your database. Each model is defined with a name and can optionally include descriptions, dimensions, measures, metrics, filters, and entities (relationships).
Required Fields
- name: A unique identifier for the model, typically matching the table name in your database.
- description: A clear explanation of what the model represents. This field is mandatory to ensure clarity and context for all users and downstream tools like AI agents.
Optional Fields
All other fields within a model definition besides name
and description
are optional. You can create a model with just these two required fields, or include any combination of dimensions
, measures
, metrics
, filters
, and entities
as needed.
- dimensions: Attributes used for slicing and dicing data (e.g.,
user_country
,product_category
). - measures: Aggregations performed on data (e.g.,
total_revenue
,average_order_value
). - metrics: Key performance indicators derived from measures and dimensions (e.g.,
conversion_rate
). - filters: Named, reusable boolean conditions (e.g.,
is_active
,last_30_days
) to simplify common query constraints. See Defining Filters. - entities: Define relationships between this model and others.
Modeling Approaches
While the structure above defines how to build a model, it’s helpful to consider what you are modeling. There are a few common approaches:
1. Entity-Focused Models
These models represent core business objects or concepts. They often map directly to dimension tables in a traditional data warehouse.
- Examples:
customers
,products
,employees
,stores
,invoices
,aircraft
. - Characteristics: Primarily consist of descriptive
dimensions
(customer name, product category, employee department, store location, invoice status). May also include somemeasures
directly related to the entity (e.g.,credit_limit
for a customer) or summary/rollupmetrics
(e.g.,total_invoiced_amount
on an invoice model). - Filters: Often include filters based on the entity’s attributes (e.g.,
is_premium_customer
,product_is_discontinued
).
2. Event-Focused Models
These models represent occurrences or interactions that happen over time. They often map to fact tables or event stream data.
- Examples:
page_views
,orders
,transactions
,login_events
,support_tickets
,security_incidents
. - Characteristics: Almost always include a timestamp
dimension
. They containdimensions
describing the event (e.g., URL for page view, order status, transaction type, user agent for login). They typically containmeasures
quantifying the event (e.g.,order_amount
,view_duration_seconds
,transaction_value
). Summarymetrics
related to the event itself might be included. - Relationships: Event models usually have relationships (
entities
) linking back to multiple Entity-Focused models (e.g., anorders
event links tocustomers
,products
, andemployees
). - Filters: Common filters relate to time (
last_7_days
,this_quarter
) or event attributes (successful_logins
,high_priority_tickets
).
3. Metric-Focused Models (Fact Models)
These models center around a core business metric or Key Performance Indicator (KPI). They often correspond to fact tables in a star or snowflake schema and serve as the central point for analyzing a specific metric across various business dimensions.
- Examples:
monthly_revenue
,daily_user_activity
,campaign_performance_facts
. - Characteristics: Primarily contain
measures
that contribute to the coremetric
(s). Thedimensions
are typically foreign keys (e.g.,date_id
,customer_id
,product_id
,region_id
) that link to separate Entity-Focused models. They might also include degenerate dimensions (attributes directly on the fact record). - Metrics: The main purpose is often to define one or more significant
metrics
(e.g.,total_revenue
,active_users
,conversion_rate
) calculated from the base measures. - Relationships: Heavily rely on
entities
to connect the foreign key dimensions back to their respective dimension/entity models. - Filters: Often include time-based filters (
last_quarter
,year_to_date
) or filters related to the metric calculation itself.
Best Practices & Combining Approaches:
- Start with Entities and Events: Most semantic layers are built primarily around Entity-Focused and Event-Focused models, as these directly represent the core nouns and verbs of your business.
- Use Relationships: Define
entities
(relationships) clearly to connect your models. This is how Buster understands how events relate to entities (e.g., which customer placed which order). - Mix and Match: Real-world data often requires a combination. An
orders
model (Event) will link tocustomers
andproducts
(Entities). - Keep Models Focused: Aim for each model to represent a single, well-defined concept (one entity or one type of event).
- Metric-Focused for Performance: Consider Metric-Focused models if you have very large datasets and need pre-aggregation for performance, or if a specific, central metric links many otherwise disparate entities.
Organizing Your Model Files
You have flexibility in how you structure your model definition files:
- Single File: You can define all your models within a single YAML file under the top-level
models:
key. - Multiple Files: You can split your model definitions across multiple YAML files. Buster will discover and load models from all
.yml
files within the specified model paths in yourbuster.yml
.
Recommendation: We recommend organizing models into separate files, potentially grouped into subdirectories based on business domain (e.g., models/finance/
, models/marketing/
, models/product/
). This improves discoverability and maintainability as your semantic layer grows.
Naming Conventions
- Use snake_case for model names (e.g.,
order_items
, notOrderItems
ororder-items
). - Choose names that reflect the business entity or concept, not necessarily the exact technical table name.
- Be consistent with naming patterns across your semantic layer.
Best Practices
Clear Descriptions
Always include descriptive text that explains what the model represents in business terms, even if brief.
Start Simple
Begin with a core model definition (just the name
) and add dimensions, measures, etc., incrementally as analysis requires.
Align with Data Source
Generally, each model should align conceptually with a database table or view, but it’s not a strict requirement.
Logical Grouping
Use separate files and potentially subdirectories to group related models logically (e.g., by business domain like finance, marketing).
Example
Here’s a more detailed example of a page_views
model, demonstrating various optional components:
Next Steps
After defining your model structure, you’ll need to add the specific components: