Introduction to Dimensions and Measures

Dimensions and measures are the fundamental building blocks of your semantic models. They define the fields available for analysis, describing your data and enabling calculations.

  • Dimensions are descriptive attributes (like product category, date, or user ID) used for grouping and filtering.
  • Measures are quantifiable values (like sales amount, quantity, or cost) used for aggregations and calculations.

Here’s the basic structure for defining them within a model:

dimensions:
  - name: dimension_name   # Required - Unique identifier, often a column name
    description: "Business definition" # Required - Clear explanation
    type: data_type         # Required - Underlying data type (e.g., string, date)

measures:
  - name: measure_name      # Required - Unique identifier, often a column name
    description: "Business definition" # Required - Clear explanation
    type: data_type         # Required - Underlying data type (e.g., number, integer)

Dimensions

Dimensions represent the descriptive attributes or characteristics of your data. They are typically non-numeric or categorical fields used to slice, dice, group, and filter data. Think of them as the “who, what, where, when, why” of your analysis. Examples include dates, user IDs, product categories, geographical locations, or status flags.

FieldRequiredDescription
nameYesUnique name for the dimension within the model. Typically matches a column name.
descriptionYesA human-readable description of the dimension. This is mandatory.
typeYesThe raw data type from the underlying database or data warehouse (e.g., VARCHAR, INT, TIMESTAMP).
searchableNoIndicates if this dimension should be indexed for searching. Defaults to false if omitted.
optionsNoA list of predefined string values for this dimension, useful for categorical data.

Measures

Measures are quantitative fields that can be aggregated, such as quantities or amounts.

FieldRequiredDescription
nameYesUnique name for the measure within the model. Typically matches a column name.
descriptionYesA human-readable description of the measure. This is mandatory.
typeYesThe underlying column type as defined in the database/warehouse. Supported aggregation types depend on the query engine.

Best Practices

Descriptive Names

Use clear, business-oriented names that convey meaning.

Add Descriptions

Include detailed descriptions to help Buster understand the context.

Set Data Types

Explicitly setting types ensures proper handling of values.

Include Units

Mention units of measurement in descriptions (e.g., “in USD”, “in days”).

Example

Let’s illustrate these concepts with a practical example using our standardized orders semantic model. This model represents data about customer orders.

name: orders
description: >
  Represents individual customer order transactions. Contains details about each order,
  including identifier, customer linkage, timing, status, and financial value. This is
  an event-based model crucial for sales analysis, revenue tracking, and understanding
  purchase patterns.
dimensions:
  - name: order_id
    description: >
      The unique identifier for each order transaction. Serves as the primary key for
      this model. Used for looking up specific orders or counting unique orders.
    type: string
  - name: customer_id
    description: >
      Foreign key referencing the 'id' in the 'customers' model. Links the order
      back to the specific customer who placed it. Essential for customer-centric analysis.
    type: string
  - name: order_date
    description: >
      The date the order was placed. Used for time-series analysis, filtering by date ranges
      (e.g., daily, weekly, monthly revenue), and cohort analysis based on order time.
    type: date
    searchable: true # Common to filter/search by date
  - name: status
    description: >
      The current fulfillment or payment status of the order (e.g., 'pending', 'shipped',
      'delivered', 'cancelled'). Used for operational reporting and filtering orders by state.
    type: string
    options: ["pending", "shipped", "delivered", "cancelled"] # Example predefined values
measures:
  - name: amount
    description: >
      The total monetary value of the order, typically in a specific currency (e.g., USD).
      This is the primary value used for revenue calculations (SUM, AVG).
    type: number
  - name: item_count
    description: The total number of distinct items included in the order. Used for analyzing order size.
    type: integer

Dissecting the Example:

  • dimensions:
    • order_id: This is a classic dimension. While it might be numeric or a string, its primary purpose is identification, not calculation. You use it to find specific orders.
    • customer_id: This is a classic dimension. While it might be numeric or a string, its primary purpose is identification, not calculation. You use it to find specific orders.
    • order_date: Dates are dimensions because they provide context (when something happened). You use them to filter (“orders this week”) or group (“revenue by month”). Making it searchable: true optimizes queries that filter by this date.
    • status: This is categorical text. It describes the state of an order. You use it to group (“count of orders by status”) or filter (“show only ‘shipped’ orders”). The options list helps ensure data consistency and can populate UI elements.
  • measures:
    • amount: This is a core measure. It’s a numeric value representing how much. You’ll typically perform mathematical operations on it, like SUM(amount) for total revenue or AVG(amount) for average order value. The description clarifies the unit (USD).
    • item_count: Similar to amount, this is a numeric value representing how many. It’s meant for aggregation like SUM(item_count) to find the total number of items sold.

This structure clearly separates the descriptive attributes (dimensions) used for context and filtering from the quantifiable values (measures) used for calculations and aggregations.

Next Steps

Now that you’ve defined dimensions and measures, you can:

  1. Create metrics for business calculations
  2. Establish relationships between models