Introduction to Metrics

Metrics are calculations based on measures or dimensions within your semantic models. They represent key business performance indicators (KPIs) and other frequently used calculations, encapsulating business logic centrally.

Here’s the basic structure for defining a metric:

metrics:
  - name: metric_name     # Required - A unique identifier for the metric
    expr: "SQL_expression" # Required - SQL-like expression defining the calculation
    description: "Business definition" # Required - Clear explanation of the metric
    args:                   # Optional - Used for parameterized metrics
      - name: parameter_name
        type: parameter_type
        description: "Parameter description"
        default: default_value        # Optional

Metrics vs. Measures

It’s helpful to distinguish Metrics from Measures (and Dimensions):

  • Dimensions: Describe attributes used for grouping or filtering (e.g., country, status).
  • Measures: Represent direct, aggregatable calculations on a single column, often corresponding to a raw data field (e.g., SUM(amount) defined as a measure named total_amount, COUNT(user_id) defined as user_count). They form the basic building blocks for quantitative analysis.
  • Metrics: Define more complex calculations, business logic, or Key Performance Indicators (KPIs) that often combine multiple measures, dimensions, or involve conditional logic (e.g., total_revenue / unique_users, ratio_of_completed_orders). They encapsulate reusable business definitions.

Guideline:

  • If you need a simple, direct aggregation of a column (like sum, count, average), define it as a Measure.
  • If you need to define a specific business KPI, a ratio, a calculation involving multiple steps or measures, or a parameterized calculation, define it as a Metric.

Defining Basic Metrics

At their core, metrics combine a name, a calculation expression (expr), and a description.

Metric Fields

FieldRequiredDescription
nameYesUnique name for the metric within the model.
exprYesThe SQL-like expression defining the metric’s calculation. Can reference measures, dimensions, and arguments.
descriptionYesA human-readable description of the metric. This is mandatory.
argsNoA list of Argument objects to parameterize the metric’s expression. Defaults to an empty list if omitted.

Writing the expr

The expr field is where you define the metric’s calculation using SQL syntax compatible with your data warehouse.

  • Referencing Fields: You can directly reference dimensions and measures defined within the same model (e.g., amount, user_id).
  • Aggregations: Use standard SQL aggregation functions: SUM, AVG, COUNT, COUNT DISTINCT, MIN, MAX, etc.
  • Operators: Arithmetic (+, -, *, /), logical (AND, OR, NOT), comparison (=, >, <, etc.) are supported.
  • Functions: Most standard SQL functions like CASE, COALESCE, NULLIF, date functions, string functions, etc., can be used.

When referencing fields within the same model in the expr, you typically don’t need a table prefix (e.g., SUM(amount) is usually sufficient if amount is a measure in the current model). However, using prefixes (e.g., SUM(orders.amount)) can enhance clarity, especially in complex models. See Best Practices for more on prefixes when using related models.

Example: Basic Metric

metrics:
  - name: total_revenue
    expr: "SUM(orders.amount)"
    description: >
      The total revenue calculated by summing the 'amount' of all orders included
      in the query context. Represents the gross revenue.

  - name: order_count
    expr: "COUNT(DISTINCT orders.order_id)"
    description: >
      The total number of unique orders included in the query context, based on the
      distinct count of 'order_id'.

Advanced Metric Types

Beyond basic calculations, you can create more dynamic and interconnected metrics.

Parameterized Metrics

Create flexible metrics that accept user inputs at query time. Define parameters using the args field and reference them in the expr using double curly braces {{parameter_name}}.

The args list contains Argument objects. See the Semantic Model Reference for the full specification of Argument fields (name, type, description).

The Argument object specification also includes an optional default field.

The specification in semantic-models.mdx does not include a default field for arguments. Parameter values must be provided at query time.

Example: Parameterized Metric

metrics:
  - name: revenue_in_last_n_days
    description: Calculates total revenue over the last N days, specified by the user or defaulting to 30 days.
    # Assumes 'order_date' and 'amount' are fields in the current model
    expr: "SUM(CASE WHEN orders.order_date >= DATE_SUB(CURRENT_DATE(), INTERVAL {{days}} DAY) THEN orders.amount ELSE 0 END)" # Prefixes added
    args:
      - name: days
        type: number
        description: Number of days to include in the revenue calculation.
        default: 30

Cross-Model Metrics

Reference fields (dimensions or measures) from related models using dot notation (related_model_name.field_name). This requires an relationship to be defined between the models involved. See the Semantic Model Reference for details on defining relationships. Note that relationships is the keyword used in the YAML, though it conceptually represents the entity connections between models.

Example: Cross-Model Metric

# In the 'orders' model, assuming a relationship named 'customer' links to a 'customers' model
metrics:
  - name: revenue_per_customer
    description: >
      The average revenue per unique customer represented in the orders query context. Requires joining the 'orders'
      model with the 'customers' model via the 'customer' relationship to access 'customer.id'.
    # Assumes 'amount' is a measure in the 'orders' model
    # Assumes 'customer.id' references the 'id' dimension in the related 'customers' model via the 'customer' relationship
    expr: "SUM(orders.amount) / NULLIF(COUNT(DISTINCT customer.id), 0)"

Using cross-model references requires careful consideration of joins when querying. See “Handling Joins and Related Models” below.

Best Practices & Considerations

Follow these guidelines to create robust, understandable, and maintainable metrics.

General Best Practices

Business Focus

Define metrics that represent meaningful business KPIs, translating raw data into actionable insights.

Clear Naming & Descriptions

Use consistent, descriptive names and provide mandatory, clear descriptions explaining the calculation and business purpose.

Leverage Existing Components

Build metrics by referencing existing measures and dimensions within the model rather than duplicating complex logic.

Document Dependencies

If a metric relies on parameters or data from related models (via relationships), clearly state these requirements in its description.

Expression Best Practices

  • Type Consistency: Ensure metric expressions resolve to appropriate data types (usually numeric) suitable for aggregation or further calculations.
  • Use Table Prefixes for Related Models: Always use table-qualified names (e.g., customer.id, orders.amount) in your expr when referencing fields from related models (via relationships) to prevent ambiguity.
  • SQL Conventions:
    • Wrap complex expressions or calculations in parentheses () to ensure correct operator precedence.
    • Handle potential division-by-zero errors using NULLIF(denominator, 0). Example: SUM(profit) / NULLIF(SUM(revenue), 0).
    • Be mindful of nulls. Use COALESCE or other relevant SQL functions to handle them appropriately within calculations, especially when dealing with outer joins.
  • Avoid Circular References: Do not define metrics that refer back to themselves directly or indirectly.

Good Expression Example (Parameterized):

metrics:
  - name: avg_order_value_by_date
    description: Average order value for orders within a specified date range. Requires 'orders.created_at' and 'orders.amount' fields.
    # Uses table prefixes for clarity
    expr: "SUM(CASE WHEN orders.created_at BETWEEN '{{start_date}}' AND '{{end_date}}' THEN orders.amount ELSE 0 END) / NULLIF(COUNT(CASE WHEN orders.created_at BETWEEN '{{start_date}}' AND '{{end_date}}' THEN orders.id ELSE NULL END), 0)"
    args:
      - name: start_date
        type: date
        description: The start date (YYYY-MM-DD). # Required
      - name: end_date
        type: date
        description: The end date (YYYY-MM-DD). # Required

Parameter Best Practices

  • Explicit Typing: Always define the type for each parameter (string, number, date, boolean). This enables validation and correct query generation.
  • Argument Fields: Ensure each argument in the args list includes the required name, type, and description fields.
  • Defaults: Provide default values for optional parameters.
  • Clear Documentation: Provide a meaningful description for each parameter.
  • Placeholders: Reference parameters in the expr using double curly braces: {{parameter_name}}.
  • Type Handling:
  • Primary Association: Each metric belongs primarily to the model where it’s defined.
  • Explicit Joins Required in Query: If a metric’s expr references fields from a related model (e.g., customer.id), the query using that metric must explicitly include the necessary join to that related model via its defined relationship.
  • No Automatic Joins: The semantic layer does not automatically infer or add joins based solely on the metric’s expr. It validates that joins used in a query correspond to defined relationships.
  • Robustness with Joins: When dealing with related models (especially via outer joins), use CASE statements or check for nulls on fields from those models to make the metric resilient to missing data.

Comprehensive Example

Here’s an example within an orders model context, incorporating various metric types and practices:

name: orders
description: >
  Represents individual customer order transactions. Contains details about each order,
  including identifier, customer linkage, timing, status, and financial value. This is
  an event-based model crucial for sales analysis, revenue tracking, and understanding
  purchase patterns.
dimensions:
  - name: order_id
    description: Unique identifier for the order transaction.
    type: string
  - name: order_date
    description: Date the order was placed.
    type: date
  - name: status
    description: Current fulfillment or payment status of the order.
    type: string
  - name: customer_id # Foreign key to a customers model
    description: Identifier linking the order to a customer.
    type: string
measures:
  - name: amount
    description: Total monetary value of the order (e.g., in USD).
    type: number
  - name: item_count
    description: Number of distinct items included in the order.
    type: integer
relationships:
  - name: customer # Assumes a relationship named 'customer' to the 'customers' model
    description: Links orders to their corresponding customer.
    source_col: customer_id
    ref_col: id
    # ... other relationship properties like type, cardinality ...
metrics:
  # Basic Metrics
  - name: total_revenue
    description: >
      The total revenue calculated by summing the 'amount' of all orders included
      in the query context. Represents the gross revenue.
    expr: "SUM(orders.amount)"

  - name: average_order_value
    description: >
      The average order 'amount' calculated across all orders included
      in the query results. Useful for understanding typical transaction size.
    expr: "AVG(orders.amount)"

  - name: order_count
    description: >
      The total number of unique orders included in the query context, based on the
      distinct count of 'order_id'.
    expr: "COUNT(DISTINCT orders.order_id)"

  - name: average_items_per_order
    description: >
      The average number of items ('item_count') per order included in the query results.
      Provides insight into typical order composition.
    expr: "AVG(orders.item_count)"

  # Parameterized Metric
  - name: revenue_by_period
    description: >
      Total revenue from orders placed within the last N days, where N is specified by the 'days' parameter.
      Depends on 'orders.order_date' and 'orders.amount'.
    expr: "SUM(CASE WHEN orders.order_date >= DATE_SUB(CURRENT_DATE(), INTERVAL {{days}} DAY) THEN orders.amount ELSE 0 END)"
    args:
      - name: days
        type: number
        description: Number of days (integer) to look back from the current date for the calculation.
        default: 30

  # Cross-Model Metric
  - name: revenue_per_customer
    description: >
      The average revenue per unique customer represented in the orders query context. Requires joining the 'orders'
      model with the 'customers' model via the 'customer' relationship to access 'customer.id'.
    expr: "SUM(orders.amount) / NULLIF(COUNT(DISTINCT customer.id), 0)"

  # More Complex Parameterized Metric
  - name: revenue_growth_vs_prior_periods
    description: >
      Calculates the percentage growth in total revenue compared to N periods ago, ordered by 'order_date'.
      N is specified by the 'periods' parameter. Requires 'orders.order_date' and 'orders.amount'.
      **Important**: This calculation relies on the `LAG` window function, which requires the query using this metric to be structured correctly, typically grouped and ordered by a consistent time period (e.g., day, month) corresponding to `orders.order_date`. The results may be incorrect if the query context doesn't align with the `OVER (ORDER BY orders.order_date)` clause. Note: Exact `LAG` syntax depends on the data warehouse SQL dialect.
    expr: "(SUM(orders.amount) - LAG(SUM(orders.amount), {{periods}}) OVER (ORDER BY orders.order_date)) / NULLIF(LAG(SUM(orders.amount), {{periods}}) OVER (ORDER BY orders.order_date), 0)"
    args:
      - name: periods
        type: number
        description: Number of prior periods (integer) to compare against (e.g., 1 for the immediately preceding period).
        default: 1