Introduction to Filters

Filters define named, reusable conditions that resolve to true or false. They encapsulate common query constraints, allowing users to easily apply predefined criteria without rewriting the logic repeatedly. Think of them as saved segments or common WHERE clauses.

Using filters makes queries cleaner, more consistent, and easier to understand, focusing the analysis on specific questions rather than boilerplate filtering logic.

Here’s the basic structure for defining a filter:

filters:
  - name: filter_name          # Required: Unique identifier (snake_case)
    description: "Business definition" # Required: Clear explanation
    expr: "SQL_boolean_expression" # Optional if name matches a boolean dimension
    args:                      # Optional: For parameterized filters
      - name: param_name
        type: param_type
        description: "Parameter description" # Required
        default: default_value        # Optional

Filters vs. Dimensions/Measures

It’s crucial to distinguish Filters from Dimensions and Measures:

  • Dimensions: Describe attributes (e.g., country, status). Used for grouping, slicing, and direct filtering (WHERE country = 'USA'). A boolean Dimension (e.g., is_active) represents a fundamental true/false attribute stored in the data.
  • Measures: Quantifiable, aggregatable values (e.g., total_revenue). Used in aggregation functions (SUM) and HAVING clauses.
  • Filters: Define reusable conditions or subsets (e.g., last_30_days, high_value_customers). They must resolve to boolean and generate WHERE clause logic.

Guideline:

  • If a boolean concept describes a direct attribute (e.g., a user’s stored active status), use a boolean Dimension.
  • If you need a reusable condition (especially involving calculations, date logic, combined attributes, or related data), use a Filter.

Defining Basic Filters

Basic filters define simple, non-parameterized conditions.

Filter Fields

FieldRequiredDescription
nameYesUnique name for the filter within the model.
descriptionYesExplains what condition the filter represents. This is mandatory.
exprNoThe SQL expression defining the filter’s logic (must evaluate to boolean). Optional only if name matches a boolean dimension.
argsNoA list of Argument objects to parameterize the filter’s expression. Defaults to an empty list if omitted.

Writing the expr

  • Purpose: Defines the filter’s logic as a SQL expression (compatible with your data warehouse) that evaluates to true or false.
  • Optionality: expr is optional only if the filter’s name exactly matches the name of a boolean dimension defined within the same model. In this case, the filter implicitly means dimension_name = true.
    • Convention: Name filters that map directly to boolean dimensions the same as the dimension (e.g., filter is_active for dimension is_active).
  • Content: The expr can reference:
    • Dimensions in the model’s underlying table/view.
    • SQL functions and operators.
    • Literals (strings, numbers, booleans).
    • Arguments (using {{arg_name}} syntax).

When referencing fields within the same model in the expr, you typically don’t need a table prefix (e.g., country = 'USA'). However, using prefixes (e.g., users.country) can enhance clarity, especially when filters might be used in queries involving joins (even implicit ones via relationships). See Best Practices.

Example: Basic Filters

# In a 'users' model with dimensions: user_id, created_at, country, is_active (boolean)

filters:
  # Filter mapping directly to a boolean dimension (expr omitted)
  - name: is_active
    description: Users who are currently marked as active.

  # Filter based on a single dimension expression
  - name: is_us_based
    description: Users located in the USA.
    expr: "country = 'USA'" # Expression required

  # Filter using a time-based expression
  - name: recent_signups
    description: Users who signed up in the last 90 days.
    expr: "created_at >= DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY)"

  # Filter combining conditions within the *same* model
  # Assumes a measure 'lifetime_value' and dimension 'last_login_date' exist
  - name: high_value_inactive_users
    description: Users with LTV > 500 who haven't logged in for 60 days.
    # Using prefixes for clarity, though optional if LTV/date are in the same model
    expr: "users.lifetime_value > 500 AND users.last_login_date < DATE_SUB(CURRENT_DATE(), INTERVAL 60 DAY)"

Advanced Filter Types

Filters can incorporate parameters or reference data from related models.

Parameterized Filters

Create dynamic filters that accept user input via args. Reference parameters using {{parameter_name}}.

  • args: A list defining each parameter. See the Semantic Model Reference for the full specification of Argument fields (name, type, description, default).

Example: Parameterized Filter

# In an 'orders' model with dimension 'order_date'
filters:
  - name: orders_within_date_range
    description: >
      Selects orders placed within a specific date range (inclusive).
      Requires the user to provide both a start and end date.
    # Assumes 'orders.order_date' is the relevant dimension
    expr: "orders.order_date BETWEEN {{start_date}} AND {{end_date}}"
    args:
      - name: start_date
        type: date
        description: "The start date (YYYY-MM-DD) for the filter range."
      - name: end_date
        type: date
        description: "The end date (YYYY-MM-DD) for the filter range."

Cross-Model Filters

Filters can define conditions based on data in related models, provided a relationship exists. The semantic layer uses these relationships to potentially generate necessary joins or subqueries when the filter is applied in a query that also includes the required join defined by the relationship.

See the Semantic Model Reference for details on defining relationships.

Example: Cross-Model Filter

# In a 'customers' model that has a relationship named 'region'
# linking to a 'regions' model (which has a 'sales_manager' dimension).
filters:
  - name: customers_in_region_managed_by
    description: >
      Selects customers located in a region assigned to a specific sales manager.
      Requires using the 'region' relationship to access 'regions.sales_manager'.
    # This expression uses the 'region' relationship to access a field
    # from the related 'regions' model.
    expr: "region.sales_manager = {{manager_name}}"
    args:
      - name: manager_name
        type: string
        description: "The name of the sales manager to filter by."
    # NOTE: Querying this filter requires joining customers to regions via the 'region' relationship.

Using cross-model references requires careful consideration of joins when querying. See “Handling Joins and Related Models” below.

Best Practices & Considerations

Follow these guidelines for effective filter definition.

General Best Practices

Ensure Boolean Output

The expr must evaluate to a boolean (true/false).

Clear Naming & Descriptions

Use descriptive names (snake_case) and mandatory, clear descriptions explaining the condition.

Match Boolean Dimensions

For filters representing dimension = true, name the filter identically to the boolean dimension and omit the expr.

Document Dependencies

Clearly state required relationships/joins or parameters in the description.

Expression Best Practices

  • Use Table Prefixes for Related Models: Strongly recommended when expr involves fields that might also exist in other models or when using cross-model filters via relationships. Use model_name.column_name (e.g., users.country) or relationship_name.column_name (e.g., user_orders.product_sku).
  • Clarity and Readability: Write clear expr logic. Use parentheses () for complex conditions to ensure correct operator precedence.
  • Null Handling: Be mindful of nulls. Use COALESCE, IS NULL, IS NOT NULL etc., within expr as needed for robust logic, as filter expressions are often used in WHERE clauses where null propagation matters.

Good Expression Example (Parameterized):

filters:
  - name: users_signed_up_between
    description: Filters users based on a signup date range. Requires 'users.created_at'.
    # Uses table prefix for clarity
    expr: "users.created_at BETWEEN '{{start_date}}' AND '{{end_date}}'"
    args:
      - name: start_date
        type: date
        description: The start date (YYYY-MM-DD).
      - name: end_date
        type: date
        description: The end date (YYYY-MM-DD).

Parameter Best Practices

  • Explicit Typing: Always define the type for each parameter (string, integer, number, date, boolean).
  • Argument Fields: Ensure each argument in the args list includes the required name, type, and description fields.
  • Defaults: Provide default values for optional parameters.
  • Clear Documentation: Provide a meaningful description for each parameter.
  • Placeholders: Reference parameters in the expr using double curly braces: {{parameter_name}}.
  • Primary Model Association: Filters are defined within a specific model.
  • Explicit Joins Required in Query: If a filter’s expr references fields from a related model (via a relationship), the query using that filter must explicitly include the necessary join to that related model (using the relationship).
  • No Automatic Joins: The semantic layer does not automatically add joins based solely on a filter’s expr. It validates that joins used in a query correspond to defined relationships.
  • Document Join Requirements: The filter’s description must state if it relies on data from related models and which relationships/joins are required. Example: “Filters for users whose latest order was cancelled. Requires joining using the ‘latest_order’ relationship.”

Comprehensive Example

Here’s an example within a customers model context, incorporating various filter types and practices:

name: customers
description: >
  Represents individual customer entities. Contains contact information, registration details,
  location, and derived metrics like lifetime value. This is a central model for
  understanding customer attributes and behavior segments. Essential for joining customer
  data with transactions, support interactions, etc.
dimensions:
  - name: id
    description: >
      The unique identifier for a customer, typically corresponds to the primary key
      in the source customer table. Essential for joining with event models like orders.
    type: string
  - name: name
    description: The full name of the customer. Used for display and personalization.
    type: string
  - name: email
    description: >
      The customer's primary email address. Used for communication, login identification,
      and potentially linking across different systems. Assumed to be unique per customer.
    type: string
  - name: signup_year
    description: >
      The year the customer first registered or created their account. Useful for cohort
      analysis based on tenure (e.g., comparing behavior of 2022 vs 2023 signups).
    type: integer
  - name: country
    description: >
      The country where the customer is located, typically derived from registration
      or geo-IP lookup. Often stored as ISO 3166-1 alpha-2 code (e.g., 'US', 'GB'). Used for
      geographical segmentation and analysis.
    type: string
  - name: user_created_at
    description: >
      The precise date and time (UTC recommended) when the customer account was created.
      Provides granular detail for time-based cohorting or analyzing initial user activity.
    type: timestamp
    searchable: true
  - name: is_active
    description: Boolean flag indicating if the customer account is currently considered active (e.g., not churned, not suspended).
    type: boolean
  - name: last_login_date
    description: The date of the customer's most recent login or activity. Used to identify inactive users.
    type: date
  - name: lifetime_value # Added as dimension for filter example clarity
    description: Pre-calculated lifetime value for the customer.
    type: number
relationships:
  - name: orders # Link back TO orders model
    source_col: id           # Key in customers (current model)
    ref_col: customer_id    # Key in orders (related model)
    description: >
      Links this customer to all their associated order records in the 'orders' model.
      Enables analysis of purchasing history and behavior for each customer.
    cardinality: one-to-many # One customer can have many orders
    type: LEFT             # Standard join type when starting from customer
filters:
  # Basic Filters
  - name: is_active # Maps to boolean dimension
    description: Customers currently marked as active.

  - name: is_us_based # Simple expression
    description: Customers located in the USA.
    expr: "customers.country = 'USA'"

  # Parameterized Filter
  - name: registered_after_date
    description: Customers who registered on or after a specified date.
    expr: "customers.user_created_at >= '{{signup_date}}'"
    args:
      - name: signup_date
        type: date
        description: The minimum registration date (YYYY-MM-DD).

  # Cross-Model Filter (using 'orders' relationship)
  - name: ordered_sku_recently
    description: "Customers who ordered a specific SKU in the last N days. Requires joining via 'orders' relationship."
    # Assumes orders model has product_sku and order_date
    expr: "orders.product_sku = '{{sku}}' AND orders.order_date >= DATE_SUB(CURRENT_DATE(), INTERVAL {{days}} DAY)"
    args:
      - name: sku
        type: string
        description: The product SKU to check for.
      - name: days
        type: number
        description: Number of days to look back.
        default: 30

  # More Complex Filter (combining local and related data implicitly)
  - name: high_value_ordered_sku_recently
    description: "High-value active customers (LTV > 1000) who ordered a specific SKU in the last N days. Requires 'orders' join."
    # Assumes lifetime_value dimension exists here, and orders model has product_sku/order_date
    expr: "customers.lifetime_value > 1000 AND customers.is_active AND orders.product_sku = '{{sku}}' AND orders.order_date >= DATE_SUB(CURRENT_DATE(), INTERVAL {{days}} DAY)"
    args:
      - name: sku
        type: string
        description: The product SKU.
      - name: days
        type: number
        description: Number of days to look back.
        default: 30