Filters define named, reusable conditions that resolve to true or false. They encapsulate common query constraints, allowing users to easily apply predefined criteria without rewriting the logic repeatedly. Think of them as saved segments or common WHERE clauses.
Using filters makes queries cleaner, more consistent, and easier to understand, focusing the analysis on specific questions rather than boilerplate filtering logic.
It’s crucial to distinguish Filters from Dimensions and Measures:
Dimensions: Describe attributes (e.g., country, status). Used for grouping, slicing, and direct filtering (WHERE country = 'USA'). A boolean Dimension (e.g., is_active) represents a fundamental true/false attribute stored in the data.
Measures: Quantifiable, aggregatable values (e.g., total_revenue). Used in aggregation functions (SUM) and HAVING clauses.
Filters: Define reusable conditions or subsets (e.g., last_30_days, high_value_customers). They must resolve to boolean and generate WHERE clause logic.
Guideline:
If a boolean concept describes a direct attribute (e.g., a user’s stored active status), use a boolean Dimension.
If you need a reusable condition (especially involving calculations, date logic, combined attributes, or related data), use a Filter.
Purpose: Defines the filter’s logic as a SQL expression (compatible with your data warehouse) that evaluates to true or false.
Optionality:expr is optional only if the filter’s name exactly matches the name of a boolean dimension defined within the same model. In this case, the filter implicitly means dimension_name = true.
Convention: Name filters that map directly to boolean dimensions the same as the dimension (e.g., filter is_active for dimension is_active).
Content: The expr can reference:
Dimensions in the model’s underlying table/view.
SQL functions and operators.
Literals (strings, numbers, booleans).
Arguments (using {{arg_name}} syntax).
When referencing fields within the same model in the expr, you typically don’t need a table prefix (e.g., country = 'USA'). However, using prefixes (e.g., users.country) can enhance clarity, especially when filters might be used in queries involving joins (even implicit ones via relationships). See Best Practices.
Example: Basic Filters
Copy
Ask AI
# In a 'users' model with dimensions: user_id, created_at, country, is_active (boolean)filters: # Filter mapping directly to a boolean dimension (expr omitted) - name: is_active description: Users who are currently marked as active. # Filter based on a single dimension expression - name: is_us_based description: Users located in the USA. expr: "country = 'USA'" # Expression required # Filter using a time-based expression - name: recent_signups description: Users who signed up in the last 90 days. expr: "created_at >= DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY)" # Filter combining conditions within the *same* model # Assumes a measure 'lifetime_value' and dimension 'last_login_date' exist - name: high_value_inactive_users description: Users with LTV > 500 who haven't logged in for 60 days. # Using prefixes for clarity, though optional if LTV/date are in the same model expr: "users.lifetime_value > 500 AND users.last_login_date < DATE_SUB(CURRENT_DATE(), INTERVAL 60 DAY)"
Create dynamic filters that accept user input via args. Reference parameters using {{parameter_name}}.
args: A list defining each parameter. See the Semantic Model Reference for the full specification of Argument fields (name, type, description, default).
Example: Parameterized Filter
Copy
Ask AI
# In an 'orders' model with dimension 'order_date'filters: - name: orders_within_date_range description: > Selects orders placed within a specific date range (inclusive). Requires the user to provide both a start and end date. # Assumes 'orders.order_date' is the relevant dimension expr: "orders.order_date BETWEEN {{start_date}} AND {{end_date}}" args: - name: start_date type: date description: "The start date (YYYY-MM-DD) for the filter range." - name: end_date type: date description: "The end date (YYYY-MM-DD) for the filter range."
Filters can define conditions based on data in related models, provided a relationship exists. The semantic layer uses these relationships to potentially generate necessary joins or subqueries when the filter is applied in a query that also includes the required join defined by the relationship.
# In a 'customers' model that has a relationship named 'region'# linking to a 'regions' model (which has a 'sales_manager' dimension).filters: - name: customers_in_region_managed_by description: > Selects customers located in a region assigned to a specific sales manager. Requires using the 'region' relationship to access 'regions.sales_manager'. # This expression uses the 'region' relationship to access a field # from the related 'regions' model. expr: "region.sales_manager = {{manager_name}}" args: - name: manager_name type: string description: "The name of the sales manager to filter by." # NOTE: Querying this filter requires joining customers to regions via the 'region' relationship.
Using cross-model references requires careful consideration of joins when querying. See “Handling Joins and Related Models” below.
Use Table Prefixes for Related Models: Strongly recommended when expr involves fields that might also exist in other models or when using cross-model filters via relationships. Use model_name.column_name (e.g., users.country) or relationship_name.column_name (e.g., user_orders.product_sku).
Clarity and Readability: Write clear expr logic. Use parentheses () for complex conditions to ensure correct operator precedence.
Null Handling: Be mindful of nulls. Use COALESCE, IS NULL, IS NOT NULL etc., within expr as needed for robust logic, as filter expressions are often used in WHERE clauses where null propagation matters.
Good Expression Example (Parameterized):
Copy
Ask AI
filters: - name: users_signed_up_between description: Filters users based on a signup date range. Requires 'users.created_at'. # Uses table prefix for clarity expr: "users.created_at BETWEEN '{{start_date}}' AND '{{end_date}}'" args: - name: start_date type: date description: The start date (YYYY-MM-DD). - name: end_date type: date description: The end date (YYYY-MM-DD).
Primary Model Association: Filters are defined within a specific model.
Explicit Joins Required in Query: If a filter’s expr references fields from a related model (via a relationship), the query using that filter must explicitly include the necessary join to that related model (using the relationship).
No Automatic Joins: The semantic layer does not automatically add joins based solely on a filter’s expr. It validates that joins used in a query correspond to defined relationships.
Document Join Requirements: The filter’s description must state if it relies on data from related models and which relationships/joins are required. Example: “Filters for users whose latest order was cancelled. Requires joining using the ‘latest_order’ relationship.”
Here’s an example within a customers model context, incorporating various filter types and practices:
Copy
Ask AI
name: customersdescription: > Represents individual customer entities. Contains contact information, registration details, location, and derived metrics like lifetime value. This is a central model for understanding customer attributes and behavior segments. Essential for joining customer data with transactions, support interactions, etc.dimensions: - name: id description: > The unique identifier for a customer, typically corresponds to the primary key in the source customer table. Essential for joining with event models like orders. type: string - name: name description: The full name of the customer. Used for display and personalization. type: string - name: email description: > The customer's primary email address. Used for communication, login identification, and potentially linking across different systems. Assumed to be unique per customer. type: string - name: signup_year description: > The year the customer first registered or created their account. Useful for cohort analysis based on tenure (e.g., comparing behavior of 2022 vs 2023 signups). type: integer - name: country description: > The country where the customer is located, typically derived from registration or geo-IP lookup. Often stored as ISO 3166-1 alpha-2 code (e.g., 'US', 'GB'). Used for geographical segmentation and analysis. type: string - name: user_created_at description: > The precise date and time (UTC recommended) when the customer account was created. Provides granular detail for time-based cohorting or analyzing initial user activity. type: timestamp searchable: true - name: is_active description: Boolean flag indicating if the customer account is currently considered active (e.g., not churned, not suspended). type: boolean - name: last_login_date description: The date of the customer's most recent login or activity. Used to identify inactive users. type: date - name: lifetime_value # Added as dimension for filter example clarity description: Pre-calculated lifetime value for the customer. type: numberrelationships: - name: orders # Link back TO orders model source_col: id # Key in customers (current model) ref_col: customer_id # Key in orders (related model) description: > Links this customer to all their associated order records in the 'orders' model. Enables analysis of purchasing history and behavior for each customer. cardinality: one-to-many # One customer can have many orders type: LEFT # Standard join type when starting from customerfilters: # Basic Filters - name: is_active # Maps to boolean dimension description: Customers currently marked as active. - name: is_us_based # Simple expression description: Customers located in the USA. expr: "customers.country = 'USA'" # Parameterized Filter - name: registered_after_date description: Customers who registered on or after a specified date. expr: "customers.user_created_at >= '{{signup_date}}'" args: - name: signup_date type: date description: The minimum registration date (YYYY-MM-DD). # Cross-Model Filter (using 'orders' relationship) - name: ordered_sku_recently description: "Customers who ordered a specific SKU in the last N days. Requires joining via 'orders' relationship." # Assumes orders model has product_sku and order_date expr: "orders.product_sku = '{{sku}}' AND orders.order_date >= DATE_SUB(CURRENT_DATE(), INTERVAL {{days}} DAY)" args: - name: sku type: string description: The product SKU to check for. - name: days type: number description: Number of days to look back. default: 30 # More Complex Filter (combining local and related data implicitly) - name: high_value_ordered_sku_recently description: "High-value active customers (LTV > 1000) who ordered a specific SKU in the last N days. Requires 'orders' join." # Assumes lifetime_value dimension exists here, and orders model has product_sku/order_date expr: "customers.lifetime_value > 1000 AND customers.is_active AND orders.product_sku = '{{sku}}' AND orders.order_date >= DATE_SUB(CURRENT_DATE(), INTERVAL {{days}} DAY)" args: - name: sku type: string description: The product SKU. - name: days type: number description: Number of days to look back. default: 30
Assistant
Responses are generated using AI and may contain mistakes.