Introduction to Relationships

Relationships define how different semantic models connect to each other, effectively specifying join conditions for queries. Defining these connections is essential for enabling analyses that combine data from multiple related models.

By establishing relationships, you empower the semantic layer to understand the logical links between your data entities. This allows metrics, dimensions, and filters to reference fields across different models seamlessly, provided the necessary joins are included in the query.

Here’s the basic structure for defining relationships within a model’s YAML configuration:

# Inside a model definition (e.g., orders.yml)
relationships:
  - name: related_model_name # Required: Name of the model being linked TO (e.g., customers)
    source_col: join_key_in_current_model # Required: Key field in the *current* model (e.g., customer_id)
    ref_col: join_key_in_related_model # Required: Key field in the *related* model (e.g., id)
    description: "Business context" # Required: Clear explanation of the link
    type: LEFT # Optional: Join type (LEFT, INNER, RIGHT, FULL OUTER). Default: LEFT
    cardinality: many-to-one # Optional: Nature of the link (one-to-one, one-to-many, many-to-one, many-to-many)

The relationships key defines a list of connections from the current model to other models.

Defining Relationships

Each relationship specifies the connection details between the current model and another (related) model.

Relationship Fields

FieldRequiredDescription
nameYesThe name of the related semantic model this relationship connects to.
source_colYesThe name of the join key column in the current model.
ref_colYesThe name of the join key column in the related model.
descriptionYesA human-readable description explaining the relationship’s business context. This is mandatory.
typeNoSpecifies the join type. Must be one of: LEFT, INNER, RIGHT, FULL OUTER. Defaults to LEFT if omitted.
cardinalityNoDescribes the numerical link between records. Must be one of: one-to-one, one-to-many, many-to-one, many-to-many. Optional.

Understanding Join Keys

  • source_col: Refers to the column in the current model’s underlying table/view that holds the key value. Often a foreign key.
  • ref_col: Refers to the column in the related model’s underlying table/view that holds the corresponding key value. Often a primary key.

Example: Orders to Customers

Assuming an orders model needs to link to a customers model:

# In orders.yml
name: orders
description: >
  Represents individual customer order transactions. Contains details about each order,
  including identifier, customer linkage, timing, status, and financial value. This is
  an event-based model crucial for sales analysis, revenue tracking, and understanding
  purchase patterns.
dimensions:
  - name: order_id
    description: Unique identifier for the order transaction.
    type: string
  - name: customer_id
    description: Foreign key linking to the 'customers' model.
    type: string
  - name: order_date
    description: Date the order was placed.
    type: date
  # ... other dimensions like status ...
measures:
  - name: amount # Renamed from total_amount
    description: Total monetary value of the order (e.g., in USD).
    type: number
  # ... other measures like item_count ...
relationships:
  - name: customer # Link TO customers model
    source_col: customer_id # Key in orders (current)
    ref_col: id           # Key in customers (related)
    description: Links this order to the specific customer who placed it.
    cardinality: many-to-one
    type: LEFT
# ... other relationships ...
metrics:
  - name: revenue_from_recent_customers
    description: >
      Total revenue from orders placed by customers who signed up in the current calendar year.
      Requires joining with 'customers' model via 'customer' relationship.
    # References 'signup_year' from the related 'customers' model via 'customer' relationship
    # Assumes 'amount' is a measure in this model.
    expr: "SUM(CASE WHEN customer.signup_year = EXTRACT(YEAR FROM CURRENT_DATE()) THEN orders.amount ELSE 0 END)"
# ... other metrics ...
# In customers.yml (Referenced by the metric above)
name: customers
description: >
  Represents individual customer entities. Contains contact information, registration details,
  location, and potentially derived metrics. Used as a dimension table for analyzing orders.
dimensions:
  - name: id
    description: The unique identifier for the customer (primary key).
    type: string
  - name: signup_year
    description: >
      The year the customer registered. Used in the cross-model metric example to analyze
      revenue based on customer tenure.
    type: integer
  # ... other dimensions like name, email, country ...
# ... measures / metrics ...
# ... relationships ...

Relationship Details

Accurately defining join types and cardinality is crucial for correct query generation and results.

Join Types (type)

This field dictates the SQL join behavior when this relationship is used in a query:

  • LEFT (Default): Includes all records from the current model (where the relationship is defined) and matching records from the related model. Fields from the related model will be NULL if no match is found.
  • INNER: Includes only records where a match exists in both models based on the join keys.
  • RIGHT: Includes all records from the related model and matching records from the current model. Fields from the current model will be NULL if no match is found. (Use with caution, often LEFT from the other model is clearer).
  • FULL OUTER: Includes all records from both models. Fields will be NULL where no match exists in the corresponding model.

Cardinality (cardinality)

This describes the expected numerical relationship between records connected by the join keys:

  • one-to-one: Each record in the current model maps to at most one record in the related model, and vice-versa (e.g., users to user_profiles).
  • one-to-many: Each record in the current model can map to multiple records in the related model (e.g., customers to orders).
  • many-to-one: Multiple records in the current model can map to a single record in the related model (e.g., orders to customers). This is the inverse of one-to-many.
  • many-to-many: Records in one model can map to multiple records in the other, and vice-versa (e.g., products to orders). This typically requires a junction model.

Modeling Many-to-Many Relationships

Direct many-to-many links are best handled via an intermediate junction model. Each primary model then defines a one-to-many relationship to the junction model. The junction model itself defines many-to-one relationships back to each primary model.

Example: Products and Orders via order_items Junction

# In products.yml
name: products
description: >
  Information about products offered for sale. Contains product identifiers and
  descriptive attributes like name and category.
dimensions:
  - name: id
    description: The unique identifier (primary key) for the product.
    type: string
  - name: name
    description: The display name of the product.
    type: string
  - name: category
    description: The category the product belongs to (e.g., 'Electronics', 'Apparel').
    type: string
relationships:
  - name: order_items # Link TO the junction model
    source_col: id         # Key in products (current model)
    ref_col: product_id  # Key in order_items (related model)
    description: >
      Links this product to its corresponding line items across all orders via the
      'order_items' junction model. Allows analysis of which orders contain this product.
    cardinality: one-to-many # One product can be in many order items
    type: LEFT
# ... other relationships ...
# In orders.yml
name: orders
description: >
  Represents individual customer order transactions. Contains details about each order,
  including identifier, customer linkage, timing, status, and financial value. This is
  an event-based model crucial for sales analysis, revenue tracking, and understanding
  purchase patterns.
dimensions:
  - name: order_id
    description: >
      The unique identifier for each order transaction. Serves as the primary key for
      this model. Used for looking up specific orders or counting unique orders.
    type: string
  - name: customer_id # Changed from user_id for consistency
    description: >
      Foreign key referencing the 'id' in the 'customers' model. Links the order
      back to the specific customer who placed it. Essential for customer-centric analysis.
    type: string
  - name: order_date # Changed from order_timestamp for consistency
    description: >
      The date the order was placed. Used for time-series analysis, filtering by date ranges
      (e.g., daily, weekly, monthly revenue), and cohort analysis based on order time.
    type: date
    searchable: true # Common to filter/search by date
  - name: status
    description: >
      The current fulfillment or payment status of the order (e.g., 'pending', 'shipped',
      'delivered', 'cancelled'). Used for operational reporting and filtering orders by state.
    type: string
    options: ["pending", "shipped", "delivered", "cancelled"] # Example predefined values
measures:
  - name: amount # Changed from total_amount/amount_col
    description: >
      The total monetary value of the order, typically in a specific currency (e.g., USD).
      This is the primary value used for revenue calculations (SUM, AVG).
    type: number
  # ... other measures like item_count ...
relationships:
  - name: customer # Renamed from 'customers'
    source_col: customer_id # Key in orders (current)
    ref_col: id           # Key in customers (related)
    description: >
      Links this order back to the specific customer who placed it, using the 'customers' model.
      Enables retrieving customer details (e.g., name, email, country) for order analysis.
    cardinality: many-to-one # Many orders belong to one customer
    type: LEFT             # Ensures all orders are kept even if customer lookup fails
# ... other relationships ...
metrics:
  - name: revenue_from_recent_customers
    description: >
      Total revenue from orders placed by customers who signed up in the current calendar year.
      Requires joining with 'customers' model via 'customer' relationship.
    # References 'signup_year' from the related 'customers' model via 'customer' relationship
    # Assumes 'amount' is a measure in this model.
    expr: "SUM(CASE WHEN customer.signup_year = EXTRACT(YEAR FROM CURRENT_DATE()) THEN orders.amount ELSE 0 END)"
# ... other metrics ...
# In order_items.yml (The Junction Model)
name: order_items
description: >
  Junction model linking individual products to specific orders. Each record represents
  a line item within an order, connecting one order to one product and specifying the quantity.
  Essential for resolving the many-to-many relationship between orders and products.
dimensions:
  - name: order_id # Foreign key to orders
    description: Identifier linking this line item back to its parent order in the 'orders' model.
    type: string
  - name: product_id # Foreign key to products
    description: Identifier linking this line item back to the specific product in the 'products' model.
    type: string
measures:
  - name: quantity
    description: The quantity of the specific product included in this line item of the order.
    type: integer
relationships:
  - name: orders # Link back TO orders model
    source_col: order_id # Key in order_items (current model)
    ref_col: order_id    # Key in orders (related model) - Note: Changed ref_col for consistency
    description: Links this order line item back to its parent order record.
    cardinality: many-to-one # Many line items belong to one order
    type: LEFT
  - name: products # Link back TO products model
    source_col: product_id # Key in order_items (current model)
    ref_col: id          # Key in products (related model)
    description: Links this order line item back to the specific product record.
    cardinality: many-to-one # Many line items can refer to the same product
    type: LEFT
# ... other relationships ...

Using Relationships in Queries

Once defined, relationships enable referencing fields (dimensions, measures) from related models within metrics or filters using dot notation: relationship_name.field_name.

Example: Cross-Model Metric Reference

# In orders.yml
name: orders
description: Contains order information.
# ... dimensions (including order_id, customer_id, amount) ...
# ... measures ...
relationships:
  - name: customer # Renamed from 'customers' for consistency
    source_col: customer_id
    ref_col: id
    description: Links this order to the customer who placed it.
    cardinality: many-to-one
    type: LEFT
# ... other relationships ...
metrics:
  - name: revenue_per_customer_signup_year
    # 'amount' is a measure in this 'orders' model.
    # 'customer.signup_year' references 'signup_year' dimension
    # from the related 'customers' model via the 'customer' relationship defined above.
    expr: "SUM(orders.amount) / NULLIF(COUNT(DISTINCT customer.signup_year), 0)"
    description: "Average revenue per unique customer signup year represented in the orders. Requires joining via the 'customer' relationship."
# ... other metrics ...
# In customers.yml (Referenced by the metric above)
name: customers
description: Contains customer information.
dimensions:
  - name: id # Primary key referenced by orders.customer_id
    type: string
  - name: signup_year # Dimension used in the cross-model metric
    type: integer
# ... other dimensions ...
# ... measures ...
# ... relationships ...

Best Practices & Considerations

Accurate Keys

Ensure source_col and ref_col correctly identify the join keys in the respective models. Mismatches lead to failed or incorrect joins.

Correct Cardinality & Type

Define cardinality and type accurately based on the data’s structure and intended join logic. Defaults are convenient but should be verified.

Mandatory Descriptions

Always provide clear, mandatory descriptions explaining the business purpose of the relationship. This is crucial for understanding the model.

Bidirectional Definitions

For clarity and completeness, consider defining relationships in both directions between connected models (e.g., orders -> customers and customers -> orders).

Junction Models for M:N

Use explicit junction models to resolve many-to-many relationships clearly. Avoid attempting direct many-to-many definitions.

Consistency

Use consistent naming conventions for relationships and related fields across your models.

Comprehensive Example

Illustrating bidirectional relationships and referencing related data.

# models/orders.yml
name: orders
description: >
  Represents individual customer order transactions. Contains details about each order,
  including identifier, customer linkage, timing, status, and financial value. This is
  an event-based model crucial for sales analysis, revenue tracking, and understanding
  purchase patterns.
dimensions:
  - name: order_id
    description: >
      The unique identifier for each order transaction. Serves as the primary key for
      this model. Used for looking up specific orders or counting unique orders.
    type: string
  - name: customer_id # Changed from user_id for consistency
    description: >
      Foreign key referencing the 'id' in the 'customers' model. Links the order
      back to the specific customer who placed it. Essential for customer-centric analysis.
    type: string
  - name: order_date # Changed from order_timestamp for consistency
    description: >
      The date the order was placed. Used for time-series analysis, filtering by date ranges
      (e.g., daily, weekly, monthly revenue), and cohort analysis based on order time.
    type: date
    searchable: true # Common to filter/search by date
  - name: status
    description: >
      The current fulfillment or payment status of the order (e.g., 'pending', 'shipped',
      'delivered', 'cancelled'). Used for operational reporting and filtering orders by state.
    type: string
    options: ["pending", "shipped", "delivered", "cancelled"] # Example predefined values
measures:
  - name: amount # Changed from total_amount/amount_col
    description: >
      The total monetary value of the order, typically in a specific currency (e.g., USD).
      This is the primary value used for revenue calculations (SUM, AVG).
    type: number
  # ... other measures like item_count ...
relationships:
  - name: customer # Renamed from 'customers'
    source_col: customer_id # Key in orders (current)
    ref_col: id           # Key in customers (related)
    description: >
      Links this order back to the specific customer who placed it, using the 'customers' model.
      Enables retrieving customer details (e.g., name, email, country) for order analysis.
    cardinality: many-to-one # Many orders belong to one customer
    type: LEFT             # Ensures all orders are kept even if customer lookup fails
# ... other relationships ...
metrics:
  - name: revenue_from_recent_customers
    description: >
      Total revenue from orders placed by customers who signed up in the current calendar year.
      Requires joining with 'customers' model via 'customer' relationship.
    # References 'signup_year' from the related 'customers' model via 'customer' relationship
    # Assumes 'amount' is a measure in this model.
    expr: "SUM(CASE WHEN customer.signup_year = EXTRACT(YEAR FROM CURRENT_DATE()) THEN orders.amount ELSE 0 END)"
# ... other metrics ...
# models/customers.yml
name: customers
description: >
  Represents individual customer entities. Contains contact information, registration details,
  location, and derived metrics. This is a central model for understanding customer attributes
  and behavior, and for linking to their order history.
dimensions:
  - name: id
    description: The unique identifier for a customer (primary key).
    type: string
  - name: name
    description: The full name of the customer.
    type: string
  - name: signup_year
    description: >
      The year the customer first registered. Referenced by metrics in the 'orders' model
      to analyze behavior based on customer tenure.
    type: integer
  # ... other dimensions like email, country ...
# ... measures / metrics ...
relationships:
  - name: orders # Link back TO orders model
    source_col: id           # Key in customers (current model)
    ref_col: customer_id    # Key in orders (related model)
    description: >
      Links this customer to all their associated order records in the 'orders' model.
      Enables analysis of purchasing history and behavior for each customer.
    cardinality: one-to-many # One customer can have many orders
    type: LEFT             # Standard join type when starting analysis from the customer
# ... other relationships ...