version: 2
models:
- name: sales_order_detail
description: |
Individual line items representing products sold within each sales order.
Purpose: Line-item transaction table enabling revenue analysis, product performance tracking, discount effectiveness measurement, and basket composition analysis. Foundation for calculating revenue metrics, product-level profitability, and customer purchasing patterns. Used extensively by metrics models for calculating CLV, average order value, gross profit, and product-specific KPIs.
Contents: One row per product line item on a sales order. Composite key: (salesOrderID, salesOrderDetailID). Scale: ~121K line items across ~31K orders spanning Sept 2022 to July 2025 (date-shifted to align with current date).
Lineage: Direct pass-through from stg_sales_order_detail, which sources from sales.salesorderdetail. Staging layer calculates lineTotal field and applies date shifting to modifiedDate.
Patterns:
- Order simplicity: Most orders contain few line items (avg 3.9 line items per order). Single-line orders are extremely common and represent the dominant purchasing pattern.
- Quantity concentration: 58% of line items are quantity 1, 71% are quantity 1-2. Bulk purchases (qty >10) represent <3% but can reach qty 44.
- Product concentration: Top 10 products (out of 259) account for 20% of line items. Product 870 alone appears in 3.7% of all line items.
- Discount sparsity: 97% of line items have no discount (unitPriceDiscount = 0). When discounts apply, they're typically 2%, 5%, 10%, 15%, or 20%.
- Special offer dominance: 95% use specialOfferID = 1 (likely "No Discount" baseline offer), making non-promotional sales the norm.
- Carrier tracking: 45% of line items have null carrierTrackingNumber, suggesting orders not yet shipped or using ship methods without tracking.
- Price distribution: Highly skewed - median unit price $54.94, but ranges from $1.37 to $3578.27. High-value items (>$2000) appear in ~6% of line items.
- Line total pattern: Right-skewed, roughly log-normal distribution with median $183.94 and mean $989.34. Most line items are modest in value, but the tail extends past $22K for high-quantity purchases of high-priced products.
Usage Guidance:
Foundational fact table for sales analytics. Essential for calculating revenue totals, analyzing product performance, measuring discount impact, and understanding purchasing behavior. Most revenue metrics aggregate lineTotal; product analysis groups by productID; discount analysis filters or segments by unitPriceDiscount or specialOfferID. For customer behavior analysis and average order value, aggregate line items to the order level via salesOrderID first, so that a multi-item order is not counted as several purchases. For product profitability, join to the product table for per-unit cost data, then calculate margin as lineTotal - (unit cost * orderQty); lineTotal already includes quantity, so subtracting a single unit cost overstates margin.
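The order-level rule and the margin calculation can be illustrated with a minimal Python sketch. It assumes rows have already been fetched as (salesOrderID, salesOrderDetailID, lineTotal) tuples; the sample values and the helper names order_totals, average_order_value, and line_margin are hypothetical. It also assumes the product table exposes a per-unit cost (for example a standardCost column), which this file does not document.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (salesOrderID, salesOrderDetailID, lineTotal).
# Illustrative values only, not taken from the real table.
rows = [
    (1001, 1, 50.0),
    (1001, 2, 30.0),
    (1001, 3, 20.0),
    (1002, 4, 400.0),
]


def order_totals(lines):
    """Sum lineTotal per salesOrderID (order-level grain)."""
    totals = defaultdict(float)
    for order_id, _detail_id, line_total in lines:
        totals[order_id] += line_total
    return dict(totals)


def average_order_value(lines):
    """AOV is the mean of order-level totals, not the mean of line totals."""
    return mean(order_totals(lines).values())


def line_margin(line_total, unit_cost, order_qty):
    """Margin for one line: net revenue minus per-unit cost times quantity (assumed per-unit cost)."""
    return line_total - unit_cost * order_qty


naive = mean(lt for _, _, lt in rows)  # averages line items
aov = average_order_value(rows)        # averages orders
```

Here the naive per-line average is 125.0 while the order-level AOV is 250.0, which is why the aggregation order matters.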
Critical Context:
- lineTotal is calculated in staging as (unitPrice * orderQty * (1 - unitPriceDiscount)) and represents net revenue after discounts but before taxes/freight. This is the primary revenue metric field.
- All dates shifted forward using shift_date() macro to make dataset feel current (max date aligns with March 28, 2025). Historical patterns span ~3 years.
- Null carrierTrackingNumber doesn't indicate data quality issue - reflects legitimate business states (orders not shipped yet, certain ship methods, or in-store pickup).
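A sketch of the lineTotal formula in Python, using Decimal to avoid binary floating-point error on currency values. The function name is hypothetical, and whether staging rounds the result is not documented here, so the sketch applies no rounding.

```python
from decimal import Decimal


def line_total(unit_price: Decimal, order_qty: int, unit_price_discount: Decimal) -> Decimal:
    """Net line revenue: unitPrice * orderQty * (1 - unitPriceDiscount).

    Excludes taxes and freight, matching the description of lineTotal.
    """
    return unit_price * order_qty * (Decimal("1") - unit_price_discount)


# Three units at $54.94 with a 10% discount.
discounted = line_total(Decimal("54.94"), 3, Decimal("0.10"))
# One unit with no discount (the common case, unitPriceDiscount = 0).
undiscounted = line_total(Decimal("3578.27"), 1, Decimal("0"))
```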
columns:
- name: salesOrderDetailID
description: |
Primary key uniquely identifying each line item across the entire table. Represents sequential line item numbering across all orders.
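Range: values start at 1 and are unique across ~121K rows. The recorded upper bound of 100827 is below the row count, which cannot hold for unique positive integer IDs starting at 1, so re-profile the maximum before relying on ID ranges. Despite the name suggesting "detail within order", this ID is globally unique, not just unique within an order: it conceptually represents the line item number but is implemented as a table-wide identifier.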
Use as the primary key. salesOrderDetailID alone guarantees uniqueness; pair it with salesOrderID as a composite business key when that aids readability. No nulls, no duplicates. Sequential but not gapless: skipped numbers are expected and may stem from order cancellations, returns, or system behavior.
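The key properties above (no nulls, no duplicates, gaps allowed) can be checked with a small profiling sketch. It assumes the IDs are available as a Python sequence; the function name key_profile is hypothetical.

```python
from typing import Optional, Sequence, Tuple


def key_profile(detail_ids: Sequence[Optional[int]]) -> Tuple[bool, bool, int]:
    """Profile salesOrderDetailID values.

    Returns (has_duplicates, has_nulls, gap_count), where gap_count is the
    number of integers between min and max that never appear.
    """
    non_null = [i for i in detail_ids if i is not None]
    distinct = set(non_null)
    has_nulls = len(non_null) != len(detail_ids)
    has_duplicates = len(distinct) != len(non_null)
    gap_count = (max(distinct) - min(distinct) + 1 - len(distinct)) if distinct else 0
    return has_duplicates, has_nulls, gap_count
```

For unique IDs starting at 1, max(distinct) can never be smaller than len(distinct), so the same profile also exposes an inconsistent range figure.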
data_type: BIGINT
- name: carrierTrackingNumber
description: |
Shipment tracking identifier assigned by carrier for this line item. Enables shipment tracking and delivery confirmation.
Format: three hyphen-separated groups of 4, 4, and 2 alphanumeric characters (e.g., "52CC-460F-B3"). ~2K distinct tracking numbers across line items. 45% null, which may indicate orders not yet shipped, ship methods without tracking, in-store pickup, or bundled shipments where tracking applies at the order level rather than the line level.
Use null vs non-null to segment shipped vs unshipped items. However, interpret carefully - null doesn't definitively mean "not shipped" as some fulfillment methods legitimately don't generate line-level tracking. For true shipment analysis, prefer sales_order_header.shipDate which provides order-level shipping status. Tracking numbers aren't strictly unique - multiple line items from same order may share tracking when shipped together.
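A minimal segmentation sketch in Python. The regular expression is an assumption inferred from the single documented example, and the segment labels and function names are hypothetical; null is labeled "no_line_tracking" rather than "unshipped" because null does not definitively mean an item was not shipped.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Assumed pattern: groups of 4, 4 and 2 uppercase letters/digits separated by
# hyphens, inferred from the documented example "52CC-460F-B3".
TRACKING_PATTERN = re.compile(r"[0-9A-Z]{4}-[0-9A-Z]{4}-[0-9A-Z]{2}")


def tracking_segment(tracking_number: Optional[str]) -> str:
    """Classify one carrierTrackingNumber value."""
    if tracking_number is None:
        return "no_line_tracking"
    if TRACKING_PATTERN.fullmatch(tracking_number):
        return "tracked"
    return "unexpected_format"


def segment_counts(values: Iterable[Optional[str]]) -> Counter:
    """Count line items per tracking segment."""
    return Counter(tracking_segment(v) for v in values)
```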
data_type: VARCHAR
- name: orderQty
description: |
Quantity of product units ordered on this line item. Drives revenue via multiplication with unitPrice.
Highly concentrated at low quantities: 58% are qty 1, 71% are qty 1-2, 89% are qty 1-4. Long tail extends to qty 44, but quantities above 10 represent <3% of line items. Mean: 2.4, median: 1, heavily right-skewed distribution.
Represents the customer's purchase quantity. Multiply by unitPrice for the pre-discount subtotal, or use the calculated lineTotal field, which accounts for both quantity and discount. High quantities (>20) almost always involve accessories, components, or clothing rather than bikes (bikes rarely exceed qty 5). For inventory and fulfillment analysis, this is the number of units to pick and ship. For revenue analysis, quantity scales the unit economics (price and discount) into the total line value.
Watch out for: Outlier quantities can distort averages. When analyzing "typical" purchase behavior, use the median or restrict to qty ≤10. For total units sold, use SUM(orderQty) at the appropriate grain (e.g., by product or by period). No nulls; quantity is always specified.
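The effect of a single bulk outlier on the mean can be shown with a short Python sketch. The sample quantities and the helper names are hypothetical, and the cap of 10 mirrors the qty ≤10 guidance above.

```python
from statistics import mean, median

# Hypothetical orderQty values with one bulk outlier; not real table data.
quantities = [1, 1, 1, 2, 1, 3, 1, 2, 44]


def typical_quantity_stats(qtys, cap=10):
    """Return (mean, median, capped_mean); capped_mean ignores qty > cap."""
    capped = [q for q in qtys if q <= cap]
    return mean(qtys), median(qtys), mean(capped)


def total_units(qtys):
    """Total units sold is a plain sum; outliers are legitimate volume here."""
    return sum(qtys)


raw_mean, med, capped_mean = typical_quantity_stats(quantities)
```

The single qty-44 line pulls the raw mean above 6 while the median stays at 1 and the capped mean is 1.5, whereas the unit total keeps the outlier because those units were actually sold.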
data_type: BIGINT
options:
- value: 1
description: Single unit - dominant pattern; ~58% of line items
- value: 2
description: Two units; ~13% of line items
- value: 3
description: Three units; ~9% of line items