Why it exists
Data engineering tasks require context that isn’t easy to gather from code alone:
- The data is often the source of truth, not the code
- Tooling is siloed (your warehouse doesn’t know about your dashboards, your Airflow jobs don’t know about your dbt models)
- Critical knowledge lives in people’s heads and never gets documented
What it captures
The context layer aggregates information across your entire stack:
- Structure: Tables, columns, relationships, models, pipelines, dashboards
- Lineage: How data flows across systems, upstream and downstream dependencies
- Business logic: Rules, constraints, and nuances specific to your data (e.g., “exclude data center signups from city-level metrics”)
- Tribal knowledge: Context that usually lives in someone’s head
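
As a concrete illustration, the business-logic example above might end up captured as a short note like the following. This is a hypothetical sketch; the file name, model name, and wording are illustrative assumptions, not Buster’s actual output:

```md
<!-- metrics/city_signups.md (hypothetical path) -->
# City-level signup metrics

- Exclude data center signups: traffic from known data center IP ranges
  inflates city-level counts and should be filtered out.
- Source of this rule: analytics team convention, applied in the
  `stg_signups` staging model (hypothetical), not in the raw tables.
```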
How it’s structured
The context layer lives in a Git repository as structured files:
- YML files for structured metadata (datasets, columns, relationships, lineage, DAGs, etc.)
- MD files for unstructured context (business logic, nuances, documentation)
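
A minimal sketch of what one of the YML files might contain, assuming a hypothetical schema (the paths, key names, and table names here are illustrative, not Buster’s actual format):

```yml
# datasets/signups.yml (hypothetical path and schema)
dataset: analytics.signups
description: One row per user signup event
columns:
  - name: signup_id
    type: string
    description: Unique identifier for the signup
  - name: city
    type: string
    description: City resolved from the signup IP address
lineage:
  upstream:
    - raw.app_events          # source table the dataset is built from
  downstream:
    - dashboards.city_growth  # dashboard that reads this dataset
notes:
  - See metrics/city_signups.md for business rules (e.g., data center exclusions)
```

Keeping both formats in one Git repository means every change to the context is versioned, reviewable, and attributable, just like code.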
How it stays up to date
The context layer is continuously maintained through multiple channels:
- Initial setup: When you connect your data stack, Buster automatically documents it and builds the initial context
- Change triggers: PRs, schema changes, failed jobs, and other events trigger agents to update relevant context
- Agent discoveries: While doing work, agents may discover nuances about your stack and document them
- Human feedback: When you correct an agent’s work or provide feedback, that gets captured so it’s not lost
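
For example, a detected schema change might trigger an agent to patch the relevant YML file. A hypothetical sketch of such an update as it could appear in the Git history, using the assumed file from the earlier sketch:

```diff
 # datasets/signups.yml
 dataset: analytics.signups
 columns:
   - name: signup_id
     type: string
+  - name: referrer
+    type: string
+    description: Referrer URL captured at signup; column detected by a schema-change trigger
```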
How agents use it
Every Buster agent has access to the context layer, regardless of the task. Whether it’s reviewing a PR or triaging a failed job, the agent can reference that context to understand:
- What it’s working on
- What’s upstream and downstream
- What business rules apply
- What’s been learned from past work