Buster maintains a context layer for your data stack—a repository that serves as the source of truth for everything agents need to know to do their work.

Why it exists

Data engineering tasks require context that isn’t easy to gather from code alone:
  • The data is often the source of truth, not the code
  • Tooling is siloed (your warehouse doesn’t know about your dashboards, your Airflow jobs don’t know about your dbt models)
  • Critical knowledge lives in people’s heads and never gets documented

What it captures

The context layer aggregates information across your entire stack:
  • Structure: Tables, columns, relationships, models, pipelines, dashboards
  • Lineage: How data flows across systems, upstream and downstream dependencies
  • Business logic: Rules, constraints, and nuances specific to your data (e.g., “exclude data center signups from city-level metrics”)
  • Tribal knowledge: Context that usually lives in someone’s head

How it’s structured

The context layer lives in a Git repository as structured files:
  • YML files for structured metadata (datasets, columns, relationships, lineage, DAGs, etc.)
  • MD files for unstructured context (business logic, nuances, documentation)
This file-based approach means agents can search, read, and update the context using standard tools.
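
To illustrate what "standard tools" can mean here, the following is a minimal sketch of a plain-text search over a context repository. The directory layout, file names, and file contents are made up for the example (they are not Buster's actual schema), and `search_context` and `build_demo_repo` are hypothetical helpers, not part of any Buster API.

```python
from pathlib import Path
import tempfile


def search_context(repo_root: Path, term: str) -> list[tuple[str, int, str]]:
    """Return (relative path, line number, stripped line) for every
    case-insensitive match of `term` in .yml and .md files under repo_root."""
    needle = term.lower()
    hits = []
    for path in sorted(repo_root.rglob("*")):
        if not path.is_file() or path.suffix not in {".yml", ".md"}:
            continue
        text = path.read_text(encoding="utf-8")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line.lower():
                rel = path.relative_to(repo_root).as_posix()
                hits.append((rel, lineno, line.strip()))
    return hits


def build_demo_repo(root: Path) -> None:
    """Create a tiny, made-up context repo (assumed layout, illustration only)."""
    (root / "datasets").mkdir()
    (root / "docs").mkdir()
    (root / "datasets" / "signups.yml").write_text(
        "name: signups\ncolumns:\n  - name: city\n  - name: signup_source\n",
        encoding="utf-8",
    )
    (root / "docs" / "signups.md").write_text(
        "# Signups\n\nExclude data center signups from city-level metrics.\n",
        encoding="utf-8",
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        build_demo_repo(root)
        for rel, lineno, line in search_context(root, "city"):
            print(f"{rel}:{lineno}: {line}")
```

Because the context is ordinary files, the same lookup could be done with `grep` or an editor search; nothing in this sketch depends on a custom query layer.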

How it stays up to date

The context layer is continuously maintained through multiple channels:
  • Initial setup: When you connect your data stack, Buster automatically documents it and builds the initial context
  • Change triggers: PRs, schema changes, failed jobs, and other events trigger agents to update relevant context
  • Agent discoveries: While doing work, agents may discover nuances about your stack and document them
  • Human feedback: When you correct an agent’s work or provide feedback, that feedback is captured so it isn’t lost

How agents use it

Every Buster agent has access to the context layer, regardless of the task. Whether it’s reviewing a PR, triaging a failed job, or handling another task, the agent can reference the context to understand:
  • What it’s working on
  • What’s upstream and downstream
  • What business rules apply
  • What’s been learned from past work
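
As a rough illustration of the "upstream and downstream" question, the sketch below walks a small lineage graph with a breadth-first search. The edge list and asset names are invented for the example, and the `(upstream, downstream)` pair representation is an assumption; the source does not specify how Buster stores or queries lineage.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges as (upstream, downstream) pairs.
# All asset names are made up for illustration.
EDGES = [
    ("raw.signups", "stg_signups"),
    ("stg_signups", "fct_signups_by_city"),
    ("raw.cities", "fct_signups_by_city"),
    ("fct_signups_by_city", "dashboard.growth"),
]


def build_graph(edges, reverse=False):
    """Build an adjacency map; reverse=True points each node at its parents."""
    graph = defaultdict(set)
    for up, down in edges:
        if reverse:
            graph[down].add(up)
        else:
            graph[up].add(down)
    return graph


def reachable(graph, start):
    """Return every node reachable from `start` via breadth-first search."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)  # only matters if the lineage contains a cycle
    return seen


def upstream(edges, node):
    """All direct and transitive dependencies of `node`."""
    return reachable(build_graph(edges, reverse=True), node)


def downstream(edges, node):
    """Everything that directly or transitively depends on `node`."""
    return reachable(build_graph(edges), node)


if __name__ == "__main__":
    print(sorted(upstream(EDGES, "fct_signups_by_city")))
    print(sorted(downstream(EDGES, "stg_signups")))
```

An agent reviewing a change to `stg_signups` could use the downstream set to see which models and dashboards might be affected, and the upstream set to see which sources to check when a job fails.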