Concepts
The concepts below are the building blocks of the Common Agent Specification. Read this page before the resource reference pages — it provides the mental model that makes the field-level details make sense.
Resources
Everything in the specification is defined as a YAML manifest. There are five resource types:
- Agent — pairs a system prompt with a set of capabilities. Defines what an agent can do, what constraints apply, and how it interacts with tools and events.
- Tool — declares one or more outbound actions and the execution backend that backs each one (HTTP, CEL, MCP, etc.). A tool may also declare inbound events — signals from external platforms that the agent can respond to.
- Schedule — triggers an agent automatically on a recurring cron cadence.
- Trigger — triggers an agent automatically when an inbound event matches its conditions. The event-driven counterpart to Schedule.
- Bundle — a portable multi-document YAML file containing any combination of agents, tools, schedules, and triggers. Bundles are the distribution format for sharing and importing configurations.
Agents and tools are the core pair. An agent uses tools; tools don't know about agents. The same tool can be shared across many agents.
Tasks and the LLM Loop
When an agent is invoked, the runtime creates a Task — a stateful, persistent conversation session. A task does not end when the agent responds; it reaches an idle state and waits for further input.
A message is one complete conversational exchange: the input from the caller paired with the agent's reply. Under the hood, a single message may span many turns — individual entries in the LLM's conversation array, typed as input (user), llm (assistant), or capability (tool result). The LLM sees all turns; the caller sees only messages.
Inside a task, the runtime processes each message with a continuous loop:
1. Input arrives: a user message, a scheduled trigger, an inbound event, or a continuation of a previous conversation.
2. The LLM is called with the system prompt and conversation history.
3. The LLM either sends a message and the task goes idle, or decides to invoke an action.
4. If an action is invoked, the runtime executes it and feeds the result back to the LLM as a new turn.
5. Steps 2–4 repeat until the LLM sends a message.
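The loop above can be sketched in a few lines of Python. This is an illustrative sketch, not the runtime's implementation: `Turn`, `run_message`, the scripted `fake_llm`, the decision dictionaries, and the `lookup` action are all hypothetical names invented for the example.

```python
from dataclasses import dataclass

@dataclass
class Turn:
    kind: str        # "input", "llm", or "capability"
    content: object

def run_message(system_prompt, history, user_input, llm, actions):
    """Process one message: loop until the LLM replies instead of invoking an action."""
    history.append(Turn("input", user_input))              # step 1: input arrives
    while True:
        decision = llm(system_prompt, history)              # step 2: call the LLM
        history.append(Turn("llm", decision))
        if decision["type"] == "message":                   # step 3: reply, task goes idle
            return decision["text"]
        action = actions[decision["action"]]
        result = action(**decision["parameters"])           # step 4: execute the action
        history.append(Turn("capability", result))          # result becomes a new turn

def fake_llm(system_prompt, history):
    # Scripted stand-in for a model: invoke the action once, then reply.
    if not any(t.kind == "capability" for t in history):
        return {"type": "invoke", "action": "lookup",
                "parameters": {"ticket_id": "T-1"}}
    return {"type": "message", "text": f"Status: {history[-1].content}"}

history = []
reply = run_message(
    "You are a support agent.",
    history,
    "What is the status of T-1?",
    fake_llm,
    {"lookup": lambda ticket_id: "open"},
)
turn_kinds = [t.kind for t in history]
```

In this run the caller sees a single message (its question paired with `reply`), while `history` holds four turns: one input, two LLM turns, and one capability result.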
Capabilities
A capability is anything an agent can do or respond to during a task. An agent's capabilities come in three forms:
- Actions — outbound functions the LLM can invoke. The specification presents all actions through the same interface: the LLM sees a named, callable function regardless of whether it's backed by HTTP, a CEL expression, an MCP server, or any other runtime.
- Events — inbound signals from external platforms that inject input into a running task. Tools declare both their actions and their events; when an agent lists a tool as a capability it automatically subscribes to all of the tool's events, with no extra configuration required.
- Delegation — another agent exposed as a capability. When invoked, it creates an autonomous child task that runs its own conversation loop and returns its output to the parent. From the LLM's perspective, delegation is indistinguishable from invoking any other action.
All three forms flow through the same middleware and guardrail pipeline. See Events and Tool Runtimes for the full capability surface.
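A minimal sketch of the uniform interface, under the assumption that every capability can be modeled as a named callable. All names here (`get_weather`, `summarizer_agent`, `as_capability`, `invoke`) are hypothetical, events are left out, and the child agent is reduced to a plain function where a real child task would run its own conversation loop.

```python
# Hypothetical stand-ins: none of these names come from the specification.

def get_weather(city):
    """Pretend HTTP-backed action."""
    return {"city": city, "temp_c": 21}

def summarizer_agent(message):
    """Pretend child agent; a real child task would run its own LLM loop."""
    return f"summary: {message}"

def as_capability(agent):
    """Expose an agent as a callable so it is invoked like any other action."""
    def delegate(message):
        return agent(message)
    return delegate

capabilities = {
    "get_weather": get_weather,
    "summarize": as_capability(summarizer_agent),
}

def invoke(name, **parameters):
    # Single entry point: the natural place to apply a shared pipeline.
    return capabilities[name](**parameters)

weather = invoke("get_weather", city="Oslo")
summary = invoke("summarize", message="long report")
```

Because both calls go through `invoke`, the caller cannot tell which capability is a plain action and which is a delegation.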
The Value Pipeline
Parameters are the structured inputs a capability expects at invocation time. How they are provided depends on context:
- For tool actions, parameters are generated by the LLM when it decides to invoke the action. They can also be injected deterministically via a binding, bypassing the LLM entirely.
- For tool events, all root tool parameters are available as parameters.* inside the event's receive.filter CEL expression, so the tool author can use them to scope which events are routed. Parameters with a binding are hidden from the LLM and have their allow list entry sealed, making them reliable for use in filters. require_binding: true is a tool-side validation constraint that ensures the agent must provide a binding.
- For agents, parameters define the structured input the agent accepts from its caller. The message key is a well-known parameter that carries the conversational content and MUST be present on every input, either provided explicitly or resolved from a schema default.
Settings are a separate category: static, operator-configured values declared in a tool manifest — API keys, base URLs, environment-specific configuration. They are never exposed to the LLM and never generated by it. Settings apply only to tools; agents do not have settings.
Bindings are expressions evaluated against trusted task context that supply parameter values without LLM involvement. They are the mechanism that keeps security-critical values — user identities, account references, resource paths — out of model-generated input. See Bindings and Parameter Pipeline.
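The effect of a binding can be sketched as follows. This is an assumption-laden illustration: in the specification bindings are expressions, whereas here they are modeled as Python callables, and the function names, the schema shape, and the `caller.id` context key are all hypothetical.

```python
def llm_visible_parameters(schema, bindings):
    """Parameters that have a binding are hidden from the LLM."""
    return {name: spec for name, spec in schema.items() if name not in bindings}

def resolve_parameters(schema, bindings, llm_args, context):
    """Bound values come from trusted context; the rest come from the LLM."""
    resolved = {}
    for name in schema:
        if name in bindings:
            # Any LLM-supplied value for a bound parameter is ignored.
            resolved[name] = bindings[name](context)
        elif name in llm_args:
            resolved[name] = llm_args[name]
    return resolved

schema = {"user_id": {"type": "string"}, "query": {"type": "string"}}
bindings = {"user_id": lambda ctx: ctx["caller"]["id"]}   # hypothetical context key
context = {"caller": {"id": "u-123"}}

# Even if the model tries to supply user_id, the bound value wins.
llm_args = {"user_id": "u-999", "query": "open tickets"}

visible = llm_visible_parameters(schema, bindings)
resolved = resolve_parameters(schema, bindings, llm_args, context)
```

Here `visible` contains only `query`, and `resolved["user_id"]` is taken from the task context rather than from model output.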
Middleware and Guardrails
On most platforms, a guardrail means running a prompt or response through an LLM-based content filter: a probabilistic check with no access to what happened earlier in the conversation. Common Agents guardrails are fundamentally different: deterministic, logic-driven policy steps that can reference the full task context. They know what the user sent, what the agent has already done, which actions were invoked, and what the results were.
In Common Agents, guardrails are applied at the agent's conversational boundary — validating input from the caller and output back to the caller, across the entire conversation. Middleware is the same mechanism applied to individual capability invocations. Both support the same step types, the same context access, and the same deterministic logic. The only difference is where they are applied.
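To make "deterministic steps with access to task context" concrete, here is a small sketch. It assumes a step can be modeled as a predicate over a context object; `TaskContext`, `require_prior_success`, and `passes` are hypothetical names, and the check is written in Python rather than in the specification's expression language.

```python
class TaskContext:
    """Hypothetical record of what has happened in a task so far."""

    def __init__(self):
        self.count_successful = {}

    def record_success(self, capability):
        current = self.count_successful.get(capability, 0)
        self.count_successful[capability] = current + 1

def require_prior_success(capability):
    """Build a step that passes only if the capability has already succeeded."""
    def step(context):
        return context.count_successful.get(capability, 0) > 0
    return step

def passes(steps, context):
    """A policy passes only when every step passes; no model call is involved."""
    return all(step(context) for step in steps)

# The same step list could guard agent output (guardrail)
# or a single capability invocation (middleware).
policy = [require_prior_success("fetch_ticket")]

context = TaskContext()
before = passes(policy, context)          # nothing fetched yet
context.record_success("fetch_ticket")
after = passes(policy, context)           # a successful fetch is on record
```

The result depends only on recorded task history, so the same context always yields the same decision.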
See Middleware for the full step specification.
Expressions
The specification uses two expression mechanisms with a clear distinction between them.
Some fields support {...} interpolation — curly braces embed a value from a known root into the surrounding string at execution time:
url: "https://api.example.com/users/{parameters.user_id}"
headers:
  Authorization: "Bearer {settings.api_key}"
Expression fields — middleware assertions, binding values, guardrail conditions, transforms, event receive.filter — take a CEL expression directly as the field value, with no surrounding string to interpolate into. The entire field is evaluated as logic:
assert: "context.capabilities.fetch_ticket.count_successful > 0"
The difference is in the field type. Interpolation-capable fields use {...} to embed a reference into a string; expression fields are pure CEL. See CEL Reference.
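The interpolation side of this distinction can be sketched with a small resolver. This is an illustration under assumptions, not the specification's algorithm: the placeholder grammar (dotted identifiers only), the `interpolate` and `lookup` helpers, and the sample values are hypothetical. An expression field, by contrast, would be handed whole to a CEL evaluator, which is not shown here.

```python
import re

# Assumed placeholder shape: {root.path.to.value}
_PLACEHOLDER = re.compile(r"\{([A-Za-z_][\w.]*)\}")

def lookup(roots, path):
    """Walk a dotted path such as 'parameters.user_id' through nested dicts."""
    value = roots
    for part in path.split("."):
        value = value[part]
    return value

def interpolate(template, roots):
    """Replace each {path} with its value; the rest of the string is untouched."""
    return _PLACEHOLDER.sub(lambda m: str(lookup(roots, m.group(1))), template)

roots = {
    "parameters": {"user_id": "42"},
    "settings": {"api_key": "example-key"},
}

url = interpolate("https://api.example.com/users/{parameters.user_id}", roots)
auth = interpolate("Bearer {settings.api_key}", roots)
```

With these sample values, `url` ends in `/users/42` and `auth` becomes `Bearer example-key`; text outside the braces is copied through unchanged.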