<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Engineering Notes]]></title><description><![CDATA[Engineering notes on multi-agent coordination, AI infrastructure, and organizational design for agent systems]]></description><link>https://blog.adrq.dev</link><image><url>https://cdn.hashnode.com/uploads/logos/69b44e3d6e27dd07d92bbf57/05f9957d-bdd0-4210-b08f-c5a15e7dfd1b.png</url><title>Engineering Notes</title><link>https://blog.adrq.dev</link></image><generator>RSS for Node</generator><lastBuildDate>Fri, 10 Apr 2026 02:20:44 GMT</lastBuildDate><atom:link href="https://blog.adrq.dev/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Your AI Agents Need an Org Chart]]></title><description><![CDATA[You start out with one agent. As you embrace the productivity boost you tolerate the small failures, after all, the working label has a nice shimmer effect on your terminal, and you start delegating m]]></description><link>https://blog.adrq.dev/your-ai-agents-need-an-org-chart</link><guid isPermaLink="true">https://blog.adrq.dev/your-ai-agents-need-an-org-chart</guid><category><![CDATA[ai agents]]></category><category><![CDATA[multi-agent systems]]></category><category><![CDATA[claude-code]]></category><category><![CDATA[distributed systems]]></category><category><![CDATA[codex]]></category><dc:creator><![CDATA[Adrian Quiroga]]></dc:creator><pubDate>Thu, 09 Apr 2026 14:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/69b44e3d6e27dd07d92bbf57/bdd08222-ae06-4df0-a2aa-787772cbf5cc.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>You start out with one agent. 
As you embrace the productivity boost, you tolerate the small failures (after all, the <em>working</em> label has a nice shimmer effect on your terminal) and start delegating more and more. Your agents start spawning subagents, but you can't sit still.</p>
<p>You want MORE AGENTS. So you start divvying up your backlog, proliferating worktrees, and suddenly you're juggling fourteen terminals and acting like a human clipboard while slowly losing your mind. What if you could <em>orchestrate</em> your agents?</p>
<p>The industry is in a mad rush to solve this, coming at it from wildly different angles. Turns out orchestration needs two things: agents that can delegate work down (and keep going deeper), and agents that can talk to each other.</p>
<p><a href="https://code.claude.com/docs/en/agent-teams">Claude Code's Agent Teams</a> got the communication part right, any agent can message any other. But delegation is capped at one level. <a href="https://developers.openai.com/codex/multi-agent/">Codex</a> went the other direction with configurable depth, agents all the way down, but locked communication to parent-child only. Siblings are invisible to each other, everything routes through the boss. Perhaps the most sophisticated agent orchestrator, <a href="https://github.com/steveyegge/gastown">Gas Town</a>, ships with free messaging and a deep role hierarchy — nearly 400,000 lines of code worth of orchestration. But the agent that delegates the work doesn't own what it delegated. It can't terminate a subtree, can't choose how to recover from a failure. Crash recovery, termination, completion are all handled by a separate monitoring stack.</p>
<p>In a company, when your direct report's project goes sideways, you handle it. You don't wait for some monitoring department to notice and restart them. You decide: reassign, retry, or cut your losses. An org chart encodes who is responsible for what, not who can talk to whom.</p>
<p>I've been building an orchestrator that separates the two.</p>
<h2>What happens without authority</h2>
<p>The research on multi-agent coordination is starting to pile up, and the failure modes keep pointing in the same direction: authority gaps.</p>
<p>A benchmark called <a href="https://arxiv.org/abs/2601.13295">CooperBench</a> tested what happens when peer coding agents work together as equals: no manager, no hierarchy, just a shared communication channel. Success rates were 30% lower on average than agents working alone. More agents made it worse, monotonically. Two agents succeeded 68.6% of the time. Three agents: 46.5%. Four agents: 30.0%. The interesting part is that communication wasn't the bottleneck. Agents used the channel extensively, burning up to 20% of their action budget on messages, and it actually helped with merge conflicts (29.4% conflict rate vs. 51.5% without). But they couldn't converge on what to build. The paper sorts the gaps into three categories: agents acknowledge each other's plans and then proceed as if nothing was said (expectation), agents claim work as done when it isn't (commitment), and agents fail to reach a shared answer at all (communication). The shared root cause is that no agent had the standing to decide what gets built, verify what got done, or call a halt when things went sideways. These were teams with Slack but no manager.</p>
<p>An <a href="https://arxiv.org/abs/2503.13657">analysis of over 1,600 execution traces</a> across seven multi-agent frameworks found that nearly one in five failures are termination-related: agents that don't know when to stop (12.4%), or stop too early (6.2%). In org chart terms: nobody to say "this project is done" or "this project is cancelled."</p>
<p>When these gaps compound, the numbers get ugly. A <a href="https://arxiv.org/abs/2512.08296">Google Research study</a> found that independent multi-agent systems amplify errors 17.2x compared to single-agent baselines. Centralized coordination, where one agent has authority to intervene and redirect, reduces that to 4.4x. (A caveat: every multi-agent variant degraded sequential reasoning tasks by 39-70%. This advantage is specific to parallel workstreams where decomposition and oversight matter most.)</p>
<p>You can see the gap in production too. Codex CLI has documented <a href="https://github.com/openai/codex/issues/12491">zombie processes</a>: 1,319 orphaned agents and 37GB of leaked memory, spawned with no authority to reclaim them. <a href="https://github.com/langchain-ai/deepagents/issues/694">Deep Agents</a> shipped async subagents with a cascade cancellation bug where one failure cancels all parallel siblings, an example of authority flowing the wrong direction. These problems are showing up throughout the industry.</p>
<h2>Intent flows down, outcomes flow up</h2>
<p>Telecom engineers hit this problem decades before AI agents existed. When you're running a telephone switching system with thousands of concurrent processes, some of them will crash at odd hours with no one around to fix it. The solution, formalized in Erlang's <a href="https://erlang.org/download/armstrong_thesis_2003.pdf">supervision trees</a>, was a structural principle: if process A creates process B, then A is responsible for B. Not as a design preference, as an architectural requirement. That responsibility relationship, replicated at every level, produces a tree. The core rule is the same one every org chart encodes: the entity that creates is responsible for what it created. OpenAI's <a href="https://github.com/openai/symphony">Symphony</a> inherited this directly, built on Elixir, which runs on the same runtime as Erlang.</p>
<p>In a <a href="https://blog.adrq.dev/multi-agent-coordination-primitives">previous post</a> I argued that <code>create</code> and <code>destroy</code> need infrastructure. The next question is what that infrastructure produces when you go deep — not one level of delegation, but three, five, ten.</p>
<p>A useful way to think about this is information flow. In any hierarchical organization the <em>intent</em> tends to flow down and the <em>outcomes</em> flow up.</p>
<p>When a parent delegates, the child inherits the goal. When a parent terminates, that intent cascades through the entire subtree. Delegation and termination are both downward flows: one says "do this," the other says "stop." At the same time, outcomes flow up. When a child finishes or fails, the result surfaces to the parent. The parent absorbs it — retry, redelegate, or move on. A grandchild failure is the child's problem unless the child fails too. Each level of the tree filters the complexity below it, which is what makes deep delegation viable: whoever is at the root doesn't need to monitor every leaf. They manage one agent. That agent manages its own.</p>
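<p>The downward cascade can be sketched as a tiny ownership table, where releasing a session reclaims its entire subtree. This is an illustrative toy, not any framework's implementation; the <code>AuthorityTree</code> type and its methods are invented for the example:</p>

```rust
use std::collections::HashMap;

// Toy authority tree: each session records its parent, so termination
// can cascade down the delegation chain.
#[derive(Default)]
struct AuthorityTree {
    parent: HashMap<u32, u32>, // child session id -> parent session id
}

impl AuthorityTree {
    // Delegation: record that `parent_id` now owns `child_id`.
    fn delegate(&mut self, parent_id: u32, child_id: u32) {
        self.parent.insert(child_id, parent_id);
    }

    // Termination: releasing a session removes it and every descendant.
    // Returns the reclaimed session ids.
    fn release(&mut self, root: u32) -> Vec<u32> {
        let mut doomed = vec![root];
        let mut i = 0;
        while i < doomed.len() {
            let current = doomed[i];
            let children: Vec<u32> = self
                .parent
                .iter()
                .filter(|&(_, &p)| p == current)
                .map(|(&c, _)| c)
                .collect();
            doomed.extend(children);
            i += 1;
        }
        for id in &doomed {
            self.parent.remove(id);
        }
        doomed
    }
}
```

<p>The upward half is the mirror image: a failure in the subtree surfaces only to the immediate parent, which decides whether anyone above it ever hears about it.</p>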
<p>One important caveat is that while this authority tree should constrain who can terminate, recover, and verify lifecycle state, it should not constrain who can talk. Any agent should be able to message any other, like Slack in a company. You don't need your manager's permission to DM a colleague in another department. <a href="https://block.xyz/inside/from-hierarchy-to-intelligence">Block just articulated the same separation</a> for their entire company: AI handles information routing, authority relationships persist.</p>
<p>Where are the current frameworks on this? Closer than you'd think. The spawn is there. Crash notifications exist. But cascade, recovery authority, and lifecycle reporting aren't in the delegation chain yet. Any framework with parent-child spawning is one architectural step from a full authority model. The gap is smaller than it looks.</p>
<h2>What this looks like in practice</h2>
<p>I've been building this separation into an orchestrator called <a href="https://github.com/adrq/agentbeacon">AgentBeacon</a>. The coordination surface an agent sees is two MCP tools:</p>
<pre><code class="language-rust">fn delegate_schema() -&gt; JsonValue {
    json!({
        "name": "delegate",
        "title": "Delegate",
        "description": "Assign work to a child agent. Returns immediately with a session_id.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "agent":  { "type": "string", "description": "Name of the agent to delegate to" },
                "prompt": { "type": "string", "description": "Task description for the child agent" },
                "cwd":    { "type": "string", "description": "Working directory for child (defaults to parent's cwd)" }
            },
            "required": ["agent", "prompt"]
        }
    })
}

fn release_schema() -&gt; JsonValue {
    json!({
        "name": "release",
        "title": "Release",
        "description": "Terminate a child session and free its resources. Works in any non-terminal state (including while the child is working). Also terminates any descendants.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "session_id": { "type": "string", "description": "The session ID of the child to release (returned by delegate)" }
            },
            "required": ["session_id"]
        }
    })
}
</code></pre>
<p><em>Source: <a href="https://github.com/adrq/agentbeacon/blob/8c4fb4a01e3ccd00de13db86ab6986b868c628e7/scheduler/src/api/mcp_tools.rs#L530-L554">delegate_schema</a>, <a href="https://github.com/adrq/agentbeacon/blob/8c4fb4a01e3ccd00de13db86ab6986b868c628e7/scheduler/src/api/mcp_tools.rs#L494-L510">release_schema</a></em></p>
<p>That's the whole coordination API. Everything else the infrastructure handles silently: cascade termination follows the delegation chain, crash recovery retries with a configurable budget or fails upward and notifies the parent, and when a child is terminated the system reports whether the exit was clean or interrupted. The agent delegates and releases. The system handles the rest.</p>
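<p>The "retries with a configurable budget or fails upward" behavior is just a supervision loop on the parent's side of the edge. A hedged sketch, with names and types invented for illustration rather than taken from AgentBeacon's internals:</p>

```rust
// Outcome of one child run, and how the supervisor resolves it.
enum Outcome {
    Done(String),
    Crashed,
}

enum Resolution {
    Completed(String),
    FailedUpward { attempts: u32 },
}

// Retry a crashed child against a budget; past the budget, the failure
// surfaces to the parent instead of being absorbed.
fn supervise<F>(mut run_child: F, retry_budget: u32) -> Resolution
where
    F: FnMut() -> Outcome,
{
    let mut attempts = 0;
    loop {
        attempts += 1;
        match run_child() {
            Outcome::Done(result) => return Resolution::Completed(result),
            Outcome::Crashed if attempts <= retry_budget => continue, // silent retry
            Outcome::Crashed => return Resolution::FailedUpward { attempts },
        }
    }
}
```

<p>The point of the shape: retries are invisible to the parent, and only budget exhaustion crosses the edge upward.</p>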
<p><img src="https://cdn.hashnode.com/uploads/covers/69b44e3d6e27dd07d92bbf57/536ecb58-81a9-459c-8653-55737168bc71.png" alt="AgentBeacon dashboard showing delegate and release MCP tools in use" /></p>
<p>I run my own delegation workflows on it, and <a href="https://github.com/adrq/agentbeacon">AgentBeacon is open source</a> if you want to do the same. It's early — expect the rough edges that come with that — but the structural claim is testable today.</p>
<h2>The interface boundary</h2>
<p>This ownership structure changes more than reliability. It changes how humans interact with agent teams.</p>
<p>If each parent owns its children and absorbs their complexity, the human only needs to talk to the root. Not every agent, not every terminal. No more acting as a human clipboard between seven sessions. The tree becomes the interface boundary between human judgment and agent execution.</p>
<p>But that boundary creates a new problem. When agents can reliably execute deep hierarchies of work, the failure mode shifts. The bottleneck isn't crashes or zombies anymore, it's agents <a href="https://x.com/thdxr/status/2037158294792319164">working on the wrong things</a>. Without a structured way for agents to surface decisions upward, more agent capacity just means more wrong work, faster.</p>
<p>An org chart tells you who is responsible for whom. It doesn't tell you when to pick up the phone. That's the next question: what happens when you replace the chat window with a structured decision queue.</p>
]]></content:encoded></item><item><title><![CDATA[Multi-Agent Coordination Primitives]]></title><description><![CDATA[How many coordination primitives does your multi-agent framework need? The field is converging on a shared goal of effective multi-agent coordination but seems unsure about the mechanisms to get there]]></description><link>https://blog.adrq.dev/multi-agent-coordination-primitives</link><guid isPermaLink="true">https://blog.adrq.dev/multi-agent-coordination-primitives</guid><dc:creator><![CDATA[Adrian Quiroga]]></dc:creator><pubDate>Thu, 19 Mar 2026 14:30:00 GMT</pubDate><content:encoded><![CDATA[<p>How many coordination primitives does your multi-agent framework need? The field is converging on a shared goal of <em>effective multi-agent coordination</em> but seems unsure about the <em>mechanisms to get there</em>.</p>
<p>There is a pervasive sense of re-invention and confusion between scaffolding tools and real coordination primitives. The coordination surface of current AI agent frameworks ranges from a handful of tools to over a dozen operations, without clear agreement on the fundamental operations that multi-agent coordination systems need. Where do you draw the line?</p>
<p>See for example:</p>
<ul>
<li><strong><a href="https://code.claude.com/docs/en/agent-teams">Claude Code Agent Teams</a></strong> (Anthropic, experimental): <a href="https://gist.github.com/kieranklaassen/d2b35569be2c7f1412c64861a219d51f">13+ operations</a> — <code>spawnTeam</code>, <code>cleanup</code>, <code>write</code>, <code>broadcast</code>, <code>approvePlan</code>, <code>rejectPlan</code>, <code>requestShutdown</code>, <code>approveShutdown</code>, task management (<code>TaskCreate</code>, <code>TaskUpdate</code>, <code>TaskList</code>, <code>TaskGet</code>), shared task boards, file-based mailboxes, dependency tracking, plan approval workflows.</li>
<li><strong><a href="https://developers.openai.com/codex/multi-agent/">Codex CLI</a></strong> (OpenAI, experimental): <code>spawn_agent</code>, <code>send_input</code>, <code>wait_agent</code>, <code>close_agent</code>, <code>resume_agent</code>, <code>spawn_agents_on_csv</code> (hierarchical parent-child, configurable depth, no peer messaging).</li>
<li><strong><a href="https://github.com/steveyegge/gastown">Gas Town</a></strong> (by Steve Yegge): role-based hierarchy (Mayor, Polecats, Refinery, Witness, Deacon, Crew), Beads (Git-backed work tracking), Hooks (work queues), GUPP, Sweeps, Agent Mail integration.</li>
<li><strong><a href="https://docs.crewai.com/en/concepts/collaboration">CrewAI</a></strong>: delegation tools (Delegate work to coworker, Ask question to coworker), hierarchical process with manager agent, sequential process, Flows with <code>@router</code> conditional routing, role/goal/backstory agent definitions.</li>
<li><strong><a href="https://docs.langchain.com/oss/python/langgraph/overview">LangGraph</a></strong>: edges, conditional routing via <code>add_conditional_edges()</code>, state with reducers, checkpointing, <code>Send</code> for dynamic fan-out, <code>Command</code> for programmatic control.</li>
<li><strong><a href="https://google.github.io/adk-docs/agents/multi-agents/">Google ADK</a></strong>: <code>SequentialAgent</code>, <code>ParallelAgent</code>, <code>LoopAgent</code> (workflow agents), <code>AgentTool</code>, <code>transfer_to_agent()</code>, shared session state.</li>
</ul>
<hr />

<p><strong>TLDR</strong>: You only need two coordination primitives. The rest is either communication (let the model handle it) or human interface (a separate concern).</p>
<h2>Computer Science meets Organization Theory</h2>
<p>Both computer science and organizational design theory have studied coordination for decades — different vocabularies, different concerns, but overlapping conclusions. The recurring patterns across both traditions converge on the following 11 primitives.</p>
<table>
<thead>
<tr>
<th>Primitive</th>
<th>What it is</th>
</tr>
</thead>
<tbody><tr>
<td><strong>TELL</strong></td>
<td>Assert information to another agent<br /><em><a href="https://en.wikipedia.org/wiki/Actor_model#Fundamental_concepts">Actor model</a> send; <a href="https://en.wikipedia.org/wiki/Henry_Mintzberg">Mintzberg</a> informal communication</em><br />→ Agent Teams <code>write</code>, <code>broadcast</code>; Codex <code>send_input</code>; LangGraph shared state with <code>add_messages</code> reducer</td>
</tr>
<tr>
<td><strong>ASK</strong></td>
<td>Request information or action<br /><em><a href="https://en.wikipedia.org/wiki/Foundation_for_Intelligent_Physical_Agents">FIPA</a> <code>request</code>; <a href="https://en.wikipedia.org/wiki/Communicating_sequential_processes">CSP</a> input</em><br />→ CrewAI Ask question to coworker; ADK <code>AgentTool</code> (wraps agent as callable tool)</td>
</tr>
<tr>
<td><strong>CREATE</strong></td>
<td>Spawn a child, establish authority<br /><em><a href="https://en.wikipedia.org/wiki/Actor_model#Fundamental_concepts">Actor model</a> create; Unix <a href="https://en.wikipedia.org/wiki/Fork_(system_call)"><code>fork()</code></a></em><br />→ Agent Teams <code>spawnTeam</code>; Codex <code>spawn_agent</code>; Gas Town Mayor dispatches work</td>
</tr>
<tr>
<td><strong>DESTROY</strong></td>
<td>Terminate a child and subtree<br /><em><a href="https://en.wikipedia.org/wiki/Erlang_(programming_language)">Erlang supervision trees</a> <code>terminate_child</code>; Unix <a href="https://en.wikipedia.org/wiki/Kill_(command)"><code>kill()</code></a></em><br />→ Agent Teams <code>requestShutdown</code>; Codex <code>close_agent</code></td>
</tr>
<tr>
<td><strong>CLAIM</strong></td>
<td>Atomically acquire exclusive access<br /><em><a href="https://en.wikipedia.org/wiki/Linda_(coordination_language)">Linda</a> <code>in</code></em><br />→ Agent Teams task claiming (file-locked). Rare in practice — see below.</td>
</tr>
<tr>
<td><strong>SYNC</strong></td>
<td>Block until agents converge<br /><em><a href="https://en.wikipedia.org/wiki/Communicating_sequential_processes">CSP</a> parallel composition</em><br />→ Codex <code>wait_agent</code>; LangGraph <code>Send</code> fan-out + superstep convergence</td>
</tr>
<tr>
<td><strong>ESCALATE</strong></td>
<td>Transfer a decision upward<br /><em><a href="https://en.wikipedia.org/wiki/Management_by_exception">Management by exception</a>; <a href="https://en.wikipedia.org/wiki/Command_and_control">military C2</a></em><br />→ Agent Teams plan approval requests (teammate submits to lead); LangGraph <code>interrupt()</code> (pauses execution, surfaces to human/parent)</td>
</tr>
<tr>
<td><strong>NEGOTIATE</strong></td>
<td>Iterative exchange to reach agreement<br /><em><a href="https://en.wikipedia.org/wiki/Contract_Net_Protocol">Contract Net Protocol</a> bidding; org theory lateral negotiation</em><br />→ No mainstream framework implements negotiation in the Contract Net sense</td>
</tr>
<tr>
<td><strong>OBSERVE</strong></td>
<td>Monitor for state changes<br /><em><a href="https://en.wikipedia.org/wiki/Linda_(coordination_language)">Linda</a> <code>rd</code>; <a href="https://en.wikipedia.org/wiki/Publish%E2%80%93subscribe_pattern">pub-sub pattern</a></em><br />→ Agent Teams shared task boards; ADK shared session state (read-on-demand); LangGraph state channels (nodes read state written by others)</td>
</tr>
<tr>
<td><strong>SCOPE</strong></td>
<td>Define a communication boundary<br /><em><a href="https://en.wikipedia.org/wiki/Communicating_sequential_processes">CSP</a> hiding; <a href="https://en.wikipedia.org/wiki/Multi-agent_system">MOISE+</a> structural dimension</em><br />→ Agent Teams file-based mailboxes; Gas Town Rigs; LangGraph subgraphs (own state schema); CrewAI crew boundaries (agents scoped to a crew)</td>
</tr>
<tr>
<td><strong>REPORT</strong></td>
<td>Send aggregated status upward<br /><em><a href="https://en.wikipedia.org/wiki/Contract_Net_Protocol">Contract Net Protocol</a> report phase</em><br />→ Agent Teams task status updates; Codex parent-child result return</td>
</tr>
</tbody></table>
<p>This list probably isn't perfect. Some of these may collapse into each other, others may be missing. But it gives us something to work with. The real engineering question is: how many of these actually need to be built out as infrastructure?</p>
<p>Taking inspiration from Richard Sutton's <a href="http://www.incompleteideas.net/IncIdeas/BitterLesson.html">"bitter lesson"</a>, which observes that hand-engineered approaches in AI tend to lose to general methods as computation scales, we can apply a similar lens to agent coordination infrastructure. Some teams are already finding that removing tools sometimes improves performance, not degrades it. We could take this to the extreme and say "no infrastructure is needed", "swarms of autonomous agents will solve everything", but there is no historical precedent for this ever working at scale. I'm talking about undirected, non-hierarchical swarms of entities, building a complex system, and maintaining it over time. Even ant colonies, the canonical example of leaderless coordination, only produce repetitive/stereotypical output, honed by millions of years of evolution. Building complex systems requires authority structures: Linux has a BDFL, Wikipedia has admins, you (probably) have a boss.</p>
<p>I will explore this in more detail in a future post, but to answer the question from earlier we can use a simple heuristic to categorize our coordination primitives. Each primitive above can be categorized as one of:</p>
<ul>
<li>requires infrastructure enforcement</li>
<li>the model can handle it through conversation</li>
</ul>
<p>Out of all the primitives above, only two require external enforcement in an organizational setting: <code>CREATE</code> and <code>DESTROY</code>. The rest can all be reduced to communication when you really stop to think about it. If an agent sends a bad message, it can just correct it in the next one. Two agents can negotiate a contract using natural language (whether they adhere to the contract is a separate story, but that applies to all systems where autonomous entities interact). Even synchronization and exclusive access (<code>SYNC</code>, <code>CLAIM</code>), which sound like they might need formal mechanisms, are things that high-functioning teams handle through conversation every day: "I'm working on the auth module, don't touch it" or "let me know when you're done." Think of a well-run Zoom stand-up: you don't need to fight over the unmute button, you just talk. But you can't "talk" another team member into existence.</p>
<p><code>CREATE</code> needs infrastructure because it implies resource allocation. In a human org, if you want to add someone to your team, you need to go through the proper channels for recruiting/HR/onboarding/budgeting etc. In a human organization "spawn" means "add to organizational structure" rather than literal "spawn from nothing". In the AI agents world this means "add to coordination unit". Whether the agent is pre-existing (i.e. a human being brought in as an employee) or new-to-this-world (a fresh AI agent process), we care about its lifecycle <em>as a part of the organization/coordination unit</em>. For AI agents, spawning a new one means allocating compute, worktrees, auth, process ownership, and potentially gigabytes of memory (looking at you, Claude Code). A model can make this request, just like you can ask HR to post a job opening, but the organizational infrastructure and process need to actually set it in motion. "But at my last company we just hired a person on the spot off the street and it worked out great." How big was that company? Two people working in a coffee shop? Non-structured spawning doesn't scale.</p>
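<p>Concretely, "add to coordination unit" means every spawn goes through a registry instead of happening ad hoc. A toy sketch of the bookkeeping such infrastructure does (the <code>SpawnRegistry</code> type and its fields are hypothetical, invented for illustration):</p>

```rust
use std::collections::HashMap;

// Toy registry: the single source of truth for what is running,
// who owns it, and what it is allowed to cost.
struct SpawnRegistry {
    next_id: u32,
    sessions: HashMap<u32, Session>,
}

#[allow(dead_code)] // fields shown for illustration
struct Session {
    owner: Option<u32>, // None for the root
    agent: String,
    budget_tokens: u64, // illustrative resource accounting
}

impl SpawnRegistry {
    fn new() -> Self {
        Self { next_id: 0, sessions: HashMap::new() }
    }

    // Every spawn passes through here: no session exists without an
    // owner, an audit-trail entry, and an allocated budget.
    fn spawn(&mut self, owner: Option<u32>, agent: &str, budget_tokens: u64) -> u32 {
        let id = self.next_id;
        self.next_id += 1;
        self.sessions.insert(
            id,
            Session { owner, agent: agent.to_string(), budget_tokens },
        );
        id
    }

    fn owner_of(&self, id: u32) -> Option<u32> {
        self.sessions.get(&id).and_then(|s| s.owner)
    }
}
```

<p>The model requests the spawn; the registry decides whether it happens and remembers that it did. That record is what makes "who's responsible for this process" answerable later.</p>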
<p><code>DESTROY</code> is less obvious. The first instinct might be to think this can just be handled in conversation, and to some extent it can. An agent can send a message to a child agent asking it to shut down, and you can tell your employee to stop what they are doing or fire them. But where conversation breaks down is enforceability. You might have a rogue agent going off script, or an employee who refuses to leave. Human organizations have developed processes and systems to deal with this: cancel the ID badge, lock them out of company accounts, escort them out of the building. AI agents need similar infrastructure to enforce the organizational lifecycle of an entity. If an agent hangs, gets stuck in a loop, or ignores the parent's instructions, then we need a reliable way to terminate the process.</p>
<p>You may have noticed that the <code>DESTROY</code> primitive appears to be mostly sourced from real engineering implementations. Most theoretical models assume that processes are self-terminating. Process calculi formalized self-completion, i.e. a process that finishes and becomes inert. The real world is a lot messier. Agent sessions don't always finish: they get stuck in loops, they stop responding. The parent needs a mechanism that works whether or not the child cooperates, for the same reason that a company needs HR procedures for termination and not just a polite message. <a href="https://www.erlang.org/doc/system/sup_princ.html">Erlang's supervision trees</a> and Unix signals both arrived at the same answer: the entity that creates must be able to destroy.</p>
<p>"But if my agent has a Bash tool it doesn't need any infra." Sure, a model with terminal access can technically do anything, including <code>claude -p 'go rogue'</code>. But "technically can" isn't the same as "organizationally should." That's the agent equivalent of you hiring your friend off the street, giving them a desk, no paperwork, and hoping it works out. Maybe some work gets done, but you can imagine how this ends. When things go wrong, there's no audit trail of who spawned what, no resource management, no authority hierarchy, and no guarantee that "fired" actually means fired. Maybe they handed their badge to someone else on the way out. <code>CREATE</code> and <code>DESTROY</code> need to be infrastructure because the organization needs to know what's running, what it costs, and who's responsible for each running entity.</p>
<h2>Two Is All You Need</h2>
<p>If you go back to the framework list from the opening and run each tool through this filter, the same pattern shows up every time. <code>spawnTeam</code> and <code>requestShutdown</code>. <code>spawn_agent</code> and <code>close_agent</code>. <code>fork()</code> and <code>kill()</code>. Everything else (the messaging, the broadcasting, the task boards, the approval workflows, and so on) is just communication disguised as infrastructure.</p>
<p>Teams are finding that having fewer tools generally improves agent performance. <a href="https://vercel.com/blog/we-removed-80-percent-of-our-agents-tools">Vercel removed 80% of their agent's tools</a> and saw a 3.5x speedup, <a href="https://blog.cloudflare.com/code-mode/">Cloudflare found</a> that agents handle complex APIs better when they write code instead of calling tools directly, and <a href="https://www.anthropic.com/engineering/code-execution-with-mcp">Anthropic measured</a> a 98.7% token reduction when agents have freedom to explore their tools vs. loading all upfront. If this holds for API tools, it applies even more to coordination tools, where each additional primitive is another point of failure in a multi-agent chain.</p>
<p>I have a strong suspicion that frameworks will converge on thinner coordination surfaces over the next few model generations. Infrastructure for authority: who gets created, who gets terminated, and who decides. Conversation for everything else. If this is wrong, adding a tool later is easy. If it's right, frameworks that built tools for negotiation, synchronization, and claiming are carrying dead weight, and that dead weight <a href="https://www.oreilly.com/radar/the-hidden-cost-of-agentic-failure/">compounds</a>. <a href="https://arxiv.org/abs/2512.08296">Google Research found</a> that independent multi-agent systems amplify errors 17.2x compared to single-agent baselines. Each additional coordination step in a chain eats into your end-to-end reliability.</p>
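<p>The arithmetic behind "eats into your reliability" is plain exponential decay: if each step in a chain succeeds with probability <em>p</em>, the chain of <em>n</em> steps succeeds with <em>p</em><sup>n</sup>:</p>

```rust
// Per-step reliability compounds multiplicatively across a chain.
fn chain_reliability(per_step: f64, steps: i32) -> f64 {
    per_step.powi(steps)
}
```

<p>Ten chained steps at 95% each already leaves you near 60% end-to-end; the same chain at 99% per step stays around 90%. Every coordination primitive you add is another factor in that product.</p>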
<p>This is a bet on the trajectory of model improvement. Models will keep getting better at communication, but just as in human organizations, the create/destroy, hire/fire operations need to be regulated and enforceable.</p>
<p>In subsequent posts I will show what a system built on these principles looks like in practice, starting with the problem none of these primitives solve: what happens when an organization of agents needs human input?</p>
<h2>Further reading</h2>
<ul>
<li>Kim et al., <a href="https://arxiv.org/abs/2512.08296">Towards a Science of Scaling Agent Systems</a> (Google Research, 2025) — the 17.2x error amplification study</li>
<li>Cemri et al., <a href="https://arxiv.org/abs/2503.13657">Why Do Multi-Agent LLM Systems Fail?</a> (2025) — failure taxonomy across 1,642 execution traces</li>
<li>Joe Armstrong, <a href="https://erlang.org/download/armstrong_thesis_2003.pdf">Making reliable distributed systems in the presence of software errors</a> (2003) — the Erlang/OTP supervision thesis</li>
</ul>
]]></content:encoded></item></channel></rss>