Agentic Coordination
The hidden tax & conductors of infinite minds
Foundation models are extraordinarily good at knowledge work tasks. And getting better by the quarter.
I’m using the term “tasks” loosely; on purpose. From making a hiring decision, to building a financial model, to editing a strategy deck, to writing a legal brief. Simple or complex, tactical or executive. Tasks are the meat of any professional role. What creates economic value.
And yet, we humans spend an extraordinary amount of time outside the task itself. Depending on the source, between 60% and 80% of a person’s time at work is spent on “work about work”.
The exact number varies by definition. The direction is consistent and damning. More than half of your professional life is not the work. It’s the overhead of doing the work together.
Intuitively, you’d expect that automating tasks through AI would reduce much of this overhead.
If your coding agent can build your new feature in the background, the scrum standup might be unnecessary. If an agent can draft the financial model, maybe you don’t need the three alignment meetings it took to agree on the assumptions. If another agent can monitor and adjust a marketing campaign in real time, maybe the weekly performance standup becomes unnecessary.
That intuition is partially right. But it misses something fundamental.
The deployment of AI tools and agentic workflows within the enterprise doesn’t just reduce existing coordination. It creates new forms of it.
Humans need to coordinate with agents. Agents need to coordinate with other agents. And, eventually, agents need to interface with the external world (software, data, other companies’ agents) in ways that carry their own coordination costs.
Hence, Agentic Coordination: the set of interactions, interfaces, and protocols through which humans and AI agents align on intent, exchange information, and manage shared workflows.
For companies thinking about how to deeply integrate AI into their business processes, agentic coordination is an unsolved puzzle with non-standard answers.
What follows is my attempt at a simple mental framework for its dimensions and the current (imperfect) solutions emerging around each.
Human ↔ Agent: The form factors of working together
The chatbot is a surprisingly sticky paradigm.
The reason is simple: coordination is built-in. The back-and-forth nature of a conversational interface organically embeds the feedback loop. Query in, response out, human re-directs, the agent adjusts. It’s not efficient for everything, but the coordination cost is low and intuitive.
The problem is that chat is a terrible fit for a lot of real work. You don’t want to conduct a code review through a chatbot. You don’t want to manage a fleet of marketing campaigns through a conversation thread. And you definitely don’t want to supervise a background research process by sending messages into a void and hoping for the best.
Satya Nadella, Microsoft’s CEO, described an evolution of human-to-agent interaction through a series of form factors that compose together. The framing resonated because it captures something I’ve been observing in the most advanced use case we have right now: coding agents.
If we take coding as the canonical example of knowledge work, and maybe the clearest lighthouse of what’s coming for other workflows, you already see multiple form factors coexisting:
Inline suggestions. The earliest modality. GitHub Copilot tab-completions, autocomplete in your IDE. The agent whispers a suggestion, you accept or reject with a keystroke.
Chat. Request-response, now enhanced with chain-of-thought reasoning where you can see the agent work through the problem. Copilot Chat, ChatGPT, Claude. The coordination mechanism is the conversation itself.
Actions. The agent executes discrete tasks through tool calls, computer use, or MCP server interactions. You issue a command, the agent does something in the world. The coordination shifts from dialogue to delegation.
Foreground agents. Autonomous agents running in your active session, interactively steered. Claude Code in a terminal, Copilot in VS Code. You’re watching, you’re intervening, you’re collaborating in real time.
Background agents. Autonomous agents running asynchronously, in the cloud or locally, without your active supervision. GitHub Copilot Coding Agent, OpenAI Codex, Devin. The coordination happens at checkpoints: you review results after the fact, approve or redirect, then let them continue.
Embedded agents. A particular type of background agent deeply integrated into vertical software. The UI itself responds to and triggers agent activity. Think of AI-native SaaS products where the application boundary and the agent boundary blur.
The main point here is that all form factors coexist and compose together.
When coding, you can run a foreground agent, a background agent, and simultaneously edit in VS Code, all happening in parallel.
I imagine this is where professional work broadly is heading. A developer (or analyst, or marketer, or executive) using all of these form factors simultaneously, like a well-tuned orchestra. Locally and in the cloud.
“We macro-delegate and micro-steer. You do a macro delegation, and then I can in parallel give it instructions while it is doing work.” - Satya Nadella
The IDE, with its combination of panels, diff viewers, consoles, and background terminal processes, provided fertile ground on which to build a multi-form-factor agentic experience. And even then, it has taken over a year to get the UX roughly right.
How this translates to legal review, financial planning, or marketing operations (domains where quality is subjective and verification is expensive) is, for the moment, heterogeneous and messy.
Consider two directional questions that every organization deploying agents will eventually need to answer:
Agent to human. How does a customer service voice agent summarize its progress and escalate interactions effectively? What is the right format and cadence for a human supervisor to evaluate the agent’s accuracy and judgment? The coordination here is about trust calibration: how much autonomy, how much oversight, and what does the reporting interface look like?
Human to agent. Given an agentic workflow that monitors customer behavior and contextually communicates with them, how does a digital marketing manager track and steer its behavior? The coordination here is about control surfaces: dashboards, override mechanisms, goal-setting interfaces that don’t require the manager to understand the agent’s internal reasoning.
Human in the loop is no longer the only paradigm to think about. Human parallel to the loop, after the loop, or even outside the loop are all valid configurations for specific use cases. Each demands a different coordination UX and a different trust threshold, and each carries different failure modes.
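To make the four configurations concrete, here is a minimal sketch of how a supervision layer might treat the same agent action under each one. Every name in it (Oversight, Supervisor, approve, review_queue) is hypothetical and invented for illustration; it is not taken from any product mentioned in this post.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Optional, Tuple


class Oversight(Enum):
    IN_THE_LOOP = "in"            # human approves each action before it runs
    PARALLEL = "parallel"         # action runs; human is notified and may steer
    AFTER_THE_LOOP = "after"      # action runs; human reviews results later
    OUTSIDE_THE_LOOP = "outside"  # action runs; it is only recorded for audit


@dataclass
class Supervisor:
    mode: Oversight
    # Stand-in for a human decision; approves everything by default.
    approve: Callable[[str], bool] = lambda action: True
    notifications: List[Tuple[str, str]] = field(default_factory=list)
    review_queue: List[Tuple[str, str]] = field(default_factory=list)
    audit_log: List[str] = field(default_factory=list)

    def run(self, action: str, execute: Callable[[str], str]) -> Optional[str]:
        # Every attempted action is logged, whatever the mode.
        self.audit_log.append(action)
        # Only the in-the-loop mode can block an action before it happens.
        if self.mode is Oversight.IN_THE_LOOP and not self.approve(action):
            return None
        result = execute(action)
        if self.mode is Oversight.PARALLEL:
            self.notifications.append((action, result))
        elif self.mode is Oversight.AFTER_THE_LOOP:
            self.review_queue.append((action, result))
        return result


gatekeeper = Supervisor(Oversight.IN_THE_LOOP, approve=lambda a: "refund" not in a)
print(gatekeeper.run("refund $500", str.upper))   # blocked before execution
print(gatekeeper.run("send receipt", str.upper))  # approved, then executed
```

The point of the sketch is that the trust threshold lives in one place (the mode), while the failure mode changes with it: in the loop, the risk is a slow human; after the loop, the risk is that a bad action has already happened by the time anyone looks at the review queue.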
Agent ↔ Agent: The orchestration layer
How AI agents interact with other agents is a rapidly evolving space. And a fascinating one.
Once you move past a single agent performing a single task, you immediately face the classic organizational problem: how do multiple specialized actors coordinate toward a shared goal without stepping on each other, duplicating work, or spiraling into chaos?
The approaches emerging break down roughly into three categories:
Framework-level primitives. Major agentic frameworks now incorporate agent-to-agent coordination as a core feature. Some examples:
LangGraph models agent workflows as directed graphs with centralized persistent state.
OpenAI’s Agents SDK offers two clean multi-agent patterns: handoffs for peer-to-peer delegation, and agents-as-tools for centralized orchestration.
Anthropic’s Claude Agent SDK (the same infrastructure that powers Claude Code, now available to developers) ships with native multi-agent support, including subagents that report to a caller and fully independent agent teammates that coordinate with each other directly.
CrewAI takes a role-based approach where agents are defined with roles, goals, and backstories.
The specifics vary, but the challenge is the same across all of them. Harrison Chase, LangChain’s CEO, framed it: “When agents mess up, they mess up because they don’t have the right context; when they succeed, they succeed because they have the right context.”
The coordination problem, in other words, is a context engineering problem.
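The difference between the two patterns mentioned above, handoffs and agents-as-tools, is easier to see in code. What follows is a plain-Python sketch of the idea only. It does not use the OpenAI Agents SDK or any other framework, and all names (Agent, Conversation, handoff, call_as_tool) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Agent:
    name: str
    work: Callable[[str], str]  # stand-in for a model call


@dataclass
class Conversation:
    task: str
    owner: str  # the agent currently in control of the conversation
    transcript: List[str] = field(default_factory=list)


def handoff(conv: Conversation, target: Agent) -> str:
    """Peer-to-peer: control moves to the target, which receives the full task."""
    conv.transcript.append(f"{conv.owner} -> {target.name}")
    conv.owner = target.name
    return target.work(conv.task)


def call_as_tool(conv: Conversation, tool_agent: Agent, subtask: str) -> str:
    """Centralized: the caller keeps control and shares only a scoped subtask."""
    result = tool_agent.work(subtask)
    conv.transcript.append(f"{conv.owner} called {tool_agent.name}")
    return result


billing = Agent("billing", lambda text: f"[billing] handled: {text}")

support = Conversation(task="customer wants a refund", owner="triage")
print(handoff(support, billing), "| owner is now:", support.owner)

report = Conversation(task="write the quarterly report", owner="orchestrator")
print(call_as_tool(report, billing, "sum Q3 invoices"), "| owner is still:", report.owner)
```

Seen this way, the choice between the two is a choice about context: a handoff passes the whole conversation to the next agent, while a tool call exposes only the slice the orchestrator decides to share.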
Orchestration platforms. A layer above individual frameworks. Paperclip is probably the most interesting example right now. The mental model is striking: you define a company goal, a CEO agent decomposes it into roles, hires specialized sub-agents (a coder, a marketer, a QA reviewer), and they operate with org charts, budgets, approval gates, and audit trails.
Agent orchestration mirrors organizational design. Reporting lines, budget constraints, governance, accountability. The same problems human organizations have been solving (imperfectly) for centuries.
Interoperability protocols. The protocol stack is consolidating around two complementary standards, both now governed by the Linux Foundation:
Anthropic’s Model Context Protocol (MCP), announced in late 2024, has become the default standard for connecting agents to tools and data. OpenAI, Google, and Microsoft all adopted it within months.
Google’s Agent2Agent protocol (A2A), launched in April 2025, focuses on the complementary problem: enabling agents to discover, negotiate with, and delegate to other agents. IBM’s competing Agent Communication Protocol was merged into A2A in August 2025, consolidating the space.
MCP gives agents hands. A2A gives agents the ability to talk to other agents.
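For a sense of what "hands" look like on the wire: MCP messages are JSON-RPC 2.0, and an agent invokes a tool through the tools/call method. The sketch below only builds such a request with the standard library. The tool name get_campaign_stats and its arguments are hypothetical, and a real client would also handle the transport and initialization steps that are omitted here.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests carry an id so replies can be matched


def mcp_tool_call(name: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 request for MCP's tools/call method."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)


# Hypothetical tool exposed by a hypothetical marketing-analytics MCP server.
print(mcp_tool_call("get_campaign_stats", {"campaign_id": "spring-sale"}))
```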
The uncomfortable math
The empirical evidence for most multi-agent architectures is sobering. One study found that unstructured (key adjective!) and budget-constrained multi-agent systems amplify errors significantly. For sequential reasoning, every multi-agent variant performed worse than a single well-configured agent.
Another analysis of 1,600+ multi-agent traces found that coordination breakdowns were the single largest failure category.
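One back-of-the-envelope way to build intuition for why chained agents can amplify errors: assume, purely for illustration, that each step in a sequential pipeline succeeds independently with the same probability. This is my own simplification, not the method or the numbers of the studies above, and the 95% figure is an assumed value.

```python
def chain_success(p_step: float, n_steps: int) -> float:
    """Probability that every step in a chain succeeds, assuming independence."""
    return p_step ** n_steps


# Assumed per-step reliability of 95%, for chains of increasing length.
for n_steps in (1, 5, 10, 20):
    print(n_steps, "steps:", round(chain_success(0.95, n_steps), 3))
```

Under that assumption, a ten-step chain of 95%-reliable steps finishes cleanly only about 60% of the time, and a twenty-step chain about 36% of the time. Real systems are not independent coin flips, but the direction of the effect is the same: every extra handoff is another place to lose context.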
A sharp analogy to the microservices era captures the risk: we took workable applications, broke them into a confusing cloud of services, then built entire platform teams just to manage the complexity we’d created. The conclusion: most enterprise teams need one well-instrumented agent with clear exit conditions. Not a swarm.
The question is whether these limitations are structural or temporary. I think most of the technical ones are temporary. But the organizational ones (trust, knowledge preservation, governance design) may not be. Those are human coordination problems.
Agent ↔ External World: A short bridge
There’s a third coordination axis worth acknowledging, even if it deserves its own dedicated exploration: how agents interface with the external world.
MCPs, APIs, CLIs, agentic search, RAG pipelines, computer use, web automation. All are different mechanisms to give agents access to the data, software, and services they need to act.
Whether this qualifies as “coordination” in a purist sense is debatable. It’s more like infrastructure. But it shapes how the other two axes work. An agent’s ability to coordinate with a human (or another agent) depends directly on what it can see, touch, and act upon. The richer the external interfaces, the more useful the coordination becomes, and the harder it is to govern.
A wide and deep area. I’ll save it for a future piece.
Orchestra conductors
Satya Nadella used the metaphor “managers of infinite minds” to describe how humans will relate to AI agents (crediting the concept to the CEO of Notion). I like the metaphor. But I’d make one edit.
I think the ideal end state is for humans to become conductors of infinite minds.
The distinction matters. A manager assigns, tracks, and evaluates. A conductor shapes tempo, dynamics, and coherence across an ensemble that is already capable of playing its parts.
“We want humans to supervise the loops from a leveraged point, not be in them.” - Ivan Zhao, Steam, Steel, and Infinite Minds
The conductor doesn’t need to know how to play every instrument. But they need an ear for when something is off, a sense for how the parts fit together, and the authority to intervene when the orchestra drifts.
Jensen Huang projects that NVIDIA’s 75,000 employees will work alongside 7.5 million agents within a decade. A 100:1 ratio. At that density, “prompting” is not a skill. Conducting is.
Few professionals yet have the familiarity with these form factors, or the hands-on mastery of orchestrating across them, that the moment demands.
We started with a coordination tax. Sixty percent or more of a knowledge worker’s time, lost to the overhead of working together. AI agents don’t eliminate that tax. They restructure it. And they demand a new kind of leader and professional: one who can effectively conduct an orchestra of infinite minds.
Agentic Coordination is a deep dive on the second of the factors limiting the actual generation of economic value from AI. You can read the introductory post here and the deep dive on the first factor, diffusion, here.
I publish a post a week on key ideas around AI, Agents and everything around their diffusion into the enterprise and people’s lives. You can read them all here.


