Naveen and his co-founders launched Mercury this week, backed by $1.5 million from a16z and angels from OpenAI and Cognition. Mercury is a no-code canvas for orchestrating mixed teams of humans and AI agents, designed to solve the chaos of managing multiple agents across terminal windows, browser tabs, and Slack channels. The platform treats delegation as a persistent primitive, supports OpenClaw and other major frameworks through adapters, and integrates with 800+ tools via Composio. After months of dogfooding the system with 30 agents supporting a three-person team, the founders concluded that the real challenge in enterprise AI is not making individual agents smarter, but preventing them from duplicating work, contradicting each other, and spamming humans with redundant requests.
What Just Happened: Mercury Enters Production
The Mercury team spent the last year deploying AI agents inside large enterprises, observing firsthand the operational challenges faced by early adopters. They watched as teams struggled with fragmentation: Claude Code running in a terminal, research agents living in browser tabs, Slack bots responding in channels, and scheduling assistants buried in separate windows. No single pane of glass existed to view what was running, who was doing what, or how these components connected. As agent counts scaled from three to thirty, the operational overhead became unmanageable, consuming valuable engineering and operations time.
Their solution is a visual canvas where you draw connections between agents and humans to form operational graphs. An edge between Agent A and Agent B means A can delegate work to B, which processes the request, calls its tools, and replies. Agents can chain delegations through multiple levels, creating a living map of how your team actually operates. The graph turns a complex, distributed agent system into a workflow you can actually read and manage.
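Mercury's internals are not public, but the edge-as-delegation model is easy to sketch. Here is a minimal illustration in Python, with all class and method names (`Node`, `Graph`, `can_delegate`) invented for this example rather than taken from Mercury's actual API:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Node:
    """A participant in the graph: an AI agent or a human."""
    name: str
    kind: str  # "agent" or "human"

@dataclass
class Graph:
    """Directed graph where an edge A -> B means A may delegate to B."""
    edges: dict = field(default_factory=dict)

    def connect(self, src: Node, dst: Node) -> None:
        self.edges.setdefault(src, set()).add(dst)

    def can_delegate(self, src: Node, dst: Node) -> bool:
        return dst in self.edges.get(src, set())

    def delegation_chain(self, start: Node) -> set:
        """Every node reachable from `start` via delegation edges."""
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for nxt in self.edges.get(node, set()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

# Example: scheduler delegates to a timezone agent, which escalates to a human.
a = Node("scheduler", "agent")
b = Node("timezone", "agent")
c = Node("alice", "human")
g = Graph()
g.connect(a, b)
g.connect(b, c)
```

The key property of this model is that reachability, not direct connection, defines what a node can ultimately cause to happen, which is why multi-level chains "just work" once the edges are drawn.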
Why Agent Management Became Unmanageable
Many organizations embark on AI agent initiatives with enthusiasm, only to hit management hurdles as their deployments grow. You have probably felt this pain yourself. You start with one agent running in a terminal, perhaps an OpenClaw instance performing a specific task. Then you add a browser-based research assistant to gather information. Then you wire up a Slack bot for notifications and team communication. Suddenly, you are managing five different authentication flows, five different state stores, and five different user interfaces, each with its own quirks and maintenance requirements.
There is no inherent visibility into the handoffs between these disparate systems. When Agent A finishes a task that Agent B needs, you manually copy the output, introducing potential errors and delays. When both agents try to book the same calendar slot, they conflict, leading to scheduling nightmares. When three different agents decide to ask you the same question via different channels, you experience notification fatigue. This is the coordination problem that Mercury directly addresses. The agents themselves often work fine in isolation; it is the infrastructure connecting them that fails. Current setups force you to act as the human router, manually shuttling context between disconnected systems that should be talking directly and intelligently.
The Canvas Model: Visual Programming for Agent Teams
Mercury replaces the traditional, code-heavy approach to agent management with an intuitive graph editor. Instead of configuring agents through command-line interfaces or complex API calls, you drag agents onto a canvas and draw edges between them to define delegation paths and communication flows. Humans also appear as nodes within this visual representation, creating a unified and holistic view of your operational team, encompassing both AI and human contributors.
This visual approach makes relationships explicit and immediately understandable. You can clearly see that your scheduling agent feeds into your Slack notification agent, which then feeds into a human approval node. This is not merely a documentation tool; the canvas is the actual runtime environment. When Agent A delegates to Agent B, the visual edge represents a real message passing through Mercury’s robust task system. The graph becomes executable infrastructure, dynamically reflecting the state of your agent ecosystem. For OpenClaw developers, this means you can visualize your agent swarms in real-time, gaining insights that were previously only available through laborious log file debugging. You see the data flow as it happens, tracing requests from trigger to completion across the entire team, simplifying troubleshooting and optimization.
Delegation as a Core Primitive: Persistent Task Architecture
A fundamental innovation of Mercury is its treatment of delegation as a first-class primitive, rather than an afterthought or a simple message passing mechanism. When Agent A messages Agent B with a task, Mercury creates a persistent task object. This task object is not ephemeral; it survives across agent activations, meaning multi-step workflows do not lose context if an agent restarts, experiences a temporary network issue, or hits a rate limit. The complete task state lives within the orchestration layer, independent of any single agent’s memory.
Agent B processes the request, potentially calling its tools through Composio integrations, and replies through the same channel. Agents can delegate further down the graph, creating chains of responsibility. This persistence layer is crucial for solving the "what was I doing?" problem that plagues stateless agent setups. If your OpenClaw agent crashes mid-workflow, the task remains in Mercury's queue, ready to resume when the agent comes back online, ensuring continuity in complex operations.
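To make the "task survives the agent" idea concrete, here is a hypothetical sketch of a persistent task store. Everything here (`Task`, `TaskStore`, the status values) is invented for illustration; Mercury's real schema is not public. The point is only that the task record lives in the orchestration layer, so an agent restart loses nothing:

```python
import uuid
from dataclasses import dataclass, field
from enum import Enum

class Status(Enum):
    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    DONE = "done"

@dataclass
class Task:
    """Persistent task object: state lives in the orchestrator, not the agent."""
    requester: str
    assignee: str
    payload: dict
    status: Status = Status.PENDING
    id: str = field(default_factory=lambda: uuid.uuid4().hex)

class TaskStore:
    """Orchestration-layer queue; in production this would be durable storage."""
    def __init__(self):
        self._tasks = {}

    def delegate(self, requester: str, assignee: str, payload: dict) -> Task:
        task = Task(requester, assignee, payload)
        self._tasks[task.id] = task
        return task

    def resumable(self, assignee: str) -> list:
        """Tasks an agent should pick back up after a crash or restart."""
        return [t for t in self._tasks.values()
                if t.assignee == assignee and t.status != Status.DONE]

store = TaskStore()
t = store.delegate("agent_a", "agent_b", {"action": "summarize"})
t.status = Status.IN_PROGRESS
# Simulate agent_b restarting: its in-memory state is gone,
# but the task is still sitting in the orchestrator's queue.
pending = store.resumable("agent_b")
```

On restart, the agent queries `resumable()` instead of starting from a blank slate, which is the whole difference between a stateless setup and one with delegation as a primitive.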
OpenClaw Integration: Bridging Open Source and Enterprise Needs
Mercury ships with native adapters for a variety of agent frameworks, including OpenClaw, Claude Code, Devin, Manus, Gumloop, and any MCP-compatible agent. This broad compatibility is particularly significant for the OpenClaw ecosystem, offering a direct pathway to enterprise adoption and deployment. You can take your existing OpenClaw agents, connect them to Mercury’s canvas, and immediately gain sophisticated orchestration capabilities without the need to rewrite your agent logic or significantly alter its internal structure.
The adapter handles the protocol translation between OpenClaw’s native communication methods and Mercury’s orchestration layer, allowing OpenClaw agents to seamlessly participate in Mercury’s delegation graph while retaining their local tool access and operational autonomy. This means you can mix and match native Mercury agents built on foundational AI SDKs with your custom OpenClaw implementations on the same canvas, fostering a heterogeneous and highly specialized agent team. This positions OpenClaw as a first-class citizen in enterprise orchestration stacks, moving beyond its role as primarily a development framework for local experimentation into a robust component of production-grade AI solutions.
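The adapter pattern described above can be sketched in a few lines. This is not Mercury's actual adapter code, and `OpenClawAdapter`, the envelope fields, and the `run()` method are all assumptions made for illustration; the sketch only shows why the agent itself needs no changes:

```python
from abc import ABC, abstractmethod

class AgentAdapter(ABC):
    """Translates between a framework's native interface and the
    orchestrator's task envelope (field names here are illustrative)."""
    @abstractmethod
    def submit(self, task: dict) -> dict: ...

class OpenClawAdapter(AgentAdapter):
    def __init__(self, agent):
        self.agent = agent  # an unmodified agent with its own call style

    def submit(self, task: dict) -> dict:
        # Unwrap the orchestrator's envelope, call the agent natively,
        # then wrap the reply back into an envelope.
        reply = self.agent.run(task["payload"])
        return {"task_id": task["task_id"], "result": reply}

class EchoAgent:
    """Stand-in for a real agent; just echoes its input."""
    def run(self, payload: str) -> str:
        return f"done: {payload}"

adapter = OpenClawAdapter(EchoAgent())
out = adapter.submit({"task_id": "t1", "payload": "triage inbox"})
```

Because translation happens entirely at the boundary, the agent keeps its local tool access and internal logic, and the orchestrator only ever sees uniform envelopes.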
The Multi-Agent Coordination Problem Nobody Talks About
The Mercury team discovered a profound, yet often overlooked, challenge while running 30 agents internally: the hard part was never getting a single agent to be intelligent or capable. The true difficulty lay in getting five, ten, or even thirty agents to not duplicate work, contradict each other’s actions, or spam the same human with redundant questions and requests. This problem represents the hidden cost and operational inefficiency of multi-agent systems deployed without a proper orchestration layer.
Without an overarching coordination layer, agents operate in silos, lacking awareness of what their teammates are doing. This leads to inefficiencies such as two research agents independently pulling the same data, a scheduling agent and a travel agent booking conflicting flights for the same individual, or three different agents emailing you about the exact same calendar conflict. Mercury addresses this through its graph structure and persistent task system. Tasks are visible to the orchestration layer, which can detect potential collisions, identify redundant requests, and prevent them from reaching human operators, thereby significantly reducing notification fatigue and operational friction.
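One way an orchestration layer can catch redundant requests is to fingerprint them before they reach a human. The following is a minimal sketch of that idea, not Mercury's implementation; the normalization rule (strip and lowercase) is a deliberately naive stand-in for whatever similarity check a real system would use:

```python
import hashlib

class Deduplicator:
    """Collapses redundant requests before they reach a human or agent."""
    def __init__(self):
        self._seen = {}  # fingerprint -> first task id

    @staticmethod
    def fingerprint(target: str, question: str) -> str:
        # Normalize so "same question to the same target" collides.
        key = f"{target}|{question.strip().lower()}"
        return hashlib.sha256(key.encode()).hexdigest()

    def admit(self, task_id: str, target: str, question: str) -> bool:
        """Returns True for the first occurrence, False for duplicates."""
        fp = self.fingerprint(target, question)
        if fp in self._seen:
            return False
        self._seen[fp] = task_id
        return True

dedup = Deduplicator()
first = dedup.admit("t1", "alice", "Which slot works for Friday?")
second = dedup.admit("t2", "alice", "which slot works for friday? ")
```

The second request, asked by a different agent in slightly different form, never reaches Alice. This only works because tasks flow through a shared layer that can see all of them, which is precisely what isolated agents lack.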
Human-in-the-Loop by Design: Security Defaults That Matter
Mercury prioritizes security and control by requiring human approval before agents execute real-world actions. This is not an optional feature; it is the baseline security model embedded into the platform’s core. Agents can request actions, but the system holds these requests in a pending state until a human explicitly clicks “approve” via a web UI, Slack, or iMessage. This provides a critical safeguard, ensuring that automated actions align with human intent and oversight.
You can gradually relax these controls on a per-agent basis as trust builds over time, graduating agents from approval-required to auto-execute based on their track record of reliable and accurate performance. This graduated autonomy model is essential for enterprise deployment, allowing organizations to deploy aggressive automation while maintaining robust guardrails. The per-agent OAuth scoping, implemented through Composio, adds another crucial layer of security. This ensures that even if an agent were to malfunction or be compromised, it could only access the specific Gmail, Calendar, or Linear instances that were explicitly authorized for that particular agent, significantly limiting the potential blast radius of any security incident.
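The graduated-autonomy model is simple to express as a policy gate. Here is an illustrative sketch, assuming a per-agent policy that defaults to approval-required; the class and policy names are invented, not Mercury's API:

```python
from enum import Enum

class Policy(Enum):
    APPROVAL_REQUIRED = "approval_required"   # the baseline
    AUTO_EXECUTE = "auto_execute"             # earned over time

class ApprovalGate:
    """Holds real-world actions in a pending state until a human approves,
    unless the agent has been promoted to auto-execute."""
    def __init__(self):
        self.policies = {}
        self.pending = []
        self.executed = []

    def request(self, agent: str, action: dict) -> str:
        policy = self.policies.get(agent, Policy.APPROVAL_REQUIRED)
        if policy is Policy.AUTO_EXECUTE:
            self.executed.append(action)
            return "executed"
        self.pending.append({"agent": agent, "action": action})
        return "pending"

    def approve(self, index: int) -> None:
        """A human clicks approve; the held action finally runs."""
        item = self.pending.pop(index)
        self.executed.append(item["action"])

gate = ApprovalGate()
state = gate.request("scheduler", {"type": "send_email"})
gate.approve(0)
gate.policies["scheduler"] = Policy.AUTO_EXECUTE  # trust earned over time
state2 = gate.request("scheduler", {"type": "send_email"})
```

Note that the safe behavior is the default path: an agent the gate has never heard of gets approval-required automatically, which is the property that makes this a security model rather than a feature.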
Memory Architecture: The Unsolved Debate
During their development and dogfooding phases, the Mercury founders posed a fundamental architectural question to the community: where should the system’s memory reside? They initially explored managing memory at the organization level, exposing it as tools or injecting it directly into agent context. However, they quickly realized that not all agents handle memory equally. Some OpenClaw agents, for instance, are designed to manage state efficiently and beautifully on their own, leveraging internal mechanisms for context retention. Others, however, might require external memory stores to maintain long-term context or complex state.
The orchestration layer could centralize memory for consistency across all agents, ensuring everyone operates from the same ground truth. However, this approach risks creating a single point of failure and a potential performance bottleneck. Alternatively, the orchestration layer could delegate memory management to individual agents, respecting the autonomy and specialized capabilities of frameworks like OpenClaw. This approach offers flexibility but risks fragmentation and conflicting recollections of events across the agent team. Mercury is still wrestling with this design decision, and their eventual choice will likely influence how the broader AI agent ecosystem approaches state management in multi-agent deployments.
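The trade-off between the two designs shows up clearly even in a toy model. The sketch below is purely illustrative (the interface and both strategies are invented for this article): centralized memory makes a fact written by one agent visible to all, while delegated memory keeps agents autonomous at the cost of divergent recollections:

```python
from abc import ABC, abstractmethod

class MemoryStrategy(ABC):
    """The open design question: who owns memory?"""
    @abstractmethod
    def remember(self, agent: str, key: str, value: str) -> None: ...
    @abstractmethod
    def recall(self, agent: str, key: str): ...

class CentralizedMemory(MemoryStrategy):
    """One shared store: consistent, but a single point of failure."""
    def __init__(self):
        self._store = {}

    def remember(self, agent, key, value):
        self._store[key] = value  # org-wide key space; writer doesn't matter

    def recall(self, agent, key):
        return self._store.get(key)

class DelegatedMemory(MemoryStrategy):
    """Per-agent stores: autonomous, but recollections can diverge."""
    def __init__(self):
        self._stores = {}

    def remember(self, agent, key, value):
        self._stores.setdefault(agent, {})[key] = value

    def recall(self, agent, key):
        return self._stores.get(agent, {}).get(key)

central, local = CentralizedMemory(), DelegatedMemory()
for mem in (central, local):
    mem.remember("agent_a", "meeting", "Friday 3pm")
```

Under the centralized strategy, `agent_b` can recall what `agent_a` wrote; under the delegated one, it cannot. Neither behavior is wrong, which is exactly why the question is unresolved.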
Tool Integration at Scale: Composio and OAuth
Mercury achieves its extensive tool integration capabilities through Composio, a robust platform that connects to over 800 common enterprise tools, including Gmail, Calendar, Slack, Linear, Notion, Salesforce, and many other enterprise SaaS staples. The critical differentiator here is the implementation of per-agent OAuth. Unlike traditional integrations where a single, broad set of credentials might be shared across multiple components, Mercury ensures each agent receives its own distinct, scoped credentials.
This granular permission model is absolutely critical for maintaining security and compliance in enterprise environments. For example, your scheduling agent might be granted specific calendar write access, while your research agent is limited to read-only web access. A finance agent would receive highly restricted access to specific spreadsheets or financial systems. This fine-grained control means you can safely run a 30-agent team without the constant worry that a compromised research bot could suddenly delete your production database, send unauthorized emails from a high-level executive’s account, or access sensitive customer data. This approach significantly reduces risk and enhances the trustworthiness of AI agent deployments.
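Per-agent scoping boils down to issuing each agent its own credential with an explicit scope set and checking every call against that grant. A minimal sketch of the idea, with all names and scope strings invented (this is not Composio's or Mercury's API):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Credential:
    """A token scoped to one agent and an explicit set of permissions."""
    agent: str
    scopes: frozenset

class CredentialVault:
    """Issues per-agent credentials instead of one shared token."""
    def __init__(self):
        self._grants = {}

    def grant(self, agent: str, scopes: set) -> Credential:
        self._grants[agent] = frozenset(scopes)
        return Credential(agent, frozenset(scopes))

    def authorize(self, cred: Credential, scope: str) -> bool:
        """A call succeeds only if this agent's own grant covers the scope."""
        return scope in self._grants.get(cred.agent, frozenset())

vault = CredentialVault()
scheduler = vault.grant("scheduler", {"calendar:write"})
researcher = vault.grant("researcher", {"web:read"})
```

The blast-radius argument falls directly out of the data model: a compromised `researcher` credential simply has no path to `calendar:write`, because authorization is checked against the per-agent grant rather than a shared token.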
From Terminal Chaos to Production Graphs
The fundamental shift that Mercury represents is a transition from an ad-hoc, terminal-based agent management paradigm to a structured, graph-based system that provides real-time operational visibility. In the current, often chaotic paradigm, managing AI agents typically involves SSHing into a server to check if your agent is running, tailing log files to understand its recent activities, and grepping through JSON outputs to diagnose failures. This approach is reactive, labor-intensive, and inherently prone to errors, especially as the number of agents grows.
Mercury replaces this fragmented experience with a living canvas that continuously displays the runtime state of your entire agent ecosystem. You can immediately see which agents are active, which tasks are pending human approval, and where bottlenecks are forming within your workflows. For enterprises, this distinction is akin to the difference between running a simple script on a server and managing a resilient, observable service in a production environment. The visual graph becomes the documentation, the monitoring dashboard, and the control plane simultaneously. When something breaks, instead of hunting through disconnected log streams and disparate systems, you simply follow the edges of the graph to pinpoint the exact failure point, streamlining debugging and incident response.
The Infrastructure Bet: Analyzing the $1.5M a16z Round
Andreessen Horowitz leading a $1.5 million seed round in Mercury, alongside angel investors from prominent AI organizations like OpenAI and Cognition, signals a serious conviction in the importance of the orchestration layer for AI agents. The core investment thesis here is that agent infrastructure will increasingly separate from agent intelligence. In this evolving landscape, the primary value will move up the stack, from merely providing raw LLM access to developing sophisticated coordination systems that can effectively manage multiple specialized agents.
This investment validates the very problem space that many OpenClaw developers have been experiencing: building an individual agent with impressive capabilities is becoming increasingly accessible, but running that agent reliably in production alongside other agents, especially in an enterprise setting, remains a significant challenge. The funding suggests that enterprise buyers are ready and willing to pay for dedicated orchestration solutions as a distinct product category, rather than viewing it as merely a feature to be bolted onto existing chatbots or automation tools. This strategic investment positions Mercury at the forefront of this emerging infrastructure category.
MCP Compatibility and Protocol Strategy
Mercury’s design emphasizes broad compatibility, supporting any Model Context Protocol (MCP) compatible agent, which notably includes the latest OpenClaw builds. This is a highly strategic choice, positioning Mercury as an open and flexible orchestration layer rather than a proprietary, closed system. By adopting the MCP standard, Mercury avoids vendor lock-in and fosters an ecosystem where organizations can leverage their existing agent investments. You can bring your own agents from any framework that speaks MCP, ensuring maximum interoperability.
This protocol-centric approach mirrors successful paradigms seen in other areas of technology, such as the ubiquitous HTTP for web services or SQL for database interactions. The orchestration layer becomes largely agnostic to the underlying agent implementation, allowing for diverse agent types to coexist and collaborate effectively. For developers and builders, this means you can prototype innovative agents in OpenClaw, confidently deploy and manage them through Mercury, and even swap out underlying components or frameworks without needing to rewrite your entire orchestration logic. It is a clear bet on standardization and open integration over proprietary walled gardens, benefiting the entire AI agent community.
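For readers who have not looked at MCP on the wire: it is built on JSON-RPC 2.0, so an agent-to-tool request is just a structured JSON envelope. The example below uses the real `tools/call` method name from the MCP specification, but the tool name and arguments are made up for illustration:

```python
import json

# A minimal JSON-RPC 2.0 envelope of the kind MCP exchanges on the wire.
# "tools/call" is an MCP method; the params shown are illustrative only.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "search", "arguments": {"query": "q3 roadmap"}},
}

# Serialize for transport, then decode on the other side: the round trip
# is lossless, which is what makes the protocol implementation-agnostic.
wire = json.dumps(request)
decoded = json.loads(wire)
```

Because any framework that can produce and consume these envelopes is interchangeable from the orchestrator's point of view, standardizing on MCP is what lets Mercury stay agnostic about what runs inside each node.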
Dogfooding Insights: Running 30 Agents with 3 People
The Mercury team has rigorously tested their platform by “dogfooding” it for months, running an impressive 30 agents to support a lean three-person team. These agents handle diverse functions such as scheduling, sales operations, engineering triage, and finance. This high density of agents per human is a compelling vision for the future of work, but it unequivocally demands robust orchestration discipline to function effectively.
Their key insight from this intensive internal use was that agent specialization consistently outperforms generalization. Instead of attempting to build one monolithic agent that tries to do everything, they found greater success in deploying narrow, highly specialized agents that delegate tasks to each other. For example, a meeting-scheduling agent might delegate to a timezone-resolution agent, which in turn delegates to a conflict-checking agent. Each individual node in this graph-based system is kept relatively simple and focused. The true power and complexity emerge from the intelligent connections and delegation pathways between these specialized agents. This microservices approach to AI agents necessitates a robust and intelligent orchestration layer to handle the intricate message passing, state management, and potential failure modes inherent in such a distributed system.
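The scheduling example above maps naturally onto a chain of narrow agents, each doing one thing and handing the enriched request downstream. This sketch is hypothetical (the agent names and the dict-merging "work" functions are stand-ins for real agent logic), but it shows why each node can stay simple:

```python
class SpecialistAgent:
    """A narrow agent: does one thing, optionally delegating downstream."""
    def __init__(self, name, work, downstream=None):
        self.name = name
        self.work = work              # the agent's single responsibility
        self.downstream = downstream  # next agent in the chain, if any

    def handle(self, request: dict) -> dict:
        result = self.work(request)
        if self.downstream is not None:
            return self.downstream.handle(result)
        return result

# scheduler -> timezone resolver -> conflict checker, each kept simple.
conflicts = SpecialistAgent("conflicts", lambda r: {**r, "conflict_free": True})
timezones = SpecialistAgent("timezones", lambda r: {**r, "tz": "UTC"}, conflicts)
scheduler = SpecialistAgent("scheduler", lambda r: {**r, "slot": "Fri 15:00"}, timezones)

out = scheduler.handle({"meeting": "design review"})
```

No single agent knows the whole workflow; the final result is assembled purely by the delegation chain, which is the microservices analogy made literal, and also why the chain needs an orchestrator once failures and retries enter the picture.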
Mercury vs Traditional Orchestration: A Technical Comparison
Understanding Mercury’s unique value proposition requires a comparison against existing approaches and frameworks for managing AI agents. The following table highlights key differentiators:
| Feature | Mercury | LangChain | AutoGen | Custom Code |
|---|---|---|---|---|
| Visual Editing | Native canvas for graph-based workflow design | Requires code for orchestration logic, limited visualizers | Focuses on conversational group chat, limited visual flow | None, entirely code-driven |
| Persistent Tasks | Built-in delegation primitive with task state management in orchestration layer | Memory modules often agent-specific, less inherent persistence across agents | Group chat history for context, not robust task persistence | Requires manual database or state management implementation |
| Human-in-the-Loop | Default approval flow for real-world actions, graduated autonomy | Requires custom implementation and integration with external UIs | Human proxy agent available, but less integrated with enterprise approval flows | Requires custom UI development and workflow integration |
| Protocol Support | Native MCP (Model Context Protocol) compatibility for broad agent interoperability | Primarily LangChain Expression Language (LCEL) and its own protocol | AutoGen-specific communication protocol for group chat | Any protocol, depending on custom implementation |
| Tool OAuth | Granular, per-agent OAuth via Composio for 800+ integrations | Shared credentials or manual management for tool access | Shared credentials or manual management for tool access | Manual implementation of OAuth for each tool and agent |
| Agent Types | Supports OpenClaw, Claude Code, Devin, and any MCP-compatible agent | Primarily LangChain agents; can integrate others via wrappers | Primarily AutoGen agents; can interact with LLMs | Any agent type, depending on custom development |
Mercury differentiates itself significantly through its native visual graph model and robust persistent task system. Where frameworks like LangChain require you to explicitly code your orchestration logic, and AutoGen primarily focuses on conversational group chat paradigms, Mercury elevates the orchestration graph itself to be the primary interface and runtime environment. The implementation of per-agent OAuth and native MCP support makes Mercury inherently more enterprise-ready and secure compared to typical notebook-based or script-driven workflows, while its no-code canvas dramatically lowers the barrier for operations teams to manage and deploy agent systems without requiring extensive engineering support.
What This Means for OpenClaw Developers
For developers actively building with OpenClaw, Mercury represents a significant and compelling potential deployment target, especially for enterprise scenarios. You can continue to develop your agents using OpenClaw’s flexible and open-source framework, benefiting from its capabilities and community. However, when the time comes to integrate these agents with complex corporate systems like Salesforce, Workday, or internal legacy applications, Mercury’s orchestration layer provides the necessary bridge.
The adapter-based approach means you do not have to choose between OpenClaw’s inherent flexibility and Mercury’s enterprise-grade features; you get the best of both worlds. This also opens up new avenues for monetization and productization. OpenClaw developers can build highly specialized agents, package them for deployment on Mercury’s canvas, and offer them to operations teams who require specific functions, such as a “compliance checker,” “invoice processor,” or “customer support triager,” without requiring those teams to write a single line of code. This symbiotic relationship can accelerate the adoption of OpenClaw in business-critical applications.
Getting Access: The Alpha Evaluation Guide
Mercury is currently in an alpha phase, and you can request access via their website at mercury.build. If you are interested in testing it with your existing OpenClaw setup, here is a recommended approach:
- Identify Core Agents: Begin by identifying three to five OpenClaw agents that you currently run manually or manage with ad-hoc scripts. These should ideally be agents that interact with each other or with human operators.
- Connect and Map: Once approved for access, use the native OpenClaw adapter provided by Mercury to connect your selected agents to the platform’s canvas. Then, visually map your current workflows by drawing delegation edges between these agents, representing how they currently pass tasks or information.
- Test Human-in-the-Loop: Trigger an action that inherently requires human approval (e.g., sending an email, posting to a public Slack channel, or making a calendar entry). Observe how Mercury’s human-in-the-loop mechanism funnels these requests through the web UI, Slack, or iMessage for approval.
- Evaluate Persistent Tasks: Test the resilience of Mercury’s persistent task system. Initiate a multi-step workflow involving several OpenClaw agents, and then deliberately restart one or more of your OpenClaw instances mid-process. Verify that the task state is maintained by Mercury and that the workflow can resume seamlessly once the agents are back online.
- Assess Composio Integrations: Explore the 800+ Composio integrations. Evaluate which of these could replace or enhance your current tool authentication and interaction setup, potentially simplifying your agent’s code by offloading tool access to Mercury.
- Memory Behavior Feedback: Pay close attention to how your OpenClaw agents manage context and memory when orchestrated by Mercury. Does your OpenClaw agent handle context better when Mercury manages the task state, or are there conflicts with your existing memory implementations? The Mercury team is actively soliciting detailed feedback on their memory architecture question, and your observations will be valuable contributions.
By following these steps, you can thoroughly evaluate Mercury’s capabilities and provide meaningful feedback to help shape its future development.
Looking Ahead: Orchestration Layers and Agent Intelligence
The launch of Mercury marks a notable shift in the broader AI agent ecosystem. The focus of innovation is moving from simply enhancing raw agent capabilities and individual intelligence to developing sophisticated coordination systems. We may be entering the "Kubernetes era" for AI agents, where robust orchestration becomes the critical layer for managing complex, distributed AI operations.
Just as container orchestration became the indispensable layer for deploying and managing microservices at scale, agent orchestration will become the essential infrastructure for AI operations. The long-term winners in this space will likely not be the agents with the largest context windows or the most advanced foundational models in isolation, but rather the orchestration layers that can effectively coordinate hundreds or even thousands of specialized agents without descending into chaos.
As the market matures, closely watch how Mercury resolves their internal memory architecture question; their chosen solution is likely to become a standard pattern or influence best practices across the industry. Furthermore, anticipate the development of pricing models that center around task volume and operational value rather than simple token counts, reflecting the shift towards managed agent services. Finally, expect integrations with frameworks like OpenClaw to become a key differentiator, as robust, open-source compatibility will be crucial for widespread adoption in this rapidly evolving and increasingly complex domain.