Stigmergic Objects
Typed Shared State for Multi-Agent Coordination
Marcus is building a competitive analysis. He has Claude Code open, and the task is straightforward: research five companies, find patterns and contradictions across the findings, and produce a summary report. He starts with Company A.
"Research Acme Corp's market position, pricing model, and recent product launches. Write the findings to /research/acme.md."
Claude produces a clean markdown file. Marcus reads it, nods, and repeats the process for Companies B through E. Within an hour, he has five files in a folder. Each one is a different length. Company B's file is three paragraphs; Company D's is two pages. They use different heading structures. Company C has a "Pricing" section; Company A calls it "Revenue Model." Marcus can't tell which files are thorough and which ones the agent abandoned halfway through, because there is no schema that defines what a complete research file looks like.
He moves to analysis. He asks a second agent to read all five files and find contradictions. "Look through /research and identify any claims that conflict with each other." The agent produces contradictions.md. Marcus opens it and reads: "Company B's claim about enterprise pricing appears to conflict with Company D's market positioning." He wants to check the original sources, but the analyst has paraphrased the claims rather than pointing to specific locations. He spends ten minutes searching through two files to find what the analyst was referring to.
Next, a writer. Marcus asks a third agent to synthesize the research and the contradictions into a final report. He points it at both the /research folder and contradictions.md. The writer produces something that reads well, but when Marcus compares it against the contradictions file, two of the seven contradictions are missing entirely. He checks the context window. The writer's input was too long; part of the contradictions file was truncated. The writer didn't mention the omission, and Marcus didn't know anything was missing until he read everything himself.
Now Marcus hits a real problem. Company B's research was thin to begin with. He wants to re-run just that one company. But the analyst already incorporated Company B's findings into the contradictions file. The writer already cited the analyst's output. Re-running the researcher on Company B means re-running the analyst, which means re-running the writer. Three agents, three files, a full cascade. He can't update one piece without manually propagating the changes through the pipeline.
A colleague suggests adding a fact-checker between the analyst and writer. Marcus looks at his folder structure. There is no "between." There are files in a directory. To insert a new step, he has to decide which files the fact-checker reads, where it writes its output, update the writer's prompt to consume the fact-checker's output instead of the analyst's, and, if the fact-checker sends corrections back, modify the analyst to accept a kind of input it was never designed for. One new agent. Four integration changes. He hasn't even decided what the fact-checker should check.
By hour five, Marcus is spending more time orchestrating file inputs and outputs than performing the analysis itself. He tracks the entire coordination logic manually: which agent reads which files, in what order, what constitutes "done" for each step, and what to re-run when something changes. The next day, he tries to hand the pipeline to a colleague. He can't. The coordination logic lives in his memory, not in the system.
The problem underneath
The files in Marcus's /research folder lack four properties that coordination requires.
They have no type. Each file is a bag of text. Nothing declares what fields a complete research file should contain, so there is no way to check whether Company B's thin file is missing sections or simply brief.
They have no lifecycle. A file that was written ten minutes ago and a file that has been reviewed, revised, and approved look identical. "Draft" and "final" are conventions Marcus tracks in his head. Nothing prevents the writer from reading a file the researcher hasn't finished yet.
They have no operations. The only things you can do with a file are read it and overwrite it. There is no "add a claim with a source and a confidence score." There is no "flag a contradiction between claim 7 and claim 12." Structured operations on individual claims simply don't exist.
They have no history. When the analyst flags a contradiction, Marcus can't trace that judgment back to the specific claims it references, the order in which the analyst considered them, or whether the analyst changed its mind. The file records a final state, not a process.
Wherever multiple agents share mutable state through plain files, these four missing properties cause coordination problems. The pattern shows up in Claude Code, Cursor, Windsurf, and other agentic coding tools, where the human assigns agents to specific files, reviews their changes, and decides which edits to keep. The human is the coordinator, and the coordination is implemented entirely through file management operations.
What we tried
We've spent two years working on this problem across four prototypes. We used each failed prototype to refine the requirements for the next.
Our first attempt was a visual builder: a node-and-flow editor where you could wire together prompts, tools, and data sources on a canvas. We expected the visual interface to change how people designed and debugged AI systems. What we found was that users arranged their prompt nodes in different layouts, but the prompts themselves didn't change. The canvas made it easier to organize AI calls, but it did not help users specify the behavior of those calls. The canvas showed boxes (prompts) and lines (data flow), but those structures carried no machine-readable semantics. This was in late 2023, before agent-equipped coding tools existed. Nobody had experienced Marcus's problem yet, so nobody was looking for the solution we were building.
Our second attempt borrowed the actor model from our earlier work in blockchain systems. Agents would be actors, communicating through typed messages. The design encoded topology directly into the message-passing graph. Adding an agent meant rewiring every connection it touched. The same inflexibility Marcus experienced with files, but now implemented in code instead of in a folder structure.
Our third attempt was a document system. Objects lived inside documents with structured schemas and version history. This was closer, but the document metaphor implied features like collaborative editing, cursors, and presence that were irrelevant to the coordination problem. Agents don't need to see each other's cursors. They need to see each other's structured outputs.
The fourth design removed these abstractions. The agents don't need to talk to each other. They don't need a visual canvas. They don't need a document editor. They need objects: shared state with a defined type, lifecycle, operations, and history. Coordination happens through updates to shared state rather than direct messages between agents. The technical term for this is stigmergy: indirect coordination through modification of a shared environment, in which each agent modifies a shared medium that other agents later read.
Stigmergic objects
We built a system called stigmergic objects, a TypeScript library with two layers.
The first layer is an Environment: a type registry and instance manager. A developer defines objects as plain TypeScript classes. A ResearchBoard has claims, contradictions, and gaps. A Report has sections, an abstract, and a status. A Deployment has stages and transition rules. These classes carry static metadata (a name, a description, and a map of method signatures) so that agents can discover what operations are available without reading source code. The Environment wraps each object in a reactive proxy for change tracking, manages snapshots and undo, and emits events on every mutation. Six generic tools expose the Environment to LLM agents: list what exists, inspect a type's schema, query an instance's state, invoke a method, create a new instance, or execute arbitrary code against the environment.
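As a sketch, an object defined this way might look like the following. The exact metadata shape (a static `meta` with name, description, and method map) and the field names are our illustration of the description above, not the library's actual API:

```typescript
// Hypothetical sketch of an object definition. The `meta` shape and
// field names are assumptions; only the general idea (plain class +
// static metadata for agent discovery) comes from the text.
interface MethodSig {
  description: string;
  params: Record<string, string>; // param name -> type hint shown to the agent
}

interface Claim {
  id: number;
  text: string;
  source: string;
  confidence: number;
  tags: string[];
}

type BoardStatus = "researching" | "analyzing" | "writing";

class ResearchBoard {
  // Static metadata lets agents discover operations without reading source.
  static meta = {
    name: "ResearchBoard",
    description: "Shared board of research claims, contradictions, and gaps.",
    methods: {
      addClaim: {
        description: "Record a sourced claim with a confidence score.",
        params: { text: "string", source: "string", confidence: "number", tags: "string[]" },
      },
      setStatus: {
        description: "Advance the board's lifecycle stage.",
        params: { status: "'researching' | 'analyzing' | 'writing'" },
      },
    } as Record<string, MethodSig>,
  };

  status: BoardStatus = "researching";
  claims: Claim[] = [];

  addClaim(text: string, source: string, confidence: number, tags: string[]): number {
    const id = this.claims.length + 1;
    this.claims.push({ id, text, source, confidence, tags });
    return id;
  }

  setStatus(status: BoardStatus): void {
    this.status = status;
  }
}
```

The method signature is the contract: an agent that inspects `ResearchBoard.meta` sees exactly which fields a well-formed claim needs.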
The second layer is an Orchestrator that composes agents into pipelines. We implemented four topology patterns, each averaging 70 lines of code. Sequential runs agents in order, passing state through the environment. Parallel runs them concurrently with an optional merge step. Router classifies input and dispatches to a handler. Shared Artifact lets agents take turns refining the same object until it converges. Agents coordinate stigmergically: they read and write shared objects rather than messaging each other. The orchestrator manages agent lifecycle, enforces constraints (which objects an agent can modify, how many turns it gets), and streams events.
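To make the sequential pattern concrete, here is a minimal runner in that spirit, simplified to synchronous agents and a plain `Map` as the environment. The names and signatures are illustrative assumptions, not the orchestrator's actual API:

```typescript
// Illustrative sketch of a sequential topology: agents run in order and
// coordinate only by reading and writing shared objects in the environment.
// Simplified to synchronous agents; the real runner manages turns, limits,
// and events.
type Agent = { name: string; run: (env: Map<string, unknown>) => void };

function runSequential(agents: Agent[], env: Map<string, unknown>): void {
  for (const agent of agents) {
    agent.run(env); // no messages pass between agents directly
  }
}
```

The point of the sketch is what is absent: there is no routing table and no message channel, only the shared environment passed to each agent in turn.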
The core operation is method invocation. When an agent calls a method on an object, the Environment snapshots the state before the call, dispatches the method, diffs what changed, records the change in the object's history, and emits an event:
// Snapshot the state before the call so the diff has a baseline.
const prevState = structuredClone(obj.raw);
const result = obj.instance[method](...paramValues);
const nextState = structuredClone(obj.raw);
// Compute what changed, record it in the object's history, and notify listeners.
const changes = this.diffState(prevState, nextState);
obj.snapshots.push({ timestamp: Date.now(), state: prevState, trigger: method });
obj.history.push({ timestamp: Date.now(), method, args, changes });
this.emit({ type: "change", id, method, args, prev: prevState, next: nextState });
Every interaction goes through this path. There is no way to mutate an object without generating a history entry and an event.
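For illustration, a shallow key-level diff in the spirit of diffState might look like this. The real implementation may diff more deeply; this sketch is an assumption:

```typescript
// A plausible shallow diff: compare the top-level keys of two state
// snapshots and report which changed. Serializing values for comparison
// sidesteps reference equality on nested structures.
interface Change { key: string; prev: unknown; next: unknown }

function diffState(
  prev: Record<string, unknown>,
  next: Record<string, unknown>,
): Change[] {
  const changes: Change[] = [];
  const keys = new Set([...Object.keys(prev), ...Object.keys(next)]);
  for (const key of keys) {
    if (JSON.stringify(prev[key]) !== JSON.stringify(next[key])) {
      changes.push({ key, prev: prev[key], next: next[key] });
    }
  }
  return changes;
}
```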
We describe the system's key design decisions in five parts: why agents coordinate through objects rather than messages, why we started with six generic tools and what we learned from that, why termination conditions take the full environment, why convergence detection lives in the object, and why we used plain TypeScript classes with no base class or decorators.
Coordination through objects, not messages
The central design choice is that agents never communicate directly. There is no message channel between the researcher and the analyst. Instead, the researcher writes claims to a ResearchBoard. The analyst reads the ResearchBoard, flags contradictions, and identifies gaps. The writer reads both the board and the report object, and produces sections. Each agent's context contains its system prompt and the current state of the objects it can see. No message history from other agents.
Most multi-agent frameworks coordinate agents through message passing rather than shared objects. In AutoGen, agents are chat participants who reply to each other in a conversation thread. The model is natural for tasks that require discussion, but every agent accumulates every other agent's conversation history. Context grows with the number of agents and the number of turns. In LangGraph, agents are nodes in a directed graph, and edges define message flow. The graph is explicit and inspectable, but adding an agent means adding nodes and edges. In CrewAI, agents have roles and goals, and tasks define dependencies. The coordination is implicit in the task structure, but the shared state between agents is unstructured output strings.
Message passing has real strengths. The data flow is explicit: you can trace what each agent received. For pipelines with fixed participants and message flows, message passing can be easier to reason about because each agent has a predefined set of inputs and outputs. But it encodes the topology into the code, so adding an agent means rewiring every connection it touches.
Shared state reduces the need to update message routing when you add or remove agents: agents read and write a common structure without encoding topology anywhere. But without lifecycle rules or type safety, nothing prevents one agent from overwriting another's work.
| | Message Passing | Shared State | Stigmergic Objects |
|---|---|---|---|
| Data flow | Explicit | Implicit | Implicit + typed |
| Topology | Encoded in wiring | None | None |
| Adding an agent | Rewire connections | Just connect | Grant object access |
| Integrity | Guaranteed by protocol | None | Enforced by schema + lifecycle |
Stigmergic objects provide shared state with type safety, lifecycle enforcement, and structured operations. Agents coordinate by reading and writing shared objects rather than by passing messages defined in the orchestration framework. Adding an agent means giving it access to existing objects and specifying which ones it can modify. No existing code changes.
The tradeoff we accepted is that there may be cases where agents genuinely need to negotiate or ask each other questions. Pure stigmergy assumes the objects carry enough information for the next agent to proceed; if they don't, there's no fallback. We have not hit this limitation in our demos, but tasks involving ambiguity or disagreement may require explicit negotiation channels, and we have not evaluated stigmergic objects on such tasks.
What happened when we ran it
To see how stigmergic objects behave in practice, we ran three demos. The first is the closest analog to Marcus's scenario: a research pipeline. The second is an adversarial optimization loop that tested coordination under a different pressure. The third is a standalone walkthrough of the Environment API with no LLM calls.
The research pipeline
We configured three agents (researcher, analyst, writer) in a sequential topology with two shared objects: a ResearchBoard and a Report. Each agent could list, inspect, query, and invoke, but not create new objects or execute arbitrary code. The researcher and analyst were read-only on the Report. The writer was read-only on the ResearchBoard. The topic was "the current state of AI agent frameworks."
The researcher ran for five turns and stopped on its own. It produced twelve claims about frameworks including LangChain, LangGraph, AutoGen, CrewAI, and LlamaIndex. Each claim had a text field, a source, a confidence score (ranging from 0.62 to 0.80), and tags. The researcher set the board's status to "analyzing" when it finished.
The ResearchBoard's addClaim method has four parameters: text, source, confidence, and tags. We never instructed the researcher to cite sources or assign confidence scores. It inferred the expected format from the method signature. The object's type system constrained the agent's outputs; in this run it produced well-formed claims without additional prompt instructions. Whether this generalizes across different LLMs and schema complexities is unknown; we have only this single run as evidence.
The analyst ran for five turns and stopped. It found one contradiction out of twelve claims and flagged it: two claims about whether agent loops are primitive components or whether structured state replaces loops entirely. The analyst characterized this as an evolution rather than a strict contradiction. It identified seven research gaps (comparative adoption data, benchmarking, security, memory management, planning capabilities, deployment patterns, and source diversity) and set the board's status to "writing."
The writer ran for four turns and stopped. It produced a six-section report with an abstract that cited specific claim IDs from the board. The report's status was set to "final." Every claim in the report could be traced back to the researcher, through the board, to the original method call that added it.
The read-only constraints held without incident. No agent attempted to violate them. The writer never touched the board; the researcher never touched the report. Structural permissions alone enforced the read-only behavior; no additional prompt instructions were required.
The entire pipeline produced a structured event stream: 12 addClaim events, 1 flagContradiction event, 7 addGap events, 6 addSection events, 2 setStatus events, 1 setAbstract event. Every event carries the object ID, the method name, the arguments, and the before/after state diff. This observability came from the framework's event model; we did not add any explicit logging code.
Hill-climbing optimization
The second demo tested a different coordination pattern: two agents (a designer and a critic) iterating on a rate limiter API design. The topology was shared artifact, meaning the agents took turns until the Solution object detected convergence. Convergence was defined as a score of 90 or above, or two consecutive scores within 5 points of each other with both above 75. The maximum was five rounds.
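The convergence rule as stated can be expressed directly. This sketch is our reading of the criteria, not the Solution object's actual code:

```typescript
// Convergence as described: the latest score is 90 or above, or the last
// two scores are within 5 points of each other with both above 75.
function hasConverged(scores: number[]): boolean {
  const n = scores.length;
  if (n === 0) return false;
  const last = scores[n - 1];
  if (last >= 90) return true;
  if (n < 2) return false;
  const prev = scores[n - 2];
  return Math.abs(last - prev) <= 5 && last > 75 && prev > 75;
}
```

Because the check lives with the score history, the object itself can tell the orchestrator when to stop, a point that becomes relevant below.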
The designer examined the Solution object, proposed an API, and the critic scored it. Then the designer revised based on the feedback, the critic scored again, and so on.
The interaction between the designer and critic revealed a failure mode: the agents optimized against each other's feedback instead of improving the API. The designer addressed the critic's round-one feedback ("the API is too simple, it doesn't cover edge cases") by adding more structure. The critic then penalized the growing complexity. Each round, the designer responded to "this is too complex" by adding more precision, which the critic read as still more complexity. The pair spiraled rather than converged.
The agents alternated turns correctly, the Solution object tracked scores, and convergence checks ran after each round. The coordination layer behaved as designed; the unexpected behavior came from how the LLMs responded to each other's feedback.
This failure exposes two behaviors of LLM-based optimization that matter for system design. First, LLM-based optimization doesn't behave like gradient descent. There is no fixed loss surface. The evaluator's interpretation shifts as the artifact changes shape. What the critic meant by "simplicity" in round one was not what it meant in round three. The fitness landscape was non-stationary because the evaluator itself changed.
Second, despite not converging on the metric, the process produced a substantive artifact. The final rate limiter API was a real, thought-through TypeScript design with branded numeric types, store protocols, and composite limiters. The scores said it was getting worse; the artifact suggested otherwise. We don't have a clean resolution for this divergence. It points to a research question about how to make LLM evaluation consistent enough for iterative optimization.
What we found
We used the system for two demo sessions over the course of a day. This is a narrow empirical base, and we want to be precise about what we can and cannot claim from it. All observations below are from a single developer running demos with one LLM provider (Azure OpenAI, gpt-5.2). We have not had external users, and we have not used the system for production work.
Several patterns appeared during the demo sessions:
Objects shape agent behavior. The researcher produced claims with sources and confidence scores because the method signature asked for them. The analyst produced structured gap analyses because addGap had a description and severity parameter. We didn't write detailed instructions for output format. The objects' affordances defined which actions were available through their method signatures, and the agents used those methods accordingly. This mechanism differs from prompt engineering: the shared objects serve as coordination artifacts that both agents and human users can read and update to meet their respective information needs.
Structural constraints outperformed prompt-level constraints. Read-only enforcement was absolute. No agent in any run attempted to write to an object it was constrained from modifying. Anecdotally, prompt-level instructions like "do not modify the research board" are often ignored by LLMs, especially under long contexts or complex tasks. The structural constraint left nothing to interpret. We suspect this difference would hold across models, but we have only tested one.
Stigmergic coordination was simple to implement. Our initial hypothesis, from the spec written before implementation, was that multi-agent coordination could reduce to "multiple agent configs taking turns interacting with the same environment." This held up. The four topology runners average 70 lines of code each. All complexity is in the agent-tool bridge (integrating with the LLM runtime) and the agent runner (managing turns and termination), not in coordination logic. There is no coordination logic. The objects carry it.
Event streams provided observability without effort. Because every mutation goes through Environment.invoke, the system produces a complete, structured log of every action taken by every agent. We did not build a logging framework or an observability layer; the event emission in Environment.invoke produced the complete log. In the research pipeline, the event stream reads as a play-by-play: "researcher added claim about LangGraph with confidence 0.75," "analyst flagged contradiction between claims 3 and 7," "writer added section 'Framework Landscape' citing claims 1, 3, 5, 8." This level of traceability typically requires additional instrumentation; in this design it comes from the core event mechanism.
Convergence self-detection worked mechanically but not evaluatively. The Solution object's convergence logic (score plateau detection) ran correctly at every step. It checked the scores, compared deltas, and accurately reported "not converged." The problem was that the scores themselves were unstable. The object faithfully tracked what the critic reported; the critic's reports were inconsistent. This suggests that convergence detection belongs in the object, where it can encode domain-specific criteria, but the criteria need to account for non-monotonic evaluation, which ours did not.
Something we didn't plan for: the Solution object could detect its own convergence. We had given it a score history and a boolean computed from the last two entries, but we hadn't expected this to change the topology's behavior. In earlier prototypes, the orchestration layer decided when to stop. Here, the object told the orchestrator it was done. Termination logic moved from the runner into the Solution object, which had enough state to signal when the topology should stop.
What doesn't work
Token usage tracking is broken in our current implementation. The runner reads token counts from agent events, but different LLM providers structure those fields differently, so every agent reports zero tokens. This matters in production: without clear token usage, teams avoid running or iterating on the system because they cannot predict or control costs. The fix involves a normalization layer for provider-specific token fields, but we haven't built it.
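A sketch of what such a normalization layer could look like. The field spellings tried here are illustrative assumptions about provider response shapes, not a survey of real APIs:

```typescript
// Normalize provider-specific token fields into one shape. The candidate
// field names below are examples of the kind of variation involved.
interface TokenUsage { input: number; output: number }

function normalizeUsage(raw: Record<string, unknown>): TokenUsage {
  const num = (v: unknown): number => (typeof v === "number" ? v : 0);
  return {
    // Try common spellings in order; fall back to zero if none match.
    input: num(raw["prompt_tokens"]) || num(raw["input_tokens"]) || num(raw["inputTokens"]),
    output: num(raw["completion_tokens"]) || num(raw["output_tokens"]) || num(raw["outputTokens"]),
  };
}
```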
The six generic tools add overhead from extra tool calls and token usage. Every agent in the research pipeline spent its first two to three tool calls on env_list and env_inspect before doing any real work: extra tokens, JSON wrapped in strings, and a discovery phase before any substantive output. From the system's perspective, the generic verbs were the point: six tools that worked for any object without code generation. Both designs are defensible; generic tools optimize for reuse across objects, specialized tools for lower per-call overhead. Our current plan is a hybrid: generate direct tools from object methods at agent creation time (board_addClaim(text, source, confidence, tags) with proper typed schemas), and keep env_list and env_query for discovery only. This preserves the coordination benefits while eliminating the interface tax.
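A sketch of how the hybrid could generate one direct tool per object method from its metadata. The metadata and tool shapes here are assumptions:

```typescript
// Generate direct, per-method tool definitions from an object's static
// method metadata, e.g. board_addClaim. The shapes are illustrative.
interface ToolDef {
  name: string;
  description: string;
  parameters: Record<string, string>; // param name -> type hint
}

function generateTools(
  objectId: string,
  methods: Record<string, { description: string; params: Record<string, string> }>,
): ToolDef[] {
  return Object.entries(methods).map(([method, sig]) => ({
    name: `${objectId}_${method}`, // e.g. "board_addClaim"
    description: sig.description,
    parameters: sig.params,
  }));
}
```

At agent creation time this runs once per accessible object, so the agent sees typed, purpose-specific tools instead of generic verbs.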
Concurrent writes are undefined. Two agents writing to the same object simultaneously would produce unpredictable results. The proxy tracks changes but doesn't lock. The parallel topology demo avoided this by having agents write to different objects, but a real parallel pipeline would need either locking, CRDT-style merge, or a constraint that concurrent agents must write to disjoint objects. We haven't built any of these.
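Of the three options, the disjoint-objects constraint is the simplest to sketch; this is an illustration, not part of the library:

```typescript
// Before launching a parallel topology, verify that concurrently scheduled
// agents declare disjoint writable object sets, so no two agents can race
// on the same object.
function assertDisjointWrites(agents: { name: string; writes: string[] }[]): void {
  const owner = new Map<string, string>(); // object id -> agent that claimed it
  for (const agent of agents) {
    for (const id of agent.writes) {
      const prev = owner.get(id);
      if (prev !== undefined) {
        throw new Error(`${prev} and ${agent.name} both write to ${id}`);
      }
      owner.set(id, agent.name);
    }
  }
}
```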
The system is entirely in-memory. There is no persistence, no serialization, no recovery after crash. Everything runs in a single process and evaporates when it stops. For a demo, this is fine. For real work, it means a crash during a pipeline run loses all progress. Persistence is the first thing we would add for production use, but we haven't added it because the coordination model needed to be right first.
We have not tested with more than three agents and two objects in a single pipeline. We don't know how the system behaves with twenty objects and ten agents. Progressive disclosure (agents only load what they inspect) should help, but we haven't measured system prompt size growth or LLM performance degradation at scale.
We have not measured performance in any formal way. The research pipeline completed in approximately two minutes and the hill-climbing demo in approximately four, but we don't have latency breakdowns, token cost per agent, or throughput numbers.
Related ideas
The idea that agents can coordinate through a shared environment rather than through messages has appeared before, in different forms.
Blackboard systems, developed in the 1970s and 1980s for speech understanding and other AI tasks, used a shared data structure that multiple knowledge sources could read and write. A control component decided which knowledge source to activate next. Stigmergic objects are structurally similar: LLM agents are the knowledge sources, and the topology runner is the control component. The key difference is that blackboard entries were typically untyped; stigmergic objects carry their own schema, methods, and lifecycle constraints. The object is not just a data store; its schema and lifecycle rules constrain which operations agents can perform.
The term stigmergy comes from entomology. Pierre-Paul Grassé coined it in 1959 to describe how termites coordinate construction without direct communication: each termite modifies the environment (deposits a mud pellet), and the modification stimulates the next termite's behavior. The key property is that coordination scales sublinearly: adding a termite doesn't increase communication overhead. By analogy, adding an agent to a stigmergic pipeline requires only granting it access to the objects it reads and writes; as long as those objects already expose the needed operations, no existing agent or message channel changes.
Alan Kay's original framing of object-oriented programming centered on objects communicating through messages. Stigmergic objects take this literally, but with a twist: the message senders are LLM agents with probabilistic behavior, not deterministic programs, and the dispatch goes through a reflective environment that can intercept, diff, and record every call.
The Deployment object's lifecycle transitions (build, test, staging, production, with rollback) resemble a smart contract's state machine. The enforcement pattern is the same: rules live in the data, not in the caller. But there is no consensus mechanism, no gas, no immutability. The parallel is in the enforcement pattern, not the implementation.
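As an illustration of rules living in the data rather than the caller, the Deployment lifecycle might be sketched like this. The stage names follow the text; the specific rollback targets are our assumption:

```typescript
// Lifecycle rules encoded in the object: each stage lists the stages it
// may transition to. Forward path: build -> test -> staging -> production.
// Rollback edges are assumed for illustration.
type Stage = "build" | "test" | "staging" | "production";

const allowed: Record<Stage, Stage[]> = {
  build: ["test"],
  test: ["staging", "build"],      // rollback to build on failure
  staging: ["production", "test"], // rollback to test
  production: ["staging"],         // rollback only
};

class Deployment {
  stage: Stage = "build";

  transition(next: Stage): void {
    if (!allowed[this.stage].includes(next)) {
      throw new Error(`illegal transition ${this.stage} -> ${next}`);
    }
    this.stage = next;
  }
}
```

An agent calling `transition` cannot skip stages; the object rejects the call regardless of what the caller's prompt says.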
A useful conceptual frame is Susan Leigh Star's notion of boundary objects: artifacts that inhabit multiple communities of practice and satisfy the informational requirements of each. A ResearchBoard is a boundary object. The object doesn't need to know about the agents, and the agents don't need to know about each other; the object sits at the boundary between them. The researcher uses it for claim collection. The analyst uses it for contradiction detection. The writer uses it for citation. Each agent interacts with the same object through different operations relevant to its role.
What's next
Several questions are open.
The most immediate is the tool interface. We know the six generic tools carry unnecessary friction for LLMs. We believe the right answer is generating typed tools from object methods at agent creation time, keeping the generic tools for discovery; this may reduce token overhead and improve agent accuracy, and we plan to evaluate it empirically. We haven't built it yet.
The harder question is evaluation stability. The hill-climbing demo revealed that adversarial LLM optimization with a shifting evaluator doesn't converge. We don't know whether a fixed rubric stored in the object would solve this, whether a rubric-generating agent that runs once at the start would help, or whether the problem is fundamental to LLM-based evaluation. We currently treat this as a research question and haven't identified an engineering fix.
We also don't know under what conditions stigmergic coordination stops working well. Our demos were all cases where the objects carried enough information for each agent to proceed without asking questions. There are surely tasks where agents need to negotiate, clarify ambiguity, or express disagreement. What those tasks look like, and whether the solution is a message channel alongside the objects or richer objects that can express uncertainty, is something we'd like to explore.
The thing we most want to do next is use this system for real work—a pipeline that produces something we actually need, rather than another demo. The current observations are based on one day of demo execution by a single developer. Sustained use would show which objects we reach for, which operations we add, and what breaks when tasks are messy and stakes are higher. We built the coordination layer; the next step is to use it in day-to-day workflows.
We've published the @imps/semantic-objects package and the demo code. The primitives are small: an Environment class, typed objects with lifecycle transitions, and six bridge tools (soon to be generated direct tools). If you're building multi-agent systems and have found your own version of Marcus's coordination problem, or solved it differently, we'd like to hear about it.
Contact: william@idylliclabs.com