How should persistent memory be structured for agents?
Is this a database problem, a retrieval problem, or a representation problem? The answer shapes whether agent memory looks more like a knowledge graph, a vector store, or something entirely new.
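To make the contrast concrete, here is a minimal sketch (all names hypothetical, not a reference to any real system) of the same remembered fact stored two ways: a vector store answers "what is similar to this query?", while a graph store answers "what is connected to this entity?". The dot-product similarity is a toy stand-in for a real embedding index.

```python
from dataclasses import dataclass, field

@dataclass
class VectorMemory:
    """Retrieval-oriented memory: similarity search over embeddings."""
    items: list = field(default_factory=list)  # (embedding, text) pairs

    def add(self, embedding, text):
        self.items.append((embedding, text))

    def nearest(self, query):
        # Toy similarity: a dot product stands in for a real ANN index.
        def dot(a, b):
            return sum(x * y for x, y in zip(a, b))
        return max(self.items, key=lambda item: dot(item[0], query))[1]

@dataclass
class GraphMemory:
    """Representation-oriented memory: explicit (subject, relation, object) triples."""
    triples: set = field(default_factory=set)

    def add(self, subject, relation, obj):
        self.triples.add((subject, relation, obj))

    def about(self, subject):
        return {(r, o) for s, r, o in self.triples if s == subject}

vm = VectorMemory()
vm.add([1.0, 0.0], "user prefers dark mode")
vm.add([0.0, 1.0], "deploy target is us-east-1")
print(vm.nearest([0.9, 0.1]))  # -> user prefers dark mode

gm = GraphMemory()
gm.add("user", "prefers", "dark mode")
print(gm.about("user"))  # -> {('prefers', 'dark mode')}
```

The vector store retrieves by fuzzy similarity but cannot answer structured queries; the graph answers structured queries but only about facts someone chose to encode as triples. The open question is whether agent memory needs one, both, or a representation neither captures.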
What patterns are reliable enough to deserve first-class composability in agent systems?
The question isn't what's theoretically irreducible, but what works consistently enough in practice to elevate into a shared building block. We're studying patterns across agent frameworks and real deployments to find that signal.
Related: Can agent behavior be fully specified in a declarative format?
We're testing whether markdown or YAML can fully capture agent behavior, or whether control flow inevitably requires imperative code. The mdagent project is our proving ground for this boundary.
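As an illustration of where that boundary shows up, here is a hypothetical sketch (not mdagent's actual format) of an agent spec expressed as plain declarative data, plus a tiny interpreter. Linear steps serialize naturally to markdown or YAML; the retry loop is the kind of control flow the declarative layer can only hint at, leaving the real semantics to imperative code.

```python
# Hypothetical declarative spec: this could round-trip through YAML or
# a markdown front-matter block without losing information.
SPEC = {
    "name": "summarizer",
    "steps": [
        {"action": "fetch", "source": "notes.txt"},
        {"action": "summarize", "max_words": 50, "retries": 2},
    ],
}

def run(spec, tools):
    """Interpret the spec. Note the retry loop: control flow the
    declarative format names ('retries: 2') but cannot itself express."""
    results = []
    for step in spec["steps"]:
        action = tools[step["action"]]
        attempts = step.get("retries", 0) + 1
        for i in range(attempts):
            try:
                results.append(action(step))
                break
            except RuntimeError:
                if i == attempts - 1:
                    raise
    return results

# Stub tools so the sketch runs standalone.
tools = {
    "fetch": lambda step: f"contents of {step['source']}",
    "summarize": lambda step: f"summary (<= {step['max_words']} words)",
}
print(run(SPEC, tools))
```

Whether every such interpreter detail can be pushed back into the declarative layer, or whether some imperative residue is inevitable, is exactly the boundary being tested.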
Related: What patterns are reliable enough to deserve first-class composability in agent systems?
When should the human specify precisely, and when should the agent decide? This is about finding handoff points that feel natural rather than bureaucratic.
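One candidate handoff rule, sketched minimally (the `irreversible` flag and the names here are assumptions for illustration, not a proposed API): the agent acts on its own for routine steps and pauses for human sign-off only on actions marked irreversible.

```python
def execute(actions, confirm):
    """Run a plan, deferring to the human only at marked handoff points.

    confirm: callable invoked for irreversible actions; returns True to proceed.
    """
    done = []
    for action in actions:
        if action.get("irreversible") and not confirm(action):
            continue  # human declined; skip rather than proceed
        done.append(action["name"])
    return done

plan = [
    {"name": "format code"},
    {"name": "delete branch", "irreversible": True},
]
print(execute(plan, confirm=lambda a: False))  # human says no -> ['format code']
print(execute(plan, confirm=lambda a: True))   # -> ['format code', 'delete branch']
```

A rule this crude is clearly bureaucratic in some settings and too permissive in others; finding thresholds that feel natural is the open part of the question.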
System design quality matters, not just output quality. We're looking for evaluation frameworks that capture elegance, maintainability, and composability — not just correctness.
When can a human look at a system's state and actually understand what it will do next? This cuts across visualization, inspection tools, and the fundamental question of how much complexity can be made transparent.
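One way to make that question concrete, as a deliberately small sketch (the state fields are invented for illustration): if the next action is a pure function of an inspectable state value, a human who can read the state can predict the behavior. Hidden mutable state is what breaks that guarantee.

```python
def next_action(state):
    """Pure function of visible state: inspecting `state` fully
    determines what the system does next."""
    if state["pending_reviews"]:
        return f"review {state['pending_reviews'][0]}"
    if state["inbox"]:
        return f"triage {state['inbox'][0]}"
    return "idle"

state = {"pending_reviews": ["PR-12"], "inbox": ["bug report"]}
print(next_action(state))  # -> review PR-12
```

Real agent systems fall short of this ideal in both directions: state too large to read, and behavior that depends on model internals no dump exposes. How much of each can be recovered by better visualization and inspection tools is the research question.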
This question defines the boundary of what needs to be built into agent runtimes versus what can be left to the model. It emerged from observing that some prompt constructs genuinely expand model capabilities while others are redundant.