Stop Burning Tokens: What We Learned Replacing an AI Agent with a Dispatch Gateway
We spent a day deploying a multi-agent orchestration system for a creative agency client. Three specialized agents, skill definitions, worker chains, prompt engineering discipline files — the whole nine yards. It worked. Sometimes. When it felt like it.
Then we ripped it all out and replaced it with 400 lines of Python that does the job better, faster, and at 1/100th the cost.
Here’s what we learned.
The Setup
The client needed a Slack bot that could answer questions about their Asana tasks, find files in Dropbox, and check Google Calendar availability. That’s it. Three integrations, read-heavy, straightforward queries.
We reached for an agent framework. It seemed like the right call — agents handle multi-tool workflows, they reason about which tool to use, they chain operations together. The framework we chose (SpaceBot) supports multiple specialized agents, inter-agent routing, skill definitions, and a sophisticated worker system with conversation compaction.
On paper, it was perfect.
What Actually Happened
The $0.70 Calendar Check
The simplest possible query — “what’s on my calendar today?” — would routinely take 5-8 turns to resolve. The agent would:
- Receive the message in the hub agent
- Route it to the PM agent (specialized for calendar/tasks)
- Spawn a worker subprocess
- The worker would read its skill definition
- Attempt to launch a browser (ignoring explicit instructions not to)
- Fall back to the API after the browser failed
- Make the actual calendar API call
- Format the response
- Return through the agent chain
Cost: $0.30-0.70 in tokens. Response time: 2-3 minutes. For a query that should be a single API call.
The Free-Range Problem
We wrote extensive discipline files (SOUL.md) for each agent. “NEVER launch a browser.” “ALWAYS read skills FIRST.” “Do NOT use apt-get.” The workers ignored them roughly 40% of the time. Not because the instructions were unclear — because the underlying model’s training data creates a gravitational pull toward certain behaviors. When a Haiku worker sees “Google Calendar,” its strongest association is “launch browser and authenticate,” not “call REST API with a service account JWT.”
No amount of prompt engineering fixed this reliably. We were fighting the model’s priors on every query.
The Token Furnace
Every query, regardless of complexity, burned through the full agent pipeline:
- Hub agent processes the message and loads its SOUL file (~2,000 tokens)
- Routing decision to specialist (~1,000 tokens)
- Specialist loads its own SOUL, context, and skill definitions (~3,000+ tokens)
- Worker spawns with its own context, SOUL, and skill files (~3,000+ tokens)
- Worker reasons about approach — and every turn re-sends the full context
- Workers regularly get caught in 15-25 turn loops, each pass burning the full context window
- Response formatting through the chain back up
A simple “list my Asana projects” query that returns 10 words of data could consume 100,000+ tokens — because every turn in the chain reloaded the full SOUL file, skill definitions, and accumulated context. A worker caught in a 25-turn reasoning loop passes that entire context every single time. The same query through a direct API call needs about 3,000 tokens total.
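The compounding arithmetic can be sketched in a few lines. The per-step token counts are the rough estimates quoted above, not measurements:

```python
# Back-of-envelope token accounting for the agent pipeline described above.
SOUL_HUB = 2_000         # hub agent SOUL file (~2,000 tokens)
ROUTING = 1_000          # routing decision to the specialist
SOUL_SPECIALIST = 3_000  # specialist SOUL, context, skill definitions
WORKER_CONTEXT = 4_000   # worker SOUL, skills, and accumulated context

def pipeline_tokens(worker_turns: int) -> int:
    """Total tokens for one query: fixed setup plus a full context
    re-send on every worker reasoning turn."""
    setup = SOUL_HUB + ROUTING + SOUL_SPECIALIST
    return setup + worker_turns * WORKER_CONTEXT

print(pipeline_tokens(25))  # a worker stuck in a 25-turn loop: 106000
```

The multiplier is the loop, not the files: the fixed setup is only 6,000 tokens, but re-sending the worker context 25 times pushes the total past 100,000.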
Environment Variable Hell
Workers run as subprocesses. They don’t inherit environment variables from the parent process. We had to create .api_keys files in every agent’s workspace directory, source them at the top of every skill command, and debug silent failures when a key was missing or the source command was forgotten.
This is a framework-level architectural decision that made sense for sandboxed execution but created constant operational friction for our use case.
The Realization
After a day of debugging worker behavior, writing increasingly desperate prompt instructions, and watching the token counter spin, we had a conversation that crystallized the problem:
We were using a reasoning engine for a routing problem.
The client’s queries fall into a handful of categories:
- “What tasks are assigned to me?” → Call Asana API
- “Find the brand assets folder” → Call Dropbox API
- “What’s on my calendar Friday?” → Call Google Calendar API
- “Get me a link to that file” → Call Dropbox share API
There’s no multi-step reasoning required. No planning. No dynamic tool selection across ambiguous problem spaces. The LLM’s job is to understand which tool to call and what parameters to pass. That’s intent classification, not reasoning.
What We Built Instead
We call it OpenEar. It’s a constrained dispatch gateway — not an agent.
The entire dispatch loop:
- Message arrives
- Load tenant config (which tools are available)
- Call Claude with tool schemas + system prompt
- Claude picks a tool and formats the parameters
- Execute the tool (direct REST API call)
- Call Claude with the result to format a response
- Reply
Maximum 2 tool calls per query. No planning loops. No worker chains. No skill definitions. No agent routing.
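A minimal sketch of that loop, with the two Claude calls stubbed out as plain callables. All names here are illustrative, not the actual OpenEar code:

```python
from dataclasses import dataclass
from typing import Callable

MAX_TOOL_CALLS = 2  # the hard cap described above


@dataclass
class ToolCall:
    name: str
    params: dict


def dispatch(message: str,
             classify: Callable[[str], list[ToolCall]],
             tools: dict[str, Callable[..., str]],
             format_reply: Callable[[str, list[str]], str]) -> str:
    """Classify intent, run at most MAX_TOOL_CALLS tools, format a reply."""
    calls = classify(message)[:MAX_TOOL_CALLS]          # never more than two
    results = [tools[c.name](**c.params) for c in calls]  # direct REST calls
    return format_reply(message, results)                 # second LLM call


# Stubbed "LLM" and tool for demonstration:
def fake_classify(msg):
    return [ToolCall("calendar_today", {})]

def fake_calendar_today():
    return "10:00 standup; 14:00 client review"

reply = dispatch(
    "what's on my calendar today?",
    fake_classify,
    {"calendar_today": fake_calendar_today},
    lambda msg, results: f"Today: {results[0]}",
)
print(reply)  # → Today: 10:00 standup; 14:00 client review
```

In the real gateway, classify and format_reply are each a single Claude call; the tools dict is populated from the tenant config.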
The Numbers
| Metric | Agent Framework | Dispatch Gateway |
|---|---|---|
| Tokens per query | 100,000+ | ~3,000 (2 LLM calls total) |
| Cost per query | $0.30-0.70 | ~$0.003 |
| Response time | 2-3 minutes | ~2 seconds |
| Reliability | ~60% first-try success | ~99% |
| Code complexity | 3 agents, skills, SOUL files, routing | 400 lines of Python |
| Config per client | Agent definitions, workspace files, skill scripts | One YAML file |
The cost difference is roughly 100x. Not because we switched models — both systems use Haiku for the actual work. The difference is eliminating the reasoning overhead. The agent framework was spending 80% of its tokens on deciding how to do something. The dispatch gateway spends 100% of its tokens on doing it.
What We’d Tell You Before You Deploy an Agent
1. Classify your queries first
Before choosing an architecture, look at what your users actually ask. If 90% of queries map to “identify intent → call one API → format result,” you don’t need an agent. You need a router.
Agents earn their complexity when queries require genuine multi-step planning with branching logic: “Research this topic, evaluate three options, draft a recommendation, and schedule a review meeting.” If your queries don’t look like that, an agent is overhead.
2. Tool schemas are your prompt engineering
In the agent framework, we spent days writing skill definitions, discipline files, and behavioral constraints — all to get the LLM to call the right API in the right way.
With tool_use mode, Claude’s tool schemas ARE the instructions. You define the function signature, the parameter descriptions, and Claude fills in the blanks. No skill files. No behavioral coaching. The structured format eliminates an entire class of “the LLM didn’t follow instructions” problems.
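For illustration, here is what one tool schema might look like in the Anthropic tool_use format. The tool name and parameters are hypothetical, not the gateway's actual schema:

```python
# One tool definition in the Anthropic tool_use format. The parameter
# descriptions do the work that skill files and discipline docs used to do.
calendar_tool = {
    "name": "get_calendar_events",
    "description": "List Google Calendar events for the user on a given day.",
    "input_schema": {
        "type": "object",
        "properties": {
            "date": {
                "type": "string",
                "description": "Day to query, ISO format (YYYY-MM-DD).",
            },
            "calendar_id": {
                "type": "string",
                "description": "Calendar to read; defaults to 'primary'.",
            },
        },
        "required": ["date"],
    },
}
```

Because the model must emit JSON matching input_schema, "launch a browser instead" simply isn't an available move.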
3. Bound your loops or pay the price
An agent framework’s default behavior is to keep going until it thinks it’s done. “Keep going” means more LLM calls, more tokens, more latency, more opportunities for the model to go off-script.
Our dispatch gateway has a hard cap: 2 tool calls, then you must produce a text response. This isn’t a limitation — it’s a feature. It means costs are predictable, responses are fast, and the system can’t spiral into a 25-turn reasoning loop about how to check a calendar.
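A sketch of what the hard cap looks like in code, assuming an llm callable that returns either a tool request or plain text. The names and response shapes are illustrative:

```python
def run_turn(message, llm, execute_tool, max_tool_calls=2):
    """Call the model; allow at most max_tool_calls tool rounds,
    then force a plain-text answer with tools disabled."""
    history = [{"role": "user", "content": message}]
    for _ in range(max_tool_calls):
        response = llm(history, tools_allowed=True)
        if response["type"] == "text":
            return response["text"]  # model answered directly
        result = execute_tool(response["name"], response["params"])
        history.append({"role": "tool_result", "content": result})
    # Cap reached: the final call has tools disabled, so it must produce text.
    return llm(history, tools_allowed=False)["text"]


# Even a stub model that always wants another tool call terminates:
def greedy_llm(history, tools_allowed):
    if tools_allowed:
        return {"type": "tool_use", "name": "noop", "params": {}}
    return {"type": "text", "text": f"done after {len(history)} messages"}

print(run_turn("hi", greedy_llm, lambda name, params: "ok"))
# → done after 3 messages
```

The key design choice is the final call with tools_allowed=False: the model cannot request a third tool, so the worst case is bounded at three LLM calls instead of a 25-turn loop.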
4. Thread context doesn’t need memory compaction
The agent framework had a sophisticated “compactor” that would summarize long conversations to stay within context limits, using a separate LLM call. We replaced this with a SQLite table that stores the last N user/assistant messages per Slack thread and injects them into the system prompt as plain text.
Cost of the compactor: one Haiku call per message after the context window fills up. Cost of our approach: one SQL query. Zero tokens.
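A minimal sketch of that approach using Python's built-in sqlite3. Table and column names are assumptions, not the actual schema:

```python
import sqlite3

# Thread memory as a plain table: the last N messages per Slack thread,
# injected into the system prompt as text. No compaction LLM call needed.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE IF NOT EXISTS thread_messages (
    thread_ts TEXT,
    role      TEXT,
    content   TEXT,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP)""")

def remember(thread_ts: str, role: str, content: str) -> None:
    conn.execute(
        "INSERT INTO thread_messages (thread_ts, role, content) VALUES (?, ?, ?)",
        (thread_ts, role, content))

def recall(thread_ts: str, n: int = 10) -> list[tuple[str, str]]:
    """Last n messages for a thread, oldest first, ready for the prompt."""
    rows = conn.execute(
        """SELECT role, content FROM thread_messages
           WHERE thread_ts = ? ORDER BY rowid DESC LIMIT ?""",
        (thread_ts, n)).fetchall()
    return list(reversed(rows))
```

Recall is one indexed query; the "context window" is just the LIMIT clause.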
5. Multi-tenant doesn’t need multi-agent
The agent framework required a complete agent configuration per client: workspace directories, identity files, skill definitions, routing rules, inter-agent communication edges.
Our multi-tenant approach: one YAML file per client. Different clients can have different tools, different models, different system prompts, different Slack workspaces. Adding a new client is copying a file and adding credentials. No code changes. No agent configuration. Nothing to redeploy beyond a container restart.
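A hypothetical per-client config might look like this. The field names are illustrative, not the actual schema:

```yaml
# One file per client. Copy, fill in credentials, restart the container.
client: acme-creative
slack_workspace: T0ACME
model: claude-3-5-haiku-latest
system_prompt: |
  You are Acme's assistant. Answer only from tool results.
tools:
  - asana
  - dropbox
  - google_calendar
credentials_file: secrets/acme.env   # API keys live outside the config
```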
Where Agents DO Make Sense
We’re not anti-agent. Agents are the right tool when:
- The task requires genuine planning — “analyze this codebase and propose a refactoring strategy” needs multiple passes, evaluation, and synthesis
- The tool selection is genuinely ambiguous — when the system needs to reason about WHICH combination of tools to use, not just which single tool
- The workflow is non-deterministic — different inputs lead to fundamentally different execution paths that can’t be pre-mapped
- The task is long-running — research, analysis, and content generation that benefits from iterative refinement
For everything else — and “everything else” covers the vast majority of business tool integration — a dispatch gateway is simpler, cheaper, faster, and more reliable.
What We’re Building Next
The dispatch pattern opens up extensions that would have been impractical with the agent approach:
- Passive business context — every tool call returns data about how the business operates (who’s assigned to what, where files live, who attends which meetings). We’re building a lightweight knowledge graph in SQLite that captures these relationships and injects relevant context into future queries. The system gets smarter over time without a separate “training” step.
- Write workflows — the same dispatch pattern handles “create a task in Asana and link the Dropbox file” with 2 tool calls. No planning needed. Claude picks the tools, we execute them.
- New integrations — each new tool is a single Python file that implements two methods: get_tool_schemas() and call_tool(). The dispatch loop handles everything else. We’ve gone from “weeks to deploy a new integration” to “an afternoon.”
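The two method names come from the description above; the class name, schema contents, and stubbed behavior in this sketch are assumptions:

```python
# Sketch of the two-method integration contract. A real integration
# would make authenticated REST calls inside call_tool().
class DropboxIntegration:
    def __init__(self, token: str):
        self.token = token

    def get_tool_schemas(self) -> list[dict]:
        """Tool definitions this integration contributes to the dispatch loop."""
        return [{
            "name": "dropbox_search",
            "description": "Search Dropbox for files and folders by name.",
            "input_schema": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search terms."},
                },
                "required": ["query"],
            },
        }]

    def call_tool(self, name: str, params: dict) -> str:
        """Execute one of this integration's tools and return a text result."""
        if name == "dropbox_search":
            # Stubbed here; the real method would call the Dropbox HTTP API.
            return f"(stub) results for {params['query']!r}"
        raise ValueError(f"unknown tool: {name}")
```

The dispatch loop only ever sees the schemas and the call_tool entry point, which is why a new integration never touches the core code.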
The lesson isn’t “agents are bad.” It’s “agents are expensive, and most problems aren’t agent-shaped.” Know which one you’re looking at before you start building.