Stop Burning Tokens: What We Learned Replacing an AI Agent with a Dispatch Gateway
We spent a day deploying a multi-agent orchestration system for a creative agency client. Three specialized agents, skill definitions, worker chains, prompt engineering discipline files — the whole nine yards. It worked. Sometimes. When it felt like it.
Then we ripped it all out and replaced it with 400 lines of Python that does the job better, faster, and at 1/100th the cost.
Here’s what we learned.
The Setup
The client needed a Slack bot that could answer questions about their Asana tasks, find files in Dropbox, and check Google Calendar availability. That’s it. Three integrations, read-heavy, straightforward queries.
We reached for an agent framework. It seemed like the right call — agents handle multi-tool workflows, they reason about which tool to use, they chain operations together. The framework we chose (SpaceBot) supports multiple specialized agents, inter-agent routing, skill definitions, and a sophisticated worker system with conversation compaction.
On paper, it was perfect.
What Actually Happened
The $0.70 Calendar Check
The simplest possible query — “what’s on my calendar today?” — would routinely take 5-8 turns to resolve. The agent would:
- Receive the message in the hub agent
- Route it to the PM agent (specialized for calendar/tasks)
- Spawn a worker subprocess
- The worker would read its skill definition
- Attempt to launch a browser (ignoring explicit instructions not to)
- Fall back to the API after the browser failed
- Make the actual calendar API call
- Format the response
- Return through the agent chain
Cost: $0.30-0.70 in tokens. Response time: 2-3 minutes. For a query that should be a single API call.
The Free-Range Problem
We wrote extensive discipline files (SOUL.md) for each agent. “NEVER launch a browser.” “ALWAYS read skills FIRST.” “Do NOT use apt-get.” The workers ignored them roughly 40% of the time. Not because the instructions were unclear — because the underlying model’s training data creates a gravitational pull toward certain behaviors. When a Haiku worker sees “Google Calendar,” its strongest association is “launch browser and authenticate,” not “call REST API with a service account JWT.”
No amount of prompt engineering fixed this reliably. We were fighting the model’s priors on every query.
The Token Furnace
Every query, regardless of complexity, burned through the full agent pipeline:
- Hub agent processes the message and loads its SOUL file (~2,000 tokens)
- Routing decision to specialist (~1,000 tokens)
- Specialist loads its own SOUL, context, and skill definitions (~3,000+ tokens)
- Worker spawns with its own context, SOUL, and skill files (~3,000+ tokens)
- Worker reasons about approach — and every turn re-sends the full context
- Workers regularly get caught in 15-25 turn loops, each pass burning the full context window
- Response formatting through the chain back up
A simple “list my Asana projects” query that returns 10 words of data could consume 100,000+ tokens — because every turn in the chain reloaded the full SOUL file, skill definitions, and accumulated context. A worker caught in a 25-turn reasoning loop passes that entire context every single time. The same query through a direct API call needs about 3,000 tokens total.
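The compounding arithmetic can be sketched in a few lines. The per-step token counts are the rough estimates quoted above, not measurements:

```python
# Back-of-envelope token accounting for the agent pipeline described above.
SOUL_HUB = 2_000         # hub agent SOUL file (~2,000 tokens)
ROUTING = 1_000          # routing decision to the specialist
SOUL_SPECIALIST = 3_000  # specialist SOUL, context, skill definitions
WORKER_CONTEXT = 4_000   # worker SOUL, skills, and accumulated context

def pipeline_tokens(worker_turns: int) -> int:
    """Total tokens for one query: fixed setup plus a full context
    re-send on every worker reasoning turn."""
    setup = SOUL_HUB + ROUTING + SOUL_SPECIALIST
    return setup + worker_turns * WORKER_CONTEXT

print(pipeline_tokens(25))  # a worker stuck in a 25-turn loop: 106000
```

The multiplier is the loop, not the files: the fixed setup is only 6,000 tokens, but re-sending the worker context 25 times pushes the total past 100,000.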
Environment Variable Hell
Workers run as subprocesses. They don’t inherit environment variables from the parent process. We had to create .api_keys files in every agent’s workspace directory, source them at the top of every skill command, and debug silent failures when a key was missing or the source command was forgotten.
This is a framework-level architectural decision that made sense for sandboxed execution but created constant operational friction for our use case.
The Realization
After a day of debugging worker behavior, writing increasingly desperate prompt instructions, and watching the token counter spin, we had a conversation that crystallized the problem:
We were using a reasoning engine for a routing problem.
The client’s queries fall into a handful of categories:
- “What tasks are assigned to me?” → Call Asana API
- “Find the brand assets folder” → Call Dropbox API
- “What’s on my calendar Friday?” → Call Google Calendar API
- “Get me a link to that file” → Call Dropbox share API
There’s no multi-step reasoning required. No planning. No dynamic tool selection across ambiguous problem spaces. The LLM’s job is to understand which tool to call and what parameters to pass. That’s intent classification, not reasoning.
What We Built Instead
We call it OpenEar. It’s a constrained dispatch gateway — not an agent.
The entire dispatch loop:
- Message arrives
- Load tenant config (which tools are available)
- Call Claude with tool schemas + system prompt
- Claude picks a tool and formats the parameters
- Execute the tool (direct REST API call)
- Call Claude with the result to format a response
- Reply
Maximum 2 tool calls per query. No planning loops. No worker chains. No skill definitions. No agent routing.
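A minimal sketch of that loop, with the two Claude calls stubbed out as plain callables. All names here are illustrative, not the actual OpenEar code:

```python
from dataclasses import dataclass
from typing import Callable

MAX_TOOL_CALLS = 2  # the hard cap described above


@dataclass
class ToolCall:
    name: str
    params: dict


def dispatch(message: str,
             classify: Callable[[str], list[ToolCall]],
             tools: dict[str, Callable[..., str]],
             format_reply: Callable[[str, list[str]], str]) -> str:
    """Classify intent, run at most MAX_TOOL_CALLS tools, format a reply."""
    calls = classify(message)[:MAX_TOOL_CALLS]          # never more than two
    results = [tools[c.name](**c.params) for c in calls]  # direct REST calls
    return format_reply(message, results)                 # second LLM call


# Stubbed "LLM" and tool for demonstration:
def fake_classify(msg):
    return [ToolCall("calendar_today", {})]

def fake_calendar_today():
    return "10:00 standup; 14:00 client review"

reply = dispatch(
    "what's on my calendar today?",
    fake_classify,
    {"calendar_today": fake_calendar_today},
    lambda msg, results: f"Today: {results[0]}",
)
print(reply)  # → Today: 10:00 standup; 14:00 client review
```

In the real gateway, classify and format_reply are each a single Claude call; the tools dict is populated from the tenant config.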
The Numbers
| Metric | Agent Framework | Dispatch Gateway |
|---|---|---|
| Tokens per query | 100,000+ | ~3,000 (2 LLM calls total) |
| Cost per query | $0.30-0.70 | ~$0.003 |
| Response time | 2-3 minutes | ~2 seconds |
| Reliability | ~60% first-try success | ~99% |
| Code complexity | 3 agents, skills, SOUL files, routing | 400 lines of Python |
| Config per client | Agent definitions, workspace files, skill scripts | One YAML file |
The cost difference is roughly 100x. Not because we switched models — both systems use Haiku for the actual work. The difference is eliminating the reasoning overhead. The agent framework was spending 80% of its tokens on deciding how to do something. The dispatch gateway spends 100% of its tokens on doing it.
What We’d Tell You Before You Deploy an Agent
1. Classify your queries first
Before choosing an architecture, look at what your users actually ask. If 90% of queries map to “identify intent → call one API → format result,” you don’t need an agent. You need a router.
Agents earn their complexity when queries require genuine multi-step planning with branching logic: “Research this topic, evaluate three options, draft a recommendation, and schedule a review meeting.” If your queries don’t look like that, an agent is overhead.
2. Tool schemas are your prompt engineering
In the agent framework, we spent days writing skill definitions, discipline files, and behavioral constraints — all to get the LLM to call the right API in the right way.
With tool_use mode, Claude’s tool schemas ARE the instructions. You define the function signature, the parameter descriptions, and Claude fills in the blanks. No skill files. No behavioral coaching. The structured format eliminates an entire class of “the LLM didn’t follow instructions” problems.
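For illustration, here is what one tool schema might look like in the Anthropic tool_use format. The tool name and parameters are hypothetical, not the gateway's actual schema:

```python
# One tool definition in the Anthropic tool_use format. The parameter
# descriptions do the work that skill files and discipline docs used to do.
calendar_tool = {
    "name": "get_calendar_events",
    "description": "List Google Calendar events for the user on a given day.",
    "input_schema": {
        "type": "object",
        "properties": {
            "date": {
                "type": "string",
                "description": "Day to query, ISO format (YYYY-MM-DD).",
            },
            "calendar_id": {
                "type": "string",
                "description": "Calendar to read; defaults to 'primary'.",
            },
        },
        "required": ["date"],
    },
}
```

Because the model must emit JSON matching input_schema, "launch a browser instead" simply isn't an available move.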
3. Bound your loops or pay the price
An agent framework’s default behavior is to keep going until it thinks it’s done. “Keep going” means more LLM calls, more tokens, more latency, more opportunities for the model to go off-script.
Our dispatch gateway has a hard cap: 2 tool calls, then you must produce a text response. This isn’t a limitation — it’s a feature. It means costs are predictable, responses are fast, and the system can’t spiral into a 25-turn reasoning loop about how to check a calendar.
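A sketch of what the hard cap looks like in code, assuming an llm callable that returns either a tool request or plain text. The names and response shapes are illustrative:

```python
def run_turn(message, llm, execute_tool, max_tool_calls=2):
    """Call the model; allow at most max_tool_calls tool rounds,
    then force a plain-text answer with tools disabled."""
    history = [{"role": "user", "content": message}]
    for _ in range(max_tool_calls):
        response = llm(history, tools_allowed=True)
        if response["type"] == "text":
            return response["text"]  # model answered directly
        result = execute_tool(response["name"], response["params"])
        history.append({"role": "tool_result", "content": result})
    # Cap reached: the final call has tools disabled, so it must produce text.
    return llm(history, tools_allowed=False)["text"]


# Even a stub model that always wants another tool call terminates:
def greedy_llm(history, tools_allowed):
    if tools_allowed:
        return {"type": "tool_use", "name": "noop", "params": {}}
    return {"type": "text", "text": f"done after {len(history)} messages"}

print(run_turn("hi", greedy_llm, lambda name, params: "ok"))
# → done after 3 messages
```

The key design choice is the final call with tools_allowed=False: the model cannot request a third tool, so the worst case is bounded at three LLM calls instead of a 25-turn loop.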
4. Thread context doesn’t need memory compaction
The agent framework had a sophisticated “compactor” that would summarize long conversations to stay within context limits, using a separate LLM call. We replaced this with a SQLite table that stores the last N user/assistant messages per Slack thread and injects them into the system prompt as plain text.
Cost of the compactor: one Haiku call per message after the context window fills up. Cost of our approach: one SQL query. Zero tokens.
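A minimal sketch of that approach using Python's built-in sqlite3. Table and column names are assumptions, not the actual schema:

```python
import sqlite3

# Thread memory as a plain table: the last N messages per Slack thread,
# injected into the system prompt as text. No compaction LLM call needed.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE IF NOT EXISTS thread_messages (
    thread_ts TEXT,
    role      TEXT,
    content   TEXT,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP)""")

def remember(thread_ts: str, role: str, content: str) -> None:
    conn.execute(
        "INSERT INTO thread_messages (thread_ts, role, content) VALUES (?, ?, ?)",
        (thread_ts, role, content))

def recall(thread_ts: str, n: int = 10) -> list[tuple[str, str]]:
    """Last n messages for a thread, oldest first, ready for the prompt."""
    rows = conn.execute(
        """SELECT role, content FROM thread_messages
           WHERE thread_ts = ? ORDER BY rowid DESC LIMIT ?""",
        (thread_ts, n)).fetchall()
    return list(reversed(rows))
```

Recall is one indexed query; the "context window" is just the LIMIT clause.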
5. Multi-tenant doesn’t need multi-agent
The agent framework required a complete agent configuration per client: workspace directories, identity files, skill definitions, routing rules, inter-agent communication edges.
Our multi-tenant approach: one YAML file per client. Different clients can have different tools, different models, different system prompts, different Slack workspaces. Adding a new client is copying a file and adding credentials. No code changes. No agent configuration. Nothing to redeploy beyond a container restart.
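A hypothetical per-client config might look like this. The field names are illustrative, not the actual schema:

```yaml
# One file per client. Copy, fill in credentials, restart the container.
client: acme-creative
slack_workspace: T0ACME
model: claude-3-5-haiku-latest
system_prompt: |
  You are Acme's assistant. Answer only from tool results.
tools:
  - asana
  - dropbox
  - google_calendar
credentials_file: secrets/acme.env   # API keys live outside the config
```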
Where Agents DO Make Sense
We’re not anti-agent. Agents are the right tool when:
- The task requires genuine planning — “analyze this codebase and propose a refactoring strategy” needs multiple passes, evaluation, and synthesis
- The tool selection is genuinely ambiguous — when the system needs to reason about WHICH combination of tools to use, not just which single tool
- The workflow is non-deterministic — different inputs lead to fundamentally different execution paths that can’t be pre-mapped
- The task is long-running — research, analysis, and content generation that benefits from iterative refinement
For everything else — and “everything else” covers the vast majority of business tool integration — a dispatch gateway is simpler, cheaper, faster, and more reliable.
What We’re Building Next
The dispatch pattern opens up extensions that would have been impractical with the agent approach:
- Passive business context — every tool call returns data about how the business operates (who’s assigned to what, where files live, who attends which meetings). We’re building a lightweight knowledge graph in SQLite that captures these relationships and injects relevant context into future queries. The system gets smarter over time without a separate “training” step.
- Write workflows — the same dispatch pattern handles “create a task in Asana and link the Dropbox file” with 2 tool calls. No planning needed. Claude picks the tools, we execute them.
- New integrations — each new tool is a single Python file that implements two methods: get_tool_schemas() and call_tool(). The dispatch loop handles everything else. We’ve gone from “weeks to deploy a new integration” to “an afternoon.”
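The two method names come from the description above; the class name, schema contents, and stubbed behavior in this sketch are assumptions:

```python
# Sketch of the two-method integration contract. A real integration
# would make authenticated REST calls inside call_tool().
class DropboxIntegration:
    def __init__(self, token: str):
        self.token = token

    def get_tool_schemas(self) -> list[dict]:
        """Tool definitions this integration contributes to the dispatch loop."""
        return [{
            "name": "dropbox_search",
            "description": "Search Dropbox for files and folders by name.",
            "input_schema": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search terms."},
                },
                "required": ["query"],
            },
        }]

    def call_tool(self, name: str, params: dict) -> str:
        """Execute one of this integration's tools and return a text result."""
        if name == "dropbox_search":
            # Stubbed here; the real method would call the Dropbox HTTP API.
            return f"(stub) results for {params['query']!r}"
        raise ValueError(f"unknown tool: {name}")
```

The dispatch loop only ever sees the schemas and the call_tool entry point, which is why a new integration never touches the core code.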
The lesson isn’t “agents are bad.” It’s “agents are expensive, and most problems aren’t agent-shaped.” Know which one you’re looking at before you start building.