What I Learned Building Production Agents on SpaceBot
I’ve been building production agents on SpaceBot, the open-source multi-agent framework I use for most of my autonomous systems. This is a behind-the-scenes account of what I learned about the framework in the process — and the management dashboard I ended up building because the tooling didn’t exist yet.
Running Real Workloads on SpaceBot
My agents handle scheduled, multi-step workflows — the kind where an agent fires on a timer, chains multiple MCP tool calls, processes data, and takes action based on the results. These jobs run on fixed intervals via SpaceBot’s scheduler, with each job prompt including a time guard so the agent checks the current time before doing any work and skips cleanly if it’s outside the allowed window.
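In SpaceBot the time guard is plain prompt text, and the agent performs the check itself. If you want the same rule enforced deterministically (say, inside a tool the agent calls first), a minimal sketch looks like this. The function name and parameters are hypothetical and not part of SpaceBot. The sketch also assumes the window falls within a single calendar day.

```python
from datetime import datetime, time, timezone


def within_window(now: datetime, start: time, end: time, weekdays: set[int]) -> bool:
    """Return True if `now` is inside [start, end) on an allowed weekday.

    Hypothetical helper, not a SpaceBot API. `weekdays` uses datetime.weekday()
    numbering (Monday=0 ... Sunday=6). Assumes start < end, i.e. the window does
    not cross midnight. `now` should already be in the timezone the window is
    defined in.
    """
    if now.weekday() not in weekdays:
        return False
    return start <= now.time() < end


# Example: a 21:00-22:00 UTC window on weekdays, the intent behind "0 21 * * 1-5".
WEEKDAYS = {0, 1, 2, 3, 4}
if within_window(datetime.now(timezone.utc), time(21, 0), time(22, 0), WEEKDAYS):
    print("inside window: do the work")
else:
    print("outside window: skip cleanly")
```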
For the LLM routing: worker tasks (MCP tool calls, data processing, structured output) run on DeepSeek — fast and cheap for work that doesn’t require frontier reasoning. The channel agent, which synthesizes results and handles Telegram conversations, uses Claude Sonnet for quality.
This is where SpaceBot’s rough edges show up. Not in the hello-world demo, but three weeks into production when your scheduled job silently stops firing and you’re reading Rust logs at midnight trying to figure out why.
What I Learned About SpaceBot the Hard Way
Most of what follows was discovered through log inspection, API reverse-engineering, and direct experimentation. SpaceBot is infrastructure-level software — opinionated, fast to get running, and not especially well-documented at the edges. That’s the tradeoff for using a framework that’s still early and moving fast.
The scheduling system isn’t what you think
SpaceBot accepts a cron_expr field in job configuration that looks like standard cron syntax (0 21 * * 1-5 for 9pm on weekdays). It is silently ignored. The field that actually drives scheduling is interval_secs, which fires the job at every multiple of N seconds counted from the Unix epoch (1970-01-01 00:00 UTC), so runs are aligned to fixed wall-clock boundaries.
A 43200-second interval (12 hours) therefore always fires at midnight and noon UTC, never at a time you pick. This is why all my time-window logic lives in the agent’s prompt, not the scheduler. It took a few missed runs to figure this out.
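The alignment behaviour can be modelled in a few lines. This is a sketch of the behaviour as observed, not SpaceBot's source: it assumes fires land on exact multiples of interval_secs counted from the epoch. The helper names are hypothetical.

```python
from datetime import datetime, timezone


def next_fire(now_epoch: int, interval_secs: int) -> int:
    """Next fire time strictly after now_epoch, assuming fires land on
    exact multiples of interval_secs counted from 1970-01-01T00:00:00Z."""
    return (now_epoch // interval_secs + 1) * interval_secs


def fire_times_utc(start_epoch: int, interval_secs: int, count: int) -> list[str]:
    """The next `count` fire times after start_epoch, formatted in UTC."""
    times = []
    t = start_epoch
    for _ in range(count):
        t = next_fire(t, interval_secs)
        times.append(datetime.fromtimestamp(t, tz=timezone.utc).strftime("%Y-%m-%d %H:%M"))
    return times


# Starting from 09:17 UTC, a 12-hour interval still fires at noon and midnight.
start = int(datetime(2024, 1, 1, 9, 17, tzinfo=timezone.utc).timestamp())
print(fire_times_utc(start, 43200, 3))
# ['2024-01-01 12:00', '2024-01-02 00:00', '2024-01-02 12:00']
```

Under the same model, an interval that does not divide 86,400 evenly (7 hours, for example) lands at a different time of day each day, so it is worth checking interval_secs against a full day before relying on a particular time of day.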
The default timeout will bite you
Cron jobs have a default timeout of 120 seconds. Fine for simple LLM calls. Way too short for jobs that chain multiple MCP tool calls — a complex agent run can take 3–8 minutes depending on how many external APIs it touches.
Jobs that time out deliver whatever partial result exists. In version 0.2.1, a timeout also triggers a scheduler bug: no subsequent auto-fires are scheduled. The job still appears enabled in the API with zero failures, but nothing runs, although manual API triggers still work. Setting timeout_secs = 600 on each cron entry gives long runs enough headroom, and upgrading to 0.2.2 fixes the scheduler bug; together they resolve both issues.
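Because the 0.2.1 failure mode looks healthy from the API, it is worth watching for it from outside. Below is a minimal watchdog sketch. The JobStatus fields are hypothetical; map them from whatever your cron endpoints actually return. The check flags any enabled job whose last run is older than one interval plus its timeout plus some slack. For reference, a 600-second timeout leaves two minutes of headroom over the 8-minute worst case mentioned above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobStatus:
    # Hypothetical shape: populate from your own cron API responses.
    name: str
    enabled: bool
    interval_secs: int
    timeout_secs: int
    last_run_epoch: Optional[int]  # None if the job has never run


def stalled_jobs(jobs: list[JobStatus], now_epoch: int, slack_secs: int = 300) -> list[str]:
    """Names of enabled jobs that appear to have missed at least one fire.

    last_run_epoch may be a start or a finish time; adding the timeout keeps
    the threshold conservative either way.
    """
    stalled = []
    for job in jobs:
        if not job.enabled or job.last_run_epoch is None:
            continue  # paused jobs are expected to be idle; never-run jobs can't be judged
        age = now_epoch - job.last_run_epoch
        if age > job.interval_secs + job.timeout_secs + slack_secs:
            stalled.append(job.name)
    return stalled
```

A check like this catches silent stalls regardless of cause, so it is a reasonable thing to run on its own timer outside SpaceBot and alert on.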
The readiness contract can silently skip runs
Before dispatching a cron job, SpaceBot checks that the agent meets a “readiness contract” — specifically that the memory bulletin is fresh (no older than 30 minutes by default). If the contract isn’t met at fire time, scheduled runs are silently skipped. No error. No log entry at the warn level. Just silence.
This means a poorly timed pause in bulletin generation can drop a scheduled run entirely. It’s defensible design (don’t run jobs against stale context), but it’s the kind of thing you want to know before your agent misses a critical run and you spend an hour checking the scheduler.
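To check after the fact whether the readiness contract explains a missing run, you can replay the rule against timestamps from your own logs. This sketch models the contract as described: a fire is skipped when the most recent bulletin is more than 30 minutes old at fire time. Treating exactly 30 minutes as still fresh is an assumption, and SpaceBot's real check may consider more than bulletin age. The function name is hypothetical.

```python
import bisect

MAX_BULLETIN_AGE_SECS = 30 * 60  # 30-minute default freshness window


def skipped_fires(fire_times: list[int], bulletin_times: list[int],
                  max_age_secs: int = MAX_BULLETIN_AGE_SECS) -> list[int]:
    """Fire times (epoch seconds) at which the readiness rule would fail.

    For each fire, find the most recent bulletin generated at or before it.
    No bulletin at all, or one older than max_age_secs, counts as a skip.
    An age of exactly max_age_secs is treated as fresh (an assumption).
    """
    bulletins = sorted(bulletin_times)
    skipped = []
    for fire in fire_times:
        i = bisect.bisect_right(bulletins, fire)  # bulletins[:i] are <= fire
        if i == 0 or fire - bulletins[i - 1] > max_age_secs:
            skipped.append(fire)
    return skipped


# Bulletins generated at 50, 66 and 158 minutes; the job fires hourly.
print(skipped_fires([3600, 7200, 10800], [3000, 4000, 9500]))
# [7200]
```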
The full API isn’t documented
SpaceBot’s REST API is broader than the docs suggest. To build the Studio dashboard (more on this below), I extracted all fetch() calls from the minified JS bundle served by the built-in web UI. This revealed undocumented endpoints: /api/agents/workers, /api/agents/workers/detail, /api/agents/identity, /api/agents/config, /api/agents/profile, /api/providers, /api/models, /api/events (SSE stream), and the full cron management CRUD surface.
If you’re building on SpaceBot, inspect the network tab before you start writing API calls. The docs are a starting point, not a complete reference.
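A small script makes the bundle inspection repeatable. Note that it swaps in a broader heuristic than matching fetch() calls: it collects every /api/... path literal in the bundle, which also catches URLs built through wrapper helpers. Query strings and template interpolations are cut off, and the output needs a manual review, since third-party code in the bundle can contain unrelated /api/ paths.

```python
import re

# Path-like runs starting with /api/. The character class excludes quotes,
# "$", "?" and whitespace, so template interpolations and query strings end the match.
_API_PATH = re.compile(r"/api/[A-Za-z0-9_\-/]+")


def api_paths(bundle_js: str) -> list[str]:
    """Unique /api/... paths found anywhere in a (minified) JS bundle, sorted."""
    return sorted({m.rstrip("/") for m in _API_PATH.findall(bundle_js)})
```

Feed it the JavaScript file the built-in web UI loads (its URL shows up in the network tab) and diff the result against the documented endpoints.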
SpaceBot Studio: The Dashboard I Built Because It Didn’t Exist
SpaceBot ships with a built-in web UI that’s functional but generic. As my agents became operational, I needed something purpose-built — a dashboard that could show cron job status, conversation history with collapsible tool calls, provider connectivity, and live model routing at a glance.
So I built SpaceBot Studio: a standalone React + Vite application with a Bloomberg Terminal aesthetic. Dark background, amber accents, monospace type, no border-radius anywhere.
What it does
- Dashboard — system status, uptime, cron job table with manual run buttons, active channels
- Cron Jobs — expandable job cards with full prompt text, execution history, pause/resume controls
- Conversations — channel list with threaded message view; worker runs are collapsible with full task and result text
- Agent — AI-generated profile, model routing table, tuning parameters, editable identity and system prompt
- Providers & Models — all configured LLM providers with connection status, model capability badges (reasoning, tool use), API key testing
Real-time updates via SSE
Studio connects to SpaceBot’s /api/events SSE endpoint through a custom useSpacebotEvents hook. On incoming events, the hook invalidates the relevant React Query cache keys — messages, channels, workers, crons — which triggers automatic refetches. This replaces aggressive polling with push-based updates.
The hook handles named SSE event types (channel_message, worker_run, cron_execution) and falls back to broad invalidation for unknown formats. Reconnection is automatic after a 5-second delay on disconnect. Connection state (connecting / live / offline) is visible in the sidebar.
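The hook itself runs in the browser, where EventSource handles the wire format. As a language-neutral sketch of the same logic, here it is in Python: a minimal parser for the text/event-stream format plus the event-to-cache-key lookup with a broad fallback. The specific mapping of events to keys below is an assumption for illustration; Studio's actual mapping may differ.

```python
from typing import Iterable, Iterator

# Assumed mapping from named SSE events to query-cache keys.
EVENT_KEYS = {
    "channel_message": ["messages", "channels"],
    "worker_run": ["workers", "messages"],
    "cron_execution": ["crons"],
}
ALL_KEYS = ["messages", "channels", "workers", "crons"]


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (event_type, data) pairs from text/event-stream lines.

    Handles only the 'event' and 'data' fields; 'id', 'retry' and comment
    lines are ignored. A blank line dispatches the pending event.
    """
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the spec strips a single leading space
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)


def keys_to_invalidate(event_type: str) -> list[str]:
    """Targeted keys for known events; invalidate everything for anything else."""
    return list(EVENT_KEYS.get(event_type, ALL_KEYS))
```

Unnamed events arrive with the default type "message", so they fall through to broad invalidation, which matches the fallback behaviour described above.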
SpaceBot Studio is open-sourced on our GitLab as a general-purpose management UI for any SpaceBot deployment.
The Broader Lesson
Building autonomous agents that handle real consequences forces a level of rigor that demo projects don’t. Every edge case matters. Every silent failure has to be accounted for. And the framework you build on needs to be solid enough to trust at 3 AM when nobody is watching.
SpaceBot gets you most of the way there. The rest is engineering.
If you’re building autonomous agents for your business and want to talk through architecture, framework choices, or whether an agent is even the right approach — I’m happy to have that conversation. See more about how I work or read the framework comparison.