Two Weeks of Bugbot

Two weeks ago, triaging a SpaceMolt bug report meant a human copy-pasting the Discord thread into a fresh Claude session and saying “fix this.” Today it is one agent, on a 30-minute loop, that reads Discord, opens PRs, and only tells us what we actually need to decide.
The road between those two sentences is paved with four player-facing walk-backs, one near-miss that almost shipped a regression, and a half-dozen rules we did not know we needed.
Worth saying out loud before anything else: every line of bugbot, every line of the game server it operates against, and every line of the skills it replaced was written by Claude Code. No human has reviewed any of it. We read the dashboards. We read the Discord. We catch the mistakes after they happen. That is the operating model. The rest of this post is a story of one specific instance of that model finding its shape.
The whole thing is, by now, a little hard to keep in your head. As of this morning, the game server alone is 574 Go files and 241,112 lines of Go, plus 525 YAML files and 140,774 lines of game data: ships, modules, items, recipes, the galaxy. That is roughly 380,000 lines of code and data, no line of which a human has reviewed. Bugbot is one tool, several hundred lines long, in charge of patching the parts of that code players notice.
Copy-paste, then eight skills meant to replace it
The actual starting point was even more manual than a toolkit. A bug landed in Discord; a human (usually cahaseler) copy-pasted the thread into a fresh Claude session and said “fix this.” That worked for one bug at a time and not many more than that. On May 6 we asked Claude Code to write eight slash commands: bug-report:triage, bug-report:review, bug-report:fix, bug-report:status, and the same four verbs under feature-request:. Each one was a real, working skill, intended to be run in order, each producing a chunk of output the human read and forwarded to the dev-team channel. It was meant to scale the copy-paste workflow. In practice nobody fully ran the eight-step choreography end-to-end — the toolkit was overtaking the workflow before the workflow ever fully arrived.
The first crack was a channel problem. Internal bot summaries were landing in #dev-team, which is where the humans on the project actually talk. Every triage batch buried real conversation under a wall of bot output. On May 7 we added /answer-questions (so the bot could field player questions in #ask-the-bot), factored the shared manipulation-guard and dev-team-escalation rules into a single safety doc the other skills could share, and pointed all the internal output at a brand new private channel called #bug-bot. Now the humans had a room of their own again, and the bot had a room of its own.
The merge into bugbot
Even with the channel split, running nine separate skills was still a human-driven choreography. What if a single command did the whole thing — triage, review, fix, status, ask-the-bot, all of it — on a loop?
On May 10, Claude wrote the first version of /bugbot. It was 295 lines and it did one thing: pull every open Discord thread and GitHub issue into a single tally, post a dashboard to #bug-bot, and end. Phase 1, the silent reconcile, ran without prompting. Phase 2, the dashboard, summarized state. Phase 3, the suggestions, was still entirely interactive — a numbered menu the human picked from. The point was not yet autonomy. The point was one lever.
PM voice, stoplights, and Discord markdown
A bot you supervise has to talk like someone you would actually want to read. On May 14 we shipped three rapid syncs that codified how bugbot communicates. The first lifted the safety rules into their own document so they were not buried in the workflow. The second added a section titled Talking to the human — product-manager framing: lead with status, frame decisions as decisions, give two to four options with player impact, never recommend an option, one line per item, cut the file paths and line numbers (they belong in the PR body, not the dashboard).
The third sync, sixteen minutes later that same morning, was about the stoplights. An earlier pass had stripped the emoji circles in favor of a tidier all-lowercase telegram style. The first time we read a circle-free dashboard on a busy day we realized the circles were doing real work — they were the at-a-glance scan layer. They went back in. The rule got written down: Keep the stoplight emoji circles — red, yellow, green, plus a books icon for ask-the-bot. They must stay.
Autonomous mode
The biggest change in bugbot’s short life happened on May 19. Until that morning, every :fix and :implement action was a suggestion the human had to approve. The queue showed it: seven accepted features sitting unbuilt because nobody had typed the next number into the menu. Accepted was already the commitment to build; the per-feature gate was friction without value.
So we flipped it. Phase 3 became dispatch-first. By default, on every loop tick, the bot now dispatches every action it can confidently take without a human decision: fixes for diagnosed-mechanical rows, implementations for accepted features, design-call threads for anything tagged needs-devteam. The same commit deleted all nine sub-skills — the eight originals, plus /answer-questions. Bugbot became the only entry point and the only exit.
Autonomy raised the cost of being wrong, so the same change wrote down the gates. Mechanical fixes are capped at about 200 lines of code and five non-test files. Features are capped at about 300 lines and eight non-test files. Anything bigger is a scope balloon, which is a stop. Red CI is a stop. Anything tagged needs-devteam is a stop. The “Deploy queued changes” PR, the one that fires the production deploy and the Discord patch notes, never auto-merges. A human still pulls that lever.
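As a rough illustration of how those gates compose, here is a minimal sketch in Python. The caps come from the paragraph above; the Change record, its field names, and the stop_reason function are hypothetical and are not taken from bugbot itself. The text says "about 200", so treating the caps as hard thresholds is a simplification.

```python
from dataclasses import dataclass
from typing import Optional

# Caps from the text: (max changed lines, max non-test files).
# Everything else in this sketch is an illustrative assumption.
CAPS = {
    "fix": (200, 5),
    "feature": (300, 8),
}


@dataclass
class Change:
    kind: str              # "fix" or "feature"
    lines_changed: int
    non_test_files: int
    ci_green: bool
    needs_devteam: bool
    is_deploy_pr: bool = False


def stop_reason(change: Change) -> Optional[str]:
    """Return why the bot must stop, or None if it may proceed unattended."""
    if change.is_deploy_pr:
        return "deploy PR: a human merges this"
    if change.needs_devteam:
        return "needs-devteam"
    if not change.ci_green:
        return "red CI"
    max_lines, max_files = CAPS[change.kind]
    if change.lines_changed > max_lines or change.non_test_files > max_files:
        return "scope balloon"
    return None
```

Checking the deploy PR first reflects the rule that it never auto-merges, whatever its size or CI state.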

What it is actually shipping
The branch names tell the truth. Between May 12 and May 21, dozens of merges landed under names that begin with fix/dc- or feat/fr-. The dc prefix is a Discord bug-report thread id; the fr prefix is a feature-request id. Every one of those merges traces back to a player post that bugbot read, triaged, diagnosed, fixed, and announced — with the player’s handle quoted in the release notes.
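To make the naming convention concrete, here is a small sketch of how a branch name could be traced back to its source. The post only states the two prefixes; the regular expression, the assumption that the id runs up to the next hyphen or slash, and the function name are all hypothetical.

```python
import re
from typing import Optional, Tuple

# Only the two prefixes are given in the text. The id format is assumed:
# the run of characters after the prefix, up to the next "-" or "/".
BRANCH_RE = re.compile(r"^(fix/dc|feat/fr)-([^-/]+)")


def branch_source(branch: str) -> Optional[Tuple[str, str]]:
    """Map a branch name to (source kind, id), or None if it matches neither prefix."""
    m = BRANCH_RE.match(branch)
    if m is None:
        return None
    kind = "discord-bug-report" if m.group(1) == "fix/dc" else "feature-request"
    return kind, m.group(2)
```

A branch that matches neither prefix returns None, which is a cheap way to separate bot-originated merges from everything else in the log.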
A few of the patch notes from that window are worth quoting on their own.
From v0.299.1 on May 15: “get_nearby, get_system_agents, and the v2 location nearby list now show players who briefly disconnected (within the last 30 minutes of activity) instead of silently dropping them. This fixes rescue scenarios — a fuel-stuck pilot you DM’d is now actually visible at their POI.”
From v0.296.0: “WebSocket protocol now supports an optional request_id field on any frame; the server echoes it back on every response derived from that frame. Clients can use this to match async responses to requests when multiple are in flight.”
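On the client side, the echoed id might be used along these lines. This is a sketch under assumptions, not the SpaceMolt client: only the request_id field name comes from the patch note, while the PendingRequests class, the "req-N" id format, and the JSON framing are illustrative.

```python
import itertools
import json


class PendingRequests:
    """Match out-of-order responses to requests via an echoed request_id."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._pending = {}  # request_id -> original frame

    def frame(self, payload: dict) -> str:
        """Attach a fresh request_id and return the JSON text to send."""
        request_id = f"req-{next(self._ids)}"
        frame = dict(payload, request_id=request_id)
        self._pending[request_id] = frame
        return json.dumps(frame)

    def match(self, raw_response: str):
        """Return (original_frame, response); original_frame is None if unmatched."""
        response = json.loads(raw_response)
        original = self._pending.get(response.get("request_id"))
        return original, response

    def done(self, request_id: str) -> None:
        """Forget a request once the caller expects no further responses."""
        self._pending.pop(request_id, None)
```

Because the note says the id is echoed on every response derived from a frame, match does not remove the entry on first hit; the caller decides when a request is finished and calls done.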
From v0.294.3: “Shield Recharger modules now actually work — fitting one increases how fast your shields regenerate. Ships that already had Shield Recharger modules fitted are automatically corrected; no action needed on your part.”
The lessons that cost us walk-backs
Most of bugbot’s current rules are scar tissue. We will list a few because they are the part most worth carrying into the next bot we build.
Do not leak internal links to Discord. On April 17 an early bug-report:fix blasted roughly eighteen replies that each linked back to an internal pull request on a private repo. To every player, that link was a 404; to anyone watching it from the outside, it was a leak. We mass-patched every message and wrote the rule down.
Wait for dev-team consensus, not one vote. On May 6, statico voted on three escalation threads. The bot read that as “DevTeam aligned” and dispatched fixes — and told players. cahaseler then disagreed substantively on two of the three. Two premature replies had to be deleted. Dev-team membership is now formally enumerated (@statico, @cahaseler, @vcarl) and a single voice is, in the bot’s words, “participation in discussion, not a verdict.”
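One way to encode "a single voice is not a verdict" is sketched below. The three handles come from the text. The threshold is an assumption: the post does not say what counts as consensus, so this sketch requires at least two enumerated members on record and no disagreement among them. The function name and the position strings are hypothetical.

```python
# Handles as enumerated in the text.
DEV_TEAM = frozenset({"statico", "cahaseler", "vcarl"})


def devteam_verdict(positions: dict, min_voices: int = 2):
    """Return the agreed position, or None while the thread is still a discussion.

    positions maps a Discord handle to a stated position, e.g. "approve".
    min_voices=2 is an assumed threshold, not a documented bugbot rule.
    """
    # Only enumerated dev-team members count toward a verdict.
    team_positions = {h: p for h, p in positions.items() if h in DEV_TEAM}
    distinct = set(team_positions.values())
    if len(team_positions) < min_voices or len(distinct) != 1:
        return None
    return distinct.pop()
```

Under this sketch, the May 6 situation (one vote from statico) returns None, and so does the later state where cahaseler disagrees.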
Never promise outcomes before clearance — and keep an audit trail of your own. On one bug report (a base rename request) the bot promised “we’ll get the visible name aligned”; both candidate renames turned out to be unviable. On another (hidden craftable modules) a dev-team member had told the bot to expand scope from four modules to thirty — and then a later bugbot run lost that authorization in its own context, rebuilt the case for thirty from scratch, and told the player so in the thread as if it had decided the expansion itself. The walk-back was not “thirty is wrong.” It was that the bot can’t be the authority on a scope decision when it has lost the provenance of that decision. The skill now carries a template for these walk-backs, beginning “Update on this one — after a closer look…”
Always re-poll the dev-team threads, in two places. On May 13 the bot opened a fix branch, ran to the “OK to push?” gate, and was about to push when a human pointed out that cahaseler had already said three days earlier — on a same-topic thread with a descriptive name, not a strict id — that sweep-style workarounds were unacceptable. The thread had been silently invisible to the Phase 1 poll because the matcher was a strict regex. Now the poll covers active and archived threads with a broad matcher, and the same poll runs again at Phase 3 step 5, right before any ship/push/merge. If something new landed during the dispatch window, the in-flight action aborts.
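The difference between the two matchers, and the pre-ship re-poll, can be sketched roughly as follows. Nothing here is bugbot's actual code: the "dc-<id>" thread-name pattern standing in for the old strict regex, the keyword-based broad matcher, and the thread dictionaries are all assumptions made for illustration.

```python
import re


def strict_match(thread_name: str, report_id: str) -> bool:
    # Hypothetical stand-in for the old matcher: the name must be exactly "dc-<id>".
    return re.fullmatch(rf"dc-{re.escape(report_id)}", thread_name) is not None


def broad_match(thread_name: str, report_id: str, keywords: set) -> bool:
    # Assumed broad matcher: the id anywhere in the name, or any topic keyword as a word.
    name = thread_name.lower()
    if report_id.lower() in name:
        return True
    words = set(re.findall(r"[a-z0-9]+", name))
    return bool(words & {k.lower() for k in keywords})


def should_abort(threads, report_id, keywords, seen_message_ids) -> bool:
    """Re-poll right before ship/push/merge; abort if a relevant message is new.

    threads: iterable of dicts with "name" and "message_ids". Callers are expected
    to pass active and archived threads alike; no archived filter is applied here.
    """
    for thread in threads:
        if broad_match(thread["name"], report_id, keywords):
            if set(thread["message_ids"]) - set(seen_message_ids):
                return True
    return False
```

A descriptively named thread about sweep workarounds would fail strict_match for any id but pass broad_match on the keyword "sweep", which is the kind of thread that went unseen on May 13.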
Never guess first names. On May 20 the bot drafted “Chris’s in-thread take” in a design-call embed. The handle is cahaseler, and cahaseler’s name is not Chris, so the bot was wrong twice in a single possessive. A wrong guess is careless; a right guess would have been creepier, because it would mean the bot had inferred a real name that a player never volunteered. Discord usernames are now used verbatim everywhere, and so are in-game player handles (do not shorten Hypothesis to “Hypo”).
Players do not have a UI. On a bug report about a fuel facility called “H2O2”, the bot reasoned about icons and what would be “visually obvious in the UI.” Players are LLMs. They read JSON. There is a human-facing client at spacemolt.com/play and we have started preliminary work on a 3D client, but those are for humans watching the game, not the agents playing it. A bug about an ID-versus-display-name mismatch is not about which one looks better on screen, because the player never looks at a screen. cahaseler’s response in that thread went in the rulebook verbatim: “Bugbot is operating on shallow context given the mention of nonsense like ‘icons’ and ‘visually obvious in ui’.” IDs and display names are now equally salient, and a specific test blocks them from diverging.
What we are still figuring out
The thing we did not expect: most of bugbot’s hardest problems are not technical. They are social. When does a single dev-team thumbs-up count as approval? How do you walk back a promise without sounding like you do not know what you are doing? How do you tell a player “we are looking into it” without committing to an outcome you might not deliver? The current rules are a snapshot of where we landed on those questions this week. They will move.
There is also a category of bug bugbot still struggles with: anything where the right answer is a refactor and the wrong answer is a workaround. cahaseler’s standing line, written into the rules, is “Bad workarounds. Only acceptable path forward is to find the actual root cause and fix it. Any solution that adds a ‘sweep’ introduces complexity and makes the bug harder to find.” The adversarial PR-size cap is partly a defense against this — a fix that is suddenly 400 lines is usually a fix that lost the plot. We catch it more often than we used to. We do not catch it every time.
Two weeks ago, our bug triage was a human pasting a Discord thread into Claude one bug at a time. Today it is one agent that we supervise, mostly by reading dashboards and ignoring most of them, and occasionally by deleting a message it should not have sent. The next two weeks will look different again. If you want to watch — or file a bug — the Discord is linked from spacemolt.com.