Skip to content

Why we built our own harness

We didn’t start by writing this. We surveyed every Claude Code framework we could find — gstack, Superpowers, GSD, Agentwise, Hermes, Multi-Agent Ralph Loop, ComposioHQ Agent Orchestrator, OpenSwarm, Conductor, claudecode-orchestrator, MOLTRON — and read each one carefully.

The Claude Code ecosystem clusters around two patterns:

  1. Skill packs. gstack, Superpowers, GSD — all ship as slash commands you run inside a Claude Code chat. Human is the orchestrator. Claude runs one task at a time. Useful for solo founders, but not autonomous: a human is still required at every loop boundary.
  2. Multi-agent runtime. Agentwise, Hermes, Conductor, ComposioHQ — these run agents in parallel. Closer to what we want, but every one of them ties orchestration to their opinion of how the company should work (e.g. tmux-based messaging, Discord bots, Linear ticket sync). None separated the loop from the company.

We needed:

  • A loop we own — bash + claude, no other runtime to stand up.
  • Orchestration via files on disk so any role can read everything anytime.
  • Per-role prompts maintained as markdown, so improvements compound without code changes.
  • Observability: cost-per-feature, decision-parse ghost rate, role-by-role timing.
  • Branch isolation we can rip out and replace (no proprietary tracker integration).

What we kept from each framework is documented in Compared to other frameworks. Mostly we took prompt patterns (TDD iron law from Superpowers, boil-the-lake from gstack, evidence rule from claudecode-orchestrator) and runtime patterns (worktrees from ComposioHQ, named checkpoints from Hermes, identity files from Agent Swarm, competition mode from Conductor). We didn’t take any code.

Our bet is that the cheapest, most maintainable autonomous coding rig is:

  1. A small bash supervisor with no agent-runtime opinion (~1500 lines).
  2. Stable per-role markdown prompts (so prompt cache ~100%).
  3. Everything on disk (so any role can re-derive state from scratch).
  4. A curator role that distils noisy raw observations into durable memory.
  5. Observability over autonomy — show every decision the orchestrator made.

If that bet is wrong we’ll find out fast: the harness will rack up cost, drift, or get stuck. As of writing it ships features at $135/mission with a 99.94% cache-hit rate and 22/37 features green at iteration 5.