Compared to other frameworks
We surveyed every Claude Code agent framework we could find before deciding to write our own. Here’s the matrix.
| Framework | Author / Org | Core idea | Closest pattern we adopted |
|---|---|---|---|
| gstack | Garry Tan (YC) | 23 role-based slash commands, “boil the lake” | Boil the lake principle in 5 role prompts |
| Superpowers | Jesse Vincent | 7-phase TDD iron law | TDD iron law in validator |
| GSD | Lex Christopherson | Per-phase orchestrators with state-to-disk | Fresh contexts + cache-stable prompts |
| Agentwise | Phil | 8 specialist agents in parallel + dashboard | Real-time dashboard pattern |
| Hermes | (referenced in gstack) | Autonomous orchestration with checkpoints | Named checkpoints |
| Multi-Agent Ralph Loop | alfredolopez80 | MemPalace 4-layer memory | 4-layer memory (L0-L3) |
| ComposioHQ Agent Orchestrator | Composio | Git worktrees per parallel agent | Worktrees per lane |
| OpenSwarm | unohee | Linear-driven Worker/Reviewer pair | (rejected — no external task source) |
| Conductor | $22M Series A | Two-mode parallelism | Competition mode |
| claudecode-orchestrator | darrenapfel (deprecated) | “Quality through truth” + service smoke-test | Evidence rule + smoke-test gate |
| MOLTRON | — | Self-evolving Skills.md | TRICK: convention |
Plus general-purpose: AutoGPT, LangChain, CrewAI, MetaGPT, SuperAGI, Haystack, Semantic Kernel — these aren’t coding-specific and we didn’t borrow patterns from them directly.
Two ecosystem clusters
Reading the surveys, the Claude Code ecosystem clusters around two patterns:
- Skill packs — gstack, Superpowers, GSD. Slash commands inside a Claude Code chat. Human is the orchestrator. Claude runs one task at a time. Useful for solo founders but not autonomous.
- Multi-agent runtime — Agentwise, Hermes, Conductor, ComposioHQ. They run agents in parallel. Closer to what we needed, but each ties orchestration to its own opinion of how the company should work.
We landed in a third cluster: a tiny supervised loop that owns no opinion about the company, just the agent runtime.
What we borrowed
| From | Pattern | Our adoption |
|---|---|---|
| Superpowers | TDD iron law | Validator HARD RULE: behavioural assertions need failing-then-passing test |
| gstack | Boil the lake | Added to debugger / ui-qa / curator / product / architect prompts |
| claudecode-orchestrator | Quality through truth | Validator EVIDENCE RULE: quote source output to claim PASS |
| claudecode-orchestrator | Service smoke-test | bin/service-smoke-test.sh + smokeTest.onDone/onFeaturePass |
| Hermes | Named checkpoints | CHECKPOINT <name> decision verb + triggerOn: post-planner auto-fire |
| Composio | Git worktree per agent | branchIsolation.useWorktrees + worktree_* fns |
| Conductor | Two-mode parallelism | parallelWorkers.mode: "lane" \| "competition" |
| Agent Swarm | Per-agent IDENTITY.md | ~/autonomous-harness/identity/<role>.md cross-mission append-only |
| MOLTRON | Self-evolving learnings | Worker TRICK: convention promoted by curator |
| Ralph Loop | 4-layer memory | L0 runtime / L1 raw / L2 summary / L3 MEMORY / L4 identity |
| GSD | Per-phase orchestrators with state-to-disk | Fresh-context per role per iteration |
| Agentwise | Real-time dashboard | Next.js /harness UI |
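
One way to picture the validator's EVIDENCE RULE from the table above: a PASS claim is only accepted if the text it quotes appears verbatim in the captured output. The following is a minimal sketch in Python, not the harness's own code; the `Verdict` shape and the `evidence_holds` helper are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    """Hypothetical validator verdict; field names are illustrative only."""
    status: str    # "PASS" or "FAIL"
    evidence: str  # text the validator claims to have seen in the output


def evidence_holds(verdict: Verdict, captured_output: str) -> bool:
    """Accept a PASS only if its quoted evidence occurs verbatim in the output."""
    if verdict.status != "PASS":
        return True  # the rule only constrains PASS claims
    quote = verdict.evidence.strip()
    return bool(quote) and quote in captured_output


output = "collected 3 items\n3 passed in 0.12s\n"
print(evidence_holds(Verdict("PASS", "3 passed in 0.12s"), output))  # True
print(evidence_holds(Verdict("PASS", "all tests green"), output))    # False
```

A substring check is deliberately strict: a paraphrase of the output does not count as evidence, which is the point of requiring a quote.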
What we rejected
| From | Pattern | Why rejected |
|---|---|---|
| OpenSwarm | Linear/Notion task source | User explicit: no external task source |
| OpenSwarm | LanceDB vector memory | Overkill at our scale |
| Claude-Swarm | Tmux-based agent messaging | File-based simpler |
| Agentwise | Discord/Slack control | UI-coupled |
| MOLTRON | Workers rewrite their own prompts | Audit / determinism preferred |
| AutoGPT | Infinite loop, no decision verbs | Drift catastrophe |
| LangChain | Heavyweight Python runtime | Bash + claude won |
| All “skill packs” | Human as orchestrator | We needed autonomous loop |
What we invented
| Pattern | Why we needed it | Where it lives |
|---|---|---|
| Decisions timeline + ghost rate | Detect orchestrator parser regressions | /api/harness/:slug/decisions + dashboard panel |
| Cost-per-feature attribution | Per-session cost tracking is too coarse | Filename tag pattern: <ts>-<role>-<fid>.jsonl → /usage byFeature |
| Plan-reviewer as autonomous gate | gstack’s /plan-ceo-review is human-triggered; ours fires automatically | prompts/plan-reviewer.md + run.sh gate |
| Autonomous Product role | MOLTRON evolves capabilities; we wanted scope expansion | prompts/product.md + GOAL.md vs SPEC.md diff + proposals CRUD |
| Proposals CRUD with bulk accept/reject | Operator triage workflow | .harness/proposals/*.md + dashboard 🪄 tab |
| Per-feature timeline aggregating snapshots + agent runs + debug + PR | Forensics replacement for log scrolling | /api/harness/:slug/features/:id/timeline |
| Health endpoint with 9 composite checks | One-glance system health | /api/harness/:slug/health |
| Org layer: 5 fixed director agents coordinating via 17 typed message kinds with server-side validation + ACTIONS contract for autonomous workers | We needed an outer-loop “company” model that didn’t exist anywhere | apps/web/app/api/_hono/org.ts + ~/.restart-org/ + ~/org-{dept}/.harness/ |
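
The cost-per-feature row relies on the `<ts>-<role>-<fid>.jsonl` filename tag, so attribution can be done from filenames alone. The sketch below shows one way that aggregation might look in Python. It rests on assumptions the text does not confirm: that `<ts>` and `<fid>` contain no hyphens while `<role>` may (for example `ui-qa`), and that each log line carries a `cost_usd` field, which is a hypothetical name.

```python
import json
import re
import tempfile
from collections import defaultdict
from pathlib import Path

# Assumed shape: <ts> and <fid> contain no hyphens; <role> may (e.g. "ui-qa").
TAGGED = re.compile(r"^(?P<ts>[^-]+)-(?P<role>.+)-(?P<fid>[^-]+)\.jsonl$")


def cost_by_feature(log_dir: Path) -> dict[str, float]:
    """Sum the hypothetical `cost_usd` field of every log line, grouped by <fid>."""
    totals: dict[str, float] = defaultdict(float)
    for path in sorted(log_dir.glob("*.jsonl")):
        match = TAGGED.match(path.name)
        if match is None:
            continue  # untagged logs cannot be attributed to a feature
        for line in path.read_text().splitlines():
            if line.strip():
                totals[match["fid"]] += json.loads(line).get("cost_usd", 0.0)
    return dict(totals)


# Usage with made-up log files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    logs = Path(tmp)
    (logs / "1700000000-worker-F001.jsonl").write_text('{"cost_usd": 0.25}\n{"cost_usd": 0.5}\n')
    (logs / "1700000100-ui-qa-F001.jsonl").write_text('{"cost_usd": 0.25}\n')
    (logs / "1700000200-worker-F002.jsonl").write_text('{"cost_usd": 0.125}\n')
    print(cost_by_feature(logs))  # {'F001': 1.0, 'F002': 0.125}
```

Because the feature id lives in the filename, no per-line bookkeeping is needed inside the agent runs themselves.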
The bet
The cheapest, most maintainable autonomous coding rig is:
- A small bash supervisor with no agent-runtime opinion (~1500 lines).
- Stable per-role markdown prompts (so prompt-cache hits dominate).
- Files-on-disk for state (so any role can re-derive from scratch).
- A curator role for memory compaction.
- Observability over autonomy — show every decision the orchestrator made.
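
The "stable prompts" point in the list above can be illustrated with a small sketch. Assuming prompt caching is prefix-based, keeping the unchanging role prompt first and appending the per-iteration state last means consecutive prompts share a long identical prefix. The helper names and the prompt layout below are hypothetical, written in Python for illustration rather than taken from the harness.

```python
def build_prompt(role_prompt: str, state: str) -> str:
    """Put the unchanging role prompt first and the per-iteration state last."""
    return f"{role_prompt}\n\n## Current state\n{state}\n"


def shared_prefix_len(a: str, b: str) -> int:
    """Length of the common leading run of two prompts."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


role_prompt = "You are the validator. Quote source output to claim PASS."
p1 = build_prompt(role_prompt, "iteration 1: feature F001 pending")
p2 = build_prompt(role_prompt, "iteration 2: feature F001 green")

# The whole role prompt survives as a shared prefix across iterations.
print(shared_prefix_len(p1, p2) >= len(role_prompt))  # True
```

Putting volatile state ahead of the role prompt would cut the shared prefix to almost nothing, which is why the ordering matters more than the prompt's length.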
So far: 99.94% cache-hit rate, 22/37 features green at iteration 5 of the live mission, $135/mission, ~$0.05/role-invocation effective cost.
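
As a rough sanity check on those figures, the arithmetic below derives two numbers the text does not state directly. It assumes the per-invocation cost is simply the mission cost divided by the number of role invocations, which is an assumption on our part, so the invocation count is an estimate only.

```python
mission_cost_usd = 135.0         # from the text
cost_per_invocation_usd = 0.05   # from the text (approximate)
features_green, features_total = 22, 37

# Implied number of role invocations, under the assumption stated above.
implied_invocations = round(mission_cost_usd / cost_per_invocation_usd)
green_share = features_green / features_total

print(implied_invocations)   # 2700
print(f"{green_share:.1%}")  # 59.5%
```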