Compared to other frameworks

We surveyed every Claude Code agent framework we could find before deciding to write our own. Here’s the matrix.

| Framework | Author / Org | Core idea | Closest pattern we adopted |
| --- | --- | --- | --- |
| gstack | Garry Tan (YC) | 23 role-based slash commands, “boil the lake” | Boil the lake principle in 5 role prompts |
| Superpowers | Jesse Vincent | 7-phase TDD iron law | TDD iron law in validator |
| GSD | Lex Christopherson | Per-phase orchestrators with state-to-disk | Fresh contexts + cache-stable prompts |
| Agentwise | Phil | 8 specialist agents in parallel + dashboard | Real-time dashboard pattern |
| Hermes | (referenced in gstack) | Autonomous orchestration with checkpoints | Named checkpoints |
| Multi-Agent Ralph Loop | alfredolopez80 | MemPalace 4-layer memory | 4-layer memory (L0-L3) |
| ComposioHQ Agent Orchestrator | Composio | Git worktrees per parallel agent | Worktrees per lane |
| OpenSwarm | unohee | Linear-driven Worker/Reviewer pair | (rejected: no external task source) |
| Conductor | $22M Series A | Two-mode parallelism | Competition mode |
| claudecode-orchestrator | darrenapfel (deprecated) | “Quality through truth” + service smoke-test | Evidence rule + smoke-test gate |
| MOLTRON | | Self-evolving Skills.md | `TRICK:` convention |

We also looked at general-purpose frameworks: AutoGPT, LangChain, CrewAI, MetaGPT, SuperAGI, Haystack, Semantic Kernel. These aren’t coding-specific, and we didn’t borrow patterns from them directly.

Across the survey, the Claude Code ecosystem clusters around two patterns:

  1. Skill packs (gstack, Superpowers, GSD). Slash commands inside a Claude Code chat. The human is the orchestrator, and Claude runs one task at a time. Useful for solo founders, but not autonomous.
  2. Multi-agent runtimes (Agentwise, Hermes, Conductor, ComposioHQ). They run agents in parallel. Closer to what we needed, but each ties orchestration to its own opinion of how the company should work.

We landed in a third cluster: a tiny supervised loop that owns the agent runtime and holds no opinion about how the company should work.

| From | Pattern | Our adoption |
| --- | --- | --- |
| Superpowers | TDD iron law | Validator HARD RULE: behavioural assertions need failing-then-passing test |
| gstack | Boil the lake | Added to debugger / ui-qa / curator / product / architect prompts |
| claudecode-orchestrator | Quality through truth | Validator EVIDENCE RULE: quote source output to claim PASS |
| claudecode-orchestrator | Service smoke-test | `bin/service-smoke-test.sh` + `smokeTest.onDone/onFeaturePass` |
| Hermes | Named checkpoints | `CHECKPOINT <name>` decision verb + `triggerOn: post-planner` auto-fire |
| Composio | Git worktree per agent | `branchIsolation.useWorktrees` + `worktree_*` fns |
| Conductor | Two-mode parallelism | `parallelWorkers.mode: "lane"\|"competition"` |
| Agent Swarm | Per-agent IDENTITY.md | `~/autonomous-harness/identity/<role>.md` cross-mission append-only |
| MOLTRON | Self-evolving learnings | Worker `TRICK:` convention promoted by curator |
| Ralph Loop | 4-layer memory | L0 runtime / L1 raw / L2 summary / L3 MEMORY / L4 identity |
| GSD | Per-phase orchestrators with state-to-disk | Fresh-context per role per iteration |
| Agentwise | Real-time dashboard | Next.js `/harness` UI |

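The validator EVIDENCE RULE above ("quote source output to claim PASS") is described as a prompt rule. The source doesn't say whether it is also checked in code. As a rough sketch of how such a rule could be checked mechanically, the snippet below accepts a PASS only when every quoted passage appears verbatim in the captured output. The `Verdict` shape and the `satisfiesEvidenceRule` name are hypothetical and not taken from the harness.

```typescript
// Hypothetical shape of a validator verdict; the source does not specify one.
interface Verdict {
  status: "PASS" | "FAIL";
  evidence: string[]; // passages the validator claims to quote from the run output
}

// Returns true when the verdict satisfies the evidence rule as interpreted here:
// a PASS needs at least one quote, every quote must be non-empty, and every
// quote must appear verbatim in the captured output. FAIL is not constrained.
function satisfiesEvidenceRule(verdict: Verdict, capturedOutput: string): boolean {
  if (verdict.status !== "PASS") return true;
  if (verdict.evidence.length === 0) return false;
  return verdict.evidence.every(
    (quote) => quote.trim().length > 0 && capturedOutput.includes(quote),
  );
}

// Illustrative data only.
const capturedOutput = "Tests: 12 passed, 0 failed\nDone in 3.1s";
const groundedPass: Verdict = { status: "PASS", evidence: ["12 passed, 0 failed"] };
const ungroundedPass: Verdict = { status: "PASS", evidence: ["all tests green"] };

console.log(satisfiesEvidenceRule(groundedPass, capturedOutput));
console.log(satisfiesEvidenceRule(ungroundedPass, capturedOutput));
```

A substring match is deliberately strict: a paraphrased claim fails even when it happens to be true, which pushes the validator to quote the output instead of summarising it.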

| From | Pattern | Why rejected |
| --- | --- | --- |
| OpenSwarm | Linear/Notion task source | User explicit: no external task source |
| OpenSwarm | LanceDB vector memory | Overkill at our scale |
| Claude-Swarm | Tmux-based agent messaging | File-based simpler |
| Agentwise | Discord/Slack control | UI-coupled |
| MOLTRON | Workers rewrite their own prompts | Audit / determinism preferred |
| AutoGPT | Infinite loop, no decision verbs | Drift catastrophe |
| LangChain | Heavyweight Python runtime | Bash + claude won |
| All “skill packs” | Human as orchestrator | We needed autonomous loop |


| Pattern | Why we needed it | Where it lives |
| --- | --- | --- |
| Decisions timeline + ghost rate | Detect orchestrator parser regressions | `/api/harness/:slug/decisions` + dashboard panel |
| Cost-per-feature attribution | Per-session cost tracking is too coarse | Filename tag pattern: `<ts>-<role>-<fid>.jsonl` + `/usage byFeature` |
| Plan-reviewer as autonomous gate | gstack’s `/plan-ceo-review` is human-triggered; ours fires automatically | `prompts/plan-reviewer.md` + `run.sh` gate |
| Autonomous Product role | MOLTRON evolves capabilities; we wanted scope expansion | `prompts/product.md` + `GOAL.md` vs `SPEC.md` diff + proposals CRUD |
| Proposals CRUD with bulk accept/reject | Operator triage workflow | `.harness/proposals/*.md` + dashboard 🪄 tab |
| Per-feature timeline aggregating snapshots + agent runs + debug + PR | Forensics replacement for log scrolling | `/api/harness/:slug/features/:id/timeline` |
| Health endpoint with 9 composite checks | One-glance system health | `/api/harness/:slug/health` |
| Org layer: 5 fixed director agents coordinating via 17 typed message kinds with server-side validation + ACTIONS contract for autonomous workers | We needed an outer-loop “company” model that didn’t exist anywhere | `apps/web/app/api/_hono/org.ts` + `~/.restart-org/` + `~/org-{dept}/.harness/` |
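
The cost-per-feature row relies on a filename tag of the form `<ts>-<role>-<fid>.jsonl`. One way that tag could be turned into per-feature totals is sketched below. It assumes that `<ts>` and `<fid>` contain no hyphens while `<role>` may (for example `ui-qa`), and that each log file's cost has already been summed. The `RunLog` shape, the sample filenames, and both function names are hypothetical.

```typescript
// Hypothetical input: one entry per agent-run log file, with its cost pre-summed.
interface RunLog {
  filename: string;
  costUsd: number;
}

interface RunTag {
  ts: string;
  role: string;
  fid: string;
}

// Assumption: <ts> and <fid> contain no hyphens; <role> may (e.g. "ui-qa").
const TAG_PATTERN = /^([^-]+)-(.+)-([^-]+)\.jsonl$/;

// Splits "<ts>-<role>-<fid>.jsonl" into its parts, or returns null if untagged.
function parseRunTag(filename: string): RunTag | null {
  const match = TAG_PATTERN.exec(filename);
  if (match === null) return null;
  const [, ts, role, fid] = match;
  if (ts === undefined || role === undefined || fid === undefined) return null;
  return { ts, role, fid };
}

// Sums cost per feature id; files that do not match the tag pattern are skipped.
function costByFeature(logs: RunLog[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const log of logs) {
    const tag = parseRunTag(log.filename);
    if (tag === null) continue;
    totals.set(tag.fid, (totals.get(tag.fid) ?? 0) + log.costUsd);
  }
  return totals;
}

// Illustrative data only; these are not real mission figures.
const sampleLogs: RunLog[] = [
  { filename: "20250101T120000Z-worker-F012.jsonl", costUsd: 0.5 },
  { filename: "20250101T120500Z-ui-qa-F012.jsonl", costUsd: 0.25 },
  { filename: "20250101T121000Z-validator-F003.jsonl", costUsd: 1 },
  { filename: "notes.txt", costUsd: 9 },
];
const featureTotals = costByFeature(sampleLogs);
console.log(featureTotals.get("F012"), featureTotals.get("F003"));
```

Putting the feature id in the filename means attribution needs only a directory listing, which fits the files-on-disk approach.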

The cheapest, most maintainable autonomous coding rig is:

  1. A small bash supervisor with no agent-runtime opinion (~1500 lines).
  2. Stable per-role markdown prompts (so prompt-cache hits dominate).
  3. Files-on-disk for state (so any role can re-derive from scratch).
  4. A curator role for memory compaction.
  5. Observability over autonomy — show every decision the orchestrator made.
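
Items 2 and 3 work together: if the role prompt is sent first and never changes, every invocation of that role shares the same prefix, and only the state read from disk varies. The sketch below illustrates that ordering. It assumes the model provider's prompt cache matches on a shared prefix, and it is written in TypeScript for illustration even though the supervisor described here is bash. The `buildInvocation` helper, the section headings, and the `features.json` filename are hypothetical.

```typescript
// Hypothetical helper: assembles the text sent to one role invocation.
// The role prompt comes first and is passed through unchanged; per-iteration
// state read from disk is appended after it.
function buildInvocation(rolePrompt: string, stateFiles: Record<string, string>): string {
  const stateSection = Object.keys(stateFiles)
    .sort() // deterministic order, so identical state yields identical text
    .map((path) => `## ${path}\n${stateFiles[path] ?? ""}`)
    .join("\n\n");
  return `${rolePrompt}\n\n# Current state\n${stateSection}`;
}

// Illustrative data only.
const validatorPrompt = "You are the validator. Quote source output to claim PASS.";
const earlierRun = buildInvocation(validatorPrompt, { "features.json": '{"done": ["F001"]}' });
const laterRun = buildInvocation(validatorPrompt, { "features.json": '{"done": ["F001", "F002"]}' });

// Both runs begin with the same bytes even though their state differs.
console.log(earlierRun.startsWith(validatorPrompt), laterRun.startsWith(validatorPrompt));
```

Because each invocation is rebuilt from files, a role needs no memory of earlier runs, and the stable prefix stays intact across iterations.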

So far: 99.94% cache-hit rate, 22/37 features green at iteration 5 of the live mission, $135/mission, ~$0.05/role-invocation effective cost.