Decision log

| When | Decision | Rationale |
| --- | --- | --- |
| Initial | bash + claude as the runtime | No agent-runtime opinion. Drop-in replaceable. ~1500 lines of supervisor that anyone can read in an afternoon. |
| Initial | Disk-only state | Any role can be invoked from a fresh shell at any time and produce the same output. Snapshots take iteration boundaries with no quiescence. |
| Initial | Fresh context per role | No drift across iterations. Anthropic prompt cache absorbs the cost (99.94% hit rate observed). See Why fresh contexts. |
| Initial | Per-role markdown prompts | Improvements compound without code changes. Branch-mergeable. Cache-key-stable. |
| Initial | Curator role | Without compaction `raw.md` grows unboundedly. Curator owns memory pruning. See Why a curator. |
| Mid | Plan-reviewer gate | Borrowed the structural intuition from gstack’s `/plan-ceo-review` but made it autonomous and gating. Catches scope drift before any worker fires. |
| Mid | Debugger role | Stuck features at attempts ≥ 3 deserve root-cause analysis, not blind retry. Fires once, writes notes, next worker reads them. |
| Mid | Crosscheck role | Different model re-validates the validator’s pass. Catches “confidently wrong” validator decisions. Optional, gated by config. |
| Late | Cost-cap removed | Was exiting at $25 vs real $135 spend (miscount). User wanted unbounded cost. Replaced with cost-soft-warn that paused via `SIGSTOP`. Even that was disabled. |
| Late | TDD iron law in validator | Superpowers-inspired. Behavioural assertions need a test that failed before the change. ~20-line prompt change, biggest quality lever we have. |
| Late | Boil the lake principle | gstack-inspired. Roles that sprawled (debugger, ui-qa, curator, product, architect) got an explicit “do fewer things perfectly” section. |
| Late | Evidence rule | claudecode-orchestrator-inspired. Every PASS claim must quote source output. Reports without evidence revert to failing. |
| Late | `IDENTITY.md` per role | Agent-Swarm-inspired. Cross-mission append-only memory. Curator promotes recurring patterns. |
| Late | `TRICK:` convention | MOLTRON-inspired. Worker-tagged generalisable observations get promoted to `identity/worker.md` after recurring. |
| Late | Named checkpoints | Hermes-inspired. Pre-agreed milestones gate progress. Distinct from escalations. Auto-fire on `triggerOn: post-planner`. |
| Late | Git worktrees | Composio-inspired. Each parallel lane gets its own filesystem dir. No git race possible. |
| Late | Service smoke-test | claudecode-orchestrator-inspired. Real `curl` + service startup before declaring DONE. Reopens features on failure. |
| Late | Two-mode parallelism | Conductor-inspired. `lane` (different features) vs `competition` (same feature, validator picks winner). |
| Late | Decisions timeline + ghost rate | Genuine differentiator. No other framework exposes orchestrator parse-ghost telemetry. |
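
Two of the gates listed above, the evidence rule and the debugger threshold, are small enough to sketch in code. The TypeScript below is only an illustration, not the harness's implementation (which is bash plus per-role prompts). The `ValidatorReport` and `FeatureState` shapes and both function names are hypothetical; only the rules themselves (a PASS must quote source output, and the debugger fires once at attempts ≥ 3) come from the log.

```typescript
// Hypothetical shapes: the real report and feature-state formats are not shown in this log.
interface ValidatorReport {
  verdict: "PASS" | "FAIL";
  evidence: string[]; // verbatim quotes from command or test output
}

interface FeatureState {
  attempts: number;
  debuggerHasRun: boolean;
}

type NextRole = "worker" | "debugger";

// Evidence rule: a PASS claim with no quoted output reverts to FAIL.
function applyEvidenceRule(report: ValidatorReport): "PASS" | "FAIL" {
  const hasEvidence = report.evidence.some((quote) => quote.trim().length > 0);
  return report.verdict === "PASS" && hasEvidence ? "PASS" : "FAIL";
}

// Debugger role: fires once when a feature reaches three attempts,
// after which the next worker picks up (and reads the debugger's notes).
function nextRoleAfterFailure(feature: FeatureState): NextRole {
  const stuck = feature.attempts >= 3;
  return stuck && !feature.debuggerHasRun ? "debugger" : "worker";
}
```

Under these assumptions, `applyEvidenceRule({ verdict: "PASS", evidence: [] })` yields `"FAIL"`, and a feature at three attempts is routed to the debugger exactly once before control returns to a worker.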

The coding harness predated the org layer. These decisions were made when adding the company-of-directors layer on top.

| When | Decision | Rationale |
| --- | --- | --- |
| Org-init | 5 fixed departments, no runtime mutability | Adding/retiring directors at runtime would break the message contract (each kind has a from-dept allowlist). 5 felt right (Business, R&D, Tech, Mgmt, Marketing) and changing it requires a code change on purpose. |
| Org-init | CEO is a mode of Business, not a 6th department | A 6th would expand the contract surface and require new `outboxKinds`. Modes already exist (orchestrator vs worker vs validator); adding “ceo” as a worker-variant for Business when handling Directives kept the dept count fixed. |
| Org-init | File-based storage (`~/.restart-org/*.json/*.jsonl`) | Mirrors the existing harness pattern. No DB migration. Any role can read full state from a fresh shell. The `audit.jsonl` is the durable record. |
| Org-init | Typed messages over conversational | Drift catastrophe in conversational orgs. 17 named kinds, each with explicit from/to/projectId rules, server-validated. Workers can’t invent new kinds. |
| Org-init | All actions live in a project | Reserved projects `platform-reserved` (Tech cross-cutting) and `org-ops-reserved` (governance) catch messages without a specific project so `projectId` is always set. |
| Org-init | ACTIONS contract instead of network access for workers | Workers run as `claude -p` subprocesses with no API access. They emit a structured actions block; the backend executes ops sequentially with charter validation. Failed actions abort the iteration cleanly. |
| Mid-org | Server-side schedulers, not browser-side | An auto-loop only when the operator has a browser tab open is useless. Schedulers live in `globalThis.__papercupSchedulers`, survive HMR, run even with all browsers closed. |
| Mid-org | Director memory injection | Each director’s prompt now includes a synthesized `## Recent decisions (your own memory, newest first)` block from the last 12 run records. Lets directors notice patterns like “I keep deferring this.” |
| Mid-org | ProgressUpdate side-effect → project status | Management can advance project state with a single `send_message` action (`metadata.statusUpdate: "in_progress"`). Cleaner autonomous loop: no separate PATCH op needed. |
| Mid-org | Reserved-project guard on delete | `DELETE /projects/:slug` rejects `platform-reserved` / `org-ops-reserved`. Reserved projects can still be wiped via direct file edit; the guard is for accidental UI/API deletes. |
| Late-org | Health folded into Harnesses | `/papercup/health` was a 4th view of the same 5 dept cards already on Organization + Harnesses. Folded the live ops controls onto Harnesses (now the single ops cockpit); cockpit is preserved at `_discarded/HarnessOpsCockpit.tsx` for revival. |
| Late-org | Documentation-first tab order | New operators learn the model before acting. Docs leftmost; Harnesses second (daily-use); reference tabs after. |
| Late-org | Shared lib for Papercup views | Both `apps/web/app/papercup/` and `apps/public-site/` render identical components from `libs/papercup-shared/`. Single source of truth; `readOnly` prop disables edit affordances on the public render. |
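
The typed-message and reserved-project decisions above combine into one server-side check: reject unknown kinds, enforce the per-kind from-dept allowlist, and make sure `projectId` is always set. The sketch below is a hedged illustration of that idea. The department names and the two reserved slugs come from the log; the allowlist contents, the `DraftMessage` shape, the `validateMessage` name, and the rule for choosing between the two reserved projects are assumptions made for the example.

```typescript
// Department names and reserved slugs are taken from the decision log.
type Dept = "Business" | "R&D" | "Tech" | "Mgmt" | "Marketing";
const PLATFORM_RESERVED = "platform-reserved"; // Tech cross-cutting
const ORG_OPS_RESERVED = "org-ops-reserved"; // governance

// ASSUMPTION: illustrative allowlists only. The log names two of the 17 kinds
// (Directive, ProgressUpdate) but does not say which departments may send them.
const FROM_ALLOWLIST: Record<string, readonly Dept[]> = {
  Directive: ["Business"],
  ProgressUpdate: ["Mgmt"],
};

// Hypothetical message shape; projectId is optional until validated.
interface DraftMessage {
  kind: string;
  from: Dept;
  to: Dept;
  projectId?: string;
}

type Validation =
  | { ok: true; message: DraftMessage & { projectId: string } }
  | { ok: false; error: string };

function validateMessage(draft: DraftMessage): Validation {
  // Workers cannot invent kinds: anything outside the table is rejected.
  const allowed = Object.prototype.hasOwnProperty.call(FROM_ALLOWLIST, draft.kind)
    ? FROM_ALLOWLIST[draft.kind]
    : undefined;
  if (allowed === undefined) {
    return { ok: false, error: `unknown kind: ${draft.kind}` };
  }
  if (allowed.indexOf(draft.from) === -1) {
    return { ok: false, error: `${draft.from} may not send ${draft.kind}` };
  }
  // ASSUMPTION: messages touching Tech fall back to platform-reserved,
  // everything else to org-ops-reserved.
  const fallback =
    draft.from === "Tech" || draft.to === "Tech" ? PLATFORM_RESERVED : ORG_OPS_RESERVED;
  return { ok: true, message: { ...draft, projectId: draft.projectId ?? fallback } };
}
```

Keeping the allowlist as a static table is what makes the "no runtime mutability" decision cheap to enforce in a sketch like this: adding a department or a kind means editing the table, which is a deliberate code change.
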

| Rejected | Why |
| --- | --- |
| AutoGPT-style infinite loop | Drift catastrophe. Rejected in favour of supervised iterations with explicit decision verbs. |
| LangChain orchestration | Heavyweight Python runtime + agent abstractions we didn’t need. Bash+files won. |
| Linear/Notion task source integration | User explicitly said no external task source. |
| Vector DB memory (LanceDB, etc) | Markdown files are fine until they aren’t. Identity files stay sub-100KB. |
| Discord/Slack control surface | Coupled the harness to a particular UI chat surface. Rejected in favour of HTTP API + Next dashboard. |
| Tmux-based agent messaging (claude-swarm) | Coupled state to a terminal multiplexer. Files-on-disk are simpler. |
| Subdomain-based public site rewrite | Considered Cloudflare-rewrite of `/p/papercup` → public domain. Rejected: not isolated enough. Built `apps/public-site` instead. |