Hermes

Referenced as a target in the gstack installer (./setup --host hermes installs gstack into ~/.hermes/skills/). Public-facing details are sparse; what we know comes from the comparative writeups.

What it is (per the surveys)

Hermes is the most opinionated of the popular Claude Code frameworks. It is built around autonomous orchestration: Claude operates with significant autonomy on long-running, multi-step workflows, with planning, persistent memory, multi-agent coordination, retry/error handling, and human-in-the-loop checkpoints.

Designed for “AI-powered product” use cases rather than personal-tool use.

What we kept

Named human-in-the-loop checkpoints. Borrowed verbatim. Our CHECKPOINT <name> decision verb is the structural equivalent. A checkpoint configured with triggerOn: post-planner fires automatically once both the planner and the plan-reviewer pass.
Distinct from escalation. Hermes’ insight: a checkpoint means “we hit the milestone, please approve,” not “I’m stuck.” We split the verbs and the UI banners accordingly.
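
The checkpoint/escalation split above can be sketched roughly as follows. This is a minimal illustration, not our actual implementation: the ESCALATE verb name, the Decision type, the CHECKPOINTS config shape, the "plan-approved" checkpoint name, and both function names are assumptions made for the example. Only CHECKPOINT <name> and triggerOn: post-planner come from the text.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Verb(Enum):
    CHECKPOINT = "CHECKPOINT"  # milestone reached, asking for approval
    ESCALATE = "ESCALATE"      # hypothetical verb name: the agent is stuck


@dataclass(frozen=True)
class Decision:
    verb: Verb
    name: str  # the <name> in CHECKPOINT <name>


# Hypothetical config shape: checkpoint name -> triggerOn value.
CHECKPOINTS = {"plan-approved": "post-planner"}


def checkpoint_after_planning(planner_passed: bool,
                              reviewer_passed: bool) -> Optional[Decision]:
    """Return the post-planner checkpoint once both stages pass, else None."""
    if not (planner_passed and reviewer_passed):
        return None
    for name, trigger_on in CHECKPOINTS.items():
        if trigger_on == "post-planner":
            return Decision(Verb.CHECKPOINT, name)
    return None


def banner(decision: Decision) -> str:
    """Use a different banner per verb so an approval never reads as a failure."""
    if decision.verb is Verb.CHECKPOINT:
        return f"Milestone '{decision.name}' reached: please approve"
    return f"Agent stuck on '{decision.name}': needs help"
```

Keeping the verb on the decision itself means the UI can choose the banner without inspecting why the agent paused.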

What we dropped

Autonomous retry/error handling at the framework level. Our retry is stateful at the feature level: attempts increments on each retry, and the debugger fires once attempts ≥ 3. Hermes seems to retry at the agent level; we prefer disk-tracked retry at the workflow level.
Multi-agent coordination protocol. We didn’t need a wire protocol; files on disk are the coordination surface.
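
To make the disk-tracked retry concrete, here is a minimal sketch. The one-JSON-file-per-feature layout, the function names, and the choice to increment on a recorded failure are all assumptions for illustration. Only the attempts counter and the ≥ 3 debugger threshold come from the text.

```python
import json
from pathlib import Path

DEBUGGER_THRESHOLD = 3  # from the text: debugger fires at attempts >= 3


def state_path(root: Path, feature: str) -> Path:
    # Hypothetical layout: one JSON state file per feature.
    return root / f"{feature}.json"


def load_attempts(root: Path, feature: str) -> int:
    """Read the attempts counter from disk; a missing file means zero attempts."""
    path = state_path(root, feature)
    if not path.exists():
        return 0
    return json.loads(path.read_text())["attempts"]


def record_failure(root: Path, feature: str) -> bool:
    """Increment the on-disk counter; return True if the debugger should fire."""
    attempts = load_attempts(root, feature) + 1
    root.mkdir(parents=True, exist_ok=True)
    state_path(root, feature).write_text(json.dumps({"attempts": attempts}))
    return attempts >= DEBUGGER_THRESHOLD
```

Because the counter lives in a file rather than in agent memory, any process that can read the directory sees the same retry state, which is the sense in which files act as the coordination surface.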

Differences in philosophy

Hermes	Papercup
Multi-agent wire protocol	File-based coordination
Built around “AI product” use case	Built around autonomous coding
Checkpoint defined per workflow	Checkpoint defined per `triggerOn` config

Honest take

We don’t have first-hand experience with Hermes; the public surface is mostly comparison writeups. The named-checkpoint pattern is the highest-value thing we lifted. If you’re building Hermes-style “AI products” in production, the framework is worth a deep look.