feat(exec): detect and recover from zero-commit agent executions 😵‍💫 #55

Closed
ivy wants to merge 45 commits into main from hive/ca7be0e4-afb4-40e0-89a4-506c7f8bebe8

ivy commented Feb 10, 2026

Why

`hive exec` silently succeeded when an agent produced zero commits, which deferred failure detection to `publish`. `publish` then crashed with `GraphQL: No commits between main and <branch>` while trying to create a PR. This violated Principle 3 (Single Responsibility): `publish` shouldn't be responsible for detecting that `exec` didn't produce work.

The blind spot: `exec` checked only for uncommitted changes after execution and had no validation for "no work at all." An agent could analyze a design issue, exit cleanly with zero commits and zero uncommitted changes, and `exec` would still report success.

What

Implemented a two-layer approach:

Proactive (prevention): Updated the agent system prompt to explicitly require implementation and commits. If an issue is not implementable (e.g., a design discussion), the agent should report completion as false and list its blockers.

Reactive (detection & recovery): After `exec` completes:

  1. Check for commits with `git rev-list --count main..HEAD` (the happy path costs no extra agent call)
  2. If there are zero commits, request a structured completion report via `--json-schema` (following the `prdraft` pattern)
  3. If the agent claims completion but no commits exist, retry with a nudge to actually implement (up to 3 attempts)
  4. If the agent reports blockers, fail with the blocker reason
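
Step 1 can be sketched as follows. This is a minimal illustration, not the code in this PR: `hasNewCommits` and `parseCommitCount` are hypothetical names, and the real `HasNewCommits` in the workspace package may have a different signature.

```go
package main

import (
	"fmt"
	"os/exec"
	"strconv"
	"strings"
)

// parseCommitCount converts the output of `git rev-list --count` to an int.
func parseCommitCount(out string) (int, error) {
	n, err := strconv.Atoi(strings.TrimSpace(out))
	if err != nil {
		return 0, fmt.Errorf("unexpected rev-list output %q: %w", out, err)
	}
	return n, nil
}

// hasNewCommits reports whether HEAD in dir has commits that base lacks.
// Hypothetical helper written for this sketch.
func hasNewCommits(dir, base string) (bool, error) {
	cmd := exec.Command("git", "rev-list", "--count", base+"..HEAD")
	cmd.Dir = dir
	out, err := cmd.Output()
	if err != nil {
		return false, fmt.Errorf("git rev-list: %w", err)
	}
	n, err := parseCommitCount(string(out))
	if err != nil {
		return false, err
	}
	return n > 0, nil
}

func main() {
	ok, err := hasNewCommits(".", "main")
	if err != nil {
		fmt.Println("could not count commits:", err)
		return
	}
	fmt.Println("agent produced commits:", ok)
}
```

Keeping the parsing separate from the `git` invocation lets the count handling be tested without a repository.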

Safety net: Added a guard in `publish` that checks for commits before attempting to push or open a PR. This is belt-and-suspenders: `exec` should catch the problem first, but `publish` won't crash if something slips through.
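
A guard of this kind might look like the sketch below. The names `publish`, `errNoCommits`, and the two callbacks are assumptions made for illustration; the actual guard in this PR is not shown on this page.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoCommits is a hypothetical sentinel so callers can tell an empty
// branch apart from a failed push.
var errNoCommits = errors.New("no commits on branch; nothing to publish")

// publish calls push only when countCommits reports at least one commit.
// Both callbacks are injected so the guard can be exercised without git.
func publish(countCommits func() (int, error), push func() error) error {
	n, err := countCommits()
	if err != nil {
		return fmt.Errorf("counting commits: %w", err)
	}
	if n == 0 {
		return errNoCommits
	}
	return push()
}

func main() {
	push := func() error { fmt.Println("pushing branch"); return nil }

	err := publish(func() (int, error) { return 0, nil }, push)
	fmt.Println("empty branch:", err)

	err = publish(func() (int, error) { return 2, nil }, push)
	fmt.Println("two commits:", err)
}
```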

Notes for reviewers

  • Validation uses `j.RunCapture()` rather than `j.Run()` so the structured output can be parsed; the main execution is unchanged and still streams real-time logs
  • The completion schema mirrors the pattern from `prdraft`: same retry loop, same structured output parsing
  • Test coverage includes specs for `HasNewCommits` in the workspace package
  • Also includes an unrelated fix: changed `workspace.Create` to branch from `main` instead of `HEAD` (branching from `HEAD` was causing test failures)
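
The recovery loop described under "Reactive" can be sketched as follows. This is one plausible reading of the steps, not the PR's code: the `completionReport` fields, the function names, and the choice to count each nudge as one attempt are all assumptions, and the real schema passed to `--json-schema` is not shown here.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// completionReport is a hypothetical shape for the structured report.
type completionReport struct {
	Complete bool     `json:"complete"`
	Blockers []string `json:"blockers"`
}

const maxAttempts = 3 // "up to 3 attempts" from the description

// recoverZeroCommits runs after exec has found zero commits.
// report asks the agent for a JSON completion report, nudge re-runs the
// agent with a reminder to implement, and hasCommits re-checks the branch.
func recoverZeroCommits(
	report func() ([]byte, error),
	nudge func() error,
	hasCommits func() (bool, error),
) error {
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		raw, err := report()
		if err != nil {
			return fmt.Errorf("requesting completion report: %w", err)
		}
		var r completionReport
		if err := json.Unmarshal(raw, &r); err != nil {
			return fmt.Errorf("parsing completion report: %w", err)
		}
		if !r.Complete {
			if len(r.Blockers) == 0 {
				return errors.New("agent reported incomplete work without blockers")
			}
			return fmt.Errorf("agent blocked: %s", strings.Join(r.Blockers, "; "))
		}
		// The agent claims completion but the branch is empty: nudge, then re-check.
		if err := nudge(); err != nil {
			return fmt.Errorf("nudging agent (attempt %d): %w", attempt, err)
		}
		ok, err := hasCommits()
		if err != nil {
			return fmt.Errorf("re-checking commits: %w", err)
		}
		if ok {
			return nil
		}
	}
	return fmt.Errorf("agent produced no commits after %d attempts", maxAttempts)
}

// simulate runs the loop against a fake agent that commits on its
// commitOn-th nudge (0 means never) and returns the nudge count and result.
func simulate(reportJSON string, commitOn int) (int, error) {
	nudges := 0
	err := recoverZeroCommits(
		func() ([]byte, error) { return []byte(reportJSON), nil },
		func() error { nudges++; return nil },
		func() (bool, error) { return commitOn > 0 && nudges >= commitOn, nil },
	)
	return nudges, err
}

func main() {
	n, err := simulate(`{"complete": true, "blockers": []}`, 2)
	fmt.Println("nudges:", n, "err:", err)

	n, err = simulate(`{"complete": false, "blockers": ["design discussion, nothing to implement"]}`, 0)
	fmt.Println("nudges:", n, "err:", err)
}
```

Injecting the three callbacks keeps the loop independent of the jail and of git, which is also what makes it straightforward to test.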

Generated with Hive | Closes #48

ivy and others added 30 commits February 10, 2026 01:31
…ths ✨

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
New commands: `hive ls` lists sessions, `hive cd` spawns a shell in a
session's workspace. Shared resolveSession logic handles both UUID
direct lookup and ref-based resolution via claims. `hive attach` now
resolves by ref or UUID. `hive list` becomes a deprecated alias for ls.
Replaces manual `hive cleanup` with session-aware reaping. Scans for
terminal sessions past retention, removes workspaces/sessions/claims.
Detects stale claims with no active systemd unit and marks them failed.
Deprecates the cleanup command.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extract reapSessions with injectable deps (isUnitActive, releaseOnSource,
removeWorkspace) so core reap logic is testable without systemd or GitHub
API. Tests cover: expired published/failed reaping, retention-window
preservation, stale session detection, and active unit skipping.
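
The dependency-injection shape described in this commit message could look roughly like the sketch below. Only the three dependency names and the `published`/`failed` statuses come from the message; the `session` struct, the `running` status, the function signature, and the point at which `releaseOnSource` is called are assumptions.

```go
package main

import (
	"fmt"
	"time"
)

// session is a hypothetical, trimmed-down session record.
type session struct {
	ID         string
	Status     string    // "running", "published", or "failed" in this sketch
	FinishedAt time.Time // zero while the session is still running
}

// reapDeps bundles the side effects so tests can swap in fakes.
type reapDeps struct {
	isUnitActive    func(id string) bool
	releaseOnSource func(id string) error
	removeWorkspace func(id string) error
}

func isTerminal(status string) bool {
	return status == "published" || status == "failed"
}

// reapSessions returns the IDs it reaped and the IDs it marked failed as stale.
func reapSessions(sessions []*session, now time.Time, retention time.Duration, d reapDeps) ([]string, []string, error) {
	var reaped, stale []string
	for _, s := range sessions {
		switch {
		case isTerminal(s.Status):
			if now.Sub(s.FinishedAt) < retention {
				continue // still inside the retention window
			}
			if err := d.removeWorkspace(s.ID); err != nil {
				return reaped, stale, fmt.Errorf("remove workspace %s: %w", s.ID, err)
			}
			reaped = append(reaped, s.ID)
		case d.isUnitActive(s.ID):
			continue // a live unit still owns this session
		default:
			// No unit is running: assume the session is stale and mark it failed.
			if err := d.releaseOnSource(s.ID); err != nil {
				return reaped, stale, fmt.Errorf("release %s: %w", s.ID, err)
			}
			s.Status = "failed"
			s.FinishedAt = now
			stale = append(stale, s.ID)
		}
	}
	return reaped, stale, nil
}

// demo runs one fixed scenario with fake dependencies.
func demo() (reaped, stale []string, err error) {
	now := time.Date(2026, 2, 10, 12, 0, 0, 0, time.UTC)
	sessions := []*session{
		{ID: "old-published", Status: "published", FinishedAt: now.Add(-48 * time.Hour)},
		{ID: "recent-failed", Status: "failed", FinishedAt: now.Add(-time.Hour)},
		{ID: "orphaned", Status: "running"},
		{ID: "busy", Status: "running"},
	}
	deps := reapDeps{
		isUnitActive:    func(id string) bool { return id == "busy" },
		releaseOnSource: func(string) error { return nil },
		removeWorkspace: func(string) error { return nil },
	}
	return reapSessions(sessions, now, 24*time.Hour, deps)
}

func main() {
	reaped, stale, err := demo()
	fmt.Println("reaped:", reaped, "stale:", stale, "err:", err)
}
```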

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Manual runs (hive run owner/repo#N) don't go through poll so sessions
lack board_item_id in SourceMetadata. Skip the source release for these
instead of hitting "unknown ref" from the ghprojects adapter's empty
cache. Also populate SourceMetadata when --board-item-id flag is passed.
Add poll.max-concurrent, poll.instance, reap section with retention
settings, and github.ready-option-id. Improve comments to document
each key's purpose and per-instance config pattern.
Repo map, pipeline, command table, and tech stack now reflect the
session/claim/source packages, systemd dispatch, reap, and new CLI
surface (ls, cd, attach).
Reflects the new dispatch model (systemd units, claims, sessions),
source abstraction, XDG data layout, session status lifecycle, and
updated CLI surface (ls, cd, reap).
The end-state specification that guided the 8-phase lifecycle
migration is now reality. Track it alongside architecture.md as
the reference for dispatch, sessions, claims, and systemd units.
Explains threat model (helpful-not-malicious), trust boundaries across
pipeline stages, credential isolation via systemd-run mounts, the .git
write trade-off, network access stance, author authorization, and
what's explicitly out of scope.
Guided tutorial taking the operator from a freshly built hive binary
through manually dispatching a single issue to a completed pull request.
Covers prepare, exec, publish, and the hive run shortcut.
Extract CLI examples (tutorial/how-to material), session status lifecycle
and poll loop detail (now in lifecycle.md). Add conceptual "why" framing
to each component, cross-links to security model and ADRs, data flow
diagram, and authz module.
Step-by-step: build from source, install binary, install systemd
units, verify toolchain.
Finding GraphQL IDs, setting up config.toml, per-instance config
for multi-project setups, poll and reap tuning.
Systemd user service setup, loginctl enable-linger, log checking,
zero-downtime binary upgrades, multi-instance deployment.
hive ls, hive cd, hive attach, reading .hive/ metadata, journald
logs, manual resume with hive exec --resume.
Writing GitHub issues that work as agent prompts, based on real
learnings from prototype usage.
Complete reference for every hive command with synopsis, flags,
config keys, credentials, exit codes, and examples.
All config keys organized by section, environment variables,
resolution order, and named instance support.
Session JSON schema, status state machine, claim file format,
workspace directory layout, and metadata files.
Source interface contract, WorkItem struct, ref format conventions,
and GitHub Projects adapter specifics.
Jail interface contract, RunOpts struct, systemd-run backend details,
sandbox properties, mount strategy, and credential isolation.
ivy and others added 15 commits February 10, 2026 02:31
All unit templates, instance specifiers, unit relationships,
and useful systemctl/journalctl commands.
Trim duplicated content that now lives in docs/, add Diátaxis-structured
documentation navigation, streamline quick start, and add commands table.
Link out to tutorial, how-to, reference, and explanation docs instead of
inlining setup/usage/sandboxing/workspace details.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Set version via -ldflags at build time so `hive --version` shows the
build timestamp. Logs version on startup for debugging. Defaults to
"dev" for unset builds.
Remove board-item-id (not written by prepare), replace metadata table
with inline mention + cross-link, cut flags table, trim reap details,
and shorten sandbox/retry explanations to single sentences with links.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Aligns with Diátaxis convention for how-to guide naming.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
VERSION defaults to "dev" and is overridable via env var
(e.g. VERSION=v1.0.0 make build). Build timestamp is always
injected separately so version and build time are independent.
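
On the Go side, build-time injection of this kind usually relies on the linker's `-X` flag. The sketch below assumes the variables are named `version` and `buildTime` in package `main`; the actual names and Makefile wiring in this repository are not shown on this page.

```go
package main

import "fmt"

// Both values can be overridden at build time, for example:
//
//	go build -ldflags "-X main.version=v1.0.0 -X main.buildTime=2026-02-10T02:31:00Z"
//
// The variable names are assumptions made for this sketch.
var (
	version   = "dev"     // default for builds that set nothing
	buildTime = "unknown" // injected separately from version
)

// versionString formats the two independent values for `--version` output.
func versionString() string {
	return fmt.Sprintf("%s (built %s)", version, buildTime)
}

func main() {
	fmt.Println("hive", versionString())
}
```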
debug-session: move status table and metadata listing to session ref links.
configure: move config search order to config reference link.
write-issues: remove "Why issue quality matters" explanation and
"Authorization"/"Board workflow" reference sections, add cross-links.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
jail-interface: stdin is not connected (only stdout/stderr are set).
source-interface: Complete is not called by publish — publish calls
gh.MoveToInReview() directly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
architecture: replace inline Data Layout tree with cross-link to
reference/session.md, add links to jail and source interface refs.
lifecycle: update data layout links to reference/session.md, make
"reference docs" mention link to specific docs.
security-model: fix "silently fail" to "rejected with permission denied".

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Repo map now includes reference/, how-to/, explanation/, tutorial/,
and prototype/ directories added during docs reorganization. Corrects
logging entry from nonexistent slog-journal to actual TextHandler,
and adds Key Docs entries for the full Diátaxis structure.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use coreos/go-systemd daemon.SdNotify to report poll loop and run
pipeline status via STATUS= strings, visible in systemctl status.
Poll service upgraded to Type=notify with READY=1 on startup.
Run service gets NotifyAccess=all for status passthrough.
No-op when not running under systemd.
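
The commit uses `daemon.SdNotify` from coreos/go-systemd. To show what that call does without pulling in the dependency, the sketch below re-implements the underlying notify protocol with the standard library only; `notify` and `demoNoSocket` are hypothetical names, and this is not the code from the commit.

```go
package main

import (
	"fmt"
	"net"
	"os"
)

// notify sends state (for example "READY=1" or "STATUS=polling") as a single
// datagram to the socket named in NOTIFY_SOCKET. It returns false with a nil
// error when the variable is unset, which is the no-op case outside systemd.
func notify(state string) (bool, error) {
	path := os.Getenv("NOTIFY_SOCKET")
	if path == "" {
		return false, nil
	}
	addr := &net.UnixAddr{Name: path, Net: "unixgram"}
	conn, err := net.DialUnix("unixgram", nil, addr)
	if err != nil {
		return false, err
	}
	defer conn.Close()
	if _, err := conn.Write([]byte(state)); err != nil {
		return false, err
	}
	return true, nil
}

// demoNoSocket shows the no-op path by clearing NOTIFY_SOCKET first.
func demoNoSocket() (bool, error) {
	os.Unsetenv("NOTIFY_SOCKET")
	return notify("STATUS=idle")
}

func main() {
	for _, state := range []string{"READY=1", "STATUS=polling for work"} {
		sent, err := notify(state)
		fmt.Printf("%s: sent=%v err=%v\n", state, sent, err)
	}
}
```

Under a `Type=notify` unit, systemd sets `NOTIFY_SOCKET` and shows the latest `STATUS=` string in `systemctl status`.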
Prevents publish crashes when agents produce analysis without code changes.

Two-layer approach:
1. Proactive: Updated agent system prompt to require implementation
2. Reactive: Post-exec validation using structured output

When zero commits detected, exec requests completion report via
--json-schema. If agent claims completion (but no commits), retries with
nudge to implement. If agent reports blockers, fails with reason.

Publish now guards against zero-commit branches as safety net.

Closes #54
ivy closed this Feb 10, 2026

Development

Successfully merging this pull request may close these issues.

exec should validate agent produced commits
