Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
135 changes: 134 additions & 1 deletion commands/run/triage.md
Original file line number Diff line number Diff line change
Expand Up @@ -46,6 +46,140 @@ migrateProjectState();
" 2>/dev/null || true
```

**Checkpoint detection — check for resumable progress before stage routing:**

After loading state and running migration, detect whether a prior pipeline run left
a checkpoint with meaningful progress (beyond triage). If found, present the user
with Resume/Fresh/Skip options before proceeding.

```bash
# Detect checkpoint with progress beyond triage
CHECKPOINT_DATA=$(node -e "
const { detectCheckpoint, resumeFromCheckpoint } = require('./lib/state.cjs');
const cp = detectCheckpoint(${ISSUE_NUMBER});
if (!cp) {
console.log('none');
} else {
const resume = resumeFromCheckpoint(${ISSUE_NUMBER});
console.log(JSON.stringify(resume));
}
" 2>/dev/null || echo "none")
```

If checkpoint is found (`CHECKPOINT_DATA !== "none"`):

Parse the checkpoint data and display to the user:
```bash
CHECKPOINT_STEP=$(echo "$CHECKPOINT_DATA" | node -e "
const d=JSON.parse(require('fs').readFileSync('/dev/stdin','utf-8'));
console.log(d.checkpoint.pipeline_step);
")
RESUME_ACTION=$(echo "$CHECKPOINT_DATA" | node -e "
const d=JSON.parse(require('fs').readFileSync('/dev/stdin','utf-8'));
console.log(d.resumeAction);
")
RESUME_STAGE=$(echo "$CHECKPOINT_DATA" | node -e "
const d=JSON.parse(require('fs').readFileSync('/dev/stdin','utf-8'));
console.log(d.resumeStage);
")
COMPLETED_STEPS=$(echo "$CHECKPOINT_DATA" | node -e "
const d=JSON.parse(require('fs').readFileSync('/dev/stdin','utf-8'));
console.log(d.completedSteps.join(', '));
")
ARTIFACTS_COUNT=$(echo "$CHECKPOINT_DATA" | node -e "
const d=JSON.parse(require('fs').readFileSync('/dev/stdin','utf-8'));
console.log(d.checkpoint.artifacts.length);
")
STARTED_AT=$(echo "$CHECKPOINT_DATA" | node -e "
const d=JSON.parse(require('fs').readFileSync('/dev/stdin','utf-8'));
console.log(d.checkpoint.started_at || 'unknown');
")
UPDATED_AT=$(echo "$CHECKPOINT_DATA" | node -e "
const d=JSON.parse(require('fs').readFileSync('/dev/stdin','utf-8'));
console.log(d.checkpoint.updated_at || 'unknown');
")
```

Display checkpoint state and prompt user:
```
AskUserQuestion(
header: "Checkpoint Detected for #${ISSUE_NUMBER}",
question: "A prior pipeline run left progress at step '${CHECKPOINT_STEP}'.

| | |
|---|---|
| **Last step** | ${CHECKPOINT_STEP} |
| **Completed steps** | ${COMPLETED_STEPS} |
| **Artifacts** | ${ARTIFACTS_COUNT} file(s) |
| **Resume action** | ${RESUME_ACTION} → stage: ${RESUME_STAGE} |
| **Started** | ${STARTED_AT} |
| **Last updated** | ${UPDATED_AT} |

How would you like to proceed?",
options: [
{ label: "Resume", description: "Resume from checkpoint — skip completed steps (${COMPLETED_STEPS}), jump to ${RESUME_STAGE}" },
{ label: "Fresh", description: "Discard checkpoint and re-run pipeline from scratch" },
{ label: "Skip", description: "Skip this issue entirely" }
]
)
```

Handle user choice:

| Choice | Action |
|--------|--------|
| **Resume** | Load checkpoint context. Set `pipeline_stage` in state to `${RESUME_STAGE}`. Log: "MGW: Resuming #${ISSUE_NUMBER} from checkpoint (step: ${CHECKPOINT_STEP}, action: ${RESUME_ACTION})." Skip triage/worktree stages that already completed and jump directly to the resume stage in the pipeline. The `resume.context` object carries step-specific data (e.g., `quick_dir`, `plan_num`, `phase_number`) needed by the target stage. |
| **Fresh** | Clear checkpoint via `clearCheckpoint()`. Reset `pipeline_stage` to `"triaged"`. Log: "MGW: Checkpoint cleared for #${ISSUE_NUMBER}. Starting fresh." Continue with normal pipeline flow. |
| **Skip** | Log: "MGW: Skipping #${ISSUE_NUMBER} per user request." STOP pipeline. |

```bash
case "$USER_CHOICE" in
Resume)
# Load resume context and jump to the appropriate stage
node -e "
const fs = require('fs'), path = require('path');
const activeDir = path.join(process.cwd(), '.mgw', 'active');
const files = fs.readdirSync(activeDir);
const file = files.find(f => f.startsWith('${ISSUE_NUMBER}-') && f.endsWith('.json'));
const filePath = path.join(activeDir, file);
const state = JSON.parse(fs.readFileSync(filePath, 'utf-8'));
// The pipeline_stage already reflects prior progress — do not overwrite
// unless the resume target is more advanced than current stage
console.log('Resuming from checkpoint: ' + JSON.stringify(state.checkpoint.resume));
" 2>/dev/null || true
# Set RESUME_MODE=true — downstream stages check this flag to skip completed work
RESUME_MODE=true
RESUME_CONTEXT="${CHECKPOINT_DATA}"
;;
Fresh)
node -e "
const { clearCheckpoint } = require('./lib/state.cjs');
clearCheckpoint(${ISSUE_NUMBER});
console.log('Checkpoint cleared for #${ISSUE_NUMBER}');
" 2>/dev/null || true
# Reset pipeline_stage to triaged for fresh start
node -e "
const fs = require('fs'), path = require('path');
const activeDir = path.join(process.cwd(), '.mgw', 'active');
const files = fs.readdirSync(activeDir);
const file = files.find(f => f.startsWith('${ISSUE_NUMBER}-') && f.endsWith('.json'));
const filePath = path.join(activeDir, file);
const state = JSON.parse(fs.readFileSync(filePath, 'utf-8'));
state.pipeline_stage = 'triaged';
fs.writeFileSync(filePath, JSON.stringify(state, null, 2));
" 2>/dev/null || true
RESUME_MODE=false
;;
Skip)
echo "MGW: Skipping #${ISSUE_NUMBER} per user request."
exit 0
;;
esac
```

If no checkpoint found (or checkpoint is at triage step only), continue with
normal pipeline stage routing below.

**Initialize checkpoint** when pipeline first transitions past triage:
```bash
# Checkpoint initialization — called once when pipeline execution begins.
Expand All @@ -67,7 +201,6 @@ updateCheckpoint(${ISSUE_NUMBER}, {
});
" 2>/dev/null || true
```

Check pipeline_stage:
- "triaged" → proceed to GSD execution
- "planning" / "executing" → resume from where we left off
Expand Down
86 changes: 86 additions & 0 deletions commands/workflows/state.md
Original file line number Diff line number Diff line change
Expand Up @@ -647,6 +647,91 @@ GSD phase directory (`.planning/phases/{NN}-{slug}/`) to operate in.
Issues created outside of `/mgw:project` (e.g., manually filed bugs) will not have
a `phase_number`. In this case, `/mgw:run` falls back to the quick pipeline.

## Checkpoint Resume Detection

When `mgw:run` starts for an issue, the validate_and_load step checks whether a prior
pipeline run left a checkpoint with progress beyond the initial triage step. This enables
resuming interrupted sessions without re-doing completed work.

### Resume Detection Functions (lib/state.cjs)

| Function | Signature | Returns | Description |
|----------|-----------|---------|-------------|
| `detectCheckpoint` | `(issueNumber)` | `object\|null` | Checks if active state file has a non-null checkpoint with `pipeline_step` beyond `"triage"`. Returns the checkpoint data if resumable, `null` otherwise. |
| `resumeFromCheckpoint` | `(issueNumber)` | `object\|null` | Returns checkpoint data plus computed `resumeStage`, `resumeAction`, and `completedSteps`. Maps `resume.action` to the pipeline stage to jump to. |
| `clearCheckpoint` | `(issueNumber)` | `{ cleared: boolean }` | Resets the checkpoint field to `null` in the active state file. Used for "Fresh start" option. |

### Resume Action to Stage Mapping

The `resume.action` field in the checkpoint tells `resumeFromCheckpoint()` which pipeline
stage to jump to:

| resume.action | resumeStage | Meaning |
|---------------|-------------|---------|
| `run-plan-checker` | `planning` | Plan exists, needs quality check |
| `spawn-executor` | `executing` | Plan complete, execute next |
| `continue-execution` | `executing` | Mid-execution resume |
| `spawn-verifier` | `verifying` | Execution done, verify next |
| `create-pr` | `pr-pending` | Verification done, create PR |
| `begin-execution` | `planning` | Triage done, begin planning |
| `null` / unknown | `planning` | Safe default |

### Resume Detection Flow

```
mgw:run #N starts
|
v
Load state file → migrateProjectState()
|
v
detectCheckpoint(N)
|
+---> null (no checkpoint or triage-only) → proceed with normal stage routing
|
+---> checkpoint found → display state to user
|
v
AskUserQuestion: Resume / Fresh / Skip
|
+---> Resume: load checkpoint context, set RESUME_MODE=true,
| jump to resume.action stage (skip completed steps)
|
+---> Fresh: clearCheckpoint(N), reset pipeline_stage to "triaged",
| continue normal pipeline
|
+---> Skip: exit pipeline for this issue
```

### Pipeline Step Order

The `CHECKPOINT_STEP_ORDER` constant defines the ordered progression of checkpoint steps:

```
triage → plan → execute → verify → pr
```

Only checkpoints with `pipeline_step` at index > 0 (beyond `"triage"`) are considered
resumable. A checkpoint at `"triage"` means nothing meaningful has been completed yet.

### Resume Context

When resuming, the `resume.context` object carries step-specific data needed by the
target stage. The context shape varies by `resume.action`:

| resume.action | Context fields |
|---------------|----------------|
| `spawn-executor` | `{ quick_dir, plan_num }` |
| `run-plan-checker` | `{ quick_dir, plan_num }` |
| `spawn-verifier` | `{ quick_dir, plan_num }` |
| `create-pr` | `{ quick_dir, plan_num }` |
| `continue-execution` | `{ phase_number }` |
| `begin-execution` | `{ gsd_route, branch }` |

Downstream pipeline stages read `resume.context` to pick up where the prior run left
off. For example, the executor stage uses `quick_dir` and `plan_num` to locate the
existing plan files rather than re-creating them.

## Consumers

| Pattern | Referenced By |
Expand All @@ -661,5 +746,6 @@ a `phase_number`. In this case, `/mgw:run` falls back to the quick pipeline.
| Project state | milestone.md, next.md, ask.md |
| Gate result schema | issue.md (populate), run.md (validate) |
| Board status sync | board-sync.md (utility), issue.md (triage transitions), run.md (pipeline transitions) |
| Checkpoint resume | run.md (detect + prompt), milestone.md (detect resume point for failed issues) |
| Checkpoint writes | triage.md (init), execute.md (plan/execute/verify), pr-create.md (pr) |
| Atomic writes | lib/state.cjs (`atomicWriteJson`, `updateCheckpoint`) |
Loading
Loading