feat(opencode): one-task-per-run model + 9 routing fixes (ADRs 13-21)

Captures the design grilling outcome. Adds ADRs 13-21 covering:
- run-level plan_rework_remaining counter to bound P3<->P5.5/P7/P8 thrash
- non-resumable workflow with throwaway-worktree recovery procedure
- @simplify advisory at every gate (not just Phase 8)
- Phase 8 fix specs go to disk as task-fix-N.md (preserves ADR-6)
- Phase 5.5 BLOCK protocol: orchestrator edits plan, decrements counter, re-enters P4
- Phase 8 NOT_TESTABLE manifest in reviewer prompt
- unified Implementation Incomplete diagnosis (test_design / production_logic / split_needed)
- Phase 1 working-tree cleanliness + depends-on enforcement
- one-task-per-run pivot: Phase 5 still splits N tasks, only task-1 runs;
  tasks 2..N filed as sub-issues with rich seed bodies; split_needed at P7
  aborts to Failure Handler (one-task-per-run = no salvageable prior work)

Auto-resolves big-diff Phase 8 reviews, cross-task regression-within-run, and
mid-flight task-split routing. Rewrites routing matrix and three Mermaid
diagrams; updates @pm (depends-on frontmatter, split-time filing), @check
(third diagnosis verdict), @make (escalate: split_needed flag).
Harald Hoyer 2026-05-08 13:02:54 +02:00
parent 0b15944d1c
commit af6481a5a7
5 changed files with 342 additions and 130 deletions


@@ -86,7 +86,10 @@ Define `RUN_DIR="$WORKTREE_PATH/.workflow/run-$ISSUE_ID"` once in Phase 1 and re
- Else if `git rev-parse --verify --quiet master` succeeds, use `master`.
- Else stop: "Could not determine base branch (no `main` or `master`). Pass it as the second argument: `/workflow <ISSUE-ID> <base-branch>`."
7. Verify the current branch is not the base branch: if `BRANCH_NAME == BASE_BRANCH`, stop: "Cannot run workflow on the base branch (`$BASE_BRANCH`). Switch to a feature branch first."
8. Set the run-artifacts directory: `RUN_DIR="$WORKTREE_PATH/.workflow/run-$ISSUE_ID"`. Phase 3 will `mkdir -p "$RUN_DIR"` before writing the first artifact.
8. **Verify the working tree is clean** (ADR-20): `git status --porcelain` must return empty. If not, stop: "Working tree must be clean. Commit or stash uncommitted changes before running the workflow."
9. **Check `depends-on:` declarations** (ADR-21): if `./TODO/$ISSUE_ID.md`'s frontmatter contains a `depends-on: [<ID>, ...]` list, verify every listed dependency's status is `Done` (read each `./TODO/<DEP-ID>.md`). If any dependency is not `Done`, stop: "Cannot start `$ISSUE_ID`; it depends on `<DEP-ID>` (status: `<status>`). Complete dependencies first." If a listed dependency file does not exist, stop with: "Cannot start `$ISSUE_ID`; declared dependency `<DEP-ID>` has no issue file." If the field is absent, proceed.
10. Set the run-artifacts directory: `RUN_DIR="$WORKTREE_PATH/.workflow/run-$ISSUE_ID"`. Phase 3 will `mkdir -p "$RUN_DIR"` before writing the first artifact.
11. Initialize the run-level rework counter: `PLAN_REWORK_REMAINING=1` (per ADR-13). Decrement on every P5.5-BLOCK→P4, P7-escalation-exhaustion→P3, and P8-plan-level→P3 transition. When the counter is `0` and another such transition fires, abort to the Failure Handler instead of re-entering.
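The two new Phase 1 gates (clean working tree, `depends-on:` enforcement) can be sketched as one pure function. This is a minimal illustration, not the orchestrator's actual implementation: the function name `preflight_error` and its parameters are hypothetical, and it assumes the caller has already captured the output of `git status --porcelain` and read each dependency's status from `./TODO/<DEP-ID>.md`.

```python
from typing import Mapping, Optional, Sequence


def preflight_error(
    porcelain_output: str,
    issue_id: str,
    depends_on: Sequence[str],
    statuses: Mapping[str, str],
) -> Optional[str]:
    """Return the stop message for the first failed check, or None to proceed.

    `statuses` maps dependency ID -> frontmatter status; an ID that is
    missing from the mapping stands for "no issue file" (an assumption of
    this sketch).
    """
    # Step 8: `git status --porcelain` must print nothing.
    if porcelain_output.strip():
        return ("Working tree must be clean. Commit or stash uncommitted "
                "changes before running the workflow.")
    # Step 9: every declared dependency must exist and be Done.
    for dep in depends_on:
        if dep not in statuses:
            return (f"Cannot start `{issue_id}`; declared dependency "
                    f"`{dep}` has no issue file.")
        if statuses[dep] != "Done":
            return (f"Cannot start `{issue_id}`; it depends on `{dep}` "
                    f"(status: `{statuses[dep]}`). Complete dependencies first.")
    return None
```

An absent `depends-on:` field corresponds to passing an empty sequence, in which case only the cleanliness check applies.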
---
@@ -196,24 +199,23 @@ Reviewers should evaluate testability:
- `@check`: Is the design testable? Are the right behaviors identified? (Review Framework §8)
- `@simplify`: Is the test scope appropriate? Over-testing proposed?
**Merge rules:**
- `@check` safety/correctness findings are hard constraints
- If `@simplify` recommends removing something `@check` flags as needed, `@check` wins
- Note conflicts explicitly
**Verdict authority** (ADR-15): `@check` is the only blocking reviewer. Its `NEEDS WORK` and `BLOCK` verdicts gate progression; its `ACCEPTABLE` verdict permits proceeding regardless of what `@simplify` reports. `@simplify` is advisory at every gate — its findings are recorded in the run summary's "Advisory notes (not filed)" section but never trigger a re-dispatch loop. The user may manually elevate a `@simplify` finding into a `task-fix-<N>.md` if it warrants follow-up.
**Review loop (max 3 cycles):**
**Review loop (max 3 cycles, gated on `@check`):**
1. Dispatch both reviewers against `$RUN_DIR/plan.md`.
2. Merge findings
3. If verdict is ACCEPTABLE from both (or JUSTIFIED COMPLEXITY from `@simplify`): proceed to Phase 5
4. If BLOCK or NEEDS WORK: edit `$RUN_DIR/plan.md` in place addressing findings (re-apply Dispatch Hygiene to the updated file), then re-review.
5. **Convergence detection:** if reviewers return the same findings as the previous cycle, stop the loop early
6. If still unresolved after 3 cycles: note unresolved blockers and proceed anyway (they will be documented in the workflow summary and commit message)
2. Merge findings: record `@simplify`'s output for the summary; act only on `@check`'s verdict.
3. If `@check` returns ACCEPTABLE: proceed to Phase 5 (regardless of `@simplify`).
4. If `@check` returns BLOCK or NEEDS WORK: edit `$RUN_DIR/plan.md` in place addressing the findings (re-apply Dispatch Hygiene to the updated file), then re-review.
5. **Convergence detection:** if `@check` returns the same findings as the previous cycle, stop the loop early.
6. If still unresolved after 3 cycles: note unresolved blockers and proceed anyway (they will be documented in the workflow summary and commit message).
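The `@check`-gated loop above can be sketched roughly as follows. The names `run_check_gate`, `dispatch_check`, and `revise_plan` are hypothetical stand-ins for the orchestrator's dispatch and plan-editing steps, and `@simplify` is deliberately absent because its output never influences the loop. The sketch also assumes that no plan edit happens after the final review, since no further cycle would look at it.

```python
from typing import Callable, List, Optional, Tuple

CheckResult = Tuple[str, List[str]]  # (verdict, findings) reported by @check


def run_check_gate(
    dispatch_check: Callable[[], CheckResult],
    revise_plan: Callable[[List[str]], None],
    max_cycles: int = 3,
) -> Tuple[bool, List[str]]:
    """Return (accepted, unresolved_findings) after at most `max_cycles` reviews."""
    previous: Optional[List[str]] = None
    findings: List[str] = []
    for cycle in range(max_cycles):
        verdict, findings = dispatch_check()
        if verdict == "ACCEPTABLE":
            return True, []
        if findings == previous:
            break  # convergence: same findings as the previous cycle
        if cycle + 1 < max_cycles:
            revise_plan(findings)  # edit plan.md in place, then re-review
        previous = findings
    # Unresolved: the caller documents the blockers and proceeds anyway.
    return False, findings
```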
---
## Phase 5: Split into Tasks
**The output of this phase is one file per task at `$RUN_DIR/task-<N>.md`** (1-indexed: `task-1.md`, `task-2.md`, …). These files are the source-of-truth that Phase 6 (`@test`) and Phase 7 (`@make`) read by absolute path. **No file written = no dispatch in later phases.** If you skip the file-write step, every downstream dispatch will reference a non-existent path and fail.
**The output of this phase is one file per task at `$RUN_DIR/task-<N>.md`** (1-indexed: `task-1.md`, `task-2.md`, …). These files are the source-of-truth that Phase 5.5, Phase 6, and Phase 7 read by absolute path. **No file written = no dispatch in later phases.** If you skip the file-write step, every downstream dispatch will reference a non-existent path and fail.
**One-task-per-run model (ADR-21):** Phase 5 still produces N task files. After Phase 5.5 review, only **task-1** runs through Phases 6–8 in this invocation. If N > 1, tasks 2…N are filed as TODO sub-issues at the end of Phase 5.5 (see "File sibling tasks as sub-issues" below) and the user runs `/workflow` separately on each. This bounds the run's scope, keeps Phase 8 diffs reviewable, and eliminates cross-task regression risk inside a run.
Steps:
@@ -254,9 +256,11 @@ Include **Test Design** from Phase 3 when available, attached to the relevant ta
**Task size:** ~10-30 minutes each, single coherent change, clear boundaries.
### Split Heuristic — when in doubt, split
### Split Heuristic — when in doubt, **do not** split
A task must be **split** if any of the following apply:
In the one-task-per-run model (ADR-21), splitting fans work out across user sessions: every additional task becomes a sub-issue the user must come back and run as its own `/workflow` invocation, with full P3/P4/P5/P5.5/P6/P7/P8 overhead per sub-issue. **Default to keeping work in one task.** Only split when one of the mechanical triggers below clearly applies *and* the resulting sub-tasks each warrant their own commit/PR-sized chunk of attention.
A task should be **split** when any of the following apply:
- It touches more than two distinct concerns (e.g. *constants + new component + sprite spawn + new system + main wiring* is **five** concerns — at least three tasks).
- It changes more than ~50 lines across more than 2 files.
@@ -264,12 +268,14 @@ A task must be **split** if any of the following apply:
- It mixes pure-logic changes (math helpers) with stateful changes (queries, world mutation).
- It mixes new APIs with their first call sites in the same task.
When a task fails the heuristic, split into:
**Tiebreaker:** when none of the triggers clearly applies and the work plausibly fits a single coherent commit, do not split. Splitting fans out across sessions; only split if each resulting sub-issue is genuinely independently runnable and benefits from its own plan.
When a task does fail the heuristic, split into:
1. **Foundations** — new constants, types, components (no behavior change yet).
2. **Implementation** — the actual production logic, calling the foundations.
3. **Wiring** — registration in `main.rs` / `lib.rs` / app-builder.
Each split is dispatched separately to `@make` and verified before the next.
Tasks 2…N are filed as sub-issues at the end of Phase 5.5; only task 1 runs in this invocation.
### Code Context — what to include
@@ -289,32 +295,73 @@ Apply **Dispatch Hygiene** to each task spec before dispatch in Phase 7.
## Phase 5.5: Review Task Split
A short, focused review of the task split as a set. Catches split errors (missed scope, overlap, multi-purpose tasks, missing integration contracts) when they're cheap to fix — *before* `@test` and `@make` dispatch on a broken split. Without this gate, the same errors surface only at Phase 8 final review, after expensive test/implementation work has already been done.
A short, focused review of the task split as a set. In the one-task-per-run model (ADR-21), this phase is the gate for **two** things: (a) catching split errors (missed scope, overlap, multi-purpose tasks, missing integration contracts) before `@test`/`@make` dispatch, and (b) preventing a botched split from being persisted as garbage sub-issues that the user has to manually clean up later. Both stakes are higher than in the original N-tasks-per-run design.
**Dispatch only `@check`** for this phase — split review is structural / coverage, not complexity. `@simplify` is not involved. Apply **Dispatch Hygiene** to the prompt.
**Skip Phase 5.5 entirely when N=1** (ADR-21): a single-task plan has no split to review. Three of the six questions below (no overlap, integration contracts, sub-issue self-containment) are degenerate. The remaining structural concerns (coverage, single-purpose, testable AC) are already evaluated at Phase 4 plan acceptance. Proceed directly to Phase 6.
The dispatch prompt names:
- `$RUN_DIR/plan.md` (the plan being decomposed)
- `$RUN_DIR/task-1.md` through `$RUN_DIR/task-N.md` (the split — list every task file)
- The worktree path
`@check` evaluates the split against five questions:
`@check` evaluates the split against six questions:
1. **Coverage** — do the tasks together implement everything the plan promises? Any gaps?
2. **No overlap** — do two tasks claim the same scope or modify the same lines?
3. **Single-purpose** — does any task do more than one thing? (See Phase 5's Split Heuristic.)
4. **Integration contracts** — where two tasks touch a shared interface, is the contract documented in both task files?
4. **Integration contracts** — where two tasks touch a shared interface, is the contract documented in *both* task files in a form that survives sub-issue filing? (Each sub-issue runs in isolation later — its eventual P3 plan must be reconstructable from the sub-issue body alone, including any cross-sub-issue dependencies. This is the load-bearing question in the new model.)
5. **Testable acceptance criteria** — does every task have specific, falsifiable AC?
6. **Self-containment** — is each task spec runnable as a standalone `/workflow` invocation? Does its description carry enough plan-level context (rationale, code seams, scope boundary) that a fresh run could re-plan it without seeing the parent plan or sibling task files?
**Review loop (max 2 cycles):**
1. Dispatch `@check` against the plan + all task files.
2. If `ACCEPTABLE` → proceed to Phase 6.
3. If `NEEDS WORK` → edit the task files in place (split a task into two, merge two tasks, add integration contracts, sharpen AC). Re-apply Dispatch Hygiene to each updated file. Re-dispatch.
4. If `BLOCK` → the plan itself does not decompose cleanly. Return to Phase 4 with `@check`'s finding instead of forcing the split.
5. **Convergence detection:** same finding twice → stop loop, document the unresolved split issue in the run summary, proceed.
2. If `ACCEPTABLE` → proceed to "File sibling tasks as sub-issues" (below), then Phase 6.
3. If `NEEDS WORK` → edit the task files in place (split a task into two, merge two tasks, strengthen integration contracts, sharpen AC, add self-containment context). Re-apply Dispatch Hygiene to each updated file. Re-dispatch.
4. If `BLOCK` (a plan-level finding, ADR-17) → translate the split-level finding into a concrete `plan.md` edit, save the edit, **decrement `PLAN_REWORK_REMAINING`**, and re-enter Phase 4 against the revised plan. If `PLAN_REWORK_REMAINING` was already `0`, abort to the Failure Handler instead.
5. **Convergence detection:** same `@check` finding twice → stop loop, document the unresolved split issue in the run summary, proceed.
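The decrement-or-abort rule for `PLAN_REWORK_REMAINING` is small enough to sketch directly. The class and exception names here are hypothetical; the only behavior taken from the text is that the counter starts at `1`, each rework transition consumes one credit, and a transition that fires at `0` aborts to the Failure Handler.

```python
class FailureHandlerAbort(Exception):
    """Raised when a rework transition fires with no budget left."""


class ReworkBudget:
    """Run-level rework counter (models PLAN_REWORK_REMAINING)."""

    def __init__(self, remaining: int = 1) -> None:
        self.remaining = remaining

    def spend(self, transition: str) -> None:
        """Consume one rework credit, or abort if none is left."""
        if self.remaining == 0:
            raise FailureHandlerAbort(
                f"plan_rework_remaining exhausted at {transition}")
        self.remaining -= 1
```

Because the same object would be shared by Phase 5.5, Phase 7, and Phase 8, a single rework anywhere in the run uses up the budget for all three.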
**This is a quick gate, not a deep review.** No line-by-line code feedback (there's no code), no design re-litigation (that was Phase 4's job). The whole point is a fast structural check before downstream phases start churning.
**This is a quick gate, not a deep review.** No line-by-line code feedback (there's no code), no design re-litigation (that was Phase 4's job). The whole point is a fast structural check before downstream phases start churning *and* before sibling tasks become persistent sub-issues.
### File sibling tasks as sub-issues (when N > 1)
After Phase 5.5 returns ACCEPTABLE, dispatch `@pm` to file each of `task-2.md` through `task-N.md` as a TODO sub-issue with `parent: $ISSUE_ID`. **Only task-1 continues into Phase 6.** Each filed sub-issue gets a rich seed body (ADR-21) so its eventual `/workflow` run can plan and implement without seeing siblings or the original `plan.md`.
For each task `$N` in 2…N, dispatch `@pm` with the following body content (assembled by the orchestrator from `task-<N>.md` and the relevant slice of `plan.md`):
```markdown
## What to implement
<task description from task-<N>.md>
## Acceptance criteria
<AC checkboxes from task-<N>.md>
## Code Context
<code snippets from task-<N>.md>
## Integration with sibling sub-issues
<dependencies on sibling sub-issues, with brief rationale; declared in frontmatter as `depends-on: [<SIB-ID>, ...]`>
## Plan rationale
<relevant slice of plan.md, typically 1–3 paragraphs covering why this approach was chosen>
## Test design
<from task-<N>.md or plan.md if present>
---
Discovered during run on `$BRANCH_NAME` for parent issue `$ISSUE_ID`.
```
`@pm` invocation per sub-issue:
- Title — derived from `task-<N>.md`'s task description (short imperative).
- Status — `Todo`.
- Parent — `$ISSUE_ID`.
- Labels — propagate relevant labels from the parent (e.g. `gameplay`); add `split-from-run` to mark the provenance.
- `depends-on:` — sibling sub-issue IDs that this task requires to be `Done` first. The orchestrator determines the dependency graph from the integration contracts captured in Phase 5.5 question 4.
The new sub-issue files plus the parent's updated `## Sub-issues` list are staged in Phase 9's `chore(todo): …` commit alongside the parent's status/AC updates.
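As an illustration of how the orchestrator might assemble the seed body before handing it to `@pm`, the sketch below joins the sections from the template above in their documented order. The function name `render_seed_body` and the `sections` mapping are hypothetical, and omitting a section whose content is missing is an assumption of this sketch (the template only marks the test design as "if present").

```python
from typing import List, Mapping

SEED_HEADINGS = [
    "What to implement",
    "Acceptance criteria",
    "Code Context",
    "Integration with sibling sub-issues",
    "Plan rationale",
    "Test design",
]


def render_seed_body(sections: Mapping[str, str], branch_name: str, issue_id: str) -> str:
    """Render a sub-issue body; sections without content are skipped (assumption)."""
    parts: List[str] = []
    for heading in SEED_HEADINGS:
        content = sections.get(heading, "").strip()
        if content:
            parts.append(f"## {heading}\n{content}")
    footer = (f"---\nDiscovered during run on `{branch_name}` "
              f"for parent issue `{issue_id}`.")
    # Blank lines between blocks keep `---` from turning the preceding
    # line into a setext heading.
    return "\n\n".join(parts + [footer]) + "\n"
```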
---
@@ -338,7 +385,7 @@ Do **not** quote task or plan content inline — `@test` reads from disk.
|-----------|--------|
| `TESTS_READY` + `escalate_to_check: false` | Proceed to Phase 7 |
| `TESTS_READY` + `escalate_to_check: true` | Route tests to `@check` for light review. `@check` diagnoses, caller routes fixes to `@test`. Then proceed. |
| `NOT_TESTABLE` | Route to `@check` for sign-off on justification. If `Missing testability seam`, dispatch `@make` to add the seam first, then re-run `@test`. Otherwise the task goes to `@make` without tests. |
| `NOT_TESTABLE` | Route to `@check` for sign-off on justification. If `Missing testability seam`, dispatch `@make` to add the seam first, then re-run `@test`. Otherwise the task goes to `@make` without tests. **Record the `@test` justification + `@check` sign-off rationale** for the Phase 8 NOT_TESTABLE manifest (ADR-18). |
| `BLOCKED` | Investigate. May need to revise task spec or plan. |
| Test passes immediately | Investigate — behavior may already exist. Task spec may be wrong. |
| Stub-first run: tests pass with zero `todo!()` panics | **Structural-only tests.** Every test is asserting type/struct/enum facts without calling any stubbed symbol. Reject the test output and route back to `@test` with a "must exercise the stubbed symbols by calling them" note. Do not let these tests gate Phase 7 — they cannot RED→GREEN, so the body-pass `@make` would commit code with false-green coverage. |
@@ -373,82 +420,88 @@ To get a clean runtime RED, dispatch a **stub-first `@make` pass** *before* `@te
The stub pass and the body pass each produce their own atomic commit (per Phase 9 rules): `feat(<scope>): scaffold <thing> with todo!() stubs` followed by `feat(<scope>): implement <thing>` (or whichever conventional type fits).
**Parallelism:**
- **Python:** Independent tasks can have tests written in parallel, *provided* their test files are disjoint and no shared `conftest.py` is being modified.
- **Rust:** Run `@test` dispatches **sequentially**. Cargo serialises the build via the `target/` directory lock, so parallel dispatches give no speedup; they only add risk (a long-running build in one branch starves the other, and any task that touches a shared crate-level fixture/helper file will race).
**Parallelism:** N/A in the one-task-per-run model (ADR-21). Phase 6 dispatches `@test` for task-1 only; the stub-pass `@make` (when applicable) runs strictly before `@test`, and the body-pass `@make` strictly after. There are no peer dispatches to parallelise.
**Constraint:** `@test` must not modify existing `conftest.py` files (prevents collision during parallel execution).
**Constraint:** `@test` must not modify existing `conftest.py` files (preserves cross-test invariants for the project's broader suite).
---
## Phase 7: Implement
Apply **Dispatch Hygiene** to each `@make` spec before sending. Repeated trips on a single task signal a Phase 5 split problem — go back and split.
Apply **Dispatch Hygiene** to each `@make` spec before sending. Repeated trips on a single task signal a scoping problem: `@check`'s diagnosis (below) will return `split_needed`, and the orchestrator aborts to the Failure Handler (see "split_needed exit") without filing new sub-issues.
**`@make` dispatches are SEQUENTIAL — never in parallel.** Run each task to completion (writes, every verification command, and the orchestrator's post-check) before dispatching the next. Reasons:
- `@make` writes source files. Parallel agents picking the same file (e.g. `src/lib.rs` for adding both a new `pub mod` and a registration) corrupt each other.
- Even on disjoint files, Cargo's `target/` lock and uv's venv state serialise the verification builds anyway, so parallelism gives no speedup.
- Stub-pass/body-pass pairs (Rust integration TDD) must be strictly ordered within a task; running stub-pass for task 2 while body-pass for task 1 is still building yields a non-deterministic crate state for `@test` to RED against.
**One task per run** (ADR-21): only task-1 reaches Phase 7. Tasks 2…N were filed as sub-issues at the end of Phase 5.5; this run does not dispatch `@make` for any of them. Stub-pass and body-pass dispatches for task-1 still run sequentially (the stub-pass must commit before `@test` runs against it, and the body-pass must run after `@test` produces failing tests).
This applies to **all** `@make` invocations: standard mode, TDD mode, stub-pass, body-pass, and integration-fix dispatches.
Execute each task by dispatching `@make` with a short prompt:
- The absolute path to the task spec: `$RUN_DIR/task-<N>.md` (`@make` reads acceptance criteria, code context, and files-to-modify from there).
Execute task-1 by dispatching `@make` with a short prompt:
- The absolute path to the task spec: `$RUN_DIR/task-1.md` (`@make` reads acceptance criteria, code context, and files-to-modify from there).
- The worktree path.
- **Pre-written failing tests and handoff from `@test` (if TESTS_READY)** — these are short and per-dispatch, so include them inline in the prompt.
Do **not** quote the task spec inline.
`@make` runs in TDD mode when tests are provided:
1. Entry validation: run tests, verify RED, check failure codes match handoff
2. Implement minimal code to make tests pass (GREEN)
3. Regression check on broader area
4. Refactor while keeping green
5. Report RED→GREEN evidence
1. Entry validation: run tests, verify RED, check failure codes match handoff.
2. Implement minimal code to make tests pass (GREEN).
3. Regression check: run the project's full test suite (`nix develop -c cargo test` / `uv run pytest` / `nix flake check` as appropriate).
4. Refactor while keeping green.
5. Report RED→GREEN evidence.
**Escalation — two paths route through `@check` → `@test` → back to `@make`:**
For NOT_TESTABLE tasks, `@make` runs in standard mode (no entry validation; standard implementation + verification).
1. **Entry-validation escalation.** Before implementing, `@make`'s entry check (run tests, verify RED, compare against handoff) reveals test-quality concerns — wrong assertion target, mixed failure codes, mocks of internal boundaries, etc. `@make` reports without writing any production code.
2. **Mid-implementation escalation.** After implementing, `@make` hits its iteration limit (2–3 attempts) because the test demands production code that's impossible or contradicts the spec. `@make` returns `Implementation Incomplete` with the flag `escalate: test_design`. **Do not** re-dispatch `@make` with marginal context tweaks — that just burns cycles on a test that needs redesign, not better implementation.
### Implementation Incomplete — unified diagnosis path (ADR-19)
In both cases:
When `@make` returns `Implementation Incomplete` for *any* reason — entry-validation concern, mid-implementation iteration limit, `escalate: test_design` flag, `escalate: split_needed` flag, or no flag at all — the orchestrator routes the report through `@check` for diagnosis. **Do not re-dispatch `@make` with marginal context tweaks.** `@make`'s self-diagnosis (the `escalate:` flag, if present) becomes a *hint* for `@check`; `@check` is the authority that decides what to do next.
1. `@make` returns its report (entry-time concern or mid-impl `escalate: test_design`).
2. Orchestrator routes the report to `@check` for diagnosis (light review of the *tests*, not the implementation).
3. `@check` confirms or rejects the test-design suspicion.
4. **If confirmed:** orchestrator routes to `@test` to redesign the tests. Apply Dispatch Hygiene. Fixed tests return to `@make` for fresh entry validation and a clean implementation attempt.
5. **If rejected:** the issue is in the production code; orchestrator re-dispatches `@make` with `@check`'s diagnostic notes attached.
Steps:
**Iteration limit on this loop: max 2 cycles.** If a test-design suspicion keeps surfacing but `@check` never confirms it, the design problem is upstream — revisit the Phase 3 plan rather than thrashing between `@test` and `@make`.
1. `@make` returns its `Implementation Incomplete` report (with or without an `escalate:` flag).
2. Orchestrator dispatches `@check` for diagnosis. Inputs: the test files (`@check` reads them from disk), the production code state (the in-progress diff), the task spec at `$RUN_DIR/task-1.md`, and `@make`'s self-diagnosis hint.
3. `@check` returns one of three verdicts:
- **`test_design`** — the test demands production code that's impossible, internally-inconsistent, or testing the wrong observable. Orchestrator dispatches `@test` to redesign the tests. Apply Dispatch Hygiene. Fixed tests return to `@make` for fresh entry validation and a clean implementation attempt.
- **`production_logic`** — the test is sound; `@make`'s implementation is wrong or incomplete. Orchestrator re-dispatches `@make` with `@check`'s diagnostic notes attached.
- **`split_needed`** — the task is over-scoped; no realistic implementation can satisfy the AC within the task's stated files-to-modify. See "split_needed exit" below.
For NOT_TESTABLE tasks, `@make` runs in standard mode.
**Iteration limit on this loop: max 2 cycles.** If two cycles of `@check` diagnosis don't yield a clean `@make` Implementation Complete, the design problem is upstream — **decrement `PLAN_REWORK_REMAINING`** and revisit the Phase 3 plan. If `PLAN_REWORK_REMAINING` is already `0`, abort to the Failure Handler instead.
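The three-verdict routing and its two-cycle limit could be modelled along these lines. This is a hedged sketch: `route_incomplete` and the returned action labels are hypothetical names, and `cycles_used` is assumed to count the diagnosis cycles already completed for task-1 before the current report.

```python
from enum import Enum


class Diagnosis(Enum):
    TEST_DESIGN = "test_design"
    PRODUCTION_LOGIC = "production_logic"
    SPLIT_NEEDED = "split_needed"


def route_incomplete(verdict: str, cycles_used: int, max_cycles: int = 2) -> str:
    """Map @check's diagnosis to the orchestrator's next action label."""
    diagnosis = Diagnosis(verdict)  # raises ValueError on an unknown verdict
    if diagnosis is Diagnosis.SPLIT_NEEDED:
        return "abort_to_failure_handler"
    if cycles_used >= max_cycles:
        # Design problem is upstream: spend rework budget, revisit Phase 3.
        return "spend_rework_budget_and_replan"
    if diagnosis is Diagnosis.TEST_DESIGN:
        return "redispatch_test"
    return "redispatch_make_with_notes"
```

Note that `@make`'s own `escalate:` flag does not appear as an input: per the text it is only a hint for `@check`, whose verdict is what gets routed.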
After all tasks complete, verify overall integration:
- Run the project's test suite if available
- Run linting/type checking if configured
- Fix any integration issues between tasks
### split_needed exit (ADR-21)
When `@check` diagnosis returns `split_needed` for task-1, **abort to the Failure Handler.** Rationale (Q19a per ADR-21): in the one-task-per-run model, task-1 is the only task in this invocation. No feature AC have been satisfied yet (any stub-pass commit is scaffolding, not feature work). Pre-filing a sub-issue from a botched plan would pollute `TODO/` with bad scoping the user has to manually clean up; the most likely upstream diagnosis is a plan-level scoping error, not a task-level over-scope.
The Failure Handler's recovery procedure (ADR-14: discard worktree, delete branch, re-create from base, re-run) cleanly destroys the stub-pass commit and any in-progress body-pass changes. The user re-runs `/workflow` on the same parent issue; Phase 3/4/5 produce a fresh plan and split. Sub-issues filed at Phase 5.5 of the *previous* run do not normally survive in `TODO/`: they are only committed by Phase 9's TODO commit, which a run that aborted at Phase 7 never reached, so they are typically discarded together with the worktree.
Concretely on `split_needed`:
1. Write a Failure Handler summary noting `@check`'s diagnosis verbatim and the Phase 5 split that was attempted.
2. Dispatch `@pm` to add a comment on `./TODO/$ISSUE_ID.md`: `- YYYY-MM-DD — split_needed at Phase 7 task-1; <one-line diagnosis>. Re-run after re-creating the worktree.`
3. Stop execution. Do not commit code, do not file new sub-issues, do not stage anything under `.workflow/`.
---
## Phase 8: Final Review
Apply **Dispatch Hygiene** to each reviewer prompt before sending. Dispatch `@check` and `@simplify` in parallel to review the full implementation (all changes across all files).
Apply **Dispatch Hygiene** to each reviewer prompt before sending. Dispatch `@check` and `@simplify` in parallel to review the implementation. Because the run executes exactly one task (ADR-21), the diff is bounded by the task's scope (~50 lines per the Split Heuristic) and fits comfortably in reviewer context — no chunking needed.
Provide reviewers with:
- The absolute path to `$RUN_DIR/plan.md` (the same file Phase 4 reviewed; mid-loop revisions will have updated it in place)
- The full diff (`git diff "$BASE_BRANCH"...HEAD`)
- Any decisions or deviations from the plan, captured inline in the dispatch prompt
- The absolute path to `$RUN_DIR/plan.md` (the same file Phase 4 reviewed; mid-loop revisions will have updated it in place).
- The absolute path to `$RUN_DIR/task-1.md` (the spec the implementation actually targeted).
- The full diff (`git diff "$BASE_BRANCH"...HEAD`).
- Any decisions or deviations from the plan, captured inline in the dispatch prompt.
- **NOT_TESTABLE manifest (ADR-18):** if task-1 went `NOT_TESTABLE` at Phase 6, the dispatch prompt includes a "Tasks completed without tests (NOT_TESTABLE)" section listing the `@test` justification and the `@check` sign-off rationale. Reviewers explicitly evaluate "does the justification still hold given the final diff?" and may BLOCK if it doesn't. If task-1 had tests (the common case), this section reads "None — task-1 has tests."
**Review loop (max 3 cycles):**
1. Send implementation to both reviewers
2. Merge findings (same precedence rules as Phase 4)
3. If ACCEPTABLE: proceed to Phase 9
4. If issues found, route per the kind of finding — **the orchestrator does not write production code; `@make` does**:
- **`BLOCK`, behavioral, correctness, or test-quality findings:** build a new `@make` task spec from the finding (apply Dispatch Hygiene, finalized text, no draft answer). Dispatch `@make`. Do **not** fix directly. Every `BLOCK` is by definition behavioral and must round-trip through `@make`.
- **Strictly cosmetic findings** (typo in a comment, missing trailing newline, formatting that does not change the AST or behavior): the orchestrator may fix directly, then re-review. Anything compiler-detected (unused import, dead code) goes through `@make`, since removing it is still a code change.
**Verdict authority** (ADR-15): same as Phase 4. `@check` is the only blocking reviewer; `@simplify`'s findings are recorded as advisory in the run summary but never trigger a re-dispatch loop.
**Review loop (max 3 cycles, gated on `@check`):**
1. Send implementation to both reviewers (dispatched in parallel).
2. Merge findings: record `@simplify`'s output for the summary; act only on `@check`'s verdict.
3. If `@check` returns ACCEPTABLE: proceed to Phase 9.
4. If `@check` finds issues, route per the kind of finding — **the orchestrator does not write production code; `@make` does**:
- **`BLOCK`, behavioral, correctness, or production-code findings:** write a fix spec to `$RUN_DIR/task-fix-<N>.md` (1-indexed within this Phase 8 cycle, ADR-16). Apply Dispatch Hygiene, finalized text, no draft answer. Verify with `test -f` before dispatching. Dispatch `@make` with the absolute path. Do **not** fix directly. Every `BLOCK` is by definition behavioral and must round-trip through `@make`.
- **Test-quality / test-design findings:** route through the unified Implementation Incomplete diagnosis path (Phase 7) — `@check` diagnosis → `@test` redesign → fresh `@make` re-attempt against the existing task spec.
- **Plan-level finding:** **decrement `PLAN_REWORK_REMAINING`** and re-enter Phase 3 with the finding. If the counter is already `0`, abort to the Failure Handler.
- **Strictly cosmetic findings** (typo in a comment, missing trailing newline, formatting that does not change the AST or behavior): the orchestrator may fix directly, then re-review. Anything compiler-detected (unused import, dead code) goes through `@make` via a `task-fix-<N>.md`, since removing it is still a code change.
- When in doubt, dispatch `@make`.
5. **Convergence detection:** same findings twice = stop loop early
6. If unresolved after 3 cycles: document blockers, proceed to commit anyway
5. **Convergence detection:** same `@check` findings twice = stop loop early.
6. If unresolved after 3 cycles: document blockers, proceed to commit anyway.
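A compact way to picture the Phase 8 routing is a lookup with `@make` as the default, which mirrors the "when in doubt, dispatch `@make`" rule. The finding-kind strings and action labels below are hypothetical; only the `task-fix-<N>.md` path shape comes from the text.

```python
def fix_spec_path(run_dir: str, n: int) -> str:
    """Path of the n-th fix spec (1-indexed within the Phase 8 cycle)."""
    return f"{run_dir}/task-fix-{n}.md"


def route_phase8_finding(kind: str) -> str:
    """Return the next action for a @check finding of the given kind."""
    routes = {
        "test_quality": "implementation_incomplete_diagnosis",
        "plan_level": "spend_rework_budget_then_phase_3",
        "cosmetic": "orchestrator_fixes_directly",
    }
    # BLOCK, behavioral, correctness, production-code, compiler-detected,
    # and anything unclassified all round-trip through @make.
    return routes.get(kind, "write_fix_spec_and_dispatch_make")
```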
---
@@ -462,11 +515,16 @@ The workflow is forge-agnostic. It commits locally and stops. **Do not push, and
- If changes are large/varied, use multiple atomic commits (one per logical unit)
### TODO Update
- Dispatch `@pm` against the absolute `./TODO/` path. Ask it to:
- Set the issue file's frontmatter `status` to `Done` (or leave at `In Progress` if the run is incomplete and the user must verify before marking Done).
- Add a comment of the form: `- YYYY-MM-DD — Branch \`$BRANCH_NAME\`, commit <SHA> — <one-line summary>` (date from the shell, never fabricated).
- Propagate any status flip to the dependent index: `TODO/README.md` for top-level issues (`parent: null`), or the parent file's `## Sub-issues` line for sub-issues.
- If acceptance-criteria checkboxes were addressed by the implementation, ask `@pm` to check them off (flip `- [ ]` to `- [x]` under `## Acceptance criteria`).
Dispatch `@pm` against the absolute `./TODO/` path. Ask it to:
1. **Check off the AC checkboxes that task-1 satisfied.** For each `- [ ]` AC line in `./TODO/$ISSUE_ID.md`'s `## Acceptance criteria` section that the implemented work fulfilled, flip to `- [x]`. The orchestrator decides which AC are satisfied by inspecting task-1's spec and verification output.
2. **Set the issue's frontmatter `status` based on AC completion** (ADR-21, AC-driven):
- **All AC are now `[x]`** → `status: Done`.
- **Some AC remain `[ ]`** → `status: In Progress`. (Sub-issues filed at Phase 5.5 cover the unmet AC; the user runs them in subsequent invocations.)
- **No AC section in the file** → `status: Done` (the parent had no testable AC; one task ran end-to-end).
3. **Propagate any status flip to the dependent index:** `TODO/README.md` for top-level issues (`parent: null`), or the parent issue file's `## Sub-issues` line for sub-issues.
4. **Add a comment** of the form: `- YYYY-MM-DD — Branch \`$BRANCH_NAME\`, commit <SHA> — <one-line summary>` (date from the shell, never fabricated).
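The AC-driven status rule can be sketched as a small parser over the issue file's Markdown. This is an illustration only: `status_from_issue` is a hypothetical helper, it assumes checkboxes are written exactly as `- [ ]` / `- [x]` at the start of a line, and it treats an AC section with no checkboxes like a missing section (the text does not cover that case).

```python
import re

AC_SECTION = re.compile(
    r"^## Acceptance criteria\s*\n(.*?)(?=^## |\Z)",
    re.MULTILINE | re.DOTALL,
)
AC_LINE = re.compile(r"^- \[( |x)\] ", re.MULTILINE)


def status_from_issue(markdown: str) -> str:
    """Return the frontmatter status implied by the AC checkboxes."""
    section = AC_SECTION.search(markdown)
    if section is None:
        return "Done"  # no AC section in the file
    marks = AC_LINE.findall(section.group(1))
    # all() of an empty list is True, so an empty section also yields Done.
    return "Done" if all(mark == "x" for mark in marks) else "In Progress"
```

Checkboxes in other sections are ignored because the match stops at the next `## ` heading.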
### File Follow-ups
@@ -512,11 +570,27 @@ If no follow-ups were filed, the commit message simplifies to `chore(todo): upda
## Failure Handling
At any phase, if an unrecoverable error occurs:
At any phase, if an unrecoverable error occurs (or a routing rule explicitly aborts to the Failure Handler — `PLAN_REWORK_REMAINING` exhausted, `split_needed` at Phase 7, etc.):
1. Write `$RUN_DIR/summary.md` (creating `$RUN_DIR` first if it doesn't exist) with what was completed and what failed. Do **not** stage or commit anything under `.workflow/`.
2. If any code was written, commit it with message `wip: incomplete workflow run for <issue-id>`. Stage code only — exclude `.workflow/` and `TODO/`.
3. Leave the branch and worktree intact for the user to inspect — do not push, do not delete.
4. Dispatch `@pm` against `./TODO/` to add a comment on the issue file (`./TODO/<ID>.md`) summarising what failed.
4. Dispatch `@pm` against `./TODO/` to add a comment on the issue file (`./TODO/<ID>.md`) summarising what failed and naming the abort reason if it was a routing-rule abort (e.g. `split_needed at Phase 7 task-1`, `plan_rework_remaining exhausted at Phase 8`).
5. Stop execution.
### Recovery procedure (workflow is non-resumable, ADR-14)
The workflow is **non-resumable**. There is no `--resume` mode and no idempotent re-run path. To retry after a Failure Handler invocation (or after a user-initiated cancellation):
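1. `git worktree remove <path>` — discard the failed worktree. Git only removes clean worktrees by default; add `--force` if it refuses because the worktree still contains untracked or modified files (for example, uncommitted run artifacts).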
2. Delete the feature branch: `git branch -D <branch>`. The Failure Handler's `wip:` commit (if any) is discarded with the branch.
3. Re-create the worktree from `$BASE_BRANCH`: `git worktree add <path> -b <branch> <base-branch>`.
4. Re-run `/workflow <ISSUE-ID>` from the fresh worktree.
The throwaway-worktree model is the recovery story. Re-running on the same worktree without this cleanup risks committing partial state or appending duplicate `@pm` comments.
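For scripting the recovery, the three Git steps can be generated as argument lists, ready for something like `subprocess.run(cmd, check=True)`. The helper name `recovery_commands` and its `force` parameter are hypothetical; the commands themselves are the ones listed above, with `--force` being Git's standard option for removing a worktree that is not clean. Step 4 (`/workflow <ISSUE-ID>`) is not a shell command and is left to the user.

```python
from typing import List


def recovery_commands(path: str, branch: str, base_branch: str,
                      force: bool = False) -> List[List[str]]:
    """Return the git invocations for steps 1-3 of the recovery procedure."""
    remove = ["git", "worktree", "remove"]
    if force:
        remove.append("--force")  # needed when the worktree is not clean
    remove.append(path)
    return [
        remove,                                              # 1. discard worktree
        ["git", "branch", "-D", branch],                     # 2. delete branch
        ["git", "worktree", "add", path, "-b", branch, base_branch],  # 3. re-create
    ]
```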
### User-initiated cancellation
User-initiated cancellation (Ctrl-C) follows the same procedure as automatic Failure Handler invocations: discard the worktree per the recovery procedure above. The orchestrator does not poll a soft-stop sentinel — runs are short enough (one task, ~10–30 min) that hard cancellation is fine.
**Never hang on interactive prompts.** If any command appears to require input, treat it as a failure and follow the above procedure.