feat(opencode): one-task-per-run model + 9 routing fixes (ADRs 13-21)

Captures the design grilling outcome. Adds ADRs 13-21 covering: - run-level plan_rework_remaining counter to bound P3<->P5.5/P7/P8 thrash - non-resumable workflow with throwaway-worktree recovery procedure - @simplify advisory at every gate (not just Phase 8) - Phase 8 fix specs go to disk as task-fix-N.md (preserves ADR-6) - Phase 5.5 BLOCK protocol: orchestrator edits plan, decrements counter, re-enters P4 - Phase 8 NOT_TESTABLE manifest in reviewer prompt - unified Implementation Incomplete diagnosis (test_design / production_logic / split_needed) - Phase 1 working-tree cleanliness + depends-on enforcement - one-task-per-run pivot: Phase 5 still splits N tasks, only task-1 runs; tasks 2..N filed as sub-issues with rich seed bodies; split_needed at P7 aborts to Failure Handler (one-task-per-run = no salvageable prior work) Auto-resolves big-diff Phase 8 reviews, cross-task regression-within-run, and mid-flight task-split routing. Rewrites routing matrix and three Mermaid diagrams; updates @pm (depends-on frontmatter, split-time filing), @check (third diagnosis verdict), @make (escalate: split_needed flag).
2026-05-08 13:02:54 +02:00 · 2026-05-08 13:02:54 +02:00 · af6481a5a7
commit af6481a5a7
parent 0b15944d1c
5 changed files with 342 additions and 130 deletions
--- a/config/opencode/agents/check.md
+++ b/config/opencode/agents/check.md
@ -145,6 +145,14 @@ High-level:
 - No excessive mocking (>2 mocks is a yellow flag)?
 - Diagnose issues and report findings. Do NOT edit test files — the caller routes fixes back to `@test`.

+**When diagnosing `Implementation Incomplete` from `@make`** (the `/workflow` Phase 7 unified diagnosis path, per ADR-19): you receive `@make`'s self-diagnosis hint (`escalate: test_design`, `escalate: split_needed`, or no flag), the test files, the in-progress production diff, and the task spec. Return one of three verdicts in your output:
+
+- **`test_design`** — the test demands production code that's impossible, internally-inconsistent, or testing the wrong observable. The fix is in the tests. (Caller routes to `@test` for redesign.)
+- **`production_logic`** — the test is sound; `@make`'s implementation is wrong or incomplete. The fix is in the production code. (Caller re-dispatches `@make` with your notes.)
+- **`split_needed`** — the task itself is over-scoped: no realistic implementation can satisfy the AC within the task's stated files-to-modify. Either the AC require touching files not listed, or the AC mix multiple concerns that should have been split at Phase 5 (per the workflow's Split Heuristic). (Caller aborts to the Failure Handler; the user re-plans from scratch.)
+
+State the verdict explicitly — e.g. "Diagnosis: `split_needed` — the AC implies modifying both `src/foo.rs` and the EventLoop registration in `src/main.rs`, but the task spec lists only `src/foo.rs`. This is a Phase 5 split error, not a code or test error." Calibrate confidence honestly: `split_needed` is the heaviest verdict (it kills the run); reserve it for cases where neither test redesign nor code-fix would plausibly converge.
+
 **When reviewing NOT_TESTABLE verdicts:**
 - Does the reason match an allowed category (config-only, external-system, non-deterministic, pure-wiring)?
 - Was a test approach genuinely attempted?
--- a/config/opencode/agents/make.md
+++ b/config/opencode/agents/make.md
@ -273,12 +273,15 @@ If tests fail or verification doesn't pass:
 3. **Code issues** — Attempt fix (max 2-3 attempts if making progress)
 4. **Flaky/infra issues** — Stop and report with diagnostics
 5. **Test-design suspicion** — If after 1–2 attempts the test seems to demand production code that contradicts the spec, asserts on internal state that shouldn't be observable, mocks an internal boundary instead of the external one, or otherwise looks like it's testing the wrong thing — **stop and report with `escalate: test_design`** in the Blocking Issue section. Do not modify the test file yourself; the caller will route to `@check` for diagnosis and `@test` for redesign per the workflow's Phase 7 escalation.
+6. **Task-scope suspicion** — If after 1–2 attempts you find that the AC realistically require modifying files not listed in your "Files to Modify," or the AC mix multiple distinct concerns that don't fit one coherent change (e.g. a new type *and* its registration site *and* a new system using it), the task is over-scoped — **stop and report with `escalate: split_needed`** in the Blocking Issue section. State concretely which file(s) outside your modify list you'd need, or which concerns the task is mixing. Do not silently expand scope; the caller will route to `@check` for diagnosis and (per the workflow's ADR-21) the run will abort to the Failure Handler so the user can re-plan from scratch.
+
+The `escalate:` flag is a *hint* to the caller's diagnosis routing — `@check` is the authority that confirms or rejects it. Reporting `escalate: split_needed` doesn't guarantee the run aborts; if `@check` decides the task is sound and the issue is in tests or production code, the diagnosis will route back to a normal `test_design` or `production_logic` verdict.

 If still failing after 2-3 focused attempts, **stop and report**:
 - What was implemented
 - What's failing and why
 - What you tried
- Suggested next steps (including `escalate: test_design` if the failure points at the test rather than the production code)
+- Suggested next steps (with the appropriate `escalate:` flag if you have a strong hypothesis about which kind of failure this is)

 Do not loop indefinitely. Better to report a clear failure than burn context.

--- a/config/opencode/agents/pm.md
+++ b/config/opencode/agents/pm.md
@ -48,6 +48,7 @@ title: Implement a special stage type
 status: Done
 parent: GAL-38
 labels: [gameplay, advanced-mechanics]
+depends-on: [GAL-37]
 ---

 # GAL-39: Implement a special stage type
@ -80,6 +81,7 @@ Free-form markdown describing the problem and context. Spans as many paragraphs
 - `status` — one of: `Todo`, `In Progress`, `Done`. (No other values; the old `Backlog`/`In Review`/`Cancelled` set is gone.)
 - `parent` — either `null` (top-level issue) or another issue ID (e.g. `GAL-38`). Sub-issues belong to their parent's `## Sub-issues` list.
 - `labels` — YAML list of strings, e.g. `[gameplay, advanced-mechanics]`. May be `[]`.
+- `depends-on` — *optional* YAML list of issue IDs that must reach `status: Done` before this issue can be started. Used by `/workflow`'s Phase 1 sanity check to hard-block runs whose dependencies aren't satisfied (per ADR-21). Omit the field entirely when there are no dependencies; do not write `depends-on: []`. Cycles are not detected by this agent — the caller is responsible for not creating a cycle.

 **Body rules:**
 - The first heading is `# <ID>: <title>` (matches frontmatter).
@ -172,9 +174,24 @@ For list output, return an array of `{id, title, status, parent, labels}` object
 ### When creating issues
 - Default `status: Todo` unless the caller says otherwise.
 - Title: short, imperative ("Add retry logic to ingest worker", not "retry stuff").
- Frontmatter must be complete: `id`, `title`, `status`, `parent`, `labels`.
+- Frontmatter must be complete: `id`, `title`, `status`, `parent`, `labels`. Add `depends-on:` when the caller specifies dependencies.
 - Always update the dependent index (README.md for top-level, parent file for sub-issues) so the new issue is visible.

+### Split-time sub-issue creation (rich-body filings)
+
+When the `/workflow` orchestrator dispatches you mid-run to file a sub-issue from a Phase 5.5 task split (per ADR-21), the caller passes a structured body containing more than the usual minimum. Treat the body as already-finalized — write it verbatim into the new issue file. Common sections you'll see:
+
+- `## What to implement` — one-line + brief description.
+- `## Acceptance criteria` — checkboxes; preserve `- [ ]` state (newly filed sub-issues start with all AC unchecked).
+- `## Code Context` — code snippets carried over from the split-time task spec.
+- `## Integration with sibling sub-issues` — narrative; the structural dependencies belong in the `depends-on:` frontmatter list, which the caller will pass alongside the body.
+- `## Plan rationale` — slice of the parent's plan.
+- `## Test design` — when present.
+
+Use the rendered ordering: H1 → description (the "Discovered during run on …" attribution paragraph that ends the body counts as part of the description) → `## Sub-issues` (omit; sub-issues won't have their own children at filing time) → `## Acceptance criteria` → `## Integration test hints` (omit unless caller passed it) → `## Comments` (omit until first comment is appended).
+
+Add the `split-from-run` label to the labels list when the caller specifies it, alongside any propagated parent labels.
+
 ### When updating status
 - Confirm the change (e.g. "GAL-39 status: In Progress → Done").
 - A status change to `Done` is only valid if all acceptance-criteria checkboxes (when the section exists) are checked. If they are not, report which ones remain and ask for confirmation before forcing the change.
--- a/config/opencode/commands/workflow.md
+++ b/config/opencode/commands/workflow.md
@ -86,7 +86,10 @@ Define `RUN_DIR="$WORKTREE_PATH/.workflow/run-$ISSUE_ID"` once in Phase 1 and re
   - Else if `git rev-parse --verify --quiet master` succeeds, use `master`.
   - Else stop: "Could not determine base branch (no `main` or `master`). Pass it as the second argument: `/workflow <ISSUE-ID> <base-branch>`."
 7. Verify the current branch is not the base branch: if `BRANCH_NAME == BASE_BRANCH`, stop: "Cannot run workflow on the base branch (`$BASE_BRANCH`). Switch to a feature branch first."
-8. Set the run-artifacts directory: `RUN_DIR="$WORKTREE_PATH/.workflow/run-$ISSUE_ID"`. Phase 3 will `mkdir -p "$RUN_DIR"` before writing the first artifact.
+8. **Verify the working tree is clean** (ADR-20): `git status --porcelain` must return empty. If not, stop: "Working tree must be clean. Commit or stash uncommitted changes before running the workflow."
+9. **Check `depends-on:` declarations** (ADR-21): if `./TODO/$ISSUE_ID.md`'s frontmatter contains a `depends-on: [<ID>, ...]` list, verify every listed dependency's status is `Done` (read each `./TODO/<DEP-ID>.md`). If any dependency is not `Done`, stop: "Cannot start `$ISSUE_ID`; it depends on `<DEP-ID>` (status: `<status>`). Complete dependencies first." If a listed dependency file does not exist, stop with: "Cannot start `$ISSUE_ID`; declared dependency `<DEP-ID>` has no issue file." If the field is absent, proceed.
+10. Set the run-artifacts directory: `RUN_DIR="$WORKTREE_PATH/.workflow/run-$ISSUE_ID"`. Phase 3 will `mkdir -p "$RUN_DIR"` before writing the first artifact.
+11. Initialize the run-level rework counter: `PLAN_REWORK_REMAINING=1` (per ADR-13). Decrement on every P5.5-BLOCK→P4, P7-escalation-exhaustion→P3, and P8-plan-level→P3 transition. When the counter is `0` and another such transition fires, abort to the Failure Handler instead of re-entering.

 ---

@ -196,24 +199,23 @@ Reviewers should evaluate testability:
 - `@check`: Is the design testable? Are the right behaviors identified? (Review Framework §8)
 - `@simplify`: Is the test scope appropriate? Over-testing proposed?

-**Merge rules:**
- `@check` safety/correctness findings are hard constraints
- If `@simplify` recommends removing something `@check` flags as needed, `@check` wins
- Note conflicts explicitly
+**Verdict authority** (ADR-15): `@check` is the only blocking reviewer. Its `NEEDS WORK` and `BLOCK` verdicts gate progression; its `ACCEPTABLE` verdict permits proceeding regardless of what `@simplify` reports. `@simplify` is advisory at every gate — its findings are recorded in the run summary's "Advisory notes (not filed)" section but never trigger a re-dispatch loop. The user may manually elevate a `@simplify` finding into a `task-fix-<N>.md` if it warrants follow-up.

-**Review loop (max 3 cycles):**
+**Review loop (max 3 cycles, gated on `@check`):**
 1. Dispatch both reviewers against `$RUN_DIR/plan.md`.
-2. Merge findings
-3. If verdict is ACCEPTABLE from both (or JUSTIFIED COMPLEXITY from `@simplify`): proceed to Phase 5
-4. If BLOCK or NEEDS WORK: edit `$RUN_DIR/plan.md` in place addressing findings (re-apply Dispatch Hygiene to the updated file), then re-review.
-5. **Convergence detection:** if reviewers return the same findings as the previous cycle, stop the loop early
-6. If still unresolved after 3 cycles: note unresolved blockers and proceed anyway (they will be documented in the workflow summary and commit message)
+2. Merge findings: record `@simplify`'s output for the summary; act only on `@check`'s verdict.
+3. If `@check` returns ACCEPTABLE: proceed to Phase 5 (regardless of `@simplify`).
+4. If `@check` returns BLOCK or NEEDS WORK: edit `$RUN_DIR/plan.md` in place addressing the findings (re-apply Dispatch Hygiene to the updated file), then re-review.
+5. **Convergence detection:** if `@check` returns the same findings as the previous cycle, stop the loop early.
+6. If still unresolved after 3 cycles: note unresolved blockers and proceed anyway (they will be documented in the workflow summary and commit message).

 ---

 ## Phase 5: Split into Tasks

-**The output of this phase is one file per task at `$RUN_DIR/task-<N>.md`** (1-indexed: `task-1.md`, `task-2.md`, …). These files are the source-of-truth that Phase 6 (`@test`) and Phase 7 (`@make`) read by absolute path. **No file written = no dispatch in later phases.** If you skip the file-write step, every downstream dispatch will reference a non-existent path and fail.
+**The output of this phase is one file per task at `$RUN_DIR/task-<N>.md`** (1-indexed: `task-1.md`, `task-2.md`, …). These files are the source-of-truth that Phase 5.5, Phase 6, and Phase 7 read by absolute path. **No file written = no dispatch in later phases.** If you skip the file-write step, every downstream dispatch will reference a non-existent path and fail.
+
+**One-task-per-run model (ADR-21):** Phase 5 still produces N task files. After Phase 5.5 review, only **task-1** runs through Phases 6–8 in this invocation. If N > 1, tasks 2…N are filed as TODO sub-issues at the end of Phase 5.5 (see "File sibling tasks as sub-issues" below) and the user runs `/workflow` separately on each. This bounds the run's scope, keeps Phase 8 diffs reviewable, and eliminates cross-task regression risk inside a run.

 Steps:

@ -254,9 +256,11 @@ Include **Test Design** from Phase 3 when available, attached to the relevant ta

 **Task size:** ~10-30 minutes each, single coherent change, clear boundaries.

-### Split Heuristic — when in doubt, split
+### Split Heuristic — when in doubt, **do not** split

-A task must be **split** if any of the following apply:
+In the one-task-per-run model (ADR-21), splitting fans work out across user sessions: every additional task becomes a sub-issue the user must come back and run as its own `/workflow` invocation, with full P3/P4/P5/P5.5/P6/P7/P8 overhead per sub-issue. **Default to keeping work in one task.** Only split when one of the mechanical triggers below clearly applies *and* the resulting sub-tasks each warrant their own commit/PR-sized chunk of attention.
+
+A task should be **split** when any of the following apply:

 - It touches more than two distinct concerns (e.g. *constants + new component + sprite spawn + new system + main wiring* is **five** concerns — at least three tasks).
 - It changes more than ~50 lines across more than 2 files.
@ -264,12 +268,14 @@ A task must be **split** if any of the following apply:
 - It mixes pure-logic changes (math helpers) with stateful changes (queries, world mutation).
 - It mixes new APIs with their first call sites in the same task.

-When a task fails the heuristic, split into:
+**Tiebreaker:** when none of the triggers clearly applies and the work plausibly fits a single coherent commit, do not split. Splitting fans out across sessions; only split if each resulting sub-issue is genuinely independently runnable and benefits from its own plan.
+
+When a task does fail the heuristic, split into:
 1. **Foundations** — new constants, types, components (no behavior change yet).
 2. **Implementation** — the actual production logic, calling the foundations.
 3. **Wiring** — registration in `main.rs` / `lib.rs` / app-builder.

-Each split is dispatched separately to `@make` and verified before the next.
+Tasks 2…N are filed as sub-issues at the end of Phase 5.5; only task 1 runs in this invocation.

 ### Code Context — what to include

@ -289,32 +295,73 @@ Apply **Dispatch Hygiene** to each task spec before dispatch in Phase 7.

 ## Phase 5.5: Review Task Split

-A short, focused review of the task split as a set. Catches split errors (missed scope, overlap, multi-purpose tasks, missing integration contracts) when they're cheap to fix — *before* `@test` and `@make` dispatch on a broken split. Without this gate, the same errors surface only at Phase 8 final review, after expensive test/implementation work has already been done.
+A short, focused review of the task split as a set. In the one-task-per-run model (ADR-21), this phase is the gate for **two** things: (a) catching split errors (missed scope, overlap, multi-purpose tasks, missing integration contracts) before `@test`/`@make` dispatch, and (b) preventing a botched split from being persisted as garbage sub-issues that the user has to manually clean up later. Both stakes are higher than in the original N-tasks-per-run design.

 **Dispatch only `@check`** for this phase — split review is structural / coverage, not complexity. `@simplify` is not involved. Apply **Dispatch Hygiene** to the prompt.

+**Skip Phase 5.5 entirely when N=1** (ADR-21): a single-task plan has no split to review. Three of the six questions below (no overlap, integration contracts, sub-issue self-containment) are degenerate. The remaining structural concerns (coverage, single-purpose, testable AC) are already evaluated at Phase 4 plan acceptance. Proceed directly to Phase 6.
+
 The dispatch prompt names:
 - `$RUN_DIR/plan.md` (the plan being decomposed)
 - `$RUN_DIR/task-1.md` through `$RUN_DIR/task-N.md` (the split — list every task file)
 - The worktree path

-`@check` evaluates the split against five questions:
+`@check` evaluates the split against six questions:

 1. **Coverage** — do the tasks together implement everything the plan promises? Any gaps?
 2. **No overlap** — do two tasks claim the same scope or modify the same lines?
 3. **Single-purpose** — does any task do more than one thing? (See Phase 5's Split Heuristic.)
-4. **Integration contracts** — where two tasks touch a shared interface, is the contract documented in both task files?
+4. **Integration contracts** — where two tasks touch a shared interface, is the contract documented in *both* task files in a form that survives sub-issue filing? (Each sub-issue runs in isolation later — its eventual P3 plan must be reconstructable from the sub-issue body alone, including any cross-sub-issue dependencies. This is the load-bearing question in the new model.)
 5. **Testable acceptance criteria** — does every task have specific, falsifiable AC?
+6. **Self-containment** — is each task spec runnable as a standalone `/workflow` invocation? Does its description carry enough plan-level context (rationale, code seams, scope boundary) that a fresh run could re-plan it without seeing the parent plan or sibling task files?

 **Review loop (max 2 cycles):**

 1. Dispatch `@check` against the plan + all task files.
-2. If `ACCEPTABLE` → proceed to Phase 6.
-3. If `NEEDS WORK` → edit the task files in place (split a task into two, merge two tasks, add integration contracts, sharpen AC). Re-apply Dispatch Hygiene to each updated file. Re-dispatch.
-4. If `BLOCK` → the plan itself does not decompose cleanly. Return to Phase 4 with `@check`'s finding instead of forcing the split.
-5. **Convergence detection:** same finding twice → stop loop, document the unresolved split issue in the run summary, proceed.
+2. If `ACCEPTABLE` → proceed to "File sibling tasks as sub-issues" (below), then Phase 6.
+3. If `NEEDS WORK` → edit the task files in place (split a task into two, merge two tasks, strengthen integration contracts, sharpen AC, add self-containment context). Re-apply Dispatch Hygiene to each updated file. Re-dispatch.
+4. If `BLOCK` plan-level finding (ADR-17) → translate the split-level finding into a concrete `plan.md` edit, save the edit, **decrement `PLAN_REWORK_REMAINING`**, and re-enter Phase 4 against the revised plan. If `PLAN_REWORK_REMAINING` was already `0`, abort to the Failure Handler instead.
+5. **Convergence detection:** same `@check` finding twice → stop loop, document the unresolved split issue in the run summary, proceed.

-**This is a quick gate, not a deep review.** No line-by-line code feedback (there's no code), no design re-litigation (that was Phase 4's job). The whole point is a fast structural check before downstream phases start churning.
+**This is a quick gate, not a deep review.** No line-by-line code feedback (there's no code), no design re-litigation (that was Phase 4's job). The whole point is a fast structural check before downstream phases start churning *and* before sibling tasks become persistent sub-issues.
+
+### File sibling tasks as sub-issues (when N > 1)
+
+After Phase 5.5 returns ACCEPTABLE, dispatch `@pm` to file each of `task-2.md` through `task-N.md` as a TODO sub-issue with `parent: $ISSUE_ID`. **Only task-1 continues into Phase 6.** Each filed sub-issue gets a rich seed body (ADR-21) so its eventual `/workflow` run can plan and implement without seeing siblings or the original `plan.md`.
+
+For each task `$N` in 2…N, dispatch `@pm` with the following body content (assembled by the orchestrator from `task-<N>.md` and the relevant slice of `plan.md`):
+
+```markdown
+## What to implement
+<task description from task-<N>.md>
+
+## Acceptance criteria
+<AC checkboxes from task-<N>.md>
+
+## Code Context
+<code snippets from task-<N>.md>
+
+## Integration with sibling sub-issues
+<dependencies on sibling sub-issues, with brief rationale; declared in frontmatter as `depends-on: [<SIB-ID>, ...]`>
+
+## Plan rationale
+<relevant slice of plan.md — typically 1–3 paragraphs covering why this approach was chosen>
+
+## Test design
+<from task-<N>.md or plan.md if present>
+
+---
+Discovered during run on `$BRANCH_NAME` for parent issue `$ISSUE_ID`.
+```
+
+`@pm` invocation per sub-issue:
+- Title — derived from `task-<N>.md`'s task description (short imperative).
+- Status — `Todo`.
+- Parent — `$ISSUE_ID`.
+- Labels — propagate relevant labels from the parent (e.g. `gameplay`); add `split-from-run` to mark the provenance.
+- `depends-on:` — sibling sub-issue IDs that this task requires to be `Done` first. The orchestrator determines the dependency graph from the integration contracts captured in Phase 5.5 question 4.
+
+The new sub-issue files plus the parent's updated `## Sub-issues` list are staged in Phase 9's `chore(todo): …` commit alongside the parent's status/AC updates.

 ---

@ -338,7 +385,7 @@ Do **not** quote task or plan content inline — `@test` reads from disk.
 |-----------|--------|
 | `TESTS_READY` + `escalate_to_check: false` | Proceed to Phase 7 |
 | `TESTS_READY` + `escalate_to_check: true` | Route tests to `@check` for light review. `@check` diagnoses, caller routes fixes to `@test`. Then proceed. |
-| `NOT_TESTABLE` | Route to `@check` for sign-off on justification. If `Missing testability seam`, dispatch `@make` to add the seam first, then re-run `@test`. Otherwise the task goes to `@make` without tests. |
+| `NOT_TESTABLE` | Route to `@check` for sign-off on justification. If `Missing testability seam`, dispatch `@make` to add the seam first, then re-run `@test`. Otherwise the task goes to `@make` without tests. **Record the `@test` justification + `@check` sign-off rationale** for the Phase 8 NOT_TESTABLE manifest (ADR-18). |
 | `BLOCKED` | Investigate. May need to revise task spec or plan. |
 | Test passes immediately | Investigate — behavior may already exist. Task spec may be wrong. |
 | Stub-first run: tests pass with zero `todo!()` panics | **Structural-only tests.** Every test is asserting type/struct/enum facts without calling any stubbed symbol. Reject the test output and route back to `@test` with a "must exercise the stubbed symbols by calling them" note. Do not let these tests gate Phase 7 — they cannot RED→GREEN, so the body-pass `@make` would commit code with false-green coverage. |
@ -373,82 +420,88 @@ To get a clean runtime RED, dispatch a **stub-first `@make` pass** *before* `@te

 The stub pass and the body pass each produce their own atomic commit (per Phase 9 rules): `feat(<scope>): scaffold <thing> with todo!() stubs` followed by `feat(<scope>): implement <thing>` (or whichever conventional type fits).

-**Parallelism:**
- **Python:** Independent tasks can have tests written in parallel, *provided* their test files are disjoint and no shared `conftest.py` is being modified.
- **Rust:** Run `@test` dispatches **sequentially**. Cargo serialises the build via the `target/` directory lock, so parallel dispatches give no speedup; they only add risk (a long-running build in one branch starves the other, and any task that touches a shared crate-level fixture/helper file will race).
+**Parallelism:** N/A in the one-task-per-run model (ADR-21). Phase 6 dispatches `@test` for task-1 only; the stub-pass `@make` (when applicable) runs strictly before `@test`, and the body-pass `@make` strictly after. There are no peer dispatches to parallelise.

-**Constraint:** `@test` must not modify existing `conftest.py` files (prevents collision during parallel execution).
+**Constraint:** `@test` must not modify existing `conftest.py` files (preserves cross-test invariants for the project's broader suite).

 ---

 ## Phase 7: Implement

-Apply **Dispatch Hygiene** to each `@make` spec before sending. Repeated trips on a single task signal a Phase 5 split problem — go back and split.
+Apply **Dispatch Hygiene** to each `@make` spec before sending. Repeated trips on a single task signal a scoping problem — `@check`'s diagnosis (below) will return `split_needed`, and the orchestrator files the task as a sub-issue and exits.

-**`@make` dispatches are SEQUENTIAL — never in parallel.** Run each task to completion (writes, every verification command, and the orchestrator's post-check) before dispatching the next. Reasons:
- `@make` writes source files. Parallel agents picking the same file (e.g. `src/lib.rs` for adding both a new `pub mod` and a registration) corrupt each other.
- Even on disjoint files, Cargo's `target/` lock and uv's venv state serialise the verification builds anyway, so parallelism gives no speedup.
- Stub-pass/body-pass pairs (Rust integration TDD) must be strictly ordered within a task; running stub-pass for task 2 while body-pass for task 1 is still building yields a non-deterministic crate state for `@test` to RED against.
+**One task per run** (ADR-21): only task-1 reaches Phase 7. Tasks 2…N were filed as sub-issues at the end of Phase 5.5; this run does not dispatch `@make` for any of them. Stub-pass and body-pass dispatches for task-1 still run sequentially (the stub-pass must commit before `@test` runs against it, and the body-pass must run after `@test` produces failing tests).

-This applies to **all** `@make` invocations: standard mode, TDD mode, stub-pass, body-pass, and integration-fix dispatches.
-
-Execute each task by dispatching `@make` with a short prompt:
- The absolute path to the task spec: `$RUN_DIR/task-<N>.md` — `@make` reads acceptance criteria, code context, and files-to-modify from there.
+Execute task-1 by dispatching `@make` with a short prompt:
+- The absolute path to the task spec: `$RUN_DIR/task-1.md` — `@make` reads acceptance criteria, code context, and files-to-modify from there.
 - The worktree path.
 - **Pre-written failing tests and handoff from `@test` (if TESTS_READY)** — these are short and per-dispatch, so include them inline in the prompt.

 Do **not** quote the task spec inline.

 `@make` runs in TDD mode when tests are provided:
-1. Entry validation: run tests, verify RED, check failure codes match handoff
-2. Implement minimal code to make tests pass (GREEN)
-3. Regression check on broader area
-4. Refactor while keeping green
-5. Report RED→GREEN evidence
+1. Entry validation: run tests, verify RED, check failure codes match handoff.
+2. Implement minimal code to make tests pass (GREEN).
+3. Regression check: run the project's full test suite (`nix develop -c cargo test` / `uv run pytest` / `nix flake check` as appropriate).
+4. Refactor while keeping green.
+5. Report RED→GREEN evidence.

-**Escalation — two paths route through `@check` → `@test` → back to `@make`:**
+For NOT_TESTABLE tasks, `@make` runs in standard mode (no entry validation; standard implementation + verification).

-1. **Entry-validation escalation.** Before implementing, `@make`'s entry check (run tests, verify RED, compare against handoff) reveals test-quality concerns — wrong assertion target, mixed failure codes, mocks of internal boundaries, etc. `@make` reports without writing any production code.
-2. **Mid-implementation escalation.** After implementing, `@make` hits its iteration limit (2–3 attempts) because the test demands production code that's impossible or contradicts the spec. `@make` returns `Implementation Incomplete` with the flag `escalate: test_design`. **Do not** re-dispatch `@make` with marginal context tweaks — that just burns cycles on a test that needs redesign, not better implementation.
+### Implementation Incomplete — unified diagnosis path (ADR-19)

-In both cases:
+When `@make` returns `Implementation Incomplete` for *any* reason — entry-validation concern, mid-implementation iteration limit, `escalate: test_design` flag, `escalate: split_needed` flag, or no flag at all — the orchestrator routes the report through `@check` for diagnosis. **Do not re-dispatch `@make` with marginal context tweaks.** `@make`'s self-diagnosis (the `escalate:` flag, if present) becomes a *hint* for `@check`; `@check` is the authority that decides what to do next.

-1. `@make` returns its report (entry-time concern or mid-impl `escalate: test_design`).
-2. Orchestrator routes the report to `@check` for diagnosis (light review of the *tests*, not the implementation).
-3. `@check` confirms or rejects the test-design suspicion.
-4. **If confirmed:** orchestrator routes to `@test` to redesign the tests. Apply Dispatch Hygiene. Fixed tests return to `@make` for fresh entry validation and a clean implementation attempt.
-5. **If rejected:** the issue is in the production code; orchestrator re-dispatches `@make` with `@check`'s diagnostic notes attached.
+Steps:

-**Iteration limit on this loop: max 2 cycles.** If a test-design suspicion keeps surfacing but `@check` never confirms it, the design problem is upstream — revisit the Phase 3 plan rather than thrashing between `@test` and `@make`.
+1. `@make` returns its `Implementation Incomplete` report (with or without an `escalate:` flag).
+2. Orchestrator dispatches `@check` for diagnosis. Inputs: the test files (`@check` reads them from disk), the production code state (the in-progress diff), the task spec at `$RUN_DIR/task-1.md`, and `@make`'s self-diagnosis hint.
+3. `@check` returns one of three verdicts:
+   - **`test_design`** — the test demands production code that's impossible, internally-inconsistent, or testing the wrong observable. Orchestrator dispatches `@test` to redesign the tests. Apply Dispatch Hygiene. Fixed tests return to `@make` for fresh entry validation and a clean implementation attempt.
+   - **`production_logic`** — the test is sound; `@make`'s implementation is wrong or incomplete. Orchestrator re-dispatches `@make` with `@check`'s diagnostic notes attached.
+   - **`split_needed`** — the task is over-scoped; no realistic implementation can satisfy the AC within the task's stated files-to-modify. See "split_needed exit" below.

-For NOT_TESTABLE tasks, `@make` runs in standard mode.
+**Iteration limit on this loop: max 2 cycles.** If two cycles of `@check` diagnosis don't yield a clean `@make` Implementation Complete, the design problem is upstream — **decrement `PLAN_REWORK_REMAINING`** and revisit the Phase 3 plan. If `PLAN_REWORK_REMAINING` is already `0`, abort to the Failure Handler instead.

-After all tasks complete, verify overall integration:
- Run the project's test suite if available
- Run linting/type checking if configured
- Fix any integration issues between tasks
+### split_needed exit (ADR-21)
+
+When `@check` diagnosis returns `split_needed` for task-1, **abort to the Failure Handler.** Rationale (Q19a per ADR-21): in the one-task-per-run model, task-1 is the only task in this invocation. No feature AC have been satisfied yet (any stub-pass commit is scaffolding, not feature work). Pre-filing a sub-issue from a botched plan would pollute `TODO/` with bad scoping the user has to manually clean up; the most likely upstream diagnosis is a plan-level scoping error, not a task-level over-scope.
+
+The Failure Handler's recovery procedure (ADR-14: discard worktree, delete branch, re-create from base, re-run) cleanly destroys the stub-pass commit and any in-progress body-pass changes. The user re-runs `/workflow` on the same parent issue; Phase 3/4/5 produce a fresh plan and split. Sub-issues filed at Phase 5.5 of the *previous* run remain in `TODO/` (they were committed there in the failed run only if Phase 9 reached the TODO commit, which by definition it didn't if Phase 7 invoked the Failure Handler) — typically they're gone with the discarded worktree.
+
+Concretely on `split_needed`:
+
+1. Write a Failure Handler summary noting `@check`'s diagnosis verbatim and the Phase 5 split that was attempted.
+2. Dispatch `@pm` to add a comment on `./TODO/$ISSUE_ID.md`: `- YYYY-MM-DD — split_needed at Phase 7 task-1; <one-line diagnosis>. Re-run after re-creating the worktree.`
+3. Stop execution. Do not commit code, do not file new sub-issues, do not stage anything under `.workflow/`.

 ---

 ## Phase 8: Final Review

-Apply **Dispatch Hygiene** to each reviewer prompt before sending. Dispatch `@check` and `@simplify` in parallel to review the full implementation (all changes across all files).
+Apply **Dispatch Hygiene** to each reviewer prompt before sending. Dispatch `@check` and `@simplify` in parallel to review the implementation. Because the run executes exactly one task (ADR-21), the diff is bounded by the task's scope (~50 lines per the Split Heuristic) and fits comfortably in reviewer context — no chunking needed.

 Provide reviewers with:
- The absolute path to `$RUN_DIR/plan.md` (the same file Phase 4 reviewed; mid-loop revisions will have updated it in place)
- The full diff (`git diff "$BASE_BRANCH"...HEAD`)
- Any decisions or deviations from the plan, captured inline in the dispatch prompt
+- The absolute path to `$RUN_DIR/plan.md` (the same file Phase 4 reviewed; mid-loop revisions will have updated it in place).
+- The absolute path to `$RUN_DIR/task-1.md` (the spec the implementation actually targeted).
+- The full diff (`git diff "$BASE_BRANCH"...HEAD`).
+- Any decisions or deviations from the plan, captured inline in the dispatch prompt.
+- **NOT_TESTABLE manifest (ADR-18):** if task-1 went `NOT_TESTABLE` at Phase 6, the dispatch prompt includes a "Tasks completed without tests (NOT_TESTABLE)" section listing the `@test` justification and the `@check` sign-off rationale. Reviewers explicitly evaluate "does the justification still hold given the final diff?" and may BLOCK if it doesn't. If task-1 had tests (the common case), this section reads "None — task-1 has tests."

-**Review loop (max 3 cycles):**
-1. Send implementation to both reviewers
-2. Merge findings (same precedence rules as Phase 4)
-3. If ACCEPTABLE: proceed to Phase 9
-4. If issues found, route per the kind of finding — **the orchestrator does not write production code; `@make` does**:
-   - **`BLOCK`, behavioral, correctness, or test-quality findings:** build a new `@make` task spec from the finding (apply Dispatch Hygiene, finalized text, no draft answer). Dispatch `@make`. Do **not** fix directly. Every `BLOCK` is by definition behavioral and must round-trip through `@make`.
-   - **Strictly cosmetic findings** (typo in a comment, missing trailing newline, formatting that does not change the AST or behavior): the orchestrator may fix directly, then re-review. Anything compiler-detected (unused import, dead code) goes through `@make`, since removing it is still a code change.
+**Verdict authority** (ADR-15): same as Phase 4. `@check` is the only blocking reviewer; `@simplify`'s findings are recorded as advisory in the run summary but never trigger a re-dispatch loop.
+
+**Review loop (max 3 cycles, gated on `@check`):**
+1. Send implementation to both reviewers (dispatched in parallel).
+2. Merge findings: record `@simplify`'s output for the summary; act only on `@check`'s verdict.
+3. If `@check` returns ACCEPTABLE: proceed to Phase 9.
+4. If `@check` finds issues, route per the kind of finding — **the orchestrator does not write production code; `@make` does**:
+   - **`BLOCK`, behavioral, correctness, or production-code findings:** write a fix spec to `$RUN_DIR/task-fix-<N>.md` (1-indexed within this Phase 8 cycle, ADR-16). Apply Dispatch Hygiene, finalized text, no draft answer. Verify with `test -f` before dispatching. Dispatch `@make` with the absolute path. Do **not** fix directly. Every `BLOCK` is by definition behavioral and must round-trip through `@make`.
+   - **Test-quality / test-design findings:** route through the unified Implementation Incomplete diagnosis path (Phase 7) — `@check` diagnosis → `@test` redesign → fresh `@make` re-attempt against the existing task spec.
+   - **Plan-level finding:** **decrement `PLAN_REWORK_REMAINING`** and re-enter Phase 3 with the finding. If the counter is already `0`, abort to the Failure Handler.
+   - **Strictly cosmetic findings** (typo in a comment, missing trailing newline, formatting that does not change the AST or behavior): the orchestrator may fix directly, then re-review. Anything compiler-detected (unused import, dead code) goes through `@make` via a `task-fix-<N>.md`, since removing it is still a code change.
   - When in doubt, dispatch `@make`.
-5. **Convergence detection:** same findings twice = stop loop early
-6. If unresolved after 3 cycles: document blockers, proceed to commit anyway
+5. **Convergence detection:** same `@check` findings twice = stop loop early.
+6. If unresolved after 3 cycles: document blockers, proceed to commit anyway.

 ---

@ -462,11 +515,16 @@ The workflow is forge-agnostic. It commits locally and stops. **Do not push, and
 - If changes are large/varied, use multiple atomic commits (one per logical unit)

 ### TODO Update
- Dispatch `@pm` against the absolute `./TODO/` path. Ask it to:
-  - Set the issue file's frontmatter `status` to `Done` (or leave at `In Progress` if the run is incomplete and the user must verify before marking Done).
-  - Add a comment of the form: `- YYYY-MM-DD — Branch \`$BRANCH_NAME\`, commit <SHA> — <one-line summary>` (date from the shell, never fabricated).
-  - Propagate any status flip to the dependent index: `TODO/README.md` for top-level issues (`parent: null`), or the parent file's `## Sub-issues` line for sub-issues.
- If acceptance-criteria checkboxes were addressed by the implementation, ask `@pm` to check them off (flip `- [ ]` to `- [x]` under `## Acceptance criteria`).
+
+Dispatch `@pm` against the absolute `./TODO/` path. Ask it to:
+
+1. **Check off the AC checkboxes that task-1 satisfied.** For each `- [ ]` AC line in `./TODO/$ISSUE_ID.md`'s `## Acceptance criteria` section that the implemented work fulfilled, flip to `- [x]`. The orchestrator decides which AC are satisfied by inspecting task-1's spec and verification output.
+2. **Set the issue's frontmatter `status` based on AC completion** (ADR-21, AC-driven):
+   - **All AC are now `[x]`** → `status: Done`.
+   - **Some AC remain `[ ]`** → `status: In Progress`. (Sub-issues filed at Phase 5.5 cover the unmet AC; the user runs them in subsequent invocations.)
+   - **No AC section in the file** → `status: Done` (the parent had no testable AC; one task ran end-to-end).
+3. **Propagate any status flip to the dependent index:** `TODO/README.md` for top-level issues (`parent: null`), or the parent issue file's `## Sub-issues` line for sub-issues.
+4. **Add a comment** of the form: `- YYYY-MM-DD — Branch \`$BRANCH_NAME\`, commit <SHA> — <one-line summary>` (date from the shell, never fabricated).

 ### File Follow-ups

@ -512,11 +570,27 @@ If no follow-ups were filed, the commit message simplifies to `chore(todo): upda

 ## Failure Handling

-At any phase, if an unrecoverable error occurs:
+At any phase, if an unrecoverable error occurs (or a routing rule explicitly aborts to the Failure Handler — `PLAN_REWORK_REMAINING` exhausted, `split_needed` at Phase 7, etc.):
+
 1. Write `$RUN_DIR/summary.md` (creating `$RUN_DIR` first if it doesn't exist) with what was completed and what failed. Do **not** stage or commit anything under `.workflow/`.
 2. If any code was written, commit it with message `wip: incomplete workflow run for <issue-id>`. Stage code only — exclude `.workflow/` and `TODO/`.
 3. Leave the branch and worktree intact for the user to inspect — do not push, do not delete.
-4. Dispatch `@pm` against `./TODO/` to add a comment on the issue file (`./TODO/<ID>.md`) summarising what failed.
+4. Dispatch `@pm` against `./TODO/` to add a comment on the issue file (`./TODO/<ID>.md`) summarising what failed and naming the abort reason if it was a routing-rule abort (e.g. `split_needed at Phase 7 task-1`, `plan_rework_remaining exhausted at Phase 8`).
 5. Stop execution.

+### Recovery procedure (workflow is non-resumable, ADR-14)
+
+The workflow is **non-resumable**. There is no `--resume` mode and no idempotent re-run path. To retry after a Failure Handler invocation (or after a user-initiated cancellation):
+
+1. `git worktree remove <path>` — discard the failed worktree.
+2. Delete the feature branch: `git branch -D <branch>`. The Failure Handler's `wip:` commit (if any) is discarded with the branch.
+3. Re-create the worktree from `$BASE_BRANCH`: `git worktree add <path> -b <branch> <base-branch>`.
+4. Re-run `/workflow <ISSUE-ID>` from the fresh worktree.
+
+The throwaway-worktree model is the recovery story. Re-running on the same worktree without this cleanup risks committing partial state or appending duplicate `@pm` comments.
+
+### User-initiated cancellation
+
+User-initiated cancellation (Ctrl-C) follows the same procedure as automatic Failure Handler invocations: discard the worktree per the recovery procedure above. The orchestrator does not poll a soft-stop sentinel — runs are short enough (one task, ~10–30 min) that hard cancellation is fine.
+
 **Never hang on interactive prompts.** If any command appears to require input, treat it as a failure and follow the above procedure.
--- a/config/opencode/workflow-design.md
+++ b/config/opencode/workflow-design.md
@ -36,32 +36,36 @@ One orchestrator, five subagents. The orchestrator runs in `agent: build` mode;

 ### 3.1 Phase pipeline

-High-level happy path with the major escalation arms.
+High-level happy path with the major escalation arms. The workflow runs **one task per invocation** (ADR-21): Phase 5 produces N task files; if N>1, tasks 2…N are filed as sub-issues and only task 1 runs through Phases 6–8.

 ```mermaid
 flowchart TD
-    P1["Phase 1: Sanity Check"]
+    P1["Phase 1: Sanity Check<br/>incl clean tree + depends-on"]
    P2["Phase 2: Issue Context<br/>pm reads TODO/ID.md"]
    P3["Phase 3: Plan<br/>write plan.md"]
-    P4{"Phase 4: Review Plan<br/>check + simplify<br/>max 3 cycles"}
+    P4{"Phase 4: Review Plan<br/>check blocking, simplify advisory<br/>max 3 cycles"}
    P5["Phase 5: Split into Tasks<br/>write task-N.md"]
-    P55{"Phase 5.5: Review Split<br/>check<br/>max 2 cycles"}
+    P55{"Phase 5.5: Review Split<br/>check, 6 questions<br/>max 2 cycles"}
+    P5F["File tasks 2..N as sub-issues<br/>only when N more than 1"]
    P6["Phase 6: Write Tests<br/>test, stub-first make"]
-    P7["Phase 7: Implement<br/>make"]
-    P7E{"Test-design escalation<br/>max 2 cycles"}
-    P8{"Phase 8: Final Review<br/>check + simplify<br/>max 3 cycles"}
-    P9["Phase 9: Commit + TODO + Follow-ups + Summary"]
+    P7["Phase 7: Implement<br/>make, single task"]
+    P7E{"Implementation Incomplete<br/>check diagnoses<br/>max 2 cycles"}
+    P7F["split_needed: Failure Handler<br/>(discard worktree, re-run)"]
+    P8{"Phase 8: Final Review<br/>check blocking, simplify advisory<br/>max 3 cycles"}
+    P9["Phase 9: Commit + TODO + Follow-ups + Summary<br/>parent status AC-driven"]

    P1 --> P2 --> P3 --> P4
    P4 -->|ACCEPTABLE| P5 --> P55
    P4 -->|NEEDS WORK or BLOCK| P3
-    P55 -->|ACCEPTABLE| P6 --> P7
+    P55 -->|ACCEPTABLE| P5F --> P6 --> P7
    P55 -->|NEEDS WORK| P5
    P55 -->|BLOCK plan-level| P3
    P7 --> P8
-    P7 -.->|escalate test_design| P7E
-    P7E -->|check then test then make| P7
+    P7 -.->|Implementation Incomplete| P7E
+    P7E -->|test_design or production_logic| P7
+    P7E -.->|split_needed| P7F
    P7E -.->|2 cycles exhausted| P3
+    P7F --> ABORT([Failure Handler])
    P8 -->|ACCEPTABLE| P9
    P8 -->|production-code finding| P7
    P8 -->|test-design finding| P7E
@ -69,96 +73,114 @@ flowchart TD
    P9 --> END([Done])
 ```

+**Run-level cap:** `plan_rework_remaining` (default 1, ADR-13) decrements on every P5.5-BLOCK→P4, P7-escalation-exhaustion→P3, and P8-plan-level→P3 transition. Exhausted counter aborts to the Failure Handler.
+
 ### 3.2 Phase 7 escalation loop

-The pattern when `@make` cannot reach GREEN.
+The pattern when `@make` cannot reach GREEN. Unified diagnosis path (ADR-19): every Implementation Incomplete routes through `@check` test-diagnosis-first; `@check` returns one of three verdicts.

 ```mermaid
 stateDiagram-v2
    [*] --> Dispatched: orchestrator dispatches make
    Dispatched --> EntryCheck: run tests verify RED
    EntryCheck --> Implementing: failure code matches handoff
-    EntryCheck --> EntryEscalation: test-quality concern
+    EntryCheck --> CheckDiag: Implementation Incomplete
    Implementing --> GreenReached: tests pass within 2-3 attempts
-    Implementing --> MidEscalation: escalate test_design flag
-    Implementing --> MidStuck: incomplete no flag
-    MidStuck --> Implementing: re-dispatch with check notes 1 retry
-    MidStuck --> MidEscalation: still failing on retry
-    EntryEscalation --> CheckDiag
-    MidEscalation --> CheckDiag
-    CheckDiag --> TestRedesign: confirmed test-design error
-    CheckDiag --> Dispatched: rejected production issue
+    Implementing --> CheckDiag: Implementation Incomplete
+    CheckDiag --> TestRedesign: verdict test_design
+    CheckDiag --> Dispatched: verdict production_logic
+    CheckDiag --> FailureHandler: verdict split_needed
    TestRedesign --> Dispatched: test fixes fresh entry validation
    Dispatched --> PlanRevisit: 2 escalation cycles exhausted
    GreenReached --> [*]
-    PlanRevisit --> [*]: back to Phase 3
+    FailureHandler --> [*]: discard worktree, re-run
+    PlanRevisit --> [*]: back to Phase 3 if rework budget intact
 ```

 ### 3.3 Issue lifecycle

-How TODO entries move through statuses, with sub-issue filing during a run.
+How TODO entries move through statuses. In the one-task-per-run model (ADR-21), a single workflow invocation may file multiple sub-issues mid-run, and the parent's final status is AC-driven, not run-driven.

 ```mermaid
 stateDiagram-v2
    [*] --> Todo: issue file created
    Todo --> InProgress: Phase 2 workflow starts
-    InProgress --> Done: Phase 9 run completes
+    InProgress --> Done: Phase 9 - all parent AC checked
+    InProgress --> InProgress2: Phase 9 - some parent AC remain
    InProgress --> Todo: workflow fails, failure handler adds comment

+    note right of InProgress2
+        Parent stays In Progress when sub-issues
+        cover the unmet AC. User runs sub-issues
+        in subsequent /workflow invocations.
+    end note
+
    note right of InProgress
-        New sub-issues may be filed during Phase 9
-        with parent ISSUE_ID, status Todo
-        and label bug, followup, or tech-debt
+        Sub-issues filed during a run carry:
+        - parent: ISSUE_ID, status: Todo
+        - label: bug, followup, tech-debt, or split-from-run
+        - depends-on: [...] for cross-sub-issue ordering
+        - rich seed body for split-time filings (ADR-21)
    end note

    Done --> [*]
+    InProgress2 --> [*]
 ```

 ---

 ## 4. Routing Matrix

-Every observed `(phase, signal) → action`. Empty cells are gaps. Walking this table is the cheap way to spot routing issues like the recent Phase 7 mid-implementation escalation.
+Every observed `(phase, signal) → action`. Empty cells are gaps. Walking this table is the cheap way to spot routing issues.

 | Phase | Signal source | Signal | Action |
 |---|---|---|---|
 | 1 | Sanity checks | Bare repo / detached HEAD / missing `TODO/<ID>.md` / branch == base | Stop with error |
+| 1 | Sanity checks | Working tree dirty (`git status --porcelain` non-empty) | Stop with error (ADR-20) |
+| 1 | Sanity checks | `depends-on:` issue not in `Done` status | Stop with error (ADR-21) |
 | 2 | `@pm` | Issue not found | Stop with error |
 | 2 | `@pm` | Status is `Todo` | Flip to `In Progress`; propagate to README.md / parent's Sub-issues |
 | 3 | Orchestrator | Plan drafted | Apply Dispatch Hygiene; write `plan.md`; verify `test -f` |
-| 4 | `@check` + `@simplify` | Both ACCEPTABLE | Proceed to Phase 5 |
-| 4 | Either reviewer | NEEDS WORK | Edit `plan.md` in place; re-dispatch (max 3 cycles) |
+| 4 | `@check` | ACCEPTABLE (regardless of `@simplify`) | Proceed to Phase 5 |
+| 4 | `@check` | NEEDS WORK | Edit `plan.md` in place; re-dispatch (max 3 cycles) |
 | 4 | `@check` | BLOCK | Edit `plan.md` addressing the finding; re-dispatch |
-| 4 | Reviewers | Same finding twice | Convergence detected; stop loop early |
+| 4 | `@simplify` | Any verdict (ADR-15) | Advisory only — record in summary; never blocks Phase 4 progression |
+| 4 | Reviewers | Same `@check` finding twice | Convergence detected; stop loop early |
 | 4 | Reviewers | Unresolved after 3 cycles | Document blockers in summary; proceed |
 | 5 | Orchestrator | Tasks drafted | Apply Dispatch Hygiene; write each `task-N.md`; verify `test -f` for every N |
-| 5.5 | `@check` | ACCEPTABLE | Proceed to Phase 6 |
+| 5.5 | `@check` | ACCEPTABLE, N=1 | Skip P5.5 entirely (ADR-21); proceed to Phase 6 — degenerate split |
+| 5.5 | `@check` | ACCEPTABLE, N>1 | File tasks 2…N as sub-issues with rich seed bodies (ADR-21) via `@pm`; proceed to Phase 6 with task-1 only |
 | 5.5 | `@check` | NEEDS WORK | Edit `task-N.md` in place; re-dispatch (max 2 cycles) |
-| 5.5 | `@check` | BLOCK | Plan doesn't decompose cleanly; back to Phase 4 |
+| 5.5 | `@check` | BLOCK plan-level | Edit `plan.md` addressing the BLOCK finding; decrement `plan_rework_remaining`; re-enter Phase 4 (ADR-17) |
+| 5.5 | Run-level | `plan_rework_remaining` exhausted | Abort to Failure Handler (ADR-13) |
 | 6 | `@test` | TESTS_READY + `escalate_to_check: false` | Proceed to Phase 7 |
 | 6 | `@test` | TESTS_READY + `escalate_to_check: true` | `@check` light review → `@test` fixes → forward |
-| 6 | `@test` | NOT_TESTABLE (general) | `@check` sign-off; task goes to `@make` without tests |
+| 6 | `@test` | NOT_TESTABLE (general) | `@check` sign-off; task goes to `@make` without tests; record in NOT_TESTABLE manifest for Phase 8 (ADR-18) |
 | 6 | `@test` | NOT_TESTABLE: Missing testability seam | `@make` adds the seam; re-run `@test` |
 | 6 | `@test` | BLOCKED | Investigate; may need spec or plan revision |
 | 6 | `@test` (stub-first) | All tests pass with zero `todo!()` panics | Reject — structural-only tests; route back to `@test` to rewrite |
 | 7 | `@make` | Implementation Complete | Proceed to Phase 8 |
-| 7 | `@make` | Implementation Incomplete + entry-validation flag | `@check` (test diagnosis) → `@test` (fixes) → fresh `@make` |
-| 7 | `@make` | Implementation Incomplete + `escalate: test_design` | Same path; max 2 escalation cycles |
-| 7 | `@make` | Implementation Incomplete (no flag) | Re-dispatch with `@check` notes once; if 2nd attempt fails, treat as `escalate: test_design` |
-| 7 | Escalation loop | 2 cycles exhausted | Back to Phase 3 (plan revisit) |
-| 8 | `@check` + `@simplify` | ACCEPTABLE | Proceed to Phase 9 |
-| 8 | `@check` | BLOCK / behavioral / production-code finding | New `@make` task spec from finding; dispatch (max 3 cycles) |
-| 8 | `@check` | BLOCK / test-design / test-quality finding | `@check` → `@test` → `@make` re-verify |
-| 8 | `@check` | BLOCK / plan-level finding | Back to Phase 3 with the finding |
-| 8 | `@simplify` | Advisory | Record in summary's "Advisory notes (not filed)" |
+| 7 | `@make` | Implementation Incomplete (any flag or no flag) | Route through `@check` test-diagnosis-first (ADR-19); orchestrator follows `@check`'s verdict |
+| 7 | `@check` diagnosis | `test_design` | Dispatch `@test` to redesign tests; fresh `@make` re-attempt |
+| 7 | `@check` diagnosis | `production_logic` | Re-dispatch `@make` with `@check`'s production-side notes |
+| 7 | `@check` diagnosis | `split_needed` | Abort to Failure Handler (ADR-21 / Q19a). In the one-task-per-run model task-1 is the only task; no AC have been satisfied; recovery is "discard worktree, re-plan from scratch." `@pm` adds a comment recording the diagnosis. |
+| 7 | Escalation loop | 2 cycles exhausted | Decrement `plan_rework_remaining`; back to Phase 3 (plan revisit) |
+| 8 | `@check` | ACCEPTABLE | Proceed to Phase 9 |
+| 8 | `@check` | BLOCK / behavioral / production-code finding | Write `task-fix-<N>.md` to `$RUN_DIR/` (ADR-16); dispatch `@make` against it (max 3 cycles) |
+| 8 | `@check` | BLOCK / test-design / test-quality finding | Route through `@check` diagnosis → `@test` → `@make` re-verify |
+| 8 | `@check` | BLOCK / plan-level finding | Decrement `plan_rework_remaining`; back to Phase 3 with the finding |
+| 8 | `@simplify` | Any verdict (ADR-15) | Advisory only — record in summary; never blocks Phase 8 progression |
 | 8 | Reviewers | Strictly cosmetic finding (typo, missing newline, AST-preserving) | Orchestrator fixes directly; re-review |
+| 8 | Reviewers | NOT_TESTABLE manifest task flagged as questionable | Apply same routing as a normal `@check` finding for that task |
 | 8 | Review loop | Same finding twice | Convergence; stop loop |
 | 8 | Review loop | 3 cycles exhausted | Document blockers; proceed |
 | 9 | Orchestrator | Pre-existing bug, out of scope | File sub-issue via `@pm` (label: `bug`) |
 | 9 | Orchestrator | Unresolved review-loop blocker | File sub-issue via `@pm` (label: `followup`) |
 | 9 | `@test` (Phase 6) | NOT_TESTABLE future-seam note | File sub-issue via `@pm` (label: `tech-debt`) |
 | 9 | Orchestrator | `@simplify` advisory not acted on | Record in summary; do NOT file (records, not work) |
-| 9 | Orchestrator | All commits done | Set issue status to `Done`; sync README/parent; commit `chore(todo): …` |
+| 9 | Orchestrator | All parent AC checked off | Set issue status to `Done`; sync README/parent; commit `chore(todo): …` |
+| 9 | Orchestrator | Some parent AC remain unchecked AND sub-issues exist | Leave issue at `In Progress`; commit `chore(todo): …` |
+| Run-level | Failure Handler | Workflow is non-resumable (ADR-14) | Document the cleanup procedure: `git worktree remove`, delete branch, re-create from base, retry |

 ---

@ -215,6 +237,8 @@ ADR-flavoured. New decisions append at the end. If a decision is later reversed
 **Alternatives:** accept compile-error RED; let `@make` write tests + bodies in one pass; allow `@test` to add stubs to production source.
 **Consequences:** two atomic commits per affected task (`feat: scaffold X with todo!() stubs`, then `feat: implement X`). Stub-pass scope is tight: bodies are exactly `todo!()`, signatures must match the planned final API. Phase 6 also adds a mandatory panic-coverage check after `@test`: every test must panic on `todo!()` to prove it actually exercises the stubbed symbols (catches structural-only tests).

+**On reviewer bypass:** the stub-pass commit is not sent through Phase 5.5 or Phase 8 review. The bypass is intentional and safe because (a) stubs are mechanical — signatures plus `todo!()`, no logic; (b) the body-pass commit *is* reviewed and the body-pass diff strictly subsumes the stub-pass diff (the same signatures, now with bodies); (c) Phase 6's mandatory panic-coverage check is what actually validates that the stubs are exercised. Reviewing the stub-pass would duplicate work that the body-pass review catches anyway.
+
 ### ADR-8 (2026-05-08) — `@test` may write inside `#[cfg(test)] mod` blocks

 **Context:** Rust unit tests live colocated in production source files inside `#[cfg(test)] mod tests { … }` blocks — the canonical idiom, not an edge case. Original `@test` File Constraint forbade `src/` writes entirely, which forced `@make` to write both production code and tests in a single dispatch. This lost the RED→GREEN separation that TDD relies on.
@ -248,7 +272,85 @@ ADR-flavoured. New decisions append at the end. If a decision is later reversed
 **Context:** Phase 7's escalation rule was gated on `@make` flagging concerns *during entry validation* (the RED check before implementing). When `@make` got past entry validation, started implementing, and then ground for 2-3 attempts because the test demanded impossible production code, the orchestrator had no documented route — it would re-dispatch `@make` with marginal context tweaks instead of recognizing the diagnosis as test-architecture failure.
 **Decision:** split Phase 7's escalation into entry-validation and mid-implementation paths. `@make` reports `escalate: test_design` when its iteration limit is reached and the test seems to demand impossible / unreasonable code. Both paths route through `@check` (test diagnosis) → `@test` (redesign) → fresh `@make` dispatch. Max 2 escalation cycles before reverting to Phase 3 plan revisit.
 **Alternatives:** status quo; let `@make` modify test files itself.
-**Consequences:** faster recovery from test-design errors. Bounded loop prevents thrashing. `@make.md` Iteration Limits section gains a new red-flag class.
+**Consequences:** faster recovery from test-design errors. Bounded loop prevents thrashing. `@make.md` Iteration Limits section gains a new red-flag class. *Superseded in part by ADR-19 (unified diagnosis path).*
+
+### ADR-13 (2026-05-08) — Run-level `plan_rework_remaining` counter
+
+**Context:** several routes return control to an upstream phase when downstream signals reveal the upstream artifact was wrong: P5.5-BLOCK→P4 (split doesn't decompose), P7-escalation-exhaustion→P3 (test/code thrash exceeded its bound), P8-plan-level→P3 (final review exposes a plan defect). Each upstream phase has its own per-loop cycle cap (P4 max 3, P5.5 max 2, etc.), but those caps reset on every re-entry — so a run could in principle thrash P3↔P4↔P5.5↔P3 indefinitely without violating any local rule.
+**Decision:** introduce one run-level counter, `plan_rework_remaining`, default value `1`. It decrements on every transition where downstream signal forces upstream rework: `P5.5 BLOCK → P4`, `P7 escalation exhausted → P3`, `P8 plan-level finding → P3`. When the counter is `0` and another such transition fires, abort to the Failure Handler instead of re-entering. Per-phase cycle caps are unchanged.
+**Alternatives:** (a) a global `max_subagent_dispatches` budget — over-engineered for the specific failure mode; (b) document the resets as intentional and rely on convergence detection — leaves the bug present.
+**Consequences:** at most two plan attempts per run (the initial plan plus one revision). Failure Handler invocation distinct in cause from earlier-phase aborts: the cleanup is the same (per ADR-14) but the summary explains *which* downstream signal exhausted the budget.
+
+### ADR-14 (2026-05-08) — Workflow is non-resumable
+
+**Context:** Phase 9 has multiple sub-steps (code commit → `@pm` status update → file follow-ups → TODO commit → summary). Crashing between any two sub-steps leaves the worktree in a state that earlier docs called "partial." The original Failure Handler did not flip status back, did not recognize partial-Phase-9 separately from earlier-phase crashes, and re-running `/workflow` after a crash could append new comments and re-do work indefinitely.
+**Decision:** declare the workflow non-resumable. On any failure (Failure Handler invocation), the recovery procedure is: `git worktree remove` the failed worktree, delete the feature branch, re-create the worktree from `$BASE_BRANCH`, then re-run `/workflow`. Document this explicitly in the Failure Handler section. The throwaway-worktree model means there is no in-place resume state to corrupt — the user discards the worktree and starts fresh.
+**Alternatives:** (a) smarter Failure Handler that cleans up partial state idempotently; (b) transactional Phase 9 via a state file; (c) idempotent sub-steps so re-runs auto-resume.
+**Consequences:** simplest possible recovery model. Phase 9 sub-step ordering doesn't need to be defended against partial failures — partial state is acceptable because the recovery is "discard everything and re-run." User-initiated cancellation (Ctrl-C) follows the same procedure.
+
+### ADR-15 (2026-05-08) — `@simplify` is advisory at every gate
+
+**Context:** the Phase 4 routing matrix used to read "Either reviewer NEEDS WORK → re-dispatch the loop," giving `@simplify` veto power equivalent to `@check`'s. Phase 8's matrix said `@simplify` was advisory only ("Record in summary's 'Advisory notes (not filed)'"). Same agent, two different powers.
+**Decision:** `@simplify` is advisory at every gate. Its findings are recorded in the run summary; they never force a re-dispatch loop. `@check` is the only reviewer with veto authority (NEEDS WORK / BLOCK).
+**Alternatives:** (a) make `@simplify` blocking everywhere — too heavy for a heuristic agent prone to false positives; (b) keep the asymmetry and document a principle — fragile.
+**Consequences:** uniform model — `@check` enforces correctness, `@simplify` advises on shape. Phase 4 review loops only run on `@check` findings; `@simplify` complexity flags get logged in the summary like at Phase 8. The user can manually promote a `@simplify` finding if it matters.
+
+### ADR-16 (2026-05-08) — Phase 8 fix specs go to disk
+
+**Context:** when Phase 8 review surfaced a behavioral or production-code finding, the orchestrator would "build a new `@make` task spec from the finding" and dispatch it inline. That violates ADR-6's invariant (run artifacts on disk, no inline paraphrase) — and exactly when it matters most, because Phase 8 has up to 3 review cycles and the same finding can re-dispatch.
+**Decision:** Phase 8 fix dispatches write a new artifact `$RUN_DIR/task-fix-<N>.md` (1-indexed within the Phase 8 cycle) before dispatching `@make`. Same Dispatch Hygiene rules as Phase 5 task specs, same `test -f` verification. Cosmetic findings (orchestrator fixes directly per workflow.md) skip the file — only `@make`-dispatched findings get one.
+**Alternatives:** (a) inline in the dispatch prompt with an ADR-6 footnote — erodes the invariant for the highest-risk dispatch class; (b) edit the original `task-N.md` — muddies the audit trail of an already-met spec.
+**Consequences:** ADR-6's invariant holds end-to-end. Phase 8 cycles re-dispatch against the same on-disk file (mid-loop edits in place), eliminating paraphrase drift across review cycles.
+
+### ADR-17 (2026-05-08) — Phase 5.5 BLOCK protocol
+
+**Context:** Phase 5.5 BLOCK ("plan does not decompose cleanly") used to route "back to Phase 4 with `@check`'s finding," but `@check` at 5.5 evaluated the *split*, not the plan; its finding may not map cleanly to a plan edit. Re-entering P4 with the same `plan.md` and a finding tagged on the prompt asks the wrong question.
+**Decision:** on P5.5 BLOCK, the orchestrator translates the split-level finding into a concrete `plan.md` edit (e.g. "the plan conflates structural and runtime work; split into two milestones"), saves the edit, decrements `plan_rework_remaining` (per ADR-13), and re-dispatches Phase 4 reviewers against the *revised* plan. P4 reviewers see a genuinely different plan.
+**Alternatives:** (a) re-dispatch P4 unchanged with finding attached — burns reviewers on a known-broken plan; (b) treat P5.5 BLOCK as terminal — too strict, we have the rework budget for one revisit.
+**Consequences:** P5.5 BLOCK is an effective signal. The orchestrator's plan-edit step is mandatory; skipping it is a routing error. Run-level rework budget bounds the loop.
+
+### ADR-18 (2026-05-08) — Phase 8 NOT_TESTABLE manifest
+
+**Context:** Phase 6 routes NOT_TESTABLE tasks through `@check` for sign-off, then dispatches `@make` without tests. Phase 8 reviews the diff but has no signal that "this change has no test because `@test` claimed it untestable." If `@check` at P6 was wrong, untested code ships.
+**Decision:** Phase 8's dispatch prompt includes a "Tasks completed without tests (NOT_TESTABLE)" section listing each task ID, the `@test` justification, and the `@check` sign-off rationale. Reviewers explicitly evaluate "does the justification still hold given the final diff?" If a reviewer pushes back, routing follows the normal Phase 8 finding rules.
+**Alternatives:** (a) double-up `@check` + `@simplify` at P6 NOT_TESTABLE granting — doubles dispatch cost without targeting the actual gap; (b) restrict NOT_TESTABLE to a fixed taxonomy — won't generalize across languages; (c) reject NOT_TESTABLE entirely — ignores legitimate cases.
+**Consequences:** pure plumbing change. P8 reviewers gain visibility into the bypass without new agents or new authority.
+
+### ADR-19 (2026-05-08) — Unified Implementation Incomplete diagnosis path
+
+**Context:** ADR-12 introduced three paths for `@make` reporting Implementation Incomplete: entry-validation flag, mid-impl `escalate: test_design` flag, no flag (re-dispatch with `@check` notes once, escalate after second failure). Three paths converging on the same destination (`@check` test-diagnosis → `@test` redesign or `@make` re-dispatch) added matrix surface and obscured the routing.
+**Decision:** every Implementation Incomplete from `@make` routes through `@check` test-diagnosis-first. `@check` returns one of three verdicts — `test_design` (route to `@test` redesign), `production_logic` (re-dispatch `@make` with `@check`'s notes), or `split_needed` (per ADR-21). `@make`'s self-diagnosis flag becomes a *hint* for `@check`, not a control-flow input for the orchestrator.
+**Alternatives:** (a) keep three paths, tighten what `@check` reviews in each — preserves the surface area; (b) push burden to `@make` — orchestrator still needs to gate via `@check`.
+**Consequences:** routing logic shrinks. Matrix has fewer rows. ADR-12's split-into-two-paths is partially superseded — the *escalation diagnosis* unified, the iteration limit (max 2 cycles) preserved.
+
+### ADR-20 (2026-05-08) — Phase 1 working-tree cleanliness check
+
+**Context:** Phase 1 verified non-bare repo, branch identity, base branch, issue file presence — but not that the working tree was clean. Stale uncommitted edits would be swept into the Phase 9 commit (workflow.md stages "code changes only" but doesn't distinguish *which* code) or a `wip:` failure commit.
+**Decision:** Phase 1 runs `git status --porcelain`; if non-empty, stop with: "Working tree must be clean. Commit or stash uncommitted changes before running the workflow."
+**Alternatives:** (a) capture initial dirty state, stage only files modified by the workflow at Phase 9 — error-prone baseline tracking; (b) document the requirement, don't enforce — `// TODO: don't forget` in design-doc form.
+**Consequences:** matches the ADR-14 throwaway-worktree model. One additional sanity-check line. User's "but I have manual edits I want the workflow to build on" case is solved by them committing those edits first, which is what they should do anyway.
+
+### ADR-21 (2026-05-08) — One-task-per-run model
+
+**Context:** the workflow originally executed N tasks per run, sequentially through Phase 7. That introduced cross-task regression risk (task 4 breaks task 1's tests, found N tasks late), big-diff Phase 8 reviews (multi-day branches accumulate thousands of diff lines that hit reviewer context limits silently), and the mid-flight task-split problem (when `@make` discovers task N is over-scoped, no documented route to re-split). It also coupled the workflow's success to "all N tasks complete," when in practice an issue worth one good commit shouldn't depend on unrelated downstream work succeeding.
+**Decision:** every workflow run executes **exactly one task** through Phases 6–8. Phase 5 still splits the plan into N tasks via the Split Heuristic. If N=1, proceed normally. If N>1, the orchestrator dispatches `@pm` to file tasks 2…M as TODO sub-issues *before* Phase 6 starts, and only task 1 runs through Phases 6–8. If task 1 itself reports `split_needed` mid-Phase-7, abort to the Failure Handler (Q19a: in the one-task-per-run model task-1 is the only task in the run, so no feature AC have been satisfied; recovery is the standard non-resumable cleanup from ADR-14).
+
+The model carries five sub-decisions:
+
+1. **Sub-issue body schema for split-time filings:** rich seed body that lets a fresh `/workflow` invocation re-plan and implement without seeing siblings or the original `plan.md`. Includes task description + AC + Code Context + Integration Contracts (declared in frontmatter as `depends-on: [...]`) + relevant slice of `plan.md` + Test Design section if present + a "Discovered during run on `$BRANCH_NAME` for parent issue `$ISSUE_ID`" attribution paragraph.
+2. **Phase 5.5 review questions strengthen to six**: coverage, no overlap, single-purpose, integration contracts (with stronger bar — must be self-contained for cross-session use), testable AC, and *self-containment* (is each task spec runnable as a standalone `/workflow` invocation?). Self-containment is the new load-bearing question because each filed sub-issue runs in isolation.
+3. **Split Heuristic recalibration**: keep the existing mechanical thresholds (>2 concerns, >50 lines across >2 files, mixes structural + runtime, etc.) but add a "default to no split" tiebreaker — when in doubt, do not split, because splitting now fans out across user sessions with full orchestration overhead per sub-issue.
+4. **Parent issue status is AC-driven**: Phase 9's existing AC checkbox logic (workflow.md flips ticked AC) determines status. If all parent AC are checked → `Done`; if some remain unchecked → stays `In Progress` with the filed sub-issues covering the remaining work.
+5. **`depends-on:` frontmatter and Phase 1 enforcement**: `@pm` schema gains a `depends-on: [<ID>, ...]` list. Phase 1 sanity check refuses to start if any listed dependency is not `Done`. Hard block — soft-warn means the user (who has walked away) doesn't see the warning until later.
+
+**Auto-resolved problems:**
+- Mid-flight task split (formerly Q2 in Open Questions, ADR-12's adjacent gap): collapses into "file as sub-issue and exit."
+- Big-diff Phase 8 reviews: one task = bounded diff (~50 lines per Split Heuristic). No big-diff problem possible.
+- Cross-task regression within a run: no cross-task regressions possible inside a single-task run; subsequent sub-issue runs detect them at their own Phase 7 entry validation (which runs the project's test suite).
+- Skip-P5.5-when-N=1 optimization: trivially satisfied — N=1 from Phase 6 onward in every run.
+
+**Alternatives:** (a) keep N-task runs, add mid-flight re-splitting via P7→P5 re-entry — doesn't solve big-diff or cross-task regression; (b) keep N-task runs, accept the gaps — leaves three known-bad routes; (c) always one task per issue (skip Phase 5 entirely) — loses the planning-phase split heuristic that's catching legitimate over-scoping at design time.
+**Consequences:** runs become shorter and more focused. Each commit/PR carries a bounded scope. Sub-issue fan-out becomes the primary scaling mechanism for multi-step work. `TODO/` sees more sub-issue files; `@pm`'s split-time filing path becomes a hot code path. Concurrent runs in different worktrees on the same repo become trivially safe because each worktree has its own `TODO/` checkout (file conflicts surface as standard git merge conflicts at integration time, not as mid-run race conditions).

 ---

@ -260,20 +362,28 @@ When a question gets answered, move it to the [Design decisions log](#5-design-d

 Currently Phase 5.5 reviews the **split** (coverage, overlap, single-purpose, integration contracts, testable AC). It does *not* explicitly evaluate whether the test approach implied by each task spec is sound. That would partially overlap with Phase 4 (which has a plan-level Test Design section the reviewers evaluate). If a test-design error escapes Phase 4 and is encoded in a task spec, it surfaces at Phase 7 via the mid-impl escalation (ADR-12) — but earlier detection might be cheaper. Open: should Phase 5.5 add "test approach for each task is sound" as a sixth review question, or is that scope creep into Phase 4 territory?

-### Q2: How does the orchestrator handle "split heuristic violated only after attempting a task"?
+### ~~Q2~~: Mid-flight task split — *closed by ADR-21*

-Phase 5's Split Heuristic catches obvious over-/under-split cases at planning time. But sometimes a task that *looked* single-purpose during planning turns out to mix structural and runtime work only when `@make` starts implementing it. There's no documented mid-Phase-7 routing for "this task needs to be split now." Currently `@make` would either thrash (mid-impl escalation, ADR-12), or report the spec is ambiguous (Insufficient Context Protocol in `make.md`). Open: should there be a "split mid-flight" route that takes the task back to Phase 5 for re-splitting?
+The one-task-per-run model collapses this question. When `@make` discovers task-1 is over-scoped, the unified diagnosis path (ADR-19) returns `split_needed` from `@check`, and the orchestrator aborts to the Failure Handler (no P5 re-entry, no sub-issue filing — the recovery is "discard worktree, re-plan from scratch"). Tasks 2…M are already filed as sub-issues at Phase 5.5 acceptance, so there's no "remaining tasks" cleanup to think about.

-### Q3: Phase 9 has no rollback for partial commits if it fails between sub-steps
+### ~~Q3~~: Phase 9 partial-commit rollback — *closed by ADR-14*

-Phase 9's order is: code commit → TODO update → file follow-ups → commit TODO changes → write summary. If the workflow crashes between code commit and TODO commit, the worktree has the code change but the issue file still says `In Progress`. The Failure Handler covers earlier-phase crashes but Phase-9-internal partial states aren't explicitly addressed. Open: should the Failure Handler distinguish "Phase 9 partial" and resume from the right sub-step on retry, or is leaving manual cleanup to the user good enough?
+The workflow is non-resumable. Phase 9 partial states are addressed by the throwaway-worktree recovery procedure: discard the worktree, delete the branch, re-create from base, re-run. Phase 9 sub-step ordering doesn't need to defend against partial failures because the recovery is "discard everything and re-run."

 ### Q4: `@simplify` not involved at Phase 5.5 — is that the right call?

-Phase 5.5 only dispatches `@check`. Rationale (ADR-9) is that split review is structural, not complexity. But `@simplify`'s lens — "what if we deleted this?" — could legitimately catch unnecessary tasks (e.g. a third task that adds an abstraction nothing else needs). Open: is the cost of one more dispatch worth the catch?
+Phase 5.5 only dispatches `@check`. Rationale (ADR-9) is that split review is structural, not complexity. But `@simplify`'s lens — "what if we deleted this?" — could legitimately catch unnecessary tasks (e.g. a third task that adds an abstraction nothing else needs). With ADR-21's one-task-per-run pivot, this question gains a different angle: a `@simplify` flag on a sibling sub-issue at Phase 5.5 could prevent filing a wasteful sub-issue, which is more valuable than catching the same redundancy at Phase 8 of a future run. Open: is the cost of one more dispatch worth the catch, especially now that Phase 5.5 is the gate for sub-issue fan-out?

 ### Q5: Test-design loop bound vs plan-revisit threshold

 ADR-12 sets max 2 cycles for the Phase 7 test-design escalation before reverting to Phase 3 plan revisit. The plan-review and final-review loops have max 3. Why the asymmetry? The test-design loop is more expensive per cycle (`@check` + `@test` + `@make` re-implement vs. just reviewers + plan edit), so 2 may be right. But the choice was made by feel, not measured. Open: is 2 the right number, or should it match Phase 4 / Phase 8 at 3?

+### Q6: Sub-issue ordering in the parent's `## Sub-issues` list
+
+ADR-21's split-time filing creates new sub-issues with `depends-on:` declarations, but the parent's `## Sub-issues` list (rendered by `@pm`) is currently flat. When dependencies form a chain (sub-issue 2 depends on 1), the user has to read the chain from each sub-issue's frontmatter. Open: should `@pm` render the parent's sub-issue list in dependency order, with a visible indicator (e.g. indentation or `↳`) for dependent items? Cosmetic but would speed up "what to run next" decisions.
+
+### Q7: Concurrent-worktree edge case — sub-issue ID collisions
+
+Two parallel runs in different worktrees, each filing sub-issues, can both pick the same next ID (e.g. both pick `GAL-42` because both saw `GAL-41` as the highest at start). On merge, git surfaces this as a conflict over `TODO/GAL-42.md` content (two different files staked on the same name). Recoverable but annoying. Open: should `@pm`'s ID generation use a strategy that's safer under concurrent runs (e.g. timestamp suffix, branch-prefix, content-addressable), or accept the merge-conflict-on-collision cost given the one-user assumption?
+
 ---