feat(opencode): one-task-per-run model + 9 routing fixes (ADRs 13-21)

Captures the design grilling outcome. Adds ADRs 13-21 covering:
- run-level plan_rework_remaining counter to bound P3<->P5.5/P7/P8 thrash
- non-resumable workflow with throwaway-worktree recovery procedure
- @simplify advisory at every gate (not just Phase 8)
- Phase 8 fix specs go to disk as task-fix-N.md (preserves ADR-6)
- Phase 5.5 BLOCK protocol: orchestrator edits plan, decrements counter, re-enters P4
- Phase 8 NOT_TESTABLE manifest in reviewer prompt
- unified Implementation Incomplete diagnosis (test_design / production_logic / split_needed)
- Phase 1 working-tree cleanliness + depends-on enforcement
- one-task-per-run pivot: Phase 5 still splits N tasks, only task-1 runs;
  tasks 2..N filed as sub-issues with rich seed bodies; split_needed at P7
  aborts to Failure Handler (one-task-per-run = no salvageable prior work)

Auto-resolves big-diff Phase 8 reviews, cross-task regression-within-run, and
mid-flight task-split routing. Rewrites routing matrix and three Mermaid
diagrams; updates @pm (depends-on frontmatter, split-time filing), @check
(third diagnosis verdict), @make (escalate: split_needed flag).
This commit is contained in:
Harald Hoyer 2026-05-08 13:02:54 +02:00
parent 0b15944d1c
commit af6481a5a7
5 changed files with 342 additions and 130 deletions

View file

@ -36,32 +36,36 @@ One orchestrator, five subagents. The orchestrator runs in `agent: build` mode;
### 3.1 Phase pipeline
High-level happy path with the major escalation arms.
High-level happy path with the major escalation arms. The workflow runs **one task per invocation** (ADR-21): Phase 5 produces N task files; if N>1, tasks 2…N are filed as sub-issues and only task 1 runs through Phases 68.
```mermaid
flowchart TD
P1["Phase 1: Sanity Check"]
P1["Phase 1: Sanity Check<br/>incl clean tree + depends-on"]
P2["Phase 2: Issue Context<br/>pm reads TODO/ID.md"]
P3["Phase 3: Plan<br/>write plan.md"]
P4{"Phase 4: Review Plan<br/>check + simplify<br/>max 3 cycles"}
P4{"Phase 4: Review Plan<br/>check blocking, simplify advisory<br/>max 3 cycles"}
P5["Phase 5: Split into Tasks<br/>write task-N.md"]
P55{"Phase 5.5: Review Split<br/>check<br/>max 2 cycles"}
P55{"Phase 5.5: Review Split<br/>check, 6 questions<br/>max 2 cycles"}
P5F["File tasks 2..N as sub-issues<br/>only when N more than 1"]
P6["Phase 6: Write Tests<br/>test, stub-first make"]
P7["Phase 7: Implement<br/>make"]
P7E{"Test-design escalation<br/>max 2 cycles"}
P8{"Phase 8: Final Review<br/>check + simplify<br/>max 3 cycles"}
P9["Phase 9: Commit + TODO + Follow-ups + Summary"]
P7["Phase 7: Implement<br/>make, single task"]
P7E{"Implementation Incomplete<br/>check diagnoses<br/>max 2 cycles"}
P7F["split_needed: Failure Handler<br/>(discard worktree, re-run)"]
P8{"Phase 8: Final Review<br/>check blocking, simplify advisory<br/>max 3 cycles"}
P9["Phase 9: Commit + TODO + Follow-ups + Summary<br/>parent status AC-driven"]
P1 --> P2 --> P3 --> P4
P4 -->|ACCEPTABLE| P5 --> P55
P4 -->|NEEDS WORK or BLOCK| P3
P55 -->|ACCEPTABLE| P6 --> P7
P55 -->|ACCEPTABLE| P5F --> P6 --> P7
P55 -->|NEEDS WORK| P5
P55 -->|BLOCK plan-level| P3
P7 --> P8
P7 -.->|escalate test_design| P7E
P7E -->|check then test then make| P7
P7 -.->|Implementation Incomplete| P7E
P7E -->|test_design or production_logic| P7
P7E -.->|split_needed| P7F
P7E -.->|2 cycles exhausted| P3
P7F --> ABORT([Failure Handler])
P8 -->|ACCEPTABLE| P9
P8 -->|production-code finding| P7
P8 -->|test-design finding| P7E
@ -69,96 +73,114 @@ flowchart TD
P9 --> END([Done])
```
**Run-level cap:** `plan_rework_remaining` (default 1, ADR-13) decrements on every P5.5-BLOCK→P4, P7-escalation-exhaustion→P3, and P8-plan-level→P3 transition. Exhausted counter aborts to the Failure Handler.
### 3.2 Phase 7 escalation loop
The pattern when `@make` cannot reach GREEN.
The pattern when `@make` cannot reach GREEN. Unified diagnosis path (ADR-19): every Implementation Incomplete routes through `@check` test-diagnosis-first; `@check` returns one of three verdicts.
```mermaid
stateDiagram-v2
[*] --> Dispatched: orchestrator dispatches make
Dispatched --> EntryCheck: run tests verify RED
EntryCheck --> Implementing: failure code matches handoff
EntryCheck --> EntryEscalation: test-quality concern
EntryCheck --> CheckDiag: Implementation Incomplete
Implementing --> GreenReached: tests pass within 2-3 attempts
Implementing --> MidEscalation: escalate test_design flag
Implementing --> MidStuck: incomplete no flag
MidStuck --> Implementing: re-dispatch with check notes 1 retry
MidStuck --> MidEscalation: still failing on retry
EntryEscalation --> CheckDiag
MidEscalation --> CheckDiag
CheckDiag --> TestRedesign: confirmed test-design error
CheckDiag --> Dispatched: rejected production issue
Implementing --> CheckDiag: Implementation Incomplete
CheckDiag --> TestRedesign: verdict test_design
CheckDiag --> Dispatched: verdict production_logic
CheckDiag --> FailureHandler: verdict split_needed
TestRedesign --> Dispatched: test fixes fresh entry validation
Dispatched --> PlanRevisit: 2 escalation cycles exhausted
GreenReached --> [*]
PlanRevisit --> [*]: back to Phase 3
FailureHandler --> [*]: discard worktree, re-run
PlanRevisit --> [*]: back to Phase 3 if rework budget intact
```
### 3.3 Issue lifecycle
How TODO entries move through statuses, with sub-issue filing during a run.
How TODO entries move through statuses. In the one-task-per-run model (ADR-21), a single workflow invocation may file multiple sub-issues mid-run, and the parent's final status is AC-driven, not run-driven.
```mermaid
stateDiagram-v2
[*] --> Todo: issue file created
Todo --> InProgress: Phase 2 workflow starts
InProgress --> Done: Phase 9 run completes
InProgress --> Done: Phase 9 - all parent AC checked
InProgress --> InProgress2: Phase 9 - some parent AC remain
InProgress --> Todo: workflow fails, failure handler adds comment
note right of InProgress2
Parent stays In Progress when sub-issues
cover the unmet AC. User runs sub-issues
in subsequent /workflow invocations.
end note
note right of InProgress
New sub-issues may be filed during Phase 9
with parent ISSUE_ID, status Todo
and label bug, followup, or tech-debt
Sub-issues filed during a run carry:
- parent: ISSUE_ID, status: Todo
- label: bug, followup, tech-debt, or split-from-run
- depends-on: [...] for cross-sub-issue ordering
- rich seed body for split-time filings (ADR-21)
end note
Done --> [*]
InProgress2 --> [*]
```
---
## 4. Routing Matrix
Every observed `(phase, signal) → action`. Empty cells are gaps. Walking this table is the cheap way to spot routing issues like the recent Phase 7 mid-implementation escalation.
Every observed `(phase, signal) → action`. Empty cells are gaps. Walking this table is the cheap way to spot routing issues.
| Phase | Signal source | Signal | Action |
|---|---|---|---|
| 1 | Sanity checks | Bare repo / detached HEAD / missing `TODO/<ID>.md` / branch == base | Stop with error |
| 1 | Sanity checks | Working tree dirty (`git status --porcelain` non-empty) | Stop with error (ADR-20) |
| 1 | Sanity checks | `depends-on:` issue not in `Done` status | Stop with error (ADR-21) |
| 2 | `@pm` | Issue not found | Stop with error |
| 2 | `@pm` | Status is `Todo` | Flip to `In Progress`; propagate to README.md / parent's Sub-issues |
| 3 | Orchestrator | Plan drafted | Apply Dispatch Hygiene; write `plan.md`; verify `test -f` |
| 4 | `@check` + `@simplify` | Both ACCEPTABLE | Proceed to Phase 5 |
| 4 | Either reviewer | NEEDS WORK | Edit `plan.md` in place; re-dispatch (max 3 cycles) |
| 4 | `@check` | ACCEPTABLE (regardless of `@simplify`) | Proceed to Phase 5 |
| 4 | `@check` | NEEDS WORK | Edit `plan.md` in place; re-dispatch (max 3 cycles) |
| 4 | `@check` | BLOCK | Edit `plan.md` addressing the finding; re-dispatch |
| 4 | Reviewers | Same finding twice | Convergence detected; stop loop early |
| 4 | `@simplify` | Any verdict (ADR-15) | Advisory only — record in summary; never blocks Phase 4 progression |
| 4 | Reviewers | Same `@check` finding twice | Convergence detected; stop loop early |
| 4 | Reviewers | Unresolved after 3 cycles | Document blockers in summary; proceed |
| 5 | Orchestrator | Tasks drafted | Apply Dispatch Hygiene; write each `task-N.md`; verify `test -f` for every N |
| 5.5 | `@check` | ACCEPTABLE | Proceed to Phase 6 |
| 5.5 | `@check` | ACCEPTABLE, N=1 | Skip P5.5 entirely (ADR-21); proceed to Phase 6 — degenerate split |
| 5.5 | `@check` | ACCEPTABLE, N>1 | File tasks 2…N as sub-issues with rich seed bodies (ADR-21) via `@pm`; proceed to Phase 6 with task-1 only |
| 5.5 | `@check` | NEEDS WORK | Edit `task-N.md` in place; re-dispatch (max 2 cycles) |
| 5.5 | `@check` | BLOCK | Plan doesn't decompose cleanly; back to Phase 4 |
| 5.5 | `@check` | BLOCK plan-level | Edit `plan.md` addressing the BLOCK finding; decrement `plan_rework_remaining`; re-enter Phase 4 (ADR-17) |
| 5.5 | Run-level | `plan_rework_remaining` exhausted | Abort to Failure Handler (ADR-13) |
| 6 | `@test` | TESTS_READY + `escalate_to_check: false` | Proceed to Phase 7 |
| 6 | `@test` | TESTS_READY + `escalate_to_check: true` | `@check` light review → `@test` fixes → forward |
| 6 | `@test` | NOT_TESTABLE (general) | `@check` sign-off; task goes to `@make` without tests |
| 6 | `@test` | NOT_TESTABLE (general) | `@check` sign-off; task goes to `@make` without tests; record in NOT_TESTABLE manifest for Phase 8 (ADR-18) |
| 6 | `@test` | NOT_TESTABLE: Missing testability seam | `@make` adds the seam; re-run `@test` |
| 6 | `@test` | BLOCKED | Investigate; may need spec or plan revision |
| 6 | `@test` (stub-first) | All tests pass with zero `todo!()` panics | Reject — structural-only tests; route back to `@test` to rewrite |
| 7 | `@make` | Implementation Complete | Proceed to Phase 8 |
| 7 | `@make` | Implementation Incomplete + entry-validation flag | `@check` (test diagnosis) → `@test` (fixes) → fresh `@make` |
| 7 | `@make` | Implementation Incomplete + `escalate: test_design` | Same path; max 2 escalation cycles |
| 7 | `@make` | Implementation Incomplete (no flag) | Re-dispatch with `@check` notes once; if 2nd attempt fails, treat as `escalate: test_design` |
| 7 | Escalation loop | 2 cycles exhausted | Back to Phase 3 (plan revisit) |
| 8 | `@check` + `@simplify` | ACCEPTABLE | Proceed to Phase 9 |
| 8 | `@check` | BLOCK / behavioral / production-code finding | New `@make` task spec from finding; dispatch (max 3 cycles) |
| 8 | `@check` | BLOCK / test-design / test-quality finding | `@check``@test``@make` re-verify |
| 8 | `@check` | BLOCK / plan-level finding | Back to Phase 3 with the finding |
| 8 | `@simplify` | Advisory | Record in summary's "Advisory notes (not filed)" |
| 7 | `@make` | Implementation Incomplete (any flag or no flag) | Route through `@check` test-diagnosis-first (ADR-19); orchestrator follows `@check`'s verdict |
| 7 | `@check` diagnosis | `test_design` | Dispatch `@test` to redesign tests; fresh `@make` re-attempt |
| 7 | `@check` diagnosis | `production_logic` | Re-dispatch `@make` with `@check`'s production-side notes |
| 7 | `@check` diagnosis | `split_needed` | Abort to Failure Handler (ADR-21 / Q19a). In the one-task-per-run model task-1 is the only task; no AC have been satisfied; recovery is "discard worktree, re-plan from scratch." `@pm` adds a comment recording the diagnosis. |
| 7 | Escalation loop | 2 cycles exhausted | Decrement `plan_rework_remaining`; back to Phase 3 (plan revisit) |
| 8 | `@check` | ACCEPTABLE | Proceed to Phase 9 |
| 8 | `@check` | BLOCK / behavioral / production-code finding | Write `task-fix-<N>.md` to `$RUN_DIR/` (ADR-16); dispatch `@make` against it (max 3 cycles) |
| 8 | `@check` | BLOCK / test-design / test-quality finding | Route through `@check` diagnosis → `@test``@make` re-verify |
| 8 | `@check` | BLOCK / plan-level finding | Decrement `plan_rework_remaining`; back to Phase 3 with the finding |
| 8 | `@simplify` | Any verdict (ADR-15) | Advisory only — record in summary; never blocks Phase 8 progression |
| 8 | Reviewers | Strictly cosmetic finding (typo, missing newline, AST-preserving) | Orchestrator fixes directly; re-review |
| 8 | Reviewers | NOT_TESTABLE manifest task flagged as questionable | Apply same routing as a normal `@check` finding for that task |
| 8 | Review loop | Same finding twice | Convergence; stop loop |
| 8 | Review loop | 3 cycles exhausted | Document blockers; proceed |
| 9 | Orchestrator | Pre-existing bug, out of scope | File sub-issue via `@pm` (label: `bug`) |
| 9 | Orchestrator | Unresolved review-loop blocker | File sub-issue via `@pm` (label: `followup`) |
| 9 | `@test` (Phase 6) | NOT_TESTABLE future-seam note | File sub-issue via `@pm` (label: `tech-debt`) |
| 9 | Orchestrator | `@simplify` advisory not acted on | Record in summary; do NOT file (records, not work) |
| 9 | Orchestrator | All commits done | Set issue status to `Done`; sync README/parent; commit `chore(todo): …` |
| 9 | Orchestrator | All parent AC checked off | Set issue status to `Done`; sync README/parent; commit `chore(todo): …` |
| 9 | Orchestrator | Some parent AC remain unchecked AND sub-issues exist | Leave issue at `In Progress`; commit `chore(todo): …` |
| Run-level | Failure Handler | Workflow is non-resumable (ADR-14) | Document the cleanup procedure: `git worktree remove`, delete branch, re-create from base, retry |
---
@ -215,6 +237,8 @@ ADR-flavoured. New decisions append at the end. If a decision is later reversed
**Alternatives:** accept compile-error RED; let `@make` write tests + bodies in one pass; allow `@test` to add stubs to production source.
**Consequences:** two atomic commits per affected task (`feat: scaffold X with todo!() stubs`, then `feat: implement X`). Stub-pass scope is tight: bodies are exactly `todo!()`, signatures must match the planned final API. Phase 6 also adds a mandatory panic-coverage check after `@test`: every test must panic on `todo!()` to prove it actually exercises the stubbed symbols (catches structural-only tests).
**On reviewer bypass:** the stub-pass commit is not sent through Phase 5.5 or Phase 8 review. The bypass is intentional and safe because (a) stubs are mechanical — signatures plus `todo!()`, no logic; (b) the body-pass commit *is* reviewed and the body-pass diff strictly subsumes the stub-pass diff (the same signatures, now with bodies); (c) Phase 6's mandatory panic-coverage check is what actually validates that the stubs are exercised. Reviewing the stub-pass would duplicate work that the body-pass review catches anyway.
### ADR-8 (2026-05-08) — `@test` may write inside `#[cfg(test)] mod` blocks
**Context:** Rust unit tests live colocated in production source files inside `#[cfg(test)] mod tests { … }` blocks — the canonical idiom, not an edge case. Original `@test` File Constraint forbade `src/` writes entirely, which forced `@make` to write both production code and tests in a single dispatch. This lost the RED→GREEN separation that TDD relies on.
@ -248,7 +272,85 @@ ADR-flavoured. New decisions append at the end. If a decision is later reversed
**Context:** Phase 7's escalation rule was gated on `@make` flagging concerns *during entry validation* (the RED check before implementing). When `@make` got past entry validation, started implementing, and then ground for 2-3 attempts because the test demanded impossible production code, the orchestrator had no documented route — it would re-dispatch `@make` with marginal context tweaks instead of recognizing the diagnosis as test-architecture failure.
**Decision:** split Phase 7's escalation into entry-validation and mid-implementation paths. `@make` reports `escalate: test_design` when its iteration limit is reached and the test seems to demand impossible / unreasonable code. Both paths route through `@check` (test diagnosis) → `@test` (redesign) → fresh `@make` dispatch. Max 2 escalation cycles before reverting to Phase 3 plan revisit.
**Alternatives:** status quo; let `@make` modify test files itself.
**Consequences:** faster recovery from test-design errors. Bounded loop prevents thrashing. `@make.md` Iteration Limits section gains a new red-flag class.
**Consequences:** faster recovery from test-design errors. Bounded loop prevents thrashing. `@make.md` Iteration Limits section gains a new red-flag class. *Superseded in part by ADR-19 (unified diagnosis path).*
### ADR-13 (2026-05-08) — Run-level `plan_rework_remaining` counter
**Context:** several routes return control to an upstream phase when downstream signals reveal the upstream artifact was wrong: P5.5-BLOCK→P4 (split doesn't decompose), P7-escalation-exhaustion→P3 (test/code thrash exceeded its bound), P8-plan-level→P3 (final review exposes a plan defect). Each upstream phase has its own per-loop cycle cap (P4 max 3, P5.5 max 2, etc.), but those caps reset on every re-entry — so a run could in principle thrash P3↔P4↔P5.5↔P3 indefinitely without violating any local rule.
**Decision:** introduce one run-level counter, `plan_rework_remaining`, default value `1`. It decrements on every transition where downstream signal forces upstream rework: `P5.5 BLOCK → P4`, `P7 escalation exhausted → P3`, `P8 plan-level finding → P3`. When the counter is `0` and another such transition fires, abort to the Failure Handler instead of re-entering. Per-phase cycle caps are unchanged.
**Alternatives:** (a) a global `max_subagent_dispatches` budget — over-engineered for the specific failure mode; (b) document the resets as intentional and rely on convergence detection — leaves the bug present.
**Consequences:** at most two plan attempts per run (the initial plan plus one revision). Failure Handler invocation distinct in cause from earlier-phase aborts: the cleanup is the same (per ADR-14) but the summary explains *which* downstream signal exhausted the budget.
### ADR-14 (2026-05-08) — Workflow is non-resumable
**Context:** Phase 9 has multiple sub-steps (code commit → `@pm` status update → file follow-ups → TODO commit → summary). Crashing between any two sub-steps leaves the worktree in a state that earlier docs called "partial." The original Failure Handler did not flip status back, did not recognize partial-Phase-9 separately from earlier-phase crashes, and re-running `/workflow` after a crash could append new comments and re-do work indefinitely.
**Decision:** declare the workflow non-resumable. On any failure (Failure Handler invocation), the recovery procedure is: `git worktree remove` the failed worktree, delete the feature branch, re-create the worktree from `$BASE_BRANCH`, then re-run `/workflow`. Document this explicitly in the Failure Handler section. The throwaway-worktree model means there is no in-place resume state to corrupt — the user discards the worktree and starts fresh.
**Alternatives:** (a) smarter Failure Handler that cleans up partial state idempotently; (b) transactional Phase 9 via a state file; (c) idempotent sub-steps so re-runs auto-resume.
**Consequences:** simplest possible recovery model. Phase 9 sub-step ordering doesn't need to be defended against partial failures — partial state is acceptable because the recovery is "discard everything and re-run." User-initiated cancellation (Ctrl-C) follows the same procedure.
### ADR-15 (2026-05-08) — `@simplify` is advisory at every gate
**Context:** the Phase 4 routing matrix used to read "Either reviewer NEEDS WORK → re-dispatch the loop," giving `@simplify` veto power equivalent to `@check`'s. Phase 8's matrix said `@simplify` was advisory only ("Record in summary's 'Advisory notes (not filed)'"). Same agent, two different powers.
**Decision:** `@simplify` is advisory at every gate. Its findings are recorded in the run summary; they never force a re-dispatch loop. `@check` is the only reviewer with veto authority (NEEDS WORK / BLOCK).
**Alternatives:** (a) make `@simplify` blocking everywhere — too heavy for a heuristic agent prone to false positives; (b) keep the asymmetry and document a principle — fragile.
**Consequences:** uniform model — `@check` enforces correctness, `@simplify` advises on shape. Phase 4 review loops only run on `@check` findings; `@simplify` complexity flags get logged in the summary like at Phase 8. The user can manually promote a `@simplify` finding if it matters.
### ADR-16 (2026-05-08) — Phase 8 fix specs go to disk
**Context:** when Phase 8 review surfaced a behavioral or production-code finding, the orchestrator would "build a new `@make` task spec from the finding" and dispatch it inline. That violates ADR-6's invariant (run artifacts on disk, no inline paraphrase) — and exactly when it matters most, because Phase 8 has up to 3 review cycles and the same finding can re-dispatch.
**Decision:** Phase 8 fix dispatches write a new artifact `$RUN_DIR/task-fix-<N>.md` (1-indexed within the Phase 8 cycle) before dispatching `@make`. Same Dispatch Hygiene rules as Phase 5 task specs, same `test -f` verification. Cosmetic findings (orchestrator fixes directly per workflow.md) skip the file — only `@make`-dispatched findings get one.
**Alternatives:** (a) inline in the dispatch prompt with an ADR-6 footnote — erodes the invariant for the highest-risk dispatch class; (b) edit the original `task-N.md` — muddies the audit trail of an already-met spec.
**Consequences:** ADR-6's invariant holds end-to-end. Phase 8 cycles re-dispatch against the same on-disk file (mid-loop edits in place), eliminating paraphrase drift across review cycles.
### ADR-17 (2026-05-08) — Phase 5.5 BLOCK protocol
**Context:** Phase 5.5 BLOCK ("plan does not decompose cleanly") used to route "back to Phase 4 with `@check`'s finding," but `@check` at 5.5 evaluated the *split*, not the plan; its finding may not map cleanly to a plan edit. Re-entering P4 with the same `plan.md` and a finding tagged on the prompt asks the wrong question.
**Decision:** on P5.5 BLOCK, the orchestrator translates the split-level finding into a concrete `plan.md` edit (e.g. "the plan conflates structural and runtime work; split into two milestones"), saves the edit, decrements `plan_rework_remaining` (per ADR-13), and re-dispatches Phase 4 reviewers against the *revised* plan. P4 reviewers see a genuinely different plan.
**Alternatives:** (a) re-dispatch P4 unchanged with finding attached — burns reviewers on a known-broken plan; (b) treat P5.5 BLOCK as terminal — too strict, we have the rework budget for one revisit.
**Consequences:** P5.5 BLOCK is an effective signal. The orchestrator's plan-edit step is mandatory; skipping it is a routing error. Run-level rework budget bounds the loop.
### ADR-18 (2026-05-08) — Phase 8 NOT_TESTABLE manifest
**Context:** Phase 6 routes NOT_TESTABLE tasks through `@check` for sign-off, then dispatches `@make` without tests. Phase 8 reviews the diff but has no signal that "this change has no test because `@test` claimed it untestable." If `@check` at P6 was wrong, untested code ships.
**Decision:** Phase 8's dispatch prompt includes a "Tasks completed without tests (NOT_TESTABLE)" section listing each task ID, the `@test` justification, and the `@check` sign-off rationale. Reviewers explicitly evaluate "does the justification still hold given the final diff?" If a reviewer pushes back, routing follows the normal Phase 8 finding rules.
**Alternatives:** (a) double-up `@check` + `@simplify` at P6 NOT_TESTABLE granting — doubles dispatch cost without targeting the actual gap; (b) restrict NOT_TESTABLE to a fixed taxonomy — won't generalize across languages; (c) reject NOT_TESTABLE entirely — ignores legitimate cases.
**Consequences:** pure plumbing change. P8 reviewers gain visibility into the bypass without new agents or new authority.
### ADR-19 (2026-05-08) — Unified Implementation Incomplete diagnosis path
**Context:** ADR-12 introduced three paths for `@make` reporting Implementation Incomplete: entry-validation flag, mid-impl `escalate: test_design` flag, no flag (re-dispatch with `@check` notes once, escalate after second failure). Three paths converging on the same destination (`@check` test-diagnosis → `@test` redesign or `@make` re-dispatch) added matrix surface and obscured the routing.
**Decision:** every Implementation Incomplete from `@make` routes through `@check` test-diagnosis-first. `@check` returns one of three verdicts — `test_design` (route to `@test` redesign), `production_logic` (re-dispatch `@make` with `@check`'s notes), or `split_needed` (per ADR-21). `@make`'s self-diagnosis flag becomes a *hint* for `@check`, not a control-flow input for the orchestrator.
**Alternatives:** (a) keep three paths, tighten what `@check` reviews in each — preserves the surface area; (b) push burden to `@make` — orchestrator still needs to gate via `@check`.
**Consequences:** routing logic shrinks. Matrix has fewer rows. ADR-12's split-into-two-paths is partially superseded — the *escalation diagnosis* unified, the iteration limit (max 2 cycles) preserved.
### ADR-20 (2026-05-08) — Phase 1 working-tree cleanliness check
**Context:** Phase 1 verified non-bare repo, branch identity, base branch, issue file presence — but not that the working tree was clean. Stale uncommitted edits would be swept into the Phase 9 commit (workflow.md stages "code changes only" but doesn't distinguish *which* code) or a `wip:` failure commit.
**Decision:** Phase 1 runs `git status --porcelain`; if non-empty, stop with: "Working tree must be clean. Commit or stash uncommitted changes before running the workflow."
**Alternatives:** (a) capture initial dirty state, stage only files modified by the workflow at Phase 9 — error-prone baseline tracking; (b) document the requirement, don't enforce — `// TODO: don't forget` in design-doc form.
**Consequences:** matches the ADR-14 throwaway-worktree model. One additional sanity-check line. User's "but I have manual edits I want the workflow to build on" case is solved by them committing those edits first, which is what they should do anyway.
### ADR-21 (2026-05-08) — One-task-per-run model
**Context:** the workflow originally executed N tasks per run, sequentially through Phase 7. That introduced cross-task regression risk (task 4 breaks task 1's tests, found N tasks late), big-diff Phase 8 reviews (multi-day branches accumulate thousands of diff lines that hit reviewer context limits silently), and the mid-flight task-split problem (when `@make` discovers task N is over-scoped, no documented route to re-split). It also coupled the workflow's success to "all N tasks complete," when in practice an issue worth one good commit shouldn't depend on unrelated downstream work succeeding.
**Decision:** every workflow run executes **exactly one task** through Phases 68. Phase 5 still splits the plan into N tasks via the Split Heuristic. If N=1, proceed normally. If N>1, the orchestrator dispatches `@pm` to file tasks 2…M as TODO sub-issues *before* Phase 6 starts, and only task 1 runs through Phases 68. If task 1 itself reports `split_needed` mid-Phase-7, abort to the Failure Handler (Q19a: in the one-task-per-run model task-1 is the only task in the run, so no feature AC have been satisfied; recovery is the standard non-resumable cleanup from ADR-14).
The model carries five sub-decisions:
1. **Sub-issue body schema for split-time filings:** rich seed body that lets a fresh `/workflow` invocation re-plan and implement without seeing siblings or the original `plan.md`. Includes task description + AC + Code Context + Integration Contracts (declared in frontmatter as `depends-on: [...]`) + relevant slice of `plan.md` + Test Design section if present + a "Discovered during run on `$BRANCH_NAME` for parent issue `$ISSUE_ID`" attribution paragraph.
2. **Phase 5.5 review questions strengthen to six**: coverage, no overlap, single-purpose, integration contracts (with stronger bar — must be self-contained for cross-session use), testable AC, and *self-containment* (is each task spec runnable as a standalone `/workflow` invocation?). Self-containment is the new load-bearing question because each filed sub-issue runs in isolation.
3. **Split Heuristic recalibration**: keep the existing mechanical thresholds (>2 concerns, >50 lines across >2 files, mixes structural + runtime, etc.) but add a "default to no split" tiebreaker — when in doubt, do not split, because splitting now fans out across user sessions with full orchestration overhead per sub-issue.
4. **Parent issue status is AC-driven**: Phase 9's existing AC checkbox logic (workflow.md flips ticked AC) determines status. If all parent AC are checked → `Done`; if some remain unchecked → stays `In Progress` with the filed sub-issues covering the remaining work.
5. **`depends-on:` frontmatter and Phase 1 enforcement**: `@pm` schema gains a `depends-on: [<ID>, ...]` list. Phase 1 sanity check refuses to start if any listed dependency is not `Done`. Hard block — soft-warn means the user (who has walked away) doesn't see the warning until later.
**Auto-resolved problems:**
- Mid-flight task split (formerly Q2 in Open Questions, ADR-12's adjacent gap): collapses into "file as sub-issue and exit."
- Big-diff Phase 8 reviews: one task = bounded diff (~50 lines per Split Heuristic). No big-diff problem possible.
- Cross-task regression within a run: no cross-task regressions possible inside a single-task run; subsequent sub-issue runs detect them at their own Phase 7 entry validation (which runs the project's test suite).
- Skip-P5.5-when-N=1 optimization: trivially satisfied — N=1 from Phase 6 onward in every run.
**Alternatives:** (a) keep N-task runs, add mid-flight re-splitting via P7→P5 re-entry — doesn't solve big-diff or cross-task regression; (b) keep N-task runs, accept the gaps — leaves three known-bad routes; (c) always one task per issue (skip Phase 5 entirely) — loses the planning-phase split heuristic that's catching legitimate over-scoping at design time.
**Consequences:** runs become shorter and more focused. Each commit/PR carries a bounded scope. Sub-issue fan-out becomes the primary scaling mechanism for multi-step work. `TODO/` sees more sub-issue files; `@pm`'s split-time filing path becomes a hot code path. Concurrent runs in different worktrees on the same repo become trivially safe because each worktree has its own `TODO/` checkout (file conflicts surface as standard git merge conflicts at integration time, not as mid-run race conditions).
---
@ -260,20 +362,28 @@ When a question gets answered, move it to the [Design decisions log](#5-design-d
Currently Phase 5.5 reviews the **split** (coverage, overlap, single-purpose, integration contracts, testable AC). It does *not* explicitly evaluate whether the test approach implied by each task spec is sound. That would partially overlap with Phase 4 (which has a plan-level Test Design section the reviewers evaluate). If a test-design error escapes Phase 4 and is encoded in a task spec, it surfaces at Phase 7 via the mid-impl escalation (ADR-12) — but earlier detection might be cheaper. Open: should Phase 5.5 add "test approach for each task is sound" as a sixth review question, or is that scope creep into Phase 4 territory?
### Q2: How does the orchestrator handle "split heuristic violated only after attempting a task"?
### ~~Q2~~: Mid-flight task split — *closed by ADR-21*
Phase 5's Split Heuristic catches obvious over-/under-split cases at planning time. But sometimes a task that *looked* single-purpose during planning turns out to mix structural and runtime work only when `@make` starts implementing it. There's no documented mid-Phase-7 routing for "this task needs to be split now." Currently `@make` would either thrash (mid-impl escalation, ADR-12), or report the spec is ambiguous (Insufficient Context Protocol in `make.md`). Open: should there be a "split mid-flight" route that takes the task back to Phase 5 for re-splitting?
The one-task-per-run model collapses this question. When `@make` discovers task-1 is over-scoped, the unified diagnosis path (ADR-19) returns `split_needed` from `@check`, and the orchestrator aborts to the Failure Handler (no P5 re-entry, no sub-issue filing — the recovery is "discard worktree, re-plan from scratch"). Tasks 2…M are already filed as sub-issues at Phase 5.5 acceptance, so there's no "remaining tasks" cleanup to think about.
### Q3: Phase 9 has no rollback for partial commits if it fails between sub-steps
### ~~Q3~~: Phase 9 partial-commit rollback — *closed by ADR-14*
Phase 9's order is: code commit → TODO update → file follow-ups → commit TODO changes → write summary. If the workflow crashes between code commit and TODO commit, the worktree has the code change but the issue file still says `In Progress`. The Failure Handler covers earlier-phase crashes but Phase-9-internal partial states aren't explicitly addressed. Open: should the Failure Handler distinguish "Phase 9 partial" and resume from the right sub-step on retry, or is leaving manual cleanup to the user good enough?
The workflow is non-resumable. Phase 9 partial states are addressed by the throwaway-worktree recovery procedure: discard the worktree, delete the branch, re-create from base, re-run. Phase 9 sub-step ordering doesn't need to defend against partial failures because the recovery is "discard everything and re-run."
### Q4: `@simplify` not involved at Phase 5.5 — is that the right call?
Phase 5.5 only dispatches `@check`. Rationale (ADR-9) is that split review is structural, not complexity. But `@simplify`'s lens — "what if we deleted this?" — could legitimately catch unnecessary tasks (e.g. a third task that adds an abstraction nothing else needs). Open: is the cost of one more dispatch worth the catch?
Phase 5.5 only dispatches `@check`. Rationale (ADR-9) is that split review is structural, not complexity. But `@simplify`'s lens — "what if we deleted this?" — could legitimately catch unnecessary tasks (e.g. a third task that adds an abstraction nothing else needs). With ADR-21's one-task-per-run pivot, this question gains a different angle: a `@simplify` flag on a sibling sub-issue at Phase 5.5 could prevent filing a wasteful sub-issue, which is more valuable than catching the same redundancy at Phase 8 of a future run. Open: is the cost of one more dispatch worth the catch, especially now that Phase 5.5 is the gate for sub-issue fan-out?
### Q5: Test-design loop bound vs plan-revisit threshold
ADR-12 sets max 2 cycles for the Phase 7 test-design escalation before reverting to Phase 3 plan revisit. The plan-review and final-review loops have max 3. Why the asymmetry? The test-design loop is more expensive per cycle (`@check` + `@test` + `@make` re-implement vs. just reviewers + plan edit), so 2 may be right. But the choice was made by feel, not measured. Open: is 2 the right number, or should it match Phase 4 / Phase 8 at 3?
### Q6: Sub-issue ordering in the parent's `## Sub-issues` list
ADR-21's split-time filing creates new sub-issues with `depends-on:` declarations, but the parent's `## Sub-issues` list (rendered by `@pm`) is currently flat. When dependencies form a chain (sub-issue 2 depends on 1), the user has to read the chain from each sub-issue's frontmatter. Open: should `@pm` render the parent's sub-issue list in dependency order, with a visible indicator (e.g. indentation or `↳`) for dependent items? Cosmetic but would speed up "what to run next" decisions.
### Q7: Concurrent-worktree edge case — sub-issue ID collisions
Two parallel runs in different worktrees, each filing sub-issues, can both pick the same next ID (e.g. both pick `GAL-42` because both saw `GAL-41` as the highest at start). On merge, git surfaces this as a conflict over `TODO/GAL-42.md` content (two different files staked on the same name). Recoverable but annoying. Open: should `@pm`'s ID generation use a strategy that's safer under concurrent runs (e.g. timestamp suffix, branch-prefix, content-addressable), or accept the merge-conflict-on-collision cost given the one-user assumption?
---