nixcfg/config/opencode/workflow-design.md
Harald Hoyer af6481a5a7 feat(opencode): one-task-per-run model + 9 routing fixes (ADRs 13-21)
Captures the design grilling outcome. Adds ADRs 13-21 covering:
- run-level plan_rework_remaining counter to bound P3<->P5.5/P7/P8 thrash
- non-resumable workflow with throwaway-worktree recovery procedure
- @simplify advisory at every gate (not just Phase 8)
- Phase 8 fix specs go to disk as task-fix-N.md (preserves ADR-6)
- Phase 5.5 BLOCK protocol: orchestrator edits plan, decrements counter, re-enters P4
- Phase 8 NOT_TESTABLE manifest in reviewer prompt
- unified Implementation Incomplete diagnosis (test_design / production_logic / split_needed)
- Phase 1 working-tree cleanliness + depends-on enforcement
- one-task-per-run pivot: Phase 5 still splits N tasks, only task-1 runs;
  tasks 2..N filed as sub-issues with rich seed bodies; split_needed at P7
  aborts to Failure Handler (one-task-per-run = no salvageable prior work)

Auto-resolves big-diff Phase 8 reviews, cross-task regression-within-run, and
mid-flight task-split routing. Rewrites routing matrix and three Mermaid
diagrams; updates @pm (depends-on frontmatter, split-time filing), @check
(third diagnosis verdict), @make (escalate: split_needed flag).
2026-05-08 13:02:54 +02:00

389 lines
40 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Workflow Design
## 1. Purpose
This document is the **design rationale and decision log** for the multi-agent workflow. The operational rules — what the orchestrator does, in what order, with what guardrails — live in [`commands/workflow.md`](commands/workflow.md) and the agent files under [`agents/`](agents/). This document is where we discuss changes *before* they land in those files.
**Intended flow:**
1. A new idea, gap, or failure mode comes up (often from a real run).
2. Discuss in this document — capture context, options, trade-offs.
3. When a decision is reached, update `commands/workflow.md` and/or the relevant agent file.
4. Record the decision in the [Design decisions log](#5-design-decisions-log) below.
The operational files stay terse and procedural. The "why" lives here.
---
## 2. Cast & Responsibilities
One orchestrator, five subagents. The orchestrator runs in `agent: build` mode; the subagents are defined as separate agent files under `config/opencode/agents/`.
| Actor | File | Role | Boundary |
|---|---|---|---|
| **Orchestrator** | `commands/workflow.md` | Plans, dispatches, merges findings, edits artifacts under `.workflow/`, commits. | **Does not** write production code, write tests, or play any subagent's role. |
| `@check` | `agents/check.md` | Reviews plans / task splits / code for risks, correctness, testability. | Read-only — no write / edit / bash. |
| `@simplify` | `agents/simplify.md` | Reviews for unnecessary complexity. Advisory only. | Read-only. |
| `@test` | `agents/test.md` | Writes failing tests for a task spec, verifies RED. | May modify test files / `#[cfg(test)] mod` blocks. Sandboxed bash. |
| `@make` | `agents/make.md` | Implements a single task spec. Verifies acceptance criteria. | May modify files listed in the task spec. Sandboxed bash; no `git` / network / `cd`. |
| `@pm` | `agents/pm.md` | Reads / updates `TODO/` issue files. | May modify only `TODO/` contents. No bash. |
**Permission boundaries are enforced per agent.** The orchestrator (in `agent: build` mode) has full edit/bash capabilities, which is precisely why it must not act as the subagents — the agent files are where the limits live.
---
## 3. Flow Diagrams
### 3.1 Phase pipeline
High-level happy path with the major escalation arms. The workflow runs **one task per invocation** (ADR-21): Phase 5 produces N task files; if N>1, tasks 2…N are filed as sub-issues and only task 1 runs through Phases 68.
```mermaid
flowchart TD
P1["Phase 1: Sanity Check<br/>incl clean tree + depends-on"]
P2["Phase 2: Issue Context<br/>pm reads TODO/ID.md"]
P3["Phase 3: Plan<br/>write plan.md"]
P4{"Phase 4: Review Plan<br/>check blocking, simplify advisory<br/>max 3 cycles"}
P5["Phase 5: Split into Tasks<br/>write task-N.md"]
P55{"Phase 5.5: Review Split<br/>check, 6 questions<br/>max 2 cycles"}
P5F["File tasks 2..N as sub-issues<br/>only when N more than 1"]
P6["Phase 6: Write Tests<br/>test, stub-first make"]
P7["Phase 7: Implement<br/>make, single task"]
P7E{"Implementation Incomplete<br/>check diagnoses<br/>max 2 cycles"}
P7F["split_needed: Failure Handler<br/>(discard worktree, re-run)"]
P8{"Phase 8: Final Review<br/>check blocking, simplify advisory<br/>max 3 cycles"}
P9["Phase 9: Commit + TODO + Follow-ups + Summary<br/>parent status AC-driven"]
P1 --> P2 --> P3 --> P4
P4 -->|ACCEPTABLE| P5 --> P55
P4 -->|NEEDS WORK or BLOCK| P3
P55 -->|ACCEPTABLE| P5F --> P6 --> P7
P55 -->|NEEDS WORK| P5
P55 -->|BLOCK plan-level| P3
P7 --> P8
P7 -.->|Implementation Incomplete| P7E
P7E -->|test_design or production_logic| P7
P7E -.->|split_needed| P7F
P7E -.->|2 cycles exhausted| P3
P7F --> ABORT([Failure Handler])
P8 -->|ACCEPTABLE| P9
P8 -->|production-code finding| P7
P8 -->|test-design finding| P7E
P8 -->|plan-level finding| P3
P9 --> END([Done])
```
**Run-level cap:** `plan_rework_remaining` (default 1, ADR-13) decrements on every P5.5-BLOCK→P4, P7-escalation-exhaustion→P3, and P8-plan-level→P3 transition. Exhausted counter aborts to the Failure Handler.
### 3.2 Phase 7 escalation loop
The pattern when `@make` cannot reach GREEN. Unified diagnosis path (ADR-19): every Implementation Incomplete routes through `@check` test-diagnosis-first; `@check` returns one of three verdicts.
```mermaid
stateDiagram-v2
[*] --> Dispatched: orchestrator dispatches make
Dispatched --> EntryCheck: run tests verify RED
EntryCheck --> Implementing: failure code matches handoff
EntryCheck --> CheckDiag: Implementation Incomplete
Implementing --> GreenReached: tests pass within 2-3 attempts
Implementing --> CheckDiag: Implementation Incomplete
CheckDiag --> TestRedesign: verdict test_design
CheckDiag --> Dispatched: verdict production_logic
CheckDiag --> FailureHandler: verdict split_needed
TestRedesign --> Dispatched: test fixes fresh entry validation
Dispatched --> PlanRevisit: 2 escalation cycles exhausted
GreenReached --> [*]
FailureHandler --> [*]: discard worktree, re-run
PlanRevisit --> [*]: back to Phase 3 if rework budget intact
```
### 3.3 Issue lifecycle
How TODO entries move through statuses. In the one-task-per-run model (ADR-21), a single workflow invocation may file multiple sub-issues mid-run, and the parent's final status is AC-driven, not run-driven.
```mermaid
stateDiagram-v2
[*] --> Todo: issue file created
Todo --> InProgress: Phase 2 workflow starts
InProgress --> Done: Phase 9 - all parent AC checked
InProgress --> InProgress2: Phase 9 - some parent AC remain
InProgress --> Todo: workflow fails, failure handler adds comment
note right of InProgress2
Parent stays In Progress when sub-issues
cover the unmet AC. User runs sub-issues
in subsequent /workflow invocations.
end note
note right of InProgress
Sub-issues filed during a run carry:
- parent: ISSUE_ID, status: Todo
- label: bug, followup, tech-debt, or split-from-run
- depends-on: [...] for cross-sub-issue ordering
- rich seed body for split-time filings (ADR-21)
end note
Done --> [*]
InProgress2 --> [*]
```
---
## 4. Routing Matrix
Every observed `(phase, signal) → action`. Empty cells are gaps. Walking this table is the cheap way to spot routing issues.
| Phase | Signal source | Signal | Action |
|---|---|---|---|
| 1 | Sanity checks | Bare repo / detached HEAD / missing `TODO/<ID>.md` / branch == base | Stop with error |
| 1 | Sanity checks | Working tree dirty (`git status --porcelain` non-empty) | Stop with error (ADR-20) |
| 1 | Sanity checks | `depends-on:` issue not in `Done` status | Stop with error (ADR-21) |
| 2 | `@pm` | Issue not found | Stop with error |
| 2 | `@pm` | Status is `Todo` | Flip to `In Progress`; propagate to README.md / parent's Sub-issues |
| 3 | Orchestrator | Plan drafted | Apply Dispatch Hygiene; write `plan.md`; verify `test -f` |
| 4 | `@check` | ACCEPTABLE (regardless of `@simplify`) | Proceed to Phase 5 |
| 4 | `@check` | NEEDS WORK | Edit `plan.md` in place; re-dispatch (max 3 cycles) |
| 4 | `@check` | BLOCK | Edit `plan.md` addressing the finding; re-dispatch |
| 4 | `@simplify` | Any verdict (ADR-15) | Advisory only — record in summary; never blocks Phase 4 progression |
| 4 | Reviewers | Same `@check` finding twice | Convergence detected; stop loop early |
| 4 | Reviewers | Unresolved after 3 cycles | Document blockers in summary; proceed |
| 5 | Orchestrator | Tasks drafted | Apply Dispatch Hygiene; write each `task-N.md`; verify `test -f` for every N |
| 5.5 | `@check` | ACCEPTABLE, N=1 | Skip P5.5 entirely (ADR-21); proceed to Phase 6 — degenerate split |
| 5.5 | `@check` | ACCEPTABLE, N>1 | File tasks 2…N as sub-issues with rich seed bodies (ADR-21) via `@pm`; proceed to Phase 6 with task-1 only |
| 5.5 | `@check` | NEEDS WORK | Edit `task-N.md` in place; re-dispatch (max 2 cycles) |
| 5.5 | `@check` | BLOCK plan-level | Edit `plan.md` addressing the BLOCK finding; decrement `plan_rework_remaining`; re-enter Phase 4 (ADR-17) |
| 5.5 | Run-level | `plan_rework_remaining` exhausted | Abort to Failure Handler (ADR-13) |
| 6 | `@test` | TESTS_READY + `escalate_to_check: false` | Proceed to Phase 7 |
| 6 | `@test` | TESTS_READY + `escalate_to_check: true` | `@check` light review → `@test` fixes → forward |
| 6 | `@test` | NOT_TESTABLE (general) | `@check` sign-off; task goes to `@make` without tests; record in NOT_TESTABLE manifest for Phase 8 (ADR-18) |
| 6 | `@test` | NOT_TESTABLE: Missing testability seam | `@make` adds the seam; re-run `@test` |
| 6 | `@test` | BLOCKED | Investigate; may need spec or plan revision |
| 6 | `@test` (stub-first) | All tests pass with zero `todo!()` panics | Reject — structural-only tests; route back to `@test` to rewrite |
| 7 | `@make` | Implementation Complete | Proceed to Phase 8 |
| 7 | `@make` | Implementation Incomplete (any flag or no flag) | Route through `@check` test-diagnosis-first (ADR-19); orchestrator follows `@check`'s verdict |
| 7 | `@check` diagnosis | `test_design` | Dispatch `@test` to redesign tests; fresh `@make` re-attempt |
| 7 | `@check` diagnosis | `production_logic` | Re-dispatch `@make` with `@check`'s production-side notes |
| 7 | `@check` diagnosis | `split_needed` | Abort to Failure Handler (ADR-21 / Q19a). In the one-task-per-run model task-1 is the only task; no AC have been satisfied; recovery is "discard worktree, re-plan from scratch." `@pm` adds a comment recording the diagnosis. |
| 7 | Escalation loop | 2 cycles exhausted | Decrement `plan_rework_remaining`; back to Phase 3 (plan revisit) |
| 8 | `@check` | ACCEPTABLE | Proceed to Phase 9 |
| 8 | `@check` | BLOCK / behavioral / production-code finding | Write `task-fix-<N>.md` to `$RUN_DIR/` (ADR-16); dispatch `@make` against it (max 3 cycles) |
| 8 | `@check` | BLOCK / test-design / test-quality finding | Route through `@check` diagnosis → `@test``@make` re-verify |
| 8 | `@check` | BLOCK / plan-level finding | Decrement `plan_rework_remaining`; back to Phase 3 with the finding |
| 8 | `@simplify` | Any verdict (ADR-15) | Advisory only — record in summary; never blocks Phase 8 progression |
| 8 | Reviewers | Strictly cosmetic finding (typo, missing newline, AST-preserving) | Orchestrator fixes directly; re-review |
| 8 | Reviewers | NOT_TESTABLE manifest task flagged as questionable | Apply same routing as a normal `@check` finding for that task |
| 8 | Review loop | Same finding twice | Convergence; stop loop |
| 8 | Review loop | 3 cycles exhausted | Document blockers; proceed |
| 9 | Orchestrator | Pre-existing bug, out of scope | File sub-issue via `@pm` (label: `bug`) |
| 9 | Orchestrator | Unresolved review-loop blocker | File sub-issue via `@pm` (label: `followup`) |
| 9 | `@test` (Phase 6) | NOT_TESTABLE future-seam note | File sub-issue via `@pm` (label: `tech-debt`) |
| 9 | Orchestrator | `@simplify` advisory not acted on | Record in summary; do NOT file (records, not work) |
| 9 | Orchestrator | All parent AC checked off | Set issue status to `Done`; sync README/parent; commit `chore(todo): …` |
| 9 | Orchestrator | Some parent AC remain unchecked AND sub-issues exist | Leave issue at `In Progress`; commit `chore(todo): …` |
| Run-level | Failure Handler | Workflow is non-resumable (ADR-14) | Document the cleanup procedure: `git worktree remove`, delete branch, re-create from base, retry |
---
## 5. Design Decisions Log
ADR-flavoured. New decisions append at the end. If a decision is later reversed or refined, mark the original *Superseded by ADR-N* and add a new entry.
### ADR-1 (2026-05-06) — Forge-agnostic workflow
**Context:** original gist used the GitHub `gh` CLI for auth checks and `gh pr create --draft` at the end of the run.
**Decision:** workflow stops at `git commit`. No push, no PR/MR creation, no `gh` references anywhere.
**Alternatives:** keep `gh` integration; abstract behind a forge-plugin interface.
**Consequences:** workflow runs on any git host; user opens PR/MR manually on whichever forge they use. Removes the need for forge auth setup as a prerequisite.
### ADR-2 (2026-05-06) — `@pm` operates on local `TODO/` folder
**Context:** original `@pm` agent used the Linear CLI.
**Decision:** Linear-style folder-as-tracker with one `<ID>.md` file per issue plus a category-grouped `README.md`.
**Alternatives:** keep Linear; multi-backend abstraction; single-file `TODO.md`.
**Consequences:** project-local, version-controlled, no external service. Schema enforced in `agents/pm.md`. Initial single-file design moved to per-issue files in ADR-12.
### ADR-3 (2026-05-07) — Workflow runs in worktree, not bare repo
**Context:** original orchestrated bare-clone → worktree creation as Phase 3 of the workflow.
**Decision:** user creates the worktree before launching opencode; the workflow assumes CWD is the worktree.
**Alternatives:** keep auto-worktree-creation; auto-detect bare vs. worktree.
**Consequences:** simpler workflow; opencode CWD = worktree, so subagents inherit the right project root naturally; less plumbing around `WORKTREE_PATH`. (Subagents still get absolute paths in dispatch prompts — see ADR-7.)
### ADR-4 (2026-05-07) — `@make` and `@test` are polyglot
**Context:** original was Python-only via `uv`.
**Decision:** detect toolchain from marker files (`pyproject.toml`, `Cargo.toml`, `flake.nix`); wrap all toolchain commands in `nix develop -c` if a devshell is present.
**Alternatives:** per-language agents; keep Python-only.
**Consequences:** one agent per role serves multiple languages. Permission allowlists expanded for `cargo` and `nix develop -c`. Bash sandbox still denies shell escapes inside the wrapper.
### ADR-5 (2026-05-07) — Subagent CWD via absolute paths
**Context:** opencode subagents do not inherit the orchestrator's `cd`. A `@check` dispatched from inside a worktree resolved relative paths against the parent project root and failed with "file not found."
**Decision:** capture `WORKTREE_PATH` in Phase 1 and pass absolute paths to every subagent dispatch.
**Alternatives:** patch opencode (out of scope); symlink dance.
**Consequences:** every dispatch has an explicit `Worktree: <abs path>` header convention. Verbose but reliable. Eventually superseded by run-artifact paths under `$RUN_DIR` (ADR-7).
### ADR-6 (2026-05-08) — Run artifacts on disk in `.workflow/run-<ID>/`
**Context:** the orchestrator was paraphrasing the plan and task specs into each dispatch prompt. Result: `@check` and `@simplify` could see slightly different versions of the same plan; mid-loop revisions could leak as "actually let me reconsider…" passages; long specs ate context budget on every dispatch.
**Decision:** orchestrator writes `plan.md` (Phase 3), `task-N.md` (Phase 5), and `summary.md` (Phase 9) to `$WORKTREE_PATH/.workflow/run-<ISSUE_ID>/`. Dispatches name files by absolute path; subagents read them.
**Alternatives:** inline prompts (status quo); database; in-memory orchestrator state.
**Consequences:** byte-for-byte source of truth across dispatches. Mid-loop revisions edit the file in place; every subsequent reader sees the new version. Run-artifact directory is gitignored (`.workflow/`).
### ADR-7 (2026-05-08) — Stub-first Rust TDD (mandatory for new symbols)
**Context:** Rust integration tests reference symbols imported from `lib.rs`. If those symbols don't exist yet, the test crate fails to compile — a build-error RED with no stack trace and no assertion diagnostics. Same for module tests against not-yet-existing functions.
**Decision:** for any Rust task that introduces new symbols, dispatch a stub-pass `@make` first (writes `todo!()`-bodied stubs, runs `cargo check` only). Then `@test` runs against compiling stubs; runtime panic on `todo!()` is the clean RED. Then `@make` body pass replaces stubs.
**Alternatives:** accept compile-error RED; let `@make` write tests + bodies in one pass; allow `@test` to add stubs to production source.
**Consequences:** two atomic commits per affected task (`feat: scaffold X with todo!() stubs`, then `feat: implement X`). Stub-pass scope is tight: bodies are exactly `todo!()`, signatures must match the planned final API. Phase 6 also adds a mandatory panic-coverage check after `@test`: every test must panic on `todo!()` to prove it actually exercises the stubbed symbols (catches structural-only tests).
**On reviewer bypass:** the stub-pass commit is not sent through Phase 5.5 or Phase 8 review. The bypass is intentional and safe because (a) stubs are mechanical — signatures plus `todo!()`, no logic; (b) the body-pass commit *is* reviewed and the body-pass diff strictly subsumes the stub-pass diff (the same signatures, now with bodies); (c) Phase 6's mandatory panic-coverage check is what actually validates that the stubs are exercised. Reviewing the stub-pass would duplicate work that the body-pass review catches anyway.
### ADR-8 (2026-05-08) — `@test` may write inside `#[cfg(test)] mod` blocks
**Context:** Rust unit tests live colocated in production source files inside `#[cfg(test)] mod tests { … }` blocks — the canonical idiom, not an edge case. Original `@test` File Constraint forbade `src/` writes entirely, which forced `@make` to write both production code and tests in a single dispatch. This lost the RED→GREEN separation that TDD relies on.
**Decision:** `@test` may modify `src/**/*.rs` strictly inside `#[cfg(test)] mod <name> { … }` blocks. Every line outside such a block stays read-only.
**Alternatives:** keep the restriction; write all unit-level tests as integration tests.
**Consequences:** TDD works for module tests as well as integration tests. The previous Phase 6 file gate (path-based `git status` snapshot diff) is removed — with `@test` now legitimately writing inside `src/`, a path-based gate proves nothing. Constraint is now enforced by the prompt rule, the diff being human-reviewable, and `@check` flagging production-code drift in Phase 8.
### ADR-9 (2026-05-08) — Phase 5.5 task-split review by `@check`
**Context:** `ppries`' README mentioned `@check` reviewing the task split for completeness, but the gist's `workflow.md` never implemented it. Without a split-review gate, an over- or under-split task surfaced only at Phase 8 final review — after expensive `@test` and `@make` dispatches had already run on a broken split.
**Decision:** new Phase 5.5 dispatches `@check` against `plan.md` + every `task-N.md` to evaluate the split against five questions: coverage, no overlap, single-purpose, integration contracts, testable AC. Max 2 cycles; BLOCK routes back to Phase 4 (plan itself doesn't decompose).
**Alternatives:** status quo (catch at Phase 8); orchestrator self-check.
**Consequences:** one extra `@check` dispatch per run. `@simplify` is not involved at this phase — split review is structural, not complexity. Cheaper failure modes for over-/under-split tasks.
### ADR-10 (2026-05-08) — `@pm` is single-mode (filesystem only)
**Context:** `@pm` had two read modes — `git show <ref>:TODO.md` (read-only) and filesystem (read/write). Git-ref mode existed for the bare-repo flow that ADR-3 retired. After ADR-3, the workflow always used filesystem mode; git-ref mode was dead weight that still added bash permissions and doc surface.
**Decision:** remove git-ref mode. `@pm` has no bash access. Ad-hoc historical reads (`git show main:TODO/GAL-39.md`) are out of scope — the user runs them directly.
**Alternatives:** keep dual-mode; document the separation more clearly.
**Consequences:** simpler agent. One less permission allowlist to maintain. Workflow's "(live filesystem mode)" qualifier dropped from Phase 2 / Phase 9 / Failure handler.
### ADR-11 (2026-05-08) — Phase 9 files follow-ups as TODO sub-issues
**Context:** unresolved items (pre-existing bugs out of scope, blocked review findings, future-seam notes) were recorded only in `summary.md` — per-run, untracked, overwritten on the next run, read by nobody since the user has walked away.
**Decision:** Phase 9 has a `### File Follow-ups` step that dispatches `@pm` to create new TODO sub-issues for tracked-worthy items. Each new issue has `parent: <ISSUE_ID>`, status `Todo`, and an appropriate label (`bug` / `followup` / `tech-debt`). `@simplify` advisories that the orchestrator chose not to act on stay in the summary as records, not filed.
**Alternatives:** leave items in summary; create as top-level issues (would need a README.md category, which can't be picked at unattended runtime).
**Consequences:** unresolved items become tracked work. Sub-issue routing avoids the README-category problem. The follow-up files commit alongside the worked-issue update in a single `chore(todo): …` commit.
### ADR-12 (2026-05-08) — Phase 7 mid-implementation escalation
**Context:** Phase 7's escalation rule was gated on `@make` flagging concerns *during entry validation* (the RED check before implementing). When `@make` got past entry validation, started implementing, and then ground for 2-3 attempts because the test demanded impossible production code, the orchestrator had no documented route — it would re-dispatch `@make` with marginal context tweaks instead of recognizing the diagnosis as test-architecture failure.
**Decision:** split Phase 7's escalation into entry-validation and mid-implementation paths. `@make` reports `escalate: test_design` when its iteration limit is reached and the test seems to demand impossible / unreasonable code. Both paths route through `@check` (test diagnosis) → `@test` (redesign) → fresh `@make` dispatch. Max 2 escalation cycles before reverting to Phase 3 plan revisit.
**Alternatives:** status quo; let `@make` modify test files itself.
**Consequences:** faster recovery from test-design errors. Bounded loop prevents thrashing. `@make.md` Iteration Limits section gains a new red-flag class. *Superseded in part by ADR-19 (unified diagnosis path).*
### ADR-13 (2026-05-08) — Run-level `plan_rework_remaining` counter
**Context:** several routes return control to an upstream phase when downstream signals reveal the upstream artifact was wrong: P5.5-BLOCK→P4 (split doesn't decompose), P7-escalation-exhaustion→P3 (test/code thrash exceeded its bound), P8-plan-level→P3 (final review exposes a plan defect). Each upstream phase has its own per-loop cycle cap (P4 max 3, P5.5 max 2, etc.), but those caps reset on every re-entry — so a run could in principle thrash P3↔P4↔P5.5↔P3 indefinitely without violating any local rule.
**Decision:** introduce one run-level counter, `plan_rework_remaining`, default value `1`. It decrements on every transition where downstream signal forces upstream rework: `P5.5 BLOCK → P4`, `P7 escalation exhausted → P3`, `P8 plan-level finding → P3`. When the counter is `0` and another such transition fires, abort to the Failure Handler instead of re-entering. Per-phase cycle caps are unchanged.
**Alternatives:** (a) a global `max_subagent_dispatches` budget — over-engineered for the specific failure mode; (b) document the resets as intentional and rely on convergence detection — leaves the bug present.
**Consequences:** at most two plan attempts per run (the initial plan plus one revision). Failure Handler invocation distinct in cause from earlier-phase aborts: the cleanup is the same (per ADR-14) but the summary explains *which* downstream signal exhausted the budget.
### ADR-14 (2026-05-08) — Workflow is non-resumable
**Context:** Phase 9 has multiple sub-steps (code commit → `@pm` status update → file follow-ups → TODO commit → summary). Crashing between any two sub-steps leaves the worktree in a state that earlier docs called "partial." The original Failure Handler did not flip status back, did not recognize partial-Phase-9 separately from earlier-phase crashes, and re-running `/workflow` after a crash could append new comments and re-do work indefinitely.
**Decision:** declare the workflow non-resumable. On any failure (Failure Handler invocation), the recovery procedure is: `git worktree remove` the failed worktree, delete the feature branch, re-create the worktree from `$BASE_BRANCH`, then re-run `/workflow`. Document this explicitly in the Failure Handler section. The throwaway-worktree model means there is no in-place resume state to corrupt — the user discards the worktree and starts fresh.
**Alternatives:** (a) smarter Failure Handler that cleans up partial state idempotently; (b) transactional Phase 9 via a state file; (c) idempotent sub-steps so re-runs auto-resume.
**Consequences:** simplest possible recovery model. Phase 9 sub-step ordering doesn't need to be defended against partial failures — partial state is acceptable because the recovery is "discard everything and re-run." User-initiated cancellation (Ctrl-C) follows the same procedure.
### ADR-15 (2026-05-08) — `@simplify` is advisory at every gate
**Context:** the Phase 4 routing matrix used to read "Either reviewer NEEDS WORK → re-dispatch the loop," giving `@simplify` veto power equivalent to `@check`'s. Phase 8's matrix said `@simplify` was advisory only ("Record in summary's 'Advisory notes (not filed)'"). Same agent, two different powers.
**Decision:** `@simplify` is advisory at every gate. Its findings are recorded in the run summary; they never force a re-dispatch loop. `@check` is the only reviewer with veto authority (NEEDS WORK / BLOCK).
**Alternatives:** (a) make `@simplify` blocking everywhere — too heavy for a heuristic agent prone to false positives; (b) keep the asymmetry and document a principle — fragile.
**Consequences:** uniform model — `@check` enforces correctness, `@simplify` advises on shape. Phase 4 review loops only run on `@check` findings; `@simplify` complexity flags get logged in the summary like at Phase 8. The user can manually promote a `@simplify` finding if it matters.
### ADR-16 (2026-05-08) — Phase 8 fix specs go to disk
**Context:** when Phase 8 review surfaced a behavioral or production-code finding, the orchestrator would "build a new `@make` task spec from the finding" and dispatch it inline. That violates ADR-6's invariant (run artifacts on disk, no inline paraphrase) — and exactly when it matters most, because Phase 8 has up to 3 review cycles and the same finding can re-dispatch.
**Decision:** Phase 8 fix dispatches write a new artifact `$RUN_DIR/task-fix-<N>.md` (1-indexed within the Phase 8 cycle) before dispatching `@make`. Same Dispatch Hygiene rules as Phase 5 task specs, same `test -f` verification. Cosmetic findings (orchestrator fixes directly per workflow.md) skip the file — only `@make`-dispatched findings get one.
**Alternatives:** (a) inline in the dispatch prompt with an ADR-6 footnote — erodes the invariant for the highest-risk dispatch class; (b) edit the original `task-N.md` — muddies the audit trail of an already-met spec.
**Consequences:** ADR-6's invariant holds end-to-end. Phase 8 cycles re-dispatch against the same on-disk file (mid-loop edits in place), eliminating paraphrase drift across review cycles.
### ADR-17 (2026-05-08) — Phase 5.5 BLOCK protocol
**Context:** Phase 5.5 BLOCK ("plan does not decompose cleanly") used to route "back to Phase 4 with `@check`'s finding," but `@check` at 5.5 evaluated the *split*, not the plan; its finding may not map cleanly to a plan edit. Re-entering P4 with the same `plan.md` and a finding tagged on the prompt asks the wrong question.
**Decision:** on P5.5 BLOCK, the orchestrator translates the split-level finding into a concrete `plan.md` edit (e.g. "the plan conflates structural and runtime work; split into two milestones"), saves the edit, decrements `plan_rework_remaining` (per ADR-13), and re-dispatches Phase 4 reviewers against the *revised* plan. P4 reviewers see a genuinely different plan.
**Alternatives:** (a) re-dispatch P4 unchanged with finding attached — burns reviewers on a known-broken plan; (b) treat P5.5 BLOCK as terminal — too strict, we have the rework budget for one revisit.
**Consequences:** P5.5 BLOCK is an effective signal. The orchestrator's plan-edit step is mandatory; skipping it is a routing error. Run-level rework budget bounds the loop.
### ADR-18 (2026-05-08) — Phase 8 NOT_TESTABLE manifest
**Context:** Phase 6 routes NOT_TESTABLE tasks through `@check` for sign-off, then dispatches `@make` without tests. Phase 8 reviews the diff but has no signal that "this change has no test because `@test` claimed it untestable." If `@check` at P6 was wrong, untested code ships.
**Decision:** Phase 8's dispatch prompt includes a "Tasks completed without tests (NOT_TESTABLE)" section listing each task ID, the `@test` justification, and the `@check` sign-off rationale. Reviewers explicitly evaluate "does the justification still hold given the final diff?" If a reviewer pushes back, routing follows the normal Phase 8 finding rules.
**Alternatives:** (a) double-up `@check` + `@simplify` at P6 NOT_TESTABLE granting — doubles dispatch cost without targeting the actual gap; (b) restrict NOT_TESTABLE to a fixed taxonomy — won't generalize across languages; (c) reject NOT_TESTABLE entirely — ignores legitimate cases.
**Consequences:** pure plumbing change. P8 reviewers gain visibility into the bypass without new agents or new authority.
### ADR-19 (2026-05-08) — Unified Implementation Incomplete diagnosis path
**Context:** ADR-12 introduced three paths for `@make` reporting Implementation Incomplete: entry-validation flag, mid-impl `escalate: test_design` flag, no flag (re-dispatch with `@check` notes once, escalate after second failure). Three paths converging on the same destination (`@check` test-diagnosis → `@test` redesign or `@make` re-dispatch) added matrix surface and obscured the routing.
**Decision:** every Implementation Incomplete from `@make` routes through `@check` test-diagnosis-first. `@check` returns one of three verdicts — `test_design` (route to `@test` redesign), `production_logic` (re-dispatch `@make` with `@check`'s notes), or `split_needed` (per ADR-21). `@make`'s self-diagnosis flag becomes a *hint* for `@check`, not a control-flow input for the orchestrator.
**Alternatives:** (a) keep three paths, tighten what `@check` reviews in each — preserves the surface area; (b) push burden to `@make` — orchestrator still needs to gate via `@check`.
**Consequences:** routing logic shrinks. Matrix has fewer rows. ADR-12's split-into-two-paths is partially superseded — the *escalation diagnosis* unified, the iteration limit (max 2 cycles) preserved.
### ADR-20 (2026-05-08) — Phase 1 working-tree cleanliness check
**Context:** Phase 1 verified non-bare repo, branch identity, base branch, issue file presence — but not that the working tree was clean. Stale uncommitted edits would be swept into the Phase 9 commit (workflow.md stages "code changes only" but doesn't distinguish *which* code) or a `wip:` failure commit.
**Decision:** Phase 1 runs `git status --porcelain`; if non-empty, stop with: "Working tree must be clean. Commit or stash uncommitted changes before running the workflow."
**Alternatives:** (a) capture initial dirty state, stage only files modified by the workflow at Phase 9 — error-prone baseline tracking; (b) document the requirement, don't enforce — `// TODO: don't forget` in design-doc form.
**Consequences:** matches the ADR-14 throwaway-worktree model. One additional sanity-check line. User's "but I have manual edits I want the workflow to build on" case is solved by them committing those edits first, which is what they should do anyway.
### ADR-21 (2026-05-08) — One-task-per-run model
**Context:** the workflow originally executed N tasks per run, sequentially through Phase 7. That introduced cross-task regression risk (task 4 breaks task 1's tests, found N tasks late), big-diff Phase 8 reviews (multi-day branches accumulate thousands of diff lines that hit reviewer context limits silently), and the mid-flight task-split problem (when `@make` discovers task N is over-scoped, no documented route to re-split). It also coupled the workflow's success to "all N tasks complete," when in practice an issue worth one good commit shouldn't depend on unrelated downstream work succeeding.
**Decision:** every workflow run executes **exactly one task** through Phases 68. Phase 5 still splits the plan into N tasks via the Split Heuristic. If N=1, proceed normally. If N>1, the orchestrator dispatches `@pm` to file tasks 2…M as TODO sub-issues *before* Phase 6 starts, and only task 1 runs through Phases 68. If task 1 itself reports `split_needed` mid-Phase-7, abort to the Failure Handler (Q19a: in the one-task-per-run model task-1 is the only task in the run, so no feature AC have been satisfied; recovery is the standard non-resumable cleanup from ADR-14).
The model carries five sub-decisions:
1. **Sub-issue body schema for split-time filings:** rich seed body that lets a fresh `/workflow` invocation re-plan and implement without seeing siblings or the original `plan.md`. Includes task description + AC + Code Context + Integration Contracts (declared in frontmatter as `depends-on: [...]`) + relevant slice of `plan.md` + Test Design section if present + a "Discovered during run on `$BRANCH_NAME` for parent issue `$ISSUE_ID`" attribution paragraph.
2. **Phase 5.5 review questions strengthen to six**: coverage, no overlap, single-purpose, integration contracts (with stronger bar — must be self-contained for cross-session use), testable AC, and *self-containment* (is each task spec runnable as a standalone `/workflow` invocation?). Self-containment is the new load-bearing question because each filed sub-issue runs in isolation.
3. **Split Heuristic recalibration**: keep the existing mechanical thresholds (>2 concerns, >50 lines across >2 files, mixes structural + runtime, etc.) but add a "default to no split" tiebreaker — when in doubt, do not split, because splitting now fans out across user sessions with full orchestration overhead per sub-issue.
4. **Parent issue status is AC-driven**: Phase 9's existing AC checkbox logic (workflow.md flips ticked AC) determines status. If all parent AC are checked → `Done`; if some remain unchecked → stays `In Progress` with the filed sub-issues covering the remaining work.
5. **`depends-on:` frontmatter and Phase 1 enforcement**: `@pm` schema gains a `depends-on: [<ID>, ...]` list. Phase 1 sanity check refuses to start if any listed dependency is not `Done`. Hard block — soft-warn means the user (who has walked away) doesn't see the warning until later.
**Auto-resolved problems:**
- Mid-flight task split (formerly Q2 in Open Questions, ADR-12's adjacent gap): collapses into "file as sub-issue and exit."
- Big-diff Phase 8 reviews: one task = bounded diff (~50 lines per Split Heuristic). No big-diff problem possible.
- Cross-task regression within a run: no cross-task regressions possible inside a single-task run; subsequent sub-issue runs detect them at their own Phase 7 entry validation (which runs the project's test suite).
- Skip-P5.5-when-N=1 optimization: trivially satisfied — N=1 from Phase 6 onward in every run.
**Alternatives:** (a) keep N-task runs, add mid-flight re-splitting via P7→P5 re-entry — doesn't solve big-diff or cross-task regression; (b) keep N-task runs, accept the gaps — leaves three known-bad routes; (c) always one task per issue (skip Phase 5 entirely) — loses the planning-phase split heuristic that's catching legitimate over-scoping at design time.
**Consequences:** runs become shorter and more focused. Each commit/PR carries a bounded scope. Sub-issue fan-out becomes the primary scaling mechanism for multi-step work. `TODO/` sees more sub-issue files; `@pm`'s split-time filing path becomes a hot code path. Concurrent runs in different worktrees on the same repo become trivially safe because each worktree has its own `TODO/` checkout (file conflicts surface as standard git merge conflicts at integration time, not as mid-run race conditions).
---
## 6. Open Questions / Known Gaps
When a question gets answered, move it to the [Design decisions log](#5-design-decisions-log).
### Q1: Phase 5.5 review scope — does `@check` evaluate test-design soundness here?
Currently Phase 5.5 reviews the **split** (coverage, overlap, single-purpose, integration contracts, testable AC). It does *not* explicitly evaluate whether the test approach implied by each task spec is sound. That would partially overlap with Phase 4 (which has a plan-level Test Design section the reviewers evaluate). If a test-design error escapes Phase 4 and is encoded in a task spec, it surfaces at Phase 7 via the mid-impl escalation (ADR-12) — but earlier detection might be cheaper. Open: should Phase 5.5 add "test approach for each task is sound" as a sixth review question, or is that scope creep into Phase 4 territory?
### ~~Q2~~: Mid-flight task split — *closed by ADR-21*
The one-task-per-run model collapses this question. When `@make` discovers task-1 is over-scoped, the unified diagnosis path (ADR-19) returns `split_needed` from `@check`, and the orchestrator aborts to the Failure Handler (no P5 re-entry, no sub-issue filing — the recovery is "discard worktree, re-plan from scratch"). Tasks 2…M are already filed as sub-issues at Phase 5.5 acceptance, so there's no "remaining tasks" cleanup to think about.
### ~~Q3~~: Phase 9 partial-commit rollback — *closed by ADR-14*
The workflow is non-resumable. Phase 9 partial states are addressed by the throwaway-worktree recovery procedure: discard the worktree, delete the branch, re-create from base, re-run. Phase 9 sub-step ordering doesn't need to defend against partial failures because the recovery is "discard everything and re-run."
### Q4: `@simplify` not involved at Phase 5.5 — is that the right call?
Phase 5.5 only dispatches `@check`. Rationale (ADR-9) is that split review is structural, not complexity. But `@simplify`'s lens — "what if we deleted this?" — could legitimately catch unnecessary tasks (e.g. a third task that adds an abstraction nothing else needs). With ADR-21's one-task-per-run pivot, this question gains a different angle: a `@simplify` flag on a sibling sub-issue at Phase 5.5 could prevent filing a wasteful sub-issue, which is more valuable than catching the same redundancy at Phase 8 of a future run. Open: is the cost of one more dispatch worth the catch, especially now that Phase 5.5 is the gate for sub-issue fan-out?
### Q5: Test-design loop bound vs plan-revisit threshold
ADR-12 sets max 2 cycles for the Phase 7 test-design escalation before reverting to Phase 3 plan revisit. The plan-review and final-review loops have max 3. Why the asymmetry? The test-design loop is more expensive per cycle (`@check` + `@test` + `@make` re-implement vs. just reviewers + plan edit), so 2 may be right. But the choice was made by feel, not measured. Open: is 2 the right number, or should it match Phase 4 / Phase 8 at 3?
### Q6: Sub-issue ordering in the parent's `## Sub-issues` list
ADR-21's split-time filing creates new sub-issues with `depends-on:` declarations, but the parent's `## Sub-issues` list (rendered by `@pm`) is currently flat. When dependencies form a chain (sub-issue 2 depends on 1), the user has to read the chain from each sub-issue's frontmatter. Open: should `@pm` render the parent's sub-issue list in dependency order, with a visible indicator (e.g. indentation or `↳`) for dependent items? Cosmetic but would speed up "what to run next" decisions.
### Q7: Concurrent-worktree edge case — sub-issue ID collisions
Two parallel runs in different worktrees, each filing sub-issues, can both pick the same next ID (e.g. both pick `GAL-42` because both saw `GAL-41` as the highest at start). On merge, git surfaces this as a conflict over `TODO/GAL-42.md` content (two different files staked on the same name). Recoverable but annoying. Open: should `@pm`'s ID generation use a strategy that's safer under concurrent runs (e.g. timestamp suffix, branch-prefix, content-addressable), or accept the merge-conflict-on-collision cost given the one-user assumption?
---