nixcfg/config/opencode/workflow-design.md

# Workflow Design

## 1. Purpose

This document is the **design rationale and decision log** for the multi-agent workflow. The operational rules — what the orchestrator does, in what order, with what guardrails — live in [`commands/workflow.md`](commands/workflow.md) and the agent files under [`agents/`](agents/). This document is where we discuss changes *before* they land in those files.

**Intended flow:**

1. A new idea, gap, or failure mode comes up (often from a real run).
2. Discuss in this document — capture context, options, trade-offs.
3. When a decision is reached, update `commands/workflow.md` and/or the relevant agent file.
4. Record the decision in the [Design decisions log](#5-design-decisions-log) below.

The operational files stay terse and procedural. The "why" lives here.

---

## 2. Cast & Responsibilities

One orchestrator, five subagents. The orchestrator runs in `agent: build` mode; the subagents are defined as separate agent files under `config/opencode/agents/`.

| Actor | File | Role | Boundary |
|---|---|---|---|
| **Orchestrator** | `commands/workflow.md` | Plans, dispatches, merges findings, edits artifacts under `.workflow/`, commits. | **Does not** write production code, write tests, or play any subagent's role. |
| `@check` | `agents/check.md` | Reviews plans / task splits / code for risks, correctness, testability. | Read-only — no write / edit / bash. |
| `@simplify` | `agents/simplify.md` | Reviews for unnecessary complexity. Advisory only. | Read-only. |
| `@test` | `agents/test.md` | Writes failing tests for a task spec, verifies RED. | May modify test files / `#[cfg(test)] mod` blocks. Sandboxed bash. |
| `@make` | `agents/make.md` | Implements a single task spec. Verifies acceptance criteria. | May modify files listed in the task spec. Sandboxed bash; no `git` / network / `cd`. |
| `@pm` | `agents/pm.md` | Reads / updates `TODO/` issue files. | May modify only `TODO/` contents. No bash. |

**Permission boundaries are enforced per agent.** The orchestrator (in `agent: build` mode) has full edit/bash capabilities, which is precisely why it must not act as the subagents — the agent files are where the limits live.

---

## 3. Flow Diagrams

### 3.1 Phase pipeline

High-level happy path with the major escalation arms. The workflow runs **one task per invocation** (ADR-21): Phase 5 produces N task files; if N>1, tasks 2…N are filed as sub-issues and only task 1 runs through Phases 6–8.

```mermaid
flowchart TD
    P1["Phase 1: Sanity Check<br/>incl clean tree + depends-on"]
    P2["Phase 2: Issue Context<br/>pm reads TODO/ID.md"]
    P3["Phase 3: Plan<br/>write plan.md"]
    P4{"Phase 4: Review Plan<br/>check blocking, simplify advisory<br/>max 3 cycles"}
    P5["Phase 5: Split into Tasks<br/>write task-N.md"]
    P55{"Phase 5.5: Review Split<br/>check, 6 questions<br/>max 2 cycles"}
    P5F["File tasks 2..N as sub-issues<br/>only when N more than 1"]
    P6["Phase 6: Write Tests<br/>test, stub-first make"]
    P7["Phase 7: Implement<br/>make, single task"]
    P7E{"Implementation Incomplete<br/>check diagnoses<br/>max 2 cycles"}
    P7F["split_needed: Failure Handler<br/>(discard worktree, re-run)"]
    P8{"Phase 8: Final Review<br/>check blocking, simplify advisory<br/>max 3 cycles"}
    P9["Phase 9: Commit + TODO + Follow-ups + Summary<br/>parent status AC-driven"]

    P1 --> P2 --> P3 --> P4
    P4 -->|ACCEPTABLE| P5 --> P55
    P4 -->|NEEDS WORK or BLOCK| P3
    P55 -->|ACCEPTABLE| P5F --> P6 --> P7
    P55 -->|NEEDS WORK| P5
    P55 -->|BLOCK plan-level| P4
    P7 --> P8
    P7 -.->|Implementation Incomplete| P7E
    P7E -->|test_design or production_logic| P7
    P7E -.->|split_needed| P7F
    P7E -.->|2 cycles exhausted| P3
    P7F --> ABORT([Failure Handler])
    P8 -->|ACCEPTABLE| P9
    P8 -->|production-code finding| P7
    P8 -->|test-design finding| P7E
    P8 -->|plan-level finding| P3
    P9 --> END([Done])
```

**Run-level cap:** `plan_rework_remaining` (default 1, ADR-13) decrements on every P5.5-BLOCK→P4, P7-escalation-exhaustion→P3, and P8-plan-level→P3 transition. If one of these transitions fires when the counter is already 0, the run aborts to the Failure Handler instead of re-entering.

### 3.2 Phase 7 escalation loop

The pattern when `@make` cannot reach GREEN. Unified diagnosis path (ADR-19): every Implementation Incomplete routes through `@check` test-diagnosis-first; `@check` returns one of three verdicts.

```mermaid
stateDiagram-v2
    [*] --> Dispatched: orchestrator dispatches make
    Dispatched --> EntryCheck: run tests verify RED
    EntryCheck --> Implementing: failure code matches handoff
    EntryCheck --> CheckDiag: Implementation Incomplete
    Implementing --> GreenReached: tests pass within 2-3 attempts
    Implementing --> CheckDiag: Implementation Incomplete
    CheckDiag --> TestRedesign: verdict test_design
    CheckDiag --> Dispatched: verdict production_logic
    CheckDiag --> FailureHandler: verdict split_needed
    TestRedesign --> Dispatched: test fixes fresh entry validation
    Dispatched --> PlanRevisit: 2 escalation cycles exhausted
    GreenReached --> [*]
    FailureHandler --> [*]: discard worktree, re-run
    PlanRevisit --> [*]: back to Phase 3 if rework budget intact
```

### 3.3 Issue lifecycle

How TODO entries move through statuses. In the one-task-per-run model (ADR-21), a single workflow invocation may file multiple sub-issues mid-run, and the parent's final status is AC-driven, not run-driven.

```mermaid
stateDiagram-v2
    [*] --> Todo: issue file created
    Todo --> InProgress: Phase 2 workflow starts
    InProgress --> Done: Phase 9 - all parent AC checked
    InProgress --> InProgress2: Phase 9 - some parent AC remain
    InProgress --> Todo: workflow fails, failure handler adds comment

    note right of InProgress2
        Parent stays In Progress when sub-issues
        cover the unmet AC. User runs sub-issues
        in subsequent /workflow invocations.
    end note

    note right of InProgress
        Sub-issues filed during a run carry:
        - parent: ISSUE_ID, status: Todo
        - label: bug, followup, tech-debt, or split-from-run
        - depends-on: [...] for cross-sub-issue ordering
        - rich seed body for split-time filings (ADR-21)
    end note

    Done --> [*]
    InProgress2 --> [*]
```
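
The AC-driven status rule can be sketched as a small function over the parent issue's Markdown body. This is an illustration only: the `- [ ]` / `- [x]` checkbox convention for acceptance criteria, the name `parent_status`, and the choice to treat "no AC found" as `In Progress` are assumptions, not taken from `agents/pm.md`.

```python
import re

# Assumption: acceptance criteria are Markdown task-list items.
CHECKBOX = re.compile(r"^\s*[-*]\s+\[( |x|X)\]\s+", re.MULTILINE)


def parent_status(issue_body: str) -> str:
    """Return the Phase 9 status implied by the parent's AC checkboxes."""
    marks = CHECKBOX.findall(issue_body)
    if marks and all(mark in ("x", "X") for mark in marks):
        return "Done"
    # Some AC remain unchecked (or none were found): stay In Progress.
    return "In Progress"
```

The status depends only on the checkboxes, not on whether the run itself succeeded, which is the point of "AC-driven, not run-driven".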

---

## 4. Routing Matrix

This table lists every observed `(phase, signal) → action` route. An empty cell is a gap, so walking the table row by row is a cheap way to spot routing issues.

| Phase | Signal source | Signal | Action |
|---|---|---|---|
| 1 | Sanity checks | Bare repo / detached HEAD / missing `TODO/<ID>.md` / branch == base | Stop with error |
| 1 | Sanity checks | Working tree dirty (`git status --porcelain` non-empty) | Stop with error (ADR-20) |
| 1 | Sanity checks | `depends-on:` issue not in `Done` status | Stop with error (ADR-21) |
| 2 | `@pm` | Issue not found | Stop with error |
| 2 | `@pm` | Status is `Todo` | Flip to `In Progress`; propagate to README.md / parent's Sub-issues |
| 3 | Orchestrator | Plan drafted | Apply Dispatch Hygiene; write `plan.md`; verify `test -f` |
| 4 | `@check` | ACCEPTABLE (regardless of `@simplify`) | Proceed to Phase 5 |
| 4 | `@check` | NEEDS WORK | Edit `plan.md` in place; re-dispatch (max 3 cycles) |
| 4 | `@check` | BLOCK | Edit `plan.md` addressing the finding; re-dispatch |
| 4 | `@simplify` | Any verdict (ADR-15) | Advisory only — record in summary; never blocks Phase 4 progression |
| 4 | Reviewers | Same `@check` finding twice | Convergence detected; stop loop early |
| 4 | Reviewers | Unresolved after 3 cycles | Document blockers in summary; proceed |
| 5 | Orchestrator | Tasks drafted | Apply Dispatch Hygiene; write each `task-N.md`; verify `test -f` for every N |
| 5.5 | Orchestrator | N=1 (degenerate split) | Skip P5.5 entirely (ADR-21); proceed to Phase 6 |
| 5.5 | `@check` | ACCEPTABLE, N>1 | File tasks 2…N as sub-issues with rich seed bodies (ADR-21) via `@pm`; proceed to Phase 6 with task-1 only |
| 5.5 | `@check` | NEEDS WORK | Edit `task-N.md` in place; re-dispatch (max 2 cycles) |
| 5.5 | `@check` | BLOCK plan-level | Edit `plan.md` addressing the BLOCK finding; decrement `plan_rework_remaining`; re-enter Phase 4 (ADR-17) |
| 5.5 | Run-level | `plan_rework_remaining` exhausted | Abort to Failure Handler (ADR-13) |
| 6 | `@test` | TESTS_READY + `escalate_to_check: false` | Proceed to Phase 7 |
| 6 | `@test` | TESTS_READY + `escalate_to_check: true` | `@check` light review → `@test` fixes → forward |
| 6 | `@test` | NOT_TESTABLE (general) | `@check` sign-off; task goes to `@make` without tests; record in NOT_TESTABLE manifest for Phase 8 (ADR-18) |
| 6 | `@test` | NOT_TESTABLE: Missing testability seam | `@make` adds the seam; re-run `@test` |
| 6 | `@test` | BLOCKED | Investigate; may need spec or plan revision |
| 6 | `@test` (stub-first) | All tests pass with zero `todo!()` panics | Reject — structural-only tests; route back to `@test` to rewrite |
| 7 | `@make` | Implementation Complete | Proceed to Phase 8 |
| 7 | `@make` | Implementation Incomplete (any flag or no flag) | Route through `@check` test-diagnosis-first (ADR-19); orchestrator follows `@check`'s verdict |
| 7 | `@check` diagnosis | `test_design` | Dispatch `@test` to redesign tests; fresh `@make` re-attempt |
| 7 | `@check` diagnosis | `production_logic` | Re-dispatch `@make` with `@check`'s production-side notes |
| 7 | `@check` diagnosis | `split_needed` | Abort to Failure Handler (ADR-21 / Q19a). In the one-task-per-run model task-1 is the only task; no AC have been satisfied; recovery is "discard worktree, re-plan from scratch." `@pm` adds a comment recording the diagnosis. |
| 7 | Escalation loop | 2 cycles exhausted | Decrement `plan_rework_remaining`; back to Phase 3 (plan revisit) |
| 8 | `@check` | ACCEPTABLE | Proceed to Phase 9 |
| 8 | `@check` | BLOCK / behavioral / production-code finding | Write `task-fix-<N>.md` to `$RUN_DIR/` (ADR-16); dispatch `@make` against it (max 3 cycles) |
| 8 | `@check` | BLOCK / test-design / test-quality finding | Route through `@check` diagnosis → `@test` → `@make` re-verify |
| 8 | `@check` | BLOCK / plan-level finding | Decrement `plan_rework_remaining`; back to Phase 3 with the finding |
| 8 | `@simplify` | Any verdict (ADR-15) | Advisory only — record in summary; never blocks Phase 8 progression |
| 8 | Reviewers | Strictly cosmetic finding (typo, missing newline, AST-preserving) | Orchestrator fixes directly; re-review |
| 8 | Reviewers | NOT_TESTABLE manifest task flagged as questionable | Apply same routing as a normal `@check` finding for that task |
| 8 | Review loop | Same finding twice | Convergence; stop loop |
| 8 | Review loop | 3 cycles exhausted | Document blockers; proceed |
| 9 | Orchestrator | Pre-existing bug, out of scope | File sub-issue via `@pm` (label: `bug`) |
| 9 | Orchestrator | Unresolved review-loop blocker | File sub-issue via `@pm` (label: `followup`) |
| 9 | `@test` (Phase 6) | NOT_TESTABLE future-seam note | File sub-issue via `@pm` (label: `tech-debt`) |
| 9 | Orchestrator | `@simplify` advisory not acted on | Record in summary; do NOT file (records, not work) |
| 9 | Orchestrator | All parent AC checked off | Set issue status to `Done`; sync README/parent; commit `chore(todo): …` |
| 9 | Orchestrator | Some parent AC remain unchecked AND sub-issues exist | Leave issue at `In Progress`; commit `chore(todo): …` |
| Run-level | Failure Handler | Workflow is non-resumable (ADR-14) | Document the cleanup procedure: `git worktree remove`, delete branch, re-create from base, retry |
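
One way to make "walking the table" mechanical is to encode the matrix as a lookup and diff it against the signals each source can emit. The sketch below is hypothetical and heavily abridged: `ROUTES`, `EXPECTED`, `route`, and `find_gaps` are illustrative names, and the Phase 8 `BLOCK` row is left out on purpose so the gap check has something to report.

```python
# Abridged, hypothetical encoding: (phase, source, signal) -> action.
ROUTES = {
    ("4", "@check", "ACCEPTABLE"): "proceed to Phase 5",
    ("4", "@check", "NEEDS WORK"): "edit plan.md; re-dispatch",
    ("4", "@check", "BLOCK"): "edit plan.md; re-dispatch",
    ("8", "@check", "ACCEPTABLE"): "proceed to Phase 9",
}

# Assumed list of signals each source may emit in each phase.
EXPECTED = {
    ("4", "@check"): ["ACCEPTABLE", "NEEDS WORK", "BLOCK"],
    ("8", "@check"): ["ACCEPTABLE", "BLOCK"],
}


def route(phase: str, source: str, signal: str) -> str:
    """Look up the action for a signal; an unknown signal is a routing gap."""
    key = (phase, source, signal)
    if key not in ROUTES:
        raise LookupError(f"no route for {key}")
    return ROUTES[key]


def find_gaps() -> list[tuple[str, str, str]]:
    """Return every expected (phase, source, signal) with no action."""
    return [
        (phase, source, signal)
        for (phase, source), signals in EXPECTED.items()
        for signal in signals
        if (phase, source, signal) not in ROUTES
    ]
```

With the abridged data above, `find_gaps()` reports the deliberately missing Phase 8 `BLOCK` entry.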

---

## 5. Design Decisions Log

ADR-flavoured. New decisions append at the end. If a decision is later reversed or refined, mark the original *Superseded by ADR-N* and add a new entry.

### ADR-1 (2026-05-06) — Forge-agnostic workflow

**Context:** original gist used the GitHub `gh` CLI for auth checks and `gh pr create --draft` at the end of the run.
**Decision:** workflow stops at `git commit`. No push, no PR/MR creation, no `gh` references anywhere.
**Alternatives:** keep `gh` integration; abstract behind a forge-plugin interface.
**Consequences:** workflow runs on any git host; user opens PR/MR manually on whichever forge they use. Removes the need for forge auth setup as a prerequisite.

### ADR-2 (2026-05-06) — `@pm` operates on local `TODO/` folder

**Context:** original `@pm` agent used the Linear CLI.
**Decision:** Linear-style folder-as-tracker with one `<ID>.md` file per issue plus a category-grouped `README.md`.
**Alternatives:** keep Linear; multi-backend abstraction; single-file `TODO.md`.
**Consequences:** project-local, version-controlled, no external service. Schema enforced in `agents/pm.md`. The initial single-file design was later replaced by per-issue files.

### ADR-3 (2026-05-07) — Workflow runs in worktree, not bare repo

**Context:** original orchestrated bare-clone → worktree creation as Phase 3 of the workflow.
**Decision:** user creates the worktree before launching opencode; the workflow assumes CWD is the worktree.
**Alternatives:** keep auto-worktree-creation; auto-detect bare vs. worktree.
**Consequences:** simpler workflow; opencode CWD = worktree, so subagents inherit the right project root naturally; less plumbing around `WORKTREE_PATH`. (Subagents still get absolute paths in dispatch prompts; see ADR-5.)

### ADR-4 (2026-05-07) — `@make` and `@test` are polyglot

**Context:** original was Python-only via `uv`.
**Decision:** detect toolchain from marker files (`pyproject.toml`, `Cargo.toml`, `flake.nix`); wrap all toolchain commands in `nix develop -c` if a devshell is present.
**Alternatives:** per-language agents; keep Python-only.
**Consequences:** one agent per role serves multiple languages. Permission allowlists expanded for `cargo` and `nix develop -c`. Bash sandbox still denies shell escapes inside the wrapper.
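
A minimal sketch of the detection-and-wrap idea, assuming the agent can list the file names at the project root. The toolchain labels and function names are hypothetical, and treating the presence of `flake.nix` as "a devshell is present" is a simplification made for this example.

```python
from pathlib import Path

# Marker files named in ADR-4; the labels on the right are illustrative.
MARKERS = {"pyproject.toml": "python", "Cargo.toml": "rust"}


def root_files(root: Path) -> set[str]:
    """Names of the regular files directly under the project root."""
    return {p.name for p in root.iterdir() if p.is_file()}


def detect_toolchains(present: set[str]) -> list[str]:
    """Map marker files found at the root to toolchain labels."""
    return sorted(label for marker, label in MARKERS.items() if marker in present)


def wrap_command(cmd: list[str], present: set[str]) -> list[str]:
    """Prefix a toolchain command with `nix develop -c` when a flake exists."""
    # Assumption: flake.nix at the root implies a usable devshell.
    if "flake.nix" in present:
        return ["nix", "develop", "-c", *cmd]
    return cmd
```

For a Rust project with a flake, `wrap_command(["cargo", "test"], root_files(Path(".")))` would yield `nix develop -c cargo test`.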

### ADR-5 (2026-05-07) — Subagent CWD via absolute paths

**Context:** opencode subagents do not inherit the orchestrator's `cd`. A `@check` dispatched from inside a worktree resolved relative paths against the parent project root and failed with "file not found."
**Decision:** capture `WORKTREE_PATH` in Phase 1 and pass absolute paths to every subagent dispatch.
**Alternatives:** patch opencode (out of scope); symlink dance.
**Consequences:** every dispatch has an explicit `Worktree: <abs path>` header convention. Verbose but reliable. Eventually superseded by run-artifact paths under `$RUN_DIR` (ADR-6).

### ADR-6 (2026-05-08) — Run artifacts on disk in `.workflow/run-<ID>/`

**Context:** the orchestrator was paraphrasing the plan and task specs into each dispatch prompt. Result: `@check` and `@simplify` could see slightly different versions of the same plan; mid-loop revisions could leak as "actually let me reconsider…" passages; long specs ate context budget on every dispatch.
**Decision:** orchestrator writes `plan.md` (Phase 3), `task-N.md` (Phase 5), and `summary.md` (Phase 9) to `$WORKTREE_PATH/.workflow/run-<ISSUE_ID>/`. Dispatches name files by absolute path; subagents read them.
**Alternatives:** inline prompts (status quo); database; in-memory orchestrator state.
**Consequences:** byte-for-byte source of truth across dispatches. Mid-loop revisions edit the file in place; every subsequent reader sees the new version. Run-artifact directory is gitignored (`.workflow/`).
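
The write-then-verify step could look roughly like the sketch below. The helper names are hypothetical, and the `is_file()` check stands in for the `test -f` verification that the routing matrix mentions.

```python
from pathlib import Path


def run_dir(worktree: Path, issue_id: str) -> Path:
    """Directory holding the artifacts of one run, per ADR-6."""
    return worktree / ".workflow" / f"run-{issue_id}"


def write_artifact(worktree: Path, issue_id: str, name: str, text: str) -> Path:
    """Write an artifact, confirm it exists, and return its absolute path."""
    path = run_dir(worktree, issue_id) / name
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    if not path.is_file():  # Python analogue of `test -f`
        raise FileNotFoundError(path)
    return path.resolve()
```

The returned absolute path is what a dispatch prompt would name, so every subagent reads the same bytes.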

### ADR-7 (2026-05-08) — Stub-first Rust TDD (mandatory for new symbols)

**Context:** Rust integration tests reference symbols imported from `lib.rs`. If those symbols don't exist yet, the test crate fails to compile — a build-error RED with no stack trace and no assertion diagnostics. Same for module tests against not-yet-existing functions.
**Decision:** for any Rust task that introduces new symbols, dispatch a stub-pass `@make` first (writes `todo!()`-bodied stubs, runs `cargo check` only). Then `@test` runs against compiling stubs; runtime panic on `todo!()` is the clean RED. Then `@make` body pass replaces stubs.
**Alternatives:** accept compile-error RED; let `@make` write tests + bodies in one pass; allow `@test` to add stubs to production source.
**Consequences:** two atomic commits per affected task (`feat: scaffold X with todo!() stubs`, then `feat: implement X`). Stub-pass scope is tight: bodies are exactly `todo!()`, signatures must match the planned final API. Phase 6 also adds a mandatory panic-coverage check after `@test`: every test must panic on `todo!()` to prove it actually exercises the stubbed symbols (catches structural-only tests).

**On reviewer bypass:** the stub-pass commit is not sent through Phase 5.5 or Phase 8 review. The bypass is intentional and safe because (a) stubs are mechanical — signatures plus `todo!()`, no logic; (b) the body-pass commit *is* reviewed and the body-pass diff strictly subsumes the stub-pass diff (the same signatures, now with bodies); (c) Phase 6's mandatory panic-coverage check is what actually validates that the stubs are exercised. Reviewing the stub-pass would duplicate work that the body-pass review catches anyway.
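
The panic-coverage check can be sketched as a filter over per-test results. The `Outcome` record and the way results are collected are assumptions made for this example; the only external fact used is that Rust's `todo!()` panics with the message "not yet implemented".

```python
from dataclasses import dataclass

# Message Rust's todo!() macro panics with.
TODO_PANIC = "not yet implemented"


@dataclass
class Outcome:
    """Hypothetical per-test record collected from a test run."""
    name: str
    passed: bool
    output: str  # captured output of this test


def missing_panic_coverage(outcomes: list[Outcome]) -> list[str]:
    """Names of tests that did not fail by panicking on a todo!() stub."""
    return [
        o.name
        for o in outcomes
        if o.passed or TODO_PANIC not in o.output
    ]
```

A non-empty result would send the listed tests back to `@test`, since a test that passes against `todo!()` stubs cannot be exercising them.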

### ADR-8 (2026-05-08) — `@test` may write inside `#[cfg(test)] mod` blocks

**Context:** Rust unit tests live colocated in production source files inside `#[cfg(test)] mod tests { … }` blocks — the canonical idiom, not an edge case. Original `@test` File Constraint forbade `src/` writes entirely, which forced `@make` to write both production code and tests in a single dispatch. This lost the RED→GREEN separation that TDD relies on.
**Decision:** `@test` may modify `src/**/*.rs` strictly inside `#[cfg(test)] mod <name> { … }` blocks. Every line outside such a block stays read-only.
**Alternatives:** keep the restriction; write all unit-level tests as integration tests.
**Consequences:** TDD works for module tests as well as integration tests. The previous Phase 6 file gate (path-based `git status` snapshot diff) is removed — with `@test` now legitimately writing inside `src/`, a path-based gate proves nothing. Constraint is now enforced by the prompt rule, the diff being human-reviewable, and `@check` flagging production-code drift in Phase 8.

### ADR-9 (2026-05-08) — Phase 5.5 task-split review by `@check`

**Context:** `ppries`' README mentioned `@check` reviewing the task split for completeness, but the gist's `workflow.md` never implemented it. Without a split-review gate, an over- or under-split task surfaced only at Phase 8 final review — after expensive `@test` and `@make` dispatches had already run on a broken split.
**Decision:** new Phase 5.5 dispatches `@check` against `plan.md` + every `task-N.md` to evaluate the split against five questions: coverage, no overlap, single-purpose, integration contracts, testable AC. Max 2 cycles; BLOCK routes back to Phase 4 (plan itself doesn't decompose).
**Alternatives:** status quo (catch at Phase 8); orchestrator self-check.
**Consequences:** one extra `@check` dispatch per run. `@simplify` is not involved at this phase — split review is structural, not complexity. Cheaper failure modes for over-/under-split tasks.

### ADR-10 (2026-05-08) — `@pm` is single-mode (filesystem only)

**Context:** `@pm` had two read modes — `git show <ref>:TODO.md` (read-only) and filesystem (read/write). Git-ref mode existed for the bare-repo flow that ADR-3 retired. After ADR-3, the workflow always used filesystem mode; git-ref mode was dead weight that still added bash permissions and doc surface.
**Decision:** remove git-ref mode. `@pm` has no bash access. Ad-hoc historical reads (`git show main:TODO/GAL-39.md`) are out of scope — the user runs them directly.
**Alternatives:** keep dual-mode; document the separation more clearly.
**Consequences:** simpler agent. One less permission allowlist to maintain. Workflow's "(live filesystem mode)" qualifier dropped from Phase 2 / Phase 9 / Failure handler.

### ADR-11 (2026-05-08) — Phase 9 files follow-ups as TODO sub-issues

**Context:** unresolved items (pre-existing bugs out of scope, blocked review findings, future-seam notes) were recorded only in `summary.md` — per-run, untracked, overwritten on the next run, read by nobody since the user has walked away.
**Decision:** Phase 9 has a `### File Follow-ups` step that dispatches `@pm` to create new TODO sub-issues for tracked-worthy items. Each new issue has `parent: <ISSUE_ID>`, status `Todo`, and an appropriate label (`bug` / `followup` / `tech-debt`). `@simplify` advisories that the orchestrator chose not to act on stay in the summary as records, not filed.
**Alternatives:** leave items in summary; create as top-level issues (would need a README.md category, which can't be picked at unattended runtime).
**Consequences:** unresolved items become tracked work. Sub-issue routing avoids the README-category problem. The follow-up files commit alongside the worked-issue update in a single `chore(todo): …` commit.
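
The filing rule can be sketched as a lookup from the kind of unresolved item to a label. The kind names and the dictionary shape are hypothetical; the real sub-issue schema lives in `agents/pm.md` and is not reproduced here.

```python
from typing import Optional

# Hypothetical item kinds mapped to the labels named in ADR-11.
LABELS = {
    "pre_existing_bug": "bug",
    "unresolved_review_blocker": "followup",
    "not_testable_future_seam": "tech-debt",
}


def follow_up(kind: str, parent_id: str, title: str) -> Optional[dict[str, str]]:
    """Return the fields for a new sub-issue, or None if nothing is filed."""
    label = LABELS.get(kind)
    if label is None:
        # e.g. a @simplify advisory: recorded in the summary, not filed.
        return None
    return {"parent": parent_id, "status": "Todo", "label": label, "title": title}
```

Returning `None` for unlisted kinds keeps "records, not work" items out of the tracker by default.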

### ADR-12 (2026-05-08) — Phase 7 mid-implementation escalation

**Context:** Phase 7's escalation rule was gated on `@make` flagging concerns *during entry validation* (the RED check before implementing). When `@make` got past entry validation, started implementing, and then ground for 2-3 attempts because the test demanded impossible production code, the orchestrator had no documented route — it would re-dispatch `@make` with marginal context tweaks instead of recognizing the diagnosis as test-architecture failure.
**Decision:** split Phase 7's escalation into entry-validation and mid-implementation paths. `@make` reports `escalate: test_design` when its iteration limit is reached and the test seems to demand impossible / unreasonable code. Both paths route through `@check` (test diagnosis) → `@test` (redesign) → fresh `@make` dispatch. Max 2 escalation cycles before reverting to Phase 3 plan revisit.
**Alternatives:** status quo; let `@make` modify test files itself.
**Consequences:** faster recovery from test-design errors. Bounded loop prevents thrashing. `@make.md` Iteration Limits section gains a new red-flag class. *Superseded in part by ADR-19 (unified diagnosis path).*

### ADR-13 (2026-05-08) — Run-level `plan_rework_remaining` counter

**Context:** several routes return control to an upstream phase when downstream signals reveal the upstream artifact was wrong: P5.5-BLOCK→P4 (split doesn't decompose), P7-escalation-exhaustion→P3 (test/code thrash exceeded its bound), P8-plan-level→P3 (final review exposes a plan defect). Each upstream phase has its own per-loop cycle cap (P4 max 3, P5.5 max 2, etc.), but those caps reset on every re-entry — so a run could in principle thrash P3↔P4↔P5.5↔P3 indefinitely without violating any local rule.
**Decision:** introduce one run-level counter, `plan_rework_remaining`, default value `1`. It decrements on every transition where downstream signal forces upstream rework: `P5.5 BLOCK → P4`, `P7 escalation exhausted → P3`, `P8 plan-level finding → P3`. When the counter is `0` and another such transition fires, abort to the Failure Handler instead of re-entering. Per-phase cycle caps are unchanged.
**Alternatives:** (a) a global `max_subagent_dispatches` budget — over-engineered for the specific failure mode; (b) document the resets as intentional and rely on convergence detection — leaves the bug present.
**Consequences:** at most two plan attempts per run (the initial plan plus one revision). A budget-exhaustion abort is distinct in cause from earlier-phase aborts: the cleanup is the same (per ADR-14), but the summary explains *which* downstream signal exhausted the budget.
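
A minimal sketch of the counter, assuming the orchestrator tracks it as plain in-memory state. The class and exception names are hypothetical.

```python
class PlanReworkExhausted(Exception):
    """Raised when an upstream-rework transition fires with no budget left."""


class ReworkBudget:
    def __init__(self, remaining: int = 1) -> None:
        self.remaining = remaining  # plan_rework_remaining, default 1

    def spend(self, transition: str) -> None:
        """Record one upstream-rework transition or abort the run."""
        if self.remaining == 0:
            # The exception message names the transition for the summary.
            raise PlanReworkExhausted(transition)
        self.remaining -= 1
```

With the default of 1, the first `spend(...)` succeeds and the second raises, which gives the "initial plan plus one revision" bound.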

### ADR-14 (2026-05-08) — Workflow is non-resumable

**Context:** Phase 9 has multiple sub-steps (code commit → `@pm` status update → file follow-ups → TODO commit → summary). Crashing between any two sub-steps leaves the worktree in a state that earlier docs called "partial." The original Failure Handler did not flip status back, did not recognize partial-Phase-9 separately from earlier-phase crashes, and re-running `/workflow` after a crash could append new comments and re-do work indefinitely.
**Decision:** declare the workflow non-resumable. On any failure (Failure Handler invocation), the recovery procedure is: `git worktree remove` the failed worktree, delete the feature branch, re-create the worktree from `$BASE_BRANCH`, then re-run `/workflow`. Document this explicitly in the Failure Handler section. The throwaway-worktree model means there is no in-place resume state to corrupt — the user discards the worktree and starts fresh.
**Alternatives:** (a) smarter Failure Handler that cleans up partial state idempotently; (b) transactional Phase 9 via a state file; (c) idempotent sub-steps so re-runs auto-resume.
**Consequences:** simplest possible recovery model. Phase 9 sub-step ordering doesn't need to be defended against partial failures — partial state is acceptable because the recovery is "discard everything and re-run." User-initiated cancellation (Ctrl-C) follows the same procedure.
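
The recovery procedure maps onto three git invocations. The sketch below only builds the argument lists and does not run them. The function name is hypothetical, and `--force` is an assumption added because a failed worktree is likely to be dirty. The commands would be run from the main checkout, not from inside the worktree being removed.

```python
def cleanup_commands(worktree_path: str, branch: str, base_branch: str) -> list[list[str]]:
    """Argument lists for discarding a failed run and starting fresh."""
    return [
        # Remove the failed worktree (forced, since it may hold uncommitted edits).
        ["git", "worktree", "remove", "--force", worktree_path],
        # Delete the feature branch the run was working on.
        ["git", "branch", "-D", branch],
        # Re-create the branch and its worktree from the base branch.
        ["git", "worktree", "add", "-b", branch, worktree_path, base_branch],
    ]
```

After these three steps the user re-runs `/workflow` in the fresh worktree.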

### ADR-15 (2026-05-08) — `@simplify` is advisory at every gate

**Context:** the Phase 4 routing matrix used to read "Either reviewer NEEDS WORK → re-dispatch the loop," giving `@simplify` veto power equivalent to `@check`'s. Phase 8's matrix said `@simplify` was advisory only ("Record in summary's 'Advisory notes (not filed)'"). Same agent, two different powers.
**Decision:** `@simplify` is advisory at every gate. Its findings are recorded in the run summary; they never force a re-dispatch loop. `@check` is the only reviewer with veto authority (NEEDS WORK / BLOCK).
**Alternatives:** (a) make `@simplify` blocking everywhere — too heavy for a heuristic agent prone to false positives; (b) keep the asymmetry and document a principle — fragile.
**Consequences:** uniform model — `@check` enforces correctness, `@simplify` advises on shape. Phase 4 review loops only run on `@check` findings; `@simplify` complexity flags get logged in the summary like at Phase 8. The user can manually promote a `@simplify` finding if it matters.
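
The resulting review loop can be sketched as follows, with the two reviewers passed in as plain callables. The signatures and return labels are hypothetical; the cycle cap and the convergence rule come from the routing matrix.

```python
from typing import Callable


def review_loop(
    check: Callable[[int], tuple[str, str]],  # cycle -> (verdict, finding)
    simplify: Callable[[int], str],           # cycle -> advisory text
    max_cycles: int = 3,
) -> tuple[str, list[str]]:
    """Run the review loop; only @check's verdict affects control flow."""
    advisories: list[str] = []
    seen: set[str] = set()
    for cycle in range(1, max_cycles + 1):
        advisories.append(simplify(cycle))  # recorded, never blocks
        verdict, finding = check(cycle)
        if verdict == "ACCEPTABLE":
            return "proceed", advisories
        if finding in seen:
            return "converged", advisories  # same finding twice
        seen.add(finding)
        # The orchestrator would edit the artifact in place here.
    return "cycles_exhausted", advisories
```

Whatever `simplify` returns, the loop outcome is decided by `check` alone.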

### ADR-16 (2026-05-08) — Phase 8 fix specs go to disk

**Context:** when Phase 8 review surfaced a behavioral or production-code finding, the orchestrator would "build a new `@make` task spec from the finding" and dispatch it inline. That violates ADR-6's invariant (run artifacts on disk, no inline paraphrase) — and exactly when it matters most, because Phase 8 has up to 3 review cycles and the same finding can re-dispatch.
**Decision:** Phase 8 fix dispatches write a new artifact `$RUN_DIR/task-fix-<N>.md` (1-indexed within the Phase 8 cycle) before dispatching `@make`. Same Dispatch Hygiene rules as Phase 5 task specs, same `test -f` verification. Cosmetic findings (orchestrator fixes directly per workflow.md) skip the file — only `@make`-dispatched findings get one.
**Alternatives:** (a) inline in the dispatch prompt with an ADR-6 footnote — erodes the invariant for the highest-risk dispatch class; (b) edit the original `task-N.md` — muddies the audit trail of an already-met spec.
**Consequences:** ADR-6's invariant holds end-to-end. Phase 8 cycles re-dispatch against the same on-disk file (mid-loop edits in place), eliminating paraphrase drift across review cycles.

### ADR-17 (2026-05-08) — Phase 5.5 BLOCK protocol

**Context:** Phase 5.5 BLOCK ("plan does not decompose cleanly") used to route "back to Phase 4 with `@check`'s finding," but `@check` at 5.5 evaluated the *split*, not the plan; its finding may not map cleanly to a plan edit. Re-entering P4 with the same `plan.md` and a finding tagged on the prompt asks the wrong question.
**Decision:** on P5.5 BLOCK, the orchestrator translates the split-level finding into a concrete `plan.md` edit (e.g. "the plan conflates structural and runtime work; split into two milestones"), saves the edit, decrements `plan_rework_remaining` (per ADR-13), and re-dispatches Phase 4 reviewers against the *revised* plan. P4 reviewers see a genuinely different plan.
**Alternatives:** (a) re-dispatch P4 unchanged with finding attached — burns reviewers on a known-broken plan; (b) treat P5.5 BLOCK as terminal — too strict, we have the rework budget for one revisit.
**Consequences:** P5.5 BLOCK is an effective signal. The orchestrator's plan-edit step is mandatory; skipping it is a routing error. Run-level rework budget bounds the loop.

### ADR-18 (2026-05-08) — Phase 8 NOT_TESTABLE manifest

**Context:** Phase 6 routes NOT_TESTABLE tasks through `@check` for sign-off, then dispatches `@make` without tests. Phase 8 reviews the diff but has no signal that "this change has no test because `@test` claimed it untestable." If `@check` at P6 was wrong, untested code ships.
**Decision:** Phase 8's dispatch prompt includes a "Tasks completed without tests (NOT_TESTABLE)" section listing each task ID, the `@test` justification, and the `@check` sign-off rationale. Reviewers explicitly evaluate "does the justification still hold given the final diff?" If a reviewer pushes back, routing follows the normal Phase 8 finding rules.
**Alternatives:** (a) double-up `@check` + `@simplify` at P6 NOT_TESTABLE granting — doubles dispatch cost without targeting the actual gap; (b) restrict NOT_TESTABLE to a fixed taxonomy — won't generalize across languages; (c) reject NOT_TESTABLE entirely — ignores legitimate cases.
**Consequences:** pure plumbing change. P8 reviewers gain visibility into the bypass without new agents or new authority.
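
Since this is a plumbing change, a sketch of the rendering step may help. The entry keys and the bullet layout are hypothetical; only the section title and the reviewer question are taken from the decision above.

```python
HEADING = "Tasks completed without tests (NOT_TESTABLE)"
QUESTION = "Does the justification still hold given the final diff?"


def render_manifest(entries: list[dict[str, str]]) -> str:
    """Render the NOT_TESTABLE section of the Phase 8 dispatch prompt."""
    if not entries:
        return ""  # nothing bypassed tests, so no section is emitted
    lines = [HEADING, ""]
    for entry in entries:
        lines.append(f"- {entry['task_id']}")
        lines.append(f"  - @test justification: {entry['justification']}")
        lines.append(f"  - @check sign-off: {entry['sign_off']}")
    lines += ["", QUESTION]
    return "\n".join(lines)
```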

### ADR-19 (2026-05-08) — Unified Implementation Incomplete diagnosis path

**Context:** after ADR-12, an Implementation Incomplete report from `@make` could take three paths: the entry-validation flag, the mid-implementation `escalate: test_design` flag, or no flag (re-dispatch with `@check` notes once, escalate after a second failure). Three paths converging on the same destination (`@check` test-diagnosis → `@test` redesign or `@make` re-dispatch) added matrix surface and obscured the routing.
**Decision:** every Implementation Incomplete from `@make` routes through `@check` test-diagnosis-first. `@check` returns one of three verdicts — `test_design` (route to `@test` redesign), `production_logic` (re-dispatch `@make` with `@check`'s notes), or `split_needed` (per ADR-21). `@make`'s self-diagnosis flag becomes a *hint* for `@check`, not a control-flow input for the orchestrator.
**Alternatives:** (a) keep three paths, tighten what `@check` reviews in each — preserves the surface area; (b) push burden to `@make` — orchestrator still needs to gate via `@check`.
**Consequences:** routing logic shrinks. Matrix has fewer rows. ADR-12's split into two paths is partially superseded: the *escalation diagnosis* is unified, while the iteration limit (max 2 cycles) is preserved.
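
The unified path reduces the orchestrator's decision to a function of `@check`'s verdict and the number of escalation cycles already used. The step labels are hypothetical, and checking `split_needed` before the cycle cap is an assumption; the text above does not fix that order.

```python
from enum import Enum


class Verdict(Enum):
    TEST_DESIGN = "test_design"
    PRODUCTION_LOGIC = "production_logic"
    SPLIT_NEEDED = "split_needed"


def next_step(verdict: Verdict, cycles_used: int, max_cycles: int = 2) -> str:
    """Pick the orchestrator's next action after an Implementation Incomplete."""
    if verdict is Verdict.SPLIT_NEEDED:
        return "failure_handler"
    if cycles_used >= max_cycles:
        return "plan_revisit"  # back to Phase 3, subject to the rework budget
    if verdict is Verdict.TEST_DESIGN:
        return "dispatch_test_then_make"
    return "redispatch_make"
```

`@make`'s own flag does not appear as a parameter: it would be forwarded to `@check` as a hint rather than consulted here.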

### ADR-20 (2026-05-08) — Phase 1 working-tree cleanliness check

**Context:** Phase 1 verified non-bare repo, branch identity, base branch, issue file presence — but not that the working tree was clean. Stale uncommitted edits would be swept into the Phase 9 commit (workflow.md stages "code changes only" but doesn't distinguish *which* code) or a `wip:` failure commit.
**Decision:** Phase 1 runs `git status --porcelain`; if non-empty, stop with: "Working tree must be clean. Commit or stash uncommitted changes before running the workflow."
**Alternatives:** (a) capture initial dirty state, stage only files modified by the workflow at Phase 9 — error-prone baseline tracking; (b) document the requirement, don't enforce — `// TODO: don't forget` in design-doc form.
**Consequences:** matches the ADR-14 throwaway-worktree model. One additional sanity-check line. User's "but I have manual edits I want the workflow to build on" case is solved by them committing those edits first, which is what they should do anyway.
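
A sketch of the check, assuming it runs via Python's `subprocess` rather than the orchestrator's own bash tool. The function names are hypothetical; the error message is the one quoted in the decision.

```python
import subprocess

MESSAGE = (
    "Working tree must be clean. Commit or stash uncommitted changes "
    "before running the workflow."
)


def is_clean(porcelain_output: str) -> bool:
    """True when `git status --porcelain` printed no entries."""
    return porcelain_output.strip() == ""


def check_clean_tree(worktree: str) -> None:
    """Stop the run if the worktree has uncommitted or untracked changes."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=worktree,
        check=True,
        capture_output=True,
        text=True,
    )
    if not is_clean(result.stdout):
        raise SystemExit(MESSAGE)
```

Splitting the parsing into `is_clean` keeps the decision testable without a git repository.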

### ADR-21 (2026-05-08) — One-task-per-run model

**Context:** the workflow originally executed N tasks per run, sequentially through Phase 7. That introduced cross-task regression risk (task 4 breaks task 1's tests, found N tasks late), big-diff Phase 8 reviews (multi-day branches accumulate thousands of diff lines that hit reviewer context limits silently), and the mid-flight task-split problem (when `@make` discovers task N is over-scoped, no documented route to re-split). It also coupled the workflow's success to "all N tasks complete," when in practice an issue worth one good commit shouldn't depend on unrelated downstream work succeeding.
**Decision:** every workflow run executes **exactly one task** through Phases 6–8. Phase 5 still splits the plan into N tasks via the Split Heuristic. If N=1, proceed normally. If N>1, the orchestrator dispatches `@pm` to file tasks 2…N as TODO sub-issues *before* Phase 6 starts, and only task 1 runs through Phases 6–8. If task 1 itself reports `split_needed` mid-Phase-7, abort to the Failure Handler (Q19a: in the one-task-per-run model task-1 is the only task in the run, so no feature AC have been satisfied; recovery is the standard non-resumable cleanup from ADR-14).
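
The routing in this decision can be sketched as follows. This is an illustration only: `RunPlan` and `plan_run` are hypothetical names, and tasks are represented as plain strings rather than full task specs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunPlan:
    """Hypothetical result of routing the Phase 5 split."""

    task_to_run: str                # the only task that enters Phases 6-8
    tasks_to_file: tuple[str, ...]  # handed to @pm before Phase 6 starts


def plan_run(tasks: list[str]) -> RunPlan:
    """Route a Phase 5 task split under the one-task-per-run model.

    The first task runs in this workflow invocation; every remaining
    task is filed as a TODO sub-issue and left for a later run.
    """
    if not tasks:
        raise ValueError("Phase 5 must produce at least one task")
    return RunPlan(task_to_run=tasks[0], tasks_to_file=tuple(tasks[1:]))
```

With a single task, `tasks_to_file` is empty and no `@pm` dispatch is needed; with more, everything after the first task is filed rather than executed.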

The model carries five sub-decisions:

1. **Sub-issue body schema for split-time filings:** rich seed body that lets a fresh `/workflow` invocation re-plan and implement without seeing siblings or the original `plan.md`. Includes task description + AC + Code Context + Integration Contracts (declared in frontmatter as `depends-on: [...]`) + relevant slice of `plan.md` + Test Design section if present + a "Discovered during run on `$BRANCH_NAME` for parent issue `$ISSUE_ID`" attribution paragraph.
2. **Phase 5.5 review questions strengthen to six**: coverage, no overlap, single-purpose, integration contracts (with stronger bar — must be self-contained for cross-session use), testable AC, and *self-containment* (is each task spec runnable as a standalone `/workflow` invocation?). Self-containment is the new load-bearing question because each filed sub-issue runs in isolation.
3. **Split Heuristic recalibration**: keep the existing mechanical thresholds (>2 concerns, >50 lines across >2 files, mixes structural + runtime, etc.) but add a "default to no split" tiebreaker — when in doubt, do not split, because splitting now fans out across user sessions with full orchestration overhead per sub-issue.
4. **Parent issue status is AC-driven**: Phase 9's existing AC checkbox logic (workflow.md ticks the checkboxes of satisfied AC) determines status. If all parent AC are checked → `Done`; if some remain unchecked → stays `In Progress` with the filed sub-issues covering the remaining work.
5. **`depends-on:` frontmatter and Phase 1 enforcement**: `@pm` schema gains a `depends-on: [<ID>, ...]` list. Phase 1 sanity check refuses to start if any listed dependency is not `Done`. Hard block — soft-warn means the user (who has walked away) doesn't see the warning until later.
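
Sub-decisions 4 and 5 reduce to two small predicates. The sketch below is hypothetical: the function names and the `statuses` mapping are illustrative, and two behaviours are assumptions not stated above (an unknown dependency ID counts as unmet, and a parent with no AC stays `In Progress`).

```python
def unmet_dependencies(depends_on: list[str], statuses: dict[str, str]) -> list[str]:
    """Return the `depends-on:` IDs that are not yet `Done`.

    Assumption: an ID missing from `statuses` is treated as unmet,
    which keeps the Phase 1 check a hard block.
    """
    return [dep for dep in depends_on if statuses.get(dep) != "Done"]


def parent_status(ac_checked: list[bool]) -> str:
    """Derive the parent issue status from its AC checkboxes.

    Assumption: an empty AC list stays `In Progress` rather than
    being vacuously `Done`.
    """
    if ac_checked and all(ac_checked):
        return "Done"
    return "In Progress"
```

Phase 1 would refuse to start whenever `unmet_dependencies` returns a non-empty list, and Phase 9 would write whatever `parent_status` returns.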

**Auto-resolved problems:**
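- Mid-flight task split (formerly Q2 in Open Questions, ADR-12's adjacent gap): collapses into "abort to the Failure Handler and re-plan." The sibling tasks are already filed as sub-issues before Phase 6, so nothing is left to re-split mid-run.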
- Big-diff Phase 8 reviews: one task = bounded diff (roughly 50 lines, the Split Heuristic threshold), so the Phase 8 diff stays well within reviewer context limits.
- Cross-task regression within a run: no cross-task regressions possible inside a single-task run; subsequent sub-issue runs detect them at their own Phase 7 entry validation (which runs the project's test suite).
- Skip-P5.5-when-N=1 optimization: trivially satisfied — N=1 from Phase 6 onward in every run.

**Alternatives:** (a) keep N-task runs, add mid-flight re-splitting via P7→P5 re-entry — doesn't solve big-diff or cross-task regression; (b) keep N-task runs, accept the gaps — leaves three known-bad routes; (c) always one task per issue (skip Phase 5 entirely) — loses the planning-phase split heuristic that's catching legitimate over-scoping at design time.
**Consequences:** runs become shorter and more focused. Each commit/PR carries a bounded scope. Sub-issue fan-out becomes the primary scaling mechanism for multi-step work. `TODO/` sees more sub-issue files; `@pm`'s split-time filing path becomes a hot code path. Concurrent runs in different worktrees on the same repo become trivially safe because each worktree has its own `TODO/` checkout (file conflicts surface as standard git merge conflicts at integration time, not as mid-run race conditions).

---

## 6. Open Questions / Known Gaps

When a question gets answered, move it to the [Design decisions log](#5-design-decisions-log).

### Q1: Phase 5.5 review scope — does `@check` evaluate test-design soundness here?

Phase 5.5 reviews the **split**. Before ADR-21 that meant five questions (coverage, overlap, single-purpose, integration contracts, testable AC); ADR-21 added self-containment as the sixth. None of them explicitly evaluates whether the test approach implied by each task spec is sound. Such a check would partially overlap with Phase 4 (which has a plan-level Test Design section the reviewers evaluate). If a test-design error escapes Phase 4 and is encoded in a task spec, it surfaces at Phase 7 via the mid-impl escalation (ADR-12), but earlier detection might be cheaper. Open: should Phase 5.5 add "test approach for each task is sound" as a seventh review question, or is that scope creep into Phase 4 territory?

### ~~Q2~~: Mid-flight task split — *closed by ADR-21*

The one-task-per-run model collapses this question. When `@make` discovers task-1 is over-scoped, the unified diagnosis path (ADR-19) returns `split_needed` from `@check`, and the orchestrator aborts to the Failure Handler (no P5 re-entry, no sub-issue filing; the recovery is "discard worktree, re-plan from scratch"). Tasks 2…N are already filed as sub-issues at Phase 5.5 acceptance, so there is no "remaining tasks" cleanup to think about.

### ~~Q3~~: Phase 9 partial-commit rollback — *closed by ADR-14*

The workflow is non-resumable. Phase 9 partial states are addressed by the throwaway-worktree recovery procedure: discard the worktree, delete the branch, re-create from base, re-run. Phase 9 sub-step ordering doesn't need to defend against partial failures because the recovery is "discard everything and re-run."

### Q4: `@simplify` not involved at Phase 5.5 — is that the right call?

Phase 5.5 only dispatches `@check`. Rationale (ADR-9) is that split review is structural, not complexity. But `@simplify`'s lens — "what if we deleted this?" — could legitimately catch unnecessary tasks (e.g. a third task that adds an abstraction nothing else needs). With ADR-21's one-task-per-run pivot, this question gains a different angle: a `@simplify` flag on a sibling sub-issue at Phase 5.5 could prevent filing a wasteful sub-issue, which is more valuable than catching the same redundancy at Phase 8 of a future run. Open: is the cost of one more dispatch worth the catch, especially now that Phase 5.5 is the gate for sub-issue fan-out?

### Q5: Test-design loop bound vs plan-revisit threshold

ADR-12 sets max 2 cycles for the Phase 7 test-design escalation before reverting to Phase 3 plan revisit. The plan-review and final-review loops have max 3. Why the asymmetry? The test-design loop is more expensive per cycle (`@check` + `@test` + `@make` re-implement vs. just reviewers + plan edit), so 2 may be right. But the choice was made by feel, not measured. Open: is 2 the right number, or should it match Phase 4 / Phase 8 at 3?

### Q6: Sub-issue ordering in the parent's `## Sub-issues` list

ADR-21's split-time filing creates new sub-issues with `depends-on:` declarations, but the parent's `## Sub-issues` list (rendered by `@pm`) is currently flat. When dependencies form a chain (sub-issue 2 depends on 1), the user has to read the chain from each sub-issue's frontmatter. Open: should `@pm` render the parent's sub-issue list in dependency order, with a visible indicator (e.g. indentation or `↳`) for dependent items? Cosmetic but would speed up "what to run next" decisions.
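
To make the option concrete, here is one possible rendering, sketched with the standard-library `graphlib` module. `render_sub_issues` is a hypothetical helper, not part of `@pm` today, and the input mapping (sub-issue ID to its `depends-on:` list) is an assumed in-memory form of the frontmatter.

```python
from graphlib import TopologicalSorter


def render_sub_issues(depends_on: dict[str, list[str]]) -> list[str]:
    """Return markdown list lines for the parent's `## Sub-issues` section.

    Items appear after everything they depend on; items with at least
    one sibling dependency are prefixed with the `↳` indicator.
    """
    # Only order against siblings; dependencies on issues outside this
    # list are ignored for rendering purposes (an assumption).
    graph = {
        issue: [dep for dep in deps if dep in depends_on]
        for issue, deps in depends_on.items()
    }
    lines = []
    # static_order() yields dependencies before their dependents and
    # raises graphlib.CycleError if the declarations form a cycle.
    for issue in TopologicalSorter(graph).static_order():
        marker = "↳ " if graph[issue] else ""
        lines.append(f"- {marker}{issue}")
    return lines
```

For a simple chain the output is the chain itself, root first, with each dependent item marked. Independent items have no defined relative order in this sketch, so a real implementation would likely add a tiebreak such as sorting by ID.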

### Q7: Concurrent-worktree edge case — sub-issue ID collisions

Two parallel runs in different worktrees, each filing sub-issues, can both pick the same next ID (e.g. both pick `GAL-42` because both saw `GAL-41` as the highest at start). On merge, git surfaces this as a conflict over `TODO/GAL-42.md` content (two different files staked on the same name). Recoverable but annoying. Open: should `@pm`'s ID generation use a strategy that's safer under concurrent runs (e.g. timestamp suffix, branch-prefix, content-addressable), or accept the merge-conflict-on-collision cost given the one-user assumption?
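
The collision and one of the candidate mitigations (branch-prefix, here applied as a suffix) can be sketched as follows. Both functions are hypothetical illustrations, not `@pm`'s actual ID generation, and the suffixed ID format is an assumption made for this example.

```python
import re


def next_sequential_id(existing: list[str], prefix: str = "GAL") -> str:
    """Pick max+1 from the IDs visible in this worktree.

    Two worktrees that start from the same `TODO/` state compute the
    same answer, which is the collision described above.
    """
    pattern = re.compile(rf"{re.escape(prefix)}-(\d+)(?:-.*)?")
    numbers = []
    for name in existing:
        match = pattern.fullmatch(name)
        if match:
            numbers.append(int(match.group(1)))
    return f"{prefix}-{max(numbers, default=0) + 1}"


def branch_scoped_id(existing: list[str], branch: str, prefix: str = "GAL") -> str:
    """Append a slug of the branch name so parallel runs diverge."""
    slug = re.sub(r"[^a-z0-9]+", "-", branch.lower()).strip("-")
    return f"{next_sequential_id(existing, prefix)}-{slug}"
```

Under this scheme two runs on differently named branches would file distinct files, at the cost of longer, less uniform IDs. Whether that trade is worth it under the one-user assumption is exactly the open question.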

---