nixcfg/config/opencode/workflow-design.md
Harald Hoyer af6481a5a7 feat(opencode): one-task-per-run model + 9 routing fixes (ADRs 13-21)
Captures the design grilling outcome. Adds ADRs 13-21 covering:
- run-level plan_rework_remaining counter to bound P3<->P5.5/P7/P8 thrash
- non-resumable workflow with throwaway-worktree recovery procedure
- @simplify advisory at every gate (not just Phase 8)
- Phase 8 fix specs go to disk as task-fix-N.md (preserves ADR-6)
- Phase 5.5 BLOCK protocol: orchestrator edits plan, decrements counter, re-enters P4
- Phase 8 NOT_TESTABLE manifest in reviewer prompt
- unified Implementation Incomplete diagnosis (test_design / production_logic / split_needed)
- Phase 1 working-tree cleanliness + depends-on enforcement
- one-task-per-run pivot: Phase 5 still splits N tasks, only task-1 runs;
  tasks 2..N filed as sub-issues with rich seed bodies; split_needed at P7
  aborts to Failure Handler (one-task-per-run = no salvageable prior work)

Auto-resolves big-diff Phase 8 reviews, cross-task regression-within-run, and
mid-flight task-split routing. Rewrites routing matrix and three Mermaid
diagrams; updates @pm (depends-on frontmatter, split-time filing), @check
(third diagnosis verdict), @make (escalate: split_needed flag).
2026-05-08 13:02:54 +02:00


Workflow Design

1. Purpose

This document is the design rationale and decision log for the multi-agent workflow. The operational rules — what the orchestrator does, in what order, with what guardrails — live in commands/workflow.md and the agent files under agents/. This document is where we discuss changes before they land in those files.

Intended flow:

  1. A new idea, gap, or failure mode comes up (often from a real run).
  2. Discuss in this document — capture context, options, trade-offs.
  3. When a decision is reached, update commands/workflow.md and/or the relevant agent file.
  4. Record the decision in the Design decisions log below.

The operational files stay terse and procedural. The "why" lives here.


2. Cast & Responsibilities

One orchestrator, five subagents. The orchestrator runs in agent: build mode; the subagents are defined as separate agent files under config/opencode/agents/.

| Actor | File | Role | Boundary |
|---|---|---|---|
| Orchestrator | commands/workflow.md | Plans, dispatches, merges findings, edits artifacts under .workflow/, commits. | Does not write production code, write tests, or play any subagent's role. |
| @check | agents/check.md | Reviews plans / task splits / code for risks, correctness, testability. | Read-only: no write / edit / bash. |
| @simplify | agents/simplify.md | Reviews for unnecessary complexity. Advisory only. | Read-only. |
| @test | agents/test.md | Writes failing tests for a task spec, verifies RED. | May modify test files / #[cfg(test)] mod blocks. Sandboxed bash. |
| @make | agents/make.md | Implements a single task spec. Verifies acceptance criteria. | May modify files listed in the task spec. Sandboxed bash; no git / network / cd. |
| @pm | agents/pm.md | Reads / updates TODO/ issue files. | May modify only TODO/ contents. No bash. |

Permission boundaries are enforced per agent. The orchestrator (in agent: build mode) has full edit/bash capabilities, which is precisely why it must not act as the subagents — the agent files are where the limits live.


3. Flow Diagrams

3.1 Phase pipeline

High-level happy path with the major escalation arms. The workflow runs one task per invocation (ADR-21): Phase 5 produces N task files; if N>1, tasks 2…N are filed as sub-issues and only task 1 runs through Phases 6–8.

```mermaid
flowchart TD
    P1["Phase 1: Sanity Check<br/>incl clean tree + depends-on"]
    P2["Phase 2: Issue Context<br/>pm reads TODO/ID.md"]
    P3["Phase 3: Plan<br/>write plan.md"]
    P4{"Phase 4: Review Plan<br/>check blocking, simplify advisory<br/>max 3 cycles"}
    P5["Phase 5: Split into Tasks<br/>write task-N.md"]
    P55{"Phase 5.5: Review Split<br/>check, 6 questions<br/>max 2 cycles"}
    P5F["File tasks 2..N as sub-issues<br/>only when N more than 1"]
    P6["Phase 6: Write Tests<br/>test, stub-first make"]
    P7["Phase 7: Implement<br/>make, single task"]
    P7E{"Implementation Incomplete<br/>check diagnoses<br/>max 2 cycles"}
    P7F["split_needed: Failure Handler<br/>(discard worktree, re-run)"]
    P8{"Phase 8: Final Review<br/>check blocking, simplify advisory<br/>max 3 cycles"}
    P9["Phase 9: Commit + TODO + Follow-ups + Summary<br/>parent status AC-driven"]

    P1 --> P2 --> P3 --> P4
    P4 -->|ACCEPTABLE| P5 --> P55
    P4 -->|NEEDS WORK or BLOCK| P3
    P55 -->|ACCEPTABLE| P5F --> P6 --> P7
    P55 -->|NEEDS WORK| P5
    P55 -->|BLOCK plan-level| P3
    P7 --> P8
    P7 -.->|Implementation Incomplete| P7E
    P7E -->|test_design or production_logic| P7
    P7E -.->|split_needed| P7F
    P7E -.->|2 cycles exhausted| P3
    P7F --> ABORT([Failure Handler])
    P8 -->|ACCEPTABLE| P9
    P8 -->|production-code finding| P7
    P8 -->|test-design finding| P7E
    P8 -->|plan-level finding| P3
    P9 --> END([Done])
```

Run-level cap: plan_rework_remaining (default 1, ADR-13) decrements on every P5.5-BLOCK→P4, P7-escalation-exhaustion→P3, and P8-plan-level→P3 transition. Exhausted counter aborts to the Failure Handler.

3.2 Phase 7 escalation loop

The pattern when @make cannot reach GREEN. Unified diagnosis path (ADR-19): every Implementation Incomplete routes through @check test-diagnosis-first; @check returns one of three verdicts.

```mermaid
stateDiagram-v2
    [*] --> Dispatched: orchestrator dispatches make
    Dispatched --> EntryCheck: run tests verify RED
    EntryCheck --> Implementing: failure code matches handoff
    EntryCheck --> CheckDiag: Implementation Incomplete
    Implementing --> GreenReached: tests pass within 2-3 attempts
    Implementing --> CheckDiag: Implementation Incomplete
    CheckDiag --> TestRedesign: verdict test_design
    CheckDiag --> Dispatched: verdict production_logic
    CheckDiag --> FailureHandler: verdict split_needed
    TestRedesign --> Dispatched: test fixes fresh entry validation
    Dispatched --> PlanRevisit: 2 escalation cycles exhausted
    GreenReached --> [*]
    FailureHandler --> [*]: discard worktree, re-run
    PlanRevisit --> [*]: back to Phase 3 if rework budget intact
```

3.3 Issue lifecycle

How TODO entries move through statuses. In the one-task-per-run model (ADR-21), a single workflow invocation may file multiple sub-issues mid-run, and the parent's final status is AC-driven, not run-driven.

```mermaid
stateDiagram-v2
    [*] --> Todo: issue file created
    Todo --> InProgress: Phase 2 workflow starts
    InProgress --> Done: Phase 9 - all parent AC checked
    InProgress --> InProgress2: Phase 9 - some parent AC remain
    InProgress --> Todo: workflow fails, failure handler adds comment

    note right of InProgress2
        Parent stays In Progress when sub-issues
        cover the unmet AC. User runs sub-issues
        in subsequent /workflow invocations.
    end note

    note right of InProgress
        Sub-issues filed during a run carry:
        - parent: ISSUE_ID, status: Todo
        - label: bug, followup, tech-debt, or split-from-run
        - depends-on: [...] for cross-sub-issue ordering
        - rich seed body for split-time filings (ADR-21)
    end note

    Done --> [*]
    InProgress2 --> [*]
```

4. Routing Matrix

Every observed (phase, signal) → action. Empty cells are gaps. Walking this table is the cheap way to spot routing issues.

| Phase | Signal source | Signal | Action |
|---|---|---|---|
| 1 | Sanity checks | Bare repo / detached HEAD / missing `TODO/<ID>.md` / branch == base | Stop with error |
| 1 | Sanity checks | Working tree dirty (git status --porcelain non-empty) | Stop with error (ADR-20) |
| 1 | Sanity checks | depends-on: issue not in Done status | Stop with error (ADR-21) |
| 2 | @pm | Issue not found | Stop with error |
| 2 | @pm | Status is Todo | Flip to In Progress; propagate to README.md / parent's Sub-issues |
| 3 | Orchestrator | Plan drafted | Apply Dispatch Hygiene; write plan.md; verify test -f |
| 4 | @check | ACCEPTABLE (regardless of @simplify) | Proceed to Phase 5 |
| 4 | @check | NEEDS WORK | Edit plan.md in place; re-dispatch (max 3 cycles) |
| 4 | @check | BLOCK | Edit plan.md addressing the finding; re-dispatch |
| 4 | @simplify | Any verdict (ADR-15) | Advisory only; record in summary; never blocks Phase 4 progression |
| 4 | Reviewers | Same @check finding twice | Convergence detected; stop loop early |
| 4 | Reviewers | Unresolved after 3 cycles | Document blockers in summary; proceed |
| 5 | Orchestrator | Tasks drafted | Apply Dispatch Hygiene; write each task-N.md; verify test -f for every N |
| 5.5 | @check | ACCEPTABLE, N=1 | Skip P5.5 entirely (ADR-21); proceed to Phase 6 (degenerate split) |
| 5.5 | @check | ACCEPTABLE, N>1 | File tasks 2…N as sub-issues with rich seed bodies (ADR-21) via @pm; proceed to Phase 6 with task-1 only |
| 5.5 | @check | NEEDS WORK | Edit task-N.md in place; re-dispatch (max 2 cycles) |
| 5.5 | @check | BLOCK plan-level | Edit plan.md addressing the BLOCK finding; decrement plan_rework_remaining; re-enter Phase 4 (ADR-17) |
| 5.5 | Run-level | plan_rework_remaining exhausted | Abort to Failure Handler (ADR-13) |
| 6 | @test | TESTS_READY + escalate_to_check: false | Proceed to Phase 7 |
| 6 | @test | TESTS_READY + escalate_to_check: true | @check light review → @test fixes → forward |
| 6 | @test | NOT_TESTABLE (general) | @check sign-off; task goes to @make without tests; record in NOT_TESTABLE manifest for Phase 8 (ADR-18) |
| 6 | @test | NOT_TESTABLE: Missing testability seam | @make adds the seam; re-run @test |
| 6 | @test | BLOCKED | Investigate; may need spec or plan revision |
| 6 | @test (stub-first) | All tests pass with zero todo!() panics | Reject as structural-only tests; route back to @test to rewrite |
| 7 | @make | Implementation Complete | Proceed to Phase 8 |
| 7 | @make | Implementation Incomplete (any flag or no flag) | Route through @check test-diagnosis-first (ADR-19); orchestrator follows @check's verdict |
| 7 | @check diagnosis | test_design | Dispatch @test to redesign tests; fresh @make re-attempt |
| 7 | @check diagnosis | production_logic | Re-dispatch @make with @check's production-side notes |
| 7 | @check diagnosis | split_needed | Abort to Failure Handler (ADR-21 / Q19a). In the one-task-per-run model task-1 is the only task; no AC have been satisfied; recovery is "discard worktree, re-plan from scratch." @pm adds a comment recording the diagnosis. |
| 7 | Escalation loop | 2 cycles exhausted | Decrement plan_rework_remaining; back to Phase 3 (plan revisit) |
| 8 | @check | ACCEPTABLE | Proceed to Phase 9 |
| 8 | @check | BLOCK / behavioral / production-code finding | Write `task-fix-<N>.md` to $RUN_DIR/ (ADR-16); dispatch @make against it (max 3 cycles) |
| 8 | @check | BLOCK / test-design / test-quality finding | Route through @check diagnosis → @test → @make re-verify |
| 8 | @check | BLOCK / plan-level finding | Decrement plan_rework_remaining; back to Phase 3 with the finding |
| 8 | @simplify | Any verdict (ADR-15) | Advisory only; record in summary; never blocks Phase 8 progression |
| 8 | Reviewers | Strictly cosmetic finding (typo, missing newline, AST-preserving) | Orchestrator fixes directly; re-review |
| 8 | Reviewers | NOT_TESTABLE manifest task flagged as questionable | Apply same routing as a normal @check finding for that task |
| 8 | Review loop | Same finding twice | Convergence; stop loop |
| 8 | Review loop | 3 cycles exhausted | Document blockers; proceed |
| 9 | Orchestrator | Pre-existing bug, out of scope | File sub-issue via @pm (label: bug) |
| 9 | Orchestrator | Unresolved review-loop blocker | File sub-issue via @pm (label: followup) |
| 9 | @test (Phase 6) | NOT_TESTABLE future-seam note | File sub-issue via @pm (label: tech-debt) |
| 9 | Orchestrator | @simplify advisory not acted on | Record in summary; do NOT file (records, not work) |
| 9 | Orchestrator | All parent AC checked off | Set issue status to Done; sync README/parent; commit chore(todo): … |
| 9 | Orchestrator | Some parent AC remain unchecked AND sub-issues exist | Leave issue at In Progress; commit chore(todo): … |
| Run-level | Failure Handler | Workflow is non-resumable (ADR-14) | Document the cleanup procedure: git worktree remove, delete branch, re-create from base, retry |
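
The review-loop rows (cycle cap plus convergence) can be read as one small control loop. The following is a minimal sketch, not the orchestrator's actual logic: `run_review_loop`, the `review` callback, and the return labels are hypothetical names, and "same finding twice" is assumed to mean the same finding in two consecutive cycles.

```python
from typing import Callable, Optional, Tuple

# (verdict, finding) as a hypothetical summary of one @check dispatch.
ReviewResult = Tuple[str, Optional[str]]


def run_review_loop(review: Callable[[int], ReviewResult], max_cycles: int) -> str:
    """Run @check reviews until ACCEPTABLE, convergence, or the cycle cap.

    Returns "proceed", "converged", or "exhausted" (labels invented here).
    """
    previous_finding: Optional[str] = None
    for cycle in range(1, max_cycles + 1):
        verdict, finding = review(cycle)
        if verdict == "ACCEPTABLE":
            return "proceed"
        # Assumption: convergence means the identical finding came back
        # in two consecutive cycles, so another edit is unlikely to help.
        if finding is not None and finding == previous_finding:
            return "converged"
        previous_finding = finding
        # At this point the orchestrator would edit the artifact in place
        # (plan.md or task-N.md) before the next dispatch.
    return "exhausted"
```

Under this reading, Phase 4 would call it with `max_cycles=3` and Phase 5.5 with `max_cycles=2`; both "converged" and "exhausted" end with blockers documented in the summary.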

5. Design Decisions Log

ADR-flavoured. New decisions append at the end. If a decision is later reversed or refined, mark the original Superseded by ADR-N and add a new entry.

ADR-1 (2026-05-06) — Forge-agnostic workflow

Context: original gist used the GitHub gh CLI for auth checks and gh pr create --draft at the end of the run. Decision: workflow stops at git commit. No push, no PR/MR creation, no gh references anywhere. Alternatives: keep gh integration; abstract behind a forge-plugin interface. Consequences: workflow runs on any git host; user opens PR/MR manually on whichever forge they use. Removes the need for forge auth setup as a prerequisite.

ADR-2 (2026-05-06) — @pm operates on local TODO/ folder

Context: original @pm agent used the Linear CLI. Decision: Linear-style folder-as-tracker with one <ID>.md file per issue plus a category-grouped README.md. Alternatives: keep Linear; multi-backend abstraction; single-file TODO.md. Consequences: project-local, version-controlled, no external service. Schema enforced in agents/pm.md. Initial single-file design moved to per-issue files in ADR-12.

ADR-3 (2026-05-07) — Workflow runs in worktree, not bare repo

Context: original orchestrated bare-clone → worktree creation as Phase 3 of the workflow. Decision: user creates the worktree before launching opencode; the workflow assumes CWD is the worktree. Alternatives: keep auto-worktree-creation; auto-detect bare vs. worktree. Consequences: simpler workflow; opencode CWD = worktree, so subagents inherit the right project root naturally; less plumbing around WORKTREE_PATH. (Subagents still get absolute paths in dispatch prompts; see ADR-5.)

ADR-4 (2026-05-07) — @make and @test are polyglot

Context: original was Python-only via uv. Decision: detect toolchain from marker files (pyproject.toml, Cargo.toml, flake.nix); wrap all toolchain commands in nix develop -c if a devshell is present. Alternatives: per-language agents; keep Python-only. Consequences: one agent per role serves multiple languages. Permission allowlists expanded for cargo and nix develop -c. Bash sandbox still denies shell escapes inside the wrapper.
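
A minimal sketch of the marker-file detection described above. The function names and the marker-to-toolchain labels are hypothetical, and treating the presence of `flake.nix` as "a devshell is present" is an assumption made for illustration; the real agents may decide this differently.

```python
from pathlib import Path
from typing import Dict, List

# Hypothetical mapping from marker file to a toolchain label.
MARKERS: Dict[str, str] = {
    "pyproject.toml": "python",
    "Cargo.toml": "rust",
}


def detect_toolchains(root: Path) -> List[str]:
    """Return the toolchain labels whose marker file exists in `root`."""
    return sorted(label for marker, label in MARKERS.items() if (root / marker).is_file())


def wrap_command(root: Path, command: List[str]) -> List[str]:
    """Prefix `command` with `nix develop -c` when a flake is present.

    Assumption: a flake.nix in the project root implies a usable devshell.
    """
    if (root / "flake.nix").is_file():
        return ["nix", "develop", "-c", *command]
    return command
```

For a Rust project with a flake, `wrap_command(root, ["cargo", "test"])` would yield `["nix", "develop", "-c", "cargo", "test"]`.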

ADR-5 (2026-05-07) — Subagent CWD via absolute paths

Context: opencode subagents do not inherit the orchestrator's cd. A @check dispatched from inside a worktree resolved relative paths against the parent project root and failed with "file not found." Decision: capture WORKTREE_PATH in Phase 1 and pass absolute paths to every subagent dispatch. Alternatives: patch opencode (out of scope); symlink dance. Consequences: every dispatch has an explicit Worktree: <abs path> header convention. Verbose but reliable. Eventually superseded by run-artifact paths under $RUN_DIR (ADR-6).

ADR-6 (2026-05-08) — Run artifacts on disk in .workflow/run-<ID>/

Context: the orchestrator was paraphrasing the plan and task specs into each dispatch prompt. Result: @check and @simplify could see slightly different versions of the same plan; mid-loop revisions could leak as "actually let me reconsider…" passages; long specs ate context budget on every dispatch. Decision: orchestrator writes plan.md (Phase 3), task-N.md (Phase 5), and summary.md (Phase 9) to $WORKTREE_PATH/.workflow/run-<ISSUE_ID>/. Dispatches name files by absolute path; subagents read them. Alternatives: inline prompts (status quo); database; in-memory orchestrator state. Consequences: byte-for-byte source of truth across dispatches. Mid-loop revisions edit the file in place; every subsequent reader sees the new version. Run-artifact directory is gitignored (.workflow/).

ADR-7 (2026-05-08) — Stub-first Rust TDD (mandatory for new symbols)

Context: Rust integration tests reference symbols imported from lib.rs. If those symbols don't exist yet, the test crate fails to compile — a build-error RED with no stack trace and no assertion diagnostics. Same for module tests against not-yet-existing functions. Decision: for any Rust task that introduces new symbols, dispatch a stub-pass @make first (writes todo!()-bodied stubs, runs cargo check only). Then @test runs against compiling stubs; runtime panic on todo!() is the clean RED. Then @make body pass replaces stubs. Alternatives: accept compile-error RED; let @make write tests + bodies in one pass; allow @test to add stubs to production source. Consequences: two atomic commits per affected task (feat: scaffold X with todo!() stubs, then feat: implement X). Stub-pass scope is tight: bodies are exactly todo!(), signatures must match the planned final API. Phase 6 also adds a mandatory panic-coverage check after @test: every test must panic on todo!() to prove it actually exercises the stubbed symbols (catches structural-only tests).

On reviewer bypass: the stub-pass commit is not sent through Phase 5.5 or Phase 8 review. The bypass is intentional and safe because (a) stubs are mechanical — signatures plus todo!(), no logic; (b) the body-pass commit is reviewed and the body-pass diff strictly subsumes the stub-pass diff (the same signatures, now with bodies); (c) Phase 6's mandatory panic-coverage check is what actually validates that the stubs are exercised. Reviewing the stub-pass would duplicate work that the body-pass review catches anyway.
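
The panic-coverage check can be sketched as a filter over per-test outcomes. This is an illustration only: the function name and the outcome labels (`todo_panic`, `passed`, `other_failure`) are hypothetical, and how the orchestrator actually classifies `cargo test` output is not specified here.

```python
from typing import Dict, List


def structural_only_tests(outcomes: Dict[str, str]) -> List[str]:
    """Return tests that did not panic on todo!() during the stub-first run.

    `outcomes` maps test name to a hypothetical outcome label. A test that
    passes (or fails for another reason) against todo!()-bodied stubs is not
    exercising the stubbed symbols, so it is flagged for a rewrite by @test.
    """
    return sorted(name for name, outcome in outcomes.items() if outcome != "todo_panic")
```

An empty result means every test hit a stub, which is the clean RED this ADR asks for.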

ADR-8 (2026-05-08) — @test may write inside #[cfg(test)] mod blocks

Context: Rust unit tests live colocated in production source files inside #[cfg(test)] mod tests { … } blocks — the canonical idiom, not an edge case. Original @test File Constraint forbade src/ writes entirely, which forced @make to write both production code and tests in a single dispatch. This lost the RED→GREEN separation that TDD relies on. Decision: @test may modify src/**/*.rs strictly inside #[cfg(test)] mod <name> { … } blocks. Every line outside such a block stays read-only. Alternatives: keep the restriction; write all unit-level tests as integration tests. Consequences: TDD works for module tests as well as integration tests. The previous Phase 6 file gate (path-based git status snapshot diff) is removed — with @test now legitimately writing inside src/, a path-based gate proves nothing. Constraint is now enforced by the prompt rule, the diff being human-reviewable, and @check flagging production-code drift in Phase 8.

ADR-9 (2026-05-08) — Phase 5.5 task-split review by @check

Context: ppries' README mentioned @check reviewing the task split for completeness, but the gist's workflow.md never implemented it. Without a split-review gate, an over- or under-split task surfaced only at Phase 8 final review — after expensive @test and @make dispatches had already run on a broken split. Decision: new Phase 5.5 dispatches @check against plan.md + every task-N.md to evaluate the split against five questions: coverage, no overlap, single-purpose, integration contracts, testable AC. Max 2 cycles; BLOCK routes back to Phase 4 (plan itself doesn't decompose). Alternatives: status quo (catch at Phase 8); orchestrator self-check. Consequences: one extra @check dispatch per run. @simplify is not involved at this phase — split review is structural, not complexity. Cheaper failure modes for over-/under-split tasks.

ADR-10 (2026-05-08) — @pm is single-mode (filesystem only)

Context: @pm had two read modes — git show <ref>:TODO.md (read-only) and filesystem (read/write). Git-ref mode existed for the bare-repo flow that ADR-3 retired. After ADR-3, the workflow always used filesystem mode; git-ref mode was dead weight that still added bash permissions and doc surface. Decision: remove git-ref mode. @pm has no bash access. Ad-hoc historical reads (git show main:TODO/GAL-39.md) are out of scope — the user runs them directly. Alternatives: keep dual-mode; document the separation more clearly. Consequences: simpler agent. One less permission allowlist to maintain. Workflow's "(live filesystem mode)" qualifier dropped from Phase 2 / Phase 9 / Failure handler.

ADR-11 (2026-05-08) — Phase 9 files follow-ups as TODO sub-issues

Context: unresolved items (pre-existing bugs out of scope, blocked review findings, future-seam notes) were recorded only in summary.md — per-run, untracked, overwritten on the next run, read by nobody since the user has walked away. Decision: Phase 9 has a ### File Follow-ups step that dispatches @pm to create new TODO sub-issues for tracked-worthy items. Each new issue has parent: <ISSUE_ID>, status Todo, and an appropriate label (bug / followup / tech-debt). @simplify advisories that the orchestrator chose not to act on stay in the summary as records, not filed. Alternatives: leave items in summary; create as top-level issues (would need a README.md category, which can't be picked at unattended runtime). Consequences: unresolved items become tracked work. Sub-issue routing avoids the README-category problem. The follow-up files commit alongside the worked-issue update in a single chore(todo): … commit.

ADR-12 (2026-05-08) — Phase 7 mid-implementation escalation

Context: Phase 7's escalation rule was gated on @make flagging concerns during entry validation (the RED check before implementing). When @make got past entry validation, started implementing, and then ground for 2-3 attempts because the test demanded impossible production code, the orchestrator had no documented route — it would re-dispatch @make with marginal context tweaks instead of recognizing the diagnosis as test-architecture failure. Decision: split Phase 7's escalation into entry-validation and mid-implementation paths. @make reports escalate: test_design when its iteration limit is reached and the test seems to demand impossible / unreasonable code. Both paths route through @check (test diagnosis) → @test (redesign) → fresh @make dispatch. Max 2 escalation cycles before reverting to Phase 3 plan revisit. Alternatives: status quo; let @make modify test files itself. Consequences: faster recovery from test-design errors. Bounded loop prevents thrashing. @make.md Iteration Limits section gains a new red-flag class. Superseded in part by ADR-19 (unified diagnosis path).

ADR-13 (2026-05-08) — Run-level plan_rework_remaining counter

Context: several routes return control to an upstream phase when downstream signals reveal the upstream artifact was wrong: P5.5-BLOCK→P4 (split doesn't decompose), P7-escalation-exhaustion→P3 (test/code thrash exceeded its bound), P8-plan-level→P3 (final review exposes a plan defect). Each upstream phase has its own per-loop cycle cap (P4 max 3, P5.5 max 2, etc.), but those caps reset on every re-entry — so a run could in principle thrash P3↔P4↔P5.5↔P3 indefinitely without violating any local rule. Decision: introduce one run-level counter, plan_rework_remaining, default value 1. It decrements on every transition where downstream signal forces upstream rework: P5.5 BLOCK → P4, P7 escalation exhausted → P3, P8 plan-level finding → P3. When the counter is 0 and another such transition fires, abort to the Failure Handler instead of re-entering. Per-phase cycle caps are unchanged. Alternatives: (a) a global max_subagent_dispatches budget — over-engineered for the specific failure mode; (b) document the resets as intentional and rely on convergence detection — leaves the bug present. Consequences: at most two plan attempts per run (the initial plan plus one revision). Failure Handler invocation distinct in cause from earlier-phase aborts: the cleanup is the same (per ADR-14) but the summary explains which downstream signal exhausted the budget.
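
As a sketch of the counter semantics (default 1, decrement on each upstream-rework transition, abort when a transition fires at 0), assuming hypothetical names `PlanReworkBudget` and `ReworkBudgetExhausted`:

```python
class ReworkBudgetExhausted(Exception):
    """Raised when a rework transition fires with no budget left."""


class PlanReworkBudget:
    """Run-level counter; one instance per workflow run (hypothetical model)."""

    def __init__(self, remaining: int = 1) -> None:
        self.remaining = remaining

    def spend(self, transition: str) -> None:
        """Record one upstream-rework transition, or abort if the budget is 0.

        `transition` is a free-form label such as "P8 plan-level -> P3";
        it is carried in the exception so the summary can name the cause.
        """
        if self.remaining == 0:
            raise ReworkBudgetExhausted(transition)
        self.remaining -= 1
```

The per-phase cycle caps stay outside this object; it only bounds how many times the plan itself may be revisited.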

ADR-14 (2026-05-08) — Workflow is non-resumable

Context: Phase 9 has multiple sub-steps (code commit → @pm status update → file follow-ups → TODO commit → summary). Crashing between any two sub-steps leaves the worktree in a state that earlier docs called "partial." The original Failure Handler did not flip status back, did not recognize partial-Phase-9 separately from earlier-phase crashes, and re-running /workflow after a crash could append new comments and re-do work indefinitely. Decision: declare the workflow non-resumable. On any failure (Failure Handler invocation), the recovery procedure is: git worktree remove the failed worktree, delete the feature branch, re-create the worktree from $BASE_BRANCH, then re-run /workflow. Document this explicitly in the Failure Handler section. The throwaway-worktree model means there is no in-place resume state to corrupt — the user discards the worktree and starts fresh. Alternatives: (a) smarter Failure Handler that cleans up partial state idempotently; (b) transactional Phase 9 via a state file; (c) idempotent sub-steps so re-runs auto-resume. Consequences: simplest possible recovery model. Phase 9 sub-step ordering doesn't need to be defended against partial failures — partial state is acceptable because the recovery is "discard everything and re-run." User-initiated cancellation (Ctrl-C) follows the same procedure.

ADR-15 (2026-05-08) — @simplify is advisory at every gate

Context: the Phase 4 routing matrix used to read "Either reviewer NEEDS WORK → re-dispatch the loop," giving @simplify veto power equivalent to @check's. Phase 8's matrix said @simplify was advisory only ("Record in summary's 'Advisory notes (not filed)'"). Same agent, two different powers. Decision: @simplify is advisory at every gate. Its findings are recorded in the run summary; they never force a re-dispatch loop. @check is the only reviewer with veto authority (NEEDS WORK / BLOCK). Alternatives: (a) make @simplify blocking everywhere — too heavy for a heuristic agent prone to false positives; (b) keep the asymmetry and document a principle — fragile. Consequences: uniform model — @check enforces correctness, @simplify advises on shape. Phase 4 review loops only run on @check findings; @simplify complexity flags get logged in the summary like at Phase 8. The user can manually promote a @simplify finding if it matters.
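
The asymmetry between the two reviewers reduces to a few lines. In this sketch the function name, the verdict strings for @simplify, and the note format are assumptions made for illustration:

```python
from typing import List


def gate_passes(check_verdict: str, simplify_verdict: str, advisory_notes: List[str]) -> bool:
    """Decide gate progression from @check alone; log @simplify as advisory.

    Any non-ACCEPTABLE @simplify verdict is appended to `advisory_notes`
    (destined for the run summary) and never affects the return value.
    """
    if simplify_verdict != "ACCEPTABLE":
        advisory_notes.append("@simplify: " + simplify_verdict)
    return check_verdict == "ACCEPTABLE"
```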

ADR-16 (2026-05-08) — Phase 8 fix specs go to disk

Context: when Phase 8 review surfaced a behavioral or production-code finding, the orchestrator would "build a new @make task spec from the finding" and dispatch it inline. That violates ADR-6's invariant (run artifacts on disk, no inline paraphrase) — and exactly when it matters most, because Phase 8 has up to 3 review cycles and the same finding can re-dispatch. Decision: Phase 8 fix dispatches write a new artifact $RUN_DIR/task-fix-<N>.md (1-indexed within the Phase 8 cycle) before dispatching @make. Same Dispatch Hygiene rules as Phase 5 task specs, same test -f verification. Cosmetic findings (orchestrator fixes directly per workflow.md) skip the file — only @make-dispatched findings get one. Alternatives: (a) inline in the dispatch prompt with an ADR-6 footnote — erodes the invariant for the highest-risk dispatch class; (b) edit the original task-N.md — muddies the audit trail of an already-met spec. Consequences: ADR-6's invariant holds end-to-end. Phase 8 cycles re-dispatch against the same on-disk file (mid-loop edits in place), eliminating paraphrase drift across review cycles.
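
A minimal sketch of the write-then-verify step for a fix spec, assuming a hypothetical helper `write_fix_spec`; the existence check stands in for the `test -f` verification and the body content is whatever the orchestrator drafts.

```python
from pathlib import Path


def write_fix_spec(run_dir: Path, n: int, body: str) -> Path:
    """Write task-fix-<n>.md under the run directory and verify it exists.

    `n` is 1-indexed within the Phase 8 cycle. The returned absolute path
    is what a dispatch prompt would name instead of paraphrasing the spec.
    """
    if n < 1:
        raise ValueError("fix specs are 1-indexed")
    path = (run_dir / ("task-fix-%d.md" % n)).resolve()
    path.write_text(body, encoding="utf-8")
    if not path.is_file():  # stand-in for `test -f`
        raise FileNotFoundError(str(path))
    return path
```

Re-dispatching in a later cycle would reuse the same path after an in-place edit rather than creating a new paraphrase.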

ADR-17 (2026-05-08) — Phase 5.5 BLOCK protocol

Context: Phase 5.5 BLOCK ("plan does not decompose cleanly") used to route "back to Phase 4 with @check's finding," but @check at 5.5 evaluated the split, not the plan; its finding may not map cleanly to a plan edit. Re-entering P4 with the same plan.md and a finding tagged on the prompt asks the wrong question. Decision: on P5.5 BLOCK, the orchestrator translates the split-level finding into a concrete plan.md edit (e.g. "the plan conflates structural and runtime work; split into two milestones"), saves the edit, decrements plan_rework_remaining (per ADR-13), and re-dispatches Phase 4 reviewers against the revised plan. P4 reviewers see a genuinely different plan. Alternatives: (a) re-dispatch P4 unchanged with finding attached — burns reviewers on a known-broken plan; (b) treat P5.5 BLOCK as terminal — too strict, we have the rework budget for one revisit. Consequences: P5.5 BLOCK is an effective signal. The orchestrator's plan-edit step is mandatory; skipping it is a routing error. Run-level rework budget bounds the loop.

ADR-18 (2026-05-08) — Phase 8 NOT_TESTABLE manifest

Context: Phase 6 routes NOT_TESTABLE tasks through @check for sign-off, then dispatches @make without tests. Phase 8 reviews the diff but has no signal that "this change has no test because @test claimed it untestable." If @check at P6 was wrong, untested code ships. Decision: Phase 8's dispatch prompt includes a "Tasks completed without tests (NOT_TESTABLE)" section listing each task ID, the @test justification, and the @check sign-off rationale. Reviewers explicitly evaluate "does the justification still hold given the final diff?" If a reviewer pushes back, routing follows the normal Phase 8 finding rules. Alternatives: (a) double-up @check + @simplify at P6 NOT_TESTABLE granting — doubles dispatch cost without targeting the actual gap; (b) restrict NOT_TESTABLE to a fixed taxonomy — won't generalize across languages; (c) reject NOT_TESTABLE entirely — ignores legitimate cases. Consequences: pure plumbing change. P8 reviewers gain visibility into the bypass without new agents or new authority.

ADR-19 (2026-05-08) — Unified Implementation Incomplete diagnosis path

Context: ADR-12 introduced three paths for @make reporting Implementation Incomplete: entry-validation flag, mid-impl escalate: test_design flag, no flag (re-dispatch with @check notes once, escalate after second failure). Three paths converging on the same destination (@check test-diagnosis → @test redesign or @make re-dispatch) added matrix surface and obscured the routing. Decision: every Implementation Incomplete from @make routes through @check test-diagnosis-first. @check returns one of three verdicts — test_design (route to @test redesign), production_logic (re-dispatch @make with @check's notes), or split_needed (per ADR-21). @make's self-diagnosis flag becomes a hint for @check, not a control-flow input for the orchestrator. Alternatives: (a) keep three paths, tighten what @check reviews in each — preserves the surface area; (b) push burden to @make — orchestrator still needs to gate via @check. Consequences: routing logic shrinks. Matrix has fewer rows. ADR-12's split-into-two-paths is partially superseded — the escalation diagnosis unified, the iteration limit (max 2 cycles) preserved.
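
The unified path can be sketched as a single dispatch on @check's verdict. The enum, the step labels, and the exact point at which the 2-cycle cap is checked are assumptions for illustration; only the three verdict names come from this ADR.

```python
from enum import Enum


class Verdict(Enum):
    TEST_DESIGN = "test_design"
    PRODUCTION_LOGIC = "production_logic"
    SPLIT_NEEDED = "split_needed"


MAX_ESCALATION_CYCLES = 2


def next_step(verdict: Verdict, cycles_used: int) -> str:
    """Map a @check diagnosis to the orchestrator's next step.

    `cycles_used` counts escalation cycles already spent. Return values
    are hypothetical labels, not names used by the workflow files.
    """
    if verdict is Verdict.SPLIT_NEEDED:
        return "failure_handler"  # ADR-21: no salvageable prior work
    if cycles_used >= MAX_ESCALATION_CYCLES:
        return "plan_revisit"  # spends plan_rework_remaining (ADR-13)
    if verdict is Verdict.TEST_DESIGN:
        return "test_redesign"  # @test redesign, then fresh @make
    return "redispatch_make"  # @make with @check's notes
```

Note that @make's own escalate flag does not appear as an input: in this model it would only be passed along to @check as a hint.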

ADR-20 (2026-05-08) — Phase 1 working-tree cleanliness check

Context: Phase 1 verified non-bare repo, branch identity, base branch, issue file presence — but not that the working tree was clean. Stale uncommitted edits would be swept into the Phase 9 commit (workflow.md stages "code changes only" but doesn't distinguish which code) or a wip: failure commit. Decision: Phase 1 runs git status --porcelain; if non-empty, stop with: "Working tree must be clean. Commit or stash uncommitted changes before running the workflow." Alternatives: (a) capture initial dirty state, stage only files modified by the workflow at Phase 9 — error-prone baseline tracking; (b) document the requirement, don't enforce — // TODO: don't forget in design-doc form. Consequences: matches the ADR-14 throwaway-worktree model. One additional sanity-check line. User's "but I have manual edits I want the workflow to build on" case is solved by them committing those edits first, which is what they should do anyway.
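
A sketch of the cleanliness check, assuming hypothetical helper names. The parsing is split from the `git` call so the rule itself (any porcelain output means dirty) is easy to see; the error text is the one quoted in the decision above.

```python
import subprocess

DIRTY_MESSAGE = (
    "Working tree must be clean. Commit or stash uncommitted changes "
    "before running the workflow."
)


def is_clean(porcelain_output: str) -> bool:
    """True when `git status --porcelain` printed nothing."""
    return porcelain_output.strip() == ""


def check_working_tree(worktree: str) -> None:
    """Stop the run if the worktree at `worktree` has uncommitted changes."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=worktree,
        capture_output=True,
        text=True,
        check=True,
    )
    if not is_clean(result.stdout):
        raise SystemExit(DIRTY_MESSAGE)
```

Untracked files also appear in porcelain output (as `??` lines), so under this sketch they count as dirty too.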

ADR-21 (2026-05-08) — One-task-per-run model

Context: the workflow originally executed N tasks per run, sequentially through Phase 7. That introduced cross-task regression risk (task 4 breaks task 1's tests, found N tasks late), big-diff Phase 8 reviews (multi-day branches accumulate thousands of diff lines that hit reviewer context limits silently), and the mid-flight task-split problem (when @make discovers task N is over-scoped, no documented route to re-split). It also coupled the workflow's success to "all N tasks complete," when in practice an issue worth one good commit shouldn't depend on unrelated downstream work succeeding. Decision: every workflow run executes exactly one task through Phases 6–8. Phase 5 still splits the plan into N tasks via the Split Heuristic. If N=1, proceed normally. If N>1, the orchestrator dispatches @pm to file tasks 2…N as TODO sub-issues before Phase 6 starts, and only task 1 runs through Phases 6–8. If task 1 itself reports split_needed mid-Phase-7, abort to the Failure Handler (Q19a: in the one-task-per-run model task-1 is the only task in the run, so no feature AC have been satisfied; recovery is the standard non-resumable cleanup from ADR-14).
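
The split-then-run-one rule amounts to partitioning the Phase 5 output. A minimal sketch with a hypothetical function name; the file names follow the task-N.md convention used elsewhere in this document.

```python
from typing import List, Tuple


def partition_tasks(task_files: List[str]) -> Tuple[str, List[str]]:
    """Split Phase 5 output into (task to run now, tasks to file via @pm).

    With a single task the second element is empty and nothing is filed.
    """
    if not task_files:
        raise ValueError("Phase 5 must produce at least one task")
    return task_files[0], task_files[1:]
```

For example, `partition_tasks(["task-1.md", "task-2.md", "task-3.md"])` keeps `task-1.md` for Phases 6 to 8 and hands the other two to @pm as sub-issues.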

The model carries five sub-decisions:

  1. Sub-issue body schema for split-time filings: rich seed body that lets a fresh /workflow invocation re-plan and implement without seeing siblings or the original plan.md. Includes task description + AC + Code Context + Integration Contracts (declared in frontmatter as depends-on: [...]) + relevant slice of plan.md + Test Design section if present + a "Discovered during run on $BRANCH_NAME for parent issue $ISSUE_ID" attribution paragraph.
  2. Phase 5.5 review questions strengthen to six: coverage, no overlap, single-purpose, integration contracts (with stronger bar — must be self-contained for cross-session use), testable AC, and self-containment (is each task spec runnable as a standalone /workflow invocation?). Self-containment is the new load-bearing question because each filed sub-issue runs in isolation.
  3. Split Heuristic recalibration: keep the existing mechanical thresholds (>2 concerns, >50 lines across >2 files, mixes structural + runtime, etc.) but add a "default to no split" tiebreaker — when in doubt, do not split, because splitting now fans out across user sessions with full orchestration overhead per sub-issue.
  4. Parent issue status is AC-driven: Phase 9's existing AC checkbox logic (workflow.md flips ticked AC) determines status. If all parent AC are checked → Done; if some remain unchecked → stays In Progress with the filed sub-issues covering the remaining work.
  5. depends-on: frontmatter and Phase 1 enforcement: @pm schema gains a depends-on: [<ID>, ...] list. Phase 1 sanity check refuses to start if any listed dependency is not Done. Hard block — soft-warn means the user (who has walked away) doesn't see the warning until later.
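Sub-decisions 4 and 5 reduce to two small predicates, sketched here under assumptions. The function names and the status-lookup dictionary are hypothetical. Treating an unknown dependency ID as blocking, and an empty AC list as not Done, are assumptions of this sketch rather than rules stated above.

```python
def blocking_dependencies(
    depends_on: list[str], status_by_id: dict[str, str]
) -> list[str]:
    """Return the listed dependencies that are not Done.

    Assumption: an ID missing from status_by_id is treated as blocking.
    """
    return [dep for dep in depends_on if status_by_id.get(dep) != "Done"]


def parent_status(ac_checked: list[bool]) -> str:
    """All parent AC ticked -> Done; otherwise the parent stays In Progress.

    Assumption: a parent with no AC at all stays In Progress.
    """
    return "Done" if ac_checked and all(ac_checked) else "In Progress"
```

A Phase 1 hard block would then refuse to start whenever `blocking_dependencies(...)` is non-empty, instead of printing a warning the user may never see.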

Auto-resolved problems:

  • Mid-flight task split (formerly Q2 in Open Questions, ADR-12's adjacent gap): collapses into "file as sub-issue and exit."
  • Big-diff Phase 8 reviews: one task = bounded diff (~50 lines per Split Heuristic). No big-diff problem possible.
  • Cross-task regression within a run: no cross-task regressions possible inside a single-task run; subsequent sub-issue runs detect them at their own Phase 7 entry validation (which runs the project's test suite).
  • Skip-P5.5-when-N=1 optimization: trivially satisfied — N=1 from Phase 6 onward in every run.

Alternatives: (a) keep N-task runs, add mid-flight re-splitting via P7→P5 re-entry — doesn't solve big-diff or cross-task regression; (b) keep N-task runs, accept the gaps — leaves three known-bad routes; (c) always one task per issue (skip Phase 5 entirely) — loses the planning-phase split heuristic that's catching legitimate over-scoping at design time. Consequences: runs become shorter and more focused. Each commit/PR carries a bounded scope. Sub-issue fan-out becomes the primary scaling mechanism for multi-step work. TODO/ sees more sub-issue files; @pm's split-time filing path becomes a hot code path. Concurrent runs in different worktrees on the same repo become trivially safe because each worktree has its own TODO/ checkout (file conflicts surface as standard git merge conflicts at integration time, not as mid-run race conditions).


6. Open Questions / Known Gaps

When a question gets answered, move it to the Design decisions log.

Q1: Phase 5.5 review scope — does @check evaluate test-design soundness here?

Currently Phase 5.5 reviews the split (coverage, overlap, single-purpose, integration contracts, testable AC, and, since ADR-21, self-containment). It does not explicitly evaluate whether the test approach implied by each task spec is sound. That would partially overlap with Phase 4 (which has a plan-level Test Design section the reviewers evaluate). If a test-design error escapes Phase 4 and is encoded in a task spec, it surfaces at Phase 7 via the mid-impl escalation (ADR-12), but earlier detection might be cheaper. Open: should Phase 5.5 add "test approach for each task is sound" as a seventh review question, or is that scope creep into Phase 4 territory?

Q2: Mid-flight task split — closed by ADR-21

The one-task-per-run model collapses this question. When @make discovers task-1 is over-scoped, the unified diagnosis path (ADR-19) returns split_needed from @check, and the orchestrator aborts to the Failure Handler (no P5 re-entry, no sub-issue filing; the recovery is "discard worktree, re-plan from scratch"). Tasks 2…N are already filed as sub-issues at Phase 5.5 acceptance, so there's no "remaining tasks" cleanup to think about.

Q3: Phase 9 partial-commit rollback — closed by ADR-14

The workflow is non-resumable. Phase 9 partial states are addressed by the throwaway-worktree recovery procedure: discard the worktree, delete the branch, re-create from base, re-run. Phase 9 sub-step ordering doesn't need to defend against partial failures because the recovery is "discard everything and re-run."

Q4: @simplify not involved at Phase 5.5 — is that the right call?

Phase 5.5 only dispatches @check. Rationale (ADR-9) is that split review is structural, not complexity. But @simplify's lens — "what if we deleted this?" — could legitimately catch unnecessary tasks (e.g. a third task that adds an abstraction nothing else needs). With ADR-21's one-task-per-run pivot, this question gains a different angle: a @simplify flag on a sibling sub-issue at Phase 5.5 could prevent filing a wasteful sub-issue, which is more valuable than catching the same redundancy at Phase 8 of a future run. Open: is the cost of one more dispatch worth the catch, especially now that Phase 5.5 is the gate for sub-issue fan-out?

Q5: Test-design loop bound vs plan-revisit threshold

ADR-12 sets max 2 cycles for the Phase 7 test-design escalation before reverting to Phase 3 plan revisit. The plan-review and final-review loops have max 3. Why the asymmetry? The test-design loop is more expensive per cycle (@check + @test + @make re-implement vs. just reviewers + plan edit), so 2 may be right. But the choice was made by feel, not measured. Open: is 2 the right number, or should it match Phase 4 / Phase 8 at 3?

Q6: Sub-issue ordering in the parent's ## Sub-issues list

ADR-21's split-time filing creates new sub-issues with depends-on: declarations, but the parent's ## Sub-issues list (rendered by @pm) is currently flat. When dependencies form a chain (sub-issue 2 depends on 1), the user has to read the chain from each sub-issue's frontmatter. Open: should @pm render the parent's sub-issue list in dependency order, with a visible indicator (e.g. indentation or a marker glyph) for dependent items? Cosmetic but would speed up "what to run next" decisions.
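If dependency ordering were adopted, one possible sketch uses Python's standard `graphlib` module. The function name `dependency_order` and the dict-of-lists input shape are assumptions for illustration; the question itself remains open.

```python
from graphlib import TopologicalSorter


def dependency_order(depends_on: dict[str, list[str]]) -> list[str]:
    """Order sub-issue IDs so each one appears after everything it depends on.

    depends_on maps a sub-issue ID to the IDs listed in its depends-on field.
    """
    return list(TopologicalSorter(depends_on).static_order())


# A three-issue chain: GAL-44 depends on GAL-43, which depends on GAL-42.
chain = {"GAL-43": ["GAL-42"], "GAL-42": [], "GAL-44": ["GAL-43"]}
ordered = dependency_order(chain)
```

`static_order()` raises `graphlib.CycleError` on circular dependencies, which would also surface a malformed depends-on chain at render time.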

Q7: Concurrent-worktree edge case — sub-issue ID collisions

Two parallel runs in different worktrees, each filing sub-issues, can both pick the same next ID (e.g. both pick GAL-42 because both saw GAL-41 as the highest at start). On merge, git surfaces this as a conflict over TODO/GAL-42.md content (two different files staked on the same name). Recoverable but annoying. Open: should @pm's ID generation use a strategy that's safer under concurrent runs (e.g. timestamp suffix, branch-prefix, content-addressable), or accept the merge-conflict-on-collision cost given the one-user assumption?
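The collision can be reproduced with a toy "highest ID plus one" allocator. This is an assumption about how @pm picks IDs, inferred only from the GAL-41/GAL-42 example above; `next_id` is a hypothetical name.

```python
import re


def next_id(existing: list[str], prefix: str = "GAL") -> str:
    """Hypothetical allocator: take the highest existing number and add one."""
    pattern = re.compile(rf"^{re.escape(prefix)}-(\d+)$")
    numbers = [int(m.group(1)) for name in existing if (m := pattern.match(name))]
    return f"{prefix}-{max(numbers, default=0) + 1}"


# Both worktrees start from the same snapshot of TODO/ ...
snapshot = ["GAL-40", "GAL-41"]
worktree_a = next_id(snapshot)
worktree_b = next_id(snapshot)
# ... so both allocate the same ID, and the clash only shows up at merge time.
```

Any allocator that derives the ID purely from the local snapshot has this property, which is why the alternatives listed above add something run-specific (timestamp, branch name, or content hash).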