Captures the design grilling outcome. Adds ADRs 13-21 covering:
- run-level plan_rework_remaining counter to bound P3<->P5.5/P7/P8 thrash
- non-resumable workflow with throwaway-worktree recovery procedure
- @simplify advisory at every gate (not just Phase 8)
- Phase 8 fix specs go to disk as task-fix-N.md (preserves ADR-6)
- Phase 5.5 BLOCK protocol: orchestrator edits plan, decrements counter, re-enters P4
- Phase 8 NOT_TESTABLE manifest in reviewer prompt
- unified Implementation Incomplete diagnosis (test_design / production_logic / split_needed)
- Phase 1 working-tree cleanliness + depends-on enforcement
- one-task-per-run pivot: Phase 5 still splits N tasks, only task-1 runs; tasks 2..N filed as sub-issues with rich seed bodies; split_needed at P7 aborts to Failure Handler (one-task-per-run = no salvageable prior work)

Auto-resolves big-diff Phase 8 reviews, cross-task regression-within-run, and mid-flight task-split routing. Rewrites routing matrix and three Mermaid diagrams; updates @pm (depends-on frontmatter, split-time filing), @check (third diagnosis verdict), @make (escalate: split_needed flag).
Workflow Design
1. Purpose
This document is the design rationale and decision log for the multi-agent workflow. The operational rules — what the orchestrator does, in what order, with what guardrails — live in commands/workflow.md and the agent files under agents/. This document is where we discuss changes before they land in those files.
Intended flow:
- A new idea, gap, or failure mode comes up (often from a real run).
- Discuss in this document — capture context, options, trade-offs.
- When a decision is reached, update commands/workflow.md and/or the relevant agent file.
- Record the decision in the Design decisions log below.
The operational files stay terse and procedural. The "why" lives here.
2. Cast & Responsibilities
One orchestrator, five subagents. The orchestrator runs in agent: build mode; the subagents are defined as separate agent files under config/opencode/agents/.
| Actor | File | Role | Boundary |
|---|---|---|---|
| Orchestrator | commands/workflow.md | Plans, dispatches, merges findings, edits artifacts under .workflow/, commits. | Does not write production code, write tests, or play any subagent's role. |
| @check | agents/check.md | Reviews plans / task splits / code for risks, correctness, testability. | Read-only: no write / edit / bash. |
| @simplify | agents/simplify.md | Reviews for unnecessary complexity. Advisory only. | Read-only. |
| @test | agents/test.md | Writes failing tests for a task spec, verifies RED. | May modify test files / #[cfg(test)] mod blocks. Sandboxed bash. |
| @make | agents/make.md | Implements a single task spec. Verifies acceptance criteria. | May modify files listed in the task spec. Sandboxed bash; no git / network / cd. |
| @pm | agents/pm.md | Reads / updates TODO/ issue files. | May modify only TODO/ contents. No bash. |
Permission boundaries are enforced per agent. The orchestrator (in agent: build mode) has full edit/bash capabilities, which is precisely why it must not act as the subagents — the agent files are where the limits live.
3. Flow Diagrams
3.1 Phase pipeline
High-level happy path with the major escalation arms. The workflow runs one task per invocation (ADR-21): Phase 5 produces N task files; if N>1, tasks 2…N are filed as sub-issues and only task 1 runs through Phases 6–8.
flowchart TD
P1["Phase 1: Sanity Check<br/>incl clean tree + depends-on"]
P2["Phase 2: Issue Context<br/>pm reads TODO/ID.md"]
P3["Phase 3: Plan<br/>write plan.md"]
P4{"Phase 4: Review Plan<br/>check blocking, simplify advisory<br/>max 3 cycles"}
P5["Phase 5: Split into Tasks<br/>write task-N.md"]
P55{"Phase 5.5: Review Split<br/>check, 6 questions<br/>max 2 cycles"}
P5F["File tasks 2..N as sub-issues<br/>only when N more than 1"]
P6["Phase 6: Write Tests<br/>test, stub-first make"]
P7["Phase 7: Implement<br/>make, single task"]
P7E{"Implementation Incomplete<br/>check diagnoses<br/>max 2 cycles"}
P7F["split_needed: Failure Handler<br/>(discard worktree, re-run)"]
P8{"Phase 8: Final Review<br/>check blocking, simplify advisory<br/>max 3 cycles"}
P9["Phase 9: Commit + TODO + Follow-ups + Summary<br/>parent status AC-driven"]
P1 --> P2 --> P3 --> P4
P4 -->|ACCEPTABLE| P5 --> P55
P4 -->|NEEDS WORK or BLOCK| P3
P55 -->|ACCEPTABLE| P5F --> P6 --> P7
P55 -->|NEEDS WORK| P5
P55 -->|BLOCK plan-level| P3
P7 --> P8
P7 -.->|Implementation Incomplete| P7E
P7E -->|test_design or production_logic| P7
P7E -.->|split_needed| P7F
P7E -.->|2 cycles exhausted| P3
P7F --> ABORT([Failure Handler])
P8 -->|ACCEPTABLE| P9
P8 -->|production-code finding| P7
P8 -->|test-design finding| P7E
P8 -->|plan-level finding| P3
P9 --> END([Done])
Run-level cap: plan_rework_remaining (default 1, ADR-13) decrements on every P5.5-BLOCK→P4, P7-escalation-exhaustion→P3, and P8-plan-level→P3 transition. Exhausted counter aborts to the Failure Handler.
3.2 Phase 7 escalation loop
The pattern when @make cannot reach GREEN. Unified diagnosis path (ADR-19): every Implementation Incomplete routes through @check test-diagnosis-first; @check returns one of three verdicts.
stateDiagram-v2
[*] --> Dispatched: orchestrator dispatches make
Dispatched --> EntryCheck: run tests verify RED
EntryCheck --> Implementing: failure code matches handoff
EntryCheck --> CheckDiag: Implementation Incomplete
Implementing --> GreenReached: tests pass within 2-3 attempts
Implementing --> CheckDiag: Implementation Incomplete
CheckDiag --> TestRedesign: verdict test_design
CheckDiag --> Dispatched: verdict production_logic
CheckDiag --> FailureHandler: verdict split_needed
TestRedesign --> Dispatched: test fixes fresh entry validation
Dispatched --> PlanRevisit: 2 escalation cycles exhausted
GreenReached --> [*]
FailureHandler --> [*]: discard worktree, re-run
PlanRevisit --> [*]: back to Phase 3 if rework budget intact
3.3 Issue lifecycle
How TODO entries move through statuses. In the one-task-per-run model (ADR-21), a single workflow invocation may file multiple sub-issues mid-run, and the parent's final status is AC-driven, not run-driven.
stateDiagram-v2
[*] --> Todo: issue file created
Todo --> InProgress: Phase 2 workflow starts
InProgress --> Done: Phase 9 - all parent AC checked
InProgress --> InProgress2: Phase 9 - some parent AC remain
InProgress --> Todo: workflow fails, failure handler adds comment
note right of InProgress2
Parent stays In Progress when sub-issues
cover the unmet AC. User runs sub-issues
in subsequent /workflow invocations.
end note
note right of InProgress
Sub-issues filed during a run carry:
- parent: ISSUE_ID, status: Todo
- label: bug, followup, tech-debt, or split-from-run
- depends-on: [...] for cross-sub-issue ordering
- rich seed body for split-time filings (ADR-21)
end note
Done --> [*]
InProgress2 --> [*]
4. Routing Matrix
Every observed (phase, signal) → action. Empty cells are gaps. Walking this table is the cheap way to spot routing issues.
| Phase | Signal source | Signal | Action |
|---|---|---|---|
| 1 | Sanity checks | Bare repo / detached HEAD / missing TODO/<ID>.md / branch == base | Stop with error |
| 1 | Sanity checks | Working tree dirty (git status --porcelain non-empty) | Stop with error (ADR-20) |
| 1 | Sanity checks | depends-on: issue not in Done status | Stop with error (ADR-21) |
| 2 | @pm | Issue not found | Stop with error |
| 2 | @pm | Status is Todo | Flip to In Progress; propagate to README.md / parent's Sub-issues |
| 3 | Orchestrator | Plan drafted | Apply Dispatch Hygiene; write plan.md; verify test -f |
| 4 | @check | ACCEPTABLE (regardless of @simplify) | Proceed to Phase 5 |
| 4 | @check | NEEDS WORK | Edit plan.md in place; re-dispatch (max 3 cycles) |
| 4 | @check | BLOCK | Edit plan.md addressing the finding; re-dispatch |
| 4 | @simplify | Any verdict (ADR-15) | Advisory only: record in summary; never blocks Phase 4 progression |
| 4 | Reviewers | Same @check finding twice | Convergence detected; stop loop early |
| 4 | Reviewers | Unresolved after 3 cycles | Document blockers in summary; proceed |
| 5 | Orchestrator | Tasks drafted | Apply Dispatch Hygiene; write each task-N.md; verify test -f for every N |
| 5.5 | @check | ACCEPTABLE, N=1 | Skip P5.5 entirely (ADR-21); proceed to Phase 6 (degenerate split) |
| 5.5 | @check | ACCEPTABLE, N>1 | File tasks 2…N as sub-issues with rich seed bodies (ADR-21) via @pm; proceed to Phase 6 with task-1 only |
| 5.5 | @check | NEEDS WORK | Edit task-N.md in place; re-dispatch (max 2 cycles) |
| 5.5 | @check | BLOCK plan-level | Edit plan.md addressing the BLOCK finding; decrement plan_rework_remaining; re-enter Phase 4 (ADR-17) |
| 5.5 | Run-level | plan_rework_remaining exhausted | Abort to Failure Handler (ADR-13) |
| 6 | @test | TESTS_READY + escalate_to_check: false | Proceed to Phase 7 |
| 6 | @test | TESTS_READY + escalate_to_check: true | @check light review → @test fixes → forward |
| 6 | @test | NOT_TESTABLE (general) | @check sign-off; task goes to @make without tests; record in NOT_TESTABLE manifest for Phase 8 (ADR-18) |
| 6 | @test | NOT_TESTABLE: Missing testability seam | @make adds the seam; re-run @test |
| 6 | @test | BLOCKED | Investigate; may need spec or plan revision |
| 6 | @test (stub-first) | All tests pass with zero todo!() panics | Reject: structural-only tests; route back to @test to rewrite |
| 7 | @make | Implementation Complete | Proceed to Phase 8 |
| 7 | @make | Implementation Incomplete (any flag or no flag) | Route through @check test-diagnosis-first (ADR-19); orchestrator follows @check's verdict |
| 7 | @check diagnosis | test_design | Dispatch @test to redesign tests; fresh @make re-attempt |
| 7 | @check diagnosis | production_logic | Re-dispatch @make with @check's production-side notes |
| 7 | @check diagnosis | split_needed | Abort to Failure Handler (ADR-21 / Q19a). In the one-task-per-run model task-1 is the only task; no AC have been satisfied; recovery is "discard worktree, re-plan from scratch." @pm adds a comment recording the diagnosis. |
| 7 | Escalation loop | 2 cycles exhausted | Decrement plan_rework_remaining; back to Phase 3 (plan revisit) |
| 8 | @check | ACCEPTABLE | Proceed to Phase 9 |
| 8 | @check | BLOCK / behavioral / production-code finding | Write task-fix-<N>.md to $RUN_DIR/ (ADR-16); dispatch @make against it (max 3 cycles) |
| 8 | @check | BLOCK / test-design / test-quality finding | Route through @check diagnosis → @test → @make re-verify |
| 8 | @check | BLOCK / plan-level finding | Decrement plan_rework_remaining; back to Phase 3 with the finding |
| 8 | @simplify | Any verdict (ADR-15) | Advisory only: record in summary; never blocks Phase 8 progression |
| 8 | Reviewers | Strictly cosmetic finding (typo, missing newline, AST-preserving) | Orchestrator fixes directly; re-review |
| 8 | Reviewers | NOT_TESTABLE manifest task flagged as questionable | Apply same routing as a normal @check finding for that task |
| 8 | Review loop | Same finding twice | Convergence; stop loop |
| 8 | Review loop | 3 cycles exhausted | Document blockers; proceed |
| 9 | Orchestrator | Pre-existing bug, out of scope | File sub-issue via @pm (label: bug) |
| 9 | Orchestrator | Unresolved review-loop blocker | File sub-issue via @pm (label: followup) |
| 9 | @test (Phase 6) | NOT_TESTABLE future-seam note | File sub-issue via @pm (label: tech-debt) |
| 9 | Orchestrator | @simplify advisory not acted on | Record in summary; do NOT file (records, not work) |
| 9 | Orchestrator | All parent AC checked off | Set issue status to Done; sync README/parent; commit chore(todo): … |
| 9 | Orchestrator | Some parent AC remain unchecked AND sub-issues exist | Leave issue at In Progress; commit chore(todo): … |
| Run-level | Failure Handler | Workflow is non-resumable (ADR-14) | Document the cleanup procedure: git worktree remove, delete branch, re-create from base, retry |
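
The matrix lends itself to a lookup table. The following is a minimal sketch, assuming a hypothetical ROUTES dict that holds a handful of the rows above; the names ROUTES, RoutingGap, and route are illustrative and do not appear in commands/workflow.md. A missing key surfaces as an explicit gap rather than a silent default, which mirrors the "empty cells are gaps" reading of the table.

```python
# Hypothetical excerpt of the routing matrix: (phase, signal) -> action.
ROUTES = {
    ("4", "ACCEPTABLE"): "proceed to Phase 5",
    ("4", "NEEDS WORK"): "edit plan.md in place; re-dispatch",
    ("7", "split_needed"): "abort to Failure Handler",
    ("8", "plan-level finding"): "decrement plan_rework_remaining; back to Phase 3",
}


class RoutingGap(LookupError):
    """Raised when a (phase, signal) pair has no row in the matrix."""


def route(phase: str, signal: str) -> str:
    """Return the action for a signal, or raise if the matrix has a gap."""
    try:
        return ROUTES[(phase, signal)]
    except KeyError:
        raise RoutingGap(f"no route for phase {phase}, signal {signal!r}") from None
```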
5. Design Decisions Log
ADR-flavoured. New decisions append at the end. If a decision is later reversed or refined, mark the original Superseded by ADR-N and add a new entry.
ADR-1 (2026-05-06) — Forge-agnostic workflow
Context: original gist used the GitHub gh CLI for auth checks and gh pr create --draft at the end of the run.
Decision: workflow stops at git commit. No push, no PR/MR creation, no gh references anywhere.
Alternatives: keep gh integration; abstract behind a forge-plugin interface.
Consequences: workflow runs on any git host; user opens PR/MR manually on whichever forge they use. Removes the need for forge auth setup as a prerequisite.
ADR-2 (2026-05-06) — @pm operates on local TODO/ folder
Context: original @pm agent used the Linear CLI.
Decision: Linear-style folder-as-tracker with one <ID>.md file per issue plus a category-grouped README.md.
Alternatives: keep Linear; multi-backend abstraction; single-file TODO.md.
Consequences: project-local, version-controlled, no external service. Schema enforced in agents/pm.md. Initial single-file design moved to per-issue files in ADR-12.
ADR-3 (2026-05-07) — Workflow runs in worktree, not bare repo
Context: original orchestrated bare-clone → worktree creation as Phase 3 of the workflow.
Decision: user creates the worktree before launching opencode; the workflow assumes CWD is the worktree.
Alternatives: keep auto-worktree-creation; auto-detect bare vs. worktree.
Consequences: simpler workflow; opencode CWD = worktree, so subagents inherit the right project root naturally; less plumbing around WORKTREE_PATH. (Subagents still get absolute paths in dispatch prompts; see ADR-5.)
ADR-4 (2026-05-07) — @make and @test are polyglot
Context: original was Python-only via uv.
Decision: detect toolchain from marker files (pyproject.toml, Cargo.toml, flake.nix); wrap all toolchain commands in nix develop -c if a devshell is present.
Alternatives: per-language agents; keep Python-only.
Consequences: one agent per role serves multiple languages. Permission allowlists expanded for cargo and nix develop -c. Bash sandbox still denies shell escapes inside the wrapper.
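
As an illustration of the detection step, here is a small sketch. It assumes that the presence of flake.nix is what signals a devshell and that marker files sit directly under the project root; the MARKERS table and both function names are hypothetical, not taken from the agent files.

```python
from pathlib import Path

# Marker file -> toolchain label. The labels are illustrative.
MARKERS = {"pyproject.toml": "python", "Cargo.toml": "rust"}


def detect_toolchains(root: Path) -> list[str]:
    """Return the toolchains whose marker file exists directly under root."""
    return sorted(name for marker, name in MARKERS.items() if (root / marker).is_file())


def wrap_command(root: Path, command: list[str]) -> list[str]:
    """Prefix the command with `nix develop -c` when a flake.nix is present."""
    # Assumption: flake.nix at the root means a devshell is available.
    if (root / "flake.nix").is_file():
        return ["nix", "develop", "-c", *command]
    return command
```

With a Cargo.toml and a flake.nix present, `wrap_command(root, ["cargo", "test"])` yields `["nix", "develop", "-c", "cargo", "test"]`.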
ADR-5 (2026-05-07) — Subagent CWD via absolute paths
Context: opencode subagents do not inherit the orchestrator's cd. A @check dispatched from inside a worktree resolved relative paths against the parent project root and failed with "file not found."
Decision: capture WORKTREE_PATH in Phase 1 and pass absolute paths to every subagent dispatch.
Alternatives: patch opencode (out of scope); symlink dance.
Consequences: every dispatch has an explicit Worktree: <abs path> header convention. Verbose but reliable. Eventually superseded by run-artifact paths under $RUN_DIR (ADR-6).
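
A sketch of how such a header could be assembled. The Worktree: line follows the convention named above; the Read: line and the function name dispatch_header are assumptions made for illustration. PurePosixPath is used so the example behaves the same on any host.

```python
from pathlib import PurePosixPath


def dispatch_header(worktree: PurePosixPath, artifacts: list[str]) -> str:
    """Build a dispatch preamble that names every path absolutely."""
    if not worktree.is_absolute():
        raise ValueError(f"worktree path must be absolute: {worktree}")
    lines = [f"Worktree: {worktree}"]
    # Artifact names are resolved against the worktree up front, so the
    # subagent never depends on its own working directory.
    lines += [f"Read: {worktree / name}" for name in artifacts]
    return "\n".join(lines)
```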
ADR-6 (2026-05-08) — Run artifacts on disk in .workflow/run-<ID>/
Context: the orchestrator was paraphrasing the plan and task specs into each dispatch prompt. Result: @check and @simplify could see slightly different versions of the same plan; mid-loop revisions could leak as "actually let me reconsider…" passages; long specs ate context budget on every dispatch.
Decision: orchestrator writes plan.md (Phase 3), task-N.md (Phase 5), and summary.md (Phase 9) to $WORKTREE_PATH/.workflow/run-<ISSUE_ID>/. Dispatches name files by absolute path; subagents read them.
Alternatives: inline prompts (status quo); database; in-memory orchestrator state.
Consequences: byte-for-byte source of truth across dispatches. Mid-loop revisions edit the file in place; every subsequent reader sees the new version. Run-artifact directory is gitignored (.workflow/).
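
The write-then-verify step might look like the sketch below. The directory layout follows the decision above; the function name write_artifact is hypothetical, and the is_file() check stands in for the test -f verification the routing matrix mentions.

```python
from pathlib import Path


def write_artifact(worktree: Path, issue_id: str, name: str, body: str) -> Path:
    """Write a run artifact and confirm it landed on disk."""
    run_dir = worktree / ".workflow" / f"run-{issue_id}"
    run_dir.mkdir(parents=True, exist_ok=True)
    path = run_dir / name
    path.write_text(body, encoding="utf-8")
    # Stand-in for the `test -f` verification before any dispatch.
    if not path.is_file():
        raise FileNotFoundError(path)
    # Dispatches name files by absolute path.
    return path.resolve()
```

A mid-loop revision would call the same function again with the same name, so every later reader sees the new content.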
ADR-7 (2026-05-08) — Stub-first Rust TDD (mandatory for new symbols)
Context: Rust integration tests reference symbols imported from lib.rs. If those symbols don't exist yet, the test crate fails to compile — a build-error RED with no stack trace and no assertion diagnostics. Same for module tests against not-yet-existing functions.
Decision: for any Rust task that introduces new symbols, dispatch a stub-pass @make first (writes todo!()-bodied stubs, runs cargo check only). Then @test runs against compiling stubs; runtime panic on todo!() is the clean RED. Then @make body pass replaces stubs.
Alternatives: accept compile-error RED; let @make write tests + bodies in one pass; allow @test to add stubs to production source.
Consequences: two atomic commits per affected task (feat: scaffold X with todo!() stubs, then feat: implement X). Stub-pass scope is tight: bodies are exactly todo!(), signatures must match the planned final API. Phase 6 also adds a mandatory panic-coverage check after @test: every test must panic on todo!() to prove it actually exercises the stubbed symbols (catches structural-only tests).
On reviewer bypass: the stub-pass commit is not sent through Phase 5.5 or Phase 8 review. The bypass is intentional and safe because (a) stubs are mechanical — signatures plus todo!(), no logic; (b) the body-pass commit is reviewed and the body-pass diff strictly subsumes the stub-pass diff (the same signatures, now with bodies); (c) Phase 6's mandatory panic-coverage check is what actually validates that the stubs are exercised. Reviewing the stub-pass would duplicate work that the body-pass review catches anyway.
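
The panic-coverage check can be sketched as a filter over per-test results. This assumes the orchestrator already has each test's pass/fail state and captured output; the StubRunResult type and the function name are hypothetical. The only fact relied on is that Rust's todo!() panics with the message "not yet implemented".

```python
from dataclasses import dataclass

# Message Rust's todo!() macro panics with.
TODO_PANIC = "not yet implemented"


@dataclass
class StubRunResult:
    name: str
    passed: bool
    output: str  # captured output for this one test


def missing_todo_panic(results: list[StubRunResult]) -> list[str]:
    """Return the tests that did not fail by panicking on a todo!() stub."""
    return [
        r.name
        for r in results
        # A passing test never reached a stub; a failure without the
        # todo!() message failed for some other reason.
        if r.passed or TODO_PANIC not in r.output
    ]
```

A non-empty return value corresponds to the "reject, route back to @test" row for stub-first runs.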
ADR-8 (2026-05-08) — @test may write inside #[cfg(test)] mod blocks
Context: Rust unit tests live colocated in production source files inside #[cfg(test)] mod tests { … } blocks — the canonical idiom, not an edge case. Original @test File Constraint forbade src/ writes entirely, which forced @make to write both production code and tests in a single dispatch. This lost the RED→GREEN separation that TDD relies on.
Decision: @test may modify src/**/*.rs strictly inside #[cfg(test)] mod <name> { … } blocks. Every line outside such a block stays read-only.
Alternatives: keep the restriction; write all unit-level tests as integration tests.
Consequences: TDD works for module tests as well as integration tests. The previous Phase 6 file gate (path-based git status snapshot diff) is removed — with @test now legitimately writing inside src/, a path-based gate proves nothing. Constraint is now enforced by the prompt rule, the diff being human-reviewable, and @check flagging production-code drift in Phase 8.
ADR-9 (2026-05-08) — Phase 5.5 task-split review by @check
Context: ppries' README mentioned @check reviewing the task split for completeness, but the gist's workflow.md never implemented it. Without a split-review gate, an over- or under-split task surfaced only at Phase 8 final review — after expensive @test and @make dispatches had already run on a broken split.
Decision: new Phase 5.5 dispatches @check against plan.md + every task-N.md to evaluate the split against five questions: coverage, no overlap, single-purpose, integration contracts, testable AC. Max 2 cycles; BLOCK routes back to Phase 4 (plan itself doesn't decompose).
Alternatives: status quo (catch at Phase 8); orchestrator self-check.
Consequences: one extra @check dispatch per run. @simplify is not involved at this phase — split review is structural, not complexity. Cheaper failure modes for over-/under-split tasks.
ADR-10 (2026-05-08) — @pm is single-mode (filesystem only)
Context: @pm had two read modes — git show <ref>:TODO.md (read-only) and filesystem (read/write). Git-ref mode existed for the bare-repo flow that ADR-3 retired. After ADR-3, the workflow always used filesystem mode; git-ref mode was dead weight that still added bash permissions and doc surface.
Decision: remove git-ref mode. @pm has no bash access. Ad-hoc historical reads (git show main:TODO/GAL-39.md) are out of scope — the user runs them directly.
Alternatives: keep dual-mode; document the separation more clearly.
Consequences: simpler agent. One less permission allowlist to maintain. Workflow's "(live filesystem mode)" qualifier dropped from Phase 2 / Phase 9 / Failure handler.
ADR-11 (2026-05-08) — Phase 9 files follow-ups as TODO sub-issues
Context: unresolved items (pre-existing bugs out of scope, blocked review findings, future-seam notes) were recorded only in summary.md — per-run, untracked, overwritten on the next run, read by nobody since the user has walked away.
Decision: Phase 9 has a ### File Follow-ups step that dispatches @pm to create new TODO sub-issues for tracked-worthy items. Each new issue has parent: <ISSUE_ID>, status Todo, and an appropriate label (bug / followup / tech-debt). @simplify advisories that the orchestrator chose not to act on stay in the summary as records, not filed.
Alternatives: leave items in summary; create as top-level issues (would need a README.md category, which can't be picked at unattended runtime).
Consequences: unresolved items become tracked work. Sub-issue routing avoids the README-category problem. The follow-up files commit alongside the worked-issue update in a single chore(todo): … commit.
ADR-12 (2026-05-08) — Phase 7 mid-implementation escalation
Context: Phase 7's escalation rule was gated on @make flagging concerns during entry validation (the RED check before implementing). When @make got past entry validation, started implementing, and then ground for 2-3 attempts because the test demanded impossible production code, the orchestrator had no documented route — it would re-dispatch @make with marginal context tweaks instead of recognizing the diagnosis as test-architecture failure.
Decision: split Phase 7's escalation into entry-validation and mid-implementation paths. @make reports escalate: test_design when its iteration limit is reached and the test seems to demand impossible / unreasonable code. Both paths route through @check (test diagnosis) → @test (redesign) → fresh @make dispatch. Max 2 escalation cycles before reverting to Phase 3 plan revisit.
Alternatives: status quo; let @make modify test files itself.
Consequences: faster recovery from test-design errors. Bounded loop prevents thrashing. @make.md Iteration Limits section gains a new red-flag class. Superseded in part by ADR-19 (unified diagnosis path).
ADR-13 (2026-05-08) — Run-level plan_rework_remaining counter
Context: several routes return control to an upstream phase when downstream signals reveal the upstream artifact was wrong: P5.5-BLOCK→P4 (split doesn't decompose), P7-escalation-exhaustion→P3 (test/code thrash exceeded its bound), P8-plan-level→P3 (final review exposes a plan defect). Each upstream phase has its own per-loop cycle cap (P4 max 3, P5.5 max 2, etc.), but those caps reset on every re-entry — so a run could in principle thrash P3↔P4↔P5.5↔P3 indefinitely without violating any local rule.
Decision: introduce one run-level counter, plan_rework_remaining, default value 1. It decrements on every transition where downstream signal forces upstream rework: P5.5 BLOCK → P4, P7 escalation exhausted → P3, P8 plan-level finding → P3. When the counter is 0 and another such transition fires, abort to the Failure Handler instead of re-entering. Per-phase cycle caps are unchanged.
Alternatives: (a) a global max_subagent_dispatches budget — over-engineered for the specific failure mode; (b) document the resets as intentional and rely on convergence detection — leaves the bug present.
Consequences: at most two plan attempts per run (the initial plan plus one revision). A budget-exhaustion abort differs from earlier-phase aborts only in cause: the cleanup is the same (per ADR-14), but the summary explains which downstream signal exhausted the budget.
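
A minimal sketch of the counter semantics, assuming the orchestrator tracks the budget in memory for the duration of a run. The class and exception names are hypothetical; only plan_rework_remaining and its default of 1 come from the decision above.

```python
class ReworkBudgetExhausted(RuntimeError):
    """Signals that the run must abort to the Failure Handler."""


class ReworkBudget:
    def __init__(self, remaining: int = 1) -> None:
        self.plan_rework_remaining = remaining

    def spend(self, transition: str) -> None:
        """Consume one unit for an upstream-rework transition, or abort."""
        if self.plan_rework_remaining == 0:
            # The transition label lets the summary say which signal
            # exhausted the budget.
            raise ReworkBudgetExhausted(f"budget exhausted at {transition}")
        self.plan_rework_remaining -= 1
```

With the default budget, the first call to spend succeeds and leaves the counter at 0; a second call raises, which is the abort path.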
ADR-14 (2026-05-08) — Workflow is non-resumable
Context: Phase 9 has multiple sub-steps (code commit → @pm status update → file follow-ups → TODO commit → summary). Crashing between any two sub-steps leaves the worktree in a state that earlier docs called "partial." The original Failure Handler did not flip status back, did not recognize partial-Phase-9 separately from earlier-phase crashes, and re-running /workflow after a crash could append new comments and re-do work indefinitely.
Decision: declare the workflow non-resumable. On any failure (Failure Handler invocation), the recovery procedure is: git worktree remove the failed worktree, delete the feature branch, re-create the worktree from $BASE_BRANCH, then re-run /workflow. Document this explicitly in the Failure Handler section. The throwaway-worktree model means there is no in-place resume state to corrupt — the user discards the worktree and starts fresh.
Alternatives: (a) smarter Failure Handler that cleans up partial state idempotently; (b) transactional Phase 9 via a state file; (c) idempotent sub-steps so re-runs auto-resume.
Consequences: simplest possible recovery model. Phase 9 sub-step ordering doesn't need to be defended against partial failures — partial state is acceptable because the recovery is "discard everything and re-run." User-initiated cancellation (Ctrl-C) follows the same procedure.
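
The recovery procedure can be written down as an ordered list of git invocations. This sketch only builds the commands; it does not run them. The --force and -D flags are assumptions about what "remove" and "delete" need after a failed run, and the commands are meant to be run from the main repository rather than from inside the worktree being removed.

```python
def recovery_commands(worktree: str, branch: str, base_branch: str) -> list[list[str]]:
    """Return the git invocations for the throwaway-worktree recovery, in order."""
    return [
        # --force (assumed) because a failed run may leave uncommitted changes.
        ["git", "worktree", "remove", "--force", worktree],
        # -D (assumed) because the feature branch is typically unmerged.
        ["git", "branch", "-D", branch],
        # Re-create the branch and its worktree from the base branch.
        ["git", "worktree", "add", "-b", branch, worktree, base_branch],
    ]
```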
ADR-15 (2026-05-08) — @simplify is advisory at every gate
Context: the Phase 4 routing matrix used to read "Either reviewer NEEDS WORK → re-dispatch the loop," giving @simplify veto power equivalent to @check's. Phase 8's matrix said @simplify was advisory only ("Record in summary's 'Advisory notes (not filed)'"). Same agent, two different powers.
Decision: @simplify is advisory at every gate. Its findings are recorded in the run summary; they never force a re-dispatch loop. @check is the only reviewer with veto authority (NEEDS WORK / BLOCK).
Alternatives: (a) make @simplify blocking everywhere — too heavy for a heuristic agent prone to false positives; (b) keep the asymmetry and document a principle — fragile.
Consequences: uniform model — @check enforces correctness, @simplify advises on shape. Phase 4 review loops only run on @check findings; @simplify complexity flags get logged in the summary like at Phase 8. The user can manually promote a @simplify finding if it matters.
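
In code terms, the gate reduces to a function where only one input can block. A sketch, assuming both reviewers report a verdict string and that ACCEPTABLE is the non-finding value for @simplify as well; the function name gate_passes is hypothetical.

```python
def gate_passes(check_verdict: str, simplify_verdict: str) -> tuple[bool, list[str]]:
    """Decide gate progression; only @check's verdict can block."""
    advisories = []
    if simplify_verdict != "ACCEPTABLE":
        # Recorded for the run summary; never influences progression.
        advisories.append(f"@simplify: {simplify_verdict}")
    return check_verdict == "ACCEPTABLE", advisories
```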
ADR-16 (2026-05-08) — Phase 8 fix specs go to disk
Context: when Phase 8 review surfaced a behavioral or production-code finding, the orchestrator would "build a new @make task spec from the finding" and dispatch it inline. That violates ADR-6's invariant (run artifacts on disk, no inline paraphrase) — and exactly when it matters most, because Phase 8 has up to 3 review cycles and the same finding can re-dispatch.
Decision: Phase 8 fix dispatches write a new artifact $RUN_DIR/task-fix-<N>.md (1-indexed within the Phase 8 cycle) before dispatching @make. Same Dispatch Hygiene rules as Phase 5 task specs, same test -f verification. Cosmetic findings (orchestrator fixes directly per workflow.md) skip the file — only @make-dispatched findings get one.
Alternatives: (a) inline in the dispatch prompt with an ADR-6 footnote — erodes the invariant for the highest-risk dispatch class; (b) edit the original task-N.md — muddies the audit trail of an already-met spec.
Consequences: ADR-6's invariant holds end-to-end. Phase 8 cycles re-dispatch against the same on-disk file (mid-loop edits in place), eliminating paraphrase drift across review cycles.
ADR-17 (2026-05-08) — Phase 5.5 BLOCK protocol
Context: Phase 5.5 BLOCK ("plan does not decompose cleanly") used to route "back to Phase 4 with @check's finding," but @check at 5.5 evaluated the split, not the plan; its finding may not map cleanly to a plan edit. Re-entering P4 with the same plan.md and a finding tagged on the prompt asks the wrong question.
Decision: on P5.5 BLOCK, the orchestrator translates the split-level finding into a concrete plan.md edit (e.g. "the plan conflates structural and runtime work; split into two milestones"), saves the edit, decrements plan_rework_remaining (per ADR-13), and re-dispatches Phase 4 reviewers against the revised plan. P4 reviewers see a genuinely different plan.
Alternatives: (a) re-dispatch P4 unchanged with finding attached — burns reviewers on a known-broken plan; (b) treat P5.5 BLOCK as terminal — too strict, we have the rework budget for one revisit.
Consequences: P5.5 BLOCK is an effective signal. The orchestrator's plan-edit step is mandatory; skipping it is a routing error. Run-level rework budget bounds the loop.
ADR-18 (2026-05-08) — Phase 8 NOT_TESTABLE manifest
Context: Phase 6 routes NOT_TESTABLE tasks through @check for sign-off, then dispatches @make without tests. Phase 8 reviews the diff but has no signal that "this change has no test because @test claimed it untestable." If @check at P6 was wrong, untested code ships.
Decision: Phase 8's dispatch prompt includes a "Tasks completed without tests (NOT_TESTABLE)" section listing each task ID, the @test justification, and the @check sign-off rationale. Reviewers explicitly evaluate "does the justification still hold given the final diff?" If a reviewer pushes back, routing follows the normal Phase 8 finding rules.
Alternatives: (a) double-up @check + @simplify at P6 NOT_TESTABLE granting — doubles dispatch cost without targeting the actual gap; (b) restrict NOT_TESTABLE to a fixed taxonomy — won't generalize across languages; (c) reject NOT_TESTABLE entirely — ignores legitimate cases.
Consequences: pure plumbing change. P8 reviewers gain visibility into the bypass without new agents or new authority.
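
One way the manifest section might be rendered into the Phase 8 prompt. The section title and the reviewer question come from the decision above; the entry type, field names, and line layout are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class NotTestableEntry:
    task_id: str
    test_justification: str  # what @test claimed
    check_rationale: str     # why @check signed off


def render_manifest(entries: list[NotTestableEntry]) -> str:
    """Render the NOT_TESTABLE section of the Phase 8 dispatch prompt."""
    if not entries:
        return ""  # nothing bypassed tests, so no section is emitted
    lines = ["Tasks completed without tests (NOT_TESTABLE)"]
    for e in entries:
        lines.append(f"- {e.task_id}")
        lines.append(f"  @test justification: {e.test_justification}")
        lines.append(f"  @check sign-off: {e.check_rationale}")
    lines.append("Does each justification still hold given the final diff?")
    return "\n".join(lines)
```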
ADR-19 (2026-05-08) — Unified Implementation Incomplete diagnosis path
Context: ADR-12 introduced three paths for @make reporting Implementation Incomplete: entry-validation flag, mid-impl escalate: test_design flag, no flag (re-dispatch with @check notes once, escalate after second failure). Three paths converging on the same destination (@check test-diagnosis → @test redesign or @make re-dispatch) added matrix surface and obscured the routing.
Decision: every Implementation Incomplete from @make routes through @check test-diagnosis-first. @check returns one of three verdicts — test_design (route to @test redesign), production_logic (re-dispatch @make with @check's notes), or split_needed (per ADR-21). @make's self-diagnosis flag becomes a hint for @check, not a control-flow input for the orchestrator.
Alternatives: (a) keep three paths, tighten what @check reviews in each — preserves the surface area; (b) push burden to @make — orchestrator still needs to gate via @check.
Consequences: routing logic shrinks. Matrix has fewer rows. ADR-12's split-into-two-paths is partially superseded: the escalation diagnosis is unified, and the iteration limit (max 2 cycles) is preserved.
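
The unified path amounts to a single dispatch on @check's verdict. A sketch under two assumptions that the text does not settle: split_needed takes precedence over cycle exhaustion, and cycles_used counts completed escalation cycles. The returned step labels are hypothetical.

```python
MAX_ESCALATION_CYCLES = 2

NEXT_STEP = {
    "test_design": "dispatch_test_redesign",
    "production_logic": "redispatch_make",
}


def next_step(verdict: str, cycles_used: int) -> str:
    """Map @check's diagnosis to the orchestrator's next action."""
    if verdict != "split_needed" and verdict not in NEXT_STEP:
        raise ValueError(f"unknown verdict: {verdict!r}")
    if verdict == "split_needed":
        return "failure_handler"
    if cycles_used >= MAX_ESCALATION_CYCLES:
        # Back to Phase 3, subject to the run-level rework budget.
        return "plan_revisit"
    return NEXT_STEP[verdict]
```

Note that @make's own flag does not appear as a parameter: it is a hint passed to @check, not an input to this routing.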
ADR-20 (2026-05-08) — Phase 1 working-tree cleanliness check
Context: Phase 1 verified non-bare repo, branch identity, base branch, issue file presence — but not that the working tree was clean. Stale uncommitted edits would be swept into the Phase 9 commit (workflow.md stages "code changes only" but doesn't distinguish which code) or a wip: failure commit.
Decision: Phase 1 runs git status --porcelain; if non-empty, stop with: "Working tree must be clean. Commit or stash uncommitted changes before running the workflow."
Alternatives: (a) capture initial dirty state, stage only files modified by the workflow at Phase 9 — error-prone baseline tracking; (b) document the requirement, don't enforce — // TODO: don't forget in design-doc form.
Consequences: matches the ADR-14 throwaway-worktree model. One additional sanity-check line. User's "but I have manual edits I want the workflow to build on" case is solved by them committing those edits first, which is what they should do anyway.
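
A sketch of the check, split so the decision logic can be exercised without a repository. The function names are hypothetical; the command and the error text are the ones given in the decision above. Untracked files also appear in porcelain output, so they count as dirty here.

```python
import subprocess

CLEAN_TREE_ERROR = (
    "Working tree must be clean. "
    "Commit or stash uncommitted changes before running the workflow."
)


def is_clean(porcelain_output: str) -> bool:
    """A tree is clean when `git status --porcelain` prints nothing."""
    return porcelain_output.strip() == ""


def check_clean_tree(worktree: str) -> None:
    """Run the Phase 1 check inside the worktree; stop with the error if dirty."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=worktree, capture_output=True, text=True, check=True,
    )
    if not is_clean(result.stdout):
        raise SystemExit(CLEAN_TREE_ERROR)
```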
ADR-21 (2026-05-08) — One-task-per-run model
Context: the workflow originally executed N tasks per run, sequentially through Phase 7. That introduced cross-task regression risk (task 4 breaks task 1's tests, found N tasks late), big-diff Phase 8 reviews (multi-day branches accumulate thousands of diff lines that hit reviewer context limits silently), and the mid-flight task-split problem (when @make discovers task N is over-scoped, no documented route to re-split). It also coupled the workflow's success to "all N tasks complete," when in practice an issue worth one good commit shouldn't depend on unrelated downstream work succeeding.
Decision: every workflow run executes exactly one task through Phases 6–8. Phase 5 still splits the plan into N tasks via the Split Heuristic. If N=1, proceed normally. If N>1, the orchestrator dispatches @pm to file tasks 2…N as TODO sub-issues before Phase 6 starts, and only task 1 runs through Phases 6–8. If task 1 itself reports split_needed mid-Phase-7, abort to the Failure Handler (Q19a: in the one-task-per-run model task-1 is the only task in the run, so no feature AC have been satisfied; recovery is the standard non-resumable cleanup from ADR-14).
The model carries five sub-decisions:
- Sub-issue body schema for split-time filings: rich seed body that lets a fresh /workflow invocation re-plan and implement without seeing siblings or the original plan.md. Includes task description + AC + Code Context + Integration Contracts (declared in frontmatter as depends-on: [...]) + relevant slice of plan.md + Test Design section if present + a "Discovered during run on $BRANCH_NAME for parent issue $ISSUE_ID" attribution paragraph.
- Phase 5.5 review questions strengthen to six: coverage, no overlap, single-purpose, integration contracts (with a stronger bar: must be self-contained for cross-session use), testable AC, and self-containment (is each task spec runnable as a standalone /workflow invocation?). Self-containment is the new load-bearing question because each filed sub-issue runs in isolation.
- Split Heuristic recalibration: keep the existing mechanical thresholds (>2 concerns, >50 lines across >2 files, mixes structural + runtime, etc.) but add a "default to no split" tiebreaker. When in doubt, do not split, because splitting now fans out across user sessions with full orchestration overhead per sub-issue.
- Parent issue status is AC-driven: Phase 9's existing AC checkbox logic (workflow.md flips ticked AC) determines status. If all parent AC are checked → Done; if some remain unchecked → stays In Progress with the filed sub-issues covering the remaining work.
- depends-on: frontmatter and Phase 1 enforcement: @pm schema gains a depends-on: [<ID>, ...] list. Phase 1 sanity check refuses to start if any listed dependency is not Done. This is a hard block, because with a soft warning the user (who has walked away) doesn't see the warning until later.
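The last two sub-decisions reduce to small predicates. The sketch below assumes issue statuses are available as a mapping from ID to status string and that AC checkboxes are available as a list of booleans; both representations, the function names, and the example IDs are hypothetical.

```python
from typing import Dict, List


def blocking_dependencies(
    depends_on: List[str], status_by_id: Dict[str, str]
) -> List[str]:
    """Return the listed dependencies that are not Done (Phase 1 hard block)."""
    # Assumption: an ID with no known status is treated as not Done.
    return [dep for dep in depends_on if status_by_id.get(dep) != "Done"]


def parent_status(ac_checked: List[bool]) -> str:
    """Derive the parent issue status from its AC checkboxes."""
    # Assumption: a parent with no AC at all stays In Progress.
    if ac_checked and all(ac_checked):
        return "Done"
    return "In Progress"
```

Under this sketch, Phase 1 would refuse to start whenever blocking_dependencies returns a non-empty list, for example when a sub-issue declares depends-on: [GAL-41, GAL-42] and GAL-42 is still In Progress.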
Auto-resolved problems:
- Mid-flight task split (formerly Q2 in Open Questions, ADR-12's adjacent gap): collapses into "abort to the Failure Handler and re-plan from scratch," since the other tasks were already filed as sub-issues before Phase 6.
- Big-diff Phase 8 reviews: one task = bounded diff (~50 lines per Split Heuristic). No big-diff problem possible.
- Cross-task regression within a run: no cross-task regressions possible inside a single-task run; subsequent sub-issue runs detect them at their own Phase 7 entry validation (which runs the project's test suite).
- Skip-P5.5-when-N=1 optimization: trivially satisfied — N=1 from Phase 6 onward in every run.
Alternatives: (a) keep N-task runs, add mid-flight re-splitting via P7→P5 re-entry — doesn't solve big-diff or cross-task regression; (b) keep N-task runs, accept the gaps — leaves three known-bad routes; (c) always one task per issue (skip Phase 5 entirely) — loses the planning-phase split heuristic that's catching legitimate over-scoping at design time.
Consequences: runs become shorter and more focused. Each commit/PR carries a bounded scope. Sub-issue fan-out becomes the primary scaling mechanism for multi-step work. TODO/ sees more sub-issue files; @pm's split-time filing path becomes a hot code path. Concurrent runs in different worktrees on the same repo become trivially safe because each worktree has its own TODO/ checkout (file conflicts surface as standard git merge conflicts at integration time, not as mid-run race conditions).
6. Open Questions / Known Gaps
When a question gets answered, move it to the Design decisions log.
Q1: Phase 5.5 review scope — does @check evaluate test-design soundness here?
Currently Phase 5.5 reviews the split (coverage, overlap, single-purpose, integration contracts, testable AC, and, since ADR-21, self-containment). It does not explicitly evaluate whether the test approach implied by each task spec is sound. That would partially overlap with Phase 4 (which has a plan-level Test Design section the reviewers evaluate). If a test-design error escapes Phase 4 and is encoded in a task spec, it surfaces at Phase 7 via the mid-impl escalation (ADR-12), but earlier detection might be cheaper. Open: should Phase 5.5 add "test approach for each task is sound" as a seventh review question, or is that scope creep into Phase 4 territory?
Q2: Mid-flight task split — closed by ADR-21
The one-task-per-run model collapses this question. When @make discovers task-1 is over-scoped, the unified diagnosis path (ADR-19) returns split_needed from @check, and the orchestrator aborts to the Failure Handler. There is no P5 re-entry and no sub-issue filing; the recovery is "discard worktree, re-plan from scratch". Tasks 2…N are already filed as sub-issues at Phase 5.5 acceptance, so there's no "remaining tasks" cleanup to think about.
Q3: Phase 9 partial-commit rollback — closed by ADR-14
The workflow is non-resumable. Phase 9 partial states are addressed by the throwaway-worktree recovery procedure: discard the worktree, delete the branch, re-create from base, re-run. Phase 9 sub-step ordering doesn't need to defend against partial failures because the recovery is "discard everything and re-run."
Q4: @simplify not involved at Phase 5.5 — is that the right call?
Phase 5.5 only dispatches @check. Rationale (ADR-9) is that split review is structural, not complexity. But @simplify's lens — "what if we deleted this?" — could legitimately catch unnecessary tasks (e.g. a third task that adds an abstraction nothing else needs). With ADR-21's one-task-per-run pivot, this question gains a different angle: a @simplify flag on a sibling sub-issue at Phase 5.5 could prevent filing a wasteful sub-issue, which is more valuable than catching the same redundancy at Phase 8 of a future run. Open: is the cost of one more dispatch worth the catch, especially now that Phase 5.5 is the gate for sub-issue fan-out?
Q5: Test-design loop bound vs plan-revisit threshold
ADR-12 sets max 2 cycles for the Phase 7 test-design escalation before reverting to Phase 3 plan revisit. The plan-review and final-review loops have max 3. Why the asymmetry? The test-design loop is more expensive per cycle (@check + @test + @make re-implement vs. just reviewers + plan edit), so 2 may be right. But the choice was made by feel, not measured. Open: is 2 the right number, or should it match Phase 4 / Phase 8 at 3?
Q6: Sub-issue ordering in the parent's ## Sub-issues list
ADR-21's split-time filing creates new sub-issues with depends-on: declarations, but the parent's ## Sub-issues list (rendered by @pm) is currently flat. When dependencies form a chain (sub-issue 2 depends on 1), the user has to read the chain from each sub-issue's frontmatter. Open: should @pm render the parent's sub-issue list in dependency order, with a visible indicator (e.g. indentation or ↳) for dependent items? Cosmetic but would speed up "what to run next" decisions.
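One way the dependency-ordered rendering could look, as a sketch only: render_sub_issues is a hypothetical function, the input is assumed to be a mapping from sub-issue ID to its depends-on list, and the exact output format is illustrative. It uses graphlib.TopologicalSorter from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter
from typing import Dict, List


def render_sub_issues(depends_on: Dict[str, List[str]]) -> List[str]:
    """Render sub-issue IDs so that dependencies come before dependents."""
    siblings = set(depends_on)
    # Only dependencies on sibling sub-issues affect ordering within this list.
    graph = {
        sid: [dep for dep in deps if dep in siblings]
        for sid, deps in depends_on.items()
    }
    lines = []
    # static_order() yields each node after all of its predecessors.
    for sid in TopologicalSorter(graph).static_order():
        if graph[sid]:
            lines.append(f"  ↳ {sid} (depends on {', '.join(graph[sid])})")
        else:
            lines.append(f"- {sid}")
    return lines
```

A dependency cycle raises graphlib.CycleError from static_order(); how @pm should report that case is left open in this sketch.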
Q7: Concurrent-worktree edge case — sub-issue ID collisions
Two parallel runs in different worktrees, each filing sub-issues, can both pick the same next ID (e.g. both pick GAL-42 because both saw GAL-41 as the highest at start). On merge, git surfaces this as a conflict over TODO/GAL-42.md content (two different files staked on the same name). Recoverable but annoying. Open: should @pm's ID generation use a strategy that's safer under concurrent runs (e.g. timestamp suffix, branch-prefix, content-addressable), or accept the merge-conflict-on-collision cost given the one-user assumption?
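To make the collision concrete, the sketch below reproduces the "highest ID plus one" behaviour and contrasts it with one of the candidate strategies, a content-derived suffix. Everything here is hypothetical: the function names, the six-character suffix length, and the assumption that IDs have the form PREFIX-NUMBER. It does not describe how @pm generates IDs today.

```python
import hashlib
import re
from typing import Iterable


def next_sequential_id(existing: Iterable[str], prefix: str = "GAL") -> str:
    """Pick highest-existing-number + 1, as seen from one worktree."""
    pattern = re.compile(rf"{re.escape(prefix)}-(\d+)")
    numbers = []
    for name in existing:
        match = pattern.fullmatch(name)
        if match:
            numbers.append(int(match.group(1)))
    return f"{prefix}-{max(numbers, default=0) + 1}"


def content_suffixed_id(
    existing: Iterable[str], body: str, prefix: str = "GAL"
) -> str:
    """Candidate strategy: append a short hash of the sub-issue body."""
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()[:6]
    return f"{next_sequential_id(existing, prefix)}-{digest}"
```

Two worktrees that both see GAL-41 as the highest ID both get GAL-42 from next_sequential_id, which is the collision described above. With content_suffixed_id the two filings share the GAL-42 stem but differ in suffix unless their bodies are identical, at the cost of longer, less readable IDs.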