---
description: "Multi-agent workflow for the current worktree: plan, test, implement, commit"
agent: build
---

You are executing the multi-agent workflow inside the worktree this opencode session was started from. Run all phases without waiting for user input. The user has walked away.

**Prerequisites (the user handles before launching opencode):**
- A git worktree is checked out for the issue's feature branch
- `opencode` was launched from the root of that worktree
- A `TODO/` directory is committed to the repo with a per-issue tracker (schema in `agents/pm.md`) and a `TODO/README.md` index. The orchestrator does not read or construct per-issue paths — `@pm` is the only agent that touches issue files (ADR-22).

**Task reference:** $ARGUMENTS

If `$ARGUMENTS` is empty, stop immediately: "Usage: `/workflow [base-branch]` (e.g. `/workflow ABC-1`). The ID must already be tracked under `TODO/` (`@pm` validates existence at Phase 2). Base branch defaults to `main` (then `master`)."

Parse `$ARGUMENTS`: the first whitespace-separated token is the issue ID, an optional second token overrides the base branch. Store as `ISSUE_ID`.

---

## Roles & Dispatch

This is a **multi-agent** workflow. There is one orchestrator (you, running in `agent: build` mode per this file's frontmatter) and a cast of specialised subagents that the orchestrator dispatches at each phase.

**The orchestrator coordinates; subagents do the work.** The orchestrator does not write production code, write tests, or play any subagent's role — it plans, dispatches, merges findings, edits its own artifacts under `.workflow/`, and commits.

**The cast** (each defined as a separate agent file under `config/opencode/agents/.md`):

| Subagent | Role | Notable constraints |
|---|---|---|
| `@check` | Reviews plans and code for risks, correctness, testability. Returns `ACCEPTABLE` / `NEEDS WORK` / `BLOCK`. | Read-only — no write / edit / bash. |
| `@simplify` | Reviews for unnecessary complexity. Advisory recommendations. | Read-only. |
| `@test` | Writes failing tests for a task spec, verifies RED, hands off to `@make`. | May only modify test files / `#[cfg(test)] mod` blocks. Bash sandboxed to test runners. |
| `@make` | Implements a single task spec. Verifies acceptance criteria. | May only modify files listed in the task spec. Bash sandboxed to language toolchains; no `git`, network, `cd`. |
| `@pm` | Reads/updates `TODO/` issue files. | May only modify `TODO/` contents. No bash. |

**What "Dispatch" means here.** Every "dispatch `@`" in the phase descriptions is a call to opencode's subagent / task invocation tool with that agent name. Each dispatch starts a **fresh context**: the subagent has no memory of prior phases, no view of this orchestration, and no access beyond what its own file declares. The subagent receives only what the dispatch prompt provides — typically an absolute path to a file in `$RUN_DIR` plus a small per-dispatch context block.

**Anti-patterns to avoid:**
- Performing a subagent's work in the orchestrator's session ("I'll think like `@check` for a moment and produce the review myself"). Every `@` reference is a tool call, not a role-play.
- Skipping a dispatch because the orchestrator "could just do it." The agents enforce permission boundaries the orchestrator (in `agent: build` mode) does not have.
- Paraphrasing a subagent's output into the next dispatch's prompt instead of letting the next subagent read the on-disk artifact directly.

---

## Run Artifacts

The orchestrator writes plan and task-spec artifacts to a per-run directory in the worktree. Subagents read these by absolute path rather than from inline prompt text. This keeps dispatch prompts small, eliminates paraphrase drift between dispatches (`@check` and `@simplify` see the same plan byte-for-byte), and gives Dispatch Hygiene's Finalized-Text Rule a physical anchor — the file *is* the final version.
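
As a minimal sketch of what "small dispatch prompt, artifact by absolute path" can look like, the shell function below assembles a prompt that names the artifact without quoting it. The function name `dispatch_prompt`, its argument order, and the prompt wording are all hypothetical; the actual dispatch goes through opencode's subagent tool, which this sketch does not model.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a short dispatch prompt that points at an
# on-disk artifact instead of embedding its contents. The field labels
# below are illustrative, not a format required by opencode.
dispatch_prompt() {
  local agent="$1" artifact="$2" worktree="$3" branch="$4" base="$5"

  # Subagents start with a fresh context, so only absolute paths are usable.
  case "$artifact" in
    /*) ;;
    *) echo "artifact path must be absolute: $artifact" >&2; return 1 ;;
  esac

  # Refuse to build a prompt for an artifact that is not on disk.
  if ! test -f "$artifact"; then
    echo "artifact not on disk: $artifact" >&2
    return 1
  fi

  # The prompt names the file; it never includes the file's contents.
  printf '%s\n' \
    "Agent: $agent" \
    "Artifact: $artifact (read it from disk before responding)" \
    "Worktree: $worktree" \
    "Branch: $branch (base: $base)"
}
```

Under these assumptions, the prompt stays the same size however long the artifact grows, and every reviewer is pointed at the same file.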

**Directory layout** (relative to `$WORKTREE_PATH`):

```
.workflow/
└── run-/
    ├── plan.md       # Phase 3 output — finalized
    ├── task-1.md     # Phase 5 output — one file per task
    ├── task-2.md
    └── summary.md    # Phase 9 output (the run summary)
```

Define `RUN_DIR="$WORKTREE_PATH/.workflow/run-$ISSUE_ID"` once in Phase 1 and reference it everywhere downstream. Create the directory in Phase 3 (`mkdir -p "$RUN_DIR"`).

**Authoring rules:**
- Files are written by the orchestrator, never by subagents.
- Files are passed to subagents as absolute paths: e.g. *"the plan is at `/plan.md`; read it before responding."* The dispatch prompt body should be short — agent role, artifact path, per-dispatch context (worktree path, branch, base branch). **Do not quote artifact contents inline.**
- Mid-loop revisions (Phase 4 review cycle, Phase 5 task respec, etc.) edit the file in place; every subsequent dispatch reads the new version automatically.

**Lifecycle:**
- Files persist across phases until the run finishes.
- Files are **not committed** (same as `summary.md`). Recommend `.workflow/` in `.gitignore`.
- Multiple runs on the same issue overwrite the prior run's artifacts. Save anything you want to keep before re-running.

---

## Phase 1: Sanity Check

This phase covers **only** git/worktree-shaped sanity. **TODO-tracker validation (issue file existence, `depends-on` enforcement) is `@pm`'s job and happens at Phase 2 (ADR-22)**. The orchestrator does not construct or read paths under `TODO/` at any point — it dispatches `@pm` and uses whatever path `@pm` returns.

1. Verify CWD is a non-bare git worktree: `git rev-parse --is-bare-repository 2>/dev/null` must output `false`. If not, stop: "Workflow must be run from a non-bare worktree (the directory opencode was launched in)."
2. Capture the worktree path: `WORKTREE_PATH="$(pwd)"`.
3. Verify HEAD is not detached: `git symbolic-ref --short HEAD` must succeed. If it fails, stop: "Cannot run on a detached HEAD. Check out a feature branch first."
4. Capture the current branch: `BRANCH_NAME="$(git symbolic-ref --short HEAD)"`.
5. Resolve the base branch (`BASE_BRANCH`):
   - If `$ARGUMENTS` provided a second token, use it.
   - Else if `git rev-parse --verify --quiet main` succeeds, use `main`.
   - Else if `git rev-parse --verify --quiet master` succeeds, use `master`.
   - Else stop: "Could not determine base branch (no `main` or `master`). Pass it as the second argument: `/workflow `."
6. Verify the current branch is not the base branch: if `BRANCH_NAME == BASE_BRANCH`, stop: "Cannot run workflow on the base branch (`$BASE_BRANCH`). Switch to a feature branch first."
7. **Verify the working tree is clean** (ADR-20): `git status --porcelain` must return empty. If not, stop: "Working tree must be clean. Commit or stash uncommitted changes before running the workflow."
8. Set the run-artifacts directory: `RUN_DIR="$WORKTREE_PATH/.workflow/run-$ISSUE_ID"`. Phase 3 will `mkdir -p "$RUN_DIR"` before writing the first artifact.
9. Initialize the run-level rework counter: `PLAN_REWORK_REMAINING=1` (per ADR-13). Decrement on every P5.5-BLOCK→P4, P7-escalation-exhaustion→P3, and P8-plan-level→P3 transition. When the counter is `0` and another such transition fires, abort to the Failure Handler instead of re-entering.

---

## Phase 2: Issue Context

Dispatch `@pm` with the issue ID `$ISSUE_ID`, `$WORKTREE_PATH`, and `Validate run prerequisites` as the operation. **The orchestrator does not assume any path under the worktree's `TODO/` tree exists** — it asks `@pm` to:

1. Verify the TODO tracker is well-formed in this worktree (directory + index file present).
2. Locate the issue file for `$ISSUE_ID`.
3. Verify all `depends-on:` entries in the issue's frontmatter resolve to issues with `status: Done` (ADR-21 / ADR-22).
4. Return one of two structured responses:

**Success:**

```json
{
  "ok": true,
  "issue": {
    "id": "...",
    "title": "...",
    "status": "Todo | In Progress | Done",
    "parent": "... | null",
    "labels": [...],
    "depends_on": [...],
    "description": "...",
    "acceptance_criteria": [{"checked": false, "text": "..."}],
    "sub_issues": [...]
  }
}
```

**Failure:**

```json
{
  "ok": false,
  "error_code": "tracker_missing | issue_not_found | dependency_unmet | dependency_missing",
  "message": ""
}
```

On failure, stop the workflow with `@pm`'s `message` verbatim. Do **not** attempt to inspect or repair the TODO tree from the orchestrator — that belongs to `@pm`.

On success, the orchestrator works exclusively from the structured `issue` object. **Every subsequent TODO operation re-dispatches `@pm` by issue ID** — the orchestrator never holds or passes around a file path.

If `issue.status == "Todo"`, dispatch `@pm` again to flip it to `In Progress` (operation: `Update status`, target: the same issue ID; `@pm` propagates to README.md / parent's `## Sub-issues` line). The status edit will be staged alongside other TODO updates in Phase 9.

**Forbidden in the orchestrator from this point forward:**
- Reading any file inside the `TODO/` tree directly.
- Constructing a per-issue file path from an issue ID — `@pm` is the only agent that knows the layout.
- Editing or writing any file under `TODO/` — every TODO mutation is a `@pm` dispatch that returns the path of what it touched.

These rules are enforced by *not telling you the path layout*. The schema lives in `agents/pm.md`; the orchestrator never needs it.

---

## Phase 3: Plan

Analyze the codebase. Create a detailed implementation plan addressing the issue's requirements and acceptance criteria, then write it to `$RUN_DIR/plan.md` (run `mkdir -p "$RUN_DIR"` first if the directory doesn't exist). All Phase 4 reviewer dispatches read this file.

The plan should include:
- Problem summary (from issue context)
- Proposed approach with rationale
- Files to modify (with brief description of changes)
- New files to create
- Risks and open questions
- **Test Design (conditional — include for non-trivial tasks):**
  - Key behaviors to verify, expressed as **action + observable outcome** (e.g. *"call `weave_enemies` with t=0.5 → enemy `Transform.translation.x` differs from initial position"*). A structural fact like *"enum has 3 variants"* or *"struct has these fields"* is **not** a behavior — it cannot fail meaningfully and does not exercise the code under test.
  - Edge cases and error conditions worth testing (also expressed as actions, not structure)
  - What explicitly should NOT be tested (prevents bloat)
  - Testability concerns (heavy external deps, GPU-only paths, etc.)

**Include Test Design for:** Public API changes, bug fixes with behavioral impact, new features with business logic, multi-module changes.

**Skip Test Design for:** Config-only changes, decorator swaps, import reorganization, documentation. When skipped, `@test` derives test cases directly from acceptance criteria.

Before saving `plan.md`, apply **Dispatch Hygiene** (below). The file on disk is what reviewers will read in Phase 4 — there is no second chance to revise during dispatch.

---

## Dispatch Hygiene

This applies to **every** subagent dispatch (Phases 4, 5.5, 6, 7, 8) **and** to artifacts that will be dispatched (the plan from Phase 3, the task specs from Phase 5). Apply these checks before sending — fix the artifact, then re-check.

### Finalized-Text Rule

The artifact must be **finalized** — single-author text, no contradictions, no open questions.

Forbidden:
- "Actually, that's wrong — let me correct…"
- "Wait, let me reconsider…"
- Two versions of the same code block, one labelled "corrected" or appearing after a revision pass
- Open questions or ambiguities the orchestrator hasn't resolved
- Mid-text revisions visible to the recipient

If you find yourself revising while writing, stop, redo the artifact from scratch with the corrected understanding, and only then dispatch. Subagents are fresh-context — they cannot reliably resolve which of two contradictory drafts is canonical, and reviewers cannot give a clean verdict on a self-contradicting plan.

### No-Implementation-in-Plan-or-Spec Rule

Plans (Phase 3) and task specs (Phase 5) are **not** the place to write the answer. They describe what to do; `@make` writes how.

Provide:
- Approach with rationale
- Files to modify with brief descriptions
- Function signatures, type declarations, data shapes (structure, not logic)
- Constraints, invariants, integration contracts
- Risks and edge cases

Do **not** provide:
- Drop-in code blocks longer than ~5 lines that constitute "the answer"
- Full function bodies for the changes being planned
- Complete `match` arms / branch logic / loop bodies for new behavior
- Pre-written test bodies (those come from `@test`)
- Stage-by-stage code transformations spelled out as ready-to-commit diffs

If you've already written the implementation in the plan or spec, the artifact has overstepped. Convert finished code into structural description (signature + intent) and let `@make` produce the body.

**Allowed in plans/specs:**
- Existing code being replaced, marked as "current state"
- Function signatures and type/struct/enum declarations (data, not logic)
- Tiny inline constants (`pub const FOO: f32 = 30.0;`)
- Test specifications as one-line behavior descriptions ("input X → expect Y")

### Pre-Dispatch Validation (MANDATORY)

Scan the artifact and reject (revise, retry) if any of the following are present:

| Check | Why it matters |
|---|---|
| `bash -c`, `sh -c`, `zsh -c`, `fish -c` (anywhere, including inside `nix develop --command bash -c …`) | `@make`/`@test` sandboxes deny all `*-c` shell invocations and any nested `bash` would bypass the per-command allowlist. Replace with one direct command per line: `nix develop -c cargo check`, etc. |
| `nix develop --command bash` / `nix develop -c bash` / `nix develop -c sh` | Same — inner shell escapes the sandbox. Wrap each toolchain command directly. |
| Any `cd && …` | Subagents cannot `cd`. Rewrite to use absolute paths. |
| Code blocks longer than ~5 lines that draft the answer | Violates No-Implementation-in-Plan-or-Spec. Trim to structure (signature + "current state" only). |
| Two versions of the same code, "actually let me correct…", or open questions | Violates the Finalized-Text Rule. Redo the artifact. |
| Test bodies inside `@make` specs when tests are coming from `@test` | Duplicates the TDD handoff. |
| Artifact path referenced in the dispatch (e.g. `$RUN_DIR/plan.md`, `$RUN_DIR/task-.md`) but the file isn't on disk | The subagent will fail to read it and either error or fabricate context. **Verify with `test -f ""` before every dispatch.** If missing, go back to the phase that produces it (Phase 3 for `plan.md`, Phase 5 for `task-.md`) and write the file before retrying. |

If any check trips, **do not dispatch.** Fix and re-validate. Repeated trips on a single task signal a Phase 5 split problem — go back and split.
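
The mechanical rows of the validation table (nested shells, `cd` chains, revision phrases, missing artifact) can be approximated with `grep`. The sketch below is one possible way to do that, not part of the workflow definition: the function names, messages, and exit codes are assumptions, and extending the `nix develop` check to `zsh` and `fish` is my own generalisation. The judgement-based rows (over-long answer code, duplicated drafts, test bodies in `@make` specs) still need a manual read.

```shell
#!/usr/bin/env bash
# Hypothetical pre-dispatch scan for the mechanically checkable rows.

# check_pattern LABEL REGEX FILE: print offending lines (with line numbers)
# and return 1 if the extended regex matches anywhere in FILE.
check_pattern() {
  local label="$1" regex="$2" file="$3"
  if grep -nE -- "$regex" "$file"; then
    echo "REJECT ($label): $file" >&2
    return 1
  fi
  return 0
}

# scan_artifact FILE: 0 = clean, 1 = forbidden pattern found, 2 = file missing.
scan_artifact() {
  local file="$1" status=0

  # The artifact named in the dispatch must exist on disk.
  if ! test -f "$file"; then
    echo "REJECT (missing artifact): $file" >&2
    return 2
  fi

  # bash -c / sh -c / zsh -c / fish -c, anywhere on a line.
  check_pattern "nested shell -c" \
    '(^|[^[:alnum:]_-])(bash|sh|zsh|fish)[[:space:]]+-c([[:space:]]|$)' \
    "$file" || status=1

  # nix develop --command <shell> / nix develop -c <shell>.
  check_pattern "shell inside nix develop" \
    'nix[[:space:]]+develop[[:space:]]+(--command|-c)[[:space:]]+(bash|sh|zsh|fish)([[:space:]]|$)' \
    "$file" || status=1

  # cd chained with && (subagents cannot cd).
  check_pattern "cd chain" \
    '(^|[^[:alnum:]_./-])cd[[:space:]]+[^&]*&&' \
    "$file" || status=1

  # Visible mid-text revisions (Finalized-Text Rule). Matching is
  # case-sensitive; the phrases are lowercase in the rule's own examples.
  check_pattern "revision phrase" \
    'let me (correct|reconsider)' \
    "$file" || status=1

  return "$status"
}
```

A caller would run `scan_artifact "$RUN_DIR/plan.md"` and dispatch only on exit status `0`. A clean result from this scan does not imply the artifact passes the remaining, non-mechanical checks.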

---

## Phase 4: Review Plan

Dispatch `@check` and `@simplify` in parallel to review `$RUN_DIR/plan.md`. The dispatch prompt is short — agent role, the absolute path to the plan, the worktree path, and any per-dispatch reviewer focus. Tell each reviewer to read the plan from disk; do **not** paste the plan inline. Apply **Dispatch Hygiene** to each dispatch prompt.

Reviewers should evaluate testability:
- `@check`: Is the design testable? Are the right behaviors identified? (Review Framework §8)
- `@simplify`: Is the test scope appropriate? Over-testing proposed?

**Verdict authority** (ADR-15): `@check` is the only blocking reviewer. Its `NEEDS WORK` and `BLOCK` verdicts gate progression; its `ACCEPTABLE` verdict permits proceeding regardless of what `@simplify` reports. `@simplify` is advisory at every gate — its findings are recorded in the run summary's "Advisory notes (not filed)" section but never trigger a re-dispatch loop. The user may manually elevate a `@simplify` finding into a `task-fix-.md` if it warrants follow-up.

**Review loop (max 3 cycles, gated on `@check`):**
1. Dispatch both reviewers against `$RUN_DIR/plan.md`.
2. Merge findings: record `@simplify`'s output for the summary; act only on `@check`'s verdict.
3. If `@check` returns ACCEPTABLE: proceed to Phase 5 (regardless of `@simplify`).
4. If `@check` returns BLOCK or NEEDS WORK: edit `$RUN_DIR/plan.md` in place addressing the findings (re-apply Dispatch Hygiene to the updated file), then re-review.
5. **Convergence detection:** if `@check` returns the same findings as the previous cycle, stop the loop early.
6. If still unresolved after 3 cycles: note unresolved blockers and proceed anyway (they will be documented in the workflow summary and commit message).

---

## Phase 5: Split into Tasks

**The output of this phase is one file per task at `$RUN_DIR/task-.md`** (1-indexed: `task-1.md`, `task-2.md`, …). These files are the source-of-truth that Phase 5.5, Phase 6, and Phase 7 read by absolute path. **No file written = no dispatch in later phases.** If you skip the file-write step, every downstream dispatch will reference a non-existent path and fail.

**One-task-per-run model (ADR-21):** Phase 5 still produces N task files. After Phase 5.5 review, only **task-1** runs through Phases 6–8 in this invocation. If N > 1, tasks 2…N are filed as TODO sub-issues at the end of Phase 5.5 (see "File sibling tasks as sub-issues" below) and the user runs `/workflow` separately on each. This bounds the run's scope, keeps Phase 8 diffs reviewable, and eliminates cross-task regression risk inside a run.

Steps:
1. Break the approved plan into discrete tasks (see Split Heuristic and task-size guidance below).
2. For each task, draft the task spec covering the fields in the table below.
3. Apply **Dispatch Hygiene** (above) to each draft.
4. **Write each finalized spec to `$RUN_DIR/task-.md`.** After writing, verify with `test -f "$RUN_DIR/task-.md"` for every N. Phase 5 is not done until every task file exists on disk.
5. Drop your inline copies of the task drafts. From this phase onward, the file is the only source of truth — if you need a task spec later, read it back from disk.

Each task file must contain:

| Required | Description |
|----------|-------------|
| **Task** | Clear description of what to implement |
| **Acceptance Criteria** | Specific, testable criteria (checkbox format) |
| **Code Context** | Actual code snippets from the codebase, not just file paths |
| **Files to Modify** | Explicit list, mark new files with "(create)" |
| **Test File** | Path for test file. **Pick the pattern that matches the project's language** — see "Test File Path by Language" below. |

### Test File Path by Language

The test file path must follow the language's actual test layout. **Do not invent paths that look colocated but aren't valid for the language** (e.g. `src/tests/test_.rs` is *not* a Rust test location — it's a regular `src/` submodule).

- **Python**
  - Colocated: `/tests/test_.py (create)`
  - Top-level: `tests/test_.py (create)`
- **Rust**
  - **Module tests** (most common — testing private/crate-internal functions): pick the relevant production source file, e.g. `src/.rs`. `@test` is permitted to add or edit content **only inside `#[cfg(test)] mod { … }` blocks** in that file (per `@test`'s File Constraint). The rest of the file remains read-only to `@test`.
  - **Integration tests** (testing the crate's public API as a black box): `tests/.rs (create)`, or in a workspace `/tests/.rs`.
  - **In both cases**, if the test references not-yet-existing functions/types, the task requires a **stub-first `@make` pre-pass** so the symbols exist as `todo!()` bodies before `@test` runs. See Phase 6 → "Rust stub-first TDD". Plan for two `@make` dispatches per such task: stub pass, then body pass.
  - **`src/tests/.rs` is not a valid path** — it would be a regular submodule needing `mod tests;` in production code. Use one of the two forms above.
- **Polyglot Nix flake**
  - Match the host language of the code under change (Python or Rust rules above), wrapping commands in `nix develop -c …` per the agents' devshell rule.

Include **Integration Contracts** when a task adds/changes function signatures, APIs, config keys, or has dependencies on other tasks.

Include **Test Design** from Phase 3 when available, attached to the relevant task(s).

**Task size:** ~10-30 minutes each, single coherent change, clear boundaries.

### Split Heuristic — when in doubt, **do not** split

In the one-task-per-run model (ADR-21), splitting fans work out across user sessions: every additional task becomes a sub-issue the user must come back and run as its own `/workflow` invocation, with full P3/P4/P5/P5.5/P6/P7/P8 overhead per sub-issue. **Default to keeping work in one task.** Only split when one of the mechanical triggers below clearly applies *and* the resulting sub-tasks each warrant their own commit/PR-sized chunk of attention.

A task should be **split** when any of the following apply:
- It touches more than two distinct concerns (e.g. *constants + new component + sprite spawn + new system + main wiring* is **five** concerns — at least three tasks).
- It changes more than ~50 lines across more than 2 files.
- It mixes data/structural changes (constants, types, components) with runtime/system changes (new ECS systems, scheduling, render loops).
- It mixes pure-logic changes (math helpers) with stateful changes (queries, world mutation).
- It mixes new APIs with their first call sites in the same task.

**Tiebreaker:** when none of the triggers clearly applies and the work plausibly fits a single coherent commit, do not split. Splitting fans out across sessions; only split if each resulting sub-issue is genuinely independently runnable and benefits from its own plan.

When a task does fail the heuristic, split into:
1. **Foundations** — new constants, types, components (no behavior change yet).
2. **Implementation** — the actual production logic, calling the foundations.
3. **Wiring** — registration in `main.rs` / `lib.rs` / app-builder.

Tasks 2…N are filed as sub-issues at the end of Phase 5.5; only task 1 runs in this invocation.

### Code Context — what to include

The **Code Context** field exists so `@make` can find the seam to modify. Provide:
- The existing code being replaced (verbatim, marked as "current state"), with ~5–10 lines of surrounding context
- Function signatures of helpers `@make` will need to call
- The file's relevant import block

For everything you must **not** include — drop-in replacements, full function bodies, pre-written test bodies, "here is what to write" — see **Dispatch Hygiene → No-Implementation-in-Plan-or-Spec Rule** above.
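
One way to lift a verbatim "current state" excerpt with surrounding context is a small `sed` wrapper. The helper below is a hypothetical sketch: the name `code_context`, its arguments, and the default padding of 5 lines are assumptions chosen to match the "~5–10 lines" guidance above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print lines START..END of FILE plus PAD lines of
# context on each side, verbatim, for pasting into a task spec's
# Code Context field.
code_context() {
  local file="$1" start="$2" end="$3" pad="${4:-5}"
  local from=$(( start - pad ))
  local to=$(( end + pad ))

  # Clamp the window at the top of the file; sed stops at EOF on its own.
  if [ "$from" -lt 1 ]; then
    from=1
  fi

  sed -n "${from},${to}p" "$file"
}

# Example call (path and line numbers are illustrative only):
#   code_context "$WORKTREE_PATH/src/lib.rs" 40 52 8
```

The output is the file's own text with nothing rewritten, which keeps the excerpt on the "current state" side of the No-Implementation-in-Plan-or-Spec Rule.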

If the task is so well-specified that you've already written the implementation, the task is too small for `@make` (apply it directly) or you've over-determined the design (revisit Phase 3).

Apply **Dispatch Hygiene** to each task spec before dispatch in Phase 7.

---

## Phase 5.5: Review Task Split

A short, focused review of the task split as a set. In the one-task-per-run model (ADR-21), this phase is the gate for **two** things: (a) catching split errors (missed scope, overlap, multi-purpose tasks, missing integration contracts) before `@test`/`@make` dispatch, and (b) preventing a botched split from being persisted as garbage sub-issues that the user has to manually clean up later. Both stakes are higher than in the original N-tasks-per-run design.

**Dispatch only `@check`** for this phase — split review is structural / coverage, not complexity. `@simplify` is not involved. Apply **Dispatch Hygiene** to the prompt.

**Skip Phase 5.5 entirely when N=1** (ADR-21): a single-task plan has no split to review. Three of the six questions below (no overlap, integration contracts, sub-issue self-containment) are degenerate. The remaining structural concerns (coverage, single-purpose, testable AC) are already evaluated at Phase 4 plan acceptance. Proceed directly to Phase 6.

The dispatch prompt names:
- `$RUN_DIR/plan.md` (the plan being decomposed)
- `$RUN_DIR/task-1.md` through `$RUN_DIR/task-N.md` (the split — list every task file)
- The worktree path

`@check` evaluates the split against six questions:

1. **Coverage** — do the tasks together implement everything the plan promises? Any gaps?
2. **No overlap** — do two tasks claim the same scope or modify the same lines?
3. **Single-purpose** — does any task do more than one thing? (See Phase 5's Split Heuristic.)
4. **Integration contracts** — where two tasks touch a shared interface, is the contract documented in *both* task files in a form that survives sub-issue filing? (Each sub-issue runs in isolation later — its eventual P3 plan must be reconstructable from the sub-issue body alone, including any cross-sub-issue dependencies. This is the load-bearing question in the new model.)
5. **Testable acceptance criteria** — does every task have specific, falsifiable AC?
6. **Self-containment** — is each task spec runnable as a standalone `/workflow` invocation? Does its description carry enough plan-level context (rationale, code seams, scope boundary) that a fresh run could re-plan it without seeing the parent plan or sibling task files?

**Review loop (max 2 cycles):**
1. Dispatch `@check` against the plan + all task files.
2. If `ACCEPTABLE` → proceed to "File sibling tasks as sub-issues" (below), then Phase 6.
3. If `NEEDS WORK` → edit the task files in place (split a task into two, merge two tasks, strengthen integration contracts, sharpen AC, add self-containment context). Re-apply Dispatch Hygiene to each updated file. Re-dispatch.
4. If `BLOCK` with a plan-level finding (ADR-17) → translate the split-level finding into a concrete `plan.md` edit, save the edit, **decrement `PLAN_REWORK_REMAINING`**, and re-enter Phase 4 against the revised plan. If `PLAN_REWORK_REMAINING` was already `0`, abort to the Failure Handler instead.
5. **Convergence detection:** same `@check` finding twice → stop loop, document the unresolved split issue in the run summary, proceed.

**This is a quick gate, not a deep review.** No line-by-line code feedback (there's no code), no design re-litigation (that was Phase 4's job). The whole point is a fast structural check before downstream phases start churning *and* before sibling tasks become persistent sub-issues.

### File sibling tasks as sub-issues (when N > 1)

After Phase 5.5 returns ACCEPTABLE, dispatch `@pm` to file each of `task-2.md` through `task-N.md` as a TODO sub-issue with `parent: $ISSUE_ID`. **Only task-1 continues into Phase 6.**

Each filed sub-issue gets a rich seed body (ADR-21) so its eventual `/workflow` run can plan and implement without seeing siblings or the original `plan.md`. For each task `$N` in 2…N, dispatch `@pm` with the following body content (assembled by the orchestrator from `task-.md` and the relevant slice of `plan.md`):

```markdown
## What to implement
.md>

## Acceptance criteria
.md>

## Code Context
.md>

## Integration with sibling sub-issues
, ...]`>

## Plan rationale

## Test design
.md or plan.md if present>

---
Discovered during run on `$BRANCH_NAME` for parent issue `$ISSUE_ID`.
```

`@pm` invocation per sub-issue:
- Title — derived from `task-.md`'s task description (short imperative).
- Status — `Todo`.
- Parent — `$ISSUE_ID`.
- Labels — propagate relevant labels from the parent (e.g. `gameplay`); add `split-from-run` to mark the provenance.
- `depends-on:` — sibling sub-issue IDs that this task requires to be `Done` first. The orchestrator determines the dependency graph from the integration contracts captured in Phase 5.5 question 4.

The new sub-issue files plus the parent's updated `## Sub-issues` list are staged in Phase 9's `chore(todo): …` commit alongside the parent's status/AC updates.

---

## Phase 6: Write Tests

Apply **Dispatch Hygiene** to each `@test` prompt before sending.

For each task that runs in this invocation (only task-1 in the one-task-per-run model, ADR-21), dispatch `@test` with a short prompt that names:
- The absolute path to the task spec: `$RUN_DIR/task-.md` — `@test` reads acceptance criteria, code context, and files-to-modify from there.
- The absolute path to the plan, if test design context is needed: `$RUN_DIR/plan.md`.
- The worktree path (so `@test` resolves source files correctly).
- The test file path to create.

Do **not** quote task or plan content inline — `@test` reads from disk.

`@test` writes failing tests and verifies RED with structured failure codes.
**Decision table — handling `@test` results:** | Condition | Action | |-----------|--------| | `TESTS_READY` + `escalate_to_check: false` | Proceed to Phase 7 | | `TESTS_READY` + `escalate_to_check: true` | Route tests to `@check` for light review. `@check` diagnoses, caller routes fixes to `@test`. Then proceed. | | `NOT_TESTABLE` | Route to `@check` for sign-off on justification. If `Missing testability seam`, dispatch `@make` to add the seam first, then re-run `@test`. Otherwise the task goes to `@make` without tests. **Record the `@test` justification + `@check` sign-off rationale** for the Phase 8 NOT_TESTABLE manifest (ADR-18). | | `BLOCKED` | Investigate. May need to revise task spec or plan. | | Test passes immediately | Investigate — behavior may already exist. Task spec may be wrong. | | Stub-first run: tests pass with zero `todo!()` panics | **Structural-only tests.** Every test is asserting type/struct/enum facts without calling any stubbed symbol. Reject the test output and route back to `@test` with a "must exercise the stubbed symbols by calling them" note. Do not let these tests gate Phase 7 — they cannot RED→GREEN, so the body-pass `@make` would commit code with false-green coverage. | ### Rust stub-first TDD (mandatory for new symbols) Whenever `@test` will write tests (module or integration) that reference functions / methods / types **that do not yet exist**, the test cannot RED meaningfully against absent code: - *Module tests inside `src/.rs`* — without the function, the `#[cfg(test)] mod tests` block fails to compile (`error[E0425]`), masking assertion diagnostics. - *Integration tests inside `tests/.rs`* — same, but mediated through `lib.rs` re-exports. To get a clean runtime RED, dispatch a **stub-first `@make` pass** *before* `@test` runs: **Stub pass (split from Phase 7's body pass):** 1. Dispatch `@make` in **standard mode** (no tests exist yet). 
The dispatch prompt names `$RUN_DIR/task-.md` as the source spec and adds this stub-pass-specific scope inline: - **Goal:** add the planned API as `todo!()`-bodied stubs so the test will compile. - **Files to modify:** the relevant `src/.rs` for module tests, or `src/lib.rs` plus any new `src/.rs` for integration tests (the latter need `pub mod …;` declarations so the test crate can import). - **Stubs only:** every function body is exactly `todo!()`. Every method body is exactly `todo!()`. Structs may use `pub struct Foo;` or `pub struct Foo { /* fields TBD */ }` — but no logic. - **Signatures must match the planned final API exactly** (return types, lifetimes, generics, visibility). Lift signatures from the task spec. - **Acceptance criteria:** `cargo check` (wrapped in `nix develop -c …` if the project has a devshell) passes; no test command is run. - **Dispatch Hygiene still applies:** the stub pass is small and finalized — no draft bodies, no contradictory signatures. 2. Verify `cargo check` passed in `@make`'s output. If not, fix and re-dispatch the stub pass before continuing. 3. Dispatch `@test`. The test now compiles; running it panics on `todo!()` at runtime, which is a clean `MISSING_BEHAVIOR` RED with a stack trace — far better than the build-error-RED form. 4. **Panic-coverage check (MANDATORY).** After `@test` returns, re-run the test command in the orchestrator and verify that **every test in the new file panics on `todo!()`** (i.e. every test exercises at least one of the stubbed symbols). The decision rule: - If the test output shows N panics for N tests → proceed to body pass. - If any test passes without a `todo!()` panic → that test is structural-only (asserting type / variant-count / field facts without calling the stubbed code). **Reject** `@test`'s output and route back with the "Stub-first run: tests pass with zero `todo!()` panics" decision-table verdict. 
     Require `@test` to rewrite each non-panicking test so it actually invokes the stubbed function/method.
   - This check is the only thing standing between false-green coverage and the body-pass commit. Skipping it has produced regressions like a system that compiles, "passes" tests, and silently no-ops in production.
5. Continue to Phase 7's body pass (`@make` in TDD mode), where the same files are revisited and the `todo!()` bodies are replaced.

**This routing is mandatory** whenever new symbols are introduced in Rust (module or integration). It is **not** required when the test exercises an *existing* function/method (e.g. a behavior fix) — in that case `@test` runs directly and `@make` modifies the body in Phase 7.

The stub pass and the body pass each produce their own atomic commit (per Phase 9 rules): `feat(): scaffold with todo!() stubs` followed by `feat(): implement ` (or whichever conventional type fits).

**Parallelism:** N/A in the one-task-per-run model (ADR-21). Phase 6 dispatches `@test` for task-1 only; the stub-pass `@make` (when applicable) runs strictly before `@test`, and the body-pass `@make` strictly after. There are no peer dispatches to parallelise.

**Constraint:** `@test` must not modify existing `conftest.py` files (preserves cross-test invariants for the project's broader suite).

---

## Phase 7: Implement

Apply **Dispatch Hygiene** to each `@make` spec before sending. Repeated round trips on a single task signal a scoping problem — `@check`'s diagnosis (below) will return `split_needed`, and the orchestrator aborts to the Failure Handler (see "split_needed exit" below).

**One task per run** (ADR-21): only task-1 reaches Phase 7. Tasks 2…N were filed as sub-issues at the end of Phase 5.5; this run does not dispatch `@make` for any of them. Stub-pass and body-pass dispatches for task-1 still run sequentially (the stub-pass must commit before `@test` runs against it, and the body-pass must run after `@test` produces failing tests).
Execute task-1 by dispatching `@make` with a short prompt:

- The absolute path to the task spec: `$RUN_DIR/task-1.md` — `@make` reads acceptance criteria, code context, and files-to-modify from there.
- The worktree path.
- **Pre-written failing tests and handoff from `@test` (if TESTS_READY)** — these are short and per-dispatch, so include them inline in the prompt.

Do **not** quote the task spec inline.

`@make` runs in TDD mode when tests are provided:

1. Entry validation: run tests, verify RED, check failure codes match handoff.
2. Implement minimal code to make tests pass (GREEN).
3. Regression check: run the project's full test suite (`nix develop -c cargo test` / `uv run pytest` / `nix flake check` as appropriate).
4. Refactor while keeping green.
5. Report RED→GREEN evidence.

For NOT_TESTABLE tasks, `@make` runs in standard mode (no entry validation; standard implementation + verification).

### Implementation Incomplete — unified diagnosis path (ADR-19)

When `@make` returns `Implementation Incomplete` for *any* reason — entry-validation concern, mid-implementation iteration limit, `escalate: test_design` flag, `escalate: split_needed` flag, or no flag at all — the orchestrator routes the report through `@check` for diagnosis. **Do not re-dispatch `@make` with marginal context tweaks.**

`@make`'s self-diagnosis (the `escalate:` flag, if present) becomes a *hint* for `@check`; `@check` is the authority that decides what to do next.

Steps:

1. `@make` returns its `Implementation Incomplete` report (with or without an `escalate:` flag).
2. Orchestrator dispatches `@check` for diagnosis. Inputs: the test files (`@check` reads them from disk), the production code state (the in-progress diff), the task spec at `$RUN_DIR/task-1.md`, and `@make`'s self-diagnosis hint.
3. `@check` returns one of three verdicts:
   - **`test_design`** — the test demands production code that is impossible or internally inconsistent, or it tests the wrong observable.
     Orchestrator dispatches `@test` to redesign the tests. Apply Dispatch Hygiene. Fixed tests return to `@make` for fresh entry validation and a clean implementation attempt.
   - **`production_logic`** — the test is sound; `@make`'s implementation is wrong or incomplete. Orchestrator re-dispatches `@make` with `@check`'s diagnostic notes attached.
   - **`split_needed`** — the task is over-scoped; no realistic implementation can satisfy the AC within the task's stated files-to-modify. See "split_needed exit" below.

**Iteration limit on this loop: max 2 cycles.** If two cycles of `@check` diagnosis don't yield a clean `@make` Implementation Complete, the design problem is upstream — **decrement `PLAN_REWORK_REMAINING`** and revisit the Phase 3 plan. If `PLAN_REWORK_REMAINING` is already `0`, abort to the Failure Handler instead.

### split_needed exit (ADR-21)

When `@check` diagnosis returns `split_needed` for task-1, **abort to the Failure Handler.**

Rationale (Q19a per ADR-21): in the one-task-per-run model, task-1 is the only task in this invocation. No feature AC have been satisfied yet (any stub-pass commit is scaffolding, not feature work). Pre-filing a sub-issue from a botched plan would pollute `TODO/` with bad scoping the user has to manually clean up; the most likely upstream diagnosis is a plan-level scoping error, not a task-level over-scope.

The Failure Handler's recovery procedure (ADR-14: discard worktree, delete branch, re-create from base, re-run) cleanly destroys the stub-pass commit and any in-progress body-pass changes. The user re-runs `/workflow` on the same parent issue; Phase 3/4/5 produce a fresh plan and split. Sub-issues filed at Phase 5.5 of the *previous* run persist in `TODO/` only if that run's Phase 9 reached the TODO commit, which by definition it did not when Phase 7 invoked the Failure Handler, so they are typically gone with the discarded worktree.

Concretely on `split_needed`:

1. Write a Failure Handler summary noting `@check`'s diagnosis verbatim and the Phase 5 split that was attempted.
2. Dispatch `@pm` (operation: `Add comment`, issue ID: `$ISSUE_ID`) with the comment text: `- YYYY-MM-DD — split_needed at Phase 7 task-1; . Re-run after re-creating the worktree.` `@pm` resolves the issue file path itself; the orchestrator never constructs it.
3. Stop execution. Do not commit code, do not file new sub-issues, do not stage anything under `.workflow/`.

---

## Phase 8: Final Review

Apply **Dispatch Hygiene** to each reviewer prompt before sending.

Dispatch `@check` and `@simplify` in parallel to review the implementation. Because the run executes exactly one task (ADR-21), the diff is bounded by the task's scope (~50 lines per the Split Heuristic) and fits comfortably in reviewer context — no chunking needed.

Provide reviewers with:

- The absolute path to `$RUN_DIR/plan.md` (the same file Phase 4 reviewed; mid-loop revisions will have updated it in place).
- The absolute path to `$RUN_DIR/task-1.md` (the spec the implementation actually targeted).
- The full diff (`git diff "$BASE_BRANCH"...HEAD`).
- Any decisions or deviations from the plan, captured inline in the dispatch prompt.
- **NOT_TESTABLE manifest (ADR-18):** if task-1 went `NOT_TESTABLE` at Phase 6, the dispatch prompt includes a "Tasks completed without tests (NOT_TESTABLE)" section listing the `@test` justification and the `@check` sign-off rationale. Reviewers explicitly evaluate "does the justification still hold given the final diff?" and may BLOCK if it doesn't. If task-1 had tests (the common case), this section reads "None — task-1 has tests."

**Verdict authority** (ADR-15): same as Phase 4. `@check` is the only blocking reviewer; `@simplify`'s findings are recorded as advisory in the run summary but never trigger a re-dispatch loop.

**Review loop (max 3 cycles, gated on `@check`):**

1. Send implementation to both reviewers (dispatched in parallel).
2. Merge findings: record `@simplify`'s output for the summary; act only on `@check`'s verdict.
3. If `@check` returns ACCEPTABLE: proceed to Phase 9.
4. If `@check` finds issues, route per the kind of finding — **the orchestrator does not write production code; `@make` does**:
   - **`BLOCK`, behavioral, correctness, or production-code findings:** write a fix spec to `$RUN_DIR/task-fix-.md` (1-indexed within this Phase 8 cycle, ADR-16). Apply Dispatch Hygiene, finalized text, no draft answer. Verify with `test -f` before dispatching. Dispatch `@make` with the absolute path. Do **not** fix directly. Every `BLOCK` is by definition behavioral and must round-trip through `@make`.
   - **Test-quality / test-design findings:** route through the unified Implementation Incomplete diagnosis path (Phase 7) — `@check` diagnosis → `@test` redesign → fresh `@make` re-attempt against the existing task spec.
   - **Plan-level finding:** **decrement `PLAN_REWORK_REMAINING`** and re-enter Phase 3 with the finding. If the counter is already `0`, abort to the Failure Handler.
   - **Strictly cosmetic findings** (typo in a comment, missing trailing newline, formatting that does not change the AST or behavior): the orchestrator may fix directly, then re-review. Anything compiler-detected (unused import, dead code) goes through `@make` via a `task-fix-.md`, since removing it is still a code change.
   - When in doubt, dispatch `@make`.
5. **Convergence detection:** if `@check` returns the same findings twice, stop the loop early.
6. If unresolved after 3 cycles: document blockers, proceed to commit anyway.

---

## Phase 9: Commit and Wrap Up

The workflow is forge-agnostic. It commits locally and stops. **Do not push, and do not open a pull/merge request** — the user chooses their forge and review workflow manually.

### Commit Code Changes

- Stage code changes only.
  **Do not stage anything under `TODO/`** (committed separately below) and **do not stage anything under `.workflow/`** (intentionally never committed — these are per-run artifacts).
- Write a conventional commit message summarizing the implementation. Reference the TODO issue ID in the body (e.g. `Refs: GAL-39`).
- If changes are large/varied, use multiple atomic commits (one per logical unit).

### TODO Update

Dispatch `@pm` with the issue ID `$ISSUE_ID` and the following operations (a single dispatch can carry all of them — see `agents/pm.md` for the request shape):

1. **Check off the AC the run satisfied.** Pass the list of AC indices or texts (from the `acceptance_criteria` array `@pm` returned at Phase 2) that the implemented work fulfilled. The orchestrator decides which AC are satisfied by inspecting task-1's spec and verification output. `@pm` flips the corresponding `- [ ]` to `- [x]`.
2. **Set the issue's `status` based on AC completion** (ADR-21, AC-driven):
   - **All AC are now `[x]`** → `Done`.
   - **Some AC remain `[ ]`** → `In Progress`. (Sub-issues filed at Phase 5.5 cover the unmet AC; the user runs them in subsequent invocations.)
   - **No AC section** → `Done` (the parent had no testable AC; one task ran end-to-end).
3. **Add a comment** of the form: `- YYYY-MM-DD — Branch \`$BRANCH_NAME\`, commit ` (date from the shell, never fabricated).

`@pm` propagates status flips to the dependent index (the top-level README or the parent's `## Sub-issues` line) on its own — that's its job, not the orchestrator's. The orchestrator passes high-level intent ("set status to Done") and trusts `@pm` to update every dependent file.

### File Follow-ups

Unresolved items worth tracking must become real TODO issues; otherwise they vanish into the per-run `summary.md` and the user (who has walked away) never sees them.
Before writing the summary, scan the run for items in these categories and dispatch `@pm` to file each as a **sub-issue of the current issue** (`parent: $ISSUE_ID`).

| Source | New issue label | Title style |
|---|---|---|
| Pre-existing bug discovered while working but out of scope (e.g. "Score not resetting on game restart" found during GAL-39) | `bug` | Imperative fix description ("Reset score on game restart") |
| Unresolved blocker after a review loop exhausted its cycle limit (Phase 4 plan review or Phase 8 final review) | `followup` | Reference the `@check` finding |
| `@test` `NOT_TESTABLE` "future seam" notes that imply a real test gap | `tech-debt` | Describe the missing seam |

**Do NOT file follow-ups for:**

- `@simplify` advisory recommendations the orchestrator chose not to act on — these are records, not missing work; they belong in the run summary.
- Cosmetic / formatting / naming nits.
- Anything already covered by an existing TODO issue (`@pm` lists existing issues; check the title/description before filing a duplicate).

**Routing rules:**

- Each new issue is a sub-issue (`parent: $ISSUE_ID`). `@pm` will add it to the parent's `## Sub-issues` list automatically. The user can promote it to top-level later if it deserves its own slot.
- Issue body must include a "Discovered during" paragraph naming the run's branch and (where relevant) commit SHA, plus enough context for the user to triage it later without having to re-read the run.
- Status: `Todo`. Default labels per the table; the orchestrator may add additional labels inferred from the parent (e.g. propagate `gameplay` from GAL-39 to a gameplay-relevant follow-up).
- The Run Summary (next subsection) lists each filed follow-up by ID so the user has one place to see them.
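
The filing rules above amount to a small classification: three kinds of finding map to a default label, and everything else is recorded in the summary without being filed. A minimal sketch of that mapping follows. The `Finding` enum and `follow_up_label` function are hypothetical names introduced here for illustration; the orchestrator applies these rules in prose and no such code exists in the workflow.

```rust
/// Hypothetical classification of items surfaced during a run.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Finding {
    /// Pre-existing bug discovered while working, but out of scope.
    PreExistingBug,
    /// Blocker left unresolved after a review loop hit its cycle limit.
    ExhaustedReviewBlocker,
    /// `NOT_TESTABLE` "future seam" note that implies a real test gap.
    NotTestableSeamGap,
    /// `@simplify` advisory the orchestrator chose not to act on.
    SimplifyAdvisory,
    /// Cosmetic, formatting, or naming nit.
    CosmeticNit,
    /// Already covered by an existing TODO issue.
    AlreadyTracked,
}

/// Returns the default label for a follow-up sub-issue,
/// or `None` when the item must not be filed.
pub fn follow_up_label(finding: Finding) -> Option<&'static str> {
    match finding {
        Finding::PreExistingBug => Some("bug"),
        Finding::ExhaustedReviewBlocker => Some("followup"),
        Finding::NotTestableSeamGap => Some("tech-debt"),
        Finding::SimplifyAdvisory | Finding::CosmeticNit | Finding::AlreadyTracked => None,
    }
}
```

Returning `Option` keeps the "do not file" cases explicit, and the exhaustive `match` means a new kind of finding cannot be added without deciding whether it is filed.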
### Commit TODO Changes

After both the TODO Update and File Follow-ups steps, dispatch `@pm` with operation `Commit pending changes` and the commit message constructed from the run context:

- If follow-ups were filed: `chore(todo): update status, file follow-ups`.
- Otherwise: `chore(todo): update status and progress`.

`@pm` is responsible for persistence — the orchestrator does **not** run `git add` or `git commit` on TODO changes itself (per ADR-23). For the filesystem-backed `@pm`, the dispatch results in a single atomic commit on the feature branch; for tracker-backed `@pm` implementations (e.g. Linear), the dispatch is a no-op because the API calls already persisted the data. Capture the returned `sha` (may be `null` for non-filesystem trackers) for the run summary's "final commit SHA(s)" field.

### Run Summary

- Write `$RUN_DIR/summary.md` with:
  - **Run timestamp** — capture it from the shell at write time: `date -Iseconds` (e.g. `2026-05-08T11:24:13+02:00`). **Do not** use a placeholder like `???:???:??` or "session date" — if you cannot get a real timestamp, omit the field entirely rather than fabricating one.
  - Issue reference and title
  - Branch name and final commit SHA(s)
  - Summary of implementation
  - TDD evidence (RED→GREEN per task, NOT_TESTABLE justifications)
  - Review outcomes (plan review + final review verdicts)
  - **Filed follow-ups** — list each new issue created in the File Follow-ups step by ID, title, and reason (`bug` / `followup` / `tech-debt`). If none, write "None."
  - **Advisory notes (not filed)** — any `@simplify` or `@check` recommendations the orchestrator chose not to act on and did not turn into a TODO. These are records for the user to consider, not tracked work.
  - Files changed
- **Do not commit anything under `.workflow/`.** The whole directory is per-run, per-branch state. Recommend the user add `.workflow/` to `.gitignore` if not already.
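
Two of the Phase 9 decisions described above are mechanical enough to sketch as pure functions: the AC-driven status passed to `@pm` in the TODO Update step, and the commit message chosen in the Commit TODO Changes step. The function and type names below are hypothetical and exist only for illustration. Treating an AC section that is present but empty as `Done` is an assumption of this sketch, not something the workflow states.

```rust
/// Status values named in the TODO Update step.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Done,
    InProgress,
}

/// `ac_checked` holds one flag per acceptance criterion after this run's
/// check-offs; `None` models an issue with no AC section at all.
pub fn status_after_run(ac_checked: Option<&[bool]>) -> Status {
    match ac_checked {
        // No AC section: one task ran end-to-end, so the issue is Done.
        None => Status::Done,
        // Every criterion checked (an empty list also lands here, by assumption).
        Some(flags) if flags.iter().all(|&checked| checked) => Status::Done,
        // At least one criterion is still open.
        Some(_) => Status::InProgress,
    }
}

/// Picks the TODO commit message from the number of follow-ups filed.
pub fn todo_commit_message(follow_ups_filed: usize) -> &'static str {
    if follow_ups_filed > 0 {
        "chore(todo): update status, file follow-ups"
    } else {
        "chore(todo): update status and progress"
    }
}
```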

---

## Failure Handling

At any phase, if an unrecoverable error occurs (or a routing rule explicitly aborts to the Failure Handler — `PLAN_REWORK_REMAINING` exhausted, `split_needed` at Phase 7, etc.):

1. Write `$RUN_DIR/summary.md` (creating `$RUN_DIR` first if it doesn't exist) with what was completed and what failed. Do **not** stage or commit anything under `.workflow/`.
2. If any code was written, commit it with message `wip: incomplete workflow run for `. Stage code only — exclude `.workflow/` and `TODO/`.
3. Leave the branch and worktree intact for the user to inspect — do not push, do not delete.
4. Dispatch `@pm` (operation: `Add comment`, issue ID: `$ISSUE_ID`) summarising what failed and naming the abort reason if it was a routing-rule abort (e.g. `split_needed at Phase 7 task-1`, `plan_rework_remaining exhausted at Phase 8`). The orchestrator never constructs the issue file path — `@pm` resolves it.
5. Dispatch `@pm` (operation: `Commit pending changes`, message: `chore(todo): record failure on `) so the failure note lands on the branch as a commit (per ADR-23). For tracker-backed `@pm` implementations this is a no-op. For filesystem `@pm`, the failure comment survives on the branch for the user to review before discarding the worktree.
6. Stop execution.

### Recovery procedure (workflow is non-resumable, ADR-14)

The workflow is **non-resumable**. There is no `--resume` mode and no idempotent re-run path. To retry after a Failure Handler invocation (or after a user-initiated cancellation):

1. `git worktree remove ` — discard the failed worktree.
2. Delete the feature branch: `git branch -D `. The Failure Handler's `wip:` commit (if any) is discarded with the branch.
3. Re-create the worktree from `$BASE_BRANCH`: `git worktree add -b `.
4. Re-run `/workflow ` from the fresh worktree.

The throwaway-worktree model is the recovery story.
Re-running on the same worktree without this cleanup risks committing partial state or appending duplicate `@pm` comments.

### User-initiated cancellation

User-initiated cancellation (Ctrl-C) follows the same procedure as automatic Failure Handler invocations: discard the worktree per the recovery procedure above. The orchestrator does not poll a soft-stop sentinel — runs are short enough (one task, ~10–30 min) that hard cancellation is fine.

**Never hang on interactive prompts.** If any command appears to require input, treat it as a failure and follow the Failure Handling procedure above.