--- description: "Multi-agent workflow for the current worktree: plan, test, implement, commit" agent: build --- You are executing the multi-agent workflow inside the worktree this opencode session was started from. Run all phases without waiting for user input. The user has walked away. **Prerequisites (the user handles before launching opencode):** - A git worktree is checked out for the issue's feature branch - `opencode` was launched from the root of that worktree - `TODO.md` is committed to the repo and present at `./TODO.md` **Task reference:** $ARGUMENTS If `$ARGUMENTS` is empty, stop immediately: "Usage: `/workflow [base-branch]` (e.g. `/workflow ABC-1`). The ID must exist in `./TODO.md`. Base branch defaults to `main` (then `master`)." Parse `$ARGUMENTS`: the first whitespace-separated token is the issue ID, an optional second token overrides the base branch. --- ## Phase 1: Sanity Check 1. Verify CWD is a non-bare git worktree: `git rev-parse --is-bare-repository 2>/dev/null` must output `false`. If not, stop: "Workflow must be run from a non-bare worktree (the directory opencode was launched in)." 2. Verify `./TODO.md` exists. If not, stop: "TODO.md not found in the current worktree. Commit a TODO.md to the repo first." 3. Verify HEAD is not detached: `git symbolic-ref --short HEAD` must succeed. If it fails, stop: "Cannot run on a detached HEAD. Check out a feature branch first." 4. Capture the current branch: `BRANCH_NAME="$(git symbolic-ref --short HEAD)"`. 5. Resolve the base branch (`BASE_BRANCH`): - If `$ARGUMENTS` provided a second token, use it. - Else if `git rev-parse --verify --quiet main` succeeds, use `main`. - Else if `git rev-parse --verify --quiet master` succeeds, use `master`. - Else stop: "Could not determine base branch (no `main` or `master`). Pass it as the second argument: `/workflow `." 6. Verify the current branch is not the base branch: if `BRANCH_NAME == BASE_BRANCH`, stop: "Cannot run workflow on the base branch (`$BASE_BRANCH`). Switch to a feature branch first." --- ## Phase 2: Issue Context Dispatch `@pm` to read `./TODO.md` (live filesystem mode) and fetch the issue matching the parsed ID: - Issue title, description, acceptance criteria - Labels and priority - Existing status If the issue does not exist or `@pm` fails, stop with error. If the issue's status is `Backlog` or `Todo`, ask `@pm` to set it to `In Progress` (this edit will be staged in Phase 9 alongside other TODO.md updates). --- ## Phase 3: Plan Analyze the codebase. Create a detailed implementation plan addressing the issue's requirements and acceptance criteria. The plan should include: - Problem summary (from issue context) - Proposed approach with rationale - Files to modify (with brief description of changes) - New files to create - Risks and open questions - **Test Design (conditional — include for non-trivial tasks):** - Key behaviors to verify, expressed as **action + observable outcome** (e.g. *"call `weave_enemies` with t=0.5 → enemy `Transform.translation.x` differs from initial position"*). A structural fact like *"enum has 3 variants"* or *"struct has these fields"* is **not** a behavior — it cannot fail meaningfully and does not exercise the code under test. - Edge cases and error conditions worth testing (also expressed as actions, not structure) - What explicitly should NOT be tested (prevents bloat) - Testability concerns (heavy external deps, GPU-only paths, etc.) **Include Test Design for:** Public API changes, bug fixes with behavioral impact, new features with business logic, multi-module changes. **Skip Test Design for:** Config-only changes, decorator swaps, import reorganization, documentation. When skipped, `@test` derives test cases directly from acceptance criteria. After drafting, apply **Dispatch Hygiene** (below) to the plan — it is a dispatch artifact and gets sent to `@check`/`@simplify` in Phase 4. --- ## Dispatch Hygiene This applies to **every** subagent dispatch (Phases 4, 6, 7, 8) **and** to artifacts that will be dispatched (the plan from Phase 3, the task specs from Phase 5). Apply these checks before sending — fix the artifact, then re-check. ### Finalized-Text Rule The artifact must be **finalized** — single-author text, no contradictions, no open questions. Forbidden: - "Actually, that's wrong — let me correct…" - "Wait, let me reconsider…" - Two versions of the same code block, one labelled "corrected" or appearing after a revision pass - Open questions or ambiguities the orchestrator hasn't resolved - Mid-text revisions visible to the recipient If you find yourself revising while writing, stop, redo the artifact from scratch with the corrected understanding, and only then dispatch. Subagents are fresh-context — they cannot reliably resolve which of two contradictory drafts is canonical, and reviewers cannot give a clean verdict on a self-contradicting plan. ### No-Implementation-in-Plan-or-Spec Rule Plans (Phase 3) and task specs (Phase 5) are **not** the place to write the answer. They describe what to do; `@make` writes how. Provide: - Approach with rationale - Files to modify with brief descriptions - Function signatures, type declarations, data shapes (structure, not logic) - Constraints, invariants, integration contracts - Risks and edge cases Do **not** provide: - Drop-in code blocks longer than ~5 lines that constitute "the answer" - Full function bodies for the changes being planned - Complete `match` arms / branch logic / loop bodies for new behavior - Pre-written test bodies (those come from `@test`, or from `@make` for Rust unit-only) - Stage-by-stage code transformations spelled out as ready-to-commit diffs If you've already written the implementation in the plan or spec, the artifact has overstepped. Convert finished code into structural description (signature + intent) and let `@make` produce the body. **Allowed in plans/specs:** - Existing code being replaced, marked as "current state" - Function signatures and type/struct/enum declarations (data, not logic) - Tiny inline constants (`pub const FOO: f32 = 30.0;`) - Test specifications as one-line behavior descriptions ("input X → expect Y") ### Pre-Dispatch Validation (MANDATORY) Scan the artifact and reject (revise, retry) if any of the following are present: | Check | Why it matters | |---|---| | `bash -c`, `sh -c`, `zsh -c`, `fish -c` (anywhere, including inside `nix develop --command bash -c …`) | `@make`/`@test` sandboxes deny all `*-c` shell invocations and any nested `bash` would bypass the per-command allowlist. Replace with one direct command per line: `nix develop -c cargo check`, etc. | | `nix develop --command bash` / `nix develop -c bash` / `nix develop -c sh` | Same — inner shell escapes the sandbox. Wrap each toolchain command directly. | | Any `cd && …` | Subagents cannot `cd`. Rewrite to use absolute paths. | | Code blocks longer than ~5 lines that draft the answer | Violates No-Implementation-in-Plan-or-Spec. Trim to structure (signature + "current state" only). | | Two versions of the same code, "actually let me correct…", or open questions | Violates the Finalized-Text Rule. Redo the artifact. | | Test bodies inside `@make` specs when tests are coming from `@test` | Duplicates the TDD handoff. | If any check trips, **do not dispatch.** Fix and re-validate. Repeated trips on a single task signal a Phase 5 split problem — go back and split. --- ## Phase 4: Review Plan Apply **Dispatch Hygiene** to the plan and to each reviewer prompt before sending. Dispatch `@check` and `@simplify` in parallel to review the plan. Reviewers should evaluate testability: - `@check`: Is the design testable? Are the right behaviors identified? (Review Framework §8) - `@simplify`: Is the test scope appropriate? Over-testing proposed? **Merge rules:** - `@check` safety/correctness findings are hard constraints - If `@simplify` recommends removing something `@check` flags as needed, `@check` wins - Note conflicts explicitly **Review loop (max 3 cycles):** 1. Send plan to both reviewers 2. Merge findings 3. If verdict is ACCEPTABLE from both (or JUSTIFIED COMPLEXITY from `@simplify`): proceed to Phase 5 4. If BLOCK or NEEDS WORK: revise the plan addressing findings, then re-review 5. **Convergence detection:** if reviewers return the same findings as the previous cycle, stop the loop early 6. If still unresolved after 3 cycles: note unresolved blockers and proceed anyway (they will be documented in the workflow summary and commit message) --- ## Phase 5: Split into Tasks Break the approved plan into discrete tasks for `@make`. Each task needs: | Required | Description | |----------|-------------| | **Task** | Clear description of what to implement | | **Acceptance Criteria** | Specific, testable criteria (checkbox format) | | **Code Context** | Actual code snippets from the codebase, not just file paths | | **Files to Modify** | Explicit list, mark new files with "(create)" | | **Test File** | Path for test file. **Pick the pattern that matches the project's language** — see "Test File Path by Language" below. | ### Test File Path by Language The test file path must follow the language's actual test layout. **Do not invent paths that look colocated but aren't valid for the language** (e.g. `src/tests/test_.rs` is *not* a Rust test location — it's a regular `src/` submodule). - **Python** - Colocated: `/tests/test_.py (create)` - Top-level: `tests/test_.py (create)` - **Rust** - Crate-level integration tests: `tests/.rs (create)` (or, in a workspace, `/tests/.rs`). - **If the test references not-yet-existing public API**, the task automatically requires a **stub-first `@make` pre-pass** before `@test` runs (see Phase 6 → "Rust integration TDD: stub-first"). Plan for two `@make` dispatches per such task: stub pass, then body pass. - **Unit-test-only tasks (in-source `#[cfg(test)] mod tests`):** mark the task as `NOT_TESTABLE` with reason `Rust unit-only` — `@test` cannot write inside production source. `@make` writes those inline as part of its production change. - **Polyglot Nix flake** - Match the host language of the code under change (Python or Rust rules above), wrapping commands in `nix develop -c …` per the agents' devshell rule. Include **Integration Contracts** when a task adds/changes function signatures, APIs, config keys, or has dependencies on other tasks. Include **Test Design** from Phase 3 when available, attached to the relevant task(s). **Task size:** ~10-30 minutes each, single coherent change, clear boundaries. ### Split Heuristic — when in doubt, split A task must be **split** if any of the following apply: - It touches more than two distinct concerns (e.g. *constants + new component + sprite spawn + new system + main wiring* is **five** concerns — at least three tasks). - It changes more than ~50 lines across more than 2 files. - It mixes data/structural changes (constants, types, components) with runtime/system changes (new ECS systems, scheduling, render loops). - It mixes pure-logic changes (math helpers) with stateful changes (queries, world mutation). - It mixes new APIs with their first call sites in the same task. When a task fails the heuristic, split into: 1. **Foundations** — new constants, types, components (no behavior change yet). 2. **Implementation** — the actual production logic, calling the foundations. 3. **Wiring** — registration in `main.rs` / `lib.rs` / app-builder. Each split is dispatched separately to `@make` and verified before the next. ### Code Context — what to include The **Code Context** field exists so `@make` can find the seam to modify. Provide: - The existing code being replaced (verbatim, marked as "current state"), with ~5–10 lines of surrounding context - Function signatures of helpers `@make` will need to call - The file's relevant import block For everything you must **not** include — drop-in replacements, full function bodies, pre-written test bodies, "here is what to write" — see **Dispatch Hygiene → No-Implementation-in-Plan-or-Spec Rule** above. If the task is so well-specified that you've already written the implementation, the task is too small for `@make` (apply it directly) or you've over-determined the design (revisit Phase 3). Apply **Dispatch Hygiene** to each task spec before dispatch in Phase 7. --- ## Phase 6: Write Tests Apply **Dispatch Hygiene** to each `@test` prompt before sending. For each task from Phase 5, dispatch `@test` with: - The task spec (acceptance criteria, code context, files to modify) - The Test Design section from the plan (if provided) - The test file path to create (following colocated pattern) `@test` writes failing tests and verifies RED with structured failure codes. **Post-step file gate (MANDATORY):** Before dispatching `@test`, snapshot every modified, staged, *and untracked* file. `git diff --name-only` alone misses untracked files, which is precisely the state of any new test file `@test` creates (it cannot `git add`). Use `git status --porcelain` so the gate sees them: ```bash git status --porcelain | sed 's/^...//' | sort -u > /tmp/pre_test_baseline.txt ``` After `@test` completes, list NEW changes (in the post-snapshot but not the pre-snapshot): ```bash git status --porcelain | sed 's/^...//' | sort -u | comm -23 - /tmp/pre_test_baseline.txt > /tmp/test_new_files.txt ``` Each line in `/tmp/test_new_files.txt` is a file path that did not exist (or was unmodified) before `@test` ran. The gate validates each one against the patterns below. All new files must match the project's test patterns: - Python: `**/test_*.py`, `**/*_test.py`, `**/conftest.py` (new only), `**/test_data/**`, `**/test_fixtures/**` - Rust: `tests/**/*.rs`, `**/tests/**/*.rs` (workspace-style `/tests/...`), `**/test_data/**`, `**/test_fixtures/**` **Anti-patterns — discard the output even if the glob matches:** - Anything under `src/` for Rust (e.g. `src/tests/foo.rs`, `src/**/tests/...`). `src/tests/` is a regular module path under `src/`, not a Rust test location, and `@test` cannot wire it up via `mod` declarations in production source. Such paths indicate the task spec gave a wrong test path — escalate, don't accept the file. If any non-matching file appears, or any anti-pattern matches: discard `@test` output, report violation. **Decision table — handling `@test` results:** | Condition | Action | |-----------|--------| | `TESTS_READY` + `escalate_to_check: false` | Proceed to Phase 7 | | `TESTS_READY` + `escalate_to_check: true` | Route tests to `@check` for light review. `@check` diagnoses, caller routes fixes to `@test`. Then proceed. | | `NOT_TESTABLE` (general reasons) | Route to `@check` for sign-off on justification. If approved, task goes to `@make` without tests. | | `NOT_TESTABLE` reason `Rust unit-only` | See "Rust unit-only routing" below. **Do not** include test code in the `@make` spec; pass test specs only. | | `BLOCKED` | Investigate. May need to revise task spec or plan. | | Test passes immediately | Investigate — behavior may already exist. Task spec may be wrong. | | Stub-first run: tests pass with zero `todo!()` panics | **Structural-only tests.** Every test is asserting type/struct/enum facts without calling any stubbed symbol. Reject the test output and route back to `@test` with a "must exercise the stubbed symbols by calling them" note. Do not let these tests gate Phase 7 — they cannot RED→GREEN, so the body-pass `@make` would commit code with false-green coverage. | ### Rust unit-only routing When `@test` returns `NOT_TESTABLE: Rust unit-only` (the implementation needs in-source `#[cfg(test)] mod tests` blocks that `@test` is forbidden from writing), the orchestrator must: 1. Get `@check`'s sign-off on the justification (no integration-test seam exists). 2. Build the `@make` spec with **test specifications**, not test code: - "Add `#[cfg(test)] mod foo_tests` at the bottom of `` exercising:" - For each behavior, a one-line description: input → expected output, edge case to cover, error path to assert. - Where applicable, name the function under test and the assertion type (`assert_eq!`, `assert!`, panic on invalid input). 3. **Forbidden in the `@make` spec:** complete `#[test] fn …` bodies, full module blocks, or any `@test`-style RED-verified test code. `@make` writes the inline tests itself based on the spec. 4. After `@make` completes, the orchestrator runs the test suite once to confirm RED→GREEN evidence and includes it in the workflow summary. This keeps the agents in their lanes: `@test` never writes inside `src/`, `@make` writes both the tests and the production code in a single coherent change, and the orchestrator sees explicit test pass evidence. ### Rust integration TDD: stub-first (mandatory) Rust integration tests live in a separate test crate (`tests/.rs`) that imports from `lib.rs`. Any test referencing not-yet-existing public API can only RED at *build* time, which masks assertion diagnostics. To avoid this, **for every Rust task whose `@test` step writes an integration test against public API that does not yet exist**, dispatch a stub-first `@make` pass *before* `@test` runs: **Stub pass (split from Phase 7's body pass):** 1. Dispatch `@make` in **standard mode** (no tests exist yet) with this exact scope: - **Goal:** add the planned public API as `todo!()`-bodied stubs so the integration test will compile. - **Files to modify:** `src/lib.rs` (add `pub mod …;` declarations) plus any new `src/.rs` files containing the stub functions/structs. - **Stubs only:** every function body is exactly `todo!()`. Every method body is exactly `todo!()`. Public structs may use `pub struct Foo;` or `pub struct Foo { /* fields TBD */ }` — but no logic. - **Signatures must match the planned final API exactly** (return types, lifetimes, generics) — otherwise the integration test will mismatch later. Lift signatures from the Phase 3 plan / Phase 5 task spec. - **Acceptance criteria:** `cargo check` (wrapped in `nix develop -c …` if the project has a devshell) passes; no test command is run. - **Dispatch Hygiene still applies:** the stub pass is small and finalized — no draft bodies, no contradictory signatures. 2. Verify `cargo check` passed in `@make`'s output. If not, fix and re-dispatch the stub pass before continuing. 3. Dispatch `@test` as normal. The integration test now compiles; running it panics on `todo!()` at runtime, which is a clean `MISSING_BEHAVIOR` RED with a stack trace — far better than the build-error-RED form. 4. **Panic-coverage check (MANDATORY).** After `@test` returns, re-run the test command in the orchestrator and verify that **every test in the new file panics on `todo!()`** (i.e. every test exercises at least one of the stubbed symbols). The decision rule: - If the test output shows N panics for N tests → proceed to body pass. - If any test passes without a `todo!()` panic → that test is structural-only (asserting type / variant-count / field facts without calling the stubbed code). **Reject** `@test`'s output and route back with the "Stub-first run: tests pass with zero `todo!()` panics" decision-table verdict. Require `@test` to rewrite each non-panicking test so it actually invokes the stubbed function/method. - This check is the only thing standing between false-green coverage and the body-pass commit. Skipping it has produced regressions like a system that compiles, "passes" tests, and silently no-ops in production. 5. Continue to Phase 7's body pass (`@make` in TDD mode), where the same files are revisited and the `todo!()` bodies are replaced. **This routing is mandatory** for new public API in Rust. It is **not** required when the integration test exercises an existing public API (e.g. a behavior fix where the function already exists) — in that case `@test` runs directly and `@make` modifies the body in Phase 7. The stub pass and the body pass each produce their own atomic commit (per Phase 9 rules): `feat(): scaffold with todo!() stubs` followed by `feat(): implement ` (or whichever conventional type fits). **Parallelism:** - **Python:** Independent tasks can have tests written in parallel, *provided* their test files are disjoint and no shared `conftest.py` is being modified. - **Rust:** Run `@test` dispatches **sequentially**. Cargo serialises the build via the `target/` directory lock, so parallel dispatches give no speedup; they only add risk (a long-running build in one branch starves the other, and any task that touches a shared crate-level fixture/helper file will race). **Constraint:** `@test` must not modify existing `conftest.py` files (prevents collision during parallel execution). --- ## Phase 7: Implement Apply **Dispatch Hygiene** to each `@make` spec before sending. Repeated trips on a single task signal a Phase 5 split problem — go back and split. **`@make` dispatches are SEQUENTIAL — never in parallel.** Run each task to completion (writes, every verification command, and the orchestrator's post-check) before dispatching the next. Reasons: - `@make` writes source files. Parallel agents picking the same file (e.g. `src/lib.rs` for adding both a new `pub mod` and a registration) corrupt each other. - Even on disjoint files, Cargo's `target/` lock and uv's venv state serialise the verification builds anyway, so parallelism gives no speedup. - Stub-pass/body-pass pairs (Rust integration TDD) must be strictly ordered within a task; running stub-pass for task 2 while body-pass for task 1 is still building yields a non-deterministic crate state for `@test` to RED against. This applies to **all** `@make` invocations: standard mode, TDD mode, stub-pass, body-pass, and integration-fix dispatches. Execute each task by dispatching `@make` with: - The task spec (from Phase 5, finalized per Dispatch Hygiene) - Relevant code context (seam-revealing snippets only — see Phase 5 "Code Context — what to include") - **Pre-written failing tests and handoff from `@test` (if TESTS_READY)** `@make` runs in TDD mode when tests are provided: 1. Entry validation: run tests, verify RED, check failure codes match handoff 2. Implement minimal code to make tests pass (GREEN) 3. Regression check on broader area 4. Refactor while keeping green 5. Report RED→GREEN evidence **Escalation:** If `@make` flags test quality concerns during entry validation: 1. `@make` reports the issue to caller 2. Caller routes to `@check` for diagnosis 3. `@check` reports findings 4. Caller routes to `@test` for fixes 5. Fixed tests return to `@make` For NOT_TESTABLE tasks, `@make` runs in standard mode. After all tasks complete, verify overall integration: - Run the project's test suite if available - Run linting/type checking if configured - Fix any integration issues between tasks --- ## Phase 8: Final Review Apply **Dispatch Hygiene** to each reviewer prompt before sending. Dispatch `@check` and `@simplify` in parallel to review the full implementation (all changes across all files). Provide reviewers with: - The original plan - The full diff (`git diff "$BASE_BRANCH"...HEAD`) - Any decisions or deviations from the plan **Review loop (max 3 cycles):** 1. Send implementation to both reviewers 2. Merge findings (same precedence rules as Phase 4) 3. If ACCEPTABLE: proceed to Phase 9 4. If issues found, route per the kind of finding — **the orchestrator does not write production code; `@make` does**: - **`BLOCK`, behavioral, correctness, or test-quality findings:** build a new `@make` task spec from the finding (apply Dispatch Hygiene, finalized text, no draft answer). Dispatch `@make`. Do **not** fix directly. Every `BLOCK` is by definition behavioral and must round-trip through `@make`. - **Strictly cosmetic findings** (typo in a comment, missing trailing newline, formatting that does not change the AST or behavior): the orchestrator may fix directly, then re-review. Anything compiler-detected (unused import, dead code) goes through `@make`, since removing it is still a code change. - When in doubt, dispatch `@make`. 5. **Convergence detection:** same findings twice = stop loop early 6. If unresolved after 3 cycles: document blockers, proceed to commit anyway --- ## Phase 9: Commit and Wrap Up The workflow is forge-agnostic. It commits locally and stops. **Do not push, and do not open a pull/merge request** — the user chooses their forge and review workflow manually. ### Commit Code Changes - Stage code changes only. **Do not stage `TODO.md`** (committed separately below) and **do not stage `.opencode/workflow-summary.md`** (intentionally never committed — see Local Summary). - Write a conventional commit message summarizing the implementation. Reference the TODO.md issue ID in the body (e.g. `Refs: ABC-1`). - If changes are large/varied, use multiple atomic commits (one per logical unit) ### TODO Update - Dispatch `@pm` against `./TODO.md` (live filesystem mode). Ask it to: - Set **Branch** to `$BRANCH_NAME` - Set **Status** to `In Review` - Add a comment with the branch name, latest commit SHA, and a one-line summary - If acceptance-criteria checkboxes were addressed by the implementation, ask `@pm` to check them off - Commit the TODO.md change as a separate atomic commit: `chore(todo): update status and progress` ### Local Summary - Write `.opencode/workflow-summary.md` in the worktree with: - **Run timestamp** — capture it from the shell at write time: `date -Iseconds` (e.g. `2026-05-07T11:24:13+02:00`). **Do not** use a placeholder like `???:???:??` or "session date" — if you cannot get a real timestamp, omit the field entirely rather than fabricating one. - Issue reference and title - Branch name and final commit SHA(s) - Summary of implementation - TDD evidence (RED→GREEN per task, NOT_TESTABLE justifications) - Review outcomes (plan review + final review verdicts) - Unresolved items (if any) - Files changed - **Do not commit this file.** It is a per-run, per-branch artifact; committing it would create merge conflicts whenever multiple workflow branches are merged. Leave it untracked. Recommend the user add `.opencode/` to `.gitignore` if not already. --- ## Failure Handling At any phase, if an unrecoverable error occurs: 1. Write `.opencode/workflow-summary.md` with what was completed and what failed. Do **not** stage or commit this file. 2. If any code was written, commit it with message `wip: incomplete workflow run for `. Stage code only — exclude `.opencode/workflow-summary.md`. 3. Leave the branch and worktree intact for the user to inspect — do not push, do not delete. 4. Dispatch `@pm` against `./TODO.md` to add a comment on the issue summarising what failed. 5. Stop execution. **Never hang on interactive prompts.** If any command appears to require input, treat it as a failure and follow the above procedure.