A workflow run on a Rust/Bevy task produced a single @make dispatch covering six tasks (~2 hours of work), with the orchestrator drafting the full replacement code, including a self-contradicting "actually that's wrong, let me correct…" revision pass and a `nix develop --command bash -c "cargo check"` invocation that @make's sandbox denies. None of the failure modes were caught before dispatch.

Phase 5 gains three new subsections:
- Split Heuristic: explicit rules for when a task must be split (>2 concerns, >50 lines / 2 files, structural+runtime+wiring mix); prescribes the foundations / implementation / wiring split.
- Code Context Anti-patterns: the field is for seam-revealing snippets, not finished answers; max ~5-line snippets, no full replacement bodies.
- Finalized-Text Rule: task specs must be single-author finalized text, with no "actually, that's wrong" revision passes, no two-version code blocks, and no unresolved questions.

Phase 6 promotes the Rust unit-only NOT_TESTABLE case out of the decision table into a dedicated routing subsection. The orchestrator must pass test *specifications* (one-line behavior descriptions, target functions, assertion types) to @make, never test code, and run the suite once after @make to capture RED→GREEN evidence.

Phase 7 gains a mandatory Pre-Dispatch Validation table that rejects specs containing `bash -c` / `sh -c` (any nesting), `nix develop -c bash`, `cd <path> &&`, oversized Code Context blocks, contradictory revisions, or duplicated test bodies. Repeated trips signal a Phase 5 split problem and route back to splitting.
| description | agent |
|---|---|
| Multi-agent workflow for the current worktree: plan, test, implement, commit | build |
You are executing the multi-agent workflow inside the worktree this opencode session was started from. Run all phases without waiting for user input. The user has walked away.
Prerequisites (the user handles before launching opencode):
- A git worktree is checked out for the issue's feature branch
- `opencode` was launched from the root of that worktree
- `TODO.md` is committed to the repo and present at `./TODO.md`
Task reference: $ARGUMENTS
If $ARGUMENTS is empty, stop immediately: "Usage: /workflow <ISSUE-ID> [base-branch] (e.g. /workflow ABC-1). The ID must exist in ./TODO.md. Base branch defaults to main (then master)."
Parse $ARGUMENTS: the first whitespace-separated token is the issue ID, an optional second token overrides the base branch.
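As an illustration of the parsing rule above, here is a minimal Python sketch. The names `WorkflowArgs` and `parse_arguments` are hypothetical and not part of the workflow; the usage text is copied from the stop message above.

```python
from typing import NamedTuple, Optional

USAGE = (
    "Usage: /workflow <ISSUE-ID> [base-branch] (e.g. /workflow ABC-1). "
    "The ID must exist in ./TODO.md. Base branch defaults to main (then master)."
)


class WorkflowArgs(NamedTuple):
    issue_id: str
    base_branch: Optional[str]  # None means "resolve main/master in Phase 1"


def parse_arguments(raw: str) -> WorkflowArgs:
    """Split the raw argument string into issue ID and optional base branch."""
    tokens = raw.split()  # whitespace-separated; ignores leading/trailing blanks
    if not tokens:
        # Empty arguments: stop immediately with the usage message.
        raise ValueError(USAGE)
    base = tokens[1] if len(tokens) > 1 else None
    return WorkflowArgs(issue_id=tokens[0], base_branch=base)
```

In this sketch any tokens after the second are ignored, which is an assumption; the text above only defines the first two.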
Phase 1: Sanity Check
- Verify CWD is a non-bare git worktree: `git rev-parse --is-bare-repository 2>/dev/null` must output `false`. If not, stop: "Workflow must be run from a non-bare worktree (the directory opencode was launched in)."
- Verify `./TODO.md` exists. If not, stop: "TODO.md not found in the current worktree. Commit a TODO.md to the repo first."
- Verify HEAD is not detached: `git symbolic-ref --short HEAD` must succeed. If it fails, stop: "Cannot run on a detached HEAD. Check out a feature branch first."
- Capture the current branch: `BRANCH_NAME="$(git symbolic-ref --short HEAD)"`.
- Resolve the base branch (`BASE_BRANCH`):
  - If `$ARGUMENTS` provided a second token, use it.
  - Else if `git rev-parse --verify --quiet main` succeeds, use `main`.
  - Else if `git rev-parse --verify --quiet master` succeeds, use `master`.
  - Else stop: "Could not determine base branch (no `main` or `master`). Pass it as the second argument: `/workflow <ISSUE-ID> <base-branch>`."
- Verify the current branch is not the base branch: if `BRANCH_NAME == BASE_BRANCH`, stop: "Cannot run workflow on the base branch ($BASE_BRANCH). Switch to a feature branch first."
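The base-branch resolution order can be sketched in Python as follows. This is an illustrative sketch, not the workflow's implementation: `git_ref_exists`, `resolve_base_branch`, and `ensure_feature_branch` are hypothetical names, and the ref check is injectable so the precedence logic can be exercised without a repository.

```python
import subprocess
from typing import Callable, Optional


def git_ref_exists(name: str) -> bool:
    """True if `git rev-parse --verify --quiet <name>` succeeds in the CWD."""
    result = subprocess.run(
        ["git", "rev-parse", "--verify", "--quiet", name],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def resolve_base_branch(
    override: Optional[str],
    ref_exists: Callable[[str], bool] = git_ref_exists,
) -> str:
    """Apply the precedence: explicit second token, then main, then master."""
    if override:
        return override
    for candidate in ("main", "master"):
        if ref_exists(candidate):
            return candidate
    raise RuntimeError(
        "Could not determine base branch (no main or master). "
        "Pass it as the second argument: /workflow <ISSUE-ID> <base-branch>."
    )


def ensure_feature_branch(branch_name: str, base_branch: str) -> None:
    """Refuse to run when the current branch is the base branch."""
    if branch_name == base_branch:
        raise RuntimeError(
            f"Cannot run workflow on the base branch ({base_branch}). "
            "Switch to a feature branch first."
        )
```

Passing a fake `ref_exists` (for example `lambda name: name == "master"`) makes the fallback order easy to check in isolation.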
Phase 2: Issue Context
Dispatch @pm to read ./TODO.md (live filesystem mode) and fetch the issue matching the parsed ID:
- Issue title, description, acceptance criteria
- Labels and priority
- Existing status
If the issue does not exist or @pm fails, stop with error.
If the issue's status is Backlog or Todo, ask @pm to set it to In Progress (this edit will be staged in Phase 9 alongside other TODO.md updates).
Phase 3: Plan
Analyze the codebase. Create a detailed implementation plan addressing the issue's requirements and acceptance criteria.
The plan should include:
- Problem summary (from issue context)
- Proposed approach with rationale
- Files to modify (with brief description of changes)
- New files to create
- Risks and open questions
- Test Design (conditional; include for non-trivial tasks):
  - Key behaviors to verify (what tests should assert)
  - Edge cases and error conditions worth testing
  - What explicitly should NOT be tested (prevents bloat)
  - Testability concerns (heavy external deps, GPU-only paths, etc.)

Include Test Design for: Public API changes, bug fixes with behavioral impact, new features with business logic, multi-module changes.
Skip Test Design for: Config-only changes, decorator swaps, import reorganization, documentation. When skipped, `@test` derives test cases directly from acceptance criteria.
Phase 4: Review Plan
Dispatch @check and @simplify in parallel to review the plan.
Reviewers should evaluate testability:
- `@check`: Is the design testable? Are the right behaviors identified? (Review Framework §8)
- `@simplify`: Is the test scope appropriate? Is over-testing proposed?

Merge rules:
- `@check` safety/correctness findings are hard constraints
- If `@simplify` recommends removing something `@check` flags as needed, `@check` wins
- Note conflicts explicitly
Review loop (max 3 cycles):
- Send plan to both reviewers
- Merge findings
- If verdict is ACCEPTABLE from both (or JUSTIFIED COMPLEXITY from `@simplify`): proceed to Phase 5
- If BLOCK or NEEDS WORK: revise the plan addressing findings, then re-review
- Convergence detection: if reviewers return the same findings as the previous cycle, stop the loop early
- If still unresolved after 3 cycles: note unresolved blockers and proceed anyway (they will be documented in the workflow summary and commit message)
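The control flow of the review loop above can be sketched like this. It is a minimal sketch under stated assumptions: `review` and `revise` are hypothetical callables standing in for the reviewer dispatch and the plan revision, merged findings are modelled as a frozen set of strings, and the plan is not revised after the final review because no re-review would follow.

```python
from typing import Callable, FrozenSet, Tuple

# (check verdict, simplify verdict, merged findings); a hypothetical shape.
ReviewResult = Tuple[str, str, FrozenSet[str]]

PASSING_SIMPLIFY = {"ACCEPTABLE", "JUSTIFIED COMPLEXITY"}


def run_review_loop(
    plan: str,
    review: Callable[[str], ReviewResult],
    revise: Callable[[str, FrozenSet[str]], str],
    max_cycles: int = 3,
) -> Tuple[str, FrozenSet[str]]:
    """Return (final plan, unresolved findings); empty findings mean accepted."""
    previous = None
    findings: FrozenSet[str] = frozenset()
    for cycle in range(max_cycles):
        check_verdict, simplify_verdict, findings = review(plan)
        if check_verdict == "ACCEPTABLE" and simplify_verdict in PASSING_SIMPLIFY:
            return plan, frozenset()
        if findings == previous:
            break  # convergence: same findings as the previous cycle
        previous = findings
        if cycle < max_cycles - 1:
            plan = revise(plan, findings)
    # Unresolved after the loop: the caller documents these and proceeds anyway.
    return plan, findings
```

Comparing findings as sets is one possible way to detect convergence; real reviewer output would likely need normalising before such a comparison is meaningful.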
Phase 5: Split into Tasks
Break the approved plan into discrete tasks for @make. Each task needs:
| Required | Description |
|---|---|
| Task | Clear description of what to implement |
| Acceptance Criteria | Specific, testable criteria (checkbox format) |
| Code Context | Actual code snippets from the codebase, not just file paths |
| Files to Modify | Explicit list, mark new files with "(create)" |
| Test File | Path for test file. Pick the pattern that matches the project's language — see "Test File Path by Language" below. |
Test File Path by Language
The test file path must follow the language's actual test layout. Do not invent paths that look colocated but aren't valid for the language (e.g. src/tests/test_<feature>.rs is not a Rust test location — it's a regular src/ submodule).
- Python
  - Colocated: `<module>/tests/test_<feature>.py (create)`
  - Top-level: `tests/test_<feature>.py (create)`
- Rust
  - Crate-level integration tests: `tests/<feature>.rs (create)` (or, in a workspace, `<crate>/tests/<feature>.rs`)
  - Unit-test-only tasks (in-source `#[cfg(test)] mod tests`): mark the task as `NOT_TESTABLE` with reason `Rust unit-only`, because `@test` cannot write inside production source. `@make` writes those inline as part of its production change.
- Polyglot Nix flake
  - Match the host language of the code under change (Python or Rust rules above), wrapping commands in `nix develop -c …` per the agents' devshell rule.
Include Integration Contracts when a task adds/changes function signatures, APIs, config keys, or has dependencies on other tasks.
Include Test Design from Phase 3 when available, attached to the relevant task(s).
Task size: ~10-30 minutes each, single coherent change, clear boundaries.
Split Heuristic — when in doubt, split
A task must be split if any of the following apply:
- It touches more than two distinct concerns (e.g. constants + new component + sprite spawn + new system + main wiring is five concerns — at least three tasks).
- It changes more than ~50 lines across more than 2 files.
- It mixes data/structural changes (constants, types, components) with runtime/system changes (new ECS systems, scheduling, render loops).
- It mixes pure-logic changes (math helpers) with stateful changes (queries, world mutation).
- It mixes new APIs with their first call sites in the same task.
When a task fails the heuristic, split into:
- Foundations: new constants, types, components (no behavior change yet).
- Implementation: the actual production logic, calling the foundations.
- Wiring: registration in `main.rs` / `lib.rs` / app-builder.
Each split is dispatched separately to @make and verified before the next.
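The split heuristic can be expressed as a small checklist function. The sketch below is illustrative only: `TaskShape` and `split_reasons` are hypothetical names, the boolean fields are the orchestrator's own judgment calls, and the size rule is read as "more than ~50 lines and more than 2 files", which is one interpretation of the wording above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TaskShape:
    concerns: int
    lines_changed: int
    files_changed: int
    structural: bool = False       # constants, types, components
    runtime: bool = False          # new ECS systems, scheduling, render loops
    pure_logic: bool = False       # e.g. math helpers
    stateful: bool = False         # queries, world mutation
    new_api: bool = False
    first_call_site: bool = False


def split_reasons(task: TaskShape) -> List[str]:
    """Return every heuristic the task trips; an empty list means no split needed."""
    reasons = []
    if task.concerns > 2:
        reasons.append("more than two concerns")
    if task.lines_changed > 50 and task.files_changed > 2:
        reasons.append("more than ~50 lines across more than 2 files")
    if task.structural and task.runtime:
        reasons.append("mixes structural and runtime changes")
    if task.pure_logic and task.stateful:
        reasons.append("mixes pure-logic and stateful changes")
    if task.new_api and task.first_call_site:
        reasons.append("mixes a new API with its first call site")
    return reasons
```

For the five-concern example above (constants, new component, sprite spawn, new system, main wiring), `split_reasons(TaskShape(concerns=5, lines_changed=120, files_changed=4, structural=True, runtime=True))` reports three reasons, so the task would go through the foundations / implementation / wiring split. The line and file counts in that call are made up for illustration.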
Code Context Anti-patterns
The Code Context field exists so @make can find the seam to modify, not so it can read off a finished answer. Strictly follow:
- Provide: the existing code being replaced (verbatim), the surrounding ~5–10 lines of context, function signatures of helpers `@make` will need to call, the file's relevant import block.
- Do NOT provide: a complete drop-in replacement, the new function bodies, the test bodies (those come from `@test` or, for unit-only Rust, from `@make` itself per Phase 6), or any "here is what to write" code block longer than ~5 lines.
If the task is so well-specified that you've already written the implementation, the task is too small for @make (apply it directly) or you've over-determined the design (revisit Phase 3).
Finalized-Text Rule
Each task spec must be finalized before dispatch — single-author text with no contradictions. Forbidden in dispatch prompts:
- "Actually, that's wrong — let me correct…"
- "Wait, let me revise…"
- Two versions of the same code block with one labelled "corrected"
- Open questions or ambiguities the orchestrator hasn't resolved
If you find yourself revising while writing the spec, stop, redo the spec from scratch with the corrected understanding, and only then dispatch. @make is a fresh-context implementer; it cannot reliably resolve which of two contradictory drafts is canonical.
Phase 6: Write Tests
For each task from Phase 5, dispatch @test with:
- The task spec (acceptance criteria, code context, files to modify)
- The Test Design section from the plan (if provided)
- The test file path to create (following "Test File Path by Language" in Phase 5)
@test writes failing tests and verifies RED with structured failure codes.
Post-step file gate (MANDATORY):
Before dispatching @test, snapshot the current changed files:
git diff --name-only > /tmp/pre_test_baseline.txt
After @test completes, validate only NEW changes:
git diff --name-only | comm -23 - /tmp/pre_test_baseline.txt > /tmp/test_new_files.txt
All new files must match the project's test patterns:
- Python: `**/test_*.py`, `**/*_test.py`, `**/conftest.py` (new only), `**/test_data/**`, `**/test_fixtures/**`
- Rust: `tests/**/*.rs`, `**/tests/**/*.rs` (workspace-style `<crate>/tests/...`), `**/test_data/**`, `**/test_fixtures/**`

Anti-patterns (discard the output even if the glob matches):
- Anything under `src/` for Rust (e.g. `src/tests/foo.rs`, `src/**/tests/...`). `src/tests/` is a regular module path under `src/`, not a Rust test location, and `@test` cannot wire it up via `mod` declarations in production source. Such paths indicate the task spec gave a wrong test path; escalate, don't accept the file.
If any non-matching file appears, or any anti-pattern matches: discard @test output, report violation.
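A rough Python sketch of the gate follows, assuming the list of newly changed paths has already been computed. All names here (`glob_to_regex`, `TEST_GLOBS`, `file_gate_violations`) are hypothetical. The sketch uses its own small glob translation so that `*` never crosses a `/`, treats any `src` path component as "under `src/`" (an assumption for workspace layouts), and does not model the "new only" condition on `conftest.py`.

```python
import re
from pathlib import PurePosixPath
from typing import Dict, Iterable, List

TEST_GLOBS: Dict[str, List[str]] = {
    "python": ["**/test_*.py", "**/*_test.py", "**/conftest.py",
               "**/test_data/**", "**/test_fixtures/**"],
    "rust": ["tests/**/*.rs", "**/tests/**/*.rs",
             "**/test_data/**", "**/test_fixtures/**"],
}


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a glob where `**/` spans directories and `*` stays in one segment."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append(r"(?:.*/)?")   # zero or more leading directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(r".*")         # anything, including slashes
            i += 2
        elif pattern[i] == "*":
            out.append(r"[^/]*")      # anything within a single path segment
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def file_gate_violations(new_paths: Iterable[str], language: str) -> List[str]:
    """List every new path that fails the gate; an empty list means accept."""
    patterns = [glob_to_regex(g) for g in TEST_GLOBS[language]]
    violations = []
    for path in new_paths:
        if language == "rust" and "src" in PurePosixPath(path).parts:
            # Anti-pattern wins even when a test glob also matches.
            violations.append(f"{path}: under src/ (anti-pattern)")
        elif not any(p.fullmatch(path) for p in patterns):
            violations.append(f"{path}: not a test path")
    return violations
```

With this sketch, `tests/physics.rs` passes for Rust, while `src/tests/foo.rs` is rejected by the anti-pattern even though `**/tests/**/*.rs` matches it.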
Decision table — handling @test results:
| Condition | Action |
|---|---|
| `TESTS_READY` + `escalate_to_check: false` | Proceed to Phase 7 |
| `TESTS_READY` + `escalate_to_check: true` | Route tests to `@check` for light review. `@check` diagnoses, caller routes fixes to `@test`. Then proceed. |
| `NOT_TESTABLE` (general reasons) | Route to `@check` for sign-off on justification. If approved, task goes to `@make` without tests. |
| `NOT_TESTABLE` reason `Rust unit-only` | See "Rust unit-only routing" below. Do not include test code in the `@make` spec; pass test specs only. |
| `BLOCKED` | Investigate. May need to revise task spec or plan. |
| Test passes immediately | Investigate: behavior may already exist. Task spec may be wrong. |
Rust unit-only routing
When @test returns NOT_TESTABLE: Rust unit-only (the implementation needs in-source #[cfg(test)] mod tests blocks that @test is forbidden from writing), the orchestrator must:
- Get `@check`'s sign-off on the justification (no integration-test seam exists).
- Build the `@make` spec with test specifications, not test code:
  - "Add `#[cfg(test)] mod foo_tests` at the bottom of `<file>` exercising:"
  - For each behavior, a one-line description: input → expected output, edge case to cover, error path to assert.
  - Where applicable, name the function under test and the assertion type (`assert_eq!`, `assert!`, panic on invalid input).
- Forbidden in the `@make` spec: complete `#[test] fn …` bodies, full module blocks, or any `@test`-style RED-verified test code. `@make` writes the inline tests itself based on the spec.
- After `@make` completes, the orchestrator runs the test suite once to confirm RED→GREEN evidence and includes it in the workflow summary.
This keeps the agents in their lanes: @test never writes inside src/, @make writes both the tests and the production code in a single coherent change, and the orchestrator sees explicit test pass evidence.
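To make the difference between a test specification and test code concrete, here is a small sketch that renders the spec text from structured one-line behaviors. `BehaviorSpec` and `render_unit_test_spec` are hypothetical names, and the Rust function `clamp_speed` and file `src/physics.rs` in the usage note are invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BehaviorSpec:
    function: str    # function under test
    behavior: str    # one line: input -> expected output, edge case, or error path
    assertion: str   # assertion type, e.g. "assert_eq!"


def render_unit_test_spec(module: str, file: str, behaviors: List[BehaviorSpec]) -> str:
    """Render test specifications only; no Rust test bodies are emitted."""
    lines = [f"Add `#[cfg(test)] mod {module}` at the bottom of `{file}` exercising:"]
    for b in behaviors:
        lines.append(f"- `{b.function}`: {b.behavior} ({b.assertion})")
    return "\n".join(lines)
```

For example, `render_unit_test_spec("physics_tests", "src/physics.rs", [BehaviorSpec("clamp_speed", "speed above MAX_SPEED -> MAX_SPEED", "assert_eq!")])` yields a two-line spec that names the behavior and assertion type while leaving the actual `#[test]` function for `@make` to write.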
Parallelism: Independent tasks can have tests written in parallel.
Constraint: @test must not modify existing conftest.py files (prevents collision during parallel execution).
Phase 7: Implement
Execute each task by dispatching @make with:
- The task spec (from Phase 5, finalized — see Finalized-Text Rule)
- Relevant code context (seam-revealing snippets only — see Code Context Anti-patterns)
- Pre-written failing tests and handoff from `@test` (if TESTS_READY)
Pre-Dispatch Validation (MANDATORY)
Before sending the spec to @make, scan it and reject (revise, then retry) if any of the following are present:
| Check | Why it matters |
|---|---|
| `bash -c`, `sh -c`, `zsh -c`, `fish -c` (anywhere, including inside `nix develop --command bash -c …`) | `@make`'s sandbox denies all `*-c` shell invocations, and any nested shell would bypass the per-command allowlist. Replace with one direct command per line: `nix develop -c cargo check`, `nix develop -c cargo test`, etc. |
| `nix develop --command bash` / `nix develop -c bash` / `nix develop -c sh` | Same reason: the inner shell escapes the sandbox. Wrap each toolchain command directly. |
| Any `cd <path> && …` | `@make` cannot `cd`. Rewrite to use absolute paths or `git -C <path>` for git operations (and `@make` doesn't run git anyway). |
| Code blocks longer than ~5 lines under "Code Context" or labelled as the answer | Violates Code Context Anti-patterns. Trim to the seam. |
| Two versions of the same code, "actually let me correct…", or open questions | Violates the Finalized-Text Rule. Redo the spec. |
| Test bodies inside the `@make` spec when tests are coming from `@test` | The TDD handoff already provides them; duplicating creates conflict. |
If any check trips, do not dispatch. Fix the spec and re-validate. Repeated trips on the same task signal a Phase 5 split problem — go back and split.
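Part of this validation can be automated with a plain text scan. The sketch below is a rough approximation and not the workflow's actual validator: `FORBIDDEN` and `validate_spec` are hypothetical names, the regular expressions are one guess at what "present in the spec" means, the length check applies to every fenced block rather than only those under "Code Context", and open questions, two-version code, and duplicated test bodies still need human judgment.

```python
import re
from typing import List

FENCE = "`" * 3  # built at runtime so this example can itself live inside a fence

FORBIDDEN = [
    (re.compile(r"\b(?:bash|sh|zsh|fish)\s+-c\b"), "shell -c invocation"),
    (re.compile(r"\bnix\s+develop\s+(?:--command|-c)\s+(?:bash|sh)\b"),
     "nix develop wrapping a shell"),
    (re.compile(r"\bcd\s+\S+\s*&&"), "cd <path> && chain"),
    (re.compile(r"actually,? (?:that's wrong|let me correct)|wait, let me revise",
                re.IGNORECASE),
     "unfinalized revision text"),
]


def validate_spec(spec: str, max_code_lines: int = 5) -> List[str]:
    """Return the checks the spec trips; an empty list means it may be dispatched."""
    problems = [label for pattern, label in FORBIDDEN if pattern.search(spec)]
    in_block, length = False, 0
    for line in spec.splitlines():
        if line.strip().startswith(FENCE):
            # A fence line toggles the block state; check length when a block closes.
            if in_block and length > max_code_lines:
                problems.append(
                    f"code block of {length} lines exceeds {max_code_lines}"
                )
            in_block, length = not in_block, 0
        elif in_block:
            length += 1
    return problems
```

A spec containing `nix develop --command bash -c "cargo check"` trips the first two checks, while `nix develop -c cargo check` passes, which matches the replacement the table recommends.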
@make runs in TDD mode when tests are provided:
- Entry validation: run tests, verify RED, check failure codes match handoff
- Implement minimal code to make tests pass (GREEN)
- Regression check on broader area
- Refactor while keeping green
- Report RED→GREEN evidence
Escalation: If @make flags test quality concerns during entry validation:
- `@make` reports the issue to caller
- Caller routes to `@check` for diagnosis
- `@check` reports findings
- Caller routes to `@test` for fixes
- Fixed tests return to `@make`
For NOT_TESTABLE tasks, @make runs in standard mode.
After all tasks complete, verify overall integration:
- Run the project's test suite if available
- Run linting/type checking if configured
- Fix any integration issues between tasks
Phase 8: Final Review
Dispatch @check and @simplify in parallel to review the full implementation (all changes across all files).
Provide reviewers with:
- The original plan
- The full diff (`git diff "$BASE_BRANCH"...HEAD`)
- Any decisions or deviations from the plan
Review loop (max 3 cycles):
- Send implementation to both reviewers
- Merge findings (same precedence rules as Phase 4)
- If ACCEPTABLE: proceed to Phase 9
- If issues found: fix them directly (no need to re-dispatch `@make` for small fixes), then re-review
- Convergence detection: same findings twice = stop loop early
- If unresolved after 3 cycles: document blockers, proceed to commit anyway
Phase 9: Commit and Wrap Up
The workflow is forge-agnostic. It commits locally and stops. Do not push, and do not open a pull/merge request — the user chooses their forge and review workflow manually.
Commit Code Changes
- Stage code changes only. Do not stage `TODO.md` (committed separately below) and do not stage `.opencode/workflow-summary.md` (intentionally never committed; see Local Summary).
- Write a conventional commit message summarizing the implementation. Reference the TODO.md issue ID in the body (e.g. `Refs: ABC-1`).
- If changes are large/varied, use multiple atomic commits (one per logical unit)
TODO Update
- Dispatch `@pm` against `./TODO.md` (live filesystem mode). Ask it to:
  - Set Branch to `$BRANCH_NAME`
  - Set Status to `In Review`
  - Add a comment with the branch name, latest commit SHA, and a one-line summary
- If acceptance-criteria checkboxes were addressed by the implementation, ask `@pm` to check them off
- Commit the TODO.md change as a separate atomic commit: `chore(todo): update <issue-id> status and progress`
Local Summary
- Write `.opencode/workflow-summary.md` in the worktree with:
  - Run timestamp
  - Issue reference and title
  - Branch name and final commit SHA(s)
  - Summary of implementation
  - TDD evidence (RED→GREEN per task, NOT_TESTABLE justifications)
  - Review outcomes (plan review + final review verdicts)
  - Unresolved items (if any)
  - Files changed
- Do not commit this file. It is a per-run, per-branch artifact; committing it would create merge conflicts whenever multiple workflow branches are merged. Leave it untracked. Recommend the user add `.opencode/` to `.gitignore` if not already.
Failure Handling
At any phase, if an unrecoverable error occurs:
- Write `.opencode/workflow-summary.md` with what was completed and what failed. Do not stage or commit this file.
- If any code was written, commit it with message `wip: incomplete workflow run for <issue-id>`. Stage code only; exclude `.opencode/workflow-summary.md`.
- Leave the branch and worktree intact for the user to inspect; do not push, do not delete.
- Dispatch `@pm` against `./TODO.md` to add a comment on the issue summarising what failed.
- Stop execution.
Never hang on interactive prompts. If any command appears to require input, treat it as a failure and follow the above procedure.