nixcfg/config/opencode/commands/workflow.md
Harald Hoyer 534361f1b5 feat(opencode): extend Phase 7 escalation to mid-implementation test-design errors
Phase 7's escalation rule was gated on @make flagging concerns "during
entry validation" only. When @make got past entry validation, started
implementing, and ground for 2-3 attempts because the test demanded
impossible production code, the orchestrator had no documented route
— it would re-dispatch @make with marginal context tweaks instead of
recognizing the failure as test-architecture.

Splits the escalation into two clearly-named paths (entry-validation
vs mid-implementation) that both route through @check (test diagnosis)
→ @test (redesign) → fresh @make. Bounded at max 2 escalation cycles
before reverting to a Phase 3 plan revisit, to prevent thrashing when
the actual problem is upstream.

@make.md gains a new Iteration Limits red-flag class — "Test-design
suspicion" — instructing @make to stop and report with an explicit
`escalate: test_design` flag in the Blocking Issue section. The flag
is the routing signal the orchestrator switches on.
2026-05-08 10:20:16 +02:00


description: Multi-agent workflow for the current worktree: plan, test, implement, commit
agent: build

You are executing the multi-agent workflow inside the worktree this opencode session was started from. Run all phases without waiting for user input. The user has walked away.

Prerequisites (the user handles before launching opencode):

  • A git worktree is checked out for the issue's feature branch
  • opencode was launched from the root of that worktree
  • A TODO/ directory is committed to the repo containing per-issue files (TODO/<ID>.md) plus TODO/README.md

Task reference: $ARGUMENTS

If $ARGUMENTS is empty, stop immediately: "Usage: /workflow <ISSUE-ID> [base-branch] (e.g. /workflow ABC-1). The ID must exist as ./TODO/<ID>.md. Base branch defaults to main (then master)."

Parse $ARGUMENTS: the first whitespace-separated token is the issue ID (store it as ISSUE_ID); an optional second token overrides the base branch.
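
The parsing rule above could be sketched in shell roughly as follows. The function name `parse_workflow_args` and the variable `BASE_OVERRIDE` are illustrative assumptions; only `ISSUE_ID` and the usage message come from this file.

```shell
# Hypothetical helper: split the raw argument string on whitespace.
# First token -> ISSUE_ID, optional second token -> BASE_OVERRIDE (assumed name).
parse_workflow_args() {
  local _rest
  # read splits on whitespace; tokens after the second land in _rest and are ignored.
  read -r ISSUE_ID BASE_OVERRIDE _rest <<EOF
$1
EOF
  if [ -z "$ISSUE_ID" ]; then
    echo "Usage: /workflow <ISSUE-ID> [base-branch] (e.g. /workflow ABC-1). The ID must exist as ./TODO/<ID>.md. Base branch defaults to main (then master)." >&2
    return 1
  fi
}
```

For example, `parse_workflow_args "ABC-1 develop"` leaves `ISSUE_ID=ABC-1` and `BASE_OVERRIDE=develop`, while an empty argument string prints the usage message and fails.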


Roles & Dispatch

This is a multi-agent workflow. There is one orchestrator (you, running in agent: build mode per this file's frontmatter) and a cast of specialised subagents that the orchestrator dispatches at each phase. The orchestrator coordinates; subagents do the work. The orchestrator does not write production code, write tests, or play any subagent's role — it plans, dispatches, merges findings, edits its own artifacts under .workflow/, and commits.

The cast (each defined as a separate agent file under config/opencode/agents/<name>.md):

| Subagent | Role | Notable constraints |
| --- | --- | --- |
| @check | Reviews plans and code for risks, correctness, testability. Returns ACCEPTABLE / NEEDS WORK / BLOCK. | Read-only: no write / edit / bash. |
| @simplify | Reviews for unnecessary complexity. Advisory recommendations. | Read-only. |
| @test | Writes failing tests for a task spec, verifies RED, hands off to @make. | May only modify test files / #[cfg(test)] mod blocks. Bash sandboxed to test runners. |
| @make | Implements a single task spec. Verifies acceptance criteria. | May only modify files listed in the task spec. Bash sandboxed to language toolchains; no git, network, cd. |
| @pm | Reads/updates TODO/ issue files. | May only modify TODO/ contents. No bash. |

What "Dispatch" means here. Every "dispatch @<name>" in the phase descriptions is a call to opencode's subagent / task invocation tool with that agent name. Each dispatch starts a fresh context: the subagent has no memory of prior phases, no view of this orchestration, and no access beyond what its own file declares. The subagent receives only what the dispatch prompt provides — typically an absolute path to a file in $RUN_DIR plus a small per-dispatch context block.

Anti-patterns to avoid:

  • Performing a subagent's work in the orchestrator's session ("I'll think like @check for a moment and produce the review myself"). Every @<name> reference is a tool call, not a role-play.
  • Skipping a dispatch because the orchestrator "could just do it." The agents enforce permission boundaries the orchestrator (in agent: build mode) does not have.
  • Paraphrasing a subagent's output into the next dispatch's prompt instead of letting the next subagent read the on-disk artifact directly.

Run Artifacts

The orchestrator writes plan and task-spec artifacts to a per-run directory in the worktree. Subagents read these by absolute path rather than from inline prompt text. This keeps dispatch prompts small, eliminates paraphrase drift between dispatches (@check and @simplify see the same plan byte-for-byte), and gives Dispatch Hygiene's Finalized-Text Rule a physical anchor — the file is the final version.

Directory layout (relative to $WORKTREE_PATH):

.workflow/
└── run-<ISSUE-ID>/
    ├── plan.md          # Phase 3 output — finalized
    ├── task-1.md        # Phase 5 output — one file per task
    ├── task-2.md
    └── summary.md       # Phase 9 output (the run summary)

Define RUN_DIR="$WORKTREE_PATH/.workflow/run-$ISSUE_ID" once in Phase 1 and reference it everywhere downstream. Create the directory in Phase 3 (mkdir -p "$RUN_DIR").

Authoring rules:

  • Files are written by the orchestrator, never by subagents.
  • Files are passed to subagents as absolute paths: e.g. "the plan is at <RUN_DIR>/plan.md; read it before responding." The dispatch prompt body should be short — agent role, artifact path, per-dispatch context (worktree path, branch, base branch). Do not quote artifact contents inline.
  • Mid-loop revisions (Phase 4 review cycle, Phase 5 task respec, etc.) edit the file in place; every subsequent dispatch reads the new version automatically.
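
As a rough illustration of the "short prompt, absolute path" rule, a dispatch prompt could be assembled like this. The function name `build_dispatch_prompt` and the exact prompt wording are assumptions made for this sketch; the actual dispatch is an opencode tool call, not a shell command.

```shell
# Hypothetical helper: print a short dispatch prompt that points at an on-disk
# artifact instead of quoting its contents.
# Args: agent name, absolute artifact path, worktree path, branch, base branch.
build_dispatch_prompt() {
  local agent="$1" artifact="$2" worktree="$3" branch="$4" base="$5"
  case "$artifact" in
    /*) ;;  # absolute path: fine
    *) echo "artifact path must be absolute: $artifact" >&2; return 1 ;;
  esac
  if [ ! -f "$artifact" ]; then
    echo "artifact not on disk: $artifact" >&2
    return 1
  fi
  cat <<EOF
You are @$agent.
The artifact is at $artifact; read it before responding.
Worktree: $worktree
Branch: $branch
Base branch: $base
EOF
}
```

The helper refuses relative or missing paths, so a prompt is only produced when the subagent will actually be able to read the file.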

Lifecycle:

  • Files persist across phases until the run finishes.
  • Files are not committed (same as summary.md). Recommend .workflow/ in .gitignore.
  • Multiple runs on the same issue overwrite the prior run's artifacts. Save anything you want to keep before re-running.
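
A minimal sketch of creating the run directory and noticing a missing ignore entry, assuming a helper named `ensure_run_dir` (hypothetical) and a naive exact-line match on `.workflow/` in the worktree's `.gitignore`:

```shell
# Hypothetical helper: create <worktree>/.workflow/run-<ISSUE-ID> and print its path.
# Warns on stderr when .workflow/ is not listed verbatim in .gitignore
# (a simplification: other ignore mechanisms are not consulted).
ensure_run_dir() {
  local worktree="$1" issue_id="$2" run_dir
  run_dir="$worktree/.workflow/run-$issue_id"
  mkdir -p "$run_dir" || return 1
  if ! grep -qxF '.workflow/' "$worktree/.gitignore" 2>/dev/null; then
    echo "note: .workflow/ is not listed in $worktree/.gitignore" >&2
  fi
  printf '%s\n' "$run_dir"
}
```

Usage would look like `RUN_DIR="$(ensure_run_dir "$WORKTREE_PATH" "$ISSUE_ID")"`.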

Phase 1: Sanity Check

  1. Verify CWD is a non-bare git worktree: git rev-parse --is-bare-repository 2>/dev/null must output false. If not, stop: "Workflow must be run from a non-bare worktree (the directory opencode was launched in)."
  2. Capture the worktree path: WORKTREE_PATH="$(pwd)".
  3. Verify the TODO tracker exists:
    • ./TODO/ directory must exist. If not, stop: "TODO/ directory not found in the current worktree. Commit a TODO/ folder with one file per issue plus a README.md index."
    • ./TODO/README.md must exist. If not, stop: "TODO/README.md not found. Add the category index file before running the workflow."
    • ./TODO/$ISSUE_ID.md must exist. If not, stop: "Issue file ./TODO/<ID>.md not found for ID parsed from $ARGUMENTS."
  4. Verify HEAD is not detached: git symbolic-ref --short HEAD must succeed. If it fails, stop: "Cannot run on a detached HEAD. Check out a feature branch first."
  5. Capture the current branch: BRANCH_NAME="$(git symbolic-ref --short HEAD)".
  6. Resolve the base branch (BASE_BRANCH):
    • If $ARGUMENTS provided a second token, use it.
    • Else if git rev-parse --verify --quiet main succeeds, use main.
    • Else if git rev-parse --verify --quiet master succeeds, use master.
    • Else stop: "Could not determine base branch (no main or master). Pass it as the second argument: /workflow <ISSUE-ID> <base-branch>."
  7. Verify the current branch is not the base branch: if BRANCH_NAME == BASE_BRANCH, stop: "Cannot run workflow on the base branch ($BASE_BRANCH). Switch to a feature branch first."
  8. Set the run-artifacts directory: RUN_DIR="$WORKTREE_PATH/.workflow/run-$ISSUE_ID". Phase 3 will mkdir -p "$RUN_DIR" before writing the first artifact.
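
The eight steps above can be sketched as one shell function. This is a sketch under assumptions: the helper names `stop`, `resolve_base_branch`, and `phase1_sanity_check` are invented for illustration, and the stop messages are abbreviated versions of the ones listed above.

```shell
# Print a stop message on stderr and fail.
stop() { echo "$1" >&2; return 1; }

# Step 6: explicit override first, then main, then master.
resolve_base_branch() {
  local candidate
  if [ -n "$1" ]; then printf '%s\n' "$1"; return 0; fi
  for candidate in main master; do
    if git rev-parse --verify --quiet "$candidate" >/dev/null 2>&1; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  stop "Could not determine base branch (no main or master)."
}

# Steps 1-8 in order; sets WORKTREE_PATH, BRANCH_NAME, BASE_BRANCH, RUN_DIR.
phase1_sanity_check() {
  local issue_id="$1" base_override="$2"
  [ "$(git rev-parse --is-bare-repository 2>/dev/null)" = "false" ] ||
    stop "Workflow must be run from a non-bare worktree." || return 1
  WORKTREE_PATH="$(pwd)"
  [ -d ./TODO ] || stop "TODO/ directory not found in the current worktree." || return 1
  [ -f ./TODO/README.md ] || stop "TODO/README.md not found." || return 1
  [ -f "./TODO/$issue_id.md" ] || stop "Issue file ./TODO/$issue_id.md not found." || return 1
  BRANCH_NAME="$(git symbolic-ref --short HEAD 2>/dev/null)" ||
    stop "Cannot run on a detached HEAD. Check out a feature branch first." || return 1
  BASE_BRANCH="$(resolve_base_branch "$base_override")" || return 1
  [ "$BRANCH_NAME" != "$BASE_BRANCH" ] ||
    stop "Cannot run workflow on the base branch ($BASE_BRANCH)." || return 1
  RUN_DIR="$WORKTREE_PATH/.workflow/run-$issue_id"
}
```

Outside a git repository the first check already fails, because `git rev-parse --is-bare-repository` prints nothing there.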

Phase 2: Issue Context

Dispatch @pm against ./TODO/ (pass the absolute TODO/ directory path) and fetch the issue at ./TODO/<ID>.md:

  • Title, description, acceptance criteria (if section present)
  • Labels and parent
  • Sub-issues list (if the issue is a parent)
  • Existing status

If the issue file does not exist or @pm fails, stop with an error.

If the issue's status is Todo, ask @pm to set it to In Progress and propagate the change to the dependent index (README.md for top-level issues, the parent's ## Sub-issues line for sub-issues). The status edit will be staged alongside other TODO updates in Phase 9.


Phase 3: Plan

Analyze the codebase. Create a detailed implementation plan addressing the issue's requirements and acceptance criteria, then write it to $RUN_DIR/plan.md (run mkdir -p "$RUN_DIR" first if the directory doesn't exist). All Phase 4 reviewer dispatches read this file.

The plan should include:

  • Problem summary (from issue context)

  • Proposed approach with rationale

  • Files to modify (with brief description of changes)

  • New files to create

  • Risks and open questions

  • Test Design (conditional — include for non-trivial tasks):

    • Key behaviors to verify, expressed as action + observable outcome (e.g. "call weave_enemies with t=0.5 → enemy Transform.translation.x differs from initial position"). A structural fact like "enum has 3 variants" or "struct has these fields" is not a behavior — it cannot fail meaningfully and does not exercise the code under test.
    • Edge cases and error conditions worth testing (also expressed as actions, not structure)
    • What explicitly should NOT be tested (prevents bloat)
    • Testability concerns (heavy external deps, GPU-only paths, etc.)

    Include Test Design for: Public API changes, bug fixes with behavioral impact, new features with business logic, multi-module changes. Skip Test Design for: Config-only changes, decorator swaps, import reorganization, documentation. When skipped, @test derives test cases directly from acceptance criteria.

Before saving plan.md, apply Dispatch Hygiene (below). The file on disk is what reviewers will read in Phase 4 — there is no second chance to revise during dispatch.


Dispatch Hygiene

This applies to every subagent dispatch (Phases 4, 5.5, 6, 7, 8) and to artifacts that will be dispatched (the plan from Phase 3, the task specs from Phase 5). Apply these checks before sending: fix the artifact, then re-check.

Finalized-Text Rule

The artifact must be finalized — single-author text, no contradictions, no open questions. Forbidden:

  • "Actually, that's wrong — let me correct…"
  • "Wait, let me reconsider…"
  • Two versions of the same code block, one labelled "corrected" or appearing after a revision pass
  • Open questions or ambiguities the orchestrator hasn't resolved
  • Mid-text revisions visible to the recipient

If you find yourself revising while writing, stop, redo the artifact from scratch with the corrected understanding, and only then dispatch. Subagents are fresh-context — they cannot reliably resolve which of two contradictory drafts is canonical, and reviewers cannot give a clean verdict on a self-contradicting plan.

No-Implementation-in-Plan-or-Spec Rule

Plans (Phase 3) and task specs (Phase 5) are not the place to write the answer. They describe what to do; @make writes how.

Provide:

  • Approach with rationale
  • Files to modify with brief descriptions
  • Function signatures, type declarations, data shapes (structure, not logic)
  • Constraints, invariants, integration contracts
  • Risks and edge cases

Do not provide:

  • Drop-in code blocks longer than ~5 lines that constitute "the answer"
  • Full function bodies for the changes being planned
  • Complete match arms / branch logic / loop bodies for new behavior
  • Pre-written test bodies (those come from @test)
  • Stage-by-stage code transformations spelled out as ready-to-commit diffs

If you've already written the implementation in the plan or spec, the artifact has overstepped. Convert finished code into structural description (signature + intent) and let @make produce the body.

Allowed in plans/specs:

  • Existing code being replaced, marked as "current state"
  • Function signatures and type/struct/enum declarations (data, not logic)
  • Tiny inline constants (pub const FOO: f32 = 30.0;)
  • Test specifications as one-line behavior descriptions ("input X → expect Y")

Pre-Dispatch Validation (MANDATORY)

Scan the artifact and reject (revise, retry) if any of the following are present:

| Check | Why it matters |
| --- | --- |
| bash -c, sh -c, zsh -c, fish -c (anywhere, including inside nix develop --command bash -c …) | @make/@test sandboxes deny all *-c shell invocations, and any nested bash would bypass the per-command allowlist. Replace with one direct command per line: nix develop -c cargo check, etc. |
| nix develop --command bash / nix develop -c bash / nix develop -c sh | Same reason: the inner shell escapes the sandbox. Wrap each toolchain command directly. |
| Any cd <path> && … | Subagents cannot cd. Rewrite to use absolute paths. |
| Code blocks longer than ~5 lines that draft the answer | Violates No-Implementation-in-Plan-or-Spec. Trim to structure (signature + "current state" only). |
| Two versions of the same code, "actually let me correct…", or open questions | Violates the Finalized-Text Rule. Redo the artifact. |
| Test bodies inside @make specs when tests are coming from @test | Duplicates the TDD handoff. |
| Artifact path referenced in the dispatch (e.g. $RUN_DIR/plan.md, $RUN_DIR/task-<N>.md) but the file isn't on disk | The subagent will fail to read it and either error or fabricate context. Verify with test -f "<path>" before every dispatch. If missing, go back to the phase that produces it (Phase 3 for plan.md, Phase 5 for task-<N>.md) and write the file before retrying. |

If any check trips, do not dispatch. Fix and re-validate. Repeated trips on a single task signal a Phase 5 split problem — go back and split.
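
Some of these checks are mechanical enough to script. The following is a minimal sketch, not a complete validator: the function names and regular expressions are assumptions, and it covers only the nested-shell, `cd … &&`, visible-revision, and file-on-disk checks. The code-block length and test-body checks still need a manual read.

```shell
# Print a rejection label and return 1 when extended regex $3 matches in file $1.
check_pattern() {
  if grep -qE -- "$3" "$1"; then
    echo "rejected: $2"
    return 1
  fi
  return 0
}

# Hypothetical validator: returns non-zero if any scripted check trips.
validate_artifact() {
  local file="$1" failed=0
  if [ ! -f "$file" ]; then
    echo "rejected: artifact not on disk: $file"
    return 1
  fi
  check_pattern "$file" "nested shell invocation (bash/sh/zsh/fish -c)" \
    '(^|[^[:alnum:]_])(bash|sh|zsh|fish)[[:space:]]+-c([[:space:]]|$)' || failed=1
  check_pattern "$file" "nix develop wrapping an inner shell" \
    'nix develop[[:space:]]+(--command|-c)[[:space:]]+(bash|sh)([[:space:]]|$)' || failed=1
  check_pattern "$file" "cd <path> && chain" \
    '(^|[^[:alnum:]_])cd[[:space:]]+[^&]+&&' || failed=1
  check_pattern "$file" "visible mid-text revision" \
    '[Ll]et me (correct|reconsider)' || failed=1
  return "$failed"
}
```

A direct command such as `nix develop -c cargo check` passes, because the `-c` there belongs to `nix develop` rather than to a shell.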


Phase 4: Review Plan

Dispatch @check and @simplify in parallel to review $RUN_DIR/plan.md. The dispatch prompt is short — agent role, the absolute path to the plan, the worktree path, and any per-dispatch reviewer focus. Tell each reviewer to read the plan from disk; do not paste the plan inline. Apply Dispatch Hygiene to each dispatch prompt.

Reviewers should evaluate testability:

  • @check: Is the design testable? Are the right behaviors identified? (Review Framework §8)
  • @simplify: Is the test scope appropriate? Over-testing proposed?

Merge rules:

  • @check safety/correctness findings are hard constraints
  • If @simplify recommends removing something @check flags as needed, @check wins
  • Note conflicts explicitly

Review loop (max 3 cycles):

  1. Dispatch both reviewers against $RUN_DIR/plan.md.
  2. Merge findings
  3. If verdict is ACCEPTABLE from both (or JUSTIFIED COMPLEXITY from @simplify): proceed to Phase 5
  4. If BLOCK or NEEDS WORK: edit $RUN_DIR/plan.md in place addressing findings (re-apply Dispatch Hygiene to the updated file), then re-review.
  5. Convergence detection: if reviewers return the same findings as the previous cycle, stop the loop early
  6. If still unresolved after 3 cycles: note unresolved blockers and proceed anyway (they will be documented in the workflow summary and commit message)
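
The control flow of this loop (cycle cap plus convergence detection) could be sketched as below. The reviewers themselves are tool calls, so the sketch takes a caller-supplied function as a stand-in: `review_loop` and its callback contract (merged verdict on the first line, findings after it) are assumptions made for illustration.

```shell
# review_loop MAX_CYCLES REVIEW_CMD
# REVIEW_CMD is a hypothetical function that receives the cycle number and
# prints the merged verdict on line 1, followed by the merged findings.
review_loop() {
  local max_cycles="$1" review_cmd="$2"
  local cycle=1 previous="" current verdict
  while [ "$cycle" -le "$max_cycles" ]; do
    current="$("$review_cmd" "$cycle")"
    verdict="$(printf '%s\n' "$current" | head -n 1)"
    if [ "$verdict" = "ACCEPTABLE" ]; then
      echo "accepted after $cycle cycle(s)"
      return 0
    fi
    # Convergence detection: identical findings two cycles in a row.
    if [ "$cycle" -gt 1 ] && [ "$current" = "$previous" ]; then
      echo "converged at cycle $cycle"
      return 2
    fi
    previous="$current"
    # The orchestrator would edit plan.md in place here before re-reviewing.
    cycle=$((cycle + 1))
  done
  echo "unresolved after $max_cycles cycles"
  return 3
}
```

Both non-zero outcomes (converged, unresolved) correspond to "note unresolved blockers and proceed" in the steps above.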

Phase 5: Split into Tasks

The output of this phase is one file per task at $RUN_DIR/task-<N>.md (1-indexed: task-1.md, task-2.md, …). These files are the source-of-truth that Phase 6 (@test) and Phase 7 (@make) read by absolute path. No file written = no dispatch in later phases. If you skip the file-write step, every downstream dispatch will reference a non-existent path and fail.

Steps:

  1. Break the approved plan into discrete tasks (see Split Heuristic and task-size guidance below).
  2. For each task, draft the task spec covering the fields in the table below.
  3. Apply Dispatch Hygiene (above) to each draft.
  4. Write each finalized spec to $RUN_DIR/task-<N>.md. After writing, verify with test -f "$RUN_DIR/task-<N>.md" for every N. Phase 5 is not done until every task file exists on disk.
  5. Drop your inline copies of the task drafts. From this phase onward, the file is the only source of truth — if you need a task spec later, read it back from disk.
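
The step 4 verification could look like this in shell, assuming a hypothetical helper `verify_task_files` that is given the run directory and the number of tasks:

```shell
# Hypothetical helper: check that task-1.md .. task-<count>.md all exist.
# Reports every missing file on stderr and returns non-zero if any is absent.
verify_task_files() {
  local run_dir="$1" count="$2" n=1 missing=0
  while [ "$n" -le "$count" ]; do
    if [ ! -f "$run_dir/task-$n.md" ]; then
      echo "missing task spec: $run_dir/task-$n.md" >&2
      missing=1
    fi
    n=$((n + 1))
  done
  return "$missing"
}
```

For a three-task split, `verify_task_files "$RUN_DIR" 3` must succeed before Phase 5 counts as done.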

Each task file must contain:

| Required field | Description |
| --- | --- |
| Task | Clear description of what to implement |
| Acceptance Criteria | Specific, testable criteria (checkbox format) |
| Code Context | Actual code snippets from the codebase, not just file paths |
| Files to Modify | Explicit list, mark new files with "(create)" |
| Test File | Path for test file. Pick the pattern that matches the project's language; see "Test File Path by Language" below. |

Test File Path by Language

The test file path must follow the language's actual test layout. Do not invent paths that look colocated but aren't valid for the language (e.g. src/tests/test_<feature>.rs is not a Rust test location — it's a regular src/ submodule).

  • Python
    • Colocated: <module>/tests/test_<feature>.py (create)
    • Top-level: tests/test_<feature>.py (create)
  • Rust
    • Module tests (most common — testing private/crate-internal functions): pick the relevant production source file, e.g. src/<module>.rs. @test is permitted to add or edit content only inside #[cfg(test)] mod <name> { … } blocks in that file (per @test's File Constraint). The rest of the file remains read-only to @test.
    • Integration tests (testing the crate's public API as a black box): tests/<feature>.rs (create), or in a workspace <crate>/tests/<feature>.rs.
    • In both cases, if the test references not-yet-existing functions/types, the task requires a stub-first @make pre-pass so the symbols exist as todo!() bodies before @test runs. See Phase 6 → "Rust stub-first TDD". Plan for two @make dispatches per such task: stub pass, then body pass.
    • src/tests/<feature>.rs is not a valid path — it would be a regular submodule needing mod tests; in production code. Use one of the two forms above.
  • Polyglot Nix flake
    • Match the host language of the code under change (Python or Rust rules above), wrapping commands in nix develop -c … per the agents' devshell rule.

Include Integration Contracts when a task adds/changes function signatures, APIs, config keys, or has dependencies on other tasks.

Include Test Design from Phase 3 when available, attached to the relevant task(s).

Task size: ~10-30 minutes each, single coherent change, clear boundaries.

Split Heuristic — when in doubt, split

A task must be split if any of the following apply:

  • It touches more than two distinct concerns (e.g. constants + new component + sprite spawn + new system + main wiring is five concerns — at least three tasks).
  • It changes more than ~50 lines across more than 2 files.
  • It mixes data/structural changes (constants, types, components) with runtime/system changes (new ECS systems, scheduling, render loops).
  • It mixes pure-logic changes (math helpers) with stateful changes (queries, world mutation).
  • It mixes new APIs with their first call sites in the same task.

When a task fails the heuristic, split into:

  1. Foundations — new constants, types, components (no behavior change yet).
  2. Implementation — the actual production logic, calling the foundations.
  3. Wiring — registration in main.rs / lib.rs / app-builder.

Each split is dispatched separately to @make and verified before the next.

Code Context — what to include

The Code Context field exists so @make can find the seam to modify. Provide:

  • The existing code being replaced (verbatim, marked as "current state"), with ~5-10 lines of surrounding context
  • Function signatures of helpers @make will need to call
  • The file's relevant import block

For everything you must not include — drop-in replacements, full function bodies, pre-written test bodies, "here is what to write" — see Dispatch Hygiene → No-Implementation-in-Plan-or-Spec Rule above.

If the task is so well-specified that you've already written the implementation, the task is too small for @make (apply it directly) or you've over-determined the design (revisit Phase 3).

Apply Dispatch Hygiene to each task spec before dispatch in Phase 7.


Phase 5.5: Review Task Split

A short, focused review of the task split as a set. Catches split errors (missed scope, overlap, multi-purpose tasks, missing integration contracts) when they're cheap to fix — before @test and @make dispatch on a broken split. Without this gate, the same errors surface only at Phase 8 final review, after expensive test/implementation work has already been done.

Dispatch only @check for this phase — split review is structural / coverage, not complexity. @simplify is not involved. Apply Dispatch Hygiene to the prompt.

The dispatch prompt names:

  • $RUN_DIR/plan.md (the plan being decomposed)
  • $RUN_DIR/task-1.md through $RUN_DIR/task-N.md (the split — list every task file)
  • The worktree path

@check evaluates the split against five questions:

  1. Coverage — do the tasks together implement everything the plan promises? Any gaps?
  2. No overlap — do two tasks claim the same scope or modify the same lines?
  3. Single-purpose — does any task do more than one thing? (See Phase 5's Split Heuristic.)
  4. Integration contracts — where two tasks touch a shared interface, is the contract documented in both task files?
  5. Testable acceptance criteria — does every task have specific, falsifiable AC?

Review loop (max 2 cycles):

  1. Dispatch @check against the plan + all task files.
  2. If ACCEPTABLE → proceed to Phase 6.
  3. If NEEDS WORK → edit the task files in place (split a task into two, merge two tasks, add integration contracts, sharpen AC). Re-apply Dispatch Hygiene to each updated file. Re-dispatch.
  4. If BLOCK → the plan itself does not decompose cleanly. Return to Phase 4 with @check's finding instead of forcing the split.
  5. Convergence detection: same finding twice → stop loop, document the unresolved split issue in the run summary, proceed.

This is a quick gate, not a deep review. No line-by-line code feedback (there's no code), no design re-litigation (that was Phase 4's job). The whole point is a fast structural check before downstream phases start churning.


Phase 6: Write Tests

Apply Dispatch Hygiene to each @test prompt before sending.

For each task from Phase 5, dispatch @test with a short prompt that names:

  • The absolute path to the task spec: $RUN_DIR/task-<N>.md. @test reads acceptance criteria, code context, and files-to-modify from there.
  • The absolute path to the plan, if test design context is needed: $RUN_DIR/plan.md.
  • The worktree path (so @test resolves source files correctly).
  • The test file path to create.

Do not quote task or plan content inline — @test reads from disk.

@test writes failing tests and verifies RED with structured failure codes.

Decision table — handling @test results:

| Condition | Action |
| --- | --- |
| TESTS_READY + escalate_to_check: false | Proceed to Phase 7 |
| TESTS_READY + escalate_to_check: true | Route tests to @check for light review. @check diagnoses, caller routes fixes to @test. Then proceed. |
| NOT_TESTABLE | Route to @check for sign-off on justification. If Missing testability seam, dispatch @make to add the seam first, then re-run @test. Otherwise the task goes to @make without tests. |
| BLOCKED | Investigate. May need to revise task spec or plan. |
| Test passes immediately | Investigate: behavior may already exist. Task spec may be wrong. |
| Stub-first run: tests pass with zero todo!() panics | Structural-only tests. Every test is asserting type/struct/enum facts without calling any stubbed symbol. Reject the test output and route back to @test with a "must exercise the stubbed symbols by calling them" note. Do not let these tests gate Phase 7; they cannot go RED→GREEN, so the body-pass @make would commit code with false-green coverage. |

Rust stub-first TDD (mandatory for new symbols)

Whenever @test will write tests (module or integration) that reference functions / methods / types that do not yet exist, the test cannot RED meaningfully against absent code:

  • Module tests inside src/<module>.rs — without the function, the #[cfg(test)] mod tests block fails to compile (error[E0425]), masking assertion diagnostics.
  • Integration tests inside tests/<feature>.rs — same, but mediated through lib.rs re-exports.

To get a clean runtime RED, dispatch a stub-first @make pass before @test runs:

Stub pass (split from Phase 7's body pass):

  1. Dispatch @make in standard mode (no tests exist yet). The dispatch prompt names $RUN_DIR/task-<N>.md as the source spec and adds this stub-pass-specific scope inline:
    • Goal: add the planned API as todo!()-bodied stubs so the test will compile.
    • Files to modify: the relevant src/<module>.rs for module tests, or src/lib.rs plus any new src/<module>.rs for integration tests (the latter need pub mod …; declarations so the test crate can import).
    • Stubs only: every function body is exactly todo!(). Every method body is exactly todo!(). Structs may use pub struct Foo; or pub struct Foo { /* fields TBD */ } — but no logic.
    • Signatures must match the planned final API exactly (return types, lifetimes, generics, visibility). Lift signatures from the task spec.
    • Acceptance criteria: cargo check (wrapped in nix develop -c … if the project has a devshell) passes; no test command is run.
    • Dispatch Hygiene still applies: the stub pass is small and finalized — no draft bodies, no contradictory signatures.
  2. Verify cargo check passed in @make's output. If not, fix and re-dispatch the stub pass before continuing.
  3. Dispatch @test. The test now compiles; running it panics on todo!() at runtime, which is a clean MISSING_BEHAVIOR RED with a stack trace — far better than the build-error-RED form.
  4. Panic-coverage check (MANDATORY). After @test returns, re-run the test command in the orchestrator and verify that every test in the new file panics on todo!() (i.e. every test exercises at least one of the stubbed symbols). The decision rule:
    • If the test output shows N panics for N tests → proceed to body pass.
    • If any test passes without a todo!() panic → that test is structural-only (asserting type / variant-count / field facts without calling the stubbed code). Reject @test's output and route back with the "Stub-first run: tests pass with zero todo!() panics" decision-table verdict. Require @test to rewrite each non-panicking test so it actually invokes the stubbed function/method.
    • This check is the only thing standing between false-green coverage and the body-pass commit. Skipping it has produced regressions like a system that compiles, "passes" tests, and silently no-ops in production.
  5. Continue to Phase 7's body pass (@make in TDD mode), where the same files are revisited and the todo!() bodies are replaced.
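
The panic-coverage check in step 4 can be approximated by scanning a saved test log. This sketch assumes the default human-readable `cargo test` output (lines of the form `test <name> ... ok` / `test <name> ... FAILED`) and relies on `todo!()` panicking with the message "not yet implemented"; the function name `panic_coverage_check` is hypothetical.

```shell
# Hypothetical helper: inspect a saved cargo test log from the stub-first run.
# Succeeds only when at least one test ran, none passed, and there is at least
# one "not yet implemented" panic per failing test.
panic_coverage_check() {
  local log="$1" passed failed panics
  [ -f "$log" ] || { echo "reject: log not found: $log"; return 1; }
  passed="$(grep -E '^test .+ \.\.\. ok$' "$log" || true)"
  if [ -n "$passed" ]; then
    echo "reject: tests passed without hitting todo!():"
    printf '%s\n' "$passed"
    return 1
  fi
  failed="$(grep -cE '^test .+ \.\.\. FAILED$' "$log" || true)"
  panics="$(grep -c 'not yet implemented' "$log" || true)"
  if [ "$failed" -eq 0 ] || [ "$panics" -lt "$failed" ]; then
    echo "reject: $failed failing test(s) but only $panics todo!() panic(s)"
    return 1
  fi
  echo "ok: $failed test(s), all panicking on todo!()"
}
```

Any test listed under "passed without hitting todo!()" is a candidate structural-only test to send back to @test.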

This routing is mandatory whenever new symbols are introduced in Rust (module or integration). It is not required when the test exercises an existing function/method (e.g. a behavior fix) — in that case @test runs directly and @make modifies the body in Phase 7.

The stub pass and the body pass each produce their own atomic commit (per Phase 9 rules): feat(<scope>): scaffold <thing> with todo!() stubs followed by feat(<scope>): implement <thing> (or whichever conventional type fits).

Parallelism:

  • Python: Independent tasks can have tests written in parallel, provided their test files are disjoint and no shared conftest.py is being modified.
  • Rust: Run @test dispatches sequentially. Cargo serialises the build via the target/ directory lock, so parallel dispatches give no speedup; they only add risk (a long-running build in one branch starves the other, and any task that touches a shared crate-level fixture/helper file will race).

Constraint: @test must not modify existing conftest.py files (prevents collision during parallel execution).


Phase 7: Implement

Apply Dispatch Hygiene to each @make spec before sending. Repeated trips on a single task signal a Phase 5 split problem — go back and split.

@make dispatches are SEQUENTIAL — never in parallel. Run each task to completion (writes, every verification command, and the orchestrator's post-check) before dispatching the next. Reasons:

  • @make writes source files. Parallel agents picking the same file (e.g. src/lib.rs for adding both a new pub mod and a registration) corrupt each other.
  • Even on disjoint files, Cargo's target/ lock and uv's venv state serialise the verification builds anyway, so parallelism gives no speedup.
  • Stub-pass/body-pass pairs (Rust stub-first TDD) must be strictly ordered within a task; running stub-pass for task 2 while body-pass for task 1 is still building yields a non-deterministic crate state for @test to RED against.

This applies to all @make invocations: standard mode, TDD mode, stub-pass, body-pass, and integration-fix dispatches.

Execute each task by dispatching @make with a short prompt:

  • The absolute path to the task spec: $RUN_DIR/task-<N>.md. @make reads acceptance criteria, code context, and files-to-modify from there.
  • The worktree path.
  • Pre-written failing tests and handoff from @test (if TESTS_READY) — these are short and per-dispatch, so include them inline in the prompt.

Do not quote the task spec inline.

@make runs in TDD mode when tests are provided:

  1. Entry validation: run tests, verify RED, check failure codes match handoff
  2. Implement minimal code to make tests pass (GREEN)
  3. Regression check on broader area
  4. Refactor while keeping green
  5. Report RED→GREEN evidence

Escalation. Two paths route through @check → @test → back to @make:

  1. Entry-validation escalation. Before implementing, @make's entry check (run tests, verify RED, compare against handoff) reveals test-quality concerns — wrong assertion target, mixed failure codes, mocks of internal boundaries, etc. @make reports without writing any production code.
  2. Mid-implementation escalation. While implementing, @make hits its iteration limit (2-3 attempts) because the test demands production code that's impossible or contradicts the spec. @make returns Implementation Incomplete with the flag escalate: test_design. Do not re-dispatch @make with marginal context tweaks; that just burns cycles on a test that needs redesign, not better implementation.

In both cases:

  1. @make returns its report (entry-time concern or mid-impl escalate: test_design).
  2. Orchestrator routes the report to @check for diagnosis (light review of the tests, not the implementation).
  3. @check confirms or rejects the test-design suspicion.
  4. If confirmed: orchestrator routes to @test to redesign the tests. Apply Dispatch Hygiene. Fixed tests return to @make for fresh entry validation and a clean implementation attempt.
  5. If rejected: the issue is in the production code; orchestrator re-dispatches @make with @check's diagnostic notes attached.

Iteration limit on this loop: max 2 cycles. If a test-design suspicion keeps surfacing but @check never confirms it, the design problem is upstream — revisit the Phase 3 plan rather than thrashing between @test and @make.
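
The routing decision on the mid-implementation flag could be sketched as follows. The report is assumed to be saved to a file and to contain the literal text `escalate: test_design`; the function name and the three outcome labels are invented for this sketch.

```shell
# Hypothetical router: decide the next step from @make's saved report.
# $1 = report file, $2 = number of the escalation cycle about to start (1-based).
route_make_report() {
  local report="$1" cycle="$2"
  if ! grep -qF 'escalate: test_design' "$report"; then
    echo "no-escalation"
    return 0
  fi
  if [ "$cycle" -gt 2 ]; then
    # Loop limit reached: the problem is upstream of the tests.
    echo "revisit-phase-3-plan"
  else
    echo "dispatch-check"
  fi
}
```

Entry-validation concerns carry no such flag in this sketch and would still be routed to @check by the orchestrator reading the report.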

For NOT_TESTABLE tasks, @make runs in standard mode.

After all tasks complete, verify overall integration:

  • Run the project's test suite if available
  • Run linting/type checking if configured
  • Fix any integration issues between tasks

Phase 8: Final Review

Apply Dispatch Hygiene to each reviewer prompt before sending. Dispatch @check and @simplify in parallel to review the full implementation (all changes across all files).

Provide reviewers with:

  • The absolute path to $RUN_DIR/plan.md (the same file Phase 4 reviewed; mid-loop revisions will have updated it in place)
  • The full diff (git diff "$BASE_BRANCH"...HEAD)
  • Any decisions or deviations from the plan, captured inline in the dispatch prompt

Review loop (max 3 cycles):

  1. Send implementation to both reviewers
  2. Merge findings (same precedence rules as Phase 4)
  3. If ACCEPTABLE: proceed to Phase 9
  4. If issues found, route per the kind of finding — the orchestrator does not write production code; @make does:
    • BLOCK, behavioral, correctness, or test-quality findings: build a new @make task spec from the finding (apply Dispatch Hygiene, finalized text, no draft answer). Dispatch @make. Do not fix directly. Every BLOCK is by definition behavioral and must round-trip through @make.
    • Strictly cosmetic findings (typo in a comment, missing trailing newline, formatting that does not change the AST or behavior): the orchestrator may fix directly, then re-review. Anything compiler-detected (unused import, dead code) goes through @make, since removing it is still a code change.
    • When in doubt, dispatch @make.
  5. Convergence detection: same findings twice = stop loop early
  6. If unresolved after 3 cycles: document blockers, proceed to commit anyway
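One way to approximate the convergence check in step 5 is to compare the merged findings of two consecutive cycles, ignoring order. The sketch below assumes the orchestrator writes each cycle's findings to a text file, one finding per line; that file layout and the function name `findings_converged` are assumptions, not something the workflow prescribes.

```sh
# Hypothetical sketch: succeed (exit 0) when two findings files contain
# the same lines, regardless of order.
# Usage: findings_converged <previous-findings-file> <current-findings-file>
findings_converged() {
  # No previous cycle (or a missing file) means nothing to compare against.
  [ -f "$1" ] && [ -f "$2" ] || return 1
  prev=$(sort "$1")
  cur=$(sort "$2")
  [ "$prev" = "$cur" ]
}
```

A review loop could call `findings_converged cycle1.txt cycle2.txt` after merging findings and break out early when it succeeds. Note that a textual comparison only catches findings that are worded identically; reworded duplicates would need a judgment call by the orchestrator.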

Phase 9: Commit and Wrap Up

The workflow is forge-agnostic. It commits locally and stops. Do not push, and do not open a pull/merge request — the user chooses their forge and review workflow manually.

Commit Code Changes

  • Stage code changes only. Do not stage anything under TODO/ (committed separately below) and do not stage anything under .workflow/ (intentionally never committed — these are per-run artifacts).
  • Write a conventional commit message summarizing the implementation. Reference the TODO issue ID in the body (e.g. Refs: GAL-39).
  • If changes are large/varied, use multiple atomic commits (one per logical unit)
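A minimal sketch of "stage code only", assuming the command runs from the worktree root and that git's `:(exclude)` pathspec magic is available in the installed git version. The function name `stage_code_only` is hypothetical.

```sh
# Hypothetical helper: stage everything except TODO/ and .workflow/.
# Assumes the current directory is the root of the worktree.
stage_code_only() {
  git add -A -- . ':(exclude)TODO' ':(exclude).workflow'
}
```

After calling it, `git diff --cached --name-only` shows what was staged, which is a cheap way to confirm nothing under TODO/ or .workflow/ slipped in before committing.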

TODO Update

  • Dispatch @pm against the absolute ./TODO/ path. Ask it to:
    • Set the issue file's frontmatter status to Done (or leave at In Progress if the run is incomplete and the user must verify before marking Done).
    • Add a comment of the form: - YYYY-MM-DD — Branch $BRANCH_NAME, commit <sha> — <summary> (date from the shell, never fabricated).
    • Propagate any status flip to the dependent index: TODO/README.md for top-level issues (parent: null), or the parent file's ## Sub-issues line for sub-issues.
  • If acceptance-criteria checkboxes were addressed by the implementation, ask @pm to check them off (flip - [ ] to - [x] under ## Acceptance criteria).

File Follow-ups

Unresolved items worth tracking must become real TODO issues; otherwise they vanish into the per-run summary.md and the user (who has walked away) never sees them. Before writing the summary, scan the run for items in these categories and dispatch @pm to file each as a sub-issue of the current issue (parent: $ISSUE_ID).

| Source | New issue label | Title style |
| --- | --- | --- |
| Pre-existing bug discovered while working but out of scope (e.g. "Score not resetting on game restart" found during GAL-39) | bug | Imperative fix description ("Reset score on game restart") |
| Unresolved blocker after a review loop exhausted its cycle limit (Phase 4 plan review or Phase 8 final review) | followup | Reference the @check finding |
| @test NOT_TESTABLE "future seam" notes that imply a real test gap | tech-debt | Describe the missing seam |

Do NOT file follow-ups for:

  • @simplify advisory recommendations the orchestrator chose not to act on — these are records, not missing work; they belong in the run summary.
  • Cosmetic / formatting / naming nits.
  • Anything already covered by an existing TODO issue (@pm lists existing issues; check the title/description before filing a duplicate).

Routing rules:

  • Each new issue is a sub-issue (parent: $ISSUE_ID). @pm will add it to the parent's ## Sub-issues list automatically. The user can promote it to top-level later if it deserves its own slot.
  • Issue body must include a "Discovered during" paragraph naming the run's branch and (where relevant) commit SHA, plus enough context for the user to triage it later without having to re-read the run.
  • Status: Todo. Default labels per the table; the orchestrator may add additional labels inferred from the parent (e.g. propagate gameplay from GAL-39 to a gameplay-relevant follow-up).
  • The Run Summary (next subsection) lists each filed follow-up by ID so the user has one place to see them.

Commit TODO Changes

After both the TODO Update and File Follow-ups steps, commit everything under TODO/ in a single atomic commit: chore(todo): update <issue-id> status, file follow-ups. Stage the worked issue file, the dependent index (README.md or parent file), and any newly created follow-up issue files.

If no follow-ups were filed, the commit message simplifies to chore(todo): update <issue-id> status and progress and only the TODO Update changes are staged.
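The choice between the two commit messages can be sketched as follows. The function name `todo_commit_message` and the idea of passing a follow-up count are illustrative assumptions; only the two message templates come from the text above.

```sh
# Hypothetical helper: print the TODO commit message for this run.
# Usage: todo_commit_message <issue-id> <number-of-filed-follow-ups>
todo_commit_message() {
  if [ "$2" -gt 0 ]; then
    printf 'chore(todo): update %s status, file follow-ups\n' "$1"
  else
    printf 'chore(todo): update %s status and progress\n' "$1"
  fi
}
```

A caller might then run something like `git add TODO && git commit -m "$(todo_commit_message "$ISSUE_ID" "$FOLLOWUP_COUNT")"`, where `FOLLOWUP_COUNT` is again a hypothetical variable.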

Run Summary

  • Write $RUN_DIR/summary.md with:
    • Run timestamp — capture it from the shell at write time: date -Iseconds (e.g. 2026-05-08T11:24:13+02:00). Do not use a placeholder like ???:???:?? or "session date" — if you cannot get a real timestamp, omit the field entirely rather than fabricating one.
    • Issue reference and title
    • Branch name and final commit SHA(s)
    • Summary of implementation
    • TDD evidence (RED→GREEN per task, NOT_TESTABLE justifications)
    • Review outcomes (plan review + final review verdicts)
    • Filed follow-ups — list each new issue created in the File Follow-ups step by ID, title, and reason (bug / followup / tech-debt). If none, write "None."
    • Advisory notes (not filed) — any @simplify or @check recommendations the orchestrator chose not to act on and did not turn into a TODO. These are records for the user to consider, not tracked work.
    • Files changed
  • Do not commit anything under .workflow/. The whole directory is per-run, per-branch state. Recommend the user add .workflow/ to .gitignore if not already.
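The "real timestamp or nothing" rule for the summary could be handled like this. It is a sketch assuming a `date` implementation that supports `-Iseconds` (GNU coreutils does); the function name `timestamp_line` and the `Run timestamp:` label are hypothetical.

```sh
# Hypothetical helper: print a timestamp line for summary.md, or print
# nothing at all if the shell cannot produce a real timestamp.
timestamp_line() {
  ts=$(date -Iseconds 2>/dev/null) || ts=""
  if [ -n "$ts" ]; then
    printf 'Run timestamp: %s\n' "$ts"
  fi
}
```

Appending its output with `timestamp_line >> "$RUN_DIR/summary.md"` then either records a real value or leaves the field out, which matches the instruction never to fabricate one.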

Failure Handling

At any phase, if an unrecoverable error occurs:

  1. Write $RUN_DIR/summary.md (creating $RUN_DIR first if it doesn't exist) with what was completed and what failed. Do not stage or commit anything under .workflow/.
  2. If any code was written, commit it with message wip: incomplete workflow run for <issue-id>. Stage code only — exclude .workflow/ and TODO/.
  3. Leave the branch and worktree intact for the user to inspect — do not push, do not delete.
  4. Dispatch @pm against ./TODO/ to add a comment on the issue file (./TODO/<ID>.md) summarising what failed.
  5. Stop execution.

Never hang on interactive prompts. If any command appears to require input, treat it as a failure and follow the above procedure.
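One possible guard against hanging prompts is to run commands with stdin closed off and git's terminal prompting disabled, so that a command wanting input fails quickly instead of waiting. This is a sketch, not something the workflow mandates; the wrapper name `run_noninteractive` is hypothetical, while `GIT_TERMINAL_PROMPT` and `GIT_EDITOR` are standard git environment variables.

```sh
# Hypothetical wrapper: run a command so that it cannot wait for input.
# - stdin comes from /dev/null, so reads hit end-of-file immediately
# - GIT_TERMINAL_PROMPT=0 stops git from prompting for credentials
# - GIT_EDITOR=true makes any editor invocation return at once
run_noninteractive() {
  env GIT_TERMINAL_PROMPT=0 GIT_EDITOR=true "$@" </dev/null
}
```

A non-zero exit from `run_noninteractive some-command` would then be treated like any other unrecoverable error and routed into the failure procedure above. It does not cover tools that open the terminal device directly, so it reduces the risk rather than eliminating it.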