nixcfg/config/opencode/commands/workflow.md at f0cc3003580f80b2d517f5c90d6212da3c983740

Harald Hoyer f0cc300358 fix(opencode): make Phase 6 file gate see untracked files

`git diff --name-only` only shows tracked files with unstaged
modifications. It does not show untracked files — which is precisely
the state of any new test file @test creates, since @test's sandbox
denies `git add`. The pre/post snapshots therefore both missed new
files entirely and `comm -23 post pre` returned nothing, letting the
gate cheerfully conclude nothing changed even when @test had just
created tests/foo.rs (or, worse, src/lib.rs).

Switch both snapshots to `git status --porcelain | sed 's/^...//'
| sort -u`, which captures modified, staged, and untracked files in
a single pass. Inline rationale notes the untracked blind spot so
the orchestrator does not fall back to git diff.

2026-05-08 10:20:16 +02:00


description: Multi-agent workflow for the current worktree: plan, test, implement, commit
agent: build

You are executing the multi-agent workflow inside the worktree this opencode session was started from. Run all phases without waiting for user input. The user has walked away.

Prerequisites (the user handles before launching opencode):

A git worktree is checked out for the issue's feature branch
opencode was launched from the root of that worktree
TODO.md is committed to the repo and present at ./TODO.md

Task reference: $ARGUMENTS

If $ARGUMENTS is empty, stop immediately: "Usage: /workflow <ISSUE-ID> [base-branch] (e.g. /workflow ABC-1). The ID must exist in ./TODO.md. Base branch defaults to main (then master)."

Parse $ARGUMENTS: the first whitespace-separated token is the issue ID, an optional second token overrides the base branch.
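
The token split described above could be sketched in POSIX shell as follows. This is a minimal sketch, not part of the workflow itself: the function name `parse_workflow_args` and the variables `ISSUE_ID` and `BASE_OVERRIDE` are hypothetical names chosen for illustration.

```shell
# Hypothetical helper: split the raw argument string into two tokens.
# Sets ISSUE_ID (first token) and BASE_OVERRIDE (optional second token).
parse_workflow_args() {
    set -f        # disable globbing so a token such as '*' stays literal
    set -- $1     # intentional unquoted expansion: split on whitespace
    set +f
    ISSUE_ID="${1:-}"
    BASE_OVERRIDE="${2:-}"
    # Fail when no issue ID was supplied, so the caller can print the usage text.
    [ -n "$ISSUE_ID" ]
}
```

A caller would run `parse_workflow_args "$ARGUMENTS"` and stop with the usage message when it returns non-zero.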

Phase 1: Sanity Check

Verify CWD is a non-bare git worktree: git rev-parse --is-bare-repository 2>/dev/null must output false. If not, stop: "Workflow must be run from a non-bare worktree (the directory opencode was launched in)."
Verify ./TODO.md exists. If not, stop: "TODO.md not found in the current worktree. Commit a TODO.md to the repo first."
Verify HEAD is not detached: git symbolic-ref --short HEAD must succeed. If it fails, stop: "Cannot run on a detached HEAD. Check out a feature branch first."
Capture the current branch: BRANCH_NAME="$(git symbolic-ref --short HEAD)".
Resolve the base branch (BASE_BRANCH):
- If $ARGUMENTS provided a second token, use it.
- Else if git rev-parse --verify --quiet main succeeds, use main.
- Else if git rev-parse --verify --quiet master succeeds, use master.
- Else stop: "Could not determine base branch (no main or master). Pass it as the second argument: /workflow <ISSUE-ID> <base-branch>."
Verify the current branch is not the base branch: if BRANCH_NAME == BASE_BRANCH, stop: "Cannot run workflow on the base branch ($BASE_BRANCH). Switch to a feature branch first."
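
The base-branch fallback order above (explicit argument, then main, then master) might look like this in POSIX shell. The function name `resolve_base_branch` is a hypothetical helper introduced only for this sketch; the git invocations are the ones named in the list.

```shell
# Hypothetical helper: print the base branch or fail with a message on stderr.
# $1: optional override taken from the second token of $ARGUMENTS.
resolve_base_branch() {
    if [ -n "${1:-}" ]; then
        printf '%s\n' "$1"
    elif git rev-parse --verify --quiet main >/dev/null; then
        printf 'main\n'
    elif git rev-parse --verify --quiet master >/dev/null; then
        printf 'master\n'
    else
        echo "Could not determine base branch (no main or master)." >&2
        return 1
    fi
}
```

The result would then be compared with `BRANCH_NAME`, for example `BASE_BRANCH="$(resolve_base_branch "$BASE_OVERRIDE")" || exit 1`, followed by the check that the two differ.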

Phase 2: Issue Context

Dispatch @pm to read ./TODO.md (live filesystem mode) and fetch the issue matching the parsed ID:

Issue title, description, acceptance criteria
Labels and priority
Existing status

If the issue does not exist or @pm fails, stop with error.

If the issue's status is Backlog or Todo, ask @pm to set it to In Progress (this edit will be staged in Phase 9 alongside other TODO.md updates).

Phase 3: Plan

Analyze the codebase. Create a detailed implementation plan addressing the issue's requirements and acceptance criteria.

The plan should include:

Problem summary (from issue context)
Proposed approach with rationale
Files to modify (with brief description of changes)
New files to create
Risks and open questions
Test Design (conditional — include for non-trivial tasks):
- Key behaviors to verify (what tests should assert)
- Edge cases and error conditions worth testing
- What explicitly should NOT be tested (prevents bloat)
- Testability concerns (heavy external deps, GPU-only paths, etc.)
Include Test Design for: public API changes, bug fixes with behavioral impact, new features with business logic, multi-module changes.
Skip Test Design for: config-only changes, decorator swaps, import reorganization, documentation.
When skipped, @test derives test cases directly from acceptance criteria.

After drafting, apply Dispatch Hygiene (below) to the plan — it is a dispatch artifact and gets sent to @check/@simplify in Phase 4.

Dispatch Hygiene

This applies to every subagent dispatch (Phases 4, 6, 7, 8) and to artifacts that will be dispatched (the plan from Phase 3, the task specs from Phase 5). Apply these checks before sending — fix the artifact, then re-check.

Finalized-Text Rule

The artifact must be finalized — single-author text, no contradictions, no open questions. Forbidden:

"Actually, that's wrong — let me correct…"
"Wait, let me reconsider…"
Two versions of the same code block, one labelled "corrected" or appearing after a revision pass
Open questions or ambiguities the orchestrator hasn't resolved
Mid-text revisions visible to the recipient

If you find yourself revising while writing, stop, redo the artifact from scratch with the corrected understanding, and only then dispatch. Subagents are fresh-context — they cannot reliably resolve which of two contradictory drafts is canonical, and reviewers cannot give a clean verdict on a self-contradicting plan.

No-Implementation-in-Plan-or-Spec Rule

Plans (Phase 3) and task specs (Phase 5) are not the place to write the answer. They describe what to do; @make writes how.

Provide:

Approach with rationale
Files to modify with brief descriptions
Function signatures, type declarations, data shapes (structure, not logic)
Constraints, invariants, integration contracts
Risks and edge cases

Do not provide:

Drop-in code blocks longer than ~5 lines that constitute "the answer"
Full function bodies for the changes being planned
Complete match arms / branch logic / loop bodies for new behavior
Pre-written test bodies (those come from @test, or from @make for Rust unit-only)
Stage-by-stage code transformations spelled out as ready-to-commit diffs

If you've already written the implementation in the plan or spec, the artifact has overstepped. Convert finished code into structural description (signature + intent) and let @make produce the body.

Allowed in plans/specs:

Existing code being replaced, marked as "current state"
Function signatures and type/struct/enum declarations (data, not logic)
Tiny inline constants (pub const FOO: f32 = 30.0;)
Test specifications as one-line behavior descriptions ("input X → expect Y")

Pre-Dispatch Validation (MANDATORY)

Scan the artifact and reject (revise, retry) if any of the following are present:

Check	Why it matters
`bash -c`, `sh -c`, `zsh -c`, `fish -c` (anywhere, including inside `nix develop --command bash -c …`)	`@make`/`@test` sandboxes deny all `*-c` shell invocations and any nested `bash` would bypass the per-command allowlist. Replace with one direct command per line: `nix develop -c cargo check`, etc.
`nix develop --command bash` / `nix develop -c bash` / `nix develop -c sh`	Same — inner shell escapes the sandbox. Wrap each toolchain command directly.
Any `cd <path> && …`	Subagents cannot `cd`. Rewrite to use absolute paths.
Code blocks longer than ~5 lines that draft the answer	Violates No-Implementation-in-Plan-or-Spec. Trim to structure (signature + "current state" only).
Two versions of the same code, "actually let me correct…", or open questions	Violates the Finalized-Text Rule. Redo the artifact.
Test bodies inside `@make` specs when tests are coming from `@test`	Duplicates the TDD handoff.

If any check trips, do not dispatch. Fix and re-validate. Repeated trips on a single task signal a Phase 5 split problem — go back and split.
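
The first three rows of the table are mechanical and could be approximated with grep. The sketch below assumes the artifact has been written to a file; the function name `scan_artifact` and the exact regular expressions are illustrative assumptions, not the sandbox's actual rules. The remaining rows (answer-sized code blocks, contradictory drafts, duplicated test bodies) still need a read-through.

```shell
# Hypothetical pre-dispatch scan. $1: path to a file holding the artifact text.
# Prints offending lines with line numbers; returns 1 if any check tripped.
scan_artifact() {
    rc=0
    # Shell invocations with -c, anywhere on the line.
    grep -nE '(^|[^[:alnum:]_])(bash|sh|zsh|fish)[[:space:]]+-c([[:space:]]|$)' "$1" && rc=1
    # nix develop handing control to an inner shell.
    grep -nE 'nix develop[[:space:]]+(--command|-c)[[:space:]]+(bash|sh)([[:space:]]|$)' "$1" && rc=1
    # cd <path> && ...
    grep -nE '(^|[^[:alnum:]_])cd[[:space:]]+[^[:space:]]+[[:space:]]*&&' "$1" && rc=1
    return "$rc"
}
```

A line such as `nix develop -c cargo check` passes this scan, while `nix develop --command bash -c "cargo test"` and `cd /repo && cargo test` are reported.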

Phase 4: Review Plan

Apply Dispatch Hygiene to the plan and to each reviewer prompt before sending. Dispatch @check and @simplify in parallel to review the plan.

Reviewers should evaluate testability:

@check: Is the design testable? Are the right behaviors identified? (Review Framework §8)
@simplify: Is the test scope appropriate? Over-testing proposed?

Merge rules:

@check safety/correctness findings are hard constraints
If @simplify recommends removing something @check flags as needed, @check wins
Note conflicts explicitly

Review loop (max 3 cycles):

Send plan to both reviewers
Merge findings
If verdict is ACCEPTABLE from both (or JUSTIFIED COMPLEXITY from @simplify): proceed to Phase 5
If BLOCK or NEEDS WORK: revise the plan addressing findings, then re-review
Convergence detection: if reviewers return the same findings as the previous cycle, stop the loop early
If still unresolved after 3 cycles: note unresolved blockers and proceed anyway (they will be documented in the workflow summary and commit message)
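
One way to read the review loop as a decision function is sketched below. The function name `review_decision`, its argument order, and the output strings are hypothetical. Treating a converged cycle the same as an exhausted third cycle (proceed with blockers noted) is an interpretation of "stop the loop early", not something the list states outright.

```shell
# Hypothetical decision helper for one review cycle.
# $1: cycle number (1..3)
# $2: @check verdict, $3: @simplify verdict
# $4: "same" if findings equal the previous cycle's, otherwise "new"
review_decision() {
    cycle="$1" check="$2" simplify="$3" findings="$4"
    if [ "$check" = "ACCEPTABLE" ]; then
        case "$simplify" in
        ACCEPTABLE|"JUSTIFIED COMPLEXITY") echo proceed; return 0 ;;
        esac
    fi
    # Not accepted: either give up (converged or out of cycles) or revise.
    if [ "$findings" = "same" ] || [ "$cycle" -ge 3 ]; then
        echo proceed-with-blockers
    else
        echo revise
    fi
}
```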

Phase 5: Split into Tasks

Break the approved plan into discrete tasks for @make. Each task needs:

Required	Description
Task	Clear description of what to implement
Acceptance Criteria	Specific, testable criteria (checkbox format)
Code Context	Actual code snippets from the codebase, not just file paths
Files to Modify	Explicit list, mark new files with "(create)"
Test File	Path for test file. Pick the pattern that matches the project's language — see "Test File Path by Language" below.

Test File Path by Language

The test file path must follow the language's actual test layout. Do not invent paths that look colocated but aren't valid for the language (e.g. src/tests/test_<feature>.rs is not a Rust test location — it's a regular src/ submodule).

Python
- Colocated: <module>/tests/test_<feature>.py (create)
- Top-level: tests/test_<feature>.py (create)
Rust
- Crate-level integration tests: tests/<feature>.rs (create) (or, in a workspace, <crate>/tests/<feature>.rs).
  - If the test references not-yet-existing public API, the task automatically requires a stub-first @make pre-pass before @test runs (see Phase 6 → "Rust integration TDD: stub-first"). Plan for two @make dispatches per such task: stub pass, then body pass.
- Unit-test-only tasks (in-source #[cfg(test)] mod tests): mark the task as NOT_TESTABLE with reason Rust unit-only — @test cannot write inside production source. @make writes those inline as part of its production change.
Polyglot Nix flake
- Match the host language of the code under change (Python or Rust rules above), wrapping commands in nix develop -c … per the agents' devshell rule.

Include Integration Contracts when a task adds/changes function signatures, APIs, config keys, or has dependencies on other tasks.

Include Test Design from Phase 3 when available, attached to the relevant task(s).

Task size: ~10-30 minutes each, single coherent change, clear boundaries.

Split Heuristic — when in doubt, split

A task must be split if any of the following apply:

It touches more than two distinct concerns (e.g. constants + new component + sprite spawn + new system + main wiring is five concerns — at least three tasks).
It changes more than ~50 lines across more than 2 files.
It mixes data/structural changes (constants, types, components) with runtime/system changes (new ECS systems, scheduling, render loops).
It mixes pure-logic changes (math helpers) with stateful changes (queries, world mutation).
It mixes new APIs with their first call sites in the same task.

When a task fails the heuristic, split into:

Foundations — new constants, types, components (no behavior change yet).
Implementation — the actual production logic, calling the foundations.
Wiring — registration in main.rs / lib.rs / app-builder.

Each split is dispatched separately to @make and verified before the next.

Code Context — what to include

The Code Context field exists so @make can find the seam to modify. Provide:

The existing code being replaced (verbatim, marked as "current state"), with ~5–10 lines of surrounding context
Function signatures of helpers @make will need to call
The file's relevant import block

For everything you must not include — drop-in replacements, full function bodies, pre-written test bodies, "here is what to write" — see Dispatch Hygiene → No-Implementation-in-Plan-or-Spec Rule above.

If the task is so well-specified that you've already written the implementation, the task is too small for @make (apply it directly) or you've over-determined the design (revisit Phase 3).

Apply Dispatch Hygiene to each task spec before dispatch in Phase 7.

Phase 6: Write Tests

Apply Dispatch Hygiene to each @test prompt before sending.

For each task from Phase 5, dispatch @test with:

The task spec (acceptance criteria, code context, files to modify)
The Test Design section from the plan (if provided)
The test file path to create (taken from the task's Test File field, following "Test File Path by Language" in Phase 5)

@test writes failing tests and verifies RED with structured failure codes.

Post-step file gate (MANDATORY): Before dispatching @test, snapshot every modified, staged, and untracked file. git diff --name-only alone misses untracked files, which is precisely the state of any new test file @test creates (it cannot git add). Use git status --porcelain so the gate sees them, and pass --untracked-files=all so that a wholly untracked directory is listed file by file rather than collapsed into a single dir/ entry (a collapsed entry would hide files added to that directory later and would not match the patterns below):

git status --porcelain --untracked-files=all | sed 's/^...//' | sort -u > /tmp/pre_test_baseline.txt

After @test completes, list NEW changes (in the post-snapshot but not the pre-snapshot):

git status --porcelain --untracked-files=all | sed 's/^...//' | sort -u | comm -23 - /tmp/pre_test_baseline.txt > /tmp/test_new_files.txt

Each line in /tmp/test_new_files.txt is a file path that did not exist (or was unmodified) before @test ran. The gate validates each one against the patterns below. All new files must match the project's test patterns:

Python: **/test_*.py, **/*_test.py, **/conftest.py (new only), **/test_data/**, **/test_fixtures/**
Rust: tests/**/*.rs, **/tests/**/*.rs (workspace-style <crate>/tests/...), **/test_data/**, **/test_fixtures/**

Anti-patterns — discard the output even if the glob matches:

Anything under src/ for Rust (e.g. src/tests/foo.rs, src/**/tests/...). src/tests/ is a regular module path under src/, not a Rust test location, and @test cannot wire it up via mod declarations in production source. Such paths indicate the task spec gave a wrong test path — escalate, don't accept the file.

If any non-matching file appears, or any anti-pattern matches: discard @test output, report violation.
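
The gate's pattern check could be sketched with shell `case` patterns, where `*` also matches `/`. The function names `gate_path` and `gate_file_list` are hypothetical. Rejecting `<crate>/src/...` in addition to `src/...` is an assumption extending the anti-pattern to workspaces, and the "new only" condition on conftest.py is not enforced here because it cannot be decided from the path alone.

```shell
# Hypothetical gate. $1: language ("rust" or "python"), $2: repo-relative path.
# Returns 0 when the path is an allowed test location, 1 otherwise.
gate_path() {
    case "$1" in
    rust)
        case "$2" in
        src/*|*/src/*) return 1 ;;              # anti-pattern: regular module path
        tests/*.rs|*/tests/*.rs) return 0 ;;
        esac ;;
    python)
        case "$2" in
        test_*.py|*/test_*.py|*_test.py|conftest.py|*/conftest.py) return 0 ;;
        esac ;;
    esac
    case "$2" in
    test_data/*|*/test_data/*|test_fixtures/*|*/test_fixtures/*) return 0 ;;
    esac
    return 1
}

# Check every line of a file list; print violations and return 1 if any exist.
gate_file_list() {
    rc=0
    while IFS= read -r path; do
        gate_path "$1" "$path" || { echo "violation: $path"; rc=1; }
    done < "$2"
    return "$rc"
}
```

With the snapshot files described above, the orchestrator would run something like `gate_file_list rust /tmp/test_new_files.txt` and discard @test's output on a non-zero exit.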

Decision table — handling @test results:

Condition	Action
`TESTS_READY` + `escalate_to_check: false`	Proceed to Phase 7
`TESTS_READY` + `escalate_to_check: true`	Route tests to `@check` for light review. `@check` diagnoses, caller routes fixes to `@test`. Then proceed.
`NOT_TESTABLE` (general reasons)	Route to `@check` for sign-off on justification. If approved, task goes to `@make` without tests.
`NOT_TESTABLE` reason `Rust unit-only`	See "Rust unit-only routing" below. Do not include test code in the `@make` spec; pass test specs only.
`BLOCKED`	Investigate. May need to revise task spec or plan.
Test passes immediately	Investigate — behavior may already exist. Task spec may be wrong.
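
The status rows of the table can be read as a small dispatch function. This is an illustrative sketch only: the function name `route_test_result` and the returned labels are hypothetical, and the "Test passes immediately" row is left out because it is an observation about the test run rather than a status code.

```shell
# Hypothetical router for @test results.
# $1: status (TESTS_READY, NOT_TESTABLE, BLOCKED)
# $2: escalate_to_check value for TESTS_READY, or the reason for NOT_TESTABLE
route_test_result() {
    case "$1" in
    TESTS_READY)
        if [ "${2:-false}" = "true" ]; then
            echo "check-light-review"
        else
            echo "phase-7"
        fi ;;
    NOT_TESTABLE)
        if [ "${2:-}" = "Rust unit-only" ]; then
            echo "rust-unit-only-routing"
        else
            echo "check-sign-off"
        fi ;;
    BLOCKED)
        echo "investigate" ;;
    *)
        echo "unknown status: $1" >&2
        return 1 ;;
    esac
}
```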

Rust unit-only routing

When @test returns NOT_TESTABLE: Rust unit-only (the implementation needs in-source #[cfg(test)] mod tests blocks that @test is forbidden from writing), the orchestrator must:

Get @check's sign-off on the justification (no integration-test seam exists).
Build the @make spec with test specifications, not test code:
- "Add #[cfg(test)] mod foo_tests at the bottom of <file> exercising:"
- For each behavior, a one-line description: input → expected output, edge case to cover, error path to assert.
- Where applicable, name the function under test and the assertion type (assert_eq!, assert!, panic on invalid input).
Forbidden in the @make spec: complete #[test] fn … bodies, full module blocks, or any @test-style RED-verified test code. @make writes the inline tests itself based on the spec.
After @make completes, the orchestrator runs the test suite once to confirm RED→GREEN evidence and includes it in the workflow summary.

This keeps the agents in their lanes: @test never writes inside src/, @make writes both the tests and the production code in a single coherent change, and the orchestrator sees explicit test pass evidence.

Rust integration TDD: stub-first (mandatory)

Rust integration tests live under tests/ (tests/<feature>.rs). Cargo compiles each file there as a separate crate that links against the library and can use only its public API. A test that references public API that does not exist yet therefore can only RED at build time, which masks assertion diagnostics. To avoid this, for every Rust task whose @test step writes an integration test against public API that does not yet exist, dispatch a stub-first @make pass before @test runs:

Stub pass (split from Phase 7's body pass):

Dispatch @make in standard mode (no tests exist yet) with this exact scope:
- Goal: add the planned public API as todo!()-bodied stubs so the integration test will compile.
- Files to modify: src/lib.rs (add pub mod …; declarations) plus any new src/<module>.rs files containing the stub functions/structs.
- Stubs only: every function body is exactly todo!(). Every method body is exactly todo!(). Public structs may use pub struct Foo; or pub struct Foo { /* fields TBD */ } — but no logic.
- Signatures must match the planned final API exactly (return types, lifetimes, generics) — otherwise the integration test will mismatch later. Lift signatures from the Phase 3 plan / Phase 5 task spec.
- Acceptance criteria: cargo check (wrapped in nix develop -c … if the project has a devshell) passes; no test command is run.
- Dispatch Hygiene still applies: the stub pass is small and finalized — no draft bodies, no contradictory signatures.
Verify cargo check passed in @make's output. If not, fix and re-dispatch the stub pass before continuing.
Dispatch @test as normal. The integration test now compiles; running it panics on todo!() at runtime, which is a clean MISSING_BEHAVIOR RED with a stack trace — far better than the build-error-RED form.
Continue to Phase 7's body pass (@make in TDD mode), where the same files are revisited and the todo!() bodies are replaced.

This routing is mandatory for new public API in Rust. It is not required when the integration test exercises an existing public API (e.g. a behavior fix where the function already exists) — in that case @test runs directly and @make modifies the body in Phase 7.

The stub pass and the body pass each produce their own atomic commit (per Phase 9 rules): feat(<scope>): scaffold <thing> with todo!() stubs followed by feat(<scope>): implement <thing> (or whichever conventional type fits).

Parallelism: Independent tasks can have tests written in parallel. Constraint: @test must not modify existing conftest.py files (prevents collision during parallel execution).

Phase 7: Implement

Apply Dispatch Hygiene to each @make spec before sending. Repeated trips on a single task signal a Phase 5 split problem — go back and split.

Execute each task by dispatching @make with:

The task spec (from Phase 5, finalized per Dispatch Hygiene)
Relevant code context (seam-revealing snippets only — see Phase 5 "Code Context — what to include")
Pre-written failing tests and handoff from @test (if TESTS_READY)

@make runs in TDD mode when tests are provided:

Entry validation: run tests, verify RED, check failure codes match handoff
Implement minimal code to make tests pass (GREEN)
Regression check on broader area
Refactor while keeping green
Report RED→GREEN evidence

Escalation: If @make flags test quality concerns during entry validation:

@make reports the issue to caller
Caller routes to @check for diagnosis
@check reports findings
Caller routes to @test for fixes
Fixed tests return to @make

For NOT_TESTABLE tasks, @make runs in standard mode.

After all tasks complete, verify overall integration:

Run the project's test suite if available
Run linting/type checking if configured
Fix any integration issues between tasks

Phase 8: Final Review

Apply Dispatch Hygiene to each reviewer prompt before sending. Dispatch @check and @simplify in parallel to review the full implementation (all changes across all files).

Provide reviewers with:

The original plan
The full diff (git diff "$BASE_BRANCH"...HEAD)
Any decisions or deviations from the plan