Rust integration tests live in a separate test crate that imports from lib.rs, so any test referencing not-yet-existing public API can only RED at build time. The build error masks assertion diagnostics and makes the RED state opaque — no stack trace, no left/right values.

For Rust tasks whose @test step writes an integration test against public API that does not yet exist, the orchestrator now dispatches a stub-first @make pass before @test runs:

1. @make adds the planned public API as todo!()-bodied stubs in lib.rs and any new src/<module>.rs. Signatures lifted verbatim from the Phase 5 task spec. Acceptance criterion is cargo check only — no test command runs.
2. @test writes the integration test, which now compiles and panics at todo!() with a stack trace — a clean MISSING_BEHAVIOR RED.
3. Phase 7 dispatches @make again to replace the todo!() bodies with real implementations.

Two atomic commits per task: scaffold then implement.

Phase 5's Rust test-path guidance now flags the two-dispatch requirement up front. test.md's Rust failure-classification hints recognize todo!() / unimplemented!() panics as MISSING_BEHAVIOR with a pointer to the workflow's stub-first section.

---
description: "Multi-agent workflow for the current worktree: plan, test, implement, commit"
agent: build
---

You are executing the multi-agent workflow inside the worktree this opencode session was started from. Run all phases without waiting for user input. The user has walked away.

**Prerequisites (the user handles before launching opencode):**
- A git worktree is checked out for the issue's feature branch
- `opencode` was launched from the root of that worktree
- `TODO.md` is committed to the repo and present at `./TODO.md`

**Task reference:** $ARGUMENTS

If `$ARGUMENTS` is empty, stop immediately: "Usage: `/workflow <ISSUE-ID> [base-branch]` (e.g. `/workflow ABC-1`). The ID must exist in `./TODO.md`. Base branch defaults to `main` (then `master`)."

Parse `$ARGUMENTS`: the first whitespace-separated token is the issue ID; an optional second token overrides the base branch.

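The parsing rule can be sketched in shell. This is a minimal illustration, not the orchestrator's actual code: the function name `parse_workflow_args` and the variables `ISSUE_ID` and `BASE_OVERRIDE` are hypothetical, and the usage message is abbreviated.

```bash
# Hypothetical helper: split the raw argument string into an issue ID and an
# optional base-branch override. All names here are illustrative.
parse_workflow_args() {
  ISSUE_ID=""
  BASE_OVERRIDE=""
  set -f          # keep glob characters in the input literal
  set -- ${1:-}   # unquoted on purpose: split on whitespace
  set +f
  if [ "$#" -eq 0 ]; then
    echo 'Usage: /workflow <ISSUE-ID> [base-branch]' >&2
    return 1
  fi
  ISSUE_ID="$1"
  BASE_OVERRIDE="${2:-}"   # empty when no second token was given
}

# Usage: parse_workflow_args "ABC-1 develop" sets ISSUE_ID and BASE_OVERRIDE.
```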
---

## Phase 1: Sanity Check

1. Verify CWD is a non-bare git worktree: `git rev-parse --is-bare-repository 2>/dev/null` must output `false`. If not, stop: "Workflow must be run from a non-bare worktree (the directory opencode was launched in)."
2. Verify `./TODO.md` exists. If not, stop: "TODO.md not found in the current worktree. Commit a TODO.md to the repo first."
3. Verify HEAD is not detached: `git symbolic-ref --short HEAD` must succeed. If it fails, stop: "Cannot run on a detached HEAD. Check out a feature branch first."
4. Capture the current branch: `BRANCH_NAME="$(git symbolic-ref --short HEAD)"`.
5. Resolve the base branch (`BASE_BRANCH`):
   - If `$ARGUMENTS` provided a second token, use it.
   - Else if `git rev-parse --verify --quiet main` succeeds, use `main`.
   - Else if `git rev-parse --verify --quiet master` succeeds, use `master`.
   - Else stop: "Could not determine base branch (no `main` or `master`). Pass it as the second argument: `/workflow <ISSUE-ID> <base-branch>`."
6. Verify the current branch is not the base branch: if `BRANCH_NAME == BASE_BRANCH`, stop: "Cannot run workflow on the base branch (`$BASE_BRANCH`). Switch to a feature branch first."

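Steps 1 to 6 can be sketched as two shell functions. This is an illustration under assumptions: the function names `resolve_base_branch` and `sanity_check` are hypothetical, the stop messages are shortened, and "stop" is modelled as a non-zero return code.

```bash
# Step 5: print the base branch, or fail if none can be determined.
# $1 is the optional override taken from the second argument token.
resolve_base_branch() {
  if [ -n "${1:-}" ]; then
    echo "$1"
  elif git rev-parse --verify --quiet main >/dev/null 2>&1; then
    echo main
  elif git rev-parse --verify --quiet master >/dev/null 2>&1; then
    echo master
  else
    echo "Could not determine base branch (no main or master)." >&2
    return 1
  fi
}

# Steps 1-6 in order; returns non-zero at the first failed check.
# On success BRANCH_NAME and BASE_BRANCH are set.
sanity_check() {
  [ "$(git rev-parse --is-bare-repository 2>/dev/null)" = "false" ] || {
    echo "Workflow must be run from a non-bare worktree." >&2; return 1; }
  [ -f ./TODO.md ] || {
    echo "TODO.md not found in the current worktree." >&2; return 1; }
  BRANCH_NAME="$(git symbolic-ref --short HEAD 2>/dev/null)" || {
    echo "Cannot run on a detached HEAD." >&2; return 1; }
  BASE_BRANCH="$(resolve_base_branch "${1:-}")" || return 1
  [ "$BRANCH_NAME" != "$BASE_BRANCH" ] || {
    echo "Cannot run workflow on the base branch ($BASE_BRANCH)." >&2; return 1; }
}

# Usage: sanity_check "<optional base branch>"
```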
---

## Phase 2: Issue Context

Dispatch `@pm` to read `./TODO.md` (live filesystem mode) and fetch the issue matching the parsed ID:
- Issue title, description, acceptance criteria
- Labels and priority
- Existing status

If the issue does not exist or `@pm` fails, stop with an error.

If the issue's status is `Backlog` or `Todo`, ask `@pm` to set it to `In Progress` (this edit will be staged in Phase 9 alongside other TODO.md updates).

---

## Phase 3: Plan

Analyze the codebase. Create a detailed implementation plan addressing the issue's requirements and acceptance criteria.

The plan should include:
- Problem summary (from issue context)
- Proposed approach with rationale
- Files to modify (with brief description of changes)
- New files to create
- Risks and open questions
- **Test Design (conditional — include for non-trivial tasks):**
  - Key behaviors to verify (what tests should assert)
  - Edge cases and error conditions worth testing
  - What explicitly should NOT be tested (prevents bloat)
  - Testability concerns (heavy external deps, GPU-only paths, etc.)

**Include Test Design for:** Public API changes, bug fixes with behavioral impact, new features with business logic, multi-module changes.
**Skip Test Design for:** Config-only changes, decorator swaps, import reorganization, documentation.
When skipped, `@test` derives test cases directly from acceptance criteria.

---

## Phase 4: Review Plan

Dispatch `@check` and `@simplify` in parallel to review the plan.

Reviewers should evaluate testability:
- `@check`: Is the design testable? Are the right behaviors identified? (Review Framework §8)
- `@simplify`: Is the test scope appropriate? Over-testing proposed?

**Merge rules:**
- `@check` safety/correctness findings are hard constraints
- If `@simplify` recommends removing something `@check` flags as needed, `@check` wins
- Note conflicts explicitly

**Review loop (max 3 cycles):**
1. Send plan to both reviewers
2. Merge findings
3. If verdict is ACCEPTABLE from both (or JUSTIFIED COMPLEXITY from `@simplify`): proceed to Phase 5
4. If BLOCK or NEEDS WORK: revise the plan addressing findings, then re-review
5. **Convergence detection:** if reviewers return the same findings as the previous cycle, stop the loop early
6. If still unresolved after 3 cycles: note unresolved blockers and proceed anyway (they will be documented in the workflow summary and commit message)

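The control flow of the review loop, including the early stop on repeated findings, can be sketched as follows. Everything here is an assumption made for illustration: `run_review` is a hypothetical hook standing in for the `@check` / `@simplify` dispatch and merge, and the return codes are invented so a caller can record which exit was taken before proceeding.

```bash
# Hypothetical hook contract: run_review <cycle> prints the merged findings
# and returns 0 when both reviewers accept the plan.
review_loop() {
  max_cycles=3
  previous=""
  cycle=1
  while [ "$cycle" -le "$max_cycles" ]; do
    if findings="$(run_review "$cycle")"; then
      echo "accepted after $cycle cycle(s)"
      return 0
    fi
    # Convergence detection: identical findings twice in a row.
    if [ "$cycle" -gt 1 ] && [ "$findings" = "$previous" ]; then
      echo "converged with unresolved findings at cycle $cycle"
      return 2
    fi
    previous="$findings"
    # ...revise the plan here, addressing $findings...
    cycle=$((cycle + 1))
  done
  echo "unresolved after $max_cycles cycles"
  return 3
}
```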
---

## Phase 5: Split into Tasks

Break the approved plan into discrete tasks for `@make`. Each task needs:

| Required | Description |
|----------|-------------|
| **Task** | Clear description of what to implement |
| **Acceptance Criteria** | Specific, testable criteria (checkbox format) |
| **Code Context** | Actual code snippets from the codebase, not just file paths |
| **Files to Modify** | Explicit list, mark new files with "(create)" |
| **Test File** | Path for test file. **Pick the pattern that matches the project's language** — see "Test File Path by Language" below. |

### Test File Path by Language

The test file path must follow the language's actual test layout. **Do not invent paths that look colocated but aren't valid for the language** (e.g. `src/tests/test_<feature>.rs` is *not* a Rust test location — it's a regular `src/` submodule).

- **Python**
  - Colocated: `<module>/tests/test_<feature>.py (create)`
  - Top-level: `tests/test_<feature>.py (create)`
- **Rust**
  - Crate-level integration tests: `tests/<feature>.rs (create)` (or, in a workspace, `<crate>/tests/<feature>.rs`).
    - **If the test references not-yet-existing public API**, the task automatically requires a **stub-first `@make` pre-pass** before `@test` runs (see Phase 6 → "Rust integration TDD: stub-first"). Plan for two `@make` dispatches per such task: stub pass, then body pass.
  - **Unit-test-only tasks (in-source `#[cfg(test)] mod tests`):** mark the task as `NOT_TESTABLE` with reason `Rust unit-only` — `@test` cannot write inside production source. `@make` writes those inline as part of its production change.
- **Polyglot Nix flake**
  - Match the host language of the code under change (Python or Rust rules above), wrapping commands in `nix develop -c …` per the agents' devshell rule.

Include **Integration Contracts** when a task adds/changes function signatures, APIs, config keys, or has dependencies on other tasks.

Include **Test Design** from Phase 3 when available, attached to the relevant task(s).

**Task size:** ~10-30 minutes each, single coherent change, clear boundaries.

### Split Heuristic — when in doubt, split

A task must be **split** if any of the following apply:

- It touches more than two distinct concerns (e.g. *constants + new component + sprite spawn + new system + main wiring* is **five** concerns — at least three tasks).
- It changes more than ~50 lines across more than 2 files.
- It mixes data/structural changes (constants, types, components) with runtime/system changes (new ECS systems, scheduling, render loops).
- It mixes pure-logic changes (math helpers) with stateful changes (queries, world mutation).
- It mixes new APIs with their first call sites in the same task.

When a task fails the heuristic, split into:
1. **Foundations** — new constants, types, components (no behavior change yet).
2. **Implementation** — the actual production logic, calling the foundations.
3. **Wiring** — registration in `main.rs` / `lib.rs` / app-builder.

Each split is dispatched separately to `@make` and verified before the next.

### Code Context Anti-patterns

The **Code Context** field exists so `@make` can find the seam to modify, not so it can read off a finished answer. Strictly follow:

- **Provide:** the existing code being replaced (verbatim), the surrounding ~5–10 lines of context, function signatures of helpers `@make` will need to call, the file's relevant import block.
- **Do NOT provide:** a complete drop-in replacement, the new function bodies, the test bodies (those come from `@test` or — for unit-only Rust — from `@make` itself per Phase 6), or any "here is what to write" code block longer than ~5 lines.

If the task is so well-specified that you've already written the implementation, the task is too small for `@make` (apply it directly) or you've over-determined the design (revisit Phase 3).

### Finalized-Text Rule

Each task spec must be **finalized** before dispatch — single-author text with no contradictions. **Forbidden in dispatch prompts:**

- "Actually, that's wrong — let me correct…"
- "Wait, let me revise…"
- Two versions of the same code block with one labelled "corrected"
- Open questions or ambiguities the orchestrator hasn't resolved

If you find yourself revising while writing the spec, stop, redo the spec from scratch with the corrected understanding, and only then dispatch. `@make` is a fresh-context implementer; it cannot reliably resolve which of two contradictory drafts is canonical.

---

## Phase 6: Write Tests

For each task from Phase 5, dispatch `@test` with:
- The task spec (acceptance criteria, code context, files to modify)
- The Test Design section from the plan (if provided)
- The test file path to create (following the language-specific pattern from "Test File Path by Language")

`@test` writes failing tests and verifies RED with structured failure codes.

**Post-step file gate (MANDATORY):**
Before dispatching `@test`, snapshot the current changed files. Untracked files are listed as well, because `git diff --name-only` alone does not report newly created files:
```bash
# Tracked changes plus untracked files, sorted so that comm can compare the lists
{ git diff --name-only; git ls-files --others --exclude-standard; } | LC_ALL=C sort -u > /tmp/pre_test_baseline.txt
```
After `@test` completes, validate only NEW changes:
```bash
{ git diff --name-only; git ls-files --others --exclude-standard; } | LC_ALL=C sort -u | LC_ALL=C comm -23 - /tmp/pre_test_baseline.txt > /tmp/test_new_files.txt
```
All new files must match the project's test patterns:
- Python: `**/test_*.py`, `**/*_test.py`, `**/conftest.py` (new only), `**/test_data/**`, `**/test_fixtures/**`
- Rust: `tests/**/*.rs`, `**/tests/**/*.rs` (workspace-style `<crate>/tests/...`), `**/test_data/**`, `**/test_fixtures/**`

**Anti-patterns — discard the output even if the glob matches:**
- Anything under `src/` for Rust (e.g. `src/tests/foo.rs`, `src/**/tests/...`). `src/tests/` is a regular module path under `src/`, not a Rust test location, and `@test` cannot wire it up via `mod` declarations in production source. Such paths indicate the task spec gave a wrong test path — escalate, don't accept the file.

If any non-matching file appears, or any anti-pattern matches: discard `@test` output, report violation.

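The gate itself can be sketched as a path check over `/tmp/test_new_files.txt`. This is a rough illustration: the function names are hypothetical, the globs are approximated with shell `case` patterns, and treating `<crate>/src/...` in a workspace as an anti-pattern is an assumption extrapolated from the `src/` rule above. The "new only" condition on `conftest.py` is not checked here.

```bash
# Return 0 if the path is an acceptable @test output, 1 otherwise.
# In a case pattern, * also matches "/", so "*/tests/*.rs" covers nested dirs.
is_allowed_test_path() {
  gate_path="$1"
  gate_lang="$2"   # "python" or "rust"
  if [ "$gate_lang" = "rust" ]; then
    # Anti-pattern: nothing under src/ is a Rust test location.
    case "$gate_path" in
      src/*|*/src/*) return 1 ;;
    esac
  fi
  case "$gate_path" in
    test_data/*|*/test_data/*|test_fixtures/*|*/test_fixtures/*) return 0 ;;
  esac
  case "$gate_lang" in
    python)
      case "${gate_path##*/}" in
        test_*.py|*_test.py|conftest.py) return 0 ;;
      esac ;;
    rust)
      case "$gate_path" in
        tests/*.rs|*/tests/*.rs) return 0 ;;
      esac ;;
  esac
  return 1
}

# Check every path in a list file; print violations, return 1 if any exist.
check_new_files() {
  gate_status=0
  while IFS= read -r new_file; do
    [ -n "$new_file" ] || continue
    if ! is_allowed_test_path "$new_file" "$1"; then
      echo "violation: $new_file" >&2
      gate_status=1
    fi
  done < "$2"
  return "$gate_status"
}

# Usage: check_new_files rust /tmp/test_new_files.txt
```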
**Decision table — handling `@test` results:**

| Condition | Action |
|-----------|--------|
| `TESTS_READY` + `escalate_to_check: false` | Proceed to Phase 7 |
| `TESTS_READY` + `escalate_to_check: true` | Route tests to `@check` for light review. `@check` diagnoses, caller routes fixes to `@test`. Then proceed. |
| `NOT_TESTABLE` (general reasons) | Route to `@check` for sign-off on justification. If approved, task goes to `@make` without tests. |
| `NOT_TESTABLE` reason `Rust unit-only` | See "Rust unit-only routing" below. **Do not** include test code in the `@make` spec; pass test specs only. |
| `BLOCKED` | Investigate. May need to revise task spec or plan. |
| Test passes immediately | Investigate — behavior may already exist. Task spec may be wrong. |

### Rust unit-only routing

When `@test` returns `NOT_TESTABLE: Rust unit-only` (the implementation needs in-source `#[cfg(test)] mod tests` blocks that `@test` is forbidden from writing), the orchestrator must:

1. Get `@check`'s sign-off on the justification (no integration-test seam exists).
2. Build the `@make` spec with **test specifications**, not test code:
   - "Add `#[cfg(test)] mod foo_tests` at the bottom of `<file>` exercising:"
   - For each behavior, a one-line description: input → expected output, edge case to cover, error path to assert.
   - Where applicable, name the function under test and the assertion type (`assert_eq!`, `assert!`, panic on invalid input).
3. **Forbidden in the `@make` spec:** complete `#[test] fn …` bodies, full module blocks, or any `@test`-style RED-verified test code. `@make` writes the inline tests itself based on the spec.
4. After `@make` completes, the orchestrator runs the test suite once to confirm RED→GREEN evidence and includes it in the workflow summary.

This keeps the agents in their lanes: `@test` never writes inside `src/`, `@make` writes both the tests and the production code in a single coherent change, and the orchestrator sees explicit test pass evidence.

### Rust integration TDD: stub-first (mandatory)

Rust integration tests live in a separate test crate (`tests/<feature>.rs`) that imports from `lib.rs`. Any test referencing not-yet-existing public API can only RED at *build* time, which masks assertion diagnostics. To avoid this, **for every Rust task whose `@test` step writes an integration test against public API that does not yet exist**, dispatch a stub-first `@make` pass *before* `@test` runs:

**Stub pass (split from Phase 7's body pass):**

1. Dispatch `@make` in **standard mode** (no tests exist yet) with this exact scope:
   - **Goal:** add the planned public API as `todo!()`-bodied stubs so the integration test will compile.
   - **Files to modify:** `src/lib.rs` (add `pub mod …;` declarations) plus any new `src/<module>.rs` files containing the stub functions/structs.
   - **Stubs only:** every function body is exactly `todo!()`. Every method body is exactly `todo!()`. Public structs may use `pub struct Foo;` or `pub struct Foo { /* fields TBD */ }` — but no logic.
   - **Signatures must match the planned final API exactly** (return types, lifetimes, generics) — otherwise the integration test will mismatch later. Lift signatures from the Phase 3 plan / Phase 5 task spec.
   - **Acceptance criteria:** `cargo check` (wrapped in `nix develop -c …` if the project has a devshell) passes; no test command is run.
   - **Code Context Anti-patterns still apply:** the stub pass is small and finalized — no draft bodies, no contradictory signatures.
2. Verify `cargo check` passed in `@make`'s output. If not, fix and re-dispatch the stub pass before continuing.
3. Dispatch `@test` as normal. The integration test now compiles; running it panics on `todo!()` at runtime, which is a clean `MISSING_BEHAVIOR` RED with a stack trace — far better than the build-error-RED form.
4. Continue to Phase 7's body pass (`@make` in TDD mode), where the same files are revisited and the `todo!()` bodies are replaced.

**This routing is mandatory** for new public API in Rust. It is **not** required when the integration test exercises an existing public API (e.g. a behavior fix where the function already exists) — in that case `@test` runs directly and `@make` modifies the body in Phase 7.

The stub pass and the body pass each produce their own atomic commit (per Phase 9 rules): `feat(<scope>): scaffold <thing> with todo!() stubs` followed by `feat(<scope>): implement <thing>` (or whichever conventional type fits).

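The difference between the two RED forms can be told apart from captured `cargo test` output. The sketch below relies on the panic messages Rust emits for these macros (`todo!()` panics with "not yet implemented", `unimplemented!()` with "not implemented") and on cargo's "could not compile" error line. The function name is hypothetical, and `BUILD_ERROR` and `OTHER` are placeholder labels invented here; only `MISSING_BEHAVIOR` comes from the workflow.

```bash
# Classify captured `cargo test` output read from stdin.
classify_rust_red() {
  red_output="$(cat)"
  case "$red_output" in
    # The test crate never built: the opaque RED form the stub pass avoids.
    *"could not compile"*) echo BUILD_ERROR ;;
    # Panic from a todo!() or unimplemented!() stub body.
    *"not yet implemented"*|*"not implemented"*) echo MISSING_BEHAVIOR ;;
    *) echo OTHER ;;
  esac
}

# Usage: nix develop -c cargo test 2>&1 | classify_rust_red
```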
**Parallelism:** Independent tasks can have tests written in parallel.
**Constraint:** `@test` must not modify existing `conftest.py` files (prevents collision during parallel execution).

---

## Phase 7: Implement

Execute each task by dispatching `@make` with:
- The task spec (from Phase 5, finalized — see Finalized-Text Rule)
- Relevant code context (seam-revealing snippets only — see Code Context Anti-patterns)
- **Pre-written failing tests and handoff from `@test` (if TESTS_READY)**

### Pre-Dispatch Validation (MANDATORY)

Before sending the spec to `@make`, scan it and reject (revise, then retry) if any of the following are present:

| Check | Why it matters |
|---|---|
| `bash -c`, `sh -c`, `zsh -c`, `fish -c` (anywhere, including inside `nix develop --command bash -c …`) | `@make`'s sandbox denies all `*-c` shell invocations and any nested `bash` would bypass the per-command allowlist. Replace with one direct command per line: `nix develop -c cargo check`, `nix develop -c cargo test`, etc. |
| `nix develop --command bash` / `nix develop -c bash` / `nix develop -c sh` | Same — the inner shell escapes the sandbox. Wrap each toolchain command directly. |
| Any `cd <path> && …` | `@make` cannot `cd`. Rewrite to use absolute paths or `git -C <path>` for git operations (and `@make` doesn't run git anyway). |
| Code blocks longer than ~5 lines under "Code Context" or labelled as the answer | Violates Code Context Anti-patterns. Trim to the seam. |
| Two versions of the same code, "actually let me correct…", or open questions | Violates the Finalized-Text Rule. Redo the spec. |
| Test bodies inside the `@make` spec when tests are coming from `@test` | The TDD handoff already provides them; duplicating creates conflict. |

If any check trips, **do not dispatch.** Fix the spec and re-validate. Repeated trips on the same task signal a Phase 5 split problem — go back and split.

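The first three rows of the table are mechanical enough to scan for with `grep`. The following is a rough sketch, assuming the spec has been written to a file; the function name and the regular expressions are illustrative and may need tuning. The remaining rows (code block length, contradictory drafts, duplicated test bodies) still need a read-through.

```bash
# Print one line per violation found in the spec file; return 1 if any exist.
validate_make_spec() {
  spec_file="$1"
  spec_bad=0
  # Row 1: any `<shell> -c` invocation.
  if grep -Eq '(^|[^[:alnum:]_-])(bash|sh|zsh|fish)[[:space:]]+-c([^[:alnum:]_-]|$)' "$spec_file"; then
    echo "shell -c invocation"
    spec_bad=1
  fi
  # Row 2: an inner shell launched through nix develop.
  if grep -Eq 'nix develop[[:space:]]+(--command|-c)[[:space:]]+(bash|sh|zsh|fish)([^[:alnum:]_-]|$)' "$spec_file"; then
    echo "nix develop wrapping a shell"
    spec_bad=1
  fi
  # Row 3: cd chained into another command.
  if grep -Eq '(^|[^[:alnum:]_])cd[[:space:]]+[^&;]+&&' "$spec_file"; then
    echo "cd <path> && ..."
    spec_bad=1
  fi
  return "$spec_bad"
}

# Usage: validate_make_spec /path/to/spec.md || echo "do not dispatch"
```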
`@make` runs in TDD mode when tests are provided:
1. Entry validation: run tests, verify RED, check failure codes match handoff
2. Implement minimal code to make tests pass (GREEN)
3. Regression check on broader area
4. Refactor while keeping green
5. Report RED→GREEN evidence

**Escalation:** If `@make` flags test quality concerns during entry validation:
1. `@make` reports the issue to caller
2. Caller routes to `@check` for diagnosis
3. `@check` reports findings
4. Caller routes to `@test` for fixes
5. Fixed tests return to `@make`

For NOT_TESTABLE tasks, `@make` runs in standard mode.

After all tasks complete, verify overall integration:
- Run the project's test suite if available
- Run linting/type checking if configured
- Fix any integration issues between tasks

---

## Phase 8: Final Review

Dispatch `@check` and `@simplify` in parallel to review the full implementation (all changes across all files).

Provide reviewers with:
- The original plan
- The full diff (`git diff "$BASE_BRANCH"...HEAD`)
- Any decisions or deviations from the plan

**Review loop (max 3 cycles):**
1. Send implementation to both reviewers
2. Merge findings (same precedence rules as Phase 4)
3. If ACCEPTABLE: proceed to Phase 9
4. If issues found: fix them directly (no need to re-dispatch `@make` for small fixes), then re-review
5. **Convergence detection:** same findings twice = stop loop early
6. If unresolved after 3 cycles: document blockers, proceed to commit anyway

---

## Phase 9: Commit and Wrap Up

The workflow is forge-agnostic. It commits locally and stops. **Do not push, and do not open a pull/merge request** — the user chooses their forge and review workflow manually.

### Commit Code Changes
- Stage code changes only. **Do not stage `TODO.md`** (committed separately below) and **do not stage `.opencode/workflow-summary.md`** (intentionally never committed — see Local Summary).
- Write a conventional commit message summarizing the implementation. Reference the TODO.md issue ID in the body (e.g. `Refs: ABC-1`).
- If changes are large/varied, use multiple atomic commits (one per logical unit)

### TODO Update
- Dispatch `@pm` against `./TODO.md` (live filesystem mode). Ask it to:
  - Set **Branch** to `$BRANCH_NAME`
  - Set **Status** to `In Review`
  - Add a comment with the branch name, latest commit SHA, and a one-line summary
- If acceptance-criteria checkboxes were addressed by the implementation, ask `@pm` to check them off
- Commit the TODO.md change as a separate atomic commit: `chore(todo): update <issue-id> status and progress`

### Local Summary
- Write `.opencode/workflow-summary.md` in the worktree with:
  - Run timestamp
  - Issue reference and title
  - Branch name and final commit SHA(s)
  - Summary of implementation
  - TDD evidence (RED→GREEN per task, NOT_TESTABLE justifications)
  - Review outcomes (plan review + final review verdicts)
  - Unresolved items (if any)
  - Files changed
- **Do not commit this file.** It is a per-run, per-branch artifact; committing it would create merge conflicts whenever multiple workflow branches are merged. Leave it untracked. Recommend the user add `.opencode/` to `.gitignore` if not already.

---

## Failure Handling

At any phase, if an unrecoverable error occurs:
1. Write `.opencode/workflow-summary.md` with what was completed and what failed. Do **not** stage or commit this file.
2. If any code was written, commit it with message `wip: incomplete workflow run for <issue-id>`. Stage code only — exclude `.opencode/workflow-summary.md`.
3. Leave the branch and worktree intact for the user to inspect — do not push, do not delete.
4. Dispatch `@pm` against `./TODO.md` to add a comment on the issue summarising what failed.
5. Stop execution.

**Never hang on interactive prompts.** If any command appears to require input, treat it as a failure and follow the above procedure.
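
One way to make a command fail rather than wait for input is to run it with stdin redirected from `/dev/null` and with git's terminal prompts disabled through the `GIT_TERMINAL_PROMPT` environment variable. The wrapper below is a sketch with a hypothetical name; it does not catch every prompt, since some tools read from `/dev/tty` directly.

```bash
# Run a command so that a request for input turns into a failure:
# stdin is empty, and git is told not to prompt on the terminal.
run_noninteractive() {
  GIT_TERMINAL_PROMPT=0 "$@" </dev/null
}

# Usage: run_noninteractive git fetch origin || echo "treat as failure"
```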