fix(opencode): harden workflow against multi-task spec dumps
A workflow run on a Rust/Bevy task produced a single @make dispatch covering six tasks (~2 hours of work), with the orchestrator drafting the full replacement code, including a self-contradicting "actually that's wrong, let me correct…" revision pass and a `nix develop --command bash -c "cargo check"` invocation that @make's sandbox denies. None of the failure modes were caught before dispatch. Phase 5 gains three new subsections: - Split Heuristic — explicit rules for when a task must be split (>2 concerns, >50 lines / 2 files, structural+runtime+wiring mix); prescribes the foundations / implementation / wiring split. - Code Context Anti-patterns — the field is for seam-revealing snippets, not finished answers; max ~5-line snippets, no full replacement bodies. - Finalized-Text Rule — task specs must be single-author finalized text, no "actually, that's wrong" revision passes, no two-version code blocks, no unresolved questions. Phase 6 promotes the Rust unit-only NOT_TESTABLE case out of the decision table into a dedicated routing subsection. The orchestrator must pass test *specifications* (one-line behavior descriptions, target functions, assertion types) to @make — never test code — and run the suite once after @make to capture RED→GREEN evidence. Phase 7 gains a mandatory Pre-Dispatch Validation table that rejects specs containing `bash -c` / `sh -c` (any nesting), `nix develop -c bash`, `cd <path> &&`, oversized Code Context blocks, contradictory revisions, or duplicated test bodies. Repeated trips signal a Phase 5 split problem and route back to splitting.
This commit is contained in:
parent
d5d90d8b9f
commit
832306c817
1 changed files with 70 additions and 3 deletions
|
|
@ -122,6 +122,43 @@ Include **Test Design** from Phase 3 when available, attached to the relevant ta
|
|||
|
||||
**Task size:** ~10-30 minutes each, single coherent change, clear boundaries.
|
||||
|
||||
### Split Heuristic — when in doubt, split
|
||||
|
||||
A task must be **split** if any of the following apply:
|
||||
|
||||
- It touches more than two distinct concerns (e.g. *constants + new component + sprite spawn + new system + main wiring* is **five** concerns — at least three tasks).
|
||||
- It changes more than ~50 lines across more than 2 files.
|
||||
- It mixes data/structural changes (constants, types, components) with runtime/system changes (new ECS systems, scheduling, render loops).
|
||||
- It mixes pure-logic changes (math helpers) with stateful changes (queries, world mutation).
|
||||
- It mixes new APIs with their first call sites in the same task.
|
||||
|
||||
When a task fails the heuristic, split into:
|
||||
1. **Foundations** — new constants, types, components (no behavior change yet).
|
||||
2. **Implementation** — the actual production logic, calling the foundations.
|
||||
3. **Wiring** — registration in `main.rs` / `lib.rs` / app-builder.
|
||||
|
||||
Each split is dispatched separately to `@make` and verified before the next.
|
||||
|
||||
### Code Context Anti-patterns
|
||||
|
||||
The **Code Context** field exists so `@make` can find the seam to modify, not so it can read off a finished answer. Strictly follow:
|
||||
|
||||
- **Provide:** the existing code being replaced (verbatim), the surrounding ~5–10 lines of context, function signatures of helpers `@make` will need to call, the file's relevant import block.
|
||||
- **Do NOT provide:** a complete drop-in replacement, the new function bodies, the test bodies (those come from `@test` or — for unit-only Rust — from `@make` itself per Phase 6), or any "here is what to write" code block longer than ~5 lines.
|
||||
|
||||
If the task is so well-specified that you've already written the implementation, the task is too small for `@make` (apply it directly) or you've over-determined the design (revisit Phase 3).
|
||||
|
||||
### Finalized-Text Rule
|
||||
|
||||
Each task spec must be **finalized** before dispatch — single-author text with no contradictions. **Forbidden in dispatch prompts:**
|
||||
|
||||
- "Actually, that's wrong — let me correct…"
|
||||
- "Wait, let me revise…"
|
||||
- Two versions of the same code block with one labelled "corrected"
|
||||
- Open questions or ambiguities the orchestrator hasn't resolved
|
||||
|
||||
If you find yourself revising while writing the spec, stop, redo the spec from scratch with the corrected understanding, and only then dispatch. `@make` is a fresh-context implementer; it cannot reliably resolve which of two contradictory drafts is canonical.
|
||||
|
||||
---
|
||||
|
||||
## Phase 6: Write Tests
|
||||
|
|
@ -157,10 +194,25 @@ If any non-matching file appears, or any anti-pattern matches: discard `@test` o
|
|||
|-----------|--------|
|
||||
| `TESTS_READY` + `escalate_to_check: false` | Proceed to Phase 7 |
|
||||
| `TESTS_READY` + `escalate_to_check: true` | Route tests to `@check` for light review. `@check` diagnoses, caller routes fixes to `@test`. Then proceed. |
|
||||
| `NOT_TESTABLE` | Route to `@check` for sign-off on justification. If approved, task goes to `@make` without tests. |
|
||||
| `NOT_TESTABLE` (general reasons) | Route to `@check` for sign-off on justification. If approved, task goes to `@make` without tests. |
|
||||
| `NOT_TESTABLE` reason `Rust unit-only` | See "Rust unit-only routing" below. **Do not** include test code in the `@make` spec; pass test specs only. |
|
||||
| `BLOCKED` | Investigate. May need to revise task spec or plan. |
|
||||
| Test passes immediately | Investigate — behavior may already exist. Task spec may be wrong. |
|
||||
|
||||
### Rust unit-only routing
|
||||
|
||||
When `@test` returns `NOT_TESTABLE: Rust unit-only` (the implementation needs in-source `#[cfg(test)] mod tests` blocks that `@test` is forbidden from writing), the orchestrator must:
|
||||
|
||||
1. Get `@check`'s sign-off on the justification (no integration-test seam exists).
|
||||
2. Build the `@make` spec with **test specifications**, not test code:
|
||||
- "Add `#[cfg(test)] mod foo_tests` at the bottom of `<file>` exercising:"
|
||||
- For each behavior, a one-line description: input → expected output, edge case to cover, error path to assert.
|
||||
- Where applicable, name the function under test and the assertion type (`assert_eq!`, `assert!`, panic on invalid input).
|
||||
3. **Forbidden in the `@make` spec:** complete `#[test] fn …` bodies, full module blocks, or any `@test`-style RED-verified test code. `@make` writes the inline tests itself based on the spec.
|
||||
4. After `@make` completes, the orchestrator runs the test suite once to confirm RED→GREEN evidence and includes it in the workflow summary.
|
||||
|
||||
This keeps the agents in their lanes: `@test` never writes inside `src/`, `@make` writes both the tests and the production code in a single coherent change, and the orchestrator sees explicit test pass evidence.
|
||||
|
||||
**Parallelism:** Independent tasks can have tests written in parallel.
|
||||
**Constraint:** `@test` must not modify existing conftest.py files (prevents collision during parallel execution).
|
||||
|
||||
|
|
@ -169,10 +221,25 @@ If any non-matching file appears, or any anti-pattern matches: discard `@test` o
|
|||
## Phase 7: Implement
|
||||
|
||||
Execute each task by dispatching `@make` with:
|
||||
- The task spec (from Phase 5)
|
||||
- Relevant code context (actual snippets)
|
||||
- The task spec (from Phase 5, finalized — see Finalized-Text Rule)
|
||||
- Relevant code context (seam-revealing snippets only — see Code Context Anti-patterns)
|
||||
- **Pre-written failing tests and handoff from `@test` (if TESTS_READY)**
|
||||
|
||||
### Pre-Dispatch Validation (MANDATORY)
|
||||
|
||||
Before sending the spec to `@make`, scan it and reject (revise, then retry) if any of the following are present:
|
||||
|
||||
| Check | Why it matters |
|
||||
|---|---|
|
||||
| `bash -c`, `sh -c`, `zsh -c`, `fish -c` (anywhere, including inside `nix develop --command bash -c …`) | `@make`'s sandbox denies all `*-c` shell invocations and any nested `bash` would bypass the per-command allowlist. Replace with one direct command per line: `nix develop -c cargo check`, `nix develop -c cargo test`, etc. |
|
||||
| `nix develop --command bash` / `nix develop -c bash` / `nix develop -c sh` | Same — the inner shell escapes the sandbox. Wrap each toolchain command directly. |
|
||||
| Any `cd <path> && …` | `@make` cannot `cd`. Rewrite to use absolute paths or `git -C <path>` for git operations (and `@make` doesn't run git anyway). |
|
||||
| Code blocks longer than ~5 lines under "Code Context" or labelled as the answer | Violates Code Context Anti-patterns. Trim to the seam. |
|
||||
| Two versions of the same code, "actually let me correct…", or open questions | Violates the Finalized-Text Rule. Redo the spec. |
|
||||
| Test bodies inside the `@make` spec when tests are coming from `@test` | The TDD handoff already provides them; duplicating creates conflict. |
|
||||
|
||||
If any check trips, **do not dispatch.** Fix the spec and re-validate. Repeated trips on the same task signal a Phase 5 split problem — go back and split.
|
||||
|
||||
`@make` runs in TDD mode when tests are provided:
|
||||
1. Entry validation: run tests, verify RED, check failure codes match handoff
|
||||
2. Implement minimal code to make tests pass (GREEN)
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue