From 832306c8170e1dec9d8605e34153451c30725e4f Mon Sep 17 00:00:00 2001 From: Harald Hoyer Date: Wed, 6 May 2026 20:25:40 +0200 Subject: [PATCH 1/3] fix(opencode): harden workflow against multi-task spec dumps MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit A workflow run on a Rust/Bevy task produced a single @make dispatch covering six tasks (~2 hours of work), with the orchestrator drafting the full replacement code, including a self-contradicting "actually that's wrong, let me correct…" revision pass and a `nix develop --command bash -c "cargo check"` invocation that @make's sandbox denies. None of the failure modes were caught before dispatch. Phase 5 gains three new subsections: - Split Heuristic — explicit rules for when a task must be split (>2 concerns, >50 lines / 2 files, structural+runtime+wiring mix); prescribes the foundations / implementation / wiring split. - Code Context Anti-patterns — the field is for seam-revealing snippets, not finished answers; max ~5-line snippets, no full replacement bodies. - Finalized-Text Rule — task specs must be single-author finalized text, no "actually, that's wrong" revision passes, no two-version code blocks, no unresolved questions. Phase 6 promotes the Rust unit-only NOT_TESTABLE case out of the decision table into a dedicated routing subsection. The orchestrator must pass test *specifications* (one-line behavior descriptions, target functions, assertion types) to @make — never test code — and run the suite once after @make to capture RED→GREEN evidence. Phase 7 gains a mandatory Pre-Dispatch Validation table that rejects specs containing `bash -c` / `sh -c` (any nesting), `nix develop -c bash`, `cd &&`, oversized Code Context blocks, contradictory revisions, or duplicated test bodies. Repeated trips signal a Phase 5 split problem and route back to splitting. --- config/opencode/commands/workflow.md | 73 ++++++++++++++++++++++++++-- 1 file changed, 70 insertions(+), 3 deletions(-) diff --git a/config/opencode/commands/workflow.md b/config/opencode/commands/workflow.md index 2ff213f..29db101 100644 --- a/config/opencode/commands/workflow.md +++ b/config/opencode/commands/workflow.md @@ -122,6 +122,43 @@ Include **Test Design** from Phase 3 when available, attached to the relevant ta **Task size:** ~10-30 minutes each, single coherent change, clear boundaries. +### Split Heuristic — when in doubt, split + +A task must be **split** if any of the following apply: + +- It touches more than two distinct concerns (e.g. *constants + new component + sprite spawn + new system + main wiring* is **five** concerns — at least three tasks). +- It changes more than ~50 lines across more than 2 files. +- It mixes data/structural changes (constants, types, components) with runtime/system changes (new ECS systems, scheduling, render loops). +- It mixes pure-logic changes (math helpers) with stateful changes (queries, world mutation). +- It mixes new APIs with their first call sites in the same task. + +When a task fails the heuristic, split into: +1. **Foundations** — new constants, types, components (no behavior change yet). +2. **Implementation** — the actual production logic, calling the foundations. +3. **Wiring** — registration in `main.rs` / `lib.rs` / app-builder. + +Each split is dispatched separately to `@make` and verified before the next. + +### Code Context Anti-patterns + +The **Code Context** field exists so `@make` can find the seam to modify, not so it can read off a finished answer. Strictly follow: + +- **Provide:** the existing code being replaced (verbatim), the surrounding ~5–10 lines of context, function signatures of helpers `@make` will need to call, the file's relevant import block. +- **Do NOT provide:** a complete drop-in replacement, the new function bodies, the test bodies (those come from `@test` or — for unit-only Rust — from `@make` itself per Phase 6), or any "here is what to write" code block longer than ~5 lines. + +If the task is so well-specified that you've already written the implementation, the task is too small for `@make` (apply it directly) or you've over-determined the design (revisit Phase 3). + +### Finalized-Text Rule + +Each task spec must be **finalized** before dispatch — single-author text with no contradictions. **Forbidden in dispatch prompts:** + +- "Actually, that's wrong — let me correct…" +- "Wait, let me revise…" +- Two versions of the same code block with one labelled "corrected" +- Open questions or ambiguities the orchestrator hasn't resolved + +If you find yourself revising while writing the spec, stop, redo the spec from scratch with the corrected understanding, and only then dispatch. `@make` is a fresh-context implementer; it cannot reliably resolve which of two contradictory drafts is canonical. + --- ## Phase 6: Write Tests @@ -157,10 +194,25 @@ If any non-matching file appears, or any anti-pattern matches: discard `@test` o |-----------|--------| | `TESTS_READY` + `escalate_to_check: false` | Proceed to Phase 7 | | `TESTS_READY` + `escalate_to_check: true` | Route tests to `@check` for light review. `@check` diagnoses, caller routes fixes to `@test`. Then proceed. | -| `NOT_TESTABLE` | Route to `@check` for sign-off on justification. If approved, task goes to `@make` without tests. | +| `NOT_TESTABLE` (general reasons) | Route to `@check` for sign-off on justification. If approved, task goes to `@make` without tests. | +| `NOT_TESTABLE` reason `Rust unit-only` | See "Rust unit-only routing" below. **Do not** include test code in the `@make` spec; pass test specs only. | | `BLOCKED` | Investigate. May need to revise task spec or plan. | | Test passes immediately | Investigate — behavior may already exist. Task spec may be wrong. | +### Rust unit-only routing + +When `@test` returns `NOT_TESTABLE: Rust unit-only` (the implementation needs in-source `#[cfg(test)] mod tests` blocks that `@test` is forbidden from writing), the orchestrator must: + +1. Get `@check`'s sign-off on the justification (no integration-test seam exists). +2. Build the `@make` spec with **test specifications**, not test code: + - "Add `#[cfg(test)] mod foo_tests` at the bottom of `` exercising:" + - For each behavior, a one-line description: input → expected output, edge case to cover, error path to assert. + - Where applicable, name the function under test and the assertion type (`assert_eq!`, `assert!`, panic on invalid input). +3. **Forbidden in the `@make` spec:** complete `#[test] fn …` bodies, full module blocks, or any `@test`-style RED-verified test code. `@make` writes the inline tests itself based on the spec. +4. After `@make` completes, the orchestrator runs the test suite once to confirm RED→GREEN evidence and includes it in the workflow summary. + +This keeps the agents in their lanes: `@test` never writes inside `src/`, `@make` writes both the tests and the production code in a single coherent change, and the orchestrator sees explicit test pass evidence. + **Parallelism:** Independent tasks can have tests written in parallel. **Constraint:** `@test` must not modify existing conftest.py files (prevents collision during parallel execution). @@ -169,10 +221,25 @@ If any non-matching file appears, or any anti-pattern matches: discard `@test` o ## Phase 7: Implement Execute each task by dispatching `@make` with: -- The task spec (from Phase 5) -- Relevant code context (actual snippets) +- The task spec (from Phase 5, finalized — see Finalized-Text Rule) +- Relevant code context (seam-revealing snippets only — see Code Context Anti-patterns) - **Pre-written failing tests and handoff from `@test` (if TESTS_READY)** +### Pre-Dispatch Validation (MANDATORY) + +Before sending the spec to `@make`, scan it and reject (revise, then retry) if any of the following are present: + +| Check | Why it matters | +|---|---| +| `bash -c`, `sh -c`, `zsh -c`, `fish -c` (anywhere, including inside `nix develop --command bash -c …`) | `@make`'s sandbox denies all `*-c` shell invocations and any nested `bash` would bypass the per-command allowlist. Replace with one direct command per line: `nix develop -c cargo check`, `nix develop -c cargo test`, etc. | +| `nix develop --command bash` / `nix develop -c bash` / `nix develop -c sh` | Same — the inner shell escapes the sandbox. Wrap each toolchain command directly. | +| Any `cd && …` | `@make` cannot `cd`. Rewrite to use absolute paths or `git -C ` for git operations (and `@make` doesn't run git anyway). | +| Code blocks longer than ~5 lines under "Code Context" or labelled as the answer | Violates Code Context Anti-patterns. Trim to the seam. | +| Two versions of the same code, "actually let me correct…", or open questions | Violates the Finalized-Text Rule. Redo the spec. | +| Test bodies inside the `@make` spec when tests are coming from `@test` | The TDD handoff already provides them; duplicating creates conflict. | + +If any check trips, **do not dispatch.** Fix the spec and re-validate. Repeated trips on the same task signal a Phase 5 split problem — go back and split. + `@make` runs in TDD mode when tests are provided: 1. Entry validation: run tests, verify RED, check failure codes match handoff 2. Implement minimal code to make tests pass (GREEN) From 5b5c59aa844197d7ec92dc751d1438c96abd0e79 Mon Sep 17 00:00:00 2001 From: Harald Hoyer Date: Thu, 7 May 2026 05:42:16 +0200 Subject: [PATCH 2/3] feat(opencode): mandate stub-first @make pre-pass for Rust integration TDD MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Rust integration tests live in a separate test crate that imports from lib.rs, so any test referencing not-yet-existing public API can only RED at build time. The build error masks assertion diagnostics and makes the RED state opaque — no stack trace, no left/right values. For Rust tasks whose @test step writes an integration test against public API that does not yet exist, the orchestrator now dispatches a stub-first @make pass before @test runs: 1. @make adds the planned public API as todo!()-bodied stubs in lib.rs and any new src/.rs. Signatures lifted verbatim from the Phase 5 task spec. Acceptance criterion is cargo check only — no test command runs. 2. @test writes the integration test, which now compiles and panics at todo!() with a stack trace — a clean MISSING_BEHAVIOR RED. 3. Phase 7 dispatches @make again to replace the todo!() bodies with real implementations. Two atomic commits per task: scaffold then implement. Phase 5's Rust test-path guidance now flags the two-dispatch requirement up front. test.md's Rust failure-classification hints recognize todo!() / unimplemented!() panics as MISSING_BEHAVIOR with a pointer to the workflow's stub-first section. --- config/opencode/agents/test.md | 1 + config/opencode/commands/workflow.md | 24 +++++++++++++++++++++++- 2 files changed, 24 insertions(+), 1 deletion(-) diff --git a/config/opencode/agents/test.md b/config/opencode/agents/test.md index 7afacff..af4872a 100644 --- a/config/opencode/agents/test.md +++ b/config/opencode/agents/test.md @@ -200,6 +200,7 @@ After running tests, classify each failure: **Mapping hints (Rust):** - `error[E0432]: unresolved import` / `error[E0425]: cannot find function/value` for the symbol under test → `MISSING_BEHAVIOR` - `error[E0599]: no method named ...` on a real but incomplete type → `MISSING_BEHAVIOR` +- Test panics with `not yet implemented` / `not implemented: …` (from `todo!()` or `unimplemented!()` in a stub body) → `MISSING_BEHAVIOR` (this is the expected RED state for stub-first integration TDD; see workflow Phase 6 "Rust integration TDD: stub-first") - Test panics with `assertion failed: ... left: ..., right: ...` → `ASSERTION_MISMATCH` - Test file fails to compile due to its own bug (typo, wrong type, unused-import-as-error) → `TEST_BROKEN` - `linker not found`, missing system library, missing feature flag → `ENV_BROKEN` diff --git a/config/opencode/commands/workflow.md b/config/opencode/commands/workflow.md index 29db101..2402427 100644 --- a/config/opencode/commands/workflow.md +++ b/config/opencode/commands/workflow.md @@ -111,7 +111,8 @@ The test file path must follow the language's actual test layout. **Do not inven - Colocated: `/tests/test_.py (create)` - Top-level: `tests/test_.py (create)` - **Rust** - - Crate-level integration tests: `tests/.rs (create)` (or, in a workspace, `/tests/.rs`) + - Crate-level integration tests: `tests/.rs (create)` (or, in a workspace, `/tests/.rs`). + - **If the test references not-yet-existing public API**, the task automatically requires a **stub-first `@make` pre-pass** before `@test` runs (see Phase 6 → "Rust integration TDD: stub-first"). Plan for two `@make` dispatches per such task: stub pass, then body pass. - **Unit-test-only tasks (in-source `#[cfg(test)] mod tests`):** mark the task as `NOT_TESTABLE` with reason `Rust unit-only` — `@test` cannot write inside production source. `@make` writes those inline as part of its production change. - **Polyglot Nix flake** - Match the host language of the code under change (Python or Rust rules above), wrapping commands in `nix develop -c …` per the agents' devshell rule. @@ -213,6 +214,27 @@ When `@test` returns `NOT_TESTABLE: Rust unit-only` (the implementation needs in This keeps the agents in their lanes: `@test` never writes inside `src/`, `@make` writes both the tests and the production code in a single coherent change, and the orchestrator sees explicit test pass evidence. +### Rust integration TDD: stub-first (mandatory) + +Rust integration tests live in a separate test crate (`tests/.rs`) that imports from `lib.rs`. Any test referencing not-yet-existing public API can only RED at *build* time, which masks assertion diagnostics. To avoid this, **for every Rust task whose `@test` step writes an integration test against public API that does not yet exist**, dispatch a stub-first `@make` pass *before* `@test` runs: + +**Stub pass (split from Phase 7's body pass):** + +1. Dispatch `@make` in **standard mode** (no tests exist yet) with this exact scope: + - **Goal:** add the planned public API as `todo!()`-bodied stubs so the integration test will compile. + - **Files to modify:** `src/lib.rs` (add `pub mod …;` declarations) plus any new `src/.rs` files containing the stub functions/structs. + - **Stubs only:** every function body is exactly `todo!()`. Every method body is exactly `todo!()`. Public structs may use `pub struct Foo;` or `pub struct Foo { /* fields TBD */ }` — but no logic. + - **Signatures must match the planned final API exactly** (return types, lifetimes, generics) — otherwise the integration test will mismatch later. Lift signatures from the Phase 3 plan / Phase 5 task spec. + - **Acceptance criteria:** `cargo check` (wrapped in `nix develop -c …` if the project has a devshell) passes; no test command is run. + - **Code Context Anti-patterns still apply:** the stub pass is small and finalized — no draft bodies, no contradictory signatures. +2. Verify `cargo check` passed in `@make`'s output. If not, fix and re-dispatch the stub pass before continuing. +3. Dispatch `@test` as normal. The integration test now compiles; running it panics on `todo!()` at runtime, which is a clean `MISSING_BEHAVIOR` RED with a stack trace — far better than the build-error-RED form. +4. Continue to Phase 7's body pass (`@make` in TDD mode), where the same files are revisited and the `todo!()` bodies are replaced. + +**This routing is mandatory** for new public API in Rust. It is **not** required when the integration test exercises an existing public API (e.g. a behavior fix where the function already exists) — in that case `@test` runs directly and `@make` modifies the body in Phase 7. + +The stub pass and the body pass each produce their own atomic commit (per Phase 9 rules): `feat(): scaffold with todo!() stubs` followed by `feat(): implement ` (or whichever conventional type fits). + **Parallelism:** Independent tasks can have tests written in parallel. **Constraint:** `@test` must not modify existing conftest.py files (prevents collision during parallel execution). From 1aa98a805183f05f33a20d3be7c6d572ec98e091 Mon Sep 17 00:00:00 2001 From: Harald Hoyer Date: Thu, 7 May 2026 05:45:35 +0200 Subject: [PATCH 3/3] fix(opencode): require real shell timestamp in workflow summary MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit A workflow run wrote the timestamp as `2026-05-06T???:???:?? (session date)` because the agent had no time-of-day source and inserted a placeholder. Phase 9 now mandates capturing the timestamp from the shell at write time via `date -Iseconds` and forbids placeholders — omit the field rather than fabricate one. --- config/opencode/commands/workflow.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/config/opencode/commands/workflow.md b/config/opencode/commands/workflow.md index 2402427..f1b21ae 100644 --- a/config/opencode/commands/workflow.md +++ b/config/opencode/commands/workflow.md @@ -323,7 +323,7 @@ The workflow is forge-agnostic. It commits locally and stops. **Do not push, and ### Local Summary - Write `.opencode/workflow-summary.md` in the worktree with: - - Run timestamp + - **Run timestamp** — capture it from the shell at write time: `date -Iseconds` (e.g. `2026-05-07T11:24:13+02:00`). **Do not** use a placeholder like `???:???:??` or "session date" — if you cannot get a real timestamp, omit the field entirely rather than fabricating one. - Issue reference and title - Branch name and final commit SHA(s) - Summary of implementation