Compare commits

...

3 commits

Author SHA1 Message Date
1aa98a8051 fix(opencode): require real shell timestamp in workflow summary
A workflow run wrote the timestamp as `2026-05-06T???:???:?? (session date)`
because the agent had no time-of-day source and inserted a placeholder.
Phase 9 now mandates capturing the timestamp from the shell at write
time via `date -Iseconds` and forbids placeholders — omit the field
rather than fabricate one.
2026-05-07 05:45:35 +02:00
5b5c59aa84 feat(opencode): mandate stub-first @make pre-pass for Rust integration TDD
Rust integration tests live in a separate test crate that imports from
lib.rs, so any test referencing not-yet-existing public API can only
RED at build time. The build error masks assertion diagnostics and
makes the RED state opaque — no stack trace, no left/right values.

For Rust tasks whose @test step writes an integration test against
public API that does not yet exist, the orchestrator now dispatches a
stub-first @make pass before @test runs:

1. @make adds the planned public API as todo!()-bodied stubs in
   lib.rs and any new src/<module>.rs. Signatures lifted verbatim
   from the Phase 5 task spec. Acceptance criterion is cargo check
   only — no test command runs.
2. @test writes the integration test, which now compiles and panics
   at todo!() with a stack trace — a clean MISSING_BEHAVIOR RED.
3. Phase 7 dispatches @make again to replace the todo!() bodies with
   real implementations. Two atomic commits per task: scaffold then
   implement.

Phase 5's Rust test-path guidance now flags the two-dispatch
requirement up front. test.md's Rust failure-classification hints
recognize todo!() / unimplemented!() panics as MISSING_BEHAVIOR with
a pointer to the workflow's stub-first section.
2026-05-07 05:42:16 +02:00
832306c817 fix(opencode): harden workflow against multi-task spec dumps
A workflow run on a Rust/Bevy task produced a single @make dispatch
covering six tasks (~2 hours of work), with the orchestrator drafting
the full replacement code, including a self-contradicting "actually
that's wrong, let me correct…" revision pass and a `nix develop
--command bash -c "cargo check"` invocation that @make's sandbox
denies. None of the failure modes were caught before dispatch.

Phase 5 gains three new subsections:
- Split Heuristic — explicit rules for when a task must be split
  (>2 concerns, >50 lines / 2 files, structural+runtime+wiring mix);
  prescribes the foundations / implementation / wiring split.
- Code Context Anti-patterns — the field is for seam-revealing
  snippets, not finished answers; max ~5-line snippets, no full
  replacement bodies.
- Finalized-Text Rule — task specs must be single-author finalized
  text, no "actually, that's wrong" revision passes, no two-version
  code blocks, no unresolved questions.

Phase 6 promotes the Rust unit-only NOT_TESTABLE case out of the
decision table into a dedicated routing subsection. The orchestrator
must pass test *specifications* (one-line behavior descriptions,
target functions, assertion types) to @make — never test code — and
run the suite once after @make to capture RED→GREEN evidence.

Phase 7 gains a mandatory Pre-Dispatch Validation table that rejects
specs containing `bash -c` / `sh -c` (any nesting), `nix develop -c
bash`, `cd <path> &&`, oversized Code Context blocks, contradictory
revisions, or duplicated test bodies. Repeated trips signal a Phase
5 split problem and route back to splitting.
2026-05-06 20:25:40 +02:00
2 changed files with 95 additions and 5 deletions

View file

@ -200,6 +200,7 @@ After running tests, classify each failure:
**Mapping hints (Rust):**
- `error[E0432]: unresolved import` / `error[E0425]: cannot find function/value` for the symbol under test → `MISSING_BEHAVIOR`
- `error[E0599]: no method named ...` on a real but incomplete type → `MISSING_BEHAVIOR`
- Test panics with `not yet implemented` / `not implemented: …` (from `todo!()` or `unimplemented!()` in a stub body) → `MISSING_BEHAVIOR` (this is the expected RED state for stub-first integration TDD; see workflow Phase 6 "Rust integration TDD: stub-first")
- Test panics with `assertion failed: ... left: ..., right: ...``ASSERTION_MISMATCH`
- Test file fails to compile due to its own bug (typo, wrong type, unused-import-as-error) → `TEST_BROKEN`
- `linker not found`, missing system library, missing feature flag → `ENV_BROKEN`

View file

@ -111,7 +111,8 @@ The test file path must follow the language's actual test layout. **Do not inven
- Colocated: `<module>/tests/test_<feature>.py (create)`
- Top-level: `tests/test_<feature>.py (create)`
- **Rust**
- Crate-level integration tests: `tests/<feature>.rs (create)` (or, in a workspace, `<crate>/tests/<feature>.rs`)
- Crate-level integration tests: `tests/<feature>.rs (create)` (or, in a workspace, `<crate>/tests/<feature>.rs`).
- **If the test references not-yet-existing public API**, the task automatically requires a **stub-first `@make` pre-pass** before `@test` runs (see Phase 6 → "Rust integration TDD: stub-first"). Plan for two `@make` dispatches per such task: stub pass, then body pass.
- **Unit-test-only tasks (in-source `#[cfg(test)] mod tests`):** mark the task as `NOT_TESTABLE` with reason `Rust unit-only``@test` cannot write inside production source. `@make` writes those inline as part of its production change.
- **Polyglot Nix flake**
- Match the host language of the code under change (Python or Rust rules above), wrapping commands in `nix develop -c …` per the agents' devshell rule.
@ -122,6 +123,43 @@ Include **Test Design** from Phase 3 when available, attached to the relevant ta
**Task size:** ~10-30 minutes each, single coherent change, clear boundaries.
### Split Heuristic — when in doubt, split
A task must be **split** if any of the following apply:
- It touches more than two distinct concerns (e.g. *constants + new component + sprite spawn + new system + main wiring* is **five** concerns — at least three tasks).
- It changes more than ~50 lines across more than 2 files.
- It mixes data/structural changes (constants, types, components) with runtime/system changes (new ECS systems, scheduling, render loops).
- It mixes pure-logic changes (math helpers) with stateful changes (queries, world mutation).
- It mixes new APIs with their first call sites in the same task.
When a task fails the heuristic, split into:
1. **Foundations** — new constants, types, components (no behavior change yet).
2. **Implementation** — the actual production logic, calling the foundations.
3. **Wiring** — registration in `main.rs` / `lib.rs` / app-builder.
Each split is dispatched separately to `@make` and verified before the next.
### Code Context Anti-patterns
The **Code Context** field exists so `@make` can find the seam to modify, not so it can read off a finished answer. Strictly follow:
- **Provide:** the existing code being replaced (verbatim), the surrounding ~510 lines of context, function signatures of helpers `@make` will need to call, the file's relevant import block.
- **Do NOT provide:** a complete drop-in replacement, the new function bodies, the test bodies (those come from `@test` or — for unit-only Rust — from `@make` itself per Phase 6), or any "here is what to write" code block longer than ~5 lines.
If the task is so well-specified that you've already written the implementation, the task is too small for `@make` (apply it directly) or you've over-determined the design (revisit Phase 3).
### Finalized-Text Rule
Each task spec must be **finalized** before dispatch — single-author text with no contradictions. **Forbidden in dispatch prompts:**
- "Actually, that's wrong — let me correct…"
- "Wait, let me revise…"
- Two versions of the same code block with one labelled "corrected"
- Open questions or ambiguities the orchestrator hasn't resolved
If you find yourself revising while writing the spec, stop, redo the spec from scratch with the corrected understanding, and only then dispatch. `@make` is a fresh-context implementer; it cannot reliably resolve which of two contradictory drafts is canonical.
---
## Phase 6: Write Tests
@ -157,10 +195,46 @@ If any non-matching file appears, or any anti-pattern matches: discard `@test` o
|-----------|--------|
| `TESTS_READY` + `escalate_to_check: false` | Proceed to Phase 7 |
| `TESTS_READY` + `escalate_to_check: true` | Route tests to `@check` for light review. `@check` diagnoses, caller routes fixes to `@test`. Then proceed. |
| `NOT_TESTABLE` | Route to `@check` for sign-off on justification. If approved, task goes to `@make` without tests. |
| `NOT_TESTABLE` (general reasons) | Route to `@check` for sign-off on justification. If approved, task goes to `@make` without tests. |
| `NOT_TESTABLE` reason `Rust unit-only` | See "Rust unit-only routing" below. **Do not** include test code in the `@make` spec; pass test specs only. |
| `BLOCKED` | Investigate. May need to revise task spec or plan. |
| Test passes immediately | Investigate — behavior may already exist. Task spec may be wrong. |
### Rust unit-only routing
When `@test` returns `NOT_TESTABLE: Rust unit-only` (the implementation needs in-source `#[cfg(test)] mod tests` blocks that `@test` is forbidden from writing), the orchestrator must:
1. Get `@check`'s sign-off on the justification (no integration-test seam exists).
2. Build the `@make` spec with **test specifications**, not test code:
- "Add `#[cfg(test)] mod foo_tests` at the bottom of `<file>` exercising:"
- For each behavior, a one-line description: input → expected output, edge case to cover, error path to assert.
- Where applicable, name the function under test and the assertion type (`assert_eq!`, `assert!`, panic on invalid input).
3. **Forbidden in the `@make` spec:** complete `#[test] fn …` bodies, full module blocks, or any `@test`-style RED-verified test code. `@make` writes the inline tests itself based on the spec.
4. After `@make` completes, the orchestrator runs the test suite once to confirm RED→GREEN evidence and includes it in the workflow summary.
This keeps the agents in their lanes: `@test` never writes inside `src/`, `@make` writes both the tests and the production code in a single coherent change, and the orchestrator sees explicit test pass evidence.
### Rust integration TDD: stub-first (mandatory)
Rust integration tests live in a separate test crate (`tests/<feature>.rs`) that imports from `lib.rs`. Any test referencing not-yet-existing public API can only RED at *build* time, which masks assertion diagnostics. To avoid this, **for every Rust task whose `@test` step writes an integration test against public API that does not yet exist**, dispatch a stub-first `@make` pass *before* `@test` runs:
**Stub pass (split from Phase 7's body pass):**
1. Dispatch `@make` in **standard mode** (no tests exist yet) with this exact scope:
- **Goal:** add the planned public API as `todo!()`-bodied stubs so the integration test will compile.
- **Files to modify:** `src/lib.rs` (add `pub mod …;` declarations) plus any new `src/<module>.rs` files containing the stub functions/structs.
- **Stubs only:** every function body is exactly `todo!()`. Every method body is exactly `todo!()`. Public structs may use `pub struct Foo;` or `pub struct Foo { /* fields TBD */ }` — but no logic.
- **Signatures must match the planned final API exactly** (return types, lifetimes, generics) — otherwise the integration test will mismatch later. Lift signatures from the Phase 3 plan / Phase 5 task spec.
- **Acceptance criteria:** `cargo check` (wrapped in `nix develop -c …` if the project has a devshell) passes; no test command is run.
- **Code Context Anti-patterns still apply:** the stub pass is small and finalized — no draft bodies, no contradictory signatures.
2. Verify `cargo check` passed in `@make`'s output. If not, fix and re-dispatch the stub pass before continuing.
3. Dispatch `@test` as normal. The integration test now compiles; running it panics on `todo!()` at runtime, which is a clean `MISSING_BEHAVIOR` RED with a stack trace — far better than the build-error-RED form.
4. Continue to Phase 7's body pass (`@make` in TDD mode), where the same files are revisited and the `todo!()` bodies are replaced.
**This routing is mandatory** for new public API in Rust. It is **not** required when the integration test exercises an existing public API (e.g. a behavior fix where the function already exists) — in that case `@test` runs directly and `@make` modifies the body in Phase 7.
The stub pass and the body pass each produce their own atomic commit (per Phase 9 rules): `feat(<scope>): scaffold <thing> with todo!() stubs` followed by `feat(<scope>): implement <thing>` (or whichever conventional type fits).
**Parallelism:** Independent tasks can have tests written in parallel.
**Constraint:** `@test` must not modify existing conftest.py files (prevents collision during parallel execution).
@ -169,10 +243,25 @@ If any non-matching file appears, or any anti-pattern matches: discard `@test` o
## Phase 7: Implement
Execute each task by dispatching `@make` with:
- The task spec (from Phase 5)
- Relevant code context (actual snippets)
- The task spec (from Phase 5, finalized — see Finalized-Text Rule)
- Relevant code context (seam-revealing snippets only — see Code Context Anti-patterns)
- **Pre-written failing tests and handoff from `@test` (if TESTS_READY)**
### Pre-Dispatch Validation (MANDATORY)
Before sending the spec to `@make`, scan it and reject (revise, then retry) if any of the following are present:
| Check | Why it matters |
|---|---|
| `bash -c`, `sh -c`, `zsh -c`, `fish -c` (anywhere, including inside `nix develop --command bash -c …`) | `@make`'s sandbox denies all `*-c` shell invocations and any nested `bash` would bypass the per-command allowlist. Replace with one direct command per line: `nix develop -c cargo check`, `nix develop -c cargo test`, etc. |
| `nix develop --command bash` / `nix develop -c bash` / `nix develop -c sh` | Same — the inner shell escapes the sandbox. Wrap each toolchain command directly. |
| Any `cd <path> && …` | `@make` cannot `cd`. Rewrite to use absolute paths or `git -C <path>` for git operations (and `@make` doesn't run git anyway). |
| Code blocks longer than ~5 lines under "Code Context" or labelled as the answer | Violates Code Context Anti-patterns. Trim to the seam. |
| Two versions of the same code, "actually let me correct…", or open questions | Violates the Finalized-Text Rule. Redo the spec. |
| Test bodies inside the `@make` spec when tests are coming from `@test` | The TDD handoff already provides them; duplicating creates conflict. |
If any check trips, **do not dispatch.** Fix the spec and re-validate. Repeated trips on the same task signal a Phase 5 split problem — go back and split.
`@make` runs in TDD mode when tests are provided:
1. Entry validation: run tests, verify RED, check failure codes match handoff
2. Implement minimal code to make tests pass (GREEN)
@ -234,7 +323,7 @@ The workflow is forge-agnostic. It commits locally and stops. **Do not push, and
### Local Summary
- Write `.opencode/workflow-summary.md` in the worktree with:
- Run timestamp
- **Run timestamp** — capture it from the shell at write time: `date -Iseconds` (e.g. `2026-05-07T11:24:13+02:00`). **Do not** use a placeholder like `???:???:??` or "session date" — if you cannot get a real timestamp, omit the field entirely rather than fabricating one.
- Issue reference and title
- Branch name and final commit SHA(s)
- Summary of implementation