refactor(opencode): allow @test inside #[cfg(test)] mod blocks, drop file gate

The previous design routed Rust unit tests to NOT_TESTABLE: Rust
unit-only because @test was forbidden from touching src/, which
forced @make to write both the production code and the inline
#[cfg(test)] mod tests in one dispatch — losing TDD's RED→GREEN
separation. But Rust module tests inside #[cfg(test)] mod tests
{ ... } are the canonical unit-testing idiom, not an edge case.

@test's File Constraint now allows modifying src/**/*.rs, but
strictly inside #[cfg(test)] mod <name> { ... } blocks. Every line
outside such a block stays read-only — adding pub, importing crates,
declaring siblings, or any other production change is forbidden.
Integration tests at tests/**/*.rs continue to work as before.

The Phase 6 post-step file gate (git status snapshot + comm -23
diff against test-pattern globs) is removed. With @test legitimately
writing inside src/, a path-based gate proves nothing — production
edits and cfg(test) edits live in the same files. The boundary is
enforced by the prompt rule and Phase 8 reviewer scrutiny.

Phase 5 test-file guidance updated to distinguish module vs
integration tests for Rust, with stub-first TDD applying to both
when symbols don't yet exist. The "Rust integration TDD: stub-first"
section is renamed to "Rust stub-first TDD" and now covers module
tests too. NOT_TESTABLE's "Rust unit-only" reason is replaced with
"Missing testability seam" for cases where the production code
needs a small change before tests can be authored.
This commit is contained in:
Harald Hoyer 2026-05-07 14:34:04 +02:00
parent 8373e32f34
commit 4dc3cffba6
2 changed files with 28 additions and 67 deletions

View file

@ -105,7 +105,7 @@ Do **not** provide:
- Drop-in code blocks longer than ~5 lines that constitute "the answer"
- Full function bodies for the changes being planned
- Complete `match` arms / branch logic / loop bodies for new behavior
- Pre-written test bodies (those come from `@test`, or from `@make` for Rust unit-only)
- Pre-written test bodies (those come from `@test`)
- Stage-by-stage code transformations spelled out as ready-to-commit diffs
If you've already written the implementation in the plan or spec, the artifact has overstepped. Convert finished code into structural description (signature + intent) and let `@make` produce the body.
@ -176,9 +176,10 @@ The test file path must follow the language's actual test layout. **Do not inven
- Colocated: `<module>/tests/test_<feature>.py (create)`
- Top-level: `tests/test_<feature>.py (create)`
- **Rust**
- Crate-level integration tests: `tests/<feature>.rs (create)` (or, in a workspace, `<crate>/tests/<feature>.rs`).
- **If the test references not-yet-existing public API**, the task automatically requires a **stub-first `@make` pre-pass** before `@test` runs (see Phase 6 → "Rust integration TDD: stub-first"). Plan for two `@make` dispatches per such task: stub pass, then body pass.
- **Unit-test-only tasks (in-source `#[cfg(test)] mod tests`):** mark the task as `NOT_TESTABLE` with reason `Rust unit-only``@test` cannot write inside production source. `@make` writes those inline as part of its production change.
- **Module tests** (most common — testing private/crate-internal functions): pick the relevant production source file, e.g. `src/<module>.rs`. `@test` is permitted to add or edit content **only inside `#[cfg(test)] mod <name> { … }` blocks** in that file (per `@test`'s File Constraint). The rest of the file remains read-only to `@test`.
- **Integration tests** (testing the crate's public API as a black box): `tests/<feature>.rs (create)`, or in a workspace `<crate>/tests/<feature>.rs`.
- **In both cases**, if the test references not-yet-existing functions/types, the task requires a **stub-first `@make` pre-pass** so the symbols exist as `todo!()` bodies before `@test` runs. See Phase 6 → "Rust stub-first TDD". Plan for two `@make` dispatches per such task: stub pass, then body pass.
- **`src/tests/<feature>.rs` is not a valid path** — it would be a regular submodule needing `mod tests;` in production code. Use one of the two forms above.
- **Polyglot Nix flake**
- Match the host language of the code under change (Python or Rust rules above), wrapping commands in `nix develop -c …` per the agents' devshell rule.
@ -232,73 +233,44 @@ For each task from Phase 5, dispatch `@test` with:
`@test` writes failing tests and verifies RED with structured failure codes.
**Post-step file gate (MANDATORY):**
Before dispatching `@test`, snapshot every modified, staged, *and untracked* file. `git diff --name-only` alone misses untracked files, which is precisely the state of any new test file `@test` creates (it cannot `git add`). Use `git status --porcelain` so the gate sees them:
```bash
git status --porcelain | sed 's/^...//' | sort -u > /tmp/pre_test_baseline.txt
```
After `@test` completes, list NEW changes (in the post-snapshot but not the pre-snapshot):
```bash
git status --porcelain | sed 's/^...//' | sort -u | comm -23 - /tmp/pre_test_baseline.txt > /tmp/test_new_files.txt
```
Each line in `/tmp/test_new_files.txt` is a file path that did not exist (or was unmodified) before `@test` ran. The gate validates each one against the patterns below.
All new files must match the project's test patterns:
- Python: `**/test_*.py`, `**/*_test.py`, `**/conftest.py` (new only), `**/test_data/**`, `**/test_fixtures/**`
- Rust: `tests/**/*.rs`, `**/tests/**/*.rs` (workspace-style `<crate>/tests/...`), `**/test_data/**`, `**/test_fixtures/**`
**Anti-patterns — discard the output even if the glob matches:**
- Anything under `src/` for Rust (e.g. `src/tests/foo.rs`, `src/**/tests/...`). `src/tests/` is a regular module path under `src/`, not a Rust test location, and `@test` cannot wire it up via `mod` declarations in production source. Such paths indicate the task spec gave a wrong test path — escalate, don't accept the file.
If any non-matching file appears, or any anti-pattern matches: discard `@test` output, report violation.
**Decision table — handling `@test` results:**
| Condition | Action |
|-----------|--------|
| `TESTS_READY` + `escalate_to_check: false` | Proceed to Phase 7 |
| `TESTS_READY` + `escalate_to_check: true` | Route tests to `@check` for light review. `@check` diagnoses, caller routes fixes to `@test`. Then proceed. |
| `NOT_TESTABLE` (general reasons) | Route to `@check` for sign-off on justification. If approved, task goes to `@make` without tests. |
| `NOT_TESTABLE` reason `Rust unit-only` | See "Rust unit-only routing" below. **Do not** include test code in the `@make` spec; pass test specs only. |
| `NOT_TESTABLE` | Route to `@check` for sign-off on justification. If `Missing testability seam`, dispatch `@make` to add the seam first, then re-run `@test`. Otherwise the task goes to `@make` without tests. |
| `BLOCKED` | Investigate. May need to revise task spec or plan. |
| Test passes immediately | Investigate — behavior may already exist. Task spec may be wrong. |
| Stub-first run: tests pass with zero `todo!()` panics | **Structural-only tests.** Every test is asserting type/struct/enum facts without calling any stubbed symbol. Reject the test output and route back to `@test` with a "must exercise the stubbed symbols by calling them" note. Do not let these tests gate Phase 7 — they cannot RED→GREEN, so the body-pass `@make` would commit code with false-green coverage. |
### Rust unit-only routing
### Rust stub-first TDD (mandatory for new symbols)
When `@test` returns `NOT_TESTABLE: Rust unit-only` (the implementation needs in-source `#[cfg(test)] mod tests` blocks that `@test` is forbidden from writing), the orchestrator must:
Whenever `@test` will write tests (module or integration) that reference functions / methods / types **that do not yet exist**, the test cannot RED meaningfully against absent code:
1. Get `@check`'s sign-off on the justification (no integration-test seam exists).
2. Build the `@make` spec with **test specifications**, not test code:
- "Add `#[cfg(test)] mod foo_tests` at the bottom of `<file>` exercising:"
- For each behavior, a one-line description: input → expected output, edge case to cover, error path to assert.
- Where applicable, name the function under test and the assertion type (`assert_eq!`, `assert!`, panic on invalid input).
3. **Forbidden in the `@make` spec:** complete `#[test] fn …` bodies, full module blocks, or any `@test`-style RED-verified test code. `@make` writes the inline tests itself based on the spec.
4. After `@make` completes, the orchestrator runs the test suite once to confirm RED→GREEN evidence and includes it in the workflow summary.
- *Module tests inside `src/<module>.rs`* — without the function, the `#[cfg(test)] mod tests` block fails to compile (`error[E0425]`), masking assertion diagnostics.
- *Integration tests inside `tests/<feature>.rs`* — same, but mediated through `lib.rs` re-exports.
This keeps the agents in their lanes: `@test` never writes inside `src/`, `@make` writes both the tests and the production code in a single coherent change, and the orchestrator sees explicit test pass evidence.
### Rust integration TDD: stub-first (mandatory)
Rust integration tests live in a separate test crate (`tests/<feature>.rs`) that imports from `lib.rs`. Any test referencing not-yet-existing public API can only RED at *build* time, which masks assertion diagnostics. To avoid this, **for every Rust task whose `@test` step writes an integration test against public API that does not yet exist**, dispatch a stub-first `@make` pass *before* `@test` runs:
To get a clean runtime RED, dispatch a **stub-first `@make` pass** *before* `@test` runs:
**Stub pass (split from Phase 7's body pass):**
1. Dispatch `@make` in **standard mode** (no tests exist yet) with this exact scope:
- **Goal:** add the planned public API as `todo!()`-bodied stubs so the integration test will compile.
- **Files to modify:** `src/lib.rs` (add `pub mod …;` declarations) plus any new `src/<module>.rs` files containing the stub functions/structs.
- **Stubs only:** every function body is exactly `todo!()`. Every method body is exactly `todo!()`. Public structs may use `pub struct Foo;` or `pub struct Foo { /* fields TBD */ }` — but no logic.
- **Signatures must match the planned final API exactly** (return types, lifetimes, generics) — otherwise the integration test will mismatch later. Lift signatures from the Phase 3 plan / Phase 5 task spec.
- **Goal:** add the planned API as `todo!()`-bodied stubs so the test will compile.
- **Files to modify:** the relevant `src/<module>.rs` for module tests, or `src/lib.rs` plus any new `src/<module>.rs` for integration tests (the latter need `pub mod …;` declarations so the test crate can import).
- **Stubs only:** every function body is exactly `todo!()`. Every method body is exactly `todo!()`. Structs may use `pub struct Foo;` or `pub struct Foo { /* fields TBD */ }` — but no logic.
- **Signatures must match the planned final API exactly** (return types, lifetimes, generics, visibility). Lift signatures from the Phase 3 plan / Phase 5 task spec.
- **Acceptance criteria:** `cargo check` (wrapped in `nix develop -c …` if the project has a devshell) passes; no test command is run.
- **Dispatch Hygiene still applies:** the stub pass is small and finalized — no draft bodies, no contradictory signatures.
2. Verify `cargo check` passed in `@make`'s output. If not, fix and re-dispatch the stub pass before continuing.
3. Dispatch `@test` as normal. The integration test now compiles; running it panics on `todo!()` at runtime, which is a clean `MISSING_BEHAVIOR` RED with a stack trace — far better than the build-error-RED form.
3. Dispatch `@test`. The test now compiles; running it panics on `todo!()` at runtime, which is a clean `MISSING_BEHAVIOR` RED with a stack trace — far better than the build-error-RED form.
4. **Panic-coverage check (MANDATORY).** After `@test` returns, re-run the test command in the orchestrator and verify that **every test in the new file panics on `todo!()`** (i.e. every test exercises at least one of the stubbed symbols). The decision rule:
- If the test output shows N panics for N tests → proceed to body pass.
- If any test passes without a `todo!()` panic → that test is structural-only (asserting type / variant-count / field facts without calling the stubbed code). **Reject** `@test`'s output and route back with the "Stub-first run: tests pass with zero `todo!()` panics" decision-table verdict. Require `@test` to rewrite each non-panicking test so it actually invokes the stubbed function/method.
- This check is the only thing standing between false-green coverage and the body-pass commit. Skipping it has produced regressions like a system that compiles, "passes" tests, and silently no-ops in production.
5. Continue to Phase 7's body pass (`@make` in TDD mode), where the same files are revisited and the `todo!()` bodies are replaced.
**This routing is mandatory** for new public API in Rust. It is **not** required when the integration test exercises an existing public API (e.g. a behavior fix where the function already exists) — in that case `@test` runs directly and `@make` modifies the body in Phase 7.
**This routing is mandatory** whenever new symbols are introduced in Rust (module or integration). It is **not** required when the test exercises an *existing* function/method (e.g. a behavior fix) — in that case `@test` runs directly and `@make` modifies the body in Phase 7.
The stub pass and the body pass each produce their own atomic commit (per Phase 9 rules): `feat(<scope>): scaffold <thing> with todo!() stubs` followed by `feat(<scope>): implement <thing>` (or whichever conventional type fits).