refactor(opencode): allow @test inside #[cfg(test)] mod blocks, drop file gate

The previous design routed Rust unit tests to NOT_TESTABLE: Rust
unit-only because @test was forbidden from touching src/, which
forced @make to write both the production code and the inline
#[cfg(test)] mod tests in one dispatch — losing TDD's RED→GREEN
separation. But Rust module tests inside #[cfg(test)] mod tests
{ ... } are the canonical unit-testing idiom, not an edge case.

@test's File Constraint now allows modifying src/**/*.rs, but
strictly inside #[cfg(test)] mod <name> { ... } blocks. Every line
outside such a block stays read-only — adding pub, importing crates,
declaring siblings, or any other production change is forbidden.
Integration tests at tests/**/*.rs continue to work as before.

The Phase 6 post-step file gate (git status snapshot + comm -23
diff against test-pattern globs) is removed. With @test legitimately
writing inside src/, a path-based gate proves nothing — production
edits and cfg(test) edits live in the same files. The boundary is
enforced by the prompt rule and Phase 8 reviewer scrutiny.

Phase 5 test-file guidance updated to distinguish module vs
integration tests for Rust, with stub-first TDD applying to both
when symbols don't yet exist. The "Rust integration TDD: stub-first"
section is renamed to "Rust stub-first TDD" and now covers module
tests too. NOT_TESTABLE's "Rust unit-only" reason is replaced with
"Missing testability seam" for cases where the production code
needs a small change before tests can be authored.
This commit is contained in:
Harald Hoyer 2026-05-07 14:34:04 +02:00
parent 8373e32f34
commit 4dc3cffba6
2 changed files with 28 additions and 67 deletions

View file

@ -113,29 +113,18 @@ Python:
- `**/test_data/**` - `**/test_data/**`
- `**/test_fixtures/**` - `**/test_fixtures/**`
Rust (integration tests only — see "Rust unit tests" below): Rust:
- `tests/**/*.rs` (crate-level integration tests directory) - **Integration tests:** `tests/**/*.rs` and `**/tests/**/*.rs` (workspace-style `<crate>/tests/...`). Create new files; do not modify existing integration tests in unrelated tasks.
- `**/tests/**/*.rs` (per-crate integration tests in workspace layouts) - **Module tests:** `src/**/*.rs` — but **only inside `#[cfg(test)] mod <name> { … }` blocks**. You may:
- `**/test_data/**` - Append a new `#[cfg(test)] mod tests { use super::*; … }` block at the end of an existing source file.
- `**/test_fixtures/**` - Add new `#[test] fn` items inside an already-existing `#[cfg(test)] mod` block.
- Edit/remove `#[test] fn` items you previously authored inside such a block.
- **Test data / fixtures:** `**/test_data/**`, `**/test_fixtures/**`.
**Anti-patterns — refuse the path even if the glob above matches:** **Strict boundary rule for Rust module tests:** every line outside a `#[cfg(test)] mod` block is read-only. Adding `pub`, changing function signatures, importing crates, declaring new `pub mod` siblings, touching the prelude, or any other production-code edit is forbidden — those changes belong to `@make`. If the test cannot be written without such a change, report the missing seam to the caller and return `NOT_TESTABLE` (or, for a fresh public API, request a stub-first `@make` pre-pass).
- Anything under `src/` (e.g. `src/tests/foo.rs`, `src/**/tests/...`). `src/tests/` is a regular module under `src/`; it would require declaring `mod tests;` in production code (`lib.rs` / `main.rs`) and creating `mod.rs`, which you cannot do. If the caller asks for such a path, treat it as a wrong task spec: return `BLOCKED` with a note that the path is not a valid Rust test location, suggesting `tests/<feature>.rs` (or `NOT_TESTABLE: Rust unit-only` if the test really needs to be in-source).
**You may NOT modify production/source code under any circumstances.** **Anti-patterns — refuse the path even if it would technically be writable:**
- `src/tests/foo.rs` and similar regular submodule paths under `src/`. These are not `#[cfg(test)]` modules — they are normal modules that would require a `mod tests;` declaration in production code (`lib.rs` / `main.rs`), which you cannot add. Report as `BLOCKED` and suggest either `tests/<feature>.rs` (integration) or a `#[cfg(test)] mod tests` block inside the relevant `src/<module>.rs`.
### Rust unit tests
Rust unit tests live inside production source files (inside `#[cfg(test)] mod tests { ... }` blocks in `src/**/*.rs`). Because that would require modifying production code, **you do not write Rust unit tests.** Options when the task spec requests unit-level coverage in Rust:
1. Convert to an integration test under `tests/` if the unit is part of the public API.
2. Return `NOT_TESTABLE` with reason `pure-wiring` or `external-system` if no integration-level seam exists, and let `@make` write the in-source tests.
Report this constraint to the caller rather than silently degrading coverage.
If you believe source code needs changes to be testable, report this to the caller — do not edit it yourself.
This constraint is enforced by a post-step file gate. Violations cause your output to be discarded.
## Test Philosophy ## Test Philosophy
@ -257,7 +246,7 @@ You may return `NOT_TESTABLE` only for these allowed reasons:
| **External system without harness** | Change only affects API call to service with no local mock possible | | **External system without harness** | Change only affects API call to service with no local mock possible |
| **Non-deterministic** | GPU numerical results, timing-dependent behavior | | **Non-deterministic** | GPU numerical results, timing-dependent behavior |
| **Pure wiring** | Decorator swap, import / `use` reorganization, no logic change | | **Pure wiring** | Decorator swap, import / `use` reorganization, no logic change |
| **Rust unit-only** | Coverage requires `#[cfg(test)]` mod tests in production source; @test cannot write those — let @make handle it | | **Missing testability seam** | Test would require a production-code change beyond a `#[cfg(test)] mod` block (e.g. a private function needs `pub(crate)`, a refactor exposes a hook). Report the missing seam so `@make` can add it before tests are authored. |
Must provide: Must provide:
- Which allowed reason applies - Which allowed reason applies

View file

@ -105,7 +105,7 @@ Do **not** provide:
- Drop-in code blocks longer than ~5 lines that constitute "the answer" - Drop-in code blocks longer than ~5 lines that constitute "the answer"
- Full function bodies for the changes being planned - Full function bodies for the changes being planned
- Complete `match` arms / branch logic / loop bodies for new behavior - Complete `match` arms / branch logic / loop bodies for new behavior
- Pre-written test bodies (those come from `@test`, or from `@make` for Rust unit-only) - Pre-written test bodies (those come from `@test`)
- Stage-by-stage code transformations spelled out as ready-to-commit diffs - Stage-by-stage code transformations spelled out as ready-to-commit diffs
If you've already written the implementation in the plan or spec, the artifact has overstepped. Convert finished code into structural description (signature + intent) and let `@make` produce the body. If you've already written the implementation in the plan or spec, the artifact has overstepped. Convert finished code into structural description (signature + intent) and let `@make` produce the body.
@ -176,9 +176,10 @@ The test file path must follow the language's actual test layout. **Do not inven
- Colocated: `<module>/tests/test_<feature>.py (create)` - Colocated: `<module>/tests/test_<feature>.py (create)`
- Top-level: `tests/test_<feature>.py (create)` - Top-level: `tests/test_<feature>.py (create)`
- **Rust** - **Rust**
- Crate-level integration tests: `tests/<feature>.rs (create)` (or, in a workspace, `<crate>/tests/<feature>.rs`). - **Module tests** (most common — testing private/crate-internal functions): pick the relevant production source file, e.g. `src/<module>.rs`. `@test` is permitted to add or edit content **only inside `#[cfg(test)] mod <name> { … }` blocks** in that file (per `@test`'s File Constraint). The rest of the file remains read-only to `@test`.
- **If the test references not-yet-existing public API**, the task automatically requires a **stub-first `@make` pre-pass** before `@test` runs (see Phase 6 → "Rust integration TDD: stub-first"). Plan for two `@make` dispatches per such task: stub pass, then body pass. - **Integration tests** (testing the crate's public API as a black box): `tests/<feature>.rs (create)`, or in a workspace `<crate>/tests/<feature>.rs`.
- **Unit-test-only tasks (in-source `#[cfg(test)] mod tests`):** mark the task as `NOT_TESTABLE` with reason `Rust unit-only``@test` cannot write inside production source. `@make` writes those inline as part of its production change. - **In both cases**, if the test references not-yet-existing functions/types, the task requires a **stub-first `@make` pre-pass** so the symbols exist as `todo!()` bodies before `@test` runs. See Phase 6 → "Rust stub-first TDD". Plan for two `@make` dispatches per such task: stub pass, then body pass.
- **`src/tests/<feature>.rs` is not a valid path** — it would be a regular submodule needing `mod tests;` in production code. Use one of the two forms above.
- **Polyglot Nix flake** - **Polyglot Nix flake**
- Match the host language of the code under change (Python or Rust rules above), wrapping commands in `nix develop -c …` per the agents' devshell rule. - Match the host language of the code under change (Python or Rust rules above), wrapping commands in `nix develop -c …` per the agents' devshell rule.
@ -232,73 +233,44 @@ For each task from Phase 5, dispatch `@test` with:
`@test` writes failing tests and verifies RED with structured failure codes. `@test` writes failing tests and verifies RED with structured failure codes.
**Post-step file gate (MANDATORY):**
Before dispatching `@test`, snapshot every modified, staged, *and untracked* file. `git diff --name-only` alone misses untracked files, which is precisely the state of any new test file `@test` creates (it cannot `git add`). Use `git status --porcelain` so the gate sees them:
```bash
git status --porcelain | sed 's/^...//' | sort -u > /tmp/pre_test_baseline.txt
```
After `@test` completes, list NEW changes (in the post-snapshot but not the pre-snapshot):
```bash
git status --porcelain | sed 's/^...//' | sort -u | comm -23 - /tmp/pre_test_baseline.txt > /tmp/test_new_files.txt
```
Each line in `/tmp/test_new_files.txt` is a file path that did not exist (or was unmodified) before `@test` ran. The gate validates each one against the patterns below.
All new files must match the project's test patterns:
- Python: `**/test_*.py`, `**/*_test.py`, `**/conftest.py` (new only), `**/test_data/**`, `**/test_fixtures/**`
- Rust: `tests/**/*.rs`, `**/tests/**/*.rs` (workspace-style `<crate>/tests/...`), `**/test_data/**`, `**/test_fixtures/**`
**Anti-patterns — discard the output even if the glob matches:**
- Anything under `src/` for Rust (e.g. `src/tests/foo.rs`, `src/**/tests/...`). `src/tests/` is a regular module path under `src/`, not a Rust test location, and `@test` cannot wire it up via `mod` declarations in production source. Such paths indicate the task spec gave a wrong test path — escalate, don't accept the file.
If any non-matching file appears, or any anti-pattern matches: discard `@test` output, report violation.
**Decision table — handling `@test` results:** **Decision table — handling `@test` results:**
| Condition | Action | | Condition | Action |
|-----------|--------| |-----------|--------|
| `TESTS_READY` + `escalate_to_check: false` | Proceed to Phase 7 | | `TESTS_READY` + `escalate_to_check: false` | Proceed to Phase 7 |
| `TESTS_READY` + `escalate_to_check: true` | Route tests to `@check` for light review. `@check` diagnoses, caller routes fixes to `@test`. Then proceed. | | `TESTS_READY` + `escalate_to_check: true` | Route tests to `@check` for light review. `@check` diagnoses, caller routes fixes to `@test`. Then proceed. |
| `NOT_TESTABLE` (general reasons) | Route to `@check` for sign-off on justification. If approved, task goes to `@make` without tests. | | `NOT_TESTABLE` | Route to `@check` for sign-off on justification. If `Missing testability seam`, dispatch `@make` to add the seam first, then re-run `@test`. Otherwise the task goes to `@make` without tests. |
| `NOT_TESTABLE` reason `Rust unit-only` | See "Rust unit-only routing" below. **Do not** include test code in the `@make` spec; pass test specs only. |
| `BLOCKED` | Investigate. May need to revise task spec or plan. | | `BLOCKED` | Investigate. May need to revise task spec or plan. |
| Test passes immediately | Investigate — behavior may already exist. Task spec may be wrong. | | Test passes immediately | Investigate — behavior may already exist. Task spec may be wrong. |
| Stub-first run: tests pass with zero `todo!()` panics | **Structural-only tests.** Every test is asserting type/struct/enum facts without calling any stubbed symbol. Reject the test output and route back to `@test` with a "must exercise the stubbed symbols by calling them" note. Do not let these tests gate Phase 7 — they cannot RED→GREEN, so the body-pass `@make` would commit code with false-green coverage. | | Stub-first run: tests pass with zero `todo!()` panics | **Structural-only tests.** Every test is asserting type/struct/enum facts without calling any stubbed symbol. Reject the test output and route back to `@test` with a "must exercise the stubbed symbols by calling them" note. Do not let these tests gate Phase 7 — they cannot RED→GREEN, so the body-pass `@make` would commit code with false-green coverage. |
### Rust unit-only routing ### Rust stub-first TDD (mandatory for new symbols)
When `@test` returns `NOT_TESTABLE: Rust unit-only` (the implementation needs in-source `#[cfg(test)] mod tests` blocks that `@test` is forbidden from writing), the orchestrator must: Whenever `@test` will write tests (module or integration) that reference functions / methods / types **that do not yet exist**, the test cannot RED meaningfully against absent code:
1. Get `@check`'s sign-off on the justification (no integration-test seam exists). - *Module tests inside `src/<module>.rs`* — without the function, the `#[cfg(test)] mod tests` block fails to compile (`error[E0425]`), masking assertion diagnostics.
2. Build the `@make` spec with **test specifications**, not test code: - *Integration tests inside `tests/<feature>.rs`* — same, but mediated through `lib.rs` re-exports.
- "Add `#[cfg(test)] mod foo_tests` at the bottom of `<file>` exercising:"
- For each behavior, a one-line description: input → expected output, edge case to cover, error path to assert.
- Where applicable, name the function under test and the assertion type (`assert_eq!`, `assert!`, panic on invalid input).
3. **Forbidden in the `@make` spec:** complete `#[test] fn …` bodies, full module blocks, or any `@test`-style RED-verified test code. `@make` writes the inline tests itself based on the spec.
4. After `@make` completes, the orchestrator runs the test suite once to confirm RED→GREEN evidence and includes it in the workflow summary.
This keeps the agents in their lanes: `@test` never writes inside `src/`, `@make` writes both the tests and the production code in a single coherent change, and the orchestrator sees explicit test pass evidence. To get a clean runtime RED, dispatch a **stub-first `@make` pass** *before* `@test` runs:
### Rust integration TDD: stub-first (mandatory)
Rust integration tests live in a separate test crate (`tests/<feature>.rs`) that imports from `lib.rs`. Any test referencing not-yet-existing public API can only RED at *build* time, which masks assertion diagnostics. To avoid this, **for every Rust task whose `@test` step writes an integration test against public API that does not yet exist**, dispatch a stub-first `@make` pass *before* `@test` runs:
**Stub pass (split from Phase 7's body pass):** **Stub pass (split from Phase 7's body pass):**
1. Dispatch `@make` in **standard mode** (no tests exist yet) with this exact scope: 1. Dispatch `@make` in **standard mode** (no tests exist yet) with this exact scope:
- **Goal:** add the planned public API as `todo!()`-bodied stubs so the integration test will compile. - **Goal:** add the planned API as `todo!()`-bodied stubs so the test will compile.
- **Files to modify:** `src/lib.rs` (add `pub mod …;` declarations) plus any new `src/<module>.rs` files containing the stub functions/structs. - **Files to modify:** the relevant `src/<module>.rs` for module tests, or `src/lib.rs` plus any new `src/<module>.rs` for integration tests (the latter need `pub mod …;` declarations so the test crate can import).
- **Stubs only:** every function body is exactly `todo!()`. Every method body is exactly `todo!()`. Public structs may use `pub struct Foo;` or `pub struct Foo { /* fields TBD */ }` — but no logic. - **Stubs only:** every function body is exactly `todo!()`. Every method body is exactly `todo!()`. Structs may use `pub struct Foo;` or `pub struct Foo { /* fields TBD */ }` — but no logic.
- **Signatures must match the planned final API exactly** (return types, lifetimes, generics) — otherwise the integration test will mismatch later. Lift signatures from the Phase 3 plan / Phase 5 task spec. - **Signatures must match the planned final API exactly** (return types, lifetimes, generics, visibility). Lift signatures from the Phase 3 plan / Phase 5 task spec.
- **Acceptance criteria:** `cargo check` (wrapped in `nix develop -c …` if the project has a devshell) passes; no test command is run. - **Acceptance criteria:** `cargo check` (wrapped in `nix develop -c …` if the project has a devshell) passes; no test command is run.
- **Dispatch Hygiene still applies:** the stub pass is small and finalized — no draft bodies, no contradictory signatures. - **Dispatch Hygiene still applies:** the stub pass is small and finalized — no draft bodies, no contradictory signatures.
2. Verify `cargo check` passed in `@make`'s output. If not, fix and re-dispatch the stub pass before continuing. 2. Verify `cargo check` passed in `@make`'s output. If not, fix and re-dispatch the stub pass before continuing.
3. Dispatch `@test` as normal. The integration test now compiles; running it panics on `todo!()` at runtime, which is a clean `MISSING_BEHAVIOR` RED with a stack trace — far better than the build-error-RED form. 3. Dispatch `@test`. The test now compiles; running it panics on `todo!()` at runtime, which is a clean `MISSING_BEHAVIOR` RED with a stack trace — far better than the build-error-RED form.
4. **Panic-coverage check (MANDATORY).** After `@test` returns, re-run the test command in the orchestrator and verify that **every test in the new file panics on `todo!()`** (i.e. every test exercises at least one of the stubbed symbols). The decision rule: 4. **Panic-coverage check (MANDATORY).** After `@test` returns, re-run the test command in the orchestrator and verify that **every test in the new file panics on `todo!()`** (i.e. every test exercises at least one of the stubbed symbols). The decision rule:
- If the test output shows N panics for N tests → proceed to body pass. - If the test output shows N panics for N tests → proceed to body pass.
- If any test passes without a `todo!()` panic → that test is structural-only (asserting type / variant-count / field facts without calling the stubbed code). **Reject** `@test`'s output and route back with the "Stub-first run: tests pass with zero `todo!()` panics" decision-table verdict. Require `@test` to rewrite each non-panicking test so it actually invokes the stubbed function/method. - If any test passes without a `todo!()` panic → that test is structural-only (asserting type / variant-count / field facts without calling the stubbed code). **Reject** `@test`'s output and route back with the "Stub-first run: tests pass with zero `todo!()` panics" decision-table verdict. Require `@test` to rewrite each non-panicking test so it actually invokes the stubbed function/method.
- This check is the only thing standing between false-green coverage and the body-pass commit. Skipping it has produced regressions like a system that compiles, "passes" tests, and silently no-ops in production. - This check is the only thing standing between false-green coverage and the body-pass commit. Skipping it has produced regressions like a system that compiles, "passes" tests, and silently no-ops in production.
5. Continue to Phase 7's body pass (`@make` in TDD mode), where the same files are revisited and the `todo!()` bodies are replaced. 5. Continue to Phase 7's body pass (`@make` in TDD mode), where the same files are revisited and the `todo!()` bodies are replaced.
**This routing is mandatory** for new public API in Rust. It is **not** required when the integration test exercises an existing public API (e.g. a behavior fix where the function already exists) — in that case `@test` runs directly and `@make` modifies the body in Phase 7. **This routing is mandatory** whenever new symbols are introduced in Rust (module or integration). It is **not** required when the test exercises an *existing* function/method (e.g. a behavior fix) — in that case `@test` runs directly and `@make` modifies the body in Phase 7.
The stub pass and the body pass each produce their own atomic commit (per Phase 9 rules): `feat(<scope>): scaffold <thing> with todo!() stubs` followed by `feat(<scope>): implement <thing>` (or whichever conventional type fits). The stub pass and the body pass each produce their own atomic commit (per Phase 9 rules): `feat(<scope>): scaffold <thing> with todo!() stubs` followed by `feat(<scope>): implement <thing>` (or whichever conventional type fits).