fix(opencode): close two false-green test loopholes and the orchestrator-as-implementer escape hatch
A workflow run on a Bevy weaving feature exposed two compounding
failures:
1. @test wrote 8 structural-only Rust tests that never invoked
weave_enemies or trigger_weaving. Every test passed against the
stub-first @make pre-pass because none of them called the
stubbed symbols, so todo!() never fired. The body-pass committed
code that "passed" the suite and silently broke trigger_weaving
in special stages.
2. @check found the trigger_weaving regression at Phase 8 (final
review) and the orchestrator decided to "fix them directly"
rather than dispatching @make — taking the license offered by
the existing review-loop wording.
Test-quality fixes:
- Phase 3 Test Design now requires each behavior to be expressed as
an action + observable outcome. Structural facts ("enum has 3
variants", "struct has these fields") are explicitly disqualified.
- Phase 6 stub-first flow gains a mandatory Panic-coverage check:
after @test returns, the orchestrator re-runs the test command and
rejects the output unless every test panics on todo!() (i.e. every
test exercises at least one stubbed symbol). Any passing test is
structural-only and routes back to @test.
- Phase 6 decision table gets a "Stub-first run: tests pass with zero
todo!() panics" row covering the same case.
- @test's Test Philosophy gains an explicit Do-NOT-write list of
structural-only patterns (variant_count, type ascriptions,
Box::new(my_fn), struct-literal-only flows, all-pass-on-stubs)
plus a positive rule: every test must call a function and assert
on observable outcome, or return NOT_TESTABLE rather than pad the
suite.
Orchestrator boundary fix:
- Phase 8 review loop replaces "fix them directly (no need to
re-dispatch @make for small fixes)" with the principle "the
orchestrator does not write production code; @make does". BLOCK,
behavioral, correctness, and test-quality findings round-trip
through @make. Only AST-preserving cosmetic edits (typos in
comments, trailing newlines) may be applied directly. Compiler-
detected issues (unused imports, dead code) go through @make.
This commit is contained in:
parent
91e8aab383
commit
91ba5bd272
2 changed files with 20 additions and 4 deletions
|
|
@ -149,6 +149,14 @@ This constraint is enforced by a post-step file gate. Violations cause your outp
|
||||||
- Trivial tests (constructor creates object, getter returns value)
|
- Trivial tests (constructor creates object, getter returns value)
|
||||||
- Tests that assert on mock behavior rather than real behavior
|
- Tests that assert on mock behavior rather than real behavior
|
||||||
- Tests requiring excessive mocking (>2 mocks suggests design problem — report it)
|
- Tests requiring excessive mocking (>2 mocks suggests design problem — report it)
|
||||||
|
- **Structural-only tests** that never invoke the function/method under test. Forbidden patterns:
|
||||||
|
- `assert_eq!(std::mem::variant_count::<X>(), N)` — variant count is a refactor-tripwire, not behavior.
|
||||||
|
- `let _: TypeName = …;` / `let _: fn(…) -> _ = my_fn;` — a type ascription that compiles tells you the symbol exists, not what it does.
|
||||||
|
- `Box::new(my_fn)` / `&my_fn as &dyn Fn(…)` — coercing a function pointer is not calling it.
|
||||||
|
- Struct-literal construction (`Foo { a: 1, b: 2 }`) followed only by field re-reads — that exercises field access, not the methods that mutate or read state.
|
||||||
|
- Tests in a stub-first scenario where every test passes without a `todo!()` panic — by definition no test actually called the stub.
|
||||||
|
|
||||||
|
**Positive rule — every test MUST exercise behavior.** Each test body must call at least one function or method that is the subject of the task and assert on an *observable outcome* (return value, mutated state, raised error, side effect). If the only thing you can write is a structural assertion, the task is "no test needed" — report it back to the caller as `NOT_TESTABLE` (with a clear reason) rather than padding the suite with type-only tests that produce false-green coverage.
|
||||||
|
|
||||||
**Follow existing codebase patterns** (per language):
|
**Follow existing codebase patterns** (per language):
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -57,8 +57,8 @@ The plan should include:
|
||||||
- New files to create
|
- New files to create
|
||||||
- Risks and open questions
|
- Risks and open questions
|
||||||
- **Test Design (conditional — include for non-trivial tasks):**
|
- **Test Design (conditional — include for non-trivial tasks):**
|
||||||
- Key behaviors to verify (what tests should assert)
|
- Key behaviors to verify, expressed as **action + observable outcome** (e.g. *"call `weave_enemies` with t=0.5 → enemy `Transform.translation.x` differs from initial position"*). A structural fact like *"enum has 3 variants"* or *"struct has these fields"* is **not** a behavior — it cannot fail meaningfully and does not exercise the code under test.
|
||||||
- Edge cases and error conditions worth testing
|
- Edge cases and error conditions worth testing (also expressed as actions, not structure)
|
||||||
- What explicitly should NOT be tested (prevents bloat)
|
- What explicitly should NOT be tested (prevents bloat)
|
||||||
- Testability concerns (heavy external deps, GPU-only paths, etc.)
|
- Testability concerns (heavy external deps, GPU-only paths, etc.)
|
||||||
|
|
||||||
|
|
@ -257,6 +257,7 @@ If any non-matching file appears, or any anti-pattern matches: discard `@test` o
|
||||||
| `NOT_TESTABLE` reason `Rust unit-only` | See "Rust unit-only routing" below. **Do not** include test code in the `@make` spec; pass test specs only. |
|
| `NOT_TESTABLE` reason `Rust unit-only` | See "Rust unit-only routing" below. **Do not** include test code in the `@make` spec; pass test specs only. |
|
||||||
| `BLOCKED` | Investigate. May need to revise task spec or plan. |
|
| `BLOCKED` | Investigate. May need to revise task spec or plan. |
|
||||||
| Test passes immediately | Investigate — behavior may already exist. Task spec may be wrong. |
|
| Test passes immediately | Investigate — behavior may already exist. Task spec may be wrong. |
|
||||||
|
| Stub-first run: tests pass with zero `todo!()` panics | **Structural-only tests.** Every test is asserting type/struct/enum facts without calling any stubbed symbol. Reject the test output and route back to `@test` with a "must exercise the stubbed symbols by calling them" note. Do not let these tests gate Phase 7 — they cannot RED→GREEN, so the body-pass `@make` would commit code with false-green coverage. |
|
||||||
|
|
||||||
### Rust unit-only routing
|
### Rust unit-only routing
|
||||||
|
|
||||||
|
|
@ -287,7 +288,11 @@ Rust integration tests live in a separate test crate (`tests/<feature>.rs`) that
|
||||||
- **Dispatch Hygiene still applies:** the stub pass is small and finalized — no draft bodies, no contradictory signatures.
|
- **Dispatch Hygiene still applies:** the stub pass is small and finalized — no draft bodies, no contradictory signatures.
|
||||||
2. Verify `cargo check` passed in `@make`'s output. If not, fix and re-dispatch the stub pass before continuing.
|
2. Verify `cargo check` passed in `@make`'s output. If not, fix and re-dispatch the stub pass before continuing.
|
||||||
3. Dispatch `@test` as normal. The integration test now compiles; running it panics on `todo!()` at runtime, which is a clean `MISSING_BEHAVIOR` RED with a stack trace — far better than the build-error-RED form.
|
3. Dispatch `@test` as normal. The integration test now compiles; running it panics on `todo!()` at runtime, which is a clean `MISSING_BEHAVIOR` RED with a stack trace — far better than the build-error-RED form.
|
||||||
4. Continue to Phase 7's body pass (`@make` in TDD mode), where the same files are revisited and the `todo!()` bodies are replaced.
|
4. **Panic-coverage check (MANDATORY).** After `@test` returns, re-run the test command in the orchestrator and verify that **every test in the new file panics on `todo!()`** (i.e. every test exercises at least one of the stubbed symbols). The decision rule:
|
||||||
|
- If the test output shows N panics for N tests → proceed to body pass.
|
||||||
|
- If any test passes without a `todo!()` panic → that test is structural-only (asserting type / variant-count / field facts without calling the stubbed code). **Reject** `@test`'s output and route back with the "Stub-first run: tests pass with zero `todo!()` panics" decision-table verdict. Require `@test` to rewrite each non-panicking test so it actually invokes the stubbed function/method.
|
||||||
|
- This check is the only thing standing between false-green coverage and the body-pass commit. Skipping it has produced regressions like a system that compiles, "passes" tests, and silently no-ops in production.
|
||||||
|
5. Continue to Phase 7's body pass (`@make` in TDD mode), where the same files are revisited and the `todo!()` bodies are replaced.
|
||||||
|
|
||||||
**This routing is mandatory** for new public API in Rust. It is **not** required when the integration test exercises an existing public API (e.g. a behavior fix where the function already exists) — in that case `@test` runs directly and `@make` modifies the body in Phase 7.
|
**This routing is mandatory** for new public API in Rust. It is **not** required when the integration test exercises an existing public API (e.g. a behavior fix where the function already exists) — in that case `@test` runs directly and `@make` modifies the body in Phase 7.
|
||||||
|
|
||||||
|
|
@ -353,7 +358,10 @@ Provide reviewers with:
|
||||||
1. Send implementation to both reviewers
|
1. Send implementation to both reviewers
|
||||||
2. Merge findings (same precedence rules as Phase 4)
|
2. Merge findings (same precedence rules as Phase 4)
|
||||||
3. If ACCEPTABLE: proceed to Phase 9
|
3. If ACCEPTABLE: proceed to Phase 9
|
||||||
4. If issues found: fix them directly (no need to re-dispatch `@make` for small fixes), then re-review
|
4. If issues found, route per the kind of finding — **the orchestrator does not write production code; `@make` does**:
|
||||||
|
- **`BLOCK`, behavioral, correctness, or test-quality findings:** build a new `@make` task spec from the finding (apply Dispatch Hygiene, finalized text, no draft answer). Dispatch `@make`. Do **not** fix directly. Every `BLOCK` is by definition behavioral and must round-trip through `@make`.
|
||||||
|
- **Strictly cosmetic findings** (typo in a comment, missing trailing newline, formatting that does not change the AST or behavior): the orchestrator may fix directly, then re-review. Anything compiler-detected (unused import, dead code) goes through `@make`, since removing it is still a code change.
|
||||||
|
- When in doubt, dispatch `@make`.
|
||||||
5. **Convergence detection:** same findings twice = stop loop early
|
5. **Convergence detection:** same findings twice = stop loop early
|
||||||
6. If unresolved after 3 cycles: document blockers, proceed to commit anyway
|
6. If unresolved after 3 cycles: document blockers, proceed to commit anyway
|
||||||
|
|
||||||
|
|
|
||||||
Loading…
Add table
Add a link
Reference in a new issue