fix(opencode): close two false-green test loopholes and the orchestrator-as-implementer escape hatch

A workflow run on a Bevy weaving feature exposed two compounding
failures:

1. @test wrote 8 structural-only Rust tests that never invoked
   weave_enemies or trigger_weaving. Every test passed against the
   stub-first @make pre-pass because none of them called the
   stubbed symbols, so todo!() never fired. The body-pass committed
   code that "passed" the suite and silently broke trigger_weaving
   in special stages.

2. @check found the trigger_weaving regression at Phase 8 (final
   review) and the orchestrator decided to "fix them directly"
   rather than dispatching @make — taking the license offered by
   the existing review-loop wording.

Test-quality fixes:

- Phase 3 Test Design now requires each behavior to be expressed as
  an action + observable outcome. Structural facts ("enum has 3
  variants", "struct has these fields") are explicitly disqualified.
- Phase 6 stub-first flow gains a mandatory Panic-coverage check:
  after @test returns, the orchestrator re-runs the test command and
  rejects the output unless every test panics on todo!() (i.e. every
  test exercises at least one stubbed symbol). Any passing test is
  structural-only and routes back to @test.
- Phase 6 decision table gets a "Stub-first run: tests pass with zero
  todo!() panics" row covering the same case.
- @test's Test Philosophy gains an explicit Do-NOT-write list of
  structural-only patterns (variant_count, type ascriptions,
  Box::new(my_fn), struct-literal-only flows, all-pass-on-stubs)
  plus a positive rule: every test must call a function and assert
  on observable outcome, or return NOT_TESTABLE rather than pad the
  suite.

Orchestrator boundary fix:

- Phase 8 review loop replaces "fix them directly (no need to
  re-dispatch @make for small fixes)" with the principle "the
  orchestrator does not write production code; @make does". BLOCK,
  behavioral, correctness, and test-quality findings round-trip
  through @make. Only AST-preserving cosmetic edits (typos in
  comments, trailing newlines) may be applied directly. Compiler-
  detected issues (unused imports, dead code) go through @make.
This commit is contained in:
Harald Hoyer 2026-05-07 09:07:41 +02:00
parent 91e8aab383
commit 91ba5bd272
2 changed files with 20 additions and 4 deletions

View file

@ -149,6 +149,14 @@ This constraint is enforced by a post-step file gate. Violations cause your outp
- Trivial tests (constructor creates object, getter returns value)
- Tests that assert on mock behavior rather than real behavior
- Tests requiring excessive mocking (>2 mocks suggests design problem — report it)
- **Structural-only tests** that never invoke the function/method under test. Forbidden patterns:
- `assert_eq!(std::mem::variant_count::<X>(), N)` — variant count is a refactor-tripwire, not behavior.
- `let _: TypeName = …;` / `let _: fn(…) -> _ = my_fn;` — a type ascription that compiles tells you the symbol exists, not what it does.
- `Box::new(my_fn)` / `&my_fn as &dyn Fn(…)` — coercing a function pointer is not calling it.
- Struct-literal construction (`Foo { a: 1, b: 2 }`) followed only by field re-reads — that exercises field access, not the methods that mutate or read state.
- Tests in a stub-first scenario where every test passes without a `todo!()` panic — by definition no test actually called the stub.
**Positive rule — every test MUST exercise behavior.** Each test body must call at least one function or method that is the subject of the task and assert on an *observable outcome* (return value, mutated state, raised error, side effect). If the only thing you can write is a structural assertion, the task is "no test needed" — report it back to the caller as `NOT_TESTABLE` (with a clear reason) rather than padding the suite with type-only tests that produce false-green coverage.
**Follow existing codebase patterns** (per language):