A workflow run on a Bevy weaving feature exposed two compounding
failures:
1. @test wrote 8 structural-only Rust tests that never invoked
weave_enemies or trigger_weaving. Every test passed against the
stub-first @make pre-pass because none of them called the
stubbed symbols, so todo!() never fired. The body-pass committed
code that "passed" the suite and silently broke trigger_weaving
in special stages.
2. @check found the trigger_weaving regression at Phase 8 (final
review) and the orchestrator decided to "fix them directly"
rather than dispatching @make — taking the license offered by
the existing review-loop wording.
Test-quality fixes:
- Phase 3 Test Design now requires each behavior to be expressed as
an action + observable outcome. Structural facts ("enum has 3
variants", "struct has these fields") are explicitly disqualified.
- Phase 6 stub-first flow gains a mandatory Panic-coverage check:
after @test returns, the orchestrator re-runs the test command and
rejects the output unless every test panics on todo!() (i.e. every
test exercises at least one stubbed symbol). Any passing test is
structural-only and routes back to @test.
- Phase 6 decision table gets a "Stub-first run: tests pass with zero
todo!() panics" row covering the same case.
- @test's Test Philosophy gains an explicit Do-NOT-write list of
structural-only patterns (variant_count, type ascriptions,
Box::new(my_fn), struct-literal-only flows, all-pass-on-stubs)
plus a positive rule: every test must call a function and assert
on observable outcome, or return NOT_TESTABLE rather than pad the
suite.
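The difference between a structural-only test and a behavioral one can be sketched as follows. This is an illustration only: the `WeavePattern` enum, the `weave_enemies` signature, and the `panics` helper are hypothetical stand-ins, since the real signatures are not shown here. The structural check passes against a `todo!()` stub, while the behavioral check panics, which is what the Panic-coverage check looks for.

```rust
use std::panic;

// Hypothetical stand-in type; the real feature's types are not shown in this log.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum WeavePattern {
    Straight,
    Zigzag,
    Spiral,
}

/// Stub as a stub-first pre-pass might leave it (assumed signature).
pub fn weave_enemies(_count: usize, _pattern: WeavePattern) -> Vec<(f32, f32)> {
    todo!("weave_enemies")
}

/// Structural-only: never calls the stub, so it passes against it.
pub fn structural_check() {
    let all = [
        WeavePattern::Straight,
        WeavePattern::Zigzag,
        WeavePattern::Spiral,
    ];
    assert_eq!(all.len(), 3);
}

/// Behavioral: performs an action and asserts on an observable outcome.
pub fn behavioral_check() {
    let positions = weave_enemies(4, WeavePattern::Zigzag);
    assert_eq!(positions.len(), 4);
}

/// Returns true when running `check` panicked.
pub fn panics(check: fn()) -> bool {
    panic::catch_unwind(check).is_err()
}
```

Under these assumptions, `panics(structural_check)` is false and `panics(behavioral_check)` is true, so only the second test would survive the Panic-coverage check.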
Orchestrator boundary fix:
- Phase 8 review loop replaces "fix them directly (no need to
re-dispatch @make for small fixes)" with the principle "the
orchestrator does not write production code; @make does". BLOCK,
behavioral, correctness, and test-quality findings round-trip
through @make. Only AST-preserving cosmetic edits (typos in
comments, trailing newlines) may be applied directly. Compiler-
detected issues (unused imports, dead code) go through @make.
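The routing rule above can be read as a small decision function. The sketch below is a hypothetical model, not the orchestrator's actual implementation: the `Finding` and `Route` names are invented for illustration, and the real rule lives in the workflow prompt text.

```rust
/// Hypothetical categories of review findings, mirroring the wording above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Finding {
    Block,
    Behavioral,
    Correctness,
    TestQuality,
    /// Compiler-detected issues such as unused imports or dead code.
    CompilerDetected,
    /// AST-preserving edits: typos in comments, trailing newlines.
    Cosmetic,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Route {
    DispatchMake,
    ApplyDirectly,
}

/// Only cosmetic, AST-preserving edits bypass @make.
pub fn route(finding: Finding) -> Route {
    match finding {
        Finding::Cosmetic => Route::ApplyDirectly,
        Finding::Block
        | Finding::Behavioral
        | Finding::Correctness
        | Finding::TestQuality
        | Finding::CompilerDetected => Route::DispatchMake,
    }
}
```

Listing every variant explicitly, rather than using a wildcard arm, means a newly added finding category fails to compile until someone decides where it routes.
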
Rust integration tests live in a separate test crate that imports from
lib.rs, so any test referencing not-yet-existing public API can only
fail (RED) at build time. The build error masks assertion diagnostics
and makes the RED state opaque: no stack trace, no left/right values.
For Rust tasks whose @test step writes an integration test against
public API that does not yet exist, the orchestrator now dispatches a
stub-first @make pass before @test runs:
1. @make adds the planned public API as todo!()-bodied stubs in
lib.rs and any new src/<module>.rs, with signatures lifted verbatim
from the Phase 5 task spec. The only acceptance criterion is
cargo check; no test command runs.
2. @test writes the integration test, which now compiles and panics
at todo!() with a stack trace — a clean MISSING_BEHAVIOR RED.
3. Phase 7 dispatches @make again to replace the todo!() bodies with
real implementations. Two atomic commits per task: scaffold then
implement.
Phase 5's Rust test-path guidance now flags the two-dispatch
requirement up front. test.md's Rust failure-classification hints
recognize todo!() / unimplemented!() panics as MISSING_BEHAVIOR with
a pointer to the workflow's stub-first section.
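A minimal sketch of the classification hint, assuming the classifier sees the panic message as plain text. The `Classification` enum and `classify_panic` function are hypothetical names; the only facts relied on are that `todo!()` panics with "not yet implemented" and `unimplemented!()` panics with "not implemented".

```rust
/// Hypothetical result type; MISSING_BEHAVIOR is the label used in test.md.
#[derive(Debug, PartialEq, Eq)]
pub enum Classification {
    MissingBehavior,
    Other,
}

/// Classifies a panic message from a failed test run.
pub fn classify_panic(message: &str) -> Classification {
    // todo!() panics with "not yet implemented";
    // unimplemented!() panics with "not implemented".
    if message.contains("not yet implemented") || message.contains("not implemented") {
        Classification::MissingBehavior
    } else {
        Classification::Other
    }
}
```
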
A workflow run on a Bevy/Rust project produced the test-file path
`src/tests/test_<feature>.rs`, which @test correctly flagged as
contradictory: it isn't a valid Rust test location (would require
declaring `mod tests;` in production source, which @test cannot do)
yet the file-gate glob `**/tests/**/*.rs` accidentally matched it.
Phase 5 now gives language-aware Test File guidance: Python uses
colocated or top-level `tests/`, Rust uses crate-level `tests/<feature>.rs`,
and Rust unit-only tasks are routed to NOT_TESTABLE for @make to
handle inline. Phase 6's file gate gains an explicit anti-pattern
clause discarding any new file under `src/` even when the glob matches.
@test's own File Constraint mirrors the anti-pattern so the agent
rejects the bad path with BLOCKED before the orchestrator's gate
even runs — defense in depth on both sides of the dispatch boundary.
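The anti-pattern clause can be sketched as a path check layered on top of the glob. This is an approximation under stated assumptions: `rust_test_path_allowed` is a hypothetical name, the glob is modelled as "a `.rs` file below some `tests` directory", and any `src` directory in the path is treated as disqualifying.

```rust
use std::path::{Component, Path};

/// Returns true for paths like `tests/<feature>.rs` and false for anything
/// under a `src/` directory, even if a `tests` directory also appears.
pub fn rust_test_path_allowed(path: &str) -> bool {
    let path = Path::new(path);
    let is_rs = path.extension().and_then(|ext| ext.to_str()) == Some("rs");

    // Collect the plain path segments, ignoring `.`, `..`, and root markers.
    let names: Vec<&str> = path
        .components()
        .filter_map(|c| match c {
            Component::Normal(name) => name.to_str(),
            _ => None,
        })
        .collect();

    // Directories only: drop the final segment, which is the file name.
    let dirs = &names[..names.len().saturating_sub(1)];
    let under_tests = dirs.contains(&"tests");
    let under_src = dirs.contains(&"src");

    is_rs && under_tests && !under_src
}
```

With this sketch, `tests/weaving.rs` is accepted while `src/tests/test_weaving.rs` is rejected even though it sits under a `tests` directory.
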
Both agents previously hardcoded the Python/uv toolchain. They now
detect the language from marker files (pyproject.toml, Cargo.toml,
flake.nix) and run the appropriate test/lint/format/type-check commands
for Python, Rust, or both. When a flake.nix devshell is present, every
toolchain command is wrapped in `nix develop -c …`.
@make's permission allowlist gains `cargo *` and `nix develop -c *`,
plus matching denies for cargo add/remove/install/publish. The
Verification Tiers and Baseline Verification sections are rewritten as
per-language bullets, and output/TDD-evidence examples are now
language-neutral. The "no Kubernetes deployments" constraint is
generalised to cover any deploy/publish action.
@test gains the same devshell + cargo allows (scoped to test, check,
clippy, fmt only — no build/run/install). Its file constraint adds
`tests/**/*.rs` for Rust integration tests, with an explicit note that
Rust unit tests stay with @make because they live inside production
source files. Failure-classification hints add Rust compiler-error
mappings, and the NOT_TESTABLE table gets a "Rust unit-only" row.
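As a rough model of the cargo scoping for @test, the sketch below checks a command string against the four permitted subcommands. It is not the agent's real permission syntax, which is not shown here; `test_agent_may_run` is a hypothetical helper that covers only the cargo rule and the optional `nix develop -c` wrapper.

```rust
/// Cargo subcommands @test is allowed to run, per the description above.
const ALLOWED_SUBCOMMANDS: [&str; 4] = ["test", "check", "clippy", "fmt"];

/// Returns true if `command` is `cargo <allowed subcommand> ...`,
/// optionally wrapped in `nix develop -c`.
pub fn test_agent_may_run(command: &str) -> bool {
    let inner = command
        .strip_prefix("nix develop -c ")
        .unwrap_or(command);
    let mut words = inner.split_whitespace();
    match (words.next(), words.next()) {
        (Some("cargo"), Some(sub)) => ALLOWED_SUBCOMMANDS.contains(&sub),
        _ => false,
    }
}
```
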
Adds @check, @simplify, @test, @make, @pm subagents and the /workflow
and /review slash commands from the autonomous multi-agent workflow
gist by ppries.
@pm is rewritten to manage issues in a local ./TODO.md file instead of
Linear (file-only access, documented schema, structured JSON output).
/workflow is adapted: TODO.md-based issue context, generic worktree
paths (no hardcoded ~/repos/veo/sunstone), generic branch examples,
and a Phase 1 guard that verifies origin is on GitHub before any
work begins.
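The Phase 1 guard could be approximated by a prefix check on the origin URL, for example the output of `git remote get-url origin`. The function name and the exact set of accepted URL forms below are assumptions made for illustration.

```rust
/// Returns true if `url` looks like a GitHub remote (HTTPS or SSH forms).
pub fn is_github_remote(url: &str) -> bool {
    // Assumed set of accepted URL shapes; the real guard may differ.
    const PREFIXES: [&str; 3] = [
        "https://github.com/",
        "git@github.com:",
        "ssh://git@github.com/",
    ];
    PREFIXES.iter().any(|prefix| url.starts_with(*prefix))
}
```

Matching on the full prefix, including the trailing `/` or `:`, avoids accepting look-alike hosts such as `github.com.example.org`.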