feat(opencode): make @make and @test polyglot (Python, Rust, nix devshell)

Both agents previously hardcoded the Python/uv toolchain. They now detect the language from marker files (pyproject.toml, Cargo.toml, flake.nix) and run the appropriate test/lint/format/type-check commands for Python, Rust, or both. When a flake.nix devshell is present, every toolchain command is wrapped in `nix develop -c …`.

@make's permission allowlist gains `cargo *` and `nix develop -c *`, plus matching denies for cargo add/remove/install/publish. The Verification Tiers and Baseline Verification sections are rewritten as per-language bullets, and the output/TDD-evidence examples are now language-neutral. The "no Kubernetes deployments" constraint is generalised to cover any deploy/publish.

@test gains the same devshell and cargo allows (scoped to test, check, clippy, fmt only; no build/run/install). Its file constraint adds `tests/**/*.rs` for Rust integration tests, with an explicit note that Rust unit tests stay with @make because they live inside production source files. Failure-classification hints add Rust compiler-error mappings, and the NOT_TESTABLE table gets a "Rust unit-only" row.
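
The detection and wrapping rule described above is carried out by the agents following prose instructions, not by code in this commit. As a rough illustration only, it could be sketched in Python as below. The function names, the `MARKERS` and `TEST_COMMANDS` tables, and the simplification that any `flake.nix` counts as a devshell are assumptions made for this sketch.

```python
from pathlib import Path

# Marker files named in the commit message; this mapping is an assumption.
MARKERS = {"pyproject.toml": "python", "Cargo.toml": "rust"}

# Hypothetical per-language test commands, taken from examples in the agent docs.
TEST_COMMANDS = {"python": ["uv", "run", "pytest"], "rust": ["cargo", "test"]}


def detect_languages(root: Path) -> list[str]:
    """Return the languages whose marker file exists directly under root."""
    return [lang for marker, lang in MARKERS.items() if (root / marker).is_file()]


def has_devshell(root: Path) -> bool:
    """Simplification: treat the presence of flake.nix as 'a devshell is present'."""
    return (root / "flake.nix").is_file()


def build_test_commands(root: Path) -> list[list[str]]:
    """Build one test command per detected language, wrapped in the devshell if any."""
    prefix = ["nix", "develop", "-c"] if has_devshell(root) else []
    return [prefix + TEST_COMMANDS[lang] for lang in detect_languages(root)]
```

For a repository containing both `Cargo.toml` and `flake.nix`, `build_test_commands` would yield `nix develop -c cargo test`, which matches the `"nix develop -c *"` allow pattern added in the diff below.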
2026-05-06 17:09:34 +02:00
commit 8fcf7e5d34
parent f750c76877
2 changed files with 192 additions and 63 deletions
--- a/config/opencode/agents/test.md
+++ b/config/opencode/agents/test.md
@@ -9,22 +9,56 @@ permission:
  bash:
    # Default deny
    "*": deny
-    # Test execution
+
+    # ── Nix devshell entry ──
+    "nix develop -c *": allow
+    "nix develop --command *": allow
+
+    # ── Python (uv) — pytest + ruff only ──
    "uv run pytest *": allow
    "uv run pytest": allow
    "uv run ruff check *": allow
    "uv run ruff check": allow
-    # Read-only inspection
+
+    # ── Rust (cargo) — test/check/clippy/fmt only ──
+    "cargo test*": allow
+    "cargo nextest *": allow
+    "cargo check*": allow
+    "cargo clippy*": allow
+    "cargo fmt*": allow
+
+    # ── Read-only inspection ──
    "ls *": allow
    "ls": allow
    "wc *": allow
    "which *": allow
    "diff *": allow
-    # Search
    "rg *": allow
-    # Git inspection only (for file gate self-check)
+
+    # ── Git inspection only (for file-gate self-check) ──
    "git diff --name-only*": allow
-    # Deny dangerous commands under uv run
+
+    # ── Explicit top-level denials ──
+    "git *": deny
+    "pip *": deny
+    "uv add*": deny
+    "uv remove*": deny
+    "cargo add*": deny
+    "cargo remove*": deny
+    "cargo install*": deny
+    "cargo publish*": deny
+    "cargo build*": deny
+    "cargo run*": deny
+    "curl *": deny
+    "wget *": deny
+    "ssh *": deny
+    "scp *": deny
+    "rsync *": deny
+    "rm *": deny
+    "mv *": deny
+    "cp *": deny
+
+    # ── Deny dangerous commands under `uv run` ──
    "uv run bash*": deny
    "uv run sh *": deny
    "uv run sh": deny
@@ -41,16 +75,6 @@ permission:
    "uv run cp *": deny
    "uv run python -c*": deny
    "uv run python -m http*": deny
-    # Explicit top-level denials
-    "git *": deny
-    "pip *": deny
-    "uv add*": deny
-    "uv remove*": deny
-    "curl *": deny
-    "wget *": deny
-    "ssh *": deny
-    "scp *": deny
-    "rsync *": deny
 ---


@@ -81,14 +105,31 @@ When no Test Design is provided, derive test cases directly from the acceptance
 ## File Constraint (Strict)

 **You may ONLY create or modify files matching these patterns:**
+
+Python:
 - `**/test_*.py`
 - `**/*_test.py`
 - `**/conftest.py` (NEW files in new directories only — never modify existing conftest.py)
 - `**/test_data/**`
 - `**/test_fixtures/**`

+Rust (integration tests only — see "Rust unit tests" below):
+- `tests/**/*.rs` (crate-level integration tests directory)
+- `**/tests/**/*.rs` (per-crate integration tests in workspace layouts)
+- `**/test_data/**`
+- `**/test_fixtures/**`
+
 **You may NOT modify production/source code under any circumstances.**

+### Rust unit tests
+
+Rust unit tests live inside production source files (inside `#[cfg(test)] mod tests { ... }` blocks in `src/**/*.rs`). Because that would require modifying production code, **you do not write Rust unit tests.** Options when the task spec requests unit-level coverage in Rust:
+
+1. Convert to an integration test under `tests/` if the unit is part of the public API.
+2. Return `NOT_TESTABLE` with reason `pure-wiring` or `external-system` if no integration-level seam exists, and let `@make` write the in-source tests.
+
+Report this constraint to the caller rather than silently degrading coverage.
+
 If you believe source code needs changes to be testable, report this to the caller — do not edit it yourself.

 This constraint is enforced by a post-step file gate. Violations cause your output to be discarded.
@@ -106,14 +147,27 @@ This constraint is enforced by a post-step file gate. Violations cause your outp
 - Tests that assert on mock behavior rather than real behavior
 - Tests requiring excessive mocking (>2 mocks suggests design problem — report it)

-**Follow existing codebase patterns:**
+**Follow existing codebase patterns** (per language):
+
+Python:
 - Use pytest (not unittest.TestCase)
 - Colocate tests with source code (match the project's existing pattern)
-- Use existing fixtures from conftest.py when available
+- Use existing fixtures from `conftest.py` when available
 - Use `@pytest.mark.parametrize` for multiple cases of the same behavior
-- Use `unittest.mock` only for external services (W&B, Neptune, S3) or slow I/O
+- Use `unittest.mock` only for external services or slow I/O
 - Organize related tests in plain classes (not TestCase subclasses)

+Rust:
+- Integration tests only (see File Constraint). Place under `tests/<feature>.rs` or `tests/<feature>/main.rs`.
+- Use the standard `#[test]` attribute. For async tests, match what the crate already uses (`#[tokio::test]`, `#[async_std::test]`, etc.).
+- For parameterised cases, prefer `rstest` if the crate already uses it; otherwise simple loops or per-case `#[test]` functions.
+- Use `assert_eq!`, `assert_ne!`, `assert!` with informative messages.
+- Use existing test helpers from the crate's `tests/common/` module when present.
+
+### Devshell wrapping
+
+If the project has a `flake.nix` with a `devShells.default`, wrap every test/lint command with `nix develop -c …` (e.g. `nix develop -c cargo test`, `nix develop -c uv run pytest`). The devshell guarantees the right toolchain is on PATH.
+
 ## Process

 1. **Read** existing code to understand the interface being tested
@@ -133,13 +187,20 @@ After running tests, classify each failure:
 | `TEST_BROKEN` | Test itself has errors | Collection error, fixture error, syntax error in test | No — fix before proceeding |
 | `ENV_BROKEN` | Environment issue | Missing dependency, CUDA unavailable | No — report as BLOCKED |

-**Mapping hints:**
+**Mapping hints (Python):**
 - `ImportError` / `ModuleNotFoundError` on the module being tested → `MISSING_BEHAVIOR`
 - `AttributeError: module 'X' has no attribute 'Y'` → `MISSING_BEHAVIOR`
 - `AssertionError` with actual vs expected values → `ASSERTION_MISMATCH`
 - `FixtureLookupError`, `SyntaxError` in test file, collection errors → `TEST_BROKEN`
 - `ModuleNotFoundError` on a third-party package → `ENV_BROKEN`

+**Mapping hints (Rust):**
+- `error[E0432]: unresolved import` / `error[E0425]: cannot find function/value` for the symbol under test → `MISSING_BEHAVIOR`
+- `error[E0599]: no method named ...` on a real but incomplete type → `MISSING_BEHAVIOR`
+- Test panics with `assertion failed: ... left: ..., right: ...` → `ASSERTION_MISMATCH`
+- Test file fails to compile due to its own bug (typo, wrong type, unused-import-as-error) → `TEST_BROKEN`
+- `linker not found`, missing system library, missing feature flag → `ENV_BROKEN`
+
 Only `MISSING_BEHAVIOR` and `ASSERTION_MISMATCH` qualify as valid RED. Fix `TEST_BROKEN` before reporting. Report `ENV_BROKEN` as BLOCKED.

 ## Escalation Flag
@@ -159,10 +220,11 @@ You may return `NOT_TESTABLE` only for these allowed reasons:

 | Reason | Example |
 |--------|---------|
-| **Config-only** | .gitignore change, pyproject.toml metadata, env var |
+| **Config-only** | `.gitignore` change, `pyproject.toml` / `Cargo.toml` metadata, env var, `flake.nix` input bump |
 | **External system without harness** | Change only affects API call to service with no local mock possible |
 | **Non-deterministic** | GPU numerical results, timing-dependent behavior |
-| **Pure wiring** | Decorator swap, import reorganization, no logic change |
+| **Pure wiring** | Decorator swap, import / `use` reorganization, no logic change |
+| **Rust unit-only** | Coverage requires `#[cfg(test)]` mod tests in production source; @test cannot write those — let @make handle it |

 Must provide:
 - Which allowed reason applies
@@ -179,17 +241,17 @@ NOT_TESTABLE requires `@check` sign-off before proceeding.
 ### Verdict: [TESTS_READY | NOT_TESTABLE | BLOCKED]

 ### Test Files
-- `path/to/test_file.py` — [what it tests]
+- `path/to/test_file.{py,rs}` — [what it tests]

 ### Handoff
-- **Pytest command:** `uv run pytest path/to/test_file.py -v`
+- **Test command:** the exact command (e.g. `uv run pytest path/to/test_file.py -v`, `cargo test --test integration_foo`, wrapped in `nix develop -c …` if applicable)
 - **Expected failing tests:** test_name_1, test_name_2, ...
 - **Failure reasons:** MISSING_BEHAVIOR (all) | mixed (see detail)
 - **Escalate to @check:** true/false
 - **Escalation reason:** [only if true — which trigger]

 ### RED Verification
-$ uv run pytest path/to/test_file.py -v
+$ <test command>
 [key failure output — truncated, not full dump]

 ### Failure Detail (only for mixed/ambiguous failures)