From 8fcf7e5d34a07cb6e9c10c0ba86ab8676bae5e02 Mon Sep 17 00:00:00 2001 From: Harald Hoyer Date: Wed, 6 May 2026 17:09:34 +0200 Subject: [PATCH] feat(opencode): make @make and @test polyglot (Python, Rust, nix devshell) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Both agents previously hardcoded the Python/uv toolchain. They now detect the language from marker files (pyproject.toml, Cargo.toml, flake.nix) and run the appropriate test/lint/format/type-check commands for Python, Rust, or both. When a flake.nix devshell is present, every toolchain command is wrapped in `nix develop -c …`. @make's permission allowlist gains `cargo *` and `nix develop -c *`, plus matching denies for cargo add/remove/install/publish. The Verification Tiers and Baseline Verification sections are rewritten as per-language bullets, and output/TDD-evidence examples are now language-neutral. Generalised the "no Kubernetes deployments" constraint to cover any deploy/publish. @test gains the same devshell + cargo allows (scoped to test, check, clippy, fmt only — no build/run/install). Its file constraint adds `tests/**/*.rs` for Rust integration tests, with an explicit note that Rust unit tests stay with @make because they live inside production source files. Failure-classification hints add Rust compiler-error mappings, and the NOT_TESTABLE table gets a "Rust unit-only" row. --- config/opencode/agents/make.md | 145 ++++++++++++++++++++++++--------- config/opencode/agents/test.md | 110 +++++++++++++++++++------ 2 files changed, 192 insertions(+), 63 deletions(-) diff --git a/config/opencode/agents/make.md b/config/opencode/agents/make.md index 6780ffa..67802fb 100644 --- a/config/opencode/agents/make.md +++ b/config/opencode/agents/make.md @@ -9,10 +9,48 @@ permission: bash: # Default deny "*": deny - # Python/uv development + + # ── Nix devshell entry ── + # All toolchain commands may be wrapped in `nix develop -c ` to run + # them inside the project's devshell with the correct versions. + "nix develop -c *": allow + "nix develop --command *": allow + + # ── Python (uv) ── "uv run *": allow "uv run": allow - # Deny dangerous commands under uv run (must come after allow to override) + + # ── Rust (cargo) ── + "cargo *": allow + "cargo": allow + + # ── Read-only inspection ── + "ls *": allow + "ls": allow + "wc *": allow + "which *": allow + "diff *": allow + "rg *": allow + + # ── Explicit top-level denials ── + "git *": deny + "pip *": deny + "uv add*": deny + "uv remove*": deny + "cargo add*": deny + "cargo remove*": deny + "cargo install*": deny + "cargo publish*": deny + "curl *": deny + "wget *": deny + "ssh *": deny + "scp *": deny + "rsync *": deny + "rm *": deny + "mv *": deny + "cp *": deny + + # ── Deny dangerous commands under `uv run` ── "uv run bash*": deny "uv run sh *": deny "uv run sh": deny @@ -29,24 +67,6 @@ permission: "uv run cp *": deny "uv run python -c*": deny "uv run python -m http*": deny - # Read-only inspection - "ls *": allow - "ls": allow - "wc *": allow - "which *": allow - "diff *": allow - # Search - "rg *": allow - # Explicit top-level denials - "git *": deny - "pip *": deny - "uv add*": deny - "uv remove*": deny - "curl *": deny - "wget *": deny - "ssh *": deny - "scp *": deny - "rsync *": deny --- @@ -89,7 +109,7 @@ If a task appears to touch shared interfaces but no integration contract is prov This includes: - Existing files to edit -- New files to create (must be listed, e.g., "src/new_module.py (create)") +- New files to create (must be listed, e.g. `src/new_module.py (create)` or `crates/foo/src/lib.rs (create)`) **Not supported:** File renames and deletions. If a task requires renaming or deleting files, stop and report this to the caller — they will handle it directly. @@ -98,7 +118,33 @@ If you discover another file needs changes: 2. Report which file needs modification and why 3. Request permission before proceeding -**Excluded from this constraint:** Generated artifacts (.pyc, __pycache__, .coverage, etc.) — these should not be committed anyway. +**Excluded from this constraint:** Generated artifacts (`.pyc`, `__pycache__`, `.coverage`, `target/`, `Cargo.lock` only when allowed by acceptance criteria, etc.) — these should not be committed anyway. + +## Language and Toolchain + +You may be invoked on Python, Rust, or polyglot Nix-flake projects. Detect the toolchain at the start of the task and use the appropriate commands: + +| Marker file | Toolchain | Test | Lint / Format | Type-check | +|---|---|---|---|---| +| `pyproject.toml`, `uv.lock` | Python (`uv`) | `uv run pytest` | `uv run ruff check .` / `uv run ruff format --check .` | `uv run ty check .` or `uv run basedpyright .` | +| `Cargo.toml` | Rust (`cargo`) | `cargo test` | `cargo clippy --all-targets -- -D warnings`, `cargo fmt -- --check` | `cargo check` (compiler-driven) | +| `flake.nix` | Nix flake | `nix flake check` | `nix fmt -- --check` (if configured) | (n/a) | + +### Devshell wrapping + +If the project has a `flake.nix` with a `devShells.default` (or per-system equivalent), **run all toolchain commands inside the devshell** by prefixing them with `nix develop -c`: + +``` +nix develop -c cargo test +nix develop -c uv run pytest +nix develop -c cargo clippy --all-targets -- -D warnings +``` + +The devshell guarantees the right toolchain versions are available. Detect once at task start, decide whether to wrap, then be consistent for the whole task. **Never drop into an interactive `nix develop` (with no command).** If a non-trivial task touches multiple commands and the devshell entry overhead matters, you may still wrap each command individually — that is the supported pattern. + +### Polyglot tasks + +A task may legitimately span multiple languages (e.g. a Rust binary plus its Python test harness). Run the appropriate verification per file area; document each in the verification block. ## Dependency Constraint @@ -150,10 +196,23 @@ If a task is too large, suggest splitting it. Every acceptance criterion must be verified. Use the strongest tier available: ### Tier 1: Automated Tests (Preferred) -- Run existing test suite: `uv run pytest` -- Add new test if criteria isn't covered by existing tests -- Type check: `uv run ty check .` or `uv run basedpyright .` -- Lint: `uv run ruff check .` +- Run the language-appropriate test runner (see **Language and Toolchain**): + - Python: `uv run pytest` + - Rust: `cargo test` + - Polyglot Nix: `nix flake check` +- Add new tests if a criterion isn't covered by existing ones. +- Lint: + - Python: `uv run ruff check .` + - Rust: `cargo clippy --all-targets -- -D warnings` +- Format check: + - Python: `uv run ruff format --check .` + - Rust: `cargo fmt -- --check` + - Nix: `nix fmt -- --check` (if configured) +- Type check: + - Python: `uv run ty check .` or `uv run basedpyright .` + - Rust: `cargo check` (the compiler covers it) + +Wrap every command in `nix develop -c …` when the project has a devshell. ### Tier 2: Deterministic Reproduction (Acceptable) - Scripted steps that can be re-run @@ -167,10 +226,11 @@ Every acceptance criterion must be verified. Use the strongest tier available: ### Baseline Verification -Run what's configured and applicable: -- `uv run pytest` — if tests exist and are relevant -- `uv run ruff check .` — if ruff is configured -- `uv run ty check .` — if ty/type checking is configured +Run what's configured and applicable to the project's toolchain. Prefix with `nix develop -c` when a devshell exists. + +- **Python:** `uv run pytest`, `uv run ruff check .`, `uv run ruff format --check .`, `uv run ty check .` +- **Rust:** `cargo test`, `cargo clippy --all-targets -- -D warnings`, `cargo fmt -- --check` +- **Nix flake:** `nix flake check`, `nix fmt -- --check` (if configured) If a tool isn't configured or not applicable to this change, note "skipped: [reason]" rather than failing. @@ -234,17 +294,22 @@ Always end with this structure: [1-2 sentences: what was implemented] ### Files Changed -- `path/to/file.py` — [brief description of change] -- `path/to/new_file.py` (created) — [description] +- `path/to/file.{py,rs,nix,…}` — [brief description of change] +- `path/to/new_file.{py,rs,nix,…}` (created) — [description] ### Verification -**Commands run:** -$ uv run pytest tests/test_foo.py -v +**Commands run:** (use whichever apply to this language; wrap with `nix develop -c` if a devshell exists) + +$ cargo test --package my_crate [key output excerpt — truncate if long, show pass/fail summary] +$ cargo clippy --all-targets -- -D warnings +[summary] + +(or, for Python:) +$ uv run pytest tests/test_foo.py -v $ uv run ruff check src/ -All checks passed. **Criteria verification:** | Criterion | Method | Result | @@ -309,18 +374,20 @@ Include this section when tests were provided: ``` ### TDD Evidence **RED (before implementation):** -$ uv run pytest path/to/test_file.py -v +$ # e.g. `uv run pytest path/to/test_file.py -v`, `cargo test --test integration` X failed, 0 passed **GREEN (after implementation):** -$ uv run pytest path/to/test_file.py -v +$ 0 failed, X passed **Regression check:** -$ uv run pytest path/to/affected_area/ -v +$ # e.g. `uv run pytest path/to/affected_area/ -v`, `cargo test` Y passed, 0 failed ``` +Use the project's actual command (Python/Rust/Nix), wrapped in `nix develop -c` if applicable. + When no tests are provided (NOT_TESTABLE tasks), standard implementation mode applies unchanged. ## Scope Constraints @@ -329,7 +396,7 @@ When no tests are provided (NOT_TESTABLE tasks), standard implementation mode ap - **Stay in scope** — Implement what's asked, nothing more - **Preserve existing patterns** — Match the codebase style unless told otherwise - **Don't refactor adjacent code** — Unless it's part of the task -- **No Kubernetes deployments** — Local testing only (`--without kubernetes`); K8s verification is handled by the main agent +- **No deployments or releases** — Local testing only. No `cargo publish`, no `uv publish`, no Kubernetes apply. Release/deploy verification is handled by the main agent. - **No network requests** — Don't fetch external resources unless explicitly required by the task - **No file renames/deletions** — Report to caller if needed; they handle directly diff --git a/config/opencode/agents/test.md b/config/opencode/agents/test.md index af8fad1..aec7615 100644 --- a/config/opencode/agents/test.md +++ b/config/opencode/agents/test.md @@ -9,22 +9,56 @@ permission: bash: # Default deny "*": deny - # Test execution + + # ── Nix devshell entry ── + "nix develop -c *": allow + "nix develop --command *": allow + + # ── Python (uv) — pytest + ruff only ── "uv run pytest *": allow "uv run pytest": allow "uv run ruff check *": allow "uv run ruff check": allow - # Read-only inspection + + # ── Rust (cargo) — test/check/clippy/fmt only ── + "cargo test*": allow + "cargo nextest *": allow + "cargo check*": allow + "cargo clippy*": allow + "cargo fmt*": allow + + # ── Read-only inspection ── "ls *": allow "ls": allow "wc *": allow "which *": allow "diff *": allow - # Search "rg *": allow - # Git inspection only (for file gate self-check) + + # ── Git inspection only (for file-gate self-check) ── "git diff --name-only*": allow - # Deny dangerous commands under uv run + + # ── Explicit top-level denials ── + "git *": deny + "pip *": deny + "uv add*": deny + "uv remove*": deny + "cargo add*": deny + "cargo remove*": deny + "cargo install*": deny + "cargo publish*": deny + "cargo build*": deny + "cargo run*": deny + "curl *": deny + "wget *": deny + "ssh *": deny + "scp *": deny + "rsync *": deny + "rm *": deny + "mv *": deny + "cp *": deny + + # ── Deny dangerous commands under `uv run` ── "uv run bash*": deny "uv run sh *": deny "uv run sh": deny @@ -41,16 +75,6 @@ permission: "uv run cp *": deny "uv run python -c*": deny "uv run python -m http*": deny - # Explicit top-level denials - "git *": deny - "pip *": deny - "uv add*": deny - "uv remove*": deny - "curl *": deny - "wget *": deny - "ssh *": deny - "scp *": deny - "rsync *": deny --- @@ -81,14 +105,31 @@ When no Test Design is provided, derive test cases directly from the acceptance ## File Constraint (Strict) **You may ONLY create or modify files matching these patterns:** + +Python: - `**/test_*.py` - `**/*_test.py` - `**/conftest.py` (NEW files in new directories only — never modify existing conftest.py) - `**/test_data/**` - `**/test_fixtures/**` +Rust (integration tests only — see "Rust unit tests" below): +- `tests/**/*.rs` (crate-level integration tests directory) +- `**/tests/**/*.rs` (per-crate integration tests in workspace layouts) +- `**/test_data/**` +- `**/test_fixtures/**` + **You may NOT modify production/source code under any circumstances.** +### Rust unit tests + +Rust unit tests live inside production source files (inside `#[cfg(test)] mod tests { ... }` blocks in `src/**/*.rs`). Because that would require modifying production code, **you do not write Rust unit tests.** Options when the task spec requests unit-level coverage in Rust: + +1. Convert to an integration test under `tests/` if the unit is part of the public API. +2. Return `NOT_TESTABLE` with reason `pure-wiring` or `external-system` if no integration-level seam exists, and let `@make` write the in-source tests. + +Report this constraint to the caller rather than silently degrading coverage. + If you believe source code needs changes to be testable, report this to the caller — do not edit it yourself. This constraint is enforced by a post-step file gate. Violations cause your output to be discarded. @@ -106,14 +147,27 @@ This constraint is enforced by a post-step file gate. Violations cause your outp - Tests that assert on mock behavior rather than real behavior - Tests requiring excessive mocking (>2 mocks suggests design problem — report it) -**Follow existing codebase patterns:** +**Follow existing codebase patterns** (per language): + +Python: - Use pytest (not unittest.TestCase) - Colocate tests with source code (match the project's existing pattern) -- Use existing fixtures from conftest.py when available +- Use existing fixtures from `conftest.py` when available - Use `@pytest.mark.parametrize` for multiple cases of the same behavior -- Use `unittest.mock` only for external services (W&B, Neptune, S3) or slow I/O +- Use `unittest.mock` only for external services or slow I/O - Organize related tests in plain classes (not TestCase subclasses) +Rust: +- Integration tests only (see File Constraint). Place under `tests/.rs` or `tests//main.rs`. +- Use the standard `#[test]` attribute. For async tests, match what the crate already uses (`#[tokio::test]`, `#[async_std::test]`, etc.). +- For parameterised cases, prefer `rstest` if the crate already uses it; otherwise simple loops or per-case `#[test]` functions. +- Use `assert_eq!`, `assert_ne!`, `assert!` with informative messages. +- Use existing test helpers from the crate's `tests/common/` module when present. + +### Devshell wrapping + +If the project has a `flake.nix` with a `devShells.default`, wrap every test/lint command with `nix develop -c …` (e.g. `nix develop -c cargo test`, `nix develop -c uv run pytest`). The devshell guarantees the right toolchain is on PATH. + ## Process 1. **Read** existing code to understand the interface being tested @@ -133,13 +187,20 @@ After running tests, classify each failure: | `TEST_BROKEN` | Test itself has errors | Collection error, fixture error, syntax error in test | No — fix before proceeding | | `ENV_BROKEN` | Environment issue | Missing dependency, CUDA unavailable | No — report as BLOCKED | -**Mapping hints:** +**Mapping hints (Python):** - `ImportError` / `ModuleNotFoundError` on the module being tested → `MISSING_BEHAVIOR` - `AttributeError: module 'X' has no attribute 'Y'` → `MISSING_BEHAVIOR` - `AssertionError` with actual vs expected values → `ASSERTION_MISMATCH` - `FixtureLookupError`, `SyntaxError` in test file, collection errors → `TEST_BROKEN` - `ModuleNotFoundError` on a third-party package → `ENV_BROKEN` +**Mapping hints (Rust):** +- `error[E0432]: unresolved import` / `error[E0425]: cannot find function/value` for the symbol under test → `MISSING_BEHAVIOR` +- `error[E0599]: no method named ...` on a real but incomplete type → `MISSING_BEHAVIOR` +- Test panics with `assertion failed: ... left: ..., right: ...` → `ASSERTION_MISMATCH` +- Test file fails to compile due to its own bug (typo, wrong type, unused-import-as-error) → `TEST_BROKEN` +- `linker not found`, missing system library, missing feature flag → `ENV_BROKEN` + Only `MISSING_BEHAVIOR` and `ASSERTION_MISMATCH` qualify as valid RED. Fix `TEST_BROKEN` before reporting. Report `ENV_BROKEN` as BLOCKED. ## Escalation Flag @@ -159,10 +220,11 @@ You may return `NOT_TESTABLE` only for these allowed reasons: | Reason | Example | |--------|---------| -| **Config-only** | .gitignore change, pyproject.toml metadata, env var | +| **Config-only** | `.gitignore` change, `pyproject.toml` / `Cargo.toml` metadata, env var, `flake.nix` input bump | | **External system without harness** | Change only affects API call to service with no local mock possible | | **Non-deterministic** | GPU numerical results, timing-dependent behavior | -| **Pure wiring** | Decorator swap, import reorganization, no logic change | +| **Pure wiring** | Decorator swap, import / `use` reorganization, no logic change | +| **Rust unit-only** | Coverage requires `#[cfg(test)]` mod tests in production source; @test cannot write those — let @make handle it | Must provide: - Which allowed reason applies @@ -179,17 +241,17 @@ NOT_TESTABLE requires `@check` sign-off before proceeding. ### Verdict: [TESTS_READY | NOT_TESTABLE | BLOCKED] ### Test Files -- `path/to/test_file.py` — [what it tests] +- `path/to/test_file.{py,rs}` — [what it tests] ### Handoff -- **Pytest command:** `uv run pytest path/to/test_file.py -v` +- **Test command:** the exact command (e.g. `uv run pytest path/to/test_file.py -v`, `cargo test --test integration_foo`, wrapped in `nix develop -c …` if applicable) - **Expected failing tests:** test_name_1, test_name_2, ... - **Failure reasons:** MISSING_BEHAVIOR (all) | mixed (see detail) - **Escalate to @check:** true/false - **Escalation reason:** [only if true — which trigger] ### RED Verification -$ uv run pytest path/to/test_file.py -v +$ [key failure output — truncated, not full dump] ### Failure Detail (only for mixed/ambiguous failures)