Commit graph

7 commits

Author SHA1 Message Date
267c05b107 feat(opencode): give @make a concrete test-smell checklist
Real-world observation: @make struggles when @test sets up tests
incorrectly because the existing escalate: test_design trigger is
described abstractly ("test seems to demand wrong thing"). When @make
sees an unfamiliar smell, it tends to attempt implementation, fail,
attempt again, and only escalate after burning 2-3 cycles. The
protocol exists; the recognition criteria don't.

Restructure Entry Validation step 5 into a named "Test triage" step
with a concrete checklist that fires *before* any implementation
attempt. Four categories of smells:

- **Mocking smells:** mocks the SUT, >2 mocks, mock-call-as-primary
  assertion, internal-boundary mocking
- **Structural-only smells:** variant counts, type ascriptions,
  function-pointer coercion, struct-literal-with-field-reads,
  stub-first no-panics (mirrors @test.md's anti-patterns)
- **Wrong-target smells:** asserts on private state / log strings,
  demands contradicting spec, physically impossible demands
- **Setup smells:** fixtures bypassing production validation,
  wrong-module imports, references to nonexistent infrastructure

Iteration Limits step 5 now cross-references the same checklist
instead of restating abstract criteria, so both gates apply the same
recognition rules with a single source of truth.

A "NOT for" caveat prevents over-eager escalation: when the test is
fine but the implementation is just hard, that's not a smell, that's
the test doing its job.

The checklist is inlined (not pulled from @test.md at runtime) because
subagents have separate contexts. Periodic manual sync between
@make.md's checklist and @test.md's anti-patterns is acceptable —
they shouldn't drift much in practice.

Refs: config/opencode/agents/test.md (anti-patterns + structural-only
list it mirrors), config/opencode/workflow-design.md ADR-19 (unified
Implementation Incomplete diagnosis path)
2026-05-08 15:16:33 +02:00
3e515d54eb feat(opencode): allow agents to read external Rust crate source
@make, @test, @check often need to inspect dependency source (trait
definitions, impl details, test patterns) to inform implementation or
verify findings. Opencode applies a CWD check on tool access, so reads
outside the worktree previously prompted for each access.

- Add permission.read/grep/glob path allowlists for the three locations
  cargo deps live: ~/.cargo/registry/src/, ~/.cargo/git/checkouts/, and
  /nix/store/*-vendor-*/ for crane / buildRustPackage projects.
- Document the discovery pattern in each agent: `cargo metadata
  --format-version 1` returns absolute paths via packages[].manifest_path.
- Cross-reference the registry paths from the permission.bash allowlist
  comment so future readers see the bash inspection commands (rg/ls)
  intentionally accept paths outside CWD.
- @check gets its first permission block (was tools-only before).

Path-pattern syntax for read/grep/glob isn't fully documented; if
opencode rejects it, fall back to `permission: { external_directory:
allow }` at the project config level.
2026-05-08 13:24:30 +02:00
af6481a5a7 feat(opencode): one-task-per-run model + 9 routing fixes (ADRs 13-21)
Captures the design grilling outcome. Adds ADRs 13-21 covering:
- run-level plan_rework_remaining counter to bound P3<->P5.5/P7/P8 thrash
- non-resumable workflow with throwaway-worktree recovery procedure
- @simplify advisory at every gate (not just Phase 8)
- Phase 8 fix specs go to disk as task-fix-N.md (preserves ADR-6)
- Phase 5.5 BLOCK protocol: orchestrator edits plan, decrements counter, re-enters P4
- Phase 8 NOT_TESTABLE manifest in reviewer prompt
- unified Implementation Incomplete diagnosis (test_design / production_logic / split_needed)
- Phase 1 working-tree cleanliness + depends-on enforcement
- one-task-per-run pivot: Phase 5 still splits N tasks, only task-1 runs;
  tasks 2..N filed as sub-issues with rich seed bodies; split_needed at P7
  aborts to Failure Handler (one-task-per-run = no salvageable prior work)

Auto-resolves big-diff Phase 8 reviews, cross-task regression-within-run, and
mid-flight task-split routing. Rewrites routing matrix and three Mermaid
diagrams; updates @pm (depends-on frontmatter, split-time filing), @check
(third diagnosis verdict), @make (escalate: split_needed flag).
2026-05-08 13:02:54 +02:00
534361f1b5 feat(opencode): extend Phase 7 escalation to mid-implementation test-design errors
Phase 7's escalation rule was gated on @make flagging concerns "during
entry validation" only. When @make got past entry validation, started
implementing, and ground for 2-3 attempts because the test demanded
impossible production code, the orchestrator had no documented route
— it would re-dispatch @make with marginal context tweaks instead of
recognizing the failure as test-architecture.

Splits the escalation into two clearly-named paths (entry-validation
vs mid-implementation) that both route through @check (test diagnosis)
→ @test (redesign) → fresh @make. Bounded at max 2 escalation cycles
before reverting to a Phase 3 plan revisit, to prevent thrashing when
the actual problem is upstream.

@make.md gains a new Iteration Limits red-flag class — "Test-design
suspicion" — instructing @make to stop and report with an explicit
`escalate: test_design` flag in the Blocking Issue section. The flag
is the routing signal the orchestrator switches on.
2026-05-08 10:20:16 +02:00
8fcf7e5d34 feat(opencode): make @make and @test polyglot (Python, Rust, nix devshell)
Both agents previously hardcoded the Python/uv toolchain. They now
detect the language from marker files (pyproject.toml, Cargo.toml,
flake.nix) and run the appropriate test/lint/format/type-check commands
for Python, Rust, or both. When a flake.nix devshell is present, every
toolchain command is wrapped in `nix develop -c …`.

@make's permission allowlist gains `cargo *` and `nix develop -c *`,
plus matching denies for cargo add/remove/install/publish. The
Verification Tiers and Baseline Verification sections are rewritten as
per-language bullets, and output/TDD-evidence examples are now
language-neutral. Generalised the "no Kubernetes deployments"
constraint to cover any deploy/publish.

@test gains the same devshell + cargo allows (scoped to test, check,
clippy, fmt only — no build/run/install). Its file constraint adds
`tests/**/*.rs` for Rust integration tests, with an explicit note that
Rust unit tests stay with @make because they live inside production
source files. Failure-classification hints add Rust compiler-error
mappings, and the NOT_TESTABLE table gets a "Rust unit-only" row.
2026-05-06 17:09:34 +02:00
37be2d9505 fix(opencode): remove agent models and temperature 2026-05-06 15:33:11 +02:00
4ec1561af4 feat(opencode): add multi-agent workflow agents and commands
Adds @check, @simplify, @test, @make, @pm subagents and the /workflow
and /review slash commands from the autonomous multi-agent workflow
gist by ppries.

@pm is rewritten to manage issues in a local ./TODO.md file instead of
Linear (file-only access, documented schema, structured JSON output).

/workflow is adapted: TODO.md-based issue context, generic worktree
paths (no hardcoded ~/repos/veo/sunstone), generic branch examples,
and a Phase 1 guard that verifies origin is on GitHub before any
work begins.
2026-05-06 14:56:42 +02:00