Commit graph

17 commits

Author SHA1 Message Date
a3e0de6d04 feat(opencode): hide TODO paths from orchestrator (ADR-22)
In recent runs the orchestrator skipped @pm and edited TODO/ files
itself, despite the workflow.md anti-pattern warning. Root cause: the
workflow doc literally taught the orchestrator the path layout
(`./TODO/<ID>.md`), making self-help a discoverable shortcut.

Fix: remove the recipe. The orchestrator now never constructs or reads
any per-issue TODO path. All TODO operations go through @pm dispatches;
@pm returns the absolute file path of every issue it touches, and the
orchestrator captures and reuses those paths downstream.

- Phase 1 loses the TODO-existence and depends-on checks (former steps
  3 and 9 of the recent edit) — Phase 1 is now git/worktree-only.
- Phase 2 expands @pm's existing dispatch into a `Validate run
  prerequisites` operation that returns either {ok: true,
  issue_file_path, issue: {...}} or {ok: false, error_code, message}
  with error_code in {tracker_missing, issue_not_found,
  dependency_unmet, dependency_missing}. depends-on enforcement moves
  here.
- Phase 7 split_needed exit, Phase 9 TODO Update, Phase 9 Commit TODO
  Changes, and Failure Handler all either reference @pm-returned paths
  or stage with a blanket `git add ./TODO/` (safe because Phase 1
  verified a clean tree and only @pm writes there during a run).
- pm.md gains a path-return rule: every read returns issue_file_path,
  every write returns the modified paths. Run-Prerequisite Output
  format documented with all four error codes.
- ADR-22 captures the rationale; routing matrix updates Phase 1/2 rows;
  pipeline diagram labels updated.
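
A minimal sketch of how a caller might consume the `Validate run
prerequisites` result described in the Phase 2 bullet. The function name
and the sample values are hypothetical; only the keys and the four
error codes come from this message, and the real schema lives in
agents/pm.md.

```python
from typing import Optional, Tuple

# The four error codes named in the commit message.
ERROR_CODES = {
    "tracker_missing",
    "issue_not_found",
    "dependency_unmet",
    "dependency_missing",
}


def read_prerequisite_result(result: dict) -> Tuple[Optional[str], Optional[str]]:
    """Return (issue_file_path, None) on success or (None, error_code) on failure.

    Hypothetical helper: it only checks the shape, it does not touch TODO/.
    """
    if result.get("ok") is True:
        path = result.get("issue_file_path")
        # @pm is described as returning absolute paths, so reject anything else.
        if not isinstance(path, str) or not path.startswith("/"):
            raise ValueError("ok result needs an absolute issue_file_path")
        return path, None
    if result.get("ok") is False:
        code = result.get("error_code")
        if code not in ERROR_CODES:
            raise ValueError(f"unexpected error_code: {code!r}")
        return None, code
    raise ValueError("result must set ok to true or false")
```

The returned path is the only thing the caller keeps, which matches the
intent that the orchestrator reuses @pm-returned paths instead of
building them.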

The fix is discoverability-only — no permission deny on TODO/, per
explicit user direction. The schema lives in agents/pm.md, which the
orchestrator does not load.

Refs: config/opencode/workflow-design.md ADR-22
2026-05-08 13:45:51 +02:00
3e515d54eb feat(opencode): allow agents to read external Rust crate source
@make, @test, @check often need to inspect dependency source (trait
definitions, impl details, test patterns) to inform implementation or
verify findings. Opencode applies a CWD check on tool access, so reads
outside the worktree previously prompted for each access.

- Add permission.read/grep/glob path allowlists for the three locations
  where cargo deps live: ~/.cargo/registry/src/, ~/.cargo/git/checkouts/,
  and /nix/store/*-vendor-*/ for crane / buildRustPackage projects.
- Document the discovery pattern in each agent: `cargo metadata
  --format-version 1` returns absolute paths via packages[].manifest_path.
- Cross-reference the registry paths from the permission.bash allowlist
  comment so future readers see that the bash inspection commands
  (rg/ls) intentionally accept paths outside CWD.
- @check gets its first permission block (was tools-only before).
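
A rough sketch of the discovery pattern from the second bullet: reading
packages[].manifest_path out of `cargo metadata --format-version 1`
output. The sample JSON is trimmed and made up for illustration (real
output has many more fields), and the helper name is hypothetical.

```python
import json
import posixpath

# Made-up, heavily trimmed stand-in for `cargo metadata --format-version 1`.
SAMPLE = """
{"packages": [
  {"name": "serde",
   "manifest_path": "/home/user/.cargo/registry/src/index/serde-1.0.0/Cargo.toml"},
  {"name": "app", "manifest_path": "/work/app/Cargo.toml"}
]}
"""


def crate_source_dirs(metadata_json: str) -> dict:
    """Map each package name to the directory that holds its Cargo.toml."""
    metadata = json.loads(metadata_json)
    dirs = {}
    for package in metadata["packages"]:
        # manifest_path is absolute, so its parent is the crate source root.
        dirs[package["name"]] = posixpath.dirname(package["manifest_path"])
    return dirs
```

Keying by name alone is a simplification; a dependency graph with two
versions of the same crate would need the version in the key as well.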

Path-pattern syntax for read/grep/glob isn't fully documented; if
opencode rejects it, fall back to `permission: { external_directory:
allow }` at the project config level.
2026-05-08 13:24:30 +02:00
af6481a5a7 feat(opencode): one-task-per-run model + 9 routing fixes (ADRs 13-21)
Captures the design grilling outcome. Adds ADRs 13-21 covering:
- run-level plan_rework_remaining counter to bound P3<->P5.5/P7/P8 thrash
- non-resumable workflow with throwaway-worktree recovery procedure
- @simplify advisory at every gate (not just Phase 8)
- Phase 8 fix specs go to disk as task-fix-N.md (preserves ADR-6)
- Phase 5.5 BLOCK protocol: orchestrator edits plan, decrements counter, re-enters P4
- Phase 8 NOT_TESTABLE manifest in reviewer prompt
- unified Implementation Incomplete diagnosis (test_design / production_logic / split_needed)
- Phase 1 working-tree cleanliness + depends-on enforcement
- one-task-per-run pivot: Phase 5 still splits N tasks, only task-1 runs;
  tasks 2..N filed as sub-issues with rich seed bodies; split_needed at P7
  aborts to Failure Handler (one-task-per-run = no salvageable prior work)

Auto-resolves big-diff Phase 8 reviews, cross-task regression-within-run, and
mid-flight task-split routing. Rewrites routing matrix and three Mermaid
diagrams; updates @pm (depends-on frontmatter, split-time filing), @check
(third diagnosis verdict), @make (escalate: split_needed flag).
2026-05-08 13:02:54 +02:00
534361f1b5 feat(opencode): extend Phase 7 escalation to mid-implementation test-design errors
Phase 7's escalation rule was gated on @make flagging concerns "during
entry validation" only. When @make got past entry validation, started
implementing, and then burned 2-3 attempts because the test demanded
impossible production code, the orchestrator had no documented route:
it would re-dispatch @make with marginal context tweaks instead of
recognizing the failure as a test-architecture problem.

Splits the escalation into two clearly-named paths (entry-validation
vs mid-implementation) that both route through @check (test diagnosis)
→ @test (redesign) → fresh @make. Bounded at max 2 escalation cycles
before reverting to a Phase 3 plan revisit, to prevent thrashing when
the actual problem is upstream.

@make.md gains a new Iteration Limits red-flag class — "Test-design
suspicion" — instructing @make to stop and report with an explicit
`escalate: test_design` flag in the Blocking Issue section. The flag
is the routing signal the orchestrator switches on.
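
The routing rule above can be sketched as a small decision function.
This is an illustration only: the report dict, the function name, and
the returned labels are hypothetical, while the `escalate: test_design`
flag and the bound of 2 cycles come from this message.

```python
MAX_ESCALATION_CYCLES = 2  # bound stated in the commit message


def next_step(make_report: dict, cycles_used: int) -> str:
    """Pick the next dispatch from @make's report (hypothetical dict shape)."""
    if make_report.get("escalate") != "test_design":
        # No test-design suspicion flagged; nothing to escalate here.
        return "continue"
    if cycles_used >= MAX_ESCALATION_CYCLES:
        # Budget exhausted: the problem is probably upstream in the plan.
        return "phase-3-plan-revisit"
    # @check diagnoses the test, @test redesigns it, then a fresh @make runs.
    return "check-then-test-then-fresh-make"
```

Counting cycles outside the function keeps the rule stateless, so both
the entry-validation and mid-implementation paths can share it.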
2026-05-08 10:20:16 +02:00
c3407c9c98 refactor(opencode): drop @pm git-ref read mode, no longer used by workflow
@pm originally had two read modes — git-ref (via `git show <ref>:TODO.md`)
and filesystem. Git-ref existed because the workflow once ran in a bare
repo with no working tree. Once the workflow was simplified to assume
opencode is launched in the worktree, every dispatch (Phase 2 read,
Phase 9 update, Failure handler) uses filesystem mode. Git-ref mode
became dead weight: it added bash permissions, an allowlist, a "Bash
Discipline" section, and a dual-mode "How to Read" section, but the
workflow never invoked it. A reviewer correctly flagged the resulting
inconsistency between the two-mode docs and the single-mode usage.

@pm is now single-mode. Bash access is removed (bash: false, no
permission allowlist). The "How to Read" section collapses to "you
operate on TODO/ via the filesystem only" with one explicit pointer
that ad-hoc historical reads (`git show main:TODO/GAL-39.md`) are
out of scope — the user can run that themselves.

The workflow drops the now-redundant "(live filesystem mode)"
qualifier from Phase 2 / Phase 9 / Failure handler dispatches and
the Roles & Dispatch table updates @pm's constraint to "No bash."
2026-05-08 10:20:16 +02:00
4dc3cffba6 refactor(opencode): allow @test inside #[cfg(test)] mod blocks, drop file gate
The previous design routed Rust unit tests to NOT_TESTABLE ("Rust
unit-only") because @test was forbidden from touching src/. That
forced @make to write both the production code and the inline
#[cfg(test)] mod tests in one dispatch, losing TDD's RED→GREEN
separation. But Rust module tests inside #[cfg(test)] mod tests
{ ... } are the canonical unit-testing idiom, not an edge case.

@test's File Constraint now allows modifying src/**/*.rs, but
strictly inside #[cfg(test)] mod <name> { ... } blocks. Every line
outside such a block stays read-only — adding pub, importing crates,
declaring siblings, or any other production change is forbidden.
Integration tests at tests/**/*.rs continue to work as before.

The Phase 6 post-step file gate (git status snapshot + comm -23
diff against test-pattern globs) is removed. With @test legitimately
writing inside src/, a path-based gate proves nothing — production
edits and cfg(test) edits live in the same files. The boundary is
enforced by the prompt rule and Phase 8 reviewer scrutiny.

Phase 5 test-file guidance updated to distinguish module vs
integration tests for Rust, with stub-first TDD applying to both
when symbols don't yet exist. The "Rust integration TDD: stub-first"
section is renamed to "Rust stub-first TDD" and now covers module
tests too. NOT_TESTABLE's "Rust unit-only" reason is replaced with
"Missing testability seam" for cases where the production code
needs a small change before tests can be authored.
2026-05-08 10:20:16 +02:00
8373e32f34 fix(opencode): forbid RED-state references in test names
A workflow run produced test names like
move_enemies_following_path_panics_on_todo,
path_types_randomly_assigned, and
spawn_enemies_special_stage_panics_on_todo. The first and third leak
the stub-first RED mechanic into the test name; once @make's body pass
turns them GREEN, the name lies. The middle one is too vague to
describe a contract.

Adds a Test Naming subsection to @test's Test Philosophy stating the
TDD survival principle — the name describes the contract under test,
not the current state, and must remain accurate after the body pass.
Bans ..._panics_on_todo / ..._fails_red / ..._stub_works / generic
placeholders / vague verbs / implementation-detail leakage. Requires
action + observable outcome and shows bad-to-good rewrites of the
three names from this run.
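
As a sketch of the mechanical part of this rule, the banned suffixes
can be checked with a few lines of Python. The helper is hypothetical
and is not part of @test's prompt; vague names and
implementation-detail leakage still need human or reviewer judgment.

```python
# Suffixes banned by the Test Naming subsection.
BANNED_SUFFIXES = ("_panics_on_todo", "_fails_red", "_stub_works")


def red_state_leaks(test_names):
    """Return the names that describe the RED state instead of the contract."""
    return [name for name in test_names if name.endswith(BANNED_SUFFIXES)]


names_from_run = [
    "move_enemies_following_path_panics_on_todo",
    "path_types_randomly_assigned",
    "spawn_enemies_special_stage_panics_on_todo",
]
leaks = red_state_leaks(names_from_run)
```

Here `leaks` holds the first and third names. The middle one passes the
suffix check even though the rule rejects it as too vague.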
2026-05-08 10:20:16 +02:00
5a5cf269dc refactor(opencode): migrate @pm and workflow to per-issue TODO/ folder
The single TODO.md schema is replaced by a Linear-style folder layout
matching the user's existing setup at /home/harald/git/bglga/TODO:

  TODO/
  ├── README.md          # category-grouped index (top-level only)
  ├── GAL-1.md
  ├── GAL-2.md
  └── …

Each issue file has YAML frontmatter (id, title, status, parent,
labels) and a body with optional sections (Sub-issues, Acceptance
criteria, Integration test hints, Comments). The status set shrinks
to Todo / In Progress / Done; Branch / PR / Priority / Assignee
fields are gone. Comments are date-only.
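
A minimal sketch of reading such an issue file, assuming `---`
delimiters and flat `key: value` lines. The sample values are invented,
the helper name is hypothetical, and a real reader would likely use a
YAML parser (for example if labels is a list).

```python
ALLOWED_STATUSES = {"Todo", "In Progress", "Done"}

# Invented sample; only the field names come from the commit message.
SAMPLE_ISSUE = """---
id: GAL-1
title: Example issue
status: In Progress
parent:
labels: gameplay
---
Body text.
"""


def read_frontmatter(text: str) -> dict:
    """Parse flat key/value frontmatter and check the status value."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening --- line")
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        # Split on the first colon only, so values may contain colons.
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    else:
        raise ValueError("missing closing --- line")
    if fields.get("status") not in ALLOWED_STATUSES:
        raise ValueError(f"unexpected status: {fields.get('status')!r}")
    return fields
```

An empty `parent` value comes back as an empty string, which this
sketch treats as a top-level issue.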

@pm gains directory-walking semantics (still scoped to TODO/), bash
allowlist additions for git ls-tree and ls, and a propagation rule:
status flips to/from Done update the dependent index — README.md for
top-level issues, or the parent file's Sub-issues line for sub-issues.

The workflow's Phase 1 sanity check now verifies TODO/, TODO/README.md,
and TODO/<ID>.md all exist. Phase 2 reads the issue file and flips Todo
to In Progress with index propagation. Phase 9 stages everything under
TODO/ as a separate atomic chore(todo) commit, sets the status to Done
(or leaves In Progress for incomplete runs), and adds a date + branch +
commit comment. Failure handler routes through the same directory.
2026-05-08 10:20:16 +02:00
91ba5bd272 fix(opencode): close two false-green test loopholes and the orchestrator-as-implementer escape hatch
A workflow run on a Bevy weaving feature exposed two compounding
failures:

1. @test wrote 8 structural-only Rust tests that never invoked
   weave_enemies or trigger_weaving. Every test passed against the
   stub-first @make pre-pass because none of them called the
   stubbed symbols, so todo!() never fired. The body-pass committed
   code that "passed" the suite and silently broke trigger_weaving
   in special stages.

2. @check found the trigger_weaving regression at Phase 8 (final
   review) and the orchestrator decided to "fix them directly"
   rather than dispatching @make — taking the license offered by
   the existing review-loop wording.

Test-quality fixes:

- Phase 3 Test Design now requires each behavior to be expressed as
  an action + observable outcome. Structural facts ("enum has 3
  variants", "struct has these fields") are explicitly disqualified.
- Phase 6 stub-first flow gains a mandatory Panic-coverage check:
  after @test returns, the orchestrator re-runs the test command and
  rejects the output unless every test panics on todo!() (i.e. every
  test exercises at least one stubbed symbol). Any passing test is
  structural-only and routes back to @test.
- Phase 6 decision table gets a "Stub-first run: tests pass with zero
  todo!() panics" row covering the same case.
- @test's Test Philosophy gains an explicit Do-NOT-write list of
  structural-only patterns (variant_count, type ascriptions,
  Box::new(my_fn), struct-literal-only flows, all-pass-on-stubs)
  plus a positive rule: every test must call a function and assert
  on observable outcome, or return NOT_TESTABLE rather than pad the
  suite.
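
The Panic-coverage check above can be approximated by scanning the
test command's output for tests that passed. This is a sketch under
assumptions: it relies on the usual `test <name> ... ok` / `... FAILED`
lines printed by `cargo test`, the sample test names are invented, and
the helper is hypothetical.

```python
import re

# Matches per-test result lines such as "test foo ... ok".
TEST_LINE = re.compile(r"^test (\S+) \.\.\. (\w+)")

# Invented sample output from a stub-first run.
SAMPLE_OUTPUT = """\
running 3 tests
test weaving_moves_enemy_along_path ... FAILED
test enemy_kind_has_three_variants ... ok
test trigger_weaving_starts_in_special_stage ... FAILED
"""


def passing_tests(test_output: str) -> list:
    """Return tests that passed; on a stub-first run each one is suspect."""
    passed = []
    for line in test_output.splitlines():
        match = TEST_LINE.match(line)
        if match and match.group(2) == "ok":
            passed.append(match.group(1))
    return passed
```

A non-empty result would route the suite back to @test. A fuller check
would also confirm that every failure is a todo!() panic, whose message
is "not yet implemented", and not some unrelated error.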

Orchestrator boundary fix:

- Phase 8 review loop replaces "fix them directly (no need to
  re-dispatch @make for small fixes)" with the principle "the
  orchestrator does not write production code; @make does". BLOCK,
  behavioral, correctness, and test-quality findings round-trip
  through @make. Only AST-preserving cosmetic edits (typos in
  comments, trailing newlines) may be applied directly. Compiler-
  detected issues (unused imports, dead code) go through @make.
2026-05-08 10:20:16 +02:00
5b5c59aa84 feat(opencode): mandate stub-first @make pre-pass for Rust integration TDD
Rust integration tests live in a separate test crate that imports from
lib.rs, so any test referencing not-yet-existing public API can only
RED at build time. The build error masks assertion diagnostics and
makes the RED state opaque — no stack trace, no left/right values.

For Rust tasks whose @test step writes an integration test against
public API that does not yet exist, the orchestrator now dispatches a
stub-first @make pass before @test runs:

1. @make adds the planned public API as todo!()-bodied stubs in
   lib.rs and any new src/<module>.rs. Signatures lifted verbatim
   from the Phase 5 task spec. Acceptance criterion is cargo check
   only — no test command runs.
2. @test writes the integration test, which now compiles and panics
   at todo!() with a stack trace — a clean MISSING_BEHAVIOR RED.
3. Phase 7 dispatches @make again to replace the todo!() bodies with
   real implementations. Two atomic commits per task: scaffold then
   implement.

Phase 5's Rust test-path guidance now flags the two-dispatch
requirement up front. test.md's Rust failure-classification hints
recognize todo!() / unimplemented!() panics as MISSING_BEHAVIOR with
a pointer to the workflow's stub-first section.
2026-05-07 05:42:16 +02:00
d5d90d8b9f fix(opencode): reject Rust src/tests/ paths as a wrong task spec
A workflow run on a Bevy/Rust project produced the test-file path
`src/tests/test_<feature>.rs`, which @test correctly flagged as
contradictory: it isn't a valid Rust test location (would require
declaring `mod tests;` in production source, which @test cannot do)
yet the file-gate glob `**/tests/**/*.rs` accidentally matched it.

Phase 5 now gives language-aware Test File guidance: Python uses
colocated or top-level `tests/`, Rust uses crate-level `tests/<feature>.rs`,
and Rust unit-only tasks are routed to NOT_TESTABLE for @make to
handle inline. Phase 6's file gate gains an explicit anti-pattern
clause discarding any new file under `src/` even when the glob matches.

@test's own File Constraint mirrors the anti-pattern so the agent
rejects the bad path with BLOCKED before the orchestrator's gate
even runs — defense in depth on both sides of the dispatch boundary.
2026-05-06 18:31:14 +02:00
8fcf7e5d34 feat(opencode): make @make and @test polyglot (Python, Rust, nix devshell)
Both agents previously hardcoded the Python/uv toolchain. They now
detect the language from marker files (pyproject.toml, Cargo.toml,
flake.nix) and run the appropriate test/lint/format/type-check commands
for Python, Rust, or both. When a flake.nix devshell is present, every
toolchain command is wrapped in `nix develop -c …`.
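
A small sketch of the detection and wrapping logic described above. The
function names are hypothetical and the agents are prompt-driven, so
this only illustrates the rule; it is not code from the repo.

```python
# Marker files named in the commit message, mapped to a language label.
MARKERS = {"pyproject.toml": "python", "Cargo.toml": "rust"}


def detect_languages(root_files) -> list:
    """Return the languages whose marker file is present in the project root."""
    return sorted({lang for name, lang in MARKERS.items() if name in root_files})


def wrap_command(command: list, root_files) -> list:
    """Prefix the command with `nix develop -c` when a flake.nix is present."""
    if "flake.nix" in root_files:
        return ["nix", "develop", "-c", *command]
    return list(command)
```

For a project root containing Cargo.toml and flake.nix,
`wrap_command(["cargo", "test"], ...)` yields
`["nix", "develop", "-c", "cargo", "test"]`.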

@make's permission allowlist gains `cargo *` and `nix develop -c *`,
plus matching denies for cargo add/remove/install/publish. The
Verification Tiers and Baseline Verification sections are rewritten as
per-language bullets, and output/TDD-evidence examples are now
language-neutral. The "no Kubernetes deployments" constraint is
generalised to cover any deploy/publish.

@test gains the same devshell + cargo allows (scoped to test, check,
clippy, fmt only — no build/run/install). Its file constraint adds
`tests/**/*.rs` for Rust integration tests, with an explicit note that
Rust unit tests stay with @make because they live inside production
source files. Failure-classification hints add Rust compiler-error
mappings, and the NOT_TESTABLE table gets a "Rust unit-only" row.
2026-05-06 17:09:34 +02:00
c879870ccf fix(opencode): remove temperature
2026-05-06 16:43:35 +02:00
d22acf6906 refactor(opencode): let @pm read TODO.md via git show, drop tempfile
Gives @pm narrowly-scoped bash access (git show *, git rev-parse *) so
it can read TODO.md directly from any git ref. The workflow no longer
needs to mktemp + redirect the file before invoking the agent; Phase 2
just tells @pm the bare repo path and default branch and lets it run
git show "$DEFAULT_BRANCH:TODO.md" itself. Cleanup steps for the temp
snapshot are removed from Phase 10 and the failure handler.
2026-05-06 15:42:17 +02:00
37be2d9505 fix(opencode): remove agent models and temperature
2026-05-06 15:33:11 +02:00
2941faa822 refactor(opencode): make workflow forge-agnostic and read TODO.md from bare repo
Drops all GitHub-specific tooling (gh CLI, draft PR creation) so the
workflow stops at git commit and leaves push/PR/MR to the user.

TODO.md is now expected to be a tracked file on the default branch.
Phase 1 verifies the repo is bare via `git rev-parse --is-bare-repository`,
resolves the default branch from HEAD / init.defaultBranch, and snapshots
TODO.md via `git show "$DEFAULT_BRANCH:TODO.md"` to a tempfile that @pm
reads in Phase 2. Phase 10 updates the live TODO.md inside the worktree
and commits the change separately. The /review command drops its PR
mode for the same reason; @pm documents the read-only-snapshot vs.
live-worktree path distinction.
2026-05-06 15:28:08 +02:00
4ec1561af4 feat(opencode): add multi-agent workflow agents and commands
Adds @check, @simplify, @test, @make, @pm subagents and the /workflow
and /review slash commands from the autonomous multi-agent workflow
gist by ppries.

@pm is rewritten to manage issues in a local ./TODO.md file instead of
Linear (file-only access, documented schema, structured JSON output).

/workflow is adapted: TODO.md-based issue context, generic worktree
paths (no hardcoded ~/repos/veo/sunstone), generic branch examples,
and a Phase 1 guard that verifies origin is on GitHub before any
work begins.
2026-05-06 14:56:42 +02:00