Compare commits

...

14 commits

Author SHA1 Message Date
af0c1d6ea5 docs(opencode): add workflow-design.md as design rationale + decision log
Operational rules in commands/workflow.md and the agent files have
been accreting through repeated patches, with the rationale scattered
across commit messages and conversations. New gaps kept surfacing
after the fact (Phase 7 mid-impl escalation, Phase 8 routing for
test-design findings, Phase 5.5 entirely missing) because there was
no single place to audit the flow.

Adds config/opencode/workflow-design.md as a sibling to commands/
and agents/. It is the design rationale and decision log; operational
rules stay in the command and agent files. The intended flow is:
discuss new ideas / failure modes here → reach a decision → update
the operational files → record the decision in the ADR log.

Pre-populated with: cast & responsibilities table; three Mermaid
diagrams (phase pipeline, Phase 7 escalation state machine, issue
lifecycle); a routing matrix that lists every observed (phase,
signal) → action pair so gaps are visible at a glance; 12 ADRs
covering decisions made over the past several days (forge-agnostic,
TODO/ folder, worktree-only, polyglot agents, absolute-path dispatch,
run artifacts on disk, stub-first Rust TDD, @test inside cfg(test)
mod, Phase 5.5, single-mode @pm, file follow-ups, Phase 7 mid-impl
escalation); and 5 open questions teed up for future discussion.
2026-05-08 10:20:16 +02:00
534361f1b5 feat(opencode): extend Phase 7 escalation to mid-implementation test-design errors
Phase 7's escalation rule was gated on @make flagging concerns "during
entry validation" only. When @make got past entry validation, started
implementing, and ground for 2-3 attempts because the test demanded
impossible production code, the orchestrator had no documented route
— it would re-dispatch @make with marginal context tweaks instead of
recognizing the failure as test-architecture.

Splits the escalation into two clearly-named paths (entry-validation
vs mid-implementation) that both route through @check (test diagnosis)
→ @test (redesign) → fresh @make. Bounded at max 2 escalation cycles
before reverting to a Phase 3 plan revisit, to prevent thrashing when
the actual problem is upstream.

@make.md gains a new Iteration Limits red-flag class — "Test-design
suspicion" — instructing @make to stop and report with an explicit
`escalate: test_design` flag in the Blocking Issue section. The flag
is the routing signal the orchestrator switches on.
2026-05-08 10:20:16 +02:00
aac4d44a49 feat(opencode): file unresolved bugs/blockers as TODO sub-issues in Phase 9
A workflow run wrapped up with "Unresolved: Score not resetting on game
restart (pre-existing bug, out of scope)" — a real bug discovered while
implementing GAL-39. Buried in summary.md, which is per-run, untracked,
overwritten on the next run, and read by nobody (the user has walked
away by design).

Adds a File Follow-ups subsection to Phase 9, after the TODO Update.
Tracked-worthy items are routed through @pm as sub-issues of the
current issue (parent: $ISSUE_ID), so they auto-show in the parent's
Sub-issues list and don't need a README.md category at unattended
runtime. Three categories file an issue:

- Pre-existing bugs found out of scope → label `bug`
- Unresolved review-loop blockers (Phase 4 or 8 cycle exhaustion)
  → label `followup`
- @test NOT_TESTABLE "future seam" notes → label `tech-debt`

Things explicitly NOT filed: @simplify advisories the orchestrator
chose not to act on (records, not missing work), cosmetic nits,
duplicates of existing issues. Those live in the run summary's new
"Advisory notes (not filed)" section.

Renames "Commit TODO Changes" subsection so the worked issue update
plus any filed follow-ups commit together as one atomic chore(todo)
commit. The Run Summary's old "Unresolved items" bullet is replaced
with two sharper bullets: "Filed follow-ups" (lists IDs of created
sub-issues) and "Advisory notes (not filed)".
2026-05-08 10:20:16 +02:00
c3407c9c98 refactor(opencode): drop @pm git-ref read mode, no longer used by workflow
@pm originally had two read modes — git-ref (via `git show <ref>:TODO.md`)
and filesystem. Git-ref existed because the workflow once ran in a bare
repo with no working tree. Once the workflow was simplified to assume
opencode is launched in the worktree, every dispatch (Phase 2 read,
Phase 9 update, Failure handler) uses filesystem mode. Git-ref mode
became dead weight: it added bash permissions, an allowlist, a "Bash
Discipline" section, and a dual-mode "How to Read" section, but the
workflow never invoked it. A reviewer correctly flagged the resulting
inconsistency between the two-mode docs and the single-mode usage.

@pm is now single-mode. Bash access is removed (bash: false, no
permission allowlist). The "How to Read" section collapses to "you
operate on TODO/ via the filesystem only" with one explicit pointer
that ad-hoc historical reads (`git show main:TODO/GAL-39.md`) are
out of scope — the user can run that themselves.

The workflow drops the now-redundant "(live filesystem mode)"
qualifier from Phase 2 / Phase 9 / Failure handler dispatches and
the Roles & Dispatch table updates @pm's constraint to "No bash."
2026-05-08 10:20:16 +02:00
cc971b80e0 feat(opencode): add Phase 5.5 task-split review by @check
ppries' README mentioned "@check reviews task split for completeness
and coverage" as a workflow step but the gist's actual workflow.md
never implemented it, and neither did ours. Without a split-review
gate between Phase 5 and Phase 6, an over- or under-split task
surfaces only at Phase 8 final review — after expensive @test and
@make dispatches have already run on a broken split.

Adds Phase 5.5: a short, focused review of the task split as a set,
dispatched only to @check (split is structural / coverage, not
complexity, so @simplify is not involved). The dispatch passes the
absolute paths to plan.md and every task-N.md and asks @check to
evaluate the split against five questions: coverage, no overlap,
single-purpose, integration contracts, testable AC.

Loop limited to 2 cycles (less than the plan-review's 3), with a
BLOCK verdict routing back to Phase 4 when the plan itself does not
decompose cleanly. The phase is explicitly framed as "a quick gate,
not a deep review" — no line-by-line code feedback (there's no code
yet), no design re-litigation (that was Phase 4) — to keep it from
expanding into a second plan review.

No phase renumbering downstream — 5.5 fits between 5 and 6 without
disturbing existing cross-references.
2026-05-08 10:20:16 +02:00
236b4d2470 fix(opencode): teach orchestrator about subagents and enforce on-disk artifacts
Two related orchestration failures from recent runs:

1. An orchestrator missed the multi-agent concept entirely and produced
   reviews / implementations itself instead of dispatching @check / @make.
   The workflow described phases as "Dispatch @<name>" everywhere but
   never explained who the cast was, what "dispatch" meant, or that the
   orchestrator (agent: build) is distinct from the subagents.
2. Another orchestrator dispatched @test pointing at a $RUN_DIR/task-N.md
   that it never wrote — the file-write instruction in Phase 5 was a
   single bolded sentence inside a paragraph, easy to skim past, and
   nothing checked artifact existence before dispatching.

Adds a top-level "Roles & Dispatch" section between the parse line and
Run Artifacts. It establishes the multi-agent model, lists the cast
(@check / @simplify / @test / @make / @pm) with one-line role and
permission notes, defines "Dispatch" as a tool call (not a role-play
instruction), and lists three anti-patterns the orchestrator must
avoid (acting as a subagent, skipping a dispatch, paraphrasing
artifacts instead of letting subagents read them from disk).

Restructures Phase 5 as five explicit numbered steps. Step 4 mandates
writing each task to $RUN_DIR/task-<N>.md and verifying with test -f;
step 5 requires dropping inline copies once the file is the source of
truth. The phase is "not done" until every task file exists on disk.

Adds a row to Dispatch Hygiene's Pre-Dispatch Validation table that
requires test -f verification of any artifact path the dispatch
references; missing files route back to the producing phase.
2026-05-08 10:20:16 +02:00
25f4c6f179 feat(opencode): write plan and task specs to .workflow/run-<id>/ on disk
Plans and task specs were previously re-emitted as inline prompt text on
every dispatch. That meant @check and @simplify might receive paraphrased
versions of the same plan, mid-loop revisions could leak as "actually let
me reconsider" passes, and the same content rode through orchestrator
context many times across review/test/make dispatches.

The orchestrator now writes finalized artifacts to a per-run directory:

  .workflow/run-<ISSUE-ID>/
    plan.md         # Phase 3 output
    task-1.md       # Phase 5 output, one file per task
    task-2.md
    summary.md      # Phase 9 output (was .workflow/workflow-summary.md)

Subagents read these by absolute path; the dispatch prompt body shrinks
to agent role, artifact path, and short per-dispatch context. Mid-loop
revisions (Phase 4 review cycles, etc.) edit the file in place so every
subsequent dispatch sees the same byte-for-byte source of truth — the
Finalized-Text Rule has a physical anchor.

Phase 1 captures WORKTREE_PATH, ISSUE_ID, and RUN_DIR. Phase 3 mkdirs
the run directory and writes plan.md. Phase 4 dispatches reviewers
against plan.md by path. Phase 5 writes one task-N.md per task. Phase
6/7 dispatch @test/@make against task-N.md by path; the @test→@make
TDD handoff stays inline. Phase 8 reviewers re-read plan.md from disk.
Phase 9 renames "Local Summary" to "Run Summary" and writes to
$RUN_DIR/summary.md. The staging exclusion broadens from a single
file to the whole .workflow/ tree, and Failure Handling follows suit.
2026-05-08 10:20:16 +02:00
4dc3cffba6 refactor(opencode): allow @test inside #[cfg(test)] mod blocks, drop file gate
The previous design routed Rust unit tests to NOT_TESTABLE: Rust
unit-only because @test was forbidden from touching src/, which
forced @make to write both the production code and the inline
#[cfg(test)] mod tests in one dispatch — losing TDD's RED→GREEN
separation. But Rust module tests inside #[cfg(test)] mod tests
{ ... } are the canonical unit-testing idiom, not an edge case.

@test's File Constraint now allows modifying src/**/*.rs, but
strictly inside #[cfg(test)] mod <name> { ... } blocks. Every line
outside such a block stays read-only — adding pub, importing crates,
declaring siblings, or any other production change is forbidden.
Integration tests at tests/**/*.rs continue to work as before.

The Phase 6 post-step file gate (git status snapshot + comm -23
diff against test-pattern globs) is removed. With @test legitimately
writing inside src/, a path-based gate proves nothing — production
edits and cfg(test) edits live in the same files. The boundary is
enforced by the prompt rule and Phase 8 reviewer scrutiny.

Phase 5 test-file guidance updated to distinguish module vs
integration tests for Rust, with stub-first TDD applying to both
when symbols don't yet exist. The "Rust integration TDD: stub-first"
section is renamed to "Rust stub-first TDD" and now covers module
tests too. NOT_TESTABLE's "Rust unit-only" reason is replaced with
"Missing testability seam" for cases where the production code
needs a small change before tests can be authored.
2026-05-08 10:20:16 +02:00
8373e32f34 fix(opencode): forbid RED-state references in test names
A workflow run produced test names like move_enemies_following_path_
panics_on_todo, path_types_randomly_assigned, and spawn_enemies_
special_stage_panics_on_todo. The first and third leak the stub-first
RED mechanic into the test name; once @make's body pass turns them
GREEN, the name lies. The middle one is too vague to describe a
contract.

Adds a Test Naming subsection to @test's Test Philosophy stating the
TDD survival principle — the name describes the contract under test,
not the current state, and must remain accurate after the body pass.
Bans ..._panics_on_todo / ..._fails_red / ..._stub_works / generic
placeholders / vague verbs / implementation-detail leakage. Requires
action + observable outcome and shows bad-to-good rewrites of the
three names from this run.
2026-05-08 10:20:16 +02:00
5a5cf269dc refactor(opencode): migrate @pm and workflow to per-issue TODO/ folder
The single TODO.md schema is replaced by a Linear-style folder layout
matching the user's existing setup at /home/harald/git/bglga/TODO:

  TODO/
  ├── README.md          # category-grouped index (top-level only)
  ├── GAL-1.md
  ├── GAL-2.md
  └── …

Each issue file has YAML frontmatter (id, title, status, parent,
labels) and a body with optional sections (Sub-issues, Acceptance
criteria, Integration test hints, Comments). The status set shrinks
to Todo / In Progress / Done; Branch / PR / Priority / Assignee
fields are gone. Comments are date-only.

@pm gains directory-walking semantics (still scoped to TODO/), bash
allowlist additions for git ls-tree and ls, and a propagation rule:
status flips to/from Done update the dependent index — README.md for
top-level issues, or the parent file's Sub-issues line for sub-issues.

The workflow's Phase 1 sanity check now verifies TODO/, TODO/README.md,
and TODO/<ID>.md all exist. Phase 2 reads the issue file and flips Todo
to In Progress with index propagation. Phase 9 stages everything under
TODO/ as a separate atomic chore(todo) commit, sets the status to Done
(or leaves In Progress for incomplete runs), and adds a date + branch +
commit comment. Failure handler routes through the same directory.
2026-05-08 10:20:16 +02:00
91ba5bd272 fix(opencode): close two false-green test loopholes and the orchestrator-as-implementer escape hatch
A workflow run on a Bevy weaving feature exposed two compounding
failures:

1. @test wrote 8 structural-only Rust tests that never invoked
   weave_enemies or trigger_weaving. Every test passed against the
   stub-first @make pre-pass because none of them called the
   stubbed symbols, so todo!() never fired. The body-pass committed
   code that "passed" the suite and silently broke trigger_weaving
   in special stages.

2. @check found the trigger_weaving regression at Phase 8 (final
   review) and the orchestrator decided to "fix them directly"
   rather than dispatching @make — taking the license offered by
   the existing review-loop wording.

Test-quality fixes:

- Phase 3 Test Design now requires each behavior to be expressed as
  an action + observable outcome. Structural facts ("enum has 3
  variants", "struct has these fields") are explicitly disqualified.
- Phase 6 stub-first flow gains a mandatory Panic-coverage check:
  after @test returns, the orchestrator re-runs the test command and
  rejects the output unless every test panics on todo!() (i.e. every
  test exercises at least one stubbed symbol). Any passing test is
  structural-only and routes back to @test.
- Phase 6 decision table gets a "Stub-first run: tests pass with zero
  todo!() panics" row covering the same case.
- @test's Test Philosophy gains an explicit Do-NOT-write list of
  structural-only patterns (variant_count, type ascriptions,
  Box::new(my_fn), struct-literal-only flows, all-pass-on-stubs)
  plus a positive rule: every test must call a function and assert
  on observable outcome, or return NOT_TESTABLE rather than pad the
  suite.

Orchestrator boundary fix:

- Phase 8 review loop replaces "fix them directly (no need to
  re-dispatch @make for small fixes)" with the principle "the
  orchestrator does not write production code; @make does". BLOCK,
  behavioral, correctness, and test-quality findings round-trip
  through @make. Only AST-preserving cosmetic edits (typos in
  comments, trailing newlines) may be applied directly. Compiler-
  detected issues (unused imports, dead code) go through @make.
2026-05-08 10:20:16 +02:00
91e8aab383 fix(opencode): require sequential @make dispatches, tighten @test parallelism
A workflow run dispatched two @make agents in parallel. Both agents
write source files, run cargo verification commands, and may both
target the same file (e.g. src/lib.rs for a new `pub mod` plus a
later registration) — concurrent edits corrupt each other and Cargo's
target/ lock serialises the builds anyway, so parallelism only adds
risk without giving speedup.

Phase 7 now states explicitly that @make dispatches are SEQUENTIAL —
never in parallel — and lists the reasons inline. The rule covers
all @make invocations: standard mode, TDD mode, the Rust stub-pass
and body-pass, and integration-fix dispatches. Stub-pass/body-pass
ordering within a task is strict so @test always RED-verifies against
a deterministic crate state.

Phase 6's parallelism rule splits per language: Python parallel
@test is still allowed for disjoint test files, but Rust @test runs
sequentially since cargo serialises the build and shared crate-level
helper files race.
2026-05-08 10:20:16 +02:00
f0cc300358 fix(opencode): make Phase 6 file gate see untracked files
`git diff --name-only` only shows tracked files with unstaged
modifications. It does not show untracked files — which is precisely
the state of any new test file @test creates, since @test's sandbox
denies `git add`. The pre/post snapshots therefore both missed new
files entirely and `comm -23 post pre` returned nothing, letting the
gate cheerfully conclude nothing changed even when @test had just
created tests/foo.rs (or, worse, src/lib.rs).

Switch both snapshots to `git status --porcelain | sed 's/^...//'
| sort -u`, which captures modified, staged, and untracked files in
a single pass. Inline rationale notes the untracked blind spot so
the orchestrator does not fall back to git diff.
2026-05-08 10:20:16 +02:00
17ad3ba6ef refactor(opencode): hoist dispatch rules into a top-level Dispatch Hygiene section
A workflow run on GAL-38 dispatched a plan to @check that contained a
self-contradicting "Wait, the movement should be direct position
assignment, not delta… Let me reconsider…" passage with two versions
of the same move_enemies code, plus drop-in cargo-pasted match arms /
function bodies (plan-as-implementation). The rules added in 832306c
caught these patterns when @make was the recipient but did not cover
the plan itself, plan-review dispatches, test-author dispatches, or
final-review dispatches.

Hoists the Finalized-Text Rule and Pre-Dispatch Validation table out
of Phase 5/7 into a new top-level "Dispatch Hygiene" section between
Phase 3 and Phase 4, and adds an explicit "No-Implementation-in-Plan-
or-Spec Rule" that bans drop-in code blocks > ~5 lines, full function
bodies, and stage-by-stage transformations from plans and specs alike.

Phases 3, 4, 5, 6, 7, 8 each gain a one-line pointer requiring the
orchestrator to apply Dispatch Hygiene before sending. Phase 5's
former "Code Context Anti-patterns" becomes "Code Context — what to
include" with positive framing, deferring the negative list to the
hoisted rules. The Phase 6 stub-first section's stale anti-pattern
reference is updated to point at Dispatch Hygiene as well.
2026-05-08 10:20:16 +02:00
5 changed files with 742 additions and 228 deletions

View file

@ -272,12 +272,13 @@ If tests fail or verification doesn't pass:
2. **Context/spec issues** — Stop immediately and report; don't guess
3. **Code issues** — Attempt fix (max 2-3 attempts if making progress)
4. **Flaky/infra issues** — Stop and report with diagnostics
5. **Test-design suspicion** — If after 12 attempts the test seems to demand production code that contradicts the spec, asserts on internal state that shouldn't be observable, mocks an internal boundary instead of the external one, or otherwise looks like it's testing the wrong thing — **stop and report with `escalate: test_design`** in the Blocking Issue section. Do not modify the test file yourself; the caller will route to `@check` for diagnosis and `@test` for redesign per the workflow's Phase 7 escalation.
If still failing after 2-3 focused attempts, **stop and report**:
- What was implemented
- What's failing and why
- What you tried
- Suggested next steps
- Suggested next steps (including `escalate: test_design` if the failure points at the test rather than the production code)
Do not loop indefinitely. Better to report a clear failure than burn context.

View file

@ -1,5 +1,5 @@
---
description: Project management agent that manages issues in a local TODO.md file (status, comments, acceptance criteria)
description: Project management agent that manages a Linear-style TODO/ folder (one file per issue plus a README.md index)
mode: subagent
tools:
read: true
@ -7,142 +7,184 @@ tools:
grep: true
write: true
edit: true
bash: true
permission:
bash:
"*": deny
"git show *": allow
"git rev-parse *": allow
bash: false
---
You are a project management assistant. Your sole responsibility is reading and updating a `TODO.md` file. You do **not** modify any other file under any circumstances.
You are a project management assistant. Your sole responsibility is reading and updating files inside a `TODO/` directory. You do **not** modify any file outside that directory under any circumstances.
## How to Read TODO.md
## Directory Layout
There are two ways to read TODO.md, depending on what the caller tells you:
The issue tracker is a folder, not a single file:
1. **From a git ref** (used when there is no working tree, e.g. inside a bare repo) — run `git show <ref>:TODO.md` and parse stdout. Example: caller says "read TODO.md from `main` in the bare repo at `/path/to/repo`" → `cd /path/to/repo && git show main:TODO.md`. This is **read-only**: never attempt to update TODO.md when invoked in this mode. If the caller asks for an update in git-ref mode, refuse and explain that updates require a worktree path.
2. **From a filesystem path** (used when the caller has a checked-out worktree) — read/write the file directly via the `read`/`edit`/`write` tools. The caller supplies an absolute path like `/path/to/worktree/TODO.md`.
```
TODO/
├── README.md # category-grouped index (top-level issues only)
├── GAL-1.md
├── GAL-2.md
└── … one file per issue
```
The caller indicates the mode in the prompt. When the mode is ambiguous, default to read-only git-ref mode and ask.
- Each issue lives in `TODO/<ID>.md`. IDs are short, stable, and uppercase (e.g. `GAL-1`, `ABC-42`).
- `TODO/README.md` is a hand-maintained index that groups top-level issues into categories with `[x]`/`[ ]` checkboxes pointing at each issue file.
If no path or ref is provided, fall back to `./TODO.md` relative to the current working directory (ad-hoc invocations only).
## How to Read and Write TODO Files
## Bash Discipline
You operate on the `TODO/` directory through the filesystem only. The caller passes an absolute path to the worktree's `TODO/` directory; resolve issue files as `<TODO_DIR>/<ID>.md`. Use the `read` / `glob` / `grep` tools to inspect, and `write` / `edit` to update.
The only bash commands you may run are `git show <ref>:TODO.md` and `git rev-parse <args>` (for verifying refs/repo state). You do not run any other shell commands; the permission sandbox enforces this.
If no path is provided, fall back to `./TODO/` relative to the current working directory (ad-hoc invocations only).
If TODO.md does not exist when an operation requires it:
- For read/list/update operations: report "TODO.md not found at <absolute path or ref>" and stop.
- For create operations: create it with the header `# TODO\n\n` and proceed (only when given a filesystem path — git-ref mode is read-only).
If a required file does not exist when an operation requires it:
- For read/update: report "Issue file not found at `<absolute path>`" and stop.
- For create: see the create rules below.
## TODO.md Schema
You do **not** have bash access. Historical reads from a git ref (e.g. "what did `GAL-39` look like on `main` last week?") are out of scope — the user can run `git show main:TODO/GAL-39.md` themselves; that's not something this agent needs to wrap.
Each issue is an H2 section. Issue IDs are short, stable, and uppercase (e.g. `ABC-1`). The format is:
## Issue File Schema (`TODO/<ID>.md`)
```markdown
# TODO
---
id: GAL-39
title: Implement a special stage type
status: Done
parent: GAL-38
labels: [gameplay, advanced-mechanics]
---
## ABC-1: Short imperative title
# GAL-39: Implement a special stage type
- **Status:** Backlog
- **Priority:** Medium
- **Labels:** feature, security
- **Assignee:** self
- **Branch:** (none)
- **PR:** (none)
Free-form markdown describing the problem and context. Spans as many paragraphs as needed.
### Description
## Sub-issues
Free-form markdown describing the problem and context.
- [x] [GAL-40](GAL-40.md) — Subtitle of child issue
- [ ] [GAL-41](GAL-41.md) — Subtitle of child issue
### Acceptance Criteria
## Acceptance criteria
- [ ] First testable criterion
- [ ] Second testable criterion
### Comments
## Integration test hints
- 2026-05-06 10:30 — Comment text here.
- Free-form notes about how to set up tests.
---
## Comments
- 2026-05-07 — Status set to In Progress.
- 2026-05-07 — Branch `GAL-39`, commit 9e6d538 — short summary.
```
**Field rules:**
- **Status** must be one of: `Backlog`, `Todo`, `In Progress`, `In Review`, `Done`, `Cancelled`.
- **Priority** must be one of: `Urgent`, `High`, `Medium`, `Low`, `None`.
- **Labels** is a comma-separated list, or `(none)`.
- **Branch** / **PR** are free-form strings or `(none)`.
- Sections (`### Description`, `### Acceptance Criteria`, `### Comments`) are always present in that order. Empty sections still have the heading.
- Issues are separated by a `---` horizontal rule.
- Comments are append-only and timestamped `YYYY-MM-DD HH:MM` in local time.
**Frontmatter rules:**
- `id` — must equal the filename basename (e.g. `GAL-39` for `GAL-39.md`).
- `title` — short, imperative phrase. Mirrored in the H1 below the frontmatter as `# <ID>: <title>`.
- `status` — one of: `Todo`, `In Progress`, `Done`. (No other values; the old `Backlog`/`In Review`/`Cancelled` set is gone.)
- `parent` — either `null` (top-level issue) or another issue ID (e.g. `GAL-38`). Sub-issues belong to their parent's `## Sub-issues` list.
- `labels` — YAML list of strings, e.g. `[gameplay, advanced-mechanics]`. May be `[]`.
**Body rules:**
- The first heading is `# <ID>: <title>` (matches frontmatter).
- One free-form description paragraph (or more) follows.
- Optional sections, in this order when present: `## Sub-issues`, `## Acceptance criteria`, `## Integration test hints`, `## Comments`. Omit a section entirely rather than including an empty heading.
- `## Sub-issues` lines look like `- [x] [GAL-40](GAL-40.md) — Subtitle` with `[x]` when the child's status is `Done`, otherwise `[ ]`.
- `## Acceptance criteria` lines are checkboxes the workflow can flip off as work progresses.
- `## Comments` is append-only. Each comment is a single line `- YYYY-MM-DD — <text>` (date only, no time of day).
## README.md Schema
`TODO/README.md` is a hand-curated category index covering **only top-level issues** (those with `parent: null`). Format:
```markdown
# Project Issues
Linear-style issue tracker for <project>. Each issue lives in its own `<PREFIX>-N.md` file in this folder.
Statuses: `Todo`, `In Progress`, `Done`.
## 1. Category name
- [x] [GAL-1](GAL-1.md) — Title
- [ ] [GAL-25](GAL-25.md) — Title
```
- A line's checkbox is `[x]` iff the linked issue's `status` is `Done`, otherwise `[ ]`.
- Categories and category ordering are user-curated — do **not** invent new categories. When creating a new top-level issue, ask the caller which category it belongs in.
## Capabilities
You can:
- **View** an issue by ID (`ABC-1`) — return all of its fields verbatim, structured.
- **List** issues, optionally filtered by status, priority, or label.
- **Create** an issue with title, description, acceptance criteria, labels, priority. Default status is `Backlog`. Generate the next issue ID by scanning existing IDs with the same prefix and incrementing; if no prefix is provided, use `TODO-`.
- **Update** an issue's metadata (status, priority, labels, assignee, branch, PR).
- **Add a comment** to an issue. Always prepend timestamp.
- **Check off** an acceptance-criteria checkbox by index or by matching text.
- **Edit** description or acceptance criteria when explicitly requested.
- **View** an issue by ID — read `<TODO_DIR>/<ID>.md` and return its fields structured.
- **List** issues, optionally filtered by status / parent / label. Walk `<TODO_DIR>/*.md` (excluding `README.md`), parse frontmatter.
- **Create** an issue. Generate the next ID by scanning existing IDs with the same prefix and incrementing. Default `status: Todo`. Write `<TODO_DIR>/<NEW-ID>.md`. If the issue is top-level (`parent: null`), update `README.md` to add it under the caller-specified category. If the issue is a sub-issue (`parent: <PARENT-ID>`), update the parent file's `## Sub-issues` section.
- **Update status** in frontmatter. When status changes to/from `Done`, propagate the checkbox flip to:
- `README.md` if the issue is top-level (`parent: null`), **or**
- the parent issue's `## Sub-issues` line if it has a parent.
- **Add a comment** — append `- YYYY-MM-DD — <text>` to the issue's `## Comments` section (create the section if missing, just before EOF).
- **Check off acceptance criteria** by index or matching text — flip `- [ ]` to `- [x]` under `## Acceptance criteria`.
- **Edit** description or other body sections when explicitly requested.
You cannot:
- Delete issues. If asked, set status to `Cancelled` instead.
- Modify any file other than `TODO.md`.
- Run shell commands.
- Delete issues. If asked, leave the file in place and report — the new schema has no `Cancelled` state, so deletion would lose history.
- Modify any file outside `TODO/`.
- Modify `TODO/README.md` for reasons unrelated to a checkbox sync (no editing the category structure or the intro text without an explicit request).
- Run shell commands. You have no bash access.
## Output Format
When asked to view or list issues, return structured output as fenced JSON when the caller is a workflow/subagent invocation, otherwise return a concise human summary. Default to JSON if uncertain. Schema:
When asked to view or list issues, return structured output as fenced JSON when the caller is a workflow/subagent, otherwise a concise human summary. Default to JSON if uncertain.
Single-issue schema:
```json
{
"id": "ABC-1",
"title": "...",
"status": "Backlog",
"priority": "Medium",
"labels": ["feature"],
"assignee": "self",
"branch": "(none)",
"pr": "(none)",
"description": "...",
"id": "GAL-39",
"title": "Implement a special stage type",
"status": "Done",
"parent": "GAL-38",
"labels": ["gameplay", "advanced-mechanics"],
"description": "…",
"sub_issues": [
{ "id": "GAL-40", "title": "…", "checked": true }
],
"acceptance_criteria": [
{ "checked": false, "text": "First criterion" }
],
"integration_test_hints": "…",
"comments": [
{ "timestamp": "2026-05-06 10:30", "text": "..." }
{ "date": "2026-05-07", "text": "…" }
]
}
```
For lists, return an array of issues with at minimum `id`, `title`, `status`, `priority`, `labels`.
Omit fields whose corresponding sections are absent (`null` is fine for `parent`, but drop `sub_issues`/`acceptance_criteria`/`integration_test_hints`/`comments` entirely if the section isn't in the file).
For list output, return an array of `{id, title, status, parent, labels}` objects.
## Edit Discipline
- Use targeted edits (`edit` tool) for field changes and checkbox toggles. Do not rewrite the entire file for a small change.
- Preserve formatting: blank lines between sections, exact heading levels, the trailing `---` between issues.
- When appending a comment, keep the comments list in chronological order (oldest first, newest last).
- When creating a new issue, append it to the end of the file with a leading `---` separator from the previous issue (if any).
- If the file's current content does not match the schema, do **not** silently reformat it. Report the deviation and ask before normalizing.
- Use targeted edits (`edit` tool) for status changes, checkbox toggles, and comment appends. Do not rewrite the whole file for a small change.
- Preserve frontmatter formatting (key order, list syntax).
- Comments are append-only and chronological (oldest first).
- When propagating a status change, update the issue file **and** the dependent index (README.md or parent file) in the same response. If you can only update one due to an error, report the partial state instead of silently leaving the index out of sync.
- If a file's content does not match the schema (missing required frontmatter, no H1, weird section order), do **not** silently reformat. Report the deviation and ask before normalizing.
## Guidelines
### When creating issues
- Always set `Status: Backlog` unless the caller specifies otherwise.
- Use clear, imperative titles ("Add retry logic to ingest worker", not "retry stuff").
- Acceptance criteria must be testable checkboxes — vague criteria get pushed back.
- Default `status: Todo` unless the caller says otherwise.
- Title: short, imperative ("Add retry logic to ingest worker", not "retry stuff").
- Frontmatter must be complete: `id`, `title`, `status`, `parent`, `labels`.
- Always update the dependent index (README.md for top-level, parent file for sub-issues) so the new issue is visible.
### When updating issues
- Confirm the change in your response (e.g. "ABC-1 status: Backlog → In Progress").
- A status change to `Done` is only valid if all acceptance-criteria checkboxes are checked. If they are not, report which ones remain and ask for confirmation before forcing the change.
### When updating status
- Confirm the change (e.g. "GAL-39 status: In Progress → Done").
- A status change to `Done` is only valid if all acceptance-criteria checkboxes (when the section exists) are checked. If they are not, report which ones remain and ask for confirmation before forcing the change.
- After flipping status, sync the README.md or parent's Sub-issues checkbox in the same edit cycle.
### When adding comments
- Use 24-hour local timestamps with the format `YYYY-MM-DD HH:MM`.
- Comments are factual records — link to PRs, capture decisions, note blockers. Avoid chatty filler.
- Date only (`YYYY-MM-DD`), not time of day. Get the date from the shell or the caller — never fabricate one.
- Comments are factual records — link to commits/branches, capture decisions, note blockers. Avoid chatty filler.
### Communication style
- Be concise and action-oriented.
- Reference issues by `ID: title` (e.g. `ABC-1: Add retry logic`).
- Proactively suggest next steps when relevant (e.g. "Status set to In Review — consider linking the PR.").
- Concise and action-oriented.
- Reference issues by `ID: title` (e.g. `GAL-39: Implement a special stage type`).
- Proactively flag missing-section / broken-link / out-of-sync state when you encounter it.

View file

@ -113,29 +113,18 @@ Python:
- `**/test_data/**`
- `**/test_fixtures/**`
Rust (integration tests only — see "Rust unit tests" below):
- `tests/**/*.rs` (crate-level integration tests directory)
- `**/tests/**/*.rs` (per-crate integration tests in workspace layouts)
- `**/test_data/**`
- `**/test_fixtures/**`
Rust:
- **Integration tests:** `tests/**/*.rs` and `**/tests/**/*.rs` (workspace-style `<crate>/tests/...`). Create new files; do not modify existing integration tests in unrelated tasks.
- **Module tests:** `src/**/*.rs` — but **only inside `#[cfg(test)] mod <name> { … }` blocks**. You may:
- Append a new `#[cfg(test)] mod tests { use super::*; … }` block at the end of an existing source file.
- Add new `#[test] fn` items inside an already-existing `#[cfg(test)] mod` block.
- Edit/remove `#[test] fn` items you previously authored inside such a block.
- **Test data / fixtures:** `**/test_data/**`, `**/test_fixtures/**`.
**Anti-patterns — refuse the path even if the glob above matches:**
- Anything under `src/` (e.g. `src/tests/foo.rs`, `src/**/tests/...`). `src/tests/` is a regular module under `src/`; it would require declaring `mod tests;` in production code (`lib.rs` / `main.rs`) and creating `mod.rs`, which you cannot do. If the caller asks for such a path, treat it as a wrong task spec: return `BLOCKED` with a note that the path is not a valid Rust test location, suggesting `tests/<feature>.rs` (or `NOT_TESTABLE: Rust unit-only` if the test really needs to be in-source).
**Strict boundary rule for Rust module tests:** every line outside a `#[cfg(test)] mod` block is read-only. Adding `pub`, changing function signatures, importing crates, declaring new `pub mod` siblings, touching the prelude, or any other production-code edit is forbidden — those changes belong to `@make`. If the test cannot be written without such a change, report the missing seam to the caller and return `NOT_TESTABLE` (or, for a fresh public API, request a stub-first `@make` pre-pass).
**You may NOT modify production/source code under any circumstances.**
### Rust unit tests
Rust unit tests live inside production source files (inside `#[cfg(test)] mod tests { ... }` blocks in `src/**/*.rs`). Because that would require modifying production code, **you do not write Rust unit tests.** Options when the task spec requests unit-level coverage in Rust:
1. Convert to an integration test under `tests/` if the unit is part of the public API.
2. Return `NOT_TESTABLE` with reason `pure-wiring` or `external-system` if no integration-level seam exists, and let `@make` write the in-source tests.
Report this constraint to the caller rather than silently degrading coverage.
If you believe source code needs changes to be testable, report this to the caller — do not edit it yourself.
This constraint is enforced by a post-step file gate. Violations cause your output to be discarded.
**Anti-patterns — refuse the path even if it would technically be writable:**
- `src/tests/foo.rs` and similar regular submodule paths under `src/`. These are not `#[cfg(test)]` modules — they are normal modules that would require a `mod tests;` declaration in production code (`lib.rs` / `main.rs`), which you cannot add. Report as `BLOCKED` and suggest either `tests/<feature>.rs` (integration) or a `#[cfg(test)] mod tests` block inside the relevant `src/<module>.rs`.
## Test Philosophy
@ -149,6 +138,14 @@ This constraint is enforced by a post-step file gate. Violations cause your outp
- Trivial tests (constructor creates object, getter returns value)
- Tests that assert on mock behavior rather than real behavior
- Tests requiring excessive mocking (>2 mocks suggests design problem — report it)
- **Structural-only tests** that never invoke the function/method under test. Forbidden patterns:
- `assert_eq!(std::mem::variant_count::<X>(), N)` — variant count is a refactor-tripwire, not behavior.
- `let _: TypeName = …;` / `let _: fn(…) -> _ = my_fn;` — a type ascription that compiles tells you the symbol exists, not what it does.
- `Box::new(my_fn)` / `&my_fn as &dyn Fn(…)` — coercing a function pointer is not calling it.
- Struct-literal construction (`Foo { a: 1, b: 2 }`) followed only by field re-reads — that exercises field access, not the methods that mutate or read state.
- Tests in a stub-first scenario where every test passes without a `todo!()` panic — by definition no test actually called the stub.
**Positive rule — every test MUST exercise behavior.** Each test body must call at least one function or method that is the subject of the task and assert on an *observable outcome* (return value, mutated state, raised error, side effect). If the only thing you can write is a structural assertion, the task is "no test needed" — report it back to the caller as `NOT_TESTABLE` (with a clear reason) rather than padding the suite with type-only tests that produce false-green coverage.
**Follow existing codebase patterns** (per language):
@ -167,6 +164,27 @@ Rust:
- Use `assert_eq!`, `assert_ne!`, `assert!` with informative messages.
- Use existing test helpers from the crate's `tests/common/` module when present.
### Test Naming
In TDD, tests are *specifications*. The test name describes the **contract under test**, not the test machinery or the current RED state. The same name must be valid both before the body pass (RED) and after it (GREEN). If a name wouldn't survive the body pass, rename now.
**Forbidden naming patterns:**
- Anything referencing the stub mechanic: `..._panics_on_todo`, `..._fails_red`, `..._stub_works`, `..._not_yet_implemented`. These describe the RED state, which disappears once `@make` fills in the body.
- Generic placeholders: `test_works`, `it_does_the_thing`, `basic_test`.
- Vague verbs without an outcome: `..._handles_input`, `..._processes_data` — handles or processes how, with what observable result?
- Implementation-detail names that leak internals: `..._calls_query_get_mut_three_times`, `..._uses_hashmap`.
**Required form: action + observable outcome.** Examples:
| Bad | Good |
|---|---|
| `move_enemies_following_path_panics_on_todo` | `move_enemies_advances_position_along_path` |
| `path_types_randomly_assigned` | `spawn_in_special_stage_assigns_one_of_three_pattern_types` |
| `spawn_enemies_special_stage_panics_on_todo` | `spawn_enemies_in_special_stage_attaches_flight_pattern_component` |
| `weaving_test` | `weave_enemies_removes_weaving_component_after_duration` |
The name should read like a sentence: "[subject] [verb] [observable outcome under condition]". When you can't write such a sentence, the test is testing too much (split it) or testing the wrong thing (revisit the spec).
### Devshell wrapping
If the project has a `flake.nix` with a `devShells.default`, wrap every test/lint command with `nix develop -c …` (e.g. `nix develop -c cargo test`, `nix develop -c uv run pytest`). The devshell guarantees the right toolchain is on PATH.
@ -228,7 +246,7 @@ You may return `NOT_TESTABLE` only for these allowed reasons:
| **External system without harness** | Change only affects API call to service with no local mock possible |
| **Non-deterministic** | GPU numerical results, timing-dependent behavior |
| **Pure wiring** | Decorator swap, import / `use` reorganization, no logic change |
| **Rust unit-only** | Coverage requires `#[cfg(test)]` mod tests in production source; @test cannot write those — let @make handle it |
| **Missing testability seam** | Test would require a production-code change beyond a `#[cfg(test)] mod` block (e.g. a private function needs `pub(crate)`, a refactor exposes a hook). Report the missing seam so `@make` can add it before tests are authored. |
Must provide:
- Which allowed reason applies

View file

@ -8,47 +8,105 @@ You are executing the multi-agent workflow inside the worktree this opencode ses
**Prerequisites (the user handles before launching opencode):**
- A git worktree is checked out for the issue's feature branch
- `opencode` was launched from the root of that worktree
- `TODO.md` is committed to the repo and present at `./TODO.md`
- A `TODO/` directory is committed to the repo containing per-issue files (`TODO/<ID>.md`) plus `TODO/README.md`
**Task reference:** $ARGUMENTS
If `$ARGUMENTS` is empty, stop immediately: "Usage: `/workflow <ISSUE-ID> [base-branch]` (e.g. `/workflow ABC-1`). The ID must exist in `./TODO.md`. Base branch defaults to `main` (then `master`)."
If `$ARGUMENTS` is empty, stop immediately: "Usage: `/workflow <ISSUE-ID> [base-branch]` (e.g. `/workflow ABC-1`). The ID must exist as `./TODO/<ID>.md`. Base branch defaults to `main` (then `master`)."
Parse `$ARGUMENTS`: the first whitespace-separated token is the issue ID, an optional second token overrides the base branch.
Parse `$ARGUMENTS`: the first whitespace-separated token is the issue ID, an optional second token overrides the base branch. Store as `ISSUE_ID`.
---
## Roles & Dispatch
This is a **multi-agent** workflow. There is one orchestrator (you, running in `agent: build` mode per this file's frontmatter) and a cast of specialised subagents that the orchestrator dispatches at each phase. **The orchestrator coordinates; subagents do the work.** The orchestrator does not write production code, write tests, or play any subagent's role — it plans, dispatches, merges findings, edits its own artifacts under `.workflow/`, and commits.
**The cast** (each defined as a separate agent file under `config/opencode/agents/<name>.md`):
| Subagent | Role | Notable constraints |
|---|---|---|
| `@check` | Reviews plans and code for risks, correctness, testability. Returns `ACCEPTABLE` / `NEEDS WORK` / `BLOCK`. | Read-only — no write / edit / bash. |
| `@simplify` | Reviews for unnecessary complexity. Advisory recommendations. | Read-only. |
| `@test` | Writes failing tests for a task spec, verifies RED, hands off to `@make`. | May only modify test files / `#[cfg(test)] mod` blocks. Bash sandboxed to test runners. |
| `@make` | Implements a single task spec. Verifies acceptance criteria. | May only modify files listed in the task spec. Bash sandboxed to language toolchains; no `git`, network, `cd`. |
| `@pm` | Reads/updates `TODO/` issue files. | May only modify `TODO/` contents. No bash. |
**What "Dispatch" means here.** Every "dispatch `@<name>`" in the phase descriptions is a call to opencode's subagent / task invocation tool with that agent name. Each dispatch starts a **fresh context**: the subagent has no memory of prior phases, no view of this orchestration, and no access beyond what its own file declares. The subagent receives only what the dispatch prompt provides — typically an absolute path to a file in `$RUN_DIR` plus a small per-dispatch context block.
**Anti-patterns to avoid:**
- Performing a subagent's work in the orchestrator's session ("I'll think like `@check` for a moment and produce the review myself"). Every `@<name>` reference is a tool call, not a role-play.
- Skipping a dispatch because the orchestrator "could just do it." The agents enforce permission boundaries the orchestrator (in `agent: build` mode) does not have.
- Paraphrasing a subagent's output into the next dispatch's prompt instead of letting the next subagent read the on-disk artifact directly.
---
## Run Artifacts
The orchestrator writes plan and task-spec artifacts to a per-run directory in the worktree. Subagents read these by absolute path rather than from inline prompt text. This keeps dispatch prompts small, eliminates paraphrase drift between dispatches (`@check` and `@simplify` see the same plan byte-for-byte), and gives Dispatch Hygiene's Finalized-Text Rule a physical anchor — the file *is* the final version.
**Directory layout** (relative to `$WORKTREE_PATH`):
```
.workflow/
└── run-<ISSUE-ID>/
├── plan.md # Phase 3 output — finalized
├── task-1.md # Phase 5 output — one file per task
├── task-2.md
└── summary.md # Phase 9 output (the run summary)
```
Define `RUN_DIR="$WORKTREE_PATH/.workflow/run-$ISSUE_ID"` once in Phase 1 and reference it everywhere downstream. Create the directory in Phase 3 (`mkdir -p "$RUN_DIR"`).
**Authoring rules:**
- Files are written by the orchestrator, never by subagents.
- Files are passed to subagents as absolute paths: e.g. *"the plan is at `<RUN_DIR>/plan.md`; read it before responding."* The dispatch prompt body should be short — agent role, artifact path, per-dispatch context (worktree path, branch, base branch). **Do not quote artifact contents inline.**
- Mid-loop revisions (Phase 4 review cycle, Phase 5 task respec, etc.) edit the file in place; every subsequent dispatch reads the new version automatically.
**Lifecycle:**
- Files persist across phases until the run finishes.
- Files are **not committed** (same as `summary.md`). Recommend `.workflow/` in `.gitignore`.
- Multiple runs on the same issue overwrite the prior run's artifacts. Save anything you want to keep before re-running.
---
## Phase 1: Sanity Check
1. Verify CWD is a non-bare git worktree: `git rev-parse --is-bare-repository 2>/dev/null` must output `false`. If not, stop: "Workflow must be run from a non-bare worktree (the directory opencode was launched in)."
2. Verify `./TODO.md` exists. If not, stop: "TODO.md not found in the current worktree. Commit a TODO.md to the repo first."
3. Verify HEAD is not detached: `git symbolic-ref --short HEAD` must succeed. If it fails, stop: "Cannot run on a detached HEAD. Check out a feature branch first."
4. Capture the current branch: `BRANCH_NAME="$(git symbolic-ref --short HEAD)"`.
5. Resolve the base branch (`BASE_BRANCH`):
2. Capture the worktree path: `WORKTREE_PATH="$(pwd)"`.
3. Verify the TODO tracker exists:
- `./TODO/` directory must exist. If not, stop: "TODO/ directory not found in the current worktree. Commit a TODO/ folder with one file per issue plus a README.md index."
- `./TODO/README.md` must exist. If not, stop: "TODO/README.md not found. Add the category index file before running the workflow."
- `./TODO/$ISSUE_ID.md` must exist. If not, stop: "Issue file `./TODO/<ID>.md` not found for ID parsed from `$ARGUMENTS`."
4. Verify HEAD is not detached: `git symbolic-ref --short HEAD` must succeed. If it fails, stop: "Cannot run on a detached HEAD. Check out a feature branch first."
5. Capture the current branch: `BRANCH_NAME="$(git symbolic-ref --short HEAD)"`.
6. Resolve the base branch (`BASE_BRANCH`):
- If `$ARGUMENTS` provided a second token, use it.
- Else if `git rev-parse --verify --quiet main` succeeds, use `main`.
- Else if `git rev-parse --verify --quiet master` succeeds, use `master`.
- Else stop: "Could not determine base branch (no `main` or `master`). Pass it as the second argument: `/workflow <ISSUE-ID> <base-branch>`."
6. Verify the current branch is not the base branch: if `BRANCH_NAME == BASE_BRANCH`, stop: "Cannot run workflow on the base branch (`$BASE_BRANCH`). Switch to a feature branch first."
7. Verify the current branch is not the base branch: if `BRANCH_NAME == BASE_BRANCH`, stop: "Cannot run workflow on the base branch (`$BASE_BRANCH`). Switch to a feature branch first."
8. Set the run-artifacts directory: `RUN_DIR="$WORKTREE_PATH/.workflow/run-$ISSUE_ID"`. Phase 3 will `mkdir -p "$RUN_DIR"` before writing the first artifact.
---
## Phase 2: Issue Context
Dispatch `@pm` to read `./TODO.md` (live filesystem mode) and fetch the issue matching the parsed ID:
- Issue title, description, acceptance criteria
- Labels and priority
Dispatch `@pm` against `./TODO/` (pass the absolute `TODO/` directory path) and fetch the issue at `./TODO/<ID>.md`:
- Title, description, acceptance criteria (if section present)
- Labels and parent
- Sub-issues list (if the issue is a parent)
- Existing status
If the issue does not exist or `@pm` fails, stop with error.
If the issue file does not exist or `@pm` fails, stop with error.
If the issue's status is `Backlog` or `Todo`, ask `@pm` to set it to `In Progress` (this edit will be staged in Phase 9 alongside other TODO.md updates).
If the issue's status is `Todo`, ask `@pm` to set it to `In Progress` and propagate the change to the dependent index (`README.md` for top-level issues, the parent's `## Sub-issues` line for sub-issues). The status edit will be staged alongside other TODO updates in Phase 9.
---
## Phase 3: Plan
Analyze the codebase. Create a detailed implementation plan addressing the issue's requirements and acceptance criteria.
Analyze the codebase. Create a detailed implementation plan addressing the issue's requirements and acceptance criteria, then write it to `$RUN_DIR/plan.md` (run `mkdir -p "$RUN_DIR"` first if the directory doesn't exist). All Phase 4 reviewer dispatches read this file.
The plan should include:
- Problem summary (from issue context)
@ -57,8 +115,8 @@ The plan should include:
- New files to create
- Risks and open questions
- **Test Design (conditional — include for non-trivial tasks):**
- Key behaviors to verify (what tests should assert)
- Edge cases and error conditions worth testing
- Key behaviors to verify, expressed as **action + observable outcome** (e.g. *"call `weave_enemies` with t=0.5 → enemy `Transform.translation.x` differs from initial position"*). A structural fact like *"enum has 3 variants"* or *"struct has these fields"* is **not** a behavior — it cannot fail meaningfully and does not exercise the code under test.
- Edge cases and error conditions worth testing (also expressed as actions, not structure)
- What explicitly should NOT be tested (prevents bloat)
- Testability concerns (heavy external deps, GPU-only paths, etc.)
@ -66,11 +124,73 @@ The plan should include:
**Skip Test Design for:** Config-only changes, decorator swaps, import reorganization, documentation.
When skipped, `@test` derives test cases directly from acceptance criteria.
Before saving `plan.md`, apply **Dispatch Hygiene** (below). The file on disk is what reviewers will read in Phase 4 — there is no second chance to revise during dispatch.
---
## Dispatch Hygiene
This applies to **every** subagent dispatch (Phases 4, 6, 7, 8) **and** to artifacts that will be dispatched (the plan from Phase 3, the task specs from Phase 5). Apply these checks before sending — fix the artifact, then re-check.
### Finalized-Text Rule
The artifact must be **finalized** — single-author text, no contradictions, no open questions. Forbidden:
- "Actually, that's wrong — let me correct…"
- "Wait, let me reconsider…"
- Two versions of the same code block, one labelled "corrected" or appearing after a revision pass
- Open questions or ambiguities the orchestrator hasn't resolved
- Mid-text revisions visible to the recipient
If you find yourself revising while writing, stop, redo the artifact from scratch with the corrected understanding, and only then dispatch. Subagents are fresh-context — they cannot reliably resolve which of two contradictory drafts is canonical, and reviewers cannot give a clean verdict on a self-contradicting plan.
### No-Implementation-in-Plan-or-Spec Rule
Plans (Phase 3) and task specs (Phase 5) are **not** the place to write the answer. They describe what to do; `@make` writes how.
Provide:
- Approach with rationale
- Files to modify with brief descriptions
- Function signatures, type declarations, data shapes (structure, not logic)
- Constraints, invariants, integration contracts
- Risks and edge cases
Do **not** provide:
- Drop-in code blocks longer than ~5 lines that constitute "the answer"
- Full function bodies for the changes being planned
- Complete `match` arms / branch logic / loop bodies for new behavior
- Pre-written test bodies (those come from `@test`)
- Stage-by-stage code transformations spelled out as ready-to-commit diffs
If you've already written the implementation in the plan or spec, the artifact has overstepped. Convert finished code into structural description (signature + intent) and let `@make` produce the body.
**Allowed in plans/specs:**
- Existing code being replaced, marked as "current state"
- Function signatures and type/struct/enum declarations (data, not logic)
- Tiny inline constants (`pub const FOO: f32 = 30.0;`)
- Test specifications as one-line behavior descriptions ("input X → expect Y")
### Pre-Dispatch Validation (MANDATORY)
Scan the artifact and reject (revise, retry) if any of the following are present:
| Check | Why it matters |
|---|---|
| `bash -c`, `sh -c`, `zsh -c`, `fish -c` (anywhere, including inside `nix develop --command bash -c …`) | `@make`/`@test` sandboxes deny all `*-c` shell invocations and any nested `bash` would bypass the per-command allowlist. Replace with one direct command per line: `nix develop -c cargo check`, etc. |
| `nix develop --command bash` / `nix develop -c bash` / `nix develop -c sh` | Same — inner shell escapes the sandbox. Wrap each toolchain command directly. |
| Any `cd <path> && …` | Subagents cannot `cd`. Rewrite to use absolute paths. |
| Code blocks longer than ~5 lines that draft the answer | Violates No-Implementation-in-Plan-or-Spec. Trim to structure (signature + "current state" only). |
| Two versions of the same code, "actually let me correct…", or open questions | Violates the Finalized-Text Rule. Redo the artifact. |
| Test bodies inside `@make` specs when tests are coming from `@test` | Duplicates the TDD handoff. |
| Artifact path referenced in the dispatch (e.g. `$RUN_DIR/plan.md`, `$RUN_DIR/task-<N>.md`) but the file isn't on disk | The subagent will fail to read it and either error or fabricate context. **Verify with `test -f "<path>"` before every dispatch.** If missing, go back to the phase that produces it (Phase 3 for `plan.md`, Phase 5 for `task-<N>.md`) and write the file before retrying. |
If any check trips, **do not dispatch.** Fix and re-validate. Repeated trips on a single task signal a Phase 5 split problem — go back and split.
---
## Phase 4: Review Plan
Dispatch `@check` and `@simplify` in parallel to review the plan.
Dispatch `@check` and `@simplify` in parallel to review `$RUN_DIR/plan.md`. The dispatch prompt is short — agent role, the absolute path to the plan, the worktree path, and any per-dispatch reviewer focus. Tell each reviewer to read the plan from disk; do **not** paste the plan inline. Apply **Dispatch Hygiene** to each dispatch prompt.
Reviewers should evaluate testability:
- `@check`: Is the design testable? Are the right behaviors identified? (Review Framework §8)
@ -82,10 +202,10 @@ Reviewers should evaluate testability:
- Note conflicts explicitly
**Review loop (max 3 cycles):**
1. Send plan to both reviewers
1. Dispatch both reviewers against `$RUN_DIR/plan.md`.
2. Merge findings
3. If verdict is ACCEPTABLE from both (or JUSTIFIED COMPLEXITY from `@simplify`): proceed to Phase 5
4. If BLOCK or NEEDS WORK: revise the plan addressing findings, then re-review
4. If BLOCK or NEEDS WORK: edit `$RUN_DIR/plan.md` in place addressing findings (re-apply Dispatch Hygiene to the updated file), then re-review.
5. **Convergence detection:** if reviewers return the same findings as the previous cycle, stop the loop early
6. If still unresolved after 3 cycles: note unresolved blockers and proceed anyway (they will be documented in the workflow summary and commit message)
@ -93,7 +213,17 @@ Reviewers should evaluate testability:
## Phase 5: Split into Tasks
Break the approved plan into discrete tasks for `@make`. Each task needs:
**The output of this phase is one file per task at `$RUN_DIR/task-<N>.md`** (1-indexed: `task-1.md`, `task-2.md`, …). These files are the source-of-truth that Phase 6 (`@test`) and Phase 7 (`@make`) read by absolute path. **No file written = no dispatch in later phases.** If you skip the file-write step, every downstream dispatch will reference a non-existent path and fail.
Steps:
1. Break the approved plan into discrete tasks (see Split Heuristic and task-size guidance below).
2. For each task, draft the task spec covering the fields in the table below.
3. Apply **Dispatch Hygiene** (above) to each draft.
4. **Write each finalized spec to `$RUN_DIR/task-<N>.md`.** After writing, verify with `test -f "$RUN_DIR/task-<N>.md"` for every N. Phase 5 is not done until every task file exists on disk.
5. Drop your inline copies of the task drafts. From this phase onward, the file is the only source of truth — if you need a task spec later, read it back from disk.
Each task file must contain:
| Required | Description |
|----------|-------------|
@ -111,9 +241,10 @@ The test file path must follow the language's actual test layout. **Do not inven
- Colocated: `<module>/tests/test_<feature>.py (create)`
- Top-level: `tests/test_<feature>.py (create)`
- **Rust**
- Crate-level integration tests: `tests/<feature>.rs (create)` (or, in a workspace, `<crate>/tests/<feature>.rs`).
- **If the test references not-yet-existing public API**, the task automatically requires a **stub-first `@make` pre-pass** before `@test` runs (see Phase 6 → "Rust integration TDD: stub-first"). Plan for two `@make` dispatches per such task: stub pass, then body pass.
- **Unit-test-only tasks (in-source `#[cfg(test)] mod tests`):** mark the task as `NOT_TESTABLE` with reason `Rust unit-only``@test` cannot write inside production source. `@make` writes those inline as part of its production change.
- **Module tests** (most common — testing private/crate-internal functions): pick the relevant production source file, e.g. `src/<module>.rs`. `@test` is permitted to add or edit content **only inside `#[cfg(test)] mod <name> { … }` blocks** in that file (per `@test`'s File Constraint). The rest of the file remains read-only to `@test`.
- **Integration tests** (testing the crate's public API as a black box): `tests/<feature>.rs (create)`, or in a workspace `<crate>/tests/<feature>.rs`.
- **In both cases**, if the test references not-yet-existing functions/types, the task requires a **stub-first `@make` pre-pass** so the symbols exist as `todo!()` bodies before `@test` runs. See Phase 6 → "Rust stub-first TDD". Plan for two `@make` dispatches per such task: stub pass, then body pass.
- **`src/tests/<feature>.rs` is not a valid path** — it would be a regular submodule needing `mod tests;` in production code. Use one of the two forms above.
- **Polyglot Nix flake**
- Match the host language of the code under change (Python or Rust rules above), wrapping commands in `nix develop -c …` per the agents' devshell rule.
@ -140,127 +271,133 @@ When a task fails the heuristic, split into:
Each split is dispatched separately to `@make` and verified before the next.
### Code Context Anti-patterns
### Code Context — what to include
The **Code Context** field exists so `@make` can find the seam to modify, not so it can read off a finished answer. Strictly follow:
The **Code Context** field exists so `@make` can find the seam to modify. Provide:
- **Provide:** the existing code being replaced (verbatim), the surrounding ~510 lines of context, function signatures of helpers `@make` will need to call, the file's relevant import block.
- **Do NOT provide:** a complete drop-in replacement, the new function bodies, the test bodies (those come from `@test` or — for unit-only Rust — from `@make` itself per Phase 6), or any "here is what to write" code block longer than ~5 lines.
- The existing code being replaced (verbatim, marked as "current state"), with ~510 lines of surrounding context
- Function signatures of helpers `@make` will need to call
- The file's relevant import block
For everything you must **not** include — drop-in replacements, full function bodies, pre-written test bodies, "here is what to write" — see **Dispatch Hygiene → No-Implementation-in-Plan-or-Spec Rule** above.
If the task is so well-specified that you've already written the implementation, the task is too small for `@make` (apply it directly) or you've over-determined the design (revisit Phase 3).
### Finalized-Text Rule
Apply **Dispatch Hygiene** to each task spec before dispatch in Phase 7.
Each task spec must be **finalized** before dispatch — single-author text with no contradictions. **Forbidden in dispatch prompts:**
---
- "Actually, that's wrong — let me correct…"
- "Wait, let me revise…"
- Two versions of the same code block with one labelled "corrected"
- Open questions or ambiguities the orchestrator hasn't resolved
## Phase 5.5: Review Task Split
If you find yourself revising while writing the spec, stop, redo the spec from scratch with the corrected understanding, and only then dispatch. `@make` is a fresh-context implementer; it cannot reliably resolve which of two contradictory drafts is canonical.
A short, focused review of the task split as a set. Catches split errors (missed scope, overlap, multi-purpose tasks, missing integration contracts) when they're cheap to fix — *before* `@test` and `@make` dispatch on a broken split. Without this gate, the same errors surface only at Phase 8 final review, after expensive test/implementation work has already been done.
**Dispatch only `@check`** for this phase — split review is structural / coverage, not complexity. `@simplify` is not involved. Apply **Dispatch Hygiene** to the prompt.
The dispatch prompt names:
- `$RUN_DIR/plan.md` (the plan being decomposed)
- `$RUN_DIR/task-1.md` through `$RUN_DIR/task-N.md` (the split — list every task file)
- The worktree path
`@check` evaluates the split against five questions:
1. **Coverage** — do the tasks together implement everything the plan promises? Any gaps?
2. **No overlap** — do two tasks claim the same scope or modify the same lines?
3. **Single-purpose** — does any task do more than one thing? (See Phase 5's Split Heuristic.)
4. **Integration contracts** — where two tasks touch a shared interface, is the contract documented in both task files?
5. **Testable acceptance criteria** — does every task have specific, falsifiable AC?
**Review loop (max 2 cycles):**
1. Dispatch `@check` against the plan + all task files.
2. If `ACCEPTABLE` → proceed to Phase 6.
3. If `NEEDS WORK` → edit the task files in place (split a task into two, merge two tasks, add integration contracts, sharpen AC). Re-apply Dispatch Hygiene to each updated file. Re-dispatch.
4. If `BLOCK` → the plan itself does not decompose cleanly. Return to Phase 4 with `@check`'s finding instead of forcing the split.
5. **Convergence detection:** same finding twice → stop loop, document the unresolved split issue in the run summary, proceed.
**This is a quick gate, not a deep review.** No line-by-line code feedback (there's no code), no design re-litigation (that was Phase 4's job). The whole point is a fast structural check before downstream phases start churning.
---
## Phase 6: Write Tests
For each task from Phase 5, dispatch `@test` with:
- The task spec (acceptance criteria, code context, files to modify)
- The Test Design section from the plan (if provided)
- The test file path to create (following colocated pattern)
Apply **Dispatch Hygiene** to each `@test` prompt before sending.
For each task from Phase 5, dispatch `@test` with a short prompt that names:
- The absolute path to the task spec: `$RUN_DIR/task-<N>.md``@test` reads acceptance criteria, code context, and files-to-modify from there.
- The absolute path to the plan, if test design context is needed: `$RUN_DIR/plan.md`.
- The worktree path (so `@test` resolves source files correctly).
- The test file path to create.
Do **not** quote task or plan content inline — `@test` reads from disk.
`@test` writes failing tests and verifies RED with structured failure codes.
**Post-step file gate (MANDATORY):**
Before dispatching `@test`, snapshot the current changed files:
```bash
git diff --name-only > /tmp/pre_test_baseline.txt
```
After `@test` completes, validate only NEW changes:
```bash
git diff --name-only | comm -23 - /tmp/pre_test_baseline.txt > /tmp/test_new_files.txt
```
All new files must match the project's test patterns:
- Python: `**/test_*.py`, `**/*_test.py`, `**/conftest.py` (new only), `**/test_data/**`, `**/test_fixtures/**`
- Rust: `tests/**/*.rs`, `**/tests/**/*.rs` (workspace-style `<crate>/tests/...`), `**/test_data/**`, `**/test_fixtures/**`
**Anti-patterns — discard the output even if the glob matches:**
- Anything under `src/` for Rust (e.g. `src/tests/foo.rs`, `src/**/tests/...`). `src/tests/` is a regular module path under `src/`, not a Rust test location, and `@test` cannot wire it up via `mod` declarations in production source. Such paths indicate the task spec gave a wrong test path — escalate, don't accept the file.
If any non-matching file appears, or any anti-pattern matches: discard `@test` output, report violation.
**Decision table — handling `@test` results:**
| Condition | Action |
|-----------|--------|
| `TESTS_READY` + `escalate_to_check: false` | Proceed to Phase 7 |
| `TESTS_READY` + `escalate_to_check: true` | Route tests to `@check` for light review. `@check` diagnoses, caller routes fixes to `@test`. Then proceed. |
| `NOT_TESTABLE` (general reasons) | Route to `@check` for sign-off on justification. If approved, task goes to `@make` without tests. |
| `NOT_TESTABLE` reason `Rust unit-only` | See "Rust unit-only routing" below. **Do not** include test code in the `@make` spec; pass test specs only. |
| `NOT_TESTABLE` | Route to `@check` for sign-off on justification. If `Missing testability seam`, dispatch `@make` to add the seam first, then re-run `@test`. Otherwise the task goes to `@make` without tests. |
| `BLOCKED` | Investigate. May need to revise task spec or plan. |
| Test passes immediately | Investigate — behavior may already exist. Task spec may be wrong. |
| Stub-first run: tests pass with zero `todo!()` panics | **Structural-only tests.** Every test is asserting type/struct/enum facts without calling any stubbed symbol. Reject the test output and route back to `@test` with a "must exercise the stubbed symbols by calling them" note. Do not let these tests gate Phase 7 — they cannot RED→GREEN, so the body-pass `@make` would commit code with false-green coverage. |
### Rust unit-only routing
### Rust stub-first TDD (mandatory for new symbols)
When `@test` returns `NOT_TESTABLE: Rust unit-only` (the implementation needs in-source `#[cfg(test)] mod tests` blocks that `@test` is forbidden from writing), the orchestrator must:
Whenever `@test` will write tests (module or integration) that reference functions / methods / types **that do not yet exist**, the test cannot RED meaningfully against absent code:
1. Get `@check`'s sign-off on the justification (no integration-test seam exists).
2. Build the `@make` spec with **test specifications**, not test code:
- "Add `#[cfg(test)] mod foo_tests` at the bottom of `<file>` exercising:"
- For each behavior, a one-line description: input → expected output, edge case to cover, error path to assert.
- Where applicable, name the function under test and the assertion type (`assert_eq!`, `assert!`, panic on invalid input).
3. **Forbidden in the `@make` spec:** complete `#[test] fn …` bodies, full module blocks, or any `@test`-style RED-verified test code. `@make` writes the inline tests itself based on the spec.
4. After `@make` completes, the orchestrator runs the test suite once to confirm RED→GREEN evidence and includes it in the workflow summary.
- *Module tests inside `src/<module>.rs`* — without the function, the `#[cfg(test)] mod tests` block fails to compile (`error[E0425]`), masking assertion diagnostics.
- *Integration tests inside `tests/<feature>.rs`* — same, but mediated through `lib.rs` re-exports.
This keeps the agents in their lanes: `@test` never writes inside `src/`, `@make` writes both the tests and the production code in a single coherent change, and the orchestrator sees explicit test pass evidence.
### Rust integration TDD: stub-first (mandatory)
Rust integration tests live in a separate test crate (`tests/<feature>.rs`) that imports from `lib.rs`. Any test referencing not-yet-existing public API can only RED at *build* time, which masks assertion diagnostics. To avoid this, **for every Rust task whose `@test` step writes an integration test against public API that does not yet exist**, dispatch a stub-first `@make` pass *before* `@test` runs:
To get a clean runtime RED, dispatch a **stub-first `@make` pass** *before* `@test` runs:
**Stub pass (split from Phase 7's body pass):**
1. Dispatch `@make` in **standard mode** (no tests exist yet) with this exact scope:
- **Goal:** add the planned public API as `todo!()`-bodied stubs so the integration test will compile.
- **Files to modify:** `src/lib.rs` (add `pub mod …;` declarations) plus any new `src/<module>.rs` files containing the stub functions/structs.
- **Stubs only:** every function body is exactly `todo!()`. Every method body is exactly `todo!()`. Public structs may use `pub struct Foo;` or `pub struct Foo { /* fields TBD */ }` — but no logic.
- **Signatures must match the planned final API exactly** (return types, lifetimes, generics) — otherwise the integration test will mismatch later. Lift signatures from the Phase 3 plan / Phase 5 task spec.
1. Dispatch `@make` in **standard mode** (no tests exist yet). The dispatch prompt names `$RUN_DIR/task-<N>.md` as the source spec and adds this stub-pass-specific scope inline:
- **Goal:** add the planned API as `todo!()`-bodied stubs so the test will compile.
- **Files to modify:** the relevant `src/<module>.rs` for module tests, or `src/lib.rs` plus any new `src/<module>.rs` for integration tests (the latter need `pub mod …;` declarations so the test crate can import).
- **Stubs only:** every function body is exactly `todo!()`. Every method body is exactly `todo!()`. Structs may use `pub struct Foo;` or `pub struct Foo { /* fields TBD */ }` — but no logic.
- **Signatures must match the planned final API exactly** (return types, lifetimes, generics, visibility). Lift signatures from the task spec.
- **Acceptance criteria:** `cargo check` (wrapped in `nix develop -c …` if the project has a devshell) passes; no test command is run.
- **Code Context Anti-patterns still apply:** the stub pass is small and finalized — no draft bodies, no contradictory signatures.
- **Dispatch Hygiene still applies:** the stub pass is small and finalized — no draft bodies, no contradictory signatures.
2. Verify `cargo check` passed in `@make`'s output. If not, fix and re-dispatch the stub pass before continuing.
3. Dispatch `@test` as normal. The integration test now compiles; running it panics on `todo!()` at runtime, which is a clean `MISSING_BEHAVIOR` RED with a stack trace — far better than the build-error-RED form.
4. Continue to Phase 7's body pass (`@make` in TDD mode), where the same files are revisited and the `todo!()` bodies are replaced.
3. Dispatch `@test`. The test now compiles; running it panics on `todo!()` at runtime, which is a clean `MISSING_BEHAVIOR` RED with a stack trace — far better than the build-error-RED form.
4. **Panic-coverage check (MANDATORY).** After `@test` returns, re-run the test command in the orchestrator and verify that **every test in the new file panics on `todo!()`** (i.e. every test exercises at least one of the stubbed symbols). The decision rule:
- If the test output shows N panics for N tests → proceed to body pass.
- If any test passes without a `todo!()` panic → that test is structural-only (asserting type / variant-count / field facts without calling the stubbed code). **Reject** `@test`'s output and route back with the "Stub-first run: tests pass with zero `todo!()` panics" decision-table verdict. Require `@test` to rewrite each non-panicking test so it actually invokes the stubbed function/method.
- This check is the only thing standing between false-green coverage and the body-pass commit. Skipping it has produced regressions like a system that compiles, "passes" tests, and silently no-ops in production.
5. Continue to Phase 7's body pass (`@make` in TDD mode), where the same files are revisited and the `todo!()` bodies are replaced.
**This routing is mandatory** for new public API in Rust. It is **not** required when the integration test exercises an existing public API (e.g. a behavior fix where the function already exists) — in that case `@test` runs directly and `@make` modifies the body in Phase 7.
**This routing is mandatory** whenever new symbols are introduced in Rust (module or integration). It is **not** required when the test exercises an *existing* function/method (e.g. a behavior fix) — in that case `@test` runs directly and `@make` modifies the body in Phase 7.
The stub pass and the body pass each produce their own atomic commit (per Phase 9 rules): `feat(<scope>): scaffold <thing> with todo!() stubs` followed by `feat(<scope>): implement <thing>` (or whichever conventional type fits).
**Parallelism:** Independent tasks can have tests written in parallel.
**Constraint:** `@test` must not modify existing conftest.py files (prevents collision during parallel execution).
**Parallelism:**
- **Python:** Independent tasks can have tests written in parallel, *provided* their test files are disjoint and no shared `conftest.py` is being modified.
- **Rust:** Run `@test` dispatches **sequentially**. Cargo serialises the build via the `target/` directory lock, so parallel dispatches give no speedup; they only add risk (a long-running build in one branch starves the other, and any task that touches a shared crate-level fixture/helper file will race).
**Constraint:** `@test` must not modify existing `conftest.py` files (prevents collision during parallel execution).
---
## Phase 7: Implement
Execute each task by dispatching `@make` with:
- The task spec (from Phase 5, finalized — see Finalized-Text Rule)
- Relevant code context (seam-revealing snippets only — see Code Context Anti-patterns)
- **Pre-written failing tests and handoff from `@test` (if TESTS_READY)**
Apply **Dispatch Hygiene** to each `@make` spec before sending. Repeated trips on a single task signal a Phase 5 split problem — go back and split.
### Pre-Dispatch Validation (MANDATORY)
**`@make` dispatches are SEQUENTIAL — never in parallel.** Run each task to completion (writes, every verification command, and the orchestrator's post-check) before dispatching the next. Reasons:
- `@make` writes source files. Parallel agents picking the same file (e.g. `src/lib.rs` for adding both a new `pub mod` and a registration) corrupt each other.
- Even on disjoint files, Cargo's `target/` lock and uv's venv state serialise the verification builds anyway, so parallelism gives no speedup.
- Stub-pass/body-pass pairs (Rust integration TDD) must be strictly ordered within a task; running stub-pass for task 2 while body-pass for task 1 is still building yields a non-deterministic crate state for `@test` to RED against.
Before sending the spec to `@make`, scan it and reject (revise, then retry) if any of the following are present:
This applies to **all** `@make` invocations: standard mode, TDD mode, stub-pass, body-pass, and integration-fix dispatches.
| Check | Why it matters |
|---|---|
| `bash -c`, `sh -c`, `zsh -c`, `fish -c` (anywhere, including inside `nix develop --command bash -c …`) | `@make`'s sandbox denies all `*-c` shell invocations and any nested `bash` would bypass the per-command allowlist. Replace with one direct command per line: `nix develop -c cargo check`, `nix develop -c cargo test`, etc. |
| `nix develop --command bash` / `nix develop -c bash` / `nix develop -c sh` | Same — the inner shell escapes the sandbox. Wrap each toolchain command directly. |
| Any `cd <path> && …` | `@make` cannot `cd`. Rewrite to use absolute paths or `git -C <path>` for git operations (and `@make` doesn't run git anyway). |
| Code blocks longer than ~5 lines under "Code Context" or labelled as the answer | Violates Code Context Anti-patterns. Trim to the seam. |
| Two versions of the same code, "actually let me correct…", or open questions | Violates the Finalized-Text Rule. Redo the spec. |
| Test bodies inside the `@make` spec when tests are coming from `@test` | The TDD handoff already provides them; duplicating creates conflict. |
Execute each task by dispatching `@make` with a short prompt:
- The absolute path to the task spec: `$RUN_DIR/task-<N>.md``@make` reads acceptance criteria, code context, and files-to-modify from there.
- The worktree path.
- **Pre-written failing tests and handoff from `@test` (if TESTS_READY)** — these are short and per-dispatch, so include them inline in the prompt.
If any check trips, **do not dispatch.** Fix the spec and re-validate. Repeated trips on the same task signal a Phase 5 split problem — go back and split.
Do **not** quote the task spec inline.
`@make` runs in TDD mode when tests are provided:
1. Entry validation: run tests, verify RED, check failure codes match handoff
@ -269,12 +406,20 @@ If any check trips, **do not dispatch.** Fix the spec and re-validate. Repeated
4. Refactor while keeping green
5. Report RED→GREEN evidence
**Escalation:** If `@make` flags test quality concerns during entry validation:
1. `@make` reports the issue to caller
2. Caller routes to `@check` for diagnosis
3. `@check` reports findings
4. Caller routes to `@test` for fixes
5. Fixed tests return to `@make`
**Escalation — two paths route through `@check``@test` → back to `@make`:**
1. **Entry-validation escalation.** Before implementing, `@make`'s entry check (run tests, verify RED, compare against handoff) reveals test-quality concerns — wrong assertion target, mixed failure codes, mocks of internal boundaries, etc. `@make` reports without writing any production code.
2. **Mid-implementation escalation.** After implementing, `@make` hits its iteration limit (23 attempts) because the test demands production code that's impossible or contradicts the spec. `@make` returns `Implementation Incomplete` with the flag `escalate: test_design`. **Do not** re-dispatch `@make` with marginal context tweaks — that just burns cycles on a test that needs redesign, not better implementation.
In both cases:
1. `@make` returns its report (entry-time concern or mid-impl `escalate: test_design`).
2. Orchestrator routes the report to `@check` for diagnosis (light review of the *tests*, not the implementation).
3. `@check` confirms or rejects the test-design suspicion.
4. **If confirmed:** orchestrator routes to `@test` to redesign the tests. Apply Dispatch Hygiene. Fixed tests return to `@make` for fresh entry validation and a clean implementation attempt.
5. **If rejected:** the issue is in the production code; orchestrator re-dispatches `@make` with `@check`'s diagnostic notes attached.
**Iteration limit on this loop: max 2 cycles.** If a test-design suspicion keeps surfacing but `@check` never confirms it, the design problem is upstream — revisit the Phase 3 plan rather than thrashing between `@test` and `@make`.
For NOT_TESTABLE tasks, `@make` runs in standard mode.
@ -287,18 +432,21 @@ After all tasks complete, verify overall integration:
## Phase 8: Final Review
Dispatch `@check` and `@simplify` in parallel to review the full implementation (all changes across all files).
Apply **Dispatch Hygiene** to each reviewer prompt before sending. Dispatch `@check` and `@simplify` in parallel to review the full implementation (all changes across all files).
Provide reviewers with:
- The original plan
- The absolute path to `$RUN_DIR/plan.md` (the same file Phase 4 reviewed; mid-loop revisions will have updated it in place)
- The full diff (`git diff "$BASE_BRANCH"...HEAD`)
- Any decisions or deviations from the plan
- Any decisions or deviations from the plan, captured inline in the dispatch prompt
**Review loop (max 3 cycles):**
1. Send implementation to both reviewers
2. Merge findings (same precedence rules as Phase 4)
3. If ACCEPTABLE: proceed to Phase 9
4. If issues found: fix them directly (no need to re-dispatch `@make` for small fixes), then re-review
4. If issues found, route per the kind of finding — **the orchestrator does not write production code; `@make` does**:
- **`BLOCK`, behavioral, correctness, or test-quality findings:** build a new `@make` task spec from the finding (apply Dispatch Hygiene, finalized text, no draft answer). Dispatch `@make`. Do **not** fix directly. Every `BLOCK` is by definition behavioral and must round-trip through `@make`.
- **Strictly cosmetic findings** (typo in a comment, missing trailing newline, formatting that does not change the AST or behavior): the orchestrator may fix directly, then re-review. Anything compiler-detected (unused import, dead code) goes through `@make`, since removing it is still a code change.
- When in doubt, dispatch `@make`.
5. **Convergence detection:** same findings twice = stop loop early
6. If unresolved after 3 cycles: document blockers, proceed to commit anyway
@ -309,39 +457,66 @@ Provide reviewers with:
The workflow is forge-agnostic. It commits locally and stops. **Do not push, and do not open a pull/merge request** — the user chooses their forge and review workflow manually.
### Commit Code Changes
- Stage code changes only. **Do not stage `TODO.md`** (committed separately below) and **do not stage `.opencode/workflow-summary.md`** (intentionally never committed — see Local Summary).
- Write a conventional commit message summarizing the implementation. Reference the TODO.md issue ID in the body (e.g. `Refs: ABC-1`).
- Stage code changes only. **Do not stage anything under `TODO/`** (committed separately below) and **do not stage anything under `.workflow/`** (intentionally never committed — these are per-run artifacts).
- Write a conventional commit message summarizing the implementation. Reference the TODO issue ID in the body (e.g. `Refs: GAL-39`).
- If changes are large/varied, use multiple atomic commits (one per logical unit)
### TODO Update
- Dispatch `@pm` against `./TODO.md` (live filesystem mode). Ask it to:
- Set **Branch** to `$BRANCH_NAME`
- Set **Status** to `In Review`
- Add a comment with the branch name, latest commit SHA, and a one-line summary
- If acceptance-criteria checkboxes were addressed by the implementation, ask `@pm` to check them off
- Commit the TODO.md change as a separate atomic commit: `chore(todo): update <issue-id> status and progress`
- Dispatch `@pm` against the absolute `./TODO/` path. Ask it to:
- Set the issue file's frontmatter `status` to `Done` (or leave at `In Progress` if the run is incomplete and the user must verify before marking Done).
- Add a comment of the form: `- YYYY-MM-DD — Branch \`$BRANCH_NAME\`, commit <SHA><one-line summary>` (date from the shell, never fabricated).
- Propagate any status flip to the dependent index: `TODO/README.md` for top-level issues (`parent: null`), or the parent file's `## Sub-issues` line for sub-issues.
- If acceptance-criteria checkboxes were addressed by the implementation, ask `@pm` to check them off (flip `- [ ]` to `- [x]` under `## Acceptance criteria`).
### Local Summary
- Write `.opencode/workflow-summary.md` in the worktree with:
- **Run timestamp** — capture it from the shell at write time: `date -Iseconds` (e.g. `2026-05-07T11:24:13+02:00`). **Do not** use a placeholder like `???:???:??` or "session date" — if you cannot get a real timestamp, omit the field entirely rather than fabricating one.
### File Follow-ups
Tracked-worthy unresolved items must become real TODO issues; otherwise they vanish into the per-run `summary.md` and the user (who has walked away) never sees them. Before writing the summary, scan the run for items in these categories and dispatch `@pm` to file each as a **sub-issue of the current issue** (`parent: $ISSUE_ID`).
| Source | New issue label | Title style |
|---|---|---|
| Pre-existing bug discovered while working but out of scope (e.g. "Score not resetting on game restart" found during GAL-39) | `bug` | Imperative fix description ("Reset score on game restart") |
| Unresolved blocker after a review loop exhausted its cycle limit (Phase 4 plan review or Phase 8 final review) | `followup` | Reference the `@check` finding |
| `@test` `NOT_TESTABLE` "future seam" notes that imply a real test gap | `tech-debt` | Describe the missing seam |
**Do NOT file follow-ups for:**
- `@simplify` advisory recommendations the orchestrator chose not to act on — these are records, not missing work; they belong in the run summary.
- Cosmetic / formatting / naming nits.
- Anything already covered by an existing TODO issue (`@pm` lists existing issues; check the title/description before filing a duplicate).
**Routing rules:**
- Each new issue is a sub-issue (`parent: $ISSUE_ID`). `@pm` will add it to the parent's `## Sub-issues` list automatically. The user can promote it to top-level later if it deserves its own slot.
- Issue body must include a "Discovered during" paragraph naming the run's branch and (where relevant) commit SHA, plus enough context for the user to triage it later without having to re-read the run.
- Status: `Todo`. Default labels per the table; the orchestrator may add additional labels inferred from the parent (e.g. propagate `gameplay` from GAL-39 to a gameplay-relevant follow-up).
- The Run Summary (next subsection) lists each filed follow-up by ID so the user has one place to see them.
### Commit TODO Changes
After both the TODO Update and File Follow-ups steps, commit everything under `TODO/` in a single atomic commit: `chore(todo): update <issue-id> status, file follow-ups`. Stage the worked issue file, the dependent index (README.md or parent file), and any newly created follow-up issue files.
If no follow-ups were filed, the commit message simplifies to `chore(todo): update <issue-id> status and progress` and only the TODO Update changes are staged.
### Run Summary
- Write `$RUN_DIR/summary.md` with:
- **Run timestamp** — capture it from the shell at write time: `date -Iseconds` (e.g. `2026-05-08T11:24:13+02:00`). **Do not** use a placeholder like `???:???:??` or "session date" — if you cannot get a real timestamp, omit the field entirely rather than fabricating one.
- Issue reference and title
- Branch name and final commit SHA(s)
- Summary of implementation
- TDD evidence (RED→GREEN per task, NOT_TESTABLE justifications)
- Review outcomes (plan review + final review verdicts)
- Unresolved items (if any)
- **Filed follow-ups** — list each new issue created in the File Follow-ups step by ID, title, and reason (`bug` / `followup` / `tech-debt`). If none, write "None."
- **Advisory notes (not filed)** — any `@simplify` or `@check` recommendations the orchestrator chose not to act on and did not turn into a TODO. These are records for the user to consider, not tracked work.
- Files changed
- **Do not commit this file.** It is a per-run, per-branch artifact; committing it would create merge conflicts whenever multiple workflow branches are merged. Leave it untracked. Recommend the user add `.opencode/` to `.gitignore` if not already.
- **Do not commit anything under `.workflow/`.** The whole directory is per-run, per-branch state. Recommend the user add `.workflow/` to `.gitignore` if not already.
---
## Failure Handling
At any phase, if an unrecoverable error occurs:
1. Write `.opencode/workflow-summary.md` with what was completed and what failed. Do **not** stage or commit this file.
2. If any code was written, commit it with message `wip: incomplete workflow run for <issue-id>`. Stage code only — exclude `.opencode/workflow-summary.md`.
1. Write `$RUN_DIR/summary.md` (creating `$RUN_DIR` first if it doesn't exist) with what was completed and what failed. Do **not** stage or commit anything under `.workflow/`.
2. If any code was written, commit it with message `wip: incomplete workflow run for <issue-id>`. Stage code only — exclude `.workflow/` and `TODO/`.
3. Leave the branch and worktree intact for the user to inspect — do not push, do not delete.
4. Dispatch `@pm` against `./TODO.md` to add a comment on the issue summarising what failed.
4. Dispatch `@pm` against `./TODO/` to add a comment on the issue file (`./TODO/<ID>.md`) summarising what failed.
5. Stop execution.
**Never hang on interactive prompts.** If any command appears to require input, treat it as a failure and follow the above procedure.

View file

@ -0,0 +1,278 @@
# Workflow Design
## 1. Purpose
This document is the **design rationale and decision log** for the multi-agent workflow. The operational rules — what the orchestrator does, in what order, with what guardrails — live in [`commands/workflow.md`](commands/workflow.md) and the agent files under [`agents/`](agents/). This document is where we discuss changes *before* they land in those files.
**Intended flow:**
1. A new idea, gap, or failure mode comes up (often from a real run).
2. Discuss in this document — capture context, options, trade-offs.
3. When a decision is reached, update `commands/workflow.md` and/or the relevant agent file.
4. Record the decision in the [Design decisions log](#5-design-decisions-log) below.
The operational files stay terse and procedural. The "why" lives here.
---
## 2. Cast & Responsibilities
One orchestrator, five subagents. The orchestrator runs in `agent: build` mode; the subagents are defined as separate agent files under `config/opencode/agents/`.
| Actor | File | Role | Boundary |
|---|---|---|---|
| **Orchestrator** | `commands/workflow.md` | Plans, dispatches, merges findings, edits artifacts under `.workflow/`, commits. | **Does not** write production code, write tests, or play any subagent's role. |
| `@check` | `agents/check.md` | Reviews plans / task splits / code for risks, correctness, testability. | Read-only — no write / edit / bash. |
| `@simplify` | `agents/simplify.md` | Reviews for unnecessary complexity. Advisory only. | Read-only. |
| `@test` | `agents/test.md` | Writes failing tests for a task spec, verifies RED. | May modify test files / `#[cfg(test)] mod` blocks. Sandboxed bash. |
| `@make` | `agents/make.md` | Implements a single task spec. Verifies acceptance criteria. | May modify files listed in the task spec. Sandboxed bash; no `git` / network / `cd`. |
| `@pm` | `agents/pm.md` | Reads / updates `TODO/` issue files. | May modify only `TODO/` contents. No bash. |
**Permission boundaries are enforced per agent.** The orchestrator (in `agent: build` mode) has full edit/bash capabilities, which is precisely why it must not act as the subagents — the agent files are where the limits live.
---
## 3. Flow Diagrams
### 3.1 Phase pipeline
High-level happy path with the major escalation arms.
```mermaid
flowchart TD
P1[Phase 1: Sanity Check]
P2[Phase 2: Issue Context<br/>@pm reads TODO/ID.md]
P3[Phase 3: Plan<br/>write plan.md]
P4{Phase 4: Review Plan<br/>@check + @simplify<br/>max 3 cycles}
P5[Phase 5: Split into Tasks<br/>write task-N.md]
P55{Phase 5.5: Review Split<br/>@check<br/>max 2 cycles}
P6[Phase 6: Write Tests<br/>@test ± stub-first @make]
P7[Phase 7: Implement<br/>@make]
P7E{Test-design escalation<br/>max 2 cycles}
P8{Phase 8: Final Review<br/>@check + @simplify<br/>max 3 cycles}
P9[Phase 9: Commit + TODO + Follow-ups + Summary]
P1 --> P2 --> P3 --> P4
P4 -->|ACCEPTABLE| P5 --> P55
P4 -->|NEEDS WORK / BLOCK| P3
P55 -->|ACCEPTABLE| P6 --> P7
P55 -->|NEEDS WORK| P5
P55 -->|BLOCK plan-level| P3
P7 --> P8
P7 -.->|escalate: test_design| P7E
P7E -->|@check@test@make| P7
P7E -.->|2 cycles exhausted| P3
P8 -->|ACCEPTABLE| P9
P8 -->|production-code finding| P7
P8 -->|test-design finding| P7E
P8 -->|plan-level finding| P3
P9 --> END([Done])
```
### 3.2 Phase 7 escalation loop
The pattern when `@make` cannot reach GREEN.
```mermaid
stateDiagram-v2
[*] --> Dispatched: orchestrator dispatches @make
Dispatched --> EntryCheck: run tests, verify RED
EntryCheck --> Implementing: failure code matches handoff
EntryCheck --> EntryEscalation: test-quality concern
Implementing --> GreenReached: tests pass within 2-3 attempts
Implementing --> MidEscalation: escalate: test_design
Implementing --> MidStuck: incomplete, no flag
MidStuck --> Implementing: re-dispatch with @check notes (1 retry)
MidStuck --> MidEscalation: still failing on retry
EntryEscalation --> CheckDiag
MidEscalation --> CheckDiag
CheckDiag --> TestRedesign: confirmed test-design error
CheckDiag --> Dispatched: rejected (production issue)
TestRedesign --> Dispatched: @test fixes, fresh entry validation
Dispatched --> PlanRevisit: 2 escalation cycles exhausted
GreenReached --> [*]
PlanRevisit --> [*]: back to Phase 3
```
### 3.3 Issue lifecycle
How TODO entries move through statuses, with sub-issue filing during a run.
```mermaid
stateDiagram-v2
[*] --> Todo: issue file created
Todo --> InProgress: Phase 2 (workflow starts)
InProgress --> Done: Phase 9 (run completes successfully)
InProgress --> Todo: workflow fails (failure handler adds comment)
note right of InProgress
New sub-issues may be filed during Phase 9
(parent: <ISSUE_ID>, status: Todo, label: bug/followup/tech-debt)
end note
Done --> [*]
```
---
## 4. Routing Matrix
Every observed `(phase, signal) → action`. Empty cells are gaps. Walking this table is the cheap way to spot routing issues like the recent Phase 7 mid-implementation escalation.
| Phase | Signal source | Signal | Action |
|---|---|---|---|
| 1 | Sanity checks | Bare repo / detached HEAD / missing `TODO/<ID>.md` / branch == base | Stop with error |
| 2 | `@pm` | Issue not found | Stop with error |
| 2 | `@pm` | Status is `Todo` | Flip to `In Progress`; propagate to README.md / parent's Sub-issues |
| 3 | Orchestrator | Plan drafted | Apply Dispatch Hygiene; write `plan.md`; verify `test -f` |
| 4 | `@check` + `@simplify` | Both ACCEPTABLE | Proceed to Phase 5 |
| 4 | Either reviewer | NEEDS WORK | Edit `plan.md` in place; re-dispatch (max 3 cycles) |
| 4 | `@check` | BLOCK | Edit `plan.md` addressing the finding; re-dispatch |
| 4 | Reviewers | Same finding twice | Convergence detected; stop loop early |
| 4 | Reviewers | Unresolved after 3 cycles | Document blockers in summary; proceed |
| 5 | Orchestrator | Tasks drafted | Apply Dispatch Hygiene; write each `task-N.md`; verify `test -f` for every N |
| 5.5 | `@check` | ACCEPTABLE | Proceed to Phase 6 |
| 5.5 | `@check` | NEEDS WORK | Edit `task-N.md` in place; re-dispatch (max 2 cycles) |
| 5.5 | `@check` | BLOCK | Plan doesn't decompose cleanly; back to Phase 4 |
| 6 | `@test` | TESTS_READY + `escalate_to_check: false` | Proceed to Phase 7 |
| 6 | `@test` | TESTS_READY + `escalate_to_check: true` | `@check` light review → `@test` fixes → forward |
| 6 | `@test` | NOT_TESTABLE (general) | `@check` sign-off; task goes to `@make` without tests |
| 6 | `@test` | NOT_TESTABLE: Missing testability seam | `@make` adds the seam; re-run `@test` |
| 6 | `@test` | BLOCKED | Investigate; may need spec or plan revision |
| 6 | `@test` (stub-first) | All tests pass with zero `todo!()` panics | Reject — structural-only tests; route back to `@test` to rewrite |
| 7 | `@make` | Implementation Complete | Proceed to Phase 8 |
| 7 | `@make` | Implementation Incomplete + entry-validation flag | `@check` (test diagnosis) → `@test` (fixes) → fresh `@make` |
| 7 | `@make` | Implementation Incomplete + `escalate: test_design` | Same path; max 2 escalation cycles |
| 7 | `@make` | Implementation Incomplete (no flag) | Re-dispatch with `@check` notes once; if 2nd attempt fails, treat as `escalate: test_design` |
| 7 | Escalation loop | 2 cycles exhausted | Back to Phase 3 (plan revisit) |
| 8 | `@check` + `@simplify` | ACCEPTABLE | Proceed to Phase 9 |
| 8 | `@check` | BLOCK / behavioral / production-code finding | New `@make` task spec from finding; dispatch (max 3 cycles) |
| 8 | `@check` | BLOCK / test-design / test-quality finding | `@check``@test``@make` re-verify |
| 8 | `@check` | BLOCK / plan-level finding | Back to Phase 3 with the finding |
| 8 | `@simplify` | Advisory | Record in summary's "Advisory notes (not filed)" |
| 8 | Reviewers | Strictly cosmetic finding (typo, missing newline, AST-preserving) | Orchestrator fixes directly; re-review |
| 8 | Review loop | Same finding twice | Convergence; stop loop |
| 8 | Review loop | 3 cycles exhausted | Document blockers; proceed |
| 9 | Orchestrator | Pre-existing bug, out of scope | File sub-issue via `@pm` (label: `bug`) |
| 9 | Orchestrator | Unresolved review-loop blocker | File sub-issue via `@pm` (label: `followup`) |
| 9 | `@test` (Phase 6) | NOT_TESTABLE future-seam note | File sub-issue via `@pm` (label: `tech-debt`) |
| 9 | Orchestrator | `@simplify` advisory not acted on | Record in summary; do NOT file (records, not work) |
| 9 | Orchestrator | All commits done | Set issue status to `Done`; sync README/parent; commit `chore(todo): …` |
---
## 5. Design Decisions Log
ADR-flavoured. New decisions append at the end. If a decision is later reversed or refined, mark the original *Superseded by ADR-N* and add a new entry.
### ADR-1 (2026-05-06) — Forge-agnostic workflow
**Context:** original gist used the GitHub `gh` CLI for auth checks and `gh pr create --draft` at the end of the run.
**Decision:** workflow stops at `git commit`. No push, no PR/MR creation, no `gh` references anywhere.
**Alternatives:** keep `gh` integration; abstract behind a forge-plugin interface.
**Consequences:** workflow runs on any git host; user opens PR/MR manually on whichever forge they use. Removes the need for forge auth setup as a prerequisite.
### ADR-2 (2026-05-06) — `@pm` operates on local `TODO/` folder
**Context:** original `@pm` agent used the Linear CLI.
**Decision:** Linear-style folder-as-tracker with one `<ID>.md` file per issue plus a category-grouped `README.md`.
**Alternatives:** keep Linear; multi-backend abstraction; single-file `TODO.md`.
**Consequences:** project-local, version-controlled, no external service. Schema enforced in `agents/pm.md`. Initial single-file design moved to per-issue files in ADR-12.
### ADR-3 (2026-05-07) — Workflow runs in worktree, not bare repo
**Context:** original orchestrated bare-clone → worktree creation as Phase 3 of the workflow.
**Decision:** user creates the worktree before launching opencode; the workflow assumes CWD is the worktree.
**Alternatives:** keep auto-worktree-creation; auto-detect bare vs. worktree.
**Consequences:** simpler workflow; opencode CWD = worktree, so subagents inherit the right project root naturally; less plumbing around `WORKTREE_PATH`. (Subagents still get absolute paths in dispatch prompts — see ADR-7.)
### ADR-4 (2026-05-07) — `@make` and `@test` are polyglot
**Context:** original was Python-only via `uv`.
**Decision:** detect toolchain from marker files (`pyproject.toml`, `Cargo.toml`, `flake.nix`); wrap all toolchain commands in `nix develop -c` if a devshell is present.
**Alternatives:** per-language agents; keep Python-only.
**Consequences:** one agent per role serves multiple languages. Permission allowlists expanded for `cargo` and `nix develop -c`. Bash sandbox still denies shell escapes inside the wrapper.
### ADR-5 (2026-05-07) — Subagent CWD via absolute paths
**Context:** opencode subagents do not inherit the orchestrator's `cd`. A `@check` dispatched from inside a worktree resolved relative paths against the parent project root and failed with "file not found."
**Decision:** capture `WORKTREE_PATH` in Phase 1 and pass absolute paths to every subagent dispatch.
**Alternatives:** patch opencode (out of scope); symlink dance.
**Consequences:** every dispatch has an explicit `Worktree: <abs path>` header convention. Verbose but reliable. Eventually superseded by run-artifact paths under `$RUN_DIR` (ADR-7).
### ADR-6 (2026-05-08) — Run artifacts on disk in `.workflow/run-<ID>/`
**Context:** the orchestrator was paraphrasing the plan and task specs into each dispatch prompt. Result: `@check` and `@simplify` could see slightly different versions of the same plan; mid-loop revisions could leak as "actually let me reconsider…" passages; long specs ate context budget on every dispatch.
**Decision:** orchestrator writes `plan.md` (Phase 3), `task-N.md` (Phase 5), and `summary.md` (Phase 9) to `$WORKTREE_PATH/.workflow/run-<ISSUE_ID>/`. Dispatches name files by absolute path; subagents read them.
**Alternatives:** inline prompts (status quo); database; in-memory orchestrator state.
**Consequences:** byte-for-byte source of truth across dispatches. Mid-loop revisions edit the file in place; every subsequent reader sees the new version. Run-artifact directory is gitignored (`.workflow/`).
### ADR-7 (2026-05-08) — Stub-first Rust TDD (mandatory for new symbols)
**Context:** Rust integration tests reference symbols imported from `lib.rs`. If those symbols don't exist yet, the test crate fails to compile — a build-error RED with no stack trace and no assertion diagnostics. Same for module tests against not-yet-existing functions.
**Decision:** for any Rust task that introduces new symbols, dispatch a stub-pass `@make` first (writes `todo!()`-bodied stubs, runs `cargo check` only). Then `@test` runs against compiling stubs; runtime panic on `todo!()` is the clean RED. Then `@make` body pass replaces stubs.
**Alternatives:** accept compile-error RED; let `@make` write tests + bodies in one pass; allow `@test` to add stubs to production source.
**Consequences:** two atomic commits per affected task (`feat: scaffold X with todo!() stubs`, then `feat: implement X`). Stub-pass scope is tight: bodies are exactly `todo!()`, signatures must match the planned final API. Phase 6 also adds a mandatory panic-coverage check after `@test`: every test must panic on `todo!()` to prove it actually exercises the stubbed symbols (catches structural-only tests).
### ADR-8 (2026-05-08) — `@test` may write inside `#[cfg(test)] mod` blocks
**Context:** Rust unit tests live colocated in production source files inside `#[cfg(test)] mod tests { … }` blocks — the canonical idiom, not an edge case. Original `@test` File Constraint forbade `src/` writes entirely, which forced `@make` to write both production code and tests in a single dispatch. This lost the RED→GREEN separation that TDD relies on.
**Decision:** `@test` may modify `src/**/*.rs` strictly inside `#[cfg(test)] mod <name> { … }` blocks. Every line outside such a block stays read-only.
**Alternatives:** keep the restriction; write all unit-level tests as integration tests.
**Consequences:** TDD works for module tests as well as integration tests. The previous Phase 6 file gate (path-based `git status` snapshot diff) is removed — with `@test` now legitimately writing inside `src/`, a path-based gate proves nothing. Constraint is now enforced by the prompt rule, the diff being human-reviewable, and `@check` flagging production-code drift in Phase 8.
### ADR-9 (2026-05-08) — Phase 5.5 task-split review by `@check`
**Context:** `ppries`' README mentioned `@check` reviewing the task split for completeness, but the gist's `workflow.md` never implemented it. Without a split-review gate, an over- or under-split task surfaced only at Phase 8 final review — after expensive `@test` and `@make` dispatches had already run on a broken split.
**Decision:** new Phase 5.5 dispatches `@check` against `plan.md` + every `task-N.md` to evaluate the split against five questions: coverage, no overlap, single-purpose, integration contracts, testable AC. Max 2 cycles; BLOCK routes back to Phase 4 (plan itself doesn't decompose).
**Alternatives:** status quo (catch at Phase 8); orchestrator self-check.
**Consequences:** one extra `@check` dispatch per run. `@simplify` is not involved at this phase — split review is structural, not complexity. Cheaper failure modes for over-/under-split tasks.
### ADR-10 (2026-05-08) — `@pm` is single-mode (filesystem only)
**Context:** `@pm` had two read modes — `git show <ref>:TODO.md` (read-only) and filesystem (read/write). Git-ref mode existed for the bare-repo flow that ADR-3 retired. After ADR-3, the workflow always used filesystem mode; git-ref mode was dead weight that still added bash permissions and doc surface.
**Decision:** remove git-ref mode. `@pm` has no bash access. Ad-hoc historical reads (`git show main:TODO/GAL-39.md`) are out of scope — the user runs them directly.
**Alternatives:** keep dual-mode; document the separation more clearly.
**Consequences:** simpler agent. One less permission allowlist to maintain. Workflow's "(live filesystem mode)" qualifier dropped from Phase 2 / Phase 9 / Failure handler.
### ADR-11 (2026-05-08) — Phase 9 files follow-ups as TODO sub-issues
**Context:** unresolved items (pre-existing bugs out of scope, blocked review findings, future-seam notes) were recorded only in `summary.md` — per-run, untracked, overwritten on the next run, read by nobody since the user has walked away.
**Decision:** Phase 9 has a `### File Follow-ups` step that dispatches `@pm` to create new TODO sub-issues for tracked-worthy items. Each new issue has `parent: <ISSUE_ID>`, status `Todo`, and an appropriate label (`bug` / `followup` / `tech-debt`). `@simplify` advisories that the orchestrator chose not to act on stay in the summary as records, not filed.
**Alternatives:** leave items in summary; create as top-level issues (would need a README.md category, which can't be picked at unattended runtime).
**Consequences:** unresolved items become tracked work. Sub-issue routing avoids the README-category problem. The follow-up files commit alongside the worked-issue update in a single `chore(todo): …` commit.
### ADR-12 (2026-05-08) — Phase 7 mid-implementation escalation
**Context:** Phase 7's escalation rule was gated on `@make` flagging concerns *during entry validation* (the RED check before implementing). When `@make` got past entry validation, started implementing, and then ground for 2-3 attempts because the test demanded impossible production code, the orchestrator had no documented route — it would re-dispatch `@make` with marginal context tweaks instead of recognizing the diagnosis as test-architecture failure.
**Decision:** split Phase 7's escalation into entry-validation and mid-implementation paths. `@make` reports `escalate: test_design` when its iteration limit is reached and the test seems to demand impossible / unreasonable code. Both paths route through `@check` (test diagnosis) → `@test` (redesign) → fresh `@make` dispatch. Max 2 escalation cycles before reverting to Phase 3 plan revisit.
**Alternatives:** status quo; let `@make` modify test files itself.
**Consequences:** faster recovery from test-design errors. Bounded loop prevents thrashing. `@make.md` Iteration Limits section gains a new red-flag class.
---
## 6. Open Questions / Known Gaps
When a question gets answered, move it to the [Design decisions log](#5-design-decisions-log).
### Q1: Phase 5.5 review scope — does `@check` evaluate test-design soundness here?
Currently Phase 5.5 reviews the **split** (coverage, overlap, single-purpose, integration contracts, testable AC). It does *not* explicitly evaluate whether the test approach implied by each task spec is sound. That would partially overlap with Phase 4 (which has a plan-level Test Design section the reviewers evaluate). If a test-design error escapes Phase 4 and is encoded in a task spec, it surfaces at Phase 7 via the mid-impl escalation (ADR-12) — but earlier detection might be cheaper. Open: should Phase 5.5 add "test approach for each task is sound" as a sixth review question, or is that scope creep into Phase 4 territory?
### Q2: How does the orchestrator handle "split heuristic violated only after attempting a task"?
Phase 5's Split Heuristic catches obvious over-/under-split cases at planning time. But sometimes a task that *looked* single-purpose during planning turns out to mix structural and runtime work only when `@make` starts implementing it. There's no documented mid-Phase-7 routing for "this task needs to be split now." Currently `@make` would either thrash (mid-impl escalation, ADR-12), or report the spec is ambiguous (Insufficient Context Protocol in `make.md`). Open: should there be a "split mid-flight" route that takes the task back to Phase 5 for re-splitting?
### Q3: Phase 9 has no rollback for partial commits if it fails between sub-steps
Phase 9's order is: code commit → TODO update → file follow-ups → commit TODO changes → write summary. If the workflow crashes between code commit and TODO commit, the worktree has the code change but the issue file still says `In Progress`. The Failure Handler covers earlier-phase crashes but Phase-9-internal partial states aren't explicitly addressed. Open: should the Failure Handler distinguish "Phase 9 partial" and resume from the right sub-step on retry, or is leaving manual cleanup to the user good enough?
### Q4: `@simplify` not involved at Phase 5.5 — is that the right call?
Phase 5.5 only dispatches `@check`. Rationale (ADR-9) is that split review is structural, not complexity. But `@simplify`'s lens — "what if we deleted this?" — could legitimately catch unnecessary tasks (e.g. a third task that adds an abstraction nothing else needs). Open: is the cost of one more dispatch worth the catch?
### Q5: Test-design loop bound vs plan-revisit threshold
ADR-12 sets max 2 cycles for the Phase 7 test-design escalation before reverting to Phase 3 plan revisit. The plan-review and final-review loops have max 3. Why the asymmetry? The test-design loop is more expensive per cycle (`@check` + `@test` + `@make` re-implement vs. just reviewers + plan edit), so 2 may be right. But the choice was made by feel, not measured. Open: is 2 the right number, or should it match Phase 4 / Phase 8 at 3?
---