# Workflow Design

## 1. Purpose

This document is the **design rationale and decision log** for the multi-agent workflow. The operational rules — what the orchestrator does, in what order, with what guardrails — live in [`commands/workflow.md`](commands/workflow.md) and the agent files under [`agents/`](agents/). This document is where we discuss changes *before* they land in those files.

**Intended flow:**

1. A new idea, gap, or failure mode comes up (often from a real run).
2. Discuss in this document — capture context, options, trade-offs.
3. When a decision is reached, update `commands/workflow.md` and/or the relevant agent file.
4. Record the decision in the [Design decisions log](#5-design-decisions-log) below.

The operational files stay terse and procedural. The "why" lives here.

---

## 2. Cast & Responsibilities

One orchestrator, five subagents. The orchestrator runs in `agent: build` mode; the subagents are defined as separate agent files under `config/opencode/agents/`.

| Actor | File | Role | Boundary |
|---|---|---|---|
| **Orchestrator** | `commands/workflow.md` | Plans, dispatches, merges findings, edits artifacts under `.workflow/`, commits. | **Does not** write production code, write tests, or play any subagent's role. |
| `@check` | `agents/check.md` | Reviews plans / task splits / code for risks, correctness, testability. | Read-only — no write / edit / bash. |
| `@simplify` | `agents/simplify.md` | Reviews for unnecessary complexity. Advisory only. | Read-only. |
| `@test` | `agents/test.md` | Writes failing tests for a task spec, verifies RED. | May modify test files / `#[cfg(test)] mod` blocks. Sandboxed bash. |
| `@make` | `agents/make.md` | Implements a single task spec. Verifies acceptance criteria. | May modify files listed in the task spec. Sandboxed bash; no `git` / network / `cd`. |
| `@pm` | `agents/pm.md` | Reads / updates `TODO/` issue files. | May modify only `TODO/` contents. No bash. |

**Permission boundaries are enforced per agent.** The orchestrator (in `agent: build` mode) has full edit/bash capabilities, which is precisely why it must not act as the subagents — the agent files are where the limits live.

---

## 3. Flow Diagrams

### 3.1 Phase pipeline

High-level happy path with the major escalation arms.

```mermaid
flowchart TD
    P1[Phase 1: Sanity Check]
    P2[Phase 2: Issue Context<br/>@pm reads TODO/ID.md]
    P3[Phase 3: Plan<br/>write plan.md]
    P4{Phase 4: Review Plan<br/>@check + @simplify<br/>max 3 cycles}
    P5[Phase 5: Split into Tasks<br/>write task-N.md]
    P55{Phase 5.5: Review Split<br/>@check<br/>max 2 cycles}
    P6[Phase 6: Write Tests<br/>@test ± stub-first @make]
    P7[Phase 7: Implement<br/>@make]
    P7E{Test-design escalation<br/>max 2 cycles}
    P8{Phase 8: Final Review<br/>@check + @simplify<br/>max 3 cycles}
    P9[Phase 9: Commit + TODO + Follow-ups + Summary]

    P1 --> P2 --> P3 --> P4
    P4 -->|ACCEPTABLE| P5 --> P55
    P4 -->|NEEDS WORK / BLOCK| P3
    P55 -->|ACCEPTABLE| P6 --> P7
    P55 -->|NEEDS WORK| P5
    P55 -->|BLOCK plan-level| P3
    P7 --> P8
    P7 -.->|escalate: test_design| P7E
    P7E -->|@check → @test → @make| P7
    P7E -.->|2 cycles exhausted| P3
    P8 -->|ACCEPTABLE| P9
    P8 -->|production-code finding| P7
    P8 -->|test-design finding| P7E
    P8 -->|plan-level finding| P3
    P9 --> END([Done])
```

### 3.2 Phase 7 escalation loop

The pattern when `@make` cannot reach GREEN.

```mermaid
stateDiagram-v2
    [*] --> Dispatched: orchestrator dispatches @make
    Dispatched --> EntryCheck: run tests, verify RED
    EntryCheck --> Implementing: failure code matches handoff
    EntryCheck --> EntryEscalation: test-quality concern
    Implementing --> GreenReached: tests pass within 2-3 attempts
    Implementing --> MidEscalation: escalate: test_design
    Implementing --> MidStuck: incomplete, no flag
    MidStuck --> Implementing: re-dispatch with @check notes (1 retry)
    MidStuck --> MidEscalation: still failing on retry
    EntryEscalation --> CheckDiag
    MidEscalation --> CheckDiag
    CheckDiag --> TestRedesign: confirmed test-design error
    CheckDiag --> Dispatched: rejected (production issue)
    TestRedesign --> Dispatched: @test fixes, fresh entry validation
    Dispatched --> PlanRevisit: 2 escalation cycles exhausted
    GreenReached --> [*]
    PlanRevisit --> [*]: back to Phase 3
```

### 3.3 Issue lifecycle

How TODO entries move through statuses, with sub-issue filing during a run.

```mermaid
stateDiagram-v2
    [*] --> Todo: issue file created
    Todo --> InProgress: Phase 2 (workflow starts)
    InProgress --> Done: Phase 9 (run completes successfully)
    InProgress --> Todo: workflow fails (failure handler adds comment)
    note right of InProgress
        New sub-issues may be filed during Phase 9
        (parent: , status: Todo, label: bug/followup/tech-debt)
    end note
    Done --> [*]
```

---

## 4. Routing Matrix

Every observed `(phase, signal) → action`.
Empty cells are gaps. Walking this table is the cheap way to spot routing issues like the recent Phase 7 mid-implementation escalation.

| Phase | Signal source | Signal | Action |
|---|---|---|---|
| 1 | Sanity checks | Bare repo / detached HEAD / missing `TODO/.md` / branch == base | Stop with error |
| 2 | `@pm` | Issue not found | Stop with error |
| 2 | `@pm` | Status is `Todo` | Flip to `In Progress`; propagate to README.md / parent's Sub-issues |
| 3 | Orchestrator | Plan drafted | Apply Dispatch Hygiene; write `plan.md`; verify `test -f` |
| 4 | `@check` + `@simplify` | Both ACCEPTABLE | Proceed to Phase 5 |
| 4 | Either reviewer | NEEDS WORK | Edit `plan.md` in place; re-dispatch (max 3 cycles) |
| 4 | `@check` | BLOCK | Edit `plan.md` addressing the finding; re-dispatch |
| 4 | Reviewers | Same finding twice | Convergence detected; stop loop early |
| 4 | Reviewers | Unresolved after 3 cycles | Document blockers in summary; proceed |
| 5 | Orchestrator | Tasks drafted | Apply Dispatch Hygiene; write each `task-N.md`; verify `test -f` for every N |
| 5.5 | `@check` | ACCEPTABLE | Proceed to Phase 6 |
| 5.5 | `@check` | NEEDS WORK | Edit `task-N.md` in place; re-dispatch (max 2 cycles) |
| 5.5 | `@check` | BLOCK | Plan doesn't decompose cleanly; back to Phase 4 |
| 6 | `@test` | TESTS_READY + `escalate_to_check: false` | Proceed to Phase 7 |
| 6 | `@test` | TESTS_READY + `escalate_to_check: true` | `@check` light review → `@test` fixes → forward |
| 6 | `@test` | NOT_TESTABLE (general) | `@check` sign-off; task goes to `@make` without tests |
| 6 | `@test` | NOT_TESTABLE: Missing testability seam | `@make` adds the seam; re-run `@test` |
| 6 | `@test` | BLOCKED | Investigate; may need spec or plan revision |
| 6 | `@test` (stub-first) | All tests pass with zero `todo!()` panics | Reject — structural-only tests; route back to `@test` to rewrite |
| 7 | `@make` | Implementation Complete | Proceed to Phase 8 |
| 7 | `@make` | Implementation Incomplete + entry-validation flag | `@check` (test diagnosis) → `@test` (fixes) → fresh `@make` |
| 7 | `@make` | Implementation Incomplete + `escalate: test_design` | Same path; max 2 escalation cycles |
| 7 | `@make` | Implementation Incomplete (no flag) | Re-dispatch with `@check` notes once; if 2nd attempt fails, treat as `escalate: test_design` |
| 7 | Escalation loop | 2 cycles exhausted | Back to Phase 3 (plan revisit) |
| 8 | `@check` + `@simplify` | ACCEPTABLE | Proceed to Phase 9 |
| 8 | `@check` | BLOCK / behavioral / production-code finding | New `@make` task spec from finding; dispatch (max 3 cycles) |
| 8 | `@check` | BLOCK / test-design / test-quality finding | `@check` → `@test` → `@make` re-verify |
| 8 | `@check` | BLOCK / plan-level finding | Back to Phase 3 with the finding |
| 8 | `@simplify` | Advisory | Record in summary's "Advisory notes (not filed)" |
| 8 | Reviewers | Strictly cosmetic finding (typo, missing newline, AST-preserving) | Orchestrator fixes directly; re-review |
| 8 | Review loop | Same finding twice | Convergence; stop loop |
| 8 | Review loop | 3 cycles exhausted | Document blockers; proceed |
| 9 | Orchestrator | Pre-existing bug, out of scope | File sub-issue via `@pm` (label: `bug`) |
| 9 | Orchestrator | Unresolved review-loop blocker | File sub-issue via `@pm` (label: `followup`) |
| 9 | `@test` (Phase 6) | NOT_TESTABLE future-seam note | File sub-issue via `@pm` (label: `tech-debt`) |
| 9 | Orchestrator | `@simplify` advisory not acted on | Record in summary; do NOT file (records, not work) |
| 9 | Orchestrator | All commits done | Set issue status to `Done`; sync README/parent; commit `chore(todo): …` |

---

## 5. Design Decisions Log

ADR-flavoured. New decisions append at the end. If a decision is later reversed or refined, mark the original *Superseded by ADR-N* and add a new entry.

### ADR-1 (2026-05-06) — Forge-agnostic workflow

**Context:** original gist used the GitHub `gh` CLI for auth checks and `gh pr create --draft` at the end of the run.

**Decision:** workflow stops at `git commit`. No push, no PR/MR creation, no `gh` references anywhere.

**Alternatives:** keep `gh` integration; abstract behind a forge-plugin interface.

**Consequences:** workflow runs on any git host; user opens PR/MR manually on whichever forge they use. Removes the need for forge auth setup as a prerequisite.

### ADR-2 (2026-05-06) — `@pm` operates on local `TODO/` folder

**Context:** original `@pm` agent used the Linear CLI.

**Decision:** Linear-style folder-as-tracker with one `.md` file per issue plus a category-grouped `README.md`.

**Alternatives:** keep Linear; multi-backend abstraction; single-file `TODO.md`.

**Consequences:** project-local, version-controlled, no external service. Schema enforced in `agents/pm.md`. Initial single-file design moved to per-issue files in ADR-12.

### ADR-3 (2026-05-07) — Workflow runs in worktree, not bare repo

**Context:** original orchestrated bare-clone → worktree creation as Phase 3 of the workflow.

**Decision:** user creates the worktree before launching opencode; the workflow assumes CWD is the worktree.

**Alternatives:** keep auto-worktree-creation; auto-detect bare vs. worktree.

**Consequences:** simpler workflow; opencode CWD = worktree, so subagents inherit the right project root naturally; less plumbing around `WORKTREE_PATH`. (Subagents still get absolute paths in dispatch prompts — see ADR-5.)

### ADR-4 (2026-05-07) — `@make` and `@test` are polyglot

**Context:** original was Python-only via `uv`.

**Decision:** detect toolchain from marker files (`pyproject.toml`, `Cargo.toml`, `flake.nix`); wrap all toolchain commands in `nix develop -c` if a devshell is present.

**Alternatives:** per-language agents; keep Python-only.

**Consequences:** one agent per role serves multiple languages.
Permission allowlists expanded for `cargo` and `nix develop -c`. Bash sandbox still denies shell escapes inside the wrapper.

### ADR-5 (2026-05-07) — Subagent CWD via absolute paths

**Context:** opencode subagents do not inherit the orchestrator's `cd`. A `@check` dispatched from inside a worktree resolved relative paths against the parent project root and failed with "file not found."

**Decision:** capture `WORKTREE_PATH` in Phase 1 and pass absolute paths to every subagent dispatch.

**Alternatives:** patch opencode (out of scope); symlink dance.

**Consequences:** every dispatch has an explicit `Worktree: ` header convention. Verbose but reliable. Eventually superseded by run-artifact paths under `$RUN_DIR` (ADR-6).

### ADR-6 (2026-05-08) — Run artifacts on disk in `.workflow/run-/`

**Context:** the orchestrator was paraphrasing the plan and task specs into each dispatch prompt. Result: `@check` and `@simplify` could see slightly different versions of the same plan; mid-loop revisions could leak as "actually let me reconsider…" passages; long specs ate context budget on every dispatch.

**Decision:** orchestrator writes `plan.md` (Phase 3), `task-N.md` (Phase 5), and `summary.md` (Phase 9) to `$WORKTREE_PATH/.workflow/run-/`. Dispatches name files by absolute path; subagents read them.

**Alternatives:** inline prompts (status quo); database; in-memory orchestrator state.

**Consequences:** byte-for-byte source of truth across dispatches. Mid-loop revisions edit the file in place; every subsequent reader sees the new version. Run-artifact directory is gitignored (`.workflow/`).

### ADR-7 (2026-05-08) — Stub-first Rust TDD (mandatory for new symbols)

**Context:** Rust integration tests reference symbols imported from `lib.rs`. If those symbols don't exist yet, the test crate fails to compile — a build-error RED with no stack trace and no assertion diagnostics. Same for module tests against not-yet-existing functions.

**Decision:** for any Rust task that introduces new symbols, dispatch a stub-pass `@make` first (writes `todo!()`-bodied stubs, runs `cargo check` only). Then `@test` runs against compiling stubs; runtime panic on `todo!()` is the clean RED. Then `@make` body pass replaces stubs.

**Alternatives:** accept compile-error RED; let `@make` write tests + bodies in one pass; allow `@test` to add stubs to production source.

**Consequences:** two atomic commits per affected task (`feat: scaffold X with todo!() stubs`, then `feat: implement X`). Stub-pass scope is tight: bodies are exactly `todo!()`, signatures must match the planned final API. Phase 6 also adds a mandatory panic-coverage check after `@test`: every test must panic on `todo!()` to prove it actually exercises the stubbed symbols (catches structural-only tests).

### ADR-8 (2026-05-08) — `@test` may write inside `#[cfg(test)] mod` blocks

**Context:** Rust unit tests live colocated in production source files inside `#[cfg(test)] mod tests { … }` blocks — the canonical idiom, not an edge case. Original `@test` File Constraint forbade `src/` writes entirely, which forced `@make` to write both production code and tests in a single dispatch. This lost the RED→GREEN separation that TDD relies on.

**Decision:** `@test` may modify `src/**/*.rs` strictly inside `#[cfg(test)] mod { … }` blocks. Every line outside such a block stays read-only.

**Alternatives:** keep the restriction; write all unit-level tests as integration tests.

**Consequences:** TDD works for module tests as well as integration tests. The previous Phase 6 file gate (path-based `git status` snapshot diff) is removed — with `@test` now legitimately writing inside `src/`, a path-based gate proves nothing. Constraint is now enforced by the prompt rule, the diff being human-reviewable, and `@check` flagging production-code drift in Phase 8.
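
As a rough illustration of the stub-first RED described in ADR-7, the sketch below pairs a `todo!()`-bodied stub with the kind of assertion `@test` would write against it. Every name here (`parse_issue_number`, `check_parses_numeric_suffix`, `panics_on_stub`) is hypothetical and not taken from the workflow files. The helper only approximates the Phase 6 panic-coverage idea; the real check presumably inspects test-runner output rather than calling `catch_unwind`.

```rust
use std::panic::{catch_unwind, UnwindSafe};

/// Stub pass: the signature is meant to be final, the body is exactly `todo!()`.
/// Hypothetical symbol, used only for illustration.
pub fn parse_issue_number(id: &str) -> Option<u32> {
    todo!()
}

/// Body of a test that `@test` might place inside a `#[cfg(test)] mod tests`
/// block (ADR-8). Against the stub it never reaches the comparison: the call
/// panics on `todo!()` first, which is the clean runtime RED.
pub fn check_parses_numeric_suffix() {
    assert_eq!(parse_issue_number("GAL-39"), Some(39));
}

/// Assumed stand-in for the panic-coverage check: reports whether running
/// the test body panics. A body that never touches a stubbed symbol returns
/// normally, which is how a structural-only test would show up.
pub fn panics_on_stub<F: FnOnce() + UnwindSafe>(test_body: F) -> bool {
    catch_unwind(test_body).is_err()
}
```

Under these assumptions, `panics_on_stub(check_parses_numeric_suffix)` is true while the stub is in place, and a test body that ignores the stub yields false and would be routed back to `@test`. Note that the helper cannot distinguish a `todo!()` panic from an ordinary assertion failure once real bodies exist, so it is only meaningful during the stub pass.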

### ADR-9 (2026-05-08) — Phase 5.5 task-split review by `@check`

**Context:** `ppries`' README mentioned `@check` reviewing the task split for completeness, but the gist's `workflow.md` never implemented it. Without a split-review gate, an over- or under-split task surfaced only at Phase 8 final review — after expensive `@test` and `@make` dispatches had already run on a broken split.

**Decision:** new Phase 5.5 dispatches `@check` against `plan.md` + every `task-N.md` to evaluate the split against five questions: coverage, no overlap, single-purpose, integration contracts, testable AC. Max 2 cycles; BLOCK routes back to Phase 4 (plan itself doesn't decompose).

**Alternatives:** status quo (catch at Phase 8); orchestrator self-check.

**Consequences:** one extra `@check` dispatch per run. `@simplify` is not involved at this phase — split review is structural, not complexity. Cheaper failure modes for over-/under-split tasks.

### ADR-10 (2026-05-08) — `@pm` is single-mode (filesystem only)

**Context:** `@pm` had two read modes — `git show :TODO.md` (read-only) and filesystem (read/write). Git-ref mode existed for the bare-repo flow that ADR-3 retired. After ADR-3, the workflow always used filesystem mode; git-ref mode was dead weight that still added bash permissions and doc surface.

**Decision:** remove git-ref mode. `@pm` has no bash access. Ad-hoc historical reads (`git show main:TODO/GAL-39.md`) are out of scope — the user runs them directly.

**Alternatives:** keep dual-mode; document the separation more clearly.

**Consequences:** simpler agent. One less permission allowlist to maintain. Workflow's "(live filesystem mode)" qualifier dropped from Phase 2 / Phase 9 / Failure handler.

### ADR-11 (2026-05-08) — Phase 9 files follow-ups as TODO sub-issues

**Context:** unresolved items (pre-existing bugs out of scope, blocked review findings, future-seam notes) were recorded only in `summary.md` — per-run, untracked, overwritten on the next run, read by nobody since the user has walked away.

**Decision:** Phase 9 has a `### File Follow-ups` step that dispatches `@pm` to create new TODO sub-issues for items worth tracking. Each new issue has `parent: `, status `Todo`, and an appropriate label (`bug` / `followup` / `tech-debt`). `@simplify` advisories that the orchestrator chose not to act on stay in the summary as records and are not filed.

**Alternatives:** leave items in summary; create as top-level issues (would need a README.md category, which can't be picked at unattended runtime).

**Consequences:** unresolved items become tracked work. Sub-issue routing avoids the README-category problem. The follow-up files commit alongside the worked-issue update in a single `chore(todo): …` commit.

### ADR-12 (2026-05-08) — Phase 7 mid-implementation escalation

**Context:** Phase 7's escalation rule was gated on `@make` flagging concerns *during entry validation* (the RED check before implementing). When `@make` got past entry validation, started implementing, and then ground for 2-3 attempts because the test demanded impossible production code, the orchestrator had no documented route — it would re-dispatch `@make` with marginal context tweaks instead of recognizing the problem as a test-architecture failure.

**Decision:** split Phase 7's escalation into entry-validation and mid-implementation paths. `@make` reports `escalate: test_design` when its iteration limit is reached and the test seems to demand impossible / unreasonable code. Both paths route through `@check` (test diagnosis) → `@test` (redesign) → fresh `@make` dispatch. Max 2 escalation cycles before reverting to Phase 3 plan revisit.

**Alternatives:** status quo; let `@make` modify test files itself.

**Consequences:** faster recovery from test-design errors. Bounded loop prevents thrashing. `@make.md` Iteration Limits section gains a new red-flag class.

---

## 6. Open Questions / Known Gaps

When a question gets answered, move it to the [Design decisions log](#5-design-decisions-log).

### Q1: Phase 5.5 review scope — does `@check` evaluate test-design soundness here?

Currently Phase 5.5 reviews the **split** (coverage, overlap, single-purpose, integration contracts, testable AC). It does *not* explicitly evaluate whether the test approach implied by each task spec is sound. That would partially overlap with Phase 4 (which has a plan-level Test Design section the reviewers evaluate). If a test-design error escapes Phase 4 and is encoded in a task spec, it surfaces at Phase 7 via the mid-impl escalation (ADR-12) — but earlier detection might be cheaper.

Open: should Phase 5.5 add "test approach for each task is sound" as a sixth review question, or is that scope creep into Phase 4 territory?

### Q2: How does the orchestrator handle "split heuristic violated only after attempting a task"?

Phase 5's Split Heuristic catches obvious over-/under-split cases at planning time. But sometimes a task that *looked* single-purpose during planning turns out to mix structural and runtime work only when `@make` starts implementing it. There's no documented mid-Phase-7 routing for "this task needs to be split now." Currently `@make` would either thrash (mid-impl escalation, ADR-12), or report the spec is ambiguous (Insufficient Context Protocol in `make.md`).

Open: should there be a "split mid-flight" route that takes the task back to Phase 5 for re-splitting?

### Q3: Phase 9 has no rollback for partial commits if it fails between sub-steps

Phase 9's order is: code commit → TODO update → file follow-ups → commit TODO changes → write summary. If the workflow crashes between code commit and TODO commit, the worktree has the code change but the issue file still says `In Progress`.
The Failure Handler covers earlier-phase crashes but Phase-9-internal partial states aren't explicitly addressed.

Open: should the Failure Handler distinguish "Phase 9 partial" and resume from the right sub-step on retry, or is leaving manual cleanup to the user good enough?

### Q4: `@simplify` not involved at Phase 5.5 — is that the right call?

Phase 5.5 only dispatches `@check`. Rationale (ADR-9) is that split review is structural, not complexity. But `@simplify`'s lens — "what if we deleted this?" — could legitimately catch unnecessary tasks (e.g. a third task that adds an abstraction nothing else needs).

Open: is the cost of one more dispatch worth the catch?

### Q5: Test-design loop bound vs plan-revisit threshold

ADR-12 sets max 2 cycles for the Phase 7 test-design escalation before reverting to Phase 3 plan revisit. The plan-review and final-review loops have max 3. Why the asymmetry? The test-design loop is more expensive per cycle (`@check` + `@test` + `@make` re-implement vs. just reviewers + plan edit), so 2 may be right. But the choice was made by feel, not measured.

Open: is 2 the right number, or should it match Phase 4 / Phase 8 at 3?

---
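
To make the bound questioned in Q5 easier to reason about, here is a minimal sketch of the Phase 7 rows of the routing matrix as a pure function. The type and function names (`MakeOutcome`, `NextStep`, `route_phase7`) are assumptions made for illustration; the actual orchestrator is prompt-driven and need not contain anything like this. The constant is the only value taken from the document (ADR-12's max of 2 cycles).

```rust
/// Bound from ADR-12; Q5 asks whether this should be 3 to match Phase 4 / Phase 8.
pub const MAX_TEST_DESIGN_CYCLES: u32 = 2;

/// What `@make` reported back (hypothetical encoding of the Phase 7 signals).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MakeOutcome {
    Complete,
    EntryValidationFlag,
    EscalateTestDesign,
    IncompleteNoFlag,
}

/// Where the orchestrator goes next (hypothetical encoding of the actions).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NextStep {
    FinalReview,          // Phase 8
    RetryWithCheckNotes,  // single re-dispatch of @make with @check notes
    TestDesignEscalation, // @check -> @test -> fresh @make
    PlanRevisit,          // back to Phase 3
}

/// `escalations_used` counts completed test-design cycles for this task;
/// `retried` is true once the single unflagged retry has been spent.
pub fn route_phase7(outcome: MakeOutcome, escalations_used: u32, retried: bool) -> NextStep {
    match outcome {
        MakeOutcome::Complete => NextStep::FinalReview,
        // An unflagged failure gets exactly one retry before being treated
        // as `escalate: test_design`.
        MakeOutcome::IncompleteNoFlag if !retried => NextStep::RetryWithCheckNotes,
        // Every remaining case is an escalation; the bound decides whether
        // another cycle is allowed.
        _ if escalations_used >= MAX_TEST_DESIGN_CYCLES => NextStep::PlanRevisit,
        _ => NextStep::TestDesignEscalation,
    }
}
```

Written this way, changing the answer to Q5 is a one-constant edit, and the cost asymmetry the question describes shows up in how much work sits behind each `TestDesignEscalation` result rather than in the routing itself.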