# Workflow Design
## 1. Purpose
This document is the **design rationale and decision log** for the multi-agent workflow. The operational rules — what the orchestrator does, in what order, with what guardrails — live in [`commands/workflow.md`](commands/workflow.md) and the agent files under [`agents/`](agents/). This document is where we discuss changes *before* they land in those files.
**Intended flow:**
1. A new idea, gap, or failure mode comes up (often from a real run).
2. Discuss in this document — capture context, options, trade-offs.
3. When a decision is reached, update `commands/workflow.md` and/or the relevant agent file.
4. Record the decision in the [Design decisions log](#5-design-decisions-log) below.
The operational files stay terse and procedural. The "why" lives here.
---
## 2. Cast & Responsibilities
One orchestrator, five subagents. The orchestrator runs in `agent: build` mode; the subagents are defined as separate agent files under `config/opencode/agents/`.
| Actor | File | Role | Boundary |
|---|---|---|---|
| **Orchestrator** | `commands/workflow.md` | Plans, dispatches, merges findings, edits artifacts under `.workflow/`, commits. | **Does not** write production code, write tests, or play any subagent's role. |
| `@check` | `agents/check.md` | Reviews plans / task splits / code for risks, correctness, testability. | Read-only — no write / edit / bash. |
| `@simplify` | `agents/simplify.md` | Reviews for unnecessary complexity. Advisory only. | Read-only. |
| `@test` | `agents/test.md` | Writes failing tests for a task spec, verifies RED. | May modify test files / `#[cfg(test)] mod` blocks. Sandboxed bash. |
| `@make` | `agents/make.md` | Implements a single task spec. Verifies acceptance criteria. | May modify files listed in the task spec. Sandboxed bash; no `git` / network / `cd`. |
| `@pm` | `agents/pm.md` | Reads / updates `TODO/` issue files. | May modify only `TODO/` contents. No bash. |
**Permission boundaries are enforced per agent.** The orchestrator (in `agent: build` mode) has full edit/bash capabilities, which is precisely why it must not act as the subagents — the agent files are where the limits live.
---
## 3. Flow Diagrams
### 3.1 Phase pipeline
High-level happy path with the major escalation arms.
```mermaid
flowchart TD
P1["Phase 1: Sanity Check"]
P2["Phase 2: Issue Context<br/>@pm reads TODO/ID.md"]
P3["Phase 3: Plan<br/>write plan.md"]
P4{"Phase 4: Review Plan<br/>@check + @simplify<br/>max 3 cycles"}
P5["Phase 5: Split into Tasks<br/>write task-N.md"]
P55{"Phase 5.5: Review Split<br/>@check<br/>max 2 cycles"}
P6["Phase 6: Write Tests<br/>@test ± stub-first @make"]
P7["Phase 7: Implement<br/>@make"]
P7E{"Test-design escalation<br/>max 2 cycles"}
P8{"Phase 8: Final Review<br/>@check + @simplify<br/>max 3 cycles"}
P9["Phase 9: Commit + TODO + Follow-ups + Summary"]
P1 --> P2 --> P3 --> P4
P4 -->|ACCEPTABLE| P5 --> P55
P4 -->|NEEDS WORK / BLOCK| P3
P55 -->|ACCEPTABLE| P6 --> P7
P55 -->|NEEDS WORK| P5
P55 -->|BLOCK plan-level| P3
P7 --> P8
P7 -.->|escalate: test_design| P7E
P7E -->|@check → @test → @make| P7
P7E -.->|2 cycles exhausted| P3
P8 -->|ACCEPTABLE| P9
P8 -->|production-code finding| P7
P8 -->|test-design finding| P7E
P8 -->|plan-level finding| P3
P9 --> END([Done])
```
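The bounded review loops in the diagram (Phase 4 and Phase 8, max 3 cycles) could be sketched roughly as follows. This is an illustration under assumptions, not the orchestrator's actual logic: `review_loop`, `run_reviewers`, and `revise` are hypothetical names, and representing findings as plain strings is an assumption. The early stop on a repeated finding follows the "Same finding twice" rows of the routing matrix.

```python
from typing import Callable, List, Set, Tuple


def review_loop(
    run_reviewers: Callable[[], List[str]],
    revise: Callable[[List[str]], None],
    max_cycles: int = 3,
) -> Tuple[str, List[str]]:
    """Run a bounded review loop.

    Returns (outcome, unresolved_findings), where outcome is one of
    "acceptable", "converged", or "exhausted".
    """
    seen: Set[str] = set()
    findings: List[str] = []
    for cycle in range(1, max_cycles + 1):
        findings = run_reviewers()
        if not findings:
            return "acceptable", []
        # A finding that comes back unchanged means more cycles will not help.
        if any(f in seen for f in findings):
            return "converged", findings
        seen.update(findings)
        # Only revise when another review cycle is still available.
        if cycle < max_cycles:
            revise(findings)
    return "exhausted", findings
```

Both non-acceptable outcomes return the unresolved findings so the caller can document them as blockers in the summary.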
### 3.2 Phase 7 escalation loop
The pattern when `@make` cannot reach GREEN.
```mermaid
stateDiagram-v2
[*] --> Dispatched: orchestrator dispatches @make
Dispatched --> EntryCheck: run tests, verify RED
EntryCheck --> Implementing: failure code matches handoff
EntryCheck --> EntryEscalation: test-quality concern
Implementing --> GreenReached: tests pass within 2-3 attempts
Implementing --> MidEscalation: escalate: test_design
Implementing --> MidStuck: incomplete, no flag
MidStuck --> Implementing: re-dispatch with @check notes (1 retry)
MidStuck --> MidEscalation: still failing on retry
EntryEscalation --> CheckDiag
MidEscalation --> CheckDiag
CheckDiag --> TestRedesign: confirmed test-design error
CheckDiag --> Dispatched: rejected (production issue)
TestRedesign --> Dispatched: @test fixes, fresh entry validation
Dispatched --> PlanRevisit: 2 escalation cycles exhausted
GreenReached --> [*]
PlanRevisit --> [*]: back to Phase 3
```
### 3.3 Issue lifecycle
How TODO entries move through statuses, with sub-issue filing during a run.
```mermaid
stateDiagram-v2
[*] --> Todo: issue file created
Todo --> InProgress: Phase 2 (workflow starts)
InProgress --> Done: Phase 9 (run completes successfully)
InProgress --> Todo: workflow fails (failure handler adds comment)
note right of InProgress
New sub-issues may be filed during Phase 9
(parent: , status: Todo, label: bug/followup/tech-debt)
end note
Done --> [*]
```
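The lifecycle above amounts to a small set of allowed status transitions. A minimal sketch, assuming statuses are stored as the strings used in the routing matrix (`Todo`, `In Progress`, `Done`); `ALLOWED_TRANSITIONS` and `transition` are hypothetical names, not part of `agents/pm.md`:

```python
# Allowed (current, new) status pairs, taken from the lifecycle diagram.
ALLOWED_TRANSITIONS = {
    ("Todo", "In Progress"),   # Phase 2: workflow starts
    ("In Progress", "Done"),   # Phase 9: run completes successfully
    ("In Progress", "Todo"),   # workflow fails; failure handler adds a comment
}


def transition(current: str, new: str) -> str:
    """Return the new status, or raise if the lifecycle does not allow it."""
    if (current, new) not in ALLOWED_TRANSITIONS:
        raise ValueError(f"illegal status change: {current!r} -> {new!r}")
    return new
```

Under this sketch a direct `Todo` to `Done` jump is rejected, which matches the diagram: an issue only reaches `Done` through a run.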
---
## 4. Routing Matrix
Every observed `(phase, signal) → action`. Empty cells are gaps. Walking this table is the cheap way to spot routing issues like the recent Phase 7 mid-implementation escalation.
| Phase | Signal source | Signal | Action |
|---|---|---|---|
| 1 | Sanity checks | Bare repo / detached HEAD / missing `TODO/.md` / branch == base | Stop with error |
| 2 | `@pm` | Issue not found | Stop with error |
| 2 | `@pm` | Status is `Todo` | Flip to `In Progress`; propagate to README.md / parent's Sub-issues |
| 3 | Orchestrator | Plan drafted | Apply Dispatch Hygiene; write `plan.md`; verify `test -f` |
| 4 | `@check` + `@simplify` | Both ACCEPTABLE | Proceed to Phase 5 |
| 4 | Either reviewer | NEEDS WORK | Edit `plan.md` in place; re-dispatch (max 3 cycles) |
| 4 | `@check` | BLOCK | Edit `plan.md` addressing the finding; re-dispatch |
| 4 | Reviewers | Same finding twice | Convergence detected; stop loop early |
| 4 | Reviewers | Unresolved after 3 cycles | Document blockers in summary; proceed |
| 5 | Orchestrator | Tasks drafted | Apply Dispatch Hygiene; write each `task-N.md`; verify `test -f` for every N |
| 5.5 | `@check` | ACCEPTABLE | Proceed to Phase 6 |
| 5.5 | `@check` | NEEDS WORK | Edit `task-N.md` in place; re-dispatch (max 2 cycles) |
| 5.5 | `@check` | BLOCK | Plan doesn't decompose cleanly; back to Phase 4 |
| 6 | `@test` | TESTS_READY + `escalate_to_check: false` | Proceed to Phase 7 |
| 6 | `@test` | TESTS_READY + `escalate_to_check: true` | `@check` light review → `@test` fixes → forward |
| 6 | `@test` | NOT_TESTABLE (general) | `@check` sign-off; task goes to `@make` without tests |
| 6 | `@test` | NOT_TESTABLE: Missing testability seam | `@make` adds the seam; re-run `@test` |
| 6 | `@test` | BLOCKED | Investigate; may need spec or plan revision |
| 6 | `@test` (stub-first) | All tests pass with zero `todo!()` panics | Reject — structural-only tests; route back to `@test` to rewrite |
| 7 | `@make` | Implementation Complete | Proceed to Phase 8 |
| 7 | `@make` | Implementation Incomplete + entry-validation flag | `@check` (test diagnosis) → `@test` (fixes) → fresh `@make` |
| 7 | `@make` | Implementation Incomplete + `escalate: test_design` | Same path; max 2 escalation cycles |
| 7 | `@make` | Implementation Incomplete (no flag) | Re-dispatch with `@check` notes once; if 2nd attempt fails, treat as `escalate: test_design` |
| 7 | Escalation loop | 2 cycles exhausted | Back to Phase 3 (plan revisit) |
| 8 | `@check` + `@simplify` | ACCEPTABLE | Proceed to Phase 9 |
| 8 | `@check` | BLOCK / behavioral / production-code finding | New `@make` task spec from finding; dispatch (max 3 cycles) |
| 8 | `@check` | BLOCK / test-design / test-quality finding | `@check` → `@test` → `@make` re-verify |
| 8 | `@check` | BLOCK / plan-level finding | Back to Phase 3 with the finding |
| 8 | `@simplify` | Advisory | Record in summary's "Advisory notes (not filed)" |
| 8 | Reviewers | Strictly cosmetic finding (typo, missing newline, AST-preserving) | Orchestrator fixes directly; re-review |
| 8 | Review loop | Same finding twice | Convergence; stop loop |
| 8 | Review loop | 3 cycles exhausted | Document blockers; proceed |
| 9 | Orchestrator | Pre-existing bug, out of scope | File sub-issue via `@pm` (label: `bug`) |
| 9 | Orchestrator | Unresolved review-loop blocker | File sub-issue via `@pm` (label: `followup`) |
| 9 | `@test` (Phase 6) | NOT_TESTABLE future-seam note | File sub-issue via `@pm` (label: `tech-debt`) |
| 9 | Orchestrator | `@simplify` advisory not acted on | Record in summary; do NOT file (records, not work) |
| 9 | Orchestrator | All commits done | Set issue status to `Done`; sync README/parent; commit `chore(todo): …` |
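One way to make "walking the table" mechanical is to hold the matrix as a lookup keyed by `(phase, signal source, signal)` and report any observed signal that has no row. The sketch below copies a handful of rows from the table; the names `ROUTES`, `route`, and `find_gaps` are hypothetical, and nothing here implies the orchestrator stores its routing this way.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Key = Tuple[str, str, str]  # (phase, signal source, signal)

# A few rows copied from the routing matrix; the full table would follow suit.
ROUTES: Dict[Key, str] = {
    ("5.5", "@check", "ACCEPTABLE"): "Proceed to Phase 6",
    ("5.5", "@check", "NEEDS WORK"): "Edit task-N.md in place; re-dispatch (max 2 cycles)",
    ("6", "@test", "BLOCKED"): "Investigate; may need spec or plan revision",
    ("7", "@make", "Implementation Complete"): "Proceed to Phase 8",
    ("7", "Escalation loop", "2 cycles exhausted"): "Back to Phase 3 (plan revisit)",
}


def route(phase: str, source: str, signal: str) -> Optional[str]:
    """Return the documented action, or None when the matrix has a gap."""
    return ROUTES.get((phase, source, signal))


def find_gaps(observed: Iterable[Key]) -> List[Key]:
    """Return every observed (phase, source, signal) with no documented action."""
    return [key for key in observed if key not in ROUTES]
```

Feeding `find_gaps` the signals seen in a real run would surface missing rows before they turn into ad-hoc orchestrator behaviour.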
---
## 5. Design Decisions Log
ADR-flavoured. New decisions append at the end. If a decision is later reversed or refined, mark the original *Superseded by ADR-N* and add a new entry.
### ADR-1 (2026-05-06) — Forge-agnostic workflow
**Context:** original gist used the GitHub `gh` CLI for auth checks and `gh pr create --draft` at the end of the run.
**Decision:** workflow stops at `git commit`. No push, no PR/MR creation, no `gh` references anywhere.
**Alternatives:** keep `gh` integration; abstract behind a forge-plugin interface.
**Consequences:** workflow runs on any git host; user opens PR/MR manually on whichever forge they use. Removes the need for forge auth setup as a prerequisite.
### ADR-2 (2026-05-06) — `@pm` operates on local `TODO/` folder
**Context:** original `@pm` agent used the Linear CLI.
**Decision:** Linear-style folder-as-tracker with one `.md` file per issue plus a category-grouped `README.md`.
**Alternatives:** keep Linear; multi-backend abstraction; single-file `TODO.md`.
**Consequences:** project-local, version-controlled, no external service. Schema enforced in `agents/pm.md`. Initial single-file design moved to per-issue files in ADR-12.
### ADR-3 (2026-05-07) — Workflow runs in worktree, not bare repo
**Context:** original orchestrated bare-clone → worktree creation as Phase 3 of the workflow.
**Decision:** user creates the worktree before launching opencode; the workflow assumes CWD is the worktree.
**Alternatives:** keep auto-worktree-creation; auto-detect bare vs. worktree.
**Consequences:** simpler workflow; opencode CWD = worktree, so subagents inherit the right project root naturally; less plumbing around `WORKTREE_PATH`. (Subagents still get absolute paths in dispatch prompts; see ADR-5.)
### ADR-4 (2026-05-07) — `@make` and `@test` are polyglot
**Context:** original was Python-only via `uv`.
**Decision:** detect toolchain from marker files (`pyproject.toml`, `Cargo.toml`, `flake.nix`); wrap all toolchain commands in `nix develop -c` if a devshell is present.
**Alternatives:** per-language agents; keep Python-only.
**Consequences:** one agent per role serves multiple languages. Permission allowlists expanded for `cargo` and `nix develop -c`. Bash sandbox still denies shell escapes inside the wrapper.
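The marker-file detection could look roughly like the sketch below. The marker files and the `nix develop -c` wrapper come from the decision above; the concrete test commands (`cargo test`, `uv run pytest`), the precedence when both `Cargo.toml` and `pyproject.toml` exist, and the function name `detect_test_command` are assumptions made for illustration.

```python
from pathlib import Path
from typing import List


def detect_test_command(project_root: Path) -> List[str]:
    """Pick a test command from marker files in the project root."""
    if (project_root / "Cargo.toml").is_file():
        cmd = ["cargo", "test"]          # assumed Rust test command
    elif (project_root / "pyproject.toml").is_file():
        cmd = ["uv", "run", "pytest"]    # assumed Python test command
    else:
        raise RuntimeError(f"no known toolchain marker in {project_root}")

    # A flake.nix signals a devshell: run the toolchain inside it.
    if (project_root / "flake.nix").is_file():
        cmd = ["nix", "develop", "-c", *cmd]
    return cmd
```

Returning an argument list rather than a shell string keeps the command easy to match against a permission allowlist.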
### ADR-5 (2026-05-07) — Subagent CWD via absolute paths
**Context:** opencode subagents do not inherit the orchestrator's `cd`. A `@check` dispatched from inside a worktree resolved relative paths against the parent project root and failed with "file not found."
**Decision:** capture `WORKTREE_PATH` in Phase 1 and pass absolute paths to every subagent dispatch.
**Alternatives:** patch opencode (out of scope); symlink dance.
**Consequences:** every dispatch has an explicit `Worktree: ` header convention. Verbose but reliable. Eventually superseded by run-artifact paths under `$RUN_DIR` (ADR-6).
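As a rough illustration of the convention, a dispatch prompt can be assembled so that every path in it is absolute. The exact header layout is an assumption: only the `Worktree: ` prefix comes from this document, while the `File: ` lines and the name `build_dispatch` are hypothetical.

```python
from pathlib import Path
from typing import Iterable


def build_dispatch(worktree: Path, files: Iterable[str], instruction: str) -> str:
    """Build a dispatch prompt whose paths do not depend on the subagent's CWD."""
    root = worktree.resolve()  # absolute, even if the caller passed "."
    lines = [f"Worktree: {root}"]
    for rel in files:
        # Hypothetical per-file line; each path is anchored at the worktree root.
        lines.append(f"File: {root / rel}")
    lines.append("")
    lines.append(instruction)
    return "\n".join(lines)
```

Resolving once at the top means a subagent that starts in the parent project root still reads the intended files.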
### ADR-6 (2026-05-08) — Run artifacts on disk in `.workflow/run-/`
**Context:** the orchestrator was paraphrasing the plan and task specs into each dispatch prompt. Result: `@check` and `@simplify` could see slightly different versions of the same plan; mid-loop revisions could leak as "actually let me reconsider…" passages; long specs ate context budget on every dispatch.
**Decision:** orchestrator writes `plan.md` (Phase 3), `task-N.md` (Phase 5), and `summary.md` (Phase 9) to `$WORKTREE_PATH/.workflow/run-/`. Dispatches name files by absolute path; subagents read them.
**Alternatives:** inline prompts (status quo); database; in-memory orchestrator state.
**Consequences:** byte-for-byte source of truth across dispatches. Mid-loop revisions edit the file in place; every subsequent reader sees the new version. Run-artifact directory is gitignored (`.workflow/`).
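A minimal sketch of the write-then-verify step, assuming the run directory is handed in by the caller (its naming scheme is not spelled out here, so the sketch does not guess it). `write_artifact` is a hypothetical helper; the `is_file()` check plays the role of the `test -f` verification named in the routing matrix.

```python
from pathlib import Path


def write_artifact(run_dir: Path, name: str, content: str) -> Path:
    """Write a run artifact and return its absolute path for use in dispatches."""
    run_dir.mkdir(parents=True, exist_ok=True)
    path = (run_dir / name).resolve()
    path.write_text(content, encoding="utf-8")
    # Equivalent of `test -f`: fail loudly rather than dispatch a missing file.
    if not path.is_file():
        raise FileNotFoundError(path)
    return path
```

A mid-loop revision would call the same helper with the same `name`, so every later reader sees the updated file at an unchanged path.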
### ADR-7 (2026-05-08) — Stub-first Rust TDD (mandatory for new symbols)
**Context:** Rust integration tests reference symbols imported from `lib.rs`. If those symbols don't exist yet, the test crate fails to compile — a build-error RED with no stack trace and no assertion diagnostics. Same for module tests against not-yet-existing functions.
**Decision:** for any Rust task that introduces new symbols, dispatch a stub-pass `@make` first (writes `todo!()`-bodied stubs, runs `cargo check` only). Then `@test` runs against compiling stubs; runtime panic on `todo!()` is the clean RED. Then `@make` body pass replaces stubs.
**Alternatives:** accept compile-error RED; let `@make` write tests + bodies in one pass; allow `@test` to add stubs to production source.
**Consequences:** two atomic commits per affected task (`feat: scaffold X with todo!() stubs`, then `feat: implement X`). Stub-pass scope is tight: bodies are exactly `todo!()`, signatures must match the planned final API. Phase 6 also adds a mandatory panic-coverage check after `@test`: every test must panic on `todo!()` to prove it actually exercises the stubbed symbols (catches structural-only tests).
### ADR-8 (2026-05-08) — `@test` may write inside `#[cfg(test)] mod` blocks
**Context:** Rust unit tests live colocated in production source files inside `#[cfg(test)] mod tests { … }` blocks — the canonical idiom, not an edge case. Original `@test` File Constraint forbade `src/` writes entirely, which forced `@make` to write both production code and tests in a single dispatch. This lost the RED→GREEN separation that TDD relies on.
**Decision:** `@test` may modify `src/**/*.rs` strictly inside `#[cfg(test)] mod { … }` blocks. Every line outside such a block stays read-only.
**Alternatives:** keep the restriction; write all unit-level tests as integration tests.
**Consequences:** TDD works for module tests as well as integration tests. The previous Phase 6 file gate (path-based `git status` snapshot diff) is removed — with `@test` now legitimately writing inside `src/`, a path-based gate proves nothing. Constraint is now enforced by the prompt rule, the diff being human-reviewable, and `@check` flagging production-code drift in Phase 8.
### ADR-9 (2026-05-08) — Phase 5.5 task-split review by `@check`
**Context:** `ppries`' README mentioned `@check` reviewing the task split for completeness, but the gist's `workflow.md` never implemented it. Without a split-review gate, an over- or under-split task surfaced only at Phase 8 final review — after expensive `@test` and `@make` dispatches had already run on a broken split.
**Decision:** new Phase 5.5 dispatches `@check` against `plan.md` + every `task-N.md` to evaluate the split against five questions: coverage, no overlap, single-purpose, integration contracts, testable AC. Max 2 cycles; BLOCK routes back to Phase 4 (plan itself doesn't decompose).
**Alternatives:** status quo (catch at Phase 8); orchestrator self-check.
**Consequences:** one extra `@check` dispatch per run. `@simplify` is not involved at this phase — split review is structural, not complexity. Cheaper failure modes for over-/under-split tasks.
### ADR-10 (2026-05-08) — `@pm` is single-mode (filesystem only)
**Context:** `@pm` had two read modes: `git show <ref>:TODO.md` (read-only) and filesystem (read/write). Git-ref mode existed for the bare-repo flow that ADR-3 retired. After ADR-3, the workflow always used filesystem mode; git-ref mode was dead weight that still added bash permissions and doc surface.
**Decision:** remove git-ref mode. `@pm` has no bash access. Ad-hoc historical reads (`git show main:TODO/GAL-39.md`) are out of scope — the user runs them directly.
**Alternatives:** keep dual-mode; document the separation more clearly.
**Consequences:** simpler agent. One less permission allowlist to maintain. Workflow's "(live filesystem mode)" qualifier dropped from Phase 2 / Phase 9 / Failure handler.
### ADR-11 (2026-05-08) — Phase 9 files follow-ups as TODO sub-issues
**Context:** unresolved items (pre-existing bugs out of scope, blocked review findings, future-seam notes) were recorded only in `summary.md` — per-run, untracked, overwritten on the next run, read by nobody since the user has walked away.
**Decision:** Phase 9 has a `### File Follow-ups` step that dispatches `@pm` to create new TODO sub-issues for tracked-worthy items. Each new issue has `parent: `, status `Todo`, and an appropriate label (`bug` / `followup` / `tech-debt`). `@simplify` advisories that the orchestrator chose not to act on stay in the summary as records, not filed.
**Alternatives:** leave items in summary; create as top-level issues (would need a README.md category, which can't be picked at unattended runtime).
**Consequences:** unresolved items become tracked work. Sub-issue routing avoids the README-category problem. The follow-up files commit alongside the worked-issue update in a single `chore(todo): …` commit.
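The filing rule reduces to a mapping from the kind of leftover item to a label, with anything unmapped staying in the summary. A sketch under assumptions: the kind names, the dictionary fields, and `follow_up_issue` are hypothetical, and the real schema lives in `agents/pm.md`.

```python
from typing import Dict, Optional

# Item kind -> sub-issue label, following the Phase 9 rows of the routing matrix.
LABEL_BY_KIND = {
    "pre_existing_bug": "bug",
    "unresolved_review_blocker": "followup",
    "future_seam_note": "tech-debt",
}


def follow_up_issue(kind: str, parent_id: str, title: str) -> Optional[Dict[str, str]]:
    """Describe the sub-issue to file, or None if the item is a record only."""
    label = LABEL_BY_KIND.get(kind)
    if label is None:
        # e.g. a @simplify advisory: stays in the summary, nothing is filed.
        return None
    return {"title": title, "parent": parent_id, "status": "Todo", "label": label}
```

Every filed item carries a parent, which is what lets it skip the README category choice that cannot be made unattended.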
### ADR-12 (2026-05-08) — Phase 7 mid-implementation escalation
**Context:** Phase 7's escalation rule was gated on `@make` flagging concerns *during entry validation* (the RED check before implementing). When `@make` got past entry validation, started implementing, and then ground for 2-3 attempts because the test demanded impossible production code, the orchestrator had no documented route — it would re-dispatch `@make` with marginal context tweaks instead of recognizing the diagnosis as test-architecture failure.
**Decision:** split Phase 7's escalation into entry-validation and mid-implementation paths. `@make` reports `escalate: test_design` when its iteration limit is reached and the test seems to demand impossible / unreasonable code. Both paths route through `@check` (test diagnosis) → `@test` (redesign) → fresh `@make` dispatch. Max 2 escalation cycles before reverting to Phase 3 plan revisit.
**Alternatives:** status quo; let `@make` modify test files itself.
**Consequences:** faster recovery from test-design errors. Bounded loop prevents thrashing. The Iteration Limits section of `agents/make.md` gains a new red-flag class.
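The bound on the escalation loop can be sketched as a counter around the `@make` dispatch. This is a deliberate simplification: it folds the entry-validation and mid-implementation paths into one `"escalate"` result and leaves out the no-flag retry and the case where `@check` rejects the diagnosis. `run_phase7`, `dispatch_make`, and `redesign_tests` are hypothetical names.

```python
from typing import Callable

MAX_ESCALATION_CYCLES = 2


def run_phase7(
    dispatch_make: Callable[[], str],
    redesign_tests: Callable[[], None],
) -> str:
    """Return the next phase: "phase 8" on GREEN, "phase 3" once the bound is hit.

    dispatch_make returns "complete" or "escalate".
    redesign_tests stands for the @check diagnosis followed by the @test fix.
    """
    cycles = 0
    while True:
        if dispatch_make() == "complete":
            return "phase 8"
        if cycles == MAX_ESCALATION_CYCLES:
            # Both escalation cycles are spent: revisit the plan.
            return "phase 3"
        cycles += 1
        redesign_tests()
```

With the bound at 2, `@make` is dispatched at most three times per task before the plan is revisited.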
---
## 6. Open Questions / Known Gaps
When a question gets answered, move it to the [Design decisions log](#5-design-decisions-log).
### Q1: Phase 5.5 review scope — does `@check` evaluate test-design soundness here?
Currently Phase 5.5 reviews the **split** (coverage, overlap, single-purpose, integration contracts, testable AC). It does *not* explicitly evaluate whether the test approach implied by each task spec is sound. That would partially overlap with Phase 4 (which has a plan-level Test Design section the reviewers evaluate). If a test-design error escapes Phase 4 and is encoded in a task spec, it surfaces at Phase 7 via the mid-impl escalation (ADR-12) — but earlier detection might be cheaper. Open: should Phase 5.5 add "test approach for each task is sound" as a sixth review question, or is that scope creep into Phase 4 territory?
### Q2: How does the orchestrator handle "split heuristic violated only after attempting a task"?
Phase 5's Split Heuristic catches obvious over-/under-split cases at planning time. But sometimes a task that *looked* single-purpose during planning turns out to mix structural and runtime work only when `@make` starts implementing it. There's no documented mid-Phase-7 routing for "this task needs to be split now." Currently `@make` would either thrash (mid-impl escalation, ADR-12), or report the spec is ambiguous (Insufficient Context Protocol in `make.md`). Open: should there be a "split mid-flight" route that takes the task back to Phase 5 for re-splitting?
### Q3: Phase 9 has no rollback for partial commits if it fails between sub-steps
Phase 9's order is: code commit → TODO update → file follow-ups → commit TODO changes → write summary. If the workflow crashes between code commit and TODO commit, the worktree has the code change but the issue file still says `In Progress`. The Failure Handler covers earlier-phase crashes but Phase-9-internal partial states aren't explicitly addressed. Open: should the Failure Handler distinguish "Phase 9 partial" and resume from the right sub-step on retry, or is leaving manual cleanup to the user good enough?
### Q4: `@simplify` not involved at Phase 5.5 — is that the right call?
Phase 5.5 only dispatches `@check`. Rationale (ADR-9) is that split review is structural, not complexity. But `@simplify`'s lens — "what if we deleted this?" — could legitimately catch unnecessary tasks (e.g. a third task that adds an abstraction nothing else needs). Open: is the cost of one more dispatch worth the catch?
### Q5: Test-design loop bound vs plan-revisit threshold
ADR-12 sets max 2 cycles for the Phase 7 test-design escalation before reverting to Phase 3 plan revisit. The plan-review and final-review loops have max 3. Why the asymmetry? The test-design loop is more expensive per cycle (`@check` + `@test` + `@make` re-implement vs. just reviewers + plan edit), so 2 may be right. But the choice was made by feel, not measured. Open: is 2 the right number, or should it match Phase 4 / Phase 8 at 3?
---