Forgejo's Mermaid parser is stricter than GitHub's and rejected two
diagrams in workflow-design.md:
1. Flowchart 3.1 — `@check`, `@test`, `@make` in pipe-delimited edge
labels were tokenised as LINK_ID (newer Mermaid uses `@{...}` for
edge IDs), e.g. `P7E -->|@check → @test → @make| P7` failed at
the first @.
2. State diagram 3.2 — the second colon inside transition labels
(`escalate: test_design`) collided with the `:` field separator
that splits transition from label.
Drops the @-prefix from labels in all three diagrams (`@check` → `check`
inside the label text only; ADRs and prose elsewhere keep `@check`
backticked, which is just markdown). Replaces second colons with
descriptive text. Drops parentheses from state-diagram transition
labels. Drops the Unicode arrow `→` in favour of plain words.
Quotes the flowchart node-label strings to keep `<br/>` safe.
The ADR text and prose continue to use `@<name>` references — those
live in markdown, not Mermaid, and render the same.
Workflow Design
1. Purpose
This document is the design rationale and decision log for the multi-agent workflow. The operational rules — what the orchestrator does, in what order, with what guardrails — live in commands/workflow.md and the agent files under agents/. This document is where we discuss changes before they land in those files.
Intended flow:
- A new idea, gap, or failure mode comes up (often from a real run).
- Discuss in this document — capture context, options, trade-offs.
- When a decision is reached, update commands/workflow.md and/or the relevant agent file.
- Record the decision in the Design decisions log below.
The operational files stay terse and procedural. The "why" lives here.
2. Cast & Responsibilities
One orchestrator, five subagents. The orchestrator runs in agent: build mode; the subagents are defined as separate agent files under config/opencode/agents/.
| Actor | File | Role | Boundary |
|---|---|---|---|
| Orchestrator | commands/workflow.md | Plans, dispatches, merges findings, edits artifacts under .workflow/, commits. | Does not write production code, write tests, or play any subagent's role. |
| @check | agents/check.md | Reviews plans / task splits / code for risks, correctness, testability. | Read-only: no write / edit / bash. |
| @simplify | agents/simplify.md | Reviews for unnecessary complexity. Advisory only. | Read-only. |
| @test | agents/test.md | Writes failing tests for a task spec, verifies RED. | May modify test files / #[cfg(test)] mod blocks. Sandboxed bash. |
| @make | agents/make.md | Implements a single task spec. Verifies acceptance criteria. | May modify files listed in the task spec. Sandboxed bash; no git / network / cd. |
| @pm | agents/pm.md | Reads / updates TODO/ issue files. | May modify only TODO/ contents. No bash. |
Permission boundaries are enforced per agent. The orchestrator (in agent: build mode) has full edit/bash capabilities, which is precisely why it must not act as the subagents — the agent files are where the limits live.
3. Flow Diagrams
3.1 Phase pipeline
High-level happy path with the major escalation arms.
flowchart TD
P1["Phase 1: Sanity Check"]
P2["Phase 2: Issue Context<br/>pm reads TODO/ID.md"]
P3["Phase 3: Plan<br/>write plan.md"]
P4{"Phase 4: Review Plan<br/>check + simplify<br/>max 3 cycles"}
P5["Phase 5: Split into Tasks<br/>write task-N.md"]
P55{"Phase 5.5: Review Split<br/>check<br/>max 2 cycles"}
P6["Phase 6: Write Tests<br/>test, stub-first make"]
P7["Phase 7: Implement<br/>make"]
P7E{"Test-design escalation<br/>max 2 cycles"}
P8{"Phase 8: Final Review<br/>check + simplify<br/>max 3 cycles"}
P9["Phase 9: Commit + TODO + Follow-ups + Summary"]
P1 --> P2 --> P3 --> P4
P4 -->|ACCEPTABLE| P5 --> P55
P4 -->|NEEDS WORK or BLOCK| P3
P55 -->|ACCEPTABLE| P6 --> P7
P55 -->|NEEDS WORK| P5
P55 -->|BLOCK plan-level| P3
P7 --> P8
P7 -.->|escalate test_design| P7E
P7E -->|check then test then make| P7
P7E -.->|2 cycles exhausted| P3
P8 -->|ACCEPTABLE| P9
P8 -->|production-code finding| P7
P8 -->|test-design finding| P7E
P8 -->|plan-level finding| P3
P9 --> END([Done])
3.2 Phase 7 escalation loop
The pattern when @make cannot reach GREEN.
stateDiagram-v2
[*] --> Dispatched: orchestrator dispatches make
Dispatched --> EntryCheck: run tests verify RED
EntryCheck --> Implementing: failure code matches handoff
EntryCheck --> EntryEscalation: test-quality concern
Implementing --> GreenReached: tests pass within 2-3 attempts
Implementing --> MidEscalation: escalate test_design flag
Implementing --> MidStuck: incomplete no flag
MidStuck --> Implementing: re-dispatch with check notes 1 retry
MidStuck --> MidEscalation: still failing on retry
EntryEscalation --> CheckDiag
MidEscalation --> CheckDiag
CheckDiag --> TestRedesign: confirmed test-design error
CheckDiag --> Dispatched: rejected production issue
TestRedesign --> Dispatched: test fixes fresh entry validation
Dispatched --> PlanRevisit: 2 escalation cycles exhausted
GreenReached --> [*]
PlanRevisit --> [*]: back to Phase 3
3.3 Issue lifecycle
How TODO entries move through statuses, with sub-issue filing during a run.
stateDiagram-v2
[*] --> Todo: issue file created
Todo --> InProgress: Phase 2 workflow starts
InProgress --> Done: Phase 9 run completes
InProgress --> Todo: workflow fails, failure handler adds comment
note right of InProgress
New sub-issues may be filed during Phase 9
with parent ISSUE_ID, status Todo
and label bug, followup, or tech-debt
end note
Done --> [*]
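The lifecycle above can be read as a small transition table. The Rust sketch below is illustrative only: the `Status`, `Event`, and `next_status` names are hypothetical and are not taken from agents/pm.md. It encodes just the three transitions drawn in the diagram and treats everything else as invalid.

```rust
/// Issue statuses as they appear in the lifecycle diagram.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Todo,
    InProgress,
    Done,
}

/// Hypothetical event names for the three labelled transitions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    WorkflowStarted, // Phase 2
    RunCompleted,    // Phase 9
    WorkflowFailed,  // failure handler adds a comment
}

/// Returns the next status, or None if the diagram has no such transition.
pub fn next_status(current: Status, event: Event) -> Option<Status> {
    match (current, event) {
        (Status::Todo, Event::WorkflowStarted) => Some(Status::InProgress),
        (Status::InProgress, Event::RunCompleted) => Some(Status::Done),
        (Status::InProgress, Event::WorkflowFailed) => Some(Status::Todo),
        _ => None,
    }
}
```

Returning `Option` makes any undrawn transition, such as Done back to Todo, an explicit `None` rather than a silent no-op.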
4. Routing Matrix
Every observed (phase, signal) pair and the action it maps to. Empty cells are gaps. Walking this table is the cheap way to spot routing issues like the recent Phase 7 mid-implementation escalation.
| Phase | Signal source | Signal | Action |
|---|---|---|---|
| 1 | Sanity checks | Bare repo / detached HEAD / missing TODO/<ID>.md / branch == base | Stop with error |
| 2 | @pm | Issue not found | Stop with error |
| 2 | @pm | Status is Todo | Flip to In Progress; propagate to README.md / parent's Sub-issues |
| 3 | Orchestrator | Plan drafted | Apply Dispatch Hygiene; write plan.md; verify test -f |
| 4 | @check + @simplify | Both ACCEPTABLE | Proceed to Phase 5 |
| 4 | Either reviewer | NEEDS WORK | Edit plan.md in place; re-dispatch (max 3 cycles) |
| 4 | @check | BLOCK | Edit plan.md addressing the finding; re-dispatch |
| 4 | Reviewers | Same finding twice | Convergence detected; stop loop early |
| 4 | Reviewers | Unresolved after 3 cycles | Document blockers in summary; proceed |
| 5 | Orchestrator | Tasks drafted | Apply Dispatch Hygiene; write each task-N.md; verify test -f for every N |
| 5.5 | @check | ACCEPTABLE | Proceed to Phase 6 |
| 5.5 | @check | NEEDS WORK | Edit task-N.md in place; re-dispatch (max 2 cycles) |
| 5.5 | @check | BLOCK | Plan doesn't decompose cleanly; back to Phase 4 |
| 6 | @test | TESTS_READY + escalate_to_check: false | Proceed to Phase 7 |
| 6 | @test | TESTS_READY + escalate_to_check: true | @check light review → @test fixes → forward |
| 6 | @test | NOT_TESTABLE (general) | @check sign-off; task goes to @make without tests |
| 6 | @test | NOT_TESTABLE: Missing testability seam | @make adds the seam; re-run @test |
| 6 | @test | BLOCKED | Investigate; may need spec or plan revision |
| 6 | @test (stub-first) | All tests pass with zero todo!() panics | Reject: structural-only tests; route back to @test to rewrite |
| 7 | @make | Implementation Complete | Proceed to Phase 8 |
| 7 | @make | Implementation Incomplete + entry-validation flag | @check (test diagnosis) → @test (fixes) → fresh @make |
| 7 | @make | Implementation Incomplete + escalate: test_design | Same path; max 2 escalation cycles |
| 7 | @make | Implementation Incomplete (no flag) | Re-dispatch with @check notes once; if 2nd attempt fails, treat as escalate: test_design |
| 7 | Escalation loop | 2 cycles exhausted | Back to Phase 3 (plan revisit) |
| 8 | @check + @simplify | ACCEPTABLE | Proceed to Phase 9 |
| 8 | @check | BLOCK / behavioral / production-code finding | New @make task spec from finding; dispatch (max 3 cycles) |
| 8 | @check | BLOCK / test-design / test-quality finding | @check → @test → @make re-verify |
| 8 | @check | BLOCK / plan-level finding | Back to Phase 3 with the finding |
| 8 | @simplify | Advisory | Record in summary's "Advisory notes (not filed)" |
| 8 | Reviewers | Strictly cosmetic finding (typo, missing newline, AST-preserving) | Orchestrator fixes directly; re-review |
| 8 | Review loop | Same finding twice | Convergence; stop loop |
| 8 | Review loop | 3 cycles exhausted | Document blockers; proceed |
| 9 | Orchestrator | Pre-existing bug, out of scope | File sub-issue via @pm (label: bug) |
| 9 | Orchestrator | Unresolved review-loop blocker | File sub-issue via @pm (label: followup) |
| 9 | @test (Phase 6) | NOT_TESTABLE future-seam note | File sub-issue via @pm (label: tech-debt) |
| 9 | Orchestrator | @simplify advisory not acted on | Record in summary; do NOT file (records, not work) |
| 9 | Orchestrator | All commits done | Set issue status to Done; sync README/parent; commit chore(todo): … |
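One way to picture "empty cells are gaps" is a lookup that returns nothing for an unrouted (phase, signal) pair. The Rust sketch below is a minimal illustration under that assumption: `Route`, `ROUTES`, and `route` are hypothetical names, and only four rows from the table above are copied in.

```rust
/// One row of the routing matrix: (phase, signal) -> action.
pub struct Route {
    pub phase: &'static str,
    pub signal: &'static str,
    pub action: &'static str,
}

/// A handful of rows copied from the table; not the full matrix.
pub const ROUTES: &[Route] = &[
    Route { phase: "4", signal: "Both ACCEPTABLE", action: "Proceed to Phase 5" },
    Route { phase: "5.5", signal: "NEEDS WORK", action: "Edit task-N.md in place; re-dispatch (max 2 cycles)" },
    Route { phase: "7", signal: "Implementation Complete", action: "Proceed to Phase 8" },
    Route { phase: "8", signal: "3 cycles exhausted", action: "Document blockers; proceed" },
];

/// Looks up the action for a (phase, signal) pair.
/// None means the matrix has no row for it, i.e. a routing gap.
pub fn route(phase: &str, signal: &str) -> Option<&'static str> {
    ROUTES
        .iter()
        .find(|r| r.phase == phase && r.signal == signal)
        .map(|r| r.action)
}
```

Phases are kept as strings so that "5.5" needs no special casing. A `None` result is the programmatic equivalent of an empty cell.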
5. Design Decisions Log
ADR-flavoured. New decisions append at the end. If a decision is later reversed or refined, mark the original Superseded by ADR-N and add a new entry.
ADR-1 (2026-05-06) — Forge-agnostic workflow
Context: original gist used the GitHub gh CLI for auth checks and gh pr create --draft at the end of the run.
Decision: workflow stops at git commit. No push, no PR/MR creation, no gh references anywhere.
Alternatives: keep gh integration; abstract behind a forge-plugin interface.
Consequences: workflow runs on any git host; user opens PR/MR manually on whichever forge they use. Removes the need for forge auth setup as a prerequisite.
ADR-2 (2026-05-06) — @pm operates on local TODO/ folder
Context: original @pm agent used the Linear CLI.
Decision: Linear-style folder-as-tracker with one <ID>.md file per issue plus a category-grouped README.md.
Alternatives: keep Linear; multi-backend abstraction; single-file TODO.md.
Consequences: project-local, version-controlled, no external service. Schema enforced in agents/pm.md. Initial single-file design moved to per-issue files in ADR-12.
ADR-3 (2026-05-07) — Workflow runs in worktree, not bare repo
Context: original orchestrated bare-clone → worktree creation as Phase 3 of the workflow.
Decision: user creates the worktree before launching opencode; the workflow assumes CWD is the worktree.
Alternatives: keep auto-worktree-creation; auto-detect bare vs. worktree.
Consequences: simpler workflow; opencode CWD = worktree, so subagents inherit the right project root naturally; less plumbing around WORKTREE_PATH. (Subagents still get absolute paths in dispatch prompts; see ADR-5.)
ADR-4 (2026-05-07) — @make and @test are polyglot
Context: original was Python-only via uv.
Decision: detect toolchain from marker files (pyproject.toml, Cargo.toml, flake.nix); wrap all toolchain commands in nix develop -c if a devshell is present.
Alternatives: per-language agents; keep Python-only.
Consequences: one agent per role serves multiple languages. Permission allowlists expanded for cargo and nix develop -c. Bash sandbox still denies shell escapes inside the wrapper.
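As a rough illustration of marker-file detection, the Rust sketch below checks for the three files named in the decision and prefixes a command with `nix develop -c` when a flake.nix is present. The function names, the `Toolchain` enum, and the precedence of Cargo.toml over pyproject.toml when both exist are assumptions made for the example, not rules taken from the agent files.

```rust
use std::path::Path;

#[derive(Debug, PartialEq, Eq)]
pub enum Toolchain {
    Rust,
    Python,
    Unknown,
}

/// Detects the toolchain from marker files in the project root.
/// Assumption: Cargo.toml wins if both markers are present.
pub fn detect_toolchain(root: &Path) -> Toolchain {
    if root.join("Cargo.toml").is_file() {
        Toolchain::Rust
    } else if root.join("pyproject.toml").is_file() {
        Toolchain::Python
    } else {
        Toolchain::Unknown
    }
}

/// A flake.nix in the root is treated as "a devshell is present".
pub fn has_devshell(root: &Path) -> bool {
    root.join("flake.nix").is_file()
}

/// Builds the argv to run, wrapped in `nix develop -c` if a devshell exists.
pub fn wrap_command(root: &Path, argv: &[&str]) -> Vec<String> {
    let mut full: Vec<String> = Vec::new();
    if has_devshell(root) {
        full.extend(["nix", "develop", "-c"].iter().map(|s| s.to_string()));
    }
    full.extend(argv.iter().map(|s| s.to_string()));
    full
}
```

With a flake.nix in the root, `wrap_command(root, &["cargo", "check"])` yields `nix develop -c cargo check`. Without one, the command passes through unchanged.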
ADR-5 (2026-05-07) — Subagent CWD via absolute paths
Context: opencode subagents do not inherit the orchestrator's cd. A @check dispatched from inside a worktree resolved relative paths against the parent project root and failed with "file not found."
Decision: capture WORKTREE_PATH in Phase 1 and pass absolute paths to every subagent dispatch.
Alternatives: patch opencode (out of scope); symlink dance.
Consequences: every dispatch has an explicit Worktree: <abs path> header convention. Verbose but reliable. Eventually superseded by run-artifact paths under $RUN_DIR (ADR-6).
ADR-6 (2026-05-08) — Run artifacts on disk in .workflow/run-<ID>/
Context: the orchestrator was paraphrasing the plan and task specs into each dispatch prompt. Result: @check and @simplify could see slightly different versions of the same plan; mid-loop revisions could leak as "actually let me reconsider…" passages; long specs ate context budget on every dispatch.
Decision: orchestrator writes plan.md (Phase 3), task-N.md (Phase 5), and summary.md (Phase 9) to $WORKTREE_PATH/.workflow/run-<ISSUE_ID>/. Dispatches name files by absolute path; subagents read them.
Alternatives: inline prompts (status quo); database; in-memory orchestrator state.
Consequences: byte-for-byte source of truth across dispatches. Mid-loop revisions edit the file in place; every subsequent reader sees the new version. Run-artifact directory is gitignored (.workflow/).
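The artifact layout described here is simple enough to sketch as path helpers. In the Rust below the helper names are hypothetical, and the example issue ID is only a placeholder. The directory and file names follow the decision text.

```rust
use std::path::{Path, PathBuf};

/// $WORKTREE_PATH/.workflow/run-<ISSUE_ID>/
pub fn run_dir(worktree: &Path, issue_id: &str) -> PathBuf {
    worktree.join(".workflow").join(format!("run-{}", issue_id))
}

/// Written in Phase 3.
pub fn plan_path(worktree: &Path, issue_id: &str) -> PathBuf {
    run_dir(worktree, issue_id).join("plan.md")
}

/// Written in Phase 5, one file per task number N.
pub fn task_path(worktree: &Path, issue_id: &str, n: u32) -> PathBuf {
    run_dir(worktree, issue_id).join(format!("task-{}.md", n))
}

/// Written in Phase 9.
pub fn summary_path(worktree: &Path, issue_id: &str) -> PathBuf {
    run_dir(worktree, issue_id).join("summary.md")
}
```

Because every helper starts from an absolute worktree path, the resulting paths can be pasted into a dispatch prompt without depending on the subagent's working directory.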
ADR-7 (2026-05-08) — Stub-first Rust TDD (mandatory for new symbols)
Context: Rust integration tests reference symbols imported from lib.rs. If those symbols don't exist yet, the test crate fails to compile — a build-error RED with no stack trace and no assertion diagnostics. Same for module tests against not-yet-existing functions.
Decision: for any Rust task that introduces new symbols, dispatch a stub-pass @make first (writes todo!()-bodied stubs, runs cargo check only). Then @test runs against compiling stubs; runtime panic on todo!() is the clean RED. Then @make body pass replaces stubs.
Alternatives: accept compile-error RED; let @make write tests + bodies in one pass; allow @test to add stubs to production source.
Consequences: two atomic commits per affected task (feat: scaffold X with todo!() stubs, then feat: implement X). Stub-pass scope is tight: bodies are exactly todo!(), signatures must match the planned final API. Phase 6 also adds a mandatory panic-coverage check after @test: every test must panic on todo!() to prove it actually exercises the stubbed symbols (catches structural-only tests).
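To make the stub pass and the panic-coverage check concrete, here is a small Rust sketch. Everything in it is hypothetical: `parse_port` stands in for a newly introduced symbol, the two test bodies are plain functions rather than `#[test]` items so they can be invoked directly, and `panics_on_stub` is one possible way to express "does this test hit a `todo!()`", not the mechanism the workflow actually uses.

```rust
use std::panic;

/// Stub pass: the signature matches the planned API, the body is exactly `todo!()`.
pub fn parse_port(_input: &str) -> Result<u16, String> {
    todo!()
}

/// A test body that calls the stubbed symbol.
/// Against the stub it panics at runtime, which is the clean RED.
pub fn test_parses_valid_port() {
    assert_eq!(parse_port("8080"), Ok(8080));
}

/// A structural-only test body: it names the symbol but never calls it,
/// so it passes even though nothing is implemented.
pub fn test_structural_only() {
    let _signature: fn(&str) -> Result<u16, String> = parse_port;
}

/// Panic-coverage check: true if the test body panics when run against stubs.
pub fn panics_on_stub(test_body: fn()) -> bool {
    panic::catch_unwind(test_body).is_err()
}
```

Under this sketch `panics_on_stub(test_parses_valid_port)` is true and `panics_on_stub(test_structural_only)` is false, so the second test would be routed back to @test for a rewrite.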
ADR-8 (2026-05-08) — @test may write inside #[cfg(test)] mod blocks
Context: Rust unit tests live colocated in production source files inside #[cfg(test)] mod tests { … } blocks — the canonical idiom, not an edge case. Original @test File Constraint forbade src/ writes entirely, which forced @make to write both production code and tests in a single dispatch. This lost the RED→GREEN separation that TDD relies on.
Decision: @test may modify src/**/*.rs strictly inside #[cfg(test)] mod <name> { … } blocks. Every line outside such a block stays read-only.
Alternatives: keep the restriction; write all unit-level tests as integration tests.
Consequences: TDD works for module tests as well as integration tests. The previous Phase 6 file gate (path-based git status snapshot diff) is removed — with @test now legitimately writing inside src/, a path-based gate proves nothing. Constraint is now enforced by the prompt rule, the diff being human-reviewable, and @check flagging production-code drift in Phase 8.
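The boundary this decision draws can be shown on a tiny source file. In the Rust sketch below, `clamp_retries` and the module name `clamp_tests` are invented for illustration. The comments mark which region @test may edit under this ADR.

```rust
// Production code: stays read-only for @test.
pub fn clamp_retries(requested: u32, max: u32) -> u32 {
    requested.min(max)
}

// Only the inside of this block is writable by @test.
#[cfg(test)]
mod clamp_tests {
    use super::*;

    #[test]
    fn caps_at_max() {
        assert_eq!(clamp_retries(5, 2), 2);
    }

    #[test]
    fn keeps_values_under_max() {
        assert_eq!(clamp_retries(1, 2), 1);
    }
}
```

Because both regions live in the same file, a path-based gate cannot tell them apart, which is why enforcement falls to the prompt rule and diff review.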
ADR-9 (2026-05-08) — Phase 5.5 task-split review by @check
Context: ppries' README mentioned @check reviewing the task split for completeness, but the gist's workflow.md never implemented it. Without a split-review gate, an over- or under-split task surfaced only at Phase 8 final review — after expensive @test and @make dispatches had already run on a broken split.
Decision: new Phase 5.5 dispatches @check against plan.md + every task-N.md to evaluate the split against five questions: coverage, no overlap, single-purpose, integration contracts, testable AC. Max 2 cycles; BLOCK routes back to Phase 4 (plan itself doesn't decompose).
Alternatives: status quo (catch at Phase 8); orchestrator self-check.
Consequences: one extra @check dispatch per run. @simplify is not involved at this phase — split review is structural, not complexity. Cheaper failure modes for over-/under-split tasks.
ADR-10 (2026-05-08) — @pm is single-mode (filesystem only)
Context: @pm had two read modes — git show <ref>:TODO.md (read-only) and filesystem (read/write). Git-ref mode existed for the bare-repo flow that ADR-3 retired. After ADR-3, the workflow always used filesystem mode; git-ref mode was dead weight that still added bash permissions and doc surface.
Decision: remove git-ref mode. @pm has no bash access. Ad-hoc historical reads (git show main:TODO/GAL-39.md) are out of scope — the user runs them directly.
Alternatives: keep dual-mode; document the separation more clearly.
Consequences: simpler agent. One less permission allowlist to maintain. Workflow's "(live filesystem mode)" qualifier dropped from Phase 2 / Phase 9 / Failure handler.
ADR-11 (2026-05-08) — Phase 9 files follow-ups as TODO sub-issues
Context: unresolved items (pre-existing bugs out of scope, blocked review findings, future-seam notes) were recorded only in summary.md — per-run, untracked, overwritten on the next run, read by nobody since the user has walked away.
Decision: Phase 9 has a ### File Follow-ups step that dispatches @pm to create new TODO sub-issues for tracked-worthy items. Each new issue has parent: <ISSUE_ID>, status Todo, and an appropriate label (bug / followup / tech-debt). @simplify advisories that the orchestrator chose not to act on stay in the summary as records, not filed.
Alternatives: leave items in summary; create as top-level issues (would need a README.md category, which can't be picked at unattended runtime).
Consequences: unresolved items become tracked work. Sub-issue routing avoids the README-category problem. The follow-up files commit alongside the worked-issue update in a single chore(todo): … commit.
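The filing rule reduces to a mapping from the kind of unresolved item to a label, with one kind that is never filed. The Rust sketch below assumes hypothetical `Unresolved`, `SubIssue`, and `follow_up` names. The labels, the Todo status, and the parent link come from the decision text.

```rust
/// Kinds of unresolved items Phase 9 may encounter.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Unresolved {
    PreExistingBug,
    ReviewLoopBlocker,
    FutureSeamNote,
    SimplifyAdvisory,
}

/// Minimal shape of a filed sub-issue.
#[derive(Debug, PartialEq, Eq)]
pub struct SubIssue {
    pub parent: String,
    pub status: &'static str,
    pub label: &'static str,
}

/// Returns the sub-issue to file, or None when the item stays in the summary.
pub fn follow_up(parent: &str, item: Unresolved) -> Option<SubIssue> {
    let label = match item {
        Unresolved::PreExistingBug => "bug",
        Unresolved::ReviewLoopBlocker => "followup",
        Unresolved::FutureSeamNote => "tech-debt",
        // Advisories are records, not work: never filed.
        Unresolved::SimplifyAdvisory => return None,
    };
    Some(SubIssue {
        parent: parent.to_string(),
        status: "Todo",
        label,
    })
}
```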
ADR-12 (2026-05-08) — Phase 7 mid-implementation escalation
Context: Phase 7's escalation rule was gated on @make flagging concerns during entry validation (the RED check before implementing). When @make got past entry validation, started implementing, and then ground for 2-3 attempts because the test demanded impossible production code, the orchestrator had no documented route — it would re-dispatch @make with marginal context tweaks instead of recognizing the diagnosis as test-architecture failure.
Decision: split Phase 7's escalation into entry-validation and mid-implementation paths. @make reports escalate: test_design when its iteration limit is reached and the test seems to demand impossible / unreasonable code. Both paths route through @check (test diagnosis) → @test (redesign) → fresh @make dispatch. Max 2 escalation cycles before reverting to Phase 3 plan revisit.
Alternatives: status quo; let @make modify test files itself.
Consequences: faster recovery from test-design errors. Bounded loop prevents thrashing. @make.md Iteration Limits section gains a new red-flag class.
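The two escalation paths and the cycle bound can be sketched as a single decision function. The Rust below is an illustration under stated assumptions: `MakeReport`, `Next`, and `next_step` are hypothetical names, and the single retry for an unflagged incomplete report follows the routing matrix rather than this ADR's text.

```rust
/// Maximum escalation cycles before reverting to Phase 3.
pub const MAX_ESCALATION_CYCLES: u32 = 2;

/// What @make reported at the end of a dispatch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MakeReport {
    Complete,
    IncompleteEntryValidation,    // concern raised during the RED check
    IncompleteEscalateTestDesign, // escalate: test_design
    IncompleteNoFlag,
}

/// What the orchestrator does next.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Next {
    Phase8,
    RetryWithCheckNotes,
    Escalate, // @check diagnosis, then @test redesign, then a fresh @make
    PlanRevisit,
}

/// `cycles_used` counts completed escalation cycles;
/// `retried` says whether the one unflagged retry was already spent.
pub fn next_step(report: MakeReport, cycles_used: u32, retried: bool) -> Next {
    match report {
        MakeReport::Complete => Next::Phase8,
        MakeReport::IncompleteNoFlag if !retried => Next::RetryWithCheckNotes,
        // Every remaining case is an escalation; check the bound first.
        _ if cycles_used >= MAX_ESCALATION_CYCLES => Next::PlanRevisit,
        _ => Next::Escalate,
    }
}
```

Keeping the bound in one constant makes the "2 versus 3" question a one-line change if the threshold is ever revisited.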
6. Open Questions / Known Gaps
When a question gets answered, move it to the Design decisions log.
Q1: Phase 5.5 review scope — does @check evaluate test-design soundness here?
Currently Phase 5.5 reviews the split (coverage, overlap, single-purpose, integration contracts, testable AC). It does not explicitly evaluate whether the test approach implied by each task spec is sound. That would partially overlap with Phase 4 (which has a plan-level Test Design section the reviewers evaluate). If a test-design error escapes Phase 4 and is encoded in a task spec, it surfaces at Phase 7 via the mid-impl escalation (ADR-12) — but earlier detection might be cheaper. Open: should Phase 5.5 add "test approach for each task is sound" as a sixth review question, or is that scope creep into Phase 4 territory?
Q2: How does the orchestrator handle "split heuristic violated only after attempting a task"?
Phase 5's Split Heuristic catches obvious over-/under-split cases at planning time. But sometimes a task that looked single-purpose during planning turns out to mix structural and runtime work only when @make starts implementing it. There's no documented mid-Phase-7 routing for "this task needs to be split now." Currently @make would either thrash (mid-impl escalation, ADR-12), or report the spec is ambiguous (Insufficient Context Protocol in make.md). Open: should there be a "split mid-flight" route that takes the task back to Phase 5 for re-splitting?
Q3: Phase 9 has no rollback for partial commits if it fails between sub-steps
Phase 9's order is: code commit → TODO update → file follow-ups → commit TODO changes → write summary. If the workflow crashes between code commit and TODO commit, the worktree has the code change but the issue file still says In Progress. The Failure Handler covers earlier-phase crashes but Phase-9-internal partial states aren't explicitly addressed. Open: should the Failure Handler distinguish "Phase 9 partial" and resume from the right sub-step on retry, or is leaving manual cleanup to the user good enough?
Q4: @simplify not involved at Phase 5.5 — is that the right call?
Phase 5.5 only dispatches @check. Rationale (ADR-9) is that split review is structural, not complexity. But @simplify's lens — "what if we deleted this?" — could legitimately catch unnecessary tasks (e.g. a third task that adds an abstraction nothing else needs). Open: is the cost of one more dispatch worth the catch?
Q5: Test-design loop bound vs plan-revisit threshold
ADR-12 sets max 2 cycles for the Phase 7 test-design escalation before reverting to Phase 3 plan revisit. The plan-review and final-review loops have max 3. Why the asymmetry? The test-design loop is more expensive per cycle (@check + @test + @make re-implement vs. just reviewers + plan edit), so 2 may be right. But the choice was made by feel, not measured. Open: is 2 the right number, or should it match Phase 4 / Phase 8 at 3?