Harald Hoyer 0b15944d1c docs(opencode): make workflow-design Mermaid diagrams Forgejo-compatible

Forgejo's Mermaid parser is stricter than GitHub's and rejected two
diagrams in workflow-design.md:

1. Flowchart 3.1 — `@check`, `@test`, `@make` in pipe-delimited edge
   labels were tokenised as LINK_ID (newer Mermaid uses `@{...}` for
   edge IDs), e.g. `P7E -->|@check → @test → @make| P7` failed at
   the first @.
2. State diagram 3.2 — the second colon inside transition labels
   (`escalate: test_design`) collided with the `:` field separator
   that splits transition from label.

Drops the @-prefix from labels in all three diagrams (`@check` → `check`
in prose-of-the-label only; ADRs and prose elsewhere keep `@check`
backticked, which is just markdown). Replaces second colons with
descriptive text. Drops parentheses from state-diagram transition
labels. Drops the Unicode arrow `→` in favour of plain words.
Quotes the flowchart node-label strings to keep `<br/>` safe.

The ADR text and prose continue to use `@<name>` references — those
live in markdown, not Mermaid, and render the same.

2026-05-08 10:24:57 +02:00

Workflow Design

1. Purpose

This document is the design rationale and decision log for the multi-agent workflow. The operational rules — what the orchestrator does, in what order, with what guardrails — live in commands/workflow.md and the agent files under agents/. This document is where we discuss changes before they land in those files.

Intended flow:

1. A new idea, gap, or failure mode comes up (often from a real run).
2. Discuss in this document: capture context, options, trade-offs.
3. When a decision is reached, update commands/workflow.md and/or the relevant agent file.
4. Record the decision in the Design decisions log below.

The operational files stay terse and procedural. The "why" lives here.

2. Cast & Responsibilities

One orchestrator, five subagents. The orchestrator runs in agent: build mode; the subagents are defined as separate agent files under config/opencode/agents/.

| Actor | File | Role | Boundary |
| --- | --- | --- | --- |
| Orchestrator | `commands/workflow.md` | Plans, dispatches, merges findings, edits artifacts under `.workflow/`, commits. | Does not write production code, write tests, or play any subagent's role. |
| `@check` | `agents/check.md` | Reviews plans / task splits / code for risks, correctness, testability. | Read-only: no write / edit / bash. |
| `@simplify` | `agents/simplify.md` | Reviews for unnecessary complexity. Advisory only. | Read-only. |
| `@test` | `agents/test.md` | Writes failing tests for a task spec, verifies RED. | May modify test files / `#[cfg(test)] mod` blocks. Sandboxed bash. |
| `@make` | `agents/make.md` | Implements a single task spec. Verifies acceptance criteria. | May modify files listed in the task spec. Sandboxed bash; no `git` / network / `cd`. |
| `@pm` | `agents/pm.md` | Reads / updates `TODO/` issue files. | May modify only `TODO/` contents. No bash. |

Permission boundaries are enforced per agent. The orchestrator (in agent: build mode) has full edit/bash capabilities, which is precisely why it must not act as the subagents — the agent files are where the limits live.

3. Flow Diagrams

3.1 Phase pipeline

High-level happy path with the major escalation arms.

flowchart TD
    P1["Phase 1: Sanity Check"]
    P2["Phase 2: Issue Context<br/>pm reads TODO/ID.md"]
    P3["Phase 3: Plan<br/>write plan.md"]
    P4{"Phase 4: Review Plan<br/>check + simplify<br/>max 3 cycles"}
    P5["Phase 5: Split into Tasks<br/>write task-N.md"]
    P55{"Phase 5.5: Review Split<br/>check<br/>max 2 cycles"}
    P6["Phase 6: Write Tests<br/>test, stub-first make"]
    P7["Phase 7: Implement<br/>make"]
    P7E{"Test-design escalation<br/>max 2 cycles"}
    P8{"Phase 8: Final Review<br/>check + simplify<br/>max 3 cycles"}
    P9["Phase 9: Commit + TODO + Follow-ups + Summary"]

    P1 --> P2 --> P3 --> P4
    P4 -->|ACCEPTABLE| P5 --> P55
    P4 -->|NEEDS WORK or BLOCK| P3
    P55 -->|ACCEPTABLE| P6 --> P7
    P55 -->|NEEDS WORK| P5
    P55 -->|BLOCK plan-level| P3
    P7 --> P8
    P7 -.->|escalate test_design| P7E
    P7E -->|check then test then make| P7
    P7E -.->|2 cycles exhausted| P3
    P8 -->|ACCEPTABLE| P9
    P8 -->|production-code finding| P7
    P8 -->|test-design finding| P7E
    P8 -->|plan-level finding| P3
    P9 --> END([Done])

3.2 Phase 7 escalation loop

The pattern when @make cannot reach GREEN.

stateDiagram-v2
    [*] --> Dispatched: orchestrator dispatches make
    Dispatched --> EntryCheck: run tests verify RED
    EntryCheck --> Implementing: failure code matches handoff
    EntryCheck --> EntryEscalation: test-quality concern
    Implementing --> GreenReached: tests pass within 2-3 attempts
    Implementing --> MidEscalation: escalate test_design flag
    Implementing --> MidStuck: incomplete no flag
    MidStuck --> Implementing: re-dispatch with check notes 1 retry
    MidStuck --> MidEscalation: still failing on retry
    EntryEscalation --> CheckDiag
    MidEscalation --> CheckDiag
    CheckDiag --> TestRedesign: confirmed test-design error
    CheckDiag --> Dispatched: rejected production issue
    TestRedesign --> Dispatched: test fixes fresh entry validation
    Dispatched --> PlanRevisit: 2 escalation cycles exhausted
    GreenReached --> [*]
    PlanRevisit --> [*]: back to Phase 3

3.3 Issue lifecycle

How TODO entries move through statuses, with sub-issue filing during a run.

stateDiagram-v2
    [*] --> Todo: issue file created
    Todo --> InProgress: Phase 2 workflow starts
    InProgress --> Done: Phase 9 run completes
    InProgress --> Todo: workflow fails, failure handler adds comment

    note right of InProgress
        New sub-issues may be filed during Phase 9
        with parent ISSUE_ID, status Todo
        and label bug, followup, or tech-debt
    end note

    Done --> [*]
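
The lifecycle above is small enough to sketch as a transition table. The following is an illustration only, not the `@pm` implementation: the `Status` enum, the `TRANSITIONS` table and the `transition` helper are hypothetical names, while the status strings and edge descriptions are copied from the diagram.

```python
from enum import Enum


class Status(Enum):
    TODO = "Todo"
    IN_PROGRESS = "In Progress"
    DONE = "Done"


# Allowed edges and their triggers, copied from the state diagram.
TRANSITIONS = {
    (Status.TODO, Status.IN_PROGRESS): "Phase 2 workflow starts",
    (Status.IN_PROGRESS, Status.DONE): "Phase 9 run completes",
    (Status.IN_PROGRESS, Status.TODO): "workflow fails, failure handler adds comment",
}


def transition(current: Status, target: Status) -> Status:
    """Return the new status, or raise if the diagram has no such edge."""
    if (current, target) not in TRANSITIONS:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

With a table like this, an attempt to reopen a `Done` issue fails loudly instead of silently rewriting the status.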

4. Routing Matrix

Every observed (phase, signal) → action. Empty cells are gaps. Walking this table is the cheap way to spot routing issues like the recent Phase 7 mid-implementation escalation.

| Phase | Signal source | Signal | Action |
| --- | --- | --- | --- |
| 1 | Sanity checks | Bare repo / detached HEAD / missing `TODO/<ID>.md` / branch == base | Stop with error |
| 2 | `@pm` | Issue not found | Stop with error |
| 2 | `@pm` | Status is `Todo` | Flip to `In Progress`; propagate to README.md / parent's Sub-issues |
| 3 | Orchestrator | Plan drafted | Apply Dispatch Hygiene; write `plan.md`; verify `test -f` |
| 4 | `@check` + `@simplify` | Both ACCEPTABLE | Proceed to Phase 5 |
| 4 | Either reviewer | NEEDS WORK | Edit `plan.md` in place; re-dispatch (max 3 cycles) |
| 4 | `@check` | BLOCK | Edit `plan.md` addressing the finding; re-dispatch |
| 4 | Reviewers | Same finding twice | Convergence detected; stop loop early |
| 4 | Reviewers | Unresolved after 3 cycles | Document blockers in summary; proceed |
| 5 | Orchestrator | Tasks drafted | Apply Dispatch Hygiene; write each `task-N.md`; verify `test -f` for every N |
| 5.5 | `@check` | ACCEPTABLE | Proceed to Phase 6 |
| 5.5 | `@check` | NEEDS WORK | Edit `task-N.md` in place; re-dispatch (max 2 cycles) |
| 5.5 | `@check` | BLOCK | Plan doesn't decompose cleanly; back to Phase 4 |
| 6 | `@test` | TESTS_READY + `escalate_to_check: false` | Proceed to Phase 7 |
| 6 | `@test` | TESTS_READY + `escalate_to_check: true` | `@check` light review → `@test` fixes → forward |
| 6 | `@test` | NOT_TESTABLE (general) | `@check` sign-off; task goes to `@make` without tests |
| 6 | `@test` | NOT_TESTABLE: Missing testability seam | `@make` adds the seam; re-run `@test` |
| 6 | `@test` | BLOCKED | Investigate; may need spec or plan revision |
| 6 | `@test` (stub-first) | All tests pass with zero `todo!()` panics | Reject: structural-only tests; route back to `@test` to rewrite |
| 7 | `@make` | Implementation Complete | Proceed to Phase 8 |
| 7 | `@make` | Implementation Incomplete + entry-validation flag | `@check` (test diagnosis) → `@test` (fixes) → fresh `@make` |
| 7 | `@make` | Implementation Incomplete + `escalate: test_design` | Same path; max 2 escalation cycles |
| 7 | `@make` | Implementation Incomplete (no flag) | Re-dispatch with `@check` notes once; if 2nd attempt fails, treat as `escalate: test_design` |
| 7 | Escalation loop | 2 cycles exhausted | Back to Phase 3 (plan revisit) |
| 8 | `@check` + `@simplify` | ACCEPTABLE | Proceed to Phase 9 |
| 8 | `@check` | BLOCK / behavioral / production-code finding | New `@make` task spec from finding; dispatch (max 3 cycles) |
| 8 | `@check` | BLOCK / test-design / test-quality finding | `@check` → `@test` → `@make` re-verify |
| 8 | `@check` | BLOCK / plan-level finding | Back to Phase 3 with the finding |
| 8 | `@simplify` | Advisory | Record in summary's "Advisory notes (not filed)" |
| 8 | Reviewers | Strictly cosmetic finding (typo, missing newline, AST-preserving) | Orchestrator fixes directly; re-review |
| 8 | Review loop | Same finding twice | Convergence; stop loop |
| 8 | Review loop | 3 cycles exhausted | Document blockers; proceed |
| 9 | Orchestrator | Pre-existing bug, out of scope | File sub-issue via `@pm` (label: `bug`) |
| 9 | Orchestrator | Unresolved review-loop blocker | File sub-issue via `@pm` (label: `followup`) |
| 9 | `@test` (Phase 6) | NOT_TESTABLE future-seam note | File sub-issue via `@pm` (label: `tech-debt`) |
| 9 | Orchestrator | `@simplify` advisory not acted on | Record in summary; do NOT file (records, not work) |
| 9 | Orchestrator | All commits done | Set issue status to `Done`; sync README/parent; commit `chore(todo): …` |
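
One way to make "empty cells are gaps" mechanical is to treat the matrix as a lookup table that raises on a missing key. The sketch below is hypothetical: the orchestrator is a prompt file, not Python, and `ROUTES` and `route` are illustrative names. Only a handful of rows are copied in, verbatim from the table.

```python
# A few rows of the routing matrix, keyed by (phase, signal source, signal).
ROUTES = {
    ("4", "@check + @simplify", "Both ACCEPTABLE"): "Proceed to Phase 5",
    ("5.5", "@check", "ACCEPTABLE"): "Proceed to Phase 6",
    ("7", "@make", "Implementation Complete"): "Proceed to Phase 8",
    ("7", "Escalation loop", "2 cycles exhausted"): "Back to Phase 3 (plan revisit)",
    ("8", "@check", "BLOCK / plan-level finding"): "Back to Phase 3 with the finding",
}


def route(phase: str, source: str, signal: str) -> str:
    """Return the documented action, or fail loudly on a routing gap."""
    key = (phase, source, signal)
    if key not in ROUTES:
        raise LookupError(f"routing gap: no action for {key}")
    return ROUTES[key]
```

A signal without a row surfaces as a `LookupError` naming the missing (phase, source, signal) triple, which is the kind of gap this table is meant to expose.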

5. Design Decisions Log

ADR-flavoured. New decisions append at the end. If a decision is later reversed or refined, mark the original Superseded by ADR-N and add a new entry.

ADR-1 (2026-05-06) — Forge-agnostic workflow

Context: original gist used the GitHub gh CLI for auth checks and gh pr create --draft at the end of the run. Decision: workflow stops at git commit. No push, no PR/MR creation, no gh references anywhere. Alternatives: keep gh integration; abstract behind a forge-plugin interface. Consequences: workflow runs on any git host; user opens PR/MR manually on whichever forge they use. Removes the need for forge auth setup as a prerequisite.

ADR-2 (2026-05-06) — `@pm` operates on local `TODO/` folder

Context: original @pm agent used the Linear CLI. Decision: Linear-style folder-as-tracker with one <ID>.md file per issue plus a category-grouped README.md. Alternatives: keep Linear; multi-backend abstraction; single-file TODO.md. Consequences: project-local, version-controlled, no external service. Schema enforced in agents/pm.md. Initial single-file design moved to per-issue files in ADR-12.

ADR-3 (2026-05-07) — Workflow runs in worktree, not bare repo

Context: original orchestrated bare-clone → worktree creation as Phase 3 of the workflow. Decision: user creates the worktree before launching opencode; the workflow assumes CWD is the worktree. Alternatives: keep auto-worktree-creation; auto-detect bare vs. worktree. Consequences: simpler workflow; opencode CWD = worktree, so subagents inherit the right project root naturally; less plumbing around WORKTREE_PATH. (Subagents still get absolute paths in dispatch prompts; see ADR-5.)

ADR-4 (2026-05-07) — `@make` and `@test` are polyglot

Context: original was Python-only via uv. Decision: detect toolchain from marker files (pyproject.toml, Cargo.toml, flake.nix); wrap all toolchain commands in nix develop -c if a devshell is present. Alternatives: per-language agents; keep Python-only. Consequences: one agent per role serves multiple languages. Permission allowlists expanded for cargo and nix develop -c. Bash sandbox still denies shell escapes inside the wrapper.
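
A minimal sketch of marker-file detection and devshell wrapping, under stated assumptions: the marker file names and the `nix develop -c` prefix come from this ADR, while the function names, the toolchain labels, and the choice to treat the presence of `flake.nix` as "a devshell is present" are illustrative guesses, not the agents' actual logic.

```python
from pathlib import Path

# Marker file -> toolchain label. The labels are illustrative only.
MARKERS = {
    "pyproject.toml": "python",
    "Cargo.toml": "rust",
}


def detect_toolchains(root: Path) -> list[str]:
    """List the toolchains whose marker file exists directly under root."""
    return sorted(name for marker, name in MARKERS.items() if (root / marker).is_file())


def wrap_command(root: Path, argv: list[str]) -> list[str]:
    """Prefix argv with `nix develop -c` when root contains a flake.nix.

    Assumption: a flake.nix at the project root implies a usable devshell.
    """
    if (root / "flake.nix").is_file():
        return ["nix", "develop", "-c", *argv]
    return list(argv)
```

In this sketch, a Rust project with a flake would run `cargo check` as `nix develop -c cargo check`, and the same project without a flake would run it unwrapped.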

ADR-5 (2026-05-07) — Subagent CWD via absolute paths

Context: opencode subagents do not inherit the orchestrator's cd. A @check dispatched from inside a worktree resolved relative paths against the parent project root and failed with "file not found." Decision: capture WORKTREE_PATH in Phase 1 and pass absolute paths to every subagent dispatch. Alternatives: patch opencode (out of scope); symlink dance. Consequences: every dispatch has an explicit Worktree: <abs path> header convention. Verbose but reliable. Eventually superseded by run-artifact paths under $RUN_DIR (ADR-6).

ADR-6 (2026-05-08) — Run artifacts on disk in `.workflow/run-<ID>/`

Context: the orchestrator was paraphrasing the plan and task specs into each dispatch prompt. Result: @check and @simplify could see slightly different versions of the same plan; mid-loop revisions could leak as "actually let me reconsider…" passages; long specs ate context budget on every dispatch. Decision: orchestrator writes plan.md (Phase 3), task-N.md (Phase 5), and summary.md (Phase 9) to $WORKTREE_PATH/.workflow/run-<ISSUE_ID>/. Dispatches name files by absolute path; subagents read them. Alternatives: inline prompts (status quo); database; in-memory orchestrator state. Consequences: byte-for-byte source of truth across dispatches. Mid-loop revisions edit the file in place; every subsequent reader sees the new version. Run-artifact directory is gitignored (.workflow/).
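
The path layout can be sketched as follows. The directory scheme, the artifact file names and the `Worktree:` header come from this log; the helper names and the `Read:` line are hypothetical and only show how a dispatch might name files by absolute path.

```python
from pathlib import Path


def run_dir(worktree: Path, issue_id: str) -> Path:
    """Absolute form of $WORKTREE_PATH/.workflow/run-<ISSUE_ID>/."""
    return worktree.resolve() / ".workflow" / f"run-{issue_id}"


def artifact_paths(worktree: Path, issue_id: str, task_count: int) -> dict[str, Path]:
    """Map artifact names to the files the orchestrator writes for one run."""
    base = run_dir(worktree, issue_id)
    paths = {"plan": base / "plan.md", "summary": base / "summary.md"}
    for n in range(1, task_count + 1):
        paths[f"task-{n}"] = base / f"task-{n}.md"
    return paths


def dispatch_header(worktree: Path, files: list[Path]) -> str:
    """Build a header that names the worktree and the files to read.

    The "Read:" line format is an assumption made for this sketch.
    """
    lines = [f"Worktree: {worktree.resolve()}"]
    lines += [f"Read: {path}" for path in files]
    return "\n".join(lines)
```

Because every reviewer receives the same absolute path rather than a paraphrase, an in-place edit to `plan.md` is automatically what the next dispatch reads.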

ADR-7 (2026-05-08) — Stub-first Rust TDD (mandatory for new symbols)

Context: Rust integration tests reference symbols imported from lib.rs. If those symbols don't exist yet, the test crate fails to compile — a build-error RED with no stack trace and no assertion diagnostics. Same for module tests against not-yet-existing functions. Decision: for any Rust task that introduces new symbols, dispatch a stub-pass @make first (writes todo!()-bodied stubs, runs cargo check only). Then @test runs against compiling stubs; runtime panic on todo!() is the clean RED. Then @make body pass replaces stubs. Alternatives: accept compile-error RED; let @make write tests + bodies in one pass; allow @test to add stubs to production source. Consequences: two atomic commits per affected task (feat: scaffold X with todo!() stubs, then feat: implement X). Stub-pass scope is tight: bodies are exactly todo!(), signatures must match the planned final API. Phase 6 also adds a mandatory panic-coverage check after @test: every test must panic on todo!() to prove it actually exercises the stubbed symbols (catches structural-only tests).
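
The panic-coverage check can be pictured as a filter over per-test results from the run against stubs. This is a sketch under assumptions: `StubRunOutcome` and `structural_only` are hypothetical names, and how the orchestrator actually collects per-test output is not described here. The marker string is the message Rust's `todo!()` panics with.

```python
from dataclasses import dataclass

# Rust's todo!() panics with this message.
TODO_PANIC_MARKER = "not yet implemented"


@dataclass
class StubRunOutcome:
    name: str
    passed: bool
    output: str  # captured failure text for this test (hypothetical field)


def structural_only(outcomes: list[StubRunOutcome]) -> list[str]:
    """Names of tests that did not panic on todo!() while running against stubs.

    A test that passes against todo!()-bodied stubs, or fails for another
    reason, has not shown that it exercises the stubbed symbols.
    """
    return [
        o.name
        for o in outcomes
        if o.passed or TODO_PANIC_MARKER not in o.output
    ]
```

Any name returned here would be routed back to `@test` for a rewrite before the body pass starts.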

ADR-8 (2026-05-08) — `@test` may write inside `#[cfg(test)] mod` blocks

Context: Rust unit tests live colocated in production source files inside #[cfg(test)] mod tests { … } blocks — the canonical idiom, not an edge case. Original @test File Constraint forbade src/ writes entirely, which forced @make to write both production code and tests in a single dispatch. This lost the RED→GREEN separation that TDD relies on. Decision: @test may modify src/**/*.rs strictly inside #[cfg(test)] mod <name> { … } blocks. Every line outside such a block stays read-only. Alternatives: keep the restriction; write all unit-level tests as integration tests. Consequences: TDD works for module tests as well as integration tests. The previous Phase 6 file gate (path-based git status snapshot diff) is removed — with @test now legitimately writing inside src/, a path-based gate proves nothing. Constraint is now enforced by the prompt rule, the diff being human-reviewable, and @check flagging production-code drift in Phase 8.

ADR-9 (2026-05-08) — Phase 5.5 task-split review by `@check`

Context: ppries' README mentioned @check reviewing the task split for completeness, but the gist's workflow.md never implemented it. Without a split-review gate, an over- or under-split task surfaced only at Phase 8 final review — after expensive @test and @make dispatches had already run on a broken split. Decision: new Phase 5.5 dispatches @check against plan.md + every task-N.md to evaluate the split against five questions: coverage, no overlap, single-purpose, integration contracts, testable AC. Max 2 cycles; BLOCK routes back to Phase 4 (plan itself doesn't decompose). Alternatives: status quo (catch at Phase 8); orchestrator self-check. Consequences: one extra @check dispatch per run. @simplify is not involved at this phase — split review is structural, not complexity. Cheaper failure modes for over-/under-split tasks.

ADR-10 (2026-05-08) — `@pm` is single-mode (filesystem only)

Context: @pm had two read modes — git show <ref>:TODO.md (read-only) and filesystem (read/write). Git-ref mode existed for the bare-repo flow that ADR-3 retired. After ADR-3, the workflow always used filesystem mode; git-ref mode was dead weight that still added bash permissions and doc surface. Decision: remove git-ref mode. @pm has no bash access. Ad-hoc historical reads (git show main:TODO/GAL-39.md) are out of scope — the user runs them directly. Alternatives: keep dual-mode; document the separation more clearly. Consequences: simpler agent. One less permission allowlist to maintain. Workflow's "(live filesystem mode)" qualifier dropped from Phase 2 / Phase 9 / Failure handler.

ADR-11 (2026-05-08) — Phase 9 files follow-ups as TODO sub-issues

Context: unresolved items (pre-existing bugs out of scope, blocked review findings, future-seam notes) were recorded only in summary.md — per-run, untracked, overwritten on the next run, read by nobody since the user has walked away. Decision: Phase 9 has a ### File Follow-ups step that dispatches @pm to create new TODO sub-issues for tracked-worthy items. Each new issue has parent: <ISSUE_ID>, status Todo, and an appropriate label (bug / followup / tech-debt). @simplify advisories that the orchestrator chose not to act on stay in the summary as records, not filed. Alternatives: leave items in summary; create as top-level issues (would need a README.md category, which can't be picked at unattended runtime). Consequences: unresolved items become tracked work. Sub-issue routing avoids the README-category problem. The follow-up files commit alongside the worked-issue update in a single chore(todo): … commit.
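
The filing rule can be sketched as a small classifier. The labels, the `parent` field and the `Todo` status follow this ADR; the kind strings, the `FollowUp` record and `file_follow_ups` are hypothetical names chosen for illustration, not the `@pm` schema.

```python
from dataclasses import dataclass

# Kind of unresolved item -> TODO label. Kinds not listed here (for example
# @simplify advisories) stay in the summary and are not filed.
LABELS = {
    "pre-existing bug": "bug",
    "unresolved review blocker": "followup",
    "future-seam note": "tech-debt",
}


@dataclass
class FollowUp:
    title: str
    parent: str
    label: str
    status: str = "Todo"


def file_follow_ups(issue_id: str, items: list[tuple[str, str]]):
    """Split (kind, title) items into sub-issues to file and summary-only notes."""
    filed, summary_only = [], []
    for kind, title in items:
        if kind in LABELS:
            filed.append(FollowUp(title=title, parent=issue_id, label=LABELS[kind]))
        else:
            summary_only.append(title)
    return filed, summary_only
```

Keeping advisories out of `filed` mirrors the "records, not work" distinction: only items with a label become tracked sub-issues.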

ADR-12 (2026-05-08) — Phase 7 mid-implementation escalation

Context: Phase 7's escalation rule was gated on @make flagging concerns during entry validation (the RED check before implementing). When @make got past entry validation, started implementing, and then ground for 2-3 attempts because the test demanded impossible production code, the orchestrator had no documented route — it would re-dispatch @make with marginal context tweaks instead of recognizing the diagnosis as test-architecture failure. Decision: split Phase 7's escalation into entry-validation and mid-implementation paths. @make reports escalate: test_design when its iteration limit is reached and the test seems to demand impossible / unreasonable code. Both paths route through @check (test diagnosis) → @test (redesign) → fresh @make dispatch. Max 2 escalation cycles before reverting to Phase 3 plan revisit. Alternatives: status quo; let @make modify test files itself. Consequences: faster recovery from test-design errors. Bounded loop prevents thrashing. @make.md Iteration Limits section gains a new red-flag class.
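
The resulting Phase 7 routing can be sketched as one decision function. The limits and routes follow this ADR and the routing matrix; the function name, argument shapes and return strings are hypothetical, since the orchestrator applies these rules from a prompt rather than from code.

```python
MAX_ESCALATION_CYCLES = 2


def route_make_result(status: str, flag, escalation_cycles: int, retried: bool) -> str:
    """Decide the next step after a @make dispatch.

    status: "complete" or "incomplete"
    flag: None, "entry_validation", or "test_design"
    escalation_cycles: escalation cycles already spent on this task
    retried: whether the single unflagged re-dispatch was already used
    """
    if status == "complete":
        return "phase_8"
    if flag is None and not retried:
        # Unflagged failure: one re-dispatch with @check notes.
        return "redispatch_make_with_check_notes"
    # Flagged, or a second unflagged failure treated as escalate: test_design.
    if escalation_cycles >= MAX_ESCALATION_CYCLES:
        return "phase_3_plan_revisit"
    return "check_then_test_then_fresh_make"
```

The bound is checked before each new escalation, so a third escalation attempt goes to plan revisit instead of starting another cycle.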

6. Open Questions / Known Gaps

When a question gets answered, move it to the Design decisions log.

Q1: Phase 5.5 review scope — does `@check` evaluate test-design soundness here?

Currently Phase 5.5 reviews the split (coverage, overlap, single-purpose, integration contracts, testable AC). It does not explicitly evaluate whether the test approach implied by each task spec is sound. That would partially overlap with Phase 4 (which has a plan-level Test Design section the reviewers evaluate). If a test-design error escapes Phase 4 and is encoded in a task spec, it surfaces at Phase 7 via the mid-impl escalation (ADR-12) — but earlier detection might be cheaper. Open: should Phase 5.5 add "test approach for each task is sound" as a sixth review question, or is that scope creep into Phase 4 territory?

Q2: How does the orchestrator handle "split heuristic violated only after attempting a task"?

Phase 5's Split Heuristic catches obvious over-/under-split cases at planning time. But sometimes a task that looked single-purpose during planning turns out to mix structural and runtime work only when @make starts implementing it. There's no documented mid-Phase-7 routing for "this task needs to be split now." Currently @make would either thrash (mid-impl escalation, ADR-12), or report the spec is ambiguous (Insufficient Context Protocol in make.md). Open: should there be a "split mid-flight" route that takes the task back to Phase 5 for re-splitting?

Q3: Phase 9 has no rollback for partial commits if it fails between sub-steps

Phase 9's order is: code commit → TODO update → file follow-ups → commit TODO changes → write summary. If the workflow crashes between code commit and TODO commit, the worktree has the code change but the issue file still says In Progress. The Failure Handler covers earlier-phase crashes but Phase-9-internal partial states aren't explicitly addressed. Open: should the Failure Handler distinguish "Phase 9 partial" and resume from the right sub-step on retry, or is leaving manual cleanup to the user good enough?

Q4: `@simplify` not involved at Phase 5.5 — is that the right call?

Phase 5.5 only dispatches @check. Rationale (ADR-9) is that split review is structural, not complexity. But @simplify's lens — "what if we deleted this?" — could legitimately catch unnecessary tasks (e.g. a third task that adds an abstraction nothing else needs). Open: is the cost of one more dispatch worth the catch?

Q5: Test-design loop bound vs plan-revisit threshold

ADR-12 sets max 2 cycles for the Phase 7 test-design escalation before reverting to Phase 3 plan revisit. The plan-review and final-review loops have max 3. Why the asymmetry? The test-design loop is more expensive per cycle (@check + @test + @make re-implement vs. just reviewers + plan edit), so 2 may be right. But the choice was made by feel, not measured. Open: is 2 the right number, or should it match Phase 4 / Phase 8 at 3?
