feat(opencode): one-task-per-run model + 9 routing fixes (ADRs 13-21)

Captures the design grilling outcome. Adds ADRs 13-21 covering:
- run-level plan_rework_remaining counter to bound P3<->P5.5/P7/P8 thrash
- non-resumable workflow with throwaway-worktree recovery procedure
- @simplify advisory at every gate (not just Phase 8)
- Phase 8 fix specs go to disk as task-fix-N.md (preserves ADR-6)
- Phase 5.5 BLOCK protocol: orchestrator edits plan, decrements counter, re-enters P4
- Phase 8 NOT_TESTABLE manifest in reviewer prompt
- unified Implementation Incomplete diagnosis (test_design / production_logic / split_needed)
- Phase 1 working-tree cleanliness + depends-on enforcement
- one-task-per-run pivot: Phase 5 still splits N tasks, only task-1 runs;
  tasks 2..N filed as sub-issues with rich seed bodies; split_needed at P7
  aborts to Failure Handler (one-task-per-run = no salvageable prior work)

Auto-resolves big-diff Phase 8 reviews, cross-task regression-within-run, and
mid-flight task-split routing. Rewrites routing matrix and three Mermaid
diagrams; updates @pm (depends-on frontmatter, split-time filing), @check
(third diagnosis verdict), @make (escalate: split_needed flag).
This commit is contained in:
Harald Hoyer 2026-05-08 13:02:54 +02:00
parent 0b15944d1c
commit af6481a5a7
5 changed files with 342 additions and 130 deletions

View file

@ -145,6 +145,14 @@ High-level:
- No excessive mocking (>2 mocks is a yellow flag)?
- Diagnose issues and report findings. Do NOT edit test files — the caller routes fixes back to `@test`.
**When diagnosing `Implementation Incomplete` from `@make`** (the `/workflow` Phase 7 unified diagnosis path, per ADR-19): you receive `@make`'s self-diagnosis hint (`escalate: test_design`, `escalate: split_needed`, or no flag), the test files, the in-progress production diff, and the task spec. Return one of three verdicts in your output:
- **`test_design`** — the test demands production code that's impossible, internally-inconsistent, or testing the wrong observable. The fix is in the tests. (Caller routes to `@test` for redesign.)
- **`production_logic`** — the test is sound; `@make`'s implementation is wrong or incomplete. The fix is in the production code. (Caller re-dispatches `@make` with your notes.)
- **`split_needed`** — the task itself is over-scoped: no realistic implementation can satisfy the AC within the task's stated files-to-modify. Either the AC require touching files not listed, or the AC mix multiple concerns that should have been split at Phase 5 (per the workflow's Split Heuristic). (Caller aborts to the Failure Handler; the user re-plans from scratch.)
State the verdict explicitly — e.g. "Diagnosis: `split_needed` — the AC implies modifying both `src/foo.rs` and the EventLoop registration in `src/main.rs`, but the task spec lists only `src/foo.rs`. This is a Phase 5 split error, not a code or test error." Calibrate confidence honestly: `split_needed` is the heaviest verdict (it kills the run); reserve it for cases where neither test redesign nor code-fix would plausibly converge.
**When reviewing NOT_TESTABLE verdicts:**
- Does the reason match an allowed category (config-only, external-system, non-deterministic, pure-wiring)?
- Was a test approach genuinely attempted?

View file

@ -273,12 +273,15 @@ If tests fail or verification doesn't pass:
3. **Code issues** — Attempt fix (max 2-3 attempts if making progress)
4. **Flaky/infra issues** — Stop and report with diagnostics
5. **Test-design suspicion** — If after 12 attempts the test seems to demand production code that contradicts the spec, asserts on internal state that shouldn't be observable, mocks an internal boundary instead of the external one, or otherwise looks like it's testing the wrong thing — **stop and report with `escalate: test_design`** in the Blocking Issue section. Do not modify the test file yourself; the caller will route to `@check` for diagnosis and `@test` for redesign per the workflow's Phase 7 escalation.
6. **Task-scope suspicion** — If after 12 attempts you find that the AC realistically require modifying files not listed in your "Files to Modify," or the AC mix multiple distinct concerns that don't fit one coherent change (e.g. a new type *and* its registration site *and* a new system using it), the task is over-scoped — **stop and report with `escalate: split_needed`** in the Blocking Issue section. State concretely which file(s) outside your modify list you'd need, or which concerns the task is mixing. Do not silently expand scope; the caller will route to `@check` for diagnosis and (per the workflow's ADR-21) the run will abort to the Failure Handler so the user can re-plan from scratch.
The `escalate:` flag is a *hint* to the caller's diagnosis routing — `@check` is the authority that confirms or rejects it. Reporting `escalate: split_needed` doesn't guarantee the run aborts; if `@check` decides the task is sound and the issue is in tests or production code, the diagnosis will route back to a normal `test_design` or `production_logic` verdict.
If still failing after 2-3 focused attempts, **stop and report**:
- What was implemented
- What's failing and why
- What you tried
- Suggested next steps (including `escalate: test_design` if the failure points at the test rather than the production code)
- Suggested next steps (with the appropriate `escalate:` flag if you have a strong hypothesis about which kind of failure this is)
Do not loop indefinitely. Better to report a clear failure than burn context.

View file

@ -48,6 +48,7 @@ title: Implement a special stage type
status: Done
parent: GAL-38
labels: [gameplay, advanced-mechanics]
depends-on: [GAL-37]
---
# GAL-39: Implement a special stage type
@ -80,6 +81,7 @@ Free-form markdown describing the problem and context. Spans as many paragraphs
- `status` — one of: `Todo`, `In Progress`, `Done`. (No other values; the old `Backlog`/`In Review`/`Cancelled` set is gone.)
- `parent` — either `null` (top-level issue) or another issue ID (e.g. `GAL-38`). Sub-issues belong to their parent's `## Sub-issues` list.
- `labels` — YAML list of strings, e.g. `[gameplay, advanced-mechanics]`. May be `[]`.
- `depends-on`*optional* YAML list of issue IDs that must reach `status: Done` before this issue can be started. Used by `/workflow`'s Phase 1 sanity check to hard-block runs whose dependencies aren't satisfied (per ADR-21). Omit the field entirely when there are no dependencies; do not write `depends-on: []`. Cycles are not detected by this agent — the caller is responsible for not creating a cycle.
**Body rules:**
- The first heading is `# <ID>: <title>` (matches frontmatter).
@ -172,9 +174,24 @@ For list output, return an array of `{id, title, status, parent, labels}` object
### When creating issues
- Default `status: Todo` unless the caller says otherwise.
- Title: short, imperative ("Add retry logic to ingest worker", not "retry stuff").
- Frontmatter must be complete: `id`, `title`, `status`, `parent`, `labels`.
- Frontmatter must be complete: `id`, `title`, `status`, `parent`, `labels`. Add `depends-on:` when the caller specifies dependencies.
- Always update the dependent index (README.md for top-level, parent file for sub-issues) so the new issue is visible.
### Split-time sub-issue creation (rich-body filings)
When the `/workflow` orchestrator dispatches you mid-run to file a sub-issue from a Phase 5.5 task split (per ADR-21), the caller passes a structured body containing more than the usual minimum. Treat the body as already-finalized — write it verbatim into the new issue file. Common sections you'll see:
- `## What to implement` — one-line + brief description.
- `## Acceptance criteria` — checkboxes; preserve `- [ ]` state (newly filed sub-issues start with all AC unchecked).
- `## Code Context` — code snippets carried over from the split-time task spec.
- `## Integration with sibling sub-issues` — narrative; the structural dependencies belong in the `depends-on:` frontmatter list, which the caller will pass alongside the body.
- `## Plan rationale` — slice of the parent's plan.
- `## Test design` — when present.
Use the rendered ordering: H1 → description (the "Discovered during run on …" attribution paragraph that ends the body counts as part of the description) → `## Sub-issues` (omit; sub-issues won't have their own children at filing time) → `## Acceptance criteria``## Integration test hints` (omit unless caller passed it) → `## Comments` (omit until first comment is appended).
Add the `split-from-run` label to the labels list when the caller specifies it, alongside any propagated parent labels.
### When updating status
- Confirm the change (e.g. "GAL-39 status: In Progress → Done").
- A status change to `Done` is only valid if all acceptance-criteria checkboxes (when the section exists) are checked. If they are not, report which ones remain and ask for confirmation before forcing the change.