feat(opencode): one-task-per-run model + 9 routing fixes (ADRs 13-21)

Captures the design grilling outcome. Adds ADRs 13-21 covering:

- run-level plan_rework_remaining counter to bound P3<->P5.5/P7/P8 thrash
- non-resumable workflow with throwaway-worktree recovery procedure
- @simplify advisory at every gate (not just Phase 8)
- Phase 8 fix specs go to disk as task-fix-N.md (preserves ADR-6)
- Phase 5.5 BLOCK protocol: orchestrator edits plan, decrements counter, re-enters P4
- Phase 8 NOT_TESTABLE manifest in reviewer prompt
- unified Implementation Incomplete diagnosis (test_design / production_logic / split_needed)
- Phase 1 working-tree cleanliness + depends-on enforcement
- one-task-per-run pivot: Phase 5 still splits N tasks, only task-1 runs; tasks 2..N filed as sub-issues with rich seed bodies; split_needed at P7 aborts to Failure Handler (one-task-per-run = no salvageable prior work)

Auto-resolves big-diff Phase 8 reviews, cross-task regression-within-run, and mid-flight task-split routing. Rewrites routing matrix and three Mermaid diagrams; updates @pm (depends-on frontmatter, split-time filing), @check (third diagnosis verdict), @make (escalate: split_needed flag).
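A minimal Python sketch of the run-level rework budget described above. It assumes hypothetical phase labels (`phase_4`, `failure_handler`), a hypothetical `RunState` class, and a caller-chosen starting budget; none of these names come from the workflow itself. It models only the Phase 5.5 BLOCK path. Routing to the Failure Handler once the budget is spent is also an assumption, since the commit message says only that the counter bounds the thrash.

```python
from dataclasses import dataclass

FAILURE_HANDLER = "failure_handler"  # hypothetical phase label
PHASE_4 = "phase_4"                  # hypothetical phase label


@dataclass
class RunState:
    # One budget per run, shared by every loop back into planning
    # (P5.5, P7 and P8). The starting value is chosen by the caller (assumption).
    plan_rework_remaining: int


def spend_plan_rework(state: RunState) -> bool:
    """Consume one unit of rework budget; return False if it is already exhausted."""
    if state.plan_rework_remaining <= 0:
        return False
    state.plan_rework_remaining -= 1
    return True


def on_phase_5_5_block(state: RunState) -> str:
    """Phase 5.5 BLOCK: edit the plan, decrement the counter, re-enter P4.

    Assumption: an exhausted budget routes to the Failure Handler instead.
    """
    if not spend_plan_rework(state):
        return FAILURE_HANDLER
    # The orchestrator's plan edit would happen here; it is out of scope for this sketch.
    return PHASE_4
```

With a budget of 2, the first two BLOCKs re-enter P4 and the third stops the run. Because the counter lives on the run rather than on a phase, loops from P7 and P8 would call the same `spend_plan_rework` and draw down the same budget.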
2026-05-08 13:02:54 +02:00 · 2026-05-08 13:02:54 +02:00 · af6481a5a7
commit af6481a5a7
parent 0b15944d1c
5 changed files with 342 additions and 130 deletions
--- a/config/opencode/agents/make.md
+++ b/config/opencode/agents/make.md
@@ -273,12 +273,15 @@ If tests fail or verification doesn't pass:
 3. **Code issues** — Attempt fix (max 2-3 attempts if making progress)
 4. **Flaky/infra issues** — Stop and report with diagnostics
 5. **Test-design suspicion** — If after 1–2 attempts the test seems to demand production code that contradicts the spec, asserts on internal state that shouldn't be observable, mocks an internal boundary instead of the external one, or otherwise looks like it's testing the wrong thing — **stop and report with `escalate: test_design`** in the Blocking Issue section. Do not modify the test file yourself; the caller will route to `@check` for diagnosis and `@test` for redesign per the workflow's Phase 7 escalation.
+6. **Task-scope suspicion** — If after 1–2 attempts you find that the AC realistically require modifying files not listed in your "Files to Modify," or the AC mix multiple distinct concerns that don't fit one coherent change (e.g. a new type *and* its registration site *and* a new system using it), the task is over-scoped — **stop and report with `escalate: split_needed`** in the Blocking Issue section. State concretely which file(s) outside your modify list you'd need, or which concerns the task is mixing. Do not silently expand scope; the caller will route to `@check` for diagnosis and (per the workflow's ADR-21) the run will abort to the Failure Handler so the user can re-plan from scratch.
+
+The `escalate:` flag is a *hint* to the caller's diagnosis routing — `@check` is the authority that confirms or rejects it. Reporting `escalate: split_needed` doesn't guarantee the run aborts; if `@check` decides the task is sound and the issue is in tests or production code, the diagnosis will route back to a normal `test_design` or `production_logic` verdict.

 If still failing after 2-3 focused attempts, **stop and report**:
 - What was implemented
 - What's failing and why
 - What you tried
- Suggested next steps (including `escalate: test_design` if the failure points at the test rather than the production code)
+- Suggested next steps (with the appropriate `escalate:` flag if you have a strong hypothesis about which kind of failure this is)

 Do not loop indefinitely. Better to report a clear failure than burn context.
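The escalation rules in this hunk separate a *hint* from a *verdict*. The following Python sketch illustrates that split using hypothetical names (`parse_escalate_hint`, `route_after_diagnosis`, `ROUTES`) that are not part of opencode. @make's `escalate:` flag is pulled from its failure report and passed to @check, but only @check's verdict selects the next step. Routing `test_design` to @test and `split_needed` to the Failure Handler follows the text above. Sending `production_logic` back to @make is an assumption.

```python
import re
from typing import Optional

VERDICTS = {"test_design", "production_logic", "split_needed"}

# Hypothetical routing table keyed by @check's verdict.
ROUTES = {
    "test_design": "@test",             # test redesign (Phase 7 escalation)
    "production_logic": "@make",        # assumption: retry the implementation
    "split_needed": "failure_handler",  # ADR-21: abort; user re-plans from scratch
}

_ESCALATE_RE = re.compile(r"escalate:\s*([a-z_]+)")


def parse_escalate_hint(report: str) -> Optional[str]:
    """Return the first recognised escalate: flag in a @make report, else None."""
    match = _ESCALATE_RE.search(report)
    if match and match.group(1) in VERDICTS:
        return match.group(1)
    return None


def route_after_diagnosis(check_verdict: str) -> str:
    """Pick the next step from @check's verdict; @make's hint plays no part here."""
    if check_verdict not in VERDICTS:
        raise ValueError(f"unknown verdict: {check_verdict}")
    return ROUTES[check_verdict]
```

For example, a report flagged `escalate: split_needed` that @check diagnoses as `test_design` goes to @test, not to the Failure Handler. The hint only shapes what @check is asked to examine.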