feat(opencode): one-task-per-run model + 9 routing fixes (ADRs 13-21)

Captures the design grilling outcome. Adds ADRs 13-21 covering:
- run-level plan_rework_remaining counter to bound P3<->P5.5/P7/P8 thrash
- non-resumable workflow with throwaway-worktree recovery procedure
- @simplify advisory at every gate (not just Phase 8)
- Phase 8 fix specs go to disk as task-fix-N.md (preserves ADR-6)
- Phase 5.5 BLOCK protocol: orchestrator edits plan, decrements counter, re-enters P4
- Phase 8 NOT_TESTABLE manifest in reviewer prompt
- unified Implementation Incomplete diagnosis (test_design / production_logic / split_needed)
- Phase 1 working-tree cleanliness + depends-on enforcement
- one-task-per-run pivot: Phase 5 still splits N tasks, only task-1 runs;
  tasks 2..N filed as sub-issues with rich seed bodies; split_needed at P7
  aborts to Failure Handler (one-task-per-run = no salvageable prior work)

Auto-resolves big-diff Phase 8 reviews, cross-task regression-within-run, and
mid-flight task-split routing. Rewrites routing matrix and three Mermaid
diagrams; updates @pm (depends-on frontmatter, split-time filing), @check
(third diagnosis verdict), @make (escalate: split_needed flag).
This commit is contained in:
Harald Hoyer 2026-05-08 13:02:54 +02:00
parent 0b15944d1c
commit af6481a5a7
5 changed files with 342 additions and 130 deletions

View file

@ -145,6 +145,14 @@ High-level:
- No excessive mocking (>2 mocks is a yellow flag)?
- Diagnose issues and report findings. Do NOT edit test files — the caller routes fixes back to `@test`.
**When diagnosing `Implementation Incomplete` from `@make`** (the `/workflow` Phase 7 unified diagnosis path, per ADR-19): you receive `@make`'s self-diagnosis hint (`escalate: test_design`, `escalate: split_needed`, or no flag), the test files, the in-progress production diff, and the task spec. Return one of three verdicts in your output:
- **`test_design`** — the test demands production code that's impossible, internally-inconsistent, or testing the wrong observable. The fix is in the tests. (Caller routes to `@test` for redesign.)
- **`production_logic`** — the test is sound; `@make`'s implementation is wrong or incomplete. The fix is in the production code. (Caller re-dispatches `@make` with your notes.)
- **`split_needed`** — the task itself is over-scoped: no realistic implementation can satisfy the AC within the task's stated files-to-modify. Either the AC require touching files not listed, or the AC mix multiple concerns that should have been split at Phase 5 (per the workflow's Split Heuristic). (Caller aborts to the Failure Handler; the user re-plans from scratch.)
State the verdict explicitly — e.g. "Diagnosis: `split_needed` — the AC implies modifying both `src/foo.rs` and the EventLoop registration in `src/main.rs`, but the task spec lists only `src/foo.rs`. This is a Phase 5 split error, not a code or test error." Calibrate confidence honestly: `split_needed` is the heaviest verdict (it kills the run); reserve it for cases where neither test redesign nor code-fix would plausibly converge.
**When reviewing NOT_TESTABLE verdicts:**
- Does the reason match an allowed category (config-only, external-system, non-deterministic, pure-wiring)?
- Was a test approach genuinely attempted?