feat(opencode): extend Phase 7 escalation to mid-implementation test-design errors

Phase 7's escalation rule was gated on @make flagging concerns "during
entry validation" only. When @make got past entry validation, started
implementing, and ground for 2-3 attempts because the test demanded
impossible production code, the orchestrator had no documented route
— it would re-dispatch @make with marginal context tweaks instead of
recognizing the failure as test-architecture.

Splits the escalation into two clearly-named paths (entry-validation
vs mid-implementation) that both route through @check (test diagnosis)
→ @test (redesign) → fresh @make. Bounded at max 2 escalation cycles
before reverting to a Phase 3 plan revisit, to prevent thrashing when
the actual problem is upstream.

@make.md gains a new Iteration Limits red-flag class — "Test-design
suspicion" — instructing @make to stop and report with an explicit
`escalate: test_design` flag in the Blocking Issue section. The flag
is the routing signal the orchestrator switches on.
This commit is contained in:
Harald Hoyer 2026-05-08 10:13:29 +02:00
parent aac4d44a49
commit 534361f1b5
2 changed files with 16 additions and 7 deletions

View file

@ -272,12 +272,13 @@ If tests fail or verification doesn't pass:
2. **Context/spec issues** — Stop immediately and report; don't guess
3. **Code issues** — Attempt fix (max 2-3 attempts if making progress)
4. **Flaky/infra issues** — Stop and report with diagnostics
5. **Test-design suspicion** — If after 12 attempts the test seems to demand production code that contradicts the spec, asserts on internal state that shouldn't be observable, mocks an internal boundary instead of the external one, or otherwise looks like it's testing the wrong thing — **stop and report with `escalate: test_design`** in the Blocking Issue section. Do not modify the test file yourself; the caller will route to `@check` for diagnosis and `@test` for redesign per the workflow's Phase 7 escalation.
If still failing after 2-3 focused attempts, **stop and report**:
- What was implemented
- What's failing and why
- What you tried
- Suggested next steps
- Suggested next steps (including `escalate: test_design` if the failure points at the test rather than the production code)
Do not loop indefinitely. Better to report a clear failure than burn context.

View file

@ -406,12 +406,20 @@ Do **not** quote the task spec inline.
4. Refactor while keeping green
5. Report RED→GREEN evidence
**Escalation:** If `@make` flags test quality concerns during entry validation:
1. `@make` reports the issue to caller
2. Caller routes to `@check` for diagnosis
3. `@check` reports findings
4. Caller routes to `@test` for fixes
5. Fixed tests return to `@make`
**Escalation — two paths route through `@check``@test` → back to `@make`:**
1. **Entry-validation escalation.** Before implementing, `@make`'s entry check (run tests, verify RED, compare against handoff) reveals test-quality concerns — wrong assertion target, mixed failure codes, mocks of internal boundaries, etc. `@make` reports without writing any production code.
2. **Mid-implementation escalation.** After implementing, `@make` hits its iteration limit (23 attempts) because the test demands production code that's impossible or contradicts the spec. `@make` returns `Implementation Incomplete` with the flag `escalate: test_design`. **Do not** re-dispatch `@make` with marginal context tweaks — that just burns cycles on a test that needs redesign, not better implementation.
In both cases:
1. `@make` returns its report (entry-time concern or mid-impl `escalate: test_design`).
2. Orchestrator routes the report to `@check` for diagnosis (light review of the *tests*, not the implementation).
3. `@check` confirms or rejects the test-design suspicion.
4. **If confirmed:** orchestrator routes to `@test` to redesign the tests. Apply Dispatch Hygiene. Fixed tests return to `@make` for fresh entry validation and a clean implementation attempt.
5. **If rejected:** the issue is in the production code; orchestrator re-dispatches `@make` with `@check`'s diagnostic notes attached.
**Iteration limit on this loop: max 2 cycles.** If a test-design suspicion keeps surfacing but `@check` never confirms it, the design problem is upstream — revisit the Phase 3 plan rather than thrashing between `@test` and `@make`.
For NOT_TESTABLE tasks, `@make` runs in standard mode.