feat(opencode): extend Phase 7 escalation to mid-implementation test-design errors

Phase 7's escalation rule was gated on @make flagging concerns "during
entry validation" only. When @make got past entry validation, started
implementing, and ground for 2-3 attempts because the test demanded
impossible production code, the orchestrator had no documented route
— it would re-dispatch @make with marginal context tweaks instead of
recognizing the failure as test-architecture.

Splits the escalation into two clearly-named paths (entry-validation
vs mid-implementation) that both route through @check (test diagnosis)
→ @test (redesign) → fresh @make. Bounded at max 2 escalation cycles
before reverting to a Phase 3 plan revisit, to prevent thrashing when
the actual problem is upstream.

@make.md gains a new Iteration Limits red-flag class — "Test-design
suspicion" — instructing @make to stop and report with an explicit
`escalate: test_design` flag in the Blocking Issue section. The flag
is the routing signal the orchestrator switches on.
This commit is contained in:
Harald Hoyer 2026-05-08 10:13:29 +02:00
parent aac4d44a49
commit 534361f1b5
2 changed files with 16 additions and 7 deletions

View file

@ -272,12 +272,13 @@ If tests fail or verification doesn't pass:
2. **Context/spec issues** — Stop immediately and report; don't guess
3. **Code issues** — Attempt fix (max 2-3 attempts if making progress)
4. **Flaky/infra issues** — Stop and report with diagnostics
5. **Test-design suspicion** — If after 12 attempts the test seems to demand production code that contradicts the spec, asserts on internal state that shouldn't be observable, mocks an internal boundary instead of the external one, or otherwise looks like it's testing the wrong thing — **stop and report with `escalate: test_design`** in the Blocking Issue section. Do not modify the test file yourself; the caller will route to `@check` for diagnosis and `@test` for redesign per the workflow's Phase 7 escalation.
If still failing after 2-3 focused attempts, **stop and report**:
- What was implemented
- What's failing and why
- What you tried
- Suggested next steps
- Suggested next steps (including `escalate: test_design` if the failure points at the test rather than the production code)
Do not loop indefinitely. Better to report a clear failure than burn context.