diff --git a/config/opencode/agents/make.md b/config/opencode/agents/make.md
index 67802fb..5193ef2 100644
--- a/config/opencode/agents/make.md
+++ b/config/opencode/agents/make.md
@@ -272,12 +272,13 @@ If tests fail or verification doesn't pass:
 2. **Context/spec issues** — Stop immediately and report; don't guess
 3. **Code issues** — Attempt fix (max 2-3 attempts if making progress)
 4. **Flaky/infra issues** — Stop and report with diagnostics
+5. **Test-design suspicion** — If after 1–2 attempts the test seems to demand production code that contradicts the spec, asserts on internal state that shouldn't be observable, mocks an internal boundary instead of the external one, or otherwise looks like it's testing the wrong thing — **stop and report with `escalate: test_design`** in the Blocking Issue section. Do not modify the test file yourself; the caller will route to `@check` for diagnosis and `@test` for redesign per the workflow's Phase 7 escalation.
 
 If still failing after 2-3 focused attempts, **stop and report**:
 - What was implemented
 - What's failing and why
 - What you tried
-- Suggested next steps
+- Suggested next steps (including `escalate: test_design` if the failure points at the test rather than the production code)
 
 Do not loop indefinitely. Better to report a clear failure than burn context.
 
diff --git a/config/opencode/commands/workflow.md b/config/opencode/commands/workflow.md
index 34ca336..c6304c1 100644
--- a/config/opencode/commands/workflow.md
+++ b/config/opencode/commands/workflow.md
@@ -406,12 +406,20 @@ Do **not** quote the task spec inline.
 4. Refactor while keeping green
 5. Report RED→GREEN evidence
 
-**Escalation:** If `@make` flags test quality concerns during entry validation:
-1. `@make` reports the issue to caller
-2. Caller routes to `@check` for diagnosis
-3. `@check` reports findings
-4. Caller routes to `@test` for fixes
-5. Fixed tests return to `@make`
+**Escalation — two paths route through `@check` → `@test` → back to `@make`:**
+
+1. **Entry-validation escalation.** Before implementing, `@make`'s entry check (run tests, verify RED, compare against handoff) reveals test-quality concerns — wrong assertion target, mixed failure codes, mocks of internal boundaries, etc. `@make` reports without writing any production code.
+2. **Mid-implementation escalation.** After implementing, `@make` hits its iteration limit (2–3 attempts) because the test demands production code that's impossible or contradicts the spec. `@make` returns `Implementation Incomplete` with the flag `escalate: test_design`. **Do not** re-dispatch `@make` with marginal context tweaks — that just burns cycles on a test that needs redesign, not better implementation.
+
+In both cases:
+
+1. `@make` returns its report (entry-time concern or mid-impl `escalate: test_design`).
+2. Orchestrator routes the report to `@check` for diagnosis (light review of the *tests*, not the implementation).
+3. `@check` confirms or rejects the test-design suspicion.
+4. **If confirmed:** orchestrator routes to `@test` to redesign the tests. Apply Dispatch Hygiene. Fixed tests return to `@make` for fresh entry validation and a clean implementation attempt.
+5. **If rejected:** the issue is in the production code; orchestrator re-dispatches `@make` with `@check`'s diagnostic notes attached.
+
+**Iteration limit on this loop: max 2 cycles.** If a test-design suspicion keeps surfacing but `@check` never confirms it, the design problem is upstream — revisit the Phase 3 plan rather than thrashing between `@test` and `@make`.
 
 For NOT_TESTABLE tasks, `@make` runs in standard mode.