feat(opencode): one-task-per-run model + 9 routing fixes (ADRs 13-21)
Captures the design grilling outcome. Adds ADRs 13-21 covering:
- run-level plan_rework_remaining counter to bound P3<->P5.5/P7/P8 thrash
- non-resumable workflow with throwaway-worktree recovery procedure
- @simplify advisory at every gate (not just Phase 8)
- Phase 8 fix specs go to disk as task-fix-N.md (preserves ADR-6)
- Phase 5.5 BLOCK protocol: orchestrator edits plan, decrements counter, re-enters P4
- Phase 8 NOT_TESTABLE manifest in reviewer prompt
- unified Implementation Incomplete diagnosis (test_design / production_logic / split_needed)
- Phase 1 working-tree cleanliness + depends-on enforcement
- one-task-per-run pivot: Phase 5 still splits N tasks, only task-1 runs; tasks 2..N filed as sub-issues with rich seed bodies; split_needed at P7 aborts to Failure Handler (one-task-per-run = no salvageable prior work)

Auto-resolves big-diff Phase 8 reviews, cross-task regression-within-run, and mid-flight task-split routing. Rewrites routing matrix and three Mermaid diagrams; updates @pm (depends-on frontmatter, split-time filing), @check (third diagnosis verdict), @make (escalate: split_needed flag).
parent 0b15944d1c
commit af6481a5a7
5 changed files with 342 additions and 130 deletions
|
|
@ -86,7 +86,10 @@ Define `RUN_DIR="$WORKTREE_PATH/.workflow/run-$ISSUE_ID"` once in Phase 1 and re
|
|||
- Else if `git rev-parse --verify --quiet master` succeeds, use `master`.
|
||||
- Else stop: "Could not determine base branch (no `main` or `master`). Pass it as the second argument: `/workflow <ISSUE-ID> <base-branch>`."
|
||||
7. Verify the current branch is not the base branch: if `BRANCH_NAME == BASE_BRANCH`, stop: "Cannot run workflow on the base branch (`$BASE_BRANCH`). Switch to a feature branch first."
|
||||
8. Set the run-artifacts directory: `RUN_DIR="$WORKTREE_PATH/.workflow/run-$ISSUE_ID"`. Phase 3 will `mkdir -p "$RUN_DIR"` before writing the first artifact.
|
||||
8. **Verify the working tree is clean** (ADR-20): `git status --porcelain` must return empty. If not, stop: "Working tree must be clean. Commit or stash uncommitted changes before running the workflow."
|
||||
9. **Check `depends-on:` declarations** (ADR-21): if `./TODO/$ISSUE_ID.md`'s frontmatter contains a `depends-on: [<ID>, ...]` list, verify every listed dependency's status is `Done` (read each `./TODO/<DEP-ID>.md`). If any dependency is not `Done`, stop: "Cannot start `$ISSUE_ID`; it depends on `<DEP-ID>` (status: `<status>`). Complete dependencies first." If a listed dependency file does not exist, stop with: "Cannot start `$ISSUE_ID`; declared dependency `<DEP-ID>` has no issue file." If the field is absent, proceed.
|
||||
10. Set the run-artifacts directory: `RUN_DIR="$WORKTREE_PATH/.workflow/run-$ISSUE_ID"`. Phase 3 will `mkdir -p "$RUN_DIR"` before writing the first artifact.
|
||||
11. Initialize the run-level rework counter: `PLAN_REWORK_REMAINING=1` (per ADR-13). Decrement on every P5.5-BLOCK→P4, P7-escalation-exhaustion→P3, and P8-plan-level→P3 transition. When the counter is `0` and another such transition fires, abort to the Failure Handler instead of re-entering.
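The dependency gate in step 9 can be sketched as below. This is a minimal illustration only, assuming the frontmatter has already been parsed into plain Python values; `check_dependencies` and the `statuses` mapping are hypothetical names chosen for this sketch, not part of the workflow.

```python
from typing import Mapping, Optional, Sequence


def check_dependencies(
    issue_id: str,
    depends_on: Sequence[str],
    statuses: Mapping[str, str],
) -> Optional[str]:
    """Return a stop message for the first unmet dependency, or None to proceed.

    `statuses` maps a dependency ID to the status read from its issue file.
    An ID missing from the mapping stands for a missing issue file
    (a representation assumed for this sketch).
    """
    for dep_id in depends_on:
        if dep_id not in statuses:
            return (
                f"Cannot start `{issue_id}`; declared dependency "
                f"`{dep_id}` has no issue file."
            )
        status = statuses[dep_id]
        if status != "Done":
            return (
                f"Cannot start `{issue_id}`; it depends on `{dep_id}` "
                f"(status: `{status}`). Complete dependencies first."
            )
    # An empty or absent depends-on list falls through to here.
    return None
```

An empty `depends_on` list returns `None`, matching the "if the field is absent, proceed" rule.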
|
||||
|
||||
---
|
||||
|
||||
|
|
@ -196,24 +199,23 @@ Reviewers should evaluate testability:
|
|||
- `@check`: Is the design testable? Are the right behaviors identified? (Review Framework §8)
|
||||
- `@simplify`: Is the test scope appropriate? Over-testing proposed?
|
||||
|
||||
**Merge rules:**
|
||||
- `@check` safety/correctness findings are hard constraints
|
||||
- If `@simplify` recommends removing something `@check` flags as needed, `@check` wins
|
||||
- Note conflicts explicitly
|
||||
**Verdict authority** (ADR-15): `@check` is the only blocking reviewer. Its `NEEDS WORK` and `BLOCK` verdicts gate progression; its `ACCEPTABLE` verdict permits proceeding regardless of what `@simplify` reports. `@simplify` is advisory at every gate — its findings are recorded in the run summary's "Advisory notes (not filed)" section but never trigger a re-dispatch loop. The user may manually elevate a `@simplify` finding into a `task-fix-<N>.md` if it warrants follow-up.
|
||||
|
||||
**Review loop (max 3 cycles):**
|
||||
**Review loop (max 3 cycles, gated on `@check`):**
|
||||
1. Dispatch both reviewers against `$RUN_DIR/plan.md`.
|
||||
2. Merge findings
|
||||
3. If verdict is ACCEPTABLE from both (or JUSTIFIED COMPLEXITY from `@simplify`): proceed to Phase 5
|
||||
4. If BLOCK or NEEDS WORK: edit `$RUN_DIR/plan.md` in place addressing findings (re-apply Dispatch Hygiene to the updated file), then re-review.
|
||||
5. **Convergence detection:** if reviewers return the same findings as the previous cycle, stop the loop early
|
||||
6. If still unresolved after 3 cycles: note unresolved blockers and proceed anyway (they will be documented in the workflow summary and commit message)
|
||||
2. Merge findings: record `@simplify`'s output for the summary; act only on `@check`'s verdict.
|
||||
3. If `@check` returns ACCEPTABLE: proceed to Phase 5 (regardless of `@simplify`).
|
||||
4. If `@check` returns BLOCK or NEEDS WORK: edit `$RUN_DIR/plan.md` in place addressing the findings (re-apply Dispatch Hygiene to the updated file), then re-review.
|
||||
5. **Convergence detection:** if `@check` returns the same findings as the previous cycle, stop the loop early.
|
||||
6. If still unresolved after 3 cycles: note unresolved blockers and proceed anyway (they will be documented in the workflow summary and commit message).
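One way to picture the `@check`-gated loop is the sketch below. It assumes the reviewer dispatch and the plan edit are supplied as callables; `run_plan_review`, `dispatch_check`, and `revise_plan` are hypothetical names, and skipping the edit after the final cycle is a choice made for this sketch rather than something the workflow prescribes.

```python
from typing import Callable, List, Tuple

Verdict = Tuple[str, List[str]]  # (verdict, findings) as reported by @check


def run_plan_review(
    dispatch_check: Callable[[], Verdict],
    revise_plan: Callable[[List[str]], None],
    max_cycles: int = 3,
) -> Tuple[str, List[str]]:
    """Return ("accepted", []) or ("unresolved", findings).

    Only the @check verdict is consulted. @simplify output would be recorded
    for the summary elsewhere and never influences this loop.
    """
    previous: List[str] = []
    findings: List[str] = []
    for cycle in range(1, max_cycles + 1):
        verdict, findings = dispatch_check()
        if verdict == "ACCEPTABLE":
            return "accepted", []
        if cycle > 1 and findings == previous:
            break  # convergence: same findings as the previous cycle
        if cycle < max_cycles:
            revise_plan(findings)  # edit plan.md in place, then re-review
        previous = findings
    # Unresolved blockers are documented and the run proceeds anyway.
    return "unresolved", findings
```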
|
||||
|
||||
---
|
||||
|
||||
## Phase 5: Split into Tasks
|
||||
|
||||
**The output of this phase is one file per task at `$RUN_DIR/task-<N>.md`** (1-indexed: `task-1.md`, `task-2.md`, …). These files are the source-of-truth that Phase 6 (`@test`) and Phase 7 (`@make`) read by absolute path. **No file written = no dispatch in later phases.** If you skip the file-write step, every downstream dispatch will reference a non-existent path and fail.
|
||||
**The output of this phase is one file per task at `$RUN_DIR/task-<N>.md`** (1-indexed: `task-1.md`, `task-2.md`, …). These files are the source-of-truth that Phase 5.5, Phase 6, and Phase 7 read by absolute path. **No file written = no dispatch in later phases.** If you skip the file-write step, every downstream dispatch will reference a non-existent path and fail.
|
||||
|
||||
**One-task-per-run model (ADR-21):** Phase 5 still produces N task files. After Phase 5.5 review, only **task-1** runs through Phases 6–8 in this invocation. If N > 1, tasks 2…N are filed as TODO sub-issues at the end of Phase 5.5 (see "File sibling tasks as sub-issues" below) and the user runs `/workflow` separately on each. This bounds the run's scope, keeps Phase 8 diffs reviewable, and eliminates cross-task regression risk inside a run.
|
||||
|
||||
Steps:
|
||||
|
||||
|
|
@ -254,9 +256,11 @@ Include **Test Design** from Phase 3 when available, attached to the relevant ta
|
|||
|
||||
**Task size:** ~10-30 minutes each, single coherent change, clear boundaries.
|
||||
|
||||
### Split Heuristic — when in doubt, split
|
||||
### Split Heuristic — when in doubt, **do not** split
|
||||
|
||||
A task must be **split** if any of the following apply:
|
||||
In the one-task-per-run model (ADR-21), splitting fans work out across user sessions: every additional task becomes a sub-issue the user must come back and run as its own `/workflow` invocation, with full P3/P4/P5/P5.5/P6/P7/P8 overhead per sub-issue. **Default to keeping work in one task.** Only split when one of the mechanical triggers below clearly applies *and* the resulting sub-tasks each warrant their own commit/PR-sized chunk of attention.
|
||||
|
||||
A task should be **split** when any of the following apply:
|
||||
|
||||
- It touches more than two distinct concerns (e.g. *constants + new component + sprite spawn + new system + main wiring* is **five** concerns — at least three tasks).
|
||||
- It changes more than ~50 lines across more than 2 files.
|
||||
|
|
@ -264,12 +268,14 @@ A task must be **split** if any of the following apply:
|
|||
- It mixes pure-logic changes (math helpers) with stateful changes (queries, world mutation).
|
||||
- It mixes new APIs with their first call sites in the same task.
|
||||
|
||||
When a task fails the heuristic, split into:
|
||||
**Tiebreaker:** when none of the triggers clearly applies and the work plausibly fits a single coherent commit, do not split. Splitting fans out across sessions; only split if each resulting sub-issue is genuinely independently runnable and benefits from its own plan.
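The mechanical triggers can be read as a simple predicate. The sketch below covers only the triggers visible in this excerpt and treats the ~50-line and 2-file figures as hard thresholds, which is an assumption; `TaskShape` and `should_split` are hypothetical names. The judgment call about whether each sub-task warrants its own commit is not modeled.

```python
from dataclasses import dataclass


@dataclass
class TaskShape:
    """Hypothetical summary of a candidate task, filled in by the planner."""

    concerns: int
    changed_lines: int
    changed_files: int
    mixes_pure_and_stateful: bool = False
    mixes_new_api_with_call_sites: bool = False


def should_split(task: TaskShape) -> bool:
    """True only when a mechanical trigger applies; the default is no split."""
    triggers = (
        task.concerns > 2,
        task.changed_lines > 50 and task.changed_files > 2,
        task.mixes_pure_and_stateful,
        task.mixes_new_api_with_call_sites,
    )
    return any(triggers)
```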
|
||||
|
||||
When a task does fail the heuristic, split into:
|
||||
1. **Foundations** — new constants, types, components (no behavior change yet).
|
||||
2. **Implementation** — the actual production logic, calling the foundations.
|
||||
3. **Wiring** — registration in `main.rs` / `lib.rs` / app-builder.
|
||||
|
||||
Each split is dispatched separately to `@make` and verified before the next.
|
||||
Tasks 2…N are filed as sub-issues at the end of Phase 5.5; only task 1 runs in this invocation.
|
||||
|
||||
### Code Context — what to include
|
||||
|
||||
|
|
@ -289,32 +295,73 @@ Apply **Dispatch Hygiene** to each task spec before dispatch in Phase 7.
|
|||
|
||||
## Phase 5.5: Review Task Split
|
||||
|
||||
A short, focused review of the task split as a set. Catches split errors (missed scope, overlap, multi-purpose tasks, missing integration contracts) when they're cheap to fix — *before* `@test` and `@make` dispatch on a broken split. Without this gate, the same errors surface only at Phase 8 final review, after expensive test/implementation work has already been done.
|
||||
A short, focused review of the task split as a set. In the one-task-per-run model (ADR-21), this phase is the gate for **two** things: (a) catching split errors (missed scope, overlap, multi-purpose tasks, missing integration contracts) before `@test`/`@make` dispatch, and (b) preventing a botched split from being persisted as garbage sub-issues that the user has to manually clean up later. Both stakes are higher than in the original N-tasks-per-run design.
|
||||
|
||||
**Dispatch only `@check`** for this phase — split review is structural / coverage, not complexity. `@simplify` is not involved. Apply **Dispatch Hygiene** to the prompt.
|
||||
|
||||
**Skip Phase 5.5 entirely when N=1** (ADR-21): a single-task plan has no split to review. Three of the six questions below (no overlap, integration contracts, sub-issue self-containment) are degenerate. The remaining structural concerns (coverage, single-purpose, testable AC) are already evaluated at Phase 4 plan acceptance. Proceed directly to Phase 6.
|
||||
|
||||
The dispatch prompt names:
|
||||
- `$RUN_DIR/plan.md` (the plan being decomposed)
|
||||
- `$RUN_DIR/task-1.md` through `$RUN_DIR/task-N.md` (the split — list every task file)
|
||||
- The worktree path
|
||||
|
||||
`@check` evaluates the split against five questions:
|
||||
`@check` evaluates the split against six questions:
|
||||
|
||||
1. **Coverage** — do the tasks together implement everything the plan promises? Any gaps?
|
||||
2. **No overlap** — do two tasks claim the same scope or modify the same lines?
|
||||
3. **Single-purpose** — does any task do more than one thing? (See Phase 5's Split Heuristic.)
|
||||
4. **Integration contracts** — where two tasks touch a shared interface, is the contract documented in both task files?
|
||||
4. **Integration contracts** — where two tasks touch a shared interface, is the contract documented in *both* task files in a form that survives sub-issue filing? (Each sub-issue runs in isolation later — its eventual P3 plan must be reconstructable from the sub-issue body alone, including any cross-sub-issue dependencies. This is the load-bearing question in the new model.)
|
||||
5. **Testable acceptance criteria** — does every task have specific, falsifiable AC?
|
||||
6. **Self-containment** — is each task spec runnable as a standalone `/workflow` invocation? Does its description carry enough plan-level context (rationale, code seams, scope boundary) that a fresh run could re-plan it without seeing the parent plan or sibling task files?
|
||||
|
||||
**Review loop (max 2 cycles):**
|
||||
|
||||
1. Dispatch `@check` against the plan + all task files.
|
||||
2. If `ACCEPTABLE` → proceed to Phase 6.
|
||||
3. If `NEEDS WORK` → edit the task files in place (split a task into two, merge two tasks, add integration contracts, sharpen AC). Re-apply Dispatch Hygiene to each updated file. Re-dispatch.
|
||||
4. If `BLOCK` → the plan itself does not decompose cleanly. Return to Phase 4 with `@check`'s finding instead of forcing the split.
|
||||
5. **Convergence detection:** same finding twice → stop loop, document the unresolved split issue in the run summary, proceed.
|
||||
2. If `ACCEPTABLE` → proceed to "File sibling tasks as sub-issues" (below), then Phase 6.
|
||||
3. If `NEEDS WORK` → edit the task files in place (split a task into two, merge two tasks, strengthen integration contracts, sharpen AC, add self-containment context). Re-apply Dispatch Hygiene to each updated file. Re-dispatch.
|
||||
4. If `BLOCK` (a plan-level finding, ADR-17) → translate the split-level finding into a concrete `plan.md` edit, save the edit, **decrement `PLAN_REWORK_REMAINING`**, and re-enter Phase 4 against the revised plan. If `PLAN_REWORK_REMAINING` was already `0`, abort to the Failure Handler instead.
|
||||
5. **Convergence detection:** same `@check` finding twice → stop loop, document the unresolved split issue in the run summary, proceed.
|
||||
|
||||
**This is a quick gate, not a deep review.** No line-by-line code feedback (there's no code), no design re-litigation (that was Phase 4's job). The whole point is a fast structural check before downstream phases start churning.
|
||||
**This is a quick gate, not a deep review.** No line-by-line code feedback (there's no code), no design re-litigation (that was Phase 4's job). The whole point is a fast structural check before downstream phases start churning *and* before sibling tasks become persistent sub-issues.
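The interaction between the Phase 5.5 verdict and the run-level rework counter might look like the following sketch. `ReworkBudget` and `route_split_verdict` are hypothetical names, and the returned strings are illustrative labels rather than identifiers the workflow defines.

```python
class ReworkBudget:
    """Run-level counter; starts at 1, mirroring PLAN_REWORK_REMAINING=1."""

    def __init__(self, remaining: int = 1) -> None:
        self.remaining = remaining

    def spend(self) -> bool:
        """Consume one rework; return False when the budget is exhausted."""
        if self.remaining <= 0:
            return False
        self.remaining -= 1
        return True


def route_split_verdict(verdict: str, budget: ReworkBudget) -> str:
    """Map a Phase 5.5 @check verdict to the orchestrator's next step."""
    if verdict == "ACCEPTABLE":
        return "file_siblings_then_phase_6"
    if verdict == "NEEDS WORK":
        return "edit_task_files_and_redispatch"
    if verdict == "BLOCK":
        # Plan-level finding: re-enter Phase 4 only while budget remains.
        return "reenter_phase_4" if budget.spend() else "failure_handler"
    raise ValueError(f"unknown verdict: {verdict!r}")
```

With the default budget of 1, the first `BLOCK` re-enters Phase 4 and a second one aborts to the Failure Handler.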
|
||||
|
||||
### File sibling tasks as sub-issues (when N > 1)
|
||||
|
||||
After Phase 5.5 returns ACCEPTABLE, dispatch `@pm` to file each of `task-2.md` through `task-N.md` as a TODO sub-issue with `parent: $ISSUE_ID`. **Only task-1 continues into Phase 6.** Each filed sub-issue gets a rich seed body (ADR-21) so its eventual `/workflow` run can plan and implement without seeing siblings or the original `plan.md`.
|
||||
|
||||
For each task `$N` in 2…N, dispatch `@pm` with the following body content (assembled by the orchestrator from `task-<N>.md` and the relevant slice of `plan.md`):
|
||||
|
||||
```markdown
|
||||
## What to implement
|
||||
<task description from task-<N>.md>
|
||||
|
||||
## Acceptance criteria
|
||||
<AC checkboxes from task-<N>.md>
|
||||
|
||||
## Code Context
|
||||
<code snippets from task-<N>.md>
|
||||
|
||||
## Integration with sibling sub-issues
|
||||
<dependencies on sibling sub-issues, with brief rationale; declared in frontmatter as `depends-on: [<SIB-ID>, ...]`>
|
||||
|
||||
## Plan rationale
|
||||
<relevant slice of plan.md — typically 1–3 paragraphs covering why this approach was chosen>
|
||||
|
||||
## Test design
|
||||
<from task-<N>.md or plan.md if present>
|
||||
|
||||
---
|
||||
Discovered during run on `$BRANCH_NAME` for parent issue `$ISSUE_ID`.
|
||||
```
|
||||
|
||||
`@pm` invocation per sub-issue:
|
||||
- Title — derived from `task-<N>.md`'s task description (short imperative).
|
||||
- Status — `Todo`.
|
||||
- Parent — `$ISSUE_ID`.
|
||||
- Labels — propagate relevant labels from the parent (e.g. `gameplay`); add `split-from-run` to mark the provenance.
|
||||
- `depends-on:` — sibling sub-issue IDs that this task requires to be `Done` first. The orchestrator determines the dependency graph from the integration contracts captured in Phase 5.5 question 4.
|
||||
|
||||
The new sub-issue files plus the parent's updated `## Sub-issues` list are staged in Phase 9's `chore(todo): …` commit alongside the parent's status/AC updates.
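Assembling the seed body from a task file is essentially string templating. The sketch below assumes the task content has already been extracted into a dictionary; `build_seed_body`, `SECTIONS`, and the dictionary keys are hypothetical names, while the section headings and the trailing provenance line follow the template above.

```python
# (heading in the sub-issue body, key in the hypothetical task dictionary)
SECTIONS = (
    ("What to implement", "description"),
    ("Acceptance criteria", "acceptance_criteria"),
    ("Code Context", "code_context"),
    ("Integration with sibling sub-issues", "integration"),
    ("Plan rationale", "plan_rationale"),
    ("Test design", "test_design"),
)


def build_seed_body(task: dict, branch_name: str, issue_id: str) -> str:
    """Render the sub-issue body; missing sections are left empty."""
    parts = []
    for heading, key in SECTIONS:
        text = task.get(key, "").strip()
        parts.append(f"## {heading}\n{text}\n")
    parts.append("---")
    parts.append(
        f"Discovered during run on `{branch_name}` for parent issue `{issue_id}`."
    )
    return "\n".join(parts)
```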
|
||||
|
||||
---
|
||||
|
||||
|
|
@ -338,7 +385,7 @@ Do **not** quote task or plan content inline — `@test` reads from disk.
|
|||
|-----------|--------|
|
||||
| `TESTS_READY` + `escalate_to_check: false` | Proceed to Phase 7 |
|
||||
| `TESTS_READY` + `escalate_to_check: true` | Route tests to `@check` for light review. `@check` diagnoses, caller routes fixes to `@test`. Then proceed. |
|
||||
| `NOT_TESTABLE` | Route to `@check` for sign-off on justification. If `Missing testability seam`, dispatch `@make` to add the seam first, then re-run `@test`. Otherwise the task goes to `@make` without tests. |
|
||||
| `NOT_TESTABLE` | Route to `@check` for sign-off on justification. If `Missing testability seam`, dispatch `@make` to add the seam first, then re-run `@test`. Otherwise the task goes to `@make` without tests. **Record the `@test` justification + `@check` sign-off rationale** for the Phase 8 NOT_TESTABLE manifest (ADR-18). |
|
||||
| `BLOCKED` | Investigate. May need to revise task spec or plan. |
|
||||
| Test passes immediately | Investigate — behavior may already exist. Task spec may be wrong. |
|
||||
| Stub-first run: tests pass with zero `todo!()` panics | **Structural-only tests.** Every test is asserting type/struct/enum facts without calling any stubbed symbol. Reject the test output and route back to `@test` with a "must exercise the stubbed symbols by calling them" note. Do not let these tests gate Phase 7 — they cannot RED→GREEN, so the body-pass `@make` would commit code with false-green coverage. |
|
||||
|
|
@ -373,82 +420,88 @@ To get a clean runtime RED, dispatch a **stub-first `@make` pass** *before* `@te
|
|||
|
||||
The stub pass and the body pass each produce their own atomic commit (per Phase 9 rules): `feat(<scope>): scaffold <thing> with todo!() stubs` followed by `feat(<scope>): implement <thing>` (or whichever conventional type fits).
|
||||
|
||||
**Parallelism:**
|
||||
- **Python:** Independent tasks can have tests written in parallel, *provided* their test files are disjoint and no shared `conftest.py` is being modified.
|
||||
- **Rust:** Run `@test` dispatches **sequentially**. Cargo serialises the build via the `target/` directory lock, so parallel dispatches give no speedup; they only add risk (a long-running build in one branch starves the other, and any task that touches a shared crate-level fixture/helper file will race).
|
||||
**Parallelism:** N/A in the one-task-per-run model (ADR-21). Phase 6 dispatches `@test` for task-1 only; the stub-pass `@make` (when applicable) runs strictly before `@test`, and the body-pass `@make` strictly after. There are no peer dispatches to parallelise.
|
||||
|
||||
**Constraint:** `@test` must not modify existing `conftest.py` files (prevents collision during parallel execution).
|
||||
**Constraint:** `@test` must not modify existing `conftest.py` files (preserves cross-test invariants for the project's broader suite).
|
||||
|
||||
---
|
||||
|
||||
## Phase 7: Implement
|
||||
|
||||
Apply **Dispatch Hygiene** to each `@make` spec before sending. Repeated trips on a single task signal a Phase 5 split problem — go back and split.
|
||||
Apply **Dispatch Hygiene** to each `@make` spec before sending. Repeated trips on a single task signal a scoping problem: `@check`'s diagnosis (below) may return `split_needed`, in which case the orchestrator aborts to the Failure Handler (see "split_needed exit").
|
||||
|
||||
**`@make` dispatches are SEQUENTIAL — never in parallel.** Run each task to completion (writes, every verification command, and the orchestrator's post-check) before dispatching the next. Reasons:
|
||||
- `@make` writes source files. Parallel agents picking the same file (e.g. `src/lib.rs` for adding both a new `pub mod` and a registration) corrupt each other.
|
||||
- Even on disjoint files, Cargo's `target/` lock and uv's venv state serialise the verification builds anyway, so parallelism gives no speedup.
|
||||
- Stub-pass/body-pass pairs (Rust integration TDD) must be strictly ordered within a task; running stub-pass for task 2 while body-pass for task 1 is still building yields a non-deterministic crate state for `@test` to RED against.
|
||||
**One task per run** (ADR-21): only task-1 reaches Phase 7. Tasks 2…N were filed as sub-issues at the end of Phase 5.5; this run does not dispatch `@make` for any of them. Stub-pass and body-pass dispatches for task-1 still run sequentially (the stub-pass must commit before `@test` runs against it, and the body-pass must run after `@test` produces failing tests).
|
||||
|
||||
This applies to **all** `@make` invocations: standard mode, TDD mode, stub-pass, body-pass, and integration-fix dispatches.
|
||||
|
||||
Execute each task by dispatching `@make` with a short prompt:
|
||||
- The absolute path to the task spec: `$RUN_DIR/task-<N>.md` — `@make` reads acceptance criteria, code context, and files-to-modify from there.
|
||||
Execute task-1 by dispatching `@make` with a short prompt:
|
||||
- The absolute path to the task spec: `$RUN_DIR/task-1.md` — `@make` reads acceptance criteria, code context, and files-to-modify from there.
|
||||
- The worktree path.
|
||||
- **Pre-written failing tests and handoff from `@test` (if TESTS_READY)** — these are short and per-dispatch, so include them inline in the prompt.
|
||||
|
||||
Do **not** quote the task spec inline.
|
||||
|
||||
`@make` runs in TDD mode when tests are provided:
|
||||
1. Entry validation: run tests, verify RED, check failure codes match handoff
|
||||
2. Implement minimal code to make tests pass (GREEN)
|
||||
3. Regression check on broader area
|
||||
4. Refactor while keeping green
|
||||
5. Report RED→GREEN evidence
|
||||
1. Entry validation: run tests, verify RED, check failure codes match handoff.
|
||||
2. Implement minimal code to make tests pass (GREEN).
|
||||
3. Regression check: run the project's full test suite (`nix develop -c cargo test` / `uv run pytest` / `nix flake check` as appropriate).
|
||||
4. Refactor while keeping green.
|
||||
5. Report RED→GREEN evidence.
|
||||
|
||||
**Escalation — two paths route through `@check` → `@test` → back to `@make`:**
|
||||
For NOT_TESTABLE tasks, `@make` runs in standard mode (no entry validation; standard implementation + verification).
|
||||
|
||||
1. **Entry-validation escalation.** Before implementing, `@make`'s entry check (run tests, verify RED, compare against handoff) reveals test-quality concerns — wrong assertion target, mixed failure codes, mocks of internal boundaries, etc. `@make` reports without writing any production code.
|
||||
2. **Mid-implementation escalation.** After implementing, `@make` hits its iteration limit (2–3 attempts) because the test demands production code that's impossible or contradicts the spec. `@make` returns `Implementation Incomplete` with the flag `escalate: test_design`. **Do not** re-dispatch `@make` with marginal context tweaks — that just burns cycles on a test that needs redesign, not better implementation.
|
||||
### Implementation Incomplete — unified diagnosis path (ADR-19)
|
||||
|
||||
In both cases:
|
||||
When `@make` returns `Implementation Incomplete` for *any* reason — entry-validation concern, mid-implementation iteration limit, `escalate: test_design` flag, `escalate: split_needed` flag, or no flag at all — the orchestrator routes the report through `@check` for diagnosis. **Do not re-dispatch `@make` with marginal context tweaks.** `@make`'s self-diagnosis (the `escalate:` flag, if present) becomes a *hint* for `@check`; `@check` is the authority that decides what to do next.
|
||||
|
||||
1. `@make` returns its report (entry-time concern or mid-impl `escalate: test_design`).
|
||||
2. Orchestrator routes the report to `@check` for diagnosis (light review of the *tests*, not the implementation).
|
||||
3. `@check` confirms or rejects the test-design suspicion.
|
||||
4. **If confirmed:** orchestrator routes to `@test` to redesign the tests. Apply Dispatch Hygiene. Fixed tests return to `@make` for fresh entry validation and a clean implementation attempt.
|
||||
5. **If rejected:** the issue is in the production code; orchestrator re-dispatches `@make` with `@check`'s diagnostic notes attached.
|
||||
Steps:
|
||||
|
||||
**Iteration limit on this loop: max 2 cycles.** If a test-design suspicion keeps surfacing but `@check` never confirms it, the design problem is upstream — revisit the Phase 3 plan rather than thrashing between `@test` and `@make`.
|
||||
1. `@make` returns its `Implementation Incomplete` report (with or without an `escalate:` flag).
|
||||
2. Orchestrator dispatches `@check` for diagnosis. Inputs: the test files (`@check` reads them from disk), the production code state (the in-progress diff), the task spec at `$RUN_DIR/task-1.md`, and `@make`'s self-diagnosis hint.
|
||||
3. `@check` returns one of three verdicts:
|
||||
- **`test_design`** — the test demands production code that's impossible, internally-inconsistent, or testing the wrong observable. Orchestrator dispatches `@test` to redesign the tests. Apply Dispatch Hygiene. Fixed tests return to `@make` for fresh entry validation and a clean implementation attempt.
|
||||
- **`production_logic`** — the test is sound; `@make`'s implementation is wrong or incomplete. Orchestrator re-dispatches `@make` with `@check`'s diagnostic notes attached.
|
||||
- **`split_needed`** — the task is over-scoped; no realistic implementation can satisfy the AC within the task's stated files-to-modify. See "split_needed exit" below.
|
||||
|
||||
For NOT_TESTABLE tasks, `@make` runs in standard mode.
|
||||
**Iteration limit on this loop: max 2 cycles.** If two cycles of `@check` diagnosis don't yield a clean `@make` Implementation Complete, the design problem is upstream — **decrement `PLAN_REWORK_REMAINING`** and revisit the Phase 3 plan. If `PLAN_REWORK_REMAINING` is already `0`, abort to the Failure Handler instead.
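The three verdicts and the two-cycle limit can be summarized as a small routing function. This is a sketch under stated assumptions: `route_diagnosis` is a hypothetical name, `cycle` is taken to be the 1-based index of the diagnosis cycle about to start, the returned strings are illustrative labels, and the caller is expected to decrement the counter when `revisit_phase_3` comes back.

```python
NEXT_STEP = {
    "test_design": "dispatch_test_redesign",
    "production_logic": "redispatch_make_with_notes",
    "split_needed": "failure_handler",
}


def route_diagnosis(
    verdict: str,
    cycle: int,
    rework_remaining: int,
    max_cycles: int = 2,
) -> str:
    """Decide the next step after @check diagnoses an incomplete implementation."""
    if verdict not in NEXT_STEP:
        raise ValueError(f"unknown verdict: {verdict!r}")
    if verdict == "split_needed":
        return NEXT_STEP[verdict]  # always aborts, regardless of cycle
    if cycle > max_cycles:
        # Two cycles did not converge: the problem is upstream of the task.
        return "revisit_phase_3" if rework_remaining > 0 else "failure_handler"
    return NEXT_STEP[verdict]
```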
|
||||
|
||||
After all tasks complete, verify overall integration:
|
||||
- Run the project's test suite if available
|
||||
- Run linting/type checking if configured
|
||||
- Fix any integration issues between tasks
|
||||
### split_needed exit (ADR-21)
|
||||
|
||||
When `@check` diagnosis returns `split_needed` for task-1, **abort to the Failure Handler.** Rationale (Q19a per ADR-21): in the one-task-per-run model, task-1 is the only task in this invocation. No feature AC have been satisfied yet (any stub-pass commit is scaffolding, not feature work). Pre-filing a sub-issue from a botched plan would pollute `TODO/` with bad scoping the user has to manually clean up; the most likely upstream diagnosis is a plan-level scoping error, not a task-level over-scope.
|
||||
|
||||
The Failure Handler's recovery procedure (ADR-14: discard worktree, delete branch, re-create from base, re-run) cleanly destroys the stub-pass commit and any in-progress body-pass changes. The user re-runs `/workflow` on the same parent issue; Phase 3/4/5 produce a fresh plan and split. Sub-issues filed at Phase 5.5 of the *previous* run survive in `TODO/` only if that run's Phase 9 reached the TODO commit, which by definition it did not when Phase 7 invoked the Failure Handler; in practice they are discarded along with the worktree.
|
||||
|
||||
Concretely on `split_needed`:
|
||||
|
||||
1. Write a Failure Handler summary noting `@check`'s diagnosis verbatim and the Phase 5 split that was attempted.
|
||||
2. Dispatch `@pm` to add a comment on `./TODO/$ISSUE_ID.md`: `- YYYY-MM-DD — split_needed at Phase 7 task-1; <one-line diagnosis>. Re-run after re-creating the worktree.`
|
||||
3. Stop execution. Do not commit code, do not file new sub-issues, do not stage anything under `.workflow/`.
|
||||
|
||||
---
|
||||
|
||||
## Phase 8: Final Review
|
||||
|
||||
Apply **Dispatch Hygiene** to each reviewer prompt before sending. Dispatch `@check` and `@simplify` in parallel to review the full implementation (all changes across all files).
|
||||
Apply **Dispatch Hygiene** to each reviewer prompt before sending. Dispatch `@check` and `@simplify` in parallel to review the implementation. Because the run executes exactly one task (ADR-21), the diff is bounded by the task's scope (~50 lines per the Split Heuristic) and fits comfortably in reviewer context — no chunking needed.
|
||||
|
||||
Provide reviewers with:
|
||||
- The absolute path to `$RUN_DIR/plan.md` (the same file Phase 4 reviewed; mid-loop revisions will have updated it in place)
|
||||
- The full diff (`git diff "$BASE_BRANCH"...HEAD`)
|
||||
- Any decisions or deviations from the plan, captured inline in the dispatch prompt
|
||||
- The absolute path to `$RUN_DIR/plan.md` (the same file Phase 4 reviewed; mid-loop revisions will have updated it in place).
|
||||
- The absolute path to `$RUN_DIR/task-1.md` (the spec the implementation actually targeted).
|
||||
- The full diff (`git diff "$BASE_BRANCH"...HEAD`).
|
||||
- Any decisions or deviations from the plan, captured inline in the dispatch prompt.
|
||||
- **NOT_TESTABLE manifest (ADR-18):** if task-1 went `NOT_TESTABLE` at Phase 6, the dispatch prompt includes a "Tasks completed without tests (NOT_TESTABLE)" section listing the `@test` justification and the `@check` sign-off rationale. Reviewers explicitly evaluate "does the justification still hold given the final diff?" and may BLOCK if it doesn't. If task-1 had tests (the common case), this section reads "None — task-1 has tests."
|
||||
|
||||
**Review loop (max 3 cycles):**
|
||||
1. Send implementation to both reviewers
|
||||
2. Merge findings (same precedence rules as Phase 4)
|
||||
3. If ACCEPTABLE: proceed to Phase 9
|
||||
4. If issues found, route per the kind of finding — **the orchestrator does not write production code; `@make` does**:
|
||||
- **`BLOCK`, behavioral, correctness, or test-quality findings:** build a new `@make` task spec from the finding (apply Dispatch Hygiene, finalized text, no draft answer). Dispatch `@make`. Do **not** fix directly. Every `BLOCK` is by definition behavioral and must round-trip through `@make`.
|
||||
- **Strictly cosmetic findings** (typo in a comment, missing trailing newline, formatting that does not change the AST or behavior): the orchestrator may fix directly, then re-review. Anything compiler-detected (unused import, dead code) goes through `@make`, since removing it is still a code change.
|
||||
**Verdict authority** (ADR-15): same as Phase 4. `@check` is the only blocking reviewer; `@simplify`'s findings are recorded as advisory in the run summary but never trigger a re-dispatch loop.
|
||||
|
||||
**Review loop (max 3 cycles, gated on `@check`):**
|
||||
1. Send implementation to both reviewers (dispatched in parallel).
|
||||
2. Merge findings: record `@simplify`'s output for the summary; act only on `@check`'s verdict.
|
||||
3. If `@check` returns ACCEPTABLE: proceed to Phase 9.
|
||||
4. If `@check` finds issues, route per the kind of finding — **the orchestrator does not write production code; `@make` does**:
|
||||
- **`BLOCK`, behavioral, correctness, or production-code findings:** write a fix spec to `$RUN_DIR/task-fix-<N>.md` (1-indexed within this Phase 8 cycle, ADR-16). Apply Dispatch Hygiene, finalized text, no draft answer. Verify with `test -f` before dispatching. Dispatch `@make` with the absolute path. Do **not** fix directly. Every `BLOCK` is by definition behavioral and must round-trip through `@make`.
|
||||
- **Test-quality / test-design findings:** route through the unified Implementation Incomplete diagnosis path (Phase 7) — `@check` diagnosis → `@test` redesign → fresh `@make` re-attempt against the existing task spec.
|
||||
- **Plan-level finding:** **decrement `PLAN_REWORK_REMAINING`** and re-enter Phase 3 with the finding. If the counter is already `0`, abort to the Failure Handler.
|
||||
- **Strictly cosmetic findings** (typo in a comment, missing trailing newline, formatting that does not change the AST or behavior): the orchestrator may fix directly, then re-review. Anything compiler-detected (unused import, dead code) goes through `@make` via a `task-fix-<N>.md`, since removing it is still a code change.
|
||||
- When in doubt, dispatch `@make`.
|
||||
5. **Convergence detection:** same findings twice = stop loop early
|
||||
6. If unresolved after 3 cycles: document blockers, proceed to commit anyway
|
||||
5. **Convergence detection:** same `@check` findings twice = stop loop early.
|
||||
6. If unresolved after 3 cycles: document blockers, proceed to commit anyway.
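The routing in step 4 can be sketched as a small shell function. This is a minimal sketch, not the orchestrator's actual implementation: the function name `route_finding`, the finding-kind labels, the `ACTION` and `FIX_N` variables, and the starting counter value are all hypothetical. Only `PLAN_REWORK_REMAINING`, `$RUN_DIR`, and the `task-fix-<N>.md` naming come from the text above.

```shell
# Hypothetical sketch of the Phase 8 step 4 routing. Kind labels and action
# strings are illustrative assumptions, not part of the workflow spec.
PLAN_REWORK_REMAINING="${PLAN_REWORK_REMAINING:-1}"  # assumed starting value
RUN_DIR="${RUN_DIR:-.workflow/run-EXAMPLE}"          # placeholder path
FIX_N=0                                              # 1-indexed fix-spec counter

# Sets ACTION (rather than printing) so counter updates survive the call.
route_finding() {
  case "$1" in
    test-quality|test-design)
      ACTION="route to Phase 7 Implementation Incomplete diagnosis"
      ;;
    plan)
      if [ "$PLAN_REWORK_REMAINING" -le 0 ]; then
        ACTION="abort to Failure Handler"
      else
        PLAN_REWORK_REMAINING=$((PLAN_REWORK_REMAINING - 1))
        ACTION="re-enter Phase 3"
      fi
      ;;
    cosmetic)
      ACTION="orchestrator fixes directly, then re-review"
      ;;
    *)
      # BLOCK, behavioral, correctness, production-code findings, and
      # anything unrecognised: "When in doubt, dispatch @make."
      FIX_N=$((FIX_N + 1))
      ACTION="write $RUN_DIR/task-fix-$FIX_N.md, then dispatch @make"
      ;;
  esac
}
```

The default branch of the `case` deliberately catches unknown kinds, so a misclassified finding still round-trips through `@make` instead of being fixed directly.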
|
||||
|
||||
---
|
||||
|
||||
|
|
@ -462,11 +515,16 @@ The workflow is forge-agnostic. It commits locally and stops. **Do not push, and
|
|||
- If changes are large/varied, use multiple atomic commits (one per logical unit)
|
||||
|
||||
### TODO Update
|
||||
- Dispatch `@pm` against the absolute `./TODO/` path. Ask it to:
|
||||
- Set the issue file's frontmatter `status` to `Done` (or leave at `In Progress` if the run is incomplete and the user must verify before marking Done).
|
||||
- Add a comment of the form: `- YYYY-MM-DD — Branch \`$BRANCH_NAME\`, commit <SHA> — <one-line summary>` (date from the shell, never fabricated).
|
||||
- Propagate any status flip to the dependent index: `TODO/README.md` for top-level issues (`parent: null`), or the parent file's `## Sub-issues` line for sub-issues.
|
||||
- If acceptance-criteria checkboxes were addressed by the implementation, ask `@pm` to check them off (flip `- [ ]` to `- [x]` under `## Acceptance criteria`).
|
||||
|
||||
Dispatch `@pm` against the absolute `./TODO/` path. Ask it to:
|
||||
|
||||
1. **Check off the AC checkboxes that task-1 satisfied.** For each `- [ ]` AC line in `./TODO/$ISSUE_ID.md`'s `## Acceptance criteria` section that the implemented work fulfilled, flip to `- [x]`. The orchestrator decides which AC are satisfied by inspecting task-1's spec and verification output.
|
||||
2. **Set the issue's frontmatter `status` based on AC completion** (ADR-21, AC-driven):
|
||||
- **All AC are now `[x]`** → `status: Done`.
|
||||
- **Some AC remain `[ ]`** → `status: In Progress`. (Sub-issues filed at Phase 5.5 cover the unmet AC; the user runs them in subsequent invocations.)
|
||||
- **No AC section in the file** → `status: Done` (the parent had no testable AC; one task ran end-to-end).
|
||||
3. **Propagate any status flip to the dependent index:** `TODO/README.md` for top-level issues (`parent: null`), or the parent issue file's `## Sub-issues` line for sub-issues.
|
||||
4. **Add a comment** of the form: `- YYYY-MM-DD — Branch \`$BRANCH_NAME\`, commit <SHA> — <one-line summary>` (date from the shell, never fabricated).
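The AC-driven status rule in step 2 can be sketched as follows. This is an illustration under stated assumptions, not how `@pm` actually computes the status: `ac_status` is a hypothetical helper, and it assumes the AC section runs from the `## Acceptance criteria` heading to the next `## ` heading or the end of the file.

```shell
# Hypothetical helper: print the frontmatter status implied by the AC
# checkboxes in an issue file (path given as $1).
ac_status() {
  awk '
    # A level-2 heading starts or ends the AC section.
    /^## / {
      in_ac = ($0 ~ /^## Acceptance criteria *$/)
      if (in_ac) seen = 1
      next
    }
    # Count unchecked boxes only while inside the AC section.
    in_ac && /^ *- \[ \]/ { unchecked++ }
    END {
      if (!seen)              print "Done"         # no AC section in the file
      else if (unchecked > 0) print "In Progress"  # some AC remain unchecked
      else                    print "Done"         # every AC is checked
    }
  ' "$1"
}
```

For example, `ac_status "./TODO/$ISSUE_ID.md"` would be run after the checkboxes have been flipped in step 1. Unchecked boxes in other sections are ignored.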
|
||||
|
||||
### File Follow-ups
|
||||
|
||||
|
|
@ -512,11 +570,27 @@ If no follow-ups were filed, the commit message simplifies to `chore(todo): upda
|
|||
|
||||
## Failure Handling
|
||||
|
||||
At any phase, if an unrecoverable error occurs:
|
||||
At any phase, if an unrecoverable error occurs (or a routing rule explicitly aborts to the Failure Handler — `PLAN_REWORK_REMAINING` exhausted, `split_needed` at Phase 7, etc.):
|
||||
|
||||
1. Write `$RUN_DIR/summary.md` (creating `$RUN_DIR` first if it doesn't exist) with what was completed and what failed. Do **not** stage or commit anything under `.workflow/`.
|
||||
2. If any code was written, commit it with message `wip: incomplete workflow run for <issue-id>`. Stage code only — exclude `.workflow/` and `TODO/`.
|
||||
3. Leave the branch and worktree intact for the user to inspect — do not push, do not delete.
|
||||
4. Dispatch `@pm` against `./TODO/` to add a comment on the issue file (`./TODO/<ID>.md`) summarising what failed.
|
||||
4. Dispatch `@pm` against `./TODO/` to add a comment on the issue file (`./TODO/<ID>.md`) summarising what failed and naming the abort reason if it was a routing-rule abort (e.g. `split_needed at Phase 7 task-1`, `plan_rework_remaining exhausted at Phase 8`).
|
||||
5. Stop execution.
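Step 2's "stage code only" rule can be sketched with git's `exclude` pathspec magic. This is a minimal sketch, assuming the helper receives the worktree root (`$WORKTREE_PATH`) and the issue ID; the function name `wip_commit` is hypothetical.

```shell
# Hypothetical helper: commit code changes only, leaving .workflow/ and
# TODO/ unstaged. $1 = worktree root, $2 = issue ID.
wip_commit() {
  wt="$1"
  issue_id="$2"
  # Stage everything except the run artifacts and the TODO tree.
  git -C "$wt" add -A -- . ':(exclude).workflow' ':(exclude)TODO' || return 1
  # Nothing staged means no code was written, so skip the commit.
  if git -C "$wt" diff --cached --quiet; then
    return 0
  fi
  git -C "$wt" commit -q -m "wip: incomplete workflow run for $issue_id"
}
```

Calling `wip_commit "$WORKTREE_PATH" "$ISSUE_ID"` leaves `.workflow/` and `TODO/` untracked or unstaged, so `@pm` can still write its comment in step 4 without it landing in the `wip:` commit.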
|
||||
|
||||
### Recovery procedure (workflow is non-resumable, ADR-14)
|
||||
|
||||
The workflow is **non-resumable**. There is no `--resume` mode and no idempotent re-run path. To retry after a Failure Handler invocation (or after a user-initiated cancellation):
|
||||
|
||||
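1. `git worktree remove <path>` to discard the failed worktree. Git refuses to remove a worktree that holds untracked or modified files, so add `--force` if uncommitted files (for example an un-ignored `.workflow/`) remain.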
|
||||
2. Delete the feature branch: `git branch -D <branch>`. The Failure Handler's `wip:` commit (if any) is discarded with the branch.
|
||||
3. Re-create the worktree from `$BASE_BRANCH`: `git worktree add <path> -b <branch> <base-branch>`.
|
||||
4. Re-run `/workflow <ISSUE-ID>` from the fresh worktree.
|
||||
|
||||
The throwaway-worktree model is the recovery story. Re-running on the same worktree without this cleanup risks committing partial state or appending duplicate `@pm` comments.
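Steps 1 to 3 of the recovery procedure can be wrapped in one helper. This is a sketch only: `recover_worktree`, `run`, and the `DRY_RUN` switch are hypothetical names, and the git commands are exactly the ones listed above. With `DRY_RUN=1` (the default in this sketch) the commands are printed rather than executed.

```shell
# Print the command when DRY_RUN=1 (default), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper. $1 = worktree path, $2 = feature branch, $3 = base branch.
recover_worktree() {
  wt_path="$1"
  branch="$2"
  base="$3"
  run git worktree remove "$wt_path" &&   # step 1: discard the failed worktree
  run git branch -D "$branch" &&          # step 2: drop the branch and any wip: commit
  run git worktree add "$wt_path" -b "$branch" "$base"   # step 3: fresh worktree
}
```

Run it from the main checkout, not from inside the worktree being removed, and set `DRY_RUN=0` only after reviewing the printed commands. Step 4, re-running `/workflow <ISSUE-ID>`, stays a manual action in the fresh worktree.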
|
||||
|
||||
### User-initiated cancellation
|
||||
|
||||
User-initiated cancellation (Ctrl-C) follows the same procedure as automatic Failure Handler invocations: discard the worktree per the recovery procedure above. The orchestrator does not poll a soft-stop sentinel — runs are short enough (one task, ~10–30 min) that hard cancellation is fine.
|
||||
|
||||
**Never hang on interactive prompts.** If any command appears to require input, treat it as a failure and follow the above procedure.
|
||||