- Scope the orchestrator-direct-verification substitution to anchor-100 P2/P3
findings only (verifiable from code alone, where an independent re-read adds
nothing). P2/P3 at anchor 75 (judgment calls) and all P0/P1 still require the
independent validator wave -- the orchestrator synthesized these findings, so
it cannot supply the fresh second opinion that filters their false positives.
Direct verification catches a wrong fact, not the orchestrator's own bias.
- Make the subagent and validator templates path-aware: when a large review
stages the diff/file-list to disk and passes paths (per the Stage 4 nudge),
the child now Reads the staged path instead of treating the filename as the
content to review. Without this, the path-staging mode the nudge introduced
would feed reviewers/validators a bare filename in the Diff block and they'd
analyze no hunks.
- Point the Stage 4 nudge at the diff / changed-files slots and note that the
templates instruct the child to Read a staged path.
Both dogfood runs validated a P2/P3-only survivor set by orchestrator direct
first-party verification instead of the per-finding subagent wave, reading
"complement, not a skip / never replaces the wave for P0/P1" as "may replace
it for non-P0/P1." That read is reasonable -- each P2 was source-checkable and
the full suite ran -- so make it intended rather than emergent:
- Any P0/P1 survivor: the per-finding validator wave is required; direct
verification only complements it.
- P2/P3-only survivors: direct first-party verification per finding may stand
in for the wave when each finding is source-checkable; note the method in
Coverage.
The stage still runs whenever findings survive (validation is never skipped);
only the method varies by severity. Contract 29/0.
Two dogfood runs rendered the Applied section as Field:-prefixed blocks +
box-drawing separators instead of the mandated `| # | File | Fix | Reviewer |`
pipe table -- reproducibly, even on a faithful run that otherwise followed the
skill (loaded refs, used path-passing, applied + committed-on-clean, never
pushed, adapted commit scope to fix(cli):).
Root cause: the "never produce these shapes" rules and the format-verification
gate were scoped to "a finding's index" / "across severities" -- they never
named the Applied table, so the no-field-blocks rule read as findings-only.
(The review-output-template's Applied example was already a correct pipe table;
the always-loaded rules just didn't cover Applied.)
- Broaden the forbidden-shapes rules to every tabular section, Applied
included, naming the #:/Fix: field-block labels the agent actually emitted.
- Harden the Applied item: pipe table, never Field:-blocks or box-drawing.
- Restate the concrete forbidden shapes at the format gate and call out the
Applied table as the most common offender -- restating at the point of
action for the proven-fragile format spot.
Contract 29/0.
Three gaps surfaced by a dogfood review on a large PR (none from the lean
refactor itself):
- Persist the rendered report (report.md, default mode) in the run dir, so
output format and numbering stay auditable after a run.
- State at the orchestrator level that each reviewer's artifact file must
carry the detail-tier fields (why_it_matters, evidence) -- writing the
compact return to the artifact strips detail that Coverage and the keyed
detail lines depend on -- and that staging context to disk for a large
diff never licenses a thinner reviewer contract.
- Soften the hardcoded fix(review): commit scope to "fix(review):, or the
repo's nearest convention" -- target repos whose commit-lint has no
"review" scope reject the literal form.
Contract 29/0; full suite unchanged (same pre-existing CLI failures).
The "pass file paths, not contents, for large material" rule lived only in
AGENTS.md (authoring context, invisible at runtime), while the runtime
SKILL.md told reviewers to receive the diff inline. So orchestrators that
staged a big diff to disk and passed reviewers the path were exercising
good judgment the skill never encoded -- emergent and unreliable.
Add a brief Stage 4 nudge: when inlining the diff/file-list into every
reviewer and validator prompt would be wasteful (many files / a big diff),
write them once into the run dir (full.diff, files.txt) and pass the paths;
inline a small diff directly. The trigger is the qualitative cost test, not
a magic number, and it's a nudge (the failure mode is wasted tokens, not a
broken review) -- so it stays lightweight.
Mode-safe: the staged diff is an artifact, not a workspace file, so passing
its path is valid even in pr-remote/branch-remote scope. Contract 29/0.
Reduce the always-loaded SKILL.md body without changing behavior:
- Dedup the apply/commit/never-push model to a single home (Stage 5c);
Operating principles, Output format, Action Routing, After Review, and
the deprecated-mode note are now terse pointers.
- Consolidate the Stage 6 findings-format failure signatures, which were
stated twice (the "never produce these shapes" list and the
format-verification gate).
- Trim behavior-neutral rationale across Stage 5/5b/5c, keeping the
reasons that guard named failures (gate-runs-late ordering, scope-mode
inspection, per-finding validator independence).
- Collapse the "## Reviewers" tables to a compact roster with terse
triggers; de-triplicate the data-migration spawn gate and the
maintainability-owns-structure rule.
- Drop the redundant @-inline of review-output-template: Stage 6 already
loads it on demand and keeps an inline fallback skeleton, so inlining
it again was duplicate always-loaded weight.
persona-catalog and subagent-template stay @-inlined -- they are needed
in every full review, so deferring them would save little while adding
load-reliability risk on the dispatch contract.
SKILL.md 703 -> 671 lines, 9,205 -> 8,561 words. Contract tests 29/0;
full suite unchanged (same pre-existing CLI failures, zero new).