Files
compound-engineering-plugin…/docs/skills/ce-code-review.md

16 KiB

ce-code-review

Structured code review using tiered persona agents, confidence-gated findings, and a merge/dedup pipeline.

ce-code-review is the deep code review skill. It analyzes the diff (PR, branch, or current changes), selects the right reviewer personas for what was actually touched, dispatches them in parallel, then merges and deduplicates their findings into a single report. Each finding carries a severity (P0-P3), an autofix class (safe_auto, gated_auto, manual, advisory), and an owner that determines what happens next. Safe deterministic fixes can be auto-applied; everything else routes through structured user decisions.

The compound-engineering ideation chain is /ce-ideate → /ce-brainstorm → /ce-plan → /ce-work. ce-code-review is /ce-work's Tier 2 escalation target — invoked automatically for sensitive surfaces, large diffs, or explicit deep-review requests, but also directly invocable any time you want a structured review of the current branch or a specific PR.


TL;DR

Question Answer
What does it do? Selects reviewer personas based on diff content, dispatches them in parallel, merges findings into one report with confidence gating and auto-fix routing
When to use it Before opening a PR for sensitive/large work; explicit deep review requested; harness has no built-in /review
What it produces A structured findings report — interactive review, applied fixes, residual work routed via the gate
Modes Interactive (default), Autofix, Report-only, Headless

The Problem

Generalist code review prompts collapse in predictable ways:

  • Surface-level findings — "consider adding tests" without naming what to test for
  • Wrong findings for the diff — security feedback on a doc-only change, performance feedback on a typo fix
  • No severity calibration — every finding presented as critical, drowning the actual P0s
  • No confidence calibration — speculative "could be a bug" presented identically to verified defects
  • One pass at one model's reasoning — a single reviewer biased toward whatever it was last trained on most heavily
  • No structured follow-through — findings end up in chat; no record, no fix queue, no residual handling
  • Mutating actions on the wrong checkout — running review on a shared checkout while another agent runs tests in parallel produces undefined outcomes

The Solution

ce-code-review runs review as a structured pipeline with explicit gates:

  • Diff-aware persona selection — 4 always-on reviewers + 2 CE always-on agents, plus cross-cutting and stack-specific personas chosen for what the diff actually touches
  • Parallel persona dispatch — each reviewer focuses on its lens; results return in parallel
  • Confidence-gated synthesis — findings merge, dedupe, promote on cross-persona agreement, and route by autofix class
  • Severity scale (P0-P3) + autofix class — separates urgency from action ownership
  • Four modes — Interactive, Autofix, Report-only, Headless — for different invocation contexts
  • Residual Work Gate — when autofix doesn't resolve everything, structured options for accept / file tickets / continue / stop
  • Quick-review short-circuit — defers to harness-native /review for light passes; multi-agent runs only when warranted

What Makes It Novel

1. Diff-aware persona selection

A small config change triggers 6 reviewers (the 4 always-on + 2 CE always-on). A Rails auth feature with migrations might trigger 10. The skill decides which personas fit the diff:

  • Always-on (every review)ce-correctness-reviewer, ce-testing-reviewer, ce-maintainability-reviewer, ce-project-standards-reviewer, ce-agent-native-reviewer, ce-learnings-researcher
  • Cross-cutting conditional — security, performance, API contract, data migrations, reliability, adversarial, previous-comments — each selected only when the diff touches its concern
  • Stack-specific conditional — Julik frontend races, Swift/iOS — only when the matching runtime domain is touched. Structural quality (complexity deletion, 1k-line regressions, spaghetti) lives in the always-on maintainability persona.
  • CE conditional (migrations)ce-deployment-verification-agent for risky migration diffs; schema drift and migration safety are handled by the data-migration persona

Persona selection is agent judgment, not keyword matching. Instruction-prose files (Markdown skills, JSON schemas) are product code but skip runtime-focused reviewers (adversarial, races) — they wouldn't apply.

2. Severity (P0-P3) and autofix class are orthogonal

Severity answers urgency (P0=critical breakage, P3=user discretion). The autofix class answers who acts next:

  • safe_autoreview-fixer enters the in-skill fixer queue automatically (only when mode allows mutation)
  • gated_auto → fix exists but changes behavior, contracts, or sensitive boundaries — routes to a downstream resolver or human
  • manual → actionable work for handoff
  • advisory → report-only output (learnings, rollout notes, residual risk)

Synthesis owns the final route. Persona-provided routing metadata is input, not the last word — disagreements default to the more conservative route.

3. Four modes — different invocation contexts

Mode When Behavior
Interactive (default) Direct user invocation Apply safe_auto fixes, ask policy decisions on gated_auto/manual, optionally continue into next steps
Autofix mode:autofix Apply safe_auto only; no user prompts; emit Residual Actionable Work summary for the caller
Report-only mode:report-only Strictly read-only; safe to run concurrently with browser tests on the same checkout
Headless mode:headless Programmatic mode for skill-to-skill; structured text output with all non-auto findings preserved

Modes that mutate the checkout refuse to switch branches on a shared checkout — they require an isolated worktree or base:<ref> to review without checkout-switching.

4. Quick-review short-circuit

When the user asks for a "quick", "fast", or "light" review, the skill defers to the harness-native code review (e.g., /review in Claude Code) instead of dispatching the multi-agent pipeline. This respects intent — sometimes the right tool is the lighter one. Programmatic callers (autofix / report-only / headless) bypass the short-circuit and always run the full pipeline.

5. Synthesis pipeline — merge, dedupe, promote, route

After all dispatched personas return, synthesis:

  • Validates each finding against the schema
  • Anchors to the actual diff (drops findings about lines that don't exist or aren't in scope)
  • Deduplicates across personas (same issue surfaced by multiple reviewers)
  • Promotes confidence on cross-persona agreement (two reviewers spotting the same issue raises priority)
  • Resolves contradictions (different personas disagree about what to do)
  • Auto-promotes safe-auto candidates that meet the bar
  • Routes by tier — applied fixes, gated/manual, FYI

The output is one report with calibrated severity, evidence quotes, and explicit ownership — not a flat list of every reviewer's raw output.

6. Plan discovery for requirements verification

When the diff has an associated plan (docs/plans/*.md), the skill discovers it (via plan: argument, PR body link, or auto-discovery from branch name) and reads its Requirements section + Implementation Units. Synthesis then verifies the diff actually satisfies those requirements — catching the case where the code looks fine but doesn't match what the plan said it should do.

7. Residual Work Gate

When autofix mode runs and the in-skill fixer can't resolve everything, the residual work doesn't just disappear into chat. The Residual Actionable Work summary lists each unresolved finding with stable numbering, severity, file:line, title, and autofix class. Callers (e.g., /ce-work Phase 3.4) read this summary and present user options: apply now, file tickets, accept with durable sink, or stop.

8. Protected artifacts

Compound-engineering pipeline artifacts (docs/brainstorms/*, docs/plans/*.md, docs/solutions/*.md) are protected — reviewers' findings to delete or gitignore them are discarded during synthesis. These are decision artifacts the pipeline depends on; reviewers shouldn't garbage-collect them.


Quick Example

You invoke /ce-code-review on a feature branch with a Rails auth change that includes a database migration.

The skill detects you're on a feature branch (no PR yet), resolves the base from origin/HEAD (or PR metadata when an open PR exists), and computes the diff. Stage 2 reads commit messages and writes a 2-3 line intent summary. Stage 2b auto-discovers the plan in docs/plans/ from the branch name and reads its Requirements (R1-R8, U1-U6).

Stage 3 selects reviewers: the 6 always-on, plus security (auth touched), reliability (background job for token cleanup), data-migration (migration file present), and deployment-verification agent when the migration is risky. Seven or eight reviewers total, dispatched in parallel.

After all return, synthesis merges 23 raw findings into 14 distinct findings. Three are safe_auto (typo, rename, dead code) and applied automatically. Six are gated_auto for the auth surface — routed into the interactive walk-through. Two are manual (deployment Go/No-Go checklist items). Three are advisory (FYI notes). Each finding has anchored evidence and a stable number.

You walk through the 6 gated findings, apply 4, defer 1 to follow-up via the tracker, and decline 1 with a cited harm. Final validation runs; the report is saved.


When to Reach For It

Reach for ce-code-review when:

  • You're about to open a PR for sensitive or large work (auth, payments, migrations, public APIs)
  • Your harness lacks a built-in /review and you still want a real review
  • You want structured handling of residual work, not just findings dumped in chat
  • You explicitly want a deeper, multi-persona pass (e.g., "review this thoroughly")
  • Another skill is escalating to it (/ce-work Phase 3.3 Tier 2, /ce-optimize Phase 4.3)

Skip ce-code-review when:

  • You want a quick light review — your harness's built-in /review is right; the short-circuit handles this
  • The change is trivial (typo, formatting, dependency bump) — Tier 1 review is sufficient
  • You want to fix bugs you find, not review code → use /ce-debug

Use as Part of the Workflow

ce-code-review is invoked from multiple skills as the deep-review path:

  • /ce-work Phase 3.3 — escalates to ce-code-review mode:autofix for sensitive surfaces, ≥400 lines + diffuse, ≥1,000 lines, or explicit thorough-review requests
  • /ce-work Phase 3.4 Residual Work Gate — reads the Residual Actionable Work summary ce-code-review returned and presents user options
  • /ce-optimize Phase 4.3 — runs against the cumulative optimization branch diff before merging
  • /ce-doc-review — sibling skill for docs (requirements, plans), not code

Tier 1 (harness-native /review) handles most cases; ce-code-review is the Tier 2 escalation.


Use Standalone

The skill works directly from any starting state:

  • Current branch/ce-code-review
  • Specific PR/ce-code-review 1234 or /ce-code-review <PR URL>
  • Specific branch/ce-code-review feat/notification-mute
  • With base ref/ce-code-review base:abc1234 or base:origin/main (skips scope detection; reviews against that ref)
  • With plan/ce-code-review plan:docs/plans/.../plan.md for explicit requirements verification

Concurrent use note: mode:report-only is the only mode safe to run alongside browser tests on the same checkout. Other modes mutate (apply safe_auto fixes); they need isolated checkouts when running concurrently.


Reference

Argument Effect
(empty) Reviews current branch (detects base from origin/HEAD or PR metadata)
<PR number or URL> Reviews that PR (checks out, fetches metadata, reviews against PR base)
<branch name> Checks out and reviews against detected base
base:<sha-or-ref> Skips scope detection; reviews current checkout against that ref
plan:<path> Loads the plan for requirements verification
mode:autofix No prompts; apply safe_auto only; emit Residual Actionable Work summary
mode:report-only Strictly read-only; safe with concurrent browser tests
mode:headless Skill-to-skill; structured text output

Conflicting mode flags stop execution with an error. Combining base: with a PR/branch target also errors — pass one or the other.


FAQ

Why not just use the harness's built-in /review? Use it when it's the right tool — the quick-review short-circuit defers to it explicitly. ce-code-review is for cases where you want diff-aware persona selection, structured findings with calibrated severity, autofix routing, and residual work handling. It's the heavier tool; reach for it when the work warrants.

How does it decide which personas to dispatch? Agent judgment over the actual diff — not keyword matching. The 4 always-on + 2 CE always-on personas run for every review. Cross-cutting and stack-specific personas are added when their concern is touched (e.g., security if auth files changed; ce-data-migration-reviewer when migration or schema dump files are present). Instruction-prose files skip runtime-focused reviewers (adversarial, races).

What's the difference between Autofix and Headless? Autofix applies safe_auto fixes silently and emits a Residual Actionable Work summary for the caller to route. Headless is similar but returns all findings as structured text (including safe_auto) and never enters bounded re-review rounds. Headless is for programmatic skill-to-skill invocation; Autofix is for orchestrators that own the residual-handling UI.

What's the Residual Work Gate? The structured presentation of findings the autofix pass couldn't resolve. The caller (typically /ce-work Phase 3.4) reads the summary and asks the user: apply now, file tickets, accept with durable sink, or stop. "Accept" requires a real durable record (Known Residuals in PR description, or docs/residual-review-findings/<sha>.md) — findings can't disappear into chat.

Why does it refuse to switch the shared checkout in some modes? Because mutating modes (Interactive, Autofix, Headless) write files. Switching the shared checkout while another agent is running tests or holding state produces undefined outcomes. The skill instead asks for base:<ref> (review the current checkout against a different ref) or an isolated worktree.

Can it run concurrently with browser tests? Only mode:report-only. The other modes mutate, so they need isolated checkouts.

Does it support non-software work? No — the skill is tightly coupled to git, code reviewers, and PR contexts. For docs (requirements, plans), use /ce-doc-review instead.


See Also

  • ce-work — primary upstream caller; escalates to ce-code-review at Phase 3.3
  • ce-doc-review — sibling skill for documents (requirements, plans), not code
  • ce-debug — for fixing bugs found during review, when root-cause investigation matters
  • ce-resolve-pr-feedback — handles incoming reviewer comments after a PR is open
  • ce-simplify-code — invoked by ce-work before review; complement, not substitute