compound-engineering-plugin-mirror/docs/solutions/skill-design/compound-refresh-skill-improvements.md at tmchow/https-github.com-EveryInc-compound-engineering-plugin-issues-867

averello/compound-engineering-plugin-mirror

Fork 0

mirror of https://github.com/EveryInc/compound-engineering-plugin.git synced 2026-06-19 15:41:46 +02:00

Files

T

Trevin Chow 8cc07acb52 docs(solutions): refresh stale references and prune obsolete doc (#712 )

2026-04-27 22:17:02 -07:00

8.2 KiB

Raw Permalink Blame History

title, category, date, module, component, tags, severity, description, related

title

Problem

The initial ce:compound-refresh skill had several design issues discovered during real-world testing:

Interactive questions never triggered the proper tool (AskUserQuestion) because the instruction used a weak "when available" qualifier
Auto-delete criteria contradicted a "always ask before deleting" rule in a later phase
Broad scope (9+ docs) asked the user to choose an area blindly without providing analysis
The Replace flow tried to hand off to ce:compound, which expects fresh problem-solving context the user doesn't have months later
Subagents used shell commands for file existence checks, triggering permission prompts
No way to run the skill unattended (e.g., on a schedule) — every run required user interaction

Root Cause

Five independent design issues, each with a distinct root cause:

Hardcoded tool name with escape hatch. Saying "Use AskUserQuestion when available" gave the model permission to skip the tool and just output text. Also non-portable to Codex and other platforms.
Contradictory rules across phases. Phase 2 defined auto-delete criteria. Phase 3 said "always ask before deleting" with no exception. The model followed Phase 3.
Question before evidence. The skill prompted scope selection before gathering any information about which areas were most stale or interconnected.
Unsatisfied precondition in cross-skill handoff. ce:compound expects a recently solved problem with fresh context. A maintenance refresh has investigation evidence instead — equivalent data, different shape.
No tool preference guidance for subagents. Without explicit instruction, subagents defaulted to bash for file operations.
Interactive-only design. Every phase assumed a user was present. No way to run autonomously for scheduled maintenance or hands-off sweeps.

Solution

1. Platform-agnostic interactive questions

Reference "the platform's interactive question tool" as the concept, with concrete examples:

Ask questions **one at a time** — use the platform's interactive question tool
(e.g. `AskUserQuestion` in Claude Code, `request_user_input` in Codex) and
**stop to wait for the answer** before continuing.

The "stop to wait" language removes the escape hatch. The examples help each platform's model select the right tool.

2. Auto-delete exemption for unambiguous cases

Phase 3 now defers to Phase 2's auto-delete criteria:

You are about to Delete a document **and** the evidence is not unambiguous
(see auto-delete criteria in Phase 2). When auto-delete criteria are met,
proceed without asking.

3. Smart triage for broad scope

When 9+ candidate docs are found, triage before asking:

Inventory — read frontmatter, group by module/component/category
Impact clustering — dense clusters of interconnected learnings + pattern docs are higher-impact than isolated docs
Spot-check drift — check whether primary referenced files still exist
Recommend — present the highest-impact cluster with rationale

Key insight: "code changed recently" is NOT a reliable staleness signal. Missing references in a high-impact cluster is the strongest signal.

4. Replacement subagents instead of ce:compound handoff

By the time a Replace is identified, Phase 1 investigation has already gathered the evidence that ce:compound would research:

The old learning's claims
What the current code actually does
Where and why the drift occurred

A replacement subagent writes the successor directly using ce:compound's document format (frontmatter, problem, root cause, solution, prevention). Run sequentially — one at a time — because each may read significant code.

When evidence is insufficient (e.g., entire subsystem replaced, new architecture too complex to understand from investigation alone), mark as stale and recommend ce:compound after the user's next encounter with that area.

5. Dedicated file tools over shell commands

Added to subagent strategy:

Subagents should use dedicated file search and read tools for investigation —
not shell commands. This avoids unnecessary permission prompts and is more
reliable across platforms.

6. Autofix mode for scheduled/unattended runs

Added mode:autofix argument support so the skill can run without user interaction (e.g., on a schedule, in CI, or when the user just wants a hands-off sweep).

Key design decisions:

Explicit opt-in only. mode:autofix must be in the arguments. Auto-detection based on tool availability was rejected because a user in an interactive agent without a question tool (e.g., Cursor, Windsurf) is still interactive — they just use plain-text replies.
Conservative confidence. Borderline cases that would get a user question in interactive mode get marked stale in autofix mode. Err toward stale-marking over incorrect action.
Detailed report as deliverable. Since no user was present, the output report includes full rationale for each action so a human can review after the fact.
Process everything. No scope narrowing questions — if no scope hint provided, process all docs. For broad scope, process clusters in impact order without asking.

Prevention

Skill review checklist additions

These five patterns should be checked during any skill review:

No hardcoded tool names — All tool references use capability-first language with platform examples and a plain-text fallback
No contradictory rules across phases — Trace each action type through all phases; verify absolute language ("always," "never") is not contradicted elsewhere
No blind user questions — Every question presented to the user is informed by evidence the agent gathered first
No unsatisfied cross-skill preconditions — Every skill handoff verifies the target skill's preconditions are met by the calling context
No shell commands for file operations in subagents — Subagent instructions explicitly prefer dedicated tools over shell commands
Autofix mode for long-running skills — Any skill that could run unattended should support an explicit opt-in mode with conservative confidence and detailed reporting

Key anti-patterns

Anti-pattern	Better pattern
"Use the AskUserQuestion tool when available"	"Use the platform's interactive question tool (e.g. AskUserQuestion in Claude Code, request_user_input in Codex)"
Defining auto-delete conditions, then "always ask before deleting"	Single-source-of-truth: define the rule once, reference it elsewhere
"Which area should we review?" before any investigation	Triage first, recommend with evidence, let user confirm or redirect
"Create a successor learning through ce:compound" during a refresh	Replacement subagent writes directly using gathered evidence
No tool guidance for subagents	"Use dedicated file search and read tools, not shell commands"
Auto-detecting "no question tool = headless"	Explicit `mode:autofix` argument — interactive agents without question tools are still interactive

Cross-References

PR #260: The PR containing all these improvements
Issue #204: Platform-agnostic tool references (AskUserQuestion dependency)
Issue #221: Motivating issue for maintenance at scale
PR #242: ce:audit (detection counterpart, closed)
PR #150: Established subagent context-isolation pattern

8.2 KiB Raw Permalink Blame History