fix(ce-plan): add answer-seeking disposition to universal planning

ce-plan bailed out of the workflow on investigative or analytical
questions, treating them as non-planning tasks and answering ad hoc
(or, for software-topic questions, over-planning them into a plan
document the user never asked for). Both contradicted the skill's own
"when directly invoked, always plan" rule.

Add an answer-seeking disposition to the universal-planning path: when
the deliverable is an answer rather than a saved plan, ce-plan states a
brief, right-sized plan-of-attack as working scaffold, executes it
(research and synthesis only, never code), and delivers the answer in
chat with no plan file. Plan-seeking tasks (trips, study plans, events)
still produce their saved plan as before; a cross-contamination guard
keeps the two flows from bleeding into each other.

- Narrow the universal-planning exit gate to genuinely trivial
  single-fact lookups; multi-step, retrieval, or synthesis questions no
  longer bail.
- Route by task-type, not topic, in Phase 0.1b: a question that merely
  references code, repos, or APIs is answer-seeking, not software work,
  while build/modify/refactor requests stay on the implementation path.
- Add veil-of-value guidance so answers surface question-domain
  thinking and hide skill machinery, while never hiding caveats that
  affect trust in the answer.
- Sync the ce-plan skill doc with the two-dispositions mechanic.
This commit is contained in:
Trevin Chow
2026-05-31 10:21:48 -07:00
parent 85987d496f
commit 8351c4daaf
3 changed files with 58 additions and 3 deletions
+2
View File
@@ -94,6 +94,8 @@ Phase 1 always runs local research in parallel — repo-research-analyst (techno
The guardrails-not-choreography frame transfers cleanly across domains. Real (non-hypothetical) uses include annual hot-water-tank maintenance, study plans, trip planning, research workflows, and event planning. The non-software path skips the software-specific confidence check, but U-IDs, dependency ordering, scope boundaries, test/verification scenarios, and the right-sized template all carry over unchanged.
Universal planning also distinguishes two **dispositions**. *Plan-seeking* tasks (trip, study curriculum, event) produce a saved plan — the artifact is the deliverable. *Answer-seeking* tasks are investigative or analytical questions ("how often does X happen — is it a big deal?", "how does our approach compare to Y?") where the *answer* is the deliverable and no one wants a plan file. For those, `ce-plan` doesn't bail and doesn't write a document: it states a brief, right-sized plan-of-attack in chat — working scaffold that both steers the agent and shows the human the approach — then executes it (research and synthesis, never code) and delivers the answer. The plan is spoken in the language of the question, not the language of the skill; internal machinery stays hidden while caveats that affect trust in the answer are always surfaced. Only genuinely trivial single-fact lookups skip planning entirely and get answered outright.
---
## Quick Example
@@ -109,7 +109,9 @@ If the plan already has a `deepened: YYYY-MM-DD` frontmatter field and there is
#### 0.1b Classify Task Domain
If the task involves building, modifying, or architecting software (references code, repos, APIs, databases, or asks to build/modify/deploy), continue to Phase 0.2.
If the task asks to build, modify, refactor, deploy, or architect software (code, schemas, infrastructure), continue to Phase 0.2.
Classify by task-type, not topic. A request that merely *references* code, a repo, an API, or a database is not automatically software work: building or modifying code is software; investigating or analyzing it is an answer-seeking question. "How often does X star repos — is it a big deal?" or "how does our approach compare to Y?" route to `references/universal-planning.md` (answer-seeking), not the implementation-plan path.
If the domain is genuinely ambiguous (e.g., "plan a migration" with no other context), ask the user before routing.
@@ -7,10 +7,61 @@ This file is loaded when ce-plan detects a non-software task (Phase 0.1b). It re
The detection stub in SKILL.md routes here for anything that isn't clearly software. Verify the classification is correct before proceeding:
- **Is this actually a software task?** The key distinction is task-type, not topic-domain. A study guide about Rust is non-software (producing educational content). A Rust library refactor is software (modifying code). If this is actually software, return to Phase 0.2 in the main SKILL.md.
- **Is this a quick-help request, not a planning task?** Error messages, factual questions, and single-step tasks don't need a plan. Respond directly and exit. Examples: "zsh: command not found: brew", "what's the capital of France."
- **Is this a trivial single-fact lookup?** Only a question answerable from one fact with no research, retrieval, or judgment skips planning — answer it directly and stop, in the user's terms. Do not narrate that it "isn't a planning task" or explain the routing; that is process exhaust (see Veil of value below). Examples: "zsh: command not found: brew", "what's the capital of France." A question that needs multiple steps, any retrieval, or synthesis to answer well does **not** qualify: it is an answer-seeking task (see Disposition below), not a quick-help exit. When unsure, do not exit.
- **Pipeline mode?** If invoked from LFG or any `disable-model-invocation` context: output "This is a non-software task. The LFG pipeline requires ce-work, which only supports software tasks. Use `/ce-plan` directly for non-software planning." and stop.
Once past these checks, commit to producing a plan. Do not exit because the task looks like a "lookup" or "research question" — the user invoked `ce-plan` because they want a structured output.
Once past these checks, commit to the task — do not bail because it looks like a "lookup" or "research question." The user invoked the planning tool on purpose. Then choose the disposition below.
---
## Disposition: plan-seeking vs. answer-seeking
Two kinds of task land here, with different deliverables:
- **Plan-seeking** — the deliverable is a *plan*: a trip itinerary, a study curriculum, an event runbook, a project plan. The plan is the artifact, saved or shared. → Follow Steps 1-3 below.
- **Answer-seeking** — the deliverable is an *answer*: an investigative or analytical question ("how often does X happen — is it a big deal?", "how does our approach compare to Y?", "should we Z?"). No one wants a saved plan document for this; planning is the means to a good answer, not the output. → Follow the **Answer-seeking flow** below; skip the Step 3 artifact handling.
If a request blends both ("research X, then plan Y"), do the answer-seeking research first, then produce the plan artifact.
Commit to one disposition before reading further, and follow only that flow: a plan-seeking task still produces its plan document (Steps 1-3) and does not stop at a chat answer; an answer-seeking task does not write a plan file.
---
## Answer-seeking flow
The planning instinct still applies — but the plan is *working scaffold*, not an artifact. State it in chat to steer the work and show the human the approach; execute it; discard it. No plan file is written.
### State a brief plan-of-attack, then proceed
Say how the question will be answered, right-sized to it: a light question gets a one-line approach; a multi-part analytical question gets a short bulleted plan (a few steps). This is **non-blocking** — announce the approach and continue immediately. Do not ask the user to approve the plan; the stated approach is itself the checkpoint, and the user can interrupt if the framing is wrong. Stop to ask only on a genuine fork the agent cannot resolve (e.g., "his personal account or the org's?").
### Execute the plan
Carry out the approach. When the answer depends on facts the model can't reliably supply from memory — current data, recent events, specifics that drift — gather them using the **Research decomposition pattern** under Step 1 below (decompose into focused questions, dispatch in parallel via the platform's subagent/web primitive, collate). Skip research for anything the model already knows well.
**Execution here is research and analysis only — never code.** Answering a question by gathering and synthesizing information is in-bounds; writing or running code to change the system is not — that belongs in `ce-work`. This keeps the planning/execution boundary intact.
### Deliver the answer
Answer in chat. Do **not** write a plan file and do **not** run the Step 3 save/share menu by default. If the investigation produced something the user might want to keep (a comparison table, a sourced summary), offer to save it; otherwise just give the answer. In headless or non-interactive runs, skip the offer and deliver the answer.
### Veil of value: what to surface, what to hide
The plan-of-attack and the answer are for the caller; the skill's internal machinery is not. Edit for relevance the way an expert consultant does — they tell you their thinking about your problem, not which template their back office applied.
- **Surface** (question-domain — reads as value): the approach to the user's actual question, in the user's terms.
- **Hide** (skill-domain — process exhaust): which skill, mode, or phase is running; whether a plan file was or wasn't written; the routing or disposition decision itself.
- **Never hide** (audit content — affects trust in the answer): caveats, gaps, and uncertainty. "I could only pull his last ~100 stars, so this is partial" or "this is my read, not a hard signal" is not junk — it is what a good assistant surfaces. The veil hides plumbing, never the limits of the answer.
Register example, for "how often does he star things — is this a big deal?":
> Wrong: "Quick note first: /ce-plan builds implementation plans, so I ignored the template and just answered the question. Here's what the data says..."
Leaks the skill's name, narrates an internal routing decision, apologizes for deviating — the caller sees the seams of the tool.
> Right: "Let me size this up — I'll check how active a starrer he is overall, his recent cadence, and the kinds of repos he tends to star, then weigh where this one lands. [gathers data] Yes, this is a real signal: ..."
Same underlying process; none of the machinery surfaces. The caller sees thinking about their question.
---