Bear-mirror/plan.md
Laszlo Nagy b45d85e32e docs(meta): plan documentation restructure for agent workflows
Add plan.md capturing an eight-phase plan to reorganise the repo so
Claude Code (and humans) can find the right rule, write to the right
place, and keep documentation in sync with code.

The plan introduces a `docs/` parent for requirements and rationale,
single-source-of-truth files for configuration and CLI surface, sync
checks to catch drift, and a preflight phase plus recovery appendix
to make phase-by-phase execution safe.

Each subsequent commit on this branch should execute one phase.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 12:16:31 +00:00


Plan: Align Bear with agent workflows

This plan reorganises the repo so that an LLM agent (or a human) can find the right rule, write to the right place, and not be misled by inconsistencies between sibling documents. It also adds two pieces that are currently missing: a persistent home for design rationale, and a single source of truth for configuration and CLI surface that cannot silently drift from the code.

Execute the tasks in phase order. Tasks within a phase can be done in any order unless a dependency is called out.

Guiding principle: three roles for documentation

The end state separates three roles, each with one home and one drift-control mechanism. No piece of information lives in two places.

| Role | Where | Drift control |
|---|---|---|
| Contract -- what the user can expect | docs/requirements/*.md | Requirements: tag in tests, check-requirements-coverage.sh |
| Concrete user-facing surface -- config keys, CLI flags | docs/configuration.md, generated man page | check-config-coverage.sh, check-man-page.sh |
| Rationale -- why we chose this | docs/rationale/NNNN-*.md | Linked from the requirement(s) it supports |

Requirements stay contract-only. Concrete keys and flags do not appear in requirement bodies; they live in the surface documents and the source code, which the sync checks keep aligned. Rationale documents are the only home for design reasoning that would previously have been crammed into a Notes section or lost in a PR thread.

This principle drives Phases 3, 4, and 5. The earlier inconsistency where some requirements had an ## Implementation details section and others did not is resolved by removing that section everywhere, not by codifying it.

Conventions for the executing agent

  • Follow CLAUDE.md pre-commit rules: cargo fmt --check, cargo clippy --all-targets -- -D warnings, cargo test must pass before any commit. Run them after each phase, not just at the end.
  • ASCII only in every file written or edited (no em dashes, no smart quotes, no unicode bullets).
  • Use git mv for renames so history follows. Never copy + delete.
  • Land each phase as its own commit, conventional-commit style: docs(meta): ..., chore(scripts): .... Branch name suggestion: agent-workflow-restructure.
  • After each phase, update the routing tables in CLAUDE.md files so the repo never sits in a half-renamed state.
  • Run one phase per agent invocation. Each phase is designed to produce a single coherent commit that a human can review before the next phase starts. Do not chain phases inside one run.
  • If a phase's instructions and the current repo state disagree (e.g. a file the plan says to move has already been moved), stop and report rather than guessing. Appendix C describes how to recover from common partial states.
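The ASCII-only convention is easy to break by accident when pasting text. A minimal sketch of a check follows; the function name ascii_only is hypothetical and not part of the repo, and it assumes a grep that supports the -a flag (GNU and BSD grep both do).

```shell
# ascii_only FILE...
# Prints every line that contains a byte outside printable ASCII or ASCII
# whitespace, prefixed with file name and line number.
# Returns 0 when all files are clean, non-zero when an offending line was
# found or grep itself failed (e.g. a missing file).
ascii_only() {
    local rc=0
    # LC_ALL=C makes the character classes byte-oriented; -a treats the
    # input as text even when it contains high bytes.
    LC_ALL=C grep -a -n -H '[^[:print:][:space:]]' "$@" || rc=$?
    # grep exits 1 when no line matched, which is the clean case here.
    [ "$rc" -eq 1 ]
}
```

Usage, for example after editing two files: ascii_only plan.md docs/README.md || echo "non-ASCII text found".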

Phase 0: Preflight checks

Intent

Catch problems before any change lands: a dirty working tree, a broken baseline, or an out-of-date branch will compound errors through subsequent phases. This phase makes no commits.

Changes

  1. Run git status --short. Working tree must be clean. If not, stop and ask the human.
  2. Run the existing pre-commit triple on the starting branch and record the result:
    • cargo fmt --check
    • cargo clippy --all-targets -- -D warnings
    • cargo test
    If any of the three fails on the baseline, stop and report. The plan assumes a green baseline.
  3. Confirm the branch is agent-workflow-restructure (or another branch dedicated to this work). Never execute on master or on a release branch like 4.1.4-rc.
  4. Read CLAUDE.md (root), requirements/CLAUDE.md, and Appendix A and Appendix B at the bottom of this plan. These are the inputs the later phases assume you have in mind.

Acceptance

  • git status --short produces no output.
  • The pre-commit triple exits 0.
  • The current branch is the dedicated restructure branch.
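The first three preflight steps can be sketched as a pair of shell functions. The function names are hypothetical, and the release-branch pattern is an assumption generalised from the single example 4.1.4-rc; adjust it to the repo's actual naming.

```shell
# is_protected_branch NAME
# Returns 0 if NAME is a branch this plan must never run on.
# Assumption: release branches look like "4.1.4-rc" (digits.digits.digits-rc).
is_protected_branch() {
    case "$1" in
        master) return 0 ;;
        [0-9]*.[0-9]*.[0-9]*-rc*) return 0 ;;
        *) return 1 ;;
    esac
}

# preflight
# Runs the Phase 0 checks in plan order; makes no commits.
preflight() {
    local branch
    # Step 1: the working tree must be clean.
    if [ -n "$(git status --short)" ]; then
        echo "preflight: working tree is dirty, stop and ask the human" >&2
        return 1
    fi
    # Step 2: the baseline must be green.
    cargo fmt --check || return 1
    cargo clippy --all-targets -- -D warnings || return 1
    cargo test || return 1
    # Step 3: the work must happen on a dedicated branch.
    branch="$(git rev-parse --abbrev-ref HEAD)"
    if is_protected_branch "$branch"; then
        echo "preflight: refusing to run on branch $branch" >&2
        return 1
    fi
}
```

Run preflight from the repo root. Step 4 (reading the CLAUDE.md files and the appendices) has no mechanical equivalent and is left to the executing agent.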

Phase 1: Restructure the documentation tree

Intent

Today, requirements/ sits at the repo root and is the only place for project-level documents. The plan introduces docs/ as the parent for all human- and agent-readable documentation, with sibling folders for the two kinds of long-form content the repo needs: contracts (requirements) and rationale (decisions and research). Repo-hygiene scripts move into a sibling scripts/ folder so they are easy to discover and not confused with the documents they check.

Changes

  1. Create directory docs/.
  2. Rename requirements/ to docs/requirements/ using git mv. All existing .md files keep their names.
  3. Create directory docs/rationale/.
  4. Create directory scripts/.
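  5. Move the coverage script to scripts/check-requirements-coverage.sh with git mv. Its source path is requirements/check-coverage.sh before step 2 and docs/requirements/check-coverage.sh after it. Update every literal requirements path inside the script: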
    • Header comment "Run from the repo root: ./requirements/..." -> ./scripts/check-requirements-coverage.sh.
    • Existence guard ${repo_root}/requirements -> ${repo_root}/docs/requirements.
    • Error message that names the same path.
    • Glob ${repo_root}/requirements/*.md -> ${repo_root}/docs/requirements/*.md.
    The script_dir/.. computation still gives the repo root after the move, so no logic change is needed there.
  6. Add docs/README.md that lists what each subfolder contains, in two or three lines, and links to the man page under man/ for the user-facing CLI reference.

Acceptance

  • ls docs/ shows requirements/, rationale/, README.md (plus any files added in later phases).
  • ls scripts/ shows check-requirements-coverage.sh.
  • scripts/check-requirements-coverage.sh runs from the repo root with exit code 0 (or the same exit code as before the move).
  • requirements/ no longer exists at the repo root.
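The layout-related acceptance items can be checked mechanically. The sketch below is illustrative only; check_phase1_layout is a hypothetical helper, and it does not cover the third item (actually running the moved script).

```shell
# check_phase1_layout [ROOT]
# Verifies the directory layout Phase 1 is expected to leave behind.
# Returns 0 when the layout matches, 1 otherwise.
check_phase1_layout() {
    local root="${1:-.}" status=0 path
    for path in docs/requirements docs/rationale scripts; do
        if [ ! -d "$root/$path" ]; then
            echo "missing directory: $path" >&2
            status=1
        fi
    done
    for path in docs/README.md scripts/check-requirements-coverage.sh; do
        if [ ! -f "$root/$path" ]; then
            echo "missing file: $path" >&2
            status=1
        fi
    done
    # The old location must be gone, otherwise the rename was only partial.
    if [ -e "$root/requirements" ]; then
        echo "stale path still present: requirements/" >&2
        status=1
    fi
    return "$status"
}
```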

Phase 2: Define the rationale (ADR) system

Intent

Today there is no home for design rationale or vendor research that informed a decision but is not part of the user-visible contract. Notes sections of requirements become cramped when the rationale is substantial, and issue threads rot. Architecture Decision Records solve this with a small, stable format.

Changes

  1. Create docs/rationale/CLAUDE.md describing:
    • When to write a rationale document: when a design decision is non-obvious, when vendor research informed it, or when a future reader will ask "why didn't we do X instead?"
    • File naming: NNNN-short-kebab-case-title.md with a monotonically increasing four-digit number.
    • Required sections: Context, Decision, Consequences, References.
    • Linking: a rationale document links to the requirement(s) it supports. The requirement's Notes section links back.
  2. Create docs/rationale/_template.md with the four sections above and one-line guidance under each.
  3. Create docs/rationale/0001-document-restructure.md. This is the first rationale entry and records why this plan exists: the inconsistencies it resolves and the trade-offs chosen (e.g. ADRs over an in-requirement rationale dump). Link to the resulting structure.

Acceptance

  • docs/rationale/CLAUDE.md exists and answers the "when to write one" question in plain language.
  • docs/rationale/_template.md exists with the four sections.
  • docs/rationale/0001-document-restructure.md exists and follows the template.
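The NNNN numbering rule implies that whoever adds a rationale document needs the next free number. A small sketch of how that could be computed; next_rationale_number is a hypothetical helper, not something the plan requires.

```shell
# next_rationale_number [DIR]
# Prints the next free four-digit number for a rationale document.
# Files that do not start with four digits and a dash (_template.md,
# CLAUDE.md) are ignored.
next_rationale_number() {
    local dir="${1:-docs/rationale}" max=0 file base num
    for file in "$dir"/[0-9][0-9][0-9][0-9]-*.md; do
        [ -e "$file" ] || continue      # the glob matched nothing
        base="${file##*/}"              # strip the directory
        num="${base%%-*}"               # keep the NNNN prefix
        # Strip leading zeros so 0008 is not read as an invalid octal number.
        num="${num#"${num%%[!0]*}"}"
        [ -n "$num" ] || num=0
        if [ "$num" -gt "$max" ]; then
            max="$num"
        fi
    done
    printf '%04d\n' $((max + 1))
}
```

With only 0001-document-restructure.md present, the function prints 0002.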

Phase 3: Tighten the requirement template to contract-only

Intent

requirements/CLAUDE.md defines a requirement template, but the repo's actual practice has drifted: some requirements (e.g. output-path-format.md) carry an ## Implementation details section with literal YAML config keys, while others (e.g. output-compilation-entries.md) do not. With docs/configuration.md (Phase 4) and the generated man page (Phase 5) becoming the single sources of truth for concrete user-facing surface, the in-requirement Implementation details section becomes a duplicate place that can drift from those sources. The right resolution is to remove it everywhere, codify requirements as contract-only, and add an explicit ## Rationale section whose only job is to link to ADRs.

Changes

  1. Edit docs/requirements/CLAUDE.md (formerly requirements/CLAUDE.md):
    • State that requirements are contract-only: they describe what the user can expect, not where the bits live.
    • Forbid embedding literal config keys, CLI flag names, or schema fragments in a requirement body. Reference behaviour instead (e.g. "a configuration option toggles inlining"), and let docs/configuration.md name the key.
    • Remove any guidance that implies an ## Implementation details section should exist.
    • Add ## Rationale as an optional section, placed after ## Notes. Its body is a list of links to docs/rationale/NNNN-*.md documents that motivated the requirement. No prose beyond a one-line label per link.
  2. Audit the requirements listed in Appendix B (the exact set of files that currently contain ## Implementation details). Migrate the content of each section as follows. Because docs/configuration.md does not exist until Phase 4, the audit writes its findings to a staging file docs/configuration.draft.md that Phase 4 consumes and deletes:
    • Literal config keys, default values, YAML examples -> append to docs/configuration.draft.md with a header naming the source requirement, then remove from the requirement body.
    • Algorithm walkthroughs, error-handling tables, and other "how" content -> code comments next to the implementation, or delete if already obvious from the code.
    • Design choices, trade-offs, vendor research -> a new rationale document under docs/rationale/, linked from the requirement's ## Rationale section.
    • When the section is empty after migration, remove the heading.
    • Do not change acceptance criteria during this audit; the audit is structural only.
  3. Always create docs/configuration.draft.md even if every audited section ends up routed to rationale or code. An empty staging file with a single placeholder line is acceptable; Phase 4 needs a known file to consume.
  4. Worked example -- output-path-format.md (one of the entries in Appendix B). Its ## Implementation details opens with a YAML block describing format.paths.directory and format.paths.file. That YAML and the prose around it go to docs/configuration.draft.md. The subsequent "Strategy details" table describing how each strategy resolves paths is algorithm description; it goes to a code comment in the relevant module under bear/src/output/ (or is deleted if the code is already self-evident). The platform-constraint paragraph about Windows \\?\ prefix stripping is rationale tied to GitHub issue #683 and belongs in a new rationale document. After migration, the ## Implementation details heading is removed; the requirement body keeps only Intent, Acceptance criteria, Non-functional constraints, Testing, Notes, and the new ## Rationale link.

Acceptance

  • docs/requirements/CLAUDE.md states the contract-only rule and describes the ## Rationale section.
  • No file under docs/requirements/ contains an ## Implementation details section.
  • For each existing requirement touched, the commit message lists the destinations of migrated content (configuration entry, rationale document, or code comment).
  • cargo test and scripts/check-requirements-coverage.sh both pass.
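The second acceptance item reuses the grep that produced Appendix B. A minimal sketch, with contract_only as a hypothetical function name:

```shell
# contract_only [DIR]
# Lists every requirement file that still has an Implementation details
# heading. Returns 0 only when none is left.
contract_only() {
    local dir="${1:-docs/requirements}" rc=0
    grep -l '^## Implementation details' "$dir"/*.md || rc=$?
    # grep exits 1 when no file matched, which is the passing case.
    # Exit 0 (matches found) and exit 2 (grep error) both count as failure.
    [ "$rc" -eq 1 ]
}
```

Run it from the repo root after the audit; any file name it prints still needs migrating.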

Phase 4: Configuration reference and sync check

Intent

Bear's configuration schema lives in bear/src/config/types.rs. The public surface is currently documented (when it is documented at all) inside scattered Implementation details sections and the man page. There is no single document a user or agent can read to find every option, its type, its default, and the requirement that governs it. And nothing in the build catches a new option that is added without matching documentation. This phase establishes docs/configuration.md as the sole home for that surface, and migrates the YAML examples that Phase 3 strips out of requirement files into it.

Changes

  1. Create docs/configuration.md with one entry per public field of the configuration schema. Each entry contains:
    • Field path in dotted notation (e.g. format.paths.directory).
    • Type and accepted values.
    • Default value.
    • One-line description from the user's perspective.
    • Link to the governing requirement under docs/requirements/.
  2. Populate the document from two inputs:
    • The current state of bear/src/config/types.rs (the schema).
    • docs/configuration.draft.md, the staging file produced by the Phase 3 audit. Fold each entry into the appropriate field in docs/configuration.md, editing for consistency, then delete the staging file in the same commit. This is a one-time snapshot; the sync check below keeps it honest.
  3. Add scripts/check-config-coverage.sh. Scope it narrowly to keep the implementation tractable:
    • Walks bear/src/config/types.rs and collects only struct fields with a #[serde(rename = "...")] or implicit serde name, reachable from the top-level config struct.
    • For each collected field, looks for its dotted path in docs/configuration.md. Reports missing entries.
    • Out of scope for v1: enum variants (tagged enums like Intercept), elements of Vec<T> collections beyond their parent field name, and #[serde(flatten)] fields. Document these limitations at the top of the script and in docs/configuration.md. A future improvement can replace the shell script with a syn-based Rust binary that handles them fully; the script is sufficient to catch the common case of a newly added struct field that nobody documented.
  4. Add the check to the pre-commit set. Update CLAUDE.md (root) to list it alongside cargo fmt --check, cargo clippy --all-targets -- -D warnings, cargo test.
  5. Document the convention in docs/configuration.md itself: "Adding or removing a public config field requires updating this file in the same commit. The sync check enforces this."
  6. Smoke-test the script before committing: temporarily add a dummy struct field __plan_smoke: Option<()> to bear/src/config/types.rs, run scripts/check-config-coverage.sh, confirm it exits non-zero and names the dummy field. Revert the field with git checkout -- bear/src/config/types.rs. Do not commit the dummy.

Acceptance

  • docs/configuration.md exists and lists every option present in bear/src/config/types.rs at the time of writing.
  • docs/configuration.draft.md has been deleted; its contents are folded into docs/configuration.md.
  • scripts/check-config-coverage.sh exits 0 on the current tree.
  • CLAUDE.md lists the check as part of the mandatory pre-commit set.

(Drift detection can be verified manually by temporarily adding a dummy field to types.rs and re-running the check; do not commit the dummy field.)
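The comparison half of scripts/check-config-coverage.sh can be sketched independently of how field names are extracted from types.rs. The sketch assumes the extraction step has already produced a file with one dotted path per line; that file and the function name missing_config_paths are hypothetical. It uses a plain substring match, so a parent path such as format.paths counts as documented whenever one of its children is.

```shell
# missing_config_paths PATHS_FILE DOC_FILE
# Prints one line per dotted path that DOC_FILE never mentions.
# Returns 0 when every path is documented, 1 otherwise.
missing_config_paths() {
    local paths_file="$1" doc_file="$2" status=0 path
    while IFS= read -r path; do
        [ -n "$path" ] || continue      # skip blank lines
        # -F: match the path literally, so dots are not wildcards.
        if ! grep -q -F -- "$path" "$doc_file"; then
            echo "undocumented config field: $path"
            status=1
        fi
    done < "$paths_file"
    return "$status"
}
```

This mirrors the smoke test in step 6: with __plan_smoke in the path list and no entry for it in docs/configuration.md, the function names the field and returns non-zero.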


Phase 5: CLI reference and sync with clap

Intent

bear/src/args.rs defines the CLI via clap derive macros. man/bear.1.md is the user-facing reference and man/bear.1 is its generated form. Today the man page is hand-written, and bear/CLAUDE.md:48 notes that clap_mangen generation is "not yet implemented". This is the same drift risk as the config file, with the same answer: generate from the source of truth and check that the generated artefact is up to date.

Changes

  1. Add clap_mangen as a dev-dependency to the bear crate.
  2. Add a small binary bear/src/bin/generate-man.rs that takes the clap Command from args.rs and writes its troff output to a temporary path (e.g. target/man/bear.1).
  3. Leave man/bear.1.md as the human-authored source of truth and man/bear.1 as the form generated from it. No literal inclusion of clap-generated content into either file; this keeps splicing rules and pandoc behaviour untouched. The generator's role is to produce a reference artefact that the check script compares against.
  4. Add scripts/check-man-page.sh:
    • Runs the generator to produce the reference artefact.
    • Extracts the option list from both the reference artefact and man/bear.1.md (e.g. by collecting every long flag of the form --<name>).
    • Compares the two sets. Exits non-zero if either side has flags the other lacks.
    • Does not enforce ordering, surrounding prose, or formatting -- only flag-set parity.
  5. Document the workflow in man/CLAUDE.md: "After editing args.rs, update man/bear.1.md so its option list still matches what cargo run --bin generate-man produces. The sync check enforces parity."
  6. Add the check to the pre-commit set in root CLAUDE.md.
  7. Smoke-test the script before committing: temporarily add a dummy clap argument such as #[arg(long = "plan-smoke")] _smoke: bool to the top-level Args struct in bear/src/args.rs, run scripts/check-man-page.sh, confirm it exits non-zero and names --plan-smoke. Revert with git checkout -- bear/src/args.rs. Do not commit the dummy.

Acceptance

  • cargo run --bin generate-man produces output whose synopsis and option list match man/bear.1.md.
  • scripts/check-man-page.sh exits 0 on the current tree.
  • man/CLAUDE.md documents the regeneration step.

(Drift detection can be verified manually by temporarily adding a clap argument to args.rs and re-running the check; do not commit the change.)
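The flag-set comparison in step 4 can be sketched as follows. The function names are hypothetical. The sketch assumes that hyphens may appear escaped as \- in the troff output and in the Markdown source, and normalises them before matching; whether the real files do this should be confirmed against actual output.

```shell
# long_flags FILE
# Prints the sorted, de-duplicated set of long flags (--name) in FILE.
long_flags() {
    # Undo backslash-escaped hyphens first so both inputs look alike.
    sed 's/\\-/-/g' "$1" \
        | grep -o -e '--[a-z][a-z0-9-]*' \
        | LC_ALL=C sort -u
}

# compare_flag_sets REFERENCE PAGE
# Returns 0 when both files mention the same set of long flags.
compare_flag_sets() {
    local reference="$1" page="$2" workdir status=0 only_ref only_page
    workdir="$(mktemp -d)" || return 2
    long_flags "$reference" > "$workdir/reference"
    long_flags "$page" > "$workdir/page"
    # comm needs sorted input; long_flags sorts in the same locale.
    only_ref="$(LC_ALL=C comm -23 "$workdir/reference" "$workdir/page")"
    only_page="$(LC_ALL=C comm -13 "$workdir/reference" "$workdir/page")"
    rm -rf "$workdir"
    if [ -n "$only_ref" ]; then
        printf 'missing from the man page:\n%s\n' "$only_ref"
        status=1
    fi
    if [ -n "$only_page" ]; then
        printf 'documented but not in the reference artefact:\n%s\n' "$only_page"
        status=1
    fi
    return "$status"
}
```

A possible call, using the paths named in this phase: compare_flag_sets target/man/bear.1 man/bear.1.md. Only flag-set parity is checked, matching the scope set in step 4.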


Phase 6: Routing and scope updates

Intent

After Phases 1 through 5, several paths have changed and several new rules exist. The routing tables in every CLAUDE.md file must point at the new locations, and the always-loaded CLAUDE.md at the root must mention the new pre-commit checks. Subdirectory CLAUDE.md files should also surface the "write a requirement first for new features" rule, which currently lives only at the root.

Changes

  1. Update the routing table in root CLAUDE.md:

    • Add rows for docs/configuration.md, docs/rationale/, docs/requirements/, scripts/.
    • Update existing rows that pointed at requirements/.
    • Add a row for "Add or change a CLI flag" pointing at both bear/src/args.rs and man/CLAUDE.md.
  2. Update every CLAUDE.md listed in Appendix A so its references to requirements/ now read docs/requirements/. Verify by running grep -rn "requirements/" --include='CLAUDE.md' . from the repo root after the edits; the only hits should be the new docs/requirements/ path.

  3. Add a one-line reminder near the top of each subdirectory CLAUDE.md that touches user-visible behaviour: "New features require a requirement under docs/requirements/ before implementation. See root CLAUDE.md Decision protocol." Apply this at minimum to bear/CLAUDE.md, intercept-preload/CLAUDE.md, and bear/interpreters/CLAUDE.md.

  4. Update the pre-commit section of root CLAUDE.md to list:

    • cargo fmt --check
    • cargo clippy --all-targets -- -D warnings
    • cargo test
    • scripts/check-requirements-coverage.sh
    • scripts/check-config-coverage.sh
    • scripts/check-man-page.sh

    These checks are mandatory by convention but not mechanically enforced: they are run by humans or agents before committing, and no .git/hooks/pre-commit gate runs them. The CLAUDE.md listing is the contract; a future phase can add a composite script or CI gate if drift becomes a problem in practice.

Acceptance

  • No CLAUDE.md file references the old requirements/ path.
  • The root CLAUDE.md routing table includes entries for every document and script added in Phases 1 through 5.
  • Every subdirectory CLAUDE.md whose code can introduce a new feature points back at the Decision protocol.
  • Running all six pre-commit checks from the root succeeds.
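The grep verifier from step 2 leaves the reader to separate old-path hits from new-path hits by eye. A slightly stricter sketch that prints only the stale ones; stale_requirements_refs is a hypothetical helper.

```shell
# stale_requirements_refs [ROOT]
# Prints file:line: text for every line in a CLAUDE.md file that still
# mentions requirements/ outside of docs/requirements/.
stale_requirements_refs() {
    local root="${1:-.}"
    find "$root" -name CLAUDE.md \
        -not -path '*/target/*' -not -path '*/.git/*' \
        -exec awk '{
            line = $0
            # Drop the new path first; anything left over is a stale reference.
            gsub(/docs\/requirements\//, "", line)
            if (line ~ /requirements\//) print FILENAME ":" FNR ": " $0
        }' {} +
}
```

Empty output means the first acceptance item holds. A line that mentions both the old and the new path is still reported, which a plain grep -v filter would miss.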

Phase 7: Back-fill the response-file inlining rationale

Intent

The 4.1.4-rc branch added output-response-file-inlining.md with a brief Notes section that gestures at vendor differences. The full research (GCC keeps @file literal on missing files, Clang errors, MSVC has no nesting, tokenization differs per family, nvcc uses --options-file instead) was gathered during the design discussion and currently lives only in this conversation. Capturing it as a rationale document validates the new docs/rationale/ folder with a realistic example and preserves the receipts for any future revisit.

This phase runs only after 4.1.4-rc has been merged or rebased onto the restructure branch, so the requirement file is in its final location.

Changes

  1. Create docs/rationale/0002-response-file-inlining-design.md, following the Phase 2 template.

  2. Context section -- include the following per-compiler matrix verbatim. The executing agent should not have to re-derive it.

    | Compiler | @file syntax | Missing file | Nested @file |
    |---|---|---|---|
    | GCC | @file, GNU quoting | kept literal, no error | recursive |
    | Clang / Apple Clang | @file, GNU quoting | error | recursive |
    | clang-cl | @file, Windows quoting | error | recursive |
    | MSVC (cl.exe) | @file, Windows quoting | error | not supported |
    | flang / icx / armclang | @file, LLVM-based | error | recursive |
    | nvcc | uses --options-file/-optf instead | n/a | n/a |
    | IBM XL | uses -qoptfile=file instead | n/a | n/a |

    Tokenization differences: GCC/Clang-family use whitespace separators, single or double quotes, backslash-escape any character. MSVC family uses Windows command-line rules: only double quotes group, backslash escaping is positional.

    Cite sources: GCC manual ("Overall Options"), LLVM CommandLine docs, Microsoft "@ (Specify a Compiler Response File)", NVIDIA CUDA Compiler Driver NVCC.

  3. Decision section -- record the four chosen behaviours and the one-sentence reason each diverges from at least one compiler's own behaviour:

    • Opt-in, disabled by default: preserves the current contract documented in output-compilation-entries.
    • Warn-and-keep on missing file: matches GCC, deliberately diverges from Clang/MSVC so a stale build artefact does not fail the whole database.
    • Always recurse: matches GCC/Clang, deliberately exceeds MSVC's non-nesting limit because a successful build had already resolved any nesting.
    • Per-family tokenization: honours the only genuinely non-normalisable difference between compilers.
  4. Consequences section -- nvcc --options-file and IBM XL -qoptfile are out of scope for v1; if a user requests them later, add new acceptance criteria to the requirement rather than re-opening this rationale.

  5. References section -- link issue #701, the requirement file docs/requirements/output-response-file-inlining.md, and the four compiler documentation pages cited in step 2. Do not name any specific source file or test in the Bear codebase -- those paths may change before this rationale is read.

  6. Edit the Notes section of docs/requirements/output-response-file-inlining.md to link to the rationale document and remove text duplicated by it.

Acceptance

  • docs/rationale/0002-response-file-inlining-design.md exists and follows the template from Phase 2.
  • docs/requirements/output-response-file-inlining.md links to it from its Notes section.
  • scripts/check-requirements-coverage.sh still passes.
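Two of the chosen behaviours, warn-and-keep on a missing file and always recurse, can be illustrated with a toy expander. This is a sketch for the rationale document's reader, not Bear's implementation: it splits on whitespace only and ignores quoting entirely, so it does not model the per-family tokenization. The depth limit of 16 is an arbitrary assumption added to stop cyclic includes.

```shell
# expand_arg ARG [DEPTH]
# Prints ARG, or the recursively expanded content of the response file
# it names, one argument per line.
expand_arg() {
    local arg="$1" depth="${2:-0}" file token
    case "$arg" in
        @?*) file="${arg#@}" ;;
        *)   printf '%s\n' "$arg"; return 0 ;;
    esac
    # Warn-and-keep: an unreadable file or runaway nesting leaves the
    # argument literal instead of failing the whole expansion.
    if [ "$depth" -ge 16 ] || [ ! -f "$file" ] || [ ! -r "$file" ]; then
        echo "warning: keeping $arg literal" >&2
        printf '%s\n' "$arg"
        return 0
    fi
    # Always recurse: every token is expanded again, whatever the compiler.
    tr -s '[:space:]' '\n' < "$file" |
        while IFS= read -r token || [ -n "$token" ]; do
            if [ -n "$token" ]; then
                expand_arg "$token" $((depth + 1))
            fi
        done
}
```

For example, expand_arg @build.rsp prints the arguments found in build.rsp and in any response file it references, and passes through any @file whose target no longer exists.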

Phase 8: Decommission this plan

Intent

plan.md is a one-time execution document. Once the phases are complete, leaving it at the root invites confusion about whether it is still authoritative. Either preserve it as a rationale entry or remove it.

Changes

  1. If the plan's narrative still has historical value, move it to docs/rationale/0003-documentation-restructure-plan.md and edit the heading to past tense ("How Bear's documentation was restructured"). Otherwise, delete plan.md.
  2. Update any branch description or PR body that referenced plan.md at the root.

Acceptance

  • plan.md no longer exists at the repo root.
  • Either it has been archived under docs/rationale/ with a meaningful number, or it has been deleted and the commit message records the deletion.

Dependency summary

  • Phase 0 runs first, before anything else. No commit; aborts if the baseline is dirty or red.
  • Phase 2 has a soft dependency on Phase 1: it writes into docs/rationale/, which Phase 1 creates. Run Phase 1 first.
  • Phase 3 depends on Phase 1 (paths) and Phase 2 (rationale folder exists, so migrated content has a destination and the new ## Rationale section in the template can link somewhere).
  • Phase 4 depends on Phase 3 (the audit's migrated config content feeds into docs/configuration.md).
  • Phase 5 is independent of Phase 4 and depends only on Phases 1 and 3.
  • Phase 6 depends on Phases 1 through 5 being landed.
  • Phase 7 depends on Phase 2 and on 4.1.4-rc being merged.
  • Phase 8 is last.

Out of scope for this plan

  • Changing the build pipeline beyond adding clap_mangen and the three check scripts.
  • Splitting the bear crate, renaming any source modules, or changing the workspace layout.
  • Introducing a documentation site generator (mdBook, Antora, etc.). Plain Markdown under docs/ is sufficient for now; a site generator can be a future rationale entry of its own.

Appendix A: CLAUDE.md inventory

These are the CLAUDE.md files that exist in the repo at the time this plan was written. Phase 6 step 2 must update every one whose content references requirements/. Phase 6 step 3 surfaces the Decision protocol in the subset marked with a *.

  • CLAUDE.md (root)
  • bear/CLAUDE.md *
  • bear/interpreters/CLAUDE.md *
  • bear-codegen/CLAUDE.md
  • bear-completions/CLAUDE.md
  • intercept-preload/CLAUDE.md *
  • integration-tests/CLAUDE.md
  • man/CLAUDE.md
  • platform-checks/CLAUDE.md
  • requirements/CLAUDE.md (becomes docs/requirements/CLAUDE.md in Phase 1)

If find . -name CLAUDE.md -not -path './target/*' -not -path './.git/*' returns a file not in this list, the plan is out of date with the repo: stop and report rather than guessing what the new file should say.
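The inventory comparison can be sketched as below. It assumes the list above has been saved to a plain-text file with one repo-relative path per line and without the * markers; that file and the function name unexpected_claude_files are hypothetical.

```shell
# unexpected_claude_files ROOT INVENTORY
# Prints every CLAUDE.md under ROOT that INVENTORY does not list.
# Exit status follows grep: 0 when at least one unexpected file was found.
unexpected_claude_files() {
    local root="$1" inventory="$2"
    (cd "$root" && find . -name CLAUDE.md \
        -not -path './target/*' -not -path './.git/*') \
        | sed 's#^\./##' \
        | grep -v -x -F -f "$inventory"
}
```

Any path it prints means the plan is out of date with the repo, which is the stop-and-report case described above. Give the inventory as an absolute path or a path relative to the caller's directory, since only the find runs inside ROOT.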

Appendix B: Requirements with Implementation details sections

These are the exact files Phase 3 step 2 must audit. They were identified by grep -l '^## Implementation details' requirements/*.md at the time this plan was written. Any other requirement file is out of scope for migration.

  1. requirements/output-json-compilation-database.md
  2. requirements/output-path-format.md
  3. requirements/output-source-directory-filter.md
  4. requirements/interception-compiler-env-with-flags.md
  5. requirements/interception-preload-mechanism.md
  6. requirements/interception-wrapper-mechanism.md
  7. requirements/interception-wrapper-recursion.md
  8. requirements/output-append.md
  9. requirements/output-atomic-write.md
  10. requirements/output-duplicate-detection.md

(After Phase 1, these paths begin with docs/requirements/.)

If the audit finds a ## Implementation details heading in a requirement file not on this list, the plan is out of date: stop and report. If a file on this list no longer has the heading by the time Phase 3 runs, skip it silently and note this in the commit message.

Appendix C: Recovery from partial states

If a phase aborts partway, the safe action is to revert the working tree and start the phase over rather than try to patch forward. Specific recoveries:

  • Phase 1 partly applied (some files moved, some not): run git status --short; if both old and new paths exist for the same file, git restore the affected files and re-run Phase 1 from scratch.
  • Phase 3 produced docs/configuration.draft.md but Phase 4 did not consume it: the file is harmless to keep across invocations. Phase 4 step 2 will pick it up. Do not delete it manually.
  • A check script (Phase 4 or 5) still returns non-zero after the smoke-test dummy has been reverted: the script itself has a bug. Fix the script in the same commit; do not commit a passing smoke-test if the underlying detection is wrong.
  • Routing edits in Phase 6 left a stale requirements/ path in some CLAUDE.md: re-run the grep verifier from Phase 6 step 2 and patch the remaining file in a follow-up commit.
  • Pre-commit triple fails after a phase commit: revert the commit (git reset --hard HEAD~1 is acceptable on the dedicated restructure branch), fix the failure, re-commit. Do not chain phases on top of a red baseline.