`cc` and `c++` are GCC on most Linux distributions but Clang on
FreeBSD, OpenBSD, NetBSD, DragonFly, and macOS. The regex defaulted
them to GCC, which corrupted the compilation database on those hosts
via wrong flag-arity tables (e.g. Clang's `-Xclang <arg>`
consumes the next argv slot, which the GCC table does not account for).
Recognition now runs `--version` lazily for these ambiguous
basenames, classifies by signature, and dispatches accordingly.
The probe is the sole classifier: gcc.yaml deliberately omits
`cc`/`c++`, so a failed probe returns NotRecognized rather than
guessing -- a missing entry is visible and debuggable, whereas a
wrongly-classified entry corrupts the database silently via
mismatched flag-arity tables (the bug this work exists to fix).
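
A minimal sketch of what "classifies by signature" could look like.
The substrings checked here are assumptions based on typical
`--version` banners, not the signatures the real probe uses, and
`Family` / `classify_version_output` are hypothetical names:

```rust
/// Compiler families the probe needs to tell apart (illustrative).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Family {
    Gcc,
    Clang,
}

/// Classify the text printed by `cc --version`.
///
/// Returns `None` when no known signature is present, so the caller
/// can report "not recognized" instead of guessing.
pub fn classify_version_output(output: &str) -> Option<Family> {
    let lower = output.to_ascii_lowercase();
    // Assumed signature: Clang banners contain the word "clang"
    // (e.g. "FreeBSD clang version ...", "Apple clang version ...").
    if lower.contains("clang") {
        return Some(Family::Clang);
    }
    // Assumed signature: GCC banners carry the FSF copyright line
    // even when the binary is invoked as `cc`.
    if lower.contains("free software foundation") || lower.contains("gcc") {
        return Some(Family::Gcc);
    }
    None
}
```

An empty or unexpected banner yields `None`, which corresponds to the
NotRecognized outcome described above.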
Layered design:
- CompilerRecognizer dispatches.
- CompilerProbe classifies. VersionProbe on Unix (hardened:
closed stdin, process-group SIGKILL on timeout, LD_PRELOAD /
DYLD_INSERT_LIBRARIES stripped); NoProbe on Windows where
basenames are unambiguous and the Unix subprocess primitives
the probe relies on aren't available.
- CachingProbe memoizes the probe's verdict per canonical path
so each unique compiler is fork-exec'd at most once per
process.
A user `compilers:` entry preempts the probe -- the sole
supported override.
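
The memoization layer can be pictured roughly as follows. This is a
sketch under assumptions, not the actual CachingProbe: the `Verdict`
enum, the closure-based inner probe, and the fallback when
canonicalization fails are all hypothetical:

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::sync::Mutex;

/// Outcome of probing one compiler executable (illustrative).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Gcc,
    Clang,
    NotRecognized,
}

/// Wraps an inner probe and remembers its verdict per canonical path.
pub struct CachingProbe<F>
where
    F: Fn(&Path) -> Verdict,
{
    inner: F,
    cache: Mutex<HashMap<PathBuf, Verdict>>,
}

impl<F> CachingProbe<F>
where
    F: Fn(&Path) -> Verdict,
{
    pub fn new(inner: F) -> Self {
        Self { inner, cache: Mutex::new(HashMap::new()) }
    }

    pub fn classify(&self, path: &Path) -> Verdict {
        // Assumption: fall back to the path as given when it cannot
        // be canonicalized (e.g. the file does not exist).
        let key = std::fs::canonicalize(path).unwrap_or_else(|_| path.to_path_buf());
        // The lock is held across the inner call, so concurrent
        // callers wait instead of spawning a second probe.
        let mut cache = self.cache.lock().unwrap();
        if let Some(verdict) = cache.get(&key) {
            return *verdict;
        }
        let verdict = (self.inner)(key.as_path());
        cache.insert(key, verdict);
        verdict
    }
}
```

Keying on the canonical path means symlinked names that resolve to the
same binary share one cached verdict.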
Also simplifies the WrapperInterpreter that the probe work
exposed: replaces the cyclic Arc::new_cyclic +
OnceLock<Box<dyn Interpreter>> + Weak<dyn Interpreter> machinery
with a flat wrapper::unwrap() helper called inline from
CompilerInterpreter::recognize.
See requirements/recognition-ambiguous-name-probe.md for the spec.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Item #5 consolidated platform-checks but left the build pipeline
under-documented: contributors had to read three build.rs files
to find out who runs what, and the lld prerequisite was a silent
trap. This commit spreads that knowledge across CLAUDE.md files,
co-located with the crates each piece belongs to:
- New platform-checks/CLAUDE.md: role, post-Item-#5 public API,
recipe for adding a probe, scope boundary against the
intercept-family list.
- New bear-codegen/CLAUDE.md: build-time codegen role; YAML to
OUT_DIR via include!(); snapshot-test contract.
- New bear-completions/CLAUDE.md: why a separate crate
(clap_complete cost), how it's actually invoked (distributor
runs it; install.sh only picks up pre-generated files).
- Top-level CLAUDE.md: short "Build pipeline" routing section
pointing at the per-crate files; "Host requirements" calling
out lld as a Linux-only prerequisite; routing table grew
three entries.
- bear/CLAUDE.md: replaced the narrow "Code generation"
subsection with a "Build script" section that also covers
INTERCEPT_LIBDIR validation and the rustc-env emissions
consumed by installation.rs.
- intercept-preload/CLAUDE.md: "Build script duties" listing
the cc-shim build, exports list, and link directives, plus a
pointer at src/c/shim.c as the source of truth for
INTERCEPT_FAMILY.
- integration-tests/CLAUDE.md: "Build script duties" describing
the executable probes (single vs grouped cfgs) and the
ccache-masquerade detection.
Side cleanup: dropped the dead `cargo:rustc-cfg=build_cdylib`
directive from intercept-preload/build.rs. It was emitted but
read by no source -- the existing comment claiming it forced
cdylib generation was misleading; cdylib production is decided
by Cargo.toml's crate-type, not by the cfg.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bear-codegen's identifier check is replaced with a hand-rolled
validator (the only regex usage was a 4-char-class pattern).
bear's compiler recognizer switches from regex to regex-lite.
This drops regex, regex-syntax, regex-automata, and aho-corasick
(~6.3s combined) from production builds; regex-lite (~0.5s)
takes their place.
The regex-lite engine is a pure NFA with no DFA/SIMD optimizations,
but the recognizer runs in the post-build semantic pass on
short anchored ASCII inputs, not on the LD_PRELOAD per-syscall
hot path. If profiling ever flags it, memoize by filename.
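
For illustration, a hand-rolled identifier check of the kind described
above might look like this. The exact pattern it replaced is not quoted
in this message; the sketch assumes something like
`^[A-Za-z_][A-Za-z0-9_]*$`, and `is_valid_identifier` is a hypothetical
name:

```rust
/// Returns true when `s` is a non-empty ASCII identifier: a letter or
/// underscore followed by letters, digits, or underscores.
/// (Assumed rule; the real validator may differ.)
pub fn is_valid_identifier(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        Some(c) if c.is_ascii_alphabetic() || c == '_' => {}
        // Empty string or an invalid first character.
        _ => return false,
    }
    chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
}
```

A check this small needs no regex engine at all, which is what lets the
codegen crate drop the dependency.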
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bump three constraints in the workspace table:
- ctor 0.4 -> 0.6 to align with the version Fedora is staging in
rust-ctor PR #2, removing the need for a downstream relaxation
patch.
- signal-hook 0.3 -> 0.4 and serde-saphyr 0.0.22 -> 0.0.24 to
pick up upstream patch fixes.
None require source changes.
Raise the insta floor to 1.46 and proptest to 1.11 so the manifest
stops advertising stale minimums; both stay at-or-below what Fedora
rawhide ships, so no new substitution constraints are imposed on
packagers.
Hoist the four remaining inline dev-deps (mockall, insta, proptest,
encoding_rs) into [workspace.dependencies] so every dependency
version lives in one place.

cl.exe accepts a warning-number argument either glued (/wd4995) or as
a separate token (/wd 4995); the latter form is widely used in nmake
Makefiles. Three gaps on the current 4.1.2-rc tip:
1. `/wd*`, `/we*`, `/wo*` used a plain prefix pattern that matches the
glued form only. When bear saw "/wd 4995", the flag consumed zero
extra args and the trailing numeric token was reclassified as a
Source and dropped from compile_commands.json. clangd then emitted
drv_invalid_int_value for every translation unit.
2. `/w1nnnn`, `/w2nnnn`, `/w3nnnn`, `/w4nnnn` (set warning level for a
specific warning) were not defined at all, so `/w1 4326` was split
into an unknown flag plus an orphan numeric token.
3. `/Wv[:version]` was not defined. Both the bare `/Wv` form (cl uses
the current compiler version when omitted) and `/Wv:17` were
affected.
All three classes are documented on the MS warning-level options page:
https://learn.microsoft.com/en-us/cpp/build/reference/compiler-option-warning-level
Fix:
* /wd*, /we*, /wo* -> /wd{ }*, /we{ }*, /wo{ }*
(ExactlyWithGluedOrSep, matching /D, /I, /U, /FI).
* Add /w1{ }*, /w2{ }*, /w3{ }*, /w4{ }*.
* Add /Wv (exact) plus /Wv:* (ExactlyWithColon, required value).
clang_cl.yaml inherits the fix via `extends: msvc`. Codegen snapshot
fixtures updated accordingly.
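
The glued-or-separate behaviour can be sketched as below. This is not
the generated matcher, only an illustration of the arity rule;
`match_glued_or_sep` is a hypothetical helper:

```rust
/// Try to match `prefix` at the start of `args`.
///
/// Returns how many argv tokens the flag consumes and its value:
/// - glued form (`/wd4995`) consumes 1 token,
/// - separate form (`/wd 4995`) consumes 2 tokens.
pub fn match_glued_or_sep<'a>(prefix: &str, args: &[&'a str]) -> Option<(usize, &'a str)> {
    let first = *args.first()?;
    let rest = first.strip_prefix(prefix)?;
    if !rest.is_empty() {
        // Value is attached to the flag itself.
        return Some((1, rest));
    }
    // Bare flag: the value must be the next token.
    let value = *args.get(1)?;
    Some((2, value))
}
```

With a prefix-only pattern, the second case consumed one token and left
`4995` behind to be misread as a source file, which is the failure
described in gap 1.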
Two integration tests in integration-tests/tests/cases/semantic.rs
cover all three classes.
Manually verified by scc-tw <scc@scc.tw>.
Closes: #690
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Every fallible function in bear-codegen now returns anyhow::Result
instead of panicking. Error context chains give YAML authors readable
messages when something is wrong:
    error: flag codegen failed
    caused by: generating flags for gcc.yaml
    caused by: flag '-c'
    caused by: unknown result value 'bogus_value'
Changes:
- result_to_rust returns Result instead of panicking
- EnvEntry::validate returns anyhow::Result (was Result<_, String>)
- EnvMappingYaml::to_rust returns Result instead of unreachable!()
- resolve_flags returns anyhow::Result (was Result<_, String>)
- ResolvedTable::new and generate return Result
- generate() returns Result; build.rs prints the error chain on failure
- load_tables returns Result
- All callers use .context() / .with_context() for layered messages
- Tests assert on .is_err() and error message content
- Removed EnvEntry::validate yaml_file parameter (context added by caller)
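
To show how layered context produces a chain like the one above, here
is a std-only sketch. It deliberately does not use anyhow; the
`Context` type and `render_chain` function are hypothetical stand-ins
for what `.context()` and the build.rs error printer do:

```rust
use std::error::Error;
use std::fmt;

/// One layer of context wrapping an optional underlying cause.
#[derive(Debug)]
pub struct Context {
    message: String,
    source: Option<Box<dyn Error + 'static>>,
}

impl Context {
    /// The innermost error, with no cause of its own.
    pub fn root(message: impl Into<String>) -> Self {
        Context { message: message.into(), source: None }
    }

    /// Wrap `self` in an outer layer describing what was being done.
    pub fn wrap(self, message: impl Into<String>) -> Self {
        Context { message: message.into(), source: Some(Box::new(self)) }
    }
}

impl fmt::Display for Context {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.message)
    }
}

impl Error for Context {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        self.source.as_deref()
    }
}

/// Render an error and every cause beneath it, outermost first.
pub fn render_chain(err: &dyn Error) -> String {
    let mut out = format!("error: {}", err);
    let mut current = err.source();
    while let Some(cause) = current {
        out.push_str(&format!("\ncaused by: {}", cause));
        current = cause.source();
    }
    out
}
```

Each caller adds only the context it knows about (file, flag, value),
which is why the yaml_file parameter could be removed from
EnvEntry::validate.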
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
resolve_flags now deduplicates inherited flags by (pattern, count):
- Same pattern+count with same result: silently deduped (child's kept)
- Same pattern+count with different result: build error
This removes 11 redundant flag entries across the real YAML files:
- clang.yaml had 5 flags duplicating gcc.yaml
- cuda.yaml had 6 flags duplicating gcc.yaml
All duplicates had identical results - no conflicts in current data.
If someone adds a flag to a child YAML that contradicts the parent's
result for the same pattern, the build now fails with a clear message
naming the pattern and both results.
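
The dedup rule can be sketched as follows. The `Flag` struct, its field
types, and `merge_flags` are assumptions for illustration; the real
resolve_flags operates on the crate's own YAML types:

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Simplified flag entry (hypothetical shape).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Flag {
    pub pattern: String,
    pub count: u32,
    pub result: String,
}

/// Merge child flags with inherited ones, keyed by (pattern, count).
///
/// Child entries come first, so on an exact duplicate the child's
/// entry is the one kept. A differing result is an error.
pub fn merge_flags(child: Vec<Flag>, parent: Vec<Flag>) -> Result<Vec<Flag>, String> {
    let mut seen: HashMap<(String, u32), String> = HashMap::new();
    let mut merged = Vec::new();
    for flag in child.into_iter().chain(parent) {
        match seen.entry((flag.pattern.clone(), flag.count)) {
            Entry::Vacant(slot) => {
                slot.insert(flag.result.clone());
                merged.push(flag);
            }
            // Same key, same result: silently drop the later copy.
            Entry::Occupied(slot) if *slot.get() == flag.result => {}
            // Same key, different result: refuse to build.
            Entry::Occupied(slot) => {
                return Err(format!(
                    "conflicting results for pattern '{}': '{}' vs '{}'",
                    flag.pattern,
                    slot.get(),
                    flag.result
                ));
            }
        }
    }
    Ok(merged)
}
```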
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
flags, ignore_when, and slash_prefix only resolved one level of the
extends chain, while environment was fully transitive. This meant
ibm_xl (extends clang extends gcc) got gcc's environment variables
but not gcc's flags - a silent data loss bug.
All four resolve functions now walk the full extends chain with a
visited set to prevent cycles:
- resolve_flags: own + parent + grandparent (concatenated)
- resolve_ignore_when: own overrides inherited per-field, transitive
- resolve_slash_prefix: first explicit value in chain wins, transitive
- resolve_environment: already correct, unchanged
Affected compilers (those with 2+ level extends chains):
- ibm_xl (extends clang extends gcc) - gains gcc flags
- armclang (extends clang extends gcc) - gains gcc flags
- intel_cc (extends clang extends gcc) - gains gcc flags
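
A sketch of the transitive walk for the flags case. The `Table` shape
and `resolve_flags_sketch` are hypothetical, and treating a revisited
table as an error is an assumption (the message only says the visited
set prevents cycles):

```rust
use std::collections::{HashMap, HashSet};

/// Simplified compiler table (hypothetical shape).
pub struct Table {
    pub extends: Option<String>,
    pub flags: Vec<String>,
}

/// Collect own + parent + grandparent flags by walking `extends`
/// until the chain ends.
pub fn resolve_flags_sketch(
    name: &str,
    tables: &HashMap<String, Table>,
) -> Result<Vec<String>, String> {
    let mut visited = HashSet::new();
    let mut flags = Vec::new();
    let mut current = Some(name.to_string());
    while let Some(table_name) = current {
        // `insert` returns false if the name was already seen.
        if !visited.insert(table_name.clone()) {
            return Err(format!("circular extends chain at '{}'", table_name));
        }
        let table = tables
            .get(&table_name)
            .ok_or_else(|| format!("unknown table '{}'", table_name))?;
        flags.extend(table.flags.iter().cloned());
        current = table.extends.clone();
    }
    Ok(flags)
}
```

With only one level resolved, a table like ibm_xl would stop after
clang and never reach gcc's entries.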
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Option A: move free functions to methods on their natural types
- FlagMatch::name_len() (was codegen::flag_name_len)
- EnvEntry::validate() (was codegen::validate_env_entry)
- EnvMappingYaml::to_rust() (was codegen::env_mapping_to_rust)
codegen.rs now contains only the two string-to-string converters
(pattern_to_rust, result_to_rust) that operate on raw YAML strings.
Option B: introduce ResolvedTable struct that encapsulates inheritance
resolution and code generation for a single compiler table.
- ResolvedTable::new() merges flags, ignore_when, slash_prefix, env
- ResolvedTable::generate() produces the complete output file
- generate() in lib.rs becomes a thin loop over TABLES
- Snapshot tests shrink from a 20-line helper to a one-liner
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Phase 2 of the bear-codegen extraction:
- Add insta snapshot tests for all 14 generated files (12 flag tables +
recognition.rs + env_keys.rs). Any change in YAML or codegen logic
is caught as a snapshot diff.
- Add YAML schema validation tests: extends references, result strings,
pattern codegen, env entry validation, circular extends detection,
and structural invariants (typed tables have recognition entries,
all tables have flags).
- Add proptest property-based tests for pattern_to_rust and flag_name_len:
suffix-to-variant mapping, count-dependent behavior, and bounds.
- Refactor validate_env_entry to return Result<(), String> instead of
panicking, so tests get structured error messages. Build path still
unwraps (panics on invalid YAML).
- Refactor generate_recognition_patterns and generate_env_keys to return
strings instead of writing files, enabling direct snapshot testing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move the `mod flags` code from `bear/build.rs` into a dedicated
`bear-codegen` workspace crate. The crate exposes a single
`generate(flags_dir, out_dir)` entry point that `bear/build.rs`
now calls as a thin shim. Internal modules: yaml_types, codegen,
resolve, recognition, env_keys, tables.
Includes unit tests for pattern_to_rust, result_to_rust, flag_name_len,
resolve_environment, resolve_ignore_when, resolve_slash_prefix,
validate_env_entry, and a full integration test against real YAML files.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>