Bear-mirror

mirror of https://github.com/rizsotto/Bear.git synced 2026-05-28 00:20:45 +02:00

Author	SHA1	Message	Date
Laszlo Nagy	fc7eb4ad91	recognition: probe cc/c++ to pick clang vs gcc on BSD/macOS hosts `cc` and `c++` are GCC on most Linuxes but Clang on FreeBSD, OpenBSD, NetBSD, DragonFly, and macOS. The regex defaulted them to GCC, which corrupted the compilation database on those hosts via wrong flag-arity tables (e.g. Clang's `-Xclang <arg>` consumes the next argv slot, GCC's does not). Recognition now runs `--version` lazily for these ambiguous basenames, classifies by signature, and dispatches accordingly. The probe is the sole classifier: gcc.yaml deliberately omits `cc`/`c++`, so a failed probe returns NotRecognized rather than guessing -- a missing entry is visible and debuggable, whereas a wrongly-classified entry corrupts the database silently via mismatched flag-arity tables (the bug this work exists to fix). Layered design: - CompilerRecognizer dispatches. - CompilerProbe classifies. VersionProbe on Unix (hardened: closed stdin, process-group SIGKILL on timeout, LD_PRELOAD / DYLD_INSERT_LIBRARIES stripped); NoProbe on Windows where basenames are unambiguous and the Unix subprocess primitives the probe relies on aren't available. - CachingProbe memoizes the probe's verdict per canonical path so each unique compiler is fork-exec'd at most once per process. A user `compilers:` entry preempts the probe -- the sole supported override. Also simplifies the WrapperInterpreter that the probe work exposed: replaces the cyclic Arc::new_cyclic + OnceLock<Box<dyn Interpreter>> + Weak<dyn Interpreter> machinery with a flat wrapper::unwrap() helper called inline from CompilerInterpreter::recognize. See requirements/recognition-ambiguous-name-probe.md for the spec. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 14:00:59 +00:00
Laszlo Nagy	a2fcd51857	docs: explain the build pipeline per crate Item #5 consolidated platform-checks but left the build pipeline under-documented: contributors had to read three build.rs files to find out who runs what, and the lld prerequisite was a silent trap. This commit spreads that knowledge across CLAUDE.md files, co-located with the crates each piece belongs to: - New platform-checks/CLAUDE.md: role, post-Item-#5 public API, recipe for adding a probe, scope boundary against the intercept-family list. - New bear-codegen/CLAUDE.md: build-time codegen role; YAML to OUT_DIR via include!(); snapshot-test contract. - New bear-completions/CLAUDE.md: why a separate crate (clap_complete cost), how it's actually invoked (distributor runs it; install.sh only picks up pre-generated files). - Top-level CLAUDE.md: short "Build pipeline" routing section pointing at the per-crate files; "Host requirements" calling out lld as a Linux-only prerequisite; routing table grew three entries. - bear/CLAUDE.md: replaced the narrow "Code generation" subsection with a "Build script" section that also covers INTERCEPT_LIBDIR validation and the rustc-env emissions consumed by installation.rs. - intercept-preload/CLAUDE.md: "Build script duties" listing the cc-shim build, exports list, and link directives, plus a pointer at src/c/shim.c as the source of truth for INTERCEPT_FAMILY. - integration-tests/CLAUDE.md: "Build script duties" describing the executable probes (single vs grouped cfgs) and the ccache-masquerade detection. Side cleanup: dropped the dead `cargo:rustc-cfg=build_cdylib` directive from intercept-preload/build.rs. It was emitted but read by no source -- the existing comment claiming it forced cdylib generation was misleading; cdylib production is decided by Cargo.toml's crate-type, not by the cfg. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 11:57:36 +00:00
Laszlo Nagy	3c9ef6985c	build: drop regex crate in favor of regex-lite bear-codegen's identifier check is replaced with a hand-rolled validator (the only regex usage was a 4-char-class pattern). bear's compiler recognizer switches from regex to regex-lite. This drops regex, regex-syntax, regex-automata, and aho-corasick (~6.3s combined) from production builds; regex-lite (~0.5s) takes their place. The regex-lite engine is pure NFA, no DFA/SIMD optimizations, but the recognizer runs in the post-build semantic pass on short anchored ASCII inputs, not on the LD_PRELOAD per-syscall hot path. If profiling ever flags it, memoize by filename. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 10:12:26 +00:00
Laszlo Nagy	9dedc88cfb	chore(deps): refresh dependency pins and consolidate into the workspace Bump three constraints in the workspace table: - ctor 0.4 -> 0.6 to align with the version Fedora is staging in rust-ctor PR #2, removing the need for a downstream relaxation patch. - signal-hook 0.3 -> 0.4 and serde-saphyr 0.0.22 -> 0.0.24 to pick up upstream patch fixes. None require source changes. Raise the insta floor to 1.46 and proptest to 1.11 so the manifest stops advertising stale minimums; both stay at-or-below what Fedora rawhide ships, so no new substitution constraints are imposed on packagers. Hoist the four remaining inline dev-deps (mockall, insta, proptest, encoding_rs) into [workspace.dependencies] so every dependency version lives in one place.	2026-04-24 07:50:54 +00:00
scc	9c04b2e1a5	fix(msvc): handle all per-warning cl.exe options cl.exe accepts a warning-number argument either glued (/wd4995) or as a separate token (/wd 4995); the latter form is widely used in nmake Makefiles. Three gaps on the current 4.1.2-rc tip: 1. `/wd`, `/we`, `/wo` used a plain prefix pattern that matches the glued form only. When bear saw "/wd 4995", the flag consumed zero extra args and the trailing numeric token was reclassified as a Source and dropped from compile_commands.json. clangd then emitted drv_invalid_int_value for every translation unit. 2. `/w1nnnn`, `/w2nnnn`, `/w3nnnn`, `/w4nnnn` (set warning level for a specific warning) were not defined at all, so `/w1 4326` was split into an unknown flag plus an orphan numeric token. 3. `/Wv[:version]` was not defined. Both the bare `/Wv` form (cl uses the current compiler version when omitted) and `/Wv:17` were affected. All three classes are documented on the MS warning-level options page: https://learn.microsoft.com/en-us/cpp/build/reference/compiler-option-warning-level Fix: /wd, /we, /wo* -> /wd{ }, /we{ }, /wo{ }* (ExactlyWithGluedOrSep, matching /D, /I, /U, /FI). * Add /w1{ }, /w2{ }, /w3{ }, /w4{ }. * Add /Wv (exact) plus /Wv:* (ExactlyWithColon, required value). clang_cl.yaml inherits the fix via `extends: msvc`. Codegen snapshot fixtures updated accordingly. Two integration tests in integration-tests/tests/cases/semantic.rs cover all three classes. Manually verified by scc-tw <scc@scc.tw>. Closes: #690 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 20:49:57 +10:00
Laszlo Nagy	3ebfea41f5	fix(lint): use sort_by_key instead of sort_by in bear-codegen Fixes clippy::unnecessary_sort_by warning on Rust 1.95+. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-19 06:38:02 +00:00
Laszlo Nagy	c21b8d5901	replace panics with anyhow error handling Every fallible function in bear-codegen now returns anyhow::Result instead of panicking. Error context chains give YAML authors readable messages when something is wrong: error: flag codegen failed caused by: generating flags for gcc.yaml caused by: flag '-c' caused by: unknown result value 'bogus_value' Changes: - result_to_rust returns Result instead of panicking - EnvEntry::validate returns anyhow::Result (was Result<_, String>) - EnvMappingYaml::to_rust returns Result instead of unreachable!() - resolve_flags returns anyhow::Result (was Result<_, String>) - ResolvedTable::new and generate return Result - generate() returns Result; build.rs prints the error chain on failure - load_tables returns Result - All callers use .context() / .with_context() for layered messages - Tests assert on .is_err() and error message content - Removed EnvEntry::validate yaml_file parameter (context added by caller) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 10:14:22 +00:00
Laszlo Nagy	cf6662ddef	add flag dedup and conflict detection during inheritance resolve_flags now deduplicates inherited flags by (pattern, count): - Same pattern+count with same result: silently deduped (child's kept) - Same pattern+count with different result: build error This removes 11 redundant flag entries across the real YAML files: - clang.yaml had 5 flags duplicating gcc.yaml - cuda.yaml had 6 flags duplicating gcc.yaml All duplicates had identical results - no conflicts in current data. If someone adds a flag to a child YAML that contradicts the parent's result for the same pattern, the build now fails with a clear message naming the pattern and both results. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 09:55:18 +00:00
Laszlo Nagy	ccf973a2c0	fix inconsistent inheritance: make all resolution transitive flags, ignore_when, and slash_prefix only resolved one level of the extends chain, while environment was fully transitive. This meant ibm_xl (extends clang extends gcc) got gcc's environment variables but not gcc's flags - a silent data loss bug. All four resolve functions now walk the full extends chain with a visited set to prevent cycles: - resolve_flags: own + parent + grandparent (concatenated) - resolve_ignore_when: own overrides inherited per-field, transitive - resolve_slash_prefix: first explicit value in chain wins, transitive - resolve_environment: already correct, unchanged Affected compilers (those with 2+ level extends chains): - ibm_xl (extends clang extends gcc) - gains gcc flags - armclang (extends clang extends gcc) - gains gcc flags - intel_cc (extends clang extends gcc) - gains gcc flags Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 09:43:50 +00:00
Laszlo Nagy	dea535751c	move methods to types and introduce ResolvedTable Option A: move free functions to methods on their natural types - FlagMatch::name_len() (was codegen::flag_name_len) - EnvEntry::validate() (was codegen::validate_env_entry) - EnvMappingYaml::to_rust() (was codegen::env_mapping_to_rust) codegen.rs now contains only the two string-to-string converters (pattern_to_rust, result_to_rust) that operate on raw YAML strings. Option B: introduce ResolvedTable struct that encapsulates inheritance resolution and code generation for a single compiler table. - ResolvedTable::new() merges flags, ignore_when, slash_prefix, env - ResolvedTable::generate() produces the complete output file - generate() in lib.rs becomes a thin loop over TABLES - Snapshot tests reduce from 20-line helper to one-liner Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 09:31:02 +00:00
Laszlo Nagy	b54ce0a953	add snapshot, schema validation, and property-based tests Phase 2 of the bear-codegen extraction: - Add insta snapshot tests for all 14 generated files (12 flag tables + recognition.rs + env_keys.rs). Any change in YAML or codegen logic is caught as a snapshot diff. - Add YAML schema validation tests: extends references, result strings, pattern codegen, env entry validation, circular extends detection, and structural invariants (typed tables have recognition entries, all tables have flags). - Add proptest property-based tests for pattern_to_rust and flag_name_len: suffix-to-variant mapping, count-dependent behavior, and bounds. - Refactor validate_env_entry to return Result<(), String> instead of panicking, so tests get structured error messages. Build path still unwraps (panics on invalid YAML). - Refactor generate_recognition_patterns and generate_env_keys to return strings instead of writing files, enabling direct snapshot testing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 09:14:23 +00:00
Laszlo Nagy	ec4c8ccbef	extract bear-codegen crate from build.rs flag generation Move the `mod flags` code from `bear/build.rs` into a dedicated `bear-codegen` workspace crate. The crate exposes a single `generate(flags_dir, out_dir)` entry point that `bear/build.rs` now calls as a thin shim. Internal modules: yaml_types, codegen, resolve, recognition, env_keys, tables. Includes unit tests for pattern_to_rust, result_to_rust, flag_name_len, resolve_environment, resolve_ignore_when, resolve_slash_prefix, validate_env_entry, and a full integration test against real YAML files. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 08:59:47 +00:00
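
Commits ccf973a2c0 and cf6662ddef describe flag resolution that walks the full `extends` chain with a visited set and deduplicates by (pattern, count). The following is a minimal sketch of that idea, not bear-codegen's actual code: the `Flag` and `Table` shapes, the `String` error type, and the result strings used with it are assumptions made for illustration.

```rust
use std::collections::hash_map::Entry;
use std::collections::{HashMap, HashSet};

/// One flag entry, reduced to the three fields the commit messages mention.
/// (Hypothetical shape; the real YAML types carry more information.)
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Flag {
    pub pattern: String,
    pub count: u32,
    pub result: String,
}

/// A compiler table with an optional parent named by `extends`.
#[derive(Debug, Default)]
pub struct Table {
    pub extends: Option<String>,
    pub flags: Vec<Flag>,
}

/// Collect the flags of `name` followed by those of every ancestor.
/// Entries with the same (pattern, count) and the same result are deduped,
/// keeping the child's copy; a differing result is reported as an error.
pub fn resolve_flags(
    tables: &HashMap<String, Table>,
    name: &str,
) -> Result<Vec<Flag>, String> {
    let mut visited: HashSet<&str> = HashSet::new();
    let mut seen: HashMap<(String, u32), String> = HashMap::new();
    let mut resolved = Vec::new();
    let mut current = Some(name);

    while let Some(table_name) = current {
        // The visited set turns a cyclic `extends` chain into an error
        // instead of an endless loop.
        if !visited.insert(table_name) {
            return Err(format!("circular extends chain at '{table_name}'"));
        }
        let table = tables
            .get(table_name)
            .ok_or_else(|| format!("unknown table '{table_name}'"))?;

        for flag in &table.flags {
            match seen.entry((flag.pattern.clone(), flag.count)) {
                Entry::Vacant(slot) => {
                    slot.insert(flag.result.clone());
                    resolved.push(flag.clone());
                }
                Entry::Occupied(slot) => {
                    if *slot.get() != flag.result {
                        return Err(format!(
                            "flag '{}' has conflicting results: '{}' vs '{}'",
                            flag.pattern,
                            slot.get(),
                            flag.result
                        ));
                    }
                }
            }
        }
        current = table.extends.as_deref();
    }
    Ok(resolved)
}
```

Because the walk starts at the child, a table such as ibm_xl (extends clang extends gcc) ends up with its own flags first, then clang's, then gcc's, which matches the "own + parent + grandparent" order described in ccf973a2c0.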

12 Commits
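
Commit fc7eb4ad91 describes a layered design in which a probe classifies ambiguous `cc`/`c++` basenames from their `--version` output and a caching layer memoizes the verdict per canonical path. A rough sketch of the classification and caching layers could look like the following. The trait signature, the `Family` enum, and the signature strings are assumptions made for illustration, not Bear's actual API, and the subprocess hardening (closed stdin, timeout, stripped preload variables) is left out.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::sync::Mutex;

/// The two families an ambiguous `cc`/`c++` can resolve to (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Family {
    Gcc,
    Clang,
}

/// Classifies a compiler executable; `None` stands for "not recognized".
pub trait CompilerProbe {
    fn classify(&self, path: &Path) -> Option<Family>;
}

/// Classify `--version` output by signature. The substrings checked here are
/// illustrative guesses, not the signatures Bear uses.
pub fn classify_version_output(output: &str) -> Option<Family> {
    let lower = output.to_lowercase();
    if lower.contains("clang") {
        Some(Family::Clang)
    } else if lower.contains("gcc") || lower.contains("free software foundation") {
        Some(Family::Gcc)
    } else {
        // No guess on an unknown signature: a missing entry is easier to
        // notice than a wrongly classified one.
        None
    }
}

/// Wraps another probe and remembers its verdict per canonical path.
pub struct CachingProbe<P> {
    inner: P,
    cache: Mutex<HashMap<PathBuf, Option<Family>>>,
}

impl<P: CompilerProbe> CachingProbe<P> {
    pub fn new(inner: P) -> Self {
        Self { inner, cache: Mutex::new(HashMap::new()) }
    }
}

impl<P: CompilerProbe> CompilerProbe for CachingProbe<P> {
    fn classify(&self, path: &Path) -> Option<Family> {
        // Canonicalize so symlinked names share one cache entry; if that
        // fails (for example, the file does not exist), key on the raw path.
        let key = std::fs::canonicalize(path).unwrap_or_else(|_| path.to_path_buf());
        let mut cache = self.cache.lock().unwrap();
        // The lock is held while the inner probe runs, so each unique key is
        // probed at most once, at the cost of serializing concurrent probes.
        *cache.entry(key).or_insert_with(|| self.inner.classify(path))
    }
}
```

Failed probes are cached as `None` as well, so a compiler that cannot be classified is not re-run on every command that mentions it.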