8 Commits

Author SHA1 Message Date
Laszlo Nagy fc7eb4ad91 recognition: probe cc/c++ to pick clang vs gcc on BSD/macOS hosts
`cc` and `c++` are GCC on most Linuxes but Clang on FreeBSD,
OpenBSD, NetBSD, DragonFly, and macOS. The regex defaulted them
to GCC, which corrupted the compilation database on those hosts
via wrong flag-arity tables (e.g. in Clang's table `-Xclang <arg>`
consumes the next argv slot; in GCC's it does not).

Recognition now runs `--version` lazily for these ambiguous
basenames, classifies by signature, and dispatches accordingly.
The probe is the sole classifier: gcc.yaml deliberately omits
`cc`/`c++`, so a failed probe returns NotRecognized rather than
guessing -- a missing entry is visible and debuggable, whereas a
wrongly-classified entry corrupts the database silently via
mismatched flag-arity tables (the bug this work exists to fix).
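
A minimal sketch of classify-by-signature, not the shipped code. The
names `Family` and `classify_version_output` are hypothetical, and the
two signature strings are assumptions about typical `--version` output
(Clang builds print "clang version", GCC prints a Free Software
Foundation copyright line); the spec holds the authoritative list.

```rust
/// Compiler family decided by the probe (hypothetical name).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Family {
    Clang,
    Gcc,
}

/// Classify the text printed by `cc --version` (hypothetical helper).
/// Returns None when no signature matches, so the caller can report
/// NotRecognized instead of guessing.
pub fn classify_version_output(output: &str) -> Option<Family> {
    // Assumed signatures. Clang is checked first because vendor builds
    // prefix the banner ("Apple clang version", "FreeBSD clang version").
    if output.contains("clang version") {
        Some(Family::Clang)
    } else if output.contains("Free Software Foundation") {
        Some(Family::Gcc)
    } else {
        None
    }
}
```

Returning `Option` rather than a default keeps the "no guess on a
failed probe" rule visible in the type.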

Layered design:
- CompilerRecognizer dispatches.
- CompilerProbe classifies. VersionProbe on Unix (hardened:
  closed stdin, process-group SIGKILL on timeout, LD_PRELOAD /
  DYLD_INSERT_LIBRARIES stripped); NoProbe on Windows where
  basenames are unambiguous and the Unix subprocess primitives
  the probe relies on aren't available.
- CachingProbe memoizes the probe's verdict per canonical path
  so each unique compiler is fork-exec'd at most once per
  process.
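
The memoizing layer can be sketched roughly as follows. This is an
illustration under assumptions, not the real CachingProbe: the
`Verdict` enum, the closure-based inner probe, and the `classify`
method name are hypothetical, and the caller is assumed to pass an
already-canonicalized path.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::sync::Mutex;

/// Probe outcome (hypothetical name and variants).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Clang,
    Gcc,
    NotRecognized,
}

/// Wraps an inner probe and remembers its verdict per canonical path.
pub struct CachingProbe<F: Fn(&Path) -> Verdict> {
    inner: F,
    cache: Mutex<HashMap<PathBuf, Verdict>>,
}

impl<F: Fn(&Path) -> Verdict> CachingProbe<F> {
    pub fn new(inner: F) -> Self {
        Self { inner, cache: Mutex::new(HashMap::new()) }
    }

    /// `canonical` is assumed to be canonicalized by the caller.
    pub fn classify(&self, canonical: &Path) -> Verdict {
        // The lock is held across the inner call, so concurrent callers
        // cannot both run the probe for the same path.
        let mut cache = self.cache.lock().unwrap();
        if let Some(verdict) = cache.get(canonical) {
            return *verdict;
        }
        let verdict = (self.inner)(canonical);
        cache.insert(canonical.to_path_buf(), verdict);
        verdict
    }
}
```

Holding the lock during the probe trades parallelism for the
at-most-once guarantee; negative verdicts are cached too, so a broken
compiler is not re-run for every entry.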

A user `compilers:` entry preempts the probe -- the sole
supported override.

Also simplifies the WrapperInterpreter that the probe work
exposed: replaces the cyclic Arc::new_cyclic +
OnceLock<Box<dyn Interpreter>> + Weak<dyn Interpreter> machinery
with a flat wrapper::unwrap() helper called inline from
CompilerInterpreter::recognize.

See requirements/recognition-ambiguous-name-probe.md for the spec.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 14:00:59 +00:00
Laszlo Nagy 3c9ef6985c build: drop regex crate in favor of regex-lite
bear-codegen's identifier check is replaced with a hand-rolled
validator (the only regex usage was a 4-char-class pattern).
bear's compiler recognizer switches from regex to regex-lite.
This drops regex, regex-syntax, regex-automata, and aho-corasick
(~6.3s combined) from production builds; regex-lite (~0.5s)
takes their place.
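
For illustration, a hand-rolled identifier check of this kind might
look like the sketch below. The exact pattern is not quoted here, so
the rule used (`[A-Za-z_][A-Za-z0-9_]*`) and the function name
`is_valid_identifier` are assumptions, not the code in bear-codegen.

```rust
/// Returns true if `s` looks like an ASCII identifier.
/// Assumed rule: first char is a letter or '_', the rest are letters,
/// digits, or '_'. The empty string is rejected.
pub fn is_valid_identifier(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        Some(c) if c.is_ascii_alphabetic() || c == '_' => {}
        _ => return false,
    }
    chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
}
```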

The regex-lite engine is pure NFA, no DFA/SIMD optimizations,
but the recognizer runs in the post-build semantic pass on
short anchored ASCII inputs, not on the LD_PRELOAD per-syscall
hot path. If profiling ever flags it, memoize by filename.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 10:12:26 +00:00
scc 9c04b2e1a5 fix(msvc): handle all per-warning cl.exe options
cl.exe accepts a warning-number argument either glued (/wd4995) or as
a separate token (/wd 4995); the latter form is widely used in nmake
Makefiles. Three gaps on the current 4.1.2-rc tip:

1. `/wd*`, `/we*`, `/wo*` used a plain prefix pattern that matches the
   glued form only. When bear saw "/wd 4995", the flag consumed zero
   extra args and the trailing numeric token was reclassified as a
   Source and dropped from compile_commands.json. clangd then emitted
   drv_invalid_int_value for every translation unit.

2. `/w1nnnn`, `/w2nnnn`, `/w3nnnn`, `/w4nnnn` (set warning level for a
   specific warning) were not defined at all, so `/w1 4326` was split
   into an unknown flag plus an orphan numeric token.

3. `/Wv[:version]` was not defined. Both the bare `/Wv` form (cl uses
   the current compiler version when omitted) and `/Wv:17` were
   affected.

All three classes are documented on the MS warning-level options page:
https://learn.microsoft.com/en-us/cpp/build/reference/compiler-option-warning-level

Fix:
  * /wd*, /we*, /wo*  ->  /wd{ }*, /we{ }*, /wo{ }*
    (ExactlyWithGluedOrSep, matching /D, /I, /U, /FI).
  * Add /w1{ }*, /w2{ }*, /w3{ }*, /w4{ }*.
  * Add /Wv (exact) plus /Wv:* (ExactlyWithColon, required value).
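
The glued-or-separate behavior can be sketched as below. This is a
standalone illustration, not the generated matcher: the function name
`match_glued_or_sep` and its signature are hypothetical, and it only
shows how many argv tokens a flag such as `/wd` should consume.

```rust
/// If `args[0]` is `prefix` in glued ("/wd4995") or separate
/// ("/wd", "4995") form, return the value and the number of tokens
/// consumed. Hypothetical helper for illustration only.
pub fn match_glued_or_sep<'a>(prefix: &str, args: &[&'a str]) -> Option<(&'a str, usize)> {
    let first = *args.first()?;
    let rest = first.strip_prefix(prefix)?;
    if !rest.is_empty() {
        // Glued form: the value is the remainder of the same token.
        Some((rest, 1))
    } else {
        // Separate form: the value is the next token, which must exist.
        args.get(1).map(|value| (*value, 2))
    }
}
```

With a prefix-only pattern the second branch is missing, which is how
the numeric token in "/wd 4995" ended up classified as a Source.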

clang_cl.yaml inherits the fix via `extends: msvc`. Codegen snapshot
fixtures updated accordingly.

Two integration tests in integration-tests/tests/cases/semantic.rs
cover all three classes.

Manually verified by scc-tw <scc@scc.tw>.
Closes: #690

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 20:49:57 +10:00
Laszlo Nagy c21b8d5901 replace panics with anyhow error handling
Every fallible function in bear-codegen now returns anyhow::Result
instead of panicking. Error context chains give YAML authors readable
messages when something is wrong:

  error: flag codegen failed
    caused by: generating flags for gcc.yaml
    caused by: flag '-c'
    caused by: unknown result value 'bogus_value'

Changes:
- result_to_rust returns Result instead of panicking
- EnvEntry::validate returns anyhow::Result (was Result<_, String>)
- EnvMappingYaml::to_rust returns Result instead of unreachable!()
- resolve_flags returns anyhow::Result (was Result<_, String>)
- ResolvedTable::new and generate return Result
- generate() returns Result; build.rs prints the error chain on failure
- load_tables returns Result
- All callers use .context() / .with_context() for layered messages
- Tests assert on .is_err() and error message content
- Removed EnvEntry::validate yaml_file parameter (context added by caller)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 10:14:22 +00:00
Laszlo Nagy cf6662ddef add flag dedup and conflict detection during inheritance
resolve_flags now deduplicates inherited flags by (pattern, count):
- Same pattern+count with same result: silently deduped (child's kept)
- Same pattern+count with different result: build error

This removes 11 redundant flag entries across the real YAML files:
- clang.yaml had 5 flags duplicating gcc.yaml
- cuda.yaml had 6 flags duplicating gcc.yaml
All duplicates had identical results, so the current data has no conflicts.

If someone adds a flag to a child YAML that contradicts the parent's
result for the same pattern, the build now fails with a clear message
naming the pattern and both results.
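
A rough sketch of the dedup rule, assuming a simplified `Flag` struct
and a hypothetical `merge_flags` helper (the real resolve_flags
operates on the YAML types and is not reproduced here):

```rust
use std::collections::HashMap;

/// Simplified flag entry (hypothetical shape).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Flag {
    pub pattern: String,
    pub count: u32,
    pub result: String,
}

/// Merge child flags with inherited ones, child first.
/// Same (pattern, count) and same result: the later copy is dropped.
/// Same (pattern, count) and different result: error.
pub fn merge_flags(child: &[Flag], inherited: &[Flag]) -> Result<Vec<Flag>, String> {
    let mut seen: HashMap<(String, u32), String> = HashMap::new();
    let mut merged = Vec::new();
    for flag in child.iter().chain(inherited) {
        let key = (flag.pattern.clone(), flag.count);
        match seen.get(&key) {
            None => {
                seen.insert(key, flag.result.clone());
                merged.push(flag.clone());
            }
            // Identical duplicate: keep the entry seen first (the child's).
            Some(kept) if *kept == flag.result => {}
            Some(kept) => {
                return Err(format!(
                    "conflicting results for pattern '{}': '{}' vs '{}'",
                    flag.pattern, kept, flag.result
                ));
            }
        }
    }
    Ok(merged)
}
```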

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 09:55:18 +00:00
Laszlo Nagy ccf973a2c0 fix inconsistent inheritance: make all resolution transitive
flags, ignore_when, and slash_prefix resolved only one level of the
extends chain, while environment was fully transitive. This meant
ibm_xl (extends clang extends gcc) got gcc's environment variables
but not gcc's flags: a silent data-loss bug.

All four resolve functions now walk the full extends chain with a
visited set to prevent cycles:

- resolve_flags: own + parent + grandparent (concatenated)
- resolve_ignore_when: own overrides inherited per-field, transitive
- resolve_slash_prefix: first explicit value in chain wins, transitive
- resolve_environment: already correct, unchanged

Affected compilers (those with 2+ level extends chains):
- ibm_xl (extends clang extends gcc) - gains gcc flags
- armclang (extends clang extends gcc) - gains gcc flags
- intel_cc (extends clang extends gcc) - gains gcc flags
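
The chain walk can be sketched as below for the flags case. The
`Table` struct and this standalone `resolve_flags` signature are
simplified assumptions, and treating a revisited table as an error is
one possible policy for the visited set, not necessarily what the
codegen does.

```rust
use std::collections::{HashMap, HashSet};

/// Simplified compiler table (hypothetical shape).
pub struct Table {
    pub extends: Option<String>,
    pub flags: Vec<String>,
}

/// Concatenate own flags, then the parent's, then the grandparent's,
/// following `extends` until the chain ends.
pub fn resolve_flags(name: &str, tables: &HashMap<String, Table>) -> Result<Vec<String>, String> {
    let mut visited: HashSet<&str> = HashSet::new();
    let mut flags = Vec::new();
    let mut current = Some(name);
    while let Some(n) = current {
        // `insert` returns false if `n` was already walked: a cycle.
        if !visited.insert(n) {
            return Err(format!("circular extends chain at '{}'", n));
        }
        let table = tables
            .get(n)
            .ok_or_else(|| format!("unknown table '{}'", n))?;
        flags.extend(table.flags.iter().cloned());
        current = table.extends.as_deref();
    }
    Ok(flags)
}
```

A loop rather than recursion keeps the visited set in one place, so a
chain of any depth gets the same treatment.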

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 09:43:50 +00:00
Laszlo Nagy dea535751c move methods to types and introduce ResolvedTable
Option A: move free functions to methods on their natural types
- FlagMatch::name_len() (was codegen::flag_name_len)
- EnvEntry::validate() (was codegen::validate_env_entry)
- EnvMappingYaml::to_rust() (was codegen::env_mapping_to_rust)

codegen.rs now contains only the two string-to-string converters
(pattern_to_rust, result_to_rust) that operate on raw YAML strings.

Option B: introduce ResolvedTable struct that encapsulates inheritance
resolution and code generation for a single compiler table.
- ResolvedTable::new() merges flags, ignore_when, slash_prefix, env
- ResolvedTable::generate() produces the complete output file
- generate() in lib.rs becomes a thin loop over TABLES
- Snapshot tests shrink from a 20-line helper to a one-liner

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 09:31:02 +00:00
Laszlo Nagy b54ce0a953 add snapshot, schema validation, and property-based tests
Phase 2 of the bear-codegen extraction:

- Add insta snapshot tests for all 14 generated files (12 flag tables +
  recognition.rs + env_keys.rs). Any change in YAML or codegen logic
  is caught as a snapshot diff.

- Add YAML schema validation tests: extends references, result strings,
  pattern codegen, env entry validation, circular extends detection,
  and structural invariants (typed tables have recognition entries,
  all tables have flags).

- Add proptest property-based tests for pattern_to_rust and flag_name_len:
  suffix-to-variant mapping, count-dependent behavior, and bounds.

- Refactor validate_env_entry to return Result<(), String> instead of
  panicking, so tests get structured error messages. Build path still
  unwraps (panics on invalid YAML).

- Refactor generate_recognition_patterns and generate_env_keys to return
  strings instead of writing files, enabling direct snapshot testing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 09:14:23 +00:00