Compiler Definitions
This directory contains YAML files that define how Bear recognizes compiler executables, categorizes their command-line flags, and filters internal invocations. Each file corresponds to one compiler (or compiler family).
At build time, bear/build.rs reads these files and generates static Rust
arrays for flag tables, ignore filters, and recognition patterns. The generated
code is included in the interpreter and recognition modules via include!().
File structure

    # Optional: inherit all flags from another file (by filename stem)
    extends: gcc

    # Required: maps to a CompilerType variant (gcc, clang, flang, cuda, intel_fortran, cray_fortran)
    type: gcc

    # Executable names this compiler is known by
    recognize:
      - executables: ["gcc", "g++", "gfortran"]
        cross_compilation: true   # match with cross-compilation prefix (e.g., arm-linux-gnu-gcc)
        versioned: true           # match with version suffix (e.g., gcc-11, gcc11)
      - executables: ["cc", "c++"]
        cross_compilation: true
        versioned: false

    # Optional: treat '/'-prefixed arguments as flags (default: false)
    # When true, arguments like /Fo, /c, /I are treated as compiler flags.
    # When false (default), only '-'-prefixed arguments are treated as flags.
    # Inherited from base file via `extends` if not specified.
    slash_prefix: false

    # Optional: conditions under which a recognized invocation should be ignored
    ignore_when:
      # Ignore if the executable filename matches any of these
      executables: ["cc1", "cc1plus", "f951"]
      # Ignore if any argument matches any of these flags
      flags: ["-cc1"]

    flags:
      - match: {pattern: "-o{ }*"}
        result: output
      - match: {pattern: "-c"}
        result: stops_at_compiling
      - match: {pattern: "-I{ }*"}
        result: configures_preprocessing

Pattern syntax
The pattern string encodes both the flag name and how it consumes arguments:
| Syntax | Example | Meaning |
|---|---|---|
| `-flag` | `-c` | Exact match, no additional arguments |
| `-flag` + `count` | `-x`, `count: 1` | Exact match with N separate arguments |
| `-flag*` | `-W*` | Prefix match (anything starting with `-W`) |
| `-flag*` + `count` | `-Xarch*`, `count: 1` | Prefix match with N separate arguments |
| `-flag{ }*` | `-D{ }*` | Exact match, value glued or as separate arg |
| `-flag=*` | `-specs=*` | Exact match, value after `=` |
| `-flag{=}*` | `--std{=}*` | Exact match, value after `=` or as separate arg |
| `-flag:*` | `/std:*` | Exact match, value after `:` |
| `-flag{:}*` | `/Fe{:}*` | Exact match, value after `:` or as separate arg |
The `{}` pair means the separator is optional:

- `{ }` -- the space between flag and value is optional (value can be glued: `-Dfoo`, or separate: `-D foo`)
- `{=}` -- the `=` between flag and value is optional (value can follow `=`: `--std=c99`, or be separate: `--std c99`)
- `{:}` -- the `:` between flag and value is optional (value can follow `:`: `/std:c++20`, or be separate: `/std c++20`)
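
The suffix rules in the table can be sketched as a small classifier. This is an illustration only, not the code in `bear/build.rs`: `MatchKind` and `parse_pattern` are hypothetical names, and the separate `count` field is not modeled.

```rust
/// How a pattern consumes its value. Hypothetical type for illustration;
/// Bear's generated tables may be shaped differently.
#[derive(Debug, PartialEq)]
enum MatchKind {
    Exact,                     // "-c"
    Prefix,                    // "-W*"
    GluedOrSeparate,           // "-D{ }*"
    Separator(char),           // "-specs=*", "/std:*"
    SeparatorOrSeparate(char), // "--std{=}*", "/Fe{:}*"
}

/// Splits a pattern string into the flag name and its match kind.
fn parse_pattern(pattern: &str) -> (String, MatchKind) {
    // Longer suffixes are tested first so "{=}*" is not mistaken for "*".
    if let Some(name) = pattern.strip_suffix("{ }*") {
        return (name.to_string(), MatchKind::GluedOrSeparate);
    }
    for sep in ['=', ':'] {
        let optional = format!("{{{sep}}}*"); // "{=}*" or "{:}*"
        let required = format!("{sep}*"); // "=*" or ":*"
        if let Some(name) = pattern.strip_suffix(optional.as_str()) {
            return (name.to_string(), MatchKind::SeparatorOrSeparate(sep));
        }
        if let Some(name) = pattern.strip_suffix(required.as_str()) {
            return (name.to_string(), MatchKind::Separator(sep));
        }
    }
    if let Some(name) = pattern.strip_suffix('*') {
        return (name.to_string(), MatchKind::Prefix);
    }
    (pattern.to_string(), MatchKind::Exact)
}
```

For instance, `parse_pattern("--std{=}*")` yields the name `--std` with `MatchKind::SeparatorOrSeparate('=')`, while `parse_pattern("-W*")` yields `-W` with `MatchKind::Prefix`.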
Result values
The result field describes what the flag means semantically:
| Value | Meaning |
|---|---|
| `output` | Output file specification |
| `configures_preprocessing` | Affects the preprocessing pass |
| `configures_compiling` | Affects the compilation pass |
| `configures_assembling` | Affects the assembly pass |
| `configures_linking` | Affects the linking pass |
| `stops_at_preprocessing` | Stop compilation after preprocessing |
| `stops_at_compiling` | Stop compilation after compiling |
| `stops_at_assembling` | Stop compilation after assembling |
| `info_and_exit` | Print info and exit (e.g. `--version`) |
| `driver_option` | Driver/toolchain behavior flag |
| `pass_through` | Stop parsing; remaining args go to linker |
| `none` | No specific semantic effect |
Ignore filters
The optional ignore_when section specifies conditions under which a recognized
compiler invocation should be treated as an internal/ignored command rather than
a user-facing compilation:
- `executables` -- list of executable filenames (not paths). If the invoked executable's filename matches any entry, the command is ignored. Used by GCC to skip internal executables like `cc1`, `collect2`, etc.
- `flags` -- list of argument strings. If any argument in the invocation matches any entry, the command is ignored. Used by Clang to skip `-cc1` frontend invocations.

Both fields are optional and default to empty. When a file uses extends, the
ignore filters are inherited from the base file only if the extending file does
not define its own list for that field (i.e., own values take precedence per field,
not per entry).
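
A minimal sketch of the filter check and the per-field inheritance rule, assuming a simplified model where an absent YAML field is `None`. `IgnoreWhen`, `inherit`, and `should_ignore` are hypothetical names for illustration and do not mirror Bear's generated code.

```rust
use std::path::Path;

/// Illustrative stand-in for an `ignore_when` section.
/// `None` means the field was not defined in the YAML file.
#[derive(Debug, Clone, Default, PartialEq)]
struct IgnoreWhen {
    executables: Option<Vec<String>>,
    flags: Option<Vec<String>>,
}

/// Per-field inheritance: an own list replaces the base list wholesale;
/// the base list is used only when the own field is absent.
fn inherit(own: &IgnoreWhen, base: &IgnoreWhen) -> IgnoreWhen {
    IgnoreWhen {
        executables: own.executables.clone().or_else(|| base.executables.clone()),
        flags: own.flags.clone().or_else(|| base.flags.clone()),
    }
}

/// Returns true when the invocation should be treated as internal.
fn should_ignore(filter: &IgnoreWhen, executable: &str, args: &[&str]) -> bool {
    // Compare against the filename only, never the full path.
    let file_name = Path::new(executable)
        .file_name()
        .and_then(|name| name.to_str())
        .unwrap_or(executable);
    let executable_hit = filter
        .executables
        .as_deref()
        .unwrap_or(&[])
        .iter()
        .any(|entry| entry.as_str() == file_name);
    let flag_hit = filter
        .flags
        .as_deref()
        .unwrap_or(&[])
        .iter()
        .any(|flag| args.iter().any(|arg| *arg == flag.as_str()));
    executable_hit || flag_hit
}
```

With a base that defines only `executables` and an extending file that defines only `flags`, the merged filter carries both lists, so `/usr/libexec/cc1` and any invocation containing `-cc1` are both ignored.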
Inheritance
Files with extends: gcc inherit all GCC flags and (unless overridden) ignore
filters. The build script concatenates own flags before base flags, then sorts
all entries by flag name length (longest first) so more specific flags match
before shorter prefixes. The sort is stable, so own flags take priority over
base flags of the same length.
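
The ordering rule can be sketched as follows, assuming each flag is reduced to a `(name, origin)` pair. `merge_flags` is a hypothetical helper, not the function used by the build script.

```rust
use std::cmp::Reverse;

/// Concatenates own flags before base flags, then sorts longest name first.
/// `sort_by_key` is a stable sort, so among names of equal length the own
/// entries stay ahead of the base entries.
fn merge_flags<'a>(
    own: &[(&'a str, &'a str)],
    base: &[(&'a str, &'a str)],
) -> Vec<(&'a str, &'a str)> {
    let mut merged: Vec<_> = own.iter().chain(base.iter()).copied().collect();
    merged.sort_by_key(|(name, _)| Reverse(name.len()));
    merged
}
```

Given own entries `-x` and `-std` and base entries `-x`, `-include`, and `-o`, the result is ordered `-include`, `-std`, own `-x`, base `-x`, `-o`: the own `-x` is consulted before the inherited one.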
Recognition patterns
The recognize section defines which executable names this compiler is known by.
Each entry specifies:
- `executables` -- list of base executable names (e.g., `["gcc", "g++"]`)
- `cross_compilation` -- if `true`, also matches names with a cross-compilation prefix (e.g., `arm-linux-gnueabihf-gcc`)
- `versioned` -- if `true`, also matches names with a version suffix (e.g., `gcc-11`, `gcc11`, `gcc-11.2`)

All patterns automatically handle .exe extensions on Windows.
Executables listed in ignore_when.executables are automatically added as
recognition entries with cross_compilation: false, versioned: false. This
ensures the recognizer routes them to the right compiler type, where the
interpreter then ignores them. You do not need to list them under recognize.
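
The name matching described above can be approximated without a regex. This is a sketch under stated assumptions: `recognizes` and `strip_version` are hypothetical helpers, the `.exe` suffix is stripped on every platform for simplicity, and the exact version grammar Bear accepts may differ.

```rust
/// Removes a trailing version such as "-11", "11", or "-11.2".
/// Returns the input unchanged when no version suffix is present.
fn strip_version(name: &str) -> &str {
    let stem = name.trim_end_matches(|c: char| c.is_ascii_digit() || c == '.');
    // The removed part must start with a digit, so "gcc." is left alone.
    if !name[stem.len()..].starts_with(|c: char| c.is_ascii_digit()) {
        return name;
    }
    stem.strip_suffix('-').unwrap_or(stem)
}

/// Checks one executable filename against one base name and its two switches.
fn recognizes(name: &str, base: &str, cross_compilation: bool, versioned: bool) -> bool {
    let name = name.strip_suffix(".exe").unwrap_or(name);
    let mut candidates = vec![name];
    if versioned {
        candidates.push(strip_version(name));
    }
    candidates.into_iter().any(|candidate| {
        candidate == base
            || (cross_compilation
                && candidate
                    .strip_suffix(base)
                    // A cross prefix is non-empty text ending in '-'.
                    .map_or(false, |prefix| prefix.len() > 1 && prefix.ends_with('-')))
    })
}
```

Under these assumptions `recognizes("arm-linux-gnueabihf-gcc", "gcc", true, true)` and `recognizes("gcc-11.2", "gcc", true, true)` both hold, while `recognizes("cc-11", "cc", true, false)` does not because `versioned` is off.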
Environment variables
The optional environment section declares environment variables that the
compiler binary reads and how their values map to command-line arguments.

    environment:
      - variable: CPATH
        effect: configures_preprocessing
        mapping:
          flag: "-I"
          separator: path
      - variable: CL
        effect: configures_compiling
        mapping:
          expand: prepend
          separator: space

Each entry has:
- `variable` -- the environment variable name (must match `[A-Za-z_][A-Za-z0-9_]*`)
- `effect` -- semantic effect (same vocabulary as `result` in flags)
- `mapping` -- how the value translates to arguments

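
The naming rule for `variable` can be checked without a regex. A minimal sketch; `is_valid_variable_name` is a hypothetical helper, not a function from Bear.

```rust
/// Returns true when `name` matches `[A-Za-z_][A-Za-z0-9_]*`.
fn is_valid_variable_name(name: &str) -> bool {
    let mut chars = name.chars();
    match chars.next() {
        // First character: ASCII letter or underscore.
        Some(first) if first.is_ascii_alphabetic() || first == '_' => {
            // Remaining characters: ASCII letters, digits, or underscore.
            chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
        }
        _ => false,
    }
}
```

Names such as `CPATH` and `_CL_` pass; an empty name, a leading digit, or a hyphen is rejected.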
Mapping types
| Type | Fields | Behavior |
|---|---|---|
| Flag | `flag` + `separator` | Split value by separator, emit `<flag> <entry>` per element |
| Expand | `expand` + `separator: space` | Shell-split value, insert as raw arguments |
Separators
| Value | Meaning |
|---|---|
| `path` | Platform path separator (`:` on Unix, `;` on Windows) |
| `";"` | Fixed semicolon separator |
| `space` | POSIX shell-word splitting (used with `expand`) |
Expand positions
| Value | Meaning |
|---|---|
| `prepend` | Insert before command-line arguments (e.g., MSVC `CL`) |
| `append` | Insert after command-line arguments (e.g., MSVC `_CL_`) |
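
The two mapping types can be sketched as below. The helper names are hypothetical, and two simplifications are assumed: empty path elements are dropped, and `split_whitespace` stands in for real POSIX shell-word splitting, so quoting is not handled.

```rust
/// Separator used by `separator: path` on the current platform.
fn path_separator() -> char {
    if cfg!(windows) { ';' } else { ':' }
}

/// Flag mapping: split the value and emit `<flag> <entry>` per element.
/// Assumption: empty elements are skipped.
fn flag_mapping(flag: &str, value: &str, separator: char) -> Vec<String> {
    value
        .split(separator)
        .filter(|entry| !entry.is_empty())
        .flat_map(|entry| [flag.to_string(), entry.to_string()])
        .collect()
}

/// Expand mapping: split the value into raw arguments and place them before
/// (`prepend`) or after (`append`) the command-line arguments.
/// `args` excludes the executable itself.
fn expand_mapping(args: &[&str], value: &str, prepend: bool) -> Vec<String> {
    let extra = value.split_whitespace().map(String::from);
    let own = args.iter().map(|arg| arg.to_string());
    if prepend {
        extra.chain(own).collect()
    } else {
        own.chain(extra).collect()
    }
}
```

For example, `flag_mapping("-I", "/opt/a:/opt/b", ':')` produces `-I /opt/a -I /opt/b`, and `expand_mapping(&["/c", "a.c"], "/O2 /W4", true)` produces `/O2 /W4 /c a.c`.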
Documentary entries
Variables the compiler reads but Bear cannot parse (e.g., config file paths)
can be listed with effect: none:

    - variable: ICXCFG
      effect: none
      note: "Config file - not parsed"
      mapping:
        separator: space

These are skipped during code generation but document the variable for future contributors.
Environment inheritance
Environment variables follow the extends chain transitively. If
armclang.yaml extends clang.yaml which extends gcc.yaml, armclang
inherits all GCC and Clang environment entries. Own entries override
inherited ones matched by variable name.
Compilers that do not read GCC variables (e.g., NVIDIA HPC SDK) must not extend GCC and will have an empty environment table.
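
A minimal sketch of transitive resolution, assuming each file's environment section is reduced to `(variable, effect)` pairs and the chain is ordered from the extending file down to the root base. `resolve_environment` is a hypothetical helper.

```rust
use std::collections::HashSet;

/// Walks the `extends` chain from the most derived file to the root base.
/// The first entry seen for a variable name wins, so own entries override
/// inherited ones.
fn resolve_environment<'a>(chain: &[Vec<(&'a str, &'a str)>]) -> Vec<(&'a str, &'a str)> {
    let mut seen = HashSet::new();
    let mut resolved = Vec::new();
    for file in chain {
        for &(variable, effect) in file {
            // `insert` returns false when the name was already claimed
            // by a more derived file.
            if seen.insert(variable) {
                resolved.push((variable, effect));
            }
        }
    }
    resolved
}
```

In a chain where the derived file redefines `CPATH` and the base defines `CPATH` and `C_INCLUDE_PATH`, the result holds the derived `CPATH` entry and the inherited `C_INCLUDE_PATH` entry.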
Adding a new compiler
- Create a new YAML file in this directory (e.g., `mycompiler.yaml`)
- Add `type:`, `recognize:`, `flags:` entries and optionally `extends:`, `ignore_when:`, `environment:`
- Add a `TableConfig` entry in `bear/build.rs`
- Add a `CompilerType` variant in `config.rs` and a mapping in `compiler_recognition.rs::parse_compiler_type`
- Register the `FlagBasedInterpreter` in `CompilerInterpreter::new_with_config`
- Run `cargo build && cargo test`
Adding a new flag
- Find the right YAML file for the compiler
- Add an entry under `flags:` with the appropriate `match` pattern and `result`
- Run `cargo build` -- the build script regenerates the flag tables automatically
- Run `cargo test` -- invariant tests verify sorting, no invalid kinds, etc.
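
As an illustration of the kind of sorting invariant such tests could check, here is a sketch that verifies a table is ordered longest flag name first. `is_sorted_longest_first` is a hypothetical helper; the actual invariant tests in the repository may be written differently.

```rust
/// Returns true when no flag name is followed by a longer one.
fn is_sorted_longest_first(names: &[&str]) -> bool {
    names.windows(2).all(|pair| pair[0].len() >= pair[1].len())
}
```

A table ordered `-include`, `-std`, `-o` passes, while one that places `-o` before `-include` fails.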