fix(intercept): resolve past masquerade compiler wrappers in wrapper mode

Wrapper mode previously stored whatever `which gcc` returned as the
"real compiler" for each wrapper. On distributions with a ccache
masquerade in PATH (Fedora/Arch/Gentoo by default), that is the
ccache symlink, so the wrapper's child process was ccache. ccache
then searched PATH for gcc, skipping only symlinks to itself; Bear's
hard-linked wrapper in `.bear/` passed the self-check and was
re-executed, producing an infinite loop.

environment.rs now detects masquerade wrappers at discovery time by
canonicalising candidate paths and checking the target's basename
against a fixed set (ccache, distcc, icecc, colorgcc, buildcache).
The containing directory is stripped from the lookup PATH and
resolution retries, so the wrapper config always names the real
compiler. Both the CC-env and PATH-scan discovery paths are covered.
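
As a rough illustration of the retry described above, the sketch below
models the lookup over an in-memory map instead of the real filesystem.
`FakeFs`, `is_masquerade` and `resolve_past_masquerade` are hypothetical
names used only for this example; the actual change in environment.rs
works on real paths via `std::fs::canonicalize` and `which::which_in`.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Wrapper names listed in this commit message.
const MASQUERADE_WRAPPERS: &[&str] = &["ccache", "distcc", "icecc", "colorgcc", "buildcache"];

/// Hypothetical stand-in for the filesystem: maps each executable path to
/// the path it ultimately resolves to (itself for a regular file).
type FakeFs = HashMap<PathBuf, PathBuf>;

/// True when the candidate's final target is named after a known wrapper.
fn is_masquerade(fs: &FakeFs, candidate: &Path) -> bool {
    fs.get(candidate)
        .and_then(|target| target.file_name())
        .and_then(|name| name.to_str())
        .map_or(false, |name| MASQUERADE_WRAPPERS.iter().any(|w| *w == name))
}

/// Takes the first hit for `name` on the modelled PATH; if it is a
/// masquerade entry, drops its directory and retries until a real
/// compiler is found or no directory is left.
fn resolve_past_masquerade(fs: &FakeFs, name: &str, path_dirs: &[&str]) -> Option<PathBuf> {
    let mut dirs: Vec<PathBuf> = path_dirs.iter().map(|d| PathBuf::from(*d)).collect();
    loop {
        let found = dirs.iter().map(|d| d.join(name)).find(|p| fs.contains_key(p))?;
        if !is_masquerade(fs, &found) {
            return Some(found);
        }
        // Each pass removes at least one directory, so the loop terminates.
        let parent = found.parent()?.to_path_buf();
        dirs.retain(|d| *d != parent);
    }
}
```

With `/usr/lib64/ccache/gcc` mapped to `/usr/bin/ccache` and
`/usr/bin/gcc` mapped to itself, a lookup over
`["/usr/lib64/ccache", "/usr/bin"]` yields `/usr/bin/gcc`; with only the
masquerade dir on the modelled PATH the result is `None`, which is the
no-real-compiler fallback.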

Other changes in the same fix:
- Requirement reworked around "resolve past masquerade wrappers at
  discovery time"; the original CCACHE_COMPILER proposal is
  documented as rejected after being verified empirically to
  reproduce the hang when CCACHE_COMPILER points at the ccache symlink.
- Nine new unit tests cover detection, filtering, and the
  no-real-compiler fallback.
- New integration test wrapper_mode_survives_masquerade_wrapper_in_path
  prepends the masquerade dir to its own child PATH so the
  recursion scenario is exercised regardless of host PATH, while
  keeping other tests ccache-free.
- build.rs scans well-known masquerade locations (/usr/lib/ccache,
  /usr/lib64/ccache, /usr/libexec/ccache), exposes the found dir
  via CCACHE_MASQUERADE_DIR, and sets cfg(host_has_ccache_masquerade)
  to gate the new test.
- The manual ccache_free_path_and_compiler workaround in the
  wrapper-mode tests is gone; the tests now run against the host's
  real PATH and also protect this requirement.
- CI: Ubuntu job runs apt-get install ccache so the masquerade dir
  exists on every PR. The job PATH is deliberately not modified --
  ccache first on PATH would inflate event counts for preload-mode
  tests that assert exact compiler-event counts.

Side effect: ccache is bypassed while Bear is observing. That
matches Bear's observe-don't-optimise stance and keeps
compile_commands.json recording the real compiler.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Laszlo Nagy
2026-04-24 09:35:38 +00:00
parent 9dedc88cfb
commit d7305bac20
5 changed files with 601 additions and 298 deletions
+12
@@ -49,6 +49,18 @@ jobs:
- uses: dtolnay/rust-toolchain@stable
with:
toolchain: ${{ matrix.toolchain }}
+- name: Install ccache on Ubuntu
+# Bear's wrapper-recursion integration test
+# (wrapper_mode_survives_masquerade_wrapper_in_path) needs a ccache
+# masquerade dir to exist. The ccache package creates
+# /usr/lib/ccache/* symlinks; the test injects that dir into its own
+# child PATH. We deliberately do NOT prepend the dir to the job
+# PATH, because ccache first on PATH would inflate event counts
+# for other tests that assert exact compiler-event counts.
+if: runner.os == 'Linux'
+run: |
+sudo apt-get update
+sudo apt-get install -y ccache
- name: Run integration tests
run: |
cargo build --verbose
+232 -6
@@ -213,7 +213,9 @@ impl BuildEnvironment {
///
/// This function scans all directories in the PATH environment variable and applies
/// the provided predicate to each executable file found. Executables that match
-/// the predicate are returned.
+/// the predicate are returned. Masquerade wrappers (ccache, distcc, ...) are
+/// filtered out so that the registered compiler is always the real one; the
+/// iteration continues past them to find the next candidate on PATH.
fn compiler_candidates<P>(context: &context::Context, predicate: P) -> impl Iterator<Item = PathBuf>
where
P: Fn(&Path) -> bool,
@@ -235,15 +237,16 @@ impl BuildEnvironment {
Vec::new().into_iter()
}
})
-.filter(move |path| is_executable_file(path) && predicate(path))
+.filter(move |path| is_executable_file(path) && !is_masquerade_wrapper(path) && predicate(path))
}
/// Resolves a program env var value to an absolute executable path.
///
/// Handles three cases:
-/// - Absolute path: returned as-is (canonicalized if possible)
+/// - Absolute path: returned as-is (not canonicalized)
/// - Relative path (contains directory component): joined with cwd
-/// - Bare name: resolved via PATH using `which`
+/// - Bare name: resolved via PATH, skipping masquerade wrapper
+/// directories (ccache, distcc, icecc, colorgcc, buildcache)
///
/// Returns `None` if the program cannot be found.
fn resolve_program_path(context: &context::Context, value: &str) -> Option<PathBuf> {
@@ -267,9 +270,9 @@ impl BuildEnvironment {
return Some(context.current_directory.join(&path));
}
-// Bare name: resolve via PATH
+// Bare name: resolve via PATH, skipping masquerade wrappers.
let search_path = context.path().map(|(_, p)| p).unwrap_or_else(|| context.confstr_path.clone());
-which::which_in(name, Some(search_path.as_str()), &context.current_directory).ok()
+resolve_past_masquerade_wrappers(name, &search_path, &context.current_directory)
}
/// Creates a `BuildEnvironment` configured for preload mode interception.
@@ -434,6 +437,75 @@ pub fn insert_to_path<P: AsRef<Path>>(original: &str, first: P) -> Result<String
std::env::join_paths(paths).map(|os_string| os_string.into_string().unwrap_or_default())
}
/// Known masquerade wrappers. A directory full of symlinks named after
/// compilers, where each symlink resolves to one of these binaries, is a
/// masquerade directory; Bear skips such directories when resolving the real
/// compiler. See `interception-wrapper-recursion`.
const MASQUERADE_WRAPPERS: &[&str] = &["ccache", "distcc", "icecc", "colorgcc", "buildcache"];
/// Checks whether `path` is a symlink whose ultimate target's filename is one
/// of the known masquerade wrappers.
///
/// Uses `canonicalize` to follow the symlink chain because we only inspect
/// the basename of the final target; the canonicalised path itself is
/// discarded. This must not be used when registering a compiler, since
/// canonicalisation would change the name (e.g. `/usr/bin/gcc` ->
/// `/usr/bin/gcc-13`) and break wrapper lookup.
fn is_masquerade_wrapper(path: &Path) -> bool {
let Ok(target) = std::fs::canonicalize(path) else { return false };
let Some(name) = target.file_name().and_then(|n| n.to_str()) else { return false };
let stem = name.strip_suffix(".exe").or_else(|| name.strip_suffix(".EXE")).unwrap_or(name);
let stem_lower = stem.to_ascii_lowercase();
MASQUERADE_WRAPPERS.iter().any(|w| *w == stem_lower)
}
/// Resolves a bare program name via PATH, transparently skipping masquerade
/// wrapper directories. If the first match on PATH is a masquerade wrapper
/// (e.g. `/usr/lib64/ccache/gcc` -> `/usr/bin/ccache`), the containing
/// directory is excluded and the search is retried. The process repeats until
/// a non-masquerade compiler is found or PATH is exhausted.
///
/// Returns `None` if no real compiler is reachable past the masquerade
/// directories. In that case the caller must not register a wrapper; doing so
/// would re-create the recursion the filtering is designed to prevent.
fn resolve_past_masquerade_wrappers(name: &str, search_path: &str, cwd: &Path) -> Option<PathBuf> {
let mut excluded: Vec<PathBuf> = Vec::new();
loop {
let current = filter_out_paths(search_path, &excluded);
let found = which::which_in(name, Some(current.as_str()), cwd).ok()?;
if !is_masquerade_wrapper(&found) {
return Some(found);
}
let parent = found.parent()?.to_path_buf();
if excluded.contains(&parent) {
// Defensive: the excluded dir came back, which would loop forever.
log::warn!(
"resolve: masquerade dir {} already excluded but returned again for '{}'",
parent.display(),
name,
);
return None;
}
log::info!(
"resolve: masquerade wrapper at {}; re-resolving '{}' past {}",
found.display(),
name,
parent.display(),
);
excluded.push(parent);
}
}
/// Joins a path-separated string, removing any entries that match one of the
/// excluded paths. Matching is by value; no canonicalisation.
fn filter_out_paths(original: &str, excluded: &[PathBuf]) -> String {
let kept: Vec<PathBuf> =
std::env::split_paths(original).filter(|p| !excluded.iter().any(|e| e == p)).collect();
std::env::join_paths(kept).map(|os| os.into_string().unwrap_or_default()).unwrap_or_default()
}
/// Checks if a path represents an executable file.
fn is_executable_file(path: &Path) -> bool {
if !path.is_file() {
@@ -1156,4 +1228,158 @@ mod test {
let cc_value = sut.environment_overrides.get("CC").expect("CC should be overridden");
assert!(cc_value.contains(".bear"), "CC should point to wrapper: {}", cc_value);
}
#[cfg(unix)]
mod masquerade {
use super::super::{filter_out_paths, is_masquerade_wrapper, resolve_past_masquerade_wrappers};
use std::os::unix::fs::PermissionsExt;
use std::path::PathBuf;
use tempfile::TempDir;
fn write_executable(path: &std::path::Path, content: &str) {
std::fs::write(path, content).unwrap();
std::fs::set_permissions(path, std::fs::Permissions::from_mode(0o755)).unwrap();
}
// Requirements: interception-wrapper-recursion
#[test]
fn detects_symlink_to_ccache() {
let dir = TempDir::new().unwrap();
let fake_ccache = dir.path().join("ccache");
write_executable(&fake_ccache, "#!/bin/sh\n");
let gcc_symlink = dir.path().join("masq").join("gcc");
std::fs::create_dir_all(gcc_symlink.parent().unwrap()).unwrap();
std::os::unix::fs::symlink(&fake_ccache, &gcc_symlink).unwrap();
assert!(is_masquerade_wrapper(&gcc_symlink));
}
// Requirements: interception-wrapper-recursion
#[test]
fn detects_all_known_wrapper_names() {
let dir = TempDir::new().unwrap();
for name in ["distcc", "icecc", "colorgcc", "buildcache"] {
let target = dir.path().join(name);
write_executable(&target, "#!/bin/sh\n");
let link = dir.path().join(format!("{name}-gcc-link"));
std::os::unix::fs::symlink(&target, &link).unwrap();
assert!(is_masquerade_wrapper(&link), "{name} target should be detected");
}
}
// Requirements: interception-wrapper-recursion
#[test]
fn ignores_real_compiler_and_non_wrapper_symlinks() {
let dir = TempDir::new().unwrap();
let real_gcc = dir.path().join("gcc-13");
write_executable(&real_gcc, "#!/bin/sh\n");
assert!(!is_masquerade_wrapper(&real_gcc));
let gcc_symlink = dir.path().join("gcc");
std::os::unix::fs::symlink(&real_gcc, &gcc_symlink).unwrap();
assert!(!is_masquerade_wrapper(&gcc_symlink));
}
// Requirements: interception-wrapper-recursion
#[test]
fn ignores_broken_and_missing_paths() {
let dir = TempDir::new().unwrap();
let broken = dir.path().join("broken-link");
std::os::unix::fs::symlink(dir.path().join("does-not-exist"), &broken).unwrap();
assert!(!is_masquerade_wrapper(&broken));
assert!(!is_masquerade_wrapper(&dir.path().join("does-not-exist")));
}
// Requirements: interception-wrapper-recursion
#[test]
fn filter_out_paths_drops_matching_entries_only() {
let a = PathBuf::from("/a");
let b = PathBuf::from("/b");
let c = PathBuf::from("/c");
let original = std::env::join_paths([&a, &b, &c]).unwrap().into_string().unwrap_or_default();
let kept = filter_out_paths(&original, std::slice::from_ref(&b));
let entries: Vec<PathBuf> = std::env::split_paths(&kept).collect();
assert_eq!(entries, vec![a, c]);
}
// Requirements: interception-wrapper-recursion
#[test]
fn resolver_returns_real_compiler_when_no_masquerade() {
let dir = TempDir::new().unwrap();
let real = dir.path().join("gcc");
write_executable(&real, "#!/bin/sh\n");
let path = std::env::join_paths([dir.path()]).unwrap().into_string().unwrap_or_default();
let found = resolve_past_masquerade_wrappers("gcc", &path, dir.path()).unwrap();
assert_eq!(found, real);
}
// Requirements: interception-wrapper-recursion
#[test]
fn resolver_skips_masquerade_and_returns_next_real_compiler() {
let dir = TempDir::new().unwrap();
let ccache_bin = dir.path().join("bin").join("ccache");
std::fs::create_dir_all(ccache_bin.parent().unwrap()).unwrap();
write_executable(&ccache_bin, "#!/bin/sh\n");
let masq_dir = dir.path().join("ccache_dir");
std::fs::create_dir_all(&masq_dir).unwrap();
let masq_gcc = masq_dir.join("gcc");
std::os::unix::fs::symlink(&ccache_bin, &masq_gcc).unwrap();
let real_dir = dir.path().join("real");
std::fs::create_dir_all(&real_dir).unwrap();
let real_gcc = real_dir.join("gcc");
write_executable(&real_gcc, "#!/bin/sh\n");
let path =
std::env::join_paths([&masq_dir, &real_dir]).unwrap().into_string().unwrap_or_default();
let found = resolve_past_masquerade_wrappers("gcc", &path, dir.path()).unwrap();
assert_eq!(found, real_gcc);
}
// Requirements: interception-wrapper-recursion
#[test]
fn resolver_returns_none_when_only_masquerade_is_reachable() {
let dir = TempDir::new().unwrap();
let ccache_bin = dir.path().join("ccache");
write_executable(&ccache_bin, "#!/bin/sh\n");
let masq_dir = dir.path().join("masq");
std::fs::create_dir_all(&masq_dir).unwrap();
let masq_gcc = masq_dir.join("gcc");
std::os::unix::fs::symlink(&ccache_bin, &masq_gcc).unwrap();
let path = std::env::join_paths([&masq_dir]).unwrap().into_string().unwrap_or_default();
assert!(resolve_past_masquerade_wrappers("gcc", &path, dir.path()).is_none());
}
// Requirements: interception-wrapper-recursion
#[test]
fn resolver_skips_multiple_masquerade_layers() {
let dir = TempDir::new().unwrap();
let ccache_bin = dir.path().join("ccache");
let distcc_bin = dir.path().join("distcc");
write_executable(&ccache_bin, "#!/bin/sh\n");
write_executable(&distcc_bin, "#!/bin/sh\n");
let masq1 = dir.path().join("m1");
let masq2 = dir.path().join("m2");
let real = dir.path().join("real");
std::fs::create_dir_all(&masq1).unwrap();
std::fs::create_dir_all(&masq2).unwrap();
std::fs::create_dir_all(&real).unwrap();
std::os::unix::fs::symlink(&ccache_bin, masq1.join("gcc")).unwrap();
std::os::unix::fs::symlink(&distcc_bin, masq2.join("gcc")).unwrap();
let real_gcc = real.join("gcc");
write_executable(&real_gcc, "#!/bin/sh\n");
let path =
std::env::join_paths([&masq1, &masq2, &real]).unwrap().into_string().unwrap_or_default();
let found = resolve_past_masquerade_wrappers("gcc", &path, dir.path()).unwrap();
assert_eq!(found, real_gcc);
}
}
}
+44
@@ -73,6 +73,7 @@ fn main() {
check_one_executable_exists("make", &["make", "gmake", "mingw32-make"]);
check_one_executable_exists("compiler_c", &["gcc", "clang", "cc"]);
check_one_executable_exists("compiler_cxx", &["g++", "clang++", "c++"]);
+check_ccache_masquerade_dir();
check_one_executable_exists("compiler_fortran", &["gfortran", "flang"]);
check_one_executable_exists("compiler_cuda", &["nvcc"]);
check_executable_exists("libtool");
@@ -130,6 +131,49 @@ fn check_one_executable_exists(define: &str, executables: &[&str]) {
println!("cargo:warning=Checking for executable: {} ... missing", define);
}
/// Locate a ccache masquerade directory on this host, independent of PATH.
/// When one is found, expose it via the `CCACHE_MASQUERADE_DIR` env var and
/// set `cfg(host_has_ccache_masquerade)`. The dedicated recursion test (see
/// `interception-wrapper-recursion`) prepends that dir to its own PATH at
/// runtime so the masquerade setup is exercised regardless of whether the
/// host's default PATH already includes it. CI installs ccache so the dir
/// exists on the Ubuntu matrix entry.
fn check_ccache_masquerade_dir() {
println!("cargo:rustc-check-cfg=cfg(host_has_ccache_masquerade)");
let candidates = ["/usr/lib/ccache", "/usr/lib64/ccache", "/usr/libexec/ccache"];
for dir in candidates {
if let Some(path) = detect_ccache_masquerade_dir(dir) {
println!("cargo:rustc-cfg=host_has_ccache_masquerade");
println!("cargo:rustc-env=CCACHE_MASQUERADE_DIR={}", path);
println!("cargo:warning=ccache masquerade directory found at {}", path);
return;
}
}
println!("cargo:warning=no ccache masquerade directory detected");
}
/// A directory qualifies as a ccache masquerade dir if it contains a `gcc`,
/// `cc`, `g++`, `c++`, `clang`, or `clang++` entry whose ultimate target's
/// file name is `ccache`.
fn detect_ccache_masquerade_dir(dir: &str) -> Option<String> {
let path = std::path::Path::new(dir);
if !path.is_dir() {
return None;
}
for name in ["gcc", "cc", "g++", "c++", "clang", "clang++"] {
let candidate = path.join(name);
if !candidate.exists() {
continue;
}
if let Ok(target) = std::fs::canonicalize(&candidate)
&& target.file_name().and_then(|n| n.to_str()) == Some("ccache")
{
return Some(dir.to_string());
}
}
None
}
fn check_preload_library_availability(preload_path: &str) {
// Check if we're on a platform that supports LD_PRELOAD (Unix-like systems)
let platform_supports_preload = !cfg!(windows);
+111 -83
@@ -321,24 +321,6 @@ fn libtool_command_interception() -> Result<()> {
Ok(())
}
-/// Build a PATH that excludes ccache directories and resolve the bare
-/// compiler name within it. Without this, wrapper mode on a ccache-equipped
-/// host recurses: `.bear/gcc` -> ccache -> PATH lookup for `gcc` -> `.bear/gcc`.
-/// Same workaround used by `wrapper_mode_resolves_cc_bare_name_via_path`.
-#[cfg(target_family = "unix")]
-fn ccache_free_path_and_compiler() -> (std::ffi::OsString, std::path::PathBuf) {
-let safe_path = std::env::join_paths(
-std::env::split_paths(&std::env::var("PATH").unwrap_or_default())
-.filter(|p| !p.to_string_lossy().contains("ccache")),
-)
-.expect("failed to join PATH");
-let compiler_filename = filename_of(COMPILER_C_PATH);
-let real_compiler =
-which::which_in(&compiler_filename, Some(&safe_path), std::env::current_dir().unwrap())
-.unwrap_or_else(|_| std::path::PathBuf::from(COMPILER_C_PATH));
-(safe_path, real_compiler)
-}
/// In wrapper mode, Bear creates a deterministic `.bear/` directory in the
/// working directory during the build and removes it automatically on exit.
/// Verify both: the build observes `.bear/` while it runs, and after Bear
@@ -351,8 +333,6 @@ fn wrapper_mode_creates_and_cleans_up_bear_directory() -> Result<()> {
let env = TestEnvironment::new("wrapper_bear_dir_cleanup")?;
env.create_source_files(&[("test.c", "int main() { return 0; }")])?;
-let (safe_path, real_compiler) = ccache_free_path_and_compiler();
// Build script records whether `.bear/` was present during the build,
// then invokes the compiler via $CC so the wrapper is exercised.
let build = r#"if [ -d .bear ]; then echo present > bear_dir_status.txt; else echo missing > bear_dir_status.txt; fi
@@ -360,23 +340,20 @@ $CC -c test.c -o test.o
"#;
let script = env.create_shell_script("build.sh", build)?;
-let config = format!(
-r#"
+// Config forces wrapper mode and does not list compilers: Bear discovers
+// them via the CC env var and PATH scan (masquerade-aware, see
+// `interception-wrapper-recursion`).
+let config = r#"
schema: "4.1"
intercept:
mode: wrapper
-compilers:
-- path: {}
-"#,
-real_compiler.to_str().unwrap()
-);
+"#;
let config_path = env.test_dir().join("config.yaml");
std::fs::write(&config_path, config)?;
let mut cmd = env.command_bear();
-cmd.current_dir(env.test_dir()).env("CC", filename_of(COMPILER_C_PATH)).env("PATH", &safe_path).args([
+cmd.current_dir(env.test_dir()).env("CC", filename_of(COMPILER_C_PATH)).args([
"--config",
config_path.to_str().unwrap(),
"--output",
@@ -417,41 +394,31 @@ fn wrapper_mode_bear_directory_is_deterministic_across_runs() -> Result<()> {
let env = TestEnvironment::new("wrapper_bear_dir_deterministic")?;
env.create_source_files(&[("test.c", "int main() { return 0; }")])?;
-let (safe_path, real_compiler) = ccache_free_path_and_compiler();
let build = r#"ls -d .bear >> bear_dir_observed.txt 2>/dev/null || echo missing >> bear_dir_observed.txt
$CC -c test.c -o test.o
"#;
let script = env.create_shell_script("build.sh", build)?;
-let config = format!(
-r#"
+let config = r#"
schema: "4.1"
intercept:
mode: wrapper
-compilers:
-- path: {}
-"#,
-real_compiler.to_str().unwrap()
-);
+"#;
let config_path = env.test_dir().join("config.yaml");
std::fs::write(&config_path, config)?;
for _ in 0..2 {
let mut cmd = env.command_bear();
-cmd.current_dir(env.test_dir()).env("CC", filename_of(COMPILER_C_PATH)).env("PATH", &safe_path).args(
-[
-"--config",
-config_path.to_str().unwrap(),
-"--output",
-"compile_commands.json",
-"--",
-SHELL_PATH,
-script.to_str().unwrap(),
-],
-);
+cmd.current_dir(env.test_dir()).env("CC", filename_of(COMPILER_C_PATH)).args([
+"--config",
+config_path.to_str().unwrap(),
+"--output",
+"compile_commands.json",
+"--",
+SHELL_PATH,
+script.to_str().unwrap(),
+]);
let output = cmd.output()?;
assert!(
output.status.success(),
@@ -687,14 +654,12 @@ fn fakeroot_integration() -> Result<()> {
/// Test that wrapper mode resolves bare compiler names from CC env var via PATH.
///
/// Covers the PATH resolution part of issue #686: when CC is set to a bare name
-/// (e.g. "gcc" instead of "/usr/bin/gcc"), Bear's wrapper mode should resolve it
-/// through PATH before registering wrapper targets.
-///
-/// The build script uses $CC so the wrapper actually intercepts it. To avoid
-/// ccache recursion (where the wrapper calls ccache which finds the wrapper
-/// again via PATH), we construct a PATH containing only the real compiler
-/// directory, excluding any ccache directories.
-// Requirements: interception-wrapper-mechanism
+/// (e.g. "gcc" instead of "/usr/bin/gcc"), Bear's wrapper mode must resolve it
+/// through PATH before registering wrapper targets. On ccache-equipped hosts
+/// the bare-name resolution also exercises `interception-wrapper-recursion`,
+/// since Bear must step past the ccache masquerade directory to reach the
+/// real compiler.
+// Requirements: interception-wrapper-mechanism, interception-wrapper-recursion
#[test]
#[cfg(all(has_executable_compiler_c, has_executable_shell))]
fn wrapper_mode_resolves_cc_bare_name_via_path() -> Result<()> {
@@ -704,39 +669,20 @@ fn wrapper_mode_resolves_cc_bare_name_via_path() -> Result<()> {
let compiler_filename = filename_of(COMPILER_C_PATH);
-// Build a PATH that excludes ccache directories to avoid wrapper recursion
-// (ccache symlinks search PATH for the real compiler, finding the wrapper).
-let safe_path = std::env::join_paths(
-std::env::split_paths(&std::env::var("PATH").unwrap_or_default())
-.filter(|p| !p.to_string_lossy().contains("ccache")),
-)
-.expect("failed to join PATH");
-// Ensure the real compiler is reachable: resolve the compiler filename
-// in the safe PATH to get a non-ccache path (e.g. /usr/bin/gcc).
-let real_compiler = which::which_in(&compiler_filename, Some(&safe_path), env.test_dir())
-.unwrap_or_else(|_| std::path::PathBuf::from(COMPILER_C_PATH));
-let real_compiler_str = real_compiler.to_str().unwrap();
// Build script uses $CC so the wrapper intercepts the call.
let build_commands = "$CC -c test.c".to_string();
let script_path = env.create_shell_script("build.sh", &build_commands)?;
-// Config forces wrapper mode and lists the real (non-ccache) compiler
-// to suppress PATH-based discovery.
-let config = format!(
-r#"
+// Config forces wrapper mode. No `compilers:` entry -- Bear discovers the
+// compiler via CC (masquerade-aware resolution).
+let config = r#"
schema: "4.1"
intercept:
mode: wrapper
-compilers:
-- path: {}
-"#,
-real_compiler_str
-);
+"#;
let config_path = env.test_dir().join("config.yaml");
-std::fs::write(&config_path, &config)?;
+std::fs::write(&config_path, config)?;
// Run the full bear pipeline with CC set to the bare compiler name (no path).
// Bear must resolve "gcc" via PATH before creating wrapper symlinks.
@@ -745,7 +691,6 @@ compilers:
.env("RUST_LOG", "debug")
.env("RUST_BACKTRACE", "1")
.env("CC", &compiler_filename)
-.env("PATH", &safe_path)
.args([
"--config",
config_path.to_str().unwrap(),
@@ -778,6 +723,89 @@ compilers:
Ok(())
}
/// When a ccache masquerade directory is first on PATH (the default on
/// Fedora/Arch/Gentoo and on the CI Ubuntu job after `apt-get install
/// ccache`), wrapper mode must resolve past it to the real compiler. Before
/// the fix for `interception-wrapper-recursion`, Bear's wrapper config held
/// the ccache symlink as the "real compiler"; ccache's PATH search then
/// picked up Bear's hard-linked wrapper in `.bear/` and the two looped
/// forever.
///
/// The test prepends the masquerade dir to its own child PATH so the
/// scenario runs regardless of the host's default PATH. Other integration
/// tests keep the host's PATH so ccache does not leak into their event
/// counts.
// Requirements: interception-wrapper-recursion
#[test]
#[cfg(target_family = "unix")]
#[cfg(host_has_ccache_masquerade)]
#[cfg(all(has_executable_compiler_c, has_executable_shell))]
fn wrapper_mode_survives_masquerade_wrapper_in_path() -> Result<()> {
let env = TestEnvironment::new("wrapper_ccache_masquerade")?;
env.create_source_files(&[("test.c", "int main() { return 0; }")])?;
let compiler_filename = filename_of(COMPILER_C_PATH);
let build = "$CC -c test.c -o test.o\n".to_string();
let script = env.create_shell_script("build.sh", &build)?;
let config = r#"
schema: "4.1"
intercept:
mode: wrapper
"#;
let config_path = env.test_dir().join("config.yaml");
std::fs::write(&config_path, config)?;
// Prepend the ccache masquerade directory to the child's PATH so the
// recursion scenario is exercised. Without the masquerade fix the
// build hangs and the test harness timeout will surface it.
let masquerade_dir = env!("CCACHE_MASQUERADE_DIR");
let host_path = std::env::var("PATH").unwrap_or_default();
let child_path = std::env::join_paths(
std::iter::once(std::path::PathBuf::from(masquerade_dir)).chain(std::env::split_paths(&host_path)),
)
.expect("join PATH failed");
let mut cmd = env.command_bear();
cmd.current_dir(env.test_dir())
.env("RUST_LOG", "debug")
.env("RUST_BACKTRACE", "1")
.env("CC", &compiler_filename)
.env("PATH", &child_path)
.args([
"--config",
config_path.to_str().unwrap(),
"--output",
"compile_commands.json",
"--",
SHELL_PATH,
script.to_str().unwrap(),
]);
let output = cmd.output()?;
let stderr = String::from_utf8_lossy(&output.stderr);
assert!(output.status.success(), "bear failed: {stderr}");
let db = env.load_compilation_database("compile_commands.json")?;
db.assert_count(1)?;
// Recorded compiler must be a real compiler, not the ccache symlink and
// not Bear's own wrapper in `.bear/`.
for entry in db.entries() {
let argv = entry.get("arguments").and_then(|v| v.as_array());
let compiler = argv
.and_then(|a| a.first())
.and_then(|v| v.as_str())
.expect("compilation db entry must have argv[0]");
assert!(!compiler.contains("ccache"), "compilation db must not reference ccache: {compiler}");
assert!(!compiler.contains(".bear"), "compilation db must not reference Bear wrapper: {compiler}");
}
Ok(())
}
/// Test that wrapper mode handles CC without .exe extension on Windows.
///
/// Reproduces the exact user scenario from issue #686: CC=cl (no extension,
+202 -209
@@ -1,235 +1,228 @@
---
-title: Prevent wrapper recursion with compiler wrappers
-status: proposed
+title: Resolve past masquerade wrappers in wrapper mode
+status: implemented
---
-## Problem
+## Intent
-When ccache is in PATH (the most common compiler wrapper setup), Bear's
-wrapper mode can enter an infinite recursion loop:
+When the user runs `bear -- make` on a distribution that ships compiler
+masquerade wrappers (ccache on Fedora/Arch/Gentoo, icecream on its
+supported distros, etc.), Bear's wrapper mode must not enter an infinite
+loop with the masquerade wrapper. The compilation database must record
+the real compiler command, and the build must complete. The user should
+not have to strip any directories from PATH to make Bear work.
-1. Bear creates a wrapper hard link `.bear/gcc` -> `bear-wrapper`
-2. Bear prepends `.bear/` to PATH
-3. Build runs `gcc foo.c`
-4. Shell finds `.bear/gcc` first in PATH (the Bear wrapper)
-5. Bear wrapper reports the execution and invokes the "real" compiler
-6. The "real" compiler is `/usr/lib64/ccache/gcc` (ccache's symlink)
-7. ccache searches PATH for `gcc`, skipping only symlinks to itself
-8. ccache finds `.bear/gcc` -- a hard link to `bear-wrapper`, NOT a
-symlink to ccache, so ccache accepts it as the real compiler
-9. ccache executes `.bear/gcc`, which is Bear's wrapper again
-10. Infinite loop: steps 5-9 repeat
+Bear achieves this by resolving past masquerade directories at
+discovery time. The price is that while Bear is observing the build,
+tools like ccache are not exercised -- the build sees the real compiler
+directly. This is intentional: Bear observes, it does not optimise.
-This was observed during integration testing on Fedora where gcc is
-symlinked through ccache (`/usr/lib64/ccache/gcc` -> `/usr/bin/ccache`).
+## Background: how masquerade wrappers break Bear
-The current workaround in the integration test (`intercept.rs:551-556`)
-manually strips ccache directories from PATH before running Bear. This
-is not available to end users.
+Compiler masquerade wrappers (ccache, distcc, icecream/icecc,
+colorgcc, buildcache) install a directory of symlinks named after real
+compilers (`/usr/lib64/ccache/gcc`, `/usr/lib/icecc/bin/gcc`, ...) where
+each symlink points at the wrapper binary. The distribution prepends
+that directory to PATH, so a bare `gcc` in a Makefile resolves to the
+wrapper, which then looks up the real compiler on PATH (skipping its
+own symlinks) and forwards the call.
-## How compiler wrappers find the real compiler
+Bear's wrapper mode puts `.bear/` (full of hard links to `bear-wrapper`)
+at the front of PATH. On a ccache-equipped box the interaction is:
-Research into the major compiler wrappers (verified against their docs
-and source code):
+1. Shell finds `.bear/gcc`, runs Bear wrapper.
+2. Wrapper reads its config: real `gcc` is `/usr/lib64/ccache/gcc` --
+whatever `which gcc` returned at Bear startup.
+3. Wrapper execs `/usr/lib64/ccache/gcc` (which IS ccache).
+4. ccache searches PATH for `gcc`, skipping symlinks to itself. It
+does NOT skip `.bear/gcc` because that is a hard link, not a
+symlink, so ccache accepts it as the real compiler.
+5. ccache execs `.bear/gcc`, Bear wrapper runs again. Steps 2-5
+repeat forever.
-### ccache
+The same shape applies to any masquerade wrapper that detects itself
+only by symlink comparison. distcc in masquerade mode happens to avoid
+this specific loop because it strips all PATH entries up to and
+including its own dir -- which drops `.bear/` as collateral damage --
+but that still means distcc silently removes Bear from the child's
+PATH, which breaks nested interception even when no loop occurs.
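
The two lookup strategies contrasted above can be put side by side in a small model. This is only a sketch under simplifying assumptions: the PATH is an in-memory list in which every directory holds an entry named after the compiler, and `Entry`, `symlink_check_lookup` and `prefix_strip_lookup` are hypothetical names for illustration, not code from ccache, distcc or Bear.

```rust
/// What a modelled PATH directory holds under the compiler's name.
#[derive(Debug)]
enum Entry {
    /// Symbolic link whose final target has this basename (e.g. "ccache").
    SymlinkTo(&'static str),
    /// Regular file or hard link, such as Bear's `.bear/gcc`.
    Regular,
}

/// ccache-style lookup as described in the text: take the first entry on
/// PATH that is not a symlink to the wrapper itself.
fn symlink_check_lookup(path: &[(&'static str, Entry)], wrapper: &str) -> Option<&'static str> {
    path.iter()
        .find(|(_, entry)| !matches!(entry, Entry::SymlinkTo(target) if *target == wrapper))
        .map(|(dir, _)| *dir)
}

/// distcc-style lookup as described in the text: drop every PATH entry up
/// to and including the wrapper's own directory, then take the first hit.
fn prefix_strip_lookup(path: &[(&'static str, Entry)], own_dir: &str) -> Option<&'static str> {
    let own = path.iter().position(|(dir, _)| *dir == own_dir)?;
    path[own + 1..].first().map(|(dir, _)| *dir)
}
```

For a modelled PATH of `.bear`, `/usr/lib64/ccache`, `/usr/bin`, the symlink check selects `.bear`, because a hard link looks like a regular file; that selection is the step that closes the loop. For `.bear`, `/usr/lib/distcc/bin`, `/usr/bin`, the prefix strip selects `/usr/bin`, so no loop occurs but `.bear` is no longer visited.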
**Source**: ccache manual (https://ccache.dev/manual/latest.html),
source `find_executable_in_path`.
### Known masquerade wrappers
- Searches the full PATH for the first executable matching the compiler
name that is **not a symbolic link to ccache itself**.
- Detection uses `S_ISLNK` check + basename comparison to "ccache".
Hard links and copies are NOT detected as ccache.
- **Env vars for real compiler**:
- `CCACHE_COMPILER` (preferred) -- forces the compiler path, bypasses
PATH search entirely.
- `CCACHE_CC` -- deprecated alias for `CCACHE_COMPILER`.
- `CCACHE_PATH` -- restricts which directories ccache searches for the
compiler (colon-separated on Unix, semicolon on Windows).
- ccache does NOT read the `CC` or `CXX` env vars itself.
| Tool | Masquerade dir examples | Notes |
|----------------------|----------------------------------------------|----------------------------------------------------------------------|
| ccache | `/usr/lib64/ccache`, `/usr/lib/ccache` | Default on Fedora, Arch, Gentoo. Loops with Bear. |
| distcc | `/usr/lib/distcc`, `/usr/lib/distcc/bin` | Strips PATH prefix including `.bear/`; no loop, but breaks nesting. |
| icecream / icecc | `/usr/lib/icecc/bin`, `/usr/libexec/icecc` | Symlink pattern same as ccache. Loops with Bear. |
| colorgcc | `~/bin/colorgcc` setups | Rare; typically configured via `~/.colorgccrc`, not PATH masquerade. |
| buildcache | `/usr/lib/buildcache/bin` (varies) | Same shape as ccache. |
| sccache | Not a masquerade wrapper | Invoked explicitly (`sccache gcc ...`); no recursion with Bear. |
### distcc
**Source**: distcc(1) man page (https://www.distcc.org/man/distcc_1.html),
source `src/climasq.c`.
- In masquerade mode, strips **all directories up to and including** its
own masquerade directory from PATH, then searches the remainder.
- This means if PATH is `.bear:/usr/lib/distcc/bin:/usr/bin`, distcc
strips everything up to `/usr/lib/distcc/bin`, leaving only `/usr/bin`.
Bear's `.bear/` is removed in this process.
- Self-detection uses string comparison on directory paths, not symlink
resolution or inode comparison. Has a FIXME in source acknowledging
this limitation.
- **No env var for real compiler**. The documented env vars are
`DISTCC_HOSTS`, `DISTCC_LOG`, `DISTCC_VERBOSE`, `DISTCC_DIR`,
`DISTCC_SSH` -- none for compiler override.
- **distcc does NOT cause recursion with Bear** because its aggressive
PATH stripping removes `.bear/` along with everything before it.
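The stripping behaviour described above can be modelled in a few lines. This is only a sketch of the documented behaviour, not distcc's actual code: the function name `strip_through_masquerade_dir` is hypothetical, and the plain string comparison mirrors the self-detection limitation noted in the list.

```rust
/// Models distcc's masquerade-mode PATH handling as described above:
/// every entry up to and including the masquerade directory is dropped.
/// The comparison is a plain string match, not symlink or inode based.
fn strip_through_masquerade_dir<'a>(path: &'a str, masquerade_dir: &str) -> Vec<&'a str> {
    let entries: Vec<&str> = path.split(':').collect();
    match entries.iter().position(|entry| *entry == masquerade_dir) {
        // Keep only what follows the masquerade directory.
        Some(index) => entries[index + 1..].to_vec(),
        // Masquerade directory not on PATH: nothing is stripped.
        None => entries,
    }
}
```

With `PATH=.bear:/usr/lib/distcc/bin:/usr/bin` and a masquerade directory of `/usr/lib/distcc/bin`, only `/usr/bin` remains, so `.bear/` is gone from the search.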
### colorgcc
**Source**: colorgcc source (`colorgcc.pl` on GitHub).
- Reads compiler paths from `~/.colorgccrc` config file.
- Falls back to PATH search using `abs_path($0)` to skip entries that
resolve to itself. Detects symlinks but NOT hard links.
- **No env var for real compiler**. The only env var check is
`GCC_COLORS` -- if set, colorgcc skips colorization and execs the
compiler directly.
### sccache
**Source**: sccache GitHub (https://github.com/mozilla/sccache).
- Direct invocation only (`sccache gcc -c foo.c`). No masquerade mode.
- **No env var for real compiler** (beyond `RUSTC_WRAPPER` for Rust).
- **sccache does NOT cause recursion with Bear** because it does not
use symlink-in-PATH masquerade.
### Summary
| Tool | Causes recursion with Bear? | Env var for real compiler |
|---|---|---|
| ccache | **Yes** | `CCACHE_COMPILER` |
| distcc | No (strips all preceding PATH dirs) | None |
| colorgcc | Possible (hard link not detected) | None (config file only) |
| sccache | No (no masquerade mode) | None |
The primary problem is **ccache**, which is also by far the most common
compiler wrapper (default on Fedora, Arch, Gentoo, and others).
## Example scenario (ccache recursion)
System setup:
```
/usr/lib64/ccache/gcc -> /usr/bin/ccache
PATH=/usr/lib64/ccache:/usr/bin
```
User runs: `bear -- make`
Bear creates `.bear/gcc` (hard link to `bear-wrapper`) and sets
`PATH=.bear:/usr/lib64/ccache:/usr/bin`.
Trace:
1. Shell finds `.bear/gcc`, executes Bear wrapper
2. Bear wrapper looks up config: real compiler = `/usr/lib64/ccache/gcc`
3. Bear wrapper spawns `/usr/lib64/ccache/gcc foo.c` with PATH unchanged
4. ccache searches PATH for `gcc`, skips `/usr/lib64/ccache/gcc` (symlink
to itself), but accepts `.bear/gcc` (hard link, not symlink to ccache)
5. ccache runs `.bear/gcc foo.c` -- back to step 1, infinite loop
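Step 4 of the trace is the crux, so here is a toy model of it. This is not ccache's code: `Kind`, `is_symlink_to_ccache`, and `first_accepted` are hypothetical names, and the filesystem is reduced to the two properties the self-check is described as inspecting (is the entry a symlink, and does its target's basename equal `ccache`).

```rust
/// How a PATH entry for `gcc` looks on disk, reduced to what the
/// self-check inspects.
enum Kind {
    /// A symbolic link with the given target.
    Symlink(&'static str),
    /// A regular file; hard links and copies fall in this bucket.
    Regular,
}

/// True only for symlinks whose target basename is `ccache`.
fn is_symlink_to_ccache(kind: &Kind) -> bool {
    match kind {
        Kind::Symlink(target) => target.rsplit('/').next() == Some("ccache"),
        Kind::Regular => false,
    }
}

/// Returns the first PATH entry that the self-check does not reject.
fn first_accepted<'a>(entries: &'a [(&'a str, Kind)]) -> Option<&'a str> {
    entries
        .iter()
        .find(|(_, kind)| !is_symlink_to_ccache(kind))
        .map(|(path, _)| *path)
}
```

Fed the PATH from the scenario, the model accepts `.bear/gcc` because a hard link is just a regular file to this check; without `.bear/` on PATH it skips the ccache symlink and lands on `/usr/bin/gcc`.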
## Integration test plan
The goal is to verify Bear works on a machine configured the way a user's
distribution ships it. Fedora, Arch, and Gentoo install ccache symlinks
in a masquerade directory and put that directory on PATH by default, so
when the user types `gcc` they are actually running ccache. We want to
confirm Bear does not loop in that exact setup.
Detection (in `build.rs`):
- Probe whether the host compiler resolved by the existing
`compiler_c` check (`gcc`, `clang`, or `cc`) is actually a symlink
pointing at a `ccache` binary. Walk the symlink chain and compare the
target's filename to `ccache`.
- If it is, set a `cfg(host_compiler_goes_through_ccache)` flag and
expose the resolved ccache binary path through an env var, the same
way existing probes expose executables.
- If the host is not configured with ccache-in-PATH, the test is simply
skipped via `#[cfg(host_compiler_goes_through_ccache)]`. We do not
fabricate a masquerade directory -- synthetic ccache setups prove
nothing about real user environments.
Set up (in the test):
- A test environment with a single source file `test.c`.
- An isolated `CCACHE_DIR` inside the test temp area so the test does
not pollute the developer's real ccache cache and does not get flaky
results from pre-existing cached entries.
- A build script that compiles `test.c` by invoking the compiler via
its bare name, so PATH resolution kicks in exactly as it does in a
real build. PATH itself is **not** modified -- we want the host's
default PATH, ccache symlinks included.
Run Bear in wrapper mode against this build script.
Verify:
- The command completes within a reasonable timeout (no infinite loop).
- The exit status is success.
- The output compilation database contains exactly one entry for `test.c`.
- The compiler command recorded in the entry resolves to a real compiler
binary, not Bear's wrapper.
Why this exercises the bug:
- The real ccache from the host distribution is in play, using its
actual PATH-search logic (which accepts hard links as "not itself").
- Bear's `.bear/` directory is present in PATH because Bear puts it
there in wrapper mode. Without the fix, ccache searches PATH, finds
`.bear/gcc`, and loops.
- The fix (setting `CCACHE_COMPILER` in the wrapper's child environment)
should route ccache directly to the real compiler, bypassing PATH
search entirely.
Negative check (manual, not automated):
- Temporarily revert the fix and confirm the test hangs or times out,
ensuring the test actually covers the bug rather than coincidentally
passing.
Detection is by symlink resolution, not by matching directory paths,
so new or distribution-local masquerade setups are covered as long as
their installer symlinks compiler names to a wrapper binary.
## Acceptance criteria
- [ ] Wrapper mode completes without hanging when ccache symlinks are in PATH
- [ ] The compilation database is generated correctly
- [ ] No special user-side workarounds required (no manual PATH stripping)
- [ ] Nested compiler invocations are still intercepted (`.bear/` stays in PATH)
- [ ] User's existing `CCACHE_COMPILER` setting is not overridden
- Wrapper mode completes without hanging when any supported masquerade
wrapper directory is present in PATH
- The compilation database contains one entry per compiled source file
- The compiler path recorded in each entry is an absolute path to the
real compiler, never the masquerade wrapper and never a `.bear/`
wrapper
- Nested compiler invocations (a compiler driver spawning another
bare-name compiler) are still intercepted: `.bear/` stays at the
front of the child's PATH
- The user is not required to strip any directory from PATH, unset
any environment variable, or configure `CCACHE_*` manually
- If every `gcc` on PATH is a masquerade wrapper and no real compiler
can be found past them, Bear reports a diagnostic and skips
registering that compiler (it does not fall back to the wrapper)
## Solution: Set `CCACHE_COMPILER` in the wrapper's child environment
## Implementation details
The wrapper binary already knows the real compiler path (from the config
mapping). Before spawning the real compiler, set `CCACHE_COMPILER` to
that path. This tells ccache exactly which compiler to use, bypassing
its PATH search entirely.
### Detection
This is the documented mechanism for controlling ccache's compiler
selection. It does not require removing `.bear/` from PATH, so nested
compilations remain intercepted.
For each compiler that Bear resolves during wrapper setup (from
`CC`/`CXX`/... env vars or PATH discovery), Bear classifies the
resolved binary as a masquerade wrapper by:
Changes in `bear/src/bin/wrapper.rs`, after resolving the real executable:
1. Reading the file as a symbolic link (`read_link`, followed
iteratively -- not `canonicalize`, which resolves too aggressively
and would hide, for example, `/usr/bin/gcc -> gcc-13`).
2. Taking the final target's file name and comparing it, lowercased,
against a fixed set of known wrapper names: `ccache`, `distcc`,
`icecc`, `colorgcc`, `buildcache`.
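A minimal sketch of the two classification steps, assuming a Unix-like filesystem. The function names, the `MASQUERADE_NAMES` constant, and the `MAX_HOPS` guard are illustrative and are not taken from `environment.rs`.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Wrapper names from the fixed set listed above.
const MASQUERADE_NAMES: [&str; 5] = ["ccache", "distcc", "icecc", "colorgcc", "buildcache"];

/// Arbitrary upper bound on symlink hops (an assumption of this sketch),
/// so that a symlink cycle cannot spin forever.
const MAX_HOPS: usize = 32;

/// Step 1: follow the symlink chain one `read_link` at a time and return
/// the last path reached.
fn follow_links(start: &Path) -> PathBuf {
    let mut current = start.to_path_buf();
    for _ in 0..MAX_HOPS {
        match fs::read_link(&current) {
            Ok(target) => {
                // A relative target is relative to the link's own directory.
                current = match current.parent() {
                    Some(dir) if target.is_relative() => dir.join(target),
                    _ => target,
                };
            }
            // Not a symlink (or unreadable): the chain ends here.
            Err(_) => break,
        }
    }
    current
}

/// Step 2: compare the lowercased file name against the fixed set.
fn is_known_wrapper_name(path: &Path) -> bool {
    path.file_name()
        .and_then(|name| name.to_str())
        .map(|name| MASQUERADE_NAMES.contains(&name.to_lowercase().as_str()))
        .unwrap_or(false)
}

/// Both steps combined: does this candidate end up at a known wrapper?
fn is_masquerade_wrapper(candidate: &Path) -> bool {
    is_known_wrapper_name(&follow_links(candidate))
}
```

Under these assumptions, `/usr/lib64/ccache/gcc -> /usr/bin/ccache` is classified as a wrapper, while `/usr/bin/gcc -> gcc-13` is not, because the final file name `gcc-13` is not in the set.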
```rust
// Tell ccache to use the real compiler directly, bypassing PATH search.
// This prevents ccache from finding Bear's wrapper in .bear/ and looping.
// Only set if the user hasn't already configured CCACHE_COMPILER.
if !execution.environment.contains_key("CCACHE_COMPILER") {
execution.environment.insert(
"CCACHE_COMPILER".into(),
real_executable.to_string_lossy().into(),
);
}
```
If the match succeeds, the directory containing the resolved binary is
flagged as a masquerade directory. The resolution retries with that
directory removed from the search PATH. The process repeats until it
lands on a non-masquerade compiler or exhausts PATH.
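The retry loop in the paragraph above could look roughly like this. It is a sketch, not the helper in `environment.rs`: the name `resolve_past_masquerade` is hypothetical, and the filesystem checks are passed in as closures (`exists`, `is_masquerade`) purely to keep the example self-contained.

```rust
use std::path::{Path, PathBuf};

/// Resolves a bare compiler name against `search_dirs`, dropping each
/// masquerade directory it runs into and retrying, until it finds a
/// non-masquerade compiler or runs out of directories.
fn resolve_past_masquerade(
    name: &str,
    search_dirs: &[PathBuf],
    exists: impl Fn(&Path) -> bool,
    is_masquerade: impl Fn(&Path) -> bool,
) -> Option<PathBuf> {
    // Only this private copy of the lookup PATH is filtered.
    let mut dirs: Vec<PathBuf> = search_dirs.to_vec();
    loop {
        // First hit in the (possibly filtered) lookup PATH; None ends the search.
        let candidate = dirs
            .iter()
            .map(|dir| dir.join(name))
            .find(|path| exists(path.as_path()))?;
        if !is_masquerade(candidate.as_path()) {
            return Some(candidate);
        }
        // Flag the containing directory as a masquerade dir and retry without it.
        let masquerade_dir = candidate.parent()?.to_path_buf();
        let before = dirs.len();
        dirs.retain(|dir| *dir != masquerade_dir);
        if dirs.len() == before {
            // Nothing was removed; bail out rather than loop forever.
            return None;
        }
    }
}
```

Returning `None` corresponds to the case where PATH is exhausted without finding a real compiler.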
**Complexity**: Very low. ~5 lines in `wrapper.rs`.
**Alignment**: Good. Uses ccache's own documented interface. The wrapper
already has all the information needed (real compiler path from config).
If a non-masquerade compiler is not found, Bear logs a warning and
does not register a wrapper for that name. The build will see its
normal PATH, the same as if Bear were not involved; this is strictly
better than registering a wrapper that loops.
**Why this is sufficient**:
- ccache is the only common wrapper that causes recursion with Bear.
- distcc does not loop (its PATH stripping removes `.bear/`).
- colorgcc is rare and typically configured via `~/.colorgccrc` with
explicit paths, which avoids the PATH search problem.
- Setting `CCACHE_COMPILER` when ccache is not involved is harmless --
the variable is simply ignored by non-ccache compilers.
- The `contains_key` check preserves any user-configured value.
### Scope of the change
- `bear/src/intercept/environment.rs`:
- `resolve_program_path` -- used for `CC=gcc`-style env vars
- `compiler_candidates` -- used for PATH-based discovery when no
compilers are configured
- Both paths share a helper that filters masquerade directories and
reruns the search.
- The child process's PATH is not modified; only Bear's own lookup
PATH is filtered. Masquerade directories remain visible to the
build, which matters if, for example, a Makefile hard-codes
`/usr/lib/ccache/gcc`; that call is unaffected and still intercepted
only if Bear happens to have a wrapper for the basename.
### Interaction with existing code
- The manual workaround `ccache_free_path_and_compiler` in
`integration-tests/tests/cases/intercept.rs` becomes unnecessary
once this is in. Tests that use it are rewritten to rely on Bear
itself stripping the masquerade dir, so that the test also protects
this requirement against regression.
## Non-functional constraints
- Detection must be pure filesystem inspection. No subprocess may be
spawned to identify a wrapper, both for cost and to avoid executing a
binary Bear has no reason to trust.
- Resolution failure for one compiler must not fail Bear overall;
other compilers are still registered.
- The set of recognised wrapper names is fixed in source. Uncommon
or locally built wrappers that do not match are not detected; the
user can either remove their directories from PATH or use preload mode.
## Testing
Given a host where `/usr/lib64/ccache/gcc -> /usr/bin/ccache` is first
in PATH:
> When the user runs `bear -- make` in wrapper mode,
> then the build completes within a normal timeout,
> and `compile_commands.json` contains one entry per source,
> and the recorded compiler path is an absolute path that is not
> a masquerade wrapper and not the Bear wrapper.
Given a host with no masquerade wrapper installed:
> When the user runs `bear -- make`,
> then Bear's resolution behaves identically to before (no filtering
> kicks in, no performance regression),
> and the compilation database is produced normally.
Given a compiler that exists only as a masquerade symlink on PATH
(no real compiler past it):
> When Bear resolves it,
> then Bear logs a warning naming the compiler and the detected
> wrapper,
> and does not register a `.bear/` wrapper for it,
> and the build uses the compiler directly without Bear interception
> for that name.
Given a nested compiler invocation (a compiler driver calls another
bare-name compiler from the child process):
> When the child invokes `cc -c foo.c`,
> then `.bear/cc` is still first on PATH in the grandchild process,
> so the invocation is intercepted.
### CI coverage
The existing `rust CI` workflow (`.github/workflows/build_rust.yml`)
runs integration tests on `ubuntu-latest`. The Ubuntu matrix entry
runs `apt-get install -y ccache` before `cargo test`, which creates
`/usr/lib/ccache/*` symlinks. The job does NOT prepend that dir to
PATH: putting ccache first on the job PATH would inflate event
counts for every preload-mode test that asserts an exact number of
compiler invocations.
At build time, `integration-tests/build.rs` scans well-known
locations (`/usr/lib/ccache`, `/usr/lib64/ccache`,
`/usr/libexec/ccache`) for a ccache masquerade directory and, if
found, exposes it via the `CCACHE_MASQUERADE_DIR` env var and sets
`cfg(host_has_ccache_masquerade)`. The dedicated recursion test is
gated on that cfg. At runtime the test prepends
`CCACHE_MASQUERADE_DIR` to its own child PATH, exercising the
recursion scenario regardless of the host's default PATH while
leaving other tests ccache-free.
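A rough sketch of what such a build-script probe might look like. The helper names and the plain `is_dir` check are assumptions for illustration; the actual `build.rs` may validate the directory's contents more strictly. The `cargo:rustc-env` and `cargo:rustc-cfg` lines are standard Cargo build-script directives.

```rust
use std::path::{Path, PathBuf};

/// Well-known ccache masquerade locations named above.
const WELL_KNOWN_DIRS: [&str; 3] = ["/usr/lib/ccache", "/usr/lib64/ccache", "/usr/libexec/ccache"];

/// Returns the first candidate that exists as a directory.
/// Assumption: existence alone is treated as "masquerade dir present".
fn find_masquerade_dir(candidates: &[&str]) -> Option<PathBuf> {
    candidates
        .iter()
        .map(Path::new)
        .find(|dir| dir.is_dir())
        .map(Path::to_path_buf)
}

/// Builds the lines a build script would print for Cargo: the env var
/// consumed by the test and the cfg flag that gates it.
fn cargo_directives(found: Option<&Path>) -> Vec<String> {
    match found {
        Some(dir) => vec![
            format!("cargo:rustc-env=CCACHE_MASQUERADE_DIR={}", dir.display()),
            "cargo:rustc-cfg=host_has_ccache_masquerade".to_string(),
        ],
        // No masquerade dir: emit nothing, so the gated test is compiled out.
        None => Vec::new(),
    }
}
```

In a build script, `main` would call `find_masquerade_dir(&WELL_KNOWN_DIRS)` and print each returned directive with `println!`. The test can then read the directory with `env!("CCACHE_MASQUERADE_DIR")` behind `#[cfg(host_has_ccache_masquerade)]`.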
## Notes
- The integration test for issue #686 (`wrapper_mode_resolves_cc_bare_name_via_path`)
had to manually strip ccache from PATH to work. With this fix, that
workaround could be removed.
- ccache documentation recommends placing its directory first in PATH, which
is the standard setup on Fedora, Arch, Gentoo, and other distributions.
- `CCACHE_CC` is a deprecated alias for `CCACHE_COMPILER`. We should use the
modern name.
- ccache 4.x documentation: https://ccache.dev/manual/4.10.2.html
- Related issue: #686.
### Alternatives considered and rejected
**Setting `CCACHE_COMPILER` in the wrapper's child environment.**
The original proposal. Rejected because the path the wrapper knows
IS the ccache symlink (that is what `which gcc` returned at setup),
and `CCACHE_COMPILER` pointing at a symlink-to-ccache makes ccache
recurse into itself. Empirically verified: on Fedora,
`CCACHE_COMPILER=/usr/lib64/ccache/gcc ccache gcc -c foo.c` hangs and
must be killed; `CCACHE_COMPILER=/usr/bin/gcc` works. The fix would
have required also resolving past ccache to get the real path --
which is precisely what this requirement does, making `CCACHE_COMPILER`
redundant. It is also ccache-specific and would not help with icecc,
distcc, or any other wrapper that lacks an equivalent variable.
**`CCACHE_PATH` alternative.** Set `CCACHE_PATH` to PATH minus
`.bear/`. Rejected: ccache-specific (no equivalent for other
wrappers), requires enumerating a safe PATH anyway, and does not
address the deeper issue (Bear's config pointing at the wrong
executable).
**Removing masquerade directories from the child's PATH.** Rejected:
masquerade directories might contain binaries other than the ones
that loop (e.g. some installs put `distcc` itself in the same dir);
stripping them globally would be heavy-handed. Filtering Bear's own
lookup PATH is the narrower intervention.
### Related
- Issue #445 -- original PATH-ordering report
- Issue #686 -- bare-name CC resolution (`wrapper_mode_resolves_cc_bare_name_via_path`)
- Related requirement: `interception-wrapper-mechanism`
- ccache 4.x manual: https://ccache.dev/manual/4.10.2.html
- icecream masquerade setup: https://github.com/icecc/icecream