mirror of
https://github.com/rizsotto/Bear.git
synced 2026-05-28 00:20:45 +02:00
2425630e9c
Follow-up to bee907e. Narrow-scope corrections from an independent
second review of the masquerade-wrapper handling.
Behaviour
- filter_out_paths now normalises trailing path separators on both
sides before comparing, so a PATH entry written as
'/usr/lib64/ccache/' (with trailing slash) still matches an
excluded dir derived from PathBuf::parent() (which never has one).
Without this, the defensive "already excluded but returned again"
branch fired and Bear gave up with a bogus "no real compiler past
masquerade dir" warning even when a real compiler was reachable.
New unit test filter_out_paths_matches_across_trailing_separator
protects the case.
- integration-tests/build.rs now also scans
/opt/homebrew/opt/ccache/libexec and /usr/local/opt/ccache/libexec,
the default Homebrew ccache masquerade locations on Apple Silicon
and Intel macOS. Without these, a Homebrew-equipped developer saw
host_has_ccache_masquerade silently stay unset and the recursion
integration test skipped.
Tests
- wrapper_mode_survives_masquerade_wrapper_in_path now also asserts
that the recorded compiler path is absolute, matching the
acceptance criterion's "absolute path to the real compiler"
language.
Docs
- Requirement: struck the "nested compiler invocations ... .bear/
stays at the front of the child's PATH" bullet from the
acceptance criteria. That guarantee belongs to
interception-wrapper-mechanism and is preserved here by not
modifying the child's PATH; the previous wording implied this
requirement owns a guarantee it only protects.
- Requirement: clarified that the PATH-scan path
(compiler_candidates) filters per-file rather than per-directory.
Distro-shipped masquerade dirs contain only symlinks, so the
behaviours coincide in practice; the wording now matches what
the code does.
- Requirement: added a "detection is symlink-based" entry under
non-functional constraints. Masquerade wrappers installed as
shell scripts or hard copies are out of scope and will not be
detected; if a non-symlink masquerade appears in the wild,
extend detection rather than widen the classification helper to
read file contents.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
252 lines
12 KiB
Markdown
---
title: Resolve past masquerade wrappers in wrapper mode
status: implemented
---

## Intent

When the user runs `bear -- make` on a distribution that ships compiler
masquerade wrappers (ccache on Fedora/Arch/Gentoo, icecream on its
supported distros, etc.), Bear's wrapper mode must not enter an infinite
loop with the masquerade wrapper. The compilation database must record
the real compiler command, and the build must complete. The user should
not have to strip any directories from PATH to make Bear work.

Bear achieves this by resolving past masquerade directories at
discovery time. The price is that while Bear is observing the build,
tools like ccache are not exercised -- the build sees the real compiler
directly. This is intentional: Bear observes, it does not optimise.

## Background: how masquerade wrappers break Bear

Compiler masquerade wrappers (ccache, distcc, icecream/icecc,
colorgcc, buildcache) install a directory of symlinks named after real
compilers (`/usr/lib64/ccache/gcc`, `/usr/lib/icecc/bin/gcc`, ...) where
each symlink points at the wrapper binary. The distribution prepends
that directory to PATH, so a bare `gcc` in a Makefile resolves to the
wrapper, which then looks up the real compiler on PATH (skipping its
own symlinks) and forwards the call.

Bear's wrapper mode puts `.bear/` (full of hard links to `bear-wrapper`)
at the front of PATH. On a ccache-equipped box the interaction is:

1. Shell finds `.bear/gcc`, runs Bear wrapper.
2. Wrapper reads its config: real `gcc` is `/usr/lib64/ccache/gcc` --
   whatever `which gcc` returned at Bear startup.
3. Wrapper execs `/usr/lib64/ccache/gcc` (which IS ccache).
4. ccache searches PATH for `gcc`, skipping symlinks to itself. It
   does NOT skip `.bear/gcc` because that is a hard link, not a
   symlink, so ccache accepts it as the real compiler.
5. ccache execs `.bear/gcc`, Bear wrapper runs again. Steps 2-5
   repeat forever.

The same shape applies to any masquerade wrapper that detects itself
only by symlink comparison. distcc in masquerade mode happens to avoid
this specific loop because it strips all PATH entries up to and
including its own dir -- which drops `.bear/` as collateral damage --
but that still means distcc silently removes Bear from the child's
PATH, which breaks nested interception even when no loop occurs.

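The distcc behaviour described above comes down to simple PATH arithmetic. The following is a minimal sketch of that arithmetic only, not distcc's actual code; `strip_through` is a hypothetical name introduced for illustration.

```rust
use std::path::{Path, PathBuf};

/// Model of the PATH handling described above: drop every entry up to and
/// including the wrapper's own masquerade directory.
/// Hypothetical helper; not taken from distcc or Bear.
fn strip_through(path_entries: &[PathBuf], own_dir: &Path) -> Vec<PathBuf> {
    match path_entries.iter().position(|entry| entry.as_path() == own_dir) {
        // Keep only what follows the masquerade directory.
        Some(index) => path_entries[index + 1..].to_vec(),
        // The directory is not on PATH: nothing to strip.
        None => path_entries.to_vec(),
    }
}
```

With the entries `.bear`, `/usr/lib/distcc`, `/usr/bin` and `own_dir` set to `/usr/lib/distcc`, only `/usr/bin` survives, which is why `.bear/` disappears from the child's PATH.
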
### Known masquerade wrappers

| Tool             | Masquerade dir examples                    | Notes                                                                |
|------------------|--------------------------------------------|----------------------------------------------------------------------|
| ccache           | `/usr/lib64/ccache`, `/usr/lib/ccache`     | Default on Fedora, Arch, Gentoo. Loops with Bear.                    |
| distcc           | `/usr/lib/distcc`, `/usr/lib/distcc/bin`   | Strips PATH prefix including `.bear/`; no loop, but breaks nesting.  |
| icecream / icecc | `/usr/lib/icecc/bin`, `/usr/libexec/icecc` | Symlink pattern same as ccache. Loops with Bear.                     |
| colorgcc         | `~/bin/colorgcc` setups                    | Rare; typically configured via `~/.colorgccrc`, not PATH masquerade. |
| buildcache       | `/usr/lib/buildcache/bin` (varies)         | Same shape as ccache.                                                |
| sccache          | Not a masquerade wrapper                   | Invoked explicitly (`sccache gcc ...`); no recursion with Bear.      |

Detection is by symlink resolution, not by matching directory paths,
so new or distribution-local masquerade setups are covered as long as
their installer symlinks compiler names to a wrapper binary.

## Acceptance criteria

- Wrapper mode completes without hanging when any supported masquerade
  wrapper directory is present in PATH
- The compilation database contains one entry per compiled source file
- The compiler path recorded in each entry is an absolute path to the
  real compiler, never the masquerade wrapper and never a `.bear/`
  wrapper
- The user is not required to strip any directory from PATH, unset
  any environment variable, or configure `CCACHE_*` manually
- If every `gcc` on PATH is a masquerade wrapper and no real compiler
  can be found past them, Bear reports a diagnostic and skips
  registering that compiler (it does not fall back to the wrapper)

## Implementation details

### Detection

For each compiler that Bear resolves during wrapper setup (from
`CC`/`CXX`/... env vars or PATH discovery), Bear classifies the
resolved binary as a masquerade wrapper by:

1. Short-circuiting on non-symlinks so iterating every executable
   on PATH stays cheap.
2. For symlinks, following the chain to its ultimate target. Only
   the final target's basename is inspected; the resolved path
   itself is discarded. Implementation uses `std::fs::canonicalize`;
   iterative `read_link` would work equivalently.
3. Comparing that basename, ASCII-case-insensitive, against a fixed
   set of known wrapper names: `ccache`, `distcc`, `icecc`,
   `colorgcc`, `buildcache`.

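The three classification steps can be sketched as follows. This is an illustrative reading of the description above, not Bear's source: the function name `is_masquerade_wrapper` and the constant `KNOWN_WRAPPERS` are assumptions, and treating dangling or unreadable links as "not a wrapper" is a choice made for the sketch.

```rust
use std::fs;
use std::path::Path;

/// Wrapper names taken from the list above.
const KNOWN_WRAPPERS: [&str; 5] = ["ccache", "distcc", "icecc", "colorgcc", "buildcache"];

/// Returns true when `candidate` is a symlink whose final target is named
/// after a known wrapper. Hypothetical name; Bear's helper may differ.
fn is_masquerade_wrapper(candidate: &Path) -> bool {
    // Step 1: non-symlinks (and unreadable entries) are never wrappers.
    let is_symlink = fs::symlink_metadata(candidate)
        .map(|meta| meta.file_type().is_symlink())
        .unwrap_or(false);
    if !is_symlink {
        return false;
    }
    // Step 2: follow the whole chain; a dangling link counts as "not a wrapper".
    let Ok(target) = fs::canonicalize(candidate) else {
        return false;
    };
    // Step 3: only the target's basename is compared, ignoring ASCII case.
    target
        .file_name()
        .and_then(|name| name.to_str())
        .map(|name| KNOWN_WRAPPERS.iter().any(|known| name.eq_ignore_ascii_case(known)))
        .unwrap_or(false)
}
```

The canonical path is used only inside the function and never returned, which matches the rule that the resolved path is discarded after its basename is inspected.
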
Note that Bear must NOT canonicalise when *registering* a compiler,
because that would change the name (e.g. `/usr/bin/gcc` ->
`/usr/bin/gcc-13`) and break wrapper lookup. The classification
helper is used only to decide whether to keep looking; the path
stored in the wrapper config is whatever PATH resolution returned
past the masquerade dirs.

When `resolve_program_path` matches a masquerade wrapper, the
containing directory is excluded from Bear's lookup PATH and the
basename is re-resolved. The process repeats until it lands on a
non-masquerade compiler or exhausts PATH.

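The exclude-and-retry loop can be sketched as below. This is a simplified model under stated assumptions: `resolve_past_masquerade` is a hypothetical name, the `is_wrapper` closure stands in for the classification helper, and a plain `is_file` check replaces whatever executable test the real lookup performs.

```rust
use std::path::{Path, PathBuf};

/// Looks `name` up in `search_dirs`, excluding every directory whose match is
/// classified as a masquerade wrapper. Returns the first non-wrapper hit, or
/// None when the search list is exhausted.
/// All names here are hypothetical and not taken from Bear's source.
fn resolve_past_masquerade(
    name: &str,
    search_dirs: &[PathBuf],
    is_wrapper: impl Fn(&Path) -> bool,
) -> Option<PathBuf> {
    let mut excluded: Vec<PathBuf> = Vec::new();
    loop {
        // Re-resolve the basename against the directories not yet excluded.
        let hit = search_dirs
            .iter()
            .filter(|dir| !excluded.contains(*dir))
            .map(|dir| dir.join(name))
            .find(|candidate| candidate.is_file())?;
        if !is_wrapper(hit.as_path()) {
            // The path is returned as found, never canonicalised.
            return Some(hit);
        }
        // Exclude the containing directory from the lookup list and retry.
        excluded.push(hit.parent()?.to_path_buf());
    }
}
```

Only the local `excluded` list changes between iterations; `search_dirs` itself is untouched, mirroring the rule that Bear filters its own lookup PATH rather than the child's.
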
The PATH-scan path (`compiler_candidates`) uses the classification
helper as a per-file filter rather than a per-directory exclusion:
entries that are not themselves masquerade symlinks are registered
even when they sit in a known masquerade dir. In practice the
distro-shipped masquerade dirs contain nothing but symlinks to the
wrapper binary, so the behaviours coincide; the per-file filter
avoids surprising the user if a distribution ever adds a real
binary alongside the symlinks.

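A per-file filter of this kind might look like the sketch below. It is not the actual `compiler_candidates` code: `candidates_in_dir` is a hypothetical helper, `is_wrapper` again stands in for the classification helper, and compiler-name matching and the executable-bit check are left out for brevity.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Lists the regular files in `dir` that `is_wrapper` does not classify as
/// masquerade symlinks. The decision is made per entry, so a real binary that
/// sits beside wrapper symlinks is still returned.
/// Hypothetical helper; not taken from Bear's source.
fn candidates_in_dir(dir: &Path, is_wrapper: impl Fn(&Path) -> bool) -> Vec<PathBuf> {
    let Ok(entries) = fs::read_dir(dir) else {
        return Vec::new(); // Unreadable or missing PATH entries are skipped.
    };
    let mut kept: Vec<PathBuf> = entries
        .filter_map(|entry| entry.ok())
        .map(|entry| entry.path())
        .filter(|path| path.is_file() && !is_wrapper(path.as_path()))
        .collect();
    kept.sort(); // Deterministic order; read_dir makes no ordering promise.
    kept
}
```
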
If a non-masquerade compiler is not found, Bear logs a warning and
does not register a wrapper for that name. The build will see its
normal PATH, the same as if Bear were not involved; this is strictly
better than registering a wrapper that loops.

### Scope of the change

- `bear/src/intercept/environment.rs`:
  - `resolve_program_path` -- used for `CC=gcc`-style env vars
  - `compiler_candidates` -- used for PATH-based discovery when no
    compilers are configured
- Both paths share a helper that filters masquerade directories and
  reruns the search.
- The child process's PATH is not modified; only Bear's own lookup
  PATH is filtered. Masquerade directories remain visible to the
  build, which matters if, for example, a Makefile hard-codes
  `/usr/lib/ccache/gcc`; that call is unaffected and still intercepted
  only if Bear happens to have a wrapper for the basename.

### Interaction with existing code

- The manual workaround `ccache_free_path_and_compiler` in
  `integration-tests/tests/cases/intercept.rs` becomes unnecessary
  once this is in. Tests that use it are rewritten to rely on Bear
  itself stripping the masquerade dir, so that the test also protects
  this requirement against regression.

## Non-functional constraints

- Detection must be pure filesystem inspection. No subprocess may be
  spawned to identify a wrapper, for reasons of both cost and trust.
- Resolution failure for one compiler must not fail Bear overall;
  other compilers are still registered.
- The set of recognised wrapper names is fixed in source. Uncommon
  or locally built wrappers that do not match are not detected; the
  user can either remove their directories from PATH or use preload
  mode.
- Detection is symlink-based. A masquerade wrapper installed as a
  shell script or hard copy (rather than a symlink) is out of scope
  and will not be detected. All major distros (Debian/Ubuntu,
  Fedora, Arch, Gentoo, macOS Homebrew) ship masquerade dirs as
  directories of symlinks, so this is a theoretical gap. If a
  non-symlink masquerade does surface in the wild, extend detection
  rather than widening the classification helper to read file
  contents.

## Testing

Given a host where `/usr/lib64/ccache/gcc -> /usr/bin/ccache` is first
in PATH:

> When the user runs `bear -- make` in wrapper mode,
> then the build completes within a normal timeout,
> and `compile_commands.json` contains one entry per source,
> and the recorded compiler path is an absolute path that is not
> a masquerade wrapper and not the Bear wrapper.

Given a host with no masquerade wrapper installed:

> When the user runs `bear -- make`,
> then Bear's resolution behaves identically to before (no filtering
> kicks in, no performance regression),
> and the compilation database is produced normally.

Given a compiler that exists only as a masquerade symlink on PATH
(no real compiler past it):

> When Bear resolves it,
> then Bear logs a warning naming the compiler and the masquerade
> dir(s) it excluded,
> and does not register a `.bear/` wrapper for it,
> and the build uses the compiler directly without Bear interception
> for that name.

Nested compiler invocations (a compiler driver spawning another
bare-name compiler in a grandchild process) must still be
intercepted; that guarantee is not specific to masquerade handling
and is covered by `interception-wrapper-mechanism`. This
requirement preserves it by not modifying the child's PATH.

### CI coverage

The existing `rust CI` workflow (`.github/workflows/build_rust.yml`)
runs integration tests on `ubuntu-latest`. The Ubuntu matrix entry
runs `apt-get install -y ccache` before `cargo test`, which creates
`/usr/lib/ccache/*` symlinks. The job does NOT prepend that dir to
PATH: putting ccache first on the job PATH would inflate event
counts for every preload-mode test that asserts an exact number of
compiler invocations.

At build-time, `integration-tests/build.rs` scans well-known
locations (`/usr/lib/ccache`, `/usr/lib64/ccache`,
`/usr/libexec/ccache`) for a ccache masquerade directory and, if
found, exposes it via the `CCACHE_MASQUERADE_DIR` env var and sets
`cfg(host_has_ccache_masquerade)`. The dedicated recursion test is
gated on that cfg. At runtime the test prepends
`CCACHE_MASQUERADE_DIR` to its own child PATH, exercising the
recursion scenario regardless of the host's default PATH while
leaving other tests ccache-free.

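The build-time scan can be sketched with Cargo's standard build-script directives. This is not the contents of `integration-tests/build.rs`: the helper names are hypothetical, "is a masquerade directory" is approximated here by "exists as a directory", and passing the value through `cargo:rustc-env` is an assumption about how the env var is exposed.

```rust
use std::path::{Path, PathBuf};

/// Returns the first candidate that exists as a directory.
/// Assumption: existence is enough to call it a masquerade directory; the
/// real build script may check more.
fn find_masquerade_dir(candidates: &[&Path]) -> Option<PathBuf> {
    candidates
        .iter()
        .find(|dir| dir.is_dir())
        .map(|dir| dir.to_path_buf())
}

/// Builds the lines a build script would print so Cargo sets the cfg flag
/// and the environment variable named in the text above.
fn cargo_directives(dir: &Path) -> Vec<String> {
    vec![
        "cargo:rustc-cfg=host_has_ccache_masquerade".to_string(),
        format!("cargo:rustc-env=CCACHE_MASQUERADE_DIR={}", dir.display()),
    ]
}
```

A build script using these helpers would print each returned line to standard output when a directory is found and print nothing otherwise, leaving the cfg unset so that tests gated on it are compiled out.
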
## Notes

### Alternatives considered and rejected

**Setting `CCACHE_COMPILER` in the wrapper's child environment.**
The original proposal. Rejected because the path the wrapper knows
IS the ccache symlink (that is what `which gcc` returned at setup),
and `CCACHE_COMPILER` pointing at a symlink-to-ccache makes ccache
recurse into itself. Empirically verified: on Fedora,
`CCACHE_COMPILER=/usr/lib64/ccache/gcc ccache gcc -c foo.c` hangs and
must be killed; `CCACHE_COMPILER=/usr/bin/gcc` works. The fix would
have required also resolving past ccache to get the real path --
which is precisely what this requirement does, making `CCACHE_COMPILER`
redundant. It is also ccache-specific and would not help with icecc,
distcc, or any other wrapper that lacks an equivalent variable.

**`CCACHE_PATH` alternative.** Set `CCACHE_PATH` to PATH minus
`.bear/`. Rejected: ccache-specific (no equivalent for other
wrappers), requires enumerating a safe PATH anyway, and does not
address the deeper issue (Bear's config pointing at the wrong
executable).

**Removing masquerade directories from the child's PATH.** Rejected:
masquerade directories might contain binaries other than the ones
that loop (e.g. some installs put `distcc` itself in the same dir);
stripping them globally would be heavy-handed. Filtering Bear's own
lookup PATH is the narrower intervention.

### Related

- Issue #445 -- original PATH-ordering report
- Issue #686 -- bare-name CC resolution (`wrapper_mode_resolves_cc_bare_name_via_path`)
- Related requirement: `interception-wrapper-mechanism`
- ccache 4.x manual: https://ccache.dev/manual/4.10.2.html
- icecream masquerade setup: https://github.com/icecc/icecream