mirror of https://github.com/yamadashy/repomix.git
synced 2026-05-30 11:18:53 +02:00
343e6c8c8e
intent(token-count-cache): warm-run repacks of the same repo spend ~600ms in BPE tokenization for ~1000 files; persist token counts across CLI invocations so the metrics phase becomes ~free on the second run
decision(cache-shape): single shared JSON under $TMPDIR/repomix-cache/ keyed by `${encoding}:${byteLength}:${md5_16}` — content-addressed entries dedupe across repos (vendored copies, shared boilerplate) and any change to encoding/length/digest auto-invalidates without explicit accounting
decision(wrapper-cache): reuse the same cache for the output-wrapper token count — wrapper string is byte-stable across runs whenever file set, headers, instructions, and template are unchanged, and a hit replaces a ~30ms worker round-trip with MD5+Map.get
decision(save-strategy): await save at end of pack() rather than fire-and-forget so newly produced entries are not lost when a fast-exiting CLI tears down before the write flushes
decision(atomicity): write to ${cacheFile}.${pid}.tmp then rename onto destination so concurrent invocations or SIGINT mid-write cannot leave torn JSON that nukes the cache for the next run
decision(eviction): FIFO on insertion order at MAX_CACHE_ENTRIES=100_000 — true LRU would dirty the map on every read and force a write on every warm run, which costs more than it saves
decision(disable-switch): env-var only (REPOMIX_TOKEN_CACHE=0, REPOMIX_TOKEN_CACHE_PATH=…) — no config schema entry needed for a transparent perf cache with graceful degradation
rejected(per-repo-cache-file): would lose cross-repo dedupe and require placing state under .git or a path-hash directory; single shared file with FIFO eviction is simpler and still bounded
rejected(true-LRU): every getCached would require Map.delete+set to refresh order, dirtying state on read and forcing a save on warm runs even when no new entries were produced
constraint(test-isolation): tests share $TMPDIR with the developer's real cache; vitestSetup defaults REPOMIX_TOKEN_CACHE=0 so the suite neither reads nor writes the host cache, and tokenCountCache.test.ts overrides REPOMIX_TOKEN_CACHE_PATH per test to a fresh tmpdir
constraint(double-md5): cache key is computed once during miss-detection and carried alongside the file into the worker-result map so cold-cache runs do not re-hash content (~10ms saved on 1000 files)
learned(fs-permissions): cache contains digests of user file contents; the cache directory is created with mode 0o700 and the file with mode 0o600 so it is not world-readable on shared hosts
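The cache-shape, eviction, atomicity, and fs-permissions notes above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this commit: the names makeKey, setWithFifo, and saveAtomically are hypothetical, md5_16 is assumed to mean the first 16 hex characters of the MD5 digest, byteLength is assumed to be the UTF-8 byte length, and the on-disk JSON layout (an array of [key, count] pairs) is a guess. Synchronous fs calls are used for brevity, whereas the commit message describes an awaited save.

```typescript
import { createHash } from "node:crypto";
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Cap taken from the commit message; every function name below is hypothetical.
const MAX_CACHE_ENTRIES = 100_000;

// Key shape `${encoding}:${byteLength}:${md5_16}`.
// Assumption: md5_16 is the first 16 hex chars of the MD5 digest of the content.
function makeKey(encoding: string, content: string): string {
  const byteLength = Buffer.byteLength(content, "utf8");
  const md5_16 = createHash("md5").update(content, "utf8").digest("hex").slice(0, 16);
  return `${encoding}:${byteLength}:${md5_16}`;
}

// FIFO eviction: a Map iterates in insertion order, so its first key is the
// oldest entry. Reads never reorder the Map, so a warm run leaves it clean.
function setWithFifo(
  cache: Map<string, number>,
  key: string,
  tokens: number,
  max: number = MAX_CACHE_ENTRIES,
): void {
  cache.set(key, tokens);
  while (cache.size > max) {
    const oldest = cache.keys().next().value;
    if (oldest === undefined) break;
    cache.delete(oldest);
  }
}

// Atomic save: write to `${cacheFile}.${pid}.tmp`, then rename onto the
// destination, so a reader sees either the old file or the complete new one.
function saveAtomically(cacheFile: string, cache: Map<string, number>): void {
  // 0o700 applies only to directories created by this call (subject to umask).
  fs.mkdirSync(path.dirname(cacheFile), { recursive: true, mode: 0o700 });
  const tmp = `${cacheFile}.${process.pid}.tmp`;
  // Assumed layout: array of [key, count] pairs, which preserves insertion order.
  fs.writeFileSync(tmp, JSON.stringify(Array.from(cache.entries())), { mode: 0o600 });
  fs.renameSync(tmp, cacheFile);
}

// Usage: write into a throwaway directory, not the real cache location.
// "o200k_base" is an example encoding name; the count 1 is a placeholder value.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "token-cache-sketch-"));
const demoFile = path.join(demoDir, "cache", "cache.json");
const demoCache = new Map<string, number>();
setWithFifo(demoCache, makeKey("o200k_base", "hello"), 1);
saveAtomically(demoFile, demoCache);
```

Because the key embeds the encoding, the byte length, and the digest, a change to any of them simply produces a different key, which is what makes invalidation implicit.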
9 lines
468 B
TypeScript
// Disable the token-count disk cache by default for the entire test suite so
// that (a) test runs do not read or write the developer's real cache file in
// $TMPDIR and (b) tests asserting on worker dispatch behavior are not skewed
// by entries left behind by a previous run. Tests that exercise the cache
// directly explicitly clear this variable in their own setup.
if (process.env.REPOMIX_TOKEN_CACHE === undefined) {
  process.env.REPOMIX_TOKEN_CACHE = '0';
}
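A test that exercises the cache directly has to undo this default and point the cache somewhere disposable. The helper below is one way that could look. It is a sketch only: withIsolatedTokenCache and restoreEnv are hypothetical names, not part of repomix, and it assumes REPOMIX_TOKEN_CACHE_PATH names a file rather than a directory, which the source does not confirm. It handles synchronous callbacks only.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Restore an environment variable to a previously captured value.
function restoreEnv(name: string, value: string | undefined): void {
  if (value === undefined) {
    delete process.env[name];
  } else {
    process.env[name] = value;
  }
}

// Hypothetical helper: runs `fn` with the cache enabled and redirected to a
// fresh temp directory, then restores both variables and removes the directory.
function withIsolatedTokenCache<T>(fn: (cachePath: string) => T): T {
  const savedEnabled = process.env.REPOMIX_TOKEN_CACHE;
  const savedPath = process.env.REPOMIX_TOKEN_CACHE_PATH;

  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "repomix-cache-test-"));
  // Assumption: the variable holds a file path; "cache.json" is a made-up name.
  const cachePath = path.join(dir, "cache.json");

  // Clear the suite-wide '0' so the cache is active for this test only.
  delete process.env.REPOMIX_TOKEN_CACHE;
  process.env.REPOMIX_TOKEN_CACHE_PATH = cachePath;
  try {
    return fn(cachePath);
  } finally {
    restoreEnv("REPOMIX_TOKEN_CACHE", savedEnabled);
    restoreEnv("REPOMIX_TOKEN_CACHE_PATH", savedPath);
    fs.rmSync(dir, { recursive: true, force: true });
  }
}
```

Restoring in a finally block keeps one failing test from leaking an enabled cache into the tests that run after it.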