Lower `EAGER_WARMUP_THREADS` to 1 (from 3 on the unscoped path and 2 on the
explicit-scope path) when `tokenCountCacheFileExists()` returns true. With
the persistent token-count disk cache populated by a
prior run, `calculateFileMetrics` serves every per-file token count from
the in-memory map and dispatches zero worker tasks. The only worker work
that survives caching on a warm rerun is a small fixed set of dispatches:
- the wrapper-token tokenization (cache hit after run #2)
- git diff staged/worktree token counts (only when
`output.git.includeDiffs` is enabled)
- git log token count (only when `output.git.includeLogs` is enabled)
In the worst case that is 2-3 short tasks (a few KB each), which a single
warm worker runs serially in well under 30 ms. Spawning a second warm worker
adds a redundant ~340 ms BPE table parse that contends with the
file-collection main thread for CPU AND extends the final `pool.destroy()`
blocking wait (BPE-loaded workers take ~21 ms to terminate vs ~3 ms when idle).
Cold-cache (no cache file) behavior is preserved: the unscoped path keeps
3 warm workers and the explicit-scope path keeps 2, so the actual file
tokenizations still parallelize across the original worker counts.
The probe is a coarse heuristic — a cache file written by a previous run
that used a different `tokenCount.encoding` (e.g. cl100k_base instead of
the default o200k_base) yields no hits for the current run, so the metrics
phase pays one BPE parse sequentially on the critical path before
tokenizing files. This is a one-time cost on encoding switches; subsequent
runs rebuild the cache for the new encoding and hit again.
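A minimal sketch of the resulting warm-worker sizing, combining this commit's cache probe with the earlier unscoped / explicit-scope split. The `WarmupInputs` shape and `chooseEagerWarmupThreads` helper are hypothetical, and the sketch assumes the cache probe takes precedence over scope:

```typescript
// Hypothetical inputs; the real code derives these from config, CLI args
// and a tokenCountCacheFileExists()-style probe.
interface WarmupInputs {
  cacheFileExists: boolean; // persistent token-count cache found on disk
  hasExplicitScope: boolean; // --include / config.include / --stdin narrowed the file set
  cpuCount: number;
}

// Hypothetical helper: how many metrics workers to pre-warm.
const chooseEagerWarmupThreads = ({ cacheFileExists, hasExplicitScope, cpuCount }: WarmupInputs): number => {
  // Warm cache: per-file counts come from the in-memory map, so one worker
  // is enough for the few wrapper / git diff / git log dispatches.
  // Cold cache: 2 workers for explicitly scoped runs, 3 for the default scan.
  const desired = cacheFileExists ? 1 : hasExplicitScope ? 2 : 3;
  // Never exceed the CPU count, and always keep at least one worker.
  return Math.max(1, Math.min(cpuCount, desired));
};

console.log(chooseEagerWarmupThreads({ cacheFileExists: false, hasExplicitScope: false, cpuCount: 4 })); // 3
console.log(chooseEagerWarmupThreads({ cacheFileExists: true, hasExplicitScope: false, cpuCount: 4 })); // 1
```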
Benchmark (paired, n=25, repomix self-pack on 1068 files):
WARM CACHE (cache file present)
BASELINE mean=968.9ms median=976.0ms sd=40.3ms
AFTER mean=883.2ms median=875.0ms sd=33.1ms
DELTA mean=85.6 ms (8.84%) median=87.0 ms sd=42.7
t=10.02 (df=24) faster=24/25
COLD CACHE (cache file deleted before each run, n=12)
BASELINE mean=1606.3ms median=1588.0ms sd=58.6ms
AFTER mean=1593.2ms median=1598.5ms sd=58.6ms
DELTA mean=13.2 ms (0.82%) t=0.62 faster=9/12 — within noise
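For reference, the paired statistics above can be reproduced from raw timings with a sketch like the following. The `pairedSummary` helper is hypothetical; it uses the sample (n − 1) standard deviation of the per-pair deltas and t = mean / (sd / sqrt(n)):

```typescript
// Summary over per-run deltas (BASELINE - AFTER, in ms).
interface PairedSummary {
  n: number;
  meanDelta: number;
  sdDelta: number; // sample standard deviation (n - 1 denominator)
  t: number; // paired t statistic, df = n - 1
  faster: number; // pairs where AFTER beat BASELINE
}

const pairedSummary = (baseline: number[], after: number[]): PairedSummary => {
  if (baseline.length !== after.length || baseline.length < 2) {
    throw new Error('need at least two equal-length paired samples');
  }
  const deltas = baseline.map((b, i) => b - after[i]!);
  const n = deltas.length;
  const meanDelta = deltas.reduce((sum, d) => sum + d, 0) / n;
  const variance = deltas.reduce((sum, d) => sum + (d - meanDelta) ** 2, 0) / (n - 1);
  const sdDelta = Math.sqrt(variance);
  return {
    n,
    meanDelta,
    sdDelta,
    t: meanDelta / (sdDelta / Math.sqrt(n)),
    faster: deltas.filter((d) => d > 0).length,
  };
};
```

As a sanity check on the warm-cache row: 85.6 / (42.7 / sqrt(25)) ≈ 10.02, which matches the reported t.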
Stacks on top of the existing warm-cache wins on this branch (token-count
disk cache, output-wrapper cache, prefetched template, native ignore-file
prescan, etc.); this single change pushes warm-cache wall-clock another
~86 ms below the previous floor.
Companion to the previous commit. Plumb `tokenCountCacheFileExists` into
the packager `defaultDeps` so the metrics warm-up sizing can be exercised
deterministically from tests, and add a paired test that asserts the
1-warm-up-worker branch is taken when the persistent disk cache exists.
Also rename the cold-cache test to make the new gating explicit and refresh
its docstring with the warm/cold distinction.
https://claude.ai/code/session_01TJqKkJ8n3r6Pa2JdW9Vp2w
Add an `fs.readdir` mock (returning an empty array) to all relevant
`beforeEach` blocks so `collectIgnoreFilePatterns` does not fail with
"entries is not iterable". Update the globby option assertions to reflect
`gitignore: false` and `ignoreFiles: []`, now that patterns are pre-collected
by the prescan.
https://claude.ai/code/session_01Fm25x51fmGGeFMJyCm1CER
Final commit of the perf(core) Pre-warm security worker pool change —
extends the unit packager test and the integration packager test:
- tests/core/packager.test.ts: adds `createSecurityTaskRunner` mock to
the orchestration test's `mockDeps` and to the `parallel error
handling` `baseDeps()` shared fixture, updates the
`validateFileSafety.toHaveBeenCalledWith` assertion to expect the new
6th-argument deps object (`{ taskRunner: <Object> }`), and adds
positive/negative gate assertions —
`expect(deps.createSecurityTaskRunner).toHaveBeenCalled()` for the
default unscoped path, `.not.toHaveBeenCalled()` for the
`--include 'src'` and `explicitFiles` (--stdin) paths.
- tests/integration-tests/packager.test.ts: adds the
`createSecurityTaskRunner` stub so the default-scope path no longer
attempts to spawn a real worker pool (the previous unhandled-rejection
noise from a missing worker file URL is gone with this change).
(See PR description / first source commit for the full perf change
rationale, benchmark numbers, and correctness notes.)
Continuation of the perf(core) Pre-warm security worker pool change —
extends `mockDeps` / inline pack-test plumbing in the three smaller test
files so the default-scope path no longer attempts to spawn a real
worker pool from the test environment.
- tests/core/packager/diffsFunctionality.test.ts: adds
`mockCreateSecurityTaskRunner` to both pack-call sites.
- tests/core/packager/splitOutput.test.ts: same — adds the stub to the
inline mock deps.
- tests/core/security/validateFileSafety.test.ts: updates the
`runSecurityCheck` call assertion to include the new
`{ taskRunner: undefined }` deps argument forwarded by
`validateFileSafety` when no pre-warmed runner is provided.
(See PR description / parent commit for the full perf change rationale,
benchmark numbers, and correctness notes.)
Packing.
Bumps EAGER_WARMUP_THREADS from 2 to 3 in src/core/packager.ts when the
user did not narrow the file set via --include / config.include / --stdin.
Tinypool fixes maxThreads at construction, so the 3rd worker must be
pre-warmed during the searchFiles + collectFiles window or it stalls
dispatch (a 4-thread / 2-warm experiment regressed by 27% paired in a
prior iteration). With explicit scope the file set is typically a few
hundred files, the metrics phase is shorter, and the 3rd worker's
~250ms BPE warm-up dominates the parallelism gain: paired benchmarks
regressed by 11.85% on the 258-file `--include 'src,tests'` workload at
unconditional EAGER_WARMUP_THREADS=3, so the heuristic falls back to 2.
Reasoning.
After change 3 on this branch (eager metrics warm-up), the metrics phase
is the dominant wall-clock contributor on the default 1046-file workload
(~770 ms in `calculate metrics`, vs ~120 ms output generation, ~370 ms
search, ~270 ms collect, ~200 ms security check). Five sub-agent
investigations over independent scopes (CLI startup, file search/glob,
file collect/security, output generation, token counting) converged on
metrics worker count as the only candidate clearing the 2% bar without
regressing other phases. Output gen, security pre-warm, file-search
scoping, and CLI lazy-load were all measured below threshold or
net-negative. The previous iteration's notes document those; the
follow-on attempts in this iteration were:
- EAGER_WARMUP_THREADS=3 unconditional: -11.85% paired regression on
the 258-file workload (n=20, t=-10.85), +2.92% on the 1046-file
workload — net negative because small workloads can't amortize the
extra BPE parse.
- Pre-warm the security worker pool gated on the metrics warm-up:
security-check phase shrank from 197 ms to 110 ms, but the saving was
absorbed by the parallel `Process Files` branch and an offsetting
worker-spawn cost during collect. Paired n=30 measured -4.90% on
258-file and 0.81% (noise) on 1046-file. Reverted.
Verification.
Paired interleaved benchmarks (n=20, NODE_DISABLE_COMPILE_CACHE=1):
Default workload — `node bin/repomix.cjs --quiet` (1046 files):
| | min | median | mean | max | sd |
|--------|---------|---------|---------|---------|--------|
| BEFORE | 1820 ms | 1885 ms | 1886 ms | 2020 ms | 45 ms |
| AFTER | 1700 ms | 1845 ms | 1840 ms | 1970 ms | 62 ms |
- Mean paired Δ: +46.5 ms (2.46% wall-clock reduction)
- Median paired Δ: +50.0 ms (2.65%)
- Paired-delta SD: 65.3 ms · paired t = 3.18 (p < 0.01)
- AFTER faster in 15/20 pairs (75%)
Scoped workload — `node bin/repomix.cjs --include 'src,tests' --quiet`
(258 files):
| | min | median | mean | max | sd |
|--------|---------|---------|---------|---------|--------|
| BEFORE | 900 ms | 955 ms | 953 ms | 990 ms | 25 ms |
| AFTER | 910 ms | 940 ms | 946 ms | 1010 ms | 29 ms |
- Mean paired Δ: +6.5 ms (0.68%) — neutral within noise (t = 0.90)
- The heuristic falls back to 2 warm workers, so this branch matches
pre-change behavior; the small positive delta is sampling noise.
An independent reviewer's paired n=15 NODE_DISABLE_COMPILE_CACHE=1 run
on a separate sample reported +4.10% (t=6.61, 14/15 pairs) on the
default workload, consistent direction at higher magnitude.
Correctness.
- All 1260 unit tests pass (`npm test`); 3 new tests in
`tests/core/packager.test.ts` exercise both heuristic branches plus
the `--stdin` (explicitFiles) path.
- `npm run lint` clean (only pre-existing warnings unchanged from main).
- XML and Markdown output byte-identical between BEFORE and AFTER on
both workloads (verified via sha256sum).
- Worker-pool size confirmed via `--verbose` logs:
- Default scan: `min=1, max=3 threads` for `calculateMetrics`.
- `--include 'src,tests'`: `min=1, max=2 threads` (unchanged).
- Single-CPU and 2-CPU hosts are unaffected (`min(cpuCount, 3) =
min(cpuCount, 2)` for cpuCount ≤ 2).
- Public `pack()` API unchanged (no new parameters; the heuristic reads
existing `config.include` and `explicitFiles` arguments).
Risks.
The heuristic is a coarse proxy. Pathological cases:
- User runs default scan on a tiny repo (~50 files): 3 workers, +1
extra BPE parse. The cost is bounded by the eager-warm-up overlap
with searchFiles/collectFiles, so the worst case approaches the
paired noise floor (~30 ms sd on 258-file). Not measured below 50
files; expected to be neutral-to-slightly-negative within typical
run-to-run variance.
- User runs `--include 'huge-dir'` on a 5000-file project: 2 workers,
misses the parallelism win. Falls back to current production
behavior — no regression vs main.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Background
----------
Each metrics worker independently parses gpt-tokenizer's ~2.2 MB
`o200k_base.js` BPE table on its first task (~200-300 ms pure-CPU per
worker). The pool was previously created in `pack()` after the file
search and sort phases, so the only stages that could absorb the BPE
warm-up were `collectFiles` + git subprocesses + security check + file
processing. On a 258-file run this still left a residual ~80-130 ms
`await metricsWarmupPromise` stall before the metrics phase.
Change
------
Move `createMetricsTaskRunner` to fire before `searchFiles`. This adds
the ~130 ms glob scan to the hidden warm-up budget and shrinks the
residual stall to ~0-12 ms on the 258-file workload.
Pool sizing: Tinypool fixes `maxThreads` at construction, and the file
count is not yet known. Pre-warming exactly the workers we'll use is
essential — Tinypool queues tasks for newly spawned (cold) workers and
the pipeline can't progress until those workers finish their BPE parse
and pick up the queued task (an experiment with `maxThreads=cpuCount=4`
and only 2 warm workers regressed the 258-file workload by 27 % paired).
So the pool is sized to a fixed 2 workers (`numOfTasks = 2 ×
TASKS_PER_THREAD = 400` → `maxThreads = min(cpuCount, 2)`), matching the
security pool's hard cap and the typical metrics pool size for repos
≤400 files after the TASKS_PER_THREAD=200 sizing on this branch.
Larger repos (>400 files) would benefit from more parallelism, but the
1046-file regression check below shows that the eager warm-up still
net-improves wall-clock at maxThreads=2: the ~250 ms BPE warm-up paid by
each of the cpuCount - 2 extra workers outweighs their parallelism savings
in the metrics phase. On single-CPU hosts the heuristic naturally
collapses to maxThreads=1, identical to today's behavior.
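A sketch of the sizing arithmetic, assuming the shared rule is "one thread per `TASKS_PER_THREAD` tasks, capped by the CPU count". The real rule lives in `processConcurrency.ts` and may differ in detail; `maxThreadsFor` is a hypothetical name:

```typescript
// Assumed shared sizing rule: ceil(tasks / TASKS_PER_THREAD) threads,
// capped by the CPU count and floored at one.
const TASKS_PER_THREAD = 200;

const maxThreadsFor = (numOfTasks: number, cpuCount: number): number =>
  Math.max(1, Math.min(cpuCount, Math.ceil(numOfTasks / TASKS_PER_THREAD)));

// Eager warm-up: the file count is not known yet, so request a fixed budget.
const EAGER_WARMUP_THREADS = 2;
const eagerNumOfTasks = EAGER_WARMUP_THREADS * TASKS_PER_THREAD; // 400

console.log(maxThreadsFor(eagerNumOfTasks, 4)); // 2
console.log(maxThreadsFor(eagerNumOfTasks, 1)); // 1
```

Under this assumed rule, the pre-change pools on a 4-CPU host would be `maxThreadsFor(258, 4) = 2` and `maxThreadsFor(1046, 4) = 4`, consistent with the `--verbose` pool sizes quoted under Correctness.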
The `try { } finally { cleanup }` block is widened to cover the new
early call so the worker pool is cleaned up on early throws too. A new
`searchFiles`-rejection test in `tests/core/packager.test.ts` exercises
that path explicitly.
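The widened cleanup scope looks roughly like this. It is a minimal sketch with hypothetical `TaskRunner` / `PackDeps` shapes standing in for `createMetricsTaskRunner` and `searchFiles`; the real `pack()` has many more steps:

```typescript
// Hypothetical dependency shapes for illustration only.
interface TaskRunner {
  cleanup: () => Promise<void>;
}

interface PackDeps {
  createMetricsTaskRunner: () => TaskRunner;
  searchFiles: () => Promise<string[]>;
}

const packSketch = async (deps: PackDeps): Promise<number> => {
  // Created before searchFiles so the workers' BPE parse overlaps the glob scan.
  const taskRunner = deps.createMetricsTaskRunner();
  try {
    const files = await deps.searchFiles();
    // ... collect, security check, output generation, metrics ...
    return files.length;
  } finally {
    // Runs on success and on early throws such as a searchFiles rejection.
    await taskRunner.cleanup();
  }
};
```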
`TASKS_PER_THREAD` is exported from `processConcurrency.ts` and consumed
by name in `packager.ts` to keep the eager-warmup constant tied to the
shared sizing rule.
Benchmark
---------
Both runs use n=… paired interleaved runs (alternating BEFORE-first /
AFTER-first ordering) with `NODE_DISABLE_COMPILE_CACHE=1` so cold-start
BPE parse is measured rather than masked. 4-vCPU Intel(R) Xeon(R) host.
`node bin/repomix.cjs --include 'src,tests' --quiet` (258 files, n=20):
| | min | median | mean | max | sd |
|--------|---------|---------|---------|---------|--------|
| BEFORE | 1007 ms | 1044 ms | 1054 ms | 1164 ms | 36 ms |
| AFTER | 893 ms | 966 ms | 962 ms | 1065 ms | 36 ms |
- Mean paired Δ: +91.6 ms (8.69 % wall-clock reduction)
- Median paired Δ: +97.5 ms (9.34 %)
- Paired-delta SD: 36.0 ms · paired t = 11.39 (p < 0.001)
- AFTER faster in 20/20 pairs (100 %)
Regression check — `node bin/repomix.cjs --quiet` (default, 1046 files,
n=15) on a clean repo (baseline binary built outside the working tree
so it does not get picked up as a workload file):
| | min | median | mean | max | sd |
|--------|---------|---------|---------|---------|--------|
| BEFORE | 1769 ms | 1872 ms | 1877 ms | 2063 ms | 79 ms |
| AFTER | 1751 ms | 1820 ms | 1837 ms | 2018 ms | 61 ms |
- Mean paired Δ: +40.0 ms (2.13 %)
- Median paired Δ: +48.6 ms (2.60 %)
- Paired-delta SD: 51.7 ms · paired t = 2.99 (p ≈ 0.01)
- AFTER faster in 11/15 pairs (73 %)
The larger workload also clears the 2 % threshold; the eager warm-up's
gain offsets the maxThreads=2 cap that's now applied unconditionally.
Correctness
-----------
- All 1257 unit tests pass (`npm test`); `npm run lint` clean (only
pre-existing warnings).
- XML and Markdown output byte-identical between BEFORE and AFTER on
both the 258-file and 1046-file workloads.
- Worker-pool size confirmed via `--verbose` logs: `min=1, max=2 threads`
for `calculateMetrics` on both workloads (was `max=2` on 258 files,
`max=4` on 1046 files before this change).
- New test `cleans up the metrics worker pool when searchFiles rejects`
exercises the widened `try/finally` cleanup path.
Three constructor variants were silently dropped during --compress:
- `const Foo(...);` and `const Foo.named(...) : ...;` parse as
`(declaration (constant_constructor_signature ...))` — a node type the
existing constructor query did not list.
- `const factory Foo() = Bar;` parses as
`(redirecting_factory_constructor_signature (const_builtin) (identifier) ...)`
whose first named child is `const_builtin`, so the leading-anchor
`. (identifier)` pattern failed to match.
- `external factory Foo.make();` parses as
`(declaration (factory_constructor_signature ...))` — bare under
`declaration`, not wrapped in `method_signature`, so the existing
factory query missed it.
Switch the constructor / factory / redirecting-factory queries to
capture the whole signature node as `@name.definition.method`. This
emits the same source line(s) DefaultParseStrategy already produces and
is robust across all body / external / const / redirecting variants.
Two pre-existing gaps surfaced while extending queryDart:
- Plain constructors (e.g. `Animal(this.name);`) live directly under
`declaration`, not wrapped in `method_signature`, so the existing
`(method_signature (constructor_signature ...))` query never matched
them. Add a sibling query against `(declaration (constructor_signature ...))`.
- Operator overloads (`operator +`, `operator []`, `operator []=`,
`operator ==`, ...) parse as `(method_signature (operator_signature ...))`
but `operator_signature` has no identifier name field — the operator
token surfaces as `(binary_operator)` / `([])` / `([]=)` children.
Capture the whole `operator_signature` as `@name.definition.method` so
DefaultParseStrategy emits its full source range.
Verified against `--compress` on a real Dart file: signatures that were
previously dropped (only their `///` doc comments survived) now appear
in compressed output.
intent(dart-query): make --compress preserve Dart definition kinds that were silently dropped — mixin, typedef, getter, setter, factory, and redirecting factory
decision(capture-naming): align Dart captures with the dominant @name.definition.X convention used by queryTypeScript/queryPython/queryRust; output is unchanged because DefaultParseStrategy matches via name.includes('name')
constraint(redirecting-factory): tree-sitter-dart grammar makes redirecting_factory_constructor_signature a child of `declaration`, not `method_signature`, so it must be queried bare to avoid a "Bad pattern structure" parse error
constraint(type-alias): type_alias's name node is `type_identifier`, not `identifier` — using `identifier` would silently match nothing
learned(external-keyword): `external` modifier in Dart is a sibling token outside function_signature/method_signature, so existing captures already cover `external void foo();` without changes
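A simplified sketch of why capturing the whole signature node is enough. Per the decision(capture-naming) trailer, DefaultParseStrategy keys on `name.includes('name')`, so every `@name.definition.method` capture has its full source range emitted. The `CaptureLike` shape and `emitCapturedLines` helper are hypothetical, and de-duplicating by start row is an assumption:

```typescript
// Hypothetical, simplified capture shape (tree-sitter rows are zero-based).
interface CaptureLike {
  name: string; // e.g. "name.definition.method"
  startRow: number;
  endRow: number;
}

// Emit the full source range of every capture whose name contains "name",
// once per starting row.
const emitCapturedLines = (sourceLines: string[], captures: CaptureLike[]): string[] => {
  const seenRows = new Set<number>();
  const chunks: string[] = [];
  for (const { name, startRow, endRow } of captures) {
    if (!name.includes('name') || seenRows.has(startRow)) continue;
    seenRows.add(startRow);
    chunks.push(sourceLines.slice(startRow, endRow + 1).join('\n'));
  }
  return chunks;
};
```

Because the capture covers the whole `constant_constructor_signature` / `operator_signature` node, const, redirecting, and operator variants need no identifier anchor to be emitted.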
intent(ci-green): typos@1.45.1 flags `mis` as a typo of `miss`/`mist`, so the
hyphenated `mis-classified` in the new PDF-magic regression test comment
trips the `Check typos` job. The unhyphenated `misclassified` is the more
common spelling and passes the dictionary.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cuts per-thread IPC overhead in calculateFileMetrics by ~80% by sending
fewer, larger batches to the metrics worker pool. At ~1000 files this
reduces the dispatched task count from ~100 to ~20 (~5 batches per
worker on a 4-thread pool instead of ~25), saving ~40 estimated
worker-ms of pure serialization/dispatch overhead per worker on the
critical path.
The previous size of 10 was tuned for fast worker availability to
overlap with output generation, but Tinypool round-trips were measured
at ~2 ms minimum (batches of trivially small files bottom out there),
so IPC dominated for short batches. Increasing to 50 still leaves ~5
batches per thread (vs. the prior ~25) for load balancing, and avoids
the load-imbalance pitfall of even larger sizes (e.g. 100), where one
oversized batch can monopolize a worker and stretch the tail.
Benchmark on this repo (1016 files, 4 cores; 100 paired interleaved
runs alternating BATCH=10 and BATCH=50 with the same prebuilt JS,
swapping only the constant, full pipeline timed end-to-end):
BATCH=10 mean 1.5589s median 1.5480s stdev 0.0966s
BATCH=50 mean 1.5111s median 1.5030s stdev 0.0625s
Saved 47.8ms (3.07%) mean / 45.0ms (2.91%) median
(trimmed mean 41.0ms, dropping top/bottom 10%)
Wins 69/100 paired runs faster with BATCH=50
t-test paired t = 4.545 (df=99, p < 0.001)
95% CI [27.2 ms, 68.4 ms] on the mean improvement
Sign test 2-sided p = 0.0002
Also adds a regression test for the batching path: with 120 input files
the test asserts result order/completeness and that the progress
callback fires once per dispatched batch (not per file, not just once).
All 1250 tests pass; output is byte-identical for the same source tree.
- validateFileSafety: pin the negative path of `if (config.security.enableSecurityCheck)`
— every other test enabled the check, so a regression that always runs
the security check would have passed silently.
- unifiedWorker:
- Add a positive workerData=securityCheck + ambiguous-task case so the
pair (override + this) distinguishes "inference always wins" from
"inference wins only when it yields a value".
- Stop pretending the handler-cache test verifies caching. Both branches
of `if (cached) return cached;` end with the same Map.set, and Node's
own module cache makes the dynamic import effectively free, so the
cache is unobservable from outside without exposing internals.
Renamed to "repeated calls" with a comment explaining the limitation.
- fileSystemReadDirectoryTool: translate the pre-existing Japanese comment
to English per CLAUDE.md.
- TokenCounter: extract `LoadEncodingFn` type alias instead of the
unusual `typeof loadEncoding`, so a signature drift between the local
function and the deps field would surface at the type level.
- outputGenerate: tests titled "throws RepomixError…" / "wraps … in
RepomixError" now assert the rejection is an instance of RepomixError
in addition to the message regex, matching the test names.
- LanguageParser: collapse the duplicate getParserForLang('javascript')
rejection assertions into a single .catch capture that checks both
type and message.
- calculateMetrics: vi.mocked(initTaskRunner).mockReset() before
mockReturnValueOnce so a future test that omits taskRunner can't
silently consume the override.
- packager: pre-attach a no-op .catch on the rejected warmupPromise so
vitest's unhandled-rejection detection doesn't fire before pack
awaits it. Production code mirrors this pattern in packager.ts:262.
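The pre-attached no-op `.catch` pattern, as a minimal sketch. `startWarmup` is a hypothetical stand-in for a warm-up that may reject before anything awaits it:

```typescript
const startWarmup = (shouldFail: boolean): Promise<void> => {
  const warmupPromise = new Promise<void>((resolve, reject) => {
    setTimeout(() => (shouldFail ? reject(new Error('warm-up failed')) : resolve()), 0);
  });
  // A no-op handler marks the rejection as handled, so Node (and vitest)
  // do not report it while other work runs. The original promise is
  // unchanged: a later `await warmupPromise` still throws.
  warmupPromise.catch(() => {});
  return warmupPromise;
};
```

Note that `.catch` returns a new promise. Only the side effect of registering a handler matters here, which is why the caller keeps and later awaits the original `warmupPromise`.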
Address two review threads on PR #1518 that flagged tests whose titles
overstated what was being verified.
- fileProcess: the longBase64 string is one continuous line, so the
truncateBase64 → removeEmptyLines ordering was never actually under
test (truncateBase64Content's regex does not span newlines). Rename
to describe the combined behavior the test really pins.
- skillTechStack: rename the per-directory case to reflect that root
and subpackage land in separate buckets keyed by getDirPath, and
add a second case with two package.json entries at the same path
to genuinely exercise the parsed.packageManager && !result.packageManager
guard at skillTechStack.ts:541.
Address PR review feedback (claude R4): the previous error-handling tests
overwrote the private `countFn` field via a cast, which silently breaks
on a rename. Add a `deps` parameter to TokenCounter that defaults to the
real `loadEncoding`, and switch the error-handling tests to inject a fake
that returns the throwing function directly. Matches the dependency
injection pattern documented in CLAUDE.md.
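A minimal sketch of the `deps` injection. `TokenCounterSketch` and the word-count "loader" are hypothetical stand-ins; the real class wraps a gpt-tokenizer encoding:

```typescript
type CountFn = (text: string) => number;
type LoadEncodingFn = (encodingName: string) => CountFn;

// Stand-in "real" loader for this sketch: counts whitespace-separated words.
const loadEncoding: LoadEncodingFn = () => (text) => text.split(/\s+/).filter(Boolean).length;

class TokenCounterSketch {
  private readonly countFn: CountFn;

  // Production callers omit `deps`; tests pass a fake loader instead of
  // overwriting the private field through a cast.
  constructor(encodingName: string, deps: { loadEncoding: LoadEncodingFn } = { loadEncoding }) {
    this.countFn = deps.loadEncoding(encodingName);
  }

  countTokens(text: string): number {
    return this.countFn(text);
  }
}
```

Renaming the private `countFn` field no longer breaks the tests, because they only touch the public constructor and `LoadEncodingFn`.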
- shared/errorHandle: recognize duck-typed OperationCancelledError from
worker boundaries in isRepomixError (it extends RepomixError but the
name was missing from the structured-clone fallback comparison).
Add a regression test for the worker-boundary case.
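A sketch of name-based duck typing for errors whose prototype did not survive a worker boundary. The class bodies and the `REPOMIX_ERROR_NAMES` set are illustrative assumptions, not the real `shared/errorHandle` code:

```typescript
class RepomixError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'RepomixError';
  }
}

class OperationCancelledError extends RepomixError {
  constructor(message = 'Operation cancelled') {
    super(message);
    this.name = 'OperationCancelledError';
  }
}

// Names accepted when `instanceof` fails because the error arrived as a
// plain object carrying only `name` / `message`.
const REPOMIX_ERROR_NAMES = new Set(['RepomixError', 'OperationCancelledError']);

const isRepomixError = (err: unknown): boolean =>
  err instanceof RepomixError ||
  (typeof err === 'object' && err !== null && REPOMIX_ERROR_NAMES.has(String((err as { name?: unknown }).name)));
```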
Test improvements per coderabbit / claude review:
- cliReport: assert skill-directory + relative path on the same log line.
- processConcurrency: restore process.versions.bun by removing the property
when it didn't originally exist, instead of leaving it defined-as-undefined.
- logger: drop the no-op `process.env.REPOMIX_LOG_LEVEL = undefined` (it
coerces to the string "undefined" and is overwritten by the next delete).
- unifiedWorker: replace the tautological cache test with one that proves
cache uniqueness via onWorkerTermination cleanup count; add a test for
task-based inference overriding workerData (bundled-env reuse).
- calculateMetricsWorker: new direct test for the default export's items
vs. single-mode dispatch — unifiedWorker mocks this module so the branch
was otherwise untested.
- packRemoteRepositoryTool: hard-code the expected output path instead of
expect.any(String) to catch arg-swap regressions.
- memoryUtils: tighten getMemoryStats assertions with sanity bounds
(heapUsed <= heapTotal, rss > 0, heapUsagePercent <= 100) so a
unit-conversion regression (bytes vs MB) would fail the test.
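The restore-by-deletion fix for `process.versions.bun` generalizes to a small snapshot helper. This is a sketch; `snapshotProperty` is a hypothetical name, and it is shown on a plain object rather than `process.versions`:

```typescript
// Snapshot an own property so a test can restore it exactly, including
// removing it when it did not exist originally.
const snapshotProperty = (target: object, key: PropertyKey): (() => void) => {
  const descriptor = Object.getOwnPropertyDescriptor(target, key);
  return () => {
    if (descriptor) {
      Object.defineProperty(target, key, descriptor);
    } else {
      // Absent originally: delete rather than leave a defined-as-undefined key.
      Reflect.deleteProperty(target, key);
    }
  };
};
```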
Targeted regression tests for the high-risk areas identified in the
v1.13.1..main audit, focusing on silent-correctness bugs and parallel
error handling — places that wouldn't surface in CI but would in the
field.
- core/metrics/calculateMetrics: pin numeric equivalence between the
fast path (Σ file tokens + wrapper tokens) and the slow path (full
output tokenization). Cover wrapper-extraction fallback, split-output
fallback, and worker pool cleanup when fileMetrics rejects.
- core/file/fileProcess: pin transform ordering invariants —
removeComments → removeEmptyLines (blank lines from comment removal
must be cleaned up; preserved when removeEmptyLines is off);
truncateBase64 → removeEmptyLines (multi-line base64 squashed first);
trim → showLineNumbers (no leading/trailing blanks numbered).
Plus worker/lightweight path parity for inputs that don't need
worker processing.
- core/packager: pin metrics worker pool cleanup on parallel branch
failures (validateFileSafety, produceOutput, calculateMetrics, warmup
rejection). Verify prefetchSortData failure is isolated and does not
block sortOutputFiles.
- core/skill/skillTechStack: cover untested fix-commit invariants —
root entry sorts first in monorepo output; configFiles deduplicated
within a directory; first-seen packageManager wins per directory.
intent(truncateBase64-tests): add explicit coverage for the new fast-path guards introduced alongside the regex-skip optimization
decision(test-cases): focus on the four cases that exercise guard behavior not previously asserted — empty input, exactly-below-threshold (255 chars), run-reset on non-base64 separator, and non-base64 data URI without `;base64,`
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
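A hypothetical sketch of the kind of fast-path guard those tests target. It skips the truncation regex unless some run of base64-alphabet characters is long enough. The 256 threshold is an assumption drawn from the "255 chars is exactly below threshold" test case, and the data-URI (`;base64,`) guard is not modeled:

```typescript
const MIN_BASE64_RUN = 256;

const isBase64Char = (code: number): boolean =>
  (code >= 65 && code <= 90) || // A-Z
  (code >= 97 && code <= 122) || // a-z
  (code >= 48 && code <= 57) || // 0-9
  code === 43 || // +
  code === 47 || // /
  code === 61; // = (padding)

const hasLongBase64Run = (content: string, minRun: number = MIN_BASE64_RUN): boolean => {
  if (content.length < minRun) return false; // also covers empty input
  let run = 0;
  for (let i = 0; i < content.length; i++) {
    if (isBase64Char(content.charCodeAt(i))) {
      run += 1;
      if (run >= minRun) return true;
    } else {
      run = 0; // any non-base64 separator resets the run
    }
  }
  return false;
};
```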
Merge the two separate globby traversals used by `searchFiles` into a
single one and parallelize the per-directory `readdir` calls used to
filter empty directories.
Background
----------
When `output.includeEmptyDirectories` is enabled (as it is in this repo's
`repomix.config.json`, and in any repo that wants an accurate
directory tree), `searchFiles` previously walked the working tree twice:
once with `onlyFiles: true` and a second time with `onlyDirectories:
true`. Each call re-traversed the tree and re-parsed every `.gitignore`
/ `.repomixignore` file. `findEmptyDirectories` then issued `readdir`
serially for every matched directory, awaiting each syscall before
starting the next.
Change
------
* Replace the two globby invocations with one `objectMode: true,
onlyFiles: false` call. Partition the returned `GlobEntry[]` by
`dirent.isFile()` / `dirent.isDirectory()`, matching the previous
`onlyFiles: true` semantics for symlinks and other non-file
non-directory entries.
* Rewrite `findEmptyDirectories` to run the per-directory `readdir`
checks concurrently via `Promise.all`. Ordering is preserved by the
result array and the caller sorts the final list anyway.
* When `includeEmptyDirectories` is disabled, keep the fast
`onlyFiles: true` path unchanged so the default CLI run pays no cost.
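A sketch of the two pieces: partitioning one objectMode result by dirent type, and issuing the per-directory `readdir` calls concurrently. `EntryLike` is a minimal stand-in for globby's `GlobEntry`, and both helpers are illustrative rather than the actual `searchFiles` code:

```typescript
import { readdir } from 'node:fs/promises';
import { join } from 'node:path';

// Minimal stand-in for globby's objectMode entry.
interface EntryLike {
  path: string;
  dirent: { isFile(): boolean; isDirectory(): boolean };
}

const partitionEntries = (entries: EntryLike[]): { files: string[]; directories: string[] } => {
  const files: string[] = [];
  const directories: string[] = [];
  for (const entry of entries) {
    // Entries that are neither a file nor a directory are dropped.
    if (entry.dirent.isFile()) files.push(entry.path);
    else if (entry.dirent.isDirectory()) directories.push(entry.path);
  }
  return { files, directories };
};

// Start every readdir at once instead of awaiting them one by one.
const findEmptyDirectories = async (rootDir: string, directories: string[]): Promise<string[]> => {
  const checks = await Promise.all(
    directories.map(async (dir) => ((await readdir(join(rootDir, dir))).length === 0 ? dir : null)),
  );
  return checks.filter((dir): dir is string => dir !== null);
};
```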
Benchmark (hyperfine, repomix packing itself, 30 runs, warmup 3)
----------------------------------------------------------------
Run 1: baseline 2.162s ± 0.042s → perf 2.017s ± 0.029s → -145ms (-6.7%)
Run 2: baseline 2.161s ± 0.023s → perf 2.030s ± 0.027s → -131ms (-6.1%)
Per-stage verbose timings:
baseline: [globby files 200ms] + [globby dirs 85ms] + [empty dirs 61ms]
perf: [combined globby 223ms] + [empty dirs 66ms]
saved: -57ms consistently on the critical path
- Fix calculateMetrics test to use parsableStyle: true so it exercises
the fallback path (calculateOutputMetrics mock) instead of accidentally
hitting the fast path
- Correct packager comment to clarify that git-log is cached but the
array sort itself runs twice (negligible cost)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Export the two helper functions and add 13 unit tests covering:
- extractOutputWrapper: normal extraction, missing content (null),
empty files, identical content, wrong order, no files, no wrapper
- canUseFastOutputTokenPath: each style variant, splitOutput,
parsableStyle
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The function now always calculates metrics for all files, so the
"Selective" prefix no longer reflects its behavior.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Previously, per-file tokenization was only done for all files when
`tokenCountTree` was enabled; otherwise only the top N files (by char
count) were tokenized for the "Top Files" display. But output
tokenization always processes all file contents anyway, so the
"selective" path was not saving any work — it just prevented the fast
output-token path from being used.
Now we always tokenize every file individually, which:
- Enables the wrapper-extraction fast path regardless of tokenCountTree
- Simplifies the metrics pipeline by removing the conditional branching
- Provides complete per-file token data for all downstream consumers
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When `tokenCountTree` is enabled, `calculateSelectiveFileMetrics` already
tokenizes every file individually on the primary worker pool. The original
`calculateOutputMetrics` then re-tokenized the full output a second time, split
into 200 KB chunks, to compute `totalTokens`. On large repos with the tree
display enabled, this second pass was the single longest task in the
`calculateMetrics` `Promise.all`, consuming roughly 1 second of worker time
that duplicated work already done for the per-file counts.
This change introduces a fast path for the common case (xml / markdown / plain
output, non-parsable, single-part): walk the generated output with
`indexOf(file.content, cursor)` once per file to splice file contents out of
the output, tokenize only the remaining "wrapper" (template boilerplate +
directory tree + git diff/log + per-file headers), and compute
`totalTokens = Σ per-file tokens + wrapper tokens`.
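A sketch of the single forward-cursor splice, written from the description here. It is an illustrative re-implementation, not the shipped `extractOutputWrapper`; in particular, skipping empty file contents is an assumption:

```typescript
// Returns the output with every file's content removed (the "wrapper"),
// or null when any content cannot be found at or after the cursor.
const extractOutputWrapperSketch = (output: string, fileContentsInOutputOrder: string[]): string | null => {
  let cursor = 0;
  const pieces: string[] = [];
  for (const content of fileContentsInOutputOrder) {
    if (content.length === 0) continue; // nothing to splice out
    const index = output.indexOf(content, cursor);
    if (index === -1) return null; // escaped, split or out of order: caller falls back
    pieces.push(output.slice(cursor, index));
    cursor = index + content.length;
  }
  pieces.push(output.slice(cursor));
  return pieces.join('');
};
```

The caller would then compute `totalTokens` as the sum of the already-computed per-file counts plus one token count over the returned wrapper string.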
The accuracy delta versus the old 200 KB-chunk approach is bounded by BPE
merges across file↔wrapper boundaries; on the repomix repository itself the
measured error was 309 / 1,284,067 tokens ≈ 0.024 %, comparable to the chunk
boundary error the existing approach already accepts.
## Implementation
- `src/core/metrics/calculateMetrics.ts`
- Add `extractOutputWrapper(output, processedFilesInOutputOrder)` which
walks the output with a single forward cursor. Returns `null`, triggering
a fallback to `calculateOutputMetrics`, if any file content is not found
(e.g., the template escaped it, the output was split, or the order differs).
- Add `canUseFastOutputTokenPath(config)` gate: only enabled when
`tokenCountTree` is truthy, `splitOutput` is undefined, `parsableStyle`
is false, and the style is `xml` / `markdown` / `plain`. JSON output
and parsable XML go through `JSON.stringify` / `fast-xml-builder` which
escape file contents, so `indexOf(content)` would miss them.
- In `calculateMetrics`, when the fast path is available and wrapper
extraction succeeds, replace `outputMetricsPromise` with a promise that
awaits the already-running `selectiveFileMetricsPromise`, sums the
per-file token counts, and dispatches a single `runTokenCount` on the
extracted wrapper string. The rest of the `Promise.all` is unchanged.
- `src/core/packager.ts`
- Call `sortOutputFiles(filteredProcessedFiles, config)` once in `pack`
immediately after suspicious-file filtering and use its result as
`processedFiles` downstream (for `produceOutput`, `calculateMetrics`,
and the final result object). `generateOutput` internally calls
`sortOutputFiles` as well, which is stable and memoized via
`fileChangeCountsCache`, so the two now share the single git-log
subprocess result and consumers see files in the exact order they
appear in the output. This is a precondition for the fast path's
forward-walk extraction.
- Expose `sortOutputFiles` on `defaultDeps` so existing packager unit
tests can inject their own implementation.
- `tests/core/packager/diffsFunctionality.test.ts`
- Extend the `gitRepositoryHandle.js` `vi.mock` to also stub
`isGitInstalled` and `getFileChangeCount`, since `sortOutputFiles`
resolves its default dependencies from that module at module load time.
All 1102 existing tests pass unchanged; lint is clean.
## Benchmark
Interleaved 30-run benchmark against the repomix repo itself (1018 files,
~4 MB xml output, `tokenCountTree: 50000`, `sortByChanges: true`, `includeDiffs`
and `includeLogs` enabled via the repo's own `repomix.config.json`):
base median: 2735.2 ms [2389 - 3528] IQR=367 ms
opt median: 2373.6 ms [2125 - 2653] IQR=293 ms
delta: -361.6 ms (-13.22%)
Verbose trace before/after (single run, representative):
before:
Selective metrics calculation completed in 639 ms
Output token count completed in 1046 ms
Calculate Metrics wall: 1296 ms
after:
Selective metrics calculation completed in 579 ms
Fast-path output tokens: files=1017293, wrapper=33678 (126996 chars)
Calculate Metrics wall: ~580 ms
The savings are concentrated in the `calculateMetrics` phase, which was the
dominant critical path in the final `Promise.all` for tokenCountTree runs on
large repos.
The @xmldom/xmldom upgrade removed the errorHandler option and changed
documentElement to be nullable. Switch to the new onError callback and
add null checks so the test suite compiles and runs against the new API.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
JavaScript strings are UTF-16 encoded, so a string's character count is not
its byte count. Use 'K characters' instead of 'KB' for technical accuracy.
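The sketch below shows why the two units diverge. It assumes sizes are measured with `String.prototype.length`, which counts UTF-16 code units, not bytes. It compares that count with the UTF-8 byte count from the standard `TextEncoder`. `describeSize` is a hypothetical helper used only for illustration. Note that a non-BMP character such as an emoji counts as 2 code units.

```typescript
// String.prototype.length counts UTF-16 code units; TextEncoder produces
// UTF-8 bytes, so its output length is a byte count.
const utf8 = new TextEncoder();

// Hypothetical helper for illustration only.
function describeSize(text: string): { chars: number; utf8Bytes: number } {
  return { chars: text.length, utf8Bytes: utf8.encode(text).length };
}

console.log(describeSize('abc'));          // { chars: 3, utf8Bytes: 3 }
console.log(describeSize('\u00e9'));       // { chars: 1, utf8Bytes: 2 } (e with acute)
console.log(describeSize('\u65e5\u672c')); // { chars: 2, utf8Bytes: 6 } (two CJK characters)
console.log(describeSize('\u{1F600}'));    // { chars: 2, utf8Bytes: 4 } (emoji, surrogate pair)
```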
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Benchmarks show 200KB chunks are optimal for output token counting,
reducing worker round-trips while maintaining good parallelism across
available CPU cores.
For a 3.9MB output (typical large repo), this reduces chunks from 39
to 20, saving ~46ms per run due to fewer structured-clone round-trips.
Benchmark results (repomix self-pack, 996 files, 3.8M chars, 5 runs):
- Before (100K chunks): 1384ms median
- After (200K chunks): 1293ms median
- Improvement: ~91ms = ~6.6%
Combined with existing batch IPC optimization, total improvement vs
baseline is ~156ms = ~10.8%.
https://claude.ai/code/session_01NjmXXUzBrB2oe4FD82NpGe
Selective file metrics previously sent one IPC round-trip per file to
worker threads for token counting. With ~991 files and ~0.5ms overhead
per round-trip, this added ~495ms of pure IPC waste.
This change introduces batch mode for the metrics worker, grouping files
into batches of 50 before sending to workers. This reduces round-trips
from 991 to 20.
Type safety improvement over the original approach: instead of scattering
`as number` casts across all callers, a new metricsWorkerRunner module
centralizes the type narrowing in two helper functions (runTokenCount and
runBatchTokenCount), keeping all other modules fully type-safe.
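The following is a minimal sketch of batching plus centralized narrowing. The helper names `runTokenCount` and `runBatchTokenCount` come from the description above. Everything else is assumed: their signatures, the `TaskRunner` interface, the task shape, and `chunkIntoBatches`/`countAllFiles`. The worker result is typed as a union, and only the two helpers narrow it, so callers never need `as number`.

```typescript
// Hypothetical task/result shapes; the real metrics worker types may differ.
type TokenCountTask =
  | { kind: 'single'; content: string }
  | { kind: 'batch'; contents: string[] };

interface TaskRunner {
  run(task: TokenCountTask): Promise<number | number[]>;
}

const BATCH_SIZE = 50;

// Split items into consecutive groups of at most `size` elements.
function chunkIntoBatches<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// The only places that narrow the union result; callers stay cast-free.
async function runTokenCount(runner: TaskRunner, content: string): Promise<number> {
  const result = await runner.run({ kind: 'single', content });
  if (typeof result !== 'number') throw new TypeError('Expected a single token count');
  return result;
}

async function runBatchTokenCount(runner: TaskRunner, contents: string[]): Promise<number[]> {
  const result = await runner.run({ kind: 'batch', contents });
  if (!Array.isArray(result)) throw new TypeError('Expected an array of token counts');
  return result;
}

// One worker round-trip per batch instead of one per file.
async function countAllFiles(runner: TaskRunner, contents: string[]): Promise<number[]> {
  const batches = chunkIntoBatches(contents, BATCH_SIZE);
  const results = await Promise.all(batches.map((batch) => runBatchTokenCount(runner, batch)));
  return results.flat();
}
```

With 991 files, `chunkIntoBatches` yields `Math.ceil(991 / 50) = 20` batches. That matches the round-trip count quoted above.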
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move getProcessConcurrency from a direct module import to the deps
parameter for consistency with initTaskRunner. This makes it easier
to test with different concurrency values without module-level mocking.
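This sketch illustrates the deps-parameter pattern: production code uses the defaults, and tests pass an override instead of mocking the module. The `TaskRunnerDeps` interface and `resolvePoolSize` are hypothetical. The default `getProcessConcurrency` here is a stand-in that counts logical CPUs; it is not repomix's actual implementation.

```typescript
import * as os from 'node:os';

// Hypothetical dependency bag; the real repomix deps object may differ.
interface TaskRunnerDeps {
  getProcessConcurrency: () => number;
}

const defaultDeps: TaskRunnerDeps = {
  // Stand-in for the real helper: logical CPU count, never below 1.
  getProcessConcurrency: () => Math.max(1, os.cpus().length),
};

// Callers override only what they need; everything else falls back to defaults.
function resolvePoolSize(overrides: Partial<TaskRunnerDeps> = {}): number {
  const deps: TaskRunnerDeps = { ...defaultDeps, ...overrides };
  return deps.getProcessConcurrency();
}

console.log(resolvePoolSize());                                   // machine-dependent
console.log(resolvePoolSize({ getProcessConcurrency: () => 2 })); // 2
```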
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add maxWorkerThreads option to WorkerOptions for explicit thread count
capping, then use it to reduce CPU contention when metrics and security
worker pools run concurrently during the pipeline overlap phase.
- Metrics pool: capped at (processConcurrency - 1)
- Security pool: capped at floor(processConcurrency / 2)
On a 4-core machine this reduces concurrent threads from 8 (4+4) to 5
(3+2), avoiding context-switching overhead during gpt-tokenizer warmup.
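The caps above reduce to simple arithmetic, sketched below. `computePoolCaps` and `resolveMaxThreads` are hypothetical names. The floor of 1 thread per pool is an assumption not stated in the commit; it only matters on single-core machines.

```typescript
interface PoolCaps {
  metrics: number;
  security: number;
}

// Caps described above; the Math.max(1, ...) floor is an assumption.
function computePoolCaps(processConcurrency: number): PoolCaps {
  return {
    metrics: Math.max(1, processConcurrency - 1),
    security: Math.max(1, Math.floor(processConcurrency / 2)),
  };
}

// Apply an optional explicit cap (the role maxWorkerThreads plays) to a requested count.
function resolveMaxThreads(requested: number, maxWorkerThreads?: number): number {
  return maxWorkerThreads === undefined ? requested : Math.min(requested, maxWorkerThreads);
}

const caps = computePoolCaps(4);
console.log(caps, caps.metrics + caps.security); // { metrics: 3, security: 2 } 5
```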
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add an archive entry filter that checks file extensions with isBinaryPath
before writing to disk, avoiding unnecessary I/O for binary files (images,
fonts, executables, etc.) that would be excluded later anyway.
The filter strips the leading tar segment (e.g. "repo-branch/") since tar's
filter callback receives paths before strip is applied.
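The filter logic can be sketched as below. It assumes a filter callback that receives the raw entry path and returns `true` to keep the entry, as node-tar's `filter` option does. `isBinaryPathLike` is a hypothetical stand-in for `isBinaryPath`; its tiny extension set exists only to keep the example self-contained.

```typescript
// Hypothetical stand-in for isBinaryPath with a deliberately tiny extension list.
const BINARY_EXTENSIONS = new Set(['png', 'jpg', 'gif', 'woff2', 'exe', 'zip']);

function isBinaryPathLike(filePath: string): boolean {
  const base = filePath.split('/').pop() ?? '';
  const dot = base.lastIndexOf('.');
  return dot > 0 && BINARY_EXTENSIONS.has(base.slice(dot + 1).toLowerCase());
}

// The filter sees e.g. "repo-main/src/a.ts" before `strip` removes "repo-main/".
function stripLeadingSegment(entryPath: string): string {
  const slash = entryPath.indexOf('/');
  return slash === -1 ? entryPath : entryPath.slice(slash + 1);
}

// Keep (true) or skip (false) an archive entry before it is written to disk.
function archiveEntryFilter(entryPath: string): boolean {
  return !isBinaryPathLike(stripLeadingSegment(entryPath));
}

console.log(archiveEntryFilter('repo-main/src/index.ts'));    // true
console.log(archiveEntryFilter('repo-main/assets/logo.PNG')); // false
```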
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
A batch size of 50 still reduces IPC round-trips by ~98% (990 → 20)
while producing enough batches to utilize all available CPU cores
on multi-core systems.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add comment explaining why numOfTasks uses totalItems instead of
batches.length (passing batches.length would yield maxThreads=1,
forcing sequential execution)
- Fix test comments that incorrectly referenced batch size 100
when actual BATCH_SIZE is 500
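Why `batches.length` collapses to one thread is easiest to see with a concrete heuristic. The sketch assumes the pool sizes itself as roughly one thread per 100 tasks, clamped to `[1, processConcurrency]`. Both the constant and the name `getWorkerThreadCount` are assumptions for illustration.

```typescript
// Assumed heuristic: one worker per TASKS_PER_THREAD tasks, clamped to [1, processConcurrency].
const TASKS_PER_THREAD = 100;

function getWorkerThreadCount(numOfTasks: number, processConcurrency: number): number {
  const byWorkload = Math.ceil(numOfTasks / TASKS_PER_THREAD);
  return Math.max(1, Math.min(processConcurrency, byWorkload));
}

const totalItems = 991;
const batchCount = Math.ceil(totalItems / 50); // 20

console.log(getWorkerThreadCount(batchCount, 8)); // 1: every batch runs on one thread
console.log(getWorkerThreadCount(totalItems, 8)); // 8: ceil(991 / 100) = 10, capped at 8
```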
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fire maxThreads warmup tasks so every worker thread has gpt-tokenizer
loaded before metrics calculation begins. Combined with the early
warmup position (before collectFiles/securityCheck), this eliminates
cold-start latency on all threads without adding to the critical path.
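Here is a minimal sketch of the warmup fan-out, using a hypothetical `WarmupTaskRunner` interface and task payload. It fires `maxThreads` tiny tasks at once so that idle workers each pick one up and load their tokenizer. Failures are swallowed because warmup is only an optimization. Note that most pools do not strictly guarantee one task per worker; a fast worker could take two.

```typescript
// Hypothetical task runner interface and payload; names are illustrative.
interface WarmupTaskRunner {
  run(task: { content: string; encoding: string }): Promise<number>;
}

// Fire one tiny task per thread so each worker initializes its tokenizer.
// Errors are swallowed: a failed warmup should never fail the pack.
function warmUpWorkers(runner: WarmupTaskRunner, maxThreads: number, encoding: string): Promise<void> {
  const tasks = Array.from({ length: maxThreads }, () =>
    runner.run({ content: '', encoding }).catch(() => 0),
  );
  return Promise.all(tasks).then(() => undefined);
}
```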
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Cover the new factory function: return shape validation, warmup task
payload verification, and error swallowing behavior.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move worker thread warmup from packager into createMetricsTaskRunner,
which now returns both a taskRunner and warmupPromise. This keeps the
packager clean — it no longer needs to know warmup implementation details.
Also:
- Skip metrics worker pool creation on skill-generation path where
it is unused
- Await warmupPromise in finally block before cleanup to prevent
tearing down workers during initialization
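The cleanup ordering can be sketched as follows. `createMetricsTaskRunner` returning `{ taskRunner, warmupPromise }` follows the description above. The `run`/`cleanup` method shapes and `packWithMetrics` are assumptions. It also assumes `warmupPromise` never rejects, as in the error-swallowing warmup, so awaiting it in `finally` cannot mask the real error.

```typescript
// Hypothetical shapes for the factory's return value.
interface MetricsTaskRunner {
  taskRunner: { run(content: string): Promise<number>; cleanup(): Promise<void> };
  warmupPromise: Promise<void>;
}

async function packWithMetrics(
  createMetricsTaskRunner: () => MetricsTaskRunner,
  contents: string[],
): Promise<number> {
  const { taskRunner, warmupPromise } = createMetricsTaskRunner();
  try {
    const counts = await Promise.all(contents.map((c) => taskRunner.run(c)));
    return counts.reduce((a, b) => a + b, 0);
  } finally {
    // Let in-flight warmup tasks settle before tearing the pool down.
    await warmupPromise;
    await taskRunner.cleanup();
  }
}
```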
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
codeload.github.com resolves branches, tags, and SHAs automatically
without refs/heads/ or refs/tags/ prefixes. This eliminates the tag
fallback URL entirely and simplifies buildGitHubArchiveUrl to a single
return statement, saving an extra round trip for tag-based downloads.
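This sketch shows the single-return shape under two assumptions: the download is a tarball, and `ref` may be a branch, tag, or SHA with no `refs/` prefix. The signature of `buildGitHubArchiveUrl` here is hypothetical and may not match the real function.

```typescript
// Sketch only: one URL form covers branches, tags, and SHAs on codeload.
// Assumes a tar.gz archive; `ref` is passed through unchanged.
function buildGitHubArchiveUrl(owner: string, repo: string, ref: string): string {
  return `https://codeload.github.com/${encodeURIComponent(owner)}/${encodeURIComponent(repo)}/tar.gz/${ref}`;
}

console.log(buildGitHubArchiveUrl('octocat', 'hello-world', 'v1.0.0'));
// https://codeload.github.com/octocat/hello-world/tar.gz/v1.0.0
```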
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Download archives from codeload.github.com instead of github.com/archive
to eliminate the intermediate 302 redirect, saving ~100-300ms per request.
This is the same pattern used by create-react-app and degit.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CHUNK_SIZE was used as the number of chunks (1000), creating ~1KB chunks
for a 1MB output. Each chunk dispatched a worker task with ~0.5ms overhead
for serialization, scheduling, and callback resolution, totaling ~500ms
of overhead that dominated the actual tokenization work.
Replace with TARGET_CHARS_PER_CHUNK (100,000) so chunks are sized by
content rather than count. A 1MB output now produces ~10 chunks instead
of ~1000, reducing worker round-trip overhead by ~99%.
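The size-based split can be sketched as below. `splitIntoChunks` is a hypothetical name, and the surrogate-pair guard is an extra assumption, not something the commit describes. Slicing at fixed offsets can also split a BPE merge across a boundary, so the summed per-chunk token counts may differ slightly from tokenizing the whole string at once.

```typescript
const TARGET_CHARS_PER_CHUNK = 100_000;

// Split by content size rather than by a fixed number of chunks.
function splitIntoChunks(content: string, targetChars: number = TARGET_CHARS_PER_CHUNK): string[] {
  if (targetChars <= 0) throw new RangeError('targetChars must be positive');
  const chunks: string[] = [];
  let start = 0;
  while (start < content.length) {
    let end = Math.min(start + targetChars, content.length);
    // Avoid splitting a UTF-16 surrogate pair across two chunks.
    const last = content.charCodeAt(end - 1);
    if (end < content.length && last >= 0xd800 && last <= 0xdbff) end += 1;
    chunks.push(content.slice(start, end));
    start = end;
  }
  return chunks;
}

console.log(splitIntoChunks('a'.repeat(1_000_000)).length); // 10
```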
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When includeEmptyDirectories is enabled, buildOutputGeneratorContext
called searchFiles a second time just to obtain emptyDirPaths, despite
these already being computed during the initial file search in packager.
Changes:
- Capture emptyDirPaths from the initial searchFiles result in packager
and thread them through the pipeline (packager → produceOutput →
generateOutput/outputSplit → buildOutputGeneratorContext)
- Guard emptyDirPaths processing with includeEmptyDirectories check to
skip unnecessary work when the feature is disabled
- Fix the split-output path, which was not receiving emptyDirPaths even
though the parameter was declared in produceOutput's signature
- Add tests for cache hit (searchFiles not called) and fallback paths
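The cache-hit/fallback logic in the list above can be sketched as a small resolver. Every name and shape here is hypothetical (`resolveEmptyDirPaths`, `SearchResult`, `EmptyDirConfig`); the real code threads `emptyDirPaths` through several functions rather than one helper.

```typescript
// Hypothetical shapes; the real repomix signatures differ.
interface SearchResult {
  filePaths: string[];
  emptyDirPaths: string[];
}
type SearchFiles = (rootDir: string) => Promise<SearchResult>;
interface EmptyDirConfig {
  includeEmptyDirectories: boolean;
}

async function resolveEmptyDirPaths(
  rootDir: string,
  config: EmptyDirConfig,
  searchFiles: SearchFiles,
  cachedEmptyDirPaths?: string[],
): Promise<string[]> {
  if (!config.includeEmptyDirectories) return []; // feature disabled: skip all work
  if (cachedEmptyDirPaths) return cachedEmptyDirPaths; // reuse the initial search result
  return (await searchFiles(rootDir)).emptyDirPaths; // fallback: search again
}
```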
Local benchmark (repomix on itself, includeEmptyDirectories: true):
main: 696.6ms ± 4.2ms
branch: 637.1ms ± 2.6ms
Improvement: ~60ms (~8.5%)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The output parameter was typed as `string | string[] | Promise<...>` but
callers can always wrap sync values in Promise.resolve(). Simplifying to
`Promise<string | string[]>` makes the interface cleaner.
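With the narrowed type, a consumer needs only one `await` and an array check, and sync callers wrap their value once. `countOutputChars` is a hypothetical consumer used only to illustrate the signature.

```typescript
type OutputPromise = Promise<string | string[]>;

// Hypothetical consumer: total characters across single or split output.
async function countOutputChars(output: OutputPromise): Promise<number> {
  const resolved = await output;
  const parts = Array.isArray(resolved) ? resolved : [resolved];
  return parts.reduce((sum, part) => sum + part.length, 0);
}

// Sync callers wrap their value once with Promise.resolve().
countOutputChars(Promise.resolve('hello')).then(console.log);      // 5
countOutputChars(Promise.resolve(['ab', 'cd'])).then(console.log); // 4
```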
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>