mirror of
https://github.com/yamadashy/repomix.git
synced 2026-05-30 11:18:53 +02:00
cf013a0c8f
Background
----------

On a typical CLI run (`node bin/repomix.cjs --include 'src,tests' --quiet`,
258 files, 4-vCPU host), the metrics worker pool was sized as
`ceil(258 / 100) = 3 workers`. Combined with the security pool's hard cap of
2 workers (securityCheck.ts:90) and the main thread, the process held 6 active
threads on 4 cores during the overlap of `validateFileSafety` and
`calculateMetrics`.

Each metrics worker independently parses gpt-tokenizer's ~2.2 MB
`o200k_base.js` BPE table on its first task, a pure-CPU operation of roughly
200-300 ms per worker. Spawning 3 cold metrics workers in the warm-up phase
(calculateMetrics.ts:46-48) therefore pushed the security workers off the CPU
during their own concurrent cold start, inflating the critical-path security
phase.

Change
------

Raise `TASKS_PER_THREAD` from 100 to 200. On a 4-vCPU host (cap of 4 workers)
this gives:

- ≤200 file repos: 1 metrics worker (was 1-2): no change up to 100 files, -1 worker for 101-200
- 201-400 file repos: 2 metrics workers (was 3-4): at least -1 worker, the win
- 401-600 file repos: 3 metrics workers (was 4, the cap): -1 worker
- 601-800 file repos: 4 metrics workers (was 4, the cap): no change
- 801+ file repos: 4 metrics workers (was 4, the cap): no change (cap)

For the 258-file benchmark this brings active workers during the
metrics+security overlap to 2 + 2 = 4, matching the CPU count, and cuts the
parallel BPE-loading work in the warm-up phase by a third (3 cold workers
down to 2).

Tests for `getWorkerThreadCount` and `createWorkerPool` are updated to reflect
the new ratio.
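The sizing rule described above can be sketched as follows. This is a minimal illustration, not the project's code: `metricsWorkerCount` is a hypothetical stand-in for `getWorkerThreadCount`, whose real signature may differ. It assumes the count is `ceil(files / TASKS_PER_THREAD)` clamped between 1 and the CPU count, as the `ceil(258 / 100) = 3` example and the 4-worker cap suggest.

```typescript
// Hypothetical stand-in for getWorkerThreadCount; the real function may differ.
// Assumption: worker count = ceil(files / tasksPerThread), clamped to [1, cpuCount].
function metricsWorkerCount(fileCount: number, tasksPerThread: number, cpuCount: number): number {
  const wanted = Math.ceil(fileCount / tasksPerThread);
  return Math.max(1, Math.min(cpuCount, wanted));
}

const SECURITY_WORKERS = 2; // hard cap of the security pool cited in the commit message
const CPU_COUNT = 4; // the 4-vCPU benchmark host

// Compare pool sizes before (100 tasks per thread) and after (200 tasks per thread).
for (const files of [100, 150, 258, 400, 600, 800, 1572]) {
  const before = metricsWorkerCount(files, 100, CPU_COUNT);
  const after = metricsWorkerCount(files, 200, CPU_COUNT);
  console.log(
    `${files} files: metrics workers ${before} -> ${after}, ` +
      `workers during overlap ${before + SECURITY_WORKERS} -> ${after + SECURITY_WORKERS}`,
  );
}
```

Under these assumptions, the 258-file run drops from 3 + 2 = 5 workers to 2 + 2 = 4 during the overlap, while the 1572-file run stays at the cap in both configurations.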
Benchmark
---------

`node bin/repomix.cjs --include 'src,tests' --quiet` (258 files), n=20 paired
interleaved (alternating BEFORE-first / AFTER-first ordering):

|        | min     | p25     | median  | p75     | mean    | sd    |
|--------|---------|---------|---------|---------|---------|-------|
| BEFORE | 1045 ms | 1092 ms | 1109 ms | 1122 ms | 1107 ms | 27 ms |
| AFTER  | 937 ms  | 973 ms  | 991 ms  | 1020 ms | 994 ms  | 29 ms |

Mean paired Δ: +112.5 ms (10.17 % wall-clock reduction)
Median paired Δ: +115.4 ms (10.66 % wall-clock reduction)
Paired-delta SD: 36.2 ms (paired t = 13.88, p < 0.001)
AFTER faster in 20/20 pairs (100 %)

Regression check: `node bin/repomix.cjs --quiet` (default, 1572 files), n=15
paired interleaved:

|        | min     | p25     | median  | p75     | mean    | sd    |
|--------|---------|---------|---------|---------|---------|-------|
| BEFORE | 1933 ms | 1970 ms | 2016 ms | 2102 ms | 2028 ms | 62 ms |
| AFTER  | 1955 ms | 1966 ms | 2004 ms | 2131 ms | 2034 ms | 74 ms |

Mean paired Δ: -6.2 ms (-0.31 %) (paired t = -0.29, p > 0.05)
Median paired Δ: -12.7 ms (statistically neutral)

No regression on the large workload: both 100 and 200 saturate the per-CPU cap
at 4 workers for ≥800 file repos, so the dispatch-time behavior is identical
there.

Correctness
-----------

- 1256 / 1256 unit tests pass.
- `npm run lint` clean (only pre-existing warnings unrelated to this change).
- No behavioral change to file processing, tokenization, security checks, or
  output. Pool sizing is the only effect.
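
For readers who want to sanity-check the paired t statistic quoted in the Benchmark section, the sketch below recomputes it from the rounded summary figures (mean paired Δ, paired-delta SD, n). `pairedT` is a hypothetical helper introduced only for illustration; because it uses rounded inputs, it lands close to, but not exactly on, the reported 13.88.

```typescript
// Hypothetical helper: paired t statistic from summary statistics,
// t = meanDelta / (sdDelta / sqrt(n)).
function pairedT(meanDelta: number, sdDelta: number, n: number): number {
  const standardError = sdDelta / Math.sqrt(n);
  return meanDelta / standardError;
}

// 258-file benchmark: mean paired delta 112.5 ms, paired-delta SD 36.2 ms, n = 20.
const t = pairedT(112.5, 36.2, 20);
console.log(`paired t (from rounded inputs): ${t.toFixed(1)}`);
```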