mirror of
https://github.com/yamadashy/repomix.git
synced 2026-05-30 11:18:53 +02:00
fb4c895085
The security worker pool currently spawns its 2 workers lazily inside
`runSecurityCheck`, paying a ~50 ms `@secretlint/core` +
`@secretlint/secretlint-rule-preset-recommend` module load on each
freshly spawned worker (~100 ms wall-clock for both workers loading
concurrently). That cold-start cost runs on the critical path inside
the security-check phase, before any scanning begins.
Mirror the existing `createMetricsTaskRunner` pattern: hoist the pool
construction to `pack()` and dispatch one no-op task per worker at the
pipeline entry, so the module load overlaps with the collectFiles + git
ops phase (~200 ms) instead of stalling the security check.
## Mechanism
- New `createSecurityTaskRunner(numOfTasks, deps?)` in
`src/core/security/securityCheck.ts` returns
`{ taskRunner, warmupPromise }`. The warm-up dispatches `maxThreads`
no-op tasks (`{ items: [] }`) — Tinypool spawns a fresh worker for
each concurrent task, fanning out the @secretlint/core load across
all workers in parallel.
- `runSecurityCheck` accepts an optional `taskRunner` in `deps`. When
provided, the caller owns the pool's lifecycle (creation + cleanup);
when omitted, runSecurityCheck creates and cleans up a fresh pool —
preserving the existing behavior for direct callers (e.g. the MCP
fileSystemReadFileTool path).
- `validateFileSafety` accepts and forwards an optional `taskRunner`.
- `pack()` calls `createSecurityTaskRunner` after `searchFiles` resolves
(file count is now known) and before the parallel collectFiles + git
ops block, so the warm-up runs concurrently with disk I/O. The
task runner is plumbed through `validateFileSafety` deps; the pool
is cleaned up alongside the metrics pool in the surrounding
try/finally.
## Scope gate
Pre-warming is gated on the same `hasExplicitScope` heuristic that
already differentiates 2- vs. 3-worker metrics warm-up:
| Workload | Pre-warm? |
|--------------------------------------------------|-----------|
| Default scan (no `--include` / `--stdin`) | yes |
| `--include`, `config.include`, or `--stdin` set | no |
Without the gate, the small/scoped workload regresses by 3.4 % paired
mean: the security check scans only ~5 batches and finishes in ~50–80
ms, so the up-front cost of constructing + destroying a second worker
pool outweighs the saved cold-start. The unconstrained scan runs
security over ~1000+ files where the hidden cold-start dominates.
## Benchmark — `node bin/repomix.cjs --quiet` (1046 files)
Two independent paired n=50 runs (interleaved BEFORE/AFTER alternating
order, NODE_DISABLE_COMPILE_CACHE=1):
| | min | median | mean | max | sd |
|--------|---------|---------|---------|---------|--------|
| BEFORE | 1320 ms | 1454 ms | 1451 ms | 1590 ms | 49 ms |
| AFTER | 1318 ms | 1410 ms | 1416 ms | 1501 ms | 40 ms |
- Mean paired Δ: **+35.2 ms (2.42 % wall-clock reduction)**
- Median paired Δ: +32.5 ms (2.23 %)
- Paired-delta SD: 64.78 ms · paired t = **3.84** (p < 0.001)
- AFTER faster in **39/50** pairs (78 %)
Confirmation run (same setup, n=50): mean Δ +37.0 ms (2.55 %),
t = 3.93, 36/50 pairs faster.
## Regression check — `--include 'src,tests' --quiet` (258 files)
n=30 paired interleaved, NODE_DISABLE_COMPILE_CACHE=1:
| | min | median | mean | max |
|--------|--------|--------|--------|--------|
| BEFORE | 670 ms | 732 ms | 730 ms | 783 ms |
| AFTER | 688 ms | 728 ms | 729 ms | 786 ms |
- Mean paired Δ: +0.9 ms (0.13 %) — **neutral within noise**
(paired t = 0.17)
- AFTER faster in 16/30 pairs
The gate falls back to the original lazy-spawn path on this workload,
so AFTER == BEFORE up to noise. Without the gate this workload
regresses by 3.4 % paired (t = -4.88).
## Correctness
- All **1260** unit tests pass (`npm test`); `npm run lint` clean
(only the two pre-existing `biome-ignore` warnings unrelated to
this change).
- XML output **byte-identical** between BEFORE and AFTER on both the
default 1046-file workload and the `--include 'src,tests'`
258-file workload (verified via `diff` on full ~4.85 MB outputs).
- `runSecurityCheck`'s public signature gains an optional `taskRunner`
in deps; when omitted, behavior is unchanged. Existing callers
outside the pack pipeline (e.g. MCP `fileSystemReadFileTool`) still
spawn their own pool.
- The MCP main-thread security path is unaffected — it uses
`runSecretLint` directly (worker module loaded once at process
start) and never goes through the pool.
## Tests
- `tests/core/security/validateFileSafety.test.ts` — assertion on the
`runSecurityCheck` call updated to include the new `{ taskRunner }`
deps argument (currently undefined when no pre-warmed runner is
provided).
- `tests/core/packager.test.ts`,
`tests/core/packager/diffsFunctionality.test.ts`,
`tests/core/packager/splitOutput.test.ts`,
`tests/integration-tests/packager.test.ts` — extended `mockDeps` /
`baseDeps` with a stubbed `createSecurityTaskRunner` so the default
scope path no longer attempts to spawn a real worker pool from the
test environment. The pack-level assertion on `validateFileSafety`
now matches the new 6th-argument deps object via
`expect.objectContaining({ taskRunner: expect.any(Object) })`.