mirror of
https://github.com/yamadashy/repomix.git
synced 2026-06-11 15:37:16 +02:00
fb4c895085
The security worker pool currently spawns its 2 workers lazily inside
`runSecurityCheck`, paying a ~50 ms `@secretlint/core` +
`@secretlint/secretlint-rule-preset-recommend` module load on each
freshly spawned worker (~100 ms wall-clock for both workers loading
concurrently). That cold-start cost runs on the critical path inside
the security-check phase, before any scanning begins.
Mirror the existing `createMetricsTaskRunner` pattern: hoist the pool
construction to `pack()` and dispatch one no-op task per worker at the
pipeline entry, so the module load overlaps with the collectFiles + git
ops phase (~200 ms) instead of stalling the security check.
## Mechanism
- New `createSecurityTaskRunner(numOfTasks, deps?)` in
`src/core/security/securityCheck.ts` returns
`{ taskRunner, warmupPromise }`. The warm-up dispatches `maxThreads`
no-op tasks (`{ items: [] }`) — Tinypool spawns a fresh worker for
each concurrent task, fanning out the @secretlint/core load across
all workers in parallel.
- `runSecurityCheck` accepts an optional `taskRunner` in `deps`. When
provided, the caller owns the pool's lifecycle (creation + cleanup);
when omitted, runSecurityCheck creates and cleans up a fresh pool —
preserving the existing behavior for direct callers (e.g. the MCP
fileSystemReadFileTool path).
- `validateFileSafety` accepts and forwards an optional `taskRunner`.
- `pack()` calls `createSecurityTaskRunner` after `searchFiles` resolves
(file count is now known) and before the parallel collectFiles + git
ops block, so the warm-up runs concurrently with disk I/O. The
task runner is plumbed through `validateFileSafety` deps; the pool
is cleaned up alongside the metrics pool in the surrounding
try/finally.
## Scope gate
Pre-warming is gated on the same `hasExplicitScope` heuristic that
already differentiates 2- vs. 3-worker metrics warm-up:
| Workload | Pre-warm? |
|--------------------------------------------------|-----------|
| Default scan (no `--include` / `--stdin`) | yes |
| `--include`, `config.include`, or `--stdin` set | no |
Without the gate, the small/scoped workload regresses by 3.4 % paired
mean: the security check scans only ~5 batches and finishes in ~50–80
ms, so the up-front cost of constructing + destroying a second worker
pool outweighs the saved cold-start. The unconstrained scan runs
security over ~1000+ files where the hidden cold-start dominates.
## Benchmark — `node bin/repomix.cjs --quiet` (1046 files)
Two independent paired n=50 runs (interleaved BEFORE/AFTER alternating
order, NODE_DISABLE_COMPILE_CACHE=1):
| | min | median | mean | max | sd |
|--------|---------|---------|---------|---------|--------|
| BEFORE | 1320 ms | 1454 ms | 1451 ms | 1590 ms | 49 ms |
| AFTER | 1318 ms | 1410 ms | 1416 ms | 1501 ms | 40 ms |
- Mean paired Δ: **+35.2 ms (2.42 % wall-clock reduction)**
- Median paired Δ: +32.5 ms (2.23 %)
- Paired-delta SD: 64.78 ms · paired t = **3.84** (p < 0.001)
- AFTER faster in **39/50** pairs (78 %)
Confirmation run (same setup, n=50): mean Δ +37.0 ms (2.55 %),
t = 3.93, 36/50 pairs faster.
## Regression check — `--include 'src,tests' --quiet` (258 files)
n=30 paired interleaved, NODE_DISABLE_COMPILE_CACHE=1:
| | min | median | mean | max |
|--------|--------|--------|--------|--------|
| BEFORE | 670 ms | 732 ms | 730 ms | 783 ms |
| AFTER | 688 ms | 728 ms | 729 ms | 786 ms |
- Mean paired Δ: +0.9 ms (0.13 %) — **neutral within noise**
(paired t = 0.17)
- AFTER faster in 16/30 pairs
The gate falls back to the original lazy-spawn path on this workload,
so AFTER == BEFORE up to noise. Without the gate this workload
regresses by 3.4 % paired (t = -4.88).
## Correctness
- All **1260** unit tests pass (`npm test`); `npm run lint` clean
(only the two pre-existing `biome-ignore` warnings unrelated to
this change).
- XML output **byte-identical** between BEFORE and AFTER on both the
default 1046-file workload and the `--include 'src,tests'`
258-file workload (verified via `diff` on full ~4.85 MB outputs).
- `runSecurityCheck`'s public signature gains an optional `taskRunner`
in deps; when omitted, behavior is unchanged. Existing callers
outside the pack pipeline (e.g. MCP `fileSystemReadFileTool`) still
spawn their own pool.
- The MCP main-thread security path is unaffected — it uses
`runSecretLint` directly (worker module loaded once at process
start) and never goes through the pool.
## Tests
- `tests/core/security/validateFileSafety.test.ts` — assertion on the
`runSecurityCheck` call updated to include the new `{ taskRunner }`
deps argument (currently undefined when no pre-warmed runner is
provided).
- `tests/core/packager.test.ts`,
`tests/core/packager/diffsFunctionality.test.ts`,
`tests/core/packager/splitOutput.test.ts`,
`tests/integration-tests/packager.test.ts` — extended `mockDeps` /
`baseDeps` with a stubbed `createSecurityTaskRunner` so the default
scope path no longer attempts to spawn a real worker pool from the
test environment. The pack-level assertion on `validateFileSafety`
now matches the new 6th-argument deps object via
`expect.objectContaining({ taskRunner: expect.any(Object) })`.
317 lines
13 KiB
TypeScript
317 lines
13 KiB
TypeScript
import path from 'node:path';
|
|
import { beforeEach, describe, expect, test, vi } from 'vitest';
|
|
import { pack } from '../../src/core/packager.js';
|
|
import { createMockConfig } from '../testing/testUtils.js';
|
|
|
|
vi.mock('node:fs/promises');
|
|
vi.mock('fs/promises');
|
|
vi.mock('../../src/core/metrics/TokenCounter.js', () => {
|
|
return {
|
|
TOKEN_ENCODINGS: ['o200k_base', 'cl100k_base', 'p50k_base', 'p50k_edit', 'r50k_base'],
|
|
TokenCounter: vi.fn().mockImplementation(() => ({
|
|
countTokens: vi.fn().mockReturnValue(10),
|
|
free: vi.fn(),
|
|
})),
|
|
};
|
|
});
|
|
|
|
describe('packager', () => {
|
|
beforeEach(() => {
|
|
vi.resetAllMocks();
|
|
});
|
|
|
|
test('pack should orchestrate packing files and generating output', async () => {
|
|
const file2Path = path.join('dir1', 'file2.txt');
|
|
const mockRawFiles = [
|
|
{ path: 'file1.txt', content: 'raw content 1' },
|
|
{ path: file2Path, content: 'raw content 2' },
|
|
];
|
|
const mockProcessedFiles = [
|
|
{ path: 'file1.txt', content: 'processed content 1' },
|
|
{ path: file2Path, content: 'processed content 2' },
|
|
];
|
|
const mockOutput = 'mock output';
|
|
const mockFilePaths = ['file1.txt', file2Path];
|
|
|
|
const mockDeps = {
|
|
searchFiles: vi.fn().mockResolvedValue({
|
|
filePaths: mockFilePaths,
|
|
emptyDirPaths: [],
|
|
}),
|
|
sortPaths: vi.fn().mockImplementation((paths) => paths),
|
|
collectFiles: vi.fn().mockResolvedValue({ rawFiles: mockRawFiles, skippedFiles: [] }),
|
|
processFiles: vi.fn().mockReturnValue(mockProcessedFiles),
|
|
validateFileSafety: vi.fn().mockResolvedValue({
|
|
safeFilePaths: mockFilePaths,
|
|
safeRawFiles: mockRawFiles,
|
|
suspiciousFilesResults: [],
|
|
suspiciousGitDiffResults: [],
|
|
suspiciousGitLogResults: [],
|
|
}),
|
|
produceOutput: vi.fn().mockResolvedValue({
|
|
outputForMetrics: mockOutput,
|
|
}),
|
|
createMetricsTaskRunner: vi.fn().mockReturnValue({
|
|
taskRunner: {
|
|
run: vi.fn().mockResolvedValue(0),
|
|
cleanup: vi.fn().mockResolvedValue(undefined),
|
|
},
|
|
warmupPromise: Promise.resolve(),
|
|
}),
|
|
createSecurityTaskRunner: vi.fn().mockReturnValue({
|
|
taskRunner: {
|
|
run: vi.fn().mockResolvedValue([]),
|
|
cleanup: vi.fn().mockResolvedValue(undefined),
|
|
},
|
|
warmupPromise: Promise.resolve(),
|
|
}),
|
|
calculateMetrics: vi.fn().mockResolvedValue({
|
|
totalFiles: 2,
|
|
totalCharacters: 11,
|
|
totalTokens: 10,
|
|
fileCharCounts: {
|
|
'file1.txt': 19,
|
|
[file2Path]: 19,
|
|
},
|
|
fileTokenCounts: {
|
|
'file1.txt': 10,
|
|
[file2Path]: 10,
|
|
},
|
|
gitDiffTokenCount: 0,
|
|
gitLogTokenCount: 0,
|
|
}),
|
|
};
|
|
|
|
const mockConfig = createMockConfig();
|
|
const progressCallback = vi.fn();
|
|
const result = await pack(['root'], mockConfig, progressCallback, mockDeps);
|
|
|
|
expect(mockDeps.searchFiles).toHaveBeenCalledWith('root', mockConfig, undefined);
|
|
expect(mockDeps.collectFiles).toHaveBeenCalledWith(mockFilePaths, 'root', mockConfig, progressCallback);
|
|
expect(mockDeps.validateFileSafety).toHaveBeenCalled();
|
|
expect(mockDeps.processFiles).toHaveBeenCalled();
|
|
expect(mockDeps.produceOutput).toHaveBeenCalled();
|
|
expect(mockDeps.calculateMetrics).toHaveBeenCalled();
|
|
|
|
expect(mockDeps.validateFileSafety).toHaveBeenCalledWith(
|
|
mockRawFiles,
|
|
progressCallback,
|
|
mockConfig,
|
|
undefined,
|
|
undefined,
|
|
expect.objectContaining({ taskRunner: expect.any(Object) }),
|
|
);
|
|
expect(mockDeps.processFiles).toHaveBeenCalledWith(mockRawFiles, mockConfig, progressCallback);
|
|
expect(mockDeps.produceOutput).toHaveBeenCalledWith(
|
|
['root'],
|
|
mockConfig,
|
|
mockProcessedFiles,
|
|
mockFilePaths,
|
|
undefined,
|
|
undefined,
|
|
progressCallback,
|
|
[{ rootLabel: 'root', files: mockFilePaths }],
|
|
undefined,
|
|
);
|
|
expect(mockDeps.calculateMetrics).toHaveBeenCalledWith(
|
|
mockProcessedFiles,
|
|
expect.anything(),
|
|
progressCallback,
|
|
mockConfig,
|
|
undefined,
|
|
undefined,
|
|
expect.objectContaining({ taskRunner: expect.anything() }),
|
|
);
|
|
|
|
// Verify that calculateMetrics received a promise that resolves to the expected output
|
|
const outputArg = mockDeps.calculateMetrics.mock.calls[0][1];
|
|
await expect(outputArg).resolves.toBe(mockOutput);
|
|
|
|
// Check the result of pack function
|
|
expect(result.totalFiles).toBe(2);
|
|
expect(result.totalCharacters).toBe(11);
|
|
expect(result.totalTokens).toBe(10);
|
|
expect(result.fileCharCounts).toEqual({
|
|
'file1.txt': 19,
|
|
[file2Path]: 19,
|
|
});
|
|
expect(result.fileTokenCounts).toEqual({
|
|
'file1.txt': 10,
|
|
[file2Path]: 10,
|
|
});
|
|
expect(result.skippedFiles).toEqual([]);
|
|
});
|
|
|
|
describe('parallel error handling', () => {
|
|
// The pipeline runs several stages in parallel (security check + file processing,
|
|
// output generation + metrics). Regressions in error propagation or worker cleanup
|
|
// are easy to introduce when adding parallel branches.
|
|
const mockFilePaths = ['file1.txt'];
|
|
const mockRawFiles = [{ path: 'file1.txt', content: 'raw' }];
|
|
const mockProcessedFiles = [{ path: 'file1.txt', content: 'processed' }];
|
|
|
|
const baseDeps = () => {
|
|
const cleanup = vi.fn().mockResolvedValue(undefined);
|
|
return {
|
|
cleanup,
|
|
deps: {
|
|
searchFiles: vi.fn().mockResolvedValue({ filePaths: mockFilePaths, emptyDirPaths: [] }),
|
|
sortPaths: vi.fn().mockImplementation((p) => p),
|
|
collectFiles: vi.fn().mockResolvedValue({ rawFiles: mockRawFiles, skippedFiles: [] }),
|
|
processFiles: vi.fn().mockResolvedValue(mockProcessedFiles),
|
|
validateFileSafety: vi.fn().mockResolvedValue({
|
|
safeFilePaths: mockFilePaths,
|
|
safeRawFiles: mockRawFiles,
|
|
suspiciousFilesResults: [],
|
|
suspiciousGitDiffResults: [],
|
|
suspiciousGitLogResults: [],
|
|
}),
|
|
produceOutput: vi.fn().mockResolvedValue({ outputForMetrics: 'output' }),
|
|
// Mirror real calculateMetrics behavior: await the outputForMetrics promise so a
|
|
// produceOutput rejection propagates here instead of becoming unhandled.
|
|
calculateMetrics: vi.fn().mockImplementation(async (_files, outputPromise) => {
|
|
await outputPromise;
|
|
return {
|
|
totalFiles: 1,
|
|
totalCharacters: 9,
|
|
totalTokens: 1,
|
|
fileCharCounts: { 'file1.txt': 9 },
|
|
fileTokenCounts: { 'file1.txt': 1 },
|
|
gitDiffTokenCount: 0,
|
|
gitLogTokenCount: 0,
|
|
};
|
|
}),
|
|
createMetricsTaskRunner: vi.fn().mockReturnValue({
|
|
taskRunner: { run: vi.fn().mockResolvedValue(0), cleanup },
|
|
warmupPromise: Promise.resolve(),
|
|
}),
|
|
createSecurityTaskRunner: vi.fn().mockReturnValue({
|
|
taskRunner: { run: vi.fn().mockResolvedValue([]), cleanup: vi.fn().mockResolvedValue(undefined) },
|
|
warmupPromise: Promise.resolve(),
|
|
}),
|
|
getGitDiffs: vi.fn().mockResolvedValue(undefined),
|
|
getGitLogs: vi.fn().mockResolvedValue(undefined),
|
|
prefetchSortData: vi.fn().mockResolvedValue(undefined),
|
|
sortOutputFiles: vi.fn().mockImplementation((files) => files),
|
|
},
|
|
};
|
|
};
|
|
|
|
test('cleans up the metrics worker pool when searchFiles rejects', async () => {
|
|
// The metrics worker pool is created before searchFiles so its BPE warm-up
|
|
// can overlap with the glob scan. A searchFiles rejection must still trigger
|
|
// the pool cleanup via the surrounding try/finally.
|
|
const { cleanup, deps } = baseDeps();
|
|
deps.searchFiles = vi.fn().mockRejectedValue(new Error('search failed'));
|
|
|
|
await expect(pack(['root'], createMockConfig(), vi.fn(), deps)).rejects.toThrow('search failed');
|
|
|
|
expect(cleanup).toHaveBeenCalled();
|
|
});
|
|
|
|
test('cleans up the metrics worker pool when validateFileSafety rejects', async () => {
|
|
const { cleanup, deps } = baseDeps();
|
|
deps.validateFileSafety = vi.fn().mockRejectedValue(new Error('security check failed'));
|
|
|
|
await expect(pack(['root'], createMockConfig(), vi.fn(), deps)).rejects.toThrow('security check failed');
|
|
|
|
expect(cleanup).toHaveBeenCalled();
|
|
});
|
|
|
|
test('cleans up the metrics worker pool when produceOutput rejects', async () => {
|
|
const { cleanup, deps } = baseDeps();
|
|
deps.produceOutput = vi.fn().mockRejectedValue(new Error('output failed'));
|
|
|
|
await expect(pack(['root'], createMockConfig(), vi.fn(), deps)).rejects.toThrow('output failed');
|
|
|
|
expect(cleanup).toHaveBeenCalled();
|
|
});
|
|
|
|
test('cleans up the metrics worker pool when calculateMetrics rejects', async () => {
|
|
const { cleanup, deps } = baseDeps();
|
|
deps.calculateMetrics = vi.fn().mockRejectedValue(new Error('metrics failed'));
|
|
|
|
await expect(pack(['root'], createMockConfig(), vi.fn(), deps)).rejects.toThrow('metrics failed');
|
|
|
|
expect(cleanup).toHaveBeenCalled();
|
|
});
|
|
|
|
test('a prefetchSortData failure does not block the pipeline', async () => {
|
|
const { cleanup, deps } = baseDeps();
|
|
deps.prefetchSortData = vi.fn().mockRejectedValue(new Error('git failed'));
|
|
|
|
const result = await pack(['root'], createMockConfig(), vi.fn(), deps);
|
|
|
|
// Pack should complete successfully even though the prefetch failed.
|
|
expect(result.totalFiles).toBe(1);
|
|
expect(deps.sortOutputFiles).toHaveBeenCalled();
|
|
expect(cleanup).toHaveBeenCalled();
|
|
});
|
|
|
|
test('uses 3 warm-up workers for the metrics pool when no scope is specified', async () => {
|
|
// The eager metrics warm-up sizes the pool via numOfTasks. When the user did
|
|
// not constrain the file set (no --include / no --stdin), the metrics phase is
|
|
// typically long enough to amortize a 3rd worker's BPE warm-up, so we pass
|
|
// numOfTasks = 3 * TASKS_PER_THREAD = 600 (yielding maxThreads=min(cpuCount, 3)).
|
|
const { deps } = baseDeps();
|
|
const config = createMockConfig();
|
|
config.include = [];
|
|
|
|
await pack(['root'], config, vi.fn(), deps);
|
|
|
|
expect(deps.createMetricsTaskRunner).toHaveBeenCalledWith(600, expect.any(String));
|
|
// Same gate also enables the security pool pre-warm on the unscoped path.
|
|
expect(deps.createSecurityTaskRunner).toHaveBeenCalled();
|
|
});
|
|
|
|
test('uses 2 warm-up workers for the metrics pool when --include narrows scope', async () => {
|
|
// With explicit includes the file set is typically much smaller and the
|
|
// 3rd worker's BPE warm-up dominates the parallelism gain (paired benchmarks
|
|
// regressed by ~12% on the 258-file --include 'src,tests' workload), so the
|
|
// heuristic falls back to 2 warm workers. The same `hasExplicitScope`
|
|
// gate also disables the security pool pre-warm in this branch — its
|
|
// up-front cost outweighs the saved cold-start on small/scoped runs.
|
|
const { deps } = baseDeps();
|
|
const config = createMockConfig();
|
|
config.include = ['src'];
|
|
|
|
await pack(['root'], config, vi.fn(), deps);
|
|
|
|
expect(deps.createMetricsTaskRunner).toHaveBeenCalledWith(400, expect.any(String));
|
|
expect(deps.createSecurityTaskRunner).not.toHaveBeenCalled();
|
|
});
|
|
|
|
test('uses 2 warm-up workers for the metrics pool when explicitFiles is provided (--stdin)', async () => {
|
|
const { deps } = baseDeps();
|
|
const config = createMockConfig();
|
|
config.include = [];
|
|
|
|
await pack(['root'], config, vi.fn(), deps, ['file1.txt', 'file2.txt']);
|
|
|
|
expect(deps.createMetricsTaskRunner).toHaveBeenCalledWith(400, expect.any(String));
|
|
// The security pool pre-warm shares the `hasExplicitScope` gate — when --stdin
|
|
// (explicitFiles) provides the file set, the pre-warm is skipped to avoid the
|
|
// pool-construction overhead that outweighs the saved cold-start on small runs.
|
|
expect(deps.createSecurityTaskRunner).not.toHaveBeenCalled();
|
|
});
|
|
|
|
test('cleans up the metrics worker pool even when the warmup promise rejects', async () => {
|
|
const { cleanup, deps } = baseDeps();
|
|
// Pre-attach a no-op handler so the rejection is observed at construction time,
|
|
// before pack() reaches `await metricsWarmupPromise`. Production code mirrors this
|
|
// with `.catch(() => {})` in packager.ts:262, so the warmup rejection is fully
|
|
// contained — but vitest's unhandled-rejection detector can flag it eagerly here.
|
|
const warmupPromise = Promise.reject(new Error('warmup failed'));
|
|
warmupPromise.catch(() => {});
|
|
deps.createMetricsTaskRunner = vi.fn().mockReturnValue({
|
|
taskRunner: { run: vi.fn().mockResolvedValue(0), cleanup },
|
|
warmupPromise,
|
|
});
|
|
|
|
await expect(pack(['root'], createMockConfig(), vi.fn(), deps)).rejects.toThrow('warmup failed');
|
|
|
|
expect(cleanup).toHaveBeenCalled();
|
|
});
|
|
});
|
|
});
|