Files
repomix-mirror/tests/core/security/validateFileSafety.test.ts
Claude fb4c895085 perf(core): Pre-warm security worker pool to overlap @secretlint/core load
The security worker pool currently spawns its 2 workers lazily inside
`runSecurityCheck`, paying a ~50 ms `@secretlint/core` +
`@secretlint/secretlint-rule-preset-recommend` module load on each
freshly spawned worker (~100 ms wall-clock for both workers loading
concurrently). That cold-start cost runs on the critical path inside
the security-check phase, before any scanning begins.

Mirror the existing `createMetricsTaskRunner` pattern: hoist the pool
construction to `pack()` and dispatch one no-op task per worker at the
pipeline entry, so the module load overlaps with the collectFiles + git
ops phase (~200 ms) instead of stalling the security check.

## Mechanism

- New `createSecurityTaskRunner(numOfTasks, deps?)` in
  `src/core/security/securityCheck.ts` returns
  `{ taskRunner, warmupPromise }`. The warm-up dispatches `maxThreads`
  no-op tasks (`{ items: [] }`) — Tinypool spawns a fresh worker for
  each concurrent task, fanning out the @secretlint/core load across
  all workers in parallel.
- `runSecurityCheck` accepts an optional `taskRunner` in `deps`. When
  provided, the caller owns the pool's lifecycle (creation + cleanup);
  when omitted, runSecurityCheck creates and cleans up a fresh pool —
  preserving the existing behavior for direct callers (e.g. the MCP
  fileSystemReadFileTool path).
- `validateFileSafety` accepts and forwards an optional `taskRunner`.
- `pack()` calls `createSecurityTaskRunner` after `searchFiles` resolves
  (file count is now known) and before the parallel collectFiles + git
  ops block, so the warm-up runs concurrently with disk I/O. The
  task runner is plumbed through `validateFileSafety` deps; the pool
  is cleaned up alongside the metrics pool in the surrounding
  try/finally.

## Scope gate

Pre-warming is gated on the same `hasExplicitScope` heuristic that
already differentiates 2- vs. 3-worker metrics warm-up:

| Workload                                         | Pre-warm? |
|--------------------------------------------------|-----------|
| Default scan (no `--include` / `--stdin`)        | yes       |
| `--include`, `config.include`, or `--stdin` set  | no        |

Without the gate, the small/scoped workload regresses by 3.4 % paired
mean: the security check scans only ~5 batches and finishes in ~50–80
ms, so the up-front cost of constructing + destroying a second worker
pool outweighs the saved cold-start. The unconstrained scan runs
security over ~1000+ files where the hidden cold-start dominates.

## Benchmark — `node bin/repomix.cjs --quiet` (1046 files)

Two independent paired n=50 runs (interleaved BEFORE/AFTER alternating
order, NODE_DISABLE_COMPILE_CACHE=1):

|        | min     | median  | mean    | max     | sd     |
|--------|---------|---------|---------|---------|--------|
| BEFORE | 1320 ms | 1454 ms | 1451 ms | 1590 ms | 49 ms  |
| AFTER  | 1318 ms | 1410 ms | 1416 ms | 1501 ms | 40 ms  |

- Mean paired Δ:   **+35.2 ms (2.42 % wall-clock reduction)**
- Median paired Δ: +32.5 ms (2.23 %)
- Paired-delta SD: 64.78 ms · paired t = **3.84** (p < 0.001)
- AFTER faster in **39/50** pairs (78 %)

Confirmation run (same setup, n=50): mean Δ +37.0 ms (2.55 %),
t = 3.93, 36/50 pairs faster.

## Regression check — `--include 'src,tests' --quiet` (258 files)

n=30 paired interleaved, NODE_DISABLE_COMPILE_CACHE=1:

|        | min    | median | mean   | max    |
|--------|--------|--------|--------|--------|
| BEFORE | 670 ms | 732 ms | 730 ms | 783 ms |
| AFTER  | 688 ms | 728 ms | 729 ms | 786 ms |

- Mean paired Δ:   +0.9 ms (0.13 %) — **neutral within noise**
  (paired t = 0.17)
- AFTER faster in 16/30 pairs

The gate falls back to the original lazy-spawn path on this workload,
so AFTER == BEFORE up to noise. Without the gate this workload
regresses by 3.4 % paired (t = -4.88).

## Correctness

- All **1260** unit tests pass (`npm test`); `npm run lint` clean
  (only the two pre-existing `biome-ignore` warnings unrelated to
  this change).
- XML output **byte-identical** between BEFORE and AFTER on both the
  default 1046-file workload and the `--include 'src,tests'`
  258-file workload (verified via `diff` on full ~4.85 MB outputs).
- `runSecurityCheck`'s public signature gains an optional `taskRunner`
  in deps; when omitted, behavior is unchanged. Existing callers
  outside the pack pipeline (e.g. MCP `fileSystemReadFileTool`) still
  spawn their own pool.
- The MCP main-thread security path is unaffected — it uses
  `runSecretLint` directly (worker module loaded once at process
  start) and never goes through the pool.

## Tests

- `tests/core/security/validateFileSafety.test.ts` — assertion on the
  `runSecurityCheck` call updated to include the new `{ taskRunner }`
  deps argument (currently undefined when no pre-warmed runner is
  provided).
- `tests/core/packager.test.ts`,
  `tests/core/packager/diffsFunctionality.test.ts`,
  `tests/core/packager/splitOutput.test.ts`,
  `tests/integration-tests/packager.test.ts` — extended `mockDeps` /
  `baseDeps` with a stubbed `createSecurityTaskRunner` so the default
  scope path no longer attempts to spawn a real worker pool from the
  test environment. The pack-level assertion on `validateFileSafety`
  now matches the new 6th-argument deps object via
  `expect.objectContaining({ taskRunner: expect.any(Object) })`.
2026-05-08 17:05:51 +00:00

118 lines
5.2 KiB
TypeScript

import { afterEach, describe, expect, it, vi } from 'vitest';
import type { RepomixConfigMerged } from '../../../src/config/configSchema.js';
import type { RawFile } from '../../../src/core/file/fileTypes.js';
import type { SuspiciousFileResult } from '../../../src/core/security/securityCheck.js';
import { validateFileSafety } from '../../../src/core/security/validateFileSafety.js';
import { logger } from '../../../src/shared/logger.js';
import type { RepomixProgressCallback } from '../../../src/shared/types.js';
describe('validateFileSafety', () => {
it('should validate file safety and return safe files and paths', async () => {
const rawFiles: RawFile[] = [
{ path: 'file1.txt', content: 'content1' },
{ path: 'file2.txt', content: 'content2' },
{ path: 'file3.txt', content: 'content3' },
];
const safeRawFiles = [rawFiles[0], rawFiles[1]];
const config: RepomixConfigMerged = {
security: { enableSecurityCheck: true },
} as RepomixConfigMerged;
const progressCallback: RepomixProgressCallback = vi.fn();
const suspiciousFilesResults: SuspiciousFileResult[] = [
{ filePath: 'file2.txt', messages: ['something suspicious.'], type: 'file' },
];
const deps = {
runSecurityCheck: vi.fn().mockResolvedValue(suspiciousFilesResults),
filterOutUntrustedFiles: vi.fn().mockReturnValue(safeRawFiles),
};
const result = await validateFileSafety(rawFiles, progressCallback, config, undefined, undefined, deps);
expect(deps.runSecurityCheck).toHaveBeenCalledWith(rawFiles, progressCallback, undefined, undefined, {
taskRunner: undefined,
});
expect(deps.filterOutUntrustedFiles).toHaveBeenCalledWith(rawFiles, suspiciousFilesResults);
expect(result).toEqual({
safeRawFiles,
safeFilePaths: ['file1.txt', 'file2.txt'],
suspiciousFilesResults,
suspiciousGitDiffResults: [],
suspiciousGitLogResults: [],
});
});
describe('git content warnings', () => {
afterEach(() => {
vi.restoreAllMocks();
});
it('warns about each suspicious git diff/log entry with singular/plural form', async () => {
const warnSpy = vi.spyOn(logger, 'warn').mockImplementation(() => {});
const config: RepomixConfigMerged = {
security: { enableSecurityCheck: true },
} as RepomixConfigMerged;
const progressCallback: RepomixProgressCallback = vi.fn();
const allResults: SuspiciousFileResult[] = [
{ filePath: 'work_tree', messages: ['leaked key'], type: 'gitDiff' },
{ filePath: 'staged', messages: ['leaked key', 'token'], type: 'gitDiff' },
{ filePath: 'commit_log', messages: ['secret', 'pw', 'token'], type: 'gitLog' },
];
const deps = {
runSecurityCheck: vi.fn().mockResolvedValue(allResults),
filterOutUntrustedFiles: vi.fn().mockReturnValue([]),
};
const result = await validateFileSafety([], progressCallback, config, undefined, undefined, deps);
expect(result.suspiciousGitDiffResults).toHaveLength(2);
expect(result.suspiciousGitLogResults).toHaveLength(1);
const warnings = warnSpy.mock.calls.map((c) => String(c[0]));
// Header lines for each section
expect(warnings).toContain('Security issues found in Git diffs, but they will still be included in the output');
expect(warnings).toContain('Security issues found in Git logs, but they will still be included in the output');
// Singular form for 1 message, plural for >1
expect(warnings).toContain(' - work_tree: 1 issue detected');
expect(warnings).toContain(' - staged: 2 issues detected');
expect(warnings).toContain(' - commit_log: 3 issues detected');
});
it('does not log a header when no suspicious git content is found', async () => {
const warnSpy = vi.spyOn(logger, 'warn').mockImplementation(() => {});
const config: RepomixConfigMerged = {
security: { enableSecurityCheck: true },
} as RepomixConfigMerged;
await validateFileSafety([], vi.fn(), config, undefined, undefined, {
runSecurityCheck: vi.fn().mockResolvedValue([]),
filterOutUntrustedFiles: vi.fn().mockReturnValue([]),
});
expect(warnSpy).not.toHaveBeenCalled();
});
});
it('skips runSecurityCheck entirely when enableSecurityCheck is false', async () => {
// Pin the negative path of the `if (config.security.enableSecurityCheck)` guard.
// Dropping the guard would still pass every other test in this file because
// they all enable the check; this one fails if the guard ever regresses.
const config: RepomixConfigMerged = {
security: { enableSecurityCheck: false },
} as RepomixConfigMerged;
const rawFiles: RawFile[] = [{ path: 'file1.txt', content: 'content' }];
const deps = {
runSecurityCheck: vi.fn(),
filterOutUntrustedFiles: vi.fn().mockReturnValue(rawFiles),
};
const result = await validateFileSafety(rawFiles, vi.fn(), config, undefined, undefined, deps);
expect(deps.runSecurityCheck).not.toHaveBeenCalled();
expect(result.suspiciousFilesResults).toEqual([]);
expect(result.suspiciousGitDiffResults).toEqual([]);
expect(result.suspiciousGitLogResults).toEqual([]);
expect(result.safeFilePaths).toEqual(['file1.txt']);
});
});