Commit Graph

9 Commits

Author SHA1 Message Date
Claude fb4c895085 perf(core): Pre-warm security worker pool to overlap @secretlint/core load
The security worker pool currently spawns its 2 workers lazily inside
`runSecurityCheck`, paying a ~50 ms `@secretlint/core` +
`@secretlint/secretlint-rule-preset-recommend` module load on each
freshly spawned worker (~100 ms wall-clock for both workers loading
concurrently). That cold-start cost runs on the critical path inside
the security-check phase, before any scanning begins.

Mirror the existing `createMetricsTaskRunner` pattern: hoist the pool
construction to `pack()` and dispatch one no-op task per worker at the
pipeline entry, so the module load overlaps with the collectFiles + git
ops phase (~200 ms) instead of stalling the security check.

## Mechanism

- New `createSecurityTaskRunner(numOfTasks, deps?)` in
  `src/core/security/securityCheck.ts` returns
  `{ taskRunner, warmupPromise }`. The warm-up dispatches `maxThreads`
  no-op tasks (`{ items: [] }`) — Tinypool spawns a fresh worker for
  each concurrent task, fanning out the @secretlint/core load across
  all workers in parallel.
- `runSecurityCheck` accepts an optional `taskRunner` in `deps`. When
  provided, the caller owns the pool's lifecycle (creation + cleanup);
  when omitted, runSecurityCheck creates and cleans up a fresh pool —
  preserving the existing behavior for direct callers (e.g. the MCP
  fileSystemReadFileTool path).
- `validateFileSafety` accepts and forwards an optional `taskRunner`.
- `pack()` calls `createSecurityTaskRunner` after `searchFiles` resolves
  (file count is now known) and before the parallel collectFiles + git
  ops block, so the warm-up runs concurrently with disk I/O. The
  task runner is plumbed through `validateFileSafety` deps; the pool
  is cleaned up alongside the metrics pool in the surrounding
  try/finally.

## Scope gate

Pre-warming is gated on the same `hasExplicitScope` heuristic that
already differentiates 2- vs. 3-worker metrics warm-up:

| Workload                                         | Pre-warm? |
|--------------------------------------------------|-----------|
| Default scan (no `--include` / `--stdin`)        | yes       |
| `--include`, `config.include`, or `--stdin` set  | no        |

Without the gate, the small/scoped workload regresses by 3.4 % paired
mean: the security check scans only ~5 batches and finishes in ~50–80
ms, so the up-front cost of constructing + destroying a second worker
pool outweighs the saved cold-start. The unconstrained scan runs
security over ~1000+ files where the hidden cold-start dominates.

## Benchmark — `node bin/repomix.cjs --quiet` (1046 files)

Two independent paired n=50 runs (interleaved BEFORE/AFTER alternating
order, NODE_DISABLE_COMPILE_CACHE=1):

|        | min     | median  | mean    | max     | sd     |
|--------|---------|---------|---------|---------|--------|
| BEFORE | 1320 ms | 1454 ms | 1451 ms | 1590 ms | 49 ms  |
| AFTER  | 1318 ms | 1410 ms | 1416 ms | 1501 ms | 40 ms  |

- Mean paired Δ:   **+35.2 ms (2.42 % wall-clock reduction)**
- Median paired Δ: +32.5 ms (2.23 %)
- Paired-delta SD: 64.78 ms · paired t = **3.84** (p < 0.001)
- AFTER faster in **39/50** pairs (78 %)

Confirmation run (same setup, n=50): mean Δ +37.0 ms (2.55 %),
t = 3.93, 36/50 pairs faster.

## Regression check — `--include 'src,tests' --quiet` (258 files)

n=30 paired interleaved, NODE_DISABLE_COMPILE_CACHE=1:

|        | min    | median | mean   | max    |
|--------|--------|--------|--------|--------|
| BEFORE | 670 ms | 732 ms | 730 ms | 783 ms |
| AFTER  | 688 ms | 728 ms | 729 ms | 786 ms |

- Mean paired Δ:   +0.9 ms (0.13 %) — **neutral within noise**
  (paired t = 0.17)
- AFTER faster in 16/30 pairs

The gate falls back to the original lazy-spawn path on this workload,
so AFTER == BEFORE up to noise. Without the gate this workload
regresses by 3.4 % paired (t = -4.88).

## Correctness

- All **1260** unit tests pass (`npm test`); `npm run lint` clean
  (only the two pre-existing `biome-ignore` warnings unrelated to
  this change).
- XML output **byte-identical** between BEFORE and AFTER on both the
  default 1046-file workload and the `--include 'src,tests'`
  258-file workload (verified via `diff` on full ~4.85 MB outputs).
- `runSecurityCheck`'s public signature gains an optional `taskRunner`
  in deps; when omitted, behavior is unchanged. Existing callers
  outside the pack pipeline (e.g. MCP `fileSystemReadFileTool`) still
  spawn their own pool.
- The MCP main-thread security path is unaffected — it uses
  `runSecretLint` directly (worker module loaded once at process
  start) and never goes through the pool.

## Tests

- `tests/core/security/validateFileSafety.test.ts` — assertion on the
  `runSecurityCheck` call updated to include the new `{ taskRunner }`
  deps argument (currently undefined when no pre-warmed runner is
  provided).
- `tests/core/packager.test.ts`,
  `tests/core/packager/diffsFunctionality.test.ts`,
  `tests/core/packager/splitOutput.test.ts`,
  `tests/integration-tests/packager.test.ts` — extended `mockDeps` /
  `baseDeps` with a stubbed `createSecurityTaskRunner` so the default
  scope path no longer attempts to spawn a real worker pool from the
  test environment. The pack-level assertion on `validateFileSafety`
  now matches the new 6th-argument deps object via
  `expect.objectContaining({ taskRunner: expect.any(Object) })`.
2026-05-08 17:05:51 +00:00
Kazuki Yamada cfbab618c5 refactor(metrics): Encapsulate warmup logic in createMetricsTaskRunner
Move worker thread warmup from packager into createMetricsTaskRunner,
which now returns both a taskRunner and warmupPromise. This keeps the
packager clean — it no longer needs to know warmup implementation details.

Also:
- Skip metrics worker pool creation on skill-generation path where
  it is unused
- Await warmupPromise in finally block before cleanup to prevent
  tearing down workers during initialization

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 14:56:49 +09:00
Kazuki Yamada 96a6a7c804 perf(core): Cache empty directory paths to avoid redundant file search
When includeEmptyDirectories is enabled, buildOutputGeneratorContext
called searchFiles a second time just to obtain emptyDirPaths, despite
these already being computed during the initial file search in packager.

Changes:
- Capture emptyDirPaths from the initial searchFiles result in packager
  and thread them through the pipeline (packager → produceOutput →
  generateOutput/outputSplit → buildOutputGeneratorContext)
- Guard emptyDirPaths processing with includeEmptyDirectories check to
  skip unnecessary work when the feature is disabled
- Fix split output path which was not receiving emptyDirPaths despite
  the parameter being declared in produceOutput's signature
- Add tests for cache hit (searchFiles not called) and fallback paths

Local benchmark (repomix on itself, includeEmptyDirectories: true):
  main:   696.6ms ± 4.2ms
  branch: 637.1ms ± 2.6ms
  Improvement: ~60ms (~8.5%)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 00:03:49 +09:00
Claude ea29ac9ffa perf(core): Overlap security check, file processing, and metrics with output generation
Pipeline optimization that parallelizes independent stages to reduce end-to-end latency:

1. Run security check and file processing concurrently: Security check uses
   worker threads while file processing (in default config) runs on the main
   thread, so they don't compete for CPU. After both complete, suspicious files
   are filtered from the processed results using a Set for O(1) lookups.

2. Overlap output generation with metrics calculation: File metrics and git
   metrics don't depend on the generated output, so they start immediately
   via the worker pool while output generation runs on the main thread. Only
   output token counting waits for the output string, passed as a Promise.

Before (sequential):
  collectFiles → securityCheck(284ms) → processFiles(75ms) → produceOutput(185ms) → calculateMetrics(870ms)

After (overlapped):
  collectFiles → [securityCheck || processFiles] → [produceOutput || fileMetrics + gitMetrics] → outputMetrics

Benchmark results (repomix repo, 989 files, 10 runs each, back-to-back):

  Baseline:  avg 2187ms, median 2176ms, p90 2269ms
  Optimized: avg 2022ms, median 2018ms, p90 2070ms

  Average improvement: 165ms (7.5%)
  Median improvement:  158ms (7.3%)
  P90 improvement:     199ms (8.8%)

https://claude.ai/code/session_011Sfyivv65pcN4VhFjRnDbs
2026-03-31 22:39:06 +09:00
Claude 4d2bbcf6cc perf(core): Pre-initialize metrics worker pool to overlap tiktoken WASM loading
Pipeline-level optimizations that produce measurable end-to-end improvement:

- Pre-initialize metrics worker pool during file collection phase so tiktoken
  WASM loading overlaps with security checks and file processing. First token
  count task dropped from 381ms to 22ms (worker already warmed).
- Lazy-load Jiti via dynamic import — only loaded when TS/JS config files are
  detected, saving startup time for the common JSON/default config path.
- Fix O(n²) file path re-grouping in packager by using Map + Set for O(1)
  membership checks instead of .find() + .includes().
- Move binary extension check before fs.stat in fileRead to skip unnecessary
  stat syscalls for binary files.
- Parallelize split output file writes with Promise.all instead of sequential
  for-loop.

Benchmark (15 runs each, median ± IQR, packing repomix repo ~1000 files):

  main branch: 3515ms (P25: 3443, P75: 3581)
  perf branch: 3318ms (P25: 3215, P75: 3383)
  Improvement: -197ms (-5.6%)

Pipeline stage breakdown (instrumented):
  - Metrics first-file init: 381ms → 22ms (worker pre-warmed)
  - Total metrics stage: 793ms → ~450ms

All 1096 tests pass. Lint clean.

https://claude.ai/code/session_01JoNjFe7S2roMfHfNcw6bso
2026-03-28 01:15:43 +09:00
Kazuki Yamada f41b75c560 fix(test): fix lint errors and update test signatures for filePathsByRoot
- Remove unused imports (generateFileTree, treeToString) in fileTreeGenerate.test.ts
- Add filePathsByRoot parameter to generateOutput and produceOutput calls in tests
- Update expect assertions to include filePathsByRoot argument
2026-01-04 23:11:28 +09:00
Kazuki Yamada 5f9c22ef6d refactor(core): Extract output generation logic from packager
Move split/single output generation and writing logic to
packager/produceOutput.ts to keep packager.ts focused
on the high-level orchestration flow.

- Create produceOutput module handling both output modes
- Simplify packager.ts from 227 to 181 lines
- Update related tests to use new dependency structure
2025-12-21 21:56:39 +09:00
Dango233 3216450bd5 docs(cli): Document --split-output
Add the new split output flag to README and website docs, including examples and the config option.
2025-12-21 21:56:39 +09:00
Dango233 e51d77a7c6 feat(cli): Add --split-output option
Adds a size-based output splitter via --split-output (kb/mb) and writes numbered parts without splitting within a top-level folder.

Also updates metrics aggregation for multi-part output and adds unit tests.
2025-12-21 21:56:39 +09:00