10 Commits

Author SHA1 Message Date
Claude fb4c895085 perf(core): Pre-warm security worker pool to overlap @secretlint/core load
The security worker pool currently spawns its 2 workers lazily inside
`runSecurityCheck`, paying a ~50 ms `@secretlint/core` +
`@secretlint/secretlint-rule-preset-recommend` module load on each
freshly spawned worker (~100 ms wall-clock for both workers loading
concurrently). That cold-start cost runs on the critical path inside
the security-check phase, before any scanning begins.

Mirror the existing `createMetricsTaskRunner` pattern: hoist the pool
construction to `pack()` and dispatch one no-op task per worker at the
pipeline entry, so the module load overlaps with the collectFiles + git
ops phase (~200 ms) instead of stalling the security check.

## Mechanism

- New `createSecurityTaskRunner(numOfTasks, deps?)` in
  `src/core/security/securityCheck.ts` returns
  `{ taskRunner, warmupPromise }`. The warm-up dispatches `maxThreads`
  no-op tasks (`{ items: [] }`); Tinypool spawns a fresh worker for
  each concurrent task, fanning out the `@secretlint/core` load across
  all workers in parallel.
- `runSecurityCheck` accepts an optional `taskRunner` in `deps`. When
  provided, the caller owns the pool's lifecycle (creation + cleanup);
  when omitted, `runSecurityCheck` creates and cleans up a fresh pool,
  preserving the existing behavior for direct callers (e.g. the MCP
  `fileSystemReadFileTool` path).
- `validateFileSafety` accepts and forwards an optional `taskRunner`.
- `pack()` calls `createSecurityTaskRunner` after `searchFiles` resolves
  (file count is now known) and before the parallel collectFiles + git
  ops block, so the warm-up runs concurrently with disk I/O. The
  task runner is plumbed through `validateFileSafety` deps; the pool
  is cleaned up alongside the metrics pool in the surrounding
  try/finally.
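A minimal TypeScript sketch of the warm-up and its use in `pack()`, under stated assumptions: the `TaskRunner` shape, its `maxThreads` field, `initTaskRunner`, and `packSketch` are hypothetical stand-ins rather than repomix's actual signatures, and ignoring warm-up failures is a choice made only for this sketch. It shows the ordering the bullets describe: create the runner once the file count is known, let the empty tasks run while I/O is in flight, reuse the pool for the real check, and clean up in `finally`.

```typescript
// Hypothetical shapes; the real types live in src/core/security/securityCheck.ts.
interface SecurityCheckTask {
  items: unknown[];
}

interface TaskRunner<T, R> {
  maxThreads: number; // hypothetical: upper bound on workers the pool may spawn
  run(task: T): Promise<R>;
  cleanup(): Promise<void>;
}

type InitTaskRunner = (numOfTasks: number) => TaskRunner<SecurityCheckTask, unknown[]>;

// One empty task per worker, dispatched concurrently, so each worker pays its
// module-load cost now rather than inside the security-check phase.
function createSecurityTaskRunner(numOfTasks: number, deps: { initTaskRunner: InitTaskRunner }) {
  const taskRunner = deps.initTaskRunner(numOfTasks);
  const warmupPromise = Promise.all(
    Array.from({ length: taskRunner.maxThreads }, () => taskRunner.run({ items: [] })),
  ).then(
    () => undefined,
    () => undefined, // assumption: warm-up failures are ignored; real tasks will surface them
  );
  return { taskRunner, warmupPromise };
}

// Rough shape of the pack() side: create after searchFiles, overlap with I/O, clean up in finally.
async function packSketch(deps: {
  searchFiles: () => Promise<string[]>;
  collectFiles: (paths: string[]) => Promise<string[]>;
  getGitInfo: () => Promise<string>;
  initTaskRunner: InitTaskRunner;
}): Promise<unknown[]> {
  const paths = await deps.searchFiles(); // file count is known from here on
  const { taskRunner, warmupPromise } = createSecurityTaskRunner(paths.length, deps);
  try {
    // Warm-up tasks are already in flight while disk I/O and git ops run.
    const [contents] = await Promise.all([deps.collectFiles(paths), deps.getGitInfo()]);
    // The security check reuses the pre-warmed pool instead of spawning its own.
    return await taskRunner.run({ items: contents });
  } finally {
    await warmupPromise;
    await taskRunner.cleanup();
  }
}
```

Because the empty tasks are dispatched before `Promise.all` starts the I/O, a real worker pool would load its modules during that window rather than when the first real batch arrives.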

## Scope gate

Pre-warming is gated on the same `hasExplicitScope` heuristic that
already differentiates 2- vs. 3-worker metrics warm-up:

| Workload                                         | Pre-warm? |
|--------------------------------------------------|-----------|
| Default scan (no `--include` / `--stdin`)        | yes       |
| `--include`, `config.include`, or `--stdin` set  | no        |

Without the gate, the small/scoped workload regresses by 3.4 % paired
mean: the security check scans only ~5 batches and finishes in ~50–80
ms, so the up-front cost of constructing and destroying a second worker
pool outweighs the saved cold start. The unscoped scan runs the
security check over 1000+ files, where the hidden cold start outweighs
that pool overhead.
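The gate can be pictured as a single predicate. This sketch flattens the three signals from the table into a hypothetical `ScopeSignals` object; the project's actual `hasExplicitScope` reads them from its own CLI options and config, and treating an empty `config.include` as unscoped is an assumption here.

```typescript
// Hypothetical flattening of the three signals in the table above.
interface ScopeSignals {
  cliInclude?: string; // value passed to --include
  configInclude?: string[]; // config.include
  stdin?: boolean; // --stdin
}

function hasExplicitScope(s: ScopeSignals): boolean {
  return Boolean(s.cliInclude) || (s.configInclude?.length ?? 0) > 0 || s.stdin === true;
}

// Pre-warm only for the default, unscoped scan; scoped runs keep the lazy path.
function shouldPrewarmSecurityPool(s: ScopeSignals): boolean {
  return !hasExplicitScope(s);
}
```

In `pack()`, a `false` result would mean skipping `createSecurityTaskRunner` and passing `taskRunner: undefined`, which sends `runSecurityCheck` down its original lazy-spawn path.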

## Benchmark — `node bin/repomix.cjs --quiet` (1046 files)

Two independent paired n=50 runs (BEFORE/AFTER interleaved in
alternating order, NODE_DISABLE_COMPILE_CACHE=1); first run:

|        | min     | median  | mean    | max     | sd     |
|--------|---------|---------|---------|---------|--------|
| BEFORE | 1320 ms | 1454 ms | 1451 ms | 1590 ms | 49 ms  |
| AFTER  | 1318 ms | 1410 ms | 1416 ms | 1501 ms | 40 ms  |

- Mean paired Δ:   **+35.2 ms (2.42 % wall-clock reduction)**
- Median paired Δ: +32.5 ms (2.23 %)
- Paired-delta SD: 64.78 ms · paired t = **3.84** (p < 0.001)
- AFTER faster in **39/50** pairs (78 %)

Confirmation run (same setup, n=50): mean Δ +37.0 ms (2.55 %),
t = 3.93, 36/50 pairs faster.
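The paired statistics above follow the usual paired t-test formula, t = mean(Δ) / (SD(Δ) / √n). The helper below is an illustrative sketch: the `pairedStats` name and the sign convention Δ = BEFORE - AFTER are assumptions chosen to match the positive deltas reported here, and the last line rechecks the reported t from its summary numbers.

```typescript
// Paired summary for interleaved BEFORE/AFTER timings (ms).
// delta_i = before_i - after_i, so a positive delta means AFTER was faster.
function pairedStats(before: number[], after: number[]) {
  if (before.length !== after.length || before.length < 2) {
    throw new Error("need two equal-length samples with n >= 2");
  }
  const n = before.length;
  const deltas = before.map((b, i) => b - after[i]);
  const mean = deltas.reduce((acc, d) => acc + d, 0) / n;
  // Sample standard deviation (n - 1 denominator) of the paired deltas.
  const sd = Math.sqrt(deltas.reduce((acc, d) => acc + (d - mean) ** 2, 0) / (n - 1));
  const t = mean / (sd / Math.sqrt(n));
  const afterFaster = deltas.filter((d) => d > 0).length;
  return { n, mean, sd, t, afterFaster };
}

// Recompute the first run's t from its reported summary numbers.
const tFromSummary = 35.2 / (64.78 / Math.sqrt(50)); // ≈ 3.84
```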

## Regression check — `--include 'src,tests' --quiet` (258 files)

n=30 paired interleaved, NODE_DISABLE_COMPILE_CACHE=1:

|        | min    | median | mean   | max    |
|--------|--------|--------|--------|--------|
| BEFORE | 670 ms | 732 ms | 730 ms | 783 ms |
| AFTER  | 688 ms | 728 ms | 729 ms | 786 ms |

- Mean paired Δ:   +0.9 ms (0.13 %) — **neutral within noise**
  (paired t = 0.17)
- AFTER faster in 16/30 pairs

The gate falls back to the original lazy-spawn path on this workload,
so AFTER == BEFORE up to noise. Without the gate this workload
regresses by 3.4 % paired (t = -4.88).

## Correctness

- All **1260** unit tests pass (`npm test`); `npm run lint` clean
  (only the two pre-existing `biome-ignore` warnings unrelated to
  this change).
- XML output **byte-identical** between BEFORE and AFTER on both the
  default 1046-file workload and the `--include 'src,tests'`
  258-file workload (verified via `diff` on full ~4.85 MB outputs).
- `runSecurityCheck`'s public signature gains an optional `taskRunner`
  in deps; when omitted, behavior is unchanged. Existing callers
  outside the pack pipeline (e.g. MCP `fileSystemReadFileTool`) still
  spawn their own pool.
- The MCP main-thread security path is unaffected — it uses
  `runSecretLint` directly (worker module loaded once at process
  start) and never goes through the pool.
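The ownership rule for the optional `taskRunner` reduces to a `try`/`finally` that only cleans up a pool it created itself. This is a sketch under assumptions: `SecurityPool`, `initTaskRunner`, and `runSecurityCheckSketch` are hypothetical names, and the real function's batching and result handling are omitted.

```typescript
// Hypothetical minimal pool interface; the real one is Tinypool-backed.
interface SecurityPool {
  run(task: { items: string[] }): Promise<string[]>; // assumption: returns flagged paths
  cleanup(): Promise<void>;
}

// Sketch of the ownership rule only; batching and result parsing are omitted.
async function runSecurityCheckSketch(
  items: string[],
  deps: { taskRunner?: SecurityPool; initTaskRunner: (numOfTasks: number) => SecurityPool },
): Promise<string[]> {
  // A caller-supplied pool belongs to the caller: use it, never clean it up here.
  const ownsPool = deps.taskRunner === undefined;
  const taskRunner = deps.taskRunner ?? deps.initTaskRunner(items.length);
  try {
    return await taskRunner.run({ items });
  } finally {
    if (ownsPool) await taskRunner.cleanup();
  }
}
```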

## Tests

- `tests/core/security/validateFileSafety.test.ts` — assertion on the
  `runSecurityCheck` call updated to include the new `{ taskRunner }`
  deps argument (currently undefined when no pre-warmed runner is
  provided).
- `tests/core/packager.test.ts`,
  `tests/core/packager/diffsFunctionality.test.ts`,
  `tests/core/packager/splitOutput.test.ts`,
  `tests/integration-tests/packager.test.ts` — extended `mockDeps` /
  `baseDeps` with a stubbed `createSecurityTaskRunner` so the default
  scope path no longer attempts to spawn a real worker pool from the
  test environment. The pack-level assertion on `validateFileSafety`
  now matches the new 6th-argument deps object via
  `expect.objectContaining({ taskRunner: expect.any(Object) })`.
2026-05-08 17:05:51 +00:00
Kazuki Yamada f67731056a test: Round-3 PR review feedback
- validateFileSafety: pin the negative path of `if (config.security.enableSecurityCheck)`
  — every other test enabled the check, so a regression that always runs
  the security check would have passed silently.
- unifiedWorker:
  - Add a positive workerData=securityCheck + ambiguous-task case so the
    pair (override + this) distinguishes "inference always wins" from
    "inference wins only when it yields a value".
  - Stop pretending the handler-cache test verifies caching. Both branches
    of `if (cached) return cached;` end with the same Map.set, and Node's
    own module cache makes the dynamic import effectively free, so the
    cache is unobservable from outside without exposing internals.
    Renamed to "repeated calls" with a comment explaining the limitation.
- fileSystemReadDirectoryTool: translate the pre-existing Japanese comment
  to English per CLAUDE.md.
- TokenCounter: extract `LoadEncodingFn` type alias instead of the
  unusual `typeof loadEncoding`, so a signature drift between the local
  function and the deps field would surface at the type level.
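A framework-free sketch of the negative-path test for `if (config.security.enableSecurityCheck)`. `validateFileSafetySketch` and its argument shapes are hypothetical simplifications, not the real `validateFileSafety` signature. The stub deliberately flags a file, so a regression that always runs the check would fail on both the call count and the filtered result.

```typescript
// Hypothetical, heavily simplified shapes; the real validateFileSafety takes more arguments.
interface Config {
  security: { enableSecurityCheck: boolean };
}
interface RawFile {
  path: string;
  content: string;
}

async function validateFileSafetySketch(
  files: RawFile[],
  config: Config,
  deps: { runSecurityCheck: (files: RawFile[]) => Promise<string[]> },
): Promise<{ safeFiles: RawFile[]; suspiciousPaths: string[] }> {
  let suspiciousPaths: string[] = [];
  if (config.security.enableSecurityCheck) {
    suspiciousPaths = await deps.runSecurityCheck(files);
  }
  const flagged = new Set(suspiciousPaths);
  return { safeFiles: files.filter((f) => !flagged.has(f.path)), suspiciousPaths };
}

// Negative path: with the check disabled, the scanner must never run.
async function testSecurityCheckDisabled(): Promise<void> {
  let calls = 0;
  const files = [{ path: "a.ts", content: "x" }];
  const result = await validateFileSafetySketch(
    files,
    { security: { enableSecurityCheck: false } },
    {
      runSecurityCheck: async () => {
        calls++;
        return ["a.ts"]; // would filter a.ts if it were (wrongly) called
      },
    },
  );
  if (calls !== 0) throw new Error("runSecurityCheck ran although the check was disabled");
  if (result.safeFiles.length !== 1) throw new Error("all files should pass through unfiltered");
}
```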
2026-04-26 22:47:21 +09:00
Kazuki Yamada cbdfc29b4d test: Cover error/edge paths in core (output, file, security, treeSitter)
Raise line coverage of the four most impactful under-covered files to
roughly 90% (about 88% to 95%) without introducing fragile or contrived
tests. Each block targets real user-facing branches (error handling,
optional features, init/dispose).

- core/output/outputGenerate (78% -> ~90%):
  - buildOutputGeneratorContext: instructionFilePath success and missing-file
    paths; pre-computed vs. searchFiles fallback for empty directories;
    full-tree mode (success and listing failure); searchFiles failure wrap.
  - generateOutput: unsupported style throws RepomixError.

- core/security/validateFileSafety (79% -> ~95%):
  - logSuspiciousContentWarning loop: header line per section, plus
    singular ("issue") and plural ("issues") suffix per result.
  - No-op behavior when no suspicious git diff/log entries exist.

- core/file/fileSearch (88% -> ~92%):
  - handleGlobbyError: EPERM and EACCES translated to PermissionError;
    other error codes pass through.
  - Outer catch: generic Error wrapped with directory context;
    non-Error throw produces the generic fallback message.

- core/treeSitter/languageParser (74% -> ~88%):
  - getResources before init() throws RepomixError.
  - init() is idempotent (Parser.init is called only once across two calls).
  - Parser.init() failure is wrapped as RepomixError.
  - dispose() resets state so subsequent calls require re-init.
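The `handleGlobbyError` branches listed above amount to a small error-translation function. This sketch assumes a simplified `PermissionError` class, a hypothetical message format, and a return-rather-than-throw design; only the EPERM/EACCES translation and the pass-through of other errors come from the description.

```typescript
// Hypothetical stand-in for the project's PermissionError class.
class PermissionError extends Error {
  readonly path: string;
  readonly code: string;
  constructor(message: string, path: string, code: string) {
    super(message);
    this.name = "PermissionError";
    this.path = path;
    this.code = code;
  }
}

// Translate permission failures from the glob walk into PermissionError;
// anything else is returned unchanged for the caller to rethrow or wrap.
function handleGlobbyError(error: unknown, rootDir: string): unknown {
  const code =
    typeof error === "object" && error !== null && "code" in error
      ? (error as { code?: unknown }).code
      : undefined;
  if (code === "EPERM" || code === "EACCES") {
    return new PermissionError(`Permission denied while scanning directory: ${rootDir}`, rootDir, code);
  }
  return error;
}
```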

Coverage:
- Statements 89.51% -> 90.23%
- Branches   79.31% -> 80.26%
- Functions  89.37% -> 89.69%
- Lines      90.06% -> 90.80%
2026-04-26 19:35:00 +09:00
Kazuki Yamada 5b5ee862a0 feat(cli): Add --include-logs option for git commit history
This feature allows users to include git log information in the output to help AI understand development patterns and file change relationships.

Key changes:
- Added --include-logs and --include-logs-count CLI options
- Default to 50 commits, configurable via CLI and config file
- Includes commit date, message, and changed file paths (excludes commit hashes)
- Added security checks and metrics calculation for git logs
- Updated output templates to include git logs section
- Comprehensive test coverage and TypeScript fixes

Resolves a user request to include git commit history as development context for AI analysis.
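A sketch of how the enable flag and the 50-commit default might be resolved. The `LogOptions` field names and the CLI-over-config precedence are assumptions for illustration, not repomix's actual config keys; `GitLogCommit` mirrors the listed contents of each entry (date, message, changed paths, no hash).

```typescript
// Hypothetical option shapes; the real CLI/config field names may differ.
interface LogOptions {
  cliIncludeLogs?: boolean; // --include-logs
  cliIncludeLogsCount?: number; // --include-logs-count
  configIncludeLogs?: boolean; // from the config file
  configIncludeLogsCount?: number; // from the config file
}

// Each entry carries date, message, and changed paths; deliberately no commit hash.
interface GitLogCommit {
  date: string;
  message: string;
  files: string[];
}

const DEFAULT_LOG_COUNT = 50;

// Assumed precedence: CLI flag, then config file, then the 50-commit default.
function resolveGitLogSettings(o: LogOptions): { enabled: boolean; count: number } {
  const enabled = o.cliIncludeLogs ?? o.configIncludeLogs ?? false;
  const count = o.cliIncludeLogsCount ?? o.configIncludeLogsCount ?? DEFAULT_LOG_COUNT;
  return { enabled, count };
}
```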
2025-08-22 14:09:58 +09:00
Kazuki Yamada 1e7a09c4c7 refactor(security): Enhance security check structure by introducing SecurityCheckType and updating file path handling 2025-05-10 16:12:00 +09:00
Kazuki Yamada 265845a9c0 fix(gitDiff): Fix syntax and tests 2025-05-10 11:26:05 +09:00
Kazuki Yamada ebacdd967c feat(pack): Simplify the process and make it testable with DI 2025-01-25 12:43:38 +09:00
Mike Judge 33d9c14650 Fixes from linter 2024-12-24 17:42:55 -08:00
Mike Judge e57aea8940 Split up validateFileSafety into smaller functions that each do one thing 2024-12-24 16:11:10 -08:00
Mike Judge ce136f3397 Move validateFileSafety into the security folder 2024-12-24 13:58:40 -08:00