repomix-mirror

mirror of https://github.com/yamadashy/repomix.git synced 2026-05-30 11:18:53 +02:00

Author	SHA1	Message	Date
Claude	fb4c895085	perf(core): Pre-warm security worker pool to overlap @secretlint/core load	2026-05-08 17:05:51 +00:00

The security worker pool currently spawns its 2 workers lazily inside `runSecurityCheck`, paying a ~50 ms `@secretlint/core` + `@secretlint/secretlint-rule-preset-recommend` module load on each freshly spawned worker (~100 ms wall-clock for both workers loading concurrently). That cold-start cost runs on the critical path inside the security-check phase, before any scanning begins.

Mirror the existing `createMetricsTaskRunner` pattern: hoist the pool construction to `pack()` and dispatch one no-op task per worker at the pipeline entry, so the module load overlaps with the collectFiles + git ops phase (~200 ms) instead of stalling the security check.

## Mechanism

- New `createSecurityTaskRunner(numOfTasks, deps?)` in `src/core/security/securityCheck.ts` returns `{ taskRunner, warmupPromise }`. The warm-up dispatches `maxThreads` no-op tasks (`{ items: [] }`); Tinypool spawns a fresh worker for each concurrent task, fanning out the @secretlint/core load across all workers in parallel.
- `runSecurityCheck` accepts an optional `taskRunner` in `deps`. When provided, the caller owns the pool's lifecycle (creation + cleanup); when omitted, runSecurityCheck creates and cleans up a fresh pool, preserving the existing behavior for direct callers (e.g. the MCP fileSystemReadFileTool path).
- `validateFileSafety` accepts and forwards an optional `taskRunner`.
- `pack()` calls `createSecurityTaskRunner` after `searchFiles` resolves (file count is now known) and before the parallel collectFiles + git ops block, so the warm-up runs concurrently with disk I/O. The task runner is plumbed through `validateFileSafety` deps; the pool is cleaned up alongside the metrics pool in the surrounding try/finally.

## Scope gate

Pre-warming is gated on the same `hasExplicitScope` heuristic that already differentiates 2- vs. 3-worker metrics warm-up:

| Workload                                         | Pre-warm? |
|--------------------------------------------------|-----------|
| Default scan (no `--include` / `--stdin`)        | yes       |
| `--include`, `config.include`, or `--stdin` set  | no        |

Without the gate, the small/scoped workload regresses by 3.4 % paired mean: the security check scans only ~5 batches and finishes in ~50–80 ms, so the up-front cost of constructing + destroying a second worker pool outweighs the saved cold-start. The unconstrained scan runs security over ~1000+ files where the hidden cold-start dominates.

## Benchmark: `node bin/repomix.cjs --quiet` (1046 files)

Two independent paired n=50 runs (interleaved BEFORE/AFTER alternating order, NODE_DISABLE_COMPILE_CACHE=1):

|        | min     | median  | mean    | max     | sd     |
|--------|---------|---------|---------|---------|--------|
| BEFORE | 1320 ms | 1454 ms | 1451 ms | 1590 ms | 49 ms  |
| AFTER  | 1318 ms | 1410 ms | 1416 ms | 1501 ms | 40 ms  |

- Mean paired Δ: +35.2 ms (2.42 % wall-clock reduction)
- Median paired Δ: +32.5 ms (2.23 %)
- Paired-delta SD: 64.78 ms · paired t = 3.84 (p < 0.001)
- AFTER faster in 39/50 pairs (78 %)

Confirmation run (same setup, n=50): mean Δ +37.0 ms (2.55 %), t = 3.93, 36/50 pairs faster.

## Regression check: `--include 'src,tests' --quiet` (258 files)

n=30 paired interleaved, NODE_DISABLE_COMPILE_CACHE=1:

|        | min    | median | mean   | max    |
|--------|--------|--------|--------|--------|
| BEFORE | 670 ms | 732 ms | 730 ms | 783 ms |
| AFTER  | 688 ms | 728 ms | 729 ms | 786 ms |

- Mean paired Δ: +0.9 ms (0.13 %), neutral within noise (paired t = 0.17)
- AFTER faster in 16/30 pairs

The gate falls back to the original lazy-spawn path on this workload, so AFTER == BEFORE up to noise. Without the gate this workload regresses by 3.4 % paired (t = -4.88).

## Correctness

- All 1260 unit tests pass (`npm test`); `npm run lint` clean (only the two pre-existing `biome-ignore` warnings unrelated to this change).
- XML output byte-identical between BEFORE and AFTER on both the default 1046-file workload and the `--include 'src,tests'` 258-file workload (verified via `diff` on full ~4.85 MB outputs).
- `runSecurityCheck`'s public signature gains an optional `taskRunner` in deps; when omitted, behavior is unchanged. Existing callers outside the pack pipeline (e.g. MCP `fileSystemReadFileTool`) still spawn their own pool.
- The MCP main-thread security path is unaffected: it uses `runSecretLint` directly (worker module loaded once at process start) and never goes through the pool.

## Tests

- `tests/core/security/validateFileSafety.test.ts`: assertion on the `runSecurityCheck` call updated to include the new `{ taskRunner }` deps argument (currently undefined when no pre-warmed runner is provided).
- `tests/core/packager.test.ts`, `tests/core/packager/diffsFunctionality.test.ts`, `tests/core/packager/splitOutput.test.ts`, `tests/integration-tests/packager.test.ts`: extended `mockDeps` / `baseDeps` with a stubbed `createSecurityTaskRunner` so the default scope path no longer attempts to spawn a real worker pool from the test environment. The pack-level assertion on `validateFileSafety` now matches the new 6th-argument deps object via `expect.objectContaining({ taskRunner: expect.any(Object) })`.

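
The pre-warm idea described in commit fb4c895085 (dispatch one no-op task per worker early so the module load overlaps with other work) can be illustrated with a minimal, self-contained sketch. It is not the repomix implementation: `FakeWorker`, `createWarmedRunner`, and the timer-based "module load" are hypothetical stand-ins, and neither Tinypool nor `@secretlint/core` is used. Only the shape of the returned `{ run, warmupPromise }` pair loosely follows the `{ taskRunner, warmupPromise }` pair named in the commit message.

```typescript
// Hypothetical stand-in for a pool worker that pays a one-time module-load cost.
// A timer simulates the load; no real worker threads are spawned.
class FakeWorker {
  loadCount = 0;
  private loaded: Promise<void> | null = null;
  private readonly loadMs: number;

  constructor(loadMs: number) {
    this.loadMs = loadMs;
  }

  // Start the simulated load at most once and share the resulting promise.
  private load(): Promise<void> {
    if (this.loaded === null) {
      this.loadCount += 1;
      this.loaded = new Promise<void>((resolve) => setTimeout(resolve, this.loadMs));
    }
    return this.loaded;
  }

  // Every task waits for the load, so an empty batch acts as a pure warm-up.
  async run(items: string[]): Promise<number> {
    await this.load();
    return items.length;
  }
}

interface WarmedRunner {
  workers: FakeWorker[];
  run: (items: string[]) => Promise<number>;
  warmupPromise: Promise<number[]>;
}

function createWarmedRunner(numWorkers: number, loadMs: number): WarmedRunner {
  const workers = Array.from({ length: numWorkers }, () => new FakeWorker(loadMs));
  // One no-op task per worker starts every load immediately, in parallel.
  const warmupPromise = Promise.all(workers.map((worker) => worker.run([])));
  let next = 0;
  // Simple round-robin dispatch (an assumption; real pools schedule differently).
  const run = (items: string[]): Promise<number> => workers[next++ % workers.length].run(items);
  return { workers, run, warmupPromise };
}

async function main(): Promise<void> {
  const runner = createWarmedRunner(2, 50);
  // Simulated file collection; the 50 ms loads run during this wait.
  await new Promise<void>((resolve) => setTimeout(resolve, 100));
  const start = Date.now();
  const scanned = await runner.run(["a.ts", "b.ts", "c.ts"]);
  console.log(`scanned ${scanned} items, waited ${Date.now() - start} ms after the overlap window`);
}

void main();
```

In this sketch the caller owns the runner and can choose whether to await `warmupPromise`; a lazily created pool would instead start its load inside the first real `run` call, which is the cold-start cost the commit moves off the critical path.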
Kazuki Yamada	f67731056a	test: Round-3 PR review feedback	2026-04-26 22:47:21 +09:00

- validateFileSafety: pin the negative path of `if (config.security.enableSecurityCheck)`; every other test enabled the check, so a regression that always runs the security check would have passed silently.
- unifiedWorker:
  - Add a positive workerData=securityCheck + ambiguous-task case so the pair (override + this) distinguishes "inference always wins" from "inference wins only when it yields a value".
  - Stop pretending the handler-cache test verifies caching. Both branches of `if (cached) return cached;` end with the same Map.set, and Node's own module cache makes the dynamic import effectively free, so the cache is unobservable from outside without exposing internals. Renamed to "repeated calls" with a comment explaining the limitation.
- fileSystemReadDirectoryTool: translate the pre-existing Japanese comment to English per CLAUDE.md.
- TokenCounter: extract `LoadEncodingFn` type alias instead of the unusual `typeof loadEncoding`, so a signature drift between the local function and the deps field would surface at the type level.

Kazuki Yamada	cbdfc29b4d	test: Cover error/edge paths in core (output, file, security, treeSitter)	2026-04-26 19:35:00 +09:00

Lift the four most impactful uncovered files past 90% lines without introducing fragile or contrived tests. Each block targets real user-facing branches (error handling, optional features, init/dispose).

- core/output/outputGenerate (78% -> ~90%):
  - buildOutputGeneratorContext: instructionFilePath success and missing-file paths; pre-computed vs. searchFiles fallback for empty directories; full-tree mode (success and listing failure); searchFiles failure wrap.
  - generateOutput: unsupported style throws RepomixError.
- core/security/validateFileSafety (79% -> ~95%):
  - logSuspiciousContentWarning loop: header line per section, plus singular ("issue") and plural ("issues") suffix per result.
  - No-op behavior when no suspicious git diff/log entries exist.
- core/file/fileSearch (88% -> ~92%):
  - handleGlobbyError: EPERM and EACCES translated to PermissionError; other error codes pass through.
  - Outer catch: generic Error wrapped with directory context; non-Error throw produces the generic fallback message.
- core/treeSitter/languageParser (74% -> ~88%):
  - getResources before init() throws RepomixError.
  - init() is idempotent (Parser.init is called only once across two calls).
  - Parser.init() failure is wrapped as RepomixError.
  - dispose() resets state so subsequent calls require re-init.

Coverage:
- Statements 89.51% -> 90.23%
- Branches 79.31% -> 80.26%
- Functions 89.37% -> 89.69%
- Lines 90.06% -> 90.80%

Kazuki Yamada	5b5ee862a0	feat(cli): Add --include-logs option for git commit history	2025-08-22 14:09:58 +09:00

This feature allows users to include git log information in the output to help AI understand development patterns and file change relationships.

Key changes:
- Added --include-logs and --include-logs-count CLI options
- Default to 50 commits, configurable via CLI and config file
- Includes commit date, message, and changed file paths (excludes commit hashes)
- Added security checks and metrics calculation for git logs
- Updated output templates to include git logs section
- Comprehensive test coverage and TypeScript fixes

Resolves user request for including git commit history to provide development context for AI analysis.

Kazuki Yamada	1e7a09c4c7	refactor(security): Enhance security check structure by introducing SecurityCheckType and updating file path handling	2025-05-10 16:12:00 +09:00
Kazuki Yamada	265845a9c0	fix(gitDiff): Fix syntax and tests	2025-05-10 11:26:05 +09:00
Kazuki Yamada	ebacdd967c	feat(pack): Simplify the process and make it testable with DI	2025-01-25 12:43:38 +09:00
Mike Judge	33d9c14650	Fixes from linter	2024-12-24 17:42:55 -08:00
Mike Judge	e57aea8940	Split up validateFileSafety into smaller functions that each do one thing	2024-12-24 16:11:10 -08:00
Mike Judge	ce136f3397	Move validateFileSafety into the security folder	2024-12-24 13:58:40 -08:00

10 Commits