repomix-mirror

mirror of https://github.com/yamadashy/repomix.git synced 2026-05-30 11:18:53 +02:00

Author	SHA1	Message	Date
Kazuki Yamada	fb281d4560	test(core): Wire createSecurityTaskRunner mock into smaller packager tests
Continuation of the perf(core) "Pre-warm security worker pool" change. Extends the `mockDeps` / inline pack-test plumbing in the three smaller test files so the default-scope path no longer attempts to spawn a real worker pool from the test environment.
- tests/core/packager/diffsFunctionality.test.ts: adds `mockCreateSecurityTaskRunner` to both pack-call sites.
- tests/core/packager/splitOutput.test.ts: same; adds the stub to the inline mock deps.
- tests/core/security/validateFileSafety.test.ts: updates the `runSecurityCheck` call assertion to include the new `{ taskRunner: undefined }` deps argument forwarded by `validateFileSafety` when no pre-warmed runner is provided.
(See the PR description / parent commit for the full perf change rationale, benchmark numbers, and correctness notes.)	2026-05-09 02:30:26 +09:00
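The last bullet of this commit says `validateFileSafety` now always forwards a deps object to `runSecurityCheck`, with `taskRunner` left `undefined` when no pre-warmed runner is supplied. The sketch below illustrates that forwarding shape under simplified assumptions. `SecurityTaskRunner`, `runSecurityCheckStub`, the `calls` recorder, and the exact signatures are hypothetical and are not repomix's real API.

```typescript
// Hypothetical shape of a pre-warmed security worker pool.
interface SecurityTaskRunner {
  run(path: string, content: string): Promise<boolean>;
}

interface SecurityDeps {
  taskRunner?: SecurityTaskRunner;
}

// Records every call so a test can assert on the forwarded deps argument,
// similar to what a vi.fn() mock does.
const calls: Array<{ files: string[]; deps: SecurityDeps }> = [];

async function runSecurityCheckStub(files: string[], deps: SecurityDeps): Promise<string[]> {
  calls.push({ files, deps });
  return []; // this sketch never reports suspicious files
}

// Hypothetical validateFileSafety: always passes a deps object through,
// even when the caller did not provide a pre-warmed runner.
async function validateFileSafety(
  files: string[],
  options: { taskRunner?: SecurityTaskRunner } = {},
  runSecurityCheck = runSecurityCheckStub,
): Promise<string[]> {
  return runSecurityCheck(files, { taskRunner: options.taskRunner });
}
```

Calling `validateFileSafety(['a.ts'])` records a second argument of `{ taskRunner: undefined }`, which is why the existing call assertion in the test had to be extended to include it.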
Claude	68a47b9149	perf(core): Skip redundant full-output tokenization via wrapper-extraction fast path (-13.2%)
When `tokenCountTree` is enabled, `calculateSelectiveFileMetrics` already tokenizes every file individually on the primary worker pool. The original `calculateOutputMetrics` then re-tokenized the full output a second time, split into 200 KB chunks, to compute `totalTokens`. On large repos with the tree display enabled, this second pass was the single longest task in the `calculateMetrics` `Promise.all`, consuming roughly 1 second of worker time that duplicated work already done for the per-file counts.
This change introduces a fast path for the common case (xml / markdown / plain output, non-parsable, single-part): walk the generated output with `indexOf(file.content, cursor)` once per file to splice file contents out of the output, tokenize only the remaining "wrapper" (template boilerplate + directory tree + git diff/log + per-file headers), and compute `totalTokens = Σ per-file tokens + wrapper tokens`. The accuracy delta versus the old 200 KB-chunk approach is bounded by BPE merges across file↔wrapper boundaries; on the repomix repository itself the measured error was 309 / 1,284,067 tokens ≈ 0.024%, comparable to the chunk-boundary error the existing approach already accepts.
## Implementation
- `src/core/metrics/calculateMetrics.ts`
  - Add `extractOutputWrapper(output, processedFilesInOutputOrder)`, which walks the output with a single forward cursor. It returns `null`, triggering a fallback to `calculateOutputMetrics`, if any file content is not found (e.g., the template escaped it, the output was split, or the order does not match).
  - Add a `canUseFastOutputTokenPath(config)` gate: only enabled when `tokenCountTree` is truthy, `splitOutput` is undefined, `parsableStyle` is false, and the style is `xml` / `markdown` / `plain`. JSON output and parsable XML go through `JSON.stringify` / `fast-xml-builder`, which escape file contents, so `indexOf(content)` would miss them.
  - In `calculateMetrics`, when the fast path is available and wrapper extraction succeeds, replace `outputMetricsPromise` with a promise that awaits the already-running `selectiveFileMetricsPromise`, sums the per-file token counts, and dispatches a single `runTokenCount` on the extracted wrapper string. The rest of the `Promise.all` is unchanged.
- `src/core/packager.ts`
  - Call `sortOutputFiles(filteredProcessedFiles, config)` once in `pack` immediately after suspicious-file filtering and use its result as `processedFiles` downstream (for `produceOutput`, `calculateMetrics`, and the final result object). `generateOutput` internally calls `sortOutputFiles` as well, which is stable and memoized via `fileChangeCountsCache`, so the two now share the single git-log subprocess result and consumers see files in the exact order they appear in the output. This is a precondition for the fast path's forward-walk extraction.
  - Expose `sortOutputFiles` on `defaultDeps` so existing packager unit tests can inject their own implementation.
- `tests/core/packager/diffsFunctionality.test.ts`
  - Extend the `gitRepositoryHandle.js` `vi.mock` to also stub `isGitInstalled` and `getFileChangeCount`, since `sortOutputFiles` resolves its default dependencies from that module at module load time.
All 1102 existing tests pass unchanged; lint is clean.
## Benchmark
Interleaved 30-run benchmark against the repomix repo itself (1018 files, ~4 MB xml output, `tokenCountTree: 50000`, `sortByChanges: true`, `includeDiffs` and `includeLogs` enabled via the repo's own `repomix.config.json`):
  base median: 2735.2 ms [2389 - 3528] IQR=367 ms
  opt median:  2373.6 ms [2125 - 2653] IQR=293 ms
  delta:       -361.6 ms (-13.22%)
Verbose trace before/after (single run, representative):
  before:
    Selective metrics calculation completed in 639 ms
    Output token count completed in 1046 ms
    Calculate Metrics wall: 1296 ms
  after:
    Selective metrics calculation completed in 579 ms
    Fast-path output tokens: files=1017293, wrapper=33678 (126996 chars)
    Calculate Metrics wall: ~580 ms
The savings are concentrated in the `calculateMetrics` phase, which was the dominant critical path in the final `Promise.all` for `tokenCountTree` runs on large repos.	2026-04-12 17:47:03 +09:00
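The wrapper-extraction fast path in this commit can be sketched as follows. This is a minimal illustration, not the code in `src/core/metrics/calculateMetrics.ts`: the `ProcessedFile` and `FastPathConfig` shapes are simplified assumptions, `totalTokensFastPath` is a hypothetical helper, and `countTokens` stands in for the worker-pool `runTokenCount` call.

```typescript
interface ProcessedFile {
  path: string;
  content: string;
}

// Simplified, hypothetical subset of the real config.
interface FastPathConfig {
  style: 'xml' | 'markdown' | 'plain' | 'json';
  parsableStyle: boolean;
  splitOutput?: number;
  tokenCountTree: boolean | number;
}

// Gate: only single-part, non-parsable styles that embed file contents verbatim qualify.
function canUseFastOutputTokenPath(config: FastPathConfig): boolean {
  return (
    Boolean(config.tokenCountTree) &&
    config.splitOutput === undefined &&
    !config.parsableStyle &&
    (config.style === 'xml' || config.style === 'markdown' || config.style === 'plain')
  );
}

// Walk the output once with a forward cursor, cutting each file's content out.
// Returns null if a file is not found at or after the cursor (escaped content,
// order mismatch, ...), which tells the caller to fall back to the slow path.
function extractOutputWrapper(output: string, filesInOutputOrder: ProcessedFile[]): string | null {
  const pieces: string[] = [];
  let cursor = 0;
  for (const file of filesInOutputOrder) {
    if (file.content.length === 0) continue; // an empty file has nothing to splice out
    const index = output.indexOf(file.content, cursor);
    if (index === -1) return null;
    pieces.push(output.slice(cursor, index)); // wrapper text before this file
    cursor = index + file.content.length;
  }
  pieces.push(output.slice(cursor)); // trailing wrapper text
  return pieces.join('');
}

// totalTokens = sum of per-file tokens + tokens of the wrapper.
function totalTokensFastPath(
  perFileTokens: number[],
  wrapper: string,
  countTokens: (text: string) => number,
): number {
  const fileSum = perFileTokens.reduce((sum, n) => sum + n, 0);
  return fileSum + countTokens(wrapper);
}
```

For `output = '<f>AAA</f><f>BB</f>'` and file contents `AAA`, `BB` (in that order), the extracted wrapper is `'<f></f><f></f>'`. Because `indexOf` returns the first match at or after the cursor, a file whose content also appears earlier in the wrapper (for example, inside the directory tree) could be matched at the wrong position; this sketch does not guard against that.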
Kazuki Yamada	cfbab618c5	refactor(metrics): Encapsulate warmup logic in createMetricsTaskRunner
Move worker thread warmup from packager into createMetricsTaskRunner, which now returns both a taskRunner and a warmupPromise. This keeps the packager clean: it no longer needs to know warmup implementation details.
Also:
- Skip metrics worker pool creation on the skill-generation path, where it is unused
- Await warmupPromise in the finally block before cleanup to prevent tearing down workers during initialization
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 14:56:49 +09:00
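A minimal sketch of the pattern this commit describes: a factory returns both the runner and a warm-up promise, and the caller awaits that promise in `finally` before cleanup. `createTaskRunnerSketch`, `packWithWarmRunner`, and the `init` / `work` / `log` parameters are hypothetical stand-ins; the real `createMetricsTaskRunner` wraps a worker-thread pool, which is simulated here with a timer.

```typescript
interface TaskRunner<T, R> {
  run(task: T): Promise<R>;
  cleanup(): Promise<void>;
}

// Hypothetical factory: starts warm-up immediately and hands back the promise,
// so callers never need to know how warm-up is implemented.
function createTaskRunnerSketch<T, R>(
  init: () => Promise<void>,
  work: (task: T) => R,
  log: string[],
): { taskRunner: TaskRunner<T, R>; warmupPromise: Promise<void> } {
  const warmupPromise = init();
  const taskRunner: TaskRunner<T, R> = {
    async run(task) {
      await warmupPromise; // tasks wait until initialization has finished
      return work(task);
    },
    async cleanup() {
      log.push('cleanup');
    },
  };
  return { taskRunner, warmupPromise };
}

async function packWithWarmRunner(tasks: string[], log: string[]): Promise<number[]> {
  const { taskRunner, warmupPromise } = createTaskRunnerSketch<string, number>(
    async () => {
      await new Promise<void>((resolve) => setTimeout(resolve, 10)); // simulated slow init
      log.push('warm');
    },
    (text) => text.length,
    log,
  );
  try {
    return await Promise.all(tasks.map((t) => taskRunner.run(t)));
  } finally {
    // Without this await, an empty task list would reach cleanup() while the
    // simulated warm-up is still running.
    await warmupPromise.catch(() => undefined);
    await taskRunner.cleanup();
  }
}
```

With an empty task list the log ends up as `warm`, then `cleanup`; dropping the `await` in `finally` would let cleanup run first.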
Claude	4d2bbcf6cc	perf(core): Pre-initialize metrics worker pool to overlap tiktoken WASM loading
Pipeline-level optimizations that produce measurable end-to-end improvement:
- Pre-initialize the metrics worker pool during the file collection phase so tiktoken WASM loading overlaps with security checks and file processing. The first token count task dropped from 381ms to 22ms (worker already warmed).
- Lazy-load Jiti via dynamic import; it is only loaded when TS/JS config files are detected, saving startup time for the common JSON/default config path.
- Fix O(n²) file path re-grouping in packager by using Map + Set for O(1) membership checks instead of .find() + .includes().
- Move the binary extension check before fs.stat in fileRead to skip unnecessary stat syscalls for binary files.
- Parallelize split output file writes with Promise.all instead of a sequential for-loop.
Benchmark (15 runs each, median ± IQR, packing repomix repo ~1000 files):
  main branch: 3515ms (P25: 3443, P75: 3581)
  perf branch: 3318ms (P25: 3215, P75: 3383)
  Improvement: -197ms (-5.6%)
Pipeline stage breakdown (instrumented):
- Metrics first-file init: 381ms → 22ms (worker pre-warmed)
- Total metrics stage: 793ms → ~450ms
All 1096 tests pass. Lint clean.
https://claude.ai/code/session_01JoNjFe7S2roMfHfNcw6bso	2026-03-28 01:15:43 +09:00
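The third bullet of this commit replaces linear `.find()` / `.includes()` scans with hashed `Map` / `Set` lookups. The sketch below contrasts the two approaches on a hypothetical regrouping step (filter each root directory's paths down to the ones that passed the security check, then look up the processed file for each). `RootGroup`, `RegroupedRoot`, `regroupSlow`, and `regroupFast` are illustrative names, not repomix's actual code.

```typescript
interface ProcessedFile {
  path: string;
  content: string;
}

interface RootGroup {
  rootDir: string;
  filePaths: string[];
}

interface RegroupedRoot {
  rootDir: string;
  files: ProcessedFile[];
}

// Quadratic version: .includes() and .find() rescan whole arrays for every path.
function regroupSlow(groups: RootGroup[], files: ProcessedFile[], safePaths: string[]): RegroupedRoot[] {
  return groups.map((group) => ({
    rootDir: group.rootDir,
    files: group.filePaths
      .filter((p) => safePaths.includes(p))
      .map((p) => files.find((f) => f.path === p))
      .filter((f): f is ProcessedFile => f !== undefined),
  }));
}

// Linear version: build the indexes once, then each lookup is an average O(1) hash probe.
function regroupFast(groups: RootGroup[], files: ProcessedFile[], safePaths: string[]): RegroupedRoot[] {
  const byPath = new Map<string, ProcessedFile>(files.map((f): [string, ProcessedFile] => [f.path, f]));
  const safe = new Set(safePaths);
  return groups.map((group) => ({
    rootDir: group.rootDir,
    files: group.filePaths
      .filter((p) => safe.has(p))
      .map((p) => byPath.get(p))
      .filter((f): f is ProcessedFile => f !== undefined),
  }));
}
```

Both functions return the same result; only the lookup cost differs, which matters once a repository has thousands of files.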
Kazuki Yamada	5f9c22ef6d	refactor(core): Extract output generation logic from packager
Move split/single output generation and writing logic to packager/produceOutput.ts to keep packager.ts focused on the high-level orchestration flow.
- Create produceOutput module handling both output modes
- Simplify packager.ts from 227 to 181 lines
- Update related tests to use new dependency structure	2025-12-21 21:56:39 +09:00
Kazuki Yamada	8bf797114b	feat(cli): Report files detected as binary by content inspection
Add new "Binary Files Detected" section to CLI output that shows files which were skipped due to binary content detection (not extension-based). This addresses issue #752 where users were not informed about files being silently excluded.
Changes:
- Update fileRead.ts to return detailed skip reasons (binary-extension, binary-content, size-limit, encoding-error)
- Modify file collection pipeline to track and propagate skipped files
- Add reportSkippedFiles function to display binary-content detected files
- Show files with relative paths and helpful exclusion messages
- Only display section when binary-content files are found
- Add comprehensive test coverage for new functionality
The implementation follows existing security check reporting patterns and provides users clear visibility into why files were excluded from output.	2025-08-23 14:35:54 +09:00
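A minimal sketch of how the four skip reasons listed in this commit could be modelled as a string-literal union, with a reporter that surfaces only content-detected binaries and omits the section when there are none. The reason strings come from the commit; `FileSkipReason`, `SkippedFileInfo`, `formatBinaryContentReport`, and the message wording are hypothetical and do not reproduce the real `reportSkippedFiles` output.

```typescript
type FileSkipReason = 'binary-extension' | 'binary-content' | 'size-limit' | 'encoding-error';

interface SkippedFileInfo {
  path: string; // relative path, as the CLI section displays it
  reason: FileSkipReason;
}

// Hypothetical reporter: keeps only 'binary-content' entries, as the commit
// describes, and returns no lines at all when none were found.
function formatBinaryContentReport(skipped: SkippedFileInfo[]): string[] {
  const binaryContent = skipped.filter((f) => f.reason === 'binary-content');
  if (binaryContent.length === 0) return []; // section is omitted entirely
  return [
    'Binary Files Detected:',
    ...binaryContent.map((f, i) => `${i + 1}. ${f.path}`),
    'These files were excluded because their content looks binary.', // placeholder wording
  ];
}
```

Using a closed union means the compiler flags any comparison against a misspelled reason string.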
Kazuki Yamada	002bea3d61	refactor(core): Rename handleOutput to writeOutputToDisk for clarity
Update dependency injection parameter names to be more descriptive of the actual functionality.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>	2025-07-23 00:05:52 +09:00
Kazuki Yamada	fecebc2ca6	refactor(core): Update GitDiffResult imports and restructure git handling modules	2025-05-24 14:14:26 +09:00
Kazuki Yamada	b13a21aebd	refactor(core): Migrate GitDiffResult and related functions to gitHandle module	2025-05-24 13:59:32 +09:00
Kazuki Yamada	0be489dcbb	refactor(core): Update git command imports and restructure gitHandle module	2025-05-24 13:37:57 +09:00
Kazuki Yamada	9538395cdf	refactor(core): Move Git-related modules to dedicated core/git directory	2025-05-19 14:53:28 +00:00
Devin AI	48ec00c63a	- Replace hardcoded config objects with createMockConfig utility
- Add proper typing to mock objects and functions
- Remove unnecessary type casting
- Add GitDiffResult type to git diff objects
Co-Authored-By: Kazuki Yamada <koukun0120@gmail.com>	2025-05-13 22:55:50 +09:00
Kazuki Yamada	265845a9c0	fix(gitDiff): Fix syntax and tests	2025-05-10 11:26:05 +09:00
Kazuki Yamada	464ccc582b	feat(diff): Refactor CLI and output generation to support git diffs
- Updated CLI options to use `--include-diffs` instead of `--diffs`.
- Refactored `printSummary` to accept a `PackResult` object for better data handling.
- Introduced `getStagedDiff` function to retrieve staged changes from git.
- Created `getGitDiffs` function to encapsulate logic for fetching both worktree and staged diffs.
- Modified output generation functions to include git diffs in various formats (markdown, XML, plain text).
- Updated tests to reflect changes in CLI options and output generation logic, ensuring proper handling of git diffs.
- Removed deprecated `diffContent` from config schema and adjusted related logic.	2025-05-07 00:25:23 +09:00
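This commit introduces `getGitDiffs` to fetch both worktree and staged diffs in one place. A minimal sketch of that idea, assuming the worktree diff comes from `git diff` and the staged diff from `git diff --cached` (both real git commands). `getGitDiffsSketch`, `GitRunner`, and the result field names are assumptions for illustration, not repomix's actual signatures. The git runner is injected so the function can be exercised without a repository.

```typescript
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';

const execFileAsync = promisify(execFile);

// Field names are assumed for this sketch.
interface GitDiffResultSketch {
  workTreeDiffContent: string;
  stagedDiffContent: string;
}

type GitRunner = (args: string[], cwd: string) => Promise<string>;

// Default runner: shells out to the real git binary and returns stdout.
const runGit: GitRunner = async (args, cwd) => {
  const { stdout } = await execFileAsync('git', args, { cwd, maxBuffer: 50 * 1024 * 1024 });
  return stdout;
};

// Fetch both diffs concurrently and return them together.
async function getGitDiffsSketch(cwd: string, run: GitRunner = runGit): Promise<GitDiffResultSketch> {
  const [workTreeDiffContent, stagedDiffContent] = await Promise.all([
    run(['diff', '--no-color'], cwd), // unstaged changes in the working tree
    run(['diff', '--cached', '--no-color'], cwd), // changes staged in the index
  ]);
  return { workTreeDiffContent, stagedDiffContent };
}
```

Injecting the runner mirrors the dependency-injection style used elsewhere in this history (for example the `deps` objects in the packager tests), keeping the diff logic testable without spawning git.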
pmdyy	412a94b9d9	fix(output): fix lints	2025-05-06 00:08:51 -06:00
pmdyy	cdfa93c594	feat(output): Add git diff support with --diffs flag	2025-05-05 23:57:05 -06:00

16 Commits