Commit Graph

37 Commits

Author SHA1 Message Date
Kazuki Yamada fb281d4560 test(core): Wire createSecurityTaskRunner mock into smaller packager tests
Continuation of the perf(core) Pre-warm security worker pool change —
extends `mockDeps` / inline pack-test plumbing in the three smaller test
files so the default-scope path no longer attempts to spawn a real
worker pool from the test environment.

- tests/core/packager/diffsFunctionality.test.ts: adds
  `mockCreateSecurityTaskRunner` to both pack-call sites.
- tests/core/packager/splitOutput.test.ts: same — adds the stub to the
  inline mock deps.
- tests/core/security/validateFileSafety.test.ts: updates the
  `runSecurityCheck` call assertion to include the new
  `{ taskRunner: undefined }` deps argument forwarded by
  `validateFileSafety` when no pre-warmed runner is provided.

(See PR description / parent commit for the full perf change rationale,
benchmark numbers, and correctness notes.)
2026-05-09 02:30:26 +09:00
Claude 68a47b9149 perf(core): Skip redundant full-output tokenization via wrapper-extraction fast path (-13.2%)
When `tokenCountTree` is enabled `calculateSelectiveFileMetrics` already
tokenizes every file individually on the primary worker pool. The original
`calculateOutputMetrics` then re-tokenized the full output a second time, split
into 200 KB chunks, to compute `totalTokens`. On large repos with the tree
display enabled, this second pass was the single longest task in the
`calculateMetrics` `Promise.all`, consuming roughly 1 second of worker time
that duplicated work already done for the per-file counts.

This change introduces a fast path for the common case (xml / markdown / plain
output, non-parsable, single-part): walk the generated output with
`indexOf(file.content, cursor)` once per file to splice file contents out of
the output, tokenize only the remaining "wrapper" (template boilerplate +
directory tree + git diff/log + per-file headers), and compute
`totalTokens = Σ per-file tokens + wrapper tokens`.
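
The forward-cursor extraction described above could be sketched roughly as follows. This is a minimal illustration, not the repository's code: the `ProcessedFile` shape, the `sumTotalTokens` helper, and the sample template are assumptions, and it presumes file contents appear verbatim in the output in the given order.

```typescript
// Illustrative sketch only; the ProcessedFile shape is an assumption.
interface ProcessedFile {
  path: string;
  content: string;
}

// Splice each file's content out of the output with a single forward cursor.
// Returns null when a content string is not found at or after the cursor
// (escaped content, split output, or an order mismatch), signalling a fallback.
function extractOutputWrapper(output: string, filesInOutputOrder: ProcessedFile[]): string | null {
  const wrapperParts: string[] = [];
  let cursor = 0;
  for (const file of filesInOutputOrder) {
    const start = output.indexOf(file.content, cursor);
    if (start === -1) return null;
    wrapperParts.push(output.slice(cursor, start));
    cursor = start + file.content.length;
  }
  wrapperParts.push(output.slice(cursor));
  return wrapperParts.join("");
}

// Hypothetical helper: totalTokens = sum of per-file tokens + wrapper tokens.
function sumTotalTokens(perFileTokens: number[], wrapperTokens: number): number {
  return perFileTokens.reduce((sum, n) => sum + n, 0) + wrapperTokens;
}

const sampleFiles: ProcessedFile[] = [
  { path: "a.ts", content: "const a = 1;" },
  { path: "b.ts", content: "const b = 2;" },
];
const sampleOutput =
  '<file path="a.ts">const a = 1;</file>\n<file path="b.ts">const b = 2;</file>\n';
const sampleWrapper = extractOutputWrapper(sampleOutput, sampleFiles);
```

Passing the files in an order that does not match the output makes the second `indexOf` miss, which is why the forward walk depends on a shared sort order.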

The accuracy delta versus the old 200 KB-chunk approach is bounded by BPE
merges across file↔wrapper boundaries; on the repomix repository itself the
measured error was 309 / 1,284,067 tokens ≈ 0.024 %, comparable to the chunk
boundary error the existing approach already accepts.

## Implementation

- `src/core/metrics/calculateMetrics.ts`
  - Add `extractOutputWrapper(output, processedFilesInOutputOrder)` which
    walks the output with a single forward cursor. Returns `null` and
    triggers a fall back to `calculateOutputMetrics` if any file content is
    not found (e.g., template escaped it, output was split, order mismatch).
  - Add `canUseFastOutputTokenPath(config)` gate: only enabled when
    `tokenCountTree` is truthy, `splitOutput` is undefined, `parsableStyle`
    is false, and the style is `xml` / `markdown` / `plain`. JSON output
    and parsable XML go through `JSON.stringify` / `fast-xml-builder` which
    escape file contents, so `indexOf(content)` would miss them.
  - In `calculateMetrics`, when the fast path is available and wrapper
    extraction succeeds, replace `outputMetricsPromise` with a promise that
    awaits the already-running `selectiveFileMetricsPromise`, sums the
    per-file token counts, and dispatches a single `runTokenCount` on the
    extracted wrapper string. The rest of the `Promise.all` is unchanged.

- `src/core/packager.ts`
  - Call `sortOutputFiles(filteredProcessedFiles, config)` once in `pack`
    immediately after suspicious-file filtering and use its result as
    `processedFiles` downstream (for `produceOutput`, `calculateMetrics`,
    and the final result object). `generateOutput` internally calls
    `sortOutputFiles` as well, which is stable and memoized via
    `fileChangeCountsCache`, so the two now share the single git-log
    subprocess result and consumers see files in the exact order they
    appear in the output. This is a precondition for the fast path's
    forward-walk extraction.
  - Expose `sortOutputFiles` on `defaultDeps` so existing packager unit
    tests can inject their own implementation.

- `tests/core/packager/diffsFunctionality.test.ts`
  - Extend the `gitRepositoryHandle.js` `vi.mock` to also stub
    `isGitInstalled` and `getFileChangeCount`, since `sortOutputFiles`
    resolves its default dependencies from that module at module load time.

All 1102 existing tests pass unchanged; lint is clean.

## Benchmark

Interleaved 30-run benchmark against the repomix repo itself (1018 files,
~4 MB xml output, `tokenCountTree: 50000`, `sortByChanges: true`, `includeDiffs`
and `includeLogs` enabled via the repo's own `repomix.config.json`):

    base median: 2735.2 ms  [2389 - 3528]  IQR=367 ms
    opt  median: 2373.6 ms  [2125 - 2653]  IQR=293 ms
    delta:       -361.6 ms  (-13.22%)

Verbose trace before/after (single run, representative):

    before:
      Selective metrics calculation completed in 639 ms
      Output token count completed in      1046 ms
      Calculate Metrics wall:               1296 ms

    after:
      Selective metrics calculation completed in 579 ms
      Fast-path output tokens: files=1017293, wrapper=33678 (126996 chars)
      Calculate Metrics wall:                ~580 ms

The savings are concentrated in the `calculateMetrics` phase, which was the
dominant critical path in the final `Promise.all` for tokenCountTree runs on
large repos.
2026-04-12 17:47:03 +09:00
Kazuki Yamada cfbab618c5 refactor(metrics): Encapsulate warmup logic in createMetricsTaskRunner
Move worker thread warmup from packager into createMetricsTaskRunner,
which now returns both a taskRunner and warmupPromise. This keeps the
packager clean — it no longer needs to know warmup implementation details.

Also:
- Skip metrics worker pool creation on skill-generation path where
  it is unused
- Await warmupPromise in finally block before cleanup to prevent
  tearing down workers during initialization
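
A minimal sketch of the "await warmup before cleanup" ordering, under stated assumptions: the `MetricsTaskRunner` interface, the timer-based warmup, and `packSketch` are hypothetical stand-ins, and only the names `createMetricsTaskRunner`, `taskRunner`, and `warmupPromise` come from the commit message.

```typescript
// Illustrative sketch; function bodies are assumptions, not the real implementation.
interface MetricsTaskRunner {
  run(text: string): Promise<number>;
  cleanup(): Promise<void>;
}

function createMetricsTaskRunner(events: string[]): {
  taskRunner: MetricsTaskRunner;
  warmupPromise: Promise<void>;
} {
  // Stand-in for worker warmup, simulated here with a short timer.
  const warmupPromise = new Promise<void>((resolve) => {
    setTimeout(() => {
      events.push("warmup-done");
      resolve();
    }, 10);
  });
  const taskRunner: MetricsTaskRunner = {
    run: async (text) => text.length,
    cleanup: async () => {
      events.push("cleanup");
    },
  };
  return { taskRunner, warmupPromise };
}

async function packSketch(events: string[]): Promise<void> {
  const { taskRunner, warmupPromise } = createMetricsTaskRunner(events);
  try {
    // Simulate a failure that happens before warmup has finished.
    throw new Error("early failure");
  } finally {
    // Let warmup settle first so cleanup never races worker initialization.
    await warmupPromise.catch(() => undefined);
    await taskRunner.cleanup();
  }
}
```

Even when the body throws early, the recorded events arrive in the order warmup then cleanup.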

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 14:56:49 +09:00
Kazuki Yamada 96a6a7c804 perf(core): Cache empty directory paths to avoid redundant file search
When includeEmptyDirectories is enabled, buildOutputGeneratorContext
called searchFiles a second time just to obtain emptyDirPaths, despite
these already being computed during the initial file search in packager.

Changes:
- Capture emptyDirPaths from the initial searchFiles result in packager
  and thread them through the pipeline (packager → produceOutput →
  generateOutput/outputSplit → buildOutputGeneratorContext)
- Guard emptyDirPaths processing with includeEmptyDirectories check to
  skip unnecessary work when the feature is disabled
- Fix split output path which was not receiving emptyDirPaths despite
  the parameter being declared in produceOutput's signature
- Add tests for cache hit (searchFiles not called) and fallback paths

Local benchmark (repomix on itself, includeEmptyDirectories: true):
  main:   696.6ms ± 4.2ms
  branch: 637.1ms ± 2.6ms
  Improvement: ~60ms (~8.5%)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 00:03:49 +09:00
Claude ea29ac9ffa perf(core): Overlap security check, file processing, and metrics with output generation
Pipeline optimization that parallelizes independent stages to reduce end-to-end latency:

1. Run security check and file processing concurrently: Security check uses
   worker threads while file processing (in default config) runs on the main
   thread, so they don't compete for CPU. After both complete, suspicious files
   are filtered from the processed results using a Set for O(1) lookups.

2. Overlap output generation with metrics calculation: File metrics and git
   metrics don't depend on the generated output, so they start immediately
   via the worker pool while output generation runs on the main thread. Only
   output token counting waits for the output string, passed as a Promise.
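
The first overlap (security check alongside file processing, then Set-based filtering) might look roughly like this. It is a sketch under assumptions: `RawFile`, `runSecurityCheckSketch`, `processFilesSketch`, and `checkAndProcess` are hypothetical stand-ins for the real stages.

```typescript
// Illustrative sketch; the stage functions below are hypothetical stand-ins.
interface RawFile {
  path: string;
  content: string;
}

// Stand-in security check: returns the paths of files flagged as suspicious.
async function runSecurityCheckSketch(files: RawFile[]): Promise<string[]> {
  return files.filter((f) => f.content.includes("SECRET")).map((f) => f.path);
}

// Stand-in file processing: trims surrounding whitespace from each file.
async function processFilesSketch(files: RawFile[]): Promise<RawFile[]> {
  return files.map((f) => ({ path: f.path, content: f.content.trim() }));
}

async function checkAndProcess(files: RawFile[]): Promise<RawFile[]> {
  // Start both independent stages together and wait for both results.
  const [suspiciousPaths, processed] = await Promise.all([
    runSecurityCheckSketch(files),
    processFilesSketch(files),
  ]);
  // A Set gives O(1) membership checks while filtering processed results.
  const suspicious = new Set(suspiciousPaths);
  return processed.filter((f) => !suspicious.has(f.path));
}
```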

Before (sequential):
  collectFiles → securityCheck(284ms) → processFiles(75ms) → produceOutput(185ms) → calculateMetrics(870ms)

After (overlapped):
  collectFiles → [securityCheck || processFiles] → [produceOutput || fileMetrics + gitMetrics] → outputMetrics

Benchmark results (repomix repo, 989 files, 10 runs each, back-to-back):

  Baseline:  avg 2187ms, median 2176ms, p90 2269ms
  Optimized: avg 2022ms, median 2018ms, p90 2070ms

  Average improvement: 165ms (7.5%)
  Median improvement:  158ms (7.3%)
  P90 improvement:     199ms (8.8%)

https://claude.ai/code/session_011Sfyivv65pcN4VhFjRnDbs
2026-03-31 22:39:06 +09:00
Claude 4d2bbcf6cc perf(core): Pre-initialize metrics worker pool to overlap tiktoken WASM loading
Pipeline-level optimizations that produce measurable end-to-end improvement:

- Pre-initialize metrics worker pool during file collection phase so tiktoken
  WASM loading overlaps with security checks and file processing. First token
  count task dropped from 381ms to 22ms (worker already warmed).
- Lazy-load Jiti via dynamic import — only loaded when TS/JS config files are
  detected, saving startup time for the common JSON/default config path.
- Fix O(n²) file path re-grouping in packager by using Map + Set for O(1)
  membership checks instead of .find() + .includes().
- Move binary extension check before fs.stat in fileRead to skip unnecessary
  stat syscalls for binary files.
- Parallelize split output file writes with Promise.all instead of sequential
  for-loop.
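
As an illustration of the Map + Set regrouping idea, here is a minimal sketch. The `RootGroup` shape and the `regroupByRoot` function are assumptions made for this example and may not match the packager's actual data structures.

```typescript
// Illustrative sketch; RootGroup and regroupByRoot are hypothetical names.
interface RootGroup {
  rootDir: string;
  filePaths: string[];
}

// Rebuild per-root groups keeping only paths that survived filtering.
// A Set replaces repeated .includes() scans, and a Map keyed by root
// replaces repeated .find() scans over the group list.
function regroupByRoot(groups: RootGroup[], keptPaths: string[]): RootGroup[] {
  const kept = new Set(keptPaths);
  const byRoot = new Map<string, string[]>();
  for (const group of groups) {
    const list = byRoot.get(group.rootDir) ?? [];
    for (const filePath of group.filePaths) {
      if (kept.has(filePath)) list.push(filePath);
    }
    byRoot.set(group.rootDir, list);
  }
  // Map preserves insertion order, so roots come back in their original order.
  return Array.from(byRoot, ([rootDir, filePaths]) => ({ rootDir, filePaths }));
}
```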

Benchmark (15 runs each, median ± IQR, packing repomix repo ~1000 files):

  main branch: 3515ms (P25: 3443, P75: 3581)
  perf branch: 3318ms (P25: 3215, P75: 3383)
  Improvement: -197ms (-5.6%)

Pipeline stage breakdown (instrumented):
  - Metrics first-file init: 381ms → 22ms (worker pre-warmed)
  - Total metrics stage: 793ms → ~450ms

All 1096 tests pass. Lint clean.

https://claude.ai/code/session_01JoNjFe7S2roMfHfNcw6bso
2026-03-28 01:15:43 +09:00
Florian Lefebvre 7ec67b181e lint 2026-03-24 09:52:58 +01:00
Florian Lefebvre 4f487444cb perf: migrate to tinyclip from clipboardy 2026-03-24 09:52:18 +01:00
Kazuki Yamada f41b75c560 fix(test): fix lint errors and update test signatures for filePathsByRoot
- Remove unused imports (generateFileTree, treeToString) in fileTreeGenerate.test.ts
- Add filePathsByRoot parameter to generateOutput and produceOutput calls in tests
- Update expect assertions to include filePathsByRoot argument
2026-01-04 23:11:28 +09:00
Kazuki Yamada 375da204b1 refactor(core): Extract inner functions from generateSplitOutputParts
- Move makeChunkConfig and renderGroups to module level for better readability
- Add GenerateOutputFn type alias using typeof generateOutput
- Add comment explaining O(N²) complexity and why it's acceptable
- Fix test mock property names to match actual GitDiffResult/GitLogResult types
- Update integration tests to use produceOutput instead of individual functions
2025-12-21 21:56:39 +09:00
Kazuki Yamada e780bcab45 test(core): Add produceOutput unit tests
Cover both single and split output modes:
- Single output: generate, write, clipboard
- Split output: multiple file writes, no clipboard
- Git diff/log passthrough
- Progress callback invocations
2025-12-21 21:56:39 +09:00
Kazuki Yamada 5f9c22ef6d refactor(core): Extract output generation logic from packager
Move split/single output generation and writing logic to
packager/produceOutput.ts to keep packager.ts focused
on the high-level orchestration flow.

- Create produceOutput module handling both output modes
- Simplify packager.ts from 227 to 181 lines
- Update related tests to use new dependency structure
2025-12-21 21:56:39 +09:00
Dango233 3216450bd5 docs(cli): Document --split-output
Add the new split output flag to README and website docs, including examples and the config option.
2025-12-21 21:56:39 +09:00
Dango233 e51d77a7c6 feat(cli): Add --split-output option
Adds a size-based output splitter via --split-output (kb/mb) and writes numbered parts without splitting within a top-level folder.

Also updates metrics aggregation for multi-part output and adds unit tests.
2025-12-21 21:56:39 +09:00
Kazuki Yamada 8bf797114b feat(cli): Report files detected as binary by content inspection
Add new "Binary Files Detected" section to CLI output that shows files which were
skipped due to binary content detection (not extension-based). This addresses issue #752
where users were not informed about files being silently excluded.

Changes:
- Update fileRead.ts to return detailed skip reasons (binary-extension, binary-content, size-limit, encoding-error)
- Modify file collection pipeline to track and propagate skipped files
- Add reportSkippedFiles function to display binary-content detected files
- Show files with relative paths and helpful exclusion messages
- Only display section when binary-content files are found
- Add comprehensive test coverage for new functionality

The implementation follows existing security check reporting patterns and provides
users clear visibility into why files were excluded from output.
2025-08-23 14:35:54 +09:00
Kazuki Yamada 002bea3d61 refactor(core): Rename handleOutput to writeOutputToDisk for clarity
Update dependency injection parameter names to be more descriptive of the actual functionality.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-23 00:05:52 +09:00
Kazuki Yamada fecebc2ca6 refactor(core): Update GitDiffResult imports and restructure git handling modules 2025-05-24 14:14:26 +09:00
Kazuki Yamada b13a21aebd refactor(core): Migrate GitDiffResult and related functions to gitHandle module 2025-05-24 13:59:32 +09:00
Kazuki Yamada 0be489dcbb refactor(core): Update git command imports and restructure gitHandle module 2025-05-24 13:37:57 +09:00
Kazuki Yamada 9538395cdf refactor(core): Move Git-related modules to dedicated core/git directory 2025-05-19 14:53:28 +00:00
Devin AI 48ec00c63a - Replace hardcoded config objects with createMockConfig utility
- Add proper typing to mock objects and functions
- Remove unnecessary type casting
- Add GitDiffResult type to git diff objects

Co-Authored-By: Kazuki Yamada <koukun0120@gmail.com>
2025-05-13 22:55:50 +09:00
Kazuki Yamada 5617f9de64 test(cli): Remove stdout mode test for piped input 2025-05-10 16:33:09 +09:00
Kazuki Yamada 95f7092b94 feat(cli): add --stdout option for output to stdout instead of file, enhancing CLI flexibility
- Introduced `--stdout` flag to allow output to standard output, which cannot be used with the `--output` option.
- Updated CLI configuration to handle `stdout` mode.
- Enhanced documentation with examples for using `--stdout`.
- Added tests to ensure correct behavior when using `--stdout` in various scenarios.
2025-05-10 16:33:09 +09:00
Kazuki Yamada 265845a9c0 fix(gitDiff): Fix syntax and tests 2025-05-10 11:26:05 +09:00
Kazuki Yamada 464ccc582b feat(diff): Refactor CLI and output generation to support git diffs
- Updated CLI options to use `--include-diffs` instead of `--diffs`.
- Refactored `printSummary` to accept a `PackResult` object for better data handling.
- Introduced `getStagedDiff` function to retrieve staged changes from git.
- Created `getGitDiffs` function to encapsulate logic for fetching both worktree and staged diffs.
- Modified output generation functions to include git diffs in various formats (markdown, XML, plain text).
- Updated tests to reflect changes in CLI options and output generation logic, ensuring proper handling of git diffs.
- Removed deprecated `diffContent` from config schema and adjusted related logic.
2025-05-07 00:25:23 +09:00
pmdyy 412a94b9d9 fix(output): fix lints 2025-05-06 00:08:51 -06:00
pmdyy cdfa93c594 feat(output): Add git diff support with --diffs flag 2025-05-05 23:57:05 -06:00
Kazuki Yamada 53ae395280 test(cli): Fix logger mock path in clipboard test 2025-04-20 18:38:05 +09:00
Kazuki Yamada 25afab0829 test(core): Improve test coverage for copyToClipboardIfEnabled 2025-04-20 17:59:20 +09:00
Kazuki Yamada 6c9a149eb5 feat(pack): Simplify various processes 2025-01-25 13:55:19 +09:00
Kazuki Yamada e64c6044ac fix(lint): Fix lint errors 2024-12-27 23:45:17 +07:00
Mike Judge 33d9c14650 Fixes from linter 2024-12-24 17:42:55 -08:00
Mike Judge 9afa2aeaa9 Extract another function from the packager 2024-12-24 16:26:58 -08:00
Mike Judge f4797b81fd Pull out writeOutputToDisk into its own function for clarity 2024-12-24 16:19:23 -08:00
Mike Judge ce136f3397 Move validateFileSafety into the security folder 2024-12-24 13:58:40 -08:00
Mike Judge 7c3bcae08e Rename getSafeFiles to validateFileSafety 2024-12-24 13:29:10 -08:00
Mike Judge 0ee1fbc972 Extract getSafeFiles from Packager 2024-12-24 11:32:46 -08:00