Commit Graph

377 Commits

Author SHA1 Message Date
Kazuki Yamada c8560d1bc6 perf(core): Reduce metrics warmup to 1 worker on warm-cache path
Lower `EAGER_WARMUP_THREADS` from 2 to 1 when `tokenCountCacheFileExists()`
returns true. With the persistent token-count disk cache populated by a
prior run, `calculateFileMetrics` serves every per-file token count from
the in-memory map and dispatches zero worker tasks. The only worker work
that survives caching on a warm rerun is a small fixed set of dispatches:

  - the wrapper-token tokenization (cache hit after run #2)
  - git diff staged/worktree token counts (only when
    `output.git.includeDiffs` is enabled)
  - git log token count (only when `output.git.includeLogs` is enabled)

That worst case is 2-3 short tasks (a few KB each) that fit a single warm
worker serially in well under 30 ms. Spawning a second warm worker means
a redundant ~340 ms BPE table parse that contends with the file-collection
main thread for CPU and extends the final `pool.destroy()` blocking wait
(BPE-loaded workers take ~21 ms to terminate vs ~3 ms when idle).

Cold-cache (no cache file) behavior is preserved: the unscoped path keeps
3 warm workers and the explicit-scope path keeps 2, so the actual file
tokenizations still parallelise across the original worker counts.

The probe is a coarse heuristic — a cache file written by a previous run
that used a different `tokenCount.encoding` (e.g. cl100k_base instead of
the default o200k_base) yields no hits for the current run, so the metrics
phase pays one BPE parse sequentially on the critical path before
tokenizing files. This is a one-time cost on encoding switches; subsequent
runs rebuild the cache for the new encoding and hit again.
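
The sizing rule described above can be sketched roughly as follows. This is a minimal illustration, not the actual `packager.ts` code: the function name, the input shape, and the `cpuCount` cap are assumptions made for the example.

```typescript
// Hypothetical sketch of the warm-up sizing; names are illustrative only.
interface WarmupInputs {
  cacheFileExists: boolean; // result of a tokenCountCacheFileExists()-style probe
  hasExplicitScope: boolean; // --include / config.include / --stdin was used
  cpuCount: number;
}

function chooseEagerWarmupThreads(inputs: WarmupInputs): number {
  // Warm cache: only a few short tasks survive caching, so 1 worker is enough.
  // Cold cache: keep 3 workers for unscoped runs and 2 for explicit scope.
  const target = inputs.cacheFileExists ? 1 : inputs.hasExplicitScope ? 2 : 3;
  // Assumed cap: never pre-warm more workers than there are CPUs.
  return Math.max(1, Math.min(inputs.cpuCount, target));
}
```

Because the probe only checks that a cache file exists, a stale cache written with a different encoding still selects the 1-worker branch, which is the coarse-heuristic caveat noted above.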

Benchmark (paired, n=25, repomix self-pack on 1068 files):

  WARM CACHE (cache file present)
    BASELINE  mean=968.9ms  median=976.0ms  sd=40.3ms
    AFTER     mean=883.2ms  median=875.0ms  sd=33.1ms
    DELTA     mean=85.6 ms (8.84%)  median=87.0 ms  sd=42.7
              t=10.02 (df=24)  faster=24/25

  COLD CACHE (cache file deleted before each run, n=12)
    BASELINE  mean=1606.3ms  median=1588.0ms  sd=58.6ms
    AFTER     mean=1593.2ms  median=1598.5ms  sd=58.6ms
    DELTA     mean=13.2 ms (0.82%)  t=0.62  faster=9/12  — within noise

Stacks on top of the existing warm-cache wins on this branch (token-count
disk cache, output-wrapper cache, prefetched template, native ignore-file
prescan, etc.); this single change pushes warm-cache wall-clock another
~86 ms below the previous floor.
2026-05-11 00:13:37 +09:00
Kazuki Yamada ca44a74f3f perf(core): Wire warm-cache heuristic into packager and tests
Companion to the previous commit. Plumb `tokenCountCacheFileExists` into
the packager `defaultDeps` so the metrics warm-up sizing can be exercised
deterministically from tests, and add a paired test that asserts the
2-warm-up-worker branch is taken when the persistent disk cache exists.

Also rename the cold-cache test to make the new gating explicit and refresh
its docstring with the warm/cold distinction.

https://claude.ai/code/session_01TJqKkJ8n3r6Pa2JdW9Vp2w
2026-05-10 15:59:43 +09:00
autofix-ci[bot] 371927e211 [autofix.ci] apply automated fixes 2026-05-09 01:07:48 +00:00
Kazuki Yamada 1d2df3bea3 test(search): Update fileSearch tests for prescan-based ignore collection
Add fs.readdir mock (returning empty array) to all relevant beforeEach
blocks so collectIgnoreFilePatterns does not fail with "entries is not
iterable". Update globby option assertions to reflect gitignore: false
and ignoreFiles: [] now that patterns are pre-collected by the prescan.

https://claude.ai/code/session_01Fm25x51fmGGeFMJyCm1CER
2026-05-09 10:06:46 +09:00
Kazuki Yamada 03b8e70b9c test(core): Wire createSecurityTaskRunner mock into remaining packager tests
Final commit of the perf(core) Pre-warm security worker pool change —
extends the unit packager test and the integration packager test:

- tests/core/packager.test.ts: adds `createSecurityTaskRunner` mock to
  the orchestration test's `mockDeps` and to the `parallel error
  handling` `baseDeps()` shared fixture, updates the
  `validateFileSafety.toHaveBeenCalledWith` assertion to expect the new
  6th-argument deps object (`{ taskRunner: <Object> }`), and adds
  positive/negative gate assertions —
  `expect(deps.createSecurityTaskRunner).toHaveBeenCalled()` for the
  default unscoped path, `.not.toHaveBeenCalled()` for the
  `--include 'src'` and `explicitFiles` (--stdin) paths.

- tests/integration-tests/packager.test.ts: adds the
  `createSecurityTaskRunner` stub so the default-scope path no longer
  attempts to spawn a real worker pool (the previous unhandled-rejection
  noise from a missing worker file URL is gone with this change).

(See PR description / first source commit for the full perf change
rationale, benchmark numbers, and correctness notes.)
2026-05-09 02:35:18 +09:00
Kazuki Yamada fb281d4560 test(core): Wire createSecurityTaskRunner mock into smaller packager tests
Continuation of the perf(core) Pre-warm security worker pool change —
extends `mockDeps` / inline pack-test plumbing in the three smaller test
files so the default-scope path no longer attempts to spawn a real
worker pool from the test environment.

- tests/core/packager/diffsFunctionality.test.ts: adds
  `mockCreateSecurityTaskRunner` to both pack-call sites.
- tests/core/packager/splitOutput.test.ts: same — adds the stub to the
  inline mock deps.
- tests/core/security/validateFileSafety.test.ts: updates the
  `runSecurityCheck` call assertion to include the new
  `{ taskRunner: undefined }` deps argument forwarded by
  `validateFileSafety` when no pre-warmed runner is provided.

(See PR description / parent commit for the full perf change rationale,
benchmark numbers, and correctness notes.)
2026-05-09 02:30:26 +09:00
Claude 15ee2f8d40 perf(core): Use 3 metrics warm-up workers for unconstrained scope
Packing.

  Bumps EAGER_WARMUP_THREADS from 2 to 3 in src/core/packager.ts when the
  user did not narrow the file set via --include / config.include / --stdin.
  Tinypool fixes maxThreads at construction, so the 3rd worker must be
  pre-warmed during the searchFiles + collectFiles window or it stalls
  dispatch (a 4-thread / 2-warm experiment regressed by 27% paired in a
  prior iteration). With explicit scope the file set is typically a few
  hundred files, the metrics phase is shorter, and the 3rd worker's
  ~250ms BPE warm-up dominates the parallelism gain: paired benchmarks
  regressed by 11.85% on the 258-file `--include 'src,tests'` workload at
  unconditional EAGER_WARMUP_THREADS=3, so the heuristic falls back to 2.

Reasoning.

  After change 3 on this branch (eager metrics warm-up), the metrics phase
  is the dominant wall-clock contributor on the default 1046-file workload
  (~770 ms in `calculate metrics`, vs ~120 ms output generation, ~370 ms
  search, ~270 ms collect, ~200 ms security check). Five sub-agent
  investigations over independent scopes (CLI startup, file search/glob,
  file collect/security, output generation, token counting) converged on
  metrics worker count as the only candidate clearing the 2% bar without
  regressing other phases. Output gen, security pre-warm, file-search
  scoping, and CLI lazy-load were all measured below threshold or net-
  negative; documented as the previous iteration's notes plus the
  follow-on attempts here:

  - EAGER_WARMUP_THREADS=3 unconditional: -11.85% paired regression on
    the 258-file workload (n=20, t=-10.85), +2.92% on the 1046-file
    workload — net negative because small workloads can't amortize the
    extra BPE parse.
  - Pre-warm the security worker pool gated on the metrics warm-up:
    security-check phase shrunk from 197 ms to 110 ms, but the saving was
    absorbed by the parallel `Process Files` branch and an offsetting
    worker-spawn cost during collect. Paired n=30 measured -4.90% on
    258-file and 0.81% (noise) on 1046-file. Reverted.

Verification.

  Paired interleaved benchmarks (n=20, NODE_DISABLE_COMPILE_CACHE=1):

  Default workload — `node bin/repomix.cjs --quiet` (1046 files):
  |        | min     | median  | mean    | max     | sd     |
  |--------|---------|---------|---------|---------|--------|
  | BEFORE | 1820 ms | 1885 ms | 1886 ms | 2020 ms | 45 ms  |
  | AFTER  | 1700 ms | 1845 ms | 1840 ms | 1970 ms | 62 ms  |
  - Mean paired Δ:   +46.5 ms (2.46% wall-clock reduction)
  - Median paired Δ: +50.0 ms (2.65%)
  - Paired-delta SD: 65.3 ms · paired t = 3.18 (p < 0.01)
  - AFTER faster in 15/20 pairs (75%)

  Scoped workload — `node bin/repomix.cjs --include 'src,tests' --quiet`
  (258 files):
  |        | min     | median  | mean    | max     | sd     |
  |--------|---------|---------|---------|---------|--------|
  | BEFORE | 900 ms  | 955 ms  | 953 ms  | 990 ms  | 25 ms  |
  | AFTER  | 910 ms  | 940 ms  | 946 ms  | 1010 ms | 29 ms  |
  - Mean paired Δ:   +6.5 ms (0.68%) — neutral within noise (t = 0.90)
  - The heuristic falls back to 2 warm workers, so this branch matches
    pre-change behavior; the small positive delta is sampling noise.

  An independent reviewer's paired n=15 NODE_DISABLE_COMPILE_CACHE=1 run
  on a separate sample reported +4.10% (t=6.61, 14/15 pairs) on the
  default workload, consistent direction at higher magnitude.

Correctness.

  - All 1260 unit tests pass (`npm test`); 3 new tests in
    `tests/core/packager.test.ts` exercise both heuristic branches plus
    the `--stdin` (explicitFiles) path.
  - `npm run lint` clean (only pre-existing warnings unchanged from main).
  - XML and Markdown output byte-identical between BEFORE and AFTER on
    both workloads (verified via sha256sum).
  - Worker-pool size confirmed via `--verbose` logs:
    - Default scan: `min=1, max=3 threads` for `calculateMetrics`.
    - `--include 'src,tests'`: `min=1, max=2 threads` (unchanged).
  - Single-CPU and 2-CPU hosts are unaffected (`min(cpuCount, 3) =
    min(cpuCount, 2)` for cpuCount ≤ 2).
  - Public `pack()` API unchanged (no new parameters; the heuristic reads
    existing `config.include` and `explicitFiles` arguments).

Risks.

  The heuristic is a coarse proxy. Pathological cases:
  - User runs default scan on a tiny repo (~50 files): 3 workers, +1
    extra BPE parse. The cost is bounded by the eager-warm-up overlap
    with searchFiles/collectFiles, so the worst case approaches the
    paired noise floor (~30 ms sd on 258-file). Not measured below 50
    files; expected to be neutral-to-slightly-negative within typical
    run-to-run variance.
  - User runs `--include 'huge-dir'` on a 5000-file project: 2 workers,
    misses the parallelism win. Falls back to current production
    behavior — no regression vs main.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-05-07 09:48:32 +00:00
Claude d51de61526 perf(core): Start metrics worker warm-up before searchFiles
Background
----------
Each metrics worker independently parses gpt-tokenizer's ~2.2 MB
`o200k_base.js` BPE table on its first task (~200-300 ms pure-CPU per
worker). The pool was previously created in `pack()` after the file
search and sort phases, so the only stages that could absorb the BPE
warm-up were `collectFiles` + git subprocesses + security check + file
processing. On a 258-file run this still left a residual ~80-130 ms
`await metricsWarmupPromise` stall before the metrics phase.

Change
------
Move `createMetricsTaskRunner` to fire before `searchFiles`. This adds
the ~130 ms glob scan to the hidden warm-up budget and shrinks the
residual stall to ~0-12 ms on the 258-file workload.

Pool sizing: Tinypool fixes `maxThreads` at construction, and the file
count is not yet known. Pre-warming exactly the workers we'll use is
essential — Tinypool queues tasks for newly spawned (cold) workers and
the pipeline can't progress until those workers finish their BPE parse
and pick up the queued task (an experiment with `maxThreads=cpuCount=4`
and only 2 warm workers regressed the 258-file workload by 27 % paired).
So the pool is sized to a fixed 2 workers (`numOfTasks = 2 ×
TASKS_PER_THREAD = 400` → `maxThreads = min(cpuCount, 2)`), matching the
security pool's hard cap and the typical metrics pool size for repos
≤400 files after the TASKS_PER_THREAD=200 sizing on this branch.
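
A rough sketch of the sizing arithmetic above, assuming the shared rule is `maxThreads = min(cpuCount, ceil(numOfTasks / TASKS_PER_THREAD))`. The helper name `maxThreadsFor` is hypothetical, and the `ceil` form is an assumption that happens to agree with the pool sizes quoted in this message.

```typescript
// TASKS_PER_THREAD = 200 is taken from the text; the rule itself is assumed.
const TASKS_PER_THREAD = 200;

function maxThreadsFor(numOfTasks: number, cpuCount: number): number {
  // One thread per TASKS_PER_THREAD tasks, rounded up.
  const wanted = Math.ceil(numOfTasks / TASKS_PER_THREAD);
  // Clamp to the CPU count and keep at least one thread.
  return Math.max(1, Math.min(cpuCount, wanted));
}

// The eager warm-up passes a fixed task count of 2 x TASKS_PER_THREAD = 400,
// which yields min(cpuCount, 2) regardless of how many files are found later.
const eagerWarmupThreads = maxThreadsFor(2 * TASKS_PER_THREAD, 4);
```

Under this assumed rule, a 4-CPU host gets 2 threads from the fixed 400-task hint, and single-CPU hosts collapse to 1.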

Larger repos (>400 files) would benefit from more parallelism, but the
1046-file regression check below shows the eager-warmup gain still
net-improves wall-clock at maxThreads=2 (the BPE warm-up cost of
~250 ms × (cpuCount - 2) extra workers dominates the parallelism savings on
the metrics phase). On single-CPU hosts the heuristic naturally
collapses to maxThreads=1, identical to today's behavior.

The `try { } finally { cleanup }` block is widened to cover the new
early call so the worker pool is cleaned up on early throws too. A new
`searchFiles`-rejection test in `tests/core/packager.test.ts` exercises
that path explicitly.
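
The widened cleanup scope can be illustrated with a small sketch. The interfaces and `packSketch` below are hypothetical stand-ins, not the real `pack()` signature; only the shape of the `try`/`finally` is the point.

```typescript
// Hypothetical dependency shapes for illustration.
interface TaskRunner {
  cleanup(): Promise<void>;
}

interface SketchDeps {
  createMetricsTaskRunner(): TaskRunner;
  searchFiles(): Promise<string[]>;
}

async function packSketch(deps: SketchDeps): Promise<number> {
  // Create the pool first so its warm-up overlaps the file search.
  const runner = deps.createMetricsTaskRunner();
  try {
    const files = await deps.searchFiles(); // may reject
    return files.length;
  } finally {
    // Runs on success and on early throws alike, so the pool never leaks.
    await runner.cleanup();
  }
}
```

If the pool were created outside the `try`, a rejection from `searchFiles` would skip cleanup and leave worker threads alive.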

`TASKS_PER_THREAD` is exported from `processConcurrency.ts` and consumed
by name in `packager.ts` to keep the eager-warmup constant tied to the
shared sizing rule.

Benchmark
---------
Both runs use n=… paired interleaved (alternating BEFORE-first /
AFTER-first ordering) with `NODE_DISABLE_COMPILE_CACHE=1` so cold-start
BPE parse is measured rather than masked. 4-vCPU Intel(R) Xeon(R) host.

`node bin/repomix.cjs --include 'src,tests' --quiet` (258 files, n=20):

|        | min     | median  | mean    | max     | sd     |
|--------|---------|---------|---------|---------|--------|
| BEFORE | 1007 ms | 1044 ms | 1054 ms | 1164 ms | 36 ms  |
| AFTER  |  893 ms |  966 ms |  962 ms | 1065 ms | 36 ms  |

- Mean paired Δ:   +91.6 ms (8.69 % wall-clock reduction)
- Median paired Δ: +97.5 ms (9.34 %)
- Paired-delta SD: 36.0 ms · paired t = 11.39 (p < 0.001)
- AFTER faster in 20/20 pairs (100 %)
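
The paired t statistic above can be cross-checked from the summary numbers, assuming the standard paired-sample formula t = mean / (sd / sqrt(n)). The helper name `pairedT` is illustrative; small differences from the reported value come from rounding in the published mean and SD.

```typescript
// Paired t statistic from the mean and SD of the per-pair deltas.
function pairedT(meanDelta: number, sdDelta: number, n: number): number {
  const standardError = sdDelta / Math.sqrt(n);
  return meanDelta / standardError;
}

// 258-file workload: mean delta 91.6 ms, paired-delta SD 36.0 ms, n = 20.
const tScoped = pairedT(91.6, 36.0, 20); // about 11.38, reported as 11.39
```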

Regression check — `node bin/repomix.cjs --quiet` (default, 1046 files,
n=15) on a clean repo (baseline binary built outside the working tree
so it does not get picked up as a workload file):

|        | min     | median  | mean    | max     | sd     |
|--------|---------|---------|---------|---------|--------|
| BEFORE | 1769 ms | 1872 ms | 1877 ms | 2063 ms | 79 ms  |
| AFTER  | 1751 ms | 1820 ms | 1837 ms | 2018 ms | 61 ms  |

- Mean paired Δ:   +40.0 ms (2.13 %)
- Median paired Δ: +48.6 ms (2.60 %)
- Paired-delta SD: 51.7 ms · paired t = 2.99 (p ≈ 0.01)
- AFTER faster in 11/15 pairs (73 %)

The larger workload also clears the 2 % threshold; the eager warm-up's
gain offsets the maxThreads=2 cap that's now applied unconditionally.

Correctness
-----------
- All 1257 unit tests pass (`npm test`); `npm run lint` clean (only
  pre-existing warnings).
- XML and Markdown output byte-identical between BEFORE and AFTER on
  both the 258-file and 1046-file workloads.
- Worker-pool size confirmed via `--verbose` logs: `min=1, max=2 threads`
  for `calculateMetrics` on both workloads (was `max=2` on 258 files,
  `max=4` on 1046 files before this change).
- New test `cleans up the metrics worker pool when searchFiles rejects`
  exercises the widened `try/finally` cleanup path.
2026-05-07 02:56:06 +00:00
Kazuki Yamada d10814163d fix(core): Cover const and external constructors in Dart query
Three constructor variants were silently dropped during --compress:

- `const Foo(...);` and `const Foo.named(...) : ...;` parse as
  `(declaration (constant_constructor_signature ...))` — a node type the
  existing constructor query did not list.
- `const factory Foo() = Bar;` parses as
  `(redirecting_factory_constructor_signature (const_builtin) (identifier) ...)`
  whose first named child is `const_builtin`, so the leading-anchor
  `. (identifier)` pattern failed to match.
- `external factory Foo.make();` parses as
  `(declaration (factory_constructor_signature ...))` — bare under
  `declaration`, not wrapped in `method_signature`, so the existing
  factory query missed it.

Switch the constructor / factory / redirecting-factory queries to
capture the whole signature node as `@name.definition.method`. This
emits the same source line(s) DefaultParseStrategy already produces and
is robust across all body / external / const / redirecting variants.
2026-05-06 22:12:39 +09:00
Kazuki Yamada def2985b30 feat(core): Capture plain constructor and operator overload in Dart query
Two pre-existing gaps surfaced while extending queryDart:

- Plain constructors (e.g. `Animal(this.name);`) live directly under
  `declaration`, not wrapped in `method_signature`, so the existing
  `(method_signature (constructor_signature ...))` query never matched
  them. Add a sibling query against `(declaration (constructor_signature ...))`.
- Operator overloads (`operator +`, `operator []`, `operator []=`,
  `operator ==`, ...) parse as `(method_signature (operator_signature ...))`
  but `operator_signature` has no identifier name field — the operator
  token surfaces as `(binary_operator)` / `([])` / `([]=)` children.
  Capture the whole `operator_signature` as `@name.definition.method` so
  DefaultParseStrategy emits its full source range.

Verified against `--compress` on a real Dart file: signatures that were
previously dropped (only their `///` doc comments survived) now appear
in compressed output.
2026-05-06 22:12:39 +09:00
Kazuki Yamada ada200a080 feat(core): Capture mixin, typedef, getter, setter, and factory in Dart query
intent(dart-query): make --compress preserve Dart definition kinds that were silently dropped — mixin, typedef, getter, setter, factory, and redirecting factory
decision(capture-naming): align Dart captures with the dominant @name.definition.X convention used by queryTypeScript/queryPython/queryRust; output is unchanged because DefaultParseStrategy matches via name.includes('name')
constraint(redirecting-factory): tree-sitter-dart grammar makes redirecting_factory_constructor_signature a child of `declaration`, not `method_signature`, so it must be queried bare to avoid a "Bad pattern structure" parse error
constraint(type-alias): type_alias's name node is `type_identifier`, not `identifier` — using `identifier` would silently match nothing
learned(external-keyword): `external` modifier in Dart is a sibling token outside function_signature/method_signature, so existing captures already cover `external void foo();` without changes
2026-05-06 22:12:39 +09:00
Kazuki Yamada 3ff49306e1 refactor(file): Simplify cheap pre-screen down to NULL probe + BOM exemption
intent(simpler): The cheap pre-screen from the previous commit mirrored the three `isBinaryCheck` rules in `isbinaryfile@5.0.2` that classify even valid UTF-8 as binary (PDF magic / NULL / suspicious control-byte ratio >10%). But because `TextDecoder('utf-8', { fatal: true })` now runs before `isBinaryFile`, valid UTF-8 buffers never reach the protobuf detector. The primary goal, avoiding the pathological case, is achieved by the TextDecoder reorder alone. Mirroring PDF magic and the suspicious-byte ratio (1) only rescued edge cases with almost no real-world impact and (2) coupled the code to `isbinaryfile` internals, which was not worth the cost.

fix(simplify): Reduce the cheap pre-screen to the minimal configuration of NULL-byte probe + BOM exemption.

decision(keep-null-probe): Keep only the NULL-byte check, for an independent and legitimate reason: `U+0000` is **an invalid character in XML 1.0** and breaks the validity of this tool's primary output format (XML). `TextDecoder` accepts `0x00` as valid UTF-8 (U+0000), so unless it is rejected here, a buffer containing NULL is packed as text and downstream XML parsers fail. This is not a mirror of an `isbinaryfile` rule; it is repomix's own output-robustness requirement.

decision(drop-pdf-magic): PDFs are rejected earlier by `is-binary-path` via the `.pdf` extension. Extensionless ASCII-only PDF stubs almost never occur in practice (a real PDF contains a cross-reference table and binary streams, so it takes the path where UTF-8 decoding fails). Not worth guarding.

decision(drop-suspicious-ratio): Valid UTF-8 buffers with a high ratio of pure C0 control bytes do not exist in real projects. The maintenance cost of fully mirroring `isbinaryfile`'s UTF-8 lookahead (drift risks such as the DEL boundary) outweighs the benefit.

constraint(coupling-minimal): A NULL byte is a universal binary signal and is not tied to `isbinaryfile`'s rules. Likewise, the BOM exemption follows the standard BOM definitions and is unaffected by upstream drift.

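test(cleanup): Remove the 3 related regression tests (PDF magic / suspicious ratio / DEL boundary). They guaranteed the behavior of the removed rules, and the current implementation intentionally packs all of those inputs as text. The 3 remaining tests (UTF-8 multi-byte / UTF-8 BOM+NULL / UTF-16 LE BOM) each directly guard the pathological case or a regression this PR aims to avoid.

bench(no-regression, M-series Mac, hyperfine --warmup 1 --runs 5):
- `node bin/repomix.cjs --quiet`: 399ms → 418ms (within the margin of error; the difference from removing the hand-written JS 512-byte loop is at the noise floor)
- Output diff: 0 files removed relative to the existing base, Korean md +1 (silent drop resolved, unchanged)
- fileRead.ts: 175 → 159 lines (-16 lines). Effectively ~35 lines removed from the pre-screen logic.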

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 23:11:29 +09:00
Kazuki Yamada 0248b61085 fix(file): Include UTF-8 BOM in cheap pre-screen exemption
intent(no-regression): Flagged in the codex re-review: the BOM exemption function `hasNonUtf8TextBom` covered only UTF-16/UTF-32/GB18030 and did not include the UTF-8 BOM (`EF BB BF`). `isBinaryCheck` in `isbinaryfile@5.0.2` returns `false` as soon as it sees a UTF-8 BOM, so before the change a buffer such as `EF BB BF 00 41` flowed through the fast path as text. With the current diff, the NULL byte after the UTF-8 BOM is caught first by the cheap probe and the buffer is classified as binary, which is a regression.

fix(utf8-bom-exempt): Rename the function to `hasTextBom` and add the UTF-8 BOM (`EF BB BF`) check first. The BOM exemptions follow the same order as upstream `isbinaryfile`, so the cheap probe is skipped and the buffer reaches the UTF-8 fast path.

constraint(test-source-non-binary): Flagged in codex iter2: embedding a raw NUL byte literal in the expected value of the new BOM-exemption test makes `fileRead.test.ts` itself look binary to `grep`/`rg`, hiding it from cross-file searches and CI text-scanning jobs. Replaced it with the `'\0A'` escape notation, so the runtime string (char codes 0, 65) is unchanged while the source is ASCII again.

test(regression): Add 1 test.
- Confirm that `EF BB BF 00 41` (UTF-8 BOM + NULL + 'A') is not skipped as `binary-content` and is decoded as `'\0A'` via the UTF-8 fast path.
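
A minimal sketch of the BOM exemption plus NULL-byte probe discussed in this commit. It is not the actual `fileRead.ts` code: the BOM table is written from the standard signatures, `looksBinaryByNullProbe` is a hypothetical name, the 512-byte window is carried over from the earlier commit, and the PDF-magic and suspicious-ratio rules that existed at this point are left out.

```typescript
// Standard BOM signatures, UTF-8 first. Longer UTF-32 LE precedes UTF-16 LE
// for readability; the result is the same because both count as text BOMs.
const TEXT_BOMS: number[][] = [
  [0xef, 0xbb, 0xbf], // UTF-8
  [0x00, 0x00, 0xfe, 0xff], // UTF-32 BE
  [0xff, 0xfe, 0x00, 0x00], // UTF-32 LE
  [0x84, 0x31, 0x95, 0x33], // GB18030
  [0xfe, 0xff], // UTF-16 BE
  [0xff, 0xfe], // UTF-16 LE
];

function hasTextBom(buf: Uint8Array): boolean {
  return TEXT_BOMS.some(
    (bom) => bom.length <= buf.length && bom.every((byte, i) => buf[i] === byte),
  );
}

function looksBinaryByNullProbe(buf: Uint8Array): boolean {
  // BOM-marked text skips the probe entirely.
  if (hasTextBom(buf)) return false;
  // Assumed window: only the first 512 bytes are scanned.
  const limit = Math.min(buf.length, 512);
  for (let i = 0; i < limit; i++) {
    if (buf[i] === 0x00) return true;
  }
  return false;
}
```

With the UTF-8 entry present, `EF BB BF 00 41` is exempted before the probe runs, which is the regression case this commit pins.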

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 20:41:30 +09:00
Kazuki Yamada 6e81c449b6 fix(test): Drop hyphen from "mis-classified" to satisfy typos check
intent(ci-green): typos@1.45.1 flags `mis` as a typo of `miss`/`mist` and the
hyphenated `mis-classified` in the new PDF-magic regression test comment
trips the `Check typos` job. The unhyphenated `misclassified` is the more
common spelling and passes the dictionary.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 20:32:38 +09:00
Kazuki Yamada 5ab82d5152 fix(file): Mirror isbinaryfile's PDF magic + suspicious-byte ratio rules in cheap pre-screen
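intent(no-regression): Flagged in the codex review: the original diff built the cheap pre-screen from only the NULL-byte probe + UTF-16/UTF-32 BOM exemption, but `isBinaryCheck` in `isbinaryfile@5.0.2` has two other rules that classify even valid UTF-8 as binary: (1) the first 5 bytes are `%PDF-` (PDF magic), (2) the ratio of suspicious control bytes in the first 512 bytes exceeds 10%. Unless the cheap pre-screen includes them, extensionless or `.txt` files starting with `%PDF-`, and valid UTF-8 files with a high ratio of ASCII control characters, which were previously skipped, would regress to being included in the pack.

fix(pdf-magic): Add the `%PDF-` check after the UTF-16/UTF-32 BOM exemption and before the NULL probe, the same position as in upstream `isbinaryfile`.

fix(suspicious-ratio): Add a suspicious-byte counter to the existing NULL probe loop and classify the buffer as binary after the loop at the `>10%` threshold. The suspicious set is `isbinaryfile`'s `(b < 7 || b > 14) && (b < 32 || b > 127)` condition narrowed to valid-UTF-8 input and simplified: `b < 7` or `b in 0x0F..0x1F`. This set is disjoint from the continuation/lead bytes of valid UTF-8 multi-byte sequences (0x80..0xFF), so a flat byte scan with no UTF-8 awareness gives the correct result. The protobuf detector (`isBinaryProto`) is intentionally not mirrored, because it is the very pathological case this PR avoids.

constraint(del-boundary): Flagged in the codex re-review: 0x7F (DEL) was initially included in the suspicious set, but `isbinaryfile`'s condition `b < 32 || b > 127` excludes 127. The fixed version drops `b === 0x7f` and corrects the comment to match upstream behavior.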

test(regression): Add 3 tests.
- Confirm that valid-UTF-8 PDF magic (`%PDF-1.4\n...`) is skipped as `binary-content`
- Confirm that a buffer of 64 bytes of only 0x01 (100% suspicious) is skipped as `binary-content`
- Pin as a boundary that a buffer of 64 bytes of only 0x7F (valid UTF-8, 100% DEL) is **not skipped**
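
The suspicious-byte ratio rule can be sketched in isolation as below. The function names are hypothetical, and the sketch deliberately leaves out the NULL early-exit, the BOM exemption, and the PDF-magic check that share the loop in the described change.

```typescript
// Simplified suspicious set for valid-UTF-8 input: b < 7 or 0x0F..0x1F.
// 0x7F (DEL) is deliberately not included, matching the boundary above.
function isSuspiciousByte(b: number): boolean {
  return b < 7 || (b >= 0x0f && b <= 0x1f);
}

function exceedsSuspiciousRatio(buf: Uint8Array): boolean {
  // Scan at most the first 512 bytes.
  const limit = Math.min(buf.length, 512);
  if (limit === 0) return false;
  let suspicious = 0;
  for (let i = 0; i < limit; i++) {
    if (isSuspiciousByte(buf[i])) suspicious++;
  }
  // Binary when more than 10% of the scanned bytes are suspicious.
  return suspicious / limit > 0.1;
}
```

The two boundary buffers from the tests above behave differently under this sketch: 64 bytes of 0x01 exceed the threshold, while 64 bytes of 0x7F do not.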

bench(no-perf-regression, M-series Mac, hyperfine --warmup 1 --runs 5):
- `node bin/repomix.cjs --quiet`: 406ms → 399ms (within the margin of error)
- The added pre-screen cost is a 512-byte linear scan, negligible compared with the UTF-8 fast path (which already scans the whole buffer with TextDecoder)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 16:15:44 +09:00
Kazuki Yamada 8a08815ba1 perf(file): Try UTF-8 decode before isBinaryFile to dodge protobuf-detector pathological case
intent(latency): The wall-clock time for `node bin/repomix.cjs` to pack the repository itself roughly tripled, from 0.38s to 1.15s, with PR #1533 (`docs(website): Add localized page metadata`, commit 9bd663ae). 9bd663ae only appended YAML frontmatter (`title` + `description`) to 14 languages × 24 pages = 336 md files. The output size barely changed, yet the run time grew.

root-cause(isbinaryfile): `readRawFile` passes every file's buffer through `isBinaryFile` (= the `isbinaryfile` package) before proceeding to the UTF-8 fast path. `isbinaryfile`'s `isBinaryCheck` includes a protobuf-shape detector (`isBinaryProto`) that interprets the leading bytes of an arbitrary file as a varint and allocates an array with `new Array(varint)`. For some legitimate UTF-8 byte sequences this loop spins for several seconds or throws `RangeError: Invalid array length`. Concrete example: `website/client/src/ko/guide/tips/best-practices.md` (4,243 bytes, a valid UTF-8 Korean md) alone took ~3,500ms in the `isBinaryFile` call and finally threw. The outer try/catch swallowed the error and the file was silently dropped as `encoding-error`. A default pack paid ~3,500ms on every run for this one file. This is a bug in upstream `isbinaryfile` (it allocates `new Array(n)` without bounding n on untrusted input), but we defend against it ourselves rather than wait for a fix.

fix(reorder): Move `isBinaryFile` to **after** the UTF-8 fast path and apply it only to buffers that failed to decode as UTF-8. Binaries containing NULL bytes (= U+0000, valid UTF-8) would pass straight through the UTF-8 fast path, so a cheap 512-byte NULL-byte probe is inserted before `isBinaryFile`. NULL is the strongest binary signal and the only `isBinaryCheck` rule that triggers on input that passes a fatal UTF-8 decode. The remaining heuristics (PDF magic / suspicious-byte ratio / protobuf shape) require non-UTF-8 byte sequences, so only files that miss the UTF-8 fast path are passed to `isBinaryFile` as before.

constraint(utf16-utf32-bom): UTF-16 LE encodes ASCII `A` as `0x41 0x00`, and the UTF-32 BE BOM starts with `0x00 0x00 0xFE 0xFF`. Running the NULL probe as-is would misclassify these text files as binary. `isbinaryfile`'s own `isBinaryCheck` has a BOM exemption, so `hasNonUtf8TextBom` mirrors it: buffers starting with a UTF-16/UTF-32 BOM skip the probe and fall through unchanged to the slow path (jschardet+iconv). Behavior is identical to pre-change.

side-effect(restored-file): The Korean Markdown file above used to throw and be silently dropped from the output. After this fix it is correctly included.

test(regression): Add 2 tests to `tests/core/file/fileRead.test.ts`.
- Valid UTF-8 multi-byte input (consecutive 3-byte Hangul, no NULL) round-trips unchanged as text
- A UTF-16 LE BOM file ("Hello\n") decodes correctly via the slow path even though it contains NULL bytes

bench(local, M-series Mac, hyperfine --warmup 1 --runs 5):
- `node bin/repomix.cjs --quiet` overall: 1152ms → **406ms** (about 2.8× faster)
- `--include 'website/client/src/ko/guide/tips/best-practices.md'` alone: 880ms → **170ms** (about 5.2× faster)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 15:56:10 +09:00
Claude 98866f001f perf(metrics): Increase METRICS_BATCH_SIZE from 10 to 50
Cuts per-thread IPC overhead in calculateFileMetrics by ~80% by sending
fewer, larger batches to the metrics worker pool. At ~1000 files this
reduces the dispatched task count from ~100 to ~20 (~5 batches per
worker on a 4-thread pool instead of ~25), saving ~40 estimated
worker-ms of pure serialization/dispatch overhead per worker on the
critical path.

The previous size of 10 was tuned for fast worker availability to
overlap with output generation, but tinypool round-trips were measured
at ~2ms minimum (batches of trivially small files bottom out there),
so the IPC dominated for short batches. Increasing to 50 keeps load
balance comparable (~5:1 per thread vs. the prior ~25:1) and avoids
the load-imbalance pitfall of even larger sizes (e.g. 100), where one
oversized batch can monopolize a worker and stretch the tail.
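
The batch-count arithmetic above can be illustrated with a small chunking sketch. `toBatches` is a hypothetical helper written for this example, not the actual `calculateFileMetrics` code.

```typescript
const METRICS_BATCH_SIZE = 50;

// Split items into consecutive batches of at most batchSize elements.
function toBatches<T>(items: T[], batchSize: number = METRICS_BATCH_SIZE): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// 1000 files: 100 dispatched tasks at size 10, 20 at size 50.
const thousandFiles = Array.from({ length: 1000 }, (_, i) => `file-${i}.ts`);
const tasksAtTen = toBatches(thousandFiles, 10).length;
const tasksAtFifty = toBatches(thousandFiles).length;
```

On a 4-thread pool that is about 25 versus about 5 round-trips per worker, which is where the roughly 80% reduction in dispatch overhead comes from.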

Benchmark on this repo (1016 files, 4 cores; 100 paired interleaved
runs alternating BATCH=10 and BATCH=50 with the same prebuilt JS,
swapping only the constant, full pipeline timed end-to-end):

  BATCH=10  mean 1.5589s  median 1.5480s  stdev 0.0966s
  BATCH=50  mean 1.5111s  median 1.5030s  stdev 0.0625s
  Saved     47.8ms (3.07%) mean / 45.0ms (2.91%) median
            (trimmed mean 41.0ms, dropping top/bottom 10%)
  Wins      69/100 paired runs faster with BATCH=50
  t-test    paired t = 4.545 (df=99, p < 0.001)
  95% CI    [27.2 ms, 68.4 ms] on the mean improvement
  Sign test 2-sided p = 0.0002

Also adds a regression test for the batching path: with 120 input files
the test asserts result order/completeness and that the progress
callback fires once per dispatched batch (not per file, not just once).

All 1250 tests pass; output is byte-identical for the same source tree.
2026-04-30 00:06:39 +09:00
Kazuki Yamada f67731056a test: Round-3 PR review feedback
- validateFileSafety: pin the negative path of `if (config.security.enableSecurityCheck)`
  — every other test enabled the check, so a regression that always runs
  the security check would have passed silently.
- unifiedWorker:
  - Add a positive workerData=securityCheck + ambiguous-task case so the
    pair (override + this) distinguishes "inference always wins" from
    "inference wins only when it yields a value".
  - Stop pretending the handler-cache test verifies caching. Both branches
    of `if (cached) return cached;` end with the same Map.set, and Node's
    own module cache makes the dynamic import effectively free, so the
    cache is unobservable from outside without exposing internals.
    Renamed to "repeated calls" with a comment explaining the limitation.
- fileSystemReadDirectoryTool: translate the pre-existing Japanese comment
  to English per CLAUDE.md.
- TokenCounter: extract `LoadEncodingFn` type alias instead of the
  unusual `typeof loadEncoding`, so a signature drift between the local
  function and the deps field would surface at the type level.
2026-04-26 22:47:21 +09:00
Kazuki Yamada 7ff8c8b155 test: Address PR review feedback (RepomixError instances + mock isolation)
- outputGenerate: tests titled "throws RepomixError…" / "wraps … in
  RepomixError" now assert the rejection is an instance of RepomixError
  in addition to the message regex, matching the test names.
- LanguageParser: collapse the duplicate getParserForLang('javascript')
  rejection assertions into a single .catch capture that checks both
  type and message.
- calculateMetrics: vi.mocked(initTaskRunner).mockReset() before
  mockReturnValueOnce so a future test that omits taskRunner can't
  silently consume the override.
- packager: pre-attach a no-op .catch on the rejected warmupPromise so
  vitest's unhandled-rejection detection doesn't fire before pack
  awaits it. Production code mirrors this pattern in packager.ts:262.
2026-04-26 22:29:06 +09:00
Claude 1bef40ba2d test: Tighten misleading test names and pin packageManager guard
Address two review threads on PR #1518 that flagged tests whose titles
overstated what was being verified.

- fileProcess: the longBase64 string is one continuous line, so the
  truncateBase64 → removeEmptyLines ordering was never actually under
  test (truncateBase64Content's regex does not span newlines). Rename
  to describe the combined behavior the test really pins.
- skillTechStack: rename the per-directory case to reflect that root
  and subpackage land in separate buckets keyed by getDirPath, and
  add a second case with two package.json entries at the same path
  to genuinely exercise the parsed.packageManager && !result.packageManager
  guard at skillTechStack.ts:541.
2026-04-26 13:24:32 +00:00
Kazuki Yamada bcd849177f refactor(metrics): Inject loadEncoding via deps for testability
Address PR review feedback (claude R4): the previous error-handling tests
overwrote the private `countFn` field via a cast, which silently breaks
on a rename. Add a `deps` parameter to TokenCounter that defaults to the
real `loadEncoding`, and switch the error-handling tests to inject a fake
that returns the throwing function directly. Matches the dependency
injection pattern documented in CLAUDE.md.
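
A minimal sketch of the injection pattern described here. The class, the stand-in loader, and the fall-back-to-0 error handling are assumptions for illustration; the real `TokenCounter` and `loadEncoding` differ.

```typescript
type CountFn = (text: string) => number;
type LoadEncodingFn = (encoding: string) => CountFn;

// Stand-in loader (assumption): counts whitespace-separated words instead of
// loading a real BPE table.
const loadEncoding: LoadEncodingFn = () => (text) =>
  text.split(/\s+/).filter(Boolean).length;

class TokenCounterSketch {
  private readonly countFn: CountFn;

  // deps defaults to the real loader, so production callers pass nothing.
  constructor(encoding: string, deps: { loadEncoding: LoadEncodingFn } = { loadEncoding }) {
    this.countFn = deps.loadEncoding(encoding);
  }

  countTokens(text: string): number {
    try {
      return this.countFn(text);
    } catch {
      // Assumed behavior: treat tokenizer failures as zero tokens.
      return 0;
    }
  }
}

// A test injects a throwing fake instead of overwriting a private field.
const failing = new TokenCounterSketch("o200k_base", {
  loadEncoding: () => () => {
    throw new Error("tokenizer failure");
  },
});
```

Because the fake enters through the constructor, renaming the private field no longer silently breaks the error-handling tests.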
2026-04-26 22:22:58 +09:00
Kazuki Yamada e5f7a1f311 fix(shared): Address PR review feedback
- shared/errorHandle: recognize duck-typed OperationCancelledError from
  worker boundaries in isRepomixError (it extends RepomixError but the
  name was missing from the structured-clone fallback comparison).
  Add a regression test for the worker-boundary case.

Test improvements per coderabbit / claude review:
- cliReport: assert skill-directory + relative path on the same log line.
- processConcurrency: restore process.versions.bun by removing the property
  when it didn't originally exist, instead of leaving it defined-as-undefined.
- logger: drop the no-op `process.env.REPOMIX_LOG_LEVEL = undefined` (it
  coerces to the string "undefined" and is removed by the delete that follows).
- unifiedWorker: replace the tautological cache test with one that proves
  cache uniqueness via onWorkerTermination cleanup count; add a test for
  task-based inference overriding workerData (bundled-env reuse).
- calculateMetricsWorker: new direct test for the default export's items
  vs. single-mode dispatch — unifiedWorker mocks this module so the branch
  was otherwise untested.
- packRemoteRepositoryTool: hard-code the expected output path instead of
  expect.any(String) to catch arg-swap regressions.
- memoryUtils: tighten getMemoryStats assertions with sanity bounds
  (heapUsed <= heapTotal, rss > 0, heapUsagePercent <= 100) so a
  unit-conversion regression (bytes vs MB) would fail the test.
2026-04-26 22:20:42 +09:00
Kazuki Yamada 402e4906d7 test: Pin v1.14.0 regression-prone invariants
Targeted regression tests for the high-risk areas identified in the
v1.13.1..main audit, focusing on silent-correctness bugs and parallel
error handling — places that wouldn't surface in CI but would in the
field.

- core/metrics/calculateMetrics: pin numeric equivalence between the
  fast path (Σ file tokens + wrapper tokens) and the slow path (full
  output tokenization). Cover wrapper-extraction fallback, split-output
  fallback, and worker pool cleanup when fileMetrics rejects.

- core/file/fileProcess: pin transform ordering invariants —
  removeComments → removeEmptyLines (blank lines from comment removal
  must be cleaned up; preserved when removeEmptyLines is off);
  truncateBase64 → removeEmptyLines (multi-line base64 squashed first);
  trim → showLineNumbers (no leading/trailing blanks numbered).
  Plus worker/lightweight path parity for inputs that don't need
  worker processing.

- core/packager: pin metrics worker pool cleanup on parallel branch
  failures (validateFileSafety, produceOutput, calculateMetrics, warmup
  rejection). Verify prefetchSortData failure is isolated and does not
  block sortOutputFiles.

- core/skill/skillTechStack: cover untested fix-commit invariants —
  root entry sorts first in monorepo output; configFiles deduplicated
  within a directory; first-seen packageManager wins per directory.
2026-04-26 19:43:53 +09:00
Kazuki Yamada cbdfc29b4d test: Cover error/edge paths in core (output, file, security, treeSitter)
Lift the four most impactful uncovered files past 90% lines without
introducing fragile or contrived tests. Each block targets real
user-facing branches (error handling, optional features, init/dispose).

- core/output/outputGenerate (78% -> ~90%):
  - buildOutputGeneratorContext: instructionFilePath success and missing-file
    paths; pre-computed vs. searchFiles fallback for empty directories;
    full-tree mode (success and listing failure); searchFiles failure wrap.
  - generateOutput: unsupported style throws RepomixError.

- core/security/validateFileSafety (79% -> ~95%):
  - logSuspiciousContentWarning loop: header line per section, plus
    singular ("issue") and plural ("issues") suffix per result.
  - No-op behavior when no suspicious git diff/log entries exist.

- core/file/fileSearch (88% -> ~92%):
  - handleGlobbyError: EPERM and EACCES translated to PermissionError;
    other error codes pass through.
  - Outer catch: generic Error wrapped with directory context;
    non-Error throw produces the generic fallback message.

- core/treeSitter/languageParser (74% -> ~88%):
  - getResources before init() throws RepomixError.
  - init() is idempotent (Parser.init is called only once across two calls).
  - Parser.init() failure is wrapped as RepomixError.
  - dispose() resets state so subsequent calls require re-init.

Coverage:
- Statements 89.51% -> 90.23%
- Branches   79.31% -> 80.26%
- Functions  89.37% -> 89.69%
- Lines      90.06% -> 90.80%
2026-04-26 19:35:00 +09:00
Kazuki Yamada 9aac452504 test: Raise overall coverage from 87.9% to 90.1%
Cover previously-untested paths across the shared, cli, core, and mcp
layers, focusing on branches that represent real user-facing behavior
rather than line-coverage chasing.

Highlights:
- shared/errorHandle: cover handleError (RepomixError, unexpected Error,
  unknown values, duck-typed worker errors, debug-level branches) and
  the three error class constructors.
- shared/logger: cover setLogLevelByWorkerData for env-var, workerData
  (array and object shapes), and invalid/missing inputs.
- shared/memoryUtils: add a fresh test file covering stats, log helpers,
  and withMemoryLogging success/error paths.
- shared/processConcurrency: cover cleanupWorkerPool (Node, Bun-skip,
  swallowed teardown errors) and the run/cleanup delegation.
- shared/unifiedWorker: cover the cache-hit path and the workerData
  (array/object) and REPOMIX_WORKER_TYPE detection branches.
- core/metrics/TokenCounter: cover the catch branch (Error,
  non-Error throws, with/without filePath).
- core/file/fileManipulate: cover removeEmptyLines on inherited base
  and composite manipulators.
- cli/cliReport: cover skill-directory and split-output summary lines.
- mcp/tools/packRemoteRepositoryTool: add tests mirroring the
  packCodebaseTool pattern (success, runCli failure, runCli throw,
  workspace creation failure).
- mcp/tools/fileSystemReadDirectoryTool: switch to mocking
  node:fs/promises so existing mocks actually intercept calls, and
  cover the file-vs-dir, listing, empty-directory, and readdir-error
  paths.

Result:
- Statements 87.29% -> 89.51%
- Branches   76.16% -> 79.31%
- Functions  87.60% -> 89.37%
- Lines      87.89% -> 90.06%
2026-04-26 19:28:09 +09:00
Kazuki Yamada 44db93451d test(core): Cover precondition guards in truncateBase64
intent(truncateBase64-tests): add explicit coverage for the new fast-path guards introduced alongside the regex-skip optimization
decision(test-cases): focus on the four cases that exercise guard behavior not previously asserted — empty input, exactly-below-threshold (255 chars), run-reset on non-base64 separator, and non-base64 data URI without `;base64,`

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 18:09:02 +09:00
Claude 4bd257e280 perf(core): Speed up empty-directory detection in file search
Merge the two separate globby traversals used by `searchFiles` into a
single one and parallelize the per-directory `readdir` calls used to
filter empty directories.

Background
----------
When `output.includeEmptyDirectories` is enabled (the default for
`repomix.config.json` in this repo, and any repo that wants an accurate
directory tree), `searchFiles` previously walked the working tree twice:
once with `onlyFiles: true` and a second time with `onlyDirectories:
true`. Each call re-traversed the tree and re-parsed every `.gitignore`
/ `.repomixignore` file. `findEmptyDirectories` then issued `readdir`
serially for every matched directory, awaiting each syscall before
starting the next.

Change
------
* Replace the two globby invocations with one `objectMode: true,
  onlyFiles: false` call. Partition the returned `GlobEntry[]` by
  `dirent.isFile()` / `dirent.isDirectory()`, matching the previous
  `onlyFiles: true` semantics for symlinks and other non-file
  non-directory entries.
* Rewrite `findEmptyDirectories` to run the per-directory `readdir`
  checks concurrently via `Promise.all`. Ordering is preserved by the
  result array and the caller sorts the final list anyway.
* When `includeEmptyDirectories` is disabled, keep the fast
  `onlyFiles: true` path unchanged so the default CLI run pays no cost.

Benchmark (hyperfine, repomix packing itself, 30 runs, warmup 3)
----------------------------------------------------------------
Run 1: baseline 2.162s ± 0.042s → perf 2.017s ± 0.029s  → -145ms (-6.7%)
Run 2: baseline 2.161s ± 0.023s → perf 2.030s ± 0.027s  → -131ms (-6.1%)

Per-stage verbose timings:
  baseline: [globby files 200ms] + [globby dirs 85ms] + [empty dirs 61ms]
  perf:     [combined globby    223ms]                + [empty dirs 66ms]
  saved:    -57ms consistently on the critical path
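
A minimal sketch of the two steps described under "Change", under stated assumptions: `EntryLike` is a simplified stand-in for the entries returned in `objectMode`, and `listDir` is an injected function so the sketch does not touch the filesystem. Neither the names nor the signatures are taken from the repository.

```typescript
// Simplified, hypothetical entry shape; real objectMode entries carry more fields.
interface EntryLike {
  path: string;
  dirent: { isFile(): boolean; isDirectory(): boolean };
}

// Split one combined traversal result into files and directories.
function partitionEntries(entries: EntryLike[]): {
  files: string[];
  directories: string[];
} {
  const files: string[] = [];
  const directories: string[] = [];
  for (const entry of entries) {
    if (entry.dirent.isFile()) files.push(entry.path);
    else if (entry.dirent.isDirectory()) directories.push(entry.path);
    // Entries that are neither (for example symlinks) are dropped.
  }
  return { files, directories };
}

type ListDir = (dir: string) => Promise<string[]>;

// Start every directory listing at once and await them together.
// Promise.all keeps results in input order, so the filter lines up by index.
async function findEmptyDirectories(
  directories: string[],
  listDir: ListDir,
): Promise<string[]> {
  const isEmpty = await Promise.all(
    directories.map(async (dir) => (await listDir(dir)).length === 0),
  );
  return directories.filter((_, index) => isEmpty[index]);
}
```

In real use, `listDir` could be `(dir) => readdir(dir)` with `readdir` from `node:fs/promises`; error handling for unreadable directories is omitted here.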
2026-04-26 16:41:38 +09:00
Kazuki Yamada 0b6c775d68 fix(metrics): Address PR review feedback
- Fix calculateMetrics test to use parsableStyle: true so it exercises
  the fallback path (calculateOutputMetrics mock) instead of accidentally
  hitting the fast path
- Correct packager comment to clarify that git-log is cached but the
  array sort itself runs twice (negligible cost)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 23:55:01 +09:00
Kazuki Yamada ea174db0f8 test(metrics): Add tests for extractOutputWrapper and canUseFastOutputTokenPath
Export the two helper functions and add 13 unit tests covering:
- extractOutputWrapper: normal extraction, missing content (null),
  empty files, identical content, wrong order, no files, no wrapper
- canUseFastOutputTokenPath: each style variant, splitOutput,
  parsableStyle

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 23:41:32 +09:00
Kazuki Yamada 8321c7a11e refactor(metrics): Rename calculateSelectiveFileMetrics to calculateFileMetrics
The function now always calculates metrics for all files, so the
"Selective" prefix no longer reflects its behavior.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 20:47:06 +09:00
Kazuki Yamada 6dea6cb32e refactor(metrics): Always calculate per-file token counts for all files
Previously, per-file tokenization was only done for all files when
`tokenCountTree` was enabled; otherwise only the top N files (by char
count) were tokenized for the "Top Files" display. But output
tokenization always processes all file contents anyway, so the
"selective" path was not saving any work — it just prevented the fast
output-token path from being used.

Now we always tokenize every file individually, which:
- Enables the wrapper-extraction fast path regardless of tokenCountTree
- Simplifies the metrics pipeline by removing the conditional branching
- Provides complete per-file token data for all downstream consumers

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 17:58:41 +09:00
Claude 68a47b9149 perf(core): Skip redundant full-output tokenization via wrapper-extraction fast path (-13.2%)
When `tokenCountTree` is enabled, `calculateSelectiveFileMetrics` already
tokenizes every file individually on the primary worker pool. The original
`calculateOutputMetrics` then re-tokenized the full output a second time, split
into 200 KB chunks, to compute `totalTokens`. On large repos with the tree
display enabled, this second pass was the single longest task in the
`calculateMetrics` `Promise.all`, consuming roughly 1 second of worker time
that duplicated work already done for the per-file counts.

This change introduces a fast path for the common case (xml / markdown / plain
output, non-parsable, single-part): walk the generated output with
`indexOf(file.content, cursor)` once per file to splice file contents out of
the output, tokenize only the remaining "wrapper" (template boilerplate +
directory tree + git diff/log + per-file headers), and compute
`totalTokens = Σ per-file tokens + wrapper tokens`.
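
The forward-cursor walk can be sketched roughly as below. This is a simplified illustration, not the repository's `extractOutputWrapper`: the function names, the plain `string[]` input, and the choice to skip empty contents are all assumptions.

```typescript
// Hypothetical sketch: splice each file's content out of the output, in order,
// and return what is left (the "wrapper"). Returns null when a content string
// cannot be found at or after the cursor, signalling that the caller should
// fall back to tokenizing the full output.
function extractWrapper(
  output: string,
  fileContentsInOutputOrder: string[],
): string | null {
  const wrapperParts: string[] = [];
  let cursor = 0;
  for (const content of fileContentsInOutputOrder) {
    if (content.length === 0) continue; // assumption: nothing to splice out
    const index = output.indexOf(content, cursor);
    if (index === -1) return null; // escaped, split, or out of order
    wrapperParts.push(output.slice(cursor, index));
    cursor = index + content.length;
  }
  wrapperParts.push(output.slice(cursor));
  return wrapperParts.join("");
}

// totalTokens = sum of per-file tokens + wrapper tokens
function fastTotalTokens(perFileTokens: number[], wrapperTokens: number): number {
  return perFileTokens.reduce((sum, count) => sum + count, 0) + wrapperTokens;
}
```

Because the cursor only moves forward, the contents must be passed in the same order they appear in the output; a mismatch yields `null` rather than a wrong wrapper.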

The accuracy delta versus the old 200 KB-chunk approach is bounded by BPE
merges across file↔wrapper boundaries; on the repomix repository itself the
measured error was 309 / 1,284,067 tokens ≈ 0.024 %, comparable to the chunk
boundary error the existing approach already accepts.

## Implementation

- `src/core/metrics/calculateMetrics.ts`
  - Add `extractOutputWrapper(output, processedFilesInOutputOrder)` which
    walks the output with a single forward cursor. Returns `null` and
    triggers a fallback to `calculateOutputMetrics` if any file content is
    not found (e.g., template escaped it, output was split, order mismatch).
  - Add `canUseFastOutputTokenPath(config)` gate: only enabled when
    `tokenCountTree` is truthy, `splitOutput` is undefined, `parsableStyle`
    is false, and the style is `xml` / `markdown` / `plain`. JSON output
    and parsable XML go through `JSON.stringify` / `fast-xml-builder` which
    escape file contents, so `indexOf(content)` would miss them.
  - In `calculateMetrics`, when the fast path is available and wrapper
    extraction succeeds, replace `outputMetricsPromise` with a promise that
    awaits the already-running `selectiveFileMetricsPromise`, sums the
    per-file token counts, and dispatches a single `runTokenCount` on the
    extracted wrapper string. The rest of the `Promise.all` is unchanged.

- `src/core/packager.ts`
  - Call `sortOutputFiles(filteredProcessedFiles, config)` once in `pack`
    immediately after suspicious-file filtering and use its result as
    `processedFiles` downstream (for `produceOutput`, `calculateMetrics`,
    and the final result object). `generateOutput` internally calls
    `sortOutputFiles` as well, which is stable and memoized via
    `fileChangeCountsCache`, so the two now share the single git-log
    subprocess result and consumers see files in the exact order they
    appear in the output. This is a precondition for the fast path's
    forward-walk extraction.
  - Expose `sortOutputFiles` on `defaultDeps` so existing packager unit
    tests can inject their own implementation.

- `tests/core/packager/diffsFunctionality.test.ts`
  - Extend the `gitRepositoryHandle.js` `vi.mock` to also stub
    `isGitInstalled` and `getFileChangeCount`, since `sortOutputFiles`
    resolves its default dependencies from that module at module load time.

All 1102 existing tests pass unchanged; lint is clean.

## Benchmark

Interleaved 30-run benchmark against the repomix repo itself (1018 files,
~4 MB xml output, `tokenCountTree: 50000`, `sortByChanges: true`, `includeDiffs`
and `includeLogs` enabled via the repo's own `repomix.config.json`):

    base median: 2735.2 ms  [2389 - 3528]  IQR=367 ms
    opt  median: 2373.6 ms  [2125 - 2653]  IQR=293 ms
    delta:       -361.6 ms  (-13.22%)

Verbose trace before/after (single run, representative):

    before:
      Selective metrics calculation completed in 639 ms
      Output token count completed in      1046 ms
      Calculate Metrics wall:               1296 ms

    after:
      Selective metrics calculation completed in 579 ms
      Fast-path output tokens: files=1017293, wrapper=33678 (126996 chars)
      Calculate Metrics wall:                ~580 ms

The savings are concentrated in the `calculateMetrics` phase, which was the
dominant critical path in the final `Promise.all` for tokenCountTree runs on
large repos.
2026-04-12 17:47:03 +09:00
Kazuki Yamada 96ebaaeb88 test(output): Update xmldom API usage for v0.9
The @xmldom/xmldom upgrade removed the errorHandler option and changed
documentElement to be nullable. Switch to the new onError callback and
add null checks so the test suite compiles and runs against the new API.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 18:29:06 +09:00
Kazuki Yamada 9e199164e3 fix(metrics): Use 'characters' instead of 'KB' in chunk size comments
JS strings use UTF-16 encoding where character count != byte count.
Use 'K characters' for technical accuracy.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 17:51:14 +09:00
Claude b59952174b perf(metrics): Increase output token counting chunk size from 100KB to 200KB
Benchmarks show 200KB chunks are optimal for output token counting,
reducing worker round-trips while maintaining good parallelism across
available CPU cores.

For a 3.9MB output (typical large repo), this reduces chunks from 39
to 20, saving ~46ms per run due to fewer structured-clone round-trips.

Benchmark results (repomix self-pack, 996 files, 3.8M chars, 5 runs):
- Before (100K chunks): 1384ms median
- After (200K chunks):  1293ms median
- Improvement: ~91ms = ~6.6%

Combined with existing batch IPC optimization, total improvement vs
baseline is ~156ms = ~10.8%.

https://claude.ai/code/session_01NjmXXUzBrB2oe4FD82NpGe
2026-04-06 17:51:14 +09:00
Kazuki Yamada e475825965 perf(metrics): Batch token counting IPC to reduce worker round-trip overhead
Selective file metrics previously sent one IPC round-trip per file to
worker threads for token counting. With ~991 files and ~0.5ms overhead
per round-trip, this added ~495ms of pure IPC waste.

This change introduces batch mode for the metrics worker, grouping files
into batches of 50 before sending to workers. This reduces round-trips
from 991 to 20.
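
The batching arithmetic can be checked with a small generic helper. This is a sketch; `toBatches` is a hypothetical name and not the function used in the metrics worker.

```typescript
// Hypothetical helper: group items into consecutive batches of at most batchSize.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let start = 0; start < items.length; start += batchSize) {
    batches.push(items.slice(start, start + batchSize));
  }
  return batches;
}

// 991 files in batches of 50: 19 full batches plus one batch of 41.
const fileBatches = toBatches(Array.from({ length: 991 }, (_, i) => i), 50);
```

Each batch then costs one worker round-trip instead of one per file.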

Type safety improvement over the original approach: instead of scattering
`as number` casts across all callers, a new metricsWorkerRunner module
centralizes the type narrowing in two helper functions (runTokenCount and
runBatchTokenCount), keeping all other modules fully type-safe.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 03:11:31 +09:00
Kazuki Yamada 7de692329a refactor(core): Inject getProcessConcurrency via deps in securityCheck
Move getProcessConcurrency from a direct module import to the deps
parameter for consistency with initTaskRunner. This makes it easier
to test with different concurrency values without module-level mocking.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 01:29:35 +09:00
Kazuki Yamada 2a879425a0 perf(core): Reduce worker thread contention for faster pipeline execution
Add maxWorkerThreads option to WorkerOptions for explicit thread count
capping, then use it to reduce CPU contention when metrics and security
worker pools run concurrently during the pipeline overlap phase.

- Metrics pool: capped at (processConcurrency - 1)
- Security pool: capped at floor(processConcurrency / 2)

On a 4-core machine this reduces concurrent threads from 8 (4+4) to 5
(3+2), avoiding context-switching overhead during gpt-tokenizer warmup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 00:48:48 +09:00
Kazuki Yamada 55d3293134 perf(core): Skip binary files during GitHub archive tar extraction
Add an archive entry filter that checks file extensions with isBinaryPath
before writing to disk, avoiding unnecessary I/O for binary files (images,
fonts, executables, etc.) that would be excluded later anyway.

The filter strips the leading tar segment (e.g. "repo-branch/") since tar's
filter callback receives paths before strip is applied.
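
A rough sketch of such an entry filter, under stated assumptions: the extension set below is an illustrative stand-in for the `isBinaryPath` check, and the function names are hypothetical rather than taken from the repository.

```typescript
// Illustrative stand-in for a real binary-extension check.
const BINARY_EXTENSIONS = new Set([".png", ".jpg", ".woff2", ".exe"]);

function looksBinary(filePath: string): boolean {
  const dot = filePath.lastIndexOf(".");
  return dot !== -1 && BINARY_EXTENSIONS.has(filePath.slice(dot).toLowerCase());
}

// Drop the leading "repo-branch/" segment that the archive prepends.
function stripLeadingSegment(entryPath: string): string {
  const slash = entryPath.indexOf("/");
  return slash === -1 ? entryPath : entryPath.slice(slash + 1);
}

// Returns true when the entry should be written to disk.
function shouldExtract(
  entryPath: string,
  isBinary: (filePath: string) => boolean = looksBinary,
): boolean {
  return !isBinary(stripLeadingSegment(entryPath));
}
```

A function of this shape could be passed as the extraction filter so that binary entries are never written to disk.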

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 23:25:19 +09:00
Kazuki Yamada d30ad69c68 perf(security): Reduce batch size from 500 to 50 for better parallelism
A batch size of 50 still reduces IPC round-trips by ~98% (990 → 20)
while producing enough batches to utilize all available CPU cores
on multi-core systems.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 00:31:31 +09:00
Kazuki Yamada fbb925b467 fix(security): Add numOfTasks comment and fix test batch size references
- Add comment explaining why numOfTasks uses totalItems instead of
  batches.length (passing batches.length would yield maxThreads=1,
  forcing sequential execution)
- Fix test comments that incorrectly referenced batch size 100
  when actual BATCH_SIZE is 500

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 00:23:54 +09:00
Kazuki Yamada 1c99f9617b perf(security): Batch security check tasks to reduce IPC overhead
2026-04-03 23:42:52 +09:00
Kazuki Yamada 55215e9c78 perf(metrics): Warm up all worker threads instead of just one
Fire maxThreads warmup tasks so every worker thread has gpt-tokenizer
loaded before metrics calculation begins. Combined with the early
warmup position (before collectFiles/securityCheck), this eliminates
cold-start latency on all threads without adding to the critical path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 16:27:52 +09:00
Kazuki Yamada 8f5ef2ce4d test(metrics): Add unit tests for createMetricsTaskRunner
Cover the new factory function: return shape validation, warmup task
payload verification, and error swallowing behavior.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 16:06:37 +09:00
Kazuki Yamada cfbab618c5 refactor(metrics): Encapsulate warmup logic in createMetricsTaskRunner
Move worker thread warmup from packager into createMetricsTaskRunner,
which now returns both a taskRunner and warmupPromise. This keeps the
packager clean — it no longer needs to know warmup implementation details.

Also:
- Skip metrics worker pool creation on skill-generation path where
  it is unused
- Await warmupPromise in finally block before cleanup to prevent
  tearing down workers during initialization

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 14:56:49 +09:00
Kazuki Yamada b7cf3f05e2 refactor(git): Simplify URL construction by leveraging codeload.github.com auto-resolution
codeload.github.com resolves branches, tags, and SHAs automatically
without refs/heads/ or refs/tags/ prefixes. This eliminates the tag
fallback URL entirely and simplifies buildGitHubArchiveUrl to a single
return statement, saving an extra round trip for tag-based downloads.
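
A single-return URL builder along these lines might look like the sketch below. The function name, its parameters, and the `/{owner}/{repo}/tar.gz/{ref}` path layout are assumptions for illustration; they are not copied from `buildGitHubArchiveUrl`.

```typescript
// Hypothetical sketch: one URL for branches, tags, and SHAs alike,
// with no refs/heads/ or refs/tags/ prefix on the ref.
function buildArchiveUrl(owner: string, repo: string, ref: string): string {
  return `https://codeload.github.com/${owner}/${repo}/tar.gz/${ref}`;
}
```

With no tag-specific fallback URL, a tag download needs only one request.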

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 14:47:30 +09:00
Kazuki Yamada 951249f9cf perf(git): Use codeload.github.com directly to skip 302 redirect
Download archives from codeload.github.com instead of github.com/archive
to eliminate the intermediate 302 redirect, saving ~100-300ms per request.
This is the same pattern used by create-react-app and degit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 14:35:34 +09:00
Kazuki Yamada ed8defffee perf(metrics): Reduce output token counting chunks from ~1000 to ~10
CHUNK_SIZE was used as the number of chunks (1000), creating ~1KB chunks
for 1MB output. Each chunk dispatched a worker task with ~0.5ms overhead
for serialization, scheduling, and callback resolution, totaling ~500ms
of overhead that dominated the actual tokenization work.

Replace with TARGET_CHARS_PER_CHUNK (100,000) so chunks are sized by
content rather than count. A 1MB output now produces ~10 chunks instead
of ~1000, reducing worker round-trip overhead by ~99%.
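
Sizing chunks by character count rather than by chunk count can be sketched as below. `splitIntoChunks` is a hypothetical name; only `TARGET_CHARS_PER_CHUNK` and its value come from the commit message.

```typescript
const TARGET_CHARS_PER_CHUNK = 100000;

// Hypothetical helper: cut the output into consecutive slices of at most
// targetChars characters each. Slicing by UTF-16 code unit can split a
// surrogate pair at a boundary; this sketch does not try to avoid that.
function splitIntoChunks(
  text: string,
  targetChars: number = TARGET_CHARS_PER_CHUNK,
): string[] {
  if (!Number.isInteger(targetChars) || targetChars <= 0) {
    throw new RangeError("targetChars must be a positive integer");
  }
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += targetChars) {
    chunks.push(text.slice(start, start + targetChars));
  }
  return chunks;
}

// A 1,000,000-character output yields 10 chunks instead of ~1000.
const outputChunks = splitIntoChunks("x".repeat(1000000));
```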

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 13:53:09 +09:00
Kazuki Yamada 96a6a7c804 perf(core): Cache empty directory paths to avoid redundant file search
When includeEmptyDirectories is enabled, buildOutputGeneratorContext
called searchFiles a second time just to obtain emptyDirPaths, despite
these already being computed during the initial file search in packager.

Changes:
- Capture emptyDirPaths from the initial searchFiles result in packager
  and thread them through the pipeline (packager → produceOutput →
  generateOutput/outputSplit → buildOutputGeneratorContext)
- Guard emptyDirPaths processing with includeEmptyDirectories check to
  skip unnecessary work when the feature is disabled
- Fix split output path which was not receiving emptyDirPaths despite
  the parameter being declared in produceOutput's signature
- Add tests for cache hit (searchFiles not called) and fallback paths

Local benchmark (repomix on itself, includeEmptyDirectories: true):
  main:   696.6ms ± 4.2ms
  branch: 637.1ms ± 2.6ms
  Improvement: ~60ms (~8.5%)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 00:03:49 +09:00
Kazuki Yamada ee5a748a0f refactor(core): Simplify calculateMetrics output param to Promise-only
The output parameter was typed as `string | string[] | Promise<...>` but
callers can always wrap sync values in Promise.resolve(). Simplifying to
`Promise<string | string[]>` makes the interface cleaner.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 23:14:15 +09:00