repomix-mirror

mirror of https://github.com/yamadashy/repomix.git synced 2026-05-30 11:18:53 +02:00

Author	SHA1	Message	Date
Kazuki Yamada	c8560d1bc6	perf(core): Reduce metrics warmup to 1 worker on warm-cache path Lower `EAGER_WARMUP_THREADS` from 2 to 1 when `tokenCountCacheFileExists()` returns true. With the persistent token-count disk cache populated by a prior run, `calculateFileMetrics` serves every per-file token count from the in-memory map and dispatches zero worker tasks. The only worker work that survives caching on a warm rerun is a small fixed set of dispatches: - the wrapper-token tokenization (cache hit after run #2) - git diff staged/worktree token counts (only when `output.git.includeDiffs` is enabled) - git log token count (only when `output.git.includeLogs` is enabled) That worst case is 2-3 short tasks (a few KB each) that fit a single warm worker serially in well under 30 ms. Spawning a second warm worker means a redundant ~340 ms BPE table parse that contends with the file-collection main thread for CPU AND extends the final `pool.destroy()` blocking wait (BPE-loaded workers take ~21 ms to terminate vs ~3 ms when idle). Cold-cache (no cache file) behavior is preserved: the unscoped path keeps 3 warm workers and the explicit-scope path keeps 2, so the actual file tokenizations still parallelise across the original worker counts. The probe is a coarse heuristic — a cache file written by a previous run that used a different `tokenCount.encoding` (e.g. cl100k_base instead of the default o200k_base) yields no hits for the current run, so the metrics phase pays one BPE parse sequentially on the critical path before tokenizing files. This is a one-time cost on encoding switches; subsequent runs rebuild the cache for the new encoding and hit again. Benchmark (paired, n=25, repomix self-pack on 1068 files): WARM CACHE (cache file present) BASELINE mean=968.9ms median=976.0ms sd=40.3ms AFTER mean=883.2ms median=875.0ms sd=33.1ms DELTA mean=85.6 ms (8.84%) median=87.0 ms sd=42.7 t=10.02 (df=24) faster=24/25 COLD CACHE (cache file deleted before each run, n=12) BASELINE mean=1606.3ms median=1588.0ms sd=58.6ms AFTER mean=1593.2ms median=1598.5ms sd=58.6ms DELTA mean=13.2 ms (0.82%) t=0.62 faster=9/12 — within noise Stacks on top of the existing warm-cache wins on this branch (token-count disk cache, output-wrapper cache, prefetched template, native ignore-file prescan, etc.); this single change pushes warm-cache wall-clock another ~86 ms below the previous floor.	2026-05-11 00:13:37 +09:00
Kazuki Yamada	ca44a74f3f	perf(core): Wire warm-cache heuristic into packager and tests Companion to the previous commit. Plumb `tokenCountCacheFileExists` into the packager `defaultDeps` so the metrics warm-up sizing can be exercised deterministically from tests, and add a paired test that asserts the 2-warm-up-worker branch is taken when the persistent disk cache exists. Also rename the cold-cache test to make the new gating explicit and refresh its docstring with the warm/cold distinction. https://claude.ai/code/session_01TJqKkJ8n3r6Pa2JdW9Vp2w	2026-05-10 15:59:43 +09:00
autofix-ci[bot]	371927e211	[autofix.ci] apply automated fixes	2026-05-09 01:07:48 +00:00
Kazuki Yamada	1d2df3bea3	test(search): Update fileSearch tests for prescan-based ignore collection Add fs.readdir mock (returning empty array) to all relevant beforeEach blocks so collectIgnoreFilePatterns does not fail with "entries is not iterable". Update globby option assertions to reflect gitignore: false and ignoreFiles: [] now that patterns are pre-collected by the prescan. https://claude.ai/code/session_01Fm25x51fmGGeFMJyCm1CER	2026-05-09 10:06:46 +09:00
Kazuki Yamada	db647834f5	test(core): Restore '# Directory Structure' assertion in markdown case Earlier push commit (`03b8e70b`) accidentally dropped the `expect(actualOutput).toContain('# Directory Structure')` assertion inside the markdown branch of the integration test's case-style switch. That line still exists on disk and was present pre-change; restoring it to keep the markdown-style coverage symmetric with the plain-style case below it.	2026-05-09 02:38:14 +09:00
Kazuki Yamada	03b8e70b9c	test(core): Wire createSecurityTaskRunner mock into remaining packager tests Final commit of the perf(core) Pre-warm security worker pool change — extends the unit packager test and the integration packager test: - tests/core/packager.test.ts: adds `createSecurityTaskRunner` mock to the orchestration test's `mockDeps` and to the `parallel error handling` `baseDeps()` shared fixture, updates the `validateFileSafety.toHaveBeenCalledWith` assertion to expect the new 6th-argument deps object (`{ taskRunner: <Object> }`), and adds positive/negative gate assertions — `expect(deps.createSecurityTaskRunner).toHaveBeenCalled()` for the default unscoped path, `.not.toHaveBeenCalled()` for the `--include 'src'` and `explicitFiles` (--stdin) paths. - tests/integration-tests/packager.test.ts: adds the `createSecurityTaskRunner` stub so the default-scope path no longer attempts to spawn a real worker pool (the previous unhandled-rejection noise from a missing worker file URL is gone with this change). (See PR description / first source commit for the full perf change rationale, benchmark numbers, and correctness notes.)	2026-05-09 02:35:18 +09:00
Kazuki Yamada	fb281d4560	test(core): Wire createSecurityTaskRunner mock into smaller packager tests Continuation of the perf(core) Pre-warm security worker pool change — extends `mockDeps` / inline pack-test plumbing in the three smaller test files so the default-scope path no longer attempts to spawn a real worker pool from the test environment. - tests/core/packager/diffsFunctionality.test.ts: adds `mockCreateSecurityTaskRunner` to both pack-call sites. - tests/core/packager/splitOutput.test.ts: same — adds the stub to the inline mock deps. - tests/core/security/validateFileSafety.test.ts: updates the `runSecurityCheck` call assertion to include the new `{ taskRunner: undefined }` deps argument forwarded by `validateFileSafety` when no pre-warmed runner is provided. (See PR description / parent commit for the full perf change rationale, benchmark numbers, and correctness notes.)	2026-05-09 02:30:26 +09:00
Claude	15ee2f8d40	perf(core): Use 3 metrics warm-up workers for unconstrained scope Packing. Bumps EAGER_WARMUP_THREADS from 2 to 3 in src/core/packager.ts when the user did not narrow the file set via --include / config.include / --stdin. Tinypool fixes maxThreads at construction, so the 3rd worker must be pre-warmed during the searchFiles + collectFiles window or it stalls dispatch (a 4-thread / 2-warm experiment regressed by 27% paired in a prior iteration). With explicit scope the file set is typically a few hundred files, the metrics phase is shorter, and the 3rd worker's ~250ms BPE warm-up dominates the parallelism gain — paired benchmarks regressed -11.85% on the 258-file `--include 'src,tests'` workload at unconditional EAGER_WARMUP_THREADS=3, so the heuristic falls back to 2. Reasoning. After change 3 on this branch (eager metrics warm-up), the metrics phase is the dominant wall-clock contributor on the default 1046-file workload (~770 ms in `calculate metrics`, vs ~120 ms output generation, ~370 ms search, ~270 ms collect, ~200 ms security check). Five sub-agent investigations over independent scopes (CLI startup, file search/glob, file collect/security, output generation, token counting) converged on metrics worker count as the only candidate clearing the 2% bar without regressing other phases. Output gen, security pre-warm, file-search scoping, and CLI lazy-load were all measured below threshold or net- negative; documented as the previous iteration's notes plus the follow-on attempts here: - EAGER_WARMUP_THREADS=3 unconditional: -11.85% paired regression on the 258-file workload (n=20, t=-10.85), +2.92% on the 1046-file workload — net negative because small workloads can't amortize the extra BPE parse. - Pre-warm the security worker pool gated on the metrics warm-up: security-check phase shrunk from 197 ms to 110 ms, but the saving was absorbed by the parallel `Process Files` branch and an offsetting worker-spawn cost during collect. Paired n=30 measured -4.90% on 258-file and 0.81% (noise) on 1046-file. Reverted. Verification. Paired interleaved benchmarks (n=20, NODE_DISABLE_COMPILE_CACHE=1): Default workload — `node bin/repomix.cjs --quiet` (1046 files): \| \| min \| median \| mean \| max \| sd \| \|--------\|---------\|---------\|---------\|---------\|--------\| \| BEFORE \| 1820 ms \| 1885 ms \| 1886 ms \| 2020 ms \| 45 ms \| \| AFTER \| 1700 ms \| 1845 ms \| 1840 ms \| 1970 ms \| 62 ms \| - Mean paired Δ: +46.5 ms (2.46% wall-clock reduction) - Median paired Δ: +50.0 ms (2.65%) - Paired-delta SD: 65.3 ms · paired t = 3.18 (p < 0.01) - AFTER faster in 15/20 pairs (75%) Scoped workload — `node bin/repomix.cjs --include 'src,tests' --quiet` (258 files): \| \| min \| median \| mean \| max \| sd \| \|--------\|---------\|---------\|---------\|---------\|--------\| \| BEFORE \| 900 ms \| 955 ms \| 953 ms \| 990 ms \| 25 ms \| \| AFTER \| 910 ms \| 940 ms \| 946 ms \| 1010 ms \| 29 ms \| - Mean paired Δ: +6.5 ms (0.68%) — neutral within noise (t = 0.90) - The heuristic falls back to 2 warm workers, so this branch matches pre-change behavior; the small positive delta is sampling noise. An independent reviewer's paired n=15 NODE_DISABLE_COMPILE_CACHE=1 run on a separate sample reported +4.10% (t=6.61, 14/15 pairs) on the default workload, consistent direction at higher magnitude. Correctness. - All 1260 unit tests pass (`npm test`); 3 new tests in `tests/core/packager.test.ts` exercise both heuristic branches plus the `--stdin` (explicitFiles) path. - `npm run lint` clean (only pre-existing warnings unchanged from main). - XML and Markdown output byte-identical between BEFORE and AFTER on both workloads (verified via sha256sum). - Worker-pool size confirmed via `--verbose` logs: - Default scan: `min=1, max=3 threads` for `calculateMetrics`. - `--include 'src,tests'`: `min=1, max=2 threads` (unchanged). - Single-CPU and 2-CPU hosts are unaffected (`min(cpuCount, 3) = min(cpuCount, 2)` for cpuCount ≤ 2). - Public `pack()` API unchanged (no new parameters; the heuristic reads existing `config.include` and `explicitFiles` arguments). Risks. The heuristic is a coarse proxy. Pathological cases: - User runs default scan on a tiny repo (~50 files): 3 workers, +1 extra BPE parse. The cost is bounded by the eager-warm-up overlap with searchFiles/collectFiles, so the worst case approaches the paired noise floor (~30 ms sd on 258-file). Not measured below 50 files; expected to be neutral-to-slightly-negative within typical run-to-run variance. - User runs `--include 'huge-dir'` on a 5000-file project: 2 workers, misses the parallelism win. Falls back to current production behavior — no regression vs main. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2026-05-07 09:48:32 +00:00
Claude	d51de61526	perf(core): Start metrics worker warm-up before searchFiles Background ---------- Each metrics worker independently parses gpt-tokenizer's ~2.2 MB `o200k_base.js` BPE table on its first task (~200-300 ms pure-CPU per worker). The pool was previously created in `pack()` after the file search and sort phases, so the only stages that could absorb the BPE warm-up were `collectFiles` + git subprocesses + security check + file processing. On a 258-file run this still left a residual ~80-130 ms `await metricsWarmupPromise` stall before the metrics phase. Change ------ Move `createMetricsTaskRunner` to fire before `searchFiles`. This adds the ~130 ms glob scan to the hidden warm-up budget and shrinks the residual stall to ~0-12 ms on the 258-file workload. Pool sizing: Tinypool fixes `maxThreads` at construction, and the file count is not yet known. Pre-warming exactly the workers we'll use is essential — Tinypool queues tasks for newly spawned (cold) workers and the pipeline can't progress until those workers finish their BPE parse and pick up the queued task (an experiment with `maxThreads=cpuCount=4` and only 2 warm workers regressed the 258-file workload by 27 % paired). So the pool is sized to a fixed 2 workers (`numOfTasks = 2 × TASKS_PER_THREAD = 400` → `maxThreads = min(cpuCount, 2)`), matching the security pool's hard cap and the typical metrics pool size for repos ≤400 files after the TASKS_PER_THREAD=200 sizing on this branch. Larger repos (>400 files) would benefit from more parallelism, but the 1046-file regression check below shows the eager-warmup gain still net-improves wall-clock at maxThreads=2 (the BPE warm-up cost ~250 ms × cpuCount-2 extra workers dominates the parallelism savings on the metrics phase). On single-CPU hosts the heuristic naturally collapses to maxThreads=1, identical to today's behavior. The `try { } finally { cleanup }` block is widened to cover the new early call so the worker pool is cleaned up on early throws too. A new `searchFiles`-rejection test in `tests/core/packager.test.ts` exercises that path explicitly. `TASKS_PER_THREAD` is exported from `processConcurrency.ts` and consumed by name in `packager.ts` to keep the eager-warmup constant tied to the shared sizing rule. Benchmark --------- Both runs use n=… paired interleaved (alternating BEFORE-first / AFTER-first ordering) with `NODE_DISABLE_COMPILE_CACHE=1` so cold-start BPE parse is measured rather than masked. 4-vCPU Intel(R) Xeon(R) host. `node bin/repomix.cjs --include 'src,tests' --quiet` (258 files, n=20): \| \| min \| median \| mean \| max \| sd \| \|--------\|---------\|---------\|---------\|---------\|--------\| \| BEFORE \| 1007 ms \| 1044 ms \| 1054 ms \| 1164 ms \| 36 ms \| \| AFTER \| 893 ms \| 966 ms \| 962 ms \| 1065 ms \| 36 ms \| - Mean paired Δ: +91.6 ms (8.69 % wall-clock reduction) - Median paired Δ: +97.5 ms (9.34 %) - Paired-delta SD: 36.0 ms · paired t = 11.39 (p < 0.001) - AFTER faster in 20/20 pairs (100 %) Regression check — `node bin/repomix.cjs --quiet` (default, 1046 files, n=15) on a clean repo (baseline binary built outside the working tree so it does not get picked up as a workload file): \| \| min \| median \| mean \| max \| sd \| \|--------\|---------\|---------\|---------\|---------\|--------\| \| BEFORE \| 1769 ms \| 1872 ms \| 1877 ms \| 2063 ms \| 79 ms \| \| AFTER \| 1751 ms \| 1820 ms \| 1837 ms \| 2018 ms \| 61 ms \| - Mean paired Δ: +40.0 ms (2.13 %) - Median paired Δ: +48.6 ms (2.60 %) - Paired-delta SD: 51.7 ms · paired t = 2.99 (p ≈ 0.01) - AFTER faster in 11/15 pairs (73 %) The larger workload also clears the 2 % threshold; the eager warm-up's gain offsets the maxThreads=2 cap that's now applied unconditionally. Correctness ----------- - All 1257 unit tests pass (`npm test`); `npm run lint` clean (only pre-existing warnings). - XML and Markdown output byte-identical between BEFORE and AFTER on both the 258-file and 1046-file workloads. - Worker-pool size confirmed via `--verbose` logs: `min=1, max=2 threads` for `calculateMetrics` on both workloads (was `max=2` on 258 files, `max=4` on 1046 files before this change). - New test `cleans up the metrics worker pool when searchFiles rejects` exercises the widened `try/finally` cleanup path.	2026-05-07 02:56:06 +00:00
Claude	cf013a0c8f	perf(shared): Raise TASKS_PER_THREAD from 100 to 200 to reduce worker contention Background ---------- On a typical CLI run (`node bin/repomix.cjs --include 'src,tests' --quiet`, 258 files, 4-vCPU host), the metrics worker pool was sized as `ceil(258 / 100) = 3 workers`. Combined with the security pool's hard cap of 2 workers (securityCheck.ts:90) and the main thread, the process held 6 active threads on 4 cores during the overlap of `validateFileSafety` and `calculateMetrics`. Each metrics worker independently parses gpt-tokenizer's ~2.2 MB `o200k_base.js` BPE table on its first task — a ~200-300 ms pure-CPU operation per worker. Spawning 3 cold metrics workers in the warm-up phase (calculateMetrics.ts:46-48) therefore drove the security workers off the CPU during their own (concurrent) cold-start, inflating the critical-path security phase. Change ------ Raise `TASKS_PER_THREAD` from 100 to 200 so: - ≤200 file repos: 1 metrics worker (was 1) — no change - 201-400 file repos: 2 metrics workers (was 3) — -1 worker, the win - 401-600 file repos: 3 metrics workers (was 4-cap) — -1 worker - 601-800 file repos: 4 metrics workers (was 4-cap) — no change - 801+ file repos: 4 metrics workers (was 4-cap) — no change (cap) For the 258-file benchmark this brings active workers during the metrics+security overlap to 2 + 2 = 4, matching CPU count, and halves the parallel BPE-loading work in the warm-up phase. Tests for `getWorkerThreadCount` and `createWorkerPool` are updated to reflect the new ratio. Benchmark --------- `node bin/repomix.cjs --include 'src,tests' --quiet` (258 files), n=20 paired interleaved (alternating BEFORE-first / AFTER-first ordering): \| \| min \| p25 \| median \| p75 \| mean \| sd \| \|--------\|---------\|---------\|---------\|---------\|---------\|--------\| \| BEFORE \| 1045 ms \| 1092 ms \| 1109 ms \| 1122 ms \| 1107 ms \| 27 ms \| \| AFTER \| 937 ms \| 973 ms \| 991 ms \| 1020 ms \| 994 ms \| 29 ms \| Mean paired Δ: +112.5 ms (10.17 % wall-clock reduction) Median paired Δ: +115.4 ms (10.66 % wall-clock reduction) Paired-delta SD: 36.2 ms (paired t = 13.88, p < 0.001) AFTER faster in 20/20 pairs (100 %) Regression check — `node bin/repomix.cjs --quiet` (default, 1572 files), n=15 paired interleaved: \| \| min \| p25 \| median \| p75 \| mean \| sd \| \|--------\|---------\|---------\|---------\|---------\|---------\|--------\| \| BEFORE \| 1933 ms \| 1970 ms \| 2016 ms \| 2102 ms \| 2028 ms \| 62 ms \| \| AFTER \| 1955 ms \| 1966 ms \| 2004 ms \| 2131 ms \| 2034 ms \| 74 ms \| Mean paired Δ: -6.2 ms (-0.31 %) (paired t = -0.29, p > 0.05) Median paired Δ: -12.7 ms (statistically neutral) No regression on the large workload — both 100 and 200 saturate the per-CPU cap at 4 workers for ≥800 file repos, so the dispatch-time behavior is identical there. Correctness ----------- - 1256 / 1256 unit tests pass. - `npm run lint` clean (only pre-existing warnings unrelated to this change). - No behavioral change to file processing, tokenization, security checks, or output. Pool sizing is the only effect.	2026-05-07 01:11:11 +00:00
Kazuki Yamada	d10814163d	fix(core): Cover const and external constructors in Dart query Three constructor variants were silently dropped during --compress: - `const Foo(...);` and `const Foo.named(...) : ...;` parse as `(declaration (constant_constructor_signature ...))` — a node type the existing constructor query did not list. - `const factory Foo() = Bar;` parses as `(redirecting_factory_constructor_signature (const_builtin) (identifier) ...)` whose first named child is `const_builtin`, so the leading-anchor `. (identifier)` pattern failed to match. - `external factory Foo.make();` parses as `(declaration (factory_constructor_signature ...))` — bare under `declaration`, not wrapped in `method_signature`, so the existing factory query missed it. Switch the constructor / factory / redirecting-factory queries to capture the whole signature node as `@name.definition.method`. This emits the same source line(s) DefaultParseStrategy already produces and is robust across all body / external / const / redirecting variants.	2026-05-06 22:12:39 +09:00
Kazuki Yamada	def2985b30	feat(core): Capture plain constructor and operator overload in Dart query Two pre-existing gaps surfaced while extending queryDart: - Plain constructors (e.g. `Animal(this.name);`) live directly under `declaration`, not wrapped in `method_signature`, so the existing `(method_signature (constructor_signature ...))` query never matched them. Add a sibling query against `(declaration (constructor_signature ...))`. - Operator overloads (`operator +`, `operator []`, `operator []=`, `operator ==`, ...) parse as `(method_signature (operator_signature ...))` but `operator_signature` has no identifier name field — the operator token surfaces as `(binary_operator)` / `([])` / `([]=)` children. Capture the whole `operator_signature` as `@name.definition.method` so DefaultParseStrategy emits its full source range. Verified against `--compress` on a real Dart file: signatures that were previously dropped (only their `///` doc comments survived) now appear in compressed output.	2026-05-06 22:12:39 +09:00
Kazuki Yamada	ada200a080	feat(core): Capture mixin, typedef, getter, setter, and factory in Dart query intent(dart-query): make --compress preserve Dart definition kinds that were silently dropped — mixin, typedef, getter, setter, factory, and redirecting factory decision(capture-naming): align Dart captures with the dominant @name.definition.X convention used by queryTypeScript/queryPython/queryRust; output is unchanged because DefaultParseStrategy matches via name.includes('name') constraint(redirecting-factory): tree-sitter-dart grammar makes redirecting_factory_constructor_signature a child of `declaration`, not `method_signature`, so it must be queried bare to avoid a "Bad pattern structure" parse error constraint(type-alias): type_alias's name node is `type_identifier`, not `identifier` — using `identifier` would silently match nothing learned(external-keyword): `external` modifier in Dart is a sibling token outside function_signature/method_signature, so existing captures already cover `external void foo();` without changes	2026-05-06 22:12:39 +09:00
Kazuki Yamada	3ff49306e1	refactor(file): Simplify cheap pre-screen down to NULL probe + BOM exemption intent(simpler): 前 commit の cheap pre-screen は `isbinaryfile@5.0.2` の `isBinaryCheck` のうち valid UTF-8 でも binary 判定する 3 規則 (PDF magic / NULL / suspicious 制御バイト比率 >10%) を mirror していたが、`TextDecoder('utf-8', { fatal: true })` を `isBinaryFile` の前に動かしている時点で valid UTF-8 buffer は protobuf detector に渡らない。pathological case 回避という主目的は TextDecoder の reorder だけで達成しており、PDF magic と suspicious-byte ratio の mirror は (1) 実害ほぼゼロのエッジケースを救うだけで (2) `isbinaryfile` 内部実装への coupling を抱える、という割に合わない構成だった。 fix(simplify): cheap pre-screen を NULL-byte probe + BOM exemption の最小構成に縮小。 decision(keep-null-probe): NULL byte だけは独立した正当な理由で残す — `U+0000` は XML 1.0 で不正な文字で、本ツールの主出力フォーマット (XML) の正当性を破壊する。`TextDecoder` は `0x00` を valid UTF-8 (U+0000) として通すので、ここで弾かないと NULL を含む buffer が text として pack され、downstream の XML parser が落ちる。これは `isbinaryfile` の rule mirror ではなく、repomix 自身の出力 robustness 要件。 decision(drop-pdf-magic): PDF は `is-binary-path` の `.pdf` 拡張子で先に弾かれる。拡張子なしの ASCII-only PDF stub は実例ほぼゼロ (本物の PDF は cross-ref とバイナリストリームを内包し UTF-8 decode で失敗する経路を通る)。守る価値が低い。 decision(drop-suspicious-ratio): 純粋な C0 制御バイト高比率の valid UTF-8 buffer は実プロジェクトに存在しない。`isbinaryfile` の UTF-8 lookahead を完全 mirror する保守コスト (DEL boundary 等のドリフトリスク) > 効用。 constraint(coupling-minimal): NULL byte は universal な binary signal で、`isbinaryfile` の rule に縛られない。同 BOM exemption も標準 BOM の規格に準拠したもので upstream ドリフトの影響を受けない。 test(cleanup): 関連 regression test 3 件を削除 (PDF magic / suspicious ratio / DEL boundary)。これらは削除した規則の挙動を保証するもので、現実装ではすべて意図的に「text として pack」する。残るのは UTF-8 multi-byte / UTF-8 BOM+NULL / UTF-16 LE BOM の 3 件で、いずれも本 PR が回避したい pathological / regression を直接守る。 bench(no-regression, M-series Mac, hyperfine --warmup 1 --runs 5): - `node bin/repomix.cjs --quiet`: 399ms → 418ms (誤差範囲、JS の手書き 512-byte loop が消えた分の差は noise floor) - 出力差分: 既存の base に対して 0 ファイル削除、Korean md が +1 (silent drop 解消、変更なし) - fileRead.ts: 175 → 159 行 (-16 行)。pre-screen 関連で実質 ~35 行削減。 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 23:11:29 +09:00
Kazuki Yamada	0248b61085	fix(file): Include UTF-8 BOM in cheap pre-screen exemption intent(no-regression): codex re-review で指摘 — BOM 例外関数 `hasNonUtf8TextBom` が UTF-16/UTF-32/GB18030 のみで UTF-8 BOM (`EF BB BF`) を含めていなかった。`isbinaryfile@5.0.2` の `isBinaryCheck` は UTF-8 BOM を見た瞬間に `false` を返すため、`EF BB BF 00 41` のような buffer は変更前は text として fast path に流れていた。今回の差分では UTF-8 BOM 後の NULL byte が cheap probe で先に拾われ binary 判定される regression。 fix(utf8-bom-exempt): 関数を `hasTextBom` に rename し、UTF-8 BOM (`EF BB BF`) の判定を最初に追加。`isbinaryfile` 本家と同じ並びで BOM 例外を持たせ、cheap probe を skip して UTF-8 fast path に到達させる。 constraint(test-source-non-binary): codex iter2 指摘 — 新規 BOM exemption テストの期待値に raw NUL byte literal を埋めると `fileRead.test.ts` 自身が `grep`/`rg` から binary 扱いになり、ファイル横断検索や CI のテキスト走査ジョブから不可視化される。`'\0A'` エスケープ表記に置換して、実行時の文字列 (char codes 0, 65) は同じまま source は ASCII に戻した。 test(regression): 1 件追加。 - `EF BB BF 00 41` (UTF-8 BOM + NULL + 'A') が `binary-content` で skip されず、UTF-8 fast path で `'\0A'` として decode されることを確認。 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 20:41:30 +09:00
Kazuki Yamada	6e81c449b6	fix(test): Drop hyphen from "mis-classified" to satisfy typos check intent(ci-green): typos@1.45.1 flags `mis` as a typo of `miss`/`mist` and the hyphenated `mis-classified` in the new PDF-magic regression test comment trips the `Check typos` job. The unhyphenated `misclassified` is the more common spelling and passes the dictionary. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 20:32:38 +09:00
Kazuki Yamada	5ab82d5152	fix(file): Mirror isbinaryfile's PDF magic + suspicious-byte ratio rules in cheap pre-screen intent(no-regression): codex review で指摘 — 元の差分は cheap pre-screen を NULL-byte probe + UTF-16/UTF-32 BOM 例外のみで構成していたが、`isbinaryfile@5.0.2` の `isBinaryCheck` には valid UTF-8 でも binary 判定する規則が他に2つある: (1) 先頭 5 バイトが `%PDF-` (PDF magic), (2) 先頭 512 バイト中の suspicious 制御バイト比率 >10%。これらを cheap pre-screen に含めないと、`%PDF-` 始まりの拡張子なし/`.txt` ファイルや、ASCII 制御文字が高比率の valid UTF-8 ファイルが従来 skip されていたのに pack に含まれる回帰が発生する。 fix(pdf-magic): UTF-16/UTF-32 BOM 例外の後、NULL probe の前に `%PDF-` 判定を追加。`isbinaryfile` 本家と同じ位置。 fix(suspicious-ratio): 既存の NULL probe ループに suspicious カウンタを追加し、ループ後に `>10%` 閾値で binary 判定。suspicious 集合は `isbinaryfile` の `(b < 7 \|\| b > 14) && (b < 32 \|\| b > 127)` 条件を valid-UTF-8 入力に絞って簡略化したもの: `b < 7` または `b in 0x0F..0x1F`。これは valid UTF-8 multi-byte の continuation/lead bytes (0x80..0xFF) と排他なので、UTF-8 awareness なしの flat byte scan で正しい結果になる。protobuf 検出器 (`isBinaryProto`) は意図的に mirror しない — それが本 PR で回避している pathological case 本体。 constraint(del-boundary): codex 再レビュー指摘 — 当初 0x7F (DEL) を suspicious 集合に含めていたが、`isbinaryfile` の条件 `b < 32 \|\| b > 127` は 127 を排除する。修正版では `b === 0x7f` を外し、コメントも本家挙動に合わせて訂正。 test(regression): 3 件追加。 - valid-UTF-8 PDF magic (`%PDF-1.4\n...`) を `binary-content` で skip することを確認 - 64 バイトの 0x01 のみの buffer (suspicious 100%) を `binary-content` で skip することを確認 - 64 バイトの 0x7F のみの buffer (valid UTF-8, DEL 100%) は skip しないことを境界として固定 bench(no-perf-regression, M-series Mac, hyperfine --warmup 1 --runs 5): - `node bin/repomix.cjs --quiet`: 406ms → 399ms (誤差範囲) - pre-screen の追加コストは 512 バイト線形スキャン分で、UTF-8 fast path (元から TextDecoder で全バッファ走査) に比べて無視できる Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 16:15:44 +09:00
Kazuki Yamada	8a08815ba1	perf(file): Try UTF-8 decode before isBinaryFile to dodge protobuf-detector pathological case intent(latency): `node bin/repomix.cjs` がリポジトリ自身を pack する際の wall-clock が PR #1533 (`docs(website): Add localized page metadata`, commit `9bd663ae`) で 0.38s → 1.15s に約3倍に増えた。9bd663ae は 14言語 × 24ページ = 336個の md に YAML frontmatter (`title` + `description`) を追記しただけで、出力サイズはほぼ変わらないのに実行時間だけが伸びていた。 root-cause(isbinaryfile): `readRawFile` は全ファイルの buffer を `isBinaryFile` (= `isbinaryfile` パッケージ) に通してから UTF-8 fast path に進む。`isbinaryfile` の `isBinaryCheck` は protobuf-shape 検出器 (`isBinaryProto`) を含み、これが任意ファイル先頭バイトを varint として解釈し `new Array(varint)` で配列を確保する。一部の正当な UTF-8 バイト列ではこのループが数秒スピン or `RangeError: Invalid array length` を投げる。具体例: `website/client/src/ko/guide/tips/best-practices.md` (4,243 bytes, valid UTF-8 韓国語 md) は単独で `isBinaryFile` 呼び出しに ~3,500ms かかり最終的に throw → 外側の try/catch で握り潰され `encoding-error` で silent drop されていた。デフォルトの pack 時はこの 1 ファイルだけで毎回 ~3,500ms を払っていた。これは upstream `isbinaryfile` のバグ (信頼できない入力で `new Array(n)` を bound せず確保) だが、修正を待たずに自衛する。 fix(reorder): `isBinaryFile` を UTF-8 fast path の後に動かし、UTF-8 として decode 失敗した buffer のみに適用する。NULL バイト (= U+0000、valid UTF-8) を含むバイナリは UTF-8 fast path を素通りしてしまうため、`isBinaryFile` の前に 512 バイトの cheap な NULL-byte probe を挿入。NULL は最強のバイナリシグナルかつ `isBinaryCheck` のうち UTF-8 fatal decode を通過する入力に triggers する唯一の規則。残りの heuristics (PDF magic / suspicious-byte ratio / protobuf shape) は非 UTF-8 バイト列を要求するので、UTF-8 fast path に乗らないファイルだけが従来通り `isBinaryFile` に渡る。 constraint(utf16-utf32-bom): UTF-16 LE は ASCII `A` を `0x41 0x00` と encode し、UTF-32 BE BOM は `0x00 0x00 0xFE 0xFF` で始まる。NULL probe をそのまま走らせるとこれらの text ファイルを binary 誤判定する。`isbinaryfile` 自身の `isBinaryCheck` は BOM 例外を持っているので、`hasNonUtf8TextBom` でこれを mirror し、UTF-16/UTF-32 BOM 始まりの buffer は probe を skip して slow path (jschardet+iconv) にそのまま落とす。挙動は pre-change と同一。 side-effect(restored-file): 上の Korean Markdown ファイルは throw → silent drop されて出力から消えていたが、本修正後は正しく出力に含まれる。 test(regression): `tests/core/file/fileRead.test.ts` に2件追加。 - valid UTF-8 multi-byte (Hangul 3-byte 連続、NULL なし) を text としてそのまま round-trip - UTF-16 LE BOM ファイル ("Hello\n") が NULL を含んでも slow path で正しく decode bench(local, M-series Mac, hyperfine --warmup 1 --runs 5): - `node bin/repomix.cjs --quiet` 全体: 1152ms → 406ms (約 2.8× 速) - `--include 'website/client/src/ko/guide/tips/best-practices.md'` 単独: 880ms → 170ms (約 5.2× 速) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 15:56:10 +09:00
Claude	98866f001f	perf(metrics): Increase METRICS_BATCH_SIZE from 10 to 50 Cuts per-thread IPC overhead in calculateFileMetrics by ~80% by sending fewer, larger batches to the metrics worker pool. At ~1000 files this reduces the dispatched task count from ~100 to ~20 (~5 batches per worker on a 4-thread pool instead of ~25), saving ~40 estimated worker-ms of pure serialization/dispatch overhead per worker on the critical path. The previous size of 10 was tuned for fast worker availability to overlap with output generation, but tinypool round-trips were measured at ~2ms minimum (batches of trivially small files bottom out there), so the IPC dominated for short batches. Increasing to 50 keeps load balance comparable (~5:1 per thread vs. the prior ~25:1) and avoids the load-imbalance pitfall of even larger sizes (e.g. 100), where one oversized batch can monopolize a worker and stretch the tail. Benchmark on this repo (1016 files, 4 cores; 100 paired interleaved runs alternating BATCH=10 and BATCH=50 with the same prebuilt JS, swapping only the constant, full pipeline timed end-to-end): BATCH=10 mean 1.5589s median 1.5480s stdev 0.0966s BATCH=50 mean 1.5111s median 1.5030s stdev 0.0625s Saved 47.8ms (3.07%) mean / 45.0ms (2.91%) median (trimmed mean 41.0ms, dropping top/bottom 10%) Wins 69/100 paired runs faster with BATCH=50 t-test paired t = 4.545 (df=99, p < 0.001) 95% CI [27.2 ms, 68.4 ms] on the mean improvement Sign test 2-sided p = 0.0002 Also adds a regression test for the batching path: with 120 input files the test asserts result order/completeness and that the progress callback fires once per dispatched batch (not per file, not just once). All 1250 tests pass; output is byte-identical for the same source tree.	2026-04-30 00:06:39 +09:00
Kazuki Yamada	f67731056a	test: Round-3 PR review feedback - validateFileSafety: pin the negative path of `if (config.security.enableSecurityCheck)` — every other test enabled the check, so a regression that always runs the security check would have passed silently. - unifiedWorker: - Add a positive workerData=securityCheck + ambiguous-task case so the pair (override + this) distinguishes "inference always wins" from "inference wins only when it yields a value". - Stop pretending the handler-cache test verifies caching. Both branches of `if (cached) return cached;` end with the same Map.set, and Node's own module cache makes the dynamic import effectively free, so the cache is unobservable from outside without exposing internals. Renamed to "repeated calls" with a comment explaining the limitation. - fileSystemReadDirectoryTool: translate the pre-existing Japanese comment to English per CLAUDE.md. - TokenCounter: extract `LoadEncodingFn` type alias instead of the unusual `typeof loadEncoding`, so a signature drift between the local function and the deps field would surface at the type level.	2026-04-26 22:47:21 +09:00
Kazuki Yamada	7ff8c8b155	test: Address PR review feedback (RepomixError instances + mock isolation) - outputGenerate: tests titled "throws RepomixError…" / "wraps … in RepomixError" now assert the rejection is an instance of RepomixError in addition to the message regex, matching the test names. - LanguageParser: collapse the duplicate getParserForLang('javascript') rejection assertions into a single .catch capture that checks both type and message. - calculateMetrics: vi.mocked(initTaskRunner).mockReset() before mockReturnValueOnce so a future test that omits taskRunner can't silently consume the override. - packager: pre-attach a no-op .catch on the rejected warmupPromise so vitest's unhandled-rejection detection doesn't fire before pack awaits it. Production code mirrors this pattern in packager.ts:262.	2026-04-26 22:29:06 +09:00
Claude	1bef40ba2d	test: Tighten misleading test names and pin packageManager guard Address two review threads on PR #1518 that flagged tests whose titles overstated what was being verified. - fileProcess: the longBase64 string is one continuous line, so the truncateBase64 → removeEmptyLines ordering was never actually under test (truncateBase64Content's regex does not span newlines). Rename to describe the combined behavior the test really pins. - skillTechStack: rename the per-directory case to reflect that root and subpackage land in separate buckets keyed by getDirPath, and add a second case with two package.json entries at the same path to genuinely exercise the parsed.packageManager && !result.packageManager guard at skillTechStack.ts:541.	2026-04-26 13:24:32 +00:00
Kazuki Yamada	bcd849177f	refactor(metrics): Inject loadEncoding via deps for testability Address PR review feedback (claude R4): the previous error-handling tests overwrote the private `countFn` field via a cast, which silently breaks on a rename. Add a `deps` parameter to TokenCounter that defaults to the real `loadEncoding`, and switch the error-handling tests to inject a fake that returns the throwing function directly. Matches the dependency injection pattern documented in CLAUDE.md.	2026-04-26 22:22:58 +09:00
Kazuki Yamada	e5f7a1f311	fix(shared): Address PR review feedback - shared/errorHandle: recognize duck-typed OperationCancelledError from worker boundaries in isRepomixError (it extends RepomixError but the name was missing from the structured-clone fallback comparison). Add a regression test for the worker-boundary case. Test improvements per coderabbit / claude review: - cliReport: assert skill-directory + relative path on the same log line. - processConcurrency: restore process.versions.bun by removing the property when it didn't originally exist, instead of leaving it defined-as-undefined. - logger: drop the no-op `process.env.REPOMIX_LOG_LEVEL = undefined` (it coerces to the string "undefined" and is overwritten by the next delete). - unifiedWorker: replace the tautological cache test with one that proves cache uniqueness via onWorkerTermination cleanup count; add a test for task-based inference overriding workerData (bundled-env reuse). - calculateMetricsWorker: new direct test for the default export's items vs. single-mode dispatch — unifiedWorker mocks this module so the branch was otherwise untested. - packRemoteRepositoryTool: hard-code the expected output path instead of expect.any(String) to catch arg-swap regressions. - memoryUtils: tighten getMemoryStats assertions with sanity bounds (heapUsed <= heapTotal, rss > 0, heapUsagePercent <= 100) so a unit-conversion regression (bytes vs MB) would fail the test.	2026-04-26 22:20:42 +09:00
Kazuki Yamada	94cf791102	fix(test): Use path.join in cliReport skill-directory assertion for Windows The test asserted on a literal '.claude/skills/test-skill' substring, but getDisplayPath calls path.relative which produces backslashes on Windows. CI failed on all Windows runners. Constructing the expected substring with path.join makes the assertion OS-native.	2026-04-26 19:55:58 +09:00
Kazuki Yamada	402e4906d7	test: Pin v1.14.0 regression-prone invariants Targeted regression tests for the high-risk areas identified in the v1.13.1..main audit, focusing on silent-correctness bugs and parallel error handling — places that wouldn't surface in CI but would in the field. - core/metrics/calculateMetrics: pin numeric equivalence between the fast path (Σ file tokens + wrapper tokens) and the slow path (full output tokenization). Cover wrapper-extraction fallback, split-output fallback, and worker pool cleanup when fileMetrics rejects. - core/file/fileProcess: pin transform ordering invariants — removeComments → removeEmptyLines (blank lines from comment removal must be cleaned up; preserved when removeEmptyLines is off); truncateBase64 → removeEmptyLines (multi-line base64 squashed first); trim → showLineNumbers (no leading/trailing blanks numbered). Plus worker/lightweight path parity for inputs that don't need worker processing. - core/packager: pin metrics worker pool cleanup on parallel branch failures (validateFileSafety, produceOutput, calculateMetrics, warmup rejection). Verify prefetchSortData failure is isolated and does not block sortOutputFiles. - core/skill/skillTechStack: cover untested fix-commit invariants — root entry sorts first in monorepo output; configFiles deduplicated within a directory; first-seen packageManager wins per directory.	2026-04-26 19:43:53 +09:00
Kazuki Yamada	cbdfc29b4d	test: Cover error/edge paths in core (output, file, security, treeSitter) Lift the four most impactful uncovered files past 90% lines without introducing fragile or contrived tests. Each block targets real user-facing branches (error handling, optional features, init/dispose). - core/output/outputGenerate (78% -> ~90%): - buildOutputGeneratorContext: instructionFilePath success and missing-file paths; pre-computed vs. searchFiles fallback for empty directories; full-tree mode (success and listing failure); searchFiles failure wrap. - generateOutput: unsupported style throws RepomixError. - core/security/validateFileSafety (79% -> ~95%): - logSuspiciousContentWarning loop: header line per section, plus singular ("issue") and plural ("issues") suffix per result. - No-op behavior when no suspicious git diff/log entries exist. - core/file/fileSearch (88% -> ~92%): - handleGlobbyError: EPERM and EACCES translated to PermissionError; other error codes pass through. - Outer catch: generic Error wrapped with directory context; non-Error throw produces the generic fallback message. - core/treeSitter/languageParser (74% -> ~88%): - getResources before init() throws RepomixError. - init() is idempotent (Parser.init is called only once across two calls). - Parser.init() failure is wrapped as RepomixError. - dispose() resets state so subsequent calls require re-init. Coverage: - Statements 89.51% -> 90.23% - Branches 79.31% -> 80.26% - Functions 89.37% -> 89.69% - Lines 90.06% -> 90.80%	2026-04-26 19:35:00 +09:00
Kazuki Yamada	9aac452504	test: Raise overall coverage from 87.9% to 90.1% Cover previously-untested paths across the shared, cli, core, and mcp layers, focusing on branches that represent real user-facing behavior rather than line-coverage chasing. Highlights: - shared/errorHandle: cover handleError (RepomixError, unexpected Error, unknown values, duck-typed worker errors, debug-level branches) and the three error class constructors. - shared/logger: cover setLogLevelByWorkerData for env-var, workerData (array and object shapes), and invalid/missing inputs. - shared/memoryUtils: add a fresh test file covering stats, log helpers, and withMemoryLogging success/error paths. - shared/processConcurrency: cover cleanupWorkerPool (Node, Bun-skip, swallowed teardown errors) and the run/cleanup delegation. - shared/unifiedWorker: cover the cache-hit path and the workerData (array/object) and REPOMIX_WORKER_TYPE detection branches. - core/metrics/TokenCounter: cover the catch branch (Error, non-Error throws, with/without filePath). - core/file/fileManipulate: cover removeEmptyLines on inherited base and composite manipulators. - cli/cliReport: cover skill-directory and split-output summary lines. - mcp/tools/packRemoteRepositoryTool: add tests mirroring the packCodebaseTool pattern (success, runCli failure, runCli throw, workspace creation failure). - mcp/tools/fileSystemReadDirectoryTool: switch to mocking node:fs/promises so existing mocks actually intercept calls, and cover the file-vs-dir, listing, empty-directory, and readdir-error paths. Result: - Statements 87.29% -> 89.51% - Branches 76.16% -> 79.31% - Functions 87.60% -> 89.37% - Lines 87.89% -> 90.06%	2026-04-26 19:28:09 +09:00
Kazuki Yamada	44db93451d	test(core): Cover precondition guards in truncateBase64 intent(truncateBase64-tests): add explicit coverage for the new fast-path guards introduced alongside the regex-skip optimization decision(test-cases): focus on the four cases that exercise guard behavior not previously asserted — empty input, exactly-below-threshold (255 chars), run-reset on non-base64 separator, and non-base64 data URI without `;base64,` Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 18:09:02 +09:00
Kazuki Yamada	c2059ff90f	refactor(shared): Address PR review nitpicks on asyncMap intent(asyncMap): tighten doc and test based on coderabbitai review feedback decision(asyncMap-doc): explicitly note that workers are not cooperatively cancelled after a rejection — sibling workers keep claiming indices decision(asyncMap-test): replace timing-sensitive `peakActive > 1` with exact `=== 4` — workers spawn synchronously via Promise.all so the cap is hit deterministically Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 16:53:44 +09:00
Kazuki Yamada	3ddc446143	refactor(core): Bound concurrency for empty-directory readdir checks intent(empty-dir-check): protect very large repos from FD exhaustion that unbounded Promise.all could trigger rejected(p-limit): user wants to keep dependencies minimal — built a small in-tree helper instead decision(asyncMap): single mapWithConcurrency helper rather than a p-limit-style limiter object — only call site is array map decision(concurrency-limit): 20 in flight — well above libuv default thread pool (4) while still bounded for users who tuned UV_THREADPOOL_SIZE Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 16:41:38 +09:00
Claude	4bd257e280	perf(core): Speed up empty-directory detection in file search Merge the two separate globby traversals used by `searchFiles` into a single one and parallelize the per-directory `readdir` calls used to filter empty directories. Background ---------- When `output.includeEmptyDirectories` is enabled (the default for `repomix.config.json` in this repo, and any repo that wants an accurate directory tree), `searchFiles` previously walked the working tree twice: once with `onlyFiles: true` and a second time with `onlyDirectories: true`. Each call re-traversed the tree and re-parsed every `.gitignore` / `.repomixignore` file. `findEmptyDirectories` then issued `readdir` serially for every matched directory, awaiting each syscall before starting the next. Change ------ * Replace the two globby invocations with one `objectMode: true, onlyFiles: false` call. Partition the returned `GlobEntry[]` by `dirent.isFile()` / `dirent.isDirectory()`, matching the previous `onlyFiles: true` semantics for symlinks and other non-file non-directory entries. * Rewrite `findEmptyDirectories` to run the per-directory `readdir` checks concurrently via `Promise.all`. Ordering is preserved by the result array and the caller sorts the final list anyway. * When `includeEmptyDirectories` is disabled, keep the fast `onlyFiles: true` path unchanged so the default CLI run pays no cost. Benchmark (hyperfine, repomix packing itself, 30 runs, warmup 3) ---------------------------------------------------------------- Run 1: baseline 2.162s ± 0.042s → perf 2.017s ± 0.029s → -145ms (-6.7%) Run 2: baseline 2.161s ± 0.023s → perf 2.030s ± 0.027s → -131ms (-6.1%) Per-stage verbose timings: baseline: [globby files 200ms] + [globby dirs 85ms] + [empty dirs 61ms] perf: [combined globby 223ms] + [empty dirs 66ms] saved: -57ms consistently on the critical path	2026-04-26 16:41:38 +09:00
Kazuki Yamada	b25f4446d8	test(server): Relocate server tests into website/server intent(test-ownership): move tests into website/server/tests/ so they collocate with the code under test and stop reaching up through three parents; reviewer follow-up wanted dedicated coverage and the root vs. website/server package boundary makes collocation the right long-term layout decision(vitest-config): give website/server its own vitest.config.ts + `test` script; root's existing tests/*/.test.ts include no longer catches server tests since they moved outside that tree, so the two test runs stay independent decision(tsconfig-test): add tsconfig.test.json extending the build config and lift lint-tsc to `-p tsconfig.test.json` — the build tsconfig's rootDir: "./src" excludes tests/, so a single lint command wouldn't have type-checked them learned(valibot-instanceof): with tests now resolving valibot from the same website/server/node_modules as validateRequest, the cause-check can go back to `instanceof v.ValiError` — the duck-type workaround was only needed when the root harness and server pulled different valibot copies constraint(ci-website): added a `test-website-server` job that links the local repomix build the same way lint-website-server does; tests don't actually import repomix today, but colocation means they easily could later and the link step keeps parity	2026-04-19 21:47:29 +09:00
Kazuki Yamada	04b9612758	test(server): Add dedicated coverage for validateRequest intent(server): land validateRequest unit tests alongside this PR's rewrite per reviewer follow-up — previously only exercised indirectly via classifyRejectReason with mocks, so silently dropping `.cause` during a future refactor would not fail any test learned(valibot-instanceof): website/server resolves valibot from its own node_modules (distinct copy from the root harness), so `instanceof v.ValiError` fails across that boundary even for genuine ValiError instances; duck-type the cause check on `{ name: 'ValiError', issues: [] }` to match classifyRejectReason's own contract Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 16:43:52 +09:00
Kazuki Yamada	b7f4a810a0	test(server): Use MESSAGES constant in cause-chain test fix(drift-guard): swap the hard-coded 'Either URL or file must be provided' literal for MESSAGES.MISSING_INPUT so the cause-chain test honors the classifier drift-guard the file's header comment promises — a future message rewrite now propagates to this assertion too, instead of silently falling back to 'unknown' Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 16:41:30 +09:00
Kazuki Yamada	e53502eb46	fix(server): Address PR review feedback fix(pack-event-test): migrate tests/website/server/packEventSchema.test.ts to the valibot-shaped `{ name, issues: [{ message, path: [{ key }] }] }` fixture so the invalid_format case — which relies on the path fallback — actually exercises the post-migration classifier decision(path-formatting): replace the hand-rolled `path.map/filter/join` in both packEventSchema.ts and validation.ts with `v.getDotPath(issue)` — the valibot built-in does the same reconstruction and eliminates the two-site drift risk reviewers flagged fix(validation-error-message): drop the leading `": "` when a valibot issue has no path (top-level checks like MISSING_INPUT / BOTH_PROVIDED), so the rendered AppError message reads cleanly rejected(cause-walk-depth): coderabbit suggested walking `.cause` recursively up to depth 3 — declined because validateRequest wraps exactly once by contract and no second wrapper exists in the chain; defensive recursion would add surface area for no caller Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 16:23:45 +09:00
Kazuki Yamada	9f6d0b5bdb	Merge pull request #1489 from yamadashy/perf/try-valibot perf(config): Migrate configSchema from zod to valibot (experimental)	2026-04-19 10:39:01 +09:00
Kazuki Yamada	b27bebb1c5	test(config): Round out non-integer rejection parity intent(parity-coverage): the numeric-constraint enforcement block already rejects non-integer `maxFileSize` and `splitOutput`, but the three `v.pipe(v.number(), v.integer(), v.minValue(1))` siblings (`topFilesLength`, `git.sortByChangesMaxCommits`, `git.includeLogsCount`) only had min-bound rejection. Add non-integer cases for each so every pipe-guarded numeric field has matching fencepost coverage	2026-04-19 01:11:36 +09:00
Kazuki Yamada	85b0c7a7fb	fix(shared): Clean up root-level validation error formatting intent(readability): an issue with no `path` (e.g., root-level schema mismatch) previously rendered as `[] message`; emit just `message` when segments are empty. Small quality-of-life for error output intent(limitation-pin): add an integration test that documents the known ESM unwrap ambiguity — a CJS module shaped like `{ default: {...}, otherKey: ... }` has `otherKey` silently dropped by our heuristic. Non-issue for RepomixConfig today, but worth freezing so the behavior can't drift without someone noticing	2026-04-18 23:23:53 +09:00
Kazuki Yamada	53ddb5ac36	test(website): Extract and cover normalizeObjectNode intent(test-coverage): normalizeObjectNode is the load-bearing workaround for @valibot/to-json-schema's output gaps (missing additionalProperties:false, noisy empty required arrays). A bug in its tree walk would silently break the published repomix.config.json IntelliSense schema. Extract to its own module and pin the behavior: adding/respecting additionalProperties, stripping empty required, recursing through properties / anyOf / oneOf / items, handling null and primitives	2026-04-18 23:13:43 +09:00
Kazuki Yamada	10846fbf8d	refactor(server): Extract validation messages to shared constants Addresses PR #1493 round 5 follow-up (schema-drift tracking) from claude. intent(eliminate-drift): prior reviews flagged the string-coupling between packRequestSchema (produces zod messages) and classifyRejectReason (matches them back to a metric label) as fragile — a message rewrite in one place would silently land requests in the 'other' bucket, quietly mislabeling the dashboard. Extract all 11 messages to a shared MESSAGES module so the producer, the consumer, and the test's expected values all reference the same constants decision(map-over-switch): classifyRejectReason's 11-case switch becomes a MESSAGE_TO_REASON lookup table. Same runtime behavior, easier to scan, and the keys are compile-time-typed via the shared constants (so a typo in either side is a TS error) decision(preserve-invalid_format-path): 'invalid_format' is keyed by path (`format`), not by message text. Left as path-match so it continues to work even if zod's default enum error message changes improve(test-references-constants): test.each cases now reference MESSAGES directly. The test continues to catch classifier-logic drift (wrong label for a known key, missing entry in the lookup) while schema/classifier drift becomes impossible by construction Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 23:07:25 +09:00
Kazuki Yamada	744bb7fcdc	test(config): Pin CliSchema intersect preservation in isolation intent(pre-merge-gate): buildCliConfig parses against repomixConfigCliSchema before mergeConfigs, so the intersect must preserve both the base-schema `output.filePath` and the CLI-only `output.stdout` + top-level `skillGenerate` before the merged schema ever sees the value. Add a targeted test so a future change to the intersect shape doesn't silently strip CLI fields at the cli-parse step	2026-04-18 23:02:56 +09:00
Kazuki Yamada	e580284729	fix(test): Drop server-only imports from packEventSchema test intent(ci-green): CI failed on `Cannot find package 'repomix'` because the test imported `packRequestSchema.ts` which has `import from 'repomix'` — the root vitest harness doesn't install server-only packages (repomix is this repo, hono is a server dep). Local runs passed by coincidence of self-reference resolution decision(construct-zoderror-directly): rewrite the test to construct `z.ZodError` instances with messages matching the schema rather than invoking the schema. Keeps classifier-drift coverage (if the switch stops matching a known label, a test fails) at the cost of not catching schema-drift (if a zod message in packRequestSchema is edited without updating the test, the test still passes but dashboards mislabel). Schema-drift catching would need a shared constants module — tracked as a follow-up decision(native-error-cause): use `new Error(msg, { cause })` instead of importing `AppError`. AppError pulls `hono/utils/http-status` (type-only, but some module resolvers still probe). Native Error with cause exercises the same classifier code path improve(table-driven): restructure the 11 label cases as `test.each` so each case is named individually in the test output Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 22:51:50 +09:00
Kazuki Yamada	43eb51cac4	test(server): Add classifyRejectReason tests + document Boolean() intent Addresses PR #1493 round 3 review from claude. intent(catch-schema-drift): every previous review round flagged the fragile string coupling between classifyRejectReason and the zod messages in packRequestSchema. The earlier response was that adding tests would require new infrastructure in website/server/ — but tests/website/cliCommand.test.ts already tests website/ code under the root vitest harness. Extended the same pattern to cover every rejection path through the real schema so message edits surface as CI failures instead of silent 'other'-bucket dashboard drift decision(table-driven-through-real-schema): the tests invoke validateRequest(packRequestSchema, input) rather than constructing synthetic ZodError objects — this catches both classifier drift AND schema drift in one assertion. 13 tests covering 11 reject labels + cause-chain extraction + non-error input improve(boolean-intent-comment): claude noted that Boolean() collapses `undefined` (user didn't send field) with `false` (user explicitly disabled), losing that distinction in the pack_options_usage metric. This is intentional — the metric answers "what % of packs had compress enabled" where both "off" and "unspecified" correctly mean the feature wasn't active. Added a comment so future contributors don't "fix" it Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 22:45:43 +09:00
Kazuki Yamada	e2a63eede8	test(config): Guard CLI-only fields through MergedSchema intersect intent(regression-guard): `repomixConfigDefaultSchema.output` is strict and omits `stdout`, so any future change that alters `v.intersect` merging semantics would silently drop the CLI `--stdout` flag (and `skillGenerate`) after `mergeConfigs` validates the merged value. Pin the behavior with a targeted test	2026-04-18 22:37:57 +09:00
Kazuki Yamada	7c2e8791a6	fix(config): Address latest PR review feedback intent(interop-consolidation): drop `interopDefault: true` from the jiti setup in configLoad — the explicit ESM namespace unwrap at the call site already handles every module-format case we test (.ts / .mts / .js / .mjs / .cjs). Having both the jiti flag and the manual unwrap was redundant and made the intent fuzzier intent(error-path-cleanliness): filter out empty path segments before joining in rethrowValidationErrorIfSchemaError — a malformed ValiError item (object without `key`) would otherwise produce `[output..style]`; dropping the empty entry keeps the path readable. Added a dedicated test covering the filter	2026-04-18 17:19:33 +09:00
Kazuki Yamada	d83224a7cf	test(config): Cover numeric constraint enforcement in schemas intent(behavioral-coverage): previous tests verified structural equivalence to the zod schemas but never asserted that the integer / minValue / maxValue pipes actually reject invalid values — add focused rejection cases for maxFileSize, topFilesLength, splitOutput, and the git counts so the constraint migration is proven behaviorally decision(default-schema-maxvalue): mirror the `v.maxValue(Number.MAX_SAFE_INTEGER)` from the base schema onto `repomixConfigDefaultSchema.splitOutput` so the upper bound is enforced regardless of which schema the caller parses against — the prior edit only touched the base entry	2026-04-18 17:02:54 +09:00
Kazuki Yamada	193e4dd7c8	test(config): Cover ESM unwrap edge cases in configLoad intent(test-coverage): the nuanced `defaultExport typeof object` check in configLoad was only exercised through real jiti imports — add two mocked jitiImport cases so the wrapped-object unwrap branch and the CJS-with-primitive-`default` pass-through branch are both guarded against regressions	2026-04-18 16:50:40 +09:00
Kazuki Yamada	1e8a5e5e1e	fix(config): Address PR review feedback intent(error-handle): drop the instanceof Error guard in rethrowValidationErrorIfSchemaError so ValiError / ZodError round-tripped through a worker (plain { name, message, issues }) is still recognized — aligns with isError / isRepomixError elsewhere in the file intent(schema-parity): restore splitOutput's upper bound (Number.MAX_SAFE_INTEGER) in the generated JSON schema so editor hints match the previous zod output; also strip the empty required:[] arrays that @valibot/to-json-schema emits on every object node intent(esm-unwrap): only unwrap jiti's .default when it's an object, preserving a CJS config that legitimately exports { default: 'plain', ...rest }; plain Symbol.toStringTag === 'Module' was too narrow — jiti returns non-Module namespace wrappers for .ts / .mts files intent(test-coverage): add tests/shared/errorHandle.test.ts covering the Zod + Valibot + worker-serialized paths through rethrowValidationErrorIfSchemaError; tighten the default-schema assertion in configSchema.test.ts from /expected\|invalid/i to toThrow(v.ValiError) + targeted message pattern decision(path-segment-fallback): return '' for unknown object-shaped path items instead of falling into String(segment) → "[object Object]"; defensive, non-breaking	2026-04-18 15:47:11 +09:00
Kazuki Yamada	574abd4c44	perf(config): Migrate configSchema from zod to valibot intent(cold-start): evaluate valibot as a simpler alternative to PR #1484's async deferral of zod decision(validation-library): valibot v1 picked for smaller module graph and tree-shaking friendliness decision(error-handling): rename rethrowValidationErrorIfZodError to rethrowValidationErrorIfSchemaError and duck-type both ZodError and ValiError so the helper stays library-agnostic constraint(module-unwrap): jiti.import returns an ESM Module namespace even with interopDefault — zod v4 quietly unwrapped it, valibot does not, so configLoad manually picks .default learned(zod-v4-modules): z.object().parse(moduleNamespace) reaches into .default automatically — not documented prominently and was masking the jiti interop gap learned(cold-start-impact): ~10ms (~6%) faster startup in hyperfine A/B — below the -15ms bar from the task spec, but worth keeping as a branch for the simplicity win	2026-04-18 15:34:36 +09:00

1 2 3 4 5 ...

582 Commits