Commit Graph

141 Commits

Author SHA1 Message Date
autofix-ci[bot] 371927e211 [autofix.ci] apply automated fixes 2026-05-09 01:07:48 +00:00
Kazuki Yamada 1d2df3bea3 test(search): Update fileSearch tests for prescan-based ignore collection
Add fs.readdir mock (returning empty array) to all relevant beforeEach
blocks so collectIgnoreFilePatterns does not fail with "entries is not
iterable". Update globby option assertions to reflect gitignore: false
and ignoreFiles: [] now that patterns are pre-collected by the prescan.

https://claude.ai/code/session_01Fm25x51fmGGeFMJyCm1CER
2026-05-09 10:06:46 +09:00
Kazuki Yamada 3ff49306e1 refactor(file): Simplify cheap pre-screen down to NULL probe + BOM exemption
intent(simpler): The cheap pre-screen in the previous commit mirrored the three rules in `isbinaryfile@5.0.2`'s `isBinaryCheck` that classify even valid UTF-8 as binary (PDF magic / NULL / suspicious control-byte ratio >10%). However, because `TextDecoder('utf-8', { fatal: true })` now runs before `isBinaryFile`, a valid UTF-8 buffer never reaches the protobuf detector. The main goal, avoiding the pathological case, is achieved by the TextDecoder reorder alone. Mirroring PDF magic and the suspicious-byte ratio (1) only rescues edge cases with almost no real-world impact and (2) couples us to `isbinaryfile`'s internals, which is a poor trade.

fix(simplify): Reduce the cheap pre-screen to the minimal configuration of a NULL-byte probe + BOM exemption.

decision(keep-null-probe): Only the NULL byte is kept, for an independent and legitimate reason: `U+0000` is **an illegal character in XML 1.0** and breaks the validity of this tool's primary output format (XML). `TextDecoder` accepts `0x00` as valid UTF-8 (U+0000), so unless it is rejected here, a buffer containing NULL is packed as text and downstream XML parsers fail. This is not a mirror of an `isbinaryfile` rule but a robustness requirement of repomix's own output.

decision(drop-pdf-magic): PDFs are rejected earlier by `is-binary-path` via the `.pdf` extension. Extensionless ASCII-only PDF stubs are practically nonexistent (a real PDF contains a cross-reference table and binary streams, so it takes the path where UTF-8 decoding fails). There is little value in guarding against them.

decision(drop-suspicious-ratio): Valid UTF-8 buffers made up mostly of pure C0 control bytes do not exist in real projects. The maintenance cost of fully mirroring `isbinaryfile`'s UTF-8 lookahead (drift risk such as the DEL boundary) exceeds the benefit.

constraint(coupling-minimal): The NULL byte is a universal binary signal and is not tied to `isbinaryfile`'s rules. Likewise, the BOM exemption follows the standard BOM definitions and is unaffected by upstream drift.

test(cleanup): Remove the 3 related regression tests (PDF magic / suspicious ratio / DEL boundary). They guaranteed the behavior of the removed rules; under the current implementation all of those inputs are intentionally packed as text. The 3 remaining tests (UTF-8 multi-byte / UTF-8 BOM+NULL / UTF-16 LE BOM) each directly guard the pathological case or regression this PR is meant to avoid.

bench(no-regression, M-series Mac, hyperfine --warmup 1 --runs 5):
- `node bin/repomix.cjs --quiet`: 399ms → 418ms (within noise; the gain from removing the hand-written 512-byte JS loop is below the noise floor)
- Output diff: 0 files removed relative to the existing base; Korean md +1 (silent drop resolved, unchanged)
- fileRead.ts: 175 → 159 lines (-16 lines). Roughly ~35 lines removed in pre-screen-related code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 23:11:29 +09:00
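The reduced pre-screen described in commit 3ff49306e1 above (NULL-byte probe + BOM exemption) could look roughly like the following. This is a minimal sketch under stated assumptions, not the code in `fileRead.ts`: the name `hasNullByteInProbe`, the `PROBE_BYTES` constant, and the exact BOM table are hypothetical. Only `hasTextBom` and the 512-byte window are named in the commit messages.

```typescript
// Hypothetical sketch; names and the exact BOM list are assumptions, not repomix's source.
const PROBE_BYTES = 512;

// Byte-order marks that identify text encodings.
const TEXT_BOMS: number[][] = [
  [0xef, 0xbb, 0xbf], // UTF-8
  [0xff, 0xfe, 0x00, 0x00], // UTF-32 LE
  [0x00, 0x00, 0xfe, 0xff], // UTF-32 BE
  [0xff, 0xfe], // UTF-16 LE
  [0xfe, 0xff], // UTF-16 BE
  [0x84, 0x31, 0x95, 0x33], // GB18030
];

// True when the buffer starts with any of the BOMs above.
function hasTextBom(buf: Uint8Array): boolean {
  return TEXT_BOMS.some((bom) => bom.every((byte, i) => buf[i] === byte));
}

// True when a NULL byte appears in the first 512 bytes of a buffer that has no text BOM.
function hasNullByteInProbe(buf: Uint8Array): boolean {
  if (hasTextBom(buf)) return false;
  const end = Math.min(buf.length, PROBE_BYTES);
  for (let i = 0; i < end; i++) {
    if (buf[i] === 0x00) return true;
  }
  return false;
}
```

With this shape, `EF BB BF 00 41` and UTF-16 LE text skip the probe, while a BOM-less buffer containing `0x00` is flagged before it can reach the XML output.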
Kazuki Yamada 0248b61085 fix(file): Include UTF-8 BOM in cheap pre-screen exemption
intent(no-regression): Flagged in the codex re-review: the BOM exemption function `hasNonUtf8TextBom` covered only UTF-16/UTF-32/GB18030 and did not include the UTF-8 BOM (`EF BB BF`). `isbinaryfile@5.0.2`'s `isBinaryCheck` returns `false` as soon as it sees a UTF-8 BOM, so before the change a buffer like `EF BB BF 00 41` went to the fast path as text. With this diff, the NULL byte after the UTF-8 BOM was caught first by the cheap probe and classified as binary, which is a regression.

fix(utf8-bom-exempt): Rename the function to `hasTextBom` and add the UTF-8 BOM (`EF BB BF`) check first. Keep the BOM exemptions in the same order as upstream `isbinaryfile`, so the cheap probe is skipped and the buffer reaches the UTF-8 fast path.

constraint(test-source-non-binary): Flagged in codex iter2: embedding a raw NUL byte literal in the expected value of the new BOM exemption test makes `fileRead.test.ts` itself be treated as binary by `grep`/`rg`, hiding it from cross-file searches and CI text-scanning jobs. Replaced it with the `'\0A'` escape notation, so the runtime string (char codes 0, 65) is the same while the source is back to ASCII.

test(regression): Add 1 test.
- Verify that `EF BB BF 00 41` (UTF-8 BOM + NULL + 'A') is not skipped as `binary-content` and is decoded as `'\0A'` on the UTF-8 fast path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 20:41:30 +09:00
Kazuki Yamada 6e81c449b6 fix(test): Drop hyphen from "mis-classified" to satisfy typos check
intent(ci-green): typos@1.45.1 flags `mis` as a typo of `miss`/`mist` and the
hyphenated `mis-classified` in the new PDF-magic regression test comment
trips the `Check typos` job. The unhyphenated `misclassified` is the more
common spelling and passes the dictionary.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 20:32:38 +09:00
Kazuki Yamada 5ab82d5152 fix(file): Mirror isbinaryfile's PDF magic + suspicious-byte ratio rules in cheap pre-screen
intent(no-regression): Flagged in the codex review: the original diff built the cheap pre-screen from only a NULL-byte probe + UTF-16/UTF-32 BOM exemption, but `isbinaryfile@5.0.2`'s `isBinaryCheck` has two other rules that classify even valid UTF-8 as binary: (1) the first 5 bytes are `%PDF-` (PDF magic), (2) the ratio of suspicious control bytes in the first 512 bytes is >10%. Unless the cheap pre-screen includes them, extensionless/`.txt` files starting with `%PDF-` and valid UTF-8 files with a high ratio of ASCII control characters, which used to be skipped, would be included in the pack, which is a regression.

fix(pdf-magic): Add the `%PDF-` check after the UTF-16/UTF-32 BOM exemption and before the NULL probe, the same position as in upstream `isbinaryfile`.

fix(suspicious-ratio): Add a suspicious counter to the existing NULL probe loop and classify as binary after the loop at the `>10%` threshold. The suspicious set is `isbinaryfile`'s `(b < 7 || b > 14) && (b < 32 || b > 127)` condition simplified for valid-UTF-8 input: `b < 7` or `b in 0x0F..0x1F`. This is disjoint from the continuation/lead bytes (0x80..0xFF) of valid UTF-8 multi-byte sequences, so a flat byte scan with no UTF-8 awareness gives the correct result. The protobuf detector (`isBinaryProto`) is intentionally not mirrored: it is the very pathological case this PR avoids.

constraint(del-boundary): Flagged in the codex re-review: 0x7F (DEL) was initially included in the suspicious set, but `isbinaryfile`'s condition `b < 32 || b > 127` excludes 127. The revised version drops `b === 0x7f` and corrects the comment to match upstream behavior.

test(regression): Add 3 tests.
- Verify that valid-UTF-8 PDF magic (`%PDF-1.4\n...`) is skipped as `binary-content`
- Verify that a buffer of 64 bytes of only 0x01 (100% suspicious) is skipped as `binary-content`
- Pin as a boundary that a buffer of 64 bytes of only 0x7F (valid UTF-8, 100% DEL) is **not skipped**

bench(no-perf-regression, M-series Mac, hyperfine --warmup 1 --runs 5):
- `node bin/repomix.cjs --quiet`: 406ms → 399ms (within noise)
- The added cost of the pre-screen is one 512-byte linear scan, negligible compared with the UTF-8 fast path (which already scans the whole buffer with TextDecoder)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 16:15:44 +09:00
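The simplified suspicious-byte rule from commit 5ab82d5152 above can be sketched as follows. The function names and default parameters are hypothetical. Only the byte condition, the 512-byte window, and the `>10%` threshold come from the commit message, and the sketch assumes the input is already known to be valid UTF-8. A later commit in this log (3ff49306e1) removed this rule again.

```typescript
// Sketch of the simplified suspicious-byte rule for input already known to be valid UTF-8.
function isSuspiciousByte(b: number): boolean {
  // Equivalent to (b < 7 || b > 14) && (b < 32 || b > 127) for bytes below 0x80.
  return b < 0x07 || (b >= 0x0f && b <= 0x1f);
}

// True when more than `threshold` of the probed bytes are suspicious control bytes.
function exceedsSuspiciousRatio(buf: Uint8Array, probeBytes = 512, threshold = 0.1): boolean {
  const end = Math.min(buf.length, probeBytes);
  if (end === 0) return false;
  let suspicious = 0;
  for (let i = 0; i < end; i++) {
    if (isSuspiciousByte(buf[i])) suspicious++;
  }
  return suspicious / end > threshold;
}
```

DEL (0x7F) falls outside the set, which is the boundary the third regression test in that commit pins.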
Kazuki Yamada 8a08815ba1 perf(file): Try UTF-8 decode before isBinaryFile to dodge protobuf-detector pathological case
intent(latency): The wall-clock time for `node bin/repomix.cjs` to pack the repository itself roughly tripled, from 0.38s → 1.15s, with PR #1533 (`docs(website): Add localized page metadata`, commit 9bd663ae). 9bd663ae only appended YAML frontmatter (`title` + `description`) to 14 languages × 24 pages = 336 md files; the output size barely changed, yet the run time grew.

root-cause(isbinaryfile): `readRawFile` passes every file's buffer through `isBinaryFile` (= the `isbinaryfile` package) before proceeding to the UTF-8 fast path. `isbinaryfile`'s `isBinaryCheck` includes a protobuf-shape detector (`isBinaryProto`), which interprets the leading bytes of any file as a varint and allocates an array with `new Array(varint)`. For some legitimate UTF-8 byte sequences this loop spins for several seconds or throws `RangeError: Invalid array length`. Concrete example: `website/client/src/ko/guide/tips/best-practices.md` (4,243 bytes, valid UTF-8 Korean md) took ~3,500ms in a single `isBinaryFile` call and eventually threw → swallowed by the outer try/catch and silently dropped as `encoding-error`. A default pack paid ~3,500ms every time for this one file alone. This is a bug in upstream `isbinaryfile` (allocating `new Array(n)` for untrusted input without a bound), but we defend against it ourselves rather than wait for a fix.

fix(reorder): Move `isBinaryFile` to **after** the UTF-8 fast path and apply it only to buffers that failed to decode as UTF-8. Binaries containing NULL bytes (= U+0000, valid UTF-8) would pass straight through the UTF-8 fast path, so insert a cheap 512-byte NULL-byte probe before `isBinaryFile`. NULL is the strongest binary signal and the only rule in `isBinaryCheck` that triggers on input that passes a fatal UTF-8 decode. The remaining heuristics (PDF magic / suspicious-byte ratio / protobuf shape) require non-UTF-8 byte sequences, so only files that do not take the UTF-8 fast path are passed to `isBinaryFile` as before.

constraint(utf16-utf32-bom): UTF-16 LE encodes ASCII `A` as `0x41 0x00`, and the UTF-32 BE BOM starts with `0x00 0x00 0xFE 0xFF`. Running the NULL probe as-is would misclassify these text files as binary. `isbinaryfile`'s own `isBinaryCheck` has BOM exemptions, so `hasNonUtf8TextBom` mirrors them: buffers starting with a UTF-16/UTF-32 BOM skip the probe and fall through to the slow path (jschardet+iconv) unchanged. Behavior is identical to pre-change.

side-effect(restored-file): The Korean Markdown file above used to throw → be silently dropped and disappear from the output; after this fix it is correctly included in the output.

test(regression): Add 2 tests to `tests/core/file/fileRead.test.ts`.
- valid UTF-8 multi-byte (consecutive 3-byte Hangul, no NULL) round-trips unchanged as text
- a UTF-16 LE BOM file ("Hello\n") decodes correctly on the slow path even though it contains NULL

bench(local, M-series Mac, hyperfine --warmup 1 --runs 5):
- `node bin/repomix.cjs --quiet` overall: 1152ms → **406ms** (about 2.8× faster)
- `--include 'website/client/src/ko/guide/tips/best-practices.md'` alone: 880ms → **170ms** (about 5.2× faster)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 15:56:10 +09:00
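The reordering in commit 8a08815ba1 above (try a fatal UTF-8 decode first, and only hand undecodable buffers to the content-based detector) might be sketched like this. Everything here is illustrative: `decodeBuffer`, the `ReadResult` type, and the injected `isBinary` / `decodeLegacy` callbacks are hypothetical stand-ins for `isBinaryFile` and the jschardet+iconv slow path, and the NULL-byte probe is left out for brevity.

```typescript
type ReadResult = { content: string } | { skipped: "binary-content" | "encoding-error" };

// `isBinary` and `decodeLegacy` stand in for the content detector and the legacy-encoding slow path.
function decodeBuffer(
  buf: Uint8Array,
  isBinary: (b: Uint8Array) => boolean,
  decodeLegacy: (b: Uint8Array) => string | null,
): ReadResult {
  try {
    // Fast path: a fatal decoder throws on any invalid UTF-8 sequence.
    return { content: new TextDecoder("utf-8", { fatal: true }).decode(buf) };
  } catch {
    // Only buffers that are not valid UTF-8 reach the expensive detector.
    if (isBinary(buf)) return { skipped: "binary-content" };
    const text = decodeLegacy(buf);
    return text === null ? { skipped: "encoding-error" } : { content: text };
  }
}
```

Because valid UTF-8 input returns from the `try` branch, the detector is never invoked for it, which is how the pathological protobuf check is avoided in this sketch.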
Claude 1bef40ba2d test: Tighten misleading test names and pin packageManager guard
Address two review threads on PR #1518 that flagged tests whose titles
overstated what was being verified.

- fileProcess: the longBase64 string is one continuous line, so the
  truncateBase64 → removeEmptyLines ordering was never actually under
  test (truncateBase64Content's regex does not span newlines). Rename
  to describe the combined behavior the test really pins.
- skillTechStack: rename the per-directory case to reflect that root
  and subpackage land in separate buckets keyed by getDirPath, and
  add a second case with two package.json entries at the same path
  to genuinely exercise the parsed.packageManager && !result.packageManager
  guard at skillTechStack.ts:541.
2026-04-26 13:24:32 +00:00
Kazuki Yamada 402e4906d7 test: Pin v1.14.0 regression-prone invariants
Targeted regression tests for the high-risk areas identified in the
v1.13.1..main audit, focusing on silent-correctness bugs and parallel
error handling — places that wouldn't surface in CI but would in the
field.

- core/metrics/calculateMetrics: pin numeric equivalence between the
  fast path (Σ file tokens + wrapper tokens) and the slow path (full
  output tokenization). Cover wrapper-extraction fallback, split-output
  fallback, and worker pool cleanup when fileMetrics rejects.

- core/file/fileProcess: pin transform ordering invariants —
  removeComments → removeEmptyLines (blank lines from comment removal
  must be cleaned up; preserved when removeEmptyLines is off);
  truncateBase64 → removeEmptyLines (multi-line base64 squashed first);
  trim → showLineNumbers (no leading/trailing blanks numbered).
  Plus worker/lightweight path parity for inputs that don't need
  worker processing.

- core/packager: pin metrics worker pool cleanup on parallel branch
  failures (validateFileSafety, produceOutput, calculateMetrics, warmup
  rejection). Verify prefetchSortData failure is isolated and does not
  block sortOutputFiles.

- core/skill/skillTechStack: cover untested fix-commit invariants —
  root entry sorts first in monorepo output; configFiles deduplicated
  within a directory; first-seen packageManager wins per directory.
2026-04-26 19:43:53 +09:00
Kazuki Yamada cbdfc29b4d test: Cover error/edge paths in core (output, file, security, treeSitter)
Lift the four most impactful uncovered files past 90% lines without
introducing fragile or contrived tests. Each block targets real
user-facing branches (error handling, optional features, init/dispose).

- core/output/outputGenerate (78% -> ~90%):
  - buildOutputGeneratorContext: instructionFilePath success and missing-file
    paths; pre-computed vs. searchFiles fallback for empty directories;
    full-tree mode (success and listing failure); searchFiles failure wrap.
  - generateOutput: unsupported style throws RepomixError.

- core/security/validateFileSafety (79% -> ~95%):
  - logSuspiciousContentWarning loop: header line per section, plus
    singular ("issue") and plural ("issues") suffix per result.
  - No-op behavior when no suspicious git diff/log entries exist.

- core/file/fileSearch (88% -> ~92%):
  - handleGlobbyError: EPERM and EACCES translated to PermissionError;
    other error codes pass through.
  - Outer catch: generic Error wrapped with directory context;
    non-Error throw produces the generic fallback message.

- core/treeSitter/languageParser (74% -> ~88%):
  - getResources before init() throws RepomixError.
  - init() is idempotent (Parser.init is called only once across two calls).
  - Parser.init() failure is wrapped as RepomixError.
  - dispose() resets state so subsequent calls require re-init.

Coverage:
- Statements 89.51% -> 90.23%
- Branches   79.31% -> 80.26%
- Functions  89.37% -> 89.69%
- Lines      90.06% -> 90.80%
2026-04-26 19:35:00 +09:00
Kazuki Yamada 9aac452504 test: Raise overall coverage from 87.9% to 90.1%
Cover previously-untested paths across the shared, cli, core, and mcp
layers, focusing on branches that represent real user-facing behavior
rather than line-coverage chasing.

Highlights:
- shared/errorHandle: cover handleError (RepomixError, unexpected Error,
  unknown values, duck-typed worker errors, debug-level branches) and
  the three error class constructors.
- shared/logger: cover setLogLevelByWorkerData for env-var, workerData
  (array and object shapes), and invalid/missing inputs.
- shared/memoryUtils: add a fresh test file covering stats, log helpers,
  and withMemoryLogging success/error paths.
- shared/processConcurrency: cover cleanupWorkerPool (Node, Bun-skip,
  swallowed teardown errors) and the run/cleanup delegation.
- shared/unifiedWorker: cover the cache-hit path and the workerData
  (array/object) and REPOMIX_WORKER_TYPE detection branches.
- core/metrics/TokenCounter: cover the catch branch (Error,
  non-Error throws, with/without filePath).
- core/file/fileManipulate: cover removeEmptyLines on inherited base
  and composite manipulators.
- cli/cliReport: cover skill-directory and split-output summary lines.
- mcp/tools/packRemoteRepositoryTool: add tests mirroring the
  packCodebaseTool pattern (success, runCli failure, runCli throw,
  workspace creation failure).
- mcp/tools/fileSystemReadDirectoryTool: switch to mocking
  node:fs/promises so existing mocks actually intercept calls, and
  cover the file-vs-dir, listing, empty-directory, and readdir-error
  paths.

Result:
- Statements 87.29% -> 89.51%
- Branches   76.16% -> 79.31%
- Functions  87.60% -> 89.37%
- Lines      87.89% -> 90.06%
2026-04-26 19:28:09 +09:00
Kazuki Yamada 44db93451d test(core): Cover precondition guards in truncateBase64
intent(truncateBase64-tests): add explicit coverage for the new fast-path guards introduced alongside the regex-skip optimization
decision(test-cases): focus on the four cases that exercise guard behavior not previously asserted — empty input, exactly-below-threshold (255 chars), run-reset on non-base64 separator, and non-base64 data URI without `;base64,`

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 18:09:02 +09:00
Claude 4bd257e280 perf(core): Speed up empty-directory detection in file search
Merge the two separate globby traversals used by `searchFiles` into a
single one and parallelize the per-directory `readdir` calls used to
filter empty directories.

Background
----------
When `output.includeEmptyDirectories` is enabled (the default for
`repomix.config.json` in this repo, and any repo that wants an accurate
directory tree), `searchFiles` previously walked the working tree twice:
once with `onlyFiles: true` and a second time with `onlyDirectories:
true`. Each call re-traversed the tree and re-parsed every `.gitignore`
/ `.repomixignore` file. `findEmptyDirectories` then issued `readdir`
serially for every matched directory, awaiting each syscall before
starting the next.

Change
------
* Replace the two globby invocations with one `objectMode: true,
  onlyFiles: false` call. Partition the returned `GlobEntry[]` by
  `dirent.isFile()` / `dirent.isDirectory()`, matching the previous
  `onlyFiles: true` semantics for symlinks and other non-file
  non-directory entries.
* Rewrite `findEmptyDirectories` to run the per-directory `readdir`
  checks concurrently via `Promise.all`. Ordering is preserved by the
  result array and the caller sorts the final list anyway.
* When `includeEmptyDirectories` is disabled, keep the fast
  `onlyFiles: true` path unchanged so the default CLI run pays no cost.

Benchmark (hyperfine, repomix packing itself, 30 runs, warmup 3)
----------------------------------------------------------------
Run 1: baseline 2.162s ± 0.042s → perf 2.017s ± 0.029s  → -145ms (-6.7%)
Run 2: baseline 2.161s ± 0.023s → perf 2.030s ± 0.027s  → -131ms (-6.1%)

Per-stage verbose timings:
  baseline: [globby files 200ms] + [globby dirs 85ms] + [empty dirs 61ms]
  perf:     [combined globby    223ms]                + [empty dirs 66ms]
  saved:    -57ms consistently on the critical path
2026-04-26 16:41:38 +09:00
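The concurrent empty-directory check described in commit 4bd257e280 above can be illustrated with a small sketch. It assumes a hypothetical signature in which `readdir` is injected (so the example does not touch the real file system); the actual `findEmptyDirectories` in the repository may take different arguments and apply extra filtering.

```typescript
// Hypothetical sketch: `readdir` is injected so the example does not touch the real file system.
type ReadDir = (dir: string) => Promise<string[]>;

async function findEmptyDirectories(dirs: string[], readdir: ReadDir): Promise<string[]> {
  // Start every readdir at once; Promise.all keeps results in input order.
  const checks = await Promise.all(
    dirs.map(async (dir) => ({ dir, isEmpty: (await readdir(dir)).length === 0 })),
  );
  return checks.filter((c) => c.isEmpty).map((c) => c.dir);
}
```

In real use, `readdir` from `node:fs/promises` could be passed as the second argument. The serial version differs only in awaiting each call inside a loop before starting the next.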
Kazuki Yamada 6fecdca6b3 test(core): Add combined worker + lightweight pipeline integration test
Add test that exercises all transforms together: removeComments (worker)
+ truncateBase64 + removeEmptyLines + showLineNumbers (lightweight) to
verify the full two-phase pipeline produces correct output.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 18:27:55 +09:00
Kazuki Yamada f1067163ec refactor(core): Simplify into single applyLightweightTransforms and remove redundant trim
Merge applyPreCompressTransforms and applyPostCompressTransforms into
a single applyLightweightTransforms function. Move truncateBase64 to
post-worker phase since tree-sitter handles string literals as single
AST nodes regardless of content size.

Remove redundant trim from worker processContent — the main thread
applyLightweightTransforms already handles it.

Final pipeline:
  Worker: removeComments → compress
  Main:   truncateBase64 → removeEmptyLines → trim → showLineNumbers

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 18:01:22 +09:00
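The main-thread half of the final pipeline in commit f1067163ec above (truncateBase64 → removeEmptyLines → trim → showLineNumbers) could be sketched as below. The helper bodies are deliberately simplified stand-ins, and the option names and line-number format are assumptions; only the function name `applyLightweightTransforms` and the ordering come from the commit message.

```typescript
interface TransformOptions {
  truncateBase64: boolean;
  removeEmptyLines: boolean;
  showLineNumbers: boolean;
}

// Simplified stand-in for the real truncation: shortens long base64-looking runs.
const truncateBase64 = (s: string): string =>
  s.replace(/[A-Za-z0-9+/]{256,}={0,2}/g, (m) => `${m.slice(0, 32)}...`);

const removeEmptyLines = (s: string): string =>
  s.split("\n").filter((line) => line.trim() !== "").join("\n");

const addLineNumbers = (s: string): string =>
  s.split("\n").map((line, i) => `${i + 1}: ${line}`).join("\n");

// Applies the lightweight transforms in the fixed order named in the commit.
function applyLightweightTransforms(content: string, opts: TransformOptions): string {
  let out = content;
  if (opts.truncateBase64) out = truncateBase64(out);
  if (opts.removeEmptyLines) out = removeEmptyLines(out);
  out = out.trim();
  if (opts.showLineNumbers) out = addLineNumbers(out);
  return out;
}
```

Running trim before line numbering means leading and trailing blank lines never receive a number, which matches the "trim → showLineNumbers" invariant pinned in a later test commit.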
Kazuki Yamada 47e4a65b61 fix(core): Move removeEmptyLines to post-compress to preserve ordering
Move removeEmptyLines from applyPreCompressTransforms to
applyPostCompressTransforms so it runs after removeComments.
This ensures empty lines created by comment removal are cleaned up.

Transform order: truncateBase64 (pre) → [removeComments → compress] (worker) → removeEmptyLines → trim → showLineNumbers (post)

Simplify applyPreCompressTransforms to only handle truncateBase64
with an early return when disabled.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 17:57:00 +09:00
Kazuki Yamada cac35d0465 fix(core): Preserve transform order by splitting into pre/post compress phases
Split applyLightweightTransforms into applyPreCompressTransforms and
applyPostCompressTransforms to preserve the original execution order:
truncateBase64 → removeComments → removeEmptyLines → trim → compress → showLineNumbers

Pre-compress transforms (truncateBase64, removeEmptyLines) must run
before tree-sitter parsing to avoid performance regression with large
base64 strings and to ensure empty line removal affects chunk merging.

Action: split lightweight transforms into pre-compress and post-compress phases
Why: previous refactor changed execution order, causing tree-sitter to receive
untreated base64 and content with empty lines, altering compress output

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 17:40:11 +09:00
Kazuki Yamada e978decb2b test(core): Add regression tests for base64 truncation and lastIndex safety
Add test for consecutive truncateBase64Content calls to verify global
regex lastIndex reset works correctly. Add test for truncateBase64
config branch in applyLightweightTransforms.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 17:33:54 +09:00
Kazuki Yamada 3e70628307 refactor(core): Separate lightweight transforms from worker processing
Extract lightweight file transforms (truncateBase64, removeEmptyLines,
trim, showLineNumbers) into applyLightweightTransforms() on the main
thread, keeping only heavy operations (removeComments, compress) in
worker processContent(). This eliminates dual management of the same
logic across worker and main thread paths.

Also pre-compile base64 regex patterns at module level to avoid
re-creation per file call.

Action: split processContent into heavy (worker) and lightweight (main thread) phases
Action: extract applyLightweightTransforms() as single source of truth for lightweight ops
Action: hoist regex patterns in truncateBase64.ts to module scope with lastIndex reset
Why: lightweight transforms were duplicated in both processFilesMainThread and processContent
Why: regex re-compilation per file added unnecessary overhead for large repos

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 17:24:32 +09:00
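The "hoist regex patterns to module scope with lastIndex reset" action in commit 3e70628307 above addresses a known JavaScript pitfall: a regex with the `g` flag keeps `lastIndex` between `test()`/`exec()` calls. A minimal sketch, using a hypothetical pattern and helper name rather than the ones in `truncateBase64.ts`:

```typescript
// Compiled once at module scope instead of once per file.
const BASE64_RUN = /[A-Za-z0-9+/]{8,}/g;

function containsBase64Run(text: string): boolean {
  // A global regex remembers lastIndex between test()/exec() calls, so reset it first.
  BASE64_RUN.lastIndex = 0;
  return BASE64_RUN.test(text);
}
```

Without the reset, a second call on the same short string would resume scanning from where the previous match ended and could report no match.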
Kazuki Yamada e2101a50d1 test(core): Add boundary test for exactly 256-char base64 string
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 22:59:45 +09:00
Kazuki Yamada 25ec27028e fix(core): Reduce false positives in truncateBase64 for path-like strings
Raise MIN_BASE64_LENGTH_STANDALONE from 60 to 256 since truncating short
strings saves negligible tokens. Require digits in isLikelyBase64 heuristic
since real base64-encoded binary data virtually always contains numbers,
while XPath and file path strings typically do not.

Closes #1298

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 22:52:33 +09:00
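The two changes in commit 25ec27028e above (a 256-character minimum and a required digit) can be illustrated with a simplified heuristic. The constant and function names come from the commit message, but the body is an assumption; the real `isLikelyBase64` may apply further checks.

```typescript
const MIN_BASE64_LENGTH_STANDALONE = 256;

// Hypothetical version of the heuristic: long enough, base64 alphabet only, and at least one digit.
function isLikelyBase64(candidate: string): boolean {
  if (candidate.length < MIN_BASE64_LENGTH_STANDALONE) return false;
  if (!/^[A-Za-z0-9+/]+={0,2}$/.test(candidate)) return false;
  return /[0-9]/.test(candidate);
}
```

A long slash-separated path such as `"a/b".repeat(100)` uses only base64 alphabet characters, so the digit requirement is what keeps it from being truncated in this sketch.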
Kazuki Yamada f38828aa90 fix(test): Use path.sep in fileSearch test for cross-platform compatibility
Mock data and expected sort order now use path.sep instead of hardcoded
'/' separators. On Windows, path.sep is '\' so sortPaths splits
differently, producing a different sort order.

Co-Authored-By: Claude Opus 4.6 (1M context) <koukun0120@gmail.com>
2026-03-20 01:16:01 +09:00
Kazuki Yamada f5977b2e6a test(core): Use exact sorted order assertion in fileSearch test
Replace weak arrayContaining assertion with exact toEqual using the
correct sorted order, so the test verifies both content and sort behavior.

Co-Authored-By: Claude Opus 4.6 (1M context) <koukun0120@gmail.com>
2026-03-20 01:08:03 +09:00
yamadashy d9fa509ee6 perf(core): Optimize sortPaths with decorate-sort-undecorate pattern
Pre-compute path.split() once per path before sorting, avoiding
O(N log N) repeated string allocations during comparisons.
Benchmark: 10,000 files 65ms → 11ms (6x speedup).

Co-Authored-By: Claude Opus 4.6 (1M context) <koukun0120@gmail.com>
2026-03-20 00:54:33 +09:00
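The decorate-sort-undecorate pattern named in commit d9fa509ee6 above can be sketched as follows. The comparator is a hypothetical one (plain segment-by-segment string order) and is not claimed to match the ordering rules of the real `sortPaths`; the point is that `split` runs once per path instead of once per comparison.

```typescript
// Hypothetical comparator: segment-by-segment string order, shorter paths first on ties.
function sortPaths(paths: string[], sep = "/"): string[] {
  // Decorate: split each path once instead of inside every comparison.
  const decorated = paths.map((path) => ({ path, parts: path.split(sep) }));
  decorated.sort((a, b) => {
    const n = Math.min(a.parts.length, b.parts.length);
    for (let i = 0; i < n; i++) {
      if (a.parts[i] !== b.parts[i]) return a.parts[i] < b.parts[i] ? -1 : 1;
    }
    return a.parts.length - b.parts.length;
  });
  // Undecorate: return only the original strings.
  return decorated.map((d) => d.path);
}
```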
Kazuki Yamada e97691dd36 perf(core): Replace worker threads with promise pool for file collection
After the UTF-8 fast path optimization eliminated the CPU-heavy jschardet
bottleneck, file collection became I/O-bound. Worker threads now add pure
overhead (Tinypool init, structured clone, IPC) without benefit.

Benchmark (954 files, M2 Pro 10-core):
- Worker Threads: ~108ms → Promise Pool (c=50): ~37ms (2.9x faster)

Changes:
- Replace Tinypool worker dispatch with a simple promise pool (c=50)
- Inject readRawFile via deps for testability
- Remove unused concurrentTasksPerWorker from WorkerOptions
- Simplify tests to use readRawFile mock instead of 5+ module mocks
2026-02-17 23:09:18 +09:00
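A "simple promise pool" of the kind commit e97691dd36 above refers to might look like the sketch below. The function name and signature are hypothetical; the commit only states that a pool with concurrency 50 replaced the Tinypool worker dispatch.

```typescript
// Minimal promise pool: at most `concurrency` tasks are in flight at any time.
async function promisePool<T, R>(
  items: T[],
  concurrency: number,
  task: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  const worker = async (): Promise<void> => {
    while (next < items.length) {
      const index = next++; // safe: JavaScript runs this synchronously between awaits
      results[index] = await task(items[index]);
    }
  };
  const workers = Array.from({ length: Math.min(concurrency, items.length) }, () => worker());
  await Promise.all(workers);
  return results;
}
```

Each "worker" here is just an async loop on the main thread, so there is no thread start-up, structured clone, or IPC cost; results are written by index and therefore keep input order.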
Kazuki Yamada 7dcdbae24d perf(core): Add UTF-8 fast path to skip expensive jschardet encoding detection
Previously, every file went through jschardet.detect() which scans the entire
buffer through multiple encoding probers (MBCS, SBCS, Latin1) with frequency
table lookups — the most expensive CPU operation in file collection.

Since ~99% of source code files are UTF-8, we now try TextDecoder('utf-8',
{ fatal: true }) first. If it succeeds, jschardet and iconv are skipped entirely.
Non-UTF-8 files (e.g., Shift-JIS, EUC-KR) fall back to the original detection path.

Additionally, set concurrentTasksPerWorker=3 for fileCollect workers to better
overlap I/O waits within each worker thread.

Benchmark results (838 files, 10 CPU cores):
- Before: ~616ms
- After:  ~108ms (5.7x faster)
2026-02-17 23:09:18 +09:00
Kazuki Yamada f41b75c560 fix(test): fix lint errors and update test signatures for filePathsByRoot
- Remove unused imports (generateFileTree, treeToString) in fileTreeGenerate.test.ts
- Add filePathsByRoot parameter to generateOutput and produceOutput calls in tests
- Update expect assertions to include filePathsByRoot argument
2026-01-04 23:11:28 +09:00
spandan-kumar 3f2680e5d5 feat(tree): add multi-root directory labels
When packing multiple directories, the directory tree output now shows
labeled sections like [cli]/, [config]/ to clarify which files belong
to which root directory.

- Add FilesByRoot interface and generateTreeStringWithRoots function
- Update output pipeline to pass file-to-root mapping
- Add unit tests for new tree generation functions
- Update existing tests for new function signatures

Closes #1023
2026-01-04 22:57:10 +09:00
Kazuki Yamada cdae79d115 fix(file): Replace strip-comments with @repomix/strip-comments
Replace the original strip-comments package with @repomix/strip-comments,
which provides enhanced support for:
- Go directives (//go:build, //go:generate, etc.)
- C++ document comments (///)
- Python docstrings (""" and ''') and hash comments

This removes the custom GoManipulator, PythonManipulator, and CppManipulator
implementations in favor of the improved library support.

Note: preserveNewlines option keeps newlines for line number preservation,
so docstrings are replaced with empty lines rather than being fully removed.
2025-12-15 00:15:17 +09:00
Kazuki Yamada 47398ae820 test(file): Add test for legitimate U+FFFD character handling
Verify that files containing intentional U+FFFD characters in the source
are correctly read (not skipped), testing the TextDecoder validation path.
2025-12-14 19:44:47 +09:00
Kazuki Yamada c4354e7745 fix(file): improve U+FFFD detection for UTF-8 encoding
- Use TextDecoder('utf-8', { fatal: true }) to distinguish actual decode
  errors from legitimate U+FFFD characters in UTF-8 files
- Change test temp directory from tests/fixtures to os.tmpdir() to avoid
  clobbering committed fixtures and reduce parallel-run collisions
- Non-UTF-8 files still use iconv.decode() fallback behavior

Addresses CodeRabbit review comments on PR #1007
2025-12-14 18:56:34 +09:00
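The distinction drawn in commit c4354e7745 above, between a genuine decode error and a legitimate U+FFFD already present in the file, follows from how `TextDecoder` behaves in fatal mode. A minimal sketch with a hypothetical helper name:

```typescript
// Returns the decoded text, or null when the bytes are not valid UTF-8.
function decodeUtf8Strict(buf: Uint8Array): string | null {
  try {
    return new TextDecoder("utf-8", { fatal: true }).decode(buf);
  } catch {
    return null;
  }
}
```

The bytes `EF BF BD` are the valid UTF-8 encoding of U+FFFD, so they decode successfully, whereas a stray `0xFF` makes the fatal decoder throw. A non-fatal decoder would produce U+FFFD in both cases, which is why scanning the output for the replacement character cannot tell them apart.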
Kazuki Yamada 72b27e4c9f fix(file): remove jschardet confidence check for encoding detection
Remove the confidence < 0.2 check that was causing valid UTF-8/ASCII files
to be incorrectly skipped. Files are now only skipped if they contain actual
decode errors (U+FFFD replacement characters).

This fixes issues where:
- Valid Python files were skipped with confidence=0.00 (#869)
- HTML files with Thymeleaf syntax (~{}) were incorrectly detected as binary (#847)

The isbinaryfile library (added in PR #1006) now handles binary detection more
accurately, making the confidence-based heuristic unnecessary.

Fixes #869
2025-12-14 18:44:48 +09:00
Kazuki Yamada 7f0d05d703 feat(core): Replace istextorbinary with is-binary-path and isbinaryfile
Migrate from istextorbinary (last updated 2023-12) to actively maintained packages:
- is-binary-path: Extension-based binary detection (updated 2024-04)
- isbinaryfile: Content-based binary detection with zero dependencies (updated 2025-12)

Improvements:
- Binary extension coverage: 13 → 262 extensions (~20x increase)
- Content detection: Better UTF-16/CJK support, statistical analysis (512 bytes vs 72 bytes)

The two-stage detection logic (extension check → content check) is preserved.
2025-12-14 18:03:43 +09:00
Kazuki Yamada b99d08398f test(core): Add parent directory ignore file tests
Add comprehensive tests for parent directory ignore file handling to address
PR #964 review feedback (Risk: High concern from claude[bot]).

Added three new test cases:

1. Parent directory .ignore file handling
   - Verifies .ignore files in parent directories are respected
   - Tests with useDotIgnore: true configuration
   - Ensures patterns apply to nested subdirectories

2. Parent directory .repomixignore file handling
   - Verifies .repomixignore files in parent directories are respected
   - Tests default configuration (.repomixignore always enabled)
   - Ensures patterns apply to nested subdirectories

3. Git worktree + parent .gitignore interaction
   - Verifies worktree environments handle parent .gitignore correctly
   - Combines worktree detection with parent .gitignore pattern application
   - Tests that .git file (not directory) is properly handled in worktree
   - Ensures gitignore: true option enables parent .gitignore handling

All tests follow the same pattern as existing "should respect parent directory
.gitignore patterns (v16 behavior)" test, providing consistent coverage for
.gitignore, .ignore, and .repomixignore files.

These tests ensure that globby v16's parent directory ignore file handling
works correctly for all supported ignore file types, not just .gitignore.
2025-11-24 19:11:18 +09:00
Kazuki Yamada dd25beccfd test(core): Add type guards to globby options in tests
Fix TS18048 errors in createBaseGlobbyOptions consistency tests by adding
expect(options).toBeDefined() and if (!options) continue guards. This ensures
type safety and prevents undefined access to globby call options.

All three tests now properly guard against potentially undefined options:
- should use consistent base options across all globby calls
- should respect gitignore config consistently across all functions
- should apply custom ignore patterns consistently across all functions

This addresses the coderabbitai feedback on PR #964.
2025-11-24 17:44:58 +09:00
Kazuki Yamada f0d8de48ca refactor(core): Address PR feedback for globby v16 update
This commit addresses three suggestions from AI code review bots on PR #964:

1. Remove unnecessary array spreads in createBaseGlobbyOptions
   - Removed defensive copying of ignorePatterns and ignoreFilePatterns
   - Arrays are already created fresh in calling functions, making spreads redundant
   - Minor performance optimization by avoiding unnecessary array allocations

2. Extract prepareIgnoreContext helper function
   - Centralized duplicate ignore pattern preparation logic
   - Eliminated code duplication across searchFiles, listDirectories, and listFiles
   - The new helper handles:
     * Getting ignore patterns and ignore file patterns
     * Normalizing patterns for consistent trailing slash handling
     * Git worktree special case handling
   - Improves maintainability and ensures consistency across all globby calls

3. Add explanatory comment to v16 behavior test
   - Documented why v16's behavior is superior (matches Git's standard behavior)
   - Clarifies that v16 respects parent directory .gitignore files
   - Helps future maintainers understand the intentional breaking change

All 856 tests pass with no regressions.
2025-11-24 17:44:58 +09:00
Kazuki Yamada c9d296eec6 chore: Fix linting errors
- Add website/server/dist/ to .gitignore for secretlint
- Fix TypeScript type errors in fileSearch.test.ts
- Format imports in fileSearch.ts (biome)
2025-11-24 17:44:58 +09:00
Kazuki Yamada 4b2d8c12d0 test(core): Add regression tests for globby v16 update
- Add test for parent directory .gitignore pattern handling (v16 behavior)
- Add tests for createBaseGlobbyOptions consistency across all functions
- Verify gitignore option is passed correctly to all globby calls
- Ensure no regression from v15 to v16 upgrade

These tests prove that:
1. Parent .gitignore files are respected with globby v16
2. All 4 globby calls (searchFiles files/dirs, listDirectories, listFiles)
   use consistent base options
3. gitignore configuration is applied uniformly across all functions

All 856 tests pass, confirming no regression from the changes.
2025-11-24 17:44:58 +09:00
Kazuki Yamada 3e410ce4dd feat(core): Improve .gitignore handling with globby v16
- Upgrade globby from v15 to v16
- Use gitignore option to respect parent directory .gitignore files
- This matches Git's standard behavior where parent .gitignore patterns apply to subdirectories
- Move .gitignore handling from ignoreFiles to gitignore option
- Update tests to reflect the new behavior

This change improves compatibility with Git and provides more accurate file filtering when running Repomix in subdirectories.
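
A minimal sketch of what moving .gitignore handling from `ignoreFiles` to the `gitignore` option could look like. The helper name `sketchBaseGlobbyOptions`, its parameters, and the `**/.repomixignore` entry are assumptions for illustration. Only the option names (`cwd`, `ignore`, `ignoreFiles`, `gitignore`) come from globby, and the actual options Repomix passes are not shown in this log.

```typescript
// Hypothetical option shape; the field names mirror globby option names.
interface BaseGlobbyOptionsSketch {
  cwd: string;
  ignore: string[];
  ignoreFiles: string[];
  gitignore: boolean;
}

// Hypothetical helper, not the project's createBaseGlobbyOptions.
function sketchBaseGlobbyOptions(
  rootDir: string,
  ignorePatterns: string[],
  useGitignore: boolean,
): BaseGlobbyOptionsSketch {
  return {
    cwd: rootDir,
    ignore: ignorePatterns,
    // .gitignore is no longer listed here; only tool-specific ignore files remain (assumed entry).
    ignoreFiles: ["**/.repomixignore"],
    // Delegate .gitignore handling to globby so parent-directory rules apply, per the commit message.
    gitignore: useGitignore,
  };
}

const options = sketchBaseGlobbyOptions("/repo/packages/app", ["node_modules/**"], true);
console.log(options.gitignore, options.ignoreFiles);
```

Keeping `.gitignore` out of `ignoreFiles` avoids handling the same file through two mechanisms.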
2025-11-24 17:44:58 +09:00
Kazuki Yamada 44d172bcb9 fix(core): Correct .ignore file priority order
Fixed the priority order of ignore files to match the intended behavior:
- .gitignore (lowest priority)
- .ignore (medium priority)
- .repomixignore (highest priority)

The previous implementation had .repomixignore at the lowest priority,
which was incorrect. Repomix-specific ignore rules should take precedence
over generic ignore files.

This ensures that:
1. .repomixignore can override .ignore and .gitignore rules
2. .ignore can override .gitignore rules
3. The priority order documented in README is correctly implemented
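
The priority order above can be modelled as "later sources override earlier ones". This is a simplified sketch, not the project's implementation: the `Verdicts` type and `resolveIgnored` function are hypothetical, and each source is reduced to a single yes/no/no-opinion verdict for one path.

```typescript
type IgnoreSource = ".gitignore" | ".ignore" | ".repomixignore";

// Lowest priority first, highest last, as listed in the commit message.
const PRIORITY: IgnoreSource[] = [".gitignore", ".ignore", ".repomixignore"];

// Hypothetical per-source verdict for one path:
// true = ignore it, false = keep it, undefined = the source has no matching rule.
type Verdicts = Partial<Record<IgnoreSource, boolean>>;

function resolveIgnored(verdicts: Verdicts): boolean {
  let ignored = false; // with no matching rule anywhere, the path is kept
  for (const source of PRIORITY) {
    const verdict = verdicts[source];
    if (verdict !== undefined) {
      ignored = verdict; // a higher-priority source overrides earlier verdicts
    }
  }
  return ignored;
}

// .gitignore ignores the path, but .repomixignore re-includes it.
console.log(resolveIgnored({ ".gitignore": true, ".repomixignore": false }));
```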
2025-11-08 19:45:58 +09:00
Kazuki Yamada bb7fae2b45 feat(core): Add .ignore file support
This PR adds support for .ignore files, which are used by tools like ripgrep and The Silver Searcher. This allows users to maintain a single .ignore file that works across multiple tools instead of maintaining separate ignore files.

Changes:
- Add ignore.useDotIgnore config option (default: true)
- Add --no-dot-ignore CLI flag to disable .ignore file usage
- Update ignore file priority: .repomixignore > .ignore > .gitignore > default patterns
- Add comprehensive tests for .ignore file handling
- Update documentation to reflect new .ignore file support

The .ignore file is enabled by default but can be disabled via configuration or CLI flag, maintaining backward compatibility.
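
A minimal sketch of how a default-on option and a disabling CLI flag might combine. The names `IgnoreConfigSketch` and `resolveUseDotIgnore` are hypothetical, and the rule that the CLI flag wins over the config file is an assumption rather than something stated in this commit.

```typescript
// Hypothetical slice of the config; only useDotIgnore is taken from the commit message.
interface IgnoreConfigSketch {
  useDotIgnore: boolean;
}

const DEFAULT_IGNORE_CONFIG: IgnoreConfigSketch = { useDotIgnore: true };

function resolveUseDotIgnore(
  fileConfig: Partial<IgnoreConfigSketch>,
  cliNoDotIgnore: boolean,
): boolean {
  // Assumption: --no-dot-ignore on the command line overrides the config file.
  if (cliNoDotIgnore) return false;
  // Otherwise use the config file value, falling back to the default (true).
  return fileConfig.useDotIgnore ?? DEFAULT_IGNORE_CONFIG.useDotIgnore;
}

console.log(resolveUseDotIgnore({}, false));
console.log(resolveUseDotIgnore({ useDotIgnore: true }, true));
```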

Resolves #937
2025-11-08 15:51:21 +09:00
Kazuki Yamada 72735cfdb1 test(coverage): Improve test coverage for CLI and core modules
Added comprehensive test coverage for critical CLI and core functionality:

- Created new test file for cliSpinner with 15 tests covering:
  * Spinner start/stop/update operations
  * Quiet/verbose/stdout mode handling
  * Success/fail message display
  * Interval management

- Enhanced initAction tests (11→17 tests):
  * Added isCancel handling for user cancellation
  * Added return value validation tests
  * Covered config and ignore file creation flows

- Enhanced cliReport tests (8→15 tests):
  * Added git diffs/logs reporting tests
  * Added security check reporting for git content
  * Added single vs multiple issue message handling

- Enhanced permissionCheck tests (13→16 tests):
  * Added macOS-specific error message tests
  * Added platform-specific error handling tests
  * Added unknown error code handling

- Enhanced outputGenerate tests (7→12 tests):
  * Added git diffs/logs inclusion tests
  * Added JSON format output tests
  * Added file/directory structure exclusion tests

Overall improvements:
- Test count: 804 → 840 (+36 tests)
- Code coverage: 70.63% → 71.00% (+0.37%)
- Branch coverage: 77.64% → 78.55% (+0.91%)
- Significant improvement in CLI modules (cliSpinner: 25% → 59.61%)
2025-10-31 01:18:21 +09:00
Kazuki Yamada ea1cc485c2 chore(config): disable organizeImports for src/index.ts
Added override configuration to disable Biome's organizeImports feature
specifically for src/index.ts to allow manual import order management
while keeping automatic import organization enabled for other files.
2025-09-21 13:54:12 +09:00
Kazuki Yamada f87e00dbdf chore(lint): upgrade biome to v2.2.4 and fix all lint errors
Updated biome from v1.9.4 to v2.2.4 to take advantage of latest linting improvements.

- Upgraded @biomejs/biome from ^1.9.4 to ^2.2.4
- Updated biome.json configuration for v2 compatibility:
  - Changed schema to 2.2.4
  - Updated file includes/ignores syntax
  - Added Vue file overrides to disable noUnusedVariables/noUnusedImports
- Fixed all lint errors:
  - Added radix parameter to parseInt calls
  - Prefixed unused parameters with underscore
  - Removed unused imports
  - Fixed biome suppression comments
  - Removed !important from CSS
  - Added type ignores for Vue component definitions

All 325 files now pass lint with 0 warnings and 0 errors.
2025-09-21 13:39:43 +09:00
Kazuki Yamada 5898d6397c refactor(workers): improve code quality and type safety
Address PR review feedback:
- Fix worker path to use relative path instead of lib directory
- Add proper function overloads for defaultActionWorker
- Remove unsafe type assertions in worker code
- Improve error handling with optional stack property
- Extract log level validation logic to reduce duplication
- Add NaN check for environment variable parsing
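
As an illustration of the NaN check mentioned above, here is a small sketch of defensive integer parsing from an environment variable. The function name `parseIntEnv` and the fallback behaviour are hypothetical; the actual worker code is not shown in this log.

```typescript
// Hypothetical helper: parse an integer from an environment-style string,
// returning the fallback when the value is missing or not a number.
function parseIntEnv(raw: string | undefined, fallback: number): number {
  if (raw === undefined) return fallback;
  // Always pass the radix so a value such as "08" is read as decimal.
  const parsed = Number.parseInt(raw, 10);
  // parseInt returns NaN for input like "debug"; guard before using the value.
  return Number.isNaN(parsed) ? fallback : parsed;
}

console.log(parseIntEnv("3", 1));
console.log(parseIntEnv("debug", 1));
```

Without the guard, a NaN value would silently flow into later numeric comparisons, all of which evaluate to false.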

All tests pass and linting issues are resolved.
2025-09-20 22:11:44 +09:00
Kazuki Yamada 78b25b86e7 feat(core): use direct globby import instead of worker isolation
Replace executeGlobbyInWorker with direct globby calls since worker isolation
is no longer necessary for globby execution.

- Remove src/core/file/globbyExecute.ts wrapper
- Remove src/core/file/workers/globbyWorker.ts
- Update fileSearch.ts to import and use globby directly
- Update tests to mock globby instead of executeGlobbyInWorker
- Simplify integration tests by removing worker mocks
2025-09-18 23:53:27 +09:00
Kazuki Yamada ddd2814f84 fix(tests): Update test mocks to use new WorkerOptions interface 2025-08-31 16:32:49 +09:00
Kazuki Yamada 8f07b63a61 feat(core): Add runtime selection support for worker pools
Add WorkerRuntime type and configurable runtime parameter to createWorkerPool and initTaskRunner functions. This allows choosing between 'worker_threads' and 'child_process' runtimes based on performance requirements.

- Add WorkerRuntime type definition for type safety
- Add optional runtime parameter to createWorkerPool with child_process default
- Add optional runtime parameter to initTaskRunner with child_process default
- Configure fileCollectWorker to use worker_threads for better performance
- Update all test files to use WorkerRuntime type
- Add comprehensive tests for runtime parameter functionality
- Maintain backward compatibility with existing code

The fileCollectWorker now benefits from the faster startup and shared memory of worker_threads, while other workers continue using child_process for stability.
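
A minimal sketch of a runtime type with a `child_process` default, as described above. The `WorkerRuntime` name comes from the commit message; `WorkerPoolOptionsSketch` and `resolveRuntime` are hypothetical and do not reproduce the real `createWorkerPool` signature.

```typescript
// The two runtimes named in the commit message.
type WorkerRuntime = "worker_threads" | "child_process";

// Hypothetical options shape for illustration only.
interface WorkerPoolOptionsSketch {
  workerPath: string;
  runtime?: WorkerRuntime;
}

// Fall back to child_process when the caller does not choose a runtime.
function resolveRuntime(options: WorkerPoolOptionsSketch): WorkerRuntime {
  return options.runtime ?? "child_process";
}

// A file-collection worker opting in to worker_threads; other callers omit the field.
console.log(resolveRuntime({ workerPath: "./fileCollectWorker.js", runtime: "worker_threads" }));
console.log(resolveRuntime({ workerPath: "./otherWorker.js" }));
```

Making the parameter optional is what keeps existing call sites working unchanged.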
2025-08-31 16:18:12 +09:00
Kazuki Yamada 575ae2bca4 test(core): Remove misleading Go nested block comments test
Removed the Go nested block comments test case as it was unnecessary
and potentially misleading. Go block comments do not nest according
to the language specification, so testing this behavior is not needed
and could cause confusion about the expected behavior.

The remaining tests adequately cover Go comment parsing functionality.
2025-08-31 00:39:05 +09:00
Kazuki Yamada 23a0f00005 fix(core): Correct Go block comment parsing to match language spec
Go block comments do not nest according to the language specification.
The first */ sequence should close the comment, regardless of any /*
sequences within it. This change removes the blockCommentDepth tracking
and ensures correct parsing behavior for Go code containing sequences
like /* comment with /* nested */ part */.

Updated test expectations to reflect the correct Go language behavior.
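
The non-nesting rule can be sketched as follows. This is an illustrative stripper, not the project's parser: `stripGoBlockComments` is a hypothetical name, and the sketch deliberately ignores string literals, rune literals, and `//` line comments.

```typescript
// Remove Go block comments, treating the first "*/" after "/*" as the end.
// No depth counter is kept, because Go block comments do not nest.
function stripGoBlockComments(source: string): string {
  let result = "";
  let i = 0;
  while (i < source.length) {
    if (source.startsWith("/*", i)) {
      // Search after the opening "/*" so that "/*/" is not treated as closed.
      const end = source.indexOf("*/", i + 2);
      if (end === -1) break; // unterminated comment: drop the remainder
      i = end + 2;
    } else {
      result += source[i];
      i += 1;
    }
  }
  return result;
}

// The inner "/*" is plain comment text, so the trailing " part */" survives as code.
console.log(stripGoBlockComments("/* comment with /* nested */ part */"));
```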
2025-08-31 00:05:04 +09:00