Add fs.readdir mock (returning empty array) to all relevant beforeEach
blocks so collectIgnoreFilePatterns does not fail with "entries is not
iterable". Update globby option assertions to reflect gitignore: false
and ignoreFiles: [] now that patterns are pre-collected by the prescan.
https://claude.ai/code/session_01Fm25x51fmGGeFMJyCm1CER
intent(ci-green): typos@1.45.1 flags `mis` as a typo of `miss`/`mist` and the
hyphenated `mis-classified` in the new PDF-magic regression test comment
trips the `Check typos` job. The unhyphenated `misclassified` is the more
common spelling and passes the dictionary.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Address two review threads on PR #1518 that flagged tests whose titles
overstated what was being verified.
- fileProcess: the longBase64 string is one continuous line, so the
truncateBase64 → removeEmptyLines ordering was never actually under
test (truncateBase64Content's regex does not span newlines). Rename
to describe the combined behavior the test really pins.
- skillTechStack: rename the per-directory case to reflect that root
and subpackage land in separate buckets keyed by getDirPath, and
add a second case with two package.json entries at the same path
to genuinely exercise the parsed.packageManager && !result.packageManager
guard at skillTechStack.ts:541.
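A minimal sketch of the first-seen guard the second case exercises. The types and the getDirPath stand-in here are hypothetical simplifications, not the code in skillTechStack.ts:

```typescript
interface ParsedConfig {
  filePath: string;
  packageManager?: string;
}

interface DirResult {
  configFiles: string[];
  packageManager?: string;
}

// Hypothetical stand-in for getDirPath: root-level files map to ".".
const getDirPath = (filePath: string): string => {
  const idx = filePath.lastIndexOf("/");
  return idx === -1 ? "." : filePath.slice(0, idx);
};

function groupByDirectory(parsedFiles: ParsedConfig[]): Map<string, DirResult> {
  const byDir = new Map<string, DirResult>();
  for (const parsed of parsedFiles) {
    const dir = getDirPath(parsed.filePath);
    const result = byDir.get(dir) ?? { configFiles: [] };
    result.configFiles.push(parsed.filePath);
    // First-seen wins: a later entry in the same bucket never overwrites.
    if (parsed.packageManager && !result.packageManager) {
      result.packageManager = parsed.packageManager;
    }
    byDir.set(dir, result);
  }
  return byDir;
}
```

Two entries at the same path are needed to reach the right-hand side of the guard. With one entry per directory, result.packageManager is always unset when the check runs.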
Targeted regression tests for the high-risk areas identified in the
v1.13.1..main audit, focusing on silent-correctness bugs and parallel
error handling — places that wouldn't surface in CI but would in the
field.
- core/metrics/calculateMetrics: pin numeric equivalence between the
fast path (Σ file tokens + wrapper tokens) and the slow path (full
output tokenization). Cover wrapper-extraction fallback, split-output
fallback, and worker pool cleanup when fileMetrics rejects.
- core/file/fileProcess: pin transform ordering invariants —
removeComments → removeEmptyLines (blank lines from comment removal
must be cleaned up; preserved when removeEmptyLines is off);
truncateBase64 → removeEmptyLines (multi-line base64 squashed first);
trim → showLineNumbers (no leading/trailing blanks numbered).
Plus worker/lightweight path parity for inputs that don't need
worker processing.
- core/packager: pin metrics worker pool cleanup on parallel branch
failures (validateFileSafety, produceOutput, calculateMetrics, warmup
rejection). Verify prefetchSortData failure is isolated and does not
block sortOutputFiles.
- core/skill/skillTechStack: cover untested fix-commit invariants —
root entry sorts first in monorepo output; configFiles deduplicated
within a directory; first-seen packageManager wins per directory.
intent(truncateBase64-tests): add explicit coverage for the new fast-path guards introduced alongside the regex-skip optimization
decision(test-cases): focus on the four cases that exercise guard behavior not previously asserted — empty input, exactly-below-threshold (255 chars), run-reset on non-base64 separator, and non-base64 data URI without `;base64,`
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Merge the two separate globby traversals used by `searchFiles` into a
single one and parallelize the per-directory `readdir` calls used to
filter empty directories.
Background
----------
When `output.includeEmptyDirectories` is enabled (the default for
`repomix.config.json` in this repo, and any repo that wants an accurate
directory tree), `searchFiles` previously walked the working tree twice:
once with `onlyFiles: true` and a second time with `onlyDirectories:
true`. Each call re-traversed the tree and re-parsed every `.gitignore`
/ `.repomixignore` file. `findEmptyDirectories` then issued `readdir`
serially for every matched directory, awaiting each syscall before
starting the next.
Change
------
* Replace the two globby invocations with one `objectMode: true,
onlyFiles: false` call. Partition the returned `GlobEntry[]` by
`dirent.isFile()` / `dirent.isDirectory()`, matching the previous
`onlyFiles: true` semantics for symlinks and other non-file
non-directory entries.
* Rewrite `findEmptyDirectories` to run the per-directory `readdir`
checks concurrently via `Promise.all`. Ordering is preserved by the
result array and the caller sorts the final list anyway.
* When `includeEmptyDirectories` is disabled, keep the fast
`onlyFiles: true` path unchanged so the default CLI run pays no cost.
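A rough sketch of the partition and the concurrent emptiness check described above. The entry shapes are trimmed to the fields used here, and the injected readDir parameter is an assumption made for illustration. The real findEmptyDirectories may apply further filtering:

```typescript
// Minimal shapes; a globby objectMode entry carries more fields than this.
interface DirentLike {
  isFile(): boolean;
  isDirectory(): boolean;
}

interface GlobEntryLike {
  path: string;
  dirent: DirentLike;
}

function partitionEntries(entries: GlobEntryLike[]): { files: string[]; directories: string[] } {
  const files: string[] = [];
  const directories: string[] = [];
  for (const entry of entries) {
    if (entry.dirent.isFile()) files.push(entry.path);
    else if (entry.dirent.isDirectory()) directories.push(entry.path);
    // Symlinks and other entry types match neither check and are dropped.
  }
  return { files, directories };
}

// Hypothetical injection point; in production this would wrap fs.readdir.
type ReadDir = (dir: string) => Promise<string[]>;

async function findEmptyDirectories(directories: string[], readDir: ReadDir): Promise<string[]> {
  // All readdir calls start immediately; Promise.all keeps input order.
  const checks = await Promise.all(
    directories.map(async (dir) => ((await readDir(dir)).length === 0 ? dir : null)),
  );
  return checks.filter((dir): dir is string => dir !== null);
}
```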
Benchmark (hyperfine, repomix packing itself, 30 runs, warmup 3)
----------------------------------------------------------------
Run 1: baseline 2.162s ± 0.042s → perf 2.017s ± 0.029s → -145ms (-6.7%)
Run 2: baseline 2.161s ± 0.023s → perf 2.030s ± 0.027s → -131ms (-6.1%)
Per-stage verbose timings:
baseline: [globby files 200ms] + [globby dirs 85ms] + [empty dirs 61ms]
perf: [combined globby 223ms] + [empty dirs 66ms]
saved: -57ms consistently on the critical path
Add test that exercises all transforms together: removeComments (worker)
+ truncateBase64 + removeEmptyLines + showLineNumbers (lightweight) to
verify the full two-phase pipeline produces correct output.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Merge applyPreCompressTransforms and applyPostCompressTransforms into
a single applyLightweightTransforms function. Move truncateBase64 to
post-worker phase since tree-sitter handles string literals as single
AST nodes regardless of content size.
Remove redundant trim from worker processContent — the main thread
applyLightweightTransforms already handles it.
Final pipeline:
Worker: removeComments → compress
Main: truncateBase64 → removeEmptyLines → trim → showLineNumbers
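A simplified sketch of the main-thread ordering. The three helper bodies are hypothetical stand-ins (the real truncateBase64 and line-number formatting differ). Only the order of the steps is taken from this change:

```typescript
interface TransformConfig {
  truncateBase64: boolean;
  removeEmptyLines: boolean;
  showLineNumbers: boolean;
}

// Hypothetical stand-in: shorten any run of 256+ base64-alphabet characters.
const truncateBase64 = (content: string): string =>
  content.replace(/[A-Za-z0-9+/]{256,}={0,2}/g, (run) => `${run.slice(0, 32)}...`);

const removeEmptyLines = (content: string): string =>
  content
    .split("\n")
    .filter((line) => line.trim() !== "")
    .join("\n");

const addLineNumbers = (content: string): string => {
  const lines = content.split("\n");
  const width = String(lines.length).length;
  return lines.map((line, i) => `${String(i + 1).padStart(width)}: ${line}`).join("\n");
};

function applyLightweightTransforms(content: string, config: TransformConfig): string {
  let result = content;
  if (config.truncateBase64) result = truncateBase64(result);
  if (config.removeEmptyLines) result = removeEmptyLines(result);
  // Trim before numbering so leading/trailing blank lines never get a number.
  result = result.trim();
  if (config.showLineNumbers) result = addLineNumbers(result);
  return result;
}
```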
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move removeEmptyLines from applyPreCompressTransforms to
applyPostCompressTransforms so it runs after removeComments.
This ensures empty lines created by comment removal are cleaned up.
Transform order: truncateBase64 (pre) → [removeComments → compress] (worker) → removeEmptyLines → trim → showLineNumbers (post)
Simplify applyPreCompressTransforms to only handle truncateBase64
with an early return when disabled.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Split applyLightweightTransforms into applyPreCompressTransforms and
applyPostCompressTransforms to preserve the original execution order:
truncateBase64 → removeComments → removeEmptyLines → trim → compress → showLineNumbers
Pre-compress transforms (truncateBase64, removeEmptyLines) must run
before tree-sitter parsing to avoid performance regression with large
base64 strings and to ensure empty line removal affects chunk merging.
Action: split lightweight transforms into pre-compress and post-compress phases
Why: previous refactor changed execution order, causing tree-sitter to receive
untreated base64 and content with empty lines, altering compress output
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add test for consecutive truncateBase64Content calls to verify global
regex lastIndex reset works correctly. Add test for truncateBase64
config branch in applyLightweightTransforms.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extract lightweight file transforms (truncateBase64, removeEmptyLines,
trim, showLineNumbers) into applyLightweightTransforms() on the main
thread, keeping only heavy operations (removeComments, compress) in
worker processContent(). This eliminates dual management of the same
logic across worker and main thread paths.
Also pre-compile base64 regex patterns at module level to avoid
re-creation per file call.
Action: split processContent into heavy (worker) and lightweight (main thread) phases
Action: extract applyLightweightTransforms() as single source of truth for lightweight ops
Action: hoist regex patterns in truncateBase64.ts to module scope with lastIndex reset
Why: lightweight transforms were duplicated in both processFilesMainThread and processContent
Why: regex re-compilation per file added unnecessary overhead for large repos
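Why the lastIndex reset is needed once the pattern lives at module scope: a regex with the g flag keeps its position between test() calls. A minimal sketch with a hypothetical pattern and threshold, not the ones in truncateBase64.ts:

```typescript
// Compiled once at module scope instead of once per file.
const BASE64_RUN = /[A-Za-z0-9+/]{8,}/g;

function containsBase64Run(content: string): boolean {
  // A global regex resumes from lastIndex, so reset it before each use.
  BASE64_RUN.lastIndex = 0;
  return BASE64_RUN.test(content);
}
```

Without the reset, a second call on the same input would start scanning after the previous match and could report no match.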
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Raise MIN_BASE64_LENGTH_STANDALONE from 60 to 256 since truncating short
strings saves negligible tokens. Require digits in isLikelyBase64 heuristic
since real base64-encoded binary data virtually always contains numbers,
while XPath and file path strings typically do not.
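A simplified sketch of the two guards. Only the constant name, its new value, and the digit requirement come from this change. The character-class check is an assumed placeholder for the rest of the heuristic:

```typescript
const MIN_BASE64_LENGTH_STANDALONE = 256;

function isLikelyBase64(candidate: string): boolean {
  // Short strings are not worth truncating.
  if (candidate.length < MIN_BASE64_LENGTH_STANDALONE) return false;
  // Assumed placeholder: only base64 alphabet characters plus optional padding.
  if (!/^[A-Za-z0-9+/]+={0,2}$/.test(candidate)) return false;
  // Path-like and XPath-like strings rarely contain digits; encoded binary does.
  return /[0-9]/.test(candidate);
}
```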
Closes #1298
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Mock data and expected sort order now use path.sep instead of hardcoded
'/' separators. On Windows, path.sep is '\' so sortPaths splits
differently, producing a different sort order.
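A small illustration of why the hardcoded separator broke on Windows. The helpers are hypothetical and take the separator as a parameter so both platforms can be shown side by side. In the tests the value comes from path.sep:

```typescript
// Hypothetical helpers; `sep` would be path.sep in the real tests.
const buildPath = (sep: string, ...segments: string[]): string => segments.join(sep);
const splitPath = (filePath: string, sep: string): string[] => filePath.split(sep);

// A '/'-joined fixture is a single segment when split on the Windows separator.
const hardcoded = splitPath("src/core/file.ts", "\\");
// A fixture built with the platform separator splits as intended.
const portable = splitPath(buildPath("\\", "src", "core", "file.ts"), "\\");
```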
Co-Authored-By: Claude Opus 4.6 (1M context) <koukun0120@gmail.com>
Replace weak arrayContaining assertion with exact toEqual using the
correct sorted order, so the test verifies both content and sort behavior.
Co-Authored-By: Claude Opus 4.6 (1M context) <koukun0120@gmail.com>
After the UTF-8 fast path optimization eliminated the CPU-heavy jschardet
bottleneck, file collection became I/O-bound. Worker threads now add pure
overhead (Tinypool init, structured clone, IPC) without benefit.
Benchmark (954 files, M2 Pro 10-core):
- Worker Threads: ~108ms → Promise Pool (c=50): ~37ms (2.9x faster)
Changes:
- Replace Tinypool worker dispatch with a simple promise pool (c=50)
- Inject readRawFile via deps for testability
- Remove unused concurrentTasksPerWorker from WorkerOptions
- Simplify tests to use readRawFile mock instead of 5+ module mocks
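A generic promise pool along these lines could look like the sketch below. The function name and signature are hypothetical and this is not the code in the change. It only shows the technique of capping in-flight tasks:

```typescript
async function runWithConcurrency<T, R>(
  items: readonly T[],
  concurrency: number,
  task: (item: T) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let nextIndex = 0;

  // Each runner pulls the next unclaimed index until the list is exhausted.
  const runner = async (): Promise<void> => {
    while (nextIndex < items.length) {
      const index = nextIndex++;
      results[index] = await task(items[index]);
    }
  };

  const runnerCount = Math.min(concurrency, items.length);
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}
```

Results are written by index, so output order matches input order regardless of completion order. A rejected task rejects the whole call.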
Previously, every file went through jschardet.detect() which scans the entire
buffer through multiple encoding probers (MBCS, SBCS, Latin1) with frequency
table lookups — the most expensive CPU operation in file collection.
Since ~99% of source code files are UTF-8, we now try TextDecoder('utf-8',
{ fatal: true }) first. If it succeeds, jschardet and iconv are skipped entirely.
Non-UTF-8 files (e.g., Shift-JIS, EUC-KR) fall back to the original detection path.
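The fast path can be sketched as below. The function name and the null return convention are assumptions for illustration. The caller would run the jschardet/iconv path when null comes back:

```typescript
function decodeUtf8FastPath(bytes: Uint8Array): string | null {
  try {
    // fatal: true makes decode() throw on invalid UTF-8 instead of
    // silently substituting U+FFFD.
    return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  } catch {
    // Not valid UTF-8: signal the caller to fall back to detection.
    return null;
  }
}
```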
Additionally, set concurrentTasksPerWorker=3 for fileCollect workers to better
overlap I/O waits within each worker thread.
Benchmark results (838 files, 10 CPU cores):
- Before: ~616ms
- After: ~108ms (5.7x faster)
- Remove unused imports (generateFileTree, treeToString) in fileTreeGenerate.test.ts
- Add filePathsByRoot parameter to generateOutput and produceOutput calls in tests
- Update expect assertions to include filePathsByRoot argument
When packing multiple directories, the directory tree output now shows
labeled sections like [cli]/, [config]/ to clarify which files belong
to which root directory.
- Add FilesByRoot interface and generateTreeStringWithRoots function
- Update output pipeline to pass file-to-root mapping
- Add unit tests for new tree generation functions
- Update existing tests for new function signatures
Closes #1023
Replace the original strip-comments package with @repomix/strip-comments,
which provides enhanced support for:
- Go directives (//go:build, //go:generate, etc.)
- C++ document comments (///)
- Python docstrings (""" and ''') and hash comments
This removes the custom GoManipulator, PythonManipulator, and CppManipulator
implementations in favor of the improved library support.
Note: preserveNewlines option keeps newlines for line number preservation,
so docstrings are replaced with empty lines rather than being fully removed.
- Use TextDecoder('utf-8', { fatal: true }) to distinguish actual decode
errors from legitimate U+FFFD characters in UTF-8 files
- Change test temp directory from tests/fixtures to os.tmpdir() to avoid
clobbering committed fixtures and reduce parallel-run collisions
- Non-UTF-8 files still use iconv.decode() fallback behavior
Addresses CodeRabbit review comments on PR #1007
Remove the confidence < 0.2 check that was causing valid UTF-8/ASCII files
to be incorrectly skipped. Files are now only skipped if they contain actual
decode errors (U+FFFD replacement characters).
This fixes issues where:
- Valid Python files were skipped with confidence=0.00 (#869)
- HTML files with Thymeleaf syntax (~{}) were incorrectly detected as binary (#847)
The isbinaryfile library (added in PR #1006) now handles binary detection more
accurately, making the confidence-based heuristic unnecessary.
Fixes #869
Fix TS18048 errors in createBaseGlobbyOptions consistency tests by adding
expect(options).toBeDefined() and if (!options) continue guards. This ensures
type safety and prevents undefined access to globby call options.
All three tests now properly guard against potentially undefined options:
- should use consistent base options across all globby calls
- should respect gitignore config consistently across all functions
- should apply custom ignore patterns consistently across all functions
This addresses the coderabbitai feedback on PR #964.
This commit addresses three suggestions from AI code review bots on PR #964:
1. Remove unnecessary array spreads in createBaseGlobbyOptions
- Removed defensive copying of ignorePatterns and ignoreFilePatterns
- Arrays are already created fresh in calling functions, making spreads redundant
- Minor performance optimization by avoiding unnecessary array allocations
2. Extract prepareIgnoreContext helper function
- Centralized duplicate ignore pattern preparation logic
- Eliminated code duplication across searchFiles, listDirectories, and listFiles
- The new helper handles:
* Getting ignore patterns and ignore file patterns
* Normalizing patterns for consistent trailing slash handling
* Git worktree special case handling
- Improves maintainability and ensures consistency across all globby calls
3. Add explanatory comment to v16 behavior test
- Documented why v16's behavior is superior (matches Git's standard behavior)
- Clarifies that v16 respects parent directory .gitignore files
- Helps future maintainers understand the intentional breaking change
All 856 tests pass with no regressions.
- Add test for parent directory .gitignore pattern handling (v16 behavior)
- Add tests for createBaseGlobbyOptions consistency across all functions
- Verify gitignore option is passed correctly to all globby calls
- Ensure no regression from v15 to v16 upgrade
These tests prove that:
1. Parent .gitignore files are respected with globby v16
2. All 4 globby calls (searchFiles files/dirs, listDirectories, listFiles)
use consistent base options
3. gitignore configuration is applied uniformly across all functions
All 856 tests pass, confirming no regression from the changes.
- Upgrade globby from v15 to v16
- Use gitignore option to respect parent directory .gitignore files
- This matches Git's standard behavior where parent .gitignore patterns apply to subdirectories
- Move .gitignore handling from ignoreFiles to gitignore option
- Update tests to reflect the new behavior
This change improves compatibility with Git and provides more accurate file filtering when running Repomix in subdirectories.
Fixed the priority order of ignore files to match the intended behavior:
- .gitignore (lowest priority)
- .ignore (medium priority)
- .repomixignore (highest priority)
The previous implementation had .repomixignore at the lowest priority,
which was incorrect. Repomix-specific ignore rules should take precedence
over generic ignore files.
This ensures that:
1. .repomixignore can override .ignore and .gitignore rules
2. .ignore can override .gitignore rules
3. The priority order documented in README is correctly implemented
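One way to picture the ordering, as a hypothetical sketch only. It assumes that files applied later override files applied earlier, which is not confirmed by this commit, and the names below are illustrative:

```typescript
// Higher number = higher priority (assumption: applied later, so it wins).
const IGNORE_FILE_PRIORITY: Record<string, number> = {
  ".gitignore": 0,
  ".ignore": 1,
  ".repomixignore": 2,
};

function orderIgnoreFiles(fileNames: string[]): string[] {
  // Unknown names sort before all known ignore files.
  return [...fileNames].sort(
    (a, b) => (IGNORE_FILE_PRIORITY[a] ?? -1) - (IGNORE_FILE_PRIORITY[b] ?? -1),
  );
}
```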
This PR adds support for .ignore files, which are used by tools like ripgrep and The Silver Searcher. Users can maintain a single .ignore file that works across multiple tools instead of keeping a separate ignore file per tool.
Changes:
- Add ignore.useDotIgnore config option (default: true)
- Add --no-dot-ignore CLI flag to disable .ignore file usage
- Update ignore file priority: .repomixignore > .ignore > .gitignore > default patterns
- Add comprehensive tests for .ignore file handling
- Update documentation to reflect new .ignore file support
The .ignore file is enabled by default but can be disabled via configuration or CLI flag, maintaining backward compatibility.
Resolves #937
Added override configuration to disable Biome's organizeImports feature
specifically for src/index.ts to allow manual import order management
while keeping automatic import organization enabled for other files.
Updated Biome from v1.9.4 to v2.2.4 to take advantage of the latest linting improvements.
- Upgraded @biomejs/biome from ^1.9.4 to ^2.2.4
- Updated biome.json configuration for v2 compatibility:
- Changed schema to 2.2.4
- Updated file includes/ignores syntax
- Added Vue file overrides to disable noUnusedVariables/noUnusedImports
- Fixed all lint errors:
- Added radix parameter to parseInt calls
- Prefixed unused parameters with underscore
- Removed unused imports
- Fixed biome suppression comments
- Removed !important from CSS
- Added type ignores for Vue component definitions
All 325 files now pass lint with 0 warnings and 0 errors.
Address PR review feedback:
- Fix worker path to use relative path instead of lib directory
- Add proper function overloads for defaultActionWorker
- Remove unsafe type assertions in worker code
- Improve error handling with optional stack property
- Extract log level validation logic to reduce duplication
- Add NaN check for environment variable parsing
All tests pass and all linting issues are resolved.
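The NaN check for environment variable parsing could be sketched as follows. The helper name and the fallback parameter are hypothetical:

```typescript
function parseIntEnv(raw: string | undefined, fallback: number): number {
  if (raw === undefined) return fallback;
  // Explicit radix avoids surprises with prefixed or zero-padded input.
  const parsed = Number.parseInt(raw, 10);
  // parseInt returns NaN for non-numeric input; never let that propagate.
  return Number.isNaN(parsed) ? fallback : parsed;
}
```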
Replace executeGlobbyInWorker with direct globby calls since worker isolation
is no longer necessary for globby execution.
- Remove src/core/file/globbyExecute.ts wrapper
- Remove src/core/file/workers/globbyWorker.ts
- Update fileSearch.ts to import and use globby directly
- Update tests to mock globby instead of executeGlobbyInWorker
- Simplify integration tests by removing worker mocks
Add WorkerRuntime type and configurable runtime parameter to createWorkerPool and initTaskRunner functions. This allows choosing between 'worker_threads' and 'child_process' runtimes based on performance requirements.
- Add WorkerRuntime type definition for type safety
- Add optional runtime parameter to createWorkerPool with child_process default
- Add optional runtime parameter to initTaskRunner with child_process default
- Configure fileCollectWorker to use worker_threads for better performance
- Update all test files to use WorkerRuntime type
- Add comprehensive tests for runtime parameter functionality
- Maintain backward compatibility with existing code
The fileCollectWorker now benefits from the faster startup and shared memory of worker_threads, while other workers continue using child_process for stability.
Removed the Go nested block comments test case as it was unnecessary
and potentially misleading. Go block comments do not nest according
to the language specification, so testing this behavior is not needed
and could cause confusion about the expected behavior.
The remaining tests adequately cover Go comment parsing functionality.
Go block comments do not nest according to the language specification.
The first */ sequence should close the comment, regardless of any /*
sequences within it. This change removes the blockCommentDepth tracking
and ensures correct parsing behavior for Go code containing sequences
like /* comment with /* nested */ part */.
Updated test expectations to reflect the correct Go language behavior.
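A minimal sketch of the non-nesting rule. It is not the parser in this repo and deliberately ignores string literals and line comments. It only shows that the first */ closes the comment:

```typescript
function stripGoBlockComments(source: string): string {
  let result = "";
  let pos = 0;
  while (pos < source.length) {
    const open = source.indexOf("/*", pos);
    if (open === -1) {
      result += source.slice(pos);
      break;
    }
    result += source.slice(pos, open);
    // No depth tracking: the first */ after the opener ends the comment.
    const close = source.indexOf("*/", open + 2);
    if (close === -1) break; // Unterminated comment: drop the remainder.
    pos = close + 2;
  }
  return result;
}
```

For the input `a /* x /* y */ z */ b`, the comment ends at the first `*/`, so ` z */ b` remains as code.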