Add test that exercises all transforms together: removeComments (worker)
+ truncateBase64 + removeEmptyLines + showLineNumbers (lightweight) to
verify the full two-phase pipeline produces correct output.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Merge applyPreCompressTransforms and applyPostCompressTransforms into
a single applyLightweightTransforms function. Move truncateBase64 to
post-worker phase since tree-sitter handles string literals as single
AST nodes regardless of content size.
Remove redundant trim from worker processContent — the main thread
applyLightweightTransforms already handles it.
Final pipeline:
Worker: removeComments → compress
Main: truncateBase64 → removeEmptyLines → trim → showLineNumbers
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move removeEmptyLines from applyPreCompressTransforms to
applyPostCompressTransforms so it runs after removeComments.
This ensures empty lines created by comment removal are cleaned up.
Transform order: truncateBase64 (pre) → [removeComments → compress] (worker) → removeEmptyLines → trim → showLineNumbers (post)
Simplify applyPreCompressTransforms to only handle truncateBase64
with an early return when disabled.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Split applyLightweightTransforms into applyPreCompressTransforms and
applyPostCompressTransforms to preserve the original execution order:
truncateBase64 → removeComments → removeEmptyLines → trim → compress → showLineNumbers
Pre-compress transforms (truncateBase64, removeEmptyLines) must run
before tree-sitter parsing to avoid performance regression with large
base64 strings and to ensure empty line removal affects chunk merging.
Action: split lightweight transforms into pre-compress and post-compress phases
Why: previous refactor changed execution order, causing tree-sitter to receive
untruncated base64 strings and content still containing empty lines, altering compress output
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add test for consecutive truncateBase64Content calls to verify global
regex lastIndex reset works correctly. Add test for truncateBase64
config branch in applyLightweightTransforms.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extract lightweight file transforms (truncateBase64, removeEmptyLines,
trim, showLineNumbers) into applyLightweightTransforms() on the main
thread, keeping only heavy operations (removeComments, compress) in
worker processContent(). This eliminates dual management of the same
logic across worker and main thread paths.
Also pre-compile base64 regex patterns at module level to avoid
re-creation per file call.
Action: split processContent into heavy (worker) and lightweight (main thread) phases
Action: extract applyLightweightTransforms() as single source of truth for lightweight ops
Action: hoist regex patterns in truncateBase64.ts to module scope with lastIndex reset
Why: lightweight transforms were duplicated in both processFilesMainThread and processContent
Why: regex re-compilation per file added unnecessary overhead for large repos
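The lastIndex pitfall the hoisting introduces can be sketched as follows (pattern and names are illustrative, not the actual truncateBase64.ts code): a module-scope global regex keeps its `lastIndex` between `.test()` calls, so it must be reset before each use.

```typescript
// Module-scope pattern: compiled once per process instead of once per file.
// Illustrative pattern only — not the real base64 heuristic.
const BASE64_PATTERN = /[A-Za-z0-9+\/]{40,}={0,2}/g;

function containsBase64(content: string): boolean {
  // A /g regex resumes .test() from lastIndex; without this reset, a second
  // call on the same string can spuriously return false.
  BASE64_PATTERN.lastIndex = 0;
  return BASE64_PATTERN.test(content);
}
```

Consecutive calls on the same input now return consistent results, which is what the lastIndex-reset test above exercises.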
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace string += with array accumulation + join('\n') in mergeAdjacentChunks
to avoid O(k²) copying when merging adjacent tree-sitter code chunks
- Extract searchInLines from searchInContent in grepRepomixOutputTool so
performGrepSearch splits content once and reuses the lines array for both
search and formatting, avoiding a redundant O(n) split on large files
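The accumulation change can be sketched like this (a minimal illustration of the pattern, not the actual mergeAdjacentChunks code): repeated `merged += chunk` copies the growing string on every iteration, O(k²) total for k chunks, while push + a single join copies each byte a constant number of times.

```typescript
// Collect pieces in an array and join once at the end instead of
// concatenating onto an ever-growing string.
function mergeChunks(chunks: string[]): string {
  const parts: string[] = [];
  for (const chunk of chunks) {
    parts.push(chunk); // amortized O(1); no intermediate string allocations
  }
  return parts.join('\n'); // single pass to build the final string
}
```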
Pipeline-level optimizations that produce measurable end-to-end improvement:
- Pre-initialize metrics worker pool during file collection phase so tiktoken
WASM loading overlaps with security checks and file processing. First token
count task dropped from 381ms to 22ms (worker already warmed).
- Lazy-load Jiti via dynamic import — only loaded when TS/JS config files are
detected, saving startup time for the common JSON/default config path.
- Fix O(n²) file path re-grouping in packager by using Map + Set for O(1)
membership checks instead of .find() + .includes().
- Move binary extension check before fs.stat in fileRead to skip unnecessary
stat syscalls for binary files.
- Parallelize split output file writes with Promise.all instead of sequential
for-loop.
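The Map + Set re-grouping fix can be sketched as follows (hypothetical names; the real packager groups by its own keys): indexing groups in a Map and tracking membership in Sets replaces the `.find()` + `.includes()` scans that made the loop O(n²).

```typescript
// Group file paths by directory with O(1) lookups per path.
function groupByDir(paths: string[]): Map<string, Set<string>> {
  const groups = new Map<string, Set<string>>();
  for (const p of paths) {
    const dir = p.slice(0, p.lastIndexOf('/') + 1) || '.';
    let members = groups.get(dir); // O(1) instead of array .find()
    if (!members) {
      members = new Set();
      groups.set(dir, members);
    }
    members.add(p); // O(1) instead of array .includes()
  }
  return groups;
}
```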
Benchmark (15 runs each, median ± IQR, packing repomix repo ~1000 files):
main branch: 3515ms (P25: 3443, P75: 3581)
perf branch: 3318ms (P25: 3215, P75: 3383)
Improvement: -197ms (-5.6%)
Pipeline stage breakdown (instrumented):
- Metrics first-file init: 381ms → 22ms (worker pre-warmed)
- Total metrics stage: 793ms → ~450ms
All 1096 tests pass. Lint clean.
https://claude.ai/code/session_01JoNjFe7S2roMfHfNcw6bso
Previously, the interactive overwrite prompt confirmed but did not
remove the old directory, leaving stale files (e.g. renamed
tech-stack.md) behind. Now the directory is removed before
regeneration, consistent with --force behavior.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Aligns with files.md pattern (## File: <path>). Each package is now
a ## section under a single # Tech Stacks heading, with ### subsections
for Languages, Frameworks, Dependencies, etc.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Instead of merging all dependency files into a single flat list,
detectTechStack now returns a TechStackInfo[] grouped by package
directory. Each directory containing a dependency file produces its
own entry with path, languages, frameworks, dependencies, etc.
generateTechStackMd renders each package as a separate section with
`path: (root)` or `path: packages/xxx`, separated by `---`. This
gives AI consumers clearer per-package context and makes line-based
retrieval easier.
Removes deduplicateDependencies as dependencies are now scoped
per-package and don't need cross-package deduplication. configFiles
stores filenames only (not full paths) since the package path
provides the directory context.
Closes #1182
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Use first-wins for packageManager to match other dedup strategies
- Deduplicate dependencies by name:version to preserve version skew
- Normalize Node.js version v prefix before runtime version dedup
- Fix stale comment referencing root-level-only detection
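The dedup rules above can be sketched as follows (assumed shapes; `Dep` and the function names are illustrative): first-wins on a composite name:version key keeps the same package at two different versions as two entries, and the Node.js `v` prefix is stripped before runtime versions are compared.

```typescript
interface Dep { name: string; version: string }

// First-wins dedup keyed on name:version, so version skew across
// workspace packages is preserved as separate entries.
function dedupeDependencies(deps: Dep[]): Dep[] {
  const seen = new Set<string>();
  const result: Dep[] = [];
  for (const dep of deps) {
    const key = `${dep.name}:${dep.version}`;
    if (!seen.has(key)) {
      seen.add(key);
      result.push(dep);
    }
  }
  return result;
}

// Normalize "v20.11.0" -> "20.11.0" so the same runtime version written
// with and without the prefix collapses to one entry.
const normalizeNodeVersion = (v: string): string => v.replace(/^v/, '');
```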
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Defer PicoSpinner instantiation to avoid unnecessary object allocation
when the spinner will never be displayed (quiet, verbose, or stdout mode).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace log-update dependency with picospinner (from tinylibs) to reduce
transitive dependencies. picospinner provides built-in spinner functionality
(frames, symbols, succeed/fail states) that was previously manually
implemented on top of log-update, simplifying cliSpinner.ts.
This removes 12 transitive packages (ansi-escapes, cli-cursor, slice-ansi,
wrap-ansi, string-width, etc.) from the dependency tree.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Deduplicate runtimeVersions by runtime:version pair to prevent
duplicate entries when multiple version files exist across
subdirectories in monorepos.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Previously, detectTechStack() only checked root-level dependency files,
causing tech-stack.md to be empty for monorepo setups using --include
to target a specific package.
Now all dependency files in processedFiles are checked regardless of
directory depth. Since processedFiles is already filtered by
--include/--ignore, this naturally scopes detection to the user's
target. Also adds dependency deduplication for cases where multiple
package.json files define the same package, and stores config file
full paths to distinguish files across subdirectories.
Closes #1182
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Raise MIN_BASE64_LENGTH_STANDALONE from 60 to 256 since truncating short
strings saves negligible tokens. Require digits in isLikelyBase64 heuristic
since real base64-encoded binary data virtually always contains numbers,
while XPath and file path strings typically do not.
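The tightened heuristic can be sketched like this (an illustrative simplification, not the actual truncateBase64 implementation): a long base64-alphabet run only counts as base64 if it contains at least one digit, which filters out XPath expressions and file paths built from letters and slashes.

```typescript
const MIN_BASE64_LENGTH_STANDALONE = 256;

function isLikelyBase64(candidate: string): boolean {
  // Too short: truncating saves negligible tokens.
  if (candidate.length < MIN_BASE64_LENGTH_STANDALONE) return false;
  // Must be drawn entirely from the base64 alphabet (with optional padding).
  if (!/^[A-Za-z0-9+\/]+={0,2}$/.test(candidate)) return false;
  // Encoded binary data virtually always contains digits; paths rarely do.
  return /[0-9]/.test(candidate);
}
```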
Closes #1298
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Extract duplicated DefaultActionRunnerResult mock into
createMockDefaultActionResult() helper function
- Add missing REPOMIX_REMOTE_TRUST_CONFIG env var mention in ko, pt-br,
ru library usage docs for consistency with other languages
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove the intermediate isRemote flag that inverted remoteTrustConfig
only to be re-inverted back to skipLocalConfig in defaultAction. Now
remoteAction computes skipLocalConfig directly, reducing the internal
flag chain from 3 concepts to 2 (remoteTrustConfig → skipLocalConfig).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Move absolute path validation for --config to before repository
download/clone, avoiding wasted I/O on invalid input
- Consolidate duplicate findConfigFile calls in skipLocalConfig branch
into a single search with conditional handling
- Add test for relative --config rejection even with --remote-trust-config
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Relative --config paths in remote mode would resolve against the cloned
temp directory, potentially loading and executing malicious config files
(e.g., repomix.config.ts) from untrusted repositories.
Now rejects relative paths with a clear error message guiding users to
use absolute paths instead.
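The guard can be sketched as follows (hypothetical function name; the real validation lives in the remote action flow): in remote mode only absolute `--config` paths are accepted, so the value can never resolve inside the cloned temp directory.

```typescript
import * as path from 'node:path';

// Reject relative --config paths when packing a remote repository, since a
// relative path would resolve against the untrusted cloned checkout.
function assertAbsoluteConfigPath(configPath: string): void {
  if (!path.isAbsolute(configPath)) {
    throw new Error(
      `--config must be an absolute path when used with --remote: ${configPath}`,
    );
  }
}
```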
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The --config flag represents an explicit user choice and should not be
blocked in remote mode. Only auto-detected config files in the cloned
repo are skipped.
Also adds a logger.note() message when a config file is found in the
remote repository but skipped, guiding users to --remote-trust-config.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When using `repomix --remote <url>` or the MCP `pack_remote_repository` tool,
config files (repomix.config.ts/js) from the cloned repository were executed
via jiti, allowing a malicious repository to achieve arbitrary code execution
on the user's machine.
This commit skips all local config file loading when processing remote
repositories. The `isRemote` flag is propagated from remoteAction through
defaultAction to loadFileConfig, which skips local config auto-detection
and --config flag resolution. Global config and CLI options continue to
work normally.
Users who need to trust remote configs can do so in a future release via
an explicit opt-in flag (e.g., --trust-remote-config).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use a strict error handler for @xmldom/xmldom's DOMParser that throws on
all severity levels (warning, error, fatalError). By default, xmldom
silently continues parsing malformed XML, which could mask XMLBuilder
regressions. This ensures tests fail immediately on any XML well-formedness
issue.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fast-xml-parser has accumulated 10 CVEs (6 in 2026 alone), with a recurring
pattern of incomplete fixes in its DOCTYPE/entity parser. Since Repomix only
uses the XMLBuilder functionality (not the parser), switching to
fast-xml-builder — the standalone builder package that fast-xml-parser v5
internally delegates to — eliminates the noise from 9 of the 10 parser-side
CVEs while maintaining identical builder behavior.
- Replace fast-xml-parser (831KB) with fast-xml-builder (176KB) as dependency
- Add @xmldom/xmldom as devDependency for XML validation in tests
- Update import in outputGenerate.ts (named → default export)
- Migrate test XML parsing from fast-xml-parser's XMLParser to @xmldom/xmldom's
DOMParser, providing cross-implementation validation of generated XML
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Mock data and expected sort order now use path.sep instead of hardcoded
'/' separators. On Windows, path.sep is '\' so sortPaths splits
differently, producing a different sort order.
Co-Authored-By: Claude Opus 4.6 (1M context) <koukun0120@gmail.com>
Replace weak arrayContaining assertion with exact toEqual using the
correct sorted order, so the test verifies both content and sort behavior.
Co-Authored-By: Claude Opus 4.6 (1M context) <koukun0120@gmail.com>
With the streaming pipeline, errors propagate as native Error objects
rather than RepomixError, so the isExtractionError check was always
false. Retrying extraction errors is acceptable since the retry loop
is bounded to 3 attempts.
The previous ZIP-based archive download used fflate's in-memory extraction,
which failed on large repositories (e.g. facebook/react) due to memory
constraints and ZIP64 limitations.
Switch to tar.gz format with Node.js built-in zlib + tar package, enabling
a full streaming pipeline (HTTP response -> gunzip -> tar extract -> disk)
with no temporary files and constant memory usage regardless of repo size.
Key changes:
- Replace fflate with tar package for archive extraction
- Change archive URLs from .zip to .tar.gz
- Use streaming pipeline instead of download-then-extract
- Leverage tar's built-in strip and path traversal protection
- Explicitly destroy streams after pipeline for Bun compatibility
- Use child_process runtime under Bun to avoid worker_threads hang
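The streaming idea can be sketched with Node built-ins alone (gunzip-to-memory here for brevity; the real pipeline pipes the HTTP response through `createGunzip()` into the tar package's extractor and onto disk): bytes flow through the pipeline chunk by chunk, so memory stays constant regardless of archive size.

```typescript
import { Readable, Writable } from 'node:stream';
import { pipeline } from 'node:stream/promises';
import { createGunzip, gzipSync } from 'node:zlib'; // gzipSync only for the usage example

// source -> gunzip -> sink, with backpressure handled by pipeline().
async function gunzipToString(gzipped: Buffer): Promise<string> {
  const chunks: Buffer[] = [];
  const sink = new Writable({
    write(chunk: Buffer, _enc, cb) {
      chunks.push(chunk);
      cb();
    },
  });
  await pipeline(Readable.from([gzipped]), createGunzip(), sink);
  return Buffer.concat(chunks).toString('utf-8');
}
```

Usage: `await gunzipToString(gzipSync('hello'))` returns `'hello'`; in the real flow the sink is replaced by tar extraction.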
File collection was replaced with a promise pool approach in 96ff05dc,
but the worker-related code remained. This removes the now-unused
fileCollectWorker and all references to it from the worker system.
After the UTF-8 fast path optimization eliminated the CPU-heavy jschardet
bottleneck, file collection became I/O-bound. Worker threads now add pure
overhead (Tinypool init, structured clone, IPC) without benefit.
Benchmark (954 files, M2 Pro 10-core):
- Worker Threads: ~108ms → Promise Pool (c=50): ~37ms (2.9x faster)
Changes:
- Replace Tinypool worker dispatch with a simple promise pool (c=50)
- Inject readRawFile via deps for testability
- Remove unused concurrentTasksPerWorker from WorkerOptions
- Simplify tests to use readRawFile mock instead of 5+ module mocks
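The promise pool can be sketched as follows (a minimal version of the pattern; the actual implementation differs in details): at most `concurrency` tasks are in flight at once, with no worker threads, structured clone, or IPC involved.

```typescript
// Run worker(item) over all items with bounded concurrency, preserving order.
async function promisePool<T, R>(
  items: T[],
  worker: (item: T) => Promise<R>,
  concurrency = 50,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  const runners = Array.from(
    { length: Math.min(concurrency, items.length) },
    async () => {
      while (next < items.length) {
        const index = next++; // single-threaded JS makes this claim race-free
        results[index] = await worker(items[index]);
      }
    },
  );
  await Promise.all(runners);
  return results;
}
```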
Previously, every file went through jschardet.detect() which scans the entire
buffer through multiple encoding probers (MBCS, SBCS, Latin1) with frequency
table lookups — the most expensive CPU operation in file collection.
Since ~99% of source code files are UTF-8, we now try TextDecoder('utf-8',
{ fatal: true }) first. If it succeeds, jschardet and iconv are skipped entirely.
Non-UTF-8 files (e.g., Shift-JIS, EUC-KR) fall back to the original detection path.
Additionally, set concurrentTasksPerWorker=3 for fileCollect workers to better
overlap I/O waits within each worker thread.
Benchmark results (838 files, 10 CPU cores):
- Before: ~616ms
- After: ~108ms (5.7x faster)
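The fast path can be sketched like this (assumed shape; the actual fallback branch runs jschardet + iconv-lite): a fatal `TextDecoder` throws on any invalid UTF-8 byte sequence, so a successful decode proves the buffer is valid UTF-8 and full charset detection can be skipped.

```typescript
const utf8Decoder = new TextDecoder('utf-8', { fatal: true });

function decodeFile(buffer: Uint8Array): { text: string; fastPath: boolean } {
  try {
    // ~99% of source files take this branch; no prober scan needed.
    return { text: utf8Decoder.decode(buffer), fastPath: true };
  } catch {
    // Non-UTF-8 (e.g. Shift-JIS, EUC-KR): fall back to detection here.
    return { text: '', fastPath: false };
  }
}
```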
Allow users to run `repomix https://github.com/user/repo` or
`repomix git@github.com:user/repo.git` without the `--remote` flag.
Only explicit URL formats (https:// and git@) are auto-detected.
Shorthand format (owner/repo) is not auto-detected to avoid
ambiguity with local directory paths.
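The detection rule can be sketched as follows (hypothetical helper name): only unambiguous remote URL prefixes are treated as remote, while bare `owner/repo` is left alone because it could be a local directory.

```typescript
// Auto-detect only explicit remote URL formats; never shorthand.
function isExplicitRemoteUrl(target: string): boolean {
  return target.startsWith('https://') || target.startsWith('git@');
}
```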
Closes #1120
Define CliCommandPackOptions interface locally in cliCommand.ts instead of
importing PackOptions from usePackOptions.ts which depends on Vue module.
This prevents tsc from following the import chain to Vue in CI.
Address PR review comments:
- Add shell escaping for user-controlled values (repositoryUrl, includePatterns, ignorePatterns)
to prevent command injection when users copy-paste the generated command
- Skip --remote flag for uploaded file names by validating with isValidRemoteValue
- Add unit tests for generateCliCommand covering all option combinations
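The escaping can be sketched with the standard POSIX single-quote technique (an illustrative sketch; the project's helper may differ): wrap the value in single quotes and replace each embedded single quote with `'\''`, so a copy-pasted command cannot inject extra shell syntax.

```typescript
// Escape a value for safe interpolation into a POSIX shell command line.
function shellEscape(value: string): string {
  // close quote, emit an escaped literal quote, reopen quote
  return `'${value.replace(/'/g, `'\\''`)}'`;
}
```

For example, `shellEscape("a'b; rm -rf /")` yields a single-quoted token the shell treats as one literal argument.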