Commit Graph

498 Commits

Author SHA1 Message Date
Kazuki Yamada 6fecdca6b3 test(core): Add combined worker + lightweight pipeline integration test
Add test that exercises all transforms together: removeComments (worker)
+ truncateBase64 + removeEmptyLines + showLineNumbers (lightweight) to
verify the full two-phase pipeline produces correct output.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 18:27:55 +09:00
Kazuki Yamada f1067163ec refactor(core): Simplify into single applyLightweightTransforms and remove redundant trim
Merge applyPreCompressTransforms and applyPostCompressTransforms into
a single applyLightweightTransforms function. Move truncateBase64 to
post-worker phase since tree-sitter handles string literals as single
AST nodes regardless of content size.

Remove redundant trim from worker processContent — the main thread
applyLightweightTransforms already handles it.

Final pipeline:
  Worker: removeComments → compress
  Main:   truncateBase64 → removeEmptyLines → trim → showLineNumbers

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 18:01:22 +09:00
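The main-thread half of the final pipeline above (truncateBase64 → removeEmptyLines → trim → showLineNumbers) can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the `LightweightConfig` shape, the simplified base64 regex, and the line-number format are all assumptions.

```typescript
// Hypothetical config shape; field names are assumptions for illustration.
interface LightweightConfig {
  truncateBase64: boolean;
  removeEmptyLines: boolean;
  showLineNumbers: boolean;
}

// Simplified stand-in for base64 detection: any long run of base64 characters.
const LONG_BASE64_RUN = /[A-Za-z0-9+/]{256,}={0,2}/g;

function applyLightweightTransforms(content: string, config: LightweightConfig): string {
  let result = content;

  // 1. Shorten long base64-looking runs to a 32-char prefix plus "...".
  if (config.truncateBase64) {
    result = result.replace(LONG_BASE64_RUN, (match) => `${match.slice(0, 32)}...`);
  }

  // 2. Drop lines that are empty or whitespace-only (including those left by comment removal).
  if (config.removeEmptyLines) {
    result = result
      .split("\n")
      .filter((line) => line.trim() !== "")
      .join("\n");
  }

  // 3. Trim once, here on the main thread.
  result = result.trim();

  // 4. Number lines last so the numbers match the final output.
  if (config.showLineNumbers) {
    const lines = result.split("\n");
    const width = String(lines.length).length;
    result = lines.map((line, i) => `${String(i + 1).padStart(width)}: ${line}`).join("\n");
  }

  return result;
}
```

Numbering last matters: any earlier position would number lines that a later step removes.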
Kazuki Yamada 47e4a65b61 fix(core): Move removeEmptyLines to post-compress to preserve ordering
Move removeEmptyLines from applyPreCompressTransforms to
applyPostCompressTransforms so it runs after removeComments.
This ensures empty lines created by comment removal are cleaned up.

Transform order: truncateBase64 (pre) → [removeComments → compress] (worker) → removeEmptyLines → trim → showLineNumbers (post)

Simplify applyPreCompressTransforms to only handle truncateBase64
with an early return when disabled.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 17:57:00 +09:00
Kazuki Yamada cac35d0465 fix(core): Preserve transform order by splitting into pre/post compress phases
Split applyLightweightTransforms into applyPreCompressTransforms and
applyPostCompressTransforms to preserve the original execution order:
truncateBase64 → removeComments → removeEmptyLines → trim → compress → showLineNumbers

Pre-compress transforms (truncateBase64, removeEmptyLines) must run
before tree-sitter parsing to avoid performance regression with large
base64 strings and to ensure empty line removal affects chunk merging.

Action: split lightweight transforms into pre-compress and post-compress phases
Why: previous refactor changed execution order, causing tree-sitter to receive
untreated base64 and content with empty lines, altering compress output

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 17:40:11 +09:00
Kazuki Yamada e978decb2b test(core): Add regression tests for base64 truncation and lastIndex safety
Add test for consecutive truncateBase64Content calls to verify global
regex lastIndex reset works correctly. Add test for truncateBase64
config branch in applyLightweightTransforms.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 17:33:54 +09:00
Kazuki Yamada 3e70628307 refactor(core): Separate lightweight transforms from worker processing
Extract lightweight file transforms (truncateBase64, removeEmptyLines,
trim, showLineNumbers) into applyLightweightTransforms() on the main
thread, keeping only heavy operations (removeComments, compress) in
worker processContent(). This eliminates dual management of the same
logic across worker and main thread paths.

Also pre-compile base64 regex patterns at module level to avoid
re-creation per file call.

Action: split processContent into heavy (worker) and lightweight (main thread) phases
Action: extract applyLightweightTransforms() as single source of truth for lightweight ops
Action: hoist regex patterns in truncateBase64.ts to module scope with lastIndex reset
Why: lightweight transforms were duplicated in both processFilesMainThread and processContent
Why: regex re-compilation per file added unnecessary overhead for large repos

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 17:24:32 +09:00
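The "module scope with lastIndex reset" action above guards against a JavaScript pitfall: a regex with the `g` flag keeps its `lastIndex` between `test()`/`exec()` calls, so a hoisted pattern can give different answers for the same input. A minimal sketch, assuming a simplified data-URI pattern and a hypothetical helper name (neither is taken from truncateBase64.ts):

```typescript
// Compiled once at module load instead of once per file.
// Simplified pattern for illustration only.
const DATA_URI_BASE64 = /data:[\w.+-]+\/[\w.+-]+;base64,[A-Za-z0-9+/=]+/g;

// Hypothetical helper name.
function containsBase64DataUri(content: string): boolean {
  // A global regex resumes from lastIndex; reset it so each call starts at 0.
  DATA_URI_BASE64.lastIndex = 0;
  return DATA_URI_BASE64.test(content);
}
```

Without the reset, a second call on the same string would start searching from the end of the previous match and return `false`.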
Kazuki Yamada 2fac8d85ee perf(core): Optimize chunk merging and avoid redundant string split in grep tool
- Replace string += with array accumulation + join('\n') in mergeAdjacentChunks
  to avoid O(k²) copying when merging adjacent tree-sitter code chunks
- Extract searchInLines from searchInContent in grepRepomixOutputTool so
  performGrepSearch splits content once and reuses the lines array for both
  search and formatting, avoiding a redundant O(n) split on large files
2026-03-28 15:12:29 +09:00
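The first bullet above replaces repeated `+=` with array accumulation and a single `join('\n')`. A rough sketch of that approach, assuming a hypothetical `Chunk` shape, input sorted by `startRow`, and "adjacent" meaning the next chunk starts on or directly after the previous chunk's last row:

```typescript
// Hypothetical chunk shape; the real tree-sitter chunk type may differ.
interface Chunk {
  startRow: number;
  endRow: number;
  content: string;
}

function mergeAdjacentChunks(chunks: Chunk[]): Chunk[] {
  const merged: Chunk[] = [];
  let parts: string[] = [];
  let startRow = 0;
  let endRow = 0;

  // Emit the accumulated group with one join instead of repeated string +=.
  const flush = (): void => {
    if (parts.length > 0) {
      merged.push({ startRow, endRow, content: parts.join("\n") });
      parts = [];
    }
  };

  for (const chunk of chunks) {
    if (parts.length > 0 && chunk.startRow <= endRow + 1) {
      // Adjacent to the current group: collect the piece, extend the range.
      parts.push(chunk.content);
      endRow = Math.max(endRow, chunk.endRow);
    } else {
      // Gap: close the current group and start a new one.
      flush();
      parts = [chunk.content];
      startRow = chunk.startRow;
      endRow = chunk.endRow;
    }
  }
  flush();
  return merged;
}
```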
Claude 4d2bbcf6cc perf(core): Pre-initialize metrics worker pool to overlap tiktoken WASM loading
Pipeline-level optimizations that produce measurable end-to-end improvement:

- Pre-initialize metrics worker pool during file collection phase so tiktoken
  WASM loading overlaps with security checks and file processing. First token
  count task dropped from 381ms to 22ms (worker already warmed).
- Lazy-load Jiti via dynamic import — only loaded when TS/JS config files are
  detected, saving startup time for the common JSON/default config path.
- Fix O(n²) file path re-grouping in packager by using Map + Set for O(1)
  membership checks instead of .find() + .includes().
- Move binary extension check before fs.stat in fileRead to skip unnecessary
  stat syscalls for binary files.
- Parallelize split output file writes with Promise.all instead of sequential
  for-loop.

Benchmark (15 runs each, median ± IQR, packing repomix repo ~1000 files):

  main branch: 3515ms (P25: 3443, P75: 3581)
  perf branch: 3318ms (P25: 3215, P75: 3383)
  Improvement: -197ms (-5.6%)

Pipeline stage breakdown (instrumented):
  - Metrics first-file init: 381ms → 22ms (worker pre-warmed)
  - Total metrics stage: 793ms → ~450ms

All 1096 tests pass. Lint clean.

https://claude.ai/code/session_01JoNjFe7S2roMfHfNcw6bso
2026-03-28 01:15:43 +09:00
Kazuki Yamada 41ed574da1 fix(skill): Remove existing skill directory on overwrite confirmation
Previously, the interactive overwrite prompt confirmed but did not
remove the old directory, leaving stale files (e.g. renamed
tech-stack.md) behind. Now the directory is removed before
regeneration, consistent with --force behavior.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 00:27:25 +09:00
Kazuki Yamada 7649725a21 refactor(skill): Rename tech-stack.md to tech-stacks.md with ## Tech Stack: <path> format
Aligns with files.md pattern (## File: <path>). Each package is now
a ## section under a single # Tech Stacks heading, with ### subsections
for Languages, Frameworks, Dependencies, etc.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 00:20:49 +09:00
Kazuki Yamada 2a7139c89f refactor(skill): Use '.' instead of '(root)' for root directory label
More natural as a path value and consistent with filesystem conventions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 23:58:35 +09:00
Kazuki Yamada b2191509ca refactor(skill): Group tech stack detection by package directory
Instead of merging all dependency files into a single flat list,
detectTechStack now returns a TechStackInfo[] grouped by package
directory. Each directory containing a dependency file produces its
own entry with path, languages, frameworks, dependencies, etc.

generateTechStackMd renders each package as a separate section with
`path: (root)` or `path: packages/xxx`, separated by `---`. This
gives AI consumers clearer per-package context and makes line-based
retrieval easier.

Removes deduplicateDependencies as dependencies are now scoped
per-package and don't need cross-package deduplication. configFiles
stores filenames only (not full paths) since the package path
provides the directory context.

Closes #1182

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 23:48:44 +09:00
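The per-directory grouping described above can be sketched as follows. The function name and the `Map` return type are assumptions for illustration; the actual `TechStackInfo` entries carry more fields. `path.posix.dirname` returns `.` for root-level files, which matches the root label adopted in 2a7139c89f.

```typescript
import * as path from "node:path";

// Hypothetical helper: groups dependency/config file paths by their directory.
// Values store filenames only, since the key already provides the directory.
function groupDependencyFilesByDir(filePaths: string[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const filePath of filePaths) {
    const dir = path.posix.dirname(filePath); // "." for root-level files
    const fileName = path.posix.basename(filePath);
    const names = groups.get(dir);
    if (names) {
      names.push(fileName);
    } else {
      groups.set(dir, [fileName]);
    }
  }
  return groups;
}
```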
Kazuki Yamada 6a9d028e75 Merge pull request #1310 from yamadashy/fix/skill-tech-stack-subdirectory-detection
fix(skill): Detect tech stack from dependency files in subdirectories
2026-03-27 23:27:58 +09:00
Kazuki Yamada 005eb791eb fix(skill): Address PR review feedback for tech stack detection
- Use first-wins for packageManager to match other dedup strategies
- Deduplicate dependencies by name:version to preserve version skew
- Normalize Node.js version v prefix before runtime version dedup
- Fix stale comment referencing root-level-only detection

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 00:32:02 +09:00
Kazuki Yamada 18e4e386c9 refactor(cli): Skip PicoSpinner construction in quiet mode
Defer PicoSpinner instantiation to avoid unnecessary object allocation
when the spinner will never be displayed (quiet, verbose, or stdout mode).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 00:29:08 +09:00
Kazuki Yamada 31bb9ed22d refactor(cli): Replace log-update with picospinner for spinner implementation
Replace log-update dependency with picospinner (from tinylibs) to reduce
transitive dependencies. picospinner provides built-in spinner functionality
(frames, symbols, succeed/fail states) that was previously manually
implemented on top of log-update, simplifying cliSpinner.ts.

This removes 12 transitive packages (ansi-escapes, cli-cursor, slice-ansi,
wrap-ansi, string-width, etc.) from the dependency tree.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 00:23:01 +09:00
Kazuki Yamada c4b096f996 fix(skill): Add deduplication for runtime versions in tech stack detection
Deduplicate runtimeVersions by runtime:version pair to prevent
duplicate entries when multiple version files exist across
subdirectories in monorepos.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 00:09:46 +09:00
Kazuki Yamada 87949970da fix(skill): Detect tech stack from dependency files in subdirectories
Previously, detectTechStack() only checked root-level dependency files,
causing tech-stack.md to be empty for monorepo setups using --include
to target a specific package.

Now all dependency files in processedFiles are checked regardless of
directory depth. Since processedFiles is already filtered by
--include/--ignore, this naturally scopes detection to the user's
target. Also adds dependency deduplication for cases where multiple
package.json files define the same package, and stores config file
full paths to distinguish files across subdirectories.

Closes #1182

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 00:01:05 +09:00
Kazuki Yamada e2101a50d1 test(core): Add boundary test for exactly 256-char base64 string
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 22:59:45 +09:00
Kazuki Yamada 25ec27028e fix(core): Reduce false positives in truncateBase64 for path-like strings
Raise MIN_BASE64_LENGTH_STANDALONE from 60 to 256 since truncating short
strings saves negligible tokens. Require digits in isLikelyBase64 heuristic
since real base64-encoded binary data virtually always contains numbers,
while XPath and file path strings typically do not.

Closes #1298

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 22:52:33 +09:00
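The two heuristics named above (a 256-character minimum and a required digit) might look roughly like this. Only the constant name and function name come from the commit message. The character-set check and the choice to accept a string of exactly 256 characters are assumptions.

```typescript
const MIN_BASE64_LENGTH_STANDALONE = 256;

function isLikelyBase64(candidate: string): boolean {
  // Short strings save negligible tokens, so leave them alone.
  if (candidate.length < MIN_BASE64_LENGTH_STANDALONE) return false;

  // Assumed charset check: base64 alphabet with up to two padding characters.
  if (!/^[A-Za-z0-9+/]+={0,2}$/.test(candidate)) return false;

  // Long path-like strings (XPath, file paths) typically have no digits;
  // encoded binary data almost always does.
  return /[0-9]/.test(candidate);
}
```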
Florian Lefebvre 7ec67b181e lint
2026-03-24 09:52:58 +01:00
Florian Lefebvre 4f487444cb perf: migrate to tinyclip from clipboardy
2026-03-24 09:52:18 +01:00
autofix-ci[bot] c5d104e5c1 [autofix.ci] apply automated fixes
2026-03-22 15:54:13 +00:00
Kazuki Yamada 329eda2832 refactor(test): Extract mock helper and fix missing env var docs
- Extract duplicated DefaultActionRunnerResult mock into
  createMockDefaultActionResult() helper function
- Add missing REPOMIX_REMOTE_TRUST_CONFIG env var mention in ko, pt-br,
  ru library usage docs for consistency with other languages

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 00:53:04 +09:00
Kazuki Yamada 908d0c1cb5 refactor(cli): Replace isRemote flag with skipLocalConfig
Remove the intermediate isRemote flag that inverted remoteTrustConfig
only to be re-inverted back to skipLocalConfig in defaultAction. Now
remoteAction computes skipLocalConfig directly, reducing the internal
flag chain from 3 concepts to 2 (remoteTrustConfig → skipLocalConfig).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 00:43:56 +09:00
Kazuki Yamada 89c7875732 fix(security): Move --config validation before download and deduplicate findConfigFile
- Move absolute path validation for --config to before repository
  download/clone, avoiding wasted I/O on invalid input
- Consolidate duplicate findConfigFile calls in skipLocalConfig branch
  into a single search with conditional handling
- Add test for relative --config rejection even with --remote-trust-config

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 00:29:02 +09:00
Kazuki Yamada 17f5bb8062 fix(security): Require absolute path for --config in remote mode
Relative --config paths in remote mode would resolve against the cloned
temp directory, potentially loading and executing malicious config files
(e.g., repomix.config.ts) from untrusted repositories.

Now rejects relative paths with a clear error message guiding users to
use absolute paths instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 23:35:20 +09:00
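The check described above can be sketched with Node's `path.isAbsolute`. The function name and error text here are hypothetical; only the rule (reject relative `--config` paths in remote mode) comes from the commit message.

```typescript
import * as path from "node:path";

// Hypothetical helper: call before cloning so a relative path can never
// resolve against the untrusted temp directory.
function assertAbsoluteConfigPath(configPath: string | undefined): void {
  if (configPath !== undefined && !path.isAbsolute(configPath)) {
    throw new Error(
      `--config must be an absolute path when processing a remote repository: ${configPath}`,
    );
  }
}
```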
Kazuki Yamada f2b41b875a fix(security): Allow --config flag in remote mode and add skip log message
The --config flag represents an explicit user choice and should not be
blocked in remote mode. Only auto-detected config files in the cloned
repo are skipped.

Also adds a logger.note() message when a config file is found in the
remote repository but skipped, guiding users to --remote-trust-config.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 23:35:20 +09:00
Kazuki Yamada 127f561d13 test(cli): Add tests for --remote-trust-config and env var opt-in
Cover the previously untested opt-in paths:
- archive-download path passes isRemote: true
- --remote-trust-config flag sets isRemote to false
- REPOMIX_REMOTE_TRUST_CONFIG=true env var sets isRemote to false
- Non-"true" env var values (e.g., "yes") keep isRemote true

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 23:35:20 +09:00
Kazuki Yamada 186f74a85f fix(security): Skip config file loading from remote repositories
When using `repomix --remote <url>` or the MCP `pack_remote_repository` tool,
config files (repomix.config.ts/js) from the cloned repository were executed
via jiti, allowing a malicious repository to achieve arbitrary code execution
on the user's machine.

This commit skips all local config file loading when processing remote
repositories. The `isRemote` flag is propagated from remoteAction through
defaultAction to loadFileConfig, which skips local config auto-detection
and --config flag resolution. Global config and CLI options continue to
work normally.

Users who need to trust remote configs can do so in a future release via
an explicit opt-in flag (e.g., --trust-remote-config).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 23:35:20 +09:00
Kazuki Yamada cdb6ab156c test(output): Add strict XML error handling to DOMParser in tests
Use a strict error handler for @xmldom/xmldom's DOMParser that throws on
all severity levels (warning, error, fatalError). By default, xmldom
silently continues parsing malformed XML, which could mask XMLBuilder
regressions. This ensures tests fail immediately on any XML well-formedness
issue.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 01:02:37 +09:00
Kazuki Yamada 04b277f269 refactor(deps): Replace fast-xml-parser with fast-xml-builder
fast-xml-parser has accumulated 10 CVEs (6 in 2026 alone), with a recurring
pattern of incomplete fixes in its DOCTYPE/entity parser. Since Repomix only
uses the XMLBuilder functionality (not the parser), switching to
fast-xml-builder — the standalone builder package that fast-xml-parser v5
internally delegates to — eliminates 9/10 parser-side CVE noise while
maintaining identical behavior.

- Replace fast-xml-parser (831KB) with fast-xml-builder (176KB) as dependency
- Add @xmldom/xmldom as devDependency for XML validation in tests
- Update import in outputGenerate.ts (named → default export)
- Migrate test XML parsing from fast-xml-parser's XMLParser to @xmldom/xmldom's
  DOMParser, providing cross-implementation validation of generated XML

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 01:02:37 +09:00
Kazuki Yamada f38828aa90 fix(test): Use path.sep in fileSearch test for cross-platform compatibility
Mock data and expected sort order now use path.sep instead of hardcoded
'/' separators. On Windows, path.sep is '\' so sortPaths splits
differently, producing a different sort order.

Co-Authored-By: Claude Opus 4.6 (1M context) <koukun0120@gmail.com>
2026-03-20 01:16:01 +09:00
Kazuki Yamada f5977b2e6a test(core): Use exact sorted order assertion in fileSearch test
Replace weak arrayContaining assertion with exact toEqual using the
correct sorted order, so the test verifies both content and sort behavior.

Co-Authored-By: Claude Opus 4.6 (1M context) <koukun0120@gmail.com>
2026-03-20 01:08:03 +09:00
yamadashy d9fa509ee6 perf(core): Optimize sortPaths with decorate-sort-undecorate pattern
Pre-compute path.split() once per path before sorting, avoiding
O(N log N) repeated string allocations during comparisons.
Benchmark: 10,000 files 65ms → 11ms (6x speedup).

Co-Authored-By: Claude Opus 4.6 (1M context) <koukun0120@gmail.com>
2026-03-20 00:54:33 +09:00
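Decorate-sort-undecorate, as named above, splits each path once, sorts the pre-split records, then discards them. A minimal sketch follows. The segment-by-segment comparison is an assumption and may not match the actual `sortPaths` ordering rules. The `separator` parameter is added here only so the example can be exercised independently of the platform.

```typescript
import * as path from "node:path";

// Assumed ordering: plain lexicographic comparison per segment, shorter path first on ties.
function compareParts(a: string[], b: string[]): number {
  const shared = Math.min(a.length, b.length);
  for (let i = 0; i < shared; i++) {
    if (a[i] !== b[i]) return a[i] < b[i] ? -1 : 1;
  }
  return a.length - b.length;
}

function sortPaths(filePaths: string[], separator: string = path.sep): string[] {
  // Decorate: split each path exactly once.
  const decorated = filePaths.map((original) => ({
    original,
    parts: original.split(separator),
  }));

  // Sort: comparisons reuse the pre-split parts instead of re-splitting.
  decorated.sort((a, b) => compareParts(a.parts, b.parts));

  // Undecorate: return the original strings in sorted order.
  return decorated.map((entry) => entry.original);
}
```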
Kazuki Yamada 2e98b5cc18 refactor(core): Remove dead isExtractionError check from retry logic
With the streaming pipeline, errors propagate as native Error objects
rather than RepomixError, so the isExtractionError check was always
false. Retrying extraction errors is acceptable since the retry loop
is bounded to 3 attempts.
2026-02-23 22:58:56 +09:00
Kazuki Yamada 44f15b6477 refactor(core): Remove unused getArchiveFilename function
The streaming tar.gz extraction no longer uses temporary files,
making this filename generation function unnecessary.
2026-02-18 22:41:26 +09:00
Kazuki Yamada ef194b8eeb perf(core): Replace ZIP archive download with streaming tar.gz extraction
The previous ZIP-based archive download used fflate's in-memory extraction,
which failed on large repositories (e.g. facebook/react) due to memory
constraints and ZIP64 limitations.

Switch to tar.gz format with Node.js built-in zlib + tar package, enabling
a full streaming pipeline (HTTP response -> gunzip -> tar extract -> disk)
with no temporary files and constant memory usage regardless of repo size.

Key changes:
- Replace fflate with tar package for archive extraction
- Change archive URLs from .zip to .tar.gz
- Use streaming pipeline instead of download-then-extract
- Leverage tar's built-in strip and path traversal protection
- Explicitly destroy streams after pipeline for Bun compatibility
- Use child_process runtime under Bun to avoid worker_threads hang
2026-02-18 00:22:07 +09:00
Kazuki Yamada 05f11f46c7 refactor(core): Remove unused fileCollect worker infrastructure
File collection was replaced with a promise pool approach in 96ff05dc,
but the worker-related code remained. This removes the now-unused
fileCollectWorker and all references to it from the worker system.
2026-02-17 23:09:18 +09:00
Kazuki Yamada e97691dd36 perf(core): Replace worker threads with promise pool for file collection
After the UTF-8 fast path optimization eliminated the CPU-heavy jschardet
bottleneck, file collection became I/O-bound. Worker threads now add pure
overhead (Tinypool init, structured clone, IPC) without benefit.

Benchmark (954 files, M2 Pro 10-core):
- Worker Threads: ~108ms → Promise Pool (c=50): ~37ms (2.9x faster)

Changes:
- Replace Tinypool worker dispatch with a simple promise pool (c=50)
- Inject readRawFile via deps for testability
- Remove unused concurrentTasksPerWorker from WorkerOptions
- Simplify tests to use readRawFile mock instead of 5+ module mocks
2026-02-17 23:09:18 +09:00
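A "simple promise pool" in the sense used above can be written in a few lines: start up to `concurrency` loops that each pull the next unclaimed index until the input is exhausted. This is a generic sketch, not the repository's implementation; the function name and signature are assumptions.

```typescript
async function promisePool<T, R>(
  items: T[],
  concurrency: number,
  task: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each runner claims the next index synchronously, then awaits its task.
  const runner = async (): Promise<void> => {
    while (next < items.length) {
      const index = next++;
      results[index] = await task(items[index], index);
    }
  };

  const runners = Array.from({ length: Math.min(concurrency, items.length) }, () => runner());
  await Promise.all(runners);
  return results; // same order as the input
}
```

With `concurrency` set to 50, as in the commit, at most 50 reads are in flight at once, with no worker start-up or structured-clone cost.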
Kazuki Yamada 7dcdbae24d perf(core): Add UTF-8 fast path to skip expensive jschardet encoding detection
Previously, every file went through jschardet.detect() which scans the entire
buffer through multiple encoding probers (MBCS, SBCS, Latin1) with frequency
table lookups — the most expensive CPU operation in file collection.

Since ~99% of source code files are UTF-8, we now try TextDecoder('utf-8',
{ fatal: true }) first. If it succeeds, jschardet and iconv are skipped entirely.
Non-UTF-8 files (e.g., Shift-JIS, EUC-KR) fall back to the original detection path.

Additionally, set concurrentTasksPerWorker=3 for fileCollect workers to better
overlap I/O waits within each worker thread.

Benchmark results (838 files, 10 CPU cores):
- Before: ~616ms
- After:  ~108ms (5.7x faster)
2026-02-17 23:09:18 +09:00
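The fast path above relies on `TextDecoder` throwing when `fatal: true` is set and the bytes are not valid UTF-8. A minimal sketch, with a hypothetical helper name; the fallback to jschardet/iconv is left to the caller:

```typescript
// Returns the decoded string for valid UTF-8, or null so the caller can
// fall back to slower encoding detection.
function tryDecodeUtf8(buffer: Uint8Array): string | null {
  try {
    return new TextDecoder("utf-8", { fatal: true }).decode(buffer);
  } catch {
    return null;
  }
}
```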
autofix-ci[bot] 1d5297c9a6 [autofix.ci] apply automated fixes
2026-02-17 13:55:26 +00:00
Kazuki Yamada aef7cc1f4a feat(cli): Add ssh:// and git:// protocol support to remote URL auto-detection
The existing --remote flag already supports ssh:// and git:// protocols
via git-url-parse, so auto-detection should cover them as well.
2026-02-17 22:54:03 +09:00
Kazuki Yamada 540e8dd2a3 feat(cli): Auto-detect explicit remote URLs in positional arguments
Allow users to run `repomix https://github.com/user/repo` or
`repomix git@github.com:user/repo.git` without the `--remote` flag.

Only explicit URL formats (https:// and git@) are auto-detected.
Shorthand format (owner/repo) is not auto-detected to avoid
ambiguity with local directory paths.

Closes #1120
2026-02-17 22:40:13 +09:00
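The detection rule can be sketched as a prefix check. The helper name is hypothetical, and a plain `startsWith` test is an assumption about how matching is done. The list combines the `https://` and `git@` formats from this commit with the `ssh://` and `git://` protocols added in aef7cc1f4a.

```typescript
// Only explicit URL formats; owner/repo shorthand is deliberately excluded
// because it is indistinguishable from a relative directory path.
const EXPLICIT_REMOTE_PREFIXES = ["https://", "git@", "ssh://", "git://"];

// Hypothetical helper name.
function isExplicitRemoteUrl(arg: string): boolean {
  return EXPLICIT_REMOTE_PREFIXES.some((prefix) => arg.startsWith(prefix));
}
```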
Kazuki Yamada 66e572f62e test(core): Add test for skipping retry on extraction error
Verify that extraction errors cause immediate failure without retrying,
since the same archive will produce the same extraction error.
2026-02-14 18:55:22 +09:00
Kazuki Yamada 7ea89f7fd0 fix(website): Remove Vue dependency from cliCommand utility to fix CI
Define CliCommandPackOptions interface locally in cliCommand.ts instead of
importing PackOptions from usePackOptions.ts which depends on Vue module.
This prevents tsc from following the import chain to Vue in CI.
2026-02-03 00:06:09 +09:00
Kazuki Yamada 91c39bb5ee fix(website): Avoid importing Vue-dependent module in test to fix CI lint
Use local interface definition instead of importing PackOptions from
usePackOptions.ts which depends on Vue and fails tsc in CI.
2026-02-03 00:04:13 +09:00
Kazuki Yamada 62280a6870 fix(website): Add shell escaping and ZIP upload handling to CLI command generator
Address PR review comments:
- Add shell escaping for user-controlled values (repositoryUrl, includePatterns, ignorePatterns)
  to prevent command injection when users copy-paste the generated command
- Skip --remote flag for uploaded file names by validating with isValidRemoteValue
- Add unit tests for generateCliCommand covering all option combinations
2026-02-02 00:21:16 +09:00
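One common way to escape values for a POSIX shell, shown here as a sketch rather than the website's actual implementation, is to wrap the value in single quotes and rewrite each embedded single quote as `'\''`. The helper name and the allowlist of characters that skip quoting are assumptions.

```typescript
// Hypothetical helper: POSIX-shell single-quote escaping.
function shellEscape(value: string): string {
  // Values made only of clearly safe characters are returned unquoted.
  if (/^[A-Za-z0-9_./:@%+=,-]+$/.test(value)) return value;

  // Close the quote, emit an escaped quote, reopen: ' becomes '\''
  return `'${value.replace(/'/g, "'\\''")}'`;
}
```

Glob patterns such as include/ignore values are quoted because `*` is not on the allowlist, so the user's shell does not expand them before the command runs.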
Kazuki Yamada 26f6c6c83a Merge pull request #1098 from yamadashy/chore/optimize-tsconfig
chore(config): Optimize tsconfig for TypeScript 5.x and Node.js 20+
2026-01-17 14:09:58 +09:00
Kazuki Yamada 4c7d8fbc99 build(config): Optimize tsconfig for TypeScript 5.x and Node.js 20+
- Update target from es2016 to es2022 (Node.js 20+ fully supports ES2022)
- Add moduleDetection: "force" to treat all files as modules
- Add verbatimModuleSyntax: true (TypeScript 5.0+ recommended setting)
- Remove esModuleInterop (replaced by verbatimModuleSyntax)
- Remove noImplicitAny (redundant, included in strict)
- Remove compileOnSave (unused, VS Code ignores this option)
- Remove redundant declaration: true from tsconfig.build.json
- Fix repomix.config.cts to use CommonJS syntax (module.exports)
2026-01-17 14:06:54 +09:00