Add an archive entry filter that checks file extensions with isBinaryPath
before writing to disk, avoiding unnecessary I/O for binary files (images,
fonts, executables, etc.) that would be excluded later anyway.
The filter strips the leading path segment of each tar entry (e.g.
"repo-branch/") itself, because tar's filter callback receives entry paths
before the strip option is applied.
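
A minimal sketch of such an entry filter, for illustration only. The names
shouldExtractEntry and looksBinary and the short extension list are
assumptions made here; the real change relies on isBinaryPath, which this
sketch replaces with a small stand-in so that it has no dependencies.

```typescript
// Hypothetical stand-in for isBinaryPath: a tiny, non-exhaustive extension list.
const BINARY_EXTENSIONS = new Set(['png', 'jpg', 'gif', 'woff', 'woff2', 'ttf', 'exe']);

function looksBinary(filePath: string): boolean {
  // Only inspect the final path component so dots in directory names are ignored.
  const baseName = filePath.slice(filePath.lastIndexOf('/') + 1);
  const dot = baseName.lastIndexOf('.');
  if (dot === -1) return false;
  return BINARY_EXTENSIONS.has(baseName.slice(dot + 1).toLowerCase());
}

// Returns true when the entry should be written to disk.
// The entry path still carries the archive's top-level directory
// (e.g. "repo-branch/"), so it is removed before the check.
export function shouldExtractEntry(entryPath: string): boolean {
  const slash = entryPath.indexOf('/');
  const relativePath = slash === -1 ? entryPath : entryPath.slice(slash + 1);
  return !looksBinary(relativePath);
}
```

A function of this shape could be passed as the filter option of the tar
extractor, so that skipped entries are never written to disk.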
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
codeload.github.com resolves branches, tags, and SHAs automatically
without refs/heads/ or refs/tags/ prefixes. This eliminates the tag
fallback URL entirely and simplifies buildGitHubArchiveUrl to a single
return statement, saving an extra round trip for tag-based downloads.
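
As a rough illustration, the simplified builder might look like the sketch
below. The parameter list and the HEAD default for a missing ref are
assumptions made for this example, not the project's actual signature.

```typescript
// Sketch only: the argument shape is hypothetical.
// The ref may be a branch, tag, or commit SHA; no refs/heads/ or refs/tags/
// prefix is added. Falling back to HEAD when no ref is given is an assumption.
export function buildGitHubArchiveUrl(owner: string, repo: string, ref?: string): string {
  return `https://codeload.github.com/${owner}/${repo}/tar.gz/${ref ?? 'HEAD'}`;
}
```

With a single URL per ref there is no second candidate to try, so a tag-based
download needs one request instead of a failed branch request plus a fallback.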
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Download archives from codeload.github.com instead of github.com/archive
to eliminate the intermediate 302 redirect, saving ~100-300ms per request.
This is the same pattern used by create-react-app and degit.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
With the streaming pipeline, errors propagate as native Error objects
rather than RepomixError, so the isExtractionError check was always
false. Retrying extraction errors is acceptable since the retry loop
is bounded to 3 attempts.
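
The bounded retry described above can be sketched as follows. The helper
names withRetry and backoffDelayMs, the 1000 ms base delay, and the
injectable sleep parameter are assumptions for illustration; only the limit
of 3 attempts comes from this change.

```typescript
// Delay before the next attempt: doubles each time (assumed base of 1000 ms).
export function backoffDelayMs(attempt: number, baseMs = 1000): number {
  return baseMs * Math.pow(2, attempt - 1);
}

// Retries any failure, including extraction errors, up to maxAttempts times.
// No error-type check is needed because the loop is bounded.
export async function withRetry<T>(
  task: () => Promise<T>,
  maxAttempts = 3,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await task();
    } catch (error) {
      lastError = error;
      // Wait only if another attempt will follow.
      if (attempt < maxAttempts) {
        await sleep(backoffDelayMs(attempt));
      }
    }
  }
  throw lastError;
}
```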
The previous ZIP-based archive download used fflate's in-memory extraction,
which failed on large repositories (e.g. facebook/react) due to memory
constraints and ZIP64 limitations.
Switch to tar.gz format, using Node.js's built-in zlib together with the tar
package. This enables a full streaming pipeline (HTTP response -> gunzip ->
tar extract -> disk) with no temporary files and constant memory usage
regardless of repo size.
Key changes:
- Replace fflate with tar package for archive extraction
- Change archive URLs from .zip to .tar.gz
- Use streaming pipeline instead of download-then-extract
- Leverage tar's built-in strip and path traversal protection
- Explicitly destroy streams after pipeline for Bun compatibility
- Use child_process runtime under Bun to avoid worker_threads hang
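
The shape of the streaming pipeline can be sketched with Node.js built-ins
alone. In this illustration the tar extraction stage is replaced by a
hypothetical ByteCounter sink so the example has no dependencies; the function
name streamGunzip is likewise an assumption. In the real pipeline the source
would be the HTTP response body and the sink the tar extractor.

```typescript
import { Readable, Writable } from 'node:stream';
import { pipeline } from 'node:stream/promises';
import { createGunzip, gzipSync } from 'node:zlib';

// Hypothetical stand-in for the tar extract stage: counts decompressed bytes
// chunk by chunk instead of writing files, so memory use stays flat.
export class ByteCounter extends Writable {
  bytes = 0;
  _write(chunk: Buffer, _encoding: BufferEncoding, callback: (error?: Error | null) => void): void {
    this.bytes += chunk.length;
    callback();
  }
}

// source -> gunzip -> sink, with no intermediate buffering of the whole archive.
export async function streamGunzip(source: Readable, sink: Writable): Promise<void> {
  const gunzip = createGunzip();
  try {
    await pipeline(source, gunzip, sink);
  } finally {
    // Explicit cleanup, mirroring the "destroy streams after pipeline" change.
    source.destroy();
    gunzip.destroy();
    sink.destroy();
  }
}

// Usage sketch: an in-memory gzip payload plays the role of the HTTP response.
export async function demo(): Promise<number> {
  const payload = Buffer.from('hello '.repeat(1000));
  const sink = new ByteCounter();
  await streamGunzip(Readable.from([gzipSync(payload)]), sink);
  return sink.bytes;
}
```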
Vitest v4 changed how vi.fn() and vi.mock() work with class constructors.
Arrow functions in mockImplementation no longer work as constructors
when called with 'new' keyword.
Changes:
- Use regular function syntax instead of arrow functions for constructor mocks
- Use vi.hoisted() to define class mocks that can be used in vi.mock() factories
- Replace vi.fn().mockReturnValue() with vi.fn().mockImplementation() for class mocks
- Update mock instance retrieval to use vi.mocked().mock.results[0].value
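
The underlying language behavior can be shown without Vitest. The sketch
below is plain TypeScript with hypothetical names (arrowImpl, functionImpl,
isConstructible); it only illustrates why an implementation that will be
invoked with 'new' has to be a regular function.

```typescript
type ExtractorLike = { extract(): string };
type ExtractorCtor = new () => ExtractorLike;

// Arrow functions cannot be invoked with 'new' in a native ES2015+ runtime.
export const arrowImpl = (): ExtractorLike => ({ extract: () => 'arrow' });

// Regular functions can be invoked with 'new'; 'this' is the new instance.
export const functionImpl = function (this: ExtractorLike) {
  this.extract = () => 'extracted';
};

// Reports whether calling the value with 'new' succeeds.
export function isConstructible(fn: unknown): boolean {
  try {
    new (fn as ExtractorCtor)();
    return true;
  } catch {
    return false;
  }
}

// Constructing through the regular function yields a usable instance.
export const instance = new (functionImpl as unknown as ExtractorCtor)();
```

In a mock, the same rule means the function passed to mockImplementation for
a class should use the function keyword rather than arrow syntax.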
Add an override that disables Biome's organizeImports feature for
src/index.ts only, so its import order can be managed manually while
automatic import organization stays enabled for all other files.
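
An override of this kind might look like the fragment below. It assumes the
Biome 1.x configuration schema; newer major versions rename these keys, so
treat it as illustrative rather than as the exact configuration committed.

```json
{
  "organizeImports": { "enabled": true },
  "overrides": [
    {
      "include": ["src/index.ts"],
      "organizeImports": { "enabled": false }
    }
  ]
}
```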
Add unit tests for the GitHub archive download and extraction module.
Tests cover:
- Successful download and extraction flow
- Progress callback handling
- Retry logic with exponential backoff
- URL fallback strategies (main → master → tag)
- Error handling for network failures, ZIP corruption, timeouts
- Security validations for path traversal and absolute paths
- Archive cleanup on both success and failure
- Multiple response scenarios (404, timeout, missing body)
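
The kind of path check exercised by the security tests can be sketched as
follows. The function name isSafeEntryPath and its exact rules are assumptions
for illustration and are not taken from the module under test.

```typescript
import * as path from 'node:path';

// Returns true only if the archive entry would be written inside targetDir.
// Rejects absolute paths and any relative path that escapes via "..".
export function isSafeEntryPath(targetDir: string, entryPath: string): boolean {
  if (path.isAbsolute(entryPath)) return false;
  const base = path.resolve(targetDir);
  const resolved = path.resolve(base, entryPath);
  return resolved === base || resolved.startsWith(base + path.sep);
}
```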
Test coverage includes:
- downloadGitHubArchive function with various scenarios
- isArchiveDownloadSupported function
- All edge cases and error conditions
- Security protection mechanisms
Uses proper mocking with vitest for external dependencies:
- fetch API for HTTP requests
- fflate library for ZIP extraction
- Node.js fs operations
- Stream processing components
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>