Address two review threads on PR #1518 that flagged tests whose titles
overstated what was being verified.
- fileProcess: the longBase64 string is one continuous line, so the
truncateBase64 → removeEmptyLines ordering was never actually under
test (truncateBase64Content's regex does not span newlines). Rename
the test to describe the combined behavior it really pins.
- skillTechStack: rename the per-directory case to reflect that root
and subpackage land in separate buckets keyed by getDirPath, and
add a second case with two package.json entries at the same path
to genuinely exercise the parsed.packageManager && !result.packageManager
guard at skillTechStack.ts:541.
Add targeted regression tests for the high-risk areas identified in the
v1.13.1..main audit, focusing on silent-correctness bugs and parallel
error handling: failures that would not surface in CI but would in the
field.
- core/metrics/calculateMetrics: pin numeric equivalence between the
fast path (Σ file tokens + wrapper tokens) and the slow path (full
output tokenization). Cover wrapper-extraction fallback, split-output
fallback, and worker pool cleanup when fileMetrics rejects.
- core/file/fileProcess: pin transform ordering invariants —
removeComments → removeEmptyLines (blank lines from comment removal
must be cleaned up; preserved when removeEmptyLines is off);
truncateBase64 → removeEmptyLines (multi-line base64 squashed first);
trim → showLineNumbers (no leading/trailing blanks numbered).
Plus worker/lightweight path parity for inputs that don't need
worker processing.
- core/packager: pin metrics worker pool cleanup on parallel branch
failures (validateFileSafety, produceOutput, calculateMetrics, warmup
rejection). Verify that a prefetchSortData failure is isolated and does
not block sortOutputFiles.
- core/skill/skillTechStack: cover untested fix-commit invariants —
root entry sorts first in monorepo output; configFiles deduplicated
within a directory; first-seen packageManager wins per directory.
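
The per-directory invariants above (configFiles deduplicated, first-seen
packageManager wins) can be illustrated with a minimal sketch. The types,
dirOf, and groupByDirectory are hypothetical stand-ins, not the actual
skillTechStack code; only the guard expression mirrors the one named in
this PR.

```typescript
// Hypothetical shapes for illustration; the real types may differ.
interface ParsedConfig {
  path: string;
  packageManager?: string;
}

interface DirStack {
  configFiles: string[];
  packageManager?: string;
}

// Hypothetical stand-in for getDirPath: "." for root-level files.
function dirOf(filePath: string): string {
  const idx = filePath.lastIndexOf("/");
  return idx === -1 ? "." : filePath.slice(0, idx);
}

function groupByDirectory(parsedFiles: ParsedConfig[]): Map<string, DirStack> {
  const buckets = new Map<string, DirStack>();
  for (const parsed of parsedFiles) {
    const dir = dirOf(parsed.path);
    const result: DirStack = buckets.get(dir) ?? { configFiles: [] };
    // Deduplicate config files within a directory.
    if (result.configFiles.indexOf(parsed.path) === -1) {
      result.configFiles.push(parsed.path);
    }
    // First-seen packageManager wins: later entries never overwrite it.
    if (parsed.packageManager && !result.packageManager) {
      result.packageManager = parsed.packageManager;
    }
    buckets.set(dir, result);
  }
  return buckets;
}
```

Two entries at the same path only hit the guard's second operand when
they land in the same bucket, which is why a root plus subpackage case
alone does not exercise it.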
Add a test that exercises all transforms together: removeComments (worker)
+ truncateBase64 + removeEmptyLines + showLineNumbers (lightweight) to
verify the full two-phase pipeline produces correct output.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Merge applyPreCompressTransforms and applyPostCompressTransforms into
a single applyLightweightTransforms function. Move truncateBase64 to the
post-worker phase, since tree-sitter handles string literals as single
AST nodes regardless of content size.
Remove redundant trim from worker processContent — the main thread
applyLightweightTransforms already handles it.
Final pipeline:
Worker: removeComments → compress
Main: truncateBase64 → removeEmptyLines → trim → showLineNumbers
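
The main-thread half of this pipeline can be sketched as below. This is
an illustrative approximation, not the project's implementation: the
option names, the base64 pattern, the 32-character prefix, and the
line-number format are all assumptions.

```typescript
// Hypothetical option names; the real config shape may differ.
interface LightweightOptions {
  truncateBase64: boolean;
  removeEmptyLines: boolean;
  showLineNumbers: boolean;
}

function applyLightweightTransformsSketch(
  content: string,
  opts: LightweightOptions,
): string {
  let out = content;

  // 1. truncateBase64: shorten long base64-looking runs (assumed pattern).
  if (opts.truncateBase64) {
    out = out.replace(/[A-Za-z0-9+/]{60,}={0,2}/g, (m) => `${m.slice(0, 32)}...`);
  }

  // 2. removeEmptyLines: drop lines that are empty or whitespace-only.
  if (opts.removeEmptyLines) {
    out = out
      .split("\n")
      .filter((line) => line.trim() !== "")
      .join("\n");
  }

  // 3. trim: runs before numbering so no leading/trailing blanks get numbers.
  out = out.trim();

  // 4. showLineNumbers: number whatever lines remain.
  if (opts.showLineNumbers) {
    out = out
      .split("\n")
      .map((line, i) => `${i + 1}: ${line}`)
      .join("\n");
  }

  return out;
}
```

Because each step only sees the output of the previous one, swapping
trim and showLineNumbers would number blank edge lines, which is the
ordering the tests in this PR are meant to pin.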
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move removeEmptyLines from applyPreCompressTransforms to
applyPostCompressTransforms so it runs after removeComments.
This ensures empty lines created by comment removal are cleaned up.
Transform order: truncateBase64 (pre) → [removeComments → compress] (worker) → removeEmptyLines → trim → showLineNumbers (post)
Simplify applyPreCompressTransforms to only handle truncateBase64
with an early return when disabled.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Split applyLightweightTransforms into applyPreCompressTransforms and
applyPostCompressTransforms to preserve the original execution order:
truncateBase64 → removeComments → removeEmptyLines → trim → compress → showLineNumbers
Pre-compress transforms (truncateBase64, removeEmptyLines) must run
before tree-sitter parsing to avoid performance regression with large
base64 strings and to ensure empty line removal affects chunk merging.
Action: split lightweight transforms into pre-compress and post-compress phases
Why: the previous refactor changed execution order, causing tree-sitter to receive
untruncated base64 and content with empty lines, altering compress output
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add test for consecutive truncateBase64Content calls to verify global
regex lastIndex reset works correctly. Add test for truncateBase64
config branch in applyLightweightTransforms.
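
For context, the lastIndex concern applies to any global regex shared
across calls. The sketch below uses a hypothetical pattern and helper
names (not the actual truncateBase64.ts code) to show why a reset is
needed when a module-level pattern is reused with test() or exec().

```typescript
// Hypothetical module-level pattern; the real patterns may differ.
const BASE64_RUN = /[A-Za-z0-9+/]{60,}={0,2}/g;

// Hypothetical helper. A global regex remembers lastIndex between
// test()/exec() calls, so without the reset a second call on the same
// input would start scanning past the previous match and report false.
function containsLongBase64(content: string): boolean {
  BASE64_RUN.lastIndex = 0;
  return BASE64_RUN.test(content);
}

// String.prototype.replace resets lastIndex itself for global patterns,
// but the explicit reset keeps the helper safe if it is later changed
// to use exec() or test().
function truncateLongBase64(content: string, keep = 32): string {
  BASE64_RUN.lastIndex = 0;
  return content.replace(BASE64_RUN, (m) => `${m.slice(0, keep)}...`);
}
```

Calling containsLongBase64 twice in a row on the same string is the
kind of consecutive-call check that catches a missing reset.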
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extract lightweight file transforms (truncateBase64, removeEmptyLines,
trim, showLineNumbers) into applyLightweightTransforms() on the main
thread, keeping only heavy operations (removeComments, compress) in
worker processContent(). This eliminates dual management of the same
logic across worker and main thread paths.
Also pre-compile base64 regex patterns at module level to avoid
re-creation per file call.
Action: split processContent into heavy (worker) and lightweight (main thread) phases
Action: extract applyLightweightTransforms() as single source of truth for lightweight ops
Action: hoist regex patterns in truncateBase64.ts to module scope with lastIndex reset
Why: lightweight transforms were duplicated in both processFilesMainThread and processContent
Why: regex re-compilation per file added unnecessary overhead for large repos
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add WorkerRuntime type and configurable runtime parameter to createWorkerPool and initTaskRunner functions. This allows choosing between 'worker_threads' and 'child_process' runtimes based on performance requirements.
- Add WorkerRuntime type definition for type safety
- Add optional runtime parameter to createWorkerPool, defaulting to child_process
- Add optional runtime parameter to initTaskRunner, defaulting to child_process
- Configure fileCollectWorker to use worker_threads for better performance
- Update all test files to use WorkerRuntime type
- Add comprehensive tests for runtime parameter functionality
- Maintain backward compatibility with existing code
The fileCollectWorker now benefits from the faster startup and shared memory of worker_threads, while other workers continue using child_process for stability.
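
A rough sketch of the runtime parameter's shape is shown below. The two
string literals come from this commit; PoolOptionsSketch,
buildPoolOptions, and the example worker path are hypothetical and only
illustrate the defaulting behavior, not the real createWorkerPool.

```typescript
// The two runtimes named in this commit.
type WorkerRuntime = "worker_threads" | "child_process";

// Hypothetical options shape, for illustration only.
interface PoolOptionsSketch {
  numOfTasks: number;
  workerPath: string;
  runtime: WorkerRuntime;
}

// Hypothetical helper: runtime is optional and falls back to
// child_process, so existing call sites keep their behavior.
function buildPoolOptions(
  numOfTasks: number,
  workerPath: string,
  runtime: WorkerRuntime = "child_process",
): PoolOptionsSketch {
  return { numOfTasks, workerPath, runtime };
}

// Only a caller that opts in gets worker_threads.
const collectOptions = buildPoolOptions(100, "./fileCollectWorker.js", "worker_threads");
const defaultOptions = buildPoolOptions(100, "./someOtherWorker.js");
```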
Add generic initTaskRunner function to processConcurrency.ts to eliminate
duplicate initialization logic across multiple modules. This reduces code
duplication and provides consistent worker pool management with proper
type safety through generic parameters.
- Add TaskRunner<T, R> interface and initTaskRunner function
- Remove duplicate createTaskRunner wrappers from 5 modules
- Update all deps parameters to use shared initTaskRunner directly
- Maintain type safety with explicit generic type parameters
- Update corresponding test mocks to match new signature
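
The generic runner shape can be sketched roughly as follows. Only the
TaskRunner<T, R> name and its generic parameters come from this commit;
the run/cleanup members and initTaskRunnerSketch are assumptions, and
the sketch runs tasks in-process with a plain handler instead of
dispatching to a worker pool.

```typescript
// Assumed interface shape: run a task, then release resources.
interface TaskRunner<T, R> {
  run(task: T): Promise<R>;
  cleanup(): Promise<void>;
}

// Hypothetical in-process stand-in. The real function is expected to
// create a worker pool rather than accept a handler directly.
function initTaskRunnerSketch<T, R>(
  handler: (task: T) => R | Promise<R>,
): TaskRunner<T, R> {
  let closed = false;
  return {
    async run(task) {
      if (closed) {
        throw new Error("task runner already cleaned up");
      }
      return handler(task);
    },
    async cleanup() {
      closed = true;
    },
  };
}

// Explicit generic parameters keep call sites type-safe.
const lengthRunner = initTaskRunnerSketch<string, number>((text) => text.length);
```

With a single generic factory, each module passes its task and result
types instead of wrapping pool creation in its own createTaskRunner.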
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
TokenCounter instances were not being properly freed when worker threads
were terminated by Tinypool's idle timeout. This caused memory leaks
when runCli was used as a library.
Changes:
- Add SIGTERM/SIGINT handlers to fileMetricsWorker and outputMetricsWorker
- Add freeTokenCounters function with proper cleanup and debug logging
- Convert all worker usage to consistent taskRunner pattern with cleanup
- Add cleanupWorkerPool function for explicit worker pool termination
- Update all related tests to match new taskRunner interface
The fix ensures TokenCounter resources are properly freed when workers
terminate, preventing memory accumulation during library usage.
Co-Authored-By: Claude <noreply@anthropic.com>