This change introduces a thread-safe version of the 'SerializedDiagnosticConsumer' and refactors scanning compilation instance creation code to ensure this consumer gets added when the scanner query configuration command-line includes '-serialized-diagnostics-path' option.
Otherwise, when multiple workers encounter a diagnostic simultaneously we can encounter races which lead to corrupted diagnostic data or crashes
Resolves rdar://159598539
While this made sense in the distant past where the scanning service provided backing storage for the dependency cache, it no longer does so and now makes for awkard layering where clients get at the service via the cache. Now the cache is a simple data structure while all the clients that need access to the scanning service will get it explicitly.
- 'SwiftModuleScanner' will now be owned directly by the 'ModuleDependencyScanningWorker' and will contain all the necessary custom logic, instead of being instantiated by the module interface loader for each query
- Moves ownership over module output path and sdk module output path directly into the scanning worker, instead of the cache
This was used a long time ago for a design of a scanner which could rely on the client to specify that some modules *will be* present at a given location but are not yet during the scan. We have long ago determined that the scanner must have all modules available to it at the time of scan for soundness. This code has been stale for a couple of years and it is time to simplify things a bit by deleting it.
Since we enabled parallel dependency scanning by-default, each individual scan needs a diagnostic consumer that is safe to use across many threads. Deprecate the 'Locking' sub-class, making its behavior the default in the base class.
For the main source module, provide info on which dependencies are directly imported into the user program, explicitly ('import' statement) or implicitly (e.g. stdlib). Thist list does not include Swift overlay dependencies, cross-import dependencies, bridging header dependencies.
With '-sdk-module-cache-path', Swift textual interfaces found in the SDK will be built into a separate SDK-specific module cache.
Clang modules are not yet affected by this change, pending addition of the required API.
Batch dependency scanning was added as a mechanism to support multiple compilation contexts within a single module dependency graph.
The Swift compiler and the Explicitly-built modules model has long since abandoned this approach and this code has long been stale. It is time to remove it and its associated C API.
Instead, each scan's 'ModuleDependenciesCache' will hold all of the data corresponding to discovered module dependencies.
The initial design presumed the possibility of sharing a global scanning cache amongs different scanner invocations, possibly even different concurrent scanner invocations.
This change also deprecates two libSwiftScan entry-points: 'swiftscan_scanner_cache_load' and 'swiftscan_scanner_cache_serialize'. They never ended up getting used, and since this code has been largely stale, we are confident they have not otherwise had users, and they do not fit with this design.
A follow-up change will re-introduce moduele dependency cache serialization on a per-query basis and bring the binary format up-to-date.
The scanning action does not have any need for handling `-llvm` options, since it will never perform any code-gen. LLVM option processing relies on global option parsing structures, and the scanner has needed to carefully attempt to synchronize access to them. This change guards the configuration of LLVM options to not happen at all for dependency scanning actions, and removes calls to `llvm::cl::ResetAllOptionOccurrences()` that were previously needed.
Resolves rdar://120754696
Teach scanner to respect the working directory set in the invocation
through scanner C API.
Also add test infrastructure to testing scanner from C API. Break up
DependencyScan lib into two so the swift-scan-test and remain small
without understanding swift AST.
rdar://127626011
This change modifies the dependency scanner to keep track of source locations of each encountered 'import' statement, in order to be able to emit diagnostics with source locations if an import failed to resolve.
- Keep track of each 'import' statement's source buffer, line number, and column number when adding it. The dependency scanner utilizes separate compilation instances, and therefore separate Source Managers for scanning `import` statements of user sources and textual interfaces of Swift dependencies. Since import resolution may happen in the main scanner compilation instance while the `import` itself was found by an interface-scanning sub-instance, we cannot simply hold on to the import's `SourceLoc`.
- Add libSwiftScan API for diagnostics to carry above source locations to clients.
Guard 'resetCache' and 'resetDiagnostics' as critical sections using 'DependencyScanningToolStateLock'. Otherwise there's a chance that one thread in the scanner is doing a reset on the diagnostic consumer, while some other thread is adding this diagnostic consumer to another scan instance which may also be populating said consumer at the time. Similarly, for resetting 'ScanningService' though it is much more unlikely to be reset while in-use by other scanning actions.
From being a scattered collection of 'static' methods in ScanDependencies.cpp
and member methods of ASTContext. This makes 'ScanDependencies.cpp' much easier
to read, and abstracts the actual scanning logic away to a place with common
state which will make it easier to reason about in the future.
Instead of the code querying the compiler's built-in Clang instance, refactor the
dependency scanner to explicitly keep track of module output path. It is still
set according to '-module-cache-path' as it has been prior to this change, but
now the scanner can use a different module cache for scanning PCMs, as specified
with '-clang-scanner-module-cache-path', without affecting module output path.
Resolves rdar://113222853
Teach swift dependency scanner to use CAS to capture the full dependencies for a build and construct build commands with immutable inputs from CAS.
This allows swift compilation caching using CAS.
Other parts of the scanner lib (e.g. target-info query) already do this. We must always make sure to process the incoming command-line strings and run them through 'llvm::cl::TokenizeGNUCommandLine' in order to process escaped paths.
Part of rdar://106712169
Using a virutal output backend to capture all the outputs from
swift-frontend invocation. This allows redirecting and/or mirroring
compiler outputs to multiple location using different OutputBackend.
As an example usage for the virtual outputs, teach swift compiler to
check its output determinism by running the compiler invocation
twice and compare the hash of all its outputs.
Virtual output will be used to enable caching in the future.
Using mutual exclusion, ensuring that multiple threads executing dependency scans do not encounter data races on shared mutable state.
There are two layers with shared state where we need to be careful:
- `DependencyScanningTool`, as the main entity that scanning clients interact with. This tool instantiates compiler instances for individual scans, computing a scanning invocation hash. It needs to remember those instances for future use, and when creating instances it needs to reset LLVM argument processor's global state, meaning all uses of argument processing must be in a critical section.
- `SwiftDependencyScanningService`, as the main cache where dependency scanning results are stored. Each individual scan instantiates a `ModuleDependenciesCache`, which uses the scanning service as the underlying storage. The services' storage is segmented to storing dependencies discovered in a scan with a given context hash, which means two different scanning invocations running at the same time will be accessing different locations in its storage, thus not requiring synchronization. But the service still has some shared state that must be protected, such as the collection of discovered source modules, and the map used to query context-hash-specific underlying cache storage.
We would previously unconditionally treat the command line as GNU style arguments. However, Windows uses a different command-line style, and this would incorrectly process the arguments, potentially corrupting paths which do not quote the path separator. Ideally, we would introduce a new api (`swiftscan_compiler_target_info_query_v3`?) that takes a quoting style (matching `--rsp-quoting`) which would allow us to support both quoting styles properly.
This new version takes the path to the compiler executable as a parameter, in order for libSwiftScan to compute compiler-relative portions of runtimeLibraryPaths, runtimeResourcePath. V1, without knowing the path to the compiler executable, produced incomplete sets of these paths.
Adopts Clang's 'DependencyScanningWorkerFilesystem' for use by the scanner, with the persistent
scanner instance keeping a 'DependencyScanningFilesystemSharedCache'.
Introduces a concept of a dependency scanning action context hash, which is used to select an instance of a global dependency scanning cache which gets re-used across dependency scanning actions.
This change tweaks the 'GlobalModuleDependenciesCache', which persists across scanner invocations with the same 'DependencyScanningTool' to no longer cache discovered Clang modules.
Doing so felt like a premature optimization, and we should instead attempt to share as much state as possible by keeping around the actual Clang scanner's state, which performs its own caching. Caching discovered dependencies both in the Clang scanner instance, and in our own cache is much more error-prone - the Clang scanner has a richer context for what is okay and not okay to cache/re-use.
Instead, we still cache discovered Clang dependencies *within* a given scan, since those are discovered using a common Clang scanner instance and should be safe to keep for the duration of the scan.
This change should make it simpler to pin down the core functionality and correctness of the scanner.
Once we turn our attention to the scanner's performance, we can revisit this strategy and optimize the caching behaviour.
When we are building a Swift module which has an underlying Clang module, and which generates an ObjC interface ('-Swift.h'), the mechanism for building the latter involves a VFS redirect of its modulemap to one that does not yet have the generated Swift code, because it must be built before the Swift portion is built because the Swift portion depends on it. This means that the invocation to build this module is different to one used by the clients which depend on this module.
To avoid the subsequent client scans from re-using the partial (VFS-redirected) module, ensure that we do not store dependency info of the underlying Clang module into the global scanner cache. This will cause subsequent client scans to re-scan for this module, and find the fully-resolved modulemap without a VFS redirect.
Resolves rdar://88309064
This does not seem to serve a purpose other than corrupting arguments with whitespaces - they get merged into one large string where the whitespace boundary between arguments and whitespaces within arguments are blurred.
Part of rdar://98985453