Otherwise, when multiple workers encounter a diagnostic simultaneously we can encounter races which lead to corrupted diagnostic data or crashes
Resolves rdar://159598539
Since we enabled parallel dependency scanning by-default, each individual scan needs a diagnostic consumer that is safe to use across many threads. Deprecate the 'Locking' sub-class, making its behavior the default in the base class.
For the main source module, provide info on which dependencies are directly imported into the user program, explicitly ('import' statement) or implicitly (e.g. stdlib). Thist list does not include Swift overlay dependencies, cross-import dependencies, bridging header dependencies.
With '-sdk-module-cache-path', Swift textual interfaces found in the SDK will be built into a separate SDK-specific module cache.
Clang modules are not yet affected by this change, pending addition of the required API.
Batch dependency scanning was added as a mechanism to support multiple compilation contexts within a single module dependency graph.
The Swift compiler and the Explicitly-built modules model has long since abandoned this approach and this code has long been stale. It is time to remove it and its associated C API.
Instead, each scan's 'ModuleDependenciesCache' will hold all of the data corresponding to discovered module dependencies.
The initial design presumed the possibility of sharing a global scanning cache amongs different scanner invocations, possibly even different concurrent scanner invocations.
This change also deprecates two libSwiftScan entry-points: 'swiftscan_scanner_cache_load' and 'swiftscan_scanner_cache_serialize'. They never ended up getting used, and since this code has been largely stale, we are confident they have not otherwise had users, and they do not fit with this design.
A follow-up change will re-introduce moduele dependency cache serialization on a per-query basis and bring the binary format up-to-date.
The scanning action does not have any need for handling `-llvm` options, since it will never perform any code-gen. LLVM option processing relies on global option parsing structures, and the scanner has needed to carefully attempt to synchronize access to them. This change guards the configuration of LLVM options to not happen at all for dependency scanning actions, and removes calls to `llvm::cl::ResetAllOptionOccurrences()` that were previously needed.
Resolves rdar://120754696
Teach scanner to respect the working directory set in the invocation
through scanner C API.
Also add test infrastructure to testing scanner from C API. Break up
DependencyScan lib into two so the swift-scan-test and remain small
without understanding swift AST.
rdar://127626011
This change modifies the dependency scanner to keep track of source locations of each encountered 'import' statement, in order to be able to emit diagnostics with source locations if an import failed to resolve.
- Keep track of each 'import' statement's source buffer, line number, and column number when adding it. The dependency scanner utilizes separate compilation instances, and therefore separate Source Managers for scanning `import` statements of user sources and textual interfaces of Swift dependencies. Since import resolution may happen in the main scanner compilation instance while the `import` itself was found by an interface-scanning sub-instance, we cannot simply hold on to the import's `SourceLoc`.
- Add libSwiftScan API for diagnostics to carry above source locations to clients.
Guard 'resetCache' and 'resetDiagnostics' as critical sections using 'DependencyScanningToolStateLock'. Otherwise there's a chance that one thread in the scanner is doing a reset on the diagnostic consumer, while some other thread is adding this diagnostic consumer to another scan instance which may also be populating said consumer at the time. Similarly, for resetting 'ScanningService' though it is much more unlikely to be reset while in-use by other scanning actions.
From being a scattered collection of 'static' methods in ScanDependencies.cpp
and member methods of ASTContext. This makes 'ScanDependencies.cpp' much easier
to read, and abstracts the actual scanning logic away to a place with common
state which will make it easier to reason about in the future.
Instead of the code querying the compiler's built-in Clang instance, refactor the
dependency scanner to explicitly keep track of module output path. It is still
set according to '-module-cache-path' as it has been prior to this change, but
now the scanner can use a different module cache for scanning PCMs, as specified
with '-clang-scanner-module-cache-path', without affecting module output path.
Resolves rdar://113222853
Teach swift dependency scanner to use CAS to capture the full dependencies for a build and construct build commands with immutable inputs from CAS.
This allows swift compilation caching using CAS.
Other parts of the scanner lib (e.g. target-info query) already do this. We must always make sure to process the incoming command-line strings and run them through 'llvm::cl::TokenizeGNUCommandLine' in order to process escaped paths.
Part of rdar://106712169
Using a virutal output backend to capture all the outputs from
swift-frontend invocation. This allows redirecting and/or mirroring
compiler outputs to multiple location using different OutputBackend.
As an example usage for the virtual outputs, teach swift compiler to
check its output determinism by running the compiler invocation
twice and compare the hash of all its outputs.
Virtual output will be used to enable caching in the future.
Using mutual exclusion, ensuring that multiple threads executing dependency scans do not encounter data races on shared mutable state.
There are two layers with shared state where we need to be careful:
- `DependencyScanningTool`, as the main entity that scanning clients interact with. This tool instantiates compiler instances for individual scans, computing a scanning invocation hash. It needs to remember those instances for future use, and when creating instances it needs to reset LLVM argument processor's global state, meaning all uses of argument processing must be in a critical section.
- `SwiftDependencyScanningService`, as the main cache where dependency scanning results are stored. Each individual scan instantiates a `ModuleDependenciesCache`, which uses the scanning service as the underlying storage. The services' storage is segmented to storing dependencies discovered in a scan with a given context hash, which means two different scanning invocations running at the same time will be accessing different locations in its storage, thus not requiring synchronization. But the service still has some shared state that must be protected, such as the collection of discovered source modules, and the map used to query context-hash-specific underlying cache storage.
We would previously unconditionally treat the command line as GNU style arguments. However, Windows uses a different command-line style, and this would incorrectly process the arguments, potentially corrupting paths which do not quote the path separator. Ideally, we would introduce a new api (`swiftscan_compiler_target_info_query_v3`?) that takes a quoting style (matching `--rsp-quoting`) which would allow us to support both quoting styles properly.
This new version takes the path to the compiler executable as a parameter, in order for libSwiftScan to compute compiler-relative portions of runtimeLibraryPaths, runtimeResourcePath. V1, without knowing the path to the compiler executable, produced incomplete sets of these paths.
Adopts Clang's 'DependencyScanningWorkerFilesystem' for use by the scanner, with the persistent
scanner instance keeping a 'DependencyScanningFilesystemSharedCache'.
Introduces a concept of a dependency scanning action context hash, which is used to select an instance of a global dependency scanning cache which gets re-used across dependency scanning actions.
This change tweaks the 'GlobalModuleDependenciesCache', which persists across scanner invocations with the same 'DependencyScanningTool' to no longer cache discovered Clang modules.
Doing so felt like a premature optimization, and we should instead attempt to share as much state as possible by keeping around the actual Clang scanner's state, which performs its own caching. Caching discovered dependencies both in the Clang scanner instance, and in our own cache is much more error-prone - the Clang scanner has a richer context for what is okay and not okay to cache/re-use.
Instead, we still cache discovered Clang dependencies *within* a given scan, since those are discovered using a common Clang scanner instance and should be safe to keep for the duration of the scan.
This change should make it simpler to pin down the core functionality and correctness of the scanner.
Once we turn our attention to the scanner's performance, we can revisit this strategy and optimize the caching behaviour.
When we are building a Swift module which has an underlying Clang module, and which generates an ObjC interface ('-Swift.h'), the mechanism for building the latter involves a VFS redirect of its modulemap to one that does not yet have the generated Swift code, because it must be built before the Swift portion is built because the Swift portion depends on it. This means that the invocation to build this module is different to one used by the clients which depend on this module.
To avoid the subsequent client scans from re-using the partial (VFS-redirected) module, ensure that we do not store dependency info of the underlying Clang module into the global scanner cache. This will cause subsequent client scans to re-scan for this module, and find the fully-resolved modulemap without a VFS redirect.
Resolves rdar://88309064
This does not seem to serve a purpose other than corrupting arguments with whitespaces - they get merged into one large string where the whitespace boundary between arguments and whitespaces within arguments are blurred.
Part of rdar://98985453
This separates it from `libSwiftScan` and allows us to build this library without building much of the rest of the compiler.
Also refactor `utils/build-parser-lib` into `utils/build-tooling-libs` which builds both SwiftSyntaxParser and SwiftStaticMirror.
Instead of checking that the stdlib can be loaded in a variety of places, check it when setting up the compiler instance. This required a couple more checks to avoid loading the stdlib in cases where it’s not needed.
To be able to differentiate stdlib loading failures from other setup errors, make `CompilerInstance::setup` return an error message on failure via an inout parameter. Consume that error on the call side, replacing a previous, more generic error message, adding error handling where appropriate or ignoring the error message, depending on the context.
And only resolve cached dependencies that came from scanning actions with the same target triple.
This change means that the `GlobalModuleDependenciesCache` must be configured with a specific target triple for every scannig action, and it will only resolve previously-found dependencies from previous scannig actions using the exact same triple.
Furthermore, the `GlobalModuleDependenciesCache` separately tracks source-file-based module dependencies as those represent main Swift modules of previous scanning actions, and we must be able to resolve those regardless of the target triple.
Resolves rdar://83105455
These kinds of modules differ from `SwiftTextual` modules in that they do not have an interface and have source-files.
It is cleaner to enforce this distinction with types, instead of checking for interface optionality everywhere.
This change causes the cache to be layered with a local "cache" that wraps the global cache, which will serve as the source of truth. The local cache persists only for the duration of a given scanning action, and has a store of references to dependencies resolved as a part of the current scanning action only, while the global cache is the one that persists across scanning actions (e.g. in `DependencyScanningTool`) and stores actual module dependency info values.
Only the local cache can answer dependency lookup queries, checking current scanning action results first, before falling back to querying the global cache, with queries disambiguated by the current scannning action's search paths, ensuring we never resolve a dependency lookup query with a module info that could not be found in the current action's search paths.
This change is required because search-path disambiguation can lead to false-negatives: for example, the Clang dependency scanner may find modules relative to the compiler's path that are not on the compiler's direct search paths. While such false-negative query responses should be functionally safe, we rely on the current scanning action's results being always-present-in-the-cache for the scanner's functionality. This layering ensures that the cache use-sites remain unchanged and that we get both: preserved global state which can be queried disambiguated with the search path details, and an always-consistent local (current action) cache state.