Teach the Swift dependency scanner to use CAS to capture the full dependencies for a build and construct build commands with immutable inputs from CAS.
This allows swift compilation caching using CAS.
Instead of being a part of 'directDependencies' on a module dependency info, make them a separate array of dependency IDs for Swift Source and Textual modules.
This will allow clients to still distinguish direct module dependencies imported from a given module, versus dependencies added because direct/transitive Clang module dependencies have Swift overlays.
This change does *not* remove overlay dependencies from 'directDependencies' yet, just adds them as a separate field on the module details info. A followup change will remove overlay and bridging header dependencies from 'directDependencies' once the clients have had a chance to adapt to this change.
Other parts of the scanner lib (e.g. target-info query) already do this. We must always make sure to process the incoming command-line strings and run them through 'llvm::cl::TokenizeGNUCommandLine' in order to process escaped paths.
Part of rdar://106712169
For a `@testable` import in program source, if a Swift interface dependency is discovered, and has an adjacent binary `.swiftmodule`, open up the module, and pull in its optional dependencies. If an optional dependency cannot be resolved on the filesystem, fail silently without raising a diagnostic.
Use a virtual output backend to capture all the outputs from a
swift-frontend invocation. This allows redirecting and/or mirroring
compiler outputs to multiple locations using different OutputBackends.
As an example use of virtual outputs, teach the Swift compiler to
check its output determinism by running the compiler invocation
twice and comparing the hashes of all its outputs.
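The determinism check described above can be sketched roughly as follows. This is a minimal Python illustration, not the compiler's implementation: `InMemoryOutputBackend`, `outputs_are_deterministic`, and the `compile_fn` callback are hypothetical names standing in for a virtual output backend and a frontend invocation.

```python
import hashlib
from typing import Callable, Dict


class InMemoryOutputBackend:
    """Hypothetical stand-in for a virtual output backend: keeps outputs in memory."""

    def __init__(self) -> None:
        self.outputs: Dict[str, bytes] = {}

    def write(self, path: str, data: bytes) -> None:
        self.outputs[path] = data

    def digest(self) -> str:
        # Hash every output path and its contents in a stable (sorted) order.
        h = hashlib.sha256()
        for path in sorted(self.outputs):
            h.update(path.encode("utf-8"))
            h.update(b"\0")
            h.update(hashlib.sha256(self.outputs[path]).digest())
        return h.hexdigest()


def outputs_are_deterministic(
    compile_fn: Callable[[InMemoryOutputBackend], None]
) -> bool:
    """Run the same invocation twice and compare the hashes of all captured outputs."""
    first, second = InMemoryOutputBackend(), InMemoryOutputBackend()
    compile_fn(first)
    compile_fn(second)
    return first.digest() == second.digest()
```

Because all outputs go through the backend, nothing needs to be written to disk in order to compare the two runs.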
Virtual output will be used to enable caching in the future.
Instead, treat them like any other module that is specific to the scanning context hash of the scan it originates from.
Otherwise we may actually have simultaneous scans happening for the same source module but with different context hashes, and the current scheme leads to collisions.
For example, when scanning a source module `Foo`, which, when depending on module `Bar`, causes a cross-import overlay `_Foo_Bar` to be added, do not add this cross-import overlay when scanning `Foo` itself. This applies, for instance, when `Foo` has a dependency on `Bar` in its own dependency graph.
Use mutual exclusion to ensure that multiple threads executing dependency scans do not encounter data races on shared mutable state.
There are two layers with shared state where we need to be careful:
- `DependencyScanningTool`, as the main entity that scanning clients interact with. This tool instantiates compiler instances for individual scans, computing a scanning invocation hash. It needs to remember those instances for future use, and when creating instances it needs to reset the LLVM argument processor's global state, meaning all uses of argument processing must be in a critical section.
- `SwiftDependencyScanningService`, as the main cache where dependency scanning results are stored. Each individual scan instantiates a `ModuleDependenciesCache`, which uses the scanning service as the underlying storage. The service's storage is segmented by context hash: dependencies discovered in a scan with a given context hash are stored together, so two scanning invocations with different context hashes running at the same time access different locations in its storage and do not require synchronization there. But the service still has some shared state that must be protected, such as the collection of discovered source modules, and the map used to query context-hash-specific underlying cache storage.
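As a rough illustration of the second layer, the following Python sketch guards the shared pieces of state with a lock while leaving the per-context storage itself unsynchronized. `ScanningServiceSketch` and its method names are hypothetical and only mirror the description above; the actual service is C++ and differs in detail.

```python
import threading
from typing import Dict, Set


class ScanningServiceSketch:
    """Hypothetical model of a scanning service with lock-protected shared state."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        # Shared across all scans: must only be touched while holding the lock.
        self._source_modules: Set[str] = set()
        self._caches_by_context: Dict[str, Dict[str, dict]] = {}

    def cache_for_context(self, context_hash: str) -> Dict[str, dict]:
        # The map from context hash to storage is shared, so the lookup
        # (and possible insertion) happens inside the critical section.
        with self._lock:
            return self._caches_by_context.setdefault(context_hash, {})

    def record_source_module(self, name: str) -> None:
        with self._lock:
            self._source_modules.add(name)

    def source_modules(self) -> Set[str]:
        with self._lock:
            return set(self._source_modules)
```

The sketch assumes that scans running concurrently use different context hashes, as described above, so each scan works on its own returned dictionary.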
Add them to the set of direct dependencies of the Swift module the bridging header belongs to, thereby also ensuring that their module info will be contained in the output graph.
Part of rdar://105742859
- '-o <output_path>'
- '-disable-implicit-swift-modules'
- '-Xcc -fno-implicit-modules' and '-Xcc -fno-implicit-module-maps'
- '-candidate-module-file'
These were previously supplied by the driver. Instead, the resulting command lines will now be ready to be run directly from the dependency scanner's output.
Do this by computing a transitive closure on the computed dependency graph, relying on the fact that it is a DAG.
The algorithm used is:
```
for each v ∈ V {
    T(v) = { v }
}
for v ∈ V in reverse topological order {
    for each (v, w) ∈ E {
        T(v) = T(v) ∪ T(w)
    }
}
```
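For reference, a small runnable Python version of the pseudocode above. The function name and the adjacency-list input format are assumptions made for this sketch; the reverse topological order is obtained from a DFS post-order, which visits every node after all of its successors when the graph is a DAG.

```python
from typing import Dict, List, Set


def transitive_closure(edges: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Compute T(v), the set of nodes reachable from v (including v), for a DAG."""
    # Collect every vertex, including ones that only appear as edge targets.
    vertices: Set[str] = set(edges)
    for targets in edges.values():
        vertices.update(targets)

    # DFS post-order: each node is appended after all of its successors,
    # which is a reverse topological order for a DAG.
    order: List[str] = []
    visited: Set[str] = set()

    def visit(v: str) -> None:
        if v in visited:
            return
        visited.add(v)
        for w in edges.get(v, []):
            visit(w)
        order.append(v)

    for v in sorted(vertices):
        visit(v)

    # T(v) = { v }, then T(v) = T(v) ∪ T(w) for each edge (v, w).
    closure: Dict[str, Set[str]] = {v: {v} for v in vertices}
    for v in order:
        for w in edges.get(v, []):
            closure[v] |= closure[w]
    return closure
```

Since each `T(w)` is complete by the time `v` is processed, a single pass over the edges is sufficient.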
We would previously unconditionally treat the command line as GNU style arguments. However, Windows uses a different command-line style, and this would incorrectly process the arguments, potentially corrupting paths which do not quote the path separator. Ideally, we would introduce a new API (`swiftscan_compiler_target_info_query_v3`?) that takes a quoting style (matching `--rsp-quoting`) which would allow us to support both quoting styles properly.
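The kind of corruption described here can be reproduced with any POSIX-style tokenizer. The sketch below uses Python's `shlex` purely as an analogue of GNU-style tokenization; it is not the LLVM tokenizer, and the paths are made up for illustration.

```python
import shlex

# A Windows-style command line whose path separators are not escaped.
unescaped = r"-I C:\Users\dev\include -o C:\build\out.json"

# POSIX-style tokenization treats each unquoted backslash as an escape
# character, so the separators are silently dropped from the paths.
corrupted = shlex.split(unescaped, posix=True)

# Doubling each separator survives the same tokenization intact.
escaped = r"-I C:\\Users\\dev\\include"
preserved = shlex.split(escaped, posix=True)
```

Here `corrupted` contains `C:Usersdevinclude` and `C:buildout.json`, while `preserved` keeps `C:\Users\dev\include`.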
Otherwise the scanning action will not look for them as dependencies, and the compilation it is used to inform will not specify these modules as explicit inputs.
Resolves rdar://104761392
This new version takes the path to the compiler executable as a parameter, in order for libSwiftScan to compute compiler-relative portions of runtimeLibraryPaths and runtimeResourcePath. V1, without knowing the path to the compiler executable, produced incomplete sets of these paths.
This changes the scanner's behavior to "resolve" a discovered module's dependencies to a set of Module IDs: module name + module kind (swift textual, swift binary, clang, etc.).
The 'ModuleDependencyInfo' objects that are stored in the dependency scanner's cache now carry a set of kind-qualified ModuleIDs for their dependencies, in addition to unqualified imported module names of their dependencies.
Previously, the scanner's internal state would cache a module dependency as having its own set of dependencies which were stored as names of imported modules. This led to a design where any time we needed to process the dependency downstream from its discovery (e.g. cycle detection, graph construction), we had to query the ASTContext to resolve this dependency's imports, which shouldn't be necessary. Now, upon discovery, we "resolve" a discovered dependency by executing a lookup for each of its imported module names (this operation happens regardless of this patch) and store a fully-resolved set of dependencies in the dependency module info.
Moreover, looking up a given module dependency by name (via `ASTContext`'s `getModuleDependencies`) would result in iterating over the scanner's module "loaders" and querying each for the module name. The corresponding loaders would then check the scanner's cache for a respective discovered module, and if no such module is found the "loader" would search the filesystem.
This meant that in practice, we searched the filesystem on many occasions where we actually had cached the required dependency, as follows:
Suppose we had previously discovered a Clang module "foo" and cached its dependency info.
-> ASTContext.getModuleDependencies("foo")
--> (1) Swift Module "Loader" checks caches for a Swift module "foo" and doesn't find one, so it searches the filesystem for "foo" and fails to find one.
--> (2) Clang Module "Loader" checks caches for a Clang module "foo", finds one and returns it to the client.
This means that we were always searching the filesystem in (1) even if we knew that to be futile.
With this change, queries to `ASTContext`'s `getModuleDependencies` will always check all the caches first, and only delegate to the scanner "loaders" if no cached dependency is found. The loaders are then no longer in the business of checking the cached contents.
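The new lookup order can be sketched as follows. This is an illustrative Python model only: `LookupSketch`, its `(kind, search_fn)` loader pairs, and the `filesystem_searches` counter are hypothetical and exist just to make the difference in filesystem traffic visible.

```python
from typing import Callable, Dict, List, Optional, Tuple

ModuleID = Tuple[str, str]  # (module name, module kind)
SearchFn = Callable[[str], Optional[dict]]


class LookupSketch:
    """Hypothetical model: consult all caches first, then fall back to loaders."""

    def __init__(self, loaders: List[Tuple[str, SearchFn]]) -> None:
        self.loaders = loaders
        self.cache: Dict[ModuleID, dict] = {}
        self.filesystem_searches = 0

    def get_module_dependencies(self, name: str) -> Optional[dict]:
        # Step 1: check the cache for every module kind before any search.
        for kind, _ in self.loaders:
            info = self.cache.get((name, kind))
            if info is not None:
                return info
        # Step 2: only on a complete cache miss, let each loader search.
        for kind, search in self.loaders:
            self.filesystem_searches += 1
            info = search(name)
            if info is not None:
                self.cache[(name, kind)] = info
                return info
        return None
```

With a Swift loader that fails and a Clang loader that succeeds for "foo", the first query performs two searches and every later query for "foo" performs none, whereas the old scheme repeated the futile Swift search each time.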
To handle cases in the scanner where we must only look up either a Swift-only module or a Clang-only module, this patch splits 'getModuleDependencies' into an already-existing 'getSwiftModuleDependencies' and a newly-added 'getClangModuleDependencies'.
Adopts Clang's 'DependencyScanningWorkerFilesystem' for use by the scanner, with the persistent
scanner instance keeping a 'DependencyScanningFilesystemSharedCache'.
Introduces a concept of a dependency scanning action context hash, which is used to select an instance of a global dependency scanning cache which gets re-used across dependency scanning actions.
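One way to picture the context hash is as a digest of whatever distinguishes one scanning action from another, used as a key into a table of caches. The sketch below assumes, for illustration only, that the hash is derived from the invocation's arguments; `scanning_context_hash` and `GlobalCacheRegistry` are hypothetical names, and the real inputs to the hash may differ.

```python
import hashlib
from typing import Dict, List


def scanning_context_hash(invocation_args: List[str]) -> str:
    """Digest of the (assumed) inputs that distinguish one scanning action."""
    h = hashlib.sha256()
    for arg in invocation_args:
        h.update(arg.encode("utf-8"))
        h.update(b"\0")  # separator, so ["ab", "c"] and ["a", "bc"] differ
    return h.hexdigest()


class GlobalCacheRegistry:
    """Hypothetical table of caches, one per scanning context hash."""

    def __init__(self) -> None:
        self._caches: Dict[str, Dict[str, dict]] = {}

    def cache_for(self, invocation_args: List[str]) -> Dict[str, dict]:
        key = scanning_context_hash(invocation_args)
        return self._caches.setdefault(key, {})
```

Scanning actions with identical arguments get the same cache back and can re-use earlier results; any difference in arguments selects a separate cache.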
`getValue` -> `value`
`getValueOr` -> `value_or`
`hasValue` -> `has_value`
`map` -> `transform`
The old API will be deprecated in the rebranch.
To avoid merge conflicts, use the new API already in the main branch.
rdar://102362022
This change tweaks the 'GlobalModuleDependenciesCache', which persists across scanner invocations with the same 'DependencyScanningTool', to no longer cache discovered Clang modules.
Doing so felt like a premature optimization, and we should instead attempt to share as much state as possible by keeping around the actual Clang scanner's state, which performs its own caching. Caching discovered dependencies both in the Clang scanner instance, and in our own cache is much more error-prone - the Clang scanner has a richer context for what is okay and not okay to cache/re-use.
Instead, we still cache discovered Clang dependencies *within* a given scan, since those are discovered using a common Clang scanner instance and should be safe to keep for the duration of the scan.
This change should make it simpler to pin down the core functionality and correctness of the scanner.
Once we turn our attention to the scanner's performance, we can revisit this strategy and optimize the caching behaviour.
When we are building a Swift module which has an underlying Clang module, and which generates an ObjC interface ('-Swift.h'), the mechanism for building the underlying Clang module involves a VFS redirect of its modulemap to one that does not yet have the generated Swift code. The Clang module must be built before the Swift portion, because the Swift portion depends on it. This means that the invocation to build this module is different from the one used by the clients which depend on this module.
To avoid the subsequent client scans from re-using the partial (VFS-redirected) module, ensure that we do not store dependency info of the underlying Clang module into the global scanner cache. This will cause subsequent client scans to re-scan for this module, and find the fully-resolved modulemap without a VFS redirect.
Resolves rdar://88309064
This does not seem to serve a purpose other than corrupting arguments that contain whitespace: they get merged into one large string in which the boundaries between arguments can no longer be distinguished from whitespace within arguments.
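A minimal Python illustration of the problem, using a made-up argument list: once the arguments are joined into one string, splitting on whitespace cannot recover the original boundaries.

```python
from typing import List


def merge_then_split(args: List[str]) -> List[str]:
    """Join arguments into one string, then split on whitespace again."""
    return " ".join(args).split()


original = ["-I", "/path/with space/include", "-module-name", "Foo"]
round_tripped = merge_then_split(original)
# The single path argument has been broken into two separate arguments.
```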
Part of rdar://98985453
This separates it from `libSwiftScan` and allows us to build this library without building much of the rest of the compiler.
Also refactor `utils/build-parser-lib` into `utils/build-tooling-libs` which builds both SwiftSyntaxParser and SwiftStaticMirror.
Windows uses `\` as a path separator, which is not permitted within
a JSON string without escaping. This corrects the encoding of the path
separator in the emitted dependency information. This issue was found
through the swift-driver test suite.
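The effect is easy to reproduce with any JSON parser. In the Python sketch below, the path and the use of a `modulePath` key are illustrative; pasting the path into a JSON string verbatim yields invalid JSON, while a proper encoder doubles each backslash.

```python
import json

path = r"C:\Users\dev\Foo.swiftmodule"


def is_valid_json(text: str) -> bool:
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Naive emission: the raw backslashes form invalid escapes such as \U.
naive = '{"modulePath": "' + path + '"}'

# Correct emission: the encoder writes each backslash as \\.
escaped = json.dumps({"modulePath": path})
```

Parsing `escaped` returns the original path unchanged, whereas `naive` is rejected by the parser.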
Instead of checking that the stdlib can be loaded in a variety of places, check it when setting up the compiler instance. This required a couple more checks to avoid loading the stdlib in cases where it’s not needed.
To be able to differentiate stdlib loading failures from other setup errors, make `CompilerInstance::setup` return an error message on failure via an inout parameter. Consume that error at the call site, replacing a previous, more generic error message, adding error handling where appropriate or ignoring the error message, depending on the context.