Teach the Swift dependency scanner to use CAS to capture the full set of dependencies for a build and to construct build commands with immutable inputs from CAS.
This allows swift compilation caching using CAS.
Instead of being a part of 'directDependencies' on a module dependency info, make them a separate array of dependency IDs for Swift Source and Textual modules.
This will allow clients to still distinguish direct module dependencies imported from a given module, versus dependencies added because direct/transitive Clang module dependencies have Swift overlays.
This change does *not* remove overlay dependencies from 'directDependencies' yet, just adds them as a separate field on the module details info. A followup change will remove overlay and bridging header dependencies from 'directDependencies' once the clients have had a chance to adopt this change.
For a `@testable` import in program source, if a Swift interface dependency is discovered, and has an adjacent binary `.swiftmodule`, open up the module, and pull in its optional dependencies. If an optional dependency cannot be resolved on the filesystem, fail silently without raising a diagnostic.
Use a virtual output backend to capture all the outputs from a
swift-frontend invocation. This allows redirecting and/or mirroring
compiler outputs to multiple locations using different OutputBackends.
As an example usage for the virtual outputs, teach the Swift compiler to
check its output determinism by running the compiler invocation
twice and comparing the hashes of all its outputs.
Virtual output will be used to enable caching in the future.
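A minimal sketch of the determinism check described above, assuming the captured outputs can be modeled as a map from output path to contents. `CapturedOutputs`, `hashOutputs` and `isDeterministic` are hypothetical names, and `std::hash` stands in for whatever hashing the compiler actually uses.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>

// Hypothetical stand-in for the outputs captured by a virtual output
// backend: output path -> file contents.
using CapturedOutputs = std::map<std::string, std::string>;

// Combine the hashes of every (path, contents) pair into one digest.
// std::hash is used only for illustration; it is neither stable across
// processes nor cryptographic.
std::size_t hashOutputs(const CapturedOutputs &outputs) {
  std::size_t digest = 0;
  for (const auto &entry : outputs) {
    std::size_t h = std::hash<std::string>{}(entry.first) ^
                    (std::hash<std::string>{}(entry.second) << 1);
    digest = digest * 31 + h;
  }
  return digest;
}

// Run the same invocation twice and report whether both runs produced
// outputs with the same digest.
bool isDeterministic(const std::function<CapturedOutputs()> &runInvocation) {
  std::size_t first = hashOutputs(runInvocation());
  std::size_t second = hashOutputs(runInvocation());
  return first == second;
}
```

Because every output goes through the same capture point, the check does not need to know which files a given invocation writes.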
For example, suppose depending on both `Foo` and `Bar` causes a cross-import overlay `_Foo_Bar` to be added. Do not add this overlay when scanning the source module `Foo` itself, which could otherwise happen if `Foo` depends on `Bar` in its own dependency graph.
Add them to the set of direct dependencies of the Swift module the bridging header belongs to, thereby also ensuring that their module info will be contained in the output graph.
Part of rdar://105742859
- '-o <output_path>'
- '-disable-implicit-swift-modules'
- '-Xcc -fno-implicit-modules' and '-Xcc -fno-implicit-module-maps'
- '-candidate-module-file'
These were previously supplied by the driver. Now the build commands in the dependency scanner's output already include them and are ready to be run directly.
Do this by computing a transitive closure on the computed dependency graph, relying on the fact that it is a DAG.
The algorithm used is:
```
for each v ∈ V {
  T(v) = { v }
}
for v ∈ V in reverse topological order {
  for each (v, w) ∈ E {
    T(v) = T(v) ∪ T(w)
  }
}
```
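The pseudocode above could be realized roughly as follows. This is a sketch, not the scanner's code: the graph is assumed to be an adjacency map keyed by module name, and `Graph`, `Closure`, `visit` and `transitiveClosure` are hypothetical names. A depth-first post-order supplies the reverse topological order.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

using Graph = std::map<std::string, std::vector<std::string>>;
using Closure = std::map<std::string, std::set<std::string>>;

// Depth-first post-order visit: a node is appended only after all of its
// successors, so the resulting order is a reverse topological order of a DAG.
static void visit(const Graph &g, const std::string &v,
                  std::set<std::string> &seen,
                  std::vector<std::string> &order) {
  if (!seen.insert(v).second)
    return;
  auto it = g.find(v);
  if (it != g.end())
    for (const auto &w : it->second)
      visit(g, w, seen, order);
  order.push_back(v);
}

// Assumes g is acyclic; every successor is finished before its predecessor.
Closure transitiveClosure(const Graph &g) {
  std::set<std::string> seen;
  std::vector<std::string> order;
  for (const auto &entry : g)
    visit(g, entry.first, seen, order);

  Closure T;
  for (const auto &v : order) {
    T[v].insert(v); // T(v) = { v }
    auto it = g.find(v);
    if (it == g.end())
      continue;
    for (const auto &w : it->second) {
      const std::set<std::string> &tw = T.at(w); // already computed
      T[v].insert(tw.begin(), tw.end());         // T(v) = T(v) ∪ T(w)
    }
  }
  return T;
}
```

Each edge is processed once, and each step merges an already-complete set, which is why the DAG property matters.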
Otherwise the scanning action will not look for them as dependencies, and the compilation it is used to inform will not specify these modules as explicit inputs.
Resolves rdar://104761392
This changes the scanner's behavior to "resolve" a discovered module's dependencies to a set of Module IDs: module name + module kind (swift textual, swift binary, clang, etc.).
The 'ModuleDependencyInfo' objects that are stored in the dependency scanner's cache now carry a set of kind-qualified ModuleIDs for their dependencies, in addition to unqualified imported module names of their dependencies.
Previously, the scanner's internal state would cache a module dependency as having its own set of dependencies which were stored as names of imported modules. This led to a design where any time we needed to process the dependency downstream from its discovery (e.g. cycle detection, graph construction), we had to query the ASTContext to resolve this dependency's imports, which shouldn't be necessary. Now, upon discovery, we "resolve" a discovered dependency by executing a lookup for each of its imported module names (this operation happens regardless of this patch) and store a fully-resolved set of dependencies in the dependency module info.
Moreover, looking up a given module dependency by name (via `ASTContext`'s `getModuleDependencies`) would result in iterating over the scanner's module "loaders" and querying each for the module name. Each loader would then check the scanner's cache for a corresponding discovered module, and if no such module was found the "loader" would search the filesystem.
This meant that in practice, we searched the filesystem on many occasions where we actually had cached the required dependency, as follows:
Suppose we had previously discovered a Clang module "foo" and cached its dependency info.
-> ASTContext.getModuleDependencies("foo")
--> (1) Swift Module "Loader" checks caches for a Swift module "foo" and doesn't find one, so it searches the filesystem for "foo" and fails to find one.
--> (2) Clang Module "Loader" checks caches for a Clang module "foo", finds one and returns it to the client.
This means that we were always searching the filesystem in (1) even if we knew that to be futile.
With this change, queries to `ASTContext`'s `getModuleDependencies` will always check all the caches first, and only delegate to the scanner "loaders" if no cached dependency is found. The loaders are then no longer in the business of checking the cached contents.
To handle cases in the scanner where we must only look up either a Swift-only module or a Clang-only module, this patch splits 'getModuleDependencies' into an already-existing 'getSwiftModuleDependencies' and a newly-added 'getClangModuleDependencies'.
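The cache-first lookup can be illustrated with a small model. Everything here is a hypothetical stand-in, not the real `ASTContext` or scanner types: `Scanner`, `Loader`, `ModuleInfo` and the `filesystemSearches` counter exist only to show that loaders are consulted on a cache miss and never before.

```cpp
#include <functional>
#include <initializer_list>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

enum class ModuleKind { SwiftTextual, SwiftBinary, Clang };

// A kind-qualified module ID: module name + module kind.
using ModuleID = std::pair<std::string, ModuleKind>;

struct ModuleInfo {
  ModuleID id;
  std::vector<ModuleID> resolvedDependencies;
};

// Hypothetical loader: searches the filesystem for a module by name.
using Loader = std::function<std::optional<ModuleInfo>(const std::string &)>;

struct Scanner {
  std::map<ModuleID, ModuleInfo> cache;
  std::vector<Loader> loaders;
  int filesystemSearches = 0; // counts loader queries, for illustration

  std::optional<ModuleInfo> getModuleDependencies(const std::string &name) {
    // 1. Check the cache for every module kind first.
    for (ModuleKind kind : {ModuleKind::SwiftTextual, ModuleKind::SwiftBinary,
                            ModuleKind::Clang}) {
      auto it = cache.find({name, kind});
      if (it != cache.end())
        return it->second;
    }
    // 2. Only on a cache miss, delegate to the loaders.
    for (const auto &loader : loaders) {
      ++filesystemSearches;
      if (auto found = loader(name)) {
        cache.emplace(found->id, *found);
        return found;
      }
    }
    return std::nullopt;
  }
};
```

With a cached Clang module "foo", a lookup of "foo" returns without any loader being queried, which is the futile search in step (1) that this change removes.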
Adopts Clang's 'DependencyScanningWorkerFilesystem' for use by the scanner, with the persistent
scanner instance keeping a 'DependencyScanningFilesystemSharedCache'.
Introduces a concept of a dependency scanning action context hash, which is used to select an instance of a global dependency scanning cache which gets re-used across dependency scanning actions.
`getValue` -> `value`
`getValueOr` -> `value_or`
`hasValue` -> `has_value`
`map` -> `transform`
The old API will be deprecated in the rebranch.
To avoid merge conflicts, use the new API already in the main branch.
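For illustration, the renamed accessors read as follows. This sketch uses `std::optional` as a stand-in because it spells these members the same way; the `interfacePath`, `describe` and `pathLength` helpers are hypothetical. `transform` is left out because `std::optional::transform` requires C++23.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper: returns an interface path when one is known.
std::optional<std::string> interfacePath(bool hasInterface) {
  if (hasInterface)
    return std::string("Foo.swiftinterface");
  return std::nullopt;
}

std::string describe(const std::optional<std::string> &path) {
  // Before: path.getValueOr("<none>")
  return "interface: " + path.value_or("<none>");
}

std::size_t pathLength(const std::optional<std::string> &path) {
  // Before: path.hasValue() ? path.getValue().size() : 0
  return path.has_value() ? path.value().size() : 0;
}
```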
rdar://102362022
This change tweaks the 'GlobalModuleDependenciesCache', which persists across scanner invocations with the same 'DependencyScanningTool' to no longer cache discovered Clang modules.
Doing so felt like a premature optimization, and we should instead attempt to share as much state as possible by keeping around the actual Clang scanner's state, which performs its own caching. Caching discovered dependencies both in the Clang scanner instance, and in our own cache is much more error-prone - the Clang scanner has a richer context for what is okay and not okay to cache/re-use.
Instead, we still cache discovered Clang dependencies *within* a given scan, since those are discovered using a common Clang scanner instance and should be safe to keep for the duration of the scan.
This change should make it simpler to pin down the core functionality and correctness of the scanner.
Once we turn our attention to the scanner's performance, we can revisit this strategy and optimize the caching behaviour.
When we are building a Swift module which has an underlying Clang module, and which generates an ObjC interface ('-Swift.h'), the mechanism for building the latter involves a VFS redirect of its modulemap to one that does not yet have the generated Swift code. This is because it must be built before the Swift portion, which depends on it. As a result, the invocation to build this module is different from the one used by the clients which depend on this module.
To avoid the subsequent client scans from re-using the partial (VFS-redirected) module, ensure that we do not store dependency info of the underlying Clang module into the global scanner cache. This will cause subsequent client scans to re-scan for this module, and find the fully-resolved modulemap without a VFS redirect.
Resolves rdar://88309064
This separates it from `libSwiftScan` and allows us to build this library without building much of the rest of the compiler.
Also refactor `utils/build-parser-lib` into `utils/build-tooling-libs` which builds both SwiftSyntaxParser and SwiftStaticMirror.
Windows uses `\` as a path separator, which is not permitted within
a JSON string without escaping. This corrects the encoding of the path
separator in the emitted dependency information. This issue was found
through the swift-driver test suite.
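A sketch of the kind of escaping involved, assuming a standalone helper (the name `escapeForJSON` is hypothetical, and this is not the scanner's actual serialization code). It handles only the characters most relevant to paths; other control characters would need `\uXXXX` escapes in a complete encoder.

```cpp
#include <string>

// Escape characters that may not appear raw inside a JSON string.
// Only the cases relevant to this sketch are handled.
std::string escapeForJSON(const std::string &text) {
  std::string out;
  out.reserve(text.size());
  for (char c : text) {
    switch (c) {
    case '\\': out += "\\\\"; break; // Windows path separator
    case '"':  out += "\\\""; break;
    case '\n': out += "\\n";  break;
    case '\t': out += "\\t";  break;
    default:   out += c;      break;
    }
  }
  return out;
}
```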
Instead of checking that the stdlib can be loaded in a variety of places, check it when setting up the compiler instance. This required a couple more checks to avoid loading the stdlib in cases where it’s not needed.
To be able to differentiate stdlib loading failures from other setup errors, make `CompilerInstance::setup` return an error message on failure via an inout parameter. Consume that error on the call side, replacing a previous, more generic error message, adding error handling where appropriate or ignoring the error message, depending on the context.
llvm-project `ErrorHandling.h` was updated to remove std::string. This
added a new `report_fatal_error` overload taking a `const Twine &`,
removed the overload that took `const std::string &`, and updated
`fatal_error_handler_t` to use `const char *` rather than `const
std::string &`.
Fix uses of these functions to take into account these updates. Note
that without the `const std::string &` overload, passing a `std::string`
into `report_fatal_error` now results in an ambiguous match between the
`StringRef` and `Twine` overloads so we need to be explicit about one or
the other.
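The ambiguity can be reproduced with two small stand-in types. `MiniStringRef`, `MiniTwine`, `reportFatal` and `fail` are hypothetical and only model the relevant property of `StringRef` and `Twine`: both convert implicitly from `std::string`. The stand-in functions return a tag instead of aborting so the chosen overload is observable.

```cpp
#include <string>

// Minimal stand-ins: both are implicitly constructible from std::string,
// which is what makes an unqualified call ambiguous.
struct MiniStringRef {
  std::string text;
  MiniStringRef(const std::string &s) : text(s) {}
};
struct MiniTwine {
  std::string text;
  MiniTwine(const std::string &s) : text(s) {}
};

// Stand-ins for the two remaining overloads.
std::string reportFatal(MiniStringRef reason) {
  return "StringRef: " + reason.text;
}
std::string reportFatal(const MiniTwine &reason) {
  return "Twine: " + reason.text;
}

std::string fail(const std::string &message) {
  // reportFatal(message);  // would not compile: ambiguous overload
  return reportFatal(MiniStringRef(message)); // explicit choice
}
```

Constructing `MiniTwine(message)` at the call site would select the other overload in the same way.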
Doing so will allow clients to know which Swift-specific PCM arguments are already captured from the scan that first discovered this module.
SwiftDriver, in particular, will be able to use this information to avoid re-scanning a given Clang module if the initial scan was sufficient for all possible sets of PCM arguments on Swift modules that depend on said Clang module.
And only resolve cached dependencies that came from scanning actions with the same target triple.
This change means that the `GlobalModuleDependenciesCache` must be configured with a specific target triple for every scanning action, and it will only resolve previously-found dependencies from previous scanning actions using the exact same triple.
Furthermore, the `GlobalModuleDependenciesCache` separately tracks source-file-based module dependencies as those represent main Swift modules of previous scanning actions, and we must be able to resolve those regardless of the target triple.
Resolves rdar://83105455
These kinds of modules differ from `SwiftTextual` modules in that they do not have an interface and have source-files.
It is cleaner to enforce this distinction with types, instead of checking for interface optionality everywhere.
This change causes the cache to be layered with a local "cache" that wraps the global cache, which will serve as the source of truth. The local cache persists only for the duration of a given scanning action, and has a store of references to dependencies resolved as a part of the current scanning action only, while the global cache is the one that persists across scanning actions (e.g. in `DependencyScanningTool`) and stores actual module dependency info values.
Only the local cache can answer dependency lookup queries. It checks the current scanning action's results first, before falling back to querying the global cache. Global queries are disambiguated by the current scanning action's search paths, ensuring we never resolve a dependency lookup query with a module info that could not be found in the current action's search paths.
This change is required because search-path disambiguation can lead to false-negatives: for example, the Clang dependency scanner may find modules relative to the compiler's path that are not on the compiler's direct search paths. While such false-negative query responses should be functionally safe, we rely on the current scanning action's results being always-present-in-the-cache for the scanner's functionality. This layering ensures that the cache use-sites remain unchanged and that we get both: preserved global state which can be queried disambiguated with the search path details, and an always-consistent local (current action) cache state.
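The layering might look roughly like this. It is a simplified model under stated assumptions: modules are keyed by name only, each entry records a single directory it was found in, and `GlobalCache`, `LocalCache`, `record` and `find` are hypothetical names rather than the scanner's real interfaces.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>

struct ModuleInfo {
  std::string name;
  std::string foundInDirectory; // where the module was discovered
};

// Persists across scanning actions and owns the module info values.
struct GlobalCache {
  std::map<std::string, ModuleInfo> modules;
};

// Lives for one scanning action and wraps the global cache.
class LocalCache {
  GlobalCache &global;
  std::set<std::string> searchPaths;
  // References to entries recorded during the current action only.
  std::map<std::string, const ModuleInfo *> current;

public:
  LocalCache(GlobalCache &g, std::set<std::string> paths)
      : global(g), searchPaths(std::move(paths)) {}

  // Store the value globally and remember it as a current-action result.
  void record(const ModuleInfo &info) {
    auto it = global.modules.insert_or_assign(info.name, info).first;
    current[info.name] = &it->second;
  }

  const ModuleInfo *find(const std::string &name) const {
    // Results of the current action are always visible.
    auto local = current.find(name);
    if (local != current.end())
      return local->second;
    // Global entries are visible only if found on the search paths.
    auto it = global.modules.find(name);
    if (it != global.modules.end() &&
        searchPaths.count(it->second.foundInDirectory) != 0)
      return &it->second;
    return nullptr;
  }
};
```

A module recorded in the current action stays visible even when its location is not on the search paths, which covers the false-negative case, while another action with different search paths does not see it.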
The dependency scanner's cache persists across different queries and answering a subsequent query's module lookup with a module not in the query's search path is not correct.
For example, suppose we are looking for a Swift module `Foo` with a set of search paths `SP`.
Suppose also that the dependency scanner cache already contains a module `Foo`, for which we found an interface file at location `L`. If `L`∉`SP`, then we cannot re-use the cached entry because we’d be resolving the scanning query to a filesystem location that the current scanning context is not aware of.
Resolves rdar://81175942