Improve swift dependency scanner by validating and selecting dependency
module into scanner. This provides benefits that:
* Build system does not need to schedule interface compilation task if
the candidate module is picked, it can just use the candidate module
directly.
* There is no need for forwarding module in the explicit module build.
Since the build system is coordinating the build, there is no need for
the forwarding module in the module cache to avoid duplicated work,
* This also correctly supports all the module loading modes in the
dependency scanner.
This is achieved by only adding validate and up-to-date binary module as
the candidate module for swift interface module dependency. This allows
caching build to construct the correct dependency in the CAS. If there
is a candidate module for the interface module, dependency scanner will
return a binary module dependency in the dependency graph.
The legacy behavior is mostly preserved with a hidden frontend flag
`-no-scanner-module-validation`, while the scanner output is mostly
interchangeable with new scanner behavior with `prefer-interface` module
loading mode except the candidate module will not be returned.
rdar://123711823
In case import resolution order somehow sometimes matters, it's prudent to process/resolve/locate implicitly-imported modules first.
Resolves rdar://113917657
dependencies
It is valuable for clients to be able to distinguish which dependencies of a
Swift module originated from 'import' statements, and which ones are implicit
dependency Swift overlays of imported Clang modules.
The code of `ScanDependencies.cpp` was creating invalid JSON since #66031
because in the case of having `extraPcmArgs` and `swiftOverlayDependencies`,
but not `bridgingHeader`, a comma will not be added at the end of
`extraPcmArgs`, creating an invalid JSON file. Additionally that same PR
added a trailing comma at the end of the `swiftOverlayDependencies`, which
valid JSON does not allow, but that bug was removed in #66366.
Both problems are, however, present in the 5.9 branch, because #66936
included #66031, but not #66366.
Besides fixing the problem in `ScanDependencies.cpp` I modified every test
that uses `--scan-dependencies` to pass the produced JSON through
Python's `json.tool` in order to validate proper JSON is produced. In
most cases I was able to pipe the output of the tool into `FileCheck`,
but in some cases the validation is done by itself because the checks
depend on the exact format generated by `--scan-dependencies`. In
a couple of tests I added a call to `FileCheck` that seemed to be
missing.
Without these changes, two tests seems to be generating invalid JSON in
my machine:
- `ScanDependencies/local_cache_consistency.swift` (which outputs `Expecting ',' delimiter: line 525 column 11 (char 22799)`)
- `ScanDependencies/placholder_overlay_deps.swift`
Do this by computing a transitive closure on the computed dependency graph, relying on the fact that it is a DAG.
The used algorithm is:
```
for each v ∈ V {
T(v) = { v }
}
for v ∈ V in reverse topological order {
for each (v, w) ∈ E {
T(v) = T(v) ∪ T(w)
}
}
```
This changes the scanner's behavior to "resolve" a discovered module's dependencies to a set of Module IDs: module name + module kind (swift textual, swift binary, clang, etc.).
The 'ModuleDependencyInfo' objects that are stored in the dependency scanner's cache now carry a set of kind-qualified ModuleIDs for their dependencies, in addition to unqualified imported module names of their dependencies.
Previously, the scanner's internal state would cache a module dependnecy as having its own set of dependencies which were stored as names of imported modules. This led to a design where any time we needed to process the dependency downstream from its discovery (e.g. cycle detection, graph construction), we had to query the ASTContext to resolve this dependency's imports, which shouldn't be necessary. Now, upon discovery, we "resolve" a discovered dependency by executing a lookup for each of its imported module names (this operation happens regardless of this patch) and store a fully-resolved set of dependencies in the dependency module info.
Moreover, looking up a given module dependency by name (via `ASTContext`'s `getModuleDependencies`) would result in iterating over the scanner's module "loaders" and querying each for the module name. The corresponding modules would then check the scanner's cache for a respective discovered module, and if no such module is found the "loader" would search the filesystem.
This meant that in practice, we searched the filesystem on many occasions where we actually had cached the required dependency, as follows:
Suppose we had previously discovered a Clang module "foo" and cached its dependency info.
-> ASTContext.getModuleDependencies("foo")
--> (1) Swift Module "Loader" checks caches for a Swift module "foo" and doesn't find one, so it searches the filesystem for "foo" and fails to find one.
--> (2) Clang Module "Loader" checks caches for a Clang module "foo", finds one and returns it to the client.
This means that we were always searching the filesystem in (1) even if we knew that to be futile.
With this change, queries to `ASTContext`'s `getModuleDependencies` will always check all the caches first, and only delegate to the scanner "loaders" if no cached dependency is found. The loaders are then no longer in the business of checking the cached contents.
To handle cases in the scanner where we must only lookup either a Swift-only module or a Clang-only module, this patch splits 'getModuleDependencies' into an alrady-existing 'getSwiftModuleDependencies' and a newly-added 'getClangModuleDependencies'.
We were previously forgetting about these.
On top of that, make sure we treat accumulated dependencies as a set, because adding clang overlay dependencies may duplicate existing already-added dependencies for some module.
This matches the behavior of the current client (`swift-driver`) and reduces ambiguity in how the nodes in the graph are to be treated. Swift dependencies with a textual interface, for example, must be built into a binary module by clients. Swift dependencies without a textual interface, with only a binary module, are to be used directly, without any up-to-date checks.
Note, this is distinct from Swift dependencies that have a textual interface, for which we also detect potential pre-build binary module candidates. Those are still reported in the `details` field of textual Swift dependencies as `prebuiltModuleCandidates`.
This is meant to address a problem that arises on incremental package builds:
A re-scan on an already-built package instead of a placeholder dependency produces a graph that contains a dependency consisting solely of the previously-built swiftmodule.
This is not the behaviour we would like. Instead, we should respect that this is a placeholder dependency and ensure that the dependency graph for that dependency itself captures the fact that a previously-built module exists using compiledModuleCandidates field of the dependency graph.
This ensures that when the dependency scanner is invoked with additional clang (`-Xcc`) options, the Clang scanner is correctly configured using these options.
In the fast dependency scanner, depending on whether a module intrface was found via the import search path or framework search path, encode into the dependency graph Swift module details, whether a given module is a framework.