Just as with the result cache, instead of a single DenseMap with
type-erased AnyRequest keys, we can use per-request maps for a
nice performance improvement.
Other than simplifying some code, the big improvement here is
that we 'freeze' the reference dependencies for a request down
to a simple vector. We only use a DenseSet to store dependencies
of active requests.
Now that the top-level source file is the only dependency source that
matters, the only case we need to handle is request evaluation entering
a primary file. Non-primaries have no corresponding swiftdeps file to
emit references into, so tracking anything that happens there just
wastes time and memory.
This is only possible now that cascading dependencies have been removed:
previously, unqualified lookups had to be charged to the files they
originated in. Now, we charge those lookups to the primary that
initiated the request.
-enable-experimental-private-intransitive-dependencies -> -enable-direct-intramodule-dependencies
-disable-experimental-private-intransitive-dependencies -> -disable-direct-intramodule-dependencies
While we're here, rename DependencyCollector::Mode's constants and clean
up the documentation.
For private dependencies to be completely correct, we must perform the name lookup unioning step when a cached request is replayed, not just when lookups are first performed. To reduce the overhead of this union operation, it is not necessary to walk the entire active request stack; it suffices to walk to the nearest cached request in the stack and union into that. When that request is popped, its replay step will itself union into the next cached request.
To see why, consider a request graph:
A* -> B -> C*
|
-> D*
where A, C, and D are cached.
If a caller were to force C and D, then force A independently, today we would *only* replay the names looked up by C and D the first time A was evaluated. That is, subsequent evaluations of A do not replay the correct set of names. If we were to perform the union step during replay as well, requests that force A would also see C and D’s lookups.
Without this, callers that force requests like the DeclChecker have to be wary of the way they force the interface type request so other files see the right name sets.
rdar://64008262
Split off the notion of "recording" dependencies from the notion of
"collecting" dependencies. This corrects an oversight in the previous
design where dependency replay and recording were not "free" in WMO,
even though we never track dependencies there. This architecture also
lays the groundwork for the removal of the referenced name trackers.
The algorithm builds upon the infrastructure for dependency sources and
sinks laid down during the cut over to request-based dependency tracking
in #30723.
The idea of the naive algorithm is this:
For a chain of requests A -> B* -> C -> D* -> ... -> L where L is a lookup
request and all starred requests are cached, once L writes into the
dependency collector, the active stack is walked and at each cache-point
the results of dependency collection are associated with the request
itself (in this example, B* and D* have all the names L found associated
with them). Subsequent evaluations of these cached requests (B* and D*
et al) will then *replay* the previous lookup results from L into the
active referenced name tracker. One complication arises when the
evaluation of a cached request involves multiple downstream name
lookups. More concretely, suppose we have the following request trace:
A* -> B -> L
|
-> C -> L
|
-> D -> L
|
-> ...
Then A* must see the union of the results of each L. If this reminds
anyone of a union-find, that is no accident! A persistent union-find
a la Conchon and Filliatre is probably in order to help bring down peak
heap usage...
Finish off private intransitive dependencies with an implementation of
dependency replay.
For the sake of illustration, imagine a chain of requests
A -> B -> C -> ...
Supposing each request is never cached, then every invocation of the
compiler with the same inputs will always kick off the exact same set of
requests. For the purposes of dependency tracking, that also means every
single lookup request will run without issue, and all dependencies will
be accurately reported. But we live in a world with cached requests.
Suppose request B* is cached. The first time we encounter that request,
its evaluation order looks identical:
A -> B* -> C -> ...
If we are in a mode that compiles single primaries, this is not
a problem because every request graph will look like this.
But if we are in a mode where we are compiling multiple primaries, then
subsequent request graphs will *actually* hit the cache and never
execute request C or any of its dependent computations!
A -> B*
Supposing C was a lookup request, that means the name(s) looked up
downstream of B* will *never* be recorded in the referenced name tracker
which can lead to miscompilation. Note that this is not a problem
inherent to the design of the request evaluator - caches in the compiler
have *always* hidden dependent lookups. In fact, the request evaluator
provides us our first opportunity to resolve this correctness bug!
Annotate the covered switches with `llvm_unreachable` after the switch.
MSVC does not recognise covered switches and otherwise emits a spew of
warnings for them.
Add a mode bit to the dependency collector that respects the frontend flag in the previous commit.
Notably, we now write over the dependency files at the end of the compiler pipeline when this flag is on so that dependencies from SILGen and IRGen are properly written to disk.
Define a new type DependencyCollector that abstracts over the
incremental dependency gathering logic. This will insulate the
request-based name tracking code from future work on private,
intransitive dependencies.
* Document a number of legacy conditions and edge cases
* Add lexicon definitions for "dependency source", "dependency sink",
"cascading dependency" and "private dependency"
Convert most of the name lookup requests and a few other ancillary typechecking requests into dependency sinks.
Some requests are also combined sinks and sources in order to emulate the current scheme, which performs scope changes based on lookup flags. This is generally undesirable, since it means those requests cannot immediately be generalized to a purely context-based scheme because they depend on some client-provided entropy source. In particular, the few callers that are providing the "known private" name lookup flag need to be converted to perform lookups in the appropriate private context.
Clients that are passing "no known dependency" are currently considered universally incorrect and are outside the scope of the compatibility guarantees. This means that request-based dependency tracking registers strictly more edges than manual dependency tracking. It also means that once we fix up the clients that are passing "known private", we can completely remove these name lookup flags.
Finally, some tests had to change to accommodate the new scheme. Currently, we go out of our way to register a dependency edge for extensions that declare protocol conformances. However, we were also asserting in at least one test that extensions without protocol conformances weren't registering dependency edges. This is blatantly incorrect and has been undone now that the request-based scheme is automatically registering this edge.
Formalize DependencyScope, DependencySource, and the incremental dependency stack.
Also specialize SimpleRequest to formalize dependency sources and dependency sinks. This allows the evaluator's internal entrypoints to specialize away the incremental dependency tracking infrastructure if a request is not actually dependency-relevant.