This reinstates the use of direct adjacency information when gathering
constraints, effectively reverting
54bdd7b840.
Fixes the regression that commit caused, which is tracked by
rdar://problem/54274245.
Reinstate the list of adjacencies in each constraint graph node,
effectively reverting
dfdd352d3d. Exclude one-way constraints
from this computation; we'll handle them separately.
One-way constraint expressions, which are the only things that
introduce one-way constraints at this point, want to look through
lvalue types to produce values. Rename OneWayBind to OneWayEqual, map
it down to an Equal constraint when it is simplified (to drop
lvalue-ness), and apply that coercion during constraint application.
Part of rdar://problem/50150793.
There were a few places where we wanted fast testing to see whether a
particular type variable is currently of interest. Instead of building
local hash tables in those places, keep type variables in a SetVector
for efficient testing.
We only care about gathering a one-way constraint if (1) the left-hand
side is in the set of type variables we care about now, and (2) the
type variable we started from is in the right-hand side.
When we encounter a cycle in the one-way component graph due to constraints
that (e.g.) tie together the outputs of two one-way constraints, collapse
the components along the cycle into a single connected component, because
all of those type variables must be solved together.
Introduce the notion of "one-way" binding constraints of the form
$T0 one-way bind to $T1
which treats the type variables $T0 and $T1 as independent up until
the point where $T1 simplifies down to a concrete type, at which point
$T0 will be bound to that concrete type. $T0 won't be bound in any
other way, so type information ends up being propagated right-to-left,
only. This allows a constraint system to be broken up in more
components that are solved independently. Specifically, the connected
components algorithm now proceeds as follows:
1. Compute connected components, excluding one-way constraints from
consideration.
2. Compute a directed graph amongst the components using only the
one-way constraints, where an edge A -> B indicates that the type
variables in component A need to be solved before those in component
B.
3. Using the directed graph, compute the set of components that need
to be solved before a given component.
To utilize this, implement a new kind of solver step that handles the
propagation of partial solutions across one-way constraints. This
introduces a new kind of "split" within a connected component, where
we collect each combination of partial solutions for the input
components and (separately) try to solve the constraints in this
component. Any correct solution from any of these attempts will then
be recorded as a (partial) solution for this component.
For example, consider:
let _: Int8 = b ? Builtin.one_way(int8Or16(17)) :
Builtin.one_way(int8Or16(42\
))
where int8Or16 is overloaded with types `(Int8) -> Int8` and
`(Int16) -> Int16`. There are two one-way components (`int8Or16(17)`)
and (`int8Or16(42)`), each of which can produce a value of type `Int8`
or `Int16`. Those two components will be solved independently, and the
partial solutions for each will be fed into the component that
evaluates the ternary operator. There are four ways to attempt that
evaluation:
```
[Int8, Int8]
[Int8, Int16]
[Int16, Int8]
[Int16, Int16]
To test this, introduce a new expression builtin `Builtin.one_way(x)` that
introduces a one-way expression constraint binding the result of the
expression 'x'. The builtin is meant to be used for testing purposes,
and the one-way constraint expression itself can be synthesized by the
type checker to introduce one-way constraints later on.
Of these two, there are only two (partial) solutions that can work at
all, because the types in the ternary operator need a common
supertype:
[Int8, Int8]
[Int16, Int16]
Therefore, these are the partial solutions that will be considered the
results of the component containing the ternary expression. Note that
only one of them meets the final constraint (convertibility to
`Int8`), so the expression is well-formed.
Part of rdar://problem/50150793.
Move the logic for creating connected components of orphaned
constraints into the connected-components algorithm code, rather than
making it a special part of SplitterStep.
Maintain the order of constraints when splitting the system into
connected components, to match the behavior prior to the refactoring
into a separate connected-components algorithm.
Have the constraint graph's connected-component implementation be more
self-contained, producing a vector containing each of the actual
components (where each is defined by a list of type variables and a list
of constraints). This simplifies the contract with the client
(SplitterStep) and eliminates a bunch of separate mapping steps to
interpret the results.
It also lets us enrich the Component data structure in the future.
The connected components computation was not forming components when all of
the type variables in a component were already bound. Any remaining
constraints involving those type variables would then arbitrarily end up
in component 0, due to the the default-construction behavior of a map’s
operator[] with missing keys, artificially increasing the size of that
component with (typically) additional disjunctions.
Ensure that all constraints get into a component, creating one to hold
the a constraint even when all of the type variables are already bound.
Then, assert the invariant that every constraint is associated with a
component.
In time, we should probably eliminate this notion of disjunctions that
remain even when the type variable has been bound. For now, strengthen the
invariant to at least ensure that they get solved in isolation.
Simplify the connected-components computation slightly and make sure
that it never performs work outside of the subgraph described by the
input set of type variables.
The API of the connected-components algorithm asks clients to
provide the set of type variables of interest. However, the connected
components algorithm itself was operating across the entire set of
type variables, then narrowing the result down to the type variables
of interest. Instead, only perform connected components on those type
variables of interest, so that we are only doing work proportional to
the subgraph we're working in.
Simplify the interface to gatherConstraints() by performing the
uniquing within the function itself and returning only the resulting
(uniqued) vector of constraints.
Each constraint graph node maintained adjacency information that was
fairly expensive---a vector of type variables and a dense map of extra
adjacency information---and that was continuously maintained. Remove
this adjacency information, instead recomputing it by walking the
constraints (to get their type variables) and having a separate
(smaller) list of type variables that are adjacent due to fixed
bindings. This simplifies the constraint graph code and reduces
per-node memory overhead.
Use the adjacencies implied by the constraints of a node rather than looking
at the "adjacency" list, and try to simplify this code path a bit. The
diagnostic change is because we are now uniformly recording the
members of the equivalence class.
The constraint graph maintains a fairly heavyweight list of
adjacencies that is only used in two operations: connected components
and gathering related constraints. Switch connected components over to
primarily use the set of constraints (which are necessary for many
reasons), reducing the need for the adjacencies list.
The ConstraintSystem class is on the order of 1000s of bytes in size on
the stacka nd is causing issues with dispatch's 64k stack limit.
This changes most Small data types which store data on the stack to non
small heap based data types.
Most of the use-cases of `gatherConstraints` require filtering
at least based on the constraint kind that caller is interested in,
so instead of returning unrelated results and asking caller to
filter separately, let's add that functionality directly to
`gatherConstraints`.
Since it's possible to find the same constraint through two different
but equivalent type variables, let's use a set to store constraints
instead of a vector to avoid processing the same constraint multiple
times.
Currently we have this non-optimal behavior in `contractEdges` where
for every type variable it gathers constraints for its whole equivalence
class before checking if any are "contractable", instead constraints
could be gathered/filtered once which removes a lot of useless work.
Improve situation around closure parameter/argument contractions
by allowing such action if it can be proved that none of the bindings
would result in solver attempting to bind parameter to `inout` type.
Resolves: rdar://problem/36838495
Currently edge related to the parameter bindings is contracted
without properly checking if newly created equivalence class has
the same inout & l-value requirements. This patch improves the
situation by disallowing contraction of the edges related to parameter
binding constraint where left-hand side has `inout` attribute set.
Such guarantees that parameter can get `inout` type assigned when
argument gets `l-value` type.
Resolves: rdar://problem/33429010
Presently subtype constraint is considered a candidate for contraction
via `shouldContractEdge` when left-hand side of the subtype constraint
is an inout type with type variable inside. Although it's intended to
be used only in combination with BindParam constraint assigned to the
same variable, that is actually never checked and contraction of subtype
constraint like that is invalid.
Resolves: rdar://problem/34333874
When gathering constraints for a type variable, we were gathering all of
the constraints for the members of the equivalence class, and then for
the adjacencies of the representative of the equivalence class, but not
from the adjacencies of other members of the equivalence class.
For example for:
#3 = $T3
#4 = $T4 equivalent to $T3
#5 = $T5 as $T4.Element
after binding $T3 we would collect the constraints related to $T3 and
$T4, but not $T5. The end result can be that we finish examining all
disjunctions and type bindings but still have inactive constraints in
the constraint graph, which is treated as a failure in the solver.
Fixes SR-5120 / rdar://problem/32618740.
The constraint graph models type variables (as the nodes) and
constraints (as the multi-edges connecting nodes). The connected
components within this (multi-)graph are independent subproblems that
are solved separately; the results from each subproblem are then
combined. The approach helps curtail exponential behavior, because
(e.g.) the disjunctions/type variables in one component won't ever be
explored while solving for another component
This approach assumes that all of the constraints that cannot be
immediately solved are associated with one or more type
variables. This is almost entirely true---constraints that don't
involve type variables are immediately simplified.
Except for disjunctions. A disjunction involving no type variables
would not appear *at all* in the constraint graph. Worse, it's
independence from other constraints could not be established, so the
constraint solver would go exponential for every one of these
constraints. This has always been an issue, but it got worse with the
separation of type checking of "as" into the "coercion" case and the
"bridging" case, which introduced more of these disjunctions. This led
to counterintuitive behavior where adding "as Foo" would cause the
type checking to take *more* time than leaving it off, if both sides
of the "as" were known to be concrete. rdar://problem/30545483
captures a case (now in the new test case) where we saw such
exponential blow-ups.
Teach the constraint graph to keep track of "orphaned" constraints
that don't reference any type variables, and treat each "orphaned"
constraint as a separate connected component. That way, they're solved
independently.
Fixes rdar://problem/30545483 and will likely curtain other
exponential behavior we're seeing in the solver.
Once we've bound a type variable, we find those inactive constraints
that mention the type variable and make them active, so they'll be
simplified again. However, we weren't finding *all* constraints that
could be affected---in particular, we weren't searching everything
related to the type variables in the equivalence class, which meant
that some constraints would not get visited... and we would to
type-check simply because we didn't look at a constraint again when we
should have.
Fixes rdar://problem/29633747.
In the constraint solver, we've traditionally modeled nested type via
a "type member" constraint of the form
$T1 = $T0.NameOfTypeMember
and treated $T1 as a type variable. While the solver did generally try
to avoid attempting bindings for $T1 (it would wait until $T0 was
bound, which solves the constraint), on occasion we would get weird
behavior because the solver did try to bind the type
variable.
With this commit, model nested types via DependentMemberType, the same
way we handle (e.g.) the nested type of a generic type parameter. This
solution maintains more information (e.g., we know specifically which
associated type we're referring to), fits in better with the type
system (we know how to deal with dependent members throughout the type
checker, AST, and so on), and is easier to reason able.
This change is a performance optimization for the type checker for a
few reasons. First, it reduces the number of type variables we need to
deal with significantly (we create half as many type variables while
type checking the standard library), and the solver scales poorly with
the number of type variables because it visits all of the
as-yet-unbound type variables at each solving step. Second, it
eliminates a number of redundant by-name lookups in cases where we
already know which associated type we want.
Overall, this change provides a 25% speedup when type-checking the
standard library.