The previous algorithm performed an iterative forward data flow analysis
followed by a reverse data flow analysis. I suspect the history here is that
it started as a reverse analysis, which didn't really work for infinite
loops, and so complexity accumulated.
The new algorithm is quite straightforward and relies on the allocations
being properly jointly post-dominated, just not properly nested. We simply
walk forward through the blocks in an order consistent with dominance,
maintaining the stack of active allocations and deferring deallocations that
are improperly nested until we deallocate the allocations above them. The
only real subtlety is that we have to delay walking into dead-end regions
until we've seen all of the edges into them, so that we know whether the
stack state in them is coherent. If the state is incoherent, we need to
remove any deallocations of previous allocations, because we cannot reason
correctly about what's on top of the stack.
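A minimal C++ sketch of that forward walk; the names here (`StackState`, `visitAllocation`, `visitDeallocation`) are hypothetical, not the pass's actual API:
```
#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallVector.h"

class SILInstruction;  // stand-in for the real compiler type

// Hypothetical per-walk state; the real pass carries more machinery
// (dead-end region handling, state merging at block entries, etc.).
struct StackState {
  llvm::SmallVector<SILInstruction *, 8> active;   // innermost allocation last
  llvm::SmallPtrSet<SILInstruction *, 8> deferred; // improperly nested deallocs

  void visitAllocation(SILInstruction *alloc) {
    active.push_back(alloc);
  }

  // `alloc` is the allocation whose deallocation we just reached; it is
  // assumed to be somewhere on the active stack.
  void visitDeallocation(SILInstruction *alloc) {
    if (active.back() != alloc) {
      // Improperly nested: defer until the allocations above it are gone.
      deferred.insert(alloc);
      return;
    }
    active.pop_back();
    // Pop any allocations whose deallocations were previously deferred
    // and have now become the top of the stack.
    while (!active.empty() && deferred.erase(active.back()))
      active.pop_back();
  }
};
```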
The reason I'm doing this, besides it just being a simpler and hopefully
faster algorithm, is that modeling some of the uses of the async stack
allocator properly requires builtins that cannot just be semantically
reordered. That should be somewhat easier to handle with the new approach,
although really (1) we should not have runtime functions that need this and
(2) we're going to need a conservatively-correct solution that's different
from this anyway because hoisting allocations is *also* limited in its own
way.
I've attached a rather pedantic proof of the correctness of the algorithm.
The thing that concerns me most about the rewritten pass is that it isn't
actually validating joint post-dominance on input, so if you give it bad
input, the resulting verifier failures might be a little mystifying to debug.
When its operand has coroutine kind `yield_once_2`, a `begin_apply`
instruction produces an additional value representing the storage
allocated by the callee. This storage must be deallocated by a
`dealloc_stack` on every path out of the function. Like any other stack
allocation, it must obey stack discipline.
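A schematic SIL fragment; the callee's function type and the `%allocation` spelling below are illustrative rather than exact grammar:
```
// Schematic: with coroutine kind yield_once_2, begin_apply produces an
// extra %allocation value for the callee-allocated storage.
(%yielded, %token, %allocation) = begin_apply %coro() : $@yield_once_2 () -> @yields @inout Int
// ... use %yielded ...
end_apply %token
// Required on every path out of the function, obeying stack discipline:
dealloc_stack %allocation
```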
Although I don't plan to bring over new assertions wholesale
into the current qualification branch, it's entirely possible
that various minor changes in main will use the new assertions;
having this basic support in the release branch will simplify that.
(This is why I'm adding the includes as a separate pass from
rewriting the individual assertions.)
* `alloc_vector`: allocates an uninitialized vector of elements on the stack or in a statically initialized global
* `vector`: creates an initialized vector in a statically initialized global
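A hedged SIL sketch of the on-stack case; the exact operand spelling and the pairing with `dealloc_stack` are assumptions here:
```
%cap = integer_literal $Builtin.Word, 4
// Uninitialized stack storage for four Ints (address of the first element):
%vec = alloc_vector $Int, %cap : $Builtin.Word
// ... initialize and use the elements ...
dealloc_stack %vec : $*Int
```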
Reformatting everything now that we have explicit `llvm` namespaces. I've
separated this from the main commit to help manage merge conflicts and
to make the mega-patch a bit easier to read.
This is phase 1 of switching from llvm::Optional to std::optional in the
next rebranch. llvm::Optional was removed from upstream LLVM, so we need
to migrate off it rather soon. On Darwin, std::optional and llvm::Optional
have the same layout, so we don't need to be as concerned about ABI
beyond the name mangling. `llvm::Optional` is only returned from one
function:
```
getStandardTypeSubst(StringRef TypeName,
bool allowConcurrencyManglings);
```
It's the return value, so it should not impact the mangling of the
function, and the layout is the same as `std::optional`, so it should be
mostly okay. This function doesn't appear to have any users, and its ABI
was already broken two years ago for concurrency without anyone seeming
to notice, so this should be "okay".
I'm doing the migration incrementally so that folks working on main can
cherry-pick back to the release/5.9 branch. Once 5.9 is done and locked
away, we can go through and finish the replacement. Since `None` and
`Optional` show up in contexts where they are not `llvm::None` and
`llvm::Optional`, I'm preparing the work now by removing the namespace
unwrapping and making the `llvm` namespace explicit. This should make it
fairly mechanical to replace llvm::Optional with std::optional, and
llvm::None with std::nullopt. It's also a change that can be brought onto
the release/5.9 branch with minimal impact. This should be an NFC change.
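For illustration, the mechanical shape of the change on a made-up use site (`findIndex` is not a real function):
```
#include "llvm/ADT/Optional.h"

// Before (relies on `using llvm::Optional;` / `using llvm::None;`):
//   Optional<unsigned> findIndex(bool found, unsigned idx) {
//     return found ? Optional<unsigned>(idx) : None;
//   }

// After this change (explicit llvm:: qualification):
llvm::Optional<unsigned> findIndex(bool found, unsigned idx) {
  return found ? llvm::Optional<unsigned>(idx) : llvm::None;
}

// After the rebranch, the swap becomes purely textual:
//   std::optional<unsigned> findIndex(bool found, unsigned idx) {
//     return found ? std::optional<unsigned>(idx) : std::nullopt;
//   }
```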
The new alloc_pack_metadata and dealloc_pack_metadata instructions are
inserted as part of IRGen lowering. The former indicates that the next
instruction might result in on-stack pack metadata being emitted. The
latter marks the position at which metadata emitted on behalf of its
operand should be cleaned up.
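Schematically in SIL (the `$()` operand type and the pack-substitution spelling are approximations):
```
%mark = alloc_pack_metadata $()
// The next instruction may emit on-stack pack metadata:
apply %f<Pack{Int, String}>() : $@convention(thin) <each T> () -> ()
// Clean up whatever metadata was emitted on behalf of %mark's operand:
dealloc_pack_metadata %mark : $*()
```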
`getValue` -> `value`
`getValueOr` -> `value_or`
`hasValue` -> `has_value`
`map` -> `transform`
The old API will be deprecated in the rebranch.
To avoid merge conflicts, use the new API already in the main branch.
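A minimal illustration in C++ (`consume` is a placeholder):
```
#include "llvm/ADT/Optional.h"

void consume(int);

void example(llvm::Optional<int> opt) {
  // Old API (deprecated in the rebranch):
  //   if (opt.hasValue()) consume(opt.getValue());
  //   consume(opt.getValueOr(0));
  //   auto doubled = opt.map([](int i) { return i * 2; });
  // New API, matching std::optional:
  if (opt.has_value())
    consume(opt.value());
  consume(opt.value_or(0));
  auto doubled = opt.transform([](int i) { return i * 2; });
  (void)doubled;
}
```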
rdar://102362022
Reduces the number of `_ContiguousArrayStorage` metadata instantiations.
In order to support constant-time bridging, we need to set the correct
metadata when we bridge to Objective-C, so that the type check succeeds
when bridging back from Objective-C and we can reuse the storage instance
rather than bridging the elements.
To support dynamically setting the `_ContiguousArrayStorage` element
type, I needed to add support for optimizing `alloc_ref_dynamic`
throughout the optimizer.
Possible future improvements:
* Use different metadata so that we can disambiguate native Swift
classes during destruction, allowing a native release rather than an
unknown release.
* Optimize the newly added semantic function
`getContiguousArrayStorageType`.
rdar://86171143
Introduce a new instruction `dealloc_stack_ref` and remove the `stack` flag from `dealloc_ref`.
`dealloc_ref [stack]` was confusing because all it did was mark the deallocation of the stack space for a stack-promoted object.
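For example (`MyClass` is illustrative):
```
%obj = alloc_ref [stack] $MyClass
// ... use %obj ...
strong_release %obj : $MyClass     // destroys the object itself
dealloc_stack_ref %obj : $MyClass  // only marks the end of the stack allocation
```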
Besides fixing the compiler crash, this change also improves the stack-nesting correction mechanism in the inliners:
* Instead of trying to correct the nesting after each inlining of a callee, correct the nesting once when inlining is finished for a caller function.
This fixes a potential compile-time problem, because StackNesting iterates over the whole function.
In the worst case this can lead to quadratic behavior when many begin_apply instructions with overlapping stack locations are inlined.
* Because we do this only once per caller, we can remove the complex logic for checking whether it is necessary.
We can just do it unconditionally whenever any coroutine is inlined.
The inliners iterate over all instructions of a function anyway, so this does not increase the computational complexity (StackNesting is roughly linear in the number of instructions).
rdar://problem/47615442
Instead of giving unreachable blocks special treatment, model `unreachable` as implicitly deallocating all live stack locations at that point.
This requires an additional forward-dataflow pass, but it now correctly models the problem and fixes a compiler crash.
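Schematically, in SIL:
```
bb0:
  %slot = alloc_stack $Int
  cond_br %cond, bb1, bb2

bb1:                          // normal path: explicit deallocation
  dealloc_stack %slot : $*Int
  br bb3

bb2:
  // Dead-end path: no dealloc_stack needed; the unreachable is modeled
  // as implicitly deallocating %slot (and any other live stack locations).
  unreachable

bb3:
  // ...
```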
rdar://problem/47402694
Some of the inserted `dealloc_stack` instructions were getting the
wrong scope. This manifests when running AllocBoxToStack, because
it uses StackNesting as a utility. Yet another improvement in
debug information at `-Onone`.
Fixes SR-6738.
Introduce a common superclass, SILNode.
This is in preparation for allowing instructions to have multiple
results. It is also a somewhat more elegant representation for
instructions that have zero results. Instructions that are known
to have exactly one result inherit from a class, SingleValueInstruction,
that subclasses both ValueBase and SILInstruction. Some care must be
taken when working with SILNode pointers and testing for equality;
please see the comment on SILNode for more information.
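A simplified C++ sketch of the hierarchy described above (bodies elided):
```
class SILNode { /* common superclass; see its comment for equality caveats */ };

class ValueBase : public SILNode { /* anything usable as a SIL value */ };
class SILInstruction : public SILNode { /* produces zero or more results */ };

// Exactly-one-result instructions are both a value and an instruction,
// which is why SILNode pointer comparisons need care.
class SingleValueInstruction : public SILInstruction, public ValueBase {};
```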
A number of SIL passes needed to be updated in order to handle this
new distinction between SIL values and SIL instructions.
Note that the SIL parser is now stricter about not trying to assign
a result value from an instruction (like 'return' or 'strong_retain')
that does not produce any.
This is useful for optimizations (like AllocBoxToStack) which create (de-)alloc_stack instructions.
They can just insert the new instructions anywhere without worrying about nesting and correct the nesting afterwards.
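A hedged sketch of that pattern, assuming the `StackNesting::fixNesting` entry point:
```
#include "swift/SILOptimizer/Utils/StackNesting.h"

using namespace swift;

void rewriteAllocations(SILFunction *function) {
  // ... create (de-)alloc_stack instructions wherever is convenient,
  // without maintaining proper nesting ...

  // One linear pass over the function repairs the nesting afterwards.
  StackNesting::fixNesting(function);
}
```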