Commit Graph

350 Commits

Author SHA1 Message Date
Michael Gottesman
f73276259e [loop-rotate] In OSSA, instead of creating address phis, sneak the address through the phi using a RawPointer.
In OSSA we do not allow address phis, but in certain cases the logic of
LoopRotate really wants them. To work around this issue, I added code to
LoopRotate in this PR that, as a post-pass, fixes up any address phis by
inserting address <-> raw pointer adapters and changing each address phi to
instead be of raw pointer type.
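
Roughly, the workaround rewrites (a hedged sketch, not taken verbatim from the
patch; %some_addr and $T are placeholders):

  br bb2(%some_addr : $*T)                 // address phi: illegal in OSSA
bb2(%addr_phi : $*T):
  %v = load %addr_phi : $*T

into:

  %ptr = address_to_pointer %some_addr : $*T to $Builtin.RawPointer
  br bb2(%ptr : $Builtin.RawPointer)
bb2(%ptr_phi : $Builtin.RawPointer):
  %addr = pointer_to_address %ptr_phi : $Builtin.RawPointer to [strict] $*T
  %v = load %addr : $*T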
2021-02-12 23:20:17 -08:00
Erik Eckstein
fe10f98cf0 SIL: rename the SILBitfield.h header file to BasicBlockBits.h
NFC
2021-02-12 11:15:55 +01:00
Erik Eckstein
f48191966c SILOptimizer: use BasicBlockSet instead of SmallPtrSet in various transformations.
It reduces compile time.
2021-01-27 10:31:17 +01:00
eeckstein
a0884baa3c Merge pull request #35521 from eeckstein/sil-bitfields
SIL: add a utility which can manage per-block bitfields and flags efficiently.
2021-01-22 08:39:13 +01:00
Meghana Gupta
6c32cac98e Enable LoopRotate on OSSA 2021-01-21 16:27:51 -08:00
Meghana Gupta
845e63f901 Support ownershipKind in SILSSAUpdater 2021-01-21 16:27:50 -08:00
Erik Eckstein
3e8612b0d3 SILOptimizer: use the BasicBlockFlag utility in ValueLifetimeAnalysis 2021-01-21 21:31:41 +01:00
swift-ci
13ffb8b181 Merge pull request #35534 from meg-gupta/loopunrolltest 2021-01-21 01:50:33 -08:00
Meghana Gupta
3b60184f53 "[NFC] Add tests for LoopUnroll on OSSA""
Reland #35423
2021-01-20 22:26:18 -08:00
Meghana Gupta
f6f58263b2 Enable ArrayPropertyOpt on OSSA 2021-01-20 13:23:40 -08:00
Meghana Gupta
a7dc4621e4 clang-format some long lines in ArrayPropertyOpt 2021-01-20 13:23:39 -08:00
Meghana Gupta
8d217f362f Merge pull request #35380 from meg-gupta/abcoptsossa
Enable ArrayBoundsCheckElimination on OSSA
2021-01-20 13:21:06 -08:00
Meghana Gupta
00ea81c8df Enable ArrayBoundsCheckElimination on OSSA 2021-01-20 02:13:15 -08:00
Meghana Gupta
a5803f090a Refactor ABCOpts
- Updated free functions to be class methods and avoided passing redundant
  args
- Updated ReleaseSafeArrays to be a class member
2021-01-20 01:46:17 -08:00
Ben Barham
16d614412a Revert "[NFC] Add tests for LoopUnroll on OSSA" 2021-01-19 09:00:36 +10:00
Meghana Gupta
246faae927 Merge pull request #35423 from meg-gupta/loopunrollossa
[NFC] Add tests for LoopUnroll on OSSA
2021-01-18 00:03:08 -08:00
Meghana Gupta
e10ffb79b2 Add tests for LoopUnroll on OSSA 2021-01-14 10:31:07 -08:00
Erik Eckstein
b7351780f7 SIL: move all the block-list modifying APIs to SILFunction.
... and remove SILFunction::getBlocks().

It's just a cleanup, NFC.
2021-01-14 17:35:31 +01:00
Meghana Gupta
4f054f17a3 Allow load instructions in ArrayPropertyOpt's checkSafeArrayAddressUses 2021-01-11 16:08:45 -08:00
Meghana Gupta
77a992096d Update a comment about StructAddressUsers 2021-01-11 16:05:35 -08:00
Meghana Gupta
28bf85f22b Merge pull request #35312 from meg-gupta/cowoptossa
Enable COWArrayOpts on OSSA
2021-01-08 16:58:50 -08:00
Meghana Gupta
dae2530799 Enable COWArrayOpts on OSSA
Additional handling of copy_value/destroy_value/load[copy]/
begin_borrow/end_borrow is needed to support OSSA.

TODO: Support handling of 2D arrays in the pass for OSSA.
Currently, hoisting of loads and borrows is not supported in OSSA.
2021-01-08 12:03:59 -08:00
Andrew Trick
cec5513373 Fix ForEachLoopUnroll use-after-free miscompile.
This pass generated incorrect borrow scopes:

%stack = alloc_stack
%borrow = begin_borrow %element
store_borrow %borrow to %stack
end_borrow %borrow
try_apply %f(%stack) normal bb1, error bb2
...
destroy_value %element

This was not showing up as a miscompile before because:

- an array holds an extra copy of the unrolled elements, and that array is
  now being optimized away completely.

- CopyPropagation now canonicalizes OSSA lifetimes independent of
  unrelated program side effects.

So, since there is no explicit relationship between %borrow and the
OSSA value in %stack, we end up with:

%stack = alloc_stack
%borrow = begin_borrow %element
store_borrow %borrow to %stack
end_borrow %borrow
destroy_value %element
try_apply %f(%stack) normal bb1, error bb2

Fixes rdar://72904101 ([CanonicalOSSA] Fix ForEachLoopUnroll use-after-free miscompile.)
2021-01-07 14:09:54 -08:00
Andrew Trick
437765e7e1 LICM: split loads that are wider than the loop-stored value.
For combined load-store hoisting, split loads that contain the
loop-stored value into a single load from the same address as the
loop-stores, and a set of loads disjoint from the loop-stores. The
single load will be hoisted while sinking the stores to the same
address. The disjoint loads will be hoisted normally in a subsequent
iteration on the same loop.

loop:
  load %outer
  store %inner1
exit:

Will be split into

loop:
  load %inner1
  load %inner2
  store %inner1
exit:

Then, combined load/store hoisting will produce:

  load %inner1
loop:
  load %inner2
exit:
  store %inner1
2020-11-12 13:59:43 -08:00
Andrew Trick
d86099f05f Use AccessPath in LICM.
The LICM algorithm was not robust with respect to address projections
because it identified a projected address by its SILValue. This should
never be done! Use AccessPath instead.

Fixes regressions caused by rdar://66791257 (Print statement provokes
"Can't unsafeBitCast between types of different sizes" when
optimizations enabled)
2020-11-10 12:19:18 -08:00
Andrew Trick
4106b7698d Fix ArrayPropertyOpt dominator update
Pass the DomTree into SILCloner so that edge splitting
properly creates new domtree nodes.

The confusing edge splitting code that was in ArrayPropertyOpt is no
longer needed.

We could clean up ArrayPropertyOpt even more by moving fixDomTree
into SILCloner. But there's already too much going on in this patch.
2020-11-03 01:40:00 -08:00
Andrew Trick
b06aea022e Lay out loop-unrolled blocks next to the original loop. 2020-11-03 01:40:00 -08:00
Andrew Trick
834615671e LoopRotate: Avoid introducing new critical edges during rotation. 2020-11-03 01:40:00 -08:00
Andrew Trick
3128eae3f0 Add NestedSemanticFunctionCheck diagnostic
to check for improperly nested '@_semantics' functions.

Add a missing @_semantics("array.init") in ArraySlice found by the
diagnostic.

Distinguish between array.init and array.init.empty.

Categorize the types of semantic functions by how they affect the
inliner and pass pipeline, and centralize this logic in
PerformanceInlinerUtils. The ultimate goal is to prevent inlining of
"Fundamental" @_semantics calls and @_effects calls until the late
pipeline where we can safely discard semantics. However, that requires
significant pipeline changes.

In the meantime, this change prevents the situation from getting worse
and makes the intention clear. However, it has no significant effect
on the pass pipeline and inliner.
2020-10-26 17:02:33 -07:00
Andrew Trick
6f2cda1390 Add AccessUseVisitor and cleanup related APIs.
Add AccessedStorage::compute and computeInScope to mirror AccessPath.

Allow recovering the begin_access for Nested storage.

Adds AccessedStorage.visitRoots().
2020-10-16 15:00:10 -07:00
Andrew Trick
cc0aa2f8b8 Add an AccessPath abstraction and formalize memory access
Things that have come up recently but are somewhat blocked on this:

- Moving AccessMarkerElimination down in the pipeline
- SemanticARCOpts correctness and improvements
- AliasAnalysis improvements
- LICM performance regressions
- RLE/DSE improvements

Begin to formalize the model for valid memory access in SIL. Ignoring
ownership, every access is a def-use chain in three parts:

object root -> formal access base -> memory operation address

AccessPath abstracts over this path and standardizes the identity of a
memory access throughout the optimizer. This abstraction is the basis
for a new AccessPathVerification.
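
As a hedged illustration (hypothetical SIL with made-up type and field names,
not taken from this commit), the three parts might look like:

  %ref  = alloc_ref $C                                // object root
  %base = ref_element_addr %ref : $C, #C.storage      // formal access base
  %addr = struct_element_addr %base : $*S, #S.value   // memory operation address
  %v    = load %addr : $*Builtin.Int64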

With that verification, we now have all the properties we need for the
type of analysis required for exclusivity enforcement, but now
generalized for any memory analysis. This is suitable for an extremely
lightweight analysis with no side data structures. We currently have a
massive amount of ad-hoc memory analysis throughout SIL, which is
incredibly unmaintainable, bug-prone, and not performance-robust. We
can begin taking advantage of this verifiably complete model to solve
that problem.

The properties this gives us are:

Access analysis must be complete over memory operations: every memory
operation needs a recognizable valid access. An access can be
unidentified only to the extent that it is rooted in some non-address
type and we can prove that it is at least *not* part of an access to a
nominal class or global property. Pointer provenance is also required
for future IRGen-level bitfield optimizations.

Access analysis must be complete over address users: for an identified
object root all memory accesses including subobjects must be
discoverable.

Access analysis must be symmetric: use-def and def-use analysis must
be consistent.

AccessPath is merely a wrapper around the existing accessed-storage
utilities and IndexTrieNode. Existing passes already very successfully
use this approach, but in an ad-hoc way. With a general utility we
can:

- update passes to use this approach to identify memory access,
  reducing the space and time complexity of those algorithms.

- implement an inexpensive on-the-fly, debug mode address lifetime analysis

- implement a lightweight debug mode alias analysis

- ultimately improve the power, efficiency, and maintainability of
  full alias analysis

- make our type-based alias analysis sensitive to the access path
2020-10-16 15:00:10 -07:00
Andrew Trick
2767b51d61 Change Projection to support signed indices.
Change ProjectionIndex for ref_tail_addr to std::numeric_limits<int>::max();
This is necessary to disambiguate the tail elements from
ref_element_addr field zero.
2020-10-16 15:00:09 -07:00
Andrew Trick
5ae231eaab Rename getFieldNo() to getFieldIndex().
Do I really need to justify this?
2020-09-24 22:44:13 -07:00
Andrew Trick
4ca3c232b7 Fix LICM to avoid hoisting never-executed traps
It is legal for the optimizer to consider code after a loop always
reachable, but when a loop has no exits, or when the loop's exits are
dominated by a conditional statement, we should not consider
conditional statements within the loop as dominating all possible
execution paths through the loop. At least not when there is at least
one path through the loop that contains a "synchronization point",
such as a function that may contain a memory barrier, perform I/O, or
exit the program.

Sadly, we still don't model synchronization points in the optimizer,
so we need to conservatively assume all loops have a synchronization
point and avoid hoisting conditional traps that may never be executed.
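
A hedged sketch of the pattern to avoid (hypothetical SIL; @wait, %overflow,
and the types are placeholders, with %overflow loop-invariant): hoisting the
cond_fail to the pre-header would introduce a trap on executions that never
reach bb2.

bb1:                                      // loop with no exit; may spin in @wait
  %f = function_ref @wait : $@convention(thin) () -> Builtin.Int1
  %ready = apply %f() : $@convention(thin) () -> Builtin.Int1
  cond_br %ready, bb2, bb1
bb2:
  cond_fail %overflow : $Builtin.Int1, "overflow"     // must stay inside the loop
  br bb1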

Fixes rdar://66791257 (Print statement provokes "Can't unsafeBitCast
between types of different sizes" when optimizations enabled)

Originated in 2014.
2020-08-12 13:55:46 -07:00
Michael Gottesman
d064241599 [ssa-updater] Modernize style before adding support for guaranteed parameters.
Specifically:

1. I made methods, variables camelCase.
2. I expanded out variable names (e.g., bb -> block, predBB -> predBlocks, U -> wrappedUse).
3. I changed typedef -> using.
4. I changed a few C-style for loops into for-each loops using llvm::enumerate.

NOTE: I left the parts needed for syncing to LLVM in the old style since LLVM
needs these to exist for CRTP to work correctly for the SILSSAUpdater.
2020-08-06 15:41:00 -07:00
Andrew Trick
5826e75b00 Generalize the MemAccessUtils API.
For use outside access enforcement passes.

Add isUniquelyIdentifiedAfterEnforcement.

Rename functions for clarity and generality.

Rename isUniquelyIdentifiedOrClass to isFormalAccessBase.

Rename findAccessedStorage to identifyFormalAccess.

Rename findAccessedStorageNonNested to findAccessedStorage.

Part of generalizing the utility for use outside the access
enforcement passes.
2020-07-17 10:13:20 -07:00
Erik Eckstein
71a642e51b stdlib, SIL optimizer: use the SIL copy-on-write representation in the Array types.
Use the new builtins for COW representation in Array, ContiguousArray and ArraySlice.
The basic idea is to strictly separate code which mutates an array buffer from code which reads from an array.
The concept is explained in more detail in docs/SIL.rst, section "Copy-on-Write Representation".

The main change is to use beginCOWMutation() instead of isUniquelyReferenced() and to insert endCOWMutation() at the end of all mutating functions. Also, reading from the array buffer must be done differently, depending on whether the buffer is in a mutable or immutable state.
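
At the SIL level this roughly corresponds to the begin_cow_mutation and end_cow_mutation instructions; a hedged sketch with a made-up $Buffer class (not the actual stdlib code):

  (%unique, %mutableBuf) = begin_cow_mutation %buf : $Buffer
  ...                                       // write elements through %mutableBuf
  %immutableBuf = end_cow_mutation %mutableBuf : $Buffer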

All the required invariants are enforced by runtime checks, but only in an assert build of the library: a bit in the buffer object's side-table indicates whether the buffer is mutable or not.

Along with the library changes, also two optimizations needed to be updated: COWArrayOpt and ObjectOutliner.
2020-06-08 15:02:22 +02:00
Erik Eckstein
1559fe333f SIL: a new library intrinsic to "finalize" array literals
COW support in SIL requires array literals to be "finalized".
_finalizeUninitializedArray is a compiler-known stdlib function which is called after all elements of an array literal are stored.
This runtime function marks the array literal as finished.

  %uninitialized_result_tuple = apply %_allocateUninitializedArray(%count)
  %mutable_array = tuple_extract %uninitialized_result_tuple, 0
  %elem_base_address = tuple_extract %uninitialized_result_tuple, 1
  ...
  store %elem_0 to %elem_addr_0
  store %elem_1 to %elem_addr_1
  ...
  %final_array = apply %_finalizeUninitializedArray(%mutable_array)

In this commit _finalizeUninitializedArray is still a no-op because the COW support is not used in the Array implementation yet.
2020-06-08 10:24:29 +02:00
Anthony Latsis
267e32dcd8 Merge pull request #32118 from AnthonyLatsis/post-increment-cleanup
[NFC] Pre-increment and decrement where possible
2020-06-02 20:10:29 +03:00
Anthony Latsis
9fd1aa5d59 [NFC] Pre-increment and decrement where possible 2020-06-01 15:39:29 +03:00
Varun Gandhi
c14e934563 [NFC] Remove redundant includes for llvm/ADT/SmallSet.h. 2020-05-31 13:07:45 -07:00
Erik Eckstein
2403e56eb5 SIL: new "array.end_mutation" and "array.finalize_intrinsic" array semantics
Used to "finalize" an array literal. It's not used, yet. So this is NFC.
Also handle the "array.finalize_intrinsic" function in various array specific optimizations.
2020-05-26 18:01:17 +02:00
Saleem Abdulrasool
cebe79d482 SIL: use object libraries instead of globbing
This simplifies the handling of the subdirectories in the SIL and
SILOptimizer paths.  Create individual libraries as object libraries
which allows the analysis of the source changes to be limited in scope.
Because these are object libraries, this has 0 overhead compared to the
previous implementation.  However, string operations over the filenames
are avoided.  The cost for this is that any new sub-library needs to be
added into the list rather than added with the special local function.
2020-05-18 18:56:34 +00:00
Erik Eckstein
ba4da8e0d3 LICM: enable more stores to be moved out of a loop
Even if a store does not dominate the loop exits, it makes sense to move it out of the loop if the pre-header also has a store to the same memory location.
When this is done, dead-store-elimination can then most likely remove the store in the pre-header.
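
A hedged sketch of the pattern this enables (hypothetical SIL; block names and values are placeholders):

preheader:
  store %initial to %addr : $*Int
  br header
header:
  cond_br %update, do_store, latch
do_store:
  store %updated to %addr : $*Int          // does not dominate the loop exit
  br latch
latch:
  cond_br %continue, header, exit
exit:

After the transformation, the loop carries the current value in SSA form, a single store is sunk to the exit block, and the pre-header store becomes dead so dead-store elimination can remove it.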
2020-05-18 15:31:34 +02:00
Erik Eckstein
54e54ae4fe COWArrayOpts: fix a mis-compile related to owned function arguments
COWArrayOpts wrongly assumed that an 'owned' argument is released in the function.

https://bugs.swift.org/browse/SR-12440
rdar://problem/62201043
2020-05-04 17:30:40 +02:00
Andrew Trick
1c12de3241 Fix LICM combined load/store hoisting/sinking optimization.
This loop optimization hoists and sinks a group of loads and stores to
the same address.

Consider this SIL...

PRELOOP:
  %stackAddr = alloc_stack $Index
  %outerAddr1 = struct_element_addr %stackAddr : $*Index, #Index.value
  %innerAddr1 = struct_element_addr %outerAddr1 : $*Int, #Int._value

  %outerAddr2 = struct_element_addr %stackAddr : $*Index, #Index.value
  %innerAddr2 = struct_element_addr %outerAddr2 : $*Int, #Int._value

LOOP:
  %_ = load %innerAddr2 : $*Builtin.Int64
  store %_ to %outerAddr2 : $*Int
  %_ = load %innerAddr1 : $*Builtin.Int64

There are two bugs:

1) LICM miscompiles code during combined load/store hoisting and sinking.

When the loop contains an aliasing load from a different projection
value, the optimization sinks the store but never replaces the
load. At runtime, the load reads a stale value.

FIX: isOnlyLoadedAndStored needs to check for other load instructions
before hoisting/sinking a seemingly unrelated set of
loads/stores. Checking side effect instructions is insufficient. The
same bug could happen with stores, which also do not produce side
effects.

Fixes <rdar://61246061> LICM miscompile:
Combined load/store hoisting/sinking with aliases

2) The LICM algorithm is not robust with respect to address projection
   because it identifies a projected address by its SILValue. This
   should never be done! It is trivial to represent a projection path
   using an IndexTrieNode (there is also an abstraction called
   "ProjectionPath", but it should _never_ actually be stored by an
   analysis because of the time and space complexity of doing so).

Fixing the second bug is not necessary for correctness, so it will be
done in a follow-up commit.
2020-04-03 08:25:20 -07:00
Andrew Trick
73ee38c162 Add tracing to LICM for reloaded store/restored load optimization. 2020-04-03 08:25:20 -07:00
Erik Eckstein
287619445b LoopRotate: limit the size of the block to be duplicated.
This avoids significant code size regressions if the loop block to be duplicated is very large.

Also remove the ShouldRotate option. It's not needed anymore, as the pass can be disabled with -sil-disable-pass.

rdar://problem/33970453
2020-04-01 10:02:43 +02:00
Erik Eckstein
3ad7d548c2 LICM: hoist calls to global_init functions
Global initializers are executed only once.
Therefore it's possible to hoist such an initializer call to the loop pre-header, provided there are no conflicting side effects in the loop before the call.
Also, the call must post-dominate the loop pre-header. Otherwise it would be executed speculatively.
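
A hedged sketch (hypothetical SIL; @globalInitAddressor and the types are placeholders):

  // before: the initializer call executes on every loop iteration
loop:
  %f = function_ref @globalInitAddressor : $@convention(thin) () -> Builtin.RawPointer
  %p = apply %f() : $@convention(thin) () -> Builtin.RawPointer
  %a = pointer_to_address %p : $Builtin.RawPointer to [strict] $*Int
  %v = load %a : $*Int
  ...
  cond_br %c, loop, exit

  // after: hoisted, because the call post-dominates the pre-header and no
  // instruction in the loop before it has conflicting side effects
preheader:
  %f = function_ref @globalInitAddressor : $@convention(thin) () -> Builtin.RawPointer
  %p = apply %f() : $@convention(thin) () -> Builtin.RawPointer
  br loop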
2020-03-23 16:08:56 +01:00
Fred Riss
259d78a350 Adapt to llvm.org StringRef API change 2020-03-13 19:08:22 +01:00