1033 Commits

Author SHA1 Message Date
Slava Pestov
772cf3a2fa SIL Optimizer: More principled substitution remapping in devirtualizer
When devirtualizing witness method and class method calls, we
transform apply instructions operating on the result of a SIL
witness_method or class_method instruction to direct calls of
a function_ref.

The generic signature of the dynamic call site might not match
the generic signature of the static thunk, so the substitution
list from the dynamic apply instruction cannot be used directly;
instead, we must transform it to a substitution list suitable
for the static thunk.

- With witness methods, the method is called using the protocol
  requirement's signature, <Self : P, ...>; however, the
  witness thunk has a generic signature derived from the
  concrete witness.

  For example, the requirement might have a signature
  <Self : P, T>, while the concrete witness thunk might
  have a signature <X, Y>, where the concrete conforming type
  is G<X, Y>.

  At the call site, we substitute Self := G<X', Y'>; however
  to be able to call the witness thunk directly, we need to
  form substitutions X := X' and Y := Y'.

- A similar situation occurs with class methods when the
  dynamically-dispatched call is performed against a derived
  class, but devirtualization actually finds the method on a
  base class of the derived class.

  The base class may have a different number of generic
  parameters than the derived class, either because the
  derived class makes some generic parameters of the base
  class concrete, or because the derived class introduces new
  generic parameters of its own.

In both cases, we need to consider the generic signature of the
dynamic call site (the protocol requirement or the derived
class method) as well as the generic signature of the static
thunk, and carefully remap the substitutions from one form
into another.

Previously the optimizer would implicitly rely on substitutions
being in AllArchetypes order, in particular that concatenating
outer substitutions with inner substitutions makes sense.

This assumption is about to go away, so this patch refactors
the optimizer to use some new abstractions for remapping
substitution lists.
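
The two remapping cases described above can be sketched with a toy model. This is not the optimizer's C++ code: the tuple-based type representation and the names `unify`, `substitute`, `remap_witness_subs` and `remap_class_method_subs` are hypothetical, chosen only for illustration.

```python
# Toy model: a type is either a string ("Int", or a generic parameter name
# such as "X") or a tuple ("G", arg1, arg2, ...) for a bound generic type.

def unify(pattern, concrete, params, bindings):
    """Match `pattern` against `concrete`, binding names listed in `params`."""
    if isinstance(pattern, str):
        if pattern in params:
            # Generic parameter: bind it, or check an earlier binding agrees.
            if bindings.setdefault(pattern, concrete) != concrete:
                raise ValueError(f"conflicting bindings for {pattern}")
        elif pattern != concrete:
            raise ValueError(f"{pattern} does not match {concrete}")
        return
    # Bound generic type: same nominal type and same number of arguments.
    if (not isinstance(concrete, tuple) or len(concrete) != len(pattern)
            or concrete[0] != pattern[0]):
        raise ValueError(f"{pattern} does not match {concrete}")
    for sub_pattern, sub_concrete in zip(pattern[1:], concrete[1:]):
        unify(sub_pattern, sub_concrete, params, bindings)


def substitute(ty, subs):
    """Replace generic parameter names in `ty` using the mapping `subs`."""
    if isinstance(ty, str):
        return subs.get(ty, ty)
    return (ty[0],) + tuple(substitute(arg, subs) for arg in ty[1:])


def remap_witness_subs(call_site_subs, conforming_type, thunk_params):
    """Derive thunk substitutions from the call site's replacement for Self."""
    bindings = {}
    unify(conforming_type, call_site_subs["Self"], set(thunk_params), bindings)
    return bindings


def remap_class_method_subs(derived_subs, superclass_type, base_decl_type,
                            base_params):
    """Derive base-class substitutions from the derived class's substitutions."""
    concrete_base = substitute(superclass_type, derived_subs)
    bindings = {}
    unify(base_decl_type, concrete_base, set(base_params), bindings)
    return bindings


# Witness case: Self := G<Int, String>, conforming type G<X, Y>.
witness_subs = remap_witness_subs(
    {"Self": ("G", "Int", "String"), "T": "Bool"}, ("G", "X", "Y"), ["X", "Y"])
# witness_subs == {"X": "Int", "Y": "String"}

# Class case: class Derived<T> : Base<Int, T>, call site has T := String.
class_subs = remap_class_method_subs(
    {"T": "String"}, ("Base", "Int", "T"), ("Base", "A", "B"), ["A", "B"])
# class_subs == {"A": "Int", "B": "String"}
```

In both cases the result is keyed by the static thunk's own generic parameters, so nothing depends on the order in which the call site's substitutions happened to be stored.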
2016-09-06 11:51:13 -07:00
Xin Tong
9b338e7f00 Replace SmallVector with SmallSetVector in the new epilogue ARC matcher 2016-09-06 09:53:17 -07:00
Tim Bodeit
1d666e6bd3 [Analysis] Adjust inline documentation in ClassHierarchyAnalysis.cpp
Base parameter was removed in 3c792e4648
2016-09-05 21:07:08 +02:00
Erik Eckstein
fb4b2f9675 AliasAnalysis: improve the may-decrement-ref-count check for builtins.
This helps the ARC optimizations to eliminate retain-release pairs across the "destroyArray" builtin.
2016-08-31 10:28:33 -07:00
Erik Eckstein
bd731cfff0 ARC: fix epilog-release-matcher. It has to handle dealloc_ref instructions.
This fixes a problem which might result in converting the owned self argument of a deallocating deinitializer into a guaranteed argument.

rdar://problem/28096460
2016-08-31 09:19:57 -07:00
Xin Tong
23f8eef616 Two small improvements on epilogue retain/release matcher.
1. Make sure to abort the data flow as soon as we know we can't find the epilogue retain/release.
2. Ignore retains in the throw block, because on the caller side we neither use the result
nor insert a retain for it in the throw block. This is really a bug; there is a test case for
it in functionsigopts.sil, which will be exercised once this new epilogue retain matcher is wired up.
2016-08-29 22:12:09 -07:00
Saleem Abdulrasool
9203283628 SILOptimizer: switch to NodeRef
This adds the typedef and switches uses of NodeType * to NodeRef.  This is in
preparation for the eventual NodeRef-ization of the GraphTraits in LLVM.  NFC.
2016-08-25 13:01:11 -07:00
Erik Eckstein
959e19d7bc Add an optimization to eliminate a partial_apply if all applied arguments are dead in the applied function.
This consists of 3 parts:
1) Extend CallerAnalysis to also provide information if a function is partially applied
2) A new DeadArgSignatureOpt pass, similar to FunctionSignatureOpts, which just specializes for dead arguments of partially applied functions.
3) Let CapturePropagation eliminate such partial_apply instructions and replace them with a thin_to_thick conversion of the specialized functions.

This optimization improves benchmarks where static struct or class functions are passed as a closure (e.g. -20% for SortStrings).
Such functions have an additional metatype parameter. We used to create a partial_apply in this case, which allocates a context, etc.
But this is not necessary as the metatype parameter is not used in most cases.
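
The core check behind this optimization can be sketched as follows. This is a toy model rather than the pass itself: the `Function` record, the `specialize_dead_applied_args` helper and the `_dead_args_removed` name suffix are hypothetical, and the model assumes that a partial_apply binds the trailing parameters of the callee.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Function:
    name: str
    params: tuple      # parameter names, in declaration order
    used: frozenset    # parameters the body actually reads


def specialize_dead_applied_args(fn, num_applied):
    """Return a specialized function without the applied parameters,
    or None if any applied parameter is still used by the body."""
    split = len(fn.params) - num_applied
    applied = fn.params[split:]
    if any(param in fn.used for param in applied):
        return None  # the context is needed; keep the partial_apply
    # All applied arguments are dead: no context has to be allocated, so the
    # closure can be formed directly from the specialized function.
    return Function(fn.name + "_dead_args_removed", fn.params[:split], fn.used)


# A static comparison function whose trailing metatype parameter is unused.
less = Function("less", ("a", "b", "metatype"), frozenset({"a", "b"}))
specialized = specialize_dead_applied_args(less, 1)
# specialized.params == ("a", "b")
```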

rdar://problem/27513085
2016-08-23 07:32:41 -07:00
Xin Tong
34eadd43ee Small refactoring in RRCM. 2016-08-09 21:44:53 -07:00
Slava Pestov
ddc51c5917 AST: Implement SE-0102, introducing new semantics for Never alongside @noreturn
No migrator support yet, and the code for @noreturn is still in
place.
2016-07-22 14:56:39 -07:00
Erik Eckstein
51980f7c5f SideEffectAnalysis: better selection of what non-returning instructions should be ignored.
rdar://problem/27453076
2016-07-22 09:11:34 -07:00
Andrew Trick
c47687da2c Add an isStrict flag to SIL pointer_to_address. (#3529)
Strict aliasing only applies to memory operations that use strict
addresses. The optimizer needs to be aware of this flag. Uses of raw
addresses should not have their address substituted with a strict
address.

Also add Builtin.LoadRaw which will be used by raw pointer loads.
2016-07-15 15:04:02 -05:00
Xin Tong
e61fc669e3 Merge pull request #3244 from trentxintong/ReleaseCM
Implement an iterative data flow to find epilogue retains or releases
2016-07-11 15:11:28 -07:00
Xin Tong
eaaf825032 Implement an iterative data flow to find epilogue retains or releases.
There are a few places where this analysis can be used, e.g. FSO and ASO.
I will wire them up one by one later.
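
As a rough illustration of what an iterative data flow of this kind looks like, here is a simplified backward analysis over a toy CFG. The lattice (`TOP`, `CLEAN`, `FOUND`, `BLOCKED`), the instruction names and the function `find_epilogue_releases` are assumptions made for this sketch; the commit message does not describe the real matcher's rules.

```python
TOP, CLEAN, FOUND, BLOCKED = "top", "clean", "found", "blocked"
RC_OPS = {"release", "retain", "use"}  # instructions that touch the value


def meet(states):
    """Combine successor states; unknown (TOP) successors are ignored."""
    known = {s for s in states if s != TOP}
    if not known:
        return TOP
    return known.pop() if len(known) == 1 else BLOCKED


def last_rc_index(insts):
    for idx in range(len(insts) - 1, -1, -1):
        if insts[idx] in RC_OPS:
            return idx
    return None


def transfer(insts, out_state):
    """State at block entry, given the state at block exit."""
    if out_state != CLEAN:
        return out_state  # TOP, FOUND and BLOCKED pass through unchanged
    idx = last_rc_index(insts)
    if idx is None:
        return CLEAN
    return FOUND if insts[idx] == "release" else BLOCKED


def find_epilogue_releases(blocks, succs, entry):
    """Return {(block, index)} of epilogue releases, or None if some path
    to the exit has no release as its last instruction touching the value."""
    in_state = {b: TOP for b in blocks}
    out_state = {}
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for b in blocks:
            out = meet([in_state[s] for s in succs[b]]) if succs[b] else CLEAN
            out_state[b] = out
            new_in = transfer(blocks[b], out)
            if new_in != in_state[b]:
                in_state[b] = new_in
                changed = True
    if in_state[entry] != FOUND:
        return None
    found, work, seen = set(), [entry], set()
    while work:
        b = work.pop()
        if b in seen:
            continue
        seen.add(b)
        if out_state[b] == CLEAN:
            found.add((b, last_rc_index(blocks[b])))  # release is in this block
        else:
            work.extend(succs[b])  # releases are further down
    return found


blocks = {"entry": ["use"], "left": ["release"],
          "right": ["use", "release", "other"], "exit": ["return"]}
succs = {"entry": ["left", "right"], "left": ["exit"],
         "right": ["exit"], "exit": []}
releases = find_epilogue_releases(blocks, succs, "entry")
# releases == {("left", 0), ("right", 1)}
```

Because the result is computed per path rather than only in the return block, a release on each arm of a branch is recognized as an epilogue release.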

rdar://problem/26446587
2016-07-11 14:06:06 -07:00
Erik Eckstein
dda1749f96 EscapeAnalysis: fix handling of return tuples from "array.uninitialized" semantic calls.
fixes rdar://problem/27033210
2016-07-01 11:05:23 +02:00
Xin Tong
35471ab345 Merge pull request #2310 from trentxintong/SFSO
Simplify function signature optimization and fix a memory leak in function signature.
2016-05-25 15:11:12 -07:00
Xin Tong
fb3eb0b646 Simplify function signature optimization.
Several functionalities have been added to FSO over time and the logic has become
muddled.

We were always looking at a static image of the SIL and trying to reason about what kind of
function-signature-related optimizations we can do.

This can easily lead to muddled logic, e.g. we need to consider 2 different function
signature optimizations together instead of independently.

Split the single function that does all sorts of different analyses in FSO into several
small transformations, each of which does a specific job. After every analysis, we produce
a new function, and eventually we collapse all intermediate thunks into a single thunk.

With this change, it will be easier to implement function signature optimizations, as
we can now do them independently.

Small modifications to the test cases.
2016-05-25 11:12:27 -07:00
Xin Tong
b5b905e3cc Bring back rc identity cache.
This takes off 0.7% of the 27.7% of the time we spent in SILoptimizer when building
Stdlib.
2016-05-25 09:25:27 -07:00
practicalswift
5a3067e24e [gardening] Fix plural issues. 2016-05-21 18:45:31 +02:00
Mark Lacey
921dededad Use a bump pointer allocator in the callee set creation.
Shaves about 19% off the time spent constructing these sets. The
SmallVector size was chosen to minimize the number of dynamic
allocations we end up doing while building the stdlib. This should be a
reasonable size for most projects, too. It's a bit wasteful in space,
but the total amount of allocated space here is pretty small to begin
with.
2016-05-11 17:07:27 -07:00
Arnold Schwaighofer
4df87a6554 Refactor unsafeGuaranteed code into utility functions.
NFC.
2016-05-08 08:10:43 -07:00
practicalswift
9a078b54ef [gardening] Fix recently introduced typo: "a executable" → "an executable"
[gardening] Fix recently introduced typo: "a offset" → "an offset"
[gardening] Fix recently introduced typo: "accessiblity" → "accessibility"
[gardening] Fix recently introduced typo: "cant" → "can't"
[gardening] Fix recently introduced typo: "inteference" → "interference"
[gardening] Fix recently introduced typo: "unsatified" → "unsatisfied"
[gardening] Remove accidental space.
2016-04-24 22:11:59 +02:00
Xin Tong
49f1c66d7b Rename mayUseValue to mayHaveSymmetricInteference 2016-04-19 15:23:45 -07:00
Xin Tong
51b1c0bc68 Implement retain, release code motion.
Iterative data flow retain sinking and release hoisting.

This allows us to sink retains and hoist releases across harmless loops, which is
an improvement on the SILCodeMotion retain sinking and release hoisting.

It also separates the duty of moving retains and releases from the duty of eliminating them
in ASO.

This should eventually replace RR code motion in SILCodeMotion and the insertion point
in ARC sequence opts (ASO).

This is the performance difference I get with retain sinking and release hoisting,
after disabling retain/release code motion in ASO and SILCodeMotion. We can start to take
that code out once this lands.
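
To make the idea of retain sinking concrete, here is a minimal single-block sketch. It does not model the iterative data flow or the loop handling this commit adds; the instruction encoding and the barrier rules in `blocks_retain` are assumptions chosen to be conservative.

```python
BARRIER_OPS = {"release", "call", "return"}  # assumed to stop any retain


def blocks_retain(inst, value):
    """True if a retain of `value` must not be moved below `inst`."""
    op, operand = inst
    return operand == value or op in BARRIER_OPS


def sink_retains(insts):
    """Move each ("retain", v) as far down the block as it can safely go."""
    result = list(insts)
    # Walk bottom-up so retains that were already sunk stay in place.
    for i in range(len(result) - 2, -1, -1):
        op, value = result[i]
        if op != "retain":
            continue
        j = i
        while j + 1 < len(result) and not blocks_retain(result[j + 1], value):
            result[j], result[j + 1] = result[j + 1], result[j]
            j += 1
    return result


block = [("retain", "a"), ("use", "b"), ("use", "c"), ("release", "a")]
sunk = sink_retains(block)
# sunk == [("use", "b"), ("use", "c"), ("retain", "a"), ("release", "a")]
```

After sinking, the retain sits directly above the matching release, which is the shape a separate elimination step can then remove. That is the division of labor described above: one pass only moves, the other only eliminates.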

I see that we go from 24.5% of time spent in SILOptimizations w.r.t. the whole stdlib compilation
to 25.1%.

Overall the result is an improvement (i.e. retain sinking and release hoisting result in a net performance gain).

<details open>
  <summary>Regression (7)</summary>

TEST                                                    | OLD_MIN | NEW_MIN | DELTA (%) | SPEEDUP
---                                                     | ---     | ---     | ---       | ---
SetIsSubsetOf                                           | 441     | 510     | +15.7%    | **0.86x**
SetIntersect                                            | 1041    | 1197    | +15.0%    | **0.87x**
BenchLangCallingCFunction                               | 184     | 211     | +14.7%    | **0.87x**
Sim2DArray                                              | 326     | 372     | +14.1%    | **0.88x**
SetIsSubsetOf_OfObjects                                 | 498     | 567     | +13.9%    | **0.88x**
GeekbenchGEMM                                           | 945     | 1022    | +8.2%     | **0.92x**
COWTree                                                 | 3839    | 4181    | +8.9%     | **0.92x(?)**

</details>

<details>
  <summary>Improvement (31)</summary>

TEST                                                    | OLD_MIN | NEW_MIN | DELTA (%) | SPEEDUP
---                                                     | ---     | ---     | ---       | ---
ObjectiveCBridgeFromNSDictionaryAnyObjectToString       | 174526  | 165392  | -5.2%     | **1.06x**
RGBHistogram                                            | 3128    | 2957    | -5.5%     | **1.06x**
ObjectiveCBridgeToNSDictionary                          | 16510   | 15494   | -6.2%     | **1.07x**
LuhnAlgoLazy                                            | 2294    | 2120    | -7.6%     | **1.08x**
DictionarySwapOfObjects                                 | 6477    | 5994    | -7.5%     | **1.08x**
StringRemoveDupes                                       | 1610    | 1485    | -7.8%     | **1.08x**
ObjectiveCBridgeFromNSSetAnyObjectToString              | 159358  | 147824  | -7.2%     | **1.08x**
ObjectiveCBridgeToNSSet                                 | 16191   | 14924   | -7.8%     | **1.08x**
DictionaryHashableClass                                 | 1839    | 1704    | -7.3%     | **1.08x**
DictionaryLiteral                                       | 2906    | 2678    | -7.8%     | **1.09x(?)**
StringUtilsUnderscoreCase                               | 10031   | 9187    | -8.4%     | **1.09x**
LuhnAlgoEager                                           | 2320    | 2113    | -8.9%     | **1.10x**
ObjectiveCBridgeFromNSSetAnyObjectToStringForced        | 99553   | 90348   | -9.2%     | **1.10x**
RIPEMD                                                  | 3327    | 3009    | -9.6%     | **1.11x**
Combos                                                  | 595     | 538     | -9.6%     | **1.11x**
Roman                                                   | 10      | 9       | -10.0%    | **1.11x**
StringUtilsCamelCase                                    | 10783   | 9646    | -10.5%    | **1.12x**
SetIntersect_OfObjects                                  | 2511    | 2182    | -13.1%    | **1.15x**
SwiftStructuresTrie                                     | 28331   | 24339   | -14.1%    | **1.16x**
Dictionary2OfObjects                                    | 3748    | 3115    | -16.9%    | **1.20x**
DictionaryOfObjects                                     | 2473    | 2050    | -17.1%    | **1.21x**
Dictionary                                              | 894     | 737     | -17.6%    | **1.21x**
Dictionary2                                             | 2268    | 1859    | -18.0%    | **1.22x**
StringIteration                                         | 8027    | 6344    | -21.0%    | **1.27x**
Phonebook                                               | 8207    | 6436    | -21.6%    | **1.28x**
BenchLangArray                                          | 119     | 91      | -23.5%    | **1.31x**
LinkedList                                              | 8267    | 6297    | -23.8%    | **1.31x**
StrToInt                                                | 5585    | 4180    | -25.2%    | **1.34x**
Dictionary3OfObjects                                    | 1122    | 831     | -25.9%    | **1.35x**
Dictionary3                                             | 731     | 515     | -29.6%    | **1.42x**
SuperChars                                              | 513353  | 258735  | -49.6%    | **1.98x**

</details>
2016-04-18 15:39:17 -07:00
Xin Tong
d84de12943 Revert "Change FSO explosion heuristic"
This reverts commit fa09c6b71d.

Broke the Linux build. Also, the PR "please benchmark" run does not seem to catch it.
2016-04-14 11:05:00 -07:00
Xin Tong
fa09c6b71d Change FSO explosion heuristic
If we cannot find the epilogue releases for all the fields with
reference semantics, but we found them for some fields, explode the argument.

I do not see a performance improvement with this change.

rdar://25451364
2016-04-13 19:40:53 -07:00
Xin Tong
1a4f567685 More conservative about when we can move a release across an instruction
We now consider effect of deinit in addition to the released value.

rdar://25362826

This is the only 10%+ regression I measured on my machine. No performance improvements.

Sim2DArray                                              | 326     | 366     | +12.3%    | **0.89x**
2016-04-12 20:39:30 -07:00
Roman Levenstein
2e77b3990b Add [nonatomic] attribute to all SIL reference counting instructions. 2016-04-06 01:52:43 -07:00
practicalswift
abfecfde17 [gardening] if ([space]…[space]) → if (…), for(…) → for (…), while(…) → while (…), [[space]x, y[space]] → [x, y] 2016-04-04 16:22:11 +02:00
Ge Sen
5ad36b2962 [gardening] Put white spaces in between if/while clauses and braces where it is missing.
For instance:

'if (foo){' => 'if (foo) {'
2016-04-02 14:43:45 +08:00
Erik Eckstein
0ea4fe7b98 EscapeAnalysis: fix a bug in graph merging
This bug can end up in doing wrong stack promotion.
2016-03-29 16:33:47 -07:00
Erik Eckstein
66cd115456 SideEffectAnalysis: fix handling of indirect parameters to partial_apply
Fixes rdar://problem/24960559
2016-03-28 15:59:01 -07:00
practicalswift
d00a5ef814 [gardening] Weekly gardening: typos, duplicate includes, header formatting, etc. 2016-03-24 22:41:10 +01:00
Xin Tong
e0ba695d17 Merge pull request #1852 from trentxintong/FSO
Remove function signature rewriter and make function signature analysis a Util
2016-03-24 12:42:05 -07:00
Xin Tong
9a3761000c Move function signature analysis to a Util
We really only need this signature analysis in the cloner pass now.
2016-03-24 11:17:47 -07:00
Xin Tong
0a562b7fe1 Merge pull request #1847 from trentxintong/FSO
Make FSO thunks always_inline.
2016-03-24 10:38:38 -07:00
Xin Tong
2a63907a17 Make FSO thunks always_inline.
This forces the callsites to be rewritten by the inliner.

We have the issue that the thunk changes between the time it is created and
the time it is reread to figure out what we have done to the original function.

This results in missed opportunities.

This solution solves the problem gracefully, because the thunk carries the information
on how to set up the call to the optimized functions.

Inlining the thunk makes the callsite call the optimized function for free, i.e.
without any rewriting.

I did not measure any regression with this change.
2016-03-24 09:18:13 -07:00
Xin Tong
7ff5156cc2 Merge pull request #1827 from trentxintong/FSO
Minor refactor in Epilogue Retain/Release matchers
2016-03-24 08:48:04 -07:00
Arnold Schwaighofer
7fb2cceec0 Add a method to _NSContiguousString to facilitate stack promotion
Use it for hashing and comparison.

During String's hashValue and comparison function we create a
_NSContiguousString instance to call Foundation's hash/compare function. This is
expensive because we have to allocate and deallocate a short-lived object on the
heap (and deallocation for Swift objects is expensive). Instead, help the
optimizer to allocate this object on the stack.

Introduces two functions on the internal _NSContiguousString:
_unsafeWithNotEscapedSelfPointer and _unsafeWithNotEscapedSelfPointerPair that
pass the _NSContiguousString instance as an opaque pointer to their closure
argument. Usage of these functions asserts that the closure will not escape
objects transitively reachable from the opaque pointer.

We then use those functions to call into the runtime to call Foundation
functions on the passed strings. The optimizer can promote the strings to the
stack because of the assertion this API makes.

  let lhsStr = _NSContiguousString(self._core) // will be promoted to the stack.
  let rhsStr = _NSContiguousString(rhs._core) // will be promoted to the stack.
  let res = lhsStr._unsafeWithNotEscapedSelfPointerPair(rhsStr) {
    return _stdlib_compareNSStringDeterministicUnicodeCollationPointer($0, $1)
  }

Tested by existing String tests.

We should see some nice performance improvements for string comparison and
dictionary benchmarks.

Here is what I measured at -O on my machine:

Name                          Speedup
Dictionary                      2.00x
Dictionary2                     1.45x
Dictionary2OfObjects            1.20x
Dictionary3                     1.50x
Dictionary3OfObjects            1.45x
DictionaryOfObjects             1.40x
SuperChars                      1.60x

rdar://22173647
2016-03-24 05:43:29 -07:00
Xin Tong
9a020c8c7a Minor refactoring in epilogue retain matcher 2016-03-23 22:16:49 -07:00
Xin Tong
b1c7bc5e4b Reinstate "Minor refactoring in epilogue retain matcher" 2016-03-23 22:16:34 -07:00
Xin Tong
6e07c5ec60 Revert "Minor refactoring in epilogue release matcher. NFC"
This reverts commit a191ae72a7.

Broke Opt+Assert, Stdlib DebInfo+Assert.
2016-03-21 11:08:31 -07:00
Xin Tong
a191ae72a7 Minor refactoring in epilogue release matcher. NFC 2016-03-20 23:13:50 -07:00
Xin Tong
cff61d7fe7 Implement a function signature cloner and rewriter.
This splits the function signature module pass into 2 function passes.

Doing so allows us to rewrite call sites to use the FSO-optimized
function prior to attempting inlining, while still allowing us to do a substantial
amount of optimization on the current function before attempting to do
FSO on that function.

It also helps us move to a model in which a module pass is NOT used unless
necessary.

I see neither regressions nor improvements on the performance test suite.

functionsignopts.sil and functionsignopt_sroa.sil are modified because the
mangler now takes into account information in the projection tree.
2016-03-19 23:57:37 -07:00
Andrew Trick
f6a2e7c362 [comment] Clarify RC identity over casts. 2016-03-18 04:01:16 -07:00
Xin Tong
fd353df19e Remove some unneeded functionality in CallerAnalysis
We really only need the analysis to tell whether a function has a caller
inside the module or not. We do not need to know the callsites.

Remove them for now to make the analysis more memory efficient.

Add a note to indicate it can be extended.
2016-03-17 21:16:24 -07:00
Xin Tong
eab029d795 Add CallerAnalysis Printer.
This provides some basic testing on CallerAnalysis before hooking it
up to function signature opts.
2016-03-17 10:51:16 -07:00
Xin Tong
6b9cde8ffd Fix typo 2016-03-16 18:00:07 -07:00
Xin Tong
cca9c2521a Improve CallerAnalysis.
Address the comments from 0acc0a8464

I still have not made up my mind how to handle deleted functions.

CallerAnalysis is not hooked up to anything yet.
2016-03-16 17:49:34 -07:00
practicalswift
a934702d51 [gardening] Fix recently introduced typo: "fucntion" → "function" 2016-03-16 23:17:13 +01:00