Commit Graph

2053 Commits

Author SHA1 Message Date
Mark Lacey
921dededad Use a bump pointer allocator in the callee set creation.
Shaves about 19% of the time from the construction of these sets. The
SmallVector size was chosen to minimize the number of dynamic
allocations we end up doing while building the stdlib. This should be a
reasonable size for most projects, too. It's a bit wasteful in space,
but the total amount of allocated space here is pretty small to begin
with.
2016-05-11 17:07:27 -07:00
Roman Levenstein
73b6a38edc [sil-devirtualizer] Do not perform a speculative devirtualization for no-opt callees. 2016-05-11 16:28:50 -07:00
Arnold Schwaighofer
4df87a6554 Refactor unsafeGuaranteed code into utility functions.
NFC.
2016-05-08 08:10:43 -07:00
Erik Eckstein
d6e86b7c4b Add a new SIL pass to move conditions closer to switch_enum to enable jump threading.
For details see the comment in ConditionForwarding.cpp.
This optimization pass helps to optimize loops iterating over closed ranges, e.g. for i in 0...n { }
2016-05-05 10:34:08 -07:00
Xin Tong
57e2bdb123 Revert "Simplify function signature optimization" 2016-04-25 16:33:17 -07:00
Xin Tong
633ca2e92b Simplify function signature optimzation.
Several functionalities have been added to FSO over time and the logic has become
muddled.

We were always looking at a static image of the SIL and try to reason about what kind of
function signature related optimizations we can do.

This can easily lead to muddled logic. e.g. we need to consider 2 different function
signature optimizations together instead of independently.

Split 1 single function to do all sorts of different analyses in FSO into several
small transformations, each of which does a specific job. After every analysis, we produce
a new function and eventually we collapse all intermediate thunks to in a single thunk.

With this change, it will be easier to implement function signature optimization as now
we can do them independently now.

Minimal modifications to the test cases.
2016-04-25 15:28:51 -07:00
practicalswift
9a078b54ef [gardening] Fix recently introduced typo: "a executable" → "an executable"
[gardening] Fix recently introduced typo: "a offset" → "an offset"
[gardening] Fix recently introduced typo: "accessiblity" → "accessibility"
[gardening] Fix recently introduced typo: "cant" → "can't"
[gardening] Fix recently introduced typo: "inteference" → "interference"
[gardening] Fix recently introduced typo: "unsatified" → "unsatisfied"
[gardening] Remove accidental space.
2016-04-24 22:11:59 +02:00
Xin Tong
b27697eff1 Merge pull request #2246 from trentxintong/CodeMotion
Rename mayUseValue to mayHaveSymmetricInteference
2016-04-19 19:40:33 -07:00
Xin Tong
49f1c66d7b Rename mayUseValue to mayHaveSymmetricInteference 2016-04-19 15:23:45 -07:00
Xin Tong
3824a9479e Merge pull request #2064 from trentxintong/CodeMotion
implement retain release code motion.
2016-04-18 18:16:50 -07:00
swift-ci
48e0aac9a6 Merge pull request #2234 from trentxintong/IP 2016-04-18 16:41:10 -07:00
Xin Tong
51b1c0bc68 Implement retain, release code motion.
Iterative data flow retain sinking and release hoisting.

This allows us to sink retains and hoist releases across harmless loops. which is
an improvement on the SILCodeMotion retain sinking and release hoisting.

It also separates the duty of moving retain and release with the duty of eliminating them
in ASO.

This should eventually replace RR code motion in SILcodemotion and insertion point
in ARCsequence opts (ASO).

This is the performance difference i get with retain sinking and release hoisting.
After disabling retain release code motion in ASO and SILCodeMotion. we can start to take
those code out once this lands.

I see that we go from 24.5% of time spent in SILOptimizations w.r.t. the whole stdlib compilation
to 25.1%.

Improvement is better (i.e. retain sinking and hoisting releases result in performance gain).

<details open>
  <summary>Regression (7)</summary>

TEST                                                    | OLD_MIN | NEW_MIN | DELTA (%) | SPEEDUP
---                                                     | ---     | ---     | ---       | ---
SetIsSubsetOf                                           | 441     | 510     | +15.7%    | **0.86x**
SetIntersect                                            | 1041    | 1197    | +15.0%    | **0.87x**
BenchLangCallingCFunction                               | 184     | 211     | +14.7%    | **0.87x**
Sim2DArray                                              | 326     | 372     | +14.1%    | **0.88x**
SetIsSubsetOf_OfObjects                                 | 498     | 567     | +13.9%    | **0.88x**
GeekbenchGEMM                                           | 945     | 1022    | +8.2%     | **0.92x**
COWTree                                                 | 3839    | 4181    | +8.9%     | **0.92x(?)**

</details>

<details >
  <summary>Improvement (31)</summary>

TEST                                                    | OLD_MIN | NEW_MIN | DELTA (%) | SPEEDUP
---                                                     | ---     | ---     | ---       | ---
ObjectiveCBridgeFromNSDictionaryAnyObjectToString       | 174526  | 165392  | -5.2%     | **1.06x**
RGBHistogram                                            | 3128    | 2957    | -5.5%     | **1.06x**
ObjectiveCBridgeToNSDictionary                          | 16510   | 15494   | -6.2%     | **1.07x**
LuhnAlgoLazy                                            | 2294    | 2120    | -7.6%     | **1.08x**
DictionarySwapOfObjects                                 | 6477    | 5994    | -7.5%     | **1.08x**
StringRemoveDupes                                       | 1610    | 1485    | -7.8%     | **1.08x**
ObjectiveCBridgeFromNSSetAnyObjectToString              | 159358  | 147824  | -7.2%     | **1.08x**
ObjectiveCBridgeToNSSet                                 | 16191   | 14924   | -7.8%     | **1.08x**
DictionaryHashableClass                                 | 1839    | 1704    | -7.3%     | **1.08x**
DictionaryLiteral                                       | 2906    | 2678    | -7.8%     | **1.09x(?)**
StringUtilsUnderscoreCase                               | 10031   | 9187    | -8.4%     | **1.09x**
LuhnAlgoEager                                           | 2320    | 2113    | -8.9%     | **1.10x**
ObjectiveCBridgeFromNSSetAnyObjectToStringForced        | 99553   | 90348   | -9.2%     | **1.10x**
RIPEMD                                                  | 3327    | 3009    | -9.6%     | **1.11x**
Combos                                                  | 595     | 538     | -9.6%     | **1.11x**
Roman                                                   | 10      | 9       | -10.0%    | **1.11x**
StringUtilsCamelCase                                    | 10783   | 9646    | -10.5%    | **1.12x**
SetIntersect_OfObjects                                  | 2511    | 2182    | -13.1%    | **1.15x**
SwiftStructuresTrie                                     | 28331   | 24339   | -14.1%    | **1.16x**
Dictionary2OfObjects                                    | 3748    | 3115    | -16.9%    | **1.20x**
DictionaryOfObjects                                     | 2473    | 2050    | -17.1%    | **1.21x**
Dictionary                                              | 894     | 737     | -17.6%    | **1.21x**
Dictionary2                                             | 2268    | 1859    | -18.0%    | **1.22x**
StringIteration                                         | 8027    | 6344    | -21.0%    | **1.27x**
Phonebook                                               | 8207    | 6436    | -21.6%    | **1.28x**
BenchLangArray                                          | 119     | 91      | -23.5%    | **1.31x**
LinkedList                                              | 8267    | 6297    | -23.8%    | **1.31x**
StrToInt                                                | 5585    | 4180    | -25.2%    | **1.34x**
Dictionary3OfObjects                                    | 1122    | 831     | -25.9%    | **1.35x**
Dictionary3                                             | 731     | 515     | -29.6%    | **1.42x**
SuperChars                                              | 513353  | 258735  | -49.6%    | **1.98x**
2016-04-18 15:39:17 -07:00
Xin Tong
bfc9683b49 Use a SmallPtrSet instead of a DenseSet. More memory efficient 2016-04-18 14:54:39 -07:00
practicalswift
0c89048988 [gardening] Fix recently introduced typo: "transistive" → "transitive" 2016-04-14 22:26:44 +02:00
Xin Tong
31b6c65039 Fix a logic error in eraseUseOfValue.
I failed to create a test case. And we hav existing tests in FSO that will
exercise this.

rdar://25559780
2016-04-13 20:53:28 -07:00
Erik Eckstein
3e52d24853 add a debug dump function for ValueLifetimeAnalysis 2016-04-13 13:22:30 -07:00
Xin Tong
1a4f567685 More conservative about when we can move a release across an instruction
We now consider effect of deinit in addition to the released value.

rdar://25362826

This is the only 10%+ regression i measured on my machine. no performance improvement.

Sim2DArray                                              | 326     | 366     | +12.3%    | **0.89x**
2016-04-12 20:39:30 -07:00
Erik Eckstein
6cdfc2e469 EscapeAnalysis: Make the CGNode class public. It's used by clients. NFC. 2016-04-08 10:20:47 -07:00
Slava Pestov
5aa99fa346 SILOptimizer: Create non-[fragile] specializations of [fragile] functions where possible
Change the optimizer to only make specializations [fragile] if both the
original callee is [fragile] *and* the caller is [fragile].

Otherwise, the specialized callee might be [fragile] even if it is never
called from a [fragile] function, which inhibits the optimizer from
devirtualizing calls inside the specialization.

This opens up some missed optimization opportunities in the performance
inliner and devirtualization, which currently reject fragile->non-fragile
references:

TEST                                                    | OLD_MIN | NEW_MIN | DELTA (%) | SPEEDUP
---                                                     | ---     | ---     | ---       | ---
DictionaryRemoveOfObjects                               | 38391   | 35859   | -6.6%     | **1.07x**
Hanoi                                                   | 5853    | 5288    | -9.7%     | **1.11x**
Phonebook                                               | 18287   | 14988   | -18.0%    | **1.22x**
SetExclusiveOr_OfObjects                                | 20001   | 15906   | -20.5%    | **1.26x**
SetUnion_OfObjects                                      | 16490   | 12370   | -25.0%    | **1.33x**

Right now, passes other than performance inlining and devirtualization
of class methods are not checking invariants on [fragile] functions
at all, which was incorrect; as part of the work on building the
standard library with -enable-resilience, I added these checks, which
regressed performance with resilience disabled. This patch makes up for
these regressions.

Furthermore, once SIL type lowering is aware of resilience, this will
allow the stack promotion pass to make further optimizations after
specializing [fragile] callees.
2016-04-08 02:10:31 -07:00
practicalswift
abfecfde17 [gardening] if ([space]…[space]) → if (…), for(…) → for (…), while(…) → while (…), [[space]x, y[space]] → [x, y] 2016-04-04 16:22:11 +02:00
Mark Lacey
99d4485713 Fix double delete in generic specialization.
We ended up adding the same instruction twice to a SmallVector of
instructions to be deleted. To avoid this, we'll track these
to-be-deleted instructions in a SmallSetVector instead.

We were also failing to add an instruction that we can delete to the set
of instructions to be deleted, so I fixed that as well.

I've added a test case, but it's currently disabled because fixing this
turned up another issue in the same code which I still need to take a
look at.

Fixes rdar://problem/25369617.
2016-03-30 13:10:00 -07:00
Xin Tong
f95d9b3c92 Change FSO heuristic.
FSO functions that have high potential but does not have caller inside
current module.

The thunk can then be inlined into the module calling the function and
the function would get the benefit of FSO.

The heuristic for selecting such function is
1. Have no indirect caller. This would introduce a thunk.
2. Have potential to give better performance. i.e. function argument can
be O2G.

Regression
TEST                                                    | OLD_MIN | NEW_MIN | DELTA (%) | SPEEDUP
---                                                     | ---     | ---     | ---       | ---
BenchLangCallingCFunction                               | 184     | 211     | +14.7%    | **0.87x**
Calculator                                              | 55      | 59      | +7.3%     | **0.93x**
DeadArray                                               | 687     | 741     | +7.9%     | **0.93x**
MonteCarloPi                                            | 39275   | 41669   | +6.1%     | **0.94x**

Improvement
TEST                                                    | OLD_MIN | NEW_MIN | DELTA (%) | SPEEDUP
---                                                     | ---     | ---     | ---       | ---
LuhnAlgoLazy                                            | 2478    | 2327    | -6.1%     | **1.06x**
OpenClose                                               | 54      | 51      | -5.6%     | **1.06x**
SortLettersInPlace                                      | 1016    | 946     | -6.9%     | **1.07x**
ObjectiveCBridgeFromNSDictionaryAnyObjectToStringForced | 149993  | 139755  | -6.8%     | **1.07x**
Phonebook                                               | 9666    | 8992    | -7.0%     | **1.07x**
ObjectiveCBridgeFromNSDictionaryAnyObjectToString       | 222713  | 206538  | -7.3%     | **1.08x**
LuhnAlgoEager                                           | 2393    | 2226    | -7.0%     | **1.08x**
Dictionary                                              | 1307    | 1196    | -8.5%     | **1.09x**
JSONHelperDeserialize                                   | 3808    | 3492    | -8.3%     | **1.09x**
StdlibSort                                              | 7310    | 4084    | -44.1%    | **1.79x**

I see 0.15% increase in code size for Benchmark_O.

Thanks @gottesmm for suggesting this opportunity.

rdar://25345056
2016-03-29 23:04:36 -07:00
Arnold Schwaighofer
255779082e Add a peephole optimization for the builtin "unsafeGuaranteed"
We can remove the retain/release pair preceeding the builtins based on the
knowledge that the lifetime of the reference is guaranteed by someone hanging on
to the reference elsewhere.
2016-03-27 06:47:16 -07:00
Slava Pestov
a9ad760b78 SIL: Clean up duplicated "can be referenced from a fragile function" checks 2016-03-25 22:46:50 -07:00
Chris Lattner
e4c7bca43a Merge pull request #1861 from practicalswift/weekly-cleanup-01
[gardening] Weekly gardening: typos, duplicate includes, header formatting, etc.
2016-03-24 22:39:36 -07:00
Xin Tong
f557a3253d Merge pull request #1857 from trentxintong/FSO
Rename FunctionSignatureOptCloner to FunctionSignatureOpts
2016-03-24 15:57:34 -07:00
practicalswift
d00a5ef814 [gardening] Weekly gardening: typos, duplicate includes, header formatting, etc. 2016-03-24 22:41:10 +01:00
Xin Tong
5907b8a3e2 Rename FunctionSignatureOptCloner to FunctionSignatureOpts
Eventually, we decided to do this

1. Have the function signature opts (used to be called the cloner to create
the optimized function.
2. Mark the thunk as always_inline
3. Rely on the inliner to inline the thunk to get the benefit of calling optimized
function directly.
2016-03-24 12:50:12 -07:00
Xin Tong
e0ba695d17 Merge pull request #1852 from trentxintong/FSO
Remove function signature rewriter and make function signature analysis a Util
2016-03-24 12:42:05 -07:00
Xin Tong
9a3761000c Move function signature analysis to a Util
We really only need this signature analysis in the cloner pass now.
2016-03-24 11:17:47 -07:00
Xin Tong
3f075dfe47 Remove function signature rewriter.
We decided to use the inliner to rewrite the caller's callsites.

And eventually I will turn FunctionSignatureAnalysis into a Utility.
As its data should only be used and kept in the cloner pass.
2016-03-24 10:50:47 -07:00
Xin Tong
c44006aa9d Merge pull request #1824 from trentxintong/RLE
Make sure non-epilogue releases do not kill redundant loads
2016-03-24 10:48:21 -07:00
Xin Tong
524ed34583 Make sure epilogue releases do not kill redundant loads
I did not measure a performance improvements with this.
2016-03-23 23:59:54 -07:00
Xin Tong
9a020c8c7a Minor refactoring in epilogue retain matcher 2016-03-23 22:16:49 -07:00
Xin Tong
b1c7bc5e4b Reinstate "Minor refactoring in epilogue retain matcher" 2016-03-23 22:16:34 -07:00
Andrew Trick
482b264afc Reapply "Merge pull request #1725 from atrick/specialize"
This was mistakenly reverted in an attempt to fix buildbots.
Unfortunately it's now smashed into one commit.

---
Introduce @_specialize(<type list>) internal attribute.

This attribute can be attached to generic functions. The attribute's
arguments must be a list of concrete types to be substituted in the
function's generic signature. Any number of specializations may be
associated with a generic function.

This attribute provides a hint to the compiler. At -O, the compiler
will generate the specified specializations and emit calls to the
specialized code in the original generic function guarded by type
checks.

The current attribute is designed to be an internal tool for
performance experimentation. It does not affect the language or
API. This work may be extended in the future to add user-visible
attributes that do provide API guarantees and/or direct dispatch to
specialized code.

This attribute works on any generic function: a freestanding function
with generic type parameters, a nongeneric method declared in a
generic class, a generic method in a nongeneric class or a generic
method in a generic class. A function's generic signature is a
concatenation of the generic context and the function's own generic
type parameters.

e.g.

struct S<T> {
var x: T
@_specialize(Int, Float)
mutating func exchangeSecond<U>(u: U, _ t: T) -> (U, T) {
x = t
return (u, x)
}
}
// Substitutes: <T, U> with <Int, Float> producing:
// S<Int>::exchangeSecond<Float>(u: Float, t: Int) -> (Float, Int)

---
[SILOptimizer] Introduce an eager-specializer pass.

This pass finds generic functions with @_specialized attributes and
generates specialized code for the attribute's concrete types. It
inserts type checks and guarded dispatch at the beginning of the
generic function for each specialization. Since we don't currently
expose this attribute as API and don't specialize vtables and witness
tables yet, the only way to reach the specialized code is by calling
the generic function which performs the guarded dispatch.

In the future, we can build on this work in several ways:
- cross module dispatch directly to specialized code
- dynamic dispatch directly to specialized code
- automated specialization based on less specific hints
- partial specialization
- and so on...

I reorganized and refactored the optimizer's generic utilities to
support direct function specialization as opposed to apply
specialization.
2016-03-21 12:43:05 -07:00
Xin Tong
6e07c5ec60 Revert "Minor refactoring in epilogue release matcher. NFC"
This reverts commit a191ae72a7.

Broke Opt+Assert, Stdlib DebInfo+Assert.
2016-03-21 11:08:31 -07:00
Xin Tong
b2b5247ba9 Merge pull request #1756 from trentxintong/FSO
Minor refactoring in epilogue release matcher
2016-03-21 07:59:46 -07:00
Xin Tong
a191ae72a7 Minor refactoring in epilogue release matcher. NFC 2016-03-20 23:13:50 -07:00
Xin Tong
570c19b9c6 Merge pull request #1754 from trentxintong/FSO
Remove function signature optimization module pass.
2016-03-20 15:55:50 -07:00
Xin Tong
53888e12b5 Remove FunctionSignatureOpts.cpp.
This optimization pass has been replaced by FunctionSigatureOptCloner.cpp
and FunctionSigatureOptRewriter.cpp in cff61d7fe7
2016-03-20 15:05:02 -07:00
Xin Tong
e3ec0703fd Merge pull request #1744 from trentxintong/FSO
Implement a function signature cloner and rewriter.
2016-03-20 11:44:54 -07:00
practicalswift
a0d494c143 [gardening] Fix recently introduced typos: "fucntion" → "function", "functio" → "function", "mergable" → "mergeable", "mistmatched" → "mismatched" 2016-03-20 10:34:32 +01:00
Xin Tong
cff61d7fe7 Implement a function signature cloner and rewriter.
This split the function signature module pass into 2 functin passes.

By doing so,  this allows us to rewrite to using the FSO-optimized
function prior to attempting inlining, but allow us to do a substantial
amount of optimization on the current function before attempting to do
FSO on that function.

And also helps us to move to a model which module pass is NOT used unless
necesary.

I do not see regression nor improvement for on the performance test suite.

functionsignopts.sil and functionsignopt_sroa.sil are modified because the
mangler now takes into account of information in the projection tree.
2016-03-19 23:57:37 -07:00
Andrew Trick
5bda28e1cb Revert "Merge pull request #1725 from atrick/specialize"
Temporarily reverting @_specialize because stdlib unit tests are
failing on an internal branch during deserialization.

This reverts commit e2c43cfe14, reversing
changes made to 9078011f93.
2016-03-18 22:31:29 -07:00
Erik Eckstein
6d654aa3e8 Debugging on SIL level.
This change follows up on an idea from Michael (thanks!).
It enables debugging and profiling on SIL level, which is useful for compiler debugging.

There is a new frontend option -gsil which lets the compiler write a SIL file and generated debug info for it.
For details see docs/DebuggingTheCompiler.rst and the comments in SILDebugInfoGenerator.cpp.
2016-03-18 14:02:06 -07:00
Andrew Trick
e2c43cfe14 Merge pull request #1725 from atrick/specialize
@_specialize attribute
2016-03-18 13:24:31 -07:00
Andrew Trick
295dc96fb6 [SILOptimizer] Introduce an eager-specializer pass.
This pass finds generic functions with @_specialized attributes and
generates specialized code for the attribute's concrete types. It
inserts type checks and guarded dispatch at the beginning of the
generic function for each specialization. Since we don't currently
expose this attribute as API and don't specialize vtables and witness
tables yet, the only way to reach the specialized code is by calling
the generic function which performs the guarded dispatch.

In the future, we can build on this work in several ways:
- cross module dispatch directly to specialized code
- dynamic dispatch directly to specialized code
- automated specialization based on less specific hints
- partial specialization
- and so on...

I reorganized and refactored the optimizer's generic utilities to
support direct function specialization as opposed to apply
specialization.
2016-03-18 10:18:55 -07:00
Xin Tong
fd353df19e Remove some of unneeded functionality in CallerAnalysis
We really only need the analysis to tell whether a function has caller
inside the module or not. We do not need to know the callsites.

Remove them for now to make the analysis more memory efficient.

Add a note to indicate it can be extended.
2016-03-17 21:16:24 -07:00
Xin Tong
f543c336e7 Use SetVector instead of a SmallVector+DenseMap in CallerAnalysis 2016-03-17 17:31:38 -07:00