Augment the `isSingleRetainablePointer` check that allows IRGen to avoid adding branching around retain/release operations on enums that use the null pointer extra inhabitant with a more general "can value witness extra inhabitants" method on TypeInfo, which says whether a type's retain/release operations are safe to invoke on some or all of its extra inhabitants. This lets us generalize the optimization to include things like `String?` or `ClassProtocol?` which are common types with a nullable pointer in them.
This change modifies spare bit masks so that they are arranged in
the byte order of the target platform. It also modifies and
consolidates the code that gathers and scatters bits into enum
values.
All enum-related validation tests are now passing on IBM Z (s390x)
which is a big-endian platform.
This change uses the 'gather bits' functionality of enum payloads
to create a contiguous value to switch over. This allows us to
remove the code that currently attempts to build a switch statement
by comparing each element in the payload in turn.
The downside of this technique is that we may do more work up front
gathering bits and we may also need to compare larger values in some
situations. The upside is that we can remove a lot of complicated
code from IRGen. Also, we pass the responsibility for multi-way
branch generation to LLVM which can make use of a wider range of
switch lowering strategies than IRGen can sensibly support.
Add a new scatterBits function that is simpler and more generic
than the old interleaveSpareBits function. It is essentially a
constant version of the emitScatterBits function.
The change replaces 'set bit enumeration' with arithmetic
and bitwise operations. For example, the formula
'(((x & -x) + x) & x) ^ x' can be used to find the rightmost
contiguous bit mask. This is essentially the operation that
SetBitEnumerator.findNext() performed.
Removing this functionality reduces the complexity of the
ClusteredBitVector (a.k.a. SpareBitVector) implementation and,
more importantly, API which will make it easier to modify
the implementation of spare bit masks going forward. My end
goal being to make spare bit operations work more reliably on
big endian systems.
Side note:
This change modifies the emit gather/scatter functions so that
they work with an APInt, rather than a SpareBitVector, which
makes these functions a bit more generic. These functions emit
instructions that are essentially equivalent to the parallel bit
extract/deposit (PEXT and PDEP) instructions in BMI2 on x86_64
(although we don't emit those directly currently). They also map
well to bitwise manipulation instructions on other platforms (e.g.
RISBG on IBM Z). So we might find uses for them outside spare bit
manipulation in the future.
This is essentially a long-belated follow-up to Arnold's #12606.
The key observation here is that the enum-tag-single-payload witnesses
are strictly more powerful than the XI witnesses: you can simulate
the XI witnesses by using an extra case count that's <= the XI count.
Of course the result is less efficient than the XI witnesses, but
that's less important than overall code size, and we can work on
fast-paths for that.
The extra inhabitant count is stored in a 32-bit field (always present)
following the ValueWitnessFlags, which now occupy a fixed 32 bits.
This inflates non-XI VWTs on 32-bit targets by a word, but the net effect
on XI VWTs is to shrink them by two words, which is likely to be the
more important change. Also, being able to access the XI count directly
should be a nice win.
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.
Patch produced by
for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done
This allows us to layout-optimize Optional<T> when T is a struct with an
extra-inhabitant-bearing field anywhere in its definition, not only at
the beginning. rdar://problem/43019427
Fixes a regression in the source compatibility suite which I had a
lot of trouble extracting into a separate test case.
Most of this patch is just moving the outlining code into a separate
file and organizing it into a helper class instead of copy/pasting
so much code. The main functional change is implicit in the difference
between collecting formal metadata and collecting it for layout, which
then is exploited in bindMetadataParameters.
As a secondary change, stop collecting metadata for class-bounded
archetypes; we don't actually need it to do value operations.
Most of the work of this patch is just propagating metadata states
throughout the system, especially local-type-data caching and
metadata-path resolution. It took a few design revisions to get both
DynamicMetadataRequest and MetadataResponse to a shape that felt
right and seemed to make everything easier.
The design is laid out pretty clearly (I hope) in the comments on
DynamicMetadataRequest and MetadataResponse, so I'm not going to
belabor it again here. Instead, I'll list out the work that's still
outstanding:
- I'm sure there are places we're asking for complete metadata where
we could be asking for something weaker.
- I need to actually test the runtime behavior to verify that it's
breaking the cycles it's supposed to, instead of just not regressing
anything else.
- I need to add something to the runtime to actually force all the
generic arguments of a generic type to be complete before reporting
completion. I think we can get away with this for now because all
existing types construct themselves completely on the first request,
but there might be a race condition there if another asks for the
type argument, gets an abstract metadata, and constructs a type with
it without ever needing it to be completed.
- Non-generic resilient types need to be switched over to an IRGen
pattern that supports initialization suspension.
- We should probably space out the MetadataStates so that there's some
space between Abstract and Complete.
- The runtime just calmly sits there, never making progress and
permanently blocking any waiting threads, if you actually form an
unresolvable metadata dependency cycle. It is possible to set up such
a thing in a way that Sema can't diagnose, and we should detect it at
runtime. I've set up some infrastructure so that it should be
straightforward to diagnose this, but I haven't actually implemented
the diagnostic yet.
- It's not clear to me that swift_checkMetadataState is really cheap
enough that it doesn't make sense to use a cache for type-fulfilled
metadata in associated type access functions. Fortunately this is not
ABI-affecting, so we can evaluate it anytime.
- Type layout really seems like a lot of code now that we sometimes
need to call swift_checkMetadataState for generic arguments. Maybe
we can have the runtime do this by marking low bits or something, so
that a TypeLayoutRef is actually either (1) a TypeLayout, (2) a known
layout-complete metadata, or (3) a metadata of unknown state. We could
do that later with a flag, but we'll need to at least future-proof by
allowing the runtime functions to return a MetadataDependency.
If a type has the same layout as one of the basic integer types, or has a single refcounted pointer representation, we can use prefab value witness tables from the runtime instead of instantiating new ones. This saves quite a bit of code size, particularly in the Apple SDK overlays, where there are lots of swift_newtype wrappers and option set structs.
All of the information contained by this field (list of property names)
is already encoded as part of the field reflection metadata and
is accessible via `swift_getFieldAt` runtime method.
- Create the value witness table as a separate global object instead
of concatenating it to the metadata pattern.
- Always pass the metadata to the runtime and let the runtime handle
instantiating or modifying the value witness table.
- Pass the right layout algorithm version to the runtime; currently
this is always "Swift 5".
- Create a runtime function to instantiate single-case enums.
Among other things, this makes the copying of the VWT, and any
modifications of it, explicit and in the runtime, which is more
future-proof.
The approach here is to split this into two cases:
- If all case payloads have a fixed size, spare bits may be
potentially used to differentiate between cases, and the
remote reflection library does not have enough information to
compute the layout itself.
However, the total size must be fixed, so IRGen just emits a
builtin type descriptor (which I need to rename to 'fixed type
descriptor' since these are also used for imported value types,
and now, certain enums).
- If at least one case has a size that depends on a generic
parameter or is a resilient type, IRGen does not know the size,
but this means fancy tricks with spare bits cannot be used either.
The remote reflection library uses the same approach as the
runtime, basically taking the maximum of the payload size and
alignment, and adding a tag byte.
As with single-payload enums, we produce a new kind of
RecordTypeInfo, this time with a field for every enum case.
All cases start at offset zero (but of course this might change,
if for example we put the enum tag before the address point).
Also, just as with single-payload enums, there is no remote
'project case index' operation on ReflectionContext yet.
So the the main benefit from this change is that we don't entirely
give up when doing layout of class instances containing enums;
however, tools still cannot look inside the enum values themselves,
except in the simplest cases involving optionals.
Notably, the remote reflection library finally understands all
of the standard library's collection types -- Array, Character,
Dictionary, Set, and String.
Properly lower reference counting SIL instructions with nonatomic attribute as invocations of corresponding non-atomic reference counting runtime functions.
Recent changes added support for resiliently-sized enums, and
enums resilient to changes in implementation strategy.
This patch adds resilient case numbering, fixing the problem
where adding new payload cases would break existing code by
changing the numbering of no-payload cases.
The problem is that internally, enum cases are numbered with payload
cases coming first, followed by no-payload cases. While each list
is itself in declaration order, with new additions coming at the
end, we need to partition it to give us a fast runtime test for
"is this a payload or no-payload case index."
The resilient numbering strategy used here is that the getEnumTag
and destructiveInjectEnumTag value witness functions now take a
tag index in the range [-ElementsWithPayload..ElementsWithNoPayload-1].
Payload elements are numbered in *reverse* declaration order, so
adding new payload cases yields decreasing tag indices, and adding
new no-payload cases yields increasing tag indices, allowing use
sites to be resilient.
This adds the adjustment between 'fragile' and 'resilient' tag
indices in a somewhat unsatisfying manner, because the calculation
could be pushed down further into EnumImplStrategy, simplifying
both the IRGen code and the generated IR. I'll clean this up later.
In the meantime, clean up some other stuff in GenEnum.cpp, mostly
abstracting code that walks cases.
Correct format:
```
//===--- Name of file - Description ----------------------------*- Lang -*-===//
```
Notes:
* Comment line should be exactly 80 chars.
* Padding: Pad with dashes after "Description" to reach 80 chars.
* "Name of file", "Description" and "Lang" are all optional.
* In case of missing "Lang": drop the "-*-" markers.
* In case of missing space: drop one, two or three dashes before "Name of file".
If an enum is fixed-layout in our resilience domain but not
universally fixed-layout, other resilience domains will use
runtime functions to project and inject payloads.
These expect to find the payload size in the metadata, so
emit it if the enum is not universally fixed-layout.
Note that we do know the payload size, so it is constant
for us; there's no runtime call required to initialize
the metadata.
This value witness function takes an address of an enum value where the
payload has already been initialized, together with a case index, and
forms the enum value.
The formal behavior can be thought of as satisfying an identity in
relation to the existing two enum value witnesses. For any enum
value, the following is to leave the value unchanged:
tag = getEnumTag(value)
destructiveProjectEnumData(value)
destructiveInjectEnumData(value, tag)
This is the last missing piece for the inject_enum_addr SIL instruction
to handle resilient enums, allowing the implementation of an enum to be
decoupled from its uses. Also, it should be useful for dynamically
constructing enum cases with write reflection, once we get around to
doing such a thing.
The body of the value witness is emitted by a new emitStoreTag() method
on EnumImplStrategy. This is similar to the existing storeTag(), except
the case index is a value instead of a contant.
This is implemented as follows for the different enum strategies:
1) For enums consisting of a single case, this is trivial.
2) For enums where all cases are empty, stores the case index into the
payload area.
3) For enums with a single payload case, emits a call to a runtime
function. Note that for non-generic single payload enums, this could
be open-coded more efficiently, but the function still has the
correct behavior since it supports extra inhabitants and so on.
A follow-up patch will make this more efficient.
4) For multi-payload enums, there are two cases:
a) If one of the payloads is generic or resilient, the enum is
dynamically-sized, and a call to a runtime function is emitted.
b) If the entire enum is fixed-size, the value witness checks if
the case is empty or not.
If the case has a payload, the case index is swizzled into
spare bits of the payload, if any, with remaining bits going
into the extra tag area.
If the case is empty, the case index is swizzled into the
spare bits of the payload, the remaining bits of the payload,
and the extra tag area.
The implementations of emitStoreTag() duplicate existing logic in the
enum strategies, in particular case 4)b) is rather complicated.
Code cleanups are welcome here!
For example, if a @_fixed_layout struct A contains a resilient struct B
from the same module M, then inside M, A can have a fixed size, but
outside, A has a dynamic size because B is opaque. In this case, A is
not "universally fixed-size". This impacts multi-payload enums, because
if A is placed inside a multi-payload enum E which is lowered inside X,
we would get a fixed layout with spare bits, but lowering E outside of
X would yield a dynamic layout. This is incorrect.
Fix this by plumbing through a new predicate IsAlwaysFixedSize, which
is similar to IsPOD and IsBitwiseTakable, where a compound type inherits
the property if all leaf types exhibit it, and only use spare bits if
the original and substituted types have this property.
We were calling hasTypeParameter() on the interface type of the enum
element. Since enum elements are case constructors now, the interface
type was a GenericFunctionType, and since conceptually these cannot
contain free type variables, this would always return to false.
The right fix here is to pass down the unsubstituted type info and
look at the spare bits of that when doing multi-payload enum layout.
Now that this works, we can remove a FIXME that was added to patch
around this.
Fixes two issues with the original patch:
- If numCaseBits was >= 32, the high bits of the tag were shifted
by an incorrect amount. In this case, the high bits were always
zero anyway, but the invalid shift triggered an assertion in
ARM64 codegen. Fixed to not generate the offending instructions
at all, since just loading the payload value is enough.
- When the payload is very small, the number of no-payload cases
could be greater than 2**payloadBits, requiring multiple payload
tags to represent them all. There was an arithmetic error in
this case.
This re-applies r31328.
Swift SVN r31365
The payload tag discriminates between payload cases, but empty cases
are stored in the common spare bits of the payload types, so the logic
here would report all empty cases as having the first empty case selected.
And when writing tests, I didn't cover enums with multiple empty *and*
non-empty cases. Oops...
Now we emit a little bit more code to assemble the correct case index
from both the payload tag and payload value, then select between the
two values depending on the payload tag.
Fixes <rdar://problem/22192074>.
Swift SVN r31328
These will be used for reflection, and eventually to speed up generic
operations on single payload enums as well.
Progress on <rdar://problem/21739870>.
Swift SVN r30214
Using LLVM large integers to represent enum payloads has been causing compiler performance and code size problems with large types, and has also exposed a long tail of backend bugs. Replace them with an "EnumPayload" abstraction that manages breaking a large opaque binary value into chunks, along with masking, testing, and extracting typed data from the binary blob. For now, use a word-sized chunking schema always, though the architecture here is set up to eventually allow the use of an arbitrary explosion schema, which would benefit single-payload enums by allowing the payload to follow the explosion schema of the contained value.
This time, adjust the assertion in emitCompare not to perform a check before we've established that the payload is empty, since APInt doesn't have a 0-bit state and the default-constructed form is nondeterminisitic. (We should probably use a more-tailored representation for enum payload bit patterns than APInt or ClusteredBitVector.)
Swift SVN r28985
Using LLVM large integers to represent enum payloads has been causing compiler performance and code size problems with large types, and has also exposed a long tail of backend bugs. Replace them with an "EnumPayload" abstraction that manages breaking a large opaque binary value into chunks, along with masking, testing, and extracting typed data from the binary blob. For now, use a word-sized chunking schema always, though the architecture here is set up to eventually allow the use of an arbitrary explosion schema, which would benefit single-payload enums by allowing the payload to follow the explosion schema of the contained value.
Swift SVN r28982