This change modifies spare bit masks so that they are arranged in
the byte order of the target platform. It also modifies and
consolidates the code that gathers and scatters bits into enum
values.
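
As a rough illustration of what "gathering" and "scattering" bits means here,
the following is a minimal sketch that works on a single 32-bit integer and
ignores byte order entirely. The names gatherBits and scatterBits are
hypothetical and are not the functions touched by this change.

```cpp
#include <cstdint>

// Gather: pack the bits of `value` selected by `mask` into the low bits of
// the result, preserving their relative order.
inline uint32_t gatherBits(uint32_t value, uint32_t mask) {
  uint32_t result = 0;
  unsigned outBit = 0;
  for (unsigned bit = 0; bit < 32; ++bit) {
    if (mask & (1u << bit)) {
      if (value & (1u << bit))
        result |= (1u << outBit);
      ++outBit;
    }
  }
  return result;
}

// Scatter: spread the low bits of `value` out into the bit positions
// selected by `mask`. This is the inverse of gatherBits for the same mask.
inline uint32_t scatterBits(uint32_t value, uint32_t mask) {
  uint32_t result = 0;
  unsigned inBit = 0;
  for (unsigned bit = 0; bit < 32; ++bit) {
    if (mask & (1u << bit)) {
      if (value & (1u << inBit))
        result |= (1u << bit);
      ++inBit;
    }
  }
  return result;
}
```

With a mask of 0xF0, gathering 0xA0 yields 0xA and scattering 0xA yields 0xA0.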
All enum-related validation tests now pass on IBM Z (s390x),
which is a big-endian platform.
This commit centralizes the code that converts variable length
tag values, stored in enum payloads and extra tag bytes, to and
from the 4-byte integer values that the runtime uses to represent
the enum case.
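
A minimal sketch of such a conversion, assuming the tag bytes are laid out in
little-endian order and the field may be wider than 4 bytes. The helper names
loadTagLE and storeTagLE are hypothetical, not the runtime's own.

```cpp
#include <cstddef>
#include <cstdint>

// Read a tag from a field of `size` bytes. Only the first 4 bytes can
// contribute to the 32-bit result; wider fields are assumed to be zero
// beyond that point.
inline uint32_t loadTagLE(const uint8_t *src, size_t size) {
  uint32_t tag = 0;
  size_t n = size < 4 ? size : 4;
  for (size_t i = 0; i < n; ++i)
    tag |= static_cast<uint32_t>(src[i]) << (8 * i);
  return tag;
}

// Write a 32-bit tag into a field of `size` bytes, truncating it if the
// field is narrower than 4 bytes and zero-filling if it is wider.
inline void storeTagLE(uint8_t *dst, uint32_t tag, size_t size) {
  for (size_t i = 0; i < size; ++i)
    dst[i] = i < 4 ? static_cast<uint8_t>(tag >> (8 * i)) : 0;
}
```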
Note that big-endian machines currently store the tag value in
the first word of the destination, which reflects the current
behaviour of the compiler. However, I expect to change this in
the near future so that the value is stored as a true
variable-length big-endian integer. The tag value will then be
stored in the last 4 bytes of payloads rather than in the first
4 bytes, where it lives on little-endian systems.
The part of the tag stored in the payload can currently be up to
8 bytes in size (though only the 'low' 4 bytes can be non-zero).
On little-endian machines this doesn't matter: we can always just
store up to 4 bytes and zero the remaining payload bytes. On
big-endian systems, however, we may need to store more than 4 bytes.
The store implementation now mirrors the runtime code that fetches
the tag on big-endian systems, which already treats the payload tag
as an 8-byte integer.
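
To see why more than 4 bytes may need to be written, here is a minimal sketch
that assumes the payload tag field is treated as one big-endian integer of up
to 8 bytes. The helper name storePayloadTagBE is hypothetical, and the code
writes bytes explicitly so that it behaves the same on any host.

```cpp
#include <cstddef>
#include <cstdint>

// Store `tag` into a field of `size` bytes (size <= 8) as a big-endian
// integer. The non-zero bytes land at the END of the field, so for an
// 8-byte field all 8 bytes must be written: 4 leading zeros, then the tag.
inline void storePayloadTagBE(uint8_t *dst, uint32_t tag, size_t size) {
  uint64_t wide = tag; // only the low 4 bytes can be non-zero
  for (size_t i = 0; i < size; ++i) {
    // Byte i holds the (size - 1 - i)-th least significant byte.
    dst[i] = static_cast<uint8_t>(wide >> (8 * (size - 1 - i)));
  }
}
```

Storing 0x01020304 into an 8-byte field yields the bytes 00 00 00 00 01 02 03
04, whereas a little-endian store could stop after the first 4 bytes and zero
the rest.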
This is a spot fix, but in the longer term we might want to consider
refactoring this code to reduce the number of differences between
big- and little-endian implementations. For example, we could
centralise some of the copying logic and/or make the payload tag
a 4-byte field on all platforms.
This is essentially a long-belated follow-up to Arnold's #12606.
The key observation here is that the enum-tag-single-payload witnesses
are strictly more powerful than the XI witnesses: you can simulate
the XI witnesses by using an extra case count that's <= the XI count.
Of course the result is less efficient than the XI witnesses, but
that's less important than overall code size, and we can work on
fast-paths for that.
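
The simulation can be sketched on a toy type. Everything below is an
assumption made for illustration: a one-byte payload whose values 251...255
are never valid (giving 5 extra inhabitants), a tag convention where 0 means
"payload case" and 1...N name the empty cases, and hypothetical function
names. It is not the runtime's implementation.

```cpp
#include <cstdint>

// Hypothetical toy type: byte values 251...255 are extra inhabitants.
constexpr unsigned kFirstInvalid = 251;
constexpr unsigned kXICount = 256 - kFirstInvalid; // 5

// Toy enum-tag-single-payload witnesses. They only support
// emptyCases <= kXICount, so no extra tag bytes are needed.
inline unsigned getEnumTagSinglePayload(const uint8_t *value,
                                        unsigned emptyCases) {
  (void)emptyCases; // assumed <= kXICount in this sketch
  unsigned raw = *value;
  if (raw < kFirstInvalid)
    return 0;                     // payload case
  return raw - kFirstInvalid + 1; // empty case 1...kXICount
}

inline void storeEnumTagSinglePayload(uint8_t *value, unsigned whichCase,
                                      unsigned emptyCases) {
  (void)emptyCases; // assumed <= kXICount in this sketch
  if (whichCase == 0)
    return; // payload case: the payload is already in place
  *value = static_cast<uint8_t>(kFirstInvalid + whichCase - 1);
}

// The XI witnesses, simulated by passing the XI count as the number of
// empty cases. An index of -1 means "valid value, not an extra inhabitant".
inline int getExtraInhabitantIndex(const uint8_t *value) {
  return static_cast<int>(getEnumTagSinglePayload(value, kXICount)) - 1;
}

inline void storeExtraInhabitant(uint8_t *value, int index) {
  storeEnumTagSinglePayload(value, static_cast<unsigned>(index) + 1, kXICount);
}
```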
The extra inhabitant count is stored in a 32-bit field (always present)
following the ValueWitnessFlags, which now occupy a fixed 32 bits.
This inflates non-XI VWTs on 32-bit targets by a word, but the net effect
on XI VWTs is to shrink them by two words, which is likely to be the
more important change. Also, being able to access the XI count directly
should be a nice win.
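
The pair of fields described above can be pictured as follows. This is only a
sketch of the two 32-bit fields, with hypothetical struct and member names;
the rest of the value witness table is omitted.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical view of the two adjacent 32-bit fields.
struct TypeLayoutTail {
  uint32_t flags;                // the ValueWitnessFlags, now a fixed 32 bits
  uint32_t extraInhabitantCount; // always present; 0 when there are no XIs
};

// Together the fields occupy 8 bytes: one word on 64-bit targets and two
// words on 32-bit targets.
static_assert(sizeof(TypeLayoutTail) == 8, "two packed 32-bit fields");
static_assert(offsetof(TypeLayoutTail, extraInhabitantCount) == 4,
              "the XI count directly follows the flags");

// Reading the XI count is a plain field load.
inline bool hasExtraInhabitants(const TypeLayoutTail &layout) {
  return layout.extraInhabitantCount != 0;
}
```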
Previously, they would forward their unused spare bits to be used by other multi-payload enums, but
did not implement anything for single-payload extra inhabitants.
So far, single-payload enums have been implemented in terms of runtime functions
that internally emit several calls to value witnesses.
This commit adds value witnesses to get and store the enum tag, sidestepping the
need for those witness calls, as this information is statically available in
many cases.
/// int (*getEnumTagSinglePayload)(const T* enum, UINT_TYPE emptyCases)
/// Given an instance of a valid single-payload enum with a payload of this
/// witness table's type (e.g. Optional<ThisType>), get the tag of the enum.
/// void (*storeEnumTagSinglePayload)(T* enum, INT_TYPE whichCase,
/// UINT_TYPE emptyCases)
/// Given uninitialized memory for an instance of a single-payload enum with a
/// payload of this witness table's type (e.g. Optional<ThisType>), store the
/// tag.
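
For a payload type with no spare values, a pair of witnesses along these lines
might look like the sketch below. The assumptions are loud ones: a 4-byte
payload followed by a single extra tag byte, a convention where tag 0 is the
payload case and tags 1...emptyCases are the empty cases, and hypothetical
names (OptionalLikeU32, getTagU32, storeTagU32). The real witnesses operate on
untyped memory and their return convention may differ.

```cpp
#include <cstdint>

// Hypothetical layout for an Optional-like enum over a 32-bit payload that
// has no extra inhabitants: the payload, then one extra tag byte.
struct OptionalLikeU32 {
  uint32_t payload;
  uint8_t extraTag; // 0 = payload case, 1 = some empty case
};

// Get the tag: 0 for the payload case, otherwise the 1-based empty case,
// whose index is kept in the payload bytes.
inline unsigned getTagU32(const OptionalLikeU32 *e, unsigned emptyCases) {
  if (emptyCases == 0 || e->extraTag == 0)
    return 0;
  return e->payload + 1;
}

// Store the tag into memory whose payload (if any) is already in place.
inline void storeTagU32(OptionalLikeU32 *e, unsigned whichCase,
                        unsigned emptyCases) {
  (void)emptyCases; // one tag byte is enough for any count in this sketch
  if (whichCase == 0) {
    e->extraTag = 0; // payload case: leave the payload untouched
    return;
  }
  e->payload = whichCase - 1; // empty case index lives in the payload
  e->extraTag = 1;
}
```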
A simple 'for element in array' loop in generic code operating on a
ContiguousArray of Int is ~25% faster on arm64.
rdar://31408033