The uniquing key for these was just the number of witness tables,
but the function itself referenced the specific existential type
it was instantiated with.
Everything still worked because getOrCreateHelperFunction() would
bitcast an existing function to the correct type, and in practice
the layout of an existential type only depends on the number of
witness tables.
However on master-next, other changes were made that stripped
off the bitcasts. This would result in assertions or LLVM
verifier failures when multiple existential types were used in
a single translation unit.
Fixes <rdar://problem/54780404>.
box descriptors
We want to substitute opaque result types in addTypeRef but when we pass
SILFunctionTypes this would fail because AST type substitution does not
support lowered SIL types.
Instead add addLoweredTypeRef which substitutes based on SILTypes.
rdar://54529445
If we're allowed to know at IRGen time what the underlying type of an opaque type is, we can
satisfy references to the opaque type's metadata or protocol witness tables by directly referencing
the underlying type instead.
When we generate code that asks for complete metadata for a fully concrete specific type that
doesn't have trivial metadata access, like `(Int, String)` or `[String: [Any]]`,
generate a cache variable that points to a mangled name, and use a common accessor function
that turns that cache variable into a pointer to the instantiated metadata. This saves a bunch
of code size, and should have minimal runtime impact, since the demangling of any string only
has to happen once.
This mostly just works, though it exposed a couple of issues:
- Mangling a type ref including objc protocols didn't cause the objc protocol record to get
instantiated. Fixed as part of this patch.
- The runtime type demangler doesn't correctly handle retroactive conformances. If there are
multiple retroactive conformances in a process at runtime, then even though the mangled string
refers to a specific conformance, the runtime still just picks one without listening to the
mangler. This is left to fix later, rdar://problem/53828345.
There is some more follow-up work that we can do to further improve the gains:
- We could improve the runtime-provided entry points, adding versions that don't require size
to be cached, and which can handle arbitrary metadata requests. This would allow for mangled
names to also be used for incomplete metadata accesses and improve code size of some generic
type accessors. However, we'd only be able to take advantage of the new entry points in
OSes that ship a new runtime.
- We could choose to always symbolic reference all type references, which would generally reduce
the size of mangled strings, as well as make runtime demangling more efficient, since it wouldn't
need to hit the runtime caches. This would however require that we be able to handle symbolic
references across files in the MetadataReader in order to avoid regressing remote mirror
functionality.
When referencing a superclass type from a subclass, for example, the
type uses the subclass's generic parameters, not the superclass's.
This can be important if a nested type constrains away some of its
parent type's generic parameters.
This doesn't solve all the problems around mis-referenced generic
parameters when some are constrained away, though. That might
require a runtime change. See the FIXME comments in the test cases.
rdar://problem/51627403
Instead of a thunk insert the dispatch into the original function.
If the original function should be executed the prolog just jumps to the "real" code in the function. Otherwise the replacement function is called.
There is one little complication here: when the replacement function calls the original function, the original function should not dispatch to the replacement again.
To pass this information, we use a flag in thread local storage.
The setting and reading of the flag is done in two new runtime functions.
rdar://problem/51043781
- In Sema, don't traverse nested declarations while deducing the opaque return type. This would
cause returns inside nested functions to clobber the return type of the outer function.
- In IRGen, walk the list of opaque return types we keep in the SourceFile already for type
reconstruction, instead of trying to visit them ad-hoc as part of walking the AST, since
IRGen doesn't normally walk the bodies of function decls directly.
Fixes rdar://problem/50459091
They aren't normally decl contexts, but if one has an opaque type, we want to be able to record
the property as a context so that we can reconstruct it in RemoteAST.
If -enable-anonymous-context-mangled-names is enabled, meaning that we assign names to
anonymous context descriptors for discovery by RemoteAST, then include opaque type descriptors
in the type metadata record table so that they can also be found at runtime by RemoteAST for
debugger support.
This is to support dynamic function replacement of functions with opaque
result type.
This approach requires that all state is thrown away (that could contain the
old returned type for an opaque type) between replacements.
rdar://48887938
Previously even if a type's metadata was optimized away, we would still
emit a field descriptor, which in turn could reference nominal type
descriptors for other types via symbolic references, etc.
Instead of a wholly separate lazyness mechanism for foreign metadata where
the first call to getAddrOfForeignTypeMetadataCandidate() would emit the
metadata, emit it using the lazy metadata mechanism.
This eliminates some code duplication. It also ensures that foreign
metadata is only emitted once per SIL module, and not once per LLVM
module, avoiding duplicate copies that must be ODR'd away in multi-threaded
mode.
This fixes the test case from <rdar://problem/49710077>.
The old logic was confusing. The LazyTypeGlobals map would contain
entries for all referenced types, even those without lazy metadata.
And for a type with lazy metadata, the IsLazy field would begin
with a value of false -- unless it was imported.
When a non-imported type was finally visited in the AST, we would
try to "enable" lazyness for it, which meant queueing up any
metadata that had been requested prior, or immediately emitting
the metadata otherwise.
Instead, let's add a separate map that caches whether a type has
lazy metadata or not. The first time we ask for the metadata of a
type, consult this map. If the type has lazy metadata according to
the map, queue up metadata emission for the type. Otherwise, emit
metadata eagerly when the type is visited in the AST.
This consolidates the various doesClassMetadataRequire*() checks, making
them more managable.
This also adds a forth state, ClassMetadataStrategy::Update. This will be used
when deploying to the new Objective-C runtime. For now it's not plumbed through.
Progress on <rdar://problem/47649465>.
Protocol descriptors for resilient protocols relatively-reference
default witness thunks, so when using -num-threads N with N > 1,
we must ensure the default witness thunk is emitted in the same
LLVM module.
Protocol descriptors for resilient protocols relatively-reference
default witness thunks, so when using -num-threads N with N > 1,
we must ensure the default witness thunk is emitted in the same
LLVM module.
We were wastefully emitting an accessor if a field had a type, for
example if my field type was (() -> (X, Array<Y>>) we would force
the emission of a function to construct (() -> (X, Array<Y>)) even
though all we care about is the type metadata for X and Y.
Conversely, we would skip the field type if it contained an
archetype, even if it otherwise contained metadata that we need
to force to emit, for instance something like (T, X) where T is
a generic parameter and X is a nominal type.
A final side effect is we no longer try to emit type metadata for
one-element tuples when emitting enum payload metadata, which is
something I want to assert against.
This is essentially a long-belated follow-up to Arnold's #12606.
The key observation here is that the enum-tag-single-payload witnesses
are strictly more powerful than the XI witnesses: you can simulate
the XI witnesses by using an extra case count that's <= the XI count.
Of course the result is less efficient than the XI witnesses, but
that's less important than overall code size, and we can work on
fast-paths for that.
The extra inhabitant count is stored in a 32-bit field (always present)
following the ValueWitnessFlags, which now occupy a fixed 32 bits.
This inflates non-XI VWTs on 32-bit targets by a word, but the net effect
on XI VWTs is to shrink them by two words, which is likely to be the
more important change. Also, being able to access the XI count directly
should be a nice win.