The compiler pre-canonicalizes protocol composition types by minimizing constraints and sorting the remaining protocols by module + name, which ought to be globally stable within a program (assuming there aren't multiple modules with the same name, in which case we'll have bigger problems…). The compiler also statically lays out existential types according to its conception of the canonical composition ordering, so the runtime's own attempt to form a stable ordering can disagree with the compile-time layout, leading to crashes like SR-4477.
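A minimal sketch of the ordering rule described above, using illustrative stand-in types rather than the compiler's actual AST (ProtocolRef and canonicalizeComposition are made-up names) and omitting the constraint-minimization step: members are sorted by (module name, protocol name), so any client that applies the same rule derives the same layout.

    #include <algorithm>
    #include <string>
    #include <tuple>
    #include <vector>

    // Illustrative stand-in for a reference to a protocol declaration.
    struct ProtocolRef {
      std::string moduleName;
      std::string protocolName;
    };

    // Sort composition members by module, then by protocol name, so the
    // ordering is stable across the whole program (assuming unique module names).
    void canonicalizeComposition(std::vector<ProtocolRef> &members) {
      std::sort(members.begin(), members.end(),
                [](const ProtocolRef &a, const ProtocolRef &b) {
                  return std::tie(a.moduleName, a.protocolName) <
                         std::tie(b.moduleName, b.protocolName);
                });
    }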
* IRGen: Change c-o-w existential implementation functions
* initializeBufferWith(Copy|Take)OfBuffer value witness implementation for cow existentials
Implement and use initializeBufferWith(Copy|Take)OfBuffer value witnesses for
copy-on-write existentials.
Previously we used a free-standing function, but its overhead was noticeable
(~20-30%) on microbenchmarks (a sketch of the buffer-copy witness follows this list).
* IRGen: Use common getCopyOutOfLineBoxPointerFunction
* Add a runtime function to conditionally make a box unique
* Fix compilation of HeapObject.cpp on i386
* Fix IRGen test case
* Fix test case for i386
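The buffer-copy witness mentioned above is cheap for copy-on-write existentials because the value lives in a reference-counted box. Below is a minimal sketch under that assumption; OpaqueExistentialBuffer and the exact layout (box pointer in the first buffer word) are illustrative, not the runtime's actual definitions, while swift_retain is the ordinary runtime retain entry point.

    struct HeapObject;                                   // opaque runtime heap object
    extern "C" HeapObject *swift_retain(HeapObject *);   // Swift runtime retain

    // Illustrative three-word inline buffer whose first word holds the box pointer.
    struct OpaqueExistentialBuffer {
      HeapObject *box;
      void *reserved[2];
    };

    // Buffer-to-buffer copy for a boxed value: share the box and bump its
    // reference count instead of copying the stored value.
    OpaqueExistentialBuffer *
    cow_initializeBufferWithCopyOfBuffer(OpaqueExistentialBuffer *dest,
                                         OpaqueExistentialBuffer *src) {
      dest->box = src->box;
      swift_retain(src->box);
      return dest;
    }

The Take variant would simply move the box pointer into the destination without a retain, since ownership transfers out of the source buffer.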
Adds the runtime implementation for copy-on-write existentials. This support is
enabled if SWIFT_RUNTIME_ENABLE_COW_EXISTENTIALS is defined. The focus is on
correctness for now, not performance.
Don't use allocate/deallocate/projectBuffer witnesses for globals in cow
existential mode.
Use SWIFT_RUNTIME_ENABLE_COW_EXISTENTIALS configuration to set the default for
SILOptions.
This includes an IRGen fix to use the right projection in
emitMetatypeOfOpaqueExistential if SWIFT_RUNTIME_ENABLE_COW_EXISTENTIALS is set.
Use unknownRetain instead of native retain in dynamicCastToExistential.
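The "conditionally make a box unique" entry point mentioned above follows the usual copy-on-write pattern; here is a hedged sketch of its shape. copyValueIntoFreshBox is a hypothetical placeholder, the buffer layout is illustrative, and the runtime uniqueness-check and release signatures are written as understood and should be treated as assumptions.

    struct HeapObject;   // opaque runtime heap object

    // Runtime entry points (signatures as understood; treat as assumptions).
    extern "C" bool swift_isUniquelyReferenced_nonNull_native(const HeapObject *);
    extern "C" void swift_release(HeapObject *);

    struct OpaqueExistentialBuffer { HeapObject *box; void *reserved[2]; };

    // Hypothetical helper: allocate a new box and copy the stored value into it.
    HeapObject *copyValueIntoFreshBox(HeapObject *sharedBox);

    // Make the buffer's box safe to mutate in place.
    void makeBoxUniqueIfNeeded(OpaqueExistentialBuffer *buffer) {
      if (swift_isUniquelyReferenced_nonNull_native(buffer->box))
        return;                                   // sole owner already
      HeapObject *fresh = copyValueIntoFreshBox(buffer->box);
      swift_release(buffer->box);                 // drop our reference to the shared box
      buffer->box = fresh;
    }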
Previously the demangler was part of swiftBasic.
The demangler library does not depend on llvm (except for some header-only utilities like StringRef). Putting it into its own library ensures that no llvm code is linked into clients that use the demangler library.
This change also contains other refactoring, like moving demangler code into different files. This makes it easier to remove the old demangler from the runtime library when we switch to the new symbol mangling.
Also in this commit: remove some unused API functions from the demangler Context.
fixes rdar://problem/30503344
This makes the demangler about 10 times faster.
It also changes the lifetimes of nodes. Previously nodes were reference-counted.
Now the returned demangle node tree is owned by the Demangler class and its lifetime ends with the lifetime of the Demangler.
Therefore the old (and already deprecated) global functions demangleSymbolAsNode and demangleTypeAsNode are no longer available.
Another change is that the demangling for reflection now only supports the new mangling (which should be no problem because
we are generating only new mangled names for reflection).
It also uses the new mangling for type names in metadata (except for top-level non-generic classes).
lldb now has support for the new mangled metadata type names.
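A small usage sketch of the ownership rule described above; the header path and helper names are written from memory and may differ between Swift versions. The node tree lives only as long as the Demangler that produced it, so render or copy anything you need before the Demangler is destroyed.

    #include "swift/Demangling/Demangler.h"
    #include "llvm/ADT/StringRef.h"

    #include <iostream>
    #include <string>

    void printDemangled(llvm::StringRef mangledName) {
      swift::Demangle::Demangler dem;      // owns every node it creates
      swift::Demangle::NodePointer root = dem.demangleSymbol(mangledName);
      if (!root) {
        std::cout << "not a valid mangled name\n";
        return;
      }
      // Render to an owned std::string while `dem` (and therefore `root`) is alive.
      std::string readable = swift::Demangle::nodeToString(root);
      std::cout << readable << "\n";
    }   // `dem` is destroyed here; `root` must not be used past this point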
This reinstates commit 21ba292943.
This seems to more than fix a performance regression that we
detected on a metadata-allocation microbenchmark.
A few months ago, I improved the metadata cache representation
and changed the metadata allocation scheme to primarily use malloc.
Previously, we'd been using malloc in the concurrent tree data
structure but a per-cache slab allocator for the metadata itself.
At the time, I was concerned about the overhead of per-cache
allocators, since many metadata patterns see only a small number
of instantiations. That's still an important factor, so in the
new scheme we're using a global allocator; but instead of using
malloc for individual allocations, we're using a slab allocator,
which should have better peak, single-thread performance, at the
cost of not easily supporting deallocation. Deallocation is
only used for metadata when there's contention on the cache, and
specifically only when there's contention for the same key, so
leaking a little isn't the worst thing in the world.
The initial slab is a 64K globally-allocated buffer.
Successive slabs are 16K and allocated with malloc.
rdar://28189496
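A simplified sketch of the allocation scheme described above, with illustrative names; the real allocator is lock-free, whereas this sketch takes a mutex for brevity. The key points match the description: a 64K globally-allocated initial slab, 16K malloc'd follow-on slabs, a bump pointer, and no per-allocation deallocation.

    #include <algorithm>
    #include <cstddef>
    #include <cstdlib>
    #include <mutex>

    namespace {

    constexpr size_t InitialSlabSize = 64 * 1024;   // first slab: global buffer
    constexpr size_t NextSlabSize = 16 * 1024;      // later slabs: malloc'd

    alignas(16) char InitialSlab[InitialSlabSize];

    char *SlabCursor = InitialSlab;
    size_t SlabRemaining = InitialSlabSize;
    std::mutex SlabLock;                            // sketch only; not lock-free

    void *slabAllocate(size_t size) {
      size = (size + 15) & ~size_t(15);             // keep allocations 16-byte aligned
      std::lock_guard<std::mutex> guard(SlabLock);
      if (size > SlabRemaining) {
        // Start a new slab; the old slab's unused tail is simply leaked.
        size_t slabSize = std::max(size, NextSlabSize);
        SlabCursor = static_cast<char *>(std::malloc(slabSize));
        SlabRemaining = slabSize;
      }
      void *result = SlabCursor;
      SlabCursor += size;
      SlabRemaining -= size;
      return result;
    }

    } // namespace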
For this we are linking the new re-mangler instead of the old one into the Swift runtime library.
Also, we are linking the new demangler into the Swift runtime library.
It also switches to the new mangling for the class names of generic Swift classes in the metadata.
Note that for non-generic classes we still have to use the old mangling, because the ObjC runtime in the OS depends on it (it demangles the class names).
But the names of generic classes are not handled by the ObjC runtime anyway, so there should be no problem changing the mangling for those.
The reason for this change is that it avoids linking the old re-mangler into the runtime library.
Changes:
* Terminate all namespaces with the correct closing comment.
* Make sure argument names in comments match the corresponding parameter name.
* Remove redundant get() calls on smart pointers.
* Prefer using "override" or "final" instead of "virtual". Remove "virtual" where appropriate.
This affects the computed stride for fixed-size types in IRGen as well as the stored stride in value witness tables.
The reason is to let comparison and difference operations work for pointers to zero-sized types.
(Currently this is achieved by using Builtin.strideof_nonzero in MemoryLayout.stride, but that requires a std::max(1, stride) operation after loading the stride.)
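A sketch of that rule with illustrative names (not the compiler's actual layout code): the stride is the size rounded up to the alignment, clamped to at least 1 so that pointer increments over zero-sized elements still advance and pointer subtraction never divides by zero.

    #include <algorithm>
    #include <cstddef>

    // Round the size up to the alignment, then clamp to a minimum stride of 1
    // so zero-sized types still occupy one addressable unit per element.
    size_t strideForLayout(size_t size, size_t alignment) {
      size_t stride = (size + alignment - 1) & ~(alignment - 1);
      return std::max<size_t>(stride, 1);
    }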
IIRC we never had any evidence that the performance impact of a
separate allocator here was actually measurable, and it does come
at a significant fragmentation cost because every single cache
allocates at least a page of memory. Sharing that with the system
allocator makes more sense, even if these allocations are typically
permanent.
This also means that standard memory-debugging tools will actually
find problems with out-of-bounds accesses to metadata.
MetadataCache's allocator into it.
The major functional change here is that MetadataCache will now use
the slab allocator for tree nodes, but I also switched the Hashable
conformances cache to use ConcurrentMap directly instead of a
Lazy<ConcurrentMap<>>.
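Presumably ConcurrentMap can be a plain global, without the Lazy<> wrapper, because its empty state is constexpr-constructible and therefore constant-initialized with no run-time once-guard; that is an assumption here, illustrated with a made-up stand-in type.

    #include <atomic>

    // Illustrative stand-in for a concurrent map whose empty state can be
    // built in a constexpr constructor.
    struct ConcurrentTreeLike {
      constexpr ConcurrentTreeLike() : root(nullptr) {}
      std::atomic<void *> root;
    };

    // Constant-initialized before main(); no Lazy<> wrapper or once-flag needed.
    static ConcurrentTreeLike HashableConformancesCacheSketch;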
Previously, these were all using MetadataCache. MetadataCache is a
more heavyweight structure which acquires a lock before building the
metadata. This is appropriate if building the metadata is very
expensive or might have semantic side-effects which cannot be rolled
back. It's also useful when there's a risk of re-entrance, since it
can diagnose such things instead of simply dead-locking or infinitely
recursing. However, it isn't necessary for structural cases like tuple
and function types; instead we can just use ConcurrentMap, which
does a compare-and-swap to publish the constructed metadata and
potentially destroys it if another thread successfully won the race.
This is an optimization which we could not previously attempt.
As part of this, fix tuple metadata uniquing to consider the label
string correctly. This exposes a bug where the runtime demangling
of tuple metadata nodes doesn't preserve labels; fix this as well.
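A hedged sketch of the publish-or-discard pattern described above, reduced to a single cached entry instead of the runtime's ConcurrentMap; buildTupleMetadata and destroyMetadata are hypothetical placeholders. The winner of the compare-and-swap publishes its metadata, and the loser destroys its copy and adopts the published one.

    #include <atomic>

    struct Metadata;                        // opaque here
    Metadata *buildTupleMetadata();         // hypothetical: cheap, no semantic side effects
    void destroyMetadata(Metadata *);       // hypothetical: tear down the losing copy

    std::atomic<Metadata *> CachedEntry{nullptr};

    Metadata *getOrBuildMetadata() {
      if (Metadata *existing = CachedEntry.load(std::memory_order_acquire))
        return existing;                    // fast path: already published

      Metadata *fresh = buildTupleMetadata();
      Metadata *expected = nullptr;
      if (CachedEntry.compare_exchange_strong(expected, fresh,
                                              std::memory_order_acq_rel,
                                              std::memory_order_acquire))
        return fresh;                       // we won the race and published our copy

      destroyMetadata(fresh);               // another thread won; discard ours
      return expected;                      // adopt the published entry
    }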
under the lock.
Unfortunately, unit-testing this is gratuitously difficult
because of C++ problems. (Fixed in C++17, apparently.) Will
be tested by a future patch which creates a nested foreign
type.