mirror of
https://github.com/apple/swift.git
synced 2025-12-14 20:36:38 +01:00
472 lines
18 KiB
ReStructuredText
472 lines
18 KiB
ReStructuredText
:orphan:
|
|
|
|
Remote mirrors proposal
|
|
=======================
|
|
|
|
.. contents::
|
|
|
|
This proposal describes a new implementation for nominal type metadata which
|
|
will enable out-of-process heap inspection, intended for use by debugging tools
|
|
for detecting leaks and cycles. This implementation will subsume the existing
|
|
reflection support for Mirrors, enabling out-of-process usage while also
|
|
reducing generated binary size.
|
|
|
|
Radars tracking this work:
|
|
|
|
- rdar://problem/15617914
|
|
- rdar://problem/17019505
|
|
- rdar://problem/20771693
|
|
|
|
Goals and non-goals
|
|
-------------------
|
|
|
|
We wish to do post-mortem debugging of memory allocations in a Swift program.
|
|
Debugging tools can already introspect the memory allocator to identify all
|
|
live memory allocations in the program's heap.
|
|
|
|
If the compiler were to emit the necessary metadata, the layout of most
|
|
allocations can be ascertained, and in particular we can identify any
|
|
references inside the heap object. This metadata can be used together with the
|
|
core dump of a program to build a graph of objects.
|
|
|
|
We have to be able to get all the necessary information without executing any
|
|
code in the address space of the target, since it may be dead or otherwise in a
|
|
funny state.
|
|
|
|
In order to identify strong retain cycles, we need to know for each reference
|
|
if it is strong, weak, or unowned.
|
|
|
|
We wish to be able to opt out of metadata selectively. For secrecy, we might
|
|
want to strip out field names, but keep metadata about which fields contain
|
|
references. For release builds, we might want to strip out most of the field
|
|
metadata altogether, except where explicitly required for code that relies on
|
|
reflection for functionality.
|
|
|
|
It would be better if the new functionality subsumes some of the existing
|
|
metadata, instead of adding a whole new set of structures that the compiler and
|
|
runtime must keep in sync.
|
|
|
|
While this should have zero runtime overhead when not in use, it is OK if
|
|
introspection requires some additional computation, especially if it can be
|
|
front-loaded or memoized.
|
|
|
|
It is OK if in rare cases the metadata is not precise -- for some allocations,
|
|
we might not be able to figure out the runtime type of the contained value.
|
|
Also we will not attempt to verify the "roots" of the object graph by walking
|
|
the stack for pointers.
|
|
|
|
Types of heap allocations
|
|
-------------------------
|
|
|
|
There are several types of heap allocations in Swift. We mostly concern
|
|
ourselves with class instances for now, but in the fullness of time we would
|
|
like to have accurate metadata for all heap allocations.
|
|
|
|
Swift class instances
|
|
~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
These have an isa pointer that points to a class metadata record.
|
|
|
|
Objective-C class instances
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
These also have an isa pointer, but the class metadata record has the
|
|
Objective-C bit set.
|
|
|
|
Boxes
|
|
~~~~~
|
|
|
|
These are used for heap-allocating mutable values captured in closures, for
|
|
indirect enum cases, and for Error existential values. They have an
|
|
identifying isa pointer and reference count, but the isa pointer is shared by
|
|
all boxes and thus does not describe the heap layout of the box.
|
|
|
|
Contexts
|
|
~~~~~~~~
|
|
|
|
The context for a thick function is laid out like a tuple consisting of the
|
|
captured values. Currently, the only aspect of the layout that is needed by the
|
|
runtime is knowledge of which captured values are heap pointers. A unique isa
|
|
pointer is created for each possible layout here.
|
|
|
|
Blocks
|
|
~~~~~~
|
|
|
|
Blocks are similar to contexts but have a common header and package the
|
|
function pointer and captured values in a single retainable heap object.
|
|
|
|
Metatypes
|
|
~~~~~~~~~
|
|
|
|
Runtime-allocated metatypes will appear in the malloc heap. They themselves
|
|
cannot contain heap references though.
|
|
|
|
|
|
Opaque value buffers
|
|
~~~~~~~~~~~~~~~~~~~~
|
|
|
|
These come up when a value is too large to fit inside of an existential's
|
|
inline storage, for example. They do not have a header, so we will not attempt
|
|
to introspect them at first -- eventually, we could identify pointers to buffers
|
|
where the existential is itself inside of a heap-allocated object.
|
|
|
|
Existing metadata
|
|
-----------------
|
|
|
|
Swift already has a lot of reflective features and much of the groundwork for
|
|
this exists in some form or another, but each one is lacking in at least one
|
|
important respect.
|
|
|
|
Generic type metadata
|
|
~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
The isa pointer of an object points to a metadata record. For instances of
|
|
generic class types, the metadata is lazily instantiated from the generic
|
|
metadata template together with the concrete types that are bound to generic
|
|
parameters.
|
|
|
|
Generic type metadata is instantiated for generic classes with live instances,
|
|
and for metatype records of value types which are explicitly referenced from
|
|
source.
|
|
|
|
When the compiler needs to emit a generic type metadata record, it uses one of
|
|
several strategies depending on the type being referenced. For concrete
|
|
non-generic types, a direct call to a lazy accessor can be generated. For bound
|
|
generic types T<P1, ..., Pn>, we recursively emit metadata references for the
|
|
generic parameters Pn, then call the getter for the bound type T. For
|
|
archetypes -- that is, generic type parameters which are free variables in the
|
|
function body being compiled -- the metadata is passed in as a value, so the
|
|
compiler simply emits a copy of that.
|
|
|
|
Generic type metadata tells us the size of each heap allocation, but does not
|
|
by itself tell us the types of the fields or what references they contain.
|
|
|
|
Mirrors and NominalTypeDescriptors
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
The implementation of Mirrors uses runtime primitives which introspect the
|
|
fields of an opaque value by looking at the NominalTypeDescriptor embedded in a
|
|
type's metadata record.
|
|
|
|
For structures and classes, the NominalTypeDescriptor contains a function
|
|
pointer which returns an array of field types. The function pointer points to a
|
|
"field type metadata function" emitted by the compiler. This function emits
|
|
metadata record references for each field type and collects them in an array.
|
|
Since the isa pointer of a class instance points at an instantiated type, the
|
|
field types of such a NominalTypeDescriptor are also all concrete types.
|
|
|
|
NominalTypeDescriptors record field names, in addition to types. Right now, all
|
|
of this information is stored together, without any way of stripping it out.
|
|
Also, NominalTypeDescriptors do not record whether a reference is strong, weak
|
|
or unowned, but that would be simple to fix.
|
|
|
|
A bigger problem is that we have to call a function to lazily generate the
|
|
field type metadata. While a NominalTypeDescriptor for every instantiated class
|
|
type appears in a crashed process, the field types do not, because only a call
|
|
to the field type function will instantiate them.
|
|
|
|
Objective-C instance variable metadata
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
The Objective-C runtime keeps track of the types of instance variables of
|
|
classes, and there is enough information here to identify pointers in instances
|
|
of concrete types, however there's no support for generic types. We could have
|
|
generic type metadata instantiation also clone and fill in templates for
|
|
Objective-C instance variables, but this would add a runtime cost to a feature
|
|
that is primarily intended for debugging.
|
|
|
|
DWARF metadata
|
|
~~~~~~~~~~~~~~
|
|
|
|
IRGen emits some minimal amount of DWARF metadata for non-generic types, but
|
|
makes no attempt to describe generic type layout to the debugger in this
|
|
manner.
|
|
|
|
However, DWARF has the advantage that it can be introspected without running
|
|
code, and stripped out.
|
|
|
|
New field type metadata format
|
|
------------------------------
|
|
|
|
The main limitation of all of the above is either an inability to reason about
|
|
generic types, or the requirement to run code in the target.
|
|
|
|
Suppose T is a generic type, and S is some set of substitutions.
|
|
|
|
The compiler conceptually implements an operation G(T, S) which returns a
|
|
lazily-instantiated type descriptor for the given input parameters. However,
|
|
its really performing a partial evaluation G(T)(S), with the "G(T)" part
|
|
happening at compile time.
|
|
|
|
Similarly, we can think of the field type access function as an operation F(T,
|
|
S) which returns the types of the fields of T, with T again fixed at compile
|
|
time.
|
|
|
|
What we really want here is to build an "interpreter" -- or really, a parser for
|
|
a simple serialized graph -- which understands how to parse uninstantiated
|
|
generic metadata, keep track of substitutions, and calculate field offsets,
|
|
sizes, and locations of references.
|
|
|
|
This "interpreter" has to be able to find metadata for leaf types "from
|
|
scratch", and calculate field sizes and offsets in the same way that generic
|
|
type metadata instantiation calculates object sizes.
|
|
|
|
The "interpreter" will take the form of a library for understanding field type
|
|
metadata records and symbolic type references. This will be a C++ library and
|
|
it needs to support the following use cases:
|
|
|
|
#. In-process reflection, for backing the current Mirrors in the standard
|
|
library
|
|
#. Out-of-process reflection, for heap debugging tools
|
|
#. Out-of-process reflection, for a new remote Mirrors feature in the library
|
|
(optional)
|
|
|
|
The API will be somewhat similar to Mirrors as they are in the stdlib today.
|
|
|
|
The details are described below.
|
|
|
|
Symbolic type references
|
|
~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
Since we're operating on uninstantiated generic metadata, we need some way to
|
|
describe compositions of types. Instead of using metadata record pointers,
|
|
which are now insufficient, we use type references written in a mini-language.
|
|
|
|
A symbolic type reference is a recursive structure describing an arbitrary
|
|
Swift AST type in terms of nominal types, generic type parameters, and
|
|
compositions of them, such as tuple types.
|
|
|
|
For each AST type, we can distinguish between the minimum information we need
|
|
to identify heap references therein, and the full type for reflection. The
|
|
former could be retained while the latter could be stripped out in certain
|
|
builds.
|
|
|
|
We already have a very similar encoding -- parameter type mangling in SIL. It
|
|
would be good to re-use this encoding, but for completeness, the full format of
|
|
a type reference is described below:
|
|
|
|
|
|
#. **A built-in type reference.** Special tokens can be used to refer to
|
|
various built-in types that have runtime support.
|
|
|
|
#. **A concrete type reference.** This can either be a mangled name of a type,
|
|
or a GOT offset in the target.
|
|
|
|
#. **A heap reference.** This consists of:
|
|
|
|
- strong, weak or unowned
|
|
- (optional) a reference to the class type itself
|
|
|
|
#. **A bound generic type.** This consists of:
|
|
|
|
- A concrete or built-in type reference
|
|
- A nested symbolic type reference for each generic parameter
|
|
|
|
#. **A tuple type.** This consists of:
|
|
|
|
- A recursive sequence of symbolic type references.
|
|
|
|
#. **A function type.** This consists of:
|
|
|
|
- A representation,
|
|
- (optional) input and output types
|
|
|
|
#. **A protocol composition type.** This consists of:
|
|
|
|
- A flag indicating if any of the protocols are class-constrained, which
|
|
changes the representation
|
|
- The number of non-@objc protocols in the composition
|
|
- (optional) references to all protocols in the composition
|
|
|
|
#. **A metatype.** This consists of:
|
|
|
|
- (optional) a type reference to the instance type
|
|
- there's no required information -- a metatype is always a single pointer to
|
|
a heap object which itself does not reference any other heap objects.
|
|
|
|
#. **An existential metatype.** This consists of:
|
|
|
|
- The number of protocols in the composition.
|
|
- (optional) type references to the protocol members.
|
|
|
|
#. **A generic parameter.** Within the field types of a generic type,
|
|
references to generic parameters can appear. Generic parameters are uniquely
|
|
identifiable by an index here (and once we add nested generic types, a depth).
|
|
|
|
You can visualize type references as if they are written in an S-expression
|
|
format -- but in reality, it would be serialized in a compact binary form:
|
|
|
|
::
|
|
|
|
(tuple_type
|
|
(bound_generic_type
|
|
(concrete_type "Array")
|
|
(concrete_type "Int"))
|
|
(bound_generic_type
|
|
(builtin_type "Optional")
|
|
(generic_type_parameter_type index=0)))
|
|
|
|
We will provide a library of standalone routines for decoding, encoding and
|
|
manipulating symbolic type references.
|
|
|
|
Field type metadata records
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
We introduce a new type of metadata, stored in its own section so that it can
|
|
be stripped out, called "field type metadata". For each nominal type, we emit a
|
|
record containing the following:
|
|
|
|
#. the name of the nominal type,
|
|
#. the number of generic parameters,
|
|
#. type references, written in the mini-language above, for each of its field
|
|
types.
|
|
#. field names, if enabled.
|
|
|
|
Field type metadata is linked together so that it can be looked up by name,
|
|
post-mortem by introspecting the core dump.
|
|
|
|
We add a new field to the NominalTypeDescriptor to store a pointer to field
|
|
type metadata for this nominal type. In "new-style" NominalTypeDescriptors that
|
|
contain this field, the existing field type function will point to a common
|
|
field type function, defined in the runtime, which instantiates the field type
|
|
metadata. This allows for backward compatibility with old code, if desired.
|
|
|
|
Field type metadata instantiation
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
First, given an isa pointer in the target, we need to build the symbolic type
|
|
reference by walking backwards from instantiated to uninstantiated metadata,
|
|
collecting generic parameters. This operation is lazy, caching the result for
|
|
each isa pointer.
|
|
|
|
::
|
|
|
|
enum SymbolicTypeReference {
|
|
case Concrete(String)
|
|
case BoundGeneric(String, [SymbolicTypeReference])
|
|
case Tuple([SymbolicTypeReference])
|
|
...
|
|
}
|
|
|
|
func getSymbolicTypeOfObject(_ isa: void*) -> SymbolicTypeReference
|
|
|
|
Next, we define an "instantiation" operation, which takes a completely
|
|
substituted symbolic type reference, and returns a list of concrete field types
|
|
and offsets.
|
|
|
|
This operation will need to recursively visit field metadata records and keep
|
|
track of generic parameter substitutions in order to correctly calculate all
|
|
field offsets and sizes.
|
|
|
|
The result of instantiating metadata for each given SymbolicTypeReference can
|
|
be cached for faster lookup.
|
|
|
|
This library has to be careful when following any pointers in the target, to
|
|
properly handle partially-initialized objects, runtime bugs that led to memory
|
|
corruption, or malicious code, without crashing or exploiting the debugging
|
|
tools.
|
|
|
|
::
|
|
|
|
enum FieldLayout {
|
|
// the field contains a heap reference
|
|
case Strong, Weak, Unowned
|
|
// the field is an opaque binary blob, contents unknown.
|
|
case Opaque
|
|
// the field is a value type -- look inside recursively.
|
|
case ValueType(indirect field: FieldDescriptor)
|
|
}
|
|
|
|
struct FieldDescriptor {
|
|
let size: UInt
|
|
let align: UInt
|
|
let offset: UInt
|
|
let layout: FieldLayout
|
|
}
|
|
|
|
func instantiateSymbolicType(_ ref: SymbolicTypeReference) -> [FieldTypeDescriptor]
|
|
|
|
Field type metadata can have circular references -- for example, consider two
|
|
classes which contain optionals of each other. In order to calculate field
|
|
offsets correctly, we need to break cycles when we know something is a class
|
|
type, and use a work-list algorithm instead of unbounded recursion to ensure
|
|
forward progress.
|
|
|
|
Enum type metadata
|
|
~~~~~~~~~~~~~~~~~~
|
|
|
|
For enums, the field metadata record will also need to contain enough
|
|
information about the spare bits and tag bits of the payload types that we can
|
|
at runtime determine the case of an enum and project the payload, again without
|
|
running code in the target.
|
|
|
|
This will allow us to remove a pair of value witness functions generated purely
|
|
for reflection, since they don't seem to be performance-critical.
|
|
|
|
Closures
|
|
~~~~~~~~
|
|
|
|
For closure contexts and blocks, it would be nice to emit metadata, too.
|
|
|
|
Secrecy and release builds
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
There are several levels of metadata we can choose to emit here:
|
|
|
|
#. For code that requires runtime for functional purposes, or for the standard
|
|
library in debug builds, we can have a protocol conformance or compiler flag
|
|
enable unconditional emission of all metadata.
|
|
#. For system frameworks, we can omit field names and replace class names with
|
|
unique identifiers, but keep the type metadata to help users debug memory leaks
|
|
where framework classes are retaining instances of user classes.
|
|
#. For release builds, we can strip out all the metadata except where
|
|
explicitly required in 1).
|
|
|
|
This probably requires putting the required metadata in a different section
|
|
from the debug metadata. Perhaps field names should be separate from symbolic
|
|
type references too.
|
|
|
|
Performance
|
|
~~~~~~~~~~~
|
|
|
|
Since the field type metadata instantiation only happens once per isa pointer,
|
|
mirrors will not suffer a performance impact beyond the initial warm-up time.
|
|
Once the field type descriptor has been constructed, reflective access of
|
|
fields will proceed as before.
|
|
|
|
There might also be a marginal performance gain from removing all the field
|
|
type functions from the text segment, where they're currently interspersed with
|
|
other code, and replacing them with read only data containing no relocations,
|
|
which won't get paged in until needed.
|
|
|
|
Resilience
|
|
~~~~~~~~~~
|
|
|
|
We may choose to implement the new metadata facility after stabilizing the ABI.
|
|
In this case, we should front-load some engineering work on
|
|
NominalTypeDescriptors first, to make them more amenable to future extension.
|
|
|
|
We need to carefully review the new metadata format and make sure it is
|
|
flexible enough to support future language features, such as bound generic
|
|
existentials, which may further complicate heap layout.
|
|
|
|
As described above, it is possible to introduce this change in a
|
|
backwards-compatible manner. We keep the field type function field in the
|
|
NominalTypeDescriptor, but for "new-style" records, set it to point to a common
|
|
function, defined in the runtime, which parses the new metadata and returns an
|
|
array of field types that can be used by old clients.
|
|
|
|
Testing
|
|
~~~~~~~
|
|
|
|
By transitioning mirrors to use the new metadata, existing tests can be used to
|
|
verify behavior. Additional tests can be developed to perform various
|
|
allocations and assert properties of the resulting object graph, either from
|
|
in- or out-of-process.
|
|
|
|
If we go with the gradual approach where we have both field type functions and
|
|
field type metadata, we can also instantiate the former and compare it against
|
|
the result of invoking the latter, for all types in the system, as a means of
|
|
validating the field type metadata.
|
|
|