Structurally prevent a number of common anti-patterns involving generic
signatures by separating the interface into GenericSignature and the
implementation into GenericSignatureBase. In particular, this allows
the comparison operators to be deleted which forces callers to
canonicalize the signature or ask to compare pointers explicitly.
Now that GenericSignatures store their single unique GenericEnvironment,
we can remove similar logic from deserialization to preserve identity
of GenericEnvironments.
A generic environment is always serialized as a GenericSignature with
a lazily-recreated environment, though sometimes it has to include
extra info specifically for generic environments used by SIL. The code
that was doing this claimed a bit for disambiguating between the two,
shrinking the permitted size of a compiled module from 2^31 bits to
2^30. (The code isn't just needlessly complicated; GenericEnvironments
used to be serialized with more information.)
Rather than have two representations for GenericEnvironmentID, this
commit just drops it altogether in favor of referencing
GenericSignatures directly. This causes a negligible file size
shrinkage for swiftmodules in addition to eliminating the problematic
disambiguation bit.
For now, the Deserialization logic will continue to cache
GenericEnvironments that are used directly by Deserialization, but
really that should probably be done at the AST level. Then we can
simplify further to ModuleFile tracking a plain list of
GenericSignatures.
...by making it a tagged union of either a DeclID or a
LocalDeclContextID. This should lead to smaller module files and be
slightly more efficient to deserialize, and also means that every
AST entity kind is serialized in exactly one way, which allows for
the following commit's refactoring.
The AST block in a compiled module represents an object graph, which
is essentially serialized in four steps:
- An entity (such as a Decl or Type) is encountered, given an ID, and
added to a worklist.
- The next entity is popped from the worklist and its offset in the
output stream is recorded.
- During the course of writing that entity, more entities will be
referenced and added to the worklist.
- Once the entire worklist is drained, the offsets get written to a
table in the Index block.
The implementation of this was duplicated for each kind of entity in
the AST block; this commit factors that out into a reusable helper.
No intended high-level functionality change, but the order in which
Decls and Types get emitted might change a little now that they're not
in the same queue.
It's a pretty obscure feature (and one we wish we didn't need), but
sometimes API is initially exposed through one module in order to
build another one, and we want the canonical presented name to be
something else. Push this concept into Swift's AST properly so that
other parts of the compiler stop having to know that this is a
Clang-specific special case.
No functionality change in this commit; will be used in the next
commit.
Now that we don't store requirements in the GenericParamList, there's
no reason to use trailing records to list out the
GenericTypeParamDecls.
No functionality change.
A module compiled with `-enable-private-imports` allows other modules to
import private declarations if the importing source file uses an
``@_private(from: "SourceFile.swift") import statement.
rdar://29318654
* Introduce stored inlinable function bodies
* Remove serialization changes
* [InterfaceGen] Print inlinable function bodies
* Clean up a little bit and add test
* Undo changes to InlinableText
* Add serialization and deserialization for inlinable body text
* Allow parser to parse accessor bodies in interfaces
* Fix some tests
* Fix remaining tests
* Add tests for usableFromInline decls
* Add comments
* Clean up function body printing throughout
* Add tests for subscripts
* Remove comment about subscript inlinable text
* Address some comments
* Handle lack of @objc on Linux
The string-keyed tables don't actually need Identifier keys on the
serialization side -- no reason to persist these in the ASTContext's
string table either.
They're not actually stored separately in the module file, nor
deserialized separately, but this way we're not sticking a bunch of
strings in the /ASTContext's/ string table, which would persist after
serialization. (The ASTContext's string table is also implemented
using a BumpPtrAllocator-backed StringMap, so the performance
characteristics of this should be about the same.)
This was originally added to support "derived" top-level declarations,
which in practice was just '==' implementations for enums. Now that
those are members, we don't have the notion of derived top-level
declarations anymore, and neither is there anything that would
normally be a cross-reference that we want to force to be serialized
directly.
No functionality change (because this was unused).
Allow substitution maps to be serialized directly (via an ID), writing out
the replacement types and conformances as appropriate. This is a more
efficient form of serialization than the current SubstitutionList approach,
because it maintains uniqueness of substitution maps within a module file,
and is a step toward eliminating SubstitutionList entirely.
Rather than inlining generic signatures in a half dozen places throughout
the serialization format, serialize (uniqued) generic signatures with their
own GenericSignatureID. Update various layouts (generic function types,
SIL function types, generic environments, extension cross-references) to
use GenericSignatureID.
Shaves ~187k off the size of Swift.swiftmodule.
Somewhat oddly, DeclID seems not to like being a key in DenseMap
due to the -1 value from getEmptyKey(), I don't quite get whether
this is intentional but using a raw uint32_t seems to work fine.
Special DeclNames represent names that do not have an identifier in the
surface language. This implies serializing the information about whether
a name is special together with its identifier (if it is not special)
in both the module file and the swift lookup table.
With the introduction of special decl names, `Identifier getName()` on
`ValueDecl` will be removed and pushed down to nominal declarations
whose name is guaranteed not to be special. Prepare for this by calling
to `DeclBaseName getBaseName()` instead where appropriate.
In order to accomplish this, cross-module references to typealiases
are now banned except from within conformances and NameAliasTypes, the
latter of which records the canonical type to determine if the
typealias has changed. For conformances, we don't have a good way to
check if the typealias has changed without trying to map it into
context, but that's all right---the rest of the compiler can already
fall back to the canonical type.
Module files store all of the Objective-C method entrypoints in a
central table indexed by selector, then filter the results based on
the specific class being requested. Rather than storing the class as
a TypeID---which requires a bunch of deserialization---store its
mangled name. This allows us to deserialize less, and causes circular
deserialization in rdar://problem/31615640.
Previously looking up an extension would result in all extensions for
types with the same name (nested or not) being deserialized; this
could even bring in base types that had not been deserialized yet. Add
in a string to distinguish an extension's base type; in the top-level
case this is just a module name, but for nested types it's a full
mangled name.
This is a little heavier than I'd like it to be, since it means we
mangle names and then throw them away, and since it means there's a
whole bunch of extra string data in the module just for uniquely
identifying a declaration. But it's correct, and does less work than
before, and fixes a circularity issue with a nested type A.B.A that
apparently used to work.
https://bugs.swift.org/browse/SR-3915
SubstitutionList is going to be a more compact representation of
a SubstitutionMap, suitable for inline allocation inside another
object.
For now, it's just a typedef for ArrayRef<Substitution>.
There's a class of errors in Serialization called "circularity
issues", where declaration A in file A.swift depends on declaration B
in file B.swift, and B also depends on A. In some cases we can manage
to type-check each of these files individually due to the laziness of
'validateDecl', but then fail to merge the "partial modules" generated
from A.swift and B.swift to form a single swiftmodule for the library
(because deserialization is a little less lazy for some things). A
common case of this is when at least one of the declarations is
nested, in which case a lookup to find that declaration needs to load
all the members of the parent type. This gets even worse when the
nested type is defined in an extension.
This commit sidesteps that issue specifically for nested types by
creating a top-level, per-file table of nested types in the "partial
modules". When a type is in the same module, we can then look it up
/without/ importing all other members of the parent type.
The long-term solution is to allow accessing any members of a type
without having to load them all, something we should support not just
for module-merging while building a single target but when reading
from imported modules as well. This should improve both compile time
and memory usage, though I'm not sure to what extent. (Unfortunately,
too many things still depend on the whole members list being loaded.)
Because this is a new code path, I put in a switch to turn it off:
frontend flag -disable-serialization-nested-type-lookup-table
https://bugs.swift.org/browse/SR-3707 (and possibly others)
Teach the serialization of SIL generic environments, which used to be
a trailing record following the SIL function definition, to use the
same uniqued "generic environment IDs" that are used for the AST
generic environments. Many of them overlap anyway, and SIL functions
tend to have AST generic environments anyway.
This approach guarantees that the AST + SIL deserialization provide
the same uniqueness of generic environments present prior to
serialization.
Serialize generic environments via a generic environment ID with a
separte offset table, so we have identity for the generic environments
and will share generic environments on deserialization.
When a pattern within a type context is serialized, serialize its
interface type (not its contextual type). When deserializing, record
the interface type and keep a side table of the associated
DeclContext, so that we can lazily map to the contextual type on first
access. This is designed to break recursion when we change the way
archetypes and generic environments are serialized.
An environment is always associated with a location with a signature, so
having them separate is pointless duplication. This patch also updates
the serialization to round-trip the signature data.
The witnesses in a NormalProtocolConformance have never been
completely serialized, because their substitutions involved a weird
mix of archetypes that blew up the deserialization code. So, only the
witness declarations themselves got serialized. Many clients (the type
checker, SourceKit, etc.) didn't need the extra information, but some
clients (e.g., the SIL optimizers) would end up recomputing this
information. Ick.
Now, serialize the complete Witness structure along with the AST,
including information about the synthetic environment, complete
substitutions, etc. This should obsolete some redundant code paths in
the SIL optimization infrastructure.
This (de-)serialization code takes a new-ish approach to serializing
the synthetic environment in that it avoids serializing any
archetypes. Rather, it maps everything back to interface types during
serialization, and deserialization forms a new generic environment
(with new archetypes!) on-the-fly, mapping deserialized types back
into that environment (and to those archetypes). This way, we don't
have to maintain identity of archetypes in the deserialization code,
and might get some better re-use of the archetypes.
More of rdar://problem/24079818.
Sugared GenericTypeParamTypes point to GenericTypeParamDecls,
allowing the name of the parameter as written by the user to be
recovered. Canonical GenericTypeParamTypes on the other hand
only store a depth and index, without referencing the original
declaration.
When printing SIL, we wish to output the original generic parameter
names, even though SIL only uses canonical types. Previously,
we used to accomplish this by mapping the generic parameter to an
archetype and printing the name of the archetype. This was not
adequate if multiple generic parameters mapped to the same
archetype, or if a generic parameter was mapped to a concrete type.
The new approach preserves the original sugared types in the
GenericEnvironment, adding a new GenericEnvironment::getSugaredType()
method.
There are also some other assorted simplifications made possible
by this.
Unfortunately this makes GenericEnvironments use a bit more memory,
however I have more improvements coming that will offset the gains,
in addition to making substitution lists smaller also.
This patch is rather large, since it was hard to make this change
incrementally, but most of the changes are mechanical.
Now that we have a lighter-weight data structure in the AST for mapping
interface types to archetypes and vice versa, use that in SIL instead of
a GenericParamList.
This means that when serializing a SILFunction body, we no longer need to
serialize references to archetypes from other modules.
Several methods used for forming substitutions can now be moved from
GenericParamList to GenericEnvironment.
Also, GenericParamList::cloneWithOuterParameters() and
GenericParamList::getEmpty() can now go away, since they were only used
when SILGen-ing witness thunks.
Finally, when printing generic parameters with identical names, the
SIL printer used to number them from highest depth to lowest, by
walking generic parameter lists starting with the innermost one.
Now, ambiguous generic parameters are numbered from lowest depth
to highest, by walking the generic signature, which means test
output in one of the SILGen tests has changed.