Do the same thing that we are already doing for trivia: Since RawSyntax
nodes always live inside a SyntaxArena, we don't need to tail-allocate
an OwnedString to store the token's text. Instead we can just copy it
to the SyntaxArena. If we copy the entire source buffer into the syntax
arena at the start of parsing, no further copies are required later
on. We also avoid ref-counting the OwnedString, which should further
improve performance.
Referencing a string in arbitrary memory is not safe since the source
buffer to which it points may have been freed. Instead copy all strings
into the SyntaxArena. Since RawSyntax nodes retain their arena, they can
be sure that the string won't disappear if it lives in their arena.
To avoid lots of small copies, we copy the entire source buffer once
into the syntax arena and make StringRefs point into that buffer.
This way, we will later be able to store additional information about
the node inside the same arena, with a guarantee that it stays alive
as long as the node is alive.
This additional information will include:
a) the token's text (which can be a StringRef into a copy of the source
code that lives inside the SyntaxArena)
b) the token's unparsed trivia, which can be decomposed into pieces when
needed.
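The copy-once scheme described above can be sketched like this. The arena keeps a single copy of the source buffer, and token text is handed out as views into that copy. This is a simplified, hypothetical model: `SourceArena` and `RawToken` are illustrative names, and `std::string_view` stands in for `llvm::StringRef`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string_view>

// Hypothetical arena that owns one copy of the whole source buffer.
class SourceArena {
public:
  // Copy the source buffer once, at the start of parsing.
  explicit SourceArena(std::string_view source)
      : Size(source.size()), Buffer(new char[source.size()]) {
    source.copy(Buffer.get(), source.size());
  }

  // A view of [offset, offset + length) inside the arena's own copy. It stays
  // valid as long as the arena lives, even if the original buffer is freed
  // or modified.
  std::string_view text(std::size_t offset, std::size_t length) const {
    assert(offset <= Size && length <= Size - offset && "range outside buffer");
    return std::string_view(Buffer.get() + offset, length);
  }

private:
  std::size_t Size;
  std::unique_ptr<char[]> Buffer;
};

// Hypothetical token: its text points into the arena instead of owning a
// separately allocated, ref-counted string.
struct RawToken {
  std::string_view text;
};
```

Because the views point into memory the arena owns, a node that keeps its arena alive can safely keep its text too. This is the property the commit message relies on.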
By convention, most structs and classes in the Swift compiler include a `dump()` method which prints debugging information. These methods are meant to be called only from the debugger, but this means they’re often unused and may be eliminated from optimized binaries. On the other hand, some parts of the compiler call `dump()` methods directly despite their being intended as a pure debugging aid. Clang supports attributes which can be used to avoid these problems, but they’re used very inconsistently across the compiler.
This commit adds `SWIFT_DEBUG_DUMP` and `SWIFT_DEBUG_DUMPER(<name>(<params>))` macros to declare `dump()` methods with the appropriate set of attributes and adopts this macro throughout the frontend. It does not pervasively adopt this macro in SILGen, SILOptimizer, or IRGen; these components use `dump()` methods in a different way where they’re frequently called from debugging code. Nor does it adopt it in runtime components like swiftRuntime and swiftReflection, because I’m a bit worried about size.
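The snippet below sketches what macros like `SWIFT_DEBUG_DUMP` and `SWIFT_DEBUG_DUMPER` could expand to with Clang or GCC. The names `DEBUG_DUMP`/`DEBUG_DUMPER` and the exact attribute set are assumptions for illustration, not the compiler's actual definitions. The intent of each attribute is as follows: `used` keeps the method in optimized binaries, `noinline` keeps a callable out-of-line copy for the debugger, and `deprecated` makes direct calls from ordinary code produce a warning.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical macros; DEBUG_DUMPER takes the method name and parameter list.
#define DEBUG_DUMPER(nameAndParams)                                        \
  __attribute__((deprecated("only for use in the debugger"), noinline,     \
                 used)) void nameAndParams const
#define DEBUG_DUMP DEBUG_DUMPER(dump())

struct Token {
  int kind = 0;

  // Expands to `void dump() const` carrying the debugger-only attributes.
  DEBUG_DUMP { std::fprintf(stderr, "Token(kind=%d)\n", kind); }

  // DEBUG_DUMPER also covers overloads that take parameters.
  DEBUG_DUMPER(dump(std::FILE *out)) {
    assert(out && "need an output stream");
    std::fprintf(out, "Token(kind=%d)\n", kind);
  }
};
```

Under this scheme, a stray `tok.dump()` call in compiler code gets a deprecation warning, while `p tok.dump()` in the debugger keeps working in optimized builds.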
Despite the large number of files and lines affected, this change is NFC.
This allows an elegant design in which we can still allocate RawSyntax
nodes using a bump allocator but are able to automatically free that
buffer once the last RawSyntax node within that buffer is freed.
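Here is a minimal sketch of that design, assuming each node holds a `std::shared_ptr` to the arena it was allocated in. `Arena`, `Node`, `makeNode` and `releaseNode` are hypothetical names, and the nodes' own reference counting is omitted. The sketch shows one subtle point. The arena reference is moved out of the node before the node is destroyed, so the slab holding the node is not freed while the node's destructor is still running.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <utility>
#include <vector>

// Hypothetical bump allocator; its slabs are freed only when the Arena is
// destroyed, i.e. when the last shared_ptr to it goes away.
class Arena {
public:
  static constexpr std::size_t kSlabSize = 4096;

  // `align` must be a power of two no larger than the default new alignment.
  void *allocate(std::size_t size, std::size_t align) {
    assert((align & (align - 1)) == 0 && size <= kSlabSize);
    std::size_t offset = (used_ + align - 1) & ~(align - 1);
    if (slabs_.empty() || offset + size > kSlabSize) {
      slabs_.push_back(std::make_unique<std::byte[]>(kSlabSize));
      offset = 0;
    }
    used_ = offset + size;
    return slabs_.back().get() + offset;
  }

private:
  std::vector<std::unique_ptr<std::byte[]>> slabs_;
  std::size_t used_ = 0;
};

// Hypothetical node: it lives inside the arena and keeps that arena alive.
struct Node {
  std::shared_ptr<Arena> arena;
  int value;
};

Node *makeNode(const std::shared_ptr<Arena> &arena, int value) {
  void *mem = arena->allocate(sizeof(Node), alignof(Node));
  return new (mem) Node{arena, value};
}

// Called when a node's own reference count (omitted here) drops to zero.
void releaseNode(Node *node) {
  std::shared_ptr<Arena> keepAlive = std::move(node->arena);
  node->~Node();
  // If keepAlive was the last reference, the arena and all of its slabs
  // (including the node's storage) are freed here, after destruction.
}
```

A `std::shared_ptr` is used here only to keep the sketch short. An intrusive reference count on the arena would serve the same purpose.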
This also resolves a memory leak of RawSyntax nodes that was caused by
ParserUnit not freeing its underlying ASTContext.
Naming the bit-field structs is a significant readability improvement
because it's very clear that you shouldn't touch e.g. Bits.Token
without having checked/asserted that you're in a token case.
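As an illustration of the named bit-field pattern, here is a node that packs kind-specific data into a union of named structs. The field names, widths and accessors are assumptions for illustration, not the actual layout. Each accessor asserts the node kind before reading its struct, which is exactly the check that the `Bits.Token` spelling reminds readers to make.

```cpp
#include <cassert>
#include <cstdint>

enum class NodeKind : std::uint8_t { Token, Layout };

class Node {
public:
  static Node makeToken(unsigned tokenKind, bool isMissing) {
    Node n(NodeKind::Token);
    n.Bits.Token.TokenKind = tokenKind;
    n.Bits.Token.IsMissing = isMissing;
    return n;
  }
  static Node makeLayout(unsigned numChildren) {
    Node n(NodeKind::Layout);
    n.Bits.Layout.NumChildren = numChildren;
    return n;
  }

  bool isToken() const { return Kind == NodeKind::Token; }

  // Each accessor checks the kind before touching its named struct.
  unsigned getTokenKind() const {
    assert(isToken() && "Bits.Token is only valid for tokens");
    return Bits.Token.TokenKind;
  }
  bool isMissing() const {
    assert(isToken() && "Bits.Token is only valid for tokens");
    return Bits.Token.IsMissing;
  }
  unsigned getNumChildren() const {
    assert(!isToken() && "Bits.Layout is only valid for layout nodes");
    return Bits.Layout.NumChildren;
  }

private:
  explicit Node(NodeKind kind) : Kind(kind), Bits() {}

  NodeKind Kind;
  union {
    struct {
      std::uint32_t TokenKind : 16;
      std::uint32_t IsMissing : 1;
    } Token;
    struct {
      std::uint32_t NumChildren : 24;
    } Layout;
  } Bits;
};
```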
The assertions are all in statement context (which was obvious
because the NDEBUG versions all included semicolons), so there's no
reason not to use the traditional `do { } while (false)` trick instead
of a statement-expression.
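The `do { } while (false)` trick is shown below with a hypothetical `CHECK` macro (not one of the compiler's real macros). The wrapper turns the macro body into exactly one statement that still requires a trailing semicolon. As a result, it composes with an unbraced `if`/`else` in both the asserting and the `NDEBUG` configuration, without needing the GNU statement-expression extension `({ ... })`.

```cpp
#include <cstdio>
#include <cstdlib>

#ifndef NDEBUG
#define CHECK(cond)                                                        \
  do {                                                                     \
    if (!(cond)) {                                                         \
      std::fprintf(stderr, "check failed: %s (%s:%d)\n", #cond, __FILE__,  \
                   __LINE__);                                              \
      std::abort();                                                        \
    }                                                                      \
  } while (false)
#else
#define CHECK(cond) do { } while (false)
#endif

int clampToPositive(int x) {
  if (x > 0)
    CHECK(x != 0);  // expands to a single statement...
  else              // ...so this else still pairs with the if above
    x = 1;
  return x;
}
```

If the macro expanded to a bare `{ ... }` block instead, the `;` after `CHECK(...)` would end the `if` statement, and the following `else` would be a syntax error.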
This also clears up some warnings in atypical build configurations.
Introduced SyntaxArena for managing memory and caching.
SyntaxArena holds a BumpPtrAllocator as its allocation storage.
A RawSyntax node can now be constructed either with normal heap
allocation or by a SyntaxArena. RawSyntax has a ManualMemory flag which
indicates that it's managed by a SyntaxArena. If the flag is true, its
Retain()/Release() are no-ops, so it's never destroyed by IntrusiveRefCntPtr.
This speeds up memory allocation for RawSyntax.
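A minimal sketch of the ManualMemory idea follows. `RawNode` and `SyntaxArena` are simplified, hypothetical stand-ins, and a fixed-size buffer replaces `BumpPtrAllocator`. Heap nodes are reference counted and delete themselves at zero. Arena nodes ignore `Retain()`/`Release()`, the hooks that a smart pointer such as `llvm::IntrusiveRefCntPtr` calls, and are reclaimed only when the arena's storage is freed.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>

class RawNode {
public:
  explicit RawNode(bool manualMemory) : ManualMemory(manualMemory) {}

  void Retain() const {
    if (!ManualMemory)
      ++RefCount;
  }
  void Release() const {
    if (ManualMemory)
      return;  // arena-owned: never destroyed through ref-counting
    assert(RefCount > 0 && "over-release");
    if (--RefCount == 0) {
      ++HeapNodesDestroyed;
      delete this;
    }
  }
  bool isManualMemory() const { return ManualMemory; }

  static inline int HeapNodesDestroyed = 0;  // for demonstration only

private:
  mutable unsigned RefCount = 0;
  const bool ManualMemory;
};

// Hypothetical fixed-capacity bump arena.
class SyntaxArena {
public:
  explicit SyntaxArena(std::size_t capacity)
      : Storage(new std::byte[capacity]), Capacity(capacity) {}

  RawNode *makeNode() {
    constexpr std::size_t align = alignof(RawNode);
    std::size_t offset = (Used + align - 1) & ~(align - 1);
    assert(offset + sizeof(RawNode) <= Capacity && "arena exhausted");
    Used = offset + sizeof(RawNode);
    // RawNode is trivially destructible, so freeing Storage is enough.
    return new (Storage.get() + offset) RawNode(/*manualMemory=*/true);
  }

private:
  std::unique_ptr<std::byte[]> Storage;
  std::size_t Capacity;
  std::size_t Used = 0;
};
```

Allocation in the arena is just a pointer bump, which is where the speedup for RawSyntax allocation comes from.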
Also, during syntax parsing, a token RawSyntax node is reused if:
a) it's not a string literal longer than 16 characters; and
b) it doesn't contain free-form text trivia (e.g. a comment).
This reduces the overall allocation cost.
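The reuse rule can be sketched as a cache keyed by token kind, text and trivia. This is a hypothetical model: the names, the use of `std::shared_ptr`, and the definition of "plain" trivia (only spaces, tabs and newlines) are assumptions. Tokens that fail either condition always get a fresh node.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <tuple>

enum class TokKind { Identifier, StringLiteral, Punctuation };

struct TokenNode {
  TokKind kind;
  std::string text, leading, trailing;
};

// Assumption: any trivia other than spaces, tabs and newlines is free-form
// text (such as a comment) and disables reuse.
bool isPlainTrivia(const std::string &trivia) {
  return trivia.find_first_not_of(" \t\n") == std::string::npos;
}

class TokenCache {
public:
  std::shared_ptr<const TokenNode> get(TokKind kind, const std::string &text,
                                       const std::string &leading,
                                       const std::string &trailing) {
    bool reusable = !(kind == TokKind::StringLiteral && text.size() > 16) &&
                    isPlainTrivia(leading) && isPlainTrivia(trailing);
    if (!reusable)
      return std::make_shared<TokenNode>(
          TokenNode{kind, text, leading, trailing});
    auto key = std::make_tuple(kind, text, leading, trailing);
    auto it = Cache.find(key);
    if (it != Cache.end())
      return it->second;  // identical token seen before: share the node
    std::shared_ptr<const TokenNode> node = std::make_shared<TokenNode>(
        TokenNode{kind, text, leading, trailing});
    Cache.emplace(key, node);
    return node;
  }

private:
  std::map<std::tuple<TokKind, std::string, std::string, std::string>,
           std::shared_ptr<const TokenNode>>
      Cache;
};
```

Frequent tokens such as keywords, punctuation and short identifiers surrounded by plain whitespace are then allocated once per cache instead of once per occurrence.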
libSyntax nodes don't maintain absolute source location on each
individual node. Instead, the absolute locations are calculated on
demand with a given root by accumulating the length of all the other
nodes before the target node. This bridging is important for issuing
diagnostics from libSyntax entities.
Since our current implementation of the source location calculation
has multiple bugs, this patch re-implements this bridging using the
newly added syntax visitor. We also moved the function from RawSyntax
to Syntax for better visibility.
To test this source location calculation, we added a new action in
swift-syntax-test. This action parses a given file as a
SourceFileSyntax, calculates the absolute location of the
EOF token in the SourceFileSyntax, and dumps the buffer from the start
of the input file to the absolute location of the EOF. Finally, we compare
the dump with the original input to ensure they are identical.
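The on-demand calculation and the round-trip check described above can be sketched as follows, assuming a simplified tree in which leaves are tokens holding their full text (including trivia) and inner nodes hold only children. `Node`, `accumulate` and `roundTrips` are hypothetical names. A pre-order walk adds the length of every token visited before the target, so each preceding token is counted exactly once.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

struct Node {
  std::string text;                             // token text incl. trivia
  std::vector<std::shared_ptr<Node>> children;  // empty for tokens
};

// Pre-order walk; `offset` accumulates the length of every token that comes
// before `target`. Returns true once the target has been reached.
bool accumulate(const Node &node, const Node *target, std::size_t &offset) {
  if (&node == target)
    return true;
  if (node.children.empty()) {  // a token preceding the target
    offset += node.text.size();
    return false;
  }
  for (const auto &child : node.children)
    if (accumulate(*child, target, offset))
      return true;
  return false;
}

// Mirrors the swift-syntax-test check: take the input up to the computed EOF
// offset and compare it with the whole input.
bool roundTrips(const std::string &input, const Node &root, const Node *eof) {
  std::size_t offset = 0;
  if (!accumulate(root, eof, offset))
    return false;
  return offset <= input.size() && input.substr(0, offset) == input;
}
```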
Along with starting to support ternary expressions, this commit also
slightly changes SyntaxParsingContext APIs as follows:
1. Previously, makeNode() only supported node creation from the nodes in
the underlying syntax token array; this commit allows it to use the nodes
from the pending syntax list as well.
2. This commit enforces that the pending syntax list never contains
token syntax nodes.
3. The node kind test shouldn't include unknown kinds. They are noisy.
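Points 1 and 2 can be modeled with a parsing context that keeps two storage areas: a token array and a pending list of already-built nodes. `makeNode()` consumes from either one and always pushes its result onto the pending list. The names and the string-based kinds are hypothetical simplifications, not the SyntaxParsingContext API. The assertion in `addPending` enforces the invariant that the pending list never holds a token.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct SyntaxNode {
  std::string kind;  // e.g. "Identifier", "TernaryExpr"
  bool isToken = false;
  std::vector<SyntaxNode> children;
};

class ParsingContext {
public:
  void addToken(std::string kind) {
    Tokens.push_back({std::move(kind), true, {}});
  }

  // Build a node of `kind` from the last `count` entries of either the token
  // array or the pending list, and record the result as pending.
  void makeNode(std::string kind, std::size_t count, bool fromPending) {
    std::vector<SyntaxNode> &source = fromPending ? Pending : Tokens;
    assert(count <= source.size() && "not enough nodes to consume");
    auto first = source.end() - static_cast<std::ptrdiff_t>(count);
    SyntaxNode node{std::move(kind), false,
                    std::vector<SyntaxNode>(first, source.end())};
    source.erase(first, source.end());
    addPending(std::move(node));
  }

  const std::vector<SyntaxNode> &pending() const { return Pending; }

private:
  // Invariant: the pending list never contains a token node.
  void addPending(SyntaxNode node) {
    assert(!node.isToken && "pending list must not contain tokens");
    Pending.push_back(std::move(node));
  }

  std::vector<SyntaxNode> Tokens, Pending;
};
```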
This commit teaches the parser to generate code block syntax nodes. To support this,
a SyntaxParsingContext can be created with a single syntax kind, indicating that the whole
context should be parsed into a node of that kind. Another change is to bridge a created
syntax node to the given context kind. For instance, if a statement context results in an
expression node, the expression node is bridged to a statement by wrapping it in an
ExpressionStmt node.
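The bridging step can be sketched as a function that adapts a finished node to the kind its context expects. In this sketch, an expression produced in a statement context is wrapped in an `ExpressionStmt`. The kinds, `bridge` and `makeCodeBlock` are hypothetical simplifications, not the libSyntax API.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

enum class Kind { Expr, Stmt, ExpressionStmt, CodeBlock };

struct Node {
  Kind kind;
  std::vector<std::shared_ptr<Node>> children;
};

// Adapt `node` to the kind its parsing context was created with.
std::shared_ptr<Node> bridge(std::shared_ptr<Node> node, Kind contextKind) {
  if (contextKind == Kind::Stmt && node->kind == Kind::Expr)
    return std::make_shared<Node>(Node{Kind::ExpressionStmt, {node}});
  return node;  // already the expected kind, or no bridging rule applies
}

// A code-block context bridges each parsed item to a statement.
std::shared_ptr<Node> makeCodeBlock(
    const std::vector<std::shared_ptr<Node>> &items) {
  auto block = std::make_shared<Node>(Node{Kind::CodeBlock, {}});
  for (const auto &item : items)
    block->children.push_back(bridge(item, Kind::Stmt));
  return block;
}
```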
* Generate libSyntax API
This patch removes the hand-rolled libSyntax API and replaces it with an
API that's entirely automatically generated. This means the API is
guaranteed to be internally stylistically and functionally consistent.
Previously, users of TokenSyntax would always deal with RC<TokenSyntax>
which is a subclass of RawSyntax. Instead, provide TokenSyntax as a
fully-realized Syntax node, that will always exist as a leaf in the
Syntax tree.
This hides the implementation detail of RawSyntax and SyntaxData
completely from clients of libSyntax, and paves the way for future
generation of Syntax nodes.
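The layering can be sketched as follows. `RawSyntax` and `SyntaxData` sit in an internal namespace, and clients only handle value types such as `Syntax` and `TokenSyntax`, where a token is an ordinary Syntax node that is always a leaf. The member names, the `make` factory and `withText` are hypothetical illustrations, not the generated API.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

namespace detail {
// Immutable, shareable storage for a node (internal).
struct RawSyntax {
  std::string text;
};
// Per-node bookkeeping such as a parent pointer would live here (internal).
struct SyntaxData {
  std::shared_ptr<const RawSyntax> raw;
};
}  // namespace detail

// Public base type; clients never see RawSyntax or SyntaxData directly.
class Syntax {
protected:
  explicit Syntax(std::shared_ptr<const detail::SyntaxData> data)
      : Data(std::move(data)) {}
  std::shared_ptr<const detail::SyntaxData> Data;
};

// A token is a regular Syntax node that is always a leaf of the tree.
class TokenSyntax : public Syntax {
public:
  static TokenSyntax make(std::string text) {
    auto raw = std::make_shared<detail::RawSyntax>(
        detail::RawSyntax{std::move(text)});
    return TokenSyntax(
        std::make_shared<detail::SyntaxData>(detail::SyntaxData{raw}));
  }
  const std::string &getText() const { return Data->raw->text; }
  // Nodes are immutable: "changing" the text produces a new token.
  TokenSyntax withText(std::string text) const { return make(std::move(text)); }

private:
  explicit TokenSyntax(std::shared_ptr<const detail::SyntaxData> data)
      : Syntax(std::move(data)) {}
};
```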
The Swift 4 Migrator is invoked through either the driver or the frontend
with the -update-code flag.
The basic pipeline in the frontend is:
- Perform some list of syntactic fixes (there are currently none).
- Perform N rounds of sema fix-its on the primary input file, currently
set to 7 based on prior migrator seasons. Right now, this is just set
to take any fix-it suggested by the compiler.
- Emit a replacement map file: a JSON file, in a format Xcode understands,
describing replacements to the file.
Currently, the Migrator maintains a history of migration states along
the way for debugging purposes.
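The fix-it phase of that pipeline can be sketched as a bounded loop that records every intermediate state. `MigrationState`, `runMigrator` and the `applyFixits` callback are hypothetical placeholders rather than compiler APIs. Stopping early once a round changes nothing is an assumption added for the sketch. Emitting the replacement map is left out.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

struct MigrationState {
  std::string passName;  // e.g. "FixitMigrationState"
  std::string inputText;
  std::string outputText;
};

// `applyFixits` stands in for one round of type checking plus applying every
// fix-it the compiler suggests.
std::vector<MigrationState> runMigrator(
    const std::string &source,
    const std::function<std::string(const std::string &)> &applyFixits,
    unsigned maxRounds = 7) {
  std::vector<MigrationState> history;  // kept for debugging
  std::string current = source;
  // Syntactic fixes would run here first; there are currently none.
  for (unsigned round = 0; round < maxRounds; ++round) {
    std::string next = applyFixits(current);
    history.push_back({"FixitMigrationState", current, next});
    if (next == current)
      break;  // assumption: stop once a round changes nothing
    current = std::move(next);
  }
  return history;
}
```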
- Add -emit-remap frontend option
This will indicate the EmitRemap frontend action.
- Don't fork to a separate swift-update binary.
This is going to be a mode of the compiler, invoked by the same flags.
- Add -disable-migrator-fixits option
Useful for debugging, this skips the phase in the Migrator that
automatically applies fix-its suggested by the compiler.
- Add -emit-migrated-file-path option
This is used for testing/debugging scenarios. This takes the final
migration state's output text and writes it to the file specified
by this option.
- Add -dump-migration-states-dir
This dumps all of the migration states encountered during a migration
run for a file to the given directory. For example, the compiler
fix-it migration pass dumps the input file, the output file, and the
remap file between the two.
State output has the following naming convention:
${Index}-${MigrationPassName}-${What}.${extension}, such as:
1-FixitMigrationState-Input.swift
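The naming convention above amounts to simple string concatenation. `migrationStateFileName` is a hypothetical helper, shown only to make the pattern concrete.

```cpp
#include <cassert>
#include <string>

// Builds ${Index}-${MigrationPassName}-${What}.${extension}.
std::string migrationStateFileName(unsigned index, const std::string &passName,
                                   const std::string &what,
                                   const std::string &extension) {
  return std::to_string(index) + "-" + passName + "-" + what + "." + extension;
}
```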
rdar://problem/30926261
* Refactor Tuple Type Syntax
This patch:
- Refactors TypeArgumentListSyntax and
TypeArgumentListSyntaxData to use the SyntaxCollection and
SyntaxCollectionData APIs.
- Refactors TupleTypeElementSyntax to own its trailing comma, and
updates the tests accordingly.
- Provides infrastructure for promoting types to use
the SyntaxCollection APIs
* Addressed comments.
* Renamed makeBlankTypeArgumentList()
* Update makeTupleType
* Changed makeTupleType to take an element list.
* Updated comment.
* Improved API for creating TupleTypeElementListSyntax'es
* Added round-trip test
* Removed last TypeArgumentList holdovers.
* Fixed round-trip test invocation
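The benefit of having each element own its trailing comma can be sketched like this. Under that assumption, a tuple type's element list is a plain collection, and printing it reproduces the source exactly, which is what a round-trip test checks. `TupleTypeElement`, `printTupleType` and `makeTupleElementList` are hypothetical simplifications of the real syntax nodes.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

struct TupleTypeElement {
  std::string type;                  // e.g. "Int"
  std::optional<std::string> comma;  // ", " including trivia; none for the last
};

using TupleTypeElementList = std::vector<TupleTypeElement>;

std::string printTupleType(const TupleTypeElementList &elements) {
  std::string out = "(";
  for (const auto &e : elements) {
    out += e.type;
    if (e.comma)
      out += *e.comma;
  }
  return out + ")";
}

// Builds a list from type names, giving every element but the last a comma.
TupleTypeElementList makeTupleElementList(const std::vector<std::string> &types) {
  TupleTypeElementList list;
  for (std::size_t i = 0; i < types.size(); ++i) {
    TupleTypeElement e{types[i], std::nullopt};
    if (i + 1 < types.size())
      e.comma = ", ";
    list.push_back(e);
  }
  return list;
}
```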
Add an option to the lexer to go back and get a list of "full"
tokens, which include their leading and trailing trivia, which
we can index into from SourceLocs in the current AST.
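The "full" token idea can be sketched with a toy lexer in which whitespace is the only trivia and a token is any run of non-space characters. It assumes the convention that trailing trivia stops before a newline, and that whatever follows belongs to the next token's leading trivia. A final empty EOF token absorbs any trailing whitespace. A byte offset stands in for a SourceLoc. `FullToken`, `lexFull` and `tokenAt` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

struct FullToken {
  std::size_t offset;  // start of the token text (what a SourceLoc points at)
  std::string leading, text, trailing;
};

std::vector<FullToken> lexFull(const std::string &src) {
  auto isSpace = [](char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; };
  std::vector<FullToken> tokens;
  std::size_t i = 0;
  while (true) {
    FullToken tok;
    std::size_t start = i;
    while (i < src.size() && isSpace(src[i])) ++i;
    tok.leading = src.substr(start, i - start);
    tok.offset = i;
    if (i == src.size()) {  // EOF token with empty text
      tokens.push_back(tok);
      break;
    }
    start = i;
    while (i < src.size() && !isSpace(src[i])) ++i;
    tok.text = src.substr(start, i - start);
    start = i;
    while (i < src.size() && (src[i] == ' ' || src[i] == '\t')) ++i;
    tok.trailing = src.substr(start, i - start);
    tokens.push_back(tok);
  }
  return tokens;
}

// Map a location (modeled as a byte offset) back to its full token.
const FullToken *tokenAt(const std::vector<FullToken> &tokens, std::size_t offset) {
  auto it = std::lower_bound(tokens.begin(), tokens.end(), offset,
                             [](const FullToken &t, std::size_t off) { return t.offset < off; });
  return (it != tokens.end() && it->offset == offset) ? &*it : nullptr;
}
```

Concatenating `leading + text + trailing` over all tokens reproduces the input exactly, so no source text is lost.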
This starts the Syntax sublibrary, which will support structured
editing APIs. Some skeleton support and basic implementations are
in place for types and generics in the grammar. Yes, it's slightly
redundant with what we have right now. lib/AST conflates syntax
and semantics in the same place(s); this is a first step in changing
that to separate the two concepts for clarity and also to get closer
to incremental parsing and type-checking. The goal is to eventually
extract all of the syntactic information from lib/AST and change that
to be more of a semantic/symbolic model.
Stub out a Semantics manager. This ought to eventually be used as a hub
for encapsulating lazily computed semantic information for syntax nodes.
For the time being, it can serve as a temporary place for mapping from
Syntax nodes to semantically full lib/AST nodes.
This is still in a molten state - don't get too close, wear appropriate
proximity suits, etc.