We were only keeping track of `RawSyntax` node IDs to incrementally transfer a syntax tree via JSON. However, AFAICT the incremental JSON transfer option has been superceeded by `SyntaxParseActions`, which are more efficient.
So, let’s clean up and remove the `RawSyntax` node ID and JSON incremental transfer option.
In places that still need a notion of `RawSyntax` identity (like determining the reused syntax regions), use the `RawSyntax`’s pointer instead of the manually created ID.
In `incr_transfer_round_trip.py` always use the code path that uses the `SyntaxParseActions` and remove the transitional code that was still using the incremental JSON transfer but was never called.
Now that we have a fast SyntaxDataRef, create a corresponding SyntaxRef hierarchy. In contrast to the Syntax, SyntaxRef does *not* own the backing SyntaxDataRef, but merely has a pointer to the SyntaxDataRef.
In addition to the requirements imposed by SyntaxDataRef, the user of the SyntaxRef hierarchy needs to make sure that the backing SyntaxDataRef of a SyntaxRef node stays alive. While this sounds like a lot of requirements, it has performance advantages:
- Passing a SyntaxRef node around is just passing a pointer around.
- When casting a SyntaxRef node, we only need to create a new SyntaxRef (aka. pointer) that points to the same underlying SyntaxDataRef - there's no need to duplicate the SyntaxDataRef.
- As SyntaxDataRef is not ref-counted, there's no ref-counting overhead involved.
Furthermore, the requirements are typically fulfilled. The getChild methods on SyntaxRef return an OwnedSyntaxRef, which stores the SyntaxDataRef. As long as this variable is stored somewhere on the stack, the corresponding SyntaxRef can safely be used for the duration of the stack frame. Even calls like the following are possible, because OwnedSyntaxRef returned by getChild stays alive for the duration of the entire statement.
```
useSyntaxRef(mySyntaxRef.getChild(0).getRef())
```
Instead of having a heap-allocated RefCountedBox to store a SyntaxData's
parent, reference-count SyntaxData itself. This has a couple of
advantages:
- When passing SyntaxData around, only a pointer needs to be passed
instead of the entire struct contents. This is faster.
- We can later introduce a SyntaxDataRef, which behaves similar to
SyntaxData, but delegates the responsibility that the parent stays
alive to the user. While sacrificing guaranteed memory safety, this
means that SyntaxData can then be stack-allocated without any
ref-counting overhead.
Instead, only reference count the SyntaxArena that the RawSyntax nodes
live in. The user of RawSyntax nodes must guarantee that the SyntaxArena
stays alive as long as the RawSyntax nodes are being accessed.
During parse time, the SyntaxTreeCreator holds on to the SyntaxArena
in which it creates RawSyntax nodes. When inspecting a syntax tree,
the root SyntaxData node keeps the SyntaxArena alive. The change should
be mostly invisible to the users of the public libSyntax API.
This change significantly decreases the overall reference-counting
overhead. Since we were not able to free individual RawSyntax nodes
anyway, performing the reference-counting on the level of the
SyntaxArena feels natural.
Instead, reference count the SyntaxData's parent. This has a couple of
advantages:
1. We eliminate a const_cast that was potentially unsafe
2. It more closely resembles the architecture on the Swift side
3. It has the potential to be optimised further if the parent can be
accessed in an unsafe, non-reference-counted way
By convention, most structs and classes in the Swift compiler include a `dump()` method which prints debugging information. This method is meant to be called only from the debugger, but this means they’re often unused and may be eliminated from optimized binaries. On the other hand, some parts of the compiler call `dump()` methods directly despite them being intended as a pure debugging aid. clang supports attributes which can be used to avoid these problems, but they’re used very inconsistently across the compiler.
This commit adds `SWIFT_DEBUG_DUMP` and `SWIFT_DEBUG_DUMPER(<name>(<params>))` macros to declare `dump()` methods with the appropriate set of attributes and adopts this macro throughout the frontend. It does not pervasively adopt this macro in SILGen, SILOptimizer, or IRGen; these components use `dump()` methods in a different way where they’re frequently called from debugging code. Nor does it adopt it in runtime components like swiftRuntime and swiftReflection, because I’m a bit worried about size.
Despite the large number of files and lines affected, this change is NFC.
(implemented by Nathan Hawes @nathawes)
Advance \p Loc to the last non-missing token of the specified or, if it
doesn't contain any, the last non-missing token preceding it in the
tree.
libSyntax nodes don't maintain absolute source location on each
individual node. Instead, the absolute locations are calculated on
demand with a given root by accumulating the length of all the other
nodes before the target node. This bridging is important for issuing
diagnostics from libSyntax entities.
With the observation that our current implementation of the source
location calculation has multiple bugs, this patch re-implemented this
bridging by using the newly-added syntax visitor. Also, we moved the function
from RawSyntax to Syntax for better visibility.
To test this source location calculation, we added a new action in
swift-syntax-test. This action parses a given file as a
SourceFileSyntax, calculates the absolute location of the
EOF token in the SourceFileSyntax, and dump the buffer from the start
of the input file to the absolute location of the EOF. Finally, we compare
the dump with the original input to ensure they are identical.
* Generate libSyntax API
This patch removes the hand-rolled libSyntax API and replaces it with an
API that's entirely automatically generated. This means the API is
guaranteed to be internally stylistically and functionally consistent.
This will make it easier to incrementally implement syntax nodes,
while allowing us to embed nodes that we do know about inside ones
that we don't.
https://bugs.swift.org/browse/SR-4062
A return statement needs something to return, so implement
integer-literal-expression too. This necessarily also forced
UnknownExprSyntax, UnknownStmtSyntax, and UnknownDeclSyntax,
which are stand-in token buckets for when we don't know
how to transform/migrate an AST.
This commit also contains the core function for caching
SyntaxData children. This is highly tricky code, with some
detailed comments in SyntaxData.{h,cpp}. The gist is that
we have to atomically swap in a SyntaxData pointer into the
child field, so we can maintain pointer identity of SyntaxData
nodes, while still being able to cache them internally.
To prove that this works, there is a multithreaded test that
checks that two threads can ask for a child that hasn't been
cached yet without crashing or violating pointer identity.
https://bugs.swift.org/browse/SR-4010