Currently, when creating a `RawSyntax` layout node, the `RawSyntax` constructor needs to iterate over all child nodes to
a) sum up their sub node count
b) add their arena as a child arena of the new node's arena
But every place that calls these constructors already iterates over all child nodes. So instead of looping twice, we can perform the above operations in the loop that already exists and pass the results to the `RawSyntax` constructor, which speeds up `RawSyntax` node creation.
To ensure the integrity of the `RawSyntax` tree, the passed in values are still validated in release builds.
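As a rough illustration of the idea above, here is a minimal sketch in which the caller's existing loop sums the sub node counts and registers child arenas, then hands the result to the constructor. `ArenaSketch`, `RawNodeSketch`, `makeLayout` and `hasConsistentCount` are hypothetical names invented for this example and are not the real libSyntax types; how and when the real constructor validates the values is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for SyntaxArena: only tracks child arenas.
struct ArenaSketch {
  std::set<const ArenaSketch *> childArenas;
  void addChildArena(const ArenaSketch *child) {
    if (child != this) childArenas.insert(child);
  }
};

// Hypothetical stand-in for a RawSyntax layout node.
struct RawNodeSketch {
  ArenaSketch *arena;
  std::vector<const RawNodeSketch *> layout;
  std::size_t totalSubNodeCount;

  // The constructor stores the caller-computed count instead of looping.
  RawNodeSketch(ArenaSketch *arena, std::vector<const RawNodeSketch *> layout,
                std::size_t totalSubNodeCount)
      : arena(arena), layout(std::move(layout)),
        totalSubNodeCount(totalSubNodeCount) {}

  // Separate validation helper: recomputes the count and compares it.
  bool hasConsistentCount() const {
    std::size_t expected = 0;
    for (const RawNodeSketch *child : layout)
      if (child) expected += child->totalSubNodeCount + 1;
    return expected == totalSubNodeCount;
  }
};

// The call site already loops over the children, so it does the bookkeeping.
RawNodeSketch makeLayout(ArenaSketch &arena,
                         const std::vector<const RawNodeSketch *> &children) {
  std::size_t subNodes = 0;
  for (const RawNodeSketch *child : children) {
    if (!child) continue;                      // missing optional child
    subNodes += child->totalSubNodeCount + 1;  // the child plus its descendants
    arena.addChildArena(child->arena);         // no-op if it is the same arena
  }
  return RawNodeSketch(&arena, children, subNodes);
}
```

The validation helper is kept outside the constructor here only so that the sketch stays neutral about which build configurations run it.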
If the source is empty, the start of the copied buffer is a nullptr which doesn't live inside the SyntaxArena's bump allocator. Thus the assert inside `setHotUseMemoryArea` fails.
We have finally reached our goal of optimising deferred node creation
for SyntaxTreeCreator. Instead of creating dedicated deferred nodes and
copying the data into a RawSyntax node when recording, we always create
RawSyntax nodes. Recording a deferred node is thus a no-op, since we
have already created a RawSyntax node. Should a deferred node not be
recorded, it stays alive in the SyntaxArena without any reference to it.
While this means we are leaking some memory for such nodes, most nodes
do get recorded, so the overhead should be fine compared to the
performance benefit.
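The following sketch illustrates why recording becomes a no-op under this scheme. It is a simplified model, not the SyntaxTreeCreator implementation: `TreeCreatorSketch`, `makeDeferred` and `recordDeferred` are hypothetical names, and a `std::deque` stands in for the SyntaxArena.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>

// Hypothetical stand-in for a RawSyntax node.
struct RawNodeSketch {
  std::string kind;
};

class TreeCreatorSketch {
  // std::deque keeps element addresses stable when appending, which makes it
  // a convenient stand-in for an arena in this example.
  std::deque<RawNodeSketch> arena;

public:
  // "Deferring" a node already builds the final node inside the arena.
  const RawNodeSketch *makeDeferred(std::string kind) {
    arena.push_back(RawNodeSketch{std::move(kind)});
    return &arena.back();
  }

  // Recording hands back the very same node; nothing is copied.
  const RawNodeSketch *recordDeferred(const RawNodeSketch *deferred) {
    return deferred;
  }

  // Unrecorded nodes still occupy arena memory until the arena dies.
  std::size_t nodesInArena() const { return arena.size(); }
};
```

A deferred node that is never recorded simply remains in the arena, which corresponds to the small memory overhead described above.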
By now ParsedRawSyntaxNode does not have any knowledge about deferred
node data anymore, which frees up SyntaxParseActions (and, in
particular, its subclass SyntaxTreeCreator) to perform optimisations to
more efficiently create and record deferred nodes.
Instead, only reference count the SyntaxArena that the RawSyntax nodes
live in. The user of RawSyntax nodes must guarantee that the SyntaxArena
stays alive as long as the RawSyntax nodes are being accessed.
During parse time, the SyntaxTreeCreator holds on to the SyntaxArena
in which it creates RawSyntax nodes. When inspecting a syntax tree,
the root SyntaxData node keeps the SyntaxArena alive. The change should
be mostly invisible to the users of the public libSyntax API.
This change significantly decreases the overall reference-counting
overhead. Since we were not able to free individual RawSyntax nodes
anyway, performing the reference-counting on the level of the
SyntaxArena feels natural.
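To make the ownership model concrete, here is a minimal sketch of arena-level reference counting. It assumes `std::shared_ptr` as the counting mechanism and uses hypothetical names (`ArenaSketch`, `RawNodeSketch`, `RootSketch`); the real SyntaxArena uses its own machinery, which is not reproduced here.

```cpp
#include <cassert>
#include <deque>
#include <memory>
#include <string>
#include <utility>

// Hypothetical node type: note that it carries no reference count.
struct RawNodeSketch {
  std::string text;
};

// Hypothetical arena: owns all nodes and frees them together.
class ArenaSketch {
  std::deque<RawNodeSketch> nodes;  // appending keeps node addresses stable

public:
  const RawNodeSketch *create(std::string text) {
    nodes.push_back(RawNodeSketch{std::move(text)});
    return &nodes.back();
  }
};

// Hypothetical root handle, playing the role the root SyntaxData node has in
// the text: the only place where a reference count is touched.
struct RootSketch {
  std::shared_ptr<ArenaSketch> arena;  // keeps every node in the arena alive
  const RawNodeSketch *node;           // plain pointer, no per-node counting
};
```

Copying or passing around node pointers costs nothing in this model; only creating or destroying a root handle changes a count.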
Do the same thing that we are already doing for trivia: Since RawSyntax
nodes always live inside a SyntaxArena, we don't need to tail-allocate
an OwnedString to store the token's text. Instead we can just copy it
to the SyntaxArena. If we copy the entire source buffer to the syntax
arena at the start of parsing, this means that no more copies are
required later on. We also avoid ref-counting the OwnedString, which
should further improve performance.
In practice SyntaxArena.containsPointer is almost always called with a
pointer from the SyntaxArena's source buffer. To avoid walking through
all of the bump allocator's slabs until we find the one containing the
source buffer, add a hot use memory region (which lives inside the bump
allocator) that is checked first before consulting the bump allocator.
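A small sketch of the fast path described above, under stated assumptions: `HotRegionArena`, `markHotRegion` and the `slabWalks` counter are hypothetical names for illustration, and a `std::deque` of byte vectors stands in for the bump allocator's slabs.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <vector>

class HotRegionArena {
  std::deque<std::vector<char>> slabs;  // stand-in for bump allocator slabs
  const char *hotBegin = nullptr;
  const char *hotEnd = nullptr;

  static bool inRange(const char *p, const char *begin, const char *end) {
    // std::less gives a well-defined order even for unrelated pointers.
    std::less<const char *> before;
    return !before(p, begin) && before(p, end);
  }

  bool slabsContain(const char *p) const {
    for (const std::vector<char> &slab : slabs)
      if (inRange(p, slab.data(), slab.data() + slab.size())) return true;
    return false;
  }

public:
  // Counts slow-path lookups, purely so the effect can be observed.
  mutable std::size_t slabWalks = 0;

  char *allocate(std::size_t size) {
    slabs.emplace_back(size);  // one slab per allocation, for simplicity
    return slabs.back().data();
  }

  void markHotRegion(const char *begin, const char *end) {
    assert(slabsContain(begin) && "hot region must live inside the arena");
    hotBegin = begin;
    hotEnd = end;
  }

  bool containsPointer(const char *p) const {
    if (inRange(p, hotBegin, hotEnd)) return true;  // fast path
    ++slabWalks;                                    // slow path: walk slabs
    return slabsContain(p);
  }
};
```

With the source buffer marked as the hot region, lookups for token text never reach the slab walk.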
Referencing a string in arbitrary memory is not safe since the source
buffer to which it points may have been freed. Instead copy all strings
into the SyntaxArena. Since RawSyntax nodes retain their arena, they can
be sure that the string won't disappear if it lives in their arena.
To avoid lots of small copies, we copy the entire source buffer once
into the syntax arena and make StringRefs point into that buffer.
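The sketch below shows the single-copy approach in miniature. It assumes C++17 `std::string_view` in place of LLVM's `StringRef`, and `SourceArena`, `copySource`, `Token` and `lexWords` are hypothetical names; the toy lexer only splits on spaces.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical arena that owns copied source buffers.
class SourceArena {
  std::deque<std::string> buffers;  // appending does not move existing strings

public:
  // Copy the whole source once; the returned view points into the arena.
  std::string_view copySource(std::string_view source) {
    buffers.emplace_back(source);
    return buffers.back();
  }
};

struct Token {
  std::string_view text;  // points into the arena's copy, never the original
};

// Toy lexer: every token is a sub-view of the buffer it was given, so no
// per-token copy is made.
std::vector<Token> lexWords(std::string_view source) {
  std::vector<Token> tokens;
  std::size_t pos = 0;
  while (pos < source.size()) {
    if (source[pos] == ' ') {
      ++pos;
      continue;
    }
    std::size_t end = source.find(' ', pos);
    if (end == std::string_view::npos) end = source.size();
    tokens.push_back(Token{source.substr(pos, end - pos)});
    pos = end;
  }
  return tokens;
}
```

Because the tokens reference the arena's copy, they stay valid after the caller's original buffer has been freed, as long as the arena itself is alive.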
Currently when parsing a SourceFile, the parser
gets handed pointers so that it can write the
interface hash and collected tokens directly into
the file. It can also call `setSyntaxRoot` at
the end of parsing to set the syntax tree.
In preparation for the removal of
`performParseOnly`, this commit formalizes these
values as outputs of `ParseSourceFileRequest`,
ensuring that the file gets parsed when the
interface hash, collected tokens, or syntax tree
is queried.
Like the last commit, SourceFile is used a lot by Parse and Sema, but
less so by the ClangImporter and (de)Serialization. Split it out to
cut down on recompilation times when something changes.
This commit does /not/ split the implementation of SourceFile out of
Module.cpp, which is where most of it lives. That might also be a
reasonable change, but I was reluctant to make it because a
number of SourceFile members correspond to the entry points in
ModuleDecl. Someone else can pick this up later if they decide it's a
good idea.
No functionality change.
Most of AST, Parse, and Sema deal with FileUnits regularly, but SIL
and IRGen certainly don't. Split FileUnit out into its own header to
cut down on recompilation times when something changes.
No functionality change.
So that we can easily detect 'ParsedSyntaxNode' leaking. When it's
moved, the original node becomes a "null" node. In the destructor of
'ParsedSyntaxNode', assert that the node is not a "recorded" node.
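A minimal sketch of this leak-detection pattern, assuming a move-only wrapper with a small state enum. `LeakCheckedNode`, `take` and the integer payload are hypothetical and only illustrate the technique; they do not mirror the actual 'ParsedSyntaxNode' layout.

```cpp
#include <cassert>
#include <utility>

class LeakCheckedNode {
  enum class State { Null, Recorded };
  State state = State::Null;
  int payload = 0;  // stand-in for the opaque recorded node

  void reset() {
    state = State::Null;
    payload = 0;
  }

public:
  LeakCheckedNode() = default;

  static LeakCheckedNode recorded(int payload) {
    LeakCheckedNode node;
    node.state = State::Recorded;
    node.payload = payload;
    return node;
  }

  // Move-only: moving transfers the payload and nulls out the source.
  LeakCheckedNode(LeakCheckedNode &&other) noexcept
      : state(other.state), payload(other.payload) {
    other.reset();
  }
  LeakCheckedNode &operator=(LeakCheckedNode &&other) noexcept {
    if (this != &other) {
      assert(state != State::Recorded && "overwriting a recorded node");
      state = other.state;
      payload = other.payload;
      other.reset();
    }
    return *this;
  }
  LeakCheckedNode(const LeakCheckedNode &) = delete;
  LeakCheckedNode &operator=(const LeakCheckedNode &) = delete;

  // A recorded node must have been consumed before it is destroyed.
  ~LeakCheckedNode() {
    assert(state != State::Recorded && "recorded node was leaked");
  }

  bool isNull() const { return state == State::Null; }
  bool isRecorded() const { return state == State::Recorded; }

  // Consuming the node hands out the payload and leaves a null node behind.
  int take() {
    assert(isRecorded());
    int result = payload;
    reset();
    return result;
  }
};
```

In a build with assertions enabled, letting a recorded node go out of scope without calling `take()` (or moving it elsewhere) trips the destructor assertion.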
This is a follow up to the discussion on #22740 to switch the host
libraries to use the `target_link_libraries` rather than the
`LINK_LIBRARIES` special handling. This allows the dependency to be
properly tracked by CMake and allows us to use the more modern syntax.
This eliminates the overhead of ParsedRawSyntaxNode needing to do memory management.
If ParsedRawSyntaxNode needs to point to some data, the memory is allocated from a bump allocator.
There are also some improvements on how the ParsedSyntaxBuilders work.
Instead of creating syntax nodes directly, modify the parser to invoke an abstract interface 'SyntaxParseActions' while it is parsing the source code.
This decouples the act of parsing from the act of forming a syntax tree representation.
'SyntaxTreeCreator' is an implementation of SyntaxParseActions that handles the logic of creating a syntax tree.
To enforce the layering separation of parsing and syntax tree creation, a static library swiftSyntaxParse is introduced to compose the two.
This decoupling is important for introducing a syntax parser library for SwiftSyntax to directly access parsing.
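The decoupling can be sketched roughly as follows. This is an illustration of the pattern only: `ParseActionsSketch`, `TreeCreatorSketch`, `parseWords` and the method signatures are hypothetical and considerably simpler than the real 'SyntaxParseActions' interface.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Opaque handle the parser passes around without knowing what it refers to.
using OpaqueNode = std::size_t;

// Hypothetical abstract interface the parser talks to.
class ParseActionsSketch {
public:
  virtual ~ParseActionsSketch() = default;
  virtual OpaqueNode recordToken(std::string_view text) = 0;
  virtual OpaqueNode recordLayout(std::string_view kind,
                                  const std::vector<OpaqueNode> &children) = 0;
};

// One implementation: builds a simple tree stored in a flat vector.
class TreeCreatorSketch final : public ParseActionsSketch {
public:
  struct Node {
    std::string label;
    std::vector<OpaqueNode> children;
  };
  std::vector<Node> nodes;

  OpaqueNode recordToken(std::string_view text) override {
    nodes.push_back(Node{std::string(text), {}});
    return nodes.size() - 1;
  }
  OpaqueNode recordLayout(std::string_view kind,
                          const std::vector<OpaqueNode> &children) override {
    nodes.push_back(Node{std::string(kind), children});
    return nodes.size() - 1;
  }
};

// Toy parser: knows nothing about trees, it only reports what it sees.
OpaqueNode parseWords(std::string_view source, ParseActionsSketch &actions) {
  std::vector<OpaqueNode> children;
  std::size_t pos = 0;
  while (pos < source.size()) {
    if (source[pos] == ' ') {
      ++pos;
      continue;
    }
    std::size_t end = source.find(' ', pos);
    if (end == std::string_view::npos) end = source.size();
    children.push_back(actions.recordToken(source.substr(pos, end - pos)));
    pos = end;
  }
  return actions.recordLayout("WordList", children);
}
```

A different client could supply another implementation of the interface (for example one that forwards nodes across a library boundary) without any change to the parser.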