Like the last commit, SourceFile is used a lot by Parse and Sema, but
less so by the ClangImporter and (de)Serialization. Split it out to
cut down on recompilation times when something changes.
This commit does /not/ split the implementation of SourceFile out of
Module.cpp, which is where most of it lives. That might also be a
reasonable change, but the reason I was reluctant to is because a
number of SourceFile members correspond to the entry points in
ModuleDecl. Someone else can pick this up later if they decide it's a
good idea.
No functionality change.
Most of AST, Parse, and Sema deal with FileUnits regularly, but SIL
and IRGen certainly don't. Split FileUnit out into its own header to
cut down on recompilation times when something changes.
No functionality change.
The type deduction here will copy the returned ParsedTypeSyntax which
is no longer copy constructible. Use a reference to hold the result
instead which should fix the build on Windows. The returned value of
the `popToken` returns a `ParsedTokenSyntax` which is no longer copy
constructible. Explicitly move the value.
So that we can easily detect 'ParsedSyntaxNode' leaking. When it's
moved, the original node become "null" node. In the destructor of
'ParsedSyntaxNode', assert the node is not "recorded" node.
There were a few places discarding recorded syntax:
- '#<code-complete>' at top-level (this should be parsed as UnknownDecl).
- 'typealias' decl with inheritance clause in protocol decl.
Instead of creating the AST directly in the parser (and libSyntax or
SwiftSyntax via SyntaxParsingContext), make Parser to explicitly create
a tree of ParsedSyntaxNodes. Their OpaqueSyntaxNodes can be either
libSyntax or SwiftSyntax. If AST is needed, it can be generated from the
libSyntax tree.
SyntaxParseActions::recordToken() et al. may return the same pointer
value for different nodes (e.g. `nullptr`). So we cannot use DenseMap to
associate the node from the explicit syntax parsing actions to libSyntax
node. Instead, use a structure that wraps them.
Instead of creating the AST directly in the parser (and libSyntax or
SwiftSyntax via SyntaxParsingContext), make Parser to explicitly create
a tree of ParsedSyntaxNodes. Their OpaqueSyntaxNodes can be either
libSyntax or SwiftSyntax. If AST is needed, it can be generated from the
libSyntax tree.
This eliminates the overhead of ParsedRawSyntaxNode needing to do memory management.
If ParsedRawSyntaxNode needs to point to some data the memory is allocated from a bump allocator.
There are also some improvements on how the ParsedSyntaxBuilders work.
Instead of creating syntax nodes directly, modify the parser to invoke an abstract interface 'SyntaxParseActions' while it is parsing the source code.
This decouples the act of parsing from the act of forming a syntax tree representation.
'SyntaxTreeCreator' is an implementation of SyntaxParseActions that handles the logic of creating a syntax tree.
To enforce the layering separation of parsing and syntax tree creation, a static library swiftSyntaxParse is introduced to compose the two.
This decoupling is important for introducing a syntax parser library for SwiftSyntax to directly access parsing.
Instead of creating multiple CodeBlockItemList nodes, that need to get merged and discarded later on, do this:
* Ensure for libSyntax parsing that we parse the whole file
* Create top-level CodeBlockItem nodes that we just directly wrap with a single CodeBlockItemList node at the end
The importance of this change will become more obvious later on when we'll decouple syntax parsing from the formation of libSyntax tree nodes.
This allows an elegant design in which we can still allocate RawSyntax
nodes using a bump allocator but are able to automatically free that
buffer once the last RawSyntax node within that buffer is freed.
This also resolves a memory leak of RawSyntax nodes that was caused by
ParserUnit not freeing its underlying ASTContext.
We cannot use unowned strings for token texts of incrementally parsed
syntax trees since the source buffer to which reused nodes refer will
have been freed for reused nodes. Always copying the token text whenever
OwnedString is passed is too expensive. A reference counted copy of the
string allows us to keep the token's string alive across incremental
parses while eliminating unnecessary copies.