Before this patch, we have one flag (KeepSyntaxInfo) to turn on two syntax
functionalities of parser: (1) collecting parsed tokens for coloring and
(2) building syntax trees. Since sourcekitd is the only consumer of either of these
functionalities, sourcekitd by default always enables such flag.
However, empirical results show (2) is both heavier and less-frequently
needed than (1). Therefore, separating the flag to two flags makes more
sense, where CollectParsedToken controls (1) and BuildSyntaxTree
controls (2).
CollectingParsedToken is always enabled by sourcekitd because
formatting and syntax-coloring need it; however BuildSyntaxTree should
be explicitly switched on by sourcekitd clients.
resolves: rdar://problem/37483076
Introduced SyntaxArena for managing memory and cache.
SyntaxArena holds BumpPtrAllocator as a allocation storage.
RawSyntax is now able to be constructed with normal heap allocation, or
by SyntaxArena. RawSyntax has ManualMemory flag which indicates it's managed by
SyntaxArena. If the flag is true, its Retain()/Release() is no-op thus it's
never destructed by IntrusiveRefCntPtr.
This speedups the memory allocation for RawSyntax.
Also, in Syntax parsing, "token" RawSyntax is reused if:
a) It's not string literal with >16 length; and
b) It doesn't contain random text trivia (e.g. comment).
This reduces the overall allocation cost.
With more syntax nodes being specialized, we'd like this
straight-forward way to pinpoint unknown entities. This diagnostics
is only issued in -emit-syntax frontend action and swift-syntax-test
invocation.
This has three principal advantages:
- It gives some additional type-safety when working
with known accessors.
- It makes it significantly easier to test whether a declaration
is an accessor and encourages the use of a common idiom.
- It saves a small amount of memory in both FuncDecl and its
serialized form.
This allows the root context to have a separate place to keep track of
the global data that each sub-context can access to, for instance,
SourceFile, DiagnosticEngine, etc.
A string interpolation expression is composed of { OpenQuote, Segments,
CloseQuote }. To represent OpenQuote, CloseQuote and StringSegment, we have to
introduce new token kinds correspondingly.
Tuple expression essentially has the same underlying structure as
function call arguments in libSyntax. However, we separate them as
different libSyntax kinds for better usability.
Different from AST, libSyntax currently allows single-child,
label-free tuple expressions (represented as ParenExpr in AST). This is
subject to change if we need to adopt the same differentiation in
libSyntax in the future.
Avoid heap-allocated memory for syntax parsing context.
Add more assertions to ensure syntax nodes are created only at the top of context stack.
Allow syntax parsing context to delay the specifying of context kind and target syntax kind.
This commit also adds ArrayExpr and DictionaryExpr to the libSyntax nodes
family. Also, it refactors the original parser code for these two
expressions to better fit to the design of SyntaxParsingContext.
This commit has also fixed two crashers.
This commit teaches parser to parse two libSyntax nodes: FunctionCallArgument and
FunctionCallArgumentList. Along with the change, some libSyntax parsing infrastructure changes
as well: (1) parser doesn't directly insert token into the buffer for libSyntax node creation;
instead, when creating a simple libSyntax node like integer literal expression, parser should indicate the location of the last token in the node; (2) implicit libSyntax nodes like empty
statement list must contain a source location indicating where the implicit nodes should appear
(immediately before the token at the given location).
This commit teaches parser to generate code block syntax node. As a support for this,
SyntaxParsingContext can be created by a single syntax kind, indicating the whole context
should be parsed into a node of that given syntax. Another change is to bridge created syntax
node with the given context kind. For instance, if a statement context results into an expression
node, the expression node will be bridged to a statement by wrapping it with a ExpressionStmt
node.
* Re-apply "libSyntax: Ensure round-trip printing when we build syntax tree from parser incrementally. (#12709)"
* Re-apply "libSyntax: Root parsing context should hold a reference to the current token in the parser, NFC."
* Re-apply "libSyntax: avoid copying token text when lexing token syntax nodes, NFC. (#12723)"
* Actually fix the container-overflow issue.
Since all parsing contexts need a reference to the current token of the
parser, we should pass the token reference to the root context. Therefore, the derived
sub-contexts can just copy it while being spawned.
This patch allows Parser to generate a refined token stream to satisfy tooling's need. For syntax coloring, token stream from lexer is insufficient because (1) we have contextual keywords like get and set; (2) we may allow keywords to be used as argument labels and names; and (3) we need to split tokens like "==<". In this patch, these refinements are directly fulfilled through parsing without additional heuristics. The refined token vector is optionally saved in SourceFile instance.
...finally breaking the dependency of Parse on Sema.
There are still some unfortunate dependencies here -- Xi's working on
getting /AST/ not dependent on Sema -- but this is a step forward.
It is a little strange that parseIntoSourceFile is in ParseSIL, and
therefore that that's still a dependency for anyone trying to, well,
parse. However, nearly all clients that parse want to type-check as
well, and that requires Sema, Serialization, and the ClangImporter...
and Serialization and SIL currently require each other as well
(another circular dependency). So it's not actively causing us trouble
right now.
Previously, users of TokenSyntax would always deal with RC<TokenSyntax>
which is a subclass of RawSyntax. Instead, provide TokenSyntax as a
fully-realized Syntax node, that will always exist as a leaf in the
Syntax tree.
This hides the implementation detail of RawSyntax and SyntaxData
completely from clients of libSyntax, and paves the way for future
generation of Syntax nodes.
Push the getName method from ValueDecl down to only those types that are
guaranteed to have a name that is backed by an identifier and that will
not be special.
With the introduction of special decl names, `Identifier getName()` on
`ValueDecl` will be removed and pushed down to nominal declarations
whose name is guaranteed not to be special. Prepare for this by calling
to `DeclBaseName getBaseName()` instead where appropriate.
This fixes a syntax coloring issue when only the innermost " character were highlighted as part of the string, e.g:
""<str>"
This is a string
"</str>""
The Swift 4 Migrator is invoked through either the driver and frontend
with the -update-code flag.
The basic pipeline in the frontend is:
- Perform some list of syntactic fixes (there are currently none).
- Perform N rounds of sema fix-its on the primary input file, currently
set to 7 based on prior migrator seasons. Right now, this is just set
to take any fix-it suggested by the compiler.
- Emit a replacement map file, a JSON file describing replacements to a
file that Xcode knows how to understand.
Currently, the Migrator maintains a history of migration states along
the way for debugging purposes.
- Add -emit-remap frontend option
This will indicate the EmitRemap frontend action.
- Don't fork to a separte swift-update binary.
This is going to be a mode of the compiler, invoked by the same flags.
- Add -disable-migrator-fixits option
Useful for debugging, this skips the phase in the Migrator that
automatically applies fix-its suggested by the compiler.
- Add -emit-migrated-file-path option
This is used for testing/debugging scenarios. This takes the final
migration state's output text and writes it to the file specified
by this option.
- Add -dump-migration-states-dir
This dumps all of the migration states encountered during a migration
run for a file to the given directory. For example, the compiler
fix-it migration pass dumps the input file, the output file, and the
remap file between the two.
State output has the following naming convention:
${Index}-${MigrationPassName}-${What}.${extension}, such as:
1-FixitMigrationState-Input.swift
rdar://problem/30926261