'ParserUnit' is used for analyzing syntax structures _mainly_ in
SourceKit.
Since we removed IfConfigDecl from AST, ParserUnit didn't
inclue any AST in #if ... #endif regions even for active region because
it used to consider all inactive. Instead, consider every region
"active" and include all the AST nodes.
rdar://117387631
Out of all operating systems ever supported by Swift, only Ubuntu 14.04
had libstdc++ 4.8, and Swift has sunset support for Ubuntu 14.04 for a
while now.
This adjusts the tests for the difference between line endings on
different platforms. Windows uses CRLF while most Unicies use LF. This
was exposed during the update to the new LLVM snapshot.
The recommended way forward is to use the SyntaxClassifier on the Swift
side.
By removing the C++ SyntaxClassifier, we can also eliminate the
-force-libsyntax-based-processing option that was used to bootstrap
incremental parsing and would generate the syntax map from a syntax
tree.
Made an utility method 'consumeArgumentLabel', and use it for:
* Labels in tuple type
* Labels in tuple expression
* Argument and parameter names in parameter clause
If an edit didn't intersect with an existing highlighted tokens but caused a
later highlighted token to change kind, syntax highlighting would be lost
between the edit and that token.
Resolves rdar://problem/33463141.
We still need to adjust the affected range to the line boundaries and return all
tokens on the line when there are no new tokens, as the client will clear all
tokens on that line in its copy of the syntax map leaving the other tokens
unhighlighted. We also need to extend the affected range to include the ranges
of the mismatched tokens from the previous syntaxmap, so their highlighting will
be cleared.
Also add more comments to better document the new syntax map structure and
behaviour.
This patch changes the syntax map data structure it uses to be offset based
rather than line/col based in order to avoid calling getLineAndColumn for the
start and end offset of every token. This removes the 30% of time spent in
getLineAndColumn for this request in large files (rdar://problem/28965123).
The logic for returning the affected range and the token ranges to highlight
following an edit also made several assumptions that no longer hold. This
patch changes it to compare the syntax maps from before and after the edit,
find the first mismtaching tokens from the start and end of the syntax maps
and return the tokens in that range (adjusted to line boundaries). This fixes
syntax highlighting issues with interpolated multi-line strings
(rdar://problem/32148117) and block comments.
With the above changes the per-keystroke time spent for syntax highlighting
(with sematic info disabled) dropped from ~80ms to just under 50ms for a
12KLOC file.
We still need to adjust the affected range to the line boundaries and return all
tokens on the line when there are no new tokens, as the client will clear all
tokens on that line in its copy of the syntax map leaving the other tokens
unhighlighted. We also need to extend the affected range to include the ranges
of the mismatched tokens from the previous syntaxmap, so their highlighting will
be cleared.
Also add more comments to better document the new syntax map structure and
behaviour.
This patch changes the syntax map data structure it uses to be offset based
rather than line/col based in order to avoid calling getLineAndColumn for the
start and end offset of every token. This removes the 30% of time spent in
getLineAndColumn for this request in large files (rdar://problem/28965123).
The logic for returning the affected range and the token ranges to highlight
following an edit also made several assumptions that no longer hold. This
patch changes it to compare the syntax maps from before and after the edit,
find the first mismtaching tokens from the start and end of the syntax maps
and return the tokens in that range (adjusted to line boundaries). This fixes
syntax highlighting issues with interpolated multi-line strings
(rdar://problem/32148117) and block comments.
With the above changes the per-keystroke time spent for syntax highlighting
(with sematic info disabled) dropped from ~80ms to just under 50ms for a
12KLOC file.
Multiline strings (and multiline tokens in general) were not well supported by the existing highlighting logic. Edits
on one line can make tokens appear/disappear on previous and later lines, which broke assumptions in the existing
logic, and left odd ranges of source unhighlighted or out of date. This patch accounts for these changes, and also
changes unterminated multiline (and regular strings) to still be highlighted as strings, so the rest of the
file doesn't look like plain text.
Resolves rdar://problem/32148117.
* Removed `parseConstructorArguments()`, unified with
`parseSingleParameterClause()`.
* Use `parseSingleParameterClause()` from `parseFunctionSignature()`, so
that we can share the recovery code.
* Removed `isFirstParameterClause` parameter from `mapParsedParameters`,
because it's predictable from `paramContext`.
Store leading a trailing "trivia" around a token, such as whitespace,
comments, doc comments, and escaping backticks. These are syntactically
important for preserving formatting when printing ASTs but don't
semantically affect the program.
Tokens take all trailing trivia up to, but not including, the next
newline. This is important to maintain checks that statements without
semicolon separators start on a new line, among other things.
Trivia are now data attached to the ends of tokens, not tokens
themselves.
Create a new Syntax sublibrary for upcoming immutable, persistent,
thread-safe ASTs, which will contain only the syntactic information
about source structure, as well as for generating new source code, and
structural editing. Proactively move swift::Token into there.
Since this patch is getting a bit large, a token fuzzer which checks
for round-trip equivlence with the workflow:
fuzzer => token stream => file1
=> Lexer => token stream => file 2 => diff(file1, file2)
Will arrive in a subsequent commit.
This patch does not change the grammar.
Argument labels are allowed to use keywords, in which case we want to
treat them as identifiers in the syntax map (except for '_'). This
commit moves calculation of that into the original lexing instead of
in the model walker, which makes it much more robust, since the model
walker was only guessing about what was next on the the TokenNodes list.
This fixes a bug where arrays of object literals would only have the
first object correct (the following ones were identifiers), as well as
some incorrect cases where we treated keywords as identifiers.
rdar://problem/27726422
* A bunch of them require objc_interop because they import code containing
Objective-C.
* Many others fail on Ubuntu 14.04 because the C++ there doesn't have a
functional std::regex implementation which is required by the
`complete-test` tool.
It may be possible to adjust some of these tests in the future to not
need these extra requirements, but this is a straightforward way to
clean up Linux test results for now.