This patch allows Parser to generate a refined token stream to satisfy tooling's need. For syntax coloring, token stream from lexer is insufficient because (1) we have contextual keywords like get and set; (2) we may allow keywords to be used as argument labels and names; and (3) we need to split tokens like "==<". In this patch, these refinements are directly fulfilled through parsing without additional heuristics. The refined token vector is optionally saved in SourceFile instance.
This changes `getBaseName()` on `DeclName` to return a `DeclBaseName`
instead of an `Identifier`. All places that will continue to be
expecting an `Identifier` are changed to call `getBaseIdentifier` which
will later assert that the `DeclName` is actually backed by an
identifier and not a special name.
For transitional purposes, a conversion operator from `DeclBaseName` to
`Identifier` has been added that will be removed again once migration
to DeclBaseName has been completed in other parts of the compiler.
Unify approach to printing declaration names
Printing a declaration's name using `<<` and `getBaseName()` is be
independent of the return type of `getBaseName()` which will change in
the future from `Identifier` to `DeclBaseName`
Multiline strings (and multiline tokens in general) were not well supported by the existing highlighting logic. Edits
on one line can make tokens appear/disappear on previous and later lines, which broke assumptions in the existing
logic, and left odd ranges of source unhighlighted or out of date. This patch accounts for these changes, and also
changes unterminated multiline (and regular strings) to still be highlighted as strings, so the rest of the
file doesn't look like plain text.
Resolves rdar://problem/32148117.
Resolves: https://bugs.swift.org/browse/SR-4426
* Make IfConfigDecl be able to hold ASTNodes
* Parse #if as IfConfigDecl
* Stop enclosing toplevel #if into TopLevelCodeDecl.
* Eliminate IfConfigStmt
Replace `NameOfType foo = dyn_cast<NameOfType>(bar)` with DRY version `auto foo = dyn_cast<NameOfType>(bar)`.
The DRY auto version is by far the dominant form already used in the repo, so this PR merely brings the exceptional cases (redundant repetition form) in line with the dominant form (auto form).
See the [C++ Core Guidelines](https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md#es11-use-auto-to-avoid-redundant-repetition-of-type-names) for a general discussion on why to use `auto` to avoid redundant repetition of type names.
Add an option to the lexer to go back and get a list of "full"
tokens, which include their leading and trailing trivia, which
we can index into from SourceLocs in the current AST.
This starts the Syntax sublibrary, which will support structured
editing APIs. Some skeleton support and basic implementations are
in place for types and generics in the grammar. Yes, it's slightly
redundant with what we have right now. lib/AST conflates syntax
and semantics in the same place(s); this is a first step in changing
that to separate the two concepts for clarity and also to get closer
to incremental parsing and type-checking. The goal is to eventually
extract all of the syntactic information from lib/AST and change that
to be more of a semantic/symbolic model.
Stub out a Semantics manager. This ought to eventually be used as a hub
for encapsulating lazily computed semantic information for syntax nodes.
For the time being, it can serve as a temporary place for mapping from
Syntax nodes to semantically full lib/AST nodes.
This is still in a molten state - don't get too close, wear appropriate
proximity suits, etc.
Also rename ASTWalker::shouldWalkIntoFunctionGenericParams() to shouldWalkIntoGenericParams() since it's now used when walking NominalTypeDecl (not just AbstractFunctionDecl).
Changes:
* Terminate all namespaces with the correct closing comment.
* Make sure argument names in comments match the corresponding parameter name.
* Remove redundant get() calls on smart pointers.
* Prefer using "override" or "final" instead of "virtual". Remove "virtual" where appropriate.
Store leading a trailing "trivia" around a token, such as whitespace,
comments, doc comments, and escaping backticks. These are syntactically
important for preserving formatting when printing ASTs but don't
semantically affect the program.
Tokens take all trailing trivia up to, but not including, the next
newline. This is important to maintain checks that statements without
semicolon separators start on a new line, among other things.
Trivia are now data attached to the ends of tokens, not tokens
themselves.
Create a new Syntax sublibrary for upcoming immutable, persistent,
thread-safe ASTs, which will contain only the syntactic information
about source structure, as well as for generating new source code, and
structural editing. Proactively move swift::Token into there.
Since this patch is getting a bit large, a token fuzzer which checks
for round-trip equivlence with the workflow:
fuzzer => token stream => file1
=> Lexer => token stream => file 2 => diff(file1, file2)
Will arrive in a subsequent commit.
This patch does not change the grammar.
Most keywords can be used as argument labels. When they are they shouldn't be
colored as a keyword. This patch fixes three issues with this:
- keywords as local argument labels (where an external label was also present)
were still colored as keywords
- 'let', 'var', and 'inout' cannot be used as argument labels, but were colored
as identifiers rather than keywords when used like one
- the check to see if a keyword token is being used as an argument label wasn't
quite restrictive enough, e.g. treating the 'let's in 'case (let x, let y):' as
identifiers.
This gave a roughly 40-45% improvement in sourcekitd's incremental
syntax-only parse time in files with a lot of doc comments (test case
was ~6000 lines, with ~780 lines being doc comments). This is on the
critical path for every edit.
While there were a few smaller improvements we could have made to the
original code, ultimately std::regex is slow, and it was better to just
use a custom parser for these simple patterns.
rdar://problem/28809397
[SyntaxModel] When searching URLs in doc comments, reduce the number of protocol name comparisons by looking ahead more characters, NFC. rdar://28298506
Searching URL in doc comments can be expensive. We used to look for
every colon as an indicator of potential URLs. However, this is not
efficient enough. Suggested by Ben, we further divide protocols into
categories so that most protocols can use "://" as an indicator of its
existence.
Not sure whether this is enough to close the radar, but I believe it is
a valuable performance improvement anyway.
Keywords like 'let' can serve as argument labels. When they do so, we should
highlight them as identifiers instead of keywords. However, the check for this
situation seems overly lenient so that when 'let', 'var' appear in conditions
of IfStmt or GuardStmt, they are wrongly highlighted as identifiers too. This
commit strengthens the checking to preserve keywords' identity in these statements.
rdar://28297337
Argument labels are allowed to use keywords, in which case we want to
treat them as identifiers in the syntax map (except for '_'). This
commit moves calculation of that into the original lexing instead of
in the model walker, which makes it much more robust, since the model
walker was only guessing about what was next on the the TokenNodes list.
This fixes a bug where arrays of object literals would only have the
first object correct (the following ones were identifiers), as well as
some incorrect cases where we treated keywords as identifiers.
rdar://problem/27726422
What I've implemented here deviates from the current proposal text
in the following ways:
- I had to introduce a FunctionArrowPrecedence to capture the parsing
of -> in expression contexts.
- I found it convenient to continue to model the assignment property
explicitly.
- The comparison and casting operators have historically been
non-associative; I have chosen to preserve that, since I don't
think this proposal intended to change it.
- This uses the precedence group names and higherThan/lowerThan
as agreed in discussion.