* Re-apply "libSyntax: Ensure round-trip printing when we build syntax tree from parser incrementally. (#12709)"
* Re-apply "libSyntax: Root parsing context should hold a reference to the current token in the parser, NFC."
* Re-apply "libSyntax: avoid copying token text when lexing token syntax nodes, NFC. (#12723)"
* Actually fix the container-overflow issue.
This is likely the root cause of the memory surge when we always turn on
syntax token lexing. Since the underlying buffer outlives the syntax
tree, it's reasonable to refer to the text instead of copying and owning it.
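
A minimal sketch of the idea, not the actual libSyntax code: a token keeps a
pointer and length into the source buffer rather than its own copy of the
text. The names TokenText and sliceToken are hypothetical, and the sketch
assumes the buffer outlives every token that refers to it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical non-owning view of a token's text (similar in spirit to
// llvm::StringRef). It is only valid while the source buffer is alive.
struct TokenText {
  const char *data;
  std::size_t length;

  // Materialize an owned copy only when a caller really needs one.
  std::string str() const { return std::string(data, length); }
};

// Refer to `length` bytes of `buffer` starting at `offset` without copying.
TokenText sliceToken(const std::string &buffer, std::size_t offset,
                     std::size_t length) {
  assert(offset <= buffer.size() && length <= buffer.size() - offset);
  return TokenText{buffer.data() + offset, length};
}
```

With this shape, each token costs a pointer and a length no matter how long
its text is; the trade-off is that the buffer's lifetime must be managed by
the caller.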
For very large source files, the parser's syntax map---which contains a
very large number of TrivialLists---was taking an inordinate amount of
memory due to the inefficiency of std::deque. Specifically, a
std::deque containing just one trivial element would allocate 4k of
memory. With the ~120MB SIL output of one of the parse_stdlib tests,
these std::deques would add up to > 6GB of memory, most of which is
wasted.
Replacing the std::deque with a std::vector knocks the memory required
for one of the parse_stdlib tests from > 8GB down closer to 2GB. The
parser's syntax map is still large (e.g., a 512MB allocation for the
overall vector plus a few hundred MB of raw-syntax data), but not
prohibitively so.
Part of rdar://problem/34771322.
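
The per-container overhead can be observed with a counting allocator. The
following is an illustrative sketch only: CountingAllocator and
bytesForOneElement are hypothetical names, and the exact byte counts depend
on the standard library in use (the 4k figure above is what this commit
reports for its configuration).

```cpp
#include <cstddef>
#include <deque>
#include <new>
#include <vector>

// Byte counter shared by every rebound copy of the allocator.
static std::size_t g_bytesAllocated = 0;

// Minimal allocator that records how many bytes containers request.
template <typename T> struct CountingAllocator {
  using value_type = T;

  CountingAllocator() noexcept {}
  template <typename U>
  CountingAllocator(const CountingAllocator<U> &) noexcept {}

  T *allocate(std::size_t n) {
    g_bytesAllocated += n * sizeof(T);
    return static_cast<T *>(::operator new(n * sizeof(T)));
  }
  void deallocate(T *p, std::size_t) noexcept { ::operator delete(p); }
};

template <typename T, typename U>
bool operator==(const CountingAllocator<T> &, const CountingAllocator<U> &) {
  return true;
}
template <typename T, typename U>
bool operator!=(const CountingAllocator<T> &, const CountingAllocator<U> &) {
  return false;
}

// Bytes requested by a container of type `Container` holding one element.
template <typename Container> std::size_t bytesForOneElement() {
  g_bytesAllocated = 0;
  Container c;
  c.push_back(0);
  return g_bytesAllocated;
}
```

Comparing bytesForOneElement for std::vector<int, CountingAllocator<int>>
and std::deque<int, CountingAllocator<int>> shows the deque requesting at
least one full block plus its block map, while the vector requests only the
storage for its elements.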
Update error messages to mention the invalid character.
Improve the diagnostic of floating point exponents.
Add tests for error messages when parsing floating point exponents.
Update existing tests for new error messages.
Rephrased error message to indicate which character is unexpected.
Provide error message variations when parsing binary, octal, decimal (default), and hexadecimal integer literals.
Look for unexpected digits in binary and octal integer literals.
Look for unexpected letters in hex integer literals.
Resolves: SR-5236 rdar://problem/32858684
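
A rough sketch of the kind of check described above, not the lexer's actual
implementation: pick the radix from the literal's prefix, then report the
first character that is not a legal digit for that radix. The function names
are hypothetical, and the sketch simplifies by accepting '_' separators
anywhere after the prefix and by ignoring floating point literals.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: is `c` a legal digit (or '_' separator) for `radix`?
static bool isDigitForRadix(char c, unsigned radix) {
  if (c == '_')
    return true; // Simplification: separators are accepted anywhere.
  unsigned value;
  if (c >= '0' && c <= '9')
    value = static_cast<unsigned>(c - '0');
  else if (c >= 'a' && c <= 'f')
    value = 10u + static_cast<unsigned>(c - 'a');
  else if (c >= 'A' && c <= 'F')
    value = 10u + static_cast<unsigned>(c - 'A');
  else
    return false;
  return value < radix;
}

// Returns the index of the first unexpected character in an integer
// literal, or std::string::npos if every character is acceptable.
std::size_t findUnexpectedChar(const std::string &literal) {
  unsigned radix = 10; // Decimal is the default when there is no prefix.
  std::size_t start = 0;
  if (literal.size() >= 2 && literal[0] == '0') {
    if (literal[1] == 'b') { radix = 2; start = 2; }
    else if (literal[1] == 'o') { radix = 8; start = 2; }
    else if (literal[1] == 'x') { radix = 16; start = 2; }
  }
  for (std::size_t i = start; i < literal.size(); ++i)
    if (!isDigitForRadix(literal[i], radix))
      return i;
  return std::string::npos;
}
```

Knowing both the radix and the offending index is what lets a diagnostic
name the invalid character and tailor its wording to binary, octal, decimal,
or hexadecimal literals.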
Previously, users of TokenSyntax would always deal with RC<TokenSyntax>,
which is a subclass of RawSyntax. Instead, provide TokenSyntax as a
fully-realized Syntax node that will always exist as a leaf in the
Syntax tree.
This hides the implementation detail of RawSyntax and SyntaxData
completely from clients of libSyntax, and paves the way for future
generation of Syntax nodes.
Maintain the innermost string literal mode to determine whether a
newline character is allowed.
* Disallow newline after multiline string in string interpolation. (SR-5171)
* Allow unbalanced `"` in multiline string in string interpolation.
* [Parse] Refactored internal structure of Tokens.def and documented usage.
Added a level of structure to the macro definitions to allow Swift
keywords to be cleanly accessed separately from SIL and Swift keywords
together. Documented structure and usage.
* [Parse] Made use of new guarantees and abstractions in Tokens.def
Used guarantees about undefining macros after import and new
SWIFT_KEYWORD abstraction to simplify usage of the Tokens.def
imports.
* Gardening
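
The layered macro structure can be illustrated with a self-contained
X-macro sketch. This is not the contents of Tokens.def: the real file is
consumed via #include, whereas this sketch uses a hypothetical
FOR_EACH_KEYWORD list macro with a handful of example keywords so that it
compiles on its own. It shows Swift keywords being selected separately from
the combined Swift and SIL set, and helper macros being undefined after use.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for the keyword section of a .def file. Each entry
// is tagged as a Swift keyword or a SIL keyword.
#define FOR_EACH_KEYWORD(SWIFT_KEYWORD, SIL_KEYWORD)                         \
  SWIFT_KEYWORD(func)                                                        \
  SWIFT_KEYWORD(let)                                                         \
  SWIFT_KEYWORD(var)                                                         \
  SIL_KEYWORD(sil)                                                           \
  SIL_KEYWORD(sil_stage)

#define NAME_ENTRY(X) #X,
#define IGNORE_ENTRY(X)

// Swift keywords only: SIL entries expand to nothing.
static const char *const swiftKeywords[] = {
    FOR_EACH_KEYWORD(NAME_ENTRY, IGNORE_ENTRY)};

// Swift and SIL keywords together.
static const char *const allKeywords[] = {
    FOR_EACH_KEYWORD(NAME_ENTRY, NAME_ENTRY)};

// Undefine the helpers after use so they do not leak into later code.
#undef NAME_ENTRY
#undef IGNORE_ENTRY

static bool containsName(const char *const *list, std::size_t count,
                         const char *text) {
  for (std::size_t i = 0; i < count; ++i)
    if (std::strcmp(list[i], text) == 0)
      return true;
  return false;
}

bool isSwiftKeyword(const char *text) {
  return containsName(swiftKeywords,
                      sizeof(swiftKeywords) / sizeof(swiftKeywords[0]), text);
}

bool isAnyKeyword(const char *text) {
  return containsName(allKeywords,
                      sizeof(allKeywords) / sizeof(allKeywords[0]), text);
}
```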
This introduces a few unfortunate things because the syntax is awkward.
In particular, the period and following token in \.[a], \.? and \.! are
token sequences that don't appear anywhere else in Swift, and so need
special handling. This is somewhat compounded by \foo.bar.baz possibly
being \(foo).bar.baz or \(foo.bar).baz (parens around the type), and,
furthermore, needing to distinguish \Foo?.bar from \Foo.?bar.
rdar://problem/31724243
This adds support for SE-0168, multi-line string literals.
Extend the lexer to recognize the new literals. Test cases added.
There are still areas for future diagnostic improvement, such as fixits and notes as to why a multi-line string literal will be malformed. Multi-line literals are explicitly forbidden inside of string interpolation, though this may be relaxed in the future.
The Swift 4 Migrator is invoked through either the driver or the frontend
with the -update-code flag.
The basic pipeline in the frontend is:
- Perform some list of syntactic fixes (there are currently none).
- Perform N rounds of sema fix-its on the primary input file, currently
set to 7 based on prior migrator seasons. Right now, this is just set
to take any fix-it suggested by the compiler.
- Emit a replacement map file, a JSON file describing replacements to a
file that Xcode knows how to understand.
Currently, the Migrator maintains a history of migration states along
the way for debugging purposes.
- Add -emit-remap frontend option
This will indicate the EmitRemap frontend action.
- Don't fork to a separate swift-update binary.
This is going to be a mode of the compiler, invoked by the same flags.
- Add -disable-migrator-fixits option
Useful for debugging, this skips the phase in the Migrator that
automatically applies fix-its suggested by the compiler.
- Add -emit-migrated-file-path option
This is used for testing/debugging scenarios. This takes the final
migration state's output text and writes it to the file specified
by this option.
- Add -dump-migration-states-dir
This dumps all of the migration states encountered during a migration
run for a file to the given directory. For example, the compiler
fix-it migration pass dumps the input file, the output file, and the
remap file between the two.
State output has the following naming convention:
${Index}-${MigrationPassName}-${What}.${extension}, such as:
1-FixitMigrationState-Input.swift
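
The naming convention above can be expressed as a small helper. This is a
sketch with a hypothetical function name, not the Migrator's own code:

```cpp
#include <string>

// Builds "${Index}-${MigrationPassName}-${What}.${extension}".
std::string migrationStateFileName(unsigned index,
                                   const std::string &passName,
                                   const std::string &what,
                                   const std::string &extension) {
  return std::to_string(index) + "-" + passName + "-" + what + "." +
         extension;
}
```

Calling migrationStateFileName(1, "FixitMigrationState", "Input", "swift")
yields the example name shown above.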
rdar://problem/30926261
Add an option to the lexer to go back and get a list of "full"
tokens, which include their leading and trailing trivia, which
we can index into from SourceLocs in the current AST.
This starts the Syntax sublibrary, which will support structured
editing APIs. Some skeleton support and basic implementations are
in place for types and generics in the grammar. Yes, it's slightly
redundant with what we have right now. lib/AST conflates syntax
and semantics in the same place(s); this is a first step in changing
that to separate the two concepts for clarity and also to get closer
to incremental parsing and type-checking. The goal is to eventually
extract all of the syntactic information from lib/AST and change that
to be more of a semantic/symbolic model.
Stub out a Semantics manager. This ought to eventually be used as a hub
for encapsulating lazily computed semantic information for syntax nodes.
For the time being, it can serve as a temporary place for mapping from
Syntax nodes to semantically full lib/AST nodes.
This is still in a molten state - don't get too close, wear appropriate
proximity suits, etc.
ede6bf7a80 increments the buffer pointer too early
when searching for the <# prefix to stop lexing an operator,
so the source buffer is accessed 1 byte off the end.
Thanks ASan and thanks @gparker42!
rdar://problem/28457876
'.<#placeholder#>' is actually an unresolved reference where the name is
an editor placeholder, not the operator '.<' followed by #placeholder#>.
rdar://problem/28457876
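
The two fixes above can be sketched together: when scanning an operator,
stop before a "<#" placeholder prefix, and check the bounds before peeking
at the byte after '<'. The function names and the operator character set
are simplified assumptions, not the lexer's real ones.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Simplified, hypothetical set of ASCII operator characters.
static bool isOperatorChar(char c) {
  return c != '\0' && std::strchr("/=-+*%<>!&|^~.?", c) != nullptr;
}

// Returns how many leading characters of `text` form an operator, stopping
// before an editor placeholder prefix "<#".
std::size_t operatorLength(const std::string &text) {
  std::size_t i = 0;
  while (i < text.size() && isOperatorChar(text[i])) {
    // Peek at the next byte only if it is inside the buffer.
    if (text[i] == '<' && i + 1 < text.size() && text[i + 1] == '#')
      break;
    ++i;
  }
  return i;
}
```

For ".<#placeholder#>" this reports an operator of length 1 (just '.'),
leaving "<#placeholder#>" to be lexed as a placeholder; for ".<" at the end
of the buffer it reports length 2 without reading past the end.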
RFC 2279 states that, in UTF-8:
"The octet values FE and FF never appear."
RFC 3629 states that, in UTF-8:
"The octet values C0, C1, F5 to FF never appear."
Generalize the check to advance past invalid starting bytes for
a UTF-8 sequence to fix a crash in the lexer.
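
A minimal sketch of such a generalized check, using hypothetical function
names: per RFC 3629, the bytes C0, C1, and F5 to FF never appear, and
continuation bytes (80 to BF) cannot begin a sequence either.

```cpp
#include <cstddef>
#include <string>

// Returns true if `byte` can begin a well-formed UTF-8 sequence.
bool isValidUTF8StartByte(unsigned char byte) {
  if (byte < 0x80)
    return true;  // ASCII, a complete one-byte sequence.
  if (byte < 0xC2)
    return false; // 0x80-0xBF are continuation bytes; C0 and C1 never appear.
  return byte <= 0xF4; // C2-F4 are lead bytes; F5-FF never appear.
}

// Advances `pos` past any bytes that cannot start a UTF-8 sequence.
std::size_t skipInvalidStartBytes(const std::string &bytes, std::size_t pos) {
  while (pos < bytes.size() &&
         !isValidUTF8StartByte(static_cast<unsigned char>(bytes[pos])))
    ++pos;
  return pos;
}
```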
Store leading and trailing "trivia" around a token, such as whitespace,
comments, doc comments, and escaping backticks. These are syntactically
important for preserving formatting when printing ASTs but don't
semantically affect the program.
Tokens take all trailing trivia up to, but not including, the next
newline. This is important to maintain checks that statements without
semicolon separators start on a new line, among other things.
Trivia are now data attached to the ends of tokens, not tokens
themselves.
Create a new Syntax sublibrary for upcoming immutable, persistent,
thread-safe ASTs, which will contain only the syntactic information
about source structure, as well as for generating new source code, and
structural editing. Proactively move swift::Token into there.
Since this patch is getting a bit large, a token fuzzer which checks
for round-trip equivalence with the workflow:
fuzzer => token stream => file1
=> Lexer => token stream => file 2 => diff(file1, file2)
Will arrive in a subsequent commit.
This patch does not change the grammar.
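
The trivia attachment rule described above (a token takes trailing trivia
up to, but not including, the next newline) can be sketched with a toy
lexer. This is an illustration under simplifying assumptions, not the Swift
lexer: tokens are just runs of non-whitespace, trivia is whitespace only,
and FullToken and lexWithTrivia are hypothetical names.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// A token together with the trivia attached to each end.
struct FullToken {
  std::string leading;
  std::string text; // Empty for the end-of-file token.
  std::string trailing;
};

static bool isSpace(char c) {
  return std::isspace(static_cast<unsigned char>(c)) != 0;
}

std::vector<FullToken> lexWithTrivia(const std::string &src) {
  std::vector<FullToken> tokens;
  std::size_t i = 0;
  for (;;) {
    FullToken tok;
    // Leading trivia: all whitespace before the token, newlines included.
    while (i < src.size() && isSpace(src[i]))
      tok.leading += src[i++];
    // Token text: a run of non-whitespace characters.
    while (i < src.size() && !isSpace(src[i]))
      tok.text += src[i++];
    // Trailing trivia: spaces and tabs up to, but not including, a newline.
    while (i < src.size() && (src[i] == ' ' || src[i] == '\t'))
      tok.trailing += src[i++];
    bool atEnd = tok.text.empty();
    tokens.push_back(tok);
    if (atEnd)
      break; // The last entry is the end-of-file token.
  }
  return tokens;
}

// Printing every token with its trivia reproduces the original source.
std::string printTokens(const std::vector<FullToken> &tokens) {
  std::string out;
  for (std::size_t i = 0; i < tokens.size(); ++i)
    out += tokens[i].leading + tokens[i].text + tokens[i].trailing;
  return out;
}
```

Because a newline always lands in the leading trivia of the following
token, a parser can still tell whether a statement starts on a new line by
inspecting that token's leading trivia.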
Like cursor-info, range info ("source.request.cursorinfo") answers some
questions clients have about a code snippet under selection, for
instance, the type of a selected expression. This commit implements this
new request kind and provides two simple pieces of information about the
selected code: (1) the kind of the snippet, currently limited to
single-statement and expression; and (2) the type of the selected
expression. Gradually, we will enrich the response to provide more
insight into the selected code snippet.
When in Swift 3 Compatibility Mode we now accept a standalone
'$' as an identifier. In all other cases this is now disallowed
and must be surrounded by backticks.