Commit Graph

580 Commits

Author SHA1 Message Date
Rintaro Ishizaki
766774206b [Lexer] Don't setEscapedIdentifier(true) for tok::eof at ArtificialEOF
https://bugs.swift.org/browse/SR-6926

This happens when the Parser re-lexes comment tokens, which sets
ArtificialEOF at the end of the comment range.
It used to cause an assertion failure:
(!value || Kind == tok::identifier) && "only identifiers can be escaped identifiers"
2018-02-12 14:58:12 +09:00
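A minimal illustration of the invariant quoted above, not the actual Swift Lexer code. TokKind, Token, and formToken are hypothetical names. The escaped-identifier flag is only set when the token really is an identifier, so an eof token produced at an artificial EOF never trips the assertion.

```cpp
#include <cassert>

// Hypothetical, heavily reduced token kinds.
enum class TokKind { identifier, eof, comment };

struct Token {
  TokKind kind = TokKind::eof;
  bool escapedIdentifier = false;

  void setEscapedIdentifier(bool value) {
    // Same condition as the assertion quoted in the commit message.
    assert((!value || kind == TokKind::identifier) &&
           "only identifiers can be escaped identifiers");
    escapedIdentifier = value;
  }
};

// Hypothetical helper: even if a backtick was seen, only mark the token as
// escaped when its kind is identifier. An eof token keeps the flag false.
Token formToken(TokKind kind, bool sawBacktick) {
  Token t;
  t.kind = kind;
  t.setEscapedIdentifier(sawBacktick && kind == TokKind::identifier);
  return t;
}
```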
Rintaro Ishizaki
0780c529c4 [Syntax] Unify RawSyntax and RawTokenSyntax using union and TrailingObjects
It better matches the SwiftSyntax model.

Using TrailingObjects reduces the number of heap allocations, which
yields an 18% performance improvement.
2018-01-18 14:49:46 +09:00
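A rough sketch of the single-allocation idea behind TrailingObjects, written in plain standard C++ rather than with llvm::TrailingObjects itself. The Node type and its create/destroy helpers are hypothetical; the point is that the header and its variable-length child array share one malloc instead of two.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical node whose child pointers live directly after the header.
struct Node {
  std::size_t numChildren;

  // The trailing array starts immediately after the Node header.
  const Node **children() {
    return reinterpret_cast<const Node **>(this + 1);
  }

  // One allocation covers the header plus all n trailing child pointers.
  static Node *create(const Node *const *kids, std::size_t n) {
    static_assert(sizeof(Node) % alignof(const Node *) == 0,
                  "trailing pointers must be suitably aligned");
    void *mem = std::malloc(sizeof(Node) + n * sizeof(const Node *));
    if (!mem) throw std::bad_alloc();
    Node *node = new (mem) Node{n};
    for (std::size_t i = 0; i < n; ++i)
      new (&node->children()[i]) const Node *(kids[i]);
    return node;
  }

  // Header and trailing array are released together.
  static void destroy(Node *node) {
    node->~Node();
    std::free(node);
  }
};
```

Compared with a header that owns a separately allocated std::vector of children, this halves the number of allocations per node, which is the kind of saving the commit credits for its speedup.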
Erik Eckstein
a680768971 SIL: In textual SIL allow global SIL names starting with '$'.
For example: @$S1m3fooyyF
This is needed to change the mangling prefix to $S.
The parser change only affects SIL (and not Swift).

I didn't add a test case because it will be fully tested when the mangling prefix is changed.
2018-01-05 11:29:15 -08:00
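A simplified sketch of accepting a leading '$' after '@' only when lexing SIL, so that a name like @$S1m3fooyyF lexes as one global name. The lexAtName function and its ASCII-only character rules are hypothetical approximations; the real Lexer handles Unicode identifiers.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical ASCII-only approximation of identifier continuation chars.
static bool isIdentChar(char c) {
  return std::isalnum(static_cast<unsigned char>(c)) || c == '_' || c == '$';
}

// Returns the name after a leading '@', or "" if there is no valid name.
// A leading '$' is only accepted when silMode is true.
std::string lexAtName(const std::string &src, bool silMode) {
  if (src.size() < 2 || src[0] != '@') return "";
  char first = src[1];
  bool okStart = std::isalpha(static_cast<unsigned char>(first)) ||
                 first == '_' || (silMode && first == '$');
  if (!okStart) return "";
  std::size_t i = 1;
  while (i < src.size() && isIdentChar(src[i])) ++i;
  return src.substr(1, i - 1);
}
```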
omochimetaru
bc88330740 [Parse] Lexer builds backtick trivia around escaped identifier token 2017-12-29 00:22:49 +09:00
omochimetaru
ebd5323b42 [Syntax] add UTF-8 BOM support to libSyntax 2017-12-28 01:26:09 +09:00
omochimetaru
861ee3a112 [Parse] use pre-increment for simple increment (#13624) 2017-12-27 15:46:58 +09:00
omochimetaru
70986a687f [Parse] fix lexTrivia LF bug 2017-12-22 14:04:14 +09:00
omochimetaru
fbe34e0f6f [Parse] improve LF handling efficiency. 2017-12-22 01:23:25 +09:00
omochimetaru
f86e1c8201 [Parse] add CRLF support in lexTrivia 2017-12-21 14:27:13 +09:00
omochimetaru
9daeaf0d06 [Parse] refactor lexTrivia with squash 2017-12-20 14:09:47 +09:00
omochimetaru
24509a0bde [Parse] Change LeadingTrivia type to Trivia 2017-12-20 14:09:47 +09:00
Rintaro Ishizaki
cc72a3b934 [Lexer] Use ContentStart position for hashbang trivia 2017-12-19 09:24:34 +09:00
Rintaro Ishizaki
2c06060165 [Syntax] Add CarriageReturn trivia kind
To distinguish '\r' from '\n'.
2017-12-19 09:24:34 +09:00
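A hypothetical sketch of classifying line breaks as trivia: it tells '\r' apart from '\n', treats the "\r\n" pair as one unit in the spirit of the CRLF support added to lexTrivia, and squashes consecutive breaks of the same kind into a single counted piece. The NewlineKind names and the lexNewlines function are illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical line-break trivia kinds.
enum class NewlineKind { Newline, CarriageReturn, CarriageReturnLineFeed };

struct NewlinePiece {
  NewlineKind kind;
  unsigned count;
};

// Lexes the run of line breaks at the start of `text`, merging consecutive
// breaks of the same kind. Stops at the first character that is not a break.
std::vector<NewlinePiece> lexNewlines(std::string_view text) {
  std::vector<NewlinePiece> pieces;
  std::size_t i = 0;
  while (i < text.size()) {
    NewlineKind kind;
    if (text[i] == '\n') {
      kind = NewlineKind::Newline;
      i += 1;
    } else if (text[i] == '\r') {
      if (i + 1 < text.size() && text[i + 1] == '\n') {
        kind = NewlineKind::CarriageReturnLineFeed;
        i += 2;
      } else {
        kind = NewlineKind::CarriageReturn;
        i += 1;
      }
    } else {
      break;
    }
    if (!pieces.empty() && pieces.back().kind == kind)
      ++pieces.back().count;
    else
      pieces.push_back({kind, 1});
  }
  return pieces;
}
```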
Rintaro Ishizaki
181333ce0f [Lexer] Lex conflict marker as a trivia 2017-12-19 09:24:33 +09:00
omochimetaru
5de598f34a [Parse] use skipHashbang in lexTrivia 2017-12-18 18:22:04 +09:00
omochimetaru
aeb9ba6f96 [Parse] use skipSlashSlashComment in lexTrivia 2017-12-18 18:22:04 +09:00
omochimetaru
f7136ae635 [Parse] delete skipUpToEndOfLine 2017-12-18 18:22:04 +09:00
omochimetaru
ed58c152bf [Parse] Improve Lexer's UTF-8 BOM handling (#13483)
* Add BOM handling testcases
* Add ContentStart to Lexer for BOM handling
2017-12-18 17:22:11 +09:00
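A minimal sketch of what a ContentStart offset could look like: lexing begins at offset 3 when the buffer starts with the UTF-8 byte order mark (bytes EF BB BF) and at 0 otherwise. The contentStart function is an assumption; only the ContentStart concept comes from the commit.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Returns the offset where source content begins, skipping a UTF-8 BOM.
std::size_t contentStart(std::string_view buffer) {
  constexpr std::string_view bom = "\xEF\xBB\xBF";
  return buffer.substr(0, bom.size()) == bom ? bom.size() : 0;
}
```

Keeping the BOM out of the token stream while remembering where content starts lets the bytes before ContentStart still be accounted for, for example as trivia, which matters for round-trip printing.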
Rintaro Ishizaki
5571e5cc76 [Lexer] Clear trivia at the top of lexImpl()
To make sure we only parse trivia for the current token.
2017-12-08 12:08:05 +09:00
Rintaro Ishizaki
9b32c62fbf [Lexer] Add TODO/FIXMEs for lexTrivia 2017-12-08 12:08:05 +09:00
Rintaro Ishizaki
2b1e316cf6 [Syntax] Add parsing hashbang (shebang) as a trivia.
Added GarbageText trivia kind for any skipped text.
2017-12-08 12:07:00 +09:00
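A hypothetical sketch of measuring a hashbang line as trivia, starting from the content start (after any BOM), as the cc72a3b934 commit describes. The function name and the choice to stop before the line break, leaving it for newline trivia, are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Returns the length of a "#!" line beginning at contentStart, excluding the
// terminating line break, or 0 if there is no hashbang.
// Precondition: contentStart <= buffer.size().
std::size_t hashbangLength(std::string_view buffer, std::size_t contentStart) {
  std::string_view rest = buffer.substr(contentStart);
  if (rest.substr(0, 2) != "#!") return 0;
  std::size_t end = rest.find_first_of("\r\n");
  return end == std::string_view::npos ? rest.size() : end;
}
```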
Rintaro Ishizaki
e7a393f13f [Lexer] Lex vertical tab '\v' and form-feed '\f' trivia 2017-12-08 11:36:20 +09:00
Rintaro Ishizaki
d767dc39ba [Lexer] Improve implementation of lexTrivia 2017-12-08 11:36:20 +09:00
Rintaro Ishizaki
dcc37c3340 [Syntax] Normalize TriviaPiece internal value.
The Length field of comments is always 1.
The Text field of whitespace is always "".
2017-12-04 10:46:03 -08:00
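A sketch of the normalized layout, assuming a reduced set of trivia kinds and field names that follow the commit message's wording (length, text); the real TriviaPiece has more kinds and may differ in detail. Whitespace pieces carry a repeat count and empty text, while comment pieces carry their text and a length of 1.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical, reduced set of trivia kinds.
enum class TriviaKind { Space, Tab, LineComment, BlockComment };

struct TriviaPiece {
  TriviaKind kind;
  unsigned length;   // repeat count for whitespace; always 1 for comments
  std::string text;  // comment text; always empty for whitespace

  static TriviaPiece whitespace(TriviaKind k, unsigned n) {
    return {k, n, ""};
  }
  static TriviaPiece comment(TriviaKind k, std::string t) {
    return {k, 1, std::move(t)};
  }

  // Source bytes covered: each Space or Tab is one byte; comments use text.
  std::size_t sourceLength() const {
    return text.empty() ? length : text.size();
  }
};
```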
Rintaro Ishizaki
e01d525621 [Lexer] Remove some special trivia handling in Lexer
Even in multiline string mode, we should parse trailing trivia.

Removed special handling for backtick trivia; it's not produced in the
Lexer anyway.
2017-12-04 10:46:03 -08:00
Rintaro Ishizaki
d46073dd75 [libSyntax] Backtracking restarts from leading trivia position
This applies when reading syntax.
2017-12-04 10:46:03 -08:00
Rintaro Ishizaki
a78fda0720 [Syntax] Always lex Trivia when SF.shouldKeepSyntaxInfo()
For backward compatibility, don't lex comments as trailing trivia.
2017-11-17 14:56:49 +09:00
Rintaro Ishizaki
40b195d98c [Syntax] Get rid of fullLex
Defer (Token, Trivia) -> RawTokenSyntax conversion from Lexer to Parser.
This is part of the effort to consolidate Syntax and AST parsing.
2017-11-17 14:56:49 +09:00
Xi Ge
75db3c1db8 Re-apply libSyntax patches after fixing ASAN issue (#12730)
* Re-apply "libSyntax: Ensure round-trip printing when we build syntax tree from parser incrementally. (#12709)"

* Re-apply "libSyntax: Root parsing context should hold a reference to the current token in the parser, NFC."

* Re-apply "libSyntax: avoid copying token text when lexing token syntax nodes, NFC. (#12723)"

* Actually fix the container-overflow issue.
2017-11-03 13:25:33 -07:00
Xi Ge
7ebf66ed2d libSyntax: forward declare libSyntax entities in several header files, NFC. (#12735) 2017-11-02 20:55:18 -07:00
Xi Ge
4d1249aa82 Revert "libSyntax: Ensure round-trip printing when we build syntax tree from parser incrementally. (#12709)"
This reverts commit 0d98c4c5df.
2017-11-02 14:44:26 -07:00
Xi Ge
407db56b8d Revert "libSyntax: avoid copying token text when lexing token syntax nodes, NFC. (#12723)"
This reverts commit 7981630ddd.
2017-11-02 14:43:42 -07:00
Xi Ge
7981630ddd libSyntax: avoid copying token text when lexing token syntax nodes, NFC. (#12723)
This is likely the root cause of the memory surge when we always turn on
syntax token lexing. Since the underlying buffer outlives the syntax
tree, it's reasonable to refer to the text instead of copying and owning it.
2017-11-02 14:04:25 -07:00
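A minimal sketch of referring to token text instead of copying it, using std::string_view as a stand-in for whatever reference type libSyntax actually used. TokenRef and makeToken are hypothetical. The design only works because, as the message notes, the source buffer outlives the tree; a TokenRef must not outlive its buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical token that points into the source buffer rather than owning
// a copy of its spelling.
struct TokenRef {
  std::string_view text;
};

// Creates a token for buffer[offset, offset + length) without copying bytes.
TokenRef makeToken(std::string_view buffer, std::size_t offset,
                   std::size_t length) {
  return TokenRef{buffer.substr(offset, length)};
}
```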
Xi Ge
0d98c4c5df libSyntax: Ensure round-trip printing when we build syntax tree from parser incrementally. (#12709) 2017-11-01 20:29:30 -07:00
Doug Gregor
8f43cba0b5 [Syntax] Replace TrivialList's std::deque with a std::vector.
For very large source files, the parser's syntax map---which contains a
very large number of TrivialLists---was taking an inordinate amount of
memory due to the inefficiency of std::deque. Specifically, a
std::deque containing just one trivial element would allocate 4k of
memory. With the ~120MB SIL output of one of the parse_stdlib tests,
these std::deques would add up to > 6GB of memory, most of which is
wasted.

Replacing the std::deque with a std::vector knocks the memory required
for one of the parse_stdlib tests from > 8GB down closer to 2 GB. The
parser's syntax map is still large (e.g., a 512MB allocation for the
overall vector plus a few hundred MB of raw-syntax data), but not
prohibitively so.

Part of rdar://problem/34771322.
2017-11-01 14:02:21 -07:00
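A rough way to observe the effect described above: a counting allocator tallies the bytes each container requests while holding one element. CountingAllocator and bytesForOneElement are hypothetical helpers. Exact deque figures depend on the standard library (the 4k figure comes from the commit message), so only the relative difference is meaningful here.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <memory>
#include <vector>

// Global tally of bytes requested through CountingAllocator.
static std::size_t gBytesAllocated = 0;

template <typename T>
struct CountingAllocator {
  using value_type = T;
  CountingAllocator() = default;
  template <typename U>
  CountingAllocator(const CountingAllocator<U> &) {}

  T *allocate(std::size_t n) {
    gBytesAllocated += n * sizeof(T);
    return std::allocator<T>().allocate(n);
  }
  void deallocate(T *p, std::size_t n) { std::allocator<T>().deallocate(p, n); }
};

template <typename T, typename U>
bool operator==(const CountingAllocator<T> &, const CountingAllocator<U> &) {
  return true;
}
template <typename T, typename U>
bool operator!=(const CountingAllocator<T> &, const CountingAllocator<U> &) {
  return false;
}

// Bytes requested by a container from construction through one push_back.
template <typename Container>
std::size_t bytesForOneElement() {
  gBytesAllocated = 0;
  {
    Container c;
    c.push_back(typename Container::value_type{});
  }
  return gBytesAllocated;
}
```

A std::deque reserves a whole block (plus a map of block pointers) up front, while a std::vector grows from exactly one element, which is why many tiny lists favor the vector.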
Doug Gregor
945ac3de0a Revert " Re-enable parse_stdlib tests." 2017-11-01 06:59:35 -07:00
Doug Gregor
62f43ae75b [Syntax] Replace TrivialList's std::deque with a std::vector.
For very large source files, the parser's syntax map---which contains a
very large number of TrivialLists---was taking an inordinate amount of
memory due to the inefficiency of std::deque. Specifically, a
std::deque containing just one trivial element would allocate 4k of
memory. With the ~120MB SIL output of one of the parse_stdlib tests,
these std::deques would add up to > 6GB of memory, most of which is
wasted.

Replacing the std::deque with a std::vector knocks the memory required
for one of the parse_stdlib tests from > 8GB down closer to 2 GB. The
parser's syntax map is still large (e.g., a 512MB allocation for the
overall vector plus a few hundred MB of raw-syntax data), but not
prohibitively so.

Part of rdar://problem/34771322.
2017-10-31 23:33:19 -07:00
Xi Ge
844aeae2d5 Re-apply "libSyntax: create a basic infrastructure for generating libSyntax entities by using Parser." (#12538) 2017-10-20 22:58:28 -07:00
Greg Parker
48a6b9d464 Revert "libSyntax: create a basic infrastructure for generating libSyntax entities by using Parser."
This reverts commit ee7a06276d.
It causes build failures like "'swift/Syntax/SyntaxNodes.h' file not found".
2017-10-19 17:11:48 -07:00
Xi Ge
ee7a06276d libSyntax: create a basic infrastructure for generating libSyntax entities by using Parser. 2017-10-18 17:02:00 -07:00
Saleem Abdulrasool
7bd2256120 runtime: clean up last of -Wqual-cast warnings
This fixes up the remaining cast qualifier warnings from GCC 6.  Use
multiple casts to adjust the const qualification.  Prefer C++ style
casts.  NFC.
2017-09-22 14:14:13 -07:00
Xi Ge
2ba1ca2d8f IDE: simplify some code. NFC (#11935)
* IDE: simplify some code. NFC

* add assert.
2017-09-14 18:19:48 -07:00
Rintaro Ishizaki
a2d3ff4deb [SE-0182][Lexer] Diagnose escaped newline at the end of the last line in multiline string 2017-07-26 21:18:58 +09:00
John Holdsworth
c0fcc1afba [Parse] An implementation for SE-0182 2017-07-21 18:07:06 +01:00
David Rönnqvist
9ed9a860a0 Fix unicode handling when checking invalid characters
Use `advanceIfValidContinuationOfIdentifier` instead of `isValidIdentifierContinuationCodePoint` to handle unicode.
2017-07-08 19:27:42 +02:00
David Rönnqvist
57731ebc09 [QoI] [Parse] Improve error message when parsing floating point exponent
Update error messages to mention the invalid character.
Improve the diagnostic of floating point exponents.

Add tests for error messages when parsing floating point exponents.
Update existing tests for new error messages.
2017-07-08 13:32:34 +02:00
David Rönnqvist
a615d9ede3 [QoI] Improve error message when parsing integer literal
Rephrased error message to indicate which character is unexpected.
Provide error message variations when parsing binary, octal, decimal (default), and hexadecimal integer literals.
Look for unexpected digits in binary and octal integer literals.
Look for unexpected letters in hex integer literals.

Resolves: SR-5236 rdar://problem/32858684
2017-07-06 19:59:09 +02:00
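A hypothetical sketch of locating the unexpected character that the improved diagnostics point at: given the digits after a 0b, 0o, or 0x prefix, it returns the index of the first character that is not a valid digit in that radix, skipping '_' separators. The function is an illustration, not the Lexer's actual routine.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

// Returns the index of the first invalid digit for `radix` (2, 8, 10, or 16),
// or npos if every character is a valid digit or '_'.
std::size_t firstInvalidDigit(std::string_view digits, unsigned radix) {
  for (std::size_t i = 0; i < digits.size(); ++i) {
    unsigned char c = static_cast<unsigned char>(digits[i]);
    if (c == '_') continue;
    unsigned value;
    if (std::isdigit(c))
      value = c - '0';
    else if (std::isalpha(c))
      value = static_cast<unsigned>(std::tolower(c) - 'a' + 10);
    else
      return i;
    if (value >= radix) return i;
  }
  return std::string_view::npos;
}
```

With the index in hand, a diagnostic can name the offending character, for example the '2' in a binary literal or the 'g' in a hex literal.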
Harlan
70089a7bcc [Syntax] Represent TokenSyntax as a Syntax node (#10606)
Previously, users of TokenSyntax would always deal with RC<TokenSyntax>,
which is a subclass of RawSyntax. Instead, provide TokenSyntax as a
fully-realized Syntax node, that will always exist as a leaf in the
Syntax tree.

This hides the implementation detail of RawSyntax and SyntaxData
completely from clients of libSyntax, and paves the way for future
generation of Syntax nodes.
2017-06-27 11:08:10 -07:00
Rintaro Ishizaki
c8bd1aa401 [Parse] Fix skipping string interpolation in Lexer
Maintain the innermost string literal mode to determine whether we allow
newline characters or not.

* Disallow newline after multiline string in string interpolation. (SR-5171)
* Allow unbalanced `"` in multiline string in string interpolation.
2017-06-16 02:22:49 +09:00
Robert Widmann
3e2bbfe904 [Gardening] Cleanup TokenKinds.def (#10034)
* [Parse] Refactored internal structure of Tokens.def and documented usage.

Added a level of structure to the macro definitions to allow Swift
keywords to be cleanly accessed separately from SIL and Swift keywords
together. Documented structure and usage.

* [Parse] Made use of new guarantees and abstractions in Tokens.def

Used guarantees about undefining macros after import and new
SWIFT_KEYWORD abstraction to simplify usage of the Token.def
imports.

* Gardening
2017-06-01 15:08:48 -07:00