20 Commits

Author SHA1 Message Date
Rintaro Ishizaki
057254dbc1 [Syntax] Bump allocate and cache/reuse RawSyntax
Introduced SyntaxArena for managing memory and cache.

SyntaxArena holds a BumpPtrAllocator as its allocation storage.
RawSyntax can now be constructed either with normal heap allocation or
by SyntaxArena. RawSyntax has a ManualMemory flag which indicates that it is
managed by SyntaxArena. If the flag is true, its Retain()/Release() are no-ops,
so it is never destructed by IntrusiveRefCntPtr.
This speeds up memory allocation for RawSyntax.

Also, in Syntax parsing, "token" RawSyntax is reused if:
a) It's not a string literal with length > 16; and
b) It doesn't contain arbitrary text trivia (e.g. a comment).
This reduces the overall allocation cost.
2018-02-02 01:27:06 +09:00
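Note on 057254dbc1: the arena scheme described in the message can be sketched roughly as follows. This is a minimal illustration, not the actual SyntaxArena or RawSyntax code: ToyArena and ToyNode are hypothetical names, the slab allocator is only a stand-in for BumpPtrAllocator, and the token reuse cache is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <vector>

// Hypothetical stand-in for a bump allocator: memory is carved out of
// fixed-size slabs and released all at once when the arena is destroyed.
class ToyArena {
  static constexpr std::size_t SlabSize = 4096;
  std::vector<std::unique_ptr<char[]>> Slabs;
  std::size_t Used = SlabSize; // forces a fresh slab on the first allocation

public:
  void *allocate(std::size_t Size, std::size_t Align) {
    assert(Size <= SlabSize && "toy arena: no oversized allocations");
    assert((Align & (Align - 1)) == 0 && "alignment must be a power of two");
    std::size_t Offset = (Used + Align - 1) & ~(Align - 1);
    if (Offset + Size > SlabSize) {
      Slabs.push_back(std::make_unique<char[]>(SlabSize));
      Offset = 0;
    }
    Used = Offset + Size;
    return Slabs.back().get() + Offset;
  }
};

// Hypothetical node: reference counted on the heap, but with no-op
// retain/release when its memory is owned by an arena.
class ToyNode {
  unsigned RefCount = 0;
  bool ManualMemory;
  int Kind;

  ToyNode(int Kind, bool ManualMemory)
      : ManualMemory(ManualMemory), Kind(Kind) {}

public:
  static ToyNode *makeOnHeap(int Kind) { return new ToyNode(Kind, false); }

  static ToyNode *makeInArena(int Kind, ToyArena &Arena) {
    void *Mem = Arena.allocate(sizeof(ToyNode), alignof(ToyNode));
    return new (Mem) ToyNode(Kind, true);
  }

  void retain() {
    if (!ManualMemory)
      ++RefCount;
  }

  // Returns true if this call destroyed the node.
  bool release() {
    if (ManualMemory)
      return false; // the arena owns the memory; nothing to do
    assert(RefCount > 0 && "release without matching retain");
    if (--RefCount == 0) {
      delete this;
      return true;
    }
    return false;
  }

  int kind() const { return Kind; }
};
```

In this sketch an arena node stays valid for as long as the ToyArena lives, regardless of how many retain/release calls it sees, which is the trade that makes allocation cheap.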
Rintaro Ishizaki
0780c529c4 [Syntax] Unify RawSyntax and RawTokenSyntax using union and TrailingObjects
It better matches the SwiftSyntax model.

Using TrailingObjects reduces the number of heap allocations, which
yields an 18% performance improvement.
2018-01-18 14:49:46 +09:00
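Note on 0780c529c4: the allocation saving comes from storing a node's variable-length data in the same block as its header. The sketch below shows that idea by hand; it does not use llvm::TrailingObjects itself and it omits the union part of the change. ToyLayout and its members are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

// Hypothetical node type. alignas keeps the pointer array that follows the
// header correctly aligned.
struct alignas(void *) ToyLayout {
  unsigned NumChildren;

  // The child pointers live immediately after the header.
  ToyLayout **children() { return reinterpret_cast<ToyLayout **>(this + 1); }

  // A single malloc covers the header and all N child slots, instead of one
  // allocation for the node plus another for a separate child container.
  static ToyLayout *create(ToyLayout *const *Kids, unsigned N) {
    assert((N == 0 || Kids) && "children required when N > 0");
    void *Mem = std::malloc(sizeof(ToyLayout) + N * sizeof(ToyLayout *));
    if (!Mem)
      throw std::bad_alloc();
    ToyLayout *Node = new (Mem) ToyLayout{N};
    for (unsigned I = 0; I != N; ++I)
      Node->children()[I] = Kids[I];
    return Node;
  }

  // Frees only this node; children are owned by the caller in this sketch.
  static void destroy(ToyLayout *Node) { std::free(Node); }
};
```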
omochimetaru
3a3e89ba0c [Syntax] add TriviaKind::CarriageReturnLineFeed 2017-12-20 20:50:40 +09:00
omochimetaru
aa32b42327 [Syntax] add trivia squash function 2017-12-20 14:09:47 +09:00
Rintaro Ishizaki
2c06060165 [Syntax] Add CarriageReturn trivia kind
To distinguish '\r' from '\n'.
2017-12-19 09:24:34 +09:00
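Note on 2c06060165 and 3a3e89ba0c: with separate kinds for '\n', '\r' and "\r\n", a run of line endings can be classified as in the sketch below. The enum, struct and function names are hypothetical, and grouping repeated endings into a counted piece is an assumption about how such trivia might be stored, not a description of the real lexer.

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum class NewlineKind { Newline, CarriageReturn, CarriageReturnLineFeed };

struct NewlinePiece {
  NewlineKind Kind;
  unsigned Count; // how many consecutive endings of this kind
};

// Classifies the line endings at the start of Text and stops at the first
// character that is not part of a line ending.
std::vector<NewlinePiece> lexNewlines(const std::string &Text) {
  std::vector<NewlinePiece> Pieces;
  std::size_t I = 0;
  while (I < Text.size()) {
    NewlineKind Kind;
    if (Text[I] == '\n') {
      Kind = NewlineKind::Newline;
      I += 1;
    } else if (Text[I] == '\r') {
      // "\r\n" is a single ending; a lone '\r' is its own kind.
      if (I + 1 < Text.size() && Text[I + 1] == '\n') {
        Kind = NewlineKind::CarriageReturnLineFeed;
        I += 2;
      } else {
        Kind = NewlineKind::CarriageReturn;
        I += 1;
      }
    } else {
      break;
    }
    if (!Pieces.empty() && Pieces.back().Kind == Kind)
      ++Pieces.back().Count;
    else
      Pieces.push_back({Kind, 1});
  }
  return Pieces;
}
```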
Rintaro Ishizaki
d160ea2efa [Syntax] Privatize TriviaPiece constructor
So that we don't accidentally create an invalid trivia piece like:
  { TriviaKind::lineComment, 6, "foobar" }
2017-12-08 12:08:03 +09:00
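Note on d160ea2efa: one common way to rule out combinations such as a comment with a count of 6 is a private constructor plus one named factory per kind. The sketch below assumes that approach; ToyTriviaKind, ToyTriviaPiece and the factory names are hypothetical and are not taken from the real TriviaPiece interface.

```cpp
#include <cassert>
#include <string>
#include <utility>

enum class ToyTriviaKind { Space, LineComment };

class ToyTriviaPiece {
  ToyTriviaKind Kind;
  unsigned Count;   // meaningful only for repeated-character kinds
  std::string Text; // meaningful only for text-carrying kinds

  // Private: callers cannot pair an arbitrary count with arbitrary text.
  ToyTriviaPiece(ToyTriviaKind Kind, unsigned Count, std::string Text)
      : Kind(Kind), Count(Count), Text(std::move(Text)) {}

public:
  // Repeated-character trivia takes a count and no text.
  static ToyTriviaPiece spaces(unsigned Count) {
    return ToyTriviaPiece(ToyTriviaKind::Space, Count, std::string());
  }

  // Text-carrying trivia takes text; the count is fixed at 1.
  static ToyTriviaPiece lineComment(std::string Text) {
    assert(Text.rfind("//", 0) == 0 && "line comment must start with //");
    return ToyTriviaPiece(ToyTriviaKind::LineComment, 1, std::move(Text));
  }

  ToyTriviaKind kind() const { return Kind; }
  unsigned count() const { return Count; }
  const std::string &text() const { return Text; }
};
```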
Rintaro Ishizaki
2b1e316cf6 [Syntax] Add parsing hashbang (shebang) as a trivia.
Added GarbageText trivia kind for any skipped text.
2017-12-08 12:07:00 +09:00
Rintaro Ishizaki
e7a393f13f [Lexer] Lex vertical tab '\v' and form-feed '\f' trivia 2017-12-08 11:36:20 +09:00
Rintaro Ishizaki
d46073dd75 [libSyntax] Backtracking restarts from leading trivia position
When reading syntax.
2017-12-04 10:46:03 -08:00
Rintaro Ishizaki
0a401b381c [Syntax] Rewrite SyntaxParsingContext
Read RawSyntaxToken along with Parser::consumeToken()

* Single Lexer pass
* Backtracking support
* Split token support
2017-11-18 15:35:46 +09:00
Doug Gregor
8f43cba0b5 [Syntax] Replace TriviaList's std::deque with a std::vector.
For very large source files, the parser's syntax map---which contains a
very large number of TriviaLists---was taking an inordinate amount of
memory due to the inefficiency of std::deque. Specifically, a
std::deque containing just one trivial element would allocate 4k of
memory. With the ~120MB SIL output of one of the parse_stdlib tests,
these std::deques would add up to > 6GB of memory, most of which is
wasted.

Replacing the std::deque with a std::vector knocks the memory required
for one of the parse_stdlib tests from > 8GB down closer to 2 GB. The
parser's syntax map is still large (e.g., a 512MB allocation for the
overall vector plus a few hundred MB of raw-syntax data), but not
prohibitively so.

Part of rdar://problem/34771322.
2017-11-01 14:02:21 -07:00
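Note on 8f43cba0b5: the figures in the message imply a list count on the order of a million. The arithmetic is sketched below, reading "4k" as 4 KiB and "> 6GB" as a lower bound of 6 GiB; both readings, and all names in the block, are assumptions made for illustration.

```cpp
#include <cstdint>

constexpr std::uint64_t KiB = 1024;
constexpr std::uint64_t GiB = KiB * KiB * KiB;

// Per-deque allocation quoted in the commit message (read as 4 KiB).
constexpr std::uint64_t BlockBytes = 4 * KiB;
// Total deque memory quoted in the commit message (lower bound, 6 GiB).
constexpr std::uint64_t TotalBytes = 6 * GiB;

// Number of single-block deques needed to reach the quoted total.
constexpr std::uint64_t impliedListCount() { return TotalBytes / BlockBytes; }

static_assert(impliedListCount() == 1572864,
              "6 GiB / 4 KiB is about 1.5 million lists");
```

At that count, any fixed per-list overhead in the kilobyte range dominates the footprint, which is consistent with the drop reported after switching to std::vector.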
Doug Gregor
945ac3de0a Revert " Re-enable parse_stdlib tests." 2017-11-01 06:59:35 -07:00
Doug Gregor
62f43ae75b [Syntax] Replace TriviaList's std::deque with a std::vector.
For very large source files, the parser's syntax map---which contains a
very large number of TriviaLists---was taking an inordinate amount of
memory due to the inefficiency of std::deque. Specifically, a
std::deque containing just one trivial element would allocate 4k of
memory. With the ~120MB SIL output of one of the parse_stdlib tests,
these std::deques would add up to > 6GB of memory, most of which is
wasted.

Replacing the std::deque with a std::vector knocks the memory required
for one of the parse_stdlib tests from > 8GB down closer to 2 GB. The
parser's syntax map is still large (e.g., a 512MB allocation for the
overall vector plus a few hundred MB of raw-syntax data), but not
prohibitively so.

Part of rdar://problem/34771322.
2017-10-31 23:33:19 -07:00
practicalswift
d352652a72 Merge pull request #7727 from practicalswift/typos-20170223
[gardening] Fix typos
2017-02-24 09:15:12 +01:00
practicalswift
33a5601ad1 [gardening] Fix typos 2017-02-23 22:46:40 +01:00
David Farler
733988cdfe [Syntax] Add Trivia C++ unit tests
https://bugs.swift.org/browse/SR-4053
2017-02-23 13:46:08 -08:00
practicalswift
e44af328fb [gardening] Fix incorrect Swift URLs. 2017-02-21 14:20:34 +01:00
David Farler
7ee42994c8 Start the Syntax library and optional full token lexing
Add an option to the lexer to go back and get a list of "full"
tokens, which include their leading and trailing trivia, which
we can index into from SourceLocs in the current AST.

This starts the Syntax sublibrary, which will support structured
editing APIs. Some skeleton support and basic implementations are
in place for types and generics in the grammar. Yes, it's slightly
redundant with what we have right now. lib/AST conflates syntax
and semantics in the same place(s); this is a first step in changing
that to separate the two concepts for clarity and also to get closer
to incremental parsing and type-checking. The goal is to eventually
extract all of the syntactic information from lib/AST and change that
to be more of a semantic/symbolic model.

Stub out a Semantics manager. This ought to eventually be used as a hub
for encapsulating lazily computed semantic information for syntax nodes.
For the time being, it can serve as a temporary place for mapping from
Syntax nodes to semantically full lib/AST nodes.

This is still in a molten state - don't get too close, wear appropriate
proximity suits, etc.
2017-02-17 12:57:04 -08:00
David Farler
f450f0ccdf Revert "Preserve whitespace and comments during lexing as Trivia"
This reverts commit d6e2b58382.
2016-11-18 13:23:31 -08:00
David Farler
d6e2b58382 Preserve whitespace and comments during lexing as Trivia
Store leading and trailing "trivia" around a token, such as whitespace,
comments, doc comments, and escaping backticks. These are syntactically
important for preserving formatting when printing ASTs but don't
semantically affect the program.

Tokens take all trailing trivia up to, but not including, the next
newline. This is important to maintain checks that statements without
semicolon separators start on a new line, among other things.

Trivia are now data attached to the ends of tokens, not tokens
themselves.

Create a new Syntax sublibrary for upcoming immutable, persistent,
thread-safe ASTs, which will contain only the syntactic information
about source structure, as well as for generating new source code, and
structural editing. Proactively move swift::Token into there.

Since this patch is getting a bit large, a token fuzzer which checks
for round-trip equivalence with the workflow:

fuzzer => token stream => file1
  => Lexer => token stream => file2 => diff(file1, file2)

will arrive in a subsequent commit.

This patch does not change the grammar.
2016-11-15 16:11:57 -08:00
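Note on d6e2b58382: the trailing-trivia rule and the round-trip property can be sketched together. The toy lexer below treats every run of non-whitespace characters as a token, so comments and backticks are not modelled; ToyToken, lexWithTrivia and printTokens are hypothetical names, not the Lexer's real interface.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct ToyToken {
  std::string Leading;  // whitespace before the token, newlines included
  std::string Text;     // the token itself (empty for the end-of-file token)
  std::string Trailing; // spaces/tabs after the token, up to a newline
};

static bool isHorizontalSpace(char C) { return C == ' ' || C == '\t'; }
static bool isAnySpace(char C) {
  return isHorizontalSpace(C) || C == '\n' || C == '\r';
}

std::vector<ToyToken> lexWithTrivia(const std::string &Src) {
  std::vector<ToyToken> Tokens;
  std::size_t I = 0;
  for (;;) {
    ToyToken Tok;
    // Leading trivia: everything up to the token, including newlines.
    while (I < Src.size() && isAnySpace(Src[I]))
      Tok.Leading += Src[I++];
    while (I < Src.size() && !isAnySpace(Src[I]))
      Tok.Text += Src[I++];
    // Trailing trivia stops before the next newline, so the newline becomes
    // leading trivia of the following token.
    while (I < Src.size() && isHorizontalSpace(Src[I]))
      Tok.Trailing += Src[I++];
    const bool IsEOF = Tok.Text.empty();
    Tokens.push_back(std::move(Tok));
    if (IsEOF)
      return Tokens;
  }
}

// Printing every token with its trivia should reproduce the input exactly.
std::string printTokens(const std::vector<ToyToken> &Tokens) {
  std::string Out;
  for (const ToyToken &T : Tokens)
    Out += T.Leading + T.Text + T.Trailing;
  return Out;
}
```

Because the newline always lands in the next token's leading trivia, a parser can still tell whether a statement starts on a new line by inspecting that token alone.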