LLVM is presumably moving towards `std::string_view` -
`StringRef::startswith` is deprecated on tip. `SmallString::startswith`
was just renamed there (maybe with some small deprecation inbetween, but
if so, we've missed it).
The `SmallString::startswith` references were moved to
`.str().starts_with()`, rather than adding the `starts_with` on
`stable/20230725` as we only had a few of them. Open to switching that
over if anyone feels strongly though.
Queue up diagnostics when lexing, waiting until
`Lexer::lex` is called before emitting them. This
allows us to re-lex without having to deal with
previously invalid tokens.
Updated uses of object::SectionRef::getContents() since it now returns
an Expected<StringRef> instead of modifying the one it's passed.
See also: git-svn-id:
https://llvm.org/svn/llvm-project/llvm/trunk@360892
91177308-0d34-0410-b5e6-96231b3b80d8
Previously, the Lexer kept a single flag whether we’re lexing Swift or SIL. Instead, keep track if we’re parsing Swift, SIL, or a Swiftinterface file. .swiftinterface files allow $-prefixed identifiers anywhere.
Lexer::getEncodedStringSegment (now getEncodedStringSegmentImpl)
assumes that it can read one byte past the end of a string segment in
order to avoid bounds-checks on things like "is this a \r\n
sequence?". However, the function was being used for strings that did
not come from source where this assumption was not always valid.
Change the reusable form of the function to always copy into a
temporary buffer, allowing the fast path to continue to be used for
normal parsing.
Caught by ASan!
rdar://problem/44306756
The right way is findBufferContainingLoc. getBufferIdentifierForLoc is
both slower and wrong in the presence of #sourceLocation.
I couldn't come up with a test for the change in IDE/Utils.cpp because
refactoring still seems to be broken around #sourceLocation. I'll file
bugs for that.
Having this be a single buffer hardcoded in the SourceManager and set
by all clients is silly. SourceFiles with the 'Main' kind are allowed
to have hashbang lines (`#!`), other files are not. And anyone
manually setting up a Lexer can decide for themselves.
No intended behavioral change.
Store leading a trailing "trivia" around a token, such as whitespace,
comments, doc comments, and escaping backticks. These are syntactically
important for preserving formatting when printing ASTs but don't
semantically affect the program.
Tokens take all trailing trivia up to, but not including, the next
newline. This is important to maintain checks that statements without
semicolon separators start on a new line, among other things.
Trivia are now data attached to the ends of tokens, not tokens
themselves.
Create a new Syntax sublibrary for upcoming immutable, persistent,
thread-safe ASTs, which will contain only the syntactic information
about source structure, as well as for generating new source code, and
structural editing. Proactively move swift::Token into there.
Since this patch is getting a bit large, a token fuzzer which checks
for round-trip equivlence with the workflow:
fuzzer => token stream => file1
=> Lexer => token stream => file 2 => diff(file1, file2)
Will arrive in a subsequent commit.
This patch does not change the grammar.
I didn't want to rip this logic out wholesale. There is a possibility
the character lexing can be reborn/revisited later, and
disabling it in the parser was easy.
Swift SVN r18102
outside of debugger-support mode. Rip out the existing special-case
code when parsing expr-identifier.
This means that the Lexer needs a LangOptions. Doug and I
talked about just adding that as a field of SourceMgr, but
decided that it was worth it to preserve the possibility of
parsing different dialects in different source files.
By design, the lexer doesn't tokenize fundamentally differently
in different language modes; it might decide something is invalid,
or it might (eventually) use a different token kind for the
same consumed text, but we don't want it deciding to consume more or
less of the stream per token.
Note that SIL mode does make that kind of difference, and that
arguably means that various APIs for tokenizing need to take a
"is SIL mode" flag, but we're getting away with it because we
just don't really care about fidelity of SIL source files.
rdar://14899000
Swift SVN r13896
Also remove the SourceLoc parameter from addNewSourceBuffer(). In llvm::SourceMgr
it is used to indicate textual inclusion, which we don't have in swift.
Swift SVN r10014
Replace uses of it with the newly introduced constructor that accepts a buffer ID.
The StringRef constructor was rather unsafe since it had the implicit requirement that the StringRef
was null-terminated.
Swift SVN r6942