swift-mirror/stdlib/public/LLVMSupport/StringRef.cpp
Kuba (Brecka) Mracek 7e33575c6b Re-import LLVMSupport from llvm-project and enforce header includes only being used from the Swift fork when building stdlib (#40173)
* Enforce using headers from Swift's LLVMSupport fork and not llvm-project when building stdlib

* [LLVMSupport] Re-import LLVMSupport .cpp and .h files from 9ff3a9759b7c2f146e7f46e4aebc60453c577c5a from apple/llvm-project

Done via the following commands, while having llvm-project checked out at 9ff3a9759b7c2f146e7f46e4aebc60453c577c5a, a
commit on the stable/20210726 branch of apple/llvm-project, <9ff3a9759b>:

for i in swift/stdlib/public/LLVMSupport/*.cpp ; do cp llvm-project/llvm/lib/Support/$(basename $i) $i ; done
for i in swift/stdlib/include/llvm/ADT/*.h; do cp llvm-project/llvm/include/llvm/ADT/$(basename $i) $i ; done
for i in swift/stdlib/include/llvm/Support/*.h; do cp llvm-project/llvm/include/llvm/Support/$(basename $i) $i ; done
cp llvm-project/llvm/include/llvm/ADT/ScopeExit.h swift/stdlib/include/llvm/ADT/ScopeExit.h
cp llvm-project/llvm/include/llvm/ADT/Twine.h swift/stdlib/include/llvm/ADT/Twine.h
cp llvm-project/llvm/include/llvm/Support/raw_ostream.h swift/stdlib/include/llvm/Support/raw_ostream.h

* [LLVMSupport] Re-namespace the LLVMSupport fork after re-forking by re-applying b72788c27a

More precisely:

1) git cherry-pick b72788c27a
2) manually resolve the conflict in AlignOf.h by keeping the HEAD's version of the chunk and discarding the cherry-pick's change
3) git add AlignOf.h
4) git status | grep "deleted by us" | awk '{print($4)}' | xargs git rm
5) git cherry-pick --continue

Original namespacing commit message:

> This adds the `__swift::__runtime` inline namespace to the LLVMSupport
> interfaces.  This avoids an ODR violation when LLVM and Swift are in the
> same address space.  It also will aid in the process of pruning the
> LLVMSupport library by ensuring that accidental leakage of the llvm
> namespace does not allow us to remove symbols which we rely on.

* [LLVMSupport] Re-apply "pruning" on re-forked LLVMSupport from bb102707ed

This re-applies the "pruning" commit from bb102707ed, which did the following:
- Remove many whole files,
- Remove "epoch tracking" and "reverse iteration" support from ADT containers
- Remove "ABI break checking" support from STLExtras
- Remove float parsing functions from StringExtras.h
- Remove APInt/APSInt dependencies from StringRef.h + StringRef.cpp (edit distance, int parsing)
- Remove some variants of error handling and the dependency on dbgs() from ErrorHandling.h and ErrorHandling.cpp

We don't need to do the whole-file-removal step, because that's already done, but the rest is re-applied by doing:

1) git cherry-pick bb102707ed
2) manually resolving conflict in ADT/DenseMap.h by keeping HEAD's version of the chunk and removing epoch tracking from it
3) manually resolving conflict in ADT/STLExtras.h by keeping HEAD's version of the chunk and removing ABI break checking from it
4) manually resolving conflict in ADT/StringExtras.h by deleting the whole chunk (removing APInt/APSInt dependent functions)
5) manually resolving conflict in ErrorHandling.cpp by force-applying the cherry-pick's version (removing write() calls and OOM callback)
6) manually resolving the three conflicts in CMakeLists.txt files by keeping HEAD's version completely
7) git add stdlib/include/llvm/{ADT/StringSwitch.h,ADT/Twine.h,Support/raw_ostream.h}

Original commit description:

> Reduce LLVMSupport to the subset required for the runtime.  This reduces
> the TCB and the overheads of the runtime.  The inline namespace's
> preservation ensures that ODR violations do not occur.

* [LLVMSupport] Re-apply all post-import modifications that Swift's fork has made to LLVMSupport

Since the previous commits re-imported "vanilla" versions of LLVMSupport, we need to re-apply all modifications that Swift's fork has made since the last import. More precisely:

1) git diff 7b70120440cd39d67a595a7d0ea4e828ecc6ee44..origin/main -- stdlib/include/llvm stdlib/public/LLVMSupport | git apply -3 --exclude "stdlib/include/llvm/Support/DataTypes.h" --exclude "stdlib/include/llvm/Config/llvm-config.h.cmake"
2) manually resolve conflict in STLExtras.h by applying the "__swift::__runtime" prefix to HEAD's version
3) manually resolve conflicts in StringSwitch.h by keeping HEAD's version (removing the Unicode BOM marker at the beginning of the file, keeping LLVM's version of the string functions)
4) manually resolve conflict in SwapByteOrder.h by adding the `defined(__wasi__)` part into the #if

* [LLVMSupport] Drop remaining dependencies on APSInt.h, Error.h, DataTypes.h and STLForwardCompat.h

Most cases can drop the #includes without any changes; in some cases there are
straightforward replacements (climits, cstdint). For STLForwardCompat.h, we need
to bring in parts of STLForwardCompat.h from llvm-project.

* [LLVMSupport] Remove raw_ostream.h and drop dependencies to it from the runtime

* [LLVMSupport] Simplify error reporting in SmallVector and avoid using std::string when producing fatal error messages

Co-authored-by: Saleem Abdulrasool <compnerd@compnerd.org>
2021-12-02 17:21:51 -08:00


//===-- StringRef.cpp - Lightweight String References ---------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#include "llvm/ADT/StringExtras.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/ADT/Hashing.h"
#include <bitset>
#include <climits>

using namespace llvm;

// MSVC emits references to this into the translation units which reference it.
#ifndef _MSC_VER
constexpr size_t StringRef::npos;
#endif

// strncasecmp() is not available on non-POSIX systems, so define an
// alternative function here.
static int ascii_strncasecmp(const char *LHS, const char *RHS, size_t Length) {
  for (size_t I = 0; I < Length; ++I) {
    unsigned char LHC = toLower(LHS[I]);
    unsigned char RHC = toLower(RHS[I]);
    if (LHC != RHC)
      return LHC < RHC ? -1 : 1;
  }
  return 0;
}

int StringRef::compare_insensitive(StringRef RHS) const {
  if (int Res = ascii_strncasecmp(Data, RHS.Data, std::min(Length, RHS.Length)))
    return Res;
  if (Length == RHS.Length)
    return 0;
  return Length < RHS.Length ? -1 : 1;
}

bool StringRef::startswith_insensitive(StringRef Prefix) const {
  return Length >= Prefix.Length &&
         ascii_strncasecmp(Data, Prefix.Data, Prefix.Length) == 0;
}

bool StringRef::endswith_insensitive(StringRef Suffix) const {
  return Length >= Suffix.Length &&
         ascii_strncasecmp(end() - Suffix.Length, Suffix.Data,
                           Suffix.Length) == 0;
}

size_t StringRef::find_insensitive(char C, size_t From) const {
  char L = toLower(C);
  return find_if([L](char D) { return toLower(D) == L; }, From);
}
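
// Illustrative sketch (not part of LLVMSupport): the ordering that
// compare_insensitive above produces, restated over std::string so it can be
// compiled and tested in isolation. Bytes are folded with an ASCII-only
// lower-casing, as ascii_strncasecmp does via toLower, so the result does not
// depend on the C locale. The common prefix is compared first; only when it
// is equal does the shorter string sort first. The namespace and both
// function names are hypothetical.
#include <algorithm>
#include <cstddef>
#include <string>

namespace stringref_sketch {
// Hypothetical helper: maps 'A'..'Z' to 'a'..'z' and leaves other bytes as-is.
inline unsigned char asciiLowerSketch(char C) {
  unsigned char U = static_cast<unsigned char>(C);
  return (U >= 'A' && U <= 'Z') ? static_cast<unsigned char>(U - 'A' + 'a')
                                : U;
}

// Hypothetical: returns -1, 0 or 1, like compare_insensitive.
inline int compareInsensitiveSketch(const std::string &L,
                                    const std::string &R) {
  std::size_t Common = std::min(L.size(), R.size());
  for (std::size_t I = 0; I != Common; ++I) {
    unsigned char LC = asciiLowerSketch(L[I]);
    unsigned char RC = asciiLowerSketch(R[I]);
    if (LC != RC)
      return LC < RC ? -1 : 1;
  }
  // Equal common prefix: length decides.
  if (L.size() == R.size())
    return 0;
  return L.size() < R.size() ? -1 : 1;
}
} // namespace stringref_sketch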

/// compare_numeric - Compare strings, handle embedded numbers.
int StringRef::compare_numeric(StringRef RHS) const {
  for (size_t I = 0, E = std::min(Length, RHS.Length); I != E; ++I) {
    // Check for sequences of digits.
    if (isDigit(Data[I]) && isDigit(RHS.Data[I])) {
      // The longer sequence of numbers is considered larger.
      // This doesn't really handle prefixed zeros well.
      size_t J;
      for (J = I + 1; J != E + 1; ++J) {
        bool ld = J < Length && isDigit(Data[J]);
        bool rd = J < RHS.Length && isDigit(RHS.Data[J]);
        if (ld != rd)
          return rd ? -1 : 1;
        if (!rd)
          break;
      }
      // The two number sequences have the same length (J-I), just memcmp them.
      if (int Res = compareMemory(Data + I, RHS.Data + I, J - I))
        return Res < 0 ? -1 : 1;
      // Identical number sequences, continue search after the numbers.
      I = J - 1;
      continue;
    }
    if (Data[I] != RHS.Data[I])
      return (unsigned char)Data[I] < (unsigned char)RHS.Data[I] ? -1 : 1;
  }
  if (Length == RHS.Length)
    return 0;
  return Length < RHS.Length ? -1 : 1;
}
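
// Illustrative sketch (not part of LLVMSupport): the digit-run rule used by
// compare_numeric above, restated over std::string so it can be compiled and
// tested in isolation. As in the original, when both strings have a digit at
// the same position, the longer run of digits sorts after the shorter one and
// equal-length runs are compared bytewise, so "a2" < "a10" and
// "v1.2" < "v1.10", whereas a plain lexicographic compare puts "a10" first.
// The namespace and function names are hypothetical.
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>

namespace stringref_sketch {
// Hypothetical helper: ASCII decimal digit test.
inline bool sketchIsDigit(char C) { return C >= '0' && C <= '9'; }

inline int compareNumericSketch(const std::string &L, const std::string &R) {
  std::size_t E = std::min(L.size(), R.size());
  for (std::size_t I = 0; I != E; ++I) {
    if (sketchIsDigit(L[I]) && sketchIsDigit(R[I])) {
      // Walk both digit runs in lockstep; whichever ends first is smaller.
      std::size_t J;
      for (J = I + 1; J != E + 1; ++J) {
        bool LD = J < L.size() && sketchIsDigit(L[J]);
        bool RD = J < R.size() && sketchIsDigit(R[J]);
        if (LD != RD)
          return RD ? -1 : 1;
        if (!RD)
          break;
      }
      // Same-length runs: compare the digits bytewise.
      if (int Res = std::memcmp(L.data() + I, R.data() + I, J - I))
        return Res < 0 ? -1 : 1;
      I = J - 1; // Resume right after the digit run.
      continue;
    }
    if (L[I] != R[I])
      return static_cast<unsigned char>(L[I]) <
                     static_cast<unsigned char>(R[I])
                 ? -1
                 : 1;
  }
  if (L.size() == R.size())
    return 0;
  return L.size() < R.size() ? -1 : 1;
}
} // namespace stringref_sketch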

//===----------------------------------------------------------------------===//
// String Operations
//===----------------------------------------------------------------------===//

std::string StringRef::lower() const {
  return std::string(map_iterator(begin(), toLower),
                     map_iterator(end(), toLower));
}

std::string StringRef::upper() const {
  return std::string(map_iterator(begin(), toUpper),
                     map_iterator(end(), toUpper));
}

//===----------------------------------------------------------------------===//
// String Searching
//===----------------------------------------------------------------------===//

/// find - Search for the first string \arg Str in the string.
///
/// \return - The index of the first occurrence of \arg Str, or npos if not
/// found.
size_t StringRef::find(StringRef Str, size_t From) const {
  if (From > Length)
    return npos;

  const char *Start = Data + From;
  size_t Size = Length - From;

  const char *Needle = Str.data();
  size_t N = Str.size();
  if (N == 0)
    return From;
  if (Size < N)
    return npos;
  if (N == 1) {
    const char *Ptr = (const char *)::memchr(Start, Needle[0], Size);
    return Ptr == nullptr ? npos : Ptr - Data;
  }

  const char *Stop = Start + (Size - N + 1);

  // For short haystacks or unsupported needles fall back to the naive algorithm
  if (Size < 16 || N > 255) {
    do {
      if (std::memcmp(Start, Needle, N) == 0)
        return Start - Data;
      ++Start;
    } while (Start < Stop);
    return npos;
  }

  // Build the bad char heuristic table, with uint8_t to reduce cache thrashing.
  uint8_t BadCharSkip[256];
  std::memset(BadCharSkip, N, 256);
  for (unsigned i = 0; i != N-1; ++i)
    BadCharSkip[(uint8_t)Str[i]] = N-1-i;

  do {
    uint8_t Last = Start[N - 1];
    if (LLVM_UNLIKELY(Last == (uint8_t)Needle[N - 1]))
      if (std::memcmp(Start, Needle, N - 1) == 0)
        return Start - Data;

    // Otherwise skip the appropriate number of bytes.
    Start += BadCharSkip[Last];
  } while (Start < Stop);

  return npos;
}
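
// Illustrative sketch (not part of LLVMSupport): the table-driven path of
// StringRef::find above is a Boyer-Moore-Horspool search. Only the last byte
// of each window is checked first; on a mismatch the window advances by the
// skip distance stored for that byte. This version is restated over
// std::string with index arithmetic instead of raw pointers, and it covers
// only needles of 2 to 255 bytes (the original handles 0- and 1-byte needles
// separately and uses a naive loop for haystacks shorter than 16 bytes). The
// namespace and function name are hypothetical.
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

namespace stringref_sketch {
inline std::size_t horspoolFindSketch(const std::string &Haystack,
                                      const std::string &Needle) {
  const std::size_t N = Needle.size();
  // Restrict to the needle lengths the original's table-driven path handles.
  if (N < 2 || N > 255 || Haystack.size() < N)
    return std::string::npos;

  // Bytes that do not occur in the needle let the window jump N bytes.
  std::uint8_t Skip[256];
  std::memset(Skip, static_cast<int>(N), sizeof(Skip));
  // A byte at needle position I (excluding the last position) shifts the
  // window so its rightmost such occurrence lines up; later I overwrite.
  for (std::size_t I = 0; I != N - 1; ++I)
    Skip[static_cast<std::uint8_t>(Needle[I])] =
        static_cast<std::uint8_t>(N - 1 - I);

  const std::size_t LastStart = Haystack.size() - N;
  std::size_t Pos = 0;
  while (Pos <= LastStart) {
    std::uint8_t Last = static_cast<std::uint8_t>(Haystack[Pos + N - 1]);
    // Compare the rest of the window only when its final byte matches.
    if (Last == static_cast<std::uint8_t>(Needle[N - 1]) &&
        std::memcmp(Haystack.data() + Pos, Needle.data(), N - 1) == 0)
      return Pos;
    Pos += Skip[Last];
  }
  return std::string::npos;
}
} // namespace stringref_sketch

// Design note: storing skips as uint8_t keeps the table at 256 bytes, which
// is why the original caps the needle at 255 bytes before using it.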

size_t StringRef::find_insensitive(StringRef Str, size_t From) const {
  StringRef This = substr(From);
  while (This.size() >= Str.size()) {
    if (This.startswith_insensitive(Str))
      return From;
    This = This.drop_front();
    ++From;
  }
  return npos;
}

size_t StringRef::rfind_insensitive(char C, size_t From) const {
  From = std::min(From, Length);
  size_t i = From;
  while (i != 0) {
    --i;
    if (toLower(Data[i]) == toLower(C))
      return i;
  }
  return npos;
}

/// rfind - Search for the last string \arg Str in the string.
///
/// \return - The index of the last occurrence of \arg Str, or npos if not
/// found.
size_t StringRef::rfind(StringRef Str) const {
  size_t N = Str.size();
  if (N > Length)
    return npos;
  for (size_t i = Length - N + 1, e = 0; i != e;) {
    --i;
    if (substr(i, N).equals(Str))
      return i;
  }
  return npos;
}

size_t StringRef::rfind_insensitive(StringRef Str) const {
  size_t N = Str.size();
  if (N > Length)
    return npos;
  for (size_t i = Length - N + 1, e = 0; i != e;) {
    --i;
    if (substr(i, N).equals_insensitive(Str))
      return i;
  }
  return npos;
}

/// find_first_of - Find the first character in the string that is in \arg
/// Chars, or npos if not found.
///
/// Note: O(size() + Chars.size())
StringRef::size_type StringRef::find_first_of(StringRef Chars,
                                              size_t From) const {
  std::bitset<1 << CHAR_BIT> CharBits;
  for (size_type i = 0; i != Chars.size(); ++i)
    CharBits.set((unsigned char)Chars[i]);

  for (size_type i = std::min(From, Length), e = Length; i != e; ++i)
    if (CharBits.test((unsigned char)Data[i]))
      return i;
  return npos;
}

/// find_first_not_of - Find the first character in the string that is not
/// \arg C, or npos if not found.
StringRef::size_type StringRef::find_first_not_of(char C, size_t From) const {
  for (size_type i = std::min(From, Length), e = Length; i != e; ++i)
    if (Data[i] != C)
      return i;
  return npos;
}

/// find_first_not_of - Find the first character in the string that is not
/// in the string \arg Chars, or npos if not found.
///
/// Note: O(size() + Chars.size())
StringRef::size_type StringRef::find_first_not_of(StringRef Chars,
                                                  size_t From) const {
  std::bitset<1 << CHAR_BIT> CharBits;
  for (size_type i = 0; i != Chars.size(); ++i)
    CharBits.set((unsigned char)Chars[i]);

  for (size_type i = std::min(From, Length), e = Length; i != e; ++i)
    if (!CharBits.test((unsigned char)Data[i]))
      return i;
  return npos;
}

/// find_last_of - Find the last character in the string that is in \arg
/// Chars, or npos if not found.
///
/// Note: O(size() + Chars.size())
StringRef::size_type StringRef::find_last_of(StringRef Chars,
                                             size_t From) const {
  std::bitset<1 << CHAR_BIT> CharBits;
  for (size_type i = 0; i != Chars.size(); ++i)
    CharBits.set((unsigned char)Chars[i]);

  for (size_type i = std::min(From, Length) - 1, e = -1; i != e; --i)
    if (CharBits.test((unsigned char)Data[i]))
      return i;
  return npos;
}

/// find_last_not_of - Find the last character in the string that is not
/// \arg C, or npos if not found.
StringRef::size_type StringRef::find_last_not_of(char C, size_t From) const {
  for (size_type i = std::min(From, Length) - 1, e = -1; i != e; --i)
    if (Data[i] != C)
      return i;
  return npos;
}

/// find_last_not_of - Find the last character in the string that is not in
/// \arg Chars, or npos if not found.
///
/// Note: O(size() + Chars.size())
StringRef::size_type StringRef::find_last_not_of(StringRef Chars,
                                                 size_t From) const {
  std::bitset<1 << CHAR_BIT> CharBits;
  for (size_type i = 0, e = Chars.size(); i != e; ++i)
    CharBits.set((unsigned char)Chars[i]);

  for (size_type i = std::min(From, Length) - 1, e = -1; i != e; --i)
    if (!CharBits.test((unsigned char)Data[i]))
      return i;
  return npos;
}
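
// Illustrative sketch (not part of LLVMSupport): the membership-set idea
// behind the find_*_of functions above. One std::bitset bit per possible byte
// value is set for each character of Chars, so each byte of the string is
// tested in O(1) and the search is O(size() + Chars.size()). The reverse loop
// starts at min(From, size) - 1 and stops when the unsigned index wraps past
// zero, which is what the `e = -1` sentinel in the originals relies on. As
// there, From is an exclusive upper bound. The namespace and function name
// are hypothetical.
#include <algorithm>
#include <bitset>
#include <climits>
#include <cstddef>
#include <string>

namespace stringref_sketch {
inline std::size_t findLastOfSketch(const std::string &S,
                                    const std::string &Chars,
                                    std::size_t From = std::string::npos) {
  std::bitset<1 << CHAR_BIT> CharBits;
  for (char C : Chars)
    CharBits.set(static_cast<unsigned char>(C));
  // When From is 0 (or S is empty), I starts at SIZE_MAX, equals the
  // sentinel immediately, and the loop body never runs.
  const std::size_t Sentinel = static_cast<std::size_t>(-1);
  for (std::size_t I = std::min(From, S.size()) - 1; I != Sentinel; --I)
    if (CharBits.test(static_cast<unsigned char>(S[I])))
      return I;
  return std::string::npos;
}
} // namespace stringref_sketch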

void StringRef::split(SmallVectorImpl<StringRef> &A,
                      StringRef Separator, int MaxSplit,
                      bool KeepEmpty) const {
  StringRef S = *this;

  // Count down from MaxSplit. When MaxSplit is -1, this will just split
  // "forever". This doesn't support splitting more than 2^31 times
  // intentionally; if we ever want that we can make MaxSplit a 64-bit integer
  // but that seems unlikely to be useful.
  while (MaxSplit-- != 0) {
    size_t Idx = S.find(Separator);
    if (Idx == npos)
      break;

    // Push this split.
    if (KeepEmpty || Idx > 0)
      A.push_back(S.slice(0, Idx));

    // Jump forward.
    S = S.slice(Idx + Separator.size(), npos);
  }

  // Push the tail.
  if (KeepEmpty || !S.empty())
    A.push_back(S);
}

void StringRef::split(SmallVectorImpl<StringRef> &A, char Separator,
                      int MaxSplit, bool KeepEmpty) const {
  StringRef S = *this;

  // Count down from MaxSplit. When MaxSplit is -1, this will just split
  // "forever". This doesn't support splitting more than 2^31 times
  // intentionally; if we ever want that we can make MaxSplit a 64-bit integer
  // but that seems unlikely to be useful.
  while (MaxSplit-- != 0) {
    size_t Idx = S.find(Separator);
    if (Idx == npos)
      break;

    // Push this split.
    if (KeepEmpty || Idx > 0)
      A.push_back(S.slice(0, Idx));

    // Jump forward.
    S = S.slice(Idx + 1, npos);
  }

  // Push the tail.
  if (KeepEmpty || !S.empty())
    A.push_back(S);
}
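
// Illustrative sketch (not part of LLVMSupport): the MaxSplit/KeepEmpty
// behaviour of the split overloads above, restated with std::string and
// std::vector in place of StringRef and SmallVectorImpl. MaxSplit == -1
// splits at every separator; otherwise at most MaxSplit pieces are cut off
// and the remainder is pushed as the tail. KeepEmpty == false drops empty
// pieces, including an empty tail. Assumption: Separator is non-empty (an
// empty separator would never advance the loop). The namespace and function
// name are hypothetical.
#include <cstddef>
#include <string>
#include <vector>

namespace stringref_sketch {
inline std::vector<std::string> splitSketch(std::string S,
                                            const std::string &Separator,
                                            int MaxSplit = -1,
                                            bool KeepEmpty = true) {
  std::vector<std::string> Pieces;
  while (MaxSplit-- != 0) {
    std::size_t Idx = S.find(Separator);
    if (Idx == std::string::npos)
      break;
    // Keep the piece before the separator unless it is empty and unwanted.
    if (KeepEmpty || Idx > 0)
      Pieces.push_back(S.substr(0, Idx));
    // Continue after the separator.
    S = S.substr(Idx + Separator.size());
  }
  // Push the tail, subject to the same KeepEmpty rule.
  if (KeepEmpty || !S.empty())
    Pieces.push_back(S);
  return Pieces;
}
} // namespace stringref_sketch

// For example, splitting "a,b,c" on "," with MaxSplit == 1 yields "a" and the
// unsplit tail "b,c".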

//===----------------------------------------------------------------------===//
// Helpful Algorithms
//===----------------------------------------------------------------------===//

/// count - Return the number of non-overlapping occurrences of \arg Str in
/// the string.
size_t StringRef::count(StringRef Str) const {
  size_t Count = 0;
  size_t N = Str.size();
  if (!N || N > Length)
    return 0;
  for (size_t i = 0, e = Length - N + 1; i < e;) {
    if (substr(i, N).equals(Str)) {
      ++Count;
      i += N;
    }
    else
      ++i;
  }
  return Count;
}
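
// Illustrative sketch (not part of LLVMSupport): the non-overlapping counting
// rule used by StringRef::count above, restated over std::string. After a
// match the scan jumps past the whole match, so "aaaa" contains "aa" twice,
// not three times. As in the original, an empty needle counts as zero
// occurrences. The namespace and function name are hypothetical.
#include <cstddef>
#include <string>

namespace stringref_sketch {
inline std::size_t countSketch(const std::string &S, const std::string &Str) {
  const std::size_t N = Str.size();
  if (N == 0 || N > S.size())
    return 0;
  std::size_t Count = 0;
  for (std::size_t I = 0, E = S.size() - N + 1; I < E;) {
    if (S.compare(I, N, Str) == 0) {
      ++Count;
      I += N; // Skip the whole match so occurrences never overlap.
    } else {
      ++I;
    }
  }
  return Count;
}
} // namespace stringref_sketch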

static unsigned GetAutoSenseRadix(StringRef &Str) {
  if (Str.empty())
    return 10;

  if (Str.startswith("0x") || Str.startswith("0X")) {
    Str = Str.substr(2);
    return 16;
  }

  if (Str.startswith("0b") || Str.startswith("0B")) {
    Str = Str.substr(2);
    return 2;
  }

  if (Str.startswith("0o")) {
    Str = Str.substr(2);
    return 8;
  }

  if (Str[0] == '0' && Str.size() > 1 && isDigit(Str[1])) {
    Str = Str.substr(1);
    return 8;
  }

  return 10;
}
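
// Illustrative sketch (not part of LLVMSupport): the prefix table that
// GetAutoSenseRadix above applies when a caller passes Radix == 0, restated
// over std::string. "0x"/"0X" select base 16, "0b"/"0B" base 2, "0o" base 8,
// and a leading '0' followed by another decimal digit is treated as C-style
// octal; the recognised prefix is stripped. Only lower-case "0o" is accepted,
// so "0O7" stays base 10 and is left unchanged. The namespace and function
// name are hypothetical.
#include <string>

namespace stringref_sketch {
inline unsigned autoSenseRadixSketch(std::string &Str) {
  if (Str.size() >= 2 && Str[0] == '0') {
    char P = Str[1];
    unsigned Radix = 0;
    if (P == 'x' || P == 'X')
      Radix = 16;
    else if (P == 'b' || P == 'B')
      Radix = 2;
    else if (P == 'o')
      Radix = 8;
    if (Radix != 0) {
      Str.erase(0, 2); // Drop the two-character prefix.
      return Radix;
    }
    if (P >= '0' && P <= '9') {
      Str.erase(0, 1); // Leading zero before a digit: octal.
      return 8;
    }
  }
  return 10;
}
} // namespace stringref_sketch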

bool __swift::__runtime::llvm::consumeUnsignedInteger(
    StringRef &Str, unsigned Radix, unsigned long long &Result) {
  // Autosense radix if not specified.
  if (Radix == 0)
    Radix = GetAutoSenseRadix(Str);

  // Empty strings (after the radix autosense) are invalid.
  if (Str.empty()) return true;

  // Parse all the bytes of the string given this radix. Watch for overflow.
  StringRef Str2 = Str;
  Result = 0;
  while (!Str2.empty()) {
    unsigned CharVal;
    if (Str2[0] >= '0' && Str2[0] <= '9')
      CharVal = Str2[0] - '0';
    else if (Str2[0] >= 'a' && Str2[0] <= 'z')
      CharVal = Str2[0] - 'a' + 10;
    else if (Str2[0] >= 'A' && Str2[0] <= 'Z')
      CharVal = Str2[0] - 'A' + 10;
    else
      break;

    // If the digit value is not less than the radix, we cannot consume any
    // more characters.
    if (CharVal >= Radix)
      break;

    // Add in this character.
    unsigned long long PrevResult = Result;
    Result = Result * Radix + CharVal;

    // Check for overflow by shifting back and seeing if bits were lost.
    if (Result / Radix < PrevResult)
      return true;

    Str2 = Str2.substr(1);
  }

  // We consider the operation a failure if no characters were consumed
  // successfully.
  if (Str.size() == Str2.size())
    return true;

  Str = Str2;
  return false;
}
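
// Illustrative sketch (not part of LLVMSupport): the digit loop of
// consumeUnsignedInteger above for an explicit radix in [2, 36], restated
// over std::string. Each step computes Result * Radix + Digit in wrapping
// unsigned arithmetic and then divides back: without overflow the quotient
// recovers the previous value exactly (because Digit < Radix), while any
// wrap-around makes it smaller. As in the original, true means failure, and
// Str is trimmed to the unconsumed suffix only on success. The namespace and
// function name, and the radix range check, are hypothetical additions.
#include <cstddef>
#include <string>

namespace stringref_sketch {
inline bool consumeUnsignedSketch(std::string &Str, unsigned Radix,
                                  unsigned long long &Result) {
  if (Radix < 2 || Radix > 36)
    return true; // Assumed precondition; no autosensing in this sketch.
  std::size_t Pos = 0;
  Result = 0;
  for (; Pos != Str.size(); ++Pos) {
    char C = Str[Pos];
    unsigned Digit;
    if (C >= '0' && C <= '9')
      Digit = C - '0';
    else if (C >= 'a' && C <= 'z')
      Digit = C - 'a' + 10;
    else if (C >= 'A' && C <= 'Z')
      Digit = C - 'A' + 10;
    else
      break;
    if (Digit >= Radix)
      break; // Not a digit in this radix: stop consuming.
    unsigned long long Prev = Result;
    Result = Result * Radix + Digit;
    if (Result / Radix < Prev)
      return true; // Overflow: high bits were lost.
  }
  if (Pos == 0)
    return true; // Nothing was consumed.
  Str.erase(0, Pos); // Leave only the unconsumed suffix.
  return false;
}
} // namespace stringref_sketch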

bool __swift::__runtime::llvm::consumeSignedInteger(
    StringRef &Str, unsigned Radix, long long &Result) {
  unsigned long long ULLVal;

  // Handle positive strings first.
  if (Str.empty() || Str.front() != '-') {
    if (consumeUnsignedInteger(Str, Radix, ULLVal) ||
        // Check for value so large it overflows a signed value.
        (long long)ULLVal < 0)
      return true;
    Result = ULLVal;
    return false;
  }

  // Get the positive part of the value.
  StringRef Str2 = Str.drop_front(1);
  if (consumeUnsignedInteger(Str2, Radix, ULLVal) ||
      // Reject values so large they'd overflow as negative signed, but allow
      // "-0". This negates the unsigned so that the negative isn't undefined
      // on signed overflow.
      (long long)-ULLVal > 0)
    return true;

  Str = Str2;
  Result = -ULLVal;
  return false;
}
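
// Illustrative sketch (not part of LLVMSupport): the range rule that
// consumeSignedInteger above enforces once the magnitude has been parsed as
// unsigned. Positive values must not exceed LLONG_MAX; negative values may
// reach LLONG_MIN, whose magnitude is one larger, and "-0" is accepted. The
// original expresses this with casts of out-of-range unsigned values to
// long long, which are implementation-defined before C++20; this version
// swaps in explicit comparisons against LLONG_MAX so no conversion is ever
// out of range. The namespace and function name are hypothetical, and true
// means failure, as in the original.
#include <climits>

namespace stringref_sketch {
inline bool applySignSketch(bool Negative, unsigned long long Magnitude,
                            long long &Result) {
  const unsigned long long MaxPos =
      static_cast<unsigned long long>(LLONG_MAX);
  if (!Negative) {
    if (Magnitude > MaxPos)
      return true; // Too large for a positive long long.
    Result = static_cast<long long>(Magnitude);
    return false;
  }
  if (Magnitude > MaxPos + 1)
    return true; // Would be below LLONG_MIN.
  // Negate without ever forming +2^63 as a signed value:
  // -(M - 1) - 1 == -M, and M - 1 always fits in long long here.
  Result = Magnitude == 0 ? 0 : -static_cast<long long>(Magnitude - 1) - 1;
  return false;
}
} // namespace stringref_sketch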

/// getAsUnsignedInteger - Workhorse method that converts an integer character
/// sequence of radix up to 36 to an unsigned long long value.
bool __swift::__runtime::llvm::getAsUnsignedInteger(
    StringRef Str, unsigned Radix, unsigned long long &Result) {
  if (consumeUnsignedInteger(Str, Radix, Result))
    return true;

  // For getAsUnsignedInteger, we require the whole string to be consumed or
  // else we consider it a failure.
  return !Str.empty();
}

bool __swift::__runtime::llvm::getAsSignedInteger(
    StringRef Str, unsigned Radix, long long &Result) {
  if (consumeSignedInteger(Str, Radix, Result))
    return true;

  // For getAsSignedInteger, we require the whole string to be consumed or else
  // we consider it a failure.
  return !Str.empty();
}

// Implementation of StringRef hashing.
hash_code __swift::__runtime::llvm::hash_value(StringRef S) {
  return hash_combine_range(S.begin(), S.end());
}