Options that specify an output path should be cache invariant.
Fix the one remaining option that is not correctly labelled and add a
unit test to make sure all options that follow the output path naming
convention are correctly marked as CacheInvariant.
rdar://146155049
I hit this in https://github.com/apple/swift/pull/72476. I put the declaration
that hit this in a header, but as I thought about it... there was no real harm
in just fixing the issue and preventing future breakage.
When loading input from CAS, `swift-frontend` relies on the input file
name to determine which type to look up in the CAS entry. When the
file extension is `.private.swiftinterface`, swift misidentifies the
input as a `.swiftinterface` file and looks up the wrong input file.
Add a new file type lookup function that can figure out the type from
the full filename.
Also add a few diagnostics during the CAS lookup for the input file to
error out immediately, rather than relying on the lookup failure later.
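
A minimal sketch of the full-filename lookup idea, not the actual
frontend code. The names `FileKind`, `SuffixEntry` and
`lookupKindFromFilename` are hypothetical, and the table lists only a few
extensions for illustration. The point is that compound extensions are
checked before the shorter extensions they end with:

```cpp
#include <array>
#include <string_view>

// Hypothetical file kinds; the real compiler knows many more.
enum class FileKind { Unknown, SwiftInterface, PrivateSwiftInterface, SwiftModule };

struct SuffixEntry {
  std::string_view suffix;
  FileKind kind;
};

// Returns true if `name` ends with `suffix`.
inline bool endsWith(std::string_view name, std::string_view suffix) {
  return name.size() >= suffix.size() &&
         name.substr(name.size() - suffix.size()) == suffix;
}

// Classify by the full filename. Compound extensions come first in the
// table, so the most specific suffix wins.
inline FileKind lookupKindFromFilename(std::string_view filename) {
  static constexpr std::array<SuffixEntry, 3> table = {{
      {".private.swiftinterface", FileKind::PrivateSwiftInterface},
      {".swiftinterface", FileKind::SwiftInterface},
      {".swiftmodule", FileKind::SwiftModule},
  }};
  for (const SuffixEntry &entry : table)
    if (endsWith(filename, entry.suffix))
      return entry.kind;
  return FileKind::Unknown;
}
```

A lookup keyed only on the text after the last `.` cannot tell the two
interface kinds apart, which is the failure described above.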
As of CMake 3.25, there are now global variables `LINUX=1`, `ANDROID=1`,
etc. These conflict with expressions that used these names as unquoted
strings in positions where CMake accepts 'variable|string', for example:
- `if(sdk STREQUAL LINUX)` would fail, because `LINUX` is now defined and
expands to 1, where it would previously coerce to a string.
- `if(${sdk} STREQUAL "LINUX")` would fail if `sdk=LINUX`, because the
left-hand side expands twice.
In this patch, I looked for a number of patterns to fix up, sometimes a
little defensively:
- Quoted right-hand side of `STREQUAL` where I was confident it was
intended to be a string literal.
- Removed manual variable expansion on left-hand side of `STREQUAL`,
`MATCHES` and `IN_LIST` where I was confident it was unintended.
Fixes #65028.
I wanted a bit vector that I could use as a compact sorted
set of enum values: an inline-allocated, fixed-size array
of bits supporting efficient and convenient set operations
and iteration.
The C++ standard library offers std::bitset, but the API
is far from ideal for this purpose. It's positioned as
an abstract bit-vector rather than as a set. To use it
as a set, you have to turn your values into indices, which
for enums means explicitly casting them all in the caller.
There's also no iteration operation, so to find the
elements of the set, you have to iterate over all possible
indices, test whether they're in the set, and (if so)
cast the current index back to the enum. Not only is that
much more awkward than normal iteration, but it's also
substantially less efficient than what you can get by
counting trailing zeroes in a mask.
LLVM and Swift offer a number of other bit vectors, but
they're all dynamically allocated because they're meant
to track arbitrary sets. That's not a non-starter for my
use case, which is in textual serialization and so rather
slow anyway, but it's also not very hard to whip together
the optimal data structure here.
I have committed the cardinal sin of C++ data structure
design and provided the operations as ordinary methods
instead of operators.
This commit adds a new frontend flag that applies debug path prefixing to the
paths serialized in swiftmodule files. This makes it possible to use swiftmodule
files that have been built on different machines by applying the inverse map
when debugging, in a similar fashion to source path prefixing.
The inverse mapping in LLDB will be handled in a follow-up PR.
Second pass at #39138
Tests updated to handle Windows path separators.
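
For illustration only, prefix remapping and its inverse can be sketched
as below. `PrefixMap`, `remapPath` and `invert` are hypothetical names;
the actual flag and serialization code are not shown here, and this
sketch assumes the first matching prefix wins:

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical prefix map: ordered (from, to) pairs.
using PrefixMap = std::vector<std::pair<std::string, std::string>>;

// Rewrites `path` using the first entry whose `from` is a prefix of it.
// Paths that match no entry are returned unchanged.
inline std::string remapPath(const PrefixMap &map, const std::string &path) {
  for (const auto &entry : map) {
    const std::string &from = entry.first;
    if (path.compare(0, from.size(), from) == 0)
      return entry.second + path.substr(from.size());
  }
  return path;
}

// Builds the inverse map by swapping each pair, as a debugger on another
// machine might do to recover local paths.
inline PrefixMap invert(const PrefixMap &map) {
  PrefixMap inverse;
  for (const auto &entry : map)
    inverse.emplace_back(entry.second, entry.first);
  return inverse;
}
```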
This reverts commit f5aa95b381.
This commit adds a function to remap the clang arguments passed
during compilation. This is intended to be shared across the
Swift compiler and LLDB to apply path remapping for debug info
paths.
The properties of this multimap cache are:
1. Values are stored (inline if Small) in a Vector and our map internally maps
keys to (start, length) of slices of the Vector. This is done instead of
storing arrays refs to ensure that if our array goes from small -> large, we
do not have stale pointers.
2. Values are only allowed to be inserted all at once. This is ok, since this is
a cache.
3. One is not storing individual small vectors in a map (or state storing
SmallVectors). This can inadvertently add up to using a lot of memory and is
not needed for homogeneous data.
I have been using this in a bunch of places in the compiler and rather than
implement it by hand over and over (and maybe messing up), this commit just
commits a correct implementation.
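
The properties above can be sketched as follows. This is a simplified
stand-in that uses std::vector and std::map instead of the LLVM
containers, and `MultiMapCacheSketch` is a hypothetical name:

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

template <typename Key, typename Value>
class MultiMapCacheSketch {
  // All values live in one vector.
  std::vector<Value> storage;
  // key -> (start, length) of a slice of `storage`. Offsets stay valid
  // when `storage` reallocates, unlike stored pointers.
  std::map<Key, std::pair<std::size_t, std::size_t>> slices;

public:
  // Inserts all values for `key` at once. Returns false if the key is
  // already cached.
  bool insert(const Key &key, const std::vector<Value> &values) {
    if (slices.count(key) != 0)
      return false;
    slices.emplace(key, std::make_pair(storage.size(), values.size()));
    storage.insert(storage.end(), values.begin(), values.end());
    return true;
  }

  // Returns (pointer to first value, count), or (nullptr, 0) on a miss.
  // The pointer is only valid until the next insert.
  std::pair<const Value *, std::size_t> lookup(const Key &key) const {
    auto it = slices.find(key);
    if (it == slices.end())
      return {nullptr, 0};
    return {storage.data() + it->second.first, it->second.second};
  }
};
```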
This data structure is a map backed by a vector-like data structure. It has two
phases:
1. An insertion phase when the map is mutable and one inserts (key, value) pairs
into the map. These are just appended into the storage array.
2. A frozen phase when the map is immutable and one can now perform map queries
on the multimap.
The map transitions from the mutable, thawed phase to the immutable, frozen
phase by performing a stable_sort of its internal storage by only the key. Since
this is a stable_sort, we know that the relative insertion order of values is
preserved if their keys are equal. Thus the sorting will have created contiguous
regions in the array of values, all mapped to the same key, that are in insertion
order. Thus by finding the lower_bound for a given key, we are guaranteed to get
the first element in that contiguous range. We can then do a forward search to
find the end of the region, allowing us to then return an ArrayRef to these
internal values.
The reason why I keep on finding myself using this is that this map enables one
to map a key to an array of values without needing to store small vectors in a
map or use heap-allocated memory: all (key, value) pairs are stored inline
(potentially in a single SmallVector, given that one is using SmallFrozenMultiMap).
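
A simplified sketch of the two phases, using std::vector rather than
SmallVector. `FrozenMultiMapSketch` is a hypothetical name, and unlike
the description above, `find` copies the values out instead of returning
a view into the internal storage:

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

template <typename Key, typename Value>
class FrozenMultiMapSketch {
  using Entry = std::pair<Key, Value>;
  std::vector<Entry> storage;
  bool frozen = false;

public:
  // Insertion phase: just append.
  void insert(const Key &key, const Value &value) {
    assert(!frozen && "cannot insert after freezing");
    storage.emplace_back(key, value);
  }

  // Transition to the frozen phase. Sorting by key only, with a stable
  // sort, keeps values with equal keys in insertion order.
  void setFrozen() {
    std::stable_sort(storage.begin(), storage.end(),
                     [](const Entry &a, const Entry &b) {
                       return a.first < b.first;
                     });
    frozen = true;
  }

  // Frozen phase: lower_bound finds the first entry for `key`, then a
  // forward scan collects the rest of the contiguous region.
  std::vector<Value> find(const Key &key) const {
    assert(frozen && "cannot query before freezing");
    auto it = std::lower_bound(storage.begin(), storage.end(), key,
                               [](const Entry &entry, const Key &k) {
                                 return entry.first < k;
                               });
    std::vector<Value> result;
    for (; it != storage.end() && !(key < it->first); ++it)
      result.push_back(it->second);
    return result;
  }
};
```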
We have a lot of "transform a range" types already:
llvm::mapped_iterator, swift::TransformRange and
swift::TransformIterator, and swift::ArrayRefView for static
transformations. This gets rid of one more layer without losing
any real functionality.
...by coalescing duplicates and dropping conflicts. Both cases can
happen with "expected-error 2 {{...}}": we might get multiple fix-its
providing the same new message, or one message might have diverged
into two, giving us incompatible changes.
It is more efficient than llvm::AppendingBinaryByteStream if a lot of
small data gets appended to it because it doesn't need to resize its
buffer on each write.
Our libcache implementation of swift::sys::Cache was broken for
ref-counted values (which are used by e.g. the SourceKit ASTManager).
It would always `retain(value)` in `set(key, value)`, but under the hood
libcache shares values, so we would only get one `release(value)` if the
same value was used across multiple keys, or if the same value *and* key
were set multiple times.
This was causing us to never release ASTs cached by SourceKit even when
the underlying libcache purged itself under memory pressure.
rdar://problem/21619189
The difference is that TransformArrayRef stores its function as a std::function
instead of using a template parameter. This is useful in situations where one
wants to define such a type in a header on forward declared pointers. If one had
to define the function to be used as a template parameter, one would have to
define the function or provide a forward declared version.
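
To illustrate the trade-off, here is a minimal sketch of a view whose
transform is type-erased. `TransformArrayRefSketch` is a hypothetical
stand-in and not the real class. Because the function is a
std::function member, the view's type depends only on the element and
result types, so it can be named in a header before the transform is
defined:

```cpp
#include <cstddef>
#include <functional>
#include <utility>

template <typename In, typename Out>
class TransformArrayRefSketch {
  const In *data;
  std::size_t count;
  // Type-erased transform: no extra template parameter is needed.
  std::function<Out(const In &)> transform;

public:
  TransformArrayRefSketch(const In *data, std::size_t count,
                          std::function<Out(const In &)> transform)
      : data(data), count(count), transform(std::move(transform)) {}

  std::size_t size() const { return count; }

  // Applies the transform lazily on each access.
  Out operator[](std::size_t i) const { return transform(data[i]); }
};
```

The cost is an indirect call per element, which a template parameter
would avoid.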
C++ atomic's fetch_sub returns the previous value, whereas we want to
check the new value. This was causing massive memory leaks in SourceKit.
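
A small sketch of the pitfall, using a hypothetical `RefCountedSketch`
rather than the actual header. Since fetch_sub yields the value held
before the subtraction, the last reference is gone when that value is 1,
not 0:

```cpp
#include <atomic>

class RefCountedSketch {
  mutable std::atomic<int> refCount{0};

public:
  void retain() const { refCount.fetch_add(1, std::memory_order_relaxed); }

  // Returns true when this call dropped the last reference.
  bool release() const {
    // Comparing the result against 0 here would never fire at the right
    // time, because the result is the count *before* decrementing.
    int previous = refCount.fetch_sub(1, std::memory_order_acq_rel);
    return previous == 1;
  }
};
```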
For ThreadSafeRefCountedBase, just switch to the one in LLVM that's
already correct. We should move the VPTR one to LLVM as well and then
we can get rid of this header.
rdar://problem/27358273
This is an immutable data structure with the following properties:
1. All of the sets are sorted and can be iterated over.
2. It takes in a bump ptr allocator and uses that allocator for all
allocations.
3. All concatenation operations involve only one bump ptr allocation.
4. Since we are only storing pointers, the data structure does not need any
destructors to be invoked to be cleaned up. The bump ptr allocator memory just
needs to be freed.
I am going to use this to improve the compile time performance of ARC.
This commit adds a number of compression routines:
1. A dictionary based compression.
2. Huffman based compression.
3. A compression algorithm for swift names that's based on the other two.
This commit also adds two large autogenerated files: CBCTables.h and HuffTables.h.
These files contain the autogenerated string tables and auto-generated code for
fast compression/decompression. The internal tree data structures are lowered
into code that does the variable length encoding/decoding and searching of
fragments in the codebook. The files were generated by processing the symbols
from several large swift applications (stdlib, unittests, simd, ui app, etc).
The programs are listed as part of the output of the tool in the
header file.
I decided to commit the auto-generated files for two reasons. First, we have a
cyclic dependency problem where we need to analyze the output of the compiler
(swift files) in order to generate the tables. And second, these tables will
become a part of the Swift ABI and should remain constant.
It should be possible to split the code that generates the Trie-based data
structure and auto-generate it as part of the Swift build process.
PointerIntEnum is a more powerful PointerIntPair data structure. It uses
an enum with special cases to understand characteristics of the data and
then uses this information and some tricks to be able to
represent:
1. Up to tagged bit number of pointer cases. The cases are stored inline.
2. Inline indices up to 4096.
3. Out of line indices > 4096.
It takes advantage of the trick that we use in the runtime already to
distinguish pointers from indices: namely that modern OSes do not
allocate the zero page.
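
The zero-page trick can be sketched as below. This covers only the
inline index versus pointer distinction, not the tagged pointer cases or
the out-of-line indices. `PointerOrIndexSketch` is a hypothetical name
and the 4096 threshold assumes the page size implied above:

```cpp
#include <cassert>
#include <cstdint>

// Assumed size of the never-allocated zero page.
constexpr std::uintptr_t kMaxInlineIndex = 4096;

class PointerOrIndexSketch {
  std::uintptr_t bits;

  explicit PointerOrIndexSketch(std::uintptr_t bits) : bits(bits) {}

public:
  // Stores a small index directly in the word.
  static PointerOrIndexSketch fromIndex(std::uintptr_t index) {
    assert(index < kMaxInlineIndex && "index must fit below the zero page");
    return PointerOrIndexSketch(index);
  }

  // Stores a pointer. No valid object lives in the zero page, so the
  // stored value is never mistaken for an inline index.
  static PointerOrIndexSketch fromPointer(void *pointer) {
    return PointerOrIndexSketch(reinterpret_cast<std::uintptr_t>(pointer));
  }

  bool isIndex() const { return bits < kMaxInlineIndex; }

  std::uintptr_t getIndex() const {
    assert(isIndex());
    return bits;
  }

  void *getPointer() const {
    assert(!isIndex());
    return reinterpret_cast<void *>(bits);
  }
};
```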
I made unit tests for all of the operations so it is pretty well tested
out.
I am going to use this in a subsequent commit to compress projection in
the common case (the inline case) down to 1/3 of its size. The reason
why the inline case is common is that in most cases where projection is
used it will be targeting relative offsets in an array which are not
likely to be greater than a page. The mallocing of memory just enables
us to degrade gracefully.
a ternary tree with a fixed-length per-node inline key buffer.
I plan to use this for metadata path caches, where it's useful to
be able to quickly find the most-derived point along a path that
you've already cached, but it should be useful for other things
in the compiler as well, like function-with-argument-label
lookups and possibly code completion.
This is quite a bit more space-efficient (and somewhat faster)
than doing scans after a lower_bound on a std::map<std::string, T>.
I haven't implemented balancing yet, and I don't need delete at
all for metadata paths, so I don't plan to work on that.
Swift SVN r32453
sequence which can be read with a forward iterator.
This will be useful for storing access paths to metadata or
protocol conformance values, which are typically very short.
Now with a fix to directly include <climits> for CHAR_BIT.
This was being transitively included on Darwin, but that's
not portable.
Swift SVN r29485