Compact ImageMap Format
=======================

A process's address space contains (among other things) the set of
dynamically loaded images that have been mapped into that address
space.  When generating crash logs or symbolicating backtraces, we
need to be able to capture and potentially store the list of images
that have been loaded, as well as some of the attributes of those
images, including each image's:

- Path
- Build ID (aka UUID)
- Base address
- End-of-text address

Compact ImageMap Format (CIF) is a binary format for holding this
information.

### General Format

Compact ImageMap Format data is byte aligned and starts with an
information byte:

~~~
  7   6   5   4   3   2   1   0
┌───────────────────────┬───────┐
│ version               │ size  │
└───────────────────────┴───────┘
~~~

The `version` field identifies the version of CIF that is in use; this
document describes version `0`.  The `size` field is encoded as
follows:

| `size` | Machine word size |
| :----: | :---------------- |
| 00     | 16-bit            |
| 01     | 32-bit            |
| 10     | 64-bit            |
| 11     | Reserved          |

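As a minimal sketch of the layout above, the information byte can be split
with a shift and a mask.  The function name `parse_info_byte` and the
decision to raise an error for the reserved `size` value are assumptions
made for illustration; they are not part of the format.

```python
# Map the two-bit `size` field to a machine word size in bits.
# The value 0b11 is reserved, so it is deliberately absent.
WORD_SIZES = {0b00: 16, 0b01: 32, 0b10: 64}


def parse_info_byte(byte):
    """Return (version, word size in bits) for a CIF information byte."""
    version = byte >> 2   # bits 7..2
    size = byte & 0b11    # bits 1..0
    if size not in WORD_SIZES:
        # Hypothetical policy: reject the reserved encoding outright.
        raise ValueError("reserved word size encoding")
    return version, WORD_SIZES[size]
```

For instance, an information byte of `0x02` would describe version `0`
data from a 64-bit platform.
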
This is followed immediately by a field containing the name of the platform
that generated this image map.  This field consists of a single-byte length
followed by a UTF-8 string of that length.

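Reading the platform name is a matter of taking one length byte and then
that many bytes of UTF-8.  The sketch below assumes the whole image map is
available as a `bytes` object; `read_platform` and the sample platform
string are hypothetical, chosen only to make the example concrete.

```python
def read_platform(data, offset):
    """Read the length-prefixed platform name starting at `offset`.

    Returns the decoded name and the offset of the byte that follows it.
    """
    length = data[offset]        # single-byte length
    start = offset + 1
    end = start + length
    if end > len(data):
        raise ValueError("truncated platform name")
    return data[start:end].decode("utf-8"), end


# Information byte 0x02, then a 5-byte name (an illustrative value only).
sample = bytes([0x02, 0x05]) + b"macOS"
name, next_offset = read_platform(sample, 1)
```

Because the length occupies a single byte, a name stored this way cannot
exceed 255 bytes of UTF-8.
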
After that is a field encoding the number of images in the image map;
this field is encoded as a sequence of bytes, each holding seven bits
of data, with the top bit clear for the final byte.  The most
significant byte comes first.  For example:

| `count` | Encoding    |
| ------: | :---------- |
|       0 | 00          |
|       1 | 01          |
|     127 | 7f          |
|     128 | 81 00       |
|     129 | 81 01       |
|     700 | 85 3c       |
|    1234 | 89 52       |
|   16384 | 81 80 00    |
|   65535 | 83 ff 7f    |
| 2097152 | 81 80 80 00 |

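The 7-bit scheme above can be sketched as a pair of helpers.  The names
`encode_count` and `decode_count` are assumptions for illustration, and the
sketch takes the top bit to be set on every byte except the last, which is
what the table implies.

```python
def encode_count(value):
    """Encode a non-negative integer, most significant 7-bit group first."""
    if value < 0:
        raise ValueError("count must be non-negative")
    groups = [value & 0x7F]              # final byte: top bit clear
    value >>= 7
    while value:
        groups.append((value & 0x7F) | 0x80)   # earlier bytes: top bit set
        value >>= 7
    return bytes(reversed(groups))


def decode_count(data, offset=0):
    """Decode a count at `offset`; return (value, offset after the count)."""
    value = 0
    while True:
        byte = data[offset]
        offset += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:              # top bit clear marks the last byte
            return value, offset
```

With these definitions, `encode_count(700)` produces the bytes `85 3c` and
`decode_count(bytes.fromhex("8952"))` recovers `1234`, matching the table.
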
This in turn is followed by the list of images, stored in order of
increasing base address.  For each image, we start with a header byte:

~~~
  7   6   5   4   3   2   1   0
┌───┬───┬───────────┬───────────┐
│ r │ 0 │ acount    │ ecount    │
└───┴───┴───────────┴───────────┘
~~~

If `r` is set, then the base address is understood to be relative to
the previously computed base address.

This byte is followed by `acount + 1` bytes of base address, then
`ecount + 1` bytes of offset to the end of text.

Following this is an encoded count of bytes in the build ID,
encoded using the 7-bit scheme we used to encode the image count, and
then after that come the build ID bytes themselves.

Finally, we encode the path string using the scheme below.

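To illustrate the per-image header byte, the sketch below unpacks the `r`
flag and converts `acount` and `ecount` into the number of bytes that
follow.  The name `parse_image_header` is hypothetical, and rejecting a
header whose bit 6 is set is an assumption; the format only states that the
bit is `0`.

```python
def parse_image_header(byte):
    """Return (relative, base address bytes, end-of-text offset bytes)."""
    relative = bool(byte & 0x80)         # bit 7: `r`
    if byte & 0x40:
        # Assumed policy: bit 6 is documented as 0, so treat 1 as an error.
        raise ValueError("bit 6 of the image header must be zero")
    acount = (byte >> 3) & 0x07          # bits 5..3
    ecount = byte & 0x07                 # bits 2..0
    return relative, acount + 1, ecount + 1
```

Storing `count - 1` in each three-bit field lets a header describe between
one and eight bytes for each value, which is enough for a 64-bit address.
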
### String Encoding

Image paths contain a good deal of redundancy; paths are therefore
encoded using a prefix compression scheme.  The basic idea here is
that while generating or reading the data, we maintain a mapping from
small integers to path prefix segments.

The mapping is initialised with the following fixed list, which never
needs to be stored in CIF data:

| code | Path prefix                         |
| :--: | :---------------------------------- |
|   0  | `/lib`                              |
|   1  | `/usr/lib`                          |
|   2  | `/usr/local/lib`                    |
|   3  | `/opt/lib`                          |
|   4  | `/System/Library/Frameworks`        |
|   5  | `/System/Library/PrivateFrameworks` |
|   6  | `/System/iOSSupport`                |
|   7  | `/Library/Frameworks`               |
|   8  | `/System/Applications`              |
|   9  | `/Applications`                     |
|  10  | `C:\Windows\System32`               |
|  11  | `C:\Program Files`                  |

Codes below 32 are reserved for future expansion of the fixed list.

Strings are encoded as a sequence of bytes, as follows:

| `opcode`   | Mnemonic  | Meaning                                   |
| :--------: | :-------- | :---------------------------------------- |
| `00000000` | `end`     | Marks the end of the string               |
| `00xxxxxx` | `str`     | Raw string data                           |
| `01xxxxxx` | `framewk` | Names a framework                         |
| `1exxxxxx` | `expand`  | Identifies a prefix in the table          |

#### `end`

##### Encoding

~~~
  7   6   5   4   3   2   1   0
┌───────────────────────────────┐
│ 0   0   0   0   0   0   0   0 │  end
└───────────────────────────────┘
~~~

##### Meaning

Marks the end of the string.

#### `str`

##### Encoding

~~~
  7   6   5   4   3   2   1   0
┌───────┬───────────────────────┐
│ 0   0 │ count                 │  str
└───────┴───────────────────────┘
~~~

##### Meaning

The next `count` bytes are included in the string verbatim.
Additionally, all path prefixes of this string data will be added to
the current prefix table.  For instance, if the string data is
`/swift/linux/x86_64/libfoo.so`, then the prefix `/swift` will be
assigned the next available code, `/swift/linux` the code after that,
and `/swift/linux/x86_64` the code following that one.

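The prefix registration step can be sketched as follows.  This is an
illustration under stated assumptions rather than a reference
implementation: it only splits on `/`, it keeps the dynamically assigned
prefixes in a separate dictionary whose first code is 32 (the first code
not reserved for the fixed list), and it makes no attempt to skip prefixes
that are already present.  The names `path_prefixes` and
`register_prefixes` are hypothetical.

```python
FIRST_DYNAMIC_CODE = 32   # codes below 32 are reserved for the fixed list


def path_prefixes(text):
    """Return every directory prefix of `text`, shortest first."""
    prefixes = []
    pos = text.find("/", 1)          # skip the leading separator
    while pos != -1:
        prefixes.append(text[:pos])
        pos = text.find("/", pos + 1)
    return prefixes


def register_prefixes(dynamic, text):
    """Assign the next available codes to the prefixes of `text`.

    `dynamic` maps code -> prefix and holds only dynamically added entries.
    """
    for prefix in path_prefixes(text):
        dynamic[FIRST_DYNAMIC_CODE + len(dynamic)] = prefix


dynamic = {}
register_prefixes(dynamic, "/swift/linux/x86_64/libfoo.so")
```

Starting from an empty table, this assigns code 32 to `/swift`, 33 to
`/swift/linux` and 34 to `/swift/linux/x86_64`.  String data with a single
component, such as `/libz.1.dylib`, adds nothing.
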
#### `framewk`

##### Encoding

~~~
  7   6   5   4   3   2   1   0
┌───────┬───────────────────────┐
│ 0   1 │ count                 │  framewk
└───────┴───────────────────────┘
~~~

##### Meaning

The next byte is a version character (normally `A`, but some
frameworks use higher characters), after which there are `count + 1`
bytes of name.

This is expanded using the pattern
`/<name>.framework/Versions/<version>/<name>`.  This also marks the
end of the string.

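As a sketch of the expansion described above, the helper below reads one
`framewk` opcode and produces the expanded path fragment.  The function
names are hypothetical, and decoding the name bytes as UTF-8 is an
assumption; the text only says they are bytes of name.

```python
def expand_framework(version, name):
    """Apply the pattern /<name>.framework/Versions/<version>/<name>."""
    return f"/{name}.framework/Versions/{version}/{name}"


def decode_framewk(data, offset):
    """Decode a `framewk` opcode at `offset`.

    Returns the expanded fragment and the offset after the name bytes.
    """
    opcode = data[offset]
    if opcode >> 6 != 0b01:
        raise ValueError("not a framewk opcode")
    count = opcode & 0x3F                # low six bits
    version = chr(data[offset + 1])      # single version character
    start = offset + 2
    end = start + count + 1              # the name is count + 1 bytes long
    if end > len(data):
        raise ValueError("truncated framework name")
    return expand_framework(version, data[start:end].decode("utf-8")), end
```

For example, the bytes `45` followed by `CAppKit` decode to
`/AppKit.framework/Versions/C/AppKit`, since `0x45` carries a `count` of 5
and the name `AppKit` is six bytes long.
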
#### `expand`

##### Encoding

~~~
  7   6   5   4   3   2   1   0
┌───┬───┬───────────────────────┐
│ 1 │ e │ code                  │  expand
└───┴───┴───────────────────────┘
~~~

##### Meaning

If `e` is `0`, `code` is the index into the prefix table for the
prefix that should be appended to the string at this point.

If `e` is `1`, this opcode is followed by `code + 1` bytes that give
a value `v` such that `v + 64` is the index into the prefix table for
the prefix that should be appended to the string at this point.

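The two cases can be sketched as below.  One detail is not stated in this
document: the byte order of the multi-byte value `v`.  The sketch assumes
most significant byte first, by analogy with the count encoding, and that
assumption should be checked against a real implementation before relying
on it.  The name `decode_expand` is hypothetical.

```python
def decode_expand(data, offset):
    """Decode an `expand` opcode at `offset`.

    Returns (prefix table index, offset after the opcode and any extra bytes).
    """
    opcode = data[offset]
    if not opcode & 0x80:
        raise ValueError("not an expand opcode")
    code = opcode & 0x3F                 # low six bits
    offset += 1
    if not opcode & 0x40:                # e == 0: the index is `code` itself
        return code, offset
    length = code + 1                    # e == 1: `code + 1` bytes of `v`
    raw = data[offset:offset + length]
    if len(raw) != length:
        raise ValueError("truncated expand operand")
    # ASSUMPTION: `v` is stored most significant byte first.
    v = int.from_bytes(raw, "big")
    return v + 64, offset + length
```

Under this reading, the single byte `84` selects index 4, while `c0 00`
selects index 64, the first index that no longer fits in the short form.
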
#### Example

Let's say we wish to encode the following strings:

    /System/Library/Frameworks/AppKit.framework/Versions/C/AppKit
    /System/Library/Frameworks/Photos.framework/Versions/A/Photos
    /usr/lib/libobjc.A.dylib
    /usr/lib/libz.1.dylib
    /usr/lib/swift/libswiftCore.dylib
    /usr/lib/libSystem.B.dylib
    /usr/lib/libc++.1.dylib

We would encode

    <84> <45> CAppKit <00>

We then follow with

    <84> <45> APhotos <00>

Next we have

    <81> <10> /libobjc.A.dylib <00>
    <81> <0d> /libz.1.dylib <00>
    <81> <19> /swift/libswiftCore.dylib <00>

assigning code 32 to `/swift`, then

    <81> <12> /libSystem.B.dylib <00>
    <81> <0f> /libc++.1.dylib <00>

Stored as plain strings with one terminator byte each, the original
data would have taken up 256 bytes.  Instead, we have used 122 bytes,
a saving of over 50%.

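The byte counts quoted above can be checked with a few lines of arithmetic.
The sketch below simply rebuilds the encoded byte sequences shown in the
example and compares their total length with the length of the original
strings, counting one terminator byte per string; the helper name `chunk`
is an assumption for illustration.

```python
paths = [
    "/System/Library/Frameworks/AppKit.framework/Versions/C/AppKit",
    "/System/Library/Frameworks/Photos.framework/Versions/A/Photos",
    "/usr/lib/libobjc.A.dylib",
    "/usr/lib/libz.1.dylib",
    "/usr/lib/swift/libswiftCore.dylib",
    "/usr/lib/libSystem.B.dylib",
    "/usr/lib/libc++.1.dylib",
]


def chunk(first, second, payload):
    """Build one encoded string: two opcode bytes, payload, then `end`."""
    return bytes([first, second]) + payload + b"\x00"


encoded = [
    chunk(0x84, 0x45, b"CAppKit"),
    chunk(0x84, 0x45, b"APhotos"),
    chunk(0x81, 0x10, b"/libobjc.A.dylib"),
    chunk(0x81, 0x0D, b"/libz.1.dylib"),
    chunk(0x81, 0x19, b"/swift/libswiftCore.dylib"),
    chunk(0x81, 0x12, b"/libSystem.B.dylib"),
    chunk(0x81, 0x0F, b"/libc++.1.dylib"),
]

original_size = sum(len(p) + 1 for p in paths)   # +1 for each terminator
encoded_size = sum(len(c) for c in encoded)
saving = 1 - encoded_size / original_size

print(original_size, encoded_size, f"{saving:.1%}")
```

This prints `256 122 52.3%`, which agrees with the figures given in the
example.
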