Docs on how to make backwards-compatible changes using LLVM bitstream

2025-12-14 20:36:38 +01:00 · 2018-10-05 09:56:29 -07:00
parent b20e8e9cfa
commit 76e5488aec
2 changed files with 78 additions and 14 deletions
--- a/docs/StableBitcode.md
+++ b/docs/StableBitcode.md
@@ -0,0 +1,46 @@
+# Making Backwards-Compatible Changes in the LLVM Bitstream Format
+
+Swift uses the [LLVM bitstream][] format for some of its serialization logic. This format was invented as a container for LLVM IR. It is a binary format supporting two basic structures: *blocks,* which define regions of the file, and *records,* which contain data fields that can be up to 64 bits. It has a few nice properties that make it a useful container format for us as well:
+
+- It is easy to skip over an entire block, because the block's length is recorded at its start.
+
+- It is possible to jump to specific offsets *within* a block without having to reparse from the start of the block.
+
+- A format change doesn't immediately invalidate existing bitstream files, because the stream includes layout information for each record.
+
+However, it has some disadvantages as well:
+
+- Each record can only contain one variable-sized entry (either an array or a "blob" of bytes).
+
+- Higher-level features like cross-references or lookup by key have to be built on top of the format, usually in a way that the existing tooling doesn't understand.
+
+You can view the contents of any LLVM bitstream using the `llvm-bcanalyzer` tool's `-dump` option.
+
+[LLVM bitstream]: http://llvm.org/docs/BitCodeFormat.html
+
+
+## Backwards-compatibility
+
+For a format change to be backwards-compatible, we need the v5 tools to be able to read a file generated by the v6 tools. At a high level, this means that whatever data is introduced in v6 doesn't interfere with what v5 is looking for.
+
+(We also care about *forwards*-compatibility, which means that the v6 tools are able to read a file generated by the v5 tools. This is usually easier to maintain, because the v5 format is already known.)
+
+In practice, there are a few ways to accomplish this with LLVM bitstreams:
+
+- If the deserialization logic is set to skip over any blocks it doesn't understand, a new format can always add new blocks.
+
+- If the deserialization logic is set to skip over any *records* it doesn't understand, a new format can always add new *records.* Be careful, though, of records that are expected to appear immediately after another record---if you put a new record between them, you may break the expectations of older compilers.
+
+- If the deserialization logic always looks for a possible blob entry in records (i.e. passing a StringRef out-parameter to BitstreamCursor's `readRecord`), a new format can add blob data to an existing record that does not have it.
+
+- If the deserialization logic always checks for a minimum number of fields in a record before extracting those fields, or if the only field in a record is blob data, a new format can add new fields to an existing record, as long as they come after any existing non-blob fields.
+
+    Note that the BCRecordLayout DSL expects the number of fields to **match exactly**. If you want to use BCRecordLayout's `readRecord` method, the deserialization logic will have to check that the deserialized data has the correct number of fields ahead of time. If it has more fields, you can make an ArrayRef that slices off the extra ones; if it has fewer, you're reading from an old format and will need to use a different BCRecordLayout, or just read them manually.
+
+    (We could also add more API to BCRecordLayout to make this easier. It's part of LLVM, but it's a part of LLVM originally contributed by Swift folks.)
+
+    Note also that it's still okay to use BCRecordLayout for *serialization.* It's only deserialization where we have to be careful about multiple formats.
+
+Remember that any new data will be *ignored* by the old tools. If it's something that *should* affect how old tools read the file, it must be encoded in an existing field; if that's impossible, you have a backwards-incompatible change and should bump the major version number of the file.
+
+If the existing deserialization logic is already checking for the exact size of a record (and therefore preventing new fields from being added), one trick is to put a second record after the first, and check for its presence in the new version of the tools. As long as the old logic is set up to skip unknown records, this shouldn't cause any problems.
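The reader-side rules described in docs/StableBitcode.md can be sketched without any LLVM dependency. The following is only an illustration under stated assumptions: `Record`, `DeclInfo`, `DECL_INFO`, `DECL_INFO_EXTRA`, `readDeclInfo`, and `readBlock` are hypothetical names, and a plain `std::vector` stands in for the fields that real code would get from `BitstreamCursor`. It shows a reader that checks for a minimum field count, ignores trailing fields, skips unknown records, and picks up an optional companion record that immediately follows an existing one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one decoded record: a record code plus its scalar
// fields. Real code would obtain these from llvm::BitstreamCursor.
struct Record {
  unsigned code;
  std::vector<uint64_t> fields;
};

// Hypothetical record codes, for illustration only.
enum : unsigned {
  DECL_INFO = 1,       // present since "v5", originally two fields
  DECL_INFO_EXTRA = 2, // companion record introduced in "v6"
};

struct DeclInfo {
  uint64_t nameID = 0;
  uint64_t flags = 0;
  bool hasExtra = false; // true only if a DECL_INFO_EXTRA record followed
  uint64_t extra = 0;
};

// Number of fields DECL_INFO had when it was introduced.
constexpr std::size_t DeclInfoMinFields = 2;

// Decodes DECL_INFO by checking for a *minimum* field count, so a newer
// writer can append fields without breaking this reader.
bool readDeclInfo(const Record &record, DeclInfo &out) {
  assert(record.code == DECL_INFO && "caller must dispatch on the record code");
  if (record.fields.size() < DeclInfoMinFields)
    return false; // malformed, or older than anything this reader supports
  out.nameID = record.fields[0];
  out.flags = record.fields[1];
  // Fields past DeclInfoMinFields are deliberately ignored.
  return true;
}

// Walks the records of one block in order.
std::vector<DeclInfo> readBlock(const std::vector<Record> &records) {
  std::vector<DeclInfo> result;
  bool previousWasDeclInfo = false;
  for (const Record &record : records) {
    bool isDeclInfo = false;
    switch (record.code) {
    case DECL_INFO: {
      DeclInfo info;
      if (readDeclInfo(record, info)) {
        result.push_back(info);
        isDeclInfo = true;
      }
      break;
    }
    case DECL_INFO_EXTRA:
      // Only meaningful immediately after a successfully read DECL_INFO.
      if (previousWasDeclInfo && !record.fields.empty()) {
        result.back().hasExtra = true;
        result.back().extra = record.fields[0];
      }
      break;
    default:
      break; // unknown record from a newer format: skip it
    }
    previousWasDeclInfo = isDeclInfo;
  }
  return result;
}
```

A reader written before `DECL_INFO_EXTRA` existed would reach the `default` case for it and move on, which is why the companion-record trick depends on older readers skipping unknown records.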
--- a/lib/Serialization/DocFormat.h
+++ b/lib/Serialization/DocFormat.h
@@ -35,29 +35,47 @@ const unsigned char SWIFTDOC_SIGNATURE[] = { 0xE2, 0x9C, 0xA8, 0x07 };

 /// Serialized swiftdoc format major version number.
 ///
-/// Increment this value when making a backwards-incompatible change, which
-/// should be rare. When incrementing this value, reset SWIFTDOC_VERSION_MINOR
-/// to 0.
+/// Increment this value when making a backwards-incompatible change, i.e. where
+/// an \e old compiler will \e not be able to read the new format. This should
+/// be rare. When incrementing this value, reset SWIFTDOC_VERSION_MINOR to 0.
+///
+/// See docs/StableBitcode.md for information on how to make
+/// backwards-compatible changes using the LLVM bitcode format.
 const uint16_t SWIFTDOC_VERSION_MAJOR = 1;

 /// Serialized swiftdoc format minor version number.
 ///
-/// Increment this value when making a backwards-compatible change that might
-/// be interesting to test for. However, if old swiftdoc files are fully
-/// compatible with the new change, you do not need to increment this.
+/// Increment this value when making a backwards-compatible change that might be
+/// interesting to test for. A backwards-compatible change is one where an \e
+/// old compiler can read the new format without any problems (usually by
+/// ignoring new information).
 ///
-/// To ensure that two separate changes don't silently get merged into one
-/// in source control, you should also update the comment to briefly
-/// describe what change you made. The content of this comment isn't important;
-/// it just ensures a conflict if two people change the module format.
-/// Don't worry about adhering to the 80-column limit for this line.
+/// If the \e new compiler can treat the new and old format identically, or if
+/// the presence of a new record, block, or field is sufficient to indicate that
+/// the swiftdoc file is using a new format, it is okay not to increment this
+/// value. However, it may be interesting for a new compiler to treat the \e
+/// absence of information differently for the old and new formats; in this
+/// case, the difference in minor version number can distinguish the two.
+///
+/// The minor version number does not need to be changed simply to track which
+/// compiler generated a swiftdoc file; the full compiler version is already
+/// stored as text and can be checked by running the \c strings command-line
+/// tool on a swiftdoc file.
+///
+/// To ensure that two separate changes don't silently get merged into one in
+/// source control, you should also update the comment to briefly describe what
+/// change you made. The content of this comment isn't important; it just
+/// ensures a conflict if two people change the module format. Don't worry about
+/// adhering to the 80-column limit for this line.
 const uint16_t SWIFTDOC_VERSION_MINOR = 1; // Last change: skipping 0 for testing purposes

 /// The record types within the comment block.
 ///
-/// Be very careful when changing this block; it must remain stable. Adding new
-/// records is okay---they will be ignored---but modifying existing ones must be
-/// done carefully. You may need to update the version when you do so.
+/// Be very careful when changing this block; it must remain
+/// backwards-compatible. Adding new records is okay---they will be ignored---
+/// but modifying existing ones must be done carefully. You may need to update
+/// the version when you do so. See docs/StableBitcode.md for information on how
+/// to make backwards-compatible changes using the LLVM bitcode format.
 ///
 /// \sa COMMENT_BLOCK_ID
 namespace comment_block {