Merge branch 'bc/sha1-256-interop-01'

The beginning of SHA1-SHA256 interoperability work.

* bc/sha1-256-interop-01:
  t1010: use BROKEN_OBJECTS prerequisite
  t: allow specifying compatibility hash
  fsck: consider gpgsig headers expected in tags
  rev-parse: allow printing compatibility hash
  docs: add documentation for loose objects
  docs: improve ambiguous areas of pack format documentation
  docs: reflect actual double signature for tags
  docs: update offset order for pack index v3
  docs: update pack index v3 format
Junio C Hamano
2025-10-22 11:38:58 -07:00
15 changed files with 255 additions and 32 deletions


@@ -34,6 +34,7 @@ MAN5_TXT += gitformat-bundle.adoc
MAN5_TXT += gitformat-chunk.adoc
MAN5_TXT += gitformat-commit-graph.adoc
MAN5_TXT += gitformat-index.adoc
MAN5_TXT += gitformat-loose.adoc
MAN5_TXT += gitformat-pack.adoc
MAN5_TXT += gitformat-signature.adoc
MAN5_TXT += githooks.adoc


@@ -10,6 +10,12 @@
`badFilemode`::
(INFO) A tree contains a bad filemode entry.
`badGpgsig`::
(ERROR) A tag contains a bad (truncated) signature (e.g., `gpgsig`) header.
`badHeaderContinuation`::
(ERROR) A continuation header (such as for `gpgsig`) is unexpectedly truncated.
`badName`::
(ERROR) An author/committer name is empty.


@@ -324,11 +324,12 @@ The following options are unaffected by `--path-format`:
path of the current directory relative to the top-level
directory.
--show-object-format[=(storage|input|output)]::
Show the object format (hash algorithm) used for the repository
for storage inside the `.git` directory, input, or output. For
input, multiple algorithms may be printed, space-separated.
If not specified, the default is "storage".
--show-object-format[=(storage|input|output|compat)]::
Show the object format (hash algorithm) used for the repository for storage
inside the `.git` directory, input, output, or compatibility. For input,
multiple algorithms may be printed, space-separated. If `compat` is
requested and no compatibility algorithm is enabled, prints an empty line. If
not specified, the default is "storage".
--show-ref-format::
Show the reference storage format used for the repository.


@@ -0,0 +1,53 @@
gitformat-loose(5)
==================
NAME
----
gitformat-loose - Git loose object format
SYNOPSIS
--------
[verse]
$GIT_DIR/objects/[0-9a-f][0-9a-f]/*
DESCRIPTION
-----------
Loose objects are how Git stores individual objects, where every object is
written as a separate file.
Over the lifetime of a repository, objects are usually written as loose objects
initially. Eventually, these loose objects will be compacted into packfiles
via repository maintenance to improve disk space usage and speed up the lookup
of these objects.
== Loose objects
Each loose object contains a prefix, followed immediately by the data of the
object. The prefix contains `<type> <size>\0`. `<type>` is one of `blob`,
`tree`, `commit`, or `tag`, and `<size>` is the size of the data (without the
prefix) as a decimal integer expressed in ASCII.
The entire contents, prefix and data concatenated, is then compressed with zlib
and the compressed data is stored in the file. The object ID of the object is
the SHA-1 or SHA-256 (as appropriate) hash of the uncompressed data.
The file for the loose object is stored under the `objects` directory, with the
first two hex characters of the object ID being the directory and the remaining
characters being the file name. This is done to shard the data and avoid too
many files being in one directory, since some file systems perform poorly with
many items in a directory.
As an example, the empty tree contains the data (when uncompressed) `tree 0\0`
and, in a SHA-256 repository, would have the object ID
`6ef19b41225c5369f1c104d45d8d85efa9b057b53b14b4b9b939dd74decc5321` and would be
stored under
`$GIT_DIR/objects/6e/f19b41225c5369f1c104d45d8d85efa9b057b53b14b4b9b939dd74decc5321`.
Similarly, a blob containing the contents `abc` would have the uncompressed
data of `blob 3\0abc`.
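
The steps above can be sketched in Python. This is only an illustration of the described format, not Git's implementation; the helper name `loose_object` and its return shape are assumptions made for this example.

```python
import hashlib
import zlib


def loose_object(obj_type: str, data: bytes, algo: str = "sha256"):
    """Return (object_id, relative_path, compressed_bytes) for a loose object.

    Hypothetical helper for illustration; not part of Git.
    """
    # The prefix is "<type> <size>\0", with the size as ASCII decimal.
    prefix = f"{obj_type} {len(data)}".encode("ascii") + b"\0"
    raw = prefix + data
    # The object ID is the hash of the uncompressed prefix plus data.
    object_id = hashlib.new(algo, raw).hexdigest()
    # The first two hex characters name the directory, the rest the file.
    path = f"objects/{object_id[:2]}/{object_id[2:]}"
    # The file contents are the zlib-compressed prefix plus data.
    return object_id, path, zlib.compress(raw)


empty_tree_id, empty_tree_path, compressed = loose_object("tree", b"")
print(empty_tree_id)
print(empty_tree_path)
```

For the empty tree with SHA-256, this prints the object ID quoted above and a path relative to `$GIT_DIR`.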
GIT
---
Part of the linkgit:git[1] suite


@@ -32,6 +32,10 @@ In a repository using the traditional SHA-1, pack checksums, index checksums,
and object IDs (object names) mentioned below are all computed using SHA-1.
Similarly, in SHA-256 repositories, these values are computed using SHA-256.
CRC32 checksums are always computed over the entire packed object, including
the header (n-byte type and length); the base object name or offset, if any;
and the entire compressed object. The CRC32 algorithm used is that of zlib.
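
As a rough sketch of the CRC32 coverage described above, the following Python uses zlib's CRC32 and feeds it the three parts in order. The function name `packed_entry_crc32` and the way the entry is split into three byte strings are assumptions for illustration only.

```python
import zlib


def packed_entry_crc32(header: bytes, base_ref: bytes, compressed: bytes) -> int:
    """CRC32 over the header, the optional base name or offset, and the
    compressed data of one packed object (hypothetical helper)."""
    crc = zlib.crc32(header)
    # base_ref is empty for undeltified objects.
    crc = zlib.crc32(base_ref, crc)
    crc = zlib.crc32(compressed, crc)
    return crc & 0xFFFFFFFF


# Example: an undeltified blob "abc" whose header is the single byte 0x33
# (type 3, size 3).
header = bytes([0x33])
body = zlib.compress(b"abc")
print(hex(packed_entry_crc32(header, b"", body)))
```

Feeding the parts incrementally gives the same result as one CRC32 over their concatenation.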
== pack-*.pack files have the following format:
- A header appears at the beginning and consists of the following:
@@ -80,6 +84,16 @@ Valid object types are:
Type 5 is reserved for future expansion. Type 0 is invalid.
=== Object encoding
Unlike loose objects, packed objects do not have a prefix containing the type,
size, and a NUL byte. These are not necessary because they can be determined by
the n-byte type and length that prefixes the data and so they are omitted from
the compressed and deltified data.
The computation of the object ID still uses this prefix by reconstructing it
from the type and length as needed.
=== Size encoding
This document uses the following "size encoding" of non-negative
@@ -92,6 +106,11 @@ values are more significant.
This size encoding should not be confused with the "offset encoding",
which is also used in this document.
When encoding the size of an undeltified object in a pack, the size is that of
the uncompressed raw object. For deltified objects, it is the size of the
uncompressed delta. The base object name or offset is not included in the size
computation.
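
A minimal Python sketch of how the n-byte type and length could be encoded, decoded, and turned back into the loose-object prefix for an undeltified object. The function names are hypothetical, and the type numbers (1 commit, 2 tree, 3 blob, 4 tag) are taken from the pack format's list of valid object types rather than from this hunk.

```python
TYPE_NAMES = {1: "commit", 2: "tree", 3: "blob", 4: "tag"}


def encode_type_and_size(obj_type: int, size: int) -> bytes:
    """Encode the type (3 bits) and size into the n-byte pack entry header."""
    first = (obj_type << 4) | (size & 0x0F)  # low 4 bits of size
    size >>= 4
    out = bytearray()
    while size:
        out.append(first | 0x80)  # MSB set: more size bytes follow
        first = size & 0x7F       # next 7 bits, least significant first
        size >>= 7
    out.append(first)
    return bytes(out)


def decode_type_and_size(buf: bytes):
    """Return (type, size, bytes_consumed) for a pack entry header."""
    byte = buf[0]
    obj_type = (byte >> 4) & 0x07
    size = byte & 0x0F
    shift, pos = 4, 1
    while byte & 0x80:
        byte = buf[pos]
        size |= (byte & 0x7F) << shift
        shift += 7
        pos += 1
    return obj_type, size, pos


def reconstruct_prefix(obj_type: int, size: int) -> bytes:
    """Rebuild "<type> <size>\\0" for hashing an undeltified object."""
    return f"{TYPE_NAMES[obj_type]} {size}".encode("ascii") + b"\0"


header = encode_type_and_size(3, 100)
print(header.hex(), decode_type_and_size(header))
```

For deltified objects the encoded size is that of the delta, so the prefix can only be rebuilt after the delta has been applied and the final type and size are known.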
=== Deltified representation
Conceptually there are only four object types: commit, tree, tag and


@@ -173,6 +173,7 @@ manpages = {
'gitformat-chunk.adoc' : 5,
'gitformat-commit-graph.adoc' : 5,
'gitformat-index.adoc' : 5,
'gitformat-loose.adoc' : 5,
'gitformat-pack.adoc' : 5,
'gitformat-signature.adoc' : 5,
'githooks.adoc' : 5,


@@ -227,9 +227,9 @@ network byte order):
** 4-byte length in bytes of shortened object names. This is the
shortest possible length needed to make names in the shortened
object name table unambiguous.
** 4-byte integer, recording where tables relating to this format
** 8-byte integer, recording where tables relating to this format
are stored in this index file, as an offset from the beginning.
* 4-byte offset to the trailer from the beginning of this file.
* 8-byte offset to the trailer from the beginning of this file.
* Zero or more additional key/value pairs (4-byte key, 4-byte
value). Only one key is supported: 'PSRC'. See the "Loose objects
and unreachable objects" section for supported values and how this
@@ -260,12 +260,10 @@ network byte order):
compressed data to be copied directly from pack to pack during
repacking without undetected data corruption.
* A table of 4-byte offset values. For an object in the table of
sorted shortened object names, the value at the corresponding
index in this table indicates where that object can be found in
the pack file. These are usually 31-bit pack file offsets, but
large offsets are encoded as an index into the next table with the
most significant bit set.
* A table of 4-byte offset values, indexed in pack order. Each entry
indicates where the corresponding object can be found in the pack file.
These are usually 31-bit pack file offsets, but large offsets are encoded
as an index into the next table with the most significant bit set.
* A table of 8-byte offset entries (empty for pack files less than
2 GiB). Pack files are organized with heavily used objects toward
@@ -276,10 +274,14 @@ network byte order):
up to and not including the table of CRC32 values.
- Zero or more NUL bytes.
- The trailer consists of the following:
* A copy of the 20-byte SHA-256 checksum at the end of the
* A copy of the full main hash checksum at the end of the
corresponding packfile.
* 20-byte SHA-256 checksum of all of the above.
* Full main hash checksum of all of the above.
The "full main hash" is a full-length hash of the main (not compatibility)
algorithm in the repository. Thus, if the main algorithm is SHA-256, this is
a 32-byte SHA-256 hash and for SHA-1, it's a 20-byte SHA-1 hash.
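
The two offset tables described earlier in this list (4-byte entries, with large offsets redirected into the 8-byte table) could be resolved roughly as follows. This is a sketch under the stated layout, in network byte order; the function name `resolve_offsets` and the idea of passing each table as a separate byte string are assumptions for illustration.

```python
import struct


def resolve_offsets(small_table: bytes, large_table: bytes) -> list:
    """Return pack file offsets from the 4-byte and 8-byte offset tables."""
    offsets = []
    for (value,) in struct.iter_unpack(">I", small_table):
        if value & 0x80000000:
            # MSB set: the low 31 bits index into the 8-byte table.
            index = value & 0x7FFFFFFF
            (value,) = struct.unpack_from(">Q", large_table, index * 8)
        offsets.append(value)
    return offsets


small = struct.pack(">II", 12, 0x80000000)
large = struct.pack(">Q", 1 << 32)
print(resolve_offsets(small, large))
```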
Loose object index
~~~~~~~~~~~~~~~~~~
@@ -427,17 +429,19 @@ ordinary unsigned commit.
Signed Tags
~~~~~~~~~~~
We add a new field "gpgsig-sha256" to the tag object format to allow
signing tags without relying on SHA-1. Its signed payload is the
SHA-256 content of the tag with its gpgsig-sha256 field and "-----BEGIN PGP
SIGNATURE-----" delimited in-body signature removed.
We add new fields "gpgsig" and "gpgsig-sha256" to the tag object format to
allow signing tags in both formats. The in-body signature is used for the
signature in the current hash algorithm and the header is used for the
signature in the other algorithm. Thus, a dual-signature tag will contain both
an in-body signature and a gpgsig-sha256 header for the SHA-1 format of an
object or both an in-body signature and a gpgsig header for the SHA-256 format
of an object.
This means tags can be signed
The signed payload of the tag is the content of the tag in the current
algorithm with both its gpgsig and gpgsig-sha256 fields and
"-----BEGIN PGP SIGNATURE-----" delimited in-body signature removed.
1. using SHA-1 only, as in existing signed tag objects
2. using both SHA-1 and SHA-256, by using gpgsig-sha256 and an in-body
signature.
3. using only SHA-256, by only using the gpgsig-sha256 field.
This means tags can be signed using one or both algorithms.
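
To illustrate the signed payload described above, here is a simplified Python sketch that drops both header fields (with their continuation lines) and the in-body signature. It is an assumption-laden illustration, not Git's parser: the function name is hypothetical, only the PGP marker named in the text is handled, and continuation lines are assumed to start with a single space.

```python
HEADER_FIELDS = (b"gpgsig ", b"gpgsig-sha256 ")
SIG_MARKER = b"-----BEGIN PGP SIGNATURE-----"


def tag_signed_payload(tag: bytes) -> bytes:
    """Strip gpgsig/gpgsig-sha256 headers and the in-body signature."""
    headers, sep, body = tag.partition(b"\n\n")
    kept = []
    skipping = False
    for line in headers.split(b"\n"):
        if skipping and line.startswith(b" "):
            continue  # continuation line of a dropped signature header
        skipping = line.startswith(HEADER_FIELDS)
        if not skipping:
            kept.append(line)
    # Cut the body at a signature marker that begins a line.
    if body.startswith(SIG_MARKER):
        body = b""
    else:
        cut = body.find(b"\n" + SIG_MARKER)
        if cut != -1:
            body = body[: cut + 1]
    return b"\n".join(kept) + sep + body


example = (
    b"object abc\ntype commit\ntag v1\ntagger A <a@b> 0 +0000\n"
    b"gpgsig-sha256 -----BEGIN PGP SIGNATURE-----\n \n data\n"
    b" -----END PGP SIGNATURE-----\n\n"
    b"message\n-----BEGIN PGP SIGNATURE-----\nxyz\n-----END PGP SIGNATURE-----\n"
)
print(tag_signed_payload(example).decode())
```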
Mergetag embedding
~~~~~~~~~~~~~~~~~~