Merge branch 'bc/sha1-256-interop-01'

The beginning of SHA1-SHA256 interoperability work.

* bc/sha1-256-interop-01:
  t1010: use BROKEN_OBJECTS prerequisite
  t: allow specifying compatibility hash
  fsck: consider gpgsig headers expected in tags
  rev-parse: allow printing compatibility hash
  docs: add documentation for loose objects
  docs: improve ambiguous areas of pack format documentation
  docs: reflect actual double signature for tags
  docs: update offset order for pack index v3
  docs: update pack index v3 format
Junio C Hamano
2025-10-22 11:38:58 -07:00
15 changed files with 255 additions and 32 deletions

@@ -34,6 +34,7 @@ MAN5_TXT += gitformat-bundle.adoc
MAN5_TXT += gitformat-chunk.adoc
MAN5_TXT += gitformat-commit-graph.adoc
MAN5_TXT += gitformat-index.adoc
MAN5_TXT += gitformat-loose.adoc
MAN5_TXT += gitformat-pack.adoc
MAN5_TXT += gitformat-signature.adoc
MAN5_TXT += githooks.adoc

@@ -10,6 +10,12 @@
`badFilemode`::
(INFO) A tree contains a bad filemode entry.
`badGpgsig`::
(ERROR) A tag contains a bad (truncated) signature (e.g., `gpgsig`) header.
`badHeaderContinuation`::
(ERROR) A continuation header (such as for `gpgsig`) is unexpectedly truncated.
`badName`::
(ERROR) An author/committer name is empty.

@@ -324,11 +324,12 @@ The following options are unaffected by `--path-format`:
 path of the current directory relative to the top-level
 directory.
---show-object-format[=(storage|input|output)]::
-Show the object format (hash algorithm) used for the repository
-for storage inside the `.git` directory, input, or output. For
-input, multiple algorithms may be printed, space-separated.
-If not specified, the default is "storage".
+--show-object-format[=(storage|input|output|compat)]::
+Show the object format (hash algorithm) used for the repository for storage
+inside the `.git` directory, input, output, or compatibility. For input,
+multiple algorithms may be printed, space-separated. If `compat` is
+requested and no compatibility algorithm is enabled, prints an empty line. If
+not specified, the default is "storage".
 --show-ref-format::
 Show the reference storage format used for the repository.

@@ -0,0 +1,53 @@
gitformat-loose(5)
==================
NAME
----
gitformat-loose - Git loose object format
SYNOPSIS
--------
[verse]
$GIT_DIR/objects/[0-9a-f][0-9a-f]/*
DESCRIPTION
-----------
Loose objects are how Git stores individual objects, where every object is
written as a separate file.
Over the lifetime of a repository, objects are usually written as loose objects
initially. Eventually, these loose objects will be compacted into packfiles
via repository maintenance to improve disk space usage and speed up the lookup
of these objects.
== Loose objects
Each loose object contains a prefix, followed immediately by the data of the
object. The prefix contains `<type> <size>\0`. `<type>` is one of `blob`,
`tree`, `commit`, or `tag`, and `<size>` is the size of the data (without the
prefix) as a decimal integer expressed in ASCII.
The entire contents, prefix and data concatenated, are then compressed with
zlib, and the compressed data is stored in the file. The object ID of the
object is the SHA-1 or SHA-256 (as appropriate) hash of the uncompressed
contents, including the prefix.
The file for the loose object is stored under the `objects` directory, with the
first two hex characters of the object ID being the directory and the remaining
characters being the file name. This is done to shard the data and avoid too
many files being in one directory, since some file systems perform poorly with
many items in a directory.
As an example, the uncompressed contents of the empty tree are `tree 0\0`.
In a SHA-256 repository, it has the object ID
`6ef19b41225c5369f1c104d45d8d85efa9b057b53b14b4b9b939dd74decc5321` and is
stored under
`$GIT_DIR/objects/6e/f19b41225c5369f1c104d45d8d85efa9b057b53b14b4b9b939dd74decc5321`.
Similarly, the uncompressed contents of a blob containing `abc` are
`blob 3\0abc`.
GIT
---
Part of the linkgit:git[1] suite
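The prefix and path rules of the new gitformat-loose(5) page can be illustrated with a small C sketch. The helpers `loose_prefix()` and `loose_path()` are hypothetical, not Git functions. The sketch assumes the caller hashes the prefix followed by the object data (SHA-1 or SHA-256) and zlib-compresses the same bytes; both steps are left out so that it needs only the standard library.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write the "<type> <size>\0" prefix into buf and
 * return its length including the trailing NUL, or 0 if buf is too small.
 * The bytes to hash are this prefix immediately followed by the object data.
 */
static size_t loose_prefix(char *buf, size_t bufsz, const char *type,
			   size_t size)
{
	int n;

	assert(buf && type);
	n = snprintf(buf, bufsz, "%s %zu", type, size);
	if (n < 0 || (size_t)n + 1 > bufsz)
		return 0;
	return (size_t)n + 1; /* snprintf already stored the NUL */
}

/*
 * Hypothetical helper: turn a hex object ID into its sharded location,
 * "objects/<first two hex digits>/<remaining hex digits>". Returns 0 on
 * success, or -1 if the ID is too short or out is too small.
 */
static int loose_path(char *out, size_t outsz, const char *hex)
{
	int n;

	assert(out && hex);
	if (strlen(hex) < 3)
		return -1;
	n = snprintf(out, outsz, "objects/%.2s/%s", hex, hex + 2);
	return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

For the empty tree in a SHA-256 repository, `loose_path()` yields the `objects/6e/f19b...` location given in the text, relative to `$GIT_DIR`.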

@@ -32,6 +32,10 @@ In a repository using the traditional SHA-1, pack checksums, index checksums,
and object IDs (object names) mentioned below are all computed using SHA-1.
Similarly, in SHA-256 repositories, these values are computed using SHA-256.
CRC32 checksums are always computed over the entire packed object, including
the header (n-byte type and length); the base object name or offset, if any;
and the entire compressed object. The CRC32 algorithm used is that of zlib.
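Because zlib's `crc32()` can be extended one buffer at a time, the CRC32 of a packed entry can be accumulated over the three pieces listed above in order. To stay self-contained, this C sketch re-implements the same CRC-32 (reflected polynomial `0xEDB88320`, the variant zlib computes) bit by bit instead of linking zlib; `crc32_update()` and `packed_entry_crc()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Bitwise CRC-32 with the reflected polynomial 0xEDB88320, which produces
 * the same values as zlib's crc32(). Start with crc = 0; pass the previous
 * result back in to extend the checksum over another buffer.
 */
static uint32_t crc32_update(uint32_t crc, const unsigned char *buf, size_t len)
{
	assert(buf || !len);
	crc = ~crc;
	while (len--) {
		crc ^= *buf++;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/*
 * Hypothetical helper: CRC32 of one packed entry, covering the n-byte type
 * and length header, then the base name or offset (NULL/0 for non-delta
 * objects), then the compressed data, in that order.
 */
static uint32_t packed_entry_crc(const unsigned char *hdr, size_t hdrlen,
				 const unsigned char *base, size_t baselen,
				 const unsigned char *data, size_t datalen)
{
	uint32_t crc = crc32_update(0, hdr, hdrlen);

	crc = crc32_update(crc, base, baselen);
	return crc32_update(crc, data, datalen);
}
```

A real implementation would more likely call zlib's table-driven `crc32()` directly; the bitwise loop is used here only for brevity.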
== pack-*.pack files have the following format:
- A header appears at the beginning and consists of the following:
@@ -80,6 +84,16 @@ Valid object types are:
Type 5 is reserved for future expansion. Type 0 is invalid.
=== Object encoding
Unlike loose objects, packed objects do not have a prefix containing the type,
size, and a NUL byte. They are unnecessary because the same information is
available from the n-byte type and length that precede the data, so they are
omitted from the compressed and deltified data.
The computation of the object ID still uses this prefix by reconstructing it
from the type and length as needed.
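The reconstruction described above can be sketched in C: decode the n-byte type and length in front of a packed object, then rebuild the `<type> <size>\0` prefix that is hashed together with the inflated data. The bit layout assumed here (bit 7 flags another byte, bits 6-4 hold the 3-bit type, bits 3-0 the lowest four size bits, then seven more size bits per byte, least significant first) is the object-header layout from gitformat-pack(5); the function names are hypothetical. Delta types 6 and 7 are rejected because their real type comes from the base object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Decode the n-byte type and length of a packed object. Returns the number
 * of header bytes consumed, or 0 if the input is truncated or the size
 * grows too large.
 */
static size_t unpack_object_header(const unsigned char *buf, size_t len,
				   int *type, uint64_t *size)
{
	size_t used = 0;
	unsigned shift = 4;
	unsigned char c;

	assert(buf && type && size);
	if (!len)
		return 0;
	c = buf[used++];
	*type = (c >> 4) & 7;	/* bits 6-4 */
	*size = c & 15;		/* bits 3-0: lowest four size bits */
	while (c & 0x80) {	/* bit 7: another size byte follows */
		if (used >= len || shift > 57)
			return 0;
		c = buf[used++];
		*size |= (uint64_t)(c & 0x7f) << shift;
		shift += 7;
	}
	return used;
}

/* Pack type numbers 1-4 map to the names used in the object-ID prefix. */
static const char *object_type_name(int type)
{
	static const char *const names[] = { NULL, "commit", "tree", "blob", "tag" };

	return (type >= 1 && type <= 4) ? names[type] : NULL;
}

/*
 * Rebuild "<type> <size>\0" for a non-delta object. Returns the prefix
 * length including the NUL, or -1 for delta/invalid types or a short buffer.
 */
static int object_prefix(char *out, size_t outsz, int type, uint64_t size)
{
	const char *name = object_type_name(type);
	int n;

	if (!name)
		return -1;
	n = snprintf(out, outsz, "%s %llu", name, (unsigned long long)size);
	return (n < 0 || (size_t)n + 1 > outsz) ? -1 : n + 1;
}
```

Hashing the returned prefix followed by the inflated object data should yield the same object ID as the loose form of that object.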
=== Size encoding
This document uses the following "size encoding" of non-negative
@@ -92,6 +106,11 @@ values are more significant.
This size encoding should not be confused with the "offset encoding",
which is also used in this document.
When encoding the size of an undeltified object in a pack, the size is that of
the uncompressed raw object. For deltified objects, it is the size of the
uncompressed delta. The base object name or offset is not included in the size
computation.
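The "size encoding" itself can be decoded with a short loop. This C sketch assumes the definition in gitformat-pack(5) (seven value bits per byte, the most significant bit set while more bytes follow, later bytes more significant) and, as that page describes for delta data, that a delta starts with the base object's size followed by the reconstructed object's size, both in this encoding. `decode_size()` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: decode one value in the "size encoding" starting at
 * buf[*pos]. Each byte contributes its low seven bits, later bytes being more
 * significant, and a set most significant bit means another byte follows.
 * On success stores the value, advances *pos, and returns 0; returns -1 if
 * the input ends early or the value needs more than nine bytes.
 */
static int decode_size(const unsigned char *buf, size_t len, size_t *pos,
		       uint64_t *value)
{
	uint64_t v = 0;
	unsigned shift = 0;
	unsigned char c;
	size_t i;

	assert(buf && pos && value);
	i = *pos;
	do {
		if (i >= len || shift > 56)
			return -1;
		c = buf[i++];
		v |= (uint64_t)(c & 0x7f) << shift;
		shift += 7;
	} while (c & 0x80);
	*value = v;
	*pos = i;
	return 0;
}
```

Unlike the n-byte type and length at the start of a packed object, whose first byte also carries the type, every byte here contributes a full seven bits.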
=== Deltified representation
Conceptually there are only four object types: commit, tree, tag and

@@ -173,6 +173,7 @@ manpages = {
'gitformat-chunk.adoc' : 5,
'gitformat-commit-graph.adoc' : 5,
'gitformat-index.adoc' : 5,
'gitformat-loose.adoc' : 5,
'gitformat-pack.adoc' : 5,
'gitformat-signature.adoc' : 5,
'githooks.adoc' : 5,

@@ -227,9 +227,9 @@ network byte order):
 ** 4-byte length in bytes of shortened object names. This is the
 shortest possible length needed to make names in the shortened
 object name table unambiguous.
-** 4-byte integer, recording where tables relating to this format
+** 8-byte integer, recording where tables relating to this format
 are stored in this index file, as an offset from the beginning.
-* 4-byte offset to the trailer from the beginning of this file.
+* 8-byte offset to the trailer from the beginning of this file.
 * Zero or more additional key/value pairs (4-byte key, 4-byte
 value). Only one key is supported: 'PSRC'. See the "Loose objects
 and unreachable objects" section for supported values and how this
@@ -260,12 +260,10 @@ network byte order):
 compressed data to be copied directly from pack to pack during
 repacking without undetected data corruption.
-* A table of 4-byte offset values. For an object in the table of
-sorted shortened object names, the value at the corresponding
-index in this table indicates where that object can be found in
-the pack file. These are usually 31-bit pack file offsets, but
-large offsets are encoded as an index into the next table with the
-most significant bit set.
+* A table of 4-byte offset values. The index of this table in pack order
+indicates where that object can be found in the pack file. These are
+usually 31-bit pack file offsets, but large offsets are encoded as
+an index into the next table with the most significant bit set.
 * A table of 8-byte offset entries (empty for pack files less than
 2 GiB). Pack files are organized with heavily used objects toward
@@ -276,10 +274,14 @@ network byte order):
 up to and not including the table of CRC32 values.
 - Zero or more NUL bytes.
 - The trailer consists of the following:
-* A copy of the 20-byte SHA-256 checksum at the end of the
+* A copy of the full main hash checksum at the end of the
 corresponding packfile.
-* 20-byte SHA-256 checksum of all of the above.
+* Full main hash checksum of all of the above.
+The "full main hash" is a full-length hash of the main (not compatibility)
+algorithm in the repository. Thus, if the main algorithm is SHA-256, this is
+a 32-byte SHA-256 hash and for SHA-1, it's a 20-byte SHA-1 hash.
 Loose object index
 ~~~~~~~~~~~~~~~~~~
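Resolving an object's pack offset from the two offset tables described above, and sizing the trailer, might look like the C sketch below. It assumes both tables are already located in memory, follows the updated wording that the 4-byte table is indexed in pack order, and reads every value in network byte order, the same way the header's 8-byte offsets would be read. All function names, including the local `get_be32()`/`get_be64()` helpers, are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read big-endian ("network byte order") integers. */
static uint32_t get_be32(const unsigned char *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

static uint64_t get_be64(const unsigned char *p)
{
	return (uint64_t)get_be32(p) << 32 | get_be32(p + 4);
}

/*
 * Look up entry `pos` (in pack order) of the 4-byte offset table. With the
 * most significant bit clear, the value is the 31-bit offset itself; with it
 * set, the low 31 bits index the table of 8-byte offsets. Returns 0 on
 * success or -1 if pos or the large-offset index is out of range.
 */
static int pack_offset(const unsigned char *off32, size_t nr_objects,
		       const unsigned char *off64, size_t nr_large,
		       size_t pos, uint64_t *offset)
{
	uint32_t v;

	assert(off32 && offset);
	if (pos >= nr_objects)
		return -1;
	v = get_be32(off32 + 4 * pos);
	if (!(v & 0x80000000u)) {
		*offset = v;		/* small offset stored inline */
		return 0;
	}
	v &= 0x7fffffffu;		/* index into the 8-byte table */
	if (v >= nr_large)
		return -1;		/* corrupt: past the end of the table */
	*offset = get_be64(off64 + 8 * (size_t)v);
	return 0;
}

/* The trailer is two full main-hash checksums: 2 * 20 bytes or 2 * 32 bytes. */
static size_t idx_v3_trailer_size(size_t main_hash_len)
{
	return 2 * main_hash_len;
}
```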
@@ -427,17 +429,19 @@ ordinary unsigned commit.
 Signed Tags
 ~~~~~~~~~~~
-We add a new field "gpgsig-sha256" to the tag object format to allow
-signing tags without relying on SHA-1. Its signed payload is the
-SHA-256 content of the tag with its gpgsig-sha256 field and "-----BEGIN PGP
-SIGNATURE-----" delimited in-body signature removed.
-This means tags can be signed
-1. using SHA-1 only, as in existing signed tag objects
-2. using both SHA-1 and SHA-256, by using gpgsig-sha256 and an in-body
-signature.
-3. using only SHA-256, by only using the gpgsig-sha256 field.
+We add new fields "gpgsig" and "gpgsig-sha256" to the tag object format to
+allow signing tags in both formats. The in-body signature is used for the
+signature in the current hash algorithm and the header is used for the
+signature in the other algorithm. Thus, a dual-signature tag will contain both
+an in-body signature and a gpgsig-sha256 header for the SHA-1 format of an
+object or both an in-body signature and a gpgsig header for the SHA-256 format
+of an object.
+The signed payload of the tag is the content of the tag in the current
+algorithm with both its gpgsig and gpgsig-sha256 fields and
+"-----BEGIN PGP SIGNATURE-----" delimited in-body signature removed.
+This means tags can be signed using one or both algorithms.
 Mergetag embedding
 ~~~~~~~~~~~~~~~~~~
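The signed-payload rule for tags can be sketched in C. This is a simplified illustration, not Git's implementation: `tag_signed_payload()` and `is_sig_header()` are hypothetical, the tag is assumed to be a NUL-terminated buffer, only the PGP armor line named in the text is recognized, the message is cut at the first line starting with it, and the kept headers are not validated.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SIG_START "-----BEGIN PGP SIGNATURE-----"

/* True if the line starting at p is a header whose signature is dropped. */
static int is_sig_header(const char *p)
{
	return !strncmp(p, "gpgsig ", 7) || !strncmp(p, "gpgsig-sha256 ", 14);
}

/*
 * Hypothetical, simplified helper: return a malloc'd copy of a tag buffer
 * with any "gpgsig"/"gpgsig-sha256" header (and its continuation lines,
 * which start with a space) removed and the message cut at the first line
 * starting with SIG_START. Returns NULL on allocation failure.
 */
static char *tag_signed_payload(const char *buf)
{
	const char *p = buf;
	char *out;
	size_t n = 0;
	int in_header = 1, dropping = 0;

	assert(buf);
	out = malloc(strlen(buf) + 1);
	if (!out)
		return NULL;
	while (*p) {
		const char *eol = strchr(p, '\n');
		size_t linelen = eol ? (size_t)(eol - p) + 1 : strlen(p);

		if (in_header) {
			if (*p == '\n') {
				in_header = 0;	/* blank line ends the header */
			} else if (dropping && *p == ' ') {
				p += linelen;	/* continuation of a dropped header */
				continue;
			} else if ((dropping = is_sig_header(p))) {
				p += linelen;	/* drop the signature header line */
				continue;
			}
		} else if (!strncmp(p, SIG_START, strlen(SIG_START))) {
			break;			/* in-body signature: stop here */
		}
		memcpy(out + n, p, linelen);
		n += linelen;
		p += linelen;
	}
	out[n] = '\0';
	return out;
}
```

Because both header names are removed along with the in-body signature, the same payload is obtained whichever of the two signatures a verifier checks, provided it works on the tag in the current algorithm.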

@@ -1107,11 +1107,20 @@ int cmd_rev_parse(int argc,
 const char *val = arg ? arg : "storage";
 if (strcmp(val, "storage") &&
+strcmp(val, "compat") &&
 strcmp(val, "input") &&
 strcmp(val, "output"))
 die(_("unknown mode for --show-object-format: %s"),
 arg);
-puts(the_hash_algo->name);
+if (!strcmp(val, "compat")) {
+if (the_repository->compat_hash_algo)
+puts(the_repository->compat_hash_algo->name);
+else
+putchar('\n');
+} else {
+puts(the_hash_algo->name);
+}
 continue;
 }
 if (!strcmp(arg, "--show-ref-format")) {

fsck.c
@@ -1067,6 +1067,24 @@ int fsck_tag_standalone(const struct object_id *oid, const char *buffer,
else
ret = fsck_ident(&buffer, oid, OBJ_TAG, options);
if (buffer < buffer_end && (skip_prefix(buffer, "gpgsig ", &buffer) || skip_prefix(buffer, "gpgsig-sha256 ", &buffer))) {
eol = memchr(buffer, '\n', buffer_end - buffer);
if (!eol) {
ret = report(options, oid, OBJ_TAG, FSCK_MSG_BAD_GPGSIG, "invalid format - unexpected end after 'gpgsig' or 'gpgsig-sha256' line");
goto done;
}
buffer = eol + 1;
while (buffer < buffer_end && starts_with(buffer, " ")) {
eol = memchr(buffer, '\n', buffer_end - buffer);
if (!eol) {
ret = report(options, oid, OBJ_TAG, FSCK_MSG_BAD_HEADER_CONTINUATION, "invalid format - unexpected end in 'gpgsig' or 'gpgsig-sha256' continuation line");
goto done;
}
buffer = eol + 1;
}
}
if (buffer < buffer_end && !starts_with(buffer, "\n")) {
/*
* The verify_headers() check will allow

fsck.h
@@ -25,9 +25,11 @@ enum fsck_msg_type {
FUNC(NUL_IN_HEADER, FATAL) \
FUNC(UNTERMINATED_HEADER, FATAL) \
/* errors */ \
FUNC(BAD_HEADER_CONTINUATION, ERROR) \
FUNC(BAD_DATE, ERROR) \
FUNC(BAD_DATE_OVERFLOW, ERROR) \
FUNC(BAD_EMAIL, ERROR) \
FUNC(BAD_GPGSIG, ERROR) \
FUNC(BAD_NAME, ERROR) \
FUNC(BAD_OBJECT_SHA1, ERROR) \
FUNC(BAD_PACKED_REF_ENTRY, ERROR) \

@@ -11,10 +11,13 @@ test_expect_success setup '
 git add "$d" || return 1
 done &&
 echo zero >one &&
-git update-index --add --info-only one &&
-git write-tree --missing-ok >tree.missing &&
-git ls-tree $(cat tree.missing) >top.missing &&
-git ls-tree -r $(cat tree.missing) >all.missing &&
+if test_have_prereq BROKEN_OBJECTS
+then
+git update-index --add --info-only one &&
+git write-tree --missing-ok >tree.missing &&
+git ls-tree $(cat tree.missing) >top.missing &&
+git ls-tree -r $(cat tree.missing) >all.missing
+fi &&
 echo one >one &&
 git add one &&
 git write-tree >tree &&
@@ -53,7 +56,7 @@ test_expect_success 'ls-tree output in wrong order given to mktree (2)' '
 test_cmp tree.withsub actual
 '
-test_expect_success 'allow missing object with --missing' '
+test_expect_success BROKEN_OBJECTS 'allow missing object with --missing' '
 git mktree --missing <top.missing >actual &&
 test_cmp tree.missing actual
 '

@@ -454,6 +454,60 @@ test_expect_success 'tag with NUL in header' '
test_grep "error in tag $tag.*unterminated header: NUL at offset" out
'
test_expect_success 'tag accepts gpgsig header even if not validly signed' '
test_oid_cache <<-\EOF &&
header sha1:gpgsig-sha256
header sha256:gpgsig
EOF
header=$(test_oid header) &&
sha=$(git rev-parse HEAD) &&
cat >good-tag <<-EOF &&
object $sha
type commit
tag good
tagger T A Gger <tagger@example.com> 1234567890 -0000
$header -----BEGIN PGP SIGNATURE-----
Not a valid signature
-----END PGP SIGNATURE-----
This is a good tag.
EOF
tag=$(git hash-object --literally -t tag -w --stdin <good-tag) &&
test_when_finished "remove_object $tag" &&
git update-ref refs/tags/good $tag &&
test_when_finished "git update-ref -d refs/tags/good" &&
git -c fsck.extraHeaderEntry=error fsck --tags
'
test_expect_success 'tag rejects invalid headers' '
test_oid_cache <<-\EOF &&
header sha1:gpgsig-sha256
header sha256:gpgsig
EOF
header=$(test_oid header) &&
sha=$(git rev-parse HEAD) &&
cat >bad-tag <<-EOF &&
object $sha
type commit
tag good
tagger T A Gger <tagger@example.com> 1234567890 -0000
$header -----BEGIN PGP SIGNATURE-----
Not a valid signature
-----END PGP SIGNATURE-----
junk
This is a bad tag with junk at the end of the headers.
EOF
tag=$(git hash-object --literally -t tag -w --stdin <bad-tag) &&
test_when_finished "remove_object $tag" &&
git update-ref refs/tags/bad $tag &&
test_when_finished "git update-ref -d refs/tags/bad" &&
test_must_fail git -c fsck.extraHeaderEntry=error fsck --tags 2>out &&
test_grep "error in tag $tag.*invalid format - extra header" out
'
test_expect_success 'cleaned up' '
git fsck >actual 2>&1 &&
test_must_be_empty actual

@@ -207,6 +207,40 @@ test_expect_success 'rev-parse --show-object-format in repo' '
grep "unknown mode for --show-object-format: squeamish-ossifrage" err
'
test_expect_success 'rev-parse --show-object-format in repo with compat mode' '
mkdir repo &&
(
sane_unset GIT_DEFAULT_HASH &&
cd repo &&
git init --object-format=sha256 &&
git config extensions.compatobjectformat sha1 &&
echo sha256 >expect &&
git rev-parse --show-object-format >actual &&
test_cmp expect actual &&
git rev-parse --show-object-format=storage >actual &&
test_cmp expect actual &&
git rev-parse --show-object-format=input >actual &&
test_cmp expect actual &&
git rev-parse --show-object-format=output >actual &&
test_cmp expect actual &&
echo sha1 >expect &&
git rev-parse --show-object-format=compat >actual &&
test_cmp expect actual &&
test_must_fail git rev-parse --show-object-format=squeamish-ossifrage 2>err &&
grep "unknown mode for --show-object-format: squeamish-ossifrage" err
) &&
mkdir repo2 &&
(
sane_unset GIT_DEFAULT_HASH &&
cd repo2 &&
git init --object-format=sha256 &&
echo >expect &&
git rev-parse --show-object-format=compat >actual &&
test_cmp expect actual
)
'
test_expect_success 'rev-parse --show-ref-format' '
test_detect_ref_format >expect &&
git rev-parse --show-ref-format >actual &&

@@ -1708,11 +1708,16 @@ test_set_hash () {
 # Detect the hash algorithm in use.
 test_detect_hash () {
 case "${GIT_TEST_DEFAULT_HASH:-$GIT_TEST_BUILTIN_HASH}" in
-"sha256")
+*:*)
+test_hash_algo="${GIT_TEST_DEFAULT_HASH%%:*}"
+test_compat_hash_algo="${GIT_TEST_DEFAULT_HASH##*:}"
+test_repo_compat_hash_algo="$test_compat_hash_algo"
+;;
+sha256)
 test_hash_algo=sha256
 test_compat_hash_algo=sha1
 ;;
-*)
+sha1)
 test_hash_algo=sha1
 test_compat_hash_algo=sha256
 ;;

@@ -1924,6 +1924,19 @@ test_lazy_prereq DEFAULT_HASH_ALGORITHM '
test_lazy_prereq DEFAULT_REPO_FORMAT '
test_have_prereq SHA1,REFFILES
'
# BROKEN_OBJECTS is a test whether we can write deliberately broken objects and
# expect them to work. When running using SHA-256 mode with SHA-1
# compatibility, we cannot write such objects because there's no SHA-1
# compatibility value for a nonexistent object.
test_lazy_prereq BROKEN_OBJECTS '
! test_have_prereq COMPAT_HASH
'
# COMPAT_HASH is a test if we're operating in a repository with SHA-256 with
# SHA-1 compatibility.
test_lazy_prereq COMPAT_HASH '
test -n "$test_repo_compat_hash_algo"
'
# Ensure that no test accidentally triggers a Git command
# that runs the actual maintenance scheduler, affecting a user's