mirror of
https://github.com/git/git.git
synced 2025-12-12 20:36:24 +01:00
Merge branch 'bc/sha1-256-interop-01'
The beginning of SHA1-SHA256 interoperability work.

* bc/sha1-256-interop-01:
  t1010: use BROKEN_OBJECTS prerequisite
  t: allow specifying compatibility hash
  fsck: consider gpgsig headers expected in tags
  rev-parse: allow printing compatibility hash
  docs: add documentation for loose objects
  docs: improve ambiguous areas of pack format documentation
  docs: reflect actual double signature for tags
  docs: update offset order for pack index v3
  docs: update pack index v3 format
@@ -34,6 +34,7 @@ MAN5_TXT += gitformat-bundle.adoc
 MAN5_TXT += gitformat-chunk.adoc
 MAN5_TXT += gitformat-commit-graph.adoc
 MAN5_TXT += gitformat-index.adoc
+MAN5_TXT += gitformat-loose.adoc
 MAN5_TXT += gitformat-pack.adoc
 MAN5_TXT += gitformat-signature.adoc
 MAN5_TXT += githooks.adoc
@@ -10,6 +10,12 @@
 `badFilemode`::
 (INFO) A tree contains a bad filemode entry.

+`badGpgsig`::
+(ERROR) A tag contains a bad (truncated) signature (e.g., `gpgsig`) header.
+
+`badHeaderContinuation`::
+(ERROR) A continuation header (such as for `gpgsig`) is unexpectedly truncated.
+
 `badName`::
 (ERROR) An author/committer name is empty.

@@ -324,11 +324,12 @@ The following options are unaffected by `--path-format`:
 path of the current directory relative to the top-level
 directory.

---show-object-format[=(storage|input|output)]::
-Show the object format (hash algorithm) used for the repository
-for storage inside the `.git` directory, input, or output. For
-input, multiple algorithms may be printed, space-separated.
-If not specified, the default is "storage".
+--show-object-format[=(storage|input|output|compat)]::
+Show the object format (hash algorithm) used for the repository for storage
+inside the `.git` directory, input, output, or compatibility. For input,
+multiple algorithms may be printed, space-separated. If `compat` is
+requested and no compatibility algorithm is enabled, prints an empty line. If
+not specified, the default is "storage".

 --show-ref-format::
 Show the reference storage format used for the repository.
53
Documentation/gitformat-loose.adoc
Normal file
@@ -0,0 +1,53 @@
+gitformat-loose(5)
+==================
+
+NAME
+----
+gitformat-loose - Git loose object format
+
+
+SYNOPSIS
+--------
+[verse]
+$GIT_DIR/objects/[0-9a-f][0-9a-f]/*
+
+DESCRIPTION
+-----------
+
+Loose objects are how Git stores individual objects, where every object is
+written as a separate file.
+
+Over the lifetime of a repository, objects are usually written as loose objects
+initially. Eventually, these loose objects will be compacted into packfiles
+via repository maintenance to improve disk space usage and speed up the lookup
+of these objects.
+
+== Loose objects
+
+Each loose object contains a prefix, followed immediately by the data of the
+object. The prefix contains `<type> <size>\0`. `<type>` is one of `blob`,
+`tree`, `commit`, or `tag` and `size` is the size of the data (without the
+prefix) as a decimal integer expressed in ASCII.
+
+The entire contents, prefix and data concatenated, is then compressed with zlib
+and the compressed data is stored in the file. The object ID of the object is
+the SHA-1 or SHA-256 (as appropriate) hash of the uncompressed data.
+
+The file for the loose object is stored under the `objects` directory, with the
+first two hex characters of the object ID being the directory and the remaining
+characters being the file name. This is done to shard the data and avoid too
+many files being in one directory, since some file systems perform poorly with
+many items in a directory.
+
+As an example, the empty tree contains the data (when uncompressed) `tree 0\0`
+and, in a SHA-256 repository, would have the object ID
+`6ef19b41225c5369f1c104d45d8d85efa9b057b53b14b4b9b939dd74decc5321` and would be
+stored under
+`$GIT_DIR/objects/6e/f19b41225c5369f1c104d45d8d85efa9b057b53b14b4b9b939dd74decc5321`.
+
+Similarly, a blob containing the contents `abc` would have the uncompressed
+data of `blob 3\0abc`.
+
+GIT
+---
+Part of the linkgit:git[1] suite
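
Reviewer note: the loose object layout described in the new `gitformat-loose.adoc` can be checked with a few lines of Python. This is a minimal sketch, not Git's implementation. The function name `loose_object` and its return shape are made up for illustration, and the compression level is left at zlib's default, so the compressed bytes are not expected to match what Git writes byte for byte.

```python
import hashlib
import zlib


def loose_object(obj_type: str, data: bytes, algo: str = "sha256"):
    """Return (object_id, relative_path, compressed_bytes) for a loose object.

    Hypothetical helper for illustration only.
    """
    # The prefix is "<type> <size>\0", with the size as ASCII decimal.
    prefix = f"{obj_type} {len(data)}".encode("ascii") + b"\0"
    raw = prefix + data
    # The object ID is the hash of the uncompressed prefix plus data.
    oid = hashlib.new(algo, raw).hexdigest()
    # The first two hex digits name the directory, the rest name the file.
    path = f"objects/{oid[:2]}/{oid[2:]}"
    # The file content is the zlib-compressed prefix plus data.
    return oid, path, zlib.compress(raw)


oid, path, compressed = loose_object("tree", b"")
print(oid)   # object ID of the empty tree in a SHA-256 repository
print(path)  # location relative to $GIT_DIR
```

Passing `algo="sha1"` gives the ID the same object would have in a SHA-1 repository. Decompressing the third return value recovers the prefix and data, for example `blob 3\0abc` for a blob containing `abc`.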
@@ -32,6 +32,10 @@ In a repository using the traditional SHA-1, pack checksums, index checksums,
 and object IDs (object names) mentioned below are all computed using SHA-1.
 Similarly, in SHA-256 repositories, these values are computed using SHA-256.

+CRC32 checksums are always computed over the entire packed object, including
+the header (n-byte type and length); the base object name or offset, if any;
+and the entire compressed object. The CRC32 algorithm used is that of zlib.
+
 == pack-*.pack files have the following format:

 - A header appears at the beginning and consists of the following:
@@ -80,6 +84,16 @@ Valid object types are:

 Type 5 is reserved for future expansion. Type 0 is invalid.

+=== Object encoding
+
+Unlike loose objects, packed objects do not have a prefix containing the type,
+size, and a NUL byte. These are not necessary because they can be determined by
+the n-byte type and length that prefixes the data and so they are omitted from
+the compressed and deltified data.
+
+The computation of the object ID still uses this prefix by reconstructing it
+from the type and length as needed.
+
 === Size encoding

 This document uses the following "size encoding" of non-negative
@@ -92,6 +106,11 @@ values are more significant.
 This size encoding should not be confused with the "offset encoding",
 which is also used in this document.

+When encoding the size of an undeltified object in a pack, the size is that of
+the uncompressed raw object. For deltified objects, it is the size of the
+uncompressed delta. The base object name or offset is not included in the size
+computation.
+
 === Deltified representation

 Conceptually there are only four object types: commit, tree, tag and
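
Reviewer note: the three pack-format clarifications above (CRC32 coverage, no `<type> <size>\0` prefix inside the pack, and the size being that of the uncompressed object) can be sketched for the undeltified case as follows. This is a hedged illustration, not Git's code. The helper names are hypothetical, the type numbers and the bit layout (3-bit type, 4 low size bits in the first byte, 7 bits per following byte) are taken from my reading of the rest of `gitformat-pack`, and deltified objects, which also carry a base object name or offset, are not handled.

```python
import zlib

# Type numbers as listed under "Valid object types" in gitformat-pack.
OBJ_COMMIT, OBJ_TREE, OBJ_BLOB, OBJ_TAG = 1, 2, 3, 4


def encode_type_and_size(obj_type: int, size: int) -> bytes:
    """Encode the n-byte type and length that prefixes a packed object."""
    # First byte: 3-bit type in bits 6-4, low 4 bits of the size in bits 3-0.
    byte = (obj_type << 4) | (size & 0x0F)
    size >>= 4
    out = bytearray()
    while size:
        out.append(byte | 0x80)  # set the continuation bit: more bytes follow
        byte = size & 0x7F       # next 7 bits, least significant group first
        size >>= 7
    out.append(byte)
    return bytes(out)


def pack_entry(obj_type: int, data: bytes):
    """Return (entry_bytes, crc32) for an undeltified object (hypothetical helper)."""
    # The encoded size is that of the uncompressed raw object.
    header = encode_type_and_size(obj_type, len(data))
    # Only the raw data is compressed; there is no "<type> <size>\0" prefix.
    entry = header + zlib.compress(data)
    # The CRC32 covers the header and the entire compressed object.
    return entry, zlib.crc32(entry)


entry, crc = pack_entry(OBJ_BLOB, b"abc")
print(entry[:1].hex(), hex(crc))
```

To compute the object ID of such an entry, a reader would rebuild the `blob 3\0` prefix from the decoded type and length and hash it together with the decompressed data.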
@@ -173,6 +173,7 @@ manpages = {
 'gitformat-chunk.adoc' : 5,
 'gitformat-commit-graph.adoc' : 5,
 'gitformat-index.adoc' : 5,
+'gitformat-loose.adoc' : 5,
 'gitformat-pack.adoc' : 5,
 'gitformat-signature.adoc' : 5,
 'githooks.adoc' : 5,
@@ -227,9 +227,9 @@ network byte order):
 ** 4-byte length in bytes of shortened object names. This is the
 shortest possible length needed to make names in the shortened
 object name table unambiguous.
-** 4-byte integer, recording where tables relating to this format
+** 8-byte integer, recording where tables relating to this format
 are stored in this index file, as an offset from the beginning.
-* 4-byte offset to the trailer from the beginning of this file.
+* 8-byte offset to the trailer from the beginning of this file.
 * Zero or more additional key/value pairs (4-byte key, 4-byte
 value). Only one key is supported: 'PSRC'. See the "Loose objects
 and unreachable objects" section for supported values and how this
@@ -260,12 +260,10 @@ network byte order):
 compressed data to be copied directly from pack to pack during
 repacking without undetected data corruption.

-* A table of 4-byte offset values. For an object in the table of
-sorted shortened object names, the value at the corresponding
-index in this table indicates where that object can be found in
-the pack file. These are usually 31-bit pack file offsets, but
-large offsets are encoded as an index into the next table with the
-most significant bit set.
+* A table of 4-byte offset values. The index of this table in pack order
+indicates where that object can be found in the pack file. These are
+usually 31-bit pack file offsets, but large offsets are encoded as
+an index into the next table with the most significant bit set.

 * A table of 8-byte offset entries (empty for pack files less than
 2 GiB). Pack files are organized with heavily used objects toward
@@ -276,10 +274,14 @@ network byte order):
 up to and not including the table of CRC32 values.
 - Zero or more NUL bytes.
 - The trailer consists of the following:
-* A copy of the 20-byte SHA-256 checksum at the end of the
+* A copy of the full main hash checksum at the end of the
 corresponding packfile.

-* 20-byte SHA-256 checksum of all of the above.
+* Full main hash checksum of all of the above.
+
+The "full main hash" is a full-length hash of the main (not compatibility)
+algorithm in the repository. Thus, if the main algorithm is SHA-256, this is
+a 32-byte SHA-256 hash and for SHA-1, it's a 20-byte SHA-1 hash.

 Loose object index
 ~~~~~~~~~~~~~~~~~~
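
Reviewer note: the two offset tables touched by the pack index v3 hunks above fit together roughly as sketched below. This is an illustration under stated assumptions, not code from Git. `resolve_offset` and its parameters are hypothetical names, both tables are assumed to have already been sliced out of the index file as raw bytes, and values are read in network byte order as the surrounding text states.

```python
import struct


def resolve_offset(small_table: bytes, large_table: bytes, pack_pos: int) -> int:
    """Look up the pack file offset of the object at position pack_pos.

    small_table: the table of 4-byte offset values, indexed in pack order.
    large_table: the table of 8-byte offset entries (may be empty).
    """
    (value,) = struct.unpack_from(">I", small_table, 4 * pack_pos)
    if value & 0x80000000:
        # Most significant bit set: the low 31 bits index the 8-byte table.
        index = value & 0x7FFFFFFF
        (value,) = struct.unpack_from(">Q", large_table, 8 * index)
    return value


# One small offset, and one large offset stored in the 8-byte table.
small = struct.pack(">II", 12, 0x80000000)
large = struct.pack(">Q", 1 << 32)
print(resolve_offset(small, large, 0), resolve_offset(small, large, 1))
```

The indirection keeps the common case at four bytes per object while still allowing offsets beyond the 31-bit range in large pack files.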
@@ -427,17 +429,19 @@ ordinary unsigned commit.

 Signed Tags
 ~~~~~~~~~~~
-We add a new field "gpgsig-sha256" to the tag object format to allow
-signing tags without relying on SHA-1. Its signed payload is the
-SHA-256 content of the tag with its gpgsig-sha256 field and "-----BEGIN PGP
-SIGNATURE-----" delimited in-body signature removed.
+We add new fields "gpgsig" and "gpgsig-sha256" to the tag object format to
+allow signing tags in both formats. The in-body signature is used for the
+signature in the current hash algorithm and the header is used for the
+signature in the other algorithm. Thus, a dual-signature tag will contain both
+an in-body signature and a gpgsig-sha256 header for the SHA-1 format of an
+object or both an in-body signature and a gpgsig header for the SHA-256 format
+of an object.

-This means tags can be signed
+The signed payload of the tag is the content of the tag in the current
+algorithm with both its gpgsig and gpgsig-sha256 fields and
+"-----BEGIN PGP SIGNATURE-----" delimited in-body signature removed.

-1. using SHA-1 only, as in existing signed tag objects
-2. using both SHA-1 and SHA-256, by using gpgsig-sha256 and an in-body
-signature.
-3. using only SHA-256, by only using the gpgsig-sha256 field.
+This means tags can be signed using one or both algorithms.

 Mergetag embedding
 ~~~~~~~~~~~~~~~~~~
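
Reviewer note: the signed payload rule in the "Signed Tags" hunk above can be sketched as a small filter over the raw tag object. This is a rough illustration and not how Git implements it. `signed_payload` is a hypothetical name, the tag is assumed to be the uncompressed object content without the `<type> <size>\0` prefix, header continuation lines are assumed to start with a single space, and the in-body signature is assumed to run from the `-----BEGIN PGP SIGNATURE-----` line to the end of the object.

```python
SIG_MARKER = b"-----BEGIN PGP SIGNATURE-----"
SIG_HEADERS = (b"gpgsig ", b"gpgsig-sha256 ")


def signed_payload(tag: bytes) -> bytes:
    """Strip both signature headers and the in-body signature from a tag."""
    headers, sep, body = tag.partition(b"\n\n")
    kept = []
    skipping = False
    for line in headers.split(b"\n"):
        if line.startswith(b" "):
            # Continuation line: it shares the fate of the header it extends.
            if not skipping:
                kept.append(line)
            continue
        skipping = line.startswith(SIG_HEADERS)
        if not skipping:
            kept.append(line)
    # Remove the in-body signature, if any, from its marker line onward.
    if body.startswith(SIG_MARKER):
        body = b""
    else:
        pos = body.find(b"\n" + SIG_MARKER)
        if pos != -1:
            body = body[: pos + 1]
    return b"\n".join(kept) + sep + body


tag = (
    b"object 0123\ntype commit\ntag v1\ntagger A <a@example.com> 0 +0000\n"
    b"gpgsig-sha256 -----BEGIN PGP SIGNATURE-----\n \n abcd\n"
    b" -----END PGP SIGNATURE-----\n\nrelease\n"
    b"-----BEGIN PGP SIGNATURE-----\n\nefgh\n-----END PGP SIGNATURE-----\n"
)
print(signed_payload(tag).decode())
```

Because both header fields and the in-body signature are removed, the same function yields the payload whichever algorithm's representation of the tag is passed in.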