xdl_hash_record_verbatim uses a modified djb2 hash with XOR instead of ADD
for combining. The ADD-based variant is used as the basis of the modern
("GNU") symbol lookup scheme in ELF. The glibc dynamic loader received an
optimized version of this hash function thanks to Noah Goldstein [1].
Switch xdl_hash_record_verbatim to additive hashing and implement
an optimized loop following the scheme suggested by Noah.
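
As a rough illustration of the difference (a minimal sketch, not the
actual xdiff code; the function names, the 32-bit hash width, and the
length-based interface are assumptions made for this example), the two
combining steps and a two-bytes-per-iteration additive loop could look
like this:

```c
#include <stddef.h>
#include <stdint.h>

/* XOR-combining variant: h = (h * 33) ^ c */
static uint32_t djb2_xor(const unsigned char *s, size_t n)
{
	uint32_t h = 5381;
	for (size_t i = 0; i < n; i++)
		h = (h * 33) ^ s[i];
	return h;
}

/* Additive variant: h = h * 33 + c */
static uint32_t djb2_add(const unsigned char *s, size_t n)
{
	uint32_t h = 5381;
	for (size_t i = 0; i < n; i++)
		h = h * 33 + s[i];
	return h;
}

/*
 * Additive variant, two bytes per iteration. This relies on
 * (h * 33 + a) * 33 + b == h * 33 * 33 + a * 33 + b, which holds for
 * ADD (modulo 2^32) but not for XOR.
 */
static uint32_t djb2_add_unrolled(const unsigned char *s, size_t n)
{
	uint32_t h = 5381;
	size_t i = 0;
	for (; i + 1 < n; i += 2)
		h = h * (33 * 33) + (uint32_t)s[i] * 33 + s[i + 1];
	if (i < n) /* odd length: one trailing byte */
		h = h * 33 + s[i];
	return h;
}
```

The unrolled loop shortens the dependency chain on `h` to one
multiply-add per two input bytes, which is the kind of restructuring the
additive form permits.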
Timing 'git log --oneline --shortstat v2.0.0..v2.5.0' under perf, I got
version | cycles, bn | instructions, bn
---------------------------------------
   A    |    6.38    |      11.3
   B    |    6.21    |      10.89
   C    |    5.80    |       9.95
   D    |    5.83    |       8.74
---------------------------------------
A: baseline (git master at e4ef0485fd)
B: plus 'xdiff: refactor xdl_hash_record()'
C: and plus this patch
D: with 'xdiff: use xxhash' by Phillip Wood
The resulting speedup for xdl_hash_record_verbatim itself is about 1.5x.
[1] https://inbox.sourceware.org/libc-alpha/20220519221803.57957-6-goldstein.w.n@gmail.com/
Signed-off-by: Alexander Monakov <amonakov@ispras.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add the --format flag to git-repo-info. By using this flag, the users
can choose the format for obtaining the data they requested.
Given that this command can be used for generating input for other
applications and for being read by end users, it requires at least two
formats: one for being read by humans and another for being read by
machines. Some other Git commands also have two output formats, notably
git-config which was the inspiration for the two formats that were
chosen here:
- keyvalue, where the retrieved data is printed one per line, using =
for delimiting the key and the value. This is the default format,
targeted for end users.
- nul, where the retrieved data is separated by NUL characters, using
the newline character for delimiting the key and the value. This
format is targeted for being read by machines.
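
The delimiters described above can be sketched as follows. This is only
an illustration of the two layouts, not the code added by this patch;
`format_field()` and the enum names are hypothetical, and any quoting of
values is left out:

```c
#include <stddef.h>
#include <string.h>

enum output_format { FORMAT_KEYVALUE, FORMAT_NUL };

/*
 * Write one field into buf and return the number of bytes written,
 * or 0 if buf is too small.
 *   keyvalue: key '=' value '\n'
 *   nul:      key '\n' value '\0'
 */
static size_t format_field(char *buf, size_t size, enum output_format fmt,
			   const char *key, const char *value)
{
	size_t klen = strlen(key), vlen = strlen(value);
	size_t need = klen + 1 + vlen + 1;
	char kv_sep = fmt == FORMAT_NUL ? '\n' : '=';
	char end = fmt == FORMAT_NUL ? '\0' : '\n';

	if (need > size)
		return 0;
	memcpy(buf, key, klen);
	buf[klen] = kv_sep;
	memcpy(buf + klen + 1, value, vlen);
	buf[klen + 1 + vlen] = end;
	return need;
}
```

Because the nul layout never uses `=` or a trailing newline as a
delimiter, a consumer can split on NUL first and on the first newline
second.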
Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Justin Tobler <jltobler@gmail.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Mentored-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This commit is part of the series that introduces the new subcommand
git-repo-info.
The flag `--is-shallow-repository` from git-rev-parse is used for
retrieving whether the repository is shallow. This way, it is used for
querying repository metadata, fitting in the purpose of git-repo-info.
Then, add a new field `layout.shallow` to the git-repo-info subcommand
containing that information.
Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Justin Tobler <jltobler@gmail.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Mentored-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This commit is part of the series that introduces the new subcommand
git-repo-info.
The flag --is-bare-repository from git-rev-parse is used for retrieving
whether the current repository is bare. This way, it is used for
querying repository metadata, fitting in the purpose of git-repo-info.
Then, add a new field layout.bare to the git-repo-info subcommand
containing that information.
Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Justin Tobler <jltobler@gmail.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Mentored-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This commit is part of the series that introduces the new subcommand
git-repo-info.
The flag `--show-ref-format` from git-rev-parse is used for retrieving
the reference format (i.e. `files` or `reftable`). This way, it is
used for querying repository metadata, fitting in the purpose of
git-repo-info.
Add a new field `references.format` to the repo-info subcommand
containing that information.
Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Justin Tobler <jltobler@gmail.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Mentored-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, `git rev-parse` covers a wide range of functionality not
directly related to parsing revisions, as its name suggests. Over time,
many features like parsing datestrings, options, paths, and others
were added to it because there wasn't a more appropriate command
to place them.
Create a new Git command called `repo`. `git repo` will be the main
command for obtaining the information about a repository (such as
metadata and metrics).
Also declare a subcommand for `repo` called `info`. `git repo info`
will bring the functionality of retrieving repository-related
information currently returned by `rev-parse`.
Add the required documentation and build changes to enable usage of
this subcommand.
Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Justin Tobler <jltobler@gmail.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Mentored-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As part of 9bbc981c6f (t/unit-tests: finalize migration of
reftable-related tests, 2025-07-24), the explicit list of
`UNIT_TEST_PROGRAMS` was turned into a wildcard pattern-derived list.
Let's do the same in the CMake definition.
This fixes build errors with symptoms like this:
    CMake Error at CMakeLists.txt:132 (string):
      string sub-command REPLACE requires at least four arguments.
    Call Stack (most recent call first):
      CMakeLists.txt:1037 (parse_makefile_for_scripts)
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Edit: We are continuing to follow the existing PO file convention, which
includes filenames but strips out line numbers from the file-location
comments. This standard was set by our former lead, Jordi Mas, and we
are maintaining it for project-wide consistency.
Signed-off-by: Mikel Forcada <mikel.forcada@gmail.com>
Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
Many of the commit-graph related functions take in both a repository and
the object database source (directly or via `struct commit_graph`) for
which we are supposed to load such a commit-graph. In the best case this
information is simply redundant as the source already contains a
reference to its owning object database, which in turn has a reference
to its repository. In the worst case this information could even
mismatch when passing in a source that doesn't belong to the same
repository.
Refactor the code so that we only pass in the object database source in
those cases.
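
The redundancy can be pictured with simplified stand-in structures (the
field names and `source_repo()` below are hypothetical and only mirror
the ownership chain described above, not Git's real definitions):

```c
/* Hypothetical, simplified stand-ins for the real structures. */
struct repository { const char *gitdir; };
struct object_database { struct repository *repo; };
struct odb_source { struct object_database *odb; const char *path; };

/*
 * With the source alone we can always reach the repository, so a
 * separate repository argument cannot add information. It can only
 * disagree with the source.
 */
static struct repository *source_repo(const struct odb_source *source)
{
	return source->odb->repo;
}
```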
There is one exception though, namely `load_commit_graph_chain_fd_st()`,
which is responsible for loading a commit-graph chain. It is expected
that parts of the commit-graph chain aren't located in the same object
source as the chain file itself, but in a different one. Consequently,
this function doesn't work on the source level but on the database level
instead.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There's still a bunch of uses of `the_repository` in "commit-graph.c",
which we want to stop using due to it being a global variable. Refactor
the code to stop using `the_repository` in favor of the repository
provided via the calling context.
This allows us to drop the `USE_THE_REPOSITORY_VARIABLE` macro.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Stop using `the_hash_algo` as it implicitly relies on `the_repository`.
Instead, we either use the hash algo provided via the context or, if
there is no such hash algo, we use `the_repository` explicitly. Such
uses will be removed in subsequent commits.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor `parse_commit_graph()` so that it takes a repository instead of
taking repository settings. On the one hand this allows us to get rid of
instances where we access `the_hash_algo` by using the repository's hash
algorithm instead. On the other hand it also allows us to move the call
of `prepare_repo_settings()` into the function itself.
Note that there's one small catch, as the commit-graph fuzzer calls this
function directly without having a fully functional repository at hand.
And while the fuzzer already initializes `the_repository` with relevant
info, the call to `prepare_repo_settings()` would fail because we don't
have a fully-initialized repository.
Work around the issue by also setting `settings.initialized` to pretend
that we've already read the settings.
While at it, remove the redundant `parse_commit_graph()` declaration in
the fuzzer. It was added together with aa658574bf (commit-graph, fuzz:
add fuzzer for commit-graph, 2019-01-15), but as we also declared the
same function in "commit-graph.h" it wasn't ever needed.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The commit-graph stores the length of the hash algorithm it uses. In
subsequent commits we'll need to pass the whole hash algorithm around
though, which we currently don't have access to.
Refactor the code so that we store the hash algorithm instead of only
its size.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have two macros `GRAPH_DATA_WIDTH` and `GRAPH_MIN_SIZE` that compute
hash-dependent sizes. They do so by using the global `the_hash_algo`
variable though, which we want to get rid of over time.
Convert these macros into functions that accept the hash algorithm as
input parameter. Adapt callers accordingly.
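
A minimal sketch of the conversion, assuming a simplified hash algorithm
descriptor and an illustrative constant for the fixed per-entry bytes
(neither is copied from "commit-graph.c"):

```c
#include <stddef.h>

/* Hypothetical, simplified hash algorithm descriptor. */
struct hash_algo {
	const char *name;
	size_t rawsz; /* raw hash size in bytes */
};

/* Assumed for illustration: fixed bytes stored next to each hash. */
#define FIXED_DATA_BYTES 16

/*
 * Function form: the hash algorithm is an explicit parameter instead
 * of being read from a global variable inside a macro.
 */
static size_t graph_data_width(const struct hash_algo *algo)
{
	return algo->rawsz + FIXED_DATA_BYTES;
}
```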
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
0bdaa12169 (git-count-objects.txt: describe each line in -v output,
2013-02-08) forgot to include `packs`.
Signed-off-by: Daniele Sassoli <danielesassoli@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When you have two or more objects with object names that share more
than 32 letters in an SHA-1 repository, find_unique_abbrev() fails
to show disambiguation.
To see how many leading letters of a given full object name are
sufficiently unambiguous, the algorithm starts from an initial
length, guessed based on the estimated number of objects in the
repository, checks whether another object shares the prefix, and
keeps extending the abbreviation while one does. The loop stops
at GIT_MAX_RAWSZ,
which is counted as the number of bytes, since 5b20ace6 (sha1_name:
unroll len loop in find_unique_abbrev_r(), 2017-10-08); before that
change, it extended up to GIT_SHA1_HEXSZ, which meant to stop at the
end of hexadecimal SHA-1 object name.
Because the hexadecimal object name passed to the function is
NUL-terminated, and this fact is used to correctly terminate the
loop that scans for the first difference earlier in the function,
use it to make sure we do not increment the .cur_len member beyond
the end of the string.
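
The idea can be sketched as below. This is a simplified model, not the
code in find_unique_abbrev_r(): the helper names are hypothetical and
the other object names are passed in as a plain array instead of being
looked up in the object database.

```c
#include <stddef.h>
#include <string.h>

/* Nonzero if any of the n other names shares the first len chars of hex. */
static int prefix_is_shared(const char *hex, size_t len,
			    const char *const *others, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!strncmp(hex, others[i], len))
			return 1;
	return 0;
}

/*
 * Extend the abbreviation while it is still ambiguous, but never past
 * the NUL that terminates the hexadecimal object name.
 */
static size_t unique_abbrev_len(const char *hex, size_t init_len,
				const char *const *others, size_t n)
{
	size_t len = init_len;
	size_t full = strlen(hex);

	if (len > full)
		len = full;
	while (hex[len] && prefix_is_shared(hex, len, others, n))
		len++;
	return len;
}
```

Checking `hex[len]` bounds the loop by the actual length of the name, so
the same code works for hash functions with different hexadecimal
lengths.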
Noticed-by: Jon Forrest <nobozo@gmail.com>
Helped-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some of the editor tests manipulate the environment or config in ways
that remain visible to later tests, which creates a footgun for them.
Use test_config, subshells, single-command environment overrides, and
test helpers to automatically undo environment and config modifications
once finished.
Best-viewed-with: --ignore-all-space
Signed-off-by: D. Ben Knoble <ben.knoble+github@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Tests in t7005 mask Git error codes and do not use our nice test
helpers. Improve that, move some code into the setup test, and drop a
few old-style blank lines while at it.
Best-viewed-with: --ignore-all-space
Signed-off-by: D. Ben Knoble <ben.knoble+github@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
`git imap-send` was built on the idea of copying emails to an IMAP folder
like drafts, and sending them later using an email client. Currently
the only way to do it is by piping the output of `git format-patch` to
`git imap-send`.
Add another way to do it by using `git send-email` with the
`--use-imap-only` or `sendemail.useImapOnly` option. This allows users to
use the advanced features of `git send-email` like tweaking Cc: list
programmatically, compose the cover letter, etc. and then send the well
formatted emails to an IMAP folder using `git imap-send`.
While at it, use `` instead of '' for --smtp-encryption ssl in help
section of `git send-email`.
Signed-off-by: Aditya Garg <gargaditya08@live.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some email providers like Apple iCloud Mail do not support sending a copy
of sent emails to the "Sent" folder if an SMTP server is used. As a
workaround, various email clients like Thunderbird which rely on SMTP,
use IMAP to send a copy of sent emails to the "Sent" folder. Something
similar can be done if sending emails via `git send-email`, by using
the `git imap-send` command to send a copy of the sent email to an IMAP
folder specified by the user.
Add this functionality to `git send-email` by introducing a new
configuration variable `sendemail.imapfolder` and command line option
`--imap-folder` which specifies the IMAP folder to send a copy of the
sent emails to. If specified, a copy of the sent emails will be sent
by piping the emails to `git imap-send` command, after all emails are
sent via SMTP and the SMTP server has been closed.
Signed-off-by: Aditya Garg <gargaditya08@live.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The synopsis section has an extra closing bracket, like this:
[--filter=<filter>] [--also-filter-submodules]]
The extra one is not the one at the end of this line; it is the one
after "...=<filter>".
The "--also-filter-submodules" option was added by f05da2b4 (clone,
submodule: pass partial clone filters to submodules, 2022-02-04).
Because it makes sense only when used with the "--filter=<filter>"
option, these two options are enclosed in a pair of brackets. The
extra one was added by 76880f05 (doc: git-clone: apply new
documentation formatting guidelines, 2024-03-29) by mistake.
Remove the extra and incorrect closing bracket, so that the line
reads:
[--filter=<filter> [--also-filter-submodules]]
Signed-off-by: Knut Harald Ryager <e-k-nut@hotmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When creating a new addition via either `reftable_stack_new_addition()`
or its convenience wrapper `reftable_stack_add()` we:
1. Create the "tables.list.lock" file.
2. Verify that the current version of the "tables.list" file is
up-to-date.
3. Write the new table records if so.
By default, the second step would cause us to bail out if we see that
there has been a concurrent write to the stack that made our in-memory
copy of the stack out-of-date. This is a safety mechanism to not write
records to the stack based on outdated information.
The downside though is that concurrent writes may now cause us to bail
out, which is not a good user experience. In addition, this isn't even
necessary for us, as Git knows to perform all checks for the old state
of references under the lock. (Well, in all except one case: when we
expire the reflog we first create the log iterator before we create the
lock, but this ordering is fixed as part of this commit.)
Consequently, most writers pass the `REFTABLE_STACK_NEW_ADDITION_RELOAD`
flag. The effect of this flag is that we reload the stack after having
acquired the lock in case the stack is out-of-date. This plugs the race
with concurrent writers, but we continue performing the verifications of
the expected old state to catch actual conflicts in the references we
are about to write.
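
The effect of the flag can be modeled with a toy stack (everything in
this sketch is hypothetical: the struct, the error code, and the flag
name only stand in for the reftable library's real ones):

```c
enum { ERR_OUTDATED = -1 };            /* hypothetical error code */
enum { NEW_ADDITION_RELOAD = 1 << 0 }; /* hypothetical flag */

struct stack {
	int on_disk_version;   /* state of "tables.list" */
	int in_memory_version; /* our cached copy of it */
	int locked;
};

static int stack_new_addition(struct stack *st, unsigned flags)
{
	st->locked = 1; /* step 1: take "tables.list.lock" */

	/* step 2: is our in-memory copy still current? */
	if (st->in_memory_version != st->on_disk_version) {
		if (!(flags & NEW_ADDITION_RELOAD)) {
			st->locked = 0;
			return ERR_OUTDATED; /* bail out on concurrent write */
		}
		/* reload while holding the lock; nobody can race us now */
		st->in_memory_version = st->on_disk_version;
	}
	return 0; /* step 3: the caller may write its records */
}
```

Without the flag a concurrent writer makes the call fail; with it the
stack is refreshed under the lock and the caller's own old-state checks
remain the only source of conflicts.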
Adapt the remaining callsites that don't yet pass this flag to do so.
While at it, drop a needless manual reload.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `flock` interface is implemented as part of "reftable/system.c" and
thus needs to be implemented by the integrator between the reftable
library and its parent code base. As such, we cannot rely on any
specific implementation thereof.
Regardless of that, users of the `flock` subsystem rely on `errno` being
set to specific values. This is fragile and not documented anywhere and
doesn't really make for a good interface.
Refactor the code so that the implementations themselves are expected to
return reftable-specific error codes. Our implementation of the `flock`
subsystem already knows to do this for all error paths except one.
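
One way to picture the new contract is a translation step inside the
implementation, so that callers never look at `errno`. The error names
and values below are placeholders, not the library's actual constants:

```c
#include <errno.h>

/* Hypothetical library-specific error codes. */
enum { LIB_OK = 0, LIB_IO_ERROR = -2, LIB_LOCK_ERROR = -5 };

/*
 * Map the errno left behind by a failed lock attempt to a
 * library-specific code, so every integrator reports the same values.
 */
static int lock_errno_to_error(int saved_errno)
{
	switch (saved_errno) {
	case 0:
		return LIB_OK;
	case EEXIST:
		return LIB_LOCK_ERROR; /* somebody else holds the lock */
	default:
		return LIB_IO_ERROR;
	}
}
```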
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we compact the reftable stack we first acquire the lock for the
"tables.list" file and then reload the stack to check that it is still
up-to-date. This is done by calling `stack_uptodate()`, which knows to
return zero in case the stack is up-to-date, a positive value if it is
not and a negative error code on unexpected conditions.
We don't do proper error checking though, but instead we only check
whether the returned error code is non-zero. If so, we simply bubble it
up the calling stack, which means that callers may see an unexpected
positive value.
Fix this issue by translating to `REFTABLE_OUTDATED_ERROR` instead.
Handle this situation in `reftable_addition_commit()`, where we perform
a best-effort auto-compaction.
All other callsites of `stack_uptodate()` know to handle a positive
return value and thus don't need to be fixed.
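
The translation amounts to the following (a sketch with a placeholder
error value; `translate_uptodate_result()` is a hypothetical helper, not
a function in the reftable library):

```c
enum { OUTDATED_ERROR = -12 }; /* placeholder value for illustration */

/*
 * The check returns 0 when up-to-date, a positive value when outdated
 * and a negative error code otherwise. Callers further up only expect
 * zero or negative codes, so map "outdated" to a proper error.
 */
static int translate_uptodate_result(int res)
{
	if (res > 0)
		return OUTDATED_ERROR;
	return res;
}
```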
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `reftable_stack_add()` function is a simple wrapper to lock the
stack, add records to it via a callback and then commit the
result. One problem with it though is that it doesn't accept any flags
for creating the addition. This makes it impossible to automatically
reload the stack in case it was modified before we managed to lock the
stack.
Add a `flags` field to plug this gap and pass it through accordingly.
For now this new flag won't be used by us, but it will be used by
libgit2.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While perfectly legal, older compiler toolchains complain when
zero-initializing structs that contain nested structs with `{0}`:
    /home/libgit2/source/deps/reftable/stack.c:862:35: error: suggest braces around initialization of subobject [-Werror,-Wmissing-braces]
        struct reftable_addition empty = REFTABLE_ADDITION_INIT;
                                         ^~~~~~~~~~~~~~~~~~~~~~
    /home/libgit2/source/deps/reftable/stack.c:707:33: note: expanded from macro 'REFTABLE_ADDITION_INIT'
    #define REFTABLE_ADDITION_INIT {0}
                                    ^
We had the discussion around whether or not we want to handle such bogus
compiler errors in the past already [1]. Back then we basically decided
that we do not care about such old-and-buggy compilers, so while we
could fix the issue by using `{{0}}` instead this is not the preferred
way to handle this in the Git codebase.
We have an easier fix though: we can just drop the macro altogether and
handle initialization of the struct in `reftable_stack_addition_init()`.
Callers are expected to call this function already, so this change even
simplifies the calling convention.
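
A minimal sketch of the pattern, using made-up struct layouts rather
than the real `struct reftable_addition`:

```c
#include <string.h>

/* Hypothetical layouts; only the nesting matters here. */
struct inner {
	int fd;
	const char *path;
};

struct addition {
	struct inner lock; /* nested struct that trips -Wmissing-braces */
	int next_index;
};

/*
 * Zero the struct inside the init function instead of relying on a
 * `{0}` initializer macro at every declaration site. This assumes that
 * all-bits-zero is a valid null pointer, which holds on common
 * platforms.
 */
static void addition_init(struct addition *add)
{
	memset(add, 0, sizeof(*add));
}
```

Since no brace initializer is involved, the warning cannot trigger
regardless of how deeply the members are nested.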
[1]: https://lore.kernel.org/git/20220710081135.74964-1-sunshine@sunshineco.com/T/
Suggested-by: Carlo Arenas <carenas@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have a couple of forward declarations in the stack-related code of
the reftable library. These declarations aren't really required, but are
simply caused by unfortunate ordering.
Reorder the code and remove the forward declarations.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The reftable writer accidentally uses the Git-specific `QSORT()` macro.
This macro removes the need for the caller to provide the element size,
but other than that it's mostly equivalent to `qsort()`.
Replace the macro accordingly to make the library usable outside of Git.
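
For illustration, a convenience macro of this kind and the plain
`qsort()` call it stands for could look like this (`SORT_ARRAY` is a
made-up name for this example, not Git's macro definition):

```c
#include <stdlib.h>

/* Convenience wrapper: derives the element size from the pointer type. */
#define SORT_ARRAY(base, n, cmp) qsort((base), (n), sizeof(*(base)), (cmp))

static int cmp_int(const void *a, const void *b)
{
	int x = *(const int *)a, y = *(const int *)b;
	return (x > y) - (x < y);
}

/* Portable form: pass the element size to qsort() explicitly. */
static void sort_ints(int *v, size_t n)
{
	qsort(v, n, sizeof(*v), cmp_int);
}
```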
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>