Jeff King 8f32a5a6c0 fetch-pack: re-scan when double-checking graph objects
The fetch code tries to avoid asking the remote side for an object we
already have. It does this by traversing recent commits reachable from
our refs looking for matches. Commit 5d4cc78f72 (fetch-pack: die if in
commit graph but not obj db, 2024-11-05) introduced an extra check
there: if we think we have an object because it's in the commit graph,
we double-check that we actually have it in our object database with a
call to odb_has_object().

But that call does not pass any flags, and so the function won't call
reprepare_packed_git() if it does not find the object. That opens us up
to the usual race against some other process repacking the odb:

  1. We scan the list of packs in objects/pack but haven't yet opened them.

  2. Somebody else packs the object into a new pack (which we don't know
     about), and deletes the old pack it was in.

  3. Our odb_has_object() call tries to open that old pack, but finds it
     is gone. We declare that we don't have the object.

And this causes us to erroneously complain and abort the fetch, thinking
our commit-graph and object database are out of sync. Instead, we should
pass HAS_OBJECT_RECHECK_PACKED, which will add a new step:

  4. We re-scan the pack directory, find the new pack, and locate
     the object.
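
The four steps above can be modeled with a toy shell script. This is only
a sketch under loud assumptions: scan_packs, has_object, and the "recheck"
argument are made-up names standing in for the pack directory scan,
odb_has_object(), and HAS_OBJECT_RECHECK_PACKED. Real packs are not text
files, and none of this is how Git itself is implemented.

```shell
#!/bin/sh
# Toy model of the race; none of these names exist in Git.
# A "pack" here is a text file listing the object ids it holds.

odb=$(mktemp -d)
mkdir -p "$odb/pack"

# Step 1: remember which packs exist, without opening them yet.
scan_packs() {
    known_packs=$(ls "$odb/pack")
}

# Look for object $1 in the packs we know about. If $2 is "recheck",
# re-scan the directory once after a miss and try a second time.
has_object() {
    for attempt in first second; do
        for p in $known_packs; do
            # A pack deleted since the scan simply fails to open.
            if grep -qx "$1" "$odb/pack/$p" 2>/dev/null; then
                return 0
            fi
        done
        [ "${2:-}" = recheck ] && [ "$attempt" = first ] || return 1
        scan_packs
    done
    return 1
}

echo abc123 >"$odb/pack/old.pack"
scan_packs

# Step 2: another process repacks and deletes the old pack.
echo abc123 >"$odb/pack/new.pack"
rm "$odb/pack/old.pack"

# Step 3: the stale pack list yields a false negative.
if has_object abc123; then without=found; else without=missing; fi

# Step 4: the re-scan picks up the new pack.
if has_object abc123 recheck; then with=found; else with=missing; fi

echo "without recheck: $without"
echo "with recheck: $with"
rm -rf "$odb"
```

In this model the first lookup reports "missing" and the second reports
"found", even though the object was present the whole time.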

Often the fetch code tries to avoid these kinds of re-scans if it's
likely that we won't have the object. If the other side has told us
about object X and we want to know if we have it, we'll skip the re-scan
(to avoid spending a lot of effort when there are many such objects). We
can accept the racy false negative in that case because the worst case
is that we ask the other side to send us the object.

But this is not one of those cases. These are objects which are
accessible from _our_ refs, and which we already found in the commit
graph file. We should have them, and if we don't, we'll die()
immediately. So the performance impact is negligible, and getting the
right answer is important.

There's no test here because it's inherently racy. In fact, I had
trouble even developing a minimal test. The problem seen in the wild can
be produced like this:

  # Any git.git mirror which supports partial clones; I think this
  # should work with any repo that contains submodules, but note that
  # $obj below is specific to this repo
  url=https://github.com/git/git.git

  # This is a commit that is not at the tip of any branches (so after
  # we have it, we'll still have some commits to fetch).
  obj=cf6f63ea6bf35173e02e18bdc6a4ba41288acff9

  git init
  git fetch --filter=tree:0 $url $obj:refs/heads/foo
  git checkout foo
  git commit-graph write --reachable
  git fetch $url

What happens here is that the initial fetch grabs that older commit (and
its ancestors) but no trees or blobs, and the subsequent checkout grabs
the necessary trees and blobs just for that commit. The final fetch
spawns a long sequence of child fetches due to fetch_submodules(), which
wants to check whether there have been any gitlink modifications which
should trigger a fetch of the related submodule (we'll leave aside the
irony that we did not even check out any submodules yet).

That series of fetches causes us to accumulate packs, which eventually
triggers background maintenance to run. That repacks all-into-one, and
the pack containing $obj goes away in favor of a new pack. And then the
fetch eventually fails with:

  fatal: You are attempting to fetch cf6f63ea6b, which is in the commit graph file but
  not in the object database.

In the scenario above, the race becomes likely because of the long
series of quick fetches. But I _think_ the bug is independent of partial
clones entirely, and you could run into the same thing with a single
fetch, some other process running "git repack" simultaneously, and a bit
of bad luck. I haven't been able to reproduce, though. I'm not sure if
that's because there's some mis-analysis above, or if the race window is
just small enough that it's hard to trigger.

At any rate, re-scanning here seems like an obviously correct thing to
do with no downside, and it does fix the partial-clone case shown above.

Reported-by: Дилян Палаузов <dilyan.palauzov@aegee.org>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-25 10:30:03 -07:00

Git - fast, scalable, distributed revision control system

Git is a fast, scalable, distributed revision control system with an unusually rich command set that provides both high-level operations and full access to internals.

Git is an Open Source project covered by the GNU General Public License version 2 (some parts of it are under different licenses, compatible with the GPLv2). It was originally written by Linus Torvalds with the help of a group of hackers around the net.

Please read the file INSTALL for installation instructions.

Many Git online resources are accessible from https://git-scm.com/ including full documentation and Git related tools.

See Documentation/gittutorial.adoc to get started, then see Documentation/giteveryday.adoc for a useful minimum set of commands, and Documentation/git-<commandname>.adoc for documentation of each command. If git has been correctly installed, then the tutorial can also be read with man gittutorial or git help tutorial, and the documentation of each command with man git-<commandname> or git help <commandname>.

CVS users may also want to read Documentation/gitcvs-migration.adoc (man gitcvs-migration or git help cvs-migration if git is installed).

The user discussion and development of Git take place on the Git mailing list -- everyone is welcome to post bug reports, feature requests, comments and patches to git@vger.kernel.org (read Documentation/SubmittingPatches for instructions on patch submission and Documentation/CodingGuidelines).

Those wishing to help with error message, usage and informational message string translations (localization, l10n) should see po/README.md (a po file is a Portable Object file that holds the translations).

To subscribe to the list, send an email to git+subscribe@vger.kernel.org (see https://subspace.kernel.org/subscribing.html for details). The mailing list archives are available at https://lore.kernel.org/git/, https://marc.info/?l=git and other archival sites.

Issues which are security relevant should be disclosed privately to the Git Security mailing list git-security@googlegroups.com.

The maintainer frequently sends the "What's cooking" reports that list the current status of various development topics to the mailing list. The discussion following them gives a good reference for project status, development direction and remaining tasks.

The name "git" was given by Linus Torvalds when he wrote the very first version. He described the tool as "the stupid content tracker" and the name as (depending on your mood):

  • random three-letter combination that is pronounceable, and not actually used by any common UNIX command. The fact that it is a mispronunciation of "get" may or may not be relevant.
  • stupid. contemptible and despicable. simple. Take your pick from the dictionary of slang.
  • "global information tracker": you're in a good mood, and it actually works for you. Angels sing, and a light suddenly fills the room.
  • "goddamn idiotic truckload of sh*t": when it breaks
Description
Git Source Code Mirror - This is a publish-only repository but pull requests can be turned into patches to the mailing list via GitGitGadget (https://gitgitgadget.github.io/). Please follow Documentation/SubmittingPatches procedure for any of your improvements.