Taylor Blau 08f612ba70 builtin/pack-objects.c: freshen objects from existing cruft packs
Once an object is written into a cruft pack, we can only freshen it by
writing a new loose or packed copy of that object with a more recent
mtime.

Prior to 61568efa95 (builtin/pack-objects.c: support `--max-pack-size`
with `--cruft`, 2023-08-28), we typically had at most one cruft pack in
a repository at any given time. So freshening unreachable objects was
straightforward when already rewriting the cruft pack (and its *.mtimes
file).

But 61568efa95 changes things: 'pack-objects' now supports writing
multiple cruft packs when invoked with `--cruft` and the
`--max-pack-size` flag. Cruft packs are rewritten until they reach some
size threshold, at which point they are considered "frozen", and will
only be modified in a pruning GC, or if the threshold itself is
adjusted.
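
The freezing rule above can be pictured with a small C sketch. It is
illustrative only: 'struct cruft_pack_info', 'is_frozen()' and
'count_frozen()' are hypothetical names that do not exist in Git, and the
exact comparison against the threshold (at least as large, here) is an
assumption rather than a statement about the real selection logic.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one existing cruft pack. */
struct cruft_pack_info {
	const char *name;
	uint64_t size_bytes;
};

/*
 * A pack that has reached the threshold is treated as "frozen": a
 * non-pruning repack leaves it alone. In this sketch a threshold of 0
 * is taken to mean "no limit", so nothing is frozen.
 */
static bool is_frozen(const struct cruft_pack_info *pack, uint64_t threshold)
{
	return threshold != 0 && pack->size_bytes >= threshold;
}

/* Count how many packs would survive the repack untouched. */
static size_t count_frozen(const struct cruft_pack_info *packs, size_t nr,
			   uint64_t threshold)
{
	size_t i, frozen = 0;

	for (i = 0; i < nr; i++)
		if (is_frozen(&packs[i], threshold))
			frozen++;
	return frozen;
}
```

Every pack counted by 'count_frozen()' corresponds to a pack that would be
marked as kept in-core, which is what the rest of this message is about.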

Prior to this patch, however, this process breaks down when we attempt
to freshen an object packed in an earlier cruft pack, and that cruft
pack is larger than the threshold and thus will survive the repack.

When this is the case, it is impossible to freshen objects in cruft
pack(s) which are larger than the threshold. This is because we would
avoid writing them in the new cruft pack entirely, for a couple of
reasons.

 1. When enumerating packed objects via 'add_objects_in_unpacked_packs()'
    we pass the SKIP_IN_CORE_KEPT_PACKS flag, which is used to avoid
    looping over the packs we're going to retain (which are marked as
    kept in-core by 'read_cruft_objects()').

    This means that we will avoid enumerating additional packed copies
    of objects found in any cruft packs which are larger than the given
    size threshold. Thus there is no opportunity to call
    'create_object_entry()' whatsoever.

 2. We likewise will discard the loose copy (if one exists) of any
    unreachable object packed in a cruft pack that is larger than the
    threshold. Here our call path is 'add_unreachable_loose_objects()',
    which uses the 'add_loose_object()' callback.

    That function will eventually land us in 'want_object_in_pack()'
    (via 'add_cruft_object_entry()'), and we'll discard the object as it
    appears in one of the packs which we marked as kept in-core.

This means in effect that it is impossible to freshen an unreachable
object once it appears in a cruft pack larger than the given threshold.
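
A minimal sketch of that pre-patch rule, under stated assumptions:
'struct kept_copy' and 'want_object_before_patch()' are hypothetical
stand-ins written for this illustration, not functions from
builtin/pack-objects.c. The point is only that the mtime of the candidate
never enters the decision.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of where one existing copy of an object lives. */
struct kept_copy {
	bool in_kept_pack; /* pack is marked as kept in-core */
};

/*
 * Pre-patch rule (simplified): a single copy in any kept pack is enough
 * to discard the candidate, whether or not that pack is a cruft pack
 * and however stale its recorded mtime is.
 */
static bool want_object_before_patch(const struct kept_copy *copies, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (copies[i].in_kept_pack)
			return false;
	return true;
}
```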

Instead, we should pack an additional copy of an unreachable object we
want to freshen even if it appears in a cruft pack, provided that the
cruft copy has an mtime which is before the mtime of the copy we are
trying to pack/freshen. This is sub-optimal in the sense that it
requires keeping an additional copy of unreachable objects upon
freshening, but we don't have a better alternative without the ability
to make in-place modifications to existing *.mtimes files.

In order to implement this, we have to adjust the behavior of
'want_found_object()'. When 'pack-objects' is told that we're *not*
going to retain any cruft packs (i.e. the set of packs marked as kept
in-core does not contain a cruft pack), the behavior is unchanged.

But when there *is* at least one cruft pack that we're holding onto, it
is no longer sufficient to reject a copy of an object found in that
cruft pack for that reason alone. In this case, we only want to reject a
candidate object when a copy of that object either:

 - exists in a non-cruft pack that we are retaining, regardless of that
   pack's mtime, or

 - exists in a cruft pack with an mtime at least as recent as the copy
   we are debating whether or not to pack, in which case freshening
   would be redundant.

To do this, keep track of whether or not we have any cruft packs in our
in-core kept list with a new 'ignore_packed_keep_in_core_has_cruft'
flag. When we end up in this new special case, we replace a call to
'has_object_kept_pack()' with a call to 'want_cruft_object_mtime()', and
only reject objects when we have a copy in an existing cruft pack with
at least as recent an mtime as our candidate (in which case "freshening"
would be redundant).
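
The resulting decision can be sketched in self-contained C as follows.
This is a simplified model, not the patch itself: 'struct packed_copy' and
'want_candidate()' are hypothetical, and the 'kept_has_cruft' parameter
merely plays the role that the 'ignore_packed_keep_in_core_has_cruft' flag
plays in the description above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical view of one existing packed copy of the candidate object. */
struct packed_copy {
	bool in_kept_pack; /* pack is marked as kept in-core */
	bool is_cruft;     /* pack is a cruft pack */
	time_t mtime;      /* mtime recorded for this copy (cruft packs only) */
};

/* Return true when an additional copy of the candidate should be packed. */
static bool want_candidate(const struct packed_copy *copies, size_t nr,
			   time_t candidate_mtime, bool kept_has_cruft)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		const struct packed_copy *c = &copies[i];

		if (!c->in_kept_pack)
			continue; /* copies in packs we are not retaining do not count */
		if (!kept_has_cruft || !c->is_cruft)
			return false; /* unchanged behavior, or a retained non-cruft copy */
		if (c->mtime >= candidate_mtime)
			return false; /* cruft copy is already as fresh as the candidate */
	}
	return true; /* only older cruft copies exist, so freshen */
}
```

With a retained cruft copy recorded at mtime 100, a candidate at mtime 200
is wanted, while candidates at mtime 100 or earlier are rejected.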

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-13 11:48:04 -07:00

Git - fast, scalable, distributed revision control system

Git is a fast, scalable, distributed revision control system with an unusually rich command set that provides both high-level operations and full access to internals.

Git is an Open Source project covered by the GNU General Public License version 2 (some parts of it are under different licenses, compatible with the GPLv2). It was originally written by Linus Torvalds with the help of a group of hackers around the net.

Please read the file INSTALL for installation instructions.

Many Git online resources are accessible from https://git-scm.com/ including full documentation and Git related tools.

See Documentation/gittutorial.txt to get started, then see Documentation/giteveryday.txt for a useful minimum set of commands, and Documentation/git-<commandname>.txt for documentation of each command. If git has been correctly installed, then the tutorial can also be read with man gittutorial or git help tutorial, and the documentation of each command with man git-<commandname> or git help <commandname>.

CVS users may also want to read Documentation/gitcvs-migration.txt (man gitcvs-migration or git help cvs-migration if git is installed).

The user discussion and development of Git take place on the Git mailing list -- everyone is welcome to post bug reports, feature requests, comments and patches to git@vger.kernel.org (read Documentation/SubmittingPatches for instructions on patch submission and Documentation/CodingGuidelines).

Those wishing to help with error message, usage and informational message string translations (localization, l10n) should see po/README.md (a po file is a Portable Object file that holds the translations).

To subscribe to the list, send an email to git+subscribe@vger.kernel.org (see https://subspace.kernel.org/subscribing.html for details). The mailing list archives are available at https://lore.kernel.org/git/, https://marc.info/?l=git and other archival sites.

Issues which are security relevant should be disclosed privately to the Git Security mailing list git-security@googlegroups.com.

The maintainer frequently sends the "What's cooking" reports that list the current status of various development topics to the mailing list. The discussion following them gives a good reference for project status, development direction and remaining tasks.

The name "git" was given by Linus Torvalds when he wrote the very first version. He described the tool as "the stupid content tracker" and the name as (depending on your mood):

  • random three-letter combination that is pronounceable, and not actually used by any common UNIX command. The fact that it is a mispronunciation of "get" may or may not be relevant.
  • stupid. contemptible and despicable. simple. Take your pick from the dictionary of slang.
  • "global information tracker": you're in a good mood, and it actually works for you. Angels sing, and a light suddenly fills the room.
  • "goddamn idiotic truckload of sh*t": when it breaks
Description
Git Source Code Mirror - This is a publish-only repository but pull requests can be turned into patches to the mailing list via GitGitGadget (https://gitgitgadget.github.io/). Please follow Documentation/SubmittingPatches procedure for any of your improvements.