git-mirror

mirror of https://github.com/git/git.git synced 2025-12-23 12:14:22 +01:00

Author	SHA1	Message	Date
Taylor Blau	04bc973222	Merge branch 'ew/cat-file-optim' into jch "git cat-file --batch" has been optimized. * ew/cat-file-optim: cat-file: use writev(2) if available cat-file: batch_write: use size_t for length cat-file: batch-command uses content_limit object_info: content_limit only applies to blobs packfile: packed_object_info avoids packed_to_object_type cat-file: use delta_base_cache entries directly packfile: inline cache_or_unpack_entry packfile: fix off-by-one in content_limit comparison packfile: allow content-limit for cat-file packfile: move sizep computation	2024-11-01 15:40:25 -04:00
Eric Ju	999bed85bd	cat-file: add remote-object-info to batch-command Since the `info` command in cat-file --batch-command prints object info for a given object, it is natural to add another command in cat-file --batch-command to print object info for a given object from a remote. Add `remote-object-info` to cat-file --batch-command. While `info` takes object ids one at a time, this creates overhead when making requests to a server so `remote-object-info` instead can take multiple object ids at once. cat-file --batch-command is generally implemented in the following manner: - Receive and parse input from user - Call respective function attached to command - Get object info, print object info In --buffer mode, this changes to: - Receive and parse input from user - Store respective function attached to command in a queue - After flush, loop through commands in queue - Call respective function attached to command - Get object info, print object info Notice how the getting and printing of object info is accomplished one at a time. As described above, this creates a problem for making requests to a server. Therefore, `remote-object-info` is implemented in the following manner: - Receive and parse input from user If command is `remote-object-info`: - Get object info from remote - Loop through and print each object info Else: - Call respective function attached to command - Parse input, get object info, print object info And finally for --buffer mode `remote-object-info`: - Receive and parse input from user - Store respective function attached to command in a queue - After flush, loop through commands in queue: If command is `remote-object-info`: - Get object info from remote - Loop through and print each object info Else: - Call respective function attached to command - Get object info, print object info To summarize, `remote-object-info` gets object info from the remote and then loop through the object info passed in, printing the info. In order for remote-object-info to avoid remote communication overhead in the non-buffer mode, the objects are passed in as such: remote-object-info <remote> <oid> <oid> ... <oid> rather than remote-object-info <remote> <oid> remote-object-info <remote> <oid> ... remote-object-info <remote> <oid> Helped-by: Jonathan Tan <jonathantanmy@google.com> Helped-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Eric Ju <eric.peijian@gmail.com> Signed-off-by: Taylor Blau <me@ttaylorr.com>	2024-10-31 16:59:58 -04:00
Patrick Steinhardt	26b4df907b	environment: move object database functions into object layer The `odb_mkstemp()` and `odb_pack_keep()` functions are quite clearly tied to the object store, but regardless of that they are located in "environment.c". Move them over, which also helps to get rid of dependencies on `the_repository` in the environment subsystem. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-09-12 10:15:40 -07:00
Eric Wong	98521d6f04	cat-file: use delta_base_cache entries directly For objects already in the delta_base_cache, we can safely use one entry at-a-time directly to avoid the malloc+memcpy+free overhead. For a 1MB delta base object, this eliminates the speed penalty of duplicating large objects into memory and speeds up those 1MB delta base cached content retrievals by roughly 30%. While only 2-7% of objects are delta bases in repos I've looked at, this avoids up to 96MB of duplicated memory in the worst case with the default git config. The new delta_base_cache_lock is a simple single-threaded assertion to ensure cat-file (and similar) is the exclusive user of the delta_base_cache. In other words, we cannot have diff or similar commands using two or more entries directly from the delta base cache. The new lock has nothing to do with parallel access via multiple threads at the moment. Signed-off-by: Eric Wong <e@80x24.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-25 10:54:22 -07:00
Jeff King	117addcb5e	packfile: allow content-limit for cat-file Avoid unnecessary round trips to the object store to speed up cat-file contents retrievals. The majority of packed objects don't benefit from the streaming interface at all and we end up having to load them in core anyways to satisfy our streaming API. This drops the runtime of `git cat-file --batch-all-objects --unordered --batch' on git.git from ~7.1s to ~6.1s on Jeff's machine. [ew: commit message] Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Eric Wong <e@80x24.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-25 10:54:22 -07:00
Eric W. Biederman	c2538492df	object-file: add a compat_oid_in parameter to write_object_file_flags To create the proper signatures for commit objects both versions of the commit object need to be generated and signed. After that it is a waste to throw away the work of generating the compatibility hash so update write_object_file_flags to take a compatibility hash input parameter that it can use to skip the work of generating the compatability hash. Update the places that don't generate the compatability hash to pass NULL so it is easy to tell write_object_file_flags should not attempt to use their compatability hash. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-02 14:57:39 -07:00
brian m. carlson	23b2c7e95b	loose: add a mapping between SHA-1 and SHA-256 for loose objects As part of the transition plan, we'd like to add a file in the .git directory that maps loose objects between SHA-1 and SHA-256. Let's implement the specification in the transition plan and store this data on a per-repository basis in struct repository. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-02 14:57:38 -07:00
Junio C Hamano	4488bb3bed	Merge branch 'tb/object-access-overflow-protection' Various offset computation in the code that accesses the packfiles and other data in the object layer has been hardened against arithmetic overflow, especially on 32-bit systems. * tb/object-access-overflow-protection: commit-graph.c: prevent overflow in `verify_commit_graph()` commit-graph.c: prevent overflow in `write_commit_graph()` commit-graph.c: prevent overflow in `merge_commit_graph()` commit-graph.c: prevent overflow in `split_graph_merge_strategy()` commit-graph.c: prevent overflow in `load_tree_for_commit()` commit-graph.c: prevent overflow in `fill_commit_in_graph()` commit-graph.c: prevent overflow in `fill_commit_graph_info()` commit-graph.c: prevent overflow in `load_oid_from_graph()` commit-graph.c: prevent overflow in add_graph_to_chain() commit-graph.c: prevent overflow in `write_commit_graph_file()` pack-bitmap.c: ensure that eindex lookups don't overflow midx.c: prevent overflow in `fill_included_packs_batch()` midx.c: prevent overflow in `write_midx_internal()` midx.c: store `nr`, `alloc` variables as `size_t`'s midx.c: prevent overflow in `nth_midxed_offset()` midx.c: prevent overflow in `nth_midxed_object_oid()` midx.c: use `size_t`'s for fanout nr and alloc packfile.c: use checked arithmetic in `nth_packed_object_offset()` packfile.c: prevent overflow in `load_idx()` packfile.c: prevent overflow in `nth_packed_object_id()`	2023-07-25 12:05:23 -07:00
Elijah Newren	a034e9106f	object-store-ll.h: split this header out of object-store.h The vast majority of files including object-store.h did not need dir.h nor khash.h. Split the header into two files, and let most just depend upon object-store-ll.h, while letting the two callers that need it depend on the full object-store.h. After this patch: $ git grep -h include..object-store \| sort \| uniq -c 2 #include "object-store.h" 129 #include "object-store-ll.h" Diff best viewed with `--color-moved`. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-06-21 13:39:54 -07:00

9 Commits