"git cat-file --batch" has been optimized.
* ew/cat-file-optim:
cat-file: use writev(2) if available
cat-file: batch_write: use size_t for length
cat-file: batch-command uses content_limit
object_info: content_limit only applies to blobs
packfile: packed_object_info avoids packed_to_object_type
cat-file: use delta_base_cache entries directly
packfile: inline cache_or_unpack_entry
packfile: fix off-by-one in content_limit comparison
packfile: allow content-limit for cat-file
packfile: move sizep computation
Since the `info` command in cat-file --batch-command prints object info
for a given object, it is natural to add another command in cat-file
--batch-command to print object info for a given object from a remote.
Add `remote-object-info` to cat-file --batch-command.
While `info` takes object ids one at a time, this creates overhead when
making requests to a server so `remote-object-info` instead can take
multiple object ids at once.
cat-file --batch-command is generally implemented in the following
manner:
- Receive and parse input from user
- Call respective function attached to command
- Get object info, print object info
In --buffer mode, this changes to:
- Receive and parse input from user
- Store respective function attached to command in a queue
- After flush, loop through commands in queue
- Call respective function attached to command
- Get object info, print object info
Notice how the getting and printing of object info is accomplished one
at a time. As described above, this creates a problem for making
requests to a server. Therefore, `remote-object-info` is implemented in
the following manner:
- Receive and parse input from user
If command is `remote-object-info`:
- Get object info from remote
- Loop through and print each object info
Else:
- Call respective function attached to command
- Parse input, get object info, print object info
And finally for --buffer mode `remote-object-info`:
- Receive and parse input from user
- Store respective function attached to command in a queue
- After flush, loop through commands in queue:
If command is `remote-object-info`:
- Get object info from remote
- Loop through and print each object info
Else:
- Call respective function attached to command
- Get object info, print object info
To summarize, `remote-object-info` gets object info from the remote and
then loop through the object info passed in, printing the info.
In order for remote-object-info to avoid remote communication overhead
in the non-buffer mode, the objects are passed in as such:
remote-object-info <remote> <oid> <oid> ... <oid>
rather than
remote-object-info <remote> <oid>
remote-object-info <remote> <oid>
...
remote-object-info <remote> <oid>
Helped-by: Jonathan Tan <jonathantanmy@google.com>
Helped-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Eric Ju <eric.peijian@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
The `odb_mkstemp()` and `odb_pack_keep()` functions are quite clearly
tied to the object store, but regardless of that they are located in
"environment.c". Move them over, which also helps to get rid of
dependencies on `the_repository` in the environment subsystem.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For objects already in the delta_base_cache, we can safely use
one entry at-a-time directly to avoid the malloc+memcpy+free
overhead. For a 1MB delta base object, this eliminates the
speed penalty of duplicating large objects into memory and
speeds up those 1MB delta base cached content retrievals by
roughly 30%.
While only 2-7% of objects are delta bases in repos I've looked
at, this avoids up to 96MB of duplicated memory in the worst
case with the default git config.
The new delta_base_cache_lock is a simple single-threaded
assertion to ensure cat-file (and similar) is the exclusive user
of the delta_base_cache. In other words, we cannot have diff
or similar commands using two or more entries directly from the
delta base cache. The new lock has nothing to do with parallel
access via multiple threads at the moment.
Signed-off-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Avoid unnecessary round trips to the object store to speed
up cat-file contents retrievals. The majority of packed objects
don't benefit from the streaming interface at all and we end up
having to load them in core anyways to satisfy our streaming
API.
This drops the runtime of
`git cat-file --batch-all-objects --unordered --batch' on
git.git from ~7.1s to ~6.1s on Jeff's machine.
[ew: commit message]
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To create the proper signatures for commit objects both versions of
the commit object need to be generated and signed. After that it is
a waste to throw away the work of generating the compatibility hash
so update write_object_file_flags to take a compatibility hash input
parameter that it can use to skip the work of generating the
compatability hash.
Update the places that don't generate the compatability hash to
pass NULL so it is easy to tell write_object_file_flags should
not attempt to use their compatability hash.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As part of the transition plan, we'd like to add a file in the .git
directory that maps loose objects between SHA-1 and SHA-256. Let's
implement the specification in the transition plan and store this data
on a per-repository basis in struct repository.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Various offset computation in the code that accesses the packfiles
and other data in the object layer has been hardened against
arithmetic overflow, especially on 32-bit systems.
* tb/object-access-overflow-protection:
commit-graph.c: prevent overflow in `verify_commit_graph()`
commit-graph.c: prevent overflow in `write_commit_graph()`
commit-graph.c: prevent overflow in `merge_commit_graph()`
commit-graph.c: prevent overflow in `split_graph_merge_strategy()`
commit-graph.c: prevent overflow in `load_tree_for_commit()`
commit-graph.c: prevent overflow in `fill_commit_in_graph()`
commit-graph.c: prevent overflow in `fill_commit_graph_info()`
commit-graph.c: prevent overflow in `load_oid_from_graph()`
commit-graph.c: prevent overflow in add_graph_to_chain()
commit-graph.c: prevent overflow in `write_commit_graph_file()`
pack-bitmap.c: ensure that eindex lookups don't overflow
midx.c: prevent overflow in `fill_included_packs_batch()`
midx.c: prevent overflow in `write_midx_internal()`
midx.c: store `nr`, `alloc` variables as `size_t`'s
midx.c: prevent overflow in `nth_midxed_offset()`
midx.c: prevent overflow in `nth_midxed_object_oid()`
midx.c: use `size_t`'s for fanout nr and alloc
packfile.c: use checked arithmetic in `nth_packed_object_offset()`
packfile.c: prevent overflow in `load_idx()`
packfile.c: prevent overflow in `nth_packed_object_id()`
The vast majority of files including object-store.h did not need dir.h
nor khash.h. Split the header into two files, and let most just depend
upon object-store-ll.h, while letting the two callers that need it
depend on the full object-store.h.
After this patch:
$ git grep -h include..object-store | sort | uniq -c
2 #include "object-store.h"
129 #include "object-store-ll.h"
Diff best viewed with `--color-moved`.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>