index-pack, unpack-objects: increase input buffer from 4 KiB to 128 KiB

index-pack and unpack-objects both read pack data from stdin through
a 4 KiB static buffer. In index-pack, each fill() flushes consumed
bytes to the pack file via write_or_die(), capping every write(2)
at 4 KiB. unpack-objects uses the same buffer pattern for reads.

On FUSE-backed filesystems every write(2) is a synchronous round
trip through the FUSE protocol (userspace -> kernel -> userspace ->
back), so the 4 KiB buffer turns a clone into many unnecessary tiny
writes with noticeable latency overhead.

Increase the buffer from 4 KiB to 128 KiB. Introduce a shared
DEFAULT_IO_BUFFER_SIZE constant in git-compat-util.h (next to
MAX_IO_SIZE) and use it in index-pack, unpack-objects, and the
hashfile layer in csum-file (which already used 128 KiB but
hardcoded the value).

Number of write(2) calls to the pack file on a FUSE filesystem with
writeback caching disabled, during HTTPS clones of git/git (~293 MB
pack):

  74,958 -> 4,687 (94% fewer)

Wall-clock time of git clone over HTTPS onto a FUSE passthrough
filesystem with writeback caching disabled, 3 runs per variant:

  vscode (~1.26 GB pack): 84.5s -> 75.7s avg (10% faster)
  git/git (~306 MB pack):  22.6s -> 20.0s avg (11% faster)

Signed-off-by: Scott Bauersfeld <sbauersfeld@g.ucla.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Scott Bauersfeld
2026-04-28 14:47:40 +00:00
committed by Junio C Hamano
parent 94f057755b
commit 007062ab4e
4 changed files with 9 additions and 5 deletions
+1 -2
@@ -145,8 +145,7 @@ static int check_self_contained_and_connected;
 static struct progress *progress;
-/* We always read in 4kB chunks. */
-static unsigned char input_buffer[4096];
+static unsigned char input_buffer[DEFAULT_IO_BUFFER_SIZE];
 static unsigned int input_offset, input_len;
 static off_t consumed_bytes;
 static off_t max_input_size;
+1 -2
@@ -23,8 +23,7 @@
 static int dry_run, quiet, recover, has_errors, strict;
 static const char unpack_usage[] = "git unpack-objects [-n] [-q] [-r] [--strict]";
-/* We always read in 4kB chunks. */
-static unsigned char buffer[4096];
+static unsigned char buffer[DEFAULT_IO_BUFFER_SIZE];
 static unsigned int offset, len;
 static off_t consumed_bytes;
 static off_t max_input_size;
+1 -1
@@ -178,7 +178,7 @@ struct hashfile *hashfd_ext(const struct git_hash_algo *algop,
 f->algop = unsafe_hash_algo(algop);
 f->algop->init_fn(&f->ctx);
-f->buffer_len = opts->buffer_len ? opts->buffer_len : 128 * 1024;
+f->buffer_len = opts->buffer_len ? opts->buffer_len : DEFAULT_IO_BUFFER_SIZE;
 f->buffer = xmalloc(f->buffer_len);
 f->check_buffer = NULL;
+6
@@ -712,6 +712,12 @@ static inline uint64_t u64_add(uint64_t a, uint64_t b)
 # endif
 #endif
+/*
+ * Default buffer size for buffered I/O in index-pack, unpack-objects,
+ * and the hashfile layer in csum-file.
+ */
+#define DEFAULT_IO_BUFFER_SIZE (128 * 1024)
 #ifdef HAVE_ALLOCA_H
 # include <alloca.h>
 # define xalloca(size) (alloca(size))