Commit Graph

1645 Commits

Author SHA1 Message Date
Junio C Hamano
5139fce01f Merge branch 'jc/diff-from-contents-fix'
The code to squelch output from "git diff -w --name-status"
etc. for paths that "git diff -w -p" would have stayed silent
leaked output from dry-run patch generation, which has been
corrected.

* jc/diff-from-contents-fix:
  diff: make sure the other caller of diff_flush_patch_quietly() is silent
2025-10-24 09:10:37 -07:00
Junio C Hamano
88b3704ab1 Merge branch 'jk/diff-from-contents-fix'
Recently we attempted to improve "git diff -w" and friends to
handle cases where patch output would be suppressed, but it
introduced a bug that emits unnecessary output, which has been
corrected.

* jk/diff-from-contents-fix:
  diff: restore redirection to /dev/null for diff_from_contents
2025-10-24 09:10:37 -07:00
Junio C Hamano
3da4413dbc diff: make sure the other caller of diff_flush_patch_quietly() is silent
Earlier, we added is a protection for the loop that computes "git
diff --quiet -w" to ensure calls to the diff_flush_patch_quietly()
helper stays quiet.  Do the same for another loop that deals with
options like "--name-status" to make calls to the same helper.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-23 09:05:28 -07:00
Jeff King
623f7af284 diff: restore redirection to /dev/null for diff_from_contents
In --quiet mode, since we produce only an exit code for "something was
changed" and no actual output, we can often get by with just a
tree-level diff. However, certain options require us to actually look at
the file contents (e.g., if we are ignoring whitespace changes). We have
a flag "diff_from_contents" for that, and if it is set we call
diff_flush() on each path.

To avoid producing any output (since we were asked to be --quiet), we
traditionally just redirected the output to /dev/null. That changed in
b55e6d36eb (diff: ensure consistent diff behavior with ignore options,
2025-08-08), which replaced that with a "dry_run" flag. In theory, with
dry_run set, we should produce no output. But it carries a risk of
regression: if we forget to respect dry_run in any of the output paths,
we'll accidentally produce output.

And indeed, there is at least one such regression in that commit, as it
covered only the case where we actually call into xdiff, and not
creation or deletion diffs, where we manually generate the headers. We
even test this case in t4035, but only with diff-tree, which does not
show the bug by default because it does not require diff_from_contents.
But git-diff does, because it allows external diff programs by default
(so we must dig into each diff filepair to decide if it requires running
an external diff that may declare two distinct blobs to actually be the
same).

We should fix all of those code paths to respect dry_run correctly, but
in the meantime we can protect ourselves more fully by restoring the
redirection to /dev/null. This gives us an extra layer of protection
against regressions dues to other code paths we've missed.

Though the original issue was reported with "git diff" (and due to its
default of --ext-diff), I've used "diff-tree -w" in the new test. It
triggers the same issue, but I think the fact that "-w" implies
diff_from_contents is a bit more obvious, and fits in with the rest of
t4035.

Reported-by: Jake Zimmerman <jake@zimmerman.io>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-17 11:41:50 -07:00
Junio C Hamano
a89fa2fff2 Merge branch 'jk/color-variable-fixes'
Some places in the code confused a variable that is *not* a boolean
to enable color but is an enum that records what the user requested
to do about color.  A couple of bugs of this sort have been fixed,
while the code has been cleaned up to prevent similar bugs in the
future.

* jk/color-variable-fixes:
  config: store want_color() result in a separate bool
  add-interactive: retain colorbool values longer
  color: return bool from want_color()
  color: use git_colorbool enum type to store colorbools
  pretty: use format_commit_context.auto_color as colorbool
  diff: stop passing ecbdata->use_color as boolean
  diff: pass o->use_color directly to fill_metainfo()
  diff: don't use diff_options.use_color as a strict bool
  diff: simplify color_moved check when flushing
  grep: don't treat grep_opt.color as a strict bool
  color: return enum from git_config_colorbool()
  color: use GIT_COLOR_* instead of numeric constants
2025-09-29 11:40:35 -07:00
Jeff King
e9330ae4b8 color: use git_colorbool enum type to store colorbools
We traditionally used "int" to store and pass around the values defined
by "enum git_colorbool" (which were originally just #define macros).
Using an int doesn't produce incorrect results, but using the actual
enum makes the intent of the code more clear.

It would be nice if the compiler could catch cases where we used the
enum and an int interchangeably, since it's very easy to accidentally
check the boolean true/false of a colorbool like:

  if (branch_use_color)

This is wrong because GIT_COLOR_UNKNOWN and GIT_COLOR_AUTO evaluate to
true in C, even though we may ultimately decide not to use color. But C
is pretty happy to convert between ints and enums (even with various
-Wenum-* warnings). So this sadly doesn't protect us from such mistakes,
but it hopefully does make the code easier to read.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-16 17:59:53 -07:00
Jeff King
955000d917 diff: stop passing ecbdata->use_color as boolean
In emit_hunk_header(), we evaluate ecbdata->color_diff both as a
git_colorbool, passing it to diff_get_color():

  const char *reset = diff_get_color(ecbdata->color_diff, DIFF_RESET);

and as a strict boolean:

  const char *reverse = ecbdata->color_diff ? GIT_COLOR_REVERSE : "";

At first glance this seems wrong. Usually we store the color decision as
a git_colorbool, so the second line would get confused by GIT_COLOR_AUTO
(which is boolean true, but may still mean we do not produce color).

However, the second line is correct because our caller sets color_diff
using want_color(), which collapses the colorbool to a strict true/false
boolean. The first line is _also_ correct because of the idempotence of
want_color(). Even though diff_get_color() will pass our true/false
value through want_color() again, the result will be left untouched.

But let's pass through the colorbool itself, which makes it more
consistent with the rest of the diff code. We'll need to then call
want_color() whenever we treat it as a boolean, but there is only such
spot (the one quoted above).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-16 13:37:06 -07:00
Jeff King
12df3c2e99 diff: pass o->use_color directly to fill_metainfo()
We pass the use_color parameter of fill_metainfo() as a strict boolean,
using:

  want_color(o->use_color) && !pgm

to derive its value. But then inside the function, we pass it to
diff_get_color(), which expects one of the git_colorbool enum values,
and so feeds it to want_color() again.

Even though want_color() produces a strict 0/1 boolean, this doesn't
produce wrong results because want_color() is idempotent. Since
GIT_COLOR_ALWAYS and NEVER are defined as 1 and 0, and because
want_color() passes through those values, evaluating "want_color(foo)"
and "want_color(want_color(foo))" will return the same result.

But as part of a longer strategy to align the types we use for storing
these values, let's pass through the colorbool directly. To handle the
"&&" case here, we'll convert the presence of "pgm" into "NEVER", which
arguably makes the intent of the code more clear anyway.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-16 13:37:06 -07:00
Jeff King
4cfc971a2b diff: don't use diff_options.use_color as a strict bool
We disable --color-moved if color is not in use at all. This happens in
diff_setup_done(), where we set options->color_moved to 0 if
options->use_color is not true. But a strict boolean check here is not
correct; use_color could be GIT_COLOR_UNKNOWN or GIT_COLOR_AUTO, both of
which evaluate to true, even though we may later decide not to show
colors.

We should be using want_color() to convert that git_colorbool into a
true boolean. As it turns out, this does not produce wrong output. Even
though we go to the trouble to detect the moved lines, ultimately we get
the color values from diff_get_color(), which does check want_color().
And so it returns the empty string for each color, and we "color" the
result with nothing.

So the output is correct, but there is a small but measurable
performance cost to doing the line detection. E.g., in git.git before
and after this patch (there are no colors shown because hyperfine
redirects output to /dev/null):

  Benchmark 1: ./git.old log --no-merges -p --color-moved -1000
    Time (mean ± σ):      1.019 s ±  0.013 s    [User: 0.955 s, System: 0.064 s]
    Range (min … max):    1.005 s …  1.045 s    10 runs

  Benchmark 2: ./git.new log --no-merges -p --color-moved -1000
    Time (mean ± σ):     982.9 ms ±  14.5 ms    [User: 925.8 ms, System: 57.1 ms]
    Range (min … max):   965.1 ms … 1003.2 ms    10 runs

  Summary
    ./git.new log --no-merges -p --color-moved -1000 ran
      1.04 ± 0.02 times faster than ./git.old log --no-merges -p --color-moved -1000

Note that the fix is not quite as simple as just calling want_color()
from diff_setup_done(). There's a subtle timing issue that goes back to
daa0c3d971 (color: delay auto-color decision until point of use,
2011-08-17), the commit that adds want_color() in the first place.  As
discussed there, we must delay evaluating the colorbool value until all
pager setup is complete.

So instead, we'll leave the "color_moved" field intact in diff_setup_done(),
and modify the point where it is evaluated. Fortunately there is only
one such spot that controls whether we run any of the color-moved code
at all.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-16 13:37:05 -07:00
Jeff King
8efe643e0e diff: simplify color_moved check when flushing
In diff_flush_patch_all_file_pairs(), we set o->emitted_symbols if and
only if o->color_moved is true. That causes the lower-level routines to
fill up o->emitted_symbols, which we then analyze in order to do the
actual colorizing.

But in that final step, we do:

  if (o->emitted_symbols) {
          if (o->color_moved) {
	     ...actual coloring...
	  }
	  ...clean up of emitted_symbols...
  }

The inner "if" will always trigger, since we set emitted_symbols only
when doing color_moved (it is a little confusing that it is set inside
the diff_options struct, but that is for convenience of passing it to
the lower-level routines; we always clear it at the end of flushing,
since 48edf3a02a (diff: clear emitted_symbols flag after use,
2019-01-24)).

Let's simplify the code a bit by just dropping the inner "if" and
running its block unconditionally.

In theory the current code might be useful if another feature besides
color_moved setup and used emitted_symbols, but it would be easy to
refactor later to handle that. And in the meantime, this makes further
work in this area easier.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-16 13:37:05 -07:00
Jeff King
3c3e9b8303 color: use GIT_COLOR_* instead of numeric constants
Long ago Git's decision to show color for a subsytem was stored in a
tri-state variable: it could be true (1), false (0), or unknown (-1).
But since daa0c3d971 (color: delay auto-color decision until point of
use, 2011-08-17) we want to carry around a new state, "auto", which
bases the decision on the tty-ness of stdout (rather than collapsing
that "auto" state to a true/false immediately).

That commit introduced a set of GIT_COLOR_* defines to represent each
state: UNKNOWN, ALWAYS, NEVER, and AUTO. But it only used the AUTO
value, and left alone code using bare 0/1/-1 values. And of course since
then we've grown many new spots that use those bare values.

Let's switch all of these to use the named constants. That should make
the code a bit easier to read, as it is more obvious that we're
representing a color decision.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-16 13:37:03 -07:00
Junio C Hamano
109c3df14c Merge branch 'tc/diff-tree-max-depth'
"git diff-tree" learned "--max-depth" option.

* tc/diff-tree-max-depth:
  diff: teach tree-diff a max-depth parameter
  within_depth: fix return for empty path
  combine-diff: zero memory used for callback filepairs
2025-08-25 14:22:01 -07:00
Junio C Hamano
244214e9b6 Merge branch 'ly/diff-name-only-with-diff-from-content'
Various options to "git diff" that makes comparison ignore certain
aspects of the differences (like "space changes are ignored",
"differences in lines that match these regular expressions are
ignored") did not work well with "--name-only" and friends.

* ly/diff-name-only-with-diff-from-content:
  diff: ensure consistent diff behavior with ignore options
2025-08-22 13:13:22 -07:00
Lidong Yan
b55e6d36eb diff: ensure consistent diff behavior with ignore options
In git-diff, options like `-w` and `-I<regex>`, two files are considered
equivalent under the specified "ignore" rules, even when they are not
bit-for-bit identical. For options like `--raw`, `--name-status`,
and `--name-only`, git-diff deliberately compares only the SHA values
to determine whether two files are equivalent, for performance reasons.
As a result, a file shown in `git diff --name-status` may not appear
in `git diff --patch`.

To quickly determine whether two files are equivalent, add a helper
function diff_flush_patch_quietly() in diff.c. Add `.dry_run` field in
`struct diff_options`. When `.dry_run` is true, builtin_diff() returns
immediately upon finding any change. Call diff_flush_patch_quietly()
to determine if we should flush `--raw`, `--name-only` or `--name-status`
output.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Lidong Yan <yldhome2d2@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-08 07:54:44 -07:00
Jeff King
a1dfa5448d diff: teach tree-diff a max-depth parameter
When you are doing a tree-diff, there are basically two options: do not
recurse into subtrees at all, or recurse indefinitely. While most
callers would want to always recurse and see full pathnames, some may
want the efficiency of looking only at a particular level of the tree.
This is currently easy to do for the top-level (just turn off
recursion), but you cannot say "show me what changed in subdir/, but do
not recurse".

This patch adds a max-depth parameter which is measured from the closest
pathspec match, so that you can do:

  git log --raw --max-depth=1 -- a/b/c

and see the raw output for a/b/c/, but not those of a/b/c/d/
(instead of the raw output you would see for a/b/c/d).

Co-authored-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-07 15:29:35 -07:00
Junio C Hamano
f3a303aef0 diff: simplify parsing of diff.colormovedws
The code to parse this configuration variable, whose value is a
comma-separated list of known tokens like "ignore-space-change" and
"ignore-all-space", uses string_list_split() to split the value into
pieces, and then places each piece of string in a strbuf to trim,
before comparing the result with the list of known tokens.

Thanks to the previous steps, now string_list_split() can trim the
resulting pieces before it places them in the string list.  Use it
to simplify the code.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-02 22:34:45 -07:00
Junio C Hamano
9f6dfe43c8 string-list: align string_list_split() with its _in_place() counterpart
The string_list_split_in_place() function was updated by 52acddf3
(string-list: multi-delimiter `string_list_split_in_place()`,
2023-04-24) to take more than one delimiter characters, hoping that
we can later use it to replace our uses of strtok().  We however did
not make a matching change to the string_list_split() function,
which is very similar.

Before giving both functions more features in future commits, allow
string_list_split() to also take more than one delimiter characters
to make them closer to each other.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-02 22:29:27 -07:00
Patrick Steinhardt
e989dd96b8 odb: rename oid_object_info()
Rename `oid_object_info()` to `odb_read_object_info()` as well as their
`_extended()` variant to match other functions related to the object
database and our modern coding guidelines.

Introduce compatibility wrappers so that any in-flight topics will
continue to compile.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01 14:46:37 -07:00
Patrick Steinhardt
8f49151763 object-store: rename files to "odb.{c,h}"
In the preceding commits we have renamed the structures contained in
"object-store.h" to `struct object_database` and `struct odb_backend`.
As such, the code files "object-store.{c,h}" are confusingly named now.
Rename them to "odb.{c,h}" accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01 14:46:34 -07:00
Junio C Hamano
349083805e Merge branch 'js/diff-codeql-false-positive-workaround'
Work around false positive given by CodeQL.

* js/diff-codeql-false-positive-workaround:
  diff: check range before dereferencing an array element
2025-05-08 12:36:32 -07:00
Johannes Schindelin
104add8368 diff: check range before dereferencing an array element
Before accessing an array element at a given index, it should be
verified that the index is within the desired bounds, not afterwards,
otherwise it may not make sense to even access the array element in the
first place. This is the point of CodeQL's
`cpp/offset-use-before-range-check` rule.

This CodeQL rule unfortunately is also triggered by the
`fill_es_indent_data()` code, even though the condition `off < len - 1`
does not even need to guarantee that the offset is in bounds (`s` points
to a NUL-terminated string, for which `s[off] == '\r'` would fail before
running out of bounds).

Let's work around this rare false positive to help us use an otherwise
mostly useful tool is a worthy thing to do.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-29 12:38:34 -07:00
Junio C Hamano
2bc5414c41 Merge branch 'ps/parse-options-integers'
Update parse-options API to catch mistakes to pass address of an
integral variable of a wrong type/size.

* ps/parse-options-integers:
  parse-options: detect mismatches in integer signedness
  parse-options: introduce precision handling for `OPTION_UNSIGNED`
  parse-options: introduce precision handling for `OPTION_INTEGER`
  parse-options: rename `OPT_MAGNITUDE()` to `OPT_UNSIGNED()`
  parse-options: support unit factors in `OPT_INTEGER()`
  global: use designated initializers for options
  parse: fix off-by-one for minimum signed values
2025-04-24 17:25:34 -07:00
Patrick Steinhardt
d012ceb5f3 global: use designated initializers for options
While we expose macros for most of our different option types understood
by the "parse-options" subsystem, not every combination of fields that
has one as that would otherwise quickly lead to an explosion of macros.
Instead, we just initialize structures manually for those variants of
fields that don't have a macro.

Callsites that open-code these structure initialization don't use
designated initializers though and instead just provide values for each
of the fields that they want to initialize. This has three significant
downsides:

  - Callsites need to specify all values up to the last field that they
    care about. This often includes fields that should simply be left at
    their default zero-initialized state, which adds distraction.

  - Any reader not deeply familiar with the layout of the structure
    has a hard time figuring out what the respective initializers mean.

  - Reordering or introducing new fields in the middle of the structure
    is impossible without adapting all callsites.

Convert all sites to instead use designated initializers, which we have
started using in our codebase quite a while ago. This allows us to skip
any default-initialized fields, gives the reader context by specifying
the field names and allows us to reorder or introduce new fields where
we want to.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-17 08:15:15 -07:00
Patrick Steinhardt
68cd492a3e object-store: merge "object-store-ll.h" and "object-store.h"
The "object-store-ll.h" header has been introduced to keep transitive
header dependendcies and compile times at bay. Now that we have created
a new "object-store.c" file though we can easily move the last remaining
additional bit of "object-store.h", the `odb_path_map`, out of the
header.

Do so. As the "object-store.h" header is now equivalent to its low-level
alternative we drop the latter and inline it into the former.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-15 08:24:37 -07:00
Junio C Hamano
0dfca98881 Merge branch 'ps/object-wo-the-repository' into ps/object-file-cleanup
* ps/object-wo-the-repository:
  hash: stop depending on `the_repository` in `null_oid()`
  hash: fix "-Wsign-compare" warnings
  object-file: split out logic regarding hash algorithms
  delta-islands: stop depending on `the_repository`
  object-file-convert: stop depending on `the_repository`
  pack-bitmap-write: stop depending on `the_repository`
  pack-revindex: stop depending on `the_repository`
  pack-check: stop depending on `the_repository`
  environment: move access to "core.bigFileThreshold" into repo settings
  pack-write: stop depending on `the_repository` and `the_hash_algo`
  object: stop depending on `the_repository`
  csum-file: stop depending on `the_repository`
2025-04-08 14:28:17 -07:00
Patrick Steinhardt
7d70b29c4f hash: stop depending on the_repository in null_oid()
The `null_oid()` function returns the object ID that only consists of
zeroes. Naturally, this ID also depends on the hash algorithm used, as
the number of zeroes is different between SHA1 and SHA256. Consequently,
the function returns the hash-algorithm-specific null object ID.

This is currently done by depending on `the_hash_algo`, which implicitly
makes us depend on `the_repository`. Refactor the function to instead
pass in the hash algorithm for which we want to retrieve the null object
ID. Adapt callsites accordingly by passing in `the_repository`, thus
bubbling up the dependency on that global variable by one layer.

There are a couple of trivial exceptions for subsystems that already got
rid of `the_repository`. These subsystems instead use the repository
that is available via the calling context:

  - "builtin/grep.c"
  - "grep.c"
  - "refs/debug.c"

There are also two non-trivial exceptions:

  - "diff-no-index.c": Here we know that we may not have a repository
    initialized at all, so we cannot rely on `the_repository`. Instead,
    we adapt `diff_no_index()` to get a `struct git_hash_algo` as
    parameter. The only caller is located in "builtin/diff.c", where we
    know to call `repo_set_hash_algo()` in case we're running outside of
    a Git repository. Consequently, it is fine to continue passing
    `the_repository->hash_algo` even in this case.

  - "builtin/ls-files.c": There is an in-flight patch series that drops
    `USE_THE_REPOSITORY_VARIABLE` in this file, which causes a semantic
    conflict because we use `null_oid()` in `show_submodule()`. The
    value is passed to `repo_submodule_init()`, which may use the object
    ID to resolve a tree-ish in the superproject from which we want to
    read the submodule config. As such, the object ID should refer to an
    object in the superproject, and consequently we need to use its hash
    algorithm.

    This means that we could in theory just not bother about this edge
    case at all and just use `the_repository` in "diff-no-index.c". But
    doing so would feel misdesigned.

Remove the `USE_THE_REPOSITORY_VARIABLE` preprocessor define in
"hash.c".

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-10 13:16:20 -07:00
Patrick Steinhardt
7835ee75cd environment: move access to "core.bigFileThreshold" into repo settings
The "core.bigFileThreshold" setting is stored in a global variable and
populated via `git_default_core_config()`. This may cause issues in
the case where one is handling multiple different repositories in a
single process with different values for that config key, as we may or
may not see the correct value in that case. Furthermore, global state
blocks our path towards libification.

Refactor the code so that we instead store the value in `struct
repo_settings`, where the value is computed as-needed and cached.

Note that this change requires us to adapt one test in t1050 that
verifies that we die when parsing an invalid "core.bigFileThreshold"
value. The exercised Git command doesn't use the value at all, and thus
it won't hit the new code path that parses the value. This is addressed
by using git-hash-object(1) instead, which does read the value.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-10 13:16:18 -07:00
Justin Tobler
c8a8e04099 diff: add option to skip resolving diff statuses
By default, `diffcore_std()` resolves the statuses for queued diff file
pairs by calling `diff_resolve_rename_copy()`. If status information is
already manually set, invoking `diffcore_std()` may change the status
value.

Introduce the `skip_resolving_statuses` diff option that prevents
`diffcore_std()` from resolving file pair statuses when enabled.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-03 08:17:47 -08:00
Justin Tobler
7c67d2a070 diff: return diff_filepair from diff queue helpers
The `diff_addremove()` and `diff_change()` functions set up and queue
diffs, but do not return the `diff_filepair` added to the queue. In a
subsequent commit, modifications to `diff_filepair` need to occur in
certain cases after being queued.

Since the existing `diff_addremove()` and `diff_change()` are also used
for callbacks in `diff_options` as types `add_remove_fn_t` and
`change_fn_t`, modifying the existing function signatures requires
further changes. The diff options for pruning use `file_add_remove()`
and `file_change()` where file pairs do not even get queued. Thus,
separate functions are implemented instead.

Split out the queuing operations into `diff_queue_addremove()` and
`diff_queue_change()` which also return a handle to the queued
`diff_filepair`. Both `diff_addremove()` and `diff_change()` are
reimplemented as thin wrappers around the new functions.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-03 08:17:47 -08:00
Junio C Hamano
a8a5bb1f78 Merge branch 'bc/diff-reject-empty-arg-to-pickaxe'
The -G/-S options to the "diff" family of commands caused us to hit
a BUG() when they get no values; they have been corrected.

* bc/diff-reject-empty-arg-to-pickaxe:
  diff: don't crash with empty argument to -G or -S
2025-02-25 14:19:35 -08:00
brian m. carlson
a620046b29 diff: don't crash with empty argument to -G or -S
The pickaxe options, -G and -S, need either a regex or a string to look
through the history for.  An empty value isn't very useful since it
would either match everything or nothing, and what's worse, we presently
crash with a BUG like so when the user provides one:

    BUG: diffcore-pickaxe.c:241: should have needle under -G or -S

Since it's not very nice of us to crash and this wouldn't do anything
useful anyway, let's simply inform the user that they must provide a
non-empty argument and exit with an error if they provide an empty one
instead.

Reported-by: Jared Van Bortel <cebtenzzre@gmail.com>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-18 10:17:02 -08:00
Patrick Steinhardt
0578f1e66a global: adapt callers to use generic hash context helpers
Adapt callers to use generic hash context helpers instead of using the
hash algorithm to update them. This makes the callsites easier to reason
about and removes the possibility that the wrong hash algorithm is used
to update the hash context's state. And as a nice side effect this also
gets rid of a bunch of users of `the_hash_algo`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-31 10:06:11 -08:00
Patrick Steinhardt
7346e340f1 hash: stop typedeffing the hash context
We generally avoid using `typedef` in the Git codebase. One exception
though is the `git_hash_ctx`, likely because it used to be a union
rather than a struct until the preceding commit refactored it. But now
that it is a normal `struct` there isn't really a need for a typedef
anymore.

Drop the typedef and adapt all callers accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-31 10:06:10 -08:00
Junio C Hamano
4156b6a741 Merge branch 'ps/build-sign-compare'
Start working to make the codebase buildable with -Wsign-compare.

* ps/build-sign-compare:
  t/helper: don't depend on implicit wraparound
  scalar: address -Wsign-compare warnings
  builtin/patch-id: fix type of `get_one_patchid()`
  builtin/blame: fix type of `length` variable when emitting object ID
  gpg-interface: address -Wsign-comparison warnings
  daemon: fix type of `max_connections`
  daemon: fix loops that have mismatching integer types
  global: trivial conversions to fix `-Wsign-compare` warnings
  pkt-line: fix -Wsign-compare warning on 32 bit platform
  csum-file: fix -Wsign-compare warning on 32-bit platform
  diff.h: fix index used to loop through unsigned integer
  config.mak.dev: drop `-Wno-sign-compare`
  global: mark code units that generate warnings with `-Wsign-compare`
  compat/win32: fix -Wsign-compare warning in "wWinMain()"
  compat/regex: explicitly ignore "-Wsign-compare" warnings
  git-compat-util: introduce macros to disable "-Wsign-compare" warnings
2024-12-23 09:32:11 -08:00
Patrick Steinhardt
59b6131a67 pager: stop using the_repository
Stop using `the_repository` in the "pager" subsystem by passing in a
repository when setting up the pager and when configuring it.

Adjust callers accordingly by using `the_repository`. While there may be
some callers that have a repository available in their context, this
trivial conversion allows for easier verification and bubbles up the use
of `the_repository` by one level.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-18 10:44:30 -08:00
Junio C Hamano
913a1e157c Merge branch 'ps/build-sign-compare' into ps/the-repository
* ps/build-sign-compare:
  t/helper: don't depend on implicit wraparound
  scalar: address -Wsign-compare warnings
  builtin/patch-id: fix type of `get_one_patchid()`
  builtin/blame: fix type of `length` variable when emitting object ID
  gpg-interface: address -Wsign-comparison warnings
  daemon: fix type of `max_connections`
  daemon: fix loops that have mismatching integer types
  global: trivial conversions to fix `-Wsign-compare` warnings
  pkt-line: fix -Wsign-compare warning on 32 bit platform
  csum-file: fix -Wsign-compare warning on 32-bit platform
  diff.h: fix index used to loop through unsigned integer
  config.mak.dev: drop `-Wno-sign-compare`
  global: mark code units that generate warnings with `-Wsign-compare`
  compat/win32: fix -Wsign-compare warning in "wWinMain()"
  compat/regex: explicitly ignore "-Wsign-compare" warnings
  git-compat-util: introduce macros to disable "-Wsign-compare" warnings
2024-12-18 10:43:16 -08:00
Patrick Steinhardt
41f43b8243 global: mark code units that generate warnings with -Wsign-compare
Mark code units that generate warnings with `-Wsign-compare`. This
allows for a structured approach to get rid of all such warnings over
time in a way that can be easily measured.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-06 20:20:02 +09:00
Karthik Nayak
cc656f4eb2 packfile: pass down repository to has_object[_kept]_pack
The functions `has_object[_kept]_pack` currently rely on the global
variable `the_repository`. To eliminate global variable usage in
`packfile.c`, we should progressively shift the dependency on
the_repository to higher layers. Let's remove its usage from these
functions and any related ones.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-04 08:21:54 +09:00
Junio C Hamano
3eb4cc451e Merge branch 'jk/output-prefix-cleanup'
Code clean-up.

* jk/output-prefix-cleanup:
  diff: store graph prefix buf in git_graph struct
  diff: return line_prefix directly when possible
  diff: return const char from output_prefix callback
  diff: drop line_prefix_length field
  line-log: use diff_line_prefix() instead of custom helper
2024-10-10 14:22:30 -07:00
Jeff King
436728fe9d diff: return const char from output_prefix callback
The diff_options structure has an output_prefix callback for returning a
prefix string, but it does so by returning a pointer to a strbuf.

This makes the interface awkward. There's no reason the callback should
need to use a strbuf, and it creates questions about whether the
ownership of the resulting buffer should be transferred to the caller
(it should not be, but a recent attempt to clean up this code led to a
double-free in some cases).

The one advantage we get is that the strbuf contains a ptr/len pair, so
we could in theory have a prefix with embedded NULs. But we can observe
that none of the existing callbacks would ever produce such a NUL (they
are usually just indentation or graph symbols, and even the
"--line-prefix" option takes a NUL-terminated string).

And anyway, only one caller (the one in log_tree_diff_flush) actually
looks at the strbuf length. In every other case we use a helper function
which discards the length and just returns the NUL-terminated string.

So let's just have the callback return a "const char *" pointer. It's up
to the callbacks themselves if they want to use a strbuf under the hood.
And now the caller in log_tree_diff_flush() can just use the helper
function along with everybody else. That lets us even simplify out the
function pointer check, since the helper returns an empty string
(technically this does mean we'll sometimes issue an empty fputs() call,
but I don't think this code path is hot enough to care about that).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-10-03 14:22:22 -07:00
Jeff King
2011bb4f34 diff: drop line_prefix_length field
The diff_options structure holds a line_prefix string and an associated
length. But the length is always just the strlen() of the NUL-terminated
string. Let's simplify the code by just storing the string pointer and
assuming it is NUL-terminated when we use it.

This will cause us to compute the string length in a few extra spots,
but I don't think any of these are particularly hot code paths.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-10-03 14:22:21 -07:00
Patrick Steinhardt
a5aecb2cdc diff: improve lifecycle management of diff queues
The lifecycle management of diff queues is somewhat confusing:

  - For most of the part this can be attributed to `DIFF_QUEUE_CLEAR()`,
    which does not release any memory but rather initializes the queue,
    only. This is in contrast to our common naming schema, where
    "clearing" means that we release underlying memory and then
    re-initialize the data structure such that it is ready to use.

  - A second offender is `diff_free_queue()`, which does not free the
    queue structure itself. It is rather a release-style function.

Refactor the code to make things less confusing. `DIFF_QUEUE_CLEAR()` is
replaced by `DIFF_QUEUE_INIT` and `diff_queue_init()`, while
`diff_free_queue()` is replaced by `diff_queue_release()`. While on it,
adapt callsites where we call `DIFF_QUEUE_CLEAR()` with the intent to
release underlying memory to instead call `diff_queue_clear()` to fix
memory leaks.

This memory leak is exposed by t4211, but plugging it alone does not
make the whole test suite pass.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-30 11:23:05 -07:00
Junio C Hamano
674e46fdd5 Merge branch 'ps/leakfixes-part-7' into ps/leakfixes-part-8
* ps/leakfixes-part-7: (23 commits)
  diffcore-break: fix leaking filespecs when merging broken pairs
  revision: fix leaking parents when simplifying commits
  builtin/maintenance: fix leak in `get_schedule_cmd()`
  builtin/maintenance: fix leaking config string
  promisor-remote: fix leaking partial clone filter
  grep: fix leaking grep pattern
  submodule: fix leaking submodule ODB paths
  trace2: destroy context stored in thread-local storage
  builtin/difftool: plug several trivial memory leaks
  builtin/repack: fix leaking configuration
  diffcore-order: fix leaking buffer when parsing orderfiles
  parse-options: free previous value of `OPTION_FILENAME`
  diff: fix leaking orderfile option
  builtin/pull: fix leaking "ff" option
  dir: fix off by one errors for ignored and untracked entries
  builtin/submodule--helper: fix leaking remote ref on errors
  t/helper: fix leaking subrepo in nested submodule config helper
  builtin/submodule--helper: fix leaking error buffer
  builtin/submodule--helper: clear child process when not running it
  submodule: fix leaking update strategy
  ...
2024-09-30 11:22:10 -07:00
Patrick Steinhardt
76c7e708bb diff: fix leaking orderfile option
The `orderfile` diff option is being assigned via `OPT_FILENAME()`,
which assigns an allocated string to the variable. We never free it
though, causing a memory leak.

Change the type of the string to `char *` and free it to plug the leak.
This also requires us to use `xstrdup()` to assign the global config to
it in case it is set.

This leak is being hit in t7621, but plugging it alone does not make the
test suite pass.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-27 08:25:35 -07:00
Junio C Hamano
f92c61aef0 Merge branch 'rs/diff-exit-code-binary'
"git diff --exit-code" ignored modified binary files, which has
been corrected.

* rs/diff-exit-code-binary:
  diff: report modified binary files as changes in builtin_diff()
2024-09-25 18:24:52 -07:00
René Scharfe
9a41735af6 diff: report modified binary files as changes in builtin_diff()
The diff machinery has two ways to detect changes to set the exit code:
Just comparing hashes and comparing blob contents.  The latter is needed
if certain changes have to be ignored, e.g. with --ignore-space-change
or --ignore-matching-lines.  It's enabled by the diff_options flag
diff_from_contents.

The code for handling binary files added by 1aaf69e669 (diff: shortcut
for diff'ing two binary SHA-1 objects, 2014-08-16) always uses a quick
hash-only comparison, even if the slow way is taken.  We need it to
report a hash difference as a change for the purpose of setting the
exit code, though, but it never did.  Fix that.

d7b97b7185 (diff: let external diffs report that changes are
uninteresting, 2024-06-09) set diff_from_contents if external diff
programs are allowed.  This is the default e.g. for git diff, and so
that change exposed the inconsistency much more widely.

Reported-by: Kohei Shibata <shiba200712@gmail.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-23 09:41:07 -07:00
Junio C Hamano
be8ca2848a Merge branch 'jc/range-diff-lazy-setup'
Code clean-up.

* jc/range-diff-lazy-setup:
  remerge-diff: clean up temporary objdir at a central place
  remerge-diff: lazily prepare temporary objdir on demand
2024-09-16 14:22:55 -07:00
Junio C Hamano
41390eb3e6 Merge branch 'rs/diff-exit-code-fix'
In a few corner cases "git diff --exit-code" failed to report
"changes" (e.g., renamed without any content change), which has
been corrected.

* rs/diff-exit-code-fix:
  diff: report dirty submodules as changes in builtin_diff()
  diff: report copies and renames as changes in run_diff_cmd()
2024-09-13 15:27:43 -07:00
René Scharfe
11591850dd diff: report dirty submodules as changes in builtin_diff()
The diff machinery has two ways to detect changes to set the exit code:
Just comparing hashes and comparing blob contents.  The latter is needed
if certain changes have to be ignored, e.g. with --ignore-space-change
or --ignore-matching-lines.  It's enabled by the diff_options flag
diff_from_contents.

The slower mode as never considered submodules (and subrepos) as changes
with --submodule=diff or --submodule=log, which is inconsistent with
--submodule=short (the default).  Fix it.

d7b97b7185 (diff: let external diffs report that changes are
uninteresting, 2024-06-09) set diff_from_contents if external diff
programs are allowed.  This is the default e.g. for git diff, and so
that change exposed the inconsistency much more widely.

Reported-by: David Hull <david.hull@friendbuy.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-08 13:21:24 -07:00
René Scharfe
87cf96094a diff: report copies and renames as changes in run_diff_cmd()
The diff machinery has two ways to detect changes to set the exit code:
Just comparing hashes and comparing blob contents.  The latter is needed
if certain changes have to be ignored, e.g. with --ignore-space-change
or --ignore-matching-lines.  It's enabled by the diff_options flag
diff_from_contents.

The slower mode has never considered copies and renames to be changes,
which is inconsistent with the quicker one.  Fix it.  Even if we ignore
the file contents (because it's empty or contains only ignored lines),
there's still the meta data change of adding or changing a filename, so
we need to report it in the exit code.

d7b97b7185 (diff: let external diffs report that changes are
uninteresting, 2024-06-09) set diff_from_contents if external diff
programs are allowed.  This is the default e.g. for git diff, and so
that change exposed the inconsistency much more widely.

Reported-by: Jorge Luis Martinez Gomez <jol@jol.dev>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-08 13:21:23 -07:00