Commit Graph

1667 Commits

Author SHA1 Message Date
Junio C Hamano
3b212a83fe Merge branch 'jc/whitespace-incomplete-line'
Both "git apply" and "git diff" learn a new whitespace error class,
"incomplete-line".

* jc/whitespace-incomplete-line:
  attr: enable incomplete-line whitespace error for this project
  diff: highlight and error out on incomplete lines
  apply: check and fix incomplete lines
  whitespace: allocate a few more bits and define WS_INCOMPLETE_LINE
  apply: revamp the parsing of incomplete lines
  diff: update the way rewrite diff handles incomplete lines
  diff: call emit_callback ecbdata everywhere
  diff: refactor output of incomplete line
  diff: keep track of the type of the last line seen
  diff: correct suppress_blank_empty hack
  diff: emit_line_ws_markup() if/else style fix
  whitespace: correct bit assignment comments
2025-11-30 18:31:40 -08:00
Junio C Hamano
1b93acd13a Merge branch 'ad/blame-diff-algorithm'
"git blame" learns "--diff-algorithm=<algo>" option.

* ad/blame-diff-algorithm:
  blame: make diff algorithm configurable
  xdiff: add 'minimal' to XDF_DIFF_ALGORITHM_MASK
2025-11-26 10:32:40 -08:00
Junio C Hamano
3176576a56 Merge branch 'rs/diff-quiet-no-rename'
As "git diff --quiet" only cares about the existence of any
changes, disable rename/copy detection to skip more expensive
processing whose result will be discarded anyway.

* rs/diff-quiet-no-rename:
  diff: disable rename detection with --quiet
2025-11-21 09:14:15 -08:00
Antonin Delpeuch
881793c4f7 xdiff: add 'minimal' to XDF_DIFF_ALGORITHM_MASK
The XDF_DIFF_ALGORITHM_MASK bit mask only includes bits for the patience
and histogram diffs, not for the minimal one. This means that when
reseting the diff algorithm to the default one, one needs to separately
clear the bit for the minimal diff. There are places in the code that fail
to do that: merge-ort.c and builtin/merge-file.c.

Add the XDF_NEED_MINIMAL bit to the bit mask, and remove the separate
clearing of this bit in the places where it hasn't been forgotten.

Signed-off-by: Antonin Delpeuch <antonin@delpeuch.eu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-17 09:31:59 -08:00
Junio C Hamano
ab2693cb52 diff: highlight and error out on incomplete lines
Teach "git diff" to highlight "\ No newline at end of file" message
as a whitespace error when incomplete-line whitespace error class is
in effect.  Thanks to the previous refactoring of complete rewrite
code path, we can do this at a single place.

Unlike whitespace errors in the payload where we need to annotate in
line, possibly using colors, the line that has whitespace problems,
we have a dedicated line already that can serve as the error
message, so paint it as a whitespace error message.

Also teach "git diff --check" to notice incomplete lines as
whitespace errors and report when incomplete-line whitespace error
class is in effect.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-12 14:04:05 -08:00
Junio C Hamano
a675104c39 whitespace: allocate a few more bits and define WS_INCOMPLETE_LINE
Reserve a few more bits in the diff flags word to be used for future
whitespace rules.  Add WS_INCOMPLETE_LINE without implementing the
behaviour (yet).

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-12 14:04:04 -08:00
Junio C Hamano
8d8e3c6187 diff: update the way rewrite diff handles incomplete lines
The diff_symbol based output framework uses one DIFF_SYMBOL_* enum
value per the kind of output lines of "git diff", which corresponds
to one output line from the xdiff machinery used internally.  Most
notably, DIFF_SYMBOL_PLUS and DIFF_SYMBOL_MINUS that correspond to
"+" and "-" lines are designed to always take a complete line, even
if the output from xdiff machinery may produce "\ No newline at the
end of file" immediately after them.

But this is not true in the rewrite-diff codepath, which completely
bypasses the xdiff machinery.  Since the code path feeds the bytes
directly from the payload to the output routines, the output layer
has to deal with an incomplete line with DIFF_SYMBOL_PLUS and
DIFF_SYMBOL_MINUS, which never would see an incomplete line in the
normal code paths.  This lack of final newline is compensated by an
ugly hack for a fabricated DIFF_SYMBOL_NO_LF_EOF token to inject an
extra newline to the output to simulate output coming from the xdiff
machinery.

Revamp the way the complete-rewrite code path feeds the lines to the
output layer by treating the last line of the pre/post image when it
is an incomplete line specially.

This lets us remove the DIFF_SYMBOL_NO_LF_EOF hack and use the usual
DIFF_SYMBOL_CONTEXT_INCOMPLETE code path, which will later learn how
to handle whitespace errors.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-12 14:04:04 -08:00
Junio C Hamano
35925f1832 diff: call emit_callback ecbdata everywhere
Everybody else, except for emit_rewrite_lines(), calls the
emit_callback data ecbdata.  Make sure we call the same thing by
the same name for consistency.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-12 14:04:04 -08:00
Junio C Hamano
29228cbdc5 diff: refactor output of incomplete line
Create a helper function that reacts to "\ No newline at the end of
file" in preparation for unifying the incomplete line handling in
the code path that handles xdiff output and the code path that
bypasses xdiff and produces a complete-rewrite patch.

Currently the output from the DIFF_SYMBOL_CONTEXT_INCOMPLETE case
still (ab)uses the same code as what is used for context lines, but
that would change in a later step where we introduce support to treat
an incomplete line as a whitespace error.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-12 14:04:04 -08:00
Junio C Hamano
ced0561828 diff: keep track of the type of the last line seen
The "\ No newline at the end of the file" can come after any of the
"-" (deleted preimage line), " " (unchanged line), or "+" (added
postimage line).  In later steps in this series, we will start
treating a change that makes a file to end in an incomplete line
as a whitespace error, and we would need to know what the previous
line was when we react to "\ No newline" in the diff output.  If
the previous line was a context (i.e., unchanged) line, the file
lacked the final newline before the change, and the change did not
touch that line and left it still incomplete, so we do not want to
warn in such a case.

Teach fn_out_consume() function to keep track of what the previous
line was, and prepare an otherwise empty switch statement to let us
react differently to "\ No newline" based on that.

Note that there is an existing curiosity (read: likely to be a bug)
in the code that increments line number in the preimage file every
time it sees a line with "\ No newline" on it, regardless of what
the previous line was.  I left it as-is, because it does not affect
the main theme of this series, and more importantly, I do not think
it matters, as these numbers are used only to compare them with
blank_at_eof_in_{pre,post}image to issue a warning when we see more
empty line was added at the end, but by definition, after we see
"\ No newline at the end of the file" for an added line, we will not
see an added line for the file.

An independent audit to ensure that this curious increment can be
safely removed would make a good #leftoverbits clean-up (we may even
find some code that decrements this counter or over-increments the
other quantity this counter is compared with that compensates the
effect of this curious increment that hides a bug, in which case we
may also need to remove them).

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-12 14:04:04 -08:00
Junio C Hamano
fc7abcd9d5 diff: correct suppress_blank_empty hack
The suppress-blank-empty feature abused the CONTEXT_INCOMPLETE
symbol that was meant to be used only for "\ No newline at the end
of file" code path.

The intent of the feature was to turn a context line we receive from
xdiff machinery (which always uses ' ' for context lines, even an
empty one) and spit it out as a truly empty line.

Perform such a conversion very locally at where a line from xdiff
that begins with ' ' is handled for output; there are many checks
before the control reaches such place that checks the first letter
of the diff output line to see if it is a context line, and having
to check for '\n' and treat it as a special case is error prone.

In order to catch similar hacks in the future, make sure the code
path that is meant for "\ No newline" case checks the first byte is
indeed a backslash.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-12 14:04:04 -08:00
Junio C Hamano
f83d1afafb diff: emit_line_ws_markup() if/else style fix
Apply the simple rule: if you need {} in one arm of the if/else
if/else... cascade, have {} in all of them.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-12 14:04:04 -08:00
Junio C Hamano
8d4725e48e whitespace: correct bit assignment comments
A comment in diff.c claimed that bits up to 12th (counting from 0th)
are whitespace rules, and 13th thru 15th are for new/old/context,
but it turns out it was miscounting.  Correct them, and clarify
where the whitespace rule bits come from in the comment.  Extend bit
assignment comments to cover bits used for color-moved, which
weren't described.

Also update the way these bit constants are defined to use (1 << N)
notation, instead of octal constants, as it tends to make it easier
to notice a breakage like this.

Sprinkle a few blank lines between logically distinct groups of CPP
macro definitions to make them easier to read.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-12 14:04:04 -08:00
René Scharfe
fa052367ef diff: disable rename detection with --quiet
Detecting renames and copies improves diff's output.  This effort is
wasted if we don't show any.  Disable detection in that case.

This actually fixes the error code when using the options --cached,
--find-copies-harder, --no-ext-diff and --quiet together:
run_diff_index() indirectly calls diff-lib.c::show_modified(), which
queues even non-modified entries using diff_change() because we need
them for copy detection.  diff_change() sets flags.has_changes, though,
which causes diff_can_quit_early() to declare we're done after seeing
only the very first entry -- way too soon.

Using --cached, --find-copies-harder and --quiet together without
--no-ext-diff was not affected even before, as it causes the flag
flags.diff_from_contents to be set, which disables the optimization
in a different way.

Reported-by: D. Ben Knoble <ben.knoble@gmail.com>
Suggested-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-10 11:23:57 -08:00
Jeff King
2ecb8857e7 diff: simplify run_external_diff() quiet logic
We'd sometimes end up in run_external_diff() to do a dry-run diff (e.g.,
to find content-level changes for --quiet). We recognize this quiet mode
by seeing the lack of DIFF_FORMAT_PATCH in the output format.

But since introducing an explicit dry-run check via 3ed5d8bd73 (diff:
stop output garbled message in dry run mode, 2025-10-20), this logic can
never trigger. We can only get to this function by calling
diff_flush_patch(), and that comes from only two places:

  1. A dry-run flush comes from diff_flush_patch_quietly(), which is
     always in dry-run mode (so the other half of our "||" is true
     anyway).

  2. A regular flush comes from diff_flush_patch_all_file_pairs(),
     which is only called when output_format has DIFF_FORMAT_PATCH in
     it.

So we can simplify our "quiet" condition to just checking dry-run mode
(which used to be a specific flag, but recently became just a NULL
"file" pointer). And since it's so simple, we can just do that inline.
This makes the logic about o->file more obvious, since we handle the
NULL and non-stdout cases next to each other.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-24 10:38:58 -07:00
Jeff King
1ad2760020 diff: drop dry-run redirection to /dev/null
As an added protection against dry-run diffs accidentally producing
output, we redirect diff_options.file to /dev/null. But as of the
previous patch, this now does nothing, since dry-run diffs are
implemented by setting "file" to NULL.

So we can drop this extra code with no change in behavior. This is
effectively a revert of 623f7af284 (diff: restore redirection to
/dev/null for diff_from_contents, 2025-10-17) and 3da4413dbc (diff: make
sure the other caller of diff_flush_patch_quietly() is silent,
2025-10-22), but:

  1. We get a conflict because we already dropped the color_moved
     handling in an earlier patch. But we just resolve the conflicts to
     "theirs" (removing all of the code).

  2. We retain the test from 623f7af284.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-24 10:15:22 -07:00
Jeff King
b2b5ad514d diff: replace diff_options.dry_run flag with NULL file
We introduced a dry_run flag to diff_options in b55e6d36eb (diff: ensure
consistent diff behavior with ignore options, 2025-08-08), with the idea
that the lower-level diff code could skip output when it is set.

As we saw with the bugs fixed by 3ed5d8bd73 (diff: stop output garbled
message in dry run mode, 2025-10-20), it is easy to miss spots. In the
end, we located all of them by checking where diff_options.file is used.

That suggests another possible approach: we can replace the dry_run
boolean with a NULL pointer for "file", as we know that using "file" in
dry_run mode would always be an error. This turns any missed spots from
producing extra output[1] into a segfault. Which is less forgiving, but
that is the point: this is indicative of a programming error, and
complaining loudly and immediately is good.

[1] We protect ourselves against garbled output as a separate step,
    courtesy of 623f7af284 (diff: restore redirection to /dev/null for
    diff_from_contents, 2025-10-17). So in that sense this patch can
    only introduce user-visible errors (since any "bugs" were going to
    /dev/null before), but the idea is to catch them rather than quietly
    send garbage to /dev/null.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-24 10:15:22 -07:00
Jeff King
0152831d96 diff: drop save/restore of color_moved in dry-run mode
When running a dry-run content-level diff to check whether a "--quiet"
diff has any changes, we have always unset the color_moved variable
since the feature was added in 2e2d5ac184 (diff.c: color moved lines
differently, 2017-06-30). The reasoning is not given explicitly there,
but presumably the idea is that since color_moved requires a lot of
extra computation to match lines but does not actually affect the
found_changes flag, we want to skip it.

Later, in 3da4413dbc (diff: make sure the other caller of
diff_flush_patch_quietly() is silent, 2025-10-22) we copied the same
idea for other dry-run diffs.

But neither spot actually needs to reset this flag at all, because
diff_flush_patch() will not ever compute color_moved. Nor could it, as
it is only looking at a single file-pair, and we detect moves across
files. So color_moved is checked only when we are actually doing real
DIFF_FORMAT_PATCH output, and call diff_flush_patch_all_file_pairs().

So we can get rid of these extra lines to save and restore the
color_moved flag without changing the behavior at all. (Note that there
is no "restore" to drop for the second caller, as we know at that point
we are not generating any output and can just leave the feature
disabled).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-24 10:15:21 -07:00
Jeff King
57c2b6cc86 diff: send external diff output to diff_options.file
Diff output usually goes to the process stdout, but it can be redirected
with the "--output" option. We store this in the "file" pointer of
diff_options, and all of the diff code should write there instead of to
stdout.

But there's one spot we missed: running an external diff cmd. We don't
redirect its output at all, so it just defaults to the stdout of the
parent process. We should instead point its stdout at our output file.
There are a few caveats to watch out for when doing so:

  - The stdout field takes a descriptor, not a FILE pointer. We can pull
    out the descriptor with fileno().

  - The run-command API always closes the stdout descriptor we pass to
    it. So we must duplicate it (otherwise we break the FILE pointer,
    since it now points to a closed descriptor).

  - We don't need to worry about closing our dup'd descriptor, since the
    point is that run-command will do it for us (even in the case of an
    error). But we do need to make sure we skip the dup() if we set
    no_stdout (because then run-command will not look at it at all).

  - When the output is going to stdout, it would not be wrong to dup()
    the descriptor, but we don't need to. We can skip that extra work
    with a simple pointer comparison.

  - It seems like you'd need to fflush() the descriptor before handing
    off a copy to the child process to prevent out-of-order writes. But
    that was true even before this patch! It works because run-command
    always calls fflush(NULL) before running the child.

The new test shows the breakage (and fix). The need for duplicating the
descriptor doesn't need a new test; that is covered by the later test
"GIT_EXTERNAL_DIFF with more than one changed files".

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-24 10:15:21 -07:00
Junio C Hamano
5139fce01f Merge branch 'jc/diff-from-contents-fix'
The code to squelch output from "git diff -w --name-status"
etc. for paths that "git diff -w -p" would have stayed silent
leaked output from dry-run patch generation, which has been
corrected.

* jc/diff-from-contents-fix:
  diff: make sure the other caller of diff_flush_patch_quietly() is silent
2025-10-24 09:10:37 -07:00
Junio C Hamano
88b3704ab1 Merge branch 'jk/diff-from-contents-fix'
Recently we attempted to improve "git diff -w" and friends to
handle cases where patch output would be suppressed, but it
introduced a bug that emits unnecessary output, which has been
corrected.

* jk/diff-from-contents-fix:
  diff: restore redirection to /dev/null for diff_from_contents
2025-10-24 09:10:37 -07:00
Lidong Yan
3ed5d8bd73 diff: stop output garbled message in dry run mode
Earlier, b55e6d36 (diff: ensure consistent diff behavior with
ignore options, 2025-08-08) introduced "dry-run" mode to the
diff machinery so that content-based diff filtering (like
ignoring space changes or those that match -I<regex>) can first
try to produce a patch without emitting any output to see if
under the given diff filtering condition we would get any output
lines, and a new helper function diff_flush_patch_quietly() was
introduced to use the mode to see an individual filepair needs
to be shown.

However, the solution was not complete. When files are deleted,
file modes change, or there are unmerged entries in the index,
dry-run mode still produces output because we overlooked these
conditions, and as a result, dry-run mode was not quiet.

To fix this, return early in emit_diff_symbol_from_struct() if
we are in dry-run mode. This function will be called by all the
emit functions to output the results. Returning early can avoid
diff output when files are deleted or file modes are changed.
Stop print message in dry-run mode if we have unmerged entries
in index. Discard output of external diff tool in dry-run mode.

Signed-off-by: Lidong Yan <yldhome2d2@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-23 09:06:52 -07:00
Junio C Hamano
0adac327a7 Merge branch 'jc/diff-from-contents-fix' into ly/diff-name-only-with-diff-from-content
* jc/diff-from-contents-fix:
  diff: make sure the other caller of diff_flush_patch_quietly() is silent
2025-10-23 09:06:29 -07:00
Junio C Hamano
3da4413dbc diff: make sure the other caller of diff_flush_patch_quietly() is silent
Earlier, we added is a protection for the loop that computes "git
diff --quiet -w" to ensure calls to the diff_flush_patch_quietly()
helper stays quiet.  Do the same for another loop that deals with
options like "--name-status" to make calls to the same helper.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-23 09:05:28 -07:00
Junio C Hamano
45b5ae65e8 Merge branch 'jk/diff-from-contents-fix' into ly/diff-name-only-with-diff-from-content
* jk/diff-from-contents-fix:
  diff: restore redirection to /dev/null for diff_from_contents
2025-10-22 12:58:50 -07:00
Jeff King
623f7af284 diff: restore redirection to /dev/null for diff_from_contents
In --quiet mode, since we produce only an exit code for "something was
changed" and no actual output, we can often get by with just a
tree-level diff. However, certain options require us to actually look at
the file contents (e.g., if we are ignoring whitespace changes). We have
a flag "diff_from_contents" for that, and if it is set we call
diff_flush() on each path.

To avoid producing any output (since we were asked to be --quiet), we
traditionally just redirected the output to /dev/null. That changed in
b55e6d36eb (diff: ensure consistent diff behavior with ignore options,
2025-08-08), which replaced that with a "dry_run" flag. In theory, with
dry_run set, we should produce no output. But it carries a risk of
regression: if we forget to respect dry_run in any of the output paths,
we'll accidentally produce output.

And indeed, there is at least one such regression in that commit, as it
covered only the case where we actually call into xdiff, and not
creation or deletion diffs, where we manually generate the headers. We
even test this case in t4035, but only with diff-tree, which does not
show the bug by default because it does not require diff_from_contents.
But git-diff does, because it allows external diff programs by default
(so we must dig into each diff filepair to decide if it requires running
an external diff that may declare two distinct blobs to actually be the
same).

We should fix all of those code paths to respect dry_run correctly, but
in the meantime we can protect ourselves more fully by restoring the
redirection to /dev/null. This gives us an extra layer of protection
against regressions dues to other code paths we've missed.

Though the original issue was reported with "git diff" (and due to its
default of --ext-diff), I've used "diff-tree -w" in the new test. It
triggers the same issue, but I think the fact that "-w" implies
diff_from_contents is a bit more obvious, and fits in with the rest of
t4035.

Reported-by: Jake Zimmerman <jake@zimmerman.io>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-17 11:41:50 -07:00
Junio C Hamano
a89fa2fff2 Merge branch 'jk/color-variable-fixes'
Some places in the code confused a variable that is *not* a boolean
to enable color but is an enum that records what the user requested
to do about color.  A couple of bugs of this sort have been fixed,
while the code has been cleaned up to prevent similar bugs in the
future.

* jk/color-variable-fixes:
  config: store want_color() result in a separate bool
  add-interactive: retain colorbool values longer
  color: return bool from want_color()
  color: use git_colorbool enum type to store colorbools
  pretty: use format_commit_context.auto_color as colorbool
  diff: stop passing ecbdata->use_color as boolean
  diff: pass o->use_color directly to fill_metainfo()
  diff: don't use diff_options.use_color as a strict bool
  diff: simplify color_moved check when flushing
  grep: don't treat grep_opt.color as a strict bool
  color: return enum from git_config_colorbool()
  color: use GIT_COLOR_* instead of numeric constants
2025-09-29 11:40:35 -07:00
Jeff King
e9330ae4b8 color: use git_colorbool enum type to store colorbools
We traditionally used "int" to store and pass around the values defined
by "enum git_colorbool" (which were originally just #define macros).
Using an int doesn't produce incorrect results, but using the actual
enum makes the intent of the code more clear.

It would be nice if the compiler could catch cases where we used the
enum and an int interchangeably, since it's very easy to accidentally
check the boolean true/false of a colorbool like:

  if (branch_use_color)

This is wrong because GIT_COLOR_UNKNOWN and GIT_COLOR_AUTO evaluate to
true in C, even though we may ultimately decide not to use color. But C
is pretty happy to convert between ints and enums (even with various
-Wenum-* warnings). So this sadly doesn't protect us from such mistakes,
but it hopefully does make the code easier to read.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-16 17:59:53 -07:00
Jeff King
955000d917 diff: stop passing ecbdata->use_color as boolean
In emit_hunk_header(), we evaluate ecbdata->color_diff both as a
git_colorbool, passing it to diff_get_color():

  const char *reset = diff_get_color(ecbdata->color_diff, DIFF_RESET);

and as a strict boolean:

  const char *reverse = ecbdata->color_diff ? GIT_COLOR_REVERSE : "";

At first glance this seems wrong. Usually we store the color decision as
a git_colorbool, so the second line would get confused by GIT_COLOR_AUTO
(which is boolean true, but may still mean we do not produce color).

However, the second line is correct because our caller sets color_diff
using want_color(), which collapses the colorbool to a strict true/false
boolean. The first line is _also_ correct because of the idempotence of
want_color(). Even though diff_get_color() will pass our true/false
value through want_color() again, the result will be left untouched.

But let's pass through the colorbool itself, which makes it more
consistent with the rest of the diff code. We'll need to then call
want_color() whenever we treat it as a boolean, but there is only such
spot (the one quoted above).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-16 13:37:06 -07:00
Jeff King
12df3c2e99 diff: pass o->use_color directly to fill_metainfo()
We pass the use_color parameter of fill_metainfo() as a strict boolean,
using:

  want_color(o->use_color) && !pgm

to derive its value. But then inside the function, we pass it to
diff_get_color(), which expects one of the git_colorbool enum values,
and so feeds it to want_color() again.

Even though want_color() produces a strict 0/1 boolean, this doesn't
produce wrong results because want_color() is idempotent. Since
GIT_COLOR_ALWAYS and NEVER are defined as 1 and 0, and because
want_color() passes through those values, evaluating "want_color(foo)"
and "want_color(want_color(foo))" will return the same result.

But as part of a longer strategy to align the types we use for storing
these values, let's pass through the colorbool directly. To handle the
"&&" case here, we'll convert the presence of "pgm" into "NEVER", which
arguably makes the intent of the code more clear anyway.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-16 13:37:06 -07:00
Jeff King
4cfc971a2b diff: don't use diff_options.use_color as a strict bool
We disable --color-moved if color is not in use at all. This happens in
diff_setup_done(), where we set options->color_moved to 0 if
options->use_color is not true. But a strict boolean check here is not
correct; use_color could be GIT_COLOR_UNKNOWN or GIT_COLOR_AUTO, both of
which evaluate to true, even though we may later decide not to show
colors.

We should be using want_color() to convert that git_colorbool into a
true boolean. As it turns out, this does not produce wrong output. Even
though we go to the trouble to detect the moved lines, ultimately we get
the color values from diff_get_color(), which does check want_color().
And so it returns the empty string for each color, and we "color" the
result with nothing.

So the output is correct, but there is a small but measurable
performance cost to doing the line detection. E.g., in git.git before
and after this patch (there are no colors shown because hyperfine
redirects output to /dev/null):

  Benchmark 1: ./git.old log --no-merges -p --color-moved -1000
    Time (mean ± σ):      1.019 s ±  0.013 s    [User: 0.955 s, System: 0.064 s]
    Range (min … max):    1.005 s …  1.045 s    10 runs

  Benchmark 2: ./git.new log --no-merges -p --color-moved -1000
    Time (mean ± σ):     982.9 ms ±  14.5 ms    [User: 925.8 ms, System: 57.1 ms]
    Range (min … max):   965.1 ms … 1003.2 ms    10 runs

  Summary
    ./git.new log --no-merges -p --color-moved -1000 ran
      1.04 ± 0.02 times faster than ./git.old log --no-merges -p --color-moved -1000

Note that the fix is not quite as simple as just calling want_color()
from diff_setup_done(). There's a subtle timing issue that goes back to
daa0c3d971 (color: delay auto-color decision until point of use,
2011-08-17), the commit that adds want_color() in the first place.  As
discussed there, we must delay evaluating the colorbool value until all
pager setup is complete.

So instead, we'll leave the "color_moved" field intact in diff_setup_done(),
and modify the point where it is evaluated. Fortunately there is only
one such spot that controls whether we run any of the color-moved code
at all.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-16 13:37:05 -07:00
Jeff King
8efe643e0e diff: simplify color_moved check when flushing
In diff_flush_patch_all_file_pairs(), we set o->emitted_symbols if and
only if o->color_moved is true. That causes the lower-level routines to
fill up o->emitted_symbols, which we then analyze in order to do the
actual colorizing.

But in that final step, we do:

  if (o->emitted_symbols) {
          if (o->color_moved) {
	     ...actual coloring...
	  }
	  ...clean up of emitted_symbols...
  }

The inner "if" will always trigger, since we set emitted_symbols only
when doing color_moved (it is a little confusing that it is set inside
the diff_options struct, but that is for convenience of passing it to
the lower-level routines; we always clear it at the end of flushing,
since 48edf3a02a (diff: clear emitted_symbols flag after use,
2019-01-24)).

Let's simplify the code a bit by just dropping the inner "if" and
running its block unconditionally.

In theory the current code might be useful if another feature besides
color_moved setup and used emitted_symbols, but it would be easy to
refactor later to handle that. And in the meantime, this makes further
work in this area easier.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-16 13:37:05 -07:00
Jeff King
3c3e9b8303 color: use GIT_COLOR_* instead of numeric constants
Long ago Git's decision to show color for a subsytem was stored in a
tri-state variable: it could be true (1), false (0), or unknown (-1).
But since daa0c3d971 (color: delay auto-color decision until point of
use, 2011-08-17) we want to carry around a new state, "auto", which
bases the decision on the tty-ness of stdout (rather than collapsing
that "auto" state to a true/false immediately).

That commit introduced a set of GIT_COLOR_* defines to represent each
state: UNKNOWN, ALWAYS, NEVER, and AUTO. But it only used the AUTO
value, and left alone code using bare 0/1/-1 values. And of course since
then we've grown many new spots that use those bare values.

Let's switch all of these to use the named constants. That should make
the code a bit easier to read, as it is more obvious that we're
representing a color decision.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-16 13:37:03 -07:00
Junio C Hamano
109c3df14c Merge branch 'tc/diff-tree-max-depth'
"git diff-tree" learned "--max-depth" option.

* tc/diff-tree-max-depth:
  diff: teach tree-diff a max-depth parameter
  within_depth: fix return for empty path
  combine-diff: zero memory used for callback filepairs
2025-08-25 14:22:01 -07:00
Junio C Hamano
244214e9b6 Merge branch 'ly/diff-name-only-with-diff-from-content'
Various options to "git diff" that makes comparison ignore certain
aspects of the differences (like "space changes are ignored",
"differences in lines that match these regular expressions are
ignored") did not work well with "--name-only" and friends.

* ly/diff-name-only-with-diff-from-content:
  diff: ensure consistent diff behavior with ignore options
2025-08-22 13:13:22 -07:00
Lidong Yan
b55e6d36eb diff: ensure consistent diff behavior with ignore options
In git-diff, options like `-w` and `-I<regex>`, two files are considered
equivalent under the specified "ignore" rules, even when they are not
bit-for-bit identical. For options like `--raw`, `--name-status`,
and `--name-only`, git-diff deliberately compares only the SHA values
to determine whether two files are equivalent, for performance reasons.
As a result, a file shown in `git diff --name-status` may not appear
in `git diff --patch`.

To quickly determine whether two files are equivalent, add a helper
function diff_flush_patch_quietly() in diff.c. Add `.dry_run` field in
`struct diff_options`. When `.dry_run` is true, builtin_diff() returns
immediately upon finding any change. Call diff_flush_patch_quietly()
to determine if we should flush `--raw`, `--name-only` or `--name-status`
output.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Lidong Yan <yldhome2d2@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-08 07:54:44 -07:00
Jeff King
a1dfa5448d diff: teach tree-diff a max-depth parameter
When you are doing a tree-diff, there are basically two options: do not
recurse into subtrees at all, or recurse indefinitely. While most
callers would want to always recurse and see full pathnames, some may
want the efficiency of looking only at a particular level of the tree.
This is currently easy to do for the top-level (just turn off
recursion), but you cannot say "show me what changed in subdir/, but do
not recurse".

This patch adds a max-depth parameter which is measured from the closest
pathspec match, so that you can do:

  git log --raw --max-depth=1 -- a/b/c

and see the raw output for a/b/c/, but not those of a/b/c/d/
(instead of the raw output you would see for a/b/c/d).

Co-authored-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-07 15:29:35 -07:00
Junio C Hamano
f3a303aef0 diff: simplify parsing of diff.colormovedws
The code to parse this configuration variable, whose value is a
comma-separated list of known tokens like "ignore-space-change" and
"ignore-all-space", uses string_list_split() to split the value into
pieces, and then places each piece of string in a strbuf to trim,
before comparing the result with the list of known tokens.

Thanks to the previous steps, now string_list_split() can trim the
resulting pieces before it places them in the string list.  Use it
to simplify the code.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-02 22:34:45 -07:00
Junio C Hamano
9f6dfe43c8 string-list: align string_list_split() with its _in_place() counterpart
The string_list_split_in_place() function was updated by 52acddf3
(string-list: multi-delimiter `string_list_split_in_place()`,
2023-04-24) to take more than one delimiter characters, hoping that
we can later use it to replace our uses of strtok().  We however did
not make a matching change to the string_list_split() function,
which is very similar.

Before giving both functions more features in future commits, allow
string_list_split() to also take more than one delimiter characters
to make them closer to each other.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-02 22:29:27 -07:00
Patrick Steinhardt
e989dd96b8 odb: rename oid_object_info()
Rename `oid_object_info()` to `odb_read_object_info()` as well as their
`_extended()` variant to match other functions related to the object
database and our modern coding guidelines.

Introduce compatibility wrappers so that any in-flight topics will
continue to compile.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01 14:46:37 -07:00
Patrick Steinhardt
8f49151763 object-store: rename files to "odb.{c,h}"
In the preceding commits we have renamed the structures contained in
"object-store.h" to `struct object_database` and `struct odb_backend`.
As such, the code files "object-store.{c,h}" are confusingly named now.
Rename them to "odb.{c,h}" accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01 14:46:34 -07:00
Junio C Hamano
349083805e Merge branch 'js/diff-codeql-false-positive-workaround'
Work around false positive given by CodeQL.

* js/diff-codeql-false-positive-workaround:
  diff: check range before dereferencing an array element
2025-05-08 12:36:32 -07:00
Johannes Schindelin
104add8368 diff: check range before dereferencing an array element
Before accessing an array element at a given index, it should be
verified that the index is within the desired bounds, not afterwards,
otherwise it may not make sense to even access the array element in the
first place. This is the point of CodeQL's
`cpp/offset-use-before-range-check` rule.

This CodeQL rule unfortunately is also triggered by the
`fill_es_indent_data()` code, even though the condition `off < len - 1`
does not even need to guarantee that the offset is in bounds (`s` points
to a NUL-terminated string, for which `s[off] == '\r'` would fail before
running out of bounds).

Let's work around this rare false positive to help us use an otherwise
mostly useful tool is a worthy thing to do.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-29 12:38:34 -07:00
Junio C Hamano
2bc5414c41 Merge branch 'ps/parse-options-integers'
Update parse-options API to catch mistakes to pass address of an
integral variable of a wrong type/size.

* ps/parse-options-integers:
  parse-options: detect mismatches in integer signedness
  parse-options: introduce precision handling for `OPTION_UNSIGNED`
  parse-options: introduce precision handling for `OPTION_INTEGER`
  parse-options: rename `OPT_MAGNITUDE()` to `OPT_UNSIGNED()`
  parse-options: support unit factors in `OPT_INTEGER()`
  global: use designated initializers for options
  parse: fix off-by-one for minimum signed values
2025-04-24 17:25:34 -07:00
Patrick Steinhardt
d012ceb5f3 global: use designated initializers for options
While we expose macros for most of our different option types understood
by the "parse-options" subsystem, not every combination of fields that
has one as that would otherwise quickly lead to an explosion of macros.
Instead, we just initialize structures manually for those variants of
fields that don't have a macro.

Callsites that open-code these structure initialization don't use
designated initializers though and instead just provide values for each
of the fields that they want to initialize. This has three significant
downsides:

  - Callsites need to specify all values up to the last field that they
    care about. This often includes fields that should simply be left at
    their default zero-initialized state, which adds distraction.

  - Any reader not deeply familiar with the layout of the structure
    has a hard time figuring out what the respective initializers mean.

  - Reordering or introducing new fields in the middle of the structure
    is impossible without adapting all callsites.

Convert all sites to instead use designated initializers, which we have
started using in our codebase quite a while ago. This allows us to skip
any default-initialized fields, gives the reader context by specifying
the field names and allows us to reorder or introduce new fields where
we want to.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-17 08:15:15 -07:00
Patrick Steinhardt
68cd492a3e object-store: merge "object-store-ll.h" and "object-store.h"
The "object-store-ll.h" header has been introduced to keep transitive
header dependendcies and compile times at bay. Now that we have created
a new "object-store.c" file though we can easily move the last remaining
additional bit of "object-store.h", the `odb_path_map`, out of the
header.

Do so. As the "object-store.h" header is now equivalent to its low-level
alternative we drop the latter and inline it into the former.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-15 08:24:37 -07:00
Junio C Hamano
0dfca98881 Merge branch 'ps/object-wo-the-repository' into ps/object-file-cleanup
* ps/object-wo-the-repository:
  hash: stop depending on `the_repository` in `null_oid()`
  hash: fix "-Wsign-compare" warnings
  object-file: split out logic regarding hash algorithms
  delta-islands: stop depending on `the_repository`
  object-file-convert: stop depending on `the_repository`
  pack-bitmap-write: stop depending on `the_repository`
  pack-revindex: stop depending on `the_repository`
  pack-check: stop depending on `the_repository`
  environment: move access to "core.bigFileThreshold" into repo settings
  pack-write: stop depending on `the_repository` and `the_hash_algo`
  object: stop depending on `the_repository`
  csum-file: stop depending on `the_repository`
2025-04-08 14:28:17 -07:00
Patrick Steinhardt
7d70b29c4f hash: stop depending on the_repository in null_oid()
The `null_oid()` function returns the object ID that only consists of
zeroes. Naturally, this ID also depends on the hash algorithm used, as
the number of zeroes is different between SHA1 and SHA256. Consequently,
the function returns the hash-algorithm-specific null object ID.

This is currently done by depending on `the_hash_algo`, which implicitly
makes us depend on `the_repository`. Refactor the function to instead
pass in the hash algorithm for which we want to retrieve the null object
ID. Adapt callsites accordingly by passing in `the_repository`, thus
bubbling up the dependency on that global variable by one layer.

There are a couple of trivial exceptions for subsystems that already got
rid of `the_repository`. These subsystems instead use the repository
that is available via the calling context:

  - "builtin/grep.c"
  - "grep.c"
  - "refs/debug.c"

There are also two non-trivial exceptions:

  - "diff-no-index.c": Here we know that we may not have a repository
    initialized at all, so we cannot rely on `the_repository`. Instead,
    we adapt `diff_no_index()` to get a `struct git_hash_algo` as
    parameter. The only caller is located in "builtin/diff.c", where we
    know to call `repo_set_hash_algo()` in case we're running outside of
    a Git repository. Consequently, it is fine to continue passing
    `the_repository->hash_algo` even in this case.

  - "builtin/ls-files.c": There is an in-flight patch series that drops
    `USE_THE_REPOSITORY_VARIABLE` in this file, which causes a semantic
    conflict because we use `null_oid()` in `show_submodule()`. The
    value is passed to `repo_submodule_init()`, which may use the object
    ID to resolve a tree-ish in the superproject from which we want to
    read the submodule config. As such, the object ID should refer to an
    object in the superproject, and consequently we need to use its hash
    algorithm.

    This means that we could in theory just not bother about this edge
    case at all and just use `the_repository` in "diff-no-index.c". But
    doing so would feel misdesigned.

Remove the `USE_THE_REPOSITORY_VARIABLE` preprocessor define in
"hash.c".

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-10 13:16:20 -07:00
Patrick Steinhardt
7835ee75cd environment: move access to "core.bigFileThreshold" into repo settings
The "core.bigFileThreshold" setting is stored in a global variable and
populated via `git_default_core_config()`. This may cause issues in
the case where one is handling multiple different repositories in a
single process with different values for that config key, as we may or
may not see the correct value in that case. Furthermore, global state
blocks our path towards libification.

Refactor the code so that we instead store the value in `struct
repo_settings`, where the value is computed as-needed and cached.

Note that this change requires us to adapt one test in t1050 that
verifies that we die when parsing an invalid "core.bigFileThreshold"
value. The exercised Git command doesn't use the value at all, and thus
it won't hit the new code path that parses the value. This is addressed
by using git-hash-object(1) instead, which does read the value.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-10 13:16:18 -07:00
Justin Tobler
c8a8e04099 diff: add option to skip resolving diff statuses
By default, `diffcore_std()` resolves the statuses for queued diff file
pairs by calling `diff_resolve_rename_copy()`. If status information is
already manually set, invoking `diffcore_std()` may change the status
value.

Introduce the `skip_resolving_statuses` diff option that prevents
`diffcore_std()` from resolving file pair statuses when enabled.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-03 08:17:47 -08:00