From c7e3b8085bb2f74371f5017f42c58b0acf01b915 Mon Sep 17 00:00:00 2001 From: Yee Cheng Chin Date: Thu, 27 Nov 2025 02:16:06 +0000 Subject: [PATCH] xdiff: optimize patience diff's LCS search The find_longest_common_sequence() function in patience diff is inefficient as it calls binary_search() for every unique line it encounters when deciding where to put it in the sequence. From instrumentation (using xctrace) on popular repositories, binary_search() takes up 50-60% of the run time within patience_diff() when performing a diff. To optimize this, add a boundary condition check before binary_search() is called to see if the encountered unique line is located after the entire currently tracked longest subsequence. If so, skip the unnecessary binary search and simply append the entry to the end of sequence. Given that most files compared in a diff are usually quite similar to each other, this condition is very common, and should be hit much more frequently than the binary search. Below are some end-to-end performance results by timing `git log --shortstat --oneline -500 --patience` on different repositories with the old and new code. Generally speaking this seems to give at least 8-10% speed up. The "binary search hit %" column describes how often the algorithm enters the binary search path instead of the new faster path. Even in the WebKit case we can see that it's quite rare (1.46%). | Repo | Speed difference | binary search hit % | |----------|------------------|---------------------| | vim | 1.27x | 0.01% | | pytorch | 1.16x | 0.02% | | cpython | 1.14x | 0.06% | | ripgrep | 1.14x | 0.03% | | git | 1.13x | 0.12% | | vscode | 1.09x | 0.10% | | WebKit | 1.08x | 1.46% | The benchmarks were done using hyperfine, on an Apple M1 Max laptop, with git compiled with `-O3 -flto`. Signed-off-by: Yee Cheng Chin Signed-off-by: Junio C Hamano --- xdiff/xpatience.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/xdiff/xpatience.c b/xdiff/xpatience.c index 77dc411d19..d4094e6acf 100644 --- a/xdiff/xpatience.c +++ b/xdiff/xpatience.c @@ -211,7 +211,10 @@ static int find_longest_common_sequence(struct hashmap *map, struct entry **res) for (entry = map->first; entry; entry = entry->next) { if (!entry->line2 || entry->line2 == NON_UNIQUE) continue; - i = binary_search(sequence, longest, entry); + if (longest == 0 || entry->line2 > sequence[longest - 1]->line2) + i = longest - 1; + else + i = binary_search(sequence, longest, entry); entry->previous = i < 0 ? NULL : sequence[i]; ++i; if (i <= anchor_i)