git-mirror

mirror of https://github.com/git/git.git synced 2025-12-12 20:36:24 +01:00

Author	SHA1	Message	Date
Ezekiel Newren	22ce0cb639	xdiff: rename rindex -> reference_index The classic diff adds only the lines that it's going to consider, during the diff, to an array. A mapping between the compacted array, and the lines of the file that they reference, is facilitated by this array. Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-11-18 14:53:11 -08:00
Ezekiel Newren	e35877eadb	xdiff: make xdfile_t.nreff a size_t instead of long size_t is used because nreff describes the number of elements in memory for rindex. Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-11-18 14:53:11 -08:00
Ezekiel Newren	016538780e	xdiff: make xdfile_t.nrec a size_t instead of long size_t is used because nrec describes the number of elements for both recs, and for 'changed' + 2. Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-11-18 14:53:10 -08:00
Ezekiel Newren	6a26019c81	xdiff: split xrecord_t.ha into line_hash and minimal_perfect_hash The ha field is serving two different purposes, which makes the code harder to read. At first glance, it looks like many places assume there could never be hash collisions between lines of the two input files. In reality, line_hash is used together with xdl_recmatch() to ensure correct comparisons of lines, even when collisions occur. To make this clearer, the old ha field has been split: * line_hash: a straightforward hash of a line, independent of any external context. Its type is uint64_t, as it comes from a fixed width hash function. * minimal_perfect_hash: Not a new concept, but now a separate field. It comes from the classifier's general-purpose hash table, which assigns each line a unique and minimal hash across the two files. A size_t is used here because it's meant to be used to index an array. This also avoids ` as usize` casts on the Rust side when using it to index a slice. Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-11-18 14:53:10 -08:00
Ezekiel Newren	b0d4ae30f5	xdiff: use unambiguous types in xdl_hash_record() Convert the function signature and body to use unambiguous types. char is changed to uint8_t because this function processes bytes in memory. unsigned long to uint64_t so that the hash output is consistent across platforms. `flags` was changed from long to uint64_t to ensure the high order bits are not dropped on platforms that treat long as 32 bits. Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-11-18 14:53:10 -08:00
Ezekiel Newren	9bd193253c	xdiff: use size_t for xrecord_t.size size_t is the appropriate type because size is describing the number of elements, bytes in this case, in memory. Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-11-18 14:53:10 -08:00
Ezekiel Newren	10f97d6aff	xdiff: make xrecord_t.ptr a uint8_t instead of char Make xrecord_t.ptr uint8_t because it's referring to bytes in memory. In order to avoid a refactor avalanche, many uses of this field were cast to char* or similar. Places where casting was unnecessary: xemit.c:156 xmerge.c:124 xmerge.c:127 xmerge.c:164 xmerge.c:169 xmerge.c:172 xmerge.c:178 Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-11-18 14:53:10 -08:00
Ezekiel Newren	8b9c5d2e3a	xdiff: change type of xdfile_t.changed from char to bool The only values possible for 'changed' is 1 and 0, which exactly maps to a bool type. It might not look like this because action1 and action2 (which use to be dis1, and dis2) were also of type char and were assigned numerical values within a few lines of 'changed' (what used to be rchg). Using DISCARD/KEEP/INVESTIGATE for action1[i]/action2[j], and true/false for changed[k] makes it clear to future readers that these are logically separate concepts. Best-viewed-with: --color-words Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-10-03 10:19:40 -07:00
Ezekiel Newren	e385e1b7d2	xdiff: add macros DISCARD(0), KEEP(1), INVESTIGATE(2) in xprepare.c This commit is refactor-only; no behavior is changed. A future commit will use bool literals for changed[i]. The functions xdl_clean_mmatch() and xdl_cleanup_records() will be cleaned up more in a future patch series. The changes to xdl_cleanup_records(), in this patch, are just to make it clear why `char rchg` is refactored to `bool changed`. Rename dis* to action* and replace literal numericals with macros. The old names came from when dis* (which I think was short for discard) was treated like a boolean, but over time it grew into a ternary state machine. The result was confusing because dis* and rchg* both used 0/1 values with different meanings. The new names and macros make the states explicit. nm is short for number of matches, and mlim is a heuristic limit: nm == 0 -> action[i] = DISCARD -> changed[i] = true 0 < nm < mlim -> action[i] = KEEP -> changed[i] = false nm >= mlim -> action[i] = INVESTIGATE -> changed[i] = xdl_clean_mmatch() When need_min is true, only DISCARD and KEEP occur because the limit is effectively infinite. Best-viewed-with: --color-words Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-10-03 10:19:40 -07:00
Ezekiel Newren	b7de64a6d6	xdiff: rename rchg -> changed in xdfile_t The field rchg (now 'changed') declares if a line in a file is changed or not. A later commit will change it's type from 'char' to 'bool' to make its purpose even more clear. Best-viewed-with: --color-words Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-09-30 14:12:46 -07:00
Ezekiel Newren	d43d591252	xdiff: delete chastore from xdfile_t xdfile_t currently uses chastore_t which is an arena allocator. I think that xrecord_t used to be a linked list and recs didn't exist originally. When recs was added I think they forgot to remove xdfile_t.next, but was overlooked. This dual data structure setup makes the code somewhat confusing. Additionally the C type chastore_t isn't FFI friendly, and provides little to no performance benefit over using realloc to grow an array. Performance impact of deleting fields from xdfile_t: Deleting ha is about 5% slower. Deleting cha is about 5% faster. Delete ha, but keep cha time hyperfine --warmup 3 -L exe build_v2.51.0/git,build_delete_ha/git '{exe} log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null' Benchmark 1: build_v2.51.0/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null Time (mean ± σ): 1.269 s ± 0.017 s [User: 1.135 s, System: 0.128 s] Range (min … max): 1.249 s … 1.286 s 10 runs Benchmark 2: build_delete_ha/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null Time (mean ± σ): 1.339 s ± 0.017 s [User: 1.234 s, System: 0.099 s] Range (min … max): 1.320 s … 1.358 s 10 runs Summary build_v2.51.0/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null ran 1.06 ± 0.02 times faster than build_delete_ha/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null Delete cha, but keep ha time hyperfine --warmup 3 -L exe build_v2.51.0/git,build_delete_chastore/git '{exe} log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null' Benchmark 1: build_v2.51.0/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null Time (mean ± σ): 1.290 s ± 0.001 s [User: 1.154 s, System: 0.130 s] Range (min … max): 1.288 s … 1.292 s 10 runs Benchmark 2: build_delete_chastore/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null Time (mean ± σ): 1.232 s ± 0.017 s [User: 1.105 s, System: 0.121 s] Range (min … max): 1.205 s … 1.249 s 10 runs Summary build_delete_chastore/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null ran 1.05 ± 0.01 times faster than build_v2.51.0/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null Delete ha AND chastore time hyperfine --warmup 3 -L exe build_v2.51.0/git,build_delete_ha_and_chastore/git '{exe} log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null' Benchmark 1: build_v2.51.0/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null Time (mean ± σ): 1.291 s ± 0.002 s [User: 1.156 s, System: 0.129 s] Range (min … max): 1.287 s … 1.295 s 10 runs Benchmark 2: build_delete_ha_and_chastore/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null Time (mean ± σ): 1.306 s ± 0.001 s [User: 1.195 s, System: 0.105 s] Range (min … max): 1.305 s … 1.308 s 10 runs Summary build_v2.51.0/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null ran 1.01 ± 0.00 times faster than build_delete_ha_and_chastore/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null Best-viewed-with: --color-words Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-09-30 14:12:46 -07:00
Ezekiel Newren	6d507bd41a	xdiff: delete fields ha, line, size in xdlclass_t in favor of an xrecord_t The fields from xdlclass_t are aliases of xrecord_t: xdlclass_t.line -> xrecord_t.ptr xdlclass_t.size -> xrecord_t.size xdlclass_t.ha -> xrecord_t.ha xdlclass_t carries a copy of the data in xrecord_t, but instead of embedding xrecord_t it duplicates the individual fields. A future commit will change the types used in xrecord_t so embed it in xdlclass_t first, so we don't have to remember to change the types here as well. Best-viewed-with: --color-words Helped-by: Phillip Wood <phillip.wood123@gmail.com> Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-09-30 14:12:46 -07:00
Ezekiel Newren	5c294dceb2	xdiff: delete redundant array xdfile_t.ha When 0 <= i < xdfile_t.nreff the following is true: xdfile_t.ha[i] == xdfile_t.recs[xdfile_t.rindex[i]] This makes the code about 5% slower. The fields rindex and ha are specific to the classic diff (myers and minimal). I plan on creating a struct for classic diff, but there's a lot of cleanup that needs to be done before that can happen and leaving ha in would make those cleanups harder to follow. A subsequent commit will delete the chastore cha from xdfile_t. That later commit will investigate deleting ha and cha independently and together. Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-09-30 14:12:46 -07:00
Ezekiel Newren	efaf553b1a	xdiff: delete unnecessary fields from xrecord_t and xdfile_t xrecord_t.next, xdfile_t.hbits, xdfile_t.rhash are initialized, but never used for anything by the code. Remove them. Best-viewed-with: --color-words Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-09-26 16:08:55 -07:00
Ezekiel Newren	d1c028bdf7	xdiff: delete local variables and initialize/free xdfile_t directly These local variables are essentially a hand-rolled additional implementation of xdl_free_ctx() inlined into xdl_prepare_ctx(). Modify the code to use the existing xdl_free_ctx() function so there aren't two ways to free such variables. Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-09-26 16:08:54 -07:00
Ezekiel Newren	43d5f52ac4	xdiff: delete static forward declarations in xprepare Move xdl_prepare_env() later in the file to avoid the need for static forward declarations. Best-viewed-with: --color-moved Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-09-26 16:08:54 -07:00
Niels Glodny	03f2915541	xdiff: disable cleanup_records heuristic with --minimal The cleanup_records function marks some lines as changed before running the actual diff algorithm. For most lines, this is a good performance optimization, but it also marks lines that are surrounded by many changed lines as changed as well. This can cause redundant changes and longer-than-necessary diffs. Whether this results in better-looking diffs is subjective. However, the --minimal flag explicitly requests the shortest possible diff. The change results in shorter diffs in about 1.3% of all diffs in Git's history. Performance wise, I have measured the impact on "git log -p -3000 --minimal > /dev/null". With this change, I get Time (mean ± σ): 2.363 s ± 0.023 s (25 runs) and without this patch I measured Time (mean ± σ): 2.362 s ± 0.035 s (25 runs). As the difference is well within the margin of error, this does not seem to have an impact on performance. Signed-off-by: Niels Glodny <n.glodny@campus.lmu.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-29 12:46:58 -07:00
Phillip Wood	f7b587bf65	xdiff: introduce XDL_ALLOC_GROW() Add a helper to grow an array. This is analogous to ALLOC_GROW() in the rest of the codebase but returns −1 on allocation failure to accommodate other users of libxdiff such as libgit2. It will also return a error if the multiplication overflows while calculating the new allocation size. Note that this keeps doubling on reallocation like the code it is replacing rather than increasing the existing size by half like ALLOC_GROW(). It does however copy ALLOC_GROW()'s trick of adding a small amount to the new allocation to avoid a lot of reallocations at small sizes. Note that xdl_alloc_grow_helper() uses long rather than size_t for `nr` and `alloc` to match the existing code. Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-07-08 09:34:30 -07:00
Phillip Wood	848fd5ae5b	xdiff: introduce XDL_CALLOC_ARRAY() Add a helper for allocating an array and initialize the elements to zero. This is analogous to CALLOC_ARRAY() in the rest of the codebase but it returns NULL on allocation failures rather than dying to accommodate other users of libxdiff such as libgit2. Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-07-08 09:34:30 -07:00
Phillip Wood	18aae7e21e	xdiff: introduce xdl_calloc In preparation for introducing XDL_CALLOC_ARRAY() use calloc() to obtain zeroed out memory rather than malloc() followed by memset(). To try and keep the lines a reasonable length this commit also stops casting the pointer returned by calloc() as this is unnecessary. Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-07-08 09:34:30 -07:00
Phillip Wood	abf04bdaa8	xdiff: introduce XDL_ALLOC_ARRAY() Add a helper to allocate an array that automatically calculates the allocation size. This is analogous to ALLOC_ARRAY() in the rest of the codebase but returns NULL if the allocation fails to accommodate other users of libxdiff such as libgit2. The helper will also return NULL if the multiplication in the allocation calculation overflows. Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-07-08 09:34:30 -07:00
Junio C Hamano	2b0a58d164	Merge branch 'ep/maint-equals-null-cocci' for maint-2.35 * ep/maint-equals-null-cocci: tree-wide: apply equals-null.cocci contrib/coccinnelle: add equals-null.cocci	2022-05-02 10:06:04 -07:00
Junio C Hamano	afe8a9070b	tree-wide: apply equals-null.cocci Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-05-02 09:50:37 -07:00
Phillip Wood	b82dd3f7f2	xdiff: avoid unnecessary memory allocations rindex and ha are only used by xdl_cleanup_records() which is not called by the histogram or patience algorithms. The perf test results show a small reduction in run time but that is probably within the noise. Test HEAD^ HEAD ----------------------------------------------------------------------------- 4000.1: log -3000 (baseline) 0.19(0.17+0.02) 0.19(0.12+0.07) +0.0% 4000.2: log --raw -3000 (tree-only) 0.98(0.78+0.20) 0.98(0.81+0.16) +0.0% 4000.3: log -p -3000 (Myers) 4.81(4.15+0.64) 4.81(4.23+0.56) +0.0% 4000.4: log -p -3000 --histogram 5.87(5.19+0.66) 5.83(5.11+0.70) -0.7% 4000.5: log -p -3000 --patience 5.35(4.60+0.73) 5.31(4.61+0.69) -0.7% Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-11-18 22:23:45 -08:00
Phillip Wood	663c5ad035	diff histogram: intern strings Histogram is the only diff algorithm not to call xdl_classify_record(). xdl_classify_record() ensures that the hash values of two strings that are not equal differ which means that it is not necessary to use xdl_recmatch() when comparing lines, all that is necessary is to compare the hash values. This gives a 7% reduction in the runtime of "git log --patch" when using the histogram diff algorithm. Test HEAD^ HEAD ----------------------------------------------------------------------------- 4000.1: log -3000 (baseline) 0.18(0.14+0.04) 0.19(0.17+0.02) +5.6% 4000.2: log --raw -3000 (tree-only) 0.99(0.77+0.21) 0.98(0.78+0.20) -1.0% 4000.3: log -p -3000 (Myers) 4.84(4.31+0.51) 4.81(4.15+0.64) -0.6% 4000.4: log -p -3000 --histogram 6.34(5.86+0.46) 5.87(5.19+0.66) -7.4% 4000.5: log -p -3000 --patience 5.39(4.60+0.76) 5.35(4.60+0.73) -0.7% Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-11-18 22:23:31 -08:00
Todd Zullinger	484257925f	Replace Free Software Foundation address in license notices The mailing address for the FSF has changed over the years. Rather than updating the address across all files, refer readers to gnu.org, as the GNU GPL documentation now suggests for license notices. The mailing address is retained in the full license files (COPYING and LGPL-2.1). The old address is still present in t/diff-lib/COPYING. This is intentional, as the file is used in tests and the contents are not expected to change. Signed-off-by: Todd Zullinger <tmz@pobox.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-11-09 13:21:21 +09:00
Ramsay Jones	87f1625836	xdiff/xprepare: fix a memory leak The xdl_prepare_env() function may initialise an xdlclassifier_t data structure via xdl_init_classifier(), which allocates memory to several fields, for example 'rchash', 'rcrecs' and 'ncha'. If this function later exits due to the failure of xdl_optimize_ctxs(), then this xdlclassifier_t structure, and the memory allocated to it, is not cleaned up. In order to fix the memory leak, insert a call to xdl_free_classifier() before returning. Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-03-04 15:51:08 -08:00
Ramsay Jones	5cd6978a9c	xdiff/xprepare: use the XDF_DIFF_ALG() macro to access flag bits Commit `307ab20b3` ("xdiff: PATIENCE/HISTOGRAM are not independent option bits", 19-02-2012) introduced the XDF_DIFF_ALG() macro to access the flag bits used to represent the diff algorithm requested. In addition, code which had used explicit manipulation of the flag bits was changed to use the macros. However, one example of direct manipulation remains. Update this code to use the XDF_DIFF_ALG() macro. Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-03-04 15:51:06 -08:00
Junio C Hamano	307ab20b33	xdiff: PATIENCE/HISTOGRAM are not independent option bits Because the default Myers, patience and histogram algorithms cannot be in effect at the same time, XDL_PATIENCE_DIFF and XDL_HISTOGRAM_DIFF are not independent bits. Instead of wasting one bit per algorithm, define a few macros to access the few bits they occupy and update the code that access them. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-02-19 15:36:55 -08:00
Junio C Hamano	7a63a920fd	Merge branch 'rs/diff-cleanup-records-fix' * rs/diff-cleanup-records-fix: diff: resurrect XDF_NEED_MINIMAL with --minimal Revert removal of multi-match discard heuristic in 27af01	2011-10-13 19:03:22 -07:00
René Scharfe	c5aa90682f	Revert removal of multi-match discard heuristic in 27af01 `27af01d` (xdiff/xprepare: improve O(n*m) performance in xdl_cleanup_records(), 2011-08-17) was supposed to be a performance boost only. However, it unexpectedly changed the behaviour of diff. Revert a part of `27af01d` that removes logic that mark lines as "multi-match" (ie. dis[i] == 2). This was preventing the multi-match discard heuristic (performed in xdl_cleanup_records() and xdl_clean_mmatch()) from executing. Reported-by: Alexander Pepper <pepper@inf.fu-berlin.de> Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Tay Ray Chuan <rctay89@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-09-26 11:38:14 -07:00
Junio C Hamano	b648557ef1	Merge branch 'rc/histogram-diff' * rc/histogram-diff: xdiff/xprepare: initialise xdlclassifier_t cf in xdl_prepare_env()	2011-09-06 11:42:58 -07:00
Tay Ray Chuan	2738bc3f09	xdiff/xprepare: initialise xdlclassifier_t cf in xdl_prepare_env() Ensure that the xdl_free_classifier() call on xdlclassifier_t cf is safe even if xdl_init_classifier() isn't called. This may occur in the case where diff is run with --histogram and a call to, say, xdl_prepare_ctx() fails. Signed-off-by: Tay Ray Chuan <rctay89@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-08-31 10:03:51 -07:00
Junio C Hamano	b14b969ab9	Merge branch 'rc/histogram-diff' into HEAD * rc/histogram-diff: xdiff/xhistogram: drop need for additional variable xdiff/xhistogram: rely on xdl_trim_ends() xdiff/xhistogram: rework handling of recursed results xdiff: do away with xdl_mmfile_next() Make test number unique xdiff/xprepare: use a smaller sample size for histogram diff xdiff/xprepare: skip classification teach --histogram to diff t4033-diff-patience: factor out tests xdiff/xpatience: factor out fall-back-diff function xdiff/xprepare: refactor abort cleanups xdiff/xprepare: use memset() Conflicts: xdiff/xprepare.c	2011-08-17 17:17:16 -07:00
Tay Ray Chuan	27af01d552	xdiff/xprepare: improve O(nm) performance in xdl_cleanup_records() In xdl_cleanup_records(), we see O(nm) performance, where n is the number of records from xdf->dstart to xdf->dend, and m is the size of a bucket in xdf->rhash (<= by mlim). Here, we improve this to O(n) by pre-computing nm (in rcrec->len(1\|2)) in xdl_classify_record(). Reported-by: Marat Radchenko <marat@slonopotamus.org> Signed-off-by: Tay Ray Chuan <rctay89@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-08-17 17:15:05 -07:00
Tay Ray Chuan	739864b1ff	xdiff: do away with xdl_mmfile_next() Given our simple mmfile structure, xdl_mmfile_next() calls are redundant. Do away with calls to them. Signed-off-by: Tay Ray Chuan <rctay89@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-08-03 10:15:16 -07:00
Tay Ray Chuan	86abba8015	xdiff/xprepare: use a smaller sample size for histogram diff For histogram diff, we can afford a smaller sample size and thus a poorer estimate of the number of lines, as the hash table (rhash) won't be filled up/grown. This is safe as the final count of lines (xdf.nrecs) will be updated correctly anyway by xdl_prepare_ctx(). This gives us a small boost in performance. Signed-off-by: Tay Ray Chuan <rctay89@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-07-12 09:30:00 -07:00
Tay Ray Chuan	9f37c27593	xdiff/xprepare: skip classification xdiff performs "classification" of records (xdl_classify_record()), replacing hashes (xrecord_t.ha) with a unique identifier of the record/line and building a hash table (xrecord_t.rhash) of records. This is then used to "cleanup" records (xdl_cleanup_records()). We don't need any of that in histogram diff, so we omit calls to these functions. We also skip allocating memory to the hash table, rhash, as it is no longer used. This gives us a small boost in performance. Signed-off-by: Tay Ray Chuan <rctay89@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-07-12 09:29:39 -07:00
Tay Ray Chuan	159607a8f1	xdiff/xprepare: refactor abort cleanups Group free()'s that are called when a malloc() fails in xdl_prepare_ctx(), making for more readable code. Also add a free() on ha, in case future git hackers add allocs after the ha malloc. Signed-off-by: Tay Ray Chuan <rctay89@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-07-07 09:37:21 -07:00
Tay Ray Chuan	452f4fa51e	xdiff/xprepare: use memset() Use memset() instead of a for loop to initialize. This could give a performance advantage. Signed-off-by: Tay Ray Chuan <rctay89@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-07-07 09:36:44 -07:00
Johannes Schindelin	92b7de93fb	Implement the patience diff algorithm The patience diff algorithm produces slightly more intuitive output than the classic Myers algorithm, as it does not try to minimize the number of +/- lines first, but tries to preserve the lines that are unique. To this end, it first determines lines that are unique in both files, then the maximal sequence which preserves the order (relative to both files) is extracted. Starting from this initial set of common lines, the rest of the lines is handled recursively, with Myers' algorithm as a fallback when the patience algorithm fails (due to no common unique lines). This patch includes memory leak fixes by Pierre Habouzit. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-01-07 13:35:44 -08:00
Davide Libenzi	9b28d55401	xdiff: give up scanning similar lines early In a corner case of large files whose lines do not match uniquely, the loop to eliminate a line that matches multiple locations adjacent to a run of lines that do not uniquely match wasted too much cycles. Fix this by giving up early after scanning 100 lines in both direction. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2008-11-08 09:28:30 -08:00
Junio C Hamano	a6080a0a44	War on whitespace This uses "git-apply --whitespace=strip" to fix whitespace errors that have crept in to our source files over time. There are a few files that need to have trailing whitespaces (most notably, test vectors). The results still passes the test, and build result in Documentation/ area is unchanged. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2007-06-07 00:04:01 -07:00
Johannes Schindelin	0d21efa51c	Teach diff about -b and -w flags This adds -b (--ignore-space-change) and -w (--ignore-all-space) flags to diff. The main part of the patch is teaching libxdiff about it. [jc: renamed xdl_line_match() to xdl_recmatch() since the former is used for different purposes in xpatchi.c which is in the parts of the upstream source we do not use.] Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Junio C Hamano <junkio@cox.net>	2006-06-23 17:35:27 -07:00
Davide Libenzi	ca557afff9	Clean-up trivially redundant diff. Also corrects the line numbers in unified output when using zero lines context.	2006-04-04 00:11:09 -07:00
Linus Torvalds	3443546f6e	Use a real built-in diff generator This uses a simplified libxdiff setup to generate unified diffs _without_ doing fork/execve of GNU "diff". This has several huge advantages, for example: Before: [torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null real 0m24.818s user 0m13.332s sys 0m8.664s After: [torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null real 0m4.563s user 0m2.944s sys 0m1.580s and the fact that this should be a lot more portable (ie we can ignore all the issues with doing fork/execve under Windows). Perhaps even more importantly, this allows us to do diffs without actually ever writing out the git file contents to a temporary file (and without any of the shell quoting issues on filenames etc etc). NOTE! THIS PATCH DOES NOT DO THAT OPTIMIZATION YET! I was lazy, and the current "diff-core" code actually will always write the temp-files, because it used to be something that you simply had to do. So this current one actually writes a temp-file like before, and then reads it into memory again just to do the diff. Stupid. But if this basic infrastructure is accepted, we can start switching over diff-core to not write temp-files, which should speed things up even further, especially when doing big tree-to-tree diffs. Now, in the interest of full disclosure, I should also point out a few downsides: - the libxdiff algorithm is different, and I bet GNU diff has gotten a lot more testing. And the thing is, generating a diff is not an exact science - you can get two different diffs (and you will), and they can both be perfectly valid. So it's not possible to "validate" the libxdiff output by just comparing it against GNU diff. - GNU diff does some nice eye-candy, like trying to figure out what the last function was, and adding that information to the "@@ .." line. libxdiff doesn't do that. - The libxdiff thing has some known deficiencies. In particular, it gets the "\No newline at end of file" case wrong. So this is currently for the experimental branch only. I hope Davide will help fix it. That said, I think the huge performance advantage, and the fact that it integrates better is definitely worth it. But it should go into a development branch at least due to the missing newline issue. Technical note: this is based on libxdiff-0.17, but I did some surgery to get rid of the extraneous fat - stuff that git doesn't need, and seriously cutting down on mmfile_t, which had much more capabilities than the diff algorithm either needed or used. In this version, "mmfile_t" is just a trivial <pointer,length> tuple. That said, I tried to keep the differences to simple removals, so that you can do a diff between this and the libxdiff origin, and you'll basically see just things getting deleted. Even the mmfile_t simplifications are left in a state where the diffs should be readable. Apologies to Davide, whom I'd love to get feedback on this all from (I wrote my own "fill_mmfile()" for the new simpler mmfile_t format: the old complex format had a helper function for that, but I did my surgery with the goal in mind that eventually we _should_ just do mmfile_t mf; buf = read_sha1_file(sha1, type, &size); mf->ptr = buf; mf->size = size; .. use "mf" directly .. which was really a nightmare with the old "helpful" mmfile_t, and really is that easy with the new cut-down interfaces). [ Btw, as any hawk-eye can see from the diff, this was actually generated with itself, so it is "self-hosting". That's about all the testing it has gotten, along with the above kernel diff, which eye-balls correctly, but shows the newline issue when you double-check it with "git-apply" ] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	2006-03-25 16:49:58 -08:00

46 Commits