When we fail to find `gpatch`, don't say that we failed to find `patch`.
Cosmetic fix to commit 1254f146f8 ("tests: validate "patch" and "ed"
commands once, print meaningful messages (#226)")
This is an extremely minor fix because the error message already printed
"gpatch validation failed, no such file or directory" even before this
commit.
Signed-off-by: Marc Herbert <Marc.Herbert@gmail.com>
macOS' /usr/bin/patch and GNU patch have very subtle incompatibilities
that cause only some "more advanced" tests to fail in obscure and very
time-consuming ways - while other tests pass. In some cases (depending
on test threads racing), the lack of newlines in some test data even
causes the whole test suite to stall.
This fix runs `patch -version` (only once), makes sure the output starts
with "GNU patch" and shows a meaningful assert message when not. It also
looks for `gpatch` instead of `patch` on macOS and shows a meaningful
assert message if either is missing.
Fixes: #225
This also provides faster and better feedback when `ed` is missing (see
#39) and implements a portable and basic check.
Last but not least, this new code is generic enough to support the
validation of any other test dependency in the future.
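A minimal sketch of the check described above, not the actual test code. The names `patch_command`, `is_gnu_patch` and `validate_patch` are hypothetical, and passing `--version` is an assumption. The result is cached in a `OnceLock` so the command runs only once.

```rust
use std::process::Command;
use std::sync::OnceLock;

/// Name of the binary to look for: `gpatch` on macOS, `patch` elsewhere.
fn patch_command() -> &'static str {
    if cfg!(target_os = "macos") {
        "gpatch"
    } else {
        "patch"
    }
}

/// True when the version output identifies GNU patch.
fn is_gnu_patch(version_output: &str) -> bool {
    version_output.starts_with("GNU patch")
}

/// Runs the version check at most once and caches the outcome.
/// (Hypothetical helper; the error strings are illustrative only.)
fn validate_patch() -> Result<(), String> {
    static RESULT: OnceLock<Result<(), String>> = OnceLock::new();
    RESULT
        .get_or_init(|| {
            let cmd = patch_command();
            let output = Command::new(cmd)
                .arg("--version") // assumption: the long form of the flag
                .output()
                .map_err(|e| format!("{cmd} validation failed: {e}"))?;
            let stdout = String::from_utf8_lossy(&output.stdout);
            if is_gnu_patch(&stdout) {
                Ok(())
            } else {
                Err(format!("{cmd} is not GNU patch"))
            }
        })
        .clone()
}
```

A test would call `validate_patch()` first and assert on the result, so a missing or incompatible tool fails fast with a readable message.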
* feat: u64 for --bytes and --ignore-initial
fix: bumped up tempfile to "3.26.0"
The variables for --bytes, --ignore-initial and line count were of type 'usize',
thus limiting the readable bytes on 32-bit systems.
GNU cmp is compiled with LFS (Large File Support) and allows i64 values.
This is now all u64, which works also on 32-bit systems with Rust.
There is no reason to implement a 32-bit barrier for 32-bit machines.
Additionally the --bytes limit can be set to 'u128' using the feature
"cmp_bytes_limit_128_bit".
The performance impact would be negligible, as there are only a few calculations
each time a full block is read from the file.
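To illustrate why 'usize' was limiting, here is a small sketch. The helper names are hypothetical and the parser only handles plain decimal values. It is not the argument parser used by cmp.

```rust
use std::num::ParseIntError;

/// Parses a plain decimal byte count into a `u64`, independent of pointer width.
fn parse_byte_count(arg: &str) -> Result<u64, ParseIntError> {
    arg.parse::<u64>()
}

/// Returns whether a value would still fit into a 32-bit `usize`.
fn fits_in_32_bit_usize(value: u64) -> bool {
    u32::try_from(value).is_ok()
}
```

A 5 GiB limit (5368709120) parses fine as u64, but would not fit into 'usize' on a 32-bit target.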
---------
Co-authored-by: Gunter Schmidt <gsgit@beadsoft.de>
Co-authored-by: Sylvestre Ledru <sylvestre@debian.org>
Fixes #223. Very simple reproduction:
```
cd diffutils
mkdir a
touch a/alef a/alefn a/alef_ a/alefx a/alefr a/fuzz.file
cargo test
```
=> fail
https://www.gnu.org/software/diffutils/manual/html_node/Multiple-Patches.html
states that the "old" file name has precedence over the "new" filename.
I hit this problem because some other (and unfortunately: unknown for
now) test issue left bogus `a/alef*` file(s) behind in my workspace. I
didn't bother cleaning them up because I assumed some test would keep
recreating them and that cost me a lot of time.
This issue seems to have existed since the very first commit.
Interestingly, there was a previous attempt in 2024 to fix this in commit
a3a372ff36! So I was apparently not the only one affected. BUT that
fix was immediately reverted by commit ba7cb0aef9 in the same
PR. Admittedly, that fix seemed somewhat off-topic in
https://github.com/uutils/diffutils/pull/33. So here it is again.
* fix: match GNU error format for unrecognized options
Use single quotes and remove colon to match GNU diff/cmp output:
`unrecognized option '--foobar'` instead of `unrecognized option: "--foobar"`
Also use `contains` instead of `starts_with` in the integration test
to handle the command prefix (e.g. `cmp: unrecognized option ...`).
Follow-up to #178 / #179.
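A small sketch of the format and of why the test uses `contains`. The function names are hypothetical and this is not the code from the commit.

```rust
/// Formats the message in the GNU style: single quotes, no colon.
fn unrecognized_option(option: &str) -> String {
    format!("unrecognized option '{option}'")
}

/// Adds the command prefix, as in `cmp: unrecognized option ...`.
fn with_prefix(utility: &str, message: &str) -> String {
    format!("{utility}: {message}")
}
```

With the prefix in place the message no longer starts with `unrecognized option`, so a `starts_with` check would fail while `contains` still matches.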
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* style: apply cargo fmt formatting
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
This makes verbose comparison of 37MB completely different files 2.34x
faster than our own baseline, putting our cmp at almost 6x faster than
GNU cmp (/opt/homebrew/bin/cmp) on my M4 Pro Mac. The output remains
identical to that of GNU cmp. Mostly equal and smaller files do not
regress.
Benchmark 1: ./bin/baseline/diffutils cmp -lb t/huge t/eguh
Time (mean ± σ): 1.669 s ± 0.011 s [User: 1.594 s, System: 0.073 s]
Range (min … max): 1.654 s … 1.689 s 10 runs
Warning: Ignoring non-zero exit code.
Benchmark 2: ./target/release/diffutils cmp -lb t/huge t/eguh
Time (mean ± σ): 714.2 ms ± 4.1 ms [User: 629.3 ms, System: 82.7 ms]
Range (min … max): 707.2 ms … 721.5 ms 10 runs
Warning: Ignoring non-zero exit code.
Benchmark 3: /opt/homebrew/bin/cmp -lb t/huge t/eguh
Time (mean ± σ): 4.213 s ± 0.050 s [User: 4.128 s, System: 0.081 s]
Range (min … max): 4.160 s … 4.316 s 10 runs
Warning: Ignoring non-zero exit code.
Benchmark 4: /usr/bin/cmp -lb t/huge t/eguh
Time (mean ± σ): 3.892 s ± 0.048 s [User: 3.819 s, System: 0.070 s]
Range (min … max): 3.808 s … 3.976 s 10 runs
Warning: Ignoring non-zero exit code.
Summary
./target/release/diffutils cmp -lb t/huge t/eguh ran
2.34 ± 0.02 times faster than ./bin/baseline/diffutils cmp -lb t/huge t/eguh
5.45 ± 0.07 times faster than /usr/bin/cmp -lb t/huge t/eguh
5.90 ± 0.08 times faster than /opt/homebrew/bin/cmp -lb t/huge t/eguh
Create the diff -y utility, this time with tests and with changes focused
mainly on how the utility is built and on output alignment and
tabulation. New parameters were introduced, such as the total width of
the output. A new calculation determines the width of the output columns
and the maximum total column width. The tab and spacing mechanism has the
same behavior as the original diff, with tabs and spaces formatted in the
same way.
- Introducing tests for the diff 'main' function
- Introducing fuzzing for side diff utility
- Introducing tests for internal mechanisms
- Modular functions that allow consistent changes across the entire project
- Create the function limited_string, in the utils package, which truncates a string based on a
delimiter (may break the encoding of the character at the cut point)
- Create tests for limited_string function
- Add support for the -y and --side-by-side flags, which enable side-by-side diff output
- Create an implementation of the diff -y (SideBySide) command, the base command for sdiff, using the
diff crate as the engine. It does not yet fully match GNU diff -y: some markers (|, (, ), \, /) could
not be implemented because the engine we currently use (the diff crate) does not allow the
required logic. Only '<' and '>' are enabled.
- Create tests for SideBySide implementation
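The encoding caveat mentioned for limited_string can be shown with a byte-level cut. This is only a sketch under the assumption that truncation works on bytes. `truncate_bytes` is a hypothetical name and not the function from the utils package.

```rust
/// Keeps at most `max_len` bytes of `line`.
/// The cut may land in the middle of a multi-byte UTF-8 character.
fn truncate_bytes(line: &[u8], max_len: usize) -> &[u8] {
    &line[..line.len().min(max_len)]
}
```

Cutting "héllo" after two bytes keeps 'h' plus the first half of 'é', so the result is no longer valid UTF-8.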
Before this change, we would first find all changes so we could obtain
the largest offset we will report and use that to set up the padding.
Now we use the file sizes to estimate the largest possible offset.
Not only does this allow us to print earlier and reduce memory usage, as
we no longer store diffs to report later, but it also fixes a case in
which our output differed from GNU cmp's, because GNU cmp also seems
to estimate based on size.
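A rough sketch of such a size-based estimate. It assumes the largest offset that can be reported is bounded by the smaller file, and the function names are hypothetical. The real code may differ.

```rust
/// Number of decimal digits needed to print `n`.
fn decimal_width(n: u64) -> usize {
    n.checked_ilog10().map_or(1, |log| log as usize + 1)
}

/// Width of the offset column, estimated from the two file sizes.
/// Assumption: no difference is reported past the end of the shorter file.
fn offset_column_width(len_a: u64, len_b: u64) -> usize {
    decimal_width(len_a.min(len_b))
}
```

With this the padding is known before the first byte is compared, so each difference can be printed as soon as it is found.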
Memory usage drops by a factor of 1000(!), without losing performance
while comparing 2 binaries of hundreds of MBs:
Before:
Maximum resident set size (kbytes): 2489260
Benchmark 1: ../target/release/diffutils \
cmp -l -b /usr/lib64/chromium-browser/chromium-browser /usr/lib64/firefox/libxul.so
Time (mean ± σ): 14.466 s ± 0.166 s [User: 12.367 s, System: 2.012 s]
Range (min … max): 14.350 s … 14.914 s 10 runs
After:
Maximum resident set size (kbytes): 2636
Benchmark 1: ../target/release/diffutils \
cmp -l -b /usr/lib64/chromium-browser/chromium-browser /usr/lib64/firefox/libxul.so
Time (mean ± σ): 13.724 s ± 0.038 s [User: 12.263 s, System: 1.372 s]
Range (min … max): 13.667 s … 13.793 s 10 runs
This makes the code less readable, but gets us a massive improvement
to performance. Comparing ~36M completely different files now takes
~40% of the time. Compared to GNU cmp, we now run the same comparison
in ~26% of the time.
This also improves comparing binary files. A comparison of chromium
and libxul now takes ~60% of the time. We also beat GNU cmp by about
the same margin.
Before:
> hyperfine --warmup 1 -i --output=pipe \
'../target/release/diffutils cmp -l huge huge.3'
Benchmark 1: ../target/release/diffutils cmp -l huge huge.3
Time (mean ± σ): 2.000 s ± 0.016 s [User: 1.603 s, System: 0.392 s]
Range (min … max): 1.989 s … 2.043 s 10 runs
Warning: Ignoring non-zero exit code.
> hyperfine --warmup 1 -i --output=pipe \
'../target/release/diffutils cmp -l -b \
/usr/lib64/chromium-browser/chromium-browser \
/usr/lib64/firefox/libxul.so'
Benchmark 1: ../target/release/diffutils cmp -l -b /usr/lib64/chromium-browser/chromium-browser /usr/lib64/firefox/libxul.so
Time (mean ± σ): 24.704 s ± 0.162 s [User: 21.948 s, System: 2.700 s]
Range (min … max): 24.359 s … 24.889 s 10 runs
Warning: Ignoring non-zero exit code.
After:
> hyperfine --warmup 1 -i --output=pipe \
'../target/release/diffutils cmp -l huge huge.3'
Benchmark 1: ../target/release/diffutils cmp -l huge huge.3
Time (mean ± σ): 849.5 ms ± 6.2 ms [User: 538.3 ms, System: 306.8 ms]
Range (min … max): 839.4 ms … 857.7 ms 10 runs
Warning: Ignoring non-zero exit code.
> hyperfine --warmup 1 -i --output=pipe \
'../target/release/diffutils cmp -l -b \
/usr/lib64/chromium-browser/chromium-browser \
/usr/lib64/firefox/libxul.so'
Benchmark 1: ../target/release/diffutils cmp -l -b /usr/lib64/chromium-browser/chromium-browser /usr/lib64/firefox/libxul.so
Time (mean ± σ): 14.646 s ± 0.040 s [User: 12.328 s, System: 2.286 s]
Range (min … max): 14.585 s … 14.702 s 10 runs
Warning: Ignoring non-zero exit code.
Octal conversion and simple integer to string both show up in profiling.
This change improves comparing ~36M completely different files with both
-l and -b by ~11-13%.
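As an illustration of the kind of hand-rolled conversion meant here, a sketch that writes a byte as octal into a fixed buffer instead of going through `format!`. The width of 3 and the space padding are assumptions, and `octal_3` is a hypothetical name.

```rust
/// Writes `byte` as right-aligned octal into three ASCII characters,
/// producing the same text as `format!("{:>3o}", byte)` without allocating.
fn octal_3(byte: u8) -> [u8; 3] {
    let mut out = [b' '; 3];
    let mut value = byte;
    let mut i = 3;
    loop {
        i -= 1;
        out[i] = b'0' + (value & 7); // lowest octal digit
        value >>= 3;
        if value == 0 {
            break;
        }
    }
    out
}
```

A u8 never needs more than three octal digits (255 is 377), so the loop cannot run past the start of the buffer.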
The utility should support all the arguments supported by GNU cmp and
perform slightly better.
In a "bad" scenario, ~36M files which are completely different, our
version runs in ~74% of the time of the original on my M1 Max:
> hyperfine --warmup 1 -i --output=pipe \
'cmp -l huge huge.3'
Benchmark 1: cmp -l huge huge.3
Time (mean ± σ): 3.237 s ± 0.014 s [User: 2.891 s, System: 0.341 s]
Range (min … max): 3.221 s … 3.271 s 10 runs
Warning: Ignoring non-zero exit code.
> hyperfine --warmup 1 -i --output=pipe \
'../target/release/diffutils cmp -l huge huge.3'
Benchmark 1: ../target/release/diffutils cmp -l huge huge.3
Time (mean ± σ): 2.392 s ± 0.009 s [User: 1.978 s, System: 0.406 s]
Range (min … max): 2.378 s … 2.406 s 10 runs
Warning: Ignoring non-zero exit code.
Our cmp runs in ~116% of the time when comparing libxul.so to the
chromium-browser binary with -l and -b. In a best case scenario of
comparing 2 files which are the same except for the last byte, our
tool is slightly faster.
This is in preparation for adding the other diffutils commands, cmp,
diff3, sdiff.
We use a similar strategy to uutils/coreutils, with the single binary
acting as one of the supported tools if called through a symlink with
the appropriate name. When using the multi-tool binary directly, the
utility needs to be the first parameter.
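The dispatch described above could look roughly like this. It is a sketch with a hypothetical `select_utility` function and an assumed list of tool names, not the actual main function.

```rust
use std::path::Path;

/// Tools the multi-call binary can act as (assumed list).
const UTILITIES: [&str; 4] = ["cmp", "diff", "diff3", "sdiff"];

/// Returns the selected utility and the arguments left for it.
/// First the name the binary was invoked as is checked (symlink case),
/// then the first parameter (direct use of the multi-tool binary).
fn select_utility(args: &[String]) -> Option<(&str, &[String])> {
    let invoked_as = Path::new(args.first()?).file_stem()?.to_str()?;
    if UTILITIES.contains(&invoked_as) {
        return Some((invoked_as, &args[1..]));
    }
    let name = args.get(1)?.as_str();
    UTILITIES.contains(&name).then(|| (name, &args[2..]))
}
```

So `/usr/bin/cmp -l a b` (through a symlink) and `diffutils cmp -l a b` would both select cmp with the arguments `-l a b`.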