-
Notifications
You must be signed in to change notification settings - Fork 188
[RFC] diff: add diff.<driver>.process for external hunk providers
#2120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
13eb201
58f4763
3735854
ad5246f
f6eb681
8fd3d25
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -218,6 +218,11 @@ endif::git-diff[] | |
| Set this option to `true` to make the diff driver cache the text | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Junio C Hamano wrote on the Git mailing list (how to reply to this email): "Michael Montalbo via GitGitGadget" <gitgitgadget@gmail.com> writes:
> Zero hunks with status=success means the tool considers the
> files equivalent. Git skips diff output for that file.
Is "zero hunk" a common word or some random string you invented? If
the latter, which is I am assuming it to be, you should define what
it means at/before the first use. Here in the proposed log message,
and ...
>
> Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
> ---
> Documentation/config/diff.adoc | 8 +
> Documentation/gitattributes.adoc | 40 ++++
> Makefile | 1 +
> diff-process.c | 206 +++++++++++++++++++
> diff-process.h | 28 +++
> diff.c | 23 +++
> t/.gitattributes | 1 +
> t/t4080-diff-process.sh | 338 +++++++++++++++++++++++++++++++
> 8 files changed, 645 insertions(+)
> create mode 100644 diff-process.c
> create mode 100644 diff-process.h
> create mode 100755 t/t4080-diff-process.sh
>
> diff --git a/Documentation/config/diff.adoc b/Documentation/config/diff.adoc
> index 1135a62a0a..4ab5f60df6 100644
> --- a/Documentation/config/diff.adoc
> +++ b/Documentation/config/diff.adoc
> @@ -218,6 +218,14 @@ endif::git-diff[]
> Set this option to `true` to make the diff driver cache the text
> conversion outputs. See linkgit:gitattributes[5] for details.
>
> +`diff.<driver>.process`::
> + The command to run as a long-running diff process.
> + The tool communicates via the pkt-line protocol and returns
> + hunks that are fed into Git's diff and blame pipelines.
> + If the tool returns zero hunks, the file is treated as
> + unchanged for both diff output and blame attribution.
> + See linkgit:gitattributes[5] for details.
... also here.
I do not know if you mean "the tool returns no hunks" (there is no
"hunk <old_start> <old_count> <new_start> <new_count>" line passed
from the tool over the protocol) or "the tool returns zero-hunk"
(there is a special "zero-hunk" message to signal this particular
condition sent over the protocol), and this description does not
quite help disambiguating between the two.
If the former, then avoid "zero hunks" as it sounds like a noun with
special meaning. Yes, we can say "tool returns one hunk", "tool
returns 31 hunks", etc., so "tool returns zero hunks" may logically
be correct, but "when the tool returns no hunks with status=success"
is much less confusing, I think.There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Michael Montalbo wrote on the Git mailing list (how to reply to this email): On Mon, May 25, 2026 at 7:26 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Michael Montalbo via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > Zero hunks with status=success means the tool considers the
> > files equivalent. Git skips diff output for that file.
>
> Is "zero hunk" a common word or some random string you invented? If
> the latter, which is I am assuming it to be, you should define what
> it means at/before the first use. Here in the proposed log message,
> and ...
>
> >
> > Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
> > ---
> > Documentation/config/diff.adoc | 8 +
> > Documentation/gitattributes.adoc | 40 ++++
> > Makefile | 1 +
> > diff-process.c | 206 +++++++++++++++++++
> > diff-process.h | 28 +++
> > diff.c | 23 +++
> > t/.gitattributes | 1 +
> > t/t4080-diff-process.sh | 338 +++++++++++++++++++++++++++++++
> > 8 files changed, 645 insertions(+)
> > create mode 100644 diff-process.c
> > create mode 100644 diff-process.h
> > create mode 100755 t/t4080-diff-process.sh
> >
> > diff --git a/Documentation/config/diff.adoc b/Documentation/config/diff.adoc
> > index 1135a62a0a..4ab5f60df6 100644
> > --- a/Documentation/config/diff.adoc
> > +++ b/Documentation/config/diff.adoc
> > @@ -218,6 +218,14 @@ endif::git-diff[]
> > Set this option to `true` to make the diff driver cache the text
> > conversion outputs. See linkgit:gitattributes[5] for details.
> >
> > +`diff.<driver>.process`::
> > + The command to run as a long-running diff process.
> > + The tool communicates via the pkt-line protocol and returns
> > + hunks that are fed into Git's diff and blame pipelines.
> > + If the tool returns zero hunks, the file is treated as
> > + unchanged for both diff output and blame attribution.
> > + See linkgit:gitattributes[5] for details.
>
> ... also here.
>
> I do not know if you mean "the tool returns no hunks" (there is no
> "hunk <old_start> <old_count> <new_start> <new_count>" line passed
> from the tool over the protocol) or "the tool returns zero-hunk"
> (there is a special "zero-hunk" message to signal this particular
> condition sent over the protocol), and this description does not
> quite help disambiguating between the two.
>
> If the former, then avoid "zero hunks" as it sounds like a noun with
> special meaning. Yes, we can say "tool returns one hunk", "tool
> returns 31 hunks", etc., so "tool returns zero hunks" may logically
> be correct, but "when the tool returns no hunks with status=success"
> is much less confusing, I think.
Yes, "zero hunks" was my own invention and I see why it's confusing. Will
update the messaging to use "no hunks" instead and do a broader sweep of
the documentation to clarify the protocol and expected tool behavior.There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Johannes Schindelin wrote on the Git mailing list (how to reply to this email): Hi Michael,
I stumbled about this patch when it broke CI in Git for Windows, where we
do _not_ use `NO_PYTHON`, even though Python is unavailable in the
build/test CI jobs. The existing tests handle this situation gracefully,
this here patch does not:
On Sun, 7 Jun 2026, Michael Montalbo via GitGitGadget wrote:
> diff --git a/t/t4080-diff-process.sh b/t/t4080-diff-process.sh
> new file mode 100755
> index 0000000000..f159cd86d8
> --- /dev/null
> +++ b/t/t4080-diff-process.sh
> @@ -0,0 +1,538 @@
> +#!/bin/sh
> +
> +test_description='diff process via long-running process'
> +
> +. ./test-lib.sh
> +
> +if test_have_prereq PYTHON
> +then
> + PYTHON_PATH=$(command -v python3) || PYTHON_PATH=$(command -v python)
When neither `python3` nor `python` are available (which is the case in
the minimal Git for Windows SDK used in Git's CI runs), this fails under
`set -e`. Before even running the first test case. Resulting in an
unexpected TAP format error.
Now, we could "fix" this by imitating what `lib-p4` does (see
https://github.com/dscho/git/commit/bd0b5570c744f678911a67a62da63f30655f20d8
which demonstrates that it is indeed a work-around on Windows):
-- snip --
diff --git a/t/t4080-diff-process.sh b/t/t4080-diff-process.sh
index fdf6da1c341e67..bd22c247ff3856 100755
--- a/t/t4080-diff-process.sh
+++ b/t/t4080-diff-process.sh
@@ -4,9 +4,10 @@ test_description='diff process via long-running process'
. ./test-lib.sh
-if test_have_prereq PYTHON
+if ! test_have_prereq PYTHON || ! test -x "$PYTHON_PATH"
then
- PYTHON_PATH=$(command -v python3) || PYTHON_PATH=$(command -v python)
+ skip_all='python interpreter not available'
+ test_done
fi
#
-- snap --
Of course, this uncovers _another_ problem with the Python script: It uses
Python3-only `f"..."` format strings, which cannot be handled by the
Python2 to which the `PYTHON_PATH` variable in `linux-TEST-vars` points.
So this requires _another follow-up (see also
https://github.com/dscho/git/commit/c12a9f4c80e5ce8db0fe370fac46fb45be2b775f):
-- snip --
diff --git a/t/t4080-diff-process.sh b/t/t4080-diff-process.sh
index bd22c247ff3856..ba14682a9086e4 100755
--- a/t/t4080-diff-process.sh
+++ b/t/t4080-diff-process.sh
@@ -39,7 +39,8 @@ setup_backend () {
def write_pkt(line):
data = (line + "\n").encode()
- sys.stdout.buffer.write(f"{len(data)+4:04x}".encode() + data)
+ hdr = "{:04x}".format(len(data) + 4).encode()
+ sys.stdout.buffer.write(hdr + data)
sys.stdout.buffer.flush()
def write_flush():
@@ -98,7 +99,8 @@ setup_backend () {
new = read_content()
old_first = old.split(b"\n")[0].decode(errors="replace") if old else ""
new_first = new.split(b"\n")[0].decode(errors="replace") if new else ""
- log(f"command={cmd} pathname={pathname} old={old_first} new={new_first}")
+ log("command={} pathname={} old={} new={}".format(
+ cmd, pathname, old_first, new_first))
if mode == "error":
write_flush()
@@ -130,7 +132,7 @@ setup_backend () {
else:
ol = old.count(b"\n")
nl = new.count(b"\n")
- write_pkt(f"hunk 1 {ol} 1 {nl}")
+ write_pkt("hunk 1 {} 1 {}".format(ol, nl))
write_flush()
write_pkt("status=success")
write_flush()
-- snap --
And this is still not enough to make it work with Python2, see
https://github.com/dscho/git/actions/runs/27091523842/job/79955895737:
-- snip --
[...]
+ git -c diff.cdiff.process=./diff-process-backend --mode=fixed-hunk diff boundary.c
Traceback (most recent call last):
File "/__w/git/git/t/trash directory.t4080-diff-process/diff-process-backend.py", line 45, in <module>
assert read_pkt() == "git-diff-client"
File "/__w/git/git/t/trash directory.t4080-diff-process/diff-process-backend.py", line 4, in read_pkt
hdr = sys.stdin.buffer.read(4)
AttributeError: 'file' object has no attribute 'buffer'
-- snap --
I have experienced similar patterns in my career, where a single decision
required multiple follow-up fixes _just_ to avoid having to revert that
decision. This kind of doubling down has never ended well.
Therefore I would like to take a step back, and ask: Is it _really_ a good
idea to use Python here? Are we certain that we want to _require_ Python
to run this test and skip it if Python isn't available (as is the case in
the Windows-related parts of Git's very own CI) even if Python has nothing
at all to do with the feature that is being tested?
I don't want to be doomed to repeat history, and we can very well learn
e.g. from prior art in this very project, where the tests for the
clean/smudge filters (which _also_ want to speak pkt-line over stdio)
needlessly incurred Perl as a requirement to run the tests. It was
Matheus's heroic work in 52917a998ef3a (t0021: implementation the
rot13-filter.pl script in C, 2022-08-14) and 4d1d843be7a15 (tests: use the
new C rot13-filter helper to avoid PERL prereq, 2022-08-14) that avoided
that unnecessary prerequisite.
Likewise, there is `test-tool pkt-line` intended for driving the pkt-line
protocol via simple shell scripts.
So the conscious project direction has been: fold pkt-line test backends
into `test-tool` and drop the scripting-language prereq. Reintroducing the
same shape in Python would walk this back.
Patrick's careful effort in 27bd8ee311719 (Merge branch 'ps/fewer-perl',
2025-04-29) has been another clear sign that the Git project is actively
_removing_ scripting-language dependencies from the build and test
infrastructure, not adding new ones.
The clear prior art in Git's own tests for what t4080 wants to do, as of
today, is `t/helper/test-rot13-filter.c`, which could be imitated here
instead of (re-)introducing a dependency on a scripting language other
than Unix shell in Git's test suite.
The `PYTHON` prereq exists in exactly five files today, all `git p4`
related (where Python is an intrinsic prerequisite given that `git-p4.py`
_is_ written in Python): `t/lib-git-p4.sh`, `t/t9802-git-p4-filetype.sh`,
`t/t9810-git-p4-rcs.sh`, `t/t9835-git-p4-metadata-encoding-python2.sh`,
and `t/t9836-git-p4-metadata-encoding-python3.sh`.
After 7cdbff14d482 (remove merge-recursive-old, 2006-11-20), this here
patch would be the first one, after almost 20 years, to re-introduce
Python as a dependency outside `git p4`.
And it would also be the first ever to embed a Python script as a heredoc:
> +fi
> +
> +#
> +# A single parametric diff process.
> +# Usage: diff-process-backend --mode=<mode> [--log=<path>]
> +#
> +# Modes:
> +# whole-file - report all lines as changed (default)
> +# fixed-hunk - always report hunk 5 2 5 2
> +# bad-hunk - report out-of-bounds hunk 999 1 999 1
> +# bad-sync - report hunk with mismatched unchanged totals
> +# overlap - report two overlapping hunks
> +# no-hunks - return no hunks (files considered equivalent)
> +# error - return status=error for every request
> +# abort - return status=abort for every request
> +# crash - read one request then exit without responding
> +#
> +setup_backend () {
> + cat >"$TRASH_DIRECTORY/diff-process-backend.py" <<-\PYEOF
> + import sys, os
> +
> + def read_pkt():
> + hdr = sys.stdin.buffer.read(4)
> + if len(hdr) < 4: return None
> + length = int(hdr, 16)
> + if length == 0: return ""
> + data = sys.stdin.buffer.read(length - 4)
> + return data.decode().rstrip("\n")
> +
> + def write_pkt(line):
> + data = (line + "\n").encode()
> + sys.stdout.buffer.write(f"{len(data)+4:04x}".encode() + data)
> + sys.stdout.buffer.flush()
> +
> + def write_flush():
> + sys.stdout.buffer.write(b"0000")
> + sys.stdout.buffer.flush()
> +
> + def read_content():
> + chunks = []
> + while True:
> + hdr = sys.stdin.buffer.read(4)
> + if len(hdr) < 4: break
> + length = int(hdr, 16)
> + if length == 0: break
> + chunks.append(sys.stdin.buffer.read(length - 4))
> + return b"".join(chunks)
> +
> + mode = "whole-file"
> + logfile = None
> + for arg in sys.argv[1:]:
> + if arg.startswith("--mode="):
> + mode = arg[7:]
> + elif arg.startswith("--log="):
> + logfile = open(arg[6:], "a")
> +
> + def log(msg):
> + if logfile:
> + logfile.write(msg + "\n")
> + logfile.flush()
> +
> + # Handshake
> + assert read_pkt() == "git-diff-client"
> + assert read_pkt() == "version=1"
> + read_pkt()
> + write_pkt("git-diff-server")
> + write_pkt("version=1")
> + write_flush()
> + while True:
> + p = read_pkt()
> + if p == "": break
> + write_pkt("capability=hunks")
> + write_flush()
> +
> + log("ready")
> +
> + while True:
> + cmd = None
> + pathname = None
> + while True:
> + p = read_pkt()
> + if p is None: sys.exit(0)
> + if p == "": break
> + if p.startswith("command="): cmd = p.split("=",1)[1]
> + if p.startswith("pathname="): pathname = p.split("=",1)[1]
> + if cmd is None: sys.exit(0)
> + old = read_content()
> + new = read_content()
> + old_first = old.split(b"\n")[0].decode(errors="replace") if old else ""
> + new_first = new.split(b"\n")[0].decode(errors="replace") if new else ""
> + log(f"command={cmd} pathname={pathname} old={old_first} new={new_first}")
> +
> + if mode == "error":
> + write_flush()
> + write_pkt("status=error")
> + write_flush()
> + continue
> +
> + if mode == "abort":
> + write_flush()
> + write_pkt("status=abort")
> + write_flush()
> + continue
> +
> + if mode == "crash":
> + sys.exit(1)
> +
> + if cmd == "hunks":
> + if mode == "fixed-hunk":
> + write_pkt("hunk 5 2 5 2")
> + elif mode == "bad-hunk":
> + write_pkt("hunk 999 1 999 1")
> + elif mode == "bad-sync":
> + write_pkt("hunk 1 2 1 1")
> + elif mode == "overlap":
> + write_pkt("hunk 1 5 1 5")
> + write_pkt("hunk 3 2 3 2")
> + elif mode == "no-hunks":
> + pass
> + else:
> + ol = old.count(b"\n")
> + nl = new.count(b"\n")
> + write_pkt(f"hunk 1 {ol} 1 {nl}")
> + write_flush()
> + write_pkt("status=success")
> + write_flush()
> + else:
> + write_flush()
> + write_pkt("status=error")
> + write_flush()
> + PYEOF
The existing pattern is to provide larger scripts as fixtures in
associated `t/tNNNN/` directories, not as heredoc, see e.g.
`t/t1509/prepare-chroot.sh`. Writing scripts, especially lengthy ones, in
heredoc strings makes it virtually impossible to use static code analysis
or syntax highlighting to fend off banal errors.
Given the complexity of what t4080 tries to test (error, abort, crash,
bad-sync, no-hunks, multiple files in one session, capability
negotiation), it would unfortunately be infeasible to use `test-tool
pkt-line` from a shell script implementing that `diff.*.process` protocol.
So I've spiked a demo how the `test-tool diff-process-backend` could look
like (letting Opus do the menial typing, so that I can enjoy at least part
of a sunny Sunday outside), which also passes the CI build and test:
https://github.com/dscho/git/commit/b6e3c93381b00929476c3a00155f7cf7334a22e6
That commit is of course not intended to be used as-is; Feel free to pick
code parts of it and integrate them into your topic branch. Or write your
own test-tool helper from scratch if that's more your jam.
Ciao,
Johannes
> + write_script diff-process-backend <<-SHEOF
> + exec "$PYTHON_PATH" "$TRASH_DIRECTORY/diff-process-backend.py" "\$@"
> + SHEOF
> +}
> +
> +BACKEND="./diff-process-backend"
> +
> +test_expect_success PYTHON 'setup' '
> + setup_backend &&
> + echo "*.c diff=cdiff" >.gitattributes &&
> + git add .gitattributes &&
> +
> + # boundary.c: 10 lines, changes at 5-6 and 9-10.
> + # Used by: hunk boundaries, error fallback, crash, bad hunks, overlap.
> + cat >boundary.c <<-\EOF &&
> + line1
> + line2
> + line3
> + line4
> + OLD5
> + OLD6
> + line7
> + line8
> + OLD9
> + OLD10
> + EOF
> + git add boundary.c &&
> +
> + # worddiff.c: single-line function, value changes 1 -> 999.
> + # Used by: word-diff, --diff-algorithm, --no-ext-diff, --stat.
> + cat >worddiff.c <<-\EOF &&
> + int value(void) { return 1; }
> + EOF
> + git add worddiff.c &&
> +
> + # newfile.c: single-line function, value changes 42 -> 99.
> + # Used by: new file, --exit-code, multiple drivers.
> + cat >newfile.c <<-\EOF &&
> + int new_func(void) { return 42; }
> + EOF
> + git add newfile.c &&
> +
> + # logtest.c: single-line function for log/format-patch tests.
> + # Needs two commits so log -1 has a diff.
> + cat >logtest.c <<-\EOF &&
> + int logfunc(void) { return 1; }
> + EOF
> + git add logtest.c &&
> +
> + # two.c/one.c: two-file pair for error/abort/startup-failure tests.
> + cat >one.c <<-\EOF &&
> + int first(void) { return 1; }
> + EOF
> + cat >two.c <<-\EOF &&
> + int second(void) { return 2; }
> + EOF
> + git add one.c two.c &&
> +
> + git commit -m "initial" &&
> +
> + # Second commit for logtest.c (so log -1 has something to show).
> + cat >logtest.c <<-\EOF &&
> + int logfunc(void) { return 2; }
> + EOF
> + git add logtest.c &&
> + git commit -m "change logtest.c" &&
> +
> + # Working tree modifications (not committed).
> + cat >boundary.c <<-\EOF &&
> + line1
> + line2
> + line3
> + line4
> + NEW5
> + NEW6
> + line7
> + line8
> + NEW9
> + NEW10
> + EOF
> +
> + cat >worddiff.c <<-\EOF &&
> + int value(void) { return 999; }
> + EOF
> +
> + cat >newfile.c <<-\EOF &&
> + int new_func(void) { return 99; }
> + EOF
> +
> + cat >one.c <<-\EOF &&
> + int first(void) { return 10; }
> + EOF
> +
> + cat >two.c <<-\EOF
> + int second(void) { return 20; }
> + EOF
> +'
> +
> +#
> +# Core behavior: the tool controls which lines are marked as changed.
> +#
> +
> +test_expect_success PYTHON 'diff process hunk boundaries affect output' '
> + # The file has changes at lines 5-6 and 9-10, but fixed-hunk
> + # only reports lines 5-6 as changed. Lines 9-10 should not
> + # appear as changed in the output.
> + git -c diff.cdiff.process="$BACKEND --mode=fixed-hunk" \
> + diff boundary.c >actual &&
> + test_grep "^-OLD5" actual &&
> + test_grep "^-OLD6" actual &&
> + test_grep "^+NEW5" actual &&
> + test_grep "^+NEW6" actual &&
> + test_grep ! "^-OLD9" actual &&
> + test_grep ! "^-OLD10" actual &&
> + test_grep ! "^+NEW9" actual &&
> + test_grep ! "^+NEW10" actual
> +'
> +
> +test_expect_success PYTHON 'diff process works with new file' '
> + rm -f backend.log &&
> + git -c diff.cdiff.process="$BACKEND --log=backend.log" \
> + diff -- newfile.c >actual 2>stderr &&
> + test_grep "return 99" actual &&
> + test_grep "pathname=newfile.c" backend.log &&
> + test_must_be_empty stderr
> +'
> +
> +test_expect_success PYTHON 'diff process works with added file (empty old side)' '
> + cat >added.c <<-\EOF &&
> + int added(void) { return 1; }
> + EOF
> + git add added.c &&
> +
> + rm -f backend.log &&
> + git -c diff.cdiff.process="$BACKEND --log=backend.log" \
> + diff --cached -- added.c >actual 2>stderr &&
> + test_grep "added" actual &&
> + test_grep "pathname=added.c" backend.log &&
> + test_must_be_empty stderr
> +'
> +
> +test_expect_success PYTHON 'diff process skipped for binary files' '
> + printf "\\0binary" >binary.c &&
> + git add binary.c &&
> + git commit -m "add binary" &&
> + printf "\\0changed" >binary.c &&
> +
> + rm -f backend.log &&
> + git -c diff.cdiff.process="$BACKEND --log=backend.log" \
> + diff -- binary.c >actual &&
> + test_grep "Binary files" actual &&
> + test_path_is_missing backend.log
> +'
> +
> +test_expect_success PYTHON 'diff process not consulted for unmatched driver' '
> + echo "not tracked by cdiff" >unmatched.txt &&
> + git add unmatched.txt &&
> + git commit -m "add unmatched.txt" &&
> +
> + echo "modified" >unmatched.txt &&
> +
> + rm -f backend.log &&
> + git -c diff.cdiff.process="$BACKEND --log=backend.log" \
> + diff -- unmatched.txt >actual &&
> + test_grep "modified" actual &&
> + test_path_is_missing backend.log
> +'
> +
> +test_expect_success PYTHON 'multiple drivers use separate processes' '
> + echo "*.h diff=hdiff" >>.gitattributes &&
> + git add .gitattributes &&
> +
> + cat >multi.h <<-\EOF &&
> + int header(void) { return 1; }
> + EOF
> + git add multi.h &&
> + git commit -m "add multi.h" &&
> +
> + cat >multi.h <<-\EOF &&
> + int header(void) { return 2; }
> + EOF
> +
> + rm -f backend-c.log backend-h.log &&
> + git -c diff.cdiff.process="$BACKEND --log=backend-c.log" \
> + -c diff.hdiff.process="$BACKEND --log=backend-h.log" \
> + diff -- newfile.c multi.h >actual 2>stderr &&
> + test_grep "pathname=newfile.c" backend-c.log &&
> + test_grep "pathname=multi.h" backend-h.log &&
> + test_must_be_empty stderr
> +'
> +
> +test_expect_success PYTHON 'diff process works alongside textconv' '
> + write_script uppercase-filter <<-\EOF &&
> + tr "a-z" "A-Z" <"$1"
> + EOF
> +
> + cat >textconv.c <<-\EOF &&
> + hello world
> + EOF
> + git add textconv.c &&
> + git commit -m "add textconv.c" &&
> +
> + cat >textconv.c <<-\EOF &&
> + goodbye world
> + EOF
> +
> + rm -f backend.log &&
> + git -c diff.cdiff.textconv="./uppercase-filter" \
> + -c diff.cdiff.process="$BACKEND --log=backend.log" \
> + diff -- textconv.c >actual 2>stderr &&
> + # The diff process receives textconv-transformed (uppercase) content.
> + test_grep "pathname=textconv.c" backend.log &&
> + test_grep "old=HELLO WORLD" backend.log &&
> + test_grep "new=GOODBYE WORLD" backend.log &&
> + test_must_be_empty stderr
> +'
> +
> +#
> +# Downstream features: word diff, log, equivalent files, exit code.
> +#
> +
> +test_expect_success PYTHON 'diff process with --word-diff' '
> + rm -f backend.log &&
> + git -c diff.cdiff.process="$BACKEND --log=backend.log" \
> + diff --word-diff worddiff.c >actual 2>stderr &&
> + test_grep "\[-1;-\]" actual &&
> + test_grep "{+999;+}" actual &&
> + test_grep "pathname=worddiff.c" backend.log &&
> + test_must_be_empty stderr
> +'
> +
> +test_expect_success PYTHON 'diff process works with git log -p' '
> + # With no-hunks mode, the tool says the files are equivalent,
> + # so log -p should show the commit but no diff content.
> + rm -f backend.log &&
> + git -c diff.cdiff.process="$BACKEND --mode=no-hunks --log=backend.log" \
> + log -1 -p -- logtest.c >actual 2>stderr &&
> + test_grep "change logtest.c" actual &&
> + test_grep ! "return 2" actual &&
> + test_grep "command=hunks pathname=logtest.c" backend.log &&
> + test_must_be_empty stderr
> +'
> +
> +test_expect_success PYTHON 'diff process no hunks suppresses diff output' '
> + cat >nohunks.c <<-\EOF &&
> + int zero(void) { return 0; }
> + EOF
> + git add nohunks.c &&
> + git commit -m "add nohunks.c" &&
> +
> + cat >nohunks.c <<-\EOF &&
> + int zero(void) { return 999; }
> + EOF
> +
> + git -c diff.cdiff.process="$BACKEND --mode=no-hunks" \
> + diff nohunks.c >actual &&
> + test_must_be_empty actual
> +'
> +
> +test_expect_success PYTHON 'diff process no hunks with --exit-code returns success' '
> + git -c diff.cdiff.process="$BACKEND --mode=no-hunks" \
> + diff --exit-code nohunks.c
> +'
> +
> +test_expect_success PYTHON 'diff process with --exit-code and hunks returns failure' '
> + test_expect_code 1 git -c diff.cdiff.process="$BACKEND" \
> + diff --exit-code newfile.c
> +'
> +
> +#
> +# Bypass mechanisms: flags and commands that skip the diff process.
> +#
> +
> +test_expect_success PYTHON 'diff process bypassed by --diff-algorithm' '
> + rm -f backend.log &&
> + git -c diff.cdiff.process="$BACKEND --log=backend.log" \
> + diff --diff-algorithm=patience worddiff.c >actual &&
> + test_grep "return 999" actual &&
> + test_path_is_missing backend.log
> +'
> +
> +test_expect_success PYTHON 'diff process not used by --stat' '
> + rm -f backend.log &&
> + git -c diff.cdiff.process="$BACKEND --log=backend.log" \
> + diff --stat worddiff.c >actual &&
> + test_grep "worddiff.c" actual &&
> + test_path_is_missing backend.log
> +'
> +
> +#
> +# Error handling and fallback.
> +#
> +
> +test_expect_success PYTHON 'diff process fallback on tool error status' '
> + rm -f backend.log &&
> + git -c diff.cdiff.process="$BACKEND --mode=error --log=backend.log" \
> + diff boundary.c >actual 2>stderr &&
> + # Fallback produces the full builtin diff (both change regions).
> + test_grep "^-OLD5" actual &&
> + test_grep "^+NEW5" actual &&
> + test_grep "^-OLD9" actual &&
> + test_grep "^+NEW9" actual &&
> + # Tool was contacted (it replied with error, not crash).
> + test_grep "command=hunks pathname=boundary.c" backend.log &&
> + test_grep "diff process.*failed" stderr
> +'
> +
> +test_expect_success PYTHON 'diff process error keeps tool available for next file' '
> + rm -f backend.log &&
> + git -c diff.cdiff.process="$BACKEND --mode=error --log=backend.log" \
> + diff -- one.c two.c >actual 2>stderr &&
> + # Unlike abort, error keeps the tool available: both files
> + # are sent to the tool (and both fall back).
> + test_grep "pathname=one.c" backend.log &&
> + test_grep "pathname=two.c" backend.log &&
> + test_grep "return 10" actual &&
> + test_grep "return 20" actual
> +'
> +
> +test_expect_success PYTHON 'diff process abort disables for session' '
> + rm -f backend.log &&
> + git -c diff.cdiff.process="$BACKEND --mode=abort --log=backend.log" \
> + diff -- one.c two.c >actual &&
> + # Both files should still produce diff output via fallback.
> + test_grep "return 10" actual &&
> + test_grep "return 20" actual &&
> + # The tool aborts on the first file and git clears its
> + # capability. The second file never contacts the tool.
> + test_grep "pathname=one.c" backend.log &&
> + test_grep ! "pathname=two.c" backend.log
> +'
> +
> +test_expect_success PYTHON 'diff process fallback on tool crash' '
> + git -c diff.cdiff.process="$BACKEND --mode=crash" \
> + diff boundary.c >actual 2>stderr &&
> + test_grep "^-OLD5" actual &&
> + test_grep "^+NEW5" actual &&
> + test_grep "^-OLD9" actual &&
> + test_grep "^+NEW9" actual &&
> + # Crash is a communication failure, so a warning is emitted.
> + test_grep "diff process.*failed" stderr
> +'
> +
> +test_expect_success PYTHON 'diff process startup failure only warns once' '
> + git -c diff.cdiff.process="/nonexistent/tool" \
> + diff -- one.c two.c >actual 2>stderr &&
> + # Both files produce diff output via fallback.
> + test_grep "return 10" actual &&
> + test_grep "return 20" actual &&
> + # Sentinel prevents repeated warnings: only one, not one per file.
> + test_grep "diff process.*failed" stderr >warnings &&
> + test_line_count = 1 warnings
> +'
> +
> +test_expect_success PYTHON 'diff process fallback on bad hunks' '
> + git -c diff.cdiff.process="$BACKEND --mode=bad-hunk" \
> + diff boundary.c >actual 2>stderr &&
> + test_grep "^-OLD5" actual &&
> + test_grep "^+NEW5" actual &&
> + test_grep "^-OLD9" actual &&
> + test_grep "^+NEW9" actual &&
> + # Invalid hunks are caught by xdiff validation, not the
> + # protocol layer, so no warning is emitted.
> + test_must_be_empty stderr
> +'
> +
> +test_expect_success PYTHON 'diff process fallback on mismatched unchanged totals' '
> + cat >synctest.c <<-\EOF &&
> + line1
> + line2
> + line3
> + EOF
> + git add synctest.c &&
> + git commit -m "add synctest.c" &&
> +
> + cat >synctest.c <<-\EOF &&
> + line1
> + changed
> + line3
> + EOF
> +
> + # bad-sync reports hunk 1 2 1 1: marks 2 old lines and 1 new
> + # line as changed, leaving 1 unchanged old vs 2 unchanged new.
> + # The synchronization invariant fails and git falls back.
> + git -c diff.cdiff.process="$BACKEND --mode=bad-sync" \
> + diff synctest.c >actual 2>stderr &&
> + test_grep "changed" actual
> +'
> +
> +test_expect_success PYTHON 'diff process fallback on overlapping hunks' '
> + # boundary.c has 10 lines, so both hunks are in bounds
> + # but they overlap at lines 3-5, triggering the ordering check.
> + git -c diff.cdiff.process="$BACKEND --mode=overlap" \
> + diff boundary.c >actual 2>stderr &&
> + test_grep "NEW5" actual
> +'
> +
> +test_done
> diff --git a/userdiff.h b/userdiff.h
> index 51c26e0d41..a98eabe377 100644
> --- a/userdiff.h
> +++ b/userdiff.h
> @@ -3,6 +3,7 @@
>
> #include "notes-cache.h"
>
> +struct diff_subprocess;
> struct index_state;
> struct repository;
>
> @@ -33,6 +34,8 @@ struct userdiff_driver {
> int textconv_want_cache;
> const char *process;
> char *process_owned;
> + struct diff_subprocess *diff_subprocess;
> + unsigned diff_process_failed : 1;
> };
> enum userdiff_driver_type {
> USERDIFF_DRIVER_TYPE_BUILTIN = 1<<0,
> --
> gitgitgadget
>
>
> There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Michael Montalbo wrote on the Git mailing list (how to reply to this email): On Sun, Jun 7, 2026 at 7:36 AM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> Hi Michael,
>
> I stumbled about this patch when it broke CI in Git for Windows, where we
> do _not_ use `NO_PYTHON`, even though Python is unavailable in the
> build/test CI jobs. The existing tests handle this situation gracefully,
> this here patch does not:
>
> On Sun, 7 Jun 2026, Michael Montalbo via GitGitGadget wrote:
>
> > diff --git a/t/t4080-diff-process.sh b/t/t4080-diff-process.sh
> > new file mode 100755
> > index 0000000000..f159cd86d8
> > --- /dev/null
> > +++ b/t/t4080-diff-process.sh
> > @@ -0,0 +1,538 @@
> > +#!/bin/sh
> > +
> > +test_description='diff process via long-running process'
> > +
> > +. ./test-lib.sh
> > +
> > +if test_have_prereq PYTHON
> > +then
> > + PYTHON_PATH=$(command -v python3) || PYTHON_PATH=$(command -v python)
>
> When neither `python3` nor `python` are available (which is the case in
> the minimal Git for Windows SDK used in Git's CI runs), this fails under
> `set -e`. Before even running the first test case. Resulting in an
> unexpected TAP format error.
>
> Now, we could "fix" this by imitating what `lib-p4` does (see
> https://github.com/dscho/git/commit/bd0b5570c744f678911a67a62da63f30655f20d8
> which demonstrates that it is indeed a work-around on Windows):
>
> -- snip --
> diff --git a/t/t4080-diff-process.sh b/t/t4080-diff-process.sh
> index fdf6da1c341e67..bd22c247ff3856 100755
> --- a/t/t4080-diff-process.sh
> +++ b/t/t4080-diff-process.sh
> @@ -4,9 +4,10 @@ test_description='diff process via long-running process'
>
> . ./test-lib.sh
>
> -if test_have_prereq PYTHON
> +if ! test_have_prereq PYTHON || ! test -x "$PYTHON_PATH"
> then
> - PYTHON_PATH=$(command -v python3) || PYTHON_PATH=$(command -v python)
> + skip_all='python interpreter not available'
> + test_done
> fi
>
> #
> -- snap --
>
> Of course, this uncovers _another_ problem with the Python script: It uses
> Python3-only `f"..."` format strings, which cannot be handled by the
> Python2 to which the `PYTHON_PATH` variable in `linux-TEST-vars` points.
> So this requires _another follow-up (see also
> https://github.com/dscho/git/commit/c12a9f4c80e5ce8db0fe370fac46fb45be2b775f):
>
> -- snip --
> diff --git a/t/t4080-diff-process.sh b/t/t4080-diff-process.sh
> index bd22c247ff3856..ba14682a9086e4 100755
> --- a/t/t4080-diff-process.sh
> +++ b/t/t4080-diff-process.sh
> @@ -39,7 +39,8 @@ setup_backend () {
>
> def write_pkt(line):
> data = (line + "\n").encode()
> - sys.stdout.buffer.write(f"{len(data)+4:04x}".encode() + data)
> + hdr = "{:04x}".format(len(data) + 4).encode()
> + sys.stdout.buffer.write(hdr + data)
> sys.stdout.buffer.flush()
>
> def write_flush():
> @@ -98,7 +99,8 @@ setup_backend () {
> new = read_content()
> old_first = old.split(b"\n")[0].decode(errors="replace") if old else ""
> new_first = new.split(b"\n")[0].decode(errors="replace") if new else ""
> - log(f"command={cmd} pathname={pathname} old={old_first} new={new_first}")
> + log("command={} pathname={} old={} new={}".format(
> + cmd, pathname, old_first, new_first))
>
> if mode == "error":
> write_flush()
> @@ -130,7 +132,7 @@ setup_backend () {
> else:
> ol = old.count(b"\n")
> nl = new.count(b"\n")
> - write_pkt(f"hunk 1 {ol} 1 {nl}")
> + write_pkt("hunk 1 {} 1 {}".format(ol, nl))
> write_flush()
> write_pkt("status=success")
> write_flush()
> -- snap --
>
> And this is still not enough to make it work with Python2, see
> https://github.com/dscho/git/actions/runs/27091523842/job/79955895737:
>
> -- snip --
> [...]
> + git -c diff.cdiff.process=./diff-process-backend --mode=fixed-hunk diff boundary.c
> Traceback (most recent call last):
> File "/__w/git/git/t/trash directory.t4080-diff-process/diff-process-backend.py", line 45, in <module>
> assert read_pkt() == "git-diff-client"
> File "/__w/git/git/t/trash directory.t4080-diff-process/diff-process-backend.py", line 4, in read_pkt
> hdr = sys.stdin.buffer.read(4)
> AttributeError: 'file' object has no attribute 'buffer'
> -- snap --
>
> I have experienced similar patterns in my career, where a single decision
> required multiple follow-up fixes _just_ to avoid having to revert that
> decision. This kind of doubling down has never ended well.
>
> Therefore I would like to take a step back, and ask: Is it _really_ a good
> idea to use Python here? Are we certain that we want to _require_ Python
> to run this test and skip it if Python isn't available (as is the case in
> the Windows-related parts of Git's very own CI) even if Python has nothing
> at all to do with the feature that is being tested?
>
> I don't want to be doomed to repeat history, and we can very well learn
> e.g. from prior art in this very project, where the tests for the
> clean/smudge filters (which _also_ want to speak pkt-line over stdio)
> needlessly incurred Perl as a requirement to run the tests. It was
> Matheus's heroic work in 52917a998ef3a (t0021: implementation the
> rot13-filter.pl script in C, 2022-08-14) and 4d1d843be7a15 (tests: use the
> new C rot13-filter helper to avoid PERL prereq, 2022-08-14) that avoided
> that unnecessary prerequisite.
>
> Likewise, there is `test-tool pkt-line` intended for driving the pkt-line
> protocol via simple shell scripts.
>
> So the conscious project direction has been: fold pkt-line test backends
> into `test-tool` and drop the scripting-language prereq. Reintroducing the
> same shape in Python would walk this back.
>
> Patrick's careful effort in 27bd8ee311719 (Merge branch 'ps/fewer-perl',
> 2025-04-29) has been another clear sign that the Git project is actively
> _removing_ scripting-language dependencies from the build and test
> infrastructure, not adding new ones.
>
> The clear prior art in Git's own tests for what t4080 wants to do, as of
> today, is `t/helper/test-rot13-filter.c`, which could be imitated here
> instead of (re-)introducing a dependency on a scripting language other
> than Unix shell in Git's test suite.
>
> The `PYTHON` prereq exists in exactly five files today, all `git p4`
> related (where Python is an intrinsic prerequisite given that `git-p4.py`
> _is_ written in Python): `t/lib-git-p4.sh`, `t/t9802-git-p4-filetype.sh`,
> `t/t9810-git-p4-rcs.sh`, `t/t9835-git-p4-metadata-encoding-python2.sh`,
> and `t/t9836-git-p4-metadata-encoding-python3.sh`.
>
> After 7cdbff14d482 (remove merge-recursive-old, 2006-11-20), this here
> patch would be the first one, after almost 20 years, to re-introduce
> Python as a dependency outside `git p4`.
>
> And it would also be the first ever to embed a Python script as a heredoc:
>
> > +fi
> > +
> > +#
> > +# A single parametric diff process.
> > +# Usage: diff-process-backend --mode=<mode> [--log=<path>]
> > +#
> > +# Modes:
> > +# whole-file - report all lines as changed (default)
> > +# fixed-hunk - always report hunk 5 2 5 2
> > +# bad-hunk - report out-of-bounds hunk 999 1 999 1
> > +# bad-sync - report hunk with mismatched unchanged totals
> > +# overlap - report two overlapping hunks
> > +# no-hunks - return no hunks (files considered equivalent)
> > +# error - return status=error for every request
> > +# abort - return status=abort for every request
> > +# crash - read one request then exit without responding
> > +#
> > +setup_backend () {
> > + cat >"$TRASH_DIRECTORY/diff-process-backend.py" <<-\PYEOF
> > + import sys, os
> > +
> > + def read_pkt():
> > + hdr = sys.stdin.buffer.read(4)
> > + if len(hdr) < 4: return None
> > + length = int(hdr, 16)
> > + if length == 0: return ""
> > + data = sys.stdin.buffer.read(length - 4)
> > + return data.decode().rstrip("\n")
> > +
> > + def write_pkt(line):
> > + data = (line + "\n").encode()
> > + sys.stdout.buffer.write(f"{len(data)+4:04x}".encode() + data)
> > + sys.stdout.buffer.flush()
> > +
> > + def write_flush():
> > + sys.stdout.buffer.write(b"0000")
> > + sys.stdout.buffer.flush()
> > +
> > + def read_content():
> > + chunks = []
> > + while True:
> > + hdr = sys.stdin.buffer.read(4)
> > + if len(hdr) < 4: break
> > + length = int(hdr, 16)
> > + if length == 0: break
> > + chunks.append(sys.stdin.buffer.read(length - 4))
> > + return b"".join(chunks)
> > +
> > + mode = "whole-file"
> > + logfile = None
> > + for arg in sys.argv[1:]:
> > + if arg.startswith("--mode="):
> > + mode = arg[7:]
> > + elif arg.startswith("--log="):
> > + logfile = open(arg[6:], "a")
> > +
> > + def log(msg):
> > + if logfile:
> > + logfile.write(msg + "\n")
> > + logfile.flush()
> > +
> > + # Handshake
> > + assert read_pkt() == "git-diff-client"
> > + assert read_pkt() == "version=1"
> > + read_pkt()
> > + write_pkt("git-diff-server")
> > + write_pkt("version=1")
> > + write_flush()
> > + while True:
> > + p = read_pkt()
> > + if p == "": break
> > + write_pkt("capability=hunks")
> > + write_flush()
> > +
> > + log("ready")
> > +
> > + while True:
> > + cmd = None
> > + pathname = None
> > + while True:
> > + p = read_pkt()
> > + if p is None: sys.exit(0)
> > + if p == "": break
> > + if p.startswith("command="): cmd = p.split("=",1)[1]
> > + if p.startswith("pathname="): pathname = p.split("=",1)[1]
> > + if cmd is None: sys.exit(0)
> > + old = read_content()
> > + new = read_content()
> > + old_first = old.split(b"\n")[0].decode(errors="replace") if old else ""
> > + new_first = new.split(b"\n")[0].decode(errors="replace") if new else ""
> > + log(f"command={cmd} pathname={pathname} old={old_first} new={new_first}")
> > +
> > + if mode == "error":
> > + write_flush()
> > + write_pkt("status=error")
> > + write_flush()
> > + continue
> > +
> > + if mode == "abort":
> > + write_flush()
> > + write_pkt("status=abort")
> > + write_flush()
> > + continue
> > +
> > + if mode == "crash":
> > + sys.exit(1)
> > +
> > + if cmd == "hunks":
> > + if mode == "fixed-hunk":
> > + write_pkt("hunk 5 2 5 2")
> > + elif mode == "bad-hunk":
> > + write_pkt("hunk 999 1 999 1")
> > + elif mode == "bad-sync":
> > + write_pkt("hunk 1 2 1 1")
> > + elif mode == "overlap":
> > + write_pkt("hunk 1 5 1 5")
> > + write_pkt("hunk 3 2 3 2")
> > + elif mode == "no-hunks":
> > + pass
> > + else:
> > + ol = old.count(b"\n")
> > + nl = new.count(b"\n")
> > + write_pkt(f"hunk 1 {ol} 1 {nl}")
> > + write_flush()
> > + write_pkt("status=success")
> > + write_flush()
> > + else:
> > + write_flush()
> > + write_pkt("status=error")
> > + write_flush()
> > + PYEOF
>
> The existing pattern is to provide larger scripts as fixtures in
> associated `t/tNNNN/` directories, not as heredoc, see e.g.
> `t/t1509/prepare-chroot.sh`. Writing scripts, especially lengthy ones, in
> heredoc strings makes it virtually impossible to use static code analysis
> or syntax highlighting to fend off banal errors.
>
> Given the complexity of what t4080 tries to test (error, abort, crash,
> bad-sync, no-hunks, multiple files in one session, capability
> negotiation), it would unfortunately be infeasible to use `test-tool
> pkt-line` from a shell script implementing that `diff.*.process` protocol.
>
> So I've spiked a demo how the `test-tool diff-process-backend` could look
> like (letting Opus do the menial typing, so that I can enjoy at least part
> of a sunny Sunday outside), which also passes the CI build and test:
> https://github.com/dscho/git/commit/b6e3c93381b00929476c3a00155f7cf7334a22e6
>
> That commit is of course not intended to be used as-is; Feel free to pick
> code parts of it and integrate them into your topic branch. Or write your
> own test-tool helper from scratch if that's more your jam.
>
Johannes, thank you for the great feedback. The historical context is
really helpful and
the concerns you raise make a lot of sense. I will take a look at your
spike and also work
on removing Python from the test.
> Ciao,
> Johannes
>
> > + write_script diff-process-backend <<-SHEOF
> > + exec "$PYTHON_PATH" "$TRASH_DIRECTORY/diff-process-backend.py" "\$@"
> > + SHEOF
> > +}
> > +
> > +BACKEND="./diff-process-backend"
> > +
> > +test_expect_success PYTHON 'setup' '
> > + setup_backend &&
> > + echo "*.c diff=cdiff" >.gitattributes &&
> > + git add .gitattributes &&
> > +
> > + # boundary.c: 10 lines, changes at 5-6 and 9-10.
> > + # Used by: hunk boundaries, error fallback, crash, bad hunks, overlap.
> > + cat >boundary.c <<-\EOF &&
> > + line1
> > + line2
> > + line3
> > + line4
> > + OLD5
> > + OLD6
> > + line7
> > + line8
> > + OLD9
> > + OLD10
> > + EOF
> > + git add boundary.c &&
> > +
> > + # worddiff.c: single-line function, value changes 1 -> 999.
> > + # Used by: word-diff, --diff-algorithm, --no-ext-diff, --stat.
> > + cat >worddiff.c <<-\EOF &&
> > + int value(void) { return 1; }
> > + EOF
> > + git add worddiff.c &&
> > +
> > + # newfile.c: single-line function, value changes 42 -> 99.
> > + # Used by: new file, --exit-code, multiple drivers.
> > + cat >newfile.c <<-\EOF &&
> > + int new_func(void) { return 42; }
> > + EOF
> > + git add newfile.c &&
> > +
> > + # logtest.c: single-line function for log/format-patch tests.
> > + # Needs two commits so log -1 has a diff.
> > + cat >logtest.c <<-\EOF &&
> > + int logfunc(void) { return 1; }
> > + EOF
> > + git add logtest.c &&
> > +
> > + # two.c/one.c: two-file pair for error/abort/startup-failure tests.
> > + cat >one.c <<-\EOF &&
> > + int first(void) { return 1; }
> > + EOF
> > + cat >two.c <<-\EOF &&
> > + int second(void) { return 2; }
> > + EOF
> > + git add one.c two.c &&
> > +
> > + git commit -m "initial" &&
> > +
> > + # Second commit for logtest.c (so log -1 has something to show).
> > + cat >logtest.c <<-\EOF &&
> > + int logfunc(void) { return 2; }
> > + EOF
> > + git add logtest.c &&
> > + git commit -m "change logtest.c" &&
> > +
> > + # Working tree modifications (not committed).
> > + cat >boundary.c <<-\EOF &&
> > + line1
> > + line2
> > + line3
> > + line4
> > + NEW5
> > + NEW6
> > + line7
> > + line8
> > + NEW9
> > + NEW10
> > + EOF
> > +
> > + cat >worddiff.c <<-\EOF &&
> > + int value(void) { return 999; }
> > + EOF
> > +
> > + cat >newfile.c <<-\EOF &&
> > + int new_func(void) { return 99; }
> > + EOF
> > +
> > + cat >one.c <<-\EOF &&
> > + int first(void) { return 10; }
> > + EOF
> > +
> > + cat >two.c <<-\EOF
> > + int second(void) { return 20; }
> > + EOF
> > +'
> > +
> > +#
> > +# Core behavior: the tool controls which lines are marked as changed.
> > +#
> > +
> > +test_expect_success PYTHON 'diff process hunk boundaries affect output' '
> > + # The file has changes at lines 5-6 and 9-10, but fixed-hunk
> > + # only reports lines 5-6 as changed. Lines 9-10 should not
> > + # appear as changed in the output.
> > + git -c diff.cdiff.process="$BACKEND --mode=fixed-hunk" \
> > + diff boundary.c >actual &&
> > + test_grep "^-OLD5" actual &&
> > + test_grep "^-OLD6" actual &&
> > + test_grep "^+NEW5" actual &&
> > + test_grep "^+NEW6" actual &&
> > + test_grep ! "^-OLD9" actual &&
> > + test_grep ! "^-OLD10" actual &&
> > + test_grep ! "^+NEW9" actual &&
> > + test_grep ! "^+NEW10" actual
> > +'
> > +
> > +test_expect_success PYTHON 'diff process works with new file' '
> > + rm -f backend.log &&
> > + git -c diff.cdiff.process="$BACKEND --log=backend.log" \
> > + diff -- newfile.c >actual 2>stderr &&
> > + test_grep "return 99" actual &&
> > + test_grep "pathname=newfile.c" backend.log &&
> > + test_must_be_empty stderr
> > +'
> > +
> > +test_expect_success PYTHON 'diff process works with added file (empty old side)' '
> > + cat >added.c <<-\EOF &&
> > + int added(void) { return 1; }
> > + EOF
> > + git add added.c &&
> > +
> > + rm -f backend.log &&
> > + git -c diff.cdiff.process="$BACKEND --log=backend.log" \
> > + diff --cached -- added.c >actual 2>stderr &&
> > + test_grep "added" actual &&
> > + test_grep "pathname=added.c" backend.log &&
> > + test_must_be_empty stderr
> > +'
> > +
> > +test_expect_success PYTHON 'diff process skipped for binary files' '
> > + printf "\\0binary" >binary.c &&
> > + git add binary.c &&
> > + git commit -m "add binary" &&
> > + printf "\\0changed" >binary.c &&
> > +
> > + rm -f backend.log &&
> > + git -c diff.cdiff.process="$BACKEND --log=backend.log" \
> > + diff -- binary.c >actual &&
> > + test_grep "Binary files" actual &&
> > + test_path_is_missing backend.log
> > +'
> > +
> > +test_expect_success PYTHON 'diff process not consulted for unmatched driver' '
> > + echo "not tracked by cdiff" >unmatched.txt &&
> > + git add unmatched.txt &&
> > + git commit -m "add unmatched.txt" &&
> > +
> > + echo "modified" >unmatched.txt &&
> > +
> > + rm -f backend.log &&
> > + git -c diff.cdiff.process="$BACKEND --log=backend.log" \
> > + diff -- unmatched.txt >actual &&
> > + test_grep "modified" actual &&
> > + test_path_is_missing backend.log
> > +'
> > +
> > +test_expect_success PYTHON 'multiple drivers use separate processes' '
> > + echo "*.h diff=hdiff" >>.gitattributes &&
> > + git add .gitattributes &&
> > +
> > + cat >multi.h <<-\EOF &&
> > + int header(void) { return 1; }
> > + EOF
> > + git add multi.h &&
> > + git commit -m "add multi.h" &&
> > +
> > + cat >multi.h <<-\EOF &&
> > + int header(void) { return 2; }
> > + EOF
> > +
> > + rm -f backend-c.log backend-h.log &&
> > + git -c diff.cdiff.process="$BACKEND --log=backend-c.log" \
> > + -c diff.hdiff.process="$BACKEND --log=backend-h.log" \
> > + diff -- newfile.c multi.h >actual 2>stderr &&
> > + test_grep "pathname=newfile.c" backend-c.log &&
> > + test_grep "pathname=multi.h" backend-h.log &&
> > + test_must_be_empty stderr
> > +'
> > +
> > +test_expect_success PYTHON 'diff process works alongside textconv' '
> > + write_script uppercase-filter <<-\EOF &&
> > + tr "a-z" "A-Z" <"$1"
> > + EOF
> > +
> > + cat >textconv.c <<-\EOF &&
> > + hello world
> > + EOF
> > + git add textconv.c &&
> > + git commit -m "add textconv.c" &&
> > +
> > + cat >textconv.c <<-\EOF &&
> > + goodbye world
> > + EOF
> > +
> > + rm -f backend.log &&
> > + git -c diff.cdiff.textconv="./uppercase-filter" \
> > + -c diff.cdiff.process="$BACKEND --log=backend.log" \
> > + diff -- textconv.c >actual 2>stderr &&
> > + # The diff process receives textconv-transformed (uppercase) content.
> > + test_grep "pathname=textconv.c" backend.log &&
> > + test_grep "old=HELLO WORLD" backend.log &&
> > + test_grep "new=GOODBYE WORLD" backend.log &&
> > + test_must_be_empty stderr
> > +'
> > +
> > +#
> > +# Downstream features: word diff, log, equivalent files, exit code.
> > +#
> > +
> > +test_expect_success PYTHON 'diff process with --word-diff' '
> > + rm -f backend.log &&
> > + git -c diff.cdiff.process="$BACKEND --log=backend.log" \
> > + diff --word-diff worddiff.c >actual 2>stderr &&
> > + test_grep "\[-1;-\]" actual &&
> > + test_grep "{+999;+}" actual &&
> > + test_grep "pathname=worddiff.c" backend.log &&
> > + test_must_be_empty stderr
> > +'
> > +
> > +test_expect_success PYTHON 'diff process works with git log -p' '
> > + # With no-hunks mode, the tool says the files are equivalent,
> > + # so log -p should show the commit but no diff content.
> > + rm -f backend.log &&
> > + git -c diff.cdiff.process="$BACKEND --mode=no-hunks --log=backend.log" \
> > + log -1 -p -- logtest.c >actual 2>stderr &&
> > + test_grep "change logtest.c" actual &&
> > + test_grep ! "return 2" actual &&
> > + test_grep "command=hunks pathname=logtest.c" backend.log &&
> > + test_must_be_empty stderr
> > +'
> > +
> > +test_expect_success PYTHON 'diff process no hunks suppresses diff output' '
> > + cat >nohunks.c <<-\EOF &&
> > + int zero(void) { return 0; }
> > + EOF
> > + git add nohunks.c &&
> > + git commit -m "add nohunks.c" &&
> > +
> > + cat >nohunks.c <<-\EOF &&
> > + int zero(void) { return 999; }
> > + EOF
> > +
> > + git -c diff.cdiff.process="$BACKEND --mode=no-hunks" \
> > + diff nohunks.c >actual &&
> > + test_must_be_empty actual
> > +'
> > +
> > +test_expect_success PYTHON 'diff process no hunks with --exit-code returns success' '
> > + git -c diff.cdiff.process="$BACKEND --mode=no-hunks" \
> > + diff --exit-code nohunks.c
> > +'
> > +
> > +test_expect_success PYTHON 'diff process with --exit-code and hunks returns failure' '
> > + test_expect_code 1 git -c diff.cdiff.process="$BACKEND" \
> > + diff --exit-code newfile.c
> > +'
> > +
> > +#
> > +# Bypass mechanisms: flags and commands that skip the diff process.
> > +#
> > +
> > +test_expect_success PYTHON 'diff process bypassed by --diff-algorithm' '
> > + rm -f backend.log &&
> > + git -c diff.cdiff.process="$BACKEND --log=backend.log" \
> > + diff --diff-algorithm=patience worddiff.c >actual &&
> > + test_grep "return 999" actual &&
> > + test_path_is_missing backend.log
> > +'
> > +
> > +test_expect_success PYTHON 'diff process not used by --stat' '
> > + rm -f backend.log &&
> > + git -c diff.cdiff.process="$BACKEND --log=backend.log" \
> > + diff --stat worddiff.c >actual &&
> > + test_grep "worddiff.c" actual &&
> > + test_path_is_missing backend.log
> > +'
> > +
> > +#
> > +# Error handling and fallback.
> > +#
> > +
> > +test_expect_success PYTHON 'diff process fallback on tool error status' '
> > + rm -f backend.log &&
> > + git -c diff.cdiff.process="$BACKEND --mode=error --log=backend.log" \
> > + diff boundary.c >actual 2>stderr &&
> > + # Fallback produces the full builtin diff (both change regions).
> > + test_grep "^-OLD5" actual &&
> > + test_grep "^+NEW5" actual &&
> > + test_grep "^-OLD9" actual &&
> > + test_grep "^+NEW9" actual &&
> > + # Tool was contacted (it replied with error, not crash).
> > + test_grep "command=hunks pathname=boundary.c" backend.log &&
> > + test_grep "diff process.*failed" stderr
> > +'
> > +
> > +test_expect_success PYTHON 'diff process error keeps tool available for next file' '
> > + rm -f backend.log &&
> > + git -c diff.cdiff.process="$BACKEND --mode=error --log=backend.log" \
> > + diff -- one.c two.c >actual 2>stderr &&
> > + # Unlike abort, error keeps the tool available: both files
> > + # are sent to the tool (and both fall back).
> > + test_grep "pathname=one.c" backend.log &&
> > + test_grep "pathname=two.c" backend.log &&
> > + test_grep "return 10" actual &&
> > + test_grep "return 20" actual
> > +'
> > +
> > +test_expect_success PYTHON 'diff process abort disables for session' '
> > + rm -f backend.log &&
> > + git -c diff.cdiff.process="$BACKEND --mode=abort --log=backend.log" \
> > + diff -- one.c two.c >actual &&
> > + # Both files should still produce diff output via fallback.
> > + test_grep "return 10" actual &&
> > + test_grep "return 20" actual &&
> > + # The tool aborts on the first file and git clears its
> > + # capability. The second file never contacts the tool.
> > + test_grep "pathname=one.c" backend.log &&
> > + test_grep ! "pathname=two.c" backend.log
> > +'
> > +
> > +test_expect_success PYTHON 'diff process fallback on tool crash' '
> > + git -c diff.cdiff.process="$BACKEND --mode=crash" \
> > + diff boundary.c >actual 2>stderr &&
> > + test_grep "^-OLD5" actual &&
> > + test_grep "^+NEW5" actual &&
> > + test_grep "^-OLD9" actual &&
> > + test_grep "^+NEW9" actual &&
> > + # Crash is a communication failure, so a warning is emitted.
> > + test_grep "diff process.*failed" stderr
> > +'
> > +
> > +test_expect_success PYTHON 'diff process startup failure only warns once' '
> > + git -c diff.cdiff.process="/nonexistent/tool" \
> > + diff -- one.c two.c >actual 2>stderr &&
> > + # Both files produce diff output via fallback.
> > + test_grep "return 10" actual &&
> > + test_grep "return 20" actual &&
> > + # Sentinel prevents repeated warnings: only one, not one per file.
> > + test_grep "diff process.*failed" stderr >warnings &&
> > + test_line_count = 1 warnings
> > +'
> > +
> > +test_expect_success PYTHON 'diff process fallback on bad hunks' '
> > + git -c diff.cdiff.process="$BACKEND --mode=bad-hunk" \
> > + diff boundary.c >actual 2>stderr &&
> > + test_grep "^-OLD5" actual &&
> > + test_grep "^+NEW5" actual &&
> > + test_grep "^-OLD9" actual &&
> > + test_grep "^+NEW9" actual &&
> > + # Invalid hunks are caught by xdiff validation, not the
> > + # protocol layer, so no warning is emitted.
> > + test_must_be_empty stderr
> > +'
> > +
> > +test_expect_success PYTHON 'diff process fallback on mismatched unchanged totals' '
> > + cat >synctest.c <<-\EOF &&
> > + line1
> > + line2
> > + line3
> > + EOF
> > + git add synctest.c &&
> > + git commit -m "add synctest.c" &&
> > +
> > + cat >synctest.c <<-\EOF &&
> > + line1
> > + changed
> > + line3
> > + EOF
> > +
> > + # bad-sync reports hunk 1 2 1 1: marks 2 old lines and 1 new
> > + # line as changed, leaving 1 unchanged old vs 2 unchanged new.
> > + # The synchronization invariant fails and git falls back.
> > + git -c diff.cdiff.process="$BACKEND --mode=bad-sync" \
> > + diff synctest.c >actual 2>stderr &&
> > + test_grep "changed" actual
> > +'
> > +
> > +test_expect_success PYTHON 'diff process fallback on overlapping hunks' '
> > + # boundary.c has 10 lines, so both hunks are in bounds
> > + # but they overlap at lines 3-5, triggering the ordering check.
> > + git -c diff.cdiff.process="$BACKEND --mode=overlap" \
> > + diff boundary.c >actual 2>stderr &&
> > + test_grep "NEW5" actual
> > +'
> > +
> > +test_done
> > diff --git a/userdiff.h b/userdiff.h
> > index 51c26e0d41..a98eabe377 100644
> > --- a/userdiff.h
> > +++ b/userdiff.h
> > @@ -3,6 +3,7 @@
> >
> > #include "notes-cache.h"
> >
> > +struct diff_subprocess;
> > struct index_state;
> > struct repository;
> >
> > @@ -33,6 +34,8 @@ struct userdiff_driver {
> > int textconv_want_cache;
> > const char *process;
> > char *process_owned;
> > + struct diff_subprocess *diff_subprocess;
> > + unsigned diff_process_failed : 1;
> > };
> > enum userdiff_driver_type {
> > USERDIFF_DRIVER_TYPE_BUILTIN = 1<<0,
> > --
> > gitgitgadget
> >
> >
> >There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Michael Montalbo wrote on the Git mailing list (how to reply to this email): On Sun, Jun 7, 2026 at 7:36 AM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> Hi Michael,
>
> I stumbled about this patch when it broke CI in Git for Windows, where we
> do _not_ use `NO_PYTHON`, even though Python is unavailable in the
> build/test CI jobs. The existing tests handle this situation gracefully,
> this here patch does not:
>
> On Sun, 7 Jun 2026, Michael Montalbo via GitGitGadget wrote:
>
> > diff --git a/t/t4080-diff-process.sh b/t/t4080-diff-process.sh
> > new file mode 100755
> > index 0000000000..f159cd86d8
> > --- /dev/null
> > +++ b/t/t4080-diff-process.sh
> > @@ -0,0 +1,538 @@
> > +#!/bin/sh
> > +
> > +test_description='diff process via long-running process'
> > +
> > +. ./test-lib.sh
> > +
> > +if test_have_prereq PYTHON
> > +then
> > + PYTHON_PATH=$(command -v python3) || PYTHON_PATH=$(command -v python)
>
> When neither `python3` nor `python` are available (which is the case in
> the minimal Git for Windows SDK used in Git's CI runs), this fails under
> `set -e`. Before even running the first test case. Resulting in an
> unexpected TAP format error.
>
> Now, we could "fix" this by imitating what `lib-p4` does (see
> https://github.com/dscho/git/commit/bd0b5570c744f678911a67a62da63f30655f20d8
> which demonstrates that it is indeed a work-around on Windows):
>
> -- snip --
> diff --git a/t/t4080-diff-process.sh b/t/t4080-diff-process.sh
> index fdf6da1c341e67..bd22c247ff3856 100755
> --- a/t/t4080-diff-process.sh
> +++ b/t/t4080-diff-process.sh
> @@ -4,9 +4,10 @@ test_description='diff process via long-running process'
>
> . ./test-lib.sh
>
> -if test_have_prereq PYTHON
> +if ! test_have_prereq PYTHON || ! test -x "$PYTHON_PATH"
> then
> - PYTHON_PATH=$(command -v python3) || PYTHON_PATH=$(command -v python)
> + skip_all='python interpreter not available'
> + test_done
> fi
>
> #
> -- snap --
>
> Of course, this uncovers _another_ problem with the Python script: It uses
> Python3-only `f"..."` format strings, which cannot be handled by the
> Python2 to which the `PYTHON_PATH` variable in `linux-TEST-vars` points.
> So this requires _another follow-up (see also
> https://github.com/dscho/git/commit/c12a9f4c80e5ce8db0fe370fac46fb45be2b775f):
>
> -- snip --
> diff --git a/t/t4080-diff-process.sh b/t/t4080-diff-process.sh
> index bd22c247ff3856..ba14682a9086e4 100755
> --- a/t/t4080-diff-process.sh
> +++ b/t/t4080-diff-process.sh
> @@ -39,7 +39,8 @@ setup_backend () {
>
> def write_pkt(line):
> data = (line + "\n").encode()
> - sys.stdout.buffer.write(f"{len(data)+4:04x}".encode() + data)
> + hdr = "{:04x}".format(len(data) + 4).encode()
> + sys.stdout.buffer.write(hdr + data)
> sys.stdout.buffer.flush()
>
> def write_flush():
> @@ -98,7 +99,8 @@ setup_backend () {
> new = read_content()
> old_first = old.split(b"\n")[0].decode(errors="replace") if old else ""
> new_first = new.split(b"\n")[0].decode(errors="replace") if new else ""
> - log(f"command={cmd} pathname={pathname} old={old_first} new={new_first}")
> + log("command={} pathname={} old={} new={}".format(
> + cmd, pathname, old_first, new_first))
>
> if mode == "error":
> write_flush()
> @@ -130,7 +132,7 @@ setup_backend () {
> else:
> ol = old.count(b"\n")
> nl = new.count(b"\n")
> - write_pkt(f"hunk 1 {ol} 1 {nl}")
> + write_pkt("hunk 1 {} 1 {}".format(ol, nl))
> write_flush()
> write_pkt("status=success")
> write_flush()
> -- snap --
>
> And this is still not enough to make it work with Python2, see
> https://github.com/dscho/git/actions/runs/27091523842/job/79955895737:
>
> -- snip --
> [...]
> + git -c diff.cdiff.process=./diff-process-backend --mode=fixed-hunk diff boundary.c
> Traceback (most recent call last):
> File "/__w/git/git/t/trash directory.t4080-diff-process/diff-process-backend.py", line 45, in <module>
> assert read_pkt() == "git-diff-client"
> File "/__w/git/git/t/trash directory.t4080-diff-process/diff-process-backend.py", line 4, in read_pkt
> hdr = sys.stdin.buffer.read(4)
> AttributeError: 'file' object has no attribute 'buffer'
> -- snap --
>
> I have experienced similar patterns in my career, where a single decision
> required multiple follow-up fixes _just_ to avoid having to revert that
> decision. This kind of doubling down has never ended well.
>
> Therefore I would like to take a step back, and ask: Is it _really_ a good
> idea to use Python here? Are we certain that we want to _require_ Python
> to run this test and skip it if Python isn't available (as is the case in
> the Windows-related parts of Git's very own CI) even if Python has nothing
> at all to do with the feature that is being tested?
>
> I don't want to be doomed to repeat history, and we can very well learn
> e.g. from prior art in this very project, where the tests for the
> clean/smudge filters (which _also_ want to speak pkt-line over stdio)
> needlessly incurred Perl as a requirement to run the tests. It was
> Matheus's heroic work in 52917a998ef3a (t0021: implementation the
> rot13-filter.pl script in C, 2022-08-14) and 4d1d843be7a15 (tests: use the
> new C rot13-filter helper to avoid PERL prereq, 2022-08-14) that avoided
> that unnecessary prerequisite.
>
> Likewise, there is `test-tool pkt-line` intended for driving the pkt-line
> protocol via simple shell scripts.
>
> So the conscious project direction has been: fold pkt-line test backends
> into `test-tool` and drop the scripting-language prereq. Reintroducing the
> same shape in Python would walk this back.
>
> Patrick's careful effort in 27bd8ee311719 (Merge branch 'ps/fewer-perl',
> 2025-04-29) has been another clear sign that the Git project is actively
> _removing_ scripting-language dependencies from the build and test
> infrastructure, not adding new ones.
>
Now I wonder if the extension / addition of more Perl test infra with my other
series:
https://lore.kernel.org/git/pull.2135.git.1780559158.gitgitgadget@gmail.com/T/
also goes against the project direction w.r.t. removing scripting languages.
Maybe I should also re-evaluate the usage of Perl there. I am leveraging
existing shell parsing logic written in Perl, but maybe the preference for
Perl-based lint rules is a mistake and should be avoided.
> The clear prior art in Git's own tests for what t4080 wants to do, as of
> today, is `t/helper/test-rot13-filter.c`, which could be imitated here
> instead of (re-)introducing a dependency on a scripting language other
> than Unix shell in Git's test suite.
>
> The `PYTHON` prereq exists in exactly five files today, all `git p4`
> related (where Python is an intrinsic prerequisite given that `git-p4.py`
> _is_ written in Python): `t/lib-git-p4.sh`, `t/t9802-git-p4-filetype.sh`,
> `t/t9810-git-p4-rcs.sh`, `t/t9835-git-p4-metadata-encoding-python2.sh`,
> and `t/t9836-git-p4-metadata-encoding-python3.sh`.
>
> After 7cdbff14d482 (remove merge-recursive-old, 2006-11-20), this here
> patch would be the first one, after almost 20 years, to re-introduce
> Python as a dependency outside `git p4`.
>
> And it would also be the first ever to embed a Python script as a heredoc:
>
> > +fi
> > +
> > +#
> > +# A single parametric diff process.
> > +# Usage: diff-process-backend --mode=<mode> [--log=<path>]
> > +#
> > +# Modes:
> > +# whole-file - report all lines as changed (default)
> > +# fixed-hunk - always report hunk 5 2 5 2
> > +# bad-hunk - report out-of-bounds hunk 999 1 999 1
> > +# bad-sync - report hunk with mismatched unchanged totals
> > +# overlap - report two overlapping hunks
> > +# no-hunks - return no hunks (files considered equivalent)
> > +# error - return status=error for every request
> > +# abort - return status=abort for every request
> > +# crash - read one request then exit without responding
> > +#
> > +setup_backend () {
> > + cat >"$TRASH_DIRECTORY/diff-process-backend.py" <<-\PYEOF
> > + import sys, os
> > +
> > + def read_pkt():
> > + hdr = sys.stdin.buffer.read(4)
> > + if len(hdr) < 4: return None
> > + length = int(hdr, 16)
> > + if length == 0: return ""
> > + data = sys.stdin.buffer.read(length - 4)
> > + return data.decode().rstrip("\n")
> > +
> > + def write_pkt(line):
> > + data = (line + "\n").encode()
> > + sys.stdout.buffer.write(f"{len(data)+4:04x}".encode() + data)
> > + sys.stdout.buffer.flush()
> > +
> > + def write_flush():
> > + sys.stdout.buffer.write(b"0000")
> > + sys.stdout.buffer.flush()
> > +
> > + def read_content():
> > + chunks = []
> > + while True:
> > + hdr = sys.stdin.buffer.read(4)
> > + if len(hdr) < 4: break
> > + length = int(hdr, 16)
> > + if length == 0: break
> > + chunks.append(sys.stdin.buffer.read(length - 4))
> > + return b"".join(chunks)
> > +
> > + mode = "whole-file"
> > + logfile = None
> > + for arg in sys.argv[1:]:
> > + if arg.startswith("--mode="):
> > + mode = arg[7:]
> > + elif arg.startswith("--log="):
> > + logfile = open(arg[6:], "a")
> > +
> > + def log(msg):
> > + if logfile:
> > + logfile.write(msg + "\n")
> > + logfile.flush()
> > +
> > + # Handshake
> > + assert read_pkt() == "git-diff-client"
> > + assert read_pkt() == "version=1"
> > + read_pkt()
> > + write_pkt("git-diff-server")
> > + write_pkt("version=1")
> > + write_flush()
> > + while True:
> > + p = read_pkt()
> > + if p == "": break
> > + write_pkt("capability=hunks")
> > + write_flush()
> > +
> > + log("ready")
> > +
> > + while True:
> > + cmd = None
> > + pathname = None
> > + while True:
> > + p = read_pkt()
> > + if p is None: sys.exit(0)
> > + if p == "": break
> > + if p.startswith("command="): cmd = p.split("=",1)[1]
> > + if p.startswith("pathname="): pathname = p.split("=",1)[1]
> > + if cmd is None: sys.exit(0)
> > + old = read_content()
> > + new = read_content()
> > + old_first = old.split(b"\n")[0].decode(errors="replace") if old else ""
> > + new_first = new.split(b"\n")[0].decode(errors="replace") if new else ""
> > + log(f"command={cmd} pathname={pathname} old={old_first} new={new_first}")
> > +
> > + if mode == "error":
> > + write_flush()
> > + write_pkt("status=error")
> > + write_flush()
> > + continue
> > +
> > + if mode == "abort":
> > + write_flush()
> > + write_pkt("status=abort")
> > + write_flush()
> > + continue
> > +
> > + if mode == "crash":
> > + sys.exit(1)
> > +
> > + if cmd == "hunks":
> > + if mode == "fixed-hunk":
> > + write_pkt("hunk 5 2 5 2")
> > + elif mode == "bad-hunk":
> > + write_pkt("hunk 999 1 999 1")
> > + elif mode == "bad-sync":
> > + write_pkt("hunk 1 2 1 1")
> > + elif mode == "overlap":
> > + write_pkt("hunk 1 5 1 5")
> > + write_pkt("hunk 3 2 3 2")
> > + elif mode == "no-hunks":
> > + pass
> > + else:
> > + ol = old.count(b"\n")
> > + nl = new.count(b"\n")
> > + write_pkt(f"hunk 1 {ol} 1 {nl}")
> > + write_flush()
> > + write_pkt("status=success")
> > + write_flush()
> > + else:
> > + write_flush()
> > + write_pkt("status=error")
> > + write_flush()
> > + PYEOF
>
> The existing pattern is to provide larger scripts as fixtures in
> associated `t/tNNNN/` directories, not as heredoc, see e.g.
> `t/t1509/prepare-chroot.sh`. Writing scripts, especially lengthy ones, in
> heredoc strings makes it virtually impossible to use static code analysis
> or syntax highlighting to fend off banal errors.
>
> Given the complexity of what t4080 tries to test (error, abort, crash,
> bad-sync, no-hunks, multiple files in one session, capability
> negotiation), it would unfortunately be infeasible to use `test-tool
> pkt-line` from a shell script implementing that `diff.*.process` protocol.
>
> So I've spiked a demo how the `test-tool diff-process-backend` could look
> like (letting Opus do the menial typing, so that I can enjoy at least part
> of a sunny Sunday outside), which also passes the CI build and test:
> https://github.com/dscho/git/commit/b6e3c93381b00929476c3a00155f7cf7334a22e6
>
> That commit is of course not intended to be used as-is; Feel free to pick
> code parts of it and integrate them into your topic branch. Or write your
> own test-tool helper from scratch if that's more your jam.
>
> Ciao,
> Johannes
>
> > + write_script diff-process-backend <<-SHEOF
> > + exec "$PYTHON_PATH" "$TRASH_DIRECTORY/diff-process-backend.py" "\$@"
> > + SHEOF
> > +}
> > +
> > +BACKEND="./diff-process-backend"
> > +
> > +test_expect_success PYTHON 'setup' '
> > + setup_backend &&
> > + echo "*.c diff=cdiff" >.gitattributes &&
> > + git add .gitattributes &&
> > +
> > + # boundary.c: 10 lines, changes at 5-6 and 9-10.
> > + # Used by: hunk boundaries, error fallback, crash, bad hunks, overlap.
> > + cat >boundary.c <<-\EOF &&
> > + line1
> > + line2
> > + line3
> > + line4
> > + OLD5
> > + OLD6
> > + line7
> > + line8
> > + OLD9
> > + OLD10
> > + EOF
> > + git add boundary.c &&
> > +
> > + # worddiff.c: single-line function, value changes 1 -> 999.
> > + # Used by: word-diff, --diff-algorithm, --no-ext-diff, --stat.
> > + cat >worddiff.c <<-\EOF &&
> > + int value(void) { return 1; }
> > + EOF
> > + git add worddiff.c &&
> > +
> > + # newfile.c: single-line function, value changes 42 -> 99.
> > + # Used by: new file, --exit-code, multiple drivers.
> > + cat >newfile.c <<-\EOF &&
> > + int new_func(void) { return 42; }
> > + EOF
> > + git add newfile.c &&
> > +
> > + # logtest.c: single-line function for log/format-patch tests.
> > + # Needs two commits so log -1 has a diff.
> > + cat >logtest.c <<-\EOF &&
> > + int logfunc(void) { return 1; }
> > + EOF
> > + git add logtest.c &&
> > +
> > + # two.c/one.c: two-file pair for error/abort/startup-failure tests.
> > + cat >one.c <<-\EOF &&
> > + int first(void) { return 1; }
> > + EOF
> > + cat >two.c <<-\EOF &&
> > + int second(void) { return 2; }
> > + EOF
> > + git add one.c two.c &&
> > +
> > + git commit -m "initial" &&
> > +
> > + # Second commit for logtest.c (so log -1 has something to show).
> > + cat >logtest.c <<-\EOF &&
> > + int logfunc(void) { return 2; }
> > + EOF
> > + git add logtest.c &&
> > + git commit -m "change logtest.c" &&
> > +
> > + # Working tree modifications (not committed).
> > + cat >boundary.c <<-\EOF &&
> > + line1
> > + line2
> > + line3
> > + line4
> > + NEW5
> > + NEW6
> > + line7
> > + line8
> > + NEW9
> > + NEW10
> > + EOF
> > +
> > + cat >worddiff.c <<-\EOF &&
> > + int value(void) { return 999; }
> > + EOF
> > +
> > + cat >newfile.c <<-\EOF &&
> > + int new_func(void) { return 99; }
> > + EOF
> > +
> > + cat >one.c <<-\EOF &&
> > + int first(void) { return 10; }
> > + EOF
> > +
> > + cat >two.c <<-\EOF
> > + int second(void) { return 20; }
> > + EOF
> > +'
> > +
> > +#
> > +# Core behavior: the tool controls which lines are marked as changed.
> > +#
> > +
> > +test_expect_success PYTHON 'diff process hunk boundaries affect output' '
> > + # The file has changes at lines 5-6 and 9-10, but fixed-hunk
> > + # only reports lines 5-6 as changed. Lines 9-10 should not
> > + # appear as changed in the output.
> > + git -c diff.cdiff.process="$BACKEND --mode=fixed-hunk" \
> > + diff boundary.c >actual &&
> > + test_grep "^-OLD5" actual &&
> > + test_grep "^-OLD6" actual &&
> > + test_grep "^+NEW5" actual &&
> > + test_grep "^+NEW6" actual &&
> > + test_grep ! "^-OLD9" actual &&
> > + test_grep ! "^-OLD10" actual &&
> > + test_grep ! "^+NEW9" actual &&
> > + test_grep ! "^+NEW10" actual
> > +'
> > +
> > +test_expect_success PYTHON 'diff process works with new file' '
> > + rm -f backend.log &&
> > + git -c diff.cdiff.process="$BACKEND --log=backend.log" \
> > + diff -- newfile.c >actual 2>stderr &&
> > + test_grep "return 99" actual &&
> > + test_grep "pathname=newfile.c" backend.log &&
> > + test_must_be_empty stderr
> > +'
> > +
> > +test_expect_success PYTHON 'diff process works with added file (empty old side)' '
> > + cat >added.c <<-\EOF &&
> > + int added(void) { return 1; }
> > + EOF
> > + git add added.c &&
> > +
> > + rm -f backend.log &&
> > + git -c diff.cdiff.process="$BACKEND --log=backend.log" \
> > + diff --cached -- added.c >actual 2>stderr &&
> > + test_grep "added" actual &&
> > + test_grep "pathname=added.c" backend.log &&
> > + test_must_be_empty stderr
> > +'
> > +
> > +test_expect_success PYTHON 'diff process skipped for binary files' '
> > + printf "\\0binary" >binary.c &&
> > + git add binary.c &&
> > + git commit -m "add binary" &&
> > + printf "\\0changed" >binary.c &&
> > +
> > + rm -f backend.log &&
> > + git -c diff.cdiff.process="$BACKEND --log=backend.log" \
> > + diff -- binary.c >actual &&
> > + test_grep "Binary files" actual &&
> > + test_path_is_missing backend.log
> > +'
> > +
> > +test_expect_success PYTHON 'diff process not consulted for unmatched driver' '
> > + echo "not tracked by cdiff" >unmatched.txt &&
> > + git add unmatched.txt &&
> > + git commit -m "add unmatched.txt" &&
> > +
> > + echo "modified" >unmatched.txt &&
> > +
> > + rm -f backend.log &&
> > + git -c diff.cdiff.process="$BACKEND --log=backend.log" \
> > + diff -- unmatched.txt >actual &&
> > + test_grep "modified" actual &&
> > + test_path_is_missing backend.log
> > +'
> > +
> > +test_expect_success PYTHON 'multiple drivers use separate processes' '
> > + echo "*.h diff=hdiff" >>.gitattributes &&
> > + git add .gitattributes &&
> > +
> > + cat >multi.h <<-\EOF &&
> > + int header(void) { return 1; }
> > + EOF
> > + git add multi.h &&
> > + git commit -m "add multi.h" &&
> > +
> > + cat >multi.h <<-\EOF &&
> > + int header(void) { return 2; }
> > + EOF
> > +
> > + rm -f backend-c.log backend-h.log &&
> > + git -c diff.cdiff.process="$BACKEND --log=backend-c.log" \
> > + -c diff.hdiff.process="$BACKEND --log=backend-h.log" \
> > + diff -- newfile.c multi.h >actual 2>stderr &&
> > + test_grep "pathname=newfile.c" backend-c.log &&
> > + test_grep "pathname=multi.h" backend-h.log &&
> > + test_must_be_empty stderr
> > +'
> > +
> > +test_expect_success PYTHON 'diff process works alongside textconv' '
> > + write_script uppercase-filter <<-\EOF &&
> > + tr "a-z" "A-Z" <"$1"
> > + EOF
> > +
> > + cat >textconv.c <<-\EOF &&
> > + hello world
> > + EOF
> > + git add textconv.c &&
> > + git commit -m "add textconv.c" &&
> > +
> > + cat >textconv.c <<-\EOF &&
> > + goodbye world
> > + EOF
> > +
> > + rm -f backend.log &&
> > + git -c diff.cdiff.textconv="./uppercase-filter" \
> > + -c diff.cdiff.process="$BACKEND --log=backend.log" \
> > + diff -- textconv.c >actual 2>stderr &&
> > + # The diff process receives textconv-transformed (uppercase) content.
> > + test_grep "pathname=textconv.c" backend.log &&
> > + test_grep "old=HELLO WORLD" backend.log &&
> > + test_grep "new=GOODBYE WORLD" backend.log &&
> > + test_must_be_empty stderr
> > +'
> > +
> > +#
> > +# Downstream features: word diff, log, equivalent files, exit code.
> > +#
> > +
> > +test_expect_success PYTHON 'diff process with --word-diff' '
> > + rm -f backend.log &&
> > + git -c diff.cdiff.process="$BACKEND --log=backend.log" \
> > + diff --word-diff worddiff.c >actual 2>stderr &&
> > + test_grep "\[-1;-\]" actual &&
> > + test_grep "{+999;+}" actual &&
> > + test_grep "pathname=worddiff.c" backend.log &&
> > + test_must_be_empty stderr
> > +'
> > +
> > +test_expect_success PYTHON 'diff process works with git log -p' '
> > + # With no-hunks mode, the tool says the files are equivalent,
> > + # so log -p should show the commit but no diff content.
> > + rm -f backend.log &&
> > + git -c diff.cdiff.process="$BACKEND --mode=no-hunks --log=backend.log" \
> > + log -1 -p -- logtest.c >actual 2>stderr &&
> > + test_grep "change logtest.c" actual &&
> > + test_grep ! "return 2" actual &&
> > + test_grep "command=hunks pathname=logtest.c" backend.log &&
> > + test_must_be_empty stderr
> > +'
> > +
> > +test_expect_success PYTHON 'diff process no hunks suppresses diff output' '
> > + cat >nohunks.c <<-\EOF &&
> > + int zero(void) { return 0; }
> > + EOF
> > + git add nohunks.c &&
> > + git commit -m "add nohunks.c" &&
> > +
> > + cat >nohunks.c <<-\EOF &&
> > + int zero(void) { return 999; }
> > + EOF
> > +
> > + git -c diff.cdiff.process="$BACKEND --mode=no-hunks" \
> > + diff nohunks.c >actual &&
> > + test_must_be_empty actual
> > +'
> > +
> > +test_expect_success PYTHON 'diff process no hunks with --exit-code returns success' '
> > + git -c diff.cdiff.process="$BACKEND --mode=no-hunks" \
> > + diff --exit-code nohunks.c
> > +'
> > +
> > +test_expect_success PYTHON 'diff process with --exit-code and hunks returns failure' '
> > + test_expect_code 1 git -c diff.cdiff.process="$BACKEND" \
> > + diff --exit-code newfile.c
> > +'
> > +
> > +#
> > +# Bypass mechanisms: flags and commands that skip the diff process.
> > +#
> > +
> > +test_expect_success PYTHON 'diff process bypassed by --diff-algorithm' '
> > + rm -f backend.log &&
> > + git -c diff.cdiff.process="$BACKEND --log=backend.log" \
> > + diff --diff-algorithm=patience worddiff.c >actual &&
> > + test_grep "return 999" actual &&
> > + test_path_is_missing backend.log
> > +'
> > +
> > +test_expect_success PYTHON 'diff process not used by --stat' '
> > + rm -f backend.log &&
> > + git -c diff.cdiff.process="$BACKEND --log=backend.log" \
> > + diff --stat worddiff.c >actual &&
> > + test_grep "worddiff.c" actual &&
> > + test_path_is_missing backend.log
> > +'
> > +
> > +#
> > +# Error handling and fallback.
> > +#
> > +
> > +test_expect_success PYTHON 'diff process fallback on tool error status' '
> > + rm -f backend.log &&
> > + git -c diff.cdiff.process="$BACKEND --mode=error --log=backend.log" \
> > + diff boundary.c >actual 2>stderr &&
> > + # Fallback produces the full builtin diff (both change regions).
> > + test_grep "^-OLD5" actual &&
> > + test_grep "^+NEW5" actual &&
> > + test_grep "^-OLD9" actual &&
> > + test_grep "^+NEW9" actual &&
> > + # Tool was contacted (it replied with error, not crash).
> > + test_grep "command=hunks pathname=boundary.c" backend.log &&
> > + test_grep "diff process.*failed" stderr
> > +'
> > +
> > +test_expect_success PYTHON 'diff process error keeps tool available for next file' '
> > + rm -f backend.log &&
> > + git -c diff.cdiff.process="$BACKEND --mode=error --log=backend.log" \
> > + diff -- one.c two.c >actual 2>stderr &&
> > + # Unlike abort, error keeps the tool available: both files
> > + # are sent to the tool (and both fall back).
> > + test_grep "pathname=one.c" backend.log &&
> > + test_grep "pathname=two.c" backend.log &&
> > + test_grep "return 10" actual &&
> > + test_grep "return 20" actual
> > +'
> > +
> > +test_expect_success PYTHON 'diff process abort disables for session' '
> > + rm -f backend.log &&
> > + git -c diff.cdiff.process="$BACKEND --mode=abort --log=backend.log" \
> > + diff -- one.c two.c >actual &&
> > + # Both files should still produce diff output via fallback.
> > + test_grep "return 10" actual &&
> > + test_grep "return 20" actual &&
> > + # The tool aborts on the first file and git clears its
> > + # capability. The second file never contacts the tool.
> > + test_grep "pathname=one.c" backend.log &&
> > + test_grep ! "pathname=two.c" backend.log
> > +'
> > +
> > +test_expect_success PYTHON 'diff process fallback on tool crash' '
> > + git -c diff.cdiff.process="$BACKEND --mode=crash" \
> > + diff boundary.c >actual 2>stderr &&
> > + test_grep "^-OLD5" actual &&
> > + test_grep "^+NEW5" actual &&
> > + test_grep "^-OLD9" actual &&
> > + test_grep "^+NEW9" actual &&
> > + # Crash is a communication failure, so a warning is emitted.
> > + test_grep "diff process.*failed" stderr
> > +'
> > +
> > +test_expect_success PYTHON 'diff process startup failure only warns once' '
> > + git -c diff.cdiff.process="/nonexistent/tool" \
> > + diff -- one.c two.c >actual 2>stderr &&
> > + # Both files produce diff output via fallback.
> > + test_grep "return 10" actual &&
> > + test_grep "return 20" actual &&
> > + # Sentinel prevents repeated warnings: only one, not one per file.
> > + test_grep "diff process.*failed" stderr >warnings &&
> > + test_line_count = 1 warnings
> > +'
> > +
> > +test_expect_success PYTHON 'diff process fallback on bad hunks' '
> > + git -c diff.cdiff.process="$BACKEND --mode=bad-hunk" \
> > + diff boundary.c >actual 2>stderr &&
> > + test_grep "^-OLD5" actual &&
> > + test_grep "^+NEW5" actual &&
> > + test_grep "^-OLD9" actual &&
> > + test_grep "^+NEW9" actual &&
> > + # Invalid hunks are caught by xdiff validation, not the
> > + # protocol layer, so no warning is emitted.
> > + test_must_be_empty stderr
> > +'
> > +
> > +test_expect_success PYTHON 'diff process fallback on mismatched unchanged totals' '
> > + cat >synctest.c <<-\EOF &&
> > + line1
> > + line2
> > + line3
> > + EOF
> > + git add synctest.c &&
> > + git commit -m "add synctest.c" &&
> > +
> > + cat >synctest.c <<-\EOF &&
> > + line1
> > + changed
> > + line3
> > + EOF
> > +
> > + # bad-sync reports hunk 1 2 1 1: marks 2 old lines and 1 new
> > + # line as changed, leaving 1 unchanged old vs 2 unchanged new.
> > + # The synchronization invariant fails and git falls back.
> > + git -c diff.cdiff.process="$BACKEND --mode=bad-sync" \
> > + diff synctest.c >actual 2>stderr &&
> > + test_grep "changed" actual
> > +'
> > +
> > +test_expect_success PYTHON 'diff process fallback on overlapping hunks' '
> > + # boundary.c has 10 lines, so both hunks are in bounds
> > + # but they overlap at lines 3-5, triggering the ordering check.
> > + git -c diff.cdiff.process="$BACKEND --mode=overlap" \
> > + diff boundary.c >actual 2>stderr &&
> > + test_grep "NEW5" actual
> > +'
> > +
> > +test_done
> > diff --git a/userdiff.h b/userdiff.h
> > index 51c26e0d41..a98eabe377 100644
> > --- a/userdiff.h
> > +++ b/userdiff.h
> > @@ -3,6 +3,7 @@
> >
> > #include "notes-cache.h"
> >
> > +struct diff_subprocess;
> > struct index_state;
> > struct repository;
> >
> > @@ -33,6 +34,8 @@ struct userdiff_driver {
> > int textconv_want_cache;
> > const char *process;
> > char *process_owned;
> > + struct diff_subprocess *diff_subprocess;
> > + unsigned diff_process_failed : 1;
> > };
> > enum userdiff_driver_type {
> > USERDIFF_DRIVER_TYPE_BUILTIN = 1<<0,
> > --
> > gitgitgadget
> >
> >
> >There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Junio C Hamano wrote on the Git mailing list (how to reply to this email): Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> So the conscious project direction has been: fold pkt-line test backends
> into `test-tool` and drop the scripting-language prereq. Reintroducing the
> same shape in Python would walk this back.
> ...
> The `PYTHON` prereq exists in exactly five files today, all `git p4`
> related (where Python is an intrinsic prerequisite given that `git-p4.py`
> _is_ written in Python): `t/lib-git-p4.sh`, `t/t9802-git-p4-filetype.sh`,
> `t/t9810-git-p4-rcs.sh`, `t/t9835-git-p4-metadata-encoding-python2.sh`,
> and `t/t9836-git-p4-metadata-encoding-python3.sh`.
> ...
> That commit is of course not intended to be used as-is; Feel free to pick
> code parts of it and integrate them into your topic branch. Or write your
> own test-tool helper from scratch if that's more your jam.
Showing better direction to new folks with such a clear thinking is
very much appreciated. Even though it is natural and perfectly OK
for tests that interacts with parts of Git that are written in these
languages (e.g., we are OK for gitweb tests to require Perl), we
should consciously keep ourselves clean and not adding unnecessary
dependencies.There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Junio C Hamano wrote on the Git mailing list (how to reply to this email): Michael Montalbo <mmontalbo@gmail.com> writes:
> On Sun, Jun 7, 2026 at 7:36 AM Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
>>
>> Hi Michael,
>>
>> I stumbled about this patch when it broke CI in Git for Windows, where we
>> do _not_ use `NO_PYTHON`, even though Python is unavailable in the
>> build/test CI jobs. The existing tests handle this situation gracefully,
>> this here patch does not:
>> ...
>> Given the complexity of what t4080 tries to test (error, abort, crash,
>> bad-sync, no-hunks, multiple files in one session, capability
>> negotiation), it would unfortunately be infeasible to use `test-tool
>> pkt-line` from a shell script implementing that `diff.*.process` protocol.
>>
>> So I've spiked a demo how the `test-tool diff-process-backend` could look
>> like (letting Opus do the menial typing, so that I can enjoy at least part
>> of a sunny Sunday outside), which also passes the CI build and test:
>> https://github.com/dscho/git/commit/b6e3c93381b00929476c3a00155f7cf7334a22e6
>>
>> That commit is of course not intended to be used as-is; Feel free to pick
>> code parts of it and integrate them into your topic branch. Or write your
>> own test-tool helper from scratch if that's more your jam.
>>
>
> Johannes, thank you for the great feedback. The historical context is
> really helpful and
> the concerns you raise make a lot of sense. I will take a look at your
> spike and also work
> on removing Python from the test.
Another request.
Please do not force readers to scroll through a ~800 line message
just to read only 5 lines of response from you. Keep relevant parts
of the message you are responding to in your message to help readers
understand the context in which your response was made, but trim
everything else that is not relevant from your quote.
Thanks.There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Junio C Hamano wrote on the Git mailing list (how to reply to this email): Michael Montalbo <mmontalbo@gmail.com> writes:
>> So the conscious project direction has been: fold pkt-line test backends
>> into `test-tool` and drop the scripting-language prereq. Reintroducing the
>> same shape in Python would walk this back.
>>
>> Patrick's careful effort in 27bd8ee311719 (Merge branch 'ps/fewer-perl',
>> 2025-04-29) has been another clear sign that the Git project is actively
>> _removing_ scripting-language dependencies from the build and test
>> infrastructure, not adding new ones.
>
> Now I wonder if the extension / addition of more Perl test infra with my other
> series:
>
> https://lore.kernel.org/git/pull.2135.git.1780559158.gitgitgadget@gmail.com/T/
>
> also goes against the project direction w.r.t. removing scripting languages.
> Maybe I should also re-evaluate the usage of Perl there. I am leveraging
> existing shell parsing logic written in Perl, but maybe the preference for
> Perl-based lint rules is a mistake and should be avoided.
That sounds prudent (even though it is outside the scope of _this_
topic, of course).
Thanks. |
||
| conversion outputs. See linkgit:gitattributes[5] for details. | ||
|
|
||
| `diff.<driver>.process`:: | ||
| The command to run as a long-running diff process that | ||
| provides hunks to Git's diff pipeline. | ||
| See linkgit:gitattributes[5] for details. | ||
|
|
||
| `diff.indentHeuristic`:: | ||
| Set this option to `false` to disable the default heuristics | ||
| that shift diff hunk boundaries to make patches easier to read. | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Junio C Hamano wrote on the Git mailing list (how to reply to this email):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Michael Montalbo wrote on the Git mailing list (how to reply to this email):