[RFC] diff: add `diff.<driver>.process` for external hunk providers #2120

gitgitgadget · 2026-05-26T05:46:31Z

Junio C Hamano wrote on the Git mailing list (how to reply to this email):

"Michael Montalbo via GitGitGadget" <gitgitgadget@gmail.com> writes: > +struct diff_subprocess { > + struct subprocess_entry subprocess; > + unsigned int supported_capabilities; > +}; > + > +static int subprocess_map_initialized; > +static struct hashmap subprocess_map; Can we avoid introducing new global variables like these? Would "struct userdiff_driver" or "struct diff_options" be a good place to hang this hashmap, perhaps? > +static int send_file_content(int fd, const char *buf, long size) > +{ > + int ret; > + > + if (size > 0) > + ret = write_packetized_from_buf_no_flush(buf, size, fd); > + else > + ret = 0; Shouldn't "size == -24" be flagged as an invalid input? > + if (ret) > + return ret; > + return packet_flush_gently(fd); > +} > +static int parse_hunk_line(const char *line, struct xdl_hunk *hunk) > +{ > +... > +} This gives a silent error diagnosis, which is good for a lower level helper. > +int diff_process_get_hunks(struct userdiff_driver *drv, > + const char *path, > + const char *old_buf, long old_size, > + const char *new_buf, long new_size, > + struct xdl_hunk **hunks_out, > + size_t *nr_hunks_out) > +{ > + struct diff_subprocess *backend; > + struct child_process *process; > + int fd_in, fd_out; > + struct strbuf status = STRBUF_INIT; > + struct xdl_hunk *hunks = NULL; > + struct xdl_hunk hunk; > + size_t nr_hunks = 0, alloc_hunks = 0; > + int len; > + char *line; > + > + if (!drv || !drv->process) > + return -1; A driver that does not define process is not an error; it is perfectly normal in the current world order where nobody has such an external process and even fi this patch lands, external processes are optional. So here "return -1" does not mean an error, and silent return is perfectly fine. > + backend = find_or_start_process(drv->process); > + if (!backend) > + return -1; This is probably an error; the user specified drv->process, we either tried to find or start the process and failed. Isn't it an event that deserves to be reported in an error message? > + if (!(backend->supported_capabilities & CAP_HUNKS)) > + return -1; Backend started, but the "hunks" feature is not supported. Perhaps in a year or two, this external process protocol may have become so popular that it gained more capabilities, possibly making get_hunks obsolete. We may be looking at such an external process that uses other capabilities but not this one. This is not an error, so silent return is perfectly fine. > + process = subprocess_get_child_process(&backend->subprocess); > + fd_in = process->in; > + fd_out = process->out; > + > + /* Send request */ > + if (packet_write_fmt_gently(fd_in, "command=hunks\n") || > + packet_write_fmt_gently(fd_in, "pathname=%s\n", path) || > + packet_flush_gently(fd_in)) > + goto error; > + > + /* Send old file content */ > + if (send_file_content(fd_in, old_buf, old_size)) > + goto error; > + > + /* Send new file content */ > + if (send_file_content(fd_in, new_buf, new_size)) > + goto error; > + > + /* Read hunks until flush packet */ > + while ((len = packet_read_line_gently(fd_out, NULL, &line)) >= 0 && > + line) { > + if (parse_hunk_line(line, &hunk) < 0) > + goto error; > + ALLOC_GROW(hunks, nr_hunks + 1, alloc_hunks); > + hunks[nr_hunks++] = hunk; > + } > + if (len < 0) > + goto error; > + > + /* Read status */ > + if (subprocess_read_status(fd_out, &status)) > + goto error; > + > + if (strcmp(status.buf, "success")) { > + if (!strcmp(status.buf, "abort")) > + backend->supported_capabilities &= ~CAP_HUNKS; > + goto error; > + } > + > + *hunks_out = hunks; > + *nr_hunks_out = nr_hunks; > + strbuf_release(&status); > + return 0; > + > +error: All exceptions that lead here look like events that should be reported to the end-user. > + free(hunks); > + strbuf_release(&status); > + return -1; > +} > +/* > + * Query a diff process for hunks describing the changes > + * between old_buf and new_buf. > + * > + * The backend is a long-running subprocess configured via > + * diff.<driver>.process. It receives file content via > + * pkt-line and returns hunks with 1-based line numbers. > + * > + * On success, sets *hunks_out and *nr_hunks_out to a newly allocated > + * array (caller must free) and returns 0. > + * > + * On failure, returns -1. The caller should fall back to the > + * builtin diff algorithm. > + */ I do not agree with this. If it is a failure, the user should fix the external process (or disable). It shouldn't be hidden behind a fallback. As I left comments, in this round of implementation, there are conditions that returns -1 for soemthing that is not an error (i.e., not configured, or process not supporting the particular capability) *and* in those cases the caller should fall back as if nothing happened. But some error cases, the caller should't hide them.

Michael Montalbo wrote on the Git mailing list (how to reply to this email):

On Mon, May 25, 2026 at 6:56 PM Junio C Hamano <gitster@pobox.com> wrote: > > "Michael Montalbo via GitGitGadget" <gitgitgadget@gmail.com> writes: > > > +struct diff_subprocess { > > + struct subprocess_entry subprocess; > > + unsigned int supported_capabilities; > > +}; > > + > > +static int subprocess_map_initialized; > > +static struct hashmap subprocess_map; > > Can we avoid introducing new global variables like these? Would > "struct userdiff_driver" or "struct diff_options" be a good place to > hang this hashmap, perhaps? > Will clean this up. > > +static int send_file_content(int fd, const char *buf, long size) > > +{ > > + int ret; > > + > > + if (size > 0) > > + ret = write_packetized_from_buf_no_flush(buf, size, fd); > > + else > > + ret = 0; > > Shouldn't "size == -24" be flagged as an invalid input? > Will fix and do a broader audit of input validation and bounds checking. > > + if (ret) > > + return ret; > > + return packet_flush_gently(fd); > > +} > > > +static int parse_hunk_line(const char *line, struct xdl_hunk *hunk) > > +{ > > +... > > +} > > This gives a silent error diagnosis, which is good for a lower level > helper. > > > +int diff_process_get_hunks(struct userdiff_driver *drv, > > + const char *path, > > + const char *old_buf, long old_size, > > + const char *new_buf, long new_size, > > + struct xdl_hunk **hunks_out, > > + size_t *nr_hunks_out) > > +{ > > + struct diff_subprocess *backend; > > + struct child_process *process; > > + int fd_in, fd_out; > > + struct strbuf status = STRBUF_INIT; > > + struct xdl_hunk *hunks = NULL; > > + struct xdl_hunk hunk; > > + size_t nr_hunks = 0, alloc_hunks = 0; > > + int len; > > + char *line; > > + > > + if (!drv || !drv->process) > > + return -1; > > A driver that does not define process is not an error; it is > perfectly normal in the current world order where nobody has such an > external process and even fi this patch lands, external processes > are optional. So here "return -1" does not mean an error, and > silent return is perfectly fine. > > > + backend = find_or_start_process(drv->process); > > + if (!backend) > > + return -1; > > This is probably an error; the user specified drv->process, we > either tried to find or start the process and failed. Isn't it an > event that deserves to be reported in an error message? > > > + if (!(backend->supported_capabilities & CAP_HUNKS)) > > + return -1; > > Backend started, but the "hunks" feature is not supported. Perhaps > in a year or two, this external process protocol may have become so > popular that it gained more capabilities, possibly making get_hunks > obsolete. We may be looking at such an external process that uses > other capabilities but not this one. This is not an error, so > silent return is perfectly fine. > > > + process = subprocess_get_child_process(&backend->subprocess); > > + fd_in = process->in; > > + fd_out = process->out; > > + > > + /* Send request */ > > + if (packet_write_fmt_gently(fd_in, "command=hunks\n") || > > + packet_write_fmt_gently(fd_in, "pathname=%s\n", path) || > > + packet_flush_gently(fd_in)) > > + goto error; > > + > > + /* Send old file content */ > > + if (send_file_content(fd_in, old_buf, old_size)) > > + goto error; > > + > > + /* Send new file content */ > > + if (send_file_content(fd_in, new_buf, new_size)) > > + goto error; > > + > > + /* Read hunks until flush packet */ > > + while ((len = packet_read_line_gently(fd_out, NULL, &line)) >= 0 && > > + line) { > > + if (parse_hunk_line(line, &hunk) < 0) > > + goto error; > > + ALLOC_GROW(hunks, nr_hunks + 1, alloc_hunks); > > + hunks[nr_hunks++] = hunk; > > + } > > + if (len < 0) > > + goto error; > > + > > + /* Read status */ > > + if (subprocess_read_status(fd_out, &status)) > > + goto error; > > + > > + if (strcmp(status.buf, "success")) { > > + if (!strcmp(status.buf, "abort")) > > + backend->supported_capabilities &= ~CAP_HUNKS; > > + goto error; > > + } > > + > > + *hunks_out = hunks; > > + *nr_hunks_out = nr_hunks; > > + strbuf_release(&status); > > + return 0; > > + > > +error: > > All exceptions that lead here look like events that should be > reported to the end-user. > Agreed on all points. I will restructure things so errors are flagged when appropriate (i.e., user specified a process but one was not found / couldn't start and exceptions) and non-errors are treated as they should be. > > + free(hunks); > > + strbuf_release(&status); > > + return -1; > > +} > > > +/* > > + * Query a diff process for hunks describing the changes > > + * between old_buf and new_buf. > > + * > > + * The backend is a long-running subprocess configured via > > + * diff.<driver>.process. It receives file content via > > + * pkt-line and returns hunks with 1-based line numbers. > > + * > > + * On success, sets *hunks_out and *nr_hunks_out to a newly allocated > > + * array (caller must free) and returns 0. > > + * > > + * On failure, returns -1. The caller should fall back to the > > + * builtin diff algorithm. > > + */ > > I do not agree with this. If it is a failure, the user should fix > the external process (or disable). It shouldn't be hidden behind a > fallback. As I left comments, in this round of implementation, > there are conditions that returns -1 for soemthing that is not an > error (i.e., not configured, or process not supporting the > particular capability) *and* in those cases the caller should fall > back as if nothing happened. But some error cases, the caller > should't hide them. Will address in a follow-up. Thank you for the feedback!

gitgitgadget · 2026-05-26T05:46:39Z

Junio C Hamano wrote on the Git mailing list (how to reply to this email):

"Michael Montalbo via GitGitGadget" <gitgitgadget@gmail.com> writes: > Zero hunks with status=success means the tool considers the > files equivalent. Git skips diff output for that file. Is "zero hunk" a common word or some random string you invented? If the latter, which is I am assuming it to be, you should define what it means at/before the first use. Here in the proposed log message, and ... > > Signed-off-by: Michael Montalbo <mmontalbo@gmail.com> > --- > Documentation/config/diff.adoc | 8 + > Documentation/gitattributes.adoc | 40 ++++ > Makefile | 1 + > diff-process.c | 206 +++++++++++++++++++ > diff-process.h | 28 +++ > diff.c | 23 +++ > t/.gitattributes | 1 + > t/t4080-diff-process.sh | 338 +++++++++++++++++++++++++++++++ > 8 files changed, 645 insertions(+) > create mode 100644 diff-process.c > create mode 100644 diff-process.h > create mode 100755 t/t4080-diff-process.sh > > diff --git a/Documentation/config/diff.adoc b/Documentation/config/diff.adoc > index 1135a62a0a..4ab5f60df6 100644 > --- a/Documentation/config/diff.adoc > +++ b/Documentation/config/diff.adoc > @@ -218,6 +218,14 @@ endif::git-diff[] > Set this option to `true` to make the diff driver cache the text > conversion outputs. See linkgit:gitattributes[5] for details. > > +`diff.<driver>.process`:: > + The command to run as a long-running diff process. > + The tool communicates via the pkt-line protocol and returns > + hunks that are fed into Git's diff and blame pipelines. > + If the tool returns zero hunks, the file is treated as > + unchanged for both diff output and blame attribution. > + See linkgit:gitattributes[5] for details. ... also here. I do not know if you mean "the tool returns no hunks" (there is no "hunk <old_start> <old_count> <new_start> <new_count>" line passed from the tool over the protocol) or "the tool returns zero-hunk" (there is a special "zero-hunk" message to signal this particular condition sent over the protocol), and this description does not quite help disambiguating between the two. If the former, then avoid "zero hunks" as it sounds like a noun with special meaning. Yes, we can say "tool returns one hunk", "tool returns 31 hunks", etc., so "tool returns zero hunks" may logically be correct, but "when the tool returns no hunks with status=success" is much less confusing, I think.

Michael Montalbo wrote on the Git mailing list (how to reply to this email):

On Mon, May 25, 2026 at 7:26 PM Junio C Hamano <gitster@pobox.com> wrote: > > "Michael Montalbo via GitGitGadget" <gitgitgadget@gmail.com> writes: > > > Zero hunks with status=success means the tool considers the > > files equivalent. Git skips diff output for that file. > > Is "zero hunk" a common word or some random string you invented? If > the latter, which is I am assuming it to be, you should define what > it means at/before the first use. Here in the proposed log message, > and ... > > > > > Signed-off-by: Michael Montalbo <mmontalbo@gmail.com> > > --- > > Documentation/config/diff.adoc | 8 + > > Documentation/gitattributes.adoc | 40 ++++ > > Makefile | 1 + > > diff-process.c | 206 +++++++++++++++++++ > > diff-process.h | 28 +++ > > diff.c | 23 +++ > > t/.gitattributes | 1 + > > t/t4080-diff-process.sh | 338 +++++++++++++++++++++++++++++++ > > 8 files changed, 645 insertions(+) > > create mode 100644 diff-process.c > > create mode 100644 diff-process.h > > create mode 100755 t/t4080-diff-process.sh > > > > diff --git a/Documentation/config/diff.adoc b/Documentation/config/diff.adoc > > index 1135a62a0a..4ab5f60df6 100644 > > --- a/Documentation/config/diff.adoc > > +++ b/Documentation/config/diff.adoc > > @@ -218,6 +218,14 @@ endif::git-diff[] > > Set this option to `true` to make the diff driver cache the text > > conversion outputs. See linkgit:gitattributes[5] for details. > > > > +`diff.<driver>.process`:: > > + The command to run as a long-running diff process. > > + The tool communicates via the pkt-line protocol and returns > > + hunks that are fed into Git's diff and blame pipelines. > > + If the tool returns zero hunks, the file is treated as > > + unchanged for both diff output and blame attribution. > > + See linkgit:gitattributes[5] for details. > > ... also here. > > I do not know if you mean "the tool returns no hunks" (there is no > "hunk <old_start> <old_count> <new_start> <new_count>" line passed > from the tool over the protocol) or "the tool returns zero-hunk" > (there is a special "zero-hunk" message to signal this particular > condition sent over the protocol), and this description does not > quite help disambiguating between the two. > > If the former, then avoid "zero hunks" as it sounds like a noun with > special meaning. Yes, we can say "tool returns one hunk", "tool > returns 31 hunks", etc., so "tool returns zero hunks" may logically > be correct, but "when the tool returns no hunks with status=success" > is much less confusing, I think. Yes, "zero hunks" was my own invention and I see why it's confusing. Will update the messaging to use "no hunks" instead and do a broader sweep of the documentation to clarify the protocol and expected tool behavior.

gitgitgadget · 2026-06-07T14:55:48Z

Johannes Schindelin wrote on the Git mailing list (how to reply to this email):

Hi Michael, I stumbled about this patch when it broke CI in Git for Windows, where we do _not_ use `NO_PYTHON`, even though Python is unavailable in the build/test CI jobs. The existing tests handle this situation gracefully, this here patch does not: On Sun, 7 Jun 2026, Michael Montalbo via GitGitGadget wrote: > diff --git a/t/t4080-diff-process.sh b/t/t4080-diff-process.sh > new file mode 100755 > index 0000000000..f159cd86d8 > --- /dev/null > +++ b/t/t4080-diff-process.sh > @@ -0,0 +1,538 @@ > +#!/bin/sh > + > +test_description='diff process via long-running process' > + > +. ./test-lib.sh > + > +if test_have_prereq PYTHON > +then > + PYTHON_PATH=$(command -v python3) || PYTHON_PATH=$(command -v python) When neither `python3` nor `python` are available (which is the case in the minimal Git for Windows SDK used in Git's CI runs), this fails under `set -e`. Before even running the first test case. Resulting in an unexpected TAP format error. Now, we could "fix" this by imitating what `lib-p4` does (see https://github.com/dscho/git/commit/bd0b5570c744f678911a67a62da63f30655f20d8 which demonstrates that it is indeed a work-around on Windows): -- snip -- diff --git a/t/t4080-diff-process.sh b/t/t4080-diff-process.sh index fdf6da1c341e67..bd22c247ff3856 100755 --- a/t/t4080-diff-process.sh +++ b/t/t4080-diff-process.sh @@ -4,9 +4,10 @@ test_description='diff process via long-running process' . ./test-lib.sh -if test_have_prereq PYTHON +if ! test_have_prereq PYTHON || ! test -x "$PYTHON_PATH" then - PYTHON_PATH=$(command -v python3) || PYTHON_PATH=$(command -v python) + skip_all='python interpreter not available' + test_done fi # -- snap -- Of course, this uncovers _another_ problem with the Python script: It uses Python3-only `f"..."` format strings, which cannot be handled by the Python2 to which the `PYTHON_PATH` variable in `linux-TEST-vars` points. So this requires _another follow-up (see also https://github.com/dscho/git/commit/c12a9f4c80e5ce8db0fe370fac46fb45be2b775f): -- snip -- diff --git a/t/t4080-diff-process.sh b/t/t4080-diff-process.sh index bd22c247ff3856..ba14682a9086e4 100755 --- a/t/t4080-diff-process.sh +++ b/t/t4080-diff-process.sh @@ -39,7 +39,8 @@ setup_backend () { def write_pkt(line): data = (line + "\n").encode() - sys.stdout.buffer.write(f"{len(data)+4:04x}".encode() + data) + hdr = "{:04x}".format(len(data) + 4).encode() + sys.stdout.buffer.write(hdr + data) sys.stdout.buffer.flush() def write_flush(): @@ -98,7 +99,8 @@ setup_backend () { new = read_content() old_first = old.split(b"\n")[0].decode(errors="replace") if old else "" new_first = new.split(b"\n")[0].decode(errors="replace") if new else "" - log(f"command={cmd} pathname={pathname} old={old_first} new={new_first}") + log("command={} pathname={} old={} new={}".format( + cmd, pathname, old_first, new_first)) if mode == "error": write_flush() @@ -130,7 +132,7 @@ setup_backend () { else: ol = old.count(b"\n") nl = new.count(b"\n") - write_pkt(f"hunk 1 {ol} 1 {nl}") + write_pkt("hunk 1 {} 1 {}".format(ol, nl)) write_flush() write_pkt("status=success") write_flush() -- snap -- And this is still not enough to make it work with Python2, see https://github.com/dscho/git/actions/runs/27091523842/job/79955895737: -- snip -- [...] + git -c diff.cdiff.process=./diff-process-backend --mode=fixed-hunk diff boundary.c Traceback (most recent call last): File "/__w/git/git/t/trash directory.t4080-diff-process/diff-process-backend.py", line 45, in <module> assert read_pkt() == "git-diff-client" File "/__w/git/git/t/trash directory.t4080-diff-process/diff-process-backend.py", line 4, in read_pkt hdr = sys.stdin.buffer.read(4) AttributeError: 'file' object has no attribute 'buffer' -- snap -- I have experienced similar patterns in my career, where a single decision required multiple follow-up fixes _just_ to avoid having to revert that decision. This kind of doubling down has never ended well. Therefore I would like to take a step back, and ask: Is it _really_ a good idea to use Python here? Are we certain that we want to _require_ Python to run this test and skip it if Python isn't available (as is the case in the Windows-related parts of Git's very own CI) even if Python has nothing at all to do with the feature that is being tested? I don't want to be doomed to repeat history, and we can very well learn e.g. from prior art in this very project, where the tests for the clean/smudge filters (which _also_ want to speak pkt-line over stdio) needlessly incurred Perl as a requirement to run the tests. It was Matheus's heroic work in 52917a998ef3a (t0021: implementation the rot13-filter.pl script in C, 2022-08-14) and 4d1d843be7a15 (tests: use the new C rot13-filter helper to avoid PERL prereq, 2022-08-14) that avoided that unnecessary prerequisite. Likewise, there is `test-tool pkt-line` intended for driving the pkt-line protocol via simple shell scripts. So the conscious project direction has been: fold pkt-line test backends into `test-tool` and drop the scripting-language prereq. Reintroducing the same shape in Python would walk this back. Patrick's careful effort in 27bd8ee311719 (Merge branch 'ps/fewer-perl', 2025-04-29) has been another clear sign that the Git project is actively _removing_ scripting-language dependencies from the build and test infrastructure, not adding new ones. The clear prior art in Git's own tests for what t4080 wants to do, as of today, is `t/helper/test-rot13-filter.c`, which could be imitated here instead of (re-)introducing a dependency on a scripting language other than Unix shell in Git's test suite. The `PYTHON` prereq exists in exactly five files today, all `git p4` related (where Python is an intrinsic prerequisite given that `git-p4.py` _is_ written in Python): `t/lib-git-p4.sh`, `t/t9802-git-p4-filetype.sh`, `t/t9810-git-p4-rcs.sh`, `t/t9835-git-p4-metadata-encoding-python2.sh`, and `t/t9836-git-p4-metadata-encoding-python3.sh`. After 7cdbff14d482 (remove merge-recursive-old, 2006-11-20), this here patch would be the first one, after almost 20 years, to re-introduce Python as a dependency outside `git p4`. And it would also be the first ever to embed a Python script as a heredoc: > +fi > + > +# > +# A single parametric diff process. > +# Usage: diff-process-backend --mode=<mode> [--log=<path>] > +# > +# Modes: > +# whole-file - report all lines as changed (default) > +# fixed-hunk - always report hunk 5 2 5 2 > +# bad-hunk - report out-of-bounds hunk 999 1 999 1 > +# bad-sync - report hunk with mismatched unchanged totals > +# overlap - report two overlapping hunks > +# no-hunks - return no hunks (files considered equivalent) > +# error - return status=error for every request > +# abort - return status=abort for every request > +# crash - read one request then exit without responding > +# > +setup_backend () { > + cat >"$TRASH_DIRECTORY/diff-process-backend.py" <<-\PYEOF > + import sys, os > + > + def read_pkt(): > + hdr = sys.stdin.buffer.read(4) > + if len(hdr) < 4: return None > + length = int(hdr, 16) > + if length == 0: return "" > + data = sys.stdin.buffer.read(length - 4) > + return data.decode().rstrip("\n") > + > + def write_pkt(line): > + data = (line + "\n").encode() > + sys.stdout.buffer.write(f"{len(data)+4:04x}".encode() + data) > + sys.stdout.buffer.flush() > + > + def write_flush(): > + sys.stdout.buffer.write(b"0000") > + sys.stdout.buffer.flush() > + > + def read_content(): > + chunks = [] > + while True: > + hdr = sys.stdin.buffer.read(4) > + if len(hdr) < 4: break > + length = int(hdr, 16) > + if length == 0: break > + chunks.append(sys.stdin.buffer.read(length - 4)) > + return b"".join(chunks) > + > + mode = "whole-file" > + logfile = None > + for arg in sys.argv[1:]: > + if arg.startswith("--mode="): > + mode = arg[7:] > + elif arg.startswith("--log="): > + logfile = open(arg[6:], "a") > + > + def log(msg): > + if logfile: > + logfile.write(msg + "\n") > + logfile.flush() > + > + # Handshake > + assert read_pkt() == "git-diff-client" > + assert read_pkt() == "version=1" > + read_pkt() > + write_pkt("git-diff-server") > + write_pkt("version=1") > + write_flush() > + while True: > + p = read_pkt() > + if p == "": break > + write_pkt("capability=hunks") > + write_flush() > + > + log("ready") > + > + while True: > + cmd = None > + pathname = None > + while True: > + p = read_pkt() > + if p is None: sys.exit(0) > + if p == "": break > + if p.startswith("command="): cmd = p.split("=",1)[1] > + if p.startswith("pathname="): pathname = p.split("=",1)[1] > + if cmd is None: sys.exit(0) > + old = read_content() > + new = read_content() > + old_first = old.split(b"\n")[0].decode(errors="replace") if old else "" > + new_first = new.split(b"\n")[0].decode(errors="replace") if new else "" > + log(f"command={cmd} pathname={pathname} old={old_first} new={new_first}") > + > + if mode == "error": > + write_flush() > + write_pkt("status=error") > + write_flush() > + continue > + > + if mode == "abort": > + write_flush() > + write_pkt("status=abort") > + write_flush() > + continue > + > + if mode == "crash": > + sys.exit(1) > + > + if cmd == "hunks": > + if mode == "fixed-hunk": > + write_pkt("hunk 5 2 5 2") > + elif mode == "bad-hunk": > + write_pkt("hunk 999 1 999 1") > + elif mode == "bad-sync": > + write_pkt("hunk 1 2 1 1") > + elif mode == "overlap": > + write_pkt("hunk 1 5 1 5") > + write_pkt("hunk 3 2 3 2") > + elif mode == "no-hunks": > + pass > + else: > + ol = old.count(b"\n") > + nl = new.count(b"\n") > + write_pkt(f"hunk 1 {ol} 1 {nl}") > + write_flush() > + write_pkt("status=success") > + write_flush() > + else: > + write_flush() > + write_pkt("status=error") > + write_flush() > + PYEOF The existing pattern is to provide larger scripts as fixtures in associated `t/tNNNN/` directories, not as heredoc, see e.g. `t/t1509/prepare-chroot.sh`. Writing scripts, especially lengthy ones, in heredoc strings makes it virtually impossible to use static code analysis or syntax highlighting to fend off banal errors. Given the complexity of what t4080 tries to test (error, abort, crash, bad-sync, no-hunks, multiple files in one session, capability negotiation), it would unfortunately be infeasible to use `test-tool pkt-line` from a shell script implementing that `diff.*.process` protocol. So I've spiked a demo how the `test-tool diff-process-backend` could look like (letting Opus do the menial typing, so that I can enjoy at least part of a sunny Sunday outside), which also passes the CI build and test: https://github.com/dscho/git/commit/b6e3c93381b00929476c3a00155f7cf7334a22e6 That commit is of course not intended to be used as-is; Feel free to pick code parts of it and integrate them into your topic branch. Or write your own test-tool helper from scratch if that's more your jam. Ciao, Johannes > + write_script diff-process-backend <<-SHEOF > + exec "$PYTHON_PATH" "$TRASH_DIRECTORY/diff-process-backend.py" "\$@" > + SHEOF > +} > + > +BACKEND="./diff-process-backend" > + > +test_expect_success PYTHON 'setup' ' > + setup_backend && > + echo "*.c diff=cdiff" >.gitattributes && > + git add .gitattributes && > + > + # boundary.c: 10 lines, changes at 5-6 and 9-10. > + # Used by: hunk boundaries, error fallback, crash, bad hunks, overlap. > + cat >boundary.c <<-\EOF && > + line1 > + line2 > + line3 > + line4 > + OLD5 > + OLD6 > + line7 > + line8 > + OLD9 > + OLD10 > + EOF > + git add boundary.c && > + > + # worddiff.c: single-line function, value changes 1 -> 999. > + # Used by: word-diff, --diff-algorithm, --no-ext-diff, --stat. > + cat >worddiff.c <<-\EOF && > + int value(void) { return 1; } > + EOF > + git add worddiff.c && > + > + # newfile.c: single-line function, value changes 42 -> 99. > + # Used by: new file, --exit-code, multiple drivers. > + cat >newfile.c <<-\EOF && > + int new_func(void) { return 42; } > + EOF > + git add newfile.c && > + > + # logtest.c: single-line function for log/format-patch tests. > + # Needs two commits so log -1 has a diff. > + cat >logtest.c <<-\EOF && > + int logfunc(void) { return 1; } > + EOF > + git add logtest.c && > + > + # two.c/one.c: two-file pair for error/abort/startup-failure tests. > + cat >one.c <<-\EOF && > + int first(void) { return 1; } > + EOF > + cat >two.c <<-\EOF && > + int second(void) { return 2; } > + EOF > + git add one.c two.c && > + > + git commit -m "initial" && > + > + # Second commit for logtest.c (so log -1 has something to show). > + cat >logtest.c <<-\EOF && > + int logfunc(void) { return 2; } > + EOF > + git add logtest.c && > + git commit -m "change logtest.c" && > + > + # Working tree modifications (not committed). > + cat >boundary.c <<-\EOF && > + line1 > + line2 > + line3 > + line4 > + NEW5 > + NEW6 > + line7 > + line8 > + NEW9 > + NEW10 > + EOF > + > + cat >worddiff.c <<-\EOF && > + int value(void) { return 999; } > + EOF > + > + cat >newfile.c <<-\EOF && > + int new_func(void) { return 99; } > + EOF > + > + cat >one.c <<-\EOF && > + int first(void) { return 10; } > + EOF > + > + cat >two.c <<-\EOF > + int second(void) { return 20; } > + EOF > +' > + > +# > +# Core behavior: the tool controls which lines are marked as changed. > +# > + > +test_expect_success PYTHON 'diff process hunk boundaries affect output' ' > + # The file has changes at lines 5-6 and 9-10, but fixed-hunk > + # only reports lines 5-6 as changed. Lines 9-10 should not > + # appear as changed in the output. > + git -c diff.cdiff.process="$BACKEND --mode=fixed-hunk" \ > + diff boundary.c >actual && > + test_grep "^-OLD5" actual && > + test_grep "^-OLD6" actual && > + test_grep "^+NEW5" actual && > + test_grep "^+NEW6" actual && > + test_grep ! "^-OLD9" actual && > + test_grep ! "^-OLD10" actual && > + test_grep ! "^+NEW9" actual && > + test_grep ! "^+NEW10" actual > +' > + > +test_expect_success PYTHON 'diff process works with new file' ' > + rm -f backend.log && > + git -c diff.cdiff.process="$BACKEND --log=backend.log" \ > + diff -- newfile.c >actual 2>stderr && > + test_grep "return 99" actual && > + test_grep "pathname=newfile.c" backend.log && > + test_must_be_empty stderr > +' > + > +test_expect_success PYTHON 'diff process works with added file (empty old side)' ' > + cat >added.c <<-\EOF && > + int added(void) { return 1; } > + EOF > + git add added.c && > + > + rm -f backend.log && > + git -c diff.cdiff.process="$BACKEND --log=backend.log" \ > + diff --cached -- added.c >actual 2>stderr && > + test_grep "added" actual && > + test_grep "pathname=added.c" backend.log && > + test_must_be_empty stderr > +' > + > +test_expect_success PYTHON 'diff process skipped for binary files' ' > + printf "\\0binary" >binary.c && > + git add binary.c && > + git commit -m "add binary" && > + printf "\\0changed" >binary.c && > + > + rm -f backend.log && > + git -c diff.cdiff.process="$BACKEND --log=backend.log" \ > + diff -- binary.c >actual && > + test_grep "Binary files" actual && > + test_path_is_missing backend.log > +' > + > +test_expect_success PYTHON 'diff process not consulted for unmatched driver' ' > + echo "not tracked by cdiff" >unmatched.txt && > + git add unmatched.txt && > + git commit -m "add unmatched.txt" && > + > + echo "modified" >unmatched.txt && > + > + rm -f backend.log && > + git -c diff.cdiff.process="$BACKEND --log=backend.log" \ > + diff -- unmatched.txt >actual && > + test_grep "modified" actual && > + test_path_is_missing backend.log > +' > + > +test_expect_success PYTHON 'multiple drivers use separate processes' ' > + echo "*.h diff=hdiff" >>.gitattributes && > + git add .gitattributes && > + > + cat >multi.h <<-\EOF && > + int header(void) { return 1; } > + EOF > + git add multi.h && > + git commit -m "add multi.h" && > + > + cat >multi.h <<-\EOF && > + int header(void) { return 2; } > + EOF > + > + rm -f backend-c.log backend-h.log && > + git -c diff.cdiff.process="$BACKEND --log=backend-c.log" \ > + -c diff.hdiff.process="$BACKEND --log=backend-h.log" \ > + diff -- newfile.c multi.h >actual 2>stderr && > + test_grep "pathname=newfile.c" backend-c.log && > + test_grep "pathname=multi.h" backend-h.log && > + test_must_be_empty stderr > +' > + > +test_expect_success PYTHON 'diff process works alongside textconv' ' > + write_script uppercase-filter <<-\EOF && > + tr "a-z" "A-Z" <"$1" > + EOF > + > + cat >textconv.c <<-\EOF && > + hello world > + EOF > + git add textconv.c && > + git commit -m "add textconv.c" && > + > + cat >textconv.c <<-\EOF && > + goodbye world > + EOF > + > + rm -f backend.log && > + git -c diff.cdiff.textconv="./uppercase-filter" \ > + -c diff.cdiff.process="$BACKEND --log=backend.log" \ > + diff -- textconv.c >actual 2>stderr && > + # The diff process receives textconv-transformed (uppercase) content. > + test_grep "pathname=textconv.c" backend.log && > + test_grep "old=HELLO WORLD" backend.log && > + test_grep "new=GOODBYE WORLD" backend.log && > + test_must_be_empty stderr > +' > + > +# > +# Downstream features: word diff, log, equivalent files, exit code. > +# > + > +test_expect_success PYTHON 'diff process with --word-diff' ' > + rm -f backend.log && > + git -c diff.cdiff.process="$BACKEND --log=backend.log" \ > + diff --word-diff worddiff.c >actual 2>stderr && > + test_grep "\[-1;-\]" actual && > + test_grep "{+999;+}" actual && > + test_grep "pathname=worddiff.c" backend.log && > + test_must_be_empty stderr > +' > + > +test_expect_success PYTHON 'diff process works with git log -p' ' > + # With no-hunks mode, the tool says the files are equivalent, > + # so log -p should show the commit but no diff content. > + rm -f backend.log && > + git -c diff.cdiff.process="$BACKEND --mode=no-hunks --log=backend.log" \ > + log -1 -p -- logtest.c >actual 2>stderr && > + test_grep "change logtest.c" actual && > + test_grep ! "return 2" actual && > + test_grep "command=hunks pathname=logtest.c" backend.log && > + test_must_be_empty stderr > +' > + > +test_expect_success PYTHON 'diff process no hunks suppresses diff output' ' > + cat >nohunks.c <<-\EOF && > + int zero(void) { return 0; } > + EOF > + git add nohunks.c && > + git commit -m "add nohunks.c" && > + > + cat >nohunks.c <<-\EOF && > + int zero(void) { return 999; } > + EOF > + > + git -c diff.cdiff.process="$BACKEND --mode=no-hunks" \ > + diff nohunks.c >actual && > + test_must_be_empty actual > +' > + > +test_expect_success PYTHON 'diff process no hunks with --exit-code returns success' ' > + git -c diff.cdiff.process="$BACKEND --mode=no-hunks" \ > + diff --exit-code nohunks.c > +' > + > +test_expect_success PYTHON 'diff process with --exit-code and hunks returns failure' ' > + test_expect_code 1 git -c diff.cdiff.process="$BACKEND" \ > + diff --exit-code newfile.c > +' > + > +# > +# Bypass mechanisms: flags and commands that skip the diff process. > +# > + > +test_expect_success PYTHON 'diff process bypassed by --diff-algorithm' ' > + rm -f backend.log && > + git -c diff.cdiff.process="$BACKEND --log=backend.log" \ > + diff --diff-algorithm=patience worddiff.c >actual && > + test_grep "return 999" actual && > + test_path_is_missing backend.log > +' > + > +test_expect_success PYTHON 'diff process not used by --stat' ' > + rm -f backend.log && > + git -c diff.cdiff.process="$BACKEND --log=backend.log" \ > + diff --stat worddiff.c >actual && > + test_grep "worddiff.c" actual && > + test_path_is_missing backend.log > +' > + > +# > +# Error handling and fallback. > +# > + > +test_expect_success PYTHON 'diff process fallback on tool error status' ' > + rm -f backend.log && > + git -c diff.cdiff.process="$BACKEND --mode=error --log=backend.log" \ > + diff boundary.c >actual 2>stderr && > + # Fallback produces the full builtin diff (both change regions). > + test_grep "^-OLD5" actual && > + test_grep "^+NEW5" actual && > + test_grep "^-OLD9" actual && > + test_grep "^+NEW9" actual && > + # Tool was contacted (it replied with error, not crash). > + test_grep "command=hunks pathname=boundary.c" backend.log && > + test_grep "diff process.*failed" stderr > +' > + > +test_expect_success PYTHON 'diff process error keeps tool available for next file' ' > + rm -f backend.log && > + git -c diff.cdiff.process="$BACKEND --mode=error --log=backend.log" \ > + diff -- one.c two.c >actual 2>stderr && > + # Unlike abort, error keeps the tool available: both files > + # are sent to the tool (and both fall back). > + test_grep "pathname=one.c" backend.log && > + test_grep "pathname=two.c" backend.log && > + test_grep "return 10" actual && > + test_grep "return 20" actual > +' > + > +test_expect_success PYTHON 'diff process abort disables for session' ' > + rm -f backend.log && > + git -c diff.cdiff.process="$BACKEND --mode=abort --log=backend.log" \ > + diff -- one.c two.c >actual && > + # Both files should still produce diff output via fallback. > + test_grep "return 10" actual && > + test_grep "return 20" actual && > + # The tool aborts on the first file and git clears its > + # capability. The second file never contacts the tool. > + test_grep "pathname=one.c" backend.log && > + test_grep ! "pathname=two.c" backend.log > +' > + > +test_expect_success PYTHON 'diff process fallback on tool crash' ' > + git -c diff.cdiff.process="$BACKEND --mode=crash" \ > + diff boundary.c >actual 2>stderr && > + test_grep "^-OLD5" actual && > + test_grep "^+NEW5" actual && > + test_grep "^-OLD9" actual && > + test_grep "^+NEW9" actual && > + # Crash is a communication failure, so a warning is emitted. > + test_grep "diff process.*failed" stderr > +' > + > +test_expect_success PYTHON 'diff process startup failure only warns once' ' > + git -c diff.cdiff.process="/nonexistent/tool" \ > + diff -- one.c two.c >actual 2>stderr && > + # Both files produce diff output via fallback. > + test_grep "return 10" actual && > + test_grep "return 20" actual && > + # Sentinel prevents repeated warnings: only one, not one per file. > + test_grep "diff process.*failed" stderr >warnings && > + test_line_count = 1 warnings > +' > + > +test_expect_success PYTHON 'diff process fallback on bad hunks' ' > + git -c diff.cdiff.process="$BACKEND --mode=bad-hunk" \ > + diff boundary.c >actual 2>stderr && > + test_grep "^-OLD5" actual && > + test_grep "^+NEW5" actual && > + test_grep "^-OLD9" actual && > + test_grep "^+NEW9" actual && > + # Invalid hunks are caught by xdiff validation, not the > + # protocol layer, so no warning is emitted. > + test_must_be_empty stderr > +' > + > +test_expect_success PYTHON 'diff process fallback on mismatched unchanged totals' ' > + cat >synctest.c <<-\EOF && > + line1 > + line2 > + line3 > + EOF > + git add synctest.c && > + git commit -m "add synctest.c" && > + > + cat >synctest.c <<-\EOF && > + line1 > + changed > + line3 > + EOF > + > + # bad-sync reports hunk 1 2 1 1: marks 2 old lines and 1 new > + # line as changed, leaving 1 unchanged old vs 2 unchanged new. > + # The synchronization invariant fails and git falls back. > + git -c diff.cdiff.process="$BACKEND --mode=bad-sync" \ > + diff synctest.c >actual 2>stderr && > + test_grep "changed" actual > +' > + > +test_expect_success PYTHON 'diff process fallback on overlapping hunks' ' > + # boundary.c has 10 lines, so both hunks are in bounds > + # but they overlap at lines 3-5, triggering the ordering check. > + git -c diff.cdiff.process="$BACKEND --mode=overlap" \ > + diff boundary.c >actual 2>stderr && > + test_grep "NEW5" actual > +' > + > +test_done > diff --git a/userdiff.h b/userdiff.h > index 51c26e0d41..a98eabe377 100644 > --- a/userdiff.h > +++ b/userdiff.h > @@ -3,6 +3,7 @@ > > #include "notes-cache.h" > > +struct diff_subprocess; > struct index_state; > struct repository; > > @@ -33,6 +34,8 @@ struct userdiff_driver { > int textconv_want_cache; > const char *process; > char *process_owned; > + struct diff_subprocess *diff_subprocess; > + unsigned diff_process_failed : 1; > }; > enum userdiff_driver_type { > USERDIFF_DRIVER_TYPE_BUILTIN = 1<<0, > -- > gitgitgadget > > >

Michael Montalbo wrote on the Git mailing list (how to reply to this email):

On Sun, Jun 7, 2026 at 7:36 AM Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote: > > Hi Michael, > > I stumbled about this patch when it broke CI in Git for Windows, where we > do _not_ use `NO_PYTHON`, even though Python is unavailable in the > build/test CI jobs. The existing tests handle this situation gracefully, > this here patch does not: > > On Sun, 7 Jun 2026, Michael Montalbo via GitGitGadget wrote: > > > diff --git a/t/t4080-diff-process.sh b/t/t4080-diff-process.sh > > new file mode 100755 > > index 0000000000..f159cd86d8 > > --- /dev/null > > +++ b/t/t4080-diff-process.sh > > @@ -0,0 +1,538 @@ > > +#!/bin/sh > > + > > +test_description='diff process via long-running process' > > + > > +. ./test-lib.sh > > + > > +if test_have_prereq PYTHON > > +then > > + PYTHON_PATH=$(command -v python3) || PYTHON_PATH=$(command -v python) > > When neither `python3` nor `python` are available (which is the case in > the minimal Git for Windows SDK used in Git's CI runs), this fails under > `set -e`. Before even running the first test case. Resulting in an > unexpected TAP format error. > > Now, we could "fix" this by imitating what `lib-p4` does (see > https://github.com/dscho/git/commit/bd0b5570c744f678911a67a62da63f30655f20d8 > which demonstrates that it is indeed a work-around on Windows): > > -- snip -- > diff --git a/t/t4080-diff-process.sh b/t/t4080-diff-process.sh > index fdf6da1c341e67..bd22c247ff3856 100755 > --- a/t/t4080-diff-process.sh > +++ b/t/t4080-diff-process.sh > @@ -4,9 +4,10 @@ test_description='diff process via long-running process' > > . ./test-lib.sh > > -if test_have_prereq PYTHON > +if ! test_have_prereq PYTHON || ! test -x "$PYTHON_PATH" > then > - PYTHON_PATH=$(command -v python3) || PYTHON_PATH=$(command -v python) > + skip_all='python interpreter not available' > + test_done > fi > > # > -- snap -- > > Of course, this uncovers _another_ problem with the Python script: It uses > Python3-only `f"..."` format strings, which cannot be handled by the > Python2 to which the `PYTHON_PATH` variable in `linux-TEST-vars` points. > So this requires _another follow-up (see also > https://github.com/dscho/git/commit/c12a9f4c80e5ce8db0fe370fac46fb45be2b775f): > > -- snip -- > diff --git a/t/t4080-diff-process.sh b/t/t4080-diff-process.sh > index bd22c247ff3856..ba14682a9086e4 100755 > --- a/t/t4080-diff-process.sh > +++ b/t/t4080-diff-process.sh > @@ -39,7 +39,8 @@ setup_backend () { > > def write_pkt(line): > data = (line + "\n").encode() > - sys.stdout.buffer.write(f"{len(data)+4:04x}".encode() + data) > + hdr = "{:04x}".format(len(data) + 4).encode() > + sys.stdout.buffer.write(hdr + data) > sys.stdout.buffer.flush() > > def write_flush(): > @@ -98,7 +99,8 @@ setup_backend () { > new = read_content() > old_first = old.split(b"\n")[0].decode(errors="replace") if old else "" > new_first = new.split(b"\n")[0].decode(errors="replace") if new else "" > - log(f"command={cmd} pathname={pathname} old={old_first} new={new_first}") > + log("command={} pathname={} old={} new={}".format( > + cmd, pathname, old_first, new_first)) > > if mode == "error": > write_flush() > @@ -130,7 +132,7 @@ setup_backend () { > else: > ol = old.count(b"\n") > nl = new.count(b"\n") > - write_pkt(f"hunk 1 {ol} 1 {nl}") > + write_pkt("hunk 1 {} 1 {}".format(ol, nl)) > write_flush() > write_pkt("status=success") > write_flush() > -- snap -- > > And this is still not enough to make it work with Python2, see > https://github.com/dscho/git/actions/runs/27091523842/job/79955895737: > > -- snip -- > [...] > + git -c diff.cdiff.process=./diff-process-backend --mode=fixed-hunk diff boundary.c > Traceback (most recent call last): > File "/__w/git/git/t/trash directory.t4080-diff-process/diff-process-backend.py", line 45, in <module> > assert read_pkt() == "git-diff-client" > File "/__w/git/git/t/trash directory.t4080-diff-process/diff-process-backend.py", line 4, in read_pkt > hdr = sys.stdin.buffer.read(4) > AttributeError: 'file' object has no attribute 'buffer' > -- snap -- > > I have experienced similar patterns in my career, where a single decision > required multiple follow-up fixes _just_ to avoid having to revert that > decision. This kind of doubling down has never ended well. > > Therefore I would like to take a step back, and ask: Is it _really_ a good > idea to use Python here? Are we certain that we want to _require_ Python > to run this test and skip it if Python isn't available (as is the case in > the Windows-related parts of Git's very own CI) even if Python has nothing > at all to do with the feature that is being tested? > > I don't want to be doomed to repeat history, and we can very well learn > e.g. from prior art in this very project, where the tests for the > clean/smudge filters (which _also_ want to speak pkt-line over stdio) > needlessly incurred Perl as a requirement to run the tests. It was > Matheus's heroic work in 52917a998ef3a (t0021: implementation the > rot13-filter.pl script in C, 2022-08-14) and 4d1d843be7a15 (tests: use the > new C rot13-filter helper to avoid PERL prereq, 2022-08-14) that avoided > that unnecessary prerequisite. > > Likewise, there is `test-tool pkt-line` intended for driving the pkt-line > protocol via simple shell scripts. > > So the conscious project direction has been: fold pkt-line test backends > into `test-tool` and drop the scripting-language prereq. Reintroducing the > same shape in Python would walk this back. > > Patrick's careful effort in 27bd8ee311719 (Merge branch 'ps/fewer-perl', > 2025-04-29) has been another clear sign that the Git project is actively > _removing_ scripting-language dependencies from the build and test > infrastructure, not adding new ones. > > The clear prior art in Git's own tests for what t4080 wants to do, as of > today, is `t/helper/test-rot13-filter.c`, which could be imitated here > instead of (re-)introducing a dependency on a scripting language other > than Unix shell in Git's test suite. > > The `PYTHON` prereq exists in exactly five files today, all `git p4` > related (where Python is an intrinsic prerequisite given that `git-p4.py` > _is_ written in Python): `t/lib-git-p4.sh`, `t/t9802-git-p4-filetype.sh`, > `t/t9810-git-p4-rcs.sh`, `t/t9835-git-p4-metadata-encoding-python2.sh`, > and `t/t9836-git-p4-metadata-encoding-python3.sh`. > > After 7cdbff14d482 (remove merge-recursive-old, 2006-11-20), this here > patch would be the first one, after almost 20 years, to re-introduce > Python as a dependency outside `git p4`. > > And it would also be the first ever to embed a Python script as a heredoc: > > > +fi > > + > > +# > > +# A single parametric diff process. > > +# Usage: diff-process-backend --mode=<mode> [--log=<path>] > > +# > > +# Modes: > > +# whole-file - report all lines as changed (default) > > +# fixed-hunk - always report hunk 5 2 5 2 > > +# bad-hunk - report out-of-bounds hunk 999 1 999 1 > > +# bad-sync - report hunk with mismatched unchanged totals > > +# overlap - report two overlapping hunks > > +# no-hunks - return no hunks (files considered equivalent) > > +# error - return status=error for every request > > +# abort - return status=abort for every request > > +# crash - read one request then exit without responding > > +# > > +setup_backend () { > > + cat >"$TRASH_DIRECTORY/diff-process-backend.py" <<-\PYEOF > > + import sys, os > > + > > + def read_pkt(): > > + hdr = sys.stdin.buffer.read(4) > > + if len(hdr) < 4: return None > > + length = int(hdr, 16) > > + if length == 0: return "" > > + data = sys.stdin.buffer.read(length - 4) > > + return data.decode().rstrip("\n") > > + > > + def write_pkt(line): > > + data = (line + "\n").encode() > > + sys.stdout.buffer.write(f"{len(data)+4:04x}".encode() + data) > > + sys.stdout.buffer.flush() > > + > > + def write_flush(): > > + sys.stdout.buffer.write(b"0000") > > + sys.stdout.buffer.flush() > > + > > + def read_content(): > > + chunks = [] > > + while True: > > + hdr = sys.stdin.buffer.read(4) > > + if len(hdr) < 4: break > > + length = int(hdr, 16) > > + if length == 0: break > > + chunks.append(sys.stdin.buffer.read(length - 4)) > > + return b"".join(chunks) > > + > > + mode = "whole-file" > > + logfile = None > > + for arg in sys.argv[1:]: > > + if arg.startswith("--mode="): > > + mode = arg[7:] > > + elif arg.startswith("--log="): > > + logfile = open(arg[6:], "a") > > + > > + def log(msg): > > + if logfile: > > + logfile.write(msg + "\n") > > + logfile.flush() > > + > > + # Handshake > > + assert read_pkt() == "git-diff-client" > > + assert read_pkt() == "version=1" > > + read_pkt() > > + write_pkt("git-diff-server") > > + write_pkt("version=1") > > + write_flush() > > + while True: > > + p = read_pkt() > > + if p == "": break > > + write_pkt("capability=hunks") > > + write_flush() > > + > > + log("ready") > > + > > + while True: > > + cmd = None > > + pathname = None > > + while True: > > + p = read_pkt() > > + if p is None: sys.exit(0) > > + if p == "": break > > + if p.startswith("command="): cmd = p.split("=",1)[1] > > + if p.startswith("pathname="): pathname = p.split("=",1)[1] > > + if cmd is None: sys.exit(0) > > + old = read_content() > > + new = read_content() > > + old_first = old.split(b"\n")[0].decode(errors="replace") if old else "" > > + new_first = new.split(b"\n")[0].decode(errors="replace") if new else "" > > + log(f"command={cmd} pathname={pathname} old={old_first} new={new_first}") > > + > > + if mode == "error": > > + write_flush() > > + write_pkt("status=error") > > + write_flush() > > + continue > > + > > + if mode == "abort": > > + write_flush() > > + write_pkt("status=abort") > > + write_flush() > > + continue > > + > > + if mode == "crash": > > + sys.exit(1) > > + > > + if cmd == "hunks": > > + if mode == "fixed-hunk": > > + write_pkt("hunk 5 2 5 2") > > + elif mode == "bad-hunk": > > + write_pkt("hunk 999 1 999 1") > > + elif mode == "bad-sync": > > + write_pkt("hunk 1 2 1 1") > > + elif mode == "overlap": > > + write_pkt("hunk 1 5 1 5") > > + write_pkt("hunk 3 2 3 2") > > + elif mode == "no-hunks": > > + pass > > + else: > > + ol = old.count(b"\n") > > + nl = new.count(b"\n") > > + write_pkt(f"hunk 1 {ol} 1 {nl}") > > + write_flush() > > + write_pkt("status=success") > > + write_flush() > > + else: > > + write_flush() > > + write_pkt("status=error") > > + write_flush() > > + PYEOF > > The existing pattern is to provide larger scripts as fixtures in > associated `t/tNNNN/` directories, not as heredoc, see e.g. > `t/t1509/prepare-chroot.sh`. Writing scripts, especially lengthy ones, in > heredoc strings makes it virtually impossible to use static code analysis > or syntax highlighting to fend off banal errors. > > Given the complexity of what t4080 tries to test (error, abort, crash, > bad-sync, no-hunks, multiple files in one session, capability > negotiation), it would unfortunately be infeasible to use `test-tool > pkt-line` from a shell script implementing that `diff.*.process` protocol. > > So I've spiked a demo how the `test-tool diff-process-backend` could look > like (letting Opus do the menial typing, so that I can enjoy at least part > of a sunny Sunday outside), which also passes the CI build and test: > https://github.com/dscho/git/commit/b6e3c93381b00929476c3a00155f7cf7334a22e6 > > That commit is of course not intended to be used as-is; Feel free to pick > code parts of it and integrate them into your topic branch. Or write your > own test-tool helper from scratch if that's more your jam. > Johannes, thank you for the great feedback. The historical context is really helpful and the concerns you raise make a lot of sense. I will take a look at your spike and also work on removing Python from the test. > Ciao, > Johannes > > > + write_script diff-process-backend <<-SHEOF > > + exec "$PYTHON_PATH" "$TRASH_DIRECTORY/diff-process-backend.py" "\$@" > > + SHEOF > > +} > > + > > +BACKEND="./diff-process-backend" > > + > > +test_expect_success PYTHON 'setup' ' > > + setup_backend && > > + echo "*.c diff=cdiff" >.gitattributes && > > + git add .gitattributes && > > + > > + # boundary.c: 10 lines, changes at 5-6 and 9-10. > > + # Used by: hunk boundaries, error fallback, crash, bad hunks, overlap. > > + cat >boundary.c <<-\EOF && > > + line1 > > + line2 > > + line3 > > + line4 > > + OLD5 > > + OLD6 > > + line7 > > + line8 > > + OLD9 > > + OLD10 > > + EOF > > + git add boundary.c && > > + > > + # worddiff.c: single-line function, value changes 1 -> 999. > > + # Used by: word-diff, --diff-algorithm, --no-ext-diff, --stat. > > + cat >worddiff.c <<-\EOF && > > + int value(void) { return 1; } > > + EOF > > + git add worddiff.c && > > + > > + # newfile.c: single-line function, value changes 42 -> 99. > > + # Used by: new file, --exit-code, multiple drivers. > > + cat >newfile.c <<-\EOF && > > + int new_func(void) { return 42; } > > + EOF > > + git add newfile.c && > > + > > + # logtest.c: single-line function for log/format-patch tests. > > + # Needs two commits so log -1 has a diff. > > + cat >logtest.c <<-\EOF && > > + int logfunc(void) { return 1; } > > + EOF > > + git add logtest.c && > > + > > + # two.c/one.c: two-file pair for error/abort/startup-failure tests. > > + cat >one.c <<-\EOF && > > + int first(void) { return 1; } > > + EOF > > + cat >two.c <<-\EOF && > > + int second(void) { return 2; } > > + EOF > > + git add one.c two.c && > > + > > + git commit -m "initial" && > > + > > + # Second commit for logtest.c (so log -1 has something to show). > > + cat >logtest.c <<-\EOF && > > + int logfunc(void) { return 2; } > > + EOF > > + git add logtest.c && > > + git commit -m "change logtest.c" && > > + > > + # Working tree modifications (not committed). > > + cat >boundary.c <<-\EOF && > > + line1 > > + line2 > > + line3 > > + line4 > > + NEW5 > > + NEW6 > > + line7 > > + line8 > > + NEW9 > > + NEW10 > > + EOF > > + > > + cat >worddiff.c <<-\EOF && > > + int value(void) { return 999; } > > + EOF > > + > > + cat >newfile.c <<-\EOF && > > + int new_func(void) { return 99; } > > + EOF > > + > > + cat >one.c <<-\EOF && > > + int first(void) { return 10; } > > + EOF > > + > > + cat >two.c <<-\EOF > > + int second(void) { return 20; } > > + EOF > > +' > > + > > +# > > +# Core behavior: the tool controls which lines are marked as changed. > > +# > > + > > +test_expect_success PYTHON 'diff process hunk boundaries affect output' ' > > + # The file has changes at lines 5-6 and 9-10, but fixed-hunk > > + # only reports lines 5-6 as changed. Lines 9-10 should not > > + # appear as changed in the output. > > + git -c diff.cdiff.process="$BACKEND --mode=fixed-hunk" \ > > + diff boundary.c >actual && > > + test_grep "^-OLD5" actual && > > + test_grep "^-OLD6" actual && > > + test_grep "^+NEW5" actual && > > + test_grep "^+NEW6" actual && > > + test_grep ! "^-OLD9" actual && > > + test_grep ! "^-OLD10" actual && > > + test_grep ! "^+NEW9" actual && > > + test_grep ! "^+NEW10" actual > > +' > > + > > +test_expect_success PYTHON 'diff process works with new file' ' > > + rm -f backend.log && > > + git -c diff.cdiff.process="$BACKEND --log=backend.log" \ > > + diff -- newfile.c >actual 2>stderr && > > + test_grep "return 99" actual && > > + test_grep "pathname=newfile.c" backend.log && > > + test_must_be_empty stderr > > +' > > + > > +test_expect_success PYTHON 'diff process works with added file (empty old side)' ' > > + cat >added.c <<-\EOF && > > + int added(void) { return 1; } > > + EOF > > + git add added.c && > > + > > + rm -f backend.log && > > + git -c diff.cdiff.process="$BACKEND --log=backend.log" \ > > + diff --cached -- added.c >actual 2>stderr && > > + test_grep "added" actual && > > + test_grep "pathname=added.c" backend.log && > > + test_must_be_empty stderr > > +' > > + > > +test_expect_success PYTHON 'diff process skipped for binary files' ' > > + printf "\\0binary" >binary.c && > > + git add binary.c && > > + git commit -m "add binary" && > > + printf "\\0changed" >binary.c && > > + > > + rm -f backend.log && > > + git -c diff.cdiff.process="$BACKEND --log=backend.log" \ > > + diff -- binary.c >actual && > > + test_grep "Binary files" actual && > > + test_path_is_missing backend.log > > +' > > + > > +test_expect_success PYTHON 'diff process not consulted for unmatched driver' ' > > + echo "not tracked by cdiff" >unmatched.txt && > > + git add unmatched.txt && > > + git commit -m "add unmatched.txt" && > > + > > + echo "modified" >unmatched.txt && > > + > > + rm -f backend.log && > > + git -c diff.cdiff.process="$BACKEND --log=backend.log" \ > > + diff -- unmatched.txt >actual && > > + test_grep "modified" actual && > > + test_path_is_missing backend.log > > +' > > + > > +test_expect_success PYTHON 'multiple drivers use separate processes' ' > > + echo "*.h diff=hdiff" >>.gitattributes && > > + git add .gitattributes && > > + > > + cat >multi.h <<-\EOF && > > + int header(void) { return 1; } > > + EOF > > + git add multi.h && > > + git commit -m "add multi.h" && > > + > > + cat >multi.h <<-\EOF && > > + int header(void) { return 2; } > > + EOF > > + > > + rm -f backend-c.log backend-h.log && > > + git -c diff.cdiff.process="$BACKEND --log=backend-c.log" \ > > + -c diff.hdiff.process="$BACKEND --log=backend-h.log" \ > > + diff -- newfile.c multi.h >actual 2>stderr && > > + test_grep "pathname=newfile.c" backend-c.log && > > + test_grep "pathname=multi.h" backend-h.log && > > + test_must_be_empty stderr > > +' > > + > > +test_expect_success PYTHON 'diff process works alongside textconv' ' > > + write_script uppercase-filter <<-\EOF && > > + tr "a-z" "A-Z" <"$1" > > + EOF > > + > > + cat >textconv.c <<-\EOF && > > + hello world > > + EOF > > + git add textconv.c && > > + git commit -m "add textconv.c" && > > + > > + cat >textconv.c <<-\EOF && > > + goodbye world > > + EOF > > + > > + rm -f backend.log && > > + git -c diff.cdiff.textconv="./uppercase-filter" \ > > + -c diff.cdiff.process="$BACKEND --log=backend.log" \ > > + diff -- textconv.c >actual 2>stderr && > > + # The diff process receives textconv-transformed (uppercase) content. > > + test_grep "pathname=textconv.c" backend.log && > > + test_grep "old=HELLO WORLD" backend.log && > > + test_grep "new=GOODBYE WORLD" backend.log && > > + test_must_be_empty stderr > > +' > > + > > +# > > +# Downstream features: word diff, log, equivalent files, exit code. > > +# > > + > > +test_expect_success PYTHON 'diff process with --word-diff' ' > > + rm -f backend.log && > > + git -c diff.cdiff.process="$BACKEND --log=backend.log" \ > > + diff --word-diff worddiff.c >actual 2>stderr && > > + test_grep "\[-1;-\]" actual && > > + test_grep "{+999;+}" actual && > > + test_grep "pathname=worddiff.c" backend.log && > > + test_must_be_empty stderr > > +' > > + > > +test_expect_success PYTHON 'diff process works with git log -p' ' > > + # With no-hunks mode, the tool says the files are equivalent, > > + # so log -p should show the commit but no diff content. > > + rm -f backend.log && > > + git -c diff.cdiff.process="$BACKEND --mode=no-hunks --log=backend.log" \ > > + log -1 -p -- logtest.c >actual 2>stderr && > > + test_grep "change logtest.c" actual && > > + test_grep ! "return 2" actual && > > + test_grep "command=hunks pathname=logtest.c" backend.log && > > + test_must_be_empty stderr > > +' > > + > > +test_expect_success PYTHON 'diff process no hunks suppresses diff output' ' > > + cat >nohunks.c <<-\EOF && > > + int zero(void) { return 0; } > > + EOF > > + git add nohunks.c && > > + git commit -m "add nohunks.c" && > > + > > + cat >nohunks.c <<-\EOF && > > + int zero(void) { return 999; } > > + EOF > > + > > + git -c diff.cdiff.process="$BACKEND --mode=no-hunks" \ > > + diff nohunks.c >actual && > > + test_must_be_empty actual > > +' > > + > > +test_expect_success PYTHON 'diff process no hunks with --exit-code returns success' ' > > + git -c diff.cdiff.process="$BACKEND --mode=no-hunks" \ > > + diff --exit-code nohunks.c > > +' > > + > > +test_expect_success PYTHON 'diff process with --exit-code and hunks returns failure' ' > > + test_expect_code 1 git -c diff.cdiff.process="$BACKEND" \ > > + diff --exit-code newfile.c > > +' > > + > > +# > > +# Bypass mechanisms: flags and commands that skip the diff process. > > +# > > + > > +test_expect_success PYTHON 'diff process bypassed by --diff-algorithm' ' > > + rm -f backend.log && > > + git -c diff.cdiff.process="$BACKEND --log=backend.log" \ > > + diff --diff-algorithm=patience worddiff.c >actual && > > + test_grep "return 999" actual && > > + test_path_is_missing backend.log > > +' > > + > > +test_expect_success PYTHON 'diff process not used by --stat' ' > > + rm -f backend.log && > > + git -c diff.cdiff.process="$BACKEND --log=backend.log" \ > > + diff --stat worddiff.c >actual && > > + test_grep "worddiff.c" actual && > > + test_path_is_missing backend.log > > +' > > + > > +# > > +# Error handling and fallback. > > +# > > + > > +test_expect_success PYTHON 'diff process fallback on tool error status' ' > > + rm -f backend.log && > > + git -c diff.cdiff.process="$BACKEND --mode=error --log=backend.log" \ > > + diff boundary.c >actual 2>stderr && > > + # Fallback produces the full builtin diff (both change regions). > > + test_grep "^-OLD5" actual && > > + test_grep "^+NEW5" actual && > > + test_grep "^-OLD9" actual && > > + test_grep "^+NEW9" actual && > > + # Tool was contacted (it replied with error, not crash). > > + test_grep "command=hunks pathname=boundary.c" backend.log && > > + test_grep "diff process.*failed" stderr > > +' > > + > > +test_expect_success PYTHON 'diff process error keeps tool available for next file' ' > > + rm -f backend.log && > > + git -c diff.cdiff.process="$BACKEND --mode=error --log=backend.log" \ > > + diff -- one.c two.c >actual 2>stderr && > > + # Unlike abort, error keeps the tool available: both files > > + # are sent to the tool (and both fall back). > > + test_grep "pathname=one.c" backend.log && > > + test_grep "pathname=two.c" backend.log && > > + test_grep "return 10" actual && > > + test_grep "return 20" actual > > +' > > + > > +test_expect_success PYTHON 'diff process abort disables for session' ' > > + rm -f backend.log && > > + git -c diff.cdiff.process="$BACKEND --mode=abort --log=backend.log" \ > > + diff -- one.c two.c >actual && > > + # Both files should still produce diff output via fallback. > > + test_grep "return 10" actual && > > + test_grep "return 20" actual && > > + # The tool aborts on the first file and git clears its > > + # capability. The second file never contacts the tool. > > + test_grep "pathname=one.c" backend.log && > > + test_grep ! "pathname=two.c" backend.log > > +' > > + > > +test_expect_success PYTHON 'diff process fallback on tool crash' ' > > + git -c diff.cdiff.process="$BACKEND --mode=crash" \ > > + diff boundary.c >actual 2>stderr && > > + test_grep "^-OLD5" actual && > > + test_grep "^+NEW5" actual && > > + test_grep "^-OLD9" actual && > > + test_grep "^+NEW9" actual && > > + # Crash is a communication failure, so a warning is emitted. > > + test_grep "diff process.*failed" stderr > > +' > > + > > +test_expect_success PYTHON 'diff process startup failure only warns once' ' > > + git -c diff.cdiff.process="/nonexistent/tool" \ > > + diff -- one.c two.c >actual 2>stderr && > > + # Both files produce diff output via fallback. > > + test_grep "return 10" actual && > > + test_grep "return 20" actual && > > + # Sentinel prevents repeated warnings: only one, not one per file. > > + test_grep "diff process.*failed" stderr >warnings && > > + test_line_count = 1 warnings > > +' > > + > > +test_expect_success PYTHON 'diff process fallback on bad hunks' ' > > + git -c diff.cdiff.process="$BACKEND --mode=bad-hunk" \ > > + diff boundary.c >actual 2>stderr && > > + test_grep "^-OLD5" actual && > > + test_grep "^+NEW5" actual && > > + test_grep "^-OLD9" actual && > > + test_grep "^+NEW9" actual && > > + # Invalid hunks are caught by xdiff validation, not the > > + # protocol layer, so no warning is emitted. > > + test_must_be_empty stderr > > +' > > + > > +test_expect_success PYTHON 'diff process fallback on mismatched unchanged totals' ' > > + cat >synctest.c <<-\EOF && > > + line1 > > + line2 > > + line3 > > + EOF > > + git add synctest.c && > > + git commit -m "add synctest.c" && > > + > > + cat >synctest.c <<-\EOF && > > + line1 > > + changed > > + line3 > > + EOF > > + > > + # bad-sync reports hunk 1 2 1 1: marks 2 old lines and 1 new > > + # line as changed, leaving 1 unchanged old vs 2 unchanged new. > > + # The synchronization invariant fails and git falls back. > > + git -c diff.cdiff.process="$BACKEND --mode=bad-sync" \ > > + diff synctest.c >actual 2>stderr && > > + test_grep "changed" actual > > +' > > + > > +test_expect_success PYTHON 'diff process fallback on overlapping hunks' ' > > + # boundary.c has 10 lines, so both hunks are in bounds > > + # but they overlap at lines 3-5, triggering the ordering check. > > + git -c diff.cdiff.process="$BACKEND --mode=overlap" \ > > + diff boundary.c >actual 2>stderr && > > + test_grep "NEW5" actual > > +' > > + > > +test_done > > diff --git a/userdiff.h b/userdiff.h > > index 51c26e0d41..a98eabe377 100644 > > --- a/userdiff.h > > +++ b/userdiff.h > > @@ -3,6 +3,7 @@ > > > > #include "notes-cache.h" > > > > +struct diff_subprocess; > > struct index_state; > > struct repository; > > > > @@ -33,6 +34,8 @@ struct userdiff_driver { > > int textconv_want_cache; > > const char *process; > > char *process_owned; > > + struct diff_subprocess *diff_subprocess; > > + unsigned diff_process_failed : 1; > > }; > > enum userdiff_driver_type { > > USERDIFF_DRIVER_TYPE_BUILTIN = 1<<0, > > -- > > gitgitgadget > > > > > >

Michael Montalbo wrote on the Git mailing list (how to reply to this email):

On Sun, Jun 7, 2026 at 7:36 AM Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote: > > Hi Michael, > > I stumbled about this patch when it broke CI in Git for Windows, where we > do _not_ use `NO_PYTHON`, even though Python is unavailable in the > build/test CI jobs. The existing tests handle this situation gracefully, > this here patch does not: > > On Sun, 7 Jun 2026, Michael Montalbo via GitGitGadget wrote: > > > diff --git a/t/t4080-diff-process.sh b/t/t4080-diff-process.sh > > new file mode 100755 > > index 0000000000..f159cd86d8 > > --- /dev/null > > +++ b/t/t4080-diff-process.sh > > @@ -0,0 +1,538 @@ > > +#!/bin/sh > > + > > +test_description='diff process via long-running process' > > + > > +. ./test-lib.sh > > + > > +if test_have_prereq PYTHON > > +then > > + PYTHON_PATH=$(command -v python3) || PYTHON_PATH=$(command -v python) > > When neither `python3` nor `python` are available (which is the case in > the minimal Git for Windows SDK used in Git's CI runs), this fails under > `set -e`. Before even running the first test case. Resulting in an > unexpected TAP format error. > > Now, we could "fix" this by imitating what `lib-p4` does (see > https://github.com/dscho/git/commit/bd0b5570c744f678911a67a62da63f30655f20d8 > which demonstrates that it is indeed a work-around on Windows): > > -- snip -- > diff --git a/t/t4080-diff-process.sh b/t/t4080-diff-process.sh > index fdf6da1c341e67..bd22c247ff3856 100755 > --- a/t/t4080-diff-process.sh > +++ b/t/t4080-diff-process.sh > @@ -4,9 +4,10 @@ test_description='diff process via long-running process' > > . ./test-lib.sh > > -if test_have_prereq PYTHON > +if ! test_have_prereq PYTHON || ! test -x "$PYTHON_PATH" > then > - PYTHON_PATH=$(command -v python3) || PYTHON_PATH=$(command -v python) > + skip_all='python interpreter not available' > + test_done > fi > > # > -- snap -- > > Of course, this uncovers _another_ problem with the Python script: It uses > Python3-only `f"..."` format strings, which cannot be handled by the > Python2 to which the `PYTHON_PATH` variable in `linux-TEST-vars` points. > So this requires _another follow-up (see also > https://github.com/dscho/git/commit/c12a9f4c80e5ce8db0fe370fac46fb45be2b775f): > > -- snip -- > diff --git a/t/t4080-diff-process.sh b/t/t4080-diff-process.sh > index bd22c247ff3856..ba14682a9086e4 100755 > --- a/t/t4080-diff-process.sh > +++ b/t/t4080-diff-process.sh > @@ -39,7 +39,8 @@ setup_backend () { > > def write_pkt(line): > data = (line + "\n").encode() > - sys.stdout.buffer.write(f"{len(data)+4:04x}".encode() + data) > + hdr = "{:04x}".format(len(data) + 4).encode() > + sys.stdout.buffer.write(hdr + data) > sys.stdout.buffer.flush() > > def write_flush(): > @@ -98,7 +99,8 @@ setup_backend () { > new = read_content() > old_first = old.split(b"\n")[0].decode(errors="replace") if old else "" > new_first = new.split(b"\n")[0].decode(errors="replace") if new else "" > - log(f"command={cmd} pathname={pathname} old={old_first} new={new_first}") > + log("command={} pathname={} old={} new={}".format( > + cmd, pathname, old_first, new_first)) > > if mode == "error": > write_flush() > @@ -130,7 +132,7 @@ setup_backend () { > else: > ol = old.count(b"\n") > nl = new.count(b"\n") > - write_pkt(f"hunk 1 {ol} 1 {nl}") > + write_pkt("hunk 1 {} 1 {}".format(ol, nl)) > write_flush() > write_pkt("status=success") > write_flush() > -- snap -- > > And this is still not enough to make it work with Python2, see > https://github.com/dscho/git/actions/runs/27091523842/job/79955895737: > > -- snip -- > [...] > + git -c diff.cdiff.process=./diff-process-backend --mode=fixed-hunk diff boundary.c > Traceback (most recent call last): > File "/__w/git/git/t/trash directory.t4080-diff-process/diff-process-backend.py", line 45, in <module> > assert read_pkt() == "git-diff-client" > File "/__w/git/git/t/trash directory.t4080-diff-process/diff-process-backend.py", line 4, in read_pkt > hdr = sys.stdin.buffer.read(4) > AttributeError: 'file' object has no attribute 'buffer' > -- snap -- > > I have experienced similar patterns in my career, where a single decision > required multiple follow-up fixes _just_ to avoid having to revert that > decision. This kind of doubling down has never ended well. > > Therefore I would like to take a step back, and ask: Is it _really_ a good > idea to use Python here? Are we certain that we want to _require_ Python > to run this test and skip it if Python isn't available (as is the case in > the Windows-related parts of Git's very own CI) even if Python has nothing > at all to do with the feature that is being tested? > > I don't want to be doomed to repeat history, and we can very well learn > e.g. from prior art in this very project, where the tests for the > clean/smudge filters (which _also_ want to speak pkt-line over stdio) > needlessly incurred Perl as a requirement to run the tests. It was > Matheus's heroic work in 52917a998ef3a (t0021: implementation the > rot13-filter.pl script in C, 2022-08-14) and 4d1d843be7a15 (tests: use the > new C rot13-filter helper to avoid PERL prereq, 2022-08-14) that avoided > that unnecessary prerequisite. > > Likewise, there is `test-tool pkt-line` intended for driving the pkt-line > protocol via simple shell scripts. > > So the conscious project direction has been: fold pkt-line test backends > into `test-tool` and drop the scripting-language prereq. Reintroducing the > same shape in Python would walk this back. > > Patrick's careful effort in 27bd8ee311719 (Merge branch 'ps/fewer-perl', > 2025-04-29) has been another clear sign that the Git project is actively > _removing_ scripting-language dependencies from the build and test > infrastructure, not adding new ones. > Now I wonder if the extension / addition of more Perl test infra with my other series: https://lore.kernel.org/git/pull.2135.git.1780559158.gitgitgadget@gmail.com/T/ also goes against the project direction w.r.t. removing scripting languages. Maybe I should also re-evaluate the usage of Perl there. I am leveraging existing shell parsing logic written in Perl, but maybe the preference for Perl-based lint rules is a mistake and should be avoided. > The clear prior art in Git's own tests for what t4080 wants to do, as of > today, is `t/helper/test-rot13-filter.c`, which could be imitated here > instead of (re-)introducing a dependency on a scripting language other > than Unix shell in Git's test suite. > > The `PYTHON` prereq exists in exactly five files today, all `git p4` > related (where Python is an intrinsic prerequisite given that `git-p4.py` > _is_ written in Python): `t/lib-git-p4.sh`, `t/t9802-git-p4-filetype.sh`, > `t/t9810-git-p4-rcs.sh`, `t/t9835-git-p4-metadata-encoding-python2.sh`, > and `t/t9836-git-p4-metadata-encoding-python3.sh`. > > After 7cdbff14d482 (remove merge-recursive-old, 2006-11-20), this here > patch would be the first one, after almost 20 years, to re-introduce > Python as a dependency outside `git p4`. > > And it would also be the first ever to embed a Python script as a heredoc: > > > +fi > > + > > +# > > +# A single parametric diff process. > > +# Usage: diff-process-backend --mode=<mode> [--log=<path>] > > +# > > +# Modes: > > +# whole-file - report all lines as changed (default) > > +# fixed-hunk - always report hunk 5 2 5 2 > > +# bad-hunk - report out-of-bounds hunk 999 1 999 1 > > +# bad-sync - report hunk with mismatched unchanged totals > > +# overlap - report two overlapping hunks > > +# no-hunks - return no hunks (files considered equivalent) > > +# error - return status=error for every request > > +# abort - return status=abort for every request > > +# crash - read one request then exit without responding > > +# > > +setup_backend () { > > + cat >"$TRASH_DIRECTORY/diff-process-backend.py" <<-\PYEOF > > + import sys, os > > + > > + def read_pkt(): > > + hdr = sys.stdin.buffer.read(4) > > + if len(hdr) < 4: return None > > + length = int(hdr, 16) > > + if length == 0: return "" > > + data = sys.stdin.buffer.read(length - 4) > > + return data.decode().rstrip("\n") > > + > > + def write_pkt(line): > > + data = (line + "\n").encode() > > + sys.stdout.buffer.write(f"{len(data)+4:04x}".encode() + data) > > + sys.stdout.buffer.flush() > > + > > + def write_flush(): > > + sys.stdout.buffer.write(b"0000") > > + sys.stdout.buffer.flush() > > + > > + def read_content(): > > + chunks = [] > > + while True: > > + hdr = sys.stdin.buffer.read(4) > > + if len(hdr) < 4: break > > + length = int(hdr, 16) > > + if length == 0: break > > + chunks.append(sys.stdin.buffer.read(length - 4)) > > + return b"".join(chunks) > > + > > + mode = "whole-file" > > + logfile = None > > + for arg in sys.argv[1:]: > > + if arg.startswith("--mode="): > > + mode = arg[7:] > > + elif arg.startswith("--log="): > > + logfile = open(arg[6:], "a") > > + > > + def log(msg): > > + if logfile: > > + logfile.write(msg + "\n") > > + logfile.flush() > > + > > + # Handshake > > + assert read_pkt() == "git-diff-client" > > + assert read_pkt() == "version=1" > > + read_pkt() > > + write_pkt("git-diff-server") > > + write_pkt("version=1") > > + write_flush() > > + while True: > > + p = read_pkt() > > + if p == "": break > > + write_pkt("capability=hunks") > > + write_flush() > > + > > + log("ready") > > + > > + while True: > > + cmd = None > > + pathname = None > > + while True: > > + p = read_pkt() > > + if p is None: sys.exit(0) > > + if p == "": break > > + if p.startswith("command="): cmd = p.split("=",1)[1] > > + if p.startswith("pathname="): pathname = p.split("=",1)[1] > > + if cmd is None: sys.exit(0) > > + old = read_content() > > + new = read_content() > > + old_first = old.split(b"\n")[0].decode(errors="replace") if old else "" > > + new_first = new.split(b"\n")[0].decode(errors="replace") if new else "" > > + log(f"command={cmd} pathname={pathname} old={old_first} new={new_first}") > > + > > + if mode == "error": > > + write_flush() > > + write_pkt("status=error") > > + write_flush() > > + continue > > + > > + if mode == "abort": > > + write_flush() > > + write_pkt("status=abort") > > + write_flush() > > + continue > > + > > + if mode == "crash": > > + sys.exit(1) > > + > > + if cmd == "hunks": > > + if mode == "fixed-hunk": > > + write_pkt("hunk 5 2 5 2") > > + elif mode == "bad-hunk": > > + write_pkt("hunk 999 1 999 1") > > + elif mode == "bad-sync": > > + write_pkt("hunk 1 2 1 1") > > + elif mode == "overlap": > > + write_pkt("hunk 1 5 1 5") > > + write_pkt("hunk 3 2 3 2") > > + elif mode == "no-hunks": > > + pass > > + else: > > + ol = old.count(b"\n") > > + nl = new.count(b"\n") > > + write_pkt(f"hunk 1 {ol} 1 {nl}") > > + write_flush() > > + write_pkt("status=success") > > + write_flush() > > + else: > > + write_flush() > > + write_pkt("status=error") > > + write_flush() > > + PYEOF > > The existing pattern is to provide larger scripts as fixtures in > associated `t/tNNNN/` directories, not as heredoc, see e.g. > `t/t1509/prepare-chroot.sh`. Writing scripts, especially lengthy ones, in > heredoc strings makes it virtually impossible to use static code analysis > or syntax highlighting to fend off banal errors. > > Given the complexity of what t4080 tries to test (error, abort, crash, > bad-sync, no-hunks, multiple files in one session, capability > negotiation), it would unfortunately be infeasible to use `test-tool > pkt-line` from a shell script implementing that `diff.*.process` protocol. > > So I've spiked a demo how the `test-tool diff-process-backend` could look > like (letting Opus do the menial typing, so that I can enjoy at least part > of a sunny Sunday outside), which also passes the CI build and test: > https://github.com/dscho/git/commit/b6e3c93381b00929476c3a00155f7cf7334a22e6 > > That commit is of course not intended to be used as-is; Feel free to pick > code parts of it and integrate them into your topic branch. Or write your > own test-tool helper from scratch if that's more your jam. > > Ciao, > Johannes > > > + write_script diff-process-backend <<-SHEOF > > + exec "$PYTHON_PATH" "$TRASH_DIRECTORY/diff-process-backend.py" "\$@" > > + SHEOF > > +} > > + > > +BACKEND="./diff-process-backend" > > + > > +test_expect_success PYTHON 'setup' ' > > + setup_backend && > > + echo "*.c diff=cdiff" >.gitattributes && > > + git add .gitattributes && > > + > > + # boundary.c: 10 lines, changes at 5-6 and 9-10. > > + # Used by: hunk boundaries, error fallback, crash, bad hunks, overlap. > > + cat >boundary.c <<-\EOF && > > + line1 > > + line2 > > + line3 > > + line4 > > + OLD5 > > + OLD6 > > + line7 > > + line8 > > + OLD9 > > + OLD10 > > + EOF > > + git add boundary.c && > > + > > + # worddiff.c: single-line function, value changes 1 -> 999. > > + # Used by: word-diff, --diff-algorithm, --no-ext-diff, --stat. > > + cat >worddiff.c <<-\EOF && > > + int value(void) { return 1; } > > + EOF > > + git add worddiff.c && > > + > > + # newfile.c: single-line function, value changes 42 -> 99. > > + # Used by: new file, --exit-code, multiple drivers. > > + cat >newfile.c <<-\EOF && > > + int new_func(void) { return 42; } > > + EOF > > + git add newfile.c && > > + > > + # logtest.c: single-line function for log/format-patch tests. > > + # Needs two commits so log -1 has a diff. > > + cat >logtest.c <<-\EOF && > > + int logfunc(void) { return 1; } > > + EOF > > + git add logtest.c && > > + > > + # two.c/one.c: two-file pair for error/abort/startup-failure tests. > > + cat >one.c <<-\EOF && > > + int first(void) { return 1; } > > + EOF > > + cat >two.c <<-\EOF && > > + int second(void) { return 2; } > > + EOF > > + git add one.c two.c && > > + > > + git commit -m "initial" && > > + > > + # Second commit for logtest.c (so log -1 has something to show). > > + cat >logtest.c <<-\EOF && > > + int logfunc(void) { return 2; } > > + EOF > > + git add logtest.c && > > + git commit -m "change logtest.c" && > > + > > + # Working tree modifications (not committed). > > + cat >boundary.c <<-\EOF && > > + line1 > > + line2 > > + line3 > > + line4 > > + NEW5 > > + NEW6 > > + line7 > > + line8 > > + NEW9 > > + NEW10 > > + EOF > > + > > + cat >worddiff.c <<-\EOF && > > + int value(void) { return 999; } > > + EOF > > + > > + cat >newfile.c <<-\EOF && > > + int new_func(void) { return 99; } > > + EOF > > + > > + cat >one.c <<-\EOF && > > + int first(void) { return 10; } > > + EOF > > + > > + cat >two.c <<-\EOF > > + int second(void) { return 20; } > > + EOF > > +' > > + > > +# > > +# Core behavior: the tool controls which lines are marked as changed. > > +# > > + > > +test_expect_success PYTHON 'diff process hunk boundaries affect output' ' > > + # The file has changes at lines 5-6 and 9-10, but fixed-hunk > > + # only reports lines 5-6 as changed. Lines 9-10 should not > > + # appear as changed in the output. > > + git -c diff.cdiff.process="$BACKEND --mode=fixed-hunk" \ > > + diff boundary.c >actual && > > + test_grep "^-OLD5" actual && > > + test_grep "^-OLD6" actual && > > + test_grep "^+NEW5" actual && > > + test_grep "^+NEW6" actual && > > + test_grep ! "^-OLD9" actual && > > + test_grep ! "^-OLD10" actual && > > + test_grep ! "^+NEW9" actual && > > + test_grep ! "^+NEW10" actual > > +' > > + > > +test_expect_success PYTHON 'diff process works with new file' ' > > + rm -f backend.log && > > + git -c diff.cdiff.process="$BACKEND --log=backend.log" \ > > + diff -- newfile.c >actual 2>stderr && > > + test_grep "return 99" actual && > > + test_grep "pathname=newfile.c" backend.log && > > + test_must_be_empty stderr > > +' > > + > > +test_expect_success PYTHON 'diff process works with added file (empty old side)' ' > > + cat >added.c <<-\EOF && > > + int added(void) { return 1; } > > + EOF > > + git add added.c && > > + > > + rm -f backend.log && > > + git -c diff.cdiff.process="$BACKEND --log=backend.log" \ > > + diff --cached -- added.c >actual 2>stderr && > > + test_grep "added" actual && > > + test_grep "pathname=added.c" backend.log && > > + test_must_be_empty stderr > > +' > > + > > +test_expect_success PYTHON 'diff process skipped for binary files' ' > > + printf "\\0binary" >binary.c && > > + git add binary.c && > > + git commit -m "add binary" && > > + printf "\\0changed" >binary.c && > > + > > + rm -f backend.log && > > + git -c diff.cdiff.process="$BACKEND --log=backend.log" \ > > + diff -- binary.c >actual && > > + test_grep "Binary files" actual && > > + test_path_is_missing backend.log > > +' > > + > > +test_expect_success PYTHON 'diff process not consulted for unmatched driver' ' > > + echo "not tracked by cdiff" >unmatched.txt && > > + git add unmatched.txt && > > + git commit -m "add unmatched.txt" && > > + > > + echo "modified" >unmatched.txt && > > + > > + rm -f backend.log && > > + git -c diff.cdiff.process="$BACKEND --log=backend.log" \ > > + diff -- unmatched.txt >actual && > > + test_grep "modified" actual && > > + test_path_is_missing backend.log > > +' > > + > > +test_expect_success PYTHON 'multiple drivers use separate processes' ' > > + echo "*.h diff=hdiff" >>.gitattributes && > > + git add .gitattributes && > > + > > + cat >multi.h <<-\EOF && > > + int header(void) { return 1; } > > + EOF > > + git add multi.h && > > + git commit -m "add multi.h" && > > + > > + cat >multi.h <<-\EOF && > > + int header(void) { return 2; } > > + EOF > > + > > + rm -f backend-c.log backend-h.log && > > + git -c diff.cdiff.process="$BACKEND --log=backend-c.log" \ > > + -c diff.hdiff.process="$BACKEND --log=backend-h.log" \ > > + diff -- newfile.c multi.h >actual 2>stderr && > > + test_grep "pathname=newfile.c" backend-c.log && > > + test_grep "pathname=multi.h" backend-h.log && > > + test_must_be_empty stderr > > +' > > + > > +test_expect_success PYTHON 'diff process works alongside textconv' ' > > + write_script uppercase-filter <<-\EOF && > > + tr "a-z" "A-Z" <"$1" > > + EOF > > + > > + cat >textconv.c <<-\EOF && > > + hello world > > + EOF > > + git add textconv.c && > > + git commit -m "add textconv.c" && > > + > > + cat >textconv.c <<-\EOF && > > + goodbye world > > + EOF > > + > > + rm -f backend.log && > > + git -c diff.cdiff.textconv="./uppercase-filter" \ > > + -c diff.cdiff.process="$BACKEND --log=backend.log" \ > > + diff -- textconv.c >actual 2>stderr && > > + # The diff process receives textconv-transformed (uppercase) content. > > + test_grep "pathname=textconv.c" backend.log && > > + test_grep "old=HELLO WORLD" backend.log && > > + test_grep "new=GOODBYE WORLD" backend.log && > > + test_must_be_empty stderr > > +' > > + > > +# > > +# Downstream features: word diff, log, equivalent files, exit code. > > +# > > + > > +test_expect_success PYTHON 'diff process with --word-diff' ' > > + rm -f backend.log && > > + git -c diff.cdiff.process="$BACKEND --log=backend.log" \ > > + diff --word-diff worddiff.c >actual 2>stderr && > > + test_grep "\[-1;-\]" actual && > > + test_grep "{+999;+}" actual && > > + test_grep "pathname=worddiff.c" backend.log && > > + test_must_be_empty stderr > > +' > > + > > +test_expect_success PYTHON 'diff process works with git log -p' ' > > + # With no-hunks mode, the tool says the files are equivalent, > > + # so log -p should show the commit but no diff content. > > + rm -f backend.log && > > + git -c diff.cdiff.process="$BACKEND --mode=no-hunks --log=backend.log" \ > > + log -1 -p -- logtest.c >actual 2>stderr && > > + test_grep "change logtest.c" actual && > > + test_grep ! "return 2" actual && > > + test_grep "command=hunks pathname=logtest.c" backend.log && > > + test_must_be_empty stderr > > +' > > + > > +test_expect_success PYTHON 'diff process no hunks suppresses diff output' ' > > + cat >nohunks.c <<-\EOF && > > + int zero(void) { return 0; } > > + EOF > > + git add nohunks.c && > > + git commit -m "add nohunks.c" && > > + > > + cat >nohunks.c <<-\EOF && > > + int zero(void) { return 999; } > > + EOF > > + > > + git -c diff.cdiff.process="$BACKEND --mode=no-hunks" \ > > + diff nohunks.c >actual && > > + test_must_be_empty actual > > +' > > + > > +test_expect_success PYTHON 'diff process no hunks with --exit-code returns success' ' > > + git -c diff.cdiff.process="$BACKEND --mode=no-hunks" \ > > + diff --exit-code nohunks.c > > +' > > + > > +test_expect_success PYTHON 'diff process with --exit-code and hunks returns failure' ' > > + test_expect_code 1 git -c diff.cdiff.process="$BACKEND" \ > > + diff --exit-code newfile.c > > +' > > + > > +# > > +# Bypass mechanisms: flags and commands that skip the diff process. > > +# > > + > > +test_expect_success PYTHON 'diff process bypassed by --diff-algorithm' ' > > + rm -f backend.log && > > + git -c diff.cdiff.process="$BACKEND --log=backend.log" \ > > + diff --diff-algorithm=patience worddiff.c >actual && > > + test_grep "return 999" actual && > > + test_path_is_missing backend.log > > +' > > + > > +test_expect_success PYTHON 'diff process not used by --stat' ' > > + rm -f backend.log && > > + git -c diff.cdiff.process="$BACKEND --log=backend.log" \ > > + diff --stat worddiff.c >actual && > > + test_grep "worddiff.c" actual && > > + test_path_is_missing backend.log > > +' > > + > > +# > > +# Error handling and fallback. > > +# > > + > > +test_expect_success PYTHON 'diff process fallback on tool error status' ' > > + rm -f backend.log && > > + git -c diff.cdiff.process="$BACKEND --mode=error --log=backend.log" \ > > + diff boundary.c >actual 2>stderr && > > + # Fallback produces the full builtin diff (both change regions). > > + test_grep "^-OLD5" actual && > > + test_grep "^+NEW5" actual && > > + test_grep "^-OLD9" actual && > > + test_grep "^+NEW9" actual && > > + # Tool was contacted (it replied with error, not crash). > > + test_grep "command=hunks pathname=boundary.c" backend.log && > > + test_grep "diff process.*failed" stderr > > +' > > + > > +test_expect_success PYTHON 'diff process error keeps tool available for next file' ' > > + rm -f backend.log && > > + git -c diff.cdiff.process="$BACKEND --mode=error --log=backend.log" \ > > + diff -- one.c two.c >actual 2>stderr && > > + # Unlike abort, error keeps the tool available: both files > > + # are sent to the tool (and both fall back). > > + test_grep "pathname=one.c" backend.log && > > + test_grep "pathname=two.c" backend.log && > > + test_grep "return 10" actual && > > + test_grep "return 20" actual > > +' > > + > > +test_expect_success PYTHON 'diff process abort disables for session' ' > > + rm -f backend.log && > > + git -c diff.cdiff.process="$BACKEND --mode=abort --log=backend.log" \ > > + diff -- one.c two.c >actual && > > + # Both files should still produce diff output via fallback. > > + test_grep "return 10" actual && > > + test_grep "return 20" actual && > > + # The tool aborts on the first file and git clears its > > + # capability. The second file never contacts the tool. > > + test_grep "pathname=one.c" backend.log && > > + test_grep ! "pathname=two.c" backend.log > > +' > > + > > +test_expect_success PYTHON 'diff process fallback on tool crash' ' > > + git -c diff.cdiff.process="$BACKEND --mode=crash" \ > > + diff boundary.c >actual 2>stderr && > > + test_grep "^-OLD5" actual && > > + test_grep "^+NEW5" actual && > > + test_grep "^-OLD9" actual && > > + test_grep "^+NEW9" actual && > > + # Crash is a communication failure, so a warning is emitted. > > + test_grep "diff process.*failed" stderr > > +' > > + > > +test_expect_success PYTHON 'diff process startup failure only warns once' ' > > + git -c diff.cdiff.process="/nonexistent/tool" \ > > + diff -- one.c two.c >actual 2>stderr && > > + # Both files produce diff output via fallback. > > + test_grep "return 10" actual && > > + test_grep "return 20" actual && > > + # Sentinel prevents repeated warnings: only one, not one per file. > > + test_grep "diff process.*failed" stderr >warnings && > > + test_line_count = 1 warnings > > +' > > + > > +test_expect_success PYTHON 'diff process fallback on bad hunks' ' > > + git -c diff.cdiff.process="$BACKEND --mode=bad-hunk" \ > > + diff boundary.c >actual 2>stderr && > > + test_grep "^-OLD5" actual && > > + test_grep "^+NEW5" actual && > > + test_grep "^-OLD9" actual && > > + test_grep "^+NEW9" actual && > > + # Invalid hunks are caught by xdiff validation, not the > > + # protocol layer, so no warning is emitted. > > + test_must_be_empty stderr > > +' > > + > > +test_expect_success PYTHON 'diff process fallback on mismatched unchanged totals' ' > > + cat >synctest.c <<-\EOF && > > + line1 > > + line2 > > + line3 > > + EOF > > + git add synctest.c && > > + git commit -m "add synctest.c" && > > + > > + cat >synctest.c <<-\EOF && > > + line1 > > + changed > > + line3 > > + EOF > > + > > + # bad-sync reports hunk 1 2 1 1: marks 2 old lines and 1 new > > + # line as changed, leaving 1 unchanged old vs 2 unchanged new. > > + # The synchronization invariant fails and git falls back. > > + git -c diff.cdiff.process="$BACKEND --mode=bad-sync" \ > > + diff synctest.c >actual 2>stderr && > > + test_grep "changed" actual > > +' > > + > > +test_expect_success PYTHON 'diff process fallback on overlapping hunks' ' > > + # boundary.c has 10 lines, so both hunks are in bounds > > + # but they overlap at lines 3-5, triggering the ordering check. > > + git -c diff.cdiff.process="$BACKEND --mode=overlap" \ > > + diff boundary.c >actual 2>stderr && > > + test_grep "NEW5" actual > > +' > > + > > +test_done > > diff --git a/userdiff.h b/userdiff.h > > index 51c26e0d41..a98eabe377 100644 > > --- a/userdiff.h > > +++ b/userdiff.h > > @@ -3,6 +3,7 @@ > > > > #include "notes-cache.h" > > > > +struct diff_subprocess; > > struct index_state; > > struct repository; > > > > @@ -33,6 +34,8 @@ struct userdiff_driver { > > int textconv_want_cache; > > const char *process; > > char *process_owned; > > + struct diff_subprocess *diff_subprocess; > > + unsigned diff_process_failed : 1; > > }; > > enum userdiff_driver_type { > > USERDIFF_DRIVER_TYPE_BUILTIN = 1<<0, > > -- > > gitgitgadget > > > > > >

Junio C Hamano wrote on the Git mailing list (how to reply to this email):

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > So the conscious project direction has been: fold pkt-line test backends > into `test-tool` and drop the scripting-language prereq. Reintroducing the > same shape in Python would walk this back. > ... > The `PYTHON` prereq exists in exactly five files today, all `git p4` > related (where Python is an intrinsic prerequisite given that `git-p4.py` > _is_ written in Python): `t/lib-git-p4.sh`, `t/t9802-git-p4-filetype.sh`, > `t/t9810-git-p4-rcs.sh`, `t/t9835-git-p4-metadata-encoding-python2.sh`, > and `t/t9836-git-p4-metadata-encoding-python3.sh`. > ... > That commit is of course not intended to be used as-is; Feel free to pick > code parts of it and integrate them into your topic branch. Or write your > own test-tool helper from scratch if that's more your jam. Showing better direction to new folks with such a clear thinking is very much appreciated. Even though it is natural and perfectly OK for tests that interacts with parts of Git that are written in these languages (e.g., we are OK for gitweb tests to require Perl), we should consciously keep ourselves clean and not adding unnecessary dependencies.

Junio C Hamano wrote on the Git mailing list (how to reply to this email):

Michael Montalbo <mmontalbo@gmail.com> writes: > On Sun, Jun 7, 2026 at 7:36 AM Johannes Schindelin > <Johannes.Schindelin@gmx.de> wrote: >> >> Hi Michael, >> >> I stumbled about this patch when it broke CI in Git for Windows, where we >> do _not_ use `NO_PYTHON`, even though Python is unavailable in the >> build/test CI jobs. The existing tests handle this situation gracefully, >> this here patch does not: >> ... >> Given the complexity of what t4080 tries to test (error, abort, crash, >> bad-sync, no-hunks, multiple files in one session, capability >> negotiation), it would unfortunately be infeasible to use `test-tool >> pkt-line` from a shell script implementing that `diff.*.process` protocol. >> >> So I've spiked a demo how the `test-tool diff-process-backend` could look >> like (letting Opus do the menial typing, so that I can enjoy at least part >> of a sunny Sunday outside), which also passes the CI build and test: >> https://github.com/dscho/git/commit/b6e3c93381b00929476c3a00155f7cf7334a22e6 >> >> That commit is of course not intended to be used as-is; Feel free to pick >> code parts of it and integrate them into your topic branch. Or write your >> own test-tool helper from scratch if that's more your jam. >> > > Johannes, thank you for the great feedback. The historical context is > really helpful and > the concerns you raise make a lot of sense. I will take a look at your > spike and also work > on removing Python from the test. Another request. Please do not force readers to scroll through a ~800 line message just to read only 5 lines of response from you. Keep relevant parts of the message you are responding to in your message to help readers understand the context in which your response was made, but trim everything else that is not relevant from your quote. Thanks.

Junio C Hamano wrote on the Git mailing list (how to reply to this email):

Michael Montalbo <mmontalbo@gmail.com> writes: >> So the conscious project direction has been: fold pkt-line test backends >> into `test-tool` and drop the scripting-language prereq. Reintroducing the >> same shape in Python would walk this back. >> >> Patrick's careful effort in 27bd8ee311719 (Merge branch 'ps/fewer-perl', >> 2025-04-29) has been another clear sign that the Git project is actively >> _removing_ scripting-language dependencies from the build and test >> infrastructure, not adding new ones. > > Now I wonder if the extension / addition of more Perl test infra with my other > series: > > https://lore.kernel.org/git/pull.2135.git.1780559158.gitgitgadget@gmail.com/T/ > > also goes against the project direction w.r.t. removing scripting languages. > Maybe I should also re-evaluate the usage of Perl there. I am leveraging > existing shell parsing logic written in Perl, but maybe the preference for > Perl-based lint rules is a mistake and should be avoided. That sounds prudent (even though it is outside the scope of _this_ topic, of course). Thanks.

-Original file line number
+Diff line change
@@ Expand Up / @@ -218,6 +218,11 @@ endif::git-diff[] @@
     	Set this option to `true` to make the diff driver cache the text
     	conversion outputs.  See linkgit:gitattributes[5] for details.
+    `diff.<driver>.process`::
+    	The command to run as a long-running diff process that
+    	provides hunks to Git's diff pipeline.
+    	See linkgit:gitattributes[5] for details.
     `diff.indentHeuristic`::
     	Set this option to `false` to disable the default heuristics
     	that shift diff hunk boundaries to make patches easier to read.
@@ Expand Down @@

-Original file line number
+Diff line change
@@ Expand Up / @@ -18,3 +18,6 @@ @@
     For instance, if you configured the `diff.algorithm` variable to a
     non-default value and want to use the default one, then you
     have to use `--diff-algorithm=default` option.
+    +
+    If you explicitly choose a diff algorithm, it also bypasses
+    `diff.<driver>.process` (see linkgit:gitattributes[5]).

-Original file line number
+Diff line change
@@ Expand Up / @@ -825,7 +825,9 @@ endif::git-format-patch[] @@
     	to use this option with linkgit:git-log[1] and friends.
     `--no-ext-diff`::
-    	Disallow external diff drivers.
+    	Disallow external diff helpers, including
+    	`diff.<driver>.command` and `diff.<driver>.process`
+    	(see linkgit:gitattributes[5]).
     `--textconv`::
     `--no-textconv`::
@@ Expand Down @@

-Original file line number
+Diff line change
@@ Expand Up @@
     (see above), and adding `diff.<name>.algorithm` has no effect, as the
     algorithm is not passed to the external diff driver.
+    Using an external diff process
+    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+    If `diff.<name>.process` is defined, Git sends the old and new file
+    content to an external tool and receives back a list of changed
+    regions (pairs of line ranges in the old and new file).  Git uses
+    these instead of its builtin diff algorithm, but still controls
+    all output formatting, so features like word diff, function context,
+    color, and blame work normally.  This is achieved by using the
+    long-running process protocol (described in
+    Documentation/technical/long-running-process-protocol.adoc).
+    Unlike `diff.<name>.command`, which replaces Git's output entirely,
+    the diff process feeds results back into the standard pipeline.
+    First, in `.gitattributes`, assign the `diff` attribute for paths.
+    ------------------------
+    *.c diff=cdiff
+    ------------------------
+    Then, define a "diff.<name>.process" configuration to specify
+    the diff process command.
+    ----------------------------------------------------------------
+    [diff "cdiff"]
+      process = /path/to/diff-process-tool
+    ----------------------------------------------------------------
+    When Git encounters the first file that needs to be diffed, it starts
+    the process and performs the handshake.  In the handshake, the welcome
+    message sent by Git is "git-diff-client", only version 1 is supported,
+    and the supported capability is "hunks" (the changed regions
+    described below).
+    For each file, Git sends a list of "key=value" pairs terminated with
+    a flush packet, followed by the old and new file content as packetized
+    data, each terminated with a flush packet.  The pathname is relative
+    to the repository root.  When `diff.<name>.textconv` is also set,
+    the tool receives the textconv-transformed content rather than the
+    raw blob.  Git does not send binary files to the diff process.
+    -----------------------
+    packet:          git> command=hunks
+    packet:          git> pathname=path/file.c
+    packet:          git> 0000
+    packet:          git> OLD_CONTENT
+    packet:          git> 0000
+    packet:          git> NEW_CONTENT
+    packet:          git> 0000
+    -----------------------
+    The tool is expected to respond with zero or more hunk lines,
+    a flush packet, and a status packet terminated with a flush packet.
+    Each hunk line has the form:
+      `hunk <old_start> <old_count> <new_start> <new_count>`
+    where `<old_start>` and `<old_count>` identify a range of lines in
+    the old file, and `<new_start>` and `<new_count>` identify the
+    replacement range in the new file.  Start values are 1-based and
+    counts are non-negative.  Ranges must not extend beyond the end of
+    the file.  For example, `hunk 3 2 3 4` means that 2 lines starting
+    at line 3 in the old file were replaced by 4 lines starting at
+    line 3 in the new file.  An `<old_count>` of 0 means no lines were
+    removed (pure insertion); a `<new_count>` of 0 means no lines were
+    added (pure deletion).
+    Lines are delimited by newlines.  A file `"foo\nbar\n"` and a
+    file `"foo\nbar"` both have 2 lines.
+    Hunks must be listed in order and must not overlap.  Any line
+    not covered by a hunk is treated as unchanged, so the total
+    number of unchanged lines must be the same on both sides.
+    For example, if the old file has 10 lines and the hunks cover
+of them (`old_count` values summing to 4), then 6 old lines
+    are unchanged.  The new file must also have exactly 6 lines
+    not covered by hunks, so the `new_count` values must sum to
+    `new_file_lines - 6`.
+    -----------------------
+    packet:          git< hunk 1 3 1 5
+    packet:          git< hunk 10 2 12 2
+    packet:          git< 0000
+    packet:          git< status=success
+    packet:          git< 0000
+    -----------------------
+    If the tool responds with hunks and "success", Git marks those lines
+    as changed and feeds them into the standard diff pipeline.  Patch
+    output features (word diff, function context, color) work normally.
+    Note that `--stat` and other summary formats use their own diff path
+    and are not affected by the diff process.
+    If no hunk lines precede the flush, followed by "success", Git
+    treats the files as having no changes: `git diff` produces no output
+    and `git blame` skips the commit, attributing lines to earlier commits.
+    -----------------------
+    packet:          git< 0000
+    packet:          git< status=success
+    packet:          git< 0000
+    -----------------------
+    If the tool returns invalid hunks (out of bounds, overlapping), Git
+    silently falls back to the builtin diff algorithm.
+    In case the tool cannot or does not want to process the content,
+    it is expected to respond with an "error" status.  Git warns and
+    falls back to the builtin diff algorithm for this file.  The tool
+    remains available for subsequent files.
+    -----------------------
+    packet:          git< 0000
+    packet:          git< status=error
+    packet:          git< 0000
+    -----------------------
+    In case the tool cannot or does not want to process the content as
+    well as any future content for the lifetime of the Git process, it
+    is expected to respond with an "abort" status.  Git silently falls
+    back to the builtin diff algorithm for this file and does not send
+    further requests to the tool.
+    -----------------------
+    packet:          git< 0000
+    packet:          git< status=abort
+    packet:          git< 0000
+    -----------------------
+    If the tool dies during the communication or does not adhere to the
+    protocol then Git will stop the process and fall back to the builtin
+    diff algorithm.  Git warns once and does not restart the process for
+    subsequent files.
+    Tools should ignore unknown keys in the per-file request to remain
+    forward-compatible.  Future versions of Git may send additional
+    `command=` values; tools that receive an unrecognized command should
+    respond with `status=error` rather than terminating.
     Defining a custom hunk-header
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ Expand Down @@

-Original file line number
+Diff line change
@@ Expand Up / @@ -815,6 +815,7 @@ TEST_BUILTINS_OBJS += test-csprng.o @@
     TEST_BUILTINS_OBJS += test-date.o
     TEST_BUILTINS_OBJS += test-delete-gpgsig.o
     TEST_BUILTINS_OBJS += test-delta.o
+    TEST_BUILTINS_OBJS += test-diff-process-backend.o
     TEST_BUILTINS_OBJS += test-dir-iterator.o
     TEST_BUILTINS_OBJS += test-drop-caches.o
     TEST_BUILTINS_OBJS += test-dump-cache-tree.o
@@ Expand Down Expand Up / @@ -1142,6 +1143,7 @@ LIB_OBJS += diff-delta.o @@
     LIB_OBJS += diff-merges.o
     LIB_OBJS += diff-lib.o
     LIB_OBJS += diff-no-index.o
+    LIB_OBJS += diff-process.o
     LIB_OBJS += diff.o
     LIB_OBJS += diffcore-break.o
     LIB_OBJS += diffcore-delta.o
@@ Expand Down @@

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[RFC] diff: add `diff.<driver>.process` for external hunk providers #2120

Uh oh!

Diff view

Diff view

There are no files selected for viewing

gitgitgadget Bot May 26, 2026

Uh oh!

gitgitgadget Bot May 29, 2026

Uh oh!

gitgitgadget Bot May 26, 2026

Uh oh!

gitgitgadget Bot May 29, 2026

Uh oh!

gitgitgadget Bot Jun 7, 2026

Uh oh!

gitgitgadget Bot Jun 7, 2026

Uh oh!

gitgitgadget Bot Jun 7, 2026

Uh oh!

gitgitgadget Bot Jun 8, 2026

Uh oh!

gitgitgadget Bot Jun 8, 2026

Uh oh!

gitgitgadget Bot Jun 8, 2026

Uh oh!

Uh oh!

Uh oh!

-Original file line number
+Diff line change
@@ Expand Up / @@ -19,6 +19,8 @@ @@
     #include "tag.h"
     #include "trace2.h"
     #include "blame.h"
+    #include "diff-process.h"
+    #include "xdiff-interface.h"
     #include "alloc.h"
     #include "commit-slab.h"
     #include "bloom.h"
@@ Expand Down Expand Up @@
-    static int diff_hunks(mmfile_t *file_a, mmfile_t *file_b,
-    		      xdl_emit_hunk_consume_func_t hunk_func, void *cb_data, int xdl_opts)
+    static int diff_hunks_xpp(mmfile_t *file_a, mmfile_t *file_b,
+    			  xdl_emit_hunk_consume_func_t hunk_func,
+    			  void *cb_data, xpparam_t *xpp)
     {
-    	xpparam_t xpp = {0};
     	xdemitconf_t xecfg = {0};
     	xdemitcb_t ecb = {NULL};
-    	xpp.flags = xdl_opts;
     	xecfg.hunk_func = hunk_func;
     	ecb.priv = cb_data;
-    	return xdi_diff(file_a, file_b, &xpp, &xecfg, &ecb);
+    	return xdi_diff(file_a, file_b, xpp, &xecfg, &ecb);
+    }
+    static int diff_hunks(mmfile_t *file_a, mmfile_t *file_b,
+    		      xdl_emit_hunk_consume_func_t hunk_func, void *cb_data, int xdl_opts)
+    {
+    	xpparam_t xpp = {0};
+    	xpp.flags = xdl_opts;
+    	return diff_hunks_xpp(file_a, file_b, hunk_func, cb_data, &xpp);
     }
     static const char *get_next_line(const char *start, const char *end)
@@ Expand Down Expand Up @@
     				 struct blame_origin *parent, int ignore_diffs)
     {
     	mmfile_t file_p, file_o;
+    	xpparam_t xpp = {0};
     	struct blame_chunk_cb_data d;
     	struct blame_entry *newdest = NULL;
@@ Expand All @@
     			 &sb->num_read_blob, ignore_diffs);
     	sb->num_get_patch++;
-    	if (diff_hunks(&file_p, &file_o, blame_chunk_cb, &d, sb->xdl_opts))
-    		die("unable to generate diff (%s -> %s)",
-    		    oid_to_hex(&parent->commit->object.oid),
-    		    oid_to_hex(&target->commit->object.oid));
+    	xpp.flags = sb->xdl_opts;
+    	/*
+    	 * If the diff process considers the files equivalent,
+    	 * skip the diff so blame looks past this commit.
+    	 */
+    	if (diff_process_fill_hunks(&sb->revs->diffopt, target->path,
+    				    &file_p, &file_o, &xpp)
+    	    != DIFF_PROCESS_EQUIVALENT) {
+    		if (diff_hunks_xpp(&file_p, &file_o, blame_chunk_cb,
+    				   &d, &xpp))
+    			die("unable to generate diff (%s -> %s)",
+    			    oid_to_hex(&parent->commit->object.oid),
+    			    oid_to_hex(&target->commit->object.oid));
+    	}
+    	free(xpp.external_hunks);
     	/* The rest are the same as the parent */
     	blame_chunk(&d.dstq, &d.srcq, INT_MAX, d.offset, INT_MAX, 0,
     		    parent, target, 0);
@@ Expand Down @@

-Original file line number
+Diff line change
@@ Expand Up / @@ -2213,6 +2213,13 @@ int cmd_format_patch(int argc, @@
     	if (argc > 1)
     		die(_("unrecognized argument: %s"), argv[1]);
+    	/*
+    	 * Disable diff.<driver>.process so that patches generated by
+    	 * format-patch are always based on the builtin diff algorithm
+    	 * and can be applied reliably.
+    	 */
+    	rev.diffopt.flags.no_diff_process = 1;
     	if (rev.diffopt.output_format & DIFF_FORMAT_NAME)
     		die(_("--name-only does not make sense"));
     	if (rev.diffopt.output_format & DIFF_FORMAT_NAME_STATUS)
@@ Expand Down @@

[RFC] diff: add diff.<driver>.process for external hunk providers #2120

Are you sure you want to change the base?

Uh oh!

[RFC] diff: add diff.<driver>.process for external hunk providers #2120

Uh oh!

Uh oh!

Diff view

Diff view

There are no files selected for viewing

gitgitgadget Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

gitgitgadget Bot May 29, 2026

Choose a reason for hiding this comment

Uh oh!

gitgitgadget Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

gitgitgadget Bot May 29, 2026

Choose a reason for hiding this comment

Uh oh!

gitgitgadget Bot Jun 7, 2026

Choose a reason for hiding this comment

Uh oh!

gitgitgadget Bot Jun 7, 2026

Choose a reason for hiding this comment

Uh oh!

gitgitgadget Bot Jun 7, 2026

Choose a reason for hiding this comment

Uh oh!

gitgitgadget Bot Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

gitgitgadget Bot Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

gitgitgadget Bot Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

[RFC] diff: add `diff.<driver>.process` for external hunk providers #2120

[RFC] diff: add `diff.<driver>.process` for external hunk providers #2120