Duplicate entry hardening by newren · Pull Request #2096 · gitgitgadget/git

newren · 2026-04-20T22:04:14Z

We had some corrupt trees with duplicate entries in real world repositories, which triggered an assertion failure in merge-ort. Further, the corrupt tree creation in the third party tool would have been avoided had verify_cache() correctly checked for D/F conflicts. Provide fixes for both issues, including 3 preparatory changes for the merge-ort fix.

Changes since v1:

Add some more detail to the commit message of Patch 1
Split some code out in Patch 5 into a helper function.

cc: Patrick Steinhardt ps@pks.im
cc: Christian Couder christian.couder@gmail.com
cc: Elijah Newren newren@gmail.com

newren · 2026-04-21T00:25:12Z

/submit

gitgitgadget · 2026-04-21T00:26:52Z

Submitted as pull.2096.git.1776731171.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-2096/newren/duplicate-entry-hardening-v1

To fetch this version to local tag pr-2096/newren/duplicate-entry-hardening-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-2096/newren/duplicate-entry-hardening-v1

gitgitgadget · 2026-04-26T00:50:36Z

This branch is now known as en/ort-harden-against-corrupt-trees.

gitgitgadget · 2026-04-26T00:50:36Z

This patch series was integrated into seen via git@5ceb988.

gitgitgadget · 2026-04-28T05:57:06Z

This patch series was integrated into seen via git@b2979e0.

gitgitgadget · 2026-05-01T22:26:42Z

This patch series was integrated into seen via git@5a8e1e0.

gitgitgadget · 2026-05-03T04:03:34Z

There was a status update in the "Cooking" section about the branch en/ort-harden-against-corrupt-trees on the Git mailing list:

"ort" merge backend handles merging corrupt trees better by
aborting when it should.

Needs review.
source: <pull.2096.git.1776731171.gitgitgadget@gmail.com>

gitgitgadget · 2026-05-04T01:50:12Z

This patch series was integrated into seen via git@62d334b.

gitgitgadget · 2026-05-09T04:08:16Z

This patch series was integrated into seen via git@6099d30.

gitgitgadget · 2026-05-09T12:13:28Z

This patch series was integrated into seen via git@1d317d6.

gitgitgadget · 2026-05-11T02:14:32Z

This patch series was integrated into seen via git@a1e3c69.

gitgitgadget · 2026-05-11T04:17:51Z

There was a status update in the "Cooking" section about the branch en/ort-harden-against-corrupt-trees on the Git mailing list:

"ort" merge backend handles merging corrupt trees better by
aborting when it should.

Needs review.
source: <pull.2096.git.1776731171.gitgitgadget@gmail.com>

gitgitgadget · 2026-05-11T05:44:56Z

This patch series was integrated into seen via git@b7807b6.

gitgitgadget · 2026-05-11T09:18:45Z

This patch series was integrated into seen via git@8cb15cc.

gitgitgadget · 2026-05-12T07:25:31Z

This patch series was integrated into seen via git@a090bdb.

gitgitgadget · 2026-05-12T10:17:37Z

There was a status update in the "Cooking" section about the branch en/ort-harden-against-corrupt-trees on the Git mailing list:

"ort" merge backend handles merging corrupt trees better by
aborting when it should.

Needs review.
source: <pull.2096.git.1776731171.gitgitgadget@gmail.com>

gitgitgadget · 2026-05-13T05:59:46Z

This patch series was integrated into seen via git@64f0aa6.

gitgitgadget · 2026-05-14T12:02:26Z

This patch series was integrated into seen via git@1300a5a.

gitgitgadget · 2026-05-15T02:16:59Z

This patch series was integrated into seen via git@b3f3685.

gitgitgadget · 2026-05-18T01:34:15Z

This patch series was integrated into seen via git@fc35baf.

gitgitgadget · 2026-05-18T05:52:32Z

There was a status update in the "Cooking" section about the branch en/ort-harden-against-corrupt-trees on the Git mailing list:

"ort" merge backend handles merging corrupt trees better by
aborting when it should.

Needs review.
source: <pull.2096.git.1776731171.gitgitgadget@gmail.com>

gitgitgadget · 2026-05-19T04:00:19Z

This patch series was integrated into seen via git@5edc804.

gitgitgadget · 2026-05-20T03:22:59Z

This patch series was integrated into seen via git@a510b62.

gitgitgadget · 2026-05-20T05:50:33Z

There was a status update in the "Cooking" section about the branch en/ort-harden-against-corrupt-trees on the Git mailing list:

"ort" merge backend handles merging corrupt trees better by
aborting when it should.

Needs review.
source: <pull.2096.git.1776731171.gitgitgadget@gmail.com>

gitgitgadget · 2026-05-21T05:00:43Z

This patch series was integrated into seen via git@3924c8d.

gitgitgadget · 2026-05-21T06:35:46Z

This patch series was integrated into seen via git@4944388.

gitgitgadget · 2026-05-21T14:14:05Z

This patch series was integrated into seen via git@c7d9b18.

gitgitgadget · 2026-05-23T09:09:57Z

This patch series was integrated into seen via git@798f6ed.

gitgitgadget · 2026-05-23T09:28:01Z

There was a status update in the "Cooking" section about the branch en/ort-harden-against-corrupt-trees on the Git mailing list:

"ort" merge backend handles merging corrupt trees better by
aborting when it should.

Needs review.
source: <pull.2096.git.1776731171.gitgitgadget@gmail.com>

gitgitgadget · 2026-06-01T12:42:50Z

Junio C Hamano wrote on the Git mailing list (how to reply to this email):

"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:

> We had some corrupt trees with duplicate entries in real world repositories,
> which triggered an assertion failure in merge-ort. Further, the corrupt tree
> creation in the third party tool would have been avoided had verify_cache()
> correctly checked for D/F conflicts. Provide fixes for both issues,
> including 3 preparatory changes for the merge-ort fix.
>
> Elijah Newren (5):
>   merge-ort: propagate callback errors from traverse_trees_wrapper()
>   merge-ort: drop unnecessary show_all_errors from collect_merge_info()
>   merge-ort: free diff pairs queue in clear_or_reinit_internal_opts()
>   merge-ort: abort merge when trees have duplicate entries
>   cache-tree: fix verify_cache() to catch non-adjacent D/F conflicts

This is a fix to an important corner of our system, but somehow left
in "Needs review" state for much longer than I would have liked, so
even though I am officially on vacation ;-), I took some time to
read these through (by the way it was a pleasant read, thank you).

I wonder if we create a rule like

    Those of you who have more than 30 commits in our project are
    expected to review one topic (or more) from other contributors
    for every three patches you send and ask for reviews by others.

it would help balance the patch vs review ratio, perhaps?

gitgitgadget · 2026-06-01T18:28:58Z

Patrick Steinhardt wrote on the Git mailing list (how to reply to this email):

On Mon, Jun 01, 2026 at 09:33:10PM +0900, Junio C Hamano wrote:
> "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
> > We had some corrupt trees with duplicate entries in real world repositories,
> > which triggered an assertion failure in merge-ort. Further, the corrupt tree
> > creation in the third party tool would have been avoided had verify_cache()
> > correctly checked for D/F conflicts. Provide fixes for both issues,
> > including 3 preparatory changes for the merge-ort fix.
> >
> > Elijah Newren (5):
> >   merge-ort: propagate callback errors from traverse_trees_wrapper()
> >   merge-ort: drop unnecessary show_all_errors from collect_merge_info()
> >   merge-ort: free diff pairs queue in clear_or_reinit_internal_opts()
> >   merge-ort: abort merge when trees have duplicate entries
> >   cache-tree: fix verify_cache() to catch non-adjacent D/F conflicts
> 
> This is a fix to an important corner of our system, but somehow left
> in "Needs review" state for much longer than I would have liked, so
> even though I am officially on vacation ;-), I took some time to
> read these through (by the way it was a pleasant read, thank you).

Honestly, I always shy away from the merge-related subsystems. It has a
lot of subtleties that I don't have any experience with, so I never
really consider my input to be helpful here.

> I wonder if we create a rule like
> 
>     Those of you who have more than 30 commits in our project are
>     expected to review one topic (or more) from other contributors
>     for every three patches you send and ask for reviews by others.

Heh, that would make me condense patch series into fewer patches ;)

> it would help balance the patch vs review ratio, perhaps?

It's a good question. I typically try to aim for reviewing series on the
mailing list at least every second day, and I always encourage other
folks in my team to do the same. But recently I (well, rather we)
haven't really been able to due to the current situation at GitLab,
which forces us to put almost all of our focus towards a different
project for a while.

Overall I agree that everyone who is a core contributor should also make
reviews part of their regular worflow. At least for corporate
contributors that might also make it easier to communicate this to their
respective employers. Regardless of that, my expectation is that there
will be times where it works well, and other times where it works less
well.

Patrick

gitgitgadget · 2026-06-01T18:29:00Z

User Patrick Steinhardt <ps@pks.im> has been added to the cc: list.

gitgitgadget · 2026-06-02T03:52:41Z

This patch series was integrated into seen via git@db3b688.

gitgitgadget · 2026-06-02T10:24:06Z

This patch series was integrated into seen via git@1587470.

gitgitgadget · 2026-06-02T15:02:43Z

There was a status update in the "Cooking" section about the branch en/ort-harden-against-corrupt-trees on the Git mailing list:

"ort" merge backend handles merging corrupt trees better by
aborting when it should.

Waiting for response(s) to review comment(s).
cf. <xmqqldcy4f07.fsf@gitster.g>
source: <pull.2096.git.1776731171.gitgitgadget@gmail.com>

gitgitgadget · 2026-06-03T11:22:17Z

This patch series was integrated into seen via git@86c54bb.

gitgitgadget · 2026-06-04T02:40:42Z

This patch series was integrated into seen via git@ecefd6e.

gitgitgadget · 2026-06-04T05:15:03Z

There was a status update in the "Cooking" section about the branch en/ort-harden-against-corrupt-trees on the Git mailing list:

"ort" merge backend handles merging corrupt trees better by
aborting when it should.

Waiting for response(s) to review comment(s).
cf. <xmqqldcy4f07.fsf@gitster.g>
source: <pull.2096.git.1776731171.gitgitgadget@gmail.com>

gitgitgadget · 2026-06-09T02:37:32Z

There was a status update in the "Cooking" section about the branch en/ort-harden-against-corrupt-trees on the Git mailing list:

"ort" merge backend handles merging corrupt trees better by
aborting when it should.

Waiting for response(s) to review comment(s).
cf. <xmqqldcy4f07.fsf@gitster.g>
source: <pull.2096.git.1776731171.gitgitgadget@gmail.com>

gitgitgadget · 2026-06-11T18:30:02Z

There was a status update in the "Cooking" section about the branch en/ort-harden-against-corrupt-trees on the Git mailing list:

"ort" merge backend handles merging corrupt trees better by
aborting when it should.

Waiting for response(s) to review comment(s).
cf. <xmqqldcy4f07.fsf@gitster.g>
source: <pull.2096.git.1776731171.gitgitgadget@gmail.com>

gitgitgadget · 2026-06-12T11:48:26Z

This patch series is no longer integrated into seen.

gitgitgadget · 2026-06-12T14:28:35Z

This patch series was integrated into seen via git@3565a90.

gitgitgadget · 2026-06-12T15:49:42Z

Christian Couder wrote on the Git mailing list (how to reply to this email):

On Tue, Jun 2, 2026 at 8:16 AM Patrick Steinhardt <ps@pks.im> wrote:
>
> On Mon, Jun 01, 2026 at 09:33:10PM +0900, Junio C Hamano wrote:

> > This is a fix to an important corner of our system, but somehow left
> > in "Needs review" state for much longer than I would have liked, so
> > even though I am officially on vacation ;-), I took some time to
> > read these through (by the way it was a pleasant read, thank you).
>
> Honestly, I always shy away from the merge-related subsystems. It has a
> lot of subtleties that I don't have any experience with, so I never
> really consider my input to be helpful here.
>
> > I wonder if we create a rule like
> >
> >     Those of you who have more than 30 commits in our project are
> >     expected to review one topic (or more) from other contributors
> >     for every three patches you send and ask for reviews by others.
>
> Heh, that would make me condense patch series into fewer patches ;)
>
> > it would help balance the patch vs review ratio, perhaps?
>
> It's a good question. I typically try to aim for reviewing series on the
> mailing list at least every second day, and I always encourage other
> folks in my team to do the same. But recently I (well, rather we)
> haven't really been able to due to the current situation at GitLab,
> which forces us to put almost all of our focus towards a different
> project for a while.
>
> Overall I agree that everyone who is a core contributor should also make
> reviews part of their regular worflow. At least for corporate
> contributors that might also make it easier to communicate this to their
> respective employers. Regardless of that, my expectation is that there
> will be times where it works well, and other times where it works less
> well.

Sashiko (https://github.com/sashiko-dev/sashiko) is used these days by
Linux kernel developers and seems to work well for them.

At GitLab and probably in other companies, some of us also use AI to
review our work before sending it to the mailing list. And yeah, it
helps find issues before our patches reach the mailing list.

In the same way as we require that patches must pass CI, do we want to
require that patches "pass" an AI review before they get accepted?

The benefit would be that it would hopefully catch a lot of trivial
things like indentation, typos/grammos, etc, and a lot of things a bit
more difficult to spot like memory issues. Perhaps with some amount of
prompting/configuration (for example pointing it at our
CodingGuidelines and SubmittingPatches) it could also catch issues
like style issues, commits that do too many things, refactoring
opportunities, etc.

We would likely still require at least one human review (by someone
who is not the maintainer) to validate architectural decisions, to
make sure it goes in the same direction as other efforts, and perhaps
also to make sure that AI suggestions were properly handled by the
patch author.

If we decide to require it, then there are a lot of questions that we
will have to answer.

Do we want to have our own system somehow managed by us or would we be
happy to use existing systems already in place in some companies as
long as we can still tweak them in some ways, like the current CI
systems we use?

If we use existing systems likely at GitLab and GitHub, it might be
more difficult to get coherent results as they might use different
LLMs, but maybe it could help tighten our docs to make sure everyone
is aligned, and we could get better reviews by using multiple systems
because an LLM might find an issue that the other LLM missed.

Do we want an AI review right after a patch is posted or only if there
is no human review in the next X days?

Also what if the AI makes a long concrete suggestion to improve on the
patches? Could that be incompatible with our AI policy to apply it?
Should we try to prevent the AI from making such a suggestion in the
first place?

I haven't looked at how Sashiko is used for the kernel, but maybe
there will need to be some kinds of restrictions/authentications to
avoid potential abuse.

gitgitgadget · 2026-06-12T15:49:44Z

User Christian Couder <christian.couder@gmail.com> has been added to the cc: list.

gitgitgadget · 2026-06-12T20:08:28Z

Junio C Hamano wrote on the Git mailing list (how to reply to this email):

Christian Couder <christian.couder@gmail.com> writes:

> On Tue, Jun 2, 2026 at 8:16 AM Patrick Steinhardt <ps@pks.im> wrote:
>>
>> Overall I agree that everyone who is a core contributor should also make
>> reviews part of their regular worflow. At least for corporate
>> contributors that might also make it easier to communicate this to their
>> respective employers. Regardless of that, my expectation is that there
>> will be times where it works well, and other times where it works less
>> well.
>
> Sashiko (https://github.com/sashiko-dev/sashiko) is used these days by
> Linux kernel developers and seems to work well for them.
>
> At GitLab and probably in other companies, some of us also use AI to
> review our work before sending it to the mailing list. And yeah, it
> helps find issues before our patches reach the mailing list.
>
> In the same way as we require that patches must pass CI, do we want to
> require that patches "pass" an AI review before they get accepted?

I do not think so.  You (figuratively, not limited to Christian
Couder) are welcome to use whatever tool available to you to help
you polish your submission, and the higher quality your patches are
(e.g., fewer typos and jumps in logic flow that interferes the
thought process of human reviewers), the more helpful you are being
to the community.  The use of GitHub PR initiated CI run falls into
the same category, I think, in that we do not require you to have an
account and trigger the CI there, but you are doing a good service
if you made sure you caught breakages on macOS you do not have
access to otherwise before sending your patches to the list.

But I do not think we should require you to bring your own token
budget to be able to contribute.

> The benefit would be that it would hopefully catch a lot of trivial
> things like indentation, typos/grammos, etc, and a lot of things a bit
> more difficult to spot like memory issues. Perhaps with some amount of
> prompting/configuration (for example pointing it at our
> CodingGuidelines and SubmittingPatches) it could also catch issues
> like style issues, commits that do too many things, refactoring
> opportunities, etc.

Yes.

Similarly, you are welcome to use tools including AI tools to help
you review others' patches, or help sanity check your reviews of
others' patches before you send them out.  The reason why such an
effort is valuable to the community is the same.

But I personally consider that the use of the tools (not limited to
AI tools) is up to each developer.  What counts a lot more is the
quality of the output.  Just like PR driven CI at GitHub is offered
to everybody who wants to participate and is willing to have an
account there, it may help those aspiring developers if automated
review services are made easily available, but it is a different
story to _require_ use of such service.

traverse_trees_wrapper() saves entries from a first pass through traverse_trees() and then replays them through the real callback (collect_merge_info_callback). However, the replay loop silently discards the callback return value. This is not a deferred error; it is an ignored error. Today the only originator of a negative return in this entire call graph is traverse_trees()'s "exceeded maximum allowed tree depth" check; everything else (collect_merge_info_callback, traverse_trees_wrapper, the inner traverse_trees recursion) only relays that. So in current Git, the visible effect of dropping the replay callback's return value is narrow but bad: a tree nested past core.maxTreeDepth has its -1 swallowed, the subtree below the limit is silently pruned, and the merge completes as if that were the correct result. A later patch in this series will teach collect_merge_info_callback() to return -1 on an additional path -- detecting duplicate entries in malformed trees -- which is similarly handled today by just ignoring the problem (resulting in mostly a "last one wins" rule, though the non-last entry can mutate various state flags). Capture the return value, stop the loop on negative returns, and propagate the error to the caller. The callback returns a positive mask value on success, so normalize non-negative returns to 0 for the caller. Signed-off-by: Elijah Newren <newren@gmail.com>

collect_merge_info() has set info.show_all_errors = 1 since d2bc199 (merge-ort: implement a very basic collect_merge_info(), 2020-12-13). This setting was copied from unpack-trees.c where it controls batching of error messages for porcelain display, but merge-ort has no such error-batching logic and never needed it. With show_all_errors set, traverse_trees() captures a negative callback return but continues processing remaining entries rather than stopping immediately. Removing the setting restores the default behavior where a negative return from collect_merge_info_callback() breaks out of the traversal loop right away, allowing a future commit to exit early when a corrupt tree is detected. Signed-off-by: Elijah Newren <newren@gmail.com>

clear_or_reinit_internal_opts() is responsible for cleaning up the various data structures in merge_options_internal. It already handles many renames-related structures (dirs_removed, dir_renames, relevant_sources, cached_pairs, deferred, etc.) but does not free renames->pairs[].queue. In the normal code path, resolve_and_process_renames() frees pairs[s].queue and reinitializes it with diff_queue_init() before clear_or_reinit_internal_opts() runs, so the omission is harmless. However, if collect_merge_info() encounters an error and returns early (before resolve_and_process_renames() is ever called), any diff pairs already queued by collect_rename_info()/add_pair() will have their backing array leaked. Fix this by freeing renames->pairs[].queue in the cleanup function. In the normal path the pointer is already NULL (from the earlier diff_queue_init() in resolve_and_process_renames()), so free(NULL) is a safe no-op. Signed-off-by: Elijah Newren <newren@gmail.com>

Trees with duplicate entries are malformed; fsck reports "contains duplicate file entries" for them. merge-ort has from the beginning assumed that we would never hit such trees. It was written with the assumption that traverse_trees() calls collect_merge_info_callback() at most once per path. The "sanity checks" in that callback (added in d2bc199 (merge-ort: implement a very basic collect_merge_info(), 2020-12-13)) verify properties of each individual call but not that invariant. The strmap_put() in setup_path_info() silently overwrites the entry from any prior call for the same path, because it assumed there would be no other path. Unfortunately, supplemental data structures for various optimizations could still be tweaked before the extra paths were overwritten, and those data structures not matching expected state could trip various assertions. Change the return type of setup_path_info() from void to int to allow us to detect this case, and abort the merge with a clear error message when it occurs. Signed-off-by: Elijah Newren <newren@gmail.com>

verify_cache() checks that the index does not contain both "path" and "path/file" before writing a tree. It does this by comparing only adjacent entries, relying on the assumption that "path/file" would immediately follow "path" in sorted order. Unfortunately, this assumption does not always hold. For example: docs <-- submodule entry docs-internal/README.md <-- intervening entry docs/requirements.txt <-- D/F conflict, NOT adjacent to "docs" When this happens, verify_cache() silently misses the D/F conflict and write-tree produces a corrupt tree object containing duplicate entries (one for the submodule "docs" and one for the tree "docs"). I could not find any caller in current git that both allows the index to get into this state and then tries to write it out without doing other checks beyond the verify_cache() call in cache_tree_update(), but verify_cache() is documented as a safety net for preventing corrupt trees and should actually provide that guarantee. A downstream consumer that relied solely on cache_tree_update()'s internal checking via verify_cache() to prevent duplicate tree entries was bitten by the gap. Add a test that constructs a corrupt index directly (bypassing the D/F checks in add_index_entry) and verifies that write-tree now rejects it. Signed-off-by: Elijah Newren <newren@gmail.com>

gitgitgadget · 2026-06-14T06:20:32Z

User Elijah Newren <newren@gmail.com> has been added to the cc: list.

newren · 2026-06-14T06:36:25Z

/submit

gitgitgadget · 2026-06-14T06:37:50Z

Submitted as pull.2096.v2.git.1781419047.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-2096/newren/duplicate-entry-hardening-v2

To fetch this version to local tag pr-2096/newren/duplicate-entry-hardening-v2:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-2096/newren/duplicate-entry-hardening-v2

gitgitgadget · 2026-06-14T16:05:57Z

@@ -161,6 +161,54 @@ void cache_tree_invalidate_path(struct index_state *istate, const char *path)
 		istate->cache_changed |= CACHE_TREE_CHANGED;


Junio C Hamano wrote on the Git mailing list (how to reply to this email):

"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes: > diff --git a/cache-tree.c b/cache-tree.c > index 7881b42aa2..4d2669b312 100644 > --- a/cache-tree.c > +++ b/cache-tree.c > @@ -161,6 +161,54 @@ void cache_tree_invalidate_path(struct index_state *istate, const char *path) > istate->cache_changed |= CACHE_TREE_CHANGED; > } > > +/* > + * Check whether this_ce and the next entry in the index form a D/F > + * conflict ("path" vs "path/file"). Returns the conflicting "path/..." > + * name when one is found, or NULL otherwise. > + * > + * The cache is sorted, so "path/file" sorts after "path" and the > + * conflict is usually visible as adjacent entries. But other entries > + * can sort between them -- e.g. "path-internal" sits between "path" > + * and "path/file" because '-' (0x2D) precedes '/' (0x2F) -- so when > + * the immediately following entry shares our prefix but starts with a > + * character that sorts before '/', binary search for "path/" instead. > + */ > +static const char *find_df_conflict(struct index_state *istate, > + const struct cache_entry *this_ce, > + const struct cache_entry *next_ce) > +{ > + const char *this_name = this_ce->name; > + const char *next_name = next_ce->name; > + int this_len = ce_namelen(this_ce); > + const struct cache_entry *other; > + struct strbuf probe = STRBUF_INIT; > + int pos; > + if (this_len >= ce_namelen(next_ce) || > + next_name[this_len] > '/' || > + strncmp(this_name, next_name, this_len)) > + return NULL; First we reject an abvious "cannot conflict" case. If the current one is not strictly shorter than the next one, it cannot be an overlapping with one of the leading directories in the next one and the next one cannot be "hiding" an overlapping one in the "docs, docs-i, docs/r" relationship described in the log message (where next=="docs-i' hides the fact that this=="docs" conflicts with "docs/r"). Of course, the current one cannot be such a confliciting entry, if an earlier part of the next name does not entirely match the current name. In addition, if the byte after the matching leading part in the next name is larger than '/', any entry that comes later than the next name cannot have '/' at that byte position. Makes sense. > + if (next_name[this_len] == '/') > + return next_name; And if we see '/' there, we definitely have conflict right there. So these trivial cases after us, we need to see if an entry that comes after next has a name whose leading part is identical to this name, plus '/'. How would we compute it? > + strbuf_add(&probe, this_name, this_len); > + strbuf_addch(&probe, '/'); > + pos = index_name_pos_sparse(istate, probe.buf, probe.len); > + strbuf_release(&probe); We see where the first of such a name (i.e. this name followed by '/') may appear. Normal cache entries in the index would never have a trailing slash in their names, so I expect that this would either find an directory that is sparsed out and returns a non-negative index into the .cache[] array, or a negative index that indicates where a name like this + '/' + anything would appear. > + > + if (pos < 0) > + pos = -pos - 1; So in order to check both cases, we do the usual index flipping. We do not need to to treat "it exists---that's sparse dir" case and "it is expanded and we are trying to find the first entry in that index" case differently below, which simplifies things a bit. > + if (pos >= (int)istate->cache_nr) > + return NULL; OK, such an entry whose name is this followed by '/' does not exist so we are safe. > + other = istate->cache[pos]; > + if (ce_namelen(other) > this_len && > + other->name[this_len] == '/' && > + !strncmp(this_name, other->name, this_len)) > + return other->name; > + return NULL; This looks very similar to the inverse of the earlier one but slightly different. It might be easier to spot where it differs, if we flipped the polarity, like so, perhaps? other = istate->cache[pos]; if (this_len >= ce_namelen(other) || other->name[this_len] != '/' || strncmp(this_name, other->name, this_len)) return NULL; return other->name; Or perhaps not. I do not care very strongly, but I only spotted the subtle difference while attempting to flip the polarity in my head, so it might help ther readers the same way. I dunno. Thanks, looking very good.

gitgitgadget · 2026-06-15T20:34:00Z

There was a status update in the "Cooking" section about the branch en/ort-harden-against-corrupt-trees on the Git mailing list:

"ort" merge backend handles merging corrupt trees better by
aborting when it should.

Waiting for response(s) to review comment(s).
cf. <xmqq5x3ldu4h.fsf@gitster.g>
source: <pull.2096.v2.git.1781419047.gitgitgadget@gmail.com>

gitgitgadget Bot added the seen label Apr 26, 2026

gitgitgadget Bot reviewed Jun 1, 2026

View reviewed changes

Comment thread cache-tree.c

gitgitgadget Bot removed the seen label Jun 12, 2026

gitgitgadget Bot added the seen label Jun 12, 2026

newren added 5 commits June 13, 2026 20:26

newren force-pushed the duplicate-entry-hardening branch from a87bbaa to cf50f1a Compare June 14, 2026 03:54

gitgitgadget Bot reviewed Jun 14, 2026

View reviewed changes

		@@ -161,6 +161,54 @@ void cache_tree_invalidate_path(struct index_state istate, const char path)
		istate->cache_changed \|= CACHE_TREE_CHANGED;

Conversation

newren commented Apr 20, 2026 • edited by gitgitgadget Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

newren commented Apr 21, 2026

Uh oh!

gitgitgadget Bot commented Apr 21, 2026

Uh oh!

gitgitgadget Bot commented Apr 26, 2026

Uh oh!

gitgitgadget Bot commented Apr 26, 2026

Uh oh!

gitgitgadget Bot commented Apr 28, 2026

Uh oh!

gitgitgadget Bot commented May 1, 2026

Uh oh!

gitgitgadget Bot commented May 3, 2026

Uh oh!

gitgitgadget Bot commented May 4, 2026

Uh oh!

gitgitgadget Bot commented May 9, 2026

Uh oh!

gitgitgadget Bot commented May 9, 2026

Uh oh!

gitgitgadget Bot commented May 11, 2026

Uh oh!

gitgitgadget Bot commented May 11, 2026

Uh oh!

gitgitgadget Bot commented May 11, 2026

Uh oh!

gitgitgadget Bot commented May 11, 2026

Uh oh!

gitgitgadget Bot commented May 12, 2026

Uh oh!

gitgitgadget Bot commented May 12, 2026

Uh oh!

gitgitgadget Bot commented May 13, 2026

Uh oh!

gitgitgadget Bot commented May 14, 2026

Uh oh!

gitgitgadget Bot commented May 15, 2026

Uh oh!

gitgitgadget Bot commented May 18, 2026

Uh oh!

gitgitgadget Bot commented May 18, 2026

Uh oh!

gitgitgadget Bot commented May 19, 2026

Uh oh!

gitgitgadget Bot commented May 20, 2026

Uh oh!

gitgitgadget Bot commented May 20, 2026

Uh oh!

gitgitgadget Bot commented May 21, 2026

Uh oh!

gitgitgadget Bot commented May 21, 2026

Uh oh!

gitgitgadget Bot commented May 21, 2026

Uh oh!

gitgitgadget Bot commented May 23, 2026

Uh oh!

gitgitgadget Bot commented May 23, 2026

Uh oh!

gitgitgadget Bot commented Jun 1, 2026

Uh oh!

Uh oh!

gitgitgadget Bot commented Jun 1, 2026

Uh oh!

gitgitgadget Bot commented Jun 1, 2026

Uh oh!

gitgitgadget Bot commented Jun 2, 2026

Uh oh!

gitgitgadget Bot commented Jun 2, 2026

Uh oh!

gitgitgadget Bot commented Jun 2, 2026

Uh oh!

gitgitgadget Bot commented Jun 3, 2026

Uh oh!

gitgitgadget Bot commented Jun 4, 2026

Uh oh!

gitgitgadget Bot commented Jun 4, 2026

newren commented Apr 20, 2026 •

edited by gitgitgadget Bot

Loading