
GH-3625: Compute dataSize correctly when dealing with duplicate keys #3626

Open
nastra wants to merge 1 commit into apache:master from nastra:datasize-calc

nastra commented Jun 22, 2026 (Contributor)

Rationale for this change

When a Variant object is built with the same key appended more than once, VariantBuilder.endObject() deduplicates the entries and keeps the last-written value. However, the total dataSize of the object's value region was computed from each key's first occurrence rather than the value that was actually retained.

When duplicate keys had values of different encoded sizes, this size mismatch corrupted the output:

  • If the retained value was larger (e.g. {"a": 1, "a": "hello"}), the buffer was under-reserved, the offset width could be undersized, and the final offset entry understated the real data length. Reading the field then threw IllegalArgumentException: Invalid byte-array offset or returned truncated data.
  • If the retained value was smaller, the object reserved stale trailing bytes and overstated its size.
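The mismatch described above can be sketched with a small standalone model. This is not the actual VariantBuilder code: the `Field` class, the method names, and the encoded sizes (2 bytes for the integer `1`, 6 bytes for the string `"hello"`) are assumptions made for illustration only. The sketch contrasts summing each key's first occurrence with summing the value that last-write-wins deduplication actually keeps, and shows how an understated size can also pick too small an offset width.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative model only; names and sizes are hypothetical, not the parquet-java API.
final class DuplicateKeyDataSize {

    // One appended field: its key and the encoded size of its value in bytes.
    static final class Field {
        final String key;
        final int valueSize;

        Field(String key, int valueSize) {
            this.key = key;
            this.valueSize = valueSize;
        }
    }

    // Mirrors the bug: the size of each key's FIRST occurrence is counted.
    static int dataSizeFromFirstOccurrence(List<Field> fields) {
        Map<String, Integer> sizes = new LinkedHashMap<>();
        for (Field f : fields) {
            sizes.putIfAbsent(f.key, f.valueSize);
        }
        return sum(sizes);
    }

    // Mirrors the intended behavior: the LAST occurrence is the retained value.
    static int dataSizeFromRetainedValue(List<Field> fields) {
        Map<String, Integer> sizes = new LinkedHashMap<>();
        for (Field f : fields) {
            sizes.put(f.key, f.valueSize);
        }
        return sum(sizes);
    }

    // Smallest number of bytes (1 to 4) that can hold an offset up to dataSize.
    static int offsetWidth(int dataSize) {
        if (dataSize <= 0xFF) return 1;
        if (dataSize <= 0xFFFF) return 2;
        if (dataSize <= 0xFFFFFF) return 3;
        return 4;
    }

    private static int sum(Map<String, Integer> sizes) {
        int total = 0;
        for (int size : sizes.values()) {
            total += size;
        }
        return total;
    }

    public static void main(String[] args) {
        // Models {"a": 1, "a": "hello"} with assumed encoded sizes of 2 and 6 bytes.
        List<Field> fields = Arrays.asList(new Field("a", 2), new Field("a", 6));
        System.out.println("first occurrence: " + dataSizeFromFirstOccurrence(fields));
        System.out.println("retained value:   " + dataSizeFromRetainedValue(fields));
    }
}
```

In this model, the first-occurrence sum is 4 bytes short of the data that is actually written, which is the kind of gap that leads to an out-of-range offset on read. With larger values (say 200 bytes first, 300 bytes retained), the understated size would also select a 1-byte offset width where 2 bytes are needed.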

Closes #3625

What changes are included in this PR?

Are these changes tested?

Yes, added tests that reproduce the underlying issue.

Are there any user-facing changes?

nastra commented Jun 22, 2026 (Contributor, Author)

/cc @Fokko @wgtmac

}

@Test
public void testDuplicateKeysDifferentSizesAcrossMultipleKeys() {


without the fix this test would fail with java.lang.IllegalArgumentException: Invalid byte-array offset (25). length: 22

}

@Test
public void testDuplicateKeysKeptValueLarger() {


without the fix this test would fail with java.lang.IllegalArgumentException: Invalid byte-array offset (10). length: 10


Development

Successfully merging this pull request may close these issues.

dataSize is wrongly computed for duplicate keys in VariantBuilder
