GH-3558: Bump hadoop to 3.4.3 and properly close buffers for vectored IO #3620

Closed
nastra wants to merge 2 commits into apache:master from nastra:close-buffers

nastra commented Jun 19, 2026

Contributor

Rationale for this change

This PR closes the buffers allocated for vectored IO. After updating to Hadoop > 3.4.x, a number of tests fail because these buffers are not properly closed.

Closes #3558
Closes #3356

What changes are included in this PR?

Are these changes tested?

Covered by existing tests.

Are there any user-facing changes?

nastra changed the title from "GH-3558: Properly close buffers for vectored IO" to "GH-3558: Bump hadoop to 3.5.0 and properly close buffers for vectored IO" on Jun 19, 2026
nastra changed the title from "GH-3558: Bump hadoop to 3.5.0 and properly close buffers for vectored IO" to "GH-3558: Bump hadoop to 3.4.3 and properly close buffers for vectored IO" on Jun 19, 2026

nastra commented Jun 19, 2026

Contributor Author

/cc @Fokko @wgtmac

iemejia commented Jun 20, 2026

Member

I wonder whether my approach to the same issue in https://github.com/apache/parquet-java/pull/3579/changes#diff-8da24c84aef62e6e836d073938f7843d289785baaeddf446f3afeae6d4ef4b10R1368 is slightly more robust. Also, for reference: in my change, freeing the resources does not actually take effect, because it is up to Hadoop to do it :( See apache/hadoop#8511.

If anyone here knows someone who can help us get this reviewed/merged on the Hadoop side that would be great.

// allocated. We track the originals here to ensure they are properly released.
ByteBufferAllocator baseAllocator = options.getAllocator();
List<ByteBuffer> allocatedBuffers = new ArrayList<>();
ByteBufferAllocator trackingAllocator = new ByteBufferAllocator() {

Contributor

I don't like adding a wrapper here just to fix something in Hadoop. @steveloughran suggested setting the config to avoid checksums on the Hadoop side: #3356 (comment). This doesn't add any value to Parquet anyway. It is more or less what I suggested in #3559.

Contributor Author

makes sense, let's go with your PR then.
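
For readers following the thread, here is a minimal sketch of the "tracking allocator" wrapper pattern that the quoted snippet appears to start. It is not the code from this PR or from #3579. The `Allocator` interface, `CountingAllocator`, `TrackingAllocator`, and `demo` are hypothetical names defined only for this illustration, and the interface is a simplified stand-in rather than Parquet's actual `ByteBufferAllocator`. The idea is that the wrapper remembers every buffer it hands out, so anything the consumer never released can be released when the wrapper is closed.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

class TrackingAllocatorSketch {

  /** Hypothetical minimal allocator contract, defined here only for illustration. */
  interface Allocator {
    ByteBuffer allocate(int size);

    void release(ByteBuffer buffer);
  }

  /** Base allocator that counts buffers handed out but not yet released. */
  static final class CountingAllocator implements Allocator {
    int outstanding = 0;

    @Override
    public ByteBuffer allocate(int size) {
      outstanding++;
      return ByteBuffer.allocate(size);
    }

    @Override
    public void release(ByteBuffer buffer) {
      outstanding--;
    }
  }

  /** Wrapper that records every allocation and releases leftovers on close(). */
  static final class TrackingAllocator implements Allocator, AutoCloseable {
    private final Allocator delegate;
    private final List<ByteBuffer> allocated = new ArrayList<>();

    TrackingAllocator(Allocator delegate) {
      this.delegate = delegate;
    }

    @Override
    public ByteBuffer allocate(int size) {
      ByteBuffer buffer = delegate.allocate(size);
      allocated.add(buffer);
      return buffer;
    }

    @Override
    public void release(ByteBuffer buffer) {
      // Match by identity: ByteBuffer.equals compares contents, not references.
      if (allocated.removeIf(b -> b == buffer)) {
        delegate.release(buffer);
      }
    }

    @Override
    public void close() {
      // Release whatever the consumer never handed back.
      for (ByteBuffer buffer : allocated) {
        delegate.release(buffer);
      }
      allocated.clear();
    }
  }

  /** Returns the number of buffers still outstanding after the wrapper is closed. */
  static int demo() {
    CountingAllocator base = new CountingAllocator();
    try (TrackingAllocator tracking = new TrackingAllocator(base)) {
      ByteBuffer first = tracking.allocate(16);
      tracking.allocate(32); // never released by the consumer
      tracking.release(first);
    }
    return base.outstanding;
  }
}
```

The trade-off raised in the review still applies: a wrapper like this works around buffers that the underlying reader never returns, whereas the configuration-based approach avoids the problematic code path on the Hadoop side.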

nastra closed this Jun 22, 2026
nastra deleted the close-buffers branch June 22, 2026 10:55

Development

Successfully merging this pull request may close these issues.

Properly close buffers
TestParquetReader fails with hadoop 3.4.2