[core] Rewrite oss tryToWriteAtomic using atomic putObject API (#8226) #8228

Open
MaxLinyun wants to merge 1 commit into apache:master from MaxLinyun:fix/make-tryToWriteAtomic-atomic-in-oss

MaxLinyun commented Jun 13, 2026

Purpose

closes #8226

Paimon's existing OSSFileIO extends HadoopCompliantFileIO, and its file operations are delegated to Hadoop's AliyunOSSFileSystem. On object storage, the default implementation of tryToWriteAtomic writes a temporary file and then renames it to the target path. However, a rename on OSS is essentially a copy followed by a delete, so it is not an atomic operation.

This PR rewrites tryToWriteAtomic to call the conditional write API (put-if-absent) of the OSS SDK directly, which provides atomic 'write if not exists' semantics without relying on external locks.
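
To illustrate the semantics this change relies on, here is a minimal sketch that races several writers on one key. It is not the code in this PR and does not use the OSS SDK: `PutIfAbsentSketch`, `raceWriters`, and the in-memory map are hypothetical stand-ins, with `ConcurrentMap.putIfAbsent` playing the role of a conditional put.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** In-memory stand-in for an object store (assumption: not the OSS SDK). */
final class PutIfAbsentSketch {

    // Hypothetical bucket: object key -> object content.
    private final ConcurrentMap<String, byte[]> objects = new ConcurrentHashMap<>();

    /** Writes content under key only if the key is absent; true means this caller won. */
    boolean tryToWriteAtomic(String key, String content) {
        byte[] bytes = content.getBytes(StandardCharsets.UTF_8);
        // putIfAbsent is a single atomic step: there is no window between "check" and "write".
        return objects.putIfAbsent(key, bytes) == null;
    }

    /** Returns the stored content, or null if the key does not exist. */
    String read(String key) {
        byte[] bytes = objects.get(key);
        return bytes == null ? null : new String(bytes, StandardCharsets.UTF_8);
    }

    /** Starts the given number of writers on the same key at once; returns how many succeeded. */
    static int raceWriters(PutIfAbsentSketch store, String key, int writers) {
        ExecutorService pool = Executors.newFixedThreadPool(writers);
        CountDownLatch start = new CountDownLatch(1);
        AtomicInteger winners = new AtomicInteger();
        for (int i = 0; i < writers; i++) {
            final String content = "snapshot-from-writer-" + i;
            pool.submit(() -> {
                try {
                    start.await(); // hold every writer until all are queued
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                if (store.tryToWriteAtomic(key, content)) {
                    winners.incrementAndGet();
                }
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for writers", e);
        }
        return winners.get();
    }
}
```

With any number of concurrent writers, exactly one call returns true and the others see the existing object. A temp-file-plus-rename sequence cannot give that guarantee when the rename itself is a copy followed by a delete.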

Tests

Since I cannot put OSS credentials (AK/SK) into a test case, I am not sure how to write a test for this change.

@MaxLinyun MaxLinyun force-pushed the fix/make-tryToWriteAtomic-atomic-in-oss branch from 7feed20 to 18e4e42 Compare June 13, 2026 08:49
@MaxLinyun MaxLinyun force-pushed the fix/make-tryToWriteAtomic-atomic-in-oss branch from 18e4e42 to a9c77ce Compare June 13, 2026 09:39
MaxLinyun (Author) commented

@JingsongLi Hi, could you help review this PR?


ObjectMetadata metadata = new ObjectMetadata();
metadata.setContentLength(bytes.length);
metadata.setHeader("x-oss-forbid-overwrite", "true");

Contributor

The direct SDK upload no longer carries fs.oss.server-side-encryption-algorithm into ObjectMetadata. The Hadoop OSS write path that this replaces sets ObjectMetadata#setServerSideEncryption before putObject; with catalogs configured for OSS SSE this can either violate bucket policies that require encrypted uploads or create unencrypted metadata/version files. Could we set the same SSE header from the configured Hadoop options/store before calling putObject?
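
One possible shape for this, as a sketch only: derive the request headers from the configured options before the upload. `OssPutHeadersSketch` and `buildHeaders` are hypothetical names, the options are modeled as a plain `Map` rather than a Hadoop `Configuration`, and `x-oss-server-side-encryption` is assumed to be the header that `ObjectMetadata#setServerSideEncryption` populates. In the real code path the value would more likely be passed to `setServerSideEncryption` directly.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: collects the headers a conditional, SSE-aware put would carry. */
final class OssPutHeadersSketch {

    // Hadoop OSS option named in the review comment above.
    static final String SSE_OPTION = "fs.oss.server-side-encryption-algorithm";

    // Header set in the diff to make the put fail when the object already exists.
    static final String FORBID_OVERWRITE_HEADER = "x-oss-forbid-overwrite";

    // Assumption: the OSS request header that carries the SSE algorithm.
    static final String SSE_HEADER = "x-oss-server-side-encryption";

    /** Builds request headers from Hadoop-style options (modeled here as a plain map). */
    static Map<String, String> buildHeaders(Map<String, String> hadoopOptions) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put(FORBID_OVERWRITE_HEADER, "true");

        String algorithm = hadoopOptions.get(SSE_OPTION);
        // Only add the SSE header when an algorithm is actually configured.
        if (algorithm != null && !algorithm.trim().isEmpty()) {
            headers.put(SSE_HEADER, algorithm.trim());
        }
        return headers;
    }
}
```

When the option is unset the result contains only the forbid-overwrite header, so catalogs without SSE would behave as in the current diff.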

Development

Successfully merging this pull request may close these issues.

[Bug] Data loss occurs when multiple Spark jobs write data concurrently