[core][filesystems] Add conditional write support for S3 and Azure ABFS #7218
tub wants to merge 2 commits into apache:master
Implements native conditional writes using the Hadoop 3.4+ API (fs.createFile().overwrite(false).build()), which leverages:
- S3: If-None-Match: * header (S3 conditional writes, Aug 2024)
- Azure ABFS: If-None-Match: * header (existing Azure feature)

Changes:
- Add FileIO.supportsConditionalWrite() and tryToWriteAtomicIfAbsent()
- S3FileIO: Override to use native conditional writes
- AzureFileIO: Override to use native conditional writes
- RenamingSnapshotCommit: Use conditional writes when available, eliminating the need for a metastore lock on S3/Azure

This allows S3 and Azure users to run Paimon without lock.enabled=true for safe concurrent commits.

Co-Authored-By: Claude <noreply@anthropic.com>
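The commit message names the two new FileIO methods but not their signatures. The following is a minimal sketch of how they could fit together, assuming a hypothetical signature `boolean tryToWriteAtomicIfAbsent(String path, String content)`. It uses an in-memory store in which `ConcurrentHashMap.putIfAbsent` stands in for the object store's server-side `If-None-Match: *` check. `SimpleFileIO` and `InMemoryConditionalFileIO` are illustrative names, not Paimon classes.

```java
import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical, simplified stand-in for Paimon's FileIO; the real interface
// has many more methods and uses Paimon's own Path type.
interface SimpleFileIO {
    /** Whether the backing store can atomically create-if-absent. */
    default boolean supportsConditionalWrite() {
        return false;
    }

    /**
     * Writes content only if nothing exists at path (assumed semantics).
     * Returns true if this call created the file, false if it already existed.
     */
    default boolean tryToWriteAtomicIfAbsent(String path, String content) throws IOException {
        throw new UnsupportedOperationException("conditional writes not supported");
    }
}

// In-memory model: putIfAbsent is atomic, mirroring an If-None-Match: *
// precondition where at most one creator of a given key wins.
class InMemoryConditionalFileIO implements SimpleFileIO {
    private final ConcurrentMap<String, String> objects = new ConcurrentHashMap<>();

    @Override
    public boolean supportsConditionalWrite() {
        return true;
    }

    @Override
    public boolean tryToWriteAtomicIfAbsent(String path, String content) {
        // A null previous value means this writer created the object.
        return objects.putIfAbsent(path, content) == null;
    }

    String read(String path) {
        return objects.get(path);
    }
}
```

With this design, the default methods keep every existing FileIO implementation unchanged. Only stores that can enforce the precondition opt in by overriding both methods.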
```java
        ? snapshotManager.snapshotPath(snapshot.id())
        : snapshotManager.copyWithBranch(branch).snapshotPath(snapshot.id());

// Use native conditional writes if supported
```
I'm not 100% sure if this is the right approach, or if we should rename RenamingSnapshotCommit given that it doesn't always use rename now.
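The naming question arises because the commit path now has two strategies. Below is a minimal sketch of that branching. It assumes the fallback is a lock-guarded exists-check followed by a write, modeled here with a JVM `ReentrantLock` and an in-memory map. `SnapshotCommitter` and its methods are hypothetical and do not reproduce the actual `RenamingSnapshotCommit` code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical commit helper showing both strategies side by side.
class SnapshotCommitter {
    private final Map<String, String> store = new ConcurrentHashMap<>();
    private final boolean conditionalWriteSupported;
    // Stands in for the external (e.g. metastore) lock the fallback relies on.
    private final Lock lock = new ReentrantLock();

    SnapshotCommitter(boolean conditionalWriteSupported) {
        this.conditionalWriteSupported = conditionalWriteSupported;
    }

    /** Returns true if this call published the snapshot at path. */
    boolean commit(String path, String snapshotJson) {
        if (conditionalWriteSupported) {
            // Native path: the store rejects the write if path already exists,
            // so no client-side lock is required.
            return store.putIfAbsent(path, snapshotJson) == null;
        }
        // Fallback path: without a server-side precondition, the exists-check
        // and the write must be serialized by a lock.
        lock.lock();
        try {
            if (store.containsKey(path)) {
                return false;
            }
            store.put(path, snapshotJson);
            return true;
        } finally {
            lock.unlock();
        }
    }
}
```

Because the map in this sketch is local, a JVM lock is enough here. In a real deployment, the fallback lock must be visible to every concurrent writer, which is the role the PR attributes to `lock.enabled=true`.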
We also have IcebergCommitCallbacks here, so we need to support it there as well.
I think overriding tryToWriteAtomic would be better, since it is used in several places. Perhaps we can reuse the same property, fs.s3a.create.conditional.enabled (https://github.com/apache/hadoop/blob/8222ac0f546911b387b6a141e06dd8bf5306d565/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/troubleshooting_s3a.md), to decide whether to use the fallback.
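This suggestion keeps a single `tryToWriteAtomic` entry point and lets the filesystem pick the strategy internally. Below is a minimal sketch of that shape. It assumes the flag is read from a plain key/value configuration and that `true` means the store honors `If-None-Match: *`. Only the property name comes from the comment above. The class, the `String`-based signature, and the `false` default are assumptions for illustration, and the default is not necessarily Hadoop's.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical FileIO variant: callers keep using one tryToWriteAtomic entry
// point, and the strategy is chosen from configuration.
class ConfigGatedS3FileIO {
    // Property name taken from the review comment; everything else is assumed.
    static final String CONDITIONAL_KEY = "fs.s3a.create.conditional.enabled";

    private final boolean conditionalEnabled;
    private final Map<String, String> objects = new ConcurrentHashMap<>();
    private volatile String lastStrategy = "none";

    ConfigGatedS3FileIO(Map<String, String> conf) {
        // Conservative default for this sketch, not necessarily Hadoop's default.
        this.conditionalEnabled =
                Boolean.parseBoolean(conf.getOrDefault(CONDITIONAL_KEY, "false"));
    }

    /** Returns true if this call wrote the file, false if it already existed. */
    boolean tryToWriteAtomic(String path, String content) {
        if (conditionalEnabled) {
            lastStrategy = "conditional";
            // Atomic create-if-absent, standing in for If-None-Match: *.
            return objects.putIfAbsent(path, content) == null;
        }
        lastStrategy = "fallback";
        return fallbackWrite(path, content);
    }

    // Assumed stand-in for the pre-existing behaviour; a JVM monitor replaces
    // the external lock that object stores need on this path.
    private synchronized boolean fallbackWrite(String path, String content) {
        if (objects.containsKey(path)) {
            return false;
        }
        objects.put(path, content);
        return true;
    }

    String lastStrategy() {
        return lastStrategy;
    }
}
```

Gating on a store-level property means callers such as snapshot commit and the Iceberg callbacks would not need their own branching.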
Iterating with @junmuz on this separately; will close this out.
Purpose
Implements native conditional writes using the Hadoop 3.4+ API (fs.createFile().overwrite(false).build()), which leverages If-None-Match: * for both AWS S3 and Azure ABFS.
Changes:
This allows S3 and Azure users to run Paimon without lock.enabled=true for safe concurrent commits.
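The safety claim rests on the store, rather than the client, arbitrating between concurrent writers. Below is a minimal sketch of that property. It uses an in-memory `ConcurrentHashMap` as an assumed stand-in for an object store that honors `If-None-Match: *`. Several threads race to create the same snapshot file, and exactly one succeeds. `CommitRace` and the path used in the example are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class CommitRace {
    /** Races `writers` threads to create `path`; returns how many succeeded. */
    static int race(int writers, String path) throws Exception {
        ConcurrentMap<String, String> store = new ConcurrentHashMap<>();
        CountDownLatch start = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(writers);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (int i = 0; i < writers; i++) {
                final String content = "writer-" + i;
                Callable<Boolean> task = () -> {
                    start.await(); // release all writers at the same moment
                    // Create-if-absent: only the first writer to reach the store wins.
                    return store.putIfAbsent(path, content) == null;
                };
                results.add(pool.submit(task));
            }
            start.countDown();
            int winners = 0;
            for (Future<Boolean> f : results) {
                if (f.get()) {
                    winners++;
                }
            }
            return winners;
        } finally {
            pool.shutdown();
        }
    }
}
```

For example, `CommitRace.race(8, "snapshot/snapshot-42")` should return 1 on every run. The losing writers receive `false` and must decide how to proceed (for example, by retrying against a later snapshot id). That logic is outside this sketch.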
Linked issue: close #6563
Builds on: #7187
Tests
API and Format
Not directly; this changes the default safety guarantees.
Documentation
Added documentation covering both AWS and Azure.