Skip to content

[#10741] fix(hive): Fix keytab symlink TOCTOU race in FetchFileUtils#10742

Merged
yuqi1129 merged 3 commits intoapache:mainfrom
diqiu50:upstream/hive-keytab-toctou-fix
Apr 15, 2026
Merged

[#10741] fix(hive): Fix keytab symlink TOCTOU race in FetchFileUtils#10742
yuqi1129 merged 3 commits intoapache:mainfrom
diqiu50:upstream/hive-keytab-toctou-fix

Conversation

@diqiu50
Copy link
Copy Markdown
Contributor

@diqiu50 diqiu50 commented Apr 10, 2026

What changes were proposed in this pull request?

Fix the file: scheme handler in FetchFileUtils to synchronize symlink creation and avoid concurrent races. Replace the non-atomic exists() + delete() in KerberosClient with Files.deleteIfExists(). Remove the unused duplicate FetchFileUtils in catalog-hive and add comprehensive tests in hive-metastore-common.

Why are the changes needed?

Files.createSymbolicLink() throws FileAlreadyExistsException when multiple threads concurrently fetch the same keytab path, such as when the search event listener triggers cascade sync after catalog creation.

Fix: #10741

Does this PR introduce any user-facing change?

No.

How was this patch tested?

  • added unit coverage in TestFetchFileUtils including concurrent, idempotent, and target-replacement scenarios

Copilot AI review requested due to automatic review settings April 10, 2026 11:41
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses a concurrency bug in Hive Kerberos keytab fetching by making file: URI handling in FetchFileUtils resilient to concurrent symlink creation, and by removing duplicated Hive fetch utilities/tests in catalog-hive while adding coverage in hive-metastore-common.

Changes:

  • Synchronize file: symlink creation in catalogs/hive-metastore-common FetchFileUtils to avoid concurrent FileAlreadyExistsException.
  • Replace exists() + delete() with Files.deleteIfExists() in KerberosClient.
  • Remove the duplicate FetchFileUtils and its tests from catalog-hive, and add new unit tests under hive-metastore-common.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
catalogs/hive-metastore-common/src/main/java/org/apache/gravitino/hive/kerberos/FetchFileUtils.java Adds synchronization + delete/recreate behavior for file: symlink creation under concurrency.
catalogs/hive-metastore-common/src/main/java/org/apache/gravitino/hive/kerberos/KerberosClient.java Switches to Files.deleteIfExists() prior to fetching keytab.
catalogs/hive-metastore-common/src/test/java/org/apache/gravitino/hive/kerberos/TestFetchFileUtils.java Adds tests for local symlink behavior + concurrency + replacement scenarios (and a disabled HTTP test).
catalogs/catalog-hive/src/main/java/org/apache/gravitino/catalog/hive/FetchFileUtils.java Removes duplicated implementation in catalog-hive.
catalogs/catalog-hive/src/test/java/org/apache/gravitino/catalog/hive/TestFetchFileUtils.java Removes duplicated tests in catalog-hive.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 10, 2026

Code Coverage Report

Overall Project 65.18% +0.02% 🟢
Files changed 77.88% 🟢

Module Coverage
aliyun 1.73% 🔴
api 47.14% 🟢
authorization-common 85.96% 🟢
aws 1.1% 🔴
azure 2.6% 🔴
catalog-common 10.2% 🔴
catalog-fileset 80.02% 🟢
catalog-hive 81.83% 🟢
catalog-jdbc-clickhouse 79.06% 🟢
catalog-jdbc-common 42.89% 🟢
catalog-jdbc-doris 80.28% 🟢
catalog-jdbc-hologres 54.03% 🟢
catalog-jdbc-mysql 79.23% 🟢
catalog-jdbc-oceanbase 78.38% 🟢
catalog-jdbc-postgresql 82.05% 🟢
catalog-jdbc-starrocks 78.27% 🟢
catalog-kafka 77.01% 🟢
catalog-lakehouse-generic 45.07% 🟢
catalog-lakehouse-hudi 79.1% 🟢
catalog-lakehouse-iceberg 87.16% 🟢
catalog-lakehouse-paimon 77.71% 🟢
catalog-model 77.72% 🟢
cli 44.51% 🟢
client-java 77.63% 🟢
common 49.35% 🟢
core 81.41% 🟢
filesystem-hadoop3 76.97% 🟢
flink 40.55% 🟢
flink-runtime 0.0% 🔴
gcp 14.2% 🔴
hadoop-common 10.39% 🔴
hive-metastore-common 46.14% +2.07% 🟢
iceberg-common 50.73% 🟢
iceberg-rest-server 65.82% 🟢
integration-test-common 0.0% 🔴
jobs 66.17% 🟢
lance-common 23.88% 🔴
lance-rest-server 57.84% 🟢
lineage 53.02% 🟢
optimizer 82.87% 🟢
optimizer-api 21.95% 🔴
server 85.89% 🟢
server-common 70.38% 🟢
spark 32.79% 🔴
spark-common 39.09% 🔴
trino-connector 33.83% 🔴
Files
Module File Coverage
hive-metastore-common KerberosClient.java 82.67% 🟢
FetchFileUtils.java 65.52% 🟢

@yuqi1129
Copy link
Copy Markdown
Contributor

What's the meaning of TOCTOU in the title?

@diqiu50
Copy link
Copy Markdown
Contributor Author

diqiu50 commented Apr 13, 2026

TOCTOU is Time Of Check To Time Of Use

- Replace String.intern() locking with a ConcurrentHashMap lock map keyed
  by normalized absolute destination path to avoid races from different
  path spellings of the same file
- Replace delete-then-create with an atomic tmp-symlink + rename
  (REPLACE_EXISTING) to eliminate the window where the keytab path is absent
- Skip symlink creation when the existing link already points to the
  correct target (idempotent fast path)
- Wrap ExecutorService in try/finally with shutdownNow in the concurrent test
- Remove permanently-disabled testDownloadFromHTTP (external network dependency)
@diqiu50 diqiu50 force-pushed the upstream/hive-keytab-toctou-fix branch from 23a4407 to adf6fe8 Compare April 13, 2026 06:51
@diqiu50 diqiu50 requested review from Copilot and yuqi1129 April 13, 2026 06:52
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

* file. Keyed by the normalized absolute destination path string to avoid races caused by
* different path spellings referring to the same file.
*/
private static final ConcurrentHashMap<String, Object> SYMLINK_LOCKS = new ConcurrentHashMap<>();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When will it be clear?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Close at the Kerberos client closed

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not see related close logic on it. As it's a static field, I believe it will only be clear when the classloader is closed.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a function removeLock, which is called when the Kerberos client closes.

- Add removeLock() to FetchFileUtils and call it in KerberosClient.close()
  so SYMLINK_LOCKS entries are cleared when the catalog is closed
- Remove pre-delete in saveKeyTabFileFromUri; FetchFileUtils handles
  idempotent replacement atomically for all schemes
- Clarify comment that same-directory rename is effectively atomic on
  common local filesystems
- Add 30s timeouts to barrier.await() and future.get() in the concurrency test
@diqiu50 diqiu50 self-assigned this Apr 14, 2026
@yuqi1129 yuqi1129 added branch-1.0 Automatically cherry-pick commit to branch-1.0 branch-1.2 Automatically cherry-pick commit to branch-1.2 and removed branch-1.0 Automatically cherry-pick commit to branch-1.0 labels Apr 15, 2026
@yuqi1129 yuqi1129 merged commit 89ade4f into apache:main Apr 15, 2026
29 checks passed
github-actions bot pushed a commit that referenced this pull request Apr 15, 2026
…10742)

### What changes were proposed in this pull request?

Fix the `file:` scheme handler in `FetchFileUtils` to synchronize
symlink creation and avoid concurrent races. Replace the non-atomic
`exists() + delete()` in `KerberosClient` with `Files.deleteIfExists()`.
Remove the unused duplicate `FetchFileUtils` in `catalog-hive` and add
comprehensive tests in `hive-metastore-common`.

### Why are the changes needed?

`Files.createSymbolicLink()` throws `FileAlreadyExistsException` when
multiple threads concurrently fetch the same keytab path, such as when
the search event listener triggers cascade sync after catalog creation.

Fix: #10741

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

- added unit coverage in `TestFetchFileUtils` including concurrent,
idempotent, and target-replacement scenarios
yuqi1129 pushed a commit that referenced this pull request Apr 15, 2026
…CTOU race in FetchFileUtils (#10742) (#10786)

**Cherry-pick Information:**
- Original commit: 89ade4f
- Target branch: `branch-1.2`
- Status: ✅ Clean cherry-pick (no conflicts)

Co-authored-by: Yuhui <hui@datastrato.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

branch-1.2 Automatically cherry-pick commit to branch-1.2

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug report] Concurrent keytab symlink creation causes FileAlreadyExistsException in Hive Kerberos

3 participants