Skip to content

fix(observability): do not hold span guards across await points#25482

Merged
gwenaskell merged 5 commits into
masterfrom
yoenn.burban/OPA-5380-avoid-span-enter-in-async-code
May 26, 2026
Merged

fix(observability): do not hold span guards across await points#25482
gwenaskell merged 5 commits into
masterfrom
yoenn.burban/OPA-5380-avoid-span-enter-in-async-code

Conversation

@gwenaskell
Copy link
Copy Markdown
Contributor

@gwenaskell gwenaskell commented May 21, 2026

Summary

In the topology builder, the components builder functions enter a tracing span, which carries the tags that will automatically be injected into the metrics registered at build time below. This entered span guard is held while calling the inner source build function, which is asynchronous. Holding an entered span across an await point is strongly discouraged in tracing docs: if the awaited future is non-trivial and actually yields to the tokio runtime, the entered span guard is leaked on the current thread's stack and lost if the task is later resumed on a different thread.

What this means concretely: any source, transform or sink which build method actually performs async work that pauses it temporarily will very likely loose all component tags on the metrics registered at build time within that source / sink after the await point.
This includes (the list may not be exhaustive):

  • all sinks configured with a disk buffer (which often yields during initialization due to IO work)
  • sources/transforms/sinks like gcp_pubsub and microsoft_sentinel, which perform IO work for authentication at build time

Vector configuration

Our E2E tests flagged missing metrics for gcp_pubsub, microsoft_sentinel, and sinks using disk buffers. utilization was frequently not reported.

How did you test this PR?

Change Type

  • Bug fix
  • New feature
  • Dependencies
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the no-changelog label to this PR.

References

Notes

  • Please read our Vector contributor resources.
  • Do not hesitate to use @vectordotdev/vector to reach out to us regarding this PR.
  • Some CI checks run only after we manually approve them.
    • We recommend adding a pre-push hook, please see this template.
    • Alternatively, we recommend running the following locally before pushing to the remote branch:
      • make fmt
      • make check-clippy (if there are failures it's possible some of them can be fixed with make clippy-fix)
      • make test
  • After a review is requested, please avoid force pushes to help us review incrementally.
    • Feel free to push as many commits as you want. They will be squashed into one before merging.
    • For example, you can run git merge origin master and git push.
  • If this PR introduces changes Vector dependencies (modifies Cargo.lock), please
    run make build-licenses to regenerate the license inventory and commit the changes (if any). More details on the dd-rust-license-tool.

…it. The test still works because tokio:: test runs single-threaded by default, but better fixing it than silencing the clippy error
@github-actions github-actions Bot added domain: topology Anything related to Vector's topology code domain: sources Anything related to the Vector's sources labels May 21, 2026
@gwenaskell gwenaskell changed the title fix(tracing): do not hold span guards across await points fix(observability): do not hold span guards across await points May 21, 2026
@gwenaskell gwenaskell marked this pull request as ready for review May 22, 2026 07:44
@gwenaskell gwenaskell requested a review from a team as a code owner May 22, 2026 07:44
@gwenaskell gwenaskell requested a review from bruceg May 22, 2026 13:22
Copy link
Copy Markdown
Member

@bruceg bruceg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me, and has a side benefit of reducing the complexity of those long topology builder functions.

Comment thread src/sources/internal_logs.rs Outdated
Comment thread src/sources/internal_logs.rs Outdated
Comment thread src/topology/builder.rs
@gwenaskell gwenaskell enabled auto-merge May 26, 2026 07:56
@gwenaskell gwenaskell added this pull request to the merge queue May 26, 2026
Merged via the queue into master with commit 3df9e43 May 26, 2026
82 checks passed
@gwenaskell gwenaskell deleted the yoenn.burban/OPA-5380-avoid-span-enter-in-async-code branch May 26, 2026 09:03
@github-actions github-actions Bot locked and limited conversation to collaborators May 26, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

domain: sources Anything related to the Vector's sources domain: topology Anything related to Vector's topology code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants