Fix S5 shutdown teardown: observe CloseAsync outcome + bound the wait (EventHub/ServiceBus)#203
Merged
Merged
Conversation
Reproduces the confirmed S5 defects via instrumented IEventHubService / ICloudServiceBus fakes that record the order of batch sends relative to CloseAsync and can make CloseAsync slow and/or throw: - Shutdown_WhenCloseAsyncThrows_IsLoggedAndDoesNotPropagate: a CloseAsync that faults after the hard 500ms wait becomes a silent unobserved task exception (not logged) today. FAILS pre-fix. - Shutdown_SlowCloseIsAwaitedNotAbandoned: an 800ms close is abandoned by the 500ms cap; for ServiceBus this also leaks the undisposed client. FAILS pre-fix. - Shutdown_FlushesQueuedEventsBeforeClosingConnection: invariant guard - NLog drains the queue before CloseTarget, so nothing is sent over a closed connection. PASSES pre-fix (the audit's "log loss" premise is disproven; the reorder in the fix is defensive). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ose, bound the wait
EventHub and ServiceBus CloseTarget did:
var task = Task.Run(() => _svc.CloseAsync());
task.Wait(TimeSpan.FromMilliseconds(500)); // bool result discarded
base.CloseTarget();
Reproduction (instrumented fakes + LogFactory) showed two real defects and
disproved a third:
1. Unobserved task exception: a CloseAsync that faults AFTER the 500ms wait
times out becomes a silent unobserved TaskException - never logged. The
discard also hides faults that complete within the window.
2. Abandoned teardown: the hard 500ms cap abandons a legitimately slow close
mid-flight. For ServiceBus the abandoned step is _client.DisposeAsync(),
leaking the undisposed client.
3. "Shutdown log loss" (the audit premise): NOT reproduced. NLog drains the
queue (WriteAsyncTask) before CloseTarget runs, so events are never sent
over a closed connection in any standard shutdown path.
Fix (symmetric across both targets):
- Call base.CloseTarget() FIRST so NLog drains any remaining queued events
over the still-open connection, then close the connection.
- Keep the synchronous sync-over-async bridge but make it correct: a faulted
close that completes within the wait is observed and logged via
InternalLogger; on timeout a fault-observing continuation is attached so a
later fault is never an unobserved task exception, and a warning is logged.
- CloseTarget never throws.
- Replace the arbitrary 500ms cap with a 5s bound generous enough for a
healthy-but-slow teardown; because the outcome is always observed, the
bound governs only how long shutdown blocks, never correctness.
Turns the regression tests added in the previous commit green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
Pull request overview
This PR fixes shutdown teardown behavior for EventHubTarget and ServiceBusTarget by ensuring connection close is performed after NLog has drained queued events, and by ensuring CloseAsync() outcomes are always observed (avoiding unobserved task exceptions) while keeping shutdown blocking bounded.
Changes:
- Reorders shutdown to
base.CloseTarget()first, then performs a bounded wait for connection close, with async fault observation on timeout. - Adds a 5s close timeout and InternalLogger warnings for close failures/timeouts (without throwing from
CloseTarget). - Adds new regression tests and instrumented fakes to validate flush-before-close ordering, slow close awaiting, and close-fault logging behavior.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
src/NLog.Extensions.AzureEventHub/EventHubTarget.cs |
Reorders shutdown and adds bounded/observed connection close logic for EventHub. |
src/NLog.Extensions.AzureServiceBus/ServiceBusTarget.cs |
Reorders shutdown and adds bounded/observed connection close logic for ServiceBus (including dispose completion). |
test/NLog.Extensions.AzureEventHub.Tests/EventHubShutdownTests.cs |
New regression tests covering shutdown flush ordering and close fault/timeout behavior (EventHub). |
test/NLog.Extensions.AzureEventHub.Tests/RecordingEventHubService.cs |
New instrumented IEventHubService fake to record send/close ordering and simulate slow/throwing close. |
test/NLog.Extensions.AzureServiceBus.Tests/ServiceBusShutdownTests.cs |
New regression tests covering shutdown flush ordering and close fault/timeout behavior (ServiceBus). |
test/NLog.Extensions.AzureServiceBus.Tests/RecordingCloudServiceBus.cs |
New instrumented ICloudServiceBus fake to record send/close ordering and model sender-close + client-dispose. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Quality cleanup of the S5 fix (no behaviour change; all tests still pass): - Hoist the duplicated ~32-line sync-over-async close logic out of both EventHubTarget and ServiceBusTarget into one shared, linked helper (AsyncTargetCloseHelper, alongside ProxyHelpers/SortHelpers). The subtle observe-on-timeout logic now lives once and can't drift between targets; each target's CloseTarget is a 3-line try/finally. - Collapse the helper's nested try/catch + `bool completed` into a single try/catch: a close that faults within the wait and one that faults after the timeout are handled the same way, and the warning message + the 5s timeout are defined once instead of repeated per call site. - Trim the recording test fakes: drop the Timeline/DumpTimeline/CloseStarted bookkeeping (and the unused partition-key plumbing) left over from the deleted discovery harness; the tests only assert the send/close counters. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The new EventHub/ServiceBus shutdown test classes mutate the global
NLog.InternalLogger.LogWriter/LogLevel but were not enrolled in the
existing [Collection("InternalLogger isolation")] that disables
parallelization for InternalLogger-mutating tests. Add the attribute so
they cannot run concurrently with sibling tests that read/write the same
global state, removing a potential source of suite flakiness.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.



S5 — shutdown teardown on
EventHubTarget/ServiceBusTargetThe audit (BUGHUNT.md, S5 remainder) flagged this as "shutdown log loss (flush abandoned)". Reproducing before writing tests corrected that premise. The headline "log loss" does not happen; two adjacent defects do.
Both targets had:
CloseAsync()closes the connection (EventHub_client.CloseAsync(); ServiceBus_sender.CloseAsync()then_client.DisposeAsync()), not the NLog queue — draining the queue isAsyncTaskTarget's job (neither target overridesFlushAsync). The connection was closed beforebase.CloseTarget().Reproduction (instrumented fakes +
LogFactory)I added recording
IEventHubService/ICloudServiceBusfakes that log the order of batch sends relative toCloseAsync, and can makeCloseAsyncslow and/or throw. Findings onorigin/master:CloseTarget? Any send over a closed client?CloseTargetin every shutdown path (Shutdown(),Configuration = null,Dispose()). Timeline: `sendCloseAsyncthrows after the 500 ms wait elapses — observed anywhere?TaskException(confirmed:UnobservedTaskExceptionfired once, nothing in the internal log). Real defect.CloseAsyncthrows within 500 ms — doesWaitrethrow out ofCloseTarget?Target.Close()already swallowsCloseTargetexceptions. Throwing anAggregateExceptionout ofCloseTargetis still poor form and gets logged as a scary error.shutdownMs≈502, connection still closing). For ServiceBus the abandoned step is_client.DisposeAsync()→ leaked client. Real defect.Fix (symmetric across both targets)
base.CloseTarget()first so NLog flushes any remaining queued events over the still-open connection, then close the connection — never the reverse. (Defensive: repro showed NLog already flushes first, but this makes the ordering correct by construction.)InternalLogger; on timeout a fault-observing continuation is attached so a later fault can never become an unobserved task exception, plus a warning is logged.CloseTargetnever throws.DisposeAsync()is no longer silently abandoned.CloseTargetstays synchronous (NLog signature) with a correct sync-over-async bridge.Tests (test-first; reverting the fix fails them)
Driven through the fakes +
LogFactory:Shutdown_WhenCloseAsyncThrows_IsLoggedAndDoesNotPropagate— a 900 ms-then-throwing close is logged viaInternalLoggerand does not propagate (was a silent unobserved exception). Fails pre-fix.Shutdown_SlowCloseIsAwaitedNotAbandoned— an 800 ms close runs to completion during shutdown; for ServiceBus the client dispose completes too. Fails pre-fix (abandoned at 500 ms).Shutdown_FlushesQueuedEventsBeforeClosingConnection— invariant guard: all events sent, none over a closed connection. Passes pre- and post-fix — documents the disproven premise; not a revert-fails test.Verification
EventHubTargettests: 10/10.ServiceBusTargettests: 9/9 (DOTNET_ROLL_FORWARD=Major, net8 target on net9/10 runtimes).srcprojects acrossnetstandard2.0; net8.0; net10.0: 0 warnings, 0 errors.TreatWarningsAsErrorsnot enabled.🤖 Generated with Claude Code