Tests: SNIClose close-during-async-IO regression tests (ADO #43847 / ICM 775308542)#4420
Open
paulmedynski wants to merge 5 commits into
Open
Tests: SNIClose close-during-async-IO regression tests (ADO #43847 / ICM 775308542)#4420paulmedynski wants to merge 5 commits into
paulmedynski wants to merge 5 commits into
Conversation
Adds gated-server repro tests that close/dispose a connection while an async network read is in flight, using the in-proc TDS server with a query engine that stalls the response. Findings: - On managed SNI (Linux) these tests PASS: disposing the socket completes the pending ReadAsync with ObjectDisposedException, so there is no deadlock in the managed teardown path. - The ICM deadlock is specific to the native SNIClose path on Windows, which cannot be exercised from a Linux host. These tests are intended to reproduce that hang on Windows/native SNI and serve as bounded-wait regression guards elsewhere. No fix applied yet.
…ke, MARS) Expands the ADO #43847 / ICM 775308542 investigation into whether SNIClose can deadlock with in-flight async I/O during connection close. UnitTests (SNICloseDeadlockTest.cs), in-process TDS server: - Strengthen the existing post-login query-read scenarios to assert the async read is genuinely pending at close (readTask not completed, connection Open). - Add pre-login stall scenarios (bare TcpListener withholds the pre-login response) for Close and Dispose. - Add faithful TLS-over-TDS handshake scenarios: the listener advertises ENCRYPT_ON, reads the client's ClientHello, then withholds the ServerHello so the client is inside SslStream.AuthenticateAsClient with a pending SNI read at close. ManualTests (MarsCloseDeadlockTest.cs), live SQL Server: - Add MARS (smux logical-session) Close/Dispose scenarios using WAITFOR DELAY to keep an async read pending; the in-process TDS server does not support MARS. All scenarios pass on net9.0 and net462: current native SNI drains the pending async I/O and closes promptly in every path (post-login, pre-login, TLS handshake, and MARS).
Contributor
There was a problem hiding this comment.
Pull request overview
Adds regression coverage to ensure SqlConnection.Close() / Dispose() do not deadlock when native SNI has an in-flight async read during connection teardown, spanning post-login, pre-login, TLS-handshake, and MARS logical-session phases.
Changes:
- Added in-process simulated-server unit tests covering close/dispose with a pending async read after login, during pre-login, and during TLS-over-TDS handshake.
- Added a live SQL Server manual test covering the MARS SMUX logical-session path with a pending async read during close/dispose.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/Microsoft.Data.SqlClient/tests/UnitTests/SimulatedServerTests/SNICloseDeadlockTest.cs | New simulated-server unit tests that force pending reads and verify close/dispose completes within a bounded timeout across multiple connection phases. |
| src/Microsoft.Data.SqlClient/tests/ManualTests/SQL/AsyncTest/MarsCloseDeadlockTest.cs | New manual live-server MARS test that keeps an async read pending on a logical session and verifies close/dispose completes promptly. |
…on-test docs Resolves copilot-pull-request-reviewer feedback on #4420: - Guard the finally-block connection.Dispose() cleanup with closedInTime in all four scenarios (post-login read, pre-login, TLS handshake, MARS). When a close deadlock is detected (closedInTime == false), skip the potentially-blocking Dispose() so the bounded-wait Assert fails fast instead of hanging. - Correct the SNICloseDeadlockTest class doc: these are regression guards that PASS on the current code base (MDS drains in-flight async I/O and closes promptly); they would fail only if a future change reintroduced the deadlock. Tests: SNICloseDeadlockTest 6/6 pass on net9.0 and net462; MarsCloseDeadlockTest 2/2 pass on net9.0 against a live server.
Replace each pair of Close/Dispose [Fact] methods plus their private helper with a single [Theory] parameterized by disposeInsteadOfClose ([InlineData(false)]/[InlineData(true)]), matching the MARS manual test's style. No behavior change; still 6 cases (3 theories x 2).
…anup Second round of copilot-pull-request-reviewer feedback on #4420: - Run the Close/Dispose worker on a dedicated long-running thread (Task.Factory.StartNew(..., TaskCreationOptions.LongRunning)) instead of a thread-pool thread, so a hypothetical deadlock regression parks one dedicated thread rather than starving the shared pool. - Wrap each scenario body in try/finally so cleanup always runs even if a precondition assert throws: the release event is always signaled, the TCP listener is always stopped, and the command/connection are disposed when the close worker never started or completed. The blocking connection.Dispose() is still skipped when a deadlock is actually detected (closedInTime == false), so the failure stays bounded and no listener/connection is leaked. Applies to all four scenarios (post-login read, pre-login, TLS handshake, MARS). Tests: SNICloseDeadlockTest 6/6 pass on net9.0 and net462; MarsCloseDeadlockTest 2/2 pass on net9.0.
Comment on lines
+251
to
+264
| try | ||
| { | ||
| // Read one byte of the client's pre-login packet. Once this | ||
| // returns the client has sent its pre-login and posted its | ||
| // async read for the response. We only need to know data | ||
| // arrived, so a single byte is sufficient. | ||
| stream.ReadByte(); | ||
| } | ||
| catch | ||
| { | ||
| // The client may tear down the socket; ignore. | ||
| } | ||
|
|
||
| clientConnected.Set(); |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Investigation and regression tests for ADO #43847 / ICM 775308542 — "SNIClose can deadlock with in-flight async I/O during connection close."
The original incident was reported against the retired System.Data.SqlClient; this work verifies whether the same deadlock exists in current Microsoft.Data.SqlClient native SNI by reproducing "close while an async read is in flight" across every relevant connection phase.
Result: no deadlock reproduced in any tested path on net9.0 or net462. Current native SNI drains/cancels the pending async I/O and closes promptly in each scenario.
Tests added
UnitTests —
SNICloseDeadlockTest.cs(in-process TDS server)readTasknot completed, connectionOpen).TcpListenerwithholds the pre-login response soOpenAsync's read pends during connection establishment.ENCRYPT_ON, reads the client's TLS ClientHello, then withholds the ServerHello, leaving the client insideSslStream.AuthenticateAsClientwith a pending SNI read on the SSL-over-TDS transport at close time.ManualTests —
MarsCloseDeadlockTest.cs(live SQL Server)MultipleActiveResultSets=truewithWAITFOR DELAYto keep an async read pending on a MARS logical session at close. This path requires a real server because the in-process TDS test server does not support MARS. Verified against a local SQL Server on net9.0 and net462.Each scenario closes on a worker thread under a bounded wait; a deadlock would manifest as the wait timing out.
Results
Notes
TdsParserStateObjectNative.Dispose()still carries anUNDONEcomment about blocking for pending callbacks on logical (MARS) connections, andDisposeCounters()waits on_readingCountbut not_pendingCallbacks. These remain latent hardening opportunities, but no live deadlock was found in current MDS.Checklist
Draft: opened for review/discussion of the investigation outcome.