Skip to content

async client: retry on bare ServerDisconnectedError#726

Merged
joe-clickhouse merged 1 commit intoClickHouse:mainfrom
IAL32:fix/async-retry-server-disconnected
Apr 30, 2026
Merged

async client: retry on bare ServerDisconnectedError#726
joe-clickhouse merged 1 commit intoClickHouse:mainfrom
IAL32:fix/async-retry-server-disconnected

Conversation

@IAL32
Copy link
Copy Markdown
Contributor

@IAL32 IAL32 commented Apr 30, 2026

Summary

The async client's _raw_request retry path catches aiohttp.ServerConnectionError and retries once when str(e) contains "Connection reset", "Remote end closed", or "Cannot connect". It does not match the bare aiohttp.ServerDisconnectedError() whose default message is "Server disconnected", which is what aiohttp raises in current versions when a pooled keep-alive connection has been closed by the server (or an upstream load balancer) between requests. The first request after an idle period reuses the stale connection, sees the disconnect, and surfaces as OperationalError("Network Error: Server disconnected") to the caller instead of being transparently retried on a fresh connection.

The fix adds "Server disconnected" to the existing substring list. Reproduces and verifies in tests/integration_tests/test_error_handling.py against a real ClickHouse server using the test_native_async_client fixture and pytest-mock to inject the disconnect at the _session.request boundary.

Tests added:

  • test_async_retry_on_server_disconnected: bare ServerDisconnectedError() is retried once and the second attempt succeeds. Fails on main, passes with this change.
  • test_async_server_disconnected_raises_after_retry: if the disconnect persists, OperationalError("Network Error: Server disconnected") still surfaces with the original ServerDisconnectedError as __cause__.
  • test_async_retry_on_connection_reset: regression check that the pre-existing "Connection reset" retry branch still works.

This is async-only. The sync client uses urllib3, which retries pooled-connection drops independently via Retry.

Closes #725

Checklist

Delete items not relevant to your PR:

  • Unit and integration tests covering the common scenarios were added
  • A human-readable description of the changes was provided to include in CHANGELOG

aiohttp raises ServerDisconnectedError without a message when a pooled
keep-alive connection has been closed by the server (or an upstream load
balancer) and the client picks it up on the next request. The existing
retry path in _raw_request matched on "Connection reset" / "Remote end
closed" / "Cannot connect" but not on the bare "Server disconnected"
message, so the failure surfaced as OperationalError on the first
post-idle request instead of being transparently retried.

Add "Server disconnected" to the retry substring list and add async
integration coverage in test_error_handling.py for both the retried-once
and retry-also-fails paths, plus a regression check for the existing
"Connection reset" branch.
@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Apr 30, 2026

CLA assistant check
All committers have signed the CLA.

Copy link
Copy Markdown
Contributor

@joe-clickhouse joe-clickhouse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Thanks!

@joe-clickhouse joe-clickhouse merged commit 4d2dffa into ClickHouse:main Apr 30, 2026
33 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Async client raises OperationalError on stale pooled connection instead of retrying

3 participants