Commit 46e2d6e
[2/7] Telemetry Infrastructure: CircuitBreaker and FeatureFlagCache (#325)
* Add telemetry infrastructure: CircuitBreaker and FeatureFlagCache
This is part 2 of 7 in the telemetry implementation stack.
Components:
- CircuitBreaker: Per-host endpoint protection with state management
- FeatureFlagCache: Per-host feature flag caching with reference counting
- CircuitBreakerRegistry: Manages circuit breakers per host
Circuit Breaker:
- States: CLOSED (normal), OPEN (failing), HALF_OPEN (testing recovery)
- Default: 5 failures trigger OPEN, 60s timeout, 2 successes to CLOSE
- Per-host isolation prevents cascade failures
- All state transitions logged at debug level
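A minimal sketch of the state machine described above, using the listed defaults (5 failures, 60s timeout, 2 successes). The class name `SketchCircuitBreaker`, its method names, and the injected clock are hypothetical and not taken from the driver. Counting consecutive failures is an assumption. Reopening on any HALF_OPEN failure follows the behaviour a later change in this commit settles on.

```typescript
type BreakerState = 'CLOSED' | 'OPEN' | 'HALF_OPEN';

interface BreakerConfig {
  failureThreshold: number; // failures before the circuit opens
  timeoutMs: number; // time spent OPEN before a recovery probe is allowed
  successThreshold: number; // successes in HALF_OPEN before closing again
}

// Hypothetical illustration only; not the driver's CircuitBreaker class.
class SketchCircuitBreaker {
  private state: BreakerState = 'CLOSED';
  private failures = 0;
  private successes = 0;
  private openedAt = 0;

  constructor(
    private readonly config: BreakerConfig = { failureThreshold: 5, timeoutMs: 60000, successThreshold: 2 },
    private readonly now: () => number = () => Date.now(),
  ) {}

  // Reading the state also performs the timed OPEN -> HALF_OPEN transition.
  getState(): BreakerState {
    if (this.state === 'OPEN' && this.now() - this.openedAt >= this.config.timeoutMs) {
      this.state = 'HALF_OPEN';
      this.successes = 0;
    }
    return this.state;
  }

  allowRequest(): boolean {
    return this.getState() !== 'OPEN';
  }

  recordSuccess(): void {
    const state = this.getState();
    if (state === 'HALF_OPEN') {
      this.successes += 1;
      if (this.successes >= this.config.successThreshold) {
        this.state = 'CLOSED';
        this.failures = 0;
      }
    } else if (state === 'CLOSED') {
      this.failures = 0; // assumption: only consecutive failures count
    }
  }

  recordFailure(): void {
    const state = this.getState();
    if (state === 'HALF_OPEN') {
      this.open(); // any failed probe reopens immediately
    } else if (state === 'CLOSED') {
      this.failures += 1;
      if (this.failures >= this.config.failureThreshold) this.open();
    }
  }

  private open(): void {
    this.state = 'OPEN';
    this.openedAt = this.now();
    this.failures = 0;
  }
}
```

Per-host isolation would then amount to keeping one such instance per host in a map, so one failing endpoint cannot open the circuit for another.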
Feature Flag Cache:
- Per-host caching with 15-minute TTL
- Reference counting for connection lifecycle management
- Automatic cache expiration and refetch
- Context removed when refCount reaches zero
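The lifecycle above can be sketched roughly as follows. `SketchFeatureFlagCache`, `acquire`, `release`, and `isEnabled` are hypothetical names, and the flag fetch is shown as a synchronous callback for brevity, whereas a real fetch would be an HTTP call.

```typescript
interface FlagContext {
  refCount: number; // number of open connections to this host
  value?: boolean; // last fetched flag value
  fetchedAt?: number; // clock reading at the last fetch
}

// Hypothetical illustration only; not the driver's FeatureFlagCache class.
class SketchFeatureFlagCache {
  private readonly contexts = new Map<string, FlagContext>();

  constructor(
    private readonly ttlMs: number = 15 * 60 * 1000,
    private readonly now: () => number = () => Date.now(),
  ) {}

  // Called when a connection to `host` opens.
  acquire(host: string): void {
    const ctx = this.contexts.get(host);
    if (ctx) {
      ctx.refCount += 1;
    } else {
      this.contexts.set(host, { refCount: 1 });
    }
  }

  // Called when a connection closes; the context goes away with the last reference.
  release(host: string): void {
    const ctx = this.contexts.get(host);
    if (!ctx) return;
    ctx.refCount -= 1;
    if (ctx.refCount <= 0) this.contexts.delete(host);
  }

  has(host: string): boolean {
    return this.contexts.has(host);
  }

  // Returns the cached value, refetching when it is missing or older than the TTL.
  isEnabled(host: string, fetchFlag: (host: string) => boolean): boolean {
    const ctx = this.contexts.get(host);
    if (!ctx) return false; // assumption: no live connection means "disabled"
    const expired = ctx.fetchedAt === undefined || this.now() - ctx.fetchedAt >= this.ttlMs;
    if (expired) {
      ctx.value = fetchFlag(host);
      ctx.fetchedAt = this.now();
    }
    return ctx.value === true;
  }
}
```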
Testing:
- 32 comprehensive unit tests for CircuitBreaker
- 29 comprehensive unit tests for FeatureFlagCache
- 100% function coverage, >80% line/branch coverage
- CircuitBreakerStub for testing other components
Dependencies:
- Builds on [1/7] Types and Exception Classifier
* Add authentication support for REST API calls
Implements getAuthHeaders() method for authenticated REST API requests:
- Added getAuthHeaders() to IClientContext interface
- Implemented in DBSQLClient using authProvider.authenticate()
- Updated FeatureFlagCache to fetch from connector-service API with auth
- Added driver version support for version-specific feature flags
- Replaced placeholder implementation with actual REST API calls
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* Fix feature flag and telemetry export endpoints
- Change feature flag endpoint to use NODEJS client type
- Fix telemetry endpoints to /telemetry-ext and /telemetry-unauth
- Update payload to match proto with system_configuration
- Add shared buildUrl utility for protocol handling
* Match JDBC telemetry payload format
- Change payload structure to match JDBC: uploadTime, items, protoLogs
- protoLogs contains JSON-stringified TelemetryFrontendLog objects
- Remove workspace_id (JDBC doesn't populate it)
- Remove debug logs added during testing
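For orientation, the payload shape named above might be assembled like this. Only the three field names come from the commit message. The field types, the empty `items` array, and the `buildPayload` helper are assumptions made for illustration.

```typescript
// Stand-in for the driver's log object; its real fields are not shown in this commit message.
type SketchFrontendLog = Record<string, unknown>;

interface SketchTelemetryPayload {
  uploadTime: number; // assumption: epoch milliseconds
  items: string[]; // assumption: left empty
  protoLogs: string[]; // each entry is one JSON-stringified log object
}

// Hypothetical helper: wraps a batch of logs in the JDBC-style envelope.
function buildPayload(logs: SketchFrontendLog[], now: number = Date.now()): SketchTelemetryPayload {
  return {
    uploadTime: now,
    items: [],
    protoLogs: logs.map((log) => JSON.stringify(log)),
  };
}
```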
* Fix lint errors
- Fix import order in FeatureFlagCache
- Replace require() with import for driverVersion
- Fix variable shadowing
- Disable prefer-default-export for urlUtils
* Add missing getAuthHeaders method to ClientContextStub
Fix TypeScript compilation error by implementing getAuthHeaders
method required by IClientContext interface.
* Fix prettier formatting
* Add DRIVER_NAME constant for nodejs-sql-driver
* Add missing telemetry fields to match JDBC
Added osArch, runtimeVendor, localeName, charSetEncoding, and
processName fields to DriverConfiguration to match JDBC implementation.
* Fix TypeScript compilation: add missing fields to system_configuration interface
* Fix telemetry PR review comments from #325
Three fixes addressing review feedback:
1. Fix documentation typo (sreekanth-db comment)
- DatabricksTelemetryExporter.ts:94
- Changed "TelemetryFrontendLog" to "DatabricksTelemetryLog"
2. Add proxy support (jadewang-db comment)
- DatabricksTelemetryExporter.ts:exportInternal()
- Get HTTP agent from connection provider
- Pass agent to fetch for proxy support
- Follows same pattern as CloudFetchResultHandler and DBSQLSession
- Supports http/https/socks proxies with authentication
3. Fix flush timer to prevent rate limiting (sreekanth-db comment)
- MetricsAggregator.ts:flush()
- Reset timer after manual flushes (batch size, terminal errors)
- Ensures consistent 30s spacing between exports
- Prevents rapid successive flushes (e.g., batch at 25s, timer at 30s)
All changes follow existing driver patterns and maintain backward compatibility.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* Add proxy support to feature flag fetching
Feature flag fetching was missing proxy support, just as the telemetry
exporter had been. Applied the same fix:
- Get HTTP agent from connection provider
- Pass agent to fetch call for proxy support
- Follows same pattern as CloudFetchResultHandler and DBSQLSession
- Supports http/https/socks proxies with authentication
This completes proxy support for all HTTP operations in the telemetry
system (both telemetry export and feature flag fetching).
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* Address PR #325 review feedback
- Fix CircuitBreaker HALF_OPEN: any failure now immediately reopens the
circuit instead of accumulating to failureThreshold, aligning with
Resilience4j behavior used in the JDBC driver
- Add maxPendingMetrics bound to MetricsAggregator (default 500) to
prevent unbounded buffer growth when exports keep failing
- Inline buildUrl into DatabricksTelemetryExporter and FeatureFlagCache;
remove urlUtils.ts (single-use utility)
- Add missing unit tests for DatabricksTelemetryExporter, MetricsAggregator,
and TelemetryEventEmitter (HTTP calls, retry logic, batching, flush
timers, event routing)
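A rough sketch of the `maxPendingMetrics` idea: a buffer that stops growing once it reaches its cap. The commit message does not say which entries are discarded, so dropping the oldest entry here is an assumption, and `BoundedBuffer` is a hypothetical name.

```typescript
// Hypothetical illustration only; not the driver's MetricsAggregator.
class BoundedBuffer<T> {
  private items: T[] = [];
  private dropped = 0;

  constructor(private readonly maxPending: number = 500) {}

  // Appends an item; once over the cap, the oldest item is discarded (assumption).
  push(item: T): void {
    this.items.push(item);
    if (this.items.length > this.maxPending) {
      this.items.shift();
      this.dropped += 1;
    }
  }

  // Hands the current batch to the caller and starts a fresh buffer.
  drain(): T[] {
    const batch = this.items;
    this.items = [];
    return batch;
  }

  size(): number {
    return this.items.length;
  }

  droppedCount(): number {
    return this.dropped;
  }
}
```

With a cap in place, an endpoint that keeps failing costs at most `maxPending` buffered entries instead of growing memory for as long as the process lives.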
Co-authored-by: Isaac
* refactor: replace node-fetch injection with sendRequest via connection provider
Address PR #325 review feedback: instead of accepting a `fetchFunction`
constructor parameter, `DatabricksTelemetryExporter` now delegates HTTP
calls to a private `sendRequest()` method that retrieves the agent from
`IConnectionProvider.getAgent()` — the same pattern used by
`CloudFetchResultHandler` and `DBSQLSession`. This keeps proxy support
intact while removing the direct `node-fetch` coupling from the public API.
Tests updated to stub `sendRequest` on the instance via sinon instead of
injecting a fetch function through the constructor.
Co-authored-by: Isaac
* style: fix prettier formatting in telemetry files
Co-authored-by: Isaac
* fix: address telemetry code review issues
- ExceptionClassifier: classify ECONNREFUSED, ENOTFOUND, EHOSTUNREACH,
ECONNRESET as retryable — these are transient network failures that
were previously falling through as non-retryable and silently dropped
- DatabricksTelemetryExporter: read driver version from lib/version.ts
instead of hardcoded '1.0.0'; use uuid.v4() instead of hand-rolled
UUID generation which had incorrect version/variant bits
Co-authored-by: Isaac
* fix: address code review findings for telemetry infrastructure
- Add workspace_id to telemetry log serialization (was silently dropped)
- Fix socket leak: consume HTTP response body on success and error paths
- Add 10s timeout to all telemetry/feature-flag fetch calls
- Fix thundering herd: deduplicate concurrent feature flag fetches
- Add Promise.resolve().catch() to flush() to prevent unhandled rejections
- Add HALF_OPEN inflight guard to CircuitBreaker (limit to 1 probe)
- Rename TelemetryEventType.ERROR to 'telemetry.error' (avoid EventEmitter collision)
- Extract shared buildTelemetryUrl utility enforcing HTTPS
- Clamp server-provided TTL to [60, 3600] seconds
- Skip export silently when auth headers are missing
- Log circuit breaker OPEN transitions at warn level
- Fix CircuitBreakerStub HALF_OPEN behavior to match real implementation
- Snapshot Map keys before iteration in close()
- Remove unnecessary 'as any' cast and '| string' type widening
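Two items from this list are easy to illustrate in isolation: clamping a server-provided TTL to [60, 3600] seconds, and collapsing concurrent fetches for the same host into one in-flight promise. `clampTtlSeconds` and `SingleFlight` are hypothetical names. Treating `NaN` as the minimum is an assumption.

```typescript
// Clamp a server-provided TTL (seconds) into the allowed range.
function clampTtlSeconds(serverTtl: number, min: number = 60, max: number = 3600): number {
  if (Number.isNaN(serverTtl)) return min; // assumption: unusable input falls back to the minimum
  return Math.min(max, Math.max(min, serverTtl));
}

// Hypothetical helper: concurrent callers for the same key share one in-flight promise.
class SingleFlight<T> {
  private readonly inflight = new Map<string, Promise<T>>();

  run(key: string, task: () => Promise<T>): Promise<T> {
    const existing = this.inflight.get(key);
    if (existing) return existing;

    // Clear the entry once the task settles so the next caller triggers a fresh fetch.
    const promise = task().then(
      (value) => {
        this.inflight.delete(key);
        return value;
      },
      (error) => {
        this.inflight.delete(key);
        throw error;
      },
    );
    this.inflight.set(key, promise);
    return promise;
  }
}
```

When many connections to one host start at the same moment, only the first caller reaches the network. The rest await the same promise.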
Co-authored-by: Isaac
* style: fix prettier formatting in CircuitBreaker.ts
Co-authored-by: Isaac
* fix: resolve ESLint errors in telemetry modules
Use dot notation for Authorization header access and convert
buildTelemetryUrl to default export to satisfy lint rules.
Co-authored-by: Isaac
* fix: harden telemetry security + address code-review-squad findings
Security (Critical):
- buildTelemetryUrl now rejects protocol-relative //prefix, zero-width
and BOM codepoints, CR/LF/tab, userinfo, path/query/fragment, and
loopback/RFC1918/IMDS/localhost/GCP+Azure-metadata hosts. Defeats
the SSRF-shaped Bearer-token exfil vector an attacker-influenced
host (env var, tampered config) could trigger.
- redactSensitive now covers real Databricks secret shapes: dapi/
dkea/dskea/dsapi/dose PATs, JWT triplets, JSON-quoted access_token/
client_secret/refresh_token/id_token/password/api_key, Basic auth,
URL-embedded credentials. Re-applies after truncation.
- Unauthenticated export now omits system_configuration entirely,
strips userAgentEntry from User-Agent, and blanks stack_trace, so
on-path observers cannot re-identify clients on the unauth path.
- sanitizeProcessName drops argv tail (handles node --db-password=X
app.js shape).
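To make the host checks above concrete, here is a deliberately small sketch of that kind of validation. It is not the driver's `buildTelemetryUrl` and is not an exhaustive SSRF filter: it rejects ports and IPv6 literals wholesale to stay short, and does not handle alternate IP encodings. The function name, the `null` return on refusal, and the accepted input shape (a bare hostname, optionally prefixed with `https://`) are all assumptions.

```typescript
// Characters that should never appear in a bare hostname: whitespace (CR, LF, tab),
// zero-width and BOM code points, userinfo "@", path/query/fragment markers, and ":".
const FORBIDDEN_HOST_CHARS = /[\s\u200B-\u200D\u2060\uFEFF@\/\\?#:]/;

// True for dotted-quad addresses in loopback, RFC 1918, link-local (IMDS), or 0.0.0.0/8 ranges.
function isBlockedIPv4(host: string): boolean {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (!m) return false;
  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 0 ||
    a === 127 ||
    a === 10 ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168) ||
    (a === 169 && b === 254)
  );
}

// Hypothetical sketch: returns an HTTPS URL for an acceptable host, or null when refused.
function sketchBuildTelemetryUrl(host: string, path: string): string | null {
  // Strip one optional https:// prefix; any other scheme still contains "/" and is refused below.
  const bare = host.replace(/^https:\/\//i, '');
  if (bare.length === 0 || FORBIDDEN_HOST_CHARS.test(bare)) return null;

  const name = bare.toLowerCase();
  if (name === 'localhost' || name.endsWith('.localhost')) return null;
  if (name === 'metadata.google.internal') return null;
  if (isBlockedIPv4(name)) return null;

  return `https://${name}${path}`;
}
```

Refusing the host before any request is built means the Authorization header is never attached to a request bound for an internal or attacker-chosen address.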
Correctness (High):
- MetricsAggregator gained a closed flag; close() no longer races
with batch-triggered flushes that would resurrect the interval.
- evictExpiredStatements now runs from the periodic flush timer so
idle clients actually reclaim orphan statement entries.
- Evicted statements emit their buffered error events as standalone
metrics before being dropped — first-failure signal survives.
- Batch-size and terminal-error flush paths pass resetTimer=false so
sustained overflow can't starve the periodic tail drain.
- TelemetryTerminalError introduced for host-validation refusals,
separating that taxonomy from AuthenticationError.
- authMissingWarned re-arms after a successful export so operators
see a fresh signal the next time auth breaks.
- Retry log denominator uses totalAttempts (not maxRetries); negative
maxRetries falls back to default; retry log includes the redacted
failing error so ops can see what's being retried.
API / hygiene:
- CircuitBreakerOpenError, CIRCUIT_BREAKER_OPEN_CODE, and
TelemetryTerminalError re-exported from lib/index.ts so consumers
can instanceof-check.
- DBSQLClient.getAuthProvider marked @internal.
- DEFAULT_TELEMETRY_CONFIG / DEFAULT_CIRCUIT_BREAKER_CONFIG frozen.
- pushBoundedError uses if instead of while.
- CIRCUIT_BREAKER_OPEN_CODE typed as const.
- Default export on buildTelemetryUrl removed (no callers).
- Dropped wasted new Error allocation in processErrorEvent.
Tests:
- New telemetryUtils.test.ts (53 tests): URL-rejection table covering
every known bypass, redactor shapes, sanitize process name.
- DatabricksTelemetryExporter: 13 new tests covering Authorization
on-the-wire, authMissingWarned idempotency + re-arm, unauth
correlation/system_configuration/userAgentEntry/stack_trace
stripping, malformed-host drop, loopback drop, dispose idempotency,
errorStack to redacted stack_trace flow.
- MetricsAggregator: 2 new tests for async close() awaiting the
exporter promise (prevents process.exit truncation) and no timer
resurrection after close.
700 unit tests pass, ESLint clean.
Co-authored-by: Isaac
Signed-off-by: samikshya-chand_data <samikshya.chand@databricks.com>
* test: document synthetic-JWT source pattern in redactSensitive test
Clarify that the JWT string in the redactor test is intentionally fake
and is built from parts so the assembled token never appears as a
source literal (to satisfy pre-commit secret scanners).
Co-authored-by: Isaac
Signed-off-by: samikshya-chand_data <samikshya.chand@databricks.com>
* fix: stringify sinon.Call.args[1] before regex test
CI's TypeScript was stricter than the local version and rejected the
untyped `c.args[1]` passed to `RegExp.test()`. Wrap in `String(...)`
so the tests compile on Node 14/16/18/20 runners.
Co-authored-by: Isaac
Signed-off-by: samikshya-chand_data <samikshya.chand@databricks.com>
* fix: address Jade's review comments
- ExceptionClassifier: remove ENOTFOUND from retryable (DNS "not found"
is deterministic — retrying just pushes load at the resolver without
any expectation of success). Add ETIMEDOUT and EAI_AGAIN per Jade's
follow-up list.
- Extract shared `normalizeHeaders` + `hasAuthorization` helpers into
telemetryUtils; DatabricksTelemetryExporter and FeatureFlagCache both
use them — eliminates the ~40-line duplication Jade flagged.
- normalizeHeaders now guards `typeof raw === 'object'` before
Object.entries, preventing Object.entries('some-string') index entries
from leaking in as headers (Jade: "should we do type check here?").
- FeatureFlagCache.fetchFeatureFlag: add single-retry on transient errors
(classified via ExceptionClassifier). Without a retry, one DNS hiccup
would disable telemetry for the full 15-minute cache TTL; one retry
gives an ephemeral failure a second chance without pushing sustained
load at a broken endpoint.
- Drop the private hasAuthorization/normalizeHeaders on the exporter;
drop the inlined branching in FFC getAuthHeaders.
- Update ExceptionClassifier tests: invert ENOTFOUND expectation with
a comment explaining why; add cases for ETIMEDOUT and EAI_AGAIN.
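Taken together, the classifier change and the single-retry change above could look roughly like this. The set contains only the network error codes named in this commit message after ENOTFOUND was removed. The real classifier may consider other signals. `isRetryable` and `withSingleRetry` are hypothetical names.

```typescript
// Transient network failures treated as retryable; ENOTFOUND is deliberately absent.
const RETRYABLE_CODES = new Set<string>(['ECONNREFUSED', 'EHOSTUNREACH', 'ECONNRESET', 'ETIMEDOUT', 'EAI_AGAIN']);

// Hypothetical classifier: looks only at a Node-style `code` property on the error.
function isRetryable(error: unknown): boolean {
  if (typeof error !== 'object' || error === null) return false;
  const code = (error as { code?: unknown }).code;
  return typeof code === 'string' && RETRYABLE_CODES.has(code);
}

// Runs the task, and retries exactly once if the first failure is classified as transient.
async function withSingleRetry<T>(task: () => Promise<T>): Promise<T> {
  try {
    return await task();
  } catch (error) {
    if (!isRetryable(error)) throw error;
    return task(); // second and final attempt; its failure propagates to the caller
  }
}
```

A single retry keeps one ephemeral failure from disabling telemetry for a full cache TTL, while a deterministic failure such as ENOTFOUND fails fast.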
702 unit tests pass, ESLint clean.
Co-authored-by: Isaac
Signed-off-by: samikshya-chand_data <samikshya.chand@databricks.com>
---------
Signed-off-by: samikshya-chand_data <samikshya.chand@databricks.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
1 parent 69f901c commit 46e2d6e
File tree
19 files changed, +4348 -7 lines changed
- lib
  - contracts
  - telemetry
- tests/unit
  - .stubs
  - telemetry