Fix flaky tests and preexisting CI failures#1443
Conversation
The sendSignInCode helper used a fixed 40-iteration loop with increasing delays (100 + i*20 ms), giving a maximum wait of ~20 seconds. Under CI load with parallel test workers, the email queue + Inbucket delivery pipeline can exceed this window, causing flaky 'Sign-in code message not found after 40 attempts' errors. Root cause: The email delivery path is asynchronous — the API queues the email, the email-queue worker picks it up, renders the template, and sends it to Inbucket. When multiple test workers are running concurrently (as in CI), this pipeline competes for CPU and I/O, regularly pushing delivery times past the 20s polling window. Fix: Replace the fixed-iteration loop with a deadline-based polling approach (60s deadline, 500ms interval), matching the pattern already used by waitForMessagesWithSubject. This gives 3x more headroom while keeping the polling interval consistent. The loop still exits immediately once the message appears, so there is no cost on the happy path. Co-Authored-By: Konstantin Wohlwend <n2d4xc@gmail.com>
…plays tests Three session-replays tests were flaky: 1. 'admin list session replays filters by user_ids' 2. 'admin list session replays filters by click_count_min' 3. 'admin events endpoint does not allow fetching a chunk via the wrong session id' Root causes: - The user_ids filter test queried ClickHouse immediately after uploading batches without any retry loop. ClickHouse uses async_insert (server- buffered, no wait_for_async_insert), so data is not immediately queryable. Under CI load, this caused assertion failures (expected items.length === 1 but got 0). - The click_count_min test had a retry loop but with only 15 attempts × 500ms = 7.5s, which was insufficient under CI load where ClickHouse ingestion can take 10–15s. - The chunk isolation test used two Auth.Otp.signIn() calls just to get two authenticated sessions. Each OTP sign-in involves email delivery and polling (5–15s each under CI load), consuming 20–40s of the 50s test timeout budget and causing timeouts. Fixes: - Added listReplaysWithRetry helper (30 attempts × 500ms = 15s) with a predicate-based retry loop, consistent with the retry patterns already used in analytics-events.test.ts and token-refresh-events.test.ts. - Replaced Auth.Otp.signIn() with Auth.fastSignUp() in tests that don't specifically test OTP. fastSignUp creates a user and session server-side without any email flow, taking <100ms vs 5–15s for OTP under CI load. - Used listReplaysWithRetry for both user_ids and click_count_min assertions to tolerate ClickHouse eventual consistency. Co-Authored-By: Konstantin Wohlwend <n2d4xc@gmail.com>
…ken-refresh-events tests The 'multiple session refreshes create one event each' test was flaky because: 1. The ClickHouse polling window (15 attempts × 300ms = 4.5s) was too short. The event pipeline goes through SDK self-call → quota debit → Postgres insert → ClickHouse async_insert (server-buffered). Under CI load this can exceed 4.5s, causing 'Expected exactly N events but found N-1' errors. 2. The test used Auth.Otp.signIn() for initial session creation, which involves email delivery and polling (~5-15s under CI load). This ate into the 50s test timeout budget, leaving insufficient time for four sequential ClickHouse polling cycles. Fixes: - Increased fetchEventsWithRetry defaults from 10 attempts × 300ms to 40 attempts × 500ms (20s max), matching the pattern in analytics-events.test.ts. The loop still exits immediately when the expected count appears, so there is no cost on the happy path. - Updated expectExactlyNTokenRefreshEvents to use the new defaults instead of hardcoded 15 × 300ms. - Replaced Auth.Otp.signIn() with Auth.fastSignUp() in the 'multiple session refreshes' test since it only needs an initial session with a refresh token, not OTP specifically. This saves 5-15s per test run. Co-Authored-By: Konstantin Wohlwend <n2d4xc@gmail.com>
…nal-metrics test The 'should return metrics data with users' test was flaky because waitForMetricsToIncludeUsersByCountry only checked that users_by_country ['CH'] === 1 before taking the snapshot. The snapshot also expects 2 users in 'AQ' (active_users_by_country), but under CI load ClickHouse may ingest AQ events after CH events, causing the snapshot to see only 1 user in AQ instead of 2. Root cause: The ClickHouse async_insert pipeline ingests events in a non-deterministic order. When the test creates users with different country data, the events for different countries may arrive at different times. Waiting for only one country's count to match doesn't guarantee the other country's data is fully ingested. Fix: Use waitForMetricsMatch with a combined predicate that checks both users_by_country['CH'] === 1 AND active_users_by_country['AQ'].length >= 2, ensuring all expected data is present before the snapshot fires. Co-Authored-By: Konstantin Wohlwend <n2d4xc@gmail.com>
…test The 'cannot read events from other projects' test was timing out at 50s because it performed two Auth.Otp.signIn() calls (one per project). Each OTP sign-in involves sending an email, polling the mailbox for delivery, and extracting the code — taking 5-15s under CI load. Two sign-ins could consume 20-40s of the 50s test timeout, leaving insufficient time for the ClickHouse polling that follows (up to 20s). Root cause: The test only needs authenticated users in two projects to verify cross-project event isolation. It doesn't test OTP functionality. Using slow email-based auth for a non-auth test wastes timeout budget. Fix: Replace both Auth.Otp.signIn() calls with Auth.fastSignUp(), which creates users and sessions server-side without any email flow (<100ms each). fastSignUp still generates $token-refresh events in ClickHouse (via the session creation path), so the test's assertions remain valid. Co-Authored-By: Konstantin Wohlwend <n2d4xc@gmail.com>
The 'cannot read another user sessions as client' test was timing out at
50s because it performed two Auth.Password.signUpWithEmail() calls, each
waiting for the verification email. Under CI load with 8 parallel test
workers, email delivery can take 5-15s per message, consuming 20-40s of
the 50s budget.
Root cause: The test only needs two authenticated users to verify that
user2 cannot read user1's sessions. Email verification is irrelevant to
this test's purpose.
Fix: Pass { noWaitForEmail: true } to both signUpWithEmail calls. This
skips the verification email wait while still creating authenticated
users with valid sessions.
Co-Authored-By: Konstantin Wohlwend <n2d4xc@gmail.com>
…tats test The 'should track sent emails' test was flaky because: 1. Auth.Otp.signIn() consumed 5-15s of timeout budget under CI load, leaving less time for the fixed 12s wait. 2. The fixed wait(12_000) was not always sufficient for the email stats pipeline to update, especially under CI resource contention. 3. The rate_per_second value in the inline snapshot was timing-dependent (computed from recent activity), causing non-deterministic mismatches. Root cause: The email delivery tracking pipeline processes sent emails asynchronously. A fixed wait is inherently fragile — if the pipeline is slower than 12s (common under CI load), the stats show sent=0 and the snapshot mismatches. Fixes: - Replaced Auth.Otp.signIn() with Auth.fastSignUp() since the test just needs an initial auth session. - Replaced the fixed wait(12_000) with a retry loop (30 attempts × 1s) that polls delivery-info until stats.hour.sent >= 1. This adapts to pipeline latency automatically. - Replaced the inline snapshot with explicit field assertions to avoid flakiness from the timing-dependent rate_per_second value. The rate is checked with a range (27 < rate < 28) instead of exact matching. Co-Authored-By: Konstantin Wohlwend <n2d4xc@gmail.com>
The 'unsubscribe link should not be sent for emails with transactional
notification category' test was timing out at 50s because
Auth.Password.signUpWithEmail() waits for the verification email (up to
60s). Under CI load, this can take 15-30s, leaving insufficient time for
the test's own email send + waitForMessagesWithSubject call (which also
has a 60s deadline).
Root cause: The test needs an authenticated user with a valid email to
send a custom email and check its content. Email verification is not
required — the test only checks if the sent email contains an
unsubscribe link.
Fix: Pass { noWaitForEmail: true } to skip the verification email wait.
Also added credential_enabled: true to the project config to ensure
password auth is available.
Co-Authored-By: Konstantin Wohlwend <n2d4xc@gmail.com>
…tests Multiple backend unit tests (regen-internal-subscriptions-to-latest and ensure-free-plan) were timing out at 20s under CI load. These tests perform real database operations (Prisma queries, transaction processing) that compete for I/O with 8 parallel test workers. Root cause: The 20s timeout was calibrated for isolated local runs. In CI, the shared database connection pool is contended by parallel workers, causing individual queries to take 2-5x longer than in isolation. Fix: Increase the backend testTimeout from 20s to 60s. This matches the e2e test timeout pattern (50s in CI) and provides sufficient headroom for DB-heavy tests under CI load. The timeout only fires if a test truly hangs — well-behaved tests still complete in their normal time. Co-Authored-By: Konstantin Wohlwend <n2d4xc@gmail.com>
…for refund tests
Two payment-related tests were timing out under CI load:
1. 'allows team admins to be added when item quantity is increased'
(items.test.ts): The test sends 3 invitation emails in parallel but
then waits for each email sequentially in the for loop. Each
waitForMessagesWithSubject call has a 60s deadline. Under CI load,
3 sequential waits can consume 15-45s of the 50s test timeout.
Fix: Pre-fetch all invitation emails in parallel using Promise.all()
before the acceptance loop. Since the invitations are already sent,
all mailboxes can poll concurrently, reducing the total wait from
3×N seconds to max(N) seconds.
2. 'refunds a renewal invoice' and related tests
(transactions-refund.test.ts): Each test calls
createLiveModeSubscriptionWithRenewal() which performs 7+ API calls
including Stripe webhook sends, project setup, and payment config
updates. Under CI load, this setup alone can take 30-40s.
Fix: Added { timeout: 120_000 } to all three tests that use
createLiveModeSubscriptionWithRenewal, providing adequate headroom
for the heavy payment pipeline operations.
Co-Authored-By: Konstantin Wohlwend <n2d4xc@gmail.com>
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (23)
📝 WalkthroughWalkthroughThis PR adds the ChangesDevelopment Environment Configuration and E2E Test Improvements
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Poem
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Adds detailed timing logs to identify what makes specific emails exceed the 20s boundary: - sendSignInCode: logs total/api/poll time when >10s, tracks outbox status transitions during polling, reports poll count on timeout - prepareSendPlan: logs delivery stats query time, claim time, and quota=0 deferrals per tenancy - processTenancyBatch: logs setup time (getTenancy + getEmailConfig) and billingTeamId presence - processSingleEmail: logs per-email breakdown (resolve/billing/smtp) when >2s These logs will show up in CI output for any email exceeding thresholds, allowing us to identify exactly which pipeline stage is causing the tail latency for specific slow emails. Co-Authored-By: Konstantin Wohlwend <n2d4xc@gmail.com>
Remove timing measurements and console.warn calls that were added during the email delivery performance investigation. The analysis is complete (billing self-calls account for 96-99% of slow email delivery time under CI load), and the 60s deadline-based polling timeout is sufficient. Removed from backend-helpers.ts: - apiCallStart/apiCallEnd performance markers - pollCount and lastOutboxStatus tracking - console.warn calls for SLOW/WAITING emails - getOutboxEmails() calls in success path Removed from email-queue-step.tsx: - Per-stage timing markers in prepareSendPlan, processTenancyBatch, and processSingleEmail - console.warn calls for slow pipeline stages Co-Authored-By: Konstantin Wohlwend <n2d4xc@gmail.com>
The dev branch added an is_development_environment field to the project API response. Update all inline snapshots in project-related E2E tests to include the new field (always false for test-created projects). Affected test files: - projects.test.ts (16 snapshots) - internal/projects.test.ts (12 snapshots) - integrations/custom/projects/provision.test.ts - integrations/neon/oauth-providers.test.ts - integrations/neon/projects/current.test.ts - integrations/neon/projects/provision.test.ts Co-Authored-By: Konstantin Wohlwend <n2d4xc@gmail.com>
The setEnvironmentConfigOverride test was spying on isDevelopmentEnvironmentProject, but that function is called internally within the development-environment module (intra-module call), so vi.spyOn on the module export doesn't intercept it. Fix: spy on getEnvironmentConfigWriteBlockReason instead, which is the function that setEnvironmentConfigOverride actually imports and calls through the module namespace (cross-module call), making the spy effective. Co-Authored-By: Konstantin Wohlwend <n2d4xc@gmail.com>
ErrorBoundary's errorComponent props.reset is always defined (not optional), so the conditional check and non-null assertion were unnecessary, triggering @typescript-eslint/no-unnecessary-condition. Co-Authored-By: Konstantin Wohlwend <n2d4xc@gmail.com>
- project-permissions.test.ts, team-permissions.test.ts: add missing is_development_environment field to project response snapshots - config-local-emulator.test.ts: update blocked message to match the new development environment wording (was: 'cannot be changed in the local emulator', now: 'cannot be changed in a development environment') Co-Authored-By: Konstantin Wohlwend <n2d4xc@gmail.com>
Greptile SummaryThis PR addresses flaky E2E tests and pre-existing CI failures across the backend, dashboard, and test suites. No production logic changes are included.
Confidence Score: 4/5Safe to merge; changes are confined to tests and a trivial dashboard cosmetic fix with no production behaviour changes. The production-code changes (whitespace in apps/e2e/tests/backend/endpoints/api/v1/session-replays.test.ts and apps/e2e/tests/backend/endpoints/api/v1/emails/delivery-info.test.ts warrant a second look for the silent-timeout behaviour in their new polling helpers. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[E2E Test Start] --> B{Needs Email?}
B -- Yes, OTP/MagicLink --> C[sendSignInCode]
B -- No --> D[Auth.fastSignUp]
C --> E[Deadline-based poll\n60 s / 500 ms interval]
E -- found --> F[Sign in with code]
E -- timeout --> G[StackAssertionError thrown]
F --> H[Test actions]
D --> H
H --> I{ClickHouse query?}
I -- Yes --> J[listReplaysWithRetry /\nwaitForMetricsMatch /\nfetchEventsWithRetry]
J -- predicate met --> K[Assert & continue]
J -- timeout --> L[Return last response\nor throw]
I -- No --> K
|
| // Poll until email stats are updated instead of a fixed wait | ||
| let response; | ||
| for (let i = 0; i < 30; i++) { | ||
| await wait(1_000); | ||
| response = await niceBackendFetch("/api/v1/emails/delivery-info", { | ||
| method: "GET", | ||
| accessType: "server", | ||
| }); | ||
| if (response.status === 200 && response.body?.stats?.hour?.sent >= 1) { | ||
| break; | ||
| } | ||
| `); | ||
| } |
There was a problem hiding this comment.
Polling loop waits before first check and gives no timeout error
The loop unconditionally waits 1 s before ever issuing the first request, wasting time even when the email is already processed. More importantly, if the 30-iteration budget (30 s) is exhausted without the condition being met, the loop exits silently; the subsequent expect(stats.hour.sent).toBe(1) then produces a confusing assertion failure rather than a clear timeout message. waitForMetricsMatch in the same test suite throws a descriptive error on timeout — a similar pattern here would make failures easier to diagnose.
Prompt To Fix With AI
This is a comment left during a code review.
Path: apps/e2e/tests/backend/endpoints/api/v1/emails/delivery-info.test.ts
Line: 170-181
Comment:
**Polling loop waits before first check and gives no timeout error**
The loop unconditionally waits 1 s before ever issuing the first request, wasting time even when the email is already processed. More importantly, if the 30-iteration budget (30 s) is exhausted without the condition being met, the loop exits silently; the subsequent `expect(stats.hour.sent).toBe(1)` then produces a confusing assertion failure rather than a clear timeout message. `waitForMetricsMatch` in the same test suite throws a descriptive error on timeout — a similar pattern here would make failures easier to diagnose.
How can I resolve this? If you propose a fix, please make it concise.
Fix flaky tests and all preexisting CI failures to achieve green checks.
Flaky test fixes:
sendSignInCodefastSignUp()ornoWaitForEmailin 6 test fileswait(12s)→ polling loopPreexisting CI fixes:
is_development_environmentfield across all project response testsconfig.tsxin-source test: spy ongetEnvironmentConfigWriteBlockReasoninstead ofisDevelopmentEnvironmentProject(intra-module call not intercepted by module namespace spy)config-local-emulator.test.ts: update blocked message to match new development environment wordingprops.resetLink to Devin session: https://app.devin.ai/sessions/8a3910351551437a9b655d15af4c4154
Requested by: @N2D4
Note
Low Risk
Primarily adjusts test timeouts, polling/retry logic, and snapshots; production code changes are minimal and localized to test behavior and a small dashboard error-boundary handler.
Overview
Improves CI stability by replacing fixed sleeps and short retry loops with deadline-based polling/retries across backend E2E tests (email delivery/sign-in, ClickHouse-ingested analytics/session-replays, internal metrics), and by skipping unnecessary email waits via
Auth.fastSignUp()/noWaitForEmail.Updates numerous E2E snapshots to include the new
is_development_environmentfield in project responses, aligns local-emulator blocking wording with “development environment”, and bumps backend VitesttestTimeoutfrom 20s to 60s.Fixes a brittle in-source Vitest test in
config.tsxby spying ongetEnvironmentConfigWriteBlockReason(instead of an intra-module call that couldn’t be intercepted), and removes an unnecessary conditional aroundprops.reset()in the dashboard widget playground error boundary.Reviewed by Cursor Bugbot for commit 5a4e972. Bugbot is set up for automated code reviews on this repo. Configure here.
Summary by CodeRabbit
Release Notes
New Features
is_development_environmentfield to project configuration API responses.Bug Fixes
Tests