Fix flaky tests and preexisting CI failures by N2D4 · Pull Request #1443 · hexclave/stack-auth

N2D4 · 2026-05-19T23:43:46Z

Fix flaky tests and all preexisting CI failures to achieve green checks.

Flaky test fixes:

Email polling: 20s→60s deadline-based polling in sendSignInCode
ClickHouse eventual consistency: retry loops for session-replays, token-refresh-events, internal-metrics
Unnecessary email waits: replaced with fastSignUp() or noWaitForEmail in 6 test files
Fixed waits: delivery-info wait(12s) → polling loop
Timeout/parallelism: backend timeout 20s→60s, parallel email fetching, 120s for payment tests

Preexisting CI fixes:

Update E2E snapshots for is_development_environment field across all project response tests
Fix config.tsx in-source test: spy on getEnvironmentConfigWriteBlockReason instead of isDevelopmentEnvironmentProject (intra-module call not intercepted by module namespace spy)
Fix config-local-emulator.test.ts: update blocked message to match new development environment wording
Fix dashboard lint: remove unnecessary conditional on always-truthy props.reset

Link to Devin session: https://app.devin.ai/sessions/8a3910351551437a9b655d15af4c4154
Requested by: @N2D4

Note

Low Risk
Primarily adjusts test timeouts, polling/retry logic, and snapshots; production code changes are minimal and localized to test behavior and a small dashboard error-boundary handler.

Overview
Improves CI stability by replacing fixed sleeps and short retry loops with deadline-based polling/retries across backend E2E tests (email delivery/sign-in, ClickHouse-ingested analytics/session-replays, internal metrics), and by skipping unnecessary email waits via Auth.fastSignUp() / noWaitForEmail.

Updates numerous E2E snapshots to include the new is_development_environment field in project responses, aligns local-emulator blocking wording with “development environment”, and bumps backend Vitest testTimeout from 20s to 60s.

Fixes a brittle in-source Vitest test in config.tsx by spying on getEnvironmentConfigWriteBlockReason (instead of an intra-module call that couldn’t be intercepted), and removes an unnecessary conditional around props.reset() in the dashboard widget playground error boundary.

^{Reviewed by Cursor Bugbot for commit 5a4e972. Bugbot is set up for automated code reviews on this repo. Configure here.}

Summary by CodeRabbit

Release Notes

New Features
- Added is_development_environment field to project configuration API responses.
Bug Fixes
- Fixed "Reload widget" button in widget error boundary to always be functional.
Tests
- Improved test reliability and performance with faster authentication methods and intelligent polling mechanisms.
- Updated test timeout configurations and error messaging for development environments.

The sendSignInCode helper used a fixed 40-iteration loop with increasing delays (100 + i*20 ms), giving a maximum wait of ~20 seconds. Under CI load with parallel test workers, the email queue + Inbucket delivery pipeline can exceed this window, causing flaky 'Sign-in code message not found after 40 attempts' errors. Root cause: The email delivery path is asynchronous — the API queues the email, the email-queue worker picks it up, renders the template, and sends it to Inbucket. When multiple test workers are running concurrently (as in CI), this pipeline competes for CPU and I/O, regularly pushing delivery times past the 20s polling window. Fix: Replace the fixed-iteration loop with a deadline-based polling approach (60s deadline, 500ms interval), matching the pattern already used by waitForMessagesWithSubject. This gives 3x more headroom while keeping the polling interval consistent. The loop still exits immediately once the message appears, so there is no cost on the happy path. Co-Authored-By: Konstantin Wohlwend <n2d4xc@gmail.com>

…plays tests Three session-replays tests were flaky: 1. 'admin list session replays filters by user_ids' 2. 'admin list session replays filters by click_count_min' 3. 'admin events endpoint does not allow fetching a chunk via the wrong session id' Root causes: - The user_ids filter test queried ClickHouse immediately after uploading batches without any retry loop. ClickHouse uses async_insert (server- buffered, no wait_for_async_insert), so data is not immediately queryable. Under CI load, this caused assertion failures (expected items.length === 1 but got 0). - The click_count_min test had a retry loop but with only 15 attempts × 500ms = 7.5s, which was insufficient under CI load where ClickHouse ingestion can take 10–15s. - The chunk isolation test used two Auth.Otp.signIn() calls just to get two authenticated sessions. Each OTP sign-in involves email delivery and polling (5–15s each under CI load), consuming 20–40s of the 50s test timeout budget and causing timeouts. Fixes: - Added listReplaysWithRetry helper (30 attempts × 500ms = 15s) with a predicate-based retry loop, consistent with the retry patterns already used in analytics-events.test.ts and token-refresh-events.test.ts. - Replaced Auth.Otp.signIn() with Auth.fastSignUp() in tests that don't specifically test OTP. fastSignUp creates a user and session server-side without any email flow, taking <100ms vs 5–15s for OTP under CI load. - Used listReplaysWithRetry for both user_ids and click_count_min assertions to tolerate ClickHouse eventual consistency. Co-Authored-By: Konstantin Wohlwend <n2d4xc@gmail.com>

…ken-refresh-events tests The 'multiple session refreshes create one event each' test was flaky because: 1. The ClickHouse polling window (15 attempts × 300ms = 4.5s) was too short. The event pipeline goes through SDK self-call → quota debit → Postgres insert → ClickHouse async_insert (server-buffered). Under CI load this can exceed 4.5s, causing 'Expected exactly N events but found N-1' errors. 2. The test used Auth.Otp.signIn() for initial session creation, which involves email delivery and polling (~5-15s under CI load). This ate into the 50s test timeout budget, leaving insufficient time for four sequential ClickHouse polling cycles. Fixes: - Increased fetchEventsWithRetry defaults from 10 attempts × 300ms to 40 attempts × 500ms (20s max), matching the pattern in analytics-events.test.ts. The loop still exits immediately when the expected count appears, so there is no cost on the happy path. - Updated expectExactlyNTokenRefreshEvents to use the new defaults instead of hardcoded 15 × 300ms. - Replaced Auth.Otp.signIn() with Auth.fastSignUp() in the 'multiple session refreshes' test since it only needs an initial session with a refresh token, not OTP specifically. This saves 5-15s per test run. Co-Authored-By: Konstantin Wohlwend <n2d4xc@gmail.com>

…nal-metrics test The 'should return metrics data with users' test was flaky because waitForMetricsToIncludeUsersByCountry only checked that users_by_country ['CH'] === 1 before taking the snapshot. The snapshot also expects 2 users in 'AQ' (active_users_by_country), but under CI load ClickHouse may ingest AQ events after CH events, causing the snapshot to see only 1 user in AQ instead of 2. Root cause: The ClickHouse async_insert pipeline ingests events in a non-deterministic order. When the test creates users with different country data, the events for different countries may arrive at different times. Waiting for only one country's count to match doesn't guarantee the other country's data is fully ingested. Fix: Use waitForMetricsMatch with a combined predicate that checks both users_by_country['CH'] === 1 AND active_users_by_country['AQ'].length >= 2, ensuring all expected data is present before the snapshot fires. Co-Authored-By: Konstantin Wohlwend <n2d4xc@gmail.com>

…test The 'cannot read events from other projects' test was timing out at 50s because it performed two Auth.Otp.signIn() calls (one per project). Each OTP sign-in involves sending an email, polling the mailbox for delivery, and extracting the code — taking 5-15s under CI load. Two sign-ins could consume 20-40s of the 50s test timeout, leaving insufficient time for the ClickHouse polling that follows (up to 20s). Root cause: The test only needs authenticated users in two projects to verify cross-project event isolation. It doesn't test OTP functionality. Using slow email-based auth for a non-auth test wastes timeout budget. Fix: Replace both Auth.Otp.signIn() calls with Auth.fastSignUp(), which creates users and sessions server-side without any email flow (<100ms each). fastSignUp still generates $token-refresh events in ClickHouse (via the session creation path), so the test's assertions remain valid. Co-Authored-By: Konstantin Wohlwend <n2d4xc@gmail.com>

The 'cannot read another user sessions as client' test was timing out at 50s because it performed two Auth.Password.signUpWithEmail() calls, each waiting for the verification email. Under CI load with 8 parallel test workers, email delivery can take 5-15s per message, consuming 20-40s of the 50s budget. Root cause: The test only needs two authenticated users to verify that user2 cannot read user1's sessions. Email verification is irrelevant to this test's purpose. Fix: Pass { noWaitForEmail: true } to both signUpWithEmail calls. This skips the verification email wait while still creating authenticated users with valid sessions. Co-Authored-By: Konstantin Wohlwend <n2d4xc@gmail.com>

…tats test The 'should track sent emails' test was flaky because: 1. Auth.Otp.signIn() consumed 5-15s of timeout budget under CI load, leaving less time for the fixed 12s wait. 2. The fixed wait(12_000) was not always sufficient for the email stats pipeline to update, especially under CI resource contention. 3. The rate_per_second value in the inline snapshot was timing-dependent (computed from recent activity), causing non-deterministic mismatches. Root cause: The email delivery tracking pipeline processes sent emails asynchronously. A fixed wait is inherently fragile — if the pipeline is slower than 12s (common under CI load), the stats show sent=0 and the snapshot mismatches. Fixes: - Replaced Auth.Otp.signIn() with Auth.fastSignUp() since the test just needs an initial auth session. - Replaced the fixed wait(12_000) with a retry loop (30 attempts × 1s) that polls delivery-info until stats.hour.sent >= 1. This adapts to pipeline latency automatically. - Replaced the inline snapshot with explicit field assertions to avoid flakiness from the timing-dependent rate_per_second value. The rate is checked with a range (27 < rate < 28) instead of exact matching. Co-Authored-By: Konstantin Wohlwend <n2d4xc@gmail.com>

The 'unsubscribe link should not be sent for emails with transactional notification category' test was timing out at 50s because Auth.Password.signUpWithEmail() waits for the verification email (up to 60s). Under CI load, this can take 15-30s, leaving insufficient time for the test's own email send + waitForMessagesWithSubject call (which also has a 60s deadline). Root cause: The test needs an authenticated user with a valid email to send a custom email and check its content. Email verification is not required — the test only checks if the sent email contains an unsubscribe link. Fix: Pass { noWaitForEmail: true } to skip the verification email wait. Also added credential_enabled: true to the project config to ensure password auth is available. Co-Authored-By: Konstantin Wohlwend <n2d4xc@gmail.com>

…tests Multiple backend unit tests (regen-internal-subscriptions-to-latest and ensure-free-plan) were timing out at 20s under CI load. These tests perform real database operations (Prisma queries, transaction processing) that compete for I/O with 8 parallel test workers. Root cause: The 20s timeout was calibrated for isolated local runs. In CI, the shared database connection pool is contended by parallel workers, causing individual queries to take 2-5x longer than in isolation. Fix: Increase the backend testTimeout from 20s to 60s. This matches the e2e test timeout pattern (50s in CI) and provides sufficient headroom for DB-heavy tests under CI load. The timeout only fires if a test truly hangs — well-behaved tests still complete in their normal time. Co-Authored-By: Konstantin Wohlwend <n2d4xc@gmail.com>

…for refund tests Two payment-related tests were timing out under CI load: 1. 'allows team admins to be added when item quantity is increased' (items.test.ts): The test sends 3 invitation emails in parallel but then waits for each email sequentially in the for loop. Each waitForMessagesWithSubject call has a 60s deadline. Under CI load, 3 sequential waits can consume 15-45s of the 50s test timeout. Fix: Pre-fetch all invitation emails in parallel using Promise.all() before the acceptance loop. Since the invitations are already sent, all mailboxes can poll concurrently, reducing the total wait from 3×N seconds to max(N) seconds. 2. 'refunds a renewal invoice' and related tests (transactions-refund.test.ts): Each test calls createLiveModeSubscriptionWithRenewal() which performs 7+ API calls including Stripe webhook sends, project setup, and payment config updates. Under CI load, this setup alone can take 30-40s. Fix: Added { timeout: 120_000 } to all three tests that use createLiveModeSubscriptionWithRenewal, providing adequate headroom for the heavy payment pipeline operations. Co-Authored-By: Konstantin Wohlwend <n2d4xc@gmail.com>

vercel · 2026-05-19T23:43:48Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
stack-auth-hosted-components	Ready	Preview, Comment	May 20, 2026 5:06pm
stack-auth-mcp	Ready	Preview, Comment	May 20, 2026 5:06pm
stack-auth-skills	Ready	Preview, Comment	May 20, 2026 5:06pm
stack-backend	Ready	Preview, Comment	May 20, 2026 5:06pm
stack-dashboard	Ready	Preview, Comment	May 20, 2026 5:06pm
stack-demo	Ready	Preview, Comment	May 20, 2026 5:06pm
stack-docs	Ready	Preview, Comment	May 20, 2026 5:06pm
stack-preview-backend	Ready	Preview, Comment	May 20, 2026 5:06pm
stack-preview-dashboard	Ready	Preview, Comment	May 20, 2026 5:06pm

devin-ai-integration · 2026-05-19T23:43:49Z

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

Address comments on this PR that start with 'DevinAI' or '@devin'.
Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

Disable automatic comment and CI monitoring

coderabbitai · 2026-05-19T23:43:55Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e51575d4-1d5b-4089-8113-2e07fbc7029d

📥 Commits

Reviewing files that changed from the base of the PR and between 97f86a1 and 5a4e972.

📒 Files selected for processing (23)

apps/backend/src/lib/config.tsx
apps/backend/src/lib/email-queue-step.tsx
apps/backend/vitest.config.ts
apps/dashboard/src/app/(main)/(protected)/projects/[projectId]/widget-playground/page-client.tsx
apps/e2e/tests/backend/backend-helpers.ts
apps/e2e/tests/backend/endpoints/api/v1/analytics-events.test.ts
apps/e2e/tests/backend/endpoints/api/v1/auth/sessions/index.test.ts
apps/e2e/tests/backend/endpoints/api/v1/emails/delivery-info.test.ts
apps/e2e/tests/backend/endpoints/api/v1/integrations/custom/projects/provision.test.ts
apps/e2e/tests/backend/endpoints/api/v1/integrations/neon/oauth-providers.test.ts
apps/e2e/tests/backend/endpoints/api/v1/integrations/neon/projects/current.test.ts
apps/e2e/tests/backend/endpoints/api/v1/integrations/neon/projects/provision.test.ts
apps/e2e/tests/backend/endpoints/api/v1/internal-metrics.test.ts
apps/e2e/tests/backend/endpoints/api/v1/internal/config-local-emulator.test.ts
apps/e2e/tests/backend/endpoints/api/v1/internal/projects.test.ts
apps/e2e/tests/backend/endpoints/api/v1/internal/transactions-refund.test.ts
apps/e2e/tests/backend/endpoints/api/v1/payments/items.test.ts
apps/e2e/tests/backend/endpoints/api/v1/project-permissions.test.ts
apps/e2e/tests/backend/endpoints/api/v1/projects.test.ts
apps/e2e/tests/backend/endpoints/api/v1/session-replays.test.ts
apps/e2e/tests/backend/endpoints/api/v1/team-permissions.test.ts
apps/e2e/tests/backend/endpoints/api/v1/token-refresh-events.test.ts
apps/e2e/tests/backend/endpoints/api/v1/unsubscribe-link.test.ts

📝 Walkthrough

Walkthrough

This PR adds the is_development_environment boolean field to project API responses, updates development environment write-blocking to check getEnvironmentConfigWriteBlockReason, and improves E2E test reliability through faster auth methods, deadline-based polling, and retry helpers.

Changes

Development Environment Configuration and E2E Test Improvements

Layer / File(s)	Summary
Development environment config blocking setup `apps/backend/src/lib/config.tsx`, `apps/e2e/tests/backend/endpoints/api/v1/internal/config-local-emulator.test.ts`	Updated `setEnvironmentConfigOverride` test to spy on `getEnvironmentConfigWriteBlockReason` instead of `isDevelopmentEnvironmentProject`, and changed expected error message from "local emulator" to "development environment".
Project response snapshots with is_development_environment field `apps/e2e/tests/backend/endpoints/api/v1/integrations/custom/projects/`, `apps/e2e/tests/backend/endpoints/api/v1/integrations/neon/`, `apps/e2e/tests/backend/endpoints/api/v1/internal/projects.test.ts`, `apps/e2e/tests/backend/endpoints/api/v1/*-permissions.test.ts`, `apps/e2e/tests/backend/endpoints/api/v1/projects.test.ts`	All project API response snapshots updated to include `is_development_environment: false` field across internal, integration, and public endpoints.
E2E test polling and retry infrastructure `apps/e2e/tests/backend/endpoints/api/v1/session-replays.test.ts`, `apps/e2e/tests/backend/endpoints/api/v1/internal-metrics.test.ts`, `apps/e2e/tests/backend/endpoints/api/v1/token-refresh-events.test.ts`	Introduced `listReplaysWithRetry` helper for predicate-based polling, updated `fetchEventsWithRetry` defaults to longer deadlines and more attempts, and switched internal-metrics test to predicate-based metric polling.
E2E test auth modernization and timing improvements `apps/e2e/tests/backend/endpoints/api/v1/analytics-events.test.ts`, `apps/e2e/tests/backend/endpoints/api/v1/auth/sessions/index.test.ts`, `apps/e2e/tests/backend/endpoints/api/v1/internal/transactions-refund.test.ts`, `apps/e2e/tests/backend/endpoints/api/v1/payments/items.test.ts`, `apps/e2e/tests/backend/endpoints/api/v1/session-replays.test.ts`, `apps/e2e/tests/backend/endpoints/api/v1/unsubscribe-link.test.ts`	Migrated E2E test auth from `Auth.Otp.signIn()` to `Auth.fastSignUp()` and `Auth.Password.signUpWithEmail({ noWaitForEmail: true })`, added explicit `timeout: 120_000` to payment refund tests, and parallelized email message fetching.
Email delivery reliability and OTP polling `apps/e2e/tests/backend/backend-helpers.ts`, `apps/e2e/tests/backend/endpoints/api/v1/emails/delivery-info.test.ts`	Updated `Auth.Otp.sendSignInCode` from attempt-count polling to deadline-based polling using `performance.now()` with fixed 500ms intervals, and improved email delivery-info test to actively poll until sent counter updates instead of fixed 12s wait.
Widget error boundary and test configuration `apps/dashboard/src/app/(main)/(protected)/projects/[projectId]/widget-playground/page-client.tsx`, `apps/backend/vitest.config.ts`, `apps/backend/src/lib/email-queue-step.tsx`	Unconditionally render "Reload widget" button and invoke `props.reset()`, increased Vitest `testTimeout` to 60 seconds, and adjusted email-queue-step formatting around quota-gate and send-attempt logic.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

hexclave/stack-auth#1435: Implements the development-environment config write-blocking logic that this PR's test updates verify.
hexclave/stack-auth#1420: Similar ClickHouse polling improvements to internal-metrics test for eventual-consistency handling.
hexclave/stack-auth#1424: Backend session-replay endpoint changes that align with the PR's session-replay E2E test retry helper additions.

Suggested reviewers

nams1570

Poem

🐰 A rabbit hops through snapshots vast and wide,
With is_development_environment by its side,
Auth flows now swift on fastSignUp feet,
Polling helpers make flaky tests complete,
The widget reloads, timeout grows long—
Infrastructure sings a steadier song! ✨

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch devin/1779233168-fix-flaky-tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Adds detailed timing logs to identify what makes specific emails exceed the 20s boundary: - sendSignInCode: logs total/api/poll time when >10s, tracks outbox status transitions during polling, reports poll count on timeout - prepareSendPlan: logs delivery stats query time, claim time, and quota=0 deferrals per tenancy - processTenancyBatch: logs setup time (getTenancy + getEmailConfig) and billingTeamId presence - processSingleEmail: logs per-email breakdown (resolve/billing/smtp) when >2s These logs will show up in CI output for any email exceeding thresholds, allowing us to identify exactly which pipeline stage is causing the tail latency for specific slow emails. Co-Authored-By: Konstantin Wohlwend <n2d4xc@gmail.com>

Remove timing measurements and console.warn calls that were added during the email delivery performance investigation. The analysis is complete (billing self-calls account for 96-99% of slow email delivery time under CI load), and the 60s deadline-based polling timeout is sufficient. Removed from backend-helpers.ts: - apiCallStart/apiCallEnd performance markers - pollCount and lastOutboxStatus tracking - console.warn calls for SLOW/WAITING emails - getOutboxEmails() calls in success path Removed from email-queue-step.tsx: - Per-stage timing markers in prepareSendPlan, processTenancyBatch, and processSingleEmail - console.warn calls for slow pipeline stages Co-Authored-By: Konstantin Wohlwend <n2d4xc@gmail.com>

The dev branch added an is_development_environment field to the project API response. Update all inline snapshots in project-related E2E tests to include the new field (always false for test-created projects). Affected test files: - projects.test.ts (16 snapshots) - internal/projects.test.ts (12 snapshots) - integrations/custom/projects/provision.test.ts - integrations/neon/oauth-providers.test.ts - integrations/neon/projects/current.test.ts - integrations/neon/projects/provision.test.ts Co-Authored-By: Konstantin Wohlwend <n2d4xc@gmail.com>

The setEnvironmentConfigOverride test was spying on isDevelopmentEnvironmentProject, but that function is called internally within the development-environment module (intra-module call), so vi.spyOn on the module export doesn't intercept it. Fix: spy on getEnvironmentConfigWriteBlockReason instead, which is the function that setEnvironmentConfigOverride actually imports and calls through the module namespace (cross-module call), making the spy effective. Co-Authored-By: Konstantin Wohlwend <n2d4xc@gmail.com>

ErrorBoundary's errorComponent props.reset is always defined (not optional), so the conditional check and non-null assertion were unnecessary, triggering @typescript-eslint/no-unnecessary-condition. Co-Authored-By: Konstantin Wohlwend <n2d4xc@gmail.com>

- project-permissions.test.ts, team-permissions.test.ts: add missing is_development_environment field to project response snapshots - config-local-emulator.test.ts: update blocked message to match the new development environment wording (was: 'cannot be changed in the local emulator', now: 'cannot be changed in a development environment') Co-Authored-By: Konstantin Wohlwend <n2d4xc@gmail.com>

greptile-apps · 2026-05-20T17:03:40Z

Greptile Summary

This PR addresses flaky E2E tests and pre-existing CI failures across the backend, dashboard, and test suites. No production logic changes are included.

Flakiness fixes: email polling is upgraded from a 20 s counted loop to a 60 s deadline-based poll; ClickHouse eventual-consistency waits are replaced with proper retry/predicate helpers; payment tests get explicit 120 s timeouts; parallel email pre-fetching is introduced in the items test.
Snapshot and wording alignment: all project-response snapshots are updated for the new is_development_environment field; the local-emulator blocked message and the config.tsx in-source spy are corrected to match renamed functions and updated copy.
Dashboard lint fix: unconditional props.reset() call replaces the always-truthy optional-chaining guard inside the ErrorBoundary error component.

Confidence Score: 4/5

Safe to merge; changes are confined to tests and a trivial dashboard cosmetic fix with no production behaviour changes.

The production-code changes (whitespace in email-queue-step.tsx, spy target in config.tsx, and the props.reset guard removal in the dashboard) are all correct and low-risk. The bulk of the PR is test infrastructure: two newly introduced retry helpers (listReplaysWithRetry in session-replays and the polling loop in delivery-info) silently return the last response on timeout rather than throwing, which could produce misleading assertion errors when the budget is exhausted — unlike waitForMetricsMatch, which does throw. This inconsistency is the only noteworthy concern.

apps/e2e/tests/backend/endpoints/api/v1/session-replays.test.ts and apps/e2e/tests/backend/endpoints/api/v1/emails/delivery-info.test.ts warrant a second look for the silent-timeout behaviour in their new polling helpers.

Important Files Changed

Filename	Overview
apps/backend/src/lib/config.tsx	In-source vitest: spy target changed from `isDevelopmentEnvironmentProject` to `getEnvironmentConfigWriteBlockReason` to correctly intercept the intra-module call path; correctly fixes the test.
apps/backend/src/lib/email-queue-step.tsx	Whitespace-only change (removed two blank lines); no functional impact.
apps/backend/vitest.config.ts	Backend test timeout increased from 20 s to 60 s to accommodate longer email-polling and ClickHouse consistency waits.
apps/dashboard/src/app/(main)/(protected)/projects/[projectId]/widget-playground/page-client.tsx	Removed unnecessary conditional guard on `props.reset`; Next.js `ErrorBoundary` always supplies `reset` in the error-component props, so the check was always truthy and the non-null assertion was redundant.
apps/e2e/tests/backend/backend-helpers.ts	Email polling refactored to deadline-based (60 s) with a fixed 500 ms interval; removes `outboxEmails` from the error payload (minor debug-info loss) but substantially increases tolerance for slow mail delivery.
apps/e2e/tests/backend/endpoints/api/v1/emails/delivery-info.test.ts	Replaces fixed 12 s wait with a polling loop (30 × 1 s); loop always sleeps before the first request and silently falls through on timeout, producing opaque assertion errors — see inline comment.
apps/e2e/tests/backend/endpoints/api/v1/session-replays.test.ts	Adds `listReplaysWithRetry` to handle ClickHouse eventual consistency; multiple `Auth.Otp.signIn()` calls replaced with `Auth.fastSignUp()` to remove email dependencies. The helper silently returns on timeout rather than throwing — see inline comment.
apps/e2e/tests/backend/endpoints/api/v1/internal-metrics.test.ts	Retry predicate strengthened to wait for both `users_by_country["CH"] === 1` and `active_users_by_country["AQ"].length >= 2` simultaneously, ensuring all async metric writes complete before snapshot assertion.
apps/e2e/tests/backend/endpoints/api/v1/payments/items.test.ts	Pre-fetches all team-invitation emails in parallel before the sequential sign-up loop; valid optimization since invitations are already dispatched before the loop starts.
apps/e2e/tests/backend/endpoints/api/v1/internal/transactions-refund.test.ts	Three payment-related tests given explicit 120 s per-test timeout to accommodate live-mode Stripe round-trips.
apps/e2e/tests/backend/endpoints/api/v1/internal/projects.test.ts	Snapshot updates: adds `is_development_environment: false` to all project response snapshots to match the newly exposed API field.
apps/e2e/tests/backend/endpoints/api/v1/projects.test.ts	Snapshot updates: adds `is_development_environment: false` to all project response snapshots across OAuth, domain, and email config tests.
apps/e2e/tests/backend/endpoints/api/v1/unsubscribe-link.test.ts	Adds `credential_enabled: true` to project config and uses `noWaitForEmail` to skip the signup confirmation email wait, speeding up the test.
apps/e2e/tests/backend/endpoints/api/v1/token-refresh-events.test.ts	Default retry budget raised to 40 attempts × 500 ms (was 10 × 300 ms); makes event propagation waits more tolerant in slow CI environments.
apps/e2e/tests/backend/endpoints/api/v1/internal/config-local-emulator.test.ts	Error message updated from 'cannot be changed in the local emulator' to 'cannot be changed in a development environment' to match new backend wording.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[E2E Test Start] --> B{Needs Email?}
    B -- Yes, OTP/MagicLink --> C[sendSignInCode]
    B -- No --> D[Auth.fastSignUp]
    C --> E[Deadline-based poll\n60 s / 500 ms interval]
    E -- found --> F[Sign in with code]
    E -- timeout --> G[StackAssertionError thrown]
    F --> H[Test actions]
    D --> H
    H --> I{ClickHouse query?}
    I -- Yes --> J[listReplaysWithRetry /\nwaitForMetricsMatch /\nfetchEventsWithRetry]
    J -- predicate met --> K[Assert & continue]
    J -- timeout --> L[Return last response\nor throw]
    I -- No --> K

Comments Outside Diff (1)

apps/e2e/tests/backend/endpoints/api/v1/session-replays.test.ts, line 657-671 (link)

Silent timeout in listReplaysWithRetry

listReplaysWithRetry silently returns the last (potentially unsatisfying) response after exhausting all attempts rather than throwing. This is inconsistent with waitForMetricsMatch (used elsewhere in this file), which throws a descriptive error on timeout. If ClickHouse ingestion takes longer than 30 × 500 ms = 15 s, the subsequent assertions will produce misleading failures like "expected 0 to be 2" instead of "timed out waiting for predicate".

Prompt To Fix With AI

This is a comment left during a code review.
Path: apps/e2e/tests/backend/endpoints/api/v1/session-replays.test.ts
Line: 657-671

Comment:
**Silent timeout in listReplaysWithRetry**

`listReplaysWithRetry` silently returns the last (potentially unsatisfying) response after exhausting all attempts rather than throwing. This is inconsistent with `waitForMetricsMatch` (used elsewhere in this file), which throws a descriptive error on timeout. If ClickHouse ingestion takes longer than 30 × 500 ms = 15 s, the subsequent assertions will produce misleading failures like "expected 0 to be 2" instead of "timed out waiting for predicate".

How can I resolve this? If you propose a fix, please make it concise.

Prompt To Fix All With AI

Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
apps/e2e/tests/backend/endpoints/api/v1/session-replays.test.ts:657-671
**Silent timeout in listReplaysWithRetry**

`listReplaysWithRetry` silently returns the last (potentially unsatisfying) response after exhausting all attempts rather than throwing. This is inconsistent with `waitForMetricsMatch` (used elsewhere in this file), which throws a descriptive error on timeout. If ClickHouse ingestion takes longer than 30 × 500 ms = 15 s, the subsequent assertions will produce misleading failures like "expected 0 to be 2" instead of "timed out waiting for predicate".

### Issue 2 of 2
apps/e2e/tests/backend/endpoints/api/v1/emails/delivery-info.test.ts:170-181
**Polling loop waits before first check and gives no timeout error**

The loop unconditionally waits 1 s before ever issuing the first request, wasting time even when the email is already processed. More importantly, if the 30-iteration budget (30 s) is exhausted without the condition being met, the loop exits silently; the subsequent `expect(stats.hour.sent).toBe(1)` then produces a confusing assertion failure rather than a clear timeout message. `waitForMetricsMatch` in the same test suite throws a descriptive error on timeout — a similar pattern here would make failures easier to diagnose.

_{Reviews (1): Last reviewed commit: "Merge branch 'dev' into devin/1779233168..." | Re-trigger Greptile}

greptile-apps · 2026-05-20T17:03:48Z

+    // Poll until email stats are updated instead of a fixed wait
+    let response;
+    for (let i = 0; i < 30; i++) {
+      await wait(1_000);
+      response = await niceBackendFetch("/api/v1/emails/delivery-info", {
+        method: "GET",
+        accessType: "server",
+      });
+      if (response.status === 200 && response.body?.stats?.hour?.sent >= 1) {
+        break;
      }
-    `);
+    }


Polling loop waits before first check and gives no timeout error

The loop unconditionally waits 1 s before ever issuing the first request, wasting time even when the email is already processed. More importantly, if the 30-iteration budget (30 s) is exhausted without the condition being met, the loop exits silently; the subsequent expect(stats.hour.sent).toBe(1) then produces a confusing assertion failure rather than a clear timeout message. waitForMetricsMatch in the same test suite throws a descriptive error on timeout — a similar pattern here would make failures easier to diagnose.

Prompt To Fix With AI

This is a comment left during a code review. Path: apps/e2e/tests/backend/endpoints/api/v1/emails/delivery-info.test.ts Line: 170-181 Comment: **Polling loop waits before first check and gives no timeout error** The loop unconditionally waits 1 s before ever issuing the first request, wasting time even when the email is already processed. More importantly, if the 30-iteration budget (30 s) is exhausted without the condition being met, the loop exits silently; the subsequent `expect(stats.hour.sent).toBe(1)` then produces a confusing assertion failure rather than a clear timeout message. `waitForMetricsMatch` in the same test suite throws a descriptive error on timeout — a similar pattern here would make failures easier to diagnose. How can I resolve this? If you propose a fix, please make it concise.

devin-ai-integration Bot and others added 10 commits May 19, 2026 23:26

devin-ai-integration Bot assigned N2D4 May 19, 2026

vercel Bot deployed to Preview – stack-auth-skills May 19, 2026 23:43 View deployment

vercel Bot deployed to Preview – stack-auth-mcp May 19, 2026 23:44 View deployment

vercel Bot deployed to Preview – stack-preview-dashboard May 19, 2026 23:45 View deployment

vercel Bot deployed to Preview – stack-preview-backend May 19, 2026 23:46 View deployment

vercel Bot deployed to Preview – stack-auth-skills May 20, 2026 01:17 View deployment

vercel Bot deployed to Preview – stack-auth-mcp May 20, 2026 01:17 View deployment

vercel Bot deployed to Preview – stack-preview-dashboard May 20, 2026 01:19 View deployment

vercel Bot deployed to Preview – stack-preview-backend May 20, 2026 01:19 View deployment

Merge branch 'dev' into devin/1779233168-fix-flaky-tests

33927b7

vercel Bot deployed to Preview – stack-auth-skills May 20, 2026 02:07 View deployment

vercel Bot deployed to Preview – stack-auth-mcp May 20, 2026 02:08 View deployment

vercel Bot deployed to Preview – stack-preview-dashboard May 20, 2026 02:09 View deployment

vercel Bot deployed to Preview – stack-preview-backend May 20, 2026 02:10 View deployment

devin-ai-integration Bot and others added 2 commits May 20, 2026 06:12

vercel Bot deployed to Preview – stack-auth-skills May 20, 2026 06:13 View deployment

vercel Bot deployed to Preview – stack-auth-mcp May 20, 2026 06:13 View deployment

vercel Bot deployed to Preview – stack-preview-dashboard May 20, 2026 06:15 View deployment

vercel Bot deployed to Preview – stack-preview-backend May 20, 2026 06:16 View deployment

vercel Bot deployed to Preview – stack-auth-skills May 20, 2026 06:28 View deployment

vercel Bot deployed to Preview – stack-auth-mcp May 20, 2026 06:28 View deployment

vercel Bot deployed to Preview – stack-preview-dashboard May 20, 2026 06:30 View deployment

vercel Bot deployed to Preview – stack-preview-backend May 20, 2026 06:31 View deployment

vercel Bot deployed to Preview – stack-auth-skills May 20, 2026 07:05 View deployment

vercel Bot deployed to Preview – stack-auth-mcp May 20, 2026 07:05 View deployment

vercel Bot deployed to Preview – stack-preview-dashboard May 20, 2026 07:07 View deployment

vercel Bot deployed to Preview – stack-preview-backend May 20, 2026 07:07 View deployment

devin-ai-integration Bot changed the title ~~Fix flaky tests: ClickHouse polling, email timeouts, and test budget optimization~~ Fix flaky tests and preexisting CI failures May 20, 2026

N2D4 marked this pull request as ready for review May 20, 2026 16:56

Copilot AI review requested due to automatic review settings May 20, 2026 16:56

Copilot started reviewing on behalf of N2D4 May 20, 2026 16:57 View session

Merge branch 'dev' into devin/1779233168-fix-flaky-tests

5a4e972

vercel Bot deployed to Preview – stack-auth-skills May 20, 2026 16:58 View deployment

vercel Bot deployed to Preview – stack-auth-mcp May 20, 2026 16:58 View deployment

vercel Bot deployed to Preview – stack-preview-dashboard May 20, 2026 17:00 View deployment

N2D4 merged commit ffbd09d into dev May 20, 2026
29 of 32 checks passed

N2D4 deleted the devin/1779233168-fix-flaky-tests branch May 20, 2026 17:00

vercel Bot deployed to Preview – stack-preview-backend May 20, 2026 17:01 View deployment

greptile-apps Bot reviewed May 20, 2026

View reviewed changes

N2D4 review requested due to automatic review settings May 20, 2026 17:21

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix flaky tests and preexisting CI failures#1443

Fix flaky tests and preexisting CI failures#1443
N2D4 merged 18 commits into
devfrom
devin/1779233168-fix-flaky-tests

N2D4 commented May 19, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

vercel Bot commented May 19, 2026 •

edited

Loading

Uh oh!

devin-ai-integration Bot commented May 19, 2026

Uh oh!

coderabbitai Bot commented May 19, 2026 •

edited

Loading

Review failed

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

Uh oh!

Uh oh!

greptile-apps Bot commented May 20, 2026 •

edited

Loading

Comments Outside Diff (1)

Uh oh!

greptile-apps Bot May 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

N2D4 commented May 19, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Release Notes

Uh oh!

vercel Bot commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

devin-ai-integration Bot commented May 19, 2026

🤖 Devin AI Engineer

Uh oh!

coderabbitai Bot commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review failed

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

Uh oh!

Uh oh!

greptile-apps Bot commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Flowchart

Comments Outside Diff (1)

Uh oh!

greptile-apps Bot May 20, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

N2D4 commented May 19, 2026 •

edited by coderabbitai Bot

Loading

vercel Bot commented May 19, 2026 •

edited

Loading

coderabbitai Bot commented May 19, 2026 •

edited

Loading

greptile-apps Bot commented May 20, 2026 •

edited

Loading