Skip to content

Fix flaky tests and preexisting CI failures#1443

Merged
N2D4 merged 18 commits into
devfrom
devin/1779233168-fix-flaky-tests
May 20, 2026
Merged

Fix flaky tests and preexisting CI failures#1443
N2D4 merged 18 commits into
devfrom
devin/1779233168-fix-flaky-tests

Conversation

@N2D4
Copy link
Copy Markdown
Contributor

@N2D4 N2D4 commented May 19, 2026

Fix flaky tests and all preexisting CI failures to achieve green checks.

Flaky test fixes:

  • Email polling: 20s→60s deadline-based polling in sendSignInCode
  • ClickHouse eventual consistency: retry loops for session-replays, token-refresh-events, internal-metrics
  • Unnecessary email waits: replaced with fastSignUp() or noWaitForEmail in 6 test files
  • Fixed waits: delivery-info wait(12s) → polling loop
  • Timeout/parallelism: backend timeout 20s→60s, parallel email fetching, 120s for payment tests

Preexisting CI fixes:

  • Update E2E snapshots for is_development_environment field across all project response tests
  • Fix config.tsx in-source test: spy on getEnvironmentConfigWriteBlockReason instead of isDevelopmentEnvironmentProject (intra-module call not intercepted by module namespace spy)
  • Fix config-local-emulator.test.ts: update blocked message to match new development environment wording
  • Fix dashboard lint: remove unnecessary conditional on always-truthy props.reset

Link to Devin session: https://app.devin.ai/sessions/8a3910351551437a9b655d15af4c4154
Requested by: @N2D4


Note

Low Risk
Primarily adjusts test timeouts, polling/retry logic, and snapshots; production code changes are minimal and localized to test behavior and a small dashboard error-boundary handler.

Overview
Improves CI stability by replacing fixed sleeps and short retry loops with deadline-based polling/retries across backend E2E tests (email delivery/sign-in, ClickHouse-ingested analytics/session-replays, internal metrics), and by skipping unnecessary email waits via Auth.fastSignUp() / noWaitForEmail.

Updates numerous E2E snapshots to include the new is_development_environment field in project responses, aligns local-emulator blocking wording with “development environment”, and bumps backend Vitest testTimeout from 20s to 60s.

Fixes a brittle in-source Vitest test in config.tsx by spying on getEnvironmentConfigWriteBlockReason (instead of an intra-module call that couldn’t be intercepted), and removes an unnecessary conditional around props.reset() in the dashboard widget playground error boundary.

Reviewed by Cursor Bugbot for commit 5a4e972. Bugbot is set up for automated code reviews on this repo. Configure here.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added is_development_environment field to project configuration API responses.
  • Bug Fixes

    • Fixed "Reload widget" button in widget error boundary to always be functional.
  • Tests

    • Improved test reliability and performance with faster authentication methods and intelligent polling mechanisms.
    • Updated test timeout configurations and error messaging for development environments.

Review Change Stack

devin-ai-integration Bot and others added 10 commits May 19, 2026 23:26
The sendSignInCode helper used a fixed 40-iteration loop with increasing
delays (100 + i*20 ms), giving a maximum wait of ~20 seconds. Under CI
load with parallel test workers, the email queue + Inbucket delivery
pipeline can exceed this window, causing flaky 'Sign-in code message not
found after 40 attempts' errors.

Root cause: The email delivery path is asynchronous — the API queues the
email, the email-queue worker picks it up, renders the template, and
sends it to Inbucket. When multiple test workers are running
concurrently (as in CI), this pipeline competes for CPU and I/O,
regularly pushing delivery times past the 20s polling window.

Fix: Replace the fixed-iteration loop with a deadline-based polling
approach (60s deadline, 500ms interval), matching the pattern already
used by waitForMessagesWithSubject. This gives 3x more headroom while
keeping the polling interval consistent. The loop still exits immediately
once the message appears, so there is no cost on the happy path.

Co-Authored-By: Konstantin Wohlwend <n2d4xc@gmail.com>
…plays tests

Three session-replays tests were flaky:
1. 'admin list session replays filters by user_ids'
2. 'admin list session replays filters by click_count_min'
3. 'admin events endpoint does not allow fetching a chunk via the wrong session id'

Root causes:
- The user_ids filter test queried ClickHouse immediately after uploading
  batches without any retry loop. ClickHouse uses async_insert (server-
  buffered, no wait_for_async_insert), so data is not immediately
  queryable. Under CI load, this caused assertion failures (expected
  items.length === 1 but got 0).
- The click_count_min test had a retry loop but with only 15 attempts ×
  500ms = 7.5s, which was insufficient under CI load where ClickHouse
  ingestion can take 10–15s.
- The chunk isolation test used two Auth.Otp.signIn() calls just to get
  two authenticated sessions. Each OTP sign-in involves email delivery
  and polling (5–15s each under CI load), consuming 20–40s of the 50s
  test timeout budget and causing timeouts.

Fixes:
- Added listReplaysWithRetry helper (30 attempts × 500ms = 15s) with a
  predicate-based retry loop, consistent with the retry patterns already
  used in analytics-events.test.ts and token-refresh-events.test.ts.
- Replaced Auth.Otp.signIn() with Auth.fastSignUp() in tests that don't
  specifically test OTP. fastSignUp creates a user and session server-side
  without any email flow, taking <100ms vs 5–15s for OTP under CI load.
- Used listReplaysWithRetry for both user_ids and click_count_min
  assertions to tolerate ClickHouse eventual consistency.

Co-Authored-By: Konstantin Wohlwend <n2d4xc@gmail.com>
…ken-refresh-events tests

The 'multiple session refreshes create one event each' test was flaky
because:

1. The ClickHouse polling window (15 attempts × 300ms = 4.5s) was too
   short. The event pipeline goes through SDK self-call → quota debit →
   Postgres insert → ClickHouse async_insert (server-buffered). Under CI
   load this can exceed 4.5s, causing 'Expected exactly N events but
   found N-1' errors.

2. The test used Auth.Otp.signIn() for initial session creation, which
   involves email delivery and polling (~5-15s under CI load). This ate
   into the 50s test timeout budget, leaving insufficient time for four
   sequential ClickHouse polling cycles.

Fixes:
- Increased fetchEventsWithRetry defaults from 10 attempts × 300ms to
  40 attempts × 500ms (20s max), matching the pattern in
  analytics-events.test.ts. The loop still exits immediately when the
  expected count appears, so there is no cost on the happy path.
- Updated expectExactlyNTokenRefreshEvents to use the new defaults
  instead of hardcoded 15 × 300ms.
- Replaced Auth.Otp.signIn() with Auth.fastSignUp() in the 'multiple
  session refreshes' test since it only needs an initial session with
  a refresh token, not OTP specifically. This saves 5-15s per test run.

Co-Authored-By: Konstantin Wohlwend <n2d4xc@gmail.com>
…nal-metrics test

The 'should return metrics data with users' test was flaky because
waitForMetricsToIncludeUsersByCountry only checked that users_by_country
['CH'] === 1 before taking the snapshot. The snapshot also expects 2
users in 'AQ' (active_users_by_country), but under CI load ClickHouse
may ingest AQ events after CH events, causing the snapshot to see only 1
user in AQ instead of 2.

Root cause: The ClickHouse async_insert pipeline ingests events in a
non-deterministic order. When the test creates users with different
country data, the events for different countries may arrive at different
times. Waiting for only one country's count to match doesn't guarantee
the other country's data is fully ingested.

Fix: Use waitForMetricsMatch with a combined predicate that checks both
users_by_country['CH'] === 1 AND active_users_by_country['AQ'].length
>= 2, ensuring all expected data is present before the snapshot fires.

Co-Authored-By: Konstantin Wohlwend <n2d4xc@gmail.com>
…test

The 'cannot read events from other projects' test was timing out at 50s
because it performed two Auth.Otp.signIn() calls (one per project). Each
OTP sign-in involves sending an email, polling the mailbox for delivery,
and extracting the code — taking 5-15s under CI load. Two sign-ins could
consume 20-40s of the 50s test timeout, leaving insufficient time for
the ClickHouse polling that follows (up to 20s).

Root cause: The test only needs authenticated users in two projects to
verify cross-project event isolation. It doesn't test OTP functionality.
Using slow email-based auth for a non-auth test wastes timeout budget.

Fix: Replace both Auth.Otp.signIn() calls with Auth.fastSignUp(), which
creates users and sessions server-side without any email flow (<100ms
each). fastSignUp still generates $token-refresh events in ClickHouse
(via the session creation path), so the test's assertions remain valid.

Co-Authored-By: Konstantin Wohlwend <n2d4xc@gmail.com>
The 'cannot read another user sessions as client' test was timing out at
50s because it performed two Auth.Password.signUpWithEmail() calls, each
waiting for the verification email. Under CI load with 8 parallel test
workers, email delivery can take 5-15s per message, consuming 20-40s of
the 50s budget.

Root cause: The test only needs two authenticated users to verify that
user2 cannot read user1's sessions. Email verification is irrelevant to
this test's purpose.

Fix: Pass { noWaitForEmail: true } to both signUpWithEmail calls. This
skips the verification email wait while still creating authenticated
users with valid sessions.

Co-Authored-By: Konstantin Wohlwend <n2d4xc@gmail.com>
…tats test

The 'should track sent emails' test was flaky because:

1. Auth.Otp.signIn() consumed 5-15s of timeout budget under CI load,
   leaving less time for the fixed 12s wait.
2. The fixed wait(12_000) was not always sufficient for the email stats
   pipeline to update, especially under CI resource contention.
3. The rate_per_second value in the inline snapshot was timing-dependent
   (computed from recent activity), causing non-deterministic mismatches.

Root cause: The email delivery tracking pipeline processes sent emails
asynchronously. A fixed wait is inherently fragile — if the pipeline is
slower than 12s (common under CI load), the stats show sent=0 and the
snapshot mismatches.

Fixes:
- Replaced Auth.Otp.signIn() with Auth.fastSignUp() since the test just
  needs an initial auth session.
- Replaced the fixed wait(12_000) with a retry loop (30 attempts × 1s)
  that polls delivery-info until stats.hour.sent >= 1. This adapts to
  pipeline latency automatically.
- Replaced the inline snapshot with explicit field assertions to avoid
  flakiness from the timing-dependent rate_per_second value. The rate
  is checked with a range (27 < rate < 28) instead of exact matching.

Co-Authored-By: Konstantin Wohlwend <n2d4xc@gmail.com>
The 'unsubscribe link should not be sent for emails with transactional
notification category' test was timing out at 50s because
Auth.Password.signUpWithEmail() waits for the verification email (up to
60s). Under CI load, this can take 15-30s, leaving insufficient time for
the test's own email send + waitForMessagesWithSubject call (which also
has a 60s deadline).

Root cause: The test needs an authenticated user with a valid email to
send a custom email and check its content. Email verification is not
required — the test only checks if the sent email contains an
unsubscribe link.

Fix: Pass { noWaitForEmail: true } to skip the verification email wait.
Also added credential_enabled: true to the project config to ensure
password auth is available.

Co-Authored-By: Konstantin Wohlwend <n2d4xc@gmail.com>
…tests

Multiple backend unit tests (regen-internal-subscriptions-to-latest and
ensure-free-plan) were timing out at 20s under CI load. These tests
perform real database operations (Prisma queries, transaction processing)
that compete for I/O with 8 parallel test workers.

Root cause: The 20s timeout was calibrated for isolated local runs. In
CI, the shared database connection pool is contended by parallel workers,
causing individual queries to take 2-5x longer than in isolation.

Fix: Increase the backend testTimeout from 20s to 60s. This matches the
e2e test timeout pattern (50s in CI) and provides sufficient headroom for
DB-heavy tests under CI load. The timeout only fires if a test truly
hangs — well-behaved tests still complete in their normal time.

Co-Authored-By: Konstantin Wohlwend <n2d4xc@gmail.com>
…for refund tests

Two payment-related tests were timing out under CI load:

1. 'allows team admins to be added when item quantity is increased'
   (items.test.ts): The test sends 3 invitation emails in parallel but
   then waits for each email sequentially in the for loop. Each
   waitForMessagesWithSubject call has a 60s deadline. Under CI load,
   3 sequential waits can consume 15-45s of the 50s test timeout.

   Fix: Pre-fetch all invitation emails in parallel using Promise.all()
   before the acceptance loop. Since the invitations are already sent,
   all mailboxes can poll concurrently, reducing the total wait from
   3×N seconds to max(N) seconds.

2. 'refunds a renewal invoice' and related tests
   (transactions-refund.test.ts): Each test calls
   createLiveModeSubscriptionWithRenewal() which performs 7+ API calls
   including Stripe webhook sends, project setup, and payment config
   updates. Under CI load, this setup alone can take 30-40s.

   Fix: Added { timeout: 120_000 } to all three tests that use
   createLiveModeSubscriptionWithRenewal, providing adequate headroom
   for the heavy payment pipeline operations.

Co-Authored-By: Konstantin Wohlwend <n2d4xc@gmail.com>
@vercel
Copy link
Copy Markdown

vercel Bot commented May 19, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
stack-auth-hosted-components Ready Ready Preview, Comment May 20, 2026 5:06pm
stack-auth-mcp Ready Ready Preview, Comment May 20, 2026 5:06pm
stack-auth-skills Ready Ready Preview, Comment May 20, 2026 5:06pm
stack-backend Ready Ready Preview, Comment May 20, 2026 5:06pm
stack-dashboard Ready Ready Preview, Comment May 20, 2026 5:06pm
stack-demo Ready Ready Preview, Comment May 20, 2026 5:06pm
stack-docs Ready Ready Preview, Comment May 20, 2026 5:06pm
stack-preview-backend Ready Ready Preview, Comment May 20, 2026 5:06pm
stack-preview-dashboard Ready Ready Preview, Comment May 20, 2026 5:06pm

@devin-ai-integration
Copy link
Copy Markdown
Contributor

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR that start with 'DevinAI' or '@devin'.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 19, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e51575d4-1d5b-4089-8113-2e07fbc7029d

📥 Commits

Reviewing files that changed from the base of the PR and between 97f86a1 and 5a4e972.

📒 Files selected for processing (23)
  • apps/backend/src/lib/config.tsx
  • apps/backend/src/lib/email-queue-step.tsx
  • apps/backend/vitest.config.ts
  • apps/dashboard/src/app/(main)/(protected)/projects/[projectId]/widget-playground/page-client.tsx
  • apps/e2e/tests/backend/backend-helpers.ts
  • apps/e2e/tests/backend/endpoints/api/v1/analytics-events.test.ts
  • apps/e2e/tests/backend/endpoints/api/v1/auth/sessions/index.test.ts
  • apps/e2e/tests/backend/endpoints/api/v1/emails/delivery-info.test.ts
  • apps/e2e/tests/backend/endpoints/api/v1/integrations/custom/projects/provision.test.ts
  • apps/e2e/tests/backend/endpoints/api/v1/integrations/neon/oauth-providers.test.ts
  • apps/e2e/tests/backend/endpoints/api/v1/integrations/neon/projects/current.test.ts
  • apps/e2e/tests/backend/endpoints/api/v1/integrations/neon/projects/provision.test.ts
  • apps/e2e/tests/backend/endpoints/api/v1/internal-metrics.test.ts
  • apps/e2e/tests/backend/endpoints/api/v1/internal/config-local-emulator.test.ts
  • apps/e2e/tests/backend/endpoints/api/v1/internal/projects.test.ts
  • apps/e2e/tests/backend/endpoints/api/v1/internal/transactions-refund.test.ts
  • apps/e2e/tests/backend/endpoints/api/v1/payments/items.test.ts
  • apps/e2e/tests/backend/endpoints/api/v1/project-permissions.test.ts
  • apps/e2e/tests/backend/endpoints/api/v1/projects.test.ts
  • apps/e2e/tests/backend/endpoints/api/v1/session-replays.test.ts
  • apps/e2e/tests/backend/endpoints/api/v1/team-permissions.test.ts
  • apps/e2e/tests/backend/endpoints/api/v1/token-refresh-events.test.ts
  • apps/e2e/tests/backend/endpoints/api/v1/unsubscribe-link.test.ts

📝 Walkthrough

Walkthrough

This PR adds the is_development_environment boolean field to project API responses, updates development environment write-blocking to check getEnvironmentConfigWriteBlockReason, and improves E2E test reliability through faster auth methods, deadline-based polling, and retry helpers.

Changes

Development Environment Configuration and E2E Test Improvements

Layer / File(s) Summary
Development environment config blocking setup
apps/backend/src/lib/config.tsx, apps/e2e/tests/backend/endpoints/api/v1/internal/config-local-emulator.test.ts
Updated setEnvironmentConfigOverride test to spy on getEnvironmentConfigWriteBlockReason instead of isDevelopmentEnvironmentProject, and changed expected error message from "local emulator" to "development environment".
Project response snapshots with is_development_environment field
apps/e2e/tests/backend/endpoints/api/v1/integrations/custom/projects/*, apps/e2e/tests/backend/endpoints/api/v1/integrations/neon/*, apps/e2e/tests/backend/endpoints/api/v1/internal/projects.test.ts, apps/e2e/tests/backend/endpoints/api/v1/*-permissions.test.ts, apps/e2e/tests/backend/endpoints/api/v1/projects.test.ts
All project API response snapshots updated to include is_development_environment: false field across internal, integration, and public endpoints.
E2E test polling and retry infrastructure
apps/e2e/tests/backend/endpoints/api/v1/session-replays.test.ts, apps/e2e/tests/backend/endpoints/api/v1/internal-metrics.test.ts, apps/e2e/tests/backend/endpoints/api/v1/token-refresh-events.test.ts
Introduced listReplaysWithRetry helper for predicate-based polling, updated fetchEventsWithRetry defaults to longer deadlines and more attempts, and switched internal-metrics test to predicate-based metric polling.
E2E test auth modernization and timing improvements
apps/e2e/tests/backend/endpoints/api/v1/analytics-events.test.ts, apps/e2e/tests/backend/endpoints/api/v1/auth/sessions/index.test.ts, apps/e2e/tests/backend/endpoints/api/v1/internal/transactions-refund.test.ts, apps/e2e/tests/backend/endpoints/api/v1/payments/items.test.ts, apps/e2e/tests/backend/endpoints/api/v1/session-replays.test.ts, apps/e2e/tests/backend/endpoints/api/v1/unsubscribe-link.test.ts
Migrated E2E test auth from Auth.Otp.signIn() to Auth.fastSignUp() and Auth.Password.signUpWithEmail({ noWaitForEmail: true }), added explicit timeout: 120_000 to payment refund tests, and parallelized email message fetching.
Email delivery reliability and OTP polling
apps/e2e/tests/backend/backend-helpers.ts, apps/e2e/tests/backend/endpoints/api/v1/emails/delivery-info.test.ts
Updated Auth.Otp.sendSignInCode from attempt-count polling to deadline-based polling using performance.now() with fixed 500ms intervals, and improved email delivery-info test to actively poll until sent counter updates instead of fixed 12s wait.
Widget error boundary and test configuration
apps/dashboard/src/app/(main)/(protected)/projects/[projectId]/widget-playground/page-client.tsx, apps/backend/vitest.config.ts, apps/backend/src/lib/email-queue-step.tsx
Unconditionally render "Reload widget" button and invoke props.reset(), increased Vitest testTimeout to 60 seconds, and adjusted email-queue-step formatting around quota-gate and send-attempt logic.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • hexclave/stack-auth#1435: Implements the development-environment config write-blocking logic that this PR's test updates verify.
  • hexclave/stack-auth#1420: Similar ClickHouse polling improvements to internal-metrics test for eventual-consistency handling.
  • hexclave/stack-auth#1424: Backend session-replay endpoint changes that align with the PR's session-replay E2E test retry helper additions.

Suggested reviewers

  • nams1570

Poem

🐰 A rabbit hops through snapshots vast and wide,
With is_development_environment by its side,
Auth flows now swift on fastSignUp feet,
Polling helpers make flaky tests complete,
The widget reloads, timeout grows long—
Infrastructure sings a steadier song!

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch devin/1779233168-fix-flaky-tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Adds detailed timing logs to identify what makes specific emails exceed
the 20s boundary:

- sendSignInCode: logs total/api/poll time when >10s, tracks outbox
  status transitions during polling, reports poll count on timeout
- prepareSendPlan: logs delivery stats query time, claim time, and
  quota=0 deferrals per tenancy
- processTenancyBatch: logs setup time (getTenancy + getEmailConfig)
  and billingTeamId presence
- processSingleEmail: logs per-email breakdown (resolve/billing/smtp)
  when >2s

These logs will show up in CI output for any email exceeding thresholds,
allowing us to identify exactly which pipeline stage is causing the
tail latency for specific slow emails.

Co-Authored-By: Konstantin Wohlwend <n2d4xc@gmail.com>
devin-ai-integration Bot and others added 2 commits May 20, 2026 06:12
Remove timing measurements and console.warn calls that were added
during the email delivery performance investigation. The analysis is
complete (billing self-calls account for 96-99% of slow email delivery
time under CI load), and the 60s deadline-based polling timeout is
sufficient.

Removed from backend-helpers.ts:
- apiCallStart/apiCallEnd performance markers
- pollCount and lastOutboxStatus tracking
- console.warn calls for SLOW/WAITING emails
- getOutboxEmails() calls in success path

Removed from email-queue-step.tsx:
- Per-stage timing markers in prepareSendPlan, processTenancyBatch,
  and processSingleEmail
- console.warn calls for slow pipeline stages

Co-Authored-By: Konstantin Wohlwend <n2d4xc@gmail.com>
The dev branch added an is_development_environment field to the project
API response. Update all inline snapshots in project-related E2E tests
to include the new field (always false for test-created projects).

Affected test files:
- projects.test.ts (16 snapshots)
- internal/projects.test.ts (12 snapshots)
- integrations/custom/projects/provision.test.ts
- integrations/neon/oauth-providers.test.ts
- integrations/neon/projects/current.test.ts
- integrations/neon/projects/provision.test.ts

Co-Authored-By: Konstantin Wohlwend <n2d4xc@gmail.com>
The setEnvironmentConfigOverride test was spying on
isDevelopmentEnvironmentProject, but that function is called
internally within the development-environment module (intra-module
call), so vi.spyOn on the module export doesn't intercept it.

Fix: spy on getEnvironmentConfigWriteBlockReason instead, which is
the function that setEnvironmentConfigOverride actually imports and
calls through the module namespace (cross-module call), making the
spy effective.

Co-Authored-By: Konstantin Wohlwend <n2d4xc@gmail.com>
ErrorBoundary's errorComponent props.reset is always defined (not
optional), so the conditional check and non-null assertion were
unnecessary, triggering @typescript-eslint/no-unnecessary-condition.

Co-Authored-By: Konstantin Wohlwend <n2d4xc@gmail.com>
- project-permissions.test.ts, team-permissions.test.ts: add missing
  is_development_environment field to project response snapshots
- config-local-emulator.test.ts: update blocked message to match the
  new development environment wording (was: 'cannot be changed in the
  local emulator', now: 'cannot be changed in a development environment')

Co-Authored-By: Konstantin Wohlwend <n2d4xc@gmail.com>
@devin-ai-integration devin-ai-integration Bot changed the title Fix flaky tests: ClickHouse polling, email timeouts, and test budget optimization Fix flaky tests and preexisting CI failures May 20, 2026
@N2D4 N2D4 marked this pull request as ready for review May 20, 2026 16:56
Copilot AI review requested due to automatic review settings May 20, 2026 16:56
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 20, 2026

Greptile Summary

This PR addresses flaky E2E tests and pre-existing CI failures across the backend, dashboard, and test suites. No production logic changes are included.

  • Flakiness fixes: email polling is upgraded from a 20 s counted loop to a 60 s deadline-based poll; ClickHouse eventual-consistency waits are replaced with proper retry/predicate helpers; payment tests get explicit 120 s timeouts; parallel email pre-fetching is introduced in the items test.
  • Snapshot and wording alignment: all project-response snapshots are updated for the new is_development_environment field; the local-emulator blocked message and the config.tsx in-source spy are corrected to match renamed functions and updated copy.
  • Dashboard lint fix: unconditional props.reset() call replaces the always-truthy optional-chaining guard inside the ErrorBoundary error component.

Confidence Score: 4/5

Safe to merge; changes are confined to tests and a trivial dashboard cosmetic fix with no production behaviour changes.

The production-code changes (whitespace in email-queue-step.tsx, spy target in config.tsx, and the props.reset guard removal in the dashboard) are all correct and low-risk. The bulk of the PR is test infrastructure: two newly introduced retry helpers (listReplaysWithRetry in session-replays and the polling loop in delivery-info) silently return the last response on timeout rather than throwing, which could produce misleading assertion errors when the budget is exhausted — unlike waitForMetricsMatch, which does throw. This inconsistency is the only noteworthy concern.

apps/e2e/tests/backend/endpoints/api/v1/session-replays.test.ts and apps/e2e/tests/backend/endpoints/api/v1/emails/delivery-info.test.ts warrant a second look for the silent-timeout behaviour in their new polling helpers.

Important Files Changed

Filename Overview
apps/backend/src/lib/config.tsx In-source vitest: spy target changed from isDevelopmentEnvironmentProject to getEnvironmentConfigWriteBlockReason to correctly intercept the intra-module call path; correctly fixes the test.
apps/backend/src/lib/email-queue-step.tsx Whitespace-only change (removed two blank lines); no functional impact.
apps/backend/vitest.config.ts Backend test timeout increased from 20 s to 60 s to accommodate longer email-polling and ClickHouse consistency waits.
apps/dashboard/src/app/(main)/(protected)/projects/[projectId]/widget-playground/page-client.tsx Removed unnecessary conditional guard on props.reset; Next.js ErrorBoundary always supplies reset in the error-component props, so the check was always truthy and the non-null assertion was redundant.
apps/e2e/tests/backend/backend-helpers.ts Email polling refactored to deadline-based (60 s) with a fixed 500 ms interval; removes outboxEmails from the error payload (minor debug-info loss) but substantially increases tolerance for slow mail delivery.
apps/e2e/tests/backend/endpoints/api/v1/emails/delivery-info.test.ts Replaces fixed 12 s wait with a polling loop (30 × 1 s); loop always sleeps before the first request and silently falls through on timeout, producing opaque assertion errors — see inline comment.
apps/e2e/tests/backend/endpoints/api/v1/session-replays.test.ts Adds listReplaysWithRetry to handle ClickHouse eventual consistency; multiple Auth.Otp.signIn() calls replaced with Auth.fastSignUp() to remove email dependencies. The helper silently returns on timeout rather than throwing — see inline comment.
apps/e2e/tests/backend/endpoints/api/v1/internal-metrics.test.ts Retry predicate strengthened to wait for both users_by_country["CH"] === 1 and active_users_by_country["AQ"].length >= 2 simultaneously, ensuring all async metric writes complete before snapshot assertion.
apps/e2e/tests/backend/endpoints/api/v1/payments/items.test.ts Pre-fetches all team-invitation emails in parallel before the sequential sign-up loop; valid optimization since invitations are already dispatched before the loop starts.
apps/e2e/tests/backend/endpoints/api/v1/internal/transactions-refund.test.ts Three payment-related tests given explicit 120 s per-test timeout to accommodate live-mode Stripe round-trips.
apps/e2e/tests/backend/endpoints/api/v1/internal/projects.test.ts Snapshot updates: adds is_development_environment: false to all project response snapshots to match the newly exposed API field.
apps/e2e/tests/backend/endpoints/api/v1/projects.test.ts Snapshot updates: adds is_development_environment: false to all project response snapshots across OAuth, domain, and email config tests.
apps/e2e/tests/backend/endpoints/api/v1/unsubscribe-link.test.ts Adds credential_enabled: true to project config and uses noWaitForEmail to skip the signup confirmation email wait, speeding up the test.
apps/e2e/tests/backend/endpoints/api/v1/token-refresh-events.test.ts Default retry budget raised to 40 attempts × 500 ms (was 10 × 300 ms); makes event propagation waits more tolerant in slow CI environments.
apps/e2e/tests/backend/endpoints/api/v1/internal/config-local-emulator.test.ts Error message updated from 'cannot be changed in the local emulator' to 'cannot be changed in a development environment' to match new backend wording.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[E2E Test Start] --> B{Needs Email?}
    B -- Yes, OTP/MagicLink --> C[sendSignInCode]
    B -- No --> D[Auth.fastSignUp]
    C --> E[Deadline-based poll\n60 s / 500 ms interval]
    E -- found --> F[Sign in with code]
    E -- timeout --> G[StackAssertionError thrown]
    F --> H[Test actions]
    D --> H
    H --> I{ClickHouse query?}
    I -- Yes --> J[listReplaysWithRetry /\nwaitForMetricsMatch /\nfetchEventsWithRetry]
    J -- predicate met --> K[Assert & continue]
    J -- timeout --> L[Return last response\nor throw]
    I -- No --> K
Loading

Comments Outside Diff (1)

  1. apps/e2e/tests/backend/endpoints/api/v1/session-replays.test.ts, line 657-671 (link)

    P2 Silent timeout in listReplaysWithRetry

    listReplaysWithRetry silently returns the last (potentially unsatisfying) response after exhausting all attempts rather than throwing. This is inconsistent with waitForMetricsMatch (used elsewhere in this file), which throws a descriptive error on timeout. If ClickHouse ingestion takes longer than 30 × 500 ms = 15 s, the subsequent assertions will produce misleading failures like "expected 0 to be 2" instead of "timed out waiting for predicate".

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: apps/e2e/tests/backend/endpoints/api/v1/session-replays.test.ts
    Line: 657-671
    
    Comment:
    **Silent timeout in listReplaysWithRetry**
    
    `listReplaysWithRetry` silently returns the last (potentially unsatisfying) response after exhausting all attempts rather than throwing. This is inconsistent with `waitForMetricsMatch` (used elsewhere in this file), which throws a descriptive error on timeout. If ClickHouse ingestion takes longer than 30 × 500 ms = 15 s, the subsequent assertions will produce misleading failures like "expected 0 to be 2" instead of "timed out waiting for predicate".
    
    How can I resolve this? If you propose a fix, please make it concise.
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
apps/e2e/tests/backend/endpoints/api/v1/session-replays.test.ts:657-671
**Silent timeout in listReplaysWithRetry**

`listReplaysWithRetry` silently returns the last (potentially unsatisfying) response after exhausting all attempts rather than throwing. This is inconsistent with `waitForMetricsMatch` (used elsewhere in this file), which throws a descriptive error on timeout. If ClickHouse ingestion takes longer than 30 × 500 ms = 15 s, the subsequent assertions will produce misleading failures like "expected 0 to be 2" instead of "timed out waiting for predicate".

### Issue 2 of 2
apps/e2e/tests/backend/endpoints/api/v1/emails/delivery-info.test.ts:170-181
**Polling loop waits before first check and gives no timeout error**

The loop unconditionally waits 1 s before ever issuing the first request, wasting time even when the email is already processed. More importantly, if the 30-iteration budget (30 s) is exhausted without the condition being met, the loop exits silently; the subsequent `expect(stats.hour.sent).toBe(1)` then produces a confusing assertion failure rather than a clear timeout message. `waitForMetricsMatch` in the same test suite throws a descriptive error on timeout — a similar pattern here would make failures easier to diagnose.

Reviews (1): Last reviewed commit: "Merge branch 'dev' into devin/1779233168..." | Re-trigger Greptile

Comment on lines +170 to +181
// Poll until email stats are updated instead of a fixed wait
let response;
for (let i = 0; i < 30; i++) {
await wait(1_000);
response = await niceBackendFetch("/api/v1/emails/delivery-info", {
method: "GET",
accessType: "server",
});
if (response.status === 200 && response.body?.stats?.hour?.sent >= 1) {
break;
}
`);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Polling loop waits before first check and gives no timeout error

The loop unconditionally waits 1 s before ever issuing the first request, wasting time even when the email is already processed. More importantly, if the 30-iteration budget (30 s) is exhausted without the condition being met, the loop exits silently; the subsequent expect(stats.hour.sent).toBe(1) then produces a confusing assertion failure rather than a clear timeout message. waitForMetricsMatch in the same test suite throws a descriptive error on timeout — a similar pattern here would make failures easier to diagnose.

Prompt To Fix With AI
This is a comment left during a code review.
Path: apps/e2e/tests/backend/endpoints/api/v1/emails/delivery-info.test.ts
Line: 170-181

Comment:
**Polling loop waits before first check and gives no timeout error**

The loop unconditionally waits 1 s before ever issuing the first request, wasting time even when the email is already processed. More importantly, if the 30-iteration budget (30 s) is exhausted without the condition being met, the loop exits silently; the subsequent `expect(stats.hour.sent).toBe(1)` then produces a confusing assertion failure rather than a clear timeout message. `waitForMetricsMatch` in the same test suite throws a descriptive error on timeout — a similar pattern here would make failures easier to diagnose.

How can I resolve this? If you propose a fix, please make it concise.

@N2D4 N2D4 review requested due to automatic review settings May 20, 2026 17:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant