
[Test] Web acceptance stability#4506

Open
bekossy wants to merge 35 commits into release/v0.103.5 from test-/-web-acceptance-stability

bekossy (Member) commented May 31, 2026

Summary

Testing

Verified locally

Added or updated tests

QA follow-up

Demo

Checklist

  • I have included a video or screen recording for UI changes, or marked Demo as N/A
  • Relevant tests pass locally
  • Relevant linting and formatting pass locally
  • I have signed the CLA, or I will sign it when the bot prompts me

Contributor Resources


vercel Bot commented May 31, 2026


The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Actions | Updated (UTC)
agenta-documentation | Ready | Preview, Comment | Jun 14, 2026 3:48pm



coderabbitai Bot commented May 31, 2026


Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: d42105df-9cce-42fe-8b68-e307cbc9f5ce

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough


This PR refactors Playwright acceptance tests to reduce network-response dependencies and adds comprehensive tests for the "Use API" documentation drawer. Changes include member-invite modal helpers, prompt-registry UI navigation, new Use API snippet tests for variants and deployments, and a slug validation assertion in evaluator creation.

Changes

Use API Snippet Tests

Layer / File(s) Summary
Acceptance test helpers and variants scenario
web/oss/tests/playwright/acceptance/use-api/index.ts (lines 1–223)
Imports, acceptance tags, and helper functions (deployFirstVariantToDevelopment, openVariantUseApiDrawer, openDeploymentUseApiDrawer, switchToTypescriptTab) for managing drawers and variant deployments. First test validates TypeScript snippet content on the Variants registry page, asserting the snippet includes application_variant_ref and axios.post.
Deployments scenario and export
web/oss/tests/playwright/acceptance/use-api/index.ts (lines 224–297)
Second test creates a completion app, deploys its first variant to Development, then validates the Use API drawer on the Deployments page includes environment_ref and axios.post. Exports useApiTests suite.
Feature file and spec registration
web/oss/tests/playwright/acceptance/features/use-api.feature, web/ee/tests/playwright/acceptance/use-api/use-api.spec.ts, web/oss/tests/playwright/acceptance/use-api/use-api.spec.ts, web/oss/tests/playwright/10-use-api.ts
Cucumber feature file defines two TypeScript scenarios. EE and OSS specs register the useApiTests suite via test.describe. OSS test index exports the suite for test discovery.
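The two Use API scenarios differ only in which reference field the TypeScript snippet must contain. A minimal sketch of that check as a pure function, assuming the snippet text has already been read out of the drawer; `expectedSnippetTokens` and `missingSnippetTokens` are hypothetical helpers, not code from the PR, which asserts the substrings directly in Playwright.

```typescript
// Hypothetical helpers; the PR asserts these substrings directly on the drawer.
type UseApiSource = "variants" | "deployments"

// Substrings each scenario expects in the TypeScript snippet, per the walkthrough.
const expectedSnippetTokens = (source: UseApiSource): string[] => [
    source === "variants" ? "application_variant_ref" : "environment_ref",
    "axios.post",
]

// Returns the expected tokens that do not appear anywhere in the snippet text.
const missingSnippetTokens = (snippet: string, source: UseApiSource): string[] =>
    expectedSnippetTokens(source).filter((token) => !snippet.includes(token))
```

In a Playwright test the same expectations map onto `await expect(drawer).toContainText("environment_ref")` and `await expect(drawer).toContainText("axios.post")`; keeping the token list in one place makes it obvious that only the reference field changes between the two pages.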

Test Refactoring to UI-Based Waiting

Layer / File(s) Summary
Members invite modal helpers
web/ee/tests/playwright/acceptance/members/index.ts
Removes waitForInviteResponse. Adds openInviteMembersModal (retry loop + email input visibility) and submitInviteMembersModal (form submission + modal dismissal). Updates invitePendingMember and test to use modal-based flow instead of network response waiting.
Prompt registry UI navigation
web/oss/tests/playwright/acceptance/prompt-registry/index.ts
Narrows PromptRegistryApiHelpers to remove waitForApiResponse. Refactors openWorkflowRevisionsPage to navigate UI only. Rewrites openFirstPublishedWorkflowRevision to click version labels and extract revisionId from URL. Removes API response state from test scenario.
Evaluators slug assertion
web/oss/tests/playwright/acceptance/evaluators/tests.ts
Adds assertion in createHumanEvaluatorFromDrawer that the "unique slug" input matches the provided evaluatorName, with a 5-second timeout.
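The rewritten openFirstPublishedWorkflowRevision reads the revisionId from the URL after clicking a version label instead of from an API response. The route shape is not shown here, so the sketch below is an assumption: it accepts either a `revisionId` query parameter or the path segment that follows a `revisions` segment. `extractRevisionId` is a hypothetical helper name.

```typescript
// Hypothetical helper. ASSUMPTION: the revision id appears either as a
// `revisionId` query parameter or right after a `revisions` path segment;
// the real Agenta route may encode it differently.
const extractRevisionId = (href: string): string | null => {
    const url = new URL(href)

    // Prefer an explicit query parameter when present.
    const fromQuery = url.searchParams.get("revisionId")
    if (fromQuery) return fromQuery

    // Otherwise look for ".../revisions/<id>" in the path.
    const segments = url.pathname.split("/").filter(Boolean)
    const index = segments.indexOf("revisions")
    return index >= 0 && index + 1 < segments.length ? segments[index + 1] : null
}
```

In a test this would be called as `extractRevisionId(page.url())` after the navigation triggered by the click has settled, so the value comes from what the UI actually shows rather than from a captured network response.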

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • Agenta-AI/agenta#4308: Both PRs modify the Members invite Playwright flow in the EE tests, refactoring how invite submission is synchronized; this PR removes network-wait helpers in favor of modal-based ones.
  • Agenta-AI/agenta#4458: Overlaps with the EE members invite refactoring by replacing network/response waits with UI readiness checks in the same file.
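Both related PRs, like this one, replace network-response waits with UI readiness checks. The "retry loop + email input visibility" pattern described for openInviteMembersModal can be sketched generically; `retryUntilReady`, its options, and the sample attempt counts are hypothetical. In the real helper, `action` would wrap a Playwright click on the invite button and `isReady` a visibility check on the email input.

```typescript
// Hypothetical generic helper; not the PR's implementation.
interface RetryOptions {
    attempts: number // maximum number of times to run `action`
    delayMs: number // pause after a failed readiness check before retrying
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms))

// Runs `action`, then `isReady`, repeating until the UI reports ready.
// Resolves with the 1-based attempt that succeeded; rejects if none did.
async function retryUntilReady(
    action: () => Promise<void>,
    isReady: () => Promise<boolean>,
    {attempts, delayMs}: RetryOptions,
): Promise<number> {
    for (let attempt = 1; attempt <= attempts; attempt++) {
        await action()
        if (await isReady()) return attempt
        if (attempt < attempts) await sleep(delayMs)
    }
    throw new Error(`UI not ready after ${attempts} attempts`)
}
```

With Playwright, `isReady` could be `() => emailInput.isVisible()`, since `Locator.isVisible()` returns a `Promise<boolean>` without waiting. Bounding the attempts keeps a broken modal from hanging the test, while the readiness check ties success to what the user sees rather than to a particular network response.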
🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name | Status | Explanation | Resolution
Description check | ⚠️ Warning | The description is a template with empty sections and placeholder text; it does not provide substantive information about what was changed or why. | Fill in the template sections with actual details: describe the refactored invite flow and new use-api tests, explain why the changes improve stability, and document what was tested.
Title check | ❓ Inconclusive | The title '[Test] Web acceptance stability' is vague and generic, using non-descriptive terms that don't convey specific information about the actual changes made. | Consider a more specific title like '[Test] Refactor member invite flow and add use-api snippets tests' that reflects the actual changes.
✅ Passed checks (3 passed)
Check name | Status | Explanation
Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch test-/-web-acceptance-stability


Comment @coderabbitai help to get the list of available commands and usage tips.

bekossy marked this pull request as ready for review May 31, 2026 20:04
dosubot Bot added the size:XS (This PR changes 0-9 lines, ignoring generated files.), Frontend, and tests labels May 31, 2026

coderabbitai Bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (2)
web/oss/tests/playwright/acceptance/use-api/index.ts (1)

111-143: 💤 Low value

Consolidate the two drawer-open helpers.

openVariantUseApiDrawer and openDeploymentUseApiDrawer are identical except for the button locator. Consider parameterizing to remove the duplicated drawer-resolution block.

♻️ Proposed consolidation
-const openVariantUseApiDrawer = async (page: any) => {
-    await page.waitForLoadState("networkidle")
-    const useApiButton = page.locator('[data-tour="api-code-button"]')
-    await expect(useApiButton).toBeVisible({timeout: 15000})
-    await expect(useApiButton).toBeEnabled({timeout: 5000})
-    await useApiButton.click()
-
-    const drawer = page.locator(".ant-drawer-content-wrapper").filter({
-        hasText: "How to use API",
-    })
-    await expect(drawer).toBeVisible({timeout: 20000})
-    return drawer
-}
-
-const openDeploymentUseApiDrawer = async (page: any) => {
-    await page.waitForLoadState("networkidle")
-    const useApiButton = page.getByRole("button", {name: "Use API"}).first()
-    await expect(useApiButton).toBeVisible({timeout: 15000})
-    await expect(useApiButton).toBeEnabled({timeout: 5000})
-    await useApiButton.click()
-
-    const drawer = page.locator(".ant-drawer-content-wrapper").filter({
-        hasText: "How to use API",
-    })
-    await expect(drawer).toBeVisible({timeout: 20000})
-    return drawer
-}
+const openUseApiDrawer = async (page: any, useApiButton: any) => {
+    await expect(useApiButton).toBeVisible({timeout: 15000})
+    await expect(useApiButton).toBeEnabled({timeout: 5000})
+    await useApiButton.click()
+
+    const drawer = page.locator(".ant-drawer-content-wrapper").filter({
+        hasText: "How to use API",
+    })
+    await expect(drawer).toBeVisible({timeout: 20000})
+    return drawer
+}
+
+const openVariantUseApiDrawer = (page: any) =>
+    openUseApiDrawer(page, page.locator('[data-tour="api-code-button"]'))
+
+const openDeploymentUseApiDrawer = (page: any) =>
+    openUseApiDrawer(page, page.getByRole("button", {name: "Use API"}).first())
web/oss/tests/playwright/acceptance/evaluators/tests.ts (1)

354-355: ⚡ Quick win

Update slug assertion to account for real slug transformation (UI uses slugify)

The slug field in CreateEvaluator is set from the (debounced) evaluator name via a slugify helper (toLowerCase() and replace(/[^a-z0-9_\-]+/g, "-"), plus trimming/collapsing dashes). So the assertion should not assume a no-op transformation in general.

That said, the current tests pass evaluatorName values like e2e-human-eval-${Date.now()} (already lowercase and hyphen-safe), so it matches the slug output today. For robustness, assert against slugify(evaluatorName) instead of evaluatorName.
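A sketch of the suggested assertion target, reconstructed from the description above: lowercase, replace runs of characters outside `[a-z0-9_-]` with `-`, then collapse and trim dashes. It is an approximation of the app's slugify helper, not a copy of it; importing the real helper, if it is exported, would keep the test from drifting.

```typescript
// Approximation of the slugify behaviour described in the review comment.
const slugify = (value: string): string =>
    value
        .toLowerCase()
        .replace(/[^a-z0-9_\-]+/g, "-") // replace each run of disallowed characters with one dash
        .replace(/-+/g, "-") // collapse repeated dashes
        .replace(/^-+|-+$/g, "") // trim leading and trailing dashes
```

The assertion would then read `await expect(slugInput).toHaveValue(slugify(evaluatorName), {timeout: 5000})`, where `slugInput` stands in for the "unique slug" locator. Note that even an already-lowercase name is not always a no-op: `slugify("a--b")` collapses to `a-b`.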


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 685200ae-6c11-485f-8a7f-e0a67aa8fa47

📥 Commits

Reviewing files that changed from the base of the PR and between a269527 and 58e6bcb.

📒 Files selected for processing (8)
  • web/ee/tests/playwright/acceptance/members/index.ts
  • web/ee/tests/playwright/acceptance/use-api/use-api.spec.ts
  • web/oss/tests/playwright/10-use-api.ts
  • web/oss/tests/playwright/acceptance/evaluators/tests.ts
  • web/oss/tests/playwright/acceptance/features/use-api.feature
  • web/oss/tests/playwright/acceptance/prompt-registry/index.ts
  • web/oss/tests/playwright/acceptance/use-api/index.ts
  • web/oss/tests/playwright/acceptance/use-api/use-api.spec.ts

Comment thread web/ee/tests/playwright/acceptance/members/index.ts
Comment thread web/oss/tests/playwright/acceptance/prompt-registry/index.ts
Comment thread web/oss/tests/playwright/acceptance/use-api/index.ts
bekossy changed the base branch from main to release/v0.100.9 May 31, 2026 20:14

github-actions Bot (Contributor) commented May 31, 2026

Railway Preview Environment

Preview URL: https://gateway-production-ebaf.up.railway.app/w
Project: agenta-oss-pr-4506
Image tag: pr-4506-f424c2a
Status: Deployed
Railway logs: Open logs
Workflow logs: View workflow run
Updated at: 2026-06-15T07:43:55.840Z

…iness

Without --max-time/--connect-timeout, curl could hang indefinitely when a
Railway preview accepted the TCP connection but stalled before sending an
HTTP response. This caused the wait-for-readiness job to block for hours
instead of cycling through its 30-attempt loop.

- Add --max-time 10 --connect-timeout 5 to each curl attempt so the loop
  is always bounded (~10 min max across both URL checks).
- Add timeout-minutes: 25 to the job as a defence-in-depth backstop.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
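The "~10 min" bound follows from the per-attempt cap. Assuming each URL check runs at most 30 attempts and ignoring whatever pause separates attempts (the interval is not stated in the commit), `--max-time 10` caps every attempt, including its connect phase, at 10 s:

$$
30 \times 10\,\text{s} = 300\,\text{s} = 5\,\text{min per URL}, \qquad 2 \times 5\,\text{min} = 10\,\text{min}
$$

Any sleep between attempts adds to this, which is what the 25-minute job timeout backstops.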
…n reinstall

playwright install --with-deps chromium was downloading 170MB and running
apt-get for hundreds of X11/font packages on every run, taking ~17 minutes.
This left almost no time for the auth bootstrap within the job timeout.

Cache ~/.cache/ms-playwright keyed on the tests package.json hash so the
browser binary is restored on cache hits (subsequent runs). playwright install
still runs after the cache restore — it detects the binary is present and
skips the download, but still verifies/installs any missing system deps via
apt which is fast when packages are already cached by the runner.

Also bumps the job timeout from 25 to 30 minutes to give the first (cold)
run enough headroom.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
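Keying the cache on a hash of the tests package.json means the cached browser is reused until that file changes, for example when the Playwright version is bumped. A sketch of the idea using Node's `crypto` module; the workflow presumably uses GitHub Actions' own `hashFiles` instead, and the `playwrightCacheKey` name and key format are hypothetical.

```typescript
import {createHash} from "node:crypto"

// Hypothetical key format: "<os>-playwright-<sha256 of package.json contents>".
// Any edit to package.json yields a new key, forcing a fresh browser download.
const playwrightCacheKey = (runnerOs: string, packageJson: string): string =>
    `${runnerOs}-playwright-${createHash("sha256").update(packageJson).digest("hex")}`
```

Unchanged dependencies produce an identical key and a cache hit; a version bump produces a miss, so a stale browser binary is never paired with a newer Playwright package.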
--with-deps runs apt-get for ~200 X11/font/GTK packages on top of the
browser download, adding 25+ minutes on cold runs and causing the job
to time out before the auth bootstrap could run. The ubuntu-latest runner
already has Chromium's core runtime libraries; the auth bootstrap (login
form + save cookies) doesn't need the full X11/font stack.

Removing --with-deps cuts the install from ~29 min to ~1-2 min (binary
download only, skipped entirely on cache hits). Timeout reduced to 15 min
to match the realistic job budget:
  - URL health check: ≤5 min
  - checkout + node + pnpm: ~2 min
  - playwright install (binary only): ~1-2 min
  - auth bootstrap: ~3 min

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
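Summing the upper end of each estimate in the budget above shows the headroom left under the new 15-minute timeout:

$$
5 + 2 + 2 + 3 = 12\,\text{min} \le 15\,\text{min}
$$

That leaves roughly 3 minutes of slack for a cold cache or a slow health check.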
bekossy changed the base branch from release/v0.100.9 to release/v0.101.0 June 3, 2026 07:47
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
bekossy changed the base branch from release/v0.101.0 to release/v0.101.1 June 4, 2026 08:14
dosubot Bot added size:L (This PR changes 100-499 lines, ignoring generated files.) and removed size:M (This PR changes 30-99 lines, ignoring generated files.) labels Jun 9, 2026
… consolidate provider creation and deletion logic
bekossy changed the base branch from release/v0.103.0 to release/v0.103.1 June 10, 2026 14:22
bekossy changed the base branch from release/v0.103.1 to main June 11, 2026 08:23
bekossy changed the base branch from main to release/v0.103.2 June 11, 2026 08:23
bekossy changed the base branch from release/v0.103.2 to release/v0.103.5 June 14, 2026 10:38
bekossy marked this pull request as draft June 15, 2026 07:37
bekossy marked this pull request as ready for review June 15, 2026 07:37

chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 222c839e25

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

const {projectId} = getProjectValues()
if (projectId) {
-    await triggerMetricsRefresh({projectId, runId, scenarioId})
+    triggerMetricsRefresh({projectId, runId, scenarioId}).catch(() => {})


P2: Await metrics refresh before refetching caches

When an annotation changes metric values, this starts triggerMetricsRefresh and then immediately invalidates/refetches the metric queries below. Since triggerMetricsRefresh performs the backend refresh POSTs asynchronously, those refetches can complete before scenario/run metrics are recomputed, caching stale metrics while the success toast is already shown; the drawer path still awaits this refresh before invalidating. Await the refresh before clearing/refetching the metric caches.
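A minimal sketch of the ordering the review asks for, with the refresh and refetch steps injected as functions so it stays framework-agnostic. `refreshThenRefetch` is a hypothetical name; in the real code `refresh` would correspond to `triggerMetricsRefresh({projectId, runId, scenarioId})` and `refetch` to the existing invalidate/refetch calls.

```typescript
// Hypothetical helper illustrating the ordering fix; names are not from the codebase.
async function refreshThenRefetch(
    refresh: () => Promise<void>,
    refetch: () => Promise<void>,
): Promise<void> {
    try {
        // Wait for the backend recomputation before touching any caches.
        await refresh()
    } catch {
        // Best effort, like the `.catch(() => {})` in the diff: a failed refresh
        // should not prevent the caches from being refetched.
    }
    await refetch()
}
```

Swallowing the refresh error keeps the current best-effort behaviour while restoring the await; whether a failed refresh should instead surface an error toast is a separate decision the review does not address.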



Labels

Frontend, size:L (This PR changes 100-499 lines, ignoring generated files.), tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants