Skip to content

fix(deploy): surface original error types in preflight steps. #1525

Draft
Hweinstock wants to merge 14 commits into
aws:mainfrom
Hweinstock:fix/deploy-unknown-error
Draft

fix(deploy): surface original error types in preflight steps. #1525
Hweinstock wants to merge 14 commits into
aws:mainfrom
Hweinstock:fix/deploy-unknown-error

Conversation

@Hweinstock

Copy link
Copy Markdown
Contributor

Problem

The deploy command has a very high error rate, with the majority coming from an unknown source. This makes it difficult to know if customers are blocked on a legitimate bug, or are simply using the command incorrectly.

Solution

  • Refactor some of the preflight steps to surface the original error type instead of rewrapping in Error.
  • Rename some error types to better communicate their purpose.
  • Add more error names to the common error name mapping for stronger classification.

Notes

  • will follow-up with backend schema change before releasing this.

Testing

  • ran integ and unit tests.

@github-actions github-actions Bot added the size/l PR size: L label Jun 11, 2026
@github-actions github-actions Bot added the agentcore-harness-reviewing AgentCore Harness review in progress label Jun 11, 2026
@agentcore-devx-automation agentcore-devx-automation Bot added the claude-security-reviewing Claude Code /security-review in progress label Jun 11, 2026
@agentcore-devx-automation

Copy link
Copy Markdown
Contributor

Claude Security Review: the review did not actually analyze this PR (model took 0 turns — the skill likely failed during setup). See the run for details; a later push or re-run is needed for a real review.

@agentcore-devx-automation agentcore-devx-automation Bot removed the claude-security-reviewing Claude Code /security-review in progress label Jun 11, 2026
@github-actions

github-actions Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Package Tarball

aws-agentcore-0.20.0.tgz

How to install

gh release download pr-1525-tarball --repo aws/agentcore-cli --pattern "*.tgz" --dir /tmp/pr-tarball
npm install -g /tmp/pr-tarball/aws-agentcore-0.20.0.tgz

@agentcore-cli-automation agentcore-cli-automation left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for tackling this — strongly typed errors and richer telemetry classification are great improvements. I have a few concerns, primarily around inconsistencies that I think will undermine the PR's stated goal of reducing the unknown error rate.

The biggest one: in setupApiKeyCredentialProvider you correctly wrap AWS-credential failures in AwsCredentialsError, but in setupSingleOAuth2Provider and setupPaymentCredentialProviders the equivalent paths just mutate error.message on the original SDK error. The error's name (and therefore its telemetry classification) is unchanged, so the OAuth/payment paths will still fall through to UnknownError in classifyError — defeating the whole point of the change for those paths. Details inline.

A couple of smaller items also inline.

Comment thread src/cli/operations/deploy/pre-deploy-identity.ts
Comment thread src/cli/operations/deploy/pre-deploy-identity.ts
Comment thread src/cli/commands/deploy/actions.ts
Comment thread src/cli/telemetry/error.ts
Comment thread src/cli/operations/identity/oauth2-credential-provider.ts
@github-actions github-actions Bot removed the agentcore-harness-reviewing AgentCore Harness review in progress label Jun 11, 2026
@Hweinstock

Copy link
Copy Markdown
Contributor Author

Addressed the main concern by rewrapping the error instead of mutating. most of these are symptoms of the existing implementation.

Ignored comment about adding unit tests for a map.

@github-actions github-actions Bot added size/l PR size: L and removed size/l PR size: L labels Jun 11, 2026
@agentcore-devx-automation agentcore-devx-automation Bot added the claude-security-reviewing Claude Code /security-review in progress label Jun 11, 2026
@agentcore-devx-automation

Copy link
Copy Markdown
Contributor

Claude Security Review: the review did not actually analyze this PR (model took 0 turns — the skill likely failed during setup). See the run for details; a later push or re-run is needed for a real review.

@agentcore-devx-automation agentcore-devx-automation Bot removed the claude-security-reviewing Claude Code /security-review in progress label Jun 11, 2026
@github-actions github-actions Bot removed the size/l PR size: L label Jun 11, 2026
@github-actions github-actions Bot added the size/l PR size: L label Jun 11, 2026
@agentcore-devx-automation agentcore-devx-automation Bot added the claude-security-reviewing Claude Code /security-review in progress label Jun 11, 2026
@agentcore-devx-automation

Copy link
Copy Markdown
Contributor

Claude Security Review: the review did not actually analyze this PR (model took 0 turns — the skill likely failed during setup). See the run for details; a later push or re-run is needed for a real review.

@agentcore-devx-automation agentcore-devx-automation Bot removed the claude-security-reviewing Claude Code /security-review in progress label Jun 11, 2026
@Hweinstock

Copy link
Copy Markdown
Contributor Author

closing and moving to alternate branch.

@Hweinstock Hweinstock closed this Jun 11, 2026
@Hweinstock Hweinstock reopened this Jun 17, 2026
@github-actions github-actions Bot added size/l PR size: L and removed size/l PR size: L labels Jun 17, 2026
@github-actions github-actions Bot added the agentcore-harness-reviewing AgentCore Harness review in progress label Jun 17, 2026
@agentcore-devx-automation agentcore-devx-automation Bot added the claude-security-reviewing Claude Code /security-review in progress label Jun 17, 2026
@github-actions github-actions Bot added size/l PR size: L and removed size/l PR size: L labels Jun 17, 2026
@agentcore-devx-automation

Copy link
Copy Markdown
Contributor

Claude Security Review: no high-confidence findings. (run)

@agentcore-devx-automation agentcore-devx-automation Bot added claude-security-reviewing Claude Code /security-review in progress and removed claude-security-reviewing Claude Code /security-review in progress labels Jun 17, 2026
@agentcore-devx-automation

Copy link
Copy Markdown
Contributor

Claude Security Review: no high-confidence findings. (run)

@agentcore-devx-automation agentcore-devx-automation Bot removed the claude-security-reviewing Claude Code /security-review in progress label Jun 17, 2026

@agentcore-cli-automation agentcore-cli-automation left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two issues that need a quick fix before merge. Most of the substantive feedback was already raised by the previous bot review and addressed; flagging these two new ones.

providerName: credential.name,
status: 'error',
error: new AwsCredentialsError(
'AWS credentials not found.Run`aws sso login` or set AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY.',

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in the user-facing error message: 'AWS credentials not found.Runaws sso login or set AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY.' — missing a space between found. and Run, and another between Run and the opening backtick. Renders as "AWS credentials not found.Runaws sso login..." in the user's terminal.

Also, the other two isNoCredentialsError branches in this file (around lines 209 and 555) build the message via `AWS credentials not found. ${await getAwsLoginGuidance()}`. This branch was changed to a hard-coded inline string, which means the dynamic guidance (e.g. recommending aws sso login vs. setting env vars based on the active profile) is no longer surfaced for OAuth setup failures.

Fix options:

  1. Use the same pattern as the other two sites:
    error: new AwsCredentialsError(
      `AWS credentials not found. ${await getAwsLoginGuidance()}`,
      undefined,
      { cause: error }
    ),
  2. If you intentionally want to skip getAwsLoginGuidance() here, at minimum fix the spacing: 'AWS credentials not found. Run aws sso login or set AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY.'.

return { success: false, error: new Error(errorMsg), logPath: logger.getRelativeLogPath() };

const error = errorResult?.error ?? new Error('unknown error occurred when setting up oauth providers');
error.message = errorMsg;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

error.message = errorMsg; mutates the underlying error object in place. This re-introduces the exact mutation pattern that this PR is trying to remove — your earlier commit 06a2bd84 fix(deploy): avoid mutating error messages cleaned this up in pre-deploy-identity.ts, but here the mutation lands back on the error returned from preflight.

Why this matters:

  • The same Error instance may also be in the cause chain or held by other callers (the preflight result still references it via errorResult.error). Mutating its message after the fact is action-at-a-distance.
  • It defeats the goal of preserving the original SDK error detail in logs (logger.log(OAuth setup error: ${errorResult?.error ?? 'unknown'}, 'error') runs before the mutation, so logging is fine — but anything reading the error after this point sees the sanitized message).

Fix options (to keep the sanitization the comment promises while preserving the error class for telemetry classification via name):

  1. Re-instantiate the same class with cause:
    const inner = errorResult?.error ?? new Error('unknown error occurred when setting up oauth providers');
    const ErrorCtor = inner.constructor as new (msg: string, opts?: { cause?: unknown }) => Error;
    const error = new ErrorCtor(errorMsg, { cause: inner });
  2. Wrap in a ValidationError (or any modeled type) and rely on cause for telemetry to walk the chain:
    return {
      success: false,
      error: new ValidationError(errorMsg, { cause: errorResult?.error }),
      logPath: logger.getRelativeLogPath(),
    };
    (would require teaching classifyError to walk cause if you want the underlying class to drive classification — same caveat the previous bot review raised).
  3. If you've decided that preserving the underlying error class is more important than sanitizing the message, drop the error.message = errorMsg line and the comment about sanitizing, and pass errorResult.error through as-is.

@github-actions github-actions Bot removed the agentcore-harness-reviewing AgentCore Harness review in progress label Jun 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/l PR size: L

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants