fix(deploy): surface original error types in preflight steps. #1525
fix(deploy): surface original error types in preflight steps. #1525Hweinstock wants to merge 14 commits into
Conversation
…ors to telemetry classification
|
Claude Security Review: the review did not actually analyze this PR (model took 0 turns — the skill likely failed during setup). See the run for details; a later push or re-run is needed for a real review. |
Package TarballHow to installgh release download pr-1525-tarball --repo aws/agentcore-cli --pattern "*.tgz" --dir /tmp/pr-tarball
npm install -g /tmp/pr-tarball/aws-agentcore-0.20.0.tgz |
agentcore-cli-automation
left a comment
There was a problem hiding this comment.
Thanks for tackling this — strongly typed errors and richer telemetry classification are great improvements. I have a few concerns, primarily around inconsistencies that I think will undermine the PR's stated goal of reducing the unknown error rate.
The biggest one: in setupApiKeyCredentialProvider you correctly wrap AWS-credential failures in AwsCredentialsError, but in setupSingleOAuth2Provider and setupPaymentCredentialProviders the equivalent paths just mutate error.message on the original SDK error. The error's name (and therefore its telemetry classification) is unchanged, so the OAuth/payment paths will still fall through to UnknownError in classifyError — defeating the whole point of the change for those paths. Details inline.
A couple of smaller items also inline.
|
Addressed the main concern by rewrapping the error instead of mutating. most of these are symptoms of the existing implementation. Ignored comment about adding unit tests for a map. |
|
Claude Security Review: the review did not actually analyze this PR (model took 0 turns — the skill likely failed during setup). See the run for details; a later push or re-run is needed for a real review. |
|
Claude Security Review: the review did not actually analyze this PR (model took 0 turns — the skill likely failed during setup). See the run for details; a later push or re-run is needed for a real review. |
|
closing and moving to alternate branch. |
|
Claude Security Review: no high-confidence findings. (run) |
|
Claude Security Review: no high-confidence findings. (run) |
agentcore-cli-automation
left a comment
There was a problem hiding this comment.
Two issues that need a quick fix before merge. Most of the substantive feedback was already raised by the previous bot review and addressed; flagging these two new ones.
| providerName: credential.name, | ||
| status: 'error', | ||
| error: new AwsCredentialsError( | ||
| 'AWS credentials not found.Run`aws sso login` or set AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY.', |
There was a problem hiding this comment.
Typo in the user-facing error message: 'AWS credentials not found.Runaws sso login or set AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY.' — missing a space between found. and Run, and another between Run and the opening backtick. Renders as "AWS credentials not found.Runaws sso login..." in the user's terminal.
Also, the other two isNoCredentialsError branches in this file (around lines 209 and 555) build the message via `AWS credentials not found. ${await getAwsLoginGuidance()}`. This branch was changed to a hard-coded inline string, which means the dynamic guidance (e.g. recommending aws sso login vs. setting env vars based on the active profile) is no longer surfaced for OAuth setup failures.
Fix options:
- Use the same pattern as the other two sites:
error: new AwsCredentialsError( `AWS credentials not found. ${await getAwsLoginGuidance()}`, undefined, { cause: error } ),
- If you intentionally want to skip
getAwsLoginGuidance()here, at minimum fix the spacing:'AWS credentials not found. Runaws sso loginor set AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY.'.
| return { success: false, error: new Error(errorMsg), logPath: logger.getRelativeLogPath() }; | ||
|
|
||
| const error = errorResult?.error ?? new Error('unknown error occurred when setting up oauth providers'); | ||
| error.message = errorMsg; |
There was a problem hiding this comment.
error.message = errorMsg; mutates the underlying error object in place. This re-introduces the exact mutation pattern that this PR is trying to remove — your earlier commit 06a2bd84 fix(deploy): avoid mutating error messages cleaned this up in pre-deploy-identity.ts, but here the mutation lands back on the error returned from preflight.
Why this matters:
- The same
Errorinstance may also be in thecausechain or held by other callers (the preflight result still references it viaerrorResult.error). Mutating its message after the fact is action-at-a-distance. - It defeats the goal of preserving the original SDK error detail in logs (
logger.log(OAuth setup error: ${errorResult?.error ?? 'unknown'}, 'error')runs before the mutation, so logging is fine — but anything reading the error after this point sees the sanitized message).
Fix options (to keep the sanitization the comment promises while preserving the error class for telemetry classification via name):
- Re-instantiate the same class with
cause:const inner = errorResult?.error ?? new Error('unknown error occurred when setting up oauth providers'); const ErrorCtor = inner.constructor as new (msg: string, opts?: { cause?: unknown }) => Error; const error = new ErrorCtor(errorMsg, { cause: inner });
- Wrap in a
ValidationError(or any modeled type) and rely oncausefor telemetry to walk the chain:(would require teachingreturn { success: false, error: new ValidationError(errorMsg, { cause: errorResult?.error }), logPath: logger.getRelativeLogPath(), };
classifyErrorto walkcauseif you want the underlying class to drive classification — same caveat the previous bot review raised). - If you've decided that preserving the underlying error class is more important than sanitizing the message, drop the
error.message = errorMsgline and the comment about sanitizing, and passerrorResult.errorthrough as-is.
Problem
The deploy command has a very high error rate, with the majority coming from an
unknownsource. This makes it difficult to know if customers are blocked on a legitimate bug, or are simply using the command incorrectly.Solution
Error.Notes
Testing