test(e2e): add A/B test and all-gateway-target lifecycle suites#1584
test(e2e): add A/B test and all-gateway-target lifecycle suites#1584jariy17 wants to merge 7 commits into
Conversation
Add two e2e suites covering gaps in the AWS-boundary coverage: - ab-test-lifecycle.test.ts: A/B test in config-bundle mode — create project + gateway + two config-bundle versions + online-eval, deploy, run ab-test, then pause/resume/promote asserting live execution state from AWS via `view ab-test`. Archives the test (a job, not a project resource) before teardown. - httpgateway-all-targets.test.ts: one protocolType:None (HTTP) gateway hosting every deployable target type (http-runtime, mcp-server, lambda-function-arn, api-gateway, open-api-schema, smithy-model, web-search), deployed in a single stack. Omits connector (private-beta FMKB) and passthrough (unreleased gated feature). External resources targets can't self-provision (Lambda, REST API) are created idempotently by fixtures/gateway-targets/setup_target_prereqs.py, mirroring the fixtures/import/ boto3 pattern. The api-key credential provider is deleted in afterAll since it is created by a pre-deploy SDK call, not CloudFormation, and so survives stack teardown. Both suites pass against AWS (22/22).
|
Claude Security Review: no high-confidence findings. (run) |
Package TarballHow to installgh release download pr-1584-tarball --repo aws/agentcore-cli --pattern "*.tgz" --dir /tmp/pr-tarball
npm install -g /tmp/pr-tarball/aws-agentcore-0.20.0.tgz |
Coverage Report
|
agentcore-cli-automation
left a comment
There was a problem hiding this comment.
Tests look well-structured overall — clean sequential flows, real AWS assertions, and no excessive mocking. Two main issues worth addressing before merge, both around the shared-resource design in setup_target_prereqs.py.
The e2e-tests-full.yml workflow runs a cli-build: [preview, ga] matrix in parallel against the same AWS account. Since the prereq script uses deterministic resource names (e2e-gwtarget-fn, e2e-gwtarget-api-v2) for idempotent reuse, both matrix jobs can race on the create paths. See inline comments for specifics.
| print(f" role not ready, retrying ({attempt + 1})...") | ||
| time.sleep(5) | ||
| continue | ||
| raise |
There was a problem hiding this comment.
Race condition with parallel CI runs.
e2e-tests-full.yml runs cli-build: [preview, ga] in parallel against the same AWS account. If both matrix jobs reach ensure_lambda() before either has created the function, both will pass get_function (NotFound), both will call create_function, and one will fail with ResourceConflictException. The retry loop here only catches role-propagation errors ("cannot be assumed" / "InvalidParameterValue") — a conflict will be re-raised and fail the e2e setup.
Options:
- Catch
lam.exceptions.ResourceConflictException(or checke.response['Error']['Code'] == 'ResourceConflictException') and re-fetch the function viaget_functionto recover the ARN. - Suffix the resource name with something like the GitHub run ID / matrix value so the two jobs don't share resources (matches the
import-resources.test.tsfixture pattern, but loses the cross-run reuse). - Serialize the two matrix jobs (e.g. use a workflow-level concurrency group) so they don't run simultaneously.
There was a problem hiding this comment.
Fixed in e5980bd (option 1). ensure_lambda now catches ResourceConflictException from a concurrent create_function and recovers the ARN via get_function, so the loser of the race reuses the winner's function instead of failing. Cross-run reuse is preserved.
| for item in api.get_rest_apis(limit=500).get("items", []): | ||
| if item.get("name") == REST_API_NAME: | ||
| print(f"REST API exists: {item['id']}") | ||
| return item["id"], REST_API_STAGE |
There was a problem hiding this comment.
Same race produces duplicate REST APIs (silent leak).
Unlike Lambda, API Gateway happily lets you create multiple REST APIs with the same name. If two parallel CI jobs both find no API matching REST_API_NAME and both call create_rest_api, you end up with two REST APIs named e2e-gwtarget-api-v2 in the account. Subsequent runs will pick whichever one get_rest_apis returns first — so this won't necessarily fail tests, but it leaks duplicates over time and the count grows without bound.
Options (same as for Lambda, pick one consistently):
- Suffix the API name with a CI-job-stable identifier so each parallel runner gets its own.
- Serialize the matrix jobs (workflow concurrency group).
- Detect duplicates here and clean them up before returning — uglier than the other two.
There was a problem hiding this comment.
Fixed in e5980bd. ensure_rest_api now reconciles to the lowest-id API and deletes any duplicates — both before the create path (reuse) and after it (in case two jobs both created one) — so parallel jobs converge on a single API and duplicates never accumulate, while keeping idempotent cross-run reuse. Verified against AWS: after a run, exactly one e2e-gwtarget-api-v2 remains.
…racefully - ab-test-target-based.test.ts: A/B test in target-based mode — two http-runtime gateway-targets on named runtime endpoints (control, treatment), each scoped by its own online-eval. run → pause → resume → promote (promote version-bumps the control endpoint to treatment's). Complements the existing config-bundle-mode suite. Passes AWS 11/11. - setup_target_prereqs.py + httpgateway-all-targets.test.ts: catch AccessDenied per resource and emit null so a restricted IAM role (e.g. the e2e-github-actions CI role, which lacks lambda:*/apigateway:*) skips the lambda-function-arn / api-gateway targets instead of failing the whole suite.
|
Claude Security Review: no high-confidence findings. (run) |
Mirror the ab-test-target-based.test.ts naming now that both A/B test modes have their own suite; the old "lifecycle" name was ambiguous.
|
Claude Security Review: no high-confidence findings. (run) |
Add the passthrough target type (gated behind ENABLE_GATED_FEATURES, with gateway-iam-role outbound auth + signing service). add/deploy now run with the gate enabled. The CDK fully synthesizes passthrough targets, so it deploys in the single stack alongside the other 7 types. Suite now 13/13 against AWS. Only connector (private-beta FMKB) remains omitted.
|
Claude Security Review: no high-confidence findings. (run) |
Addresses PR review on the preview/ga matrix racing on shared deterministic resource names: - Remove unused `import sys`. - ensure_lambda: catch ResourceConflictException from a concurrent create_function and recover the ARN via get_function instead of failing. - ensure_rest_api: API Gateway allows duplicate names, so parallel jobs could leak duplicate REST APIs. Reconcile to the lowest-id API (delete the rest) both before and after the create path, so runs converge on one API and duplicates never accumulate — while keeping cross-run reuse.
|
Claude Security Review: no high-confidence findings. (run) |
| print(f"Deleting duplicate REST API created by a parallel job: {dupe}") | ||
| try: | ||
| api.delete_rest_api(restApiId=dupe) | ||
| except api.exceptions.ClientError: |
Previously `abTestId` was assigned on the last line of the `run ab-test` retry callback, after the mode/variants assertions. If `run ab-test` created the test (exit 0) but a later assertion in that block threw, abTestId stayed unset and afterAll skipped `archive ab-test` — leaking the test (which global-setup does not reap). The retry wrapper could also re-create it up to 3×. Capture abTestId immediately after asserting json.id (inside retry, before any later throw), and move the deterministic mode/variants checks outside the retry so a mismatch can't trigger a re-create. Now any post-create failure still archives the test in afterAll. Both suites still pass (21/21).
|
Claude Security Review: no high-confidence findings. (run) |
The CI e2e job failed: under the restricted e2e-github-actions role, ensure_lambda raised AccessDeniedException as expected, but _try()'s `except botocore.exceptions.ClientError` threw `NameError: name 'botocore' is not defined` because only boto3 was imported — crashing the fixture instead of gracefully skipping the lambda/api-gateway targets. Local runs missed this because the Admin role never triggers AccessDenied, so the except clause never executed. Verified the fix by running the fixture under the actual e2e-github-actions role: both targets now skip cleanly (null) and the script exits 0.
|
Claude Security Review: no high-confidence findings. (run) |
What
Adds two e2e suites covering gaps in the AWS-boundary coverage (per
e2e-tests/README.md's targets list).ab-test-lifecycle.test.tsA/B test lifecycle in config-bundle mode:
create → gateway → config-bundle v1 → deploy → config-bundle v2 → deploy → online-eval (Builtin evaluator) → deploy → run ab-test → view (RUNNING) → pause (PAUSED) → resume (RUNNING) → promote (STOPPED).Asserts the live execution state from AWS via
view ab-test --jsonafter each control-plane call (the execution status surfaces inlifecycleStatus). A/B tests are fire-and-forget jobs, not project resources, so cleanuparchives the test explicitly before teardown.httpgateway-all-targets.test.tsOne
protocolType: None(HTTP) gateway — a superset that can host every target type — carrying all deployable gateway-target types in a single stack:http-runtime,mcp-server,lambda-function-arn,api-gateway,open-api-schema,smithy-model,web-search.Verifies config (
agentcore.json),deployed-state.json, andstatus. Omitted:connector(private-beta FMKB CFN resource type) andpassthrough(unreleased gated feature).Fixtures
fixtures/gateway-targets/setup_target_prereqs.pyidempotently provisions the external resources targets can't self-provision (a Lambda + a REST API), mirroring the existingfixtures/import/boto3 pattern. Static schema fixtures (openapi.json,smithy.json,lambda-tools.json) are copied into the project at test time.Notes
afterAll— it's created by a pre-deploy SDK call (not CloudFormation), so it survives stack teardown and would otherwise leak.api-gatewayREST API method sets anoperationName(→operationId), required by the AgentCore target; the Lambda gets a resource-based invoke policy for the gateway service.Testing
Both suites pass end-to-end against AWS: 22/22 (10 ab-test + 12 gateway), with clean teardown and no leaked stacks or credential providers.