
acc: Always destroy deployed bundles on test exit in cloud runs #5585

Open
chrisst wants to merge 3 commits into main from chris.stephens/fix4-deferred-teardown

chrisst commented Jun 12, 2026

What

Adds a harness-level guarantee that bundles deployed by acceptance tests are destroyed when the test exits, for cloud runs (CLOUD_ENV set):

  • runTest registers a t.Cleanup before the script starts (covering failures, require aborts, and script timeouts), capturing a clone of the exact env the script ran with;
  • the cleanup walks the test temp dir for .databricks/bundle/<target> state dirs and runs $CLI bundle destroy --auto-approve --target <target> per bundle root (10-minute cap per destroy);
  • destroy errors are logged via t.Logf only — never fail the test; double-destroy is harmless ("No active deployment found to destroy!" exits 0);
  • local/testserver runs: complete no-op.

Why

Acceptance scripts run under bash -e, and script.cleanup fragments are appended after the main body — they never execute when a script fails or times out between bundle deploy and bundle destroy. Against shared cloud test workspaces this leaks real resources: during the 2026-06-11/12 incident one shared GCP workspace had accumulated 100+ leaked test warehouses and dozens of leaked test-bundle-pipeline-* pipelines, exhausting the project's local-SSD quota and blocking terraform-provider CI for ~2 days (ref ES-1974228).

Cleanup output cannot pollute golden files: it runs after output comparison and goes only to the test log.

Known limitation: destroy is best-effort — a bundle deployed with required --var flags or a config corrupted mid-test may still fail to destroy; this is logged as a leak warning rather than failing the run.

Tests

go build ./..., go vet ./acceptance pass; local deploy+destroy acceptance tests (bundle/resources/sql_warehouses, bundle/resources/pipelines/recreate-keys across all engine variants) pass with no output regressions.

This pull request and its description were written by Isaac.

When acceptance tests run against real cloud workspaces (CLOUD_ENV set),
a test that fails, times out, or exits mid-script never reaches its own
'bundle destroy' step: scripts run under 'bash -e' and the merged
script.cleanup parts are skipped on failure. The deployed resources
(SQL warehouses, pipelines, jobs, ...) then leak in the shared test
workspaces. Leaked warehouses that were still running recently exhausted a
GCP quota and took CI down for two days; we observed 100+ leaked warehouses
and dozens of leaked test pipelines in a single workspace.

This adds a harness-level safety net: on cloud runs, runTest registers a
t.Cleanup (before starting the script, so it also covers timeouts and
mid-test failures) that scans the test's temp dir for bundle state
directories (<bundle_root>/.databricks/bundle/<target>) and runs
'$CLI bundle destroy --auto-approve --target <target>' in each bundle
root, reusing the exact environment the script ran with.

The mechanism is deliberately best effort and invisible to test output:

- It is a no-op for local testserver runs (gated on CLOUD_ENV).
- It runs after output comparison and logs only via t.Logf, so cleanup
  output is never compared against expected out files.
- Double-destroy is harmless: 'bundle destroy' on an already-destroyed
  bundle exits 0 with 'No active deployment found to destroy!'. In the
  common success path the shared script.cleanup already removed
  .databricks, so nothing is even attempted.
- Destroy failures are logged but never fail the test.

Co-authored-by: Isaac
@chrisst chrisst temporarily deployed to test-trigger-is June 12, 2026 19:46 — with GitHub Actions Inactive
Use context.WithoutCancel(t.Context()) instead of context.Background()
(t.Context() is already canceled when cleanups run), make the
best-effort nilerr skip explicit, and trim narration comments.

Co-authored-by: Isaac
@chrisst chrisst temporarily deployed to test-trigger-is June 12, 2026 20:08 — with GitHub Actions Inactive
@chrisst chrisst requested a review from pietern June 12, 2026 20:11
@chrisst chrisst marked this pull request as ready for review June 12, 2026 20:12
@github-actions

Contributor

Waiting for approval

Based on git history, these people are best suited to review:

  • @denik -- recent work in acceptance/

Eligible reviewers: @andrewnester, @anton-107, @pietern, @renaudhartert-db, @shreyas-goenka, @simonfaltum

Suggestions based on git history. See OWNERS for ownership rules.

@eng-dev-ecosystem-bot

eng-dev-ecosystem-bot commented Jun 12, 2026

Collaborator

Integration test report

Commit: ed2ccb2

Run: 27712772736

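| Env | ❌ FAIL | 🟨 KNOWN | 🔄 flaky | 💚 RECOVERED | 🙈 SKIP | ✅ pass | 🙈 skip | Time |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 🟨 aws linux | | 7 | | | 14 | 264 | 1010 | 6:57 |
| 🟨 aws windows | | 7 | 8 | | 14 | 258 | 1008 | 19:11 |
| 💚 aws-ucws linux | | | | 7 | 14 | 360 | 924 | 7:24 |
| 💚 aws-ucws windows | | | | 7 | 14 | 362 | 922 | 9:46 |
| 🔄 azure linux | | | 2 | 1 | 16 | 265 | 1008 | 9:08 |
| ❌ azure windows | 3 | | | 1 | 16 | 266 | 1006 | 10:48 |
| ❌ azure-ucws linux | 2 | | 1 | 1 | 16 | 362 | 920 | 10:12 |
| 🔄 azure-ucws windows | | | 4 | 1 | 16 | 363 | 918 | 19:18 |
| 💚 gcp linux | | | | 1 | 16 | 263 | 1011 | 5:48 |
| 💚 gcp windows | | | | 1 | 16 | 265 | 1009 | 7:52 |
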
33 interesting tests: 14 SKIP, 9 flaky, 7 KNOWN, 3 FAIL
Test Name aws linux aws windows aws-ucws linux aws-ucws windows azure linux azure windows azure-ucws linux azure-ucws windows gcp linux gcp windows
🟨​ TestAccept 🟨​K 🟨​K 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R
🙈​ TestAccept/bundle/invariant/no_drift 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🔄​ TestAccept/bundle/resources/apps/inline_config ✅​p 🔄​f ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p
🔄​ TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p 🔄​f ✅​p ✅​p
🔄​ TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform ✅​p 🔄​f ✅​p ✅​p ✅​p ✅​p ✅​p 🔄​f ✅​p ✅​p
🙈​ TestAccept/bundle/resources/permissions 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🔄​ TestAccept/bundle/resources/permissions/dashboards/create/DATABRICKS_BUNDLE_ENGINE=direct ✅​p 🔄​f ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p
🔄​ TestAccept/bundle/resources/permissions/dashboards/create/DATABRICKS_BUNDLE_ENGINE=terraform ✅​p 🔄​f ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions 🟨​K 🟨​K 💚​R 💚​R 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=direct 🟨​K 🟨​K 💚​R 💚​R
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=terraform 🟨​K 🟨​K 💚​R 💚​R
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions 🟨​K 🟨​K 💚​R 💚​R 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=direct 🟨​K 🟨​K 💚​R 💚​R
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=terraform 🟨​K 🟨​K 💚​R 💚​R
🙈​ TestAccept/bundle/resources/postgres_branches/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/recreate 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/replace_existing 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/update_protected 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/without_branch_id 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_endpoints/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_endpoints/recreate 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_projects/update_display_name 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/synced_database_tables/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/vector_search_endpoints/drift/recreated_same_name 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/vector_search_indexes/recreate/embedding_dimension 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🔄​ TestAccept/bundle/run_as/job_default ✅​p 🔄​f ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p
🔄​ TestAccept/bundle/run_as/job_default/DATABRICKS_BUNDLE_ENGINE=direct ✅​p 🔄​f ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p
🙈​ TestAccept/ssh/connection 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🔄​ TestSecretsPutSecretBytesValue ✅​p 🔄​f 🙈​s 🙈​s ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p
🔄​ TestSecretsPutSecretStringValue ✅​p 🔄​f 🙈​s 🙈​s ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p
❌​ TestFetchRepositoryInfoAPI_FromRepo ✅​p ✅​p ✅​p ✅​p ✅​p ❌​F ❌​F ✅​p ✅​p ✅​p
❌​ TestFetchRepositoryInfoAPI_FromRepo/root ✅​p ✅​p ✅​p ✅​p 🔄​f ❌​F ❌​F 🔄​f ✅​p ✅​p
❌​ TestFetchRepositoryInfoAPI_FromRepo/subdir ✅​p ✅​p ✅​p ✅​p 🔄​f ❌​F 🔄​f 🔄​f ✅​p ✅​p
Top 20 slowest tests (at least 2 minutes):
duration env testname
4:23 gcp windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
4:19 gcp linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
4:18 gcp windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
4:10 gcp linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:29 azure linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:26 aws-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:25 aws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:25 azure windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:22 azure-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:17 azure-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:16 aws-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:05 azure windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:01 aws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:00 aws-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:53 azure linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:43 aws-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:34 aws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:32 aws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:31 azure-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:26 azure-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct

@chrisst chrisst temporarily deployed to test-trigger-is June 17, 2026 19:00 — with GitHub Actions Inactive