Skip to content

acceptance: route bundle test clusters through the shared instance pool#5461

Open
renaudhartert-db wants to merge 3 commits into
mainfrom
nat-pool-routing
Open

acceptance: route bundle test clusters through the shared instance pool#5461
renaudhartert-db wants to merge 3 commits into
mainfrom
nat-pool-routing

Conversation

@renaudhartert-db
Copy link
Copy Markdown
Contributor

@renaudhartert-db renaudhartert-db commented Jun 7, 2026

Changes

Route the cluster-launching bundle acceptance templates through the shared test instance pool by adding instance_pool_id: $TEST_INSTANCE_POOL_ID to their cluster specs, matching the pattern already used by bundle/deploy/spark-jar-task and bundle/integration_whl/base.

Goldens for the affected tests are regenerated: when a cluster is created from a pool, the node type comes from the pool, so node_type_id/driver_node_type_id are dropped from the deployed spec.

Why

These bundle acceptance tests create real clusters during cloud integration runs, and each cold node pulls ~3GB over the NAT gateway on every launch. Most of the templates set node_type_id directly, which bypasses the warm instance pool that is already wired into the integration environment as TEST_INSTANCE_POOL_ID. Routing them through the pool lets nodes reuse an already-cached runtime image instead of re-pulling it each time, removing a large amount of redundant network transfer from the test runs. Two templates already did this; this brings the remaining cluster-launching ones in line.

Tests

Existing acceptance tests, with regenerated goldens for the affected templates. The full integration suite passes.

The cli-isolated integration tests launch ~30 ephemeral clusters per run, each
cold-pulling the multi-GB DBR runtime image over the NAT gateway in the deco AWS
test account. That NAT egress is the bulk of an opex.eng.deco budget overspend
(ES-1912931); the traffic is ~99.6% inbound download, ~3 GB per node.

These bundle acceptance templates set node_type_id directly, bypassing the
existing warm instance pool that is already exported to CI as
TEST_INSTANCE_POOL_ID and already used by spark-jar-task and integration_whl/base.
Routing them through the pool lets nodes reuse a cached runtime image instead of
re-pulling it through NAT on every launch.

Adds instance_pool_id: $TEST_INSTANCE_POOL_ID to the cluster-launching templates,
matching the existing pattern, and regenerates the affected acceptance goldens.

Co-authored-by: Isaac
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 7, 2026

Waiting for approval

Based on git history, these people are best suited to review:

  • @andrewnester -- recent work in acceptance/bundle/resources/clusters/lifecycle-started-terraform-error/, acceptance/bundle/resources/clusters/lifecycle-started-toggle/, acceptance/bundle/resources/clusters/lifecycle-started/

Eligible reviewers: @anton-107, @denik, @janniklasrose, @lennartkats-db, @pietern, @shreyas-goenka

Suggestions based on git history. See OWNERS for ownership rules.

@eng-dev-ecosystem-bot
Copy link
Copy Markdown
Collaborator

eng-dev-ecosystem-bot commented Jun 7, 2026

Commit: 3d5eb7e

Run: 27099506548

Env 🟨​KNOWN 💚​RECOVERED 🙈​SKIP ✅​pass 🙈​skip Time
🟨​ aws linux 7 15 261 923 7:18
🟨​ aws windows 7 15 263 921 13:39
💚​ aws-ucws linux 7 15 357 837 6:17
💚​ aws-ucws windows 7 15 359 835 8:56
💚​ azure linux 1 17 264 921 5:37
💚​ azure windows 1 17 266 919 8:08
💚​ azure-ucws linux 1 17 362 833 7:02
💚​ azure-ucws windows 1 17 364 831 9:10
💚​ gcp linux 1 17 260 924 6:42
💚​ gcp windows 1 17 262 922 8:26
22 interesting tests: 15 SKIP, 7 KNOWN
Test Name aws linux aws windows aws-ucws linux aws-ucws windows azure linux azure windows azure-ucws linux azure-ucws windows gcp linux gcp windows
🟨​ TestAccept 🟨​K 🟨​K 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R
🙈​ TestAccept/bundle/invariant/no_drift 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/permissions 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions 🟨​K 🟨​K 💚​R 💚​R 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=direct 🟨​K 🟨​K 💚​R 💚​R
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=terraform 🟨​K 🟨​K 💚​R 💚​R
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions 🟨​K 🟨​K 💚​R 💚​R 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=direct 🟨​K 🟨​K 💚​R 💚​R
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=terraform 🟨​K 🟨​K 💚​R 💚​R
🙈​ TestAccept/bundle/resources/postgres_branches/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/recreate 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/replace_existing 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/update_protected 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/without_branch_id 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_endpoints/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_endpoints/recreate 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_projects/update_display_name 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/synced_database_tables/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/vector_search_endpoints/drift/recreated_same_name 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/vector_search_indexes/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/vector_search_indexes/grants/select 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/ssh/connection 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
Top 24 slowest tests (at least 2 minutes):
duration env testname
4:58 gcp windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
4:46 gcp linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
4:45 gcp linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
4:33 gcp windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:27 azure-ucws windows TestAccept
3:26 aws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:25 aws-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:24 azure windows TestAccept
3:24 aws-ucws windows TestAccept
3:18 aws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:18 gcp windows TestAccept
3:07 aws-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:03 aws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:59 azure windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:56 azure-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:50 aws-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:43 aws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:40 azure-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:40 azure-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:37 azure-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:37 azure linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:31 azure windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:27 aws-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:22 azure linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct

It is a bind test that only deploys (no cluster launch, no NAT benefit), and
its acceptance golden did not regenerate cleanly outside CI. Reverting it to
keep the change focused on templates that actually launch clusters.
Cloud-only deploy test (no bundle run, no cluster launch, no NAT benefit). It
skips locally so its golden could not be regenerated here, and its cloud golden
still records node_type_id; routing it through the pool changed the cloud output
and broke the integration check. The 11 cluster-launching templates that do
start clusters passed the cloud integration run.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants