Track compressed resource state sizes in deploy telemetry (direct engine) by shreyas-goenka · Pull Request #5608 · databricks/cli

shreyas-goenka · 2026-06-15T13:47:18Z

What

Bundle deploy telemetry already reports per-resource-type raw state-size statistics (state_size_{max,mean,median}_bytes in ResourceMetadata). The same per-resource state is stored compressed downstream, so this adds the compressed-size counterparts to gauge how much resource state shrinks under compression, not just the raw sizes:

state_compressed_size_max_bytes
state_compressed_size_mean_bytes
state_compressed_size_median_bytes
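
For reviewers who want to see what the three statistics mean concretely, here is a minimal sketch of computing max/mean/median over a list of per-resource sizes. The `sizeStats` type, the `summarize` helper, and the integer rounding are assumptions made for illustration; they are not taken from the PR's code.

```go
package main

import (
	"fmt"
	"sort"
)

// sizeStats holds the three summary statistics reported per resource type.
// The type and field names are hypothetical, chosen for illustration only.
type sizeStats struct {
	Max    int
	Mean   int
	Median int
}

// summarize computes max, mean, and median over sizes given in bytes.
// Integer division for the mean and the even-count median is an assumption.
func summarize(sizes []int) sizeStats {
	if len(sizes) == 0 {
		return sizeStats{}
	}
	// Sort a copy so the caller's slice is left untouched.
	sorted := append([]int(nil), sizes...)
	sort.Ints(sorted)

	sum := 0
	for _, s := range sorted {
		sum += s
	}

	n := len(sorted)
	median := sorted[n/2]
	if n%2 == 0 {
		median = (sorted[n/2-1] + sorted[n/2]) / 2
	}
	return sizeStats{Max: sorted[n-1], Mean: sum / n, Median: median}
}

func main() {
	// Compressed sizes (bytes) for three resources of one type.
	fmt.Printf("%+v\n", summarize([]int{503, 961, 1439}))
}
```

The same helper would apply unchanged to the raw sizes, which is why the compressed fields mirror the existing state_size_* ones.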

Performance

flate runs at state-export time over individually small resource states (each well under the server's per-resource limit), not in a tight loop, so even large bundles compress in a few milliseconds — negligible next to a deploy's network I/O. No background goroutine is warranted.

Flate vs Zstd

The server uses Zstd.compress(raw) — the single-arg luben call, which is zstd's default level (3). So the right comparison is flate-L6 (what the CLI uses) vs zstd-L3, and the data says the proxy is good:

┌───────────────────────┬─────────┬─────────────────┬──────────────────┬──────────────────┐
│        Sample         │   raw   │ flate L6 (CLI)  │ zstd L3 (server) │ flate vs zstd-L3 │
├───────────────────────┼─────────┼─────────────────┼──────────────────┼──────────────────┤
│ varied JSON, 64 KB    │ 64 KB   │ 11.6 KB (18.1%) │ 11.1 KB (17.3%)  │ +4.6%            │
├───────────────────────┼─────────┼─────────────────┼──────────────────┼──────────────────┤
│ varied JSON, 1 MB     │ 1024 KB │ 179 KB (17.6%)  │ 183 KB (17.9%)   │ −2.1%            │
├───────────────────────┼─────────┼─────────────────┼──────────────────┼──────────────────┤
│ realistic JSON, 64 KB │ 64 KB   │ 2.2 KB (3.4%)   │ 2.0 KB (3.1%)    │ +11.2%           │
├───────────────────────┼─────────┼─────────────────┼──────────────────┼──────────────────┤
│ realistic JSON, 1 MB  │ 1024 KB │ 27 KB (2.6%)    │ 29 KB (2.8%)     │ −6.9%            │
└───────────────────────┴─────────┴─────────────────┴──────────────────┴──────────────────┘

Takeaways:
- flate-L6 tracks the server's zstd-L3 within ~±10%, with no consistent bias — sometimes a touch larger (small blobs), sometimes a touch smaller (at ~1 MB flate actually beats zstd-L3). For the intended purpose — understanding how much state shrinks and rough server-storage sizing — that's a faithful proxy. And since the error isn't one-directional, it largely washes out in the aggregate max/mean/median.
- The one important caveat: the proxy is good because the server compresses at zstd's default level (3). zstd's real edge over DEFLATE only shows at higher levels: e.g., on realistic/1 MB, zstd --best got 21 KB vs flate's 27 KB (~22% smaller, i.e. flate is ~28% larger). So if the server ever raises its zstd level, flate would start systematically over-estimating stored size by ~10–30%. Worth keeping in mind, but at the current default level the proxy is within noise.

Net: for what this telemetry is for, flate is a solidly good stand-in for the server's zstd — within ~10% today. (These are synthetic JSON samples; real resource state will vary in absolute ratio, but the flate-vs-zstd relationship is stable for JSON/text.)

Against some real-ish data:
Pulled 6 real .lvdash.json dashboards from public GitHub repos (databrickslabs/dqx, databrickslabs/sandbox's DBR-monitor, andre-salvati/databricks-template, etc.), 1.3 KB–240 KB, and compressed each with flate-L6 (what the CLI does) vs zstd-L3 (the server's confirmed default):

┌──────────────────────┬─────────┬──────────┬─────────┬───────────┬──────────────────┐
│      Dashboard       │   raw   │ flate L6 │ zstd L3 │ zstd best │ flate vs zstd-L3 │
├──────────────────────┼─────────┼──────────┼─────────┼───────────┼──────────────────┤
│ mtest                │ 1.3 KB  │ 503 B    │ 538 B   │ 521 B     │ −6.5%            │
├──────────────────────┼─────────┼──────────┼─────────┼───────────┼──────────────────┤
│ airflow              │ 5.4 KB  │ 961 B    │ 1060 B  │ 996 B     │ −9.3%            │
├──────────────────────┼─────────┼──────────┼─────────┼───────────┼──────────────────┤
│ worldcup             │ 8.8 KB  │ 1439 B   │ 1621 B  │ 1481 B    │ −11.2%           │
├──────────────────────┼─────────┼──────────┼─────────┼───────────┼──────────────────┤
│ orders               │ 16.6 KB │ 1571 B   │ 1787 B  │ 1601 B    │ −12.1%           │
├──────────────────────┼─────────┼──────────┼─────────┼───────────┼──────────────────┤
│ dbrmon (DBR monitor) │ 155 KB  │ 9298 B   │ 8690 B  │ 7388 B    │ +7.0%            │
├──────────────────────┼─────────┼──────────┼─────────┼───────────┼──────────────────┤
│ dqx                  │ 240 KB  │ 14907 B  │ 14869 B │ 12563 B   │ +0.3%            │
├──────────────────────┼─────────┼──────────┼─────────┼───────────┼──────────────────┤
│ TOTAL                │ 428 KB  │ 28679 B  │ 28565 B │ —         │ +0.4%            │
└──────────────────────┴─────────┴──────────┴─────────┴───────────┴──────────────────┘
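
For anyone checking the "flate vs zstd-L3" column, it appears to be the relative difference of the flate size against the zstd-L3 size. A small sketch under that assumption, using byte counts copied from three rows of the table (the helper name `relDiffPercent` is hypothetical):

```go
package main

import "fmt"

// relDiffPercent reports how much larger (positive) or smaller (negative)
// a is than b, expressed as a percentage of b.
func relDiffPercent(a, b float64) float64 {
	return (a - b) / b * 100
}

func main() {
	// flate-L6 and zstd-L3 sizes in bytes, copied from the table rows.
	rows := []struct {
		name        string
		flate, zstd float64
	}{
		{"mtest", 503, 538},
		{"dbrmon", 9298, 8690},
		{"TOTAL", 28679, 28565},
	}
	for _, r := range rows {
		fmt.Printf("%-7s %+.1f%%\n", r.name, relDiffPercent(r.flate, r.zstd))
	}
}
```

Under this definition the three rows come out at about −6.5%, +7.0%, and +0.4%, matching the table.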

eng-dev-ecosystem-bot · 2026-06-15T14:48:10Z

Integration test report

Commit: c9ac30f

Run: 27685801795

	Env	🟨KNOWN	🔄flaky	💚RECOVERED	🙈SKIP	✅pass	🙈skip	Time
🟨	aws linux	7			15	264	1009	7:17
🟨	aws windows	7			15	266	1007	11:49
💚	aws-ucws linux			7	15	360	923	8:13
💚	aws-ucws windows			7	15	362	921	10:29
💚	azure linux			1	17	267	1007	6:32
💚	azure windows			1	17	269	1005	7:54
🔄	azure-ucws linux		3		17	363	919	11:00
💚	azure-ucws windows			1	17	367	917	9:45
🔄	gcp linux		3	1	17	260	1010	10:01
🔄	gcp windows		2	1	17	263	1008	13:07

27 interesting tests: 15 SKIP, 7 KNOWN, 5 flaky

	Test Name	aws linux	aws windows	aws-ucws linux	aws-ucws windows	azure linux	azure windows	azure-ucws linux	azure-ucws windows	gcp linux	gcp windows
🟨	TestAccept	🟨K	🟨K	💚R	💚R	💚R	💚R	🔄f	💚R	💚R	💚R
🙈	TestAccept/bundle/invariant/no_drift	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🔄	TestAccept/bundle/resources/apps/inline_config	✅p	✅p	✅p	✅p	✅p	✅p	🔄f	✅p	✅p	✅p
🔄	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform	✅p	✅p	✅p	✅p	✅p	✅p	🔄f	✅p	✅p	✅p
🙈	TestAccept/bundle/resources/permissions	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🟨	TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions	🟨K	🟨K	💚R	💚R	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🟨	TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=direct	🟨K	🟨K	💚R	💚R
🟨	TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=terraform	🟨K	🟨K	💚R	💚R
🟨	TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions	🟨K	🟨K	💚R	💚R	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🟨	TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=direct	🟨K	🟨K	💚R	💚R
🟨	TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=terraform	🟨K	🟨K	💚R	💚R
🙈	TestAccept/bundle/resources/postgres_branches/basic	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/bundle/resources/postgres_branches/recreate	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/bundle/resources/postgres_branches/replace_existing	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/bundle/resources/postgres_branches/update_protected	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/bundle/resources/postgres_branches/without_branch_id	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/bundle/resources/postgres_endpoints/basic	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/bundle/resources/postgres_endpoints/recreate	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/bundle/resources/postgres_projects/update_display_name	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/bundle/resources/synced_database_tables/basic	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/bundle/resources/vector_search_endpoints/drift/recreated_same_name	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/bundle/resources/vector_search_indexes/basic	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/bundle/resources/vector_search_indexes/grants/select	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/ssh/connection	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🔄	TestFetchRepositoryInfoAPI_FromRepo	✅p	✅p	✅p	✅p	✅p	✅p	✅p	✅p	🔄f	✅p
🔄	TestFetchRepositoryInfoAPI_FromRepo/root	✅p	✅p	✅p	✅p	✅p	✅p	✅p	✅p	🔄f	🔄f
🔄	TestFetchRepositoryInfoAPI_FromRepo/subdir	✅p	✅p	✅p	✅p	✅p	✅p	✅p	✅p	🔄f	🔄f

Top 24 slowest tests (at least 2 minutes):

duration	env	testname
5:15	gcp windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
4:40	gcp linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
4:27	gcp linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
4:26	gcp windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:50	azure linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:44	gcp windows	TestAccept
3:30	aws-ucws linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:28	aws windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:21	aws windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:16	azure-ucws windows	TestAccept
3:02	aws-ucws windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:02	azure windows	TestAccept
2:56	aws linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:55	azure windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:55	azure-ucws windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:48	azure windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:45	azure-ucws linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:45	azure-ucws windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:41	aws-ucws windows	TestAccept
2:38	aws-ucws windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:33	aws linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:32	azure linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:32	aws-ucws linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:27	azure-ucws linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct

github-actions · 2026-06-17T03:01:53Z

Approval status: pending

`/acceptance/bundle/` - needs approval

Files: acceptance/bundle/telemetry/deploy/out.resources_metadata.direct.txt
Suggested: @denik
Also eligible: @pietern, @janniklasrose, @anton-107, @andrewnester, @lennartkats-db

`/bundle/` - needs approval

6 files changed
Suggested: @denik
Also eligible: @pietern, @janniklasrose, @anton-107, @andrewnester, @lennartkats-db

`/libs/telemetry/` - needs approval

Files: libs/telemetry/protos/bundle_deploy.go
Eligible: @simonfaltum, @renaudhartert-db, @hectorcast-db, @parthban-db, @tanmay-db, @Divyansh-db, @tejaskochar-db, @mihaimitrea-db, @chrisst, @rauchy

Any maintainer (@andrewnester, @anton-107, @denik, @pietern, @simonfaltum, @renaudhartert-db) can approve all areas.

See OWNERS for ownership rules.

pietern · 2026-06-17T10:20:32Z

+
+	result := make(map[string]int, len(keys))
+	for i, key := range keys {
+		result[key] = sizes[i]


The canonical pattern here is to run a goroutine for each element in the map and have it return a {key, int} on a channel. The main loop then drains that channel to collect the results and stores them in a map. There is no need to deal with GOMAXPROCS or "workers". The Go runtime takes care of scheduling.

Done — switched to the canonical pattern: one goroutine per resource sending {key, size} on a buffered channel, drained into the map. Dropped the GOMAXPROCS/worker-pool machinery entirely.

pietern · 2026-06-17T10:20:51Z

+	}
+	return buf.Len()
+}
+


This code belongs in a separate file where we have all the compression related stuff.

Done — moved the compression code (compressedStateSize + compressStateSizes) into bundle/direct/dstate/compress.go.

pietern · 2026-06-17T10:21:36Z

+			}
+		})
+	}
+}


This doesn't have anything to do with state, only with compression.

Done — the compression test and benchmarks now live in compress_test.go alongside the compression code.

…ine)

Deploy telemetry already reports per-resource-type raw state-size statistics (state_size_{max,mean,median}_bytes). The deployment metadata service stores that same per-resource state compressed, so this adds compressed-size counterparts to gauge how much resource state shrinks under compression rather than just the raw sizes:

- state_compressed_size_max_bytes
- state_compressed_size_mean_bytes
- state_compressed_size_median_bytes

The compressed length is computed per resource at state-export time (alongside the existing raw length) using the standard library's compress/flate -- a deliberately rough proxy for the server side (which uses zstd) that keeps the dependency/supply-chain surface small while still giving useful signal on compressibility.

Since the largest resource states (~1 MB, ~20 ms to compress) dominate the cost, the per-resource compression is fanned out across workers, keeping multi-resource bundles cheap.

Only the direct engine is measured, matching the existing raw-size behavior.

Co-authored-by: Isaac

shreyas-goenka mentioned this pull request Jun 15, 2026

Track zstd-compressed resource state sizes in deploy telemetry (direct engine) #5604

Closed

shreyas-goenka temporarily deployed to test-trigger-is June 15, 2026 13:48 — with GitHub Actions Inactive

shreyas-goenka force-pushed the shreyas-goenka/telemetry-compressed-resource-sizes branch 4 times, most recently from 1aacd8b to 8c625fa Compare June 17, 2026 02:58

shreyas-goenka requested a review from pietern June 17, 2026 03:01

shreyas-goenka marked this pull request as ready for review June 17, 2026 03:01

shreyas-goenka requested review from andrewnester and janniklasrose June 17, 2026 08:34

shreyas-goenka force-pushed the shreyas-goenka/telemetry-compressed-resource-sizes branch 2 times, most recently from 505c536 to b0b017d Compare June 17, 2026 09:31

shreyas-goenka temporarily deployed to test-trigger-is June 17, 2026 09:32 — with GitHub Actions Inactive

shreyas-goenka force-pushed the shreyas-goenka/telemetry-compressed-resource-sizes branch from b0b017d to ca58e77 Compare June 17, 2026 09:41

shreyas-goenka temporarily deployed to test-trigger-is June 17, 2026 09:41 — with GitHub Actions Inactive

pietern reviewed Jun 17, 2026

shreyas-goenka force-pushed the shreyas-goenka/telemetry-compressed-resource-sizes branch from ca58e77 to c9ac30f Compare June 17, 2026 11:30

shreyas-goenka temporarily deployed to test-trigger-is June 17, 2026 11:30 — with GitHub Actions Inactive

shreyas-goenka wants to merge 1 commit into main from shreyas-goenka/telemetry-compressed-resource-sizes
