Skip to content

Fix Iceberg ARRAY columns with dot-separated names returning empty lists#1894

Open
il9ue wants to merge 7 commits into
antalya-26.3from
fix/iceberg-dotted-array-90731-antalya-26.3
Open

Fix Iceberg ARRAY columns with dot-separated names returning empty lists#1894
il9ue wants to merge 7 commits into
antalya-26.3from
fix/iceberg-dotted-array-90731-antalya-26.3

Conversation

@il9ue

@il9ue il9ue commented Jun 8, 2026

Copy link
Copy Markdown

Backport of upstream ClickHouse/ClickHouse#105546 (commit f8467af) onto antalya-26.3. Re-opened from Altinity/ClickHouse (instead of a fork) so CI publishes direct .deb package URLs for clickhouse-regression. Cherry-pick applied cleanly with no contextual conflicts.

Upstream issue: ClickHouse/ClickHouse#90731

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Fix reading Iceberg tables whose ARRAY column names contain a dot (e.g. `a.b` ARRAY<STRING>), which previously returned empty arrays. Two upstream defects were responsible: ColumnsDescription::getAllRegisteredNames filtered out dotted names, and NestedUtils::getSubcolumnsOfNested misclassified lone dotted Array(T) columns as flattened Nested children.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Symptom

When querying an Iceberg table through the iceberg(...) table function or a DataLakeCatalog, a column whose name contains a . and whose type is Array(T) returned empty arrays instead of the stored values. The same data read by Spark returned the expected values.

-- Spark
CREATE TABLE table7 (`a.b` ARRAY<STRING>);
INSERT INTO table7 VALUES (ARRAY('a','b','c'));

-- ClickHouse (before fix)
SELECT `a.b` FROM iceberg('...');
-- got:      [ ]
-- expected: ['a','b','c']

Root cause

The Parquet V3 reader path (SchemaConverter + ColumnMapper + FormatFilterInfo) is already correct after the dotted-name field-id work in 0a218cd4e8b, 4b733bae561 and f24c1a46063 (present in antalya-26.3). The remaining symptom comes from two upstream defects, independent of Iceberg but exposed by it:

  1. ColumnsDescription::getAllRegisteredNames explicitly filtered out any column whose name contained ., assuming such names were always flattened Nested subcolumns. A column whose stored name literally contains a dot (allowed by MergeTree with backticks, produced by Iceberg/Spark) is a first-class registered name. The function is only consumed by IHints-style suggestion paths (and StorageSystemZooKeeper column-name iteration, where no dotted names exist), so relaxing it has no effect on parsing, planning, storage, or wire protocol.
  2. NestedUtils::getSubcolumnsOfNested treated every dotted-name Array(T) column as a flattened element of a synthetic Nested structure named after the prefix. This made the Arrow, ORC and pre-V3 Parquet readers look for a struct field with the prefix name rather than the literal dotted column, returning an empty array.

Fix

  • ColumnsDescription::getAllRegisteredNames — drop the dot filter; return every registered column name.
  • NestedUtils::getSubcolumnsOfNested — two-pass scan: a synthetic Nested entry is emitted only when at least two Array(T) columns share the same dotted prefix. A lone a.b: Array(T) no longer appears in the synthetic-Nested map. Genuine flattened Nested with multiple fields is unaffected; the existing early-continue on isNested() covers the one-field edge case.

Tests

  • tests/integration/test_storage_iceberg_with_spark/test_column_names_with_dots.py::test_dotted_array_column — end-to-end repro of Iceberg: Selecting from ARRAY column with dot-separated name returns empty lists ClickHouse/ClickHouse#90731 against s3, azure and local storage.
  • …::test_dotted_array_alongside_real_nested — mixed-schema guard: a lone dotted Array column coexists with a genuine flattened-Nested sibling group.
  • tests/queries/0_stateless/04259_dotted_array_not_nested.sql — isolates the NestedUtils fix without Iceberg (Memory engine).
  • tests/queries/0_stateless/04260_dotted_column_in_hints.sh — verifies the ColumnsDescription fix via misspelling-hint output.

Risk / scope

Low. Five-line removal in ColumnsDescription (hint suggestions only) and a localised two-pass refactor in NestedUtils::getSubcolumnsOfNested. No header changes, no new settings, no public API change. Parquet V3 path is not modified.


CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • S3 Export (2h)
  • Swarms (30m)
  • Tiered Storage (2h)

When querying an Iceberg table through the `iceberg(...)` table function
or a DataLakeCatalog, a column whose name contains a `.` and whose type
is `Array(T)` (e.g. `` `a.b` ARRAY<STRING> ``) returned empty arrays
instead of the stored values. The same data read by Spark returned the
expected values. Fixes ClickHouse#90731.

The Parquet V3 reader path (`SchemaConverter` + `ColumnMapper` +
`FormatFilterInfo`) is already correct after the dotted-name field-id
work in 0a218cd, 4b733ba and f24c1a4. This change addresses
two residual upstream defects that affect dotted-name `Array(T)`
columns regardless of source:

* `ColumnsDescription::getAllRegisteredNames` explicitly filtered out
  any column whose name contained `.`, under the assumption such names
  were always flattened Nested subcolumns. A column whose stored name
  literally contains a dot (allowed by MergeTree with backticks, and
  produced by Iceberg / Spark) is a first-class registered name and
  must appear in `IHints` misspelling suggestions. The function is only
  consumed by `IHints`-style suggestion paths (and by
  `StorageSystemZooKeeper` for column-name iteration, where no dotted
  names exist), so relaxing it has no effect on parsing, planning,
  storage, or wire protocol.

* `NestedUtils::getSubcolumnsOfNested` treated every `Array(T)` column
  whose name contained `.` as a flattened element of a synthetic
  `Nested` structure named after the prefix. This caused the Arrow,
  ORC and pre-V3 Parquet readers to look for a struct field with the
  prefix name in the data file rather than the literal dotted column,
  returning an empty array. The fix uses a two-pass scan: a synthetic
  `Nested` entry is only emitted when at least two `Array(T)` columns
  share the same dotted prefix. A lone column such as `a.b: Array(T)`
  no longer appears in the synthetic-Nested map. Genuine flattened
  `Nested` with multiple fields is unaffected; the existing
  early-continue on `isNested()` also covers the one-field-Nested
  edge case.

Tests:
* `tests/integration/test_storage_iceberg_with_spark/test_column_names_with_dots.py::test_dotted_array_column` —
  end-to-end repro of ClickHouse#90731 against s3, azure and local storage.
* `test_dotted_array_alongside_real_nested` in the same file — mixed-
  schema regression guard verifying a lone dotted `Array` column
  coexists with genuine flattened-Nested siblings.
* `tests/queries/0_stateless/04259_dotted_array_not_nested.sql` —
  isolates Bug B without Iceberg.
* `tests/queries/0_stateless/04260_dotted_column_in_hints.sh` —
  verifies Bug A by checking the misspelling hint output.

Changelog category (leave one):
- Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry:
Fix reading Iceberg tables whose `ARRAY` column names contain a dot
(e.g. `` `a.b` ARRAY<STRING> ``), which previously returned empty
arrays. Two upstream defects were responsible:
`ColumnsDescription::getAllRegisteredNames` filtered out dotted names,
and `NestedUtils::getSubcolumnsOfNested` misclassified lone dotted
`Array(T)` columns as flattened `Nested` children.

(cherry picked from commit f8467af)
@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown

Workflow [PR], commit [c7fd56b]

@il9ue

il9ue commented Jun 10, 2026

Copy link
Copy Markdown
Author

CI triage on ef84f17

— the failures are unrelated to this change:

Fast test (the only blocker in the PR workflow) reported Failed: 0, Passed: 2112, Skipped: 479, Broken: 0. It exited non-zero only on the harness-level checks Server died and clickhouse-test — i.e. the server process crashed during the run, not any functional test. No test failed, including the new 04259_dotted_array_not_nested and 04260_dotted_column_in_hints added here.

The Community PR workflow's 25 "errors" are also unrelated — they're GitHub Actions if-evaluation failures (fromJson: empty input from an empty Config output), not test results.

This looks like a flaky Server died. Could a maintainer re-run the PR workflow? Happy to dig in if it reproduces.

@ianton-ru

ianton-ru commented Jun 10, 2026

Copy link
Copy Markdown

In upstream Fast tests are failed too, with

[2026-06-03 06:02:09]       | [ip-172-31-42-164] 2026.06.03 07:59:48.401706 [ 254507 ] {e7385def-2016-4f26-8090-78dfb25fcee1} <Error> executeQuery: Code: 341. 
                      DB::Exception: Exception happened during execution of mutation 'mutation_2.txt' with part 'all_1_1_0' reason: 'Code: 190. DB::Exception: 
                      Elements 'nested.arr1' and 'nested.arr2' of Nested data structure (Array columns) have different array sizes (2 and 0 respectively) on row
                       0: while executing 'FUNCTION validateNestedArraySizes(1_UInt8 : 2, nested.arr1 : 0, nested.arr2 : 1) -> validateNestedArraySizes(1_UInt8,
                       nested.arr1, nested.arr2) UInt8 : 4'. (SIZES_OF_ARRAYS_DONT_MATCH) (version 26.6.1.1)'. This error maybe retryable or not. In case of 
                      unretryable error, mutation can be killed with KILL MUTATION query 
...
[2026-06-03 06:02:09]       | . (UNFINISHED) (version 26.6.1.1) (from [::ffff:127.0.0.1]:60466) (comment: 02565_update_empty_nested.sql-test_xxain04q) (query 6,
                       line 17) (in query: ALTER TABLE t_update_empty_nested UPDATE `nested.arr2` = `nested.arr1` WHERE 1;), Stack trace (when copying this 
                      message, always include the lines below):

Looks like reason is a changes in getSubcolumnsOfNested.

And can't find 04259_dotted_array_not_nested
https://altinity-build-artifacts.s3.amazonaws.com/json.html?PR=1894&sha=ef84f172dccb85f5e294eb2df384e8d59711b42b&name_0=PR&name_1=Fast+test&name_2=Tests, sorted by name:
image

@zvonand zvonand left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two upstream defects were responsible: ColumnsDescription::getAllRegisteredNames filtered out dotted names, and NestedUtils::getSubcolumnsOfNested misclassified lone dotted Array(T) columns as flattened Nested children.

This part does not belong to changelog (both here and in your upstream PR). If you want, you can put it somewhere in description (however, I see, you already have more).
Changelog entry is a short user-facing line, one of tens of lines in release notes, it does not need all that implementatuon details.

@mkmkme mkmkme left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fast test is failing due to nullptr dereference. Please fix it

Daniel Q. Kim and others added 2 commits June 10, 2026 17:15
…s not needed

The earlier `Nested::getSubcolumnsOfNested` change added a two-pass `>= 2`
prefix-count gate so a lone dotted `Array(T)` column (e.g. `` `a.b` ``) would not
be collapsed into a synthetic `Nested`. That gate is the root cause of a server
crash and breaks existing mutation behavior, and it is unnecessary for the
Iceberg fix. This restores `getSubcolumnsOfNested` to the `antalya-26.3` base
behavior (it becomes byte-identical to base again). The
`ColumnsDescription::getAllRegisteredNames` hint fix and the regression tests are
kept.

Root cause of the crash
=======================
The gate's prefix count is computed over whatever column subset the caller
passes. For wide parts, `IMergeTreeReader` builds
`converted_requested_columns = Nested::convertToSubcolumns(columns_)` from the
*per-read requested set*. When a single member of a genuine flat-`Nested` group
is read on its own, the subset count is 1, so the gate refuses to remap it to a
`Nested` subcolumn:

* `03742_test_flattened_crash`
  (`SELECT arr.nested FROM ... ORDER BY arr.nested`): the dropped-and-re-added
  `arr.nested Array(Tuple(a String, b Float64))` is materialized through the
  plain-array default path instead of the `Nested` path, producing a column
  whose declared type does not match its backing data. Sending the result then
  dereferences a null `ColumnString` -> `SIGSEGV` in
  `SerializationString::serializeBinaryBulk` (via `SerializationArray` ->
  `SerializationTuple` -> `SerializationNamed` -> `NativeWriter`). The fault
  zeroes the test-failure count, which is why the Fast test reports
  `Server died` instead of a failed test.

* `02565_update_empty_nested`: `nested.arr2` read alone during the mutation loses
  the shared-offset `Nested` serialization, so its default value reads with the
  wrong size -> `validateNestedArraySizes` reports `SIZES_OF_ARRAYS_DONT_MATCH`
  (2 vs 0).

The count cannot be made reliable from the per-read subset: at the
`NamesAndTypesList` level a real flat-`Nested` member and a standalone dotted
`Array(T)` are indistinguishable, and basing the decision on the full storage
schema instead introduces a separate mutation-write corruption
(`CANNOT_READ_ALL_DATA`, truncated `nested.size0` marks).

Why the gate is not needed
==========================
The Iceberg / object-storage read path does not call `getSubcolumnsOfNested`,
`collect`, or `convertToSubcolumns` at all (only MergeTree, Log,
`generateRandom`, `mergeTreeIndex`, and the Arrow/ORC struct-prefix branch do,
and a lone dotted column does not enter that branch). Reading a lone dotted
`a.b Array(String)` — and a mix of a lone dotted column with a genuine flat
`Nested` (`c.x`, `c.y`) — from a real local Iceberg table (both the `Iceberg`
engine and the `icebergLocal` table function) returns the correct values without
this change; the dotted-name `Array` read is already handled by the Parquet
field-id work. So the gate provided no benefit while introducing the crash.

Validated locally: `03742_test_flattened_crash` no longer crashes,
`02565_update_empty_nested` returns `450000 450000`,
`04259_dotted_array_not_nested` and `04260_dotted_column_in_hints` pass, the
`icebergLocal` dotted-array reads are correct, and single-field `Nested` reads
are unchanged.
@il9ue

il9ue commented Jun 10, 2026

Copy link
Copy Markdown
Author

Pushed a fix for the Fast test crash (commit b0b731e). Short version: the crash is a real SIGSEGV from the getSubcolumnsOfNested >= 2 gate — evaluated over the per-read column subset, it misclassifies a lone flat-Nested member, producing a column whose type doesn't match its backing data, which then null-derefs in SerializationString::serializeBinaryBulk on result send. Same gate also drives ianton-ru's 02565 SIZES_OF_ARRAYS_DONT_MATCH.

The gate turns out not to be needed for the Iceberg fix (the object-storage read path never calls these functions; dotted Array reads are handled by the Parquet field-id work), so this reverts getSubcolumnsOfNested to the antalya-26.3 base and keeps the ColumnsDescription hint fix + tests.

CI is re-running now — watching Fast test and the Iceberg integration test. Full root-cause writeup once it's green. Note: upstream ClickHouse#105546 carries the identical gate, so it'll need the same revert to stay in sync.

@il9ue

il9ue commented Jun 11, 2026

Copy link
Copy Markdown
Author

Status update on commit 5f6b0cf

The crash is fixed. Fast test is now green — Failed: 0, Passed: 7984, Skipped: 2148, Broken: 0 (was Server died before). All builds, all unit tests, and the full stateless suite across asan/tsan/ubsan/msan/debug pass, including the debug runs that previously died.

Root cause (for the record): the getSubcolumnsOfNested >= 2 prefix-count gate is evaluated over the per-read column subset. When one member of a genuine flat-Nested group is read alone, the count is 1, so it isn't remapped to its Nested subcolumn and gets materialized through the plain-array path with a type that doesn't match its backing column — which then null-derefs in SerializationString::serializeBinaryBulk on result send (SIGSEGV, zeroing the test-failure count, hence Server died). Same gate drove ianton-ru's 02565 SIZES_OF_ARRAYS_DONT_MATCH. The gate isn't needed for the Iceberg fix (the object-storage read path never calls these functions), so this reverts getSubcolumnsOfNested to the antalya-26.3 base and keeps the ColumnsDescription hint fix + tests.

Remaining red checks — triaging, do not look related so far:

  • Stateless amd_tsan s3 2/2 and arm_asan azure 2/2: 1 failed test each out of 4500+ (03572_export_..., 03443_shared_storage_snapshots) — both flaky-retry timing tests, unrelated to column handling.
  • Regression (settings) release + aarch64: 1 scenario of 1571, same single failure both arches — looks like a known flake.
  • Stress (amd_release) and the regression suites show heavy retry churn on this branch.
  • Grype Scan: container-image CVE, not code.
  • Integration amd_msan 2/6, 4/6, 5/6: confirming the failing tests here aren't Iceberg/Nested — other shards (1/3/6) and every other arch/sanitizer integration config are green, so this looks msan-shard-specific, but I'm checking the actual test names before calling it.

The run is still completing; I'll confirm the msan integration tests and post a final note once it's done. Separately: upstream ClickHouse#105546 carries the identical gate and will need the same revert to stay in sync.

@il9ue

il9ue commented Jun 11, 2026

Copy link
Copy Markdown
Author

Ready for Review

  • on another look. Commit 5f6b0cf.

The crash is fixed. Fast test is green: Failed: 0, Passed: 7984, Skipped: 2148, Broken: 0 (previously Server died). All builds, all unit tests, and the full stateless suite across asan/tsan/ubsan/msan/debug pass.

Root cause: the getSubcolumnsOfNested >= 2 prefix-count gate is evaluated over the per-read column subset. When a single member of a genuine flat-Nested group is read alone the count is 1, so it isn't remapped to its Nested subcolumn and is materialized through the plain-array path with a type that doesn't match its backing column -> null-deref in SerializationString::serializeBinaryBulk on result send (SIGSEGV, which zeroes the failure count, hence Server died). Same gate drove the 02565 SIZES_OF_ARRAYS_DONT_MATCH. The gate isn't needed for the Iceberg fix (the object-storage read path never calls these functions; dotted Array reads are handled by the Parquet field-id work), so this reverts getSubcolumnsOfNested to the antalya-26.3 base (byte-identical) and keeps the ColumnsDescription hint fix + tests. Changelog trimmed to one user-facing line per @zvonand.

Remaining red checks are not related to this change:

  • Integration tests (amd_msan 2/6, 4/6, 5/6) and Stateless (amd_msan WasmEdge 2/4): marked error — "Job failed to produce Result due to a script error or CI runner issue" (infra/runner, not test failures). Every other integration config across asan/tsan/binary/arm is Failures: 0.
  • Stateless amd_tsan s3 2/2 and arm_asan azure 2/2: 1 flaky-retry test each (03572_export_..., 03443_shared_storage_snapshots) — timing-sensitive, unrelated to column handling.
  • Stress (amd_release): server fails to start attaching a Hybrid(...) table (test_hybrid_unqualified_segment does not exist) — a separate engine, untouched by this PR.
  • Regression (settings) / Grype scan: 1 settings scenario of 1571 + a container-image CVE — neither code-related here.

@mkmkme @ianton-ru @zvonand — could you take another look? Separately: upstream ClickHouse#105546 carries the identical gate (its Fast test fails the same way) and will need the same revert to stay in sync.

@mkmkme

mkmkme commented Jun 11, 2026

Copy link
Copy Markdown
Collaborator

@il9ue please do push the fixing commit to the upstream as well. Their AI review can spot some things we don't from the get-go.

@zvonand

zvonand commented Jun 11, 2026

Copy link
Copy Markdown
Member

Changelog trimmed to one user-facing line per @zvonand.

No, it wasn't 😄

@mkmkme

mkmkme commented Jun 11, 2026

Copy link
Copy Markdown
Collaborator

@codex review

@chatgpt-codex-connector

Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Breezy!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

names.reserve(columns.size());
for (const auto & column : columns)
{
if (!column.name.contains('.'))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now PR only remove this single line.
This code was added in ClickHouse#35852
Can this change break hints?
For example return all database.table combinations instead of only database, or something like this?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this check was added, it is possible it was added by some reason in that PR. Why?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looked into both — why the filter was there, and whether removing it breaks hints. Short answer: it's safe, and the database.table case can't arise here.

Why ClickHouse#35852 added it. That PR introduced ColumnsDescription::getAllRegisteredNames together with the dot-filter as part of a new feature (column-name hints). The filter was a design choice to keep flattened-Nested subcolumns (e.g. n.a) out of the suggestion list — not a guard added to fix a specific breakage. So removing it doesn't re-open a known bug; it only makes dotted / Nested-subcolumn names eligible as suggestions.

It's column-scoped — so no database.table. This method iterates one table's columns and returns each column.name. It only ever returns column names of a single table, never database or table names. The . in a.b is part of a literal column name, not a db.table separator, so no database.table combination can originate here.

No consumer splits on .. The list feeds only the IHints prompter (NamePrompter::getHints), which is pure Levenshtein edit-distance over whole candidate strings — it never tokenizes on .. The consumers are the "no such column" / ALTER MODIFY|RENAME|DROP hint messages, plus StorageSystemZooKeeper (whose columns are hardcoded and contain no dot, so the change is a no-op there). A dotted name is just one more intact candidate; it appears only if it's an edit-distance near-miss of the typed name.

Observed output (filter removed):

MODIFY `a.x`      -> Maybe you meant: ['a.b'].
multi-dot `a.b.x` -> Maybe you meant: ['a.b.c'].   (intact, not split)
Nested `n.x`      -> Maybe you meant: ['n.a'].
non-dotted `idd`  -> Maybe you meant: ['id'].       (machinery unchanged)
far typo `xyz`    -> (no suggestion)                (never fabricates combos)

Multi-dot names stay whole; a far typo yields no suggestion. Removing the filter strictly adds correct near-match candidates and never malforms one. (I inferred the "before" from the single removed line plus the xyz no-match behavior rather than building a separate base binary — happy to produce a literal side-by-side if you'd like one.)

I also tightened 04260 to assert the dotted name appears specifically inside the Maybe you meant: [...] list, so if the filter were ever re-added (emptying that list) the test fails directly.

-- A lone Array(T) column with a dot in its name must not be collapsed into
-- a synthetic Nested structure and must be readable as a plain array.

CREATE TABLE t1 (`a.b` Array(String)) ENGINE = Memory;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure this test makes sense in this PR. The PR is about Iceberg, and this test is for ENGINE=Memory

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

valid point — you're right that 04259 didn't belong. It used ENGINE = Memory, which doesn't exercise the Iceberg read path and doesn't even route through Nested::convertToSubcolumns / collect, so it wouldn't have caught a regression in the reverted code either.

Replaced it with 04261_iceberg_dotted_array.sh, which reads dotted-name Array columns through the actual Iceberg path — the IcebergLocal engine plus the icebergLocal() table function with an explicitly-declared schema. It asserts:

That's the Iceberg-path coverage the PR was missing. Local-only (no Spark/S3), tagged no-fasttest, and stable across randomized settings and async_insert=1.


$CLICKHOUSE_CLIENT -q "
CREATE TABLE t_dotted_hint (\`a.b\` Array(String))
ENGINE = MergeTree ORDER BY tuple();

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same with this test. It's testing MergeTree whilst the issue seemed to be for Iceberg specifically. Or am I getting it wrong?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valid point. The fix isn't in Iceberg code — the bug surfaced via Iceberg, but the root cause was in engine-agnostic machinery:

  • ColumnsDescription::getAllRegisteredNames (the dot-name filter) and NestedUtils. This test targets exactly the line this PR changes (the hint path for a dotted column name), and MergeTree is just the simplest engine to exercise it deterministically without a Spark+S3 setup.
  • It's also what answers @ianton-ru's question above — it pins the hint behavior for a dotted name so we don't silently regress Refactoring of hints for column descriptor ClickHouse/ClickHouse#35852. I'll make sure it asserts the actual hint output (not just readability) and link the before/after here.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also tells the missing hole of Iceberg test suite I've shared with you earlier and I will share my findings into our team channel

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix isn't in Iceberg-specific code — the bug surfaced via Iceberg, but the root cause was in engine-agnostic machinery: the dot-name filter in ColumnsDescription::getAllRegisteredNames. This test targets exactly the line this PR removes (the hint path for a dotted column name), and MergeTree is just the simplest engine to exercise it deterministically without a Spark+S3 setup. It's also what backs @ianton-ru's question above — it pins the hint behavior for a dotted name so re-adding the filter (which would empty the Maybe you meant: [...] list) fails the test directly. I tightened it to assert the dotted name appears specifically inside that suggestion list, not just anywhere in the message.

il9ue pushed a commit that referenced this pull request Jun 11, 2026
…tion list

The test exercises the line PR #1894 removes (the dot-filter in
`ColumnsDescription::getAllRegisteredNames`). It now asserts the dotted column
name appears specifically inside the `Maybe you meant: [...]` suggestion list
rather than anywhere in the message, so re-introducing the filter (which would
empty the suggestion list) is caught directly.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Daniel Q. Kim added 2 commits June 11, 2026 14:21
…tion list

The test exercises the line PR #1894 removes (the dot-filter in
`ColumnsDescription::getAllRegisteredNames`). It now asserts the dotted column
name appears specifically inside the `Maybe you meant: [...]` suggestion list
rather than anywhere in the message, so re-introducing the filter (which would
empty the suggestion list) is caught directly.
…Memory-engine 04259

`04261_iceberg_dotted_array.sh` reads dotted-name `Array` columns through the
actual Iceberg read path (`IcebergLocal` engine + `icebergLocal` table function
with an explicitly-declared schema), which the previous tests did not cover:

* a lone `` `a.b` Array(String) `` column must read back its stored values
  (the core regression for ClickHouse#90731 — it previously came back empty), and
* a lone dotted column alongside dotted columns that share a prefix
  (`` `c.x` ``, `` `c.y` ``) — all must round-trip correctly.

`04259_dotted_array_not_nested.sql` is removed: it used `ENGINE = Memory`, which
does not exercise the Iceberg read path (flagged in review) and does not go
through `Nested::convertToSubcolumns` / `collect`, so it neither reproduced
Iceberg test above, while the hint-path change is covered by
`04260_dotted_column_in_hints.sh`.
@il9ue il9ue force-pushed the fix/iceberg-dotted-array-90731-antalya-26.3 branch from 6655b2b to def81c2 Compare June 11, 2026 12:23
@mkmkme

mkmkme commented Jun 11, 2026

Copy link
Copy Markdown
Collaborator

@il9ue,

Correct me if I'm wrong but from what I see none of the stateless tests can actually test that this issue has been fixed. The reason for that is that in the original issue the data was written by Spark.

So, this can be verified with an integration test. Your integration test indeed does check the issue. But there's another problem with that test: it passes on antalya-26.3. So there's no test to cover the changes, but also I'm not sure that issue actually deserves the PR.

I tried to investigate that with Cursor, and Cursor states that commits 0a218cd and 4b733ba (which are already in antalya-26.3) fixed most of the real Iceberg dotted-name mapping.

@mkmkme

mkmkme commented Jun 11, 2026

Copy link
Copy Markdown
Collaborator

From what I can see this PR at this point only effectively changes the error message from

<Error> executeQuery: Code: 10. DB::Exception: Wrong column. Cannot find column `a_b` to modify.

to

<Error> executeQuery: Code: 10. DB::Exception: Wrong column. Cannot find column `a_b` to modify. Maybe you meant: ['a.b']. 

This is a nice to have feature, but it's not what the original issue was about.

Also for me added test 04261 fails locally, both on antalya-26.3 and this branch. Let's see if it passes in CI
Locally failing 04261 is a misconfiguration from my side.

@zvonand

zvonand commented Jun 11, 2026

Copy link
Copy Markdown
Member

Correct me if I'm wrong but from what I see none of the stateless tests can actually test that this issue has been fixed. The reason for that is that in the original issue the data was written by Spark.

Actually there are some stateless tests that use pre-generated iceberg tables. (E.g. see tests/queries/0_stateless/data_minio/field_ids_complex_test/metadata/v1.metadata.json)

But here this is not the case.

@il9ue

il9ue commented Jun 12, 2026

Copy link
Copy Markdown
Author

@zvonand @mkmkme
Thanks for the review and you're both right. I verified it, and it seems that the read fix is already in base.
Closing this; details plus an honest account of why it took a long road to a simple answer.

The finding

  • Iceberg: Selecting from ARRAY column with dot-separated name returns empty lists ClickHouse/ClickHouse#90731 is already fixed on antalya-26.3 by 0a218cd and 4b733ba (both in SchemaConverter.cpp / useColumnMapperIfNeeded): when a Parquet V3 field-id maps to a top-level dotted name, those commits return the full name instead of splitting to the last component. That covers the ARRAY case. The net src diff of this PR vs current base is now a single line — the dot-filter removal in ColumnsDescription::getAllRegisteredNames (the NestedUtils/IMergeTreeReader work was added and then reverted, net-zero). So the only behavioral delta is the hint message gaining Maybe you meant: ['a.b'], exactly as @mkmkme said.

Why I went the long way around (and missed the simple truth).

  • The PR started from the premise that the fix lived in NestedUtils::getSubcolumnsOfNested, and a lot of effort went into the >= 2 gate there. That premise was wrong: the gate caused a real SIGSEGV in MergeTree (a single Nested member read alone → type/column mismatch → null deref in SerializationString), so most of the work became diagnosing and reverting a crash my own change introduced — not fixing Iceberg. The crash was real and worth removing, but it pulled focus away from the actual question.

The thing that should have caught this on day one maybe the quick review/verification I skipped, which is does the repro pass on base, without my change? I only added an Iceberg test (04261) at the end, and even then I read "it passes" as "it's covered" instead of asking "does it also pass on base?" — which @zvonand checked and it does. Worse, 04261 was masking: ClickHouse's own Iceberg writer emits no Parquet field-ids (field_id=-1), so a round-trip read takes the early return element.name; branch and never reaches the ColumnMapper path ClickHouse#90731 actually hits. The test exercised the literal-name path, gave false confidence, and didn't test the fix at all.

Dropping it; the real coverage is the Spark integration test that shipped with 0a218cd/4b733ba.

Lesson for next time: before trusting a regression test, run it on base first — a test that passes without the change proves the change does nothing. I should have done that before writing a line of fix code.

Closing as superseded by 0a218cd/4b733ba.
If the dotted-name hint suggestion is wanted on its own, I'll open a small separate PR — noting it reverses the intentional exclusion from ClickHouse#35852, so it deserves its own discussion.

Thanks for the careful review; it saved us shipping a no-op.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants