[iceberg] Support ARRAY, MAP, ROW type conversions in IcebergConversions #8245

Open
weibangpeng wants to merge 1 commit into apache:master from weibangpeng:iceberg-complex-type-conversions

Conversation

@weibangpeng

Summary

  • Added ARRAY, MAP, ROW type serialization and deserialization support to IcebergConversions
  • Each element or field within a complex type is prefixed with a 4-byte little-endian length so that variable-length children are handled correctly
  • Null elements are represented with a length of -1
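
For illustration, the length-prefix scheme described above could be sketched roughly as follows. This is not the code from this PR: the class and method names are hypothetical, elements are assumed to be already serialized to byte arrays, and the sketch writes no element count, which the actual layout may well include.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the IcebergConversions implementation.
final class LengthPrefixedSketch {

    // Writes each element as [4-byte little-endian length][payload].
    // A null element is written as length -1 with no payload.
    static ByteBuffer encode(List<byte[]> elements) {
        int size = 0;
        for (byte[] element : elements) {
            size += 4 + (element == null ? 0 : element.length);
        }
        ByteBuffer buffer = ByteBuffer.allocate(size).order(ByteOrder.LITTLE_ENDIAN);
        for (byte[] element : elements) {
            if (element == null) {
                buffer.putInt(-1);
            } else {
                buffer.putInt(element.length);
                buffer.put(element);
            }
        }
        buffer.flip();
        return buffer;
    }

    // Reads elements back until the buffer is exhausted.
    static List<byte[]> decode(ByteBuffer encoded) {
        // duplicate() resets the byte order, so it is set again explicitly.
        ByteBuffer buffer = encoded.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        List<byte[]> elements = new ArrayList<>();
        while (buffer.hasRemaining()) {
            int length = buffer.getInt();
            if (length < 0) {
                elements.add(null);
            } else {
                byte[] payload = new byte[length];
                buffer.get(payload);
                elements.add(payload);
            }
        }
        return elements;
    }
}
```

The length prefix is what lets a reader skip over or slice out a variable-length child (a string, a nested array) without knowing its type, and the -1 sentinel distinguishes a null element from an empty one, which has length 0.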

Test plan

  • Added IcebergConversionsComplexTypeTest with 17 tests covering:
    • ARRAY: primitive types, strings, empty arrays, null elements, nested arrays, decimals, timestamps
    • MAP: int→string, empty maps, null values, array values
    • ROW: mixed field types, null fields, nested rows, fields with array and map
    • Mixed nesting: array of rows, map with array values, row with array and map fields
  • All 67 existing IcebergConversions tests continue to pass (no regressions)
  • IcebergCompatibilityTest (19 tests) continues to pass

Closes #3788

(Timestamp) value, ((LocalZonedTimestampType) type).getPrecision());
case TIME_WITHOUT_TIME_ZONE:
return timeToByteBuffer((Integer) value, ((TimeType) type).getPrecision());
case ARRAY:

Contributor

This writes a Paimon-private encoding into Iceberg lower/upper bounds for nested types. Iceberg's binary single-value serialization is only implemented for primitive values (Iceberg's Conversions class throws for list/map/struct), so an Iceberg reader that sees these bounds will try to decode them as Iceberg metadata and will either fail or misinterpret them. If we cannot encode these values in Iceberg's standard format, we should omit lower/upper bounds for ARRAY/MAP/ROW instead of serializing a custom round-trip format here.
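
A minimal sketch of what omitting the bounds could look like, under stated assumptions: TypeRoot is a stand-in enum rather than Paimon's actual type model, the class and method names are hypothetical, and only two primitive cases are shown, using the little-endian layout that Iceberg's single-value serialization specifies for int and long.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Optional;

// Hypothetical sketch, not code from this PR or from Paimon.
final class BoundsSketch {

    // Stand-in for the real type roots; only a few are listed.
    enum TypeRoot { INTEGER, BIGINT, ARRAY, MAP, ROW }

    // Nested types have no standard single-value encoding to target.
    static boolean hasStandardBoundEncoding(TypeRoot root) {
        switch (root) {
            case ARRAY:
            case MAP:
            case ROW:
                return false;
            default:
                return true;
        }
    }

    // Returns an empty Optional when no bound should be written.
    static Optional<ByteBuffer> toBound(TypeRoot root, Object value) {
        if (value == null || !hasStandardBoundEncoding(root)) {
            return Optional.empty();
        }
        switch (root) {
            case INTEGER:
                return Optional.of(
                        ByteBuffer.allocate(4)
                                .order(ByteOrder.LITTLE_ENDIAN)
                                .putInt(0, (Integer) value));
            case BIGINT:
                return Optional.of(
                        ByteBuffer.allocate(8)
                                .order(ByteOrder.LITTLE_ENDIAN)
                                .putLong(0, (Long) value));
            default:
                throw new UnsupportedOperationException("Not covered by this sketch: " + root);
        }
    }
}
```

With this shape, the caller that builds the lower/upper bound maps would simply skip any column whose bound is empty, so readers never encounter bytes they cannot decode.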

Development

Successfully merging this pull request may close these issues.

[Feature][core] Implement more type conversions for IcebergConversions
