[core] Support merging sparse row ranges for global indexes#8250

Draft
JingsongLi wants to merge 1 commit into apache:master from JingsongLi:codex/btree-merge-row-ranges

JingsongLi commented Jun 16, 2026

Summary

This PR adds opt-in support for building global indexes over sparse, monotonically increasing RowID ranges whose gaps are permanent. It covers BTree global indexes and vector-style single-column global indexes. Index metadata records the enclosing RowID span, while readers and writers still honor the actual data ranges.
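
A minimal sketch of the span idea described above, assuming inclusive `[from, to]` RowID bounds. `RowRange`, `enclosingSpan`, and `gapCount` are hypothetical names for illustration, not Paimon classes; the point is only that the metadata span covers the gaps while the actual ranges are kept alongside it.

```java
import java.util.List;

/**
 * Hypothetical sketch: derive the enclosing RowID span from sparse, strictly
 * increasing, disjoint ranges, and count the RowIDs that fall in the gaps.
 */
public class EnclosingSpanSketch {

    /** Inclusive RowID range [from, to]; a stand-in, not Paimon's own range type. */
    record RowRange(long from, long to) {
        RowRange {
            if (from > to) {
                throw new IllegalArgumentException("from > to: " + from + " > " + to);
            }
        }
    }

    /** Returns the smallest range covering all inputs; rejects overlapping or unordered input. */
    static RowRange enclosingSpan(List<RowRange> ranges) {
        if (ranges.isEmpty()) {
            throw new IllegalArgumentException("no ranges");
        }
        for (int i = 1; i < ranges.size(); i++) {
            if (ranges.get(i).from() <= ranges.get(i - 1).to()) {
                throw new IllegalArgumentException("ranges must be increasing and disjoint");
            }
        }
        return new RowRange(ranges.get(0).from(), ranges.get(ranges.size() - 1).to());
    }

    /** Number of RowIDs inside the span that have no data (the permanent gaps). */
    static long gapCount(List<RowRange> ranges) {
        RowRange span = enclosingSpan(ranges);
        long covered = 0;
        for (RowRange r : ranges) {
            covered += r.to() - r.from() + 1;
        }
        return (span.to() - span.from() + 1) - covered;
    }

    public static void main(String[] args) {
        List<RowRange> actual =
                List.of(new RowRange(0, 99), new RowRange(200, 299), new RowRange(500, 549));
        System.out.println(enclosingSpan(actual)); // RowRange[from=0, to=549]
        System.out.println(gapCount(actual));      // 300
    }
}
```

Here the span `[0, 549]` holds 550 RowIDs, of which only 250 carry data; under the PR's stated precondition, the other 300 are gaps that will never be filled.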

Changes

  • Add btree-index.build.merge-row-ranges and vector-index.build.merge-row-ranges, both disabled by default.
  • Preserve actual IndexedSplit row ranges while using the enclosing RowID range for index metadata when sparse ranges are merged.
  • Replace the old single-column writer variants with GlobalIndexSingleColumnWriter#write(@Nullable Object key, long relativeRowId) so BTree, vector, Tantivy, and Lumina writers all receive explicit relative RowIDs.
  • Wire sparse range merging through core, Flink, and Spark BTree index builders, plus Flink/Spark generic vector index builders.
  • Relax global index RowID existence conflict checks only when the corresponding merge option is enabled and the span endpoints are covered by current data files.
  • Update vector/full-text test index helpers to persist row ids, so sparse relative RowIDs are returned correctly by test readers.
  • Reuse overridable table schemas in the discontinuous RowID BTree test so Lance format coverage keeps using file.format=lance.
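
The writer and conflict-check changes listed above can be illustrated with a small sketch. It assumes inclusive ranges and that "relative RowID" means the offset from the start of the enclosing span. `SingleColumnWriter`, `writeRanges`, and `relaxedCheckApplies` are hypothetical stand-ins that only mirror the `write(key, relativeRowId)` shape named in the PR, not the real `GlobalIndexSingleColumnWriter` or Paimon's commit checks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongFunction;

/** Hypothetical sketch: feed an index writer relative RowIDs for sparse ranges inside one span. */
public class RelativeRowIdSketch {

    /** Inclusive RowID range [from, to]; a stand-in, not Paimon's own range type. */
    record RowRange(long from, long to) {}

    /** Mirrors the write(key, relativeRowId) shape from the PR; the key may be null. */
    interface SingleColumnWriter {
        void write(Object key, long relativeRowId);
    }

    /**
     * Visits only RowIDs that exist in the actual data ranges and passes each one
     * relative to the span start, so gap RowIDs never reach the writer.
     */
    static void writeRanges(
            List<RowRange> actualRanges,
            long spanStart,
            LongFunction<Object> keyForRowId,
            SingleColumnWriter writer) {
        for (RowRange range : actualRanges) {
            for (long rowId = range.from(); rowId <= range.to(); rowId++) {
                writer.write(keyForRowId.apply(rowId), rowId - spanStart);
            }
        }
    }

    /**
     * Assumed shape of the relaxed conflict check: it applies only when merging is enabled
     * and both span endpoints fall inside current data-file ranges. When it returns false,
     * the existing strict check (not sketched here) would apply instead.
     */
    static boolean relaxedCheckApplies(
            RowRange span, List<RowRange> currentFileRanges, boolean mergeEnabled) {
        return mergeEnabled
                && covers(currentFileRanges, span.from())
                && covers(currentFileRanges, span.to());
    }

    private static boolean covers(List<RowRange> ranges, long rowId) {
        for (RowRange r : ranges) {
            if (r.from() <= rowId && rowId <= r.to()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<RowRange> actual = List.of(new RowRange(10, 12), new RowRange(20, 21));
        RowRange span = new RowRange(10, 21);

        List<Long> relativeIds = new ArrayList<>();
        writeRanges(actual, span.from(), rowId -> "key-" + rowId,
                (key, relId) -> relativeIds.add(relId));

        System.out.println(relativeIds);                              // [0, 1, 2, 10, 11]
        System.out.println(relaxedCheckApplies(span, actual, true));  // true
        System.out.println(relaxedCheckApplies(span, actual, false)); // false
    }
}
```

Passing the relative RowID explicitly, rather than letting the writer count rows, is what lets the gap between RowIDs 12 and 20 show up as a jump from 2 to 10 in the index.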

Testing

  • mvn -pl paimon-core -am -Pfast-build -DfailIfNoTests=false -Dtest=BtreeGlobalIndexTableTest#testBuildBTreeGlobalIndexWithDiscontinuousRowIdRanges test
  • mvn -pl paimon-lance -am -Pfast-build -DfailIfNoTests=false -Dtest=LanceBTreeGlobalIndexTest#testBuildBTreeGlobalIndexWithDiscontinuousRowIdRanges test
  • mvn -pl paimon-flink/paimon-flink-common -am -Pfast-build -DfailIfNoTests=false -Dtest=GenericIndexTopoBuilderTest test
  • mvn -pl paimon-spark/paimon-spark-common -am -Pfast-build -DfailIfNoTests=false -Dtest=CreateGlobalIndexProcedureTest test
  • mvn -pl paimon-vector/paimon-vector-index -am -Pfast-build -DfailIfNoTests=false -Dtest=VectorGlobalIndexerFactoryTest,VectorGlobalIndexTest test
  • mvn -pl paimon-tantivy/paimon-tantivy-index -am -Pfast-build -DfailIfNoTests=false -Dtest=TantivyFullTextGlobalIndexTest test
  • mvn -pl paimon-common,paimon-core,paimon-flink/paimon-flink-common,paimon-spark/paimon-spark-common,paimon-vector/paimon-vector-index,paimon-tantivy/paimon-tantivy-index,paimon-lumina -am -Pfast-build -DskipTests test-compile
  • mvn -pl paimon-common,paimon-core,paimon-flink/paimon-flink-common,paimon-spark/paimon-spark-common,paimon-vector/paimon-vector-index,paimon-tantivy/paimon-tantivy-index,paimon-lumina spotless:check
  • git diff --check

Notes

These options should only be enabled when RowID gaps are permanent and will not be filled later. With the default values, existing global index build behavior is unchanged.

The local Tantivy test command completed successfully; the native Tantivy test cases were skipped on this machine by their test assumptions.

JingsongLi force-pushed the codex/btree-merge-row-ranges branch 3 times, most recently from 570adad to c99169b (June 16, 2026 06:58)
JingsongLi changed the title [core] Support merging sparse row ranges for btree index [WIP][core] Support merging sparse row ranges for btree index (Jun 16, 2026)
JingsongLi force-pushed the codex/btree-merge-row-ranges branch from c99169b to 00fe4aa (June 16, 2026 08:08)
JingsongLi changed the title [WIP][core] Support merging sparse row ranges for btree index [WIP][core] Support merging sparse row ranges for global indexes (Jun 16, 2026)
JingsongLi force-pushed the codex/btree-merge-row-ranges branch from 00fe4aa to 8109f24 (June 16, 2026 09:54)
JingsongLi changed the title [WIP][core] Support merging sparse row ranges for global indexes [core] Support merging sparse row ranges for global indexes (Jun 16, 2026)
JingsongLi force-pushed the codex/btree-merge-row-ranges branch from 8109f24 to cd649c7 (June 16, 2026 13:41)
JingsongLi marked this pull request as draft (June 16, 2026 13:46)