[core] Support merging sparse row ranges for global indexes #8250
Draft
JingsongLi wants to merge 1 commit into
Summary
This PR adds opt-in support for building global indexes over sparse, monotonically increasing RowID ranges whose gaps are permanent. It covers BTree global indexes and vector-style single-column global indexes, allowing index metadata to use the enclosing RowID span while reads and writers still honor the actual data ranges.
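As a rough illustration of the idea in the summary, the sketch below computes one enclosing RowID span for index metadata while still keeping the actual ranges, so a lookup inside a gap is rejected. The class `SparseRowRanges`, the record `RowRange`, and both method names are hypothetical and are not taken from the PR; ranges are assumed to be inclusive.

```java
import java.util.List;

/** Hypothetical illustration only; these are not Paimon classes. */
final class SparseRowRanges {

    /** Inclusive RowID range [from, to]. */
    record RowRange(long from, long to) {
        RowRange {
            if (from > to) {
                throw new IllegalArgumentException("from must not exceed to");
            }
        }

        boolean contains(long rowId) {
            return rowId >= from && rowId <= to;
        }
    }

    /** Smallest single range covering every input range; gaps are included in the span. */
    static RowRange enclosing(List<RowRange> ranges) {
        if (ranges.isEmpty()) {
            throw new IllegalArgumentException("at least one range is required");
        }
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (RowRange r : ranges) {
            min = Math.min(min, r.from());
            max = Math.max(max, r.to());
        }
        return new RowRange(min, max);
    }

    /** True only when rowId lies in an actual data range, never in a gap. */
    static boolean hasData(List<RowRange> ranges, long rowId) {
        for (RowRange r : ranges) {
            if (r.contains(rowId)) {
                return true;
            }
        }
        return false;
    }
}
```

With the ranges [0, 99] and [500, 599], `enclosing` yields [0, 599], while `hasData` is false for RowID 250 because it falls in the gap.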
Changes
- `btree-index.build.merge-row-ranges` and `vector-index.build.merge-row-ranges`, both disabled by default.
- `IndexedSplit` row ranges while using the enclosing RowID range for index metadata when sparse ranges are merged.
- `GlobalIndexSingleColumnWriter#write(@Nullable Object key, long relativeRowId)` so BTree, vector, Tantivy, and Lumina writers all receive explicit relative RowIDs.
- `file.format=lance`.

Testing
- `mvn -pl paimon-core -am -Pfast-build -DfailIfNoTests=false -Dtest=BtreeGlobalIndexTableTest#testBuildBTreeGlobalIndexWithDiscontinuousRowIdRanges test`
- `mvn -pl paimon-lance -am -Pfast-build -DfailIfNoTests=false -Dtest=LanceBTreeGlobalIndexTest#testBuildBTreeGlobalIndexWithDiscontinuousRowIdRanges test`
- `mvn -pl paimon-flink/paimon-flink-common -am -Pfast-build -DfailIfNoTests=false -Dtest=GenericIndexTopoBuilderTest test`
- `mvn -pl paimon-spark/paimon-spark-common -am -Pfast-build -DfailIfNoTests=false -Dtest=CreateGlobalIndexProcedureTest test`
- `mvn -pl paimon-vector/paimon-vector-index -am -Pfast-build -DfailIfNoTests=false -Dtest=VectorGlobalIndexerFactoryTest,VectorGlobalIndexTest test`
- `mvn -pl paimon-tantivy/paimon-tantivy-index -am -Pfast-build -DfailIfNoTests=false -Dtest=TantivyFullTextGlobalIndexTest test`
- `mvn -pl paimon-common,paimon-core,paimon-flink/paimon-flink-common,paimon-spark/paimon-spark-common,paimon-vector/paimon-vector-index,paimon-tantivy/paimon-tantivy-index,paimon-lumina -am -Pfast-build -DskipTests test-compile`
- `mvn -pl paimon-common,paimon-core,paimon-flink/paimon-flink-common,paimon-spark/paimon-spark-common,paimon-vector/paimon-vector-index,paimon-tantivy/paimon-tantivy-index,paimon-lumina spotless:check`
- `git diff --check`

Notes
These options should only be enabled when RowID gaps are permanent and will not be filled later. With the default values, existing global index build behavior is unchanged.
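A minimal sketch of the opt-in behavior, assuming the two options are read as string-valued booleans from a plain options map. The helper class `MergeRowRangesOptions` and its `enabled` method are hypothetical; only the two option keys and the disabled-by-default behavior come from this PR, and how Paimon actually parses them may differ.

```java
import java.util.Map;

/** Hypothetical helper; only the two option keys come from the PR description. */
final class MergeRowRangesOptions {

    static final String BTREE_KEY = "btree-index.build.merge-row-ranges";
    static final String VECTOR_KEY = "vector-index.build.merge-row-ranges";

    /** An absent key counts as disabled, matching the stated default. */
    static boolean enabled(Map<String, String> options, String key) {
        return Boolean.parseBoolean(options.getOrDefault(key, "false"));
    }
}
```

An empty map leaves both options off, so existing builds behave as before; setting `btree-index.build.merge-row-ranges` to `true` turns on merging for BTree builds only.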
The local Tantivy test command completed successfully, with Tantivy native test cases skipped by their test assumptions on this machine.
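For the writer change listed under Changes, the sketch below shows why an explicit relative RowID matters once ranges are sparse: the writer can no longer infer the offset by counting rows, because gaps make the offsets jump. The interface `SingleColumnWriter` is a hypothetical stand-in that only mirrors the `write(key, relativeRowId)` shape. It also assumes that a relative RowID is the absolute RowID minus the start of the enclosing range, which the PR description does not spell out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration; not the actual Paimon writer API. */
final class RelativeRowIdSketch {

    /** Stand-in mirroring the write(key, relativeRowId) shape; key may be null. */
    interface SingleColumnWriter {
        void write(Object key, long relativeRowId);
    }

    /** Records each call as "key@relativeRowId" so the calls can be inspected. */
    static final class RecordingWriter implements SingleColumnWriter {
        final List<String> calls = new ArrayList<>();

        @Override
        public void write(Object key, long relativeRowId) {
            calls.add(key + "@" + relativeRowId);
        }
    }

    /** Assumption: relative RowID = absolute RowID - start of the enclosing range. */
    static long toRelative(long absoluteRowId, long enclosingStart) {
        if (absoluteRowId < enclosingStart) {
            throw new IllegalArgumentException("RowID lies before the enclosing range");
        }
        return absoluteRowId - enclosingStart;
    }

    /** Writes every row of each inclusive [from, to] range; gaps produce no calls. */
    static void writeAll(SingleColumnWriter writer, long[][] ranges, long enclosingStart) {
        for (long[] range : ranges) {
            for (long rowId = range[0]; rowId <= range[1]; rowId++) {
                writer.write("k" + rowId, toRelative(rowId, enclosingStart));
            }
        }
    }
}
```

For the ranges [10, 11] and [20, 20] with an enclosing start of 10, the recorded calls are `k10@0`, `k11@1`, and `k20@10`; a writer that counted rows implicitly would have assigned offset 2 to the third row.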