Unify arrow exports across all query result types #495

Merged
evertlammerts merged 5 commits into duckdb:v1.5-variegata from evertlammerts:feat/arrow-promote-to-relation
Jun 16, 2026

@evertlammerts evertlammerts commented Jun 12, 2026

This PR unifies arrow exports across query result types, and makes sure we always provide the schema from within a transaction.

We are dealing with 3 arrow export types:

  • Arrow Table
  • Arrow RecordBatch
  • Arrow C Stream

... across 3 result types:

  • MaterializedQueryResult
  • ArrowQueryResult
  • StreamingQueryResult

The MaterializedQueryResult paths are now unified. We re-feed the backing ColumnDataCollection to the engine for parallel conversion into an ArrowQueryResult, and then we delegate to the corresponding ArrowQueryResult path.

The ArrowQueryResult paths deal with materialized data already, and we have no way to plug into the transaction that generated it. The actual fix for this is to cache the schema when creating the ArrowQueryResult, during Finalize. This is a core change that we will probably apply in v2.0. The workaround is to fetch the schema in a separate transaction. For all paths, since we are already dealing with materialized data, we create an arrow table. Then for the streaming paths we return the corresponding stream types directly from the table.

The StreamingQueryResult paths always have access to a valid transaction context, and can get the arrow schema on demand even when that requires catalog access.

As a side effect of this PR, consuming an arrow c stream (reading from con.sql(q).__arrow_c_stream__()) is now lazy, i.e. not materialized. Consumption is slower as a result, but peak memory drops and much larger datasets can be streamed (48 → 173 ms wall and 857 → 554 MB in the table below).

The materialized paths are a little faster overall, and so are the streaming paths other than the c stream.

  ┌───────────────────────────────────────────────────┬────────────────────┬───────────────────┬───────────────────┐
  │               benchmark expression                │ wall base→now (ms) │ CPU base→now (ms) │ mem base→now (MB) │
  ├───────────────────────────────────────────────────┼────────────────────┼───────────────────┼───────────────────┤
  │ r=con.sql(q); r.execute(); r.to_arrow_table()     │ 159 → 161          │ 259 → 286         │ 847 → 875         │
  ├───────────────────────────────────────────────────┼────────────────────┼───────────────────┼───────────────────┤
  │ r=con.sql(q); r.execute(); r.to_arrow_reader()    │ 161 → 144          │ 255 → 263         │ 896 → 877         │
  ├───────────────────────────────────────────────────┼────────────────────┼───────────────────┼───────────────────┤
  │ r=con.sql(q); r.execute(); r.__arrow_c_stream__() │ 157 → 136          │ 282 → 235         │ 854 → 881         │
  ├───────────────────────────────────────────────────┼────────────────────┼───────────────────┼───────────────────┤
  │ con.sql(q).to_arrow_table()                       │ 52 → 35            │ 267 → 244         │ 855 → 854         │
  ├───────────────────────────────────────────────────┼────────────────────┼───────────────────┼───────────────────┤
  │ con.execute(q).to_arrow_table()                   │ 202 → 174          │ 212 → 193         │ 548 → 554         │
  ├───────────────────────────────────────────────────┼────────────────────┼───────────────────┼───────────────────┤
  │ con.sql(q).to_arrow_reader()                      │ 186 → 175          │ 199 → 187         │ 552 → 552         │
  ├───────────────────────────────────────────────────┼────────────────────┼───────────────────┼───────────────────┤
  │ con.sql(q).__arrow_c_stream__()                   │ 48 → 173           │ 250 → 189         │ 857 → 554         │
  └───────────────────────────────────────────────────┴────────────────────┴───────────────────┴───────────────────┘

Fixes #475

@evertlammerts evertlammerts changed the title Pull materialized CDCs through the engine again for arrow conversion … Pull materialized CDCs through the engine again for arrow conversion with a live connection / transaction Jun 12, 2026
@evertlammerts evertlammerts marked this pull request as ready for review June 12, 2026 21:42
@evertlammerts evertlammerts changed the title Pull materialized CDCs through the engine again for arrow conversion with a live connection / transaction Unify arrow exports across all query result types Jun 16, 2026
@evertlammerts evertlammerts merged commit 6ac2daa into duckdb:v1.5-variegata Jun 16, 2026
15 checks passed