Unify arrow exports across all query result types by evertlammerts · Pull Request #495 · duckdb/duckdb-python

evertlammerts · 2026-06-12T21:24:50Z

This PR unifies arrow exports across query result types, and makes sure we always provide the schema from within a transaction.

We are dealing with 3 arrow export types:

Arrow Table
Arrow RecordBatch
Arrow C Stream

... across 3 result types:

MaterializedQueryResult
ArrowQueryResult
StreamingQueryResult

The MaterializedQueryResult paths are now unified. We re-feed the backing ColumnDataCollection to the engine for parallel conversion into an ArrowQueryResult, and then delegate to the corresponding ArrowQueryResult path.

The ArrowQueryResult paths deal with data that is already materialized, and we have no way to plug into the transaction that generated it. The proper fix is to cache the schema when creating the ArrowQueryResult, during Finalize. That is a core change that we will probably apply in v2.0; for now, the workaround is to fetch the schema in a separate transaction. Since the data is already materialized, every ArrowQueryResult path first creates an arrow table. The streaming paths then return the corresponding stream types directly from that table.

The StreamingQueryResult paths always have access to a valid transaction context, and can get the arrow schema on demand even when that requires catalog access.

As a side effect of this PR, consuming an arrow c stream (reading from con.sql(q).__arrow_c_stream__()) is now lazy, i.e. the result is no longer materialized up front. This makes consumption slower (48 → 173 ms wall time in the table below), but it lowers memory use (857 → 554 MB) and allows streaming much larger datasets.
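
To make the trade-off concrete, here is a toy model of eager versus lazy batch production in plain Python. It is not DuckDB code: `produce_batches`, `materialized_export`, `lazy_export`, and `BATCH_SIZE` are hypothetical names invented for this illustration, and the generator only stands in for the engine.

```python
from typing import Iterator, List

BATCH_SIZE = 3  # hypothetical batch size, chosen only for this illustration


def produce_batches(n_rows: int, log: List[str]) -> Iterator[List[int]]:
    """Stand-in for the engine: yields rows in batches and logs each batch as it is produced."""
    for start in range(0, n_rows, BATCH_SIZE):
        batch = list(range(start, min(start + BATCH_SIZE, n_rows)))
        log.append(f"produced {len(batch)} rows")
        yield batch


def materialized_export(n_rows: int, log: List[str]) -> List[List[int]]:
    """Eager: every batch exists in memory before the caller sees the first one."""
    return list(produce_batches(n_rows, log))


def lazy_export(n_rows: int, log: List[str]) -> Iterator[List[int]]:
    """Lazy: nothing is produced until the consumer asks for the next batch."""
    return produce_batches(n_rows, log)


eager_log: List[str] = []
eager = materialized_export(7, eager_log)  # all batches are produced by this call

lazy_log: List[str] = []
stream = lazy_export(7, lazy_log)          # no batch has been produced yet
first = next(stream)                       # exactly one batch is produced on demand
```

In the eager case all work and all memory are paid before the first batch is visible. In the lazy case the consumer holds only the current batch, at the cost of doing the production work while it iterates.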

The materialized paths are overall a little faster, as are the streaming paths other than the c stream.

  ┌───────────────────────────────────────────────────┬────────────────────┬───────────────────┬───────────────────┐
  │               benchmark expression                │ wall base→now (ms) │ CPU base→now (ms) │ mem base→now (MB) │
  ├───────────────────────────────────────────────────┼────────────────────┼───────────────────┼───────────────────┤
  │ r=con.sql(q); r.execute(); r.to_arrow_table()     │ 159 → 161          │ 259 → 286         │ 847 → 875         │
  ├───────────────────────────────────────────────────┼────────────────────┼───────────────────┼───────────────────┤
  │ r=con.sql(q); r.execute(); r.to_arrow_reader()    │ 161 → 144          │ 255 → 263         │ 896 → 877         │
  ├───────────────────────────────────────────────────┼────────────────────┼───────────────────┼───────────────────┤
  │ r=con.sql(q); r.execute(); r.__arrow_c_stream__() │ 157 → 136          │ 282 → 235         │ 854 → 881         │
  ├───────────────────────────────────────────────────┼────────────────────┼───────────────────┼───────────────────┤
  │ con.sql(q).to_arrow_table()                       │ 52 → 35            │ 267 → 244         │ 855 → 854         │
  ├───────────────────────────────────────────────────┼────────────────────┼───────────────────┼───────────────────┤
  │ con.execute(q).to_arrow_table()                   │ 202 → 174          │ 212 → 193         │ 548 → 554         │
  ├───────────────────────────────────────────────────┼────────────────────┼───────────────────┼───────────────────┤
  │ con.sql(q).to_arrow_reader()                      │ 186 → 175          │ 199 → 187         │ 552 → 552         │
  ├───────────────────────────────────────────────────┼────────────────────┼───────────────────┼───────────────────┤
  │ con.sql(q).__arrow_c_stream__()                   │ 48 → 173           │ 250 → 189         │ 857 → 554         │
  └───────────────────────────────────────────────────┴────────────────────┴───────────────────┴───────────────────┘
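
The PR does not say how these numbers were collected. As a rough sketch, assuming a simple best-of-N harness, wall and CPU time for one benchmark expression could be measured with the standard library as shown here. `measure` is a hypothetical helper, and memory measurement is deliberately left out because Python-level tools do not see native allocations.

```python
import time
from typing import Callable, Dict


def measure(fn: Callable[[], object], repeats: int = 5) -> Dict[str, float]:
    """Return the best wall-clock and CPU time, in milliseconds, over `repeats` runs of `fn`."""
    best_wall = float("inf")
    best_cpu = float("inf")
    for _ in range(repeats):
        wall_start = time.perf_counter()
        cpu_start = time.process_time()  # process-wide CPU time, summed over all threads
        fn()
        cpu = time.process_time() - cpu_start
        wall = time.perf_counter() - wall_start
        best_wall = min(best_wall, wall)
        best_cpu = min(best_cpu, cpu)
    return {"wall_ms": best_wall * 1000.0, "cpu_ms": best_cpu * 1000.0}


# Placeholder workload; with a connection `con` and query `q` in scope, one of the
# table's expressions could be passed instead, e.g. lambda: con.sql(q).to_arrow_table()
result = measure(lambda: sum(range(100_000)))
```

Because `time.process_time()` sums CPU time across all threads of the process, a parallel conversion can report more CPU time than wall time, which matches the pattern in several rows of the table.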

Fixes #475

Pull materialized CDCs through the engine again for arrow conversion with a live connection / transaction

426f6cc

evertlammerts changed the title ~~Pull materialized CDCs through the engine again for arrow conversion …~~ Pull materialized CDCs through the engine again for arrow conversion with a live connection / transaction Jun 12, 2026

strip comments

5fd7f69

evertlammerts marked this pull request as ready for review June 12, 2026 21:42

evertlammerts added 3 commits June 15, 2026 17:05

run schema fetching in same transaction as arrow data conversion and only once

7ddc75f

Get the arrow schema in a separate transaction for materialized data only

c8c35eb

force windows 2022 runners

3211977

evertlammerts changed the title ~~Pull materialized CDCs through the engine again for arrow conversion with a live connection / transaction~~ Unify arrow exports across all query result types Jun 16, 2026

evertlammerts merged commit 6ac2daa into duckdb:v1.5-variegata Jun 16, 2026
15 checks passed

evertlammerts mentioned this pull request Jun 16, 2026

in duckdb==1.5.3, DuckDBPyRelation.to_arrow_table() raises InternalException for GEOMETRY('EPSG:xxx') columns #475

Closed

2 tasks

evertlammerts merged 5 commits into duckdb:v1.5-variegata from evertlammerts:feat/arrow-promote-to-relation
