Purpose
Design discussion for the texture / surface API in cuda.core — to settle the API shape and
naming before code review of the implementation. Reviewers asked for design sign-off in an issue
before we commit to a ~9k-line feature.
cc @leofang @mdboom @Andy-Jost @kkraus14 — you asked for a design pass; this is the home for it.
Proposed public surface (from #2095)
- Array + ArrayFormat: opaque, hardware-laid-out GPU allocations backing textures/surfaces.
- MipmappedArray: wraps CUmipmappedArray; get_level returns a non-owning Array kept alive by a strong ref to the parent.
- TextureObject + TextureDescriptor: bindless texture handle + sampling state.
- SurfaceObject: bindless surface handle; requires Array(surface_load_store=True).
- ResourceDescriptor: factories from_array, from_mipmapped_array, from_linear, from_pitch2d.
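To make the get_level lifetime model concrete, here is a minimal pure-Python sketch of a non-owning child that pins its parent with a strong reference. The class names MipmappedArraySketch and LevelViewSketch are hypothetical stand-ins, not the types in #2095, and no GPU memory is involved.

```python
import gc
import weakref


class MipmappedArraySketch:
    """Hypothetical stand-in for the owning mipmapped allocation."""

    def __init__(self, num_levels):
        self.num_levels = num_levels

    def get_level(self, level):
        if not 0 <= level < self.num_levels:
            raise ValueError(f"level must be in [0, {self.num_levels}), got {level}")
        # The level view owns no memory; it pins the parent instead.
        return LevelViewSketch(parent=self, level=level)


class LevelViewSketch:
    """Hypothetical non-owning view of one mip level."""

    def __init__(self, parent, level):
        self._parent = parent  # strong reference keeps the parent alive
        self.level = level


mip = MipmappedArraySketch(num_levels=4)
parent_ref = weakref.ref(mip)  # lets us observe when the parent is freed
view = mip.get_level(1)

# Dropping the user's handle to the parent must not free it while a view exists.
del mip
gc.collect()
parent_alive_while_view_exists = parent_ref() is not None

# Once the last view goes away, nothing pins the parent any more.
del view
gc.collect()
parent_alive_after_view_dropped = parent_ref() is not None
```

The property worth preserving in the real implementation is the middle step: a level obtained from get_level stays valid even after the caller drops its own reference to the parent.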
Decisions to make
- Name of Array. ✅ Decided: rename Array → CUDAArray.
This type is an opaque cudaArray_t — the GPU stores it in a scrambled, hardware-defined layout
with no linear pointer, so it cannot expose __cuda_array_interface__ / DLPack and cannot
share memory zero-copy with cupy / numba-cuda / torch. The name Array implies an n-dimensional
array that participates in that ecosystem — it can't. CuPy names the identical type CUDAarray,
and its whole cupy.cuda.texture module already matches this PR's surface 1:1.
Resolution: use CUDAArray — the PEP 8 CapWords form (deliberately differing from CuPy's exact
CUDAarray casing to follow Python's class-naming standard). The name signals "CUDA texture/surface
backing store," not "n-dimensional array." Open detail only: whether the related ArrayFormat
follows suit (e.g. CUDAArrayFormat) for consistency.
- Interop path. Decision: ship only copy_from / copy_to (like CuPy), or also add a
sanctioned copy-bridge helper to/from linear cuda.core buffers?
Zero-copy is impossible, so copying is the only option — this is purely about how polished
the path is. Either way, document the copy-only contract so the type doesn't surprise anyone.
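As a rough illustration of the two options, the sketch below models the copy-only contract in plain Python. OpaqueArraySketch and to_linear are hypothetical names, host bytearrays stand in for device memory, and nothing here is the #2095 API.

```python
class OpaqueArraySketch:
    """Hypothetical stand-in for the opaque array: no pointer, copies only."""

    def __init__(self, nbytes):
        self._storage = bytearray(nbytes)  # pretend this is hardware-laid-out memory
        self.nbytes = nbytes

    def copy_from(self, src):
        """Copy from any bytes-like object of exactly matching size."""
        src = memoryview(src).cast("B")
        if src.nbytes != self.nbytes:
            raise ValueError(f"expected {self.nbytes} bytes, got {src.nbytes}")
        self._storage[:] = src

    def copy_to(self, dst):
        """Copy into a writable bytes-like object of exactly matching size."""
        dst = memoryview(dst).cast("B")
        if dst.readonly:
            raise ValueError("destination buffer is read-only")
        if dst.nbytes != self.nbytes:
            raise ValueError(f"expected {self.nbytes} bytes, got {dst.nbytes}")
        dst[:] = self._storage


def to_linear(arr):
    """Hypothetical copy-bridge helper: allocate a linear buffer and copy into it."""
    out = bytearray(arr.nbytes)
    arr.copy_to(out)
    return out


arr = OpaqueArraySketch(4)
arr.copy_from(b"\x01\x02\x03\x04")
linear = to_linear(arr)
```

Option one ships only the two methods; option two adds something like to_linear on top. In both cases the type deliberately has no __dlpack__ or __cuda_array_interface__ attribute, which is the contract to document.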
- Factory set. Decision: ship all four ResourceDescriptor factories, or only the two
Array-backed ones and defer linear-memory support to a follow-up?
A texture can be backed by four kinds of memory, and the PR exposes one factory per kind:
- from_array: texture over an Array (the headline feature)
- from_mipmapped_array: texture over a MipmappedArray (the headline feature)
- from_linear: texture over a plain 1D device buffer (ordinary linear memory, no Array)
- from_pitch2d: texture over a plain 2D pitched buffer (ordinary linear memory, no Array)
The first two are what this feature is about. The last two are a separate path that binds a texture
to regular device memory and is unrelated to the new Array type. Shipping all four is more complete
but commits more public API in 1.1; shipping only the Array-backed two keeps the initial surface
focused and easier to walk back.
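One way to picture the four factories is as constructors of a single tagged descriptor. The sketch below is pure Python with hypothetical names (ResourceKind, ResourceDescriptorSketch) and guessed parameter lists; the real signatures in #2095 may differ.

```python
from dataclasses import dataclass
from enum import Enum


class ResourceKind(Enum):
    ARRAY = "array"
    MIPMAPPED_ARRAY = "mipmapped_array"
    LINEAR = "linear"
    PITCH2D = "pitch2d"


@dataclass(frozen=True)
class ResourceDescriptorSketch:
    """Hypothetical tagged descriptor: one kind tag plus the backing object."""

    kind: ResourceKind
    backing: object
    params: tuple = ()

    @classmethod
    def from_array(cls, array):
        return cls(ResourceKind.ARRAY, array)

    @classmethod
    def from_mipmapped_array(cls, mipmapped_array):
        return cls(ResourceKind.MIPMAPPED_ARRAY, mipmapped_array)

    @classmethod
    def from_linear(cls, buffer, size_in_bytes):
        # Parameter names are assumptions made for this sketch.
        if size_in_bytes <= 0:
            raise ValueError("size_in_bytes must be positive")
        return cls(ResourceKind.LINEAR, buffer, (size_in_bytes,))

    @classmethod
    def from_pitch2d(cls, buffer, width, height, pitch_in_bytes):
        # Parameter names are assumptions made for this sketch.
        if min(width, height, pitch_in_bytes) <= 0:
            raise ValueError("width, height and pitch_in_bytes must be positive")
        return cls(ResourceKind.PITCH2D, buffer, (width, height, pitch_in_bytes))

    @property
    def is_array_backed(self):
        return self.kind in (ResourceKind.ARRAY, ResourceKind.MIPMAPPED_ARRAY)


array_desc = ResourceDescriptorSketch.from_array(object())
linear_desc = ResourceDescriptorSketch.from_linear(bytearray(64), size_in_bytes=64)
```

Seen this way, deferring linear-memory support means leaving out two classmethods and two enum members, and adding them later would not change the shape of the type.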
- Channel format. Decision: describe an element with two loose parameters (an ArrayFormat
enum + a num_channels int), or with one bundled ChannelFormatDescriptor object like CuPy?
Each array element has a component type (e.g. 8-bit uint, 32-bit float) and a channel count
(1 = grayscale … 4 = RGBA). CUDA's C API bundles both into one struct (cudaChannelFormatDesc).
Two ways to surface that:
- Folded (this PR): Array.from_descriptor(shape=..., format=ArrayFormat.FLOAT32, num_channels=4)
- Separate (CuPy): one ChannelFormatDescriptor(...) object passed as a unit
Folded is a smaller surface and a simpler call. Separate matches CuPy and is a single object you
can read back when introspecting an existing array's format, instead of reconstructing two fields.
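The trade-off can be sketched side by side in plain Python. ArrayFormatSketch, ChannelFormatSketch and element_bytes_folded are hypothetical names for illustration only; they mirror neither #2095 nor CuPy exactly.

```python
from dataclasses import dataclass
from enum import Enum


class ArrayFormatSketch(Enum):
    UINT8 = "uint8"
    FLOAT32 = "float32"


# Bytes per component for the two formats used in this sketch.
_COMPONENT_BYTES = {ArrayFormatSketch.UINT8: 1, ArrayFormatSketch.FLOAT32: 4}


def _check_channels(num_channels):
    # CUDA array descriptors accept 1, 2, or 4 channels per element.
    if num_channels not in (1, 2, 4):
        raise ValueError(f"num_channels must be 1, 2 or 4, got {num_channels}")


def element_bytes_folded(format, num_channels):
    """Folded style: the caller passes two loose parameters."""
    _check_channels(num_channels)
    return _COMPONENT_BYTES[format] * num_channels


@dataclass(frozen=True)
class ChannelFormatSketch:
    """Bundled style: one object that can be stored and read back as a unit."""

    format: ArrayFormatSketch
    num_channels: int

    def __post_init__(self):
        _check_channels(self.num_channels)

    @property
    def element_bytes(self):
        return _COMPONENT_BYTES[self.format] * self.num_channels


folded_size = element_bytes_folded(ArrayFormatSketch.FLOAT32, 4)
rgba32f = ChannelFormatSketch(ArrayFormatSketch.FLOAT32, 4)
bundled_size = rgba32f.element_bytes
```

Both styles carry the same information. The bundled object gives validation and derived quantities a single home, at the cost of one more public type.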
- Descriptor type consistency. Decision: is the @dataclass / cdef class split deliberate
(and documented), or should the two descriptors be the same kind of type for a predictable API?
Both are config objects you fill in and pass to the texture machinery, but they're built as
different kinds of object:
- TextureDescriptor → @dataclass (pure Python: auto __init__ / repr / equality, easy to construct and inspect)
- ResourceDescriptor → cdef class (Cython extension type: holds native C struct fields, faster, but more rigid, with no auto-generated niceties and fixed attributes)
The split may be justified: ResourceDescriptor wraps a C struct built around a tagged union
(CUDA_RESOURCE_DESC, over array / mipmap / linear / pitch2d backings) and its from_* factories
fill in struct fields, so a cdef class is natural; TextureDescriptor is just a bag of sampling
settings, so a @dataclass is the simple choice. But to a user they look like siblings yet behave
differently (repr, equality, construction, subclassing). The author should confirm the divergence
is intentional, or unify them.
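The user-visible difference can be approximated without Cython. In the sketch below a plain class with __slots__ and no __eq__ stands in for a cdef class, which is only an approximation; the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TextureDescriptorSketch:
    """Dataclass: generated __init__, __repr__ and field-wise __eq__."""

    filter_mode: str = "point"
    normalized_coords: bool = False


class SlottedResourceDescriptor:
    """Stand-in for an extension type: fixed attributes, identity equality."""

    __slots__ = ("kind",)

    def __init__(self, kind):
        self.kind = kind


# Two dataclass instances with equal fields compare equal.
td_equal = TextureDescriptorSketch() == TextureDescriptorSketch()

# Without a hand-written __eq__, equal-looking instances are still distinct.
rd_equal = SlottedResourceDescriptor("array") == SlottedResourceDescriptor("array")

# The dataclass repr shows every field; the slotted class falls back to the default repr.
td_repr = repr(TextureDescriptorSketch())

# Fixed attributes: assigning an undeclared name fails.
try:
    SlottedResourceDescriptor("array").extra = 1
    slotted_rejects_new_attrs = False
except AttributeError:
    slotted_rejects_new_attrs = True
```

If the split stays, giving ResourceDescriptor an explicit __repr__ and __eq__ would close most of the visible gap. If the two are unified, this is the behavior users would expect from both.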
- Bool naming. ✅ Decided: adopt the is_<something> convention.
surface_load_store is a boolean on Array: it records whether the array was created with the
surface load/store capability (CUDA's CUDA_ARRAY3D_SURFACE_LDST), which a SurfaceObject
requires. It is exposed both as a constructor keyword (surface_load_store=True) and as a read-only
property (arr.surface_load_store).
The repo convention for boolean properties is is_<something>, so a property named
surface_load_store doesn't read as a boolean the way arr.is_managed does. Resolution: rename
the property to follow the is_<x> convention (e.g. is_surface_load_store) for consistency with
the cuda-python codebase. Open detail only: the exact name, and whether to rename the constructor
keyword too or leave the plain surface_load_store= option as-is.
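For reference, one possible outcome of the open detail is sketched below: the constructor keyword stays as surface_load_store= while the read-only property takes the is_ prefix. CUDAArraySketch and make_surface are hypothetical stand-ins, and the final names are still undecided.

```python
class CUDAArraySketch:
    """Hypothetical stand-in: plain keyword in, is_-prefixed property out."""

    def __init__(self, *, surface_load_store=False):
        self._surface_load_store = bool(surface_load_store)

    @property
    def is_surface_load_store(self):
        """True if the array was created with surface load/store capability."""
        return self._surface_load_store


def make_surface(array):
    """Hypothetical check mirroring the SurfaceObject requirement."""
    if not array.is_surface_load_store:
        raise ValueError("a surface requires an array created with surface_load_store=True")
    return ("surface", array)


writable = CUDAArraySketch(surface_load_store=True)
read_only = CUDAArraySketch()
surface = make_surface(writable)
```

Renaming the keyword as well would make the two spellings match, at the cost of a less natural call site.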
- Scope. ✅ Decided: split the examples into a follow-up PR.
The nine gl_interop_*.py examples (~5k lines, not CI-wired, need a GL context CI lacks) are
orthogonal to the core API. Resolution: drop them from this PR and land them in a separate
follow-up PR once this core texture/surface PR merges, since the examples depend on the new API
it introduces.
Deferred (already NotImplementedError in #2095 — listed for the record, not for debate): layered /
cubemap / sparse Array variants; descriptor round-tripping via cuTexObjectGetResourceDesc;
consolidating the duplicate _get_current_context_ptr / _get_current_device_id sites.
Doc fix (not a decision): the PR summary says "three factories" but lists four.
Not contested
The test approach (real-GPU, no mocks), the private-constructor + from_descriptor factory pattern,
and the MipmappedArray.get_level lifetime model all looked sound in review — the gate is
design / naming / scope, not the code.
See also #467 (… TextureObject and SurfaceObject?)