Design: cuda.core Texture/Surface API surface #2188

## Purpose

Design discussion for the **texture / surface API** in `cuda.core` — to settle the API shape and
naming *before* code review of the implementation. Reviewers asked for design sign-off in an issue
before we commit to a ~9k-line feature.

- Implementation PR: #2095
- Feature request: #467

cc @leofang @mdboom @Andy-Jost @kkraus14 — you asked for a design pass; this is the home for it.

## Proposed public surface (from #2095)

- `Array` + `ArrayFormat` — opaque, hardware-laid-out GPU allocations backing textures/surfaces.
- `MipmappedArray` — wraps `CUmipmappedArray`; `get_level` returns a non-owning `Array` kept alive
  by a strong ref to the parent.
- `TextureObject` + `TextureDescriptor` — bindless texture handle + sampling state.
- `SurfaceObject` — bindless surface handle; requires `Array(surface_load_store=True)`.
- `ResourceDescriptor` — factories `from_array`, `from_mipmapped_array`, `from_linear`, `from_pitch2d`.
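
The `get_level` lifetime rule above can be illustrated without a GPU. The sketch below is a pure-Python stand-in, not the PR's code: `FakeMipmappedArray` and `FakeLevel` are hypothetical names, and the only behaviour modelled is that a level view holds a strong reference to its parent, so the parent cannot be collected while any level is alive.

```python
import gc
import weakref


class FakeLevel:
    """Hypothetical stand-in for the non-owning array returned by get_level."""

    def __init__(self, parent, level):
        self._parent = parent  # strong reference keeps the parent alive
        self.level = level
        # Each mip level halves every extent, clamped to a minimum of 1.
        self.shape = tuple(max(1, extent >> level) for extent in parent.shape)


class FakeMipmappedArray:
    """Hypothetical stand-in for the owning mipmapped allocation."""

    def __init__(self, shape, num_levels):
        self.shape = tuple(shape)
        self.num_levels = num_levels

    def get_level(self, level):
        if not 0 <= level < self.num_levels:
            raise ValueError(f"level {level} out of range [0, {self.num_levels})")
        return FakeLevel(self, level)


def parent_survives_while_level_alive():
    """Return (parent alive with level, parent alive after level, level shape)."""
    mip = FakeMipmappedArray(shape=(64, 32), num_levels=4)
    parent_ref = weakref.ref(mip)
    level = mip.get_level(2)
    del mip  # drop the caller's handle to the parent
    gc.collect()
    alive_with_level = parent_ref() is not None
    shape = level.shape
    del level  # drop the last level view
    gc.collect()
    alive_after_level = parent_ref() is not None
    return alive_with_level, alive_after_level, shape
```

The parent deliberately holds no reference back to its levels, so there is no reference cycle and the allocation is released as soon as the last view goes away.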

## Decisions to make

1. **Name of `Array`. ✅ Decided — rename `Array` → `CUDAArray`.**

   This type wraps an opaque CUDA array (`CUarray` in the driver API, `cudaArray_t` in the runtime
   API). The GPU stores it in a hardware-defined, non-linear layout with no linear pointer, so it
   **cannot** expose `__cuda_array_interface__` / DLPack and cannot share memory zero-copy with
   cupy / numba-cuda / torch. The name `Array` implies an n-dimensional array that participates in
   that ecosystem, which this type cannot do. CuPy names the identical type `CUDAarray`, and its
   whole `cupy.cuda.texture` module already matches this PR's surface 1:1.

   **Resolution: use `CUDAArray`** — the PEP 8 CapWords form (deliberately differing from CuPy's exact
   `CUDAarray` casing to follow Python's class-naming standard). The name signals "CUDA texture/surface
   backing store," not "n-dimensional array." Open detail only: whether the related `ArrayFormat`
   follows suit (e.g. `CUDAArrayFormat`) for consistency.
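
To make the interop argument concrete: consumers typically discover zero-copy support by looking for `__cuda_array_interface__` or `__dlpack__` on an object. The sketch below uses two hypothetical stand-in classes (`LinearBuffer`, `OpaqueCUDAArray`) and a fake pointer value. It is not the PR's implementation, only an illustration of why an opaque array has nothing to publish.

```python
class LinearBuffer:
    """Hypothetical linear allocation: it has a device pointer it can publish."""

    def __init__(self, ptr, nbytes):
        self._ptr = ptr
        self._nbytes = nbytes

    @property
    def __cuda_array_interface__(self):
        # CUDA Array Interface v3: `data` is a (pointer, read_only) pair.
        return {
            "shape": (self._nbytes,),
            "typestr": "|u1",
            "data": (self._ptr, False),
            "version": 3,
        }


class OpaqueCUDAArray:
    """Hypothetical opaque array: no linear pointer, so no export protocol."""

    def __init__(self, shape):
        self.shape = tuple(shape)


def can_share_zero_copy(obj):
    """Return True if `obj` advertises a zero-copy export protocol."""
    return hasattr(obj, "__cuda_array_interface__") or hasattr(obj, "__dlpack__")
```

A caller that sees `can_share_zero_copy(obj)` return `False` has to fall back to an explicit copy, which is the contract the `CUDAArray` name is meant to signal.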

2. **Interop path.** *Decision: ship only `copy_from` / `copy_to` (like CuPy), or also add a
   sanctioned copy-bridge helper to/from linear `cuda.core` buffers?*
   Zero-copy is impossible, so copying is the only option — this is purely about how polished
   the path is. Either way, document the copy-only contract so the type doesn't surprise anyone.

3. **Factory set.** *Decision: ship all four `ResourceDescriptor` factories, or only the two
   `Array`-backed ones and defer linear-memory support to a follow-up?*

   A texture can be backed by four kinds of memory — the PR exposes one factory per kind:
   - `from_array` — texture over an `Array` *(the headline feature)*
   - `from_mipmapped_array` — texture over a `MipmappedArray` *(the headline feature)*
   - `from_linear` — texture over a plain 1D device buffer *(ordinary linear memory, no `Array`)*
   - `from_pitch2d` — texture over a plain 2D pitched buffer *(ordinary linear memory, no `Array`)*

   The first two are what this feature is about. The last two are a separate path: they bind a
   texture to ordinary device memory and do not involve the new `Array` type. Shipping all four is
   more complete but commits more public API in 1.1; shipping only the `Array`-backed two keeps the
   initial surface focused and easier to walk back.
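
A minimal pure-Python model of the four factories may help when weighing the split. Everything here is an assumption made for illustration: the class name `SketchResourceDescriptor`, the `kind` tag, and the `width` / `element_bytes` / `pitch_bytes` parameters are hypothetical and are not taken from #2095. The only rule encoded is that a pitched row must be at least as wide as the data it holds.

```python
class SketchResourceDescriptor:
    """Hypothetical tagged descriptor: one factory per kind of backing memory."""

    __slots__ = ("kind", "resource", "pitch_bytes")

    def __init__(self, kind, resource, pitch_bytes=0):
        self.kind = kind
        self.resource = resource
        self.pitch_bytes = pitch_bytes  # only meaningful for kind == "pitch2d"

    @classmethod
    def from_array(cls, array):
        return cls("array", array)

    @classmethod
    def from_mipmapped_array(cls, mipmapped_array):
        return cls("mipmapped_array", mipmapped_array)

    @classmethod
    def from_linear(cls, buffer):
        return cls("linear", buffer)

    @classmethod
    def from_pitch2d(cls, buffer, *, width, element_bytes, pitch_bytes):
        # A row of `width` elements must fit inside one pitch.
        if pitch_bytes < width * element_bytes:
            raise ValueError("pitch_bytes is smaller than one row of data")
        return cls("pitch2d", buffer, pitch_bytes)


ARRAY_BACKED_KINDS = frozenset({"array", "mipmapped_array"})


def is_array_backed(desc):
    """True for the two kinds that depend on the new array type."""
    return desc.kind in ARRAY_BACKED_KINDS
```

In this model, deferring linear-memory support would amount to dropping `from_linear` and `from_pitch2d`; the two array-backed factories do not depend on them.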

4. **Channel format.** *Decision: describe an element with two loose parameters (an `ArrayFormat`
   enum + a `num_channels` int), or with one bundled `ChannelFormatDescriptor` object like CuPy?*

   Each array element has a component type (e.g. 8-bit uint, 32-bit float) and a channel count
   (1 = grayscale … 4 = RGBA). The CUDA runtime API bundles both into one struct
   (`cudaChannelFormatDesc`), while the driver API's `CUDA_ARRAY3D_DESCRIPTOR` keeps them as separate
   `Format` and `NumChannels` fields. Two ways to surface that:
   - **Folded into the factory call (this PR):** `Array.from_descriptor(shape=..., format=ArrayFormat.FLOAT32, num_channels=4)`
   - **Separate descriptor object (CuPy):** one `ChannelFormatDescriptor(...)` object passed as a unit

   Folded is a smaller surface and a simpler call. Separate matches CuPy and is a single object you
   can read *back* when introspecting an existing array's format, instead of reconstructing two fields.
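
The two shapes can be compared side by side in plain Python. This is a sketch under stated assumptions, not either library's API: `SketchFormat`, `element_bytes_folded`, and `SketchChannelFormatDescriptor` are hypothetical names, and the 1/2/4 channel check reflects what the CUDA driver API accepts for `NumChannels`.

```python
from dataclasses import dataclass
from enum import Enum, auto


class SketchFormat(Enum):
    """Hypothetical subset of component types."""

    UINT8 = auto()
    FLOAT16 = auto()
    FLOAT32 = auto()


_COMPONENT_BYTES = {
    SketchFormat.UINT8: 1,
    SketchFormat.FLOAT16: 2,
    SketchFormat.FLOAT32: 4,
}


def _check_channels(num_channels):
    # The driver API accepts 1, 2, or 4 packed components per element.
    if num_channels not in (1, 2, 4):
        raise ValueError("num_channels must be 1, 2, or 4")


def element_bytes_folded(format, num_channels):
    """Folded shape: the caller passes two loose parameters."""
    _check_channels(num_channels)
    return _COMPONENT_BYTES[format] * num_channels


@dataclass(frozen=True)
class SketchChannelFormatDescriptor:
    """Separate shape: one immutable object that can be handed back on introspection."""

    format: SketchFormat
    num_channels: int

    def __post_init__(self):
        _check_channels(self.num_channels)

    @property
    def element_bytes(self):
        return _COMPONENT_BYTES[self.format] * self.num_channels
```

With the bundled form, validation and derived values such as the element size live in one place, and two descriptors compare equal field by field. With the folded form, every function that accepts the pair has to repeat the check.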

5. **Descriptor type consistency.** *Decision: is the `@dataclass` / `cdef class` split deliberate
   (and documented), or should the two descriptors be the same kind of type for a predictable API?*

   Both are config objects you fill in and pass to the texture machinery, but they're built as
   different kinds of object:
   - `TextureDescriptor` → `@dataclass` (pure Python: auto `__init__` / `repr` / equality, easy to
     construct and inspect)
   - `ResourceDescriptor` → `cdef class` (Cython extension type: holds native C struct fields, faster,
     but more rigid — no auto-generated niceties, fixed attributes)

   The split may be justified: `ResourceDescriptor` wraps a C union (`CUDA_RESOURCE_DESC`, over
   array / mipmap / linear / pitch2d backings) and its `from_*` factories poke struct fields, so a
   `cdef class` is natural; `TextureDescriptor` is just a bag of sampling settings, so a `@dataclass`
   is the simple choice. To a user, though, the two look like siblings yet behave differently (repr,
   equality, construction, subclassing). The author should confirm the divergence is intentional, or
   unify them.
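
The user-visible differences can be approximated in pure Python. A real `cdef class` cannot be written without Cython, so the sketch below uses a `__slots__` class as a rough stand-in for one (fixed attributes, no generated `__eq__` or `__repr__`). The class names and the `filter_mode` / `normalized_coords` fields are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class DataclassDescriptor:
    """Stand-in for a @dataclass descriptor such as TextureDescriptor."""

    filter_mode: str = "point"
    normalized_coords: bool = False


class ExtensionLikeDescriptor:
    """Rough stand-in for a cdef class: fixed attributes, no generated methods."""

    __slots__ = ("filter_mode", "normalized_coords")

    def __init__(self, filter_mode="point", normalized_coords=False):
        self.filter_mode = filter_mode
        self.normalized_coords = normalized_coords


def compare_behaviour():
    """Collect the differences a user would notice between the two kinds."""
    a, b = DataclassDescriptor(), DataclassDescriptor()
    c, d = ExtensionLikeDescriptor(), ExtensionLikeDescriptor()
    try:
        c.extra = 1  # no __dict__, so unknown attributes are rejected
        accepts_new_attr = True
    except AttributeError:
        accepts_new_attr = False
    return {
        "dataclass_eq": a == b,  # field-wise equality is generated
        "extension_eq": c == d,  # falls back to identity comparison
        "dataclass_repr_shows_fields": "filter_mode='point'" in repr(a),
        "extension_repr_shows_fields": "filter_mode" in repr(c),
        "extension_accepts_new_attr": accepts_new_attr,
    }
```

If the split is kept, the extension type could still define `__repr__` and `__eq__` by hand so that the two descriptors feel alike to callers.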

6. **Bool naming. ✅ Decided — adopt the `is_<something>` convention.**

   `surface_load_store` is a boolean on `Array`: it records whether the array was created with the
   surface load/store capability (CUDA's `CUDA_ARRAY3D_SURFACE_LDST`), which a `SurfaceObject`
   requires. It is exposed both as a constructor keyword (`surface_load_store=True`) and as a
   read-only property (`arr.surface_load_store`).

   The repo convention for boolean properties is `is_<something>`, so a property named
   `surface_load_store` doesn't read as a boolean the way `arr.is_managed` does. **Resolution: rename
   the property to follow the `is_<x>` convention (e.g. `is_surface_load_store`) for consistency with
   the cuda-python codebase.** Open detail only: the exact name, and whether to rename the constructor
   keyword too or leave the plain `surface_load_store=` option as-is.
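
One possible shape for the rename, shown as a pure-Python sketch: the constructor keyword stays `surface_load_store=` while the read-only property becomes `is_surface_load_store`. `SketchCUDAArray` and `make_surface` are hypothetical stand-ins, and the exact property name is still an open detail, so treat this as one option rather than the agreed spelling.

```python
class SketchCUDAArray:
    """Hypothetical model of the array's surface capability flag."""

    def __init__(self, shape, *, surface_load_store=False):
        self._shape = tuple(shape)
        self._surface_load_store = bool(surface_load_store)

    @property
    def shape(self):
        return self._shape

    @property
    def is_surface_load_store(self):
        """True if the array was created with surface load/store enabled."""
        return self._surface_load_store


def make_surface(array):
    """Stand-in for surface creation: reject arrays without the capability."""
    if not array.is_surface_load_store:
        raise ValueError("surface access requires surface_load_store=True")
    return ("surface", array)
```

Because the property has no setter, assigning to `arr.is_surface_load_store` raises `AttributeError`, which matches the read-only behaviour described above.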

7. **Scope. ✅ Decided — split the examples into a follow-up PR.**

   The nine `gl_interop_*.py` examples (~5k lines, not CI-wired, need a GL context CI lacks) are
   orthogonal to the core API. **Resolution: drop them from this PR and land them in a separate
   follow-up PR once this core texture/surface PR merges**, since the examples depend on the new API
   it introduces.

*Deferred (already `NotImplementedError` in #2095 — listed for the record, not for debate):* layered /
cubemap / sparse `Array` variants; descriptor round-tripping via `cuTexObjectGetResourceDesc`;
consolidating the duplicate `_get_current_context_ptr` / `_get_current_device_id` sites.

*Doc fix (not a decision):* the PR summary says "three factories" but lists four.

## Not contested

The test approach (real-GPU, no mocks), the private-constructor + `from_descriptor` factory pattern,
and the `MipmappedArray.get_level` lifetime model all looked sound in review — the gate is
design / naming / scope, not the code.

