
ci: validate pixi run test source-build (PR smoke + nightly GPU)#2185

Merged
rparolin merged 7 commits into NVIDIA:main from rparolin:ci/pixi-source-test on Jun 9, 2026

@rparolin rparolin commented Jun 9, 2026

Collaborator

Summary

Main CI tests prebuilt wheels (ci.yml / test-wheel-*.yml) and never exercises the pixi source build, so the pixi run test developer path rots silently whenever the CUDA pin, generated bindings, conda-forge packages, or cython-test build mechanics drift. This adds the missing guard.

Two tiers, to spend GPU minutes deliberately:

  • build-smoke (PRs touching the at-risk files): CPU-only on ubuntu-latest. Source-builds bindings + core, imports them, builds the cython test extensions and checks the .so lands in tests/cython/. Catches the compile / ABI / .so-placement regressions without a GPU.
  • full-test (nightly cron + manual workflow_dispatch): GPU runner, full pixi run test.
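The placement check in build-smoke can be pictured with a small Python sketch. This is an illustration only, not code from this PR: the helper names `find_built_extensions` and `assert_extension_placed` are hypothetical, and the sketch assumes the compiled extension file name starts with the module name and ends in `.so`.

```python
from pathlib import Path


def find_built_extensions(tests_dir: Path, stem: str) -> list[Path]:
    """Return compiled extension files for `stem` located directly in `tests_dir`."""
    # Compiled extensions usually carry a platform tag between the module
    # name and the suffix, so match on prefix and suffix only.
    return sorted(tests_dir.glob(f"{stem}*.so"))


def assert_extension_placed(tests_dir: Path, stem: str) -> None:
    """Raise if no matching .so sits next to the tests (hypothetical check)."""
    if not find_built_extensions(tests_dir, stem):
        raise FileNotFoundError(f"no {stem}*.so found in {tests_dir}")
```

A check of this shape only looks at the file system, which is why it can run on a CPU-only runner.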

Shared pixi install is factored into a composite action (.github/actions/setup-pixi) with an explicit, asserted version pin.

Why separate from #2180

This is the prevention mechanism; #2180 is the fix for the current breakage. Kept separate per review.

Sequencing / expected CI

This branch is based on main, which still has the breakage until #2180 merges. So build-smoke is expected to FAIL on this PR — that failure is the guard correctly catching #2182. Merge #2180 first (or rebase this on it) and it goes green.

For reviewers to confirm

  • Runner label linux-amd64-gpu-l4-latest-1 (from ci/test-matrix.yml) — swap if a nightly-reserved label is preferred.
  • pixi installed via the official installer pinned to PIXI_VERSION; prefix-dev/setup-pixi pinned to a SHA is a reasonable alternative.
  • Suggest landing the new check non-blocking until proven stable.

Relates to #2183 (validate the source-build path over time). Guards against #2182 (fixed by #2180).

🤖 Generated with Claude Code


Update: rebased onto the #2180 fix branch so the source build is green (this PR was previously based on clean main, where build-smoke correctly caught #2182). It now stacks on #2180 — the pin/placement commits drop out once #2180 merges and this is rebased onto main.

rparolin and others added 2 commits June 8, 2026 16:53
The bindings were regenerated against CUDA 13.3.0 (cc50515), adding NVRTC
symbols (NVRTC_ERROR_BUSY, nvrtcBundledHeadersInfo, nvrtcGetBundledHeadersInfo),
but the pixi cuda-version pins stayed at 13.2 in cuda_bindings/pixi.toml and
cuda_core/pixi.toml. `pixi run test` then built 13.3-referencing Cython code
against a 13.2 nvrtc.h and failed with "'nvrtcBundledHeadersInfo' was not
declared in this scope". CI was unaffected because it builds wheels from
ci/versions.yml (13.3.0) rather than via pixi run test.

Bump the cuda-version pins (build-variants + feature.cu13) from 13.2.* to
13.3.* in both packages so the local toolkit matches the regenerated sources
and ci/versions.yml. Re-solved pixi.lock files accordingly.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

tests/cython/build_tests.py runs `build_ext --inplace`, which writes the
compiled .so relative to the current working directory. pixi runs the
build-cython-tests task from the project root, so the .so landed in the
package root instead of tests/cython/, where pytest imports it by bare module
name. The test only passed previously because a correctly-placed .so from an
earlier build persisted (gitignored); a clean checkout fails with
ModuleNotFoundError.

chdir to the script directory before build_ext --inplace so the .so lands next
to its .pyx in both cuda_bindings and cuda_core (kept aligned per NVIDIA#1978).
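The idea of changing into the script's directory before an in-place build can be sketched as follows. This is not the code from `build_tests.py`: the `working_directory` context manager and `script_directory` helper are hypothetical, and restoring the previous directory afterwards is an addition of this sketch rather than something the commit message describes.

```python
import contextlib
import os
from pathlib import Path


@contextlib.contextmanager
def working_directory(path):
    """Temporarily change the current working directory to `path`."""
    previous = Path.cwd()
    os.chdir(path)
    try:
        yield Path(path)
    finally:
        # Restore the caller's directory even if the build raises.
        os.chdir(previous)


def script_directory(script_file: str) -> Path:
    """Return the directory containing `script_file` (e.g. pass __file__)."""
    return Path(script_file).resolve().parent
```

A build script would wrap its `build_ext --inplace` invocation in `with working_directory(script_directory(__file__)):` so that the output path no longer depends on where the task runner was started.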

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@rparolin rparolin added this to the cuda.bindings next milestone Jun 9, 2026
@rparolin rparolin added the CI/CD CI/CD infrastructure label Jun 9, 2026
@rparolin rparolin self-assigned this Jun 9, 2026
rparolin and others added 2 commits June 9, 2026 09:40
Main CI tests prebuilt wheels and never exercises the pixi source build, so
that developer path rots silently on CUDA-pin / generated-source / conda-forge
/ cython-build drift (NVIDIA#2182, NVIDIA#2183).

Add a workflow that runs the pixi source build:
- build-smoke (PRs touching the at-risk files): CPU-only. Source-builds
  bindings + core, imports them, builds the cython test extensions and checks
  placement. Catches the compile / ABI / .so-placement regressions without a GPU.
- full-test (nightly + manual): GPU runner, full `pixi run test`.

Shared pixi install factored into a composite action with an explicit,
asserted version pin.

Relates to NVIDIA#2183 (validate the source-build path over time); the regressions
this guards against are NVIDIA#2182, fixed by NVIDIA#2180.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…actionlint

actionlint validates static runner labels against its known set; the new
full-test job uses a literal GPU label (existing GPU jobs dodge this by
building the label from a matrix expression). Declare it so pre-commit's
actionlint hook passes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@rparolin rparolin force-pushed the ci/pixi-source-test branch from 837b5d5 to c40ad78 Compare June 9, 2026 16:42
@github-actions github-actions Bot added cuda.bindings Everything related to the cuda.bindings module cuda.core Everything related to the cuda.core module labels Jun 9, 2026
…ersion

A shallow checkout has no tags, so the source-built packages get
setuptools-scm's 0.1.dev1 fallback. cuda.core's import-time guard then
rejects cuda.bindings ("12.x or 13.x must be installed"). Use fetch-depth: 0
in both jobs so the build resolves the real 13.x version.
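To see why the `0.1.dev1` fallback trips the guard, consider a simplified major-version check. This is a hypothetical stand-in, not cuda.core's actual guard: the function name and the set of supported majors are assumptions drawn only from the quoted error message.

```python
def is_supported_bindings(version: str, supported=(12, 13)) -> bool:
    """Return True if the leading version component is a supported major."""
    try:
        major = int(version.split(".", 1)[0])
    except ValueError:
        # A non-numeric leading component cannot be a supported major.
        return False
    return major in supported
```

Under this sketch a tagless build reporting `0.1.dev1` has major version 0 and is rejected, while a version resolved from the full history such as `13.3.0` is accepted.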

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@mdboom mdboom left a comment

Contributor

LGTM, other than the strong security recommendation.

Comment thread .github/actions/setup-pixi/action.yml Outdated
rparolin and others added 2 commits June 9, 2026 13:56
Addresses review (@mdboom): the composite action shelled out to
`curl -fsSL https://pixi.sh/install.sh | bash`, an unverified installer
(the codecov.io supply-chain failure mode). Replace it with
prefix-dev/setup-pixi pinned to a commit SHA (v0.9.6) — its install logic
is auditable and pinned — and delete the composite action file.
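Whether an action reference is pinned to a full commit SHA, rather than a movable tag, can be checked with a short pattern match. The helper below is a hypothetical illustration, not part of this PR, and it assumes a reference counts as pinned when the part after `@` is exactly 40 lowercase hexadecimal characters.

```python
import re

# owner/repo (optionally with a sub-path), then '@', then a 40-character hex SHA.
_SHA_REF = re.compile(r"^[^@\s]+@[0-9a-f]{40}$")


def is_sha_pinned(uses: str) -> bool:
    """Return True if a `uses:` value references a full commit SHA."""
    # Drop a trailing '# vX.Y.Z' comment, which is commonly kept for readers.
    reference = uses.split("#", 1)[0].strip()
    return bool(_SHA_REF.match(reference))
```

A tag reference such as `prefix-dev/setup-pixi@v0.9.6` would not pass this check, whereas the same action followed by a 40-character SHA and a `# v0.9.6` comment would.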

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The prior commit only removed the composite action file; this commits the
workflow change that actually uses prefix-dev/setup-pixi@<sha> in both jobs
(and drops the now-unneeded curl from the container apt install). Without
this the workflow referenced the deleted ./.github/actions/setup-pixi.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@rparolin rparolin requested a review from mdboom June 9, 2026 21:37
@rparolin rparolin merged commit 774e988 into NVIDIA:main Jun 9, 2026
186 of 188 checks passed
Labels

  • CI/CD: CI/CD infrastructure
  • cuda.bindings: Everything related to the cuda.bindings module
  • cuda.core: Everything related to the cuda.core module


2 participants