CI: htmldoc artifacts and package #4150

Open
hdiethelm wants to merge 2 commits into LinuxCNC:master from hdiethelm:ci_doc_build

@hdiethelm

@hdiethelm hdiethelm commented Jun 10, 2026

Contributor

This PR adds:

  • Artifacts for htmldoc
  • A published package for htmldocs (removed)
    • GitHub doesn't support generic packages, but containers can be used with oras
    • The package is published for 2.9 / master and any tag
    • Download with a persistent link: oras pull ghcr.io/linuxcnc/linuxcnc/doc-html:master
  • Allow sid builds to fail and continue building other packages

If needed for the 2.9 branch, I can backport this.

The discussion started in: #4119

As far as I understand the GitHub docs, only members are allowed to write packages. So PRs should not be able to create a package, even if someone removes the if() so that this stage always runs.

For testing, I ran this stage in my GitHub account: https://github.com/hdiethelm/linuxcnc-fork/actions/runs/27276506335
The package is here: https://github.com/hdiethelm/linuxcnc-fork
And can be downloaded with: oras pull ghcr.io/hdiethelm/linuxcnc/doc-html:ci_doc_build_test

@BsAtHome
Do you think this does the job?
I will create a commit that removes the if() and see what happens. Then I will revert it again.

@hdiethelm

Contributor Author

And again, the CI broke because sid is broken. Maybe we should allow the sid package build to fail?

I tested it:

apt install python3-opencv
Solving dependencies... Error!  
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

Unsatisfied dependencies:
 python3-opencv : Depends: libopencv-calib3d410 (>= 4.10.0+dfsg) but it is not going to be installed
                  Depends: libopencv-contrib410 (>= 4.10.0+dfsg) but it is not going to be installed
                  Depends: libopencv-core410 (>= 4.10.0+dfsg) but it is not going to be installed
                  Depends: libopencv-dnn410 (>= 4.10.0+dfsg) but it is not going to be installed
                  Depends: libopencv-features2d410 (>= 4.10.0+dfsg) but it is not going to be installed
                  Depends: libopencv-flann410 (>= 4.10.0+dfsg) but it is not going to be installed
                  Depends: libopencv-highgui410 (>= 4.10.0+dfsg) but it is not going to be installed
                  Depends: libopencv-imgcodecs410 (>= 4.10.0+dfsg) but it is not going to be installed
                  Depends: libopencv-imgproc410 (>= 4.10.0+dfsg) but it is not going to be installed
                  Depends: libopencv-ml410 (>= 4.10.0+dfsg) but it is not going to be installed
                  Depends: libopencv-objdetect410 (>= 4.10.0+dfsg) but it is not going to be installed
                  Depends: libopencv-photo410 (>= 4.10.0+dfsg) but it is not going to be installed
                  Depends: libopencv-shape410 (>= 4.10.0+dfsg) but it is not going to be installed
                  Depends: libopencv-stitching410 (>= 4.10.0+dfsg) but it is not going to be installed
                  Depends: libopencv-video410 (>= 4.10.0+dfsg) but it is not going to be installed
                  Depends: libopencv-videoio410 (>= 4.10.0+dfsg) but it is not going to be installed
                  Depends: libopencv-viz410 (>= 4.10.0+dfsg) but it is not going to be installed
Error: Unable to satisfy dependencies. Reached two conflicting assignments:
   1. libadios2-mpi-core-2.11:amd64 is selected for install because:
      1. python3-opencv:amd64=4.10.0+dfsg-7+b2 is selected for install
      2. python3-opencv:amd64 Depends libopencv-viz410 (>= 4.10.0+dfsg)
      3. libopencv-viz410:amd64 Depends libvtk9.5 (>= 9.5.2+dfsg4)
      4. libvtk9.5:amd64 Depends libadios2-mpi-c++-2.11 (>= 2.11.0+dfsg1)
      5. libadios2-mpi-c++-2.11:amd64 Depends libadios2-mpi-core-2.11 (>= 2.11.0+dfsg1)
   2. libadios2-mpi-core-2.11:amd64 Depends libadios2-mpi-plugins (= 2.11.0+dfsg1-7+b1)
      but none of the choices are installable:
      [no choices]

I guess it will be fixed soon in sid.

@BsAtHome

Contributor

I'm not sure that the webserver can use oras. Besides, I'm not sure you would want to involve yet another third party in this process.

The point is that github already has stored any built artifact (like in the deb package builds). The question is whether we can exploit that. We only need the link to the artifact.

@BsAtHome

Contributor

And, yes, a soft-fail on Debian:sid may be appropriate.
It is a rare occasion that sid breaks, but it is a blocker.

@hdiethelm

Contributor Author

> I'm not sure that the webserver can use oras. Besides, I'm not sure you would want to involve yet another third party in this process.

It is a Debian package; depending on how the server is set up, apt-get install oras is all it needs.

> The point is that github already has stored any built artifact (like in the deb package builds). The question is whether we can exploit that. We only need the link to the artifact.

The only issue is that artifacts are tied to CI runs, so for each update you would have to download the file manually. Even wget cannot fetch the file from the right-click "copy link" URL, because you have to be logged in.
You can try it right now, the doc artifact is already enabled: https://github.com/LinuxCNC/linuxcnc/actions/runs/27279180672
[image]

Is this good enough?

Otherwise what I found so far:
https://gist.github.com/umohi/bfc7ad9a845fc10289c03d532e3d2c2f
I can try that and give you a command that should work. But you will need an access token.

@BsAtHome

Contributor

> > I'm not sure that the webserver can use oras. Besides, I'm not sure you would want to involve yet another third party in this process.

> It is a Debian package; depending on how the server is set up, apt-get install oras is all it needs.

That is a problem, right there... We can't install packages or use sudo on the webserver.

> > The point is that github already has stored any built artifact (like in the deb package builds). The question is whether we can exploit that. We only need the link to the artifact.

> Is this good enough?

Yes, that was what I was thinking about. Don't know if it works. That's why I was inquiring ;-)

The problem was that the htmldocs run did not produce any artifacts, so we had no chance of testing in any direction.

> Otherwise what I found so far: https://gist.github.com/umohi/bfc7ad9a845fc10289c03d532e3d2c2f I can try that and give you a command that should work. But you will need an access token.

That looks like a way. Generating a limited access token, just to get the file, may be a possibility.

@hdiethelm

Contributor Author

> And, yes, a soft-fail on Debian:sid may be appropriate. It is a rare occasion that sid breaks, but it is a blocker.

With c323d8e, the other packages are built, even if sid fails. But sid is shown red.
I can move the continue-on-error: to all steps, then it will show green. But I think red is fine, it failed and needs investigation.

@hdiethelm

hdiethelm commented Jun 10, 2026

Contributor Author

> > Otherwise what I found so far: https://gist.github.com/umohi/bfc7ad9a845fc10289c03d532e3d2c2f I can try that and give you a command that should work. But you will need an access token.

> That looks like a way. Generating a limited access token, just to get the file, may be a possibility.

I found something that works:
https://docs.github.com/en/rest/actions/artifacts?apiVersion=2026-03-10

Token:
[Screenshot from 2026-06-10 19-01-26]

Get the URLs of all artifacts that match the branch and are named linuxcnc-doc:

TOKEN="YourToken"
BRANCH="ci_doc_build"
NAME="linuxcnc-doc"
curl -L \
  -H "Accept: application/vnd.github+json" \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "X-GitHub-Api-Version: 2026-03-10" \
  "https://api.github.com/repos/linuxcnc/linuxcnc/actions/artifacts?per_page=100&name=${NAME}"  | \
  jq ".artifacts[] | select(.workflow_run.head_branch==\"${BRANCH}\") | .archive_download_url"

Download the zip:

curl -L \
  -H "Accept: application/vnd.github+json" \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "X-GitHub-Api-Version: 2026-03-10" \
  https://api.github.com/repos/LinuxCNC/linuxcnc/actions/artifacts/7537361103/zip -o linuxcnc-doc.zip

Combined:

TOKEN="YourToken"
BRANCH="ci_doc_build"
NAME="linuxcnc-doc"

# List the matching artifacts and keep the first download URL.
# jq -r prints the raw string, so the URL carries no surrounding quotes.
DL_URL=$(curl -L \
  -H "Accept: application/vnd.github+json" \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "X-GitHub-Api-Version: 2026-03-10" \
  "https://api.github.com/repos/linuxcnc/linuxcnc/actions/artifacts?per_page=100&name=${NAME}" | \
  jq -r ".artifacts[] | select(.workflow_run.head_branch==\"${BRANCH}\") | .archive_download_url" | head -n1)

# Download the zip behind that URL.
curl -L \
  -H "Accept: application/vnd.github+json" \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "X-GitHub-Api-Version: 2026-03-10" \
  "$DL_URL" -o linuxcnc-doc.zip

Looks like the links are sorted by date, so I can just take the first one.

Does that work for you? You need curl and jq. It took some time to get it all together; all I found was how to download releases, but we need artifacts from CI builds.

As soon as this branch is merged, you can change BRANCH=ci_doc_build to BRANCH=master

Edit: Using ?name= is more efficient than filtering for the name afterwards. Also, there is pagination; I set it to 100 per page. So after 100 other builds without any master build, it will fail.
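
For a host without jq, the same selection could be done in a few lines of Python. This is only a sketch: it assumes the response layout used by the jq filter above (`artifacts[].workflow_run.head_branch` and `archive_download_url`) and relies on the observation, not a documented guarantee, that the list comes back newest first. The function name `first_url_for_branch` and the sample data are made up for illustration.

```python
import json


def first_url_for_branch(payload, branch):
    """Return the download URL of the first artifact built from `branch`, or None."""
    for artifact in payload.get("artifacts", []):
        run = artifact.get("workflow_run") or {}
        if run.get("head_branch") == branch:
            return artifact.get("archive_download_url")
    return None


# Made-up sample in the shape of the artifact listing (first entry assumed newest).
SAMPLE = json.loads("""
{"artifacts": [
  {"name": "linuxcnc-doc",
   "archive_download_url": "https://example.invalid/artifacts/3/zip",
   "workflow_run": {"head_branch": "some_other_branch"}},
  {"name": "linuxcnc-doc",
   "archive_download_url": "https://example.invalid/artifacts/2/zip",
   "workflow_run": {"head_branch": "ci_doc_build"}},
  {"name": "linuxcnc-doc",
   "archive_download_url": "https://example.invalid/artifacts/1/zip",
   "workflow_run": {"head_branch": "ci_doc_build"}}
]}
""")

print(first_url_for_branch(SAMPLE, "ci_doc_build"))
```

Returning None instead of raising lets the caller decide whether "no matching artifact" is an error or just "nothing new yet".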

@hdiethelm

Contributor Author

I removed the commit creating packages.
Should be fine to merge if you are ok with it.

@hdiethelm hdiethelm marked this pull request as ready for review June 10, 2026 17:39
@BsAtHome

Contributor

That looks very good; we might be almost ready to try it.

@andypugh This might be able to restore the devel docs on the webserver. Can you check on the webserver if it has jq installed (try running jq --version)? There should be curl by default, I think.
If jq is there, then we have a "simple" way to poll github for new devel html content directly built from CI (via a cron job). If jq is unavailable, then we need to look at what alternatives can parse json and are available on the webserver (python, php, lua, perl,...).

@hdiethelm

Contributor Author

There will surely be a way. Otherwise, just copy over the jq binary or cobble something together in bash / awk to get the right URL string.
BTW: Artifacts expire after some time (90 days by default), so it might sometimes fail. It's also still GitHub.
The curl --fail option might help with handling errors.
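
To guard against picking an expired artifact, a check along these lines could run before the download. It is a sketch that assumes each artifact object in the listing carries an `expired` flag and an `expires_at` timestamp in UTC "Z" notation; verify both against an actual API response before relying on it. The helper name `is_live` and the sample values are hypothetical.

```python
from datetime import datetime, timezone


def is_live(artifact, now=None):
    """True if the artifact is not flagged expired and its expiry lies in the future."""
    now = now or datetime.now(timezone.utc)
    if artifact.get("expired"):
        return False
    expires_at = artifact.get("expires_at")
    if not expires_at:
        return True  # no timestamp given: trust the flag alone
    expiry = datetime.strptime(expires_at, "%Y-%m-%dT%H:%M:%SZ")
    return now < expiry.replace(tzinfo=timezone.utc)


# Made-up artifacts and a fixed "now" so the result is reproducible.
NOW = datetime(2026, 6, 12, tzinfo=timezone.utc)
fresh = {"expired": False, "expires_at": "2026-09-08T17:00:00Z"}
stale = {"expired": False, "expires_at": "2026-01-01T00:00:00Z"}
flagged = {"expired": True, "expires_at": "2026-09-08T17:00:00Z"}

print([is_live(a, NOW) for a in (fresh, stale, flagged)])
```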

@grandixximo

Contributor

This also bites package-indep: today's adios2/sid breakage failed package-indep (debian:sid) and fail-fast cancelled the bookworm/trixie indep builds. Could you apply the same allow_fail handling there? With that I'd close my overlapping #4155.

@andypugh

Collaborator

> Can you check on the webserver if it has jq installed (try running jq --version)?

Sorry, I missed this last night, and don't have the relevant keys on my work laptop. I can check when I get home.

@andypugh

Collaborator

> Can you check on the webserver if it has jq installed (try running jq --version)?

Unfortunately not.

[pdx1-shared-a3-06]$ jq --version
Command 'jq' not found, but can be installed with:
apt install jq
Please ask your administrator.
You will have to enable the component called 'universe'

@BsAtHome

Contributor

Please check whether PHP and the json lib are installed. Put this in a file (test.php):

<?php
var_dump(json_decode('{"a":123}'));

and run: php ./test.php

@andypugh

Collaborator

That looks more promising.

[pdx1-shared-a3-06]$ php ./test.php 
object(stdClass)#1 (1) {
  ["a"]=>
  int(123)
}

@BsAtHome

Contributor

Very good. We'll go the PHP way.

@hdiethelm the artifact does not include the html directory. It is the content of the directory. I think the zip-file artifact needs to go one level back and include the containing directory.

@grandixximo

grandixximo commented Jun 12, 2026

Contributor

@BsAtHome sounds good, the PHP route it is. Here is a self-contained PHP fetch-and-deploy script for the webserver. It needs only curl plus the PHP json and zip extensions, no third-party tools and no shelling out.

On the artifact packaging point you raised: this script is agnostic to it. It locates the entrypoint after extraction, so it works whether the zip holds the contents of html/ directly (as it does now) or you repackage it to include the containing directory. It also creates the served html path itself as a symlink, so it does not actually need the wrapping directory. If you still repackage for other consumers it keeps working either way.

How it works:

  • Lists the linuxcnc-doc artifacts (paginated), picks the newest live one built from master by a trusted in-repo run, and skips if that id is already deployed (state file), so cron can run it often.
  • Two-step download so the token never reaches storage: it resolves archive_download_url with the token but does not follow the redirect, then fetches the signed URL on a clean handle with no Authorization header.
  • Publishes atomically: the served path is a symlink swapped over with a single rename(), so the tree is never half-written or momentarily absent. The extracted tree is validated (entrypoint present, plausible file count) before it goes live, and old releases are pruned.
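
The symlink swap in the last bullet can be illustrated in Python. This is not taken from the attached PHP script; it is a minimal sketch assuming a POSIX filesystem where the temporary link and the served path sit in the same directory, and where the served path is either absent or already a symlink. The names `publish`, `release-1` and `release-2` are made up.

```python
import os
import tempfile


def publish(release_dir, served_path):
    """Point served_path (a symlink) at release_dir using a single rename."""
    tmp_link = served_path + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)  # clear a leftover from an interrupted run
    os.symlink(release_dir, tmp_link)
    # rename() replaces the old link in one step, so readers always see
    # either the previous target or the new one.
    os.replace(tmp_link, served_path)


# Demo in a scratch directory: publish two releases in a row.
base = tempfile.mkdtemp()
served = os.path.join(base, "html")
for release in ("release-1", "release-2"):
    target = os.path.join(base, release)
    os.mkdir(target)
    publish(target, served)

print(os.readlink(served))
```

If the served path is currently a real directory, the rename fails rather than replacing it, so a first deployment would need that directory moved out of the way once.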

Security notes for review: all curl handles pin to https and verify TLS; zip entry names are audited for traversal before extraction; the download is checked against Content-Length and a size ceiling; the token is read with a strict charset check and never logged.
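
As an illustration of the traversal audit mentioned above (again a Python sketch, not the PHP from the attachment): entry names are normalised and any name that is absolute or climbs out of the extraction root is reported, so the archive can be rejected before anything is written. The function name `unsafe_entries` and the sample archive are hypothetical.

```python
import io
import posixpath
import zipfile


def unsafe_entries(zf):
    """Return entry names that are absolute or would escape the extraction root."""
    bad = []
    for name in zf.namelist():
        norm = posixpath.normpath(name.replace("\\", "/"))
        if norm.startswith("/") or norm == ".." or norm.startswith("../"):
            bad.append(name)
    return bad


# Build a small in-memory archive with one good and two hostile entries.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("html/index.html", "<html></html>")
    zf.writestr("html/../../evil.txt", "x")
    zf.writestr("/abs.txt", "x")

with zipfile.ZipFile(buf) as zf:
    BAD = unsafe_entries(zf)

print(BAD)
```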

I have lint-checked it on PHP 7.4 and 8.2 (both clean) and unit-tested the security-critical parts (symlink-safe cleanup, the zip-slip audit, unwrap detection, the atomic swap, prune retention). I have not run it end to end against the live API or webserver, so that part is unverified.

Two things to confirm before wiring it up:

  • $branch is master, which is correct once this lands and master CI uploads the artifact. If it is currently built from another branch, that top-of-file knob needs to point at it.
  • The webserver must follow symlinks at the docroot (Apache +FollowSymLinks / nginx disable_symlinks off), and the cron user needs write access to the docroot's parent. Both are documented in the header.

Are you already drafting the PHP script, or shall I finish this one? Writing it needs no server access. @andypugh would handle the actual install on the webserver (token, cron entry, FollowSymLinks) since he is the one with the keys.

fetch-devel-docs.php.txt

@BsAtHome

Contributor

We also need to find out whether Andy's account on the webserver is allowed to write/change the page serving directory... That would be a real stopper ;-)

Is it really necessary to download 100x100 artifacts? Calling GitHub 100 times in a burst looks like an attempted denial of service and may be flagged. According to the post above, these artifacts are sorted by when they were created/run (newest first).

You may want to use references in your for/foreach loops to reduce the number of copies created of values/instances/arrays (reduces memory footprint).

Keeping 5 releases by default currently takes more than 1.5 GB. A bit overkill.

Failing on a missing zip extension mid-run should not be necessary. We must test the availability of both the cURL and zip modules before this can even start to work. And when the modules are there, then they are there. Besides, we need to see how much memory is available to php-cli, because we are handling large files and PHP is not always the best at keeping a low memory footprint.

@grandixximo

Contributor

Thanks, all fair. Updated the script:

  • Pagination: you are right that walking up to 100 pages was wrong. Since the list is filtered by name and returned newest-first, it now takes the first live in-repo master artifact and stops, fetching another page only if one has no match, capped at 3. In practice that is a single request.
  • Releases: dropped the default to keep 2 (the live one plus a single rollback) instead of 5.
  • Capabilities: moved the curl/zip/json checks to an up-front preflight that fails clearly before any work, and removed the mid-run zip check.
  • Loops: the artifact scan now iterates by reference (with unset after) to avoid copying the arrays.
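
The capped scan from the pagination bullet could look roughly like this in Python. It is a sketch under assumptions: `fetch_page` is a hypothetical callable that returns one page of artifact dicts, the acceptance rule mirrors "live, in-repo, master", and 3662905 is the LinuxCNC repository id quoted later in this thread.

```python
LINUXCNC_REPO_ID = 3662905  # repository id as quoted later in this thread


def find_artifact(fetch_page, accept, max_pages=3):
    """Return the first accepted artifact, fetching at most max_pages pages."""
    for page in range(1, max_pages + 1):
        artifacts = fetch_page(page)
        if not artifacts:
            return None  # ran out of results
        for artifact in artifacts:
            if accept(artifact):
                return artifact
    return None


def wanted(artifact):
    run = artifact.get("workflow_run", {})
    return (not artifact.get("expired")
            and run.get("head_branch") == "master"
            and run.get("head_repository_id") == LINUXCNC_REPO_ID)


# Made-up pages: a fork's master build, a feature build, then the real match.
PAGES = {
    1: [{"id": 30, "expired": False,
         "workflow_run": {"head_branch": "master", "head_repository_id": 1157775434}},
        {"id": 20, "expired": False,
         "workflow_run": {"head_branch": "feature", "head_repository_id": LINUXCNC_REPO_ID}}],
    2: [{"id": 10, "expired": False,
         "workflow_run": {"head_branch": "master", "head_repository_id": LINUXCNC_REPO_ID}}],
}
calls = []


def fake_fetch(page):
    calls.append(page)
    return PAGES.get(page, [])


found = find_artifact(fake_fetch, wanted)
print(found["id"], calls)
```

Passing the fetcher in as a parameter keeps the stopping logic testable without touching the network.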

On memory: the script never loads a file into PHP. curl streams the download straight to a file handle and ZipArchive extracts entry by entry to disk, so memory_limit is not a factor regardless of tree size; only the small JSON metadata is decoded in memory.
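
The streaming idea is independent of PHP. As a Python sketch of the same pattern, the copy below moves data in fixed-size chunks and enforces a size ceiling, so memory use stays bounded by the chunk size. The name `stream_copy` and the sizes are illustrative assumptions, not values from the attached script.

```python
import io


def stream_copy(src, dst, max_bytes, chunk_size=64 * 1024):
    """Copy src to dst chunk by chunk; abort once more than max_bytes were read."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            return total
        total += len(chunk)
        if total > max_bytes:
            raise ValueError("download exceeds size ceiling")
        dst.write(chunk)


# Demo with in-memory streams standing in for the HTTP body and the zip file.
source = io.BytesIO(b"x" * 200_000)
sink = io.BytesIO()
COPIED = stream_copy(source, sink, max_bytes=1_000_000)
print(COPIED)
```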

> Andy's account on the webserver is allowed to write/change the page serving directory...

Agreed, that is the real gate. The cron user needs write on the parent of the served path, since publishing renames a symlink there. If that account cannot write it, none of this works and we would need a different publish path. Worth confirming before going further.

fetch-devel-docs-v2.php.txt

@andypugh

Collaborator

I do have write access, and have used it to put up a placeholder text.

https://linuxcnc.org/docs/devel/html/index.html

@andypugh

andypugh commented Jun 12, 2026

Collaborator

One question is why the docs disappeared rather than simply not being updated. Is there a risk that the buildbot (if, indeed, it is the buildbot) will delete the docs after each successful build unless we stop it?

Though we can probably prevent that by de-authorising the key:

command="/home/emcboard/bin/rsync-server 'www.linuxcnc.org/docs/*'" ssh-rsa <...elided...> Buildbot doc uploader

@andypugh

andypugh commented Jun 12, 2026

Collaborator

If you are curious, the rsync-server script there does not do the rsync.

#!/usr/bin/python
#
# This script is intended to help provide safe receipt of incoming file
# transfers via rsync.  It should be used in the receiving user's
# .ssh/authorized_keys file as the command associated with the sender's
# public key, like this:
#
#     command="rsync-server www.linuxcnc.org/docs/*" $PUBKEY Buildbot doc uploader
#
# This script examines the command that the sender wanted to run (in
# SSH_ORIGINAL_COMMAND), and does two sanity checks.
# 
#     1. The first two arguments must be "rsync --server".
#
#     2. The last argument must match the glob in $1.
#
# If both those checks pass, it execs the requested command and the file
# transfer goes through.
#
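
Only the header comment of the script is quoted above. Purely as an illustration of the two checks it describes, and not the actual script body, they could be written like this. The function name `command_allowed` is hypothetical, and the original command is passed in as a string rather than read from SSH_ORIGINAL_COMMAND.

```python
import fnmatch
import shlex


def command_allowed(original_command, allowed_glob):
    """Check 1: command starts with 'rsync --server'. Check 2: last argument matches the glob."""
    args = shlex.split(original_command)
    if len(args) < 3 or args[:2] != ["rsync", "--server"]:
        return False
    return fnmatch.fnmatchcase(args[-1], allowed_glob)


GLOB = "www.linuxcnc.org/docs/*"
print(command_allowed("rsync --server -logDtpre.iLsfxC . www.linuxcnc.org/docs/devel/", GLOB))
print(command_allowed("rsync --server . /etc/", GLOB))
print(command_allowed("rm -rf www.linuxcnc.org/docs/devel/", GLOB))
```

Note that a plain glob match also accepts paths containing `..` below the prefix, so a stricter check may be worth considering.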

@BsAtHome

Contributor

> One question is why the docs disappeared rather than simply not being updated. Is there a risk that the buildbot (if, indeed, it is the buildbot) will delete the docs after each successful build unless we stop it?

My guess is that it stopped because the files in question are no longer available at the location they were expected to be. But certainty is only provided by reading/disabling the old sync-code. It seems that by blocking the script/pubkey you can disable the buildbot update?

It broke right after we changed the build layout. That does not seem to be random. Anyway, we could always move the new tree right next to the old directory and update all links, if that would ever be required.

BTW, could you try to run this to see whether the expected extensions are available or whether we need workarounds for them too:

<?php
$f = false;
foreach (['curl', 'zip', 'json'] as $ext) {
    if (!extension_loaded($ext)) {
        echo "required PHP extension not loaded: $ext\n";
        $f = true;
    }
}
if (!function_exists('symlink')) {
    echo "symlink() is disabled (disable_functions); cannot publish atomically\n";
    $f = true;
}
if (!$f) {
    echo "All fine!\n";
}

@BsAtHome

Contributor

Now I'm wondering what happens to the 2.9 branch docs when it gets updated. Do the webserver docs still update in that 2.9 tree automatically?

@andypugh

Collaborator

> BTW, could you try to run this to see whether the expected extensions are available or whether we need workarounds for them too:

"All fine"

@andypugh

andypugh commented Jun 12, 2026

Collaborator

> Now I'm wondering what happens to the 2.9 branch docs when it gets updated. Do the webserver docs still update in that 2.9 tree automatically?

Well, if we disable the buildbot rsync key they might never get updated...

All the files in docs/stable appear to date from 15th December.
version 2.9.7-9-gd435482ad1
Fits with that (we are currently on 2.9.8)

The sid build shows still failed but other packages are built.
@hdiethelm

Contributor Author

> Very good. We'll go the PHP way.
>
> @hdiethelm the artifact does not include the html directory. It is the content of the directory. I think the zip-file artifact needs to go one level back and include the containing directory.

Done. GitHub is a bit annoying: it strips the leading path it considers unnecessary. You can use a wildcard to keep the path, but this will also upload any other html* directories if one is created in the future.

> This also bites package-indep: today's adios2/sid breakage failed package-indep (debian:sid) and fail-fast cancelled the bookworm/trixie indep builds. Could you apply the same allow_fail handling there? With that I'd close my overlapping #4155.

Done

@hdiethelm

hdiethelm commented Jun 12, 2026

Contributor Author

> fetch-devel-docs-v2.php.txt

Was this generated by AI? It looks like way too much code for a simple task: get the correct zip URL from the JSON blob, maybe check whether it was already downloaded and, if not, download it, check that the contents are correct and, if so, copy it to a location.

BTW: For checking, there is a checksum and a created_at field in the JSON. I am quite sure that the artifact list is sorted by creation date; the API would make no sense otherwise. So just taking the newest one should be perfectly fine. However, the GitHub docs say nothing about it.

@hdiethelm

Contributor Author

I quickly went through the PHP. What makes sense is checking head_repository_id, but I would check it against a hardcoded value like:

BRANCH="ci_doc_build"
NAME="linuxcnc-doc"
curl -L \
  -H "Accept: application/vnd.github+json" \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "X-GitHub-Api-Version: 2026-03-10" \
  "https://api.github.com/repos/linuxcnc/linuxcnc/actions/artifacts?per_page=100&name=${NAME}"  | \
  jq ".artifacts[] | select(.workflow_run.head_branch==\"${BRANCH}\") | select(.workflow_run.head_repository_id==3662905) | .archive_download_url"

so if someone creates a PR from their own master branch, it is not deployed straight to the doc server. 3662905 is the linuxcnc repo. For testing together with this PR, you can use 1157775434, which is mine.

Additionally, I would check for <?php anywhere in the docs, so no one can inject PHP that could breach the server. Maybe also check for executable files and verify that at least html/index.html is there.
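
Those three content checks are easy to sketch. The version below is Python for illustration only (the deployment script on the server would be PHP or shell): it walks an extracted tree and reports PHP open tags, files with an execute bit, and a missing html/index.html. The tag list, the messages and the name `validate_tree` are my assumptions.

```python
import os
import stat
import tempfile

PHP_OPEN_TAGS = (b"<?php", b"<?=")


def validate_tree(root):
    """Return a sorted list of problems found in the extracted tree (empty if clean)."""
    problems = []
    if not os.path.isfile(os.path.join(root, "html", "index.html")):
        problems.append("missing html/index.html")
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            mode = os.lstat(path).st_mode
            if not stat.S_ISREG(mode):
                problems.append("not a regular file: " + rel)
                continue
            if mode & 0o111:
                problems.append("executable file: " + rel)
            with open(path, "rb") as fh:
                data = fh.read().lower()
            if any(tag in data for tag in PHP_OPEN_TAGS):
                problems.append("PHP open tag in: " + rel)
    return problems and sorted(problems)


# Demo tree with one clean page, one injected page and one executable.
demo = tempfile.mkdtemp()
html = os.path.join(demo, "html")
os.mkdir(html)
for name, text in (("index.html", "<html></html>"),
                   ("bad.html", "<?php echo 1; ?>"),
                   ("run.sh", "#!/bin/sh\n")):
    with open(os.path.join(html, name), "w") as fh:
        fh.write(text)
os.chmod(os.path.join(html, "run.sh"), 0o755)

PROBLEMS = validate_tree(demo)
print(PROBLEMS)
```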

@BsAtHome

Contributor

> > fetch-devel-docs-v2.php.txt

> Was this generated by AI? It looks like way too much code for a simple task: get the correct zip URL from the JSON blob, maybe check whether it was already downloaded and, if not, download it, check that the contents are correct and, if so, copy it to a location.

I have to agree... Looking over the code a bit more, it seems like far too much for so little. The sequence is:

  1. curl fetch json artifact list
  2. filter with simple php program (this is the jq replacement)
  3. curl fetch zip
  4. unzip into newtree # This should have the right permissions already
  5. (optional: scan for <?php and <?= or <? if short tags are enabled and reject if found)
  6. ln -sf newtree currenttree # This only works if the webserver has FollowSymLinks enabled. Otherwise we need to do mv and live with the small race
  7. rm -r oldtree

If it breaks, then we can save one intermediate tree, but this should be only a few lines of shell scripting, I guess.
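
For the "mv and live with the small race" fallback in step 6, steps 4, 6 and 7 could be sketched as below. This is Python for illustration only and skips step 5; it assumes the zip holds the page files directly, and the function name `deploy_by_rename` plus the `.new` / `.old` suffixes are hypothetical.

```python
import io
import os
import shutil
import tempfile
import zipfile


def deploy_by_rename(zip_source, current):
    """Unzip next to `current`, swap the directories by rename, drop the old tree."""
    new_tree = current + ".new"
    old_tree = current + ".old"
    for leftover in (new_tree, old_tree):
        shutil.rmtree(leftover, ignore_errors=True)
    with zipfile.ZipFile(zip_source) as zf:
        zf.extractall(new_tree)                   # step 4
    if os.path.isdir(current):
        os.rename(current, old_tree)              # served path is absent from here...
    os.rename(new_tree, current)                  # ...until here (the small race)
    shutil.rmtree(old_tree, ignore_errors=True)   # step 7


def make_zip(text):
    """Build a tiny in-memory archive standing in for the downloaded artifact."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("index.html", text)
    buf.seek(0)
    return buf


docroot = tempfile.mkdtemp()
current = os.path.join(docroot, "html")
deploy_by_rename(make_zip("v1"), current)
deploy_by_rename(make_zip("v2"), current)

with open(os.path.join(current, "index.html")) as fh:
    LIVE = fh.read()
print(LIVE, sorted(os.listdir(docroot)))
```

If the symlink route is used instead, note that with GNU ln replacing an existing link to a directory generally needs `-n` as well (ln -sfn), otherwise the new link is created inside the old target.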
