CI: htmldoc artifacts and package #4150
And again, the CI broke because sid is broken. Maybe we should allow the sid package build to fail? I tested it: I guess it will be fixed soon in sid.
I'm not sure that the webserver can use oras. Besides, I'm not sure you would want to involve yet another third party in this process. The point is that GitHub already stores every built artifact (as with the deb package builds). The question is whether we can exploit that. We only need the link to the artifact.
And, yes, a soft-fail on Debian:sid may be appropriate.
It is a Debian package; depending on how it is set up,
The only issue is that artifacts are tied to CI runs, so for an update you would have to download the file manually. Even wget doesn't work with the URL from a right-click "copy link", because you have to be logged in. Is this good enough? Otherwise, here is what I found so far:
That is a problem, right there... We can't install anything or use sudo on the webserver.
Yes, that was what I was thinking about. I don't know if it works; that's why I was asking ;-) The problem was that the htmldocs run did not produce any artifacts, so we had no chance of testing in any direction.
That looks like a way. Generating a limited-access token just to get the file may be an option.
With c323d8e, the other packages are built even if sid fails. But sid is shown in red.
I found something that works. Get the URLs of all artifacts that match the branch and are named linuxcnc-doc:
```shell
TOKEN="YourToken"
BRANCH="ci_doc_build"
NAME="linuxcnc-doc"
curl -L \
  -H "Accept: application/vnd.github+json" \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "X-GitHub-Api-Version: 2026-03-10" \
  "https://api.github.com/repos/linuxcnc/linuxcnc/actions/artifacts?per_page=100&name=${NAME}" | \
  jq ".artifacts[] | select(.workflow_run.head_branch==\"${BRANCH}\") | .archive_download_url"
```
Download the zip:
```shell
curl -L \
  -H "Accept: application/vnd.github+json" \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "X-GitHub-Api-Version: 2026-03-10" \
  https://api.github.com/repos/LinuxCNC/linuxcnc/actions/artifacts/7537361103/zip -o linuxcnc-doc.zip
```
Combined:
```shell
TOKEN="YourToken"
BRANCH="ci_doc_build"
NAME="linuxcnc-doc"
# jq -r prints the URL as a raw string, without the surrounding JSON quotes
DL_URL=$(curl -L \
  -H "Accept: application/vnd.github+json" \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "X-GitHub-Api-Version: 2026-03-10" \
  "https://api.github.com/repos/linuxcnc/linuxcnc/actions/artifacts?per_page=100&name=${NAME}" | \
  jq -r ".artifacts[] | select(.workflow_run.head_branch==\"${BRANCH}\") | .archive_download_url" | head -n1)
curl -L \
  -H "Accept: application/vnd.github+json" \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "X-GitHub-Api-Version: 2026-03-10" \
  "$DL_URL" -o linuxcnc-doc.zip
```
Looks like the links are sorted by date, so I can just take the first one. Does that work for you?
You need
As soon as this branch is merged, you can change
Edit: Using
I removed the commit creating packages.
That looks very good; we might be almost ready to try it. @andypugh This might restore the devel docs on the webserver. Can you check on the webserver if it has
There will surely be a way. Otherwise, just copy over the jq binary or cobble together something in bash/awk to get the right URL string.
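A rough version of that bash/awk fallback, in case jq really is unavailable, might look like the sketch below. It is hypothetical (the name `latest_url_nojq` is made up) and it is not a JSON parser: it assumes that within each artifact object the `archive_download_url` key comes before the nested `workflow_run.head_branch`, as in GitHub's documented example response, and that no value contains an escaped quote.

```shell
#!/bin/sh
# Hypothetical jq-free fallback: print the archive_download_url of the first
# artifact whose workflow_run.head_branch equals $1. Reads the API JSON on stdin.
# ASSUMPTION: each artifact lists "archive_download_url" before "workflow_run"
# (as in GitHub's documented example); escaped quotes in values are not handled.
latest_url_nojq() {
    # grep -o emits each matching "key": "value" pair on its own line, in
    # document order, regardless of how the JSON is wrapped or indented.
    grep -oE '"(archive_download_url|head_branch)": *"[^"]*"' |
        awk -F'"' -v want="$1" '
            # With " as the field separator, $2 is the key and $4 the value.
            $2 == "archive_download_url" { url = $4; next }
            $2 == "head_branch" && $4 == want && url != "" { print url; exit }
        '
}
```

It would replace the jq call at the end of the same curl pipeline, e.g. `... | latest_url_nojq ci_doc_build`, and prints nothing if no artifact matches.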
This also bites
Sorry, I missed this last night, and don't have the relevant keys on my work laptop. I can check when I get home. |
Unfortunately not.
```shell
[pdx1-shared-a3-06]$ jq --version
```
Please check whether PHP and the JSON extension are installed. Put this in a file (test.php):
```php
<?php
var_dump(json_decode('{"a":123}'));
```
and run:
That looks more promising. |
Very good. We'll go the PHP way. @hdiethelm the artifact does not include the
@BsAtHome sounds good, the PHP route it is. Here is a self-contained PHP fetch-and-deploy script for the webserver. It needs only curl plus the PHP json and zip extensions: no third-party tools and no shelling out. On the artifact packaging point you raised: this script is agnostic to it. It locates the entrypoint after extraction, so it works whether the zip holds the contents of
How it works:
Security notes for review: all curl handles are pinned to https and verify TLS; zip entry names are audited for traversal before extraction; the download is checked against Content-Length and a size ceiling; the token is read with a strict charset check and never logged. I have lint-checked it on PHP 7.4 and 8.2 (both clean) and unit-tested the security-critical parts (symlink-safe cleanup, the zip-slip audit, unwrap detection, the atomic swap, prune retention). I have not run it end to end against the live API or webserver, so that part is unverified.
Two things to confirm before wiring it up:
Are you already drafting the PHP script, or shall I finish this one? Writing it needs no server access. @andypugh would handle the actual install on the webserver (token, cron entry,
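For comparison with the PHP version, the zip-slip audit from the security notes fits in a few lines of shell. A minimal sketch: `audit_zip_names` is a hypothetical helper that reads one archive entry name per line (assumed to come from Info-ZIP's `unzip -Z1`, if unzip is present on the server) and rejects absolute paths, `..` components and backslashes.

```shell
#!/bin/sh
# Hypothetical zip-slip audit: reads archive entry names (one per line) on
# stdin and returns non-zero if any entry could escape the extraction directory.
audit_zip_names() {
    local name
    while IFS= read -r name || [ -n "$name" ]; do
        case "$name" in
            /*)                    # absolute path
                echo "unsafe entry (absolute): $name" >&2; return 1 ;;
            ..|../*|*/..|*/../*)   # parent-directory component
                echo "unsafe entry (traversal): $name" >&2; return 1 ;;
            *\\*)                  # backslash, a path separator on some systems
                echo "unsafe entry (backslash): $name" >&2; return 1 ;;
        esac
    done
    return 0
}
```

Usage would be along the lines of `unzip -Z1 linuxcnc-doc.zip | audit_zip_names && unzip -q linuxcnc-doc.zip -d staging/`, so extraction only happens after every name has passed.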
We also need to find out whether Andy's account on the webserver is allowed to write to or change the page-serving directory... That would be a real stopper ;-) Is it really necessary to download 100x100 artifacts? Calling GitHub 100 times in a burst looks like an attempted denial of service and may be flagged. According to the post above, these artifacts are sorted by when they were created/run (newest first). You may want to use references in your for/foreach loops to reduce the number of copies made of values/instances/arrays (reduces memory footprint). Keeping 5 releases by default is currently more than 1.5 GB. A bit overkill. Failing on a missing zip extension should not be necessary. We must test the availability of both the cURL and zip modules before this can even start to work. And when the modules are there, they are there. Besides, we need to see how much memory is available to php-cli, because we are handling large files and PHP is not always good at keeping a low memory footprint.
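With a configurable retention count, keeping only the current and the previous tree would be a call like `prune_releases "$RELEASES" 2`. A minimal sketch, assuming each deployed tree is its own directory under one releases directory and is named so that lexical order matches age (for example a timestamp such as 20240131T1200); the helper name and that layout are hypothetical. Symlinks in that directory are skipped, so a `current` link living there is never deleted.

```shell
#!/bin/sh
# Hypothetical pruning helper: keep only the newest $2 release directories
# under $1 and delete the older ones.
# ASSUMPTION: names sort lexically by age, so glob order is oldest first.
prune_releases() {
    local dir="$1" keep="$2" d total=0 excess
    # Count real directories (the trailing / matches dirs and dir symlinks).
    for d in "$dir"/*/; do
        [ -d "$d" ] && [ ! -L "${d%/}" ] && total=$((total + 1))
    done
    excess=$((total - keep))
    # Delete from the oldest end until only $keep remain.
    for d in "$dir"/*/; do
        [ "$excess" -gt 0 ] || break
        [ -d "$d" ] && [ ! -L "${d%/}" ] || continue
        rm -rf -- "${d%/}"
        excess=$((excess - 1))
    done
}
```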
Thanks, all fair. Updated the script:
On memory: the script never loads a file into PHP. curl streams the download straight to a file handle and ZipArchive extracts entry by entry to disk, so
Agreed, that is the real gate. The cron user needs write permission on the parent of the served path, since publishing renames a symlink there. If that account cannot write there, none of this works and we would need a different publish path. Worth confirming before going further.
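The symlink publish step itself can be sketched in shell, assuming GNU coreutils: `mv -T` renames the temporary link over the existing one in a single rename instead of moving it into the directory the old link points to. The function name and paths are hypothetical; the requirement is the one stated above, write permission on the directory that holds the served link.

```shell
#!/bin/sh
# Hypothetical atomic publish: make the symlink $2 point at directory $1.
# A temporary link is created next to $2 and renamed over it, so readers see
# either the old tree or the new one, never a missing path.
# ASSUMPTION: GNU coreutils (`mv -T` treats the destination as a plain name).
publish_tree() {
    local target="$1" link="$2"
    local tmp="${link}.tmp.$$"
    ln -sfn -- "$target" "$tmp" || return 1
    mv -T -- "$tmp" "$link"
}
```

If the served path is currently a real directory rather than a link, `mv -T` refuses to replace it, so that first switch-over would presumably have to be done by hand once.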
I do have write access, and have used it to put up placeholder text.
One question is why the docs disappeared rather than simply not being updated. Is there a risk that the buildbot (if, indeed, it is the buildbot) will delete the docs after each successful build unless we stop it? Though we can probably prevent that by de-authorising the key:
If you are curious, the rsync-server script there does not do the rsync. |
My guess is that it stopped because the files in question are no longer available at the location where they were expected. But only reading/disabling the old sync code will give certainty. It seems that by blocking the script/pubkey you can disable the buildbot update? It broke right after we changed the build layout, which does not seem random. Anyway, we could always move the new tree right next to the old directory and update all links, if that were ever required. BTW, could you try running this to see whether the expected extensions are available, or whether we need workarounds for them too:
```php
<?php
$f = false;
foreach (['curl', 'zip', 'json'] as $ext) {
    if (!extension_loaded($ext)) {
        echo "required PHP extension not loaded: $ext\n";
        $f = true;
    }
}
if (!function_exists('symlink')) {
    echo "symlink() is disabled (disable_functions); cannot publish atomically\n";
    $f = true;
}
if (!$f) {
    echo "All fine!\n";
}
```
Now I'm wondering what happens to the 2.9 branch docs when that branch gets updated. Do the webserver docs still update in that 2.9 tree automatically?
"All fine"
Well, if we disable the buildbot rsync key, they might never get updated... All the files in docs/stable appear to date from 15th December.
The sid build still shows as failed, but the other packages are built.
Done. GitHub is a bit annoying: it strips the leading path it considers unneeded. You can use a wildcard to keep the path, but then any html* directories created in the future will be uploaded as well.
Done
Was this generated by AI? It looks like way too much code for a simple task: get the correct zip URL from the JSON blob, maybe check whether it was already downloaded, and if not, download it, check that the contents are correct and, if so, copy it to its location. BTW: for checking, there is a checksum and a created_at field in the JSON. I am quite sure that the artifact list is sorted by creation date; the API would make no sense otherwise. So taking just the newest one should be perfectly fine. However, the GitHub docs say nothing about it.
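The check-and-verify part of that sequence is indeed small. A minimal sketch with hypothetical helpers: one pair records the last deployed artifact ID in a state file (the file location is up to the caller), the other compares the downloaded zip with a `sha256:<hex>` digest string. That the artifact JSON exposes its checksum in exactly that form (recent responses appear to carry a `digest` field) is an assumption to verify against a real response.

```shell
#!/bin/sh
# Hypothetical helpers for "skip if already deployed, verify, then copy".

# Succeeds if artifact ID $1 is the one recorded in state file $2.
already_deployed() {
    local id="$1" state="$2"
    [ -f "$state" ] && [ "$(cat -- "$state")" = "$id" ]
}

# Records artifact ID $1 in state file $2 (write, then rename into place).
mark_deployed() {
    local id="$1" state="$2"
    printf '%s\n' "$id" > "$state.tmp" && mv -- "$state.tmp" "$state"
}

# Succeeds if file $1 matches digest $2, given as "sha256:<hex>".
# ASSUMPTION: the artifact's checksum is reported in that form.
verify_digest() {
    local file="$1" expected="$2" actual
    case "$expected" in
        sha256:*) expected=${expected#sha256:} ;;
        *) echo "unsupported digest format: $expected" >&2; return 1 ;;
    esac
    # Hash via stdin so sha256sum never escapes the file name in its output.
    actual=$(sha256sum < "$file") || return 1
    actual=${actual%% *}
    [ "$actual" = "$expected" ]
}
```

A run would then be: pick the newest artifact, stop if `already_deployed` succeeds for its ID, download, `verify_digest`, extract and publish, and only then `mark_deployed`.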
I quickly went through the PHP. What makes sense is checking
```shell
TOKEN="YourToken"
BRANCH="ci_doc_build"
NAME="linuxcnc-doc"
curl -L \
  -H "Accept: application/vnd.github+json" \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "X-GitHub-Api-Version: 2026-03-10" \
  "https://api.github.com/repos/linuxcnc/linuxcnc/actions/artifacts?per_page=100&name=${NAME}" | \
  jq ".artifacts[] | select(.workflow_run.head_branch==\"${BRANCH}\") | select(.workflow_run.head_repository_id==3662905) | .archive_download_url"
```
so that if someone creates a PR from their own master branch, it is not deployed directly to the docs server. 3662905 is the linuxcnc repo. For testing together with this PR, you can use 1157775434, which is mine. Additionally, I would check for
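Combining that repository check with the branch filter, and sorting on `created_at` explicitly instead of relying on the undocumented response order, could look like the sketch below. It reads the list-artifacts JSON on stdin and requires jq; the function name is hypothetical, and skipping entries with `expired: true` (which can no longer be downloaded) is an added assumption.

```shell
#!/bin/sh
# Hypothetical helper: print the download URL of the newest non-expired
# artifact built from branch $1 of repository ID $2, or nothing if none match.
# Input: JSON from GET /repos/{owner}/{repo}/actions/artifacts on stdin.
# created_at values are ISO 8601 strings in one format, so they sort as text.
newest_artifact_url() {
    jq -r --arg branch "$1" --argjson repo "$2" '
        [ .artifacts[]
          | select(.workflow_run.head_branch == $branch)
          | select(.workflow_run.head_repository_id == $repo)
          | select(.expired | not) ]
        | sort_by(.created_at)
        | last
        | .archive_download_url // empty'
}
```

It would take the place of the jq call above, e.g. `... | newest_artifact_url ci_doc_build 3662905`, or with 1157775434 while testing against the fork.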
I have to agree... Looking over the code a bit more, it looks like a lot of code for so little. The sequence is:
If it breaks, we can keep one intermediate tree around, but this should only take a few lines of shell scripting, I guess.


This PR adds:
A published package for htmldocs (Removed)
```shell
oras pull ghcr.io/linuxcnc/linuxcnc/doc-html:master
```
If needed for the 2.9 branch, I can backport this.
The discussion started in #4119.
As far as I understand the GitHub docs, only members are allowed to write packages. So PRs should not be able to create a package, even if one removes the if() so that this stage always runs. For testing, I ran this stage in my GitHub account: https://github.com/hdiethelm/linuxcnc-fork/actions/runs/27276506335
The package is here: https://github.com/hdiethelm/linuxcnc-fork
And it can be downloaded with:
```shell
oras pull ghcr.io/hdiethelm/linuxcnc/doc-html:ci_doc_build_test
```
@BsAtHome
Do you think this does the job?
I will create a commit that removes the if() and see what happens. Then I will revert it again.