docs: deduplicate built HTML images into a shared image/ tree by grandixximo · Pull Request #4157 · LinuxCNC/linuxcnc

grandixximo · 2026-06-12T12:06:53Z

The HTML doc build copies every referenced image into each language and topic directory, so the built tree carries the same bytes many times over (238 MB of images, only 30 MB unique).

This adds docs/src/tools/dedup-images.py and wires it into the htmldocs build: it collapses byte-identical images (SHA-256) into a shared root image/ tree, keeps only translated overrides under <lang>/image/, and rewrites every src and click-to-enlarge href. It is dry-run by default, idempotent, and self-verifying (after --apply it re-resolves every reference and fails if any is broken), and it preserves HTML mtimes so a second make htmldocs does no work.
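
For readers who want a concrete picture of the "byte-identical images (SHA-256)" step, here is a minimal sketch of the grouping idea. It is not the code from dedup-images.py: the function names, the `IMAGE_SUFFIXES` set, and the directory layout are assumptions made for illustration only.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

# Assumed set of image extensions; the real tool may use a different list.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".svg"}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_images(root: Path) -> dict[str, list[Path]]:
    """Group image files under root by content hash.

    Only groups with more than one member are returned, i.e. the
    candidates that could collapse into a single shared copy.
    """
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            groups[sha256_of(path)].append(path)
    return {h: paths for h, paths in groups.items() if len(paths) > 1}
```

Calling `find_duplicate_images(Path("html"))` on a built tree (the path is hypothetical) only reports candidates and changes nothing on disk, which is the same spirit as the dry-run default described above.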

On the full translated tree the build output goes from 325 MB to 119 MB (images 238 MB to 30 MB), with all references verified.

@BsAtHome I went with a post-build pass rather than doing this in image_resolver.rb. The resolver is PDF-only today, and the bulk of the duplication is plain untranslated English images that the _en/_<lang> logic never touches, so a single-source-of-truth approach would mean a bigger refactor of the image stamps to defer to the resolver. The size and build-time results are the same either way. Happy to take the DRY refactor route instead if you would prefer it.

The HTML doc build copies every referenced image into each language and topic directory, so the built tree carries the same bytes many times over (238 MB of images, only 30 MB unique).

Add docs/src/tools/dedup-images.py: it collapses byte-identical images (SHA-256) into a shared root image/ tree, keeps only translated overrides under <lang>/image/, and rewrites every src and click-to-enlarge href to match. Dry-run by default, idempotent, and self-verifying: after --apply it re-resolves every reference and fails if any is broken.

Wire it into the htmldocs build as a final .dedup-images-stamp step. The tool preserves the mtime of every HTML file it rewrites, so a second `make htmldocs` does no work.

On the full translated tree this takes the build output from 325 MB to 119 MB (images 238 MB to 30 MB) with all references verified.
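
The mtime preservation mentioned in the commit message can be sketched as follows. This is an illustration under assumptions, not the tool's implementation: `rewrite_preserving_mtime` is a hypothetical helper, and the plain string replacement stands in for whatever src/href rewriting the real script performs.

```python
import os
from pathlib import Path


def rewrite_preserving_mtime(path: Path, old: str, new: str) -> bool:
    """Replace old with new in a text file while keeping its timestamps.

    Returns True if the file was changed, False if there was nothing to
    do (so running it a second time is a no-op).
    """
    original = path.read_text(encoding="utf-8")
    updated = original.replace(old, new)
    if updated == original:
        return False  # already rewritten, or reference not present

    stat = path.stat()  # capture timestamps before touching the file
    path.write_text(updated, encoding="utf-8")
    # Restore atime/mtime with nanosecond precision so make still sees
    # the HTML file as older than (or equal to) its stamp file.
    os.utime(path, ns=(stat.st_atime_ns, stat.st_mtime_ns))
    return True
```

Because make compares modification times, restoring the original mtime after the rewrite is what keeps a second build from treating the rewritten HTML as newly changed.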

BsAtHome · 2026-06-12T12:10:57Z

> I went with a post-build pass rather than doing this in image_resolver.rb.

If it works as intended without too much extra work, why not? I trust you have weighed the options and gone for the better one :-)

I'll have a look later.

grandixximo · 2026-06-13T02:30:07Z

Thanks. FWIW I did spend a fair bit of time weighing the build-integrated alternative before settling on the post-build pass.

This PR adds just one file and it is readable, but I agree the reason it has to exist is ugly: the build creates the duplicates and then this cleans them up.

The real alternative is not only refactoring the Ruby resolver but also shifting the build from zip to tar, so the resolver can place symlinks and have them survive. That preserves the dedup all the way through (artifact, deb, fetch), but it is a larger blast radius and would need coordination with @hdiethelm.
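
To illustrate why tar is attractive for the symlink-based alternative, the sketch below builds a tiny tree with one symlinked image and shows that Python's `tarfile` records the link as a link entry rather than copying the target's bytes. The directory and file names (`html/`, `de/`, `logo.png`) are hypothetical, and this says nothing about how the actual LinuxCNC packaging steps are implemented.

```python
import os
import tarfile
from pathlib import Path


def archive_with_symlink(workdir: Path) -> list[tuple[str, str, str]]:
    """Create a small tree containing a symlink, tar it, and list members.

    Returns (name, kind, linkname) tuples for every archive member.
    """
    tree = workdir / "html"
    (tree / "image").mkdir(parents=True)
    (tree / "de" / "image").mkdir(parents=True)
    (tree / "image" / "logo.png").write_bytes(b"fake image bytes")
    # A relative link, so it stays valid wherever the tree is unpacked.
    os.symlink("../../image/logo.png", tree / "de" / "image" / "logo.png")

    archive = workdir / "html.tar"
    with tarfile.open(archive, "w") as tar:
        # tarfile does not dereference symlinks by default.
        tar.add(tree, arcname="html")

    def kind(member: tarfile.TarInfo) -> str:
        if member.issym():
            return "symlink"
        return "dir" if member.isdir() else "file"

    with tarfile.open(archive) as tar:
        return [(m.name, kind(m), m.linkname) for m in tar.getmembers()]
```

In the returned listing the entry for `html/de/image/logo.png` is a symlink pointing at `../../image/logo.png`, so the archive holds the image bytes only once.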

If we agree on that shape instead, the result is a more elegant structure, and I think genuinely better. Happy to go that way if you and @hdiethelm are on board.

grandixximo mentioned this pull request Jun 12, 2026

Devel Docs not showing at linuxcnc.org/docs/devel/html/ #4152

Open

grandixximo wants to merge 1 commit into LinuxCNC:master from grandixximo:docs-image-dedup
