feat: sync post attachments (images, PDFs, PowerPoints) to S3#181
feat: sync post attachments (images, PDFs, PowerPoints) to S3#181bbornino wants to merge 2 commits into
Conversation
Adds a post_attachments table and syncs non-markdown files (images, PDFs, PowerPoints, nested subfolders) from a post's GitHub folder to S3, with image resize/convert, content diffing, and cleanup on removal or post deletion.
|
Warning Review limit reached
Next review available in: 25 minutes Enable usage-based reviews in Billing to review now. Otherwise, wait until the next included review is available. How can I continue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based reviews. How do review limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window. Please refer docs for additional details. Review details⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughAdds a ChangesPost attachments feature
Estimated code review effort: 4 (Complex) | ~60 minutes Sequence Diagram(s)sequenceDiagram
participant Processor
participant GitHub
participant S3
participant DB
Processor->>GitHub: collectAttachmentEntries(post folder)
Processor->>DB: select existing postAttachments rows
Processor->>S3: matchesEtag / exists per attachment
alt attachment changed or missing
Processor->>S3: remove old object
Processor->>S3: upload resized/converted attachment
else unchanged
Processor-->>Processor: skip upload
end
Processor->>S3: remove attachments no longer in repo
Processor->>DB: delete old postAttachments rows, insert attachmentRows
Possibly related PRs
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 6
🧹 Nitpick comments (1)
apps/worker/src/tasks/sync-post/processor.ts (1)
172-184: 🎯 Functional Correctness | 🔵 TrivialSequential S3 removal without failure isolation.
If
s3.removethrows for one attachment mid-loop, the exception propagates, potentially leaving the post row undeleted while some S3 objects are already removed (partial state). This is likely self-healing on job retry (S3 deletes are typically idempotent), but considerPromise.allSettledor a try/catch-and-continue to avoid one bad key blocking the rest of the cleanup and the eventual post deletion.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/worker/src/tasks/sync-post/processor.ts` around lines 172 - 184, The attachment cleanup in sync-post processing is doing sequential S3 deletes with no failure isolation, so one bad key can stop the rest of the cleanup and block the eventual post deletion. Update the removal loop in processor.ts around the removedAttachmentRows handling to make each s3.remove call fault-tolerant, either by using Promise.allSettled or by wrapping each delete in a try/catch and continuing. Keep the existing bucket lookup and logging behavior, but ensure the cleanup continues for all attachmentKey values even if one delete fails.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@apps/worker/src/tasks/sync-post/processor.test.ts`:
- Around line 1090-1097: The “unchanged attachment was NOT re-uploaded”
assertion in processor.test.ts is vacuous because the matcher uses
expect.anything() for arguments where s3.upload passes literal undefined, so it
can never match a real call. Update the check around the s3.upload expectation
to match the actual call signature used by the sync-post processor, especially
the third argument in s3.upload, and verify the unchanged.txt key against the
correct arguments so the assertion truly fails if upload is invoked. Use the
s3.upload call site in processor.ts and the existing test around unchanged
attachment handling to align the matcher with the real invocation shape.
In `@apps/worker/src/tasks/sync-post/processor.ts`:
- Around line 74-88: The resizeAttachmentImage helper is upscaling small inputs
because sharp().resize uses fit: "inside" without disabling enlargement. Update
resizeAttachmentImage in processor.ts to treat ATTACHMENT_IMAGE_SIZE_MAX as a
true cap by preventing enlargement when the source image is smaller, then adjust
the processor.test.ts expectations for the 1×1 fixture since width and height
should remain at the original small size instead of 2048×2048.
- Around line 74-88: The attachment resizing flow in resizeAttachmentImage
currently pipes Readable.fromWeb(stream) into the sharp pipeline without
explicit error handling, which can let source stream failures crash the worker.
Update this helper to use stream.pipeline or stream/promises.pipeline with the
existing sharp pipeline so errors are propagated as job failures, or add an
explicit error handler on the source stream; keep the fix localized to
resizeAttachmentImage in processor.ts.
- Around line 368-380: The changed-attachment flow in processor.ts deletes the
old S3 object before uploading the replacement, but previous.attachmentKey
usually matches attachmentKey because it is deterministically derived from
relativePath and isImage. Update the sync logic to skip the s3.remove call when
the key is unchanged and let s3.upload in the existing upload path overwrite the
object directly; keep the delete only for cases where previous.attachmentKey
differs from the new key.
- Around line 327-390: The attachment processing in processor.ts is using
Promise.all over attachmentEntries, which can buffer and upload every file at
once and spike memory/rate usage. Update the attachmentRows flow to use bounded
concurrency or batching instead of unbounded Promise.all, while keeping the
per-attachment logic inside the map callback that fetches via
github.getContentsRawStream, handles resizeAttachmentImage, and uploads through
s3.upload. Use a limiter or chunked processing so only a small number of
attachments are buffered in memory simultaneously.
- Around line 344-358: The image upload path in processor.ts normalizes every
image to the same .jpeg key, which can make different attachments with the same
basename overwrite each other in S3. Update the attachment handling in the image
branch of the sync-post processor so the storage key preserves uniqueness,
either by keeping the original relativePath/extension in keyRelativePath or by
detecting and rejecting duplicate basenames before the Promise.all upload. Make
sure the fix is applied where resizeAttachmentImage and withExtension are used
so each attachment row maps to a distinct object key.
---
Nitpick comments:
In `@apps/worker/src/tasks/sync-post/processor.ts`:
- Around line 172-184: The attachment cleanup in sync-post processing is doing
sequential S3 deletes with no failure isolation, so one bad key can stop the
rest of the cleanup and block the eventual post deletion. Update the removal
loop in processor.ts around the removedAttachmentRows handling to make each
s3.remove call fault-tolerant, either by using Promise.allSettled or by wrapping
each delete in a try/catch and continuing. Keep the existing bucket lookup and
logging behavior, but ensure the cleanup continues for all attachmentKey values
even if one delete fails.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 799b3a89-6552-494c-a4c8-5f9f2a762ce2
📒 Files selected for processing (7)
apps/worker/src/tasks/sync-post/processor.test.tsapps/worker/src/tasks/sync-post/processor.tsapps/worker/test-utils/setup.tspackages/db/drizzle/20260705145304_curious_zuras/migration.sqlpackages/db/drizzle/20260705145304_curious_zuras/snapshot.jsonpackages/db/src/schema/index.tspackages/db/src/schema/post-attachments.ts
…fulprogramming#181) Makes 404-branch S3 cleanup resilient to individual failures, caps resize to avoid upscaling small images, batches attachment uploads to a concurrency limit, avoids same-basename key collisions between converted images and their originals, skips a needless delete when an attachment's key doesn't change, and fixes a vacuous test assertion.
Summary
Closes #86.
When a post is synced, any non-markdown files in that post's GitHub folder — images, PDFs, PowerPoints, and files nested in subfolders — are now uploaded to S3 and tracked in a new
post_attachmentstable, with cleanup when files are removed from the repo or the post itself is deleted.This is unrelated to the existing
post_imagestable/task, which only covers generated banner and social-share images.Schema
post_attachmentstable (packages/db/src/schema/post-attachments.ts):postSlug(FK →posts.slug, cascade),attachmentName,attachmentKey, nullablewidth/height. Composite PK on(postSlug, attachmentName), following thepostTags/postAuthorspattern (no synthetic id column).sync-post processor
index*.mdlocale files), including nested subfolders.getContentsRawStream. Images (by extension) are resized and converted to JPEG, mirroring the approach insync-author's profile image pipeline. Non-images are uploaded as-is with a small extension→MIME-type map..jpeg(e.g.banner.png→posts/{post}/attachments/banner.jpeg) rather than keeping the original extension, so nothing infers content-type from a mismatched extension.attachmentNamein the database still stores the original filename exactly as it appeared in the repo. Non-image keys keep their original extension.post_attachmentsrows are compared against the freshly discovered set byattachmentName. Missing attachments are deleted from S3. For attachments still present, an MD5 hash of the final (post-resize, if applicable) content is compared against the existing S3 object's ETag (via the already-present but previously-unuseds3.matchesEtaghelper) — unchanged content is skipped, changed content is deleted and re-uploaded under the same key. The full new row set is written topost_attachmentsin the same transaction as the rest of the post's data.post_attachmentsrows are now fetched and deleted from S3 before thepostsrow is deleted, letting the cascading FK handle the database-side cleanup.Tests
Added coverage in
sync-post/processor.test.tsfor: new attachment upload (including image resize and subfolder recursion), the diffing behavior (skip-unchanged / delete-and-reupload-changed / remove-deleted), and S3 cleanup in the post-deletion (404) cascade. Also extendedtest-utils/setup.tswithpostAttachmentsands3.exists/s3.matchesEtagmocks used by these tests.Follow-ups (not addressed in this PR)
sync-post/processor.tsrather than extracting the logic already duplicated insync-author/processor.tsinto a shared utility. Extracting that felt like a bigger refactor than this issue calls for — worth doing as its own follow-up.content.mdfiles from S3 for any locale — only the new attachment cleanup was added here. That gap predates this change and is worth its own follow-up.resizeAttachmentImagepipesReadable.fromWeb(stream)into the sharp pipeline without explicit error handling on the source stream, so a mid-transfer GitHub fetch failure could surface as an unhandled exception rather than a clean job failure. This same pattern already exists, unfixed, insync-author'sprocessProfileImg, which this code intentionally mirrors — worth fixing in both places together as its own follow-up rather than patching just the new copy here.Test plan
pnpm test:unitpasses (lint, knip, publint, sherif, vitest) — confirmed locallypnpm prettiercheck passes — confirmed locallyposts/{post}/attachments/...Summary by CodeRabbit
New Features
Bug Fixes