fix(staged_insert): converge metadata shape with ObjectCodec.encode #1465
Open
dimitri-yatsenko wants to merge 1 commit into
Bring the staged-insert metadata dict into structural equality with the
ordinary insert1 path, per the Staged Insert Specification
(datajoint-docs#177). Without this change the same content stored via
the two paths yields different column dicts:
ObjectCodec.encode   -> {path, store, size, ext, is_dir, item_count, timestamp}
staged (directory)   -> {path, size, hash, ext, is_dir, item_count, timestamp}
staged (single file) -> {path, size, hash, ext, is_dir, timestamp, mime_type?}
The staged path now emits the encode shape, which is the canonical one. Drops:
- hash: None (never carried information)
- mime_type (file case) (not in encode shape)
The file case now also carries item_count: None (matching encode).
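As an illustration of the convergence described above, the sketch below maps a pre-change staged dict onto the encode key set. The helper `to_encode_shape` and the sample values are hypothetical and not part of DataJoint; only the key names come from the shapes listed above.

```python
from typing import Optional

# Key set of the encode shape, as listed in the description above.
ENCODE_KEYS = ("path", "store", "size", "ext", "is_dir", "item_count", "timestamp")


def to_encode_shape(old_staged: dict, store: Optional[str]) -> dict:
    """Hypothetical helper: project a pre-change staged dict onto the encode keys."""
    merged = {**old_staged, "store": store}
    # Keys missing from the old dict (item_count in the single-file case)
    # become None; keys outside the encode shape (hash, mime_type) are dropped.
    return {key: merged.get(key) for key in ENCODE_KEYS}


# Made-up single-file metadata in the old staged shape.
old_file_meta = {
    "path": "schema/table/raw_data/recording.dat",
    "size": 1024,
    "hash": None,
    "ext": ".dat",
    "is_dir": False,
    "timestamp": "2025-01-01T00:00:00+00:00",
    "mime_type": "application/octet-stream",
}
new_file_meta = to_encode_shape(old_file_meta, store="local")
```

After the projection, the single-file dict carries `item_count: None` and a `store` entry, and no longer carries `hash` or `mime_type`.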
Also fix the store_name divergence: staged_insert resolved the backend
from stores.default regardless of the field's type spec. A field declared
<object@local> would write through stores.default — and the store key
recorded in the metadata column would point at the wrong store. Now
resolve store_name from attr.store via resolve_dtype() and use that for
both path/backend resolution and the metadata's store field.
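The idea of per-field store resolution can be sketched as follows. This is not DataJoint's `resolve_dtype()`; the function `store_from_type_spec`, its regular expression, and the rule that an empty store name (as in `<object@>`) falls back to the default store are assumptions made for illustration.

```python
import re
from typing import Optional

# Assumed spec grammar for this sketch: "<codec>" or "<codec@store_name>".
_TYPE_SPEC = re.compile(r"^<(?P<codec>\w+)(?:@(?P<store>[\w-]*))?>$")


def store_from_type_spec(type_spec: str, default_store: str) -> Optional[str]:
    """Hypothetical helper: pick the store named by a field's type spec."""
    match = _TYPE_SPEC.match(type_spec.strip())
    if match is None:
        raise ValueError(f"unrecognized type spec: {type_spec!r}")
    store = match.group("store")
    if store is None:
        # No "@" part: this sketch treats the field as having no external store.
        return None
    # Assumption: an empty name after "@" means the configured default store.
    return store or default_store
```

The point of the sketch is that the store is derived from the field's own declaration and then reused for both the write target and the recorded `store` value, so the two cannot disagree.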
Drop the .manifest.json sidecar that the staged path wrote and the
encode path didn't. The metadata dict already records total size and
item_count; per-file listings are recoverable by walking the canonical
directory if ever needed.
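A minimal sketch of recovering a per-file listing by walking a directory, assuming the store is a local filesystem path (a remote backend would need its own listing call). The function name `summarize_directory` is hypothetical, and counting only files toward `item_count` is an assumption of this sketch.

```python
import os
from pathlib import Path


def summarize_directory(root):
    """Hypothetical helper: list files under root with sizes, plus totals."""
    root = Path(root)
    files = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = Path(dirpath) / name
            # Record the path relative to root in POSIX form, with its byte size.
            files.append((full.relative_to(root).as_posix(), full.stat().st_size))
    files.sort()
    return {
        "size": sum(size for _, size in files),
        "item_count": len(files),
        "files": files,
    }
```

Since the totals can be recomputed this way, the listing does not need to be persisted alongside the object.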
Fix docstrings that showed `staged.rec['raw_data'] = z` — the framework
computes the metadata; the caller does not assign anything to the
staged field on staged.rec.
Add tests/integration/test_object.py::TestStagedInsert::
test_staged_insert_metadata_shape_matches_encode covering both the
single-file and directory cases against ObjectCodec.encode for
equivalent content.
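The kind of check such a test performs might look like the sketch below. It is not the test added in this PR; `assert_same_shape` is a hypothetical helper, and the list of fields compared by value is taken from the PR description (`store`, `ext`, `is_dir`, `item_count`).

```python
# Fields whose values should agree between the two paths for equivalent content.
DETERMINISTIC_FIELDS = ("store", "ext", "is_dir", "item_count")


def assert_same_shape(staged_meta: dict, encode_meta: dict) -> None:
    """Hypothetical helper: compare a staged metadata dict with an encode one."""
    # The key sets must be identical: no extra keys, no missing keys.
    assert set(staged_meta) == set(encode_meta), (
        f"key mismatch: {sorted(set(staged_meta) ^ set(encode_meta))}"
    )
    # Path, size, and timestamp may legitimately differ between two separate
    # writes, so only the deterministic fields are compared by value.
    for field in DETERMINISTIC_FIELDS:
        assert staged_meta[field] == encode_meta[field], f"{field} differs"
```

Comparing key sets rather than whole dicts keeps the check stable across runs while still catching a stray `hash` or a missing `store`.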
Slated for DataJoint 2.3.
Summary
Bring `staged_insert1` into compliance with the Staged Insert Specification (datajoint-docs#177). The spec defines a normative metadata-dict shape; today's implementation diverges from it in two ways that are observable in the database.
Slated for the DataJoint 2.3 release.
What changes
1. Metadata shape now matches `ObjectCodec.encode` exactly.
The same content stored via the two insert paths used to yield different column dicts:
`ObjectCodec.encode` (`builtin_codecs/object.py:166-174`) -> `{path, store, size, ext, is_dir, item_count, timestamp}`
staged (directory) -> `{path, size, hash, ext, is_dir, item_count, timestamp}`
staged (single file) -> `{path, size, hash, ext, is_dir, timestamp, mime_type?}`
After this PR, `_compute_metadata` returns the encode shape in both cases:
- `hash: None` dropped (never carried information)
- `mime_type` dropped (not in encode shape)
- `item_count: None` added in the single-file case
- `store: <resolved store name>` added
2. Per-field store resolution.
Today's `staged_insert` resolves the backend from `stores.default` regardless of the field's type spec. A field declared `<object@local>` would write through `stores.default`, and the `store` key recorded in the row's metadata column would name the wrong store (or be missing entirely). Now we resolve `store_name` from `attr.store` via `resolve_dtype()` and use that for both path/backend resolution and the metadata's `store` field, matching what ordinary `insert1` does through `table.py:1342`.
3. Drops the `.manifest.json` sidecar.
The staged path was writing a per-object manifest file that the encode path doesn't produce. The metadata dict already records total `size` and `item_count`; per-file listings are recoverable by walking the canonical directory. Removing the sidecar eliminates a small storage-layout divergence between the two paths.
4. Docstring fix.
The module docstring and the `staged_insert1` function docstring both showed `staged.rec['raw_data'] = z`. Per the spec, the caller does not assign anything to `staged.rec[<staged field>]`; the framework computes the metadata dict. The implementation already overwrote the user's assignment in `_finalize`, so this was harmless but misleading. Examples updated.
Test added
`tests/integration/test_object.py::TestStagedInsert::test_staged_insert_metadata_shape_matches_encode` exercises both the single-file and directory paths, captures the metadata dict assigned to the row, and asserts:
- [...] `ObjectCodec.encode`'s output exactly for equivalent content
- [...] (`store`, `ext`, `is_dir`, `item_count`)
Test plan
- [...] `tests/integration/test_object.py` pass
Related
datajoint-docs#177's "Implementation status" note refers to once merged. After both merge, the spec ceases to be forward-looking for the<object@>path.