fix(search): frame the root subject, not the first graph node #503
Merged
frameByType framed each subgraph by root type and took `framed['@graph'][0]`. When a root subject one-hop references another subject of the same root type — e.g. a terminology source that is also a separately registered dataset — `jsonld.frame` returns several root nodes, so `[0]` could be the referenced one: it was emitted twice and the referencing subject was dropped. Frame each subgraph by the specific root subject `@id` instead, so exactly that subject is returned. Keeps the original branch structure (no coverage change).
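The difference between the two frames can be sketched as follows. This is a minimal illustration, not the project's code: the helper names `frameForType` and `frameForSubject`, the example context, and the example IRI are all hypothetical. It relies only on JSON-LD framing being able to match nodes by `@type` or by `@id`.

```typescript
// Hypothetical helpers for illustration; only the frame shapes matter here.
type Frame = Record<string, unknown>;

// Before: the frame matches every node of the root type, so a subgraph that
// contains a referenced subject of the same type can yield several root nodes.
function frameForType(context: unknown, rootType: string): Frame {
  return { '@context': context, '@type': rootType };
}

// After: the frame names one subject by IRI, so only that subject is a root.
function frameForSubject(context: unknown, rootIri: string): Frame {
  return { '@context': context, '@id': rootIri };
}

// Assumed example values, not taken from the repository.
const exampleContext = { dcat: 'http://www.w3.org/ns/dcat#' };
const typeFrame = frameForType(exampleContext, 'dcat:Dataset');
const subjectFrame = frameForSubject(
  exampleContext,
  'https://example.org/dataset/collection',
);

console.log(typeFrame, subjectFrame);
```

Either object would be passed as the frame argument to `jsonld.frame`; the surrounding call is left out here because it depends on how the project builds its subgraphs.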
Problem
`frameByType` groups each root subject with the triples of the one-hop nodes it references, then yields `framed['@graph'][0]`. When a root subject references another subject of the same root type, `jsonld.frame({'@type': rootType})` returns several root nodes and `[0]` can be the referenced one, so it is emitted twice and the referencing subject is dropped entirely.

This is not hypothetical. In the NDE Dataset Register a dataset can list a `terminologySource` (or other reference) whose IRI is itself a separately registered `dcat:Dataset`. On the full production corpus this silently dropped about 1% of datasets, each dropped one replaced by a duplicate of the dataset it referenced. No error was raised, because both the projection and the Typesense import still succeed.

Fix
Thread the root subject IRI through `groupByRoot` and yield the framed node whose `@id` matches that root, instead of blindly taking `[0]`.

Validation
Before the fix, `projectGraph` yielded 2625 documents but only 2602 unique (23 dropped); after the fix, 2625 documents, 2625 unique, 0 duplicates, and the previously missing datasets (e.g. the Gouda Tijdmachine knowledge graph) are present.

Local `vitest`/`tsc` could not run in my worktree (its shared `node_modules` predates the `@tpluscode/rdf-ns-builders` dependency on `main`); CI runs the suite with the correct dependencies.
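The failure mode and the uniqueness check can be reproduced on a toy case. The sketch below is an assumption-laden illustration rather than the code in this PR: `FramedNode`, `FramedDocument`, `pickRoot`, `countDocuments` and the example IRIs are hypothetical, and the framed results are written by hand instead of coming from `jsonld.frame`.

```typescript
// Hypothetical shapes for illustration; not the project's real types.
interface FramedNode {
  '@id': string;
  [property: string]: unknown;
}

interface FramedDocument {
  '@graph'?: FramedNode[];
}

// Select the framed node for one root subject instead of trusting array order.
function pickRoot(framed: FramedDocument, rootIri: string): FramedNode | undefined {
  return framed['@graph']?.find((node) => node['@id'] === rootIri);
}

// Count total, unique and duplicate document ids.
function countDocuments(ids: string[]): { total: number; unique: number; duplicates: number } {
  const unique = new Set(ids).size;
  return { total: ids.length, unique, duplicates: ids.length - unique };
}

const thesaurus = 'https://example.org/dataset/thesaurus';
const collection = 'https://example.org/dataset/collection';

// Hand-written stand-ins for type-based frame results, one per root subject.
// The collection references the thesaurus, so its result holds two root nodes,
// and here the referenced one happens to come first.
const cases: Array<[string, FramedDocument]> = [
  [thesaurus, { '@graph': [{ '@id': thesaurus }] }],
  [
    collection,
    {
      '@graph': [
        { '@id': thesaurus },
        { '@id': collection, terminologySource: { '@id': thesaurus } },
      ],
    },
  ],
];

const byIndex: string[] = [];
const byRootIri: string[] = [];
cases.forEach(([rootIri, framed]) => {
  byIndex.push(framed['@graph']?.[0]?.['@id'] ?? '');
  byRootIri.push(pickRoot(framed, rootIri)?.['@id'] ?? '');
});

console.log(countDocuments(byIndex)); // { total: 2, unique: 1, duplicates: 1 }
console.log(countDocuments(byRootIri)); // { total: 2, unique: 2, duplicates: 0 }
```

Taking `[0]` emits the thesaurus twice and loses the collection, while matching on the root IRI emits each subject once. A total-versus-unique comparison like this is cheap to run over the projected documents, which is useful here because neither the projection nor the import fails on its own.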