Search by PID in the Interactive Explorer #314
Merged
…esorg#26)

Adds two-sided normalised PID matching so samples are findable by their persistent identifier (ARK, IGSN, DOI) even though those values never appear in label/description/place_name.

New helpers in assets/js/sql-builders.js:
- canonicalizePid(value): lowercases, strips the resolver-URL prefix (n2t.net, doi.org, arks.org, hdl.handle.net), and collapses classic ARK `ark:/` → modern `ark:` (closes isamplesorg#26).
- looksLikePid(term): heuristic that returns true when the term starts with ark:, igsn:, or doi:, begins with "10." (bare DOI), or is a resolver URL. Plain-text terms are never routed through PID matching; the hot path is unchanged for queries like "pottery" or "basalt".
- pidSearchWhere(rawTerm): builds the SQL fragment LOWER(REPLACE(pid,'ark:/','ark:')) = '<canonical>' OR pid ILIKE '%<localpart>%'. Both sides are normalised, so the stored format (classic/modern ARK, uppercase IGSN) doesn't matter. All user input is passed through escSql/escapeIlikePattern; there is no raw interpolation.

Wire-up in explorer.qmd buildSearchFilter: import the three new helpers; when any search term looksLikePid, OR its pidSearchWhere into the existing fullWhere clause. Non-PID terms are unaffected (fullWhere === searchWhere).

New test file tests/unit/sql-builders-pid.test.mjs: 22 tests covering canonicalizePid (ARK, IGSN, DOI, resolver URLs, whitespace trim), looksLikePid (true/false cases), and pidSearchWhere (SQL shape, injection safety). All 27 unit tests (5 existing + 22 new) pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
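For readers without the diff open, the normalisation and detection rules described in this commit message could look roughly like the sketch below. It is an illustration written from the commit text only, not the code in assets/js/sql-builders.js: the `RESOLVER_PREFIX` constant, the exact regular expressions, and the sample ARK (built on the `99999/fk4` test prefix) are assumptions.

```javascript
// Hypothetical sketch of the two helpers named in the commit message.
// The real implementations in assets/js/sql-builders.js may differ.

// Resolver hosts listed in the commit message (assumed to be matched as URL prefixes).
const RESOLVER_PREFIX =
  /^https?:\/\/(?:n2t\.net|doi\.org|arks\.org|hdl\.handle\.net)\//;

// Lowercase, trim, strip a resolver-URL prefix, and collapse classic `ark:/` to modern `ark:`.
function canonicalizePid(value) {
  let pid = String(value).trim().toLowerCase();
  pid = pid.replace(RESOLVER_PREFIX, "");
  if (pid.startsWith("ark:/")) {
    pid = "ark:" + pid.slice("ark:/".length);
  }
  return pid;
}

// Heuristic: only scheme-bearing terms, bare DOIs, and resolver URLs count as PIDs.
function looksLikePid(term) {
  const t = String(term).trim().toLowerCase();
  return (
    /^(?:ark:|igsn:|doi:)/.test(t) || // explicit scheme prefix
    t.startsWith("10.") ||            // bare DOI
    RESOLVER_PREFIX.test(t)           // resolver URL
  );
}

console.log(canonicalizePid("https://n2t.net/ark:/99999/fk4example"));
console.log(looksLikePid("pottery"));
```

Under these assumptions, the classic and modern spellings of an ARK canonicalise to the same string, and plain words such as "pottery" are never classified as PIDs, which is the property the commit relies on to keep the text-search path unchanged.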
Adds a `pid:` query prefix so users can find a sample by a bare local identifier fragment without knowing the scheme (IGSN, ARK, DOI).

Typing `pid:IEGIL000C` or `pid:k2000027w` in the search box emits:

    pid ILIKE '%<fragment>%' ESCAPE '\'

DuckDB ILIKE is case-insensitive, so no LOWER is needed; the canonical exact-match arm is intentionally skipped because the substring already spans all scheme variants. The prefix itself is stripped case-insensitively (`PID:`, `Pid:`, `pid:` all work).

Changes to assets/js/sql-builders.js:
- looksLikePid: adds `pid:` to the list of recognised prefixes.
- pidSearchWhere: new fast path for `pid:` terms that returns a single bare ILIKE predicate instead of the canonical exact-match + localpart pair. All other (scheme-bearing) terms keep existing behaviour unchanged.

New tests in sql-builders-pid.test.mjs (7 additional, 34 total):
- looksLikePid recognises pid: in multiple cases
- pidSearchWhere emits correct bare ILIKE for pid:IEGIL000C and pid:k2000027w
- Case-insensitive prefix strip (PID:, Pid:)
- Injection safety: single-quote doubling and LIKE metachar escaping
- Explicit Option-A confirmation: bare words without scheme not routed via PID

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
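The `pid:` fast path described above can be sketched as follows. This is a minimal illustration, not the merged code: `pidPrefixWhere` is a hypothetical stand-alone name (in the PR the logic sits inside `pidSearchWhere`), and the bodies of `escSql` and `escapeIlikePattern` are assumptions based on the "single-quote doubling and LIKE metachar escaping" wording in the test list.

```javascript
// Hypothetical sketch of the `pid:` fast path; the merged code may differ.

// Assumed behaviour: double single quotes so the value is safe inside a SQL string literal.
function escSql(s) {
  return String(s).replace(/'/g, "''");
}

// Assumed behaviour: backslash-escape the LIKE metacharacters %, _ and the escape char itself.
function escapeIlikePattern(s) {
  return String(s).replace(/[\\%_]/g, "\\$&");
}

// Returns a bare ILIKE predicate for `pid:<fragment>` terms, or null for anything else.
function pidPrefixWhere(rawTerm) {
  const match = /^pid:(.*)$/i.exec(String(rawTerm).trim()); // prefix is case-insensitive
  if (!match) return null;
  const fragment = match[1].trim();
  if (fragment === "") return null; // nothing to search for
  // Escape LIKE metacharacters first, then quote for the SQL literal.
  const pattern = escSql(escapeIlikePattern(fragment));
  return `pid ILIKE '%${pattern}%' ESCAPE '\\'`;
}

console.log(pidPrefixWhere("pid:IEGIL000C"));
console.log(pidPrefixWhere("PID:k2000027w"));
console.log(pidPrefixWhere("pottery"));
```

Escaping the LIKE metacharacters before doubling quotes keeps the two concerns separate: the first step controls what the pattern matches, and the second only protects the string literal that carries it.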
What changes for users

Typing a plain word into the Explorer search box still does a regular text search across labels, place names, and descriptions; nothing changes there. But now, if you paste in a persistent identifier (PID), such as an ARK, DOI, IGSN, handle, or a resolver URL like `https://n2t.net/ark:/...`, the search detects it and does an exact match against that sample's identifier instead of a fuzzy text search. There's also an explicit escape hatch: prefixing your query with `pid:` (e.g. `pid:k2000027w`) runs a scheme-agnostic substring match against the identifier, useful when you only have a fragment of the PID or aren't sure which scheme it uses.

Also resolves #26

This PR also fixes the ARK classic/modern collapse from #26: classic-form ARKs (`ark:/...`) and modern-form ARKs (`ark:...`) are now canonicalized before matching, so searching with either form finds the same record. Per triage, #26 ⊂ #278, so this PR addresses both.

Implementation

- `assets/js/sql-builders.js`: adds `canonicalizePid`, `looksLikePid`, and `pidSearchWhere`, which detect PID-shaped input, normalize across ARK/IGSN/DOI/handle schemes and resolver-URL prefixes, and build an injection-safe SQL WHERE clause (exact match for detected PIDs, ILIKE substring match for the `pid:` escape hatch).
- `explorer.qmd`: wires this into `buildSearchFilter` so PID detection runs ahead of the existing text-search path, without disturbing it.
- `tests/unit/sql-builders-pid.test.mjs`: 29 new unit tests covering canonicalization, detection, and WHERE-clause generation (including SQL-injection-safety cases for both quote and LIKE-metacharacter escaping). Full unit suite is 42/42 passing after rebase onto current `main`.

This branch was rebased onto `upstream/main` to pick up the squash-merged #300/#302 filtered-clusters work; only the two PID-search commits are new relative to `main`.

Closes #278, closes #26

🤖 Generated with Claude Code