#313 P1+P3: replace the boot-time facet-index full scan with a tiny trusted manifest#317
Merged
rdhyee merged 6 commits intoJul 2, 2026
Merged
Conversation
…st, derived from samp_geo) New build_sample_facet_index_meta() computes the per-source histogram directly from samp_geo (the same authoritative located-universe table build_sample_facet_index/build_facet_summaries already derive from), NOT by reading back sample_facet_index.parquet itself -- independence is the point, per Codex's 2026-07-01 review: an independent validator can then read the actual on-disk index and prove meta/index/facet_summaries agree. Registered in ARTIFACTS/HIER_ARTIFACTS, deliberately excluded from force_deps so `--only sample_facet_index_meta` alone builds just the meta file -- the escape hatch for pairing a new meta with an already-deployed index built from the same wide input. Part of isamplesorg#313 P1+P3 (facetIndexReady latency fix); validator + explorer.qmd wiring + P3 decoupling + P6 targeted test to follow in this branch.
…ainst the real index New --index-meta gate in validate_frontend_derived.py: schema/shape checks, then (given --index) a FRESH full scan of the actual on-disk sample_facet_index recomputes the per-source histogram/build_id/schema_version/row_count and diffs it against the manifest via symmetric EXCEPT (relational content, not byte identity) -- this is the independence Codex's review required: the validator does not trust meta's self-reported numbers or read meta back to derive its own expectation. Also cross-checks meta against facet_summaries' source facet, mirroring the comparison the explorer runtime performs. Continues isamplesorg#313 P1+P3 (see prior commit).
…ntract Adds SERIALIZATIONS.md §4.13 and a DATA_PROVENANCE.md summary line for the new manifest artifact: independence from sample_facet_index (built from samp_geo, not read back), the --only escape hatch, and the R2 same-build_id pairing requirement.
…decouple masks scan P1: facetIndexReady now reads index_meta_url (a few KB, built at compile time from samp_geo and independently validated against the real index) instead of scanning the 9.68MB sample_facet_index.parquet directly. Same checks (schema version, node_bits generation match, per-source coverage vs facet_summaries), same data, just sourced from the cheap pre-verified manifest. The big index file is now touched only lazily, when a user's actual multi-filter count query runs -- never during the readiness check. P3: split nodeBitsReady into nodeBitsCoreReady (step 1, node_bits fetch, publishes __nodeBitsMap/__nodeBitsBuild) and a thinner nodeBitsReady (step 2, the 9.67MB masks scan). facetIndexReady now depends on nodeBitsCoreReady only -- previously it depended on the whole nodeBitsReady cell, which meant it couldn't even start until the masks scan finished, even though the values it needs are published synchronously before that scan begins. nodeBitsReady itself now awaits facetIndexReady's settlement (ready or failed, either is fine) before starting the masks scan, so the two don't race for the single DuckDB-WASM connection -- same discipline as whenConnectionIdle elsewhere in this file. Completes the explorer.qmd side of isamplesorg#313 P1+P3 (see prior two commits for the data-pipeline side: build_frontend_derived.py + validate_frontend_derived.py).
…ding/failed race Adds a narrow firefox-facet-index-meta Playwright project scoped to ONE new spec (tests/playwright/facet-index-meta-pending.spec.js), not a broad Firefox enable. Test 1 uses page.route() to hold/release the sample_facet_index_meta fetch and proves window.__facetIndexStatus stays 'pending' while held and settles (ready/failed) once released. Test 2 exercises the exact UI contract for 2 active Material filters at global view across pending -> failed -> ready, reusing the real production handleFacetFilterChange/ updateCrossFilteredCounts code path. Empirical finding baked into the design (documented in the spec's header): DuckDB-WASM's non-threaded worker serializes queries, so holding the meta fetch open also starves the Material facet's own independent query -- a real held request and "Material checkboxes interactive" can't coexist in a single fresh page load. Test 2 therefore drives window.__facetIndexStatus directly (the same global the real preflight sets) after a normal boot, which lets it assert the pending/failed contract deterministically and still trigger a REAL count query for the 'ready' step (sample_facet_index and facet_node_bits are already live on R2; only the new meta manifest isn't). That real query was confirmed to genuinely start against production but did not resolve within the spec's window in this sandboxed environment (a large, network-bound full-file read) -- so the 'ready' step is a best-effort/soft check, not a hard CI assertion, with the reasoning documented inline. Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01XEtSoXjsKtnYWQ7yS8mGRo
…ntract spec Test 2 (pending -> failed -> ready UI contract) failed on repeat local runs: the DOM was still showing the "(Loading…)" pending state when the test expected "(—)" failed, well past the original 45s poll window. Tried and reverted: blocking the real sample_facet_index_meta fetch to "neutralize" the real boot-time preflight racing the test's manual window.__facetIndexStatus injections. That reintroduces the exact FIFO single-worker starvation the spec's own DESIGN NOTE documents -- Material's facet_tree_summaries query gets stuck behind the held route on the same DuckDB-WASM worker, so the checkboxes this test needs never render at all. Root cause is more likely general single-worker query-queue congestion in this sandbox's network path to data.isamples.org (the same Firefox slowness already documented for the 'ready' step) occasionally delaying the pending->failed repaint past 45s, not a status race -- the real preflight resolves to 'failed' quickly (a 404, not a large download) well before this test's manual steps run. Fix: generous-but-bounded timeouts (45s -> 90s) on both the pending and failed polls, test.setTimeout 180s -> 300s to give them room. Verified 3/3 clean runs locally after the change (previously flaked on run 2 of 2). Also verified independently: 46/46 unit tests, 39/39 python pipeline tests, explorer-smoke (chromium) all still pass.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What this fixes
Part of #313 (Explorer slowness on slow connections). Live repro today: a URL with a preset facet filter at continental zoom took ~45-50 seconds to fully resolve on current production — reproduced independent of any recent changes, so this is a pre-existing latency issue, not a regression.
facetIndexReadyinexplorer.qmdcurrently does two expensive things against the livesample_facet_index.parquet(9.68 MB, ~6M rows) on every page load, blocking multi-filter count readiness:SELECT DISTINCT build_id, schema_version FROM read_parquet(index_url)— touches build_id/schema_version columns across every row group of the 9.68 MB file.SELECT source, COUNT(*) FROM read_parquet(index_url) GROUP BY sourcevsfacet_summaries— a full 6M-row scan.This PR eliminates both, per the joint Claude+Codex mitigation plan from the original 2026-06-26 investigation (P0, the "Loading…" honesty-state fix, already shipped as #316).
P1 — trusted build-time manifest
sample_facet_index_meta.parquetartifact (scripts/build_frontend_derived.py): a tiny (~1 KB) per-source histogram +build_id/schema_version/total_rows, computed directly fromsamp_geo— the same authoritative tablesample_facet_indexitself derives from, not read back from the index (independence is the point: a buggy index build could carry self-consistent-but-wrong metadata).scripts/validate_frontend_derived.py): reads the actual on-disksample_facet_index.parquet(full scan — fine at build/CI time, never the browser critical path) and asserts it matches the manifest.explorer.qmd'sfacetIndexReadynow reads the tiny manifest instead of scanning the big index. Same checks (schema version, node_bits generation match, coverage vsfacet_summaries), same data, just a cheaper source. The big index is now touched only lazily, when a user's actual multi-filter query runs.--only sample_facet_index_metabuilds just the meta file without forcing a full index rebuild — for pairing a new meta file with an already-deployed index built from the same input (see deployment note below).P3 — decouple the masks scan from the readiness gate
facetIndexReadypreviously waited on the entirenodeBitsReadycell, including a 9.67 MB masks scan it doesn't actually need (only__nodeBitsBuild, set after a 2 KB fetch). Split intonodeBitsCoreReady(fast) +nodeBitsReady(masks scan, now sequenced to run afterfacetIndexReadysettles so the two don't contend for the single DuckDB-WASM connection).P6 (targeted) — Firefox regression spec
Narrow
firefox-facet-index-metaPlaywright project, scoped to one new spec proving the pending→failed→ready UI contract and that a held/blocked manifest fetch never produces a permanent-looking stuck state.Verification
explorer-smoke(chromium), and the new Firefox spec (3/3 clean runs) all pass.sample_facet_index.parquet, and verified via the live public URL that thebuild_ids are identical.No R2 write access was available while building this, so the new
sample_facet_index_meta.parquetfile has not been uploaded. Built locally at/Users/raymondyee/Data/iSample/pqg_refining/staged_202608/p1_meta_local/isamples_202608_sample_facet_index_meta.parquet(also verified reproducible independently, sha256-matched inputs).This is safe to merge before the upload happens: today,
facetIndexReadyalways ends in'failed'(no index at all reachable in a useful way for this check). After merging but before uploading the new file, it will still end in'failed', just via a fast 404 instead of a slow scan — a net improvement to the failure path with zero behavior change to the success path (which simply isn't reachable yet either way). It becomes fully active (fast'ready'path) the momentisamples_202608_sample_facet_index_meta.parquetis uploaded to R2 (isamples-rybucket) alongside the existingisamples_202608_sample_facet_index.parquet— samebuild_id, confirmed paired above.Relates to #313 (not closing — P1+P3 shipped; P2 DuckDB-WASM upgrade and P4/P5 remain deferred per the original review).
🤖 Generated with Claude Code