Add synonym expansion to component lexical search #2425
Warning: This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite. This stack of pull requests is managed by Graphite.
🎩 Preview: A preview build has been created at:
Branch updated from 2655160 to dce82a1, then from dce82a1 to f5a29c0.
🤖 Code review: Add synonym expansion to component lexical search

The architecture here is the strong part: synonyms expand on the query side only, and the comment explaining why is exactly right. Expanding the index as well would intersect a query token set against a ballooned index token set and surface components that match neither the literal text nor the intent. The single-word-key guard and the …

Findings:

The synonym list itself is sensible domain coverage for an ML component library.
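To make the query-side-only design concrete, here is a minimal TypeScript sketch. It is not the PR's code: `SYNONYMS`, `expandQuery`, `score`, `rank`, the sample components, and the scoring weights (one point per matched token, +10 for a phrase match) are all hypothetical, and `baseSearchTokens` here is a simple stand-in for the real function. Component text is indexed with its literal tokens only; the query is expanded before matching, while the phrase bonus is checked against the original, pre-expansion query tokens.

```typescript
// Hypothetical sketch of query-side-only synonym expansion. Component text is
// indexed with literal tokens; only the query is expanded before matching.

interface Component {
  name: string;
  description: string;
}

// Illustrative subset of the synonym groups listed in the PR description.
const SYNONYMS = new Map<string, string[]>([
  ["train", ["train", "fit"]],
  ["fit", ["train", "fit"]],
  ["predict", ["predict", "infer"]],
  ["infer", ["predict", "infer"]],
]);

// Simple stand-in for baseSearchTokens: lowercase, split on non-alphanumerics.
function baseSearchTokens(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Query side only: each token fans out to its synonym group (or to itself).
function expandQuery(tokens: string[]): string[] {
  const out = new Set<string>();
  tokens.forEach((t) => (SYNONYMS.get(t) ?? [t]).forEach((s) => out.add(s)));
  return Array.from(out);
}

function score(component: Component, query: string): number {
  // Index side: literal tokens only, never expanded.
  const docText = baseSearchTokens(`${component.name} ${component.description}`).join(" ");
  const docTokens = new Set(docText.split(" "));
  const originalTokens = baseSearchTokens(query);

  // One point per expanded query token found literally in the component text.
  let total = expandQuery(originalTokens).filter((t) => docTokens.has(t)).length;

  // Phrase bonus compares the ORIGINAL (pre-expansion) sequence, so a synonym
  // cannot manufacture a multi-word phrase match.
  const phrase = originalTokens.join(" ");
  if (originalTokens.length > 1 && ` ${docText} `.includes(` ${phrase} `)) {
    total += 10;
  }
  return total;
}

// Keep components with any match, highest score first.
function rank(components: Component[], query: string): string[] {
  return components
    .map((c) => ({ name: c.name, score: score(c, query) }))
    .filter((r) => r.score > 0)
    .sort((a, b) => b.score - a.score)
    .map((r) => r.name);
}

// Hypothetical sample components.
const COMPONENTS: Component[] = [
  { name: "Train test split", description: "Split a dataset into train and test sets" },
  { name: "Train model", description: "Train a model on a dataset" },
  { name: "Batch predict", description: "Run predictions on a dataset" },
];
```

With these sample components, `rank(COMPONENTS, "infer")` returns `["Batch predict"]`, and for `"train test split"` the exact-name component scores 13 (three literal token hits plus the phrase bonus) against 1 for "Train model". A query such as `"fit test split"` still matches the same tokens through expansion but earns no phrase bonus, because the bonus compares the unexpanded sequence.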

Description
Adds a synonym expansion system to the component lexical search so that queries using common aliases resolve to the intended components. For example, searching `gcs` now surfaces storage-related components, `fit` surfaces training components, `infer` surfaces prediction components, and `df` surfaces dataframe/table components.

A new `componentSearchSynonyms.ts` module defines synonym groups (e.g. `gcs ↔ storage ↔ bucket`, `train ↔ fit`, `predict ↔ infer`, `df ↔ dataframe ↔ table`) and exposes `expandSynonymTokens`, which fans out any recognized token into all members of its group.

The search pipeline was also refactored to separate base tokenization (`baseSearchTokens`) from the full normalized text used for document indexing. Synonym expansion is applied to query tokens before scoring, and the phrase-match bonus now uses the original (pre-expansion) token sequence so multi-word phrase matching remains accurate.

Related Issue and Pull requests
Type of Change
Checklist
Screenshots (if applicable)
Test Instructions
1. Search for `gcs` and confirm storage/bucket components appear at the top.
2. Search for `fit` and confirm model training components appear.
3. Search for `infer` and confirm prediction components appear.
4. Search for `df` and confirm dataframe/table components appear.
5. Confirm that a multi-word query (e.g. `train test split`) still correctly ranks exact name matches above partial matches.

Additional Comments
The synonym groups are intentionally domain-neutral and kept in a single flat list in `componentSearchSynonyms.ts` so they are easy to extend with additional aliases. This is not an exhaustive list.
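A flat group list like this might be turned into a token lookup as follows. This is a hedged sketch, not the contents of `componentSearchSynonyms.ts`: the group members come from the PR description, but `SYNONYM_GROUPS`, `buildSynonymLookup`, and the handling of multi-word terms (skipped, as one possible reading of the "single-word-key guard" mentioned in the review) are assumptions, and the `expandSynonymTokens` signature shown is illustrative.

```typescript
// Hypothetical sketch of a flat synonym module. Group members come from the
// PR description; names and structure are illustrative only.
const SYNONYM_GROUPS: string[][] = [
  ["gcs", "storage", "bucket"],
  ["train", "fit"],
  ["predict", "infer"],
  ["df", "dataframe", "table"],
];

// Build a token -> synonyms lookup once at module load. Multi-word terms are
// skipped (an assumed "single-word-key guard"), since queries are matched one
// token at a time. A term listed in several groups gets the union of those
// groups; membership is not followed transitively.
function buildSynonymLookup(groups: string[][]): Map<string, string[]> {
  const lookup = new Map<string, string[]>();
  for (const group of groups) {
    const terms = group
      .map((term) => term.trim().toLowerCase())
      .filter((term) => term.length > 0 && !/\s/.test(term));
    for (const term of terms) {
      const merged = new Set<string>(lookup.get(term) ?? []);
      terms.forEach((t) => merged.add(t));
      lookup.set(term, Array.from(merged));
    }
  }
  return lookup;
}

const SYNONYM_LOOKUP = buildSynonymLookup(SYNONYM_GROUPS);

// Fan each recognized token out to every member of its group(s). Unrecognized
// tokens pass through unchanged; duplicates are removed, first-seen order kept.
function expandSynonymTokens(tokens: string[]): string[] {
  const expanded = new Set<string>();
  for (const token of tokens) {
    const key = token.toLowerCase();
    expanded.add(key);
    (SYNONYM_LOOKUP.get(key) ?? []).forEach((synonym) => expanded.add(synonym));
  }
  return Array.from(expanded);
}
```

Here `expandSynonymTokens(["gcs"])` returns `["gcs", "storage", "bucket"]`, and `expandSynonymTokens(["fit", "model"])` returns `["fit", "train", "model"]`, with the unknown token `model` passed through unchanged. Because the lookup is derived once from the flat list, adding an alias is a one-line change to `SYNONYM_GROUPS` and nothing else in the pipeline has to change.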