[SPARK-57716][SQL][TESTS] Add SORT / ORDER BY and window correctness tests for nanosecond-precision timestamp types#56811
Conversation
|
cc @MaxGekk PTAL. |
MaxGekk
left a comment
There was a problem hiding this comment.
0 blocking, 2 non-blocking, 0 nits.
Clean, well-documented tests-only PR. The suites are built so they genuinely observe nanosecond ordering (tie-break fixtures share epochMicros and differ only in nanosWithinMicro, so the distinct row-number / rank / strict-ordered collect is the real proof), assertions are correct, placement matches the existing TimestampNanos*SuiteBase family, and the cache intercept correctly documents the current gap. I independently confirmed the six infrastructure claims the comments rely on (testing-default flag, atomicTypes membership, ORC vectorized nanos getters, the cache-write throw path, findWiderDateTimeType widening, and the golden-file session-zone / bare-LTZ-literal round-trip) against the current source. Two minor non-blocking notes only.
Design / architecture (1)
- PR description: the body says this "mirrors the MIN/MAX follow-up which was likewise tests-only," but there is no
MinMaxsuite in the tree — MIN/MAX coverage appears to live insideTimestampNanosFunctionsSuiteBase. Minor description imprecision; not load-bearing.
Suggestions (1)
- TimestampNanosWindowSuiteBase.scala:101: LTZ
row_numbertest covers only ASC while its NTZ sibling covers ASC+DESC — see inline.
| } | ||
| } | ||
|
|
||
| test("row_number over a nanosecond TIMESTAMP_LTZ orders by the sub-microsecond part") { |
There was a problem hiding this comment.
Minor (non-blocking): this LTZ row_number test asserts only the ASC ordering, whereas its NTZ sibling (row_number over a nanosecond TIMESTAMP_NTZ ...) asserts both ASC and DESC. The comparator is type-shared so the risk is low, but adding the DESC arm here would make the NTZ/LTZ pair symmetric and guard the LTZ descending path explicitly.
There was a problem hiding this comment.
added the DESC arm to the LTZ row_number test.
fd5d84d to
7d93c3c
Compare
…sion timestamp types Co-authored-by: Isaac
7d93c3c to
f0c8798
Compare
What changes were proposed in this pull request?
Tests-only coverage for SORT / ORDER BY and window functions over the nanosecond-precision
timestamp types
TIMESTAMP_NTZ(p)/TIMESTAMP_LTZ(p)(pin[7, 9]):TimestampNanosWindowSuiteBase(+ ANSI on/off):row_number/rank/dense_rank/lag/leadover a nanosecond ordering key, NTZ and LTZ, whole-stage codegen on and off. No existing test asserts a window function over a timestamp ordering key.TimestampNanosSortSuiteBase(+ ANSI on/off): the DataFrame/SQL scenarios not already covered generically byOrderingSuite/SortSuite(which exercise these types viaDataTypeTestUtils.atomicTypes) — a public-APIORDER BYsub-microsecond tie-break smoke test, mixed-precisionUNIONordering, a vectorized-ORC-read-then-sort, and aninterceptdocumenting that caching a nanosecond column is not supported yet.ORDER BY+row_number+leadsection appended totimestamp-ntz-nanos.sql/timestamp-ltz-nanos.sql.Why are the changes needed?
SORT / ORDER BY and window functions already work over the nanosecond types (they ride on the orderability / hashing /
UnsafeRowprimitives), but had no dedicated test coverage. This locks the behaviour in, mirroring the earlier tests-only MIN/MAX follow-up (whose coverage lives inTimestampNanosFunctionsSuiteBase). Joins are deferred to a later change.Does this PR introduce any user-facing change?
No, tests only.
How was this patch tested?
New suites.
Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Claude Opus 4.8)