[SPARK-54588][SQL] Add time_format function to convert TIME values to formatted string representations#53320
[SPARK-54588][SQL] Add time_format function to convert TIME values to formatted string representations#53320vinodkc wants to merge 1 commit into
Conversation
MaxGekk
left a comment
There was a problem hiding this comment.
1 blocking, 1 non-blocking, 3 nits. A clean, well-tested TIME analogue of date_format (caching, eval/codegen parity, nulls, and since all correct); one error-quality issue plus doc nits.
Note: this PR is closed-unmerged — leaving this for the record / in case the work continues in a successor PR.
Correctness (1)
time_formatleaks a rawjava.time.temporal.UnsupportedTemporalTypeExceptionfor a date-field pattern (e.g.'HH:MM:ss', whereMM= month) instead of a Spark error — syntactically-invalid patterns are wrapped cleanly, but a valid-but-inapplicable letter is not. See inline ontimeExpressions.scala.
Suggestions (1)
- No nanosecond (
SSSSSSSSS, precision 7–9) fractional-second coverage; tests stop at micros. See inline onTimeExpressionsSuite.scala.
Nits (3)
time.sqlcomment claimsMM"returns epoch month (01)" but the golden output for that query is the exception above — factually wrong. Plus a broken@paramfragment infunctions.scalaand "date format" → "time format" in the@ExpressionDescription. See inline.
@vinodkc — would you consider reopening this PR to continue the work? The function is in good shape; the one blocking item is error-handling polish for date-field patterns.
| val formatter = formatterOption.getOrElse { | ||
| TimeFormatter(format.toString, TimeFormatter.defaultLocale, isParsing = false) | ||
| } | ||
| UTF8String.fromString(formatter.format(nanos)) |
There was a problem hiding this comment.
A format that references a date-only field (e.g. MM, yyyy, dd) leaks a raw java.time.temporal.UnsupportedTemporalTypeException: Unsupported field: MonthOfYear here — see the golden output for SELECT time_format(TIME'14:30:45', 'HH:MM:ss') in time.sql.out. TimeFormatter.validatePatternString passes (MM is a valid letter), then LocalTime.format throws at runtime because a TIME has no date fields. Syntactically-invalid patterns are already wrapped cleanly (the 'invalid[[[' test), so this valid-but-inapplicable-letter path is the gap; date_format never hits it (timestamps carry all fields). Suggest validating the pattern to time-applicable fields, or catching the UnsupportedTemporalTypeException and raising a clear Spark error (an INVALID_DATETIME_PATTERN-style message scoped to TIME) rather than leaking the JDK exception.
| SELECT time_format(TIME'14:30:45', 'HH-mm-ss'); | ||
|
|
||
| -- Test common mistake: MM (month) vs mm (minute) | ||
| -- MM is for month, mm is for minute - TIME has no date so MM returns epoch month (01) |
There was a problem hiding this comment.
This comment says MM "returns epoch month (01)", but the golden output for the query below is UnsupportedTemporalTypeException: Unsupported field: MonthOfYear — it throws, it doesn't return 01. Date fields don't default to epoch for a TIME. Worth correcting the comment to say date fields are unsupported and raise an error (this also ties to the error-handling suggestion on the expression).
| * @param time | ||
| * A column of time values to be formatted. | ||
| * @param format | ||
| * A time format string. for valid patterns. |
There was a problem hiding this comment.
@param format reads "A time format string. for valid patterns." — a dangling fragment (a clause referencing the Datetime Patterns doc was dropped). Complete it, e.g. "A time format string. See <a href="https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html\">Datetime Patterns for valid patterns."
|
|
||
| // scalastyle:off line.size.limit | ||
| @ExpressionDescription( | ||
| usage = "_FUNC_(time, format) - Converts a time to a value of string in the format specified by the date format given by the second argument.", |
There was a problem hiding this comment.
Nit: the usage string says "...in the format specified by the date format given by the second argument" — copied from date_format. This function formats a TIME, so it should read "time format".
| TimeFormat(Literal(localTime(9, 5, 0), TimeType()), Literal("hh:mm:ss a")), | ||
| "09:05:00 AM") | ||
| checkEvaluation( | ||
| TimeFormat(timeLit, Literal("HH:mm:ss.SSSSSS")), |
There was a problem hiding this comment.
Optional: the fractional-second cases stop at 6 digits (micros). TIME supports precision up to 9 (nanos), so a TIME'…123456789' with SSSSSSSSS case (which checkEvaluation exercises in both interpreted and codegen paths) would lock in nanosecond formatting.
What changes were proposed in this pull request?
This PR adds a new
time_formatfunction that converts TIME data type values to formatted string representations, providing functionality similar to date_format but specifically designed for TIME values.Why are the changes needed?
Users need a standard way to format TIME values as strings for:
Does this PR introduce any user-facing change?
Yes, this PR adds a new public API function.
Scala API
Python API
SQL Usage
Format Pattern Support
HH:mm:ss14:30:45hh:mm:ss a02:30:45 PMH:mm9:15HH:mm:ss.SSS14:30:45.123HH:mm:ss.SSSSSS14:30:45.123456HH-mm-ss14-30-45'Time:' HH:mmTime: 14:30How was this patch tested?
Added tests in TimeExpressionsSuite, TimeFunctionsSuiteBase
Was this patch authored or co-authored using generative AI tooling?
No