
[SPARK-54588][SQL] Add time_format function to convert TIME values to formatted string representations #53320

Open
vinodkc wants to merge 1 commit into apache:master from vinodkc:br_time_format_support

vinodkc (Contributor) commented Dec 4, 2025

What changes were proposed in this pull request?

This PR adds a new time_format function that converts TIME data type values to formatted string representations, providing functionality similar to date_format but specifically designed for TIME values.

Why are the changes needed?

Users need a standard way to format TIME values as strings for:

  • Display purposes: Presenting time in user-friendly formats
  • Reporting: Generating formatted output for reports and dashboards
  • Data export: Converting TIME values to specific string formats for external systems
  • Localization: Supporting different time display conventions (12-hour vs 24-hour)

Does this PR introduce any user-facing change?

Yes, this PR adds a new public API function.

Scala API

import org.apache.spark.sql.functions._
import spark.implicits._ // enables the $"time_col" syntax; assumes a SparkSession named spark

df.select(time_format($"time_col", "HH:mm:ss"))
df.select(time_format($"time_col", "hh:mm:ss a"))

Python API

from pyspark.sql import functions as F

df.select(F.time_format("time_col", "HH:mm:ss"))
df.select(F.time_format("time_col", "hh:mm:ss a"))

SQL Usage

SELECT time_format(TIME'14:30:45', 'HH:mm:ss');           -- '14:30:45'
SELECT time_format(TIME'14:30:45.1234', 'hh-mm-ss.SS a');         -- '02-30-45.12 PM'
SELECT time_format(TIME'09:05:00', 'h:mm a');             -- '9:05 AM'

Format Pattern Support

Pattern            Description                Example output
HH:mm:ss           24-hour format             14:30:45
hh:mm:ss a         12-hour with AM/PM         02:30:45 PM
H:mm               Hour without zero padding  9:15
HH:mm:ss.SSS       With milliseconds          14:30:45.123
HH:mm:ss.SSSSSS    With microseconds          14:30:45.123456
HH-mm-ss           Custom separator           14-30-45
'Time:' HH:mm      With text literal          Time: 14:30
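To make the pattern letters in the table concrete, here is a minimal pure-Python model of how a subset of Java-style letters (H, h, m, s, S, a, plus single-quoted literals) render a time of day. It is a hypothetical illustration, not Spark's TimeFormatter (which delegates to java.time); the name format_time_pattern is made up for this sketch, and it assumes fractional digits are truncated rather than rounded.

```python
def format_time_pattern(hour, minute, second, nanos, pattern):
    """Hypothetical model of formatting a time of day with a subset of
    java.time-style pattern letters: H, h, m, s, S, a and 'quoted' text."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "'":
            # Quoted literal: copy text up to the closing quote.
            # (Doubled '' escapes are not modeled in this sketch.)
            end = pattern.find("'", i + 1)
            if end == -1:
                raise ValueError("unterminated quoted literal")
            out.append(pattern[i + 1:end])
            i = end + 1
            continue
        if not ch.isalpha():
            out.append(ch)  # separators such as ':', '-', '.', ' '
            i += 1
            continue
        # Measure the run of identical letters: "HH" has width 2.
        j = i
        while j < len(pattern) and pattern[j] == ch:
            j += 1
        width = j - i
        if ch == "H":
            out.append(str(hour).zfill(width))  # hour-of-day, 0-23
        elif ch == "h":
            out.append(str(hour % 12 or 12).zfill(width))  # clock hour, 1-12
        elif ch == "m":
            out.append(str(minute).zfill(width))
        elif ch == "s":
            out.append(str(second).zfill(width))
        elif ch == "S":
            # Fraction: the `width` most significant nano digits, truncated.
            out.append(str(nanos).zfill(9)[:width])
        elif ch == "a":
            out.append("AM" if hour < 12 else "PM")
        else:
            raise ValueError(f"letter {ch!r} is not modeled in this sketch")
        i = j
    return "".join(out)
```

For instance, format_time_pattern(9, 5, 0, 0, "h:mm a") gives "9:05 AM", matching the third SQL example above.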

How was this patch tested?

Added tests in TimeExpressionsSuite and TimeFunctionsSuiteBase.

Was this patch authored or co-authored using generative AI tooling?

No

MaxGekk (Member) left a comment

1 blocking, 1 non-blocking, 3 nits. This is a clean, well-tested TIME analogue of date_format: formatter caching, interpreted/codegen parity, null handling, and the since field are all correct. There is one error-quality issue plus some doc nits.

Note: this PR was closed without being merged. I'm leaving this review for the record, or in case the work continues in a successor PR.

Correctness (1)

  • time_format leaks a raw java.time.temporal.UnsupportedTemporalTypeException for a date-field pattern (e.g. 'HH:MM:ss', where MM = month) instead of raising a Spark error. Syntactically invalid patterns are wrapped cleanly, but a valid pattern letter that does not apply to TIME is not. See inline on timeExpressions.scala.

Suggestions (1)

  • No nanosecond (SSSSSSSSS, precision 7–9) fractional-second coverage; tests stop at micros. See inline on TimeExpressionsSuite.scala.

Nits (3)

  • The time.sql comment claims MM "returns epoch month (01)", but the golden output for that query is the exception above, so the comment is factually wrong.
  • The @param format doc in functions.scala ends in a broken fragment.
  • The usage string in the @ExpressionDescription says "date format" where it should say "time format". See inline for all three.

@vinodkc, would you consider reopening this PR to continue the work? The function is in good shape; the only blocking item is error-handling polish for date-field patterns.

Review comment on timeExpressions.scala:

val formatter = formatterOption.getOrElse {
  TimeFormatter(format.toString, TimeFormatter.defaultLocale, isParsing = false)
}
UTF8String.fromString(formatter.format(nanos))

A format that references a date-only field (e.g. MM, yyyy, dd) leaks a raw java.time.temporal.UnsupportedTemporalTypeException: Unsupported field: MonthOfYear here; see the golden output for SELECT time_format(TIME'14:30:45', 'HH:MM:ss') in time.sql.out. TimeFormatter.validatePatternString accepts the pattern (MM is a valid letter), and LocalTime.format then throws at runtime because a TIME has no date fields. Syntactically invalid patterns are already wrapped cleanly (the 'invalid[[[' test), so this valid-but-inapplicable-letter path is the gap; date_format never hits it because timestamps carry all fields. I'd suggest either restricting the pattern to time-applicable fields during validation, or catching UnsupportedTemporalTypeException and raising a clear Spark error (an INVALID_DATETIME_PATTERN-style message scoped to TIME) instead of leaking the JDK exception.
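One way to realize the first option (validating up front) is an allow-list check over pattern letters that skips quoted literals. The sketch below is a hypothetical Python model only: validate_time_pattern, TimePatternError, and the exact allow-list are assumptions for illustration, not Spark APIs, and a real fix would live in the Scala TimeFormatter and raise a proper Spark error class.

```python
# Assumed allow-list of letters resolvable from a time of day alone:
# hour (H, h, k, K), minute (m), second (s), fraction (S), AM/PM (a).
TIME_LETTERS = frozenset("HhkKmsSa")


class TimePatternError(ValueError):
    """Hypothetical stand-in for an INVALID_DATETIME_PATTERN-style Spark error."""


def validate_time_pattern(pattern):
    """Reject letters that need date fields (e.g. y, M, d) before formatting a TIME."""
    in_literal = False
    for pos, ch in enumerate(pattern):
        if ch == "'":
            # Toggle literal mode; a doubled '' toggles twice, so escaped
            # quotes inside or outside a literal are handled as well.
            in_literal = not in_literal
            continue
        if in_literal or not ch.isascii() or not ch.isalpha():
            continue  # literal text and separators are always allowed
        if ch not in TIME_LETTERS:
            raise TimePatternError(
                f"Pattern letter '{ch}' at position {pos} in '{pattern}' "
                "refers to a field that a TIME value does not have."
            )
    return pattern
```

With this check, 'HH:MM:ss' fails fast on the M at position 3 with a message that names the offending letter, instead of surfacing a JDK exception at format time.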

Review comment on time.sql:

SELECT time_format(TIME'14:30:45', 'HH-mm-ss');

-- Test common mistake: MM (month) vs mm (minute)
-- MM is for month, mm is for minute - TIME has no date so MM returns epoch month (01)

This comment says MM "returns epoch month (01)", but the golden output for the query below is UnsupportedTemporalTypeException: Unsupported field: MonthOfYear. The query throws rather than returning 01, because date fields do not default to epoch values for a TIME. It's worth correcting the comment to say that date fields are unsupported and raise an error (this also ties into the error-handling suggestion on the expression).

Review comment on functions.scala:

* @param time
* A column of time values to be formatted.
* @param format
* A time format string. for valid patterns.

@param format reads "A time format string. for valid patterns.", which is a dangling fragment (the clause referencing the Datetime Patterns doc was dropped). Complete it, e.g. "A time format string. See <a href="https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html">Datetime Patterns</a> for valid patterns."


Review comment on the @ExpressionDescription:

// scalastyle:off line.size.limit
@ExpressionDescription(
usage = "_FUNC_(time, format) - Converts a time to a value of string in the format specified by the date format given by the second argument.",

Nit: the usage string says "...in the format specified by the date format given by the second argument", which was copied from date_format. This function formats a TIME, so it should read "time format".

Review comment on TimeExpressionsSuite.scala:

  TimeFormat(Literal(localTime(9, 5, 0), TimeType()), Literal("hh:mm:ss a")),
  "09:05:00 AM")
checkEvaluation(
  TimeFormat(timeLit, Literal("HH:mm:ss.SSSSSS")),

Optional: the fractional-second cases stop at 6 digits (micros). TIME supports precision up to 9 (nanos), so adding a TIME'…123456789' case with the pattern SSSSSSSSS (checkEvaluation exercises both the interpreted and codegen paths) would lock in nanosecond formatting.
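To pin down the expected strings for such a test, the Python sketch below models how a fraction field of width n is rendered, assuming the java.time.format.DateTimeFormatter rule that nano-of-second is truncated (not rounded) to its n most significant digits. The '02-30-45.12 PM' output for TIME'14:30:45.1234' with SS is consistent with that rule. fraction_digits is a hypothetical helper for illustration; whether a given TIME precision actually carries all nine digits is outside what this sketch checks.

```python
def fraction_digits(nano_of_second, width):
    """Model of a width-n 'S' field: the n most significant of the 9 nano digits."""
    if not 1 <= width <= 9:
        raise ValueError("fraction width must be between 1 and 9")
    if not 0 <= nano_of_second < 1_000_000_000:
        raise ValueError("nano-of-second out of range")
    # Left-pad to 9 digits, then truncate (no rounding): width 6 keeps micros.
    return str(nano_of_second).zfill(9)[:width]


# Expected fraction text for each pattern width, for a value ending in .123456789.
expected = {"S" * w: fraction_digits(123_456_789, w) for w in range(1, 10)}
```

Truncation matters at the boundaries: a value of .999999999 formatted with SSS stays 999 rather than rolling over into the next second.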

vinodkc reopened this Jun 26, 2026