pyspark.sql.functions.timestamp_diff
pyspark.sql.functions.timestamp_diff(unit, start, end)
Gets the difference between the timestamps in the specified units by truncating the fraction part.
New in version 4.0.0.
- Parameters
- unit : literal string
  This indicates the units of the difference between the given timestamps. Supported options are (case insensitive): “YEAR”, “QUARTER”, “MONTH”, “WEEK”, “DAY”, “HOUR”, “MINUTE”, “SECOND”, “MILLISECOND” and “MICROSECOND”.
- start : Column or column name
  A timestamp which the expression subtracts from endTimestamp.
- end : Column or column name
  A timestamp from which the expression subtracts startTimestamp.
- Returns
Column
the difference between the timestamps.
Examples
>>> import datetime
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame(
...     [(datetime.datetime(2016, 3, 11, 9, 0, 7), datetime.datetime(2024, 4, 2, 9, 0, 7))],
...     ['ts1', 'ts2'])
>>> df.select('*', sf.timestamp_diff('year', 'ts1', 'ts2')).show()
+-------------------+-------------------+-----------------------------+
|                ts1|                ts2|timestampdiff(year, ts1, ts2)|
+-------------------+-------------------+-----------------------------+
|2016-03-11 09:00:07|2024-04-02 09:00:07|                            8|
+-------------------+-------------------+-----------------------------+
>>> df.select('*', sf.timestamp_diff('WEEK', 'ts1', 'ts2')).show()
+-------------------+-------------------+-----------------------------+
|                ts1|                ts2|timestampdiff(WEEK, ts1, ts2)|
+-------------------+-------------------+-----------------------------+
|2016-03-11 09:00:07|2024-04-02 09:00:07|                          420|
+-------------------+-------------------+-----------------------------+
>>> df.select('*', sf.timestamp_diff('day', df.ts2, df.ts1)).show()
+-------------------+-------------------+----------------------------+
|                ts1|                ts2|timestampdiff(day, ts2, ts1)|
+-------------------+-------------------+----------------------------+
|2016-03-11 09:00:07|2024-04-02 09:00:07|                       -2944|
+-------------------+-------------------+----------------------------+
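As a further illustration of how the fractional part of the difference is truncated rather than rounded, the sketch below (not one of the official doctests; the SparkSession setup, app name, and column names are illustrative assumptions) compares the 'HOUR' and 'MINUTE' units for two timestamps that are 90 minutes apart:

import datetime

from pyspark.sql import SparkSession
from pyspark.sql import functions as sf

spark = SparkSession.builder.appName("timestamp_diff_sketch").getOrCreate()

# ts1 and ts2 are exactly 1 hour 30 minutes apart.
df = spark.createDataFrame(
    [(datetime.datetime(2024, 1, 1, 12, 0, 0), datetime.datetime(2024, 1, 1, 13, 30, 0))],
    ["ts1", "ts2"],
)

df.select(
    sf.timestamp_diff("HOUR", "ts1", "ts2").alias("hours"),      # expected 1 (1.5 hours, fraction truncated)
    sf.timestamp_diff("MINUTE", "ts1", "ts2").alias("minutes"),  # expected 90
    sf.timestamp_diff("minute", "ts2", "ts1").alias("negated"),  # expected -90; the unit is case insensitive
).show()

Because the expression computes end minus start, swapping the two timestamp arguments flips the sign of the result, as the last column (and the 'day' example above) shows.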