What is the maximum (and minimum) java.time.Instant value that Spark supports?

java.time.Instant

In Spark 3.0, the Java 8 time API is used both in Spark's internal datetime computations and in user-facing APIs; for example, an Instant object is mapped to the Spark SQL type TimestampType. Internally, TimestampType represents a time instant as a Long count of microseconds, as its documentation states:

/**
 * The timestamp type represents a time instant in microsecond precision.
 * Valid range is [0001-01-01T00:00:00.000000Z, 9999-12-31T23:59:59.999999Z] where
 * the left/right-bound is a date and time of the proleptic Gregorian
 * calendar in UTC+00:00.
 *
 * Please use the singleton `DataTypes.TimestampType` to refer the type.
 * @since 1.3.0
 */
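
As a sanity check, a value at the documented upper bound encodes without complaint. A minimal spark-shell sketch (it assumes the implicit conversions that spark-shell imports by default, i.e. spark.implicits._ is in scope):

import java.time.Instant

// The documented maximum of TimestampType, parsed as an Instant.
val docMax = Instant.parse("9999-12-31T23:59:59.999999Z")
// Encoding succeeds: the documented range sits well inside what a Long
// count of microseconds can physically hold.
Seq(docMax).toDF("ts")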

Instant, on the other hand, stores the number of seconds before or after the epoch in a Long field and the nanosecond-of-second in an Int field, so its maximum and minimum values far exceed the range that Spark's TimestampType can hold.

scala> Instant.MAX
res22: java.time.Instant = +1000000000-12-31T23:59:59.999999999Z
scala> Instant.MIN
res25: java.time.Instant = -1000000000-01-01T00:00:00Z
scala> Instant.EPOCH
res26: java.time.Instant = 1970-01-01T00:00:00Z
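
Under the hood, those bounds come straight from the two fields described above; you can inspect them with plain JDK calls, no Spark involved (the commented values are the MAX_SECOND constant and nanosecond ceiling from OpenJDK's Instant source):

// seconds-from-epoch field (a Long) and nanosecond-of-second field (an Int)
Instant.MAX.getEpochSecond   // 31556889864403199
Instant.MAX.getNano          // 999999999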

So just how large an Instant instance can Spark actually accept?

scala> val t = Instant.ofEpochSecond(9223372036854L, 775807999)
t: java.time.Instant = +294247-01-10T04:00:54.775807999Z

scala> Seq(t).toDF
res20: org.apache.spark.sql.DataFrame = [value: timestamp]

scala> val t = Instant.ofEpochSecond(9223372036854L, 775808000)
t: java.time.Instant = +294247-01-10T04:00:54.775808Z

scala> Seq(t).toDF
java.lang.RuntimeException: Error while encoding: java.lang.ArithmeticException: long overflow
staticinvoke(class org.apache.spark.sql.catalyst.util.DateTimeUtils$, TimestampType, instantToMicros, input[0, java.time.Instant, true], true, false) AS value#67
  at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Serializer.apply(ExpressionEncoder.scala:215)
  at org.apache.spark.sql.SparkSession.$anonfun$createDataset$1(SparkSession.scala:466)
  at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
  at scala.collection.immutable.List.foreach(List.scala:392)
  at scala.collection.TraversableLike.map(TraversableLike.scala:238)
  at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
  at scala.collection.immutable.List.map(List.scala:298)
  at org.apache.spark.sql.SparkSession.createDataset(SparkSession.scala:466)
  at org.apache.spark.sql.SQLContext.createDataset(SQLContext.scala:353)
  at org.apache.spark.sql.SQLImplicits.localSeqToDatasetHolder(SQLImplicits.scala:231)
  ... 47 elided
Caused by: java.lang.ArithmeticException: long overflow
  at java.lang.Math.addExact(Math.java:809)
  at org.apache.spark.sql.catalyst.util.DateTimeUtils$.instantToMicros(DateTimeUtils.scala:411)
  at org.apache.spark.sql.catalyst.util.DateTimeUtils.instantToMicros(DateTimeUtils.scala)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
  at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Serializer.apply(ExpressionEncoder.scala:211)
  ... 56 more
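
These exact boundary numbers fall out of the Long arithmetic. Judging from the stack trace, DateTimeUtils.instantToMicros scales the seconds to microseconds and adds the nanoseconds truncated to microseconds, both with overflow-checked (Exact) operations, so the true ceiling is Long.MaxValue microseconds. A minimal sketch of that arithmetic (an assumption drawn from the multiply-then-add shape visible in the trace, not a quote of Spark's source):

val maxMicros = Long.MaxValue          // 9223372036854775807 microseconds
val seconds   = maxMicros / 1000000L   // 9223372036854 -> the seconds argument above
val microRem  = maxMicros % 1000000L   // 775807
// Any nano value in [775807000, 775807999] truncates to 775807 microseconds,
// so the sum is exactly Long.MaxValue and still fits. 775808000 ns truncates
// to 775808, the sum exceeds Long.MaxValue, and Math.addExact throws the
// "long overflow" seen above.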

This is clearly inconsistent with what the Spark API documentation says: the largest Timestamp value actually representable is `+294247-01-10T04:00:54.775807999Z`, considerably larger than the documented `9999-12-31T23:59:59.999999Z`, yet still many orders of magnitude short of Instant's true maximum.
The lower bound is constrained the same way; a quick way to probe it yourself is sketched below.
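
An untested sketch for finding the minimum, under the same multiply-then-add assumption: since the seconds are scaled to microseconds first with overflow checking, and the nanosecond field can only add a non-negative amount, an epochSecond of -9223372036855 should already overflow at the multiply step.

Seq(Instant.ofEpochSecond(-9223372036854L, 0)).toDF   // expected to encode
Seq(Instant.ofEpochSecond(-9223372036855L, 0)).toDF   // expected: long overflow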
