[RFC] Refactor timestamp codecs #333

@flipp5b

Currently, LocalDateTime is the "base" type for INT96 timestamp encoding/decoding: conversions for Timestamp and Instant go through LocalDateTime. I see a small mismatch here:

  • If I'm not mistaken, the INT96 format is UTC-adjusted.
  • LocalDateTime needs an additional piece of data, a time zone, to become UTC-adjusted.
  • Instant, on the other hand, is already UTC-adjusted (and Timestamp can be converted directly to Instant).

So it may be a little confusing that you need to specify a time zone to encode/decode an Instant or a Timestamp.
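To illustrate the point about time zones, here is a minimal sketch using only java.time and java.sql. The object name ZoneDependenceSketch and the sample values are made up for illustration and are not part of parquet4s:

```scala
import java.sql.Timestamp
import java.time.{Instant, LocalDateTime, ZoneOffset}

// Hypothetical helper object, for illustration only.
object ZoneDependenceSketch {
  val local: LocalDateTime = LocalDateTime.of(2021, 1, 1, 12, 0)

  // The same LocalDateTime maps to different instants depending on the zone.
  val inUtc: Instant     = local.atZone(ZoneOffset.UTC).toInstant
  val inPlusOne: Instant = local.atZone(ZoneOffset.ofHours(1)).toInstant

  // A Timestamp converts to an Instant (and back) without any zone.
  val fromTimestamp: Instant = Timestamp.from(inUtc).toInstant
}
```

Only the LocalDateTime conversion takes a zone argument; the Instant and Timestamp conversions do not.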

Instead, we could use Instant as the "base" type, as follows:

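private[parquet4s] object TimeValueCodecs {
// ...
  private val SecondsPerDay = TimeUnit.DAYS.toSeconds(1)

  // Encodes an Instant as a 12-byte INT96 value:
  // 8 bytes of nanoseconds within the day, then 4 bytes of Julian day, little-endian.
  def encodeInstant(instant: Instant): Value = BinaryValue {
    // Seconds since the start of the Julian day count; no time zone is involved
    val julianSec  = instant.getEpochSecond + JulianDayOfEpoch * SecondsPerDay
    val julianDays = julianSec / SecondsPerDay
    // Nanoseconds elapsed since the start of that day
    val nanos      = TimeUnit.SECONDS.toNanos(julianSec % SecondsPerDay) + instant.getNano

    ByteBuffer
      .allocate(12)
      .order(ByteOrder.LITTLE_ENDIAN)
      .putLong(nanos)
      .putInt(julianDays.toInt)
      .array()
  }
// ...
}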

trait TimeValueEncoders {
  implicit val localDateTimeEncoder: OptionalValueEncoder[LocalDateTime] = new OptionalValueEncoder[LocalDateTime] {
    def encodeNonNull(data: LocalDateTime, configuration: ValueCodecConfiguration): Value =
      TimeValueCodecs.encodeInstant(localDateTimeToInstant(data, configuration.timeZone))
  }

  implicit val instantEncoder: OptionalValueEncoder[Instant] = new OptionalValueEncoder[Instant] {
    def encodeNonNull(data: Instant, configuration: ValueCodecConfiguration): Value =
      TimeValueCodecs.encodeInstant(data)
  }

  implicit val sqlTimestampEncoder: OptionalValueEncoder[java.sql.Timestamp] = new OptionalValueEncoder[Timestamp] {
    def encodeNonNull(data: Timestamp, configuration: ValueCodecConfiguration): Value =
      TimeValueCodecs.encodeInstant(data.toInstant)
  }
}

In that case, a time zone is needed only for LocalDateTime, and the encodeInstant method itself looks a bit simpler than encodeLocalDateTime.
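For reference, the decoding side could mirror this. The following is a self-contained sketch, not the library's actual code: the object name Int96Sketch and the decode method are hypothetical, it works on raw byte arrays instead of BinaryValue, and it assumes JulianDayOfEpoch is 2440588 (the Julian day number of 1970-01-01).

```scala
import java.nio.{ByteBuffer, ByteOrder}
import java.time.Instant
import java.util.concurrent.TimeUnit

// Hypothetical stand-alone sketch; names are not taken from parquet4s.
object Int96Sketch {
  // Assumption: Julian day number of 1970-01-01.
  val JulianDayOfEpoch: Long = 2440588L
  val SecondsPerDay: Long    = TimeUnit.DAYS.toSeconds(1)

  // Same layout as encodeInstant above: nanos of day (8 bytes), Julian day (4 bytes).
  def encode(instant: Instant): Array[Byte] = {
    val julianSec  = instant.getEpochSecond + JulianDayOfEpoch * SecondsPerDay
    val julianDays = Math.floorDiv(julianSec, SecondsPerDay)
    val nanosOfDay =
      TimeUnit.SECONDS.toNanos(Math.floorMod(julianSec, SecondsPerDay)) + instant.getNano

    ByteBuffer
      .allocate(12)
      .order(ByteOrder.LITTLE_ENDIAN)
      .putLong(nanosOfDay)
      .putInt(julianDays.toInt)
      .array()
  }

  // Reads the two fields back in the same order and rebuilds the Instant.
  def decode(bytes: Array[Byte]): Instant = {
    val buffer     = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN)
    val nanosOfDay = buffer.getLong
    val julianDays = buffer.getInt.toLong
    val epochSec   = (julianDays - JulianDayOfEpoch) * SecondsPerDay
    // The second argument is a nanosecond adjustment added to epochSec.
    Instant.ofEpochSecond(epochSec, nanosOfDay)
  }
}
```

With this shape, neither direction takes a time zone; a zone would only be applied afterwards when converting the decoded Instant to a LocalDateTime.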
