Transformation into data class corrupts TimeZones of Instant #1047

Open
@martinitus

Description

Reproduction:

@Test
fun foo() {

    data class TimeZonesTest(val with_timezone_offset: Instant, val without_timezone_offset: LocalDateTime)

    val csvContent =
        """
        with_timezone_offset,without_timezone_offset
        2024-12-12T13:00:00+01:00,2024-12-12T13:00:00
        """.trimIndent()

    val df = DataFrame.readCsv(
        csvContent.byteInputStream(),
        // colTypes = mapOf("with_timezone_offset" to ColType.Instant), // *1
        // parserOptions = ParserOptions(dateTimeFormatter = ISO_OFFSET_DATE_TIME), // *2
    )

    println(df)
    println(df.schema())
    val parsed = df.toListOf<TimeZonesTest>().first()
    assertEquals(Instant.parse("2024-12-12T13:00:00+01:00"), parsed.with_timezone_offset)
}

This outputs:

   with_timezone_offset without_timezone_offset
 0     2024-12-12T12:00        2024-12-12T13:00

with_timezone_offset: kotlinx.datetime.LocalDateTime
without_timezone_offset: kotlinx.datetime.LocalDateTime

org.opentest4j.AssertionFailedError: 
Expected :2024-12-12T12:00:00Z
Actual   :2024-12-12T11:00:00Z
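
The one-hour difference is consistent with the offset being applied twice. This is a reading of the output above, not something confirmed from the library source: the value is first normalised to UTC and stored without its offset (12:00), and the zone-less value is then interpreted in a +01:00 zone when it is converted to an Instant (11:00Z). A minimal sketch of that arithmetic using only java.time; the helper names `toUtcLocal` and `reinterpret` are hypothetical, and the +01:00 zone is an assumption about the machine running the test:

```kotlin
import java.time.Instant
import java.time.LocalDateTime
import java.time.OffsetDateTime
import java.time.ZoneOffset

// Hypothetical helper: normalise an offset timestamp to UTC, then drop the offset.
fun toUtcLocal(text: String): LocalDateTime =
    OffsetDateTime.parse(text)
        .withOffsetSameInstant(ZoneOffset.UTC)
        .toLocalDateTime()

// Hypothetical helper: interpret a zone-less value as wall-clock time in an assumed zone.
fun reinterpret(local: LocalDateTime, assumedZone: ZoneOffset): Instant =
    local.toInstant(assumedZone)

// The instant the CSV cell actually denotes.
fun expectedInstant(text: String): Instant =
    OffsetDateTime.parse(text).toInstant()
```

With the cell `2024-12-12T13:00:00+01:00`, `toUtcLocal` gives `2024-12-12T12:00`, and `reinterpret(..., ZoneOffset.ofHours(1))` gives `2024-12-12T11:00:00Z`, while `expectedInstant` gives `2024-12-12T12:00:00Z`. These match the "Actual" and "Expected" values in the failure above.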

Changing the dateTimeFormatter (*2) has no effect on the test outcome.
Explicitly telling the parser to parse as Instant (*1) fixes the issue.

However, following the principle of least surprise, IMHO it would be a lot better to:

  • either have the conversion fail, because the column was parsed as LocalDateTime, carries no time zone information, and therefore cannot be converted into an Instant,
  • or automatically detect that the data contains a time zone offset and parse the column as Instant directly.
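
To make the two options concrete, here is a rough sketch of what I mean, written against java.time only. Nothing here is DataFrame API: `ParsedColumn`, `parseTemporalColumn` and `requireInstants` are hypothetical names, and the detection rule (try an offset-aware parse first, fall back to a local parse) is just one possible approach.

```kotlin
import java.time.Instant
import java.time.LocalDateTime
import java.time.OffsetDateTime
import java.time.format.DateTimeParseException

// Hypothetical result type; not part of the DataFrame API.
sealed interface ParsedColumn
data class InstantColumn(val values: List<Instant>) : ParsedColumn
data class LocalDateTimeColumn(val values: List<LocalDateTime>) : ParsedColumn

// Second option: if every cell carries an offset, keep it and produce Instants;
// otherwise fall back to zone-less LocalDateTime values.
fun parseTemporalColumn(cells: List<String>): ParsedColumn =
    try {
        InstantColumn(cells.map { OffsetDateTime.parse(it).toInstant() })
    } catch (e: DateTimeParseException) {
        LocalDateTimeColumn(cells.map { LocalDateTime.parse(it) })
    }

// First option: refuse to turn zone-less values into Instants instead of guessing a zone.
fun requireInstants(column: ParsedColumn): List<Instant> = when (column) {
    is InstantColumn -> column.values
    is LocalDateTimeColumn -> throw IllegalArgumentException(
        "Column was parsed as LocalDateTime and has no time zone information"
    )
}
```

With the CSV from the reproduction, `parseTemporalColumn(listOf("2024-12-12T13:00:00+01:00"))` yields an `InstantColumn`, whereas the `without_timezone_offset` cell yields a `LocalDateTimeColumn` that `requireInstants` rejects.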

Metadata

Labels

bug: Something isn't working
csv: CSV / delim related issues