Liberal date/time parsing in Pig

(Related to Liberal date/time parsing in Joda-Time)

Hey, we're dealing with a Pig column that contains mixed date formats: for some records it's 09/11/2004 00:00:00, and for others it's 09/11/2004 00:00:00.000000.

We tried parsing it with Pig 0.11's ToDate, which internally uses Joda-Time's DateTimeFormat.forPattern(DataType.toString(input.get(1))). With each pattern we tried:

  • With MM/dd/yyyy HH:mm:ss we get: Invalid format: "12/31/1969 00:00:00" is too short at org.joda.time.format.DateTimeFormatter.parseDateTime

  • With MM/dd/yyyy HH:mm:ss.000000 we get: Invalid format: "09/25/2009 00:00:00.000000" is malformed at ".000000"

Can you suggest a time format that will handle both? Do we need to use a custom ToDate function? Thanks!

Jon Skeet

You can use DateTimeFormatterBuilder to achieve this with an optional part:

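import org.joda.time.format.*;

class Test {
    // Mandatory date/time part followed by an optional six-digit fraction.
    // Joda-Time keeps millisecond precision, so digits past the third
    // are parsed but not retained.
    private static final DateTimeFormatter formatter =
        new DateTimeFormatterBuilder()
            .appendPattern("MM/dd/yyyy HH:mm:ss")
            .appendOptional(DateTimeFormat.forPattern(".SSSSSS").getParser())
            .toFormatter();

    public static void main(String[] args) {
        // Both input shapes from the question parse with the same formatter.
        testParse("09/11/2004 00:00:00");
        testParse("09/11/2004 00:00:00.000000");
    }

    private static void testParse(String input) {
        System.out.println(formatter.parseLocalDateTime(input));
    }
}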
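
For comparison only, here is a minimal sketch of the same optional-fraction idea using java.time (Java 8+) instead of Joda-Time. This is not part of the answer above and it is an assumption that java.time is available to you; Pig's built-in ToDate goes through Joda-Time, so this would only apply in standalone code or a custom function. The class name OptionalFractionDemo and the parse helper are made up for illustration.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

class OptionalFractionDemo {
    // In java.time patterns, square brackets mark an optional section,
    // so the six-digit fraction may be present or absent.
    private static final DateTimeFormatter FORMATTER =
        DateTimeFormatter.ofPattern("MM/dd/yyyy HH:mm:ss[.SSSSSS]");

    // Hypothetical helper: parses either input shape from the question.
    static LocalDateTime parse(String input) {
        return LocalDateTime.parse(input, FORMATTER);
    }

    public static void main(String[] args) {
        System.out.println(parse("09/11/2004 00:00:00"));
        System.out.println(parse("09/11/2004 00:00:00.000000"));
    }
}
```

Both calls should yield the same LocalDateTime, since the fraction in the second input is all zeros.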
