  1. Talend Component Kit
  2. TCOMP-2021

Missing logic when handling null date values in Record


    Description

      Reproduce the issue

      Create an SQL table (in Snowflake, for example)

      create or replace table TEST_DATE(
       date DATE
      );
      INSERT INTO TEST_DATE
      VALUES
       ('1989-07-07'),
       ('1989-07-07'),
       ('1989-07-07'),
       (null),
       (null);
      

       

      Run a simple read/write pipeline (composed of TableNameInput and S3Output components) that writes an Avro file to S3. The schema of the output file is correct; the values, however, are not.

      The schema

      {
        "type" : "record",
        "name" : "Record_1_2789471843289431261",
        "namespace" : "org.talend.sdk.component.schema.generated",
        "fields" : [ {
          "name" : "DATE",
          "type" : [ "null", {
            "type" : "long",
            "logicalType" : "timestamp-millis",
            "talend.component.DATETIME" : "true"
          } ]
        } ]
      }

      The data

      {"DATE":{"long":615772800000}}
      {"DATE":{"long":615772800000}}
      {"DATE":{"long":615772800000}}
      {"DATE":{"long":-1}}
      {"DATE":{"long":-1}}

      Expected data

      {"DATE":{"long":615772800000}}
      {"DATE":{"long":615772800000}}
      {"DATE":{"long":615772800000}}
      {"DATE":null}
      {"DATE":null}

       

      Root cause analysis

      Everything started with this commit, 3 years ago, when, for some reason, null dates began to be stored as the value -1.
      When a date field is read, a conversion is applied so that null is returned instead of -1.
      However, this ad-hoc way of storing dates has a limitation: developers must remember to implement the "-1 => null" logic in every implementation of Record.
      That is the problem here: when an AvroRecord is converted to an IndexedRecord, this conversion is missing, so -1 is returned instead of null.
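
The failure mode described above can be illustrated with a minimal, self-contained sketch. It does not use the Talend Component Kit API: `SentinelRecord`, `getRaw` and `NULL_DATE_SENTINEL` are hypothetical names introduced only for this example. It also shows why -1 is a risky sentinel for timestamp-millis: it decodes to a valid instant one millisecond before the epoch.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

public class SentinelDateSketch {

    // Hypothetical sentinel, mirroring the "-1 means null" convention from the ticket.
    static final long NULL_DATE_SENTINEL = -1L;

    // Hypothetical stand-in for a Record implementation that stores null dates as -1.
    static final class SentinelRecord {
        private final Map<String, Long> values = new HashMap<>();

        void setDateTime(String name, Long epochMillis) {
            values.put(name, epochMillis == null ? NULL_DATE_SENTINEL : epochMillis);
        }

        // Read path that remembers the "-1 => null" conversion.
        Long getDateTime(String name) {
            Long raw = values.get(name);
            return (raw == null || raw == NULL_DATE_SENTINEL) ? null : raw;
        }

        // Read path that forgets the conversion, as in the reported bug.
        Long getRaw(String name) {
            return values.get(name);
        }
    }

    public static void main(String[] args) {
        SentinelRecord record = new SentinelRecord();
        record.setDateTime("DATE", null);

        System.out.println(record.getDateTime("DATE")); // null
        System.out.println(record.getRaw("DATE"));      // -1

        // The sentinel is itself a valid timestamp, so a consumer cannot tell it from real data.
        System.out.println(Instant.ofEpochMilli(record.getRaw("DATE"))); // 1969-12-31T23:59:59.999Z
        System.out.println(Instant.ofEpochMilli(615772800000L));         // 1989-07-07T00:00:00Z
    }
}
```

Any code path that reads the stored value without going through the converting getter leaks the sentinel, which matches the -1 values seen in the Avro output.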

      This can be fixed in two ways:

      1. Add the conversion logic for this case, and remember to do the same for future use cases.
      2. Fix it once and for all by storing null dates as null (technically, by not storing them at all), as is done for other types.

      Both solutions have been implemented in the PR.
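
As a rough illustration of the second option, the sketch below skips null values on write, so every read path sees null without any date-specific conversion. It is not the code from the PR: `SimpleRecord`, `with` and `toRow` are hypothetical names, and the real Record and IndexedRecord types are replaced by a plain map and list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AbsentNullSketch {

    // Hypothetical record that simply does not store null values.
    static final class SimpleRecord {
        private final Map<String, Object> values = new HashMap<>();

        SimpleRecord with(String name, Object value) {
            if (value != null) {
                values.put(name, value);
            }
            return this;
        }

        Object get(String name) {
            return values.get(name); // null when the field was never stored
        }
    }

    // Hypothetical converter: copies fields in order and needs no sentinel handling.
    static List<Object> toRow(SimpleRecord record, List<String> fieldNames) {
        List<Object> row = new ArrayList<>();
        for (String name : fieldNames) {
            row.add(record.get(name));
        }
        return row;
    }

    public static void main(String[] args) {
        List<String> fields = List.of("DATE");
        System.out.println(toRow(new SimpleRecord().with("DATE", 615772800000L), fields)); // [615772800000]
        System.out.println(toRow(new SimpleRecord().with("DATE", null), fields));          // [null]
    }
}
```

With this design, a new converter cannot forget the conversion, because there is no sentinel left to convert.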

            People

              emmanuel_g emmanuel gallois
              jdarrous Jad Darrous
              Jad Darrous