Reproduce the issue
Create an SQL table (in Snowflake, for example) with a date column that contains null values.
Run a simple read/write pipeline (composed of TableNameInput and S3Output components) that writes to an Avro file on S3. The schema of the output file is correct, but the values are not: null dates are written as -1.
Root cause analysis
Everything started with this commit, 3 years ago, when, for some reason, null dates started being stored as the value -1.
When getting a date field, a conversion is done to return null instead of -1.
However, this ad hoc way of storing dates has a drawback: developers must remember to implement the "-1 => null" logic in every implementation of Record.
That is the problem here: when an AvroRecord is converted to an IndexedRecord, the conversion is missing, so -1 is returned instead of null.
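To make the failure mode concrete, here is a minimal sketch of how a sentinel value leaks as soon as one read path skips the conversion. The class and method names (SentinelRecord, getDate, getRaw) are hypothetical stand-ins for illustration only, not the actual Record or AvroRecord code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for a record that stores null dates as -1.
final class SentinelRecord {
    static final long NULL_DATE = -1L; // sentinel written in place of a null date

    private final Map<String, Long> values = new HashMap<>();

    void setDate(String name, Long epochMillis) {
        values.put(name, epochMillis == null ? NULL_DATE : epochMillis);
    }

    // Read path that remembers the "-1 => null" conversion.
    Long getDate(String name) {
        Long raw = values.get(name);
        return raw == null || raw == NULL_DATE ? null : raw;
    }

    // Read path that forgets the conversion and exposes the stored value as-is.
    Object getRaw(String name) {
        return values.get(name);
    }
}

class SentinelLeakDemo {
    public static void main(String[] args) {
        SentinelRecord record = new SentinelRecord();
        record.setDate("created_at", null);
        System.out.println(record.getDate("created_at")); // prints "null"
        System.out.println(record.getRaw("created_at"));  // prints "-1"
    }
}
```

In this sketch, every new read path has to repeat the check done in getDate, which is the kind of omission described above.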
This can be fixed in two ways:
- Add the conversion logic for this case, and remember to do the same for future use cases.
- Fix it once and for all by storing null dates as null (technically, by not storing them), as is done for other types.
Both solutions have been implemented in the PR.
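The second option can be sketched as follows. This is an illustration under simplified assumptions, not the code from the PR; NullSafeRecord and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified record that does not store null dates at all.
final class NullSafeRecord {
    private final Map<String, Long> values = new HashMap<>();

    void setDate(String name, Long epochMillis) {
        if (epochMillis == null) {
            values.remove(name); // a null date is simply not stored
        } else {
            values.put(name, epochMillis);
        }
    }

    // An absent entry reads back as null, so no sentinel check is needed.
    Long getDate(String name) {
        return values.get(name);
    }

    // Any other read path sees the same value without extra conversion logic.
    Object getRaw(String name) {
        return values.get(name);
    }
}

class NullSafeDemo {
    public static void main(String[] args) {
        NullSafeRecord record = new NullSafeRecord();
        record.setDate("created_at", null);
        System.out.println(record.getDate("created_at")); // prints "null"
        System.out.println(record.getRaw("created_at"));  // prints "null"
    }
}
```

With this shape, a genuine stored value of -1 also stays distinguishable from a null date, because nothing is reserved as a sentinel.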