  1. Talend DI components
  2. TDI-46440

TCK schema is not complete when generated with AvroToRecord tool


Details

    Description

      I have an Avro payload (see the attached file customers_orders.avro) that I am trying to convert to a TCK record, using the org.talend.components.common.stream.input.avro.AvroToRecord tool.
      The input Avro payload contains a "customer" array of records. The schema of these records is the following:

      {
      	"type": "record",
      	"name": "customer",
      	"fields": [{
      		"name": "custid",
      		"type": ["null", "string"],
      		"default": null
      	},
      	{
      		"name": "name",
      		"type": ["null", "string"],
      		"default": null
      	},
      	{
      		"name": "address",
      		"type": ["null", {
      			"type": "record",
      			"name": "address",
      			"fields": [{
      				"name": "street",
      				"type": ["null", "string"],
      				"default": null
      			},
      			{
      				"name": "city",
      				"type": ["null", "string"],
      				"default": null
      			},
      			{
      				"name": "zipcode",
      				"type": ["null", "string"],
      				"default": null
      			}]
      		}],
      		"default": null
      	},
      	{
      		"name": "rating",
      		"type": ["null", "int"],
      		"default": null
      	}]
      }
      

      Some customers don't have all the information: for example, the last customer doesn't have a zipcode.

      I noticed that when I convert this Avro payload to a TCK record, some nested records have an incomplete TCK schema. For example, for the last customer, who has no zipcode, the generated TCK schema attached to the corresponding TCK record does not have a zipcode field. It should have one, since the corresponding Avro schema declares a zipcode field.

      The code of AvroToRecord could be changed slightly so that it generates a complete TCK schema:

      • Line 152: pass inferSchema(record) as the argument when calling recordBuilderFactory.newRecordBuilder
            private void buildArrayField(org.apache.avro.Schema.Field field, Collection<?> value, Record.Builder recordBuilder,
                    Entry entry) {
                final org.apache.avro.Schema arraySchema = AvroHelper.getUnionSchema(field.schema());
                final org.apache.avro.Schema arrayInnerType = arraySchema.getElementType();
        
                final Collection<?> objectArray;
                switch (arrayInnerType.getType()) {
                case RECORD:
                    objectArray = ((Collection<GenericRecord>) value).stream()
                            .map(record -> avroToRecord(record, arrayInnerType.getFields(), recordBuilderFactory.newRecordBuilder(inferSchema(record))))
                            .collect(Collectors.toList());
                    break;
        
      • Line 188: pass inferSchema((GenericRecord) value) as the argument when calling recordBuilderFactory.newRecordBuilder
          protected void buildField(org.apache.avro.Schema.Field field, Object value, Record.Builder recordBuilder, Entry entry) {
              String logicalType = field.schema().getProp(AVRO_LOGICAL_TYPE);
              org.apache.avro.Schema.Type fieldType = AvroHelper.getFieldType(field);
              switch (fieldType) {
              case RECORD:
                  recordBuilder.withRecord(entry, avroToRecord((GenericRecord) value, ((GenericRecord) value).getSchema().getFields(),
                          recordBuilderFactory.newRecordBuilder(inferSchema((GenericRecord) value))));
                  break;
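
      The difference between the two ways of obtaining a schema can be illustrated with a toy model. This is only a sketch: SchemaSketch and its methods are hypothetical and use neither the TCK nor the Avro API, the sample street and city values are made up, and it assumes that a record builder created without a schema derives its schema from the entries that actually carry a value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: hypothetical types, not the TCK or Avro API. */
final class SchemaSketch {

    /** Value-driven: only fields that carry a non-null value end up in the schema. */
    static List<String> schemaFromValues(Map<String, Object> values) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Object> entry : values.entrySet()) {
            if (entry.getValue() != null) {
                names.add(entry.getKey());
            }
        }
        return names;
    }

    /** Schema-driven: every declared field is kept, whether or not a value is present. */
    static List<String> schemaFromDeclaration(List<String> declaredFields) {
        return new ArrayList<>(declaredFields);
    }

    public static void main(String[] args) {
        // Field names taken from the "address" record of the Avro schema above.
        List<String> declared = Arrays.asList("street", "city", "zipcode");

        // An address without a zipcode; the values are invented for the example.
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("street", "1 Example Street");
        address.put("city", "Exampleville");
        address.put("zipcode", null);

        System.out.println(schemaFromValues(address));       // [street, city]
        System.out.println(schemaFromDeclaration(declared)); // [street, city, zipcode]
    }
}
```

      Under that assumption, passing the inferred schema to newRecordBuilder corresponds to the second method: the nested record keeps its zipcode field even when the value is absent.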
      

      Attachments

        1. customers_orders.avro (3 kB)
        2. image (36).png (55 kB)
        3. image (37).png (131 kB)
        4. image (38).png (118 kB)
        5. screenshot-1.png (48 kB)
        6. screenshot-2.png (134 kB)
        7. screenshot-3.png (122 kB)

            People

              pteyssier pierre teyssier
              timbault Tony Imbault
              Christophe LeSaec, Oleksandra Tkachenko (Inactive)