The preparation function spends a lot of time building a new Record with a new Schema.
Optimisations could be made in the TCK Record and Record.Builder.
Record.BuilderImpl contains a Map<String, Entry> that is only instantiated when a schema is provided and is not exposed through any public function, so the client has to scan all entries to find the right one.
A function such as "Entry getEntry(String name)" would therefore be a welcome addition.
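As an illustration only, here is a minimal sketch of what such an accessor could look like. The classes `BuilderEntrySketch`, `Entry` and `Builder` below are hypothetical stand-ins, not the actual TCK `Record.Builder` or `Schema.Entry` types, and the internal map is assumed to be keyed by entry name.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BuilderEntrySketch {

    // Hypothetical stand-in for a schema entry (not the TCK Schema.Entry type).
    public static final class Entry {
        public final String name;

        public Entry(String name) {
            this.name = name;
        }
    }

    // Hypothetical builder that keeps its entries indexed by name.
    public static final class Builder {
        // LinkedHashMap keeps the declaration order of the fields.
        private final Map<String, Entry> entriesByName = new LinkedHashMap<>();

        public Builder(List<Entry> schemaEntries) {
            for (Entry entry : schemaEntries) {
                entriesByName.put(entry.name, entry);
            }
        }

        // Proposed accessor: direct lookup by name, null when the field is unknown.
        public Entry getEntry(String name) {
            return entriesByName.get(name);
        }
    }

    public static void main(String[] args) {
        Builder builder = new Builder(List.of(new Entry("id"), new Entry("label")));
        System.out.println(builder.getEntry("label").name);
        System.out.println(builder.getEntry("missing"));
    }
}
```

With an accessor of this kind, the client no longer needs to iterate over every entry to find one field.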
Builder.withNewSchema uses code like:
Note that schema.getEntry(name) scans a list until it finds the right Entry. This means that if the schema contains N fields and the new schema M, it will make up to N×M comparisons. (For example, if the schema contains 50 fields and the new schema 51 (one field added), that is up to 2550 comparisons.)
Putting the schema entries in a Map, for example, would reduce this to M×log(N) with a TreeMap, and would be statistically more efficient with a HashMap (about M lookups on average).
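The difference between the two lookup strategies can be sketched as follows. This is a simplified model under stated assumptions, not the TCK code: `EntryLookupSketch`, `Entry`, `fields`, `countScanComparisons` and `countMatchesWithIndex` are hypothetical names, and fields are simply named `f0`, `f1`, and so on.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class EntryLookupSketch {

    // Hypothetical stand-in for a schema entry (not the TCK Schema.Entry type).
    public static final class Entry {
        public final String name;

        public Entry(String name) {
            this.name = name;
        }
    }

    // Builds a list of entries named f0, f1, ..., f(count-1).
    public static List<Entry> fields(int count) {
        List<Entry> entries = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            entries.add(new Entry("f" + i));
        }
        return entries;
    }

    // Counts name comparisons when each target entry is searched by scanning the source list.
    public static long countScanComparisons(List<Entry> source, List<Entry> target) {
        long comparisons = 0;
        for (Entry wanted : target) {
            for (Entry candidate : source) {
                comparisons++;
                if (candidate.name.equals(wanted.name)) {
                    break; // found: stop scanning for this field
                }
            }
        }
        return comparisons;
    }

    // Indexes the source once in a HashMap, then does one lookup per target entry.
    // Returns how many target entries were found in the source.
    public static int countMatchesWithIndex(List<Entry> source, List<Entry> target) {
        Map<String, Entry> byName = new HashMap<>();
        for (Entry entry : source) {
            byName.put(entry.name, entry);
        }
        int matches = 0;
        for (Entry wanted : target) {
            if (byName.get(wanted.name) != null) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<Entry> oldSchema = fields(50);
        List<Entry> newSchema = fields(51);
        System.out.println("scan comparisons: " + countScanComparisons(oldSchema, newSchema));
        System.out.println("matches via index: " + countMatchesWithIndex(oldSchema, newSchema));
    }
}
```

In this model the scan for the 50/51 case performs 1325 comparisons, because most fields are found before the end of the list; 2550 is the upper bound. The indexed version does one map build over N entries followed by M lookups.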
(This function is not used by the processing connector.)
First performance test
For 100 000 records with 100 fields each, with the uppercase processor applied to 1 column (adding a new column)
(average of 4 runs for each case)
|Current TCK||Optimized TCK|
More than 62% of the time saved.