  Talend Component Kit / TCOMP-2186

Guess schema service for processors


    Description

      Description of need

      In Studio, some TCK processors, such as "dataprep", change the schema of incoming records. The processor can determine the kind of modification (renaming a field, adding one, ...) from the schema of the incoming record, its own configuration, and the outgoing branch.
      So the need is for a "Button" on the processor that determines the schema of the records it produces, and for Studio schema propagation to use this service when it exists.

      Supported method signatures for @DiscoverSchemaExtended annotation:

      /**
       * @param incomingSchema the schema of the input flow
       * @param procConf the configuration of the processor (not a @Dataset)
       * @param branch the name of the output flow for which the computed schema is expected (FLOW, MAIN, REJECT, etc.)
       * @return the schema of the records produced on that output flow
       */
      @DiscoverSchemaExtended("full")
      public Schema guessMethodName(final Schema incomingSchema, final @Option("configuration") ProcConf procConf, final String branch) {...}
      
      @DiscoverSchemaExtended("incoming_schema")
      public Schema guessMethodName(final Schema incomingSchema, final @Option ProcConf procConf) {...}
      
      @DiscoverSchemaExtended("branch")
      public Schema guessMethodName(final @Option("configuration") ProcConf procConf, final String branch) {...}
      
      @DiscoverSchemaExtended("minimal")
      public Schema guessMethodName(final @Option ProcConf procConf) {...}
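
      As a rough illustration of what such a method could compute, here is a minimal sketch in plain Java. It does not use the Talend Component Kit types: a schema is modeled as an ordered list of field names, and GuessSchemaSketch, its ProcConf stand-in, the "errorMessage" field and the REJECT behaviour are all assumptions made for this example only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins only: a real implementation would use the
// Talend Component Kit Schema and the processor's own configuration class.
final class GuessSchemaSketch {

    // Assumed configuration: rename one field and append another.
    static final class ProcConf {
        final String renameFrom;
        final String renameTo;
        final String addedField;

        ProcConf(String renameFrom, String renameTo, String addedField) {
            this.renameFrom = renameFrom;
            this.renameTo = renameTo;
            this.addedField = addedField;
        }
    }

    // Derives the outgoing field names from the incoming ones,
    // the configuration and the requested output branch.
    static List<String> guess(List<String> incomingSchema, ProcConf conf, String branch) {
        List<String> outgoing = new ArrayList<>();
        if ("REJECT".equals(branch)) {
            // Assumption: rejected records keep the incoming fields plus an error message.
            outgoing.addAll(incomingSchema);
            outgoing.add("errorMessage");
            return outgoing;
        }
        // Any other branch: apply the rename, then append the added field.
        for (String field : incomingSchema) {
            outgoing.add(field.equals(conf.renameFrom) ? conf.renameTo : field);
        }
        outgoing.add(conf.addedField);
        return outgoing;
    }

    public static void main(String[] args) {
        ProcConf conf = new ProcConf("name", "fullName", "score");
        List<String> incoming = Arrays.asList("id", "name");
        System.out.println(guess(incoming, conf, "FLOW"));   // [id, fullName, score]
        System.out.println(guess(incoming, conf, "REJECT")); // [id, name, errorMessage]
    }
}
```

      The three inputs mirror the "full" signature: the same incoming schema and configuration can yield a different result per branch.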
      

      Annotation value:
      The action name passed as the annotation value should match the connector's name (the name given in @Processor).

      Example:

      @Data
      @Processor(family = "TaCoKitGuessSchema", name = "outputDi")
      public static class StudioProcessor implements Serializable {
      
          @Option
          private ProcConf configuration;
      
          @ElementListener
          public Object next(Record in, Record out) {
              return null;
          }
      }
      

      In service class:

      @Service
      public static class StudioProcessorService implements Serializable {
      
          @DiscoverSchemaExtended("outputDi")
          public Schema discoverProcessorSchema(final Schema incomingSchema, @Option("configuration") final ProcConf conf, final String branch) {
              ...
          }
      }
      

      In Studio, by default, the guess schema first searches for an action named after the component. If that search fails, it tries to find a default one.
      On cloud environments, finding the action name via the component-server is trickier, so I strongly recommend naming the action after the target connector.
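
      The lookup order described above can be sketched as follows. This is only an illustration in plain Java, not the Studio implementation: ActionLookupSketch, the map of registered actions, and the fallback key "default" are hypothetical names chosen for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of the lookup order; not the actual Studio code.
final class ActionLookupSketch {

    // Assumed key under which a fallback action would be registered.
    static final String DEFAULT_ACTION = "default";

    // Looks for an action named after the component first,
    // then falls back to the default action if one is registered.
    static Optional<String> resolve(Map<String, String> actions, String componentName) {
        String byName = actions.get(componentName);
        if (byName != null) {
            return Optional.of(byName);
        }
        return Optional.ofNullable(actions.get(DEFAULT_ACTION));
    }

    public static void main(String[] args) {
        // Action name -> method handling it (both values are illustrative).
        Map<String, String> actions = new HashMap<>();
        actions.put("outputDi", "discoverProcessorSchema");
        actions.put(DEFAULT_ACTION, "fallbackDiscovery");

        System.out.println(resolve(actions, "outputDi").orElse("none"));       // discoverProcessorSchema
        System.out.println(resolve(actions, "otherProcessor").orElse("none")); // fallbackDiscovery
    }
}
```

      Naming the action after the connector means the first lookup succeeds and the fallback is never needed.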

      Attachments

        1. guess_job.png (23 kB)
        2. st_input_schema.png (60 kB)
        3. st_output_FLOW.png (62 kB)
        4. st_process_main.png (56 kB)
        5. st_processor_flows.png (36 kB)
        6. st_processor_reject.png (59 kB)

            People

              Unassigned Unassigned
              clesaec Christophe LeSaec
              emmanuel gallois, Fabien Desiles