  1. Talend Data Prep
  2. TDP-9927

[RunConv][Backend] Run a preparation on RE using a simple TCK function



    • All
    • DGA Sprint 8 (4 to 21 June), DGA Sprint 9 (25/6 to 12/7)
    • Small
    • 3


      As a Data Preparation user, in Runtime Convergence mode,
      I want to run my preparation on a remote engine with a simple function (upper case), with a dataset as destination
      In order to benefit from dataset common features (e.g. output connectivity, Quality, Trust Score, reuse it as input of a preparation or a pipeline)

      [Backend API part: provide an API to transform a TDP recipe into a TCK configuration, then generate and run a pipeline with a simple TCK function]

      Objective: Use a simple migrated TCK function (from connectors-ee) during the preparation run


      In Track 1, the pipeline is built and run using the Data Prep processor and the preparation definition, so the TCK function is not used on the Remote Engine.
      We need to switch to the new approach for the available function(s) (starting with Uppercase) to be able to validate the migrated TCK functions.


      • Provide a new endpoint in the prepV2 service (to be called at a later stage by the Pipeline Designer Data Prep processor + the tDataPrepRun Studio component)
        • Definition and mock => TDP-10144 /{preparationId}/runs/recipe
        • Mapping from TDP recipe (TDP steps: functions & parameters) to a list of TCK functions with parameters (migrated TCK functions)
        • Instantiation of the mapping for simple functions (Uppercase, Lowercase, Concat)
      • Check and adapt mapping from TCK function to TDP recipe (add function)
      • Integrate input + TCK config (result of mapping of the TDP recipe into a list of TCK functions) + output to call Pipeline API
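
      The recipe-to-TCK mapping described in the list above could look roughly like the following. This is only a minimal sketch: the class names, the action names ("uppercase", "lowercase", "concat"), the TCK function names and the parameter keys are all hypothetical and are not taken from the TDP, TCK or connectors-ee code bases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: every type and identifier below is an assumption made for illustration.
class RecipeMappingSketch {

    /** One TDP recipe step: an action name plus its parameters. */
    static final class TdpStep {
        final String action;
        final Map<String, String> parameters;

        TdpStep(String action, Map<String, String> parameters) {
            this.action = action;
            this.parameters = parameters;
        }
    }

    /** One entry of the resulting TCK configuration. */
    static final class TckFunction {
        final String name;
        final Map<String, String> configuration;

        TckFunction(String name, Map<String, String> configuration) {
            this.name = name;
            this.configuration = configuration;
        }
    }

    // Assumed TDP action name -> assumed migrated TCK function name.
    private static final Map<String, String> SUPPORTED = Map.of(
            "uppercase", "UpperCase",
            "lowercase", "LowerCase",
            "concat", "Concat");

    /** Maps each step in order and fails fast on a function that is not migrated yet. */
    static List<TckFunction> toTckFunctions(List<TdpStep> recipe) {
        List<TckFunction> result = new ArrayList<>();
        for (TdpStep step : recipe) {
            String tckName = SUPPORTED.get(step.action);
            if (tckName == null) {
                throw new IllegalArgumentException("No TCK mapping for TDP action: " + step.action);
            }
            // Parameters are copied as-is; a real mapping would probably rename or convert them.
            result.add(new TckFunction(tckName, new TreeMap<>(step.parameters)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<TdpStep> recipe = List.of(
                new TdpStep("uppercase", Map.of("column", "columnA", "createNewColumn", "false")),
                new TdpStep("uppercase", Map.of("column", "columnB", "createNewColumn", "true")));
        for (TckFunction function : toTckFunctions(recipe)) {
            System.out.println(function.name + " " + function.configuration);
        }
    }
}
```

      Failing on an unmapped action, rather than silently skipping it, keeps the run from producing a partial result while function coverage is still incomplete. Whether the real endpoint should behave this way is an open design choice.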

      Acceptance criteria

      Scenario: Run a preparation with 2 uppercase steps (with UI)
      Given a tenant with Runtime Convergence activated,
      a dataset dataset1 (e.g. S3 or local connection) with at least two text columns
      and a dataset dataset2 created with the name DATASET_OUTPUT_FULLRUN (e.g. S3 or local connection)
      and a preparation based on dataset1
      When the user applies the "change to upper case" function on columnA - without creating a new column
      And applies the "change to upper case" function on columnB - creating a new column
      And presses the "Export" button
      Then the preparation result is exported to dataset2 DATASET_OUTPUT_FULLRUN with the correct output (e.g. check the sample)

      • columnA values are in upper case
      • a new column contains columnB values in upper case
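
      As an illustration of the expected output, the two steps of the scenario can be replayed on a single sample row. This is a sketch under assumptions: the row is modelled as a plain map, the name of the created column (columnB_upper) is invented, and the upper-casing uses Java's String.toUpperCase rather than the actual TCK function.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of the acceptance scenario; not the TCK implementation.
class UppercaseScenarioSketch {

    /** Upper-cases a column in place (no new column created). */
    static Map<String, String> upperInPlace(Map<String, String> row, String column) {
        Map<String, String> out = new LinkedHashMap<>(row);
        String value = row.get(column);
        if (value != null) {
            out.put(column, value.toUpperCase(Locale.ROOT));
        }
        return out;
    }

    /** Leaves the source column untouched and writes the upper-cased value to a new column. */
    static Map<String, String> upperIntoNewColumn(Map<String, String> row, String source, String target) {
        Map<String, String> out = new LinkedHashMap<>(row);
        String value = row.get(source);
        out.put(target, value == null ? null : value.toUpperCase(Locale.ROOT));
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("columnA", "abc");
        row.put("columnB", "Def");

        Map<String, String> step1 = upperInPlace(row, "columnA");
        // "columnB_upper" is an assumed name for the created column.
        Map<String, String> step2 = upperIntoNewColumn(step1, "columnB", "columnB_upper");

        System.out.println(step2); // prints {columnA=ABC, columnB=Def, columnB_upper=DEF}
    }
}
```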

      Out of scope:

      • Interactive mode with TCK => only run covered (previously known as export fullrun) - in interactive mode, legacy TDP pipeline is still used
      • Handling of statistics and semantic types => only simple functions covered
      • Function completeness => start with at least Uppercase TCK function


              Unassigned Unassigned
              odubois Olivier Dubois