Talend Data Prep / TDP-9905

[RunConv][Backend] Run a preparation with different dataset types as source/destination

Details

    • All
    • DGA Sprint 6 (10th of May), DGA Sprint 8 (4 to 21 June), DGA Sprint 9 (25/6 to 12/7)
    • Small
    • 3

    Description

      As a Data Preparation user, in Runtime Convergence mode,
      I want to run my preparation on any compatible dataset type as source and destination, not only S3,
      in order to benefit from common dataset features (e.g. output connectivity, Quality, Trust Score, reuse as input of a preparation or a pipeline)

      [Backend API part: Build full run pipeline Input/Output]

      Objective: Remove the restriction to S3 datasets for input and output configuration by using the TCK proxy

      Why?

      In Track 1, the pipeline is built and run with the Data Prep processor and the preparation definition. It therefore runs the Data Preparation code, not the TCK function on the Remote Engine.
      We need to switch to the new approach for the available function(s) (starting with Uppercase) and build the pipeline.
      This step configures the TCK I/O (input and output) from the dataset information.

      How?

      • call TCK proxy to build TCK config from dataset and params (TFD-12360)
        • input TCK from input dataset
        • output TCK from dataset parameters:
          • to ensure compatibility with current frontend implementation (track 1 with simple button): keep current behavior if no datasetId is provided in API call (hardcoded S3 output dataset)
          • if datasetId provided in API call, use it for output
      • integrate input + TCK config (DataPrepProcessor) + output to call Pipeline API
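
      The steps above can be sketched as follows. This is a minimal illustration, not the backend implementation: `build_tck_config` stands in for the TCK proxy call (TFD-12360) and `find_id_by_name` for a dataset lookup, and both are hypothetical callables injected by the caller. The sketch also assumes the hardcoded output is found by the name DATASET_OUTPUT_FULLRUN used in the acceptance criteria.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Name taken from the acceptance criteria. Looking the dataset up by this
# name is an assumption; the ticket only says the output is hardcoded.
DEFAULT_OUTPUT_DATASET_NAME = "DATASET_OUTPUT_FULLRUN"


@dataclass(frozen=True)
class FullRunPipeline:
    """Hypothetical container for the three parts sent to the Pipeline API."""
    input_config: dict
    processor_config: dict
    output_config: dict


def resolve_output_dataset_id(
    requested_id: Optional[str],
    find_id_by_name: Callable[[str], str],
) -> str:
    """Use the datasetId from the API call if present, else the default output."""
    if requested_id:
        return requested_id
    # No datasetId provided: keep the current (track 1) behavior.
    return find_id_by_name(DEFAULT_OUTPUT_DATASET_NAME)


def build_full_run_pipeline(
    input_dataset_id: str,
    requested_output_id: Optional[str],
    processor_config: dict,
    build_tck_config: Callable[[str, str], dict],
    find_id_by_name: Callable[[str], str],
) -> FullRunPipeline:
    """Assemble input + processor config + output for one full run."""
    output_id = resolve_output_dataset_id(requested_output_id, find_id_by_name)
    return FullRunPipeline(
        input_config=build_tck_config(input_dataset_id, "input"),
        processor_config=processor_config,
        output_config=build_tck_config(output_id, "output"),
    )
```

      Passing the proxy call and the lookup as parameters keeps the fallback rule testable without a running TCK proxy.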

      Technical information

      To create the TCK input/output, we need to use two Tacokit proxy endpoints:

      Acceptance criteria:

      Scenario 1: source dataset of a type other than S3 (with UI)
      Given a tenant with Runtime Convergence activated,
      a dataset dataset1 of a type other than S3 (e.g. local connection),
      an S3 dataset dataset2 named DATASET_OUTPUT_FULLRUN,
      and an empty preparation based on dataset1
      When the user presses the "Export" button
      Then the preparation result is exported to dataset2 (DATASET_OUTPUT_FULLRUN) (track 1 - TDP-9581)

      Scenario 2: destination dataset of a type other than S3 (with UI)
      Given a tenant with Runtime Convergence activated,
      a dataset dataset1 of a type other than S3 (e.g. local connection),
      a dataset dataset2 named DATASET_OUTPUT_FULLRUN, also of a type other than S3 (e.g. local connection),
      and an empty preparation based on dataset1
      When the user presses the "Export" button
      Then the preparation result is exported to dataset2 (DATASET_OUTPUT_FULLRUN) (track 1 - TDP-9581)

      Scenario 3: destination dataset of a type other than S3 (with API)
      Given a tenant with Runtime Convergence activated,
      two datasets dataset1 and dataset2 of a type other than S3 (e.g. local connection),
      and an empty preparation based on dataset1
      When we call the API to launch a full run of the preparation with dataset2 as destination:

      /transform/preparations/{preparationId}/runs

      Then the preparation result is exported to dataset2
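
      As an illustration of the call in Scenario 3, the request could be assembled as below. Only the path and the `datasetId` parameter name come from this ticket. The POST method, the JSON body, the placement of `datasetId` in that body and the base URL are assumptions made for the sketch.

```python
import json
from typing import Optional
from urllib.parse import quote

# Path as written in the acceptance criteria.
RUNS_PATH_TEMPLATE = "/transform/preparations/{preparationId}/runs"


def build_run_request(
    base_url: str,
    preparation_id: str,
    dataset_id: Optional[str] = None,
) -> dict:
    """Describe the full-run request without sending it.

    Assumptions: POST method and a JSON body carrying datasetId.
    """
    path = RUNS_PATH_TEMPLATE.replace(
        "{preparationId}", quote(preparation_id, safe="")
    )
    # Omitting datasetId keeps the default output behavior (track 1).
    body = {} if dataset_id is None else {"datasetId": dataset_id}
    return {
        "method": "POST",
        "url": base_url.rstrip("/") + path,
        "body": json.dumps(body),
    }
```

      Calling `build_run_request(base_url, preparation_id)` without a dataset id models the UI case of Scenarios 1 and 2, where the output falls back to DATASET_OUTPUT_FULLRUN.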

      Out of scope:

      • Mapping of Data Prep steps to TCK config (tracked in TDP-9927). For now, the run keeps using DataPrepProcessor (track 1), so it is limited to an empty preparation or a preparation with the Uppercase function.

      People

      Assignee: Unassigned
      Reporter: Olivier Dubois (odubois)