Details
-
Work Item
-
Status: closed
-
Minor
-
Resolution: Done
-
None
-
All
-
DGA Sprint 6 (10th of May), DGA Sprint 8 (4 to 21 June), DGA Sprint 9 (25/6 to 12/7)
-
Small
-
3
Description
As a Data Preparation user, in Runtime Convergence mode,
I want to run my preparation with any compatible dataset type as source and destination, not only S3,
In order to benefit from common dataset features (e.g. output connectivity, Quality, Trust Score, reuse as input of a preparation or a pipeline)
[Backend API part: Build full run pipeline Input/Output]
Objective: Remove the limitation restricting the input and output configuration to S3 datasets, by using the TCK Proxy
Why?
In Track 1, the pipeline is built and run using the Data Prep processor and the preparation definition; it therefore relies on the Data Preparation code rather than the TCK functions on the Remote Engine.
We need to switch to the new approach for the available function(s) (starting with Uppercase) and build the pipeline accordingly.
This step is to configure TCK I/O (input and output) based on the dataset information.
How?
- call TCK proxy to build TCK config from dataset and params (TFD-12360)
- input TCK from input dataset
- output TCK from the dataset parameters:
  - to ensure compatibility with the current frontend implementation (track 1 with simple button): keep the current behavior if no datasetId is provided in the API call (hardcoded S3 output dataset)
  - if a datasetId is provided in the API call, use it for the output
- integrate input + TCK config (DataPrepProcessor) + output to call Pipeline API
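The output-dataset fallback described above can be sketched as follows. This is a minimal illustration only: the function and constant names are hypothetical, not the actual backend implementation.

```python
# Hypothetical sketch of the output-dataset fallback (track 1 compatibility).
# HARDCODED_S3_DATASET_ID is an assumed placeholder, not the real constant.
HARDCODED_S3_DATASET_ID = "s3-output-dataset"

def resolve_output_dataset_id(request_dataset_id):
    """Return the dataset id to use for the TCK output configuration.

    If the API call provides a datasetId, use it for the output;
    otherwise keep the track 1 behavior (hardcoded S3 output dataset).
    """
    return request_dataset_id if request_dataset_id else HARDCODED_S3_DATASET_ID
```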
Technical information
In order to create the TCK input/output configuration, we will need two Tacokit proxy endpoints:
- Get the plugin information to use: https://apid.eu.cloud.talend.com/apis/b5284c2b-ea96-43d9-83d3-5495d38c66db/resources/~2Ftacokit~2F%7BengineId%7D~2Fcomponents~2Fform~2Fconfiguration~2Fplugin~2F%7BdatasetType%7D/operations/GET
- Configure the plugin with our current data: https://apid.eu.cloud.talend.com/apis/b5284c2b-ea96-43d9-83d3-5495d38c66db/resources/~2Ftacokit~2F%7BengineId%7D~2Fcomponents~2Fform~2Fconfiguration/operations/PATCH
- Use TCK proxy model (POJO)
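The two proxy calls above can be sketched as URL builders. This is illustrative only: the base URL is an assumption, and only the path templates come from the API documentation links above (GET for the plugin information, PATCH for the configuration).

```python
# Assumed base URL for the Tacokit proxy; replace with the real environment URL.
TCK_PROXY_BASE = "https://api.example.talend.com"

def plugin_info_url(engine_id: str, dataset_type: str) -> str:
    # GET /tacokit/{engineId}/components/form/configuration/plugin/{datasetType}
    # -> returns the plugin information to use for this dataset type
    return (f"{TCK_PROXY_BASE}/tacokit/{engine_id}"
            f"/components/form/configuration/plugin/{dataset_type}")

def configuration_url(engine_id: str) -> str:
    # PATCH /tacokit/{engineId}/components/form/configuration
    # -> configures the plugin with our current dataset data
    return f"{TCK_PROXY_BASE}/tacokit/{engine_id}/components/form/configuration"
```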
Acceptance criteria:
Scenario 1: source dataset other than S3 (with UI)
Given a tenant with Runtime Convergence activated,
a dataset dataset1 of a type other than S3 (e.g. local connection)
and an S3 dataset dataset2 created with the name DATASET_OUTPUT_FULLRUN
and an empty preparation based on dataset1
When the user presses the "Export" button
Then the preparation result is exported to dataset2 DATASET_OUTPUT_FULLRUN (track 1 - TDP-9581)
Scenario 2: destination dataset other than S3 (with UI)
Given a tenant with Runtime Convergence activated,
a dataset dataset1 of a type other than S3 (e.g. local connection)
and a dataset dataset2 created with the name DATASET_OUTPUT_FULLRUN, also of a type other than S3 (e.g. local connection)
and an empty preparation based on dataset1
When the user presses the "Export" button
Then the preparation result is exported to dataset2 DATASET_OUTPUT_FULLRUN (track 1 - TDP-9581)
Scenario 3: destination dataset other than S3 (with API)
Given a tenant with Runtime Convergence activated,
two datasets dataset1 & dataset2 of a type other than S3 (e.g. local connection)
and an empty preparation based on dataset1
When we call the API to launch a full run on the preparation with dataset2 as the destination
/transform/preparations/{preparationId}/runs
Then the preparation result is exported to dataset2
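The Scenario 3 API call can be sketched as a request builder. Only the endpoint path and the datasetId parameter come from this ticket; the exact body shape is an assumption for illustration.

```python
import json

def build_full_run_request(preparation_id: str, destination_dataset_id: str):
    """Build the path and JSON body for launching a full run on a preparation.

    The endpoint path matches /transform/preparations/{preparationId}/runs;
    the body field "datasetId" (destination dataset) is an assumed shape.
    """
    path = f"/transform/preparations/{preparation_id}/runs"
    body = json.dumps({"datasetId": destination_dataset_id})
    return path, body
```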
Out of scope:
- Mapping of Data Prep steps to TCK config => TDP-9927. For the moment, keep using DataPrepProcessor for the run (track 1), which limits it to an empty preparation or a preparation with the Uppercase function.