Details
Type: New Feature
Status: Done
Priority: Major
Resolution: Fixed
None
None
All
Small
Description
Goal: provide a full TCK join processor that is iso (functionally equivalent) with the legacy TDP Join.
Expected join behavior:
- It must be a left outer join.
- Only the first match is joined.
- A match can be checked on several columns:
inputRecord.pivotColumnA == lookup.pivotColumnA && inputRecord.pivotColumnB == lookup.pivotColumnB && inputRecord.pivotColumnC == lookup.pivotColumnC ...
- For a given pivot value, the join must always return the same lookup record.
This means the lookup must be ordered, or at least always return its values in the same order. Proposal: add an option to order the lookup on the pivot keys. Ordering could be done in the join connector, but it may cause performance issues at load time, so it should probably be optional.
- The user should be able to choose which lookup columns to merge into the input record. These columns should be inserted right after the input record pivot column.
- Currently, we cannot choose the position at which a column is added.
- There is a feature request to support insertion in the TCK RecordBuilder: https://jira.talendforge.org/browse/TCOMP-2008
- FDE: this part is not done yet; to be clarified.
- The current naming-collision handling should be reproduced to keep the legacy behavior:
- append _1, _2, ... when a collision occurs.
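The matching rules above (AND over all pivot pairs, first match only, same result for the same pivot value) can be sketched as a composite-key index. This is a minimal sketch under stated assumptions: rows are modeled as `Map<String, Object>` rather than TCK `Record` objects, `FirstMatchIndex` is a hypothetical name (not a TCK API), and treating two null pivot values as equal is an assumption, not a documented legacy behavior.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not a TCK API): indexes lookup rows by a composite key
// built from several pivot columns and keeps only the FIRST row seen per key.
final class FirstMatchIndex {
    private final Map<List<Object>, Map<String, Object>> index = new HashMap<>();

    FirstMatchIndex(List<String> lookupPivots, List<Map<String, Object>> lookupRows) {
        for (Map<String, Object> row : lookupRows) {
            // putIfAbsent ignores later duplicates, so the result is stable only
            // if the lookup rows are always read in the same order.
            index.putIfAbsent(keyOf(row, lookupPivots), row);
        }
    }

    // The key is the ordered list of pivot values. List.equals compares values
    // pair by pair, which gives the AND semantics over all pivot pairs.
    // Assumption: two null pivot values are treated as equal here.
    private static List<Object> keyOf(Map<String, Object> row, List<String> pivots) {
        List<Object> key = new ArrayList<>(pivots.size());
        for (String column : pivots) {
            key.add(row.get(column));
        }
        return key;
    }

    // inputPivots.get(i) is paired with lookupPivots.get(i).
    // Returns null when nothing matches; the caller keeps the input row anyway.
    Map<String, Object> find(Map<String, Object> inputRow, List<String> inputPivots) {
        return index.get(keyOf(inputRow, inputPivots));
    }
}
```

Keeping the first row with `putIfAbsent` only yields a deterministic join if the lookup is read in a stable order, which is why the optional ordering on pivot keys matters.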
Naming collision example:
- Input record schema:
["id", "name", "age", "zipcode", "street"]
- Lookup record schema:
["id", "zip", "name", "population", "surface"]
- The user configures the join connector:
- Input record pivot: zipcode
- Lookup pivot: zip
- Retrieved columns: name, surface
- Resulting schema:
["id", "name", "age", "zipcode", "name_1", "surface", "street"]
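The naming-collision rule and column placement from this example can be sketched as below. This is a minimal sketch: `SchemaMerge` is a hypothetical helper, and it assumes collisions are checked against the input columns plus the lookup columns already added. That reproduces the expected schema above, but it has not been checked against the legacy TDP implementation.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not a TCK API): computes the output column names.
final class SchemaMerge {

    // Returns the name unchanged if free, otherwise the first free name_1, name_2, ...
    static String deduplicate(String name, Set<String> taken) {
        if (!taken.contains(name)) {
            return name;
        }
        int suffix = 1;
        while (taken.contains(name + "_" + suffix)) {
            suffix++;
        }
        return name + "_" + suffix;
    }

    // Inserts the retrieved lookup columns right after the last input pivot column.
    static List<String> merge(List<String> inputColumns, List<String> inputPivots,
                              List<String> retrieved) {
        int lastPivot = -1;
        for (String pivot : inputPivots) {
            lastPivot = Math.max(lastPivot, inputColumns.indexOf(pivot));
        }
        if (lastPivot < 0) {
            throw new IllegalArgumentException("No pivot column found in the input schema");
        }
        Set<String> taken = new HashSet<>(inputColumns);
        List<String> added = new ArrayList<>();
        for (String column : retrieved) {
            String unique = deduplicate(column, taken);
            taken.add(unique); // later retrieved columns must not reuse this name
            added.add(unique);
        }
        List<String> merged = new ArrayList<>(inputColumns);
        merged.addAll(lastPivot + 1, added);
        return merged;
    }
}
```

With the example configuration, `SchemaMerge.merge(List.of("id", "name", "age", "zipcode", "street"), List.of("zipcode"), List.of("name", "surface"))` yields `[id, name, age, zipcode, name_1, surface, street]`. For the 1.28.0 behavior (columns appended at the end of the record), `merged.addAll(added)` would replace the positional insert.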
For information:
- Dataprep only allows creating preparations on flat datasets. Data Inventory sets a flag flat=true in its dataset description.
- Likewise, lookups are also flat.
Integration
Initially, this connector will be dedicated to preparation executions. It should not appear in the Pipeline Designer list of available processors.
It should therefore be defined as a technical processor with:
@Metadatas({ @Metadata(key = "isTechnical", value = "true") })
Acceptance Criteria
- The processor should perform a left outer join.
- Only the first match should be joined.
- The processor can define:
  - which input columns should match which lookup columns
    - all pairs are combined with an AND relation
  - the parameters of another dataset (the lookup dataset)
  - which lookup columns to merge
    - Columns to merge should be added after the last input record pivot column.
      - 1.28.0: they are added at the end of the record.
      - Afterwards: as initially specified.
    - Columns should be suggested.
      -> This point is not supported by TDI but by the TDS integration of this component.
- Lookup columns with no match are set to null.
- Naming collision management for columns:
  - append _1, _2, ... when a collision occurs
- The processor should not appear in Pipeline Designer.
  - It should be defined as a technical processor.
- The user can add a Join processor to a pipeline like any other Dataprep function.
  -> Cannot be checked by TDI as long as it is not integrated by TDP.
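The record-level criteria (left outer join, first match only, AND over all pivot pairs, null for unmatched lookups, columns appended at the end in 1.28.0) can be sketched as below. This is a hypothetical illustration, not the processor's implementation: records are modeled as `Map<String, Object>`, `LeftOuterJoinSketch` is an invented name, and `outputNames` is assumed to be already deduplicated with the _1, _2, ... rule.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch (not the actual processor): left outer join with
// first-match semantics and the 1.28.0 placement (columns appended at the end).
final class LeftOuterJoinSketch {

    static List<Map<String, Object>> join(List<Map<String, Object>> input,
                                          List<Map<String, Object>> lookup,
                                          List<String> inputPivots,
                                          List<String> lookupPivots,
                                          List<String> retrieved,
                                          List<String> outputNames) {
        if (inputPivots.size() != lookupPivots.size() || retrieved.size() != outputNames.size()) {
            throw new IllegalArgumentException("Pivot and column lists must be paired");
        }
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> in : input) {
            Map<String, Object> match = null;
            for (Map<String, Object> candidate : lookup) {
                if (matches(in, candidate, inputPivots, lookupPivots)) {
                    match = candidate;
                    break; // only the first match is joined
                }
            }
            // Every input row is kept (left outer join); copy it before adding columns.
            Map<String, Object> out = new LinkedHashMap<>(in);
            for (int i = 0; i < retrieved.size(); i++) {
                // No match: the merged lookup columns are set to null.
                out.put(outputNames.get(i), match == null ? null : match.get(retrieved.get(i)));
            }
            result.add(out);
        }
        return result;
    }

    // All pivot pairs must be equal (AND relation).
    private static boolean matches(Map<String, Object> in, Map<String, Object> candidate,
                                   List<String> inputPivots, List<String> lookupPivots) {
        for (int i = 0; i < inputPivots.size(); i++) {
            if (!Objects.equals(in.get(inputPivots.get(i)), candidate.get(lookupPivots.get(i)))) {
                return false;
            }
        }
        return true;
    }
}
```

The nested loop keeps the sketch short but costs O(n*m); a real implementation would index the lookup by its pivot values instead of scanning it for every input row.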
DoR
Topic | Description | Comments
Description | Is the description enough for all stakeholders? | Version scope confirmed for 1.28.0 (see marked topics).
Acceptance Criteria | Are they defined? Were they validated by PO, Dev and QA? | Version scope confirmed for 1.28.0 (see marked topics).
Jira information | Is the Jira information correct? (fix version, labels, security level) |
Environment | Is the environment ready? Needs TDP presentation. SSL support: not needed. Reachable for QA/Doc/automation (TTP, JUnit): not yet. | Won't be integrated by TDP during validation; local modifications will be required to test the processor before release.
License | Is the license (EE or SE) clearly identified? |
Technical Analysis | Does the developer understand how it will be implemented? Do we have a solution? Was it approved/discussed with architecture (in-team, global or security, depending on the scope)? |
Dependencies | Are all dependencies linked in Jira (link "depends on"), including SRE/DevOps/IT? Are they done? | Not done yet.
Migrations | Is a migration needed? | No.
Doc | Is a DOCT created and linked under the epic? | Not available to users, so no doc needed. It will replace the current TDP Join, which is already documented in TDP.
Communication channel | Is a Slack feat- channel created, with all the correct owners involved? (QA/Doc/PO/SM) | #feat-runtime-convergence -> technical; #eng-runtime-convergence-sync -> confirmations.
UX | Are there changes in the UI, and were they added in the DOCT? For new forms, was it approved by UX? For new connectors, was a TUX ticket created for a new icon? | The icon already exists. The tck/join form will be done by TDP (the processor replaces the back end of the current Join feature).
Issue Links
- Has to be done after: TCOMP-2004 [Runtime convergence] New tck/API to retrieve dataset full content (Done)