  1. Talend Data Quality
  2. TDQ-18049

Support feature importance viz on Databricks (Azure/AWS) and HDInsight (Azure) for tMatchModel


Details

    • All
    • DQ20 CN 5
    • Small

    Description

      What is working

      Storage            Spark Framework     Result
      Azure Storage      local spark mode
      Azure Storage      HDInsight
      Amazon S3 bucket   local spark mode

      What is not working

      Storage            Spark Framework     Result
      Azure Storage      Azure Databricks    issue2
      Amazon S3 bucket   AWS Databricks      issue3
      Amazon S3 bucket   AWS EMR             issue4, will be fixed in TDQ-18366
      HDFS               local spark mode    issue5, will be fixed in TDQ-18367
      HDFS               HDP                 issue5, will be fixed in TDQ-18367
      HDFS               CDH                 issue5, will be fixed in TDQ-18367

      We want to support feature importance viz on the cluster for tMatchModel. The following problems need to be fixed:
      issue4: storing to an S3 bucket is not supported when running on Amazon EMR, because the job cannot run properly on the EMR cluster.
      issue5: test on HDFS with local spark mode and on a real cluster (HDP) after TDQ-18063 is done.
      For more information, see the sub-tasks.

      Acceptance Criteria

      In ALL the following scenarios, download the PDF file and check that it can be read correctly and that its content is as expected (same as in local mode).

      • Scenario 1 Azure Storage
        Given Azure storage is available (an Azure account is needed first)
        When the tMatchModel job is run on Azure Databricks with "Model location" and "Model explanation" enabled
        Then the job runs successfully and generates the feature importance viz PDF file, containing an image, in the assigned location
      • Scenario 2 Azure Storage
        Given Azure storage is available (an Azure account is needed first)
        When the tMatchModel job is run on Azure Databricks with "Model location" enabled but "Model explanation" NOT enabled
        Then the job runs successfully and NO feature importance viz PDF file is generated in the assigned location
      • Scenario 3 Amazon S3
        Given an S3 bucket is available (an Amazon account is needed first)
        When the tMatchModel job is run on AWS Databricks with "Model location" and "Model explanation" enabled
        Then the job runs successfully and generates the feature importance viz PDF file, containing an image, in the assigned location
      • Scenario 4 Amazon S3
        Given an S3 bucket is available (an Amazon account is needed first)
        When the tMatchModel job is run on AWS Databricks with "Model location" enabled but "Model explanation" NOT enabled
        Then the job runs successfully and NO feature importance viz PDF file is generated in the assigned location
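
      The "can be read correctly" check in the acceptance criteria could be partly automated once the PDF has been downloaded from the storage location. The following is only a minimal sketch, not part of the ticket: the function names are hypothetical, and it only verifies the standard PDF header and end-of-file marker. It does not confirm that the feature importance image is present or matches local mode, which still needs a visual check.

```python
from pathlib import Path

# Standard markers of a PDF file: the header at the start, the EOF marker near the end.
PDF_MAGIC = b"%PDF-"
PDF_EOF = b"%%EOF"


def looks_like_pdf(data: bytes) -> bool:
    """Return True if the bytes start with a PDF header and end with an EOF marker."""
    # A download that failed often yields an empty file or an HTML/XML error body.
    if not data.startswith(PDF_MAGIC):
        return False
    # The EOF marker should sit within the last part of the file;
    # a truncated upload or download will usually be missing it.
    return PDF_EOF in data[-1024:]


def check_downloaded_pdf(path: str) -> bool:
    """Hypothetical helper: run the structural check on a downloaded file."""
    return looks_like_pdf(Path(path).read_bytes())
```

      For Scenarios 1 and 3, `check_downloaded_pdf` would be called on the file fetched from the assigned location. For Scenarios 2 and 4 the expected outcome is that no such file exists at all, so this check does not apply.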

            People

              xqliu liu xinquan
              xqliu liu xinquan
              liu xinquan, yunjie gao

                Time Tracking

                  Estimated: 0m
                  Remaining: 0m
                  Logged: 2w