With the Blueshift and Databricks integration, you can import catalogs into your Blueshift account directly from Databricks. For information about data types, catalog attributes, and data formats, see Import catalog data.

Prerequisites

Before you can import data from Databricks, you must set up the integration between Blueshift and Databricks and configure at least one adapter.

Set up an import task

You can import catalogs, events, and customer data from Databricks. Where you start an import task depends on the type of data you are importing.

To set up a task to import catalog data from Databricks, complete the following steps:

  1. Go to Catalog in the left navigation. Click +CATALOG.
  2. Select Databricks as the Source.
  3. Add a Name for the task. The import task form opens.
     

    databricks_catalog_main.png

  4. In the Destination section, the type of data being imported is shown as Products.
  5. Specify a name for the catalog.
  6. Set up Notification Preferences to send a notification email to the specified email addresses when the status of the task changes or when a certain percentage of records fail during import.
  7. In the Source section, select the adapter that you want to use for the import task.
     

    databricks_catalog_source.png

  8. For Import From, select either Table or View. If your data is spread across multiple tables, we recommend that you use a View.
  9. Select the Table or the View from which the data is to be imported.
  10. Blueshift fetches sample data consisting of 10 records from the table or view specified in the Source section and displays it in the Configuration section.
     

    databricks_catalog_fetchdata.png

  11. Map the fields from the imported data to the fields in Blueshift and specify the data type for each field.
    • For catalog data, a column from the source data must be mapped to each of the following product attributes in Blueshift: item_id, item_title, item_url, main_image, and category.
    • The Source Attribute Name is the attribute in Databricks and the Destination Attribute Name is the attribute in Blueshift.
       

      databricks_catalog_mapdata.png

  12. If the sample data does not contain all the available fields, use Add more fields to add them to the data mapping.
  13. For catalogs, you must complete both Map Item Category and Map Item Tags.
     

    databricks_category_map2.png

    • Use the Split a field option for Category and Location item tags if the hierarchy for them is captured in a single field, for example, "Travel > Europe > Italy". If you select Split a field, you must select the correct incoming attribute header and then select the appropriate delimiter. In this example, the delimiter is ">".
    • If the category or tag hierarchy is captured in more than one field in the incoming file, use the Select Field(s) option to select multiple headers. Ensure that each header is a single string and not a delimited value.
  14. Click Test Run to test the mapping. A maximum of 10 records is fetched during this test run.
     

    databricks_catalog_testrun.png

  15. Verify that the data mapping is done correctly. Edit the data mapping if required. Click Test Run again after you make the changes.
  16. For Additional Configurations, select the Type of Import.
    • Select Full Import if you are importing bulk data. With a Full Import, all the data in the selected table or view is imported from Databricks every time the import task runs.
    • Select Incremental Import to set up an incremental import task. Select the Diff Identification and the Diff Identifier.
       

      databricks_typeofimport.png

  17. In the Schedule section, set the Start Time for the import task.
     

    databricks_schedule.png

  18. To set up a recurring task, under Schedule, select the Is it a recurring data import? option.
  19. Set the frequency using the Task executes every field. You can set the frequency in minutes, hours, days, weeks, or months. For a Full Import, you cannot set a frequency shorter than one day.
  20. Click Save to save the task.
  21. Click Launch to run the import task.
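
Step 11 requires a source column to be mapped to each of the five required product attributes. The following Python snippet is a minimal, hypothetical sketch of that check that you could run locally before clicking Test Run. The mapping dictionary and the Databricks column names (sku, product_name, and so on) are illustrative assumptions, not part of the Blueshift API.

```python
# Hypothetical pre-check: the required Blueshift product attributes from step 11.
REQUIRED_CATALOG_ATTRIBUTES = {"item_id", "item_title", "item_url", "main_image", "category"}


def missing_required_attributes(mapping: dict[str, str]) -> set[str]:
    # mapping is assumed to be {Databricks column name: Blueshift attribute name}.
    # Return the required attributes that no source column maps to.
    return REQUIRED_CATALOG_ATTRIBUTES - set(mapping.values())


# Illustrative mapping; the source column names are made up for this example.
mapping = {
    "sku": "item_id",
    "product_name": "item_title",
    "product_link": "item_url",
    "image_link": "main_image",
}
print(sorted(missing_required_attributes(mapping)))  # ['category']
```

In this example the category attribute has no source column, so the mapping is incomplete until a column is mapped to it.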
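
The two hierarchy options in step 13 can be illustrated with a short Python sketch. It assumes that Split a field behaves like splitting one string on the chosen delimiter and trimming whitespace, and that Select Field(s) reads one level per column in the order the columns are selected. The function and column names are hypothetical; this is not Blueshift's implementation.

```python
def split_hierarchy(value: str, delimiter: str = ">") -> list[str]:
    # "Split a field": one column holds the whole hierarchy, e.g. "Travel > Europe > Italy".
    # Split on the delimiter, trim surrounding whitespace, and drop empty parts.
    return [part.strip() for part in value.split(delimiter) if part.strip()]


def combine_fields(row: dict[str, str], fields: list[str]) -> list[str]:
    # "Select Field(s)": each selected column holds one undelimited level.
    # Levels are taken in the order the fields are listed; missing or blank columns are skipped.
    return [row[f].strip() for f in fields if row.get(f, "").strip()]


print(split_hierarchy("Travel > Europe > Italy"))  # ['Travel', 'Europe', 'Italy']

# Hypothetical row with the same hierarchy spread across three columns.
row = {"cat_level1": "Travel", "cat_level2": "Europe", "cat_level3": "Italy"}
print(combine_fields(row, ["cat_level1", "cat_level2", "cat_level3"]))  # ['Travel', 'Europe', 'Italy']
```

Both options produce the same three-level hierarchy; the difference is only whether the levels arrive in one delimited column or in separate columns.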
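
This article does not describe how Diff Identification works internally. One common way to implement an incremental import is a high-watermark pattern: remember the largest value of a change-tracking column (for example, a last-updated timestamp) seen in the previous run, and import only rows with a larger value. The sketch below assumes that the Diff Identifier in step 16 is such a timestamp column; the column name updated_at, the sample rows, and the function are hypothetical. A Full Import, by contrast, would return every row on every run.

```python
from datetime import datetime
from typing import Any


def incremental_batch(
    rows: list[dict[str, Any]], diff_identifier: str, last_watermark: datetime
) -> tuple[list[dict[str, Any]], datetime]:
    # Hypothetical high-watermark selection: keep only rows whose diff-identifier
    # value is newer than the watermark saved from the previous run.
    changed = [r for r in rows if r[diff_identifier] > last_watermark]
    # The new watermark is the newest value imported; if nothing changed, keep the old one.
    new_watermark = max((r[diff_identifier] for r in changed), default=last_watermark)
    return changed, new_watermark


# Illustrative rows only; "updated_at" is an assumed diff-identifier column.
rows = [
    {"item_id": "A1", "updated_at": datetime(2024, 1, 1, 9, 0)},
    {"item_id": "A2", "updated_at": datetime(2024, 1, 2, 9, 0)},
    {"item_id": "A3", "updated_at": datetime(2024, 1, 3, 9, 0)},
]
changed, watermark = incremental_batch(rows, "updated_at", datetime(2024, 1, 1, 12, 0))
print([r["item_id"] for r in changed])  # ['A2', 'A3']
print(watermark)  # 2024-01-03 09:00:00
```

Running the function again with the returned watermark selects no rows until the source table receives newer updates.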
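
Step 19 states that a Full Import cannot recur more often than once a day. The following hypothetical Python check expresses only that rule; the unit names and the import_type values ("full", "incremental") are assumptions chosen for illustration. Months are treated as always satisfying the one-day minimum because no month is shorter than a day.

```python
from datetime import timedelta

# Fixed-length units accepted by this sketch (names are assumptions).
FIXED_UNITS = {
    "minutes": timedelta(minutes=1),
    "hours": timedelta(hours=1),
    "days": timedelta(days=1),
    "weeks": timedelta(weeks=1),
}
MIN_FULL_IMPORT_INTERVAL = timedelta(days=1)


def is_allowed_frequency(every: int, unit: str, import_type: str) -> bool:
    # Months vary in length but are never shorter than a day, so they always pass.
    if unit == "months":
        return True
    if unit not in FIXED_UNITS:
        raise ValueError(f"unknown unit: {unit}")
    interval = FIXED_UNITS[unit] * every
    # Only the Full Import one-day minimum from step 19 is modeled; no other limits are applied.
    if import_type == "full":
        return interval >= MIN_FULL_IMPORT_INTERVAL
    return True


print(is_allowed_frequency(6, "hours", "full"))  # False
print(is_allowed_frequency(1, "days", "full"))  # True
print(is_allowed_frequency(30, "minutes", "incremental"))  # True
```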

Import task status

The catalog import index page shows the status of each catalog import task as Draft, Launched, Paused, or Completed. For more information, see View catalog upload status.
