Catalogs - from Databricks

With the Blueshift and Databricks integration, you can import catalogs directly into your Blueshift account. For information about data types, catalog attributes, and data formats, see Import catalog data.

Prerequisites

Before you can import data from Databricks, you must set up the Blueshift integration with Databricks and configure at least one adapter.

Set up an import task

You can import catalogs, events, and customer data using Databricks. The starting point for your import tasks depends on the type of data you are importing.

To set up a task to import catalog data from Databricks, complete the following steps:

Go to Catalog in the left navigation. Click +CATALOG.
Select Databricks as the Source.
Add a Name for the task. The import task form opens.
In the Destination section, the type of data being imported is shown as Products.
Specify a name for the catalog.
Set up Notification Preferences to send a notification email to the specified email addresses when there is a change in the status of the task or when a certain percentage of records fail during import.
In the Source section, select the adapter that you want to use for the import task.
For Import From, select either Table or View. If your data is spread across multiple tables, it is recommended that you provide a View.
Select the Table or the View from which the data is to be imported.
Sample data consisting of 10 records is fetched from the table or view specified in the Source section. This data is displayed in the Configuration section.
Map the fields from the imported data to the fields in Blueshift and specify the data type for the field.
- For catalog data, a column from the source data must be mapped to each of the following product attributes in Blueshift: item_id, item_title, item_url, main_image, and category.
- The Source Attribute Name is the attribute in Databricks and the Destination Attribute Name is the attribute in Blueshift.
If the sample data does not contain all the available fields, use Add more fields to extend the data mapping.
For catalogs you must Map Item Category and Map Item Tags.
- Use the Split a field option for Category and Location Item tags if the hierarchy for these is captured in a single field, for example "Travel > Europe > Italy". If you select the Split a field option, you must select the correct incoming attribute header and then select the appropriate delimiter. In this example, the delimiter is ">".
- If the category or tag hierarchy is captured in more than one field in the incoming data, use the Select Field(s) option to select multiple headers. Ensure that each header holds a single string and not a delimited value.
Click Test Run to test the mapping. A maximum of 10 records are fetched during this test run.
Verify that the data mapping is done correctly. Edit the data mapping if required. Click Test Run again after you make the changes.
For Additional Configurations, select the Type of Import.
- Select Full Import if you are importing bulk data. For a Full Import, all data from the selected table or view is imported from Databricks every time you run the import task.
- Select Incremental Import to set up an incremental import task. Select the Diff Identification and the Diff Identifier.
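
The mapping rules in the steps above can be checked before you run the import. The following is a minimal sketch in Python, not part of Blueshift or Databricks: the function names and the source column names (sku, name, and so on) are hypothetical, and only the five required destination attributes and the "Travel > Europe > Italy" example come from this article. It shows how to spot a required attribute that has no mapped column and how a single-field hierarchy breaks apart on the ">" delimiter.

```python
# Required product attributes named in the mapping step above.
REQUIRED_ATTRIBUTES = {"item_id", "item_title", "item_url", "main_image", "category"}


def missing_required(mapping):
    """Return the required destination attributes that no source column maps to.

    `mapping` is a hypothetical dict of source column -> destination attribute.
    """
    return sorted(REQUIRED_ATTRIBUTES - set(mapping.values()))


def split_hierarchy(value, delimiter=">"):
    """Split a delimited hierarchy string into trimmed, non-empty levels."""
    return [level.strip() for level in value.split(delimiter) if level.strip()]


# Hypothetical source columns; only the destination names come from the article.
example_mapping = {
    "sku": "item_id",
    "name": "item_title",
    "product_url": "item_url",
    "image_url": "main_image",
}

print(missing_required(example_mapping))           # ['category']
print(split_hierarchy("Travel > Europe > Italy"))  # ['Travel', 'Europe', 'Italy']
```

Running a check like this against your own column list before Test Run can surface an unmapped required attribute early. How Blueshift itself trims whitespace around the delimiter is not stated in this article, so the trimming here is an assumption.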

Scheduling and launching the import task

Select the Start Date using the date picker.
Check 'Is it a recurring data import?' to enable recurring imports.
Choose when the task ends:
- 'Never' for an indefinite schedule.
- 'At some time' to set an End Date.
Set the execution frequency (e.g., every 15 minutes).
- Scheduling options: Minutes, Hourly, Daily, Weekly, and Monthly.
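
To reason about how often a recurring task will run, the schedule settings above can be modeled as follows. This is an illustrative sketch only, not Blueshift's scheduler: the run_times function is hypothetical, it covers only fixed-length intervals (so not Monthly), and treating the End Date as inclusive is an assumption.

```python
from datetime import datetime, timedelta


def run_times(start, every, end=None, limit=5):
    """Return up to `limit` run times from `start`, spaced `every` apart.

    `end=None` models the 'Never' option; a datetime models 'At some time'.
    The end is treated as inclusive, which is an assumption.
    """
    times = []
    current = start
    while len(times) < limit and (end is None or current <= end):
        times.append(current)
        current += every
    return times


# Example: start at 09:00, repeat every 15 minutes, end at 09:40.
start = datetime(2024, 1, 1, 9, 0)
end = datetime(2024, 1, 1, 9, 40)
for t in run_times(start, timedelta(minutes=15), end=end):
    print(t.isoformat())
```

With these values the sketch lists three runs, at 09:00, 09:15, and 09:30, because the next slot at 09:45 falls after the end time.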

Review the setup. In the top right corner of the screen:
- Click the Save button to save the task.
- Click the Launch button to start the task.

After the catalog has been uploaded, you will receive an email confirmation. The email includes information about both processed and failed records.

Import task status

The index page for catalog import indicates the status for the catalog import task as either Draft, Launched, Paused, or Completed. For more information, see View catalog upload status.
