With the integration of Blueshift and Databricks, you can import customer data into your Blueshift account and build a 360-degree view of all your customers.
For information about data types, user attributes, and data formats, see Import customer data.
Prerequisites
Before you can import data from Databricks, you must set up the Blueshift integration with Databricks and configure at least one adapter.
Set up an import task
You can import catalogs, events, and customer data using Databricks. The starting point for your import tasks depends on the type of data you are importing.
To set up a task to import customer data from Databricks, complete the following steps:
- Go to Customer Data > Profiles in the left navigation. Click +CUSTOMERS > Import Customers.
- Select Databricks as the Source.
- Add a Name for the task. The import task form opens.
- In the Destination section, you can see the type of data being imported as Customers.
- Set up Notification Preferences to send a notification email to the specified email addresses when there is a change in the status of the task or when a certain percentage of records fail during import.
- In the Source section, select the adapter that you want to use for the import task.
- For Import From, select either Table or View. If your data is spread across multiple tables, it is recommended that you provide a View.
- Select the Table or the View from which the data is to be imported.
- Sample data consisting of 10 records is fetched from the table or view specified in the Source section. This data is displayed in the Configuration section.
- Map the fields from the imported data to the fields in Blueshift and specify the data type for each field.
- For customer data, one column must be mapped to either of the following customer identifiers in Blueshift: email, retailer_customer_id, customer_id, device_id, or cookie.
- The Source Attribute Name is the attribute in Databricks and the Destination Attribute Name is the attribute in Blueshift.
Note: You cannot change the data type for a custom attribute once it is imported.
- If the sample data does not contain all the available fields, use Add more fields to add them to the data mapping.
- Click Test Run to test the mapping. A maximum of 10 records are fetched during this test run.
- Verify that the data mapping is done correctly. Edit the data mapping if required. Click Test Run again after you make the changes.
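The identifier requirement in the mapping step can be checked before you enter the mapping in the form. The following is a minimal sketch, not part of Blueshift or Databricks: the function names and the column names in the example are hypothetical, and only the five identifier names come from the step above.

```python
# Customer identifiers listed in the mapping step above.
CUSTOMER_IDENTIFIERS = {
    "email",
    "retailer_customer_id",
    "customer_id",
    "device_id",
    "cookie",
}


def mapped_identifiers(mapping):
    """Return the customer identifiers that the mapping targets.

    `mapping` maps a Source Attribute Name (Databricks column) to a
    Destination Attribute Name (Blueshift attribute).
    """
    return sorted(set(mapping.values()) & CUSTOMER_IDENTIFIERS)


def has_customer_identifier(mapping):
    """True if at least one column is mapped to a customer identifier."""
    return bool(mapped_identifiers(mapping))


# Hypothetical column and attribute names, for illustration only.
example_mapping = {
    "email_address": "email",
    "first_name": "firstname",
    "signup_ts": "joined_at",
}

print(has_customer_identifier(example_mapping))
print(mapped_identifiers(example_mapping))
```

A mapping that fails this check has no column mapped to a customer identifier, so it does not meet the requirement stated above.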
In the Additional Configuration section, set up the import type and preferences for customer profile updates.
Type of import
- Full import: Select this option if you are importing bulk data. In a full import, all the data from the selected table or view is imported from Databricks every time you run the import task.
- Incremental import: Select this option to set up an incremental import task, which only imports new or updated records since the last import. For incremental import, specify the following:
- Diff identification: Choose how the system will identify new or updated records (e.g., based on time or a unique identifier).
- Diff identifier: Specify the field used as the reference point for identifying changes, such as joined_at.
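To illustrate the idea behind a time-based diff identifier, here is a minimal sketch in Python. It is not how Blueshift implements incremental imports, which this article does not describe. It only shows the general technique of keeping the records whose diff identifier is later than the previous import. The function names and sample rows are hypothetical.

```python
from datetime import datetime


def select_incremental(records, diff_identifier, last_imported):
    """Return the records whose diff identifier is later than the last import."""
    return [r for r in records if r[diff_identifier] > last_imported]


def next_watermark(records, diff_identifier, last_imported):
    """Return the latest diff-identifier value seen, for use in the next run."""
    return max([r[diff_identifier] for r in records] + [last_imported])


# Hypothetical rows; joined_at plays the role of the diff identifier.
rows = [
    {"email": "john@example.com", "joined_at": datetime(2024, 1, 5)},
    {"email": "jane@example.com", "joined_at": datetime(2024, 2, 10)},
    {"email": "bob@example.com", "joined_at": datetime(2024, 3, 1)},
]
last_imported = datetime(2024, 2, 1)

new_rows = select_incremental(rows, "joined_at", last_imported)
print([r["email"] for r in new_rows])
print(next_watermark(new_rows, "joined_at", last_imported))
```

In this sketch, a field that does not increase when a record is updated would cause updates to be missed.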
Import preferences
- Allow blank user attributes (Enabled by default): Updates existing profiles with blank values from the import data.
Note: Boolean attributes cannot be set to NULL or blank; they must be true or false.
- Update all matching profiles (Disabled by default): This option is relevant when there are multiple customer profiles in the system with the same identifier (e.g., email address). Enabling this option ensures that all profiles with the matching identifier are updated.
- If multiple profiles share the same identifier (e.g., jdoe@example.com linked to Jane Doe and John Doe), all matching profiles will be updated.
- If there is only one profile matching the identifier, this option does not provide any additional functionality, and is effectively redundant.
- If none of the identifiers in the import match an existing profile, a new profile will be created regardless of whether 'Update all matching profiles' is selected.
For example:
Existing profiles: Jane Doe (jdoe@example.com) and John Doe (jdoe@example.com)
Import file includes: Updates to jdoe@example.com
Result: Both Jane's and John's profiles are updated.
- Only update existing customer profiles (Disabled by default): Updates only profiles that already exist in the system. If an identifier from the import file doesn’t exist in the system, no new profile is created.
For example:
Existing profiles: john@example.com and jane@example.com
Import file includes: john@example.com, jane@example.com, and bob@example.com
Result: John and Jane’s profiles are updated, but no profile is created for Bob.
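The two examples above can be reproduced with a small toy model. This is a sketch of the described behavior only, not Blueshift's implementation. It matches profiles on email alone, and the function and parameter names are hypothetical. The article does not say which profile is updated when Update all matching profiles is disabled and several profiles match, so the sketch assumes the first match.

```python
def apply_import(profiles, rows, update_all_matching=False,
                 only_update_existing=False):
    """Apply import rows to profile dicts, matching on email.

    Returns a new list; the input lists are not modified.
    """
    result = [dict(p) for p in profiles]
    for row in rows:
        matches = [p for p in result if p["email"] == row["email"]]
        if not matches:
            # No existing profile: create one unless the
            # "Only update existing customer profiles" option is on.
            if not only_update_existing:
                result.append(dict(row))
            continue
        # Assumption: with the option off, only the first match is updated.
        targets = matches if update_all_matching else matches[:1]
        for profile in targets:
            profile.update(row)
    return result


# Example 1: two profiles share one email address.
shared = [
    {"name": "Jane Doe", "email": "jdoe@example.com"},
    {"name": "John Doe", "email": "jdoe@example.com"},
]
updated = apply_import(
    shared,
    [{"email": "jdoe@example.com", "city": "Austin"}],
    update_all_matching=True,
)
print([p.get("city") for p in updated])

# Example 2: Bob is not an existing profile.
existing = [{"email": "john@example.com"}, {"email": "jane@example.com"}]
incoming = [
    {"email": "john@example.com", "city": "Boston"},
    {"email": "jane@example.com", "city": "Denver"},
    {"email": "bob@example.com", "city": "Miami"},
]
result = apply_import(existing, incoming, only_update_existing=True)
print([p["email"] for p in result])
```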
- Click Continue to proceed to the next step.
- In the Schedule section, set the Start Time for the import task.
- To set up a recurring task, under Schedule select the Is it a recurring data import? option.
- Set the frequency using the Task executes every field. You can set the frequency in minutes, hours, days, weeks, or months. You cannot set an import frequency shorter than a day for a Full import.
- Click Save to save the task.
- Click Launch to run the import task.
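The frequency rule for recurring tasks can be expressed as a small check. The following is a sketch under stated assumptions, not Blueshift code: the function name and the "full" and "incremental" labels are hypothetical, and a month is approximated as 30 days for the comparison.

```python
from datetime import timedelta

# Assumption: a month is approximated as 30 days in this sketch.
UNIT_TO_DELTA = {
    "minutes": timedelta(minutes=1),
    "hours": timedelta(hours=1),
    "days": timedelta(days=1),
    "weeks": timedelta(weeks=1),
    "months": timedelta(days=30),
}


def is_valid_frequency(import_type, every, unit):
    """Check a 'Task executes every' setting against the rule above.

    A Full import cannot run more often than once a day.
    """
    if every < 1:
        return False
    interval = every * UNIT_TO_DELTA[unit]
    if import_type == "full":
        return interval >= timedelta(days=1)
    return True


print(is_valid_frequency("full", 12, "hours"))
print(is_valid_frequency("full", 1, "days"))
print(is_valid_frequency("incremental", 15, "minutes"))
```

The first call reports an invalid setting because 12 hours is shorter than a day for a Full import. The other two settings pass this check.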
Import task status
The index page for Customer imports lists each Databricks import task along with its overall status. For more information, see Customer Import Tasks.