Setting Up a DataSync Port Job to Copy a Dataset

Update: Starting with DataSync 1.8.1, Port Job functionality uses a faster and more reliable code path. If you encounter issues with Port Jobs while using a previous version of DataSync, please update to the newest version.

This article explains how to set up and run a Port Job using the DataSync 1.9.1 interface. Port Jobs are used for copying datasets that are already on the Data & Insights platform. Port Jobs allow users with publishing rights to copy both dataset schemas (metadata and columns) and data (rows). Please note that Port Jobs cannot be used to copy views.

First, launch DataSync by navigating to the folder containing the DataSync JAR file that you downloaded previously, and fill out the Authentication Details section if it is not already filled out.
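If you prefer launching DataSync from a script rather than double-clicking the JAR, a minimal sketch along these lines should work. The folder and JAR filename shown (datasync-1.9.1.jar) are assumptions; substitute the location and name of the file you actually downloaded.

    import subprocess
    from pathlib import Path

    # Folder containing the DataSync JAR you downloaded (assumed location).
    datasync_dir = Path.home() / "datasync"

    # Assumed JAR filename; substitute the name of the file you actually downloaded.
    jar_name = "datasync-1.9.1.jar"

    # Running the JAR with no arguments opens the DataSync user interface.
    subprocess.Popen(["java", "-jar", jar_name], cwd=datasync_dir)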

In the DataSync UI, go to File -> New... -> Port Job. This will open a new Port Job window.

To run a Port Job, there are five sections that need to be configured:

1. Port Method: Choose one of the following:

  • Copy schema only: This will copy the metadata and columns of the source dataset into a new dataset. No row data is copied over.
  • Copy schema and data: This copies both the metadata/column info and all row data, effectively making a duplicate of the source dataset.
  • Copy data only: This copies the row data from the source dataset into the destination dataset. Starting with DataSync version 1.8.1, this option will create a working copy of the dataset. The effect on the destination dataset is determined by the Publish Method option below. Please note, this option will only succeed if the schemas of the source and destination datasets agree.

2. Source Domain: The domain to which the source dataset belongs.
3. Source Dataset ID: The dataset identifier of the source dataset.
4. Destination Dataset ID: The dataset identifier of the destination dataset. This is only needed if selecting Copy data only as the Port Method.
5. Publish Destination Dataset: Only relevant if copying the schema via Copy schema only or Copy schema and data as the Port Method. Copy data only jobs will always create a working copy. Choose one of the following:

  • Yes: This will publish the destination dataset to complete the Port Job.
  • No, create a working copy: This will leave the destination dataset as a working copy.

** Note: The Destination domain will be the domain entered with your user credentials.

** Publish Method: Only relevant if selecting Copy data only as the Port Method. Choose one of the following:

  • upsert: This will upsert the data from the source dataset into the destination dataset, updating rows that exist already, inserting those that do not.
  • replace: This will replace the data in the destination dataset with that in the source dataset.

Once these options are filled out, you can either click Run Job Now to run the currently configured job immediately, or click Save Job to save the job so that it can be run again later or automated (a sketch of one automation approach follows).
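As a sketch of what that automation might look like: a saved job file can typically be run without opening the UI by passing it back to the DataSync JAR on the command line, which is how jobs are usually scheduled. The paths and file names below (datasync-1.9.1.jar, port_job.spj) are assumptions for illustration; adjust them to match your own JAR and saved job file.

    import subprocess
    from pathlib import Path

    # Assumed locations; adjust to where you keep the DataSync JAR and your saved job file.
    datasync_jar = Path.home() / "datasync" / "datasync-1.9.1.jar"   # assumed filename
    saved_job = Path.home() / "datasync" / "jobs" / "port_job.spj"   # assumed filename

    # Passing a saved job file to the JAR runs it without opening the UI,
    # so this script can be dropped into cron or Windows Task Scheduler.
    result = subprocess.run(
        ["java", "-jar", str(datasync_jar), str(saved_job)],
        capture_output=True,
        text=True,
    )

    # A non-zero exit code indicates the job failed; surface DataSync's output for troubleshooting.
    if result.returncode != 0:
        raise RuntimeError(f"Port Job failed:\n{result.stdout}\n{result.stderr}")
    print("Port Job completed successfully.")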

Video: Set up a DataSync Port Job

 
