Overview of Ingress Methods

Introduction

Data & Insights supports a wide variety of methods and tools to import or federate data. This article summarizes these methods, comparing their ease of use, their ability to transform data, and what each needs in order to schedule automatic updates. Some of the tools require users to install additional software on their computers, and some are better suited to users with some developer skills. Below the summary table is a brief description of how to access each tool, with links to general supporting articles.

Summary Table

Tool                     Access                         Complexity  Transformations  Schedulable   Software Installation Needed  Developer Skills Needed
Manual file upload       Dataset Management Experience  Low         Yes              No            No                            None
URL link                 Dataset Management Experience  Low         Yes              Yes           No                            None
Gateway                  Dataset Management Experience  High        Yes              Yes           Yes                           Some
Link to External Source  Dataset Management Experience  Low         No               No            No                            None
Catalog connector        Admin panel                    Low         No               Partially     No                            None
Esri Connector           Admin panel                    Low         No               Partially     No                            None
DataSync                 Off platform                   Moderate    No               Off platform  Yes                           None
API                      Off platform                   High        Yes              Off platform  No                            Yes
FME                      Off platform                   High        Yes              Off platform  Yes                           None

 

Access: where the tool can be accessed. "Off platform" indicates the tool is not part of the Data & Insights platform. The Dataset Management Experience is the user interface ingress tool accessed through a dataset primer page on a Data & Insights domain.

Schedulable: the tool supports scheduled (automated) updates.
Transformations: the tool supports automated data transformation.

 

Useful Publishing Links

Data & Insights  - Data Management Experience

The Data & Insights Data Management Experience is accessed by choosing "Create" on the Data & Insights platform. It provides a graphical user interface with several screens that allow the user to choose a data source, customize data types, set transforms, and create georeferenced columns. This is the easiest and most straightforward tool for data ingress. It is the tool used for manual file upload, import via URL, and linking to an external source.
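Manual uploads have a recommended limit of 4 GB, and larger files should be broken up into smaller updates. The sketch below shows one possible way to do that for a CSV file, splitting by row count and repeating the header in every part. The function name `split_csv` and the part-file naming pattern are hypothetical, and the row count has to be chosen so that each part stays under the size limit.

```python
import csv
from pathlib import Path


def split_csv(source: Path, out_dir: Path, rows_per_part: int) -> list[Path]:
    """Split a CSV into numbered part files, repeating the header in each."""
    if rows_per_part < 1:
        raise ValueError("rows_per_part must be at least 1")
    out_dir.mkdir(parents=True, exist_ok=True)
    parts: list[Path] = []
    with source.open(newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None:
            return parts  # empty file: nothing to split
        out_file = None
        writer = None
        for index, row in enumerate(reader):
            # Start a new part file every rows_per_part data rows.
            if index % rows_per_part == 0:
                if out_file is not None:
                    out_file.close()
                part_path = out_dir / f"{source.stem}_part{len(parts) + 1:03d}.csv"
                out_file = part_path.open("w", newline="", encoding="utf-8")
                writer = csv.writer(out_file)
                writer.writerow(header)
                parts.append(part_path)
            writer.writerow(row)
        if out_file is not None:
            out_file.close()
    return parts
```

Because manual updates allow replace or append, the first part could be uploaded as a replace and the remaining parts appended in order.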

Data & Insights - Gateway

A gateway is a small Java application that integrates with Data & Insights. A gateway agent is paired with one or more plugins. These source-specific plugins allow access to a wide variety of sources. Once installed on the user's machine or on a server, a gateway plugin is accessed via the user interface in the Data & Insights Data Management Experience. Gateway agents are created and managed through the Data & Insights Administration panel.

Data & Insights - Catalog Connector

Catalog connectors federate assets to the Data & Insights catalog from a data.json source. They are created and managed through the Data & Insights Administration panel. 
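A data.json source is a JSON catalog file. As a rough illustration of the kind of information such a file carries, the sketch below lists the title and download URLs of each entry. It assumes the source follows the Project Open Data data.json layout (a top-level `dataset` array whose entries have a `title` and a `distribution` list with `downloadURL` values). The sample catalog and the function name `list_catalog_assets` are hypothetical, and this is not the connector's actual implementation.

```python
import json


def list_catalog_assets(catalog_text: str) -> list[dict]:
    """Return the title and download URLs for each dataset in a data.json document."""
    catalog = json.loads(catalog_text)
    assets = []
    for entry in catalog.get("dataset", []):
        # An entry may have several distributions, or none at all.
        urls = [
            dist["downloadURL"]
            for dist in entry.get("distribution", [])
            if "downloadURL" in dist
        ]
        assets.append({"title": entry.get("title", ""), "urls": urls})
    return assets


# Hypothetical two-entry catalog, used only for illustration.
SAMPLE = """
{
  "dataset": [
    {"title": "Building Permits",
     "distribution": [{"downloadURL": "https://example.org/permits.csv",
                       "mediaType": "text/csv"}]},
    {"title": "Street Trees", "distribution": []}
  ]
}
"""

for asset in list_catalog_assets(SAMPLE):
    print(asset["title"], asset["urls"])
```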

Data & Insights - Esri Connector

The Esri Connector creates external links to an Esri site, so the data appears as linked Esri maps. Esri connectors are created and managed through the Data & Insights Administration panel. These maps are easy to create but not very configurable. If you need just the tabular data behind the Esri asset to work within Data & Insights, consider using the Gateway / Esri plugin.

Data & Insights - DataSync

DataSync is a Java application designed to integrate with Data & Insights. Once downloaded, it is accessed through its own simple graphical user interface by opening the application stored in a directory on the user's computer.
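Because DataSync runs off platform, recurring updates depend on an external scheduler. A scheduler mostly needs a command that finishes with a meaningful exit code. The following is a minimal sketch of a wrapper that a scheduler could call. `run_job` is a hypothetical helper, and the command it runs (for example, a headless DataSync invocation) is supplied by the caller because its exact arguments are not covered in this article.

```python
import subprocess
import sys


def run_job(command: list[str], timeout_seconds: int = 3600) -> int:
    """Run one update command and return its exit code for the scheduler."""
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_seconds
        )
    except subprocess.TimeoutExpired:
        print("job timed out", file=sys.stderr)
        return 1
    if result.returncode != 0:
        # Surface the tool's error output so the scheduler's log shows the cause.
        print(result.stderr, file=sys.stderr)
    return result.returncode
```

A scheduled script could call `run_job` with the chosen command and pass the returned value to `sys.exit`, so that failed updates are visible to the scheduler.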

Data & Insights - Publisher APIs 

APIs (Application Programming Interfaces) carry requests between applications. The Data & Insights Publisher APIs can be used to upload and update data from external sources to Data & Insights datasets. APIs can also be used in scripts or as building blocks of other programs that use data from Data & Insights.
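To make the idea of an upload request concrete, here is a minimal sketch that only builds the HTTP request without sending it. It assumes a SODA-style endpoint of the form `/resource/<dataset id>.json` that accepts a JSON array of rows via POST with HTTP Basic authentication. That endpoint shape, the domain, the dataset identifier, and the function name `build_upsert_request` are assumptions for illustration and should be checked against the current Publisher API documentation.

```python
import base64
import json
import urllib.request


def build_upsert_request(
    domain: str, dataset_id: str, rows: list[dict], username: str, password: str
) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying rows as a JSON array."""
    # Assumed endpoint layout; confirm against the Publisher API documentation.
    url = f"https://{domain}/resource/{dataset_id}.json"
    credentials = base64.b64encode(
        f"{username}:{password}".encode("utf-8")
    ).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(rows).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {credentials}",
        },
    )


# Hypothetical domain, dataset id, and credentials.
request = build_upsert_request(
    "data.example.gov", "abcd-1234", [{"id": 1, "name": "Example"}], "user", "secret"
)
print(request.get_method(), request.full_url)
```

Sending the request would be a separate step, for example with `urllib.request.urlopen(request)`, once the endpoint and credentials have been confirmed.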

Additional API Resources

Libraries/SDKs and example code in various programming languages, which make using our APIs much easier, are available at: http://dev.socrata.com/libraries

Libraries & SDKs:

(Languages marked with a * have good support for data publishing.)

 

Safe Software - FME

FME, from Safe Software, is an ETL tool that makes it easy to create workflows that extract data from source systems, perform cleanup and transformation, and publish that data to Data & Insights. The software is separate from Data & Insights.

 

Pros and Cons

Each of these methods has advantages and disadvantages. Some have constraints, and some shine in certain situations.

Method Pros Cons
Manual File Upload

Pros:
Very straightforward. User interface is easy to use.
Transforms, geocoding, and datatypes are easy to set in the Dataset Management Experience.
Updates allow for replace or append.

Cons:
Has a recommended limit of 4 GB. Files larger than 4 GB should be broken up into smaller updates.
Must be done manually - not schedulable. No automation within the platform or with other tools.

Import via URL

Pros:
Very straightforward. Great way to link to a CSV file hosted publicly online. Any supported file type can be used, such as CSV, TSV, XLSX, or ZIP.
Transforms, geocoding, and datatypes are easy to set in the Dataset Management Experience.
Schedulable through the Dataset Management Experience.
One of the easiest ways to schedule and automate updates.

Cons:
Updates allow for replace only.
If copying from a Data & Insights domain, transforms will not copy - only the data.
If the source data schema changes, scheduled updates will fail until schema conflicts are resolved.
Data must be available via public URL.

Link to External Source

Pros:
Creates a link to data stored on other servers through a URL.
Easy to set up through the Data Management Experience.
Multiple sources can be linked on one page.

Cons:
User is reliant on the external source for data management.
Data is not usable on the Data & Insights platform, because the data lives externally.

Gateway

Pros:
Once scheduling is set up, it will run automatically up to once a day.
Supports a wide variety of data sources, including the US Census.
Connections are created and managed through the Dataset Management Experience.
Transforms, geocoding, and datatypes are easy to set in the Dataset Management Experience.

Cons:
Users must install software on their own computer or server.
Requires Java 8 (or newer).
The user environment (source system/machine/network factors) can affect the ease of setup.
The user manages the connection.
Can only replace data; appending data is not currently possible.

Catalog Connector

Pros:
Connects to data.json sources.
The data.json connector is handy for linking to CKAN.
Configured in the domain's administration panel.

Cons:
Limited to a monthly or daily update cadence; the time of day cannot be specified.
Limited to external link datasets only.


Esri Connector

Pros:
Connects directly to public Esri sources.
Esri connectors show data as linked Esri maps. Maps are created automatically.
Configured in the domain's administration panel.

Cons:
Esri connectors can pull in only limited metadata: title and description.
Esri connectors do not pull in map symbols.
All connected assets are public by default.

DataSync

Pros:
Provides a basic user interface command tool.
Used to import CSV files from a computer. Can import files over 4 GB.
Can be used to replace all rows, append or upsert rows, or to delete rows.
Used for "port jobs" - an easy way to copy Data & Insights datasets within or between domains.
Can be run headlessly.

Cons:
Free Data & Insights software, but it resides off the platform.
Requires Java 8 (or newer) and installing DataSync, a small Java application, on the user's machine.
Must be scheduled by an external tool - Windows Task Scheduler, for example.
No transforms are applied to data imported via DataSync.

FME

Pros:
Extracts, transforms, and loads data into a dataset.
Uses a graphical user interface.
Allows for creating intricate transforms and pre-processing of data prior to import. These workflows can be saved and reused.

Cons:
Software must be purchased and installed on a user's computer.
Requires learning new software.

Publisher APIs

Pros:
Allows for data to be pulled from any source that allows API access.
Allows for filtering in the request to the API, to keep unwanted data from entering the dataset.
Allows for high customization of an import process, with regard to the actions taken as well as the preferred language used.
Can be placed in a larger script for automation, or as part of another application.

Cons:
Requires some developer knowledge.
Steepest learning curve.
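Filtering in the request, mentioned above for API-based imports, usually means adding query parameters to the source URL so that only the needed rows and columns are returned. The sketch below assumes a source that accepts SoQL-style `$select`, `$where`, and `$limit` parameters. Other source APIs use different parameter names, and the URL, column names, and function name `build_filtered_url` here are hypothetical.

```python
from urllib.parse import urlencode


def build_filtered_url(base_url: str, where: str, columns: list[str], limit: int) -> str:
    """Append SoQL-style filter parameters to a source URL."""
    query = urlencode(
        {"$select": ",".join(columns), "$where": where, "$limit": limit},
        safe="$,",  # keep "$" and "," readable instead of percent-encoding them
    )
    return f"{base_url}?{query}"


# Hypothetical source endpoint and filter.
url = build_filtered_url(
    "https://data.example.gov/resource/abcd-1234.json",
    where="year >= 2020",
    columns=["id", "year"],
    limit=1000,
)
print(url)
```

Filtering at the source keeps the import small and avoids a separate cleanup step after the data has been loaded.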

 
