- Introduction / Summary Table
- Useful Publishing Links
- Data & Insights - Data Management Experience
- Data & Insights - Gateway
- Data & Insights - Catalog Connector
- Data & Insights - Esri Connector
- Data & Insights - DataSync
- Data & Insights - Publisher APIs
- Safe Software FME
- Pros and Cons
Introduction
Data & Insights supports a wide variety of methods and tools to import or federate data. This article summarizes these methods, comparing their ease of use, ability to transform data, and what is needed for each to schedule automatic updates. Some of the tools require users to install additional software on their computers, and some are easier for users who have some developer skills. Below this summary table is a brief description of how to access each tool, accompanied by some links to general supporting articles.
Summary Table
Tool | Access | Complexity | Transformations | Schedulable | Software Installation Needed | Developer Skills Needed |
Manual file upload | Dataset Management Experience | Low | Yes | No | No | None |
URL link | Dataset Management Experience | Low | Yes | Yes | No | None |
Gateway | Dataset Management Experience | High | Yes | Yes | Yes | Some |
Link to External Source | Dataset Management Experience | Low | No | No | No | None |
Catalog | Admin panel | Low | No | Partially | No | None |
Esri Connector | Admin panel | Low | No | Partially | No | None |
DataSync | Off platform | Moderate | No | Off platform | Yes | None |
API | Off platform | High | Yes | Off platform | No | Yes |
FME | Off platform | High | Yes | Off platform | Yes | None |
Access: where the tool can be accessed. "Off platform" indicates the tool is not part of the Data & Insights platform. The Dataset Management Experience is the data ingress user interface, accessed through a dataset primer page on a Data & Insights domain.
Schedulable: whether the tool supports scheduled (automated) updates.
Transformations: whether the tool supports automated data transformation.
Useful Publishing Links
Data & Insights - Data Management Experience
The Data & Insights Data Management Experience is accessed by choosing "Create" on the Data & Insights platform. It provides a graphical user interface with several screens that allow the user to choose a data source, customize data types, set transforms, and create georeferenced columns. It is the easiest and most straightforward tool for data ingress, and it is the tool used for Manual File Upload, Upload via URL, and Link to an External Source.
- Using the Data & Insights Data Management Experience
- Data Transformations
- Creating Georeference Columns
- Automate This - Using Python to Automate Data Updates
Data & Insights - Gateway
A gateway is a small Java application that integrates into Data & Insights. A gateway agent is paired with one or more plugins. These source-specific plugins allow access to a wide variety of sources. Once installed on the user's machine or on a server, a gateway plugin is accessed via the user interface in the Data & Insights Data Management Experience. Gateway agents are created and managed through the Data & Insights Administration panel.
- Gateway Overview
- Gateway Scheduling
- Gateway Readme for Windows
- Gateway ReadMe for MacOS
- Data & Insights Gateway: Available Plugins
Data & Insights - Catalog Connector
Catalog connectors federate assets to the Data & Insights catalog from a data.json source. They are created and managed through the Data & Insights Administration panel.
- Esri Asset Federation and Data Connections
- Using the catalog connector to federate and connect external assets
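For readers unfamiliar with the format, a data.json source is a JSON catalog file that lists datasets. The sketch below is not part of the connector itself. It assumes the common Project Open Data layout (a top-level `dataset` array whose entries carry `title` and `distribution` fields) and simply lists what such a file exposes. The sample catalog and the function name `list_catalog_assets` are hypothetical.

```python
import json

# Hypothetical sample catalog, assuming the Project Open Data data.json layout.
SAMPLE_DATA_JSON = """
{
  "dataset": [
    {
      "title": "Street Trees",
      "description": "Inventory of city-maintained trees.",
      "distribution": [
        {"downloadURL": "https://example.org/trees.csv", "mediaType": "text/csv"}
      ]
    },
    {
      "title": "Bike Lanes",
      "description": "Bike lane centerlines.",
      "distribution": []
    }
  ]
}
"""


def list_catalog_assets(data_json_text):
    """Return (title, [urls]) pairs for each dataset entry in a data.json document."""
    catalog = json.loads(data_json_text)
    assets = []
    for entry in catalog.get("dataset", []):
        urls = [
            dist.get("downloadURL") or dist.get("accessURL")
            for dist in entry.get("distribution", [])
        ]
        # Drop distributions that list neither URL field.
        urls = [u for u in urls if u]
        assets.append((entry.get("title", "(untitled)"), urls))
    return assets


if __name__ == "__main__":
    for title, urls in list_catalog_assets(SAMPLE_DATA_JSON):
        print(title, urls)
```

Running a check like this against a source before creating a connector can help confirm that the file parses and that its entries carry the links you expect.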
Data & Insights - Esri Connector
The Esri Connector creates external links to an Esri site, so the data appears as linked Esri maps. Esri connectors are created and managed through the Data & Insights Administration panel. These maps are easy to create but not very configurable. If you need just the tabular data behind the Esri asset to work within Data & Insights, consider using the Gateway with the Esri plugin.
- Esri Connector Creates New Maps
- Esri Asset Federation and Data Connections
- Esri Connector vs Gateway Plugin
Data & Insights - DataSync
DataSync is a Java application designed to integrate with Data & Insights. Once downloaded, it is accessed through its own simple graphical user interface by opening the application stored in a directory on the user's computer.
- DataSync
- Getting started with DataSync
- Scheduling DataSync jobs
- DataSync source code
- File bugs & feature requests for DataSync
Data & Insights - Publisher APIs
APIs (Application Programming Interfaces) carry requests between applications. The Data & Insights Publisher APIs can be used to upload data from external sources to Data & Insights datasets and to update those datasets. APIs can also be used in scripts or as building blocks of other programs that use data from Data & Insights.
- Getting Started with Data Publishing
- Publishing and Managing Data
- Authentication and App Tokens
- Understanding Row Identifiers
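As a rough illustration of publishing through an API from a script, the sketch below builds (but does not send) an HTTP request that would upsert rows into a dataset. It assumes the SODA-style endpoint `https://<domain>/resource/<dataset id>.json`, HTTP Basic authentication, and an optional `X-App-Token` header. The domain, dataset ID, credentials, and the function name `build_upsert_request` are placeholders, so check the articles above for the details that apply to your domain.

```python
import base64
import json
import urllib.request


def build_upsert_request(domain, dataset_id, rows, username, password, app_token=None):
    """Build (but do not send) a POST request that upserts rows into a dataset.

    Assumes the SODA-style endpoint https://<domain>/resource/<dataset_id>.json
    and HTTP Basic authentication; confirm both against the publishing docs.
    """
    url = f"https://{domain}/resource/{dataset_id}.json"
    body = json.dumps(rows).encode("utf-8")
    credentials = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Basic {credentials}",
    }
    if app_token:
        headers["X-App-Token"] = app_token
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


if __name__ == "__main__":
    # Placeholder values; nothing is sent until urlopen() is called.
    request = build_upsert_request(
        domain="data.example.gov",
        dataset_id="abcd-1234",
        rows=[{"id": "1", "name": "Example row"}],
        username="publisher@example.gov",
        password="not-a-real-password",
    )
    print(request.get_method(), request.full_url)
    # To actually publish, a caller would run:
    #   with urllib.request.urlopen(request) as response:
    #       print(json.load(response))
```

Whether a posted row updates an existing row or is appended as a new one generally depends on the dataset's row identifier, which is why that topic is worth reading before automating updates.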
Additional API Resources
Libraries/SDKs and example code in various programming languages, which make using our APIs much easier, are available at: http://dev.socrata.com/libraries
Libraries & SDKs:
- DataSync SDK (Java) **most recommended**: http://socrata.github.io/datasync/guides/datasync-library-sdk.html
- Java*: https://github.com/socrata/soda-java
  - Documentation for the soda-java library is here: http://socrata.github.io/soda-java/
  - Some example usage of the library here: https://github.com/socrata/soda-java-examples
- .NET* (C#): https://github.com/CityofSantaMonica/SODA.NET
- JavaScript: https://github.com/socrata/soda-js
- Scala: https://github.com/socrata/soda-scala
(Languages marked with a * have good support for data publishing.)
Safe Software - FME
Safe Software's FME is an ETL tool that makes it easy to create workflows that extract data from source systems, perform cleanup and transformation, and publish that data to Data & Insights. The software is separate from Data & Insights.
- Using FME to publish data to Data & Insights
- Safe's write-up/video on the FME Data & Insights Writer
- Getting started with FME
- FME Documentation - FMEpedia
- FME Support
- Handling Computed Columns with FME
- Working with Point Data with FME Data & Insights Writer
- Guide to using Automate This! with FME
Pros and Cons
Each of these methods has advantages and disadvantages. Some have constraints, and some shine in certain situations.
Method | Pros | Cons |
Manual File Upload | Very straightforward. User interface is easy to use. Transforms, geocoding, and datatypes are easy to set in the Dataset Management Experience. | Has a recommended limit of 4 GB. Files larger than 4 GB should be broken up into smaller updates. Must be done manually - not schedulable. No automation within the platform or with other tools. |
Import via URL | Very straightforward. Great way to link to a CSV file hosted publicly online. CSV, TSV, XLSX, ZIP, or any other supported file type can be used. Transforms, geocoding, and datatypes are easy to set in the Dataset Management Experience. Schedulable through the Dataset Management Experience. | Updates allow for replace only. If copying from a Data & Insights domain, transforms will not copy - only the data. If the source data schema changes, scheduled updates will fail until schema conflicts are resolved. Data must be available via public URL. |
Link to External Source | Creates a link to data stored on other servers through a URL. Easy to set up through the Data Management Experience. Multiple sources can be linked on one page. | User is reliant on the external source for data management. Data is not usable on the Data & Insights platform, because the data lives externally. |
Gateway | Once scheduling is set up, it will run automatically up to once a day. Supports a wide variety of data sources, including the US Census. Connections are created and managed through the Dataset Management Experience. Transforms, geocoding, and datatypes are easy to set in the Dataset Management Experience. | Users must install software on their own computer or server. Requires Java 8 (or newer). The user environment (source system/machine/network factors) can affect the ease of setup. The user manages the connection. |
Catalog Connector | Connects to data.json sources. | Limited to a monthly or daily update cadence; the time of day cannot be specified. Limited to external link datasets only. |
Esri Connector | Connects directly to public Esri sources. | Esri connectors can pull in only limited metadata: title and description. Esri connectors do not pull in map symbols. All connected assets are public by default. |
DataSync | Provides a basic user interface and command-line tool. | Free Data & Insights software, but it resides off the platform. Requires Java 8 (or newer) and installing DataSync, a small Java application, on the user's machine. Must be scheduled by an external tool - Windows Task Scheduler, for example. |
FME | Extracts, transforms, and loads data into a dataset. Uses a graphical user interface. Allows for creating intricate transforms and pre-processing of data prior to import. These workflows can be saved and reused. | Software must be purchased and installed on a user's computer. Requires learning new software. |
Publisher APIs | Allows for data to be pulled from any source that allows API access. Allows for filtering in the request to the API, to reduce unwanted data from entering the dataset. Allows for high customization of an import process, with regard to the actions taken as well as the preferred language used. Can be placed in a larger script for automation, or as part of another application. | Requires some developer knowledge. Steepest learning curve. |
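The table above recommends breaking files larger than 4 GB into smaller updates. One possible way to prepare such pieces before uploading is sketched below: it splits CSV content into parts that each repeat the header row and contain only whole rows. The function name `split_csv` and the byte budget are illustrative assumptions, and how the resulting parts are then loaded (for example, as successive updates) depends on the tool you choose.

```python
import csv
import io


def split_csv(source, max_bytes):
    """Yield CSV strings that each repeat the header row and hold whole rows only.

    `source` is any iterable of text lines (for example a file opened with
    newline=""). A chunk is closed once adding the next row would push it past
    max_bytes, so a single oversized row still gets a chunk of its own.
    """
    reader = csv.reader(source)
    header = next(reader, None)
    if header is None:
        return  # Empty input: nothing to split.

    def encode(row):
        # Re-serialize the row so quoting stays valid in every chunk.
        buffer = io.StringIO()
        csv.writer(buffer, lineterminator="\n").writerow(row)
        return buffer.getvalue()

    header_text = encode(header)
    header_size = len(header_text.encode("utf-8"))
    parts, size = [header_text], header_size
    for row in reader:
        row_text = encode(row)
        row_size = len(row_text.encode("utf-8"))
        if len(parts) > 1 and size + row_size > max_bytes:
            yield "".join(parts)
            parts, size = [header_text], header_size
        parts.append(row_text)
        size += row_size
    if len(parts) > 1:
        yield "".join(parts)


if __name__ == "__main__":
    # Tiny budget so the sample splits; a real run would use a budget below 4 GB.
    sample = io.StringIO("id,name\n1,alpha\n2,beta\n3,gamma\n")
    for number, chunk in enumerate(split_csv(sample, max_bytes=20), start=1):
        print(f"--- part {number} ---")
        print(chunk, end="")
```

Parsing with the `csv` module rather than splitting on raw line breaks keeps quoted fields that contain newlines intact within a single part.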