COVID-19 - Guide to Automating Data

Data Automation

Data automation is the process of updating data on your open data portal programmatically, rather than manually. Automating the process of data uploading is important for the long-term sustainability of your open data program. Any data that is updated manually risks being delayed because it is one more task an individual has to do as part of the rest of their workload.

There are three common elements to data automation: Extract, Transform, and Load, or ETL.

  • Extract: the process of extracting your data from one or many source systems.
  • Transform: the process of transforming your data into the necessary structure, such as a flat file format like a CSV. This could also include things like changing all state abbreviations to the full state name.
  • Load: the process of loading the data into the final system, in this case the open data portal.
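The three steps above can be sketched in a few lines of Python. This is only an illustration, not a Socrata tool: the function names, the `state` column, and the small `STATE_NAMES` lookup are hypothetical, the source is CSV text, and the "portal" is stood in for by any writable file-like object.

```python
import csv
import io

# Hypothetical lookup table; a real one would cover every state.
STATE_NAMES = {"CA": "California", "NY": "New York", "WA": "Washington"}


def extract(source_text):
    """Extract: read rows from a source system (here, plain CSV text)."""
    return list(csv.DictReader(io.StringIO(source_text)))


def transform(rows):
    """Transform: replace state abbreviations with full state names."""
    cleaned = []
    for row in rows:
        row = dict(row)  # copy so the extracted rows are left untouched
        abbreviation = row["state"].strip().upper()
        # Unknown abbreviations are passed through unchanged.
        row["state"] = STATE_NAMES.get(abbreviation, row["state"])
        cleaned.append(row)
    return cleaned


def load(rows, destination):
    """Load: write the rows as a flat CSV to the destination.

    The destination is a file-like object standing in for the portal.
    Returns the number of rows written.
    """
    if not rows:
        return 0
    writer = csv.DictWriter(destination, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return len(rows)


def run_etl(source_text, destination):
    """Run extract, transform, and load in order."""
    return load(transform(extract(source_text)), destination)
```

In practice, a script like this would be run on a schedule (for example, by the operating system's task scheduler) so that the update no longer depends on someone remembering to do it.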

What tools are available for data automation?

Data automation tools vary in which part of the ETL process they are best suited for, the technical knowledge needed to use them, and their cost. The table below outlines the most common tools used in automation:

 

 

Tool                     E, T, or L?   Technical Knowledge   Cost
Socrata DataSync         L             Medium                Free
FME                      E, T, and L   Medium                Yes
Import from URL          E, T, and L   Low                   Free
Socrata Gateway          E, T, and L   Medium                Free
Socrata Publishing API   L             High                  Free, but will require developer time for custom scripts

 

Socrata DataSync

DataSync is an executable Java application that serves as a general solution for automating data publishing on the Socrata platform. It can be used through an easy-to-use graphical interface or as a command-line tool. DataSync takes a CSV or TSV file on a local machine or networked hard drive and publishes it to a Socrata dataset so that the dataset stays up to date.

FME

FME is an ETL tool developed by Safe Software that has a built-in Socrata Writer to easily publish data to your Socrata dataset.

Import from URL

Import from URL is an on-platform tool that allows you to automate data updates directly from a URL. The hosted data must be publicly available to use this option.

Socrata Gateway

Socrata Gateway is an easy-to-use on-platform solution that allows you to publish, update, and automate datasets directly from key on-premises and cloud-hosted source systems.

Socrata Publishing API

Socrata also has a number of APIs that can be used to publish and automate data on the platform.
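As a rough idea of what a custom script for the "Load" step might look like, the sketch below builds an HTTP request that sends rows as JSON to a dataset. It rests on assumptions that are not stated in this article and should be checked against the Socrata developer documentation: that the dataset accepts a POST of JSON rows at `https://<domain>/resource/<dataset id>.json`, that HTTP Basic authentication is used, and that an app token may be passed in an `X-App-Token` header. The function names, domain, and dataset ID are hypothetical.

```python
import base64
import json
import urllib.request


def build_upsert_request(domain, dataset_id, rows, username, password, app_token=None):
    """Build (but do not send) a request that posts rows to a dataset.

    Assumption: the endpoint shape and headers follow the SODA-style
    publishing API; confirm them in the official documentation.
    """
    url = f"https://{domain}/resource/{dataset_id}.json"
    credentials = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Basic {credentials}",
    }
    if app_token:
        headers["X-App-Token"] = app_token
    body = json.dumps(rows).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def send(request, timeout=60):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the request separately from sending it makes the script easier to test without touching a live dataset. A scheduled job could then call something like `send(build_upsert_request("data.example.gov", "abcd-1234", rows, user, password))`, where the domain and dataset ID are placeholders.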

 
