Socrata Gateway Technical Overview and Requirements

Socrata Gateway is an easy-to-use solution that lets database analysts and data publishers in the customer's organization ingest data from key on-premises and cloud-hosted source systems and schedule regular updates of that data.

Terminology

  • Agent: A thin Java client that runs behind the user's firewall and connects to Socrata backend systems over WebSockets. The client JAR can be downloaded from the Dataset Management page on Socrata; downloading it requires an authenticated user with the correct permissions.
  • Plugin: An extension to the agent that connects to a specific type of data source. For example, a Microsoft SQL Server plugin contains the drivers needed to connect to Microsoft SQL Server, can be configured to connect to a specific server, and receives input from the agent owner on Socrata. In addition to reading local files and databases, plugins will be able to connect to API endpoints from various sources. Plugins are authored and released by Socrata and Tyler employees.
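To make the agent/plugin split concrete, here is a minimal Java sketch of how an agent might talk to source-specific plugins through a common interface. The names `SourcePlugin`, `InMemoryPlugin`, and `readRows` are hypothetical and are not the actual Gateway plugin API; the toy plugin reads from an in-memory table where a real plugin would use its configuration (host, credentials) to query a database, file, or API.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: illustrative names only, not the real Gateway plugin API.
public class PluginSketch {

    /** A plugin knows how to read rows from one kind of source system. */
    interface SourcePlugin {
        String name();

        /** Reads rows from the configured source; each row maps column name to value. */
        List<Map<String, String>> readRows(Map<String, String> config);
    }

    /** Toy plugin that returns rows from an in-memory table instead of a real source. */
    static final class InMemoryPlugin implements SourcePlugin {
        private final List<Map<String, String>> table;

        InMemoryPlugin(List<Map<String, String>> table) {
            this.table = table;
        }

        @Override
        public String name() {
            return "in-memory";
        }

        @Override
        public List<Map<String, String>> readRows(Map<String, String> config) {
            // A real plugin would use config (server, database, credentials) here.
            return table;
        }
    }

    public static void main(String[] args) {
        SourcePlugin plugin = new InMemoryPlugin(List.of(Map.of("id", "1", "name", "Alice")));
        // The agent only depends on the interface, so new sources ship as new plugins.
        System.out.println(plugin.name() + ": " + plugin.readRows(Map.of()).size() + " row(s)");
    }
}
```

Keeping the agent coded against an interface like this is one way to let drivers for each source ship separately as plugins, which matches the article's description of plugins as extensions to the agent.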

Hardware Specs

  • 16 GB RAM
  • 5 GB hard disk space
  • The agent itself is about 15 MB unzipped; plugins range from 10 MB to 30 MB. Data processing requirements vary.

Supported OSs

  • Windows Server 2008 R2
  • Windows 10
  • macOS Sierra and later
  • Ubuntu 16 and 18

Software

 Networking

  • The customer's Socrata domain is the only URL that must be whitelisted.

 Proxies

  • Proxies are supported natively: the agent uses the system's configured default proxy.
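For readers checking what "the default system proxy" resolves to on a given machine, one standard way a Java client picks up OS proxy settings is the JDK's `java.net.useSystemProxies` property together with `ProxySelector`. Whether the Gateway agent uses exactly this mechanism is an assumption, and the target URL below is a placeholder for your own Socrata domain.

```java
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.URI;
import java.util.List;

public class SystemProxyCheck {

    /** Returns the proxies the default ProxySelector would use for the given URL. */
    static List<Proxy> proxiesFor(String url) {
        return ProxySelector.getDefault().select(URI.create(url));
    }

    public static void main(String[] args) {
        // Must be set before the default ProxySelector is first used;
        // equivalently, launch the JVM with -Djava.net.useSystemProxies=true.
        System.setProperty("java.net.useSystemProxies", "true");
        // Placeholder domain: replace with your organization's Socrata domain.
        String url = args.length > 0 ? args[0] : "https://data.example.gov";
        for (Proxy p : proxiesFor(url)) {
            System.out.println(p); // "DIRECT" means no proxy applies to this URL
        }
    }
}
```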

 SSL and Internet Access

  • Outbound internet access on port 443 is required.
  • All network communication uses WebSockets and HTTPS over SSL.
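As an illustration of why only outbound port 443 is needed, here is a minimal sketch of a Java 11+ client opening a WebSocket over TLS (`wss://`), which defaults to port 443, using the JDK's `java.net.http` API. The domain is a placeholder and the path `/hypothetical-agent-path` is invented for illustration; the article does not document the agent's actual endpoint.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.WebSocket;
import java.util.concurrent.CompletionStage;

public class GatewaySocketSketch {

    /** Builds a wss:// URI on the customer's Socrata domain; the path is a hypothetical placeholder. */
    static URI agentSocketUri(String socrataDomain) {
        return URI.create("wss://" + socrataDomain + "/hypothetical-agent-path");
    }

    /** Port actually used: the explicit port if present, else the scheme default (443 for wss/https). */
    static int effectivePort(URI uri) {
        if (uri.getPort() != -1) {
            return uri.getPort();
        }
        String scheme = uri.getScheme();
        return ("wss".equals(scheme) || "https".equals(scheme)) ? 443 : 80;
    }

    /** Opens a TLS-protected WebSocket and prints any text messages received. */
    static CompletionStage<WebSocket> connect(URI uri) {
        WebSocket.Listener listener = new WebSocket.Listener() {
            @Override
            public CompletionStage<?> onText(WebSocket ws, CharSequence data, boolean last) {
                System.out.println("received: " + data);
                ws.request(1); // ask for the next message
                return null;
            }
        };
        return HttpClient.newHttpClient().newWebSocketBuilder().buildAsync(uri, listener);
    }

    public static void main(String[] args) {
        String domain = args.length > 0 ? args[0] : "data.example.gov"; // placeholder domain
        URI uri = agentSocketUri(domain);
        System.out.println(uri + " -> port " + effectivePort(uri));
        // connect(uri).toCompletableFuture().join(); // needs outbound access on 443
    }
}
```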

Security

All credentials needed to access on-premises source systems are stored in the agent's configuration file, which resides behind the firewall; these credentials are never exposed to the Socrata platform.

The agent and its configuration file sit behind the user's firewall. What distinguishes the agent is that it can communicate directly with the user's source systems. Only users who have a Socrata account with the correct access permissions can download the Gateway agent. Because the agent must be configured to connect to the user's source systems, access is limited to users with special roles (e.g., Data Administrators).

Gateway changes how code is served, because the agent dynamically loads agent and plugin code that it downloads. This risk is mitigated by comparing the SHA-256 digest recorded for the released version against the SHA-256 digest of the version that was actually downloaded. If an attacker gained access to the S3 bucket where the agent or plugin .jar files reside and overwrote a released file with their own, the agent would reject the downloaded version. To release malicious code, an attacker would need write access to both the S3 bucket and the DSMAPI database, which are controlled by different keys and reside on different networks. This is strictly more secure than the current DataSync release process, which relies on users manually checking the digest of the .jar file they download from the Socrata site, something users rarely, if ever, do.
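The digest check described above can be sketched in a few lines of Java 17+ using the JDK's `MessageDigest` and `HexFormat`. This is a minimal illustration, not the agent's actual code: it assumes the published digest is available as a hex string (in Gateway, that value would come from the release record rather than the S3 bucket), and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

public class JarDigestCheck {

    /** Hex-encoded SHA-256 digest of the given bytes. */
    static String sha256Hex(byte[] data) throws NoSuchAlgorithmException {
        return HexFormat.of().formatHex(MessageDigest.getInstance("SHA-256").digest(data));
    }

    /** True only if the downloaded bytes hash to the digest published for the release. */
    static boolean matchesPublishedDigest(byte[] downloaded, String publishedHex)
            throws NoSuchAlgorithmException {
        byte[] actual = MessageDigest.getInstance("SHA-256").digest(downloaded);
        byte[] expected;
        try {
            expected = HexFormat.of().parseHex(publishedHex.trim()); // accepts upper or lower case
        } catch (IllegalArgumentException e) {
            return false; // a malformed digest is treated as a mismatch
        }
        // Constant-time comparison of the two digests.
        return MessageDigest.isEqual(actual, expected);
    }

    public static void main(String[] args) throws IOException, NoSuchAlgorithmException {
        if (args.length != 2) {
            System.err.println("usage: java JarDigestCheck <downloaded.jar> <published-sha256-hex>");
            System.exit(2);
        }
        byte[] jar = Files.readAllBytes(Path.of(args[0]));
        if (matchesPublishedDigest(jar, args[1])) {
            System.out.println("Digest OK: safe to load");
        } else {
            System.out.println("Digest mismatch, rejecting download (got " + sha256Hex(jar) + ")");
        }
    }
}
```

Because the expected digest and the file come from separately controlled stores, tampering with only one of them makes this comparison fail, which is the property the paragraph above relies on.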

Gateway also changes the security attack surface: a logged-in user who knows an agent's unique identifier could, in theory, direct that agent to send data to a dataset the user has access to. This is mitigated by prohibiting agents from sending data across domains and by making the agent identifier a UUID, which is practically impossible to guess.
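The two mitigations can be illustrated with a small Java 16+ sketch. The `Agent` record, `register`, and `mayWrite` are hypothetical names, and the real enforcement happens on Socrata's backend in a way this article does not describe; the sketch only shows a random (version 4) UUID as the identifier and a same-domain check before any write. `data.example.gov` is a placeholder domain.

```java
import java.util.UUID;

public class AgentTargetCheck {

    /** Hypothetical agent registration: a random ID plus the Socrata domain it belongs to. */
    record Agent(UUID id, String domain) {
        static Agent register(String domain) {
            // UUID.randomUUID() yields a version 4 UUID with 122 bits from a strong RNG.
            return new Agent(UUID.randomUUID(), domain);
        }
    }

    /** Allows a write only when the target dataset lives on the agent's own domain. */
    static boolean mayWrite(Agent agent, String datasetDomain) {
        return agent.domain().equalsIgnoreCase(datasetDomain); // domain names are case-insensitive
    }

    public static void main(String[] args) {
        Agent agent = Agent.register("data.example.gov");
        System.out.println("agent " + agent.id() + " (UUID version " + agent.id().version() + ")");
        System.out.println(mayWrite(agent, "data.example.gov")); // true
        System.out.println(mayWrite(agent, "data.other.org"));   // false
    }
}
```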

Architecture Diagram

[Figure: Socrata Gateway architecture diagram (ArchDiag.png)]

Permission Requirements

  • Windows: Administrator rights to set up a Windows service
  • macOS: Administrator rights to configure the service
  • Linux: sudo access to configure the service

Miscellaneous

  • The installation file cannot be placed on a network drive.
  • Updates are released as needed, in line with our biweekly release cycle.