Technical jargon can be overwhelming, especially acronyms. Everyone at Socrata wants you to succeed in making data accessible, more understandable, and more actionable for everyone who wants to know more about their government and communities. To that end, we've created a glossary of terms that you might hear, or read about, during your journey with Socrata's products. This glossary is meant to be a living document, so if you have suggestions for words or acronyms to include, drop us an email at support@socrata.com.

To begin using the glossary, click on a letter below to jump to that particular section. If you'd like, you can also scroll through the list to see what you can discover!

A - B - C - D - E - F - G - H - I - J - K - L - M - N - O - P - Q - R - S - T - U - V - W - X - Y - Z


- A -

API - An API (Application Programming Interface), at its most basic level, allows your product or service to talk to other products or services. In this way, an API allows you to open up data and functionality to other developers and to other businesses.


- B -

Bar graph - A bar graph or chart uses horizontal or vertical bars whose lengths proportionally represent values in a dataset. A chart with vertical bars is also called a column graph or chart.


- C -

Catalog - A catalog represents a collection of assets that are grouped into categories. Catalogs organize assets to make it easier to navigate to the information needed.

CSV - A CSV (comma-separated values) file is a specially formatted plain text file that stores spreadsheet or basic tabular information in a very simple format, with one record on each line and each field within that record separated by a comma.
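
As a minimal sketch of the "one record per line, fields separated by commas" layout, the snippet below reads a small CSV with Python's standard csv module. The sample text, city names, and column names are hypothetical and used only for illustration.

```python
import csv
import io

# Hypothetical sample data: a header line, then one record per line.
csv_text = "city,population\nSpringfield,1000\nShelbyville,800\n"

# DictReader treats the first line as field names and yields one dict per record.
records = list(csv.DictReader(io.StringIO(csv_text)))

for record in records:
    print(record["city"], record["population"])
```

Note that every field is read as text; converting "1000" to a number is a separate step.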


- D -

Data - Facts and statistics collected together for reference or analysis.

Dataset - A dataset is an organized collection of data. The most basic representation of a dataset is data elements presented in tabular form. Each column represents a particular variable. Each row corresponds to a single record and holds one value for each column's variable. A dataset may also present information in a variety of non-tabular formats, such as an Extensible Markup Language (XML) file, a geospatial data file, or an image file.

DataSync - DataSync is Socrata’s free, simple, and powerful publishing tool that allows users to schedule and automate their data updates and upload large data files.

DNS - The Domain Name System (DNS) is the friendly naming system for giving addresses to web servers and webpages. It lets people use readable names in place of numeric IP addresses.

Data asset - Within a Socrata catalog, assets may include raw, source and federated datasets, as well as visualizations (table, chart, map, data lens, or story).

Data automation - The use of automatic processes, equipment, or systems for the purpose of collecting, processing, storing, transmitting, and presenting data. Pertains especially to the use of computers and peripheral devices.

Data cleaning - Data cleaning, also called data checking or data validation, is the process by which missing, erroneous, or invalid data are identified and then corrected in, or removed from, a dataset. It follows the data preparation process.

Data management - The administrative process by which the required data is acquired, validated, stored, protected, and processed, and by which its accessibility, reliability, and timeliness are ensured to satisfy the needs of the data user.

Data preparation - The process by which data are readied for analysis, including formatting, or normalizing, the values in a dataset.

Data provisioning - The process of making data available in an orderly and secure way to users, application developers, and applications that need it.

Data transformation - The process of converting data or information from one format to another, usually from the format of a source system into the required format of a new destination system.

Data transparency - The ability to easily access and work with data no matter where it is located or what application created it. The assurance that data being reported are accurate and are coming from the official source.

Data type - A classification identifying one of various types of data, such as real, integer or Boolean, that determines the possible values for that type; the operations that can be done on values of that type; the meaning of the data; and the way values of that type can be stored.

Data visualization - The process by which data are visualized, or presented, after the data cleaning process. It involves making choices about which data will be visualized, how the data will be visualized, and what message will be shared with the target audience. The end result may also be referred to as a data visualization.

Dependent variable - A type of variable whose value is determined by, or depends on, another variable.

Domain - A group of computers and devices on a network that are administered as a unit with common rules and procedures. Within the Internet, domains are defined by the IP address. All devices sharing a common part of the IP address are said to be in the same domain.


- E -

Egress - Exporting or downloading data.

Esri - Environmental Systems Research Institute (Esri) is an international supplier of Geographic Information System (GIS) software, web GIS and geodatabase management applications.

ETL - ETL is short for extract, transform, load, three database functions that are combined into one tool to pull data out of one database and place it into another database.
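
The three steps can be sketched with Python's standard sqlite3 module. This is an illustration under stated assumptions rather than how any particular ETL tool works: two in-memory databases stand in for the source and destination, and the permits table and its rows are hypothetical.

```python
import sqlite3

# Two in-memory databases stand in for the source and destination systems.
source = sqlite3.connect(":memory:")
destination = sqlite3.connect(":memory:")

# Hypothetical source table with inconsistently formatted status values.
source.execute("CREATE TABLE permits (id INTEGER, status TEXT)")
source.executemany(
    "INSERT INTO permits VALUES (?, ?)",
    [(1, " approved "), (2, "DENIED"), (3, "Approved")],
)

# Extract: pull the rows out of the source database.
rows = source.execute("SELECT id, status FROM permits ORDER BY id").fetchall()

# Transform: normalize the status text (trim whitespace, lowercase).
cleaned = [(permit_id, status.strip().lower()) for permit_id, status in rows]

# Load: write the transformed rows into the destination database.
destination.execute("CREATE TABLE permits (id INTEGER, status TEXT)")
destination.executemany("INSERT INTO permits VALUES (?, ?)", cleaned)
destination.commit()

loaded = destination.execute(
    "SELECT id, status FROM permits ORDER BY id"
).fetchall()
print(loaded)
```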


- F -

Federated dataset - Dataset federation is the ability to bring datasets published on different Socrata sites, or through another service, together into a public Socrata catalog.

Filter - A programmatic way to narrow a search using specified conditions.
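
A minimal sketch of a filter in plain Python: a condition is expressed as a function, and only the records that satisfy it are kept. The records, field names, and the threshold of 70 are hypothetical.

```python
# Hypothetical records; in practice these would come from a dataset.
inspections = [
    {"restaurant": "Cafe A", "score": 92},
    {"restaurant": "Diner B", "score": 68},
    {"restaurant": "Grill C", "score": 75},
]


def passing(record, minimum=70):
    """Condition: keep records whose score is at least the minimum."""
    return record["score"] >= minimum


# Apply the condition to narrow the list down to matching records.
filtered = [record for record in inspections if passing(record)]
print([record["restaurant"] for record in filtered])
```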

Flat file - A file of data that does not contain links to other files, or a non-relational database.

FME - The Feature Manipulation Engine (FME) is a platform that streamlines the translation of spatial data between geometric and digital formats. It is intended especially for use with geographic information system (GIS), computer-aided design (CAD) and raster graphics software. The FME is useful in applications involving interactive geographical, geological and topographical mapping, such as Google Earth and MapQuest. It facilitates the transformation of spatial data into a variety of formats, data models and repositories for transmission to end users. This process is called spatial extract, transform and load (spatial ETL).

Front-end check - A type of data validation for datasets gathered electronically that is performed at the front end, that is, before data are stored in an electronic database.


- G -

Geospatial - Used to indicate that data has a geographic component. This means that the records in a dataset have locational information tied to them, such as coordinates, an address, a city, or a ZIP code. GIS data is a form of geospatial data. Other geospatial data can originate from GPS data, satellite imagery, and geotagging.

GIS - Refers to a system where geographic information is stored in layers and integrated with geographic software programs so that spatial information can be created, stored, manipulated, analyzed, and visualized (mapped).

GTFS - General Transit Feed Specification (GTFS) defines a common format for public transportation schedules and associated geographic information.


- H -

Histogram - A graph that uses bars to represent proportionally a continuous variable according to how frequently the values occur within a dataset.


- I -

Ingress - Uploading and/or updating data.


- J -

JSON - JavaScript Object Notation (JSON) is a syntax for storing and exchanging text information, much like XML.
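
As a small illustration, the snippet below uses Python's standard json module to turn JSON text into Python values and back again. The sample document and its field names are hypothetical.

```python
import json

# Hypothetical JSON text describing a dataset.
text = '{"name": "Building Permits", "rows": 3, "tags": ["housing", "permits"]}'

# Parse the text into Python objects (dict, int, list, str).
dataset = json.loads(text)
print(dataset["name"], dataset["tags"][0])

# Serialize the Python objects back into JSON text for exchange.
round_trip = json.dumps(dataset, sort_keys=True)
print(round_trip)
```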


- K -

KML file - Keyhole Markup Language (KML) is an XML grammar and file format for modeling and storing geographic features such as points, lines, images, polygons, and models for display.

KMZ file - KML files are very often distributed in KMZ files, which are zipped files with a “.KMZ” extension. When a KMZ file is unzipped, a single “doc.kml” is found along with any overlay and icon images referenced in the KML and any network-linked KML files.

- L -



- M -

Median - A type of average, or measure of central tendency, in which the middle of a dataset is determined by arranging its numeric values in order.

Metadata - Metadata describes a number of characteristics or attributes of data; that is, “data that describes data”. (ISO 11179-3). For any particular datum, the metadata may describe how the datum is represented, ranges of acceptable values, its label, and its relationship to other data. Metadata also may provide other relevant information, such as the responsible steward, associated laws and regulations, and the access management policy. The metadata for structured data objects describes the structure, data elements, interrelationships, and other characteristics of information, including its creation, disposition, access and handling controls, formats, content, and context, as well as related audit trails.

Mode - The numeric value that appears most often in a dataset.
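
The median and mode entries above can be checked on a small example with Python's standard statistics module. The values are hypothetical.

```python
import statistics

# Hypothetical values; sorted they are 1, 1, 3, 4, 5.
values = [3, 1, 4, 1, 5]

# Median: the middle value once the data are arranged in order.
median_value = statistics.median(values)

# Mode: the value that appears most often (1 appears twice).
mode_value = statistics.mode(values)

# With an even number of values, the median is the mean of the two middle ones.
even_median = statistics.median([1, 2, 3, 4])

print(median_value, mode_value, even_median)
```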


- N -



- O -

OData - Open Data Protocol (OData) is a RESTful data access protocol initially defined by Microsoft.

Open Data - Making data that belongs to the public broadly accessible and usable by humans and machines, free of any constraints.

Outlier - An extremely high or extremely low numeric value that lies outside the distribution of most of the values in a dataset.
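
The definition above does not say how far outside the distribution a value must lie. One widely used convention, assumed here and not taken from this glossary, flags values more than 1.5 times the interquartile range (IQR) below the first quartile or above the third. The sketch uses statistics.quantiles, which requires Python 3.8 or later, and hypothetical values.

```python
import statistics

# Hypothetical measurements with one suspiciously large value.
values = [10, 12, 11, 13, 12, 11, 95]

# Quartiles split the ordered data into four parts.
q1, _q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

# Assumed rule of thumb: anything beyond 1.5 * IQR from the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [v for v in values if v < lower_fence or v > upper_fence]
print(outliers)
```

Other thresholds and methods exist; the right choice depends on the data and the analysis.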


- P -

PDI - Pentaho Data Integration (PDI, also called Kettle) is the component of Pentaho responsible for the extract, transform, and load (ETL) processes. Though ETL tools are most frequently used in data warehouse environments, PDI can also be used for other purposes, such as migrating data between applications or databases, exporting data from databases to flat files, bulk loading data into databases, cleansing data, and integrating applications.

PDI - Police Data Initiative (PDI) is a White House initiative promoting the use of open data and technology to improve law enforcement practices and outcomes.

Pie chart - A circular graph divided into sectors, each with an area relative to the whole circle, that is used to represent the frequency of values in a dataset.

Publishing workflow - The activities and processes that lead to the publication of data, associated metadata, and accompanying documentation.


- Q -

Qualitative data - A type of data that describes the qualities or attributes of something using words or other non-numeric symbols.

Quantitative data - A type of data that quantifies or measures something using numeric values.


- R -

RDF - Resource Description Framework (RDF) is a standard model for data interchange on the Web.

Range - A range is determined by taking the difference between the highest and lowest numeric values in a dataset.

Raw data - Refers to data that have only been collected from a source, not manipulated or analyzed.

REST API - The representational state transfer (REST) architectural style describes six constraints: uniform interface, stateless, cacheable, client-server, layered system, and code on demand (optional).

RSS - With Rich Site Summary (RSS) it is possible to distribute up-to-date Web content from one website to thousands of other websites around the world.


- S -

SaaS - Software as a Service (SaaS) is a software licensing and delivery model in which software is licensed on a subscription basis and is centrally hosted.

SDP - Socrata Data Player (SDP) is a feature of the Socrata Platform that allows audiences to republish data on the web as an embed. When hosted on webpages, these embeds serve as extensions of the portal: one can interact with the data, and the content updates dynamically when the underlying data is updated.

SEO - Search Engine Optimization (SEO) is the process of getting traffic from the “free,” “organic,” “editorial” or “natural” listings on search engines, like Google, Yahoo, and Bing.

SSL - Secure Sockets Layer (SSL) is a standard security technology for establishing an encrypted link between a server and a client - typically a server (website) and browser. The link ensures that all data passed between the web server and browsers remain private and integral.

Scatterplot - Uses plotted points (that are not connected by a line) to represent values of a dataset with one or more dependent variables and one independent variable.

Shapefile - A digital vector (non-topological) storage format for storing geometric location and associated attribute information. Shapefiles can support point, line, and area features.

SODA - Socrata Open Data API

Source System - A System of Record (SOR) or Source System of Record (SSOR) is a data management term for an information storage system (commonly implemented on a computer system) that is the authoritative data source for a given data element or piece of information.

Spatial locations - Describes where something (such as a collection) is physically located, using geospatial coordinates such as latitude and longitude.

SQL query - Structured Query Language (SQL) is a special-purpose programming language responsible for querying and editing information stored in a certain database management system.
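
As a minimal sketch of both querying and editing, the snippet below runs SQL statements against an in-memory SQLite database through Python's standard sqlite3 module. The parks table, its columns, and its rows are hypothetical.

```python
import sqlite3

# An in-memory SQLite database stands in for a database management system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parks (name TEXT, acres REAL)")
conn.executemany(
    "INSERT INTO parks VALUES (?, ?)",
    [("North Park", 324.0), ("Pocket Park", 0.5), ("River Park", 48.0)],
)

# Querying: select the parks larger than 10 acres.
large = conn.execute(
    "SELECT name FROM parks WHERE acres > ? ORDER BY name", (10,)
).fetchall()

# Editing: correct the recorded size of one park.
conn.execute("UPDATE parks SET acres = ? WHERE name = ?", (0.75, "Pocket Park"))
updated = conn.execute(
    "SELECT acres FROM parks WHERE name = ?", ("Pocket Park",)
).fetchone()[0]

print(large, updated)
```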

Stacked bar graph - A type of bar graph whose bars are divided into sub-sections, each of which proportionally represents a category of data in a dataset. The sub-sections stack together to form a larger category.

Standard deviation - A measure of how much the values in a dataset vary, or deviate, from the arithmetic mean. It is calculated by taking the square root of the variance.
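
The relationship between variance and standard deviation can be worked through on a small, hypothetical set of values. This sketch assumes the population form, which divides by the number of values; the sample form divides by one less than that.

```python
import math
import statistics

# Hypothetical values whose arithmetic mean is 5.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(values) / len(values)

# Variance: the average squared deviation from the mean (population form).
variance = sum((v - mean) ** 2 for v in values) / len(values)

# Standard deviation: the square root of the variance.
std_dev = math.sqrt(variance)

print(mean, variance, std_dev)

# The standard library's population standard deviation should agree.
print(statistics.pstdev(values))
```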

Story - The public-facing component of the Socrata tool "Perspectives", used to create data-driven narratives.


- T -


- U -

UI - User Interface (UI) is everything designed into an information device with which a human being may interact, including the display screen, the appearance of a website, help messages, and how an application program or a website invites interaction and responds to it.


- V -

Variance - A measure of how spread out the numeric values in a dataset are, or how much the values vary, from the arithmetic mean.


- W -



- X -

XLS - The original file extension used for Microsoft Excel spreadsheets.

XML - XML (Extensible Markup Language) is a simple, very flexible text format derived from SGML (Standard Generalized Markup Language). XML is often a complement to HTML. In many HTML applications, XML is used to store or transport data, while HTML is used to format and display the same data.


- Y -



- Z -




