Technical jargon can be overwhelming, especially acronyms. Everyone at Data & Insights wants you to succeed in making data more accessible, understandable, and actionable for everyone who wants to know more about their government and communities. To that end, we've created a glossary of terms that you might hear or read about during your journey with Data & Insights' products. This glossary is meant to be a living document, so if you have suggestions for words or acronyms to include, drop us a note at email@example.com.
To begin using the glossary, click on a letter below to jump to that particular section. If you'd like, you can also scroll through the list to see what you can discover!
- A -
With the Socrata Data Management Experience also comes a new way to manage your dataset: the action bar. The action bar appears on a dataset's Primer page, the data table, working copies, and revision drafts. This bar replaces the tasks once performed in the Manage tab of the data table. It is now easier to create drafts, share with collaborators, and manage your audience.
The Activity Log page gives Publishers and Administrators a view into the activity on their site. It shows records for dataset imports, updates, and recent deletions, in addition to other site updates like user role changes, asset permissions and collaborator updates, and more. This page is a great way to get a sense of what changes are being made on your site.
The central place to control configurations for your Solutions. This is separate from the Administration page within an Enterprise Data Portal or Open Data Portal.
A user role with the most permissions on the platform. They can take actions to enable/disable other users and can control every configurable aspect of a domain.
An API (Application Programming Interface), at its most basic level, allows your product or service to talk to other products or services. In this way, an API allows you to open up data and functionality to other developers and to other businesses.
A configurable setting controlled by Administrators that gates which assets on the platform can have their audience set to Public.
A Dataset, Story, Filtered View, Visualization, or other item that contains or displays data on the platform. More broadly, any item that appears in the Asset Manager.
A tool in the Socrata Data Management Experience that automatically generates code that you can use to automate updates to your dataset. In the first iteration of this tool, Python code is generated; in subsequent iterations, more languages will be added.
- B -
A bar graph or chart uses horizontal or vertical bars whose lengths proportionally represent values in a dataset. A chart with vertical bars is also called a column graph or chart.
- C -
A catalog represents a collection of assets that are grouped into categories. Catalogs organize assets to make it easier to navigate to the information needed.
Criminal Justice Information System (CJIS) is a system that is capable of storing and serving protected justice information.
A CSV (comma-separated values) file is a specially formatted plain text file that stores spreadsheet or basic tabular information in a very simple format, with one record on each line and each field within that record separated by a comma.
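As a small illustration of the layout described above (one record per line, fields separated by commas), here is a minimal Python sketch that reads CSV text with the standard library. The sample data and the `read_csv_records` helper are hypothetical and exist only for this example.

```python
import csv
import io

# Hypothetical sample: a header line, then one record per line,
# with fields separated by commas.
SAMPLE_CSV = "name,visits\nCentral Library,120\nCity Park,45\n"


def read_csv_records(text):
    """Parse CSV text into a list of dicts keyed by the header fields."""
    return list(csv.DictReader(io.StringIO(text)))


records = read_csv_records(SAMPLE_CSV)
for record in records:
    print(record["name"], record["visits"])
```

Note that every field is read as text; converting values such as `visits` to numbers is a separate step.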
- D -
Facts and statistics collected together for reference or analysis.
Extracting insights and patterns from data stored in the platform. It involves exploratory analysis, statistical analysis, predictive modeling, data mining, text mining, and geospatial analysis to support data-driven decision-making.
The use of automatic processes, equipment, or systems for the purpose of collecting, processing, storing, transmitting, and presenting data. Pertains especially to the use of computers and peripheral devices.
Data cleaning, also called data checking or data validation, is the process by which missing, erroneous, or invalid data are identified and then cleaned, or removed, from a dataset. It follows the data preparation process.
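To make the idea concrete, the sketch below drops records whose value is missing, erroneous, or invalid. It is only one possible approach, under assumed rules: the `visits` field, the rule that a count must be a non-negative integer, and the `clean_records` helper are all hypothetical.

```python
def clean_records(records):
    """Keep only records whose 'visits' value is a non-negative integer."""
    cleaned = []
    for record in records:
        raw = (record.get("visits") or "").strip()
        try:
            visits = int(raw)
        except ValueError:
            continue  # missing or non-numeric value: drop the record
        if visits < 0:
            continue  # a negative count is invalid here: drop the record
        cleaned.append({"name": record["name"], "visits": visits})
    return cleaned


# Hypothetical raw records, as they might arrive from a text file.
raw_records = [
    {"name": "Central Library", "visits": "120"},
    {"name": "City Park", "visits": ""},     # missing
    {"name": "Pool", "visits": "n/a"},       # erroneous
    {"name": "Museum", "visits": "-5"},      # invalid
]
cleaned = clean_records(raw_records)
print(cleaned)
```

Removing a record is the simplest choice; in practice a value might instead be corrected or flagged for review.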
The establishment of policies, processes, and controls to ensure data accuracy, integrity, and compliance with data governance standards.
Administrative process by which the required data is acquired, validated, stored, protected, and processed, and by which its accessibility, reliability, and timeliness are ensured to satisfy the needs of the data user.
The process by which data are readied for analysis, including formatting, or normalizing, the values in a dataset.
Measures and mechanisms put in place to protect the privacy and security of data, such as role-based access control, data anonymization, and encryption.
The process of making data available in an orderly and secure way to users, application developers, and applications that need it.
The streamlined process of data ingestion, transformation, and publishing to make it easier for organizations to publish and update their data sets.
The accuracy, consistency, and reliability of data.
Data sharing refers to the ability to share data with stakeholders, including executives, the public, cities, counties, states, and federal agencies.
When creating a revision of a dataset, the data source is the raw data to be uploaded, whether via Gateway Plugin, drag-and-drop, or Import from URL.
The process of converting data or information from one format to another, usually from the format of a source system into the required format of a new destination system.
The ability to easily access and work with data no matter where it is located or what application created it. The assurance that data being reported are accurate and are coming from the official source.
A classification identifying one of various types of data, such as real, integer, or Boolean, that determines the possible values for that type; the operations that can be done on values of that type; the meaning of the data; and the way values of that type can be stored. The available data types on the platform are: Text, Number, Boolean (checkbox), Date & Time, (Multi)Point, (Multi)Line, and (Multi)Polygon.
The process by which data are visualized, or presented, after the data cleaning process, and involves making choices about which data will be visualized, how data will be visualized, and what message will be shared with the target audience of the visualization. The end result may be referred to as data visualization.
A dataset is an organized collection of data. The most basic representation of a dataset is data elements presented in tabular form. Each column represents a particular variable. Each row corresponds to a given record and holds a value for each column’s variable. A dataset may also present information in a variety of non-tabular formats, such as an Extensible Markup Language (XML) file, a geospatial data file, or an image file.
This API allows you to update datasets. It can be used with direct API calls, but it is also the API underlying the socrata-py Python package, which provides functions for interacting with it.
DataSync is Data & Insights' free, simple, and powerful publishing tool that allows users to schedule and automate their data updates and upload large data files.
A type of variable whose value is determined by, or depends on, another variable.
Any view that is built upon another "parent" view. One example would be a filtered view of hospital locations that only includes hospitals within the state of Washington because it is derived from the original unfiltered dataset. Visualizations and graphs are also considered to be Derived Views.
Domain Name System (DNS) is the friendly naming system for giving addresses to web servers and webpages.
A single website on the data platform. This could be an open data site, an internal data site, or a solution dashboard.
- E -
Exporting or downloading data.
Environmental Systems Research Institute (Esri) is an international supplier of Geographic Information System (GIS) software, web GIS and geodatabase management applications.
ETL is short for extract, transform, load, three database functions that are combined into one tool to pull data out of one database and place it into another database.
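The three steps can be sketched in a few lines of Python. This is an illustration only, not a description of any particular ETL tool: two in-memory SQLite databases stand in for the source and destination systems, and the table names, columns, and helper functions are hypothetical.

```python
import sqlite3


def extract(source):
    """Extract: read raw rows out of the source database."""
    return source.execute("SELECT name, visits FROM raw_visits").fetchall()


def transform(rows):
    """Transform: tidy the names and convert visit counts to integers."""
    return [(name.strip().title(), int(visits)) for name, visits in rows]


def load(destination, rows):
    """Load: write the transformed rows into the destination database."""
    destination.execute(
        "CREATE TABLE IF NOT EXISTS visits (name TEXT, visits INTEGER)"
    )
    destination.executemany("INSERT INTO visits VALUES (?, ?)", rows)
    destination.commit()


# Two in-memory databases stand in for the source and destination systems.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE raw_visits (name TEXT, visits TEXT)")
source.executemany(
    "INSERT INTO raw_visits VALUES (?, ?)",
    [("  central library ", "120"), ("city park", "45")],
)

destination = sqlite3.connect(":memory:")
load(destination, transform(extract(source)))
loaded = destination.execute(
    "SELECT name, visits FROM visits ORDER BY name"
).fetchall()
print(loaded)
```

Keeping each step in its own function makes it easy to swap the source or destination without touching the transformation logic.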
A platform tool to explore, shape, and view tabular data.
- F -
Dataset federation is the ability to have datasets that are published on different Data & Insights sites or other services brought together into a public Data & Insights catalog.
A programmatic way to narrow a search using specified conditions.
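For example, narrowing a list of records to those that meet every stated condition might look like the following sketch. The `filter_rows` helper and the sample hospital records are hypothetical; the platform's own filtering is configured differently.

```python
def filter_rows(rows, **conditions):
    """Return the rows whose fields equal every given condition."""
    return [
        row
        for row in rows
        if all(row.get(field) == value for field, value in conditions.items())
    ]


# Hypothetical records used only for this illustration.
hospitals = [
    {"name": "Hospital A", "state": "WA", "trauma_center": True},
    {"name": "Hospital B", "state": "OR", "trauma_center": True},
    {"name": "Hospital C", "state": "WA", "trauma_center": False},
]

in_washington = filter_rows(hospitals, state="WA")
wa_trauma = filter_rows(hospitals, state="WA", trauma_center=True)
print([row["name"] for row in in_washington])
print([row["name"] for row in wa_trauma])
```

Adding a condition can only shrink the result, which is what makes a filter a way to narrow a search.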
A file of data that does not contain links to other files; in effect, a non-relational database table. Examples include .csv and .tsv files, or a database view/table.
A way to display an ephemeral notice in response to user interaction. For example, when a user hovers the mouse pointer over a button, a flyout may automatically be shown to indicate what clicking that button will do.
The Feature Manipulation Engine (FME) is a platform that streamlines the translation of spatial data between geometric and digital formats. It is intended especially for use with geographic information system (GIS), computer-aided design (CAD) and raster graphics software. The FME is useful in applications involving interactive geographical, geological and topographical mapping, such as Google Earth and MapQuest. It facilitates the transformation of spatial data into a variety of formats, data models and repositories for transmission to end users. This process is called spatial extract, transform and load (spatial ETL).
AKA 4x4, fxf or unique ID, it is a unique identifier for any asset across the full platform.
- G -
Part of the Gateway data automation structure, the Agent is a .jar file that sits on client side infrastructure. It allows automated or manual data pushes from the client side by storing credentials and plugin configurations.
Part of the Gateway data automation structure, the Plugin is the connection to individual source systems. Once a Gateway Agent is installed, Plugins must be configured to enable data to be pushed from the source to the platform.
Used to indicate that data has a geographic component. This means that the records in a dataset have locational information tied to them, such as geographic data in the form of coordinates, address, city, or ZIP code. GIS data is a form of geospatial data. Other geospatial data can originate from GPS data, satellite imagery, and geotagging.
Refers to a system where geographic information is stored in layers and integrated with geographic software programs so that spatial information can be created, stored, manipulated, analyzed, and visualized (mapped).
General Transit Feed Specification (GTFS) defines a common format for public transportation schedules and associated geographic information.
A role on the Enterprise Data Platform that can see assets marked Public and any asset that is explicitly shared with them.
- H -
A heat map is a graphical representation of data in the form of a map where the individual values contained in a matrix are represented as colors.
The platform environment designed to support CJIS and HIPAA use cases.
Health Insurance Portability and Accountability Act of 1996 is a federal law that governs the sharing and usage of patient health information.
HyperText Markup Language, the standard markup language used for creating web pages.
- I -
Uploading and/or updating data.
On an Enterprise Data Platform, the audience that can view Internal assets is every site user with a role other than the "Guest" role.
- J -
- K -
Keyhole Markup Language (KML) is an XML grammar and file format for modeling and storing geographic features such as points, lines, images, polygons, and models for display.
KML files are very often distributed in KMZ files, which are zipped files with a “.KMZ” extension. When a KMZ file is unzipped, a single “doc.kml” is found along with any overlay and icon images referenced in the KML and any network-linked KML files.
- L -
- M -
A type of average, or measure of central tendency, in which the middle of a dataset is determined by arranging its numeric values in order.
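The calculation can be sketched as follows: sort the values, then take the middle one, or the average of the two middle ones when the count is even. The `median` helper below is written out for illustration; Python's standard library offers `statistics.median` for the same purpose.

```python
import statistics


def median(values):
    """Return the middle value of the numeric values, sorted in order."""
    if not values:
        raise ValueError("median requires at least one value")
    ordered = sorted(values)
    middle = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[middle]  # odd count: the single middle value
    # even count: the average of the two middle values
    return (ordered[middle - 1] + ordered[middle]) / 2


odd_example = median([3, 1, 4, 1, 5])   # sorted: 1, 1, 3, 4, 5
even_example = median([1, 2, 3, 4])     # middle values: 2 and 3
print(odd_example, even_example)
print(statistics.median([3, 1, 4, 1, 5]))
```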
Metadata describes a number of characteristics or attributes of data; that is, it is “data that describes data” (ISO 11179-3). For any particular datum, the metadata may describe how the datum is represented, ranges of acceptable values, its label, and its relationship to other data. Metadata also may provide other relevant information, such as the responsible steward, associated laws and regulations, and the access management policy. The metadata for structured data objects describes the structure, data elements, interrelationships, and other characteristics of information, including its creation, disposition, access and handling controls, formats, content, and context, as well as related audit trails.
A numeric value that appears most often in a dataset.
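A dataset can have more than one such value when several values tie for the highest count. The sketch below counts occurrences and returns every value that ties; the `modes` helper is hypothetical, and `statistics.multimode` in the standard library (Python 3.8 and later) behaves similarly.

```python
from collections import Counter


def modes(values):
    """Return every value that appears most often, in first-seen order."""
    if not values:
        raise ValueError("modes requires at least one value")
    counts = Counter(values)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


single_mode = modes([1, 2, 2, 3])
tied_modes = modes([1, 1, 2, 2, 3])
print(single_mode, tied_modes)
```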
- N -
- O -
Open Data Protocol (OData) is a RESTful data access protocol initially defined by Microsoft. Check out our OData API!
Making data that belongs to the public broadly accessible and usable by humans and machines, free of any constraints.
An extremely high or extremely low numeric value that lies outside the distribution of most of the values in a dataset.
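The definition above does not say how far from the rest a value must lie to count as extreme. One common rule of thumb, used here purely as an assumption, flags values more than 1.5 times the interquartile range (IQR) below the first quartile or above the third quartile. The `find_outliers` helper and the sample values are hypothetical.

```python
import statistics


def find_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (the 1.5 x IQR rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [value for value in values if value < low or value > high]


# Hypothetical daily counts with one unusually large value.
daily_counts = [10, 12, 11, 13, 12, 95]
outliers = find_outliers(daily_counts)
print(outliers)
```

Other thresholds, such as a number of standard deviations from the mean, are equally valid choices; the right one depends on the data.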
- P -
The asset that a derived view is based on. Datasets do not have parent datasets, but all derived views do.
Police Data Initiative (PDI) is a White House initiative promoting the use of open data and technology to improve law enforcement practices and outcomes.
Actions a user is allowed to undertake on the platform by controlling their access to activities and tools. Permissions are grouped to form Roles, which are assigned to every user. Admins control this through the Roles & Permissions settings.
Protected Health Information, which is considered sensitive data because it can be used to tie an individual to medical records.
A circular graph divided into sectors, each with an area proportional to its share of the whole circle, used to represent the frequency of values in a dataset.
The activities and processes that lead to the publication of data, associated metadata, and accompanying documentation.
- Q -
A type of data that describes the qualities or attributes of something using words or other non-numeric symbols.
A type of data that quantifies or measures something using numeric values.
- R -
A range is determined by taking the difference between the highest and lowest numeric values in a dataset.
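As a worked example, the values 4, 9, 1, and 7 have a highest value of 9 and a lowest value of 1, so the range is 9 - 1 = 8. In Python this is a one-line calculation; the `value_range` name is hypothetical.

```python
def value_range(values):
    """Return the difference between the highest and lowest values."""
    return max(values) - min(values)


example_range = value_range([4, 9, 1, 7])
print(example_range)
```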
Refers to data that have only been collected, not manipulated or analyzed, from a source.
Resource Description Framework (RDF) is a standard model for data interchange on the Web.
The representational state transfer (REST) architectural style describes six constraints: uniform interface, stateless, cacheable, client-server, layered system, and code on demand (optional).
Making a change to a dataset to update data or change a configuration requires opening a new revision. This allows you to edit the dataset while the existing version is published to its audience.
A collection of permissions that a user can have on the platform. The collection is designed to give a user abilities that are representative of the work they need to do on the platform. Admins control this through the Roles & Permissions settings.
Data & Insights datasets are essentially a collection of rows. Each row can be uniquely designated by its “row identifier”, much like a driver’s license number or social security number identifies an individual. For those familiar with database concepts, they essentially act the same way as primary keys.
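To illustrate the primary-key idea, the sketch below stores rows in a dictionary keyed by a row identifier, so an incoming row with an existing identifier replaces the old row while a new identifier adds a row. This is a conceptual illustration only, not the platform's implementation; the `upsert` helper and the `license_number` field are hypothetical.

```python
def upsert(table, incoming_rows, id_field):
    """Insert each row, replacing any existing row with the same identifier."""
    for row in incoming_rows:
        table[row[id_field]] = row
    return table


# Hypothetical table keyed by a row identifier.
table = {}
upsert(table, [
    {"license_number": "A100", "name": "Pat", "status": "active"},
    {"license_number": "B200", "name": "Sam", "status": "active"},
], id_field="license_number")

# Same identifier as an existing row: the row is updated, not duplicated.
upsert(table, [
    {"license_number": "A100", "name": "Pat", "status": "expired"},
], id_field="license_number")

print(len(table), table["A100"]["status"])
```

Without a unique identifier there would be no way to tell whether the third row is a correction to an existing record or a brand-new one.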
With Rich Site Summary (RSS) it is possible to distribute up-to-date Web content from one website to thousands of other websites around the world.
- S -
Software as a Service (SaaS) is a software licensing and delivery model in which software is licensed on a subscription basis and is centrally hosted.
Uses plotted points (that are not connected by a line) to represent values of a dataset with one or more dependent variables and one independent variable.
Datasets with a data source from a Gateway Plugin or Import from URL can be automated to run on a pre-determined schedule as frequently as once per day.
Social Data Player (SDP) is a feature of the Data & Insights Platform which allows audiences to republish data on the web as an Embed. When hosted on webpages, these embeds serve as extensions of the portal: one can interact with the data, and the content updates dynamically when the underlying data is updated.
Search Engine Optimization (SEO) is the process of getting traffic from the “free,” “organic,” “editorial” or “natural” listings on search engines, like Google, Yahoo, and Bing.
A digital vector (non-topological) storage format for storing geometric location and associated attribute information. Shapefiles can support point, line, and area features.
Socrata Open Data API
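As a rough sketch of what a request to this API can look like, the snippet below builds a query URL for a dataset's JSON endpoint with the `$where` and `$limit` query parameters. The domain `data.example.gov`, the dataset identifier `abcd-1234`, the `state` column, and the `build_soda_url` helper are placeholders for illustration; consult the API documentation for the options that apply to your site. No request is actually sent.

```python
from urllib.parse import urlencode


def build_soda_url(domain, dataset_id, where=None, limit=None):
    """Build a query URL for a dataset's JSON resource endpoint."""
    params = {}
    if where is not None:
        params["$where"] = where
    if limit is not None:
        params["$limit"] = limit
    url = f"https://{domain}/resource/{dataset_id}.json"
    if params:
        # Keep the leading "$" of each parameter name unescaped.
        url += "?" + urlencode(params, safe="$")
    return url


# Placeholder domain, dataset identifier, and column name.
url = build_soda_url("data.example.gov", "abcd-1234",
                     where="state = 'WA'", limit=5)
print(url)
```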
A System of Record (SOR) or Source System of Record (SSOR) is a data management term for an information storage system (commonly implemented on a computer system) that is the authoritative data source for a given data element or piece of information.
Choropleth or region maps can be created by counting the points that fall within specified polygons. To create one of these maps, an Administrator must first create a Spatial Lens from a dataset that has polygon/multipolygon features.
Describes where something (such as a collection) is physically located, using geospatial coordinates such as latitude and longitude.
Structured Query Language (SQL) is a special-purpose programming language responsible for querying and editing information stored in a certain database management system.
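Both uses, querying and editing, can be seen in a short sketch that runs SQL against an in-memory SQLite database from Python. The `hospitals` table and its contents are hypothetical, and other database systems may differ in SQL dialect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Editing: create a table and add, then change, some rows.
conn.execute("CREATE TABLE hospitals (name TEXT, state TEXT, beds INTEGER)")
conn.executemany(
    "INSERT INTO hospitals VALUES (?, ?, ?)",
    [("Hospital A", "WA", 200), ("Hospital B", "OR", 150),
     ("Hospital C", "WA", 80)],
)
conn.execute("UPDATE hospitals SET beds = ? WHERE name = ?", (220, "Hospital A"))
conn.commit()

# Querying: select only the rows that match a condition.
wa_hospitals = conn.execute(
    "SELECT name, beds FROM hospitals WHERE state = ? ORDER BY name", ("WA",)
).fetchall()
print(wa_hospitals)
```

Passing values through `?` placeholders, rather than pasting them into the SQL text, avoids quoting mistakes and SQL injection.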
Secure Sockets Layer (SSL) is a standard security technology for establishing an encrypted link between a server and a client - typically a server (website) and browser. The link ensures that all data passed between the web server and browsers remain private and integral.
A type of bar graph whose bars are divided into sub-sections, each of which proportionally represents a category of data in a dataset; the sub-sections stack together to form a larger category.
A measure of how much the values in a dataset vary, or deviate, from the arithmetic mean by taking the square root of the variance.
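The relationship between the two measures can be shown with a short calculation: the variance is the average squared deviation from the mean, and the standard deviation is its square root. The sketch assumes the values are a whole population (dividing by n); for a sample, dividing by n - 1 is the usual alternative. The helper names are hypothetical.

```python
import math


def population_variance(values):
    """Average of the squared deviations from the arithmetic mean."""
    mean = sum(values) / len(values)
    return sum((value - mean) ** 2 for value in values) / len(values)


def population_std_dev(values):
    """Square root of the population variance."""
    return math.sqrt(population_variance(values))


# Mean is 5; squared deviations sum to 32; 32 / 8 values = 4.
data = [2, 4, 4, 4, 5, 5, 7, 9]
variance = population_variance(data)
std_dev = population_std_dev(data)
print(variance, std_dev)
```

Because it is expressed in the same units as the data, the standard deviation is usually easier to interpret than the variance.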
A data storytelling tool that is a standalone asset. It can contain many assets or none at all to tell a story by adding additional context via narrative, image, video, or iframe embeds.
Datasets on the platform that are created and updated by Tyler to track analytics across the domain.
- T -
On the Enterprise Data Platform, a team is a set of users grouped together for the purposes of sharing and collaboration (e.g., publishing/sharing to a team, adding a team as a collaborator). If there are specific groups of users that will consistently need to be granted the same level of access to particular assets, creating a team is a great way to manage access control securely and efficiently.
TSV stands for Tab Separated Values. TSV files are used for raw data and can be imported into and exported from spreadsheet software. TSV files are essentially text files, and the raw data can be viewed by text editors, though they are often used when moving raw data between spreadsheets.
- U -
User Interface (UI) is everything designed into an information device with which a human being may interact, including the display screen, the appearance of a website, help messages, and how an application program or a website invites interaction and responds to it.
- V -
A measure of how spread out the numeric values in a dataset are, or how much the values vary, from the arithmetic mean.
- W -
- X -
The original file extension used for Microsoft Excel spreadsheets.
XML (Extensible Markup Language) is a simple, very flexible text format derived from SGML (Standard Generalized Markup Language). XML is often a complement to HTML: in many HTML applications, XML is used to store or transport data, while HTML is used to format and display the same data.
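To show how XML stores data, the sketch below parses a small document with Python's standard library and pulls the values into ordinary dictionaries. The element and attribute names (`facilities`, `facility`, `id`, `name`, `visits`) are hypothetical and chosen only for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: each <facility> element is one record.
SAMPLE_XML = """<facilities>
  <facility id="1"><name>Central Library</name><visits>120</visits></facility>
  <facility id="2"><name>City Park</name><visits>45</visits></facility>
</facilities>"""

root = ET.fromstring(SAMPLE_XML)
facilities = [
    {
        "id": element.get("id"),                  # attribute value
        "name": element.findtext("name"),         # child element text
        "visits": int(element.findtext("visits")),
    }
    for element in root.findall("facility")
]
print(facilities)
```

The tags describe what each value means but say nothing about how to display it, which is the division of labor with HTML described above.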
- Y -
- Z -