Technical jargon can be overwhelming, especially acronyms. Everyone at Data & Insights wants you to be successful in your endeavors to make data accessible, more understandable, and more actionable by everyone who wants to know more about their government and communities. To that end, we've created a glossary of terms that you might hear, or read about, during your journey with Data & Insights' products. This glossary is meant to be a living document, so if you have suggestions for words or acronyms to include, drop us a note at email@example.com.
To begin using the glossary, click on a letter below to jump to that particular section. If you'd like, you can also scroll through the list to see what you can discover!
- A -
With the Socrata Data Management Experience also comes a new way to manage your dataset: the action bar. The action bar appears on a dataset's Primer page, the data table, working copies, and revision drafts. This bar replaces the tasks once performed in the Manage tab of the data table. It is now easier to create drafts, share with collaborators, and manage your audience.
An API (Application Programming Interface), at its most basic level, allows your product or service to talk to other products or services. In this way, an API allows you to open up data and functionality to other developers and to other businesses.
A tool in the Socrata Data Management Experience that automatically generates code that you can use to automate updates to your dataset. In the first iteration of this tool, Python code is generated; in subsequent iterations, more languages will be added.
- B -
A bar graph or chart uses horizontal or vertical bars whose lengths proportionally represent values in a dataset. A chart with vertical bars is also called a column graph or chart.
- C -
A catalog represents a collection of assets that are grouped into categories. Catalogs organize assets to make it easier to navigate to the information needed.
A CSV (comma-separated values) file is a specially formatted plain text file that stores spreadsheet or basic tabular information in a very simple format, with one record on each line and each field within that record separated by a comma.
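As a minimal sketch of that layout, the snippet below reads a small CSV text with Python's standard `csv` module. The sample values and column names are hypothetical and only illustrate the one-record-per-line, comma-separated structure.

```python
import csv
import io

# Hypothetical sample: a header line, then one record per line,
# with fields separated by commas.
sample = "name,department,year\nAda,Parks,2019\nGrace,Transit,2021\n"

# DictReader treats the first line as the field names.
records = list(csv.DictReader(io.StringIO(sample)))

for record in records:
    print(record["name"], record["department"], record["year"])
```

Note that every field is read as text; converting `year` to a number is a separate step.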
- D -
Facts and statistics collected together for reference or analysis
A dataset is an organized collection of data. The most basic representation of a dataset is data elements presented in tabular form. Each column represents a particular variable. Each row corresponds to a given value of that column’s variable. A dataset may also present information in a variety of non-tabular formats, such as an Extensible Markup Language (XML) file, a geospatial data file, or an image file.
DataSync is Data & Insights' free, simple, and powerful publishing tool that allows users to schedule and automate their data updates and upload large data files.
Domain Name System (DNS) is the friendly naming system for giving addresses to web servers and webpages.
Within a Data & Insights catalog, assets may include raw, source and federated datasets, as well as visualizations (table, chart, map, data lens, or story).
The use of automatic processes, equipment, or systems for the purpose of collecting, processing, storing, transmitting, and presenting data. Pertains especially to the use of computers and peripheral devices.
Data cleaning, also called data checking or data validation, is the process by which missing, erroneous, or invalid data are identified and cleaned, or removed, from a dataset. It follows the data preparation process.
Administrative process by which the required data are acquired, validated, stored, protected, and processed, and by which their accessibility, reliability, and timeliness are ensured to satisfy the needs of the data user.
The process by which data are readied for analysis, including formatting, or normalizing, the values in a dataset.
The process of making data available in an orderly and secure way to users, application developers, and applications that need it.
The process of converting data or information from one format to another, usually from the format of a source system into the required format of a new destination system.
The ability to easily access and work with data no matter where it is located or what application created it. The assurance that data being reported are accurate and are coming from the official source.
A classification identifying one of various types of data, such as real, integer or Boolean, that determines the possible values for that type; the operations that can be done on values of that type; the meaning of the data; and the way values of that type can be stored.
The process by which data are visualized, or presented, after the data cleaning process. It involves making choices about which data will be visualized, how data will be visualized, and what message will be shared with the target audience of the visualization. The end result may be referred to as data visualization.
A type of variable whose value is determined by, or depends on, another variable.
A group of computers and devices on a network that are administered as a unit with common rules and procedures. Within the Internet, domains are defined by the IP address. All devices sharing a common part of the IP address are said to be in the same domain.
- E -
Exporting or downloading data
Environmental Systems Research Institute (Esri) is an international supplier of Geographic Information System (GIS) software, web GIS and geodatabase management applications.
ETL is short for extract, transform, load, three database functions that are combined into one tool to pull data out of one database and place it into another database.
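The three steps can be sketched with two in-memory SQLite databases. This is only an illustration under stated assumptions: the table names, columns, and the cleanup rule (trimming whitespace and fixing capitalization) are hypothetical, and real ETL tools do considerably more.

```python
import sqlite3

# Hypothetical source and destination databases, both held in memory.
source = sqlite3.connect(":memory:")
destination = sqlite3.connect(":memory:")

source.execute("CREATE TABLE permits (id INTEGER, city TEXT)")
source.executemany(
    "INSERT INTO permits VALUES (?, ?)",
    [(1, " seattle "), (2, "TACOMA")],
)
destination.execute("CREATE TABLE permits_clean (id INTEGER, city TEXT)")

# Extract: pull the rows out of the source database.
rows = source.execute("SELECT id, city FROM permits ORDER BY id").fetchall()

# Transform: trim whitespace and normalize capitalization.
cleaned = [(row_id, city.strip().title()) for row_id, city in rows]

# Load: place the transformed rows into the destination database.
destination.executemany("INSERT INTO permits_clean VALUES (?, ?)", cleaned)
destination.commit()

loaded = destination.execute(
    "SELECT id, city FROM permits_clean ORDER BY id"
).fetchall()
print(loaded)
```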
- F -
Dataset federation is the ability to have datasets that are published on different Data & Insights sites or other services brought together into a public Data & Insights catalog.
A programmatic way to narrow a search using specified conditions.
A file of data that does not contain links to other files or is a non-relational database.
The Feature Manipulation Engine (FME) is a platform that streamlines the translation of spatial data between geometric and digital formats. It is intended especially for use with geographic information system (GIS), computer-aided design (CAD) and raster graphics software. The FME is useful in applications involving interactive geographical, geological and topographical mapping, such as Google Earth and MapQuest. It facilitates the transformation of spatial data into a variety of formats, data models and repositories for transmission to end users. This process is called spatial extract, transform and load (spatial ETL).
A type of data validation for datasets gathered electronically that is performed at the front end, or before data are stored in an electronic database.
- G -
Used to indicate that data has a geographic component to it. This means that the records in a dataset have locational information tied to them, such as geographic data in the form of coordinates, address, city, or ZIP code. GIS data is a form of geospatial data. Other geospatial data can originate from GPS data, satellite imagery, and geotagging.
Refers to a system where geographic information is stored in layers and integrated with geographic software programs so that spatial information can be created, stored, manipulated, analyzed, and visualized (mapped).
General Transit Feed Specification (GTFS) defines a common format for public transportation schedules and associated geographic information.
- H -
A heat map is a graphical representation of data in the form of a map where the individual values contained in a matrix are represented as colors.
- I -
Uploading and/or updating data.
- J -
- K -
Keyhole Markup Language (KML) is an XML grammar and file format for modeling and storing geographic features such as points, lines, images, polygons, and models for display.
KML files are very often distributed in KMZ files, which are zipped files with a “.KMZ” extension. When a KMZ file is unzipped, a single “doc.kml” is found along with any overlay and icon images referenced in the KML and any network-linked KML files.
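Because a KMZ file is a zipped archive, Python's standard `zipfile` module can build and read one. The sketch below works entirely in memory with a hypothetical, minimal KML document; it is not a complete KML example.

```python
import io
import zipfile

# Hypothetical minimal KML content for illustration.
kml_text = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<kml xmlns="http://www.opengis.net/kml/2.2">'
    "<Document><name>Example</name></Document></kml>"
)

# Write a KMZ: a zip archive containing a single "doc.kml".
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as kmz:
    kmz.writestr("doc.kml", kml_text)

# Read it back: list the archive members and extract the KML text.
buffer.seek(0)
with zipfile.ZipFile(buffer) as kmz:
    names = kmz.namelist()
    extracted = kmz.read("doc.kml").decode("utf-8")

print(names)
```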
- L -
- M -
A type of average, or measure of central tendency, in which the middle of a dataset is determined by arranging its numeric values in order.
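As a small worked example, the sketch below sorts a hypothetical list of values and picks the middle one. When the count is even, it averages the two middle values, which is the usual convention and is assumed here.

```python
import statistics

values = [7, 3, 9, 1, 5]  # hypothetical sample values

# Arrange the values in order, then locate the middle position.
ordered = sorted(values)
middle_index = len(ordered) // 2

if len(ordered) % 2 == 1:
    # Odd count: the single middle value.
    median_value = ordered[middle_index]
else:
    # Even count: average of the two middle values.
    median_value = (ordered[middle_index - 1] + ordered[middle_index]) / 2

print(median_value)
print(statistics.median(values))  # the standard library gives the same result
```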
Metadata describes a number of characteristics or attributes of data; that is, “data that describes data”. (ISO 11179-3). For any particular datum, the metadata may describe how the datum is represented, ranges of acceptable values, its label, and its relationship to other data. Metadata also may provide other relevant information, such as the responsible steward, associated laws and regulations, and the access management policy. The metadata for structured data objects describes the structure, data elements, interrelationships, and other characteristics of information, including its creation, disposition, access and handling controls, formats, content, and context, as well as related audit trails.
A numeric value that appears most often in a dataset.
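A quick illustration with hypothetical values, using Python's standard `statistics` module. When several values tie for most frequent, `multimode` returns all of them.

```python
import statistics

values = [2, 4, 4, 6, 4, 2]  # hypothetical sample values

# 4 appears three times, more often than any other value.
mode_value = statistics.mode(values)

# With a tie, multimode lists every most-frequent value.
tied_modes = statistics.multimode([1, 1, 2, 2, 3])

print(mode_value)
print(tied_modes)
```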
- N -
- O -
Open Data Protocol (OData) is a RESTful data access protocol initially defined by Microsoft. Check out our OData API!
Making data that belongs to the public broadly accessible and usable by humans and machines, free of any constraints.
An extremely high or extremely low numeric value that lies outside the distribution of most of the values in a dataset.
- P -
Pentaho Data Integration (PDI, also called Kettle) is the component of Pentaho responsible for the Extract, Transform and Load (ETL) processes. Though ETL tools are most frequently used in data warehouse environments, PDI can also be used for other purposes, such as migrating data between applications or databases, exporting data from databases to flat files, loading large volumes of data into databases, data cleansing, and integrating applications.
Police Data Initiative (PDI) is a White House initiative promoting the use of open data and technology to improve law enforcement practices and outcomes.
A circular graph divided into sectors, each with an area relative to the whole circle, that is used to represent the frequency of values in a dataset.
The activities and processes that lead to the publication of data, associated metadata, and accompanying documentation.
- Q -
A type of data that describes the qualities or attributes of something using words or other non-numeric symbols.
A type of data that quantifies or measures something using numeric values.
- R -
Resource Description Framework (RDF) is a standard model for data interchange on the Web.
A range is determined by taking the difference between the highest and lowest numeric values in a dataset.
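For example, with a hypothetical set of values:

```python
values = [12, 7, 30, 18]  # hypothetical sample values

# Range = highest value minus lowest value.
value_range = max(values) - min(values)

print(value_range)
```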
Refers to data that have only been collected from a source, not manipulated or analyzed.
The representational state transfer (REST) architectural style describes six constraints: uniform interface, stateless, cacheable, client-server, layered system, and code on demand (optional).
Data & Insights datasets are essentially a collection of rows. Each row can be uniquely designated by its “row identifier”, much like a driver’s license number or social security number identifies an individual. For those familiar with database concepts, they essentially act the same way as primary keys.
With Rich Site Summary (RSS) it is possible to distribute up-to-date Web content from one website to thousands of other websites around the world.
- S -
Software as a Service (SaaS) is a software licensing and delivery model in which software is licensed on a subscription basis and is centrally hosted.
Social Data Player (SDP) is a feature of the Data & Insights Platform which allows audiences to republish data on the web as an Embed. When hosted on webpages, these embeds serve as extensions of the portal: one can interact with the data, and the content updates dynamically when the underlying data is updated.
Search Engine Optimization (SEO) is the process of getting traffic from the “free,” “organic,” “editorial” or “natural” listings on search engines, like Google, Yahoo, and Bing.
Secure Sockets Layer (SSL) is a standard security technology for establishing an encrypted link between a server and a client - typically a server (website) and browser. The link ensures that all data passed between the web server and browsers remain private and integral.
Uses plotted points (that are not connected by a line) to represent values of a dataset with one or more dependent variables and one independent variable.
A digital vector (non-topological) storage format for storing geometric location and associated attribute information. Shapefiles can support point, line, and area features.
Socrata Open Data API
A System of Record (SOR) or Source System of Record (SSOR) is a data management term for an information storage system (commonly implemented on a computer system) that is the authoritative data source for a given data element or piece of information.
Describes where something (such as a collection) is physically located, using geospatial coordinates such as latitude and longitude.
Structured Query Language (SQL) is a special-purpose programming language responsible for querying and editing information stored in a certain database management system.
A type of bar graph whose bars are divided into sub-sections, each of which proportionally represents a category of data in a dataset; the sub-sections stack together to form a larger category.
A measure of how much the values in a dataset vary, or deviate, from the arithmetic mean by taking the square root of the variance.
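The calculation can be sketched as follows. The values are hypothetical, and the sketch assumes the population form of the variance (dividing by the number of values); the sample form divides by one less than that.

```python
import math
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample values

# Arithmetic mean of the values.
mean = sum(values) / len(values)

# Population variance: average squared deviation from the mean (assumed form).
variance = sum((value - mean) ** 2 for value in values) / len(values)

# Standard deviation: square root of the variance.
std_dev = math.sqrt(variance)

print(mean, variance, std_dev)

# The standard library's population functions agree with the manual steps.
print(statistics.pvariance(values), statistics.pstdev(values))
```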
The public-facing component of the Data & Insights tool "Perspectives", used to create data-driven narratives.
- T -
TSV stands for Tab Separated Values. TSV files are used for raw data and can be imported into and exported from spreadsheet software. TSV files are essentially text files, and the raw data can be viewed by text editors, though they are often used when moving raw data between spreadsheets.
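As a brief sketch, Python's standard `csv` module can read tab-separated text when given a tab delimiter. The station names and counts below are hypothetical.

```python
import csv
import io

# Hypothetical sample: fields separated by tab characters.
sample = "station\triders\nCentral\t1200\nHarbor\t850\n"

# The same reader used for CSV works for TSV with delimiter="\t".
rows = list(csv.reader(io.StringIO(sample), delimiter="\t"))
header, data = rows[0], rows[1:]

print(header)
print(data)
```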
- U -
User Interface (UI) is everything designed into an information device with which a human being may interact, including the display screen, the appearance of a website, help messages, and how an application program or a website invites interaction and responds to it.
- V -
A measure of how spread out the numeric values in a dataset are, or how much the values vary, from the arithmetic mean.
- W -
- X -
The original file extension used for Microsoft Excel spreadsheets.
XML (Extensible Markup Language) is a simple, very flexible text format derived from SGML (Standard Generalized Markup Language). XML is often a complement to HTML. In many HTML applications, XML is used to store or transport data, while HTML is used to format and display the same data.
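To illustrate XML as a way to store or transport data, the sketch below parses a small document with Python's standard `xml.etree.ElementTree` module. The element names (`dataset`, `row`, `name`, `riders`) and values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document holding two records.
document = (
    "<dataset>"
    "<row><name>Central</name><riders>1200</riders></row>"
    "<row><name>Harbor</name><riders>850</riders></row>"
    "</dataset>"
)

root = ET.fromstring(document)

# Pull the data out of the markup into a plain dictionary.
riders_by_name = {
    row.findtext("name"): int(row.findtext("riders"))
    for row in root.findall("row")
}

print(riders_by_name)
```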
- Y -
- Z -