Consistent, thorough metadata helps your publishers when they upload data and makes it easy for your users to find datasets and assets in your catalog!
There are four general types of metadata:
- Administrative Metadata is the most common and is produced in data collection, production, publication, and archiving. Most open data metadata is in this category.
- Structural Metadata describes a dataset’s structure, including its format, organization, and variable definitions. This type is in highest demand among researchers and academics.
- Reference/Descriptive Metadata is a broad term that mostly involves descriptions of methodology, sampling, and quality.
- Behavioral Metadata records the reactions and behaviors of the dataset’s users, such as ratings or user analytics.
Creating your Metadata Schema
As a starting point, we recommend reviewing our default metadata schema here.
We have also created a sample metadata schema that contains fields useful for most datasets. We encourage you to adapt this schema and/or build upon it to aid in the collection of information as you publish data.
Once you have defined your metadata schema, create custom metadata fields in your open data portal so that dataset owners and publishers can input the correct information when uploading and curating their data.
As an alternative to a home-grown convention for describing your data, using a metadata standard can enable your dataset to be organized with other datasets and ensure you have a complete, standard set of information about each part of your data.
Two examples of the many metadata standards are the Dublin Core Metadata Initiative (DCMI) and the Department of Defense Discovery Metadata Specification (DDMS). DCMI is a non-profit organization hosted at the National Library Board of Singapore. Its lists of elements, glossary, and frequently asked questions (FAQs) were last revised in 2005, but an update to its User Guide is in development on the wiki page. Many state-level Open Data programs use the current set of elements, which are required to accompany each data table.
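To make the idea of a required element set concrete, here is a minimal Python sketch that checks a metadata record against the fifteen elements of the Dublin Core Metadata Element Set. It assumes a program that requires every element to be filled in. The `missing_elements` helper and the example record are hypothetical and are not part of any Dublin Core or Socrata tooling.

```python
# The fifteen elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def missing_elements(record):
    """Return the Dublin Core elements that are absent or blank in record."""
    missing = []
    for element in DUBLIN_CORE_ELEMENTS:
        value = record.get(element)
        if value is None or not str(value).strip():
            missing.append(element)
    return missing


# Hypothetical dataset record, for illustration only.
example_record = {
    "title": "Building Permits",
    "creator": "Department of Planning",
    "date": "2015-06-01",
    "format": "text/csv",
    "language": "en",
}

# Lists the elements a publisher would still need to supply.
print(missing_elements(example_record))
```

A check like this could run before publication so that incomplete records are caught early. Whether every element is mandatory is a policy decision for your program.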
The DoD Discovery Metadata Specification (DDMS) defines discovery metadata elements for resources posted to community and organizational shared spaces. The DDMS depends on other schemas provided by the Director of National Intelligence Chief Information Officer, the National Geospatial-Intelligence Agency, international standards organizations, and commercial standards. The latest official schemas and guidelines can be found here.
For many fields (frequency, license, and data owner are good examples), there is an option to enforce a controlled vocabulary rather than free text. A controlled vocabulary has two benefits: (1) it helps with tracking, search, and summary by ensuring consistent language; (2) it can support compliance by making it easier for data providers to supply appropriate metadata. Whenever possible, create drop-downs or pick-lists to aid in consistency.
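As a rough illustration of how a controlled vocabulary enforces consistent language, the Python sketch below validates metadata values against fixed lists, much as a drop-down would. The field names come from the examples above. The allowed values and the `validate_metadata` function are assumptions made for illustration, not a Socrata API.

```python
# Hypothetical controlled vocabularies; the allowed values are assumptions.
CONTROLLED_VOCABULARY = {
    "frequency": {"Daily", "Weekly", "Monthly", "Quarterly", "Annually"},
    "license": {"Public Domain", "CC BY 4.0", "CC0 1.0"},
}


def validate_metadata(metadata):
    """Return one error message per field whose value is not in its vocabulary."""
    errors = []
    for field, allowed in CONTROLLED_VOCABULARY.items():
        value = metadata.get(field)
        if value not in allowed:
            errors.append(f"{field}: {value!r} is not one of {sorted(allowed)}")
    return errors


# "monthly" fails because matching is exact, which keeps the catalog consistent.
for message in validate_metadata({"frequency": "monthly", "license": "CC0 1.0"}):
    print(message)
```

Exact matching is deliberate here: free-text variants such as "monthly", "Monthly", and "every month" would otherwise fragment search and summary results.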
In addition to the information about the dataset as a whole, a data dictionary is a valuable addition to help end users understand the data you’ve provided. The data dictionary provides detailed descriptions and data types for each field within the dataset. This information is used to populate the metadata fields inline with the dataset and is also provided as additional documentation for end users.
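One way to picture a data dictionary is as a list of per-field entries, each holding a name, a data type, and a description. The Python sketch below is illustrative only: the `FieldDefinition` class, the example fields, and the `undocumented_columns` helper are hypothetical and do not represent a prescribed template.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldDefinition:
    """One data dictionary entry describing a single field in a dataset."""
    name: str
    data_type: str
    description: str


# Hypothetical entries for an imagined building permits dataset.
DATA_DICTIONARY = [
    FieldDefinition("permit_id", "text", "Unique identifier assigned to each permit."),
    FieldDefinition("issue_date", "date", "Date the permit was issued."),
    FieldDefinition("estimated_cost", "number", "Estimated project cost in US dollars."),
]


def undocumented_columns(columns, dictionary):
    """Return the dataset columns that have no data dictionary entry."""
    documented = {field.name for field in dictionary}
    return [column for column in columns if column not in documented]


# Flags columns that still need a description before publication.
print(undocumented_columns(["permit_id", "issue_date", "applicant"], DATA_DICTIONARY))
```

Keeping the dictionary in a structured form like this makes it straightforward to reuse the same entries for inline field metadata and for downloadable documentation.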
A standard template for a data dictionary is as follows:
Categories and Naming Conventions
While Socrata does not endorse a particular set of categories for data, looking to your open data peers for reference will help you provide a consistent search and navigation experience for open data end users. For example, look at how New York City or Los Angeles categorize their data. Alternatively, look to San Mateo County, Montgomery County, or the states of Maryland or Washington.
Examples of Great Metadata Practices
Many Socrata customers are very forward-thinking about their use of metadata. Take a look at metadata standards other cities, counties, and states have created for their open data programs:
Reference: Metadata Standards and Guides