Consistent, thorough metadata helps your publishers when they upload data and makes it easy for your users to find datasets and assets in your catalog.
There are four general types of metadata:
- Administrative Metadata is the most common and is produced in data collection, production, publication, and archiving. Most open data metadata is in this category.
- Structural Metadata describes a dataset’s structure, including its format, organization, and variable definitions. This is highest in demand by researchers and academics.
- Reference/Descriptive Metadata is a broad term that mostly involves descriptions of methodology, sampling, and quality.
- Behavioral Metadata records the reactions and behaviors of the dataset’s users, such as ratings or usage analytics.
Creating your Metadata Schema
As a starting point, we recommend reviewing our default metadata schema here.
We have also created a sample metadata schema that contains fields useful for most datasets. We encourage you to adapt this schema and/or build upon it to aid in the collection of information as you publish data.
Once you have defined your metadata schema, create custom metadata fields in your open data portal so that dataset owners and publishers can input the correct information when uploading and curating their data.
Metadata Standards
As an alternative to a home-grown convention for describing your data, using a metadata standard can enable your dataset to be organized with other datasets and ensure you have a complete, standard set of information about each part of your data.
Two examples of the many metadata standards are the Dublin Core Metadata Initiative (DCMI) and the Department of Defense Discovery Metadata Specification (DDMS).
The Dublin Core Metadata Initiative (DCMI) is a project of the Association for Information Science and Technology (ASIS&T). Its lists of elements, glossary, and frequently asked questions (FAQs) were last revised in 2005, but an update to its User Guide is in progress on the wiki page. Many state-level Open Data programs use the current set of elements and require that they accompany each data table.
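As a rough illustration of what a Dublin Core description can look like, the sketch below writes a few elements from the 15-element Dublin Core Metadata Element Set into a small XML record. The `metadata` wrapper element, the `dublin_core_record` helper, and the sample values are assumptions made for this example; they are not part of any portal's output.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The 15 elements of the core set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def dublin_core_record(fields):
    """Return an XML string with one dc:* child per entry in `fields`.

    The <metadata> wrapper is a hypothetical container chosen for this sketch.
    """
    root = ET.Element("metadata")
    for name, value in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"{name!r} is not a Dublin Core element")
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample dataset description.
record = dublin_core_record({
    "title": "Street Tree Inventory",
    "creator": "Department of Public Works",
    "date": "2024-01-15",
    "format": "text/csv",
})
print(record)
```

Rejecting unknown element names keeps records limited to the standard set, which is the main benefit of adopting a standard over a home-grown convention.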
The DoD Discovery Metadata Specification (DDMS) defines discovery metadata elements for resources posted to community and organizational shared spaces. The DDMS depends on other schemas provided by the Director of National Intelligence Chief Information Officer, the National Geospatial-Intelligence Agency, international standards organizations, and commercial standards. The latest official schemas and guidelines can be found here.
Metadata Formats
Popular machine-readable, open formats for communicating metadata include XHTML, XML, JSON, and RDF. Data & Insights natively supports data.json (JSON: JavaScript Object Notation), so any datasets managed through Data & Insights are automatically exposed correctly in JSON format. Data & Insights also supports all the extended metadata fields through custom metadata features.
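To give a sense of what a data.json entry contains, here is a minimal sketch that builds one dataset entry and checks it for required fields. The field list reflects our reading of the Project Open Data Metadata Schema and should be verified against the current version of that schema. The `missing_required` helper and all sample values are hypothetical, and this is not the exact output the platform generates.

```python
import json

# Fields treated as required in this sketch (assumption: based on the
# Project Open Data Metadata Schema; confirm against the current schema).
REQUIRED_FIELDS = (
    "title", "description", "keyword", "modified",
    "publisher", "contactPoint", "identifier", "accessLevel",
)


def missing_required(entry):
    """Return the required field names that are absent or empty in `entry`."""
    return [field for field in REQUIRED_FIELDS if not entry.get(field)]


# Hypothetical dataset entry with illustrative values.
entry = {
    "title": "Street Tree Inventory",
    "description": "Location and species of every city-maintained street tree.",
    "keyword": ["trees", "forestry", "public works"],
    "modified": "2024-01-15",
    "publisher": {"name": "Department of Public Works"},
    "contactPoint": {
        "fn": "Open Data Team",
        "hasEmail": "mailto:opendata@example.gov",
    },
    "identifier": "street-tree-inventory",
    "accessLevel": "public",
}

catalog = {"dataset": [entry]}

print(missing_required(entry))
print(json.dumps(catalog, indent=2))
```

A check like this can run before publication to flag entries that would be incomplete in the catalog.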
Controlled Vocabulary
For many fields (frequency, license, and data owner are good examples), there is an option to enforce a controlled vocabulary rather than free text. Controlled vocabulary has two benefits: (1) it helps with tracking, search, and summary by ensuring consistent language; (2) it could support compliance by making it easier for data providers to provide appropriate metadata. Whenever possible, create drop-downs or pick-lists to aid in consistency.
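The effect of a controlled vocabulary can be sketched as a simple validation step. In the example below, the allowed values for frequency, license, and data owner are hypothetical placeholders, as is the `validate_metadata` helper. In practice the portal's drop-downs or pick-lists would enforce the same constraint at entry time.

```python
# Hypothetical controlled vocabularies; replace with your program's own lists.
CONTROLLED_VOCABULARY = {
    "frequency": {"Daily", "Weekly", "Monthly", "Quarterly", "Annually"},
    "license": {"Public Domain", "Creative Commons Attribution 4.0"},
    "data_owner": {"Department of Public Works", "Department of Finance"},
}


def validate_metadata(metadata):
    """Return a dict of field name -> error message for values outside the vocabulary."""
    errors = {}
    for field, allowed in CONTROLLED_VOCABULARY.items():
        value = metadata.get(field)
        if value not in allowed:
            errors[field] = f"{value!r} is not one of {sorted(allowed)}"
    return errors


# Free-text entry that drifts from the agreed terms.
submitted = {
    "frequency": "every month",
    "license": "Public Domain",
    "data_owner": "Department of Public Works",
}

for field, message in validate_metadata(submitted).items():
    print(f"{field}: {message}")
```

Here "every month" is rejected in favor of the agreed term "Monthly", which is what keeps search, tracking, and summaries consistent across datasets.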
Data Dictionary
In addition to information about the dataset as a whole, a data dictionary helps end users understand the data you’ve provided. The data dictionary provides a detailed description and data type for each field within the dataset. This information is used to populate the metadata fields inline with the dataset and is also provided as additional documentation for end users.
A standard template for a data dictionary is as follows:
Field Name | Field Type | Field Description |
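As a small illustration, the template above can be filled in programmatically. The sketch below renders one row per field using the same three columns. The `data_dictionary_table` helper, the field names, and the type labels are hypothetical examples, not values required by the platform.

```python
HEADER = "Field Name | Field Type | Field Description |"


def data_dictionary_table(fields):
    """Render (name, type, description) tuples in the three-column template."""
    lines = [HEADER]
    for name, field_type, description in fields:
        lines.append(f"{name} | {field_type} | {description} |")
    return "\n".join(lines)


# Hypothetical fields for a street tree dataset.
fields = [
    ("tree_id", "Number", "Unique identifier assigned to each tree"),
    ("species", "Text", "Common name of the tree species"),
    ("planted_date", "Date", "Date the tree was planted, if known"),
]

table = data_dictionary_table(fields)
print(table)
```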
Categories and Naming Conventions
While Data & Insights does not endorse a particular set of categories for data, looking to your open data peers for reference will provide a consistent search and navigation experience for open data end users. For example, look to New York City or Los Angeles for how they categorize their data. Alternatively, look to San Mateo County or Montgomery County or the states of Maryland or Washington.
Examples of Great Metadata Practices
Many Data & Insights customers are very forward-thinking about their use of metadata. Take a look at metadata standards other cities, counties, and states have created for their open data programs:
- City of Chicago Data Dictionary
- San Francisco Metadata Standards
- New York State (Page 31 Appendix B)
- City of Philadelphia Metadata Catalog
Reference: Metadata Standards and Guides
- American National Standards Institute: Understanding Metadata
- National Archives and Records Administration: Minimal Metadata Elements and Terms
- National Archives and Records Administration: Metadata Guidance for the Transfer of Permanent Electronic Records
- Project Open Data Metadata Schema
- Open Metadata Handbook
- Open Data - Metadata Guide: Johns Hopkins Center for Government Excellence