Best Practices for Metadata Management

Kaylee Guetle

Last Updated: December 08, 2022 19:57

Consistent, thorough metadata helps your publishers when they upload data, and it makes datasets and assets in your catalog easy for your users to find!

There are four general types of metadata:

Administrative Metadata is the most common and is produced in data collection, production, publication, and archiving. Most open data metadata is in this category.
Structural Metadata describes a dataset’s structure, including its format, organization, and variable definitions. Researchers and academics request this type most often.
Reference/Descriptive Metadata is a broad term that mostly involves descriptions of methodology, sampling, and quality.
Behavioral Metadata records the reactions and behaviors of the dataset’s users, such as ratings or usage analytics.

Creating your Metadata Schema

As a starting point, we recommend reviewing our default metadata schema here.

We have also created a sample metadata schema that contains fields useful for most datasets. We encourage you to adapt this schema and/or build upon it to aid in the collection of information as you publish data.

Once you have defined your metadata schema, create custom metadata fields in your open data portal so that dataset owners and publishers can input the correct information when uploading and curating their data.

Metadata Standards

As an alternative to a home-grown convention for describing your data, you can adopt a metadata standard. A standard lets your dataset be organized alongside other datasets and ensures you have a complete, consistent set of information about each part of your data.

Two examples of the many metadata standards are the Dublin Core Metadata Initiative (DCMI) and the Department of Defense Discovery Metadata Specification (DDMS). DCMI is a project of the Association for Information Science and Technology (ASIS&T). Its lists of elements, glossary, and frequently asked questions (FAQs) were last revised in 2005, but an update to its User Guide is under development on the wiki page. Many state-level open data programs use the current set of elements, which they require to accompany each data table.
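As a rough illustration of checking a record against a standard, the sketch below lists the 15 elements of the Dublin Core Metadata Element Set and reports which ones a dataset record leaves empty. The record and the function name `missing_elements` are hypothetical, and which elements your program actually requires is a local policy decision, not something the standard dictates.

```python
# The 15 elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def missing_elements(record: dict) -> list:
    """Return the Dublin Core elements that are absent or empty in record."""
    return [name for name in DUBLIN_CORE_ELEMENTS if not record.get(name)]


# Hypothetical record, for illustration only.
record = {
    "title": "Street Tree Inventory",
    "creator": "Example City Public Works",
    "description": "Location and species of city-maintained street trees.",
    "date": "2022-12-08",
    "format": "text/csv",
    "rights": "",  # present but empty, so it counts as missing
}

print(missing_elements(record))
```

A check like this could run before publication so that gaps are caught while the publisher still has the source material at hand.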

The DoD Discovery Metadata Specification (DDMS) defines discovery metadata elements for resources posted to community and organizational shared spaces. The DDMS depends on other schemas provided by the Director of National Intelligence Chief Information Officer, the National Geospatial-Intelligence Agency, international standards organizations, and commercial standards. The latest official schemas and guidelines can be found here.

Metadata Formats

Popular machine-readable, open formats for communicating metadata include XHTML, XML, JSON, and RDF. Data & Insights natively supports data.json (JSON, or JavaScript Object Notation), so any datasets managed through Data & Insights are automatically exposed in JSON format. Data & Insights also supports extended metadata fields through its custom metadata features.
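To give a sense of what a data.json entry looks like, here is a minimal sketch that builds one dataset record and serializes it to JSON. The field names are borrowed from the Project Open Data metadata schema commonly used for data.json files; every value, and the helper name `to_data_json`, is hypothetical, and the exact fields your portal emits may differ.

```python
import json

# Hypothetical dataset entry. Field names follow the Project Open Data
# (data.json) schema; all values are made up for illustration.
dataset_entry = {
    "title": "Street Tree Inventory",
    "description": "Location and species of city-maintained street trees.",
    "keyword": ["trees", "urban forestry"],
    "modified": "2022-12-08",
    "publisher": {"name": "Example City Public Works"},
    "contactPoint": {
        "fn": "Open Data Team",
        "hasEmail": "mailto:opendata@example.gov",
    },
    "identifier": "example-city-street-trees",
    "accessLevel": "public",
}

# A catalog holds a list of dataset entries under the "dataset" key.
catalog = {"dataset": [dataset_entry]}


def to_data_json(catalog: dict) -> str:
    """Serialize the catalog to an indented JSON string."""
    return json.dumps(catalog, indent=2, sort_keys=True)


print(to_data_json(catalog))
```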

Controlled Vocabulary

For many fields (frequency, license, and data owner are good examples), you can enforce a controlled vocabulary rather than accept free text. Controlled vocabulary has two benefits: (1) it helps with tracking, search, and summary by ensuring consistent language; (2) it can support compliance by making it easier for data providers to supply appropriate metadata. Whenever possible, create drop-downs or pick-lists to aid in consistency.
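If metadata also arrives outside the portal's drop-downs (for example, through a bulk upload), the same rule can be checked in a script. The sketch below is one possible approach: the vocabularies, field names, and the function `find_vocab_violations` are all hypothetical and should be replaced with your own pick-list values.

```python
# Hypothetical controlled vocabularies; replace with your own pick-list values.
CONTROLLED_VOCAB = {
    "frequency": {"Daily", "Weekly", "Monthly", "Quarterly", "Annually", "As needed"},
    "license": {"Public Domain", "CC BY 4.0", "CC0 1.0"},
}


def find_vocab_violations(metadata: dict, vocab: dict = CONTROLLED_VOCAB) -> list:
    """Return one message per controlled field whose value is not allowed."""
    problems = []
    for field, allowed in vocab.items():
        value = metadata.get(field)  # None when the field is missing
        if value not in allowed:
            problems.append(f"{field}: {value!r} is not one of {sorted(allowed)}")
    return problems


# "monthly" differs from the allowed "Monthly", so it is flagged.
record = {"frequency": "monthly", "license": "CC BY 4.0"}
for problem in find_vocab_violations(record):
    print(problem)
```

The comparison is deliberately case-sensitive: free-text variants such as "monthly" and "Monthly" are exactly the inconsistency a controlled vocabulary is meant to prevent.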

Data Dictionary

In addition to the information about the dataset as a whole, a data dictionary is a valuable addition to help end users understand the data you’ve provided. The data dictionary provides detailed descriptions and data types for each field within the dataset. This information is used to populate the metadata fields inline with the dataset and is also provided as additional documentation for end users.

A standard template for a data dictionary is as follows:

Field Name	Field Type	Field Description
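As an illustration of the template above, the following sketch stores a data dictionary as a list of rows, writes it out as tab-separated text, and flags fields that still lack a description. The example fields and the function names `dictionary_to_tsv` and `fields_missing_description` are hypothetical.

```python
import csv
import io

COLUMNS = ["Field Name", "Field Type", "Field Description"]

# Hypothetical fields for an example dataset.
DATA_DICTIONARY = [
    {"Field Name": "tree_id", "Field Type": "Number",
     "Field Description": "Unique identifier for each tree."},
    {"Field Name": "species", "Field Type": "Text",
     "Field Description": "Common name of the tree species."},
    {"Field Name": "planted_on", "Field Type": "Date",
     "Field Description": ""},  # description not written yet
]


def dictionary_to_tsv(rows: list) -> str:
    """Render the data dictionary as tab-separated text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def fields_missing_description(rows: list) -> list:
    """Return the names of fields whose description is blank."""
    return [
        row["Field Name"]
        for row in rows
        if not row.get("Field Description", "").strip()
    ]


print(dictionary_to_tsv(DATA_DICTIONARY))
print("Missing descriptions:", fields_missing_description(DATA_DICTIONARY))
```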

Categories and Naming Conventions

While Data & Insights does not endorse a particular set of categories for data, looking to your open data peers for reference will provide a consistent search and navigation experience for open data end users. For example, look to New York City or Los Angeles for how they categorize their data. Alternatively, look to San Mateo County or Montgomery County or the states of Maryland or Washington.

Examples of Great Metadata Practices

Many Data & Insights customers are very forward-thinking about their use of metadata. Take a look at metadata standards other cities, counties, and states have created for their open data programs:

Reference: Metadata Standards and Guides

San Francisco_ Final Metadata Standard.xlsx
(60 KB)
