Sample Metadata Schema

Field Description Definition Example Values
Title Title helps users discover, select, and differentiate between similar datasets. Human-readable name of the asset. Should be easy-to-understand and include sufficient detail to facilitate search and discovery. Avoid acronyms. Text with character limit.
Description Description helps users discover, select, and differentiate between similar datasets. Describes the dataset. Provides a longer description of the data that can be readily understood by non-technical users. Text with character limit.
Category Category groups similar datasets together regardless of source and can be used to locate similar datasets. Identified by a list of customizable values. If a dataset falls into multiple categories, select the one which is most significant. This list will be subject to change on an ongoing basis. Drop down menu. Categories predefined.
Agency/Department Responsible Agency/Department is helpful for navigation and to ensure a single responsible party. Identified by a list of customizable values. The agency/department that collects and manages the data as the canonical source. Drop down menu. List of agencies/departments predefined with acronyms.
Data Dictionary A Data Dictionary is essential to understanding how the data can be used. It can describe fields and the differences between fields, and help users assess whether or not the data is appropriate for the intended use. Data Dictionaries could be published in both .csv and .pdf format. Explains fields within the dataset (definition, type, size, and any other pertinent information that describes the dataset). Attachments in .csv and .pdf format.
Last Updated Last Updated indicates the recency of the data. Helps users determine usage of data. Identified by a list of customizable values. Most recent date and time when the dataset was changed, updated, or modified. Consider ISO 8601: YYYY-MM-DDThh:mm:ss.s (as much as is relevant to the dataset) or, to reflect continual updating, an ISO 8601 duration such as P1D for daily or P2W for every two weeks.

Frequency of Data Change Frequency - Data Change works together with the publishing frequency, helps set expectations for future updates, and aids in planning. Identified by a list of customizable values. Cadence of dataset changes. Not updated (historical only), Yearly, Quarterly, Bi-monthly, Monthly, Bi-weekly, Weekly, Daily, Hourly, Continuous.
Frequency of Publishing Frequency - Publishing works together with the Data Change frequency, helps set expectations for future updates, and aids in planning. Identified by a list of customizable values. Frequency with which dataset is published. Not updated (historical only), Yearly, Quarterly, Bi-monthly, Monthly, Bi-weekly, Weekly, Daily, Hourly, Continuous.
Unique Identifier A Unique Identifier is required for dataset management. A unique identifier for the dataset or API as maintained within an Agency catalog or database. Auto generated by Data & Insights.
Permalink/Identifier A Permalink helps provide continuity for accessing the dataset. Persistent link to the dataset. Auto generated by Data & Insights.
Public Access Level While most data on the platform will be public, Public Access Level gives us a means to track protected or sensitive data and provide a means for internal users to discover and access non-public data. Identified by a list of customizable values. The degree to which this dataset could be made publicly-available, regardless of whether it has been made available. TBD; POD Common core uses “public”, “restricted public”, “non-public”. Consider using the following in the data inventory/catalog: Protected, Sensitive, Public.
Public Access Level Comment If the data is not public, consider providing an explanation and a means for people to access it if eligible. An explanation for the selected “accessLevel” including instructions for how to access a restricted file, if applicable, or explanation for why a “non-public” or “restricted public” data asset is not “public,” if applicable. Text with character limit.
License/Rights A License reduces legal uncertainty for data consumers or users. The license with which the dataset or API is published. Current list of licenses offered can be found here.
Data Steward Consider including a Data Steward for each dataset to support the data coordinators and to answer dataset questions. This helps to track and triage data requests. Person who manages the data and is responsible for making changes to the data. Person understands what the dataset includes and can answer questions about it. String (First Last).
Contact Email Consider including publicly-visible Contact Email on each dataset, which can be used by users to ask questions. Person who manages the data and is responsible for making changes to the data. Person understands what the dataset includes and can answer questions about it. String (email address).
Row Count Row Count is a useful indicator of dataset size. Number of rows in the dataset. Auto generated by Data & Insights.
API Endpoint An API Endpoint facilitates programmatic access to the data. Endpoint of web service to access dataset. Auto generated by Data & Insights.
Geographic Unit Geographic Unit indicates the geographic level at which the dataset is collected; also helps track the need to aggregate or summarize data. Identified by a list of customizable values. At what geographic unit is the data collected? For example, if the data is collected by address, it would be Street Address. Consider using drop down menu - items: Latitude/longitude, Street address, Parcel (block/lot), Census block, Census block group, Census tract, Zoning district, Neighborhood, Planning District, Supervisorial District, Zip code, City, Other, Not applicable.
Temporal Coverage Temporal Coverage provides an easy way to determine the value of a dataset. The range of temporal applicability of a dataset (i.e., a start and end date of applicability for the data). Consider using ISO 8601. ISO has options to clarify that the dataset is continually updated in a certain date range.
Download URL A Download URL provides access to the data for the purpose of open data. URL providing direct access to the downloadable distribution of a dataset. URL
Tags Tags link technical language, secondary categories, and acronyms to your dataset, aiding in user-executed searches. Tags (or keywords) help users discover your dataset. Include terms that could be used by technical and non-technical users. Keywords (examples: finance, parks, environment).
Link A Link can provide more information on the origin of the dataset. Not all datasets will have this information. The URL to the program area web pages. URL
Related Documents Linking a Related Document provides the opportunity to include forms or other types of documents to help users understand the data. Not all datasets will have this information. Related documents such as technical information about a dataset, developer documentation, etc. URL
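
As a rough illustration of how a few of the fields above might be checked before publishing, the sketch below formats Last Updated as an ISO 8601 timestamp, expresses Temporal Coverage as an ISO 8601 start/end interval, and validates a record against the drop down values listed in the table. The function names, the record keys, and the REQUIRED tuple are all assumptions made for this example; the schema itself does not say which fields are mandatory, and none of this reflects how the platform stores metadata.

```python
from datetime import date, datetime, timezone

# Controlled vocabularies copied from the table above. The constant names and
# the record keys used below are illustrative, not platform field names.
FREQUENCIES = {
    "Not updated (historical only)", "Yearly", "Quarterly", "Bi-monthly",
    "Monthly", "Bi-weekly", "Weekly", "Daily", "Hourly", "Continuous",
}
ACCESS_LEVELS = {"Protected", "Sensitive", "Public"}
# Assumption: treat these four fields as mandatory for the example only.
REQUIRED = ("title", "description", "category", "department")


def iso_timestamp(moment: datetime) -> str:
    """Format a timezone-aware datetime as an ISO 8601 string in UTC."""
    return moment.astimezone(timezone.utc).isoformat(timespec="seconds")


def temporal_coverage(start: date, end: date) -> str:
    """Express a date range as an ISO 8601 interval (start/end)."""
    if end < start:
        raise ValueError("end date precedes start date")
    return f"{start.isoformat()}/{end.isoformat()}"


def validate(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = [f"missing {key}" for key in REQUIRED if not record.get(key)]
    if record.get("frequency_of_data_change") not in FREQUENCIES:
        problems.append("unknown frequency_of_data_change")
    if record.get("public_access_level") not in ACCESS_LEVELS:
        problems.append("unknown public_access_level")
    try:
        # fromisoformat accepts the strings produced by iso_timestamp above.
        datetime.fromisoformat(str(record.get("last_updated", "")))
    except ValueError:
        problems.append("last_updated is not ISO 8601")
    return problems
```

A record would typically be built with iso_timestamp(...) for its last_updated value and then passed to validate(...); any strings returned point to fields that need attention before the dataset is published.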