The Program Analytics: Possible Duplicate Tags dataset identifies tags used on this site that may be duplicates or misspellings of another tag on this site. This dataset helps determine which tags may need to be cleaned up and provides links to the assets using each tag for an expedited update process. Use this dataset routinely to maintain a concise and effective tag strategy, which strengthens the discoverability of the data in your catalog.
For steps on how to access this and other system datasets, please see our article here.
Interpreting the Data
Each row in the dataset includes the following information:
The name of the tag in the asset metadata.
The total number of assets with the given tag.
The name of the tag that the given tag may be duplicating.
A measure of how likely the suggested tag is the one the user meant to use based on:
- the popularity of the suggested tag
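- how similar the given tag is to the suggested tag, using the principles of Levenshtein distance (the number of single-character insertions, deletions, or substitutions needed to turn one tag into the other)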
This means if two tags are similar in their spelling, but one is much more popular than the other, the analysis will suggest using the more popular tag.
A link to a list of all the assets on this site with the given tag.
A link to a list of all the assets on this site with the suggested tag.
The record below indicates that there is a tag “buget” on this domain that is applied to 2 assets. The Possible Duplicate Tags analysis predicts that the user meant to use the tag “budget”, with a high confidence score of 91. A user can compare the assets tagged with “buget” to the assets tagged with “budget” by following the links in the two rightmost columns.
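To illustrate the two ingredients described above (spelling similarity and popularity), here is a minimal Python sketch. It is not the dataset's actual scoring method, which this article does not specify. The helper suggest_tag, its max_distance parameter, and the asset counts for “budget” and “bucket” are all hypothetical and chosen only for illustration; only the count of 2 for “buget” comes from the example record.

```python
from typing import Optional


def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, or substitutions needed to turn a into b."""
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # drop ch_a
                current[j - 1] + 1,      # add ch_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


def suggest_tag(tag: str, tag_counts: dict, max_distance: int = 2) -> Optional[str]:
    """Hypothetical helper: among tags within max_distance edits of `tag`
    that are used on more assets, prefer the most popular one
    (ties broken by the smaller edit distance)."""
    candidates = []
    for other, count in tag_counts.items():
        if other == tag:
            continue
        distance = levenshtein(tag, other)
        if distance <= max_distance and count > tag_counts.get(tag, 0):
            candidates.append((count, -distance, other))
    return max(candidates)[2] if candidates else None


# Illustrative counts: only the 2 for "buget" comes from the example record.
tag_counts = {"buget": 2, "budget": 140, "bucket": 5}
print(levenshtein("buget", "budget"))    # one inserted "d"
print(suggest_tag("buget", tag_counts))
```

In this sketch, “buget” is one edit away from “budget” and two edits away from “bucket”, and the far more popular “budget” is the tag suggested, which mirrors the behavior described above for similar tags with different popularity.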