In the new import experience, there are numerous ways to format your data before publication to the Socrata platform. Navigating them can be confusing if you don't understand the differences between unformatted data, data types, data transforms, and column formatting.
This article focuses on what distinguishes each from the others, and includes links to more in-depth resources throughout.
Unformatted data refers to the raw, unstyled state of the data in your source file. (Note that "format" is not referring to a file format in this case.) Unformatted data is what exists underneath any other applied styles or formats, including those common in Excel. It's important to check out your unformatted data before you upload to the Socrata platform, because that's what the platform itself will be reading.
Date/time data illustrates this difference well:
Fig. 1 - Date/time data from a CSV opened in Excel. The date recorded in cell 2A appears in the format "M/dd/YY" (2/28/02) in the sheet and "M/dd/YYYY" (2/28/2002) in the function bar. (The discrepancy between those two is your first hint that you're not looking at the unformatted data.)
Fig. 2 - Date/time data from the same CSV opened in a plain text reader. This is your unformatted data, which appears here as "YYYY-MM-dd" (2002-02-28).
The appearance of your unformatted data is determined by the source and method used for your actual data extraction.
Data types denote the type of information you've recorded, such as text data, point data, location data, timestamp data, etc. The Socrata platform is designed to make its best guess as to what data type each column should be, but you can also manually assign data types column-by-column in the new edit experience (for screenshots and videos, see the additional resources linked below). Certain data types preclude the use of certain functions, so it is important to assign the correct type for each column in your dataset.
For example, you may be able to assign a column of numerical data the text data type, but you won't be able to perform mathematical functions on it until you go back and re-designate it as number. Likewise, you can import a column of month/year data as text, but you'll only be able to sort it alphabetically (not chronologically) until you re-designate it as the date & time data type.
Data transforms are written SoQL expressions that allow the platform to transform and validate incoming data. Transforms are applied column-by-column, and the SoQL expression runs against every cell in the column. You can access data transforms through the "Preview Data Table" view during initial imports, updates, or replaces in the platform UI. Click to expand the column menu and click "Data Transform" to open the compiler.
NOTE: Once you apply a data transform, it is stored in your import configuration, meaning it will be automatically re-applied every time you upload new data on that dataset, unless you delete it manually. If you're seeing unexpected errors when updating your dataset, check the compiler! You may need to edit or delete an existing data transform, depending on changes in your source data.
- Transforming and Validating Data in Socrata Before Publishing
- How to Apply Nested Transforms to a Column When Uploading Data
- How to Transform a Datetime when Importing Data
- Data Transform Listing (Socrata Developer Site)
Column formatting is the final, cosmetic layer applied to your on-platform data. Column formatting does not affect your underlying data at all, nor does it interfere with any pre-existing data transforms. This is simply how you instruct the platform to make your data look a certain way. The data type you've selected for a column is going to determine which column formatting options are available to you.
The relationships between unformatted data, data types, data transforms, and column formatting are best summarized with an example.
You've inherited an old dataset with date/time data that looks like Fig. 3, and you want to replace it with new data from an Excel file (Fig. 4):
Fig. 3 - Platform dataset Fig. 4 - Excel file
When you upload the replacement data, however, you get errors in your date columns, so you check to see if there's already a data transform stored in this column that might be interfering with things:
The "to_floating_timestamp" function tells us that the transform expected the date/time data type in this column, and "'%m/%d/%Y'" means it expected to read it as month/day/year. To double-check for any inconsistencies, we view the unformatted data in plain text:
This shows us that our unformatted data is actually stored as year-month-day. We can now update our data transform to match:
Finally, we can return to the data table preview and apply column formatting so that our date/time data appears the way we want it to when it's published: