So you want to publish your data online. You want it to look great, you want it to be useful and you want people to be able to search and filter it efficiently. Data types are critical to all of these functions, but how do you know when something is a valid type or not? How do you format your date column so it appears as a date? How do you get percentages to properly appear?
Read on, and let us know if you still have any questions!
Which file types are allowed on import?
Import a Data File:
- .csv (csv output from an excel file or see rfc 4180 for standard format)
- .xls (Excel 97 and later)
- .xlsx (2007 and later)
- Note: .csv is the optimal file format for upload. If you experience an error while importing a .xls or .xlsx file, try saving as a .csv to eliminate Excel formatting that can sometimes cause errors.
Upload a Non-Data File:
- Any file type (however we only provide previews for types listed in this article)
Import Geospatial Data:
- .zip (shapefiles; the following files are required: .shp, .shx, .dbf, .prj)
- .kml (We do not support multiple features in a single layer. Technical terminology: we do not support heterogeneous children inside a multi geometry tag)
- .kmz (this a zipped .kml and follows the same import rules as above)
- .json (GeoJSON)
Link to External Data:
Connect with an ESRI Map Layer:
- URL to the RESTful endpoint for an individual map layer on an ArcGIS Server version 10.0 or above. You can read more about these connections here.
- In addition to ESRI Extension, Socrata has built out the Catalog Connector for ArcGIS Server. This new feature allows administrators to connect a public ArcGIS Server to your Socrata catalog and choose the layers to federate to your Socrata catalog as external datasets.
- Our platform supports Web Mercator projections via ESRI Extensions and will not convert other projections types.
Supported Types on Import
We only try to detect specifically supported types on import. There are many other data types that we support but you'll have to either convert your columns after import or enter the data manually.
The following types are supported:
UTF-8 encoded text (generally alphanumeric text). Our system assumes that there is no text formatting.
UTF-8 encoded text that may contain HTML. Note: any HTML will be sanitized to remove potentially dangerous HTML elements.
Numbers, Money, and Percent
For numbers, we directly use Java's BigDecimal parsing. For details see the documentation. If you're working with values that contain leading zeros that aren't importing correctly, visit this support article.
For negative numbers, the format should not contain any commas. -10000 will be read correctly but -10,000 will not.
Percentages are uploaded when you have a basic number in your data, the value in your source file should not contain the percent symbol (%). Percents are not in the range 0.0 to 1.0 like they are in Excel. A percentage input of "42.0" is idiomatically "42.0%".
Money can be either a number preceded by a dollar sign ($ or any other local currency symbol) or just a number. For negative monetary values, either a negative sign or a set of parentheses are acceptable: e.g. $-42.21, ($42.21), -$42.21 or (42.21). When importing a money data type, the source values should not include the currency symbol.
It is important to note that if you have string values, e.g. text, in a number, money or percent field, these values will be grayed out and italicized. This is how the Socrata platform tells you that these values are invalid. If a dataset contains invalid values such as strings in a number, money or percent field, these will not export.
Dates & Time / Date & Time (with timezone)
Dates are parsed by default in the American/Pacific (PST) timezone. You can explicitly specify a time zone by using the supported ISO 8601 subset. A 'Z' character is UTC, otherwise, the offset is [+-]HH:mm.
For inputs that don't specify a time of date, the resulting time is undefined. In other words, don't rely on it being anything consistent.
The accepted input formats are:
Supported ISO 8601 Subset
- yyyy-MM-dd['T']HH:mm:ssZ (e.g. "1920-01-22T00:00:00Z", "1920-01-22T00:00:00-10:00", or "1920-01-22 00:00:00Z")
- yyyy-MM-dd['T']HH:mm:ss (e.g. "1920-01-22T00:00:00" or "1920-01-22 00:00:00")
- yyyy-MM-dd['T']HH:mm (e.g. "1920-01-22T00:00")
- yyyy-MM-dd (e.g. "1920-01-22")
Supported non-ISO Dates
For dates other than the ISO subset we accept a date, optionally followed by a time, i.e.
Non-ISO dates are always parsed in the American date format locale (i.e. month, day, year). Months and days can be either single or double digit and may or may not be led with a '0'. Years can be either four digits (preferred) or two. If a year is two digits it will be assumed to be between 1951 and 2050: i.e. 1/2/75 would be January 2nd 1975, but 1/2/49 would be January 2nd 2049.
The accepted input formats are:
- MMM d, yyyy (e.g. "Jan 4, 1982")
- MMM d, yy (e.g. "Jan 4, 82")
- MMMM d, yyyy (e.g. "January 4, 1982")
- MMMM d, yy (e.g. "January 4, 82")
- M-d-yyyy (e.g. "1-4-1982")
- M/d/yyyy (e.g. "1/4/1982")
- M.d.yyyy (e.g. "1.4.1982")
- M-d-yy (e.g. "1-4-82")
- M/d/yy (e.g. "1/4/82")
- M.d.yy (e.g. "1.4.82")
Location columns are a "composite" column that's created by appending multiple values together. We accept the following types of location data for geolocation:
- Street Address (U.S. only)
- City (U.S. only)
- State (U.S. only)
- Zip Code (both 5 and 9 digits are accepted; U.S. only)
- Latitude and Longitude (International)
- To format a latitude and longitude pair to be appended or refreshed, format the values in the given column as: "(xx.xxx, yyy.yyy)" where "xx.xxx" is the latitude and "yyy.yyy" is the longitude. Make sure that your values are in decimal degrees, and that you use "negative" longitude degrees for "degrees west" (ie. "-122.36" for Seattle, WA, and "2.33" for Paris, France).
- You can also specify a latitude and longitude pair along with your address: "101 Yesler Way, Seattle, WA, 98108 (47.60165, -122.33403)".
To append or refresh an address, simply format it as a comma separated US-formatted address within the column, such as "101 Yesler Way, Seattle, WA, 98108". It'll automatically be queued up for geocoding if the address parser recognizes the format.
Note if you are importing location information as a single column: Whatever file format you use, make sure that the values in your location column are properly escaped. For example, for CSV, an address must be wrapped in double-quotes in order to escape the commas within it. Make sure the address format follows the following format: "101 Yesler Way, Seattle, WA, 98108".
URL's support three different input formats. Only three URL schemes are acceptable: FTP, HTTP, and HTTPs. We use a custom regular expression to validate URLs. It should accept just about anything that you throw at it, but there's always a chance that it's missed something.
- <a href="http://www.socrata.com/">Socrata</a>
- Socrata (http://www.socrata.com)
Three different input formats are acceptable for emails.
- <a href="mailto:firstname.lastname@example.org">Sam Gibson</a>
- Sam Gibson <email@example.com>
Nearly all emails should work, though technically for performance' sake we only support a subset of the RFC regex for emails. If there's a specific email or set of emails that's causing you a problem, please feel free to submit a support ticket and we'll fix it.
Valid false values:
Valid true values:
Which data types are available after import?
There are some file types that we do not handle during import, but you can add to your dataset after you have imported the file. It is also not possible to programmatically import these data types.
- Number. We do not do validation to confirm if it is a valid phone number or format.
- You can pre-enter values that a user can select from a drop-down.
- Accepted file formats: .jpg, .png, .gif
- We accept any file type
NOTE: FOR DATASETS WHICH HAVE DEPRECATED FEATURES, THE FEATURES WILL STILL FUNCTION. THESE FEATURES WILL NOT BE AVAILABLE FOR ANY NEW DATASETS.