Importing, Data Types, and You!

So you want to publish your data online. You want it to look great, you want it to be useful and you want people to be able to search and filter it efficiently. Data types are critical to all of these functions, but how do you know when something is a valid type or not? How do you format your date column so it appears as a date? How do you get percentages to properly appear?

Read on, and let us know if you still have any questions!

Please note that this article is specific to datasets uploaded through the Legacy Web User Interface.

Which file types are allowed on import?

Import a Data File:

  1. .csv (csv output from an excel file or see rfc 4180 for standard format)
  2. .xls (Excel 97 and later)
  3. .xlsx (2007 and later)
  4. .tsv
  5. Note: .csv is the optimal file format for upload. If you experience an error while importing a .xls or .xlsx file, try saving as a .csv to eliminate Excel formatting that can sometimes cause errors.

Upload a Non-Data File:

  1. The followin file types are allowed to be uploaded: accdb,. asc,. avi,. azw3,. bmp,. c,. css,. csv,. das,. dat,. db,. dbf,. dib,. doc,. docx,. dta,. dwg,. ecw,. eml,. fmw,. ged,. geojson,. gif,. gif,. gsheet,. ico,. ics,. indd,. jpeg,. jpg,. js,. json,. kml,. kmz,. log,. lyr,. md,. mdb,. mov,. mp3,. mp4,. mpg,. msg,. n3,. numbers,. ods,. odt,. one,. onepkg,. pages,. paint,. pb,. pdf,. png,. pps,. ppt,. pptx,. psd,. pxm,. rar,. rdf,. rtf,. sas,. sas7bcat,. sas7bdat,. sav,. sdf,. shp,. sql,. sig,. sketch,. snagproj,. tex,. tif,. tiff,. tmp,. tsv,. twbx,. txt,. wav,. wma,. wmf,. wmv,. wpd,. wps,. xlr,. xls,. xlsb,. xlsx,. xml,. xps,. zip (however we only provide previews for types listed in this article). Due to Socrata security policies, any other file extension on this list will not be allowed to upload. There is a 1 GB upload limit for all non-date files.

Import Geospatial Data:

  1. .zip (shapefiles; the following files are required: .shp, .shx, .dbf, .prj)
  2. .kml (We do not support multiple features in a single layer. Technical terminology: we do not support heterogeneous children inside a multi geometry tag)
  3. .kmz (this a zipped .kml and follows the same import rules as above)
  4. .json (GeoJSON)

Link to External Data:

  1. URL

Connect with an ESRI Map Layer:

  1. URL to the RESTful endpoint for an individual map layer on an ArcGIS Server version 10.0 or above. You can read more about these connections here.
  2. In addition to ESRI Extension, Socrata has built out the Catalog Connector for ArcGIS Server. This new feature allows administrators to connect a public ArcGIS Server to your Socrata catalog and choose the layers to federate to your Socrata catalog as external datasets. 
  3. Our platform supports Web Mercator projections via ESRI Extensions and will not convert other projections types. 

Supported Types on Import

We only try to detect specifically supported types on import. There are many other data types that we support but you'll have to either convert your columns after import or enter the data manually.

The following types are supported:

Plain Text

UTF-8 encoded text (generally alphanumeric text). Our system assumes that there is no text formatting. 

Numbers, Money, and Percent

Numbers: We directly use Java's BigDecimal parsing. For details see the documentation. If you're working with values that contain leading zeros that aren't importing correctly, visit this support article.

For negative numbers, the format should not contain any commas. -10000 will be read correctly but -10,000 will not.

Percentages: Percentages can be uploaded when you have a basic number in your data, the value in your source file should not contain the percent symbol (%). Percents are not in the range 0.0 to 1.0 like they are in Excel. A percentage input of "42.0" is idiomatically "42.0%".

Money: As of 6/15/18 this datatype has been deprecated. To display values as Money, you will need to upload the data as a Number datatype and apply formating to its display.

It is important to note that if you have string values, e.g. text, in a number, money or percent field, these values will not be included in the upload. Previously these invalid values would show with grey italicized text, however, with the new data table, these cells will now be blank.

Dates & Time / Date & Time (with timezone)

Dates are parsed by default in the American/Pacific (PST) timezone. You can explicitly specify a time zone by using the supported ISO 8601 subset. A 'Z' character is UTC, otherwise, the offset is [+-]HH:mm.

For inputs that don't specify a time of date, the resulting time is undefined. In other words, don't rely on it being anything consistent.

The accepted input formats are:

Supported ISO 8601 Subset

  • yyyy-MM-dd['T']HH:mm:ssZ (e.g. "1920-01-22T00:00:00Z", "1920-01-22T00:00:00-10:00", or "1920-01-22 00:00:00Z")
  • yyyy-MM-dd['T']HH:mm:ss (e.g. "1920-01-22T00:00:00" or "1920-01-22 00:00:00")
  • yyyy-MM-dd['T']HH:mm (e.g. "1920-01-22T00:00")
  • yyyy-MM-dd (e.g. "1920-01-22")

Supported non-ISO Dates

For dates other than the ISO subset we accept a date, optionally followed by a time, i.e.

(date)[ (time)]

Non-ISO dates are always parsed in the American date format locale (i.e. month, day, year). Months and days can be either single or double digit and may or may not be led with a '0'. Years can be either four digits (preferred) or two. If a year is two digits it will be assumed to be between 1951 and 2050: i.e. 1/2/75 would be January 2nd 1975, but 1/2/49 would be January 2nd 2049.

The accepted input formats are:

  • MMM d, yyyy (e.g. "Jan 4, 1982")
  • MMM d, yy (e.g. "Jan 4, 82")
  • MMMM d, yyyy (e.g. "January 4, 1982")
  • MMMM d, yy (e.g. "January 4, 82")
  • M-d-yyyy (e.g. "1-4-1982")
  • M/d/yyyy (e.g. "1/4/1982")
  • M.d.yyyy (e.g. "1.4.1982")
  • M-d-yy (e.g. "1-4-82")
  • M/d/yy (e.g. "1/4/82")
  • M.d.yy (e.g. "1.4.82")

Location Columns

Location columns are a "composite" column that's created by appending multiple values together. We accept the following types of location data for geolocation:

  1. Street Address (U.S. only)
  2. City (U.S. only)
  3. State (U.S. only, use the two-letter abbreviation)
  4. Zip Code (both 5 and 9 digits are accepted; U.S. only)
  5. Latitude and Longitude (International)
    1. To format a latitude and longitude pair to be appended or refreshed, format the values in the given column as: "(xx.xxx, yyy.yyy)" where "xx.xxx" is the latitude and "yyy.yyy" is the longitude. Make sure that your values are in decimal degrees, and that you use "negative" longitude degrees for "degrees west" (ie. "-122.36" for Seattle, WA, and "2.33" for Paris, France).
    2. You can also specify a latitude and longitude pair along with your address: "101 Yesler Way, Seattle, WA, 98108 (47.60165, -122.33403)".

To append or refresh an address, simply format it as a comma-separated US-formatted address within the column, such as "101 Yesler Way, Seattle, WA, 98108". It'll automatically be queued up for geocoding if the address parser recognizes the format. 

Note if you are importing location information as a single column: Whatever file format you use, make sure that the values in your location column are properly escaped. For example, for CSV, an address must be wrapped in double-quotes in order to escape the commas within it. Make sure the address format follows the following format: "101 Yesler Way, Seattle, WA, 98108".

Website Links/URLs

URL's support three different input formats. Only three URL schemes are acceptable: FTP, HTTP, and HTTPs. We use a custom regular expression to validate URLs. It should accept just about anything that you throw at it, but there's always a chance that it's missed something. Please note that since underscores are invalid characters in DNS records, we do not accept underscores.

  1. <a href="http://www.socrata.com/">Socrata</a>
  2. http://www.socrata.com/
  3. Socrata (http://www.socrata.com)

 

Checkboxes

Valid false values:

  • 0
  • f
  • false
  • n
  • no
  • off

Valid true values:

  • 1
  • t
  • true
  • y
  • yes
  • on

  

Which data types are available after import?

There are some file types that we do not handle during import, but you can add to your dataset after you have imported the file. It is also not possible to programmatically import these data types.

Photo (Image)

  • Accepted file formats: .jpg, .png, .gif

Document

  • The followin file types are allowed to be uploaded: accdb,. asc,. avi,. azw3,. bmp,. c,. css,. csv,. das,. dat,. db,. dbf,. dib,. doc,. docx,. dta,. dwg,. ecw,. eml,. fmw,. ged,. geojson,. gif,. gif,. gsheet,. ico,. ics,. indd,. jpeg,. jpg,. js,. json,. kml,. kmz,. log,. lyr,. md,. mdb,. mov,. mp3,. mp4,. mpg,. msg,. n3,. numbers,. ods,. odt,. one,. onepkg,. pages,. paint,. pb,. pdf,. png,. pps,. ppt,. pptx,. psd,. pxm,. rar,. rdf,. rtf,. sas,. sas7bcat,. sas7bdat,. sav,. sdf,. shp,. sql,. sig,. sketch,. snagproj,. tex,. tif,. tiff,. tmp,. tsv,. twbx,. txt,. wav,. wma,. wmf,. wmv,. wpd,. wps,. xlr,. xls,. xlsb,. xlsx,. xml,. xps,. zip.

NOTE: FOR DATASETS WHICH HAVE DEPRECATED FEATURES, THE FEATURES WILL STILL FUNCTION. THESE FEATURES WILL NOT BE AVAILABLE FOR ANY NEW DATASETS.

Was this article helpful?
3 out of 3 found this helpful
Have more questions? Submit a request

Comments

1 comment
  • Nice job on this Sam. I'm wondering if it would be worthy to point out in the document or in a comment if Excel's native date format is not accepted or what one should do if they want to upload an Excel spreadsheet with dates in it.

Please sign in to leave a comment.