Importing, Data Types, and You!

So you want to publish your data online. You want it to look great, you want it to be useful and you want people to be able to search and filter it efficiently. Data types are critical to all of these functions, but how do you know when something is a valid type or not? How do you format your date column so it appears as a date? How do you get percentages to properly appear?

Read on, and let us know if you still have any questions!

Which file types are allowed on import?

Import a Data File:

  1. .csv (csv output from an excel file or see rfc 4180 for standard format)
  2. .xls (Excel 97 and later)
  3. .xlsx (2007 and later)
  4. .tsv
  5. Note: .csv is the optimal file format for upload. If you experience an error while importing a .xls or .xlsx file, try saving as a .csv to eliminate Excel formatting that can sometimes cause errors.

Upload a Non-Data File:

  1. The followin file types are allowed to be uploaded: accdb,. asc,. avi,. azw3,. bmp,. c,. css,. csv,. das,. dat,. db,. dbf,. dib,. doc,. docx,. dta,. dwg,. ecw,. eml,. fmw,. ged,. geojson,. gif,. gif,. gsheet,. ico,. ics,. indd,. jpeg,. jpg,. js,. json,. kml,. kmz,. log,. lyr,. md,. mdb,. mov,. mp3,. mp4,. mpg,. msg,. n3,. numbers,. ods,. odt,. one,. onepkg,. pages,. paint,. pb,. pdf,. png,. pps,. ppt,. pptx,. psd,. pxm,. rar,. rdf,. rtf,. sas,. sas7bcat,. sas7bdat,. sav,. sdf,. shp,. sql,. sig,. sketch,. snagproj,. tex,. tif,. tiff,. tmp,. tsv,. twbx,. txt,. wav,. wma,. wmf,. wmv,. wpd,. wps,. xlr,. xls,. xlsb,. xlsx,. xml,. xps,. zip (however we only provide previews for types listed in this article). Due to Socrata security policies, any other file extension on this list will not be allowed to upload. There is a 1 GB upload limit for all non-date files.

Import Geospatial Data:

  1. .zip (shapefiles; the following files are required: .shp, .shx, .dbf, .prj)
  2. .kml (We do not support multiple features in a single layer. Technical terminology: we do not support heterogeneous children inside a multi geometry tag)
  3. .kmz (this a zipped .kml and follows the same import rules as above)
  4. .json (GeoJSON)

Link to External Data:

  1. URL

Connect with an ESRI Map Layer:

  1. URL to the RESTful endpoint for an individual map layer on an ArcGIS Server version 10.0 or above. You can read more about these connections here.
  2. In addition to ESRI Extension, Socrata has built out the Catalog Connector for ArcGIS Server. This new feature allows administrators to connect a public ArcGIS Server to your Socrata catalog and choose the layers to federate to your Socrata catalog as external datasets. 
  3. Our platform supports Web Mercator projections via ESRI Extensions and will not convert other projections types. 

Supported Types on Import

We only try to detect specifically supported types on import. There are many other data types that we support but you'll have to either convert your columns after import or enter the data manually.

The following types are supported:

Plain Text

UTF-8 encoded text (generally alphanumeric text). Our system assumes that there is no text formatting. 

Numbers

Numbers: We directly use Java's BigDecimal parsing. For details see the documentation. If you're working with values that contain leading zeros that aren't importing correctly, visit this support article.

For negative numbers, the format should not contain any commas. -10000 will be read correctly but -10,000 will not.

You can also upload data as a percent or as money. These values will be uploaded as number columns with formatting applied so that they display as percents or money. The display can be changed by modifying the column formatting.

 

Dates & Time / Date & Time (with timezone)

Dates are parsed by default in the American/Pacific (PST) timezone. You can explicitly specify a time zone by using the supported ISO 8601 subset. A 'Z' character is UTC, otherwise, the offset is [+-]HH:mm.

For inputs that don't specify a time of date, the resulting time is undefined. In other words, don't rely on it being anything consistent.

The accepted input formats are:

Supported ISO 8601 Subset

  • yyyy-MM-dd['T']HH:mm:ssZ (e.g. "1920-01-22T00:00:00Z", "1920-01-22T00:00:00-10:00", or "1920-01-22 00:00:00Z")
  • yyyy-MM-dd['T']HH:mm:ss (e.g. "1920-01-22T00:00:00" or "1920-01-22 00:00:00")
  • yyyy-MM-dd['T']HH:mm (e.g. "1920-01-22T00:00")
  • yyyy-MM-dd (e.g. "1920-01-22")

Supported non-ISO Dates

For dates other than the ISO subset we accept a date, optionally followed by a time, i.e.

(date)[ (time)]

Non-ISO dates are always parsed in the American date format locale (i.e. month, day, year). Months and days can be either single or double digit and may or may not be led with a '0'. Years can be either four digits (preferred) or two. If a year is two digits it will be assumed to be between 1951 and 2050: i.e. 1/2/75 would be January 2nd 1975, but 1/2/49 would be January 2nd 2049.

The accepted input formats are:

  • MMM d, yyyy (e.g. "Jan 4, 1982")
  • MMM d, yy (e.g. "Jan 4, 82")
  • MMMM d, yyyy (e.g. "January 4, 1982")
  • MMMM d, yy (e.g. "January 4, 82")
  • M-d-yyyy (e.g. "1-4-1982")
  • M/d/yyyy (e.g. "1/4/1982")
  • M.d.yyyy (e.g. "1.4.1982")
  • M-d-yy (e.g. "1-4-82")
  • M/d/yy (e.g. "1/4/82")
  • M.d.yy (e.g. "1.4.82")

Location and Point Columns

Datasets on Socrata can store location in two different data types Location and Point. In the future, all datasets will use the point format.

Both location and point columns can be built "composite" column that's created by appending multiple values together. We accept the following types of composite location data for geolocation:

  1. Street Address (U.S. only)
  2. City (U.S. only)
  3. State (U.S. only, use the two-letter abbreviation)
  4. Zip Code (both 5 and 9 digits are accepted; U.S. only)
  5. Latitude and Longitude (All Locations)

In addition, you can upload location data to both location and point columns when the data is contained within an entire cell.

For location columns the format will need to be: (lat, long)

For point columns the format should be: POINT (long lat)

URLs

URL's support three different input formats. Only three URL schemes are acceptable: FTP, HTTP, and HTTPS. We use a custom regular expression to validate URLs. It should accept just about anything that you throw at it, but there's always a chance that it's missed something. Please note that since underscores are invalid characters in DNS records, we do not accept underscores. Also, note that only one URL can be entered in a cell.

  1. <a href="http://www.socrata.com/">Socrata</a>
  2. http://www.socrata.com/
  3. Socrata (http://www.socrata.com)

 

Checkboxes

Valid false values:

  • 0
  • f
  • false
  • n
  • no
  • off

Valid true values:

  • 1
  • t
  • true
  • y
  • yes
  • on

  

Which data types are available after import?

There are some file types that we do not handle during import, but you can add to your dataset after you have imported the file. It is also not possible to programmatically import these data types.

Photo (Image)

  • Accepted file formats: .jpg, .png, .gif

Document

  • The following file types are allowed to be uploaded: accdb,. asc,. avi,. azw3,. bmp,. c,. css,. csv,. das,. dat,. db,. dbf,. dib,. doc,. docx,. dta,. dwg,. ecw,. eml,. fmw,. ged,. geojson,. gif,. gif,. gsheet,. ico,. ics,. indd,. jpeg,. jpg,. js,. json,. kml,. kmz,. log,. lyr,. md,. mdb,. mov,. mp3,. mp4,. mpg,. msg,. n3,. numbers,. ods,. odt,. one,. onepkg,. pages,. paint,. pb,. pdf,. png,. pps,. ppt,. pptx,. psd,. pxm,. rar,. rdf,. rtf,. sas,. sas7bcat,. sas7bdat,. sav,. sdf,. shp,. sql,. sig,. sketch,. snagproj,. tex,. tif,. tiff,. tmp,. tsv,. twbx,. txt,. wav,. wma,. wmf,. wmv,. wpd,. wps,. xlr,. xls,. xlsb,. xlsx,. xml,. xps,. zip.

NOTE: FOR DATASETS THAT HAVE DEPRECATED FEATURES, THE FEATURES WILL STILL FUNCTION. HOWEVER, THESE FEATURES WILL NOT BE AVAILABLE FOR ANY NEW DATASETS.

Was this article helpful?
3 out of 3 found this helpful
Have more questions? Submit a request

Comments

1 comment
  • Nice job on this Sam. I'm wondering if it would be worthy to point out in the document or in a comment if Excel's native date format is not accepted or what one should do if they want to upload an Excel spreadsheet with dates in it.

    0
    Comment actions Permalink

Please sign in to leave a comment.