When to Shape or Join with the SoQL Query Editor
Querying DateTime Datatype Columns
When to Shape or Join with the SoQL Query Editor
Published datasets on Data & Insights may be shaped by filtering, sorting, grouping, and performing other functions against the source dataset. Published datasets may also be joined together with other published datasets on Data & Insights. Both actions, shaping and joining, populate a new derived view with the desired data. The new view continuously refers back to the source or sources it has been produced from.
Available within the Exploration Canvas on Enterprise Data sites, the SoQL Query Editor is an advanced surface for building and running queries on your published datasets in order to produce these views. This Query Editor enables a view to be articulated with Socrata Query Language, including all functions documented on dev.socrata.com. Socrata Query Language is similar to Structured Query Language (SQL). The syntax will likely feel familiar to you if you are familiar with SQL.
The SoQL Query Editor also supports joins. The JOIN keyword may be used to join published Data & Insights datasets together as part of a SoQL query.
To open the Query Editor:
- On a published dataset, “View Data”. The dataset will open in the Exploration Canvas.
- Select “Create View” on the action bar, in blue. This creates a draft view of the source dataset. You can only filter and perform other queries on a view of the source dataset.
- Under the data preview area, select the last menu icon below the Visual Query Editor tabs (Filters, Group, Column Manager, and Joins).
SoQL Query Editor
The Editor is a surface for building a query. The columns in your dataset are listed after a SELECT keyword in the editor when you first open it up.
Keywords are the foundational verbs of your query.
- SELECT – selects columns from your source
- JOIN – allows for data from another published Data & Insights dataset to be queried and included
- WHERE – filters the records to be returned based on conditions you provide
- GROUP BY – determines any groupings to use based on columns you provide
- HAVING – filters records returned from an aggregation
- ORDER BY – determines the order of the results
After you run your query, you'll see the results in the data preview table located above the Query Editor.
Aliasing Column Field Names
Within the SELECT statement, you may alias any columns with new field names. Be sure to use only valid field names:
- all lowercase
- no spaces
- cannot be the same as another field in the query
- cannot start with a number
Shape Data
Shaping data refers to actions such as aggregating, grouping, or calculating values from the column and records of your source dataset. Your inputs may be manipulated by the functions documented in the side panel of the query editor.
This example aggregates the total number of units within affordability bands. When applying a function to a column, place the function’s name just before the column name. Be sure to surround the column name with parenthesis to indicate that it is the object to be acted upon by that function.
sum(totalunits)
You may rename the result of a function being applied to a column with the word ‘as’ followed by the new column name.
sum(totalunits) as sum_total_units
When applying functions that aggregate your data, you must account for columns that are not yet aggregated or grouped. Aggregating one column without providing a rationale for aggregating or grouping other columns in your query is not a valid query.
In an aggregation, the columns in the SELECT statement must either be used with an appropriate function, added to a GROUP BY condition, or removed from the SELECT statement if no longer relevant. In this example, Project ID and Building ID are levels of detail that should be removed when trying to determine at a higher level how many units in total there are within each affordability band.
Add the “GROUP BY” keyword and any relevant columns to set unique groups based on the values in those columns.
Once you have written a valid query, the results of the query may be viewed by selecting “Run”.
Join Data
The JOIN keyword in the Query Editor is used with identifiers of published datasets on Data & Insights to perform a join. You can check this article for more information on the different types of joins supported. Before building your query with the JOIN keyword, locate the identifier for the datasets you wish to join, and the column(s) you wish to join on:
- On a published dataset, locate the string of characters at the end of the URL. The string will have the format xxxx-xxxx and is referred to as the dataset’s 4x4 (four by four).
- While you’re there, locate the field name of the column you wish to join on, and any other field names needed. Field names are readily found on the dataset’s Primer page, in the “Columns in this Dataset” section. Expand each column’s row to discover the field name. Your column names may well be the field_names if they are short and contain no spaces.
To start writing the join, navigate to the primary dataset you wish to use in the join.
- Navigate to the SoQL Query Editor, as described above, at top.
- Add the JOIN keyword to the end of the pre-populated SELECT statement.
- Add an @ symbol, then the 4x4 of the second dataset you want to join to.
- Alias the second dataset with “as” followed by a short, descriptive alias for that table. It’s common practice to alias the first table as “a” and the second table as “b”, etc.
In this example, we have referenced the 4x4 of the second dataset and provided the alias “projects”, as it refers to a projects dataset.
Now, it’s time to include the columns we want from our second table in the SELECT statement.
- Add column field names from the second table with an @ symbol and the alias before each field name, separating each with a dot.
For example:
@projects.projectname
@projects.programname
Rename any of the newly referenced columns as needed.
To execute the join, define the columns that you want to match on.
- In your JOIN expression, and the operator “on”.
- Reference the column from the second table by name. For example, @projects.projectid
- Set that column equal to the column from the primary table that you will attempt the join with. For example, projectid.
- Select Run to preview the results.
Your join expression will look something like this:
JOIN @6jsz-jk6b as projects on projectid = @projects.projectid
[join keyword] [@4x4] [as] [alias for 2nd table] [column, 2nd table] [column, primary table]
Save and Publish the Result
Select the Run button in the upper right hand corner of the editor to apply the query.
At this point your changes are saved in draft form. You may also decide to publish this draft view to a specified audience, like private, internal, or public by selecting the Publish dataset... button. At the same time, you can add and manage collaborators that may be working on the draft with you.
Error Messages, Explained
“Invalid SoQL Query”
- The query as written is not valid SoQL, meaning the query is attempting to perform some action that is not possible or not supported
- Make sure all columns are accounted for in an aggregation
- Make sure any functions used are accurately written. Make sure functions are receiving all of their required arguments.
- Make sure your operators are correct.
- For example, “is” is not the same as “=”. The operator “is” belongs with functions like “null” or “not …", whereas “=” is for comparisons with numbers or strings, etc.
- Make sure you refer to the correct data type when writing out values.
- For example, if a column is a number data type, you can refer to a value or compare to a value written plain, like 4587. However, if the column is a text data type, you must compare or refer to text strings surrounded by single or double quotes, like “4587”.
“Error Parsing SoQL Query... ”
- There may be a missing comma after one of your column names in your SELECT statement. Note, though, that no comma is placed after the last column in your SELECT statement. Commas are used to separate the field names of your query. If nothing follows your last field name for a specific Keyword, then you don’t need a comma.
- Check your keywords for proper order.
- Check functions for necessary arguments, closed parenthesis, etc.
- Check for misspellings.
- Be sure your aliases are valid. They must be lowercase with no spaces, and cannot begin with a number or include special characters.
“Unexpected token … ”
- There is likely a syntax error. Check for missing commas where they should be separating field names.
- This may also refer to keywords being out of order
“Expression expected”
- Related to above. If a comma is hanging around where it isn’t necessary, or if all parentheses are not closed, the query will wonder what you intend to write next and return this error.
“Identifier expected”
- A column with an alias is not using a valid alias.
- Aliases must be valid field names. They may not start with a number nor be just a number.
“Collocation is rejected because the total size in bytes exceeds the limit ...”
- Each domain has a limit on a necessary process called collocation. Reach out to Data & Insights support to mediate this error message. Please provide links to the datasets you are attempting to join and the corresponding query.
“SoQL expression referred to column that does not exist: column”
- Check for misspellings of your column names or for an alias that you may be using inconsistently.
- If this is a join, check to make sure any columns from your joined table are referred to properly.
Query the Results of a Query
Sometimes you may want to query the results of a previous query, like when aggregating twice on the same column or when trying to count unique values. To accomplish the second query, create a new view with the results of your first query as the source.
- Save and publish your first view.
- From the view primer page, select View Data.
- Within the Exploration Canvas, select Create View, and give the new view a name.
- Open the SoQL Query Editor in the Exploration Canvas.
- Write and run your query, then publish the results.
Views are chained to one another. When you query a view, you may only query what is in that view, not everything from the parent dataset.
You may use any view as the source for a new view.
Querying DateTime Datatype Columns
Dates and times may be represented in many different formats on the Data & Insights platform. However, for querying, all dates are stored in ISO 8601 format. When including DateTime values in your query, you will need to write them out in ISO 8601 format. For example:
If you need to check the format or timezone of the DateTime values in a column, navigate to the dataset’s API endpoint to check on the full DateTime values themselves. A dataset’s API endpoint is {yourdomain.gov}/resource/{4x4}.json
Comments
Article is closed for comments.