Improving Geocoding Results for Address Data

Table of Contents

1. What is Geocoding?

2. Geocoding Errors

i. Error Handling

ii. Common Reasons for Errors

iii. Error Examples

3. Elements of an Address

i. Addresses in the US

ii. Addresses outside of the US

4. On-platform Georeferencing

5. Geocoding Methods

i. Lat/Long

ii. Address Separated

iii. Address Combined

6. Why Won't My Address Geocode?

7. Improving Results

i. Address Errors from Typos

ii. Address Errors from Inaccurate Information

iii. Address Errors from Missing Address Elements

iv. Address Errors from Locale Colloquialisms

8. Address Truthing

9. Geocoding Best Practices

1. What is Geocoding?

Address analysis has been around for a very long time, and although it may appear simple, it can be quite complex. If you're curious and want to learn more about how addresses are understood, analyzed, and assessed in data systems, you can read more about it online or in this ESRI-supported article.

Much of the data we use on an Open Data Platform (ODP) or Enterprise Data Platform (EDP) contains address information. Our platform helps transform these addresses into point locations that can be mapped in a visual display. This transformation process is called Georeferencing, and geocoding falls under the georeferencing umbrella. Geocoding transforms a textual address into an X,Y coordinate pair with a geometry type of POINT. This geocoding step is required if you intend to display addresses as point locations on a map.

On our platform, "Geocoding," or relating a street address to a geographic coordinate, works as follows:

  • An address in text form (input) is used to produce Latitude/Longitude coordinates (output).
  • These Latitude and Longitude coordinates (2 elements) are then combined into location data (1 element).
  • The result is a single column of coordinate pairs (X,Y) with data type POINT, which can then be used in a map visualization.

Example:

A single column of coordinate pairs (X,Y Locations) of data type POINT
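The combining step described above can be sketched in Python. This is a minimal illustration, not the platform's implementation: it assumes each row is a dictionary with hypothetical `latitude` and `longitude` keys and writes the pair as a Well-Known Text (WKT) POINT string, a common text encoding for point geometry. Note that WKT lists X (longitude) before Y (latitude).

```python
from typing import Dict, List, Optional


def to_point_wkt(latitude: float, longitude: float) -> str:
    """Combine two coordinate values into one WKT POINT (X = longitude, Y = latitude)."""
    return f"POINT ({longitude} {latitude})"


def add_location_column(rows: List[Dict[str, Optional[float]]]) -> List[Dict[str, object]]:
    """Return new rows with a single 'location' column built from 'latitude' and 'longitude'."""
    result = []
    for row in rows:
        lat, lon = row.get("latitude"), row.get("longitude")
        # A row missing either coordinate gets an empty (None) location.
        location = None if lat is None or lon is None else to_point_wkt(lat, lon)
        result.append({**row, "location": location})
    return result


rows = [{"latitude": 32.75, "longitude": -97.33}, {"latitude": None, "longitude": -97.33}]
for row in add_location_column(rows):
    print(row["location"])
```

The two input columns collapse into one location column, which mirrors the "2 elements into 1 element" idea in the list above.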

2. Geocoding Errors

There are times when an address won't geocode, and it may not be clear why. Errors are displayed in red in the Geocoding results area, along with the reason the error occurred.

i. Error Handling

When errors occur, a user may choose to either 'Skip Row' or 'Treat Value as Empty.' 

  • Skip Row - Treats the cell as an error and removes the row from the dataset on import
  • Treat Value as Empty - Imports the row with a null georeferenced value
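The two error-handling choices can be modeled as an import policy. In this sketch, `geocode` is a hypothetical stand-in for the platform's geocoder that returns a location string or `None` on failure; the function name and the `"skip"`/`"empty"` policy values are illustrative, not platform settings.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, object]


def import_rows(
    rows: List[Row],
    geocode: Callable[[Row], Optional[str]],
    on_error: str = "skip",
) -> Tuple[List[Row], List[Row]]:
    """Geocode each row and apply an error policy.

    on_error="skip"  -> rows that fail to geocode are left out of the import
    on_error="empty" -> rows that fail are imported with location set to None
    Returns (imported_rows, error_rows); error_rows is what you might export for review.
    """
    if on_error not in ("skip", "empty"):
        raise ValueError("on_error must be 'skip' or 'empty'")
    imported: List[Row] = []
    errors: List[Row] = []
    for row in rows:
        location = geocode(row)
        if location is None:
            errors.append(row)
            if on_error == "skip":
                continue  # Skip Row: drop the row entirely
        # When location is None here, this is the Treat Value as Empty behavior.
        imported.append({**row, "location": location})
    return imported, errors
```

With a geocoder that fails on some rows, `"skip"` imports fewer rows than `"empty"`, but both policies report the same error rows for review.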

ii. Common Reasons for Errors

In general, when an address doesn't geocode properly it can usually be traced back to one of the following reasons:

  • an incorrect or poorly formed address
  • an incomplete address
  • too many matching (or identical) addresses found during the geocoding process
  • the format of the address is not recognized by the geocoder
  • inaccurate address databases

In the image above, the 4 addresses have issues or are incomplete, so it makes sense that the ESRI Geocoder may not be able to geocode them correctly. You can click the Export Errors button at the bottom of the ingress screen to investigate your errors. The 4 records reported as errors in the image above are listed below; once each address is investigated, it becomes apparent why it was flagged.

iii. Error Examples

Example 1:

Unable to convert "Fort Dix, New Jersey, Fort Dix, NJ 08640" from text to location

  • "Fort Dix" is a nickname used by locals; it is not recognized as an official address name.

Example 2:

Unable to convert "321 Main Street, 2nd Floor" from text to location

  • "321 Main Street, 2nd Floor" is not enough information for the geocoding process to find this address; the City, State, and Zip Code are needed. Can you imagine how many matches there are for "321 Main Street, 2nd Floor" in the world?

Example 3:

Unable to convert "24 Commerce Street, Suite 510" from text to location

Unable to convert "101 Oliver Street" from text to location

  • Both addresses are missing the City, State, and Zip Code that the Geocoder needs to locate them.

Many times, you can improve your results with small changes to either the addresses within your dataset or the georeferencing method you've selected. The main rule of thumb: the more precise the information you provide to the Geocoding Engine, the more accurate your location output will be.

For the purpose of this article, we will focus on geocoded results built from address data.

3. Elements of an Address

i. Addresses within the United States

Standard addresses within the US include 4 major elements: a Street Address, City, State, and Zip Code. Variations of these 4 elements exist and can enhance geocoding results because they provide more detail to help pinpoint a location. For instance, a local building complex might have the street address "454 North Ave", but to locate your dentist's office within that complex, you might append a suite number to the street address, as in "454 North Ave, Suite 101".

Another common variation is adding a 4-digit code after the 5-digit zip code (known as ZIP+4). This add-on identifies a smaller delivery segment within the 5-digit zip code area, such as a city block, a building, or a high-volume mail recipient, and can correspond to part of a mail carrier's delivery route.

  • [Street] [Suite]
  • [City]
  • [State]
  • [Zip Code] [Range/Route]
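Before geocoding, it can help to check that the Zip Code column holds either a 5-digit code or a 5-digit code plus the 4-digit add-on. The sketch below is illustrative: the function name is hypothetical, and the zero-padding of integer values assumes the zip code was stored as a number, which silently drops leading zeros (as in New Jersey's 08640).

```python
import re
from typing import Optional, Tuple, Union

# 5 digits, optionally followed by a 4-digit add-on with or without a hyphen.
ZIP_PATTERN = re.compile(r"^(\d{5})(?:-?(\d{4}))?$")


def parse_zip(value: Union[str, int]) -> Optional[Tuple[str, Optional[str]]]:
    """Split a US zip code into (5-digit zip, 4-digit add-on or None); None if malformed."""
    if isinstance(value, int):
        # Assumption: an integer zip lost its leading zeros when stored as a number.
        value = f"{value:05d}"
    match = ZIP_PATTERN.match(value.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)


print(parse_zip("08733-5001"))  # ('08733', '5001')
print(parse_zip(8640))          # ('08640', None)
print(parse_zip("7608"))        # None
```

Storing zip codes as text rather than numbers avoids the leading-zero problem in the first place.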

ii. Addresses Outside of the United States

Not all countries follow the same address format.  The format of the address and level of detail may affect the geocoding results. For instance, here is what the standard address might include in 3 different countries:

Spain:

  • [First Name][Last Name] 
  • [Street Name][House Number], [Stairwell], [Floor], [Door] 
  • [Postal Code]
  • [City] 
  • [Country]

Norway:

  • [Name]
  • [Street]
  • [Postal Code] [Locality]
  • [Country]

Colombia:

  • [Name]
  • [Street] [House Number] [Sub Building]
  • [Postal Code] [Locality]
  • [Country]

Because of these differences, it's important to provide as much information as possible about an address during the geocoding process to help the geocoding engine find the right result. Did you know that searching Google for "454 North Ave" with no City, State, or Zip Code produces pages upon pages of this address occurring all over the US? If the same address is searched on Google.ca (Canada), there are even more. Adding the City, State, and Zip Code to this street address, however, returns an X,Y coordinate that places the point at the correct address.

It's important to keep in mind that geocoding is not magic: providing more information will generally yield better results.

Below are some strategies you may use to help the geocoding engine find accurate results for you.

4. On-platform Georeferencing

Data & Insights uses the ESRI World Geocoding Engine via the ESRI ArcGIS REST API to geocode addresses and coordinates into location data during the ingress and/or update process.

You may add georeferenced data to both new datasets and existing datasets as long as there is either address data (text) or latitude/longitude present within the data columns.

Regular Ingress workflow (new dataset)

  • Create a New Dataset > Add Data > Complete Ingress > Publish Dataset

The geocoding workflow is similar to the general ingress workflow except that you will be adding a georeferencing step.  

Ingress with Georeferencing workflow

  • Create a New Dataset > Add Data (with Address information) > Georeference Data > Complete Ingress > Publish Dataset

Georeference Data Step above (generalized):

  1. Our ODP or EDP platform transmits a call via API to the ESRI Geocoding service, passing the address components provided
  2. ESRI responds to the call with an X, Y coordinate for each address transmitted
  3. Our platform packages this into a POINT location in a column in the dataset

Example of Step 2

ESRI API Geocoding Service Example (API Call/Response)
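Steps 2 and 3 can be sketched as follows. The response shape (a `candidates` list whose entries carry `location.x`, `location.y`, and a `score`) follows the ArcGIS `findAddressCandidates` JSON format, but the sample values are made up, and the 80-point minimum score is a hypothetical threshold, not a documented platform setting.

```python
import json
from typing import Optional


def best_point(response_text: str, min_score: float = 80.0) -> Optional[str]:
    """Pick the highest-scoring candidate and package it as a WKT POINT, or return None."""
    candidates = json.loads(response_text).get("candidates", [])
    if not candidates:
        return None  # nothing matched: this row would be reported as an error
    best = max(candidates, key=lambda c: c.get("score", 0))
    if best.get("score", 0) < min_score:
        return None  # only weak matches: treat as an error rather than guess
    location = best["location"]
    return f"POINT ({location['x']} {location['y']})"


# Made-up response for illustration only.
sample = json.dumps({
    "candidates": [
        {"address": "123 Example St", "location": {"x": -97.33, "y": 32.75}, "score": 97},
        {"address": "123 Example Ave", "location": {"x": -97.30, "y": 32.70}, "score": 71},
    ]
})
print(best_point(sample))  # POINT (-97.33 32.75)
```

Rejecting low-scoring matches is one way to avoid placing a point at the wrong one of many similar addresses.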

5. Geocoding Methods

From an ODP or EDP ingress screen, you may access the georeferencing screen by clicking Add georeference along the left margin.  Here, you will see the 3 methods we offer for processing location data. 

Methods

  • Lat/Long (georeference from X,Y coordinates)
  • Address Separated (geocoding from address)
  • Combined Location (geocoding from address)

*Note: Two of these methods perform geocoding from address data, and one performs georeferencing with X,Y coordinates.

i. Latitude/Longitude

The Lat/Long method of georeferencing accepts a coordinate pair as Latitude and Longitude. The platform expects the two values to be in separate columns within your dataset, where the Latitude column holds the Y coordinate and the Longitude column holds the X coordinate. Although Lat/Long coordinates can represent addresses, they flow through the georeferencing process differently than an address would.
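Because Latitude is the Y value and Longitude is the X value, a quick range check can catch invalid or swapped columns before georeferencing. This is a generic sketch, not a platform feature: latitude must fall between -90 and 90 degrees and longitude between -180 and 180, and the "may be swapped" heuristic is an assumption that only works when the longitude's magnitude exceeds 90.

```python
from typing import List


def check_lat_lon(latitude: float, longitude: float) -> List[str]:
    """Return a list of problems with a latitude/longitude pair (empty if it looks valid)."""
    problems = []
    if not -90 <= latitude <= 90:
        problems.append("latitude out of range")
    if not -180 <= longitude <= 180:
        problems.append("longitude out of range")
    # Heuristic: if the pair fails but would pass with columns swapped, flag a likely swap.
    if problems and -90 <= longitude <= 90 and -180 <= latitude <= 180:
        problems.append("columns may be swapped")
    return problems


print(check_lat_lon(32.75, -97.33))  # []
print(check_lat_lon(-97.33, 32.75))  # ['latitude out of range', 'columns may be swapped']
```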

For the purpose of this article, we won't be focusing on the Latitude/Longitude method. 

Please refer to the Creating Georeference Columns in the Data & Insights Data Management Experience article for more information about how to run the Geocoder on our platform.

ii. Address Separated

The Address Separated method of geocoding accepts the individual pieces of an address to create location data and generally produces more accurate results than the Address Combined method described below. It is especially recommended if your address data is incomplete.

In the US there are 4 basic elements or parts to an address:  Street, City, State, and Zip Code.  

Example:

Full Address: "123 Tyler Ave, Fort Worth, TX 76006"

  1. Street: 123 Tyler Ave
  2. City: Fort Worth
  3. State: TX
  4. Zip: 76006
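If your data holds full addresses in one column, you may be able to split well-formed US addresses into the four Address Separated inputs before ingress. This regular expression is a simplified sketch that only handles the "Street, City, ST Zip" pattern; real-world addresses vary far more, so anything that does not match is returned as `None` for manual review.

```python
import re
from typing import Dict, Optional

# "Street, City, ST 12345" or "Street, City, ST 12345-6789"
US_ADDRESS = re.compile(
    r"^\s*(?P<street>[^,]+),\s*(?P<city>[^,]+),\s*(?P<state>[A-Z]{2})\s+(?P<zip>\d{5}(?:-\d{4})?)\s*$"
)


def split_address(full_address: str) -> Optional[Dict[str, str]]:
    """Split a combined US address into street, city, state, and zip, or return None."""
    match = US_ADDRESS.match(full_address)
    if match is None:
        return None
    return {key: value.strip() for key, value in match.groupdict().items()}


print(split_address("123 Tyler Ave, Fort Worth, TX 76006"))
# {'street': '123 Tyler Ave', 'city': 'Fort Worth', 'state': 'TX', 'zip': '76006'}
print(split_address("321 Main Street, 2nd Floor"))  # None
```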

It is not required that every piece of a 4-part address be present for the Address Separated method to work. Understand, though, that incomplete input can lead to incomplete or inaccurate output. See the Geocoding Best Practices section below for guidance.

For example, you may have street information and city information but have no state or zip code to input.  The Address Separated method will still try to process the street address and will try to provide the best possible geocoded results based on the information provided. 

Many times, even when elements are missing, the Address Separated method will still geocode a location accurately. This is more common with very unique street addresses (e.g., 95782 East Butenschoen Parkway) and less common with street addresses that can be found in every town (e.g., 500 Main St.).

iii. Address Combined

You may wonder why we provide both Separated and Combined options for geocoding. Although these methods are similar, they are not identical: the ESRI Geocoding service treats separated and combined addresses differently. When an address is separated into its elements, each element is validated separately and then scored and ranked. The algorithm then uses the best option for each element to match an address before returning an X,Y location. A combined address, by contrast, is treated as a single input, and a match is sought for the entire address at once.

For example, the word "Park" can be a place name (Sherwood Park), a street name (Park St.), or part of a street type (Stevens Parkway). When an address is separated into elements, "Park" is evaluated independently from the rest of the address first, then scored and ranked, and eventually matched to the other relevant address elements to arrive at the best possible match. When "Park" appears in a combined address (456 Park St., New Haven, 06506), the word is interpreted in the context of the rest of the address.

Although the Separated Address method might take longer to process, in live tests it has outperformed the Combined Address method, producing fewer errors and more matched addresses. To read more about how address elements are matched to addresses, click here.
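To illustrate the difference, here is how the two styles of request might look against the ArcGIS World Geocoding Service `findAddressCandidates` operation, which accepts either a single-line address (`SingleLine`) or separate fields (`Address`, `City`, `Region`, `Postal`). This sketch only builds URLs and does not send them; it is not the exact call our platform makes, and real requests may also require an ArcGIS token or other parameters.

```python
from typing import Optional
from urllib.parse import urlencode

ENDPOINT = (
    "https://geocode.arcgis.com/arcgis/rest/services/World/GeocodeServer/findAddressCandidates"
)


def combined_request(single_line: str) -> str:
    """Combined: the whole address is sent as one string and matched at once."""
    params = {"SingleLine": single_line, "f": "json", "maxLocations": 1}
    return f"{ENDPOINT}?{urlencode(params)}"


def separated_request(
    street: str,
    city: Optional[str] = None,
    state: Optional[str] = None,
    zip_code: Optional[str] = None,
) -> str:
    """Separated: each element goes in its own field; missing elements are simply omitted."""
    fields = {"Address": street, "City": city, "Region": state, "Postal": zip_code}
    params = {name: value for name, value in fields.items() if value}
    params.update({"f": "json", "maxLocations": 1})
    return f"{ENDPOINT}?{urlencode(params)}"


print(combined_request("456 Park St., New Haven, 06506"))
print(separated_request("456 Park St.", city="New Haven", zip_code="06506"))
```

In the separated form, "Park" arrives already labeled as part of the street field, while in the combined form the service must infer its role from the surrounding text.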

6. Why Won't My Address Geocode?

There are times when an address, even though it's correct and you know it's correct, won't geocode.  There are some unique situations where you're just going to be out of luck.  The items mentioned below are some of the less common reasons an address won't geocode.

Some of the reasons why are:

  • The address may not be receiving mail and is missing from the US Postal Service databases or other survey databases.
  • The address may have street, city, and state, but it's missing its zip code
  • There are too many identical address results for the geocoding process to choose from 
  • There are errors within the source databases that the geocoder is checking for matches

7. Improving Results

Again, if your address is erroring, and the platform is not returning a point location, then assume that there is something wrong with the address first, double-check the address by validating it, and then try again.  In the following sections, we'll provide some error scenarios for you to consider.

i. Address Errors from Typos

Check any address producing an error for typos, spelling errors, and city or zip code discrepancies.

Example 1: 

Address:  50000 Hadley Road, Suite 100, South Plainfield, NJ 07080

  • There are too many zeros in 50000; this address geocodes fine as 5000 Hadley Road

ii. Address Errors from Inaccurate Information

Example 2:

Address:  Hangar 1, Lansdowne Road, NAES Lakehurst, NJ 08733-5001

  • Although NAES (Naval Air Engineering Station) is the official name of a military base, it is not part of the city name, so NAES should be removed from the city field
  • The formal street address for this location is 'Lakehurst Hangar 1', not 'Hangar 1' 

Example 3:  Try Logical Variations

Hangar One, Hangar-1, Lakehurst Hangar 1

  • Variations can sometimes change geocoding results: "One", "1", and "-1" (as in "Hangar-1") are all read differently in an alphanumeric system.
  • You may want to consider 'truthing' your address with multiple sources
  • Addresses may change over time with new zoning
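Trying logical variations by hand gets tedious across many rows. This sketch generates simple variants of a designation like "Hangar One" (number words vs. digits, spaces vs. hyphens, and an optional locality prefix such as "Lakehurst"). The function name and the small number-word table are illustrative assumptions, and each variant would still need to be truthed before use.

```python
import re
from typing import List, Optional

NUMBER_WORDS = {"one": "1", "two": "2", "three": "3", "four": "4", "five": "5"}
DIGIT_WORDS = {digit: word.capitalize() for word, digit in NUMBER_WORDS.items()}


def variations(label: str, prefix: Optional[str] = None) -> List[str]:
    """Return spelled-out, numeric, spaced, and hyphenated variants of a label."""
    words = re.split(r"[\s\-]+", label.strip())
    as_digits = [NUMBER_WORDS.get(w.lower(), w) for w in words]  # "One" -> "1"
    as_words = [DIGIT_WORDS.get(w, w) for w in words]  # "1" -> "One"
    variants = set()
    for parts in (words, as_digits, as_words):
        variants.add(" ".join(parts))  # e.g. "Hangar 1"
        variants.add("-".join(parts))  # e.g. "Hangar-1"
    if prefix:
        variants |= {f"{prefix} {v}" for v in list(variants)}
    return sorted(variants)


print(variations("Hangar One", prefix="Lakehurst"))
```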

iii. Address Errors from Missing Address Elements

Example 1:

Address: 524 5th Street, Franklin

  • "Franklin" is one of the most common town names in the USA
  • Without a State or Zip Code, the system will find too many matches to return a valid result

Example 2:

Address: 3000 E 1st Ave, Denver, CO 80206

  • The above address is the main address of a mall, and malls can cover whole areas or city blocks!
  • Adding a Suite Number (e.g. Suite #305) in the address will assist in locating the address appropriately
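A quick completeness check can flag records like "524 5th Street, Franklin" before they reach the geocoder. The field names below are hypothetical column names, and a check like this cannot catch the mall case, where every element is present but a suite number would still narrow the location.

```python
from typing import Dict, List, Optional

US_ELEMENTS = ("street", "city", "state", "zip")


def missing_elements(record: Dict[str, Optional[str]]) -> List[str]:
    """List the standard US address elements that are absent or blank in a record."""
    return [name for name in US_ELEMENTS if not str(record.get(name) or "").strip()]


print(missing_elements({"street": "524 5th Street", "city": "Franklin"}))  # ['state', 'zip']
print(missing_elements({"street": "3000 E 1st Ave", "city": "Denver", "state": "CO", "zip": "80206"}))  # []
```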

iv. Address Errors from Locale Colloquialisms

Example 1

Address:  9159 Farm-To-Market Rd 78, Converse, TX 78109

  • "Farm-to-Market" or "FM" is a hold-over from earlier times when locals described roads in their communities.  
  • This literally meant it was the local road or route where rural farms took their goods to market to nearby towns and cities.
  • These names can sometimes still show up in modern-day addresses 
  • In this case, FM 78 is also referred to as "Old Seguin Rd" or can be seen together in some addresses like "Old Seguin Road & FM 78, San Antonio, TX 78244"
  • As a result, modern-day address databases might have difficulty understanding the address if a local colloquialism is used.
  • The USPS keeps a standard for road abbreviations (https://pe.usps.com/text/pub28/28apc_002.htm) and Farm-to-Market or FM is not on the list.
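One way to handle colloquial road names is to normalize street suffixes to USPS standard abbreviations and keep a local lookup table of known alternate names to try when a record errors. Only a handful of USPS suffixes are included here, and the alias table is a hypothetical example built from the FM 78 / Old Seguin Rd case above; a real table would come from your own local knowledge or truthing.

```python
from typing import Dict, List

# A small subset of USPS Publication 28 standard street suffix abbreviations.
USPS_SUFFIXES = {
    "AVENUE": "AVE",
    "BOULEVARD": "BLVD",
    "PARKWAY": "PKWY",
    "ROAD": "RD",
    "STREET": "ST",
}

# Hypothetical local alias table: street name -> alternate names to try.
LOCAL_ALIASES: Dict[str, List[str]] = {
    "FARM-TO-MARKET RD 78": ["FM 78", "OLD SEGUIN RD"],
}


def normalize_street(street: str) -> str:
    """Uppercase, drop periods, and abbreviate known suffixes (e.g. 'Avenue' -> 'AVE')."""
    words = street.upper().replace(".", "").split()
    return " ".join(USPS_SUFFIXES.get(word, word) for word in words)


def street_candidates(street: str) -> List[str]:
    """Return the normalized street plus any known local aliases, keeping the house number."""
    normalized = normalize_street(street)
    number, _, name = normalized.partition(" ")
    if not number.isdigit():
        number, name = "", normalized
    prefix = f"{number} " if number else ""
    return [normalized] + [prefix + alias for alias in LOCAL_ALIASES.get(name, [])]


print(street_candidates("9159 Farm-To-Market Rd 78"))
# ['9159 FARM-TO-MARKET RD 78', '9159 FM 78', '9159 OLD SEGUIN RD']
```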

8. Address Truthing

Here are some free and easy ways to truth your address data:

USPS.com - If the address is within the United States, you can check any errors you receive against the US Postal Service's Address Validation tool.

USPS.com - You can also view a list of the official USPS Suffix Abbreviations.

Google Maps - Google Maps is free and easy to use. You can find it by typing "Google Maps" into a web browser.

Google Earth Pro - Google Earth Pro is free and a little more complex to use.  You may download the stand-alone application to your workstation but might need to check in with your IT department before you do this to ensure you have the proper permissions to download and install programs.

County Clerk & Recording Office - You may visit your local County Clerk & Recording Office in person or online to perform a Public Records search.  Official property records will have official addresses and many times surveyed addresses from qualified professional surveyors.  

Note: each county hosts the property records for that county only. If the address is located in a different county, you must search that county's property records.

Utilize the Map Display - During the on-platform Georeferencing process, you have the option to view your geocoded locations, including errors, on a map. To learn about the map function, see the article How to Configure a Georeference/Location Column.

Other sources - Ask a GIS Professional to help you validate your address.

9. Geocoding Best Practices

So in general, if your address is erroring, the most likely scenario is that something is wrong with the address. Double-check the address by validating it, then try again. If the address still errors, you can start trying different geocoders (off-platform) to see if any of them find the address. Because not all geocoders are built the same, you will find varying levels of accuracy between them.

Here are some guidelines to help you when geocoding on our ODP or EDP platforms.

Basic recommendations:

  • Remember! If your address is erroring and the platform is not returning a point location, assume that something is wrong with the address first, not with the geocoding process. Double-check the address by validating it, and then try again.
  • It's always best to have all elements of an address for geocoding (for example, Street Address, City, State, Zip Code); anything less will degrade the geocoding results.
  • In general, use the Combined method only when geocoding complete addresses with no missing elements (for example, "123 Tyler Ave, Fort Worth, TX 76006").
  • Use the Separated method when geocoding ambiguous (incomplete) addresses, or when combined addresses are erroring (for example, "Tyler, Fort Worth, TX").
  • If the source data is poor, incomplete, or incorrect, expect geocoding results to be poor, incomplete, or incorrect.
  • Verify that the address is recognized by the US Postal Service (or validate it another way). An unrecognized address might still geocode fine, but geocoders may have a harder time finding it, and it may continue to error.
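The Combined-versus-Separated recommendations above can be expressed as a simple fallback order: try Combined only for complete addresses, and fall back to Separated when elements are missing or Combined errors. `geocode_combined` and `geocode_separated` are hypothetical stand-ins for whichever geocoder you call; on our platform you choose the method on the georeferencing screen rather than in code.

```python
from typing import Callable, Dict, Optional, Tuple

ELEMENTS = ("street", "city", "state", "zip")
Record = Dict[str, Optional[str]]


def geocode_with_fallback(
    record: Record,
    geocode_combined: Callable[[str], Optional[str]],
    geocode_separated: Callable[[Record], Optional[str]],
) -> Tuple[Optional[str], str]:
    """Return (location or None, method used: 'combined', 'separated', or 'error')."""
    complete = all(str(record.get(name) or "").strip() for name in ELEMENTS)
    if complete:
        single_line = f"{record['street']}, {record['city']}, {record['state']} {record['zip']}"
        location = geocode_combined(single_line)
        if location is not None:
            return location, "combined"
    # Incomplete address, or the Combined attempt errored: try the Separated method.
    location = geocode_separated({name: record.get(name) for name in ELEMENTS})
    return location, ("separated" if location is not None else "error")
```

A record that ends in `"error"` is a candidate for the truthing steps in section 8 rather than for another geocoding attempt.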