Saturday, January 12, 2008

Geocoding With SQL Server

Geocoding With SQL Server

Geocoding With SQL Server

Update: Sept 10th 2005:

The project has moved far beyond what is on this page. MySql s now supported and there is a new interface.
Please click here  to get the latest information.


THIS PROGRAM IS BEING UPDATED AND IMPROVED FREQUENTLY.
IF YOU WANT TO BE NOTIFIED OF UPDATES, SUBSCRIBE TO THE GEOCODING RSS FEED

Many of the scripts and tables on this page are outdated, please see the geocoding category for the latest versions.
There is now a beta installer.
With the release of the Google Maps API, I wanted to try out some mapping. This ended up being the fairfaxinfo.com project.
As nice as the Google API is, it does nothing to help with the hardest part of mapping: getting addresses translated into longitude/latitude.
The only free service I could find was www.geocoder.us, but I didn't want to rely on it being up/free forever. The code is available in PERL with a Berkley DB, but I would much rather chew on tinfoil than try and read someone else's perl code.
So, after much headscratching, fist pounding, key banging, debugging,  eye wateringly boring (but thorough) census documentation, and in general driving everyone around me crazy, I have finally managed to use the Tiger/Line census data and SQL Server to geocode addresses.

There's a lot of leg work to do up front to prep your db.
I tested random results against geocoder.us and they matched. Your mileage may vary.
COMING SOON: A program that automates the 5 gig census download, extracts the files and installs the db is almost finished. It should be up the week of Aug 10th.
Step 1: Load The Data
First, you'll need to get the Tiger/Line files for your county. You'll need your county's fips code which you get find here.
Download the zip file for your county, it will probably only be a couple of megabytes.
We're only going to use Type 1 and Type 2 files for now. I merged a few other types, but thats beyond the scope of this article.
Extract *.rt1 and *.rt2.

We're going to load these files into tables TIGER_01 and TIGER_02 respectively. I use a database called TIGER, so modify the later commands if you name your db something else.
The table definitions will match the file schema even though we're not going to use all the fields.
Here are the DDL files for TIGER_01 and TIGER_02.
Now we need to use the BCP utility to load the data.
Here are the format files I used: tiger1.txt tiger2.txt

The census files have occasional weird junk that will break your tables, like characters in the address number columns, so you'll need to set the maxerrors and log the output if you want to fix them and reinsert the suspect records.
I suggest you make .bat files out of the bcp commands to save your fingers:

bcp_load.txt

The switches are case sensitive, so be careful. (-F means something different than -f).

Step 2: Create the Functions
Now that we've got the data, we're ready to create the Geocoding functions.
Most return a table containing latitude, longitude, zip, and TLID (the Tiger/Line key). This will be expanded in the future.
Most of the functions in the scripts are support calls, you only need to worry about four of them at the moment:
fnGeocode(
@DirPrefix varchar(2), *see the note below
@Number int, Number component of the address. ex: pass in 1234 to geocode 1234 Main St
@StreetName,
@StreetType,
@DirSuffix varchar(2), *
@ZIP, Optional. Pass in ZIP if known. Otherwise pass in -1
@StateFipsCode, Optional. Pass in FIPS identifier for state if known. Otherwise use -1
@StateCountyCode, Optional. Pass in FIPS identifier for county. Otherwise use -1
)
For the optional components, searches will only be performed if one of the following combinations is provided:
ZIP
FipsState
FipsState AND FipsCounty
*Take special note of the @DirPrefix parameter.
If you are geocoding the address 123 West Main St, you will need to call the function like this:
select * from fnGeocode('W',123,'MAIN','ST','',55555,-1,-1)
The census files store directional prefixes as one or two characters: N, S, E, or W, NW, SE, etc. Pass in an empty string if there is no prefix.
The @DirSuffix is the same concept just at the end of the address. Example: 123 Main St SW.
You will need to normalize your street types before calling the function. For example, Street needs to be passed in as St, and Road needs to be passed in as RD or you won't get a hit.
Select distinct from the TIGER_01.fetype column to get the possible values.
You can reverse geocode and get the closest node intersection with the next function:
fnReverseGeocode( 
@Longitude decimal(9,6),
@Latitude decimal(9,6)
)
Eventually this function will attempt to return the exact address. Right now you will get the street(s) and address range, but I'm not trying to guess the number quite yet.
To get the location where streets A and B intersect (which may be more than once):
fnGeocodeIntersection(
@DirPrefixA varchar(2),
@StreetNameA varchar,
@StreetTypeA varchar,
@DirSuffxA varchar(2),
@ZipA int,
@DirPrefixB,
@StreetNameB,
@StreetTypeB,
@DirSuffxB varchar(2),
@ZipB
)
The fourth function tries to find a location when the exact number isn't known. Addresses I had to geocode were usually given as 5600 block Main St.
Since we are searching a range of addresses the number has to at least theoretically exist on one of the chains. But if there is no address 5600 and the chain starts at 5601, you will not get a hit with the other functions. Use this query as a last resort:
fnGeocodeClosestAddress(
@DirPrefix varchar(1),
@Number int,
@StreetName,
@StreetType,
@DirSuffx varchar(2),
@ZIP
)
To install, you can use the following scripts. The release notes for the latest versions can be found here.
For the entire database (don't use for an upgrade unless you want to reload the data): Geo.All.05.txt
If you already have the database setup, use this script to update the functions: Geo.functions.05.txt


Now for an example of how the census data and Google Maps differ in their data.

This is an image of a complete street chain (mulitple type 1 and type 2 records) overlayed on Google Maps as a polyline:

Notice that the marker is correctly positioned on the census data from which it was derived, but not on the Google road representation.
This one isn't that badly aligned compared to some I've seen.
In the next update I'm going to cover how to merge type 6 records into your data which provide more address ranges for type 1 records.