Tuesday, December 4, 2007

MSSQL: Parse Search Engine Querystrings for Search Terms

MSSQL: Parse Search Engine Querystrings for Search Terms
As a compliment to my MaxMind GeoIP Locator scripts and the parse top-level domain from URL function, I've created another function to parse the search term from an URL.
I've heard that T-SQL is a bit slow with parsing strings and wanted to test this myself so I created this in both C# and T-SQL. The difference is very noticeable, indeed: while the original, unoptimized function I wrote in T-SQL took 11 seconds to parse 10,000 referral URLs, the C# version took 4 seconds. The updated, more efficient function takes 6 seconds to run in T-SQL and 3 seconds in C#. Below are two truncated versions of the functions I created. I slimmed them down to make them easier to read on the blog but you can download the full versions here: qscleaner.cs and qscleaner.sql. If you do not have Visual Studio.net, scroll to the bottom of the post -- there I link the qscleaner.dll and give sample CREATE ASSEMBLY syntax.
If you don't have access to SQL Server 2005 or SQL 2k5 CLR, you can still use a T-SQL function.

If you do not have VS.NET but still wish to use the C# version of qscleaner, you can download the dll here: qscleaner.dll or you can email me for the SLN file and compile it yourself using msbuild which is included in the .NET 2.0 framework. Anyway, save the dll to the DATA directory that contains your .mdf files (for example: C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data). To "install" the UDF, run the following:
CREATE ASSEMBLY QSCleaner
FROM 'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data\qscleaner.dll'
WITH PERMISSION_SET = SAFE
GO
CREATE FUNCTION dbo.QSCleaner(@strURL nvarchar) RETURNS nvarcharAS EXTERNAL NAME QSCleaner.UserDefinedFunctions.QSCleaner
GO
To call the function, create a query that's something like this: SELECT TOP 1000 dbo.qscleaner(referer) as keywords FROM hitcounter WHERE referer IS NOT NULL ORDER BY id DESC
Actually, I take it back. The above dll/assembly/function creation doesn't work -- I just receive 1000 h's all the way down. Compiling it via VS.NET works and the T-SQL function works but the manual creation does not. If anyone can shed some light, I'd appreciate it!