Wednesday, September 19, 2007

Making the case for Full Text Search

Making the case for Full Text Search

Use Case for Full Text Search

On a recent project, one of the needs was to search by a few different sets of criteria which were in a number of different tables with a number of one to many relationships.  Most of the tables had millions of rows of data with the largest table having over 12 million rows.  The data was primarily read-only and updated on a monthly basis.  Unfortunately, all of the queries had to access the 12 million row table with 1 to 5 joins to the child tables and 1 to 5 WHERE clauses in the queries.  The data needed to be returned as quickly as possible with the highest level of concurrency based on the existing hardware resources.

In this scenario, a number of test cases were built with a few different T-SQL coding techniques to include:

Based on the testing conducted (queries, data, concurrency, etc.), the EXCEPT and INTERSECT syntax was the best T-SQL option for a single JOIN and/or a single WHERE clause statement, which was considered a rare condition.  The best T-SQL option for a numerous JOIN and numerous WHERE clause statements, which was considered the norm, was the Full Text Catalog with the CONTAINS command.  This was a little counter intuitive initially, but the Full Text Catalog was the most efficient as we tested each scenario and reviewed the cost of the query plans.  However, this may not be the case for your data and queries, so test thoroughly.

*** NOTE *** - As a point of reference, the testing was conducted with the DBCC DROPCLEANBUFFERS command issued in between statement executions to ensure cached results would not unfairly benefit subsequent queries.

Setting up a Full Text Catalog

Since, Full Text Catalogs may be new to some DBAs and Developers, let's walk through an example of setting up a Full Text Catalog for the AdventureWorks database (SQL Server 2005 sample database):

Creation - To create a new Full Text catalog navigate to root | Databases | AdventureWorks (Database Name) | Storage | Full Text Catalogs.  To create the catalog, right click on the Full Text Catalogs folder and select the 'New Full-Text Catalog...' option.

On the New Full-Text Catalog interface the following settings should be configured:

  • Full-text catalog name
  • Catalog location
  • Filegroup
  • Owner
  • Accent Sensitivity

Corresponding Script

USE [AdventureWorks]
GO
CREATE FULLTEXT CATALOG [zProduction.Product]
IN PATH N'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\FTData'
WITH ACCENT_SENSITIVITY = ON
AUTHORIZATION [dbo]
GO

Configuration - To configure the Full Text Catalog (tables, columns, schedule, etc) navigate to root | Databases | AdventureWorks (Database Name) | Storage | Full Text Catalogs | Catalog Name (i.e. zProduction.Product in our example) and select the 'Properties' option. 

On the Full Text Catalog Properties interface the following settings should be configured:

  • General configurations
  • Tables\views with the corresponding columns
  • Population Schedule

General Page - This interface corresponds primarily to the catalog that was setup in the previous steps, with the ability to indicate if the catalog is the default, the catalog owner and if the catalog is accent sensitive.

Tables/Views Page - This page provides the opportunity to determine the tables, columns to include in the Full Text Catalog.  Although multiple tables and columns could be included.  In the lower portion of the interface, it is important to note the unique index for each table, the language and the track changes setting.  For more information about the track changes setting, reference the Rebuilding the Full Text Catalog section below.

Population Schedule -

Corresponding Script

USE [AdventureWorks]
GO
CREATE FULLTEXT INDEX ON [Production].[Product] KEY INDEX [PK_Product_ProductID] ON [zProduction.Product] WITH CHANGE_TRACKING AUTO
GO

USE [AdventureWorks]
GO
ALTER FULLTEXT INDEX ON [Production].[Product] ADD ([Name])
GO

USE [AdventureWorks]
GO
ALTER FULLTEXT INDEX ON [Production].[Product] ENABLE
GO

USE [msdb]
GO
DECLARE @jobId BINARY(16)
EXEC msdb.dbo.sp_add_job @job_name=N'Start Optimize Catalog Population on AdventureWorks.zProduction.Product',
@enabled=1,
@start_step_id=1,
@description=N'Scheduled full-text optimize catalog population for full-text catalog zProduction.Product in database AdventureWorks. This job was created by the Full-Text Catalog Scheduling dialog or Full-Text Indexing Wizard.',
@category_name=N'Full-Text', @job_id = @jobId OUTPUT
select @jobId
GO

EXEC msdb.dbo.sp_add_jobserver @job_name=N'Start Optimize Catalog Population on AdventureWorks.zProduction.Product', @server_name = N'JTKLAPTOP'
GO

USE [msdb]
GO
DECLARE @schedule_id int
EXEC msdb.dbo.sp_add_jobschedule @job_name=N'Start Optimize Catalog Population on AdventureWorks.zProduction.Product', @name=N'zProduction.Product Full Text Catalog ',
@enabled=1,
@freq_type=4,
@freq_interval=1,
@freq_subday_type=1,
@freq_subday_interval=0,
@freq_relative_interval=0,
@freq_recurrence_factor=1,
@active_start_date=20070924,
@active_end_date=99991231,
@active_start_time=20000,
@active_end_time=235959,
 @schedule_id = @schedule_id OUTPUT

select @schedule_id
GO

USE [msdb]
GO
EXEC msdb.dbo.sp_add_jobstep @job_name=N'Start Optimize Catalog Population on AdventureWorks.zProduction.Product', @step_name=N'Full-Text Indexing',
@step_id=1,
@cmdexec_success_code=0,
@on_success_action=1,
@on_success_step_id=-1,
@on_fail_action=2,
@on_fail_step_id=-1,
@retry_attempts=0,
@retry_interval=0,
@os_run_priority=0,
 @subsystem=N'TSQL',
@command=N'USE [AdventureWorks]
ALTER FULLTEXT CATALOG [zProduction.Product] REORGANIZE',
@database_name=N'master'
GO

Querying with the Full Text Catalog

Here are a few Full Text Catalogs query examples with the CONTAINS command for the AdventureWorks database as a point of reference:

Product ID and Product Name Selection

USE AdventureWorks;
GO
SELECT ProductID, [Name]
FROM Production.Product
WHERE CONTAINS([Name], '"*washer*" OR "*ball*"');
GO

Description Selection

USE AdventureWorks;
GO
SELECT ProductDescriptionID, Description
FROM Production.ProductDescription
WHERE CONTAINS(Description, '"*technology*" OR "*performance*"');
GO

Rebuilding the Full Text Catalog

One item to keep in mind is that the Full Text Catalogs need to be rebuilt automatically, on a schedule or on an as needed basis in order to include all of the updated data since the initial population.  Please reference the options below to rebuild the Full Text Catalogs based on the Tables/Views Page interface shown above:

  • Automatic - The data in the full-text index is automatically updated as the data in the corresponding table is inserted, updated or deleted. 
  • Manual - In this scenario, when the indexed data is inserted, updated or deleted, SQL Server will track the change, but not update the index as is the case with the automatic option. A SQL Server Agent job needs to run in order to update the index.
  • Do not track changes - When the indexed data is inserted, updated or deleted, SQL Server will not track the change and the index must be rebuilt to reflect all of the underlying data changes.

Each of these options need to be understood in terms of data availability versus performance.  If the data in the Full Text catalog always needs to be up to date at any cost, then use the automatic setting.  If the data changes need to be recorded, but not update the Full Text indexes then the manual setting should be used. This setting should balance some of the performance needs by rebuilding the Full Text indexes during a low usage period.  If the data is updated in bulk on a regular basis, then not tracking changes is probably appropriate with the practice that the Full Text index will need to be updated following the bulk loading of the data.

Next Steps