Showing posts with label Full-Text Search. Show all posts
Showing posts with label Full-Text Search. Show all posts
Saturday, April 28, 2007
SQL Server 2005 - Full-Text Search (MSDN)
Full-Text Search Developer InfoCenter
SQL Server 2005 Books Online
Full-Text Search Developer InfoCenter
To help the Full-Text Search developer be as effective as possible, the following lists provide links to relevant topics.
Concepts
Introduction to Full-Text Search
Using Full-Text Search
Getting Started with Full-Text Search
Comparing Full-Text Functions and Full-Text Predicates
Inquiring About the Full-Text Key Column
Integrating Full-Text Search and Transact-SQL Predicates
Obtaining Full-Text Property Values Using Transact-SQL Functions
Searching for Specific Word or Phrase (Simple Term)
Performing Prefix Searches
Searching for the Inflectional Form of a Specific Word (Generation Term)
Searching for Words or Phrases Close to Another Word or Phrase (Proximity Term)
Searching for Words or Phrases Using Weighted Values (Weighted Term)
MySQL FULLTEXT Indexing and Searching
MySQL FULLTEXT Indexing and Searching
MySQL has supported FULLTEXT indexes since version 3.23.23. VARCHAR and TEXT Columns that have been indexed with FULLTEXT can be used with special SQL statements that perform the full text search in MySQL.
To get started you need to define the FULLTEXT index on some columns. Like other indexes, FULLTEXT indexes can contain multiple columns. Here's how you might add a FULLTEXT index to some table columns:ALTER TABLE news ADD FULLTEXT(headline, story);
Once you have a FULLTEXT index, you can search it using MATCH and AGAINST statements. For example:SELECT headline, story FROM news
WHERE MATCH (headline,story) AGAINST ('Hurricane');
The result of this query is automatically sorted by relevancy.
MATCH
The MATCH function is used to specify the column names that identify your FULLTEXT collection. The column list inside the MATCH function must exactly match that of the FULLTEXT index definition, unless your search in boolean mode (see below).
AGAINST
The AGAINST function is where your full text search query goes. Besides the default natural language search mode, you can perform boolean mode searches, and use query expansion.
Boolean Mode SearchesSELECT headline, story FROM news
WHERE MATCH (headline,story)
AGAINST ('+Hurricane -Katrina' IN BOOLEAN MODE);
The above statement would match news stories about hurricanes but not those that mention hurricane katrina.
See the MySQL documentation on Boolean Mode searches for more info.
Query Expansion
The Blind Query Expansion (or automatic relevance feedback) feature can be used to expand the results of the search. This often includes much more noise, and makes for a very fuzzy search.
In most cases you would use this operation if the users query returned just a few results, you try it again WITH QUERY EXPANSION and it will add words that are commonly found with the words in the query.SELECT headline, story FROM news
WHERE MATCH (headline,story)
AGAINST ('Katrina' WITH QUERY EXPANSION);
The above query might return all news stories about hurricanes, not just ones containing Katrina.
A couple points about Full-Text searching in MySQL:
Searches are not case sensitive
Short words are ignored, the default minimum length is 4 characters. You can change the min and max word length with the variables ft_min_word_len and ft_max_word_len
Words called stopwords are ignored, you can specify your own stopwords, but default words include the, have, some - see default stopwords list.
You can disable stopwords by setting the variable ft_stopword_file to an empty string.
Full Text searching is only supported by the MyISAM storage engine.
If a word is present in more than 50% of the rows it will have a weight of zero. This has advantages on large datasets, but can make testing difficult on small ones.
Do you have any other good tips for fulltext searching and indexing in MySQL?
Sphinx - Free open-source SQL full-text search engine
Sphinx - Free open-source SQL full-text search engine
Features
high indexing speed (upto 10 MB/sec on modern CPUs)
high search speed (avg query is under 0.1 sec on 2-4 GB text collections)
high scalability (upto 100 GB of text, upto 100 M documents on a single CPU)
supports distributed searching (since v.0.9.6)
supports MySQL natively (MyISAM and InnoDB tables are both supported)
supports phrase searching
supports phrase proximity ranking, providing good relevance
supports English and Russian stemming
supports any number of document fields (weights can be changed on the fly)
supports document groups
supports stopwords
supports different search modes ("match all", "match phrase" and "match any" as of v.0.9.5)
generic XML interface which grealy simplifies custom integration
pure-PHP (ie. NO module compiling etc) searchd client API
Sphinx 0.9.7
source code: sphinx-0.9.7.tar.gz (344 KB)
Win32 release binaries: sphinx-0.9.7-win32-release.zip (408 KB)
Win32 debug binaries: sphinx-0.9.7-win32-debug.zip (920 KB)
Ruby API update 0.3.0: sphinxapi-ruby-0.3.0.zip (26K), official site
http://www.sphinxsearch.com/doc.html
Subscribe to:
Posts (Atom)