Saturday, April 28, 2007

Sphinx - Open Source SQL Full Text Search Engine

I came across Sphinx today via the MySQL Performance Blog (which has some good entries you might want to check out). It is an open source full-text SQL search engine. It can be installed as a storage engine on MySQL, and from what I hear it can beat the pants off MySQL's built-in full-text search in some cases.
From the web site:
Generally, it's a standalone search engine, meant to provide fast, size-efficient and relevant fulltext search functions to other applications. Sphinx was specially designed to integrate well with SQL databases and scripting languages. Currently built-in data sources support fetching data either via direct connection to MySQL, or from an XML pipe.

Here are some of the features:
high indexing speed (up to 10 MB/sec on modern CPUs)
high search speed (avg query is under 0.1 sec on 2-4 GB text collections)
high scalability (up to 100 GB of text, up to 100 M documents on a single CPU)
supports distributed searching (since v.0.9.6)
supports MySQL natively (MyISAM and InnoDB tables are both supported)
supports phrase searching
supports phrase proximity ranking, providing good relevance
supports English and Russian stemming
supports any number of document fields (weights can be changed on the fly)
supports document groups
supports stopwords
supports different search modes ("match all", "match phrase" and "match any")
generic XML interface which greatly simplifies custom integration
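
To make the three search modes in the list above a bit more concrete, here is a toy Python sketch of what "match all", "match any" and "match phrase" mean for a single document. This is not Sphinx's code or its API; the function names and the whitespace tokenizer are my own assumptions, and a real engine works against an index with stemming, stopwords and ranking on top.

```python
# Toy illustration of the three search modes; NOT Sphinx's implementation.
# All names here (tokenize, match_all, match_any, match_phrase) are hypothetical.

def tokenize(text):
    """Lowercase and split on whitespace. A real engine does far more."""
    return text.lower().split()

def match_all(query, doc):
    """True if every query word occurs somewhere in the document."""
    words = set(tokenize(doc))
    return all(w in words for w in tokenize(query))

def match_any(query, doc):
    """True if at least one query word occurs in the document."""
    words = set(tokenize(doc))
    return any(w in words for w in tokenize(query))

def match_phrase(query, doc):
    """True if the query words occur in the document adjacent and in order."""
    q, d = tokenize(query), tokenize(doc)
    n = len(q)
    return n > 0 and any(d[i:i + n] == q for i in range(len(d) - n + 1))

doc = "open source full text search engine"
print(match_all("search open", doc))     # both words present, order ignored
print(match_phrase("search open", doc))  # words present but not adjacent in this order
print(match_any("sphinx engine", doc))   # only "engine" is present
```

The point is just the difference in strictness: "match any" is the loosest, "match all" requires every word, and "match phrase" additionally requires the words to appear next to each other in order.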