This whitepaper explores full-text search capabilities of MySQL and PostgreSQL focusing on the features, indexing, and search approaches.
Full-text search is the process of finding natural-language documents that satisfy a query. The most common approach to full-text search in a database is to find all records containing given query terms and list those rows in order of their similarity to the query. Meaning of similarity is quite flexible and depend on the specific application. The simplest search treats searching query as a set of words and similarity as the frequency of query words in the row.
In MySQL full-text index is created on CHAR, VARCHAR or TEXT columns to enhance the efficiency of search queries running against those columns. Here are key features of full-text index concept in MySQL:
CREATE TABLE
statement or added to
an existing table using ALTER TABLE
or CREATE INDEX
statementsFULLTEXT
MySQL allows to include full-text index declaration in the CREATE TABLE
statement (example
from the official MySQL documentation):
CREATE TABLE opening_lines ( id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY, opening_line TEXT(500), author VARCHAR(200), title VARCHAR(200), FULLTEXT idx (opening_line) ) ENGINE=InnoDB;
Also, you can add a full-text index to an existing table through ALTER TABLE
or
CREATE INDEX
statements as follows:
CREATE FULLTEXT INDEX idx_name ON opening_lines(opening_line); ALTER TABLE opening_lines ADD FULLTEXT INDEX idx_name (opening_line);
Full-text search in MySQL is implemented via 2 functions MATCH()
and AGAINST()
that allows to find specific terms in columns covered by the full-text indexes.
Below is example of a full-text search query to get number of rows containing the word 'Ishmael' in the 'opening_line' column:
SELECT COUNT(*) FROM opening_lines WHERE MATCH(opening_line) AGAINST('Ishmael' IN NATURAL LANGUAGE MODE);
Concepts of full-text search in PostgreSQL is described in this arctile.
When converting full-text search features from MySQL to PostgreSQL it is important to properly choose between two primary full-text search approaches: tsvector indexes and trigram indexes. Have in mind that tsvector indexes are ideal for complex linguistic searches, on the other hand trigram indexes have been designed for substring searches and fuzzy matching. It is important to carefully analyze the source data and project requirements to choose the most suitable full-text search option.
Have questions? Contact us