Searching text

Joined: 11/28/2008

Trying to get this right on searching text using MySQL's fulltext search function. Basically what I am trying to do is to get some fairly relevant results from a column in MySQL, which in this case is sermon text.

So far the best full text search works best like so:

CODE
SELECT fieldname FROM sermons WHERE MATCH(sermon_text)
AGAINST ('+"phrase to be searched on"' IN BOOLEAN MODE);

which in this case will yield only results from that phrase. There are title searches which use LIKE and also by date.

Anyone with any better ideas on this one?

Want to Make $$$$ with your Computer? No Risk! Simply press shift-4 four times in a row

Joined: 11/28/2008
Well, this note on that page

Well, this note on that page seemed of relevance:

http://dev.mysql.com/doc/mysql/en/fulltext-search.html#c1502

It might help with the ordering of results.

Paul Davey
Whitford Church
"Everyone who calls on the name of the Lord will be saved." Romans 10:13
"For all have sinned and fall short of the glory of God, and are justified

Joined: 11/28/2008
Thanks, Bobbymac.I have

Thanks, Bobbymac.

I have looked at that page previously. It is pretty difficult to determine relevant results. The other minor issue is that any word 3 letters or less is ignored. Not generally a problem but I was doing a demo of the site and someone wanted to search for 'war' and no results were found. It also carries a list of stop words that are generally ignored although not if searching in Boolean. I will probably add some validation telling that it needs to be over 3 letters.

Want to Make $$$$ with your Computer? No Risk! Simply press shift-4 four times in a row

Joined: 11/28/2008
I believe that is an option

I believe that is an option that can be set (at least) when compiled, if not in the mysql config. I am sure I read that on that page. The limit is for performance reasons.

Paul Davey
Whitford Church
"Everyone who calls on the name of the Lord will be saved." Romans 10:13
"For all have sinned and fall short of the glory of God, and are justified

Joined: 11/28/2008
Can someone explain a bit

Can someone explain a bit better how this functionality can be used? I was working on my own indexing service for large resume files to index all words 3chars or more in length into MySQL.

Of course, after indexing 2,000 Word resumes, it has become cumbersome (especially without stored procedures to speed things along) to sort and insert new found words into those tables.

How can this type of query be used to effectively search through data columns in MySQL of type FULLTEXT to retrieve results? Are you able to do these following queries:

1. COUNT of times a keyword/keyphrase is used in each FULLTEXT column?
2. Comparison between the char-length of the FULLTEXT column and then number of keyword/keyphrase occurrances?
3. If keyword/keyphrase is not found in FULLTEXT, can you break phrase apart into individual words and pull any results?

I have the ability to do all of these searches with my own indexing service, but certainly it is slow at best in indexing the files. Queries are simple and fast however.

Thanks for the help! /smile.gif" style="vertical-align:middle" emoid=":)" border="0" alt="smile.gif" />

Joined: 11/28/2008
I believe that is an
QUOTE
I believe that is an option that can be set (at least) when compiled, if not in the mysql config. I am sure I read that on that page. The limit is for performance reasons.

Interesting. I did find this post. I did a search on the site for war and it worked fine.

QUOTE
The Wikipedia encyclopedia uses MySQL boolean full text search and MySQL 4.0.20. You can assess the speed yourselves. About 350,000 articles/ rows in the English language version, roughtly 5GB total. For all languages, one million articles and 10GB of text.

It's very important to have a sufficiently large key_buffer_size. Get much of the index in RAM and searches are typically very fast. Because we use InnoDB tables and can't share cache between InnoDB and MyISAM, we're moving to a setup which will use a dedicated and MyISAM tuned search server.

We run a slow query killer which will kill queries once their allowed time expires. That time depends on server load (active thread count). Documented at http://wp.wikidev.net/Querybane

It turns out that our current search often spends most time on things other than full text search - an improved version which is much more efficient is pending. Remember that you can use a self join to get around the one index per query limit of MySQL.

Search remains our most problematic load feature, requiring a couple of quite capable slaves to keep up at busy time. If we're working on the servers and the remainder can't handle the load, we switch to Google or Yahoo search. This is from one of the top few hundred sites on the net, so it's scaling pretty well for our application. One server was sufficiently fast to take us to the top 1,000.

I also found this post suggesting a way to do it using Windows.

QUOTE
to set up min_len and stopword_file in windows (win2k or xp):

1. run services.msc to check what .ini your mysql is reading (in my case, "C:\MySQL\bin\mysqld-nt" --defaults-file="C:\MySQL\my.ini" MySQL)

2. change your my.ini like this:
[mysqld]
ft_min_word_len=3
ft_stopword_file="C:\\MySQL\\stop.txt"

3. restart your mysqld at services.msc

4. reindex your table using REPAIR TABLE tbl_name QUICK;

Little bit different for a 'nix server. You have to use the mysqld_safe --set-variable=ft_min_word_len=3. The server that I have runs FreeBSD, so I added the --set-variable statement to the mysql startup script and then reindexed the file. It works fine now.

Want to Make $$$$ with your Computer? No Risk! Simply press shift-4 four times in a row