May 31, 2007

Do all-stopword queries matter?

Many search engines don’t index “stopwords”, words that are very common and have little meaning by themselves. The stopword list is often just the most frequent words in the language: “the”, “be” (and its inflections), “a”, “of”, and so on.

Search engines that index all words like to show off searches for “to be or not to be”, because stopword elimination can remove every word in the phrase. Of course, no one really searches for “to be or not to be” because we all know where it came from.
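To make the effect concrete, here is a minimal sketch in Python of what stopword elimination does to that query. The stopword list and the function names `remove_stopwords` and `is_all_stopwords` are made up for illustration; a real engine has its own list and a more careful tokenizer.

```python
# Hypothetical stopword list for illustration only; real engines ship their own.
STOPWORDS = {
    "the", "be", "is", "are", "was", "a", "an",
    "of", "to", "or", "not", "and", "in",
}


def remove_stopwords(query):
    """Return the query's lowercase tokens with stopwords dropped."""
    tokens = query.lower().split()
    return [t for t in tokens if t not in STOPWORDS]


def is_all_stopwords(query):
    """True if the query has at least one token and every token is a stopword."""
    tokens = query.lower().split()
    return bool(tokens) and not remove_stopwords(query)


if __name__ == "__main__":
    print(remove_stopwords("to be or not to be"))
    print(is_all_stopwords("to be or not to be"))
```

With this list, every token of “to be or not to be” is filtered out, so the engine has nothing left to look up. Running `is_all_stopwords` over a list of titles is one simple way to find the ones such an engine cannot match.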

Are there any real titles that are all stopwords? Does this matter? I’ve been indexing movie titles, and found more than a few that are 100% stopwords.

The last one isn’t a traditional stopword, but think about the number of “click here” links on the web. It is a web stopword, for sure.

Posted by Walter Underwood (wunder@best.com) at May 31, 2007 08:49 AM