The irony of Internet search is that there is never
a paucity of information, only an overdose of it.
With databases that can keep the entire Web at
their fingertips, search engines can almost always
retrieve relevant pages. The challenge
lies in separating the wheat from the chaff and keeping
out unwanted material. Most engines find more sites
for a typical search query than you could ever
wade through, so finding the relevant pages
in the search results looks like the proverbial
needle-in-a-haystack situation.
We discussed Boolean search
in the last issue. It's a great tool in terms of simplicity
and speed, but it cannot differentiate between search
expressions that have the same keywords in a different
order (and hence a different meaning). So the search expressions
'Dog bites man' and 'Man bites dog' retrieve
virtually the same results (unless you search for the exact phrase).
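
To see why word order is lost, here is a minimal Python sketch of Boolean AND matching next to exact-phrase matching. The function names and the two sample pages are invented for illustration, and the tokenizer simply splits on whitespace; a real engine does far more.

```python
def tokenize(text):
    """Lowercase the text and split it into a set of words (order is discarded)."""
    return set(text.lower().split())


def boolean_and_match(query, document):
    """True when every query word occurs somewhere in the document."""
    return tokenize(query) <= tokenize(document)


def phrase_match(query, document):
    """True only when the query words occur adjacent and in the same order."""
    q = query.lower().split()
    d = document.lower().split()
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


# Two made-up pages that use the same keywords in a different order.
pages = [
    "dog bites man in the park",
    "man bites dog and makes the news",
]

for query in ("Dog bites man", "Man bites dog"):
    boolean_hits = [p for p in pages if boolean_and_match(query, p)]
    phrase_hits = [p for p in pages if phrase_match(query, p)]
    print(query, "| Boolean AND:", boolean_hits, "| exact phrase:", phrase_hits)
```

Because Boolean AND only checks whether each word is present, both queries match both pages; only the exact-phrase check tells them apart.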
Search engines are aware of this
problem and have tried to solve it in different
ways. Directory-type search engines display search
results in alphabetical order. But they are extremely
selective, so the results seldom run beyond
one or two pages.
Spider-based search engines have
no such luck, so they employ what is called a 'relevance
score' to sort search results.
A relevance score is a measure used to bring
the most relevant pages to the top of a search result.
Many search engines display the relevance score of
each retrieved page.
Relevance scores reflect the number
of times a search term appears, where it appears
(e.g. in the title, in the meta tags, towards
the beginning of the document, etc.), whether all the
search terms are near each other, and many other
relevance parameters. Each parameter carries a different
weight. The pages are sorted by their final relevance
score.
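
The idea of weighted parameters can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the weights, the "first 10 words" cut-off, the 5-word proximity window and the page layout (`title`, `meta`, `body`) are invented, since real engines do not publish their formulas.

```python
# Hypothetical weights; no real search engine's values are known here.
WEIGHTS = {
    "frequency": 1.0,   # each occurrence of a search term in the body
    "title": 5.0,       # each search term found in the title
    "meta": 3.0,        # each search term found in the meta keywords
    "early": 2.0,       # each search term found in the first 10 body words
    "proximity": 4.0,   # one-off bonus when all terms sit close together
}


def relevance_score(query, page, weights=WEIGHTS, window=5):
    """Add up the weighted relevance parameters for one page."""
    terms = query.lower().split()
    body = page["body"].lower().split()
    title = page["title"].lower().split()
    meta = page["meta"].lower().split()

    score = 0.0
    for term in terms:
        score += weights["frequency"] * body.count(term)
        score += weights["title"] * (term in title)
        score += weights["meta"] * (term in meta)
        score += weights["early"] * (term in body[:10])

    # Proximity: does any run of `window` consecutive words hold every term?
    if any(set(terms) <= set(body[i:i + window]) for i in range(len(body))):
        score += weights["proximity"]
    return score


def rank(query, pages):
    """Sort pages so that the highest relevance score comes first."""
    return sorted(pages, key=lambda p: relevance_score(query, p), reverse=True)


# Made-up sample pages.
pages = [
    {"title": "Gardening tips", "meta": "garden plants",
     "body": "a dog once dug up my garden but the man next door helped"},
    {"title": "Man bites dog", "meta": "news dog man",
     "body": "in a strange turn a man bites a dog near the market"},
]

for page in rank("man bites dog", pages):
    print(page["title"], relevance_score("man bites dog", page))
```

The second page wins because its terms appear in the title, in the meta tags and close together, not just somewhere in the text.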
Since each search engine has its
own system for calculating relevance scores, you
get different search results from different search
engines even when the search expression is the same.
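
A small Python sketch shows how this happens. The two engines, their weights and the page signals below are all made up for illustration; the point is only that the same pages come out in a different order once the weights change.

```python
# Two hypothetical engines that weigh the same two parameters differently.
ENGINE_WEIGHTS = {
    "engine_a": {"title": 5.0, "frequency": 1.0},  # favours title matches
    "engine_b": {"title": 1.0, "frequency": 2.0},  # favours repeated terms
}

# Made-up measurements for one query on two pages.
pages = {
    "page_1": {"title": 1, "frequency": 2},  # term in the title, used twice
    "page_2": {"title": 0, "frequency": 6},  # not in the title, used six times
}


def score(signals, weights):
    """Weighted sum of a page's relevance parameters."""
    return sum(weights[name] * value for name, value in signals.items())


def ranking(engine):
    """Page names ordered from highest to lowest score for the given engine."""
    weights = ENGINE_WEIGHTS[engine]
    return sorted(pages, key=lambda name: score(pages[name], weights), reverse=True)


for engine in ENGINE_WEIGHTS:
    print(engine, ranking(engine))
```

The first engine puts page_1 on top because it rewards the title match, while the second prefers page_2 for its repeated use of the term.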