The vocation of search technology - and in particular its scope - seems worryingly narrow. We live in an age of a certain complacency with what is available for information extraction and discovery. The common Web surfer holds the largely unjustified expectation that whenever a fortunate query is invoked, he or she will be referred to a page that leads to an answer. The way this is done is far from ideal, or even acceptable, if state-of-the-art research is taken into account. The process of answer-seeking is at present subjective and overly time-consuming; it should be possible to retrieve answers at the speed of will (or speech). Referral to a human professional - a field expert, that is - remains more fruitful and open-ended than the on-line scatter of pages.
The Internet as we know it is transforming as we speak. We are beginning to incorporate, intentionally or not, structural and relational data such as XFN, document classes and closed networks of collaborative knowledge - a finite, closed-ended universe of manageable and interpretable facts, that is. Further informative text is now embedded at code level, which assists in fetching semantics, thus optimising exploration and encouraging cross-site collaboration.
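To make the XFN idea concrete, here is a minimal sketch of how such code-level relational data can be harvested. It uses only the Python standard library; the URL and the subset of XFN rel values are illustrative assumptions, not a complete treatment of the format.

```python
from html.parser import HTMLParser

class XFNCollector(HTMLParser):
    """Collect (href, rel-values) pairs from anchors carrying XFN relationship terms."""
    # A small illustrative subset of XFN rel values (assumption, not the full vocabulary).
    XFN = {"friend", "colleague", "met", "co-worker"}

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        a = dict(attrs)
        rels = set((a.get("rel") or "").split())
        matched = rels & self.XFN
        if matched:
            # Record the target page and the relational semantics declared for it.
            self.links.append((a.get("href"), sorted(matched)))

parser = XFNCollector()
# Hypothetical markup: a link annotated with two XFN relationship terms.
parser.feed('<a href="http://example.org/jane" rel="friend met">Jane</a>')
print(parser.links)  # → [('http://example.org/jane', ['friend', 'met'])]
```

A crawler that accumulates such pairs across sites ends up with a small, interpretable graph of human relationships rather than a bag of words - precisely the kind of manageable fact universe described above.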
With Web 2.0 (as it is commonly referred to) on the horizon, everything is migrating to on-line storage (private or public) so that it resides coherently and cohesively in a single domain. Having huge heaps of knowledge assembled and inter-linked, traditional search engines proceed to scan pages and extract key words from them to form indices. This is a most fundamental - some would say primitive - way of reflecting on page content, yet that simplicity is crucial when one undertakes a most laborious and error-prone task: covering billions of pages from possibly questionable sources, written in different languages and embedded in different contexts.
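The keyword-index approach described above can be sketched in a few lines. This is a toy inverted index under simplifying assumptions (whitespace tokenisation, no stemming or ranking); the page texts are invented for illustration.

```python
from collections import defaultdict

def build_index(pages):
    """Map each word to the set of page ids whose text contains it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index

# Hypothetical corpus of two pages.
pages = {
    "p1": "search engines scan pages",
    "p2": "pages yield key words for indices",
}
index = build_index(pages)
print(sorted(index["pages"]))  # → ['p1', 'p2']
```

Note what the index retains and what it discards: it answers "which pages contain this word" instantly, but the order of words, and hence the author's line of argument, is gone - which is exactly the criticism developed next.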
Word density, word proximity and the like are currently analysed by market leaders, but no actual knowledge is formed. The potential for forming hypotheses and testing them is missed entirely, even discarded. Words are treated as atomic elements within a large pool and perceived as merely unrelated entities, despite the fact that a continuous flow of thought was stirring in the mind of the author.
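For clarity, the two signals named here can be computed as follows. This is a minimal sketch, assuming whitespace tokenisation and exact term matches; real engines apply many refinements on top, but the point stands that both measures treat words as positions in a pool, not as steps in an argument.

```python
def density(text, term):
    """Fraction of the document's tokens that are the given term."""
    words = text.lower().split()
    return words.count(term) / len(words)

def min_proximity(text, a, b):
    """Smallest distance, in token positions, between any occurrence of a and of b."""
    words = text.lower().split()
    pos_a = [i for i, w in enumerate(words) if w == a]
    pos_b = [i for i, w in enumerate(words) if w == b]
    return min(abs(i - j) for i in pos_a for j in pos_b)

# Invented sample text.
doc = "search engines index search terms"
print(density(doc, "search"))            # → 0.4
print(min_proximity(doc, "search", "terms"))  # → 1
```

Both numbers are cheap to compute at scale, which explains their popularity; neither says anything about whether "search" and "terms" stand in any logical relation.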
When a page is composed, the final outcome is a document in which an actual story is told: arguments are provided in a logical order, and each argument is related to its neighbours. Missing that observation dooms an algorithm to weakness, if not utter failure.