
Re: Are search engines going to die?

  • Subject: Re: Are search engines going to die?
  • From: Roy Schestowitz <newsgroups@xxxxxxxxxxxxxxx>
  • Date: Fri, 04 Aug 2006 16:04:19 +0100
  • Newsgroups: alt.internet.search-engines
  • Organization: schestowitz.com / ISBE, Manchester University / ITS
  • References: <1154680381.303780.114180@m79g2000cwm.googlegroups.com> <4jgs2nF7udddU1@individual.net> <4jh1rkF81lroU1@individual.net>
  • Reply-to: newsgroups@xxxxxxxxxxxxxxx
  • User-agent: KNode/0.7.2
You make a bold statement in the subject line. Under most circumstances, one
would suspect that you are an alias of John Dvorak, a self-admitted troll
who has bashed everything from Apple to CSS.

__/ [ tonnie ] on Friday 04 August 2006 14:53 \__

> Brian Wakem schreef:
>> Zhiguo wrote:
>>> Recently I have been working on a web page analyzer, which divides the whole
>>> page into blocks according to all kinds of properties of elements, such
>>> as position, size, background color, etc.
>>> My partners will use this page analyzer to make our experimental search
>>> engine crawl and index the web on a semantic block basis rather than on
>>> a page basis.

That's quite interesting. I have noticed that the indexing patterns for large
sites are not quite what they used to be. Sites evolved to exploit weaknesses
of indexers and spiders, so search engines evolve in turn to cancel out the
effect of this subversion.
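
As a rough illustration of the kind of block-based segmentation Zhiguo
describes, here is a minimal Python sketch. It is not his algorithm: it
assumes the renderer has already produced a box (position, size) and a
background colour for each element, and it applies a hypothetical rule that
merges vertically adjacent elements sharing a background colour. The
`Element` record and the `max_gap` threshold are made up for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Element:
    # Hypothetical rendered-element record: box in pixels plus background colour.
    tag: str
    x: int
    y: int
    width: int
    height: int
    background: str


def segment_blocks(elements: List[Element], max_gap: int = 10) -> List[List[Element]]:
    """Group elements into blocks, top to bottom.

    An element joins the current block if it has the same background colour
    and starts no more than max_gap pixels below the block's bottom edge
    (overlapping elements also qualify); otherwise it starts a new block.
    """
    blocks: List[List[Element]] = []
    for el in sorted(elements, key=lambda e: (e.y, e.x)):
        if blocks:
            current = blocks[-1]
            bottom = max(e.y + e.height for e in current)
            if current[0].background == el.background and el.y - bottom <= max_gap:
                current.append(el)
                continue
        blocks.append([el])
    return blocks


if __name__ == "__main__":
    page = [
        Element("div", 0, 0, 800, 60, "#003366"),    # header
        Element("ul", 0, 65, 800, 30, "#003366"),    # nav bar just below the header
        Element("p", 0, 120, 600, 200, "#ffffff"),   # article text
        Element("p", 0, 325, 600, 150, "#ffffff"),   # more article text
        Element("div", 0, 500, 800, 40, "#cccccc"),  # footer
    ]
    for i, block in enumerate(segment_blocks(page)):
        print(i, [e.tag for e in block])
```

Real segmenters weigh many more cues (fonts, separators, DOM structure), but
the sketch shows why such an analyzer needs the rendered layout rather than
the raw source, and why script-driven changes to the tree escape it.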

>>> My problem is that the HTML renderer I am using, MSHTML, cannot
>>> handle scripts such as VBScript, JavaScript, etc. In fact, those
>>> scripts run in response to certain user actions and operate on the
>>> HTML tree. But my page analyzer can only analyze a static HTML tree with
>>> the help of MSHTML.
>>> In a word, the page analyzer doesn't understand scripts.
>>> I don't know whether Google's crawler understands the script elements in
>>> a page, but I guess not, because Google's AdSense cannot handle Ajax
>>> pages.

It depends on how they are built. Crawlers will not follow or execute scripts,
but a properly constructed menu, for example, remains navigable with scripts
disabled, even if it uses scripts.
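
To illustrate that point, here is a minimal sketch of how a crawler that does
not execute scripts sees a page. It uses Python's standard `html.parser` to
collect `href` values from `<a>` tags and never runs `<script>` contents, so
a link that only appears after `document.write` runs, or that lives in a
`javascript:` URL, is invisible to it. The sample markup and the rule of
skipping `javascript:` links are assumptions made for the example, not a
description of any particular search engine's crawler.

```python
from html.parser import HTMLParser
from typing import List


class LinkExtractor(HTMLParser):
    """Collects href targets from <a> tags; script bodies are only seen as text."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                # Skip pseudo-links that only work by executing JavaScript.
                if name == "href" and value and not value.lower().startswith("javascript:"):
                    self.links.append(value)


def crawlable_links(html: str) -> List[str]:
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


SAMPLE = """
<ul id="menu">
  <li><a href="/products.html" onclick="showMenu('products'); return false;">Products</a></li>
  <li><a href="javascript:showMenu('support')">Support</a></li>
</ul>
<script>
  document.write('<a href="/hidden.html">Hidden</a>');
</script>
"""

if __name__ == "__main__":
    print(crawlable_links(SAMPLE))
```

Only the first menu entry is reachable: it has a real `href` and merely
enhances it with an `onclick` handler, which is the "works with scripts
disabled" pattern. The second entry and the script-written link are lost.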

>>> The truth is that, as more and more pages on the Internet become
>>> interaction-rich, the communication between server and browser has
>>> changed.
>>> In the days of static pages, the server sent a source file to the
>>> client, and the crawler could pretend to be a normal customer.
>>> But with Web 2.0, the server sends replies that cannot be
>>> understood unless your browser is waiting for that specific kind of
>>> reply. The crawler cannot pretend any more.
>>> Either new kinds of crawlers come, or search engines disappear.
>>> Am I right?

I can't help thinking about sites like Reddit, Digg.com, Netscape.com, and
Newsvine. Basically, these are becoming so extensive that searching their
large link databases can yield some decent search results. That's where a
lot of power lies, assuming people don't spam the site (it needs moderation).
The results can be delivered in a variety of forms, too.
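
Searching such a link database can start as simply as an inverted index over
the submitted titles. The sketch below assumes each submission is just a URL
and a title, and ranks results by how many query words they match. The data,
the `example.com` URLs, and the ranking rule are hypothetical; sites like
Digg or Reddit also weigh votes and moderation, which this ignores.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def tokenize(text: str) -> List[str]:
    # Lowercase alphanumeric words only.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(links: List[Tuple[str, str]]) -> Dict[str, Set[int]]:
    """Map each word in a link title to the ids (list positions) of links containing it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for link_id, (_url, title) in enumerate(links):
        for word in tokenize(title):
            index[word].add(link_id)
    return index


def search(query: str, links: List[Tuple[str, str]], index: Dict[str, Set[int]]) -> List[str]:
    """Return URLs ordered by number of distinct query words matched (ties keep submission order)."""
    scores: Dict[int, int] = defaultdict(int)
    for word in set(tokenize(query)):
        for link_id in index.get(word, ()):
            scores[link_id] += 1
    ranked = sorted(scores, key=lambda i: (-scores[i], i))
    return [links[i][0] for i in ranked]


LINKS = [
    ("http://example.com/ajax-crawling", "Can search engines crawl Ajax pages?"),
    ("http://example.com/linux-tips", "Ten Linux tips"),
    ("http://example.com/seo", "Search engines and Web 2.0"),
]

if __name__ == "__main__":
    index = build_index(LINKS)
    print(search("search engines ajax", LINKS, index))
```

The index only ever sees titles that humans submitted and, ideally,
moderated, which is exactly where the value and the spam risk of such sites
come from.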

>> I think it is more likely that web2.0 will die as people realise their
>> pages are not being indexed.

*smile* I sure hope so. At present, such sites are doing quite well at
'sucking' traffic out of SERPs, merely because they contain a blurb. Blogs
likewise (assuming they don't fall under the 2.0 'umbrella').

> I would second that, although with a little alteration.
> People constantly search for information, due to the fact that our
> society is based upon obtaining, using, interpreting and distributing
> information.
> If web2.0 isn't accessible to search engine bots, it will die because
> people can't find the information they need.

The problem, in my opinion, is that Web 2.0 is largely braindead and badly
structured. I am not just thinking about MySpace, but also about
arbitrary comments and out-of-context, infamous 'user-generated content'.
Some of it is very organic, or even unoriginal/plagiarised. I read an
article yesterday which claims that all news sites and media will be
outsourced to India because it can appeal to news aggregators and be

> Due to the enormous amount of information that is out there, we need a
> website or program that is capable of categorising, indexing and
> delivering the information needed on demand. So a search engine, in
> whatever form, is necessary; web2.0 isn't.

I am still in favour of something more semantic that does not just index
words. Actually, if I may rave for a second, my proposal (residing at the
bottom of my .sig) has got Google's attention. They contacted me out of the
blue, and I will speak to them on the phone in less than an hour. *smile*

Best wishes,


Linux is like a girlfriend; try to stick to one distribution for a lifetime
http://Schestowitz.com  |  GNU is Not UNIX  |     PGP-Key: 0x74572E8E
roy      pts/4                         Fri Aug  4 10:08 - 10:23  (00:15)    
      http://iuron.com - proposing a non-profit search engine
