Re: How Search Engines SHOULD Be Managed

__/ [John A.] on Friday 14 October 2005 02:33 \__

>>* Blame ISPs for harbouring spammy traffic
>>* Blame Microsoft for unleashing a faulty O/S out of the box
>>* Blame Google for unintentionally giving incentive for Web spam
> SE spammers had incentive to spam long before Google, and they did
> plenty of it. Anyone else remember the TV commercial - I forget for
> which SE - where they had a bunch of old people ("same old links") call
> out their site name when a searcher calls out their search terms?
> There was one old guy in a leather harness calling out "Hot Leather
> Action!" or something like that, to which another replied "Oh, you
> come up for everything!"
> At the time it was mostly content and keyword tag spamming. Google's
> system of analyzing links was just the ticket to sift out the real
> stuff (which legit sites generally linked to) from the scum.
> The problem is they let it come out how they did it. (It was, of
> course, just a matter of time before the spammers figured it out, even
> if they hadn't leaked it via their patents.) Any criteria by which
> sites can be evaluated for relevancy & authority can be targeted if
> it's known. Google has, of course, refined their system, mostly
> plugging the holes in ways that seem to be aimed at forcing spam to be
> more obvious to the user. The holy grail is, of course, to get the
> criteria to the point that a page absolutely *has to be* relevant
> and/or authoritative to meet the criteria and where any relevant
> and/or authoritative page will meet it. That point may be approached,
> but short of some degree of AI, it will never actually be met, and
> probably not even then.

Interesting take. Some months ago I argued that, in order to avoid bias and
corruption, the following steps should at least be considered:

- Make a search engine a public service[1], much like the W3C's validation
services and ICANN/whois.net/relatives. The Web belongs to everyone in this
world and search -- the means by which data gets organised -- should be a
service. Likewise, an operating system should be nobody's property.
Hardware should, but not the platform upon which people communicate.
Conflicting interests lead to protocol breakage... (I am going endlessly
off topic, so I will stop)

- Have sites register in one form or another to state their aims and scope.
DMOZ goes some way towards that, but the whole Google-DMOZ-mozilla.com
(corporation) love affair is disturbing in my eyes.

- Use more proper methods for exploiting knowledge and information. Don't
tell me (Schmidt) how long it will take you to index all human knowledge
(300 years, he said - reference available on demand). Do the task
_properly_! See the URL at the bottom of my sig, as I truly believe search
engines are lagging behind what science (AI in particular) has to offer.

[1] Funding of crawling resources can be managed in the same way Google
does, e.g. paid listings in SERPs (not sponsored links in the actual
results), much like Yellow Pages where yellow/white tells apart ham from

I think there needs to be a strategic movement like GNU in order to release
ourselves from commercial search engines (and all-round public information
domination). The financial entry barrier is high though. See:

* http://iuron.com/documents/manifest/draft/node4.html

* http://www.google.com/intl/en/corporate/history.html#1998


All they (Brin, Page) needed was a little cash to move out of the dorm -- and
to pay off the credit cards they had maxed out buying a terabyte of memory.
So they wrote up a business plan, put their Ph.D. plans on hold, and went
looking for an angel investor. Their first visit was with a friend of a
faculty member.


Best Regards (happy to have heard your thoughts),


Roy S. Schestowitz
http://Schestowitz.com  |    SuSE Linux    |     PGP-Key: 74572E8E
  4:55am  up 49 days 17:09,  3 users,  load average: 0.79, 0.51, 0.55
      http://iuron.com - next generation of search paradigms

