__/ [John Bokma] on Thursday 08 September 2005 12:48 \__
> davidof <david.george@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>> Roy Schestowitz wrote:
>>> It actually raises an interesting point. Given certain URL
>>> structures, SEs can not only predict if pages will get generated
> read: they use several heuristics. With mod_rewrite you can decide what
> you want it to be :-)
Yes, indeed. However, many people are lazy or indifferent. *smile* Look at
the WordPress community, for example. Most users just deploy it as it comes
'out of the box', even with the faults I have complained about in the
development community (most recently PageRank leakage). Imagine
CMS/blogging software that has a mechanism in place for hiding its
identity... soon to come?
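
To make the URL-masking idea concrete, here is a minimal Python sketch of
what a rewrite rule does in principle: it maps a clean public path onto
the internal script URL, so the spider never sees the query-string form.
The path layout, the parameter names and the rewrite() helper are all
assumptions for illustration; this is not mod_rewrite syntax and not how
any particular CMS does it.

```python
import re

# Hypothetical rule: public "clean" path -> internal script URL.
# Both the pattern and the target layout are illustrative assumptions.
CLEAN_URL = re.compile(r"^/archives/(\d{4})/(\d{2})/([a-z0-9-]+)/?$")


def rewrite(path: str) -> str:
    """Map a clean public path to an internal query-string form."""
    match = CLEAN_URL.match(path)
    if match is None:
        return path  # anything that does not match is left untouched
    year, month, slug = match.groups()
    return f"/index.php?year={year}&month={month}&name={slug}"


if __name__ == "__main__":
    print(rewrite("/archives/2005/09/pagerank-leakage/"))
    print(rewrite("/about.html"))
```

Since the public side of the mapping is arbitrary, the same internal
script could just as well sit behind paths that imitate another
package's URL scheme.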
>>> but also how they get generated.
> Also easy to fake. Note that everything a spider gets, except the domain
> name, can be created as you want it, since all this information is encoded
> in headers your server / program generates.
I am fairly sure that domain-related information is already used in the
algorithms. You can sometimes use it for rough guesses. For example,
certain hosts are blog magnets whereas others, on principle, are not.
Residing on a server which belongs to a well-reputed host still merits a
>>> For example, it should
>>> not be hard to spot a blog and tell it apart from a commercial CMS, a
>>> free (Open Source) CMS and a DIY CMS. It can give the SE some
>>> indication of reliability of information within.
>> I would question whether it is that simple, except in common cases
> And if this were to happen, mod_rewrite is your friend :-) Moreover, it
> would make it possible to make a DIY homebrew CMS look like a reputable
> well-known CMS and vice versa.
True, but it requires some work and/or skills. Statistically speaking,
URL structure still gives you the ability to weed out most blogs from
search results, since few site owners bother to disguise it.
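
As a rough illustration of the statistical angle, the following Python
sketch guesses a site's platform from the shape of its URLs. The
fingerprint patterns, the labels and the 0.5 threshold are assumptions
picked for the example; a real search engine would presumably combine
many more signals, and anyone rewriting their URLs would slip through.

```python
import re
from collections import Counter

# Hypothetical URL fingerprints, chosen only for illustration.
FINGERPRINTS = {
    "blog": [r"/\d{4}/\d{2}/", r"[?&]p=\d+", r"/wp-content/", r"/trackback/?$"],
    "cms": [r"[?&]option=com_", r"/node/\d+", r"/modules\.php"],
}


def guess_platform(urls, threshold=0.5):
    """Return the label whose patterns match at least `threshold` of the URLs."""
    votes = Counter()
    for url in urls:
        for label, patterns in FINGERPRINTS.items():
            if any(re.search(p, url) for p in patterns):
                votes[label] += 1
    if not votes:
        return "unknown"  # also covers an empty URL list
    label, count = votes.most_common(1)[0]
    return label if count / len(urls) >= threshold else "unknown"


if __name__ == "__main__":
    sample = [
        "http://example.com/2005/09/hello/",
        "http://example.com/?p=42",
        "http://example.com/about.html",
    ]
    print(guess_platform(sample))
```

The threshold is what makes this a statistical judgement rather than a
certainty: a single blog-like URL on an otherwise plain site is not
enough to label it.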
Roy S. Schestowitz | "Slashdot is standard-compliant... in Japan"
http://Schestowitz.com | SuSE Linux | PGP-Key: 74572E8E