__/ [ hug ] on Thursday 27 April 2006 15:05 \__
> Roy Schestowitz <newsgroups@xxxxxxxxxxxxxxx> wrote:
>>__/ [ hug ] on Thursday 27 April 2006 14:10 \__
>>> Given a site where all pages have dynamic content, how does one
>>> prevent Google from deciding that the same content with different
>>> session IDs is duplicate?
>>Setting aside the issue associated with incoming links (try to prevent
>>in-line session IDs), why not use XML site maps? You only mentioned
>>Google. The various guidelines for Webmasters would probably be more helpful.
>>Speaking of XML site maps, Matt Cutts announced today/last night that
>>through XML site maps, Webmasters can now get notification of penalties:
> Hi, Roy. I had thought about a site map. The one I'm currently using
> is automatically generated, and it shouldn't be that tough to generate
> an XML version (presumably; I know nada about XML). The Google
> webmaster page says a robots.txt should be used to keep Google from
> looking at anything with a session ID. I'm not quite sure how to
> address that; it seems conflicting, since all my pages are dynamic. I
> guess I should specify that nothing with a .php extension should be
> looked at. Does that make sense?
Borek recently reminded me that Google will honour wildcards in robots.txt.
As far as I can tell, this was never specified in the accepted
protocol/standard. In fact, wildcards in robots.txt are officially
discouraged, unless you count incomplete directory levels as wildcards
(e.g. "Disallow: /projects" also excludes "/projects/index.htm" and
everything else under that path). Google are said to have made an
exception. This means that you can specify a pattern which matches URLs
with session IDs, and _only_ those URLs. How you do it will depend on
your site/CMS.
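As a sketch, assuming the session ID appears as a PHPSESSID query
parameter (the usual default name in PHP; substitute whatever your
CMS actually emits), a Google-specific rule might look like:

```text
User-agent: Googlebot
# Googlebot's extension: "*" matches any run of characters.
# This blocks only URLs whose query string carries a session ID,
# leaving the same pages without the ID crawlable.
Disallow: /*PHPSESSID=
```

Bear in mind this relies on Google's extension to the protocol; crawlers
that follow the original standard will ignore the wildcard. On the PHP
side, setting session.use_only_cookies = 1 (and session.use_trans_sid = 0)
in php.ini stops PHP from embedding session IDs in URLs in the first
place, which sidesteps the problem for cookie-capable crawlers like
Googlebot.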
Roy S. Schestowitz | "Did anyone see my lost carrier?"
http://Schestowitz.com | SuSE Linux ¦ PGP-Key: 0x74572E8E
4:00pm up 5 days 1:11, 9 users, load average: 1.55, 0.95, 0.70
http://iuron.com - help build a non-profit search engine