__/ [ Paul Zhao ] on Friday 17 March 2006 15:44 \__
> Hi, I was recently asked the question from a Washington Post Editor,
> why isn't
Ask/tell him something in return. For example:
* I suspect that some content in the Washington Post (not just the NY Times)
requires registration in order to be read. This must be the price one
pays for trying to alter the way the Web works, often for one's own
convenience (at the expense of the users and even crawlers who get
* Why is the URL so long? What CMS is used? Why can't the URL be as
meaningful and yet short, /a la/ BBC?
* Bloggers tend to copy entire items from mainstream media. Do a full-text
search of a popular article to see what I mean. It is possible that this
article was excluded due to suspicion of plagiarism (or duplicate content),
which is detected automatically.
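
To illustrate what automatic duplicate detection could look like, here is a
minimal sketch based on word shingles and Jaccard similarity. This is a
textbook technique, not a description of what Google actually does; the
function names, the shingle size k=5 and the 0.5 threshold are all my own
assumptions.

```python
import re


def shingles(text, k=5):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Text shorter than one window: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def looks_duplicated(original, candidate, k=5, threshold=0.5):
    """Hypothetical check: True if the two texts share most of their shingles."""
    return jaccard(shingles(original, k), shingles(candidate, k)) >= threshold
```

A blog post that copies an article wholesale and adds a line of commentary
would share nearly all of its shingles with the original, whereas two
unrelated articles would share almost none.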
> I understand all the "Google Search" rules, "it's not indexed cuz it's
> in deep subdirectories and it's gotta dynamic URL, the navigation is
> deep more than 4 clicks...".
No. A good rank would let the spider crawl deep enough and index the
pages without any reluctance.
> But does anyone know anything about the Google News bot? I understand
> it's a diff both from the normal Googlebot. Does Google just have a
> list of 1000 or so "news sites", and that newsbot consistantly run
> through it over and over several times an hour? And does it have a
> diff database from Google Search, as in a specific page on
> washingtonpost.com could be indexed by Google Search but not Google
> News, or vise-versa?
Google News aggregates certain sites that are considered to be news sites and
pass some test. If you want only news sites, use Yahoo News, which in
my opinion is 'cleaner'. More recently, I have even found job offers and shops
among Google News items.
As a side note, I believe that Google News visits sites and polls for updates
of actual pages rather than relying on XML/RSS feeds. The reasons must be
historical as sites are still in the process of making their content more
> And does anyone know any "search commands" for Google News?
> site:washingtonpost.com works for Google News, but looking inside a
> directory (site:washingtonpost.com/wp-dyn) doesn't work for some
I suggest you use a toolbar to get results from news.google.com. I don't
believe a 'magic' command or reserved keyword exists. Google News, by the
way, is the only content from Google which one can syndicate (apart from the
Google Blog, of course). This has been the case for about 6 months, possibly
initiated in order to keep up with Yahoo.
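
If you want to work with a syndicated feed yourself, the following is a rough
sketch of pulling headlines and links out of an RSS 2.0 document using
Python's standard library. The sample feed and the example.com URLs are made
up for illustration; I am not assuming anything about the exact address or
layout of the Google News feeds.

```python
import xml.etree.ElementTree as ET

# Made-up RSS 2.0 sample; a real feed would be fetched over HTTP instead.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example news feed</title>
    <item>
      <title>First headline</title>
      <link>http://example.com/news/1</link>
    </item>
    <item>
      <title>Second headline</title>
      <link>http://example.com/news/2</link>
    </item>
  </channel>
</rss>"""


def feed_items(xml_text):
    """Return a list of (title, link) pairs, one per <item> in the feed."""
    root = ET.fromstring(xml_text)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


for title, link in feed_items(SAMPLE_FEED):
    print(title, link)
```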
> Any information on anything about Google News is greatly appreciated.
I can offer some of my own (if I may):
Google News Feeds
Hope it helps,