[Offtopic] Google webmaster assistance

Sun Jul 29 20:30:41 EST 2007

Here are two Google pages. The first is from the 'Official Google Blog'
and the other is from the Google 'Webmaster Help Center':

Robots Exclusion Protocol: (REP)

7/27/2007 09:25:00 AM  Posted by Dan Crow, Product Manager

<http://googleblog.blogspot.com/2007/07/robots-exclusion-protocol-now-with-even.html>

This is the third and last in my series of blog posts about the Robots Exclusion Protocol (REP). 

In the first post, I introduced robots.txt and the robots META tags, giving an overview of when to use them.

In the second post, I shared some examples of what you can do with the REP. Today, I'll introduce new features that we have recently added to the protocol.

As a Google product manager, I'm always talking to content providers to learn about your needs for REP. 

We are constantly looking for ways to improve the control you have over how your content is indexed. These new features will give you flexible and convenient ways to improve the detailed control you have with Google.

Tell us if a page is going to expire

Sometimes you know in advance that a page is going to expire in the future. Maybe you have a temporary page that will be removed at the end of the month. Perhaps some pages are available free for a week, but after that you put them into an archive that users pay to access. In these cases, you want the page to show in Google search results until it expires, then have it removed: you don't want users getting frustrated when they find a page in the results but can't access it on your site.

We have introduced a new META tag that allows you to tell us when a page should be removed from the main Google web search results: the aptly named unavailable_after tag. It follows a similar syntax to other REP META tags. For example, to specify that an HTML page should be removed from the search results after 3pm Eastern Standard Time on 25th August 2007, simply add the following tag to the <HEAD> section of the page:

<META NAME="GOOGLEBOT" CONTENT="unavailable_after: 25-Aug-2007 15:00:00 EST">

The date and time is specified in the RFC 850 format.

This information is treated as a removal request: it will take about a day after the removal date passes for the page to disappear from the search results.

After the removal, the page stops showing in Google search results but it is not removed from our system. If you need a page to be excised from our systems completely, including any internal copies we might have, you should use the existing URL removal tool which you can read about on our Webmaster Central blog. <snip>
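For anyone generating these tags from a script, here is a minimal Python sketch that builds the unavailable_after tag in the same shape as the example above. The function name and the tz_abbrev parameter are my own inventions for illustration, not anything Google provides, and the timezone abbreviation is passed through as plain text rather than computed from the datetime.

```python
from datetime import datetime

# English month abbreviations, spelled out so the result does not depend on locale.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def unavailable_after_tag(when: datetime, tz_abbrev: str = "EST") -> str:
    """Return a GOOGLEBOT META tag mirroring the blog post's example format.

    Hypothetical helper: 'when' is treated as already being in the zone
    named by 'tz_abbrev'; no timezone conversion is done here.
    """
    stamp = "{:02d}-{}-{} {:%H:%M:%S} {}".format(
        when.day, MONTHS[when.month - 1], when.year, when, tz_abbrev
    )
    return '<META NAME="GOOGLEBOT" CONTENT="unavailable_after: {}">'.format(stamp)


if __name__ == "__main__":
    # Reproduces the tag quoted in the blog post.
    print(unavailable_after_tag(datetime(2007, 8, 25, 15, 0, 0)))
```

The output would then be pasted into (or templated into) the page's head section alongside any other META tags.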

--

Webmaster Help Center 

How do I block Googlebot?   

<http://www.google.com/support/webmasters/bin/answer.py?answer=40364>

Blocking Googlebot

Google uses several user-agents. You can block access to any of them by including the bot name on the User-Agent line of a robots.txt entry. Blocking Googlebot blocks all bots that begin with "Googlebot".

* Googlebot: crawls pages for our web index and our news index 

* Googlebot-Mobile: crawls pages for our mobile index 

* Googlebot-Image: crawls pages for our image index 

* Mediapartners-Google: crawls pages to determine AdSense content. We only use this bot to crawl your site if you show AdSense ads on your site. 

* Adsbot-Google: crawls pages to measure AdWords landing page quality. We only use this bot if you use Google AdWords to advertise your site. Find out more about this bot and how to block it from portions of your site. 

For instance, to block Googlebot entirely, you can use the following syntax:

User-agent: Googlebot
Disallow: /
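
To check locally what a rule like this does, here is a rough sketch using Python's standard urllib.robotparser module. The is_allowed helper is a made-up name for illustration. Python's parser matches agent names by substring, which happens to agree with the "begins with Googlebot" behaviour described above, but it is only an approximation of how Google's own crawlers read robots.txt.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt entry from the help page, one directive per line.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, path: str = "/") -> bool:
    """Hypothetical helper: parse robots.txt text and ask whether
    'user_agent' may fetch 'path' according to Python's parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for agent in ("Googlebot", "Googlebot-Image", "Googlebot-Mobile",
                  "Mediapartners-Google", "Adsbot-Google"):
        print(agent, "allowed" if is_allowed(ROBOTS_TXT, agent) else "blocked")
```

Under this parser the three Googlebot variants come out blocked, while Mediapartners-Google and Adsbot-Google are unaffected because no entry names them.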
--

Cheers, people
Stephen Loosley
Victoria, Australia