[ipac] Refining the robots.txt

Vern Mastel v.mastel at mail.infolynx.org
Mon Apr 28 08:54:17 EDT 2008


At 06:02 PM 4/26/2008 +0100, you wrote:
>
>As long as the default profile is the one you want to show to Google, then 
>you need to use a link like this on your library site (so that Google 
>finds it):
>http://ipac.wrhs.org:8080
>
>In your robots.txt file, you'll need:
>User-agent: *
>Disallow: /ipac20/
>Disallow: /hipres/
>The HIP link above should return the front page of the default profile 
>(rather than a 302 redirect header), so the search engines should be able 
>to retrieve the first HIP page but nothing else (as all other links will 
>either start with "ipac20" or "hipres").
>
>Hope that helps and also hope that works!
>Dave
>
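For anyone who wants to check what Dave's rules actually permit, here is a minimal sketch using Python's standard-library urllib.robotparser. It parses the three rules quoted above and asks whether a crawler may fetch a given URL. The /ipac20/ and /hipres/ paths in the example are made up, and the sketch only models a crawler that honours robots.txt. It says nothing about crawlers that ignore the file.

```python
from urllib.robotparser import RobotFileParser

# The rules from Dave's message, verbatim.
ROBOTS_TXT = """\
User-agent: *
Disallow: /ipac20/
Disallow: /hipres/
"""


def make_parser(text: str) -> RobotFileParser:
    """Build a parser from robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def is_crawlable(parser: RobotFileParser, url: str, agent: str = "Googlebot") -> bool:
    """True if a crawler that honours robots.txt may fetch this URL."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    rp = make_parser(ROBOTS_TXT)
    # The /ipac20/ and /hipres/ paths below are hypothetical examples.
    for url in (
        "http://ipac.wrhs.org:8080",
        "http://ipac.wrhs.org:8080/ipac20/ipac.jsp?profile=example",
        "http://ipac.wrhs.org:8080/hipres/images/logo.gif",
    ):
        print(url, "->", "allowed" if is_crawlable(rp, url) else "disallowed")
```

The front page is allowed and everything under /ipac20/ or /hipres/ is not. This is also why the front page needs to be served directly rather than via a 302: if the redirect target sits under /ipac20/, a compliant crawler would stop there.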


This will not work because the crawlers will ignore it.

"Disallow" is merely a red flag marking where the good stuff is.

Google claims that they will call off the dogs if you fill out the 
appropriate paperwork to get a site removed from their list of victims, er, 
ah, indexable sites.

However, the fine print says that this is only good for 6 months max, and 
then they will be baaaaaccckkkk.

I am continuing to experiment with mod_security rules. I can already deny 
access to any host that requests the robots file; next I need to refine 
control to deny by the host itself.
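The "trap the robots.txt requester" idea can be sketched in a few lines. This is a rough Python analogue of the logic, not a mod_security rule: once a client address asks for /robots.txt, that request and every later one from the same address is refused. The RobotsTrap class and its allow() method are hypothetical names, the blocklist lives only in memory, and it is keyed on the client address. Matching on the host name itself, the refinement mentioned above, is not shown.

```python
class RobotsTrap:
    """Hypothetical helper: refuse all requests from an address once it has
    asked for the trap path (robots.txt by default). In-memory only."""

    def __init__(self, trap_path: str = "/robots.txt") -> None:
        self.trap_path = trap_path
        self.flagged = set()  # client addresses that requested the trap path

    def allow(self, client_addr: str, path: str) -> bool:
        """Return True to serve the request, False to deny it."""
        if path == self.trap_path:
            # Remember this address; the robots.txt request itself is denied too.
            self.flagged.add(client_addr)
        return client_addr not in self.flagged


if __name__ == "__main__":
    trap = RobotsTrap()
    # 192.0.2.x and 198.51.100.x are documentation addresses, not real crawlers.
    print(trap.allow("192.0.2.10", "/ipac20/ipac.jsp"))  # served
    print(trap.allow("192.0.2.10", "/robots.txt"))       # denied, address flagged
    print(trap.allow("192.0.2.10", "/"))                 # denied from now on
    print(trap.allow("198.51.100.7", "/"))               # other clients unaffected
```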

Bwahhahahahahaha!


Vern Mastel
Technology Coordinator, Bismarck Veterans Memorial Public Library
Desk Phone 701-355-1499 Cell Phone 701-426-5897



