[ipac] Refining the robots.txt
Vern Mastel
v.mastel at mail.infolynx.org
Mon Apr 28 08:54:17 EDT 2008
At 06:02 PM 4/26/2008 +0100, you wrote:
>
>As long as the default profile is the one you want to show to Google, then
>you need to use a link like this on your library site (so that Google
>finds it):
>http://ipac.wrhs.org:8080
>
>In your robots.txt file, you'll need:
>User-agent: *
>Disallow: /ipac20/
>Disallow: /hipres/
>The HIP link above should return the front page of the default profile
>(rather than a 302 redirect header), so the search engines should be able
>to retrieve the first HIP page but nothing else (as all other links will
>either start with "ipac20" or "hipres").
>
>Hope that helps and also hope that works!
>Dave
>
This will not work, because the Disallow rules will simply be ignored.
"Disallow" is merely a red flag marking where the good stuff is.
Google claims that they will call off the dogs if you fill out the
appropriate paperwork to get a site removed from their list of victims, er,
ah, indexable sites.
However, the fine print says that a removal is only good for six months at
most, and then they will be baaaaaccckkkk.
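For reference, here is a minimal Python sketch of how a crawler that *chooses* to honour the quoted rules would read them. It uses the standard-library `urllib.robotparser`, which is my choice for illustration and not anything HIP or Google is known to use. The deep-link paths and the "Googlebot"/"AnyBot" user-agent strings are hypothetical examples. Note that `can_fetch()` only answers the question; nothing forces a crawler to ask it, which is exactly the problem described above.

```python
from urllib.robotparser import RobotFileParser

# The rules from the quoted message, one robots.txt line per list item.
RULES = [
    "User-agent: *",
    "Disallow: /ipac20/",
    "Disallow: /hipres/",
]

parser = RobotFileParser()
parser.parse(RULES)  # parse lines directly instead of fetching robots.txt

# A compliant crawler may fetch the HIP front page...
print(parser.can_fetch("Googlebot", "http://ipac.wrhs.org:8080/"))  # True
# ...but should skip anything under /ipac20/ or /hipres/ (hypothetical paths).
print(parser.can_fetch("Googlebot", "http://ipac.wrhs.org:8080/ipac20/ipac.jsp"))  # False
print(parser.can_fetch("AnyBot", "http://ipac.wrhs.org:8080/hipres/images/logo.gif"))  # False
```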
I am continuing to experiment with mod_security rules. I can already deny
access to any host that requests the robots file; now I need to refine the
control so that I can deny by the host itself.
Bwahhahahahahaha!
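The mod_security rules themselves are not shown here. As a rough Python analogue of the same idea, assuming the catalogue sits behind a WSGI application, the hypothetical `RobotsTrap` middleware below remembers the client address (`REMOTE_ADDR`) of anyone who requests `/robots.txt` and answers 403 to that request and to every later request from the same address. `hello_app` is a hypothetical stand-in for the real site. This is one reading of "deny by the host", not mod_security syntax.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical sketch: NOT mod_security syntax, just a Python (WSGI)
# analogue of "deny any host that has asked for robots.txt".

StartResponse = Callable[[str, List[Tuple[str, str]]], object]
WSGIApp = Callable[[dict, StartResponse], Iterable[bytes]]


class RobotsTrap:
    """Wrap a WSGI app and refuse clients that have requested /robots.txt."""

    def __init__(self, app: WSGIApp) -> None:
        self.app = app
        # Client addresses seen requesting robots.txt; kept in memory only.
        self.flagged: set = set()

    def __call__(self, environ: dict, start_response: StartResponse) -> Iterable[bytes]:
        addr = environ.get("REMOTE_ADDR", "")
        if environ.get("PATH_INFO", "") == "/robots.txt":
            # Remember this client; the check below then denies this request too.
            self.flagged.add(addr)
        if addr in self.flagged:
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Everyone else passes through to the real application.
        return self.app(environ, start_response)


def hello_app(environ: dict, start_response: StartResponse) -> Iterable[bytes]:
    """Stand-in for the real catalogue application (hypothetical)."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"catalogue page\n"]


application = RobotsTrap(hello_app)
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, application).serve_forever()` will serve it. The flagged list lives in memory and is lost on restart, and blocking by address will probably misfire for crawlers that rotate addresses or share one behind a proxy.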
Vern Mastel
Technology Coordinator, Bismarck Veterans Memorial Public Library
Desk Phone 701-355-1499 Cell Phone 701-426-5897