[ipac] Refining the robots.txt- was RE: Where is robots.txt on HIP?

David Pattern d.c.pattern at hud.ac.uk
Sat Apr 26 13:02:41 EDT 2008


As long as the default profile is the one you want to show to Google, then you need to link to the catalogue from your library site with a URL like this (so that Google finds it):
http://ipac.wrhs.org:8080
 
In your robots.txt file, you'll need:
User-agent: *
Disallow: /ipac20/
Disallow: /hipres/

The HIP link above should return the front page of the default profile directly (rather than a 302 redirect header), so the search engines should be able to retrieve the first HIP page but nothing else, as all other links have paths starting with either "/ipac20/" or "/hipres/".
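
One way to sanity-check those two Disallow rules before relying on them is to run them through a robots.txt parser. The sketch below uses Python's standard urllib.robotparser module. The sample paths (/ipac20/ipac.jsp, /hipres/css/ipac.css) and the "Googlebot" agent string are only illustrative assumptions, and real crawlers may interpret rules a little differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The rules suggested in this message.
ROBOTS_TXT = """\
User-agent: *
Disallow: /ipac20/
Disallow: /hipres/
"""

# Catalogue address from the message above.
BASE = "http://ipac.wrhs.org:8080"


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network access needed)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)


def is_allowed(path: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch BASE + path under the rules above."""
    return parser.can_fetch(agent, BASE + path)


if __name__ == "__main__":
    # Hypothetical paths, chosen only to exercise each rule.
    for path in ["/", "/ipac20/ipac.jsp", "/hipres/css/ipac.css"]:
        verdict = "allowed" if is_allowed(path) else "blocked"
        print(f"{path}: {verdict}")
```

If the front page comes back as allowed and the other two as blocked, the file is doing what is described above. Note that this only checks the rules themselves; it does not confirm that the server returns the front page without a redirect.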
 
Hope that helps and also hope that works!
Dave
 
 

________________________________

From: ipac-bounces at lists.tblc.org on behalf of Brandon Walker
Sent: Sat 26/04/2008 16:26
To: Dynix's Horizon Information Portal, formerly iPac (discussion)
Subject: [ipac] Refining the robots.txt- was RE: Where is robots.txt on HIP?



Sorry, I read back two days and found the answer was already discussed.

 

So here's a much more specific question. I want Google, Yahoo, MSN Live, etc. to be aware of our catalog front page, but I don't want them crawling any deeper into the site, and I'd sure as heck like them to visit less often than they do now. Is there any way to control this via the robots.txt file? I can't fathom how this would work, and I'm not even certain it can.

 

Brandon Walker
bwalker at wrhs.org
Library Technical Aide
Western Reserve Historical Society
http://www.wrhs.org
(216) 721-5722 ext. 271

 

________________________________

From: ipac-bounces at lists.tblc.org [mailto:ipac-bounces at lists.tblc.org] On Behalf Of Brandon Walker
Sent: Saturday, April 26, 2008 10:23 AM
To: Dynix's Horizon Information Portal, formerly iPac (discussion)
Subject: [ipac] Where is robots.txt on HIP?

 

Where can I find the robots.txt file for HIP 3.0.8? I'd like to check it and make sure it's updated so that I can try to eliminate our second-highest visitor, MSN Live Search.

 

Should I put additional copies of the robots.txt file in other locations? What's the most bot-restrictive configuration for this file?

 

Brandon Walker
bwalker at wrhs.org
Library Technical Aide
Western Reserve Historical Society
http://www.wrhs.org
(216) 721-5722 ext. 271

 


 




