The ShopWiki Crawler
ShopWiki finds products using Web crawlers, much as other search engines do. Before crawling, we check a Web site's domain for robots.txt files, which tell our crawlers which files they may fetch. Every Web site can use these files to declare parts of its domain off-limits to specific robot user agents. ShopWiki respects and obeys all robots.txt files.
Please note that we only update our copy of these files periodically. If you have recently blocked us from crawling or given us access to crawl your site, the results will not be immediate. For answers to any other questions or concerns, please email us at crawler@shopwiki.com.
robots.txt
Web administrators can use the information below to update their site's robots.txt file.
Our current User-Agent string is:
ShopWiki/1.0 ( +http://www.shopwiki.com/wiki/Help:Bot )
If you would like us to not crawl your site, please add this to your robots.txt:
User-agent: ShopWiki
Disallow: /
If you feel that we are crawling too fast, please add this to your robots.txt:
User-agent: ShopWiki
Crawl-Delay: 5
This slows our crawl to at most one page every 5 seconds.
If you would like to explicitly allow ShopWiki's crawlers on your site, please add this to your robots.txt:
User-agent: ShopWiki
Allow: /
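Before publishing a change, you may want to check how a rule set is interpreted. The sketch below uses Python's standard urllib.robotparser module as a stand-in; it is not the ShopWiki crawler, so its matching may differ from ours in edge cases. The helper name parse_rules and the path /products/1 are made up for this example.

```python
from urllib.robotparser import RobotFileParser

# User-Agent string quoted earlier on this page.
SHOPWIKI_UA = "ShopWiki/1.0 ( +http://www.shopwiki.com/wiki/Help:Bot )"


def parse_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# The three rule sets shown above, one directive per line.
blocked = parse_rules("User-agent: ShopWiki\nDisallow: /\n")
slowed = parse_rules("User-agent: ShopWiki\nCrawl-Delay: 5\n")
allowed = parse_rules("User-agent: ShopWiki\nAllow: /\n")

print(blocked.can_fetch(SHOPWIKI_UA, "/products/1"))     # False: ShopWiki is disallowed
print(blocked.can_fetch("OtherBot/2.0", "/products/1"))  # True: the rule names only ShopWiki
print(slowed.crawl_delay(SHOPWIKI_UA))                   # 5 (seconds between requests)
print(allowed.can_fetch(SHOPWIKI_UA, "/products/1"))     # True
```

Note that each directive sits on its own line; a robots.txt file with "User-agent" and "Disallow" on the same line will generally not be read as intended.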
For more information on robots.txt files, see robotstxt.org or this tutorial.
META Robots
Another way to control what robots can access is the META robots directive.
Add the following tag to a page to tell us not to index it or follow the links on it.
<meta name="robots" content="noindex,nofollow">
You can also add the tag conditionally. To let ShopWiki, and no other robot, index your site, emit the tag for every user agent except ours, along these lines:
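JSP
<%
// Emit the tag for every client except ShopWiki.
// getHeader returns null when the User-Agent header is missing.
String userAgent = request.getHeader("User-Agent");
if ( userAgent == null || userAgent.indexOf("ShopWiki") < 0 ) {
    out.print("<meta name='robots' content='noindex,nofollow'>");
}
%>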
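ASP/C#
<%
// Emit the tag for every client except ShopWiki.
// Request.UserAgent is null when the User-Agent header is missing.
string userAgent = Request.UserAgent;
if ( userAgent == null || userAgent.IndexOf("ShopWiki") < 0 ) {
    Response.Write("<meta name='robots' content='noindex,nofollow'>");
}
%>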
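The same check can be sketched for other server stacks. The Python version below is only an illustration: the function name robots_meta_for and the constant ROBOTS_META are made up for this example, and how you obtain the User-Agent header depends on your framework. It uses the standard content attribute for the directive list.

```python
from typing import Optional

# Tag sent to every client except ShopWiki.
ROBOTS_META = '<meta name="robots" content="noindex,nofollow">'


def robots_meta_for(user_agent: Optional[str]) -> str:
    """Return the robots META tag, or an empty string for ShopWiki."""
    # A missing User-Agent header is treated as "not ShopWiki".
    if user_agent is not None and "ShopWiki" in user_agent:
        return ""
    return ROBOTS_META


print(robots_meta_for("ShopWiki/1.0 ( +http://www.shopwiki.com/wiki/Help:Bot )"))
print(robots_meta_for("Mozilla/5.0"))
```

The first call prints an empty line, because ShopWiki receives no tag; the second prints the tag that every other client would see in the page head.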