Don’t Hide From Google
Here’s a robots.txt file from a company whose software I’m currently evaluating:
User-agent: *
Disallow: /cgi/
Disallow: /cgi-bin/
Disallow: /mantis/
Disallow: /forum/
Disallow: /stats/
Disallow: /synk/unreg.html
Disallow: /synk/de/unreg.html
Disallow: /synk/fr/unreg.html
Disallow: /synk/it/unreg.html
Disallow: /synk/email.psn
Disallow: /synk/help/
This is from a small company whose main product is experiencing solid growth. In fact, they are growing so fast, they are having trouble responding to support e-mails and are consequently requesting that users check the FAQ list and read the forums before sending them e-mail. Keeping that in mind, can you tell what’s wrong with this robots.txt?
If it’s not obvious, consider this: the FAQ list they’re asking me to read and the forums they want me to consult are both in /forum/. See it now?
The problem is that they have blocked off Google (and other polite search engines) from finding their support information. As it turns out, I had two problems with this product today, both of which gave me confusing, unhelpful, and yet fairly unique error messages. Consequently, I did what any techie would do with such a message: I put quotes around it, typed it into Google, added the product name, and scanned the results.
Google’s very good at finding error messages (you’re never the first person to have any given problem), but this time it popped up zero hits. OK. I figured I did something really weird, or had a strange configuration. Maybe I was the first person to see this problem, so I sent off a detailed report to support@company.com. Then I started browsing their web site, and noticed their pleas to check the FAQ list and the forums. The FAQ list didn’t answer the question, but the forums did.
Then I had another problem with the same product. Again Google didn’t help, but the forums did. (This turned out to be a bug with no workaround that may be fixed in the next point release.) Why didn’t I find the answer I was looking for with Google? Because this company had deliberately blocked Google from their web site using robots.txt. That’s not business smart or user friendly. Now they’ve got two e-mails from me to read and process that they wouldn’t have had to deal with if Google could search their forums.
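To make the effect concrete, here is a minimal sketch that feeds the rules above to Python's standard `urllib.robotparser` and asks whether a crawler may fetch a forum page. The forum path `/forum/viewtopic.php` and the helper names are hypothetical, and the check only reflects how this parser interprets the file; a real crawler may differ in details.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt quoted at the top of this post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi/
Disallow: /cgi-bin/
Disallow: /mantis/
Disallow: /forum/
Disallow: /stats/
Disallow: /synk/unreg.html
Disallow: /synk/de/unreg.html
Disallow: /synk/fr/unreg.html
Disallow: /synk/it/unreg.html
Disallow: /synk/email.psn
Disallow: /synk/help/
"""

# Hypothetical forum page; the real forum URLs are not given in the post.
FORUM_PATH = "/forum/viewtopic.php"


def can_crawl(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if the given rules let user_agent fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


def drop_rule(robots_txt: str, rule: str) -> str:
    """Return the rules with one exact line removed."""
    kept = [line for line in robots_txt.splitlines() if line.strip() != rule]
    return "\n".join(kept) + "\n"


# With the file as published, the forum is off limits to polite crawlers.
forum_blocked = not can_crawl(ROBOTS_TXT, "Googlebot", FORUM_PATH)

# Removing the single /forum/ line opens it up and leaves the rest alone.
relaxed = drop_rule(ROBOTS_TXT, "Disallow: /forum/")
forum_allowed = can_crawl(relaxed, "Googlebot", FORUM_PATH)
stats_still_blocked = not can_crawl(relaxed, "Googlebot", "/stats/")

if __name__ == "__main__":
    print("forum blocked as published:", forum_blocked)
    print("forum allowed without the /forum/ rule:", forum_allowed)
    print("stats still blocked:", stats_still_blocked)
```

Dropping one line is enough here because every other rule targets a different directory; whether `/synk/help/` should stay blocked too is a separate question this sketch does not try to answer.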
Folks, to save money on support, you have to give your users the best self-help tools you can; and when it comes to search, Google’s better than anything you’ll come up with on your own. Google wants to be your friend. Let it.
November 18th, 2006 at 2:48 pm
So many times I’ve run into this kind of thing. It boggles the mind. Another problem is support forums that use dynamic URLs that don’t index very well. I bought a Linksys Network Storage Link and it was painful to use their support forum, but I wasn’t getting the results from Google. (Maybe they block bots also, but judging by the URLs, Google wouldn’t get very much from them in any case.)
Even Microsoft’s Knowledge Base results show up in Google, so they seem to get it. (Or used to–haven’t searched for MS/Windows error messages lately.)
November 21st, 2006 at 5:58 pm
Have you considered that the reason the support forum software could be hiding itself from Google is to prevent it from being of any use to web-bots that place artificial postings that link to bad guys’ sites? I’ve seen many good forums ruined by the web-bots.
November 28th, 2006 at 7:19 am
bbercik: Web-bots used by bad-guys don’t obey the rules. Therefore they are not affected in any way by robots.txt.
January 10th, 2007 at 9:54 am
Not directly, no. But if a user-writable page (like a forum page) is hidden from legitimate search engines, then planting links on it to get Googlejuice will not work, since the link is hidden too.
A better way to handle this is with rel="nofollow", which causes Google to ignore just the link.
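As a rough illustration of that suggestion, forum software could attach the attribute whenever it renders a link submitted by a user. The function name `render_user_link` is made up for this sketch, and it assumes the forum builds its HTML as plain strings; real forum packages will have their own templating.

```python
from html import escape


def render_user_link(url: str, text: str) -> str:
    """Render a user-submitted link with rel="nofollow" (hypothetical helper).

    Both the URL and the link text are HTML-escaped so a poster cannot
    break out of the attribute or inject markup.
    """
    return '<a href="{}" rel="nofollow">{}</a>'.format(
        escape(url, quote=True), escape(text)
    )


if __name__ == "__main__":
    print(render_user_link("http://example.com/?a=1&b=2", "my site"))
```

The page itself stays crawlable, so error messages and answers in the thread can still be found; only the planted link loses its value.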
January 23rd, 2007 at 11:23 am
The same idea extends to allowing people to view Google’s cached pages.
Google’s cache of Cafe Au Lait’s pages can’t be viewed … due to the Javascript.
February 6th, 2007 at 9:50 am
I agree with gaurav, Google is really a friend in many ways – so helpful for finding useful information, it’s unbelievable at times. gaurav may respond to my e-mail mahesh.hariani@yahoo.co.in