Tue 8 Jan 2008
What Do SEO/SEM People Put In Robots.txt Files?
Blurb by Shaun Anderson (Hobo) SEO TIPS
My recent post on how SEO experts started their authority SEO blogs proved popular when submitted to Sphinn by Marty at Aimclear (thanks!), and came at a time when, frankly, I was preparing to give up blogging for a short while to get on with work!
But in my new role as SEO nosey-parker (thanks Tim), I thought it would be fun and perhaps useful to examine the SEO/SEM experts’ robots.txt files, to see if they could offer some insight into how to manage this often neglected file on your site (if you even have one).
What Is A Robots.txt File & Do You Need One?
Rand Fishkin, Jim Boykin, Danny Sullivan (on Daggle), John Andrews, XMCP, Michael Martinez and Marty Weintraub were notable for the lack of a detectable robots.txt file, so no, it appears it’s certainly not necessary to include a robots.txt in your website, and I survived without one for long enough.
A robots.txt file can, simply, tell search engines which pages of your site not to crawl. It can also save valuable bandwidth.
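As a rough illustration of what "telling search engines which pages not to crawl" looks like in practice, here is a minimal sketch using Python's standard urllib.robotparser module. The rules, the example.com URLs and the "ExampleBot" user agent are all made up for the example and are not taken from any of the sites listed below.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, not copied from any real site.
RULES = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /feed/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def may_crawl(url, agent="ExampleBot"):
    """Return True if a well-behaved crawler may fetch this URL."""
    return parser.can_fetch(agent, url)


print(may_crawl("http://example.com/wp-admin/options.php"))  # False
print(may_crawl("http://example.com/seo-tips/"))             # True
```

A polite crawler runs a check like this before requesting a page, which is also where the bandwidth saving comes from: disallowed URLs are simply never fetched.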
What Do SEO/SEM People put in Robots Files?
Hope they don’t mind (and I’ve included the systems they appear to be on in case that’s useful to some)
- Andy Beard – Niche Marketing (Wordpress)
- Aaron Wall – SEO Book (Drupal)
- Dave N – David Naylor (Wordpress)
- Michael Gray – Wolf Howl (Wordpress)
- Sebastian – Sebastian’s Pamphlets (Wordpress)
- Jill Whalen – High Rankings
- Matt Cutts – Matt Cutts (Wordpress)
- Jeremy Schoemaker – Shoemoney
- Todd Malicoat – Stuntdubl (Wordpress)
- Joost De Valk – Joost De Valk (Wordpress)
- Tamar Weinberg – Techipedia (Wordpress)
- Wiep Knol – Wiep (Wordpress)
- Tadeusz Szewczyk – SEO 2.0 Onreact (Wordpress)
- Maki – Dosh Dosh (Wordpress)
- Donna Fontenot – Dazzlin Donna (Wordpress)
- Tim Nash – Tim Nash SEO (Wordpress)
- Andrew Girdwood – Blog.Arhg (Blogger)
- Lyndon Antcliff – Cornwall SEO (Wordpress)
- Hamlet Batista – Hamlet Batista (Wordpress)
- Debra Mastaler – The Link Spiel (Blogger)
- Bill Slawski – Seo By The Sea (Wordpress)
- Kalena Jordan – Ask Kalena (Wordpress)
- Andy Beal – Andy Beal (Wordpress)
- Hobo – Hobo-Web (Wordpress)
Make A Robots.txt File
For those who don’t know the robots.txt syntax, there is an online tool you can use to make one if you need one.
Out of interest, while investigating this file, I happened across possibly the longest robots.txt in the world? Yours probably doesn’t need to be quite that big (!), and it’s clear a robots.txt file is indexable by Google and does acquire Google PageRank (which is why I nofollowed those links above, in case their owners don’t want link love to that file).
And it wouldn’t be fair to talk about seo people’s Robots.txt without mentioning Brett Tabke, founder of my favourite forum, Webmasterworld. Brett uses his as a blog, when actually, his real robots.txt is here.
Update – While some search engines advise you to use robots.txt to avoid issues like duplicate content, as Sebastian points out in the comments, this is not ideal and is perhaps not the best advice.
Disallow’ing dupes is one of the most common SEO mistakes by the way. Of course search engines encourage this bad practice, but that’s not exactly a good reason to follow the crowd.
Update – Consider these two articles about Robots.txt from some who have spent more time than me thinking about Robots Txt:

If you want to search robots.txt files and what’s in them, try the search engine BotSeer, designed exclusively for crawling and indexing robots.txt files.
Best
Lee Giles
> A Robots file can, simply, tell search engines certain pages of your site not to return in search results.
I’m sorry, that’s plain false. Disallow’ing files in robots.txt doesn’t prevent indexing, the Disallow: statement just forbids crawling. Hence it’s a bad idea to steer indexing in robots.txt.
The current robots.txt commands (Disallow:, Allow: and Sitemap:) are all crawler directives that control crawling but not indexing. Besides Google’s Noindex: experiments (somewhat flawed I may add) there’s no such thing as an indexer directive in robots.txt. Search engines do index URLs that are disallow’ed, and that’s the right thing to do. Disallow’ed URLs do accumulate PageRank, and appear on SERPs as URL-only listings or with titles and snippets pulled from 3rd party sources.
Disallow’ing dupes is one of the most common SEO mistakes by the way. Of course search engines encourage this bad practice, but that’s not exactly a good reason to follow the crowd.
However, “outing” robots.txt files is a great idea, and thanks for the mention.
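To illustrate the distinction drawn in the comment above, here is a small sketch with Python's standard urllib.robotparser. The rules and example.com URLs are hypothetical. Note that the parser only ever answers crawling questions ("may this agent fetch this URL?", "where are the sitemaps?"); nothing in it says whether a URL may appear in an index. Two assumptions to be aware of: site_maps() needs Python 3.8 or later, and this particular parser applies the first matching rule, which is not necessarily how every search engine resolves Allow/Disallow conflicts.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules using the three crawler directives named above.
RULES = """\
User-agent: *
Allow: /private/press-kit.html
Disallow: /private/
Sitemap: http://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# Crawling questions are the only ones the file can answer.
press_kit_ok = parser.can_fetch("ExampleBot", "http://example.com/private/press-kit.html")
notes_ok = parser.can_fetch("ExampleBot", "http://example.com/private/notes.html")
sitemaps = parser.site_maps()  # Python 3.8+

print(press_kit_ok)  # True: the Allow line matches first
print(notes_ok)      # False: crawling forbidden, indexing not addressed
print(sitemaps)      # ['http://example.com/sitemap.xml']
```

A disallowed URL such as the notes page can still be linked to from elsewhere, which is how it can end up as a URL-only listing.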
Sebastian – I stand corrected. I threw that line in trying to “simplify” and didn’t pick it up in the reread. I’ll fix it so it’s not “misleading”, but leave the comment.
Hand slapped, Hang Head In Shame
Thanks Shaun
And plz add nofollow crap to Kalena’s and Andy’s robots.txt links.
eek! Done, not sure what happened there! Thanks again Sebastian!
Having fun with some of them. It seems I could make a living with robots.txt coaching
If only SEs would implement my enhancements …
I’ll take this time to refer visitors to your post on Robots Txt Directives Thanks for your input clarifying this point
Wow. That’s the first time someone thanked me for inserting blatant comment-author-link spam. Thank *you*!
Yes I was surprised the first time it happened to me too.
The way I figure it, you added more *actual* useful information to this page (which was just a bit of link-baiting fun crafted in the middle of the night) – so you deserve credit in the only way I can give you – a link.
But saying that, that’s enough spamming of my site Sebastian.
Away with you!
Ok Ok … I’ll spam away … and thanks for the link
Shaun, I am so intimidated by Sebastian’s knowledge of robots.txt I simply can’t create one. Seriously… that dude’s knowledgeable!
Also seriously, old-timers know to autogenerate the robots.txt so what you see when you pull it is not (necessarily) the same robots.txt the search engines get. We used to actually show incorrect ones to everybody but qualified search spiders (to consume competitors’ valuable time), but I don’t know how many people still bother. From your list of popular bloggers, I doubt any do serious competitive work on their public blogs anyway.
Hi John
Yeah tell me about it! Let’s just say Sebastian has educated me via email to the wonders of robots.txt since this post, and has sent a couple of hundred stumblers and twitters to witness my slapping
Congrats on top spot on the best SEM forum on the net by the way and for keeping it – marvelous spin!
Actually, I am getting as sick of seeing your mug up there as I did Rand Fishkin’s, although I am on the greatest hits myself (through sheer good fortune with Aaron Wall’s article).
I don’t think you’ll be top for long though!
The problem is you have Lucia’s plugin set up so that author comments are nofollow.
To follow in Sebastian’s footsteps here is something on robots.txt and pagerank accumulation
The fun thing is when people have a “secret” area on their site that they don’t want indexed, so they block it in the robots.txt file rather than using a more secure method.
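A quick sketch of why that is a problem: robots.txt is a public file, so anyone can read the paths it lists. The helper below (disallowed_paths is a made-up name, and the sample rules are hypothetical) simply pulls out every Disallow value from the text of a robots.txt file.

```python
def disallowed_paths(robots_txt):
    """Collect the value of every non-empty Disallow line."""
    paths = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop trailing comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical file contents for illustration only.
SAMPLE = """\
User-agent: *
Disallow: /secret-area/   # not so secret once it is listed here
Disallow: /wp-admin/
"""

print(disallowed_paths(SAMPLE))  # ['/secret-area/', '/wp-admin/']
```

In other words, listing a "secret" directory in robots.txt advertises it rather than protects it; proper access control is the more secure method.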
Cheers Andy – I’ll get Linky Love sorted shortly.
Thanks Shaun, you reminded me of something I meant to do. Keep up the being nosey; with your help I may well finish all those jobs I forgot to do
Cheers Tim – Hang on – I am about to publish some info Sebastian has let me in on
oh no wasn’t changing my file itself, just making sure nosey parkers get the full experience
Inspired by Brett
great compilation
And believe me, he did.
Read Sebastian to Shaun’s – The Idiots Guide To Robots.txt
haha just found this post, because it links to my robots.txt.
Why I have one? It’s simple: I wanted faster indexing from Ask, that’s the only reason
I did build a cool tool for making robots.txt files, ages ago
http://www.clickability.co.uk/robotstxt.html
Several people still don’t know the real importance of the robots.txt file. You can find this file on a domain at this URL: http://domain.com/robots.txt. Such a file is an invitation to search bots to crawl a site. Liked your post.
Do you really need a robots.txt file in every case? I heard you do not need one in every case.