Tue 8 Jan 2008
What Do SEO/SEM People Put In Robots.txt Files?
Posted by Shaun Anderson
My recent post on how SEO experts started their authority seo blogs proved popular when submitted to Sphinn by Marty at Aimclear (thanks!), and came at a time when, frankly, I was preparing to give up blogging for a short while to get on with work!
But in my new role as seo nosey-parker (thanks Tim), I thought it would be fun and perhaps useful to examine the seo/sem experts’ robots.txt files, to see if they could offer some insight into how to manage this often neglected file on your site (if you even have one).
What Is A Robots.txt File & Do You Need One?
Rand Fishkin, Jim Boykin, Danny Sullivan (on Daggle), John Andrews, XMCP, Michael Martinez and Marty Weintraub were notable for the lack of a detectable Robots.txt file. So no, it appears it’s not necessary to include a Robots.txt in your website, and I survived without one for long enough. It can, however, help reduce duplicate content issues if you’re worried about that sort of thing and not confident enough to hack away at your template to physically prevent problems.
A Robots file can, simply, tell search engines which pages of your site not to crawl. It can also save valuable bandwidth.
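For anyone who has never seen one, here is a minimal sketch of what such a file can look like and how a well-behaved crawler might read it. The rules and URLs are made up for illustration (they are not taken from any of the sites listed in this post), and the check uses Python's standard urllib.robotparser module.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for a WordPress-style blog; the paths are illustrative only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /feed/
"""


def build_parser(robots_txt):
    """Parse robots.txt text the way a polite crawler would before fetching."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A disallowed path: the crawler is asked not to request it.
print(parser.can_fetch("ExampleBot", "http://example.com/wp-admin/options.php"))  # False

# Anything not matched by a Disallow rule may be crawled.
print(parser.can_fetch("ExampleBot", "http://example.com/2008/01/some-post/"))  # True
```

Note that the file only makes a request of the crawler. Nothing here stops a badly behaved bot from fetching the page anyway.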
What Do SEO/SEM People put in Robots Files?
Hope they don’t mind (and I’ve included the systems they appear to be on in case that’s useful to some)
- Andy Beard - Niche Marketing (Wordpress)
- Aaron Wall - SEO Book (Drupal)
- Dave N - David Naylor (Wordpress)
- Michael Gray - Wolf Howl (Wordpress)
- Sebastian - Sebastian’s Pamphlets (Wordpress)
- Jill Whalen - High Rankings
- Matt Cutts - Matt Cutts (Wordpress)
- Jeremy Schoemaker - Shoemoney
- Todd Malicoat - Stuntdubl (Wordpress)
- Joost De Valk - Joost De Valk (Wordpress)
- Tamar Weinberg - Techipedia (Wordpress)
- Wiep Knol - Wiep (Wordpress)
- Tadeusz Szewczyk - SEO 2.0 Onreact (Wordpress)
- Maki - Dosh Dosh (Wordpress)
- Donna Fontenot - Dazzlin Donna (Wordpress)
- Tim Nash - Tim Nash SEO (Wordpress)
- Andrew Girdwood - Blog.Arhg (Blogger)
- Lyndon Antcliff - Cornwall SEO (Wordpress)
- Hamlet Batista - Hamlet Batista (Wordpress)
- Debra Mastaler - The Link Spiel (Blogger)
- Bill Slawski - Seo By The Sea (Wordpress)
- Kalena Jordan - Ask Kalena (Wordpress)
- Andy Beal - Andy Beal (Wordpress)
- Hobo - Hobo-Web (Wordpress)
Make A Robots.txt File
For those who don’t know the Robots.txt syntax, there is an online tool you can use to make one if you need one.
Out of interest, while investigating this file, I happened across possibly the longest robots.txt in the world. Yours probably doesn’t need to be quite that big (!). It’s also clear that a Robots.txt file is indexable by Google and does acquire Google PageRank (which is why I nofollowed those links above, in case their owners don’t want link love to that file).
And it wouldn’t be fair to talk about seo people’s Robots.txt without mentioning Brett Tabke, founder of my favourite forum, Webmasterworld. Brett uses his as a blog, when actually, his real robots.txt is here.

If you want to search robots.txt files and what’s in them, try the search engine BotSeer, designed exclusively for crawling and indexing robots.txt files.
Best
Lee Giles
Comment by Lee Giles — January 8, 2008 @ 4:27 pm
> A Robots file can, simply, tell search engines certain pages of your site not to return in search results.
I’m sorry, that’s plain false. Disallow’ing files in robots.txt doesn’t prevent indexing; the Disallow: statement just forbids crawling. Hence it’s a bad idea to try to steer indexing in robots.txt.
The current robots.txt commands (Disallow:, Allow: and Sitemap:) are all crawler directives that control crawling but not indexing. Besides Google’s Noindex: experiments (somewhat flawed I may add) there’s no such thing as an indexer directive in robots.txt. Search engines do index URLs that are disallow’ed, and that’s the right thing to do. Disallow’ed URLs do accumulate PageRank, and appear on SERPs as URL-only listings or with titles and snippets pulled from 3rd party sources.
Disallow’ing dupes is one of the most common SEO mistakes by the way. Of course search engines encourage this bad practice, but that’s not exactly a good reason to follow the crowd.
However, “outing” robots.txt files is a great idea, and thanks for the mention.
Comment by Sebastian — January 9, 2008 @ 12:12 am
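Sebastian's distinction between crawler directives and indexer directives can be sketched in a few lines. This is only an illustration under assumptions: the function names, the sample rules and the URL are hypothetical, and the indexer directive shown is the X-Robots-Tag HTTP header (a robots meta tag in the page would play the same role). The point is that a crawler which obeys Disallow never requests the page, so it never gets to see a noindex instruction placed on it.

```python
from urllib.robotparser import RobotFileParser


def may_crawl(robots_txt, user_agent, url):
    """Crawler directive: may this user agent request the URL at all?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def asks_noindex(headers):
    """Indexer directive: does the HTTP response ask to be kept out of the index?"""
    value = ""
    for name, header_value in headers.items():
        if name.lower() == "x-robots-tag":
            value = header_value
    directives = {part.strip().lower() for part in value.split(",")}
    return "noindex" in directives


# Hypothetical site that blocks /private/ and also sends noindex on pages there.
robots_txt = "User-agent: *\nDisallow: /private/\n"
url = "http://example.com/private/report.html"
response_headers = {"X-Robots-Tag": "noindex, nofollow"}

if may_crawl(robots_txt, "ExampleBot", url):
    # Only reachable when crawling is allowed; this is where noindex would be read.
    print("fetched, noindex requested:", asks_noindex(response_headers))
else:
    # The page is never fetched, so its noindex header is never seen.
    # The URL itself can still be indexed from links on other sites.
    print("not fetched; any noindex on the page goes unread")
```

In other words, if the goal is to keep a page out of the index, the page has to stay crawlable so the indexer directive can actually be read.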
Sebastian - I stand corrected. I threw that line in trying to “simplify” and didn’t pick it up on the reread. I’ll fix it so it’s not “misleading”, but leave the comment.

Hand slapped, Hang Head In Shame
Comment by Shaun Anderson — January 9, 2008 @ 12:17 am
Thanks Shaun
And plz add nofollow crap to Kalena’s and Andy’s robots.txt links.
Comment by Sebastian — January 9, 2008 @ 12:22 am
eek! Done, not sure what happened there! Thanks again Sebastian!
Comment by Shaun Anderson — January 9, 2008 @ 12:32 am
Having fun with some of them. It seems I could make a living with robots.txt coaching
If only SEs would implement my enhancements …
Comment by Sebastian — January 9, 2008 @ 12:40 am
I’ll take this time to refer visitors to your post on Robots Txt Directives. Thanks for your input clarifying this point.
Comment by Shaun Anderson — January 9, 2008 @ 12:47 am
Wow. That’s the first time someone thanked me for inserting blatant comment-author-link spam. Thank *you*!
Comment by Sebastian — January 9, 2008 @ 1:05 am
Yes I was surprised the first time it happened to me too.
The way I figure it, you added more *actual* useful information to this page (which was just a bit of link-baiting fun crafted in the middle of the night) - so you deserve credit in the only way I can give you - a link.
But saying that, that’s enough spamming of my site Sebastian.
Away with you!
Comment by Shaun Anderson — January 9, 2008 @ 1:11 am
Ok Ok … I’ll spam away … and thanks for the link
Comment by Sebastian — January 9, 2008 @ 1:19 am
Shaun, I am so intimidated by Sebastian’s knowledge of robots.txt I simply can’t create one. Seriously… that dude’s knowledgeable!
Also seriously, old-timers know to autogenerate the robots.txt, so what you see when you pull it is not (necessarily) the same robots.txt the search engines get. We used to actually show incorrect ones to everybody but qualified search spiders (to consume competitors’ valuable time), but I don’t know how many people still bother. From your list of popular bloggers, I doubt any do serious competitive work on their public blogs anyway.
Comment by john andrews — January 9, 2008 @ 4:47 am
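To make John's point concrete, here is a rough sketch of how a generated robots.txt could vary by visitor. Everything in it is an assumption for illustration: the function name, the two rule sets and the list of spider tokens are hypothetical, and matching on the User-Agent string alone is much weaker than whatever "qualified" meant in practice, since that header is trivial to fake.

```python
# Hypothetical rule sets; neither is taken from a real site.
PUBLIC_ROBOTS = "User-agent: *\nDisallow:\n"
SPIDER_ROBOTS = "User-agent: *\nDisallow: /drafts/\n"

# Substrings that commonly appear in search spiders' User-Agent headers.
KNOWN_SPIDER_TOKENS = ("googlebot", "slurp", "msnbot")


def robots_txt_for(user_agent):
    """Return the robots.txt body to serve for a given User-Agent header."""
    ua = user_agent.lower()
    if any(token in ua for token in KNOWN_SPIDER_TOKENS):
        return SPIDER_ROBOTS
    return PUBLIC_ROBOTS


# A browser (or a nosey SEO) and a spider receive different files.
browser_view = robots_txt_for("Mozilla/5.0 (Windows; U; Windows NT 5.1)")
spider_view = robots_txt_for(
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)
print(browser_view == spider_view)  # False
```

So the files linked in this post are only what a browser gets, which may or may not be what the spiders are served.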
Hi John
Yeah, tell me about it! Let’s just say Sebastian has educated me via email to the wonders of robots.txt since this post, and has sent a couple of hundred stumblers and twitters to witness my slapping.
Congrats on top spot on the best SEM forum on the net by the way and for keeping it - marvelous spin!
Actually, I am getting as sick of seeing your mug up there as I did Rand Fishkin’s, although I am on the greatest hits myself (through sheer good fortune with Aaron Wall’s article).
I don’t think you’ll be top for long though!
Comment by Shaun Anderson — January 9, 2008 @ 5:09 am
The problem is you have Lucia’s plugin set up so that author comments are nofollow.
To follow in Sebastian’s footsteps here is something on robots.txt and pagerank accumulation
The fun thing is when people have a “secret” area on their site that they don’t want indexed, so they block it in the robots.txt file rather than using a more secure method.
Comment by Andy Beard — January 9, 2008 @ 7:01 am
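Andy's last point is easy to demonstrate: robots.txt is public, so every Disallow line doubles as a signpost. The sketch below is a hypothetical helper (not a full robots.txt parser) that pulls the disallowed paths out of a file, with a made-up file as input.

```python
def disallowed_paths(robots_txt):
    """List every non-empty Disallow path found in robots.txt text."""
    paths = []
    for raw_line in robots_txt.splitlines():
        # Drop trailing comments, then split "Field: value" on the first colon.
        line = raw_line.split("#", 1)[0].strip()
        field, separator, value = line.partition(":")
        if separator and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical file from a site trying to hide a "secret" area.
sample = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /secret-reports/   # please do not look here
Disallow:
"""

print(disallowed_paths(sample))  # ['/wp-admin/', '/secret-reports/']
```

Anyone can read that list, so anything that really needs to stay private should sit behind a login or other access control rather than a Disallow line.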
Cheers Andy - I’ll get Linky Love sorted shortly.
Comment by Shaun Anderson — January 9, 2008 @ 10:01 am
Thanks Shaun, you reminded me of something I meant to do. Keep up the being nosey; with your help I may well finish all those jobs I forgot to do.
Comment by Tim Nash — January 9, 2008 @ 11:34 am
Cheers Tim - Hang on - I am about to publish some info Sebastian has let me in on
Comment by Shaun Anderson — January 9, 2008 @ 11:53 am
oh no wasn’t changing my file itself, just making sure nosey parkers get the full experience
Inspired by Brett
Comment by Tim Nash — January 9, 2008 @ 11:55 am
great compilation,
just by looking at a few of these I have gathered some interesting data, insights into what some of our colleagues are up to as well as some specific techniques which can be applied to several of my clients’ sites.
Comment by ny seo — January 9, 2008 @ 4:50 pm
Pingback by Thanks for all the ego food! — January 10, 2008 @ 3:15 am
And believe me, he did.
Read Sebastian to Shaun’s - The Idiots Guide To Robots.txt
Comment by Shaun Anderson — January 10, 2008 @ 3:22 am