Prevent Google Indexing Internal Search Results Pages?
Google says to do it using robots.txt.
"Use robots.txt to prevent crawling of search results pages or other auto-generated pages that don’t add much value for users coming from search engines." (Google Webmaster Help)
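As a rough illustration of that guideline, here is a minimal sketch that checks a robots.txt rule using Python's standard `urllib.robotparser`. The `/search` path and the `is_crawlable` helper are assumptions made up for this example; your internal search may live at a different URL.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: assumes internal search results live under /search.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def is_crawlable(url_path: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the sample robots.txt lets user_agent fetch url_path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url_path)


if __name__ == "__main__":
    print(is_crawlable("/search?q=shoes"))  # internal search results page
    print(is_crawlable("/about"))           # ordinary content page
```

Disallow rules match by prefix, so a rule like this would also block any other path that begins with `/search`. Worth checking before you deploy it.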
Perhaps it comes down to how much value these pages add for visitors who land on them from search results. I see many websites (especially big sites) that allow their internal search results to be spidered by Googlebot. Of course, if you follow Google’s advice and block your search results pages in robots.txt while people are linking to them, you might be losing out on incoming link equity. Andy Beard wrote about something similar some time ago: SEO Linking Gotchas Even the Pros Make.
Search Engine Land wrote a couple of years ago about this same issue, as did Matt Cutts.
I have used various methods to manage internal search engine results pages over the last few years, but it annoys me to see bigger brands ignore this Google guideline (where they benefit). Of course, they could well be screwing themselves in other ways by ignoring this directive and allowing Google to crawl and return these internal SERPs.
I am no expert in this area. Do you prevent Google indexing the internal search results of your website and how do you do it?
Interested in learning more about robots.txt? Check out our robots.txt beginner's guide with Sebastian, the respected writer of Sebastian's Pamphlets.
My Twitter buddy Edward Lewis pointed me to this site for more information on robots meta tags, if you want to get deep.
Written by Shaun Anderson
Instead of preventing Google from indexing internal search result pages, rewrite the URLs and remove the signs that mark them as search results (the words "search results" in the title, h1, URL, etc.). Make them faceted search pages.
Good examples can be seen on Overstock.
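A rough sketch of the URL side of that idea, assuming the search term arrives in a `q` query parameter and the clean page lives at a slug path. Both are assumptions for illustration, as is the `facet_path` helper. This is not how Overstock or any particular site does it.

```python
import re
from urllib.parse import parse_qs, urlsplit


def facet_path(search_url: str, param: str = "q") -> str:
    """Map a search URL such as /search?q=Red+Shoes to a clean path."""
    query = parse_qs(urlsplit(search_url).query)
    terms = query.get(param, [""])[0]
    # Lowercase the terms and collapse anything non-alphanumeric into hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", terms.lower()).strip("-")
    return f"/{slug}/" if slug else "/"


if __name__ == "__main__":
    print(facet_path("/search?q=Red+Shoes"))
```

The rewritten page would still need its own title and h1, as the comment above says. The URL is only one part of it.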
Many times I simply use a noindex on search results pages rather than exposing my search path in robots.txt.
I do the same for thankyou pages and order confirmation pages that have tracking code.
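For anyone wanting to try the noindex route, here is a minimal sketch that sends the `X-Robots-Tag: noindex` response header, which does the same job as a robots meta tag in the page head. The path prefixes and the `robots_headers` helper are hypothetical; swap in whatever your site actually uses.

```python
# Hypothetical path prefixes for pages that should stay out of the index.
NOINDEX_PREFIXES = ("/search", "/thank-you", "/order-confirmation")


def robots_headers(path: str) -> dict:
    """Return extra response headers for path; noindex for listed prefixes."""
    if path.startswith(NOINDEX_PREFIXES):
        return {"X-Robots-Tag": "noindex"}
    return {}


if __name__ == "__main__":
    print(robots_headers("/search?q=shoes"))
    print(robots_headers("/products/shoes"))
```

One thing to keep in mind: a crawler has to be able to fetch the page to see the noindex, so these URLs should not also be disallowed in robots.txt.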
Yup, this is generally what I do too. Thanks for the comment, Jeet.
Well, I think both ideas work in such cases: 1: rewrite the URLs of internal search results pages; 2: use robots.txt to keep crawlers out of those pages.
I was struggling with this issue, so I consulted Google customer service two months ago, and they suggested disallowing such URLs in robots.txt.
Anyway, a pretty informative post, thanks.
Because they use HTTP, robot spider indexers can be slower than local file indexers, and can put more pressure on your web server, as they ask for each page. Some older webservers may crash during this process, either from the number of requests or because they uncover file corruption.