Thu 17 Dec 2009
Does Google Remember Every Version Of A Page, Ever?
Blurb by Shaun Anderson (Hobo). SEO theory for SEO geeks, of course.
We all know Google finds out and that it knows, but I don't see many people talking about what Google remembers. If trust is such a big thing for Google then, just like a human, it needs to:
- find you,
- know you and
- remember your actions,
…to build real trust.
I've heard other folk in forums muse about something similar before, especially about links, but for the first time in a while, using a variation of the site: operator, I discovered:
- 4 pages in the Google SERPs,
- on one data centre (DC),
- all edited at different times,
- all the EXACT SAME URI, and seemingly just
- 4 historic, slightly different versions of the same exact URI.
Is this an indication that Google remembers every page and its history?
If Google remembers every single historic version of a URI (or page on your website), then why? I wonder if it might use such knowledge to work out things to rank you by, like how much Google trusts you, for instance.
If it does have that incredible amount of historical data about your actions, surely somebody at Google is tasked with putting this knowledge to use in some way.
For instance:
- Can even the most minor changes to a page indicate your intent?
- Can it build a picture over time about what you are up to?
- Can it be a measurement your site can be held accountable for?
- Could it be a metric which might affect page and site rankings over time?
- In a positive, or negative, fashion?
Is Google, at any level, comparing one version of your page with the one you just changed, or the one you started with, or even just counting the number of times you've modified it?
If this is the case, can you even use this information to your benefit? Of course, thought would need to be put into what counts as an indication of potentially manipulative intent…
Why the SEO theory post? This is the sort of thing I wonder about when I see interesting results in Google, purely for discussion if you're game. You must remember that everything you read in any SEO blog is theory at best anyway, so I thought it wouldn't be out of place.
Should I just tweet random thoughts like this and keep your email inbox free of fanciful pap in future?
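For the geeks, here's a toy sketch of what "comparing one version of your page with the last" might even look like. It is purely illustrative and says nothing about what Google actually does: the function names, the word-level comparison and the idea of a single "change score" are my own assumptions. It uses Python's standard difflib module to score how much each stored version differs from the one before it, and counts how many revisions changed anything at all.

```python
import difflib


def change_score(old: str, new: str) -> float:
    """Return 1 minus difflib's word-level similarity ratio.

    0.0 means the two versions are identical; values near 1.0 mean
    almost nothing survived between them. (Hypothetical metric.)
    """
    # autojunk=False stops difflib ignoring very common words on long pages
    matcher = difflib.SequenceMatcher(None, old.split(), new.split(), autojunk=False)
    return 1.0 - matcher.ratio()


def summarize_history(versions: list[str]) -> dict:
    """Compare each stored version of a page with the version before it."""
    scores = [change_score(a, b) for a, b in zip(versions, versions[1:])]
    return {
        "comparisons": len(scores),
        "revisions_that_changed": sum(1 for s in scores if s > 0),
        "largest_change": max(scores, default=0.0),
        "scores": scores,
    }


if __name__ == "__main__":
    history = [
        "cheap blue widgets for sale",
        "cheap red widgets for sale",  # one word swapped
        "cheap red widgets for sale",  # no change at all
    ]
    print(summarize_history(history))
```

In this toy history the one-word swap scores about 0.2 and the unchanged copy scores 0.0, so a page that has a couple of words swapped every hour would show up as a long run of small, non-zero scores.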

I think Google has a history index, like archive.org does.
I've considered that too. I'm musing about what it does with such data. Leave it lying around?
I think it's reasonable that they can use it to some degree. For instance, with QDF (Query Deserves Freshness) you could game it by changing a couple of words hourly via a server script. If they keep a level of history, they can see whether changes are substantial enough to warrant an uplift (e.g. news site homepages releasing breaking news) and whether the page deserves more frequent crawling, an uplift, etc.
I think it'd be more pattern matching than anything. Plenty of sites will have constant "tinkering", but I think if Google were to spot a pattern they'd be able to factor it in somehow. Plus, if they were ever to hit a site with a penalty, they could see whether, and what, had been changed over time.
I think monitoring changes on a site would be a perfectly reasonable thing for Google to do, but as you say, it really is a question of “how?”.
If true, I also wonder to what extent they do it. Simple changes to text and content? I doubt they'd waste resources. Links? Probably. If I were Google and I were doing this, I'd archive a page every time a link changed.
A fun read for my Friday morning anyway.
If they are keeping history intact, it means we should be very careful when purchasing a pre-owned domain. I have only purchased a single pre-owned domain, but the experience was a positive one: it was crawled the best of all my domains, though I could never understand why.
Too bad there isn't any definitive way of testing this. The closest proxy for the trust a website receives is mozTrust, but since Google doesn't provide any data about trust, we can't compare, so we can never be sure.
I'm wondering which variation of the site: operator you used?
A little off topic but I really like the look of your email blasts. They are clean with a big bold headline and a lot of white space. I get countless emails from different seo blogs but I always read yours. It’s easy!
Generally I would prefer to keep theory out of my mailbox, but given that I subscribed to you and am interested in what you say, I don't mind these once in a while, as they make you think. You are not a guy who sends these out every day. Sometimes I don't hear from you for days and days, which is fine. I do like proof or hard evidence, so screenshots of your findings are nice if you can share them.
In terms of keeping everything: yes, of course, don't we all? I can't imagine backup tapes sitting in the bin outside Google (so to speak); that's a gold mine!
It could be used to backtrack through changes they have made, find out who reacted first and correctly… then put those SEOs on a watch list!
Everyone loves a good Friday morning (US time) Google theory that really gets the gears grinding, especially before the first caffeine fix has kicked in.
On Ian's point, I have met a number of webmasters who have tried scripts that rotate text on pages to make it appear that they are constantly being updated. So, if Google is recording historical changes, this may be a cautionary tale, as penalties could ensue if Google deciphers the pattern.
On Alok's point, at the recent Webmaster World PubCon in Las Vegas, Matt Cutts did urge caution about buying domains, specifically if they had ever been hosted on an IP block that was in disfavor with the search engines. He said it can be a challenge to clear those domains of their bad reputation.
Shaun, is there any way you can share screenshots, the URI and your method of finding the 4 pages? (Visual learner.) Would love to test it out for myself as well.
Please don't keep these kinds of posts off the blog. What I love most about SEO is the theory, speculation and testing. Can you imagine how boring our jobs would be if we actually knew all there is to know?
Personally I think it makes perfect sense to track the history of a website or page over time. We already know they have cached versions to check against, so keeping a more robust history only makes sense. So I don't think it's a question of if, but of why, and how it would change the way we do our jobs.
I think it would be very unlikely that Google would allocate the resources to store this information were it not of any use for determining trust. When we look at the way Google handles other issues such as penalties, you can see that there is already evidence that a memory of your actions is kept and considered when determining your rankings. Great post, Shaun.
As someone who is still a novice at SEO (if the term is still relevant), it seems to me there is almost a battle going on between what Google knows and what we think it does, all really driven by the fact that if your website is not on page 1, the average user is not going to search further.
If Google were to store info on every page, the storage requirements would be mind-boggling (even compared with the Large Hadron Collider, which to my knowledge has one of the largest data stores on the planet).
No… or at least I doubt it. A large proportion of pages fall under supplementary content, which means Google acknowledges they exist, but its spiders rarely if ever revisit them. Much of the time, I believe, you can identify such a page because its PageRank shows as N/A.