Here are some notes I took from the recent Google Webmaster Hangout with John Mueller, Webmaster Trends Analyst at Google Switzerland.
Google Penguin, Crawl Rates and Manual Actions
The Google Penguin update, though a confirmed update, is still processing, so we should see ranking fluctuations as a result of this for a while yet.
Penguin is “an update that’s currently rolling out over a couple of weeks. (I) wouldn’t assume that the current state is the final state. The goal is to have it update a little bit faster, but I can’t pinpoint any specific update frequency“
John was asked a series of questions about crawl rate spikes around times of Penguin updates. When asked if crawl rate could be correlated to a problem with your site: “I wouldn’t assume that there’s direct relationship between the crawl rate…..and, kind of, how we view that site from a quality point of view.” That wasn’t really the intent of the question, I think. John seemed to be alluding to the fact that crawling is a separate process from quality evaluation.
The only time Googlebot’s crawl rate should be affected is in extreme cases of webspam: “OK, we’re not going to crawl it.” “This is something that we do only in really extreme cases.”
He mentioned that “manual actions, will still result in us crawling normally”. Evidently not ranking normally, though, or being indexed as fast.
“Some pages we crawl every day. Other pages, every couple of months.” – Some pages are more important than others to Googlebot.
John said “home pages” is where “we forward the pagerank within your website” and “depending on how your website is structured, if content is closer to the Home page, then we’ll probably crawl it a lot faster, because we think it’s more relevant” and “But it’s not something where I’d say you artificially need to move everything three clicks from your Home page”.
Panda evolves – signals can come and go – Google can get better at determining quality:
“So it’s not something where we’d say, if your website was previously affected, then it will always be affected. Or if it wasn’t previously affected, it will never be affected.”
“sometimes we do change the criteria.” – the Panda signals themselves can change.
“category pages…. wouldn’t see that as something where Panda would say, this looks bad.“
“Ask them the questions from the Panda blog post….. usability, you need to work on.“
Rumours of a Panda update over the weekend, too.
SEO Myth Busting
John was asked the question “Using exact match anchor text…in navigation or headers…bad for SEO?” and answered “No, not necessarily.”
When asked about keyword density: “keyword density, in general, is something I wouldn’t focus on. Search engines have kind of moved on from there.”
On “It’s a legitimate review, it’s just in ALL CAPS” he answered “That should be fine.“
When asked “Is the number of directories within a URL a ranking factor?” he answered “No.”
John said “if we see that things like keyword stuffing are happening on a page, then we’ll try to ignore that, and just focus on the rest of the page”.
Does that imply that what we call a keyword stuffing “penalty” for a page, Google calls ‘ignoring that‘? From what I’ve observed, pages can seem to perform badly for sloppy keyword phrase stuffing, although they can still rank for long-tail variations of it.
John says “nobody will always see ranking number three for your website for those queries. It’s always fluctuating.”
This flux is not necessarily something to do with a problem per se and “that’s just a sign that our algorithms are fluctuating with the rankings.” I recently described an example of (albeit drastic) ranking fluctuations.
Fluctuating upwards could be a good sign as he mentioned “maybe this is really relevant for the first page, or maybe not.” – then again – the converse is true, one would expect.
He says “a little bit of a push to make your website a little bit better, so that the algorithms can clearly say, yes, this really belongs on the first page.” which I thought was an interesting turn of phrase. ‘First page’, rather than ‘number 1’.
Forum SEO Advice
It’s evident Google wants forum administrators to work harder on managing the content Googlebot gets to ‘rate’.
He gives the option to “noindex untrusted post content” and goes on: “posts by new posters who haven’t been in the forum before. threads that don’t have any answers. Maybe they’re noindexed by default.“
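The suggestion above can be sketched in a forum page template. This is a minimal illustration, not a quoted recommendation: the conditions for emitting the tag (first-time poster, zero replies) are assumptions your forum software would have to implement.

```html
<!-- Emitted only on "untrusted" threads, e.g. first post by a new member
     or a thread with no answers yet. "noindex, follow" keeps the page out
     of the index while still letting Googlebot follow its links. -->
<meta name="robots" content="noindex, follow">
```

Once a thread accumulates trusted answers, the template would stop emitting the tag so the page can be indexed normally.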
A very interesting statement was “how much quality content do you have compared to low-quality content“. That indicates Google is looking at this ratio. John says to identify “which pages are high-quality, which pages are lower quality, so that the pages that do get indexed are really the high-quality ones.“
John mentions to look too at “threads that don’t have any authoritative answers“.
I think that advice is relevant for any site with lots of content.
Duplicate Content Penalty
John clearly states “We don’t have a duplicate content penalty. It’s not that we would demote a site for having a lot of duplicate content.” and “You don’t get penalized for having this kind of duplicate content” in which he was talking about very similar pages. John says to “provide… real unique value” on your pages.
I think that can be read as: Google is not compelled to rank your duplicate content. If Google ignores it, that is different from a penalty. Your original content can still rank, for instance.
An ecommerce SEO tip from John: for “variations of product colors” you might list them on the product page, “but you wouldn’t create separate pages for that.” With these types of pages you are “always balancing is having really, really strong pages for these products, versus having, kind of, medium strength pages for a lot of different products.“
John says “one kind of really, really strong generic page” trumps “hundreds” of mediocre ones.
If “essentially, they’re the same, and just variations of keywords” that should be OK, but if you have ‘millions‘ of them, Googlebot might think you are building doorway pages, and that IS risky.
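One common way to consolidate variant URLs into a single strong product page is a rel=canonical from each variation. A sketch with hypothetical URLs (John didn’t prescribe this exact markup):

```html
<!-- On a hypothetical colour-variant URL such as /widget?colour=red,
     point search engines at the consolidated product page so the
     variants don't compete as separate "medium strength" pages: -->
<link rel="canonical" href="https://www.example.com/widget">
```

Each variant URL carries the same canonical, so ranking signals consolidate on one “really, really strong” page.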
Auto generated Pages
John says to avoid lots of “just automatically generated” pages and “if these are pages that are not automatically generated, then I wouldn’t see that as web spam.“
Conversely, then, is “automatically generated” content web spam? It is also evident Googlebot expects to see a well-formed 404 if no page exists at a URL.
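A “well-formed 404” means the server returns an actual 404 status code, not a friendly error page served with a 200 status (a so-called “soft 404”). A minimal sketch, assuming Apache; other servers have equivalents:

```apacheconf
# Serve a custom error page for missing URLs while still returning
# a real 404 status. A "not found" page that returns HTTP 200 is a
# soft 404, which Googlebot treats as a real (low-quality) page.
ErrorDocument 404 /not-found.html
```

You can verify the status with your browser’s developer tools or a header checker: the response line for a missing URL should read “404 Not Found”.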
On sitelinks: there is “no way to completely remove a URL from the site links, apart from putting a noindex on it.“
He also suggested “hreflang mark-up” to help Google better understand “different language and country variations” for accurate sitelinks.
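For reference, hreflang annotations look like this. The URLs and language/country pairs here are hypothetical; the key points are that each page variant should carry the full set of alternates, including a self-reference:

```html
<!-- In the <head> of every variant, list all variants (self included): -->
<link rel="alternate" hreflang="en-gb" href="https://www.example.com/uk/page/">
<link rel="alternate" hreflang="en-us" href="https://www.example.com/us/page/">
<link rel="alternate" hreflang="de" href="https://www.example.com/de/seite/">
```

The same annotations can alternatively be supplied via XML sitemaps or HTTP headers rather than in-page markup.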
Changing from HTTP to HTTPS – “make sure that you have both variations listed in Webmaster Tools” and “essentially just set up the (301) redirect, set up the rel=canonical.”
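A minimal sketch of that redirect, assuming Apache with mod_rewrite enabled (other servers differ); this is an illustration of the advice, not John’s own configuration:

```apacheconf
# .htaccess sketch: 301-redirect every HTTP request to its HTTPS
# equivalent, preserving host and path.
RewriteEngine On
RewriteCond %{HTTPS} off
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
```

Alongside the redirect, each HTTPS page’s rel=canonical should point at its own https:// URL, and both the HTTP and HTTPS properties should be verified in Webmaster Tools.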
John says on affiliate links not to obfuscate them from Googlebot and to “just use the rel=nofollow there.”
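In markup that advice is just an attribute on the link; the merchant URL here is hypothetical:

```html
<!-- Keep the affiliate destination visible to Googlebot; the nofollow
     tells Google not to pass PageRank through the paid/affiliate link. -->
<a href="https://merchant.example.com/product?aff=123" rel="nofollow">Buy the widget</a>
```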
Lazy Loading Scripts and Google Indexation
When asked “Is Googlebot able to trigger lazy loading scripts – lazy loading images for below the fold content?”, John said: “This is a tricky thing.”
On lazy loading images John says “test this with Fetch as Google in Webmaster Tools” and “imagine those are things that Googlebot might miss out on.”
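A common pattern for making lazy-loaded images crawlable is a noscript fallback. This is a sketch of that pattern under the assumption that a script on the page swaps `data-src` into `src` as the user scrolls; the class name and paths are made up:

```html
<!-- The script-driven image Googlebot "might miss out on": -->
<img class="lazy" src="/images/placeholder.gif"
     data-src="/images/product-photo.jpg" alt="Product photo">
<!-- A plain fallback a non-scripting crawler can still see: -->
<noscript><img src="/images/product-photo.jpg" alt="Product photo"></noscript>
```

As John says, the way to check what actually gets picked up is Fetch as Google in Webmaster Tools.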
On sitemap file size: “We support 50 megabytes for a sitemap file, but not everyone else supports 50 megabytes. Therefore, we currently just recommend sticking to the 10 megabyte limit.“
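If a single sitemap would exceed that, the usual approach is to split it and reference the parts from a sitemap index file. A hypothetical example with made-up file names:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sitemap index: each child sitemap stays well under the size limit. -->
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://www.example.com/sitemap-products-1.xml</loc></sitemap>
  <sitemap><loc>https://www.example.com/sitemap-products-2.xml</loc></sitemap>
</sitemapindex>
```

You submit the index file in Webmaster Tools and the child sitemaps are discovered from it.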
Google wants to know when primary page content is updated, not when supplementary page content is modified – “if the content significantly changes, that’s relevant. If the content, the primary content, doesn’t change, then I wouldn’t update it.“
Weird index timestamps in your search snippet are probably “some kind of a time zone calculation”, which would explain why last week I saw a post published and indexed within about one second but with a datestamp of ‘3 hours ago’.
John says if a “random page” is unexpectedly cached instead of the home page, “usually, that’s a sign that something with canonicalization” is off – “make sure that the rel=canonical is working properly”.
Also check that your “URL parameter settings in Webmaster Tools” are not misconfigured.
John calls out some callers: “I imagine what was happening is that our algorithms were picking up on these, kind of, artificial backlinks” and, as “the Webmaster placed these. Therefore, we shouldn’t really count these as being, kind of, recommendations of your website“. John also told a caller he “wouldn’t focus on clicks” to improve rankings.
Some person wants to meet up with John when he visits Switzerland.
That sounded a little bit weird.
That’s all from the hangout – but at the weekend John pointed to a note from Matt Cutts about new top level domains now available.
Will a new TLD web address automatically be favoured by Google over a .com equivalent?
Matt writes: “I read a post by someone offering new top-level domains (TLDs). They made this claim: ‘Will a new TLD web address automatically be favoured by Google over a .com equivalent? Quite simply, yes it will.’”
Sorry, but that’s just not true, and as an engineer in the search quality team at Google, I feel the need to debunk this misconception. Google has a lot of experience in returning relevant web pages, regardless of the top-level domain (TLD). Google will attempt to rank new TLDs appropriately, but I don’t expect a new TLD to get any kind of initial preference over .com, and I wouldn’t bet on that happening in the long-term either. If you want to register an entirely new TLD for other reasons, that’s your choice, but you shouldn’t register a TLD in the mistaken belief that you’ll get some sort of boost in search engine rankings.
I also recently updated my SEO tips for beginners guide.