First – a definition of ‘duplicate content’:
Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar. Mostly, this is not deceptive in origin…..
It’s very important to understand that, as a small business owner, if you republish posts, press releases, news stories or product descriptions found on other sites, your pages are going to struggle to gain traction in Google’s SERPs (search engine results pages). If your site is made up entirely of republished content – Google does not want to rank it.
Mess up with duplicate content, and it might look like a penalty, as the end result is the same – you don’t rank.
A good rule of thumb is do NOT expect to rank high in Google with content found on other, more trusted sites, and don’t expect to rank at all if all you are using is automatically generated pages with no ‘value add’. While there are exceptions to the rule (and Google certainly treats your OWN duplicate content on your own site differently), your best bet in ranking in 2014 is to have one single version of content on your site with rich, unique text content that is written specifically for that page.
Google wants to reward RICH, UNIQUE, RELEVANT, INFORMATIVE and REMARKABLE content in its organic listings – and it’s really raised the quality bar over the last few years. If you want to rank high in Google for valuable key phrases and for a long time – you had better have good, original content for a start – and lots of it.
If you have many pages of similar content on your site, Google might have trouble choosing the page you want to rank, and it might dilute your capability to rank for what you do want to rank for. For instance, if you have PRINT ONLY versions of content, they can end up displaying in Google instead of your web page if you’ve not handled them properly. That’s probably going to have an impact on conversions.
Google Penalty For Duplicate Content On-Site?
Since I wrote this article back in 2009, Google has given us some explicit guidelines when it comes to managing duplication of content.
Generally speaking, Google will identify the best pages on your site if you have a decent on-site architecture. It’s usually pretty good at this, although how it treats specific duplicate content depends on a number of other factors.
The advice is to avoid duplicate content issues if you can, and this should be common sense. Google wants and rewards original content – it’s a great way to push up the cost of SEO and create a better user experience at the same time. Google doesn’t like it when ANY TACTIC is used to manipulate its results, and republishing content found on other websites is a common tactic of a lot of spam sites.
You don’t want to look anything like a spam site, that’s for sure – and Google WILL classify your site… as something.
The more you can make it look like a human built every page on a page by page basis, with content that doesn’t appear exactly in other areas of the site – the more Google will like it. Google does not like automation when it comes to building a website; that’s clear in 2014.
I don’t mind multiple copies of articles on the same site – as you find with WordPress categories or tags, but I wouldn’t have tags and categories, for instance, indexable on a small site, and especially not targeting the same keyword phrases.
I prefer to avoid unnecessary repeated content on my site, and when I do have automatically generated content (like my news feed), I tell Google to noindex it, either in a robots meta tag or in the X-Robots-Tag HTTP header. In this case I AM probably doing the safest thing, as that feed could be seen as scraped content if I intended to get it indexed.
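To check that a page really is sending a noindex signal, something like the following can help. It is only a minimal sketch: the function name, the sample pages and the simple comma splitting are my own assumptions, and it ignores crawler-specific forms of the header such as `googlebot: noindex`.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            content = attr.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]


def is_noindexed(page_html, headers=None):
    """True if the robots meta tag or the X-Robots-Tag header says noindex."""
    parser = RobotsMetaParser()
    parser.feed(page_html)
    found = set(parser.directives)
    for name, value in (headers or {}).items():
        if name.lower() == "x-robots-tag":
            found.update(d.strip().lower() for d in value.split(","))
    return "noindex" in found


# Hypothetical pages, for illustration only.
news_feed_page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
normal_page = "<html><head><title>Article</title></head></html>"

print(is_noindexed(news_feed_page))
print(is_noindexed(normal_page, {"X-Robots-Tag": "noindex"}))
```

The header route is handy for non-HTML files (feeds, PDFs), where there is no head section to put a meta tag in.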
Google won’t thank you, either, for spidering a calendar folder with 10,000 blank pages on it, or a blog with more categories than original content – why would they?
…in some cases, content is deliberately duplicated across domains in an attempt to manipulate search engine rankings or win more traffic. Deceptive practices like this can result in a poor user experience, when a visitor sees substantially the same content repeated within a set of search results. Google tries hard to index and show pages with distinct information. This filtering means, for instance, that if your site has a “regular” and “printer” version of each article, and neither of these is blocked with a noindex meta tag, we’ll choose one of them to list. In the rare cases in which Google perceives that duplicate content may be shown with intent to manipulate our rankings and deceive our users, we’ll also make appropriate adjustments in the indexing and ranking of the sites involved. As a result, the ranking of the site may suffer, or the site might be removed entirely from the Google index, in which case it will no longer appear in search results. GOOGLE
If you are trying to compete in competitive niches you need original content that’s not found on other pages in the exact same form on your site, and THIS IS EVEN MORE IMPORTANT WHEN THAT CONTENT IS FOUND ON OTHER PAGES ON OTHER WEBSITES.
Google isn’t under any obligation to rank your version of content – in the end it depends on whose site has the most domain authority or the most links coming to the page.
Avoid competing unnecessarily with these dupe pages: always rewrite your content if you think it will appear on other sites (especially if you are not the first to ‘break it’, if it’s news).
A Dupe Content Strategy?
There are strategies where this will still work, in the short term. Opportunities are reserved for long tail SERPs where the top ten results page is already crammed full of low quality results and the SERPs are shabby – certainly not a strategy for competitive terms.
There’s not a lot of traffic in long tail results unless you do it en masse, and that could invite further site quality issues, but sometimes it’s worth exploring whether using very similar content with geographic modifiers (for instance) on a site with some domain authority presents an opportunity. Very similar content can be useful across TLDs too. A bit spammy, but if the top ten results are already a bit spammy…
If low quality pages are performing well in the top ten of an existing long tail SERP – then it’s worth exploring – I’ve used it in the past, but I am not keen on it today. I always thought if it improves user experience and is better than what’s there in those long tail searches at present, who’s complaining? Unfortunately that’s not exactly best practice SEO in 2014, and I’d be nervous creating any type of low quality pages on your site these days.
Too many low quality pages might cause you site-wide issues in the future, not just page level issues.
Original Content Is King
Stick to original content, found on only one page on your site, for best results – especially if you have a new/young site and are building it page by page over time… and you’ll get better rankings and more traffic to your site (affiliates too!). Yes – you can be creative, and reuse and repackage content, but if I am asked to rank a page I will always require original content on that page.
There is NO NEED to block your own Duplicate Content
There was a useful post in Google forums a while back with advice from Google on how to handle very similar or identical content:
“We now recommend not blocking access to duplicate content on your website, whether with a robots.txt file or other methods” John Mueller
John also goes on to give some good advice about how to handle duplicate content on your own site:
- Recognize duplicate content on your website.
- Determine your preferred URLs.
- Be consistent within your website.
- Apply 301 permanent redirects where necessary and possible.
- Implement the rel=”canonical” link element on your pages where you can. (Note – Soon we’ll be able to use the Canonical Tag across multiple sites/domains too.)
- Use the URL parameter handling tool in Google Webmaster Tools where possible.
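The “determine your preferred URLs”, “be consistent” and “301” steps above can be sketched in a few lines. This is only an illustration under my own assumptions: the preferred scheme and host, the `index.html` rule and the function names are hypothetical, and in practice the redirect itself would be issued by your web server or CMS.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumptions for this sketch: the site prefers http://www.example.com
# and treats /index.html as a duplicate of the directory URL.
PREFERRED_SCHEME = "http"
PREFERRED_HOST = "www.example.com"


def preferred_url(url):
    """Map any variant of a URL onto the single preferred version."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]
    return urlunsplit((PREFERRED_SCHEME, PREFERRED_HOST, path, parts.query, ""))


def redirect_for(url):
    """Return (301, target) when the URL is a duplicate variant, else None."""
    target = preferred_url(url)
    return None if target == url else (301, target)


print(redirect_for("http://example.com/index.html"))
print(redirect_for("http://www.example.com/"))
```

Running every internal link through the same function is one simple way to stay consistent within your own website.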
Webmaster guidelines on content duplication used to say:
Consider blocking pages from indexing: Rather than letting Google’s algorithms determine the “best” version of a document, you may wish to help guide us to your preferred version. For instance, if you don’t want us to index the printer versions of your site’s articles, disallow those directories or make use of regular expressions in your robots.txt file. Google
but now Google is pretty clear they do NOT want us to block duplicate content, and that is reflected in the guidelines.
Google does not recommend blocking crawler access to duplicate content (dc) on your website, whether with a robots.txt file or other methods. If search engines can’t crawl pages with dc, they can’t automatically detect that these URLs point to the same content and will therefore effectively have to treat them as separate, unique pages. A better solution is to allow search engines to crawl these URLs, but mark them as duplicates by using the rel="canonical" link element, the URL parameter handling tool, or 301 redirects. In cases where DC leads to us crawling too much of your website, you can also adjust the crawl rate setting in Webmaster Tools. DC on a site is not grounds for action on that site unless it appears that the intent of the DC is to be deceptive and manipulate search engine results. If your site suffers from DC issues, and you don’t follow the advice listed above, we do a good job of choosing a version of the content to show in our search results.
Basically, you want to minimise dupe content rather than block it. Google says it really needs to detect an INTENT to manipulate Google to incur a penalty, and you should be OK if you aren’t doing this. BUT it’s easy to screw up and LOOK as if you are up to something, and it’s also easy to fail to get the benefit of proper canonicalisation and consolidation of relevant content if you don’t do basic housekeeping, for want of a better turn of phrase.
Advice on content spread across multiple domains:
Content Spread Across Multiple TLDs
Mobile SEO Advice
Canonical Link Element Best Practice
Google also recommends using the canonical link element to help minimise content duplication problems.
If your site contains multiple pages with largely identical content, there are a number of ways you can indicate your preferred URL to Google. (This is called “canonicalization”.)
Google SEO – Matt Cutts from Google shares tips on the new rel=”canonical” tag (more accurately – the canonical link element) that the 3 top search engines now support. Google, Yahoo!, and Microsoft have all agreed to work together in a “joint effort to help reduce duplicate content for larger, more complex sites, and the result is the new Canonical Tag”.
Example Canonical Tag From Google Webmaster Central blog:
<link rel="canonical" href="http://www.example.com/product.php?item=swedish-fish" />
You can put this link tag in the head section of the problem URLs, if you think you need it.
I add a self referring canonical link element as standard these days – to ANY web page.
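If your pages are generated by a script, adding a self-referring canonical can be automated. The snippet below is a rough sketch rather than a recommended implementation: both function names are hypothetical, and the plain string check for an existing canonical is a deliberate simplification.

```python
from html import escape


def canonical_link(url):
    """Build a canonical link element, escaping characters such as & and quotes."""
    return '<link rel="canonical" href="{}" />'.format(escape(url, quote=True))


def add_self_canonical(page_html, page_url):
    """Insert a self-referring canonical before </head> unless one already exists."""
    if 'rel="canonical"' in page_html:
        return page_html  # leave an existing canonical alone
    return page_html.replace("</head>", canonical_link(page_url) + "\n</head>", 1)


page = "<html><head><title>Swedish Fish</title></head><body></body></html>"
print(add_self_canonical(page, "http://www.example.com/product.php?item=swedish-fish"))
```

Escaping matters once a URL carries more than one parameter, because a raw `&` inside an attribute value is not valid HTML.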
Is rel=”canonical” a hint or a directive?
It’s a hint that we honor strongly. We’ll take your preference into account, in conjunction with other signals, when calculating the most relevant page to display in search results.
Can I use a relative path to specify the canonical, such as <link rel="canonical" href="product.php?item=swedish-fish" />?
Yes, relative paths are recognized as expected with the <link> tag. Also, if you include a <base> link in your document, relative paths will resolve according to the base URL.
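To see what a relative canonical actually resolves to, standard URL joining is a reasonable approximation. The sketch below uses Python’s `urllib.parse.urljoin`; the function name and the example URLs are my own assumptions, and I am not claiming this is exactly how Google resolves them.

```python
from urllib.parse import urljoin


def resolve_canonical(page_url, href, base_href=None):
    """Resolve a canonical href against the page URL, or against <base> if present."""
    base = urljoin(page_url, base_href) if base_href else page_url
    return urljoin(base, href)


page_url = "http://www.example.com/shop/product.php?item=swedish-fish&sort=price"
href = "product.php?item=swedish-fish"

# Without a <base> element the href resolves relative to the page itself.
print(resolve_canonical(page_url, href))

# With <base href="http://www.example.com/"> it resolves from the site root.
print(resolve_canonical(page_url, href, "http://www.example.com/"))
```

The two results differ by a directory, which is why an absolute URL is the less error-prone choice.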
Is it okay if the canonical is not an exact duplicate of the content?
We allow slight differences, e.g., in the sort order of a table of products. We also recognize that we may crawl the canonical and the duplicate pages at different points in time, so we may occasionally see different versions of your content. All of that is okay with us.
What if the rel=”canonical” returns a 404?
We’ll continue to index your content and use a heuristic to find a canonical, but we recommend that you specify existent URLs as canonicals.
What if the rel=”canonical” hasn’t yet been indexed?
Like all public content on the web, we strive to discover and crawl a designated canonical URL quickly. As soon as we index it, we’ll immediately reconsider the rel=”canonical” hint.
Can rel=”canonical” be a redirect?
Yes, you can specify a URL that redirects as a canonical URL. Google will then process the redirect as usual and try to index it.
What if I have contradictory rel=”canonical” designations?
Our algorithm is lenient: We can follow canonical chains, but we strongly recommend that you update links to point to a single canonical page to ensure optimal canonicalization results.
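Canonical chains are easy to create by accident, so it is worth auditing your own site for them. The following is only a sketch of such an audit, not a description of Google’s algorithm: it assumes you have already crawled your pages into a hypothetical `canonical_of` dictionary mapping each URL to the canonical it declares.

```python
def final_canonical(url, canonical_of, max_hops=10):
    """Follow declared canonicals until a page points at itself (or declares none)."""
    seen = [url]
    while canonical_of.get(url, url) != url:
        url = canonical_of[url]
        if url in seen:
            raise ValueError("canonical loop: " + " -> ".join(seen + [url]))
        seen.append(url)
        if len(seen) > max_hops + 1:
            raise ValueError("canonical chain longer than %d hops" % max_hops)
    return url


# Hypothetical crawl data: the print page points at a tracking URL,
# which in turn points at the real page.
canonical_of = {
    "/print/fish": "/fish?ref=1",
    "/fish?ref=1": "/fish",
    "/fish": "/fish",
}

print(final_canonical("/print/fish", canonical_of))
```

Any URL whose declared canonical differs from the final one is a candidate for updating so that it points straight at the single canonical page.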
Can this link tag be used to suggest a canonical URL on a completely different domain?
**Update on 12/17/2009: The answer is yes! We now support a cross-domain rel=”canonical” link element.**
Tip – Redirect old, out of date content to new, fresh articles on the subject, minimising low quality pages and duplicate content whilst at the same time improving the depth and quality of the page you want to rank. See our page on 301 redirects – http://www.hobo-web.co.uk/how-to-change-domain-names-keep-your-rankings-in-google/.