Duplicate Content
Eliminate Duplicate Content on Your Own Site
Duplicate content comes in many forms. As I mentioned earlier, the title and description on each page should be unique.
Some parts of your page – like the header, footer and sidebar – will likely have duplicate content. This is OK – the search engines are pretty good at identifying these sections of your page and understanding that this consistency is necessary for your visitors to be able to navigate your site. The important thing is to have enough unique text content on each page that the search engines can differentiate one page from another.
For most sites, this is not a problem. Each of your pages should be on a different topic and have plenty of text for both human and bot visitors.
E-commerce sites, on the other hand, often run into trouble with duplicate content because many product pages have very little text to differentiate one product from another, especially when the products are similar. Here are two tips to help solve that problem:
- Write a unique description for each product
- Implement a user rating system where customers can rate and review each product (e.g. Amazon.com), or use some other relevant UGC strategy to get your visitors to create the unique content for you
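If you want to find out which product pages are too similar before rewriting them, you can compare their descriptions programmatically. The following is a minimal sketch, not a definitive tool. It assumes you have already extracted each page's description text into a dictionary keyed by URL. The function names, the three-word shingle size and the 0.8 threshold are illustrative choices, not anything the search engines publish.

```python
import re
from itertools import combinations


def shingles(text, size=3):
    """Return the set of overlapping word n-grams (shingles) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Very short text: treat the whole thing as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: shared items / all items."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def near_duplicates(pages, threshold=0.8):
    """Yield (url_a, url_b, score) for page pairs at or above threshold.

    pages maps a URL (hypothetical in this sketch) to its description text.
    The threshold is an assumption; tune it against your own catalog.
    """
    sets = {url: shingles(text) for url, text in pages.items()}
    for url_a, url_b in combinations(sorted(sets), 2):
        score = jaccard(sets[url_a], sets[url_b])
        if score >= threshold:
            yield url_a, url_b, score
```

Calling list(near_duplicates(pages)) on your extracted descriptions returns the pairs worth rewriting first. Comparing every pair is fine for a few thousand products; a larger catalog would need a smarter approach.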
Unfortunately, many duplicate content issues are caused by the CMS. When you publish a new page, the CMS might not only publish to the primary page, but also to various archive pages (dated archives, category pages, tag pages, etc.). Structurally, this may be convenient for human visitors, but bots will be confused. The search engines will have to look at all the duplicates and then try to determine which one should show up in the SERPs.
Don’t let them decide for you! Check your CMS to see if these types of issues exist and look for a solution to make sure only one copy of each piece of content is indexed by the search engines. Here are a few examples of how to solve this type of problem:
- Modify your CMS so only the primary content page shows the full text. Archive pages could show a summary or short snippet of the content and then link to the content page for those who want to view the whole thing.
- Use the meta robots tag in your pages to keep all the duplicate copies of the page out of the index: e.g.
<meta name="robots" content="noindex,follow">
- Use robots.txt to disallow entire archive directories if they don’t offer any original content of their own: e.g.
User-agent: *
Disallow: /archive
This would keep all robots from crawling any files in your archive directory (or at least all robots who care to follow the command). See Jane and Robot for more details and advanced use cases.
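Both the robots.txt rule and the meta robots tag are easy to get subtly wrong, so it is worth checking them before you rely on them. The sketch below uses only Python's standard library to test whether a robots.txt body blocks a given URL and whether a page carries a noindex directive. The helper names and the example.com URLs are hypothetical, and real crawlers may treat edge cases differently from Python's parser.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt, url, user_agent="*"):
    """Return True if the robots.txt text disallows url for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


class _RobotsMetaFinder(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for item in attr.get("content", "").split(","):
                self.directives.add(item.strip().lower())


def has_noindex(html):
    """Return True if the page's meta robots tag includes noindex."""
    finder = _RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /archive\n"
    # Hypothetical URLs, used only to exercise the rule.
    print(is_blocked(robots, "https://example.com/archive/2009/05/"))
    print(is_blocked(robots, "https://example.com/blog/my-post"))
```

Keep in mind that a Disallow rule is a prefix match, so /archive also covers a path such as /archives/. Check your URL structure before deploying the rule.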
How to Deal with Content Theft
Eventually, you will run into duplicate content issues on other websites, especially if you publish content regularly. Sometimes this will be a problem for you, sometimes not.

You may have heard of Google’s “duplicate content penalties.” For the most part, this is a misnomer. Google rarely penalizes a site for duplicate content; rather it tries to determine the original source of the content to index and then ignores the duplicates. Hopefully your site is rightfully recognized as the original source.
But if the other site has more authority, more links or some other advantage, the search engines may rank it ahead of you for your own content. Obviously, that’s not a good situation to be in. If this happens, you can do one or more of the following:
- First, try to contact the owner of the site directly by looking up their WHOIS information. Usually, a simple request will be enough to get the offending website to remove your stolen content because they don’t want to take the chance that you’ll follow up with any of the next steps.
- File a Google spam report: www.google.com/contact/spamreport.html
- File a DMCA takedown notice to request removal of the offending material. For specifics, read this article by Lorelle and this one by FreelanceSwitch.
- Contact the site owner’s hosting provider if you can find out who it is. Most hosts don’t want to deal with any potential legal problems, so if you make enough of a stink they’ll likely “encourage” the website owner to fix the problem.
The unfortunate fact is that you’ll never get them all. There will always be other sites grabbing your content and republishing it in different ways (especially if you have RSS feeds or other syndication technologies that they can easily scrape – and no, the answer isn’t to stop publishing content!). Choose your battles wisely, because if you spend all your time chasing content thieves you won’t produce anything worthwhile.
