Duplicate content occurs when snippets or entire pages of your website content appear in more than one location on the Internet (1). This causes problems for your search engine optimization (SEO) because when search engines scan sources that contain duplicate content, they don't know which page to show to the user.
Search engine crawlers discover information by following links to unique URLs, and sometimes different URLs lead them to exactly the same content. Managers of large, dynamic websites often create two or more URLs that link to the same content without even realizing it. This is a very common occurrence in website development. Having duplicate content does not always result in a penalty from Google, but these issues are important to address. Duplicate content can cause search engines to ignore your page, or worse, your entire website (2).
In February 2011, Google released an algorithm update, popularly known as Panda, which prevents websites with low-quality or thin content from ranking well in search results. This change was a reaction to widespread content theft in the Web development community.
Since then, Google has released several updates to Panda and now offers tips to webmasters about how it distinguishes the good from the bad. Duplicate content is one part of a broader evaluation of what makes content high quality, and Web developers have become much more attentive to it than in the past. Before attempting to address duplicate content, business owners should first understand what causes it.
Ecommerce websites are particularly susceptible to duplicate content because they often have multiple categories and product pages with shared attributes. An online store that has 80 products, each with similar features and descriptions, can easily look like overlapping duplicate content to Google's crawlers. Even though the process can be time-consuming, writing unique descriptions for all of your products can be the best way to prevent your pages from looking the same to a search engine crawler.
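One way to find the product pages most in need of a rewrite is to compare descriptions against each other. The following is a minimal sketch in Python using the standard library's difflib. The function name, the sample catalog, and the 0.9 threshold are all assumptions made for illustration; this is not how Google measures duplication.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar_descriptions(descriptions, threshold=0.9):
    """Return (id_a, id_b, ratio) for description pairs at or above threshold.

    `descriptions` maps a product ID to its description text. The 0.9
    default is an arbitrary illustration, not a value used by any
    search engine.
    """
    flagged = []
    for (id_a, text_a), (id_b, text_b) in combinations(descriptions.items(), 2):
        # ratio() is 1.0 for identical strings and approaches 0.0 for unrelated ones.
        ratio = SequenceMatcher(None, text_a.lower(), text_b.lower()).ratio()
        if ratio >= threshold:
            flagged.append((id_a, id_b, round(ratio, 2)))
    return flagged


# Hypothetical catalog data, for illustration only.
catalog = {
    "mug-red": "Ceramic mug, 12 oz, dishwasher safe, available in red.",
    "mug-blue": "Ceramic mug, 12 oz, dishwasher safe, available in blue.",
    "kettle": "Stainless steel electric kettle with automatic shut-off.",
}

for id_a, id_b, ratio in similar_descriptions(catalog):
    print(f"{id_a} and {id_b} look nearly identical (similarity {ratio})")
```

In this sample, only the two mug descriptions are flagged, since they differ by a single word. Comparing every pair is fine for a store with 80 products but grows quadratically, so a much larger catalog would need a different approach.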
There are dozens of reasons why duplicate content can pop up on a website, and most of the time the culprit is a technical mistake (3).
Variations in the URL: People often make the mistake of believing that each Web page is a separate file stored on a Web server. In reality, the same content can often be reached through many different URLs. For instance, many marketers append tracking parameters to URLs so they can follow inbound traffic in Google Analytics. Google sees each parameterized address as a completely new URL that points to exactly the same content as another. Most marketers believe that UTM parameters don't cause major SEO problems, but if you want to rule the issue out, use a canonical tag in the Web page code to specify the URL that represents the preferred version of the page.
Domains with and without www: Browsers today will display a website to users regardless of whether or not they typed "www." Duplicate content issues arise when search engines treat the www and non-www addresses as two different versions of the same page. Again, the best way to fix this is to use a rel=canonical tag in the code, telling Google which page is the best version for scanning.
Session IDs: When users shop on your website, they need to be able to store items in their carts while they browse other pages. The site keeps track of each visitor with a session ID, an identifier that is unique to every user. Sometimes this identifier is embedded in the URL, essentially giving every user their own set of URLs for your website, and search engines will read all of them as duplicate content. Most ecommerce platforms offer an option in their settings to remove session IDs from URLs.
Pagination: If you paginate longer posts within your website, or create comment pages, you can end up creating duplicate pages. Most content management systems offer a setting to disable that feature and head off the problem.
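The URL-related causes above (tracking parameters, www versus non-www hosts, and session IDs in the query string) all come down to many addresses pointing at one page. As a rough sketch, the Python below collapses such variants into one preferred URL and builds the matching canonical tag. The preferred host, the session parameter names, and both function names are assumptions for illustration; check which parameters your own platform actually uses.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed values: replace with your own preferred domain and parameter names.
PREFERRED_HOST = "www.example.com"
SESSION_PARAMS = {"sid", "sessionid", "phpsessid"}


def _strip_www(host):
    """Return the host without a leading 'www.' label."""
    return host[4:] if host.startswith("www.") else host


def canonical_url(url):
    """Collapse common URL variants of one page into a single preferred URL."""
    parts = urlsplit(url)

    # Treat the www and non-www hosts as the same site.
    host = parts.netloc.lower()
    if _strip_www(host) == _strip_www(PREFERRED_HOST):
        host = PREFERRED_HOST

    # Drop tracking (utm_*) and session parameters; keep everything else.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith("utm_")
        and key.lower() not in SESSION_PARAMS
    ]

    # The fragment is discarded because it never identifies a separate page.
    return urlunsplit((parts.scheme, host, parts.path, urlencode(kept), ""))


def canonical_link_tag(url):
    """Build the <link> element that goes in the page's <head>."""
    href = escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'


tracked = "http://example.com/shoes?utm_source=newsletter&color=red&sid=abc123"
print(canonical_url(tracked))
# prints: http://www.example.com/shoes?color=red
print(canonical_link_tag(tracked))
# prints: <link rel="canonical" href="http://www.example.com/shoes?color=red">
```

This sketch leaves the scheme untouched, does not handle session IDs embedded in the path, and does nothing about pagination, so treat it as a starting point for auditing URLs and not as a complete fix.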
You can see if your website has duplicate content by checking your Google Webmaster Tools account. Look in the section called "Optimization," and there you will see "HTML Improvements." Duplicate items will be listed there. You can also use SEO analytics programs, like Moz, which have crawl diagnostic tools that will tell you if problems exist.
Even though most duplicate content is unintentional and rarely earns a penalty from search engines, it's a good practice to stay attentive to how Google is reading your website, particularly if you add new content or products often.