The debate over exactly what duplicate content is, and whether or not it is a problem, has been underway for some time now and shows little sign of dying down. So what exactly is meant by duplicate content, and does it really matter?
The widely accepted view is that duplicate content is an important issue. One well-known and highly respected search engine optimization expert recently wrote an article opposing this view, but even a cursory look at the huge mass of material written on the subject recently will show that this is a minority opinion.
If we agree that duplicate content is in fact important, then just how can we define it? If I produce an original article for an article directory and then rewrite that same article for submission to a second directory, how are the search engines going to evaluate these two articles and decide whether or not they contain duplicate content? The simple answer is that we do not know, but here is one writer's opinion.
When the search engines first began checking for duplicate content, it was very much a matter of comparing one web page as a whole against another, and no attempt was made to dissect the two pages and compare their individual elements. In those days it was possible to use identical content and merely add an introduction and conclusion to one of the two pages, and that would be enough to escape any duplicate content penalty. Sadly for many publishers, those days have long since disappeared.
The search engines now cut up the two pages and examine individual elements, and here is the core of today's argument. Most experts agree that attention is now directed towards the main content of a web page rather than its structure. Many site owners build their pages from templates which define the structure of each page, including things like navigation menus, headers and footers. This is widely thought to be acceptable, and the search engines do not treat it as duplicate content. What the search engines are concerned about is the informational content in the body of the page. But just how do they examine this page content?
Some people believe that this checking is undertaken at 'block' level (that is to say, at the level of individual sentences or paragraphs), while others argue that filters look for phrases or even for individual words. No one really knows the answer, although it seems reasonable to assume that the most likely basis for checking is either sentence or phrase matching.
Sentence matching is reasonably clear-cut and simply involves breaking both pages down into chunks defined by the page's punctuation. For instance, look at this sentence:
It is quite easy to get a good deal on a camera, as long as you know where to go.
This could be viewed either as a single sentence or as two sentences, depending upon whether you use the time-honored definition of a full stop as the end of a sentence or adopt a more flexible approach and also break at other punctuation marks, such as commas.
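To make the two readings concrete, here is a minimal sketch in Python of how sentence matching might work. It is only an illustration of the idea described above, not a description of what any search engine actually does; the function names split_chunks and shared_chunks and the flexible switch are my own hypothetical choices.

```python
import re

def split_chunks(text, flexible=False):
    """Break text into chunks at punctuation (hypothetical helper).

    Strict mode ends a chunk only at '.', '!' or '?'.
    Flexible mode also ends a chunk at commas, semicolons and colons.
    """
    pattern = r"[.!?,;:]+" if flexible else r"[.!?]+"
    # Lowercase and trim each chunk so trivial differences do not hide a match.
    return [c.strip().lower() for c in re.split(pattern, text) if c.strip()]

def shared_chunks(page_a, page_b, flexible=False):
    """Return the chunks that appear on both pages."""
    return set(split_chunks(page_a, flexible)) & set(split_chunks(page_b, flexible))

example = ("It is quite easy to get a good deal on a camera, "
           "as long as you know where to go.")

print(split_chunks(example))                 # one chunk under the strict rule
print(split_chunks(example, flexible=True))  # two chunks under the flexible rule
```

Under the flexible rule, a rewritten article that keeps only the second half of the example sentence would still share one chunk with the original, whereas under the strict rule it would share none.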
Phrase matching is somewhat more complicated. How do you define a phrase? Should a phrase be 2 words or 3 words or 4 words or...?
For the moment let us say that a phrase is defined as 3 words. If this is the case the following phrases would all be classed as duplicate content if they appeared on two pages which were being examined:
Take a look
In those days
You can get
The answer is
Day to day
One way to
Did you know
These seven phrases are all standard everyday phrases that could appear on pages about building a greenhouse, fighting breast cancer, healthy eating or any other subject you can think of. Now there are a few people who would say that the search engines do compare pages down to this level. To illustrate this, when I asked the staff of one popular duplicate checker (DupeCop) how their program examined duplicate content, they said:
"DupeCop compares both individual words and 3-word phrases. It also ignores all punctuation and scans across sentences"
It was not a surprise, therefore, that when I checked several articles using their system ... guess is as good as mine.
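For readers who want to see what checking at this level looks like, here is a minimal sketch in Python of 3-word phrase matching that ignores punctuation and scans across sentence boundaries, as in the description quoted above. It is my own illustration under those stated assumptions and not DupeCop's actual code or any search engine's filter; the names word_list, phrases and shared_phrases, and the two sample pages, are hypothetical.

```python
import re

def word_list(text):
    """Lowercase the text and keep only the words, dropping all punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())

def phrases(text, n=3):
    """Return every run of n consecutive words, ignoring sentence boundaries."""
    words = word_list(text)
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def shared_phrases(page_a, page_b, n=3):
    """Return the n-word phrases that appear on both pages."""
    return phrases(page_a, n) & phrases(page_b, n)

# Two made-up pages on unrelated subjects.
page_a = "Did you know that building a greenhouse is simple? Take a look at these plans."
page_b = "Take a look at your diet. Did you know that healthy eating saves money?"

print(sorted(shared_phrases(page_a, page_b)))
# ['a look at', 'did you know', 'take a look', 'you know that']
```

The two sample pages have nothing in common beyond everyday turns of phrase, yet they share four 3-word phrases, which is exactly why matching at this level would flag almost any pair of pages.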
Over the last few years I have published literally hundreds of articles and have watched the results in terms of duplicate content penalties, as far as any of us can do so. Based upon my own experience, I believe that filtering is not conducted all the way down to the level of 3 or 4 word phrases but is far more likely to stop at sentence level. Accordingly, provided you rewrite articles down to sentence level, you should not have a problem escaping the content filters. In fact, even if a couple of sentences are duplicated you should still be okay.