WordPress Duplicate Content Vulnerability

Posted by Saad Hamid on August 11, 2007


The following is a guest post by Lovedeep Wadhwa, a technology analyst who shares blogging tips and Internet-related news on his blog, Freakitude. Please visit his blog for more articles on blogging and technology.

WordPress Duplicate Content Bug

Greg Mulhauser brought to my attention a duplicate content vulnerability present in WordPress and Movable Type.

If you use permalinks on your WordPress blog, you may have noticed that the content of each post is accessible from an unlimited number of different URLs. You just have to append a sequence of extra digits to the end of a post’s URL.

For example, take a look at this latest post from Matt Cutts’s blog.

The same post is also available at these URLs:

http://www.mattcutts.com/blog/minty-fresh-indexing/123456/

http://www.mattcutts.com/blog/minty-fresh-indexing/45678/

When we access the post through these URLs, WordPress doesn’t return a 301 redirect or a 404 error; it simply serves the post content at each of them.
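To check whether a blog is affected, the test described above can be scripted. The following is a minimal sketch, not a definitive tool: the helper names `duplicate_variants` and `fetch_status` are made up for illustration, and the base permalink is assumed to be the example URL with its digit segment removed.

```python
import http.client
from urllib.parse import urlsplit


def duplicate_variants(post_url, digit_suffixes):
    """Build duplicate URLs by appending an all-digit segment to a permalink."""
    base = post_url.rstrip("/")
    return [f"{base}/{digits}/" for digits in digit_suffixes]


def fetch_status(url, timeout=10):
    """Return the raw HTTP status code for url.

    http.client does not follow redirects, so a 301 is reported as 301
    instead of being silently resolved to the final page.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        conn.request("HEAD", parts.path or "/")
        return conn.getresponse().status
    finally:
        conn.close()


# Hypothetical usage (performs real network requests, so it is left commented out):
# for url in duplicate_variants(
#     "http://www.mattcutts.com/blog/minty-fresh-indexing/", ["123456", "45678"]
# ):
#     print(url, fetch_status(url))
```

A status of 200 for a digit-suffixed URL means the duplicate is being served; a 301 or 404 means that URL is not exposed as duplicate content.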

Fix The Duplicate Content Bug

If you are on a self-hosted WordPress blog, you can fix the vulnerability by placing the following rules in your .htaccess file.

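<IfModule mod_rewrite.c>
RewriteEngine On
# Leave paginated listings such as /page/2/ alone
RewriteCond %{REQUEST_URI} !.*(/page/[0-9]*/?)$
# Leave year, month and day archives (years 2000-2009) alone
RewriteCond %{REQUEST_URI} !^/200[0-9]/?$
RewriteCond %{REQUEST_URI} !^/200[0-9]/[01][0-9]/?$
RewriteCond %{REQUEST_URI} !^/200[0-9]/[01][0-9]/[0-3][0-9]/?$
# Strip a trailing all-digit segment and 301-redirect to what remains
RewriteRule (.*)(/[0-9]+/?)$ $1/ [R=301,L]
</IfModule>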

Anyone who tries to access the post through one of these URLs will get an SEO-friendly 301 redirect to the original post.
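To see which URLs the rules above would redirect, their logic can be approximated in a few lines of Python. This is only a sketch under stated assumptions: the .htaccess file is assumed to sit in the site root (so the RewriteRule pattern sees the path without its leading slash), the relative target is assumed to resolve against the site root, and `redirect_target` and the `/my-post/` paths are hypothetical names used for illustration.

```python
import re

# The four RewriteCond patterns; each is negated in .htaccess, so a match
# on any of them means the rule is skipped.
EXCLUDED = [
    re.compile(r".*(/page/[0-9]*/?)$"),
    re.compile(r"^/200[0-9]/?$"),
    re.compile(r"^/200[0-9]/[01][0-9]/?$"),
    re.compile(r"^/200[0-9]/[01][0-9]/[0-3][0-9]/?$"),
]

# The RewriteRule pattern: anything followed by a trailing all-digit segment.
RULE = re.compile(r"(.*)(/[0-9]+/?)$")


def redirect_target(request_uri):
    """Return the path the rules would 301 to, or None if no redirect applies."""
    if any(pattern.search(request_uri) for pattern in EXCLUDED):
        return None
    # Assumption: per-directory context at the site root, where the rule
    # is matched against the path without its leading slash.
    local_path = request_uri[1:] if request_uri.startswith("/") else request_uri
    match = RULE.search(local_path)
    if match is None:
        return None
    return "/" + match.group(1) + "/"


for uri in ["/my-post/123456/", "/my-post/", "/2007/08/", "/page/2/", "/2010/08/"]:
    print(uri, "->", redirect_target(uri))
```

Note the last case: because the conditions only cover years matching `200[0-9]`, a month archive such as `/2010/08/` is not excluded and would be redirected to `/2010/`, so the year pattern needs widening on a blog with later archives.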

If you use a www preference on your blog, you need to take some special precautions to avoid a double redirect.

I recommend reading this post to learn more. It also explains how to fix this duplicate content issue on Movable Type and WordPress MU blogs.
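The double redirect is easier to see with a small simulation. The sketch below is illustrative only: it assumes a separate canonical-host redirect (here `force_www`) that runs after the digit-stripping rule, it omits the RewriteCond exclusions for brevity, and `www.yourdomain.com`, `strip_digits`, `force_www` and `redirect_chain` are placeholder names, not anything WordPress or Apache provides.

```python
import re
from urllib.parse import urlsplit, urlunsplit

RULE = re.compile(r"(.*)(/[0-9]+/?)$")
CANONICAL_HOST = "www.yourdomain.com"  # placeholder for the preferred host


def strip_digits(url, absolute_host=None):
    """Simulate the digit-stripping redirect; None if the rule does not match."""
    parts = urlsplit(url)
    match = RULE.search(parts.path)
    if match is None:
        return None
    # A relative target keeps the requested host; an absolute target
    # sends the visitor straight to the preferred host.
    host = absolute_host or parts.netloc
    return urlunsplit((parts.scheme, host, match.group(1) + "/", "", ""))


def force_www(url):
    """Simulate a canonical-host redirect; None if the host is already canonical."""
    parts = urlsplit(url)
    if parts.netloc == CANONICAL_HOST:
        return None
    return urlunsplit((parts.scheme, CANONICAL_HOST, parts.path, parts.query, ""))


def redirect_chain(url, absolute_host=None, max_hops=10):
    """Follow the simulated redirects and return every hop in order."""
    hops = []
    for _ in range(max_hops):
        next_url = strip_digits(url, absolute_host) or force_www(url)
        if next_url is None:
            break
        hops.append(next_url)
        url = next_url
    return hops


print(redirect_chain("http://yourdomain.com/my-post/123456/"))
print(redirect_chain("http://yourdomain.com/my-post/123456/",
                     absolute_host=CANONICAL_HOST))
```

With the relative target the visitor takes two hops (first to the digit-free URL on the bare host, then to the www host); with the preferred host written into the target, a single 301 is enough.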

8 comments

Dj Flush August 11, 2007 at 11:14 am

Lovedeep, thanks a lot for bringing this issue to the attention of all bloggers out there.

Especially for bloggers like me who care a great deal about their blog’s search engine optimization, this post is truly a must-read.

I am worried only because I do have the www preference set on my blog, and I am still unsure how to fix the bug without triggering a double redirect.


Shankar Ganesh August 11, 2007 at 11:39 am

Many thanks for bringing this news to light. And thanks for the remedial measure as well ;)


Lovedeep Wadhwa August 11, 2007 at 1:27 pm

Thanks for the comments. :)

If you are using a www preference, insert http://www.yourdomain.com/ in front of the $1 (using your own domain) to avoid a double redirect.

RewriteEngine On
RewriteCond %{REQUEST_URI} !.*(/page/[0-9]*/?)$
RewriteCond %{REQUEST_URI} !^/200[0-9]/?$
RewriteCond %{REQUEST_URI} !^/200[0-9]/[01][0-9]/?$
RewriteCond %{REQUEST_URI} !^/200[0-9]/[01][0-9]/[0-3][0-9]/?$
RewriteRule (.*)(/[0-9]+/?)$ http://www.yourdomain.com/$1/ [R=301,L]


DarrinW August 11, 2007 at 1:42 pm

Thanks, Lovedeep and DJ,

That’s a very good tip. I’ll try it and see if it helps my SEO.

Now it’s harder to find out about supplemental pages because of the Google changes, and supplemental pages are usually caused by duplicate content.


Dj Flush August 11, 2007 at 2:08 pm

Lovedeep, buddy, thanks a lot for helping me out.

DarrinW, you are welcome :)


Lovedeep Wadhwa August 11, 2007 at 3:47 pm

You are welcome DJ :)


Andrew August 12, 2007 at 4:10 pm

Fantastic tip – thanks very much for the heads up.


Ross February 7, 2009 at 6:55 pm

Thank you, this is an amazing post, great share.
