How to spot web-site plagiarism and copyright theft with CopyScape

AWARE has had a web-site since 1995 and our current domain (www.marketing-intelligence.co.uk) has been active since 1997. When we started there were fewer than 100,000 companies on the web. Google’s founders had not yet met each other, and even venerable search engines such as AltaVista had not yet started.

Over the years, we’ve made an effort to ensure that our web content was not copied and used on other sites without our permission.

Searching manually for key phrases is one way of checking for plagiarism and copyright theft, but there are also a number of dedicated plagiarism-checking sites. One example is Plagium. Plagium’s drawback, shared with several similar services, is that you have to paste in the text you want to test rather than just enter the URL. Such services are generally aimed at helping teachers and college professors detect student cheating.
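For anyone who does want to try the manual route, here is a minimal sketch of how the key phrases might be chosen. It is only an illustration: the function name `key_phrases` and its parameters are made up for this example, and it rests on the assumption that longer sentences are more distinctive and so make better search phrases. The output is a list of phrases wrapped in quotation marks, ready to paste into a search engine as exact-match searches.

```python
import re


def key_phrases(text, count=3, min_words=8, max_words=12):
    """Return up to `count` quoted phrases taken from the longest sentences."""
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    candidates = []
    for sentence in sentences:
        # Keep words only, so punctuation does not end up in the phrase.
        words = re.findall(r"[\w'’-]+", sentence)
        if len(words) >= min_words:
            candidates.append(words)
    # Assumption: longer sentences are more distinctive, so try them first.
    candidates.sort(key=len, reverse=True)
    # Trim each phrase so it stays short enough for a search box.
    return ['"' + " ".join(words[:max_words]) + '"' for words in candidates[:count]]


if __name__ == "__main__":
    page_text = "Short one. This sentence has exactly eight words in it now. Tiny."
    for phrase in key_phrases(page_text):
        print(phrase)
```

Each phrase then has to be searched for by hand, which is why this approach quickly becomes tedious for a site with many pages.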

Although some services (such as Plagium) are free, most are not and may involve downloading dedicated software. Others only check a limited number of known “essay” type sites where students can download essays written by others. (We’ve found some of our content on such sites – evidently students who use them don’t care where they steal their content from. Once they have used it successfully, they then try to reuse it by uploading their A+ essay to the site.)

CopyScape

Of all the plagiarism detection websites, probably the easiest and best is CopyScape. CopyScape’s aim is not only to help academics detect student cheating: it also allows webmasters to search for copied content in general. It doesn’t require users to paste in the suspect text. Instead, web-site owners simply enter their URLs and get a report on other sites that use similar or identical wording. It’s sufficiently powerful that there is even a flippant web-page on ContentBoss’s website giving advice on how to bypass CopyScape and copy with impunity. (ContentBoss promises to provide unique content at a low monthly fee. Their “bypass CopyScape” tool uses a technique that will convert content into HTML guaranteed not to be picked up by plagiarism detectors. The catch, as ContentBoss itself points out, is that using such content is also a guarantee that the site will be banned by search engines for spam content.)

We’ve used CopyScape periodically over the years, and the miscreants it has uncovered include a competitor site that copied multiple pages from our site. We asked the site owner to change his pages and were ignored. We then took stronger action and within a couple of days the site was taken down. Another example involved an article published in a professional journal that took, almost verbatim, the content of our brief guide to competitive intelligence. We notified the publisher, who ensured that the payment made to the “author” was recovered and that an apology was published. The author said that he thought that material published on the web was copyright free. He was shown to be wrong.

Our most recent trawl for examples of copyright theft from AWARE’s pages turned up further examples where wording we’ve used has been stolen. The following images should show how effective the tool is – while at the same time naming and shaming the companies that are too weak, lazy or incompetent to produce their own copy and have to steal from others. (I’ve named them – but won’t give them the satisfaction of a link as this could help their search engine optimisation efforts – if they have any!)

The first example shows how text that appears on the footer of most of our pages is plagiarised.

This is the original text.

CopyScape found several sites had copied this text almost verbatim – for example Green Oasis Associates based in Nigeria:

or ICM Research from Italy and Pearlex from Virginia in the USA.

The ICM Research example is in fact the worst of these three, as their site has taken content from several other AWARE web-site pages.

The problem is that a company that is willing to steal content from other businesses is unethical – breaking the rule against misrepresenting who you are. If they are willing to steal content from others, they may also take short-cuts in the services provided and as a result should not be trusted to provide a competent service.

The page that is most often plagiarised is the Brief Guide to Competitive Intelligence Page, mentioned above. Clicking on a link found by CopyScape highlights the copied portions as seen in the following examples from AGResearch, Emisol and Wordsfinder.

Generally sites do not copy whole pages (although this does happen) but integrate chunks of stolen text into their pages – as seen in the AGResearch example, below – where 12% of the page is copied, and Wordsfinder where 13% has been copied.
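To give a feel for what a “percentage copied” figure can mean, here is a rough sketch of one way such a number could be calculated. It is not CopyScape’s actual method, which isn’t described here. It assumes a simple technique: break both texts into overlapping runs of five consecutive words, then count how many of the suspect page’s words fall inside a run that also appears in the original. The function names and the run length of five are choices made purely for this illustration.

```python
import re


def words(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def shingles(tokens, size=5):
    """Return the set of all runs of `size` consecutive words."""
    return {tuple(tokens[i:i + size]) for i in range(len(tokens) - size + 1)}


def percent_copied(original, suspect, size=5):
    """Percentage of the suspect's words lying in a run shared with the original."""
    source = shingles(words(original), size)
    tokens = words(suspect)
    copied = [False] * len(tokens)
    for i in range(len(tokens) - size + 1):
        if tuple(tokens[i:i + size]) in source:
            # Mark every word in the matching run as copied.
            for j in range(i, i + size):
                copied[j] = True
    return 100.0 * sum(copied) / len(tokens) if tokens else 0.0


if __name__ == "__main__":
    ours = "competitive intelligence is the process of gathering information about competitors"
    theirs = "we think " + ours + " every day"
    print(round(percent_copied(ours, theirs), 1))
```

A short run length catches more copying but also flags common phrases, while a long one misses text that has been lightly reworded, so any figure produced this way is indicative rather than exact.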

The Emisol example below stole less, although it copied key parts of the guide page:

Conclusion

Copyright theft is a compliment to the author of the original web-page, as it shows that the plagiarising site views their competitor as top quality. However, the purpose of writing good copy is to stand out and show one’s own capabilities. Sites that steal other sites’ work remove this advantage as they make the claims seem anodyne and commonplace. They devalue both the copier, who cannot come up with their own material (and so is unlikely to be able to provide a competent service anyway), and the originator, as most people won’t be able to tell who came first. Fortunately search engines can, and when they detect duplication, they are likely to downplay the duplicated material, meaning that such sites are less likely to appear high up in search engine rankings. The danger is that both the originator of the material and the plagiariser may get penalised by search engines – which is another reason to ensure that copyright thieves are caught and stopped. CopyScape is one tool that really works in protecting authors from such plagiarism.

  1. Babette
    March 12, 2012 at 2:12 am

    Amazing Arthur. Thank you for the heads up – I had no idea! In fact I spotted something that Yves Michel Marti presented at a conference back in 1995 on one of the texts you highlighted. I always learn so much from you. Thanks again.

  2. March 12, 2012 at 4:45 am

    Excellent job on the writeup!

  3. March 12, 2012 at 5:17 am

    A friend recommended me to your website. Thank you for the details.

  4. March 12, 2012 at 8:10 am

    Hello, I do believe your site could be having
    web browser compatibility issues. Whenever I take a look at your site in Safari, it looks fine however, when opening in
    IE, it’s got some overlapping issues. I merely wanted to provide you with a quick heads up! Besides that, fantastic website!

    • March 13, 2012 at 1:02 pm

      Thanks very much for this. I think it was one page only – and hopefully it’s now been corrected.

      That’s the problem with multiple browsers obeying different standards. They are all supposed to be the same (following the W3C rules) although they aren’t – and IE is probably the most inconsistent.

      We tend not to use IE although things are tested on Chrome, Safari, Firefox, and IE. However we don’t test every page on all browsers and this slipped through. I looked at most other pages and I think it was only one page that had the overlap. If you found more, please reply.

  5. March 12, 2012 at 10:09 am

    Thanks , I have just been searching for information approximately this topic for ages and yours is the greatest I’ve came upon so far. However, what concerning the conclusion? Are you sure about the source?

  6. March 12, 2012 at 10:21 am

    This is really provocative information. It set off alarms in my brain that have been dormant for a while. Thank you for making it interesting and clear. I have been searching for content like this.

  7. March 12, 2012 at 12:50 pm

    wow, this is what I’m looking for. Thank you so much for helping me.

  8. March 13, 2012 at 8:32 am

    Fascinating blog! Is your theme custom made or did you download it from
    somewhere? A theme like yours with a few simple tweeks would really make my
    blog shine. Please let me know where you got your design.

    Many thanks

    • March 13, 2012 at 12:56 pm

      It’s a standard WordPress design – INove by mg12

  9. March 15, 2012 at 1:08 am

    bookmarked!!, I really like your website!

  10. 掉髮
    May 25, 2013 at 7:57 pm

    Hello! I simply wish to give a huge thumbs up for the great info you may have right here on this post.
