Case Studies | Find It Out - Research Secrets and More

How to spot web-site plagiarism and copyright theft with CopyScape

March 11, 2012

AWARE has had a web-site since 1995 and our current domain (www.marketing-intelligence.co.uk) has been active since 1997. When we started there were fewer than 100,000 companies on the web. Google’s founders had not yet met each other, and even venerable search engines such as AltaVista had not yet started.

Over the years, we’ve made an effort to ensure that our web content was not copied and used on other sites without our permission.

Although doing manual checks by searching for key phrases is one way of checking for plagiarism and copyright theft, there are a number of dedicated plagiarism checking sites. One example is Plagium. Plagium’s drawback, shared with several similar services, is that you have to paste in the text you want to test rather than just enter the URL. Such services are generally aimed at helping teachers and college professors detect student cheating.

Although some services (such as Plagium) are free, most are not and may involve downloading dedicated software. Others only check a limited number of known “essay” type sites where students can download essays written by others. (We’ve found some of our content on such sites – evidently students who use them don’t care where they steal their content from. Once used successfully they then try to reuse it by uploading their A+ essay to the site).

CopyScape

Of all the plagiarism detection websites, probably the easiest and best is CopyScape. CopyScape’s aim is not only to help academics detect student cheating. It also allows webmasters to search for copied content in general. It doesn’t require users to paste in the suspect text. Instead, web-site owners simply enter their URLs and get a report on other sites that use similar or identical wording. It’s sufficiently powerful that there is even a flippant web-page on ContentBoss’s website giving advice on how to bypass CopyScape and copy with impunity. (ContentBoss promises to provide unique content at a low monthly fee. Their bypass CopyScape tool uses a technique that will convert content into HTML guaranteed not to be picked up by plagiarism detectors. The catch, as pointed out by ContentBoss, is that using such content is also a guarantee that the site will be banned by search engines for spam content.)

We’ve used CopyScape periodically over the years. The miscreants it uncovered included a competitor site that copied multiple pages from our site. We asked the site owner to change his pages and were ignored. We then took stronger action and within a couple of days the site was taken down. Another example involved an article published in a professional journal that took, almost verbatim, the content of our brief guide to competitive intelligence. We notified the publisher, who ensured that the payment made to the “author” was recovered and an apology published. The author said that he thought that material published on the web was copyright free. He was shown to be wrong.

Our most recent trawl for examples of copyright theft from AWARE’s pages turned up further examples where wording we’ve used has been stolen. The following images should show how effective the tool is – while at the same time naming and shaming the companies that are too weak, lazy or incompetent to produce their own copy and have to steal from others. (I’ve named them – but won’t give them the satisfaction of a link as this could help their search engine optimisation efforts – if they have any!)

The first example shows how text that appears on the footer of most of our pages is plagiarised.

This is the original text.

CopyScape found several sites had copied this text almost verbatim – for example Green Oasis Associates based in Nigeria:

or ICM Research from Italy and Pearlex from Virginia in the USA.

The ICM Research example is in fact the worst of these three, as their site has taken content from several other AWARE web-site pages.

The problem is that a company that is willing to steal content from other businesses is unethical – breaking the rule against misrepresenting who you are. If they are willing to steal content from others, they may also take short-cuts in the services provided and as a result should not be trusted to provide a competent service.

The page that is most often plagiarised is the Brief Guide to Competitive Intelligence Page, mentioned above. Clicking on a link found by CopyScape highlights the copied portions as seen in the following examples from AGResearch, Emisol and Wordsfinder.

Generally sites do not copy whole pages (although this does happen) but integrate chunks of stolen text into their pages – as seen in the AGResearch example, below – where 12% of the page is copied, and Wordsfinder where 13% has been copied.
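To give a feel for what a "percentage copied" figure can mean, here is a minimal sketch that compares two pieces of text by their overlapping runs of words (often called shingles). This is not CopyScape's method, which isn't described here; the function names and the five-word window are assumptions chosen purely for illustration.

```python
import re


def shingles(text, size=5):
    """Return the set of overlapping word n-grams ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def percent_copied(suspect, original, size=5):
    """Percentage of the suspect text's shingles that also occur in the original."""
    suspect_shingles = shingles(suspect, size)
    if not suspect_shingles:
        # Text shorter than one shingle: nothing to compare.
        return 0.0
    shared = suspect_shingles & shingles(original, size)
    return 100.0 * len(shared) / len(suspect_shingles)


if __name__ == "__main__":
    original = ("competitive intelligence is the process of gathering "
                "and analysing information about competitors")
    suspect = ("our firm says competitive intelligence is the process "
               "of gathering data for clients")
    print(f"{percent_copied(suspect, original, size=3):.1f}% of the suspect text overlaps")
```

A smaller window catches shorter lifted phrases but also flags more coincidental matches, so the window size is a judgement call.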

The Emisol example below stole less – although copied key parts of the guide page:

Conclusion

Copyright theft is a compliment to the author of the original web-page, as it shows that the plagiarizing site views their competitor as top quality. However, the purpose of writing good copy is to stand out and show one’s own capabilities. Sites that steal other sites’ work remove this advantage as they make the claims seem anodyne and commonplace. They devalue both the copier – who cannot come up with their own material (and so is unlikely to be able to provide a competent service anyway) – and the originator, as most people won’t be able to tell who came first. Fortunately search engines can, and when they detect duplication, they are likely to downplay the duplicated material, meaning that such sites are less likely to appear high-up in search engine rankings. The danger is that both the originator of the material and the plagiarizer may get penalised by search engines – which is another reason to ensure that copyright thieves are caught and stopped. CopyScape is one tool that really works in protecting authors from such plagiarism.

Analysing weak signals for competitive & marketing intelligence

March 5, 2012

I’ve just read an interesting blog post by Philippe Silberzahn and Milo Jones. The post “Competitive intelligence and strategic surprises: Why monitoring weak signals is not the right approach” looked at the problems of weak signals in competitive intelligence and how even though an organisation may have lots of intelligence, they still get surprised.

Silberzahn and Jones point out that it’s not usually the intelligence that is the problem, but the interpretation of the gathered intelligence. This echoes a statement by Isser Harel, the former head of Mossad responsible for capturing the Nazi war criminal Eichmann. Harel was quoted as saying “We do not deal with certainties. The world of intelligence is the world of probabilities. Getting the information is not usually the most difficult task. What is difficult is putting upon it the right interpretation. Analysis is everything.”

In their post, Silberzahn and Jones argue that more important than monitoring for weak signals, is the need to monitor one’s own assumptions and hypotheses about what is happening in the environment. They give several examples where weak signals were available but still resulted in intelligence failures. Three different types of failure are mentioned:

Too much information: the problem faced by the US who had lots of information prior to the Pearl Harbour attack of 7 December 1941,
Disinformation, as put out by Osama bin Laden to keep people in a high-state of alert – by dropping clues that “something was about to happen“, when nothing was (and of course keeping silent when it was),
“Warning fatigue” (the crying wolf syndrome) where constant repetition of weak signals leads to reinterpretation and discounting of threats, as happened prior to the Yom Kippur war.

Their conclusion is that with too much data, you can’t sort the wheat from the chaff, and with too little you make analytical errors. Their solution is that rather than collect data and subsequently analyse it to uncover its meaning you should first come up with hypotheses and use that to drive data collection. They quote Peter Drucker (Management: Tasks, Responsibilities, Practices, 1973) who wrote: “Executives who make effective decisions know that one does not start with facts. One starts with opinions… To get the facts first is impossible. There are no facts unless one has a criterion of relevance.” and emphasise that “it is hypotheses that must drive data collection”.

Essentially this is part of the philosophy behind the “Key Intelligence Topic” or KIT process – as articulated by Jan Herring and viewed as a key CI technique by many Competitive Intelligence Professionals.

I believe that KITs are an important part of CI, and it is important to come up with hypotheses on what is happening in the competitive environment, and then test these hypotheses through data collection. However this should not detract from general competitive monitoring, including the collection of weak signals.

The problem is how to interpret and analyse weak signals. Ignoring them or even downplaying them is NOT the solution in my view – and is in fact highly dangerous. Companies with effective intelligence do not get beaten or lose out through known problems but from unknown ones. It’s the unknown that catches the company by surprise, and often it is the weak signals that, in hindsight, give clues to the unknown. In hindsight, their interpretation is obvious. However at the time, the interpretation is often missed, misunderstood, or ignored as unimportant.

There is an approach to analysing weak signals that can help sort the wheat from the chaff. When you have a collection of weak signals don’t treat them all the same. Categorise them.

Are they about a known target’s capabilities? Put these in box 1.
Are they relating to a target’s strategy? These go into box 2.
Do they give clues to a target’s goals or drivers? Place these in box 3.
Can the weak signal be linked to assumptions about the environment held by the target? These go into box 4.

Anything else goes into box 5. Box 5 holds the real unknowns – unknown target or topic or subject. You have a signal but don’t know what to link it to.
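The five boxes can be sketched as a simple data structure. The class, field and category names below are illustrative assumptions, not part of any established tool, and deciding which aspect a signal relates to remains the analyst's judgement; the code only does the filing.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

# Boxes 1-4 follow the four questions above; box 5 holds the real unknowns.
BOXES = {
    "capabilities": 1,
    "strategy": 2,
    "goals": 3,        # goals or drivers
    "assumptions": 4,  # assumptions the target holds about its environment
}
UNKNOWN_BOX = 5


@dataclass
class WeakSignal:
    summary: str
    target: Optional[str] = None  # None when the signal can't be tied to a known target
    aspect: Optional[str] = None  # analyst's judgement: a key of BOXES, or None


def box_for(signal):
    """Return the box number (1-5) for a single weak signal."""
    if signal.target is None:
        return UNKNOWN_BOX
    return BOXES.get(signal.aspect, UNKNOWN_BOX)


def sort_into_boxes(signals):
    """Group signals by box number."""
    boxes = defaultdict(list)
    for signal in signals:
        boxes[box_for(signal)].append(signal)
    return boxes


if __name__ == "__main__":
    signals = [
        WeakSignal("New plant announced", target="Rival Co", aspect="capabilities"),
        WeakSignal("CEO speech hints at exports", target="Rival Co", aspect="goals"),
        WeakSignal("Odd patent filing, owner unclear"),
    ]
    for box, items in sorted(sort_into_boxes(signals).items()):
        print(box, [s.summary for s in items])
```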

First look at boxes 1-4 and compare each bit of intelligence to other information.

Does it fit in? If so good. You’ve added to the picture.
If it doesn’t, why not?

Consider the source of the information you have. What’s the chronology? Does the new information suggest a change? If so, what could have caused that change? For this, compare the other 3 boxes to see if there’s any information that backs up the new signal – using the competitor analysis approach sometimes known as 4-corners analysis, to see if other information would help create a picture or hypothesis of what is happening.

If you find nothing, go back and look at the source.

Is it old information masquerading as new? If so, you can probably discount it.
Is it a complete anomaly – not fitting in with anything else at all? Think why the information became available. Essentially this sort of information is similar to what goes into box 5.
- Could it be disinformation? If so, what is likely to be the truth? Knowing that it may be disinformation may lead you to what is being hidden.
- Or is it misinformation – which can probably be discounted?
- What about if you can’t tell? Then it suggests another task – to try and identify other intelligence that would provide further detail and help you evaluate the anomaly. Such weak signals then become leads for future intelligence gathering.

With box 5 – try and work out why it is box 5. (It may be that you have information but no target to pin it to, for example – so can’t do the above). As with anomalies, think why the information became available. You may need to come up with a number of hypotheses to explain the meaning behind the information. These can sometimes (but not always) be tested.

Silberzahn and Jones mention a problem from Nassim Taleb’s brilliant book “The Black Swan: The Impact of the Highly Improbable“. The problem is how do you stop being like a turkey before Thanksgiving. Prior to Thanksgiving the turkey is regularly fed and given lots and lots of food. Life seems good, until the fateful day, just before Thanksgiving, when the food stops and the slaughterer enters to prepare the turkey for the Thanksgiving meal. For the turkey this is a complete surprise as all the evidence prior to this suggests that everything is going well. Taleb poses the question as to whether a turkey can learn from the events of yesterday what is about to happen tomorrow. Can an unknown future be predicted? In this case, the answer seems to be no.

For an organisation, this is a major problem as if they are like turkeys, then weak signals become irrelevant. The unknown can destroy them however much information they hold prior to the unforeseen event. As Harel said, the problem is not information but analysis. The wrong analysis means death!

This is where a hypothesis approach comes in – and why hypotheses are needed for competitive intelligence gathering. In the Thanksgiving case, the turkey has lots of consistent information coming in saying “humans provide food”. The key is to look at the source of the information and try to understand it. In other words:

Information: Humans provide food.
Source: observation that humans give food every day – obtained from multiple reliable sources.

You now need to question the reason or look at the objectives behind this observation. Why was this observation available? Come up with hypotheses that can be used to test the observations and see what matches. Then choose a strategy based on an assessment of risk. In the case of the turkey there are two potential hypotheses:

“humans like me and so feed me” (i.e. humans are nice)
“humans feed me for some other reason” (i.e. humans may not be nice).

Until other information comes in to justify hypothesis 1, hypothesis 2 is the safer one to adopt as even if hypothesis 1 is true, you won’t get hurt by adopting a strategy predicated on hypothesis 2. (You may not eat so much and be called skinny by all the other turkeys near you. However you are less likely to be killed).
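The turkey's reasoning can be sketched as a "maximin" choice: for each strategy, look at its worst outcome across the competing hypotheses, then pick the strategy whose worst outcome is least bad. The strategy names and payoff numbers below are invented purely for illustration; only their ordering matters.

```python
# Hypothetical payoffs for the turkey under each hypothesis (higher is better).
PAYOFFS = {
    "trusting": {"humans are nice": 10, "humans may not be nice": -100},
    "cautious": {"humans are nice": 7, "humans may not be nice": -10},
}


def safest_strategy(payoffs):
    """Pick the strategy whose worst-case payoff is highest (maximin)."""
    return max(payoffs, key=lambda strategy: min(payoffs[strategy].values()))


if __name__ == "__main__":
    print(safest_strategy(PAYOFFS))
```

With these numbers the cautious strategy wins: it costs a little if humans turn out to be nice, but avoids the catastrophic outcome if they are not.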

This approach can be taken with anomalous information in general, and used to handle weak signals. The problem then becomes not the analysis of information but the quantity. Too much information and you start to drown and can’t categorise it – it’s not a computer job, but a human job. In this case one approach is to do the above with a random sample of information – depending on your confidence needs and the quantity of information. This gets into concepts of sampling theory – which is another topic.
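As a rough illustration of the sampling idea, the sketch below uses one common textbook approach: Cochran's sample-size formula with a finite population correction, followed by a simple random draw. The function names and the defaults (95% confidence, 5% margin of error, a worst-case proportion of 0.5) are assumptions, not recommendations for any particular intelligence task.

```python
import math
import random
from statistics import NormalDist


def sample_size(population, confidence=0.95, margin=0.05, proportion=0.5):
    """Cochran's sample size with a finite population correction."""
    if population <= 0:
        return 0
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided z-score
    n0 = (z ** 2) * proportion * (1 - proportion) / margin ** 2
    n = n0 / (1 + (n0 - 1) / population)
    return min(population, math.ceil(n))


def draw_sample(items, confidence=0.95, margin=0.05, seed=None):
    """Draw a simple random sample of the size suggested by sample_size()."""
    rng = random.Random(seed)
    return rng.sample(items, sample_size(len(items), confidence, margin))


if __name__ == "__main__":
    signals = [f"signal-{i}" for i in range(10_000)]
    print(sample_size(len(signals)))  # 370 with the defaults
    print(draw_sample(signals, seed=1)[:3])
```

Tightening the margin of error or raising the confidence level increases the number of items that need a human read, which is the trade-off between confidence and workload mentioned above.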

Internet Explorer is for Dummies! Anatomy of a hoax.

August 7, 2011

Good business intelligence quickly identifies which information is real and which is false – or should. It’s important that decision making is based on accurate, factual data – as otherwise bad decisions get made. So how do you tell whether something is real or fake?

Generally, the first rule is to check the source or sources.

Are they reputable and reliable?
Is the information in the story sensible and reasonable?
What’s the background to the story – does it fit in with what’s already known?

The problem is that even if information passes these tests it may still not be true. There are numerous examples of news items that sound true but that turn out to be false. One example is a BBC news story from 2002 quoting German researchers who claimed that natural blondes were likely to disappear within 200 years. A similar story appeared in February 2006 in the UK’s Sunday Times. This article quoted a WHO study from 2002. In fact, there was no WHO study that stated this – it was false. The story of blonde extinction has been traced back over 150 years and periodically is reported – always with “scientific” references to imply validity.

The “Internet Explorer users have lower IQs” hoax

Often, the decision to accept a news item depends on whether or not it sounds true. If the story sounds true, especially if supported by apparent research then people think that it probably is – and so checks aren’t made. That is why a recent news story suggesting that users of Internet Explorer have lower IQs than those of other browsers was reported so widely. Internet Explorer is often set up as the default browser on Windows computers, and many users are more familiar with Explorer than other browsers. The suggestion that less technologically adept users (i.e. less intelligent users) would not know how to download or switch to a different browser made sense.

I first read the news story in The Register – an online technical newspaper covering web, computer and scientific news. Apart from The Register, the story appeared on CNN, the BBC, the Huffington Post, Forbes and many other news outlets globally (e.g. the UK’s Daily Telegraph and Daily Mail). Many of these have now either pulled the story completely, just reporting the hoax, or added an addendum to their story showing that it was a hoax. A few admit to being fooled – the Register, for example, explained why they believed it: because it sounded plausible.

The hoax succeeded, however, not only because the story itself sounded plausible, but also because a lot of work had been put in to make it look real. The hoaxer had built a complete web-site to accompany the news item, with product details, FAQs and other research reports, all implying that the research company concerned was bona fide. The report itself was included as a PDF download.

In fact most pages had been copied from a genuine company, Central Test, headquartered in Paris and with offices in the US, UK, Germany and India – as was highlighted in an article in CBR Online.

Red Flags that indicated the hoax

To its credit, the technology magazine Wired.com spotted several red flags suggesting that the story was a hoax, stating that “If a headline sounds too good to be true, think twice.”

Wired commented that the other journalists hadn’t really looked at the data, pointing out that “journalists get press releases from small research companies all the time“. The problem is that it’s one thing getting a press release and another printing it without doing basic journalistic checks and follow-throughs. In this case,

the “research company” AptiQuant had no history of past studies – other than on its own web-site;
the company address didn’t exist;
the average reported IQ for Internet Explorer users (80) was so low as to put them in the bottom 15% of the population (while that for Opera users put them in the top 5%) – scarcely credible considering Internet Explorer’s market share.
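The IQ red flag is easy to sanity-check. A minimal sketch, assuming the conventional IQ scale (normally distributed with a mean of 100 and a standard deviation of 15); the function name is illustrative:

```python
from statistics import NormalDist

# Conventional IQ scaling: mean 100, standard deviation 15 (an assumption here).
IQ_SCALE = NormalDist(mu=100, sigma=15)


def percent_below(iq):
    """Percentage of the population expected to score below the given IQ."""
    return 100 * IQ_SCALE.cdf(iq)


if __name__ == "__main__":
    print(f"{percent_below(80):.1f}% of people would score below 80")
```

On that scale an average of 80 sits at roughly the 9th percentile, well inside the bottom 15%. For that to be the average across all users of a browser with Internet Explorer's market share is hard to believe.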

After the hoax was exposed, the author, Tarandeep Gill, pointed out several red flags that he felt should have alerted journalists and admitted it had been a hoax i.e.

1. The domain was registered on July 14th 2011.
2. The test that was mentioned in the report, “Wechsler Adult Intelligence Scale (IV) test” is a copyrighted test and cannot be administered online.
3. The phone number listed on the report and the press release is the same listed on the press releases/whois of my other websites. A google search reveals this.
4. The address listed on the report does not exist.
5. All the material on my website was not original.
6. The website is made in WordPress. Come on now!
7. I am sure, my haphazardly put together report had more than one grammatical mistakes.
8. There is a link to our website AtCheap.com in the footer.

The rationale and the aftermath

Gill is a computer programmer based in Vancouver, Canada, working on a comparison shopping website www.AtCheap.com. Gill became irritated at having to code for earlier versions of Internet Explorer – and especially IE 6.0 which is still used by a small percentage of web users. (As of July 2011, 9% of web-users use Internet Explorer versions 6.0 and 7.0 with a further 26% using version 8.0. Only 7% of web users have upgraded to the latest version of Internet Explorer – v9.0).

The problem with IE versions 6.0-8.0 is that they are not compatible with general web-standards making life difficult for web designers who have to code accordingly, and test sites on multiple versions of the same browser – all differing slightly. (As you can’t have all 4 versions of Internet Explorer IE6.0 – IE9.0 on the same computer this means operating 4 separate computers or having 4 hard-disk partitions – one for each version).

Gill decided to create something that would encourage IE users to upgrade or switch, and felt that a report that used scientific language and that looked authentic would do the trick. He designed the web-site, copying material from Central Test, and then put out the press release – never expecting the story to spread so fast or far. He was sure he’d be found out much more quickly.

The problem was that after one or two reputable news sources published the story everybody else piled in. Later reports assumed that the early ones had verified the news story so nobody did any checks. The Register outlined the position in their mea culpa, highlighting how the story sounded sensible.

Many news outlets are busy flagellating themselves for falling for the hoax. But this seems odd when you consider that these news outlets run stories on equally ridiculous market studies on an almost daily basis. What’s more, most Reg readers would argue that we all know Internet Explorer users have lower IQs than everyone else. So where’s the harm?

The facts are that AptiQuant doesn’t exist and its survey was a hoax. But facts and surveys are very different from the truth. “It’s official: IE users are dumb as a bag of hammers,” read our headline. “100,000 test subjects can’t be wrong.” The test subjects weren’t real. But they weren’t necessarily wrong either.

You may disagree. But we have no doubt that someone could easily survey 100,000 real internet users and somehow prove that we’re exactly right. And wrong.

The real issue is that nobody checked as the story seemed credible. Competitive Intelligence analysis cannot afford to be so lax. If nobody else bothers verifying a news story that turns out to be false, you have a chance to gain competitive advantage. In contrast those failing to check the story risk losing out. The same lessons that apply to journalists apply to competitive intelligence and just because a news story looks believable, is published in a reputable source and is supported by several other sources doesn’t make it true. The AptiQuant hoax story shows this.

Meanwhile the story rumbles on with threats of lawsuits against Tarandeep Gill by both Microsoft (for insulting Internet Explorer users) and more likely by Central Test. Neither company is willing to comment although Microsoft would like users to upgrade Internet Explorer to the latest version. In May 2010 Microsoft’s Australian operation even said using IE6 was like drinking nine-year-old milk. If Gill has managed to get some users to upgrade he’ll have helped the company. He should have also helped Central Test – as the relatively unknown company has received massive positive publicity as a result of the hoax. If they do sue, it shows a lack of a sense of humour (or a venal desire for money) – and will leave a sour taste as bad as from drinking that nine-year-old milk.

Telling stories – fairy tales, case-studies & scenarios….

April 14, 2011

At the ICI/Atelis competitive intelligence conference that took place last week (April 6-7, 2011) in Bad Nauheim, Germany there was a panel discussion on story-telling as a method of reporting intelligence. At about the same time, the Association of Independent Information Professionals (AIIP) held their 25th annual conference in Vancouver, Washington in the USA. Mary-Ellen Bates described how stories can help information professionals market themselves by showing how their skills can solve client problems. The fact that both conferences looked at story-telling shows how businesses are adopting the technique as a way of addressing complex issues.

Story telling is an ancient art-form that might seem strange as a business tool. However, often stories will be an excellent approach for solving business questions as they allow people to look at a situation objectively, remove themselves from the scene and take an outside view. The trick is to tell the right story, catching the imagination and making people think. During the ICI / Atelis conference I suggested a framework for when different story styles can be used.

The first story type is the “fairy-tale” – the “Once Upon a Time in a Kingdom Far Away” type of story. Fairy-tales are possibly the most abstract example of a story that can be applicable to business. The danger is that they can be seen as childish and far-removed from real-world business realities. In fact, they can be a powerful way of highlighting deep-seated organisational problems, as management refusal to see such problems can be illustrated with stories. Such stories can help managers recognise their own situation, and so identify the problems and think of possible solutions.

Consider a company where the CEO or other senior management refuse to see that their business has changed. Often such management grew up in the industry and believe that they know it inside out. Accepting that things have changed is anathema to them. A standard comment given by such managers when asked why things are done in a particular way is “We’ve always done it that way“. Essentially such management suffers from corporate denial – or what Ben Gilad called a business taboo in his book “Business Blindspots“.

Telling such managers a fairy-tale story can help them see the problem (assuming that you can arrange a session they will be willing to attend).

Once upon a time, in a far-away country there was a king who loved to sing. He loved to sing so much that he made laws that all his people were to learn his favourite songs.

Every Sunday, the people were to gather in the town squares and village greens and sing the songs the king loved. The people were happy as they also loved the music and they prided themselves as being the most musical people in the world.

One day, a travelling minstrel sailed into the kingdom from across the sea – singing a new song. Soon, children started to sing this new song, followed by their parents, and word reached the king that the people were no longer singing the king’s songs but were singing something different.

The king flew into a rage, and put the minstrel into a deep and dark dungeon. However this didn’t stop the minstrel singing – and soon the guards started to sing the new song. The king then made laws saying the new song lacked harmony, was discordant, and that anybody caught singing it would be severely punished.

Gradually the people became unhappier. They liked the new song and wanted to sing it along with the old songs. Instead they stopped singing – and the king got angrier and angrier that his songs were no longer being sung. He tried to force people to sing, but they just sang out-of-tune. He made new laws that said they had to sing on Sundays and Mondays, but found that lots of people said they’d lost their voices from singing so much and so couldn’t sing on Sundays or Mondays. And so the king also got unhappier as he no longer heard his songs being sung as in the past….

The basic lesson for a story such as this is to accept and embrace change – rejecting change is likely to be self-defeating. There are many companies and industries that fail in this – the music industry being a classic example, that lost out by refusing to recognise the impact of music downloading, Napster, iTunes and peer-to-peer file sharing. A fairy-story can help highlight the problems – although the solution will need to come from full discussion and management acceptance.

The second story-type is the traditional case-study. Case studies should be used where the organisation knows the problem, but not the solution. Finding the solution directly is difficult as management is too close to the situation. The case-study serves as a way of examining the problem dispassionately, by looking at a parallel situation involving a company or organisation, from another industry, or market. The aim is to analyse the problem and work out appropriate strategies to solve the problem and apply them to the real situation. The key for a case-study is to find one that matches the organisation’s problems. There is a vast bank of case-studies for a range of industries, topics and problems at the Case Study Clearing House.

The third story-type is the future scenario, generally generated as part of a scenario-planning exercise. Such stories attempt to answer “what if” questions by looking at external factors and their correlations and impacts, and then considering how these could play out in the future. It is essential that such scenarios are internally consistent and that there is a clear line of development from the current situation to the future scenario. This can then allow for strategies to be put in place that take into account what could happen. Such strategies need to be adaptable to changing situations and allow for organisations to prepare for any eventuality.

As a reporting approach, telling stories is one way of putting across ideas that stimulate the imagination, and so can help organisations develop strategies that lead to success. There is a common theme to all three story types: problem identification, its acceptance and the need for strategies to cope with change. They differ in their perspective on the world. The fairy-tale approach looks at understanding problems and overcoming blindspots that relate to the past imposing on the present; case studies look at solving present problems; scenarios are aimed at preparing organisations for the future.
