
Archive for the ‘Online & Search Issues’ Category

Internet Explorer is for Dummies! Anatomy of a hoax.

August 7, 2011 15 comments

Good business intelligence quickly distinguishes real information from false – or should. It's important that decision making is based on accurate, factual data; otherwise bad decisions get made. So how do you tell whether something is real or fake?

Generally, the first rule is to check the source or sources.

  • Are they reputable and reliable?
  • Is the information in the story sensible and reasonable?
  • What’s the background to the story – does it fit in with what’s already known?

The problem is that even if information passes these tests it may still not be true. There are numerous examples of news items that sound true but turn out to be false. One example is a BBC news story from 2002 quoting German researchers who claimed that natural blondes were likely to disappear within 200 years. A similar story appeared in February 2006 in the UK's Sunday Times, quoting a WHO study from 2002. In fact, there was no such WHO study – the claim was false. The story of blonde extinction has been traced back over 150 years and is periodically reported – always with "scientific" references to imply validity.

The “Internet Explorer users have lower IQs” hoax

Often, the decision to accept a news item depends on whether or not it sounds true. If the story sounds true, especially if it is supported by apparent research, then people think that it probably is – and so checks aren't made. That is why a recent news story suggesting that users of Internet Explorer have lower IQs than users of other browsers was reported so widely. Internet Explorer is often set up as the default browser on Windows computers, and many users are more familiar with Explorer than with other browsers. The suggestion that less technologically adept users (i.e. less intelligent users) would not know how to download or switch to a different browser made sense.

I first read the news story in The Register – an online technical newspaper covering web, computer and scientific news. Apart from The Register, the story appeared on CNN, the BBC, the Huffington Post, Forbes and many other news outlets globally (e.g. the UK's Daily Telegraph and Daily Mail). Many of these have since either pulled the story completely, reported only the hoax, or added an addendum to the original story noting that it was a hoax. A few admit to being fooled – The Register, for example, explained why they believed it: because it sounded plausible.

The hoax succeeded, however, not only because the story itself sounded plausible, but also because a lot of work had been put in to make it look real. The hoaxer had built a complete web-site to accompany the news item – including product details, FAQs and other research reports – implying that the research company concerned was bona fide. The report itself was included as a PDF download.

In fact most pages had been copied from a genuine company, Central Test, headquartered in Paris and with offices in the US, UK, Germany and India – as was highlighted in an article in CBR Online.

Red Flags that indicated the hoax

To its credit, the technology magazine Wired.com spotted several red flags suggesting that the story was a hoax, stating that "If a headline sounds too good to be true, think twice."

Wired commented that the other journalists hadn't really looked at the data, pointing out that "journalists get press releases from small research companies all the time". The problem is that it's one thing to get a press release and another to print it without doing basic journalistic checks and follow-throughs. In this case,

  • the “research company” AptiQuant had no history of past studies – other than on its own web-site;
  • the company address didn’t exist;
  • the average reported IQ for Internet Explorer users (80) was so low as to put them in the bottom 15% of the population (while that for Opera users put them in the top 5%) – scarcely credible considering Internet Explorer’s market share.
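The implausibility of that last figure is easy to verify. As a quick sanity check (an illustrative sketch of my own, not from the hoax report or any of the outlets), the implied percentiles can be computed from the standard IQ scale, which has a mean of 100 and a standard deviation of 15:

```python
from statistics import NormalDist

# Standard IQ scale: mean 100, standard deviation 15
iq = NormalDist(mu=100, sigma=15)

# Fraction of the population scoring below the hoax's claimed IE average of 80
pct_below_80 = iq.cdf(80)
print(f"IQ 80 is roughly the bottom {pct_below_80:.0%} of the population")

# For comparison, where the top 5% (the claimed Opera bracket) begins
top_5_cutoff = iq.inv_cdf(0.95)
print(f"The top 5% of the population starts around IQ {top_5_cutoff:.0f}")
```

On these standard assumptions an average of 80 actually lands in the bottom 9% or so – even more extreme than the quoted bottom 15%, and all the less credible given Internet Explorer's market share.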

After the hoax was exposed, the author, Tarandeep Gill, admitted it had been a hoax and pointed out several red flags that he felt should have alerted journalists:

1. The domain was registered on July 14th 2011.
2. The test that was mentioned in the report, “Wechsler Adult Intelligence Scale (IV) test” is a copyrighted test and cannot be administered online.
3. The phone number listed on the report and the press release is the same listed on the press releases/whois of my other websites. A google search reveals this.
4. The address listed on the report does not exist.
5. All the material on my website was not original.
6. The website is made in WordPress. Come on now!
7. I am sure, my haphazardly put together report had more than one grammatical mistakes.
8. There is a link to our website AtCheap.com in the footer.
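Several of Gill's red flags are trivial to automate. As an illustrative sketch (the function and its 90-day threshold are my own, not anything the news outlets actually ran), his first point – a supporting domain registered only days before the press release it backs – could be checked like this:

```python
from datetime import date

def domain_age_red_flag(registered: date, press_release: date,
                        min_days: int = 90) -> bool:
    """Flag a press release whose supporting domain is suspiciously new."""
    age_days = (press_release - registered).days
    return age_days < min_days

# AptiQuant's domain was registered on 14 July 2011;
# the hoax press release circulated in late July 2011.
print(domain_age_red_flag(date(2011, 7, 14), date(2011, 7, 28)))  # → True
```

A whois lookup supplies the registration date in practice; the point is simply that this check takes seconds, not hours.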

The rationale and the aftermath

Gill is a computer programmer based in Vancouver, Canada, working on a comparison shopping website, www.AtCheap.com. Gill became irritated at having to code for earlier versions of Internet Explorer – especially IE 6.0, which is still used by a small percentage of web users. (As of July 2011, 9% of web users use Internet Explorer versions 6.0 and 7.0, with a further 26% using version 8.0. Only 7% of web users have upgraded to the latest version of Internet Explorer – v9.0.)

The problem with IE versions 6.0-8.0 is that they are not compatible with general web standards, making life difficult for web designers who have to code accordingly and test sites on multiple versions of the same browser – all differing slightly. (As you can't have all four versions of Internet Explorer, IE6.0 – IE9.0, on the same computer, this means operating four separate computers or having four hard-disk partitions – one for each version.)

Gill decided to create something that would encourage IE users to upgrade or switch, and felt that a report that used scientific language and that looked authentic would do the trick.  He designed the web-site, copying material from Central Test, and then put out the press release – never expecting the story to spread so fast or far. He was sure he’d be found out much more quickly.

The problem was that after one or two reputable news sources published the story everybody else piled in. Later reports assumed that the early ones had verified the news story so nobody did any checks. The Register outlined the position in their mea culpa, highlighting how the story sounded sensible.

Many news outlets are busy flagellating themselves for falling for the hoax. But this seems odd when you consider that these news outlets run stories on equally ridiculous market studies on an almost day basis. What’s more, most Reg readers would argue that we all know Internet Explorer users have lower IQs than everyone else. So where’s the harm?

The facts are that AptiQuant doesn’t exist and its survey was a hoax. But facts and surveys are very different from the truth. “It’s official: IE users are dumb as a bag of hammers,” read our headline. “100,000 test subjects can’t be wrong.” The test subjects weren’t real. But they weren’t necessarily wrong either.

You may disagree. But we have no doubt that someone could easily survey 100,000 real internet users and somehow prove that we’re exactly right. And wrong.

The real issue is that nobody checked, as the story seemed credible. Competitive intelligence analysis cannot afford to be so lax. If nobody else bothers to verify a news story that turns out to be false, those who do check have a chance to gain competitive advantage; those failing to check the story risk losing out. The same lessons that apply to journalists apply to competitive intelligence: just because a news story looks believable, is published in a reputable source and is supported by several other sources doesn't make it true. The AptiQuant hoax shows this.

Meanwhile the story rumbles on, with threats of lawsuits against Tarandeep Gill by both Microsoft (for insulting Internet Explorer users) and, more likely, by Central Test. Neither company is willing to comment, although Microsoft would like users to upgrade Internet Explorer to the latest version. In May 2010 Microsoft's Australian operation even said using IE6 was like drinking nine-year-old milk. If Gill has managed to get some users to upgrade he'll have helped the company. He should also have helped Central Test – as the relatively unknown company has received massive positive publicity as a result of the hoax. If Central Test does sue, it will show a lack of a sense of humour (or a venal desire for money) – and will leave a sour taste as bad as that from drinking nine-year-old milk.

Zanran – a new data search engine

April 21, 2011 4 comments

I’ve been playing with a new data search engine called Zanran – that focuses on finding numerical and graphical data. The site is in an early beta. Nevertheless my initial tests brought up material that would only have been found using an advanced search on Google – if you were lucky. As such, Zanran promises to be a great addition for advanced data searching.


Zanran.com - Front Page

Zanran focuses on finding what it calls 'semi-structured' data on the web. This is defined as numerical data presented as graphs, tables and charts – and these could be held in a graph image or table in an HTML file, as part of a PDF report, or in an Excel spreadsheet. This is the key differentiator – essentially, Zanran is not looking for text but for formatted numerical data.

When I first started looking at the site I was expecting something similar to Wolfram Alpha – or perhaps something from Google (e.g. Google Squared or Google Public Data). Zanran is nothing like these – and so brings something new to search. Rather than take data and structure or tabulate it (as with Wolfram Alpha and Google Squared), Zanran searches for data that is already in tables or charts and uses this in its results listing.


Zanran.com Search: "Average Marriage Age"

The site has a nice touch in that hovering the cursor over results gives you the relevant data page – whether a table, a chart or a mix of text, tables or charts.

Zanran.com - Hovering over a result brings up an image of the data.

The advanced search options allow country searching (based on server location), document date and file type, each selectable from a drop-down box, as well as searches on specified web-sites. At the moment only English-speaking countries can be selected (Australia, Canada, Ireland, India, the UK, New Zealand, the USA and South Africa). The date selections allow for the last 6, 12 or 24 months, and the file type allows for selection based on PDF; Excel; images in HTML files; tables in HTML files; PDF, Excel and dynamic data; and dynamic data alone. PowerPoint and Word files are promised as future options. There are currently no field search options (e.g. title searches).

My main dislike is that the site doesn't give the full URL for the data presented. The top-level domain is given, but not the actual URL, which makes the site difficult to use when full attribution is required for any data found (especially if data gets downloaded, rather than opening up in a new page or tab).

Zanran.com has been in development since at least 2009 when it was a finalist in the London Technology Fund Competition. The technology behind Zanran is patented and based on open-source software, and cloud storage. Rather than searching for text, Zanran searches for numerical content, and then classifies it by whether it’s a table or a chart.
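Zanran's exact algorithms aren't public, but the idea of spotting 'formatted numerical data' rather than text can be sketched with a toy heuristic (entirely my own illustration, not Zanran's patented method): treat a row of table cells as data if most of its cells parse as numbers.

```python
def looks_like_data(cells: list[str], threshold: float = 0.5) -> bool:
    """Toy heuristic: a row of cells counts as 'data' if most cells are numeric."""
    def is_number(cell: str) -> bool:
        try:
            float(cell.replace(",", "").rstrip("%"))
            return True
        except ValueError:
            return False

    numeric = sum(is_number(c) for c in cells)
    return bool(cells) and numeric / len(cells) >= threshold

print(looks_like_data(["Country", "Average age", "Source"]))  # → False (header row)
print(looks_like_data(["UK", "24.3", "25.1", "12,400"]))      # → True (data row)
```

A real crawler would, of course, combine many such signals across whole tables, chart images and spreadsheets – but the asymmetry with text-based search is the interesting part.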

Atypically, Zanran is not a Californian Silicon Valley Startup, but is based in the Islington area of London, in a quiet residential side-street made up of a mixture of small mostly home-based businesses and flats/apartments. Zanran was founded by two chemists, Jonathan Goldhill and Yves Dassas, who had previously run telecom businesses (High Track Communications Ltd and Bikebug Radio Technologies) from the same address. Funding has come from the London Development Agency and First Capital among other investors.

Zanran views its competitors as Wolfram Alpha, Google Public Data and also Infochimps (a database repository – enabling users to search for and download a wide variety of databases). The competitor list comes from Google's cache of Zanran's Wikipedia page as unfortunately, Wikipedia has deleted the actual page – claiming that the site is "too new to know if it will or will not ever be notable".

Google Cache of Zanran's Wikipedia entry

I hope that Wikipedia is wrong and that Zanran will become “notable” as I think the company offers a new approach to searching the web for data. It will never replace Google or Bing – but that’s not its aim. Zanran aims to be a niche tool that will probably only ever be used by search experts. However as such, it deserves a chance, and if its revenue model (I’m assuming that there is one) works, it deserves success.

Social Media – networking to the future

March 27, 2011 3 comments

On Friday this arrived in my email inbox – a timely reminder of how the world has changed over the last few years.

Linked In Letter


I joined LinkedIn.com in August 2004, fifteen months after the site was launched in May 2003, and two years before Facebook allowed for open-access. (Facebook itself launched in February 2004 but was restricted to university / colleges and a few others until September 2006).

I’d been interested in social networks for several years – and my membership of the UK networking site, FriendsReunited.com dates from a few years earlier.

Initially social networking seemed to be more about re-connecting with people from real life rather than communicating on a regular basis. That’s all changed now.  Online social networking – through sites such as Facebook and LinkedIn is the way many people keep up-to-date with what’s going on in their social circles.

I’ve been invited to parties via Facebook, and have also invited people to my own events. It’s the way I find out what’s going on in my friends’ lives – or the lives of those that keep up on Facebook. In fact I now find it more difficult to keep up with people who still resist the online world – as the phone lacks the immediacy that we’ve come to expect. I’m not alone: 10% of the world’s population is now on Facebook (which claims over 600m members).

In the business world, the same sort of thing is happening. I find LinkedIn incredibly useful for contacting colleagues and potential colleagues – and finding people to contact when I’m doing research. It lets me know what people are doing and it is difficult to imagine how I’d do business without such sites now. Again – I’m not alone, with LinkedIn now claiming over 100m members.

These changes promise to do more than just change the way people communicate and do business. For many years, people have talked about computers bringing about a paperless office. In my opinion that’s bunkum – or is so far. (I personally believe that technologies such as the iPad and e-Paper may eventually mean that printed material will become the exception rather than the rule in the business world – but that is some years in the future.) However another development may come more quickly: the email-less office.

In February 2011 Atos Origin, the French IT consulting and services company, put out a press release setting out an ambition to become a zero-email company by 2014. The company pointed out that online social networking was now more popular than email and even than searching for information. (Bing is integrating with Facebook – recognising the importance of social networking sites, with some people preferring to search from within the site rather than go to an external one.) The prevalence of spam – even with efficient anti-spam software – has also meant that email is becoming ineffective as a communication tool. Guy Kawasaki, the well-known blogger and Internet guru, has commented that email is too long, wishing that it could be limited to 140 characters, i.e. like Twitter.com, the social networking communication tool. He echoes views that see email as a flawed communication medium.

So what is the future? I find it interesting that the current revolution in the Middle East seems to be driven by social media – with both the Egyptian and Tunisian regimes falling as a result of campaigns launched on Facebook and Twitter. Personal contacts, however, were still important: the revolutions may have been organised virtually, via online social media, but it was the mass street protests that led to the change. I think this sums up the position of all online social media. It’s a communication medium, but ultimately that is all. In this it is not new. Over the last 120 years, mankind has seen several new communication media: telex, telephone, fax, email. Each promised additional speed and immediacy. Now Facebook, LinkedIn and especially Twitter and instant messaging (e.g. via Skype) promise even faster ways for people to communicate. Nevertheless, at the end of the day, human contact still has to be physical to have any real meaning.

Google versus Bing – a competitive intelligence case study

February 2, 2011 7 comments

Search experts regularly emphasise that to get the best search results it is important to use more than one search engine. The main reason for this is that each search engine uses a different relevancy ranking leading to different search results pages. Using Google will give a results page with the sites that Google thinks are the most relevant for the search query, while using Bing is supposed to give a results page where the top hits are based on a different relevancy ranking. This alternative may give better results for some searches and so a comprehensive search needs to use multiple search engines.
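One practical way to exploit those differing relevancy rankings is to merge the ranked lists from several engines so that pages ranked highly by any engine rise to the top. A common technique for this is reciprocal rank fusion – sketched below with invented engine names and results purely for illustration:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: dict[str, list[str]],
                           k: int = 60) -> list[str]:
    """Merge ranked result lists: each URL scores 1/(k + rank) per engine."""
    scores: dict[str, float] = defaultdict(float)
    for engine, urls in rankings.items():
        for rank, url in enumerate(urls, start=1):
            scores[url] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical top-3 results from two engines for the same query
merged = reciprocal_rank_fusion({
    "engine_a": ["a.com", "b.com", "c.com"],
    "engine_b": ["b.com", "d.com", "a.com"],
})
print(merged)  # b.com and a.com lead, as both engines rank them
```

The constant k damps the influence of any single engine's top result; the value 60 is the one commonly quoted in the information-retrieval literature, not anything specific to Google or Bing.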

You may have noticed that I highlighted the word supposed when mentioning Bing. This is because it appears that Bing is cheating, and is using some of Google’s results in their search lists. Plagiarising Google’s results may be Bing’s way of saying that Google is better. However it leaves a bad taste as it means that one of the main reasons for using Microsoft’s search engine can be questioned, i.e. that the results are different and that all are generated independently, using different relevancy rankings.

Bing is Microsoft’s third attempt at a market-leading, Google bashing, search engine – replacing Live.com which in turn had replaced MSN Search. Bing has been successful and is truly a good alternative to Google. It is the default search engine on Facebook (i.e. when doing a search on Facebook, you get Bing results) and is also used to supply results to other search utilities – most notably Yahoo! From a marketing perspective, however, it appears that the adage “differentiate or die” hasn’t been fully understood by Bing. Companies that fail to fully differentiate their product offerings from competitors are likely to fail.

The story that Bing was copying Google’s results dates back to Summer 2010, when Google noticed an odd similarity between the results for a highly specialist search on the two search engines. This, in itself, wouldn’t be a problem. You’d expect similar results for very targeted search terms – the main difference would be the sort order. However in this case, the same top results were being generated even when spelling mistakes were used as the search term. Google started to look more closely – and found that this wasn’t just a one-off. However, proving that Bing was stealing Google’s results needed more than just observation. To test the hypothesis, Google set up 100 dummy and nonsense queries that led to web-sites that had no relationship at all to the query. They then gave their testers laptops with a new Windows install – running Microsoft’s Internet Explorer 8 with the Bing Toolbar installed. The install process included the “Suggested Sites” feature of Internet Explorer and the toolbar’s default options.

Within a few weeks, Bing started returning the fake results for the same Google searches. For example, a search for hiybbprqag gave the seating plan for a Los Angeles theatre, while delhipublicschool40 chdjob returned an Ohio credit union as the top result. This proved that the source for the results was not Bing’s own search algorithm but that the result had been taken from Google.
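The logic of the sting can be sketched as a simple simulation (a toy model of my own, not Google's actual code): plant nonsense queries mapped to arbitrary, unrelated result pages, then count how many of those planted pairings later show up in a competitor's results.

```python
import random
import string

def make_honeypot(n: int = 100) -> dict[str, str]:
    """Plant n nonsense queries, each mapped to an unrelated 'result' page."""
    rng = random.Random(42)  # fixed seed so the sketch is reproducible
    queries = ("".join(rng.choices(string.ascii_lowercase, k=12))
               for _ in range(n))
    return {q: f"https://unrelated.example/{i}" for i, q in enumerate(queries)}

def count_leaks(honeypot: dict[str, str],
                competitor_results: dict[str, str]) -> int:
    """A competitor returning the planted result for a nonsense query is a leak."""
    return sum(competitor_results.get(q) == url for q, url in honeypot.items())

honeypot = make_honeypot()
# Simulate a competitor that has somehow 'learned' 7 of the planted pairings
leaked = dict(list(honeypot.items())[:7])
print(count_leaks(honeypot, leaked))  # → 7
```

Because the queries are gibberish, any match at all is damning: no independent algorithm could arrive at the same arbitrary page for the same nonsense string.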

What was happening was that the searches and search results on Google were being passed back to Microsoft – via some feature of Internet Explorer 8, Windows or the Bing Toolbar.

As Google states in their Blog article on the discovery (which is illustrated with screenshots of the findings):

At Google we strongly believe in innovation and are proud of our search quality. We’ve invested thousands of person-years into developing our search algorithms because we want our users to get the right answer every time they search, and that’s not easy. We look forward to competing with genuinely new search algorithms out there—algorithms built on core innovation, and not on recycled search results from a competitor. So to all the users out there looking for the most authentic, relevant search results, we encourage you to come directly to Google. And to those who have asked what we want out of all this, the answer is simple: we’d like for this practice to stop.

Interestingly, Bing doesn’t even try to deny the claim – perhaps because they realise that they were caught red-handed. Instead they have tried to justify using the data on customer computers as a way of improving search experiences – even when the searching was being done via a competitor.  In fact, Harry Shum, a Bing VP, believes that this is actually good practice, stating in Bing’s response to a blog post by Danny Sullivan that exposed the practice:

“We have been very clear. We use the customer data to help improve the search experience…. We all learn from our collective customers, and we all should.”

It is well known that companies collect data on customer usage of their own web-sites – that is one purpose of cookies generated when visiting a site. It is less well known that some companies also collect data on what users do on other sites (which is why Yauba boasts about its privacy credentials). I’m sure that the majority of users of the Bing toolbar and other Internet Explorer and Windows features that seem to pass back data to Microsoft would be less happy if they knew how much data was collected and where from. Microsoft has been collecting such data for several years, but ethically the practice is highly questionable, even though Microsoft users may have originally agreed to the company collecting data to “help improve the online experience“.

What the story also shows is how much care and pride Google take in their results – and how they have an effective competitive intelligence (and counter-intelligence) programme, actively comparing their results with competitors. Microsoft even recognised this by falsely accusing Google of spying via their sting operation that exposed Microsoft’s practices – with Shum commenting (my italics):

What we saw in today’s story was a spy-novelesque stunt to generate extreme outliers in tail query ranking. It was a creative tactic by a competitor, and we’ll take it as a back-handed compliment. But it doesn’t accurately portray how we use opt-in customer data as one of many inputs to help improve our user experience.

To me, this sounds like sour grapes. How can copying a competitor’s results improve the user experience? If it doesn’t accurately portray how customer data IS used, maybe now would be the time for Microsoft to reassure customers regarding their data privacy. And rather than view Google’s exposure of Bing’s practices as a back-handed compliment, I’d see it as a slap in the face with the front of the hand. However what else could Microsoft and Bing say, other than mea culpa?

Update – Wednesday 2 February 2011:

The war of words between Google and Bing continues. Bing has now denied copying Google’s results, and moreover accused Google of click-fraud:

Google engaged in a “honeypot” attack to trick Bing. In simple terms, Google’s “experiment” was rigged to manipulate Bing search results through a type of attack also known as “click fraud.” That’s right, the same type of attack employed by spammers on the web to trick consumers and produce bogus search results.  What does all this cloak and dagger click fraud prove? Nothing anyone in the industry doesn’t already know. As we have said before and again in this post, we use click stream optionally provided by consumers in an anonymous fashion as one of 1,000 signals to try and determine whether a site might make sense to be in our index.

Bing seems to have ignored the fact that Google’s experiment resulted from its observation that certain genuine searches seemed to be copied by Bing – including misspellings, and even mistakes in Google’s own algorithm that produced odd results. The accusation of click fraud is bizarre, as the searches Google used for the test were completely artificial. There is no way that a normal searcher would have made such searches, so the fact that the results bore no resemblance to the actual search terms is completely different to the spam practice where a dummy site appears for certain searches.

Bing can accuse Google of cloak and dagger behaviour. However sometimes, counter-intelligence requires such behaviour to catch miscreants red-handed. It’s a practice carried out by law enforcement globally where a crime is suspected but where there is insufficient evidence to catch the culprit. As an Internet example, one technique used to catch paedophiles is for a police officer to pretend to be a vulnerable child on an Internet chat-room. Is this fraud – when the paedophile subsequently arranges to meet up – and is caught? In some senses it is. However saying such practices are wrong gives carte-blanche to criminals to continue their illegal practices. Bing appears to be putting themselves in the same camp – by saying that using “honeypot” attacks is wrong.

They have also not recognised the points I’ve stressed about the ethical use of data. There is a big difference between using anonymous data to track user behaviour on your own search engine and tracking that of a competitor. Using your competitor’s data to improve your own product, when the intelligence was gained by technology that effectively taps into the usage of your competitor’s customers, is espionage. The company guilty of spying is Bing – not Google. Google just used competitive intelligence to identify the problem, and a creative approach to counter-intelligence to prove it.
