Search experts regularly emphasise that getting the best search results means using more than one search engine. The main reason is that each engine uses a different relevancy ranking, and so produces a different results page. A Google search returns the sites that Google considers most relevant for the query, while Bing is supposed to rank its top hits using a different relevancy measure. That alternative ranking may give better results for some searches, so a comprehensive search needs to draw on multiple engines.
You may have noticed that I highlighted the word supposed when mentioning Bing. This is because it appears that Bing is cheating, and using some of Google’s results in its own search lists. Plagiarising Google’s results may be Bing’s way of admitting that Google is better. However, it leaves a bad taste, as it undermines one of the main reasons for using Microsoft’s search engine: that its results are different, and generated independently using a different relevancy ranking.
Bing is Microsoft’s third attempt at a market-leading, Google-bashing search engine – replacing Live.com, which in turn had replaced MSN Search. Bing has been successful and is a genuinely good alternative to Google. It is the default search engine on Facebook (i.e. a search on Facebook returns Bing results) and also supplies results to other search utilities – most notably Yahoo! From a marketing perspective, however, it appears that Bing hasn’t fully taken the adage “differentiate or die” to heart. Companies that fail to differentiate their product offerings from competitors’ are unlikely to thrive.
The story that Bing was copying Google’s results dates back to summer 2010, when Google noticed an odd similarity between the two engines’ results for a highly specialised search. This, in itself, wouldn’t be a problem: you’d expect similar results for very targeted search terms, with the main difference being the sort order. In this case, however, the same top results were appearing even when the search term was a spelling mistake. Google started to look more closely – and found that this wasn’t just a one-off. Proving that Bing was taking Google’s results, though, needed more than observation. To test the hypothesis, Google set up 100 dummy, nonsense queries whose results pointed to web-sites with no relationship at all to the query. They then gave their testers laptops with a fresh Windows install – running Microsoft’s Internet Explorer 8 with the Bing Toolbar installed. The install process enabled the “Suggested Sites” feature of Internet Explorer and accepted the toolbar’s default options.
Within a few weeks, Bing started returning the planted results for the same searches. For example, a search for hiybbprqag gave the seating plan for a Los Angeles theatre, while delhipublicschool40 chdjob returned an Ohio credit union as the top result. This proved that the results did not come from Bing’s own search algorithm: they had been taken from Google.
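The logic of the sting is simple enough to sketch in a few lines. The following is a hypothetical illustration, not Google’s actual tooling: the query strings are from the story, but the URLs, data structures and function are invented for the example. The real experiment planted 100 queries and inspected results by hand.

```python
# Honeypot check, sketched: plant nonsense queries whose results point at
# unrelated pages, then see whether those exact pairings reappear elsewhere.
# All data below is illustrative (hypothetical URLs and observations).

# Nonsense queries deliberately mapped to unrelated pages.
planted = {
    "hiybbprqag": "la-theatre-seating.example.com",
    "delhipublicschool40 chdjob": "ohio-credit-union.example.com",
}

# Top results later observed on the competitor (hypothetical observation).
competitor_top_result = {
    "hiybbprqag": "la-theatre-seating.example.com",
    "delhipublicschool40 chdjob": "ohio-credit-union.example.com",
}

def copied_queries(planted, observed):
    """Return queries whose planted result reappeared verbatim.

    A nonsense query has no organic reason to rank the planted page,
    so any exact match is strong evidence the result was copied."""
    return [q for q, url in planted.items() if observed.get(q) == url]

matches = copied_queries(planted, competitor_top_result)
print(f"{len(matches)} of {len(planted)} honeypot results reappeared: {matches}")
```

The strength of the method is that a match on even a handful of such queries is damning, because the planted pairings could not arise by independent ranking.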
What was happening was that the searches and search results on Google were being passed back to Microsoft – via some feature of Internet Explorer 8, Windows or the Bing Toolbar.
As Google states in their blog article on the discovery (which is illustrated with screenshots of the findings):
At Google we strongly believe in innovation and are proud of our search quality. We’ve invested thousands of person-years into developing our search algorithms because we want our users to get the right answer every time they search, and that’s not easy. We look forward to competing with genuinely new search algorithms out there—algorithms built on core innovation, and not on recycled search results from a competitor. So to all the users out there looking for the most authentic, relevant search results, we encourage you to come directly to Google. And to those who have asked what we want out of all this, the answer is simple: we’d like for this practice to stop.
Interestingly, Bing doesn’t even try to deny the claim – perhaps because they realise they were caught red-handed. Instead, they have tried to justify harvesting data from customers’ computers as a way of improving the search experience – even when the searching was being done on a competitor’s site. In fact, Harry Shum, a Bing VP, believes this is actually good practice, stating in Bing’s response to the blog post by Danny Sullivan that exposed the practice:
“We have been very clear. We use the customer data to help improve the search experience…. We all learn from our collective customers, and we all should.”
It is well known that companies collect data on how customers use their own web-sites – that is one purpose of the cookies generated when visiting a site. It is less well known that some companies also collect data on what users do on other sites (which is why Yauba boasts about its privacy credentials). I’m sure that the majority of users of the Bing Toolbar, and of the other Internet Explorer and Windows features that appear to pass data back to Microsoft, would be less happy if they knew how much data was collected, and from where. Microsoft has been collecting such data for several years, but ethically the practice is highly questionable, even if users originally agreed to the company collecting data to “help improve the online experience”.
The story also shows how much care and pride Google take in their results – and that they run an effective competitive intelligence (and counter-intelligence) programme, actively comparing their results with competitors’. Microsoft even recognised this, in effect, by accusing Google of spying via the sting operation that exposed Microsoft’s practices – with Shum commenting (my italics):
What we saw in today’s story was a spy-novelesque stunt to generate extreme outliers in tail query ranking. It was a creative tactic by a competitor, and we’ll take it as a back-handed compliment. But it doesn’t accurately portray how we use opt-in customer data as one of many inputs to help improve our user experience.
To me, this sounds like sour grapes. How can copying a competitor’s results improve the user experience? And if the episode doesn’t accurately portray how customer data IS used, now would be a good time for Microsoft to reassure customers about their data privacy. Rather than viewing Google’s exposure of Bing’s practices as a back-handed compliment, I’d see it as a slap in the face with the front of the hand. But what else could Microsoft and Bing say, other than mea culpa?
Update – Wednesday 2 February 2011:
The war of words between Google and Bing continues. Bing has now denied copying Google’s results, and moreover accused Google of click-fraud:
Google engaged in a “honeypot” attack to trick Bing. In simple terms, Google’s “experiment” was rigged to manipulate Bing search results through a type of attack also known as “click fraud.” That’s right, the same type of attack employed by spammers on the web to trick consumers and produce bogus search results. What does all this cloak and dagger click fraud prove? Nothing anyone in the industry doesn’t already know. As we have said before and again in this post, we use click stream optionally provided by consumers in an anonymous fashion as one of 1,000 signals to try and determine whether a site might make sense to be in our index.
Bing seems to have ignored the fact that Google’s experiment grew out of the observation that certain genuine searches appeared to be copied by Bing – including misspellings and some algorithmic mistakes that produced odd results. The accusation of click fraud is bizarre, as the queries Google planted were completely artificial. No normal searcher would have made such searches, so the fact that the results bore no resemblance to the search terms is completely different from the spam practice of making a dummy site appear for certain searches.
Bing can accuse Google of cloak-and-dagger behaviour. Sometimes, however, counter-intelligence requires exactly that to catch miscreants red-handed. It’s a practice used by law enforcement globally when a crime is suspected but there is insufficient evidence to catch the culprit. As an Internet example, one technique used to catch paedophiles is for a police officer to pose as a vulnerable child in an Internet chat-room. Is this fraud when the paedophile subsequently arranges to meet up – and is caught? In some senses it is. But saying such practices are wrong gives carte blanche to criminals to continue their illegal activities. By condemning “honeypot” attacks as wrong, Bing is putting itself in the same camp.
Bing has also not recognised the points I’ve stressed about the ethical use of data. There is a big difference between using anonymous data to track user behaviour on your own search engine and tracking behaviour on a competitor’s. Using intelligence gained from technology that effectively eavesdrops on your competitor’s customers to improve your own product is espionage. The company guilty of spying is Bing – not Google. Google simply used competitive intelligence to identify the problem, and a creative piece of counter-intelligence to prove it.
I’ve been impressed by the number of people using social networking sites – and the importance of social networking for marketing has grown significantly over the last few years.
Facebook claims 400 million users (i.e. nearly 6% of a global population approaching 7 billion). I’ve always thought this figure must include duplicate accounts, as I don’t believe most people in China, India, Africa and many other parts of the world have Facebook accounts (or even computers – although the numbers are growing). The World Bank stated that there were just under 300m Internet users in China and 52m in India in 2008. (There’s a great graph of this at Google’s Public Data tool, which shows that in 2008 there were around 1.5bn web users.)
Even taking into account exponential growth – let’s assume there are now over 2 billion web users globally – Facebook’s figures imply that 1 in 5 of them has a Facebook account.
I know of many people who don’t have an account and some who refuse to get one. In my age group (over 40), I’d guess that the majority don’t. So where this 400m figure came from and what it includes is a key question.
It now seems that Facebook has been inflating its membership figures. I just read this article from one of my favourite sites (www.pandia.com). Apparently Facebook has been telling advertisers that it has 1.6m users in Oslo. The trouble is that the greater Oslo metropolitan area has only 900,000 people. Facebook apparently counts members by IP address, and I suppose this could conceivably include users who access the site via Oslo-based web servers. However, that can’t explain the next statistic. The Facebook advertiser tool says there are 850,000 Facebook users aged 20–29 in Norway – 237,000 more than the total number of Norwegians (613,000) in that age group.
This over-inflation isn’t just a Norwegian issue. According to CheckFacebook.com (a site that tracks data from the Facebook advertising tool giving Facebook membership numbers), almost 63% of online users in the UK now have a Facebook account. That’s 27m out of a total UK population of 62m. In some countries it’s even higher. Apparently all (100%) Nicaraguan, Qatari and Bangladeshi web users also have a Facebook account, as do 99% of Indonesians, 98% of Filipinos, 97% of Venezuelans, and 85% of Turks.
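The sense-check is just a penetration ratio: claimed members divided by a reference population, flagging anything over 100% as impossible. Here is a minimal sketch using the figures quoted above (the numbers come from the article and the sources it cites, not from any official dataset):

```python
# Sanity-checking claimed membership against reference populations.
# Figures are those quoted in the article; any ratio above 1.0 is
# impossible on its face and signals inflated or miscounted members.

claims = {
    # region: (claimed_facebook_users, reference_population)
    "Oslo metro area":    (1_600_000, 900_000),      # members vs residents
    "Norway, age 20-29":  (850_000, 613_000),        # members vs age group
    "UK (vs total pop.)": (27_000_000, 62_000_000),  # members vs population
}

for region, (members, population) in claims.items():
    ratio = members / population
    flag = "  <-- impossible" if ratio > 1.0 else ""
    print(f"{region}: {ratio:.0%} penetration{flag}")
```

Note that the UK line passes this crude test against total population even though, measured against online users only, the article reports a much higher 63% figure – which is why the choice of denominator matters as much as the numerator.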
It’s possible that these statistics are true. However, if so, I’m sure they also include occasional and infrequent users, as well as dormant and duplicate accounts.
One of the most important principles of competitive intelligence analysis is not to take figures at face value. When presented with numbers, it’s important to sense-check them – wherever possible against other sources (e.g. official population statistics) – before using them in decision making. You should also ask whether there is an incentive to exaggerate or under-state the statistics; if there is, it is likely that the published data will reflect it. Decisions based on erroneous or manipulated figures will probably be poor decisions and fail to achieve the expected results. In Facebook’s case, the incentive to exaggerate membership figures is clear: bigger numbers boost its attractiveness to advertisers, and consequently its advertising revenues.