Archive

Posts Tagged ‘Bing’

Zanran – a new data search engine

April 21, 2011 4 comments

I’ve been playing with a new data search engine called Zanran – that focuses on finding numerical and graphical data. The site is in an early beta. Nevertheless my initial tests brought up material that would only have been found using an advanced search on Google – if you were lucky. As such, Zanran promises to be a great addition for advanced data searching.

Zanran.com

Zanran.com - Front Page

Zanran focuses on finding what it calls  ‘semi-structured’ data on the web. This is defined as numerical data presented as graphs, tables and charts – and these could be held in a graph image or table in an HTML file, as part of a PDF report, or in an Excel spreadsheet. This is the key differentiator – essentially, Zanran is not looking for text but for formatted numerical data.

When I first started looking at the site I was expecting something similar to Wolfram Alpha – or perhaps something from Google (e.g. Google Squared or Google Public Data). Zanran is nothing like these – and so brings something new to search. Rather than take data and structure or tabulate it (as with Wolfram Alpha and Google Squared), Zanran searches for data that is already in tables or charts and uses this in its results listing.

Zanran.com

Zanran.com Search: "Average Marriage Age"

The site has a nice touch in that hovering the cursor over results gives you the relevant data page – whether a table, a chart or a mix of text, tables or charts.

Zanran.com - Hovering over a result brings up an image of the data.

The advanced search options allow country searching (based on server location), document date and file type, each selectable from a drop-down box, as well as searches on specified web-sites.  At the moment only English speaking countries can be selected (Australia, Canada, Ireland, India, UK New Zealand, USA and South Africa). The date selections allow for the last 6, 12 or 24 months and the file type allows for selection based on PDF; Excel; images in HTML files; tables in HTML files; PDF, Excel and dynamic data; and dynamic data alone. PowerPoint and Word files are promised as future options. There are currently no field search options (e.g. title searches).

My main dislike was that the site doesn’t give the full URLs for the data presented. The top-level domain is given, but not the actual URL which makes the site difficult to use when full attribution is required for any data found (especially if data gets downloaded, rather than opening up in a new page or tab).

Zanran.com has been in development since at least 2009 when it was a finalist in the London Technology Fund Competition. The technology behind Zanran is patented and based on open-source software, and cloud storage. Rather than searching for text, Zanran searches for numerical content, and then classifies it by whether it’s a table or a chart.

Atypically, Zanran is not a Californian Silicon Valley Startup, but is based in the Islington area of London, in a quiet residential side-street made up of a mixture of small mostly home-based businesses and flats/apartments. Zanran was founded by two chemists, Jonathan Goldhill and Yves Dassas, who had previously run telecom businesses (High Track Communications Ltd and Bikebug Radio Technologies) from the same address. Funding has come from the London Development Agency and First Capital among other investors.

Zanran views competitors as Wolfram Alpha, Google Public Data and also Infochimps (a database repository – enabling users to search for and download a wide variety of databases). The competitor list comes from Google’s cache of Zanran’s Wikipedia page as unfortunately, Wikipedia has deleted the actual page – claiming that the site is “too new to know if it will or will not ever be notable“.

Google Cache of Zanran's Wikipedia entry

I hope that Wikipedia is wrong and that Zanran will become “notable” as I think the company offers a new approach to searching the web for data. It will never replace Google or Bing – but that’s not its aim. Zanran aims to be a niche tool that will probably only ever be used by search experts. However as such, it deserves a chance, and if its revenue model (I’m assuming that there is one) works, it deserves success.

Google versus Bing – a competitive intelligence case study

February 2, 2011 7 comments

Search experts regularly emphasise that to get the best search results it is important to use more than one search engine. The main reason for this is that each search engine uses a different relevancy ranking leading to different search results pages. Using Google will give a results page with the sites that Google thinks are the most relevant for the search query, while using Bing is supposed to give a results page where the top hits are based on a different relevancy ranking. This alternative may give better results for some searches and so a comprehensive search needs to use multiple search engines.

You may have noticed that I highlighted the word supposed when mentioning Bing. This is because it appears that Bing is cheating, and is using some of Google’s results in their search lists. Plagiarising Google’s results may be Bing’s way of saying that Google is better. However it leaves a bad taste as it means that one of the main reasons for using Microsoft’s search engine can be questioned, i.e. that the results are different and that all are generated independently, using different relevancy rankings.

Bing is Microsoft’s third attempt at a market-leading, Google bashing, search engine – replacing Live.com which in turn had replaced MSN Search. Bing has been successful and is truly a good alternative to Google. It is the default search engine on Facebook (i.e. when doing a search on Facebook, you get Bing results) and is also used to supply results to other search utilities – most notably Yahoo! From a marketing perspective, however, it appears that the adage “differentiate or die” hasn’t been fully understood by Bing. Companies that fail to fully differentiate their product offerings from competitors are likely to fail.

The story that Bing was copying Google’s results dates back to Summer 2010, when Google noticed an odd similarity to a highly specialist search on the two search engines. This, in itself wouldn’t be a problem. You’d expect similar results for very targeted search terms – the main difference will be the sort order. However in this case, the same top results were being generated when spelling mistakes were used as the search term. Google started to look more closely – and found that this wasn’t just a one-off. However to prove that Bing was stealing Google’s results needed more than just observation. To test the hypothesis, Google set up 100 dummy and nonsense queries that led to web-sites that had no relationship at all to the query. They then gave their testers laptops with a new Windows install – running Microsoft’s Internet Explorer 8 and with the Bing Toolbar installed. The install process included the “Suggested Sites” feature of Internet Explorer and the toolbar’s default options.

Within a few weeks, Bing started returning the fake results for the same Google searches. For example, a search for hiybbprqag gave the seating plan for a Los Angeles theatre, while delhipublicschool40 chdjob returned a Ohio Credit Union as the top result. This proved that the source for the results was not Bing’s own search algorithm but that the result had been taken from Google.

What was happening was that the searches and search results on Google were being passed back to Microsoft – via some feature of Internet Explorer 8, Windows or the Bing Toolbar.

As Google states in their Blog article on the discovery (which is illustrated with screenshots of the findings):

At Google we strongly believe in innovation and are proud of our search quality. We’ve invested thousands of person-years into developing our search algorithms because we want our users to get the right answer every time they search, and that’s not easy. We look forward to competing with genuinely new search algorithms out there—algorithms built on core innovation, and not on recycled search results from a competitor. So to all the users out there looking for the most authentic, relevant search results, we encourage you to come directly to Google. And to those who have asked what we want out of all this, the answer is simple: we’d like for this practice to stop.

Interestingly, Bing doesn’t even try to deny the claim – perhaps because they realise that they were caught red-handed. Instead they have tried to justify using the data on customer computers as a way of improving search experiences – even when the searching was being done via a competitor.  In fact, Harry Shum, a Bing VP, believes that this is actually good practice, stating in Bing’s response to a blog post by Danny Sullivan that exposed the practice:

“We have been very clear. We use the customer data to help improve the search experience…. We all learn from our collective customers, and we all should.”

It is well known that companies collect data on customer usage of their own web-sites – that is one purpose of cookies generated when visiting a site. It is less well known that some companies also collect data on what users do on other sites (which is why Yauba boasts about its privacy credentials). I’m sure that the majority of users of the Bing toolbar and other Internet Explorer and Windows features that seem to pass back data to Microsoft would be less happy if they knew how much data was collected and where from. Microsoft has been collecting such data for several years, but ethically the practice is highly questionable, even though Microsoft users may have originally agreed to the company collecting data to “help improve the online experience“.

What the story also shows is how much care and pride Google take in their results – and how they have an effective competitive intelligence (and counter-intelligence) programme, actively comparing their results with competitors. Microsoft even recognised this by falsely accusing Google of spying via their sting operation that exposed Microsoft’s practices – with Shum commenting (my italics):

What we saw in today’s story was a spy-novelesque stunt to generate extreme outliers in tail query ranking. It was a creative tactic by a competitor, and we’ll take it as a back-handed compliment. But it doesn’t accurately portray how we use opt-in customer data as one of many inputs to help improve our user experience.

To me, this sounds like sour-grapes. How can copying a competitor’s results improve the user experience? If it doesn’t accurately portray how customer data IS used, maybe now would be the time for Microsoft to reassure customers regarding their data privacy. And rather than view the comment that Google’s exposure of Bing’s practices was a back-handed compliment, I’d see it as slap in the face with the front of the hand. However what else could Microsoft & Bing say, other than Mea Culpa.

Update – Wednesday 2 February 2011:

The war of words between Google and Bing continues. Bing has now denied copying Google’s results, and moreover accused Google of click-fraud:

Google engaged in a “honeypot” attack to trick Bing. In simple terms, Google’s “experiment” was rigged to manipulate Bing search results through a type of attack also known as “click fraud.” That’s right, the same type of attack employed by spammers on the web to trick consumers and produce bogus search results.  What does all this cloak and dagger click fraud prove? Nothing anyone in the industry doesn’t already know. As we have said before and again in this post, we use click stream optionally provided by consumers in an anonymous fashion as one of 1,000 signals to try and determine whether a site might make sense to be in our index.

Bing seems to have ignored the fact that Google’s experiment resulted from their observation that certain genuine searches seemed to be copied by Bing – including misspellings, and also some mistakes in their algorithm that resulted in odd results. The accusation of click fraud is bizarre as the searches Google used to test for click fraud were completely artificial. There is no way that a normal searcher would have made such searches, and so the fact that the results bore no resemblance to the actual search terms is completely different to the spam practice where a dummy site appears for certain searches.

Bing can accuse Google of cloak and dagger behaviour. However sometimes, counter-intelligence requires such behaviour to catch miscreants red-handed. It’s a practice carried out by law enforcement globally where a crime is suspected but where there is insufficient evidence to catch the culprit. As an Internet example, one technique used to catch paedophiles is for a police officer to pretend to be a vulnerable child on an Internet chat-room. Is this fraud – when the paedophile subsequently arranges to meet up – and is caught? In some senses it is. However saying such practices are wrong gives carte-blanche to criminals to continue their illegal practices. Bing appears to be putting themselves in the same camp – by saying that using “honeypot” attacks is wrong.

They also have not recognised the points I’ve stressed about the ethical use of data. There is a big difference between using anonymous data tracking user  behaviour on your own search engine and tracking that of a competitor. Using your competitor’s data to improve your own product, when the intelligence was gained by technology that effectively hacks into usage made by your competitor’s customers is espionage. The company guilty of spying is Bing – not Google. Google just used competitive intelligence to identify the problem, and a creative approach to counter-intelligence to prove it.

But it’s not google – Bing goes Live!

June 2, 2009 Leave a comment

Another long wait between entries – I really must update more often. However recent events in the Search world and in the CI world mean I have no choice but to update. My thoughts on recent changes at SCIP will have to wait till my next post. This post will look at Microsoft‘s replacement for Live and MSN Search – with its new Bing search engine.

Searches at Live or MSN Search now redirect to Bing.com. I like the front-end – it’s clean and colourful. However I couldn’t find anywhere to change the front image – at least on the UK version that’s still in Beta.
The US version does allow you to scroll back to previous images – with a little arrow option at the bottom of the right side of the screen.

The US version also includes hot-spots describing aspects of the picture, plus a side-bar offering more search options.

At the bottom of both versions is a link for help – interestingly still pointing to Live.com. Obviously Microsoft still has more work to do on this. The help section gives the format for advanced commands and also allows you to remove the screen background.

So how does Bing perform. For the searches I tried, the results are good – and there isn’t that much to choose between Google and Bing. One difference i did notice is that URLs with the search terms used seem to come higher than other sites – so, for example, AWARE‘s web-site came to the top for a search on “marketing-intelligence“. Also relevant is that the algorithm is sufficiently intelligent to realise that “CompetitorAnalysis.com” is a likely candidate for searches on “Competitor Analysis“. I’m not sure the same precision exists in Google. Another odd feature is that some titles seem to be edited. For example some searches on my web-site content bring up the following title: “

This title doesn’t exist on our web-site so has been taken from somewhere else – most likely from a link on a UK government business support web-site.

Where Bing falls is in the advanced searching and also the preferences. I like that you can set Google to display 100 hits at a time. Bing only allows 50. Bing also lacks some of the field / advanced search options available to Google. There are no wild-card searches (using the * character) or synonym searches (the ~ character) for example. However there are options that are not currently available in Google – such as the feed:, hasfeed:, loc:, and contains: options. These allow for searching for RSS sites (feed: and hasfeed:), location searches (loc:), searches for sites containing links to types of content such as WMA, MPG files, etc. – contains:. These options are not available in the advanced search boxes.

All in all – i like Bing and prefer its interface to Live. I like colourful pages, and have customised my Google page with iGoogle themes, and Ask with it’s skins. Yet again, however, this is not a Google Killer – and perhaps it’s not trying to be. The key thing: Bing is not google!

A number of other reviews on Bing worth reading:

Mixed reviews of Bing, Microsoft’s new search engine
– the Daily Telegraph
Bing Don’t Bother
– Karen Blakeman’s review

Bing Launches – it’s awful
– Phil Bradley’s review

Bing Bing: Microsoft’s search engine unexpectedly live, but not Live
– the Guardian