
Posts Tagged ‘Wikipedia’

Zanran – a new data search engine

April 21, 2011

I’ve been playing with a new data search engine called Zanran, which focuses on finding numerical and graphical data. The site is in early beta. Nevertheless, my initial tests brought up material that I would otherwise only have found using an advanced search on Google, and only if I was lucky. As such, Zanran promises to be a great addition for advanced data searching.


Zanran.com - Front Page

Zanran focuses on finding what it calls ‘semi-structured’ data on the web: numerical data presented as graphs, tables and charts. This data could be held in a graph image or table in an HTML file, in a PDF report, or in an Excel spreadsheet. This is the key differentiator: Zanran is not looking for text but for formatted numerical data.

When I first started looking at the site I was expecting something similar to Wolfram Alpha – or perhaps something from Google (e.g. Google Squared or Google Public Data). Zanran is nothing like these – and so brings something new to search. Rather than take data and structure or tabulate it (as with Wolfram Alpha and Google Squared), Zanran searches for data that is already in tables or charts and uses this in its results listing.


Zanran.com Search: "Average Marriage Age"

The site has a nice touch: hovering the cursor over a result brings up the relevant data page, whether it is a table, a chart or a mix of text, tables and charts.

Zanran.com - Hovering over a result brings up an image of the data.

The advanced search options allow searching by country (based on server location), document date and file type, each selectable from a drop-down box, as well as searches on specified websites. At the moment only English-speaking countries can be selected (Australia, Canada, Ireland, India, UK, New Zealand, USA and South Africa). The date selections cover the last 6, 12 or 24 months, and the file type options are PDF; Excel; images in HTML files; tables in HTML files; PDF, Excel and dynamic data; and dynamic data alone. PowerPoint and Word files are promised as future options. There are currently no field search options (e.g. title searches).

My main dislike was that the site doesn’t give the full URLs for the data presented. The domain is given, but not the actual URL, which makes the site difficult to use when full attribution is required for any data found (especially if the data gets downloaded rather than opening in a new page or tab).

Zanran.com has been in development since at least 2009, when it was a finalist in the London Technology Fund Competition. The technology behind Zanran is patented and based on open-source software and cloud storage. Rather than searching for text, Zanran searches for numerical content and then classifies it according to whether it’s a table or a chart.

Atypically, Zanran is not a Californian Silicon Valley startup but is based in the Islington area of London, in a quiet residential side-street made up of a mixture of small, mostly home-based businesses and flats/apartments. Zanran was founded by two chemists, Jonathan Goldhill and Yves Dassas, who had previously run telecom businesses (High Track Communications Ltd and Bikebug Radio Technologies) from the same address. Funding has come from the London Development Agency and First Capital, among other investors.

Zanran views its competitors as Wolfram Alpha, Google Public Data and also Infochimps (a database repository that lets users search for and download a wide variety of databases). The competitor list comes from Google’s cache of Zanran’s Wikipedia page because, unfortunately, Wikipedia has deleted the actual page, claiming that the site is “too new to know if it will or will not ever be notable”.

Google Cache of Zanran's Wikipedia entry

I hope that Wikipedia is wrong and that Zanran will become “notable”, as I think the company offers a new approach to searching the web for data. It will never replace Google or Bing, but that’s not its aim. Zanran aims to be a niche tool that will probably only ever be used by search experts. As such, however, it deserves a chance, and if its revenue model (I’m assuming that there is one) works, it deserves success.

Google Squared – tabulate results instantly

June 10, 2009

Google Squared is a new addition to the Google Labs portfolio of products being tested by Google. Launched on June 3, it appears to be aimed at getting more out of simple searches, perhaps a bit like WolframAlpha.

It’s still very much a beta-test addition to the Google product range, so there are bound to be some holes. You enter a search term and a spreadsheet-type page appears, with various headings in the leftmost column followed by a description and various other (generally) relevant columns.

It’s an interesting way of using Google’s data; I just wish the results were more consistent and accurate. Entering the search term Planets should be a perfect way of testing how the spreadsheet approach works, and unfortunately it doesn’t work, at least not completely.

The Planets search gives a list on the left of Earth, Jupiter, Pluto, Saturn, Mercury, Venus and Neptune, but misses out Mars and Uranus. Fair enough, as there is an “Add next 10 items” link at the bottom (and also an “Add items” option). Adding the next 10 items, however, doesn’t give the missing planets but instead Ceres, Charon and various other headings that are only indirectly relevant (e.g. “Planets in Science Fiction” or “solar system”).

The “Add items” option does better in that it offers potential choices, which include Uranus and Mars.

The next column contains an image of each planet, although the one for Pluto, culled from the SouthernWatch blog, is actually a diagram of Pluto with a slogan attached saying “Pluto for Planethood”.

I’m sort of surprised that it didn’t include a picture of the Disney character.

The next column is a description, again taken from various places, so there’s no consistency. The description for Jupiter, for example, seems accurate and is taken from Wikipedia:

Jupiter is classified as a gas giant along with Saturn, Uranus and Neptune. Together, these four planets are sometimes referred to as the Jovian planets. …

The problem is that different sources are used for different planets, with the entries for Venus and Saturn being particularly wide of the mark, taken respectively from www.venus.com and www.saturn.com:

Only at Venus, find the sexiest women’s swimwear and clothing. Shop online or request a catalog for sizzling hot clothing and swimsuits. …

and
While you’re shopping for Saturn vehicles, we’ve given you an easy way to keep information for the next time you visit Saturn.com. With My Saved Info, …

The following columns give the orbital period, equatorial surface and mean density, with an option to add further items at the end. The problem is not knowing how accurate these are; in fact they appear as eccentric as the planet descriptions, with some containing units and others just a number. I certainly wouldn’t want to use the values, or any other entries, in a school paper or anywhere else I need reliable answers.

In summary, Google Squared is interesting, and if Google manages to include some quality checking (perhaps by only using certain sites, or by back-checking to ensure the correct context), this could be a winner. Until then, I’ll stick with WolframAlpha and Wikipedia for a quick look at multiple facts.
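As a rough illustration of the “only using certain sites” idea, here is a minimal Python sketch that keeps a description only if its source URL belongs to an allow-listed domain. The domain list, the `is_trusted` and `filter_descriptions` names, and the example URLs are all hypothetical; nothing here reflects how Google Squared actually chooses its sources.

```python
from urllib.parse import urlparse

# Hypothetical allow-list: the post does not name any specific trusted sites.
TRUSTED_DOMAINS = {"en.wikipedia.org", "nasa.gov"}


def is_trusted(url: str) -> bool:
    """Return True if the URL's host is an allow-listed domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in TRUSTED_DOMAINS)


def filter_descriptions(rows):
    """Keep only (item, description, source_url) rows whose source is trusted."""
    return [row for row in rows if is_trusted(row[2])]


# Hypothetical rows modelled on the Planets search described above.
rows = [
    ("Jupiter", "Jupiter is classified as a gas giant", "https://en.wikipedia.org/wiki/Jupiter"),
    ("Venus", "Only at Venus, find the sexiest women's swimwear", "http://www.venus.com/"),
    ("Saturn", "While you're shopping for Saturn vehicles", "http://www.saturn.com/"),
]
print([item for item, _, _ in filter_descriptions(rows)])  # ['Jupiter']
```

Matching on the host (or any subdomain of an allow-listed domain) means a page on, say, solarsystem.nasa.gov would pass, while the swimwear and car sites would be rejected before their text ever reached the description column.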

WolframAlpha

June 4, 2009

I’m still not sure what to make of WolframAlpha, the new “computational search tool”. I like what it can do as a way of solving crosswords or doing math calculations. For a lot of information it’s probably easier to use than Wikipedia, but I can’t really see how it will help with most business-type queries, at least not yet.

If you want to find a word where you know some of the letters, it’s great. Type in _i_i_i and you’ll get the answer “bikini”, along with “militia” (which strictly fits the seven-letter pattern _i_i_i_ rather than _i_i_i). Put in an equation and you’ll get a graph; put in a chemical or molecular symbol and you’ll get information on the element or compound. Enter stock codes and you’ll get some company information, but too often the result is “Wolfram|Alpha isn’t sure what to do with your input.” You’ll get this if you enter British Telecom, yet WolframAlpha does know about BT: enter that and you get correct information on British Telecom’s share performance.
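To make the blank-letter pattern concrete, here is a minimal Python sketch that treats each underscore as exactly one unknown letter and checks a small, made-up word list. This assumes a one-letter-per-underscore reading; it is not how WolframAlpha works internally, and the function names and word list are hypothetical. Under this reading “bikini” fits _i_i_i, while “militia” needs one more blank (_i_i_i_).

```python
import re


def matches_pattern(pattern: str, word: str) -> bool:
    """Return True if word fits pattern, where each '_' stands for exactly one letter."""
    # Build a regex: '_' becomes a single-letter class, other characters match literally.
    regex = "".join("[a-z]" if ch == "_" else re.escape(ch) for ch in pattern.lower())
    return re.fullmatch(regex, word.lower()) is not None


def find_matches(pattern: str, words):
    """Return the words (in their original order) that fit the pattern."""
    return [w for w in words if matches_pattern(pattern, w)]


# Hypothetical word list; a real solver would use a full dictionary.
words = ["bikini", "militia", "bikinis", "kiwi", "civilian"]
print(find_matches("_i_i_i", words))   # ['bikini']
print(find_matches("_i_i_i_", words))  # ['militia', 'bikinis']
```

Using `re.fullmatch` rather than `re.search` is what enforces the length: the whole word has to fit the pattern, so a seven-letter word cannot satisfy a six-character pattern.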

I think part of the problem is that WolframAlpha is new and different. It’s NOT a search engine (despite the hype saying it would be a Google killer). It’s not an encyclopedia, although many entries are encyclopedic. Instead, it’s what its description says: a computational knowledge engine. Use it to carry out calculations or to bring up data held in its knowledge engine, but don’t use it for much more. It’s a useful addition to the search scene and will make life easier for some searches, but that’s about it. For most searches I’ll stick with Google and other search engines, and for general information I’ll remain happy with Wikipedia. However, I will use WolframAlpha for information that requires computation more complex, or detail greater, than Google’s calculator functionality offers.