Every Wednesday, Daniel Russell, a researcher working with Google, posts a search question on his search & research blog. The search question for 26 September 2012 related to differences between the coastlines on the East and West coasts of the USA. Attempting to answer the question I typed in [Atlantic islands] into Google. Unlike the usual list I’d expected, I got this:
The images at the top of my search were a surprise. Clicking on the arrows gave me further images – totalling 55 island pictures. I tried a few other searches [Pacific Islands], [Indian Ocean Islands], etc. and found similar results. Yet most searches such as [Scottish Islands] gave me the normal type of listing.
Intrigued, I contacted a couple of colleagues – Karen Blakeman of RBA Information Services and Marydee Ojala, Editor of Online magazine (and the Online Insider blog). Both Karen and Marydee are also members of the Association of Independent Information Professionals and like so many AIIP members, are expert searchers. (All three of us are presenting at the forthcoming Internet Librarian Conference in London and led the London Websearch Academy in 2011).
Marydee admitted to being bemused but guessed it was connected to Google’s Knowledge Graph initiative – the new service that puts details on a search topic to the right of the search results – as with this example search for [Albert Einstein].
Knowledge Graph was launched by Google in May 2012 and aims to give instant answers to many encyclopedia type search queries. However this didn’t explain what I’d found. Marydee looked a bit further and found that the TechCrunch blog had discovered this earlier in September.
I mentioned that I’d found it because of Dan Russell’s blog and Marydee asked him about the new feature. Dan responded that the “carousel” of images is triggered whenever Google knows about a collection or group of connected items such as “Atlantic Islands”. The group is then summarised and made available at the top of the results list – allowing searchers to quickly recognise the collection and the other group members.
So that’s it then! It’s a new feature giving a “carousel” of images. If you search for [knowledge graph carousel] you get the above Techcrunch link and also Google’s own search blog on the topic . (There’s a lesson here – always check Google’s own blog posts if you spot what looks like odd Google behaviour). A search for [Knowledge graph] gives Google’s own description of the feature, including a YouTube video explaining it.
Dan Russell’s reply however said more:
What it triggers on is a bit more problematic. Answer: only collections we know about, which can be a bit odd. [moons of Saturn] but not [U.S. presidents]. [famous jazz composers] works, but not [cities in UAE]
This seems to explain why not all searches show the carousel. [Atlantic Islands] does. So does [Pacific Islands] but [Islands] doesn’t. [Greek Islands] is mentioned as an example in the YouTube video – but the less touristy [Scottish Islands] fails to show the carousel. It’s not just islands that give oddly inconsistent results. [Famous Jazz composers] results in the carousel appearing but [famous composers] gives a normal display. [20th century composers] works as does [19th century composers]. Bizarrely [18th century composers] doesn’t work and nor does [20th century artists] or [19th century artists]. Yet [impressionist artists] and [surrealist artists] do work. The results definitely seem surreal!
The TechCrunch blog tested the feature looking at rides at the Cedar Point theme park in Northern Ohio. I decided to ride the carousel on Disney parks. Again the results were odd – but a pattern seemed to emerge. [Disneyland rides], [Epcot rides], [Magic Kingdom Rides] all worked but [Disneyworld rides] didn’t. I then tried [Disney Paris Rides]. That works. So does [Disney California Rides]. However [Disney Florida Rides], [Disney Tokyo Rides] and [Disney Hong Kong Rides] all failed to work.
It seems as if there are two factors playing out here. The first is whether Google knows enough about the topic to create a set of common images. My guess is that Disney Hong Kong and Tokyo fail on that count – and possibly this explains why 18th century composers also fails. That can’t however explain the difference between Disney California and Paris, compared to Disney Florida. That brings in the second factor: the number of items in the collection. There are several Disney World theme parks for Disney Florida – Epcot, Magic Kingdom and more. I suspect that there are too many rides to be displayed in a meaningful manner. The aim of the Carousel is to encourage exploration – and a never-ending list tends to do the opposite: like a carousel that goes to fast, there is a risk that people may fall off.
At the end of April 2012, the official web-site for the 2012 London Olympics was launched – listing participating countries. The list contained embarrassing errors – which illustrate how political and geographical ignorance overcame factual accuracy and even elementary school knowledge.
As an example, the web-site gives Asia as the location for Palestine but the country next door – Israel – is in Europe. A quick check on any atlas will show that Israel is located in Asia – as are its neighbours (Palestine, Jordan, Syria, Egypt).
When the web-site launched, this failure to check facts was even more inept. Originally the country profiles included the country capitals, population and currency. However for Israel the site initially put a blank for Israel and named Israel’s capital city, Jerusalem, as the capital of Palestine. When this was pointed out (as reported by the Times of Israel) this was reversed – with Palestine’s capital left out. Meanwhile the US Dollar was listed as the official currency for Palestine.
It’s not difficult to check such facts. There are numerous web-sites that list country capitals, currency and much more. For example, About.com has a geography section listing capitals. About.com is compiled by subject experts and is a good first stop when looking for general information – whether about geography, science, or many other school curriculum topics. Wikipedia also has a page listing country capitals. A quick search on WolframAlpha lists Jerusalem as Israel’s capital although an equivalent search fails on Palestine – perhaps because Palestine is not yet a country. There is also WorldCapitals.info – another listing, and the CIA Factbook, which correctly names Jerusalem as Israel’s capital. Whatever one may think about the CIA as an organisation, its website giving information on the countries of the world is generally reliable and an excellent site for anybody trying to find geographical information.
Accepting that finding the above may be beyond the average Olympic bureaucrat, why not do a simple Google search to check the facts. Putting in Israel capital city as the search term quickly gives the answer: Jerusalem.
This failure hints that in fact the error may not have been just ineptitude but also included an element of political dogma that should be missing from the Olympics. I’m suggesting this because of a related error that appeared in the Guardian newspaper recently.
The Guardian states (note my emphasis):
The caption on a photograph featuring passengers on a tram in Jerusalem observing a two-minute silence for Yom HaShoah, a day of remembrance for the 6 million Jews who died in the Holocaust, wrongly referred to the city as the Israeli capital. The Guardian style guide states: “Jerusalem is not the capital of Israel; Tel Aviv is”
This, as can be seen by the previous listings, is hogwash and is a political attempt by the Guardian to redefine a country’s right to name its own capital city. While it is true that most countries position their embassies in Tel Aviv, this is because of the disputed nature of Jerusalem – despite it being the location for Israel’s government and other national institutions. Failure to give the truth is a disservice to the Guardian’s readers and discredits its position as a leading UK newspaper.
When newspapers such as the Guardian and bodies such as Britain’s National Olympic Committee start to rewrite facts (or fail to check facts) then what hope is there for a genuine peace agreement between Israel and Palestine. What is worrying is that this constant repetition of false information relating to Israel and Palestine is an example of what is commonly termed the Big Lie (Große Lüge). Hitler wrote in Mein Kampf
“But the most brilliant propagandist technique will yield no success unless one fundamental principle is borne in mind constantly and with unflagging attention. It must confine itself to a few points and repeat them over and over. Here, as so often in this world, persistence is the first and most important requirement for success.” (Volume 1, Chapter 6).
The false information relating to Israel in the press now often outweighs the truth. Even the terminology used has become the accepted dictum – and as Hitler counselled, is repeated over and over and over.
As an example, it is rare that the term used for the areas captured by Israel in the 1967 war is anything other than “occupied territories”. In fact, the Gaza strip was given back to Palestinian rule in 2005 and is no longer under Israeli control, and much of the captured territory (that Jordan annexed following the 1948 war) is under Palestinian rule (as had been the objective of the 1947 UN partition plan). Actual ownership of this land is disputed as there is no clear-cut international agreement on who owns the territory. Thus the correct term should be “disputed territories”. Anything else (i.e. “occupied” – as used by anti-Israel protagonists or “liberated” as used by the Israeli right-wing) is inaccurate.
Another example of a propaganda lie used against Israel is the word “apartheid” with Israel accused of adopting apartheid policies to discriminate against the Palestinians. Wikipedia states that
the crime of apartheid is defined by the 2002 Rome Statute of the International Criminal Court as inhumane acts of a character similar to other crimes against humanity “committed in the context of an institutionalized regime of systematic oppression and domination by one racial group over any other racial group or groups and committed with the intention of maintaining that regime.”
Proof that the apartheid claim is a lie is not hard to find. Arab citizens of Israel have full voting rights, rights of employment, education and free movement (which was not the case for black Africans during the apartheid regime in South Africa). There are Arab members of the Israeli parliament (which have included Arab ministers such as Ayoob Kara and Raleb Majadale) and Arab supreme court judges (for example Salim Joubran). Israel’s giving control of Gaza to the Palestinians shows that Israel’s intention is not to maintain dominance over the Palestinians. Yet the lie that Israel is an apartheid State is repeated over and over and over again – just as Hitler counselled for false propaganda.
Using propaganda to make political points makes sense in war but doesn’t make sense when seeking peace. Peace requires honesty, together with an attempt to seek common ground and compromise without propaganda lies, so that reconciliation and trust can be built leading to bridges that end conflict. This applies to all parties – whether involved in the conflict or on the sidelines.
A failure to identify falsehood by basic checking of facts – such as the location of Israel’s capital – does the opposite and prolongs the state of conflict, reinforcing those who choose to believe propaganda over truth. In this, the Guardian and the London2012 websites are both guilty – as continuously repeating such lies (taking Hitler’s advice) aims to delegitimize Israel’s right to exist as a sovereign State, and the right of Jews to live freely and govern themselves in Israel.
AWARE has had a web-site since 1995 and our current domain (www.marketing-intelligence.co.uk) has been active since 1997. When we started there were less than 100,000 companies on the web. Google’s founders had not yet met each other, and even venerable search engines such as AltaVista had not yet started.
Over the years, we’ve made an effort to ensure that our web content was not copied and used on other sites without our permission.
Although doing manual checks by searching for key phrases is one way of checking for plagiarism and copyright theft, there are a number of dedicated plagiarism checking sites. One example is Plagium. Plagium’s drawback, shared with several similar services, is that you have to paste in the text you want to test rather than just enter the URL. Such services are generally aimed at helping teachers and college professors detect student cheating.
Although some services (such as Plagium) are free, most are not and may involve downloading dedicated software. Others only check a limited number of known “essay” type sites where students can download essays written by others. (We’ve found some of our content on such sites – evidently students who use them don’t care where they steal their content from. Once used successfully they then try to reuse it by uploading their A+ essay to the site).
Of all plagiarism detection websites probably the easiest and best is CopyScape. Copyscape’s aim is not only to help academics detect student cheating. It also allows webmasters to search for copied content in general. It doesn’t require users to paste in the suspect text. Instead web-site owners simply need to enter their URLs and get a report on other sites that use similar or identical wording. It’s sufficiently powerful that there is even a flippant web-page on ContentBoss’s website giving advice on how to bypass CopyScape and copy with impunity. (ContentBoss promises to provide unique content at a low monthly fee. Their bypass CopyScape tool uses a technique that will convert content into HTML guaranteed not to be picked up by plagiarism detectors. The catch is, as pointed out by ContentBoss, that using such content is also a guarantee that the site will be banned by search engines for spam content).
We’ve used CopyScape periodically over the years and miscreants included a competitor site that copied multiple pages from our site. We asked the site owner to change his pages and were ignored. We then took stronger action and within a couple of days the site was taken down. Another example involved an article published in a professional journal that took, almost verbatim, the content of our brief guide to competitive intelligence. We notified the publisher who ensured that the payment made to the “author” was recovered, and an apology published. The author said that he thought that material published on the web was copyright free. He was shown to be wrong.
Our most recent trawl for examples of copyright theft from AWARE’s pages turned up further examples where wording we’ve used has been stolen. The following images should show how effective the tool is – while at the same time naming and shaming the companies that are too weak, lazy or incompetent to produce their own copy and have to steal from others. (I’ve named them – but won’t give them the satisfaction of a link as this could help their search engine optimisation efforts – if they have any!)
The first example shows how text that appears on the footer of most of our pages is plagiarised.
This is the orignal text.
CopyScape found several sites had copied this text almost verbatim – for example Green Oasis Associates based in Nigeria:
or ICM Research from Italy and Pearlex from Virginia in the USA.
The ICM Research example is in fact the worst of these three, as their site has taken content from several other AWARE web-site pages.
The problem is that a company that is willing to steal content from other businesses is unethical – breaking the rule against misrepresenting who you are. If they are willing to steal content from others, they may also take short-cuts in the services provided and as a result should not be trusted to provide a competent service.
The page that is most often plagiarised is the Brief Guide to Competitive Intelligence Page, mentioned above. Clicking on a link found by CopyScape highlights the copied portions as seen in the following examples from AGResearch, Emisol and Wordsfinder.
Generally sites do not copy whole pages (although this does happen) but integrate chunks of stolen text into their pages – as seen in the AGResearch example, below – where 12% of the page is copied, and Wordsfinder where 13% has been copied.
The Emisol example below stole less – although copied key parts of the guide page:
Copyright theft is a compliment to the author of the original web-page, as it shows that the plagiarizing site views their competitor as top quality. However the purpose of writing good copy is to stand out and show one’s own capabilities. Sites that steal other site’s work remove this advantage as they make the claims seem anodyne and commonplace. They devalue both the copier – who cannot come up with their own material (and so are unlikely to be able to provide a competent service anyway) and the originator, as most people won’t be able to tell who came first. Fortunately search engines can, and when they detect duplication, they are likely to downplay the duplicated material meaning that such sites are less likely to appear high-up in search engine rankings. The danger is that both the originator of the material and the plagiarizer may get penalised by search engines – which is another reason to ensure that copyright thieves are caught and stopped. CopyScape is one tool that really works in protecting authors from such plagiarism.
Good business intelligence quickly identifies information that is real and what’s false – or should. It’s important that decision making is based on accurate, factual data – as otherwise bad decisions get made. So how do you tell whether something is real or fake?
Generally, the first rule is to check the source or sources.
- Are they reputable and reliable?
- Is the information in the story sensible and reasonable?
- What’s the background to the story – does it fit in with what’s already known?
The problem is that even if information passes these tests it may still not be true. There are numerous examples of news items that sound true but that turn out to be false. One example is a BBC news story from 2002 quoting German researchers who claimed that natural blondes were likely to disappear within 200 years. A similar story appeared in February 2006 in the UK’s Sunday Times. This article quoted a WHO study from 2002. In fact, there was no WHO study that stated this – it was false. The story of blonde extinction has been traced back over 150 years and periodically is reported – always with “scientific” references to imply validity.
The “Internet Explorer users have lower IQs” hoax
Often, the decision to accept a news item depends on whether or not it sounds true. If the story sounds true, especially if supported by apparent research then people think that it probably is – and so checks aren’t made. That is why a recent news story suggesting that users of Internet Explorer have lower IQs than those of other browsers was reported so widely. Internet Explorer is often set up as the default browser on Windows computers, and many users are more familiar with Explorer than other browsers. The suggestion that less technologically adept users (i.e. less intelligent users) would not know how to download or switch to a different browser made sense.
I first read the news story in The Register – an online technical newspaper covering web, computer and scientific news. Apart from The Register, the story appeared on CNN, the BBC, the Huffington Post, Forbes and many other news outlets globally (e.g. the UK’s Daily Telegraph and Daily Mail). Many of these have now either pulled the story completely, just reporting the hoax, or added an addendum to their story showing that it was a hoax. A few admit to being fooled – the Register, for example, explained why they believed it: because it sounded plausible.
The hoax succeeded however, not only because the story itself sounded plausible, but also because a lot of work had been put in to make it look real. The hoaxer had built a complete web-site to accompany the news item – including other research, implying that the research company concerned was bona fide, other product details, FAQs, and even other research reports, etc. The report itself was included as a PDF download.
Red Flags that indicated the hoax
To its credit the technology magazine, Wired.com spotted several red flags, suggesting that the story was a hoax, stating that “If a headline sounds too good to be true, think twice.”
Wired commented that the other journalists hadn’t really looked at the data, pointing out that “journalists get press releases from small research companies all the time“. The problem is that it’s one thing getting a press release and another printing it without doing basic journalistic checks and follow-throughs. In this case,
- the “research company” AptiQuant had no history of past studies – other than on its own web-site;
- the company address didn’t exist;
- the average reported IQ for Internet Explorer users (80) was so low as to put them in the bottom 15% of the population (while that for Opera users put them in the top 5%) – scarcely credible considering Internet Explorer’s market share.
1. The domain was registered on July 14th 2011.
2. The test that was mentioned in the report, “Wechsler Adult Intelligence Scale (IV) test” is a copyrighted test and cannot be administered online.
3. The phone number listed on the report and the press release is the same listed on the press releases/whois of my other websites. A google search reveals this.
4. The address listed on the report does not exist.
5. All the material on my website was not original.
6. The website is made in WordPress. Come on now!
7. I am sure, my haphazardly put together report had more than one grammatical mistakes.
8. There is a link to our website AtCheap.com in the footer.
The rationale and the aftermath
Gill is a computer programmer based in Vancouver, Canada, working on a a comparison shopping website www.AtCheap.com. Gill became irritated at having to code for earlier versions of Internet Explorer – and especially IE 6.0 which is still used by a small percentage of web users. (As of July 2011, 9% of web-users use Internet Explorer versions 6.0 and 7.0 with a further 26% using version 8.0. Only 7% of web users have upgraded to the latest version of Internet Explorer – v9.0).
The problem with IE versions 6.0-8.0 is that they are not compatible with general web-standards making life difficult for web designers who have to code accordingly, and test sites on multiple versions of the same browser – all differing slightly. (As you can’t have all 4 versions of Internet Explorer IE6.0 – IE9.0 on the same computer this means operating 4 separate computers or having 4 hard-disk partitions – one for each version).
Gill decided to create something that would encourage IE users to upgrade or switch, and felt that a report that used scientific language and that looked authentic would do the trick. He designed the web-site, copying material from Central Test, and then put out the press release – never expecting the story to spread so fast or far. He was sure he’d be found out much more quickly.
The problem was that after one or two reputable news sources published the story everybody else piled in. Later reports assumed that the early ones had verified the news story so nobody did any checks. The Register outlined the position in their mea culpa, highlighting how the story sounded sensible.
Many news outlets are busy flagellating themselves for falling for the hoax. But this seems odd when you consider that these news outlets run stories on equally ridiculous market studies on an almost day basis. What’s more, most Reg readers would argue that we all know Internet Explorer users have lower IQs than everyone else. So where’s the harm?
The facts are that AptiQuant doesn’t exist and its survey was a hoax. But facts and surveys are very different from the truth. “It’s official: IE users are dumb as a bag of hammers,” read our headline. “100,000 test subjects can’t be wrong.” The test subjects weren’t real. But they weren’t necessarily wrong either.
You may disagree. But we have no doubt that someone could easily survey 100,000 real internet users and somehow prove that we’re exactly right. And wrong.
The real issue is that nobody checked as the story seemed credible. Competitive Intelligence analysis cannot afford to be so lax. If nobody else bothers verifying a news story that turns out to be false, you have a chance to gain competitive advantage. In contrast those failing to check the story risk losing out. The same lessons that apply to journalists apply to competitive intelligence and just because a news story looks believable, is published in a reputable source and is supported by several other sources doesn’t make it true. The AptiQuant hoax story shows this.
Meanwhile the story rumbles on with threats of lawsuits against Tarandeep Gill by both Microsoft (for insulting Internet Explorer users) and more likely by Central Test. Neither company is willing to comment although Microsoft would like users to upgrade Internet Explorer to the latest version. In May 2010 Microsoft’s Australian operation even said using IE6 was like drinking nine-year-old milk. If Gill has managed to get some users to upgrade he’ll have helped the company. He should have also helped Central Test – as the relatively unknown company has received massive positive publicity as a result of the hoax. If they do sue, it shows a lack of a sense of humour (or a venal desire for money) – and will leave a sour taste as bad as from drinking that nine-year-old milk.
I’ve been playing with a new data search engine called Zanran – that focuses on finding numerical and graphical data. The site is in an early beta. Nevertheless my initial tests brought up material that would only have been found using an advanced search on Google – if you were lucky. As such, Zanran promises to be a great addition for advanced data searching.
Zanran focuses on finding what it calls ‘semi-structured’ data on the web. This is defined as numerical data presented as graphs, tables and charts – and these could be held in a graph image or table in an HTML file, as part of a PDF report, or in an Excel spreadsheet. This is the key differentiator – essentially, Zanran is not looking for text but for formatted numerical data.
When I first started looking at the site I was expecting something similar to Wolfram Alpha – or perhaps something from Google (e.g. Google Squared or Google Public Data). Zanran is nothing like these – and so brings something new to search. Rather than take data and structure or tabulate it (as with Wolfram Alpha and Google Squared), Zanran searches for data that is already in tables or charts and uses this in its results listing.
The site has a nice touch in that hovering the cursor over results gives you the relevant data page – whether a table, a chart or a mix of text, tables or charts.
The advanced search options allow country searching (based on server location), document date and file type, each selectable from a drop-down box, as well as searches on specified web-sites. At the moment only English speaking countries can be selected (Australia, Canada, Ireland, India, UK New Zealand, USA and South Africa). The date selections allow for the last 6, 12 or 24 months and the file type allows for selection based on PDF; Excel; images in HTML files; tables in HTML files; PDF, Excel and dynamic data; and dynamic data alone. PowerPoint and Word files are promised as future options. There are currently no field search options (e.g. title searches).
My main dislike was that the site doesn’t give the full URLs for the data presented. The top-level domain is given, but not the actual URL which makes the site difficult to use when full attribution is required for any data found (especially if data gets downloaded, rather than opening up in a new page or tab).
Zanran.com has been in development since at least 2009 when it was a finalist in the London Technology Fund Competition. The technology behind Zanran is patented and based on open-source software, and cloud storage. Rather than searching for text, Zanran searches for numerical content, and then classifies it by whether it’s a table or a chart.
Atypically, Zanran is not a Californian Silicon Valley Startup, but is based in the Islington area of London, in a quiet residential side-street made up of a mixture of small mostly home-based businesses and flats/apartments. Zanran was founded by two chemists, Jonathan Goldhill and Yves Dassas, who had previously run telecom businesses (High Track Communications Ltd and Bikebug Radio Technologies) from the same address. Funding has come from the London Development Agency and First Capital among other investors.
Zanran views competitors as Wolfram Alpha, Google Public Data and also Infochimps (a database repository – enabling users to search for and download a wide variety of databases). The competitor list comes from Google’s cache of Zanran’s Wikipedia page as unfortunately, Wikipedia has deleted the actual page – claiming that the site is “too new to know if it will or will not ever be notable“.
I hope that Wikipedia is wrong and that Zanran will become “notable” as I think the company offers a new approach to searching the web for data. It will never replace Google or Bing – but that’s not its aim. Zanran aims to be a niche tool that will probably only ever be used by search experts. However as such, it deserves a chance, and if its revenue model (I’m assuming that there is one) works, it deserves success.