Google Carousel – a roundabout of images but not for all searches
Every Wednesday, Daniel Russell, a researcher working with Google, posts a search question on his search & research blog. The search question for 26 September 2012 related to differences between the coastlines on the East and West coasts of the USA. Attempting to answer the question, I typed [Atlantic islands] into Google. Instead of the usual list I’d expected, I got this:
The images at the top of my search were a surprise. Clicking on the arrows gave me further images – totalling 55 island pictures. I tried a few other searches [Pacific Islands], [Indian Ocean Islands], etc. and found similar results. Yet most searches such as [Scottish Islands] gave me the normal type of listing.
Intrigued, I contacted a couple of colleagues – Karen Blakeman of RBA Information Services and Marydee Ojala, Editor of Online magazine (and the Online Insider blog). Both Karen and Marydee are also members of the Association of Independent Information Professionals and, like so many AIIP members, are expert searchers. (All three of us are presenting at the forthcoming Internet Librarian Conference in London and led the London Websearch Academy in 2011.)
Marydee admitted to being bemused but guessed it was connected to Google’s Knowledge Graph initiative – the new service that puts details on a search topic to the right of the search results – as with this example search for [Albert Einstein].
Knowledge Graph was launched by Google in May 2012 and aims to give instant answers to many encyclopedia-type search queries. However, this didn’t explain what I’d found. Marydee looked a bit further and found that the TechCrunch blog had discovered the feature earlier in September.
I mentioned that I’d found it because of Dan Russell’s blog and Marydee asked him about the new feature. Dan responded that the “carousel” of images is triggered whenever Google knows about a collection or group of connected items such as “Atlantic Islands”. The group is then summarised and made available at the top of the results list – allowing searchers to quickly recognise the collection and the other group members.
So that’s it then! It’s a new feature giving a “carousel” of images. If you search for [knowledge graph carousel] you get the above TechCrunch link and also Google’s own search blog on the topic. (There’s a lesson here – always check Google’s own blog posts if you spot what looks like odd Google behaviour.) A search for [Knowledge graph] gives Google’s own description of the feature, including a YouTube video explaining it.
Dan Russell’s reply however said more:
What it triggers on is a bit more problematic. Answer: only collections we know about, which can be a bit odd. [moons of Saturn] but not [U.S. presidents]. [famous jazz composers] works, but not [cities in UAE]
This seems to explain why not all searches show the carousel. [Atlantic Islands] does. So does [Pacific Islands], but [Islands] doesn’t. [Greek Islands] is mentioned as an example in the YouTube video – but the less touristy [Scottish Islands] fails to show the carousel. It’s not just islands that give oddly inconsistent results. [Famous Jazz composers] results in the carousel appearing but [famous composers] gives a normal display. [20th century composers] works, as does [19th century composers]. Bizarrely, [18th century composers] doesn’t work, and nor do [20th century artists] or [19th century artists]. Yet [impressionist artists] and [surrealist artists] do work. The results definitely seem surreal!
The TechCrunch blog tested the feature looking at rides at the Cedar Point theme park in Northern Ohio. I decided to ride the carousel on Disney parks. Again the results were odd – but a pattern seemed to emerge. [Disneyland rides], [Epcot rides], [Magic Kingdom Rides] all worked but [Disneyworld rides] didn’t. I then tried [Disney Paris Rides]. That works. So does [Disney California Rides]. However [Disney Florida Rides], [Disney Tokyo Rides] and [Disney Hong Kong Rides] all failed to work.
It seems as if there are two factors playing out here. The first is whether Google knows enough about the topic to create a set of common images. My guess is that Disney Hong Kong and Tokyo fail on that count – and possibly this explains why 18th century composers also fails. That can’t, however, explain the difference between Disney California and Paris, compared to Disney Florida. That brings in the second factor: the number of items in the collection. There are several Disney World theme parks for Disney Florida – Epcot, Magic Kingdom and more. I suspect that there are too many rides to be displayed in a meaningful manner. The aim of the Carousel is to encourage exploration – and a never-ending list tends to do the opposite: like a carousel that goes too fast, there is a risk that people may fall off.
How to spot web-site plagiarism and copyright theft with CopyScape
AWARE has had a web-site since 1995 and our current domain (www.marketing-intelligence.co.uk) has been active since 1997. When we started there were fewer than 100,000 companies on the web. Google’s founders had not yet met each other, and even venerable search engines such as AltaVista had not yet started.
Over the years, we’ve made an effort to ensure that our web content was not copied and used on other sites without our permission.
Although doing manual checks by searching for key phrases is one way of checking for plagiarism and copyright theft, there are a number of dedicated plagiarism checking sites. One example is Plagium. Plagium’s drawback, shared with several similar services, is that you have to paste in the text you want to test rather than just enter the URL. Such services are generally aimed at helping teachers and college professors detect student cheating.
Although some services (such as Plagium) are free, most are not and may involve downloading dedicated software. Others only check a limited number of known “essay” type sites where students can download essays written by others. (We’ve found some of our content on such sites – evidently students who use them don’t care where they steal their content from. Having used the stolen content successfully, they then try to reuse it by uploading their A+ essay to the site.)
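The manual check mentioned above, searching for key phrases, can be partly scripted. The following is only a minimal sketch: the function name key_phrase_queries and its parameters are hypothetical, and the idea that longer sentences are more distinctive is an assumption, not a rule. It turns the longest sentences of a page into quoted exact-phrase queries that can be pasted into a search engine.

```python
import re


def key_phrase_queries(text, count=3, min_words=6, max_words=12):
    """Build quoted exact-phrase queries from the longest sentences in text."""
    # Split on sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    candidates = []
    for sentence in sentences:
        words = re.findall(r"[\w'-]+", sentence)
        # Very short sentences are too generic to be useful as a search phrase.
        if len(words) >= min_words:
            candidates.append(words)
    # Assumption: longer sentences are more distinctive (a heuristic only).
    candidates.sort(key=len, reverse=True)
    # Truncate each phrase so the query stays a manageable length.
    return ['"' + " ".join(words[:max_words]) + '"' for words in candidates[:count]]


if __name__ == "__main__":
    sample = (
        "Short one. This is a much longer sentence with plenty of "
        "distinctive words in it. Tiny."
    )
    for query in key_phrase_queries(sample):
        print(query)
```

Any hits that are not your own site are candidates for a closer look; the script does no searching itself.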
CopyScape
Of all plagiarism detection websites, probably the easiest and best is CopyScape. CopyScape’s aim is not only to help academics detect student cheating. It also allows webmasters to search for copied content in general. It doesn’t require users to paste in the suspect text. Instead, web-site owners simply need to enter their URLs and get a report on other sites that use similar or identical wording. It’s sufficiently powerful that there is even a flippant web-page on ContentBoss’s website giving advice on how to bypass CopyScape and copy with impunity. (ContentBoss promises to provide unique content at a low monthly fee. Their bypass CopyScape tool uses a technique that will convert content into HTML guaranteed not to be picked up by plagiarism detectors. The catch is, as pointed out by ContentBoss, that using such content is also a guarantee that the site will be banned by search engines for spam content.)
We’ve used CopyScape periodically over the years. The miscreants it uncovered included a competitor site that copied multiple pages from our site. We asked the site owner to change his pages and were ignored. We then took stronger action and within a couple of days the site was taken down. Another example involved an article published in a professional journal that took, almost verbatim, the content of our brief guide to competitive intelligence. We notified the publisher, who ensured that the payment made to the “author” was recovered and an apology published. The author said that he thought that material published on the web was copyright free. He was shown to be wrong.
Our most recent trawl for examples of copyright theft from AWARE’s pages turned up further examples where wording we’ve used has been stolen. The following images should show how effective the tool is – while at the same time naming and shaming the companies that are too weak, lazy or incompetent to produce their own copy and have to steal from others. (I’ve named them – but won’t give them the satisfaction of a link as this could help their search engine optimisation efforts – if they have any!)
The first example shows how text that appears on the footer of most of our pages is plagiarised.
This is the original text.
CopyScape found several sites had copied this text almost verbatim – for example Green Oasis Associates based in Nigeria:
or ICM Research from Italy and Pearlex from Virginia in the USA.
The ICM Research example is in fact the worst of these three, as their site has taken content from several other AWARE web-site pages.
The problem is that a company that is willing to steal content from other businesses is unethical – breaking the rule against misrepresenting who you are. If they are willing to steal content from others, they may also take short-cuts in the services provided and as a result should not be trusted to provide a competent service.
The page that is most often plagiarised is the Brief Guide to Competitive Intelligence Page, mentioned above. Clicking on a link found by CopyScape highlights the copied portions as seen in the following examples from AGResearch, Emisol and Wordsfinder.
Generally sites do not copy whole pages (although this does happen) but integrate chunks of stolen text into their pages – as seen in the AGResearch example, below – where 12% of the page is copied, and Wordsfinder where 13% has been copied.
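To give a feel for what a figure such as “12% of the page is copied” could mean, here is a minimal sketch based on overlapping word sequences (“shingles”). This is an illustration only: CopyScape’s actual method is not described here, and the function names and the five-word window are assumptions chosen for the example.

```python
import re


def word_shingles(text, size=5):
    """Return the set of all runs of `size` consecutive words in text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def percent_copied(original, suspect, size=5):
    """Percentage of the suspect page's shingles that also occur in the original."""
    suspect_shingles = word_shingles(suspect, size)
    if not suspect_shingles:
        # The suspect text is shorter than one shingle: nothing to compare.
        return 0.0
    shared = suspect_shingles & word_shingles(original, size)
    return 100.0 * len(shared) / len(suspect_shingles)


if __name__ == "__main__":
    ours = "one two three four five six"
    theirs = "one two three four five six seven eight nine ten"
    print(round(percent_copied(ours, theirs), 1))
```

Measuring against the suspect page rather than the original matches the way the examples are described: a site that lifts a few paragraphs into a much longer page scores low even though everything it took was copied verbatim.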
The Emisol example below stole less – although it copied key parts of the guide page:
Conclusion
Copyright theft is a compliment to the author of the original web-page, as it shows that the plagiarizing site views their competitor as top quality. However, the purpose of writing good copy is to stand out and show one’s own capabilities. Sites that steal other sites’ work remove this advantage, as they make the claims seem anodyne and commonplace. They devalue both the copier – who cannot come up with their own material (and so is unlikely to be able to provide a competent service anyway) – and the originator, as most people won’t be able to tell who came first. Fortunately search engines can, and when they detect duplication, they are likely to downplay the duplicated material, meaning that such sites are less likely to appear high up in search engine rankings. The danger is that both the originator of the material and the plagiarizer may get penalised by search engines – which is another reason to ensure that copyright thieves are caught and stopped. CopyScape is one tool that really works in protecting authors from such plagiarism.