Yahoo Flickr Images: Does Search Work?

August 31, 2014

I think you know the answer if you are a regular reader of Beyond Search.

Nope.

Finding images is a tedious and time-consuming business. I know what the marketing collateral and public relations noise suggest. One can search by photographer, color, yada, yada.

The reality is that finding an image requires looking at images. Some find this fun, particularly if the client is paying by the hour for graphic expertise. For me, image search underscores how primitive information retrieval tools are.

Feel free to disagree.

To test Yahoo Flickr search, navigate to “Welcome to the Internet Archive to the Commons.” Check out the sample entry to the millions of public domain images.

image

Darned meaty.

To search the “Commons”, one has to navigate to the Commons page and scroll down to the search box highlighted in yellow in this screenshot:

image

Enter a query like this one: “18th century elocution.”

Here’s what the system displayed:

image

I then tried this query “london omnibus 1870”.

Here’s what the system displayed:

image

No omnibuses.

As with many image retrieval systems, the user has to fiddle with queries until manual inspection turns up the right images.

The archive is useful. Finding images in Yahoo Flickr remains a problem for me. I thought Xooglers knew quite a bit about search. You know: Finding information when the user enters a key word or two.

Stephen E Arnold, August 31, 2014

Quote to Note: Facebook Search

August 31, 2014

Facebook has done little public facing work on search. Behind the scenes, Facebookers and Xooglers have been beavering away. A bit of public information surfaced in “Zuckerberg On Search — Facebook Has More Content Than Google.” Does Facebook have a trillion pieces of content? Is that more content than Google has? Nah. But it is the thought that counts.

Here’s the quote I highlighted:

What would it ultimately mean if Facebook’s search efforts are effective–and if Facebook allowed universal use of a post search tool that really worked? It’s dizzying, really. As Zuckerberg said early this year on an earnings call: “There are more than a trillion status updates and unstructured text posts and photos and pieces of content that people have shared over the past 10 years.” Then the Facebook CEO put that figure into context: “a trillion pieces of content is more than the index in any web search engine.” You know what “any web search engine” spells? That’s a funny way of spelling Google.

With Amazon nosing into ads and Facebook contemplating more public search functionality, will Google be able to respond in a manner that keeps its revenues flowing and projects like Loon flying? I wonder what the Arnold name surfer thinks about Facebook? Maybe it is a place to post musings about failed youth coaching?

Stephen E Arnold, August 31, 2014

Google and Universal Search or Google Floundering with Search

August 30, 2014

There have been some experts who have noticed that Google has degraded blog search. In the good old days, it was possible to query Google’s index of Web logs. It was not comprehensive, and it was not updated with the zippiness of years past.

Search Engine Land and Web Pro News both pointed out that www.google.com/blogsearch redirects to Google’s main search page. The idea of universal search, as I understood it, was to provide a single search box for Google’s content. Well, that is not too useful when it is not possible to limit a query to a content type or a specific collection.

“Universal” to Google is similar to the telco’s use of the word “unlimited.”

According to the experts, it is possible to search blog content. Here’s the user friendly sequence that will be widely adopted by Google users:

  1. Navigate to the US version of Google News. Note that this can be tricky if one is accessing Google from another country.
  2. Enter a query; for example, “universal search”
  3. Click on “search tools” and then click on “All news”
  4. Then click on “Blogs”

image

Several observations:

First, finding information in Google is becoming more and more difficult.

Second, an obvious function such as providing an easy way to run queries against separate Google indexes is anything but obvious. Do you know how to zip to Google’s patent index or its book index? Not too many folks do.

Third, the “logic” of making search a puzzle is no longer of interest to me. Increasing latency in indexing, Web sites that are pushed deep in the index for a reason unrelated to the site’s content, and a penchant for hiding information point to some deep troubles in Google search.

Net net: Google has lost its way in search. Too bad. As the volume of information goes up, the findability goes down. Wild stuff like Loon and Glass go up. Let’s hope Google can keep its ad revenue flowing; otherwise, there would be little demand for individuals who can perform high value research.

Stephen E Arnold, August 30, 2014

Google: Authors Not Helping Traffic

August 30, 2014

First, Google removed operators for Boolean queries. Then, Google started suggesting what I wanted. Now, Google does away with authors. These steps improve user experience. In John Mueller’s Google Plus post I learned:

(If you’re curious — in our tests, removing authorship generally does not seem to reduce traffic to sites. Nor does it increase clicks on ads. We make these kinds of changes to improve our users’ experience.)

No, I am not curious. I know several things. Precision and recall are less and less useful to Google.

What is important is ad revenue. Google wants a way to sell ads to fund projects like Loon, Glass, and drones. Oh, pesky authors anyway.
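For anyone who wants the two measures pinned down, here is a minimal Python sketch of how precision and recall are usually computed for a single query. The document IDs and the relevance judgments are made up for illustration; nothing here reflects how Google evaluates its own results.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: IDs the search system returned
    relevant:  IDs a human judged relevant (hypothetical ground truth)
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant  # relevant items the system actually returned
    # Precision: what fraction of the returned items were relevant?
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: what fraction of the relevant items were returned?
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical result list and hypothetical relevance judgments.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d7"]

p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

A system tuned for ad clicks can score well on its own metrics while both of these numbers slide.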

Stephen E Arnold, August 30, 2014

IBM Watson and Research

August 29, 2014

The IBM Watson content marketing machine grinds on. This time, IBM’s Hail Mary is making Watson into a research assistant. Let’s see. Watson does cancer treatment, recipe invention, and insurance analyses. In “IBM Sees Broader Role for Watson in Aiding Research,” the operative word is “sees,” not shipping, sold, market dominance, and similar “got it done” phrases. Heck, there’s not even a public demo on Wikipedia data or a collection of patents.

The write up cheers me forward with:

With the aid of Watson, companies could better mine that private information and combine it with scientific data in the public domain.

One company studying such possibilities to evaluate medications and treatments is Johnson & Johnson, IBM said. But the company sees applications beyond the health realm, including making automated suggestions based on financial, legal, energy and intelligence-related information, IBM said.

Watson has to generate lots of dough and fast. IBM expects the Watson “system” to produce billions in revenue in five or six years. What Watson is producing is more credibility problems for search vendors with technology that “sort of” works.

I had a query yesterday from a consultant whose client wants to use IBM Watson technology. I suggested that if IBM will fund the quest for a brass ring, go for it. Have a Plan B.

In the meantime, I find the Watson arabesques pretty darned interesting. With HP planning billions from Autonomy, where is this money going to come from? No one seems to think much about the need to have a product that solves a problem for a specific company.

No “saids” or “sees” required. Just a business built on open source technology and home grown code. IBM is fascinating, as are its content marketing methods. Quite an end of summer announcement. How about a live demo? I am weary of Jeopardy references.

Stephen E Arnold, August 29, 2014

How to End Google’s Search Monopoly if You Want To

August 29, 2014

The MakeUseOf article titled “Help End Google’s Search Monopoly: Use Something Else” implores Internet users to consider alternatives for search on the basis of a very simple concept: monopolies are bad. Without a doubt, Google is a monopoly, with the Chinese Baidu in a lagging second place. Interestingly, the main target of the article is the amount of power this gives Google, not Google itself. The article states,

“The ball is always in Google’s court – they control the search game. This breeds a culture of tailoring content to what Google wants, with the problem being that nobody really knows what this is. Most “SEO experts” will tell you they know how to get your site ranking highly, but really they have no greater insight into what goes on behind the scenes than you do.

We’re not bitter, that’s not the point of this article.”

They are referring to Panda, Google’s 2011 filter that removed lower quality content websites from searches. This benefitted some sites, but it also had far-reaching negative implications for any number of sites. This is why monopolies are bad: not because Google is inherently evil, but because it makes decisions that can affect huge numbers of people and businesses. It may be too late to recommend alternatives like DuckDuckGo, since Google is so ingrained in its users as the only option for search.

Chelsea Kerwin, August 29, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

Short Honk: Surveillance Database Report

August 26, 2014

I wanted to document a report that ICREACH exists. For information, see The Intercept’s report. No further comment from Beyond Search.

Stephen E Arnold, August 26, 2014

Endeca Wins Over Beauty Retailer

August 26, 2014

To overhaul the customer experience on their site, ULTA Beauty turned to Endeca. We learn of the move from Integrated Solutions for Retailers in, “Thanx Media’s Oracle Endeca and ULTA Beauty Take Customer Experience to the Next Level.” Thanx Media is ULTA’s integrated-search-solutions provider. The press release tells us:

“Oracle Endeca has replaced a third party search solution, now tightly integrating the browse and search navigation, resulting in a consistent guest experience with minimal maintenance. The previous lack of integration with the third party search solution caused discrepancies in product data (such as pricing and inventory levels between search and browse) resulting in product listing pages that didn’t always match and a process that lacked the flexibility required by the e-commerce business team.”

Those are indeed serious problems for a retail site. How did the switch pan out? The write-up makes it clear that the reseller is very, very happy. Less clear is how, exactly, the system paid off for ULTA. Aside from a tangential reference to “positive Q4 results,” we are given no details. Oh, well. At least the middleman is pleased.

Cynthia Murrell, August 26, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

Questioning How To Search New Sound Files

August 25, 2014

Sound is an underrated science, but it is quite an amazing topic to study. MIT News reports an amazing experiment: “Extracting Audio From Visual Information.” The article explains that Adobe, Microsoft, and MIT researchers developed an algorithm that can reconstruct an audio signal by analyzing minute vibrations of objects depicted in video. The team has been able to get audible files of the leaves of a potted plant, the surface of a glass of water, aluminum foil, and vibrations from a potato-chip bag.

The sound files can be used by law enforcement organizations, but MIT graduate student Abe Davis says it creates a “new kind of imaging.”

“ ‘We’re recovering sounds from objects,’ [Davis] says. ‘That gives us a lot of information about the sound that’s going on around the object, but it also gives us a lot of information about the object itself, because different objects are going to respond to sound in different ways.’”

The team speculates that the technology community will embrace the research and amazing applications will be developed from it. The new sound technology will also create a new slew of content. How will we search the new content? A specific and exact ontology will be needed to distinguish sound files. Will a search application smart enough to read the sound data be developed to identify the user’s information need? Oh wait, enterprise search systems index “all information” so it already exists.

Whitney Grace, August 25, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

Launching and Scaling Elasticsearch

August 21, 2014

Elasticsearch is widely hailed as an alternative to SharePoint and to many of the other open source search options, but it is not without its problems. Ben Hundley from StackSearch offers his input on the software in his QBox article, “Thoughts on Launching and Scaling Elasticsearch.”

Hundley begins:

“Qbox is a dedicated hosting service for Elasticsearch.  The project began internally to find a more economical solution to Amazon’s Cloudsearch, but it evolved as we became enamored by the flexibility and power of Elasticsearch.  Nearly a year later, we’ve adopted the product as our main priority.  Admittedly, our initial attempt took the wrong approach to scale.  Our assumption was that scaling clusters for all customers could be handled in a generalized manner, and behind the scenes.”

Hundley walks the reader through several considerations that affect any implementation: knowing your application’s needs, deciding on hardware, monitoring, tuning, and knowing when to scale. These are all decisions that must be made on the front-end, allowing for more effective customization. The upside of an open source solution like Elasticsearch is greater customization, more control, and less rigidity. Of course for a small organization, that could also be the downside, as time and staffing are more limited and an out-of-the-box solution like SharePoint is more likely to be chosen.
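To make the “monitoring” and “knowing when to scale” items a little more concrete, here is a rough Python sketch. It is not from Hundley’s article. It assumes you have already fetched a response shaped like the one from Elasticsearch’s cluster health endpoint (`GET /_cluster/health`, which reports `status` and `unassigned_shards`) and that you have per-node heap usage percentages from your own monitoring. The function name, the 75 percent heap threshold, and the sample numbers are all hypothetical.

```python
def scaling_signals(health, heap_used_percents, heap_threshold=75):
    """Return a list of human-readable warnings suggesting the cluster needs attention.

    health:             dict shaped like a cluster health response
    heap_used_percents: one JVM heap usage percentage per node (hypothetical input)
    heap_threshold:     hypothetical cutoff; tune it for your own workload
    """
    signals = []

    status = health.get("status")
    if status == "red":
        signals.append("cluster status is red: at least one primary shard is unassigned")
    elif status == "yellow":
        signals.append("cluster status is yellow: some replica shards are unassigned")

    unassigned = health.get("unassigned_shards", 0)
    if unassigned > 0:
        signals.append(f"{unassigned} unassigned shard(s)")

    # Nodes running hot on heap are a common hint that more capacity is needed.
    hot_nodes = [p for p in heap_used_percents if p >= heap_threshold]
    if hot_nodes:
        signals.append(f"{len(hot_nodes)} node(s) at or above {heap_threshold}% heap")

    return signals


# Made-up sample values, not output from a real cluster.
sample_health = {"status": "yellow", "number_of_nodes": 2, "unassigned_shards": 5}
for line in scaling_signals(sample_health, heap_used_percents=[62, 81]):
    print(line)
```

A check like this only flags symptoms. Deciding whether to add nodes, add shards, or tune queries still depends on knowing the application, which is Hundley’s first point.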

Emily Rae Aldridge, August 21, 2014

Sponsored by ArnoldIT.com, developer of Augmentext
