Search Poobahs and Numerical Analysis

June 24, 2013

I read a fascinating article called “Plants ‘Do Maths’ to Control Overnight Food Supplies.” Now I used to think the BBC was a “real” journalism outfit. I have some anecdotal evidence that the outfit has wandered a bit, but I don’t know what journalism is, and I can’t tell the difference between a Jimmy Savile and Savile Row. The article asserted:

Plants have a built-in capacity to do maths, which helps them regulate food reserves at night, research suggests. UK scientists say they were “amazed” to find an example of such a sophisticated arithmetic calculation in biology.

Let’s assume that a Rhododendron or an upscale orchid can calculate. What about humans and “real” journalists?

I noted this interesting write up: “Duck Duck Go’s Post-PRISM Growth Actually Proves No One Cares About “Private” Search.” The main point in my view is that traffic increases have nothing to do with the privacy concerns zooming around like migrating geese.

Here’s a passage I noted:

Over the past few weeks, I’ve done several press interviews about Duck Duck Go, where the issue of whether it can beat Google by being more “private” has come up. My answer has consistently been “no,” because that’s been the experience of search engines before that have tried this. I can imagine some on Reddit or Hacker News or elsewhere arguing about how this time, it’s different. This time, with all the NSA allegations, privacy is front and center. This is the right time for a private search engine to emerge. I doubt it. Having covered the search engine space for 17 years now, having seen the privacy flare-ups come-and-go, I’d be very surprised if this time, it’s somehow going to cause more change than in the past.

My question, “What would a daffodil or sweet pea make of the data analysis?”

Why not Google it? Seed pod closed. Bees dead. Move on.

Stephen E Arnold

Visual Search Presents a Challenge for People and Computers

June 24, 2013

We found a recent Science Daily article, “Visual Search Function: Where Scene Context Happens in Our Brain,” to be pretty fascinating. We write a lot about how computers process search, but another interesting perspective comes from investigating how search happens through our eyes, with the power of our brain behind it.

The brain, since the beginnings of human evolution, has developed a framework for search based mostly on context such as the surrounding environment and scene context.

According to the article, scene context creates a strong bias in search. In one study discussed, many people who were shown images of something that looked like a computer mouse on a desk automatically interpreted the object as a mouse.

Computers are only recently being taught this skill set, which in humans is found in the area of the brain known as the lateral occipital complex:

‘So, if you’re looking for a computer mouse on a cluttered desk, a machine would be looking for things shaped like a mouse. It might find it, but it might see other objects of similar shape, and classify that as a mouse,’ [Miguel Eckstein, professor in UC Santa Barbara’s Department of Psychological & Brain Sciences] said. Computer vision systems might also not associate their target with specific locations or other objects. So, to a machine, the floor is just as likely a place for a mouse as a desk.
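Eckstein’s point can be put in rough numbers. What follows is only a toy sketch, not anything from the study: the candidate objects, shape scores, and location weights are invented for illustration, and `rank_candidates` is a hypothetical helper. It shows how a shape-only matcher can pick the wrong object, and how weighting each match by a guess about where mice usually sit changes the answer.

```python
# Toy illustration: shape matching alone versus shape matching plus scene context.
# Every name and number below is an invented assumption, not data from the study.

def rank_candidates(candidates, location_prior=None):
    """Return (name, normalized score) pairs sorted from best to worst.

    candidates: list of (name, shape_score, location) tuples.
    location_prior: optional dict mapping a location to how plausible it is
    for the target to be there. When omitted, every location counts equally.
    """
    scored = []
    for name, shape_score, location in candidates:
        prior = 1.0 if location_prior is None else location_prior.get(location, 0.0)
        scored.append((name, shape_score * prior))
    total = sum(score for _, score in scored)
    if total == 0:
        return [(name, 0.0) for name, _ in scored]
    ranked = [(name, score / total) for name, score in scored]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Hypothetical scene: the soap on the floor looks slightly more mouse-like.
CANDIDATES = [
    ("mouse on desk", 0.80, "desk"),
    ("bar of soap on floor", 0.85, "floor"),
    ("stapler on desk", 0.40, "desk"),
]

# Hypothetical context: a computer mouse is far more likely on a desk.
MOUSE_LOCATION_PRIOR = {"desk": 0.9, "floor": 0.1}

shape_only = rank_candidates(CANDIDATES)
with_context = rank_candidates(CANDIDATES, MOUSE_LOCATION_PRIOR)

print("Shape only picks:", shape_only[0][0])
print("With context picks:", with_context[0][0])
```

With these made-up numbers the shape-only ranking settles on the soap, while the context-weighted ranking settles on the mouse. The multiplication is a crude stand-in for whatever the brain actually does; it is here only to make the bias concrete.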

Sure, text search remains a work in progress. But why not go ahead and take on a challenge with visual search?

Megan Feil, June 24, 2013

Sponsored by ArnoldIT.com, developer of Beyond Search

Search and Content Processing Vendors: Me Too, Me Too

June 23, 2013

We just finished updating a broken Twitter function on the core Overflight system. We will add the fix to the text mining and taxonomy services early next week. Ah, we love Twitter.

In the course of working through the list of companies in Overflight, one of the goslings (pictured below) shared several observations with me. Here they are, and I present them for your intellectual stimulation. Remember: verify observations, a step which is wise whether the big gosling (me) or a smaller gosling (a programmer) generates them.

An ArnoldIT-Xenky gosling at rest.

1. Webinars, Webinars, Webinars

According to the coding gosling, most search and content processing vendors are doing webinars. These come in several varieties, like roses I suppose. There is the “sign up and watch” version. There is the “catch us on YouTube or other video hosting service” type. There is the “audio only” type, either on an existing podcast show like Software Engineering Radio or via the company’s tie up with a for fee out.

Is there webinar fatigue? On the part of the company trying to sell software licenses, webinars are apparently an adrenaline-charged sales opportunity. On the attendee side, I know I have webinar fatigue. But webinars won’t be going away any time soon. The costs of in person sales calls, traditional trade shows, and more deep thinking types of marketing are just not compatible with today’s go-go management time allocation calculus. Heck, shooting the breeze with PowerPoints or Keynote slides as visual hooks is a pretty low cost way to get the word out.

Do webinars sell? We have only anecdotal information, but we definitely have first hand experience with sign ups beginning at a good sized number and then after 15 minutes dwindling down to a hearty few nectar sippers.

2. Blogs

Most search and content processing companies have blogs. The problem, according to the goslings, is that most of these blogs are updated on an infrequent and/or irregular basis. The idea of a blog is so easy to conceptualize. Publishing content every day in a consistent, high-value manner. Well, that’s just not something whizzy high technology search and content processing firms embrace.

Search System Tutorial Simplifies Deep Learning

June 21, 2013

In the wiki-based UFLDL Tutorial, you can learn the basics of Unsupervised Feature Learning and Deep Learning. Of course the tutorial is meant for those who already have some understanding of machine learning (if you need an even more basic approach, you can visit the Machine Learning Course to catch up on supervised learning, logistic regression and gradient descent). The tutorial covers Sparse Autoencoder, Vectorized implementation, Preprocessing: PCA and Whitening as well as Softmax Regression and Building Deep Networks. One exercise for Self-Taught Learning states,

“In this exercise, we will use the self-taught learning paradigm with the sparse autoencoder and softmax classifier to build a classifier for handwritten digits. You will be building upon your code from the earlier exercises. First, you will train your sparse autoencoder on an “unlabeled” training dataset of handwritten digits. This produces features that are pen stroke-like…These features will then be used as inputs to the softmax classifier that you wrote in the previous exercise.”

The tutorial walks you through each step with a number of examples and exercises, turning what might fairly be expected to be a complicated process into a veritable textbook: streamlined, straightforward and easy to understand. It turns out search systems can be very simple when automated and partially automated learning are implemented.
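For readers who want a feel for the last step in that exercise, here is a minimal sketch of what a softmax classifier does once it has features to work with. This is not the tutorial’s own exercise code. The feature values, weights, and biases are hypothetical stand-ins: in the exercise the features would come from the trained sparse autoencoder and the weights would be learned, not typed in by hand.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(features, weights, biases):
    """Score each class as a weighted sum of the features, then apply softmax.

    Returns the index of the most probable class and the full probability list.
    """
    scores = [
        sum(w * x for w, x in zip(class_weights, features)) + b
        for class_weights, b in zip(weights, biases)
    ]
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


# Hypothetical inputs: three feature activations and two classes.
FEATURES = [0.9, 0.1, 0.4]
WEIGHTS = [[2.0, -1.0, 0.5], [-1.0, 2.0, 0.5]]
BIASES = [0.0, 0.0]

label, probabilities = predict(FEATURES, WEIGHTS, BIASES)
print("Predicted class:", label)
print("Class probabilities:", probabilities)
```

The sketch skips training entirely. In the exercise, gradient descent would adjust the weights until the predicted probabilities line up with the labeled digits.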

Chelsea Kerwin, June 21, 2013

Sponsored by ArnoldIT.com, developer of Augmentext.

Vietnamese Startup Challenges Google

June 19, 2013

Some veterans of Russia’s Internet scene perceived a unique opportunity in Vietnam. “Russians Attempt to Topple Google in Vietnam,” declares The Economic Times of India. Those observers recently helped found an Internet search company in that country called Coc Coc (“Knock Knock” in English). Their workers’ fundamental understanding of the language and culture, they say, makes for a more effective Vietnamese algorithm that could unseat Google there.

Google’s situation is complicated by the Vietnamese government’s position on censorship; the government is currently working on laws that would further strangle free expression online. It might even require foreign companies to maintain servers within the country’s borders. As we saw in China, Google will (wisely) lose business rather than play ball with a repressive regime. Coc Coc, however, may be more willing to cooperate with its host government.

The article relates:

“‘When I came here, I had some understanding why Vietnam was a good market to beat Google,’ said Mikhail Kostin, the company’s chief search expert and like others in Coc Coc, a veteran at Russia’s largest Internet company, Mail.Ru. ‘But after living here for one year, I understand the language and market much more deeply. I’m sure it’s right.’

“Close to a third of Vietnam’s 90 million people are online and men and women browsing phones and tablets are a common sight in the cafes of its towns and cities. The country’s potential for growth, its young population and good Internet infrastructure have made it an attractive destination for regional and international investors and startups in online content, e-payment and other services.”

Coc Coc also has an advantage over other local startups—plenty of cash. The company will not identify investors, but says it will have over $100 million to spend over the next five years. For its part, Google simply says it welcomes the competition.

Cynthia Murrell, June 19, 2013

Sponsored by ArnoldIT.com, developer of Augmentext

Search’XPR Interview Available

June 17, 2013

The developer of Oorace is Search’XPR. The company has set up operations in New York to complement its two offices in France. You can read an exclusive interview with Jean-Luc Marini. I will explore the idea of software which goes beyond key word retrieval and facets in an upcoming KMWorld column. In the meantime, check out the interview on Search Wizards Speak. SWS is the largest collection of first-person explanations of concepts in search, content processing, and analytics. The entire collection is available from the index at http://arnoldit.com/wordpress/wizards-index/.

Stephen E Arnold, June 17, 2013

Sponsored by Xenky, the portal to ArnoldIT

Big Questions about Federated and Universal Search Remain

June 16, 2013

Search Engine Watch recently re-posted an article that takes an aggressive stance toward Google: “Google Should Kill or Radically Change Universal Search Results.” The message comes from Foundem, a UK price comparison firm that has rejected Google’s proposed web search concessions.

These concessions follow the European Commission’s ongoing antitrust investigation into Google’s search business. Foundem believes that the proposed concessions will not lessen Google’s monopoly on web search.

The article tells us that the proposed concessions ignore Google’s monopoly on search:

“Instead, the concessions focus on minor alterations to Google’s “self-serving Universal Search inserts.” According to Foundem’s report, any concessions must address Google’s AdWords search capabilities. Foundem says AdWords will continue to give Google an unfair advantage until they are re-worked. The company says that the current proposal fails to correct Google searches relevance for showing its own services in results. Foundem believes that to truly slow Google’s search monopoly it would have to either eliminate universal search or drastically change it.”

This report suggests there is still a big question about federated search results despite the fact that Google’s Universal Search initiative was announced back in 2007.

Megan Feil, June 16, 2013

Sponsored by ArnoldIT.com, developer of Beyond Search

Google Announces Answer Converse: Anticipate Features Ready in the Near Future

June 13, 2013

The article titled “Google Overhauling Flagship Search With ‘Answer, Converse, Anticipate’” on Ars Technica discusses the new features Google announced at its keynote on May 15, 2013. Answer, Converse, and Anticipate are the three sections that encapsulate the new strategy. Answer involves Google’s Knowledge Graph, which has been made to understand “real-world entities” instead of just doing a keyword search. Converse uses Google Now to enable conversational searches with Chrome. Anticipate is an expanded version of Google Now, which the article explains was demoed by Johanna Wright, Google VP.

“Using a development build of Chrome, she called up the new search function with a simple “OK, Google”…and asked about interesting things to do in Santa Cruz. She then asked for details about the Santa Cruz boardwalk, which was listed in the results. After a key question (“OK, Google, how far is it from here?”), Google pinpointed her current location at Moscone and told her the boardwalk was 1 hour and 21 minutes away.”

The ability to understand context would mean this technology outsmarts even Siri. Of course it is not yet ready for release, but Google promised it would be available in the near future. The article does not mention good old Boolean, date sorting, and relevance with a nod to precision and recall.

Chelsea Kerwin, June 13, 2013

Sponsored by ArnoldIT.com, developer of Augmentext

LucidWorks Advances Open Source Search

June 13, 2013

LucidWorks has always been a company that strongly believes in investing in the open source community. After all, their value-added software solutions are built on top of leading open source components. It makes sense. But it is also a passion and a commitment from top-level LucidWorks executives. The press release, “LucidWorks Advances Open Source Search at Worldwide Events,” expands on this idea.

It begins:

“LucidWorks, the company transforming the way people access information, today announced a number of speaking appearances and product demonstrations taking place throughout June, 2013. As data types and demands become more complex, companies of all sizes increasingly rely on search-enabled applications to sharpen their competitive edge with data-driven insights. LucidWorks’ experts will speak at online and offline events, share best practices and demonstrate product use cases in a continued effort to meet growing demand worldwide for knowledge about Lucene/Solr open source search.”

The article goes on to list the commitments that LucidWorks has made this summer including big events like Berlin Buzzwords and DataStax Cassandra Summit. CEO Paul Doscher wants developers to understand that LucidWorks builds on the most active open source search community, meaning developers who go with LucidWorks can know that their applications will stand the test of time. Check out the schedule and head out to see LucidWorks at an event in your area.

Emily Rae Aldridge, June 13, 2013

Sponsored by ArnoldIT.com, developer of Beyond Search

Apple and Microsoft: Pals Again?

June 11, 2013

I noted “Exciting New Chapter in Bing’s Collaboration with Apple”. Call me old fashioned, but I was fascinated to see the on-again, off-again relationship between Apple and Microsoft click “on”. The key point for me in the write up was this passage:

Starting this fall with iOS 7, Bing will power Siri’s new integrated web search. When users ask Siri a question either the specific answer or web search links will now be delivered automatically so users can find information even faster.

Mobile search is replacing desktop search as the go-to way for some folks to locate information. The challenge in my opinion boils down to the Apple-Microsoft magnetism versus the pulling power of Google.

Neither Apple nor Microsoft has had the business model to generate Google-scale money from search. My view is that Apple and Microsoft may be facing a quite difficult challenge.

Both companies have the resources to take search to a different place. Can these two firms deliver? The Bing index strikes me as less deep than Google’s. I no longer have current data about the number of URLs indexed by Bing, but when I run queries, I find more hits in Google. Volume does not equal relevance, however. Google has a point of possible vulnerability. However, Apple has not delivered high impact search in some of its services. I find the iTunes’ search system sluggish and difficult to use. Trimming a result set to include only audiobooks is not particularly intuitive for one of my colleagues.

Where there are tie ups, there is hope.

Stephen E Arnold, June 11, 2013

Sponsored by Xenky
