Real Time Conversation with a Mid Tier Wizard

December 9, 2010

I am not making this conversation up. I gave a talk to 43 twenty-somethings at Skinker’s, a delightful place near the London Bridge tube stop. No, I did not buy a Skinker’s T shirt, but it did look smart. My topic was real time search. More accurately, I was explaining the engineering considerations in delivering low latency indexing and querying, which most vendors and second string consultants happily tell you is “real time search”.

The most interesting part of my evening was a short conversation I had with a mid tier consultant, what I call an azure chip consultant or, generally, one of the azurini. To be a blue chip consultant is easy. Just get hired by one of the two, three, or four management consulting firms, do some notable work, and not die of a heart attack from the pressure. Thousands of Type A’s who crave constant stroking take a toll, believe you me. The mid tier lad introduced himself. He reminded me that I had met him before. In the dim light of Skinker’s I would not have been able to recognize Tess, my deaf white boxer. No matter. A big grin and warm handshake were what the azure chip lad thought would jog my memory.

image

The basic idea is that real time is not achievable. There are gating factors at three main points in any content processing system. The first is the green box, which is the catch all for the service providers, ISPs, and others in the network chain. The pink boxes represent the vendors providing services to the client who wants low latency service. The yellow boxes represent the different “friction points” behind the firewall or within the organization’s hybrid infrastructure. Resolving these points of “friction” boils down to brains and money. If an organization lacks either, the latency of the system will be high and increase over time. Users, of course, don’t know this. The problems latency produces range from financial losses to field operations personnel being killed due to stale intelligence.

It didn’t.

Anyway, three observations.

Read more

Microsoft, the US Treasury, and Search

December 9, 2010

The new Microsoft-based Treasury.gov Web site works pretty well. Pictures flash, the links work, and the layout is reasonably clear. There is the normal challenge of government jargon. So “Help, I am going to lose my home” becomes “Homeowner’s HOPE Hotline”.

I am interested in search and retrieval. I wanted to run through my preliminary impressions of the search interface, system responsiveness, and the relevance of the queries. I look at public facing search services differently from most people’s angle of attack. Spare me direct complaints via email. Just put your criticisms, cautions, and comments in the form provided at the foot of this Web page.

Search Interface

The basic search box is in the top right hand corner of the splash page. No problem, and when I navigate to other pages in the Web site, the search box stays put. However, when I click on some links I am whisked outside of the Treasury.gov site and the shift is problematic. No search box on some pages. Here’s an example: http://www.makinghomeaffordable.gov/index.html. Remember my example from the HOPE Hotline reference? Well, that query did not surface content gold on Treasury.gov. I went somewhere else, and I was confused. This probably is a problem peculiar to me, but I found it disconcerting.

Other Queries

I ran a query for “Treasury Hunt,” a service that allows me to determine if a former Arnold left money or “issues” for me. Here’s the result screen for the query “Treasury Hunt”:

treasury hunt results

The first hit in the result list points to this page:

treasury hunt result 1

The problem is that the hot link from this page points to this Web site, which I could not locate in the results list.

treasury direct explicit link

Several observations:

First, the response time for the system was sluggish, probably two seconds, which was longer than Google’s response time. No big deal, just saying “slower.”

Second, the results list did not return the expected hit. For most people, this makes zero difference. For me, I found the lack of matching hits to explicit links interesting. In fact, I assumed that the results list would have the TreasuryDirect hit at the top of the results list. Not wrong, just not what I expected.

Read more

Which Is Better? Abstract or Full Text Search?

November 26, 2010

Please bear with us while we present a short lesson in the obvious: “Users searching full text are more likely to find relevant articles than searching only abstracts.” A recent BMC Bioinformatics research article written by Jimmy Lin titled “Is Searching Full Text More Effective than Searching Abstracts?” explores exactly that.

So maybe we opened with the conclusion, but here is some background information. Since it is no longer an anomaly to view a full-text article online, the author set out to determine if it would be more effective to search the full text rather than only the short but direct text of an abstract. The results:

“Experiments show that treating an entire article as an indexing unit does not consistently yield higher effectiveness compared to abstract-only search. However, retrieval based on spans, or paragraphs-sized segments of full-text articles, consistently outperforms abstract-only search. Results suggest that highest overall effectiveness may be achieved by combining evidence from spans and full articles.”

Yep, at the end of the day, searching a bigger bank of words can increase your likelihood of a hit, at least when the full text is searched in paragraph-sized spans and not as one undivided lump. The extension here is that the future must bring with it some solutions. Due to the longer length of the full-text articles and the growing digital archive waiting to be tamed, Lin predicts that multiple machines in a cluster as well as distributed text retrieval algorithms will be necessary to effectively handle the search requirements. Wonder who will be first in line to provide these services…
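To make the span idea in the quoted findings concrete, here is a minimal sketch. It is not Lin’s experimental setup. The scoring function is a toy term overlap measure, the 0.5 blending weight is an arbitrary assumption, and all function names are made up for illustration. It scores an article once as a single unit and once per paragraph-sized span, then blends the two.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def overlap_score(query, text):
    """Fraction of tokens in `text` that match a query term (a toy relevance score)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    query_terms = set(tokenize(query))
    hits = sum(1 for token in tokens if token in query_terms)
    return hits / len(tokens)

def best_span_score(query, article):
    """Score each blank-line-separated paragraph on its own and keep the best one."""
    spans = [p for p in article.split("\n\n") if p.strip()]
    return max((overlap_score(query, span) for span in spans), default=0.0)

def combined_score(query, article, weight=0.5):
    """Blend whole-article and best-span evidence; `weight` is an assumed value."""
    whole = overlap_score(query, article)
    return weight * whole + (1 - weight) * best_span_score(query, article)
```

With this toy measure, a long article with one on-topic paragraph scores low as a whole because the off-topic text dilutes it, while its best span scores high. That is one plausible reading of why spans can beat whole-article indexing.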

Sarah Rogers, November 26, 2010

Freebie

Reflections on Ask.com

November 13, 2010

Ask.com used to be the premier search engine for the Internet. According to the article “IAC’s Barry Diller Surrenders to Google, Ends Ask.com’s Search Effort,” it doesn’t even break the Top Five. Because of this backslide, Diller’s corporation will be laying off 130 engineers and letting the competition take most of its brute force Web search business.

In the era before Yahoo and Google you could type in any question and your trusty guide, Jeeves, would take you anywhere you needed to go. Not anymore. It seems that Ask.com can no longer keep up with the Joneses or, in this case, the Google. The write up asserted:

It’s become this huge juggernaut of a company that we really thought we could compete against by innovating. We did a great job of holding our market share but it wasn’t enough to grow the way IAC had hoped we would grow when it bought us.

Google has grown to be the world’s top search engine, and it seems to control 65 percent of the searches performed in the United States.

Some observations:

  • How long will Google be able to sustain brute force indexing? The more interesting services use human input to deliver content.
  • Who will be the next Google? Maybe it will be Facebook?
  • With the rise of “training wheels” on search systems, will most users fiddle with key words? Won’t “get it fast, get it good enough” become the competitive advantage?

Google is now the old man of search. I see the company moving clumsily. There were the “don’t go to Facebook” payoffs earlier this week. There is the Facebook game, with Google watching from the cheap seats.

Changes afoot. I fondly recall the third tier consultant who told me that Ask.com was a winner. I assume that young person is now advising the movers and shakers about search and content processing. Maybe Google needs an advisor to help the firm move from the cheap seats to the starting line up?

Stephen E Arnold and Leslie Radcliff, November 13, 2010

Freebie

Brainware jumps to Version 5.2

November 4, 2010

Short honk: My in box overflowed with a news release about Brainware’s Version 5.2 of its enterprise search system. The news release provides some publicity for a trade show at which Brainware has an exhibit. In addition to helping out the trade show outfit, Brainware called my attention to new features in Version 5.2. These include:

  • More flexible security for processed documents
  • Enhanced indexing of content in relational databases
  • More control over what’s displayed in response to a query.

Brainware’s approach to content processing relies on trigrams for which the firm has a patent. For more information about Brainware, navigate to the firm’s Web site at www.brainware.com. No licensing fee details are available to me at this time. I did see a demo of the new system and I think the firm will give you a peek as well. I had been watching to see if Oracle would acquire Brainware. The database giant seems happy with Brainware’s content acquisition components. Oracle, however, moved in a different direction. I will keep my ear to the shoreline here at the goose pond.
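For readers who have not bumped into trigrams, here is a minimal sketch of the general idea. This is not Brainware’s patented method, which I have not seen in code. It is the plain textbook version: break text into overlapping three-character sequences and compare two strings by how many sequences they share. The function names are made up for illustration.

```python
def char_trigrams(text):
    """Return the set of overlapping three-character sequences in `text`."""
    # Lowercase and collapse runs of whitespace so formatting does not matter.
    normalized = " ".join(text.lower().split())
    return {normalized[i:i + 3] for i in range(len(normalized) - 2)}

def trigram_similarity(a, b):
    """Jaccard overlap of the two trigram sets, from 0.0 (disjoint) to 1.0 (identical)."""
    trigrams_a, trigrams_b = char_trigrams(a), char_trigrams(b)
    if not trigrams_a or not trigrams_b:
        return 0.0
    return len(trigrams_a & trigrams_b) / len(trigrams_a | trigrams_b)
```

The appeal of the approach is tolerance for misspellings and OCR errors. “invoice” and “invoise” still share three of their trigrams, while “invoice” and “receipt” share none.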

Stephen E Arnold, November 4, 2010

Freebie

Content Analyst Partners with TCDI

November 3, 2010

Lawyers need tools to respond to the demands of their clients. Content Analyst Company, which bills itself as a leader in advanced document analytics tools, is helping Technology Concepts & Design, Inc. (TCDI) reduce the time required to analyze information generated by the discovery process.

“TCDI and Content Analyst Company Announce Strategic Partnership, Expanding Analytics Capabilities in eDiscovery” reported:

[The companies] will incorporate Content Analyst Analytics Technology (CAAT) into its proprietary eDiscovery Application Suites: Discovery WorkFlow® and ClarVergence®. This partnership offers TCDI’s clients improvements in Document Review efficiencies and increased visibility into their document collections. The enhanced analytics will also reduce the time and cost associated with Document Review.

The tie up will yield improved document review and “increased visibility into their document collections.” Content Analyst Company develops advanced document analytics tools, based on patented Latent Semantic Indexing (LSI) technology. According to the company, Content Analyst Analytical Technology (CAAT) sharply reduces the time needed to discern relevant information from large volumes of unstructured text. For more information, navigate to www.contentanalyst.com.

Harleena Singh, November 3, 2010

Freebie

Coveo Connects

November 1, 2010

Knowledge and information are directly related to a company’s success. Coveo taps into this as a leading provider of enterprise search and customer information access solutions. The PR-USA.net article “Coveo Announces New Information Indexing Connectors Including Support for Microsoft SharePoint 2010” tells the story of how “Coveo offers a richer, more integrated view of enterprise knowledge and information compared to what’s available with Microsoft’s native search.”

The article further discloses that, through its Enterprise Search 2.0 approach, Coveo can “bring the benefits of unified information access to customers faster, and less expensively, than is possible with traditional solutions including SharePoint Search or Microsoft FAST.” Because Coveo dynamically indexes the data and presents it in a unified view, organizations get immediate value from the information and knowledge stored as structured and unstructured data across the enterprise, in any system, without moving the data. Thus, the extended Coveo offers superior functionality and integration. Our recommendation: connect with Coveo.

Harleena Singh, November 1, 2010

Anti Search in 2011

November 1, 2010

In a recent meeting, several of the participants were charged with disinformation from the azurini.

You know. Azurini, the consultants.

Some of these were English majors, others former print journalists, and some unemployed search engine optimization experts smoked by Google Instant.

But mostly the azurini emphasize that their core competency is search, content management, or information governance (whatever the heck that means). In a month or so, there will be a flood of trend write ups. When the Roman god looks to his left and right, the signal for prognostication flashes through the fabric covered cube farms.

To get ahead of the azurini, the addled goose wants to identify the trends in anti search for 2011. Yep, anti search. Remember that in a Searcher article several years ago, I asserted that search was dead. No one believed me, of course. Instead of digging into the problems that ranged from hostile users to the financial meltdown of some high profile enterprise search vendors, search was the big deal.

And why not? No one can do a lick of work today unless that person can locate a document or “find” something to jump start activity. In a restaurant, people talk less and commune with their mobile devices. Search is on a par with food, a situation that Maslow would find interesting.

The idea for this write up emerged from a meeting a couple of weeks ago. The attendees were trying to figure out how to enhance an existing enterprise search system in order to improve the productivity of the business. The goal was admirable, but the company was struggling to generate revenues and reduce costs. The talk was about search but the subtext was survival.

The needs for the next generation search system included:

  • A great user experience
  • An iPad app to deliver needed information
  • Seamless access to Web and Intranet information
  • Google-like performance
  • Improved indexing and metatagging
  • Access to database content and unstructured information like email.

Read more

Open Source Search Run Down

October 25, 2010

“Open Source Search with Lucene & Solr” provides a useful overview of information similar to that presented at the Lucene Revolution in Boston, October 7 and 8, 2010. I found the information useful. Even though I poked my head into most sessions and met a number of speakers, Igvita.com has assembled a number of useful factoids. Here’s a selection of four.

First, the Salesforce.com implementation of Lucene “consists of roughly 16 machines, which in turn contain many small and sharded Lucene indexes. Currently, [Salesforce.com] handles 4,000 queries per second (qps) and provides an incremental indexing model where the new user data is searchable within ~ three minutes.”

Second, iTunes is a Lucene user “said to be handling up to 800 queries per second.” I thought Apple was drinking the Google Kool-Aid, or at least was before the friction between the two companies turned into a marital separation without counseling.
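What “many small and sharded indexes” means in practice can be shown with a minimal sketch. This is not Salesforce.com’s code and it does not use Lucene. Each shard here is a plain dictionary of document text, the score is a naive term count, and the function names are assumptions for illustration. The point is the pattern: ask every shard, then merge the partial results into one ranked list.

```python
import heapq

def search_shard(shard, query_term):
    """Return (score, doc_id) hits from one shard; score is a simple term count."""
    hits = []
    for doc_id, text in shard.items():
        score = text.lower().split().count(query_term.lower())
        if score > 0:
            hits.append((score, doc_id))
    return hits

def search_all(shards, query_term, top_k=3):
    """Query every shard, then merge the partial results into one top-k list."""
    all_hits = []
    for shard in shards:
        all_hits.extend(search_shard(shard, query_term))
    # Highest score first; ties are broken by doc_id because tuples compare in order.
    return heapq.nlargest(top_k, all_hits)
```

In a real deployment the shards sit on different machines and are queried in parallel, which is where the queries per second numbers come from. The merge step is the same idea.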

Third, I found this description of Lucene/Solr interesting:

If Lucene is a low-level IR toolkit, then Solr is the fully-featured HTTP search server which wraps the Lucene library and adds a number of additional features: additional query parsers, HTTP caching, search faceting, highlighting, and many others. Best of all, once you bring up the Solr server, you can speak to it directly via REST XML/JSON API’s. No need to write any Java code or use Java clients to access your Lucene indexes. Solr and Lucene began as independent projects, but just this past year both teams have decided to merge their efforts – all around, great news for both communities. If you haven’t already, definitely take Solr for a spin.
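The “no Java needed” point is easy to illustrate. The sketch below is mine, not the Igvita write up’s. It only builds a query URL for Solr’s select handler using the standard q, rows, and wt parameters; it does not contact a server. The host, port, and path in the usage note are assumptions about a default local install, and the function name is made up.

```python
from urllib.parse import urlencode

def build_solr_query_url(base_url, query, rows=10, response_format="json"):
    """Build a URL for Solr's select handler from the q, rows, and wt parameters."""
    params = urlencode({"q": query, "rows": rows, "wt": response_format})
    return f"{base_url.rstrip('/')}/select?{params}"
```

Calling build_solr_query_url("http://localhost:8983/solr", "title:lucene") returns http://localhost:8983/solr/select?q=title%3Alucene&rows=10&wt=json, which you can paste into a browser or hand to any HTTP client in any language.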

Finally, this passage opened my eyes to some interesting opportunities.

Instead of running Lucene or Solr in standalone mode, both are also easily integrated within other applications. For example, Lucandra is aiming to implement a distributed Lucene index directly on top of Cassandra. Jake Luciani, the lead developer of the project, has recently joined the Riptano team as a full-time developer, so do not be surprised if Cassandra will soon support a Lucene powered IR toolkit as one of its features! At the same time, Lily is aiming to transparently integrate Solr with HBase to allow for a much more flexible query and indexing model of your HBase datasets. Unlike Lucandra, Lily is not leveraging HBase as an index store (see HBasene for that), but runs standalone, albeit tightly integrated Solr servers for flexible indexing and query support.

Navigate to the Igvita Web site and get the full scoop, not a baby cup of goodness.

Stephen E Arnold, October 25, 2010

Freebie

The Bonsai Method: Google and Change

October 24, 2010

When I was in Japan, I watched a bonsai “treasure” work his magic. I liked the idea of binding young shoots with wire and forcing the malleable living things to do what the “treasure” wanted. My guide explained that the “national treasure” could convert any species of tree into a model railroad scale plant. Remarkable.

The problem is that companies in general and a 12 year old Google in particular do not respond to the bonsai master’s interventions the way a sprouting maple does.

Let’s face it. Google is not likely to change in a meaningful way. The aircraft carrier is underway. Even a minor course correction takes a long time. Think about the six versions of the Google Search Appliance before Google could hook Google Apps content into the system.

image

Can the Google oak tree be shaped into a bonsai art work? Not likely, grasshopper.

Google has been chugging along on its “controlled chaos” approach to business for 12 years. If you have worked with juvie offenders, you may have encountered some 12 year olds who are going to grow their own way. Those 12 year olds are on their own, often predictable, path.

“Opinion: Angry Birds Android Market Snub Shows Google Has to Change” asserts:

What looks initially to be typical press release waffle is in fact a damning indictment of Google’s Android Market. If its own official retail channel is not seen as the “obvious choice” for a major app developer, something needs to be done to make it so – and fast. Perhaps, in hindsight, we shouldn’t have been so surprised at Rovio’s gutsy move. The signs of general dissatisfaction with Android Market have been there for all to see since its launch. Put bluntly, Android Market is an absolute mess. The navigation experience is blighted by a poor filter system that makes it very hard indeed to hone in on quality paid software. Dubious free ringtone and porn apps clog up the Multimedia, Entertainment, and even Games categories.

The author wanting Google to change is probably not going to do too well at bonsai. How is one to miniaturize and tame a 12 year old tree? Sure, the tree can be shaped, but the total control stuff is no longer possible. Make a pear tree look like Donald Duck. No problem. Make the pear tree fit into a dish for the dining room sideboard, problem.

I do think Google is changing, but the change has little to do with Angry Birds or even government regulators. Google is changing because of its addiction to money. The shift in StreetView policies is less about fear of legal hassles and more about the firm’s ability to get needed data from other methods not widely discussed in the blogosphere.

There are several important changes evident to me. Keep in mind that I look at Google in terms of its technical information freely available as open source content. Here’s my checklist, which you may compare with the Angry Birds’ example in the cited article.

First, Google is going consumer. The company’s roots are in brute force search and solving engineering problems that sank other brute force Web indexing companies. This consumer shift may be a turning point for Google. In my opinion, Google is betting the farm on its understanding of the consumer.

Second, Google faces a world in which Facebook and Apple are the hot tickets. Second or third billing is an issue for those who are sensitive to such shallow accolades. With Xooglers filling the ranks at Facebook, the notion that Google is not number one is a bitter pill in my opinion. Angst can manifest itself in interesting ways. Consider the Google TV which a number of people have found an amusing way to test their technical aptitude. The couch spud? Indifferent.

Third, online advertising is ramping up. But the big money is going to talent centric programming available on the Internet. AdWords is a great business, but a new ad business is emerging and Google has to figure out how to play a big part in that world. Adam Carolla may be a former radio DJ, but his growing empire represents an advertising opportunity that does not lend itself to Google’s algorithms at this moment.

The PocketGamer’s write up about Google and Angry Birds is interesting, but it does not apply to the larger forces at work on and within the Google. Google is in the closing innings of what is its worst public relations year in its 12 year history. Buzz, Wave, Germany, Google Books, and Google TV—quite a track record.

Controlled chaos is the method and it is now showing some flaws. And Google will find it difficult to change. There is no bonsai master able to take a 12 year old tree and squish it down to a seven inch living entity. One big tree does not make a forest.

Stephen E Arnold, October 24, 2010

Freebie
