Search Silver Bullets, Elixirs, and Magic Potions: Thinking about Findability in 2012

November 10, 2011

I feel expansive today (November 9, 2011), generous even. My left eye seems to be working at 70 percent capacity. No babies are screaming in the airport waiting area. In fact, I am sitting in a not too sticky seat, enjoying the announcements about keeping pets in their cage and reporting suspicious packages to law enforcement by dialing 250.

I wonder if the mother who left a pink and white plastic bag with a small bunny and box of animal crackers is evil. Much in today’s society is crazy marketing hype and fear mongering.

Whilst thinking about pets in cages and animal crackers which may be laced with rat poison, and plump, fabric bunnies, my thoughts turned to the notion of instant fixes for horribly broken search and content processing systems.

I think it was the association of the failure of societal systems that determined passengers at the gate would allow a pet to run wild or that a stuffed bunny was a threat. My thoughts jumped to the world of search, its crazy marketing pitches, and the satraps who have promoted themselves to “expert in search.” I wanted to capture these ideas, conforming to the precepts of the About section of this free blog. Did I say, “Free.”

A happy quack to http://www.alchemywebsite.com/amcl_astronomical_material02.html for this image of the 21st century azure chip consultant, a self appointed expert in search with a degree in English and a minor in home economics with an emphasis on finger sandwiches.

The Silver Bullets, Garlic Balls, and Eyes of Newts

First, let me list the instant fixes, the silver bullets, the magic potions, the faerie dust, and the alchemy which makes “enterprise search” work today. Fasten your alchemist’s robe, lift your chin, and grab your paper cone. I may rain on your magic potion. Here are 14 magic fixes for a lousy search system. Oh, one more caveat. I am not picking on any one company or approach. The key to this essay is the collection of pixie dust, not a single firm’s blend of baloney, owl feathers, and goat horn.

  1. Analytics (The kind of equations some of us wrangled and struggled with in Statistics 101 or the more complex predictive methods which, if you know how to make the numerical recipes work, will get you a job at Palantir, Recorded Future, SAS, or one of the other purveyors of wisdom based on big data number crunching)
  2. Cloud (Most companies in the magic elixir business invoke the cloud. Not even Macbeth’s witches do as good a job with the incantation of Hadoop the Loop as Cloudera, but there are many contenders in this pixie concoction. Amazon comes to mind but A9 gives me a headache when I use A9 to locate a book for my trusty e Reeder.)
  3. Clustering (Which I associate with Clustify and Vivisimo, but Vivisimo has morphed clustering into “information optimization” and gets a happy quack for this leap)
  4. Connectors (One cannot search unless one can acquire content. I like the Palantir approach which triggered some push back but I find the morphing of ISYS Search Software a useful touchstone in this potion category)
  5. Discovery systems (My associative thought process offers up Clearwell Systems and Recommind. I like Recommind, however, because it is so similar to Autonomy’s method and it has been the pivot for the company’s flip flop from law firms to enterprise search and back to eDiscovery in the last 12 or 18 months)
  6. Federation (I like the approach of Deep Web Technologies and for the record, the company does not position its method as a magical solution, but some federating vendors do so I will mention this concept. Think mash up and data fusion too)
  7. Natural language processing (My candidate for NLP wonder worker is Oracle which acquired InQuira. InQuira is a success story because it was formed from the components of two antecedent search companies, pitched NLP for customer support, and got acquired by Oracle. Happy stakeholders all.)
  8. Metatagging (Many candidates here. I nominate the Microsoft SharePoint technology as the silver bullet candidate. SharePoint search offers almost flawless implementation of finding a document by virtue of knowing who wrote it, when, and what file type it is. Amazing. A first of sorts because the method has spawned third party solutions from Austria to the United States.)
  9. Open source (Hands down I think about IBM. From Content Analytics to the wild and crazy Watson, IBM has open source tattooed over large expanses of its corporate hide. Free? Did I mention free? Think again. IBM did not hit $100 billion in revenue by giving software away.)
  10. Relationship maps (I have to go with the Inxight Software solution. Not only was the live map an inspiration to every business intelligence and social network analysis vendor, it was cool to drag objects around. Now Inxight is part of Business Objects which is part of SAP, which is an interesting company occupied with reinventing itself and ignoring TREX, a search engine)
  11. Semantics (I have to mention Google as the poster child for making software know what content is about. I stand by my praise of Ramanathan Guha’s programmable search engine and the somewhat complementary work of Dr. Alon Halevy, both happy Googlers as far as I know. Did I mention that Google has oodles of semantic methods, but the focus is on selling ads and Pandas, which are somewhat related.)
  12. Sentiment analysis (the winner in the sentiment analysis sector is up for grabs. In terms of reinventing and repositioning, I want to acknowledge Attensity. But when it comes to making lemonade from lemons, check out Lexalytics (now a unit of Infonics). I like the Newssift case, but that is not included in my free blog posts and information about this modest multi-vehicle accident on the UK information highway is harder and harder to find. Alas.)
  13. Taxonomies (I am a traditionalist, so I quite like the pioneering work of Access Innovations. But firms run by individuals who are not experts in controlled vocabularies, machine assisted indexing, and ANSI compliance have captured the attention of the azure chip, home economics, and self appointed expert crowd. Access Innovations knows its stuff. Some of the boot camp crowd, maybe somewhat less? I read a blog post recently that said librarians are not necessary when one creates an enterprise taxonomy. My how interesting. When we did the ABI/INFORM and Business Dateline controlled vocabularies we used “real” experts and quite a few librarians with experience conceptualizing, developing, refining, and ensuring logical consistency of our word lists. It worked because even the shadow of the original ABI/INFORM still uses some of our terms 30 plus years later. There are so many taxonomy vendors, I will not attempt to highlight others. Even Microsoft signed on with Cognition Technologies to beef up its methods.)
  14. XML (there are Google and MarkLogic again. XML is now a genuine silver bullet. I thought it was a markup language. Well, not any more, pal.)

Mozenda and the Zen of Screen Scraping

September 27, 2011

Mozenda, or “More Zenful Data,” is offering a new approach for comprehensive web data gathering. Their “zen” business style combines with a functional SaaS application to create a new productivity tool. We learn from the company’s Web site:

This concept of creating a productivity tool rather than another application for the IT department resonated well with existing Mozenda customers. In 2008, Mozenda accomplished this goal by launching the first of its kind Software as a Service (SaaS) application for performing comprehensive web data gathering (a.k.a web data extraction, screen scraping, web crawling, web harvesting, etc.), data management, and data publishing. Mozenda, or “More Zenful Data”, is now a reality.

Customers like Attensity and Yahoo! are using Mozenda to obtain content. So when you can’t or don’t want to search using human power, Mozenda could be a good alternative for generating relevant content. An affordable and compelling option, Mozenda will easily compete in the field. Screen scraping is an interesting technical function, and it is one which may lead to some dust ups between content owners and those who repurpose the content. Aggregating many different factoids into a giant repository can reduce some production costs, but the method can put the squeeze on those who create original information.

Emily Rae Aldridge, September 27, 2011

Sponsored by Pandia.com

Search to eMarketing: A Bridge Too Far?

September 14, 2011

I find the creativity of search vendors fascinating. We have the search to customer support method. The play worked for InQuira, which was snapped up by the Google nemesis Oracle. We have the search to freebie, which is the path Microsoft is taking with the remarkable Fast enterprise search platform. We have the search to business intelligence play which has worked well for SAS and its Teragram acquisition. Now we have search to eMarketing.

Navigate to “E-marketers Using Less Than Half the Data Available to Them.” Sponsored by Endeca, the article highlights some data which I find useful when I need to support my assertion about how little progress has been made to make information findable and actionable:

half of the respondents report that they are still using multiple tools (at least three or more) to support business intelligence (BI) decisions, underscoring the need for data to be lifted from these separate silos and streamlined into a unified and easy-to-understand view.

The conclusion I draw is that I should use a tool such as Exalead’s search based application technology. But my hunch is that Endeca wants me to embrace the Endeca approach.

A handful of other findings reported strike me as suggestive:

  • 35% say they spend hours combining data from various data sources and over half say they would like to analyze all information in a single view.
  • 48% of respondents say their analytics requirements change at least monthly, with 20% of respondents requirements changing daily or hourly.
  • More than 40% of respondents cite that it often takes months to have their BI requests fulfilled or they often cannot get their requests fulfilled at all.

Will emarketing business intelligence become the next big gold mine for search vendors? My thought is that companies like Attensity have been in the game for a while and may have a head start.

Stephen E Arnold, September 14, 2011

Sponsored by Pandia.com

Google and the Bullies Who Are Not Googley

August 3, 2011

I am tired, in a weird time zone, and in a place that looks like the moon. I took time out from some exciting meetings to read “When Patents Attack Android”. I would have made “android” plural; that’s why I am in the middle of nowhere and Google is sitting on top of the mobile world. Well, not exactly on top because of the pesky Apple, but Google is within striking distance. In one of the many, many Google blogs, Googlers make pithy statements about technology, the world, and getting picked on by uncouth bullies. I am sympathetic. No, I really am. If you were to aim an Attensity or Lexalytics super duper sentiment scooper at me, I would be teary, snuffling, and plagued with hot flashes. I empathize. I really, really do. Here’s the passage that tugged at my frontal cortex empathy:

…[Google competitors are] doing this by banding together to acquire Novell’s old patents (the “CPTN” group including Microsoft and Apple) and Nortel’s old patents (the “Rockstar” group including Microsoft and Apple), to make sure Google didn’t get them; seeking $15 licensing fees for every Android device; attempting to make it more expensive for phone manufacturers to license Android (which we provide free of charge) than Windows Mobile; and even suing Barnes & Noble, HTC, Motorola, and Samsung. Patents were meant to encourage innovation, but lately they are being used as a weapon to stop it.

When I read this, I thought about Foundem and the other companies who have found Google so sympathetic. I recall meeting Google Search Appliance licensees thrilled with the prompt, helpful, 24X7 customer support. I recall remarks by Web sites whose traffic dropped after Panda strolled through town telling me that their Google representatives were able to answer questions, provide suggested remedies, and volunteer to speak with the keepers of the PageRank algorithm.

How unjust that competitors are taking action against Google. Ah, ingratitude thy names are plentiful as the monikers of he who rules. I must weep like a guitar purchased from iTunes. I must read about ingratitude in a book from Amazon. I must consult a higher power, an oracle with a mug of Java with unfounded allegations of impropriety by helpless Googlers. Cease. Bully not the Google.

Stephen E Arnold, August 3, 2011

Sponsored by Pandia.com, publisher of The New Landscape of Enterprise Search

ProQuest: A Typo or Marketing?

June 10, 2011

I was poking around with the bound phrase “deep indexing.” I had a briefing from a start up called Correlation Concepts. The conversation focused on the firm’s method of figuring out relationships among concepts within text documents. If you want to know more about Correlation Concepts, you can get more information from the firm’s Web site at http://goo.gl/gnBz6.

I mentioned to Correlation Concepts Dr. Zbigniew Michalewicz’s work in mereology and genetic algorithms and also referenced the deep extraction methods developed by Dr. David Bean at Attensity. I also commented on some of the methods disclosed in Google’s open source content. But Google has become less interesting to me as new approaches have become known to me. Deep extraction requires focus, and I find it difficult to reconcile focus with the paint gun approach Google is now taking in disciplines far removed from my narrow area of interest.

A typo is a typo. An intentional mistake may be a joke or maybe disinformation. Source: http://thiiran-muru-arul.blogspot.com/2010/11/dealing-with-mistakes.html

After the interesting demo given to me by Correlation Concepts, I did some patent surfing. I use a number of tools to find, crunch, and figure out which crazily worded filing relates to other, equally crazily worded documents. I don’t think the patent system is much more than an exotic work of fiction and fancy similar to Spenser’s The Faerie Queene.

Deep indexing is important. Key word indexing does not capture in some cases the “aboutness” of a document. As metadata becomes more important, indexing outfits have to cut costs. Human indexers are like tall grass in an upscale subdivision. Someone is going to trim that surplus. In indexing, humans get pushed out for fancy automated systems. Initially more expensive than humans, the automated systems don’t require retirement, health care, or much management. The problem is that humans still index certain content better than automated systems. Toss out high quality indexing and insert algorithmic methods, and you get search results which can vary from indexing update to indexing update.
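To make the “aboutness” gap concrete, here is a minimal Python sketch, not any vendor’s method. The two sample documents, the `concept_tags` mapping standing in for a human indexer’s work, and the function names are all hypothetical. A plain key word index finds only the document that contains the literal query word, while the concept tags find both documents that are about the topic.

```python
from collections import defaultdict

def build_keyword_index(docs):
    """Map each lowercased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token.strip(".,;:!?\"'()")].add(doc_id)
    return index

# Two hypothetical documents about the same subject, worded differently.
docs = {
    "d1": "The patent covers a method for ranking web pages.",
    "d2": "An invention that orders search results by link analysis.",
}

keyword_index = build_keyword_index(docs)

# Key word lookup: only the document containing the literal word matches.
keyword_hits = sorted(keyword_index.get("patent", set()))

# Hypothetical controlled-vocabulary terms a human indexer might assign.
concept_tags = {
    "d1": {"patents", "ranking"},
    "d2": {"patents", "ranking"},
}

def search_by_concept(tag):
    """Return ids of documents tagged with the given controlled term."""
    return sorted(doc_id for doc_id, tags in concept_tags.items() if tag in tags)

concept_hits = search_by_concept("patents")

print(keyword_hits)
print(concept_hits)
```

The sketch deliberately leaves out stemming, synonyms, and ranking. Its only point is that the second document is invisible to key word retrieval unless someone, or some system, records what the document is about.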

Ducks and Alphas: Wolfram Alpha and DuckDuckGo Unite

April 25, 2011

“Wolfram|Alpha and DuckDuckGo Partner on API binding and Search Integration,” touts Wolfram Alpha’s own blog. Both organizations have brought something unique to the Search universe, so we’re interested to see what comes of this. Will it be more agile than a cross between Google and Godzilla would be? (Googzilla?)

Wolfram|Alpha’s Computational Knowledge Engine not only retrieves data but crunches it for you—very useful, if you phrase your query well. Play with that here.

DuckDuckGo’s claim to fame is that they don’t track us; privacy champions like that. A lot. The site provides brief info, say from a dictionary or Wikipedia, as well as related topics at the top of the results page. It’s also blissfully free of advertising clutter. Check that out here.

According to the Wolfram Alpha blog, they are combining the Wolfram|Alpha functionality with the DuckDuckGo search:

So what does this new partnership mean for you? If you are a DuckDuckGo user, you’ll start to notice expanded Wolfram|Alpha integration. DuckDuckGo will start adding more Wolfram|Alpha functionality and datasets based on users’ suggestions. If there’s a specific topic area you’d like to see integrated into DuckDuckGo, your suggestions are welcome.

And for developers, DuckDuckGo will maintain the free Wolfram Alpha API Perl binding. With that, you can integrate Wolfram|Alpha into your application. Keep in mind that InQuira and Attensity are “products” of similar tie ups.

We’ll enjoy watching the progress of this hybrid beast.

Cynthia Murrell, April 25, 2011

Freebie

The Columns of April from Stephen E Arnold

March 30, 2011

Quite a bit of flux in the world of print and online publishing. I am going to need a scorecard to know who publishes which of my for-fee columns. Here’s the line up for April 2011 or a month or two later. The production cycle for some print publications requires two, three, or more months in some cases.

Enterprise Technology Management, owned by IMI Publishing Ltd. “Google Nurtures Its Enterprise Services” talks about some of Google’s more interesting actions germane to its enterprise products and services. One of the points I mention is Google’s hiring Oracle sales and marketing professionals. In a word, “Wow.”

Information Today, owned by Information Today. “Search to Services: The Quiet Enterprise Revolution” explores the shift from licensing software to selling services. Search has become more of a consulting business than a software business in certain circles.

Information World Review, owned by Bizmedia, puts my picture on its home page as I write this on March 18, 2011. Go figure. “Real Time Search and the Search Results Laundry List” talks about the problems of delivering users a laundry list of results for real-time content. I highlight a company called Digital Reasoning and its new Synthesys system. Yes, it is better than a results list.

KMWorld, also owned by Information Today. “The Sentiment Explosion” talks about the use of semantics to solve problems, not provide a subject for lectures on next generation search technology. One of the companies discussed is Attensity and one exemplary product is Hakia’s Sensenews, a stock picking advisory service.

Smart Business Network owns magazines and Web sites. “Google Jazzes Local Advertising Options” talks about Google Tags and how a local business can get an Adwords or “boost” for a compelling $50 a month.

Although it is gratifying to get paid a pittance for these somewhat polished pieces, I am going to have to rethink what I am doing. Across these five publications, the reach is less than that of Beyond Search and Inteltrax, which is a very shocking fact for us dwelling in the heart of darkness in rural Kentucky.

Stephen E Arnold, March 30, 2011

This item is a freebie; the columns are not.

Relevancy and Meaning in New Media

March 6, 2011

In the recent Forbes.com article “Finding Influencers and Influential Events On The Social Web”, an IBM expert for hire expounds on the complexities of analytical solutions in the face of Big Data quandaries.

While attending the recent IBM mega-conference in Orlando, the author found himself in the audience during a discussion of the detailed examination practices applied to the communal media landscape via industry upstarts and innovations. Citing examples such as Klout (the four year old California-based firm that specializes in gauging online influence), he explains the importance of understanding the relationships bred from this form of contact as well as the consistencies therein and its value to new programs and services.

The concentration should not be on models of contact alone; the examination of themes must be a factor as well. This is especially difficult on a technological level given the nuances that exist within speech. Another issue that must be addressed is how to determine individual pertinence. The ability to impact the commercial spectrum, even at one hundred and forty characters or less, can transpire despite an individual’s mastery of a subject. On the plains of communal media, everyone’s voice carries.

One of the more surprising elements of the article was the omission of any reference to the two companies, Attensity and Lexalytics, thought to be leading this new charge of semantic review and management. Could this be an indication of a shift within the industry? Curious indeed.

Micheal Cory, March 6, 2011

Is Customer Support a Revenue Winner for Search Vendors?

February 26, 2011

In a word, “Maybe.” Basic search is now widely available at low or no cost.

InQuira has been a player in customer support for a number of years. The big dogs in customer support are outfits like RightNow, Pega, and a back pack full of off shore outfits. In the last couple of weeks, we have snagged news releases that suggest search vendors are in the customer support business.

Two firms have generated somewhat similar news releases. Coveo, based in Canada, was covered in Search CRM in a story titled “2011 Customer Service Trends: The Mobile Revolution.” The passage that caught our attention was:

The most sophisticated level of mobile enablement includes native applications, such as iPhone applications available from Apple’s App Store, which have been tested and approved by the device manufacturer. Not only do these applications offer the highest level of usability, they allow integration with other device applications. For example, Coveo’s mobile interface for the company’s Customer Information Access Solutions allows you to take action on items in a list of search returns, such as reply to an email or add a comment to a Salesforce.com incident. Like any hot technology trend, when investing in mobile enablement it is important to prioritize projects based on potential return on investment, not “cool” factor.

Okay, mobile for customer support.

Then we saw a few days later “Vivisimo Releases New Customer Experience Optimization Solution” in Destination CRM. Originally a vendor of on-the-fly clustering, Vivisimo has become a full service content processing firm specializing in “information optimization.” The passage that caught our attention was:

Vivisimo has begun to address the needs of these customer-facing professionals with the development of its Customer Experience Optimization (CXO) solution, which gives customer service representatives and account managers quick access to all the information about a customer, no matter where that information is housed and managed—inside or outside a company’s systems, and regardless of the source or type. The company’s products are a hybrid of enterprise search, text-based search, and business intelligence solutions. CXO also targets the $1.4 trillion problem of lost worker productivity fueled by employees losing time looking for information. “All content comes through a single search box,” Calderwood says, “which reduces the amount of time to find information.” CXO works with an enterprise search platform that indexes unstructured data, and a display mechanism that uses analytics to find the data. It sits on top of all the systems and applications a company can have—even hosted applications—and pulls data from them all. It can sync up with major systems from Remedy, Siebel, SAP, Oracle, Microsoft, Salesforce.com, and many others.

So, customer support and customer relationship management it is.

Promises are easy to make and sometimes difficult to keep. Source: http://dwellingintheword.wordpress.com/2009/12/29/172-numbers-30-and-31/

I have documented the changes that search and content processing companies have made in the last year. There have been significant executive changes at Lucid Imagination, MarkLogic, and Sinequa. Companies like Attensity and JackBe have shifted from a singular focus on serving a specific business sector to a broader commercial market. Brainware is pushing into document processing and medical information. Recommind has moved from eDiscovery into enterprise search. Palantir, the somewhat interesting visualization and analytics operation, is pushing into financial services, not just government intelligence sectors. There are numerous examples of search vendors looking for revenue love in various market sectors.

So what?

I see four factors influencing search and content processing vendors. I am putting the finishing touches on a “landscape report” in conjunction with Pandia.com about enterprise search. I dipped into the reference material for that study and noted these points:

  1. Repositioning is becoming a standard operating procedure for most search and content processing vendors. Even the giants like Google are trying to find ways to lash their indexing technology to words in hopes of increasing revenue. So wordsmithing is the order of the day. Do these firms have technology that will deliver on the repositioned capability? I am not sure, but I have ample evidence that plain old search is now a commodity. Search does not generate too much excitement among some organizations.
  2. The niches themselves that get attention—customer support, marketers interested in social content, and business intelligence—are in flux. The purpose of customer support is to reduce costs, not put me in touch with an expert who can answer my product question. The social content band wagon is speeding along, but it is unclear if “social media” is useful across a wide swath of business types. Consumer products, yes. Specialty metals, not so much.
  3. A “herd” mentality seems to be operating. Search vendors who once chased “one size fits all” buyers now look at niches. The problem is that some niches like eDiscovery and customer support have quite particular requirements. Consultative selling Endeca-style may be needed, but few search vendors have as many MBA types as Endeca and a handful of other firms. Engineers are not so good at MBA style tailoring, but with staff additions, the gap can be closed, just not overnight. Thus, the herd charges into a sector but there may not be enough grazing to feed everyone.
  4. Significant marketing forces are now at work. You have heard of Watson, I presume. When a company like IBM pushes into search and content processing with a consumer assault, other vendors have to differentiate themselves. Google and Microsoft are also marketing their corporate hearts into the 150 beat per minute range. That type of noise forces smaller vendors to amp up their efforts. The result is the type of shape shifting that made the liquid metal terminator so fascinating. But that was a motion picture. Selling information retrieval is real life.

I am confident that the smaller vendors of search and content processing will be moving through a repositioning cycle. The problem for some firms is that their technology is, at the end of the day, roughly equivalent to Lucene/Solr. This means that unless higher value solutions can be delivered, an open source solution may be good enough. Simply saying that a search and retrieval system can deliver eDiscovery, customer support, business intelligence, medical fraud detection, or knowledge management may not be enough to generate needed revenue.

In fact, I think the hunt for revenue is driving the repositioning. Basic search has crumbled away as a money maker. But key word retrieval backed with some categorization is not what makes a customer support solution or one of the other “search positioning plays” work. Each of these niches has specific needs and incumbents who are going to fight back.

Enterprise search and its many variants remains a fascinating niche to monitor. There are changes afoot which are likely to make the known outfits sweat bullets in an effort to find a way to break through the revenue ceilings that seem to be imposed on many vendors of information retrieval technology. Even Google has a challenge, and it has lots of money and smart people. If Google can’t get off its one trick pony, what’s that imply for search vendors with fewer resources?

It is easy to say one has a solution. It is quite another to deliver that solution to an organization with a very real, very large, and very significant problem.

Stephen E Arnold, February 26, 2011

Anti Search in 2011

November 1, 2010

In a recent meeting, several of the participants were charged with disinformation from the azurini.

You know. Azurini, the consultants.

Some of these were English majors, others former print journalists, and some unemployed search engine optimization experts smoked by Google Instant.

But mostly the azurini emphasize that their core competency is search, content management, or information governance (whatever the heck that means). In a month or so, there will be a flood of trend write ups. When the Roman god looks to his left and right, the signal for prognostication flashes through the fabric covered cube farms.

To get ahead of the azurini, the addled goose wants to identify the trends in anti search for 2011. Yep, anti search. Remember that in a Searcher article several years ago, I asserted that search was dead. No one believed me, of course. Instead of digging into the problems that ranged from hostile users to the financial meltdown of some high profile enterprise search vendors, search was the big deal.

And why not? No one can do a lick of work today unless that person can locate a document or “find” something to jump start activity. In a restaurant, people talk less and commune with their mobile devices. Search is on a par with food, a situation that Maslow would find interesting.

The idea for this write up emerged from a meeting a couple of weeks ago. The attendees were trying to figure out how to enhance an existing enterprise search system in order to improve the productivity of the business. The goal was admirable, but the company was struggling to generate revenues and reduce costs. The talk was about search but the subtext was survival.

The needs for the next generation search system included:

  • A great user experience
  • An iPad app to deliver needed information
  • Seamless access to Web and Intranet information
  • Google-like performance
  • Improved indexing and metatagging
  • Access to database content and unstructured information like email.
