Search Silver Bullets, Elixirs, and Magic Potions: Thinking about Findability in 2012
November 10, 2011
I feel expansive today (November 9, 2011), generous even. My left eye seems to be working at 70 percent capacity. No babies are screaming in the airport waiting area. In fact, I am sitting in a not too sticky seat, enjoying the announcements about keeping pets in their cage and reporting suspicious packages to law enforcement by dialing 250.
I wonder if the mother who left a pink and white plastic bag with a small bunny and box of animal crackers is evil. Much in today’s society is crazy marketing hype and fear mongering.
Whilst thinking about pets in cages and animal crackers which may be laced with rat poison, and plump, fabric bunnies, my thoughts turned to the notion of instant fixes for horribly broken search and content processing systems.
I think it was the association of the failure of societal systems that determined passengers at the gate would allow a pet to run wild or that a stuffed bunny was a threat. My thoughts jumped to the world of search, its crazy marketing pitches, and the satraps who have promoted themselves to “expert in search.” I wanted to capture these ideas, conforming to the precepts of the About section of this free blog. Did I say, “Free”?
A happy quack to http://www.alchemywebsite.com/amcl_astronomical_material02.html for this image of the 21st century azure chip consultant, a self appointed expert in search with a degree in English and a minor in home economics with an emphasis on finger sandwiches.
The Silver Bullets, Garlic Balls, and Eyes of Newts
First, let me list the instant fixes, the silver bullets, the magic potions, the faerie dust, and the alchemy which makes “enterprise search” work today. Fasten your alchemist’s robe, lift your chin, and grab your paper cone. I may rain on your magic potion. Here are 14 magic fixes for a lousy search system. Oh, one more caveat. I am not picking on any one company or approach. The key to this essay is the collection of pixie dust, not a single firm’s blend of baloney, owl feathers, and goat horn.
- Analytics (The kind of equations some of us wrangled and struggled with in Statistics 101 or the more complex predictive methods which, if you know how to make the numerical recipes work, will get you a job at Palantir, Recorded Future, SAS, or one of the other purveyors of wisdom based on big data number crunching)
- Cloud (Most companies in the magic elixir business invoke the cloud. Not even Macbeth’s witches do as good a job with the incantation of Hadoop the Loop as Cloudera, but there are many contenders in this pixie concoction. Amazon comes to mind but A9 gives me a headache when I use A9 to locate a book for my trusty e Reeder.)
- Clustering (Which I associate with Clustify and Vivisimo, but Vivisimo has morphed clustering into “information optimization” and gets a happy quack for this leap)
- Connectors (One cannot search unless one can acquire content. I like the Palantir approach which triggered some push back but I find the morphing of ISYS Search Software a useful touchstone in this potion category)
- Discovery systems (My associative thought process offers up Clearwell Systems and Recommind. I like Recommind, however, because it is so similar to Autonomy’s method and it has been the pivot for the company’s flip flop from law firms to enterprise search and back to eDiscovery in the last 12 or 18 months)
- Federation (I like the approach of Deep Web Technologies and for the record, the company does not position its method as a magical solution, but some federating vendors do so I will mention this concept. Think mash up and data fusion too)
- Natural language processing (My candidate for NLP wonder worker is Oracle which acquired InQuira. InQuira is a success story because it was formed from the components of two antecedent search companies, pitched NLP for customer support, and got acquired by Oracle. Happy stakeholders all.)
- Metatagging (Many candidates here. I nominate the Microsoft SharePoint technology as the silver bullet candidate. SharePoint search offers almost flawless implementation of finding a document by virtue of knowing who wrote it, when, and what file type it is. Amazing. A first of sorts because the method has spawned third party solutions from Austria to the United States.)
- Open source (Hands down I think about IBM. From Content Analytics to the wild and crazy Watson, IBM has open source tattooed over large expanses of its corporate hide. Free? Did I mention free? Think again. IBM did not hit $100 billion in revenue by giving software away.)
- Relationship maps (I have to go with the Inxight Software solution. Not only was the live map an inspiration to every business intelligence and social network analysis vendor, it was cool to drag objects around. Now Inxight is part of Business Objects which is part of SAP, which is an interesting company occupied with reinventing itself and ignoring TREX, a search engine)
- Semantics (I have to mention Google as the poster child for making software know what content is about. I stand by my praise of Ramanathan Guha’s programmable search engine and the somewhat complementary work of Dr. Alon Halevy, both happy Googlers as far as I know. Did I mention that Google has oodles of semantic methods, but the focus is on selling ads and Pandas, which are somewhat related?)
- Sentiment analysis (the winner in the sentiment analysis sector is up for grabs. In terms of reinventing and repositioning, I want to acknowledge Attensity. But when it comes to making lemonade from lemons, check out Lexalytics (now a unit of Infonics). I like the Newssift case, but that is not included in my free blog posts and information about this modest multi-vehicle accident on the UK information highway is harder and harder to find. Alas.)
- Taxonomies (I am a traditionalist, so I quite like the pioneering work of Access Innovations. But firms run by individuals who are not experts in controlled vocabularies, machine assisted indexing, and ANSI compliance have captured the attention of the azure chip, home economics, and self appointed expert crowd. Access Innovations knows its stuff. Some of the boot camp crowd, maybe somewhat less? I read a blog post recently that said librarians are not necessary when one creates an enterprise taxonomy. My how interesting. When we did the ABI/INFORM and Business Dateline controlled vocabularies we used “real” experts and quite a few librarians with experience conceptualizing, developing, refining, and ensuring logical consistency of our word lists. It worked because even the shadow of the original ABI/INFORM still uses some of our terms 30 plus years later. There are so many taxonomy vendors, I will not attempt to highlight others. Even Microsoft signed on with Cognition Technologies to beef up its methods.)
- XML (there are Google and MarkLogic again. XML is now a genuine silver bullet. I thought it was a markup language. Well, not any more, pal.)
Inteltrax: Top Stories, October 31 to November 4
November 7, 2011
Inteltrax, the data fusion and business intelligence information service, captured three key stories germane to search this week, specifically, its impact on businesses and nations around the globe.
A good overview of this topic was our article, “Businesses Prepare for Analytic Bandwagon” http://inteltrax.com/?p=2674 which showed proof that businesses across all industries and sizes are latching onto the power of big data analytics to improve their bottom lines.
More specifically, we saw its impact on a tiny nation in the story, “New Zealand Stepping onto World BI Stage,” http://inteltrax.com/?p=2687 which showed that country’s passion for big data with companies like Right Hemisphere and ComOps.
We issued a firm warning to any business trying to get something for nothing in “Freemium BI Software Not the Total Answer to Analytic Woes,” http://inteltrax.com/?p=2694 which warned that free BI tools are no match for the investment of proven analytic tools.
This is a wide swath of analytic focus, but each story is well worth the attention. Whether it puts a small country on the tech map, offers companies chances to get more competitive, or tempts budgets with worthless freebies, Inteltrax is watching the pulse of the industry to keep readers informed.
Follow the Inteltrax news stream by visiting
Patrick Roland, Editor, Inteltrax
The Perils of Searching in a Hurry
November 1, 2011
I read the Computerworld story “How Google Was Tripped Up by a Bad Search.” I assume that it is pretty close to events as the “real” reporter summarized them.
Let me say that I am not too concerned about the fact that Google was caught in a search trip wire. I am concerned with a larger issue, and one that is quite important as search becomes indexing, facets, knowledge, prediction, and apps. The case reported by Computerworld applies to much of “finding” information today.
Legal matters are rich with examples of big outfits fumbling a procedure or making an error under the pressure of litigation or even contemplating litigation. The Computerworld story describes an email which may be interpreted as having a bright LED to shine on the Java in Android matter. I found this sentence fascinating:
Lindholm’s computer saved nine drafts of the email while he was writing it, Google explained in court filings. Only to the last draft did he add the words “Attorney Work Product,” and only on the version that was sent did he fill out the “to” field, with the names of Rubin and Google in-house attorney Ben Lee.
Ah, the issue of versioning. How many content management experts have ignored this issue in the enterprise? When search systems index, does one want every version indexed or just the “real” version? Oh, what is the “real” version? A person has to investigate and then make a decision. Software and azure chip consultants, governance and content management experts, and busy MBAs and contractors are often too busy to perform this work. Grunt work, I believe, it may be described by some.
What I am considering is the confluence of people who assume “search” works, the lack of time Outlook and iCalendar “priority one” people face, and the reluctance to sit down and work through documents in a thorough manner. This is part of the “problem” with search, and software is not going to resolve the problem quickly, if ever.
Source: http://www.clipartguide.com/_pages/0511-1010-0617-4419.html
What struck me is how people in a hurry, assumptions about search, and legal procedures underscore a number of problems in findability. But the key paragraph in the write up, in my opinion, was:
It’s unclear exactly how the email drafts slipped through the net, and Google and two of its law firms did not reply to requests for comment. In a court filing, Google’s lawyers said their “electronic scanning tools” — which basically perform a search function — failed to catch the documents before they were produced, because the “to” field was blank and Lindholm hadn’t yet added the words “attorney work product.” But documents produced for opposing counsel should normally be reviewed by a person before they go out the door, said Caitlin Murphy, a senior product manager at AccessData, which makes e-discovery tools, and a former attorney herself. It’s a time-consuming process, she said, but it was “a big mistake” for the email to have slipped through.
What did I think when I read this?
First, all the baloney (yep, the right word, folks) about search, facets, metadata, indexing, clustering, governance and analytics underscores something I have been saying for a long, long time. Search is not working as lots of people assume it does. You can substitute “eDiscovery,” “text mining,” or “metatagging” for search. The statement holds water for each.
The algorithms will work within limits but the problem with search has to do with language. Software, no matter how sophisticated, gets fooled with missing data elements, versions, and words themselves. It is high time that the people yapping about how wonderful automated systems are stop and ask themselves this question, “Do I want to go to jail because I assumed a search or content processing system was working?” I know my answer.
Second, in the Computerworld write up, the user’s system dutifully saved multiple versions of the document. Okay, SharePoint lovers, here’s a question for you. Does your search system make clear which antecedent version is which and which document is the best and final version? We know from the Computerworld write up that the Google system did not make this distinction. My point is that the nifty sounding yap about how “findable” a document is remains mostly baloney. Azure chip consultants and investment banks can convince themselves and the widows from whom money is derived that a new search system works wonderfully. I think the version issue makes clear that most search and content processing systems still have problems with multiple instances of documents. Don’t believe me? Go look for the drafts of your last PowerPoint. Now to whom did you email a copy? From whom did you get inputs? Which set of slides were the ones on the laptop you used for the briefing? What is the “correct” version of the presentation? If you cannot answer the question, how will software?
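To make the version problem concrete, here is a minimal sketch of the kind of screening failure the Computerworld story describes. It is not Google's tool or anyone's eDiscovery product. The `Draft` record, the `thread_id` field, the marker list, and the counsel address are all made up for illustration. The first function checks each document on its own, which is how an autosaved draft with a blank “to” field slips out. The second holds every version in a thread for a human to read if any one version looks privileged.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical marker list; a real review team would maintain its own.
PRIVILEGE_MARKERS = ("attorney work product",)


@dataclass
class Draft:
    thread_id: str  # assumed key that ties autosaved drafts together
    version: int
    to: tuple       # recipient addresses; empty while the draft is unsent
    body: str


def looks_privileged(draft, counsel):
    """Naive per-document test: marker text present or counsel is a recipient."""
    text = draft.body.lower()
    has_marker = any(marker in text for marker in PRIVILEGE_MARKERS)
    has_counsel = any(addr in counsel for addr in draft.to)
    return has_marker or has_counsel


def flag_per_document(drafts, counsel):
    """Screen each draft in isolation, the way a plain search filter would."""
    return [d for d in drafts if looks_privileged(d, counsel)]


def flag_per_thread(drafts, counsel):
    """If any version in a thread looks privileged, hold all versions for review."""
    by_thread = defaultdict(list)
    for d in drafts:
        by_thread[d.thread_id].append(d)
    held = []
    for versions in by_thread.values():
        if any(looks_privileged(v, counsel) for v in versions):
            held.extend(versions)
    return held


# Invented sample data: two early autosaves and the version that was sent.
counsel = {"counsel@example.com"}
drafts = [
    Draft("memo-1", 1, (), "We need to negotiate a license."),
    Draft("memo-1", 2, (), "We need to negotiate a license. More detail here."),
    Draft("memo-1", 3, ("counsel@example.com",),
          "Attorney Work Product. We need to negotiate a license."),
]

missed = len(drafts) - len(flag_per_document(drafts, counsel))
held = len(flag_per_thread(drafts, counsel))
```

With this sample data the per-document screen misses the two early drafts, while the per-thread screen holds all three. Note the catch: the second approach only works if something reliable ties the drafts together, and somebody still has to read what gets held. That is the grunt work.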
Inteltrax: Top Stories, October 24 to October 28
October 31, 2011
Inteltrax, the data fusion and business intelligence information service, captured three key stories germane to search this week, specifically, the economic challenges that are realized and overcome thanks to the use of big data and analytics.
The best example of this situation that we found came from our story, “BI’s a Part of Germany’s Strong Economy,” http://inteltrax.com/?p=2647 which showcased the fascinating trend of how one of the few thriving European economies is directly tied to business intelligence and data analytics.
The story, “Analytic Jobs a Possible Economic Solution,” http://inteltrax.com/?p=2652 discussed how analytic work has been steady while other industries dry up. Could data analysis be the fix to sluggish economies?
Another economic staple, the FICO credit score, was magnified in the story, “Pushing 60, FICO Adjusts to Analytics.” http://inteltrax.com/?p=2655 Here, we discovered how the credit giant takes the massive amounts of personal data to streamline its analytic system.
No matter how you slice it, economics is a hot topic these days. We were pleased to discover a positive side to this talk when paired with analytics. We are optimistic about this union in the future and will continue giving it our attention at Inteltrax.
Follow the Inteltrax news stream by visiting
Patrick Roland, Editor, Inteltrax.
October 31, 2011
Datameer Creates Analytics Platform for Hadoop
October 31, 2011
Software development company Datameer has come up with another Hadoop business intelligence play to keep pace with the compounded 40 percent per year growth rate in corporate data volume, with the lion’s share of the growth in unstructured data being produced and consumed.
There are current technical challenges that need to be addressed. Hadoop is pushing out costly analytic databases and warehouses and, in its push forward, has given us yet another crazy acronym: ADBMS. Now Hadoop vendors are keeping the Big Data market in a state of churn.
In the Datameer blog write up “Why I Am at Datameer” Brian Smith discusses a potential solution to this issue. He asserted:
Datameer is the first BI/Analytics platform built natively on Hadoop. On the surface it sounds interesting, but in practice the solution is game-changing. The Datameer Analytic Solution (DAS) connects business users directly with the entire volume and variety of their raw Hadoop data and makes it available for comprehensive analysis.
While Smith’s assertions are certainly interesting, we are not sure who is “first” in many of the assertions about the Big Data world. IBM is chugging away. Digital Reasoning is a player. There are, in fact, dozens of companies making claims and counter-claims. Perhaps in a dicey economy, marketing takes precedence over cold, hard facts?
Jasmine Ashton, October 31, 2011
Sponsored by Pandia.com
Big Data for Big Thinkers
October 31, 2011
“Big data analytics” is an emerging term in the storage industry that originated within the open source community to develop analytics processes that were faster and more scalable than traditional data warehousing.
Open source advocates hope to extract value from the vast amounts of unstructured data produced daily by web users. I recently read an interesting Karmasphere write up called “Big Data IS Different— I Knew It!” in which Rich Guth mused about his past year spent at Karmasphere. In that period, he came to the opinion that Big Data requires different analytic techniques than traditional business intelligence products provide. Guth asserted:
Today we announced version 1.5 of our Karmasphere Analyst product, a workspace for performing Big Data Analytics. It implements a new workflow for data analysts to mine and analyze Big Data. We also released a whitepaper “Deriving Intelligence from Big Data in Hadoop – A Big Data Analytics Primer” that describes this workflow, discusses why this workflow is necessary and compares it to traditional BI and data warehousing approaches.
The challenge is to make clear exactly which “old methods” will not work and which “new methods” will work. As important, how does a person using a system with new Big Data methods determine if the outputs are accurate? Who wants to make a decision only to find out that the underlying set up of the new methods was off the mark? Most business intelligence professionals don’t know when an old and well worn method is delivering accurate outputs. Toss in a snappy graphic and the disconnect may become significant.
Jasmine Ashton, October 31, 2011
Sponsored by Pandia.com
Software and Smart Content
October 30, 2011
I was moving data from Point A to Point B yesterday, filtering junk that has marginal value. I scanned a news story from a Web site which covers information technology with a Canadian perspective. The story was “IBM, Yahoo turn to Montreal’s NStein to Test Search Tool.” In 2006, IBM was a pace-setter in search development cost control. The company was relying on the open source community’s Lucene technology, not the wild and crazy innovations from Almaden and other IBM research facilities. Web Fountain and jazzy XML methods were promising ways to make dumb content smart, but IBM needed a way to deliver the bread-and-butter findability at a sustainable, acceptable cost. The result was OmniFind. I had made a note to myself that we tested the Yahoo OmniFind edition when it became available and noted:
Installation was fine on the IBM server. Indexing seemed sluggish. Basic search functions generated a laundry list of documents. Ho hum.
Maybe this comment was unfair, but five years ago, there were arguably better search and retrieval systems. I was in the midst of the third edition of the Enterprise Search Report, long since bastardized by the azure chip crowd and the “real” experts. But we had a test corpus, lots of hardware, and an interest in seeing for ourselves how tough it was to get an enterprise search system up and running. Our impression was that most people would slam in the system, skip the fancy stuff, and move on to more interesting things such as playing Foosball.
Thanks to Adobe for making software that creates a need for Photoshop training. Source: http://www.practical-photoshop.com/PS2/pages/assign.html
Smart, Intelligent… Information?
In this blast from the past article, NStein’s product in 2006 was “an intelligent content management product used by media companies such as Time Magazine and the BBC, and a text mining tool called NServer.” The idea was to use search plus a value adding system to improve the enterprise user’s search experience.
Now the use of the word “intelligent” to describe a content processing system has a long history, reaching back through the decades to computer aided logistics and forward to the Extensible Markup Language methods.
The idea of “intelligent” is a pregnant one, with a gestation period measured in decades.
Flash forward to the present. IBM markets OmniFind and a range of products which provide basic search as a utility function. NStein is a unit of OpenText, and it has been absorbed into a conglomerate with a number of search systems. The investment needed to update, enhance, and extend BASIS, BRS Search, NStein, and the other systems OpenText “sells” is a big number. “Intelligent content” has not been an OpenText buzzword for a couple of years.
The torch has been passed to conference organizers and a company called Thoora, which “combines aggregation, curation, and search for personalized news streams.” You can get some basic information in the TechCrunch article “Thoora Releases Intelligent Content Discovery Engine to the Public.”
In two separate teleconference calls last week (October 24 to 28, 2011), “intelligent content” came up. In one call, the firm was explaining that traditional indexing systems missed important nuances. By processing a wide range of content and querying a proprietary index of the content, the information derived from the content would be more findable. When a document was accessed, the content was “intelligent”; that is, the document contained value added indexing.
The second call focused on the importance of analytics. The content processing system would ingest a wide range of unstructured data, identify items of interest such as the name of a company, and use advanced analytics to make relationships and other important facets of the content visible. The documents were decomposed into components, and each of the components was “smart”. Again the idea is that the fact or component of information was related to the original document and to the processed corpus of information.
No problem.
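For readers who want to see what the second pitch boils down to at its simplest, here is a toy sketch. It is not either vendor's method. The watch list and the sample documents are invented, and plain substring matching stands in for whatever “advanced analytics” the firm actually runs. Each document is tagged with the company names it mentions, and the tags are counted in pairs so that a relationship between two names points back to the documents that produced it.

```python
from collections import Counter
from itertools import combinations

# Hypothetical watch list; a real system would use an entity extractor.
KNOWN_COMPANIES = ("IBM", "OpenText", "NStein", "Yahoo")


def tag_companies(text):
    """Return the watch-list names found in the text, sorted for stable pairing."""
    return sorted(name for name in KNOWN_COMPANIES if name in text)


def build_index(docs):
    """docs maps doc_id to text. Returns (tags per document, pair counts)."""
    tags = {doc_id: tag_companies(text) for doc_id, text in docs.items()}
    pairs = Counter()
    for names in tags.values():
        # Every pair of names in one document counts as one co-occurrence.
        pairs.update(combinations(names, 2))
    return tags, pairs


# Invented sample corpus.
docs = {
    "a": "IBM and Yahoo test a search tool from NStein.",
    "b": "NStein is a unit of OpenText.",
    "c": "IBM markets OmniFind.",
}
tags, pairs = build_index(docs)
```

The hard parts are exactly what this sketch skips: names spelled three ways, names that are also ordinary words, and deciding whether two names in one document are related at all.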
Shift in Search
We are witnessing another one of those abrupt shifts in enterprise search. Here’s my working hypothesis. (If you harbor a life long love of marketing baloney, quit reading because I am gunning for this pressure point.)
First, let’s face it. Enterprise search is just not revving the engines of the people in information technology or the chief financial officer’s office. Money pumped into search typically generates a large number of user complaints, security issues, and cost spikes. As content volume goes up, so do costs. The enterprise is not Google-land, and money is limited. The content is quite complex, and who wants to try and crack 1990s technology against the nut of 21st century data flows? Not I. So something hotter is needed.
Second, the hottest trends in “search” have nothing to do with search whatsoever. One example is conflating the interface with precision and recall. Sorry. Does not compute for me. The other angle is “mobile.” Sure, search will work when everything is monitored and “smart” software provides what a statistically appropriate method suggests will work “most” of the time. There is also the baloney about apps, which is little more than the gamification of what in many cases might better be served with a system that makes the user confront actual data, not an abstraction of data. What this means is that people are looking for a way to provide information access without having to grunt around in the messy innards of editorial policies, precision, recall, and other tasks that are intellectually rigorous in a way that Angry Birds interfaces for business intelligence are not.
Third, companies engaged in content access are struggling for revenue. Sure, the best of the search vendors have been purchased by larger technology companies. These acquisitions guarantee three things.
- The Wild West spirit of the innovative content processing vendors is essentially going to be stamped out. Creativity will be herded into the corporate killing pens, and the “team” will be rendered as meat products for a technology McDonald’s
- The cash sink holes that search vendors’ research programs were will be filled with procedure manuals and forms. There is no money for blue sky problem solving to crack the tough problems in information retrieval at a Fortune 1000 company. Cash can be better spent on things that may actually generate a return. After all, if the search vendors were so smart, why did most companies hit revenue ceilings and have to turn to acquisitions to generate growth? For firms unable to grow revenues, some just fiddled the books. Others had to get injections of cash like a senior citizen in the last six months of life in a care facility. So acquired companies are not likely to be hot beds of innovation.
- The pricing mechanisms which search vendors have so cleverly hidden, obfuscated, and complexified will be tossed out the window. When a technology is a utility, then giant corporations will incorporate some of the technology in other products to make a sale.
What we have, therefore, is a search marketplace where the most visible and arguably successful companies have been acquired. The companies still in the marketplace now have to market like the Dickens and figure out how to cope with free open source solutions and giant acquirers who will just give away search technology.
Inteltrax: Top Stories, October 17 to October 21
October 24, 2011
Inteltrax, the data fusion and business intelligence information service, captured three key stories germane to search this week, specifically, the ups and downs for some of the industry’s biggest names.
Those in the know about cloud computing were surprised to see our story, “Amazon Analytics Experiences Setbacks,” http://inteltrax.com/?p=2591 since the book and cloud giant’s analytics offerings aren’t taking off like its Kindle.
On the upswing, we offered “Jaspersoft Climbing the BI Competition Ladder” http://inteltrax.com/?p=2595 detailing how one of our favorite BI vendors has made some bold moves pay off recently.
Back on the negative side of the spectrum, “Google Analytics Gets Weaker in Germany” http://inteltrax.com/?p=2588 explained how tough data mining laws are keeping the search king from knowing too much about Germany’s users.
This is just a taste of the news we deliver. There’s never any telling from day-to-day when a major player will suffer a blow and when a little guy will climb higher. Sometimes vice versa. So we watch the big data game like a hawk, showing all sides of the story to give readers a full view of the roller coaster ride.
Follow the Inteltrax news stream by visiting http://www.inteltrax.com/
Patrick Roland, Editor, Inteltrax.
October 24, 2011
Make Case-Based Approximate Reasoning a Reality
October 23, 2011
I stumbled across an interesting book on Amazon.com that has received a great deal of attention over the past few years. The book is called Case-Based Approximate Reasoning (CBR) by Eyke Hullermeier.
CBR has established itself as a core methodology in the field of artificial intelligence. The key idea of CBR is to tackle new problems by referring to similar problems that have already been solved in the past. One reviewer wrote:
In the last years developments were very successful that have been based on the general concept of case-based reasoning. … will get a lot of attention and for a good while will be the reference for many applications and further research. … the book can be used as an excellent guideline for the implementation of problem-solving programs, but also for courses in Artificial and Computational Intelligence. Everybody who is involved in research, development and teaching in Artificial Intelligence will get something out of it.
The problem with CBR can be the time, effort, and cost required to create and maintain the rules. Automated systems work well if the inputs do not change. Flip in some human unpredictability and the CBR system can require baby sitting.
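The key idea, reusing the solution of the most similar past problem, fits in a few lines. What follows is a minimal sketch and not the method from the book. Jaccard overlap on keyword sets is swapped in as a simple stand-in for the similarity measures a real CBR system would use, and the case base is invented for illustration.

```python
def similarity(a, b):
    """Jaccard overlap between two keyword sets (a stand-in similarity measure)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def retrieve(case_base, problem, k=1):
    """Return the k stored cases most similar to the new problem."""
    ranked = sorted(
        case_base,
        key=lambda case: similarity(case["features"], problem),
        reverse=True,
    )
    return ranked[:k]


# Invented case base: past search problems and what fixed them.
case_base = [
    {"features": {"slow", "indexing", "large-files"},
     "solution": "add crawler threads"},
    {"features": {"missing", "results", "permissions"},
     "solution": "re-sync access control lists"},
    {"features": {"duplicate", "results", "versions"},
     "solution": "collapse drafts by document id"},
]

new_problem = {"duplicate", "versions", "email"}
best = retrieve(case_base, new_problem)[0]
```

Here the new problem shares two of four distinct keywords with the third case, so that case's solution is proposed. The maintenance burden mentioned above shows up right away: someone has to keep the cases and their feature descriptions current, or the nearest match stops being a useful one.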
Jasmine Ashton, October 23, 2011
Sponsored by Pandia.com
Baseball Embraces SAS Analytics
October 20, 2011
Baseball as an institution is known for its love of numbers. Now it’s embracing analytics. KDNuggets reports more in, “Pittsburgh Pirates tap SAS Analytics.”
The article explains the use of statistics and analytics:
As ‘Moneyball’ has become a valued statistical approach to selecting talent, teams such as the Pittsburgh Pirates are also embracing analytics to improve operations and marketing and build stronger relationships with fans. Using SAS Visual Data Discovery, the Pirates surface a treasure trove of fan insights. The point-and-click interface gives quick entry to advanced analytics from SAS, the leader in business analytics.
The Pirates had previously used Microsoft Excel, but it’s widely known that the application of such flat data is challenging. SAS will now allow the club to analyze everything from attendance to marketing to statistics. Now to get back to that business of actually winning some games . . .
Keep in mind that SAS now has the Teragram text processing technology. You can put words with your numbers.
Emily Rae Aldridge, October 20, 2011
Sponsored by Pandia.com