Innovation: Not Slowing, Stopped in Search

October 11, 2015

I read “You Call this Progress?” Good article.

The write up points out that some fans of progress may be annoyed at the notion that today’s whiz kids are painting rooms, not building houses. I highlighted this statement:

I think we should admit that our hypothetical 1885 person would be more bewildered by the passage of 65 years than the 1950 “modern” human. I think we should admit that the breathtaking pace of major breakthroughs has actually declined.

I am in agreement with this pace of innovation thing. Years ago when I was working in the thrilling field of big time investing, I remember a discussion among some of my colleagues. The point of the argument was the notion that innovation was ripping right along. I suggested that innovation was more like breeding hamsters than creating a new order of furry friends.

No one at the financial outfit really cared. The focus was making money on things that would create investment opportunities. Do not confuse making money with creating a jet engine.

I can offer one possible market sector which illustrates that innovation has slowed to a crawl in the last 50 years.

Consider search and retrieval. If we look at the early systems, there were things like string matching and counting stuff. There was lemmatization. In fact, most of today’s search and content processing companies are not offering products dramatically different from what was available decades ago.
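To make the point concrete, here is a minimal Python sketch of that decades-old recipe. It tokenizes text, applies a crude suffix-stripping stand-in for lemmatization, counts terms per document in an inverted index, and ranks documents by those counts. The suffix list and the function names are illustrative assumptions, not any vendor's code.

```python
import re
from collections import Counter, defaultdict

# Hypothetical suffix list: a crude stand-in for lemmatization.
# Real lemmatizers use dictionaries and part-of-speech information.
SUFFIXES = ("ing", "ed", "es", "s")


def normalize(token: str) -> str:
    """Lowercase a token and strip one common suffix if enough stem remains."""
    token = token.lower()
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def tokenize(text: str) -> list[str]:
    """String matching at its simplest: runs of letters become terms."""
    return [normalize(t) for t in re.findall(r"[A-Za-z]+", text)]


def build_index(docs: dict[str, str]) -> dict[str, Counter]:
    """Inverted index: each normalized term maps to {doc_id: occurrence count}."""
    index: dict[str, Counter] = defaultdict(Counter)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term][doc_id] += 1
    return index


def search(index: dict[str, Counter], query: str) -> list[tuple[str, int]]:
    """Counting stuff: score each document by summed matches, best first."""
    scores: Counter = Counter()
    for term in tokenize(query):
        scores.update(index.get(term, Counter()))
    return scores.most_common()


docs = {
    "d1": "Indexing and searching documents",
    "d2": "The searcher indexed one document",
    "d3": "Pizza menus",
}
index = build_index(docs)
print(search(index, "searched indexes"))  # [('d1', 2), ('d2', 1)]
```

Swap in faster hardware and a graphical interface, and the core loop stays the same.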

Sure, today’s products sport graphical interfaces, exploit fast and cheap hardware, and use coding methods which allow a stream of content to be moved through an information factory. These are interesting developments, but the underlying procedures remain the same: look for strings, alert software or a person when an anomaly occurs, and convert counts for an entity into a graph. Toss in some geo coordinates and ring investors’ doorbells.
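Here is a minimal Python sketch of those three procedures, offered under stated assumptions. The watch list, the "mean plus k standard deviations" alert rule, and all function names are hypothetical illustrations, not any product's logic.

```python
from collections import Counter
from itertools import combinations
from statistics import mean, pstdev

# Hypothetical watch list; real systems use entity extractors, not literal strings.
WATCH_LIST = ("Acme", "Globex", "Initech")


def find_entities(text: str) -> set[str]:
    """Look for strings: which watch-list names appear in the text?"""
    return {name for name in WATCH_LIST if name in text}


def daily_counts(docs_by_day: list[list[str]], entity: str) -> list[int]:
    """Count, per day, the documents that mention one entity."""
    return [sum(entity in find_entities(doc) for doc in day) for day in docs_by_day]


def is_anomaly(history: list[int], today: int, k: float = 3.0) -> bool:
    """Alert when today's count exceeds mean + k standard deviations (assumed rule)."""
    return today > mean(history) + k * pstdev(history)


def cooccurrence_graph(docs: list[str]) -> Counter:
    """Turn counts into a graph: edge weight = documents naming both entities."""
    edges: Counter = Counter()
    for doc in docs:
        for a, b in combinations(sorted(find_entities(doc)), 2):
            edges[(a, b)] += 1
    return edges


days = [["Acme ships"], ["Acme hires", "Globex"], ["Acme"], ["Acme"] * 9]
counts = daily_counts(days, "Acme")          # [1, 1, 1, 9]
print(is_anomaly(counts[:-1], counts[-1]))   # True
print(cooccurrence_graph(["Acme sues Globex", "Globex and Acme settle"]))
# Counter({('Acme', 'Globex'): 2})
```

With a perfectly flat history, the standard deviation is zero, so any increase trips the alert. A production system would add a floor to the threshold.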

The problem with search is that humans often have a tough time expressing exactly what they want. That’s why looking at search histories and asking questions like, “What’s the signal for a person’s looking for a pizza joint?” work pretty well.

Also, humans may not know what the heck they want. Armed with partial or incomplete information, the poor human has to look through books from a library or browse a list of colorful icons until something appears meaningful.

I would suggest that once the marketing hoo-hah is stripped from the descriptions of search and retrieval systems, what’s left over reveals the paucity of innovation in information access. Google’s biggest recent search innovation is providing a pointer to content available within an app. Interesting but not discovering penicillin.

Perhaps that’s why it is easier to ask friends or colleagues than use Fancy Dan tools?

Stephen E Arnold, October 11, 2015

Attivio Does Data Dexterity

October 9, 2015

Enterprise search company Attivio has an interesting post in their Data Dexterity Blog titled “3 Questions for the CEO.” We tend to keep a close eye on industry leader Attivio, and for good reason. In this post, the company’s senior director of product marketing Jane Zupan posed a few questions to her CEO, Stephen Baker, about their role in the enterprise search market. Her first question has Baker explaining his vision for the field’s future, “search-based data discovery”; he states:

“With search-based data discovery, you would simply type a question in your natural language like you do when you perform a search in Google and get an answer. This type of search doesn’t require a visualization tool. So, for example, you could ask a question like ‘tell me what type of weather conditions which exist most of the time when I see a reduction in productivity in my oil wells.’ The answer that comes back, such as ‘snow,’ or ‘sleet,’ gives you insights into how weather patterns affect productivity. Right now, search can’t infer what a question means. They match the words in a query, or keywords, with words in a document. But [research firm] Gartner says that there is an increasing importance for an interface in BI tools that extend BI content creation, analysis and data discovery to non-skilled users. You don’t need to be familiar with the data or be a business analyst or data scientist. You can be anyone and simply ask a question in your words and have the search engine deliver the relevant set of documents.”

Yes, many of us are looking forward to that day. Will Attivio be the first to deliver? The interview goes on to discuss the meaning of the company’s slogan, “the data dexterity company.” Part of the answer involves gaining access to “dark data” buried within organizations’ data silos. Finally, Zupan asks what “sets Attivio apart?” Baker’s answers: the ability to quickly access data from more sources; deriving structure from and analyzing unstructured data; and friendliness to “non-technical” users.

Launched in 2008, Attivio is headquartered in Newton, Massachusetts. Their team includes folks with an advantageous combination of backgrounds: in search, database, and business intelligence companies.

Cynthia Murrell, October 9, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

 

Another Categorical Affirmative: Nobody Wants to Invest in Search

October 8, 2015

Gentle readers, I read “Autonomy Poisoned the Well for Businesses Seeking VC Cash.” Keep in mind that I am capturing information which appeared in a UK publication. I find this type of essay interesting and entertaining. Will you? Beats me. One thing is certain. This topic will not be fodder for the LinkedIn discussion groups, for the marketers hawking search and retrieval at conferences to several dozen fellow travelers, or for consultant reports promoting the almost unknown laborers in the information access vineyards.

Why not?

The problem with search reaches back a few years, but I will add a bit of historical commentary after I highlight what strikes me as the main point of the write up:

Nobody wants to invest in enterprise search, says startup head. Patrick White, Synata

Many enterprise search systems are a bit like the USS United States, once the slickest ocean liner in the world. The ship looks like a ship, but making it seaworthy is going to be a project with a hefty price tag. Implementing an enterprise search solution is a similar ocean-going effort.

There you go. “Nobody.” A categorical in the “category” of logic, like “All men are mortal.” Remarkable, because outfits like Attivio, Coveo, and Digital Reasoning, among others, have received hefty injections of venture capital in recent memory.

The write up makes this interesting point:

“I think Autonomy really messed up [the space]”, and when investors hear ‘enterprise search for the cloud’ it “scares the crap out of them”, he added. “Autonomy has poisoned the well for search companies.” However, White added that Autonomy was just the most high profile example of cases that have scared off investors. “It is unfair just to blame Autonomy. Most VCs have at least one enterprise search in their portfolio. So VCs tend to be skittish about it,” he added.

I am not sure I agree. Before there was Autonomy, there was Fulcrum Technologies. The company’s marketing literature is as fresh today as it was in the 1990s. The company was up, down, bought, and merged. The story of Fulcrum, at least up to 2009 or so, is available at this link.

The hot and cold nature of search and content processing may be traced through the adventures of Convera (formerly Excalibur Technologies) and its relationships with Intel and the NBA, Delphes (a Canadian flame out), Entopia (a we-can-do-it-all outfit), and, of course, Fast Search & Transfer.

Now Fast Search, like most old school search technology, is very much with us. For a dose of excitement one can have Search Technologies (founded by some Convera wizards) implement Fast Search (now owned by Microsoft).

Where Are the Former Big Six Enterprise Search Vendors: 2004 and 2015

Autonomy, now owned by HP and mired in litigation over allegations of financial fraud

Convera, after struggles with Intel and NBA engagements, portions of the company were sold off. Essentially out of business. Alums are consultants.

Endeca, owned by Oracle and sold as an eCommerce and business intelligence service. Oracle gives away its own enterprise search system.

Exalead, owned by Dassault Systèmes and now marketed as a product component system. No visibility in the US.

Fast Search, owned by Microsoft and still available as a utility for SharePoint. The technology dates from the late 1990s. The brand keeps a low profile at this time.

Verity, purchased by Autonomy, which used Verity’s customer list for upselling and folded the K2 technology into the sprawling IDOL suite.

Fast Search reported revenues which after an investigation and court procedure were found to be a bit enthusiastic. The founder of Fast Search was the subject of the Norwegian authorities’ attention. You can check out the news reports about the prohibition on work and the sentence handed down for the issues the authorities concluded warranted a slap on the wrist and a tap on the head.

The story of enterprise search has been one of efforts, sometimes Herculean, to sell information access companies. When a company like Vivisimo sells for about one year’s revenues, an estimated $20 million, there is a sense of getting that mythic task accomplished. IBM, like most of the other acquirers of search technology, tries valiantly to convert a utility into something with revenue lift. As I watch the evolution of the lucky exits, my overall impression is that the purchasers realize that search is a utility function. Search can generate consulting and engineering fees, but the customers want more.

That realization leads to the wild and crazy hyper marketing for products like Hewlett Packard’s cloud version of Autonomy’s IDOL and DRE technology or IBM’s embrace of open source search and the wisdom of wrapping that core with functions.

Enterprise search, therefore, is alive and well within applications or solutions that are more directly related to something that speaks to senior managers; namely, making sales and reducing costs.

What’s the cost of making sure the controls for an enterprise search system are working and doing the job the licensee wants done?

The problem is the credit card debt load which Googlers explained quite clearly. Technology outfits, particularly information access players, need more money than most firms can generate. This contributes to the crazy flips: from search to police analysis, from looking up an entry in a database to an assertion that customer support is enabled, from hunting for an article in this blog to real time, active business intelligence, and from indexing by proper noun like White House to natural language understanding of unstructured text.

Investments are flowing to firms which could be easily positioned as old school search and retrieval operations. Consider Lexmark, a former unit of IBM, and an employer of note not far from my pond filled with mine run off in Kentucky. The company, like Hewlett Packard, wants to find a way to replace its traditional business which was not working as planned as a unit of IBM. Lexmark bought Brainware, a company with patents on trigram methods and a good business for processing content related to legal matters. Lexmark is doing its best to make that into a Trump scale back office content processing business. Lexmark then bought a technology dating from the 1980s (ISYS Search Software once officed in Crow’s Nest I believe) and has made search a cornerstone of the Lexmark next generation health care money spinning machine. Oracle has a number of search properties. Most of these are unknown to Oracle DBAs; for example, Artificial Linguistics, TripleHop, InQuira’s shotgun NLP technology, etc. The point is that the “brands” have not had enough magnetism to pull revenues on a stand alone basis.
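The write up does not say what Brainware’s patented trigram methods do. As a rough illustration of what trigram matching generally means, and explicitly not Brainware’s approach, here is a minimal Python sketch. It scores fuzzy matches, such as a misspelled or badly scanned name, by how many three-character slices two strings share. The padding scheme and the function names are assumptions.

```python
def trigrams(text: str) -> set[str]:
    """Three-character slices of a lowercased string, padded so edges count too."""
    padded = f"  {text.lower()} "
    return {padded[i : i + 3] for i in range(len(padded) - 2)}


def trigram_similarity(a: str, b: str) -> float:
    """Jaccard overlap of the two trigram sets: 1.0 identical, 0.0 nothing shared."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


def best_match(query: str, candidates: list[str]) -> tuple[str, float]:
    """Return the candidate whose trigrams overlap the query most."""
    return max(((c, trigram_similarity(query, c)) for c in candidates),
               key=lambda pair: pair[1])


# A transposition typo still lands on the right name.
print(best_match("Lexmrak", ["Lexmark", "Brainware", "Oracle"]))
# ('Lexmark', 0.3333333333333333)
```

Because matching works on fragments rather than whole words, the method tolerates typos and scanning errors that defeat exact string matching.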

Success measured in investment dollars is not revenue. Palantir is, in effect, a search and retrieval outfit packaged as a super stealthy smart intelligence system. Recorded Future, funded by Google and In-Q-Tel, is doing a bang up job with specialized content processing. These are, remember, search and retrieval companies.

The money in search appears to be made in these plays:

  • The Fast Search model. Short cuts until an investigator puts a stop to the activities.
  • Creating a company and then selling it to a larger firm with a firm conviction that it can turn search into a big time money machine
  • Buying a search vendor to get its customers and opportunities to sell other enterprise software to those customers
  • Creating a super technology play and going after venture funding until a convenient time arrives to cash out
  • Pursuing a dream for intelligent software and surviving on research grants.

This list does not exhaust what is possible. There are me-too plays. There are mobile niche plays. There are apps which are thinly disguised selective dissemination of information services.

The point is that Autonomy is a member of the search and retrieval club. The company’s revenues came from two principal sources:

  1. Autonomy bought companies like Verity and video indexing and management vendor Virage, then sold other products to these firms’ clients and incorporated some of the acquired technology into products and services which allowed Autonomy to enter a new market. Remember Autonomy and enhanced video ads?
  2. Autonomy managed well. If one takes the time to speak with former Autonomy sales professionals, the message is that life was demanding. Sales professionals including partners had to produce revenue or some face time with the delightful Dr. Michael Lynch or other senior Autonomy executives was arranged.

That’s it. Upselling and intense management for revenues. Hewlett Packard was surprised at the simplicity of the Autonomy model and apparently uncomfortable with the management policies and procedures that Autonomy had been using in highly visible activities for more than a decade as a publicly traded company.

Perhaps some sources of funding will disagree with my view of Autonomy. That is definitely okay. I am retired. My house is paid for. I have no charming children in a private school or university.

The focus should be on what the method for generating revenue is. The technology is of secondary importance. When IBM uses “good enough” open source search, there is a message there, gentle reader. Why reinvent the wheel?

The trick is to ask the right questions. If one does not ask the right questions, the person doing the querying is likely to draw incorrect conclusions and make mistakes. Where does the responsibility rest when one makes a bad decision?

The other point of interest should be making sales. Stated in different terms, the key question for a search vendor, regardless of camouflage, is: “What problem are you solving?” Then ask, “Will people pay money for this solution?”

If the search vendor cannot or will not answer these questions and provide data to be verified, the questioner runs the risk of taking the USS United States for a cruise as soon as the ship has been refurbished, made seaworthy, and crewed.

The enterprise search sector is guilty of making a utility function appear to be a solution to business uncertainty. Why? To make sales. Caveat emptor.

Stephen E Arnold, October 8, 2015

IBM Defines Information Access the Madison Avenue Way

October 7, 2015

Yesterday (October 6, 2015) I wrote a little dialogue about the positioning of IBM as the cognitive computing company. After the story appeared, I had a lively lunch discussion about my suggestion that IBM was making a grandstand play influenced by Madison Avenue thinking, not by the nuts and bolts realities of making sales and generating revenue.

Well, let’s let IBM rejiggle the line items in its financial statements. That should allow the critics of the company to see that Watson (which is the new IBM) accounts for IBM revenues. I am okay with that, but for me, the important numbers are the top line revenue and profit. Hey, call me old fashioned.

In the midst of the Gartner talk about IBM, the CNBC exclusive with IBM’s Big Blue dog (maybe just like the Gartner talk and thus not really “exclusive”?), and the wallpaper-scale ads in the New York Times and Wall Street Journal, there was something important. I don’t think IBM recognizes what it has done for the drifting, financially challenged, and incredibly fragmented search and content processing market. Even the LinkedIn enterprise search discussion group, which bristles when I quote Latin phrases to its members, will be revivified.


Indexing and grouping are useful functions. When they are applied with judgment, an earthworm of unrelated words and phrases may communicate more effectively.

To wit, this is IBM’s definition of Watson which is search based on Lucene, home brew code, and IBM acquisitions’ software:

Author extraction—Lots of “extraction” functions
Concept expansion
Concept insights—I am not sure I understand the concept functions
Concept tagging—Another concept function
Dialog—Part of NLP maybe
Entity extraction—Extraction
Face detection with the charming acronym F****d—Were the Mad Ave folks having a bit of fun?
Feed detection—Aha, image related
Image Link extraction—Aha, keeping track of urls
Image tagging—Aha, image indexing. I wonder if this is recognition or using information in the file or a caption
Keyword extraction
Language detection
Language translation
Message resonance—No clue here in Harrod’s Creek
Natural language classifier—NLP again
Personality insights—Maybe figuring out what the personality of the author of a processed file means?
Question and answer (I think this is natural language processing which incorporates many other functions in this list)—More NLP
Relationship extraction—IBM has technology from its purchase of i2 which performs this function. How does this work on disparate streams of unstructured content? I have some thoughts
Review and rank—Does this mean relevance ranking?
Sentiment analysis—Yes, is a document with the word F****d in it positive or negative?
Speech to text—Seems similar to text to speech
Taxonomy—Ah, ha. A system to generate a list of controlled terms. No humans needed? Nah, humans can be billable and it is an IBM function
Text extraction—Another extraction function
Text to speech
Tone analyzer—So what is the tone of a document containing the string F****d?
Tradeoff analytics—Hmm. Now Watson is doing a type of analytics presumably performed on text? What are the thresholds in the numerical recipe? Do the outputs make sense to a normal human?
Visual recognition—Baffler
Watson news—Is this news about Watson or news presented in Watson via a feed-type mechanism. Phrase does not even sound cool to me.

Now that’s a heck of a list. Notice that the word “search” does not appear in the list. I did not spot the word “semantics” either. Perhaps I was asleep at the switch.

When I was in freshman biology class in 1962, Dr. Daphne Swartz, a very traditional cut ‘em up and study ‘em scientist, lectured for 90 minutes about classification. I remember learning about Aristotle and his dividing organisms into two groups: plants and animals. I know this is rocket science, but bear with me. There was the charmingly named Carolus Linnaeus, a fan of herring I believe, who cooked up the kingdom, genus, species thing. Then there was, much later, the wild and crazy library crowd which spawned Dewey or, as I named him, Mr. Decimal.

Why is this germane?

It seems to me that IBM’s list of Watson functions needs a bit of organization. In fact, some of the items appear to belong to other items; for example: language detection and language translation. More egregious is the broad concept of natural language processing. One could, if one were motivated, argue that entity extraction, text extraction, and keyword extraction might look similar to a non-Watsonian intellect. Dr. Swartz would probably have some constructive criticism to offer.

What’s the purpose of this earthworm list?

Beats me. Makes IBM Watson seem more than Lucene with add ons?

Stephen E Arnold, October 7, 2015

Legacy Servers: Upgrade Excitement

October 2, 2015

Enterprise content management (ECM) systems were supposed to provide an end-all solution for storing and organizing digital data. Data needs to be stored for several purposes: taxes, historical record, research, and audits. Government agencies deployed ECM solutions to manage their huge data loads, but the old information silos are not performing up to modern standards. GCN discusses the challenges government agencies face in upgrading their systems in “Migrating Your Legacy ECM Solution.”

When ECMs first came online, information was stored in silos programmed to support even older legacy solutions with niche applications. The repositories are so convoluted that users cannot find information, let alone upgrade the beasts:

“Aging ECM systems are incapable of fitting into the new world of consumer-friendly software that both employees and citizens expect.  Yet, modernizing legacy systems raises issues of security, cost, governance and complexity of business rules  — all obstacles to a smooth transition.  Further, legacy systems simply cannot keep up with the demands of today’s dynamic workforce.”

Two solutions present themselves: migrate data from the old legacy system to a new one, or simply move the content out of the silo. The barriers are cost and time, but users will reap the benefits of upgrades, especially connectivity, cloud, mobile, and social features. There is also the possibility of leaving the content in place and using interoperability standards or cloud-based management to make the data searchable and accessible.

The biggest problem is actually convincing people to upgrade.  Why fix what is not broken?  Then there is the justification of using taxpayers’ money for the upgrade when the money can be used elsewhere.  Round and round the argument goes.

Whitney Grace, October 2, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

 

The HP Autonomy Enterprise Search Epic Continues

September 26, 2015

I don’t play baseball anymore. I did. I was okay, but one of the fellows who lived in my neighborhood in central Illinois played very well. He played every day. After a stellar high school career, he became a fielder in the major leagues. The pressure was too much. He made bad decisions. He tried to claw back into the starting lineup. Instead of swinging with the relaxed, fluid motion I recalled from our days of playing together, he tried to hit a home run every time at bat. His confidence dwindled away, and he became a person who did not perform. Last I heard, he had fallen victim to his inner demons and was searching for a panacea. But, in my opinion, he struck out. Bad management.

Definition of panacea:

noun 1. a remedy for all disease or ills; cure-all. 2. an answer or solution for all problems or difficulties:

I thought about this person when I read “Deal Divided H-P Leaders” in the September 26, 2015, Wall Street Journal. You may need to pay to access this article, which is also available online as “Hewlett Packard’s Then Chairman Ray Lane Tried to Quash Autonomy Acquisition.”

The main point of the write up is that HP wanted a panacea, and the senior management of HP thought Autonomy, a search and content processing company, was the answer to HP’s revenue challenges.

The Wall Street Journal points out that the Chairman of the Board of Directors was supportive of the multi billion dollar deal and then wanted to kill the deal.

Also, the WSJ identifies what I would call a “management” problem; to wit:

HP missed other red flags in assessing the Autonomy deal. In 2013, the Journal reported that outside auditors for Autonomy had noted that an Autonomy executive had alleged improper accounting practices at the company [Autonomy]. However, HP executives briefed on the allegations hadn’t passed them along to HP’s Board or to Mr. Apotheker [president and Autonomy deal supporter].

The Wall Street Journal article includes a point I made in my 2003 analysis of Autonomy, a version of which appeared in the first edition of the Enterprise Search Report.

Revenues from software which allows employees to locate information germane to work activities have for decades faced a major hurdle; namely, making sales and keeping customers. The problem, which persists today, is that enterprise search vendors have a tough time making basic keyword search command the type of license fees and corporate commitment which enterprise resource planning, accounting, and compliance-related systems demand.

Enterprise search vendors have, again for decades, explained that search and retrieval was something more than finding a needed document. The buzzwords used for decades invoke “knowledge management,” “business intelligence,” and “customer support.” Each of these is baloney, but enterprise search vendors were trapped. Making search work in the fast changing content environments in which organizations operate was a tough technical problem. The costs of engineering fixes were uncontrollable, and, not surprisingly, enterprise search vendors layered on additional functions in an effort to make sales, charge more, and stay in business.

Autonomy, along with IBM and OpenText, grew search via acquisition. Autonomy was perhaps the most successful of the roll up tacticians. The firm acquired Verity, a system which dated from the 1980s, and added it to Autonomy’s earlier video management acquisitions, document management acquisitions, and other bits and pieces accumulated since Autonomy opened for business in the late 1990s.

Each acquisition added revenue to Autonomy’s financial reports, and the customers of these acquisitions became candidates for other Autonomy products. At the time of the HP purchase decision, Autonomy had about six or seven times the revenue of Endeca, another late 1990s search vendor. (Oracle bought Endeca for $1.1 billion in 2011. Other search vendors sold in the 2008 to 2014 period fetched much lower prices; for example, IBM bought Vivisimo for $20 million, a figure equivalent to one year of Vivisimo revenues.)

HP did not, in my opinion, understand that search and retrieval was a business that broke the backs of many bright MBAs and whiz kid engineers. HP assumed that its management team would triumph in generating billions from Autonomy’s core technology. I think some of Autonomy’s innovations are important, but I know that Autonomy was able to generate six or seven times the revenue of the number two search vendor in 2011 because it managed a portfolio of content processing companies and did a pretty good job of generating revenue from lines of business ADJACENT to search and retrieval.

HP wanted the 1990s technology of Autonomy to generate billions. HP quickly learned that its view of Autonomy did not match what Autonomy’s management team built.

I am not sure how bright folks at HP could look at the failures of Convera, Delphes, Entopia, Siderean, and other search vendors and not ask, “What’s different about search?”

HP wanted a panacea. HP demonstrates the type of problem my friend who became a major league player had and still has. In the big leagues, swinging for the fences, seeking a silver bullet, and looking for a quick fix is easy. Finding a fix for a company with problematic business models and conflicting management views is very difficult.

What does the HP experience suggest? After decades of enterprise search hyperbole, reality is different from the word picture sales professionals create in the minds of those whose desperation clouds their thinking.

My view is that HP has struck out. Bad management in my opinion.

Stephen E Arnold, September 26, 2015

Intel and Its Search Quest: Maana from Heaven

September 26, 2015

One of my two or three readers sent me a link to “Rethinking Enterprise Search for the Big Data Age.” The write up explains that old-school search won’t do the trick in today’s digital content environment.

I learned that the Maana Search and Discovery Platform is built on a modern Hadoop stack that leverages HDFS, the Accumulo graph database, Apache Spark, heaps of Scala code, and a host of various machine learning algorithms for teasing knowledge out of reams of unstructured data.

The write up veers into a swamp I try to avoid. I am not sure what knowledge is, and I have a heck of a time figuring out how data becomes information. The knowledge part is a mystery for brighter “sparks” to pursue.

The Maana system is a “search and discovery platform.” The write up quotes a Mr. Thompson who explains:

You can tell Maana, ‘I want to know all pieces of equipment that have led to most unplanned downtime,’ Thompson says. After telling it to look in the Gulf and entering the appropriate EQP code, the system returns a histogram of pieces of equipment with the most amount of downtime. “So you get very quickly through a simple search and filtering operation a visual representation of the underlying data.”
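The “search and filtering operation” Thompson describes reduces, at its core, to filter, group, and count. Here is a minimal Python sketch under stated assumptions: the record layout (equipment, region, planned flag, hours) and the function names are hypothetical, not Maana’s schema or API. The EQP code filter from the quote is left out because the write up does not say what that code is.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class DowntimeEvent:
    # Hypothetical record layout; the field names are illustrative only.
    equipment: str
    region: str
    planned: bool
    hours: float


def downtime_histogram(events: list[DowntimeEvent], region: str) -> list[tuple[str, float]]:
    """Filter to unplanned events in one region, total hours per equipment, largest first."""
    totals: Counter = Counter()
    for e in events:
        if e.region == region and not e.planned:
            totals[e.equipment] += e.hours
    return totals.most_common()


def render(histogram: list[tuple[str, float]], width: int = 20) -> str:
    """Draw a text bar chart scaled so the largest total fills `width` characters."""
    if not histogram:
        return ""
    peak = histogram[0][1] or 1.0  # avoid dividing by zero when every total is 0
    return "\n".join(
        f"{name:<10}{'#' * round(width * hours / peak)} {hours:g}h"
        for name, hours in histogram
    )


events = [
    DowntimeEvent("pump-7", "Gulf", False, 12.0),
    DowntimeEvent("valve-2", "Gulf", False, 5.0),
    DowntimeEvent("pump-7", "Gulf", True, 40.0),             # planned: excluded
    DowntimeEvent("compressor", "North Sea", False, 30.0),   # other region: excluded
    DowntimeEvent("valve-2", "Gulf", False, 4.0),
]
print(render(downtime_histogram(events, "Gulf")))
```

The “visual representation” is the last, and least novel, step; the work is in getting clean, joined records into a shape like `DowntimeEvent` in the first place.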

The magic is that the system:

can join multiple disparate data sets and enable users to search and discover data across them in a semantic method. “It’s very simple to navigate the entire information space, which may be being fed from many different sources simultaneously,” Thompson says. “But you’re working at level of domain concepts.”

Okay, a modern version of a federated system with clustering, correlation, classification, data mining, and semantic features.
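A federating system sends one query to several sources and merges what comes back. One common way to merge, assumed here and not taken from any Maana documentation, is to rescale each source’s scores to the range 0 to 1 and keep each document’s best score. Sources are modeled as plain Python functions; all names are hypothetical.

```python
from typing import Callable

# A source is any function mapping a query to (doc_id, raw_score) pairs.
Source = Callable[[str], list[tuple[str, float]]]


def normalize_scores(results: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Min-max rescale one source's scores to 0..1 so sources are comparable."""
    if not results:
        return []
    lo = min(score for _, score in results)
    hi = max(score for _, score in results)
    if hi == lo:
        return [(doc, 1.0) for doc, _ in results]
    return [(doc, (score - lo) / (hi - lo)) for doc, score in results]


def federated_search(query: str, sources: dict[str, Source], top_k: int = 10):
    """Send the query to every source, merge, and keep each document's best score."""
    merged: dict[str, tuple[float, str]] = {}
    for name, source in sources.items():
        for doc, score in normalize_scores(source(query)):
            if doc not in merged or score > merged[doc][0]:
                merged[doc] = (score, name)
    ranked = sorted(merged.items(), key=lambda item: item[1][0], reverse=True)
    return [(doc, score, name) for doc, (score, name) in ranked[:top_k]]


sources = {
    "well_logs": lambda q: [("w1", 40.0), ("w2", 10.0)],           # raw counts
    "reports": lambda q: [("r1", 0.9), ("w2", 0.5), ("r2", 0.1)],  # 0..1 relevance
}
for doc, score, origin in federated_search("unplanned downtime", sources):
    print(doc, round(score, 2), origin)
```

Min-max scaling is only one choice. Rank-based fusion methods such as reciprocal rank fusion avoid comparing raw scores from different engines at all.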

The open source software issue is an interesting one. The write up points out that Maana relies on Apache Spark. However, a quick check of the Maana Web site refreshed my memory: the site states that the system is not based on Lucene/Solr.

The company is backed by ConocoPhillips, Chevron, Frost Data Capital, and GE Ventures. I also noticed that Intel has a stake in the company. Intel, in my opinion, continues to explore content processing. After the company’s adventure (maybe misadventure) with Convera (formerly Excalibur Technologies), Intel took a stake in Endeca. Endeca sold itself to Oracle, and Intel has obviously moved on to Maana.

Will the LucidWorks approach to Big Data capture customers who want to make sense of Big Data? Will Elasticsearch make inroads? My hunch is that Big Data will come under the influence of the systems built to deal with flows of real time data from disparate sources, including audio and video. Most of these firms use open source search and retrieval tools as a utility.

Maana appears to be positioning itself to be a key player in Big Data access. I will wait to see which horses make it to the finish line.

Stephen E Arnold, September 26, 2015

SharePoint Revealed

September 23, 2015

Microsoft SharePoint. It brings smiles to the faces of the consultants and Certified Experts who can make the collection of disparate parts work like a refurbished John Harrison clock.

I read “Microsoft SharePoint ECM Suite for Content Management.” The write up explains that SharePoint became available in 2001. The write up does not reference the NCompass Labs acquisition or other bits and pieces of the SharePoint story. That’s okay. It is 2015. Ancient history in terms of Internet time. Also, what is content management? Does it include audio, video, and digital images? What about binaries? What about data happily stored on the state of Michigan’s mainframes?

 Jack Benny’s Maxwell reminds me of Fast Search’s 1998 approach to information access. With Fast Search inside, SharePoint delivers performance that is up to the Maxwell’s standards for speed, reliability, and engineering excellence.

The write up reveals that SharePoint evolved “gradually.” The most recent incarnation of the system includes a number of functions. The article specifically mentions:

  • A cloud based service
  • A foundation for collaboration and document sharing
  • A server. I thought there were multiple servers. Guess not.
  • A designer component for creating nifty looking user experiences. Isn’t Visual Studio or another programming tool required as well?
  • Cloud storage. Isn’t this redundant?
  • Search

I prefer a more modern approach to information access. The search systems I use are like a Kia Soul. The code often includes hamsters too.

Here’s what the write up says about search:

Microsoft FAST Search, which provides indexing and efficient search of content of all types.

I like the indexing and “efficient” description. The content of “all types” is interesting as well.

How well does Fast Search in its present incarnation handle audio and video? What about real time streams of social media like the Twitter fire hose? You get the idea. “All” is shorthand for “some” content.

I am not captivated by the whizzy features in SharePoint and its content management capabilities. I am not thrilled with building profiles of employees within an organization. I am pretty relaxed when it comes to collaboration. Phones work pretty well. Email is okay too. I work on documents alone and provide a version for authorized individuals to review. I do not need a big gun system. Just a modern one.

What about Fast Search?

Let me highlight a few salient points:

  • The product originated in Norway. You know where Trondheim is, right? Oslo? Of course. Great in the winter too. The idea burst from academia prior to 1998, when the company was officially set up. That makes the architecture an agile, youthful 17 years old.
  • In 2008, Microsoft paid $1.2 billion for a company which was found wanting in its accountancy skills. After investigations and a legal proceeding, the company seems to have had revenues well below its reported $170 million in 2007. Until the HP Autonomy deal, this was a transaction that helped fuel the “search is a big payday” belief. At an estimated $60 million in revenue instead of $170 million, Microsoft paid about 20 times Fast Search’s 2007 revenues. After the legal eagles landed, the founder of Fast Search found himself on the wrong end of a court decision. Think lock up time.
  • Fast Search is memorable to me because its founder told me that he was abandoning Web search for the enterprise search market. Autonomy’s revenue seemed to be a number the founder thought was reachable. As time unspooled, the big pay day arrived for Google. Enterprise search did not work out in terms of Google scale numbers. Fast Search backed out of an ad model to pursue an academic vision of information access as the primary enterprise driver.
  • The Fast Search solution which is part of SharePoint has breathed life into dozens of SharePoint search add ins. These range from taxonomy systems to clustering components to complete snap in replacements for the Fast Search components. Hundreds upon hundreds of consultants make their living implementing, improving, and customizing search and retrieval for SharePoint.

Net net: SharePoint has more than 150 million licensees. SharePoint is the new DOS for the enterprise. SharePoint is a consultant’s dream come true.

For me, I prefer simpler and more recent technology. That 17 year old approach seems more like Jack Benny’s Maxwell than a modern search Kia Soul.

Stephen E Arnold, September 23, 2015

Enterprise Search: Search No Longer Big Enough

September 22, 2015

I read the news on LinkedIn. (Registration may be required, gentle reader.) A post by a forum moderator raised the question, “Should we expand enterprise search?” There are other signs of trouble in search land as well. The Paper.Li enterprise search curated newsletter is about Big Data, analytics, education, and, almost as an afterthought, enterprise search in the form of endlessly recycled references to mid tier consulting reports based on what are, in my opinion, subjective criteria.

Is the implosion of enterprise search complete? Has the shockwave of the Fast Search financial charade caught up with today’s vendors? Has the shadow of the billion dollar bust that was HP’s acquisition of Autonomy/Verity been the straw which broke the camel’s back? Was it the mid tier consulting firm’s enterprise search report which ignored the major player in open source information access? Were the constant repositioning, faux news releases, and posturing on webinars the karate chop across the throat of search marketers?

I don’t know.

From my point of view, there are high value solutions to the challenge of providing employees with access to certain types of content. One can use the appliance approach of Maxxcat. There is Elasticsearch. The Blossom Software solution is pretty darned good. Specialist solutions are available for parts. There are even semi automated systems to help a user make sense of the noise filled streams of social media content. Think Recorded Future.

Gentle reader, when I began work on the Enterprise Search Report in 2003 (sponsored by, of all things, a content management specialist), there were some brand leaders. But these have since fallen into disgrace, been absorbed into larger firms with little incentive to invest in research, or crashed and burned as a result of failed implementations.

What remains today are some grim facts:

  • Search is perceived by many information technology professionals as a problem. Enterprise search implementations are often doomed from the git go because few want to hook their careers to projects which have for decades failed to keep users happy and been unable to provide useful results without constant infusions of money, computing cycles, and whiz kids.
  • Open source solutions are available, and they are pretty good. Large companies have the time, staff, resources, and incentive to get away from the proprietary solutions which limit what the licensee can do with the system.
  • Search is included in the most advanced systems. Consider Recorded Future, Diffeo, or any other cyber centric, next generation system. Search is available, but these systems solve specific problems. Search is sort of an apple pie, mother, and love type solution. These generalizations are tough to apply in a business like manner in organizations struggling to pay their bills. Most organizations just use what’s available. Even AutoCAD includes a search function. Oracle, bless its proprietary heart, provides a database licensee with a good enough solution. For those wanting a more robust solution, the Secure Enterprise Search system is available without charge. Yikes.

In my own experience, the sins of the earliest enterprise search vendors like Fulcrum Technologies and Verity have bulldozed a highway built on quicksand. Today’s vendors talk about search in terms of buzzwords like these:

  • Customer support. The idea is a variation of ClearForest’s pitch that one can find answers to customer issues by indexing text.
  • Big Data. I am confident that when I look for information in a Big Data set, I want to use search as a secondary tool. Enterprise search vendors offer analytic routines as add ons or as a spin on counting terms which have been extracted.
  • Taxonomy. I love this concept. A company needs to index its content. Nothing improves search, which has not been improved too much in the last 50 years, like machine indexing. Just don’t pin down the vendor on the amount of human intervention that is required to keep the automated system on track.
  • Natural language processing, semantics, and artificial intelligence. The idea is that a search system with smart software can figure out what a human generated document means and make it available to a user easily or, in some cases, BEFORE the user knows she needs access to the information in that document.

There are three problems which vendors and their customers have to wrestle into submission.

First, vendor and customer have to agree on exactly what the information access system is supposed to do. In my experience, this is an important step which is usually given modest, if any, attention. The reason is that instead of narrowing the focus to a specific problem, the problem gets defined in ever widening circles of functionality. The result is cost overruns and disaffected users.

Second, the vendors’ marketing argues that certain functions and benefits are a consequence of installing their software. The flaw is that marketing is easy; implementing a search system which the customer can afford to maintain is very hard. Add to this disconnect the tendency of some vendors to sell software which is half baked or, in some cases, not even completed. A certain vendor was kicked off a government procurement list for getting caught with software that did not work.

Third, the customers know that finding information is important. Most enterprise search vendors cannot provide access to the type of content which is growing rapidly and gaining importance with each passing day. I am talking about indexing audio, video, social media generated by employees and contractors, and digital images. For half a century, the focus has been on text. Even text search does not work particularly well unless one selects a solution from the handful of vendors whose systems actually work. Need I repeat Blossom, Elastic, and Maxxcat?

What about today’s flagship vendors? If one embraces the analysis of the mid tier consulting firms, the solutions are ones that are proprietary and have some profile and money due to the ministrations of addled venture capital players looking for the next Google.

There are solutions. Until the LinkedIn pool of job hunters and consultants comes to grips with software that works in a reliable manner, it is unlikely that the enterprise search discussion on LinkedIn will rise above thinly veiled marketing.

Search, gentle reader, is important. There are solutions which work. The problem is that in today’s go go world, those with a veneer of knowledge and expertise are guided by individuals who may be failed webmasters, unemployed journalists, English majors, and self appointed experts.

I have no solution to the crisis in enterprise search. Google muffed the bunny. Microsoft has its Powerset and Fast Search technologies. IBM offers Watson.

Maybe these solutions will work for you. They won’t work for me. Search experts, crisis time. My vantage point is from rural Kentucky. The experts in Manhattan and San Francisco have a much better view. What they see, however, is quite different from what I observe. Just make search bigger. The problems will just fade away, right? Grass is easy to grow in scorched earth, correct?

Stephen E Arnold, September 22, 2015

Oracle Revenues: Implications for HP and IBM

September 17, 2015

Oracle is an interesting company because it owns a number of enterprise search and content processing technologies. For example, decades ago, the company bought the often overlooked Artificial Linguistics. Then Oracle complemented its “Text” and “Secure Enterprise Search” technology with Triple Hop. Gentle reader, I am confident you know about Triple Hop’s clustering methods. Then in a spate of content processing fury, Oracle bought RightNow (Dutch developed indexing technology), InQuira (natural language processing crafted from two early Sillycon Valley search vendors), and Endeca, the now long in the tooth, computationally intensive “Guided Navigation” outfit. And we must not forget the retrieval functions of PL/SQL. Oracle has almost as many search and retrieval systems to nurture as that high flying OpenText outfit in Canada.

With such a backpack of information access goodies, should we expect a revenue report bursting with good news? It struck me as I read “Oracle Beats Profit Estimates by a Penny a Share but Revenue Slides” that search and retrieval may not be a zoo with golden geese.

Oracle delivered earnings which made the fine Wall Street MBAs glow. However, the revenue did not win a gold star.

Set aside Oracle for the nonce.

Think about Hewlett Packard (Autonomy stuff) and IBM (Watson stuff). Both of these outfits are reporting declining revenues too. Both have bet large sums on information access.

My question is, “Will a payoff arrive?”

My other question is, “When the payoff arrives, will it make up for the loss in revenues from old line products and services?”

My hunch is that these big bets on search are current and future ponds of despair.

Now set aside these floundering blue chips.

What about the up and coming search vendors? Life is not easy for vendors of search and content processing technology. There are some bright spots, of course, but vendors with deep roots in traditional search craziness are likely to find revenues insufficient to pay for customer support, bug fixing, and implementation of new technical methods.

Before its founders did an arabesque into Alphabet, Google figured this out with the high interest credit card of technical debt. When will HP, IBM, and Oracle get the message?

Stephen E Arnold, September 17, 2015
