The Equivalent of a Brexit

August 31, 2016

Britain’s historical vote to leave the European Union has set a historical precedent. What is the precedent however? Is it the choice to leave an organization? The choice to maintain their independence? Or is it a basic example of the right to choose? The Brexit will be used as a metaphor for any major upheaval for the next century, so how can it be used in technology context? BA Insight gives us the answer with “Would Your Users Vote ‘Yes’ For Sharexit?”

SharePoint is Microsoft Office’s collaborative content management program. It can be used to organize projects, build Web sites, store files, and allow team members to communicate. Office workers also spurn it across the globe over due to its inefficiencies. To avoid a Sharexit in your organization, the article offers several ways to improve a user’s SharePoint experience. One of the easiest ways to keep SharePoint is to build an individual user interface that handles little tasks to make a user’s life easier. Personalizing the individual SharePoint user experience is another method, so the end user does not feel like another cog in the system but rather that SharePoint was designed for them. Two other suggestions are plain, simple advice: take user feedback and actually use it and make SharePoint the go information center for the organization by putting everything on it.

Perhaps the best advice is making information easy to find on SharePoint:

Documents are over here, discussions over there, people are that way, and then I don’t know who the experts really are. You can make your Intranet a whole lot smarter, or dare we say “intelligent”, if you take advantage of this information in an integrated fashion, exposing your users to connected, but different, information. You can connect documents to the person who wrote them, then to that person’s expertise and connected colleagues, enabling search for your hidden experts. The ones that can really be helpful often reduce chances for misinformation, repetition of work, or errors. To do this, expertise location capabilities can combine contributed expertise with stated expertise, allowing for easy searching and expert identification.

Developers love SharePoint because it is easy to manage and to roll out information or software to every user. End users hate it because it creates more problems than resolving anything. If developers take the time to listen to what the end users need from their SharePoint experience than can avoid an Sharexit.

Whitney Grace, August 31, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Written by Stephen E. Arnold · Filed Under Analytics, Data, News, Search, SharePoint | Comments Off on The Equivalent of a Brexit

Facebook and Humans: Reality Is Not Marketing

May 16, 2016

I read “Facebook News Selection Is in Hands of Editors Not Algorithms, Documents Show.” The main point of the story is that Facebook uses humans to do work. The idea is that algorithms do not seem to be a big part of picking out what’s important.

The write up comes from a “real” journalism outfit. The article points out:

The boilerplate about its [Facebook’s] news operations provided to customers by the company suggests that much of its news gathering is determined by machines: “The topics you see are based on a number of factors including engagement, timeliness, Pages you’ve liked and your location,” says a page devoted to the question “How does Facebook determine what topics are trending?”

After reading this, I thought of Google’s poetry created by its artificial intelligence system. Here’s the line which came to mind:

I started to cry. (Source: Quartz)

I vibrate with the annoyance bubbling under the surface of the newspaper article. Imagine. Facebook has great artificial intelligence. Facebook uses smart software. Facebook open sources its systems and methods. The company says it is at the cutting edge of replacing humans with objective procedures.

The article’s belief in baloney is fried and served cold on stale bread. Facebook uses humans. The folks at real journalism outfits may want to work through articles like “Different Loci of Semantic Interference in Picture Naming vs. Word-Picture Matching Tasks” to get a sense of why smart systems go wandering.

So what’s new? Palantir Technologies uses humans to index content. Without that human input, the “smart” software does some useful work, but humans are part of the work flow process.

Other companies use humans too. But the marketing collateral and the fizzy presentations at fancy conferences paint a picture of a world in which cognitive, artificially intelligent, smart systems do the work that subject matter experts used to do. Humans, like indexers and editors, are no longer needed.

Now reality pokes is rose tinted fingertips into the real world.

Let me be clear. One reason I am not happy with the verbiage generated about smart software is one simple fact.

Most of the smart software systems require humans to fiddle at the beginning when a system is set up, while the system operates to deal with exceptions, and after an output is produced to figure out what’s what. In short, smart software is not that smart yet.

There are many reasons but the primary one is that the math and procedures underpinning many of the systems with which I am familiar are immature. Smart software works well when certain caveats are accepted. For example, the vaunted Watson must be trained. Watson, therefore, is not that much different from the training Autonomy baked into its IDOL system in the mid 1990s. Palantir uses humans for one simple reason. Figuring out what’s important to a team under fire with software works much better if the humans with skin in the game provide indexing terms and identify important points like local names for stretches of highway where bombs can be placed without too much hassle. Dig into any of the search and content processing systems and you find expenditures for human work. Companies licensing smart systems which index automatically face significant budget overruns, operational problems because of lousy outputs, and piles of exceptions to either ignore or deal with. The result is that the smoke and mirrors of marketers speaking to people who want a silver bullet are not exactly able to perform like the carefully crafted demonstrations. IBM i2 Analyst’s Notebook requires humans. Fast Search (now an earlobe in SharePoint) requires humans. Coveo’s system requires humans. Attivio’s system requires humans. OpenText’s suite of search and content processing requires humans. Even Maxxcat benefits from informed set up and deployment. Out of the box, dtSearch can index, but one needs to know how to set it up and make it work in a specific Microsoft environment. Every search and content processing system that asserts that it is automatic is spackling flawed wallboard.

For years, I have given a lecture about the essential sameness of search and content processing systems. These systems use the same well known and widely taught mathematical procedures. The great breakthroughs at SRCH2 and similar firms amount to optimization of certain operations. But the whiziest system is pretty much like other systems. As a result, these systems perform in a similar manner. These systems require humans to create term lists, look up tables of aliases for persons of interest, hand craft taxonomies to represent the chunk of reality the system is supposed to know about, and other “libraries” and “knowledgebases.”

The fact that Watson is a source of amusement to me is precisely because the human effort required to make a smart system work is never converted to cost and time statements. People assume Watson won Jeopardy because it was smart. People assume Google knows what ads to present because Google’s software is so darned smart. People assume Facebook mines its data to select news for an individual. Sure, there is automation of certain processes, but humans are needed. Omit the human and you get the crazy Microsoft Tay system which humans taught to be crazier than some US politicians.

For decades I have reminded those who listened to my lectures not to confuse what they see in science fiction films with reality. Progress in smart software is evident. But the progress is very slow, hampered by the computational limits of today’s hardware and infrastructure. Just like real time, the concept is easy to say but quite expensive and difficult to implement in a meaningful way. There’s a reason millisecond access to trading data costs so much that only certain financial operations can afford the bill. Smart software is the same.

How about less outrage from those covering smart software and more critical thinking about what’s required to get a system to produce a useful output? In short, more info and less puffery, more critical thinking and less sawdust. Maybe I imagined it but both the Google and Tesla self driving vehicles have crashed, right? Humans are essential because smart software is not as smart as those who believe in unicorns assume. Demos, like TV game shows, require pre and post production, gentle reader.

What happens when humans are involved? Isn’t bias part of the territory?

Stephen E Arnold, May 16, 2016

Written by Stephen E. Arnold · Filed Under AI, News | Comments Off on Facebook and Humans: Reality Is Not Marketing

Enterprise Search Revisionism: Can One Change What Happened

March 9, 2016

I read “The Search Continues: A History of Search’s Unsatisfactory Progress.” I noted some points which, in my opinion, underscore why enterprise search has been problematic and why the menagerie of experts and marketers have put search and retrieval on the path to enterprise irrelevance. The word that came to mind when I read the article was “revisionism” for the millennials among us.

The write up ignores the fact that enterprise search dates back to the early 1970s. One can argue that IBM’s Storage and Information Retrieval System (STAIRS) was the first significant enterprise search system. The point is that enterprise search as a productized service has a history of over promising and under delivering of more than 40 years.

Enterprise search with a touch of Stalinist revisionism.

Customers said they wanted to “find” information. What those individuals meant was have access to information that provided the relevant facts, documents, and data needed to deal with a problem.

Because providing on point information was and remains a very, very difficult problem, the vendors interpreted “find” to mean a list of indexed documents that contained the users’ search terms. But there was a problem. Users were not skilled in crafting queries which were essentially computer instructions between words the index actually contained.

After STAIRS came other systems, many other systems which have been documented reasonably well in Bourne and Bellardo-Hahn’s A History of Online information Services 1963-1976. (The period prior to 1970 describes for-fee research centric online systems. STAIRS was among the most well known early enterprise information retrieval system.) I provided some history in the first three editions of the Enterprise Search Report, published from 2003 to 2007. I have continued to document enterprise search in the Xenky profiles and in this blog.

The history makes painful reading for those who invested in many search and retrieval companies and for the executives who experienced the crushing of their dreams and sometimes career under the buzz saw of reality.

In a nutshell, enterprise search vendors heard what prospects, workers overwhelmed with digital and print information, and unhappy users of those early systems were saying.

The disconnect was that enterprise search vendors parroted back marketing pitches that assured enterprise procurement teams of these functions:

Easy to use
“All” information instantly available
Answers to business questions
Faster decision making
Access to the organization’s knowledge.

The result was a steady stream of enterprise search product launches. Some of these were funded by US government money like Verity. Sure, the company struggled with the cost of infrastructure the Verity system required. The work arounds were okay as long as the infrastructure could keep pace with the new and changed word-centric documents. Toss in other types of digital information, make the system perform ever faster indexing, and keep the Verity system responding quickly was another kettle of fish.

Research oriented information retrieval experts looked at the Verity type system and concluded, “We can do more. We can use better algorithms. We can use smart software to eliminate some of the costs and indexing delays. We can [ fill in the blank ].

The cycle of describing what an enterprise search system could actually deliver was disconnected from the promises the vendors made. As one moves through the decades from 1973 to the present, the failures of search vendors made it clear that:

Companies and government agencies would buy a system, discover it did not do the job users needed, and buy another system.
New search vendors picked up the methods taught at Cornell, Stanford, and other search-centric research centers and wrap on additional functions like semantics. The core of most modern enterprise search systems is unchanged from what STAIRS implemented.
Search vendors came like Convera, failed, and went away. Some hit revenue ceilings and sold to larger companies looking for a search utility. The acquisitions hit a high water mark with the sale of Autonomy (a 1990s system) to HP for $11 billion.

What about Oracle, as a representative outfit. Oracle database has included search as a core system function since the day Larry Ellison envisioned becoming a big dog in enterprise software. The search language was Oracle’s version of the structured query language. But people found that difficult to use. Oracle purchased Artificial Linguistics in order to make finding information more intuitive. Oracle continued to try to crack the find information problem through the acquisitions of Triple Hop, its in-house Secure Enterprise Search, and some other odds and ends until it bought in rapid succession InQuira (a company formed from the failure of two search vendors), RightNow (technology from a Dutch outfit RightNow acquired), and Endeca. Where is search at Oracle today? Essentially search is a utility and it is available in Oracle applications: customer support, ecommerce, and business intelligence. In short, search has shifted from the “solution” to a component used to get started with an application that allows the user to find the answer to business questions.

I mention the Oracle story because it illustrates the consistent pattern of companies which are actually trying to deliver information that the u9ser of a search system needs to answer a business or technical question.

I don’t want to highlight the inaccuracies of “The Search Continues.” Instead I want to point out the problem buzzwords create when trying to understand why search has consistently been a problem and why today’s most promising solutions may relegate search to a permanent role of necessary evil.

In the write up, the notion of answering questions, analytics, federation (that is, running a single query across multiple collections of content and file types), the cloud, and system performance are the conclusion of the write up.

Wrong.

The use of open source search systems means that good enough is the foundation of many modern systems. Palantir-type outfits, essential an enterprise search vendors describing themselves as “intelligence” providing systems,, uses open source technology in order to reduce costs, shift bug chasing to a community, The good enough core is wrapped with subsystems that deal with the pesky problems of video, audio, data streams from sensors or similar sources. Attivio, formed by professionals who worked at the infamous Fast Search & Transfer company, delivers active intelligence but uses open source to handle the STAIRS-type functions. These companies have figured out that open source search is a good foundation. Available resources can be invested in visualizations, generating reports instead of results lists, and graphical interfaces which involve the user in performing tasks smart software at this time cannot perform.

For a low cost enterprise search system, one can download Lucene, Solr, SphinxSearch, or any one of a number of open source systems. There are low cost (keep in mind that costs of search can be tricky to nail down) appliances from vendors like Maxxcat and Thunderstone. One can make do with the craziness of the search included with Microsoft SharePoint.

For a serious application, enterprises have many choices. Some of these are highly specialized like BAE NetReveal and Palantir Metropolitan. Others are more generic like the Elastic offering. Some are free like the Effective File Search system.

The point is that enterprise search is not what users wanted in the 1970s when IBM pitched the mainframe centric STAIRS system, in the 1980s when Verity pitched its system, in the 1990s when Excalibur (later Convera) sold its system, in the 2000s when Fast Search shifted from Web search to enterprise search and put the company on the road to improper financial behavior, and in the efflorescence of search sell offs (Dassault bought Exalead, IBM bought iPhrase and other search vendors), and Lexmark bought Brainware and ISYS Search Software.

Where are we today?

Users still want on point information. The solutions on offer today are application and use case centric, not the silly one-size-fits-all approach of the period from 2001 to 2011 when Autonomy sold to HP.

Open source search has helped create an opportunity for vendors to deliver information access in interesting ways. There are cloud solutions. There are open source solutions. There are small company solutions. There are more ways to find information than at any other time in the history of search as I know it.

Unfortunately, the same problems remain. These are:

As the volume of digital information goes up, so does the cost of indexing and accessing the sources in the corpus
Multimedia remains a significant challenge for which there is no particularly good solution
Federation of content requires considerable investment in data grooming and normalizing
Multi-lingual corpuses require humans to deal with certain synonyms and entity names
Graphical interfaces still are stupid and need more intelligence behind the icons and links
Visualizations have to be “accurate” because a bad decision can have significant real world consequences
Intelligent systems are creeping forward but crazy Watson-like marketing raises expectations and exacerbates the credibility of enterprise search’s capabilities.

I am okay with history. I am not okay with analyses that ignore some very real and painful lessons. I sure would like some of the experts today to know a bit more about the facts behind the implosions of Convera, Delphis, Entopia, and many other companies.

I also would like investors in search start ups to know a bit more about the risks associated with search and content processing.

In short, for a history of search, one needs more than 900 words mixing up what happened with what is.

Stephen E Arnold, March 9, 2016

Written by Stephen E. Arnold · Filed Under Business strategy, Enterprise search, News | Comments Off on Enterprise Search Revisionism: Can One Change What Happened

Enterprise Search Is a Growth Industry: No, Really

October 16, 2015

I noticed two things when we were working through the Overflight news about proprietary vendors of enterprise search systems on October 14, 2015.

First, a number of enterprise search vendors which the Overflight system monitors, are not producing substantive news. Aerotext, Dieselpoint, and even Polyspot are just three firms with no buzz in social media or in traditional public relations channels. Either these outfits are so busy that the marketers have no time to disseminate information or there is not too much to report.

Second, no proprietary enterprise search vendor is marketing search and retrieval in the way Autonomy and the now defunct Convera used to market. There were ads, news releases, and conference presentations. Now specialist vendors talk about webinars, business intelligence, Big Data, and customer support solutions. These outfits are mostly selling consulting firms. Enterprise search as a concept is not generating much buzz based on the Overflight data.

Imagine my surprise when I read “Enterprise Search Market Expanding at a 12.2% CAGR by 2019.” What a delicious counterpoint to the effective squishing of the market sector which husbanded the Autonomy and Fast Search & Transfer brouhahas. These high profile enterprise search vendors found themselves mired in legal hassles. In fact, the attention given to these once high profile search vendors has made it difficult for today’s vendors to enjoy the apparent success that Autonomy and Fast Search enjoyed prior to their highly publicized challenges.

Open source search solutions have become the popular and rational solution to information access. Companies offering Lucene, Solr, and other non proprietary information access systems have made it difficult for vendors of proprietary solutions to generate Autonomy-scale revenue. The money seems to be in consulting and add ons. The Microsoft SharePoint system supports a hot house of third party components which improve the SharePoint experience. The problem is that none of the add in and component vendors are likely to reach Endeca-scale revenues.

Even IBM with its Watson play seems to be struggling to craft a sustainable, big money revenue stream. Scratch the surface of Watson and you have an open source system complemented with home brew code and technology from acquired companies.

The write up reporting the double digit comp9ound growth rate states:

According to a recent market study published by Transparency Market Research (TMR), titled “Enterprise Search Market – Global Industry Analysis, Size, Share, Growth, Trends and Forecast 2013 – 2019”, the global enterprise search market is expected to reach US$3,993.7 million by 2019, increasing from US$1,777.5 million in 2012 and expanding at a 12.2% CAGR from 2013 to 2019. Enterprise search system makes content from databases, intranets, data management systems, email, and other sources searchable. Such systems enhance the productivity and efficiency of business processes and can save as much as 30% of the time spent by employees searching information.The need to obtain relevant information quickly and the availability of technological applications to obtain it are the main factors set to drive the global enterprise search market.

TMR, like other mid tier consulting firms, will sell some reports to enterprise search vendors who need some good news about the future of the market for their products.

The write up also contains a passage which I found quite remarkable:

To capitalize on opportunities present in the European regional markets, major market players in the U.S. are tying up with European vendors to provide enterprise search solutions.

Interesting. I do not agree. I don’t see to many US outfits tying up with Antidot, Intrafind, or Sinequa and their compatriots. Folks are using Elasticsearch, but I don’t categorize these relationships as tie ups like the no cash merger between Lexalytics and its European partner.

Furthermore, we have the Overflight data and evidence that enterprise search is a utility function increasingly dominated by open source options and niche players. Where are the big brands of a decade ago: Acquired, out of business, discredited, and adorned with jargon.

The problems include sustainable revenue, the on going costs of customer support, and the appeal of open source solutions.

Transparency Market Research seems to know more than I do about enterprise search and its growth rate. That’s good. Positive. Happy.

Stephen E Arnold, October 16, 2015

Written by Stephen E. Arnold · Filed Under Consulting, Enterprise search, Marketing, News | Comments Off on Enterprise Search Is a Growth Industry: No, Really

Software Market Begs for Integration Issue Relief

July 2, 2015

A recent report proves what many users already know: integrating an existing CMS with new and emerging software solutions is difficult. As quickly as software emerges and changes, users are finding that hulking overgrown CMS solutions are lagging behind in terms of agility. SharePoint is no stranger to this criticism. Business Solutions offers more details in their article, “ISVs: Study Shows Microsoft SharePoint Is Open To Disruption.”

“A report from Software Advice surveyed employees that use content management systems (CMS) on a daily basis and found 48 percent had considerable problems integrating their CMS with their other software solutions. The findings mirror a recent AIIM report that found only 11 percent of companies experienced successful Microsoft SharePoint implementation . . . The results of this report indicate that the CMS market is ripe for disruption if a software vendor could solve the integration issues typically associated with SharePoint.”

No doubt, Microsoft understands the concerns and perceived threats, and will attempt to solve some of the issue with the upcoming release of SharePoint Server 2016. However, the fact remains that SharePoint is a big ship to turn, and change will not be dramatic or happen overnight. In the meantime, stay on top of the latest news for tips, tricks, and third-party solutions that may ease some of the pain. Look to Stephen E. Arnold and his SharePoint feed on ArnoldIT.com in order to stay in touch without a huge investment in time.

Emily Rae Aldridge, July 2, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Written by Stephen E. Arnold · Filed Under Data, Microsoft, News, Search, SharePoint, Technology | Comments Off on Software Market Begs for Integration Issue Relief

Improving the Preservica Preservation Process

April 17, 2015

Preservica is a leading program for use in digital preservation, consulting, and research, and now it is compatible with Microsoft SharePoint. ECM Connection has the scoop on the “New Version Of Preservica Aligns Records Management And Digital Preservation.” The upgrade to Preservica will allow SharePoint managers to preserve content from SharePoint as well as Microsoft Outlook, a necessary task as most companies these days rely on the Internet for business and need to archive transactions.

Preservica wants to become a bigger part of enterprise system strategies such as enterprise content management and information governance. One of their big selling points is that Preservica will archive information and keep it in a usable format, as obsoleteness becomes a bigger problem as technology advances.

“Jon Tilbury, CEO Preservica adds: ‘The growing volume and diversity of digital content and records along with rapid technology and IT refresh rates is fuelling the need for Records and Compliance managers to properly safe-guard their long-term and permanent digital records by incorporating Digital Preservation into their overall information governance lifecycle. The developing consensus is that organizations should consider digital preservation from the outset – especially if they hold important digital records for more than 10 years or already have records that are older than 10 years. Our vision is to make this a pluggable technology so it can be quickly and seamlessly integrated into the corporate information landscape.’ ”

Digital preservation with a compliant format is one of the most overlooked problems companies deal with. They may have stored their records on a storage device, but if they do not retain the technology to access them, then the records are useless. Keeping files in a readable format not only keeps them useful, but it also makes the employee’s life who has to recall them all the easier.

Whitney Grace, April 17, 2015
Stephen E Arnold, Publisher of CyberOSINT at www.xenky.com

Written by Stephen E. Arnold · Filed Under Business intelligence, Business strategy, Data, Database, Digital Library, Enterprise, Microsoft, News, Search, SharePoint, Technology | Comments Off on Improving the Preservica Preservation Process

Semantic Search Becomes Search Engine Optimization: That Is Going to Improve Relevance

March 27, 2015

I read “The Rapid Evolution of Semantic Search.” It must be my age or the fact that it is cold in Harrod’s Creek, Kentucky, this morning. The write up purports to deliver “an overview of the history of semantic search and what this means for marketers moving forward.” I like that moving forward stuff. It reminds me of Project Runway’s “fashion forward.”

The write up includes a wonky graphic that equates via an arrow Big Data and metadata, volume, smart content, petabytes, data analysis, vast, structured, and framework. Big Data is a cloud with five little arrows pointing down. Does this mean Big Data is pouring from the sky like yesterday’s chilling rain?

The history of the Semantic Web begins in 1998. Let’s see that is 17 years ago. The milestone is in the context of the article, the report “Semantic Web road Map.” I learned that Google was less than a month old. I thought that Google was Backrub and the work on what was named Google begin a couple, maybe three years, earlier. Who cares?

The Big Idea is that the Web is an information space. That sounds good.

Well in 2012, something Big happened. According to the write up Google figured out that 20 percent of its searches were “new.” Aren’t those pesky humans annoying. The article reports:

long tail keywords made up approximately 70 percent of all searches. What this told Google was that users were becoming interested in using their search engine as a tool for answering questions and solving problems, not just looking up facts and finding individual websites. Instead of typing “Los Angeles weather,” people started searching “Los Angeles hourly weather for March 1.” While that’s an extremely simplified explanation, the fact is that Google, Bing, Facebook, and other internet leaders have been working on what Colin Jeavons calls “the silent semantic revolution” for years now. Bing launched Satori, a knowledge storehouse that’s capable of understanding complex relationships between people, things, and entities. Facebook built Knowledge Graph, which reveals additional information about things you search, based on Google’s complex semantic algorithm called Hummingbird.

Yep, a new age dawned. The message in the article is that marketers have a great new opportunity to push their message in front of users. In my book, this is one reason why running a query on any of the ad supported Web search engines returns so much irrelevant information. In my just submitted Information Today column, I report how a query for the phrase “concept searching” returned results littered with a vendor’s marketing hoo-hah.

I did not want information about a vendor. I wanted information about a concept. But, alas, Google knows what I want. I don’t know what I want in the brave new world of search. The article ignores the lack of relevance in results, the dust binning of precision and recall, and the bogus information many search queries generate. Try to find current information about Dark Web onion sites and let me know how helpful the search systems are. In fact, name the top TOR search engines. See how far you get with Bing, Google, and Yandex. (DuckDuckGo and Ixquick seem to be aware of TOS content by the way.)

So semantic in the context of this article boils down to four points:

Think like an end user. I suppose one should not try to locate an explanation of “concept searching.” I guess Google knows I care about a company with a quite narrow set of technology focused on SharePoint.
Invest in semantic markup. Okay, that will make sense to the content marketers. What if the system used to generate the content does not support the nifty features of the Semantic Web. OWL, who? RDF what?
Do social. Okay, that’s useful. Facebook and Twitter are the go to systems for marketing products I assume. Who on Facebook cares about cyber OSINT or GE’s cratering petrochemical business?
And the keeper, “Don’t forget about standard techniques.” This means search engine optimization. That SEO stuff is designed to make relevance irrelevant. Great idea.

Net net: The write up underscores some of the issues associated with generating buzz for a small business like the ones INC Magazine tries to serve. With write ups like this one about Semantic Search, INC may be confusing their core constituency. Can confused executives close deals and make sense of INC articles? I assume so. I know I cannot.

Stephen E Arnold, March 27, 2015

Written by Stephen E. Arnold · Filed Under Marketing, News, Search, Semantic | Comments Off on Semantic Search Becomes Search Engine Optimization: That Is Going to Improve Relevance

Enterprise Search Is Important: But Vendor Survey Fails to Make Its Case

March 20, 2015

I read “Concept Searching Survey Shows Enterprise Search Rises in the Ranks of Strategic Applications.” Over the years, I have watched enterprise search vendors impale themselves on their swords. In a few instances, licensees of search technology loosed legal eagles to beat the vendors to the ground. Let me highlight a few of the milestones in enterprise search before commenting on this “survey says, it must be true” news release.

A Simple Question?

What do these companies have in common?

Autonomy
Convera
Fast Search & Transfer?

I know from my decades of work in the information retrieval sector that financial doubts plagued these firms. Autonomy, as you know, is the focal point of on-going litigation over accounting methods, revenue, and its purchase price. Like many high-tech companies, Autonomy achieved significant revenues and caused some financial firms to wonder how Autonomy achieved its hundreds of millions in revenue. There was a report from Cazenove Capital I saw years ago, and it contained analyses that suggested search was not the money machine for the company.

And Convera? After morphing from Excalibur with its acquisition of the manual-indexing ConQuest Technologies, a document scanning with some brute force searching technology morphed into Convera. Convera suggested that it could perform indexing magic on text and video. Intel dived in and so did the NBA. These two deals did not work out and the company fell on hard times. With an investment from Allen & Company, Conquest tried its hand at Web indexing. Finally, stakeholders lost faith and Convera sold off its government sales and folded its tent. (Some of the principals cooked up another search company. This time the former Convera wizards got into the consulting engineering business.) Convera lives on in a sense as part of the Ntent system. Convera lost some money along the way. Lots of money as I recall.

And Fast Search? Microsoft paid $1.2 billion for Fast Search. Now the 1998 technology lives on within Microsoft SharePoint. But Fast Search has the unique distinction of facing both a financial investigation for fancy dancing with its profit and loss statement and the distinction of having its founder facing a jail term. Fast Search ran into trouble when its marketers promised magic from the ESP system. When the pixie dust caused licensees to develop an allergic reaction, Fast ran into trouble. The scrambling caused some managers to flee the floundering Norwegian search ship and found another search company. For those who struggle with Fast Search in its present guise, you understand the issues created by Fast Search’s “sell it today and program it tomorrow” approach.

Is There a Lesson in These Vendors’ Trajectories?

What do these three examples tell us? High flying enterprise search vendors seem to have run into some difficulties. Not surprisingly, the customers of these companies are often wary of enterprise search. Perhaps that is the reason so many enterprise search vendors do not use the words “enterprise search”, preferring euphemisms like customer support, business intelligence, and knowledge management?

The Rush to Sell Out before Drowning in Red Ink

Now a sidelight. Before open source search effectively became the go to keyword search system, there were vendors who had products that for the most part worked when installed to do basic information retrieval. These companies’ executives worked overtime to find buyers. The founders cashed out and left the new owners to figure out how to make sales, pay for research, and generate sufficient revenue to get the purchase price back. Which companies are these? Here’s a short list and incomplete list to help jog your memory:

Artificial Linguistics (sold to Oracle)
BRS Search (sold to OpenText)
EasyAsk (first to Progress Software and then to an individual investor)
Endeca to Oracle
Enginium (sold to Kroll and now out of business)
Exalead to Dassault
Fulcrum Technology to IBM (quite a story. See the Fulcrum profile at www.xenky.com/vendor-profiles)
InQuira to Oracle
Information Dimensions (sold to OpenText)
Innerprise (Microsoft centric, sold to GoDaddy)
iPhrase to IBM (iPhrase was a variant of Teratext’s approach)
ISYS Search Software to Lexmark (yes, a printer company)
RightNow to Oracle (RightNow acquired Dutch technology for its search function)
Schemalogic to Smartlogic
Stratify/Purple Yogi (sold to Iron Mountain and then to Autonomy)
Teratext to SAIC, now Leidos
TripleHop to Oracle
Verity to Autonomy and then HP bought Autonomy
Vivisimo to IBM (how clustering and metasearch magically became a Big Data system from the company that “invented” Watson) .

The brand impact of these acquired search vendors is dwindling. The only “name” on the list which seems to have some market traction is Endeca.

Some outfits just did not make it or who are in a very quiet, almost dormant, mode. Consider these search vendors:

Delphes (academic thinkers with linguistic leanings)
Edgee
Dieselpoint (structured data search)
DR LINK (Syracuse University and an investment bank)
Executive Search (not a headhunting outfit, an enterprise search outfit)
Grokker
Intrafind
Kartoo
Lextek International
Maxxcat
Mondosoft
Pertimm (reincarnated with Axel Springer (Macmillan) money as Qwant, which according to Eric Schmidt, is a threat to Google. Yeah, right.)
Siderean Software (semantic search)
Speed of Mind
Suggest (Weitkämper Technology)?
Thunderstone

These are not a comprehensive list. I just wanted to layout some facts about vendors who tilted at the enterprise search windmill. I think that a reasonable person might conclude that enterprise search has been a tough sell. Of the companies that developed a brand, none was able to achieve sustainable revenues. The information highway is littered with the remains of vendors who pitched enterprise search as the killer app for anything to do with information.

Now the survey purports to reveal insights to which I have been insensitive in my decades of work in digital information access.

Here’s what the company sponsoring the survey offers:

Concept Searching [the survey promulgator], the global leader in semantic metadata generation, auto-classification, and taxonomy management software, and developer of the Smart Content Framework™, is compiling the statistics from its 2015 SharePoint and Office 365 Metadata survey, currently unpublished. One of the findings, gathered from over 360 responses, indicates a renewed focus on improving enterprise search.

The focus seems to be on SharePoint. I thought SharePoint was a mishmash of content management, collaboration, and contacts along with documents created by the fortunate SharePoint users. Question: Is enterprise search conflated with SharePoint?

I would not make this connection.

If I understand this, the survey makes clear that some of the companies in the “sample” (method of selection not revealed) want better search. I want better information access, not search per se.

Each day I have dozens of software applications which require information access activity. I also have a number of “enterprise” search systems available to me. Nevertheless, the finding suggests to me that enterprise search is and has not been particularly good. If I put on my SharePoint sunglasses, I see a glint of the notion that SharePoint search is not very good. The dying sparks of Fast Search technology smoldering in fire at Camp DontWorkGud.

Images, videos, and audio content present me with a challenge. Enterprise search and metatagging systems struggle to deal with these content types. I also get odd ball file formats; for example, Framemaker, Quark, and AS/400 DB2 UDB files.

The survey points out that the problem with enterprise search is that indexing is not very good. That may be an understatement. But the remedy is not just indexing, is it?

After reading the news release, I formed the opinion that the fix is to use the type of system available from the survey sponsor Concept Searching. Is that a coincidence?

Frankly, I think the problems with search are more severe than bad indexing, whether performed by humans or traditional “smart” software.

According the news release, my view is not congruent with the survey or the implications of the survey data:

A new focus on enterprise search can be viewed as a step forward in the management and use of unstructured content. Organizations are realizing that the issue isn’t going to go away and is now impacting applications such as records management, security, and litigation support. This translates into real business currency and increases the risk of non-compliance and security breaches. You can’t find, protect, or use what you don’t know exists. For those organizations that are using, or intend to deploy, a hybrid environment, the challenges of leveraging metadata across the entire enterprise can be daunting, without the appropriate technology to automate tagging.

Real business currency. Is that money?

Are system administrators still indexing human resource personnel records, in process legal documents related to litigation, data from research tests and trials in an enterprise search system? I thought a more fine-grained approach to indexing was appropriate. If an organization has a certain type of government work, knowledge of that work can only be made available to those with a need to know. Is indiscriminate and uncontrolled indexing in line with a “need to know” approach?

Information access has a bright future. Open source technology such as Lucene/Solar/Searchdaimon/SphinxSearch, et al is a reasonable approach to keyword functionality.

Value-added content processing is also important but not as an add on. I think that the type of functionality available from BAE, Haystax, Leidos, and Raytheon is more along the lines of the type of indexing, metatagging, and coding I need. The metatagging is integrated into a more modern system and architecture.

For instance, I want to map geo-coordinates in the manner of Geofeedia to each item of data. I also want context. I need an entity (Barrerra) mapped to an image integrated with social media. And, for me, predictive analytics are essential. If I have the name of an individual, I want that name and its variants. I want the content to be multi-language.

I want what next generation information access systems deliver. I don’t want indexing and basic metatagging. There is a reason for Google’s investing in Recorded Future, isn’t there?

The future of buggy whip enterprise search is probably less of a “strategic application” and more of a utility. Microsoft may make money from SharePoint. But for certain types of work, SharePoint is a bit like Windows 3.11. I want a system that solves problems, not one that spawns new challenges on a daily basis.

Enterprise search vendors have been delivering so-so, flawed, and problematic functionality for 40 years. After decades of vendor effort to make information findable in an organization, has significant progress been made. DARPA doesn’t think search is very good. The agency is seeking better methods of information access.

What I see when I review the landscape of enterprise search is that today’s “leaders” (Attivio, BA Insight, Coveo, dtSearch, Exorbyte, among others) remind me of the buggy whip makers driving a Model T to lecture farmers that their future depends on the horse as the motive power for their tractor.

Enterprise search is a digital horse, an one that is approaching break down.

Enterprise search is a utility within more feature rich, mission critical systems. For a list of 20 companies delivering NGIA with integrated content processing, check out www.xenky.com/cyberosint.

Stephen E Arnold, March 20, 2015

Written by Stephen E. Arnold · Filed Under Business strategy, Enterprise search, Indexing, News | Comments Off on Enterprise Search Is Important: But Vendor Survey Fails to Make Its Case

Is Enterprise Search Exempt from Intellectual Dishonesty?

January 20, 2015

I read “Techmeme’s Gabe Rivera on Tech Media: A Lot of Intellectual Dishonesty.” I figured out that “intellectual dishonesty” covers a large swath of baloney information. I have been involved in “technology” since I was hired by Halliburton Nuclear in 1972. In that period, I have watched engineers try to explain to non-engineers the objective functions of processes, algorithms, systems, and methods. I learned quickly that those who were not informed had a tough time figuring out what the engineers were saying or “meant.” Thus, the task became recasting details into something easily understood. Yep, nothing like simplified nuclear fission. It’s just like boiling water over a campfire. There you go. Nuclear energy made simple.

This article is a brief interview with a Silicon Valley luminary. The point seems to be that today much of the information about technology is off the mark. Well, let me make this simple: Almost useless. Today, thanks to innovation and re-imagineering, anyone able to click a mouse button can assert, “I am a technologist.” Many mouse clickers add a corollary: “I can learn anything.” No doubt failed middle school teachers, unemployed webmasters, and knowledge management experts have confidence in their abilities. Gold stars in middle school affirm one’s excellence, right?

In this interview, there were two observations that I related to my field of interest: Information Access.

I noted this comment about technology information:

Another problem: lying by omission, hyperbole and other forms of intellectual dishonesty are creeping into more tech reporting.

Ah, lying, hyperbole, and “other forms of intellectual dishonesty.” Good stuff.

I found this remark on point as well:

Most of the people who can offer key insights for understanding the industry are not incentivized to write, so a lot of crucial knowledge just never appears online. It’s just passed along to certain privileged people in the know.

I think this means that those with high value information may not produce listicles every few days. Too bad.

So what about enterprise search? Some thoughts:

Consultants and experts who write what the prospects or the clients want to get money, consideration, or self aggrandizement. Dave Schubmehl, are you done recycling my research without permission?
Vendors who say almost anything to close a deal. That’s why enterprise search vendors hop from SharePoint utility to customer support to business intelligence to analytics. The idea is that once the money is in hand, the vendor can code up a good enough solution
Cheerleaders for failed concepts promise “value” or performance. The idea that knowledge management or innovation will be a direct consequence of finding information is only a partial truth.
Open source cheerleaders. Open source is one source of information access technology. Open source requires glue code and scripting and often costs as much as a proprietary solution when direct and indirect expenses are tallied and summed. But free is “good”, right?
Bloggers, experts, newly minted consultants, and unemployed English majors conclude that they are expert searchers and can learn anything.
Job seekers. I find some of the information available on LinkedIn and Slideshare quite amazing, fascinating, and unfortunately disheartening.
Unemployed search administrators. These folks want to use failure as a ladder to climb higher in their next job.

Net net: In enterprise search, the problems are significant because of the nature of human utterance. Those who are uninformed cater to the customers who may be uninformed. The result is the all-too-predictable rise and fall of companies like Delphes, Convera, Entopia, or Fast Search & Transfer, among many others. For example, Google tried to “fix” enterprise search with a locked down appliance. How is that working out?

The volume of misinformation, disinformation, and reformation makes accurate, objective analysis of search an almost impossible job. When everyone is an expert in search and content processing, most information about information access has almost zero knowledge value.

Stephen E Arnold, January 20, 2015

Written by Stephen E. Arnold · Filed Under Enterprise search, News | 1 Comment

Enterprise Search: Confusing Going to Weeds with Being Weeds

November 30, 2014

I seem to run into references to the write up by a “expert”. I know the person is an expert because the author says:

As an Enterprise Search expert, I get a lot of questions about Search and Information Architecture (IA).

The source of this remarkable personal characterization is “Prevent Enterprise Search from going to the Weeds.” Spoiler alert: I am on record as documenting that enterprise search is at a dead end, unpainted, unloved, and stuck on the margins of big time enterprise information applications. For details, read the free vendor profiles at www.xenky.com/vendor-profiles or, if you can find them, read one of my books such as The New Landscape of Search.

Okay. Let’s assume the person writing the Weeds’ article is an “expert”. The write up is about misconcepts [sic]; specifically, crazy ideas about what a 50 year plus old technology can do. The solution to misconceptions is “information architecture.” Now I am not sure what “search” means. But I have no solid hooks on which to hang the notion of “information architecture” in this era of cloud based services. Well, the explanation of information architecture is presented via a metaphor:

The key is to understand: IA and search are business processes, rather than one-time IT projects. They’re like gardening: It’s up to you if you want a nice and tidy garden — or an overgrown jungle.

Gentle reader, the fact that enterprise search has been confused with search engine optimization is one thing. The fact that there are a number of companies happily leapfrogging the purveyors of utilities to make SharePoint better or improve automatic indexing is another.

Let’s look at each of the “misconceptions” and ask, “Is search going to the weeds or is search itself weeds?”

The starting line for the write up is that no one needs to worry about information architecture because search “will do everything for us.” How are thoughts about plumbing and a utility function equivalent. The issue is not whether a system runs on premises, from the cloud, or in some hybrid set up. The question is, “What has to be provided to allow a person to do his or her job?” In most cases, delivering something that addresses the employee’s need is overlooked. The reason is that the problem is one that requires the attention of individuals who know budgets, know goals, and know technology options. The confluence of these three characteristics is quite rare in my experience. Many of the “experts” working enterprise search are either frustrated and somewhat insecure academics or individuals who bounced into a niche where the barriers to entry are a millimeter or two high.

Next there is a perception, asserts the “expert”, that search and information architecture are one time jobs. If one wants to win the confidence of a potential customer, explaining that the bills will just keep on coming is a tactic I have not used. I suppose it works, but the incredible turnover in organizations makes it easy for an unscrupulous person to just keep on billing. The high levels of dissatisfaction result from a number of problems. Pumping money into a failure is what prompted one French engineering company to buy a search system and sideline the incumbent. Endless meetings about how to set up enterprise systems are ones to which search “experts” are not invited. The information technology professionals have learned that search is not exactly a career building discipline. Furthermore, search “experts” are left out of meetings because information technology professionals have learned that a search system will consume every available resource and produce a steady flow of calls to the help desk. Figuring out what to build still occupies Google and Amazon. Few organizations are able to do much more that embrace the status quo and wait until a mid tier consultant, a cost consultant, or a competitor provides the stimulus to move. Search “experts” are, in my experience, on the outside of serious engineering work at many information access challenged organizations. That’s a good thing in my view.

The middle example is what the expert calls “one size fits all.” Yep, that was the pitch of some of the early search vendors. These folks packaged keyword search and promised that it would slice, dice, and chop. The reality of information, even for the next generation information access companies with which I work, focus on making customization as painless as possible. In fact, these outfits provide some ready-to-roll components, but where the rubber meets the road is providing information tailored to each team or individual user. At Target last night, my wife and I bought Christmas gifts for needy people. One of the gifts was a 3X sweater. We had a heck of a time figuring out if the store offered such a product. Customization is necessary for more and more every day situations. In organizations, customization is the name of the game. The companies pitching enterprise search today lag behind next generation information access providers in this very important functionality. The reason is that the companies lack the resources and insight needed to deliver. But what about information architecture? How does one cloud based search service differ from another? Can you explain the technical and cost and performance differences between SearchBlox and Datastax?

The penultimate point is just plain humorous: Search is easy. I agree that search is a difficult task. The point is that no one cares how hard it is. What users want are systems that facilitate their decision making or work. In this blog I reproduced a diagram showing one firm’s vision for indexing. Suffice it to say that few organizations know why that complexity is important. The vendor has to deliver a solution that fits the technical profile, the budget, and the needs of an organization. Here is the diagram. Draw your own conclusion:

The final point is poignant. Search, the “expert” says, can be a security leak. No, people are the security link. There are systems that process open source intelligence and take predictive, automatic action to secure networks. If an individual wants to leak information, even today’s most robust predictive systems struggle to prevent that action. The most advanced systems from Centripetal Networks and Zerofox offer robust systems, but a determined individual can allow information to escape. What is wrong with search has to do with the way in which provided security components are implemented. Again we are back to people. Information architecture can play a role, but it is unlikely that an organization will treat search differently from legal information or employee pay data. There are classes of information to which individuals have access. The notion that a search system provides access to “all information” is laughable.

I want to step back from this “expert’s” analysis. Search has a long history. If we go back and look at what Fulcrum Technologies or Verity set out to do, the journeys of the two companies are quite instructive. Both moved quickly to wrap keyword search with a wide range of other functions. The reason for this was that customers needed more than search. Fulcrum is now part of OpenText, and you can buy nubbins of Fulcrum’s 30 year old technology today, but it is wrapped in huge wads of wool that comprise OpenText’s products and services. Verity offered some nifty security features and what happened? The company chewed through CEOs, became hugely bloated, struggled for revenues, and end up as part of Autonomy. And what about Autonomy? HP is trying to answer that question.

Net net: This weeds write up seems to have a life of its own. For me, search is just weeds, clogging the garden of 21st century information access. The challenges are beyond search. Experts who conflate odd bits of jargon are the folks who contribute to confusion about why Lucene is just good enough so those in an organization concerned with results can focus on next generation information access providers.

Stephen E Arnold, November 30, 2014

Written by Stephen E. Arnold · Filed Under Editorial opinion, Enterprise search, News | Comments Off on Enterprise Search: Confusing Going to Weeds with Being Weeds

« Previous Page — Next Page »

Search the site
Subscribe to Beyond Search
Feature archive
News archive

Stephen E. Arnold monitors search, content processing, text mining and related topics from his high-tech nerve center in rural Kentucky. He tries to winnow the goose feathers from the giblets. He works with colleagues worldwide to make this Web log useful to those who want to go "beyond search". Contact him at sa [at] arnoldit.com. His Web site with additional information about search is arnoldit.com.

Categories
- 3D-Printing
- Acquisition
- Advertising
- Aggregation
- AI
- Alexa
- algorithms
- Amazon
- Amazonia
- Analytics
- Appliance
- Applications
- Audio
- Augmented Reality
- Big data
- Bing
- Bitcoin
- Bitext
- Book review
- Business intelligence
- Business process
- Business strategy
- Censorship
- Cloud computing
- Company Profile
- Conferences
- Connectors
- Consulting
- Consumer
- Content processing
- Copyright
- Corporate Concerns
- Cost
- Crawl
- Crowdfunding
- cryptocurrency
- Customer support
- Cyber OSINT
- cybercrime
- cybersecurity
- Dark Web
- DarkCyber
- Data
- Data mining
- Database
- Deepfakes
- Digital Assistant
- Digital Library
- E2EE
- ECommerce
- EDiscovery
- Editorial opinion
- Education
- Emoticons
- Enterprise
- Enterprise search
- Entity extraction
- Ethics
- Facebook
- Faceted search
- Factualities
- Feature
- Federated search
- Financial
- Fogint
- Google
- Governance
- Government
- Hackers
- healthcare
- IBM Watson
- Image search
- Indexing
- Infrastructure
- Innovation
- Integration
- intelware
- Interface
- Internet
- Interview
- Investment
- law enforcement
- Legal matters
- Library automation
- Management
- Marketing
- Mathematics
- Metadata
- Microsoft
- Mobile
- Natural language processing
- News
- NGIA
- Online (general)
- Open Access
- Open source
- OSINT
- Osint Radar
- Overflight
- Palantir
- Patents
- Personnel
- Podcast
- Policeware
- Portals
- Predictive coding
- Privacy
- Profile
- Publishing
- Quotation
- Real time search
- Reference tool
- Rich media
- Robot Writer
- Search
- Search enabled applications
- search engine
- Search quality
- Security
- Semantic
- Sentiment analysis
- SEO
- SharePoint
- Short Honks
- Smart Technology
- Social
- Social Media
- software
- Statistics
- Taxonomy
- Technology
- Text analytics
- Text processing
- Tools
- Tor
- Training
- Translation
- Twitter
- Uncategorized
- Unstructured Data
- User experience
- User Interface
- Vertical search
- Video
- visualization
- Voice search
- Voice technology
- Web 3
- Web Services
- Webinar
- Windows
- Work flow
- XML
- Yahoo

Beyond Search

The Equivalent of a Brexit

Facebook and Humans: Reality Is Not Marketing

Enterprise Search Revisionism: Can One Change What Happened

Enterprise Search Is a Growth Industry: No, Really

Software Market Begs for Integration Issue Relief

Improving the Preservica Preservation Process

Semantic Search Becomes Search Engine Optimization: That Is Going to Improve Relevance

Enterprise Search Is Important: But Vendor Survey Fails to Make Its Case

Is Enterprise Search Exempt from Intellectual Dishonesty?

Enterprise Search: Confusing Going to Weeds with Being Weeds

Search the site

Categories

Archives

Recent Posts

Meta

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Search the site

Categories

Archives

Recent Posts

Meta