Enterprise Search: Confusing Going to Weeds with Being Weeds

November 30, 2014

I seem to run into references to the write up by an “expert”. I know the person is an expert because the author says:

As an Enterprise Search expert, I get a lot of questions about Search and Information Architecture (IA).

The source of this remarkable personal characterization is “Prevent Enterprise Search from going to the Weeds.” Spoiler alert: I am on record as documenting that enterprise search is at a dead end, unpainted, unloved, and stuck on the margins of big time enterprise information applications. For details, read the free vendor profiles at www.xenky.com/vendor-profiles or, if you can find them, read one of my books such as The New Landscape of Search.

Okay. Let’s assume the person writing the Weeds article is an “expert”. The write up is about misconcepts [sic]; specifically, crazy ideas about what a 50 year plus old technology can do. The solution to misconceptions is “information architecture.” Now I am not sure what “search” means. But I have no solid hooks on which to hang the notion of “information architecture” in this era of cloud based services. Well, the explanation of information architecture is presented via a metaphor:

The key is to understand: IA and search are business processes, rather than one-time IT projects. They’re like gardening: It’s up to you if you want a nice and tidy garden — or an overgrown jungle.

Gentle reader, the fact that enterprise search has been confused with search engine optimization is one thing. The fact that there are a number of companies happily leapfrogging the purveyors of utilities to make SharePoint better or improve automatic indexing is another.

Let’s look at each of the “misconceptions” and ask, “Is search going to the weeds or is search itself weeds?”

The starting line for the write up is that no one needs to worry about information architecture because search “will do everything for us.” How are thoughts about plumbing and a utility function equivalent? The issue is not whether a system runs on premises, from the cloud, or in some hybrid set up. The question is, “What has to be provided to allow a person to do his or her job?” In most cases, delivering something that addresses the employee’s need is overlooked. The reason is that the problem is one that requires the attention of individuals who know budgets, know goals, and know technology options. The confluence of these three characteristics is quite rare in my experience. Many of the “experts” working in enterprise search are either frustrated and somewhat insecure academics or individuals who bounced into a niche where the barriers to entry are a millimeter or two high.

Next there is a perception, asserts the “expert”, that search and information architecture are one time jobs. If one wants to win the confidence of a potential customer, explaining that the bills will just keep on coming is a tactic I have not used. I suppose it works, but the incredible turnover in organizations makes it easy for an unscrupulous person to just keep on billing. The high levels of dissatisfaction result from a number of problems. Pumping money into a failure is what prompted one French engineering company to buy a search system and sideline the incumbent. Endless meetings about how to set up enterprise systems are ones to which search “experts” are not invited. The information technology professionals have learned that search is not exactly a career building discipline. Furthermore, search “experts” are left out of meetings because information technology professionals have learned that a search system will consume every available resource and produce a steady flow of calls to the help desk. Figuring out what to build still occupies Google and Amazon. Few organizations are able to do much more than embrace the status quo and wait until a mid tier consultant, a cost consultant, or a competitor provides the stimulus to move. Search “experts” are, in my experience, on the outside of serious engineering work at many information access challenged organizations. That’s a good thing in my view.

The middle example is what the expert calls “one size fits all.” Yep, that was the pitch of some of the early search vendors. These folks packaged keyword search and promised that it would slice, dice, and chop. The reality, even for the next generation information access companies with which I work, is a focus on making customization as painless as possible. In fact, these outfits provide some ready-to-roll components, but where the rubber meets the road is providing information tailored to each team or individual user. At Target last night, my wife and I bought Christmas gifts for needy people. One of the gifts was a 3X sweater. We had a heck of a time figuring out if the store offered such a product. Customization is necessary for more and more everyday situations. In organizations, customization is the name of the game. The companies pitching enterprise search today lag behind next generation information access providers in this very important functionality. The reason is that the companies lack the resources and insight needed to deliver. But what about information architecture? How does one cloud based search service differ from another? Can you explain the technical and cost and performance differences between SearchBlox and Datastax?

The penultimate point is just plain humorous: Search is easy. I agree that search is a difficult task. The point is that no one cares how hard it is. What users want are systems that facilitate their decision making or work. In this blog I reproduced a diagram showing one firm’s vision for indexing. Suffice it to say that few organizations know why that complexity is important. The vendor has to deliver a solution that fits the technical profile, the budget, and the needs of an organization. Here is the diagram. Draw your own conclusion:


The final point is poignant. Search, the “expert” says, can be a security leak. No, people are the security leak. There are systems that process open source intelligence and take predictive, automatic action to secure networks. If an individual wants to leak information, even today’s most robust predictive systems struggle to prevent that action. The most advanced systems from Centripetal Networks and Zerofox offer robust systems, but a determined individual can allow information to escape. What is wrong with search has to do with the way in which provided security components are implemented. Again we are back to people. Information architecture can play a role, but it is unlikely that an organization will treat search differently from legal information or employee pay data. There are classes of information to which individuals have access. The notion that a search system provides access to “all information” is laughable.

I want to step back from this “expert’s” analysis. Search has a long history. If we go back and look at what Fulcrum Technologies or Verity set out to do, the journeys of the two companies are quite instructive. Both moved quickly to wrap keyword search with a wide range of other functions. The reason for this was that customers needed more than search. Fulcrum is now part of OpenText, and you can buy nubbins of Fulcrum’s 30 year old technology today, but it is wrapped in huge wads of wool that comprise OpenText’s products and services. Verity offered some nifty security features and what happened? The company chewed through CEOs, became hugely bloated, struggled for revenues, and ended up as part of Autonomy. And what about Autonomy? HP is trying to answer that question.

Net net: This weeds write up seems to have a life of its own. For me, search is just weeds, clogging the garden of 21st century information access. The challenges are beyond search. Experts who conflate odd bits of jargon are the folks who contribute to confusion about why Lucene is just good enough so those in an organization concerned with results can focus on next generation information access providers.

Stephen E Arnold, November 30, 2014

Facial Recognition: A Clue for Dissemblers

November 29, 2014

The idea that numerical recipes can identify a person in video footage is a compelling one. I know one or two folks involved in law enforcement who would find a reliable, low cost, high speed solution very desirable.

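[Two images: a reverse engineered facial recognition face (left) and the Creature from the Black Lagoon (right)]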

The face on the left is a reverse engineered FR image. The chap on the right is the original Creature from the Black Lagoon. Toss in a tattoo and this Black Lagoon fellow would not warrant a second look at Best Buy on Cyber Monday.

I read “This Is What Happens When You Reverse Engineer Facial Recognition.” The internal data about an image is not a graduation photograph. The write up contains an interesting statement:

The resulting meshes were then 3D-printed, creating masks that could be worn by people, presenting cameras with an image that is no longer an actual face, yet still recognizable as one to software.

Does this statement point to a way to degrade the performance of today’s systems? A person wanting to be unrecognized could flip this reverse engineering process and create a mash up of two or more “faces.” Sure, the person would look a bit like the Creature from the Black Lagoon, but today’s facial recognition systems might be uncertain about who was wearing the mask.

Stephen E Arnold, November 29, 2014


Enterprise Search: Hidden and Intentional Limitations

November 29, 2014

Several years ago, an ArnoldIT team tackled a content management search system that had three characteristics: [1] The system could not locate content a writer saved to the system. [2] The search results lacked relevance that made sense to the user; that is, time sequences for versions of the article were in neither ascending nor descending order. [3] Response time was sluggish. Let’s look at each of these obvious problems.

The user wanted to access a document that required final touch ups. The article was not in the result set even when the writer entered the full title of the article. Examination of the system revealed that it lacked keyword search, relying on a training set of documents that did not contain some of the words in the title. The fix was to teach the user to search for articles using words and concepts that appeared in the body of the article. We also identified an indexing latency problem. The system lacked sufficient resources, so recent material was not in an index until the indexing system caught up. The organization did not have the money to add additional resources, so the writers were told, “Live with it.”
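To make the two failures concrete, here is a minimal sketch of a plain keyword index, the kind of lookup the system lacked. Everything in it (the `KeywordIndex` class, the `save` and `run_indexer` methods, the sample article) is a hypothetical illustration, not the client’s system. It shows only that a full title query finds an article once the indexer has processed it, and finds nothing while the article is still waiting in the indexing queue.

```python
from collections import defaultdict

PUNCTUATION = ".,:;!?\"'"


def tokenize(text):
    """Lowercase the text and split it into bare keyword terms."""
    terms = (word.strip(PUNCTUATION).lower() for word in text.split())
    return [term for term in terms if term]


class KeywordIndex:
    """Hypothetical keyword index with a queue that models indexing latency."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of documents containing it
        self.pending = []                 # saved documents not yet indexed

    def save(self, doc_id, title, body):
        """A writer saves an article; it is queued, not yet searchable."""
        self.pending.append((doc_id, title, body))

    def run_indexer(self):
        """The indexer catches up: title and body terms become searchable."""
        for doc_id, title, body in self.pending:
            for term in tokenize(title + " " + body):
                self.postings[term].add(doc_id)
        self.pending.clear()

    def search(self, query):
        """Return ids of documents that contain every term in the query."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result
```

In this sketch, calling `search` with the full title right after `save` returns an empty set; the same query succeeds after `run_indexer`. That gap is the latency the writers were told to live with.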


What is that vendor doing amidst the prospects? A happy quack to http://bit.ly/1CrUc81

The chaos in the relevance function was a result of a configuration error during an upgrade and subsequent reindexing. The fix was to reconfigure the system and reindex the content. Keep in mind that indexing required more than two weeks. The attitude of the client was, “It is what it is.”

The third problem was the response time. This particular system used a product from a Canadian vendor of search and content management systems. The firm acquires companies and then “milks” the system. The idea is that updating and bug fixing are expensive. The problem was exacerbated because the search system was a stub provided by another vendor. The result was that in order to get more robust performance, the client had to upgrade the OEM search system AND add computers, memory, and network infrastructure. The client lacked the money to take these actions.

What were the hidden limitations of this enterprise search system?

On the client side, there was a lack of understanding of the interdependencies of a complex system. The client lacked expertise and, more importantly, money to address the problems of an “as is” infrastructure. Moving the operation to the cloud was not possible due to security concerns, a lack of knowledge about what to do and how to do it, and a lack of initiative on the management team. The conclusion was, “The system is good enough.”

On the vendor side, the marketing team closing the deal did not disclose that the OEM search system was a stub designed to upsell the licensee the seven figure “fix”. The vendor also did not reveal that funds were not being invested in the system due to the vendor’s “milk the cow” philosophy.

I point out this situation because it applies to many vendors of enterprise search systems. The object of the game on the client side is to get a good enough system at the best possible price. Senior managers are rarely experts in search and often search Google anyway.

The vendor has zero incentive to change its business practices. Even with low cost options available, once a prospect becomes a customer, lock in usually keeps the account alive. When a switch is in the wind, the search vendor plays the “legacy” card, pointing out that there are some people who need the older system. As a result, the licensing organization ends up with multiple search systems. The demand for money just goes up and findability remains a bit of a challenge for employees.


I do not see a fix under the present enterprise search business model. Education does not work. Search conferences dodge the tough issues, preferring to pander to vendors who buy exhibit stands and sponsor lunch.

Something different is needed: Different vendors, different approaches, and different assemblies of technology.

That’s why next generation information access is going to pose a threat to companies that pitch enterprise search disguised as customer support, business intelligence, analysis systems, and eDiscovery tools.

At some point, the NGIA vendors will emerge as the go-to vendors. Like the promising but now quiet outfits like Hakia and Autonomy, search as it has been practiced for decades is rapidly becoming a digital Antikythera mechanism.

Stephen E Arnold, November 29, 2014

An Answer to the Legacy of Steve Jobs

November 28, 2014

The answer is, “Patent every possible thing in order to make the patent wall higher and thicker.”

The article sort of misses the point of my answer. Navigate to “Steve Jobs Lives on at the Patent Office.” The write up sees the situation in this way:

Deceased inventors can win patents if the approval process draws out, or when attorneys seek “continuations”—essentially new versions of old patents. And the more lawyers and money an inventor has, the more likely his ghost will rattle on. The estate of Jerome Lemelson, the sometimes-controversial independent inventor who came up with the bar code reader, received 96 patents following his death in 1997 at age 74.

Okay, okay. Apple, not Steve Jobs, is milking the cow. Patents unfortunately do not correlate with here and now financial success. I know of one really good example: IBM.

Some folks are confusing legal procedures with making money for someone other than lawyers.

I want to avoid that error. Also, would not life be better if Apple offered a search system that sort of worked?

Stephen E Arnold, November 28, 2014

The EU Parliament and How Google Works

November 28, 2014

The search engine optimization crowd is definitely excited about calls to break up the Google. You will want to read (when sitting down, of course) “Oh No They Didn’t: European Parliament Calls For Break Up Of Google.” I am not sure if this write up is about the vision of search in Europe or the view of the search engine optimization brigands.

The idea in Europe has to do with memories of big companies and the difficulty ruling bodies have in controlling them. Think IG Farben and certain US outfits in the Second World War. I assume the learnings from the Quaero investment and the market success of Dassault Exalead’s Internet search system and the more recent Quixotic Qwant.com, the adrenaline pumping Sinequa, and other European search efforts have made one fact clear: Google is the go to search system by a wide margin. How about 95 percent of the search traffic in Denmark, for example?

For the SEO crowd, the notion of splitting up Google is obviously a new idea. The write up states:

It’s clear there’s a lot of frustration — even exasperation — behind this vote and Europe’s seeming inability to date to “do anything about Google.” Europe has been unable to produce home-grown competitors that can challenge the online hegemony of internet companies such as Google and Facebook. The company’s PC market share is much higher in Europe than in the US and Android is the dominant smartphone operating system there by far.

Like an American pro football competition, there is a winner and Europe does not like the outcome. The SEO crowd owes its livelihood to Google’s indifference to objective search results. Don’t tip the apple cart, please.

In the view of the SEO crowd:

It’s very unlikely that the European Commission will actually try to “unbundle” Google’s search engine from the rest of the company. However it’s possible that in Europe Google will be compelled to unbundle its privacy policy and won’t be able to combine data-sets for personalization and ad-targeting purposes. We will also probably see some effort to curb Google’s control over Android as well.

I find it fascinating that the lessons of online are ones that have not yet been learned by either regulators or the search engine optimization wizards. The only thing missing is a for fee analysis of the search scene by one of the mid tier consulting outfits. Dave Schubmehl, are you at your iPhone’s touch screen keyboard?

Oh, no. Oh, yes. How Google works is the issue.

Stephen E Arnold, November 28, 2014

Tibco Integrates Attivio Features into Spotfire Analytics Platform

November 28, 2014

Tibco has upgraded its Spotfire analytics platform, we learn from “Content Analytics Now Available in TIBCO Spotfire” at MarketWatch. The press release reports:

“Now customers can connect to new sources of unstructured text-based data and discover trends, identify patterns, and derive new business insights for improved decision making. Fully integrated into the Spotfire UI, the new Spotfire product capability powered by Attivio’s Active Intelligence Engine, will deliver fast, comprehensive sentiment, content, and text analytics functionality.”

Attivio’s CEO expresses excitement over the integration of their Active Intelligence Engine into Spotfire, confident the combination will make it easy to analyze unstructured data and lead to “powerful business insights.” The award-winning platform is central to Attivio, which was founded in 2007 and is headquartered in Massachusetts. The write-up highlights a few new features:

“*Enhanced usability through Attivio’s search box into Spotfire dashboards, allowing for an intuitive experience for business users to search for new insights and analytic views from new content sources.

*Apply predictive analytics to human-created information, using Spotfire’s Predictive Modeling tools or by scripting in the R language using TIBCO Enterprise Runtime for R (TERR).

*Integrate content from Microsoft SharePoint leveraging Attivio’s SharePoint connector.”

Launched in 1997, Tibco serves up infrastructure and business intelligence solutions to businesses in several industries around the world. While the company is headquartered in Palo Alto, California, it maintains offices on several continents.

Cynthia Murrell, November 28, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

Now Entering the Age of Web Experience Management

November 28, 2014

As the Internet grows and evolves, the features users expect from search and content management systems are changing. SearchContentManagement addresses the shift in “Semantic Technologies Fuel the Web Experience Wave.” As the title suggests, writer Geoffrey Bock sees this shift as opening a new area with a new set of demands: “web experience management” (WEM) goes beyond “web content management” (WCM).

The inclusion of metadata and contextual information makes all the difference. For example, the information displayed by an airline’s site should, he posits, be different for a user working at their PC, who may want general information, and someone using their phone in the airport parking lot, where they probably need to check their gate number or see whether their flight has been delayed. (Bock is disappointed that none of the airlines’ sites yet work this way.)

The article continues:

“Not surprisingly, to make contextually aware Web content work correctly, a lot of intelligence needs to be added to the underlying information sources, including metadata that describes the snippets, as well as location-specific geo-codes coming from the devices themselves. There is more to content than just publishing and displaying it correctly across multiple channels. It is important to pay attention to the underlying meaning and how content is used — the ‘semantics’ associated with it.

“Another aspect of managing Web experiences is to know when you are successful. It’s essential to integrate tracking and monitoring capabilities into the underlying platform, and to link business metrics to content delivery. Counting page views, search terms and site visitors is only the beginning. It’s important for business users to be able to tailor metrics and reporting to the key performance indicators that drive business decisions.”

Bock supplies an example of one company, specialty-plumbing supplier Uponor, that is making good use of such “WEM” possibilities. See the article for more details on his strategy for leveraging the growing potential of semantic technology.

Cynthia Murrell, November 28, 2014


Mid Tier Consultants Try the Turkey Tactic

November 27, 2014

Entering 2015, the economy is not ripping along like some of the MBAs suggest. Life is gloomy for many keyword search, content management, and indexing system vendors. And for good reason. These technologies have run their course. Instead of being must have enterprise solutions, the functions are now utilities. The vendors of proprietary systems have to realize that free and open source systems provide “good enough” utility type functions.

Perhaps this brutal fact is the reason that search “expert” Dave Schubmehl recycled our research on open source software, tried to flog it on Amazon without my permission, and then quietly removed the reports based on ArnoldIT research. When a mid tier consulting firm cannot sell recycled research taken without permission for sale via Amazon for the quite incredible price of $3,500 for eight pages of information garbling our work, you know that times are tough for the mid tier crowd.


Are the turkeys the mid-tier consultants or their customers? Predictions about the future wrapped in the tin foil of jargon may not work like touts who pick horses. The difference between a mid tier consulting firm and a predictive analytics firm is more than the distance between an art history major and a PhD in mathematics with a Master’s in engineering and an undergraduate degree in history in my opinion.

Now the focus at the mid tier consulting firms is turning to the issue of raising turkeys. Turkeys are stellar creatures. Is it true that the turkey is the only fowl that will drown itself during a rain storm? My grandmother told me the birds will open their beaks and capture the rain. According to the Arnold lore, some lightning quick turkeys will drown themselves.

For 2015, the mid tier consultants want to get the Big Data bird moving. Also, look for the elegant IoT or Internet of Things to get the blue ribbon treatment. You can get a taste of this dish in this news release: “IDC Hosts Worldwide Internet of Things 2015 Predictions Web Conference.”

Yep, a Web conference. I call this a webinar, and I have webinar fatigue. The conference is intended to get the turkeys in the barn. Presumably some of the well heeled turkeys will purchase the IDC Future Scape report. When I mentioned this to a person with whom I spoke yesterday, I think that individual said, “A predictions conference. You are kidding me.” And, no, I wasn’t. Here’s the passage I noted:

“The Internet of Things will give IT managers a lot to think about,” said Vernon Turner, Senior Vice President of Research at IDC. “Enterprises will have to address every IT discipline to effectively balance the deluge of data from devices that are connected to the corporate network. In addition, IoT will drive tough organizational structure changes in companies to allow innovation to be transparent to everyone, while creating new competitive business models and products.”

I think I understand. “Every”, “tough organizational changes,” and “new competitive business models.” Yes. And the first example is a report with predictions.

When I think of predictions, I don’t think of mid tier consultants. I think of outfits like Recorded Future, Red Owl, and Haystax, among others. The predictions these companies output are based on data. Predictions from mid tier consultants are based on a wide range of methods. I have a hunch that some of these techniques include folks sitting around and asking, “Well, what do you think this Internet of Things stuff will mean?”

Give me the Recorded Future approach. Oh, I don’t like turkey. I am okay with macaroni and cheese. Basic but it lacks the artificial fizz with which some farmers charge their fowl.

Stephen E Arnold, November 27, 2014

Poor Search Equals Poor E-Sales

November 27, 2014

Logically this statement makes sense and if you have been paying attention to facts you already knew it:

“A recent study by the Baymard Institute, an independent web research institute with a focus on e-commerce usability and optimization, found that many of the top 50 U.S. e-commerce sites are lacking essential e-commerce search capabilities which is hindering current online sales.”

Please feel free to insert your favorite exasperation for pointing out the obvious. This is something that even an experienced online retail shopper could tell you. Digital Journal covers the story in “Baymard Institute Study Finds Major Problems With Search On Leading E-Commerce Sites.”

Baymard found that most users don’t like browsing through categories. The study also found that the search functions on these big e-retailers lack a spell check feature, do not support thematic or product searches, and require specific jargon.
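For readers wondering what a missing spell check means in practice, here is a minimal sketch that uses Python’s standard `difflib` module. The catalog terms, the `suggest` function, and the 0.75 cutoff are assumptions made for illustration; they do not come from the Baymard study or from any retailer’s system. The idea is simply to map a misspelled query term to the closest term the catalog actually contains before running the search.

```python
import difflib

# Hypothetical catalog vocabulary; a real site would derive this from its product data.
CATALOG_TERMS = ["sweater", "jacket", "blender", "headphones", "television"]


def suggest(query_term, vocabulary=CATALOG_TERMS, cutoff=0.75):
    """Return the catalog term closest to query_term, or None if nothing is close."""
    term = query_term.lower()
    if term in vocabulary:
        return term  # already a known term, no correction needed
    matches = difflib.get_close_matches(term, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

A shopper who types “sweatter” would be steered to “sweater” rather than to an empty result page. A production system would more likely build its suggestions from query logs and product data than from a fixed word list.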

EasyAsk responded to Baymard’s study with a white paper detailing how e-commerce Web sites can improve their search features to improve sales. One way is supporting natural language search. The white paper is available for free download.

Whitney Grace, November 27, 2014

Elasticsearch Partners With Cisco

November 27, 2014

Cisco is a popular solution for enterprise communication software and now Elasticsearch has joined its team. Unlike other partnerships, which involve either company buyouts or some form of give and take, Cisco is remaining independent. The news came to us from Market Wired in “Elasticsearch Joins the Cisco Solution Partner Program.” Being a member of the Cisco Solution Partner Program allows Elasticsearch to access Cisco’s Internet of Everything network. The program also gives Elasticsearch the opportunity to quickly create and deploy solutions to the Internet of Everything network.

Another boon of Elasticsearch teaming with Cisco is that it brings Mozilla into the circle. Mozilla already uses Elasticsearch and Kibana in MozDef, its open source security information and event management platform. Put the three together with Cisco’s UCS infrastructure, and the result is real-time indexing, search, real-time analytics, and security protection.

“‘Elasticsearch, Kibana and Cisco UCS allowed us to quickly stand up an infrastructure we could use to build MozDef, and support our needs for rapid expansion, querying, indexing and replication of data traffic,’ said Jeff Bryner, Intrusion Detection Engineer at Mozilla. ‘Elasticsearch provides us the flexibility and speed to handle our increasing stream of event data, which we can search and visualize in Kibana and then use MozDef to perform incident response, alerting and advanced visualizations to protect Mozilla’s data, systems and customers.’”

The Cisco Solution Partner Program is a win-win situation for all participants. The partners can draw on each other’s strengths and offer a wider array of services.

Whitney Grace, November 27, 2014
