Enterprise Search: You Cannot Do It Yourself, People.

July 31, 2015

I love write ups like “Don’t Settle When It Comes to Enterprise Search Platforms.” These articles are designed to equip consulting firms with the marketing flim flam which positions each as an “expert” in enterprise information access. I would not be surprised to find copies of this article in the peddler kit of search sales professionals.

The main point of the write up is that enterprise search is a “platform.” Because there are options, no self respecting company will try to implement search without the equivalent of the F Troop in mid tier or below consultants.

I noted:

Let’s look at two very common workarounds some have tried, and then we will talk about why you must go with a reputable developer when you make your final decision.

When I read this, I wondered if the “expert” were familiar with the Maxxcat line of enterprise search systems or the Blossom hosted solution.

The write up dismisses an open source solution, apparently unaware of the research by Diomidis Spinellis and Vaggelis Giannikas published in the Journal of Systems and Software, March 2012, pages 666 to 682. That’s okay. My hunch is that those finding the “Don’t Settle” article compelling are not likely to be interested in researchy type stuff.

One of the more interesting segments in the write up is the assertion that scalability is a “given.” Hmmm. In my experience, there are some ongoing enterprise search challenges: Scalability is one facet of a nest of vipers which includes my favorite reptile, indexing latency.

The article states:

Open source platforms are only as scalable as their code allows, so if the person who first made it didn’t have your company’s needs in mind, you’ll be in trouble. Even if they did, you could run into a problem where you find out that scaling up actually reveals some issues you hadn’t encountered before. This is the exact kind of event you want to avoid at all costs.

I don’t want to rain on this parade of “information,” but every enterprise search system which I have had the pleasure of procuring, managing, investigating, and analyzing has scalability problems.

The reason is simple: The volume of changed information and the flow of new information goes up. Whatever one starts with is rather rapidly choked. The solutions are painful: Spend more or index less.

I am not confident that one who follows the advice of certain experts will find his or her enterprise search journey pleasant. On the other hand, there are opportunities as Uber drivers one can pursue.

Stephen E Arnold, July 31, 2015

PowerPoint Enabled Big Data Presenters Rejoice

July 27, 2015

Navigate to “A Plethora of Big Data Infographics.” Note that the original write up misspells “plethora” as “pletora” but, as many in Big Data say, “it is close enough for horseshoes.”

I quit browsing after a baker’s dozen of these puppies. If you want to be an expert in Big Data, these charts will do the trick. I would steer clear of a person with a PhD in statistics, however.

Stephen E Arnold, July 27, 2015

Forbes and Some Big Data Forecasts

July 26, 2015

Short honk: For fee, mid tier consultants have had their thunder stolen. Forbes, the capitalist tool, wants to make certain its readers know how juicy Big Data is as a market. Navigate to “Roundup Of Analytics, Big Data & Business Intelligence Forecasts And Market Estimates, 2015.”

The write up summarizes the eye watering examples of spreadsheet fever’s impact on otherwise semi-rational MBAs, senior managers, and used car sales professionals. IDC, without the inputs of Dave Schubmehl, comes up with a spectacular number: $125 billion in 2015.

Sounds good, right?

The data will find their way into innumerable PowerPoint presentations. Snag ‘em while you can.

Stephen E Arnold, July 26, 2015

Big Data Basics: Garbage In, Garbage Out Still a Problem

July 20, 2015

The person writing “Data Integrity: A Sequence of Words Lost in the World of Big Data” appears to be older than 18. I don’t hear too many young wizards nattering about data integrity. The operative concept is that with enough data, the data work out the bumps in the Big Data tapestry. The cloth may have leaves and twigs in it. But when you make the woven object big enough and hang it on a wall in a poorly illuminated chateau, who can tell? Few visitors demand a ladder and a lanthorn to inspect the handiwork.

According to the write up:

The purpose of this post is to highlight the necessity to keep data clean and orderly so that the results of the analysis are reliable and trustworthy – if data integrity is intact, information derived from this data will be trustworthy resulting in actionable information.

Why tackle this topic in a blog for Big Data professionals?

Answer: No one pays much attention. The author saddles up and does the Don Quixote gallop at the Big Data hyperbole windmill.

The article includes a partial list of questions to ask and, keep this in mind, gentle reader, to answer. One example: “Are values outside of acceptable domain values?”
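For readers who want to see what answering that question looks like, here is a minimal sketch in Python. The record layout, the `status` field, and the allowed values are my own hypothetical examples, not anything taken from the article.

```python
def out_of_domain(records, field, allowed):
    """Return the records whose value for `field` is missing or not in `allowed`."""
    return [r for r in records if r.get(field) not in allowed]


# Hypothetical sample data: two clean rows, one sloppy value, one missing value.
records = [
    {"id": 1, "status": "active"},
    {"id": 2, "status": "ACTIVE "},   # wrong case plus a trailing space
    {"id": 3, "status": "closed"},
    {"id": 4},                        # field absent altogether
]
allowed = {"active", "closed"}

bad = out_of_domain(records, "status", allowed)
for r in bad:
    print("Out of domain:", r)
```

The check is trivial, which is the point: the hard part is deciding what the acceptable domain is and then running the check before the analysis, not after.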

I found this article refreshing. Take a gander.

Stephen E Arnold, July 20, 2015

Holy Cow. More Information Technology Disruptors in the Second Machine Age!

July 11, 2015

I read a very odd write up called “The Five Other Disruptors about to Define IT in the Second Machine Age.”

Whoa, Nellie. The second machine age. I thought we were in the information age. Dorky machines are going to be given an IQ injection with smart software. The era is defined by software, not machines. You know. Mobile phones are pretty much a commodity with the machine part defined by fashion and brand and, of course, software.

So a second machine age. News to me. I am living in the second machine age. Interesting. I thought we had the Industrial Revolution, then the boring seventh grade mantra of manufacturing, the nuclear age, the information age, etc. Now we are doing the software thing.

My hunch is that the author of this strange article is channeling Shoshana Zuboff’s In the Age of the Smart Machine. That’s okay, but I am not convinced that the one, two thing is working for me.

Let’s look at the disruptors which the article asserts are just as common as the wonky key fob I have for my 2011 Kia Soul. A gray Kia soul. Call me exciting.

Here are the four disruptors that, I assume, are about to remake current information technology models. Note that these four disruptors are “about to define IT.” These are like rocks balanced above Alexander the Great’s troops as they marched through the valleys in what is now Afghanistan. A 12 year old child could push the rock from its perch and crush a handful of Macedonians. Potential and scary enough to help Alexander to decide to march in a different direction. Hello, India.

These disruptors are the rocks about to plummet into my information technology department. The department, I wish to point out, works from their hovels and automobiles, dialing in when the spirit moves them.

Here we go:

  • Big Data
  • Cloud
  • Mobile
  • Social

I am not confident that these four disruptors have done much to alter my information technology life, but if one is young, I assume that these disruptors are just part of the everyday experience. I see grade school children poking their smart phones when I take my dogs for their morning constitutional.

But the points which grabbed my attention were the “five other disruptors.” I had to calm down because I assumed I had a reasonable grasp on disruptors important in my line of work. But, no. These disruptors are not my disruptors.

Let’s look at each:

The Trend to NoOps

What the heck does this mean? In my experience, experienced operations professionals are needed, even at some of the smart outfits I used to work with.

Agility Becomes a First Class Citizen

I did not know that the ability to respond to issues and innovations was not essential for a successful information technology professional.

Identity without Barriers

What the heck does this mean? The innovations in security are focused on ensuring that barriers exist and are not improperly gone through. The methods have little to do with an individual’s preferences. The notion of federation is an interesting one. In some cases, federation is one of the unresolved challenges in information technology. Mixing up security, “passwords,” and disparate content from heterogeneous systems is a very untidy serving of fruit salad.

Thinking about information technology after reading Rush’s book of farmer flummoxing poetry. Is this required reading for a mid tier consultant? I wonder if Dave Schubmehl has read it? I wonder if some Gartner or Forrester consultants have dipped into its meaty pages. (No pun intended.)

IT Goes Bi Modal?

What the heck does this mean again? Referencing Gartner is a sure fire way to raise grave concerns about the validity of the assertion. But bi-modal. Two modes. Like zero and one. Organizations have to figure out how to use available technology to meet that organization’s specific requirements. The problem of legacy and next generation systems defines the information landscape. Information technology has to cope with a fuzzy technology environment. Bi modal? Baloney.

The Second Machine Age

Okay, I think I understand the idea of a machine age. The problem is that we are in a software and information datasphere. The machine thing is important, but it is software that allows legacy systems to coexist with more with it approaches. This silly number of ages makes zero sense and is essentially a subjective, fictional, metaphorical view of the present information technology environment.

Maybe that’s why Gartner hires poets and high profile publications employ folks who might find an hour discussing the metaphorical implications of “bare ruined choirs.”

None of these five disruptions makes much sense to me.

My hunch is that you, gentle reader, may be flummoxed as well.

Stephen E Arnold, July 11, 2015

Enterprise Search and the Mythical Five Year Replacement Cycle

July 9, 2015

I have been around enterprise search for a number of years. In the research we did in 2002 and 2003 for the Enterprise Search Report, my subsequent analyses of enterprise search both proprietary and open source, and the ad hoc work we have done related to enterprise search, we obviously missed something.

Ah, the addled goose and my hapless goslings. The degrees, the experience, the books, and the knowledge had a giant lacuna, a goose egg, a zero, a void. You get the idea.

We did not know that an enterprise licensing an open source or proprietary enterprise search system replaced that system every 60 months. We did document the following enterprise search behaviors:

  • Users express dissatisfaction about any installed enterprise search system. Regardless of vendor, anywhere from 50 to 75 percent of users find the system a source of dissatisfaction. That suggests that enterprise search is not pulling the hay wagon for quite a few users.
  • Organizations, particularly the Fortune 500 firms we polled in 2003, had more than five enterprise search systems installed and in use. The reason for the grandfathering is that each system had its ardent supporters. Companies just grandfathered the system and looked for another system in the hopes of finding one that improved information access. No one replaced anything was our conclusion.
  • Enterprise search systems did not change much from year to year. In fact, the fancy buzzwords used today to describe open source and proprietary systems have been in use since the early 1980s. Dig out some of Fulcrum’s marketing collateral or the explanation of ISYS Search Software from 1986 and look for words like clustering, automatic indexing, semantics, etc. A short cut is to read some of the free profiles of enterprise search vendors on my Web site.

I learned about a white paper, which is 21st century jargon for a marketing essay, titled “Best Practices for Enterprise Search: Breaking the Five-Year Replacement Cycle.” The write up comes from a company called Knowledgent. The company describes itself this way on its Who We Are Web page:

Knowledgent [is] a precision-focused data and analytics firm with consistent, field-proven results across industries.

The essay begins with a reference to Lexis, which along with Don Wilson (may he rest in peace) and a couple of colleagues founded. The problem with the reference is that the Lexis search engine was not an enterprise search and retrieval system. The Lexis OBAR system (Ohio State Bar Association) was tailored to the needs of legal researchers, not general employees. Note that Lexis’ marketing in 1973 suggested that anyone could use the command line interface. The OBAR system required content in quite specific formats for the OBAR system to index it. The mainframe roots of OBAR influenced the subsequent iterations of the LexisNexis text retrieval system: Think mainframes, folks. The point is that OBAR was not a system that was replaced in five years. The dog was in the kennel for many years. (For more about the history of Lexis search, see Bourne and Hahn, A History of Online Information Services, 1963-1976.) By 2010, LexisNexis had migrated to XML and moved from mainframes to lower cost architectures. But the OBAR system’s methods can still be seen in today’s system. Five years. What are the supporting data?

The white paper leaps from the five year “assertion” to an explanation of the “cycle.” In my experience, what organizations do is react to an information access problem and then begin a procurement cycle. Increasingly, as the research for our CyberOSINT study shows, savvy organizations are looking for systems that deliver more than keyword and taxonomy-centric access. Words just won’t work for many organizations today. More content is available in videos, images, and real time almost ephemeral “documents” which can be difficult to capture, parse, and make findable. Organizations need systems which provide usable information, not more work for already overextended employees.

The white paper addresses the subject of the value of search. In our research, search is a commodity. The high value information access systems go “beyond search.” One can get okay search in an open source solution or whatever is baked in to a must have enterprise application. Search vendors have a problem because after decades of selling search as a high value system, the licensees know that search is a cost sinkhole and not what is needed to deal with real world information challenges.

What “wisdom” does the white paper impart about the “value” of search? Here’s a representative passage:

There are also important qualitative measures you can use to determine the value and ROI of search in your organization. Surveys can quickly help identify fundamental gaps in content or capability. (Be sure to collect enterprise demographics, too. It is important to understand the needs of specific teams.) An even better approach is to ask users to rate the results produced by the search engine. Simply capturing a basic “thumbs up” or “thumbs down” rating can quickly identify weak spots. Ultimately, some combination of qualitative and quantitative methods will yield an estimate of  search, and the value it has to the company.

I have zero clue how this set of comments can be used to justify the direct and indirect costs of implementing a keyword enterprise search system. The advice is essentially irrelevant to the acquisition of a more advanced system from a leading edge next generation information access vendor like BAE Systems (NetReveal), IBM (not the Watson stuff, however), or Palantir. The fact underscored by our research over the last decade is tough to dispute: Connecting an enterprise search system to demonstrable value is a darned difficult thing to accomplish.

It is far easier to focus on a niche like legal search and eDiscovery or the retrieval of scientific and research data for the firm’s engineering units than to boil the ocean. The idea of “boil the ocean” is that a vendor presents a text centric system (essentially a one trick pony) as an animal with the best of stallions, dogs, tigers, and grubs. The spam about enterprise search value is less satisfying than the steak of showing that an eDiscovery system helped the legal eagles win a case. That, gentle reader, is value. No court judgment. No fine. No PR hit. A grumpy marketer who cannot find a Web article is not value no matter how one spins the story.

Keyword Search Is Not Productive. Who Says?

June 30, 2015

I noticed a flurry of tweets pointing to a diagram which maps out the Future of Search. Direct your attention to this assertion:

As amount of data grows, keyword search is becoming less productive.

Now look at what will replace keyword search:

  • Social tagging
  • Automatic semantic tagging
  • Natural language search
  • Intelligent agents
  • Web scale reasoning.

The idea is that we will experience a progression through these “operations” or “functions.” The end point is “The Intelligent Web” and the Web scale reasoning approach to information access.

Interesting. But I am not completely comfortable with this analysis.

Let me highlight four observations and then leave you to your own sense of what the Web will become as the amount of data increases.

First, keyword search is a utility function, and it will become ubiquitous. It will not go away or be forgotten. Keyword search will just appear in more and more human machine interactions. Telling your automobile to call John is keyword search. Finding an email is often a matter of plugging a couple of words into the Gmail search box.

Second, more data does translate to programmers lacing together algorithms to deliver information to users. The idea is that a mobile device user will just “get” information. This is a practical response to the form factor, methods to reduce computational loads imposed by routine query processing, and the human desire for good enough information. The information just needs to be good enough which will work for most people. Do you want your child’s doctor to take automatic outputs if your child has cancer?

Third, for certain types of information access, the focus is shifting, as it should, from huge flows of data to chopping flows down into useful chunks. Governments archive intercepts because the computational demands of processing information in real time for large numbers of users who need real time access are an issue. As data volume grows, computing horsepower is laboring to keep pace. Short cuts are, therefore, important. But most of the short cuts depend on having a question to answer. Guess what? Those short cuts are often keyword queries. The human may not be doing keyword searching, but the algorithms are.

Fourth, some types of information require both old fashioned Boolean keyword search and retrieval AND the manual, time consuming work of human specialists. In my experience, algorithms are useful, but there are subjects which require the old fashioned methods of querying, reading, researching, analyzing, and discussing. Most of these functions are keyword centric.
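To make the “old fashioned” part concrete, here is a minimal sketch of Boolean AND retrieval over a tiny inverted index. It is an illustration under my own assumptions (whitespace tokenization, three made-up documents, hypothetical function names), not a description of any vendor’s engine.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def boolean_and(index, *terms):
    """Return ids of documents that contain every one of the query terms."""
    postings = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*postings) if postings else set()


# Hypothetical documents, purely for illustration.
docs = {
    1: "call john about the contract",
    2: "john sent the email",
    3: "contract email from legal",
}
index = build_index(docs)
hits = boolean_and(index, "john", "contract")
print(sorted(hits))
```

Whatever sits on top, be it a voice command or a smart agent, something very much like this intersection of posting lists is usually doing the lookup underneath.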

In short, keyword queries can be dismissed or dressed up in fancy jargon. I don’t think the method is going away too quickly. Charts and subjective curves are one thing. Real world information interaction is another.

Stephen E Arnold, June 30, 2015

Forrester: Join Us in the Revolution

June 28, 2015

Err, I am not a revolutionary. The term evokes memories and thoughts which I find uncomfortable. Revolution, Forrester, IS/ISIL/Daesh. Shiver.

“‘Big Data’ Has Lost Its Zing – Businesses Want Insight And Action” is one of those marketing, mid tier consulting pronouncements. Most of these are designed to stimulate existing customers to buy more expertise or lure those with problems which the management team cannot solve to the door of an expert who purports to have the answer.

I highlighted this passage in pale yellow with my trusty Office Depot highlighter:

I saw it coming last year. Big data isn’t what it used to be. Not because firms are disillusioned with the technology, but rather because the term is no longer helpful. With nearly two-thirds of firms having implemented or planning to implement some big data capability by the end of 2015, the wave has definitely hit. People have bought in. But that doesn’t mean we find many firms extolling the benefits they should be seeing by now; even early adopters still have problems across the customer lifecycle.

Big Data faces challenges because users want accurate, reliable outputs. News?

Stephen E Arnold, June 28, 2015

What Twitter Should Do: The New York Times Opines with Woulda, Coulda, Shoulda Ideas

June 14, 2015

Well, advice from the gray lady about what a digital company should do is fascinating. Frankly, I would be more inclined to go with Snoop Dogg than a newspaper which seems to have made floundering and gesticulating its principal business strategy since Jeff Pemberton walked out the door 40 years ago.


Navigate to “For Twitter, Future Means Here and Now.” Keep in mind that this link may require you to pay money or go on an Easter Egg Hunt to locate a hard copy of the newspaper. Not my problemo, gentle reader. It is the dead tree New York Times’ approach to information.

Here’s one of the passages I circled in yellow, with a black Sharpie exclamation point next to the sentences:

Twitter, as a service, is many things to many people at different times. It is one of the world’s best sources for news and for jokes about news, a playground for professional networking, and a haven for that most human of pastimes, idle gossip. But because the service offers so many uses, Twitter, as a company, has had trouble focusing on one purpose for which it should aim to excel. The lack of concentration has damaged its prospects with users, investors and advertisers. Choosing a single intent for Twitter — and working to make that a reality — ought to be the next chief’s main task. Among the many uses that Twitter fulfills as a social network, there is one it is uniquely suited for: as a global gathering space for live events. When something goes down in the real world — when a plane crashes, an earthquake strikes, a basketball game gets crazy, or Kanye West hijacks an awards show — Twitter should aim to become the first and only app that people load up to comment on the news.

There you go. Make Twitter into a human intermediated version of the New York Times, lite edition. More data, less filling, and you trim your IQ as well.

I find it fascinating that journalistic enterprises in the midst of revenue, profit, and innovation swamps have advice to give to digital companies. I wonder if the gray lady assumes that the stakeholders, Twitter management, and the advisers to the firm have failed to craft options, ideas, tactics, and strategies.

My hunch is that like many Internet centric communication services one rides a curve up due to novelty and apparent utility. Then a new thing comes along like WhatsApp or Jott, and the potential users of the older service just surf newness. Once the cachet fades, a phenomenon with which the New York Times may be familiar, the options just don’t deliver.

Amusing to me, however.

Stephen E Arnold, June 14, 2015

IDC: Knowledge Management and Knowledge Quotients

June 2, 2015

IDC tried to sell some of my work on Amazon without my permission. Much lawyering ensued, and IDC removed the $3,500 eight page heavily edited report about Attivio. I suppose that is a form of my knowledge management expertise: But $3,500 for eight pages without my caveats about Attivio? Goodness gracious. $3,500 for eight pages on Amazon, a company I describe as a digital WalMart.

I then wrote a humorous (to me), Swiftian analysis of an IDC report about something called a knowledge quotient. I write a column about knowledge management, and I found the notion of the KQ intellectually one of the lighter, almost diaphanous, IDC information molecules.

Am I too harsh? No, because now there is more evidence for my tough love approach to IDC and its KQ content marketing jingoism.

Navigate to “Where to for Knowledge Management in 2015: IDM Reader Survey.” The survey may or may not be spot on. Some of the data undermine the IDC KQ argument and raise important questions about those who would “manage knowledge.” Also, I had to read the title a couple of times to figure out what IDC’s expert was trying to communicate. The where to for is particularly clumsy to me.

I noted this passage:

“The challenge is for staff being able to find the time to contribute and leverage the knowledge/information repositories and having technology systems that are intuitive putting the right information that their fingertips, instead of having to wade through the sea of information spam.”

Ah, ha. KM is about search.

Wait. Not so fast. I highlighted this statement:

Technology is making it easier to integrate systems and connect across traditional boundaries, and social media has boosted people’s expectations for interaction and feedback. The result is that collaboration across the extended value chain is becoming the new normal.

Yikes. A revelation. KM is about social collaboration.

No, no. Another speed bump. I marked this insight too:

“There is also a fair gap between knowledge of the theoretical and knowledge of how things actually work. It is easy to say we should assign metadata to information to increase its discovery but if that metadata should really be more of a folksonomy, some systems and approaches are far too restrictive to enable this. Semantics is also a big issue.”

Finally. KM is about indexing and semantics. Yes, the info I needed.

Wrong again. I circled this brilliant gem:

“Knowledge management has probably lost it momentum as the so-called measurement tools are really measuring best practice which in turn is an average. Perhaps the approach should be along the lines of “Communities of Process” where there is a common objective but various degrees and level of participation but collectively provide a knowledge pool,” he [survey participant]observed.

The write up continues along this rocky road of generalizations and buzzwords.

The survey data make three things clear to me:

  • The knowledge quotient jargon is essentially a scoop of sales Jello, Jack Benny’s long suffering sponsor
  • Knowledge is so broad IDC’s attempt to clarify gave me the giggles
  • Workers know that knowledge has value, so workers protect it with silos.

I assume that experts cooked up the knowledge quotient notion. The pros running the survey reported data which suggests that knowledge management is a bit of a challenge.

Perhaps IDC experts will coordinate their messaging in the future? In my opinion, two slabs of spam do not transmogrify into prime rib.

Little wonder IDC contracts is unable to function. One of its officers (Dave Schubmehl) resells my research on Amazon without my permission at $3,500 per eight pages, edited to remove the considerations Attivio warranted from my team. Then an IDC research unit provides data which strike me as turning the silly KQ thing into search engine optimization corn husks.

Is IDC able to manage its own knowledge processes using its own theories and data? Perhaps IDC should drop down a level and focus on basic business processes? Yet IDC’s silos appear before me, gentle reader, and the silos are built from hefty portions of a mystery substance. Could it be consulting spam, to use IDC’s own terminology?

Stephen E Arnold, June 2, 2015

