
Featured

Enterprise Search and the Mythical Five Year Replacement Cycle

I have been around enterprise search for a number of years. In the research we did in 2002 and 2003 for the Enterprise Search Report, in my subsequent analyses of enterprise search systems both proprietary and open source, and in the ad hoc work we have done related to enterprise search, we obviously missed something.

Ah, the addled goose and my hapless goslings. The degrees, the experience, the books, and the knowledge had a giant lacuna, a goose egg, a zero, a void. You get the idea.

We did not know that an enterprise licensing an open source or proprietary enterprise search system replaced that system every 60 months. We did document the following enterprise search behaviors:

  • Users express dissatisfaction with any installed enterprise search system. Regardless of vendor, anywhere from 50 to 75 percent of users find the system a source of dissatisfaction. That suggests that enterprise search is not pulling the hay wagon for quite a few users.
  • Organizations, particularly the Fortune 500 firms we polled in 2003, had more than five enterprise search systems installed and in use. Each system had its ardent supporters, so companies simply grandfathered the incumbent and licensed yet another system in the hope of finding one that improved information access. Our conclusion: no one replaced anything.
  • Enterprise search systems did not change much from year to year. In fact, the fancy buzzwords used today to describe open source and proprietary systems have been in use since the early 1980s. Dig out some of Fulcrum’s marketing collateral or the explanation of ISYS Search Software from 1986 and look for words like clustering, automatic indexing, semantics, etc. A shortcut is to read some of the free profiles of enterprise search vendors on my Xenky.com Web site.

I learned about a white paper, which is 21st century jargon for a marketing essay, titled “Best Practices for Enterprise Search: Breaking the Five-Year Replacement Cycle.” The write up comes from a company called Knowledgent. The company describes itself this way on its Who We Are Web page:

Knowledgent [is] a precision-focused data and analytics firm with consistent, field-proven results across industries.

The essay begins with a reference to Lexis, which Don Wilson (may he rest in peace) and a couple of colleagues founded. The problem with the reference is that the Lexis search engine was not an enterprise search and retrieval system. The Lexis OBAR (Ohio Bar Automated Research) system, built for the Ohio State Bar Association, was tailored to the needs of legal researchers, not general employees. Note that Lexis’ marketing in 1973 suggested that anyone could use the command line interface. In practice, OBAR required content in quite specific formats before it could be indexed. The mainframe roots of OBAR influenced the subsequent iterations of the LexisNexis text retrieval system: Think mainframes, folks. The point is that OBAR was not a system that was replaced in five years. The dog was in the kennel for many years. (For more about the history of Lexis search, see Bourne and Hahn, A History of Online Information Services, 1963-1976.) By 2010, LexisNexis had migrated to XML and moved from mainframes to lower cost architectures. But the OBAR system’s methods can still be seen in today’s system. Five years. What are the supporting data?

The white paper leaps from the five year “assertion” to an explanation of the “cycle.” In my experience, what organizations do is react to an information access problem and then begin a procurement cycle. Increasingly, as the research for our CyberOSINT study shows, savvy organizations are looking for systems that deliver more than keyword and taxonomy-centric access. Words just won’t work for many organizations today. More content is available in videos, images, and real-time, almost ephemeral “documents” which can be difficult to capture, parse, and make findable. Organizations need systems which provide usable information, not more work for already overextended employees.

The white paper addresses the subject of the value of search. In our research, search is a commodity. The high value information access systems go “beyond search.” One can get okay search in an open source solution or whatever is baked into a must-have enterprise application. Search vendors have a problem because after decades of selling search as a high value system, the licensees know that search is a cost sinkhole and not what is needed to deal with real world information challenges.

What “wisdom” does the white paper impart about the “value” of search? Here’s a representative passage:

There are also important qualitative measures you can use to determine the value and ROI of search in your organization. Surveys can quickly help identify fundamental gaps in content or capability. (Be sure to collect enterprise demographics, too. It is important to understand the needs of specific teams.) An even better approach is to ask users to rate the results produced by the search engine. Simply capturing a basic “thumbs up” or “thumbs down” rating can quickly identify weak spots. Ultimately, some combination of qualitative and quantitative methods will yield an estimate of the use of search, and the value it has to the company.
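To be fair, the mechanical part of this advice is trivial to implement. Here is a minimal sketch, assuming a hypothetical log of per-query thumbs ratings (the queries, field names, and threshold are my inventions, not the white paper’s):

```python
from collections import defaultdict

# Hypothetical feedback log: (query, vote) pairs captured from a
# thumbs-up / thumbs-down widget on the results page. Data is invented.
feedback = [
    ("quarterly sales report", "down"),
    ("quarterly sales report", "down"),
    ("vacation policy", "up"),
    ("vacation policy", "up"),
    ("sso configuration", "down"),
]

# Tally votes per query.
tallies = defaultdict(lambda: {"up": 0, "down": 0})
for query, vote in feedback:
    tallies[query][vote] += 1

# Flag "weak spots": queries where most votes are thumbs down.
for query, t in sorted(tallies.items()):
    total = t["up"] + t["down"]
    if t["down"] / total > 0.5:
        print(f"weak spot: {query!r} ({t['down']}/{total} thumbs down)")
```

A tally like this flags queries users dislike, nothing more.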

I have zero clue how this set of comments can be used to justify the direct and indirect costs of implementing a keyword enterprise search system. The advice is essentially irrelevant to the acquisition of a more advanced system from a leading edge next generation information access vendor like BAE Systems (NetReveal), IBM (not the Watson stuff, however), or Palantir. The fact underscored by our research over the last decade is tough to dispute: Connecting an enterprise search system to demonstrable value is a darned difficult thing to accomplish.

It is far easier to focus on a niche like legal search and eDiscovery or the retrieval of scientific and research data for the firm’s engineering units than to boil the ocean. The idea of “boil the ocean” is that a vendor presents a text centric system (essentially a one trick pony) as an animal with the best of stallions, dogs, tigers, and grubs. The spam about enterprise search value is less satisfying than the steak of showing that an eDiscovery system helped the legal eagles win a case. That, gentle reader, is value. No court judgment. No fine. No PR hit. A grumpy marketer who cannot find a Web article is not value no matter how one spins the story.

Read more »

Interviews

Recorded Future: The Threat Detection Leader

The Exclusive Interview with Jason Hines, Global Vice President at Recorded Future

In my analyses of Google technology, I have noted that despite the search giant’s significant technical achievements, Google has a weakness. That “issue” is the company’s comparatively weak time capabilities. Identifying the specific time at which an event took place or is taking place is a very difficult computing problem. Time is essential to understanding the context of an event.
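To make the difficulty concrete, consider that the same sentence anchors to different event times depending on when the document was published. A minimal sketch of normalizing a few relative expressions (my own illustration, not Google’s or anyone’s production method):

```python
from datetime import datetime, timedelta

def resolve_event_time(text, published):
    """Anchor a few relative time expressions to the document's date.

    Real systems must handle hundreds of patterns, verb tense, and
    time zones; this sketch covers just enough to show the problem.
    """
    t = text.lower()
    if "yesterday" in t:
        return published - timedelta(days=1)
    if "last week" in t:
        return published - timedelta(weeks=1)
    if "today" in t:
        return published
    return None  # no explicit anchor: the event time is simply unknown

sentence = "The breach was discovered yesterday."
print(resolve_event_time(sentence, datetime(2015, 4, 25)))  # 2015-04-24
print(resolve_event_time(sentence, datetime(2015, 4, 26)))  # 2015-04-25
```

The same text yields two different event times; multiply that by millions of documents and the scale of the problem becomes apparent.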

This point becomes clear in the answers to my questions in the Xenky Cyber Wizards Speak interview, conducted on April 25, 2015, with Jason Hines, one of the leaders in Recorded Future’s threat detection efforts. You can read the full interview with Hines on the Xenky.com Cyber Wizards Speak site at the Recorded Future Threat Intelligence Blog.

Recorded Future is a rapidly growing, highly influential start-up spawned by a team of computer scientists responsible for the Spotfire content analytics system. The team set out in 2010 to use time as one of the linchpins in a predictive analytics service. The idea was simple: Identify the time of actions, apply numerical analyses to events related by semantics or entities, and flag important developments likely to result from signals in the content stream. Time would be the foundation of a next generation analysis system, complete with visual representations of otherwise unfathomable data from the Web, including forums, content hosting sites like Pastebin, social media, and so on.
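In outline, the approach can be sketched in a few lines, even if the engineering at Web scale cannot. A toy example with invented data and a naive spike test (not Recorded Future’s actual numerical methods):

```python
from collections import Counter
from datetime import date

# Invented stream of (event_date, entity) pairs extracted from Web content.
events = [
    (date(2015, 4, 20), "acme-corp"), (date(2015, 4, 21), "acme-corp"),
    (date(2015, 4, 22), "acme-corp"), (date(2015, 4, 23), "acme-corp"),
    (date(2015, 4, 24), "acme-corp"), (date(2015, 4, 25), "acme-corp"),
    (date(2015, 4, 25), "acme-corp"), (date(2015, 4, 25), "acme-corp"),
    (date(2015, 4, 25), "acme-corp"), (date(2015, 4, 25), "acme-corp"),
]

# 1. Identify the time of actions: bucket mentions of each entity by day.
daily = Counter(events)

# 2. Apply numerical analysis: compare the latest day to the baseline.
entity = "acme-corp"
counts = [daily[key] for key in sorted(daily) if key[1] == entity]
baseline = sum(counts[:-1]) / max(len(counts) - 1, 1)

# 3. Flag developments: a day far above baseline is a signal worth review.
latest = counts[-1]
if latest > 3 * baseline:
    print(f"signal: {entity} mentions spiked to {latest} (baseline {baseline:.1f})")
```

A production system replaces the invented data with a real-time Web crawl and the crude threshold with serious statistics, but the time, aggregation, and flagging skeleton is the shape the paragraph above describes.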

Recorded Future Interface

A Recorded Future data dashboard makes it easy for law enforcement or intelligence professionals to identify important events and, with a mouse click, zoom to the specific data of importance to an investigation. (Used with the permission of Recorded Future, 2015.)

Five years ago, integrated tools for threat detection did not exist, although components like distributed content acquisition and visualization were already delivering significant benefits to enterprise and consumer applications. Google, for example, built a multi-billion dollar business using distributed processes for Web searching. Salesforce.com integrated visualization into its cloud services to allow its customers to “get insight faster.”

According to Jason Hines, one of the founders of Recorded Future and a former Google engineer, “When our team set out about five years ago, we took on the big challenge of indexing the Web in real time for analysis, and in doing so developed unique technology that allows users to unlock new analytic value from the Web.”

Recorded Future attracted attention almost immediately. In what was an industry first, Google and In-Q-Tel (the investment arm of the US intelligence community) invested in the Boston-based company. Threat intelligence is a field defined by Recorded Future. The ability to process massive real-time content flows and then identify hot spots and items of interest relevant to a matter allows an authorized user to identify threats and take appropriate action quickly. Fueled by high-profile incidents like the security breach at Sony and cyber attacks on the White House, threat detection is now a core business concern.

The impact of Recorded Future’s innovations on threat detection was immediate. Traditional methods relied on human analysts. These methods worked but were and are slow and expensive. The use of Google-scale content processing combined with “smart mathematics” opened the door to a radically new approach to threat detection. Security, law enforcement, and intelligence professionals understood that sophisticated mathematical procedures combined with a real-time content processing capability would deliver a new and sophisticated approach to reducing risk, which is the central focus of threat detection.

In the exclusive interview with Xenky.com, the law enforcement and intelligence information service, Hines told me:

Recorded Future provides information security analysts with real-time threat intelligence to proactively defend their organization from cyber attacks. Our patented Web Intelligence Engine indexes and analyzes the open and Deep Web to provide you actionable insights and real-time alerts into emerging and direct threats. Four of the top five companies in the world rely on Recorded Future.

Despite the blue ribbon technology and the support of organizations widely recognized as the most sophisticated in the technology sector, Recorded Future’s technology is, at its core, a response to customer needs in the financial, defense, and security sectors. Hines said:

When it comes to security professionals we really enable them to become more proactive and intelligence-driven, improve threat response effectiveness, and help them inform the leadership and board on the organization’s threat environment. Recorded Future has beautiful interactive visualizations, and it’s something that we hear security administrators love to put in front of top management.

As the first mover in the threat intelligence sector, Recorded Future makes it possible for an authorized user to identify high risk situations. The company’s ability to help forecast and spotlight threats likely to signal a potential problem has obvious benefits. For security applications, Recorded Future identifies threats and provides data which allow adaptive perimeter systems like intelligent firewalls to proactively respond to threats from hackers and cyber criminals. For law enforcement, Recorded Future can flag trends so that investigators can better allocate their resources when dealing with a specific surveillance task.
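On the receiving end, the plumbing is straightforward. A minimal sketch, assuming a hypothetical feed of high-risk IP indicators (the feed format and the risk_score field are my inventions, not Recorded Future’s actual API):

```python
# Convert a hypothetical threat feed into firewall drop rules.
threat_feed = [
    {"indicator": "203.0.113.42", "risk_score": 91},
    {"indicator": "198.51.100.7", "risk_score": 64},
    {"indicator": "192.0.2.13", "risk_score": 88},
]

THRESHOLD = 80  # act only on high-confidence indicators

for item in threat_feed:
    if item["risk_score"] >= THRESHOLD:
        # Emit an iptables rule; a real deployment would push the rule
        # to the firewall's management interface, not print a command.
        print(f"iptables -A INPUT -s {item['indicator']} -j DROP")
```

The value is not in the plumbing; it is in the quality and timeliness of the indicators feeding it.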

Hines told me that financial and other consumer-centric firms can tap Recorded Future’s threat intelligence solutions. He said:

We are increasingly looking outside our enterprise and attempting to better anticipate emerging threats. With tools like Recorded Future we can assess huge swaths of behavior at a high level across the network and surface things that are very pertinent to your interests or business activities across the globe. Cyber security is about proactively knowing potential threats, and much of that is previewed on IRC channels, social media postings, and so on.

In my new monograph CyberOSINT: Next Generation Information Access, Recorded Future emerged as the leader in threat intelligence among the 22 companies offering NGIA services. To learn more about Recorded Future, navigate to the firm’s Web site at www.recordedfuture.com.

Stephen E Arnold, April 29, 2015

Latest News

Connecting SharePoint with External Data

One of the most frequently discussed SharePoint struggles is integrating SharePoint data with existing external data. IT Business Edge has compiled a short slideshow... Read more »

July 28, 2015 | Comment

Monkeys Cause System Failure

Nobody likes to talk about his or her failures.  Admitting to failure proves that you failed at a task in the past and it is a big blow to the ego.  Failure admission... Read more »

July 28, 2015 | Comment

Google’s Chauvinistic Job Advertising Delivery

I thought we were working to get more women into the tech industry, not fewer. That’s why it was so disappointing to read, “Google Found to Specifically Target... Read more »

July 28, 2015 | Comment

PageRank: Viewed through the Linear Algebra Sunglasses

I urge you to read and work through the examples in “The $25,000,000,000,000 Eigenvector: The Linear Algebra behind Google.” The write up was a tour de force... Read more »

July 27, 2015 | Comment

Unemployed in Search or Content Processing? Go for Data Science

I read an amazing write up. The title of this gem of high school counseling is “7 Skills/Attitudes to Become a Better Data Scientist.” What does one need to... Read more »

July 27, 2015 | Comment

PowerPoint Enabled Big Data Presenters Rejoice

Navigate to “A Plethora of Big Data Infographics.” Note that the original write up misspells “plethora” as “pletora” but, as many in Big Data say, “it... Read more »

July 27, 2015 | Comment

Instagram’s Search Feature Is A Vast Improvement

Instagram apparently knows more about your life than you or your friends.  The new search overhaul comes with new features that reveal more information than you... Read more »

July 27, 2015 | Comment

Data Companies Poised to Leverage Open Data

Support for open data, government datasets freely available to the public, has taken off in recent years; the federal government’s launch of Data.gov in 2009 is... Read more »

July 27, 2015 | Comment

Semantic Promotions and a Nutrition Free Exercise

I saw a link to an item called “5 Basic Steps to Make Sure You Hit Page 1 on Google.” I followed it to this message: One link pointed to this page: But this... Read more »

July 26, 2015 | Comment

Forbes and Some Big Data Forecasts

Short honk: For-fee, mid-tier consultants have had their thunder stolen. Forbes, the capitalist tool, wants to make certain its readers know how juicy Big Data is... Read more »

July 26, 2015 | Comment