
Featured

Why Enterprise Search Fails

I participated in a telephone call before the US holiday break. The subject was the likelihood that a potential investment in an enterprise search technology would be a winner. I listened for most of the 60-minute call. I offered a brief example of the over-promise and under-deliver problems which plagued Convera and Fast Search & Transfer. Several of the people on the call asked, “What’s a Convera?” I knew then that today’s whiz kids are essentially reinventing the wheel.

I wanted to capture three ideas which I jotted down during that call. My thought is that at some future time, a person wanting to understand the incredible failures that enterprise search vendors have tallied will have three observations to consider.

No background is necessary. You don’t need to read about throwing rocks at the Google bus, search engine optimization, or any of the craziness about search making Big Data a little pussycat.

Enterprise Search: Does a Couple of Things Well When Users Expect Much More

Enterprise search systems ship with filters or widgets which convert source text into a format that the content processing module can index. The problem is that images, videos, audio files, content from wonky legacy systems, or proprietary file formats like IBM i2’s ANB files do not lend themselves to indexing by a standard enterprise search system. The buyers or licensees of the enterprise search system do not understand this one-trick pony nature of text retrieval. Therefore, when the system is deployed, consternation follows confusion when content is not “in” the enterprise search system and, therefore, cannot be found. There are systems which can deal with a wide range of content, but these systems are marketed in a different way and often cost millions of dollars a year to set up, maintain, and operate.
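The filter limitation can be shown in a rough sketch. The code assumes a hypothetical filter registry keyed by file extension; the names `FILTERS` and `triage` and the sample file names are illustrative and do not come from any vendor's product. Anything without a filter never reaches the index.

```python
import os


def read_plain_text(path):
    """Hypothetical filter: return the text of a plain-text file."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return handle.read()


# Assumed registry: file extension -> filter that yields indexable text.
FILTERS = {".txt": read_plain_text, ".csv": read_plain_text, ".log": read_plain_text}


def triage(paths, filters=FILTERS):
    """Split paths into those a filter can handle and those that will never be indexed."""
    indexable, skipped = [], []
    for path in paths:
        extension = os.path.splitext(path)[1].lower()
        (indexable if extension in filters else skipped).append(path)
    return indexable, skipped


indexable, skipped = triage(["memo.txt", "briefing.mp4", "network.anb", "scan.tiff"])
print("indexable:", indexable)
print("not findable by search:", skipped)
```

Running a triage like this against a real file share before licensing a system would show how much content falls into the second list.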


Net net: Vendors do not explain the limitations of text search. Licensees do not take the time or have the desire to understand what an enterprise search system can actually do. Marketers obfuscate in order to close the deal. Failure is a natural consequence.

Data Management Needed

The disconnect boils down to what digital information the licensee wants to search. Once the universe is defined, the system into which the data will be placed must be resolved. No data management, no enterprise search. The reason is that licensees and the users of an enterprise search system assume that “all” or “everything” (from maps to web content, from email to outputs from an AS/400 Ironside) is available any time. Baloney. Few organizations have the expertise or the appetite to deal with figuring out what is where, how much, how frequently each type of data changes, and the formats used. I can hear you saying, “Hey, we know what we have and what we need. We don’t need a stupid, time consuming, expensive inventory.” There you go. Failure is a distinct possibility.


Net net: Hope springs eternal. When problems arise, few know what’s where, who’s on first, and why I don’t know is on third.

Read more »

Interviews

Exclusive Interview: Danny Rogers, Terbium Labs

Editor’s note: The full text of the exclusive interview with Dr. Daniel J. Rogers, co-founder of Terbium Labs, is available on the Xenky Cyberwizards Speak Web service at www.xenky.com/terbium-labs. The interview was conducted on August 4, 2015.

Significant innovations in information access, despite the hyperbole of marketing and sales professionals, are relatively infrequent. Danny Rogers, one of the founders of Terbium Labs, has developed a way to flip on the lights and make it easy to locate information hidden in the Dark Web.

Web search has been a one-trick pony since the days of Excite, HotBot, and Lycos. For most people, a mobile device takes cues from the user’s location and click streams and displays answers. Access to digital information requires more than parlor tricks and pay-to-play advertising. A handful of companies are moving beyond commoditized search, and they are opening important new markets such as secret and high value data theft. Terbium Labs can “illuminate the Dark Web.”

In an exclusive interview, Dr. Danny Rogers, who co-founded Terbium Labs with Michael Moore, explained the company’s ability to change how data breaches are located. He said:

Typically, breaches are discovered by third parties such as journalists or law enforcement. In fact, according to Verizon’s 2014 Data Breach Investigations Report, that was the case in 85% of data breaches. Furthermore, discovery, because it is by accident, often takes months, or may not happen at all when limited personnel resources are already heavily taxed. Estimates put the average breach discovery time between 200 and 230 days, an exceedingly long time for an organization’s data to be out of their control. We hope to change that. By using Matchlight, we bring the breach discovery time down to between 30 seconds and 15 minutes from the time stolen data is posted to the web, alerting our clients immediately and automatically. By dramatically reducing the breach discovery time and bringing that discovery into the organization, we’re able to reduce damages and open up more effective remediation options.

Terbium’s approach, it turns out, can be applied to traditional research into content domains to which most systems are effectively blind. At this time, a very small number of companies are able to index content that is not available to traditional content processing systems. Terbium acquires content from Web sites which require specialized software to access. Terbium’s system then processes the content, converting it into the equivalent of an old-fashioned fingerprint. Real-time pattern matching makes it possible for the company’s system to locate a client’s content, either in textual form, software binaries, or other digital representations.

One of the most significant information access innovations uses systems and methods developed by physicists to deal with the flood of data resulting from research into the behaviors of difficult-to-differentiate subatomic particles.

One part of the process is for Terbium to acquire (crawl) content and convert it into encrypted 14-byte strings of zeros and ones. A client such as a bank then uses the Terbium content encryption and conversion process to produce representations of the confidential data, computer code, or other data. Terbium’s system, in effect, looks for matching digital fingerprints. By contrast, the task of locating confidential or proprietary data via traditional means is expensive and often a hit and miss affair.
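The matching idea can be sketched in a few lines. This is an illustration only, not Terbium’s patented method. It assumes overlapping word windows as the unit of fingerprinting and a SHA-256 digest truncated to 14 bytes as the fingerprint. The function names, the window size, and the sample strings are hypothetical.

```python
import hashlib

FINGERPRINT_BYTES = 14  # length mentioned in the article; everything else is assumed
WINDOW_WORDS = 4        # hypothetical window size


def fingerprints(text, window=WINDOW_WORDS):
    """Return the set of truncated SHA-256 digests of overlapping word windows."""
    words = text.lower().split()
    if len(words) < window:
        windows = [" ".join(words)] if words else []
    else:
        windows = [" ".join(words[i:i + window]) for i in range(len(words) - window + 1)]
    return {hashlib.sha256(w.encode("utf-8")).digest()[:FINGERPRINT_BYTES] for w in windows}


def match_ratio(client_prints, crawled_text):
    """Fraction of the client's fingerprints that also occur in crawled content."""
    if not client_prints:
        return 0.0
    crawled = fingerprints(crawled_text)
    return len(client_prints & crawled) / len(client_prints)


# The client computes fingerprints locally and hands over only the digests.
secret = "account 4417 belongs to the Acme pension fund"
client_prints = fingerprints(secret)

leaked_page = "dump follows: account 4417 belongs to the acme pension fund enjoy"
unrelated_page = "enterprise search vendors promise more than they deliver"

print(match_ratio(client_prints, leaked_page))
print(match_ratio(client_prints, unrelated_page))
```

The point of the design is that the monitoring side compares digests only, so the client never has to reveal the original text. A plain unsalted hash of short, guessable values can be reversed by brute force, so a production system would presumably need a keyed or otherwise hardened scheme.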

Terbium Labs changes the rules of the game and in the process has created a way to provide its licensees with anti-fraud and anti-theft measures which are unique. In addition, Terbium’s digital fingerprints make it possible to find, analyze, and make sense of digital information not previously available. The system has applications ranging from the Clear Web, which millions of people access every minute, to the hidden content residing on the so-called Dark Web.


Terbium Labs, a startup located in Baltimore, Maryland, has developed technology that makes use of advanced mathematics (what I call numerical recipes) to perform analyses for the purpose of finding connections. The firm’s approach is one that deals with strings of zeros and ones, not the actual words and numbers in a stream of information. By matching these numerical tokens with content such as a data file of classified documents or a record of bank account numbers, Terbium does what strikes many, including me, as a remarkable achievement.

Terbium’s technology can identify highly probable instances of improper use of classified or confidential information. Terbium can pinpoint where the compromised data reside, whether on the Clear Web, another network, or the Dark Web. Terbium then alerts the organization about the compromised data and works with the victim of Internet fraud to resolve the matter in a satisfactory manner.

Terbium’s breakthrough has attracted considerable attention in the cyber security sector, and applications of the firm’s approach are beginning to surface for disciplines from competitive intelligence to health care.

Rogers explained:

We spent a significant amount of time working on both the private data fingerprinting protocol and the infrastructure required to privately index the dark web. We pull in billions of hashes daily, and the systems and technology required to do that in a stable and efficient way are extremely difficult to build. Right now we have over a quarter trillion data fingerprints in our index, and that number is growing by the billions every day.

The idea for the company emerged from a conversation with a colleague who wanted to find out immediately if a high profile client list was ever leaked to the Internet. But, said Rogers, “This individual could not reveal to Terbium the list itself.”

How can an organization locate secret information if that information cannot be provided to a system able to search for the confidential information?

The solution Terbium’s founders developed relies on a novel use of encryption techniques, tokenization, Clear and Dark Web content acquisition and processing, and real time pattern matching methods. The interlocking innovations have been patented (US8,997,256), and Terbium is one of the few companies in the world, perhaps the only one, able to crack open Dark Web content within regulatory and national security constraints.

Rogers said:

I think I have to say that the adversaries are winning right now. Despite billions being spent on information security, breaches are happening every single day. Currently, the best the industry can do is be reactive. The adversaries have the perpetual advantage of surprise and are constantly coming up with new ways to gain access to sensitive data. Additionally, the legal system has a long way to go to catch up with technology. It really is a free-for-all out there, which limits the ability of governments to respond. So right now, the attackers seem to be winning, though we see Terbium and Matchlight as part of the response that turns that tide.

Terbium’s product is Matchlight. According to Rogers:

Matchlight is the world’s first truly private, truly automated data intelligence system. It uses our data fingerprinting technology to build and maintain a private index of the dark web and other sites where stolen information is most often leaked or traded. While the space on the internet that traffics in that sort of activity isn’t intractably large, it’s certainly larger than any human analyst can keep up with. We use large-scale automation and big data technologies to provide early indicators of breach in order to make those analysts’ jobs more efficient. We also employ a unique data fingerprinting technology that allows us to monitor our clients’ information without ever having to see or store their originating data, meaning we don’t increase their attack surface and they don’t have to trust us with their information.

For more information about Terbium, navigate to the company’s Web site. The full text of the interview appears on Stephen E Arnold’s Xenky cyberOSINT Web site at http://bit.ly/1TaiSVN.

Stephen E Arnold, August 11, 2015

Latest News

Bing Becomes More Like Telex

Decades ago, probably around 1980, I met with a person in New York City. The purpose of the meeting was to talk about value added information provided to users of... Read more »

July 23, 2016

Palantir Thiel: An If Then Chess Move?

I read “The Peter Principle: Why Thiel’s GOP Convention Speech Will Be about Him and Not about Silicon Valley.” Interesting write up but I think the “about”... Read more »

July 22, 2016

DuckDuckGo: Filtering

I read “Is DuckDuckGo.com Partially Enforcing the ‘Celebrity Threesome Injunction’?” The point of the write up is that information is filtered from... Read more »

July 22, 2016

Alphabet Google Is Busy Reinventing

From Forbes in India (“Sundar Pichai to Reinvent Google with a Heavy Dose of Artificial Intelligence” which may require a proxy maneuver due to the digitally... Read more »

July 22, 2016

Oracle v Google Copyright Trial in Progress

The battle between Google and Oracle over Android’s use of Java has gone to federal court, and the trial is expected to conclude in June. CBS San Francisco Bay... Read more »

July 22, 2016

Meet the Company Selling Our Medical Data

A company with a long history is getting fresh scrutiny. An article at Fortune reports, “This Little-Known Firm Is Getting Rich Off Your Medical Data.” Writer... Read more »

July 22, 2016

Is IBM Vulnerable to OpenText?

I read “Hey, IBM, OpenText Is Coming for You.” The write up reports that the poobah of OpenText said that its new Magellan system is “a next generation analytics... Read more »

July 21, 2016

Amazon: Not the Corner Store? Big Insight

I love Amazon almost as much as I love Google. I would have a tough time deciding which of these services warrants more of my affection, trust, and respect. I said... Read more »

July 21, 2016

Coveo Wins a Stevie. Congrats Coveo. What Is a Stevie?

The article titled Coveo Sweeps Early 2016 Awards Programs on Coveo promotes some of the many honors and recognitions that the Coveo company and its apps have earned.... Read more »

July 21, 2016

Scholarship Evolving with the Web

Is big data good only for the hard sciences, or does it have something to offer the humanities? Writer Marcus A Banks thinks it does, as he states in, “Challenging... Read more »

July 21, 2016