Recorded Future: Google and Cyber OSINT
February 2, 2015
I find the complaints about Google’s inability to handle time amusing. On the surface, Google seems to demote, ignore, or just not understand the concept of time. For the vast majority of Google service users, Google is no substitute for the users’ investment of time and effort into dating items. But for the wide, wide Google audience, ads, not time, are more important.
Does Google really get an F in time? The answer is, “Nope.”
In CyberOSINT: Next Generation Information Access I explain that Google’s time sense is well developed and of considerable importance to next generation solutions the company hopes to offer. Why the craw fishing? Well, Apple could just buy Google and make the bitter taste of the Apple Board of Directors’ experience a thing of the past.
Now to temporal matters in the here and now.
CyberOSINT relies on automated collection, analysis, and report generation. In order to make sense of data and information crunched by an NGIA system, time is a really key metatag item. To figure out time, a system has to understand:
- The date and time stamp
- Versioning (previous, current, and future document, data items, and fact iterations)
- Times and dates contained in a structured data table
- Times and dates embedded in content objects themselves; for example, a reference to “last week” or in some cases, optical character recognition of the data on a surveillance tape image.
For the average query, this type of time detail is overkill. The “time and date” of an event, therefore, requires disambiguation, determination and tagging of specific time types, and then capturing the date and time data with markers for document or data versions.
A simplification of Recorded Future’s handling of unstructured data. The system can also handle structured data and a range of other data management content types. Image copyright Recorded Future 2014.
Sounds like a lot of computational and technical work.
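To give a flavor of that work, here is a minimal Python sketch of tagging time types and resolving an embedded reference such as “last week” against an item's own date stamp. This is not Recorded Future's method. The `TimeTag` class, the `RELATIVE_OFFSETS` table, and the `tag_embedded_times` function are hypothetical names invented for illustration, and a production system would need to handle far more phrasings, time zones, and versions.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class TimeTag:
    kind: str        # e.g. "timestamp", "version", "table", or "embedded"
    value: datetime  # the resolved point in time
    source: str      # the raw text the tag was derived from


# Hypothetical lookup of relative phrases (an assumption for illustration).
RELATIVE_OFFSETS = {
    "yesterday": timedelta(days=1),
    "last week": timedelta(weeks=1),
}


def tag_embedded_times(text: str, stamp: datetime) -> list[TimeTag]:
    """Resolve time references inside the content against the item's date stamp."""
    tags = []
    lowered = text.lower()
    # Relative phrases only make sense relative to the item's own stamp.
    for phrase, offset in RELATIVE_OFFSETS.items():
        if phrase in lowered:
            tags.append(TimeTag("embedded", stamp - offset, phrase))
    # Explicit ISO dates (YYYY-MM-DD) written in the content itself.
    for match in re.finditer(r"\b(\d{4})-(\d{2})-(\d{2})\b", text):
        year, month, day = map(int, match.groups())
        try:
            tags.append(TimeTag("embedded", datetime(year, month, day), match.group(0)))
        except ValueError:
            continue  # skip strings that look like dates but are not valid
    return tags


stamp = datetime(2015, 2, 2, 9, 0)
tags = [TimeTag("timestamp", stamp, "document header")]
tags += tag_embedded_times("The breach was reported last week.", stamp)
for tag in tags:
    print(tag.kind, tag.value.isoformat(), repr(tag.source))
```

The point of the sketch is that “last week” has no meaning until the system knows which date stamp to measure from, which is why each tag carries its type and its source text.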
In CyberOSINT, I describe Google’s and In-Q-Tel’s investments in Recorded Future, one of the data forward NGIA companies. Recorded Future has wizards who developed the Spotfire system which is now part of the Tibco service. There are Xooglers like Jason Hines. There are assorted wizards from Sweden, countries most US high school students cannot locate on a map, and assorted veterans of high technology start ups.
An NGIA system delivers actionable information to a human or to another system. Alternatively, a licensee can build and integrate new solutions on top of the Recorded Future technology. One of the company’s key inventions is a set of numerical recipes that deal effectively with the notion of “time.” Recorded Future uses the name “Tempora” as shorthand for the advanced technology that makes time, along with predictive algorithms, part of the Recorded Future solution.
CyberOSINT and the Associated Press
January 31, 2015
Remember the days when there were Associated Press stringers? Remember when the high value AP service was information gathered at state capitols? Remember when humans did this work?
Enter cyber information or as I dub this stuff Cyber OSINT.
Navigate to “AP’s Robot Journalists Are Writing Their Own Stories Now.” I would have added the subtitle “And the Obituaries of Stringers”. The idea is simple: Smart software assembles sentences that comprise a “news story.” Here’s the passage I noted:
Philana Patterson, an assistant business editor at the AP tasked with implementing the system, tells us there was some skepticism from the staff at first. “I wouldn’t expect a good journalist to not be skeptical,” she said. Patterson tells us that when the program first began in July, every automated story had a human touch, with errors logged and sent to Automated Insights to make the necessary tweaks. Full automation began in October, when stories “went out to the wire without human intervention.” Both the AP and Automated Insights tell us that no jobs have been lost due to the new service. We’re also told the automated system is now logging in fewer errors than the human-produced equivalents from years past.
The shift from humans to software is just beginning. To get a glimpse of how industrial strength systems perform far more sophisticated operations automatically, you will want to read CyberOSINT: Next Generation Information Access.
Forget traditional search and information gathering. The world has shifted. You know it when a stodgy, collectively owned outfit like the AP goes public with cyber tools.
When will the enterprise search vendors flogging consulting services and keyword systems figure it out? Perhaps search and indexing companies are the heirs to the cluelessness of news gathering organizations.
Stephen E Arnold, January 31, 2015
List of Cyber Security Companies
January 3, 2015
Short honk: Cyber is hot. Cyber security is even hotter. Some, well, most, of the cyber outfits are not household names. The blue chip consulting firm has produced a list of 100 of these cyber security outfits. If you want the list, navigate to New United’s article “Top 100 Cyber Security Companies: Ones to Watch in 2016.” Keep in mind that this list probably includes some of the prospects that the consulting firm wants to convert into paying customers. Nevertheless, the list is interesting if incomplete.
Stephen E Arnold, January 3, 2016
Push Pull Has Lost Out to Collect: The Next Phase of the Internet
December 23, 2014
I am fascinated with the way old insights become the next big thing. Consider “Two Eras of the Internet: Pull and Push.” I am not sure I am comfortable with either of these words. Just as read-write creates an image of how digital information “works,” the notions are difficult to reconcile with what is important about online accessible information.
In the push-pull analogy, the write up focuses on social and flow. The challenge is, “What’s going on now with regards to accessible information?” The answer is, “Collection.” Read-write likewise misses the important point about the changes between data that have been written and data that are being written. Wonks refer to this as the “delta.”
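For readers who want the “delta” made concrete, here is a minimal Python sketch that compares an earlier collection pass with a later one. The `delta` function, the document identifiers, and the sample text are all hypothetical, made up for illustration. A real collection system would track far more than keyed strings.

```python
def delta(previous: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Report which items were added, removed, or changed between two snapshots."""
    added = sorted(key for key in current if key not in previous)
    removed = sorted(key for key in previous if key not in current)
    changed = sorted(
        key for key in current
        if key in previous and current[key] != previous[key]
    )
    return {"added": added, "removed": removed, "changed": changed}


# Two hypothetical collection passes over the same sources.
earlier = {"doc-1": "Acme opens office", "doc-2": "Acme hires CTO"}
later = {"doc-2": "Acme hires new CTO", "doc-3": "Acme files patent"}

changes = delta(earlier, later)
print(changes)
```

The changed and added items are what an automated collection system acts on. The data already written matter mostly as the baseline.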
My point is that next generation information access is based on these word pairs masquerading as explanations. Based on our research for “CyberOSINT: Next Generation Information Access,” the freshest approach to digital content is automated collection and analysis. How does one make sense of historical data, real time data, data change, and large volumes of content? The answer is predictive analytics that generate useful outputs for humans and for systems.
“CyberOSINT” will be available early in 2015. If you are an active law enforcement, security, or intelligence professional, you can reserve your copy by writing benkent2020 at yahoo dot com. Go beyond simplicity and learn about the information shift changing information access.
Stephen E Arnold, December 23, 2014
Predictive Analytics: An Older Survey with Implications for 2015
November 2, 2014
In my files I had a copy of the 2009 Predictive Analytics World survey about, not surprisingly, predictive analytics. When I first reviewed the data in the report, I noted that “information retrieval” or “search” were not to be found. Before the bandwagon began to roll for predictive analytics, search technology was not in the game if I interpret the survey data correctly.
The factoid I marked was revealed in this table:
The planned use of predictive analytics was for fraud detection. It appears that 64 percent of the sample planned to adopt predictive analytics for criminal or terrorist detection. The method requires filtering various types of public information including text.
Are vendors of enterprise search and content processing systems leaders in this sector in 2014? Based on my research, content processing vendors provide the equivalent of add-in utilities to the popular systems. The hypothesis I have formulated is that traditional information retrieval companies find themselves relegated to a supporting role.
Looking forward to 2015, I see growing dominance by leaders in the cyber OSINT market. Irrelevancy awaits the traditional search vendor unable to identify and then deliver high value solutions to a high demand, high growth market sector.
Has IDC or Dave Schubmehl tracked this sector? I don’t think so. As I produce more information about this market, I anticipate some me-too activity, however.
Stephen E Arnold, November 2, 2014