March 4, 2016
I read “Google Continuing Effort to Win Allies Amid Europe Antitrust Tax Probes.” (You may have to pay to view this article and its companion “European newspapers Get Google Grants.” Hey, Mr. Murdoch has bills to pay.)
As you may know, Google has had an on again off again relationship with “real” publishers. Also, Alphabet Google finds itself snared in some income tax matters.
The write up points out:
Alphabet Inc.’s Google … awarded a set of grants to European newspapers and offered to help protect them from cyber attacks, continuing an effort to win allies while it faces both antitrust and tax probes.
I find this interesting. Has Alphabet Google “trumped” some of its early activities? I like that protection against cyber attacks too. Does that mean that Alphabet Google does not protect other folks against cyber attacks?
Stephen E Arnold, March 4, 2016
May 5, 2015
Kenny Toth, May 5, 2015
April 29, 2015
The Exclusive Interview with Jason Hines, Global Vice President at Recorded Future
In my analyses of Google technology, despite the search giant’s significant technical achievements, Google has a weakness. That “issue” is the company’s comparatively weak time capabilities. Identifying the specific time at which an event took place or is taking place is a very difficult computing problem. Time is essential to understanding the context of an event.
This point becomes clear in the answers to my questions in the Xenky Cyber Wizards Speak interview, conducted on April 25, 2015, with Jason Hines, one of the leaders in Recorded Future’s threat detection efforts. You can read the full interview with Hines on the Xenky.com Cyber Wizards Speak site at the Recorded Future Threat Intelligence Blog.
Recorded Future is a rapidly growing, highly influential start up spawned by a team of computer scientists responsible for the Spotfire content analytics system. The team set out in 2010 to use time as one of the linchpins in a predictive analytics service. The idea was simple: Identify the time of actions, apply numerical analyses to events related by semantics or entities, and flag important developments likely to result from signals in the content stream. The idea was to use time as the foundation of a next generation analysis system, complete with visual representations of otherwise unfathomable data from the Web, including forums, content hosting sites like Pastebin, social media, and so on.
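To make the "identify the time, apply numerical analyses, flag developments" idea concrete, here is a minimal sketch in Python. It is not Recorded Future's method, which the company has not disclosed in this write up. The sample events, the entity name "acme", and the z-score threshold are all assumptions made for illustration. The sketch counts mentions of one entity per day and flags days that stand well above the average.

```python
from collections import Counter
from datetime import datetime
from statistics import mean, pstdev

# Hypothetical sample data: (ISO timestamp, entity) pairs pulled from a content stream.
EVENTS = [(f"2015-04-0{d}T09:00:00", "acme") for d in range(1, 6)]
EVENTS += [(f"2015-04-06T{h:02d}:00:00", "acme") for h in range(10)]
EVENTS += [("2015-04-06T12:30:00", "other")]


def daily_counts(events, entity):
    """Count mentions of one entity per calendar day."""
    counts = Counter()
    for stamp, name in events:
        if name == entity:
            counts[datetime.fromisoformat(stamp).date()] += 1
    return counts


def flag_spikes(counts, threshold=2.0):
    """Return the days whose count sits more than `threshold` population
    standard deviations above the mean of the observed daily counts."""
    values = list(counts.values())
    if len(values) < 2:
        return []
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # every day looks the same, so nothing stands out
    return sorted(day for day, n in counts.items() if (n - mu) / sigma > threshold)


if __name__ == "__main__":
    for day in flag_spikes(daily_counts(EVENTS, "acme")):
        print("Unusual activity on", day)
```

A production system would also have to handle days with no mentions at all, time zones, and timestamps that appear only in the text of a document. Those are the hard parts of the time problem.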
A Recorded Future data dashboard makes it easy for law enforcement or intelligence professionals to identify important events and, with a mouse click, zoom to the specific data of importance to an investigation. (Used with the permission of Recorded Future, 2015.)
Five years ago, the tools for threat detection did not exist. Components like distributed content acquisition and visualization provided significant benefits to enterprise and consumer applications. Google, for example, built a multi-billion dollar business using distributed processes for Web searching. Salesforce.com integrated visualization into its cloud services to allow its customers to “get insight faster.”
According to Jason Hines, one of the founders of Recorded Future and a former Google engineer, “When our team set out about five years ago, we took on the big challenge of indexing the Web in real time for analysis, and in doing so developed unique technology that allows users to unlock new analytic value from the Web.”
Recorded Future attracted attention almost immediately. In what was an industry first, Google and In-Q-Tel (the investment arm of the US government) invested in the Boston-based company. Threat intelligence is a field defined by Recorded Future. The ability to process massive real time content flows and then identify hot spots and items of interest to a matter allows an authorized user to identify threats and take appropriate action quickly. Fueled by commercial events like the security breach at Sony and cyber attacks on the White House, threat detection is now a core business concern.
The impact of Recorded Future’s innovations on threat detection was immediate. Traditional methods relied on human analysts. These methods worked but were and are slow and expensive. The use of Google-scale content processing combined with “smart mathematics” opened the door to a radically new approach to threat detection. Security, law enforcement, and intelligence professionals understood that sophisticated mathematical procedures combined with a real-time content processing capability would deliver a new and sophisticated approach to reducing risk, which is the central focus of threat detection.
In the exclusive interview with Xenky.com, the law enforcement and intelligence information service, Hines told me:
Recorded Future provides information security analysts with real-time threat intelligence to proactively defend their organization from cyber attacks. Our patented Web Intelligence Engine indexes and analyzes the open and Deep Web to provide you actionable insights and real-time alerts into emerging and direct threats. Four of the top five companies in the world rely on Recorded Future.
Despite the blue ribbon technology and support of organizations widely recognized as the most sophisticated in the technology sector, Recorded Future’s technology is a response to customer needs in the financial, defense, and security sectors. Hines said:
When it comes to security professionals we really enable them to become more proactive and intelligence-driven, improve threat response effectiveness, and help them inform the leadership and board on the organization’s threat environment. Recorded Future has beautiful interactive visualizations, and it’s something that we hear security administrators love to put in front of top management.
As the first mover in the threat intelligence sector, Recorded Future makes it possible for an authorized user to identify high risk situations. The company’s ability to help forecast and spotlight threats likely to signal a potential problem has obvious benefits. For security applications, Recorded Future identifies threats and provides data which allow adaptive perimeter systems like intelligent firewalls to proactively respond to threats from hackers and cyber criminals. For law enforcement, Recorded Future can flag trends so that investigators can better allocate their resources when dealing with a specific surveillance task.
Hines told me that financial and other consumer centric firms can tap Recorded Future’s threat intelligence solutions. He said:
We are increasingly looking outside our enterprise and attempt to better anticipate emerging threats. With tools like Recorded Future we can assess huge swaths of behavior at a high level across the network and surface things that are very pertinent to your interests or business activities across the globe. Cyber security is about proactively knowing potential threats, and much of that is previewed on IRC channels, social media postings, and so on.
In my new monograph CyberOSINT: Next Generation Information Access, Recorded Future emerged as the leader in threat intelligence among the 22 companies offering NGIA services. To learn more about Recorded Future, navigate to the firm’s Web site at www.recordedfuture.com.
Stephen E Arnold, April 29, 2015
April 13, 2015
Do you want a way to search medical information without false drops, without the need to learn specialized vocabularies, and without Boolean? Apparently the purveyors of medical search systems have left users scratching without an antihistamine within reach.
Navigate to Slideshare (yep, LinkedIn) and flip through “Current Advances to Bridge the Usability Expressivity Gap in biomedical Semantic Search.” Before reading the 51 slide deck, you may want to refresh yourself with Quertle, PubMed, MedNar, or one of the other splendiferous medical information resources for researchers.
The slide deck identifies the problems with the existing search approaches. I can relate to these points. For example, those who tout question answering systems ignore the difficulty of passing a question from medicine to a domain consisting of math content. With math the plumbing in many advanced medical processes, the weakness is a bit of a problem and has been for decades.
The “fix” is semantic search. Well, that’s the theory. I interpreted the slide deck as communicating how a medical search system called ReVeaLD would crack this somewhat difficult nut. As an aside: I don’t like the wonky spelling that some researchers and marketers are foisting on the unsuspecting.
I admit that I am skeptical about many NGIA or next generation information access systems. One reason medical research works as well as it does is its body of generally standardized controlled terms. Learn MeSH and you have a fighting chance of figuring out if the drug the doctor prescribed is going to kill off your liver as it remediates your indigestion. Controlled vocabularies in scientific, technical, engineering, and medical domains address the annoying ambiguity problems encountered when one mixes colloquial words with quasi consultant speak. A technical buzzword is part of a technical education. It works, maybe not too well, but it works better than some of the wild and crazy systems which I have explored over the years.
You will have to dig through old jargon and new jargon such as entity reconciliation. In the law enforcement and intelligence fields, an entity from one language has to be “reconciled” with versions of the “entity” in other languages and from other domains. The technology is easier to market than make work. The ReVeaLD system is making progress as I understand the information in the slide deck.
Like other advanced information access systems, ReVeaLD has a fair number of moving parts. Here’s the diagram from Slide 27 in the deck:
There is also a video available at this link. The video explains that the Granatum Project uses a constrained domain specific language. So much for cross domain queries, gentle reader. What is interesting to me is the similarity between the ReVeaLD system and some of the cyber OSINT next generation information access systems profiled in my new monograph. There is a visual query builder, a browser for structured data, visualization, and a number of other bells and whistles.
Several observations:
- Finding relevant technical information requires effort. NGIA systems also require the user to exert effort. Finding the specific information required to solve a time critical problem remains a hurdle for broader deployment of some systems and methods.
- The computational load for sophisticated content processing is significant. The ReVeaLD system is likely to suck up its share of machine resources.
- Maintaining a system with many moving parts when deployed outside of a research demonstration presents another series of technical challenges.
I am encouraged, but I want to make certain that my one or two readers understand this point: Demos and marketing are much easier to roll out than a hardened, commercial system. Just like the EC’s Promise program, ReVeaLD may have to communicate its achievements to the outside world. A long road must be followed before this particular NGIA system becomes available in Harrod’s Creek, Kentucky.
Stephen E Arnold, April 13, 2015
March 25, 2015
I read “Lexmark Buys Software Maker Kofax at 47% Premium in $1B Deal.” The write up focuses on Kofax’s content management services. Largely overlooked is Kofax’s Kapow Tech unit. This company provides specialized services to intelligence, law enforcement, and security entities. How will a printer company in Lexington manage the ageing Kofax technology and the more promising Kapow entity? This should be interesting. Lexmark already owns the Brainware technology and the ISYS Search Software system. Lexmark is starting to look a bit like IBM and OpenText. These companies have rolled up promising firms, only to lose their focus. Will Lexmark follow in IBM’s footsteps and cook up a Watson? I think there is still some IBM DNA in the pale blue veins of the Lexmark outfit. On the other hand, Lexmark seems to be emulating some of the dance steps emerging from the Hewlett Packard ballroom as well. Fascinating. The mid-tier consultants with waves, quadrants, and paid for webinars will have to find a way to shoehorn hardware, health care, intelligence, and document scanning into one overview. Confused? Just wait.
Stephen E Arnold, March 25, 2015
February 11, 2015
Download an open source enterprise search system or license a proprietary system. Once the system has been installed, the content crawled, the index built, the interfaces set up, and the system optimized, the job is complete, right?
Not quite. Retrofitting a keyword search system to meet today’s security requirements is a complex, time consuming, and expensive task. That’s why “experts” who write about search facets, search as a Big Data system, and search as a business intelligence solution ignore security or reassure their customers that it is no big deal. Security is a big deal, and it is becoming a bigger deal with each passing day.
There are a number of security issues to address. The easiest of these is figuring out how to piggyback on access controls provided by a system like Microsoft SharePoint. Other organizations use different enterprise software. As I said, using access controls already in place and diligently monitored by a skilled security administrator is the easy part.
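The "easy part" can be sketched in a few lines of Python. This is an illustration only and not the SharePoint API or any vendor's connector. The `Document` class, the group names, and the sample index are hypothetical. The sketch assumes each document's access control list was copied into the index at crawl time, so that results can be trimmed against the groups of the person running the query.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """One indexed item plus the groups allowed to read it (copied from the
    source system's access controls at crawl time; an assumption here)."""
    doc_id: str
    text: str
    allowed_groups: set = field(default_factory=set)


# Hypothetical index contents.
INDEX = [
    Document("d1", "Q3 budget forecast", {"finance"}),
    Document("d2", "Litigation budget hold", {"legal"}),
    Document("d3", "Cafeteria menu", {"everyone"}),
]


def search(index, query, user_groups):
    """Naive keyword match followed by security trimming: a hit is returned
    only if the user shares at least one group with the document."""
    groups = set(user_groups)
    hits = [doc for doc in index if query.lower() in doc.text.lower()]
    return [doc.doc_id for doc in hits if doc.allowed_groups & groups]


if __name__ == "__main__":
    print(search(INDEX, "budget", {"finance"}))
```

The trimming step is only as good as the access control data in the index. If the crawl is stale, or if a unit's rules were never written down in the source system, the filter passes content it should not.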
A number of sticky wickets remain; for example:
- Some units of the organization may do work for law enforcement or intelligence entities. There may be different requirements. Some are explicit and promulgated by government agencies. Others may be implicit, acknowledged as standard operating procedure by those with the appropriate clearance and the need to know.
- Specific administrative content must be sequestered. Examples range from information assembled for employee health to compliance requirements for pharma products or controlled substances.
- Legal units may require that content be contained in a managed system with administrative controls put in place to ensure that no changes are introduced into a content set, that access is provided only to those with specific credentials, or that the content is kept “off the radar” as the in house legal team tries to figure out how to respond to a discovery activity.
- Some research units may be “black”; that is, no one in the company, including most information technology and security professionals, is supposed to know where an activity is taking place or what information is of interest to the research team. Specialized security steps must be enforced. These can include dongles, air gaps, and unknown locations and staff.
An enterprise search system without NGIA security functions is like a 1960s Chevrolet project car. Buy it ready to rebuild for $4,500 and invest $100,000 or more to make it conform to 2015’s standards. Source: http://car.mitula.us/impala-project
How do enterprise search systems deal with these access issues? Are not most modern systems positioned to index “all” content? Are the procedures for each of these four examples part of the enterprise search systems’ administrative tool kit?
Based on the research I conducted for CyberOSINT: Next Generation Information Access and my other studies of enterprise search, the answer is, “No.”
February 10, 2015
President Obama’s announcement of a new entity to combat the deepening threat from cyber attacks adds an important resource to counter cyber threats.
The decision reflects the need for additional counter terrorism resources in the wake of the Sony and Anthem security breaches. The new initiative serves both Federal and commercial sectors’ concerns with escalating cyber threats.
The Department of Homeland Security said in a public release: “National Cybersecurity and Communications Integration Center mission is to reduce the likelihood and severity of incidents that may significantly compromise the security and resilience of the Nation’s critical information technology and communications networks.”
For the first time, a clear explanation of the software and systems that perform automated collection and analysis of digital information is available. Stephen E. Arnold’s new book “CyberOSINT: Next Generation Information Access” was written to provide information about advanced information access technology. The new study was published by Beyond Search on January 21, 2015.
The author is Stephen E Arnold, a former executive at Halliburton Nuclear Services and Booz, Allen & Hamilton. He said: “The increase in cyber threats means that next generation systems will play a rapidly increasing part in law enforcement and intelligence activities.”
The monograph explains why next generation information access systems are the logical step beyond keyword search. Also, the book provides the first overview of the architecture of cyber OSINT systems. The monograph provides profiles of more than 20 systems now available to government entities and commercial organizations. The study includes a summary of the year’s research behind the monograph and a glossary of the terms used in cyber OSINT.
Cyber threats require next generation information access systems due to proliferating digital attacks. According to Chuck Cohen, lieutenant with a major Midwestern law enforcement agency and adjunct instructor at Indiana University, “This book is an important introduction to cyber tools for open source information. Investigators and practitioners needing an overview of the companies defining this new enterprise software sector will want this monograph.”
In February 2015, Arnold will keynote a conference on CyberOSINT held in the Washington, DC area. Attendance at the conference is by invitation only. Those interested in the day long discussion of cyber OSINT can write benkent2020 at yahoo dot com to express their interest in the limited access program.
Arnold added: “Using highly-automated systems, governmental entities and corporations can detect and divert cyber attacks and take steps to prevent assaults and apprehend the people that are planning them. Manual methods such as key word searches are inadequate due to the volume of information to be analyzed and the rapid speed with which threats arise.”
Robert David Steele, a former CIA professional and the co-creator of the Marine Corps Intelligence Activity, said about the new study: “NGIA systems are integrated solutions that blend software and hardware to address very specific needs. Our intelligence, law enforcement, and security professionals need more than brute force keyword search. This report will help clients save hundreds of thousands of dollars.”
Information about the new monograph is available at www.xenky.com/cyberosint.
Ken Toth, February 10, 2015
February 5, 2015
One of the content challenges traditional enterprise search trips over is geographic functions. When an employee looks for content, the implicit assumption is that keywords will locate a list of documents in which the information may be located. The user then scans the results list—whether in Google style laundry lists or in the graphic display popularized by Grokker and Kartoo which have gone dark. (Quick aside: Both of these outfits reflect the influence of French information retrieval wizards. I think of these as emulators of Datops “balls” displays.)
A results list displayed by the Grokker system. The idea is that the user explores the circular areas. These contain links to content germane to the user’s keyword query.
The Kartoo interface displays sources connected to related sources. Once again the user clicks and goes through the scan, open, read, extract, and analyze process.
In a broad view, both of these visualizations are maps of information. Do today’s users want these types of hard to understand maps?
In CyberOSINT I explore the role of “maps” (or, more properly, geographic intelligence (geoint), geo-tagging, and geographic outputs) from automatically collected and analyzed data.
The idea is that a next generation information access system recognizes geographic data and displays those data in maps. Think in terms of overlays on the eye popping maps available from commercial imagery vendors.
What do these outputs look like? Let me draw one example from the discussion in CyberOSINT about this important approach to enterprise related information. Keep in mind that an NGIA can process any information made available to the system; for example, enterprise accounting systems or databased content along with text documents.
In response to either a task, a routine update when new information becomes available, or a request generated by a user with a mobile device, the output looks like this on a laptop:
Source: ClearTerra, 2014
The approach that ClearTerra offers allows a person looking for information about customers, prospects, or other types of data which carry geo-codes to see that information on a dynamic map. The map can be displayed on the user’s device; for example, a mobile phone. In some implementations, the map is a dynamic PDF file which displays locations of items of interest as the item of interest moves. Think of a person driving a delivery truck or an RFID tagged package.
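For readers who want to see what "data which carry geo-codes" means in practice, here is a small Python sketch. It does not show ClearTerra's software or file formats, which are not described in this write up. The record layout and the sample values are hypothetical. The sketch converts records with latitude and longitude into GeoJSON, an open format that many mapping tools can display as an overlay. Records without a geo-code are skipped.

```python
import json

# Hypothetical enterprise records; some carry geo-codes and some do not.
RECORDS = [
    {"name": "Customer A", "lat": 38.25, "lon": -85.76},
    {"name": "Prospect B", "lat": None, "lon": None},
    {"name": "Truck 7", "lat": 38.04, "lon": -84.50},
]


def to_geojson(records):
    """Build a GeoJSON FeatureCollection of points from geo-coded records.
    GeoJSON stores coordinates as [longitude, latitude], in that order."""
    features = []
    for rec in records:
        if rec.get("lat") is None or rec.get("lon") is None:
            continue  # nothing to plot without a geo-code
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [rec["lon"], rec["lat"]]},
            "properties": {"name": rec["name"]},
        })
    return {"type": "FeatureCollection", "features": features}


if __name__ == "__main__":
    print(json.dumps(to_geojson(RECORDS), indent=2))
```

For a moving item like the delivery truck, the same conversion would be rerun each time a new position report arrives.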
February 5, 2015
I have been tracking Twitter search for a while. There are good solutions, but these require some heavy lifting. The public services are hit and miss. Have you poked into the innards of TweetTunnel?
I read “Twitter Strikes Search Deal with Google to Surface Tweets.” Note that this link may require you to pay for access or the link has gone dead. According to the news story:
The deal means the 140-character messages written by Twitter’s 284 million users could be featured faster and more prominently by the search engine. The hope is that greater placement in Google’s search results could drive more traffic to Twitter, which could one day sell advertising to these visitors when they come to the site, or more important, entice them to sign up for the service.
Twitter wants to monetize its content. Google wants to sell ads.
The only hitch in the git along is that individual tweets are often less useful than processing of tweets by a person, a tag, or some other index point. A query for a tweet can be darned misleading. Consider running a query for a tweet on the Twitter search engine. Enter the term “thunderstone”. What do you get? Games. What about the search vendor Thunderstone? Impossible to find, right?
For full utility from Twitter, one may want to license the Twitter stream from an authorized vendor. Then pump the content into a next generation information access system. Useful outputs result for many concepts.
For more about NGIA systems and processing large flows of real time information, see CyberOSINT: Next Generation Information Access. Reading an individual tweet is often less informative than examining subsets of tweets.
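The point about subsets can be illustrated with a short Python sketch. It is a toy and not how any NGIA system or licensed Twitter feed works. The sample tweets and author handles are invented. The sketch groups tweets by hashtag so that a reader can look at a set of related messages instead of one message at a time.

```python
import re
from collections import defaultdict

HASHTAG = re.compile(r"#(\w+)")

# Invented sample tweets as (author, text) pairs.
TWEETS = [
    ("gamer1", "Playing #Thunderstone tonight"),
    ("gamer2", "New #thunderstone expansion is out"),
    ("searchfan", "Testing a #search appliance from Thunderstone"),
]


def group_by_hashtag(tweets):
    """Index tweets under each lowercased hashtag they contain."""
    groups = defaultdict(list)
    for author, text in tweets:
        for tag in HASHTAG.findall(text.lower()):
            groups[tag].append((author, text))
    return dict(groups)


if __name__ == "__main__":
    for tag, items in sorted(group_by_hashtag(TWEETS).items()):
        print(tag, len(items))
```

In this sample the two game tweets land under one tag, while the search vendor tweet shows up only under "search". A hashtag index alone does not solve the Thunderstone ambiguity. That takes entity handling of the sort an NGIA system is supposed to provide.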
Stephen E Arnold, February 5, 2015
February 4, 2015
I have been following the “blast from the past” articles that appear on certain content management oriented blogs and news services. I find the articles about federated search, governance, and knowledge related topics oddly out of step with the more forward looking developments in information access.
I am puzzled because the keyword search sector has been stuck in a rut for many years. The innovations touted in the consulting-jargon of some failed webmasters, terminated in house specialists, and frustrated academics are old, hoary with age, and deeply problematic.
There are some facts that cheerleaders for the solutions of the 1970s, 1980s, and 1990s choose to overlook:
- Enterprise search typically means a subset of content required by an employee to perform work in today’s fluid and mobile work environment. The mix of employees and part timers translates to serious access control work. Enterprise search vendors “support” an organization’s security systems in the manner of a consulting physician to heart surgery. Inputs but no responsibility are the characteristics.
- The costs of configuring, testing, and optimizing an old school system are usually higher than the vendor suggests. When the actual costs collide with the budget costs, the customer gets frisky. Fast Search & Transfer’s infamous revenue challenges came about in part because customers refused to pay when the system was not running and working as the marketers suggested it would.
- Employees cannot locate needed information and don’t like the interfaces. The information is often “in” the system but not in the indexes. And if in the indexes, the users cannot figure out which combination of keywords unlocks what’s needed. The response is, “Who has time for this?” When a satisfaction measure is required, somewhere between 55 and 75 percent of the search system’s users don’t like it very much.
Obviously organizations are looking for alternatives. Some use open source solutions which are good enough. Other organizations put up with Windows’ search tools, which are also good enough. More important software systems like an enterprise resource planning or accounting system come with basic search functions. Again: These are good enough.
The focus of information access has shifted from indexing a limited corpus of content using a traditional solution to a more comprehensive, automated approach. No software is without its weaknesses. But compared to keyword search, there are vendors pointing customers toward a different approach.
Who are these vendors? In this short write up, I want to highlight the type of information about next generation information access vendors in my new monograph, CyberOSINT: Next Generation Information Access.
I want to highlight one vendor profiled in the monograph and mention three other vendors in the NGIA space which are not included in the first edition of the report but for whom I have reports available for a fee.
I want to direct your attention to Knowlesys, an NGIA vendor operating in Hong Kong and the Nanshan District, Shenzhen. On the surface, the company processes Web content. The firm also provides a free download of a scraping software, which is beginning to show its age.
Dig a bit deeper, and Knowlesys provides a range of custom services. These include deploying, maintaining, and operating next generation information access systems for clients. The company’s system can automatically process and make available content from internal, external, and third party providers. Access is available via standard desktop computers and mobile devices:
Source: Knowlesys, 2014.
The system handles both structured and unstructured content in English and a number of other languages.
The company does not reveal its clients and the firm routinely ignores communications sent via the online “contact us” mail form and faxed letters.
How sophisticated is the Knowlesys system? Compared to the other 20 systems analyzed for the CyberOSINT monograph, my assessment is that the company’s technology is on a par with that of other vendors offering NGIA systems. The plus of the Knowlesys system, if one can obtain a license, is that it will handle Chinese and other ideographic languages as well as the Romance languages. The downside is that for some applications, the company’s location in China may be a consideration.