Interview with Stephen E. Arnold Reveals Insights about Content Processing
March 22, 2016
Nikola Danaylov of the Singularity Weblog interviewed technology and financial analyst Stephen E. Arnold on the latest episode of his podcast, Singularity 1 on 1. The interview, Stephen E. Arnold on Search Engines and Intelligence Gathering, offers thought-provoking ideas on important topics related to sectors such as intelligence, enterprise search, and finance, all of which use indexing and content processing methods Arnold has worked with for over 50 years.
Arnold attributes the origins of his interest in technology to a programming challenge he sought and accepted from a computer science professor, outside the realm of his college major of English. His focus on creating actionable software and his affinity for problem-solving of any nature led him to leave PhD work for a job with Halliburton Nuclear. His career includes employment at Booz, Allen & Hamilton, the Courier Journal & Louisville Times, and Ziff Communications, before starting ArnoldIT.com strategic information services in 1991. He co-founded and sold a search system to Lycos, Inc., worked with numerous organizations, including several intelligence and enforcement organizations such as the US Senate Police and the General Services Administration, and authored seven books and monographs on search-related topics.
With a continued emphasis on search technologies, Arnold began his blog, Beyond Search, in 2008 aiming to provide an independent source of “information about what I think are problems or misstatements related to online search and content processing.” Speaking to the relevance of the blog to his current interest in the intelligence sector of search, he asserts:
“Finding information is the core of the intelligence process. It’s absolutely essential to understand answering questions on point and so someone can do the job and that’s been the theme of Beyond Search.”
As Danaylov notes, the concept of search encompasses several areas where information discovery is key for one audience or another, whether for counter-terrorism, commercial, or other purposes. Arnold agrees:
“It’s exactly the same as what the professor wanted to do in 1962. He had a collection of Latin sermons. The only way to find anything was to look at sermons on microfilm. Whether it is cell phone intercepts, geospatial data, processing YouTube videos uploaded from a specific IP address – exactly the same problem and process. The difficulty that exists is that today we need to process data in a range of file types and at much higher speeds than ever anticipated, but the processes remain the same.”
Arnold explains the iterative nature of his work:
“The proof of the value of the legacy is I don’t really do anything new, I just keep following these themes. The Dark Web Notebook is very logical. This is a new content domain. And if you’re an intelligence or information professional, you want to know, how do you make headway in that space.”
Describing his most recent book, Dark Web Notebook, Arnold calls it “a cookbook for an investigator to access information on the Dark Web.” This monograph includes profiles of little-known firms which perform high-value Dark Web indexing and follows a book he authored in 2015 called CYBEROSINT: Next Generation Information Access.
Danaylov asked Arnold to offer a glimpse into the big picture of the current ecosystem of search firms, which Arnold describes as “an extremely rich spectrum” ranging from well-known open-source solutions such as Elastic on one end to large brands like Oracle on the other. Additionally, he discusses Terbium Labs and Hyperion Gray as important players using mathematical methods from various disciplines like nuclear physics.
Another question regarding companies in the search space generated interesting discussion involving Recorded Future. Arnold identifies Recorded Future as the “leader in predictive analytics,” an area about which he warns that “any mathematical system can be interpreted as predictive, so we have to put the marketing aside.” He mentions Recorded Future’s ability to process standard web content and unstructured data such as text messaging for analysts to use in generating insights; additionally, they are “one of the best companies at displaying those signals in temporal structures.”
Other topics discussed in the interview include observations about Palantir, Arnold’s trilogy of books about Google, surveillance and Big Brother, Edward Snowden, and ethics. Learn more by listening to the full interview, which also features Arnold’s thoughts on what he considers the major issues in search and content processing today.
Megan Feil. March 21, 2016