April 4, 2013
Robert Steele has been a prescient thinker and actor in the intelligence sector for decades. In 1979 he was competitively selected to join the Central Intelligence Agency’s clandestine service. He spent nine years with the CIA, doing three tours overseas as a case officer recruiting and handling agents. In 1986, he helped write the Marine Corps Master Intelligence Plan (MCMIP) as well as a plan for a Marine Corps Intelligence Center (MCIC). Over the last 30 years, Mr. Steele has worked on a wide range of projects around the world.
In an interview which appeared on HighGainBlog, he said:
For all the money we spend on it, the secret world is not really providing the return on investment taxpayers should expect. Intelligence – decision support – is simply not being provided to everyone that needs it.
His views on the relationship of intelligence to decision support caught my attention as well. He said:
intelligence helps to emphasize that intelligence is synonymous with decision-support – the output of a very robust process of requirements definition, collection management, source discovery and validation, multi-source fusion, historically- and culturally-informed analytics, and the sharp visualization that answers an important question for a particular decision-maker considering a particular decision challenge. Few realize that most of what is produced by the secret world is not intelligence at all. Rather, it is secret information that is generic in nature and often not useful to decision-makers.
Mr. Steele’s views on open source software identify a trend which has been accelerating in the last few years. Proprietary software has issues which have turbocharged open source software adoption. He asserted:
Proprietary software is unsafe, does not scale, and is unaffordable. I have been unhappy with all vendors for the past 40 years because not a single one of them is committed to helping people make sense of information – they focus on trapping customers into using them as a core system, make promises they cannot keep, and then over-charge for configuration management and data conversion. I am also very concerned about Google’s computational mathematics and programmable search engines – I have a very high regard for Google’s expertise, and a very low regard for the government’s ability to understand how Google can manipulate search outcomes and other forms.
For those interested in intelligence activities, the new Robert Steele interview is a must read. You can find the Steele 2013 interview on HighGainBlog. Mr. Steele’s Public Intelligence blog is a valuable resource.
Stephen E Arnold, April 4, 2013
Sponsored by Augmentext
March 20, 2013
Antonio S. Valderrábanos, founder of Bitext, recently granted an exclusive interview to the Arnold Information Technology Search Wizards Speak series. Bitext provides multilingual semantic technologies, with probably the highest accuracy in the market, for companies that use text analytics and natural language interfaces. The full text of the interview is available at http://www.arnoldit.com/search-wizards-speak/bitext-2.html.
Bitext provides B2B multilingual semantic technologies with probably the highest accuracy in the market. Bitext works for companies in two main markets: Text Analytics (Concept and Entity Extraction, Sentiment Analysis) for Social CRM, Enterprise Feedback Management or Voice of the Customer; and Natural Language Interfaces for Search Engines and Virtual Assistants. Visit Bitext at http://www.bitext.com. Contact information is available at http://www.bitext.com/contact.html.
Bitext is seeing rapid growth, including recent deals with Salesforce and the Spanish government. The company has added significant technology to its multilingual content processing system.
In addition to support for more languages, the company is getting significant attention for its flexible sentiment analysis system. Valderrábanos gave this example: “flies” may be a noun, but also a verb. We say “time flies like an arrow” versus “fruit flies like bananas.” Bitext believes computers should be able to parse both sentences and get the right meaning. With that goal in mind, they started the development of an NLP (natural language processing) platform flexible enough to perform multilingual analysis just by exchanging grammars, not modifying the core engine.
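Valderrábanos’ “flies” example can be sketched in a few lines. This toy is not Bitext’s actual method: it disambiguates the word only by checking a tiny, invented dictionary of known noun compounds, whereas a real grammar-driven system considers far more context.

```python
# Invented two-entry compound dictionary; real disambiguation uses
# grammars and statistics over far more context.
KNOWN_NOUN_COMPOUNDS = {"fruit flies"}

def tag_flies(sentence: str) -> str:
    """Decide whether 'flies' is a noun or the main verb by checking if
    the preceding word forms a known noun compound with it."""
    words = sentence.lower().rstrip(".").split()
    i = words.index("flies")
    bigram = f"{words[i - 1]} {words[i]}"
    return "NOUN" if bigram in KNOWN_NOUN_COMPOUNDS else "VERB"
```

With this sketch, “Time flies like an arrow” yields a verb reading and “Fruit flies like bananas” a noun reading, which is the distinction Bitext says a parser must capture.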
He told ArnoldIT’s Search Wizards Speak:
“Our system and method give us a competitive advantage with regards to quick development and deployment,” Valderrábanos said. “Currently, our NLP platform can handle 10 languages. Unlike most linguistic platforms, the Bitext API ‘snaps in’ to existing software.”
Bitext’s main area of research is focused on deep language analysis, which captures the semantics of text. “Our work involves dealing with word meanings and truly understanding what they mean, interpreting wishes, intentions, moods or desires,” Valderrábanos explained. “We just need to know what type of content, according to our client, is useful for her business purposes, and then we program the relevant linguistic structures.” He added:
“Many vendors advocate a ‘rip and replace.’ Bitext does not. Its architecture allows our system to integrate with almost any enterprise application.”
Bitext already delivers accuracy, reliability and flexibility. In the future, the company will be focusing on bringing those capabilities to mobile applications. “iPads, tablet devices in general, and mobile phones are becoming the main computing devices in a world where almost everybody will be always online. This opens a whole new arena for mobile applications which will have to cater for any single need mobile users may have,” Valderrábanos said.
Donald C. Anderson, March 20, 2013
March 15, 2013
Mark Bennett is a recent addition to the LucidWorks team, after New Idea Engineering joined LucidWorks. Stephen Arnold recently interviewed Bennett for his noteworthy series, Search Wizards Speak. “An Interview with Mark Bennett” can be found on the ArnoldIT Web site.
After discussing many of the latest trends in search, Arnold and Bennett turn to the question of proprietary search solutions, and how they are responding to the surge in open source.
Bennett weighs in:
“Some organizations will use open source because its efficiencies are recognized by management. Other organizations will embrace open source because a vendor offers 24×7 support like LucidWorks and has world class engineers available to customize the system. The feature-set is different as well, enterprise buyers care about analytics and data quality, and would prefer a graphical UI. Other organizations will stick with what has been traditionally licensed year after year indifferent to the fact that what’s in an IBM solution may be open source or totally proprietary like Oracle Endeca or Oracle InQuira.”
Bennett is a great addition to the LucidWorks team, which has expanded again recently with the addition of Stephen Tsuchiyama as SVP. LucidWorks is increasing its staff to meet the growing demand for open source software in the enterprise. But LucidWorks is not just responding to a trend; the company has been a leader in search and customer service for years, so it is also benefiting from its stellar reputation.
Emily Rae Aldridge, March 15, 2013
March 5, 2013
Engineer Mark Bennett says it’s the tools that matter. Beyond Search agrees. Having tools and talking about tools are two very different things.
Mr. Bennett, co-founder of New Idea Engineering, recently brought more than twenty years’ enterprise search experience to LucidWorks, along with knowledge across major commercial search platforms, superior mathematics and physics-related disciplinary training, and a history in the search industry, including an early tenure at Verity, one of the pioneers in enterprise and large-scale information retrieval back in the 1990s.
Mark Bennett of LucidWorks, a member of their core enterprise search engineering team, recently granted an exclusive interview to the Arnold Information Technology Search Wizards Speak series to discuss the trajectory of search in 2013. LucidWorks is the leading developer of search, discovery, and analytics software based on Apache Lucene and Apache Solr technology. The full text of the interview is available at http://goo.gl/eoeuz.
He told Beyond Search:
“In a nutshell: search, analytics, and content processing vendors have to recognize that what is needed to allow developers to use the product is different from what is required to sell the product and deliver software which users embrace,” Bennett said about the immediate future of search products. “The challenge that keeps search specialists engaged is the problem of dealing with outliers—bizarre business requirements that every project seems to unearth. Outliers are the new norm.”
Bennett recalls a talk with a vendor ten years ago about a particularly tough search problem. At the time, the vendor “ticked off a half dozen reasons why it was really very hard to solve and not worth the effort.” Years later, open source developers revisited the same problem, came up with a similar list, and diligently worked through those items. “LucidWorks, for instance, delivers facets, suggestions, advanced file storage, and high performance without the punishing costs of proprietary solutions,” Bennett explained.
Stephen E. Arnold, Managing Director of Arnold Information Technology and publisher of the influential search industry blog Beyond Search, said:
“In my analysis of open source search, I rated LucidWorks as one of the leading vendors in enterprise search. Other firms with open source components have not yet achieved the technical critical mass of LucidWorks. Proprietary search vendors are integrating open source search technology into their systems in an effort to reduce their technology costs. At this time, LucidWorks is one of the leading vendors of enterprise and Web-centric search. Firms like Attivio (http://www.idc.com/getdoc.jsp?containerId=236514#.US9fGzBcgug) and ElasticSearch (http://www.idc.com/getdoc.jsp?containerId=237410) are racing to catch up with LucidWorks’ robust technology, engineering and consulting services, and training programs.”
Bennett commented on the differences between LucidWorks and other retrieval solutions companies. “Despite all the comparisons done lately, the target audiences for most open source solutions are very different,” he explained. “If you spin up a copy of Solr you’ve got a very powerful Web user interface, and LucidWorks gives you even more of an administrative user interface. But when you fire up ElasticSearch, you’ve got a REST API.”
Bennett still often works from the Unix command prompt. “But when I watch a Windows or Mac power user for a day, and then watch a Unix command prompt guru—both get a lot of work done. My point is that each is a different type of power user. By the way, I work from the Unix command prompt myself.”
His point is that vendors need to be able to address users’ interface preferences. “I do wonder what happens when an ElasticSearch developer hands off an application to a busy information technology person or an operations team to manage. Either those new owners will need to know the ‘Web command line’ (URL and JSON syntax) extremely well, or an administrative framework will be needed.”
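The “Web command line” Bennett describes can be made concrete with a hedged sketch. The hosts, collection, and index names below are hypothetical; the point is only the contrast between Solr’s URL-parameter style and ElasticSearch’s JSON request body.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoints for illustration only.
SOLR = "http://localhost:8983/solr/mycollection"
ES = "http://localhost:9200/myindex"

def solr_query_url(q: str, rows: int = 10) -> str:
    """Solr reads the query from URL parameters, so a GET from a
    browser or curl is enough; no request body is required."""
    return f"{SOLR}/select?{urlencode({'q': q, 'rows': rows, 'wt': 'json'})}"

def es_query_body(q: str, size: int = 10) -> str:
    """ElasticSearch expects a JSON document POSTed to /_search."""
    return json.dumps({"query": {"query_string": {"query": q}}, "size": size})
```

Neither style is wrong; the question is which kind of power user, developer or administrator, has to live with it after handoff.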
LucidWorks is a step beyond more commercial proprietary search systems, in Bennett’s opinion, because it serves both groups of users. “Our professional services team has experience with many other search engines. Chances are we’ve worked with many of the pieces before and know how to crack tough problems quickly. If an issue is a first-time event, I am confident we can develop a solution.”
“LucidWorks has delivered an open source enterprise search solution which accomplishes two things,” Arnold said. “First, it is an excellent alternative to many proprietary information retrieval systems. Second, the system takes the rough edges off some open source search solutions which add to an organization’s costs, not keeping them within budget allocations.”
Search is not a “one size fits all” solution, Bennett confirmed. “So while some engines drop features that ‘only three percent of people will ever use’, other groups realize that it’s the tools that matter.”
Visit the LucidWorks website at http://www.lucidworks.com.
Donald Anderson, March 5, 2013
February 27, 2013
Tom Reamy, Chief Knowledge Architect and founder of KAPS Group, a group of knowledge architecture, taxonomy, and eLearning consultants, spoke with Stephen E. Arnold of ArnoldIT.com about the upcoming Text Analytics World conference, scheduled for April 17-18, 2013. The conference addresses several important real world topics, including big data, social media, enterprise applications, intelligence applications, and knowledge organization. The full text of the interview is available at http://goo.gl/eCDwi.
Mr. Reamy brings years of content structure and text analytics experience to the conference, which covers the entire spectrum of text analytics, from big data and social media to enterprise text analytics, including everything from fixing enterprise search to developing advanced, smart applications that gain real value from unstructured text. Mr. Reamy will deliver the keynote session “Full Spectrum Text Analytics: Integration of Text Mining & Text Analytics.”
He told Beyond Search:
We also cover all the practical use case examples that let new developers learn how to do TA and more experienced developers share the latest techniques. We cover the real business value that TA can bring to the enterprise, and, lastly, given my interest in theory, we share new ideas and new techniques that enrich the theoretical foundation that is needed to deliver the best (and most practical) applications.
Mr. Reamy has worked in artificial intelligence, computer game and educational software development and corporate intranet consulting, which convinced him of the importance of improving access to data.
Mr. Reamy said:
The basic problem is that search engines don’t understand meaning. Humans think in concepts and search engines deal with meaningless strings. Search companies and users alike have been very creative about trying to overcome the basic stupidity of search engines.
And text analytics is the answer to “smart” searching. “Quite simply, it’s potentially the salvation of search,” Mr. Reamy said. “TA is the piece that can add meaning to search by more closely matching how people think, and can do it more cheaply and consistently than human taggers.”
With new search vendors appearing every day, Text Analytics World conference information can arm your company with the knowledge needed to make informed and cost-effective choices.
For more on the conference, visit http://www.textanalyticsworld.com/sanfrancisco/2013.
Don Anderson, February 27, 2013
January 30, 2013
Miles Kehoe, formerly a senior manager at Verity and then the founder of New Idea Engineering, joined LucidWorks in late 2012. I worked with Miles on a project and found him a top notch resource for search and the tough technical area which was our concern.
I was able to interview Miles Kehoe on January 25, 2013. He was forthcoming and offered me insights which I found fresh and practical. For example, he told me:
You know I come from a ‘platform neutral’ background, and I know many of the folks involved with ElasticSearch. Their product addresses many of the shortcomings in Solr 3.x, and a year or two ago that would have been a coup. But now, Solr 4 completely addresses those shortcomings, and then some, with SolrCloud and ZooKeeper. ES says it doesn’t require a pesky ‘schema’ to define fields; and when you’re playing with a product for the first time, that is kind of nice. On the other hand, folks I know who have attempted production projects with ES tell me there’s no way you want to go into production without a schema. Apache Lucene and Solr enjoy a much larger community of developers. If you check the Wikipedia page, you’ll see that Lucene and Solr both list the Apache Software Foundation as the developer; ElasticSearch lists a single developer, who it turns out, has made the vast majority of updates to date. While it is based on Apache Lucene, ElasticSearch is not an Apache project. Both products support RESTful API usage, but Elastic requires all transactions to use JSON. Solr supports JSON as well, but goes beyond to support transactions in many formats including XML, Java, PHP, CSV and Python. This lets you write applications to interact with Solr in any language and with any protocol you want to use. But the most noticeable difference is that Solr has an awesome Web-based admin UI; ES doesn’t. If you’re only writing code, you might not care, but the second a project is handed over to an Admin group they’re bound to notice! It makes me smile every time somebody says ES and “ease of use” in the same sentence – you remember the MS DOS prompt back in 1990? Although early adopters enjoyed that “simplicity”, business people preferred mouse-based systems like the Mac and Windows.
We’re seeing this play out all over again – busy IT people want an admin UI – they don’t want to spend all day at what amounts to a “web command line”, stitching together URLs and JSON commands.
I found this comment prescient. I learned about a possible issue triggered by ElasticSearch in “Github Search Exposes Passwords Then Crashes.”
I pressed Mr. Kehoe for key points of differentiation in open source search. I pointed out that every vendor is rushing to embrace open source search. Some do it with lights flashing like IBM and others operate in a lower profile manner like Attivio. He told me:
Just as we have different products and services for our customers, we can customize our engagements to meet our customers’ needs. Some of our customers want to have deep product expertise in-house, and with training, best practice and advisory consulting, and operations/production consulting, we help them come up to speed. We also provide ongoing technical and production support for mission critical applications – just last month an eCommerce site ran into production problems on the Friday afternoon before Christmas. We were able to help them out and have them at full capacity before dinner. Not to dwell on it, but what sets LucidWorks apart is the people. We employ a large number of the team that created and enhances Lucene and Solr including Grant Ingersoll, Steve Rowe and Yonik Seeley. We also have significant expertise on the business side as well. At the top, Paul Doscher grew Exalead from an unknown firm into a major enterprise search player over just a few years; my former business partner Mark Bennett and I have built up deep understanding of search since our Verity days in the early 1990s.
Important information, I believe, for those analyzing search systems.
You can read the full text of the interview on the ArnoldIT Search Wizards Speak series at http://goo.gl/31682. Search Wizards Speak is the largest, no cost, freely available collection of interviews with experts in search and content processing. There are more than 60 interviews available. You can find the full series listing at http://www.arnoldit.com/search-wizards-speak/ and http://arnoldit.com/wordpress/wizards-index/.
Stephen E Arnold, January 30, 2013
Sponsored by Dumante.com
January 14, 2013
Dr. Jerry Lucas, founder of TeleStrategies, is an expert in digital information and founder of the ISS World series of conferences. “ISS” is shorthand for “intelligence support systems.” The scope of Dr. Lucas’ interests ranges from the technical innards of modern communications systems to the exploding sectors for real time content processing. Analytics, fancy math, and online underpin Dr. Lucas’ expertise and form the backbone of the company’s training and conference activities.
What makes Dr. Lucas’ viewpoint of particular value is his deep experience in “lawful interception, criminal investigations, and intelligence gathering.” The perspective of an individual with Dr. Lucas’ professional career offers an important and refreshing alternative to the baloney promulgated by many of the consulting firms explaining online systems.
Dr. Lucas offered a more “internationalized” view of the Big Data trend which is exercising many US marketers and sales professionals. He said:
“Big Data” is an eye catching buzzword that works in the US. But as you go east across the globe, “Big Data” as a buzzword doesn’t get traction in the Middle East, Africa and Asia Pacific Regions if you remove Russia and China. One interesting note is that Russian and Chinese government agencies only buy from vendors based in their countries. The US Intelligence Community (IC) has big data problems because of the obvious massive amount of data gathered that’s now being measured in zettabytes. The data gathered and stored by the US Intelligence Community is growing beyond what typical database software products can handle as well as the tools to capture, store, manage and analyze the data. For the US, Western Europe, Russia and China, “Big Data” is a real problem and not a hyped up buzzword.
Western vendors have been caught in the boundaries between different countries’ requirements. Dr. Lucas observed:
A number of western vendors made a decision because of the negative press attention to abandon the global intelligence gathering market. In the US Congress Representative Chris Smith (R, NJ) sponsored a bill that went nowhere to ban the export of intelligence gathering products period. In France a Bull Group subsidiary, Amesys, legally sold intelligence gathering systems to Libya but received a lot of bad press during the Arab Spring. Since Amesys represented only a few percent of Bull Group’s annual revenues, they just sold the division. Amesys is now a UAE company, Advanced Middle East Systems (Ames). My takeaway here is that governments particularly in the Middle East, Africa and Asia have concerns about the long term regional presence of western intelligence gathering vendors who desire to keep a low public profile. For example, choosing not to exhibit at ISS World Programs. The next step by these vendors could be abandoning the regional marketplace and product support.
The desire for federated information access is, based on the vendors’ marketing efforts, high. Dr. Lucas made this comment about the existence of information silos:
Consider the US where you have 16 federal organizations collecting intelligence data plus the oversight of the Office of the Director of National Intelligence (ODNI). In addition there are nearly 30,000 local and state police organizations collecting intelligence data as well. Data sharing has been a well identified problem since 9/11. Congress established the ODNI in 2004 and funded the Department of Homeland Security to set up State and Local Data Fusion Centers. To date Congress has not been impressed. DNI James Clapper has come under intelligence gathering fire over Benghazi and the DHS has been criticized in an October Senate report alleging that the $1 billion spent by DHS on 70 state and local data fusion centers has been a waste of money. The information silo or information stovepipe problem will not go away quickly in the US for many reasons. Data cannot be shared because one agency doesn’t have the proper security clearances; because of job security, which means “as long as I control access to the data I have a job”; and because of privacy issues, among others.
Stephen E Arnold interviewed Dr. Lucas on January 10, 2013. The full text of the exclusive interview is available on the ArnoldIT.com subsite “Search Wizards Speak” at http://www.arnoldit.com/search-wizards-speak/telestrategies-2.html. The full text of the 2011 interview with Dr. Lucas is at this link.
Donald Anderson, January 14, 2013
December 4, 2012
Cybertap is a company which pushes beyond key word search. The firm’s technology permits a different type of information retrieval.
In an exclusive interview with ArnoldIT, Cybertap revealed that hidden within the network traffic are malicious attacks, personal and medical information leaks, and insider theft of intellectual property and financial information. Cybertap’s clients use Recon to keep tabs on the good and the bad being done on their networks and who’s doing it, so that they can take the proper actions to mitigate any damage and bring the individuals to account.
Dr. Russ Couturier, Chief Technology Officer of Cybertap, recently granted an exclusive interview to the Arnold Information Technology Search Wizards Speak series to discuss Cybertap Recon, a product that applies big data analytics to captured network traffic to give organizations unparalleled visibility into what is transpiring both on and to their networks.
Until recently, the firm’s technology was available to niche markets. However, due to the growing demand to identify potentially improper actions, Cybertap has introduced its technology to organizations engaged in fraud detection and related disciplines. The Cybertap system facilitates information analysis in financial services, health care, and competitive intelligence.
Dr. Couturier said:
“Recon is able to decrease risk and improve your situational awareness by decreasing the time to resolution of a cyber event and by improving your knowledge of what happened during a cyber event. We are incorporating big data analysis techniques to reduce the meaningless data and quantify the meaningful information using categorization, semantic, and sentiment tools,” Couturier said. “Recon presents the information as it was originally seen so analysts can follow conversations and threads in context.”
The firm’s system processes content, embedded files, attachments, attributes, network protocol data, metadata, and entities. Developers incorporated semantic analysis tools to “roll-up” large volumes of data into what they call “themes” and “topics.” This aggregation enables researchers to more quickly decide whether information is relevant.
Mash ups and data fusion are crucial when dealing with big data. You can search, visualize, link, and reconstruct exactly what happened from the primary source and reduce investigation times by hours or days.
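The “theme” roll-up described above can be illustrated with a minimal sketch. The keyword-to-theme lexicon below is invented; Cybertap’s actual aggregation method uses semantic analysis and is proprietary.

```python
from collections import defaultdict

# Invented keyword-to-theme lexicon; a real system derives themes
# semantically rather than from a hand-built word list.
THEMES = {
    "wire": "finance", "transfer": "finance", "invoice": "finance",
    "password": "security", "login": "security",
}

def roll_up(documents: dict) -> dict:
    """Group document IDs under the themes their words hit, so an analyst
    can triage a handful of themes instead of reading every message."""
    buckets = defaultdict(list)
    for doc_id, text in documents.items():
        hits = {THEMES[w] for w in text.lower().split() if w in THEMES}
        for theme in sorted(hits) if hits else ["uncategorized"]:
            buckets[theme].append(doc_id)
    return dict(buckets)
```

Even this toy shows the payoff: a large capture collapses into a short list of themes, and only the documents under a suspicious theme need a close read.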
Cybertap is one of a handful of content processing firms taking findability to a new level of utility. The firm’s system combines next-generation methods with a search box and visualization to provide unique insights into information processed by the Cybertap system. The full text of the interview is available at www.arnoldit.com/search-wizards-speak/cybertap.html.
Cybertap LLC’s vision is to integrate best-of-breed cyber forensics, analysis, and security technologies. Cybertap serves all markets requiring next-generation data analysis solutions, including federal government markets (both civilian and Department of Defense agencies), commercial markets, and state and local governments. The privately held company has offices in Vienna, Virginia; Englewood, Colorado; and Palmer, Massachusetts.
The system is important because it underscores the opportunities for innovators in information retrieval and analysis. Cybertap combines search with a range of functions which allow a combination of alerting, discovering, and finding. In my experience, few products offer this type of pragmatic insight without the costs and complexities of traditional systems built by cobbling together different vendors’ products.
Search Wizards Speak is the largest collection of interviews with innovators and developers working in search and content processing. An index to the more than 60 interviews is available at http://www.arnoldit.com/search-wizards-speak/.
Additional information about Cybertap LLC is available at http://www.cybertapllc.com.
Stephen E Arnold, December 4, 2012
November 21, 2012
I continue to learn about companies with high-value content processing technologies. The challenge in real-time translation, if one believes the Google marketing, is now in “game over” mode. The winner, of course, is Google. Other firms can head to the showers and maybe think about competing in another business sector.
But some of that Google confidence may be based on assumptions about Google’s language processing expertise, not more recent systems and methods. I know. This is “burn at the stake” information to a Googler.
However, I saw a demonstration which made clear to me that Google’s “kitchen sink” approach to figuring out how to handle speech input and near real time translation may not be in step with other firms’ approaches. The company with some quite interesting translation technology and a commitment to easy integration is IMT Holdings. The privately held company’s product is Rosoka.
IMT Holdings, Corp. was founded in 2007, with a background in US government contracting. In the course of the firm’s work, co-founder Mike Sorah saw that existing natural language processing (NLP) tools were not able to handle the volumes and complexities of the data the firm needed to process. In December 2011, IMT began actively marketing its NLP technology.
I was able after some telephone tag and email to interview Mike Sorah, one of the co-founders of IMT and one of the wizards behind the Rosoka technology.
Mr. Sorah told me:
Many of the existing NLP tools claim to be multilingual, but what they mean is that they have linguistic knowledge bases, usually acquired from vendors who provide dictionaries and libraries, that make NLP an issue for many licensees. But most NLP systems don’t process documents that contain English and Chinese or English and Spanish. In the world of our clients, mixed language documents are important. These have to be processed as part of the normal stream, not put in an exception folder and maybe never processed or processed after a delay of hours or days.
The Rosoka system is different from other NLP and translation systems on the market at this time. He asserted:
In most multilingual NLP systems, the customer needs to know before they process the document what language the document is in so they can load the appropriate language-specific knowledge base. What we did via our proprietary Rosoka algorithms was to take a multilingual look at the world. Our system automatically understands that a document may be in English or Chinese, or even English and Spanish mixed. The language angle is huge. We randomly sample the Twitter stream and have been tweeting what the top 10 languages of the week are. English varies between 35 to 45% of the tweets. Every language that Rosoka can process is included. Our multilingual support is not sold as separate, add-on functionality.
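A toy sketch of per-token language tagging suggests how mixed-language documents differ from whole-document language detection. The miniature word lists are invented, and this is not Rosoka’s algorithm; the point is only that tagging tokens individually keeps an English/Spanish document usable without a pre-declared language.

```python
# Invented miniature word lists; a real system uses full lexicons
# and statistical models rather than two tiny sets.
WORDLISTS = {
    "en": {"the", "meeting", "is", "tomorrow", "a"},
    "es": {"la", "reunión", "es", "mañana", "a"},
}

def tag_tokens(text: str):
    """Tag each token with a language instead of assigning one language
    to the whole document, so mixed English/Spanish text stays usable."""
    tagged = []
    for token in text.lower().split():
        langs = [lang for lang, words in WORDLISTS.items() if token in words]
        if len(langs) == 1:
            tag = langs[0]
        elif langs:
            tag = "ambiguous"  # word exists in several languages
        else:
            tag = "unknown"
        tagged.append((token, tag))
    return tagged
```

A whole-document detector would force “the reunión is mañana” into one language and mishandle half the tokens; the per-token view keeps both halves.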
You can read the full text of the interview with Mike Sorah in the ArnoldIT.com Search Wizards Speak series at this link. More information about IMT and Rosoka is available from the firm’s Web site, http://www.imtholdings.com.
Stephen E. Arnold, November 21, 2012
November 12, 2012
With the hype surrounding analytics, I have difficulty separating the wheat from the Wheaties when it comes to companies offering next-generation analytics. Since the election, some individuals have been positioned as superstars of analytics. (The full text of my interview with Mr. Westphal appears in the ArnoldIT Search Wizards Speak series at this link.)
I am not comfortable with political predictions nor the notion of superstars. I did some checking and got solid referrals to Chris Westphal, one of the founders of Visual Analytics. The company has a solid client base in the world where analytics are essential to security and risk mitigation.
Mr. Westphal was gracious with his time, and I was able to speak with him in Washington, DC and then continue our discussion via email. In the course of my conversations with Mr. Westphal, he provided a different perspective on the fast-growing analytics sector.
His company, Visual Analytics (VAI), is a privately held firm based in the Washington, DC metropolitan area. It provides proactive analytical, decision support, and information sharing solutions to commercial and government clients throughout the world for investigating money laundering, financial crimes, narcotics, terrorism, border security, embezzlement, and fraud.
He told me that his firm’s approach and its success are a result of focusing on client problems, not imposing a textbook solution on a situation. He said:
We are problem driven. One of the most important areas we have found that separates us from much of our competition is the ability to deliver actual “analytics” to our end-user clients. It is not simply running a query looking for a specific value or graphically showing the contents of a spreadsheet. Our approach is to exploit the patterns and behaviors that are hidden within the structures and volumes of the data we are processing. Our system effectively taps multiple, disparate sources to deliver one of the industry’s only federated data access platforms. We continue to focus on how to create algorithms (in a generic fashion) that can detect temporal sequences, repeating activities, commonality, high-velocity connections, pathways, and complex aggregations.
One of the keys to Visual Analytics success is the company’s distinction between analytics and monitoring. Mr. Westphal pointed out:
The world is full of very good data management systems. There are databases, crawlers, indexers, etc. Our approach is to provide a layer on top of these existing sources and provide “interface-compliant-queries” to pull out relevant content. In about 90 percent of our engagements, we take advantage of the existing infrastructure with little to no impact on the client’s information technology processes, networks, or hardware footprint. If special processing is required, we tune the data management application to best meet the structure of the data so it can be processed/queried to maximize the analytical results. One other discussion is to differentiate “analytics” from “monitoring.” Much of our capability is to expose new patterns and trends, define the parameters, and verify data structures, content, and other key factors. Once we’ve locked in on a valuable pattern, we can continue to look for the pattern or it can be recoded into another system/approach (e.g., like is typically done with inline transactional systems) for real-time detection. The hard-issue is detecting the pattern in the first place.
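Westphal’s distinction can be sketched: the expensive analytics step discovers a pattern such as a “high-velocity connection,” which can then be recoded as a cheap real-time rule. The event format, window, and threshold below are invented for illustration, not VAI’s production algorithm.

```python
def detect_high_velocity(events, window, threshold):
    """Analytics step: scan historical (timestamp, source, dest) events and
    flag source->dest pairs that repeat at least `threshold` times within
    some `window`-second span. Once flagged, a pair can be matched by a
    cheap real-time monitoring rule instead of re-running the full scan."""
    by_pair = {}
    for ts, src, dst in sorted(events):
        by_pair.setdefault((src, dst), []).append(ts)
    flagged = set()
    for pair, times in by_pair.items():
        for i, start in enumerate(times):
            # Count events falling inside [start, start + window].
            if sum(1 for t in times[i:] if t - start <= window) >= threshold:
                flagged.add(pair)
                break
    return flagged
```

The scan is the hard, exploratory part; the resulting flagged pairs are exactly the kind of locked-in pattern Westphal says can be handed to an inline transactional system for real-time detection.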
The technical approach of Visual Analytics relies on open source and proprietary systems and methods. Mr. Westphal noted:
We have a very robust data connection framework consisting of different methods for different purposes. The core “connectors” are for relational databases and are based on standard database connector protocols. Our system also has drivers to other platforms such as information retrieval systems, various enterprise systems, plus the ability to create custom web services to expand, where necessary, to handle new sources or systems (including proprietary formats – assuming there is a Web-service interface available). We also have Apache Lucene built into our application at the data layer so it can crawl and index content as needed. We try to make options available along with guidance about each approach. We offer a collection of methods to deliver the right content for meeting a wide range of client needs. We always reference “contextual analytics” which basically means providing the actual content or pointers to content for any data entity – regardless of where it resides.
The full text of the interview is available at http://goo.gl/2y6T8. After my discussions with Mr. Westphal, I remain convinced that next generation analytics is richer and more mature than some of its current applications suggest.
Stephen E Arnold, November 12, 2012