Visual Analytics
An Interview with Christopher Westphal
Visual Analytics provides an analytical solution called Data Clarity® which can be implemented quickly and tailored to the needs of specific investigation teams within an organization. The system federates information from a variety of sources, and it provides a point-and-click solution which gets users up and running without the lengthy and complex installation and configuration processes some systems require. After listening to Chris Westphal’s compelling presentation at a recent intelligence conference, I was able to get him to agree to an interview about his firm’s approach to analytics. The full text of the interview appears below.
What information challenges does the company seek to resolve for its clients?
Visual Analytics Inc. (VAI) is a privately held company based in the Washington, DC metropolitan area. It provides proactive analytical, decision support, and information sharing solutions to commercial and government markets throughout the world for investigating money laundering, financial crimes, narcotics trafficking, terrorism, border security violations, embezzlement, and fraud.
In 1998, Visual Analytics’ founders and executive team saw a significant need within the intelligence, law enforcement, and commercial communities to find better ways to access, share, analyze, and report on their data. Today, VAI's tools are delivering unprecedented results to the analytical community and satisfying its need for better, faster, and cheaper alternatives for data analysis and investigative support.
In a nutshell, VAI develops technologies to discover patterns, trends, anomalies, and inconsistencies in data to help improve the fundamental core processes associated with organizations, businesses, and government operations. Using a data-centric framework enables our systems to elevate insight, expand knowledge, measure risk, and continually identify opportunities to implement change. VAI is focused on exploiting and harmonizing big data using a unique federated access model to deliver contextual analytics while remaining agile, adaptable, and scalable to aggregate high-quality content. VAI’s systems and interfaces, augmented with data, form the foundation for innovative breakthroughs to deliver the next generation of disruptive analytical and information sharing technologies.
When did you personally become interested in text and content processing?
Early in my career, in fact, right out of college, I worked for a large systems integrator in its Advanced Technology Division, where we focused mostly on cutting-edge computing techniques; a majority of the contracts were funded by DARPA-related research projects.
Those early work experiences exposed me to the basic core, or canonical, representations of data, and we created expert systems and artificial intelligence programs to help process and exploit the content of various data sources. From there, I worked at several other high-tech companies where I became more and more involved in how best to present and convey information through visual paradigms. Eventually, I met my business partner (David O’Connor), and we founded Visual Analytics about 15 years ago.
Visual Analytics has been investing in search, content processing, and text analysis for several years. What are the general areas of your research activities?
We are problem driven. One of the most important things that separates us from much of our competition is the ability to deliver actual “analytics” to our end-user clients. It is not simply running a query looking for a specific value or graphically showing the contents of a spreadsheet. Our approach is to exploit the patterns and behaviors that are hidden within the structures and volumes of the data we are processing. Our system effectively taps multiple, disparate sources to deliver one of the industry's only federated data access platforms.
We continue to focus on creating algorithms, in a generic fashion, that can detect temporal sequences, repeating activities, commonality, high-velocity connections, pathways, and complex aggregations.
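Westphal does not describe the internals, but the flavor of two of these generic detectors can be shown with a short, purely illustrative sketch. The code below assumes events arrive as (source, target, timestamp) tuples; the function names and thresholds are invented for the example and are not VAI's implementation.

```python
# Illustrative only: two simple pattern detectors of the kind described above.
from collections import defaultdict
from datetime import timedelta

def repeating_activity(events, tolerance=timedelta(hours=6)):
    """Flag (source, target) pairs whose events recur at roughly even intervals."""
    by_pair = defaultdict(list)
    for src, dst, ts in events:
        by_pair[(src, dst)].append(ts)
    flagged = []
    for pair, times in by_pair.items():
        times.sort()
        gaps = [b - a for a, b in zip(times, times[1:])]
        # at least three events, and the gaps between them are nearly constant
        if len(gaps) >= 2 and max(gaps) - min(gaps) <= tolerance:
            flagged.append((pair, len(times)))
    return flagged

def high_velocity(events, window=timedelta(hours=24), threshold=10):
    """Flag entities touched by more than `threshold` events inside `window`."""
    by_entity = defaultdict(list)
    for src, dst, ts in events:
        by_entity[src].append(ts)
        by_entity[dst].append(ts)
    flagged = []
    for entity, times in by_entity.items():
        times.sort()
        for i, start in enumerate(times):
            # count events falling inside the sliding window anchored at `start`
            count = sum(1 for t in times[i:] if t - start <= window)
            if count > threshold:
                flagged.append(entity)
                break
    return flagged
```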
What sets us apart is how our methods convey how processed data can be interpreted in the context of our clients’ data, problems, and requirements. So our challenge is to create ways to discover new things, and to apply them in operational environments, to help combat fraud, terrorism, and crime. As our clients’ needs change, we adapt our research engineers’ work to real-world challenges.
Many vendors argue that analytics, mash ups, and data fusion are "the way information retrieval will work going forward." I am referring to structured and unstructured information. How does Visual Analytics perceive this blend of search and user-accessible outputs?
We are already delivering these types of solutions to our client base. In fact, we’ve been doing that for quite some time. Most of the systems offered by industry vendors are very basic, simplistic, and are focused on a specific problem set. These types of systems are indeed useful in the right context, but we come at things from an entirely different perspective.
We’ve seen a distinct interest in having more “intelligence” built into the systems and offering “contextual analytics” where a 360-degree viewpoint can be expressed on any data element.
Visual Analytics can deliver what we call a “fusion” capability to help access, organize, process, present, and report on data. There is still a lot of structured content in our marketplace that has not been exploited to its fullest, and the unstructured content is evolving quickly to offer standard features such as entity and relationship extraction, salient phrase detection, categorization, language identification, polarity/sentiment values, and tokenization.
We address some difficult problem areas and work with best-of-breed technology providers (for example, IMT/Rosoka, among others) when additional features and functions are required.
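To make the unstructured-content features listed above (tokenization, entity extraction) concrete, here is a small illustration using the open-source spaCy library. spaCy is used only as a stand-in to show the concepts; the interview names IMT/Rosoka, not spaCy, as VAI's extraction partner.

```python
# Illustration of tokenization and entity extraction, not a VAI component.
import spacy

nlp = spacy.load("en_core_web_sm")  # small English model, installed separately

text = "Acme Holdings wired $2 million to a shell company in Zurich on March 3."
doc = nlp(text)

tokens = [token.text for token in doc]                    # tokenization
entities = [(ent.text, ent.label_) for ent in doc.ents]   # entity extraction

print(tokens)
print(entities)  # e.g. [('Acme Holdings', 'ORG'), ('$2 million', 'MONEY'), ...]
```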
Without divulging your firm's methods or clients, will you characterize a typical use case for your firm's search and retrieval capabilities?
Sure, a good example is what we did for an East Coast law enforcement agency. We deployed a program designed to support the information sharing and analytical needs of different county-level Crime Analysis Centers (CACs). Beginning with the five highest-crime counties, our Data Clarity system was configured to meet the individual needs of each CAC. This information sharing project has expanded its footprint across the state and is still incorporating as many organizations as possible while broadening the scope of the data available through existing deployments. The effort makes the state one of the first to fully deploy a federated data sharing network that is scalable, adaptable, and cost effective. The project has also received multiple awards from the state as one of its best information technology implementation programs.
To sum up, we listened. We tailored the solution. We deployed quickly and without writing custom code.
What are the advantages a client experiences when working with your firm?
Based on the feedback I get from our clients, the advantages our system and method delivers vary depending on the context and opportunity.
Can you highlight several for me?
Yes. First, we use non-proprietary formats and processes. Some of the firms we bump up against in the market often require that data be reprocessed or converted into a specific form, a proprietary format, or consolidated into one big centralized warehouse. Unfortunately, this locks the clients into a closed way of doing something; it is not very flexible, and it is very expensive to operate and maintain. Our systems are “open” and use industry standards to access (input/output) data, so there is very little impact on the existing environment.
Second, our technologies are scalable. Our Data Clarity architecture is designed to provide an extremely scalable framework where large numbers of users and data sources are effectively managed. We are a pioneer in federated search technologies, data harmonization, disambiguation, and visualization. We operate at the level our customers expect and require.
Third, our system has low administrative costs. All interactions with our system are through point-and-click interfaces. There is no programming required. We have easily understood and easily accessible parameters and settings. Our interfaces allow our clients to customize their experiences and interactions without having to create special code or one-off deployments. In fact, the return on investment for our clients is almost immediate. Our core license prices are also aggressively priced compared to other less-featured or less capable systems in the marketplace.
What are the benefits to an organization working with your firm?
We are known for great customer service, best-of-breed technology, fairly priced solutions, and the breadth of value-added functions we provide.
VAI has a significant amount of experience and industry know-how. We understand data very well. We know how to exploit, extract, and expose patterns. We can find patterns that elude others. As a company, we are very responsive and can deliver results faster than most clients can respond. I know our support and engineering teams are second to none.
In my experience, our technology is the best, most comprehensive, and affordable in the market. I believe that VAI’s technology represents the gold standard for unparalleled, enterprise-level analytics. If an organization requires the ability to integrate, access, and analyze any size and type of internal, shared, and/or public data stores—our system delivers.
Our tools are delivering unprecedented results to the analytical community and satisfying the need for better and faster technologies for data analysis and investigative support around the globe.
How does an information retrieval engagement with your firm move through its life cycle?
I don’t want to sound repetitive, but the specific implementation of VAI for a client depends on the client’s needs, requirements, and infrastructure.
Our technical approach often allows us to address their “immediate” needs to show high-value targets within their data sources quickly.
We typically employ an iterative model to show results, add more value, show results, and continue.
For many engagements we will bring on new data sources to help supplement the client’s core data or information.
Can you give me an example?
Sure, we now offer access to a wide range of public data. We have about 10 billion records available on a subscription basis. These data can be used to help deliver entity authentication. We’ve created a very agile and adaptable method to incorporate different sources of data to help expose better and more valuable patterns.
One challenge to those involved with squeezing useful elements from large volumes of content is the volume of content and the rate of change in existing content objects. What does your firm provide to customers to help them deal with these types of problems?
Right. This is a good question. As you know, the solution depends on the client’s operating environment. By design (as endorsed as an Office of the Director of National Intelligence standard) the data sources should be separate from the applications.
The world is full of very good data management systems. There are databases, crawlers, indexers, etc. Our approach is to provide a layer on top of these existing sources and provide “interface-compliant-queries” to pull out relevant content.
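A minimal sketch of that federated-layer idea follows: a thin layer fans the same query out to heterogeneous sources and merges the hits, leaving the data where it lives. The connector classes, table names, and endpoint paths below are hypothetical and are not the Data Clarity interface; the SQL source assumes a DB-API/sqlite3-style connection and the web source assumes a requests-style session.

```python
# Illustrative federated fan-out over existing sources, not VAI code.
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, Protocol

class Connector(Protocol):
    def search(self, term: str) -> Iterable[dict]: ...

class SqlConnector:
    """Wraps an existing relational source (DB-API style connection)."""
    def __init__(self, conn):
        self.conn = conn
    def search(self, term: str) -> Iterable[dict]:
        cur = self.conn.execute(
            "SELECT name, account FROM subjects WHERE name LIKE ?", (f"%{term}%",))
        return [{"source": "case_db", "name": n, "account": a} for n, a in cur]

class RestConnector:
    """Wraps an existing web-service source (requests-style session)."""
    def __init__(self, session, base_url):
        self.session, self.base_url = session, base_url
    def search(self, term: str) -> Iterable[dict]:
        resp = self.session.get(f"{self.base_url}/search", params={"q": term})
        resp.raise_for_status()
        return resp.json().get("results", [])

def federated_search(term: str, connectors: list) -> list:
    """Run the same query against every source in parallel and merge the hits."""
    with ThreadPoolExecutor() as pool:
        result_sets = pool.map(lambda c: list(c.search(term)), connectors)
    return [hit for hits in result_sets for hit in hits]
```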
In about 90 percent of our engagements, we take advantage of the existing infrastructure with little to no impact on the client’s information technology processes, networks, or hardware footprint.
If special processing is required, we tune the data management application to best meet the structure of the data so it can be processed/queried to maximize the analytical results.
One other point worth making is the difference between “analytics” and “monitoring.” Much of our capability is to expose new patterns and trends, define the parameters, and verify data structures, content, and other key factors.
Once we’ve locked in on a valuable pattern, we can continue to look for the pattern, or it can be recoded into another system or approach (as is typically done with inline transactional systems) for real-time detection. The hard issue is detecting the pattern in the first place.
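As a rough sketch of that hand-off, the high-velocity idea from the earlier example can be re-expressed as an inline rule that examines each event as it arrives. The thresholds and event shape are illustrative assumptions, not a VAI component.

```python
# Illustrative: a discovered batch pattern recoded as a streaming check.
from collections import defaultdict, deque
from datetime import timedelta

class VelocityRule:
    """Alert when an entity appears in more than `threshold` events per window."""
    def __init__(self, window=timedelta(hours=24), threshold=10):
        self.window, self.threshold = window, threshold
        self.recent = defaultdict(deque)   # entity -> timestamps inside the window

    def on_event(self, entity, ts):
        times = self.recent[entity]
        times.append(ts)
        # drop timestamps that have fallen out of the sliding window
        while times and ts - times[0] > self.window:
            times.popleft()
        if len(times) > self.threshold:
            return f"ALERT: {entity} exceeded {self.threshold} events in {self.window}"
        return None
```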
Another challenge, particularly in commercial competitive intelligence operations, is moving data from point A to point B; that is, information enters a system but it must be made available to an individual who needs that information or at least must know about the information. What does your firm offer licensees to address this issue of content "push", report generation, and personalization within a work flow?
One of the primary strengths of our offering is the separation of the data from the application. Thus, we are able to create new workflows, detect and alert on various content, and deploy, distribute, and route the information accordingly. More recently, we have completed the development of our REST application programming interface. This approach shifts our focus from delivering an entire end-to-end system to delivering more customized solutions.
As a result, the heavy lifting of data federation, harmonization, security, resolution, and result generation is still handled by our core middle-tier services. However, any type of interface can be created and layered on top of these services.
“Custom apps” are now being created to address different client needs, including simplified displays, geospatial interfaces, and various content presentations.
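A hedged sketch of what such a custom app might look like appears below. The base URL, endpoint paths, parameters, and JSON shape are invented for illustration; VAI's actual REST API is not documented in this interview.

```python
# Hypothetical thin client over middle-tier services; endpoint is invented.
import requests

BASE = "https://dataclarity.example.org/api"   # hypothetical deployment URL

def resolve_entity(session: requests.Session, name: str) -> dict:
    """Ask the middle tier to federate, harmonize, and resolve an entity."""
    resp = session.get(f"{BASE}/entities", params={"name": name})
    resp.raise_for_status()
    return resp.json()

def render_simple_view(entity: dict) -> str:
    """Deliberately thin presentation layer: the heavy lifting stays server-side."""
    links = entity.get("links", [])
    lines = [f"{entity.get('name', '?')} ({len(links)} linked records)"]
    lines += [f"  - {link.get('type')}: {link.get('target')}" for link in links]
    return "\n".join(lines)

if __name__ == "__main__":
    with requests.Session() as s:
        print(render_simple_view(resolve_entity(s, "Acme Holdings")))
```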
Are there any restraints?
Not really. The sky is the limit. We continue to get extremely positive feedback on these new interfaces. Another benefit of our approach is that partners and licensees, as well as other third parties, are able to create their own personalized applications that behave any way they see fit.
There has been a surge in interest in putting "everything" in a repository and then manipulating the indexes to the information in the repository. On the surface, this seems to be gaining traction because network resident information can "disappear" or become unavailable. What's your view of the repository versus non-repository approach to content processing?
Another good question. My belief is that the optimal approach should be a hybrid. Licensees can acquire content that can’t be recreated or is at risk of being lost and pull that into a repository. Other information and data can be accessed via a federated model.
There are a number of reasons for this: cost, scalability, security, control, audit, ability to share, hardware, operations, etc. VAI has been creating a data mart of content for use in entity authentication. I mentioned that we have 10 billion records, and our collection is growing. As with everything we do, there are interfaces to this content so it can be used “inline” with other analytical needs, products, and content.
Visualization has been a great addition to briefings. On the other hand, visualization and other graphic eye candy can be a problem to those in stressful operational situations. What's your firm's approach to presenting visual "outputs"?
As you know, visualization is core to our existence and part of our entire technology framework. And, as we discussed, the recent deployment of our REST APIs has opened new worlds for us to pursue. These include mobile access and mobile apps.
Customized interfaces can be created without affecting the core-server/services – which allows us to maintain a single code base. The interfaces can be built with specific domain knowledge, processes, and workflows – thereby making the conveyance of info to the end user as detailed and as precise as needed. So, using different displays or visualizations to handle different scenarios or situations is now a reality – and moving forward quickly.
I am on the fence about the merging of retrieval within other applications. What's your take on the "new" method which some people describe as "search enabled applications"?
We have a very robust data connection framework consisting of different methods for different purposes. The core “connectors” are for relational databases and are based on standard database connector protocols.
Our system also has drivers for other platforms such as information retrieval systems and various enterprise systems, plus the ability to create custom web services to expand, where necessary, to handle new sources or systems (including proprietary formats, assuming there is a Web-service interface available).
We also have Apache Lucene built into our application at the data-layer so it can crawl and index content as needed. We try to make options available along with guidance about each approach. We offer a collection of methods to deliver the right-content for meeting a wide range of client needs. We always reference “contextual analytics” which basically means providing the actual content or pointers to content for any data entity – regardless of where it resides.
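As a rough illustration of what a full-text index at the data layer buys (this is the idea Lucene implements, not Lucene's actual API), a toy inverted index looks like the sketch below: documents are tokenized once at ingest, and queries become cheap set lookups instead of scans over raw content.

```python
# Toy inverted index; illustrates the concept only, not Lucene itself.
import re
from collections import defaultdict

class ToyIndex:
    def __init__(self):
        self.postings = defaultdict(set)   # term -> set of document ids
        self.docs = {}

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return doc ids containing every query term (simple AND semantics)."""
        terms = re.findall(r"[a-z0-9]+", query.lower())
        if not terms:
            return set()
        return set.intersection(*(self.postings.get(t, set()) for t in terms))

idx = ToyIndex()
idx.add("r1", "Wire transfer from Acme Holdings to Zurich account")
idx.add("r2", "Acme Holdings registered new subsidiary in Delaware")
print(idx.search("acme zurich"))   # {'r1'}
```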
There seems to be a popular perception that the world will be doing computing via iPad devices and mobile phones. My concern is that serious computing infrastructures are needed and that users are "cut off" from access to more robust systems. How does your firm see the computing world over the next 12 to 18 months?
Certainly the computing power of an iPad is getting better and better. However, the core value is in the network on which it operates. Thus, server-side control is still important, and will be for quite some time. A lot of new “crowd sourcing” apps are being introduced which provide a functional interface for the end users, but the heavy computing is handled by the server-side components. We are generally following this trend with our REST APIs and data mart capabilities. We also implement in-line cloud computing, which will become more ubiquitous as the networks get faster. Our approach is seamless, simple, and quick.
Put on your wizard hat. What are the three most significant trends that you see affecting your business?
Tough question.
First, there is what I call smart computing. The idea is that a system will deliver better results that are more specific or pertinent to my needs. We are addressing this via our REST APIs and offering an “app store”-like concept for analytics. We expect this to become a common approach across the analytical community.
Second, faster computing. Under this umbrella I group CPU processing along with network and access times: getting to the information you need faster. We are being smart about how we configure our systems to ensure they can be accessed and activated with minimal overhead and deliver results as quickly as possible.
Third, new security trends. There are new cyber attacks and clever attempts to compromise systems. New methods will be needed to secure our clients and their data. We are working on techniques to authenticate against systems, and we ensure our system security models remain flexible. What I mean is that our security models can be updated or changed as needed to keep pace with new challenges and protocols.
Where does a reader get more information about your firm?
We are based in Old Town Frederick, Maryland, which is about 45 minutes from downtown Washington, DC. We have detailed information on our Web site, http://www.visualanalytics.com.
ArnoldIT Comment
At the recent ISS World Conference in Washington, DC, presentations by Visual Analytics professionals were popular. The company’s open approach and its focus on easy-to-use, high-value solutions which work in “the real world” have resonated with government and commercial clients. If you are seeking an analytics solution which does not require that you rip and replace existing systems, we suggest you take a close look at Visual Analytics.
Stephen E. Arnold, November 6, 2012