Exclusive: Interview with DataWalk’s Chief Analytics Officer Chris Westphal, Who Guides an Analytics Rocket Ship

October 21, 2020

I spoke with Chris Westphal, Chief Analytics Officer for DataWalk, about the company’s string of recent contract “wins.” These range from commercial engagements to heavy lifting for the US Department of Justice.

Chris Westphal, founder of Visual Analytics (acquired by Raytheon), brings his one-click approach to advanced analytics.

The firm provides what I have described as an intelware solution. DataWalk ingests data and outputs actionable reports. The company has leap-frogged a number of investigative solutions, including IBM’s Analyst’s Notebook and the much-hyped Palantir Technologies’ Gotham products. This interview took place in a Covid-compliant way. In my previous Chris Westphal interviews, we met at intelligence or law enforcement conferences. Now the experience is virtual, but as interesting and informative as in July 2019. In my most recent interview with Mr. Westphal, I sought to get more information on what’s causing DataWalk to make some competitors take notice of the company and its use of smart software to deliver what customers want: Results, not PowerPoint presentations and promises. We spoke on October 8, 2020.

DataWalk is an advanced analytics tool with several important innovations. On one hand, the company’s information processing system performs IBM i2 Analyst’s Notebook and Palantir Gotham type functions — just with a more sophisticated and intuitive interface. On the other hand, Westphal’s vision for advanced analytics has moved past what he accomplished with his previous venture Visual Analytics. Raytheon bought that company in 2013. Mr. Westphal has turned his attention to DataWalk. The full text of our conversation appears below.

The Westphal Interview: Autumn 2020

Thanks for taking the time to speak with me today. Why DataWalk?

Raytheon acquired my previous company, Visual Analytics, in 2013, and I initially stayed on to help with the transition. Raytheon is an exceptional company: they are extremely innovative, they help defend and protect our national security, they have incredible talent, and they’re really good at integrating and configuring technologies. However, Raytheon is not a commercial software development shop, and I wanted to get back to basics with a smaller and more agile organization to serve a broader “investigative” community, from regional police organizations to large federal agencies. I saw the DataWalk platform as the catalyst to enable this transition, and with my background, I could help guide and affect its progression into the marketplace.

Would you describe the principal technical and feature differences between Visual Analytics and DataWalk?

When I co-founded Visual Analytics back in 1998, the Apache Software Foundation did not exist. Thus, we had to innovate, design, and create some very advanced software, techniques, and systems from scratch to deliver our analytical platform (Data Clarity) into this fledgling marketplace. It was challenging, invigorating, and rewarding to see our system evolve and address some very complex environments across a wide range of agencies including the Defense Intelligence Agency, DISA, FinCEN, IRS, FBI, CIA, US Army, US Marshals Service, and the New York Police Department to name a few, plus deployments to over 40 countries while working extensively through our partnership channels. The system was very capable and feature-rich but it required some upfront training to configure and use properly.

I’ve always been very “user-focused” and have significant client-empathy to deliver better analytics through powerful visualizations, intuitive interactions, and focused outcomes. One goal I always wanted to achieve was to define a process to naturally and non-intrusively capture the tacit knowledge a user employs while analyzing data, along with a way to share this encoded knowledge with other users in a secure, reliable, and manageable fashion using methods that are transparent and easy to understand, evaluate, and audit.

DataWalk eloquently addresses this need as each click, filter, query, or selection in the graphical user interface, called the Universe Viewer, generates a breadcrumb defining the selected action. You can see examples on the DataWalk Web site. As users progress in their analyses, these breadcrumbs chain together and form workflows. With DataWalk it’s easy to go back and change a parameter, pursue a different analytical branch, or simply save the results.

Once workflows are saved, they are accessible in a user-dashboard to re-run or use as an alert to monitor for specific changes in the underlying data sets. They also form the basis for risk-scoring where each workflow is individually weighted. There can be dozens of scores created with different combinations of workflows and any single entity (for example, a person, address, transaction, account, etc.) can be part of many risk scores – which are easily aggregated into a master-risk-score, if required.

Imagine creating a library of workflows to cover the conditions for analyzing suspicious banking transactions, detecting fraudulent activities, flagging improper payments, evaluating communication records for collusion, examining log files for cyber-crimes, exposing human trafficking patterns, or monitoring sources to detect terrorist behaviors. DataWalk has achieved these capabilities using a consistent and repeatable framework to capture and democratize the domain knowledge.
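To make the weighted risk-scoring idea concrete, here is a minimal Python sketch. It is not DataWalk's implementation: the `Workflow` class, the `risk_score` and `master_score` functions, the sample rules, and the simple additive aggregation are all assumptions made for illustration. Each saved workflow is modeled as a named test with a weight, a risk score sums the weights of the workflows an entity matches, and a master score adds up the individual scores.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Workflow:
    """Hypothetical stand-in for a saved workflow: a named test with a weight."""
    name: str
    weight: float
    matches: Callable[[dict], bool]


def risk_score(entity: dict, workflows: List[Workflow]) -> float:
    """Sum the weights of every workflow the entity satisfies."""
    return sum(w.weight for w in workflows if w.matches(entity))


def master_score(entity: dict,
                 score_sets: Dict[str, List[Workflow]]) -> Tuple[Dict[str, float], float]:
    """Compute each named risk score, then aggregate them by simple addition."""
    per_score = {name: risk_score(entity, wfs) for name, wfs in score_sets.items()}
    return per_score, sum(per_score.values())


# Illustrative rules and data only; real workflows would come from analysts.
fraud = [
    Workflow("many_accounts", 2.0, lambda e: e["accounts"] > 3),
    Workflow("cash_heavy", 1.5, lambda e: e["cash_ratio"] > 0.8),
]
aml = [Workflow("offshore_link", 3.0, lambda e: e["offshore"])]

entity = {"accounts": 5, "cash_ratio": 0.5, "offshore": True}
per_score, total = master_score(entity, {"fraud": fraud, "aml": aml})
print(per_score, total)
```

In this toy data the entity matches one of the two fraud workflows and the single AML workflow, so the same entity contributes to two separate scores that roll up into one total.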

Palantir Technologies went public and as part of that process, the company revealed that it had fewer than 140 customers and was losing hundreds of millions of dollars. What differentiates DataWalk from Palantir?

My elevator speech often starts with, “We are a cost-affordable and integrator friendly alternative to Palantir…” Fundamentally, our business models are very different. Palantir tends to be a one-stop shop (all-or-nothing) delivering professional services wrapped around a core technology, whereas DataWalk directly sells software licenses designed as an easy-to-configure, user-centric platform that quickly couples with different sources and external (federated) systems.

DataWalk’s system and method for accessing processed content delivers cross-corpus results. A DataWalk user gets a view across text, numbers, and other content without having to perform separate data manipulations.

In comparing the technical features and functions, as posted by Palantir regarding their Titan release, a majority of their “innovative” features appear to reflect capabilities that already exist within DataWalk. Both are highly scalable, both support collaboration, and both support accessing external interfaces – to name a few similarities. Of course, each tool has its own strengths and weaknesses depending on what requirements are evaluated. However, there are some “real” differences between our platforms.

One major difference is that “all” data must be converted into an internal Palantir entity format (via pXML) which requires additional time, effort, and costs to reformat the data into a proprietary Palantir ontology. Extracting this content along with any derived analytics is not straightforward and I believe was one of the reasons why the New York Police Department terminated the use of Palantir, which was written up by BuzzFeed. With DataWalk, everything is done using open standards. It’s easy to get data in and easy to get data out with no transformations required. And, the client owns all their data, all their analytics, and anything else produced using the platform. Period.

Another major difference is the price and configuration. As shown in GSA Schedule-70, the cost of a single Palantir core is $141,015 versus $35,000 for DataWalk; we’re over 75 percent less expensive. See (starting on page 34) their GSA schedule to confirm the costs. Furthermore, DataWalk, also on GSA (NASA SEWP, CIO-CS, DOJ-BPA), offers clients several licensing models (core, concurrent, perpetual, and term) to best fit their budget and usage requirement. A basic system for five users costs less than $100,000 to purchase or less than $5,000 per month to lease.
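The “over 75 percent” figure follows directly from the two per-core prices quoted above. A short Python check, using only those two numbers (the function name `percent_savings` is just an illustrative label):

```python
def percent_savings(reference_price: float, alternative_price: float) -> float:
    """Fraction by which the alternative is cheaper than the reference."""
    return 1 - alternative_price / reference_price


# Per-core prices as quoted in the interview.
palantir_core = 141_015
datawalk_core = 35_000

savings = percent_savings(palantir_core, datawalk_core)
print(f"{savings:.1%}")  # prints 75.2%
```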

In past engagements, Palantir required “significant” consulting time for their forward deployed engineers (ninjas) to configure the system to meet customer needs. Many issues including transparency, inflated costs, and complex usage are discussed in a Wired article about Palantir; for instance, and I quote:

Palantir uses an opaque pricing model and does not discretely identify to customers the costs of software, hardware, equipment, and professional services… Palantir Technologies, Inc. is the only vendor available to provide support and maintenance on Palantir’s Gotham software platform… [which] is proprietary to Palantir Technologies, Inc.

Finally, many well-respected system integrators operating in the government sector do not work with Palantir, as Palantir generally does not “play-well-with-others” as stated in the Wired article – thereby limiting your choice of what vendor can provide the onsite services and support. DataWalk is a true commercial off-the-shelf platform with roadmaps, APIs, manuals, bug-fixes, release notes, training guides, and even a partner program. Plus, we work very well with both clients and integrators to get them proficient on our platform. This avoids “vendor lock-in” and saves significant resources and costs for ongoing operations and maintenance.

The market for intelware or policeware is limited and in many statements of work, the emphasis is upon a compromise of excellence and price. What’s your approach to the policeware market?

Most opportunities or requests for proposals from law enforcement look for an integrated solution to analyze data from their cases, arrests, incidents, leads, license plates, parking tickets, gangs, gun permits, accidents, jail, probation, and different records management systems (RMS). Often, they want to combine multiple technologies to incorporate search, analytics, charting, mapping, prediction, and reporting. And, they want it easy to use, easy to maintain, and easy to train.

The DataWalk interface provides visual cues for an investigative workflow and smart icons. The idea is that the system provides the functions required to deliver the information required to address a specific issue in a case.

Cost is always a top concern for law enforcement operations. As agencies transition to become more data-centric, they embrace the time-to-results and consistency obtained from using an enterprise-wide analytical capability. Affordability is a relative dimension in this marketplace and according to a National Police Foundation article:

“…the standard cost to recruit, hire, equip, and fully train a police officer from the time they submit their initial application to the time they can function independently may exceed $100,000 and take up to eighteen months.”

For less than the cost of onboarding a single officer, a system like DataWalk delivers an ROI in a much shorter time frame (days/weeks), operates 24/7, and doesn’t charge for overtime.

Here’s my main point: Our goal is not just to sell licenses – it’s to deliver operational platforms to help solve real world problems. There are approximately 18,000 police agencies within the United States and each has their own specific requirements, thus, there are no cookie-cutter deployments; one-size-does-not-fit-all. Certainly, there’s overlap among requirements, but for each deployment, we must evaluate and address their immediate needs and be adaptable to deliver on any number of future requests.

A good example of this occurred using DataWalk in a multi-jurisdictional gang task force to simplify and standardize the available data from across all the participating agencies. This group seizes a lot of mobile devices and uses digital forensics to produce some basic reports. DataWalk improved this process by creating an importer for the Cellebrite platform to ingest the content from multiple devices to cross reference calls, contacts, messages, texts, locations, and other important data.

One of the detectives asked if we could also do anything with all the images and photos contained on these devices since it was then a manual process to review. To address this need, we incorporated an API call-out to a third-party machine learning library called TensorFlow to categorize and define the content (for example, vehicles, weapons, drugs, people, and even nudity). We delivered this capability in a few days and made it available for use by all DataWalk’s clients.

Let me also add that we are looking at other markets: there are approximately 6,000 insurance companies and over 5,000 banks in the US. Although their business processes tend to be more homogeneous, they still want highly configurable systems to deliver better, faster, and more accurate results that can easily adapt to new fraud schemes and money laundering patterns.

“Cyber” is a buzzword. However, the security issues facing many organizations — insider threat, data loss, bogus data, fraud, etc. — are growing problems. Where do you fit in a landscape in which “cyber” solutions may be greeted with skepticism because existing solutions either do not work or are too complicated to work?

There’s a lot of confusion and misunderstanding in the cyber-marketplace. Depending on who you ask, it means different things including detecting irregular network traffic, suspicious user behaviors, account takeovers, external threats, content abuse, or hostile actions – plus a lot more. These are all very different problem sets. However, just like any other domain, it all comes down to the “data” and what you expect to do with it. Most stakeholders are aware there are problems but are not sure how to best address them.

For example, a government agency may routinely receive online applications for benefits, loans, refunds, or entitlements. Generally, most transactions are normal in appearance and don’t set off any warning flags. However, if different applications are submitted from the same IP address, subnet, or TOR exit node – is it suspicious? What if the IP is from a Starbucks location? Suppose they are all within a few minutes of each other? Are the email addresses similar (for example, use of dots in naming Gmail accounts)? What is the local time of the transaction – 2:00am? Do the accounts all use the same pattern to encode the password? What if the same account has multiple logins within a short time period, yet the geocoded locations of the IP addresses show they are miles apart?
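Three of the checks in the example above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how DataWalk implements anything: the function names, the record fields (`id`, `ip`, `time`, `geo`), the ten-minute window, and the 600 mph speed threshold are all hypothetical. The sketch collapses Gmail dot and plus variants to one address, groups applications that arrive from the same IP within a short window, and flags two logins whose locations are too far apart for the elapsed time.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt


def normalize_gmail(address: str) -> str:
    """Collapse Gmail dot and +tag variants, which reach the same inbox."""
    local, _, domain = address.lower().partition("@")
    if domain == "gmail.com":
        local = local.split("+", 1)[0].replace(".", "")
    return f"{local}@{domain}"


def bursts_by_ip(applications, window=timedelta(minutes=10)):
    """Return {ip: [ids]} for IPs with two or more applications inside `window`."""
    by_ip = defaultdict(list)
    for app in applications:
        by_ip[app["ip"]].append(app)
    flagged = {}
    for ip, apps in by_ip.items():
        apps.sort(key=lambda a: a["time"])
        close = set()
        # Compare each application with the next one from the same IP.
        for earlier, later in zip(apps, apps[1:]):
            if later["time"] - earlier["time"] <= window:
                close.update((earlier["id"], later["id"]))
        if close:
            flagged[ip] = sorted(close)
    return flagged


def miles_between(a, b):
    """Great-circle (haversine) distance in miles between (lat, lon) pairs."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 3958.8 * asin(sqrt(h))


def impossible_travel(first, second, max_mph=600):
    """True when two logins are farther apart than `max_mph` allows."""
    hours = abs((second["time"] - first["time"]).total_seconds()) / 3600
    return miles_between(first["geo"], second["geo"]) > max_mph * hours


# Made-up sample records using documentation-reserved IP ranges.
apps = [
    {"id": 1, "ip": "203.0.113.7", "time": datetime(2020, 10, 8, 2, 0)},
    {"id": 2, "ip": "203.0.113.7", "time": datetime(2020, 10, 8, 2, 4)},
    {"id": 3, "ip": "198.51.100.9", "time": datetime(2020, 10, 8, 14, 0)},
]
login_ny = {"time": datetime(2020, 10, 8, 9, 0), "geo": (40.71, -74.01)}
login_la = {"time": datetime(2020, 10, 8, 9, 30), "geo": (34.05, -118.24)}

print(normalize_gmail("j.o.h.n.doe@gmail.com") == normalize_gmail("john.doe@gmail.com"))
print(bursts_by_ip(apps))
print(impossible_travel(login_ny, login_la))
```

None of these signals is conclusive on its own; a shared coffee-shop IP would trip the burst check legitimately, which is why the interview's point about combining more data touch points matters.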

A DataWalk user can obtain a report about an entity with a click. The report contains pictures, documents, and videos. A click on any object reveals the underlying content, including relationship graphs or social graphs of an entity’s connections.

The cyber-domain is constantly changing and adversaries regularly update their tactics; therefore, the solution must adapt to keep pace with them. Also, systems that look at only a single part of the overall data will have inherent limitations; more data touch points deliver better resolution into the problem space.

What are you doing to reduce the amount of time required for an analyst to learn how to use your system and become productive?

Except for assembling IKEA furniture, most people don’t read the instructions. Thus, we’ve made our system inherently easy to use and invested heavily into designing simple interfaces using visualizations, dashboards, and graphics. We’ve limited the number of options to reduce the complexity and keep the interfaces intuitive. Of course, we provide online training guides and self-paced video-tutorials to introduce various features and show the users how to operate the platform. After a few hours, most users are productive.

In more targeted deployments, we can use preconfigured workflows to automatically generate results. We can set up risk-scores to quickly identify the most “suspicious” entities. We can train machine learning models to classify the data. We can even generate alerts to notify users when specific data conditions are met. There are many ways to help deliver results so the analysts can remain focused-on and effective-in their investigations.

A year ago I mentioned that DataWalk had boiled down your expertise to an icon infused with your expertise and smart software. The demands on a skilled analyst like yourself are significant. How are the smart icons and the DataWalk interface addressing the challenges of selecting the appropriate procedure for a specific situation?

Within many operations, there is a consistent and repeated level of turnover as personnel transition to new roles and make their assigned rotations. Unfortunately, when these assets move on, they also take valuable knowledge and the insights learned during their tenure. DataWalk delivers a more agile and adaptable protocol where the processes, workflows, and outcomes are captured, stored, and made available for sharing, alerting, and reporting. The smart icons (for example, saved workflows) are created by the users, on their own data, in a unique context to address a specific problem or mission need. The workflows encode the organizational knowledge, where the newest analyst can consistently run the same analyses generated by seasoned users. This builds an expanding knowledgebase of expertise that is auditable, adaptable, repeatable, and remarkably transparent in its operation. Additionally, I remain involved in a lot of opportunities and contribute by helping to define the analytical models, create data transformations, recommend new sources, and incorporate third-party functionality. I’m also working on creating knowledge libraries for specific content. Our goal is to deliver results and we’re always innovating. We want our clients to be successful and I’m part of a great support team. I’m always available. It’s my passion.

Every vendor with which I speak tells me AI, machine learning, predictive analytics, secret sauce. What are you implementing that you consider cutting edge?

One ingredient to our secret sauce is the way we made DataWalk extensible – we made it easy to expand to include other “components” to extend its functionality. Most of the add-ons (micro services) are made available through the App-Center where a published interface accesses 3rd party systems, subscription services, or various external libraries. App-Center add-ons can include Natural Language Processing (NLP) modules, AI/ML libraries, access to statistical systems like R, social media interfaces, open-source exploitation, document management systems, digital forensics, and many others. DataWalk ships with scripts available in the App Center for platforms such as Rosoka, spaCy, Whoster, WebHose.io, ShadowDragon, TensorFlow, WhoIs, Libpostal, and many, many others. It is straightforward to create scripts to extract data from systems using client-generated templates or configurations. New apps are easily created by partners, integrators, or clients to ensure the system remains extensible to meet a wide range of needs.

The DataWalk system generates a social graph. People, activities, and other information can be absorbed quickly by the investigator.

As for machine learning, the latest version incorporates third-party libraries like H2O / AutoML to support more predictive results and actions. DataWalk facilitates the creation of machine learning models using a powerful framework to inline and manage all the processing (training) and custom algorithms (client defined). DataWalk supervises the processes to ensure they are sharable, trackable, and easy to maintain thus accommodating reusability while supporting fast iteration cycles to develop new models. Models are coded and deployed in DataWalk where the results, potentially from multiple machine learning models, are available to compare, contrast, or combine. Machine learning is embedded using simple wrapper functions to provide a unified interface to a variety of machine learning algorithms, with extensive support to help explain functionality and shorten the time to results.

If you look forward 12 to 18 months, what type of innovations will your system deliver to its licensees?

We’ve got a feature-list a mile long and we prioritize much of the development based on client feedback to ensure we are addressing their immediate needs. Some of the more advanced features planned for development include advanced entity-resolution methods, entity-deconfliction, automatic schema and content matching, data quality transformations, delivery of pre-encoded workflows for specific domains and data sets, automatic classification models using machine learning, and a number of new visualizations, report formats, and output specifications. We’re also looking at novel methods for collaborating on an investigation/case with automatic markers, updates, and notifications. Plus, there are a lot more apps being added to the App Center including connections to Lexis/Nexis, Thomson Reuters, TransUnion, DarkOwl, Anno.Ai, Dataiku and many more. Our goal is to deliver a positive experience to help the user make confident and well-informed decisions.

There are calls for defunding law enforcement, intelligence, and the military in the US. What’s the outlook for policeware and intelware for investigators in government and non-governmental organizations?

The outlook is very positive. The cliché “doing more with less” is apropos for the current marketplace. Under normal circumstances you really don’t need to hire new people, you just need to refocus and refactor the available resources. Organizations already have access to a lot of data – it’s a matter of using it better, smarter, and more effectively.

For example, during the recent riots in Philadelphia, a woman torched a police car. The FBI reviewed videos of the incident uploaded to Instagram and Vimeo and saw she had a distinct tattoo on her forearm and discovered her shirt was only sold from a specific Etsy store where she had posted a review. Using the name in the online review, they identified a matching Poshmark profile and from there, found that name referenced in a LinkedIn profile which exposed her true identity and showed the same tattoo.

Thus, the outlook for platforms such as DataWalk, designed to achieve efficiencies with the data, will be well received. They’ll be quickly adopted because they are less expensive than traditional systems, they deliver better and faster results, they are easy to train, and they’re extensible to keep up with new technology advances. Net-net, don’t take a knife to a gun fight.

How can a person interested in DataWalk contact you?

Take a look at our website http://www.datawalk.com and review the write-ups and videos. Request additional materials from info @ datawalk.com or you can contact me directly at: chris.westphal @ datawalk.com

DarkCyber Observations

DataWalk’s approach provides the analytic power of industry-standard services. The firm’s interface, its approach to customers and licensees, and its commitment to providing a system which outputs understandable results set it apart. The firm’s technology has captured customers in the US Department of Justice, financial institutions, and organizations outside the US. This is an important player in the policeware and intelware sector.

Stephen E Arnold, October 21, 2020

Stephen E Arnold is the publisher of “Dark Cyber Annex” and producer of DarkCyber, a weekly video news program for law enforcement and intelligence professionals. Access these information sources at www.arnoldit.com/wordpress.
