Palantir and Sompo: Is a $150 Million Deal Big Enough, Too Small, or Just Right?

November 19, 2019

Palantir Technologies has ingested about $2 billion in a couple of dozen investment rounds. Now a $150 million deal is very important to a services firm with a few million in sales. To an outfit like Booz Allen or Deloitte, $150 million means a partner will keep her job and a handful of MBAs will be making regular flights to wonderful Narita.

“Thiel Marks Palantir’s Asia Push with $150 Million Japan Venture” reports that Sompo Holdings is now Palantir’s partner, noting that the $150 million may be more of an investment. We noted this passage:

The billionaire entrepreneur [Peter Thiel] was in Japan Monday to unveil a $150 million, 50-50 joint venture with local financial services firm Sompo Holdings Inc., Palantir Technologies Japan Co. The new company will target government and public sector customers, emphasizing health and cybersecurity initially. Like IBM Corp. and other providers, Palantir’s software pulls together a range of data provided by its customers, mining it for patterns and displaying connections in easy-to-read spider web-like graphics that might otherwise get overlooked.

Bloomberg reported:

Palantir is very close to breaking even and will end 2019 either slightly in the black or slightly in the red, Thiel said at the briefing. The company will be “significantly in the black” next year, he added.

A few comments from the DarkCyber team:

  • The money in the headline is not explained in much detail. There is a difference between setting up a new company and landing a cash deal.
  • Bloomberg seems indifferent to the revenue challenge Palantir faces; namely, there are quite a few investors and stakeholders who want their money plus interest. The announcement may not put these individuals’ minds at ease.
  • The news story does not mention that new, more agile companies are introducing solutions which make both IBM Analysts Notebook and Gotham look a bit like Vinnie Testaverde or Bart Starr throwing passes at a barbeque.

Singapore is the location of choice for some of the more agile intelware and policeware vendors. Is Japan a bit 2003?

To sum up, to some Palantir is a startup. To others, Palantir is an example of a company that may lose out to upstarts which offer a more intuitive user interface and slicker data analytics. It is possible that an outfit like Amazon and its whiz-bang data marketplace could deliver a painful blow to a firm which opened for business in 2003. That’s more than 15 years ago. But next year? Palantir will be profitable.

Stephen E Arnold, November 19, 2019

Simple English Is The Basis For Complex Data Visualizations

November 7, 2019

Computers started to gain a foothold in modern society during the 1980s. By today’s standards, the physical size and the amount of data old-school computers could process are laughable. Tech Explore reports on how spoken English can now create complex, data-rich visualizations, something that was only imaginable in the 1980s, in the article, “Study Leads To A System That Lets People Use Simple English To Create Complex Machine Learning-Driven Visualizations.”

Today’s technology collects terabytes of diverse information from traffic patterns, weather patterns, disease outbreaks, animal migrations, financial trends, and human behavior models. The problem is that the people who could benefit from this data do not know how to make visualization models.

Professor Claudio Silva led a team at New York University Tandon School of Engineering’s Visualization and Data Analytics (VIDA) lab that developed VisFlow, a framework that allows non-experts to create flexible, graphics-rich data visualization models. These models are also easy to edit with an extension called FlowSense, which allows users to edit and synthesize data exploration pipelines through a natural language interface.
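FlowSense’s actual grammar and pipeline model are not detailed here, so the following is only an illustrative sketch of the general idea: a constrained English command is parsed into dataflow nodes, which are then executed against tabular data. The command syntax, node tuples, and column names are all invented for this example.

```python
import re

def parse_command(command):
    """Turn 'filter rows where price > 10 then plot price' into a node list.
    Toy grammar only -- not FlowSense's real parser."""
    nodes = []
    for clause in command.split(" then "):
        m = re.match(r"filter rows where (\w+) ([<>]=?) (\d+)", clause)
        if m:
            nodes.append(("filter", m.group(1), m.group(2), int(m.group(3))))
            continue
        m = re.match(r"plot (\w+)", clause)
        if m:
            nodes.append(("plot", m.group(1)))
    return nodes

def run_pipeline(nodes, rows):
    """Execute the filter nodes against a list of dicts; return surviving rows."""
    ops = {">": lambda a, b: a > b, "<": lambda a, b: a < b,
           ">=": lambda a, b: a >= b, "<=": lambda a, b: a <= b}
    for node in nodes:
        if node[0] == "filter":
            _, col, op, value = node
            rows = [r for r in rows if ops[op](r[col], value)]
    return rows

rows = [{"price": 5}, {"price": 12}, {"price": 20}]
nodes = parse_command("filter rows where price > 10 then plot price")
print(nodes)                      # [('filter', 'price', '>', 10), ('plot', 'price')]
print(run_pipeline(nodes, rows))  # [{'price': 12}, {'price': 20}]
```

The point of the sketch is the separation FlowSense exploits: natural language is compiled into a pipeline representation first, and only the pipeline touches the data.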

VIDA is one of the leading research centers on data visualization, and FlowSense is already being used in astronomy, medicine, and climate research:

  • “OpenSpace, a System for Astrographics” is being used worldwide in planetariums, museums, and other contexts to explore the solar system and universe.

  • “Motion Browser: Visualizing and Understanding Complex Upper Limb Movement under Obstetrical Brachial Plexus Injuries” is a collaboration between computer scientists, orthopedic surgeons, and rehabilitation physicians that could lead to new treatments for brachial nerve injuries and hypotheses for future research.

  • “The Effect of Color Scales on Climate Scientists’ Objective and Subjective Performance in Spatial Data Analysis Tasks” is a web-based user study that takes a close look at the efficacy of the widely used practice of superimposing color scales on geographic maps.

FlowSense and VisFlow are open source frameworks available on GitHub, and programmers are welcome to experiment with them. These applications allow non-data experts to manipulate data for their fields, take advantage of technology, and augment their current work.

Whitney Grace, November 7, 2019

False News: Are Smart Bots the Answer?

November 7, 2019

To us, this comes as no surprise—Axios reports, “Machine Learning Can’t Flag False News, New Studies Show.” Writer Joe Uchill concisely summarizes some recent studies out of MIT that should quell any hope that machine learning will save us from fake news, at least any time soon. Though we have seen that AI can be great at generating readable articles from a few bits of info, mimicking human writers, and even detecting AI-generated stories, that does not mean they can tell the true from the false. These studies were performed by MIT doctoral student Tal Schuster and his team of researchers. Uchill writes:

“Many automated fact-checking systems are trained using a database of true statements called Fact Extraction and Verification (FEVER). In one study, Schuster and team showed that machine learning-taught fact-checking systems struggled to handle negative statements (‘Greg never said his car wasn’t blue’) even when they would know the positive statement was true (‘Greg says his car is blue’). The problem, say the researchers, is that the database is filled with human bias. The people who created FEVER tended to write their false entries as negative statements and their true statements as positive statements — so the computers learned to rate sentences with negative statements as false. That means the systems were solving a much easier problem than detecting fake news. ‘If you create for yourself an easy target, you can win at that target,’ said MIT professor Regina Barzilay. ‘But it still doesn’t bring you any closer to separating fake news from real news.’”

Indeed. Another of Schuster’s studies demonstrates that algorithms can usually detect text written by their kin. We’re reminded, however, that just because an article is machine written does not in itself mean it is false. In fact, he notes, text bots are now being used to adapt legit stories to different audiences or to generate articles from statistics. It looks like we will just have to keep verifying articles with multiple trusted sources before we believe them. Imagine that.
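The spurious-cue failure the MIT researchers describe can be reproduced in a toy experiment. The mini training set below is invented, and the “model” is just per-word label counts, but it shows the mechanism: when annotators write false claims as negations, a naive learner treats negation words as evidence of falsity and never checks any facts.

```python
from collections import Counter, defaultdict

# Biased toy training data: false claims happen to be phrased as negations,
# mirroring the FEVER annotation pattern described in the article.
train = [
    ("greg says his car is blue", "SUPPORTS"),
    ("the bridge opened in 1932", "SUPPORTS"),
    ("greg never said his car was blue", "REFUTES"),
    ("the bridge did not open in 1932", "REFUTES"),
]

word_label = defaultdict(Counter)
for sentence, label in train:
    for word in sentence.split():
        word_label[word][label] += 1

def predict(sentence):
    """Score each label by summing per-word counts. No facts involved."""
    scores = Counter()
    for word in sentence.split():
        scores.update(word_label.get(word, Counter()))
    return scores.most_common(1)[0][0]

# A statement that merely contains "never" gets tagged REFUTES regardless
# of its truth: the model latched onto the negation cue.
print(predict("greg never lies"))  # REFUTES
```

This is exactly Barzilay’s point: the system wins at the easy surrogate target (spot the negation) without getting any closer to separating fake news from real news.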

Cynthia Murrell, November 7, 2019

Visual Data Exploration via Natural Language

November 4, 2019

New York University announced a natural language interface for data visualization. You can read the rah rah from the university here. The main idea is that a person can use simple English to create complex machine learning based visualizations. Sounds like the answer to a Wall Street analyst’s prayers.

The university reported:

A team at the NYU Tandon School of Engineering’s Visualization and Data Analytics (VIDA) lab, led by Claudio Silva, professor in the department of computer science and engineering, developed a framework called VisFlow, by which those who may not be experts in machine learning can create highly flexible data visualizations from almost any data. Furthermore, the team made it easier and more intuitive to edit these models by developing an extension of VisFlow called FlowSense, which allows users to synthesize data exploration pipelines through a natural language interface.

You can download (as of November 3, 2019, but no promises the document will be online after this date) “FlowSense: A Natural Language Interface for Visual Data Exploration within a Dataflow System.”

DarkCyber wants to point out that talking to a computer to get information continues to be of interest to many researchers. Will this innovation put human analysts out of their jobs?

Maybe not tomorrow, but in the future? Absolutely. And what will those newly unemployed people do for money?

Interesting question and one some may find difficult to consider at this time.

Stephen E Arnold, November 4, 2019


Deepfake Detection: Unsolvable

November 3, 2019

“CEO of Anti-Deepfake Software Says His Job Is Ultimately a Losing Battle” describes what may be an unsolvable problem. Manipulated content may be in the category of the Millennium Prize Problems, just more complicated. The slightly gloomy write up quotes the founder of Amber Video (Shamai Allibhai):

“Ultimately I think it’s a losing battle. The whole nature of this technology is built as an adversarial network where one tries to create a fake and the other tries to detect a fake. The core component is trying to get machine learning to improve all the time…Ultimately it will circumvent detection tools.”

The newspaper publishing this observation did not include Jorge Luis Borges’ observation made in the Paris Review in 1967:

Really, nobody knows whether the world is realistic or fantastic, that is to say, whether the world is a natural process or whether it is a kind of dream, a dream that we may or may not share with others.

But venture funding makes the impossible appear to be possible until it is not.

Stephen E Arnold, November 3, 2019

Cyberbully Algorithm: Will It Work?

November 1, 2019

Given the paradoxes of human expression, teaching algorithms to identify harmful speech on social media has proven a difficult task. One group of researchers, though, has made a breakthrough—EurekAlert declares, “New Algorithms Can Distinguish Cyberbullies from Normal Twitter Users with 90% Accuracy.” The news release explains:

“Effective tools for detecting harmful actions on social media are scarce, as this type of behavior is often ambiguous in nature and/or exhibited via seemingly superficial comments and criticisms. Aiming to address this gap, a research team featuring Binghamton University computer scientist Jeremy Blackburn analyzed the behavioral patterns exhibited by abusive Twitter users and their differences from other Twitter users. ‘We built crawlers — programs that collect data from Twitter via variety of mechanisms,’ said Blackburn. ‘We gathered tweets of Twitter users, their profiles, as well as (social) network-related things, like who they follow and who follows them.’ The researchers then performed natural language processing and sentiment analysis on the tweets themselves, as well as a variety of social network analyses on the connections between users. The researchers developed algorithms to automatically classify two specific types of offensive online behavior, i.e., cyberbullying and cyberaggression. The algorithms were able to identify abusive users on Twitter with 90 percent accuracy. These are users who engage in harassing behavior, e.g. those who send death threats or make racist remarks to users.”

Of course, 90 percent accuracy means 10 percent slips through, so we still have a way to go. Also, for a bully to be detected, they have to have already acted badly, and no algorithm can undo that damage. Blackburn says his team is working on “pro-active mitigation techniques” that could help. I am curious to see what that will look like. Stay tuned.
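The pipeline the release describes combines two kinds of signal: language features extracted from the tweets and social-network features from the follow graph. The sketch below shows that combination in miniature. The abusive-word lexicon, the follower heuristic, and the scoring threshold are all invented for illustration; the actual study trained classifiers on much richer NLP, sentiment, and graph features.

```python
# Hypothetical lexicon standing in for the study's NLP/sentiment analysis.
ABUSIVE_WORDS = {"idiot", "loser", "die"}

def user_features(tweets, followers, following):
    """Reduce a user's tweets and follow counts to two toy features."""
    abusive = sum(
        any(w in tweet.lower().split() for w in ABUSIVE_WORDS)
        for tweet in tweets
    )
    # Invented network heuristic: abusive accounts often follow many
    # accounts while attracting few followers.
    ratio = followers / max(following, 1)
    return {"abusive_tweet_share": abusive / max(len(tweets), 1),
            "follower_ratio": ratio}

def classify(features):
    """Hand-tuned linear score; a real system would learn these weights."""
    score = features["abusive_tweet_share"] * 2 - min(features["follower_ratio"], 1)
    return "abusive" if score > 0.5 else "normal"

feats = user_features(["you are an idiot", "just die", "nice day"],
                      followers=10, following=500)
print(classify(feats))  # abusive
```

Even in this caricature the design choice is visible: text features alone flag individual messages, while network features help separate persistent harassers from someone having a bad day.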

Cynthia Murrell, November 1, 2019

Gender Bias in Old Books. Rewrite Them?

October 9, 2019

Here is an interesting use of machine learning. Salon tells us “What Reading 3.5 Million Books Tells Us About Gender Stereotypes.” Researchers led by University of Copenhagen’s Dr. Isabelle Augenstein analyzed 11 billion English words in literature published between 1900 and 2008. Not surprisingly, the results show that adjectives about appearance were most often applied to women (“beautiful” and “sexy” top the list), while men were more likely to be described by character traits (“righteous,” “rational,” and “brave” were most frequent). Writer Nicole Karlis describes how the team approached the analysis:

“Using machine learning, the researchers extracted adjectives and verbs connected to gender-specific nouns, like ‘daughter.’ Then the researchers analyzed whether the words had a positive, negative or neutral point of view. The analysis determined that negative verbs associated with appearance are used five times more for women than men. Likewise, positive and neutral adjectives relating to one’s body appearance occur twice as often in descriptions of women. The adjectives used to describe men in literature are more frequently ones that describe behavior and personal qualities.

“Researchers noted that, despite the fact that many of the analyzed books were published decades ago, they still play an active role in fomenting gender discrimination, particularly when it comes to machine learning sorting in a professional setting. ‘The algorithms work to identify patterns, and whenever one is observed, it is perceived that something is “true.” If any of these patterns refer to biased language, the result will also be biased,’ Augenstein said. ‘The systems adopt, so to speak, the language that we people use, and thus, our gender stereotypes and prejudices.’” Augenstein explained this can be problematic if, for example, machine learning is used to sift through employee recommendations for a promotion.”
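The counting step the quoted passage describes, pairing adjectives with the gender-specific nouns they modify and tallying them per gender, can be sketched as follows. The study parsed 11 billion words with machine learning; here the parse is faked with hand-made (adjective, noun) pairs, which in a real system would come from a dependency parser’s adjective-modifier edges. The word lists and pairs are invented for illustration.

```python
from collections import Counter

FEMALE_NOUNS = {"daughter", "woman", "mother"}
MALE_NOUNS = {"son", "man", "father"}

# Stand-in for parser output: each tuple is (adjective, noun it modifies).
pairs = [
    ("beautiful", "daughter"), ("sexy", "woman"), ("gorgeous", "mother"),
    ("brave", "son"), ("rational", "man"), ("beautiful", "woman"),
]

by_gender = {"female": Counter(), "male": Counter()}
for adj, noun in pairs:
    if noun in FEMALE_NOUNS:
        by_gender["female"][adj] += 1
    elif noun in MALE_NOUNS:
        by_gender["male"][adj] += 1

print(by_gender["female"].most_common(1))  # [('beautiful', 2)]
print(by_gender["male"].most_common(2))
```

At scale, skew in these tallies (appearance adjectives clustering on female nouns, behavior adjectives on male nouns) is precisely the pattern the researchers report, and the same counts are what a downstream ML system would absorb as “truth.”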

Karlis does list some caveats to the study—it does not factor in who wrote the passages, what genre they were pulled from, or how much gender bias permeated society at the time. The research does affirm previous results, like the 2011 study that found 57% of central characters in children’s books are male.

Dr. Augenstein hopes her team’s analysis will raise awareness about the impact of gendered language and stereotypes on machine learning. If they choose, developers can train their algorithms on less biased materials or program them to either ignore or correct for biased language.

Cynthia Murrell, October 9, 2019

Memos: Mac Search Tool for Images

October 3, 2019

You need a Mac. You need photos with text. You need Memos. (Apple account may be required to snag the software.) The software identifies text in images and extracts it. Enter a query, and the software displays the source image.

You can take a pic of text, and Memos will OCR it. You can then search across images for in-photo text.
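Memos’ internals are not public, so this is only a sketch of the general pattern such apps follow: OCR each image once, build an inverted index from words to image files, and answer queries from the index. The OCR output below is faked; a real app would call an OCR engine.

```python
from collections import defaultdict

# Stand-in for OCR results: image file -> text an engine might extract.
ocr_output = {
    "receipt.jpg": "Total 4.99 Coffee Shop",
    "whiteboard.jpg": "project deadline friday",
    "sign.jpg": "coffee sold here",
}

# Inverted index: word -> set of images containing it.
index = defaultdict(set)
for image, text in ocr_output.items():
    for word in text.lower().split():
        index[word].add(image)

def search(query):
    """Return images whose OCR'd text contains every query word."""
    sets = [index.get(w, set()) for w in query.lower().split()]
    return sorted(set.intersection(*sets)) if sets else []

print(search("coffee"))  # ['receipt.jpg', 'sign.jpg']
```

Indexing once at import time is what makes per-query lookups fast; the OCR pass, not the search, is the expensive step.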

Cost? About $5.

The app is on the App Store; use the link on the Memos Web site. When we searched the App Store directly, Memos was not findable.

Does it work? Still some bugs if the user comments are on point.

Stephen E Arnold, October 3, 2019

Palantir Technologies: Fund Raising Signal

September 6, 2019

Palantir Technologies offers products and services which serve analysts and investigators. The company was founded in 2003, and it gained some traction in a number of US government agencies. The last time I checked for Palantir’s total funding, my recollection is that the firm has ingested about $2 billion from a couple dozen funding rounds. If you subscribe to Crunchbase, you can view that service’s funding roundup. An outfit known as Growjo reports that Palantir has 2,262 employees. That works out to a cash intake of $884,173 per employee. Palantir is a secretive outfit, so who knows about the funding, the revenue, the profits or losses, and the number of full time equivalents, contractors, etc. But Palantir is one of the highest profile companies in the law enforcement, regulatory, and intelligence sectors.

I read “Palantir to Seek Funding on Private Market, Delay IPO” and noted this statement:

The company has never turned an annual profit.

Bloomberg points out that customization of the system is expensive. Automation is a priority. Sales cycles are lengthy. And some stakeholders and investors are critical of the company.

Understandable. After 16 years and allegedly zero profits, annoyance is likely to surface in the NYAC after an intense game of squash.

But I am not interested in Palantir. The information about Palantir strikes me as germane to the dozens upon dozens of Palantir competitors. Consider these questions:

  1. Intelligence, like enterprise search, requires software and services that meet the needs of users who have quite particular work processes. Why pay lots of money to customize something that will have to be changed when a surprise event tips over established procedures? Roll your own? Look for the lowest cost solution?
  2. With so many competitors, how will government agencies be able to invest in a wide range of solutions? Why not seek a single source solution and find ways to escape the costs of procuring, acquiring, tuning, training, and changing systems? If Palantir were the home run, why haven’t Palantir customers convinced their peers and superiors to back one solution? That hasn’t happened, which makes an interesting statement in itself. Why isn’t Palantir the US government-wide solution the way Oracle was a few years ago?
  3. Are the systems outputting useful, actionable information? Users of these systems who give talks at LE and intel conferences are generally quite positive. But the reality is that cyber problems remain and have not been inhibited by Palantir and similar tools or by the raft of cyber intelligence innovations from companies in the UK, Germany, Israel, and China. What’s the problem? Staff turnover, complexity, training cost, reliability of outputs?

Net net: Palantir’s needing money is an interesting signal. Stealth, secrecy, good customer support, and impressive visuals of networks of bad actors — important. But maybe — just maybe — the systems are ultimately not working as advertised. Sustainable revenues, eager investors, and a home run product equivalent to Facebook or Netflix — nowhere to be found. Yellow lights are flashing in DarkCyber’s office for some intelware vendors.

Stephen E Arnold, September 6, 2019

Can a Well Worn Compass Help Enterprise Search Thrive?

September 4, 2019

In the early 1990s, Scotland Yard (which never existed although there is a New Scotland Yard) wanted a way to make sense of the data available to investigators in the law enforcement sector.

A start-up in Cambridge, England, landed the contract. To cut a multi-year story short, i2 Ltd. created Analyst’s Notebook. The product is now more than a quarter century old and is owned by IBM. Within five or six years, specialist vendors reacted to Analyst’s Notebook’s functionalities. Even though the early versions were clunky, the software performed functions that may be familiar to anyone who has tried to locate, analyze, and make sense of data within an organization. I am using “organization” in a broad sense, not just UK law enforcement, regulatory enforcement, and intelligence entities.

What were some of the key functions of Analyst’s Notebook, a product which most people in the search game know little about? Let me highlight a handful, and then flash forward to what enterprise search vendors are trying to pull off in an environment which is very different from the one the i2 experts tackled 25 years ago. Hint: Focus was the key to Analyst’s Notebook’s success and to the me-too products which are widely available to LE and intel professionals. Enterprise search lacks this singular advantage, and, as a result, is likely to flounder as it has for decades.

The Analyst’s Notebook delivered:

  • Machine assistance to investigators implemented in software which generally followed established UK police procedures. Forget the AI stuff. The investigator or a team of investigators focused on a case provided most of the brain power.
  • Software which could identify entities. An entity is a person, place, thing, phone number, credit card, event, or similar indexable item.
  • Once identified, the software — influenced by the Cambridge curriculum in physics — could display a relationship “map” or what today looks like a social graph.
  • Visual cues allowed investigators to see that two people who exchanged many phone calls were connected. To make the relationship explicit, a heavy dark line connected the two callers.
  • The ability to print these relationship maps and other items of interest on a big sheet of paper, whether identified by an investigator or surfaced by maths that could flag entities within a cluster or an anomaly with its date and time.
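The relationship “map” in the bullets above boils down to a weighted graph built from records that link entities. A minimal sketch: count phone calls between number pairs, then pick out the edges an analyst’s chart would render as the heavy dark lines. The numbers and the threshold are invented for illustration.

```python
from collections import Counter

# Toy call records: (caller, callee). All numbers are made up.
calls = [
    ("555-0101", "555-0202"), ("555-0101", "555-0202"),
    ("555-0101", "555-0202"), ("555-0303", "555-0404"),
]

# Undirected edge weights: sort each pair so direction doesn't matter.
edges = Counter(tuple(sorted(pair)) for pair in calls)

def heavy_edges(edges, threshold=2):
    """Edges a link chart would draw as thick lines (weight >= threshold)."""
    return {pair: n for pair, n in edges.items() if n >= threshold}

print(heavy_edges(edges))  # {('555-0101', '555-0202'): 3}
```

Everything after this (entity extraction from documents, clustering, anomaly dates) layers onto the same underlying structure: entities as nodes, evidence as weighted edges.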

Over the years, other functions were added. Today’s version offers a range of advanced functions that make it easy to share data, collaborate, acquire and add to the investigative teams’ content store (on premises, hybrid, or in the cloud), automate some functions using IBM technology (no, I won’t use the Watson word), and manage workflow. Imagery is supported. Drill down makes it easy to see “where the data came from.” An auditor can retrace an investigator’s action in order to verify a process. If you want more about i2, just run a Bing, Google, or Yandex query.

Why am I writing about decades old software?

The reason is that I read an item from my files as my team was updating my comments about Amazon’s policeware for the October TechnoSecurity & Digital Forensics Conference. The item is titled “Thomson Reuters Partners with Squirro to Combine Artificial Intelligence Technology and Data to Unlock Customer Intelligence.” I had written about Squirro in “Will Cognitive Search (Whatever That Is) Change Because of Squirro?”

I took a look at the current Squirro Web site and learned that the company is the leader in “context intelligence.” That seemed similar to what i2 delivered in the 1990s version of Analyst’s Notebook. The software was designed to fit the context of a specific country’s principal police investigators. No marketing functions, no legal information, no engineering product data — just case-related information like telephone records, credit card receipts, officer reports, arrest data, etc.

Squirro, founded in 2012 or 2013 (there are conflicting dates online), states that the software delivers

a personalized, real-time contextual stream from the sea of information directly to your workplace. It’s based on Squirro’s digital fingerprint technology connecting personal interests and workflows while learning and refining as user interactions increase.

I also noted this statement:

Squirro combines all the different tools you need to work with unstructured data and enables you to curate a self-learning 360° context radar natural to use in any enterprise system. ‘So What?’ Achieving this reduces searching time by 90%, significantly cutting costs and allows for better, more effective decision-making. The highly skilled Swiss team of search experts has been working together for over 10 years to create a precise context intelligence solution. Squirro: Your Data in Context.

Well, 2013 to the present is six years, seven if I accept the 2012 date.

The company states that it offers “A.I.-driven actionable Insights,” adding:

Squirro is a leading AI-platform – a self-learning system keeping you in the know and recommending what’s next.

I’m okay with marketing lingo. But to my way of thinking, Squirro is edging toward the i2 Analyst’s Notebook type of functionality. The difference is that Squirro wants to serve the enterprise. Yep, enterprise search with wrappers for smart software, reports, etc.

I don’t want to make a big deal of this similarity, but there is one important point to keep in mind. Delivering an enterprise solution to a commercial outfit means that different sectors of the business will have different needs. The different needs manifest themselves in workflows and data particular to their roles in the organization. Furthermore, most commercial employees are not trained like police and intelligence operatives; that is, employees looking for information have diverse backgrounds and different educational experiences. For better or worse, law enforcement and intelligence professionals go through some type of training. In the US, the job is handled by numerous entities, but a touchstone is FLETC. Each country has its equivalent. Therefore, there is a shared base of information, a shared context if you will.

Modern companies are a bit like snowflakes. There is a difference, however: the snowflakes may no longer work together in person. In fact, interactions are intermediated in numerous ways. This is not a negative, but it is somewhat different from how a team of investigators worked on a case in London in the 1990s.

What is the “search” inside the Squirro information retrieval system? The answer is open source search. The features are implemented via software add-ons, wrappers, and microservices, plus other 2019 methods.

This is neither good nor bad. Using open source reduces some costs. On the other hand, the resulting system will have a number of moving parts. As complexity grows with new features, some unexpected events will occur. These have to be chased down and fixed.

New features and functions can be snapped in. The trajectory of this modern approach is to create a system which offers many marketing hooks and opportunities to make a sale to an organization looking for a solution to the ever present “information problem.”

My hypothesis is that i2 Analyst’s Notebook succeeded as an information access, analysis, and reporting system because it focused on solving a rather specific use case. A modern system such as a search and retrieval solution that tries to solve multiple problems is likely to hit a wall.

The digital wall is the same one that pushed Fast Search & Transfer and many other enterprise search systems to the sidelines or the scrap heap.

Net net: Focus, not jargon, may be valuable, not just for Squirro, but for other enterprise search vendors trying to attain sustainable revenues and a way to keep their sources of funding, their customers, their employees, and their stakeholders happy.

Stephen E Arnold, September 4, 2019
