Thomson Reuters: Whither Palantir Technologies

May 6, 2019

When I was working on a profile of Palantir Technologies for a client a couple of years ago, I came across a reference to Thomson Reuters' use of Palantir Technologies' smart system. News of the deal surfaced in a 2010 news release issued on Market Wired, but like many documents in the "new" approach to Web indexing, the content is a goner.

My memory isn’t what it used to be, but I recall that the application was called QA Studio. The idea obviously was to allow a person to ask a question using the “intuitive user interface” which the TR and Palantir team created to generate revenue magic. The goal was to swat the pesky FactSet and Bloomberg offerings as well as the legion of wanna-be analytics vendors chasing the Wall Street wizards.

Here's a document from my files showing a bit of the PR lingo and the interface to the TR Palantir service:

[Image: document from my files showing the PR lingo and the interface to the TR Palantir service]

I am not sure what happened to this product nor the relationship with the Palantir outfit.

I assume that TR wants more smart software, not just software which creates more work for the already overburdened MBAs planning the future of the economic world.

One of the DarkCyber researchers spotted this news release, which may suggest that TR is looking to the developer of OS/2 (once used by TR as I recall) for smart software: “IBM, Thomson Reuters Introduce Powerful New AI and Data Combination to Simplify How Financial Institutions Tackle Regulatory Compliance Challenges.”

The news release informed me that:

IBM and Thomson Reuters Regulatory Intelligence will now offer financial institutions access to a RegTech solution delivered from the IBM Cloud that features real-time financial services data from thousands of content sources. Backed by the power of AI and domain knowledge of Promontory Financial Group, the collaboration will enable risk and compliance professionals to keep pace with regulatory changes, manage risk and reduce the overall cost of compliance.

I learned:

Thomson Reuters and IBM have been collaborating on AI and data intelligence since 2015, bringing together expertise and technology to solve industry-specific problems in areas such as healthcare and data privacy. Today’s announcement represents another step forward in helping businesses combat their most pressing regulatory challenges.

The most interesting word in the news release is “holistic.” I haven’t encountered that since “synergy” became a thing. Here’s what the TR IBM news release offered:

Featuring an updated user experience to allow for increased engagement, IBM OpenPages with Watson 8.0 transforms the way risk and compliance professionals work. By providing a holistic view of risk and regulatory responsibilities, OpenPages helps compliance professionals actively participate in risk management as a part of their day-to-day activity. In addition to integrating Thomson Reuters Regulatory Intelligence, IBM OpenPages with Watson incorporates the expertise of Promontory Financial Group to help users of OpenPages create libraries of relevant regulatory requirements, map them to their internal framework and evaluate their impact to the business.

Yep, OpenPages. What is this? Well, it is Watson, but that doesn’t help me. Watson is more of a combo consulting-licensing thing. In this implementation, OpenPages reduces risk and makes “governance” better with AI and advanced analytics.

Analytics? That was the purpose of Palantir Technologies’ solution.

Let’s step back. What is the news release saying? These thoughts zoomed through my now confused brain:

  • TR licensed Palantir's system, which, based on my understanding of the platform, delivers some of the most advanced analytics on offer. Either TR cannot make Palantir do what TR wants in order to generate revenue, or Palantir's technology falls below the TR standard for excellence.
  • TR needs a partner which can generate commercial sales. IBM is supposed to be a sales powerhouse, but IBM’s financial performance has been dicey for years. Palantir, therefore, may be underperforming, and IBM’s approach is better. What?
  • IBM’s Watson TR solution works better than IBM’s forays into medicine, enterprise search, cloud technology for certain government entities, and a handful of other market sectors. What?

To sum up, I am not sure which company is the winner in this TR IBM deal. One hypothesis is that both TR and IBM hope to pull a revenue bunny from the magic hat worn by ageing companies.

The unintentional cold shoulder to Palantir may not be a signal about that firm. But with IPO talk circulating in some circles, Palantir certainly wants outfits like TR to emit positive vibes.

Interesting stuff this analytics game. I suppose one must take a “holistic” view. Will there be “synergy” too?

Stephen E Arnold, May 6, 2019

DarkCyber for April 30, 2019, Now Available

April 30, 2019

DarkCyber for April 30, 2019, is now available at www.arnoldit.com/wordpress and on Vimeo at https://www.vimeo.com/332933089.

The program is a production of Stephen E Arnold. It is the only weekly video news program focusing on the Dark Web, cybercrime, and lesser known Internet services.

This week's story lineup includes: the British government's online harms report; work methods of hackers; Qintar, a Sharia-compliant cryptocurrency; a new Dark Web index; and a close look at Haystax Constellation cyber software.

This week's feature examines Haystax Technologies' Constellation system. The platform can perform a range of cyber functions, including analyzing and protecting facilities and events like the US Super Bowl. The system can also identify and monitor employees who are likely to present a high risk to their employers. The insider threat capability helps reduce the loss of sensitive data. Constellation uses a range of patented systems and methods. The company relies, in part, on the mathematics of Thomas Bayes. Like Autonomy plc, Haystax processes existing data and then integrates real time information in order to generate its predictive outputs.
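Haystax does not publish its model, but the underlying Bayesian idea is easy to illustrate. Here is a minimal, hypothetical sketch in Python: a prior probability that an employee is high risk is updated as behavioral indicators arrive. The indicator names and likelihood values are invented for illustration; they are not Haystax's actual factors.

# Hypothetical Bayesian risk-scoring sketch; not Haystax's actual model.

def update_posterior(prior, p_given_risky, p_given_benign):
    """Apply Bayes' rule for one observed indicator."""
    numerator = p_given_risky * prior
    evidence = numerator + p_given_benign * (1.0 - prior)
    return numerator / evidence

# Invented indicators: (likelihood if high risk, likelihood if benign)
indicators = {
    "after_hours_bulk_download": (0.60, 0.05),
    "access_outside_job_role": (0.40, 0.10),
    "policy_violation_flag": (0.30, 0.02),
}

posterior = 0.01  # prior: assume one percent of employees are high risk
for name, (p_risky, p_benign) in indicators.items():
    # indicators are treated as conditionally independent (a naive assumption)
    posterior = update_posterior(posterior, p_risky, p_benign)
    print(f"{name}: risk estimate now {posterior:.2%}")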

Other stories in the April 30, 2019, DarkCyber video include brief “cybershots” about:

  • The British government released a report about the activities of social media firms. The document is a harsh critique of the management and business tactics of a number of high profile firms. The facts uncovered by the government analysts, the examples presented, and the recommendations set forth in the document are likely to have considerable weight. Britain is contemplating new regulations to control the behaviors of US social media firms.
  • DarkCyber provides basic information about how hackers (white hat and black hat varieties) perform their work. Not surprisingly, trial and error play a significant part. However, there are specific methods, and these have been disclosed by the WikiLeaks-type site edited by a persona who appears to be a former CIA agent. Instructions for downloading the report and accessing the site are included in the video.
  • DarkCyber describes a new Dark Web indexing service called Darkmention. The viewer learns where a detailed technical description of the system can be obtained. Although there are numerous Dark Web indexing systems, Darkmention's approach is to process more than 350 different content platforms, not just Tor-accessible sites.
  • DarkCyber explains that a new Sharia-compliant cryptocurrency is now available. Qintar is based on Islamic blockchain technology. The crypto tokens may be purchased from the Qintar bank based in Geneva, Switzerland.

The video is available at www.arnoldit.com/wordpress.

Kenny Toth, April 30, 2019

Google: History? Backfiles Do Not Sell Ads

April 29, 2019

We spotted a very interesting article in Tablix: “Google Index Coverage”. We weren’t looking for the article, but it turned up in a list of search results and one of the DarkCyber researchers called it to my attention.

Background: Years ago we did a bit of work for a company engaged in data analysis related to the health and medical sectors. We had to track down the names of the companies who were hired by the US government to do some outsourced fraud investigation. We were able to locate the government statements of work and even some of the documents related to investigations. We noticed a couple of years ago that our bookmarks to some government documents did not resolve. With USA.gov dependent on Bing, we checked that index. We tried US government Web sites related to the agencies involved. Nope. The information had disappeared, but in one case we did locate documents on a US government agency’s Web site. The data were “there” but the data were not in Bing, Exalead, Google, or Yandex. We also checked the recyclers of search results: Startpage, the DuckDuck thing, and MillionShort.
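Spotting link rot like this is easy to automate. Below is a minimal sketch using the Python requests library; the URLs are placeholders rather than the actual government documents mentioned above, and checking whether a page remains in a given engine's index would still require that engine's API or a manual query.

# Minimal link-rot check for a list of bookmarked documents.
# The URLs below are placeholders, not the documents discussed above.
import requests

bookmarks = [
    "https://www.example.gov/statement-of-work.pdf",
    "https://www.example.gov/fraud-investigation-report.pdf",
]

for url in bookmarks:
    try:
        resp = requests.head(url, allow_redirects=True, timeout=10)
        status = resp.status_code
    except requests.RequestException as exc:
        status = f"error: {exc.__class__.__name__}"
    print(f"{url} -> {status}")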

We had other information about content disappearing from sites like the Wayback Machine too. From our work for assorted search companies and our own work years ago on ThePoint.com, which we sold to Lycos, we had considerable insight into the realities of paying for indexing that did not generate traffic or revenue. The conclusion we had reached and we assumed that other vendors would reach was:

Online search is not a “free public library.”

A library is/was/should be an archiving entity; that is, someone has to keep track and store physical copies of books and magazines.

Online services are not libraries. Online services sell ads as we did to Zima who wanted their drink in front of our users. This means one thing:

Web indexes dump costs.

The Tablix article makes clear that some data are expendable. Delete them.

Our view is:

Get used to it.

There are some knock-on effects from the simple logic of reducing costs and increasing the efficiency of the free Web search systems. I have written about many of these, and you can search the 12,000 posts on this blog or pay to search commercial indexes for information in my more than 100 published articles related to search. You may even have a copy of one of my more than a dozen monographs; for example, the original Enterprise Search Reports or The Google Legacy.

  1. Content is disappearing from indexes on commercial and government Web sites. Examples range from the Tablix experience to the loss of the MIC contracts which detail exclusives for outfits like Xerox.
  2. Once the content is not findable, it may cease to exist for those dependent on free search and retrieval services. Sorry, Library of Congress, you don’t have the content, nor does the National Archives. The situation is worse in countries in Asia and Eastern Europe.
  3. Individuals — particularly the annoying millennials who want me to provide information for free — do not have the tools at hand to locate high value information. There are services which provide some useful mechanisms, but these are often affordable only by certain commercial enterprises, some academic research organizations, and law enforcement and intelligence agencies. This means that most people are clueless about the “accuracy”, “completeness,” and “provenance” of certain information.

Net net: If data generate revenue, they may be available online and findable. If the data do not, hasta la vista. The situation is one that gives me and my research team considerable discomfort.

Imagine how smart software trained on available data will behave. Probably in a pretty stupid way. Information is not what people believe it to be. Now we have a generation or two of people who think research is looking something up on a mobile device. Quite a combo: ill-informed humans and software trained on incomplete data.

Yeah, that’s just great.

Stephen E Arnold, April 29, 2019

Latest GraphDB Edition Available

April 25, 2019

A new version of GraphDB is now available, we learn from the company’s News post, “Ontotext’s GraphDB 8.9 Boosts Semantic Similarity Search.” The semantic graph database offers a couple new features inspired by user feedback. We learn:

“The semantic similarity search is based on the Random Indexing algorithm. … The latest GraphDB release enables users to create hybrid similarity searches using pre-built text-based similarity vectors for the predication-based similarity index. The index combines the power of graph topology with the text similarity. The users can control the index accuracy by specifying the number of iterations required to refine the embeddings. Another improvement is that now GraphDB 8.9 allows users to boost the term weights when searching in text-based similarity indexes. It also simplifies the processes of abortion of running queries or updates from the SPARQL editor in the Workbench.”

The database has also been updated to the current RDF4J 2.4.6 public release. GraphDB comes in Free, Standard, and Enterprise editions. Founded in 2000, Ontotext is based in Sofia, Bulgaria, and maintains its North American office in New York City.
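Random Indexing, the algorithm named in the quoted passage above, is straightforward in outline: each term receives a sparse random "index vector," a document or context vector is the sum of the index vectors of the terms it contains, and similarity is measured with cosine distance. The sketch below, assuming NumPy and arbitrary dimensions, is a bare-bones illustration of that idea, not Ontotext's implementation.

# Bare-bones Random Indexing sketch; not Ontotext's implementation.
import numpy as np

DIM, NONZERO = 512, 8          # arbitrary vector size and sparsity
rng = np.random.default_rng(42)
index_vectors = {}             # term -> fixed sparse random index vector

def index_vector(term):
    """Return the random index vector for a term, creating it on first use."""
    if term not in index_vectors:
        vec = np.zeros(DIM)
        positions = rng.choice(DIM, size=NONZERO, replace=False)
        vec[positions] = rng.choice([-1.0, 1.0], size=NONZERO)
        index_vectors[term] = vec
    return index_vectors[term]

def doc_vector(text):
    """A document vector is the sum of its terms' index vectors."""
    return sum(index_vector(t) for t in text.lower().split())

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

d1 = doc_vector("graph database semantic similarity search")
d2 = doc_vector("semantic search over a graph database")
d3 = doc_vector("sharia compliant crypto currency bank")
print(cosine(d1, d2), cosine(d1, d3))  # d1 should sit closer to d2 than to d3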

Cynthia Murrell, April 25, 2019

Nosing Beyond the Machine Learning from Human Curated Data Sets: Autonomy 1996 to Smart Software 2019

April 24, 2019

How does one teach a smart indexing system like Autonomy's 1996 "neurodynamic" system?* Subject matter experts (SMEs) assembled a training collection of textual information. The articles and other content would replicate the characteristics of the content which the Autonomy system would process; that is, index and make searchable or analyzable. The work was important. Get the training data wrong, and the indexing system would assign metadata or "index terms" and "category names" which could cause a query to generate results the user could perceive as incorrect.


How would a licensee adjust the Autonomy “black box”? (Think of my reference to Autonomy and search as a way of approaching “smart software” and “artificial intelligence.”)

The method was to perform re-training. The approach was practical, and for most content domains, the re-training worked. It was an iterative process. Because the words in the corpus fed into the "black box" included new words, concepts, bound phrases, entities, and key sequences, there were several functions integrated into the basic Autonomy system as it matured. Examples ranged from support for term lists (controlled vocabularies) to dictionaries.

The combination of re-training and external content available to the system allowed Autonomy to deliver useful outputs.
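To make the training and re-training point concrete, here is a toy sketch, assuming a crude naive Bayes categorizer rather than Autonomy's actual neurodynamic engine: the model is built from an SME-curated training set, and new vocabulary influences categorization only after a re-training pass.

# Toy categorizer sketch; far cruder than Autonomy IDOL, shown only to
# illustrate why re-training with fresh, curated examples matters.
from collections import Counter, defaultdict
import math

class ToyCategorizer:
    def __init__(self):
        self.term_counts = defaultdict(Counter)   # category -> term frequencies
        self.doc_counts = Counter()               # category -> document count

    def train(self, documents):
        """documents: iterable of (category, text) pairs curated by SMEs."""
        for category, text in documents:
            self.doc_counts[category] += 1
            self.term_counts[category].update(text.lower().split())

    def categorize(self, text):
        terms = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, float("-inf")
        for category, counts in self.term_counts.items():
            total_terms = sum(counts.values())
            score = math.log(self.doc_counts[category] / total_docs)
            for term in terms:
                # add-one smoothing so unseen terms do not zero out the score
                score += math.log((counts[term] + 1) / (total_terms + len(counts) + 1))
            if score > best_score:
                best, best_score = category, score
        return best

clf = ToyCategorizer()
clf.train([("finance", "regulatory compliance risk filing"),
           ("sport", "match score league goal")])
print(clf.categorize("compliance risk report"))       # expected: "finance"
# "regtech" is unknown vocabulary until a re-training pass adds fresh examples.
clf.train([("finance", "regtech solution for compliance reporting")])
print(clf.categorize("regtech compliance briefing"))  # expected: "finance"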

Where the optimal results departed from the real world results, the causes usually boiled down to several factors, often working in concert. First, licensees did not want to pay for re-training. Second, maintenance of the external dictionaries was necessary because new entities arrive with reasonable frequency. Third, testing and organizing the freshened training sets and the editorial work required to keep dictionaries shipshape were too expensive, time consuming, and tedious.

Not surprisingly, some licensees grew unhappy with their Autonomy IDOL (integrated data operating layer) system. That, in my opinion, was not Autonomy’s fault. Autonomy explained in the presentations I heard what was required to get a system up and running and outputting results that could easily hit 80 percent or higher on precision and recall tests.

The Autonomy approach is widely used. In fact, wherever there is a Bayesian system in use, there is the training, re-training, and external knowledge base demand. I just took a look at Haystax Constellation. It is Bayesian, and Haystax makes it clear that the "model" has to be trained. So what's changed between 1996 and 2019 with regard to Bayesian methods?

Nothing. Zip. Zero.


IBM: Drugs, Web Pages, and Watson

April 22, 2019

I read “Watson For Drug Discovery”. I don’t pay much attention to IBM’s assertions about its IBM Watson technology. The Jeopardy thing, the HRBlock thing, and the froth whipped up about smart software bored me.

This story was a bit different because, if it is accurate, it reveals a lack of coordination within a company which once was reasonably well organized. I worked on indexing the content of the IBM technical libraries and oversaw the leasing of certain data sets to Big Blue for a number of years. That IBM — despite the J1, J2, and J3 charging mechanism — was a good customer and probably could have made New York commuter trains run on time. (Well, maybe not.)

The Science Magazine story focuses on IBM pulling out of selling Watson to invent drugs. I mean if anyone took a look at the recipes Watson cooked up and memorialized in the IBM cook book, drugs seemed to be a stretch. Would you like tamarind for your cancer treatment? No, possibly another spice?

The factoid I noted in the article is that even though the drug thing is history, IBM keeps or kept its Web pages touting the Watson thing. I snapped this screen shot at 6:41 am US Eastern time on April 22, 2019. Here it is:

[Image: IBM's Watson for Drug Discovery Web page, captured April 22, 2019]

The Science Magazine write up (which I assume is not channeling its inner Saturday Night Live) states:

The idea was that it [Watson] would go ripping through the medical literature, genomics databases, and your in-house data collection, finding correlations and clues that humans had missed. There’s nothing wrong with that as an aspirational goal. In fact, that’s what people eventually expect out of machine learning approaches, but a key word in that sentence is “eventually”. IBM, though, specifically sold the system as being ready to use for target identification, pathway elucidation, prediction of gene and protein function and regulation, drug repurposing, and so on. And it just wasn’t ready for those challenges, especially as early as they were announcing that they were.

Failure I understand. The inability to manage the Web site is a bit like screwing up Job Control Language instructions. When I worked in the university computer lab, that was a minimum wage student job, dead easy, and only required basic organizational and coordination skills.

IBM seems to have lost something just as it did when it allegedly fired old timers to become the “new” IBM. Maybe the old IBM has something today’s IBM lacks?

Stephen E Arnold, April 22, 2019

Quantum Search: Consultants, Rev Your Engines

April 18, 2019

Search is a utility function. A number of companies have tried to make it into a platform upon which a business or a government agency’s mission rests. Nope.

In fact, for a decade I published “Beyond Search” and just got tired of repeating myself. Search works if one has a bounded domain, controlled vocabularies, consistent indexing, and technology which embraces precision and recall.

Today, not so much. People talk about search and lose their grip on the accuracy, relevance, and verifiability of the information retrieved. It’s not just wonky psycho-economic studies which cannot be replicated. Just try running the same query on two different mobile phones owned by two different people.

Against this background, please, read “How the Quantum Search Algorithm Works.” The paper contains some interesting ideas; for example:

It's incredible that you need only examine an N-item search space on the order of √N times in order to find what you're looking for. And, from a practical point of view, we so often use brute search algorithms that it's exciting we can get this quadratic speedup. It seems almost like a free lunch. Of course, quantum computers still being theoretical, it's not quite a free lunch – more like a multi-billion dollar, multi-decade lunch!

Yes, incredible.
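To put the quadratic speedup in rough numbers: classical brute-force search over an unstructured N-item list needs on the order of N lookups (about N/2 on average), while Grover's algorithm needs roughly (π/4)·√N oracle queries. A quick back-of-the-envelope comparison in Python, using those textbook formulas:

# Back-of-the-envelope comparison of classical vs Grover query counts.
import math

for n in (1_000, 1_000_000, 1_000_000_000):
    classical = n / 2                                  # expected brute-force lookups
    grover = math.floor(math.pi / 4 * math.sqrt(n))    # near-optimal Grover iterations
    print(f"N = {n:>13,}: classical ~ {classical:,.0f}, Grover ~ {grover:,}")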

However, the real impact of this quantum search write up will be upon the search engine optimization crowd. How quickly will methods for undermining relevance be found?

Net net: Quantum or not, search seems destined to repeat its 50 year history in a more technically sophisticated computational environment. Consultants, abandon your tired explanations of federated search. Forget mere geo-tagging. Drill right into the heart of quantum possibilities. I am eagerly awaiting a Forrester wave report on quantum search and a Gartner magic quadrant, filled with subjective possibilities.

Stephen E Arnold, April 18, 2019

Expert System: Interesting Financials

April 6, 2019

Expert System SpA is a firm providing semantic software that extracts knowledge from text by replicating human processes. I noticed information on the company’s Web site which informed me:

  • The company had sales revenues of 28.7 million euros for 2018
  • The company’s growth was 343 percent compared to 2017
  • The net financial position was 12.4 million euros up from 8.8 million euros in March 2017.

Remarkable financial performance.

Out of curiosity I navigated to Google Finance and plugged in Expert System Spa to see what data the GOOG could offer.

Here’s the chart displayed on April 6, 2019:

[Image: Google Finance chart for Expert System SpA, April 6, 2019]

The firm’s stock does not seem to be responding as we enter the second quarter of 2019.


Netwrix Buys Concept Searching

April 5, 2019

Late last year we learned that Concept Searching was selling itself to Netwrix. I don’t pay much attention to “finding” solutions. I thought of Concept Searching in the context of the delay in awarding the JEDI contract. Concept Searching might be a nifty add on if Microsoft gets the $10 billion deal.

Concept Searching had positioned itself as an indexing outfit and a taxonomy management tool vendor. The company struck me as having a Microsoft-centric focus, dabbling in enterprise search, and jousting with Smartlogic.

According to the company’s founder Martin Garland:

Concept Searching is excited about becoming a part of Netwrix. Merging our unique technology with its exceptional Netwrix Auditor product delivers a new level of protection to organizations concerned about data security, with the ability to identify and remediate personal or organizationally defined sensitive information, regardless of where it is stored or how it was ingested. The expanded team will enable us to be even more agile, increasingly responsive to our clients’ needs, and to deliver a platform for growth to both client bases and ensure we maintain our leadership position in delivering world-class metadata-driven solutions.

Netwrix is a software company focused exclusively on providing IT security and operations teams with pervasive visibility into user behavior, system configurations and data sensitivity across hybrid IT infrastructures to protect data regardless of its location. The company has 10,000 customers.

DarkCyber believes that, as with Exalead's acquisition by Dassault or OpenText's purchases of assorted search and retrieval systems, it will be interesting to watch how this acquisition works out.

Stephen E Arnold, April 5, 2019

Federating Data: Easy, Hard, or Poorly Understood Until One Tries It at Scale?

March 8, 2019

I read two articles this morning.

One article explained that there's a new way to deal with data federation. Always optimistic, I took a look at "Data-Driven Decision-Making Made Possible using a Modern Data Stack." The revolution is to load data and then aggregate. The old way is to transform, aggregate, and model. Here's a diagram from DAS42; a larger version is available at this link.

Hard to read. Yep, New Millennial colors. Is this a breakthrough?

I don’t know.
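For readers who prefer code to diagrams, the two orderings can be sketched in a few lines of pandas. This is a generic illustration with made-up columns, not DAS42's actual stack: the old path transforms and aggregates before loading, the "modern" path loads the raw rows first and pushes transformation and aggregation into the warehouse.

# Generic ETL-vs-ELT sketch with made-up data; not DAS42's actual stack.
import pandas as pd

raw = pd.DataFrame({
    "region": ["east", "east", "west"],
    "amount": ["10.0", "12.5", "7.25"],   # extract: amounts arrive as strings
})

# Old ordering (ETL): transform and aggregate, then load only the rollup.
etl = raw.assign(amount=raw["amount"].astype(float))
etl_rollup = etl.groupby("region", as_index=False)["amount"].sum()

# New ordering (ELT): load the raw rows first, transform and aggregate later.
warehouse = raw.copy()                    # raw data lands in the warehouse as-is
elt_rollup = (warehouse.assign(amount=warehouse["amount"].astype(float))
                       .groupby("region", as_index=False)["amount"].sum())

print(etl_rollup)
print(elt_rollup)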

When I read "2 Reasons a Federated Database Isn't Such a Slam-Dunk", it seems that the approach outlined by DAS42 and the InfoWorld expert's assessment are not in sync.

There are two reasons. Count ‘em.

One: performance.

Two: security.

Yeah, okay.

Some may suggest that there are a handful of other challenges. These range from deciding how to index audio, video, and images to figuring out what to do with different languages in the content to determining what data are "good" for the task at hand and what data are less "useful." Date, time, and geocode metadata are needed, but that introduces the not-so-easy-to-solve indexing problem.
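Even the "simple" date and time piece is fussy. Here is a minimal sketch, assuming two hypothetical sources with different timestamp formats and time zones, that normalizes both to UTC; real federation projects repeat this exercise for every field and every source.

# Minimal sketch: normalizing timestamp metadata from two federated sources.
# Source names, formats, and time zones are invented for illustration.
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

records = [
    {"source": "crm",  "ts": "03/07/2019 14:05",    "tz": "America/New_York",
     "fmt": "%m/%d/%Y %H:%M"},
    {"source": "logs", "ts": "2019-03-07T19:05:00", "tz": "UTC",
     "fmt": "%Y-%m-%dT%H:%M:%S"},
]

for rec in records:
    local = datetime.strptime(rec["ts"], rec["fmt"]).replace(tzinfo=ZoneInfo(rec["tz"]))
    print(rec["source"], "->", local.astimezone(timezone.utc).isoformat())
    # both rows describe the same instant once normalized to UTC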

So where are we with the “federation thing”?

Exactly the same place we were years ago…start ups and experts notwithstanding. But then one has to wrangle a lot of data. That’s cost, gentle reader. Big money.

Stephen E Arnold, March 8, 2019
