ThomsonReuters: Palantir Not Enough Math?

April 6, 2016

I read “TRRI Users Will Gain Access to FiscalNote’s Legislative Modeling Techniques.” The licensee of Palantir Metropolitan and owner of Westlaw smart software for legal eagles is pushing into new territory. That’s probably good news for stakeholders who have watched ThomsonReuters bump into a bit of a revenue ceiling in the last few years.

According to the write up:

The main benefit of the agreement [with FiscalNote] will grant Thomson Reuters’ Regulatory Intelligence (TRRI) newly extended capabilities across its predictive legislative analytics. TRRI is a global solution that helps clients focus and leverage their regulatory risk. Per the agreement, FiscalNote will help provide TRRI users with likelihood factors and other insights relegated to specifics pieces of legislative passage.

Interesting. I assumed that Palantir’s platform would have the extensibility to handle this type of content processing and analysis. Wrong again.

I learned:

FiscalNote utilizes machine learning and natural language processing in its modeling techniques that help it engineer models to conduct a host of analyses on open government data. In essence, these models allow FiscalNote to automatically analyze how legislation is going to yield any material impact via a combination of factors such as legislators, committee assignments, actions taken, bill versions, and amendments.

Wait, wait, don’t tell me. Westlaw’s smart software which can do many wonderful advanced text processing tricks is not able to perform in the manner of FiscalNote.

My hunch is that the deal has less to do with technologies, extensible or not, and more to do with getting some customers and an opportunity to find a way to pump up those revenues. Another idea: Is ThomsonReuters emulating IBM’s tactic of buying duplicative technology as a revenue rocket booster?

Perhaps Palantir and Westlaw should team up so ThomsonReuters’ customers have additional choices? Think of the XML slicing and dicing strategy with the intelligence and legal technology working in harmony.

Stephen E Arnold, April 6, 2016

Patents and Semantic Search: No Good, No Good

March 31, 2016

I have been working on a profile of Palantir (open source information only, however) for my forthcoming Dark Web Notebook. I bumbled into a video from an outfit called ClearstoneIP. I noted that ClearstoneIP’s video showed how one could select from a classification system. With every click, the result set changed. For some types of searching, a user may find the point-and-click approach helpful. However, there are other ways to root through what appears to be patent applications. There are the very expensive methods happily provided by Reed Elsevier and Thomson Reuters, two fine outfits. And then there are less expensive methods like Alphabet Google’s odd ball patent search system or the quite functional FreePatentsOnline service. In between, you and I have many options.

None of them is a slam dunk. When I was working through the publicly accessible Palantir Technologies’ patents, I had to fall back on my very old-fashioned method. I tracked down a PDF, printed it out, and read it. Believe me, gentle reader, this is not the most fun I have ever had. In contrast to the early Google patents, Palantir’s documents lack the detailed “background of the invention” information which the salad days’ Googlers cheerfully presented. Palantir’s write ups are slogs. Perhaps the firm’s attorneys were born with dour brain circuitry.

I did a side jaunt and came across a white paper from ClearstoneIP called “Why Semantic Searching Fails for Freedom-to-Operate (FTO).” The 12 page write up is about patent searching. ClearstoneIP is a patent analysis company which, according to its Web site, is a “paradigm shifter.” The company describes itself this way:

ClearstoneIP is a California-based company built to provide industry leaders and innovators with a truly revolutionary platform for conducting product clearance, freedom to operate, and patent infringement-based analyses. ClearstoneIP was founded by a team of forward-thinking patent attorneys and software developers who believe that barriers to innovation can be overcome with innovation itself.

The “freedom to operate” phrase is a bit of legal jargon which I don’t understand. I am, thank goodness, not an attorney.

The firm’s search method makes much of the ontology, taxonomy, classification approach to information access. Hence, the reason my exploration of Palantir’s dynamic ontology with objects tossed ClearstoneIP into one of my search result sets.

The white paper is interesting if one works around the legal mumbo jumbo. The company’s approach is remarkable and invokes some of my caution light words; for example:

  • “Not all patent searches are the same.”, page two
  • “This all leads to the question…”, page seven
  • “…there is never a single “right” way to do so.”, page eight
  • “And if an analyst were to try to capture all of the ways…”, page eight
  • “to capture all potentially relevant patents…”, page nine.

The absolutist approach to argument is fascinating.

Okay, what’s the ClearstoneIP search system doing? Well, it seems to me that it is taking a path to consider some of the subtleties in patent claims’ statements. The approach is very different from that taken by Brainware and its tri-gram technology. Now that Lexmark owns Brainware, the application of the Brainware system to patent searching has fallen off my radar. Brainware relied on patterns; ClearstoneIP uses the ontology-classification approach.

Both are useful in identifying patents related to a particular subject.
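For readers who have not bumped into the pattern idea, here is a minimal sketch of generic character tri-gram matching. It is an assumption-laden illustration, not Brainware’s actual method, which the write up does not describe. The function names are hypothetical, and Jaccard overlap is simply one common way to compare two sets of tri-grams.

```python
def trigrams(text: str) -> set:
    """Return the set of overlapping three-character slices of the text."""
    # Lowercase and collapse runs of whitespace so formatting does not matter.
    normalized = " ".join(text.lower().split())
    return {normalized[i:i + 3] for i in range(len(normalized) - 2)}


def trigram_similarity(a: str, b: str) -> float:
    """Jaccard overlap of the two tri-gram sets: 0.0 (disjoint) to 1.0 (identical)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        # Texts shorter than three characters produce no tri-grams.
        return 0.0
    return len(ta & tb) / len(ta | tb)


if __name__ == "__main__":
    print(trigram_similarity("fastening device", "fastener device"))
    print(trigram_similarity("fastening device", "pizza oven"))
```

The pattern approach needs no vocabulary and no taxonomy. It rewards strings which look alike, so a claim written with different words for the same gizmo can slip past it.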

What is interesting in the write up is its approach to “semantics.” I highlighted in billable hour green:

Anticipating all the ways in which a product can be described is serious guesswork.

Yep, but isn’t that where a human with relevant training and expertise becomes important? The white paper takes the approach that semantic search fails for the ClearstoneIP method dubbed FTO or freedom to operate information access.

The white paper asserted:


Semantic searching is the primary focus of this discussion, as it is the most evolved.

ClearstoneIP defines semantic search in this way:

Semantic patent searching generally refers to automatically enhancing a text-based query to better represent its underlying meaning, thereby better identifying conceptually related references.

I think the definition of semantic is designed to strike directly at the heart of the methods offered to lawyers with paying customers by Lexis-type and Westlaw-type systems. Lawyers-to-be usually have access to the commercial-type services when in law school. In the legal market, there are quite a few outfits trying to provide better, faster, and sometimes less expensive ways to make sense of the Miltonesque prose popular among the patent crowd.

The white paper, in a lawyerly way, describes the approach of semantic search systems. Note that the “narrowing” to the concerns of attorneys engaged in patent work is in the background even though the description seems to be painted in broad strokes:

This process generally includes: (1) supplementing terms of a text-based query with their synonyms; and (2) assessing the proximity of resulting patents to the determined underlying meaning of the text-based query. Semantic platforms are often touted as critical add-ons to natural language searching. They are said to account for discrepancies in word form and lexicography between the text of queries and patent disclosure.
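The two steps in that passage can be sketched in a few lines. This is a toy under stated assumptions, not ClearstoneIP’s system and not any vendor’s: the `SYNONYMS` table, the function names, and the sample documents are all made up for illustration, and “proximity” is reduced to the share of expanded query terms a document contains.

```python
# Hypothetical hand-built synonym table; a real system would use a far larger resource.
SYNONYMS = {
    "fastener": {"screw", "bolt", "rivet"},
    "attach": {"affix", "secure", "couple"},
}


def expand_query(query: str) -> set:
    """Step 1: supplement the query terms with their synonyms."""
    terms = set(query.lower().split())
    expanded = set(terms)
    for term in terms:
        expanded |= SYNONYMS.get(term, set())
    return expanded


def score(expanded: set, document: str) -> float:
    """Step 2 (crudely): fraction of the expanded terms found in the document."""
    if not expanded:
        return 0.0
    words = set(document.lower().split())
    return len(expanded & words) / len(expanded)


def rank(query: str, documents: list) -> list:
    """Return (score, document) pairs with a nonzero score, best first."""
    expanded = expand_query(query)
    scored = [(score(expanded, doc), doc) for doc in documents]
    return sorted((pair for pair in scored if pair[0] > 0),
                  key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    docs = ["a bolt to secure the panel", "a pizza oven"]
    for value, doc in rank("attach fastener", docs):
        print(f"{value:.2f}  {doc}")
```

The sketch finds only what the synonym table anticipates. A claim which calls the bolt a “threaded elongate member” scores zero, and that gap is the guesswork the white paper complains about.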

The white paper offers this conclusion about semantic search:

it [semantic search] is surprisingly ineffective for FTO.

Seems reasonable, right? Semantic search assumes a “paradigm.” In my experience, taxonomies, classification schema, and ontologies perform the same intellectual trick. The idea is to put something into a cubby. Organizing information makes manifest what something is and where it fits in a mental construct.

But these semantic systems do a lousy job figuring out what’s in the Claims section of a patent. That’s a flaw which is a direct consequence of the lingo lawyers use to frame the claims themselves.

Search systems use many different methods to pigeonhole a statement. The “aboutness” of a statement or a claim is a sticky wicket. As I have written in many articles, books, and blog posts, finding on point information is very difficult. Progress has been made when one wants a pizza. Less progress has been made in finding the colleagues of the bad actors in Brussels.

Palantir requires that those adding content to the Gotham data management system add tags from a “dynamic ontology.” In addition to what the human has to do, the Gotham system generates additional metadata automatically. Other systems use mostly automatic systems which are dependent on a traditional controlled term list. Others just use algorithms to do the trick. The systems which are making friends with users strike a balance; that is, using human input directly or indirectly and some administrator only knowledgebases, dictionaries, synonym lists, etc.

ClearstoneIP keeps its eye on its FTO ball, which is understandable. The white paper asserts:

The point here is that semantic platforms can deliver effective results for patentability searches at a reasonable cost but, when it comes to FTO searching, the effectiveness of the platforms is limited even at great cost.

Okay, I understand. ClearstoneIP includes a diagram which drives home how its FTO approach soars over the competitors’ systems:


ClearstoneIP, © 2016

My reaction to the white paper is that for decades I have evaluated and used information access systems. None of the systems is without serious flaws. That includes the clever n gram-based systems, the smart systems from dozens of outfits, the constantly reinvented keyword centric systems from the Lexis-type and Westlaw-type vendors, even the simplistic methods offered by free online patent search systems like

What seems to be the reality of the legal landscape is:

  1. Patent experts use a range of systems. With lots of budget, many free and for-fee systems will be used. The name of the game is meeting the client needs and obviously billing the client for time.
  2. No patent search system to which I have been exposed does an effective job of thinking like a very good patent attorney. I know that the notion of artificial intelligence is the hot trend, but the reality is that seemingly smart software usually cheats by formulating queries based on analysis of user behavior, facts like geographic location, and who pays to get their pizza joint “found.”
  3. A patent search system, in order to be useful for the type of work I do, has to index germane content generated in the course of the patent process. Comprehensiveness is simply not part of the patent search systems’ modus operandi. If there’s a B, where’s the A? If there is a germane letter about a patent, where the heck is it?

I am not on the “side” of the taxonomy-centric approach. I am not on the side of the crazy semantic methods. I am not on the side of the keyword approach when inventors use different names on different patents, Babak Parviz aliases included. I am not in favor of any one system.

How do I think patent search is evolving? ClearstoneIP has it sort of right. Attorneys have to tag what is needed. The hitch in the git along has been partially resolved by Palantir-type systems; that is, the ontology has to be dynamic and available to anyone authorized to use a collection in real time.

But for lawyers there is one added necessity which will not leave us any time soon. Lawyers bill; hence, whatever is output from an information access system has to be read, annotated, and considered by a semi-capable human.

What’s the future of patent search? My view is that there will be new systems. The one constant is that, by definition, a lawyer cannot trust the outputs. The way to deal with this is to pay a patent attorney to read patent documents.

In short, like the person looking for information in the scriptoria at the Alexandria Library, the task ends up as a manual one. Perhaps there will be a friendly Boston Dynamics librarian available to do the work some day. For now, search systems won’t do the job because attorneys cannot trust an algorithm when the likelihood of missing something exists.

Oh, I almost forgot. Attorneys have to get paid via that billable time thing.

Stephen E Arnold, March 30, 2016

Attensity Europe Has a New Name

March 30, 2016

Short honk: The adventure of Attensity continues. Attensity Europe has renamed itself Sematell Interactive Solutions. You can read about the change here. The news release reminds the reader that Sematell is “the leading provider of interaction solutions.” I am not able to define interaction solutions, but I assume the company named by combining semantic and intelligence will make the “interaction solutions” thing crystal clear. The url is

Stephen E Arnold, March 30, 2016

Content Analyst Sold to kCura

March 30, 2016

kCura, an e-discovery company, purchased Content Analyst. Content Analyst was a spin out from a Washington, DC consulting and services firm. According to “kCura Acquires Content Analyst Company, Developers of High-Performance Advanced Text Analytics Technologies”:

Content Analyst’s analytics engine has been fully integrated into Relativity Analytics for eight years, supporting a wide range of features that are flexible enough to handle the needs of any type or size of case — everything from organizing unstructured data to email threading to categorization that powers flexible technology-assisted review workflows….By joining teams, kCura will bring Content Analyst’s specialized engineering talent closer to Relativity users, in order to continue building a highly scalable analytics solution even faster.

Content Analyst performs a number of text processing functions, including entity extraction and concept identification for metatagging text. When the initial technology was developed by the DC firm specializing in intelligence and related work for the US government, the system captured the attention of the intelligence community. The systems and methods used by Content Analyst remain useful.

Unlike some text processing companies, Content Analyst focused on legal e-discovery. kCura is the new Content Analyst. What company will acquire Recommind?

Stephen E Arnold, March 30, 2016

Expert System Does a Me Too Innovation

March 29, 2016

Years ago I was a rental to an outfit called i2 Group in the UK. Please, don’t confuse the UK i2 with the ecommerce i2 which chugged along in the US of A.

The UK i2 had a product called Analysts Notebook. At one time it was basking in a 95 percent share of the law enforcement and intelligence market for augmented investigatory software. Analysts Notebook is still alive and kicking in the loving arms of IBM.

I thought of the vagaries of product naming when I read “Expert System USA Launches Analysts’ Workspace.”

According to the write up:

Analysts’ Workspace features comprehensive enterprise search and case management software integrated with a customizable semantic engine. It incorporates a sophisticated and efficient workflow process that enables team-wide collaboration and rapid information sharing. The product includes an intuitive dashboard allowing analysts to monitor, navigate, and access information using different taxonomies, maps, and worldviews, as well as intelligent workflow features specifically designed to proactively support analysts and investigators in the different phases of their activities.

The lingo reminds me of the early i2 Group marketing collateral. The terminology has surfaced in some of Palantir’s marketing statements and, quite recently, in the explanation of the venture funded Digital Shadows’ service.

I love me-too products. Where would one be if Mozart had not heard and remembered the note sequences of other composers?

Now the trick will be to make some money. Mozart, though a very good me too innovator, struggled in that department. Expert System, according to Google Finance, is going to have to find a way to keep that share price climbing. Today’s (March 22, 2016) share price is in penny stock territory:


Stephen E Arnold, March 29, 2016

Not So Weak. Right, Watson?

March 25, 2016

I read an article which proved to be difficult to find. None of my normal newsreaders snagged the write up called “The Pentagon’s Procurement System Is So Broken They Are Calling on Watson.” Maybe it is the singular Pentagon hooked with the plural pronoun “they”? Hey, dude, colloquial writing is chill.

Perhaps my automated systems’ missing the boat was the omission of the three impressive letters “IBM”? If you follow the activities of US government procurement, you may want to note the article. If you are tracking the tension between IBM i2 and Palantir Technologies, the article adds another flagstone to the pavement that IBM is building to support its augmented intelligence activities in the Department of Defense and other US government agencies.

Let me highlight a couple of comments in the write up and leave you to explore the article at whatever level you choose. I noted these “reports”:

The Air Force is currently working with two vendors, both of which have chosen Watson, IBM’s cognitive learning computer, to develop programs that would harness artificial intelligence to help businesses and government acquisitions officials work through the mind-numbing system.

The write up identifies one of the vendors working on IBM Watson for the US Air Force. The company is Applied Research.

I circled this quote: The Pentagon’s procurement system is the “perfect application for Watson.”

The goslings and I love “perfect” applications.

How does Watson learn about procurement? The approach is essentially the method used in the mid 1990s by Autonomy IDOL. Here’s a passage I highlighted:

But first Watson must be trained. The first step is to feed it all the relevant documents. Then its digital intellect will be molded by humans, asking question after question, about 5,000 in all, to help understand context and the particular nuance that comes with federal procurement law.

How does this IBM deal fit into the Palantir versus IBM interaction? That’s a good question. What is clear is that the US Air Force has embraced a solution which includes systems and methods first deployed two decades ago.

What’s that about the pace of technology?

Stephen E Arnold, March 25, 2016

Expert System Is Getting with the AI Boomlet

March 17, 2016

I read “Cognitive Computing Specialist Expands US R&D.” The company is Expert System, founded in Modena (not Bologna) in 1989. The company will be celebrating its 27th birthday this year. Apart from Lexmark ISYS and OpenText’s Fulcrum, Expert System is one of the most senior vendors of semantic technology. To respond to the vocabulary of IBM Watson, Expert System is now billing itself as a “cognitive computing specialist.”

The passage I highlighted with a quarter century old marker I found in my Expert System file box was:

The new labs in Palo Alto, California., and Rockville, Maryland., will focus on expanding the company’s Cogito cognitive computing software, the Italian company (EXSY.MI) said Tuesday (March 15). The U.S. locations expand the network of Cogito Labs that includes three in Italy along with facilities in Grenoble, France, and Madrid.

That’s a lot of research laboratories for a company whose share price has only recently blipped above $1.97. See this Google Finance chart. In the past six months, the company has deemphasized its “semantic” positioning and embraced the “cognitive” buzzword.

Other notable developments include:

  1. Breaking the company into two separate units. This news arrived in October 2015. See “Expert System Announces Plans to Structure U.S. Presence into Two Separate Companies for Public and Private Sectors.” The announcement followed hard on the heels of Expert System’s acquisition of the Temis outfit. Temis was created by a former IBM whiz but ran into a revenue ceiling several years ago. The Temis DNA may explain the “cognitive” appellation. I won’t go into the Watson-esque heritage. Just think rules. Training. Lots of time and human resources.
  2. A push into the high growth security sector. See “Expert System Launches Cogito Risk Watcher Software.” With the struggles some cybersecurity outfits are facing (example, Norse), one would think cybersecurity might be a somewhat crowded sector. In our research for the forthcoming “Dark Web Notebook,” we logged many references to Terbium Labs and Recorded Future, among others. We did not locate a single reference to Expert System’s Risk Watcher. Perhaps our research is incomplete?
  3. A deal with Quantic, a company with security intelligence solutions. See “Expert System Partners with Quantic Research for Security Intelligence Solutions.” Quantic Research is a subsidiary of the Holding Nivi Group. The Nivi Group is interesting. Here’s the message Google displays about the organization’s Web site:

The Google warning for Nivi Group. March 16, 2016.

Interesting relationships. Expert System may want to do some checking to make sure that references in write ups about their innovations do not trigger oddball Google alerts.

To sum up, Expert System will be competing in some hot markets for top research talent. Maybe the downturn in unicorn valuations will free up some human resources for Expert System to hire?

The company is definitely lab rich. The stock price suggests that revenue may be less fecund.

Stephen E Arnold, March 17, 2016

Text Analytics: Crazy Numbers Just Like the Good Old Days of Enterprise Search

March 16, 2016

Short honk: Want a growth business in a niche function that supports enterprise platforms? Well, gentle reader, look no farther than text analytics. Get your checkbook out and invest in this remarkable sector. It will be huuuuge.

Navigate to “Text Analytics Market to Account for US$12.16 bn in Revenue by 2024.”  What is text analytics? How big is text analytics today? How long has text analytics been a viable function supporting content processing?

Ah, good questions, but what’s really important is this passage:

According to this report, the global text analytics market revenue stood at US$2.82 bn in 2015 and is expected to reach US$12.16 bn by 2024, at a CAGR of 17.6% from 2016 to 2024.

I love these estimates. Imagine. Close out your life savings and invest in text analytics. You will receive a CAGR of 17.6 percent which you can cash in and buy stuff in 2024. That’s just eight years.
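A quick back-of-the-envelope check of the quoted numbers, using the standard compound annual growth rate formula. The function names are mine, and the number of compounding periods is an assumption: the report does not say whether the clock starts at the 2015 base figure (nine periods to 2024) or in 2016 (eight).

```python
def cagr(start_value: float, end_value: float, periods: int) -> float:
    """Compound annual growth rate: (end / start) ** (1 / periods) - 1."""
    if start_value <= 0 or periods <= 0:
        raise ValueError("start_value and periods must be positive")
    return (end_value / start_value) ** (1 / periods) - 1


def project(start_value: float, rate: float, periods: int) -> float:
    """Grow start_value at a fixed annual rate for the given number of periods."""
    return start_value * (1 + rate) ** periods


if __name__ == "__main__":
    # Figures quoted from the report: US$2.82 bn (2015) and US$12.16 bn (2024).
    for periods in (9, 8):
        implied = cagr(2.82, 12.16, periods)
        grown = project(2.82, 0.176, periods)
        print(f"{periods} periods: implied CAGR {implied:.1%}, "
              f"2.82 bn at 17.6% grows to {grown:.2f} bn")
```

Nine periods of compounding from the 2015 figure land within a rounding error of the report’s 17.6 percent. Eight periods would need a rate of roughly 20 percent to reach US$12.16 bn. The spreadsheet is at least internally consistent, which says nothing about whether the inputs are real.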

Worried about the economy? Want to seek the safe shelter of bonds? Forget the worries. If text analytics is so darned hot, why is the consulting firm pitching this estimate writing reports? Why not invest in text analytics?

Answer: Maybe the estimate is a consequence of spreadsheet fever?

Text analytics is a rocket just like the ones Jeff Bezos will use to carry you into space.

Stephen E Arnold, March 16, 2016

Xerox and Paper: A Lasting Love Affair

March 14, 2016

I can envision a person with a document in one language and no way to create a digital version so it can be pumped into an online translation service. Granted I have to think hard for scenarios outside of scriptwriters for the Jason Bourne films or some other low probability activity like a hard working person working in a government office in Bulgaria. But paper? Really?

I read “Xerox Adds Instant Translator Feature to Some of Its Printers.” The idea is that one puts in a page, the system digitizes the page, does the optical character recognition thing, and generates a version in the user’s language. Well, that’s the theory.

The write up says:

Just scan the original document, and the machine will instantly print it out in the language you choose among the 40 available.

Yeah, but what if the source language is one that is not supported? Well, there’s dear, old Google Translate with 100 or so languages.

Xerox. A pacesetter. How quickly will this function become available in Canon and Epson printers?

Stephen E Arnold, March 14, 2016

Online Translation: Google or Microsoft?

March 1, 2016

I have solved the translation problem. I live in Harrod’s Creek, Kentucky. Folks here speak Kentucky. No other language needed. However, gentle reader, you may want to venture into lands where one’s native language is not spoken or written. You will need online translation.

Should I forget Systran and other industrial strength solutions of yesteryear? Today the choice is Google or Microsoft if I understand “2 Main Reasons Why Google Translate Is Ahead of Microsoft and Skype.” (The link worked on February 22, 2016. If it does not work when you read this blog post, you may have to root around. That’s life in the zip zip world today.)

Reason one is that Google supports more languages than Microsoft. The total is 100 plus. The write up is sufficiently amazed to describe the language support of the Alphabet Google thing as “mind blowing.” Okay.

Reason two is that Google’s translation function works on smartphones. The write up points out:

You can hand-write, speak, type, or even take a picture of a given language and Google Translate will translate it for you. Not only this but on Android, some of the translation features are available offline. So, some features are accessible even if you do not have access to the internet.

The write up does not dig too deeply into Microsoft’s translation capability. If you are interested in Microsoft’s quite capable and useful services, navigate to the Microsoft Language Portal. Google is okay, but one service may not do the job a person who does not speak Kentucky requires.

Stephen E Arnold, February 27, 2016
