Smartlogic: A Buzzword Blizzard

August 2, 2017

I read “Semantic Enhancement Server.” Interesting stuff. The technology struck me as a cross between indexing, good old enterprise search, and assorted technologies. Individuals who are shopping for an automatic indexing system (either with expensive, time consuming hand coded rules or a more Autonomy-like automatic approach) will want to kick the tires of the Smartlogic system. In addition to the echoes of the SchemaLogic approach, I noted a Thompson submachine gun firing buzzwords; for example:

best bets (I’m feeling lucky?)
dynamic summaries (like Island Software’s approach in the 1990s)
faceted search (hello, Endeca?)
navigator (like the Siderean “navigator”?)
real time
related topics (clustering like Vivisimo’s)
semantic (of course)
topic maps
topic pages (a Google report as described in US29970198481)
topic path browser (aka breadcrumbs?)

What struck me after I compiled this list about a system that “drives exceptional user search experiences” was that Smartlogic is repeating the marketing approach of traditional vendors of enterprise search. The marketing lingo and “one size fits all” triggered thoughts of Convera, Delphes, Entopia, Fast Search & Transfer, and Siderean Software, among others.

I asked myself:

Is it possible for one company’s software to perform such a remarkable array of functions in a way that is easy to implement, affordable, and scalable? There are industrial strength systems which perform many of these functions. Examples range from BAE’s intelligence system to the Palantir Gotham platform.

My hypothesis is that Smartlogic might struggle to process a real time flow of WhatsApp messages, YouTube content, and mobile phone intercept voice calls. Toss in the multi language content which is becoming increasingly important to enterprises, and the notional balloon I am floating says, “Generating buzzwords and associated over inflated expectations is really easy. Delivering high accuracy, affordable, and scalable content processing is a bit more difficult.”

Perhaps Smartlogic has cracked the content processing equivalent of the Voynich manuscript.


Will buzzwords crack the Voynich manuscript’s inscrutable text? What if Voynich is a fake? How will modern content processing systems deal with this type of content? Running some content processing tests might provide some insight into systems which possess Watson-esque capabilities.

What happened to those vendors like Convera, Delphes, Entopia, Fast Search & Transfer, and Siderean Software, among others? (Free profiles of these companies are available at Oh, that’s right. The reality of the marketplace did not match the companies’ assertions about technology. Investors and licensees of some of these systems were able to survive the buzzword blizzard. Some became the digital equivalent of Ötzi, the 5,300 year old iceman.

Stephen E Arnold, August 2, 2017

AI Not to Replace Lawyers, Not Yet

May 9, 2017

Robot or AI lawyers may be effective in locating relevant cases for references, but they are far away from replacing lawyers, who still need to go to the court and represent a client.

ReadWrite in a recently published analytical article titled Look at All the Amazing Things AI Can (and Can’t yet) Do for Lawyers says:

Even if AI can scan documents and predict which ones will be relevant to a legal case, other tasks such as actually advising a client or appearing in court cannot currently be performed by computers.

The author further explains what the present generation of AI tools or robots does. They merely find relevant cases based on indexing and keywords, a task that used to be time-consuming and cumbersome. Thus, what robots do is eliminate the tedious work that was performed by interns or lower level employees. Lawyers still need to collect evidence, prepare the case and argue in court to win it. The robots are coming, but only to do lower level jobs, not to snatch lawyers’ jobs.

Vishol Ingole, May 9, 2017

Palantir Technologies: A Beatdown Buzz Ringing in My Ears

April 27, 2017

I have zero contacts at Palantir Technologies. The one time I valiantly contacted the company about a speaking opportunity at one of my wonky DC invitation-only conferences, a lawyer from Palantir referred my inquiry to a millennial who had a one word vocabulary, “No.”

There you go.

I have written about Palantir Technologies because I used to be an adviser to the pre-IBM incarnation of i2 and its widely used investigation tool, Analyst’s Notebook. I did write about a misadventure between i2 Group and Palantir Technologies, but no one paid much attention to my commentary.

An outfit called Buzzfeed, however, does pay attention to Palantir Technologies. My hunch is that the online real news outfit believes there is a story in the low profile, Peter Thiel-supported company. The technology Palantir has crafted is not that different from the Analyst’s Notebook, Centrifuge Systems’ solution, and quite a few other companies which provide industrial-strength software and systems to law enforcement, security firms, and the intelligence community. (I list about 15 of these companies in my forthcoming “Dark Web Notebook.” No, I won’t provide that list in this free blog. I may be retired, but I am not giving away high value information.)

So what’s caught my attention? I read the article “Palantir’s Relationship with the Intelligence Community Has Been Worse Than You Think.” The main idea is that the procurement of Palantir’s Gotham and supporting services provided by outfits specializing in Palantir systems has not been sliding on President Reagan’s type of Teflon. The story has been picked up and recycled by several “real” news outfits; for example, Brainsock. The story meshes like matryoshkas with other write ups; for example, “Inside Palantir, Silicon Valley’s Most Secretive Company” and “Palantir Struggles to Retain Clients and Staff, BuzzFeed Reports.” Palantir, it seems to me in Harrod’s Creek, is a newsy magnet.

The write up about Palantir’s lousy relationship with the intelligence community pivots on a two year old video. I learned that the Big Dog at Palantir, Alex Karp, said in a non public meeting which some clever Hobbit type videoed on a smartphone words presented this way by the real news outfit:

The private remarks, made during a staff meeting, are at odds with a carefully crafted public image that has helped Palantir secure a $20 billion valuation and win business from a long list of corporations, nonprofits, and governments around the world. “As many of you know, the SSDA’s recalcitrant,” Karp, using a Palantir codename for the CIA, said in the August 2015 meeting. “And we’ve walked away, or they walked away from us, at the NSA. Either way, I’m happy about that.” The CIA, he said, “may not like us. Well, when the whole world is using Palantir they can still not like us. They’ll have no choice.” Suggesting that the Federal Bureau of Investigation had also had friction with Palantir, he continued, “That’s de facto how we got the FBI, and every other recalcitrant place.”

Okay, I don’t know the context of the remarks. It does strike me that 2015 was more than a year ago. In the zippy doo world of Sillycon Valley, quite a bit can change in one year.

I don’t know if you recall Paul Doscher who was the CEO of Exalead USA and Lucid Imagination (before the company asserted that its technology actually “works”). Mr. Doscher is a good speaker, but he delivered a talk in 2009, captured on video, during which he was interviewed by a fellow in a blue sport coat and shirt. Mr. Doscher wore a baseball cap in gangsta style, a crinkled unbuttoned shirt, and evidenced a hipster approach to discussing travel. Now if you know Mr. Doscher, he is not a manager influenced by gangsta style. My hunch is that he responded to an occasion, and he elected to approach travel with a bit of insouciance.

Could Mr. Karp, the focal point of the lousy relationship article, have been responding to an occasion? Could Mr. Karp have adopted a particular tone and style to express frustration with US government procurement? Keep in mind that a year later, Palantir sued the US Army. My hunch is that views expressed in front of a group of employees may not be news of the moment. Interesting? Sure.

What I find interesting is that the coverage of Palantir Technologies does not dig into the parts of the company which I find most significant. To illustrate: Palantir has a system and method for an authorized user to add new content to the Gotham system. The approach makes it possible to generate an audit trail to make it easy (maybe trivial) to answer these questions:

  1. What data were added?
  2. When were the data added?
  3. What person added the data?
  4. What index terms were added to the data?
  5. What entities were added to the metadata?
  6. What special terms or geographic locations were added to the data?

You get the idea. Palantir’s Gotham brings to intelligence analysis the type of audit trail I found so compelling in the Clearwell system and other legal oriented systems. Instead of a person in information technology saying in response to a question like “Where did this information come from?”, “Duh. I don’t know.”

Gotham gets me an answer.
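
To make the six questions concrete, here is a minimal Python sketch of what an audit record could capture. This is an illustration only, not Palantir’s design. I have no view into Gotham’s internals, so the class names AuditRecord and AuditLog, the field names, and the provenance method are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditRecord:
    """One hypothetical audit entry; not Palantir's actual schema."""
    item_id: str             # what data were added
    added_at: datetime       # when the data were added
    added_by: str            # what person added the data
    index_terms: tuple = ()  # index terms attached to the data
    entities: tuple = ()     # entities added to the metadata
    locations: tuple = ()    # special terms or geographic locations


class AuditLog:
    """Append-only log of audit entries (illustrative only)."""

    def __init__(self):
        self._records = []

    def record(self, entry):
        self._records.append(entry)

    def provenance(self, item_id):
        """Answer 'Where did this information come from?' for one item."""
        return [r for r in self._records if r.item_id == item_id]


log = AuditLog()
log.record(AuditRecord(
    item_id="report-001",
    added_at=datetime(2017, 4, 27, 9, 30, tzinfo=timezone.utc),
    added_by="analyst_a",
    index_terms=("procurement",),
    entities=("US Army",),
    locations=("Washington, DC",),
))
history = log.provenance("report-001")
```

The point of the frozen record and the append-only list is that nothing gets edited after the fact. The answer to “who added this and when” is whatever was written down at the moment the content went in.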

For me, explaining the reasoning behind Palantir’s approach warrants a write up. I think quite a few people struggling with problems of data quality and what is called by the horrid term “governance” would find Palantir’s approach of some interest.

Now do I care about Palantir? Nah.

Do I care about bashing Palantir? Nah.

What I do care about is tabloidism taking precedence over substantive technical approaches. From my hollow in rural Kentucky, I see folks looking for “sort of” information.

How about more substantive information? I am fed up with podcasts which recycle old information with fake good cheer. I am weary of leaks. I want to know about Palantir’s approach to search and content processing and have its systems and methods compared to what its direct competitors purport to do.

Yeah, I know this is difficult to do. But nothing worthwhile comes easy, right?

I can hear the millennials shouting, “Wrong, you dinosaur.” Hey, no problem. I own a house. I don’t need tabloidism. I have picked out a rest home, and I own 60 cemetery plots.

Do your thing, dudes and dudettes of “real” journalism.

Stephen E Arnold, April 27, 2017

Palantir Technologies: 9000 Words about a Secretive Company

April 3, 2017

Palantir Technologies is a search and content processing company. The technology is pretty good. The company’s marketing is pretty good. Its public profile is now darned good. I don’t have much to say about Palantir’s wheel interface, its patents, or its usefulness to “operators.” If you are not familiar with the company, you may want to read or at least skim the weirdo Fortune Magazine Web article “Donald Trump, Palantir, and the Crazy Battle to Clean Up a Multibillion Dollar Military Procurement Swamp.” The subtitle is a helpful statement:

Peter Thiel’s software company says it has a product that will save soldiers’ lives—and hundreds of millions in taxpayer funds. The Army, which has spent billions on a failed alternative, isn’t interested. Will the president and his generals ride to the rescue?

The article, minus the pull quotes, is more than 9000 words long. The net net of the write up is that the US government’s method of purchasing goods and services may be tough to modify. I used to work at a Beltway Bandit outfit. Legend has it that my employer helped set up the US Department of the Navy and many of the business processes so many contractors know and love.

One has to change elected officials, government professionals who operate procurement processes, outfits like Beltway Bandits, and assorted legal eagles.

Why take 9000 words to reach this conclusion? My hunch is that the journey was fun: Fun for the Fortune Magazine staff, fun for the author, and fun for the ad sales person who peppered the infinite page with ads.

Will Palantir Technologies enjoy the write up? I suppose it depends on whom one asks. Perhaps a reader connected to IBM could ask Watson about the Analyst’s Notebook team. What are their views of Palantir? For most folks, my thought is that the Palantir connection to President Trump may provide a viewshed from which to assess the impact of this real journalism essay thing.

Stephen E Arnold, April 3, 2017

Is Google Plucking a Chicken Joint?

March 14, 2017

Real chicken or fake news? You decide. I read “Google, What the H&%)? Search Giant Wrongly Said Shop Closed Down, Refused to List the Truth.” The write up reports that a chicken restaurant is clucking mad about how Google references the eatery. The Google, according to the article, thinks the fowl peddler is out of business. The purveyor of poultry disagrees.

The write up reports:

Kaie Wellman says that her rotisserie chicken outlet Arrosto, in Portland, Oregon, US, was showing up as “permanently closed” on Google’s mobile search results.

Ms Wellman contacted the Google and allegedly learned that Google would not change the listing. The fix seems to be that the bird roaster has to get humans to input data via Google Maps. The smart Google system will recognize the inputs and make the fix.

The write up reports that the Google listing is now correct. The fowl mix up is now resolved.

Yes, the Google. Relevance, precision, recall, and accuracy. Well, maybe not so much of these ingredients when one is making fried mobile outputs.

Stephen E Arnold, March 14, 2017

Index Is Important. Yes, Indexing.

March 8, 2017

I read “Ontologies: Practical Applications.” The main idea in the write up is that indexing is important. Now indexing is labeled in different ways today; for example, metadata, entity extraction, concepts, etc. I agree that indexing is important, but the challenge is that most people are happy with tags, keywords, or systems which return a result that has made a high percentage of users happy. Maybe semi-happy. Who really knows? Asking about search and content processing system satisfaction returns the same grim news year after year; that is, most users (roughly two thirds) are not thrilled with the tools available to locate information. Not much progress in 50 years it seems.

The write up informs me:

Ontologies are a critical component of the enterprise information architecture. Organizations must be capable of rapidly gathering and interpreting data that provides them with insights, which in turn will give their organization an operational advantage.  This is accomplished by developing ontologies that conceptualize the domain clearly, and allows transfer of knowledge between systems.

This seems to mean a classification system which makes sense to those who work in an organization. The challenge which we have encountered over the last half century is that the content and data flowing into an organization changes often rapidly over time. At any one point in time, the information today is not available. The organization sucks in what’s needed and hopes the information access system indexes the new content right away and makes it findable and usable in other software.

That’s the hope anyway.

The reality is that a gap exists between what’s accessible to a person in an organization and what information is being acquired and used by others in the organization. Search fails for most system users because what’s needed now is not indexed or if indexed, the information is not findable.

An ontology is a fancy way of saying that a consultant and software can cook up a classification system and use those terms to index content. Nifty idea, but what about that gap?
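
A toy Python sketch of the idea, gap included. The two-concept ontology, its term lists, and the function name index_with_ontology are made up for illustration. A real system would use a far richer model than word matching.

```python
# Hypothetical mini-ontology: concept -> surface terms that signal it.
ONTOLOGY = {
    "enterprise search": {"search", "retrieval", "query"},
    "taxonomy": {"taxonomy", "classification", "ontology"},
}


def index_with_ontology(text, ontology=ONTOLOGY):
    """Return the sorted concepts whose terms appear in the text."""
    words = {w.strip(".,;:!?\"'()").lower() for w in text.split()}
    return sorted(c for c, terms in ontology.items() if words & terms)


# Content the ontology anticipates gets concept labels.
known = index_with_ontology("A taxonomy improves query results.")

# New content about a topic nobody modeled gets no labels at all.
gap = index_with_ontology("WhatsApp intercepts arrive in real time.")
```

The first document picks up both concepts. The second comes back with an empty list, which means it is in the system but not findable through the classification scheme. That empty list is the gap.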

This is the killer for most indexing outfits. They make a sale because people are dissatisfied with the current methods of information access. An ontology or some other jazzed up indexing component is sold as the next big thing.

When an ontology, taxonomy, or other solution does not solve the problem, the company grouses about search and content processing again.

Is there a fix? Who knows. But after 50 years in the information access sector, I know that jargon is not an effective way to solve very real problems. Money, know how, and old school methods are needed to make certain technologies deliver useful applications.

Ontologies. Great. Silver bullet. Nah. Practical applications? Nifty concept. Reality is different.

Stephen E Arnold, March 8, 2017

Forecasting Methods: Detail without Informed Guidance

February 27, 2017

Let’s create a scenario. You are a person trying to figure out how to index a chunk of content. You are working with cancer information sucked down from PubMed or a similar source. You run an extraction process and push the text through an indexing system. You use a system like Leximancer and look at the results. Hmmm.

Next you take a corpus of blog posts dealing with medical information. You suck down the content and run it through your extractor, your indexing system, and your Leximancer set up. You look at the results. Hmmm.

How do you figure out what terms are going to be important for your next batch of mixed content?

You might navigate to “Selecting Forecasting Methods in Data Science.” The write up does a good job of outlining some of the numerical recipes taught in university courses and discussed in textbooks. For example, you can get an overview in this nifty graphic:


And you can review outputs from the different methods identified like this:



What’s missing? For the person floundering away like an employee of one government agency at which I worked years ago, you pick the trend line you want. Then you try to plug in the numbers and generate some useful data. If that is too tough, you hire your friendly GSA schedule consultant to do the work for you. Yep, that’s how I ended up looking at:

  • Manually selected data
  • Lousy controls
  • Outputs from different systems
  • Misindexed text
  • Entities which were not really entities
  • A confused government employee.

Here’s the takeaway. Just because software is available to output stuff in a log file and Excel makes it easy to wrangle most of the data into rows and columns, none of the information may be useful, valid, or even in the same ball game.

When one then applies different forecasting methods without understanding them, we have an example of how an individual can create a pretty exciting data analysis.

Descriptions of algorithms do not correlate with high value outputs. Data quality, sampling, understanding why curves are “different”, and other annoying details don’t fit into some busy work lives.
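
A small Python illustration of why curves are “different”: the same numbers pushed through two textbook methods yield two different forecasts. The data are invented, and the two methods (a plain average and a least squares trend line) are my picks for the sketch, not ones taken from the article.

```python
def mean_forecast(series):
    """Forecast the next value as the average of the history."""
    return sum(series) / len(series)


def linear_trend_forecast(series):
    """Fit y = a + b*t by ordinary least squares and extrapolate one step."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    slope = (
        sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
        / sum((t - t_mean) ** 2 for t in range(n))
    )
    intercept = y_mean - slope * t_mean
    return intercept + slope * n  # value predicted for the next time step


counts = [10, 12, 14, 16, 18]  # invented term counts per content batch
flat = mean_forecast(counts)
trend = linear_trend_forecast(counts)
```

On this steadily rising series the average says 14 and the trend line says 20. Neither number is wrong as arithmetic. Which one is useful depends on whether the data really trend, and no log file or spreadsheet will answer that for you.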

Stephen E Arnold, February 27, 2017

Intellisophic / Linkapedia

February 24, 2017

Intellisophic identifies itself as a Linkapedia company. Poking around Linkapedia’s ownership revealed some interesting factoids:

  • Linkapedia is funded in part by GITP Ventures and SEMMX (possibly a Semper fund)
  • The company operates in Hawaii and Pennsylvania
  • One of the founders is a monk / Zen master. (Calm is a useful characteristic when trying to spin money from a search machine.)

First, Intellisophic. The company describes itself this way at this link:

Intellisophic is the world’s largest provider of taxonomic content. Unlike other methods for taxonomy development that are limited by the expense of corporate librarians and subject matter experts, Intellisophic content is machine developed, leveraging knowledge from respected reference works. The taxonomies are unbounded by subject coverage and cost significantly less to create. The taxonomy library covers five million topic areas defined by hundreds of millions of terms. Our taxonomy library is constantly growing with the addition of new titles and publishing partners.

In addition, Intellisophic’s technology—Orthogonal Corpus Indexing—can identify concepts in large collections of text. The system can be used to enrich an existing technology, business intelligence, and search. One angle Intellisophic exploits is its use of reference and educational books. The company is in the “content intelligence” market.

Second, the “parent” of Intellisophic is Linkapedia. This public facing Web site allows a user to run a query and see factoids, links about a topic. Plus, Linkapedia has specialist collections of content bundles; for example, lifestyle, pets, and spirituality. I did some clicking around and found that certain topics were not populated; for instance, Lifestyle, Cars, and Brands. No brand information appeared for me. I stumbled into a lengthy explanation of the privacy policy related to a mathematics discussion group. I backtracked, trying to get access to the actual group and failed. I think the idea is an interesting one, but more work is needed. My test query for “enterprise search” presented links to Convera and a number of obscure search related Web sites.

The company is described this way in Crunchbase:

Linkapedia is an interest based advertising platform that enables publishers and advertisers to monetize their traffic, and distribute their content to engaged audiences. As opposed to a plain search engine which delivers what users already know, Linkapedia’s AI algorithms understand the interests of users and helps them discover something new they may like even if they don’t already know to look for it. With Linkapedia content marketers can now add Discovery as a new powerful marketing channel like Search and Social.

Like other search related services, Linkapedia uses smart software. Crunchbase states:

What makes Linkapedia stand out is its AI discovery engine that understands every facet of human knowledge. “There’s always something for you on Linkapedia”. The way the platform works is simple: people discover information by exploring a knowledge directory (map) to find what interests them. Our algorithms show content and native ads precisely tailored to their interests. Linkapedia currently has hundreds of million interest headlines or posts from the worlds most popular sources. The significance of a post is that “someone thought something related to your interest was good enough to be saved or shared at a later time.” The potential of a post is that it is extremely specific to user interests and has been extracted from recognized authorities on millions of topics.

Interesting. Search positioned as indexing, discovery, social, and advertising.

Stephen E Arnold, February 24, 2017

Mondeca: Tweaking Its Market Position

February 22, 2017

One of the Beyond Search goslings noticed a repositioning of the taxonomy capabilities of Mondeca. Instead of pitching indexing, the company has embraced ElasticSearch (based on Lucene) and Solr. The idea is that if an organization is using either of these systems for search and retrieval, Mondeca can provide “augmented” indexing. The idea is that keywords are not enough. Mondeca can index the content using concepts.

Of course, the approach is semantic, permits exploration, and enables content discovery. Mondeca’s Web site describes search as “find” and explains:

Initial results are refined, annotated and easy to explore. Sorted by relevancy, important terms are highlighted: easy to decide which one are relevant. Sophisticated facet based filters. Refining results set: more like this, this one, statistical and semantic methods, more like these: graph based activation ranking. Suggestions to help refine results set: new queries based on inferred or combined tags. Related searches and queries.

This is a similar marketing move to the one that Intrafind, a German search vendor, implemented several years ago. Mondeca continues to offer its taxonomy management system. Human subject matter experts do have a role in the world of indexing. Like other taxonomy systems and services vendors, the hook is that content indexed with concepts is smart. I love it when indexing makes content intelligent.

The buzzword is used by outfits ranging from MarkLogic’s merry band of XML and XQuery professionals to the library-centric outfits like Smartlogic. Isn’t smart logic better than logic?

Stephen E Arnold, February 22, 2017

The Pros and Cons of Human Developed Rules for Indexing Metadata

February 15, 2017

The article on Smartlogic titled The Future Is Happening Now puts forth the Semaphore platform as the technology filling the gap between NLP and AI when it comes to conversation. The article posits that in spite of the great strides in AI in the past 20 years, human speech is one area where AI still falls short. The article explains,

The reason for this, according to the article, is that “words often have meaning based on context and the appearance of the letters and words.” It’s not enough to be able to identify a concept represented by a bunch of letters strung together. There are many rules that need to be put in place that affect the meaning of the word; from its placement in a sentence, to grammar and to the words around – all of these things are important.

Advocating human developed rules for indexing is certainly interesting, and the author compares this logic to the process of raising her children to be multi-lingual. Semaphore is a model-driven, rules-based platform that allows us to auto-generate usage rules in order to expand the guidelines for a machine as it learns. The issue here is cost. Indexing large amounts of data is extremely cost-prohibitive, and that is before the maintenance of the rules even becomes part of the equation. In sum, this is a very old school approach to AI that may make many people uncomfortable.

Chelsea Kerwin, February 15, 2017
