Palantir: Now an Enterprise App Developer

September 30, 2014

I read “Hush Hush Data Firm Palantir Snags ICE Case Tracking Deal.” Palantir may be moving from supporting intelligence agencies to the market sector dominated by government contractors like SRA, Booz Allen Hamilton, and CACI.

The article states:

Immigration and Customs Enforcement has awarded secretive data-mining firm Palantir a $42 million contract to redo the investigation agency’s failed case filing system.

The challenge will be to make a case management system work in a manner that satisfies the statement of work. Other case management efforts have crashed and burned.

Palantir appears to be working with a tough mandate: On time and on budget delivery. As you may know, the notion of on time and on budget is only valid until the first scope change rolls down the timeline.

Are flaws in case management systems unusual? Nah. The article reveals:

The Justice Department inspector general last week released a report on the FBI’s new case management system, Sentinel, assailing its searching and indexing features for slowing the investigations of special agents and the productivity levels of evidence technicians.

Why are case management systems problematic? I can identify a number of reasons, but it will be more entertaining if I wait for news about the Palantir project’s path.

Stephen E Arnold, October 17, 2014

Cheerleading for dtSearch

September 30, 2014

Short honk: Want to read how one dtSearch user just loves the desktop search system to death? Navigate to “dtSearch: How to handle Big or even Biggish Data.” What strikes me is the write up’s creation of a new buzzword: biggish. dtSearch has been around since 1991 and is now at version 7. The system was once Microsoft centric, but a version for Android is allegedly in beta testing.

The write up states:

The performance of dtSearch is truly impressive and the fact that it’s not only fast but can handle Big Data makes it ideal for all sorts of heavy lifting searches as well as digital forensics; indeed, the company has extensive advice on how to use dtSearch for just that purpose.

There are a few apparently minor downsides:

There are some things dtSearch doesn’t do such as exporting the data from only one or more indexed fields (for example, just “Sender” and “Date”) although exporting to CSV and importing into Excel allows you to slice and dice the data with ease. My only other criticism of dtSearch is that its user interface looks a little dated.

No information about the time required to add additional content to the index. What happens when dtSearch hits a Drobo with a terabyte of text? Answer: it takes days to index the content collection.

The big plus for dtSearch is not mentioned. In my opinion, dtSearch is one of the few remaining commercial personal desktop search solutions. Exalead and ISYS Search Software have left the field of battle. Freeware and shareware products have an odd predilection to crash and burn.

Check out dtSearch at

Stephen E Arnold, September 30, 2014

Search: Just an Activity

September 30, 2014

Well, this is going to be a surprise for some folks at Google. After building a brand and habit for the search box at, search is just an activity. I learned this in “Search Is No Longer a Destination. It’s an Activity.”

If I am an advertiser using AdWords or Facebook’s mechanism, I just want sales. Does the shift from destination to activity increase the value of a Facebook ad versus a Google ad?

The article points out:

Search engines have always had a hard time differentiating themselves to the masses. While digital marketers love analyzing the differences between algorithms, targeting methods, and result page layouts, the average person can’t tell much of a difference. That’s why for years “” was one of the top searches on Yahoo. That’s why despite some very clever (in my opinion) “Bing It On” TV commercials and some great case studies, Bing has had a very difficult time winning search traffic away from Google. As long as users aren’t dissatisfied with the results, they’ll keep searching wherever is convenient – often without even realizing what search engine they’re using.

Well, I am not sure that “always” is exactly on target. I think Chemical Abstracts differentiates itself quite well from Bing, Google, and a query about torts passed against Lexis. I know. I know. The article is aimed at folks who think about search in terms of Google, not the context of search and its more uninteresting manifestations.

The one point that I noted as fodder for my files was this one:

Context is the key element that powers these new search experiences. While some still contain a box where you can enter a query, their core functionality is around understanding and anticipating the searcher’s needs in the moment based on secondary signals like location, history, and other personal data the user chooses to share. And should the user need answers outside of this proactive information, voice search is the primary point of interaction.

I suppose I should be cheered that Delve, Microsoft’s search for Office 365, is going to get some blogger love. I am not exactly sure how a person looking for specific information will go about that task if accounts to commercial databases are not affordable and information access becomes an app.

I do not need to worry. The author provides this glimpse of the benefits of the death of traditional search:

No matter what format search marketing may take in the future, brands that build their strategy around providing valuable answers to their customers’ questions will continue to drive success in search – regardless of how the consumer searches, or if they even know what engine they’re using.

Right. When someone looks for a household cleanser, those ads for big name consumer products will fill the bill. How reassuring.

Stephen E Arnold, September 30, 2014

Analytics Troubled by Bottlenecks. Impossible.

September 30, 2014

The hyperbole artists have painted themselves into a corner. I am not sure too many folks know this. The idea that one can crank out killer analyses with a couple of swipes or a mouse click is raising expectations. Like so much in content processing, reality is just a little bit different.

You know that the slips twixt cup and lip must be cropping up in numerous organizations. The Harvard Business Review does not write too much science fiction compared to MIT’s Technology Review across the river.

“Beware the Analytics Bottleneck” adopts the same MBA tone that makes Wall Street bankers and lawyers so beloved by the common man and delivers what might be a downside.

The write up states:

“Don’t be overwhelmed. Start slower to go faster.” I think that runs counter to the baloney in the Eric Schmidt Google tome.

Next the HBR wants to keep life simple for the busy one percenters:

Technology doesn’t have to be exposed. Keep the complexity behind the curtain. Definitely good advice if one does not know whether the data are valid and the numerical recipes are configured in an appropriate manner.

Then the golden piece of advice for the go go MBA looking for a payday so he or she can pursue his or her dream of helping people or just spending money:

Make faster decisions for faster rewards.

That’s a sure fire way to break through bottlenecks. Use the outputs to support really fast decisions. Forget that pondering stuff. Just guess.

What’s scary is that when some folks have a tiny bit of knowledge, their deliberations can yield disastrous decisions. Need some examples? Well, do some thinking. How about GM and ignition switches? What about IRS actions and email mysteries? Or multi billion dollar acquisitions that lead to multi billion dollar write offs shortly after handing over the dump trucks filled with cash?

My take on this write up is that the “expert” did not focus on the bottlenecks that Big Data often produce like sex crazed hamsters:

  1. The time and cost to normalize and validate data
  2. The complexity of updating indexes so that reports reflect the most recent data, not stale data
  3. Dealing with the configuration decisions that generate outputs that are just plain wrong
  4. The money spent to get a system back online when it crashes, whether it is an old fashioned on premises flame out or one of the nifty new cloud systems that are virtual and allegedly fool proof.

In short, Big Data and analytics pose some very significant challenges for vendors, licensees, and those who use the systems. The good news is that guessing will probably produce better results than reasoning through a decision based on flawed information. The bad news is that fancy content processing systems are likely to gobble budgets and increase certain operational costs.

The HBR obviously does not agree. Well, the fellows around the cast iron stove in Harrod’s Creek, Kentucky, find my observations directly on point.

Stephen E Arnold, September 30, 2014

Among More Changes Connotate Adds New Leader

September 30, 2014

Connotate has been going through many changes in 2014. According to Virtual Strategy, the company can add a new leader to the list: “Connotate Appoints Rich Kennelly As Chief Executive.” Connotate sells big data technology, specializing in enterprise grade Web data harvesting services. The newest leader for the company is Richard J. Kennelly. Kennelly has worked in the IT sector for over twenty years. Most of his experience has been helping developing businesses harness the Internet and data. He has worked at Ipswitch and Akamai Technologies, holding leadership roles at both companies.

Kennelly is excited about his new position:

“ ‘This is the perfect time to join Connotate,’ said Kennelly. ‘The Web is the largest data source ever created.  The biggest brands are moving quickly to leverage that data to drive competitive advantage and create new revenue streams. Connotate’s patented technology, scalability, and deep technical expertise make us the natural choice for these forward thinking companies.’”

The rest of the quote includes a small, but impressive client list, more praise for Kennelly, and how Connotate is a leading big data company.

If Connotate did not have good products and services, then they would not keep their clients. Despite the big names, they are still going through financial woes. Is choosing Kennelly a sign that they are trying to harvest more funding?

Whitney Grace, September 30, 2014
Sponsored by, developer of Augmentext

Processing Content Is Easy, Right?

September 30, 2014

A mobile search app would be useful and appreciated by mobile device users. According to the URX Blog post “Deduplication Of Web Content,” it is relatively easy to create a search app, but creating a robust search app is the challenge. A robust search app would need to include link prioritization, feature extraction, re-crawl estimation, and content deduplication. The post is the first in a series of articles about developing a mobile search app.

Deduplicating content is important for user experience:

“Duplicate pages in a search index poison search results. The goal of a search engine is to return both relevant and diverse documents, allowing users to decide the optimal resolution for a query. Without deduplication, the top-k results returned for a user’s query would likely contain duplicate content. In the extreme, all k results will be copies of the same page. This creates a bad user experience where, as the crawler scales out, the duplicate likelihood increases. In fact, Google’s Matt Cutts believes that up to 20% of web content is duplicated.”

The rest of the post examines the different types of duplication, how to identify them, and remove them from search results.
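To make the deduplication idea concrete, here is a minimal sketch of one common approach: break each page into overlapping word "shingles" and compare pages with Jaccard similarity. This is an illustration only, not the method the URX post uses. The function names, the shingle size `k`, and the `threshold` value are assumptions picked for the example.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles for a piece of text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Very short texts become a single shingle (or nothing at all).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(pages, threshold=0.8, k=3):
    """Keep the first page of each near-duplicate group (hypothetical helper)."""
    kept = []  # list of (page, shingle_set) pairs already accepted
    for page in pages:
        current = shingles(page, k)
        if all(jaccard(current, other) < threshold for _, other in kept):
            kept.append((page, current))
    return [page for page, _ in kept]


pages = [
    "The quick brown fox jumps over the lazy dog",
    "the quick brown fox jumps over the lazy dog!",
    "Postgres full text search is good enough",
]
unique_pages = deduplicate(pages)
```

Comparing every page against every kept page is fine for a handful of documents. A crawler at scale would need a hashing scheme to avoid the pairwise comparisons, which is outside the scope of this sketch.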

While the search app will serve an important function, it does not make sense to me why people cannot just open a Web browser on a mobile device and conduct a regular search. What I would like to see is an app that searches content on apps on a device.

Whitney Grace, September 30, 2014
Sponsored by, developer of Augmentext

Internet Business: Slightly Different Points of View

September 29, 2014

First, navigate to “Another Top Investor Sounds the Alarm: When the Market Turns, a Bunch of Startups Are Going to Vaporize.” No big surprise here. The main idea is, in my opinion:

Over the past few years, it’s been relatively easy for startups to raise money from venture capitalists. In some cases, they’re raising hundreds of millions of dollars to keep their companies afloat. But behind the scenes, they’re plowing through that money either on marketing, overhead, or some other expense, which results in high burn rates. These bloated companies are using their millions to hide serious flaws in their business models.

At some point, those who provide the bucks to the venture firms will want a return. Many of the Fancy Dan outfits are not among the world’s most liquid operations. To raise cash, MBAs and accountants can cook up some quite remarkable solutions. The actions cascade down the line and end up pushing technology companies like those that pitch wild and crazy content technology into an Iron Maiden. This is essentially a casket with spikes protruding into the box and spikes pointing into the box on its lid.

Ta da.

The individual is placed into the Iron Maiden and the door is shut. Ouch.

Now navigate to either the Google book itself or the concepts Web site at Eric Schmidt argues that businesses should be like Google. You know the moon shots, trying stuff and failing fast (I am not sure how fast Google has failed at social networking, but I don’t want to be argumentative), and value numbers/data over any humanoid subjectivity.

For many search and content processing companies, the senior managers have been failing for years in some cases. I want to make a list of would be start ups and then provide their date of inception. Heck, why embarrass outfits like Attivio, Coveo, Digital Reasoning, Lucid Imagination (now Lucid Works to which I am tempted to add “Really?” but I will not), and quite a few others?

The point is that we have two somewhat conflicting interpretations of the present business climate. The tweets that inspired the Business Insider write up are taking a hard look at what happens when the money goes away. No money means that affected firms fire people, raise prices, and pivot along with a half dozen or so MBA maneuvers before shutting the doors as Convera, Delphes, and Entopia did. A few lucky outfits will sell out like Endeca, Exalead, and iPhrase. A few will struggle along sort of open and sort of closed like a number of French search and content processing firms.

On one hand, these outfits are toast if more money is not “found.” On the other hand, forget money. In Google’s world view, these companies need to be more like Google or out Google Google.

The reality is that the contraction of search and content processing has already begun. Some outfits are going to have to find a way to deliver a solution that solves an actual problem and generates sustainable revenue. Companies in this spot include IBM with its Watson project, Hewlett Packard with its Autonomy IDOL technology, and Palantir, a billion dollar baby of considerable note.

My view is that the doom and gloom expressed in the Business Insider write up is more likely to occur than a Google style entity arising from the Google Moon shot and allied suggestions. I am not sure the Google recommendations apply to Google. A company that is 15 years old and has one revenue stream may be a success that fulfills Steve Ballmer’s one trick pony observation.

For search and content processing vendors, there is no easy way out unless money remains plentiful and Google’s advice actually works for an information retrieval company.

Stephen E Arnold, September 29, 2014

Why Good Enough Is the New Norm in Search

September 29, 2014

Navigate to “Postgres Full Text Search Is Good Enough.” I first heard this argument at a German information technology conference a few years ago. The idea is surprisingly easy to understand. As long as a user can bang in a couple of key words, scan a result list, and locate information that the user finds helpful, job done. The search results may consist of flawed or manipulated information. The search results may be off point for the user’s query when evaluated by old fashioned methods such as precision and recall. The user may be dumb and rely on what the user finds accurate.
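For readers who have not bumped into the old fashioned measures, here is a minimal sketch of how precision and recall are computed for a single query. The document identifiers and the function name are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for one query.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical result list and hypothetical relevance judgments.
retrieved_docs = ["d1", "d2", "d3", "d4"]
relevant_docs = ["d2", "d4", "d7"]
precision, recall = precision_recall(retrieved_docs, relevant_docs)
```

In this made-up case, two of the four returned documents are relevant, and two of the three relevant documents were found. A good enough system may score poorly on both numbers and still leave the user satisfied, which is the point of the argument.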


This write up explains the good enough approach in terms of PostgreSQL, a useful open source Codd type data management system. Please, note. I am not uncomfortable with good enough search. I understand that when the herd stampedes, it is not particularly easy to stop the run. Prudence suggests that one take cover.

Here’s the guts of the write up:

What do I mean by ‘good enough’? I mean a search engine with the following features:

  • Stemming
  • Ranking / Boost
  • Support Multiple languages
  • Fuzzy search for misspelling
  • Accent support

Luckily PostgreSQL supports all these features.
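Two items on that list, accent support and fuzzy search for misspellings, are easy to illustrate outside a database. The sketch below uses only the Python standard library. It is not how PostgreSQL implements these features, and the helper names and the `cutoff` value are assumptions chosen for the example.

```python
import difflib
import unicodedata


def fold_accents(text):
    """Strip combining marks so that accented letters match their plain forms."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize(term):
    """Lowercase and accent-fold a term before comparison."""
    return fold_accents(term).lower()


def fuzzy_lookup(query, vocabulary, cutoff=0.75):
    """Return vocabulary terms that approximately match a possibly misspelled query."""
    # Map each normalized form back to the original vocabulary entry.
    normalized = {normalize(word): word for word in vocabulary}
    matches = difflib.get_close_matches(
        normalize(query), list(normalized), n=3, cutoff=cutoff
    )
    return [normalized[m] for m in matches]


vocabulary = ["café", "search", "ranking"]
misspelled_hits = fuzzy_lookup("serach", vocabulary)
unaccented_hits = fuzzy_lookup("cafe", vocabulary)
```

A query typed without the accent, or with two letters transposed, still lands on the intended term. Stemming, ranking, and multi-language support need more machinery than fits in a short sketch.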

The write up contains some useful code snippets to make use of search features. The discussion of full text search is coherent and addresses a vast swath of content. Note that proprietary vendors have tilled acres of marketing earth and fertilizer to convert search into a mind boggling range of functions.

This article includes code snippets to tackle full text within PostgreSQL.

Querying is included as well. Again, code snippets are included. (My teenage advisors said, “Very useful snippets.” Okay. Good.)

The write up concludes:

We have seen how to build a decent multi-language search engine based on a non-trivial document. This article is only an overview but it should give you enough background and examples to get you started with your own….Postgres is not as advanced as ElasticSearch and SOLR but these two are dedicated full-text search tools whereas full-text search is only a feature of PostgreSQL and a pretty good one

Reasonable observation. Worth reading.

If you are a vendor of proprietary search technology, there will be more individuals infused with the spirit of open source, not fewer. How many experts are there for proprietary systems? Fewer than the cadres of open source volk I surmise.

Stephen E Arnold, September 29, 2014

Tibco: Will It Regain Its Momentum?

September 29, 2014

I read “Tibco Sells Out to Private Equity in $4.3bn Deal with Vista Equity Partners.” I found Tibco interesting when I saw the servers used to power Yahoo News a number of years ago. The company is now owned by accountants and MBAs. I learned in the write up:

Tibco was founded in 1997 by its current chairman and CEO Vivek Ranadive. It was a pioneer of message-oriented middleware, particularly for the financial sector, which enables information to be pushed to multiple recipients at precisely the same time. However, Tibco’s expensive high-end proprietary software is under attack from open source in the form of the Advanced Message Queuing Protocol (AMQP), which promises not just lower-cost message queuing software, but also inter-operability between different vendors’ implementations of the open-source standard.

My recollection is that Tibco’s “information bus” made some of the old line outfits uncomfortable. Perhaps IBM? If the write up is accurate, open source is claiming a proprietary vendor.
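The core idea behind message-oriented middleware, one publisher pushing the same message to many subscribers, can be sketched in a few lines. This toy bus runs in a single process and delivers messages one subscriber after another. It is not Tibco's product, not an AMQP implementation, and the class and topic names are invented for illustration.

```python
class MessageBus:
    """A toy in-process publish/subscribe bus (illustrative only)."""

    def __init__(self):
        # Maps a topic name to the list of callbacks subscribed to it.
        self._subscribers = {}

    def subscribe(self, topic, callback):
        """Register a callback to receive every message published on a topic."""
        self._subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, message):
        """Deliver a message to all subscribers of a topic and return the count."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)


bus = MessageBus()
trading_desk = []
risk_desk = []
bus.subscribe("quotes", trading_desk.append)
bus.subscribe("quotes", risk_desk.append)
delivered = bus.publish("quotes", {"symbol": "XYZ", "price": 10.5})
```

The publisher does not need to know who is listening. That decoupling is what a real message bus provides across machines and networks, along with delivery guarantees that this sketch ignores.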

How long will proprietary enterprise search vendors be able to keep the open source predators away? If the financial market gets the willies, over hyped proprietary systems are likely to face high seas. Some swimmers drown in rough water even though the marketers insist the sun is shining.

Stephen E Arnold, September 29, 2014

IDC Tweets, IBM, and Content Marketing

September 29, 2014

Some Backstory

In 2012 and 2013, IDC sold my content with my name and Dave Schubmehl’s. These were nifty IDC “official” reports. The only hitch in the git along is that IDC did not trouble itself to issue a contract, get my permission, or tell me what they were doing with research my team prepared. The deal was witnessed by a law librarian, and I have a stack of emails about my research into such open source companies as Attivio, ElasticSearch (one of the disruptors of the enterprise search market), IBM (the subject of the IDC twit storm), Lucid Imagination (now Lucid Works which I write when I feel playful as Lucid works, really?), and eight other companies.

Hit by a twit storm. Rough seas ahead. Image from

In 2012, I had the open source research. IDC wanted the open source content to use in a monograph. So in front of a law librarian, IDC’s search “expert” thought the exchange of my information for open source intelligence, money, and stuff to sell was a great idea. (I have a file of email from IDC to me about what IDC wanted, but I never got a contract. But IDC had my research. Ah, those administrative delays.) IDC, however, was organized enough to make additions to my company research like an open source industry overview.

In an odd approach to copyright, IDC did not produce a contract but it produced reports about four open source companies. Mr. Schubmehl and IDC just went about producing what were recycled company reports and trying to sell them at $3,500 a whack. Is that value or an example of the culture of narcissism? It may come as a surprise to you, gentle reader, but I sell research for money. I have a business model and it has worked for about 40 years. When an outfit uses the research without issuing a contract, I have to start thinking about such issues as fairness, integrity, copyright, and name surfing. Call me idiosyncratic, but when my name is used without my permission, I wonder how a big and allegedly respected organization can operate like a BearStearns-type senior executive.

Then, the straw that broke the proverbial camel’s back, a librarian told me that IDC was selling a report with my name and Mr. Schubmehl’s on Amazon. Wow, Amazon, the Wal-Mart for the digital age. The reports, now removed from Amazon’s blue light special shelf, cost $3,500. Not bad for eight pages of information based on my year long research investment into the wild and volatile world of open source search and content processing. Surf’s up for Mr. Schubmehl.

Well, IDC after some prodding by my very gentle legal gerbil stopped selling my work. We received a proposal that offered me a pittance for a guarantee that I would not talk or write about this name surfing, unauthorized resale of my information on Amazon, and the flubs of Mr. Schubmehl.

My legal gerbil rejected IDC’s lawyer crafted “deal,” and I am now converting my IDC misadventure into a metaphor for some of the deeper issues associated with “experts” and certain professional services firms. My legal gerbil suggested a significantly higher fee, but, like many of that ilk, the gerbil broke my heart.

Hence, IDC and Mr. Schubmehl’s tweets and twit storm are on my fragile ship’s radar. Let’s review the IBM IDC Schubmehl twit storm on just one day in September 2014. Trigger warning: Do not emulate the IDC Schubmehl method for your content marketing program. One day of tweets only generates a lot of twit.

Now to the Twit Storm Unleashed on September 16, 2014

Using my Overflight system, I monitor IDC tweets. Quite an interesting series of tweets appears on September 16, 2014. Mr. Schubmehl posted 25 tweets about IBM Watson.

Here are three examples of the Watson content to which his name was attached:

  • September 16, 2014. #WatsonAnalytics uses Watson cognitive technologies to ingest structured data and find relationships – Robin Grosset & Dan Wolfson
  • September 16, 2014 Combo of cognitive with cloud analytics improves process, analysis and decision making – cognitive will change all mkts #WatsonAnalytics
  • September 16, 2014 #WatsonAnalytics will be using a freemium model….first time for IBM…

Obviously there is nothing wrong with a tweet about an IBM product. What’s one more twit emission in a flow of several hundred thousand 140 character text outputs?

There is nothing illegal with two dozen tweets about IBM. What two dozen tweets do is make me laugh and see this content marketing effort as fodder for corporate weirdness.

Also, this IBM twit storm is not on the Miley Cyrus or Lady Gaga scale, but it is notable because it is a one day twit storm quite unlike the Jeopardy journey. Quite a marketing innovation: getting an alleged “expert” to craft 16 “original” tweets in one day and issue seven retweets of tweets from others who are fans of Big Blue. A few Schubmehl tweets on the 16th illustrated diversity; for example, “The FBI’s Facial Recognition System Is Here.” Hmm. The FBI and facial recognition. I wonder why one is interested in this development.

The terms mentioned in these IBM centric tweets on September 16, 2014, reveal the marketing jargon that IBM is using to generate revenue from the game show winning technology. My list of buzzwords from the tweets reads like a who’s who of blogosphere and venture oriented yak:

  • Automated data cleansing
  • Analytics (cloud based)
  • Big Data
  • Cognitive (system and capabilities)
  • Data explorer
  • Democratizing
  • Freemium
  • Natural Language Computing
  • Natural Language Query.

From this list of buzzwords my favorites are “cognitive,” “Big Data,” and the number one silly word “Freemium.” Imagine. Freemium from IBM. Imagine.

My Interpretation of the Twit Storm

Let me capture several preliminary observations:

First, the Schubmehl Twitter activity on September 16, 2014, focuses mostly on IBM’s challenged Watson business development effort. The cluster of tweets on the 16th suggests a somewhat ungainly and down-market content marketing play.

Did Mr. Schubmehl wake up on the 16th of September and decide to crank out Watson centric tweets? Did IBM pay IDC and Mr. Schubmehl to do some content marketing like thousands of PR firms do each day? We even have these outfits in Harrod’s Creek, Kentucky to flog auto sales, bourbon, and cheesy festivals in Middletown, Kentucky.

Here’s a question: “How many tweets does a McKinsey or Bain type of consulting firm issue on a single day for a single product that seems to be struggling for revenue?” If you know, please, use the comments section of this blog to provide some factoids.

Second, the tweets provide the reader with a list of what seem to be IBM Watson aficionados or employees who have the job of making the shotgun marriage of open source code, legacy Almaden technology, and proprietary scripts into a billion dollar revenue producer soon, very soon, gentle reader. The individuals mentioned in the September 16, 2014, tweets include:

  • Steve Gold, Baylor University
  • Robin Grosset, Distinguished engineer Watson Analytics.
  • Dan Wolfson, IBM Distinguished Engineer
  • Bob Picciano, Senior vice president, IBM information and analytics group.

Perhaps Mr. Gold is objective? I ask, “Are the other three IBM wizards looking at the world through IBM tinted spectacles when reading their business objectives for the current fiscal year?” I asked myself, “Should I trust these individuals who presumably are also ‘experts’ in all things related to Watson?” My preliminary answer is, “Not for an objective view of the game show winning Watson.”

Third, what’s the payoff of this twit storm for IBM? Did IBM expect me to focus on the Schubmehl twit storm and convert the information into my idea of a 10 minute stand up comedy routine to deliver at the upcoming intelligence and law enforcement conference in nine days? Is it possible that “doing social media” looks good on a weekly report when an executive does not have juicy revenue numbers to present? The value of the effort strikes me as modest. In fact, viewed as a group, the tweets could be interpreted as an indicator of IBM’s slide into desperation marketing.

What about consulting firms and their ability to pump out high margin revenue?

Outfits like Gerson Lehrman Group have put the squeeze on mid tier consulting firms. The bottom feeders with their middle school teacher and poet contingent are not likely to sell to the IBMs of the world. GLG type companies are also nipping at the low end business of the blue chip outfits like Bain, Boston Consulting, and even McKinsey.

But GLG can deliver to a client retired professionals from blue chip firms and on point experts. As a result, GLG has made life very, very tough for the mid tier outfits. Why pay $50,000 for an unproven “expert” when you can buy a person with a pedigree for an hour and pay a few hundred bucks when you need a factoid or an opinion? I consider IDC’s move to content marketing indicative of a fundamental shift in the character of a consulting firm’s business. The shift to low level PR work seems out of character for a professional services firm with a commitment to intellectual rigor.

Every few days I learn that something called generates a list of content marketing leaders. Will IDC appear on this list?

For those who depend on lower- or mid tier consulting firms for professional counsel, how would you answer these questions:

  1. What is the intellectual substance behind pronouncements? Is there original research underpinning pronouncements and projections, or are the data culled from secondary sources and discussions with paying customers?
  2. What is the actual relationship between a mid tier consulting firm and the companies discussed in “authoritative” reports? Are these reports and projects inclusions (a fancy word for ads) or are they objective discussions of companies?
  3. Are the experts presented as “experts” actually experts or are they individuals who want to hit revenue goals while keeping costs as low as possible?

I don’t have definitive answers to these questions. Perhaps one day I can use a natural language query to tap into Big Data and rely on cognitive methods to provide answers.

For now, a one day twit storm is a wonderful example of how not to close deals, build reputations, and stimulate demand for advanced technology offered via a “Freemium” model. What the heck does that mean anyway?

Stephen E Arnold, September 29, 2014
