Google Made Simple

October 16, 2014

Google is just so darned friendly. The system knows exactly what users want. The GOOG offers helpful suggestions for just about everything.

Now how does Google work?

To answer this question, study “How Google Works.” This is a very sophisticated presentation that is so darned clever. The information is so darned relevant. Darn it. I wish every 60,000 person company controlling information access would explain itself in this way.

Here’s an example:


Isn’t this type of presentation darned magnificent? Why fool around with Google patent documents? Why waste time on Google technical papers like this one


Why waste time fooling around with trivial activities like this?


Oh, wow. This slide show is not about technology. The slide show wants to convince you to buy Eric Schmidt’s new book.


Isn’t that so darned clever? You can buy a copy from the company that is Google’s closest competitor, Amazon. Isn’t that a darned good way to do cooperative competition?

Stephen E Arnold, October 16, 2014

MarkLogic: Banging a Drum in Hopes of Drowning Out Open Source NoSQL Reggae Beat

October 3, 2014

I read “MarkLogic Positioned as a Leader in NoSQL Document Databases Report by Independent Research Firm.” The research firm is the mid tier outfit Forrester Research Inc. Forrester creates “wave” reports. These are Forrester’s response to the various grids, quadrants, and tables cranked out by Gartner, Ovum, Butler, Kelsey, and a life boat stuffed with consulting firm shakeout survivors. Dated October 2, 2014, the MarkLogic news release will be the first of a half dozen or more issued by companies in this “independent research firm’s” report. The mid tier analyses are crafted so that negatives are swathed in high density, low impact foam like the spray on insulation.

Why not?

Like Heaven’s Gate’s media event, any publicity is good publicity. At least, that’s the public relations mantra. Look at IBM Watson and its BBQ sauce recipe with tamarind. I mention that innovation as frequently as possible.

Well, let me do my part for this report:

The write up asserts:

“MarkLogic offers the most mature and scalable NoSQL document database. Unlike other NoSQL document databases, MarkLogic has been offering a NoSQL solution for more than a decade,” stated Forrester in the report that evaluated select companies against 57 criteria. “MarkLogic has the most comprehensive data management features and functionality to store, process, and access any kind of structured and multi structured data.” Forrester’s evaluation of NoSQL document database vendors scored factors like performance, scalability, integration, security, high availability, workload management and form factor. MarkLogic was cited as a Leader in the evaluation, receiving its highest score in the go-to-market category.

Okay. The news release provides a link so the reader can get a copy of the “independent research firm’s” report. If you want to skip the original document and go to the registration form so you can download the “independent research firm’s” report, navigate to In my experience, some follow up by the “leader” MarkLogic may take place.

In my view, content marketing covers these “independent” reports. The idea makes clear that attention is required in order to kindle interest in a product or a service. Now MarkLogic is an Extensible Markup Language data management system. The company has been in business since 2003. The firm has ingested more than $70 million in venture funding. The firm has experienced the same type of revolving door for senior management that other aging start ups experience; for example, Lucid Imagination (now Lucid Works, which I write as Lucid Works. Really?). MarkLogic, in order to meet stakeholders’ expectations, has to find a growth bull, get it in a corral, and convert the animal to high value revenue.

Several observations:

  1. Proprietary XML systems positioned as NoSQL alternatives have to find a way to convince a prospect that proprietary is a better value than open source. The impact of Hadoop, a variant of Google’s Big Table, is long in the tooth and faces some of its own value challenges.
  2. Companies like Oracle are providing some of their clients with the comfort of a proprietary system with compatibility with open source technology. Thus, some large companies may be reluctant to dismount one old nag and climb on another. IBM also does some anti open source marketing but that’s another story. For some insights, run a query for Watson on the Beyond Search index.
  3. The noise surrounding NoSQL is creating some confusion. This means that firms that are neither big nor small have to find a way to make their size into a positive. Enter content marketing and reports that present a group of companies in a simplified table.
  4. Do the “independent” experts use the products included in a variant of the Boston Consulting Group’s matrix? You know: Install, optimize, customize, and utilize with their own brain, fingers, and eyeballs? My hunch is that none of this “real” experience stuff is germane to cranking out an “independent” report. Just my uninformed opinion, you understand.

If companies require a NoSQL solution, how do those firms select vendors? Based on the research that IDC used to skip Dave Schubmehl to expert status, large companies are more likely to try open source for a new project. Smaller firms often look for brand name software in order to show investors that base technology has a brand name.

Forrester-type firms (Gartner, IDC, Ovum, etc.) generate “independent” reports to inflate the balloon. The French have a delightful verb for this: “se gonfler”. So, nous [MarkLogic] gonflons notre ballon. (If the translation is poor, blame Google, the inventor of Big Table more than a decade ago.)

Stephen E Arnold, October 3, 2014

Coveo and Fresh Jargon

October 3, 2014

I spotted some “fresh” jargon from search vendor Coveo. Here are the terms:

  1. Relevance solutions. The implicit idea is that other vendors’ search systems do not deliver results that are on point to a user. I am not sure how many vendors pay much attention to relevance. In fact, the shift to graphics and reports purports to “answer questions.” These systems may not, but it is an approach that powers some big folks’ innovations. Microsoft Delve, anyone?
  2. Unified search. The idea is that information can reside in different forms and locations. Presenting search results from these different sources eliminates the need to run queries on different systems. The problem is that “unified,” in my experience does not include certain types of content; for example, snippets from videos or data locked in proprietary systems like the IBM i2 Dot ANB format. Unified or federated search is a term popular with well known companies like Attivio and lesser known companies like Polyspot. The company in my mind most closely associated with this concept is Deep Web Technology. The idea is a good one, but expectations can rise above the actual “federated” experience in my opinion.

I find the creativity in these examples of jargonizing to be evidence of three trends:

First, “search” as a buzzword only has impact if qualified in some way; for example, federated, unified, intelligent, etc.

Second, individual vendors are working to differentiate systems that to many people seem to be identical in form and function. The differentiator boils down to price, the power of the brand, or perceived value of the system endorsed by mid tier consultants like IDC-type or Forrester-type outfits.

Third, the impact of open source alternatives lurks behind these verbal gymnastics. ElasticSearch, whether proprietary vendors are comfortable with the notion or not, offers a way to get search without the lock down that proprietary vendors bring to the party.

Stephen E Arnold, October 2, 2014

IDC Tweets, IBM, and Content Marketing

September 29, 2014

Some Backstory

In 2012 and 2013, IDC sold my content with my name and Dave Schubmehl’s. These were nifty IDC “official” reports. The only hitch in the git along is that IDC did not trouble itself to issue a contract, get my permission, or tell me what they were doing with research my team prepared. The deal was witnessed by a law librarian, and I have a stack of emails about my research into such open source companies as Attivio, ElasticSearch (one of the disruptors of the enterprise search market), IBM (the subject of the IDC twit storm), Lucid Imagination (now Lucid Works which I write when I feel playful as Lucid works, really?), and eight other companies.

Hit by a twit storm. Rough seas ahead.

In 2012, I had the open source research. IDC wanted the open source content to use in a monograph. So in front of a law librarian, IDC’s search “expert” thought the exchange of my information for open source intelligence, money, and stuff to sell was a great idea. (I have a file of email from IDC to me about what IDC wanted, but I never got a contract. But IDC had my research. Ah, those administrative delays.) IDC, however, was organized enough to make additions to my company research like an open source industry overview.

In an odd approach to copyright, IDC did not produce a contract but it produced reports about four open source companies. Mr. Schubmehl and IDC just went about producing what were recycled company reports and trying to sell them at $3,500 a whack. Is that value or an example of the culture of narcissism? It may come as a surprise to you, gentle reader, but I sell research for money. I have a business model and it has worked for about 40 years. When an outfit uses the research without issuing a contract, I have to start thinking about such issues as fairness, integrity, copyright, and name surfing. Call me idiosyncratic, but when my name is used without my permission, I wonder how a big and allegedly respected organization can operate like a BearStearns-type senior executive.

Then, the straw that broke the proverbial camel’s back, a librarian told me that IDC was selling a report with my name and Mr. Schubmehl’s on Amazon. Wow, Amazon, the Wal-Mart for the digital age. The reports, now removed from Amazon’s blue light special shelf, cost $3,500. Not bad for eight pages of information based on my year long research investment into the wild and volatile world of open source search and content processing. Surf’s up for Mr. Schubmehl.

Well, IDC after some prodding by my very gentle legal gerbil stopped selling my work. We received a proposal that offered me a pittance for a guarantee that I would not talk or write about this name surfing, unauthorized resale of my information on Amazon, and the flubs of Mr. Schubmehl.

My legal gerbil rejected IDC’s lawyer crafted “deal,” and I am now converting my IDC misadventure into a metaphor for some of the deeper issues associated with “experts” and certain professional services firms. My legal gerbil suggested a significantly higher fee, but, like many of that ilk, the gerbil broke my heart.

Hence, IDC and Mr. Schubmehl’s tweets and twit storm are on my fragile ship’s radar. Let’s review the IBM IDC Schubmehl twit storm on just one day in September 2014. Trigger warning: Do not emulate the IDC Schubmehl method for your content marketing program. One day of tweets only generates a lot of twit.

Now to the Twit Storm Unleashed on September 16, 2014

Using my Overflight system, I monitor IDC tweets. Quite an interesting series of tweets appeared on September 16, 2014. Mr. Schubmehl posted 25 tweets about IBM Watson.

Here are three examples of the Watson content to which his name was attached:

  • September 16, 2014. #WatsonAnalytics uses Watson cognitive technologies to ingest structured data and find relationships – Robin Grosset & Dan Wolfson
  • September 16, 2014 Combo of cognitive with cloud analytics improves process, analysis and decision making – cognitive will change all mkts #WatsonAnalytics
  • September 16, 2014 #WatsonAnalytics will be using a freemium model….first time for IBM…

Obviously there is nothing wrong with a tweet about an IBM product. What’s one more twit emission in a flow of several hundred thousand 140 character text outputs?

There is nothing illegal about two dozen tweets about IBM. What two dozen tweets do is make me laugh and see this content marketing effort as fodder for corporate weirdness.

Also, this IBM twit storm is not on the Miley Cyrus or Lady Gaga scale, but it is notable because it is a one day twit storm quite unlike the Jeopardy journey. Quite a marketing innovation: getting an alleged “expert” to craft 16 “original” tweets in one day and issue seven retweets of tweets from others who are fans of Big Blue. A few Schubmehl tweets on the 16th illustrated diversity; for example, “The FBI’s Facial Recognition System Is Here.” Hmm. The FBI and facial recognition. I wonder why one is interested in this development.

The terms mentioned in these IBM centric tweets on September 16, 2014, reveal the marketing jargon that IBM is using to generate revenue from the game show winning technology. My list of buzzwords from the tweets reads like a who’s who of blogosphere and venture oriented yak:

  • Automated data cleansing
  • Analytics (cloud based)
  • Big Data
  • Cognitive (system and capabilities)
  • Data explorer
  • Democratizing
  • Freemium
  • Natural Language Computing
  • Natural Language Query.

From this list of buzzwords my favorites are “cognitive,” “Big Data,” and the number one silly word “Freemium.” Imagine. Freemium from IBM. Imagine.

My Interpretation of the Twit Storm

Let me capture several preliminary observations:

First, the Schubmehl Twitter activity on September 16, 2014, focuses mostly on IBM’s challenged Watson business development effort. The cluster of tweets on the 16th suggests a somewhat ungainly and down-market content marketing play.

Did Mr. Schubmehl wake up on the 16th of September and decide to crank out Watson centric tweets? Did IBM pay IDC and Mr. Schubmehl to do some content marketing like thousands of PR firms do each day? We even have these outfits in Harrod’s Creek, Kentucky to flog auto sales, bourbon, and cheesy festivals in Middletown, Kentucky.

Here’s a question: “How many tweets does a McKinsey or Bain type of consulting firm issue on a single day for a single product that seems to be struggling for revenue?” If you know, please, use the comments section of this blog to provide some factoids.

Second, the tweets provide the reader with a list of what seem to be IBM Watson aficionados or employees who have the job of making the shotgun marriage of open source code, legacy Almaden technology, and proprietary scripts into a billion dollar revenue producer soon, very soon, gentle reader. The individuals mentioned in the September 16, 2014, tweets include:

  • Steve Gold, Baylor University
  • Robin Grosset, Distinguished engineer Watson Analytics.
  • Dan Wolfson, IBM Distinguished Engineer
  • Bob Picciano, Senior vice president, IBM information and analytics group.

Perhaps Mr. Gold is objective? I ask, “Are the other three IBM wizards looking at the world through IBM tinted spectacles when reading their business objectives for the current fiscal year?” I asked myself, “Should I trust these individuals who presumably are also ‘experts’ in all things related to Watson?” My preliminary answer is, “Not for an objective view of the game show winning Watson.”

Third, what’s the payoff of this twit storm for IBM? Did IBM expect me to focus on the Schubmehl twit storm and convert the information into my idea of a 10 minute stand up comedy routine to deliver at the upcoming intelligence and law enforcement conference in nine days? Is it possible that “doing social media” looks good on a weekly report when an executive does not have juicy revenue numbers to present? The value of the effort strikes me as modest. In fact, viewed as a group, the tweets could be interpreted as an indicator of IBM’s slide into desperation marketing.

What about consulting firms and their ability to pump out high margin revenue?

Outfits like Gerson Lehrman Group have put the squeeze on mid tier consulting firms. The bottom feeders with their middle school teacher and poet contingent are not likely to sell to the IBMs of the world. GLG type companies are also nipping at the low end business of the blue chip outfits like Bain, Boston Consulting, and even McKinsey.

But GLG can deliver to a client retired professionals from blue chip firms and on point experts. As a result, GLG has made life very, very tough for the mid tier outfits. Why pay $50,000 for an unproven “expert” when you can buy a person with a pedigree for an hour and pay a few hundred bucks when you need a factoid or an opinion? I consider IDC’s move to content marketing indicative of a fundamental shift in the character of a consulting firm’s business. The shift to low level PR work seems out of character for a professional services firm with a commitment to intellectual rigor.

Every few days I learn that something called generates a list of content marketing leaders. Will IDC appear on this list?

For those who depend on lower- or mid tier consulting firms for professional counsel, how would you answer these questions:

  1. What is the intellectual substance behind pronouncements? Is there original research underpinning pronouncements and projections, or are the data culled from secondary sources and discussions with paying customers?
  2. What is the actual relationship between a mid tier consulting firm and the companies discussed in “authoritative” reports? Are these reports and projects inclusions (a fancy word for ads) or are they objective discussions of companies?
  3. Are the experts presented as “experts” actually experts or are they individuals who want to hit revenue goals while keeping costs as low as possible?

I don’t have definitive answers to these questions. Perhaps one day I can use a natural language query to tap into Big Data and rely on cognitive methods to provide answers.

For now, a one day twit storm is a wonderful example of how not to close deals, build reputations, and stimulate demand for advanced technology offered via a “Freemium” model. What the heck does that mean anyway?

Stephen E Arnold, September 29, 2014

Search Vendors Under Siege: The AI Aiyaiiii Revolution

September 26, 2014

Science is a marvelous manifestation of human curiosity. I read “Five Ways the Superintelligence Revolution Might Happen.” Note the “might,” a close cousin of “woulda, coulda, shoulda.” These terms shingle protectively many backsides.

The write up lists the options for “superintelligence”, which is a variant of artificial intelligence with a dash of the Googler’s singularity stirred in for good measure. These scientists remember some of the basic methods of chemistry. List the constituents and mix ‘em up. See what happens.

Here are the five ways technological nirvana will arrive:

  1. Make intelligence out of software. Essentially, humans will code a brain. Great idea for a research project. Probably won’t work particularly well for a while.
  2. Just use math. This is the approach my uncle and his both would have favored. You remember Kolmogorov and Arnold, don’t you? Also, Googlers and Xooglers are into this approach.
  3. Brute force. This is the technological equivalent of using surplus military equipment to check out a noisy fraternity party. In terms of smart software, the article is into using von Neumann systems to crack P=NP problems. Yep, that’s a good idea.
  4. Plagiarizing nature. This idea is that one emulates in software biological processes. Think how ants find a ham sandwich at your picnic. I was on the board of a company that puttered around organic algorithms in 2001. For some problems attacked via well managed methods, the results are interesting. Maybe IBM’s new chip or quantum computing will help out, but it is a subset of the math and brute force approach. Messy categorization I conclude.
  5. Use humans plus any other method that seems to work. There you go. This is the state of the art. I have discussed the approach in my analysis of Google’s nowcasting model here. Only hitch? Well, it is not right when it counts: Ebola threat, ISIS/ISIL, horse racing.

Net net: Not much in the breakthrough arena. Looks like we’re into incremental improvements. This method will work, but it won’t arrive quickly enough to keep some of the venture firms funding the wild and crazy AI Aiyaiii world.

The outfits that will be directly affected by this AI craziness are the search and content processing vendors. Many of the companies in this sector will assert that their systems are “intelligent,” “able to comprehend human utterance,” and “predict” user needs. The problem is that delivering on inflated expectations is more difficult than doing a PowerPoint about the magic of information access.

Stephen E Arnold, September 26, 2014

MarkLogic Bets New Offices Equal Revenues

September 25, 2014

MarkLogic, founded more than a decade ago, is an interesting company. I heard that Google kicked its tires because Christopher Lindblad is a true wizard.

The outfit offers an Extensible Markup Language data management solution. Over the years, the company has positioned the system to slice and dice content for publishers, intelligence analysis for government entities, and enterprise search. Along the way, the company’s technology has been shaped to meet the needs of the pivoting forces in content processing. Stated another way, when one thing won’t sell at a pace to keep investors happy, try another way. In the course of its journey, the company brushed against Oracle and then found itself snarled in the confusion between JSON and XML and the sort of open proprietary extensions to the query language used to extract results from the XML store only to get buffeted by the hoo hah about Hadoop and assorted open source alternatives to Codd databases. Wow.

I read a content marketing / public relations story called “MarkLogic Expands Global Reach with New Offices in Chicago.” Check the source quickly because some BusinessWire content can disappear or become available to those who fork over dough to the “news” service. The write up asserted:

“The opening of these new offices is well-timed for the growing number of global customers who need the enterprise grade NoSQL solutions we are delivering to US-based customers,” said David Ponzini, senior vice president of corporate development, MarkLogic. “We are in an advantageous position to make an immediate impact in Europe and Southeast Asia. We continue broadening the market awareness for MarkLogic throughout the world.”

The trick, of course, will be to blast through the financial goals for the company set by the investors years ago. A failure to produce more than $60 million in revenues several years ago led to the departure of one president. A couple more senior executives have spun through the revolving door not too far from Google Island with its quirky dinosaur skeleton. Does that skeleton stand as a metaphor for proprietary software solutions?

In my view, the business thinking at work is more sales offices equals more sales. I once had an office in Manhattan even though I worked in Illinois. The cost was about $20 per month. I had an address on Park Avenue, south unfortunately and a 212 phone number. I made a sale or two to an organization run by John Suhler, but I quickly figured out that the key to making sales was my being in and around midtown.

I thought I read that outfits like IBM are going to a “no office” approach. Maybe MarkLogic has identified a solution to the overhead associated with full time equivalents and physical space? That begs another question, “What does MarkLogic know that IBM does not know?”

Some vendors have found that more sales offices increase costs without generating sufficient revenue to cover the overhead, miscellaneous costs, and in country marketing expenses. I can name several Paris, France based content processing companies that learned first hand that additional offices are a very, very expensive proposition. Other companies leverage partners for revenues. In one of my industry reports, I pointed out that prior to the sale of Autonomy to HP, Autonomy figured out a hybrid sales model that seemed to work as long as Dr. Lynch was cracking the whip. Remove the management, and the partnering model can go off the rails.

Don’t get me wrong. XML is a wonderful solution to certain types of information challenges. Thomson Reuters can produce hundreds of for fee publications using XQuery and XSLT with proprietary extensions. A quick look at Thomson Reuters financial results suggests that more may be needed by this company than a foundation and an XML data store.

How quickly will MarkLogic deliver a five or ten X return on the $70 million investors have pumped in? In today’s market, cranking out $300 to $700 million in revenues from content processing technology that competes with open source alternatives is a tall order.
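For readers who like to see the arithmetic, here is a minimal sketch of what a five or ten X multiple on $70 million works out to. The function name and the reading of “return” as a simple multiple of invested capital are my assumptions for illustration, not anything MarkLogic or its investors have published. On that reading the low end lands nearer $350 million than $300 million.

```python
# Hypothetical helper for illustration only. It assumes "return" means a
# plain multiple of the capital invested, which is a simplification.
INVESTED_USD_MILLIONS = 70  # venture funding figure cited in the post


def target_for_multiple(invested: float, multiple: float) -> float:
    """Return the amount implied by a given multiple of invested capital."""
    return invested * multiple


if __name__ == "__main__":
    for multiple in (5, 10):
        target = target_for_multiple(INVESTED_USD_MILLIONS, multiple)
        print(f"{multiple}x on ${INVESTED_USD_MILLIONS}M -> ${target:.0f}M")
```

Swap in a different funding figure or multiple to see how quickly the bar rises.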

Maybe more sales offices will do it? My hunch is that more closed deals is the evidence some stakeholders seek.

Stephen E Arnold, September 25, 2014

Lucid Works: Pando Daily Sets the Record Straight

September 23, 2014

On LinkedIn I learned about this Pando Daily write up: “How Disgruntled Ex-Employees and Bad Reporting Hung LucidWorks Out to Dry.” I noted the Venture Beat analysis of Lucid Works in my post on September 6, 2014. My focus was the wild and crazy information from an “expert” about various factoids. You can read my reaction to the “Trouble at LucidWorks” story here.

The Pando Daily story comes at the issue in a different way. I was delighted to see that Pando found the “expert’s” comments a bit wobbly. There was an interesting run down about Lucid Works that seems to have come from a different point of view. In a way, the two stories—Venture Beat’s and Pando Daily’s—are a bit like the he said, she said information provided to police investigating a married couple’s disturbing the peace incident. I am no cop, so I can’t figure out who is correct and who is incorrect.

Pando takes this tack:

More accurately: It’s [Lucid Works] a startup, and this shit is hard.

I understand that search is hard, but is an eight year old company a start up? That time span baffled me. Coveo asserts that it too is a start up. Other search vendors dating from the implosion of the Big Five in 2006 also use the start up moniker.

The article points out that there are happy employees and positive investors. More money is likely to be needed. Pando Daily quotes a backer as saying:

We won’t start looking for an expansion round until early next year.

ElasticSearch has amassed about $90 million in funding. So LucidWorks may be thinking it needs the same scale of investment to take wing.

With regard to management, Pando Daily reports that the new top dog is the type of CEO who can deliver revenues. The new president—Will Hayes—is described in this context:

On this point, VentureBeat seems oddly hung up on the idea that Hayes is a first-time CEO, perhaps failing to realize that Silicon Valley was (and continues to be) literally built on the success of first-time CEOs. Not to over egg the point, but Mark Zuckerberg and Steve Jobs were first-time CEOs.

Pando Daily added:

As an early member of the Splunk team, Hayes is certainly more qualified for this job than 99 percent of the candidates out there, and more importantly, given that he didn’t found the company, he appears excited about the category.

Pando Daily reminded me that good start ups fire people. I understand the difference between the Silicon Valley approach to management and that practiced at Halliburton and Booz, Allen & Hamilton where I worked for many years. The idea of stability is not always congruent with the needs of a fast moving, pivoting technology company.

Pando Daily also takes issue with Venture Beat’s report that Lucid Works fumbled deals with some real big companies. Pando Daily asserted:

These accounts may or may not have any basis in reality, but they hardly indicate a failing company. The very nature of sales and business development is that deals fall apart all the time. Sometimes those are big deals, sometimes not. The facts are that LucidWorks counts Apple, Sears, Verizon, ADP, Raytheon, Zappos, Qualcomm, Ford, eHarmony, Cisco, and others among current customers.

My reaction to this is okay, but won’t naming these firms give ElasticSearch and other firms a target at which to shoot? Some content processing vendors like Palantir and Recorded Future don’t provide too much information about their customers.

On the all important revenue front, Pando Daily quoted the new top dog at Lucid Works as saying:

“$12 million in services revenue isn’t worth shit,” Hayes says. “But $12 million in product sales on subscription? That’s a $100 million business.”

I agree. Unless the subscriber terminates the subscription. As the competition among content processing vendors heats up, some firms will be quite aggressive in their attempts to take away business. Amazon, for example, seems to be struggling with search, but it could get its act together and offer a good enough solution at very competitive prices. Amazon is not the only sharp toothed outfit in the pond.

Pando Daily tracked down its own search wizard. That poobah said:

Not everyone agrees that enterprise search is quite this sexy. One enterprise analyst, speaking to Pando on the condition of anonymity, describes it as “not that big of an end market.” But at the same time, it’s one that’s still out there for the taking. “There isn’t really a single company or set of companies that have dominant products in the space,” this analyst says. Google and Microsoft have entered the market (the latter via acquisition) with low-cost offerings that would seem to make the competitive environment more challenging for LucidWorks and other upstarts. But according to the company’s supporters, these products are targeting different, less big data-centric applications and are thus not a valid comparison.

If you have ever listened to opposing expert witnesses in a legal dispute, the same factoid gets very different treatment by each expert. That’s what makes subjective expertise difficult to interpret. My view is that enterprise search is struggling for credibility. Some of the value for information retrieval has been exhausted by vendors now out of business. These include Convera, Delphes, Entopia, Siderean, and others. Some credibility has been eroded as a result of the Fast Search & Transfer matter. The CEO was hit with a jail term and a ban on working in search for a couple of years. Then there is the ongoing dispute between Hewlett Packard and Autonomy. IDOL is an aging technology like Endeca. But the mud slinging about search and content processing does not improve the image of those working in this sector.

Consequently, information retrieval companies are working overtime to explain their solutions in terms that do not invoke memories of Convera or Fast Search. Palantir is a data mining company. Recorded Future does predictive analytics. Coveo is eDiscovery and customer support. Search vendors are using a wide range of jargon to describe findability. Lucid Works is brave in using enterprise search with a dash of Big Data in its marketing.

Pando Daily said:

Journalism is tough, particularly in the technology sector. Reporters in this industry asked to cover complex and rapidly evolving companies that often take on hordes of venture cash and set outrageous performance expectations. Unseemly as it may be, stories of failure and calamity make for good scoops, and in these cases ex-employees and competitors often make the best sources. Unfortunately, they also can be the most biased sources and are often are in the best position to credibly lead a journalist astray. LucidWorks certainly has its warts and its scars. But that doesn’t make it trouble, that only makes it a startup.

One question remains: When does a company cease to be a start up and start to be a viable company? Is it one year, four years, or eight years? I just don’t know, but I think that companies that have been in business for almost a decade may not be start ups. Management with a start up mentality may not want to face the cold realities expected of established, stable firms. With Lucid’s technology originating with a community, management may be the issue to watch at Lucid Works. Good management can produce revenue, happy employees, and contented customers. Its absence is often evidenced by a lack of harmony.

Stephen E Arnold, September 23, 2014

Concept Searching: More Smart Content Rah Rah

September 23, 2014

I read “Concept Searching Taxonomy Workflow Tool solving Migration, Security, and Records Management Challenges.” This is a news release and it can disappear at any time. Don’t hassle me if it is a goner. The write up walks me rapidly into the smart content swamp. The idea is that content without indexing is dumb content. Okay. Lots of folks are pitching the smart versus dumb content idea now.

The fix? Concept Searching provides a smart tool to make content intelligent; that is, include index terms. For the youngster at heart, “indexing” is the old school word for metadata.

The company’s news announcement asserts:

conceptTaxonomyWorkflow serves as a strategic tool, managing enterprise metadata to drive business processes at both the operational and tactical levels. It provides administrators with the ability to independently manage access, information management, information rights management, and records management policy application within their respective business units and functional areas, without the need for IT support or access to enterprise-wide servers. By effectively and accurately applying policy across applications and content repositories, conceptTaxonomyWorkflow enables organizations to significantly improve their compliance and information governance initiatives.

The product name is indeed a unique string in the Google index. The company asserts that the notion of a workflow is strategic. Not only is workflow strategic, it is also tactical. For some, this is a two for one deal that may be hard to resist. The tool allows administrators to perform what appear to be tasks I think of as “editorial policy” or, as the young at heart say, information governance.

The only issue for me is that the organizations with which I am familiar have pretty miserable information governance methods. What I find is that organizations have Balkanized methods for dealing with digital information. Examples of poor information governance fall readily to hand. The US court system removed public documents only to reinstate them. The IRS in the US cannot locate email. And when the IRS finds an archive of the email, the email cannot be searched. And, of course, there is Mr. Snowden. How many documents did he remove from NSA servers?

The notion that the CTW tool makes it possible to “apply policy across applications and content repositories” sounds absolutely fantastic to a person with indexing experience. There is a problem. Many organizations do not understand an editorial policy or are unwilling to do much more than react when something goes off the tracks. The reality is that the appetite for meaningful action is often not in commercial enterprises or government entities. Budgets remain tight. Reducing information technology budgets is often a more important goal than improving information technology.

What’s this mean?

My hunch is that Concept Searching is offering a product for an organization that [a] has an editorial policy in place or [b] wants to appear to be taking meaningful steps toward useful information governance.

The president of Concept Searching is taking a less pragmatic approach to selling this tool. Martin Garland, according to the company story, states:

Managing metadata and auto-classifying to taxonomies provides high value in applications such as search, text analytics, and business social. But many forward thinking organizations are now looking to leverage their enterprise metadata and use it to improve business processes aligned with compliance and information governance initiatives. To accomplish this successfully, technologies such as conceptTaxonomyWorkflow must be able to qualify metadata and process the content based on enterprise policies. A key benefit of the product is its ease of use and rapid deployment. It removes the lengthy application development cycle and can be used by a large community of business specialists as well as IT.

The key benefit, for me, is that a well conceived and administered information policy eliminates risks of an information misstep. I would suggest that the Snowden matter was a rather serious misstep.

One assumes that companies have information policies, stand behind them, and keep them current. This strikes me as a quite significant assumption.

A similar message is now being pushed by Smartlogic, TEMIS, WAND, and other “indexing” companies.

Are these products delivering essentially similar functionality? Is any system indexing with less than a 10 percent error rate? Are those with responsibility for figuring out what to do with the flood of digital information equipped to enforce organization wide policies? And once installed, will the organization continue to commit resources to support tools that manage indexing? What happens if Microsoft Azure Search and Delve deliver good enough indexing and controls?

These are difficult questions to answer. Based on the pivoting content processing vendors are doing, most companies selling information solutions are trying to find a way to boost revenues in an exhausting effort to maintain stable cash flows.

Does anyone make an information governance tool that keeps track of what information retrieval companies market?

Stephen E Arnold, September 23, 2014

Mondeca: Content IQ

September 23, 2014

I reacted strongly to the IDC report about the knowledge quotient. IDC, as you know, is the home of the fellow who sold my content on Amazon without written permission. I learned that Mondeca is using a variant of “knowledge quotient.” This company’s approach taps the idea of the intelligence quotient of content.

I interpret content with a high IQ in a way that is probably not what Mondeca intended. Smart content is usually content that conveys information that I find useful. Mondeca, like other purveyors of indexing software, uses the IQ to refer to content that is indexed in a meaningful way. Remember, if users do not use the index terms, assigning these terms to a document does not help a user. Effective indexing helps the user find content. In the good old days of specialist indexing, users had to learn the indexing vocabulary and conventions. Today users just pump 2.7 words into a search box and feel lucky.

As with other vendors’ automated indexing systems and software, humans have to get into the mix.

One twist Mondeca brings to the content IQ notion is a process that helps a potential licensee answer the question, “How smart is your content?” For me, poorly indexed content is not smart. The content is simply poorly indexed.

I navigated to the “more information” link on the Content IQ page and learned that answering the question costs 5000 Euros, roughly $6,000.

Like the knowledge quotient play, smart content and allied jargon make an effort to impart a halo of magic around a pretty obvious function. I suppose that in today’s market, clarity is not important. Marketing magic is needed to create a demand for indexing.

I believe professionally administered indexing is important. I was one of the people responsible for creating the ABI/INFORM controlled vocabulary revision and the reindexing of the database in 1981. Our effort involved controlled terms, company name fields, and a purpose built classification system.

Some day automated systems will be able to assign high value index terms without humans. I don’t think that day has arrived. To create smart content, have smart people write it. Then get smart, professional indexers to index it. If a software system can contribute to the effort, I support that effort. I am just not comfortable with the “smart software” trend that is gaining traction.

Stephen E Arnold, September 23, 2014

Luxid: Positioning Adjustments

September 23, 2014

Luxid, based in Paris, offers an automatic indexing service. The company has focused on the publishing sector as well as a number of other verticals. The company uses the phrase “semantic content enrichment” to describe the company’s indexing. The more trendy phrase is “metatagging,” but I prefer the older term.

The company also uses the term “ontology” along with references to semantic jargon like “triples.” The idea is that a licensee can select a module that matches an industry sector. WAND, a competitor, offers a taxonomy library. The idea is that much of the expensive and intellectually demanding work needed to build a controlled vocabulary from scratch is sidestepped.

The positioning that I find interesting is that Luxid delivers “NLP enabled ontology management workflow.” The idea is that once the indexing system is installed, the licensee can maintain the taxonomy using the provided interface. This is another way of saying that administrative tools are included. Another competitor, Smartlogic, uses equally broad and somewhat esoteric terms to describe what are essentially indexing operations.

Like other search and content processing vendors, Luxid invokes the magic of Big Data. Luxid asserts, “Streamlined, Big Data architecture offers improved scalability and robust integration options.” The point at which indexing processes often stub their toes is the amount of human effort and machine processing time required to keep an index updated and populate the new content across already compiled indexes. Scalability can be addressed with more resources. More resources often mean increased costs, a challenge for any indexing system that deals with regular content, not just Big Data.

Will the revised positioning generate more inquiries and sales leads? Possibly. I find the wordsmithing content processing vendors use fascinating. The technology, despite the academic jargon, has been around since the days of Data Harmony and other aging methods.

The key point, in my view, is that Luxid offers a story that makes sense. The catnip may be the jargon, the push into publishing which is loath to spend for humans to create indexes, and the packaging of vocabularies into “Skill Cartridges.”

I anticipate that some of Luxid’s competitors will emulate the Luxid terminology. For many years, much of the confusion about which content processing system does what has been traceable to widespread use of jargon.

Stephen E Arnold, September 22, 2014
