September 29, 2014
In 2012 and 2013, IDC sold my content with my name and Dave Schubmehl’s. These were nifty IDC “official” reports. The only hitch in the git along is that IDC did not trouble itself to issue a contract, get my permission, or tell me what they were doing with research my team prepared. The deal was witnessed by a law librarian, and I have a stack of emails about my research into such open source companies as Attivio, ElasticSearch (one of the disruptors of the enterprise search market), IBM (the subject of the IDC twit storm), Lucid Imagination (now Lucid Works which I write when I feel playful as Lucid works, really?), and eight other companies.
Hit by a twit storm. Rough seas ahead. Image from www.qsl.net.
In 2012, I had the open source research. IDC wanted the open source content to use in a monograph. So in front of a law librarian, IDC’s search “expert” thought the exchange of my information for open source intelligence, money, and stuff to sell was a great idea. (I have a file of email from IDC to me about what IDC wanted, but I never got a contract. But IDC had my research. Ah, those administrative delays.) IDC, however, was organized enough to make additions to my company research like an open source industry overview.
In an odd approach to copyright, IDC did not produce a contract but it produced reports about four open source companies. Mr. Schubmehl and IDC just went about producing what were recycled company reports and trying to sell them at $3,500 a whack. Is that value or an example of the culture of narcissism? It may come as a surprise to you, gentle reader, but I sell research for money. I have a business model and it has worked for about 40 years. When an outfit uses the research without issuing a contract, I have to start thinking about such issues as fairness, integrity, copyright, and name surfing. Call me idiosyncratic, but when my name is used without my permission, I wonder how a big and allegedly respected organization can operate like a Bear Stearns-type senior executive.
Then, the straw that broke the proverbial camel’s back, a librarian told me that IDC was selling a report with my name and Mr. Schubmehl’s on Amazon. Wow, Amazon, the Wal-Mart for the digital age. The reports, now removed from Amazon’s blue light special shelf, cost $3,500. Not bad for eight pages of information based on my year long research investment into the wild and volatile world of open source search and content processing. Surf’s up for Mr. Schubmehl.
Well, IDC after some prodding by my very gentle legal gerbil stopped selling my work. We received a proposal that offered me a pittance for a guarantee that I would not talk or write about this name surfing, unauthorized resale of my information on Amazon, and the flubs of Mr. Schubmehl.
My legal gerbil rejected IDC’s lawyer crafted “deal,” and I am now converting my IDC misadventure into a metaphor for some of the deeper issues associated with “experts” and certain professional services firms. My legal gerbil suggested a significantly higher fee, but, like many of that ilk, the gerbil broke my heart.
Hence, IDC and Mr. Schubmehl’s tweets and twit storm are on my fragile ship’s radar. Let’s review the IBM IDC Schubmehl twit storm on just one day in September 2014. Trigger warning: Do not emulate the IDC Schubmehl method for your content marketing program. One day of tweets only generates a lot of twit.
Now to the Twit Storm Unleashed on September 16, 2014
Using my Overflight system, I monitor IDC tweets. Quite an interesting series of tweets appeared on September 16, 2014. Mr. Schubmehl posted 25 tweets about IBM Watson.
Here are three examples of the Watson content to which his name was attached:
- September 16, 2014: #WatsonAnalytics uses Watson cognitive technologies to ingest structured data and find relationships – Robin Grosset & Dan Wolfson
- September 16, 2014: Combo of cognitive with cloud analytics improves process, analysis and decision making – cognitive will change all mkts
- September 16, 2014: #WatsonAnalytics will be using a freemium model….first time for IBM…
Obviously there is nothing wrong with a tweet about an IBM product. What’s one more twit emission in a flow of several hundred thousand 140 character text outputs?
There is nothing illegal about two dozen tweets about IBM. What two dozen tweets do is make me laugh and see this content marketing effort as fodder for corporate weirdness.
Also, this IBM twit storm is not on the Miley Cyrus or Lady Gaga scale, but it is notable because it is a one day twit storm quite unlike the Jeopardy journey. Quite a marketing innovation: getting an alleged “expert” to craft 16 “original” tweets in one day and issue seven retweets of tweets from others who are fans of Big Blue. A few Schubmehl tweets on the 16th illustrated diversity; for example, “The FBI’s Facial Recognition System Is Here.” Hmm. The FBI and facial recognition. I wonder why one is interested in this development.
The terms mentioned in these IBM centric tweets on September 16, 2014, reveal the marketing jargon that IBM is using to generate revenue from the game show winning technology. My list of buzzwords from the tweets reads like a who’s who of blogosphere and venture oriented yak:
- Automated data cleansing
- Analytics (cloud based)
- Big Data
- Cognitive (system and capabilities)
- Data explorer
- Natural Language Computing
- Natural Language Query.
From this list of buzzwords my favorites are “cognitive,” “Big Data,” and the number one silly word “Freemium.” Imagine. Freemium from IBM. Imagine.
My Interpretation of the Twit Storm
Let me capture several preliminary observations:
First, the Schubmehl Twitter activity on September 16, 2014 focuses mostly on IBM’s challenged Watson business development effort. The cluster of tweets on the 16th suggests a somewhat ungainly and down-market content marketing play.
Did Mr. Schubmehl wake up on the 16th of September and decide to crank out Watson centric tweets? Did IBM pay IDC and Mr. Schubmehl to do some content marketing like thousands of PR firms do each day? We even have these outfits in Harrod’s Creek, Kentucky to flog auto sales, bourbon, and cheesy festivals in Middletown, Kentucky.
Here’s a question: “How many tweets does a McKinsey or Bain type of consulting firm issue on a single day for a single product that seems to be struggling for revenue?” If you know, please, use the comments section of this blog to provide some factoids.
Second, the tweets provide the reader with a list of what seem to be IBM Watson aficionados or employees who have the job of making the shotgun marriage of open source code, legacy Almaden technology, and proprietary scripts into a billion dollar revenue producer soon, very soon, gentle reader. The individuals mentioned in the September 16, 2014, tweets include:
- Steve Gold, Baylor University
- Robin Grosset, Distinguished engineer Watson Analytics.
- Dan Wolfson, IBM Distinguished Engineer
- Bob Picciano, Senior vice president, IBM information and analytics group.
Perhaps Mr. Gold is objective? I ask, “Are the other three IBM wizards looking at the world through IBM tinted spectacles when reading their business objectives for the current fiscal year?” I asked myself, “Should I trust these individuals who presumably are also “experts” in all things related to Watson?” My preliminary answer is, “Not for an objective view of the game show winning Watson.”
Third, what’s the payoff of this twit storm for IBM? Did IBM expect me to focus on the Schubmehl twit storm and convert the information into my idea of a 10 minute stand up comedy routine to deliver at the upcoming intelligence and law enforcement conference in nine days? Is it possible that “doing social media” looks good on a weekly report when an executive does not have juicy revenue numbers to present? The value of the effort strikes me as modest. In fact, viewed as a group, the tweets could be interpreted as an indicator of IBM’s slide into desperation marketing.
What about consulting firms and their ability to pump out high margin revenue?
Outfits like Gerson Lehrman Group have put the squeeze on mid tier consulting firms. The bottom feeders with their middle school teacher and poet contingent are not likely to sell to the IBMs of the world. GLG type companies are also nipping at the low end business of the blue chip outfits like Bain, Boston Consulting, and even McKinsey.
But GLG can deliver to a client retired professionals from blue chip firms and on point experts. As a result, GLG has made life very, very tough for the mid tier outfits. Why pay $50,000 for an unproven “expert” when you can buy a person with a pedigree for an hour and pay a few hundred bucks when you need a factoid or an opinion? I consider IDC’s move to content marketing indicative of a fundamental shift in the character of a consulting firm’s business. The shift to low level PR work seems out of character for a professional services firm with a commitment to intellectual rigor.
Every few days I learn that something called TopSEOs.com generates a list of content marketing leaders. Will IDC appear on this list?
For those who depend on lower- or mid tier consulting firms for professional counsel, how would you answer these questions:
- What is the intellectual substance behind pronouncements? Is there original research underpinning pronouncements and projections, or are the data culled from secondary sources and discussions with paying customers?
- What is the actual relationship between a mid tier consulting firm and the companies discussed in “authoritative” reports? Are these reports and projects inclusions (a fancy word for ads) or are they objective discussions of companies?
- Are the experts presented as “experts” actually experts or are they individuals who want to hit revenue goals while keeping costs as low as possible?
I don’t have definitive answers to these questions. Perhaps one day I can use a natural language query to tap into Big Data and rely on cognitive methods to provide answers.
For now, a one day twit storm is a wonderful example of how not to close deals, build reputations, and stimulate demand for advanced technology offered via a “Freemium” model. What the heck does that mean anyway?
Stephen E Arnold, September 29, 2014
September 26, 2014
Science is a marvelous manifestation of human curiosity. I read “Five Ways the Superintelligence Revolution Might Happen.” Note the “could” as in “woulda, coulda, shoulda.” These terms shingle protectively many backsides.
The write up lists the options for “superintelligence”, which is a variant of artificial intelligence with a dash of the Googler’s singularity stirred in for good measure. These scientists remember some of the basic methods of chemistry. List the constituents and mix ‘em up. See what happens.
Here are the five ways technological nirvana will arrive:
- Make intelligence out of software. Essentially humans will code a brain. Great idea for a research project. Probably won’t work particularly well for a while.
- Just use math. This is the approach my uncle and his boss would have favored. You remember Kolmogorov and Arnold, don’t you? Also, Googlers and Xooglers are into this approach.
- Brute force. This is the technological equivalent of using surplus military equipment to check out a noisy fraternity party. In terms of smart software, the article is into using von Neumann systems to crack P=NP problems. Yep, that’s a good idea.
- Plagiarizing nature. This idea is that one emulates in software biological processes. Think how ants find a ham sandwich at your picnic. I was on the board of a company that puttered around organic algorithms in 2001. For some problems attacked via well managed methods, the results are interesting. Maybe IBM’s new chip or quantum computing will help out, but it is a subset of the math and brute force approach. Messy categorization I conclude.
- Use humans plus any other methods that seem to work. There you go. This is the state of the art. I have discussed the approach in my analysis of Google’s nowcasting model here. Only hitch? Well, it is not right when it counts: Ebola threat, ISIS/ISIL, horse racing.
Net net: Not much in the breakthrough arena. Looks like we’re into incremental improvements. This method will work, but it won’t arrive quickly enough to keep some of the venture firms funding the wild and crazy AI Aiyaiii world.
The outfits that will be directly affected by this AI craziness are the search and content processing vendors. Many of the companies in this sector will assert that their systems are “intelligent,” “able to comprehend human utterance,” and “predict” user needs. The problem is that delivering on inflated expectations is more difficult than doing a PowerPoint about the magic of information access.
Stephen E Arnold, September 26, 2014
September 25, 2014
MarkLogic, founded more than a decade ago, is an interesting company. I heard that Google kicked its tires because Christopher Lindblad is a true wizard.
The outfit offers an Extensible Markup Language data management solution. Over the years, the company has positioned the system to slice and dice content for publishers, intelligence analysis for government entities, and enterprise search. Along the way, the company’s technology has been shaped to meet the needs of the pivoting forces in content processing. Stated another way, when one thing won’t sell at a pace to keep investors happy, try another way. In the course of its journey, the company brushed against Oracle and then found itself snarled in the confusion between JSON and XML and the sort of open proprietary extensions to the query language used to extract results from the XML store only to get buffeted by the hoo hah about Hadoop and assorted open source alternatives to Codd databases. Wow.
I read a content marketing / public relations story called “MarkLogic Expands Global Reach with New Offices in Chicago.” Check the source quickly because some BusinessWire content can disappear or become available to those who fork over dough to the “news” service. The write up asserted:
“The opening of these new offices is well-timed for the growing number of global customers who need the enterprise grade NoSQL solutions we are delivering to US-based customers,” said David Ponzini, senior vice president of corporate development, MarkLogic. “We are in an advantageous position to make an immediate impact in Europe and Southeast Asia. We continue broadening the market awareness for MarkLogic throughout the world.”
The trick, of course, will be to blast through the financial goals for the company set by the investors years ago. A failure to produce more than $60 million in revenues several years ago led to the departure of one president. A couple of more senior executives have spun through the revolving door not too far from Google Island with its quirky dinosaur skeleton. Does that skeleton stand as a metaphor for proprietary software solutions?
In my view, the business thinking at work is more sales offices equals more sales. I once had an office in Manhattan even though I worked in Illinois. The cost was about $20 per month. I had an address on Park Avenue, south unfortunately and a 212 phone number. I made a sale or two to an organization run by John Suhler, but I quickly figured out that the key to making sales was my being in and around midtown.
I thought I read that outfits like IBM are going to a “no office” approach. Maybe MarkLogic has identified a solution to the overhead associated with full time equivalents and physical space? That begs another question, “What does MarkLogic know that IBM does not know?”
Some vendors have found that more sales offices increase costs without generating sufficient revenue to cover the overhead, miscellaneous costs, and in country marketing expenses. I can name several Paris, France based content processing companies that learned first hand that additional offices are a very, very expensive proposition. Other companies leverage partners for revenues. In one of my industry reports, I pointed out that prior to the sale of Autonomy to HP, Autonomy figured out a hybrid sales model that seemed to work as long as Dr. Lynch was cracking the whip. Remove the management, and the partnering model can go off the rails.
Don’t get me wrong. XML is a wonderful solution to certain types of information challenges. Thomson Reuters can produce hundreds of for fee publications using XQuery and XSLT with proprietary extensions. A quick look at Thomson Reuters’ financial results suggests that more may be needed by this company than a foundation and an XML data store.
How quickly will MarkLogic deliver a five or ten X return on the $70 million investors have pumped in? In today’s market, cranking out $300 to $700 million in revenues from content processing technology that competes with open source alternatives is a tall order.
Maybe more sales offices will do it? My hunch is that more closed deals are the evidence some stakeholders seek.
Stephen E Arnold, September 25, 2014
September 23, 2014
On LinkedIn I learned about this Pando Daily write up: “How Disgruntled Ex-Employees and Bad Reporting Hung LucidWorks Out to Dry.” I noted the Venture Beat analysis of Lucid Works in my post on September 6, 2014. My focus was the wild and crazy information from an “expert” about various factoids. You can read my reaction to the “Trouble at LucidWorks” story here.
The Pando Daily story comes at the issue in a different way. I was delighted to see that Pando found the “expert’s” comments a bit wobbly. There was an interesting run down about Lucid Works that seems to have come from a different point of view. In a way, the two stories—Venture Beat’s and Pando Daily’s—are a bit like the he said, she said information provided to police investigating a married couple’s disturbing the peace incident. I am no cop, so I can’t figure out who is correct and who is incorrect.
Pando takes this tack:
More accurately: It’s [Lucid Works] a startup, and this shit is hard.
I understand that search is hard, but is an eight year old company a start up? That time span baffled me. Coveo asserts that it too is a start up. Other search vendors dating from the implosion of the Big Five in 2006 also use the start up moniker.
The article points out that there are happy employees and positive investors. More money is likely to be needed. Pando Daily quotes a backer as saying:
We won’t start looking for an expansion round until early next year.
ElasticSearch has amassed about $90 million in funding. So LucidWorks may be thinking it needs the same scale of investment to take wing.
With regard to management, Pando Daily reports that the new top dog is the type of CEO who can deliver revenues. The new president, Will Hayes, is described in this context:
On this point, VentureBeat seems oddly hung up on the idea that Hayes is a first-time CEO, perhaps failing to realize that Silicon Valley was (and continues to be) literally built on the success of first-time CEOs. Not to over egg the point, but Mark Zuckerberg and Steve Jobs were first-time CEOs.
Pando Daily added:
As an early member of the Splunk team, Hayes is certainly more qualified for this job than 99 percent of the candidates out there, and more importantly, given that he didn’t found the company, he appears excited about the category.
Pando Daily reminded me that good start ups fire people. I understand the difference between the Silicon Valley approach to management and that practiced at Halliburton and Booz, Allen & Hamilton where I worked for many years. The idea of stability is not always congruent with the needs of a fast moving, pivoting technology company.
Pando Daily also takes issue with Venture Beat’s report that Lucid Works fumbled deals with some real big companies. Pando Daily asserted:
These accounts may or may not have any basis in reality, but they hardly indicate a failing company. The very nature of sales and business development is that deals fall apart all the time. Sometimes those are big deals, sometimes not. The facts are that LucidWorks counts Apple, Sears, Verizon, ADP, Raytheon, Zappos, Qualcomm, Ford, eHarmony, Cisco, and others among current customers.
My reaction to this is okay, but won’t naming these firms give ElasticSearch and other firms a target at which to shoot? Some content processing vendors like Palantir and Recorded Future don’t provide too much information about their customers.
On the all important revenue front, Pando Daily quoted the new top dog at Lucid Works as saying:
“$12 million in services revenue isn’t worth shit,” Hayes says. “But $12 million in product sales on subscription? That’s a $100 million business.”
I agree. Unless the subscriber terminates the subscription. As the competition among content processing vendors heats up, some firms will be quite aggressive in their attempts to take away business. Amazon, for example, seems to be struggling with search, but it could get its act together and offer a good enough solution at very competitive prices. Amazon is not the only sharp toothed outfit in the pond.
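The arithmetic behind the quote, and behind my "unless," can be sketched in a few lines of Python. This is an illustration under stated assumptions, not anyone's actual financial model: the $12 million and $100 million figures come from the quote, reading "a $100 million business" as a valuation is my interpretation, and the 20 percent annual cancellation rate and both function names are hypothetical.

```python
def implied_multiple(valuation, annual_revenue):
    """Valuation expressed as a multiple of annual revenue."""
    return valuation / annual_revenue


def revenue_after_churn(annual_revenue, churn_rate, years):
    """Subscription revenue left after losing churn_rate of it each year,
    assuming no new sales replace the lost subscribers."""
    return annual_revenue * (1 - churn_rate) ** years


subscription_revenue = 12_000_000   # figure from the quote
claimed_value = 100_000_000         # figure from the quote, read as a valuation
churn = 0.20                        # hypothetical annual cancellation rate

multiple = implied_multiple(claimed_value, subscription_revenue)
remaining = revenue_after_churn(subscription_revenue, churn, years=3)

print(f"Implied multiple: {multiple:.1f}x revenue")
print(f"Revenue after 3 years at 20% churn: ${remaining:,.0f}")
print(f"Value at the same multiple: ${remaining * multiple:,.0f}")
```

The quote implies a multiple of a little over eight times revenue. If subscribers walk and no one replaces them, the same multiple applies to a shrinking number, which is why terminations matter more than the headline figure.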
Pando Daily tracked down its own search wizard. That poobah said:
Not everyone agrees that enterprise search is quite this sexy. One enterprise analyst, speaking to Pando on the condition of anonymity, describes it as “not that big of an end market.” But at the same time, it’s one that’s still out there for the taking. “There isn’t really a single company or set of companies that have dominant products in the space,” this analyst says. Google and Microsoft have entered the market (the latter via acquisition) with low-cost offerings that would seem to make the competitive environment more challenging for LucidWorks and other upstarts. But according to the company’s supporters, these products are targeting different, less big data-centric applications and are thus not a valid comparison.
If you have ever listened to opposing expert witnesses in a legal dispute, the same factoid gets very different treatment by each expert. That’s what makes subjective expertise difficult to interpret. My view is that enterprise search is struggling for credibility. Some of the value for information retrieval has been exhausted by vendors now out of business. These include Convera, Delphes, Entopia, Siderean, and others. Some credibility has been eroded as a result of the Fast Search & Transfer matter. The CEO was hit with a jail term and a ban on working in search for a couple of years. Then there is the on going dispute between Hewlett Packard and Autonomy. IDOL is an aging technology like Endeca. But the mud slinging about search and content processing does not improve the image of those working in this sector.
Consequently, information retrieval companies are working overtime to explain their solutions in terms that do not invoke memories of Convera or Fast Search. Palantir is a data mining company. Recorded Future does predictive analytics. Coveo is eDiscovery and customer support. Search vendors are using a wide range of jargon to describe findability. Lucid Works is brave in using enterprise search with a dash of Big Data in its marketing.
Pando Daily said:
Journalism is tough, particularly in the technology sector. Reporters in this industry asked to cover complex and rapidly evolving companies that often take on hordes of venture cash and set outrageous performance expectations. Unseemly as it may be, stories of failure and calamity make for good scoops, and in these cases ex-employees and competitors often make the best sources. Unfortunately, they also can be the most biased sources and are often are in the best position to credibly lead a journalist astray. LucidWorks certainly has its warts and its scars. But that doesn’t make it trouble, that only makes it a startup.
One question remains: When does a company cease to be a start up and start to be a viable company? Is it one year, four years, or eight years? I just don’t know, but I think that companies that have been in business for almost a decade may not be start ups. Management with a start up mentality may not want to face the cold realities expected of established, stable firms. With Lucid’s technology originating with a community, management may be the issue to watch at Lucid Works. Good management can produce revenue, happy employees, and contented customers. Its absence is often evidenced by a lack of harmony.
Stephen E Arnold, September 23, 2014
September 23, 2014
I read “Concept Searching Taxonomy Workflow Tool solving Migration, Security, and Records Management Challenges.” This is a news release and it can disappear at any time. Don’t hassle me if it is a goner. The write up walks me rapidly into the smart content swamp. The idea is that content without indexing is dumb content. Okay. Lots of folks are pitching the smart versus dumb content idea now.
The fix? Concept Searching provides a smart tool to make content intelligent; that is, include index terms. For the youngster at heart, “indexing” is the old school word for metadata.
The company’s news announcement asserts:
conceptTaxonomyWorkflow serves as a strategic tool, managing enterprise metadata to drive business processes at both the operational and tactical levels. It provides administrators with the ability to independently manage access, information management, information rights management, and records management policy application within their respective business units and functional areas, without the need for IT support or access to enterprise-wide servers. By effectively and accurately applying policy across applications and content repositories, conceptTaxonomyWorkflow enables organizations to significantly improve their compliance and information governance initiatives.
The product name is indeed a unique string in the Google index. The company asserts that the notion of a workflow is strategic. Not only is workflow strategic, it is also tactical. For some, this is a two for one deal that may be hard to resist. The tool allows administrators to perform what appear to be tasks I think of as “editorial policy” or, as the young at heart say, information governance.
The only issue for me is that the organizations with which I am familiar have pretty miserable information governance methods. What I find is that organizations have Balkanized methods for dealing with digital information. Examples of poor information governance fall readily to hand. The US court system removed public documents only to reinstate them. The IRS in the US cannot locate email. And when the IRS finds an archive of the email, the email cannot be searched. And, of course, there is Mr. Snowden. How many documents did he remove from NSA servers?
The notion that the CTW tool makes it possible to “apply policy across applications and content repositories” sounds absolutely fantastic to a person with indexing experience. There is a problem. Many organizations do not understand an editorial policy or are unwilling to do much more than react when something goes off the tracks. The reality is that the appetite for meaningful action is often not in commercial enterprises or government entities. Budgets remain tight. Reducing information technology budgets is often a more important goal than improving information technology.
What’s this mean?
My hunch is that Concept Searching is offering a product for an organization that [a] has an editorial policy in place or [b] wants to appear to be taking meaningful steps toward useful information governance.
The president of Concept Searching is taking a less pragmatic approach to selling this tool. Martin Garland, according to the company story, states:
Managing metadata and auto-classifying to taxonomies provides high value in applications such as search, text analytics, and business social. But many forward thinking organizations are now looking to leverage their enterprise metadata and use it to improve business processes aligned with compliance and information governance initiatives. To accomplish this successfully, technologies such as conceptTaxonomyWorkflow must be able to qualify metadata and process the content based on enterprise policies. A key benefit of the product is its ease of use and rapid deployment. It removes the lengthy application development cycle and can be used by a large community of business specialists as well as IT.
The key benefit, for me, is that a well conceived and administered information policy eliminates risks of an information misstep. I would suggest that the Snowden matter was a rather serious misstep.
One assumes that companies have information policies, stand behind them, and keep them current. This strikes me as a quite significant assumption.
A similar message is now being pushed by Smartlogic, TEMIS, WAND, and other “indexing” companies.
Are these products delivering essentially similar functionality? Is any system indexing with less than a 10 percent error rate? Are those with responsibility for figuring out what to do with the flood of digital information equipped to enforce organization wide policies? And once installed, will the organization continue to commit resources to support tools that manage indexing? What happens if Microsoft Azure Search and Delve deliver good enough indexing and controls?
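For anyone who wants to put a number on the 10 percent question, one approach is to compare the terms a system assigns with the terms a human indexer assigns to the same documents. The Python sketch below is an illustration only. The function name `indexing_error_rate`, the two sample documents, and the term sets are hypothetical, and the definition of "error" used here (a term wrongly assigned or a term missed) is one possible choice, not an industry standard.

```python
def indexing_error_rate(human_terms, machine_terms):
    """Share of term assignments that are wrong or missing.

    Both arguments map a document id to a set of index terms. An error is a
    term the machine assigned that the human did not, or a human term the
    machine missed. This is one possible definition, not a standard.
    """
    errors = 0
    total = 0
    for doc_id, gold in human_terms.items():
        auto = machine_terms.get(doc_id, set())
        errors += len(gold ^ auto)   # wrong plus missed terms
        total += len(gold | auto)    # every term either party assigned
    return errors / total if total else 0.0


# Hypothetical sample: two documents indexed by a person and by software.
human = {
    "doc1": {"enterprise search", "open source", "taxonomy"},
    "doc2": {"records management", "compliance"},
}
machine = {
    "doc1": {"enterprise search", "open source", "big data"},
    "doc2": {"records management", "compliance"},
}

rate = indexing_error_rate(human, machine)
print(f"Indexing error rate: {rate:.0%}")
```

In this made-up sample the software misses one term and adds one the indexer did not use, so a single disagreement on one document is enough to blow well past a 10 percent threshold. A real test would need a much larger sample and an agreed definition of what counts as an error.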
These are difficult questions to answer. Based on the pivoting content processing vendors are doing, most companies selling information solutions are trying to find a way to boost revenues in an exhausting effort to maintain stable cash flows.
Does anyone make an information governance tool that keeps track of what information retrieval companies market?
Stephen E Arnold, September 23, 2014
September 23, 2014
I reacted strongly to the IDC report about the knowledge quotient. IDC, as you know, is the home of the fellow who sold my content on Amazon without written permission. I learned that Mondeca is using a variant of “knowledge quotient.” This company’s approach taps the idea of the intelligence quotient of content.
I interpret content with a high IQ in a way that is probably not what Mondeca intended. Smart content is usually content that conveys information that I find useful. Mondeca, like other purveyors of indexing software, uses the IQ to refer to content that is indexed in a meaningful way. Remember if the users do not use the index terms, assigning these terms to a document does not help a user. Effective indexing helps the user find content. In the good old days of specialist indexing, users had to learn the indexing vocabulary and conventions. Today users just pump 2.7 words into a search box and feel lucky.
As with other vendors of automated indexing systems and software, humans have to get into the mix.
One twist Mondeca brings to the content IQ notion is a process that helps a potential licensee answer the question, “How smart is your content?” For me, poorly indexed content is not smart. The content is simply poorly indexed.
I navigated to the “more information” link on the Content IQ page and learned that answering the question costs 5000 Euros, roughly $6,000.
Like the knowledge quotient play, smart content and allied jargon make an effort to impart a halo of magic around a pretty obvious function. I suppose that in today’s market, clarity is not important. Marketing magic is needed to create a demand for indexing.
I believe professionally administered indexing is important. I was one of the people responsible for creating the ABI/INFORM controlled vocabulary revision and the reindexing of the database in 1981. Our effort involved controlled terms, company name fields, and a purpose built classification system.
Some day automated systems will be able to assign high value index terms without humans. I don’t think that day has arrived. To create smart content, have smart people write it. Then get smart, professional indexers to index it. If a software system can contribute to the effort, I support that effort. I am just not comfortable with the “smart software” trend that is gaining traction.
Stephen E Arnold, September 23, 2014
September 23, 2014
Luxid, based in Paris, offers an automatic indexing service. The company has focused on the publishing sector as well as a number of other verticals. The company uses the phrase “semantic content enrichment” to describe the company’s indexing. The more trendy phrase is “metatagging,” but I prefer the older term.
The company also uses the term “ontology” along with references to semantic jargon like “triples.” The idea is that a licensee can select a module that matches an industry sector. WAND, a competitor, offers a taxonomy library. The idea is that much of the expensive and intellectually demanding work needed to build a controlled vocabulary from scratch is sidestepped.
The positioning that I find interesting is that Luxid delivers “NLP enabled ontology management workflow.” The idea is that once the indexing system is installed, the licensee can maintain the taxonomy using the provided interface. This is another way of saying that administrative tools are included. Another competitor, Smartlogic, uses equally broad and somewhat esoteric terms to describe what are essentially indexing operations.
Like other search and content processing vendors, Luxid invokes the magic of Big Data. Luxid asserts, “Streamlined, Big Data architecture offers improved scalability and robust integration options.” The point at which indexing processes often stub their toes is the amount of human effort and machine processing time required to keep an index updated and populate the new content across already compiled indexes. Scalability can be addressed with more resources. More resources often mean increased costs, a challenge for any indexing system that deals with regular content, not just Big Data.
Will the revised positioning generate more inquiries and sales leads? Possibly. I find the wordsmithing content processing vendors use fascinating. The technology, despite the academic jargon, has been around since the days of Data Harmony and other aging methods.
The key point, in my view, is that Luxid offers a story that makes sense. The catnip may be the jargon, the push into publishing which is loath to spend for humans to create indexes, and the packaging of vocabularies into “Skill Cartridges.”
I anticipate that some of Luxid’s competitors will emulate the Luxid terminology. For many years, much of the confusion about which content processing system does what can be traced to widespread use of jargon.
Stephen E Arnold, September 22, 2014
September 22, 2014
I read “How IBM’s Watson Could Do for Analytics What Search Did for Google.” I urge you to flip through a math book like Calculus on Manifolds: A Modern Approach to Classical Theorems of Advanced Calculus. Although an older book, some of its methods are now creeping into the artificial intelligence revolution that seems to be the next big thing. Then read the Datamation write up.
IBM is rolling out a “freemium model to move Watson, their [sic] English language AI interface for analytics, into the market more aggressively.” What could be more aggressive than university contests, recipes for Bon Appétit, and curing cancer?
The article points out that the only competitor to Watson is Google. Well, that’s an interesting assertion.
Google put an interface on search, I learned. The rest is Google’s dominance. Now IBM wants to put an interface on analytics, and, I assume it follows to the thinkers at IBM, IBM’s dominance will tag along.
The article asserts:
We often talk about analytics needing data scientists who have a unique skill set, allowing them to get out the answers needed from highly complex data repositories. Since the results of the analysis are supposed to lead to better executive decisions the ideal skill set would have been an MBA Data Scientist, yet I’ve actually never seen one of those. Folks who are good at deep analysis and folks that are good at business tend to be very different folks, and data scientists are in very short supply at the moment.
Well, someone has to:
- Select numerical recipes
- Set thresholds
- Select process sequences
- Select data and ensure that they are valid
- Set up outputs, making decisions about what to show and what not to show
- Modify when the outputs do not match reality. (I realize that this step is of little interest to some analytics users.)
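To make the list concrete, here is a minimal Python sketch of my own. It is not anything from IBM, Watson, or the Datamation write up. The function name, the sample numbers, and the choice of a z-score test are all assumptions picked for illustration. The point is that even a toy analysis forces a human to pick a recipe, set a threshold, and check the data.

```python
from statistics import mean, stdev

def flag_outliers(values, threshold=2.0):
    """Return the values whose z-score exceeds a human-chosen threshold.

    Hypothetical helper for illustration only. The recipe (z-score)
    and the default threshold are decisions a person has to make.
    """
    # Select data and ensure they are valid: keep real numbers, drop NaN.
    clean = [v for v in values if isinstance(v, (int, float)) and v == v]
    if len(clean) < 2:
        return []  # not enough data to compute a sample deviation
    mu = mean(clean)
    sigma = stdev(clean)
    if sigma == 0:
        return []  # every value is identical, so nothing stands out
    # Set up outputs: show only the values that cross the threshold.
    return [v for v in clean if abs(v - mu) / sigma > threshold]

# Made-up numbers, not real data.
weekly_sales = [100, 102, 98, 101, 99, 250]
print(flag_outliers(weekly_sales, threshold=2.0))
print(flag_outliers(weekly_sales, threshold=3.0))
```

With these six made-up values, the spike at 250 is flagged at a threshold of 2.0 and missed at 3.0. Same data, same recipe, different answer. Someone has to decide which output matches reality.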
The article concludes:
The Freemium model has similar advantages. So if you wrap a product that line executives should prefer with an economic model that removes most of the financial barriers, you should end up with a solution that does for IBM what Search did for Google. And that could do some interesting things to the analytics market, creating a similar set of conditions to those that put IBM on top of technology in the last century.
What’s a freemium model? What’s the purpose of the analysis? What’s the method to validate results? What controls does a clueless user have over the Watson system?
Oh, wait. Watson is a search system. Google is a search system that people use. Watson is a search system that few use. Also, IBM still sells mainframes. This is a useful factoid to keep in mind.
Stephen E Arnold, September 22, 2014
September 21, 2014
Editor’s Note: This amusing open letter to Chrissy Lee at Launchsquad Public Relations points out some of the challenges Lucid Imagination (now Lucid Works) faces. Significant competition exists from numerous findability vendors. The market leader in open source search is, in Beyond Search’s view, ElasticSearch.
Dear Ms. Lee,
I sent you an email on September 18, 2014, referring you to my response to Stacy Wechsler at Hired Gun public relations. I told you I would create a prize for the news release you sent me. I am retired, but I don’t have too much time to write for PR “professionals” who send me spam, fail to do some research about my background, and fail to understand the topic addressed in your email.
Some history: I recall the first contact I had from Lucid Imagination in 2008. A fellow named Anil Uberoi sent me an email. He and I had a mutual connection, Mark Krellenstein who was the CTO for Northern Light when it was a search vendor.
I wrote a for fee report for Mr. Uberoi, who shortly thereafter left Lucid for an outfit called Kitana. His replacement was a fellow named David. He left and migrated to another company as well. Then a person named Nancy took over marketing and quickly left for another outfit. My recollection is that in a span of 24 months, Lucid Imagination churned through technical professionals, marketers, and presidents. Open source search, it seemed, was beyond the management expertise of the professionals at Lucid.
When co-founder Mark Krellenstein cut his ties with the firm, I wondered how Mr. Krellenstein could deliver the innovative folders function for Northern Light and flop at Lucid. Odd.
Recently I have been the recipient of several emails sent to my two major email accounts. For me, this is an indication of spam. I knew about the appointment of another president. I read “Trouble at Lucid Works: Lawsuits, Lost Deals, and Layoffs Plague the Search Startup Despite Funding.” Like other pundit-fueled articles, there is probably some truth, some exaggeration, and some errors in the article. The overall impression left on me by the write up is that Lucid Works seems to be struggling.
Your emails to me indicate that you perceive me as a “real” journalist. Call me quirky, but I do not like it when a chipper young person writes me, uses my first name, and then shovels baloney at me. You are the purveyor of search silliness for your employer Launchsquad, which seems to be Lucid Works’ biggest fan and current content marketing agent. Not surprisingly, the new Lucid Fusion product is the Popeil pocket fisherman of search. Fusion slices, dices, chops, and grates. Here’s what Lucid Works allegedly delivers via Lucene/Solr and proprietary code:
- Modular integration. Sorry, Ms. Lee, I don’t know what this means.
- Big Data Discovery Engine. Ms. Lee, Lucid has a search and retrieval system, not a Cybertap, Palantir, or Recorded Future type system.
- Connector Framework. Ms. Lee, licensees want connectors included. Salesforce bought Entropy Soft to meet this need. Oracle bought Outside In for the same reason. Even Microsoft includes some connectors with the quite fragile Delve system for Office 365.
- Intelligent Search Services. Ms. Lee, I suggest you read my forthcoming article in KMWorld about smart software. Today, most search services are using the word intelligent when the technology in use has been available for decades.
- Signals Processing. Ms. Lee, I suggest you provide some facts for signals processing. I think in terms of SIGINT, not crude click log file data.
- Advanced Analytics. Ms. Lee, I lecture at several intelligence and law enforcement conferences about “analytics.” The notion of “advanced” analytics is at odds with the standard numerical recipes that most vendors use. The reason “advanced” is not a good word is that there are mathematical methods that can deliver significant return. Unfortunately today’s computer systems cannot get around the computational barriers that bring x86 architectures to their knees.
- Natural Language Search. Ms. Lee, I have been hearing about NLP for many years. Perhaps you have not experimented with the voice search functions on Apple and Android devices? You should. Software does a miserable job of figuring out what a human “means.”
Frankly I am not confident that Lucid Works can close the gap between your client and ElasticSearch. Furthermore, I don’t think Lucid Works can deliver the type of performance available from Searchdaimon or ElasticSearch. The indexing and query processing gap between Lucid Works and Blossom Software is orders of magnitude. How do I know? Well, my team tested Lucid Works’ performance against these systems. Why don’t you know this when you write directly to the person who ran the tests? I sent a copy of the test results to one of Lucid Works’ many presidents.
Do I care about Ms. Lee, the new management team, the investors, or the “new” Lucid?
The sun has begun to set on vendors and their agents who employ meaningless jargon to generate interest from potential licensees.
What’s my recommendation? I suggest a person interested in Lucid navigate to my Search Wizards Speak series and read the Lucid Imagination and Lucid Works interviews. Notice how the story drifts. You can find these interviews at www.arnoldit.com/search-wizards-speak.
Why does Lucid illustrate “pivoting”? It is easy to sit around and dream about what software could do. It is another task to deliver software that matches products and services from industry leaders and consistent innovators.
For open source search, I suggest you pay attention to www.Flax.co.uk, www.Searchdaimon.com, www.sphinxsearch.com, and www.elasticsearch.com for starters. Keep in mind that other competitors like IBM and Attivio use open source search technology too.
You will never have the opportunity to work directly for me. I can offer one small piece of advice: Do your homework before writing about search to me.
Stephen E Arnold, September 21, 2014
September 9, 2014
In early September 2014, Hewlett Packard announced its hackathons. These are designed to “unleash developer creativity.” The hacks will demonstrate the power of [the] IDOL OnDemand platform.
HP has lined up events at DataWeek and API World, Legal Hackers, HackMIT Hackathon, and TCO14. The most interesting comment in the announcement is this statement attributed to the IDOL OnDemand “evangelist”:
IDOL OnDemand is the ideal platform for today’s developer looking to build amazing applications in the mobile, big data world. The hackathons are terrific opportunities for developers to engage with their peers and the IDOL OnDemand platform, and are always a lot of fun too.
For me, the most fun I have is watching Hewlett Packard sling mud at Autonomy, Deloitte, and former Autonomy employees.
The idea informing these hackathons appears to be building apps for HP’s Autonomy IDOL in the cloud initiative. Compared to ElasticSearch, HP is putting quite a bit of effort into this program. ElasticSearch, on the other hand, announces a developer training session and the developers show up.
Perhaps HP’s struggles with IDOL have something to do with one or more of these factors:
- Open source options / alternatives to proprietary information retrieval systems
- HP’s history of management turnover
- HP’s on again and off again approach to certain business initiatives
- The public relations stemming from the Autonomy litigation.
Did I omit a factor or two? Use the comments section to set me straight.
Stephen E Arnold, September 9, 2014