From Search to Sentiment

July 28, 2014

Attivio has placed itself in the news again, this time for scoring a new patent. Virtual-Strategy Magazine declares, “Attivio Awarded Breakthrough Patent for Big Data Sentiment Analysis.” I’m not sure “breakthrough” is completely accurate, but that’s the language of press releases for you. Still, any advance can provide an advantage. The write-up explains that the company:

“… announced it was awarded U.S. Patent No. 8725494 for entity-level sentiment analysis. The patent addresses the market’s need to more accurately analyze, assign and understand customer sentiment within unstructured content where multiple brands and people are referenced and discussed. Most sentiment analysis today is conducted on a broad level to determine, for example, if a review is positive, negative or neutral. The entire entry or document is assigned sentiment uniformly, regardless of whether the feedback contains multiple comments that express a combination of brand and product sentiment.”
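The distinction the press release draws, one score per document versus one score per entity, can be illustrated with a toy example. This is a minimal sketch under loud assumptions: the word lists, the brand names "Acme" and "Globex," and the clause-splitting rule are all invented for illustration and say nothing about how Attivio's patented method actually works.

```python
import re

# Toy lexicon and entity list: illustrative assumptions, not Attivio's data or method.
POSITIVE = {"love", "great", "excellent", "fast"}
NEGATIVE = {"hate", "terrible", "slow", "awful"}
ENTITIES = {"acme", "globex"}  # hypothetical brand names


def score(words):
    """Return (number of positive words) minus (number of negative words)."""
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def document_sentiment(text):
    """Document-level: a single score for the whole text."""
    return score(re.findall(r"[a-z']+", text.lower()))


def entity_sentiment(text):
    """Entity-level: score each clause and credit it to the entities named in it."""
    totals = {}
    # Split on sentence punctuation and on the word "but" (a crude clause boundary).
    for clause in re.split(r"[.!?;]|\bbut\b", text.lower()):
        words = re.findall(r"[a-z']+", clause)
        for entity in ENTITIES.intersection(words):
            totals[entity] = totals.get(entity, 0) + score(words)
    return totals


review = "I love the Acme phone, but the Globex charger is terrible."
print(document_sentiment(review))
print(entity_sentiment(review))
```

On this review the document-level score nets out to zero, while the entity-level pass reports a positive score for one brand and a negative score for the other. That is the nuance a uniform score hides.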

I can see how picking up on nuances can lead to a more accurate measurement of market sentiment, though it does seem more like an incremental step than a leap forward. Still, the patent is evidence of Attivio’s continued ascent. Founded in 2007 and headquartered in Massachusetts, Attivio maintains offices around the world. The company’s award-winning Active Intelligence Engine integrates structured and unstructured data, facilitating the translation of that data into useful business insights.

Cynthia Murrell, July 28, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

Searchcode Is a Valuable Resource for Developers

July 28, 2014

Here is a useful tool that developers will want to bookmark: searchcode does just what its name suggests—paste in a snippet of code, and it returns real-world examples of its use in context. Great for programming in an unfamiliar language, working to streamline code, or just seeing how other coders have approached a certain function. The site’s About page explains:

“Searchcode is a free source code and documentation search engine. API documentation, code snippets and open source (free software) repositories are indexed and searchable. Most information is presented in such a way that you shouldn’t need to click through, but can if required.”

Searchcode pulls its samples from GitHub, Bitbucket, Google Code, CodePlex, SourceForge, and the Fedora Project. There is a way to search using special characters, and users can filter by programming language, repository, or source. The tool is the product of one Sydney-based developer, Ben E. Boyter, and is powered by the open-source indexer Sphinx Search. Many more technical details about searchcode can be found at Boyter’s blog.

Cynthia Murrell, July 28, 2014

Snowden Effect on Web Search

July 27, 2014

If you are curious about the alleged impact of intercepts and monitoring on search, you will want to read “Government Surveillance and Internet Search Behavior.” You may have to pay to access the document. Here’s a passage I noted:

In the U.S., this was the main subset of search terms that were affected. However, internationally there was also a drop in traffic for search terms that were rated as personally sensitive.

Stephen E Arnold, July 27, 2014

Sponsors of Two Content Marketing Plays

July 27, 2014

I saw some general information about allegedly objective analyses of companies in the search and content processing sector.

The first report comes from the Gartner Group. The company has released its “magic quadrant” which maps companies by various allegedly objective methods into leaders, challengers, niche players, and visionaries.

The most recent analysis includes these companies:

Attivio
BA Insight
Coveo
Dassault Exalead
Exorbyte
Expert System
Google
HP Autonomy IDOL
IBM
IHS
Lucid Works
MarkLogic
Mindbreeze
Perceptive ISYS Search
PolySpot
Recommind
Sinequa

There are several companies in the Gartner pool whose inclusion surprises me. For example, Exorbyte is primarily an eCommerce company with a very low profile in the US compared to Endeca or New Zealand-based SLI Systems. Expert System is a company based in Italy. This company provides semantic software which I associate with mobile applications. IHS (Information Handling Services) provides technical information and a structured search system. MarkLogic is a company with XML data management software that has landed customers in publishing and the US government. Mindbreeze, a home brew search system funded by Microsoft-centric Fabasoft, has an equally low profile. Dassault Exalead, PolySpot, and Sinequa are French companies offering what I call “information infrastructure.” Search is available, but the approach is digital information plumbing.

The IDC report, also allegedly objective, is sponsored by nine companies. These outfits are:

Attivio
Coveo
Earley & Associates
HP Autonomy IDOL
IBM
IHS
Lexalytics
Sinequa
Smartlogic

This collection of companies is also eclectic. For example, Earley & Associates does indexing training and consulting and does not have a deep suite of enterprise software. IHS (Information Handling Services) appears in the IDC report as a knowledge centric company. I think I understand the concept. Technical information in Extensible Markup Language and a mainframe-style search system allow an engineer to locate a specification or some other technical item like the SU 25. Lexalytics is a sentiment analysis company. I do not consider figuring out if a customer email is happy or sad the same as Coveo’s customer support search system. Smartlogic is interesting because the company provides tools that permit unstructured content to be indexed. Some French vendors call this process “fertilization.” I suppose that for purists, indexing might be just as good a word.

What unifies these two lists are the companies that appear in both allegedly objective studies:

Attivio
Coveo
HP
IBM
IHS (Information Handling Services)
Sinequa

My hunch is that the six companies appearing in both lists are in full bore, pedal to the metal marketing mode.
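The overlap between the two vendor lists is easy to check mechanically. The sketch below simply copies the names from the two lists printed above into Python sets. One assumption is baked in: the Gartner entry printed as “HIS” is treated as a typo for IHS.

```python
# Vendor names copied from the two lists above.
# Assumption: the Gartner entry printed as "HIS" is a typo for "IHS".
gartner = {
    "Attivio", "BA Insight", "Coveo", "Dassault Exalead", "Exorbyte",
    "Expert System", "Google", "HP Autonomy IDOL", "IBM", "IHS",
    "Lucid Works", "MarkLogic", "Mindbreeze", "Perceptive ISYS Search",
    "PolySpot", "Recommind", "Sinequa",
}
idc = {
    "Attivio", "Coveo", "Earley & Associates", "HP Autonomy IDOL", "IBM",
    "IHS", "Lexalytics", "Sinequa", "Smartlogic",
}

# Set intersection keeps only the names present in both reports.
both = sorted(gartner & idc)
print(len(both), both)
```

The intersection holds six names: Attivio, Coveo, HP Autonomy IDOL, IBM, IHS, and Sinequa.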

Attivio and Coveo have ingested tens of millions in venture funding. At some point, investors want a return on their money. The positioning of these two companies’ technologies as search and the somewhat unclear knowledge quotient capability suggest that implicit endorsement by mid tier consulting firms will produce sales.

The appearance of HP and IBM on each list is not much of a surprise. The fact that Oracle Endeca is not in either report suggests that Oracle has other marketing fish to fry. Also, the fact that Elasticsearch, arguably the game changer in search and content processing, is not in either pool may be evidence that Elasticsearch is too busy to pursue “expert” analysts laboring in the search vineyard. On the other hand, Elasticsearch may have its hands full dealing with demands of developers, prospects, and customers.

IHS has not had a high profile in either search or content processing. The fact that Information Handling Services appears signals that the company wants to market its mainframe-style, XML-capable system to a broader market. Sinequa appears comfortable with putting forth its infrastructure system as both search and a knowledge engine.

I have not seen the full reports from either mid tier consulting firm. My initial impression of the companies referenced in the promotional material for these recent studies is that lead generation is the hoped for outcome of inclusion.

Other observations I noted include:

  1. The need to generate leads and make sales is putting multi-company reports back on the marketing agenda. The revenue from these reports will be welcomed at IDC and Gartner, I expect. The vendors who are on the hook for millions in venture funding are hopeful that inclusion in these reports will shake the money trees from Boston to Paris.
  2. The language used to differentiate and describe the companies referenced in these two studies is unlikely to clarify the differences between similar companies or make clear the similarities. From my point of view, there are few similarities among the companies referenced in the marketing collateral for the IDC and Gartner studies.
  3. The message of the two reports appears to be “these companies are important.” My thought is that because IDC and Gartner assume their brand conveys a halo of excellence, the companies in these reports are, therefore, excellent in some way.

Net net: Enterprise search and content processing has a hurdle to get over: Search means Google. The companies in these reports have to explain why Google is not the de facto choice for enterprise search and then explain how a particular vendor’s search system is better, faster, cheaper, etc.

For me, a marketer or search “expert” can easily stretch search to various buzzwords. For some executives, customer support is not search. Customer support uses search. Sentiment analysis is not search. Sentiment analysis is a signal for marketers or call center managers. Semantics for mobile phones, indexing for SharePoint content, and search for a technical data sheet are quite different from eCommerce, business intelligence, and business process engineering.

A fruit cake is a specific type of cake. Each search and content processing system is distinct and, in my opinion, not easily fused into the calorie rich confection. A collection of systems is a lumber room stuffed with different objects that don’t have another place in a household.

The reports seem to make clear that no one in the mid tier consulting firms or the search companies knows exactly how to position, explain, and verify that content processing is the next big thing. Is it?

Maybe a Google Search Appliance is the safe choice? IBM Watson does recipes, and HP Autonomy connotes high profile corporate disputes.

Elasticsearch, anyone?

Stephen E Arnold, July 27, 2014

Honk Tracks Search Marketing Memes

July 26, 2014

The Honk page for Beyond Search now tracks information retrieval marketing memes. The information at http://bit.ly/1uqWxfA now includes a discussion of a coinage designed to sell “search” without using the word “search.” Is the approach likely to reverse the fortunes of search vendors who face increasingly intense uphill battles to generate substantive revenue? The Honk “Meme of the Moment” updates will keep you posted.

Stephen E Arnold, July 26, 2014

IDC and Reports by Schubmehl

July 25, 2014

I wanted to nail down a handful of facts.

First, IDC published without a contract four reports in 2012. These reports were disseminated via the IDC Web site, various communications, and via Amazon’s eCommerce site. These reports were:

  • Attivio 236514
  • Elasticsearch 237410
  • Lucid Imagination / Works 236086
  • Polyspot 236511

Each report was $3,500. One report about Attivio was sold on Amazon until July 17, 2014.

The “authors” of these IDC reports included:

  • Susan Feldman, a former IDC professional positioned as a “search expert”. Only Attivio.
  • David Schubmehl, a former OpenText and Janya (no longer in business) “professional” and heir to Ms Feldman as IDC’s search expert who has jumped from my Attivio information into a consulting relationship with that company founded by former Fast Search & Transfer executives. See this link. Dave Schubmehl’s name appears on each of the four published reports using information from my team and me.
  • Constance Ard, MLS, who was at this time the coordinator of my research projects
  • Dr. Tyra Oldham, one of the 2012 members of my research team
  • Stephen E Arnold, me. I have pointed to a biography on my Web site set up to promote the deal I had with IDC to publish an open source search monograph containing profiles of more than a dozen companies in 2012.

So what’s the big deal? Let me highlight the reason I will be taking a look at some of the IDC expertise in the future.

First, Ms. Feldman and I worked on projects that originated at Manning & Napier, then an investment services company. I was happy to cooperate with her when she joined IDC as the head of the IDC search practice. However, under circumstances I don’t understand, Ms. Feldman left IDC and the area of her responsibility was snagged by David Schubmehl. Without Ms. Feldman at IDC, I made numerous efforts to get a contract, get information about sales, find out where the 13 profiles provided by me and my team were at IDC, and, of course, get paid. Ms. Feldman made administrative procedures work. Mr. Schubmehl took a different approach from where I sat.

Second, Mr. Schubmehl made certain his name appeared on the reports published and sold by IDC without written permission from me to use my material or to stake a claim on the work. Furthermore, the source material we provided contained certain information that was in 2012 not widely known. Significant information about the companies we analyzed was not included in the IDC reports. As a result, the IDC reports using my material were not in line with my thinking. One example of Mr. Schubmehl’s thinking is this tweet:

image

According to LinkedIn, IDC’s analyst profile, and various biographies charting his work career, he is an expert in Enterprise Search, Text Analytics, Customer Relations, Consultancy, Document Management, Enterprise Content Management, Business Intelligence, Information Management, Intellectual Property, Litigation Support, Enterprise Software, SaaS, Product Management, Cloud Computing, Analytics, Go-to-market Strategy, Knowledge Management, Software Development, and Enterprise Architecture. This impressive list begs one simple question: “If one is so expert, why is it necessary to use without permission and payment the work of others?”

Third, my attorney sought information about sales and finally pressed IDC to stop selling reports with my name and David Schubmehl’s on them. One fix was for IDC to roll Lucid information into a separate report. IDC stopped selling the four reports identified above in early 2014. IDC continued to sell the Attivio report on Amazon until July 17, 2014. IDC is no longer selling reports with my name on them. This is a modest victory, but it leaves the question hanging, “What motivates a large and presumably well regarded consulting firm to trample over basic business procedures?” I don’t have an answer. I do believe IDC is perhaps not quite so confident of its “experts’” expertise, particularly with regard to search and content processing.

Net net: IDC used my name without my permission. IDC published my research material without issuing a contract for work for hire. IDC took possession of detailed, high value information and permitted that information to “inform” David Schubmehl and to further his impact as a sales person and “expert” at IDC: Mr. Schubmehl, a “long suffering Buffalo Bills fan and reformed youth soccer referee.”

The next time you read an IDC report, please, consider these questions:

  1. Who is the “expert”? The contributors or the IDC person who piggybacks on the names of others with particular expertise?
  2. Is $3,500 for a rehash of other people’s work a wise use of scarce resources?
  3. Why does a large company fail to follow standard business practices such as issuing contracts, observing contracts, providing sales reports, and compensating those who actually performed original work?

Stephen E Arnold, July 25, 2014

Microsoft Puts Machine Learning in the Cloud

July 25, 2014

Machine learning is ascending to the cloud. The Register asks, “Do Data Centers Dream of Electric Sheep? Microsoft Announces Machine Learning Cloud.” As competition in the world of SaaS and remote hosting continues to escalate, this move may set Microsoft ahead of Amazon and Google (for now). Our question—will this progress rub off on Bing? One can hope.

Writer Jack Clark tells us:

“The company’s new ‘Azure ML’ service was announced on Monday and means developers can access machine learning systems hosted in the Azure cloud and even link their applications directly to them. The tech gives developers a directory of machine learning and associated technologies, including deep learning systems, that they can apply to their applications…

“Azure ML also has ‘a number of tools to help clean data,’ explained Microsoft exec Joseph Sirosh in a chat with El Reg, and has compatibility with popular mathematical software R. The service also gives users a way to drag-and-drop various machine learning technologies together so that they can build an application in a visually striking and understandable way.”

It is interesting to note that Sirosh spent nearly ten years working with (among other things) Amazon’s internal machine learning systems during his stint at that company. Though machine learning itself is nothing new, Microsoft hopes Azure ML will make it more accessible, and tempting, to developers. Likening this advance to the birth of the cloud itself, Sirosh enthuses, “Machine learning is an incredibly underutilized capability—every app around us could be becoming intelligent. I would love to have the excitement around machine learning be unleashed.”

Cynthia Murrell, July 25, 2014

PetMatch for iOS Finds Furry Friends

July 25, 2014

A new image-based search tool can take some of the research out of adopting a pet. Lifehacker turns our attention to the free iOS app in, “PetMatch Searches for an Adoptable Pet Based on Appearance.” Now, pet lovers who see their perfect pet on the street can take a picture and find local doppelgangers in need of homes. Perhaps this will help lower dog-napping rates. Reporter Dave Greenbaum notes:

“You should never adopt an animal solely based on looks, of course—you should research the personality of the breed you want—but looks are a factor. This app works great for mixed breed dogs when you aren’t sure what kind of dog you are looking at. I like the fact it will look at local adoption agencies to find a match, too. Online services like Petfinder.com help you find local pets to adopt, but you have to know which breed you are looking for first, and searching for mixed breed dogs (common at shelters) is difficult. This app makes it easy to do a reverse image search and do your research based on the results.”

Another point to note is that PetMatch includes a gallery of dog and cat breeds, so if the picture is in your head instead of your phone, you can still search for a look-alike. It’s a clever idea, and an innovative use of image search functionality.

Cynthia Murrell, July 25, 2014

Facebook Factoids

July 24, 2014

I enjoyed “Facebook Now Has 1.32 Billion Users, with 30 Percent Only Using It on Their Mobile—and the Average American Spends 40 minutes a Day on the Site.” Why read anything except the headline?

Tucked into this sensational write up was this gem:

The number of people who log in at least once a day on mobile devices was 654 million on average in June, up 14 percent from a year earlier.

The reason I highlighted this item is that it focuses on the problem Facebook poses to Google. Not only do hundreds of millions of people use Facebook on a notebook or desktop computer, Facebook’s mobile cohort is growing.

So what?

Facebook users willingly create content for Facebook. Facebook users willingly provide information to Facebook. Facebook captures implicit data from user behaviors.

Advertisers like this situation.

If Facebook keeps expanding its mobile advertising paw print, the Google will have to find a way to counter or take some steps that might be unpalatable to some folks.

In short, the headline tells a story, but the pressure on Google is not front and center where it should be. And search? Who thinks about that for either Facebook or Google? (My tiny voice says, “I do.”)

Stephen E Arnold, July 24, 2014

What SEO, Google, and Marketing Hath Wrought

July 24, 2014

“Myths and Misreporting About Malaysia Airlines Flight 17” is an interesting article. I found the examples of misinformation, disinformation, and reformation thought provoking. The write up spotlights a few examples of fake or distorted information about an airline’s doomed flight.

As I considered the article and its appearance in a number of news alerting services, I shifted from the cleverness of the content to a larger and more interesting issue. From the revelations about software that can alter inputs to an online survey (see this link) to fake out “real” news, determining what’s sort of accurate from what’s totally bogus is becoming more and more difficult. I have professional researchers, librarians, and paralegals at my disposal. Most people do not. No longer surprising to me is the email from one of the editors working to fact check my for-fee columns. The questions range from “Did IBM Watson invent a recipe with tamarind in its sauce?” to “Do you have a source for the purchase price of Vivisimo?” Now I include online links for the facts and let the editors look up my source without the intermediating email. Even then, there is a sense of wonderment when an editor expresses surprise that what he or she believed is, in fact, either partially true, bogus, or unexpected. Example: “Why do French search vendors feel compelled to throw themselves at the US market despite the historically low success rates?” The answer is anchored in [a] French tax regulations, [b] French culture, particularly when a scruffy entrepreneur from the wrong side of the educational tracks tries to connect with a French money source from the right side of the educational tracks, [c] the lousy financial environment for certain high technology endeavors, and [d] selling to the big US markets looks like a slam dunk, at least for a while.

The reason for the disconnect between factoids and information manipulation boils down to a handful of factors. Let me highlight several:

First, the need for traffic to Web sites (desktop, mobile, app instances, etc.) is climbing up the hierarchy of business / personal needs. You want traffic today? The choices are limited. Pay Google $25,000 or more a month. Pay an SEO (search engine optimization) “expert” whatever you can negotiate. Create content, do traditional marketing, and trust that the traffic follows the “if you build it they will come” pipedream. Most folks just whack at getting traffic and use increasingly SEOized headlines as a low cost way of attracting attention. Think headlines from the National Enquirer in the 1980s.

Second, Google has to pump lots of money into plumbing, infrastructure, moon shots, and operational costs (three months at the Stanford Psych unit, anyone?). At the same time, mobile is getting hot. Two problems plague the sunny world of the GOOG. [a] Revenue from mobile ads is less than from traditional ads. Therefore, Google has to find a way to keep that 2006 style revenue flowing. Because there is a systemic shift, the GOOG needs money. One way to get it is to think about Adwords as a machine that needs tweaking. How does one sell Adwords to those who do not buy enough today? You ponder the question, but it involves traffic to a Web site. [b] Google gets bigger so the “think cheap” days of yore are easier to talk about than deliver. A 15 year old company is getting more and more expensive to run. The upcoming battles with Amazon and Samsung will not be cheap. The housing developments, the Loon balloons, the jet fleet, the smart people, and other oddments of the company are money pits. If the British government can fiddle traffic, is it possible that others have this capability too?

Third, marketing, an easy whipping boy or girl as the case may be. After spending lots and lots on Web sites and apps, some outfits’ CFOs are asking, “What do we get for this spending?” In order to “prove” their worth and stop the whipping, marketers have kicked into overdrive. Baloney, specious, half baked, crazy, and recycled content is generated by the terabyte drive. The old fashioned ideas about verification, accuracy, and provenance are kicked to the side of the road.

Net net: running a query on a search engine, accepting the veracity of a long form article, or just finding out what happened at an event is very difficult. The fixes are not palatable to some people. Others are content to believe that their Internet or Internet search engine dispenses wisdom like the oracle at Delphi. Who knew the “oracles” relied on confusing entrances, various substances, and stage tricks to get their story across?

We now consult digital Delphis. How is that working out when you search for information to address a business problem, find a person who can use finger manipulation to relax a horse’s muscle, or determine if a company is what its Web site says it is?

Stephen E Arnold, July 24, 2014
