Quality and Text Processing: An Old Couple Still at the Altar

August 6, 2015

I read “Why Quality Management Needs Text Analytics.” I learned:

To analyze customer quality complaints to find the most common complaints and steer the production or service process accordingly can be a very tedious job. It takes time and resources.

This idea is similar to the one expressed by Ronen Feldman in a presentation he gave in the early 2000s. My notes of the event record that he reviewed the application of ClearForest technology to reports from automobile service professionals which presented customer comments and data about repairs. ClearForest’s system was able to pinpoint that a particular mechanical issue was emerging. The client responded to the signals from the ClearForest system and took remediating action. The point was that sometime in the early 2000s, ClearForest had built and deployed a text analytics system with a quality-centric capability.
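
The kind of signal described here, a component turning up more often in service reports from one period to the next, can be sketched in a few lines of Python. This is only an illustration and not ClearForest's method: the report snippets, the component watch list, and the function names are all invented for the example.

```python
from collections import Counter

# Hypothetical service-report snippets grouped by period (invented data).
REPORTS = {
    "period_1": ["brake squeal at low speed", "radio static", "brake pedal soft"],
    "period_2": ["brake squeal", "brake pedal soft", "brake light on", "seat rattle"],
}

# Hypothetical watch list of component terms.
COMPONENTS = ["brake", "radio", "seat"]


def mentions_by_period(reports, components):
    """Count how many reports in each period mention each component."""
    counts = {}
    for period, texts in reports.items():
        tally = Counter()
        for text in texts:
            words = text.lower().split()
            for component in components:
                if component in words:
                    tally[component] += 1
        counts[period] = tally
    return counts


def rising(counts, earlier, later):
    """Return components mentioned more often in the later period."""
    return [c for c in counts[later] if counts[later][c] > counts[earlier][c]]


COUNTS = mentions_by_period(REPORTS, COMPONENTS)
EMERGING = rising(COUNTS, "period_1", "period_2")
```

A production system would need entity extraction and normalization rather than exact word matches, which is where the tedium and the cost come in.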

I mention this point because many companies are recycling ideas and concepts which are in some cases long beards. ClearForest was acquired by the estimable Thomson Reuters. Some of the technology is available as open source at Calais.

In search and content processing, the case examples, the lingo, and even the technology have entered what I call the “recycling” phase.

I learned about several new search systems this week. I looked at each. One was a portal, another a metasearch system, and a third a privacy centric system with a somewhat modest index. Each was presented as new, revolutionary, and innovative. The reality is that today’s information highways are manufactured from recycled plastic bottles.

Stephen E Arnold, August 6, 2015

Microsoft Nudges English to Ideographs

May 5, 2015

Short honk: In my college days, I studied with a fellow who was the world’s expert in the morpheme burger. You are familiar with hamburger. Lev Soudek (I believe this was his name) set out to catalog every use of –burger he could find. Dr. Soudek was convinced that words had a future.

He is probably pondering the rise of ideographs like emoji. For insiders, a pictograph can be worth a thousand words. I suppose the morpheme burger is important to the emergence of the hamburger icon like this:

image

Microsoft is pushing into new territory according to “Microsoft Is First to Let You Flip the Middle Finger Emoji.” Attensity, Smartlogic, and other content processing systems will be quick to adapt. The new Microsoft is a pioneering outfit.

Is it possible to combine the hamburger icon with the middle finger emoji to convey a message without words?

Dr. Soudek, what do you think?

image image

What about this alternative?

image image

How would one express this thought? Modern language? Classy!

Stephen E Arnold, May 5, 2015

The Challenge of Synonyms

April 12, 2015

I am okay with automated text processing systems. The challenge is for software to keep pace with the words and phrases that questionable or bad actors use to communicate. The marketing baloney cranked out by vendors suggests that synonyms are not a problem. I don’t agree. I think that words used to reference a subject can fool smart software and some humans as well. For an example of the challenge, navigate to “The Euphemisms People Use to Pay Their Drug Dealer in Public on Venmo.” The write up presents some of the synonyms for controlled substances; for example:

  • Kale salad thanks
  • Columbia in the 1980s
  • Road trip groceries
  • Sanity 2.0
  • 10 lbs of sugar

The synonym I found interesting was an emoji, which most search and content processing systems cannot “understand.”

image

and

image

Attensity asserts that it can “understand” emojis. Sure, if there is a look up list hard wired to a meaning. What happens if the actor changes the emoji? Like other text processing systems, the smart software may become less adept than the marketers state.
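
The weakness of a hard wired look up list is easy to show. The sketch below is not Attensity's implementation; the two symbols, their assigned meaning, and the function name are assumptions made up for illustration.

```python
# Hypothetical hard-wired lookup list; the symbols and meanings are invented.
EMOJI_MEANINGS = {
    "\U0001F341": "possible drug reference",  # maple leaf
    "\U0001F48A": "possible drug reference",  # pill
}


def interpret(message, lookup=EMOJI_MEANINGS):
    """Return the listed meaning for every known symbol found in the message."""
    return [meaning for symbol, meaning in lookup.items() if symbol in message]


known = interpret("thanks \U0001F341")    # symbol is on the list
swapped = interpret("thanks \U0001F333")  # actor switched to a tree emoji
```

The first message is flagged; the second returns an empty list because the new symbol is not in the table. The list has to be updated by hand every time the actors change their vocabulary.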

But why rain on the hype parade and remind you that search is difficult? Moving on.

Stephen E Arnold, April 12, 2015

Hidden Data In Big Data

December 15, 2014

Did you know that there was hidden data in big data? Okay, that makes a little sense given that big data software is designed to find the hidden trends and patterns, but RC Wireless’ “Discovering Big Data Unknowns” article points out that there is even more data left unexplored. Why? Because people are only searching in the known areas. What about the unknown areas?

The article focuses on Katherine Matsumoto of Attensity and how she uses natural language processing to “social listen” in these grey areas. Attensity is a company that specializes in natural language processing analytics to understand the content around unstructured data (big data white noise). Attensity views the Internet as the world’s largest consumer focus group, and it helps its clients understand consumer habits. The new Attensity Q platform enables users to identify these patterns in real time and detect big data unknowns.

“The company’s platform combines sentiment and trend analysis with geospatial information and information on trend influencers, and said its approach of analyzing the conversations around emerging trends enables it to act as an “early warning” system for market shifts.”

The biggest problem Attensity faces is filtering out spam and understanding the data’s context. Finding the context is the main way social data can be harnessed for companies.

Scooping out the white noise for the useful information is a hard job. Can the same technology be applied to online ads to filter out the scams from legitimate ones?

Whitney Grace, December 15, 2014
Sponsored by ArnoldIT.com, developer of Augmentext

Chiliad Offline: A Precursor for Other BI Outfits

October 13, 2014

According to PacerMonitor, Chiliad, Inc. filed for bankruptcy on August 6, 2014. As you may recall, the company was a Washington, DC area analytics firm founded by Christine Maxwell of McKinley Group and Magellan fame. (Magellan became part of Excite, which also faded away.)

About two years ago, Beyond Search wrote about Chiliad and its big rocks. Also, in 2012, the company named Craig Norris as chief executive officer. Mr. Norris (an industry leader according to Reuters) had been the CEO of Attensity, a sentiment analysis outfit, which has experienced its share of strong headwinds. In the news release about his appointment, he said:

“I am excited to be joining Chiliad at an important stage in its growth. What makes or breaks an analytics company is the quality and usability of its core technology. Chiliad’s offering has proven its ability to extract critical findings from data at massive scale for both Government and Commercial customers. I am eager to see us gain recognition for our technology leadership.”

The news release included assertions by Patrick Gross (Chairman of the Chiliad board of directors) that I have encountered many times in the last five years; to wit:

“Chiliad has already solved two very challenging problems. The first is the ability to rapidly search data collections at greater scale than any other offering in the market. The second is to allow search formulation and analysis in natural language. This means that no longer is an elite class of analysts required in order to generate meaningful results, thus reducing the personnel training and skills shortages that plague alternative solutions and put timely discovery at risk. The explosion of ‘Big Data’ is real and valuable findings are buried in vast collections for both enterprises and governments. Chiliad has the opportunity to integrate its innovative, massively scalable solutions with emerging open source software to build customized solutions for the largest-scale clients.”

Businessweek described the company in this way:

Chiliad, Inc. provides data analysis solutions for various clouds, agencies, departments, and other stovepipes. The company offers Discovery/Alert, a platform that enables investigators, business analysts, and knowledge workers to securely reach, find, analyze, and continuously stay on top of big data—whether structured or unstructured, and classified or unclassified. Its software solutions include Iterative Discovery cycle that allows analysts and researchers to reach various content silos, find what matters, analyze it to find meaning from the information relationships presented and continuously monitor changes; and Architecture, a virtual consolidated data center that enables multidimensional analysis and ranking. It serves government/intelligence, law enforcement, healthcare, pharmaceutical, insurance, and other markets. Chiliad, Inc. was founded in 1998 and is headquartered in Herndon, Virginia.

I have highlighted the buzzwords that were designed to generate sales leads and revenue. I can only assume that the verbiage and the Attensity management touch fell short of the mark. How many of the “analytics” and “business intelligence” companies will follow Chiliad’s path? Good question but I keep asking it.

Stephen E Arnold, October 12, 2014

Huge Bets on Search: Spreadsheet Fever Rages

June 11, 2014

The news of the $70 million injected into Elasticsearch caused me to check out Crunchbase and some other sources of funding data. I looked at a handful of search and content processing vendors in the departures lounge. I am supposed to be retired, but Zurich beckons.

How large is the market for search and content processing software and services? As a former laborer in the vineyards of Halliburton Nuclear and Booz, Allen & Hamilton, the answer is, “You can charge as much as you want when the customer is in a corner.” The flipside of this adage is, “You can’t charge as much when there are many low cost options.”

In my view, search—regardless of the window dressing slapped on decades old systems and methods—is sort of yesterday. One of the goslings posted a list of Hewlett Packard’s verbal arabesques to explain IDOL search as everything EXCEPT search. The HP verbal arabesques make my point:

Search is not going to generate big money going forward.

Is search (regardless of the words used to describe it) a money pit like the one the Tom Hanks motion picture made vivid?

For that reason, I am wondering what investors are thinking as they pump money into search and content processing companies. The largest revenue generator in the search sector is either Google or Autonomy. Google, as you may know, is in the online advertising business. Search is a Trojan horse. Search is free and the clicks trigger the GoTo/Overture mechanism that caused Google’s moment of inspiration. Before the Google IPO, Google ponied up some dough to Yahoo regarding alleged borrowing of pay to play methods.

Autonomy focused on the enterprise. Between 1996 and October 2011, Sir Michael Lynch grew the company to about $1 billion in revenues. HP’s prescient and always interesting management paid $10.3 billion for Autonomy and then wrote off $8 billion, aimed allegations at the company, and, in general, made it clear that HP was essentially a printer ink business with what seems to be great faith in IDOL, DRE, and assorted rich media tools.

More recently, IBM, the subject of an entertaining analysis, The Decline and Fall of IBM by Robert X. Cringely, suggested that Watson would grow to be a $10 billion in revenue business. Not a goal to ignore. The fact is that Watson is a collection of home grown widgets and open source search technology. I think Watson’s last search contribution was creating a recipe for a tamarind flavored sauce. IBM is probably staffed with folks smarter than I. But a billion dollar bet with a goal of building a revenue stream 10 to 12 times greater than Autonomy’s in one third the time. Wowza.

Let’s do some simple addition in the elegant United lounge.

Let’s assume that IBM and HP actually generate the billions necessary to recover the cost of IDOL and hit the crazy IBM goal of $10 billion in four or five years. To make the math simple, skip interest, the cost of assuaging stakeholders, and the money needed to close deals that total $20 to $25 billion. HP pumps up Autonomy to $10 or $11 billion and IBM tallies another $10 to $12 billion.

So, HP and IBM need or want to build $10 billion or more in revenues from their respective search and content processing ventures. I estimated that the market for “search” was about $1.3 billion in 2006. I am not too sure that market has grown by a significant factor since the economic headwinds began blowing through carpetland.

Now consider the monies invested in some search and content processing companies.

Attensity (sentiment analysis), $90 million

BA Insight (Microsoft centric, search and business intelligence), $14.5 million

Content Analyst (text analysis, SAIC technology), $7.0 million

Coveo (originally all Microsoft all the time, now kitchen sink vendor), $34.7 million

Digital Reasoning (text analysis, no shipping product), $4.2 million

EasyAsk (natural language processing, several owners), $20 million

Elasticsearch (open source search and consulting), $104 million

Hakia (semantic search), $23.5 million

MarkLogic (XML data management and kitchen sink apps), $73.6 million

Recorded Future (text analysis of Web content), $20.9 million

Recommind (similar to Autonomy method), $15 million

Sinequa (proprietary search and widgets), $5.3 million

X1 (search and new management), $12.2 million

ZyLab (search and licensed visualizations), $2.4 million
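
For readers who want to do the addition themselves, the amounts in the list can be totaled with a short Python sketch. The figures are copied from the list above; setting the total against the $1.3 billion 2006 market estimate is an illustrative choice for this example, not a calculation made in the write up.

```python
from decimal import Decimal

# Funding amounts in millions of dollars, copied from the list above.
FUNDING = {
    "Attensity": "90",
    "BA Insight": "14.5",
    "Content Analyst": "7.0",
    "Coveo": "34.7",
    "Digital Reasoning": "4.2",
    "EasyAsk": "20",
    "Elasticsearch": "104",
    "Hakia": "23.5",
    "MarkLogic": "73.6",
    "Recorded Future": "20.9",
    "Recommind": "15",
    "Sinequa": "5.3",
    "X1": "12.2",
    "ZyLab": "2.4",
}

# Decimal avoids binary floating point rounding noise in the sum.
total_millions = sum(Decimal(amount) for amount in FUNDING.values())

# The 2006 market estimate cited earlier in the post: $1.3 billion.
MARKET_2006_MILLIONS = Decimal("1300")
share_percent = (total_millions / MARKET_2006_MILLIONS * 100).quantize(Decimal("0.1"))

print(f"Total invested: ${total_millions} million")
print(f"Share of the 2006 market estimate: {share_percent}%")
```

The 14 companies listed account for $427.3 million in funding, roughly a third of the 2006 estimate for the whole market.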

Read more

Semantria and Diffbot: Clever Way to Forge a Tie Up

December 12, 2013

Short honk. I came across an interesting marketing concept in “Diffbot and Semantria Join to Find and Parse the Important Text on the ‘Net (Exclusive).”

Semantria (a company that offers sentiment analysis as a service) participated in a hackathon in San Francisco. The article explains:

To make the Semantria service work quickly, even for text-mining novices, Rogynskyy’s team decided to build a plugin for Microsoft’s popular Excel spreadsheet program. The data in a spreadsheet goes to the cloud for processing, and Semantria sends back analysis in Excel format.

Semantria sponsored a prize for the best app. Diffbot won:

A Diffbot developer built a simple plugin for Google’s Chrome browser that changes the background color of messages on Facebook and Twitter based on sentiment — red for negative, green for positive. The concept won a prize from Semantria, Rogynskyy said. A Diffbot executive was on hand at the hackathon, and Rogynskyy started talking with him about how the two companies could work together.

I like the “sponsor”, “winner” and “team up” approach. The pay off, according to the article, is “While Semantria and Diffbot technologies continue to be available separately, they can now be used together.”

Sentiment analysis is one of the search submarkets that caught fire and then, based on the churning at some firms like Attensity, may be losing some momentum. Marketing innovation may be a goal for other firms offering this functionality in 2014.

Stephen E Arnold, December 12, 2013

Try TextBlob for Sentiment Analysis

November 19, 2013

Sad to say, we have heard rumblings about severe disappointment with Attensity-type and Lexalytics-type sentiment applications. If you want to kick some tires in this interesting search niche, look instead to the open source application TextBlob. OpenShift points out this resource in “Day 9: TextBlob—Finding Sentiments in Text.” The article is one in an ambitious series by writer Shekhar Gulati, who challenged himself to master one technology a day for a month. Very admirable, sir!

Gulati begins with his experience with sentiment analysis:

“My interest in sentiment analysis is few years old when I wanted to write an application which will process a stream of tweets about a movie, and then output the overall sentiment about the movie. Having this information would help me decide if I wanted to watch a particular movie or not.

“I googled around, and found that Naive Bayes classifier can be used to solve this problem. The only programming language that I knew at the time was Java, so I wrote a custom implementation and used the application for some time. I was lazy to commit the code, so when my machine crashed, I lost the code and application. Now I commit all my code to github, and I have close to 200 public repositories 🙂

“In this blog, I will talk about a Python package called TextBlob which can help developers solve this problem. We will first cover some basics, and then we will develop a simple Flask application which will use the TextBlob API.”

The post does indeed cover the basics, including the installation of Python and virtualenv before we can get going with TextBlob. It then takes us through writing an example application and deploying it to the cloud. As he notes above, Gulati has his code safe and sound at Github; the code for this example is posted here, and the js and css files can be found here.
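
For a feel of the Naive Bayes approach Gulati mentions in the quoted passage, here is a minimal pure Python sketch. It is neither his lost Java code nor TextBlob's implementation; the four training sentences and the function names are invented for illustration, and a real classifier would need far more data.

```python
import math
from collections import Counter, defaultdict

# Toy training data, invented for illustration only.
TRAIN = [
    ("loved this movie great acting", "pos"),
    ("great fun and a wonderful story", "pos"),
    ("terrible plot and awful acting", "neg"),
    ("boring awful waste of time", "neg"),
]


def train(examples):
    """Count words per label and documents per label."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set()
    for counter in word_counts.values():
        vocab.update(counter)
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Pick the label with the highest log probability (add-one smoothing)."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)  # class prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # skip words never seen in training
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


MODEL = train(TRAIN)
print(classify("great story", MODEL))
print(classify("awful boring plot", MODEL))
```

With this toy data the first phrase comes out positive and the second negative. Feed the same function a stream of tweets about a movie, tally the labels, and you have the rough shape of the application Gulati describes.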

Cynthia Murrell, November 19, 2013

Best Social Monitoring Tool Depends on Who Is Asking

August 28, 2013

Confused about social media monitoring? A thread at Quora, “Which Are the Best Social Media Monitoring Tools?” suggests that like search, social media monitoring is pretty tricky. The overall consensus statement makes it clear there is no simple answer: “No overall best tool. Pick the best fit for your needs.” Hmm.

Several respondents share their thoughts. One had compared Radian6 and Sysomos, and found the latter much easier to use. Another liked Engagor for its low price point. Perhaps the most comprehensive (though admittedly promotional) answer comes from Web Liquid account executive Ben Semmar, who shares:

“[. . .] Over the past couple of months, I’ve been involved in the creation of a Social Media Monitoring Buyer’s Guide. We began with a list of over 40 vendors, and based on a variety of criteria, whittled it down to a list five ‘finalists’ that we then conducted hands-on trials with. We found that some tools perform better than other tools in certain areas (but, really, doesn’t everything?) and so we don’t proclaim one tool king of them all; suffice it to say, though, that the five tools we tested are, based on our experience with and objective evaluation of the market, the best out there. You can find the study here: http://www.webliquidgroup.com/social-media-monitoring-tool-buyers-guide.”

Note that the guide he mentions is free, but requires a name and email address to view. Semmar goes on to assert one important caveat: We have not reached the point where algorithms can make reliable judgments about which insights a business should focus on, and how to use them. Though quality monitoring software can be a useful tool, the human mind is still required to wield it. (For now.)

Cynthia Murrell, August 28, 2013

News Channels Are Not Dying They Are Expanding

August 13, 2013

If you do not read enough about customer relationship management (also known as search) at Arnold IT, you might find Paper.li interesting. Paper.li is an aggregation service that covers the above topic plus many more, which makes the subject matter wander. Take a look and not only will you see stories about IT, business, and entertainment, but there are some items that look like they have been pulled from Reddit.

How does Paper.li work? Its approach is similar to an old-fashioned newspaper with customizable options on your topic of choice:

“The key to a great newspaper is a great newsroom. The Paper.li platform gives you access to an ever-expanding universe of articles, blog posts, and rich media content. Paper.li automatically processes more than 250 million social media posts per day, extracting & analyzing over 25 million articles. Only Paper.li lets you tap into this powerful media flow to find exactly what you need, and publish it easily on your own online newspaper.”

Aggregation services are gaining more prominence, especially since Google Reader went kaput. Another factor is that people do not enjoy having to sift through search results. There is definitely a market, so why not find a sponsor? Perhaps Attensity? Let us know if you know the funding agent.

Whitney Grace, August 13, 2013

Sponsored by ArnoldIT.com, developer of Beyond Search
