Twenty Electric Text Analytics Platforms

March 11, 2014

Butler Analytics collected a list of “20+ Text Analytics Platforms” that delves into the variety of text analytics platforms available and their capabilities. According to the list, text analytics has not yet reached full maturity. There are three main divisions in the area: natural language processing, text mining, and machine learning. Each is distinct, and each company has its own approach to using these processes:

“Some suppliers have applied text analytics to very specific business problems, usually centering on customer data and sentiment analysis. This is an evolving field and the next few years should see significant progress. Other suppliers provide NLP based technologies so that documents can be categorized and meaning extracted from them. Text mining platforms are a more recent phenomenon and provide a mechanism to discover patterns that might be used in operational activities. Text is used to generate extra features which might be added to structured data for more accurate pattern discovery. There is of course overlap and most suppliers provide a mixture of capabilities. Finally we should not forget information retrieval, more often branded as enterprise search technology, where the aim is simply to provide a means of discovering and accessing data that are relevant to a particular query. This is a separate topic to a large extent, although again there is overlap.”

Reading through the list shows the variety of options users have when it comes to text analytics. There does not appear to be a right or wrong way, but will the diverse offerings eventually funnel down to a few fully capable platforms?

Whitney Grace, March 11, 2014
Sponsored by, developer of Augmentext

Getting a Failing Grade in Artificial Intelligence: Watson and Siri

February 12, 2014

I read “Gödel, Escher, Bach: An Eternal Golden Braid” in 1999 or 2000. My reaction was, “I am glad I did not have Dr. Douglas R. Hofstadter critiquing my lame work for the PhD program at my university.” Dr. Hofstadter’s intellect intimidated me. I had to look up “Bach” because I knew zero about the procreative composer of organ music. (Heh, heh)

Imagine my surprise when I read “Why Watson and Siri Are Not Real AI” in Popular Mechanics magazine. Popular Mechanics is not my first choice as an information source for analysis of artificial intelligence and related disciplines. Popular Mechanics explains saws, automobiles, and gadgets.

But there was the story, illustrated with one of those bluish Jeopardy Watson photographs. The write up is meaty because Popular Mechanics asked Dr. Hofstadter questions and presented his answers. No equations. No arcane references. No intimidating the fat, ugly grad student.

The point of the write up is probably not one that IBM and Apple will like. Dr. Hofstadter does not see the “artificial intelligence” in Watson and Siri as “thinking machines.” (I share this view along with DARPA, I believe.)

Here’s a snippet of the Watson analysis:

Watson is basically a text search algorithm connected to a database just like Google search. It doesn’t understand what it’s reading. In fact, read is the wrong word. It’s not reading anything because it’s not comprehending anything. Watson is finding text without having a clue as to what the text means. In that sense, there’s no intelligence there. It’s clever, it’s impressive, but it’s absolutely vacuous.

I had to look up vacuous. It means, according to the Google “define” function: “having or showing a lack of thought or intelligence; mindless.” Okay, mindless. Isn’t IBM going to build a multi-billion dollar a year business on Watson’s technology? Isn’t IBM delivering a landslide of business to the snack shops adjacent to its new Watson offices in Manhattan? Isn’t Watson saving lives in Africa?
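As a toy illustration of Dr. Hofstadter’s point, that a system can find text “without having a clue as to what the text means,” consider a minimal keyword-overlap matcher. The documents and query below are invented, and real systems like Watson are vastly more elaborate; the sketch only shows that word matching alone can pick a plausible answer with zero comprehension.

```python
from collections import Counter

def tokenize(text):
    # Lowercase and strip trailing punctuation; no grammar, no meaning.
    return [w.strip(".,?!").lower() for w in text.split()]

def overlap_score(query, document):
    # Count how many times query terms occur in the document. The
    # "answer" is whichever text shares the most words with the
    # question; nothing is understood, only counted.
    doc_counts = Counter(tokenize(document))
    return sum(doc_counts[term] for term in tokenize(query))

documents = [
    "Toronto is the largest city in Canada.",
    "Chicago has two major airports named for World War II figures.",
]

query = "Which US city has airports named for a World War II hero?"
best = max(documents, key=lambda d: overlap_score(query, d))
print(best)  # the airport sentence wins on word overlap alone
```

The matcher picks the second document because it shares seven surface words with the query; it would do so even if the document were nonsense, which is exactly the “vacuous” behavior the interview describes.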

The interview uses a number of other interesting words; for example:

  • Hype
  • Silliest
  • Shambles
  • Slippery
  • Profits

Yet my favorite is the aforementioned—vacuous.

Please, read the interview in its entirety. I am not sure it will blunt the IBM and Apple PR machines, but kudos to Popular Mechanics. Now if the azure chip consultants, the failed Webmasters turned search experts, and the MBA pitch people would shift from hyperbole to reality, some clarity would return to the discussion of information retrieval.

Stephen E Arnold, February 11, 2014

From Scanning to eDiscovery to Fraud Triangle Analytics

February 9, 2014

Search and content processing vendors are innovating for 2014. The shift from a back office function like scanning to searching and then to “solutions” is a familiar path for companies engaged in information retrieval.

I read a 38 page white paper explaining a new angle—fraud triangle analytics. You can get a copy of the explanation by navigating to and going through the registration process.

The ZyLab concept is that three factors usually surface when fraud exists. These are a payoff, an opportunity, and “the mindset of the fraudster that justifies them to commit fraud.”

ZyLab’s system uses content analytics, discovery, sentiment analysis, metatagging, faceted search, and visualization to help the analyst chase down the likelihood of fraud. ZyLab weaves in the go-to functions for attorneys from its system. Four case examples are provided, including the Enron matter.
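ZyLab does not publish its implementation, but the fraud triangle idea can be sketched as a filter that flags a document only when signals for all three factors co-occur. The keyword lists, threshold logic, and sample email below are illustrative assumptions, not ZyLab’s method:

```python
import re

# Illustrative keyword lists for the three fraud triangle factors.
FACTORS = {
    "pressure":        {"quota", "debt", "deadline", "bonus"},
    "opportunity":     {"override", "unaudited", "access", "manual"},
    "rationalization": {"deserve", "everyone", "temporary", "owed"},
}

def triangle_hits(text):
    """Return the set of fraud triangle factors a document signals."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return {factor for factor, terms in FACTORS.items() if words & terms}

def flag(text):
    # Flag a document only when all three factors co-occur,
    # per the triangle model.
    return len(triangle_hits(text)) == 3

email = "I deserve this bonus; the manual override is unaudited anyway"
print(sorted(triangle_hits(email)))  # all three factors present
print(flag(email))                   # True
```

Requiring all three factors rather than any one keeps the analyst’s review queue short; a production system would of course use sentiment scores and entity context rather than bare keyword lists.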

Unlike some search vendors, ZyLab is focusing on a niche. Law enforcement is a market that a number of companies are pursuing, and several firms offer similar tools; competition in this sector is increasing. IBM, for example, has products that perform, or can be configured to perform, in a somewhat similar manner.

IBM has the i2 product and may be in the process of acquiring a company that adds dramatic fraud detection functionality to it. The rumored acquisition would add content acquisition capabilities beyond traditional credit card statements and open source content (in little data or big data forms).

As some commercial markets for traditional search and content processing contract, some vendors are embracing the infrastructure or framework approach. This is a good idea, and it is one that has been evident since the days of Fulcrum Technologies’ launch and TeraText’s infrastructure system. Both date from the 1980s. (My free analysis of the important TeraText system will be available on the Web site at the end of this month.)

At ZyLab, search is still important, but it is now part of a blended software package built around the FTA notion. As the world shifts to apps and predictive methods, it is interesting to watch the re-emergence of approaches popular with vendors little known by some today.

Stephen E Arnold, February 9, 2014

Government Buys into Text Analytics

February 7, 2014

What do you make of this headline from All Analytics: “Text And The City: Municipalities Discover Text Analytics”? Businesses have been using text mining software for a while and understand the insights it can deliver to business decisions. The same goes for law firms that must wade through piles of litigation documents. Are governments really only now catching onto text mining software?

The article reports on several examples where municipal governments have employed text mining and analytics. Law enforcement agencies are using it to identify key concepts and deliver quick information to officials. The 311 system, known as a source of local information and immediate contact with city services, is another system that can benefit from text analytics because it can organize and process information faster and more consistently.

There are many ways text analytics can be helpful to local governments:

“Identifying root causes is a unique value proposition for text analytics in government. It’s one thing to know something happened — a crime, a missed garbage collection, a school expulsion — and another to understand where the problem started. Conventional data often lacks clues about causes, but text reveals a lot.”

The bigger question is whether local governments will spend the money on these systems. Perhaps, but analytic software is expensive, and governments are pressured to find low-cost solutions. Expertise and money are in short supply on this issue.

Whitney Grace, February 07, 2014

Sponsored by, developer of Augmentext

Business Intelligence: Free Pressures For Fee Solutions

December 14, 2013

I read “KB Crawl sort la tête de l’eau,” published by 01Business. The hook for the article is that KB Crawl, a company harvesting Internet content for business intelligence analyses, has emerged from bankruptcy. Good news for KB Crawl, whose parent company is reported to be KB Intelligence.

The write up contained related interesting information.

First, the article points out that business intelligence services like KB Crawl are perceived as costs, not revenue producers. If this is accurate, the same problem may be holding back once promising US vendors like Digital Reasoning and Ikanow, among others.

Second, the article seems to suggest that for fee business intelligence services are in direct competition with free services like Google. Although Google’s focus on ads continues to have an impact on the relevance of the Google results, users may be comfortable with information provided by free services. Will the same preference for free impact the US business intelligence sector?

Third, the article identifies a vendor (Ixxo) as facing some financial headwinds, writing:

Other vendors in the sector are experiencing difficulties, such as Ixxo, publisher of the Squido solution.

But the most useful information in the story is the list of companies that compete with KB Crawl. Some of the firms are:

  • AMI Software.  This company has roots in enterprise search and touts 1500 customers
  • Data Observer. The company is a tie up between Asapspot and Data-Deliver. The firm offers “an all-encompassing Internet monitoring and e-reputation services company.”
  • Digimind. The firm makes sense of social media.
  • Eplica. A possible reference to a San Diego employment services firm.
  • iScop. Unknown.
  • Ixxo. The firm “develops innovative software applications to boost business responsiveness when faced with unstructured data.”
  • Pikko. A visualization company.
  • Qwam. Another “content intelligence” company.
  • SindUp. The company offers a monitoring platform for strategic and e reputation information.
  • Spotter. A company that provides the “power to understand.”
  • Synthesio. The company says, “We help brands and agencies find valuable social insights to drive real business value.”
  • TrendyBuzz. The company lets a client measure “Internet visibility units.”

My view is that 01Business may be identifying a fundamental problem in the for-fee business intelligence, open source harvesting, and competitive intelligence sector.

Information about business and competitive intelligence that I see in my TRAX Overflight service is mostly of the “power of positive thinking” variety. Companies like Palantir capture attention because the firms are able to raise astounding amounts of funding. Less visible are the financial pressures on the companies trying to generate revenue with systems aimed at commercial enterprises.

If the 01Business article is on the money, which US vendors are likely to have their heads under water in 2014? Use the comments section of this blog to identify the stragglers in the North American market.

Stephen E Arnold, December 14, 2013

Quote to Note: NLP and Recipes for Success and Failure

December 11, 2013

I read “Natural Language Processing in the Kitchen.” The post was particularly relevant because I had worked through “The Main Trick in Machine Learning.” That essay does an excellent job of explaining coefficients (what I call, for ease of recall, “thresholds”). The idea is that machine learning requires a human to make certain judgments. Autonomy IDOL uses Bayesian methods, and the company has for many years urged licensees to “train” the IDOL system. Not only that, successful Bayesian systems, like a young child, have to be prodded or retrained. How much and how often depends on the child. For Bayesian-like systems, the “how often” and “how much” vary by the licensee’s content contexts.
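To make the coefficient-and-threshold idea concrete, here is a hedged sketch, emphatically not Autonomy’s actual IDOL internals, of a tiny Bayesian text classifier that defers to a human trainer whenever its confidence falls below a human-set threshold. The training snippets and the 0.8 cutoff are invented for illustration:

```python
import math
from collections import Counter, defaultdict

# Tiny hand-labeled training set; in practice the licensee supplies
# (and periodically refreshes) this.
TRAINING = [
    ("stir flour sugar butter", "recipe"),
    ("bake oven dough sugar", "recipe"),
    ("invoice payment quarterly revenue", "finance"),
    ("budget revenue payment audit", "finance"),
]

def train(examples):
    counts = defaultdict(Counter)
    for text, label in examples:
        counts[label].update(text.split())
    return counts

def classify(counts, text, threshold=0.8):
    # Naive Bayes with add-one smoothing. If the winning class's
    # normalized probability falls below the threshold, defer to
    # a human rather than guess.
    vocab = {w for c in counts.values() for w in c}
    log_scores = {}
    for label, c in counts.items():
        total = sum(c.values())
        log_scores[label] = sum(
            math.log((c[w] + 1) / (total + len(vocab)))
            for w in text.split()
        )
    m = max(log_scores.values())
    exp_scores = {k: math.exp(v - m) for k, v in log_scores.items()}
    z = sum(exp_scores.values())
    best = max(exp_scores, key=exp_scores.get)
    if exp_scores[best] / z < threshold:
        return "needs human review"
    return best

model = train(TRAINING)
print(classify(model, "sugar dough oven"))       # clear-cut: recipe
print(classify(model, "quarterly sugar audit"))  # mixed signals
```

The threshold is exactly the human judgment the essay describes: set it too low and mistakes slip through unreviewed; set it too high and the staff drowns in items to label, which is retraining by another name.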

Now back to the Los Angeles Times’ excellent article about indexing and classifying a small set of recipes. Here’s the quote to note:

Computers can really only do so much.

When one jots down the programming and tuning work required to index recipes, keep in mind “The Main Trick in Machine Learning.” There are three important lessons I draw from the boundary between these two write ups:

  1. Smart software requires programming and fiddling. At the present time (December 2013), this reality is as it has been for the last 50 years, maybe more.
  2. The humans fiddling with or setting up the content processing system have to be pretty darned clever. The notion of “user friendliness” is dispelled by these two articles. Flashy graphics and marketers’ cooing are not going to cut the mustard or the sirloin steak.
  3. A properly set up system processing filtered content can hit 98 percent accuracy without much human intervention. The main point is that relevance is a result of humans, software, and consistent, on point content.

How many enterprise search and content processing vendors explain that a failure to put appropriate resources toward the search or content processing implementation guarantees some interesting issues? Among them: systems will routinely deliver results that are not germane to the user’s query.

The root of dissatisfaction with incumbent search and retrieval systems is not the systems themselves. In my opinion, most are quite similar, differing only in relatively minor details. (For examples of the similarity, review the reports at Xenky’s Vendor Profiles page.)

How many vendors have been excoriated because their customers failed to provide the cash, time, and support necessary to deliver a high-performance system? My hunch is that vendors are held responsible for failures predestined by licensees’ desire to get the best deal possible and their belief that magic just happens without the difficult, human-centric work that is absolutely essential for success.

Stephen E Arnold, December 11, 2013

Palantir: What Is the Main Business of the Company?

December 11, 2013

I read about Palantir and its successful funding campaign in “Palantir’s Latest Round Valuing It at $9B Swells to $107.8M in New Funding.” Compared to the funding for ordinary search and content processing companies, Palantir is obviously able to attract investors better than most of the other companies that make sense out of data.

If you run a query for “Palantir” on Beyond Search, you will get links to articles about the company’s previous funding and to a couple of stories about the company’s interaction with IBM i2 related to an allegation about Palantir’s business methods.

Image from the Louisiana Lottery.

I find Palantir interesting for three reasons.

First, it is able to generate significant buzz in police and intelligence entities in a number of countries. Based on what I have heard at conferences, the Palantir visualizations knock the socks off highly placed officials who want killer graphics in their personal slide presentations.

Second, the company has been nosing into certain financial markets. The idea is that the Palantir methods will give some of the investment outfits a better way to figure out what’s going up and what’s going down. The visuals are good, I have heard, but the Palantir analytics are perceived, if my sources are accurate, as no better than those from companies like IBM SPSS, Digital Reasoning, Recorded Future, and similar analytics firms.

Third, the company may have moved into a new business sector. The firm’s success in fund raising raises the question, “Is Palantir becoming a vehicle to raise more and more cash?”

Palantir is worth monitoring. The visualizations and the math are not really a secret sauce. The magic ingredient at Palantir may be its ability to sell its upside to investors. Is Palantir introducing a new approach to search and content processing? The main business of the company could be raising more and more money.

Stephen E Arnold, December 11, 2013

Exclusive Silobreaker Interview: Mats Bjore, Silobreaker

November 25, 2013

With Google becoming more difficult to use, many professionals need a way to locate, filter, and obtain high value information that works. Silobreaker is an online service and system that delivers actionable information.

The co-founder of Silobreaker says in an exclusive interview for Search Wizards Speak:

I learned that in most organizations, information was locked in separate silos. The information in those silos was usually kept under close control by the silo manager. My insight was that if software could make available to employees the information in different silos, the organization would reap an enormous gain in productivity. So the idea was to “break down” the information and knowledge silos that exist within companies, organizations, and mindsets.

And knock down barriers the system does. Silobreaker’s popularity is surging. The most enthusiastic supporters of the system come from the intelligence community, law enforcement, analysts, and business intelligence professionals. A user’s query retrieves up-to-the-minute information from Web sources, commercial services, and open source content. The results are available as a series of summaries, full text documents, relationship maps among entities, and other report formats. The user does not have to figure out which item is an advertisement. The Silobreaker system delivers muscle, not fatty tissue.

Mr. Bjore, a former intelligence officer, adds:

Silobreaker is an Internet and a technology company that offers products and services which aggregate, analyze, contextualize and bring meaning to the ever-increasing amount of digital information.

Underscoring the difference between Silobreaker and other online systems, Mr. Bjore points out:

What sets us apart is not only the Silobreaker technology and our commitment to constant innovation. Silobreaker embodies the long term and active experience of having a team of users and developers who can understand the end user environment and challenges. Also, I want to emphasize that our technology is one integrated technology that combines access, content, and actionable outputs.

The ArnoldIT team uses Silobreaker in our intelligence-related work. We include a profile of the system in our lectures about next-generation information gathering and processing systems.

You can get more information about Silobreaker at A 2008 interview with Mr. Bjore is located on the Search Wizards Speak site at

Stephen E Arnold, November 25, 2013

2013 Text Mining Summit Draws Record Crowd

November 2, 2013

The Linguamatics Blog recently reported on the outcome of the 2013 Text Mining Summit in the post “Pharma and Healthcare Come Together to See the Future of Text Mining.”

According to the article, this year’s event drew a record crowd of over 85 attendees who had the opportunity to listen to industry experts from the pharma and healthcare sector.

The article summarizes a few event highlights:

“Delegates were provided with an excellent opportunity to explore trends in text mining and analytics, natural language processing and knowledge discovery. Delegates discovered how I2E is delivering valuable intelligence from text in a range of applications, including the mining of scientific literature, news feeds, Electronic Health Records (EHRs), clinical trial data, FDA drug labels and more. Customer presentations demonstrated how I2E helps workers in knowledge driven organizations meet the challenge of information overload, maximize the value of their information assets and increase speed to insight.”

Events like the Text Mining Summit are excellent opportunities for members of the data analytics community to gather and share their insights and new advances in the industry.

Jasmine Ashton, November 02, 2013

Sponsored by, developer of Beyond Search

Search Wizards Speak: Oleg Rogynskyy, Semantria

October 28, 2013

Semantria is a company focused on providing text and sentiment analysis to anyone. The company’s approach is to streamline the analysis of content so that, in less than three minutes and for a nominal $1,000, the power of content processing can help answer tough business questions.

The firm’s founder is Oleg Rogynskyy, who has worked at Nstein (now part of Open Text) and Lexalytics. The idea for Semantria blossomed from Mr. Rogynskyy’s insight that text analytics technology was sufficiently mature to be useful to almost any organization or business professional.

I interviewed Mr. Rogynskyy on October 24, 2013. He told me:

At Semantria, we want to simplify and democratize access to text analytics technology. We want people to be able to get up and running in no time, with a small budget, and actually derive value from our technology. The classic story is you buy a system worth $100k and don’t deploy it.

Semantria focuses on a class of problems that a few years ago would have been outside the reach of many firms. He said:

We make it simple for our clients to solve the following problems: First, some organizations have too much text to read. For example, a Twitter stream or surveys with many responses. Also, there is the need to move quickly and reduce the time to get to market. Many survey results come with an expiry date before they’re irrelevant. Then there is reporting the information. Anyone can use their Excel smarts to build simple/interesting reports and visuals out of unstructured data. But that can take some time, and Semantria accelerates this step. Finally, users need to analyze text with the same impartiality each time. A human might see a glass as half full or half empty, but Semantria will always see a glass with water.
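Mr. Rogynskyy’s “glass with water” point, that software applies the same judgment every time, can be sketched with a deterministic lexicon scorer. Semantria’s real service is a cloud API with far richer models; the word lists below are invented for illustration only:

```python
# Invented sentiment lexicon; a real service uses far larger models.
POSITIVE = {"great", "love", "excellent", "fast"}
NEGATIVE = {"slow", "broken", "hate", "refund"}

def sentiment(text):
    """Deterministic score: the same input always yields the same label."""
    words = text.lower().split()
    score = sum((w in POSITIVE) - (w in NEGATIVE) for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

review = "love the product but shipping was slow"
# Unlike a human reader, the scorer never changes its mind:
assert all(sentiment(review) == sentiment(review) for _ in range(3))
print(sentiment(review))  # "love" (+1) and "slow" (-1) cancel: neutral
```

Whether "neutral" is the right call for that review is debatable, which is the point: the software is impartial and repeatable, not necessarily wiser than the human it replaces.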

One of the most interesting aspects of Semantria is that the company delivers its solution as a cloud service. Mr. Rogynskyy observed:

We are happily in the cloud, and in the cloud we trust. We have Android and iOS software development kits in the works, so whoever wants to talk to our API from mobile devices will be doing it with ease very soon.

You can get more information about Semantria at

This interview is one of more than 60 full-text interviews with individuals who are deeply involved in search, content processing, and analytics. You can find the full series at

Stephen E Arnold, October 28, 2013
