Attensity Ups Its Presence in Hackathons

October 28, 2014

I found the Attensity blog post “Attensity Takes Utah Tech Week” quite interesting. I cannot recall when mainstream content processing companies embraced hackathons so fiercely.

The blog post explains:

A hackathon, for the uninitiated, is exactly what it sounds like: a hybrid of computer hacking and a marathon in a grueling, caffeine-fueled, 12-hour time period. Groups comprised of mostly engineers and IT whizzes compete against the clock and other teams to create a project to present at the end of the day to a panel of judges.

What did Attensity’s engineers build to showcase the company’s sentiment analysis and analytics technologies? Here’s the Attensity description:

With the Twitter API up and running, Team Attensity used Raspberry Pi to process tweets using #obama and #utahtechweek. Simultaneously, the team used Arduino to code sentiments from the tweets using a red light for negative sentiments, blue for positive sentiments, and yellow for neutral sentiments.
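The color coding Attensity describes boils down to a lookup from a sentiment score to a light. Here is a minimal sketch of that step only. The function name, the score range of -1 to 1, and the neutral band are my assumptions; the blog post does not say how Attensity scored the tweets or drove the Arduino.

```python
# Hypothetical sketch: map a sentiment score to the light colors named in the post.
# The -1..1 score range and the neutral band width are assumptions, not Attensity's.

def led_color(score: float, neutral_band: float = 0.1) -> str:
    """Return the light color for a sentiment score in the range -1.0 to 1.0."""
    if score < -neutral_band:
        return "red"      # negative sentiment
    if score > neutral_band:
        return "blue"     # positive sentiment
    return "yellow"       # neutral sentiment


if __name__ == "__main__":
    for sample in (-0.6, 0.0, 0.6):
        print(sample, led_color(sample))
```

The hard part, scoring the tweet in the first place, is not shown here.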

Attensity was pleased with the outcome in Utah. More hackathons are in the firm’s future. I wonder if one can deploy IBM Watson using a Raspberry Pi or showcase HP Autonomy with an Arduino.

How will hackathons generate revenue? I am not sure. The effort seems like a cost hole to me.

Stephen E Arnold, October 28, 2014

Predictive Analytics: Trouble Ahead?

October 28, 2014

I learned about a new book that will be available in early 2015. Its title is The Black Box Society: The Secret Algorithms That Control Money and Information. The author is Frank Pasquale, a professor of law at the University of Maryland.

The Harvard promotional Web site for the book asserts:

Hidden algorithms can make (or ruin) reputations, decide the destiny of entrepreneurs, or even devastate an entire economy. Shrouded in secrecy and complexity, decisions at major Silicon Valley and Wall Street firms were long assumed to be neutral and technical. But leaks, whistleblowers, and legal disputes have shed new light on automated judgment. Self-serving and reckless behavior is surprisingly common, and easy to hide in code protected by legal and real secrecy. Even after billions of dollars of fines have been levied, underfunded regulators may have only scratched the surface of this troubling behavior.

The Institute for Ethics and Emerging Technologies mentioned the forthcoming book here. One of the comments about that post was interesting to me. TooManyJoes wrote:

The control of the results by the decision makers is what makes this future menacing. Right now, Google is under attack being too good at search prediction and making money on targeted advertisements whose brilliantly written algorithms allow such a sophisticated variety of information to be indexed. As a result search bubbles have formed, and a lack of statistics comprehension prevents the awareness of control over this new medium. Snake oil salesmen turned into Mad Men and psychiatrists, it’s the medium of internet based controlled by one snake oil salesman that frightens us all. I believe it’s not possible without a formal computational human algorithm to have enough of an impact to have widespread influence. I bring up these mediums because to engage in them is to participate, participation can be tracked, then imagine the expense of the things we have access to because free participation drives those products and services by up selling those products. Without education, which most people won’t be open to, and time for the common man to analyze the data…those in control of the data will be people delegated by others. Welcome to the age of transparency.

The Google reference may presage some discussion of the company’s predictive wizardry.

Stephen E Arnold, October 28, 2014

Looking for an AI Silver Bullet to Make Software Smart? Keep Looking

October 24, 2014

Here in Harrod’s Creek, Kentucky there is not too much chatter about machine learning. It is hunting season. Time to get out the Barrett Automatic Rifle and go hunting for varmints.

At sundown yesterday, when calm returned to the hollow, I read “Machine-Learning Maestro Michael Jordan on the Delusions of Big Data and Other Huge Engineering Efforts.”

My thought after reading the IEEE article was that I was really tired of the artificial intelligence yap yap. Now a whiz at UCal Berkeley is pointing out that some of the methods are a “cartoon.”

Dr. Michael Jordan says:

I think data analysis can deliver inferences at certain levels of quality. But we have to be clear about what levels of quality. We have to have error bars around all our predictions. That is something that’s missing in much of the current machine learning literature. … if people use data and inferences they can make with the data without any concern about error bars, about heterogeneity, about noisy data, about the sampling pattern, about all the kinds of things that you have to be serious about if you’re an engineer and a statistician, then you will make lots of predictions, and there’s a good chance that you will occasionally solve some real interesting problems. But you will occasionally have some disastrously bad decisions. And you won’t know the difference a priori. You will just produce these outputs and hope for the best. And so that’s where we are currently.

In short, marketing hyperbole takes precedence over the plodding steps that a person aspiring to a PhD in statistics is supposed to follow.

With regard to the applications that deliver predictive outputs, Dr. Jordan says:

But unless you’re actually doing the full-scale engineering statistical analysis to provide some error bars and quantify the errors, it’s gambling. It’s better than just gambling without data. That’s pure roulette. This is kind of partial roulette.
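For readers who want to see what an error bar looks like in practice, here is a minimal sketch using a percentile bootstrap around a mean. The technique is my choice for illustration; Dr. Jordan does not prescribe a particular method in the passages quoted. The function name, the sample numbers, and the defaults are assumptions.

```python
# Hypothetical sketch: a percentile bootstrap interval around a sample mean.
# The data and parameter defaults are invented for illustration.
import random
import statistics


def bootstrap_interval(values, n_resamples=2000, level=0.95, seed=0):
    """Return (mean, lower, upper) for a percentile bootstrap interval."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lower_index = round((1 - level) / 2 * n_resamples)
    upper_index = min(n_resamples - 1, round((1 + level) / 2 * n_resamples) - 1)
    return statistics.fmean(values), means[lower_index], means[upper_index]


if __name__ == "__main__":
    observations = [2.1, 2.5, 1.9, 3.2, 2.8, 2.4, 2.2, 3.0]
    mean, lower, upper = bootstrap_interval(observations)
    print(f"estimate {mean:.2f}, interval {lower:.2f} to {upper:.2f}")
```

A report that prints only the estimate and drops the interval is the “partial roulette” the quotation describes.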

I strongly recommend you read the interview. I would not involve a search or content processing marketer in the exercise, however.

Stephen E Arnold, October 24, 2014

Watson Analytic Example

October 21, 2014

Navigate to Thinglink. At this location is an example of the type of graphic that can be generated with output from Watson, IBM’s next big thing. A graphic artist has taken the data and created an eye snapping infographic. How many other systems can generate this type of output? Quite a few if the information in my analytics files is representative. Is it necessary to use IBM Watson when Microsoft Excel and a visualization tool like Tableau are available? IBM Watson analyzed 135 million tweets from 10 countries in Central and South America. Brazil was excluded.

Twitter said in 2013:

Brazil is one of our largest markets with a strong user base. Twitter has already become an important part of our lives in Brazil and, by strengthening our local presence, we plan to continue delighting our users as well as creating new opportunities for marketers who want to connect with them.

Perhaps I overlooked Brazil. No big deal.

Stephen E Arnold, October 22, 2014

Autonomy: 33 APIs

October 21, 2014

Curious about Hewlett Packard’s Autonomy APIs? You can see the list of 33 online. If you are curious about Autonomy’s Big Data capabilities, you may be puzzled about the lack of explicit analytics application programming interfaces. Don’t be. The savvy developer selects operations, takes outputs, and pumps the data into a search based application, third party number crunching system, a data management system, or plain old Excel. What’s interesting is that the naming of the APIs makes clear the search-centric nature of Autonomy. The marketing of IDOL as a service or a cloud solution shifts attention away from search in my view.

Stephen E Arnold, October 21, 2014

Big Data Failure: Teens and Music

October 13, 2014

How much data are available from teen demographics, popular music sales by genre and medium, downloads from iTunes and Amazon, the music trade associations, and myriad other sources? If there is one industry with data, lots of data, isn’t it the music business?

I read “No One Knows How Teens Listen to Music.” The information is surprising. I thought we lived in the world of Big Data. With flashy algorithms and lots of zeros and ones, the secrets of the universe are exposed. Business strategists and entrepreneurs would flourish. The world would be a better place. Isn’t that what Big Data marketers suggest?

Here’s a passage I noted:

Fast forward to 2014. Nielsen’s recent analysis of the music industry at large showed a six-percent decrease in digital music sales and a 32-percent increase in overall streaming. According to the company, these changes were largely… because of teens. As Martin Pyykkonnen, an analyst at Wedge Partners, told Yahoo last year, “Young people today don’t buy music anymore.” Except maybe they do, according to the Piper Jaffray report. Or maybe they don’t buy MP3s but do download them. Or maybe they don’t download them but do listen to them.

So lots of data about music and teens. We learn, “All the major surveys disagree. Maybe it’s a secret.”

Yep, Big Data delivers. Oh, how about those Ebola predictions?

Stephen E Arnold, October 14, 2014

Chiliad Offline: A Precursor for Other BI Outfits

October 13, 2014

According to PacerMonitor, Chiliad, Inc. filed for bankruptcy on August 6, 2014. As you may recall, the company was a Washington, DC area analytics firm founded by Christine Maxwell of McKinley Group and Magellan fame. (Magellan became part of Excite, which also faded away.)

About two years ago, Beyond Search wrote about Chiliad and its big rocks. Also, in 2012, the company named Craig Norris as chief executive officer. Mr. Norris (an industry leader according to Reuters) had been the CEO of Attensity, a sentiment analysis outfit, which has experienced its share of strong headwinds. In the news release about his appointment, he said:

“I am excited to be joining Chiliad at an important stage in its growth. What makes or breaks an analytics company is the quality and usability of its core technology. Chiliad’s offering has proven its ability to extract critical findings from data at massive scale for both Government and Commercial customers. I am eager to see us gain recognition for our technology leadership.”

The news release included assertions by Patrick Gross (Chairman of the Chiliad board of directors) that I have encountered many times in the last five years; to wit:

“Chiliad has already solved two very challenging problems. The first is the ability to rapidly search data collections at greater scale than any other offering in the market. The second is to allow search formulation and analysis in natural language. This means that no longer is an elite class of analysts required in order to generate meaningful results, thus reducing the personnel training and skills shortages that plague alternative solutions and put timely discovery at risk. The explosion of ‘Big Data’ is real and valuable findings are buried in vast collections for both enterprises and governments. Chiliad has the opportunity to integrate its innovative, massively scalable solutions with emerging open source software to build customized solutions for the largest-scale clients.”

Businessweek described the company in this way:

Chiliad, Inc. provides data analysis solutions for various clouds, agencies, departments, and other stovepipes. The company offers Discovery/Alert, a platform that enables investigators, business analysts, and knowledge workers to securely reach, find, analyze, and continuously stay on top of big data—whether structured or unstructured, and classified or unclassified. Its software solutions include Iterative Discovery cycle that allows analysts and researchers to reach various content silos, find what matters, analyze it to find meaning from the information relationships presented and continuously monitor changes; and Architecture, a virtual consolidated data center that enables multidimensional analysis and ranking. It serves government/intelligence, law enforcement, healthcare, pharmaceutical, insurance, and other markets. Chiliad, Inc. was founded in 1998 and is headquartered in Herndon, Virginia.

I have highlighted the buzzwords that were designed to generate sales leads and revenue. I can only assume that the verbiage and the Attensity management touch fell short of the mark. How many of the “analytics” and “business intelligence” companies will follow Chiliad’s path? Good question but I keep asking it.

Stephen E Arnold, October 12, 2014

New IBM Redbook: IBM Watson Enterprise Search and Analytics

October 12, 2014

The Redbook is free. You can download it from this IBM link for now. The full title is “IBM Watson Content Analytics. Discovering Actionable Insight from Your Content.”

The Redbook weighs in with 598 pages of Watson goodness. If you follow the IBM content analytics products, you may know that the previous version was known as IBM Content Analytics with Enterprise Search (or ICAwES).

The Redbook presents some philosophical content. IBM has a tradition to uphold. In addition, the Redbook provides information about facets (yep, good old metadata), some mathy features that make analytics analytical, and sentiment analysis.

ICAwES does not operate as an island. The sprawling system can hook into IBM’s semi automatic classification system, Cognos, and interface tools.

Is ICAwES an “enterprise search” system? I would say, “Sure is.” You will have to work through the Redbook and draw your own conclusions. You will also want to identify the Watson component. Watson is Lucene with IBM scripts and wrappers, but IBM has far more colorful lingo for describing the system. After all, IBM Watson is supposed to generate $1 billion in a snappy manner. If IBM’s plan bears revenue fruit, in five or six years, Watson will be a $10 billion per year business. That’s quite a goal, considering Autonomy required 13 years to push into $800 million in revenue territory and IBM has been offering information retrieval systems since the days of STAIRS.

The new information in the July 2014 edition of the Redbook adds a chapter containing some carefully selected case studies. There is a new chapter called “Enterprise Search” to which I will return in a moment. Also, the many authors of the Redbook have added to the discussion of Cognos, one of IBM’s business intelligence systems. Finally, the Redbook provides some helpful suggestions for “customizing and extending the content analytics miner.”

I urge you to work through this volume because it provides a useful yardstick for measuring the IBM Watson marketing and public relations explanations against the reality, limitations, and complexity of the IBM Content Analytics system. Is the Redbook describing a product or a collection of components that an IBM implementation team will use to craft a customized solution?

The chapter on Enterprise Search begins on page 445 and continues to page 486. The solution is a two part affair. On one hand, processed content will output data about the entities, word frequencies, and similar metrics in the corpus and updates to the corpus. On the other hand, ICAwES is a search and retrieval system. Many vendors take this approach today; however, certain types of content cannot be comprehensively processed by the system. Examples include video content, engineering drawings, digital imagery, and certain types of ephemeral content such as text messages sent via an ad hoc Bluetooth mesh network. One can code up a fix, but that is likely to be more hassle than many licensees will tolerate.

The Redbook shows some ready-to-use interfaces. These can, of course, be modified. The sample in the screenshot below looks quite a bit like the original Fulcrum Technologies’ presentation of information processed by the system. A more modern implementation would be Amazon’s recent JSON centric system for content.


ICAwES Redbook, Copyright IBM 2014.

The illustration shows a record viewed by tags; for example, categories. Items can be tallied in a chart that provides a summary of how many content objects share a particular index term. The illustration shows the ICAwES identifying terms in a user’s query, identifying entities like IBM Lotus Domino, and other features associated with Autonomy IDOL or Endeca style systems. Both of these date from the late 1990s, so IBM is not pushing too far from the dirt path carved out of the findability woods by former leaders in enterprise search.

IBM provides information needed to implement query expansion. Yes, a dictionary lurks within the system, and an interface is provided so the licensee can be like Noah Webster. The system is rules based, and a specialist is needed to create or edit rules. As you may know, rules based systems suffer from several drawbacks. Rules have to be maintained, subject matter experts or programmers are usually required to make the proper judgments, and rules can drift out of phase with the users’ queries unless the system is monitored with above average rigor. As with Autonomy IDOL, skimp on monitoring and tuning, and the system can generate some interesting results.
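To make the maintenance burden concrete, here is a minimal sketch of dictionary driven query expansion. It is a generic illustration, not the Redbook’s rule format or an IBM API; the dictionary entries and the function name are invented.

```python
# Hypothetical sketch of rules based query expansion.
# The dictionary below is invented; a real deployment needs someone to maintain it.
EXPANSION_RULES = {
    "laptop": ["notebook", "ultrabook"],
    "ibm": ["international business machines"],
}


def expand_query(query, rules=EXPANSION_RULES):
    """Return the query terms followed by any dictionary expansions."""
    expanded = []
    for term in query.lower().split():
        if term not in expanded:
            expanded.append(term)
        # Terms missing from the dictionary pass through unchanged.
        for synonym in rules.get(term, []):
            if synonym not in expanded:
                expanded.append(synonym)
    return expanded


if __name__ == "__main__":
    print(expand_query("IBM laptop"))
    print(expand_query("tablet"))  # no rule yet, so no expansion
```

The second call shows the drift problem: when users start asking about something the dictionary does not cover, the system quietly does nothing until a specialist adds a rule.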

The provided user interface looks like this:


ICAwES Redbook, Copyright IBM 2014.

With many users wanting a “big red button” to simplify information access, this interface brings forward the high density displays associated with TeraText and similar legacy systems. The density seems to include hints of Attivio and BA Insight user interfaces as well. There are many choices available to the user. However, without special training, it is unlikely that a marketing professional using ICAwES will be able to make full use of query trees, category trees, and the numerous icons that appear in four different locations. I can hear the user now, “I want this system to be just like Google. I want to type in three words and scan the results.”

Net net. If you are working in an organization that favors IBM solutions, this system is likely to be what senior management licenses. Keep in mind that ICAwES will require the ministrations of IBM professional services, probably additional headcount, and on-going work to keep the system delivering useful results to users and decision makers.

The system delivers key word search, rich indexing, and basic metrics about the content. IBM offers more robust analytic tools in its SPSS product line. For more comprehensive text analysis, take a look at IBM i2 and Cybertap solutions if your organization has appropriate credentials for these somewhat more sophisticated information access and analysis systems.

After working through the Redbook, I had one question, “Where’s Watson?”

Stephen E Arnold, October 12, 2014

V.I. Arnold and His Math Teaching Ideas

October 11, 2014

Short honk: My relative, Vladimir Igorevich Arnold, worked with some fairly smart people; for example, Kolmogorov. If you want to get a sense of his ideas about math teaching, you may find “On Teaching Mathematics” interesting. Like most of my family’s work, one can improve by applying more effort. Demanding group, probably just like yours, gentle reader.

Stephen E Arnold, October 11, 2014

Thomas Bayes Gets Ink

October 8, 2014

I read “Belief, Bias, and Bayes.” The write up appeared in the open source friendly Guardian newspaper. Bayes and his methods are more popular than ever. Instead of meeting the good churchman in a statistics class, there he is near the editorial page, on the Web, and in the blogosphere.

This particular write up is surprisingly gentle toward the bane of many university students. Here’s the explanation of the method in the article:

I find it easier to be concrete. So imagine I have a bag containing three stones; two blue and one red. Without looking, and in random order, you and I pick, and keep, one stone each. What are the chances I have blue and you have red? I could work them out two ways. If you have the only red stone (which you have a one-in-three chance of having got, without knowing anything about my choice) then I must have a blue (one-in-one). The probability is ⅓ × 1 = ⅓, a third. On the other hand, if we know I have a blue stone (probability two-in-three) then there is a 50:50 chance you have a red stone. The probability is ⅔ × ½ = ⅓ again. The answers had to come out the same, since both ways of working it out describe the same result. The “probability of me having blue if you have red, multiplied by the probability of you having red”, has to be the same as “the probability of you having red if I have blue multiplied by the probability of me having blue”. Abstracted, that’s Bayes’ Theorem.
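The stone example in the quotation is small enough to check by brute force. The sketch below lists every equally likely pair of picks and computes the answer both ways. The helper names are mine, not the Guardian’s.

```python
# Sketch: enumerate the stone example and compute the answer by both routes.
from fractions import Fraction
from itertools import permutations

BAG = ["blue", "blue", "red"]
# Each outcome is (my stone, your stone); the six ordered pairs are equally likely.
OUTCOMES = list(permutations(BAG, 2))


def prob(event, given=lambda outcome: True):
    """Probability of an event, optionally conditioned on another event."""
    pool = [o for o in OUTCOMES if given(o)]
    hits = [o for o in pool if event(o)]
    return Fraction(len(hits), len(pool))


def i_have_blue(outcome):
    return outcome[0] == "blue"


def you_have_red(outcome):
    return outcome[1] == "red"


joint = prob(lambda o: i_have_blue(o) and you_have_red(o))
route_one = prob(i_have_blue, given=you_have_red) * prob(you_have_red)
route_two = prob(you_have_red, given=i_have_blue) * prob(i_have_blue)

if __name__ == "__main__":
    print(joint, route_one, route_two)
```

All three values come out to one third, which is the equality the quotation calls Bayes’ Theorem.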

There you go. There was a particularly useful quotation in the article; to wit:

One of the things that gets people fired up is that Bayesian statistics can introduce a level of subjectivity into the scientific process that some scientists see as unacceptable.

Spot on. I recall one failed webmaster who publishes “expert opinions” who fulminated against this “flaw” in the method. I made a brief effort to explain the benefits of the method, but he would have none of it. The biases baked into his “expert” brain were more correct than any mathematical reasoning. That’s what makes this person a “real” expert and, of course, a failed webmaster.

The Guardian article comes at being resistant to a procedure this way:

I guess Bayesian statistics provides a mathematical definition of a closed mind. Anyone with a prior of zero about something can never learn from any amount of evidence, because anything multiplied by zero is still zero.

I think this means one is stupid. Perhaps this resistance to a method is behind much of the fulminating about Autonomy’s Dynamic Reasoning Engine and its Intelligent Data Operating Layer?
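The Guardian’s closed mind remark is easy to demonstrate. The sketch below applies Bayes’ rule five times to the same piece of favorable evidence, once from a prior of zero and once from a prior of one percent. The likelihood values and the number of rounds are invented for illustration.

```python
# Sketch: repeated Bayesian updating from a zero prior versus a small prior.
# The likelihoods (0.9 and 0.1) and the five rounds are assumptions for illustration.

def update(prior, likelihood_if_true, likelihood_if_false):
    """Return the posterior probability after one piece of evidence."""
    numerator = prior * likelihood_if_true
    evidence = numerator + (1 - prior) * likelihood_if_false
    return numerator / evidence


def belief_after(prior, rounds=5):
    """Apply the same favorable evidence several times in a row."""
    belief = prior
    for _ in range(rounds):
        belief = update(belief, likelihood_if_true=0.9, likelihood_if_false=0.1)
    return belief


if __name__ == "__main__":
    print("closed mind:", belief_after(0.0))
    print("skeptic:", round(belief_after(0.01), 4))
```

The zero prior never moves, while the one percent skeptic ends up nearly convinced.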

Stephen E Arnold, October 8, 2014
