Looking for an AI Silver Bullet to Make Software Smart? Keep Looking

October 24, 2014

Here in Harrod’s Creek, Kentucky there is not too much chatter about machine learning. It is hunting season. Time to get out the Barrett Automatic Rifle and go hunting for varmints.

At sundown yesterday, when calm returned to the hollow, I read “Machine-Learning Maestro Michael Jordan on the Delusions of Big Data and Other Huge Engineering Efforts.”

My thought after reading the IEEE article was that I was really tired of the artificial intelligence yap yap. Now a whiz at UCal Berkeley is pointing out that some of the methods are a “cartoon.”

Dr. Michael Jordan says:

I think data analysis can deliver inferences at certain levels of quality. But we have to be clear about what levels of quality. We have to have error bars around all our predictions. That is something that’s missing in much of the current machine learning literature… if people use data and inferences they can make with the data without any concern about error bars, about heterogeneity, about noisy data, about the sampling pattern, about all the kinds of things that you have to be serious about if you’re an engineer and a statistician—then you will make lots of predictions, and there’s a good chance that you will occasionally solve some real interesting problems. But you will occasionally have some disastrously bad decisions. And you won’t know the difference a priori. You will just produce these outputs and hope for the best. And so that’s where we are currently.

In short, marketing hyperbole takes precedence over the plodding realities of the steps a person aspiring to a PhD in statistics is supposed to follow.

With regard to the applications that deliver predictive outputs, Dr. Jordan says:

But unless you’re actually doing the full-scale engineering statistical analysis to provide some error bars and quantify the errors, it’s gambling. It’s better than just gambling without data. That’s pure roulette. This is kind of partial roulette.
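For readers who want to see what an “error bar” looks like in practice, here is a minimal sketch of one common way to produce one, a percentile bootstrap around a sample mean. This is an illustration under my own assumptions, not Dr. Jordan's method: the `observations` values are invented, and `bootstrap_interval` is a hypothetical helper name.

```python
import random
import statistics


def bootstrap_interval(sample, n_resamples=2000, level=0.95, seed=0):
    """Return (low, high) percentile-bootstrap bounds for the sample mean."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        # Resample with replacement, same size as the original sample.
        resample = rng.choices(sample, k=len(sample))
        means.append(statistics.fmean(resample))
    means.sort()
    tail = (1.0 - level) / 2.0
    low = means[int(tail * n_resamples)]
    high = means[int((1.0 - tail) * n_resamples) - 1]
    return low, high


# Hypothetical noisy measurements, invented for illustration only.
observations = [4.1, 5.3, 3.8, 6.0, 4.7, 5.1, 9.2, 4.4, 5.0, 4.9]
point_estimate = statistics.fmean(observations)
low, high = bootstrap_interval(observations)
print(f"estimate {point_estimate:.2f}, 95% interval [{low:.2f}, {high:.2f}]")
```

The point estimate alone is the “partial roulette” number. The interval is the part that tells you how much to trust it.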

I strongly recommend you read the interview. I would not involve a search or content processing marketer in the exercise, however.

Stephen E Arnold, October 24, 2014

Watson Analytic Example

October 21, 2014

Navigate to Thinglink. At this location is an example of the type of graphic that can be generated with output from Watson, IBM’s next big thing. A graphic artist has taken the data and created an eye snapping infographic. How many other systems can generate this type of output? Quite a few if the information in my analytics files is representative. Is it necessary to use IBM Watson when Microsoft Excel and a visualization tool like Tableau are available? IBM Watson analyzed 135 million tweets from 10 countries in Central and South America. Brazil was excluded.

Twitter said in 2013:

Brazil is one of our largest markets with a strong user base. Twitter has already become an important part of our lives in Brazil and, by strengthening our local presence, we plan to continue delighting our users as well as creating new opportunities for marketers who want to connect with them.

Perhaps I overlooked Brazil. No big deal.

Stephen E Arnold, October 22, 2014

Autonomy: 33 APIs

October 21, 2014

Curious about Hewlett Packard’s Autonomy APIs? You can see the list of 33 at IdolOnDemand.com. If you are curious about Autonomy’s Big Data capabilities, you may be puzzled about the lack of explicit analytics application programming interfaces. Don’t be. The savvy developer selects operations, takes outputs, and pumps the data into a search based application, third party number crunching system, a data management system, or plain old Excel. What’s interesting is that the naming of the APIs makes clear the search-centric nature of Autonomy. The marketing of IDOL as a service or a cloud solution shifts attention away from search in my view.

Stephen E Arnold, October 21, 2014

Big Data Failure: Teens and Music

October 13, 2014

How much data are available about teen demographics, popular music sales by genre and medium, and downloads from iTunes and Amazon, whether from the music trade associations or myriad other sources? If there is one industry with data, lots of data, isn’t it the music business?

I read “No One Knows How Teens Listen to Music.” The information is surprising. I thought we lived in the world of Big Data. With flashy algorithms and lots of zeros and ones, the secrets of the universe are exposed. Business strategists and entrepreneurs would flourish. The world would be a better place. Isn’t that what Big Data marketers suggest?

Here’s a passage I noted:

Fast forward to 2014. Nielsen’s recent analysis of the music industry at large showed a six-percent decrease in digital music sales and a 32-percent increase in overall streaming. According to the company, these changes were largely… because of teens. As Martin Pyykkonnen, an analyst at Wedge Partners, told Yahoo last year, “Young people today don’t buy music anymore.” Except maybe they do, according to the Piper Jaffray report. Or maybe they don’t buy MP3s but do download them. Or maybe they don’t download them but do listen to them.

So lots of data about music and teens. We learn, “All the major surveys disagree. Maybe it’s a secret.”

Yep, Big Data delivers. Oh, how about those Ebola predictions?

Stephen E Arnold, October 14, 2014

Chiliad Offline: A Precursor for Other BI Outfits

October 13, 2014

According to PacerMonitor, Chiliad, Inc. filed for bankruptcy on August 6, 2014. As you may recall, the company was a Washington, DC area analytics firm founded by Christine Maxwell of McKinley Group and Magellan fame. (Magellan became part of Excite, which also faded away.)

About two years ago, Beyond Search wrote about Chiliad and its big rocks. Also, in 2012, the company named Craig Norris as chief executive officer. Mr. Norris (an industry leader according to Reuters) had been the CEO of Attensity, a sentiment analysis outfit that has experienced its share of strong headwinds. In the news release about his appointment, he said:

“I am excited to be joining Chiliad at an important stage in its growth. What makes or breaks an analytics company is the quality and usability of its core technology. Chiliad’s offering has proven its ability to extract critical findings from data at massive scale for both Government and Commercial customers. I am eager to see us gain recognition for our technology leadership.”

The news release included assertions by Patrick Gross (Chairman of the Chiliad board of directors) that I have encountered many times in the last five years; to wit:

“Chiliad has already solved two very challenging problems. The first is the ability to rapidly search data collections at greater scale than any other offering in the market. The second is to allow search formulation and analysis in natural language. This means that no longer is an elite class of analysts required in order to generate meaningful results, thus reducing the personnel training and skills shortages that plague alternative solutions and put timely discovery at risk. The explosion of ‘Big Data’ is real and valuable findings are buried in vast collections for both enterprises and governments. Chiliad has the opportunity to integrate its innovative, massively scalable solutions with emerging open source software to build customized solutions for the largest-scale clients.”

Businessweek described the company in this way:

Chiliad, Inc. provides data analysis solutions for various clouds, agencies, departments, and other stovepipes. The company offers Discovery/Alert, a platform that enables investigators, business analysts, and knowledge workers to securely reach, find, analyze, and continuously stay on top of big data—whether structured or unstructured, and classified or unclassified. Its software solutions include Iterative Discovery cycle that allows analysts and researchers to reach various content silos, find what matters, analyze it to find meaning from the information relationships presented and continuously monitor changes; and Architecture, a virtual consolidated data center that enables multidimensional analysis and ranking. It serves government/intelligence, law enforcement, healthcare, pharmaceutical, insurance, and other markets. Chiliad, Inc. was founded in 1998 and is headquartered in Herndon, Virginia.

I have highlighted the buzzwords that were designed to generate sales leads and revenue. I can only assume that the verbiage and the Attensity management touch fell short of the mark. How many of the “analytics” and “business intelligence” companies will follow Chiliad’s path? Good question but I keep asking it.

Stephen E Arnold, October 12, 2014

New IBM Redbook: IBM Watson Enterprise Search and Analytics

October 12, 2014

The Redbook is free. You can download it from this IBM link for now. The full title is “IBM Watson Content Analytics. Discovering Actionable Insight from Your Content.”

The Redbook weighs in with 598 pages of Watson goodness. If you follow the IBM content analytics products, you may know that the previous version was known as IBM Content Analytics with Enterprise Search (or ICAwES).

The Redbook presents some philosophical content. IBM has a tradition to uphold. In addition, the Redbook provides information about facets (yep, good old metadata), some mathy features that make analytics analytical, and sentiment analysis.

ICAwES does not operate as an island. The sprawling system can hook into IBM’s semi automatic classification system, Cognos, and interface tools.

Is ICAwES an “enterprise search” system? I would say, “Sure is.” You will have to work through the Redbook and draw your own conclusions. You will also want to identify the Watson component. Watson is Lucene with IBM scripts and wrappers, but IBM has far more colorful lingo for describing the system. After all, IBM Watson is supposed to generate $1 billion in a snappy manner. If IBM’s plan bears revenue fruit, in five or six years, Watson will be a $10 billion per year business. That’s quite a goal, considering Autonomy required 13 years to push into $800 million in revenue territory and IBM has been offering information retrieval systems since the days of STAIRS.

The new information in the July 2014 edition of the Redbook adds a chapter containing some carefully selected case studies. There is a new chapter called “Enterprise Search” to which I will return in a moment. Also, the many authors of the Redbook have added to the discussion of Cognos, one of IBM’s business intelligence systems. Finally, the Redbook provides some helpful suggestions for “customizing and extending the content analytics miner.”

I urge you to work through this volume because it provides a useful yardstick for measuring the IBM Watson marketing and public relations explanations against the reality, limitations, and complexity of the IBM Content Analytics system. Is the Redbook describing a product or a collection of components that an IBM implementation team will use to craft a customized solution?

The chapter on Enterprise Search begins on page 445 and continues to page 486. The solution is a two part affair. On one hand, processed content will output data about the entities, word frequencies, and similar metrics in the corpus and updates to the corpus. On the other hand, ICAwES is a search and retrieval system. Many vendors take this approach today; however, certain types of content cannot be comprehensively processed by the system. Examples include video content, engineering drawings, digital imagery, and certain types of ephemeral content such as text messages sent via an ad hoc Bluetooth mesh network. One can code up a fix, but that is likely to be more hassle than many licensees will tolerate.

The Redbook shows some ready-to-use interfaces. These can, of course, be modified. The sample in the screenshot below looks quite a bit like the original Fulcrum Technologies’ presentation of information processed by the system. A more modern implementation would be Amazon’s recent JSON centric system for content.


ICAwES Redbook, Copyright IBM 2014.

The illustration shows a record viewed by tags; for example, categories. Items can be tallied in a chart that provides a summary of how many content objects share a particular index term. The illustration shows the ICAwES identifying terms in a user’s query, identifying entities like IBM Lotus Domino, and other features associated with Autonomy IDOL or Endeca style systems. Both of these date from the late 1990s, so IBM is not pushing too far from the dirt path carved out of the findability woods by former leaders in enterprise search.
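The tally idea is not exotic. Here is a rough sketch of counting how many content objects share each index term. It is not IBM's code; the `records` data and the `facet_counts` helper are hypothetical and exist only to show the mechanics.

```python
from collections import Counter

# Hypothetical records: each content object carries a list of index terms.
records = [
    {"id": 1, "categories": ["contracts", "legal"]},
    {"id": 2, "categories": ["legal", "hr"]},
    {"id": 3, "categories": ["contracts"]},
]


def facet_counts(records, field):
    """Count how many records carry each value of the given field."""
    counts = Counter()
    for record in records:
        # set() so a term repeated inside one record is counted once.
        counts.update(set(record.get(field, [])))
    return counts


category_facets = facet_counts(records, "categories")
for term, count in category_facets.most_common():
    print(f"{term}: {count}")
```

Feed those counts to a charting widget and you have the summary view described above.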

IBM provides information needed to implement query expansion. Yes, a dictionary lurks within the system, and an interface is provided so the licensee can be like Noah Webster. The system is rules based, and a specialist is needed to create or edit rules. As you may know, rules based systems suffer from several drawbacks. Rules have to be maintained, subject matter experts or programmers are usually required to make the proper judgments, and rules can drift out of phase with the users’ queries unless the system is monitored with above average rigor. Like Autonomy IDOL, skimp on monitoring and tuning, and the system can generate some interesting results.
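To show why the dictionary needs a minder, here is a bare-bones sketch of dictionary-driven query expansion. It assumes a hand-maintained synonym table; `SYNONYMS` and `expand_query` are hypothetical names of mine, not anything taken from the Redbook.

```python
# Hypothetical synonym dictionary that a subject matter expert would maintain.
SYNONYMS = {
    "laptop": ["notebook", "portable computer"],
    "invoice": ["bill"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Return the query terms plus dictionary synonyms, in order, no duplicates."""
    expanded = []
    for term in query.lower().split():
        for candidate in [term, *synonyms.get(term, [])]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


print(expand_query("Laptop invoice overdue"))
# A term the dictionary has never seen gets no help at all.
print(expand_query("tablet"))
```

The second call is the drift problem in miniature: when users start asking about something the dictionary does not cover, the expansion silently does nothing until someone edits the rules.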

The provided user interface looks like this:


ICAwES Redbook, Copyright IBM 2014.

With many users wanting a “big red button” to simplify information access, this interface brings forward the high density displays associated with TeraText and similar legacy systems. The density seems to include hints of Attivio and BA Insight user interfaces as well. There are many choices available to the user. However, without special training, it is unlikely that a marketing professional using ICAwES will be able to make full use of query trees, category trees, and the numerous icons that appear in four different locations. I can hear the user now, “I want this system to be just like Google. I want to type in three words and scan the results.”

Net net. If you are working in an organization that favors IBM solutions, this system is likely to be what senior management licenses. Keep in mind that ICAwES will require the ministrations of IBM professional services, probably additional headcount, and on-going work to keep the system delivering useful results to users and decision makers.

The system delivers key word search, rich indexing, and basic metrics about the content. IBM offers more robust analytic tools in its SPSS product line. For more comprehensive text analysis, take a look at IBM i2 and Cybertap solutions if your organization has appropriate credentials for these somewhat more sophisticated information access and analysis systems.

After working through the Redbook, I had one question, “Where’s Watson?”

Stephen E Arnold, October 12, 2014

V.I. Arnold and His Math Teaching Ideas

October 11, 2014

Short honk: My relative, Vladimir Igorevich Arnold, worked with some fairly smart people; for example, Kolmogorov. If you want to get a sense of his ideas about math teaching, you may find “On Teaching Mathematics” interesting. Like most of my family’s work, one can improve by applying more effort. Demanding group, probably just like yours, gentle reader.

Stephen E Arnold, October 11, 2014

Thomas Bayes Gets Ink

October 8, 2014

I read “Belief, Bias, and Bayes.” The write up appeared in the open source friendly Guardian newspaper. Bayes and his methods are more popular than ever. Instead of meeting the good churchman in a statistics class, there he is near the editorial page, on the Web, and in the blogosphere.

This particular write up is surprisingly gentle toward the bane of many university students. Here’s the explanation of the method in the article:

I find it easier to be concrete. So imagine I have a bag containing three stones; two blue and one red. Without looking, and in random order, you and I pick, and keep, one stone each. What are the chances I have blue and you have red? I could work them out two ways. If you have the only red stone (which you have a one-in-three chance of having got, without knowing anything about my choice) then I must have a blue (one-in-one). The probability is ⅓ × 1 = ⅓, a third. On the other hand, if we know I have a blue stone (probability two-in-three) then there is a 50:50 chance you have a red stone. The probability is ⅔ × ½ = ⅓ again. The answers had to come out the same, since both ways of working it out describe the same result. The “probability of me having blue if you have red, multiplied by the probability of you having red”, has to be the same as “the probability of you having red if I have blue multiplied by the probability of me having blue”. Abstracted, that’s Bayes’ Theorem.
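If you prefer to check the stone arithmetic rather than take it on faith, here is a short sketch that enumerates every equally likely draw and computes both routes with exact fractions. The variable names are mine, and the enumeration is just one way to set the problem up.

```python
from fractions import Fraction
from itertools import permutations

stones = ["blue", "blue", "red"]

# Every ordered way for me (first) and you (second) to keep one stone each.
draws = list(permutations(stones, 2))


def prob(event):
    """Probability of an event over the equally likely draws."""
    return Fraction(sum(1 for d in draws if event(d)), len(draws))


p_you_red = prob(lambda d: d[1] == "red")
p_me_blue = prob(lambda d: d[0] == "blue")
p_both = prob(lambda d: d[0] == "blue" and d[1] == "red")

# Conditional probabilities, computed from the joint probability.
p_me_blue_given_you_red = p_both / p_you_red
p_you_red_given_me_blue = p_both / p_me_blue

# The two routes from the quoted passage.
route_one = p_me_blue_given_you_red * p_you_red
route_two = p_you_red_given_me_blue * p_me_blue
print(route_one, route_two)  # prints: 1/3 1/3
```

Setting `route_one` equal to `route_two` and dividing through by one of the marginal probabilities is all the “abstraction” amounts to.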

There you go. There was a particularly useful quotation in the article; to wit:

One of the things that gets people fired up is that Bayesian statistics can introduce a level of subjectivity into the scientific process that some scientists see as unacceptable.

Spot on. I recall one failed webmaster who publishes “expert opinions” who fulminated against this “flaw” in the method. I made a brief effort to explain the benefits of the method, but he would have none of it. The biases baked into his “expert” brain were more correct than any mathematical reasoning. That’s what makes this person a “real” expert and, of course, a failed webmaster.

The Guardian article comes at being resistant to a procedure this way:

I guess Bayesian statistics provides a mathematical definition of a closed mind. Anyone with a prior of zero about something can never learn from any amount of evidence, because anything multiplied by zero is still zero.
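The closed mind can be demonstrated in a few lines. This is a sketch with made-up likelihoods (each piece of evidence is assumed to be nine times more likely if the hypothesis is true); `update` and `after_repeated_evidence` are hypothetical helper names.

```python
def update(prior, likelihood_if_true, likelihood_if_false):
    """One Bayes update: posterior probability that the hypothesis is true."""
    evidence = prior * likelihood_if_true + (1 - prior) * likelihood_if_false
    return prior * likelihood_if_true / evidence


def after_repeated_evidence(prior, rounds=20):
    """Apply the same supporting observation `rounds` times in a row."""
    belief = prior
    for _ in range(rounds):
        # Assumed numbers: 0.9 if the hypothesis is true, 0.1 if it is false.
        belief = update(belief, 0.9, 0.1)
    return belief


closed_mind = after_repeated_evidence(0.0)
skeptic = after_repeated_evidence(0.001)
print(closed_mind, skeptic)
```

The prior of zero stays at zero after twenty rounds of supporting evidence, while the skeptic who started at one in a thousand ends up very close to certainty.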

I think this means one is stupid. Perhaps this resistance to a method is behind much of the fulminating about Autonomy’s Dynamic Reasoning Engine and its Intelligent Data Operating Layer?

Stephen E Arnold, October 8, 2014

Microsoft Prediction Technology: What about Windows Phone Sales?

October 7, 2014

I am amused when a company can roll out a product that people do not like. A good example is the Windows 8 version of the popular operating system. I think of Vista and Windows ME. I wonder how a company cannot “predict” how its own customers will react to a series of very expensive operating system changes.

The answer is that Microsoft’s ability to predict is not particularly good in my opinion. I won’t mention Windows Phone. I would point out that Apple’s iPhone 6 moved millions of units over a weekend. Did Microsoft predict that its phone would perform at a comparable level? Probably.

I read “A New Kind of Data-Driven Predictive Methodology.” The article is one of a flurry of fancy math stories that are choking my Overflight intelligence system.

The article explains that Microsoft predicted the Scottish independence vote and:

Microsoft…correctly predicted the winners of all 15 World Cup knockout games earlier this year and got the Obama vs. Romney outcome right in 50 of 51 jurisdictions (the states plus the District of Columbia) in the 2012 U.S. presidential election.

Pretty impressive until I think about Microsoft’s dismal track record with its own products’ acceptance by its own customers.

If you want to get more insight into a system that seems to perform well for non Microsoft questions, dig in. Microsoft is into social, reinventing survey research, and analysis of data that “must be accurate.”

Yep, accurate data help. How did those predictions about the Fast Search & Transfer acquisition work out? I will try to “Delve” into that question.

Stephen E Arnold, October 7, 2014

IBM’s Watson Analytics for the Average Businessperson

October 6, 2014

The article on Geekzone titled “IBM Introduces Powerful Analytics For Everyone” discusses the recent announcement from IBM. The promise of natural language-based analytics that is easier to get and to use is meant to cater to the modern businessperson. Three major advances in analytics technology have been applied to Watson Analytics, including a streamlined “single business analytics experience,” “guided predictive analytics” that brings relevant material to the surface, and finally a “natural language dialogue” in familiar business jargon. Senior Vice President Bob Picciano explains in the article,

“Watson Analytics is designed to help all business people – from sales reps on the road to company CEOs – see patterns, pursue ideas and improve all types of decisions. We have eliminated the barrier between the answers they seek, the analytics they want and the data in the form they need. The combination of Watson-fueled analytics to magnify human cognition, the vast potential of big data, and cloud-scale delivery to PCs, smart phones and other devices is transformational.”

IBM has kept the actual businessperson in mind, the businessperson who spends half of their time collecting accurate data. By automating many of the steps in analysis, Watson Analytics aims to improve the efficiency and relevance of the analysis at hand.

Chelsea Kerwin, October 06, 2014

Sponsored by ArnoldIT.com, developer of Augmentext
