HP Enterprise Investigative Analytics

February 5, 2016

Shiver me timbers. Batten the hatches. There is a storm brewing in the use of Autonomy-type methods to identify risks and fraud. To be fair, HP Enterprise no longer pitches Autonomy, but the spirit of Dr. Mike Lynch’s 1990s technology is there, just a hint maybe, but definitely noticeable to one who has embraced IDOL.

For the scoop, navigate to “HPE Launches Investigative Analytics, Using AI and Big Data to Identify Risk.” I was surprised that the story’s headline did not add “When Swimming in the Data Lake.” But the message is mostly clear despite the buzzwords.

Here’s a passage I highlighted:

The software is initially geared toward financial services organizations, and it combines existing HPE products like Digital Safe, IDOL, and Vertica all on one platform. By using big data analytics and artificial intelligence, it can analyze a large amount of data and help pinpoint potential risks of fraudulent behavior.

Note the IDOL thing.

The write up added:

Investigative Analytics starts by collecting both structured sources like trading systems, risk systems, pricing systems, directories, HR systems, and unstructured sources like email and chat. It then applies analysis to query “aggressively and intelligently across all those data sources,” Patrick [HP Enterprise wizard] said. Then, it creates a behavior model on top of that analysis to look at certain communication types and see if they can define a certain problematic behavior and map back to a particular historical event, so they can look out for that type of communication in the future.

This is okay, but the words, terminology, and phrasing remind me of 1990s Autonomy marketing collateral, BAE’s presentations after licensing Autonomy technology in the late 1990s, the i2 Ltd. Analyst Notebook collateral, and, more recently, the flood of jabber about Palantir’s Metropolitan Platform and Thomson Reuters’ version of Metropolitan called QA Direct or QA Studio or QA fill in the blank.
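Strip away the buzzwords and the quoted workflow reduces to three steps: pool structured and unstructured records, query across the pool, and score behavior against patterns tied to past incidents. Here is a minimal sketch of that pattern; the source names, fields, and scoring rule are invented and are not HPE’s Investigative Analytics code.

```python
# Minimal sketch of the collect / query / model workflow described above.
# Source names, fields, and the scoring rule are hypothetical illustrations,
# not HPE's actual Investigative Analytics implementation.

from dataclasses import dataclass

@dataclass
class Event:
    source: str    # e.g. "trading_system", "hr_system", "email", "chat"
    actor: str     # the person or account the record is about
    text: str      # unstructured payload (empty string for structured rows)
    flagged: bool  # did this record map back to a known historical incident?

def collect(sources):
    """Blend structured and unstructured sources into one event stream."""
    return [event for records in sources.values() for event in records]

def behavior_score(events, actor, risky_terms):
    """Toy behavior model: the fraction of an actor's communications that
    contain terms which co-occurred with past problematic behavior."""
    mine = [e for e in events if e.actor == actor and e.text]
    if not mine:
        return 0.0
    hits = sum(any(t in e.text.lower() for t in risky_terms) for e in mine)
    return hits / len(mine)

# Usage: score one actor against terms learned from historical flagged events.
events = collect({
    "email": [Event("email", "trader_17", "move it off the books before the audit", False)],
    "trading_system": [Event("trading_system", "trader_17", "", True)],
})
print(behavior_score(events, "trader_17", {"off the books", "audit"}))  # 1.0
```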

The fact that HP Enterprise is pitching this new service developed with “one bank” at a legal eagle tech conference is a bit like me offering to do my Dark Web Investigative Tools lecture at Norton Elementary School. A more appropriate audience might deliver more bang for each PowerPoint slide, might it not?

Will HP Enterprise put a dent in the vendors already pounding the carpeted halls of America’s financial institutions?

HP Enterprise stakeholders probably hope so. My hunch is that a me-too, me-too product is a less than inspiring use of the collection of acquired technologies HP Enterprise appears to put in a single basket.

Stephen E Arnold, February 5, 2016

Cheerleading for the SAS Text Exploration Framework

January 27, 2016

SAS is a stalwart in the number crunching world. I visualize the company’s executives chatting among themselves about the Big Data revolution, the text mining epoch, and the predictive analytics juggernaut.

Well, SAS is now tapping that staff interaction.

Navigate to “To Data Scientists and Beyond! One of Many Applications of Text Analytics.” There is an explanation of the ease of use of SAS. Okay, but my recollection was that I had to hire a PhD in statistics from Cornell University to chase down the code which slowed our survivability analyses to a meander instead of a trot.

I learned:

One of the misconceptions I often see is the expectation that it takes a data scientist, or at least an advanced degree in analytics, to work with text analytics products. That is not the case. If you can type a search into a Google toolbar, you can get value from text analytics.

The write up contains a screenshot too. Where did the text analytics plumbing come from? Perchance from an acquisition, like the canny 2008 purchase of Teragram and its late 1990s technology?

The write up focuses on law enforcement and intelligence applications of text analytics. I find that interesting because Palantir is allegedly deriving more than 60 percent of the firm’s revenue from commercial customers like JP Morgan and starting to get some traction in health care.

Check out the screenshot. That is worth 1,000 words. SAS has been working on the interface thing to some benefit.

Stephen E Arnold, January 27, 2016

Pearson: Revenue Challenges and Digital Initiatives

January 26, 2016

I used to follow Pearson when it owned a wax museum and a number of other fascinating big revenue opportunities. Today the company is still big: $8 billion in revenue, 40,000 employees, and offices in 70 countries. (Lots of reasons for senior executives to do field trips, I assume.)

I noted that Pearson plans to RIF (reduce in force) 4,000 employees. Let’s see. Yep, that works out to 10 percent of the “team.” Without the wax museum as a job option, will these folks become entrepreneurs?

I read “Turning Digital Learning Into Intellectual Property.” The title snagged me, and I assume that some of the 4,000 folks now preparing to find their future elsewhere were intrigued.

The write up reported:

Pearson is also positioning itself as a major center for the analysis of educational big data.

Ah, ha. A publishing outfit involved in education is getting with the Big Data thing.

How is a traditional publishing company going to respond to the digital opportunities it now perceives?

big data analysis methods will enable researchers to “capture stream or trace data from learners’ interactions” with learning materials, detect “new patterns that may provide evidence about learning,” and “more clearly understand the micro-patterns of teaching and learning by individuals and groups.” Big data methods of pattern recognition are at the heart of its activities, and Pearson ambitiously aims to use pattern recognition to identify generalizable insights into learning processes not just at the level of the individual learner but at vast scale.

Yes, vast. Micro patterns. Big Data.

My mouth is watering and my ageing brain cells hunger for the new learning.

Big questions have to be answered. For example, who owns learning theory?

I recall my brush with the education department. Ugly. I thought that most of the information to which I was exposed was baloney. For evidence, I think back to my years in Brazil with my hit and miss involvement with the Calvert Course, the “English not spoken here” approach of the schools in Campinas, and the seamless transition I made back to my “regular” US school after having done zero in the learning aquaria for several years.

I also recall the look of befuddlement on the faces of checkout clerks when I point out that a cash register tally is incorrect, or the consternation that furrows the brow when I provide bills and two pennies.

My hunch is that the education thing is a juicy business, but I am not confident in Pearson’s ability to catch up with the folks who are not saddled with the rich legacy of printing books and charging lots of money for them.

This is a trend worth watching. Will it match the success of Ebsco’s “discovery” system? Will it generate the payoff Thomson Reuters is getting by reselling Palantir? Will it allow Pearson to make the bold moves that so many traditional publishing companies have made after they embraced XML as the silver bullet and incantation to ward off collapsing revenues?

I for one will be watching. Who knows? Maybe I will return to school to brighten the day of an adjunct professor at the local university. (This institution, I might add, is struggling with FBI investigations, allegations of sexual misconduct, and a miasma of desperation.)

Education. Great stuff.

Stephen E Arnold, January 26, 2016

MIT Tries to Rescue IBM Watson

January 21, 2016

“Don’t Blame Watson for IBM’s Slide” is a remarkable example of “real” journalism. I love it when academics (you know, the innocent party in the student loan maneuver) defend a really big outfit (IBM).

The write up focuses squarely on the problem IBM has created; for example:

If you’ve seen IBM’s advertisements or have read the proclamations that the company is making a big bet on Watson, its famed “cognitive computing” engine, you might be tempted to think the gamble is failing.

I know that IBM has made a serious miscalculation with the Watson play. Consider HP, which is learning the hard way that search is not something that generates tons of dough without some serious management expertise. How is HP solving its conundrum with regard to the $11 billion bet on mid-1990s technology? HP is just going to fly like the legal eagles. Now legal eagles don’t like a sparrow in the flock. You may be able to guess who the winners will be in the HP solution.

IBM has embraced Lucene. That’s a good move. Palantir has done the same thing. IBM then failed to learn from outfits like Palantir even though IBM owns i2 and could have pursued a similarly focused and clear headed approach.

Nope. IBM made Watson into what I consider Jack Benny Show material. Watson does game shows without realizing that post production wizardry makes a “win” look like the mouse clicks of a film school grad. Watson does cook books. Watson cures cancer. Watson solves insurance woes. Watson does not generate the type of revenue that will make a dent in IBM’s revenue needs.

But IBM has achieved one thing. IBM has made cognitive computing the skateboard on which more nimble outfits are riding. Perhaps IBM should charge these start ups for the psychological assistance the wonky Watson PR campaigns have delivered without a prescription or a juicy hourly fee.

The write up reports in “real” journalistic style:

… Even if Watson had become a big business by now, IBM would still be in huge trouble because of trends that have been afoot for a very long time—notably the rise of cloud computing services that have diminished the need for large organizations to buy IBM servers and mainframes. This was in play long before Ginni Rometty was named CEO in 2011, but her predecessor, Sam Palmisano, was better able to mask the decline and keep Wall Street happy by selling off unprofitable lines of business, buying high-margin software companies, and returning billions of dollars to investors through dividends and share buybacks. Now there are fewer financial levers left to pull. Revenue has been falling for 15 quarters in a row.

Yep, blame Sam. Blame someone.

The reality is that Lucene, acquired technology, and home brew software are not going to do lots of stuff.

Back to Palantir. Whatever the company’s faults (and there are some Doozies), Palantir aced IBM in the intelligence sector. Palantir focused and then used indirect sales methods to move into financial services and health care.

IBM buys ads and does PR. Palantir gets others to push the product. I wonder if anyone at some of the banks knows what a “helper” is or how to create an “object” in Palantir land.

Probably not.

But IBM has demonstrated that it lacks focus and has no effective strategy to make search and information access generate big money or, in Palantir’s case, big flows of venture capital.

What IBM can do, however, is get the “real” journalists to play along. After 15 consecutive quarters of revenue excitement, IBM needs to find a solution.

Hint: i2 can help. Will IBM listen? Nah, it’s more fun thinking up ways to hire Bob Dylan to explain Watson or visualizing Watson as chemical structures.

Stephen E Arnold, January 21, 2016

Big Data Blending Solution

January 20, 2016

I would have used Palantir or maybe our own tools. But an outfit named National Instruments found a different way to perform data blending. “How This Instrument Firm Tackled Big Data Blending” provides a case study and a rah rah for Alteryx. Here’s the paragraph I highlighted:

The software it [National Instruments] selected, from Alteryx, takes a somewhat unique approach in that it provides a visual representation of the data transformation process. Users can acquire, transform, and blend multiple data sources essentially by dragging and dropping icons on a screen. This GUI approach is beneficial to NI employees who aren’t proficient at manipulating data using something like SQL.

The graphical approach has been part of a number of tools. There are also some systems which just figure out where to put what.
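For readers who want to see what “blending” means without the drag-and-drop icons, here is a minimal sketch of the same acquire, transform, and merge steps in code. The file names and columns are invented; this is not Alteryx’s workflow, just the idea expressed in pandas.

```python
# Minimal data blending sketch: acquire, transform, and merge two structured
# sources. File and column names are invented for illustration only.

import pandas as pd

# Acquire: two structured sources sharing a key.
test_results = pd.read_csv("test_results.csv")     # columns: unit_id, test_date, passed
build_records = pd.read_csv("build_records.csv")   # columns: unit_id, product_line, factory

# Transform: normalize the join key so the merge does not silently drop rows.
for frame in (test_results, build_records):
    frame["unit_id"] = frame["unit_id"].astype(str).str.strip().str.upper()

# Blend: a left join keeps every test result even when the build record is missing.
blended = test_results.merge(build_records, on="unit_id", how="left")

# A summary a business user might actually want: pass rate by product line.
print(blended.groupby("product_line")["passed"].mean())
```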

The issue for me is, “What happens to rich media like imagery and unstructured information like email?”

There are systems which handle these types of content.

Another challenge is the dependence on structured relational data tables. Certain types of operations are difficult in this environment.

The write up is interesting, but it reveals that a narrow view of available tools may produce a partial solution.

Stephen E Arnold, January 20, 2016

Boolean Search: Will George Boole Rotate in His Grave?

January 12, 2016

George Boole is, for most math wonks, the father of Boolean logic, a nifty way to talk about sets and what they contain. One can perform algebra and differential equations whilst pondering George and his method for thinking about fruits when he went shopping.

In the good old days of search, there was one way to search. One used AND, OR, NOT, and maybe a handful of other logic operators to retrieve information from structured indexes and content. Most folks with a library science degree or a friendly math major can explain Boolean reasonably well. Here’s an example which might even work on CSA ProQuest (née Lockheed Dialog) today:

CC=77? AND scam?

The systems, when fed the right query, would reply with pretty good precision and recall. Precision means the results returned are actually relevant; recall means the relevant items that should be in the result set actually show up there.
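For the curious, here is what that query and the two measures look like on a toy document collection. The documents and the relevance judgments are invented for the example.

```python
# Toy illustration of a Boolean AND query plus precision and recall.
# The document collection and the "relevant" set are invented.

docs = {
    1: "cc=77 scam alert in the payments system",
    2: "cc=77 quarterly report, nothing unusual",
    3: "scam warning unrelated to category 77",
    4: "cc=77 scam follow up and remediation notes",
}
relevant = {1, 4}  # what a human judge says actually matters

# Boolean AND: the intersection of the posting sets for each term.
hits_cc77 = {doc_id for doc_id, text in docs.items() if "cc=77" in text}
hits_scam = {doc_id for doc_id, text in docs.items() if "scam" in text}
retrieved = hits_cc77 & hits_scam

precision = len(retrieved & relevant) / len(retrieved)  # how much of what came back is useful
recall = len(retrieved & relevant) / len(relevant)      # how much of the useful stuff came back

print(retrieved, precision, recall)  # {1, 4} 1.0 1.0 on this tiny, rigged collection
```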

I thought about Boole, fruit, and logic when I read “The Best Boolean and Semantic Search Tool.” Was I going to read about SDC’s ORBIT, ESA Quest, or (heaven help me) the original Lexis system?

Nope.

I learned about LinkedIn. Not one word about Palantir’s injecting Boolean logic squarely in the middle of its advanced data management processes. Nope.

LinkedIn. I thought that LinkedIn used open source Lucene, but maybe the company has invested in Exorbyte, Funnelback, or some other information access system.

The write up stated:

If you use any source of human capital data to find and recruit people (e.g., your ATS/CRM, resume databases, LinkedIn, Google, Facebook, Github, etc.) and you really want to understand how to best approach your talent sourcing efforts, I recommend watching this video when you have the time.

Okay, human resource functions. LinkedIn, right.

But there is zero content in the write up. I was pointed to a video called “Become a LinkedIn Search Ninja: Advanced Boolean Search” on YouTube.

Here’s what I learned before I killed the one hour video:

  1. The speaker is in charge of personnel and responsible for Big Data activities related to human resources
  2. Search is important to LinkedIn users
  3. Profiles of people are important
  4. Use OR. (I found this suggestion amazing.)
  5. Use iterative, probabilistic, and natural language search, among others. (Yep, that will make sense to personnel professionals.)

Okay. I hit the stop button. Not only will George be rotating, I may have nightmares.

Please, let librarians explicitly trained in online search and retrieval explain methods for obtaining on point results. Failing a friendly librarian, ask someone who has designed a next generation system which provides “helpers” to allow the user to search and get useful outputs.

Entity queries are important. LinkedIn can provide some useful information. The tools to obtain that high value information are a bit more sophisticated than the recommendations in this video.

Stephen E Arnold, January 12, 2016

The Secret Weapon of Predictive Analytics Revealed

January 8, 2016

I like it when secrets are revealed. I learned how to unlock the treasure chest containing the secret weapon of predictive analytics. You can too. Navigate to “Contextual Integration Is the Secret Weapon of Predictive Analytics.”

The write up reports:

Predictive analytics has been around for years, but only now have data teams begun to refine the process to develop more accurate predictions and actionable business insights. The availability of tremendous amounts of data, cheap computation, and advancements in artificial intelligence has presented a massive opportunity for businesses to go beyond their legacy methodologies when it comes to customer data.

And what is the secret?

Contextual integration.

Here’s the explanation:

A major part of this transformation is the realization that data needs to be looked at from as many angles as possible in an effort to create a multi-dimensional profile of the customer. As a consequence, we view recommendations through the lens of ensembles in which each modeled dimension may be weighted differently based on real-time contextual information. This means that, rather than looking at just transactional information, layering in other types of information, such as behavioral data, gives context and allows organizations to make more accurate predictions.
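Stripped of the jargon, “ensembles in which each modeled dimension may be weighted differently based on real-time contextual information” is a weighted blend of scores whose weights shift with context. A minimal sketch, with invented dimensions, weights, and context rule:

```python
# Toy context-weighted ensemble score. Dimension names, weights, and the
# context rule are invented; the point is only to show the mechanics.

def ensemble_score(customer, context):
    # Each modeled "dimension" produces its own score in [0, 1].
    dimension_scores = {
        "transactional": customer["recent_purchases"] / 10.0,
        "behavioral": customer["pages_viewed_today"] / 50.0,
        "demographic": 1.0 if customer["segment"] == "loyal" else 0.3,
    }

    # Real-time context shifts the weights: during an active browsing session,
    # behavioral signals count for more than purchase history.
    if context.get("in_session"):
        weights = {"transactional": 0.2, "behavioral": 0.6, "demographic": 0.2}
    else:
        weights = {"transactional": 0.6, "behavioral": 0.2, "demographic": 0.2}

    return sum(weights[d] * min(score, 1.0) for d, score in dimension_scores.items())

print(ensemble_score(
    {"recent_purchases": 4, "pages_viewed_today": 30, "segment": "loyal"},
    {"in_session": True},
))  # 0.64
```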

Is this easy?

Nope. The article reminds the reader:

A sound approach follows the scientific method, starting with understanding the business domain and the underlying data that is available. Then data scientists can prepare to test a particular hypothesis, build a model, evaluate results, and refine the model to draw general conclusions.

I would point out that folks at Palantir, Recorded Future, and other outfits have been working for years to deal with integration, math, and sense making.

I wonder if the wonks at these firms have realized that contextual integration is the secret. I assume one could ask IBM Watson, or one could simply accept that the difference between interpreting marketing inputs from a closed user base and dealing with slightly more slippery data involves more than one secret.

Stephen E Arnold, January 8, 2016

Dark Web and Tor Investigative Tools Webinar

January 5, 2016

Telestrategies announced on January 4, 2016, a new webinar for active LEA and intel professionals. The one hour program is focused on tactics, new products, and ongoing developments for Dark Web and Tor investigations. The program is designed to provide an overview of public, open source, and commercial systems and products. These systems may be used as standalone tools or integrated with IBM i2 ANB or Palantir Gotham. More information about the program is available from Telestrategies. There is no charge for the program. In 2016, Stephen E Arnold’s new Dark Web Notebook will be published. More information about the new monograph upon which the webinar is based may be obtained by writing benkent2020 at yahoo dot com.

Stephen E Arnold, January 5, 2016

Are Search Unicorns Sub Prime Unicorns?

January 4, 2016

The question is a baffler. Navigate to “Sorting Truth from Myth at Technology Unicorns.” If the link is bad or you have to pay to read the article in the Financial Times, pony up, go to the library, or buy hard copy. Don’t complain to me, gentle reader. Publishers are in need of revenue. Now the write up:

The assumption is that a unicorn exists. What exists are firms with massive amounts of venture funding and billion dollar valuations. I know the money is or was real, but the “sub prime unicorn” is a confection from money thought leader Michael Moritz. A sub prime unicorn is a company “built on the flimsiest of edifices.” Does this mean fairy dust or something more substantial?

According to the write up:

But the way in which private market valuations have become skewed and inflated as start-ups have delayed IPOs raises questions about the financing of innovation. Despite the excitement, venture capital has produced weak returns in recent decades — only a minority of funds have produced rewards high enough to compensate investors for illiquidity and opacity.

Why would venture funded start ups perform better than a start up financed by mom, dad, and one’s slightly addled, but friendly, great aunt?

The article then makes a reasonably sane point:

With the rise in US interest rates, the era of ultra-cheap financing is ending. As it does, Silicon Valley’s unicorns are losing their mystique and having to work to raise equity, sometimes at valuations below those they achieved before. The promise of private financing is being tested, and there will be disappointments. It does not pay to be dazzled by mythical beasts.

Let’s think a moment about search and content processing. The mid tier consulting firms—the outfits I call azure chip outfits—have generated some pretty crazy estimates about the market size for search and content processing solutions.

The reality is at odds with these speculative, marketing fueled prognostications. Yep, I would include the wizards at IDC who wanted $3,500 to sell an eight page document with my name on it without my permission. Refresh yourself on the IDC Schubmehl maneuver at this link.

Based on my research, two enterprise search outfits broke $150 million in revenues prior to 2011: Endeca tallied an estimated $150 million in revenues and Autonomy reported $700 million in revenues. Both outfits were sold.

Since 2012 exactly zero enterprise search firms have generated more than $700 million in revenues. Now the wild and crazy funding of search vendors has continued apace since 2012. There are a number of search and retrieval companies and some next generation content processing outfits which have ingested tens of millions of dollars.

How many of these outfits have gone public in the zero cost money environment? Based on my records, zero. Why haven’t Attivio, BA Insight, Coveo, Palantir and others cashed in on their technology, surging revenues, and market demand?

There are three reasons:

  1. The revenues are simply acceptable, not stunning. In the post-Fast Search & Transfer era, twiddling the finances carries considerable risks. Think about a guilty verdict for a search wizard. Yep, bad.
  2. The technology is a rehash gilded with new jargon. Take a look at the search and content processing systems, and you find the same methods and functions that have been known and in use for more than 30 years. The flashy interfaces are new, but the plumbing still delivers precision and recall which have hit a glass ceiling at 80 to 90 percent accuracy for the top performing systems. Looking for a recipe with good enough relevance is acceptable. Looking for a bad actor with a significant margin for error is not so good.
  3. The smart software performs certain functions at a level comparable to the performance of a subject matter indexer when certain criteria are met. The notion of human editors riding herd on entity and synonym dictionaries is not one that makes customers weep with joy. Smart software helps with some functions, but today’s systems remain anchored in human operators, and the work these folks have to perform to keep the systems in tip top shape is expensive. (A tiny example of that dictionary upkeep appears after this list.) Think about this human aspect in terms of how Palantir explains architects’ changes to type operators or the role of content intake specialists using revisioning and similar field operations.
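What does “riding herd on synonym dictionaries” look like? Here is a minimal sketch with invented entries; multiply the dictionary by several thousand rows and the maintenance bill becomes obvious.

```python
# Toy synonym-dictionary expansion of the kind human editors maintain.
# The entries are invented; production systems carry thousands of them,
# which is exactly why the upkeep is expensive.

synonyms = {
    "ibm": ["international business machines", "big blue"],
    "attorney": ["lawyer", "counsel", "legal eagle"],
}

def expand(query_terms):
    """Return the original terms plus any curated synonyms."""
    expanded = []
    for term in query_terms:
        expanded.append(term)
        expanded.extend(synonyms.get(term.lower(), []))
    return expanded

print(expand(["IBM", "attorney", "fraud"]))
# ['IBM', 'international business machines', 'big blue',
#  'attorney', 'lawyer', 'counsel', 'legal eagle', 'fraud']
```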

Why do I make this point in the context of unicorns? Search has one or two unicorns. I would suggest Palantir is a unicorn. When I think of Palantir, I consider this item:

To summarize, only a small number of companies reach the IPO stage.

Also, the HP Autonomy “deal” is a quasi unicorn. IBM’s investment in Watson is a potential unicorn if and when IBM releases financial data about its TV show champion.

Then there are a number of search and content processing creatures which could be hybrids of a horse and a donkey. The investors are breeders who hope that the offspring become champions. Long shots all.

The Financial Times’s article expresses a broad concept. The activities of the search and content processing vendors in the next 12 to 18 months will provide useful data about the genetic make up of some technology lab creations.

Stephen E Arnold, January 4, 2016

Weekly Watson: In the Real World

January 2, 2016

I want to start off the New Year with a look at Watson in the real world. My real world is circumscribed by abandoned coal mines and hollows in rural Kentucky. I am pretty sure this real world is not the real world assumed in “IBM Watson: AI for the Real World.” IBM has tapped Bob Dylan, a TV game show, and odd duck quasi chemical symbols to communicate the importance of search and content processing.

The write up takes a different approach. In fact, the article begins with an interesting comment:

Computers are stupid.

There you go. A snazzy one liner.

The reminder that a man made device is not quite the same as one’s faithful boxer dog or the next door neighbor’s teen is startling.

The article summarizes an interview with a Watson wizard, Steven Abrams, director of technology for the Watson Ecosystem. This is one of those PR inspired outputs which I quite enjoy.

The write up quotes Abrams as saying:

“You debug Watson’s system by asking, ‘Did we give it the right data?'” Abrams said. “Is the data and experience complete enough?”

Okay, but isn’t this Dr. Mike Lynch’s approach? Lynch, as you may recall, was the Cambridge University wizard who was among the first to commercialize “learning” systems in the 1990s.

According to the write up:

Developers will have data sets they can “feed” Watson through one of over 30 APIs. Some of them are based on XML or JSON. Developers familiar with those formats will know how to interact with Watson, he [Abrams] explained.
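What does “feeding” a system a JSON training example over an API look like in the abstract? A minimal sketch follows. The endpoint, credential, and payload shape are placeholders and are not IBM’s actual Watson API; the sketch only shows the general pattern the quotation describes.

```python
# Generic illustration only: the URL, token, and payload fields are placeholders,
# not IBM's Watson API. The point is what posting a JSON training example
# to a service over HTTP looks like.

import json
import urllib.request

training_document = {
    "text": "Customer reports intermittent outage after firmware update.",
    "label": "hardware_support",  # the answer a human expert would assign
}

request = urllib.request.Request(
    url="https://example.com/v1/train",  # placeholder endpoint
    data=json.dumps(training_document).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer YOUR_TOKEN_HERE",  # placeholder credential
    },
    method="POST",
)

with urllib.request.urlopen(request) as response:
    print(response.status, response.read().decode("utf-8"))
```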

As those who have used the 25 year old Autonomy IDOL system know, preparing the training data takes a bit of effort. Then as the content from current content is fed into the Autonomy IDOL system, the humans have to keep an eye on the indexing. Ignore the system too long, and the indexing “drifts”; that is, the learned content is not in tune with the current content processed by the system. Sure, algorithms attempt to keep the calibrations precise, but there is that annoying and inevitable “drift.”
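The drift problem is easy to show without knowing anything about IDOL’s internals. A minimal sketch: compare the vocabulary profile of the original training set with the documents flowing in today, and flag the index for recalibration when the two diverge. The documents and the threshold are invented.

```python
# Toy drift check: compare word-frequency profiles of the training corpus and
# the current intake, and flag recalibration when they diverge. This assumes
# nothing about IDOL or Watson internals; it only illustrates the concept.

from collections import Counter
from math import sqrt

def profile(docs):
    """Word-frequency profile of a list of documents."""
    words = Counter()
    for doc in docs:
        words.update(doc.lower().split())
    return words

def cosine_similarity(a, b):
    shared = set(a) & set(b)
    dot = sum(a[w] * b[w] for w in shared)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

training_docs = ["quarterly derivatives exposure report", "derivatives risk summary"]
current_docs = ["crypto token listing announcement", "token airdrop risk advisory"]

similarity = cosine_similarity(profile(training_docs), profile(current_docs))
if similarity < 0.5:  # threshold is arbitrary for the example
    print(f"similarity {similarity:.2f}: vocabulary has drifted; retrain or recalibrate")
```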

IBM’s system, which strikes me as a modification of the Autonomy IDOL approach with a touch of Palantir analytics stirred in, is likely to be one expensive puppy to groom for the dog show ring.

The article profiles the efforts of a couple of IBM “partners” to make Watson useful for the “real” world. But the snip I circled in IBM red-ink red was this one:

But Watson should not be mistaken for HAL. “Watson will not initiate conduct on its own,” IBM’s Abrams pointed out. “Watson does not have ambition. It has no objective to respond outside a query.” “With no individual initiative, it has no way of going out of control,” he continued. “Watson has a plug,” he quipped. It can be disconnected. “Watson is not going to be applied without individual judgment … The final decision in any Watson solution … will always be [made by] a human, being based on information they got from Watson.”

My hunch is that Watson will require considerable human attention. But it may perform best on a TV show or in a motion picture where post production can smooth out the rough edges.

Maybe entertainment is “real”, not the world of a Harrod’s Creek hollow.

Stephen E Arnold, January 2, 2016
