Data Analysis by Algorithm

December 22, 2014

The folks at Google may have the answer for the dearth of skilled data analysts out there. Unfortunately for our continuing job crisis, that answer does not lie in (human) training programs. The Google Research Blog discusses “Automatically Making Sense of Data.” Writers Kevin Murphy and David Harper ask:

“What if one could automatically discover human-interpretable trends in data in an unsupervised way, and then summarize these trends in textual and/or visual form? To help make progress in this area, Professor Zoubin Ghahramani and his group at the University of Cambridge received a Google Focused Research Award in support of The Automatic Statistician project, which aims to build an ‘artificial intelligence for data science’.”

Trends in time-series data have thus far provided much fodder for the team’s research. The article details an example involving solar-irradiance levels over time and discusses modeling the data with Gaussian process models. Murphy and Harper report on the Cambridge team’s progress:

“Prof Ghahramani’s group has developed an algorithm that can automatically discover a good kernel, by searching through an open-ended space of sums and products of kernels as well as other compositional operations. After model selection and fitting, the Automatic Statistician translates each kernel into a text description describing the main trends in the data in an easy-to-understand form.”
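For the curious, here is a toy Python sketch of what “searching through an open-ended space of sums and products of kernels” can look like. It is not the Cambridge group’s code. It assumes three base kernels (SE, PER, LIN) with fixed, hand-picked hyperparameters, grows only the top level of the current best expression, and scores each candidate with a Gaussian process log marginal likelihood minus a crude BIC-style penalty of one parameter per base kernel. The real system also fits hyperparameters and writes the plain-English descriptions; neither step is attempted here. All function names are hypothetical.

```python
import numpy as np

# Base kernels with FIXED hyperparameters (an assumption made for brevity;
# a real search would optimize them for every candidate expression).
def se(x1, x2, ls=1.0):
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def per(x1, x2, period=1.0, ls=1.0):
    d = np.abs(x1[:, None] - x2[None, :])
    return np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / ls ** 2)

def lin(x1, x2):
    return np.outer(x1, x2)

BASE = {"SE": se, "PER": per, "LIN": lin}

def gram(expr, x):
    """Evaluate a kernel expression: a base name or a tuple (op, left, right)."""
    if isinstance(expr, str):
        return BASE[expr](x, x)
    op, left, right = expr
    a, b = gram(left, x), gram(right, x)
    # Sums and elementwise products of valid kernels are valid kernels.
    return a + b if op == "+" else a * b

def describe(expr):
    if isinstance(expr, str):
        return expr
    op, left, right = expr
    return f"({describe(left)} {op} {describe(right)})"

def n_leaves(expr):
    return 1 if isinstance(expr, str) else n_leaves(expr[1]) + n_leaves(expr[2])

def score(expr, x, y, noise=0.1):
    """GP log marginal likelihood minus a crude BIC-style size penalty."""
    n = len(y)
    L = np.linalg.cholesky(gram(expr, x) + noise ** 2 * np.eye(n))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    lml = -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * n * np.log(2 * np.pi)
    return lml - 0.5 * n_leaves(expr) * np.log(n)

def greedy_search(x, y, depth=2):
    """Pick the best base kernel, then repeatedly try adding or multiplying one more."""
    best = max(BASE, key=lambda k: score(k, x, y))
    best_score = score(best, x, y)
    for _ in range(depth):
        candidates = [(op, best, b) for op in "+*" for b in BASE]
        cand = max(candidates, key=lambda e: score(e, x, y))
        cand_score = score(cand, x, y)
        if cand_score <= best_score:
            break  # no expansion improves the score; stop early
        best, best_score = cand, cand_score
    return best, best_score

# Synthetic series: a period-1 wave plus a linear trend plus a little noise.
x = np.linspace(0, 10, 60)
rng = np.random.default_rng(0)
y = np.sin(2 * np.pi * x) + 0.3 * x + 0.05 * rng.standard_normal(60)
best, best_score = greedy_search(x, y)
print(describe(best), round(float(best_score), 1))
```

On this synthetic series the winning expression should include PER. Which exact expression wins depends on the fixed hyperparameters, which is one reason the real project optimizes them.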

Naturally, the team is going on to work with other kinds of data. We wonder—have they tried it on Google Glass market projections?

There’s a simplified version available for demo at the project’s website, and an expanded version should be available early next year. See the write-up for the technical details.

Cynthia Murrell, December 22, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

Narrative Science Gets Money to Crunch Numbers

December 18, 2014

A small corner of the big data sector, one that uses text analysis to generate content and reports, is burgeoning with startups. Venture Beat takes a look at how one of these startups, Narrative Science, is gaining more attention in the enterprise software market: “Narrative Science Pulls In $10M To Analyze Corporate Data And Turn It Into Text-Based Reports.”

Narrative Science started out with software that created sports and basic earnings articles for newspaper filler. It has since grown into helping businesses in different industries take their data by the digital horns and leverage it.

Narrative Science recently received $10 million in funding to further develop its software. Stuart Frankel, chief executive, is driven to help all industries save time and resources by better understanding their data:

“ ‘We really want to be a technology provider to those media organizations as opposed to a company that provides media content,’ Frankel said… ‘When humans do that work…it can take weeks. We can really get that down to a matter of seconds.’”

From making content to providing technology? It is quite a leap for Narrative Science. While the company appears to have a good product, what exactly does it do?

Whitney Grace, December 18, 2014
Sponsored by ArnoldIT.com, developer of Augmentext

Watson in a Beta Phase

December 18, 2014

IBM has put Watson to work in different fields, including intelligence, cooking, and medicine. The goal is to apply Watson’s advanced analytic software to improve the quality and workflow in these fields as well as discover new insights. Watson Analytics will launch its first public beta test this month, three months after its private beta tests, says ZDNet in “IBM’s Watson Analytics Enters Public Beta.”

Watson Analytics will be freemium software available for mobile and Web devices to run predictive analytics and use the information for visual storytelling.

How does it work?

“Users of Watson Analytics feed in their own raw data, say, in the form of a spreadsheet, which the service then crunches with its own statistical analysis to highlight associations between different variables. It saves execs from needing to know how to write their own scripts or understand statistics in order to derive meaning from their data.”
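A bare-bones illustration of “highlighting associations between different variables” might rank every pair of spreadsheet columns by the strength of their Pearson correlation. The sketch below is hypothetical: IBM has not said that Watson Analytics works this way, and the function names and sample columns are made up.

```python
import csv
import io
import itertools
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no association
    return cov / (sx * sy)


def strongest_associations(csv_text, top=3):
    """Rank every pair of numeric columns by absolute correlation."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = {name: [float(row[name]) for row in rows] for name in rows[0]}
    pairs = [
        (a, b, pearson(columns[a], columns[b]))
        for a, b in itertools.combinations(columns, 2)
    ]
    pairs.sort(key=lambda t: abs(t[2]), reverse=True)
    return pairs[:top]


# Hypothetical spreadsheet exported as CSV.
sheet = """ad_spend,visits,returns
10,100,5
20,210,4
30,290,6
40,405,5
"""
for a, b, r in strongest_associations(sheet):
    print(f"{a} vs {b}: r = {r:.2f}")
```

Ranking by correlation only surfaces associations; it says nothing about cause. A real service would also have to cope with text columns and missing cells, which this sketch ignores.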

Watson Analytics is still being adapted to meet users’ needs, such as letting them create dashboards and infographics and making it compatible with other programs: Oracle, Salesforce, Google Docs, and more.

IBM is still programming all the Watson Analytics features, but more details will be revealed as the public tests it.

Is this another PR scheme for Watson and IBM? How much have they spent on public relations? How much will Watson Analytics generate for IBM?

Whitney Grace, December 18, 2014
Sponsored by ArnoldIT.com, developer of Augmentext

Short Honk: Google and Fish

December 17, 2014

You may want to read “Google Helps to Use Big Data for Global Surveillance—And That’s Good.” I have no big thoughts about this write up. Googlers like sushi, so protecting fish from overzealous fisher people seems logical to me. I would raise one question for you to ponder after you have read the article:

What happens when humans are tracked and analyzed in this manner?

Next:

Is this function in place as you read this?

I have no answers, but I enjoy learning what other people think. We do not need to discuss the meaning of “good.”

Stephen E Arnold, December 17, 2014

Rocket Software Explores The UniDataVerse

December 16, 2014

Rocket Software is growing its product base, and Business Wire reports that “Rocket Software’s CorVu NG Business Intelligence Product Now Works With Its UniData and UniVerse Databases.” The new CorVu NG is compatible with UniData and UniVerse U2 products. U2 users who transfer to the product will not be charged any extra fees and will have access to more features, including drill-down path support and Aerotext text analytics.

“Peter Richardson, Vice President and General Manager of Rocket’s business intelligence and analytics business unit, says, ‘One of the biggest advantages that our valued customers gets making this change is getting access to a wider range of Rocket services and products to help them reach their business goals. CorVu NG has the ability to work with the full suite of CorVu performance management modules, including CorStrategy, CorPlanning, CorRisk, and CorProject. This is a major enhancement for U2 customers who are interested in elevating their BI solution to one that includes not only tactical performance visualizations, but also strategic, high-level KPI tracking.’”

It was also noted that Rocket Software is not only expanding its product base to attract more clients; the CorVu NG software will also give its current clients more opportunities. This is in keeping with Rocket Software’s philosophy of offering a broader range of solutions to complement and supplement clients’ solutions.

Whitney Grace, December 16, 2014
Sponsored by ArnoldIT.com, developer of Augmentext

Bottlenose: Not a Dolphin, Another Intelligence Vendor

December 15, 2014

Last week, maybe two weeks ago, I learned that KPMG invested in Bottlenose. The company says that the cash will “take trend intelligence global.” The company asserts:

We organize the world’s attention and emotion.

I am, as you may know, interested in what I call NGIA systems. These are next generation information access systems. Instead of dumping a list of Google-style search results in front of me, NGIA systems provide a range of tools to use information in ways that do not require me to formulate a query, open and browse possibly relevant documents, and either output a report or pipe the results into another system. For example, in one application of NGIA system functions, the data from a predictive system can be fed directly into the autonomous component of a drone. The purpose is to eliminate the time delay between an action that triggers a flag in a smart system and taking immediate action to neutralize a threat. NGIA is not your mother’s search engine, although I suppose one could use this type of output-to-input operation to identify a pizza joint.

I scanned the Bottlenose Web site, trying to determine whether the claims of global intelligence and of organizing the world’s attention and emotion pointed to an NGIA technology or to another social media monitoring service. The company asserts that it monitors “the stream.” The idea is that real-time information is flowing through the firm’s monitoring nodes. The content is obviously processed. The outputs are made available to those with an interest in marketing.

The company states:

Our Trend Intelligence solutions will take in all forms of stream data, internal and external, for a master, cross-correlated view of actionable trends in all the real-time forces affecting your business.

The key phrases for me are “all forms” of data and “internal and external.” The result will be “a master, cross-correlated view of actionable trends in all the real-time forces affecting your business.” Will Bottlenose deliver this type of output to its customers? See “Leaked Emails Reveal MPAA Plans to Pay Elected Officials to Attack Google.” Sure, but only after the fact. If the information is available via a service like Bottlenose, there may be some legal consequences in my view.

By my count, there are a couple of “alls” in this description. A bit of reflection reveals that if Bottlenose is to deliver, the company has to have collection methods that work like those associated with law enforcement and intelligence agencies. A number of analysts have noted that the UK’s efforts to intercept data flowing through a Belgian telecommunications company’s servers are interesting.

Is it possible that a commercial operation, with or without KPMG’s investment, is about to deliver this type of comprehensive collection to marketers? Based on what the company’s Web site asserts, I come away with the impression that Bottlenose is similar to the governmental services that are leading to political inquiries and aggressive filtering of information on networks. China is one country which is not shy about its efforts to prevent certain information from reaching its citizens.

Bottlenose says:

Bottlenose Nerve Center™ spots real-time trends, tracks interests, measures conversations, analyzes keywords and identifies influencers. As we expand our library of data sources and aggregate the content, people, thinking and emotion of humanity’s connected communications, Bottlenose will map, reflect and explore the evolving global mind. We aim to continuously show what humanity is thinking and feeling, now.

I can interpret this passage as suggesting that a commercial company will deliver “all” information to a customer via its “nerve center.” Relationships between and among entities can be discerned; for example:

[Image: Bottlenose Trend Intelligence, Sonar view]

This is the type of diagram that some of the specialized law enforcement and intelligence systems generate for authorized users. The idea is that a connection can be spotted without having to do any of the Google-style querying-scanning-copying-thinking type work.
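The idea of spotting a connection without querying can be illustrated with a toy entity co-occurrence graph: link any two entities that appear in the same document, then use breadth-first search to find the shortest chain between two of them. This is a generic technique, not Bottlenose’s method, and the entity names and documents below are invented.

```python
from collections import defaultdict, deque
from itertools import combinations


def build_graph(docs):
    """Link every pair of entities that co-occur in the same document."""
    graph = defaultdict(set)
    for entities in docs:
        for a, b in combinations(sorted(set(entities)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connection(graph, start, goal):
    """Breadth-first search for the shortest chain of co-occurrences."""
    if start not in graph:
        return None
    prev = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through predecessors to rebuild the chain.
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in sorted(graph[node]):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None


# Hypothetical documents, each reduced to the entities it mentions.
docs = [
    ["Acme Corp", "J. Smith"],
    ["J. Smith", "Harbor LLC"],
    ["Harbor LLC", "Offshore Bank"],
    ["Acme Corp", "City Council"],
]
graph = build_graph(docs)
print(" -> ".join(connection(graph, "Acme Corp", "Offshore Bank")))
```

A commercial system would have to extract the entities automatically and weight the links; the sketch assumes that hard work is already done.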

My view is that Bottlenose and other companies rushing to emulate the features and functions of the highly specialized and reasonably tightly controlled systems in use by law enforcement and intelligence agencies may be creating some unrealistically high expectations.

The reality of many commercial services, which may or may not apply to Bottlenose, is that:

  1. The systems use information from RSS feeds, the public information available from Twitter and Facebook, and changes to Web pages. Because of the cost, these systems do not and cannot perform comprehensive collection of high-interest data. The impression is that something is being done which is probably not actually taking place.
  2. The consequence of processing a subset of information is that the outputs may be dead wrong at worst and misleading at best. Numerical processes can identify that Lady Gaga’s popularity is declining relative to Taylor Swift’s. But this is a function that has been widely available from dozens of vendors for many years. Are the users of these systems aware of the potential flaws in the outputs? In my experience, nope.
  3. The same marketing tendencies that have contributed to the implosion of the commercial enterprise search sector are now evident in the explanation of what can be done with historical and predictive math. The hype may attract a great deal of money. But it appears that generating and sustaining revenue is a challenge few companies in this sector have been able to achieve.
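Point 2 can be made concrete. A minimal sketch, assuming hypothetical weekly mention counts, fits an ordinary least-squares slope to one artist’s share of the combined mentions. The math is trivial, and the result describes only the feeds that were counted, which is the subset problem noted in point 2.

```python
def slope(values):
    """Ordinary least-squares slope of values against 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two periods")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (v - mean_y) for x, v in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def relative_trend(a_counts, b_counts):
    """Per-period slope of A's share of the combined A + B mentions."""
    share = [a / (a + b) for a, b in zip(a_counts, b_counts)]
    return slope(share)


# Hypothetical weekly mention counts from whatever feeds happened to be sampled.
gaga = [50, 40, 30, 20]
swift = [50, 60, 70, 80]
print(f"Change in Gaga's share per week: {relative_trend(gaga, swift):+.3f}")
```

Feed the same function counts from a different set of sources and the sign of the slope can flip. The arithmetic is not the weak point; the sample is.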

My suggestion is that Bottlenose may not be a “first mover.” Bottlenose is following in the more than 15-year-old footsteps of companies like Autonomy, developer of the DRE, and i2 Ltd. Both of these are Cambridge University alumni innovations. Some researchers push the origins of this type of information analysis back to the early 1970s. For me, the commercialization of Bayesian and graph methods in the late 1990s is a useful takeoff point.

What is happening is that lower computing costs and cheaper storage have blended with mathematical procedures taught in most universities. Add in the Silicon Valley sauce, and we have a number of startups that want to ride the growing interest in systems that do not force Google-style interactions on users.

The problem is that it is far easier to paint a word picture than to come to grips with the inherent difficulties in using the word “all.” That kills credibility in my book. For a company to deliver an NGIA solution, a number of software functions must be integrated into a working whole. The flameout of Fast Search & Transfer teaches a useful lesson. Will the lessons of Fast Search apply to Bottlenose? It will be interesting to watch the story unfold.

Stephen E Arnold, December 15, 2014

A Possibility of Profit from Autonomy Deal

December 15, 2014

This is the season of miracles and magic. Usually those are reserved for Hallmark movies and people in need, but one could argue that HP was in desperate need after the Autonomy fiasco. Maybe its Christmas wish will come true if the Information Week article “HP Cloud Adds Big Data Options” proves correct.

HP will release its Haven big data analytics platform through the HP Helion cloud as Haven OnDemand. The writer believes this is HP’s next logical step, given that Autonomy IDOL was released as SaaS in January. The popular Vertica DBMS will also be launched as a cloud service.

“Cloud-based database services have proven to be popular, with Amazon’s fast-growing Redshift service being an obvious point of comparison. Both HP Vertica and Redshift are distributed, columnar databases that are ideally suited to high-scale data-mart and data-warehouse use cases.”

HP wants to make a mark in the big data market and help its clients harness the valuable insights hiding in structured and unstructured data. HP is on its way to becoming a key player in big data software, but it still needs improvement to compete. It doesn’t offer Hadoop OnDemand, and it also lacks ETL, analytics software, and BI solutions that run alongside HP Haven OnDemand.

The company is finally moving forward and developing products that will start making up for the money lost in the Autonomy deal. How long will it take, however, to get every penny back?

Whitney Grace, December 15, 2014
Sponsored by ArnoldIT.com, developer of Augmentext

Cisco Relies on OpenSOC through GitHub When it Comes to Big Data

December 10, 2014

The article on Enterprise Networking Planet titled “Cisco Goes Open-Source for Big Data Analytics” discusses the change for Cisco with some high-ups in the company. Annie Ballew, Solutions Architect in the Cisco Security Business Group, says that OpenSOC is not actually a Security Information and Event Management system but rather should be considered “big data technology for security analytics.” OpenSOC is freely available through GitHub. The article states:

“While the OpenSOC project itself is open-source, Cisco is already leveraging the technology in its commercial products. ‘OpenSOC is currently included in our Managed Threat Defense services offering where it is installed, implemented and fully operationalized,’ Ballew said. Cisco launched its Managed Threat Defense service in April. That service manages and monitors logs as well as a customer’s security event lifecycle. Ballew added that OpenSOC is also integrated with various other Cisco security components such as Sourcefire FirePower NGIPS, SourceFire AMP, and ThreatGrid.”

The article also remarks on the importance of Elasticsearch to OpenSOC. The Kibana project provides the dashboard for the open-source Elasticsearch project, and Cisco acknowledges that it works with Elasticsearch, though for now that relationship runs only through Kibana. Cisco has worked with open source before, so perhaps it should be no surprise that it turned to OpenSOC to meet its security demands when it comes to big data.

Chelsea Kerwin, December 10, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

Pi in the Sky: HP and IBM Race to Catch Up with NGIA Leaders

December 7, 2014

I read “HP Takes Analytics to the Cloud in Comeback to IBM’s Watson.” The write up is darned interesting. Working through the analysis reminded me that HP does not realize that Autonomy’s 1999 customer BAE Systems has been working with analytics from the cloud for, what, 15 years? What about Recorded Future, SAIC, and dozens of other companies running successful businesses with this strategy?

The article points out that two large and somewhat pressured $100 billion companies are innovating like all get out. I learned:

Although it [Hewlett Packard] may not win any trivia contests in the foreseeable future, the hardware maker’s entry into the world of end-of-end analytics does hold up to Watson where the rubber meets the road in the enterprise…But the true equalizer for the company is IDOL, the natural language processing and search it obtained through the $11.7 billion acquisition of Autonomy Corp. PLC in 2011, which reduces the gap between human and machine interaction in a similar fashion to IBM’s cognitive computing platform.

Okay. IBM offers Watson, which was supposed to generate a billion or more by 2015 and then surge to $10 billion in revenue in another four or five years. What is Watson? As I understand it, Watson is open source code, some bits and pieces from IBM’s research labs, and wrappers that convert search into a towering giant of artificial intelligence. Why doesn’t IBM focus on its next generation information access units, which are exciting and deliver services that customers want? i2 does not produce recipes incorporating tamarind. Cybertap does not help sick teenagers.

HP, on the other hand, owns the Autonomy Digital Reasoning Engine and the Integrated Data Operating Layer. These incorporate numerical recipes based on the work of Bayes, Laplace, and Markov, among others. The technology is not open source. Instead, IDOL is a black box. HP spent $11 billion for Autonomy, figured out that it overpaid, wrote off $5 billion or so, and launched a global scorched earth policy for its management methods. Recently, HP has migrated DRE and IDOL to the cloud. Okay, but HP is putting more effort into accusing Autonomy of fooling HP. Didn’t HP buy Autonomy after experts reviewed the deal, the technology, and the financial statements? HP has lost years in an attempt to redress a perceived wrong. But HP decided to buy Autonomy.

Start Your Search Engines…Go!

December 5, 2014

What does predictive analytics have to do with Formula 1 racing? Everything, says Computer World UK in “McLaren’s F1 Predictive Analytics Snapped Up By KPMG.” Formula 1 is to Europe as NASCAR is to the United States. It is one of Europe’s most popular sports, and a lot of high-end technology is used to make the sport more exciting. McLaren is a top team, and KPMG, a tax and advisory firm, purchased its predictive analytics. KPMG will use the analytics software to improve audits and advisory services.

“Simon Collins, KPMG’s UK chairman said: ‘McLaren has honed sophisticated predictive analytics and technologies that can be applied to many business issues. We believe this specialist knowledge has the power to radically transform audit, improving quality and providing greater insight to management teams, audit committees and investors.’”

McLaren is also renowned for software used to make split-second decisions. The software’s potential is untested, and its capability to help more industries is about to take off from the start line.

Whitney Grace, December 5, 2014
Sponsored by ArnoldIT.com, developer of Augmentext