Hewlett Packard Lusts after Big Data

December 16, 2011

As Web users continue creating structured and unstructured data at higher volumes than ever before, we increasingly need technology to analyze it.

According to the Dec. 1 Front Line article “HP Predicts 50 Zettabytes of Data will be Created Annually by 2020,” Hewlett Packard (HP) predicts that by 2020, fifty zettabytes (fifty billion terabytes) of data will be created every year. This will present a major challenge for businesses.
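For readers who want to check the parenthetical conversion, here is a minimal sketch. It assumes decimal (SI) prefixes, and the function name is made up for illustration; it is not anything HP published.

```python
# Decimal (SI) storage units, expressed in bytes (an assumption; binary
# prefixes would give slightly different figures).
TERABYTE = 10**12
ZETTABYTE = 10**21


def zettabytes_to_terabytes(zb: int) -> int:
    """Convert a whole number of zettabytes to terabytes."""
    return zb * ZETTABYTE // TERABYTE


print(zettabytes_to_terabytes(50))  # 50000000000, i.e. fifty billion
```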

Prith Banerjee, head of HP Labs, said at the firm’s Discover event:

By 2020 there could be as many as 10 billion people on the planet and some four billion of these will be online interacting on social networks. While now there are 2.5 million tweets per day this will rise to tens of millions. There’s also going to be a huge increase of sensors on the network measuring everything from temperature to heart monitoring. We expect there to be one trillion sensors by 2020.

HP Labs is working to address this issue by investigating technology that tracks a variety of complex events and correlates them so that patterns can be detected. It could contextually analyze what customers say on Twitter a mere ten seconds after the tweet is sent.

What will Autonomy’s role in this big data love fest be? Stay tuned.

Jasmine Ashton, December 16, 2011

Karmasphere and MapR Team Up on Hadoop Help

December 15, 2011

Karmasphere and MapR Technologies are working together to make Hadoop’s Big Data Analytics platform more accessible, announces Karmasphere in “Combination Offers Self-Service Big Data Analytics with Minimal IT Support.” Hadoop, of course, is free as open source software. You can, however, purchase help in managing it.

Karmasphere Analytics is now available on MapR’s Hadoop distribution system. The write up notes:

‘Karmasphere’s graphical Big Data Analytics workspace is the perfect complement to MapR’s easy to use, dependable and fast platform,’ said Jack Norris, vice president of marketing, MapR. ‘With the availability of Karmasphere products on our distribution, data analysts can derive insights from their structured and unstructured data in Hadoop without developing MapReduce programs.’

Karmasphere helps its customers use Hadoop to extract patterns, relationships, and drivers from big data. The company boasts that its Analytics Engine is intuitive and simplifies data analysis.

MapR Technologies helps business users who don’t also happen to be IT pros efficiently manage their Hadoop implementation. It prides itself on making Hadoop more reliable and easier to use.

Cynthia Murrell, December 15, 2011

Sponsored by Pandia.com

IBM, Watson, and Patents

December 13, 2011

What, no game show?

Although it’s getting a lot of recognition lately, Apple’s Siri probably isn’t the smartest machine on the block.

IBM’s Watson, if you remember, was the one to beat Ken Jennings in Jeopardy. With the computer’s speech recognition, natural language processing, machine learning, and data mining, IBM is now pushing Watson into other applications.

For example, WellPoint, a health plan company, is using Watson to search patient records and improve diagnosis. We learn more in the article on Slashdot, “IBM Watson to Battle Patent Trolls”:

..IBM itself is using Watson to help sell Watson (and other IBM products) to other companies. Now, using Watson’s data mining and natural language talents, IBM has created the Strategic IP Insight Platform, or SIIP, a tool that has already scanned millions of medical patents and journals for the sake of improving drug discovery — and in the future, it’s easy to see how the same tool could be used to battle patent trolling, too.

It seems there are a lot of present and future implications for the company, but where’s the cloud service which showcases this formidable system?

Andrea Hayden, December 13, 2011


Digital Reasoning Receives Funding from Silver Lake

December 6, 2011

Companies that combine big data expertise with analytics knowledge are a hot commodity these days as government and private firms are looking to invest in technology to make sense of the massive amounts of unstructured data being collected.

On this note, Big Data Analytics specialist Digital Reasoning announced in a December 6 news release, “Digital Reasoning Raises Venture Financing for Automated Understanding of Big Data,” that it has successfully raised Series B funding with help from In-Q-Tel, individual partners of Silver Lake, and other private investors. The company did not disclose the amount, but a GigaOM article uncovered its SEC filing, which puts the number at $4.2 million.

In addition to achieving this feat, the company also welcomed industry veteran and Silver Lake Sumeru partner John Brennan to its board of directors.

Digital Reasoning uses its flagship product Synthesys to analyze unstructured and structured big data to reveal relationships between people, place and time. It takes text-based data and sifts through documents and connects the dots without company employees having to read them all. Digital Reasoning works with more than a dozen government agencies to uncover security threats and accelerate the time to actionable intelligence.

Brennan stated:

“Organizations in every market are looking for ways to exploit the information and intelligence embedded in unstructured data; Synthesys could be a transformational solution in the enterprise as organizations develop their big data strategies,” said John Brennan. “Digital Reasoning’s platform can go beyond its success in the government intelligence market to help enterprises quickly analyze big data to detect fraud, uncover market trends, gain better insight into customer behavior, and mitigate risk.”

The combined power of an investment of this magnitude and Brennan’s software and operating background will allow the already successful company to potentially expand beyond its current government intelligence work into new markets.

Jasmine Ashton, December 6, 2011


Sentiment Analysis Explained

December 1, 2011

Sentiment and text mining analytics company Lexalytics has created the first easy to use semantic classifier by compiling over 1.1 million words and phrases from Wikipedia. Sentiment analysis, or opinion mining, refers to the application of natural language processing, computational linguistics, and text analytics to identify and extract subjective information in source materials.

I read a recent Click Centive post called “OEM Text Analytics from Lexalytics” that breaks down the concept of sentiment analysis and scoring and provides a series of posts related to Lexalytics software.

The post states:

Sentiment scoring allows a computer to consistently rate the positive or negative assertions that are associated with a document or entity. The scoring of sentiment (sometimes referred to as tone) from a document is a problem that was originally raised in the context of marketing and business intelligence, where being able to measure the public’s reaction to a new marketing campaign (or a corporate scandal) can have a measurable financial impact on your business.
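To make the idea of consistently rating positive or negative assertions concrete, here is a minimal lexicon-based sketch. The word list, the weights, and the one-word negation rule are all assumptions invented for illustration; this is not the Lexalytics method, which the post does not describe.

```python
import re

# Hypothetical toy lexicon; a production system would use a far larger,
# carefully weighted vocabulary.
LEXICON = {
    "great": 1.0,
    "love": 1.0,
    "good": 0.5,
    "bad": -0.5,
    "scandal": -1.0,
    "terrible": -1.0,
}
NEGATORS = {"not", "never", "no"}


def score_sentiment(text: str) -> float:
    """Return the mean weight of lexicon words found in text (0.0 if none)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = 0.0
    hits = 0
    for i, token in enumerate(tokens):
        if token not in LEXICON:
            continue
        weight = LEXICON[token]
        # Flip the polarity when the word directly follows a negator.
        if i > 0 and tokens[i - 1] in NEGATORS:
            weight = -weight
        total += weight
        hits += 1
    return total / hits if hits else 0.0


print(score_sentiment("I love the new campaign, it is great"))  # 1.0
print(score_sentiment("The scandal was not good"))              # -0.75
```

Because the same rules apply to every document, the same text always gets the same score, which is the consistency the quoted passage is talking about.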

This is an informative post, but I’m more interested to see specific information regarding the “easy to use semantic classifier” that Lexalytics has created, rather than generalities on sentiment scoring.

Jasmine Ashton, December 1, 2011


Inteltrax: Top Stories, November 21 to November 25

November 28, 2011

Inteltrax, the data fusion and business intelligence information service, captured three key stories germane to search this week, specifically, the highs and lows of recent analytics news.

On the high side was our story “Speech Analytics Market Approaches Billions,” which chronicled the success of applying unstructured big data analytic techniques to recorded speech, such as in call centers.

On the low side, “Mobile BI Takes a Surprising Misstep” explores how the once bustling mobile BI market recently took a hit.

And somewhere in the middle, “In-Memory Databases Cause a Stir” attempted to draw the line between the traditionalists and futurists of analytics.

It’s a wild ride every week in the world of big data analytics. Sure things go bust, underdogs appear from nowhere and divisions are drawn. Stay tuned to see where it all leads.

Follow the Inteltrax news stream by visiting www.inteltrax.com

Patrick Roland, Editor, Inteltrax.

November 28, 2011

Bloomberg Discovers Palantir: Huh?

November 23, 2011

News flash! Bloomberg Businessweek has realized that Palantir, which has garnered more than $90 million in funding, is indispensable to the US intelligence community. Er, okay. You will want to read this “real” news story yourself. Just point your monitored browser at “Palantir: The War on Terror’s Secret Weapon.” Palantir has been a well kept secret at least in Bloomberg’s news room. Palantir ended up in a nifty legal spat with i2 Group, now part of IBM. The settlement was sealed, which certainly catches the attention of the goslings in Harrod’s Creek, but not the “real” journalists in New York. The fact that Palantir is the PowerPoint superstar which has the attention of those attention deficit disorder presenters is not on the radar of the Bloombergians.

Here’s the passage which I enjoyed:

The origins of Palantir go back to PayPal, the online payments pioneer founded in 1998. A hit with consumers and businesses, PayPal also attracted criminals who used the service for money laundering and fraud. By 2000, PayPal looked like “it was just going to go out of business” because of the cost of keeping up with the bad guys, says Peter Thiel, a PayPal co-founder….PayPal’s computer scientists set to work building a software system that would treat each transaction as part of a pattern rather than just an entry in a database. They devised ways to get information about a person’s computer, the other people he did business with, and how all this fit into the history of transactions. These techniques let human analysts see networks of suspicious accounts and pick up on patterns missed by the computers. PayPal could start freezing dodgy payments before they were processed. “It saved hundreds of millions of dollars,” says Bob McGrew, a former PayPal engineer and the current director of engineering at Palantir.
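The passage describes treating each transaction as part of a pattern and surfacing networks of suspicious accounts. A rough sketch of that idea, assuming accounts are linked whenever they transact with each other or share a computer, is to group linked accounts and flag any group that contains a known bad actor. The account names and the linking rule are hypothetical; the article does not say how PayPal or Palantir actually implemented this.

```python
from collections import defaultdict


def find_clusters(links):
    """Group accounts into connected components from (account, account) links."""
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)

    seen = set()
    clusters = []
    for start in graph:
        if start in seen:
            continue
        # Depth-first walk over everything reachable from this account.
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        clusters.append(component)
    return clusters


def suspicious_accounts(links, known_bad):
    """Flag every account that sits in a cluster with a known bad account."""
    flagged = set()
    for cluster in find_clusters(links):
        if cluster & known_bad:
            flagged |= cluster
    return flagged


# Hypothetical links: accounts that transacted together or shared a computer.
links = [("acct_a", "acct_b"), ("acct_b", "acct_c"), ("acct_d", "acct_e")]
print(sorted(suspicious_accounts(links, {"acct_c"})))
# ['acct_a', 'acct_b', 'acct_c']
```

A human analyst would still review each flagged cluster, which matches the article's point that the software let people see the networks rather than making the decisions for them.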

Want more? Well, the story sprawls over six pages.

My view?

First, point your browser to www.inteltrax.com and read the stories about Palantir.

Second, what about the legal dust up? Well, run a Google query and get the scoop. The legal documents are quite interesting as well. The interesting information is available on WestlawNext and Lexis. The free Web content is, well, not industrial strength.

Third, what about Digital Reasoning, a company with groundbreaking entity based analytics? Check that out at www.digitalreasoning.com. For more amusement look at www.recordedfuture.com.

You can read interviews with founders of companies with technology that goes beyond Palantir at these two links:

  1. Tim Estes, Digital Reasoning here
  2. Christopher Ahlberg, Recorded Future here

We are not “real” journalists. On the other hand, you will get some insight into what’s happening with next generation analytics. No turkey on Thanksgiving at Beyond Search.

Stephen E Arnold, November 24, 2011

Freebie. Unlike Palantir’s solutions.

Inteltrax: Top Stories, November 14 to November 18

November 21, 2011

Inteltrax, the data fusion and business intelligence information service, captured three key stories germane to search this week, specifically, some exciting news among our favorite providers.

The most interesting tale came from “Tibco and Digital Reasoning Give A Glimpse at Operational Thinking,” which looked at the minds of the CEOs of these exciting organizations.

In “IBM Ready to Take Analytics Seriously” we discovered some interesting news that shows the computing giant is pushing all its chips into the analytic pile.

However, our story “Qlik Tech’s Collaborative BI is Too Much of a Good Thing” shows that too many cooks can spoil one’s analytic soup.

Here’s just another quick sampling of the many ways big data analytics is changing. And we’re following the biggest names in big data every day, noting the moves and blunders therein.

Follow the Inteltrax news stream by visiting http://www.inteltrax.com/

Patrick Roland, Editor, Inteltrax.

Selventa and Linguamatics Team Up to Mine Scientific Research Details

November 14, 2011

At the end of last month, Cambridge, MA-based Selventa, a personalized healthcare company, announced that it was teaming up with UK-based text mining software firm Linguamatics to extract complex life science knowledge in a computable, structured, biological expression language (BEL) format that can be used to interpret large-scale experimental data in the context of published literature.

In a November 7 Fierce Biotech post, “Selventa and Linguamatics Team on Mining Details in Journals,” David de Graaf, president and CEO of Selventa, was quoted saying:

“Collaborating with Linguamatics will enable rapid yet comprehensive investigation of new areas of biology by extracting computable knowledge from unstructured text. This will lead to innovation on many fronts, such as next generation sequencing, where well-structured information for reasoning has been limited.”

The technology created from this unique partnership could save scientists countless hours that they previously spent poring over scientific texts or doing manual database searches to get to the findings they need for studies. You’ve gotta love technological innovation.

Jasmine Ashton, November 14, 2011

Search Silver Bullets, Elixirs, and Magic Potions: Thinking about Findability in 2012

November 10, 2011

I feel expansive today (November 9, 2011), generous even. My left eye seems to be working at 70 percent capacity. No babies are screaming in the airport waiting area. In fact, I am sitting in a not too sticky seat, enjoying the announcements about keeping pets in their cage and reporting suspicious packages to law enforcement by dialing 250.

I wonder if the mother who left a pink and white plastic bag with a small bunny and box of animal crackers is evil. Much in today’s society is crazy marketing hype and fear mongering.

Whilst thinking about pets in cages and animal crackers which may be laced with rat poison, and plump, fabric bunnies, my thoughts turned to the notion of instant fixes for horribly broken search and content processing systems.

I think it was the association of the failure of societal systems that determined passengers at the gate would allow a pet to run wild or that a stuffed bunny was a threat. My thoughts jumped to the world of search, its crazy marketing pitches, and the satraps who have promoted themselves to “expert in search.” I wanted to capture these ideas, conforming to the precepts of the About section of this free blog. Did I say, “Free.”

A happy quack to http://www.alchemywebsite.com/amcl_astronomical_material02.html for this image of the 21st century azure chip consultant, a self appointed expert in search with a degree in English and a minor in home economics with an emphasis on finger sandwiches.

The Silver Bullets, Garlic Balls, and Eyes of Newts

First, let me list the instant fixes, the silver bullets, the magic potions, the faerie dust, and the alchemy which makes “enterprise search” work today. Fasten your alchemist’s robe, lift your chin, and grab your paper cone. I may rain on your magic potion. Here are 14 magic fixes for a lousy search system. Oh, one more caveat. I am not picking on any one company or approach. The key to this essay is the collection of pixie dust, not a single firm’s blend of baloney, owl feathers, and goat horn.

  1. Analytics (The kind of equations some of us wrangled and struggled with in Statistics 101 or the more complex predictive methods which, if you know how to make the numerical recipes work, will get you a job at Palantir, Recorded Future, SAS, or one of the other purveyors of wisdom based on big data number crunching)
  2. Cloud (Most companies in the magic elixir business invoke the cloud. Not even Macbeth’s witches do as good a job with the incantation of Hadoop the Loop as Cloudera, but there are many contenders in this pixie concoction. Amazon comes to mind but A9 gives me a headache when I use A9 to locate a book for my trusty e Reeder.)
  3. Clustering (Which I associate with Clustify and Vivisimo, but Vivisimo has morphed clustering into “information optimization” and gets a happy quack for this leap)
  4. Connectors (One cannot search unless one can acquire content. I like the Palantir approach which triggered some push back but I find the morphing of ISYS Search Software a useful touchstone in this potion category)
  5. Discovery systems (My associative thought process offers up Clearwell Systems and Recommind. I like Recommind, however, because it is so similar to Autonomy’s method and it has been the pivot for the company’s flip flop from law firms to enterprise search and back to eDiscovery in the last 12 or 18 months)
  6. Federation (I like the approach of Deep Web Technologies and for the record, the company does not position its method as a magical solution, but some federating vendors do so I will mention this concept. Think mash up and data fusion too)
  7. Natural language processing (My candidate for NLP wonder worker is Oracle which acquired InQuira. InQuira is a success story because it was formed from the components of two antecedent search companies, pitched NLP for customer support, and got acquired by Oracle. Happy stakeholders all.)
  8. Metatagging (Many candidates here. I nominate the Microsoft SharePoint technology as the silver bullet candidate. SharePoint search offers almost flawless implementation of finding a document by virtue of knowing who wrote it, when, and what file type it is. Amazing. A first of sorts because the method has spawned third party solutions from Austria to the United States.)
  9. Open source (Hands down I think about IBM. From Content Analytics to the wild and crazy Watson, IBM has open source tattooed over large expanses of its corporate hide. Free? Did I mention free? Think again. IBM did not hit $100 billion in revenue by giving software away.)
  10. Relationship maps (I have to go with the Inxight Software solution. Not only was the live map an inspiration to every business intelligence and social network analysis vendor it was cool to drag objects around. Now Inxight is part of Business Objects which is part of SAP, which is an interesting company occupied with reinventing itself and ignored TREX, a search engine)
  11. Semantics (I have to mention Google as the poster child for making software know what content is about. I stand by my praise of Ramanathan Guha’s programmable search engine and the somewhat complementary work of Dr. Alon Halevy, both happy Googlers as far as I know. Did I mention that Google has oodles of semantic methods, but the focus is on selling ads and Pandas, which are somewhat related.)
  12. Sentiment analysis (the winner in the sentiment analysis sector is up for grabs. In terms of reinventing and repositioning, I want to acknowledge Attensity. But when it comes to making lemonade from lemons, check out Lexalytics (now a unit of Infonics). I like the Newssift case, but that is not included in my free blog posts and information about this modest multi-vehicle accident on the UK information highway is harder and harder to find. Alas.)
  13. Taxonomies (I am a traditionalist, so I quite like the pioneering work of Access Innovations. But firms run by individuals who are not experts in controlled vocabularies, machine assisted indexing, and ANSI compliance have captured the attention of the azure chip, home economics, and self appointed expert crowd. Access Innovations knows its stuff. Some of the boot camp crowd, maybe somewhat less? I read a blog post recently that said librarians are not necessary when one creates an enterprise taxonomy. My how interesting. When we did the ABI/INFORM and Business Dateline controlled vocabularies we used “real” experts and quite a few librarians with experience conceptualizing, developing, refining, and ensuring logical consistency of our word lists. It worked because even the shadow of the original ABI/INFORM still uses some of our terms 30 plus years later. There are so many taxonomy vendors, I will not attempt to highlight others. Even Microsoft signed on with Cognition Technologies to beef up its methods.)
  14. XML (there are Google and MarkLogic again. XML is now a genuine silver bullet. I thought it was a markup language. Well, not any more, pal.)
