Big Data Analytics and Sense Making with Synthesys
December 19, 2011
Tim Estes is the CEO and co-founder of Digital Reasoning. Digital Reasoning develops and markets solutions that provide Automated Understanding for Big Data.
There’s a great deal of talk about “big data” today. If you walk into an AT&T store, you may see statistics showing users sending over 3 billion text messages a day, or over 250 million tweets a day. Compare that with 100 million or fewer tweets a day a year or two ago, and it’s daunting how rapidly the volume of digital information is increasing. A mobile phone without expandable storage frustrates users who want to keep a contacts list, rich media, and apps in their pocket. In organizations, the appetite for storage is significant. EMC, Hewlett Packard, and IBM are experiencing strong demand for their storage systems. Cloud vendors such as Amazon and Rackspace are also experiencing strong demand from companies offering compelling services to end users on their infrastructure. At a recent Amazon conference in Washington, Werner Vogels revealed that hundreds of thousands of companies and customers run on the AWS Cloud at some level. Finally, companies like Digital Reasoning are working on the next generation of Cloud, automated understanding, which shifts the focus from infrastructure to making sense of the data that sits in hosted or private clouds.
While most of the attention has been on infrastructure such as virtualization/hypervisors, Hadoop, and NoSQL data storage systems, we think those are really the enablers of the killer app for the Cloud, which is making sense of data to solve information overload. Without next generation analytics and supporting technology, it is essentially impossible to:
- Analyze a flow of data from multiple sensors deployed in a factory
- Process mobile traffic at a telephone company
- Make sense of unstructured and structured information flowing through an email system
- Identify key entities and their importance in a stream of financial news and transaction data
These are the real world problems that have engaged me for many years. I founded Digital Reasoning to automatically make sense of data because I believed that someday all software would learn and that would unleash the next great revolution in the Information Age. The demand for this revolution is inevitable because while data has increased exponentially, human attention has been essentially static in comparison. Technology to create better return on attention would go from “nice to have” to utterly essential. And now, that moment is here.
Digging a little deeper, Digital Reasoning has created a way to take human communication and use algorithms to make sense of it without having to depend on a human design, an ontology, or some other structure. Our system looks at patterns and the way a word is used in its context and bootstraps its understanding much as a human child does, creating associations and building them into more complex relationships.
In 2009, we migrated onto Hadoop and began taking on the problem of managing very large-scale unstructured data, moving the industry beyond counting things that are well structured and toward figuring out exactly what the data you are measuring means.
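The idea of learning what a word means from the contexts it appears in can be illustrated with a simple co-occurrence model. The sketch below is a generic distributional-semantics example, not Synthesys's actual algorithm (which the post does not describe); the tiny corpus, the window size of 2, and the function names are all hypothetical. Words used in similar contexts end up with similar context vectors, measured here by cosine similarity.

```python
import math
from collections import Counter, defaultdict

def cooccurrence(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # skip the word itself
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical toy corpus.
corpus = [
    "the analyst read the report",
    "the officer read the report",
    "the analyst filed the memo",
    "the officer filed the memo",
    "the river flooded the valley",
]
vecs = cooccurrence(corpus)
print(round(cosine(vecs["analyst"], vecs["officer"]), 3))
print(round(cosine(vecs["analyst"], vecs["river"]), 3))
```

Here "analyst" and "officer" appear in identical contexts and score 1.0, while "river" scores lower. At scale, and with very common words such as "the" filtered out, this kind of signal is one way associations can be bootstrapped without a hand-built ontology.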
Digital Reasoning asks the question: “How do you take loose, noisy information that is disconnected and unstructured and then make sense of it so that you can then apply analytics to it in a way that is valuable to business?”
We identify actors, actions, patterns, and facts and then put them into the context of space and time in an efficient and scalable way. In the government scenario, that can mean finding and stopping bad guys. In the legal environment, users want to answer the questions of “who”, “what”, “where”, and “when”.
Digital Reasoning initially focused on the complex task of making sense of massive volumes of unstructured text within the US Government Intelligence Community after the events of 9/11. But we also believe that our Synthesys software can be used in the commercial sector to create great value from the mountains of unstructured data that sit in the enterprise and stream in from the Web.
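One way to picture putting facts into the context of space and time is a record per extracted fact, keyed by who, what, where, and when. The sketch below assumes the hard part, extracting those facts from text, has already been done; the `Fact` type, its fields, the helper functions, and the sample data are hypothetical illustrations, not Synthesys's data model.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Fact:
    actor: str   # who
    action: str  # what
    place: str   # where
    when: date   # when

def facts_in_window(facts, actor, start, end):
    """Return an actor's facts whose date falls within [start, end]."""
    return [f for f in facts if f.actor == actor and start <= f.when <= end]

def timeline(facts, actor):
    """Return an actor's facts in chronological order."""
    return sorted((f for f in facts if f.actor == actor), key=lambda f: f.when)

# Hypothetical facts, as if already extracted from documents.
facts = [
    Fact("Acme Corp", "opened office", "Boston", date(2011, 9, 15)),
    Fact("Beta LLC", "filed suit", "Chicago", date(2011, 6, 2)),
    Fact("Acme Corp", "signed contract", "New York", date(2011, 3, 1)),
]

print([f.place for f in timeline(facts, "Acme Corp")])
print(len(facts_in_window(facts, "Acme Corp", date(2011, 1, 1), date(2011, 6, 30))))
```

Once facts carry place and time, questions like "where was this actor, and when?" become simple filters and sorts rather than a reading exercise.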
Companies with large-scale data will see value in investing in our technology because they cannot hire 100,000 people to go through and read all of the available material. This matters if you are a bank and trying to make financial trades. This matters for companies doing electronic discovery. This matters for health sectors that need help organizing medical records and guarding against fraud.
We are an emerging firm, growing rapidly and looking to have the best and the brightest join our quest to empower users and customers to make sense of their data through revolutionary software. With the recent investment from In-Q-Tel and partners of Silver Lake, I believe that Digital Reasoning has a great future ahead. We are on the bleeding edge of what is going on with Hadoop and Big Data in the engineering area and how to make sense of data through some of the most advanced learning algorithms in the world. Most of all we care that people are empowered with technology so that they can recover value and time in the race to overcome information overload.
To learn more about Digital Reasoning, navigate to our Web site and download our white paper.
Tim Estes, December 19, 2011
Sponsored by Pandia.com
Inteltrax: Top Stories, December 12 to December 16
December 19, 2011
Inteltrax, the data fusion and business intelligence information service, captured three key stories germane to search this week, specifically, the issue of change in the analytic world—for the better, for the worse and everything in between.
One example of change came from our story “Data Mining Changing Scientific Thought,” which shows how analytics is streamlining the way scientists think.
On the other hand, “ManTech has Uphill Climb with Intelligence Analytics,” shows that not all change looks promising, like one company’s new focus on intelligence.
And some change, well, we’re just not sure how it’ll pan out, like the story “Predicting the Ponies is Just Unstructured Data,” which shows how analytic tools could change the gambling industry. Whether for better or worse is up for debate.
Change, in any aspect of life, is inevitable. However, the world of big data analytics seems more susceptible than most. And we couldn’t be happier, as we watch the unexpected turns these changes bring to the industry every day.
Follow the Inteltrax news stream by visiting http://www.inteltrax.com/
Patrick Roland, Editor, Inteltrax.
IBM Redbooks Reveals Content Analytics
December 16, 2011
IBM Redbooks has put out some juicy reading for the azure chip consultants wanting to get smart quickly with IBM Content Analytics Version 2.2: Discovering Actionable Insight from Your Content. The sixteen chapters of this book take the reader from an overview of IBM content analytics, through understanding the details, to troubleshooting tips. The above link provides an abstract of the book, as well as links to download it as a PDF, view in HTML/Java, or order a hardcopy.
We learned from the write up:
The target audience of this book is decision makers, business users, and IT architects and specialists who want to understand and use their enterprise content to improve and enhance their business operations. It is also intended as a technical guide for use with the online information center to configure and perform content analysis with Content Analytics.
The product description notes a couple of specifics. For example, creating custom annotators with the LanguageWare Resource Workbench is covered. So is using the IBM Content Assessment to weed out superfluous data.
The content is, of course, slanted toward working with IBM solutions. However, there is also some more general information included. This is a good place to go to get a better handle on content management.
Cynthia Murrell, December 16, 2011
Hewlett Packard Lusts after Big Data
December 16, 2011
As Web users continue creating structured and unstructured data at higher volumes than ever before, we are starting to need technology to analyze it.
According to the December 1 Front Line article “HP Predicts 50 Zettabytes of Data will be Created Annually by 2020,” Hewlett Packard (HP) predicts that by 2020, fifty zettabytes (fifty billion terabytes) of data will be created every year. This will present a major challenge for businesses.
Prith Banerjee, head of HP Labs, said at the firm’s Discover event:
By 2020 there could be as many as 10 billion people on the planet and some four billion of these will be online interacting on social networks. While now there are 2.5 million tweets per day this will rise to tens of millions. There’s also going to be a huge increase of sensors on the network measuring everything from temperature to heart monitoring. We expect there to be one trillion sensors by 2020.
HP Labs is currently working to address this issue by investigating technology that tracks a variety of complex events which must be correlated so that patterns can be detected. Such technology could contextually analyze what customers say on Twitter a mere ten seconds after a tweet is sent.
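Correlating events so that patterns can be detected is often done with time windows: keep recent events in memory and check whether a related event arrives within a set interval. The sketch below is a generic sliding-window correlator, not HP Labs' technology; the `Event` fields, the event kinds, and the ten-second window (chosen to echo the article's figure) are assumptions for illustration.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Event:
    timestamp: float  # seconds since some reference point
    entity: str       # hypothetical customer or sensor id
    kind: str         # hypothetical event type

def correlate(events, first_kind, second_kind, window_seconds):
    """Yield (first, second) pairs where a second_kind event follows a
    first_kind event for the same entity within window_seconds."""
    recent = deque()  # first_kind events still inside the window, oldest first
    for ev in sorted(events, key=lambda e: e.timestamp):
        # Expire first_kind events that are now too old to correlate.
        while recent and ev.timestamp - recent[0].timestamp > window_seconds:
            recent.popleft()
        if ev.kind == second_kind:
            for earlier in recent:
                if earlier.entity == ev.entity:
                    yield earlier, ev
        if ev.kind == first_kind:
            recent.append(ev)

events = [
    Event(0, "alice", "negative_tweet"),
    Event(4, "alice", "cancel_request"),
    Event(5, "bob", "negative_tweet"),
    Event(30, "bob", "cancel_request"),
]
matches = list(correlate(events, "negative_tweet", "cancel_request", window_seconds=10))
print([(a.entity, b.timestamp - a.timestamp) for a, b in matches])
```

Bob's follow-up arrives 25 seconds after his tweet, outside the window, so only Alice's pair is reported. A production event processor would also have to handle out-of-order arrivals and unbounded streams rather than sorting a finite list.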
What will Autonomy’s role in this big data love fest be? Stay tuned.
Jasmine Ashton, December 16, 2011
Karmasphere and MapR Team Up on Hadoop Help
December 15, 2011
Karmasphere and MapR Technologies are working together to make Hadoop’s Big Data Analytics platform more accessible, announces Karmasphere in “Combination Offers Self-Service Big Data Analytics with Minimal IT Support.” Hadoop, of course, is free as open source software. You can, however, purchase help in managing it.
Karmasphere Analytics is now available on MapR’s Hadoop distribution system. The write up notes:
‘Karmasphere’s graphical Big Data Analytics workspace is the perfect complement to MapR’s easy to use, dependable and fast platform,’ said Jack Norris, vice president of marketing, MapR. ‘With the availability of Karmasphere products on our distribution, data analysts can derive insights from their structured and unstructured data in Hadoop without developing MapReduce programs.’
Karmasphere helps its customers use Hadoop to extract patterns, relationships, and drivers from big data. The company boasts that its Analytics Engine is intuitive and simplifies data analysis.
MapR Technologies helps business users who don’t also happen to be IT pros efficiently manage their Hadoop implementation. It prides itself on making Hadoop more reliable and easier to use.
Cynthia Murrell, December 15, 2011
IBM, Watson, and Patents
December 13, 2011
What, no game show?
Although it’s getting a lot of recognition lately, Apple’s Siri probably isn’t the smartest machine on the block.
IBM’s Watson, if you remember, was the one to beat Ken Jennings in Jeopardy. With the computer’s speech recognition, natural language processing, machine learning, and data mining, IBM is now pushing Watson into other applications.
For example, WellPoint, a health plan company, is using Watson to search patient records and improve diagnosis. We learn more in the article on Slashdot, “IBM Watson to Battle Patent Trolls”:
..IBM itself is using Watson to help sell Watson (and other IBM products) to other companies. Now, using Watson’s data mining and natural language talents, IBM has created the Strategic IP Insight Platform, or SIIP, a tool that has already scanned millions of medical patents and journals for the sake of improving drug discovery — and in the future, it’s easy to see how the same tool could be used to battle patent trolling, too.
It seems there are a lot of present and future implications for the company, but where’s the cloud service which showcases this formidable system?
Andrea Hayden, December 13, 2011
Digital Reasoning Receives Funding from Silver Lake
December 6, 2011
Companies that combine big data expertise with analytics knowledge are a hot commodity these days as government and private firms are looking to invest in technology to make sense of the massive amounts of unstructured data being collected.
On this note, Big Data Analytics specialist Digital Reasoning announced in a December 6 news release, “Digital Reasoning Raises Venture Financing for Automated Understanding of Big Data,” that it has successfully raised Series B funding with help from In-Q-Tel, individual partners of Silver Lake, and other private investors. The company did not disclose the amount, but a GigaOM article uncovered its SEC filing, which puts the number at $4.2 million.
In addition to achieving this feat, the company also welcomed industry veteran and Silver Lake Sumeru partner John Brennan to its board of directors.
Digital Reasoning uses its flagship product Synthesys to analyze unstructured and structured big data to reveal relationships between people, places, and time. It sifts through text-based documents and connects the dots without company employees having to read them all. Digital Reasoning works with more than a dozen government agencies to uncover security threats and accelerate the time to actionable intelligence.
Brennan stated:
“Organizations in every market are looking for ways to exploit the information and intelligence embedded in unstructured data; Synthesys could be a transformational solution in the enterprise as organizations develop their big data strategies,” said John Brennan. “Digital Reasoning’s platform can go beyond its success in the government intelligence market to help enterprises quickly analyze big data to detect fraud, uncover market trends, gain better insight into customer behavior, and mitigate risk.”
The combined power of an investment of this magnitude and Brennan’s software and operating background could allow the already successful company to expand beyond its current government intelligence work into new markets.
Jasmine Ashton, December 6, 2011
Sentiment Analysis Explained
December 1, 2011
Sentiment and text mining analytics company Lexalytics has created the first easy-to-use semantic classifier by compiling over 1.1 million words and phrases from Wikipedia. Sentiment analysis, or opinion mining, refers to the application of natural language processing, computational linguistics, and text analytics to identify and extract subjective information in source materials.
I read a recent Click Centive post called “OEM Text Analytics from Lexalytics” that breaks down the concept of sentiment analysis and scoring and provides a series of posts related to Lexalytics software.
The post states:
Sentiment scoring allows a computer to consistently rate the positive or negative assertions that are associated with a document or entity. The scoring of sentiment (sometimes referred to as tone) from a document is a problem that was originally raised in the context of marketing and business intelligence, where being able to measure the public’s reaction to a new marketing campaign (or a corporate scandal) can have a measurable financial impact on your business.
This is an informative post, but I’m more interested in seeing specific information about the “easy-to-use semantic classifier” that Lexalytics has created, rather than generalities on sentiment scoring.
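To make the quoted description of sentiment scoring concrete, here is a minimal lexicon-based scorer that rates a whole document, or only the sentences that mention an entity, as positive or negative. It is a generic textbook technique, not Lexalytics' classifier; the word weights, the one-word negation rule, and the function names are hypothetical.

```python
import re

# Hypothetical, tiny sentiment lexicon; real systems use far larger, curated lists.
LEXICON = {"great": 1.0, "good": 0.5, "love": 1.0, "bad": -0.5, "terrible": -1.0, "scandal": -1.0}
NEGATORS = {"not", "never", "no"}

def sentiment_score(text):
    """Sum lexicon weights over the tokens, flipping a weight preceded by a negator."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0.0
    for i, tok in enumerate(tokens):
        weight = LEXICON.get(tok, 0.0)
        if weight and i > 0 and tokens[i - 1] in NEGATORS:
            weight = -weight  # "not love" counts as negative
        score += weight
    return score

def entity_sentiment(text, entity):
    """Score only the sentences that mention the entity."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return sum(sentiment_score(s) for s in sentences if entity.lower() in s.lower())

doc = "The new campaign is great. The scandal at Acme is terrible. Customers do not love Acme."
print(sentiment_score(doc))          # whole-document tone
print(entity_sentiment(doc, "Acme"))  # tone attached to one entity
```

Scoring per entity rather than per document is what lets the positive campaign sentence and the negative Acme sentences be reported separately, which matters when one article mentions several companies.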
Jasmine Ashton, December 1, 2011
Inteltrax: Top Stories, November 21 to November 25
November 28, 2011
Inteltrax, the data fusion and business intelligence information service, captured three key stories germane to search this week, specifically, the highs and lows of recent analytics news.
On the high side was our story “Speech Analytics Market Approaches Billions,” which chronicled the success of applying unstructured big data analytic techniques to recorded speech, such as in call centers.
On the low side, our story “Mobile BI Takes a Surprising Misstep” explores how the once-bustling mobile BI market recently took a hit.
And somewhere in the middle, “In-Memory Databases Cause a Stir” attempted to draw the line between traditionalists and futurists of analytics.
It’s a wild ride every week in the world of big data analytics. Sure things go bust, underdogs appear from nowhere and divisions are drawn. Stay tuned to see where it all leads.
Follow the Inteltrax news stream by visiting www.inteltrax.com
Patrick Roland, Editor, Inteltrax.
Bloomberg Discovers Palantir: Huh?
November 23, 2011
News flash! Bloomberg Businessweek has realized that Palantir, which has garnered more than $90 million in funding, is indispensable to the US intelligence community. Er, okay. You will want to read this “real” news story yourself. Just point your monitored browser at “Palantir: The War on Terror’s Secret Weapon.” Palantir has been a well-kept secret, at least in Bloomberg’s newsroom. Palantir ended up in a nifty legal spat with i2 Group, now part of IBM. The settlement was sealed, which certainly catches the attention of the goslings in Harrod’s Creek, but not the “real” journalists in New York. The fact that Palantir is the PowerPoint superstar which has the attention of those attention deficit disorder presenters is not on the radar of the Bloombergians.
Here’s the passage which I enjoyed:
The origins of Palantir go back to PayPal, the online payments pioneer founded in 1998. A hit with consumers and businesses, PayPal also attracted criminals who used the service for money laundering and fraud. By 2000, PayPal looked like “it was just going to go out of business” because of the cost of keeping up with the bad guys, says Peter Thiel, a PayPal co-founder….PayPal’s computer scientists set to work building a software system that would treat each transaction as part of a pattern rather than just an entry in a database. They devised ways to get information about a person’s computer, the other people he did business with, and how all this fit into the history of transactions. These techniques let human analysts see networks of suspicious accounts and pick up on patterns missed by the computers. PayPal could start freezing dodgy payments before they were processed. “It saved hundreds of millions of dollars,” says Bob McGrew, a former PayPal engineer and the current director of engineering at Palantir.
Want more? Well, the story sprawls over six pages.
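The passage's central idea, treating each transaction as part of a pattern rather than an isolated database entry, can be sketched as a graph search: link accounts that transact with each other or share a computer, then collect everything within a few hops of a known bad account. This is a hypothetical illustration under those assumptions, not PayPal's or Palantir's actual system; the link data, the node labels, the `max_hops` parameter, and the function names are invented for the example.

```python
from collections import defaultdict, deque

def build_graph(links):
    """links: iterable of (node, node) pairs, e.g. sender/receiver accounts
    or an account and the device it used (hypothetical node labels)."""
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)  # undirected: suspicion can flow either way
    return graph

def suspicious_network(graph, flagged, max_hops=2):
    """Breadth-first search from flagged nodes; returns {node: hop distance}."""
    distance = {node: 0 for node in flagged}
    queue = deque(flagged)
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance

links = [
    ("acct:A", "acct:B"),     # A paid B
    ("acct:B", "device:42"),  # B logged in from device 42
    ("acct:C", "device:42"),  # C used the same device
    ("acct:D", "acct:E"),     # unrelated pair
]
network = suspicious_network(build_graph(links), ["acct:A"], max_hops=3)
print(sorted(network.items(), key=lambda kv: kv[1]))
```

Account C never transacted with A, but the shared device pulls it into the network, which is the kind of connection a row-by-row view of the database would miss. Surfacing such a network for a human analyst to review, rather than auto-blocking it, matches the passage's emphasis on helping analysts spot patterns.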
My view?
First, point your browser to www.inteltrax.com and read the stories about Palantir.
Second, what about the legal dust up? Well, run a Google query and get the scoop. The legal documents are quite interesting as well. The interesting information is available on WestlawNext and Lexis. The free Web content is, well, not industrial strength.
Third, what about Digital Reasoning, a company with groundbreaking entity based analytics? Check that out at www.digitalreasoning.com. For more amusement look at www.recordedfuture.com.
You can read interviews with founders of companies with technology that goes beyond Palantir at these two links:
We are not “real” journalists. On the other hand, you will get some insight into what’s happening with next generation analytics. No turkey on Thanksgiving at Beyond Search.
Stephen E Arnold, November 24, 2011
Freebie. Unlike Palantir’s solutions.