Prediction Data Joins the Fight

January 12, 2012

It seems that prediction data could be joining the fight against terrorism. According to the Social Graph Paper article “Prediction Data As An API in 2012,” some companies are developing prediction models that can be applied to terror prevention. The article mentions Palantir, noting that “they emphasize development of prediction models as applied to terror prevention, and consumed by non-technical field analysts.” Recorded Future is another player, though it relies on “creating a ‘temporal index’, a big data/ semantic analysis problem, as a basis to predict future events.” Other companies that have been dabbling in big data and prediction modeling include Sense Networks, Digital Reasoning, BlueKai, and Primal. The author theorizes that “There will be data-domain experts spanning the ability to make sense of unstructured data, aggregate from multiple sources, run prediction models on it, and make it available to various “application” providers.” Using data to predict the future seems a little far-fetched, but the technology is still new and not fully understood. Everyone needs to join the fight against terrorism; exactly how data prediction fits in remains to be seen.
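Recorded Future has not published the details of its approach, but a toy sketch can suggest what a “temporal index” might look like in its simplest form: documents indexed by the dates they mention, so an analyst can ask what is expected around a given point in time. Everything below is a hypothetical illustration (the ISO-date regular expression and the function name are mine), not a description of the company’s technology.

```python
# Hypothetical sketch of a "temporal index": map dates mentioned in
# documents to the documents that mention them. Real systems resolve
# far messier date expressions ("next Tuesday", "Q3 2012") and attach
# extracted events, not whole documents.
import re
from collections import defaultdict
from datetime import date

DATE_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")  # assume ISO-style dates only

def build_temporal_index(documents):
    """Index document ids by each date found in their text."""
    index = defaultdict(list)
    for doc_id, text in documents.items():
        for y, m, d in DATE_RE.findall(text):
            index[date(int(y), int(m), int(d))].append(doc_id)
    return index

docs = {
    "doc1": "Summit scheduled for 2012-03-15 in Geneva.",
    "doc2": "Exercise planned for 2012-03-15, with a follow-up on 2012-04-01.",
}
index = build_temporal_index(docs)
print(index[date(2012, 3, 15)])  # ['doc1', 'doc2']
```

The interesting (and hard) part, of course, is layering prediction models on top of such an index, which is where the vendors named above claim to differentiate themselves.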

April Holmes, January 12, 2012

Sponsored by Pandia.com

Big Data in 2012: Reliable Open-Source Software Required

January 11, 2012

Enthusiasm and optimism abound that Big Data, as a concept, is the next big thing. We are almost ready to board the Big Data bulldozer. The hoopla surrounding Big Data has not died down in 2012. Instead, the concept continues to shape the environment of data processing and analysis.

As businesses become aware that the Big Data trend is here to stay, publishers are looking for reliable support. The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing. The project offers much in the way of dealing with unstructured data and is setting the pace for consolidation as well as personalization. I came across an interesting article, “State of the World IT: Big Data, An Offer That is Formed” (the original article is in French, but http://translate.google.com works well for this gosling). We learn:

Gaining recognition in the market in 2011, Hadoop has also attracted the top names in the IT industry, who have put this framework at the heart of their large-volume data processing offerings. One reason is mainly cost, as James Markarian, executive vice president and technical director of Informatica, reminded us, confirming that the framework ‘helped to change the economic model of Big Data.’ He added that flexibility… was also a criterion for adoption.

It is clear that the flood of data will only continue to grow by the minute. New generations of search, publishing, and consolidation tools will continue to emerge. I recommend staying informed about the products and the specific capabilities of each. However, Big Data which has been filtered may pose some interesting problems; for example, will the outputs match the pre-filtered reality? Will predictive methods work when some data are no longer in the stream? So far the cheerleading uses chants from an older, pre-filtering era. Is this a good thing or a no-thing?

Andrea Hayden, January 11, 2012

Sponsored by Pandia.com

Temis, Spammy PR, and Quite Silly Assertions

January 11, 2012

I am working on a project related to semantics. The idea, according to that almost always reliable Wikipedia resource, is:

the study of meaning. It focuses on the relation between signifiers, such as words, phrases, signs and symbols, and what they stand for, their denotata.

Years ago I studied at Duquesne University, a fascinating blend of Jesuit obsession, basketball, and phenomenological existentialism. If you are not familiar with this darned exciting branch of philosophy, you can dig into Psychology from an Empirical Standpoint by Franz Brentano, grind through Carl Stumpf’s The Psychological Origins of Space Perception, or just grab the Classic Comic Book from your local baseball card dealer. (My hunch is that many public relations professionals feel more comfortable with the Classic approach, not the primary texts of philosophers who focus on how ephemera and baloney affect one’s perception of the reality one’s actions create.)

But my personal touchstone is Edmund Husserl’s body of work. To get the scoop on Lebenswelt (a universe of what is self-evident), you will want to skip the early work and go directly to The Crisis of European Sciences and Transcendental Phenomenology. For sure, PR spam is what I would call self-evident: it exists, was created by a human (possibly unaware that actions define reality), and aims to achieve an outcome hooked to the individual’s identity.

Why mention the crisis of European thought? Well, I received “American Society for Microbiology Teams Up With TEMIS to Strengthen Access to Content” in this morning’s email (January 10, 2012). I noted that the document was attributed to an individual identified as Martine Fallon. I asked to be removed from the spam email list that dumps silly news releases about Temis into my system. I considered that Martine Fallon may be a ruse like Betty Crocker. Real or fictional, I am certain she or one of her colleagues, probably schooled in an esoteric discipline such as modern dance, agronomy, or public relations, is familiar with the philosophical musings of Jean Genet.

You can get a copy of Born to Lose at this link.

I recall M. Genet’s observation:

I recognize in thieves, traitors and murderers, in the ruthless and the cunning, a deep beauty – a sunken beauty.

Temis, a European company in the dicey semantic game, surely appreciates the delicious irony of describing a license deal as “teaming up.” The notion of strengthening access to content is another semantic bon mot. The problem is that the argument does not satisfy my existential quest for factual information; for example, look at the words and bound phrases in bold:

Temis, the leading provider of Semantic Content Enrichment solutions for the Enterprise, today announced it has signed a license and services agreement with the American Society for Microbiology (ASM), the oldest and largest life science membership organization in the world.

Do tell. Leading? Semantic content enrichment. What’s that?

The “leading” word is interesting but it lacks the substance of verifiable fact. Well, there’s more to the news story and the Temis pitch. Temis speaks for its client, asserting:

To serve its 40,000 members better, ASM is completely revamping its online content offering, and aggregating at a new site all of its authoritative content, including ASM’s journal titles dating back to 1916, a rapidly expanding image library, 240 book titles, its news magazine Microbe, and eventually abstracts of meetings and educational publications.

I navigated to the ASM Web site, did some poking around, and learned that ASM is rolling in dough. You can verify the outfit’s financial status at this page. The numbers and charts show that ASM has increasing assets, which is good. However, the chart below suggests that since 2008, revenue has been heading south.

[Chart: American Society for Microbiology revenue trend, 2008 onward]

Source: http://www.faqs.org/tax-exempt/DC/American-Society-For-Microbiology.html

In my limited experience in rural Kentucky, not-for-profits embrace technology for one of three reasons. Let me list them and see if we can figure out which one motivates the estimable American Society for Microbiology.


Connotate Embraces Big Data

January 10, 2012

The Internet is an environment where unregulated data is being created at rapid rates. It has become far too much for company staff to keep track of. Therefore, software that collects and organizes Big Data is becoming a hot commodity for enterprises all over the world.

According to the recent news release “Staffing and the Volume of Information are the Primary Big Data Challenges,” Connotate, Inc., a provider of solutions that help organizations monitor and collect data and content from the Web, has announced the results of its Big Data Attitudes and Perceptions Survey.

Connotate CEO Tom Meyer said:

Our research shows that Big Data goes beyond technology and is an HR challenge for corporate America. While it is important that organizations devote resources to Big Data, employees must be freed from the information fire hose so they can concentrate only on the information that is relevant to their tasks. Connotate’s Agent Community data extraction and monitoring tools are a proven force multiplier, enabling companies to drastically reduce the amount of personnel needed to run and achieve significant ROI from Big Data projects.

The Connotate survey suggests that companies are finding it too time-consuming and impractical for their staff to sort through Big Data. Companies focused on data fusion are responding to the explosion in social content. Clients demand; vendors respond.

Jasmine Ashton, January 10, 2012

Sponsored by Pandia.com

Digital Reasoning Connects with TeraDact

January 4, 2012

Big data analytics specialist Digital Reasoning has been a regular topic of discussion here at Beyond Search, most recently for securing Series B funding for a big data intelligence push.

Now, we would like to share an exciting new development in the quest to solve the big data problem in the news release “Digital Reasoning and TeraDact Partner to Automatically Remove Sensitive Information from Big Data.”

According to the article, TeraDact Solutions, a software tools and data integration solutions provider, has integrated its TeraDactor Information Identification and Presentation capabilities with Synthesys Cloud, a software-as-a-service data analytics solution.

The news story states:

In conjunction with Synthesys, TeraDactor can automatically assist in appropriately classifying information not recognized by the original data provider. TeraDactor allows participants to push and pull information without waiting for the declassification process, assuring that formerly classified documents may be released without unintended leakages.
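The news release does not explain how the identification actually works, so here is only a toy, pattern-based redactor to convey the general shape of the task; the regular expressions and function name are illustrative assumptions, not TeraDact’s technology, which presumably goes far beyond simple pattern matching.

```python
# Toy redaction sketch: mask strings that match obviously sensitive
# patterns before a document is released. Illustrative only.
import re

SENSITIVE_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # SSN-like numbers
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),  # e-mail addresses
]

def redact(text, mask="[REDACTED]"):
    """Replace every match of a sensitive pattern with a mask token."""
    for pattern in SENSITIVE_PATTERNS:
        text = pattern.sub(mask, text)
    return text

print(redact("Contact analyst@example.com, SSN 123-45-6789, re: the release."))
# Contact [REDACTED], SSN [REDACTED], re: the release.
```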

The innovative technology that TeraDact Solutions brings to Digital Reasoning’s table demonstrates the power of Synthesys as a cloud-based data analytics tool in building the next generation of Big Data analytic solutions. Kudos to the surging Digital Reasoning organization.

Jasmine Ashton, January 4, 2012

Sponsored by Pandia.com

Digital Reasoning: A New Generation of Big Data Tools

December 31, 2011

I read “Tool Detects Patterns Hidden in Vast Data Sets.” The Broad Institute’s Web site reported that a group of researchers in the US and Israel “have developed a tool that can tackle large data sets in a way that no other software program can.”

What seems exciting to me is that the mathematical procedure, which involves creating a space and grids into which certain discerned patterns are placed, provides a fascinating potential enhancement for companies like ours, Digital Reasoning. Our proprietary methods have performed similar associative analytics in order to reduce the uncertainty associated with processing large flows of data and distilling meaningful relationships from them. Someday computers and associated systems will be able to cope with exabytes of data from the Internet of Things. Today, the Broad Institute validates the next-generation numerical methods that its researchers, Digital Reasoning’s engineers, and a handful of other organizations have been exploring.

The technical information about the method, which is called MIC, shorthand for Maximal Information Coefficient, is available to members of the AAAS. To get a copy of the original paper and its mathematical exegesis you will want the full bibliographic information:

“Detecting Novel Associations in Large Data Sets” by David N. Reshef, Yakir A. Reshef, Hilary K. Finucane, Sharon R. Grossman, Gilean McVean, Peter J. Turnbaugh, Eric S. Lander, Michael Mitzenmacher, and Pardis C. Sabeti, Science, 16 December 2011, Volume 334, Number 6062, pages 1518-1524.

The core of the authors’ work is:

Identifying interesting relationships between pairs of variables in large data sets is increasingly important. Here, we present a measure of dependence for two-variable relationships: the maximal information coefficient (MIC). MIC captures a wide range of associations both functional and not, and for functional relationships provides a score that roughly equals the coefficient of determination (R2) of the data relative to the regression function. MIC belongs to a larger class of maximal information-based nonparametric exploration (MINE) statistics for identifying and classifying relationships. We apply MIC and MINE to data sets in global health, gene expression, major-league baseball, and the human gut microbiota and identify known and novel relationships.
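To make the intuition concrete, here is a minimal sketch of the grid-and-mutual-information idea behind MIC. It is not the authors’ published algorithm, which searches grid partitions far more cleverly and bounds the grid size as a function of sample size; this version simply bins the two variables at a few resolutions, computes the mutual information of each grid, normalizes it, and keeps the maximum.

```python
# Simplified MIC-style score: maximum normalized mutual information over
# small equi-width grids. Illustrative only; not the MINE/MIC algorithm.
import numpy as np

def grid_mutual_information(x, y, nx, ny):
    """Mutual information (bits) of x and y binned on an nx-by-ny grid."""
    joint, _, _ = np.histogram2d(x, y, bins=[nx, ny])
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0  # skip empty cells to avoid log(0)
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())

def simple_mic(x, y, max_bins=8):
    """Crude MIC-like measure: max of MI / log2(min grid dimension)."""
    best = 0.0
    for nx in range(2, max_bins + 1):
        for ny in range(2, max_bins + 1):
            mi = grid_mutual_information(x, y, nx, ny)
            best = max(best, mi / np.log2(min(nx, ny)))
    return best

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 1000)
print(simple_mic(x, x ** 2 + 0.05 * rng.normal(size=1000)))  # high: real nonlinear association
print(simple_mic(x, rng.normal(size=1000)))                  # low: no association
```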

Digital Reasoning’s application of similar mathematical methods underpins our entity-oriented analytics. You can read more about our methods in our description of Synthesys, a platform for performing automated understanding of the meaning of Big Data in real time.

The significance of this paper is that it shines a spotlight on the increasing importance of research into applications of next-generation numerical methods. Public discussion of methods like MIC will serve to accelerate innovation and the diffusion of knowledge. At Digital Reasoning we see this as further evidence of the potential of algorithmic, unaided approaches like ours to achieve true “automated understanding” of all forms of text regardless of volume, velocity, or variety. As we shift to IPv6, the “Internet of Things” will dramatically increase the flows of real-time data. With automobiles and consumer devices transmitting data continuously or on demand, the digital methods of five or ten years ago fall short.

Three other consequences of MIC-style innovations will accrue:

First, at Digital Reasoning, we will be able to enhance our existing methods with the new insights, forming partnerships and investing in research to apply these demonstrations to real-world problems. The confidence shown by Silver Lake partners’ investment in Digital Reasoning has provided us with the capital to extend our commercial system quickly and in new directions such as financial services, health care, legal, and other verticals.

Second, we see the MIC method fueling additional research into methods that make Big Data more accessible and useful; that is, consumerizing some applications without requiring custom solutions. Big Data will eventually be part of a standard information process, not something discussed as “new” and “unusual.”

Third, greater awareness of the contribution of mathematics will, I believe, stimulate young men and women to make mathematics and statistics a career. With more talent entering the workforce, the pace of innovation and integration will accelerate. That’s good for many companies, not just Digital Reasoning.

Kudos to the MIC team. What’s next?

Tim Estes, December 31, 2011

Sponsored by Pandia.com

Text Analytics Offers Summit, Podcast

December 28, 2011

Social media has exploded, with billions and billions of pieces of content hitting the Web every day; making sense of it all can be overwhelming. That’s why Text Analytics News is sponsoring the first-ever two-day conference dedicated solely to social media analytics: the Social Media Analytics Summit, in San Francisco next April 17–18. The conference description informs us:

The Social Media Analytics Summit offers unmatchable networking and knowledge sharing opportunities for social media and analytics professionals. The Summit will be a true forum for vendors, end users, and consultants alike to develop long-lasting business relationships. The conference agenda has been designed based on meticulous research with industry experts and is well-rounded with presentations, panels, case studies, and workshops, giving you deep insights into the social analytics industry from many angles.

Social media analytics is only going to keep growing; learning about this key field now is a wise investment. Here’s an inside tip: register with discount code BSEARCHSMA to save $150.

Text Analytics News’ Chief Editor Ezra Steinberg also has a podcast available called the “Social Media Analytics Perspectives Panel,” which he recently recorded with professionals from Social Media Today, Radian6, J.D. Power & Associates, and Beyond The Arc. The podcast explores:

  • Effective ways to leverage social media information to gain a competitive advantage
  • The cutting edge of social media analytics and sentiment analysis technology
  • How to make business sense out of the flood of user-generated content across social media channels

Two sources of important information from Text Analytics News. Here’s hoping you can take advantage of both.

Cynthia Murrell, December 28, 2011

Sponsored by Pandia.com

Big Data Analytics and Sense Making with Synthesys

December 19, 2011

Tim Estes is the CEO and co-founder of Digital Reasoning. Digital Reasoning develops and markets solutions that provide Automated Understanding for Big Data.

There’s a great deal of talk about “big data” today. Walk into an AT&T store and you may see statistics such as users sending over 3 billion text messages a day or posting over 250 million tweets. Compare that with the 100 million or fewer tweets a day of a year or two ago, and it is daunting how rapidly the volume of digital information is increasing. A mobile phone without expandable storage frustrates users who want to keep a contacts list, rich media, and apps in their pocket. In organizations, the appetite for storage is significant. EMC, Hewlett Packard, and IBM are experiencing strong demand for their storage systems. Cloud vendors such as Amazon and Rackspace are also experiencing strong demand from companies offering compelling services to end users on their infrastructure. At a recent Amazon conference in Washington, Werner Vogels revealed that the AWS Cloud has hundreds of thousands of companies and customers running on it at some level. Finally, companies like Digital Reasoning are working on the next generation of the Cloud – automated understanding – which goes beyond a focus on infrastructure to sense-making of the data that sits in hosted or private clouds.

While most of the attention has been on infrastructure like virtualization and hypervisors, Hadoop, and NoSQL data storage systems, we think those are really the enablers of the killer app for the Cloud, which is making sense of data to solve information overload. Without next-generation analytics and supporting technology, it is essentially impossible to:

  • Analyze a flow of data from multiple sensors deployed in a factory
  • Process mobile traffic at a telephone company
  • Make sense of unstructured and structured information flowing through an email system
  • Identify key entities and their importance in a stream of financial news and transaction data.

These are the real world problems that have engaged me for many years. I founded Digital Reasoning to automatically make sense of data because I believed that someday all software would learn and that would unleash the next great revolution in the Information Age. The demand for this revolution is inevitable because while data has increased exponentially, human attention has been essentially static in comparison. Technology to create better return on attention would go from “nice to have” to utterly essential. And now, that moment is here.

Digging a little deeper, Digital Reasoning has created a way to take human communication and use algorithms to make sense of it without having to depend on a human design, an ontology, or some other structure. Our system looks at patterns and the way a word is used in its context and bootstraps understanding much like a human child does – creating associations and building them into more complex relationships.
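As a rough illustration of that bootstrapping idea (and emphatically not Digital Reasoning’s proprietary method), a toy co-occurrence counter shows how associations can emerge from nothing more than how words appear near one another; the function name and window size here are my own choices.

```python
# Toy association building: count which words co-occur within a small
# context window. Real systems weight, normalize, and layer these
# associations into richer relationships; this shows only the first step.
from collections import Counter, defaultdict

def build_associations(sentences, window=3):
    """Count co-occurrences of word pairs within a sliding window."""
    pair_counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            for other in context:
                pair_counts[word][other] += 1
    return pair_counts

assoc = build_associations([
    "the analyst flagged the suspicious wire transfer",
    "the bank reviewed the wire transfer for fraud",
])
print(assoc["wire"].most_common(3))  # e.g. [('the', 2), ('transfer', 2), ...]
```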

In 2009, we migrated onto Hadoop and began taking on the problem of managing very large-scale unstructured data, moving the industry beyond counting things that are well structured and toward figuring out exactly what the data you are measuring actually means.

Digital Reasoning asks the question: “How do you take loose, noisy information that is disconnected and unstructured and then make sense of it so that you can then apply analytics to it in a way that is valuable to business?”

We identify actors, actions, patterns, and facts and then put them into the context of space and time in an efficient and scalable way. In the government scenario, that can mean finding and stopping bad guys. In the legal environment, the goal is to answer the questions of “who”, “what”, “where”, and “when”.

Digital Reasoning initially set its focus on the complex task of making sense out of massive volumes of unstructured text within the US Government Intelligence Community after the events of 9/11. But we also believe that our Synthesys software can be used in the commercial sector to create great value from the mountains of unstructured data that sit in the Enterprise and stream in from the Web.

Companies with large-scale data will see value in investing in our technology because they cannot hire 100,000 people to go through and read all of the available material. This matters if you are a bank trying to make financial trades. This matters for companies doing electronic discovery. This matters for health care organizations that need help organizing medical records and guarding against fraud.

We are an emerging firm, growing rapidly and looking to have the best and the brightest join our quest to empower users and customers to make sense of their data through revolutionary software. With the recent investment from In-Q-Tel and partners of Silver Lake, I believe that Digital Reasoning has a great future ahead. We are on the bleeding edge of what is going on with Hadoop and Big Data in the engineering area and how to make sense of data through some of the most advanced learning algorithms in the world. Most of all we care that people are empowered with technology so that they can recover value and time in the race to overcome information overload.

To learn more about Digital Reasoning, navigate to our Web site and download our white paper.

Tim Estes, December 19, 2011

Sponsored by Pandia.com

Inteltrax: Top Stories, December 12 to December 16

December 19, 2011

Inteltrax, the data fusion and business intelligence information service, captured three key stories germane to search this week, specifically, the issue of change in the analytic world—for the better, for the worse and everything in between.

One example of change came from our story “Data Mining Changing Scientific Thought,” which shows how the way scientists think is being streamlined by analytics.

On the other hand, “ManTech has Uphill Climb with Intelligence Analytics” shows that not all change looks promising, such as one company’s new focus on intelligence.

And some change, well, we’re just not sure how it’ll pan out, like with the story “Predicting the Ponies is Just Unstructured Data” which exposes how the gambling industry could be changed by analytic tools. For the better or worse is up for debate.

Change, in any aspect of life, is inevitable. However, the world of big data analytics seems more susceptible than most. And we couldn’t be happier, as we watch the unexpected turns these changes bring to the industry every day.

Follow the Inteltrax news stream by visiting http://www.inteltrax.com/

Patrick Roland, Editor, Inteltrax.

IBM Redbooks Reveals Content Analytics

December 16, 2011

IBM Redbooks has put out some juicy reading for the azure chip consultants wanting to get smart quickly with IBM Content Analytics Version 2.2: Discovering Actionable Insight from Your Content. The sixteen chapters of this book take the reader from an overview of IBM content analytics, through understanding the details, to troubleshooting tips. The above link provides an abstract of the book, as well as links to download it as a PDF, view in HTML/Java, or order a hardcopy.

We learned from the write up:

The target audience of this book is decision makers, business users, and IT architects and specialists who want to understand and use their enterprise content to improve and enhance their business operations. It is also intended as a technical guide for use with the online information center to configure and perform content analysis with Content Analytics.

The product description notes a couple of specifics. For example, creating custom annotators with the LanguageWare Resource Workbench is covered. So is using the IBM Content Assessment to weed out superfluous data.

The content is, of course, slanted toward working with IBM solutions. However, there is also some more general information included. This is a good place to go to get a better handle on content management.

Cynthia Murrell, December 16, 2011

Sponsored by Pandia.com
