Search Only Goes So Far
January 30, 2012
Infocentric Research surveyor Stephan Schillerwein, who presented his findings at the Online Information Conference, released some alarming statistics about enterprise search in his report “The Digital Workplace.” Among the points which jumped out at me were that 40 percent of employees use the wrong information when conducting enterprise searches and that 63 percent “make critical decisions without being informed,” which results in a 25 percent work information productivity loss.
According to the Pandia Search Engine News article “Huge Problems for Search In the Enterprise,” Schillerwein believes there are a few reasons why enterprise search is problematic: users don’t account for the fact that enterprise search is different from Web search, they have unrealistic expectations, and there is a clear lack of content. The Pandia article asserted: Schillerwein suggests a solution based on several elements, such as consistent coverage of information flows for processes, bringing together the worlds of structured and unstructured information, and adding context. I would agree, as this ability to combine structured and unstructured data while maintaining context is key in our approach. However, when you add the jumble of tweets, social media and other data that crowd employees’ smart devices, the problems with enterprise search could continue their downward spiral, and “finding a needle in a haystack” could be easier than doing an enterprise search.
These observations triggered several questions and observations.
First, there are a number of companies offering enterprise information solutions. Many are focused on the older approach of key word queries. There are business intelligence systems which provide “find-ability” tools along with a range of useful analytic features. Although search is not the focal point of these solutions, they do provide useful visualizations and statistics on content. The problem is that most organizations are confused about what is needed and what must be done to maximize the value of systems which go beyond key word retrieval. This confusion is likely to play a far larger role in enterprise search challenges than many market analysts want to acknowledge. Instead, many solutions today seem to be making information access more confusing and problematic, not clearer and more trouble free.
Second, the challenge may be more directly related to figuring out what specific business process needs which information. Without a clear understanding of the user’s requirements, it may be difficult to deploy a system that delivers higher user satisfaction. If this hypothesis is correct, perhaps more vendors should adopt the approach we have taken at Digital Reasoning. We make an extra effort to understand what the user requires and then invest time and resources in hooking appropriate information and data into the system. No solution can deliver the right fact-based answers if the required information is not within the data store and available to the algorithms which make sense of what is otherwise noise. We think that many problems with user acceptance originate with a misunderstanding or sidestepping of user requirements and the fundamental task of getting the necessary information for the system.
Third, the terminology used to describe information retrieval and access is becoming devalued. At Digital Reasoning, we work to explain succinctly and without jargon how our next-generation system can facilitate better decision making for financial, health, intelligence, and other professional markets. We have complex numerical recipes and sophisticated systems and methods. Our focus, however, is on what the system does for a user. We have been fortunate to receive support from a range of clients from government and industry as well as the investment community for our next-generation approach. We think our strength is our focus on the customer’s need and not only our unique predictive algorithms and cloud-based solution.
To learn more about Digital Reasoning and our products, navigate to www.digitalreasoning.com .
Dave Danielson, Digital Reasoning, January 30, 2012
Sponsored by Pandia.com
Prediction Data Joins the Fight
January 12, 2012
It seems that prediction data could be joining the fight against terrorism. According to the Social Graph Paper article “Prediction Data As An API in 2012,” some companies are developing prediction models that can be applied to terror prevention. The article says of Palantir that “they emphasize development of prediction models as applied to terror prevention, and consumed by non-technical field analysts.” Recorded Future is another such company, but it relies on “creating a ‘temporal index’, a big data/ semantic analysis problem, as a basis to predict future events.” Other companies that have been dabbling in big data and prediction modeling are Sense Networks, Digital Reasoning, BlueKai and Primal. The author theorizes that “There will be data-domain experts spanning the ability to make sense of unstructured data, aggregate from multiple sources, run prediction models on it, and make it available to various “application” providers.” Using data to predict the future seems a little farfetched, but the technology is still new and not totally understood. Everyone does need to join the fight against terrorism, but exactly how data prediction fits in remains to be seen.
April Holmes, January 12, 2012
Sponsored by Pandia.com
Big Data in 2012: Reliable Open-Source Software Required
January 11, 2012
Enthusiasm and optimism persist that Big Data as a concept is the next big thing. We are almost ready to board the Big Data bulldozer. The hoopla surrounding Big Data has not died down in 2012. Instead, the concept demonstrates the continuing environment of processing and analysis.
As businesses become aware that the Big Data trend is here to stay, publishers are looking for reliable support. The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing. The project offers much in the way of dealing with unstructured data and is setting the pace for consolidation as well as personalization. I came across an interesting article, “State of the World IT: Big Data, An Offer That is Formed” (the original article is in French, but http://translate.google.com works well for this gosling). We learn:
As a recognition of the market in 2011, Hadoop has also attracted the top names in the IT industry who put this framework in the heart of their range of data processing volume. One reason: the cost mainly reminded us James Markarian, executive vice president and technical director of Informatica confirming that the framework ‘helped to change the economic model of the Big Data.’ Adding that flexibility… was as a criterion for adoption.
It is clear that the excess of data will only continue to grow by the minute. Generations of search, publishing, and consolidation will continue to emerge. I recommend staying informed of the products and the specific capabilities of each. However, Big Data which is filtered may pose some interesting problems; for example, will the outputs match the pre-filtered reality? Will predictive methods work when some data are no longer in the stream? So far the cheerleading is using chants from an older, pre-filtering era. Is this a good thing or a no-thing?
Andrea Hayden, January 11, 2012
Sponsored by Pandia.com
Temis, Spammy PR, and Quite Silly Assertions
January 11, 2012
I am working on a project related to semantics. Semantics, according to that almost always reliable Wikipedia resource, is:
the study of meaning. It focuses on the relation between signifiers, such as words, phrases, signs and symbols, and what they stand for, their denotata.
Years ago I studied at Duquesne University, a fascinating blend of Jesuit obsession, basketball, and phenomenological existentialism. If you are not familiar with this darned exciting branch of philosophy, you can dig into Psychology from an Empirical Standpoint by Franz Brentano or grind through Carl Stumpf’s The Psychological Origins of Space Perception, or just grab the Classic Comic Book from your local baseball card dealer. (My hunch is that many public relations professionals feel more comfortable with the Classic approach, not the primary texts of philosophers who focus on how ephemera and baloney affect one’s perception of the reality one’s actions create.)
But my personal touchstone is Edmund Husserl’s body of work. To get the scoop on Lebenswelt (a universe of what is self-evident), you will want to skip the early work and go directly to The Crisis of European Sciences and Transcendental Phenomenology. For sure, PR spam is what I would call self-evident because it exists and was created by a human (possibly unaware that actions define reality) to achieve an outcome which is hooked to the individual’s identity.
Why mention the crisis of European thought? Well, I received “American Society for Microbiology Teams Up With TEMIS to Strengthen Access to Content” in this morning’s email (January 10, 2012). I noted that the document was attributed to an individual identified as Martine Fallon. I asked to be removed from the spam email list that dumps silly news releases about Temis into my system. I considered that Martine Fallon may be a ruse like Betty Crocker. Real or fictional, I am certain she or one of her colleagues, probably schooled in an esoteric discipline such as modern dance, agronomy, or public relations, is familiar with the philosophical musings of Jean Genet.
You can get a copy of Born to Lose at this link.
I recall M. Genet’s observation:
I recognize in thieves, traitors and murderers, in the ruthless and the cunning, a deep beauty – a sunken beauty.
Temis, a European company in the dicey semantic game, surely appreciates the delicious irony of explaining a license deal as a “team”. The notion of strengthening access to content is another semantic bon mot. The problem is that the argument does not satisfy my existential quest for factual information; for example, look at the words and bound phrases in bold:
Temis, the leading provider of Semantic Content Enrichment solutions for the Enterprise, today announced it has signed a license and services agreement with the American Society for Microbiology (ASM), the oldest and largest life science membership organization in the world.
Do tell. Leading? Semantic content enrichment. What’s that?
The “leading” word is interesting but it lacks the substance of verifiable fact. Well, there’s more to the news story and the Temis pitch. Temis speaks for its client, asserting:
To serve its 40,000 members better, ASM is completely revamping its online content offering, and aggregating at a new site all of its authoritative content, including ASM’s journal titles dating back to 1916, a rapidly expanding image library, 240 book titles, its news magazine Microbe, and eventually abstracts of meetings and educational publications.
I navigated to the ASM Web site, did some poking around, and learned that ASM is rolling in dough. You can verify the outfit’s financial status at this page. But the numbers and charts allowed me to see that ASM has increasing assets, which is good. However, this chart suggests that since 2008, revenue has been heading south.
Source: http://www.faqs.org/tax-exempt/DC/American-Society-For-Microbiology.html
In my limited experience in rural Kentucky, not-for-profits embrace technology for one of three reasons. Let me list them and see if we can figure out what causes the estimable American Society for Microbiology.
Connotate Embraces Big Data
January 10, 2012
The Internet is an environment where unregulated data is being created at rapid rates. It has become far too much for company staff to keep track of. Therefore, software that collects and organizes Big Data is becoming a hot commodity for enterprises all over the world.
According to the recent news release “Staffing and the Volume of Information are the Primary Big Data Challenges” Connotate, Inc., a provider of solutions that help organizations monitor and collect data and content from the Web, announced the results of its Big Data Attitudes and Perceptions Survey.
Connotate CEO Tom Meyer said:
Our research shows that Big Data goes beyond technology and is an HR challenge for corporate America. While it is important that organizations devote resources to Big Data, employees must be freed from the information fire hose so they can concentrate only on the information that is relevant to their tasks. Connotate’s Agent Community data extraction and monitoring tools are a proven force multiplier, enabling companies to drastically reduce the amount of personnel needed to run and achieve significant ROI from Big Data projects.
The Connotate survey suggests that companies are finding it too time consuming and impractical for their staff to sort through Big Data. Companies focused on data fusion are responding to the explosion in social content. Clients demand; vendors respond.
Jasmine Ashton, January 10, 2012
Sponsored by Pandia.com
Digital Reasoning Connects with TeraDact
January 4, 2012
Big data analytics specialist Digital Reasoning has been a regular topic of discussion here at Beyond Search, most recently for achieving series B funding for a big data intelligence push.
Now, we would like to share an exciting new development in the quest to solve the big data problem in the news release “Digital Reasoning and TeraDact Partner to Automatically Remove Sensitive Information from Big Data.”
According to the article, TeraDact Solutions, a software tools and data integration solutions provider, has integrated their TeraDactor Information Identification and Presentation capabilities with Synthesys Cloud, a software-as-a-service data analytics solution.
The news story states:
In conjunction with Synthesys, TeraDactor can automatically assist in appropriately classifying information not recognized by the original data provider. TeraDactor allows participants to push and pull information without waiting for the declassification process, assuring that formerly classified documents may be released without unintended leakages.
The innovative technology that TeraDact Solutions brings to Digital Reasoning’s table demonstrates the power of Synthesys as a cloud-based data analytics tool in building the next generation of Big Data analytic solutions. Kudos to the surging Digital Reasoning organization.
Jasmine Ashton, January 4, 2012
Sponsored by Pandia.com
Digital Reasoning: A New Generation of Big Data Tools
December 31, 2011
I read “Tool Detects Patterns Hidden in Vast Data Sets.” The Broad Institute’s online Web site reported that a group of researchers in the US and Israel “have developed a tool that can tackle large data sets in a way that no other software program can.”
What seems exciting to me is that the mathematical procedure which involves creating a space and grids into which certain discerned patterns are placed provides a fascinating potential enhancement to companies like ours–Digital Reasoning. Our proprietary methods have performed similar associative analytics in order to reduce the uncertainty associated with processing large flows of data and distilling meaningful relationships from them. Some day computers and associated systems will be able to cope with exabytes of data from the Internet of things. Today, the Broad Institute validates the next-generation numerical methods that its researchers, Digital Reasoning’s engineers, and a handful of other organizations have been exploring.
The technical information about the method, which is called MIC, shorthand for Maximal Information Coefficient, is available to members of the AAAS. To get a copy of the original paper and its mathematical exegesis you will want the full bibliographic information:
“Detecting Novel Associations in Large Data Sets” by David N. Reshef, Yakir A. Reshef, Hilary K. Finucane, Sharon R. Grossman, Gilean McVean, Peter J. Turnbaugh, Eric S. Lander, Michael Mitzenmacher, and Pardis C. Sabeti, Science, 16 December 2011, Volume 334, Number 6062, pages 1518-1524.
The core of the authors’ work is:
Identifying interesting relationships between pairs of variables in large data sets is increasingly important. Here, we present a measure of dependence for two-variable relationships: the maximal information coefficient (MIC). MIC captures a wide range of associations both functional and not, and for functional relationships provides a score that roughly equals the coefficient of determination (R²) of the data relative to the regression function. MIC belongs to a larger class of maximal information-based nonparametric exploration (MINE) statistics for identifying and classifying relationships. We apply MIC and MINE to data sets in global health, gene expression, major-league baseball, and the human gut microbiota and identify known and novel relationships.
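For readers who want a feel for the idea, here is a rough sketch in Python. It is not the authors' algorithm: the published method searches over grid placements, while this simplified version only tries equal-frequency grids, so it can underestimate the true MIC. The function names are made up for illustration, and the grid budget of n to the power 0.6 follows the default I recall from the paper, so treat it as an assumption.

```python
import math
from collections import Counter


def equal_frequency_bins(values, k):
    """Assign each value to one of k bins of roughly equal size.

    Tied values always share a bin so a constant variable carries no information.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    prev_value, prev_bin = None, 0
    for rank, i in enumerate(order):
        b = rank * k // n
        if rank > 0 and values[i] == prev_value:
            b = prev_bin  # keep ties together
        bins[i] = b
        prev_value, prev_bin = values[i], b
    return bins


def mutual_information(xb, yb):
    """Mutual information, in bits, between two sequences of bin labels."""
    n = len(xb)
    joint = Counter(zip(xb, yb))
    px, py = Counter(xb), Counter(yb)
    return sum(
        (c / n) * math.log2(c * n / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )


def approx_mic(xs, ys, exponent=0.6):
    """Largest normalized mutual information over small equal-frequency grids.

    Simplification (assumption): only equal-frequency grids are tried, not
    the optimized grid placements described in the paper.
    """
    n = len(xs)
    budget = max(4, int(n ** exponent))  # cap on grid cells: kx * ky <= budget
    best = 0.0
    for kx in range(2, budget // 2 + 1):
        xb = equal_frequency_bins(xs, kx)
        for ky in range(2, budget // kx + 1):
            yb = equal_frequency_bins(ys, ky)
            score = mutual_information(xb, yb) / math.log2(min(kx, ky))
            best = max(best, score)
    return best
```

Calling approx_mic on a straight line and on a parabola should give high scores in both cases, even though an ordinary correlation coefficient would miss the parabola. That is the "functional and not" point the abstract makes.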
Digital Reasoning’s application of similar mathematical methods underpins our entity-oriented analytics. You can read more about our methods in our description of Synthesys, a platform for performing automated understanding of the meaning of Big Data in real time.
The significance of this paper is that it shines a spotlight on the increasing importance of research into applications of next-generation numerical methods. Public discussion of methods like MIC will serve to accelerate innovation and the diffusion of knowledge. At Digital Reasoning we see this as further evidence of the potential of algorithmic, unaided approaches like ours to achieve true “automated understanding” of all forms of text regardless of volume, velocity or variety. As we shift to IPv6, the “Internet of things” will dramatically increase the flows of real time data. With automobiles and consumer devices transmitting data continuously or on demand, the digital methods of 10 or five years ago fall short.
Three other consequences of MIC-style innovations will accrue:
First, at Digital Reasoning, we will be able to enhance our existing methods with the new insights, forming partnerships and investing in research to apply demonstrations to real world problems. The Silver Lake partners’ investment in Digital Reasoning, and the confidence it reflects, has provided us with capital to extend our commercial system quickly and in new directions such as financial services, health care, legal, and other verticals.
Second, we see the MIC method fueling additional research into methods making Big Data more accessible and useful; that is, consumerizing some applications that currently lack solutions. Big Data will eventually be part of a standard information process, not something discussed as “new” and “unusual.”
Third, greater awareness of the contribution of mathematics will, I believe, stimulate young men and women to make mathematics and statistics a career. With more talent entering the workforce, the pace of innovation and integration will accelerate. That’s good for many companies, not just Digital Reasoning.
Kudos to the MIC team. What’s next?
Tim Estes, December 31, 2011
Sponsored by Pandia.com
Text Analytics Offers Summit, Podcast
December 28, 2011
Social media has exploded, with billions and billions of pieces of content hitting the Web every day; making sense of it all can be overwhelming. That’s why Text Analytics News is sponsoring the first-ever two-day conference dedicated solely to social media analytics: the Social Media Analytics Summit in San Francisco next April 17–18. The conference description informs us:
The Social Media Analytics Summit offers unmatchable networking and knowledge sharing opportunities for social media and analytics professionals. The Summit will be a true forum for vendors, end users, and consultants alike to develop long-lasting business relationships. The conference agenda has been designed based on meticulous research with industry experts and is well-rounded with presentations, panels, case studies, and workshops, giving you deep insights into the social analytics industry from many angles.
Social media analytics is only going to keep growing; learning about this key field now is a wise investment. Here’s an inside tip: register with discount code BSEARCHSMA to save $150.
Text Analytics News’ Chief Editor Erza Steinberg also has a podcast available called the “Social Media Analytics Perspectives Panel,” which he recently recorded with professionals from Social Media Today, Radian6, J.D. Power & Associates, and Beyond The Arc. The podcast explores:
- Effective ways for leveraging social media information to gain a competitive advantage
- The cutting edges of social media analytics and sentiment analysis technology
- How to make business sense out of the flood of user-generated content across social media channels
Two sources of important information from Text Analytics News. Here’s hoping you can take advantage of both.
Cynthia Murrell, December 28, 2011
Sponsored by Pandia.com
Big Data Analytics and Sense Making with Synthesys
December 19, 2011
Tim Estes is the CEO and co-founder of Digital Reasoning. Digital Reasoning develops and markets solutions that provide Automated Understanding for Big Data.
There’s a great deal of talk about “big data” today. If you walk into an AT&T store near you, you may see statistics showing users sending over 3 billion text messages a day or over 250 million tweets. Compare that to 100 million or fewer tweets a day a year or two ago, and it’s daunting how rapidly the volume of digital information is increasing. A mobile phone without expandable storage frustrates users who want to keep a contacts list, rich media, and apps in their pocket. In organizations, the appetite for storage is significant. EMC, Hewlett Packard, and IBM are experiencing strong demand for their storage systems. Cloud vendors such as Amazon and Rackspace are also experiencing strong demand from companies offering compelling services to end users on their infrastructure. At a recent Amazon conference in Washington, Werner Vogels revealed that the AWS Cloud has hundreds of thousands of companies/customers running on it at some level. Finally, companies like Digital Reasoning are working on the next generation of Cloud – automated understanding – that goes from a focus on infrastructure to sense-making of data that sits in hosted or private clouds.
While most of the attention has been on infrastructure like virtualization / hypervisors, Hadoop, and NoSQL data storage systems, we think those are really the enablers of the killer app for Cloud, which is making sense of data to solve information overload. Without next generation analytics and supporting technology, it is essentially impossible to:
- Analyze a flow of data from multiple sensors deployed in a factory
- Process mobile traffic at a telephone company
- Make sense of unstructured and structured information flowing through an email system
- Identify key entities and their importance in a stream of financial news and transaction data.
These are the real world problems that have engaged me for many years. I founded Digital Reasoning to automatically make sense of data because I believed that someday all software would learn and that would unleash the next great revolution in the Information Age. The demand for this revolution is inevitable because while data has increased exponentially, human attention has been essentially static in comparison. Technology to create better return on attention would go from “nice to have” to utterly essential. And now, that moment is here.
Digging a little deeper, Digital Reasoning has created a way to take human communication and use algorithms to make sense of it without having to depend on a human design, an ontology, or some other structure. Our system looks at patterns and the way a word is used in its context and bootstraps the understanding much like a human child does – creating associations and building into more complex relationships.
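The general idea of learning about a word from the company it keeps can be shown with a toy sketch. To be clear, this is not Digital Reasoning's method or the Synthesys code; it is a generic co-occurrence example, and the function names, window size, and sample sentences are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=2):
    """Count, for each word, the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical sample text: no dictionary or ontology is supplied.
sentences = [
    "the bank approved the loan",
    "the lender approved the loan",
    "the dog chased the ball",
]
vectors = context_vectors(sentences)
bank_vs_lender = cosine(vectors["bank"], vectors["lender"])
bank_vs_dog = cosine(vectors["bank"], vectors["dog"])
```

In this tiny sample, "bank" and "lender" appear in the same contexts, so bank_vs_lender comes out higher than bank_vs_dog. A production system would need far more data and far richer statistics, but the association emerges from usage alone.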
In 2009, we migrated onto Hadoop and began taking on the problem of managing very large scale unstructured data and moving the industry beyond counting things that are well structured and toward being able to figure out exactly what the data you are measuring means.
Digital Reasoning asks the question: “How do you take loose, noisy information that is disconnected and unstructured and then make sense of it so that you can then apply analytics to it in a way that is valuable to business?”
We identify actors, actions, patterns, and facts and then put them into the context of space and time in an efficient and scalable way. In the government scenario, that can mean finding and stopping bad guys. In the legal environment, users want to answer the questions of “who”, “what”, “where”, and “when”.
Digital Reasoning initially set our focus on the complex task of making sense out of massive volumes of unstructured text within the US Government Intelligence Community after the events of 9/11. But we also believe that our Synthesys software can be utilized in the commercial sector to create great value from the mountains of unstructured data that sit in the Enterprise and stream in from the Web.
Companies with large-scale data will see value in investing in our technology because they cannot hire 100,000 people to go through and read all of the available material. This matters if you are a bank and trying to make financial trades. This matters for companies doing electronic discovery. This matters for health sectors that need help organizing medical records and guarding against fraud.
We are an emerging firm, growing rapidly and looking to have the best and the brightest join our quest to empower users and customers to make sense of their data through revolutionary software. With the recent investment from In-Q-Tel and partners of Silver Lake, I believe that Digital Reasoning has a great future ahead. We are on the bleeding edge of what is going on with Hadoop and Big Data in the engineering area and how to make sense of data through some of the most advanced learning algorithms in the world. Most of all we care that people are empowered with technology so that they can recover value and time in the race to overcome information overload.
To learn more about Digital Reasoning, navigate to our Web site and download our white paper.
Tim Estes, December 19, 2011
Sponsored by Pandia.com
Inteltrax: Top Stories, December 12 to December 16
December 19, 2011
Inteltrax, the data fusion and business intelligence information service, captured three key stories germane to search this week, specifically, the issue of change in the analytic world—for the better, for the worse and everything in between.
One example of change came from our story “Data Mining Changing Scientific Thought,” which shows how the way scientists think is being streamlined by analytics.
On the other hand, “ManTech has Uphill Climb with Intelligence Analytics,” shows that not all change looks promising, like one company’s new focus on intelligence.
And some change, well, we’re just not sure how it’ll pan out, like with the story “Predicting the Ponies is Just Unstructured Data” which exposes how the gambling industry could be changed by analytic tools. For the better or worse is up for debate.
Change, in any aspect of life, is inevitable. However, the world of big data analytics seems more susceptible than most. And we couldn’t be happier, as we watch the unexpected turns these changes bring to the industry every day.
Follow the Inteltrax news stream by visiting http://www.inteltrax.com/
Patrick Roland, Editor, Inteltrax.