January 30, 2015
The article on IBM titled Discover and Use Real-World Terminology with IBM Watson Content Analytics provides an overview of domain-specific terminology through the linguistic facets of Watson Content Analytics. The article begins with a brief reminder that most data, whether in the form of images or texts, is unstructured. IBM’s linguistic analysis focuses on extracting relevant unstructured data from texts in order to make it more useful and usable in analysis. The article details the processes of IBM Watson Content Analytics:
“WCA processes raw text from the content sources through a pipeline of operations that is conformant with the UIMA standard. UIMA (Unstructured Information Management Architecture) is a software architecture that is aimed at the development and deployment of resources for the analysis of unstructured information. WCA pipelines include stages such as detection of source language, lexical analysis, entity extraction… Custom concept extraction is performed by annotators, which identify pieces of information that are expressed as segments of text.”
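The staged pipeline described in the quote can be pictured with a small sketch. This is not the UIMA API or WCA code; every name below (`Annotation`, `run_pipeline`, `LAB_TERMS`, and the three stage functions) is hypothetical and chosen only to show how annotators attach labels to segments of raw text as it passes from stage to stage.

```python
import re
from dataclasses import dataclass


@dataclass
class Annotation:
    kind: str        # "language", "token" or "concept"
    begin: int       # start offset in the raw text
    end: int         # end offset (exclusive)
    text: str        # the segment of text the annotation covers
    label: str = ""  # optional value, e.g. the detected language code


def detect_language(text, annotations):
    # Toy stand-in for language detection: look for a few English stopwords.
    words = set(re.findall(r"[a-z]+", text.lower()))
    lang = "en" if words & {"the", "and", "of", "was"} else "unknown"
    annotations.append(Annotation("language", 0, len(text), text, lang))


def tokenize(text, annotations):
    # Toy lexical analysis: one annotation per run of word characters.
    for m in re.finditer(r"\w+", text):
        annotations.append(Annotation("token", m.start(), m.end(), m.group()))


# Hypothetical domain dictionary for a custom concept annotator.
LAB_TERMS = ["glucose", "hemoglobin"]


def extract_concepts(text, annotations):
    # Mark every occurrence of a dictionary term as a "concept" segment.
    lowered = text.lower()
    for term in LAB_TERMS:
        for m in re.finditer(re.escape(term), lowered):
            segment = text[m.start():m.end()]
            annotations.append(Annotation("concept", m.start(), m.end(), segment))


def run_pipeline(text, stages):
    # Each stage reads the raw text and appends to a shared annotation list.
    annotations = []
    for stage in stages:
        stage(text, annotations)
    return annotations


report = "The glucose level was normal and hemoglobin was low."
result = run_pipeline(report, [detect_language, tokenize, extract_concepts])
concepts = [a.text for a in result if a.kind == "concept"]
print(concepts)
```

In a real UIMA deployment the shared annotation list would be the framework's own analysis structure, and the annotators would be configured components rather than plain functions.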
The main uses of WCA are exploring insights through facets as well as extracting concepts in order to apply WCA analytics. The latter might include excavating lab analysis reports to populate patient records, for example. If any of these functionalities sound familiar, it might not surprise you that IBM bought iPhrase, and much of this article is reminiscent of iPhrase functionality from about 15 years ago.
Chelsea Kerwin, January 30, 2015
January 29, 2015
I read “Automated Systems Replacing Traditional Search.” The write up asserts:
Stephen E. Arnold, search industry expert and author of the “Enterprise Search Report” and “The New Landscape of Search,” has announced the publication of “CyberOSINT: Next-Generation Information Access.” The 178-page report explores the tools and methods used to collect and analyze content posted in public channels such as social media sites. The new technology can identify signals that provide intelligence and law enforcement analysts early warning of threats, cyber attacks or illegal activities.
According to Robert Steele, co-founder of USMC Intelligence Activity:
NGIA systems are integrated solutions that blend software and hardware to address very specific needs. Our intelligence, law enforcement, and security professionals need more than brute force keyword search.
According to Dr. Jerry Lucas, president of Telestrategies, which operates law enforcement and training conferences in the US and elsewhere:
This is the first discussion of the innovative software that makes sense of the flood of open source digital information. Law enforcement, security, and intelligence professionals will find this an invaluable resource to identify ways to deal with Big Data.
The report complements the Telestrategies ISS seminar on CyberOSINT. Orders for the monograph, which costs $499, may be placed at www.xenky.com/cyberosint. Information about the February 19, 2015, seminar held in the DC area is at this link.
The software and methods described in the study have immediate and direct applications to commercial entities. Direct orders may be placed at http://gum.co/cyberosint.
Don Anderson, January 29, 2015
January 27, 2015
If you need help finding file analysis solutions, Nieuwsbank published a press release that might help you with your research: “Gartner ZyLAB in ‘The File Analysis Market Guide, 2014.’” File analysis examines stored files and which users have access to them. It is used to manage intellectual property, keep personal information private, and protect data. File analysis solutions have many benefits for companies, including reducing business risk, reducing costs, and increasing operational efficiency.
The guide provides valuable insights:
“In the Market Guide Gartner writes: ZyLAB enter this market from the perspective of and with a legacy in eDiscovery. The company has a strong presence in the legal community and is widely used by governments and organizations in the field of law enforcement. ZyLAB emphasis on activities such as IP protection and detection, fraud investigations, eDiscovery and responsibly removal of redundant data. ZyLAB supports storage types 200 and 700 file formats in 450 languages. This makes it a good choice for international companies.”
ZyLAB is a respected eDiscovery and information risk management solutions company, and this guide is a compilation of its insights. The article points out that companies might have their own file analysis manuals, but few actually enforce their policies or monitor violations. Gartner is a leading market research firm, and its endorsement should be all you need to use this guide.
January 19, 2015
Watson has been going to town in different industries, putting to use its massive artificial brain. It has been working in the medical field interpreting electronic medical record data. According to Open Health News, IBM has used its technology in other medical ways: “IBM Research Scientists Investigate Use Of Cognitive Computing-Based Visual Analytics For Skin Cancer Image Analysis.”
IBM partnered with Memorial Sloan Kettering to use cognitive computing to analyze dermatological images to help doctors identify cancerous states. The goal is to help doctors detect cancer earlier. Skin cancer is the most common type of cancer in the United States, but diagnostics expertise varies. It takes experience to be able to detect cancer, but cognitive computing might take out some of the guesswork.
“Using cognitive visual capabilities being developed at IBM, computers can be trained to identify specific patterns in images by gaining experience and knowledge through analysis of large collections of educational research data, and performing finely detailed measurements that would otherwise be too large and laborious for a doctor to perform. Such examples of finely detailed measurements include the objective quantification of visual features, such as color distributions, texture patterns, shape, and edge information.”
IBM is already a leader in visual analytics, and the new skin cancer project showed 97% sensitivity and 95% specificity in preliminary tests. That is, the system correctly flagged 97% of the cancerous cases and correctly cleared 95% of the non-cancerous ones.
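For readers who want the arithmetic behind those two rates, here is a minimal sketch. The counts are invented for illustration and are not IBM's test data; they are simply chosen so the ratios come out to the reported 97% and 95%.

```python
def sensitivity(true_pos, false_neg):
    # Share of actual cancer cases that the system flagged.
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    # Share of non-cancer cases that the system correctly cleared.
    return true_neg / (true_neg + false_pos)


# Hypothetical test set: 100 cancerous images and 200 benign images.
tp, fn = 97, 3     # cancerous images: flagged vs. missed
tn, fp = 190, 10   # benign images: cleared vs. falsely flagged

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(f"sensitivity={sens:.0%} specificity={spec:.0%}")
```

Neither number alone says how often a flagged image is actually cancerous; that also depends on how common cancer is among the images being screened.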
Could the cognitive computing be applied to identifying other cancer types?
January 18, 2015
Short honk: Value is in the eye of the beholder. I am reminded of this each time I see an oddball automobile sell for six figures at the Barrett-Jackson auction.
Navigate to “Palantir Raising More Money After Tagged With $15 Billion Valuation.” Keep in mind that you may have to pay to view the article, or you can check out the apparently free link to source data at http://bit.ly/KKOAw1.
The key point is that Palantir is an NGIA system. Obviously it appears on the surface to have more “value” than Hewlett Packard’s Autonomy or the other content processing companies in the hunt for staggering revenues.
Stephen E Arnold, January 18, 2015
January 15, 2015
January 9, 2015
The article on ZyLab titled Looking Ahead to 2015 sums up the latest areas of focus at the end of one year and the beginning of the next. Obviously security is at the top of the list. According to the article, security breaches grew 43% in 2014. We assume Sony would be the first to agree that security is of the utmost importance to most companies. The article goes on to predict that audio data will become increasingly important as evidence:
“Audio evidence brings many challenges. For example, the review of audio evidence can be more labor intensive than other types of electronically stored information because of the need to listen not only to the words but also take into consideration tone, expression and other subtle nuances of speech and intonation…As a result, the cost of reviewing audio evidence can quickly become prohibitive and with only a proportional of the data relevant in most cases.”
The article also briefly discusses various data sources, data analytics and information governance in their prediction of the trends for 2015. The article makes a point of focusing on the growth of data and types of data sources, which will hopefully coincide with an improved ability to discover the sort of insights that companies desire.
Chelsea Kerwin, January 09, 2015
December 23, 2014
The article on WMC Action News 5 titled Centrifuge Analytics v3 is Now Available- Large Scale Data Discovery Never Looked Better promotes the availability of Centrifuge Analytics v3, a product that enables users to see the results of their data analysis like never before. This intuitive, efficient tool helps users dig deeper into the meaning of their data. Centrifuge Systems has gained a reputation in data discovery software, particularly in the fields of cyber security, counter-terrorism, homeland defense, and financial crimes analysis among others. Chief Executive Officer Simita Bose is quoted in the article,
“Centrifuge exists to help customers with critical missions, from detecting cyber threats to uncovering healthcare fraud…Centrifuge Analytics v3 is an incredibly innovative product that represents a breakthrough for big data discovery.” “Big data is here to stay and is quickly becoming the raw material of business,” says Stan Dushko, Chief Product Officer at Centrifuge Systems. “Centrifuge Analytics v3 allows users to answer the root cause and effect questions to help them take the right actions.”
The article also lists several of the perks of Centrifuge Analytics v3, including that it is easy to deploy in multiple settings from a laptop to the cloud. It also offers powerful visuals in a fully integrated background that is easy for users to explore, and even add to if source data is complete. This may be an answer for companies that have all the big data they need, but don’t know what it means.
Chelsea Kerwin, December 23, 2014
December 22, 2014
The folks at Google may have the answer for the dearth of skilled data analysts out there. Unfortunately for our continuing job crisis, that answer does not lie in (human) training programs. Google Research Blog discusses “Automatically Making Sense of Data.” Writers Kevin Murphy and David Harper ask:
“What if one could automatically discover human-interpretable trends in data in an unsupervised way, and then summarize these trends in textual and/or visual form? To help make progress in this area, Professor Zoubin Ghahramani and his group at the University of Cambridge received a Google Focused Research Award in support of The Automatic Statistician project, which aims to build an ‘artificial intelligence for data science’.”
Trends in time-series data have thus far provided much fodder for the team’s research. The article details an example involving solar-irradiance levels over time, and discusses modeling the data using Gaussian process models. Murphy and Harper report on the Cambridge team’s progress:
“Prof Ghahramani’s group has developed an algorithm that can automatically discover a good kernel, by searching through an open-ended space of sums and products of kernels as well as other compositional operations. After model selection and fitting, the Automatic Statistician translates each kernel into a text description describing the main trends in the data in an easy-to-understand form.”
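To make “sums and products of kernels” concrete, here is a rough sketch of the idea, not the Automatic Statistician’s actual code. It scores three base kernels and their pairwise sums and products by Gaussian process log marginal likelihood on made-up data and keeps the best one. The data, the fixed hyperparameters, the noise level, and all function names are assumptions for illustration; the real system, per the quote, searches an open-ended space and fits each model rather than trying a fixed list.

```python
import itertools
import math
import random


# Base kernels on scalar inputs; hyperparameters are fixed for brevity.
def rbf(a, b):
    return math.exp(-0.5 * (a - b) ** 2)


def periodic(a, b):
    return math.exp(-2.0 * math.sin(math.pi * (a - b)) ** 2)  # period 1


def linear(a, b):
    return a * b


BASE = {"RBF": rbf, "PER": periodic, "LIN": linear}


def add(k1, k2):
    return lambda a, b: k1(a, b) + k2(a, b)


def mul(k1, k2):
    return lambda a, b: k1(a, b) * k2(a, b)


def cholesky(A):
    # Lower-triangular L with L L^T = A, for a positive-definite matrix A.
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def log_marginal_likelihood(kernel, xs, ys, noise=0.1):
    n = len(xs)
    K = [[kernel(a, b) + (noise ** 2 if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    L = cholesky(K)
    # Forward substitution solves L z = y, so y^T K^-1 y equals z^T z.
    z = []
    for i in range(n):
        z.append((ys[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    data_fit = -0.5 * sum(v * v for v in z)
    complexity = -sum(math.log(L[i][i]) for i in range(n))  # -0.5 * log det K
    return data_fit + complexity - 0.5 * n * math.log(2 * math.pi)


# Made-up time series: a linear trend plus a periodic component plus noise.
rng = random.Random(0)
xs = [i * 0.15 for i in range(30)]
ys = [0.5 * x + math.sin(2 * math.pi * x) + rng.gauss(0, 0.1) for x in xs]

# Candidate set: each base kernel, plus every pairwise sum and product.
candidates = dict(BASE)
for (n1, k1), (n2, k2) in itertools.combinations(BASE.items(), 2):
    candidates[f"{n1} + {n2}"] = add(k1, k2)
    candidates[f"{n1} * {n2}"] = mul(k1, k2)

scores = {name: log_marginal_likelihood(k, xs, ys)
          for name, k in candidates.items()}
best = max(scores, key=scores.get)
print(best, round(scores[best], 1))
```

The winning kernel’s structure is what would then be turned into prose, for example a sum containing a periodic term becoming a sentence about a repeating pattern on top of a trend.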
Naturally, the team is going on to work with other kinds of data. We wonder—have they tried it on Google Glass market projections?
There’s a simplified version available for demo at the project’s website, and an expanded version should be available early next year. See the write-up for the technical details.
Cynthia Murrell, December 22, 2014
December 18, 2014
A smaller big data sector that specializes in text analysis to generate content and reports is burgeoning with startups. Venture Beat takes a look at how one of these startups, Narrative Science, is gaining more attention in the enterprise software market: “Narrative Science Pulls In $10M To Analyze Corporate Data And Turn It Into Text-Based Reports.”
Narrative Science started out with software that created sports and basic earnings articles for newspaper filler. It has since grown into helping businesses in different industries take their data by the digital horns and leverage it.
Narrative Science recently received $10 million in funding to further develop its software. Stuart Frankel, chief executive, is driven to help all industries save time and resources by better understanding their data:
“ ‘We really want to be a technology provider to those media organizations as opposed to a company that provides media content,’ Frankel said… ‘When humans do that work…it can take weeks. We can really get that down to a matter of seconds.’”
From making content to providing technology? It is quite a leap for Narrative Science. While they appear to have a good product, what exactly do they do?