February 3, 2015
Alexander Linden, one of Gartner’s research directors, made some astute observations about advanced analytics and data science technologies. Linden shared his insights with First Post in the article, “Why Should CIOs Consider Advanced Analytics?”
Chief information officers are handling more data and relying on advanced analytics to manage it. The data is critical to gaining market insights, generating more sales, and retaining customers. The old business software cannot handle the overload anymore.
What is astounding is that many companies believe they are already using advanced analytics, when in fact they can improve upon their current methods. Advanced analytics are not an upgraded version of normal, descriptive analytics. They use more problem-solving tools such as predictive and prescriptive analytics.
Gartner also flings out some really big numbers:
“One of Gartner’s new predictions says that through 2017, the number of citizen data scientists will grow five times faster than the number of highly skilled data scientists.”
This is akin to there being more people able to code and create applications than there are skilled engineers with college degrees. There will be a do-it-yourself mentality in the data analytics community, but Gartner stresses that backyard advanced analytics will not cut it. Companies need to continue to rely on skilled data scientists to interpret the data and network it across the business units.
February 2, 2015
I find the complaints about Google’s inability to handle time amusing. On the surface, Google seems to demote, ignore, or just not understand the concept of time. For the vast majority of Google service users, Google is no substitute for the users’ investment of time and effort into dating items. But for the wide, wide Google audience, ads, not time, are more important.
Does Google really get an F in time? The answer is, “Nope.”
In CyberOSINT: Next Generation Information Access I explain that Google’s time sense is well developed and of considerable importance to next generation solutions the company hopes to offer. Why the craw fishing? Well, Apple could just buy Google and make the bitter taste of the Apple Board of Directors’ experience a thing of the past.
Now to temporal matters in the here and now.
CyberOSINT relies on automated collection, analysis, and report generation. In order to make sense of data and information crunched by an NGIA system, time is a really key metatag item. To figure out time, a system has to understand:
- The date and time stamp
- Versioning (previous, current, and future document, data items, and fact iterations)
- Times and dates contained in a structured data table
- Times and dates embedded in content objects themselves; for example, a reference to “last week” or in some cases, optical character recognition of the data on a surveillance tape image.
For the average query, this type of time detail is overkill. The “time and date” of an event, therefore, requires disambiguation, determination and tagging of specific time types, and then capturing the date and time data with markers for document or data versions.
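To make the list above concrete, here is a minimal Python sketch of time tagging. It is not Google's or Recorded Future's method; the `TimeTag` record, the `tag_times` function, the tiny phrase table, and the rule that "last week" means seven days before the document stamp are all assumptions made for illustration. It tags three time types (the document stamp, explicit dates in the text, and relative references) and carries a version marker with each tag.

```python
import re
from dataclasses import dataclass
from datetime import date, datetime, timedelta


@dataclass
class TimeTag:
    kind: str      # "stamp", "explicit", or "relative" (hypothetical labels)
    text: str      # the text that produced the tag
    value: date    # the resolved calendar date
    version: int   # which iteration of the document the tag belongs to


# Assumed, deliberately tiny table of relative phrases and their offsets.
RELATIVE_PHRASES = {
    "yesterday": timedelta(days=1),
    "last week": timedelta(weeks=1),
}

# Only ISO-style dates (YYYY-MM-DD) are recognized in this sketch.
ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def tag_times(text: str, stamp: datetime, version: int = 1) -> list:
    """Return time tags for the document stamp and for dates found in text."""
    tags = [TimeTag("stamp", stamp.isoformat(), stamp.date(), version)]

    # Explicit dates embedded in the content object.
    for match in ISO_DATE.finditer(text):
        year, month, day = map(int, match.groups())
        try:
            resolved = date(year, month, day)
        except ValueError:
            continue  # looks like a date but is not a valid one
        tags.append(TimeTag("explicit", match.group(0), resolved, version))

    # Relative references are disambiguated against the document stamp.
    lowered = text.lower()
    for phrase, offset in RELATIVE_PHRASES.items():
        if phrase in lowered:
            tags.append(TimeTag("relative", phrase, stamp.date() - offset, version))

    return tags


if __name__ == "__main__":
    sample = "The breach was reported last week; the patch shipped 2015-01-20."
    for tag in tag_times(sample, datetime(2015, 2, 2, 9, 30), version=2):
        print(tag)
```

A production system would need far more: time zones, ranges, ambiguous formats, and dates pulled from images. The point of the sketch is only that each time reference gets a type, a resolved value, and a version marker.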
A simplification of Recorded Future’s handling of unstructured data. The system can also handle structured data and a range of other data management content types. Image copyright Recorded Future 2014.
Sounds like a lot of computational and technical work.
In CyberOSINT, I describe Google’s and In-Q-Tel’s investments in Recorded Future, one of the data forward NGIA companies. Recorded Future has wizards who developed the Spotfire system which is now part of the Tibco service. There are Xooglers like Jason Hines. There are assorted wizards from Sweden, countries that most US high school sophomores cannot locate on a map, and assorted veterans of high technology start ups.
An NGIA system delivers actionable information to a human or to another system. Conversely a licensee can build and integrate new solutions on top of the Recorded Future technology. One of the company’s key inventions is numerical recipes that deal effectively with the notion of “time.” Recorded Future uses the name “Tempora” as shorthand for the advanced technology that makes time along with predictive algorithms part of the Recorded Future solution.
January 30, 2015
The article on IBM titled Discover and Use Real-World Terminology with IBM Watson Content Analytics provides an overview to domain-specific terminology through the linguistic facets of Watson Content Analytics. The article begins with a brief reminder that most data, whether in the form of images or texts, is unstructured. IBM’s linguistic analysis focuses on extracting relevant unstructured data from texts in order to make it more useful and usable in analysis. The article details the processes of IBM Watson Content Analytics,
“WCA processes raw text from the content sources through a pipeline of operations that is conformant with the UIMA standard. UIMA (Unstructured Information Management Architecture) is a software architecture that is aimed at the development and deployment of resources for the analysis of unstructured information. WCA pipelines include stages such as detection of source language, lexical analysis, entity extraction… Custom concept extraction is performed by annotators, which identify pieces of information that are expressed as segments of text.”
The main uses of WCA are exploring insights through facets as well as extracting concepts in order to apply WCA analytics. The latter might include excavating lab analysis reports to populate patient records, for example. If any of these functionalities sound familiar, it might not surprise you that IBM bought iPhrase, and much of this article is reminiscent of iPhrase functionality from about 15 years ago.
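The staged pipeline in the quoted passage can be sketched in a few lines of plain Python. This is not the UIMA API and not WCA itself; the `Document` and `Annotation` classes, the stopword check used for language detection, and the lab-value pattern are all hypothetical. It shows only the general shape: each stage receives a document and adds annotations that mark segments of the text.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    kind: str    # e.g. "token" or "lab_value" (hypothetical type names)
    begin: int   # start offset of the text segment
    end: int     # end offset (exclusive)
    text: str    # the covered text


@dataclass
class Document:
    text: str
    language: str = "unknown"
    annotations: list = field(default_factory=list)


def detect_language(doc: Document) -> Document:
    """Toy language detection: look for a few English stopwords."""
    words = set(re.findall(r"[a-z]+", doc.text.lower()))
    doc.language = "en" if words & {"the", "and", "of"} else "unknown"
    return doc


def tokenize(doc: Document) -> Document:
    """Toy lexical analysis: one annotation per run of word characters."""
    for m in re.finditer(r"\w+", doc.text):
        doc.annotations.append(Annotation("token", m.start(), m.end(), m.group(0)))
    return doc


# Assumed custom concept: an analyte name followed by a number and a unit.
LAB_VALUE = re.compile(
    r"\b(glucose|hemoglobin)\s+(\d+(?:\.\d+)?)\s*(mg/dL|g/dL)", re.IGNORECASE
)


def annotate_lab_values(doc: Document) -> Document:
    """Custom concept extraction: mark segments that look like lab results."""
    for m in LAB_VALUE.finditer(doc.text):
        doc.annotations.append(Annotation("lab_value", m.start(), m.end(), m.group(0)))
    return doc


def run_pipeline(text: str, stages) -> Document:
    """Pass one document through each stage in order."""
    doc = Document(text)
    for stage in stages:
        doc = stage(doc)
    return doc


if __name__ == "__main__":
    report = "The glucose 5.4 mg/dL result and the hemoglobin 13.5 g/dL result"
    doc = run_pipeline(report, [detect_language, tokenize, annotate_lab_values])
    print(doc.language)
    print([a.text for a in doc.annotations if a.kind == "lab_value"])
```

The extracted segments are what a downstream step could use to populate a record or build a facet.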
Chelsea Kerwin, January 30, 2015
January 29, 2015
I read “Automated Systems Replacing Traditional Search.” The write up asserts:
Stephen E. Arnold, search industry expert and author of the “Enterprise Search Report” and “The New Landscape of Search,” has announced the publication of “CyberOSINT: Next-Generation Information Access.” The 178-page report explores the tools and methods used to collect and analyze content posted in public channels such as social media sites. The new technology can identify signals that provide intelligence and law enforcement analysts early warning of threats, cyber attacks or illegal activities.
According to Robert Steele, co-founder of USMC Intelligence Activity:
NGIA systems are integrated solutions that blend software and hardware to address very specific needs. Our intelligence, law enforcement, and security professionals need more than brute force keyword search.
According to Dr. Jerry Lucas, president of Telestrategies, which operates law enforcement and training conferences in the US and elsewhere:
This is the first discussion of the innovative software that makes sense of the flood of open source digital information. Law enforcement, security, and intelligence professionals will find this an invaluable resource to identify ways to deal with Big Data.
The report complements the Telestrategies ISS seminar on CyberOSINT. Orders for the monograph, which costs $499, may be placed at www.xenky.com/cyberosint. Information about the February 19, 2015, seminar held in the DC area is at this link.
The software and methods described in the study have immediate and direct applications to commercial entities. Direct orders may be placed at http://gum.co/cyberosint.
Don Anderson, January 29, 2015
January 27, 2015
If you need help finding file analysis solutions, Nieuwsbank published a press release that might help with your research, “Gartner ZyLAB in ‘The File Analysis Market Guide, 2014.’” File analysis examines stored files and the users who have access to them. It is used to manage intellectual property, keep personal information private, and protect data. File analysis solutions have many benefits for companies, including reducing business risks, reducing costs, and increasing operational efficiency.
The guide provides valuable insights:
“In the Market Guide Gartner writes: ZyLAB enter this market from the perspective of and with a legacy in eDiscovery. The company has a strong presence in the legal community and is widely used by governments and organizations in the field of law enforcement. ZyLAB emphasis on activities such as IP protection and detection, fraud investigations, eDiscovery and responsibly removal of redundant data. ZyLAB supports storage types 200 and 700 file formats in 450 languages. This makes it a good choice for international companies.”
ZyLAB is a respected eDiscovery and information risk management solutions company and this guide is a compilation of their insights. The article points out that companies might have their own file analysis manuals, but few actually enforce their policies or monitor violations. Gartner is a leading market research firm, and its endorsement should be all you need to use this guide.
January 19, 2015
Watson has been going to town in different industries, putting to use its massive artificial brain. It has been working in the medical field interpreting electronic medical record data. According to Open Health News, IBM has used its technology in other medical ways: “IBM Research Scientists Investigate Use Of Cognitive Computing-Based Visual Analytics For Skin Cancer Image Analysis.”
IBM partnered with Memorial Sloan Kettering to use cognitive computing to analyze dermatological images to help doctors identify cancerous states. The goal is to help doctors detect cancer earlier. Skin cancer is the most common type of cancer in the United States, but diagnostics expertise varies. It takes experience to be able to detect cancer, but cognitive computing might take out some of the guess work.
“Using cognitive visual capabilities being developed at IBM, computers can be trained to identify specific patterns in images by gaining experience and knowledge through analysis of large collections of educational research data, and performing finely detailed measurements that would otherwise be too large and laborious for a doctor to perform. Such examples of finely detailed measurements include the objective quantification of visual features, such as color distributions, texture patterns, shape, and edge information.”
IBM is already a leader in visual analytics, and the new skin cancer project reports 97% sensitivity and 95% specificity in preliminary tests. In other words, the system caught nearly all of the cancerous images and cleared nearly all of the benign ones.
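For readers who want to see what sensitivity and specificity measure, here is a small Python sketch. The counts are invented for illustration and are not IBM's test data; they are chosen only so the results come out at the reported 97% and 95%.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Share of actual cancer cases that the system flags as cancer."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives: int, false_positives: int) -> float:
    """Share of benign cases that the system correctly clears."""
    return true_negatives / (true_negatives + false_positives)


if __name__ == "__main__":
    # Hypothetical test set: 100 cancerous images and 200 benign images.
    tp, fn = 97, 3     # cancerous images flagged / missed
    tn, fp = 190, 10   # benign images cleared / wrongly flagged

    print(f"sensitivity: {sensitivity(tp, fn):.0%}")
    print(f"specificity: {specificity(tn, fp):.0%}")
```

Neither figure says how often a flagged image is actually cancer. That depends on how common cancer is among the images being screened.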
Could the cognitive computing be applied to identifying other cancer types?
January 18, 2015
Short honk: Value is in the eye of the beholder. I am reminded of this each time I see an odd ball automobile sell for six figures on the Barrett Jackson auction.
Navigate to “Palantir Raising More Money After Tagged With $15 Billion Valuation.” Keep in mind that you may have to pay to view the article, or you can check out the apparently free link to source data at http://bit.ly/KKOAw1.
The key point is that Palantir is an NGIA system. Obviously it appears on the surface to have more “value” than Hewlett Packard’s Autonomy or the other content processing companies in the hunt for staggering revenues.
Stephen E Arnold, January 18, 2015
January 15, 2015
January 9, 2015
The article on ZyLAB titled Looking Ahead to 2015 sums up the latest areas of focus at the end of one year and the beginning of the next. Obviously security is at the top of the list. According to the article, security breaches grew 43% in 2014. We assume Sony would be the first to agree that security is of the utmost importance to most companies. The article goes on to predict that audio data will be increasingly important as evidence,
“Audio evidence brings many challenges. For example, the review of audio evidence can be more labor intensive than other types of electronically stored information because of the need to listen not only to the words but also take into consideration tone, expression and other subtle nuances of speech and intonation…As a result, the cost of reviewing audio evidence can quickly become prohibitive and with only a proportional of the data relevant in most cases.”
The article also briefly discusses various data sources, data analytics and information governance in their prediction of the trends for 2015. The article makes a point of focusing on the growth of data and types of data sources, which will hopefully coincide with an improved ability to discover the sort of insights that companies desire.
Chelsea Kerwin, January 09, 2015
December 23, 2014
The article on WMC Action News 5 titled Centrifuge Analytics v3 is Now Available- Large Scale Data Discovery Never Looked Better promotes the availability of Centrifuge Analytics v3, a product that enables users to see the results of their data analysis like never before. This intuitive, efficient tool helps users dig deeper into the meaning of their data. Centrifuge Systems has gained a reputation in data discovery software, particularly in the fields of cyber security, counter-terrorism, homeland defense, and financial crimes analysis among others. Chief Executive Officer Simita Bose is quoted in the article,
“Centrifuge exists to help customers with critical missions, from detecting cyber threats to uncovering healthcare fraud…Centrifuge Analytics v3 is an incredibly innovative product that represents a breakthrough for big data discovery.” “Big data is here to stay and is quickly becoming the raw material of business,” says Stan Dushko, Chief Product Officer at Centrifuge Systems. “Centrifuge Analytics v3 allows users to answer the root cause and effect questions to help them take the right actions.”
The article also lists several of the perks of Centrifuge Analytics v3, including that it is easy to deploy in multiple settings from a laptop to the cloud. It also offers powerful visuals in a fully integrated background that is easy for users to explore, and even add to if source data is complete. This may be an answer for companies who have all the big data they need, but don’t know what it means.
Chelsea Kerwin, December 23, 2014