October 5, 2015
Humans are sight-based creatures. When faced with a chunk of text or a series of sequential pictures, they will more likely scan the pictures for information than read. With the big data revolution, one of the hardest problems analytics platforms have dealt with is how best to present data so that users can act on it. Visual analytics is the key, but one visual analytics tool is not the same as another. DCInno explains that one data visualization company stands out from the rest in the article, “How The Reston Startup Makes Everyone A Big Data Expert.”
Zoomdata likes to think of itself as the one visual data company that gives its clients a leg up over others, and it goes about it in layman’s terms.
“Zoomdata has been offering businesses and organizations a way to see data in ways more useful than a spreadsheet since it was founded in 2012. Its software offers real-time and historical explorations of data streams, integrating multiple sources into a cohesive whole. This makes the analytics far more accessible than they are in raw form, and allows a layperson to better understand what the numbers are saying without needing a degree in mathematics or statistics.”
Zoomdata offers a very interactive platform and is described as the only one of its kind on the market. Its clients range from government agencies, such as the Library of Congress, to private companies. Zoomdata does not want to be pigeonholed as a government analytics startup. Its visual data platform can be used in any industry and by anyone, with the goal of visual data analytics for the masses. Zoomdata has grown tremendously, tripled its staff, and raised $22.2 million in funding.
Now let us sit back and see how their software is implemented in various industries. I wonder if they could make a visual analytics graphic novel?
Whitney Grace, October 5, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
October 3, 2015
I read a listicle called “Ten Top Languages for Crunching Big Data.” The list is interesting but the underlying assumption about the languages and “crunching” Big Data was remarkable.
The core of the write up is a list of 10 programming languages which make it possible (maybe semi easy) to “generate insights.” The list has some old familiar programming languages; for example, SQL or structured query language. There’s the graduate student in psychology fave SAS. Some might argue that SPSS Clem is the way to chop Big Data down to size. There is a toolkit in the list. Remember Matlab, which for a “student” is only $49. For the sportier crowd, I would add Mathematica to the list, but I don’t want to melt the listicle.
Also on the list are Python and R. Both get quite a bit of love from some interesting cyber OSINT outfits.
For fans of Java, the list points to Scala. The open source fan can use HiveQL, Julia, or Pig Latin.
The listicle includes a tip of the hat to Alphabet Google. According to the write up:
Go has been developed by Google and released under an open source license. Its syntax is based on C, meaning many programmers will be familiar with it, which has aided its adoption. Although not specifically designed for statistical computing, its speed and familiarity, along with the fact it can call routines written in other languages (such as Python) to handle functions it can’t cope with itself, means it is growing in popularity for data programming.
Yep, a goodie from the GOOG spells Big Data magic. For how long? Well, I don’t know.
However, the assumption from which the listicle hangs is that a programming language allows Big Data to be crunched.
There may be a couple of “trivial” intermediary steps required. Let me mention one. The Big Data cruncher has to code up something to get useful outputs. Now that “code up” step may require some other bothersome tasks; for example, dealing with messy data to ensure that the garbage in, garbage out problem does not arise. The mathematically inclined may suggest that the coded up “script” actually work within available computer time and memory resources. Wow, that might make a script to crunch Big Data either not work or output results which are dead wrong. What if the script implements algorithmic bias?
Whoa, whoa, Nellie.
I know that programming languages are important. But some other tasks deserve attention in my experience.
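To make one of those other tasks concrete, here is a minimal sketch in Python (my pick for illustration; the listicle supplies no code). It assumes a hypothetical CSV feed with a column named value, throws out records that will not parse as finite numbers, and keeps only running totals so memory use stays flat no matter how many rows show up. The function name clean_mean and the sample rows are invented.

```python
import csv
import io
import math


def clean_mean(lines, column="value"):
    """Stream CSV rows, skip unusable values, return (mean, kept, dropped)."""
    reader = csv.DictReader(lines)
    total = 0.0
    kept = 0
    dropped = 0
    for row in reader:
        # Missing fields come back as None; treat them like empty strings.
        raw = (row.get(column) or "").strip()
        try:
            number = float(raw)
        except ValueError:
            dropped += 1  # text such as "n/a" or an empty cell
            continue
        if math.isnan(number) or math.isinf(number):
            dropped += 1  # parses as a float but is useless in a mean
            continue
        total += number
        kept += 1
    mean = total / kept if kept else None
    return mean, kept, dropped


# Hypothetical messy input: half of the six rows are garbage.
sample = io.StringIO("id,value\n1,10\n2,n/a\n3, 20 \n4,\n5,NaN\n6,30\n")
mean, kept, dropped = clean_mean(sample)
print(mean, kept, dropped)
```

Counting the dropped rows matters as much as the mean. A result computed from half the input is a different animal from one computed from all of it, and no programming language makes that call for you.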
Stephen E Arnold, October 3, 2015
September 30, 2015
I read “Two Great Visualizations about Data Science.” There is not too much reading involved. The article provides images of two graphics. The more interesting is “another nice picture about the history of big data and data science.”
Note that in the 2010 column, the separate lines of “technology” have converged into what looks to me like a fur ball.
The diagram captures several important ideas.
First, note that Bayes and Bayesian methods have some continuity. Other numerical approaches are important, but Bayes has created the equivalent of Gorilla Glue.
Second, progress, particularly after 1990, seems to point to visualization. This is, for me, similar to judges awarding a cake with nice looking icing a blue ribbon without tasting the baker’s confection. Appearances are more important than substance.
Third, the end point of the diagram is a circular image which looks like a 1950s atomic diagram from the old Atomic Energy Forum. I think the image looks like a darned confusing diagram.
I think data science and Big Data are more confusing than they were in 2010. The eccentric orbits are becoming more distorted.
Stephen E Arnold, September 30, 2015
September 29, 2015
Quote to note: Want to do data sharing? Want to offer federated search? Want Spark to ignite your insights?
Before embracing these ideas, you may find “Researcher Examines Complexities of Data-Sharing in Four Research Projects” providing some not well publicized insights. I know those self appointed experts and mid tier consultants do their best to be objective and detail oriented, but some information slips between the cracks in their scientific processes.
“Having the right data is usually better than having more data; little data can be just as important as big data.”
The person allegedly making this statement labors at UCLA’s Center for Knowledge Infrastructures.
Stephen E Arnold, September 28, 2015
September 26, 2015
One of my two or three readers sent me a link to “Rethinking Enterprise Search for the Big Data Age.” The write up explains that old-school search won’t do the trick in today’s digital content environment.
I learned that the Maana Search and Discovery Platform is built on a modern Hadoop stack that leverages HDFS, the Accumulo graph database, Apache Spark, heaps of Scala code, and a host of various machine learning algorithms for teasing knowledge out of reams of unstructured data.
The write up veers into a swamp I try to avoid. I am not sure what knowledge is, and I have a heck of a time figuring out how data becomes information. The knowledge part is a mystery for brighter “sparks” to pursue.
The Maana system is a “search and discovery platform.” The write up quotes a Mr. Thompson who explains:
“You can tell Maana, ‘I want to know all pieces of equipment that have led to most unplanned downtime,’” Thompson says. After telling it to look in the Gulf and entering the appropriate EQP code, the system returns a histogram of pieces of equipment with the most amount of downtime. “So you get very quickly through a simple search and filtering operation a visual representation of the underlying data.”
The magic is that the system:
can join multiple disparate data sets and enable users to search and discover data across them in a semantic method. “It’s very simple to navigate the entire information space, which may be being fed from many different sources simultaneously,” Thompson says. “But you’re working at level of domain concepts.”
Okay, a modern version of a federating system with clustering, correlation, classification, data mining, and semantic features.
The open source software issue is an interesting one. The write up points out that Maana relies on Apache Spark. However, I did a quick memory refresh on the Maana Web site, which states that the system is not based on Lucene/Solr.
The company is backed by ConocoPhillips, Chevron, Frost Data Capital, and GE Ventures. I also noticed that Intel has a stake in the company. Intel, in my opinion, continues to explore content processing. After the company’s adventure (maybe misadventure) with Convera (formerly Excalibur Technologies), Intel took a stake in Endeca. Endeca sold itself to Oracle and Intel has obviously moved on to Maana.
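For readers who want to see what a “search and filtering operation” that ends in a histogram boils down to, here is a rough sketch in Python. This is not Maana’s code or API; the write up shows neither. The two record lists, the field names (eqp, region, hours, planned), and the function downtime_histogram are all invented for illustration. The sketch joins two separate sources on an equipment code, filters by region, and ranks equipment by unplanned downtime.

```python
from collections import Counter

# Hypothetical source 1: an equipment registry.
equipment = [
    {"eqp": "P-101", "region": "Gulf"},
    {"eqp": "C-201", "region": "Gulf"},
    {"eqp": "V-301", "region": "Gulf"},
    {"eqp": "P-102", "region": "North Sea"},
]

# Hypothetical source 2: a downtime log kept in a different system.
downtime_events = [
    {"eqp": "P-101", "hours": 12.0, "planned": False},
    {"eqp": "C-201", "hours": 30.5, "planned": False},
    {"eqp": "P-101", "hours": 4.0, "planned": True},
    {"eqp": "V-301", "hours": 2.5, "planned": False},
    {"eqp": "P-102", "hours": 50.0, "planned": False},
    {"eqp": "P-101", "hours": 8.0, "planned": False},
]


def downtime_histogram(equipment, events, region):
    """Join the two sources on 'eqp' and total unplanned hours per unit."""
    in_region = {item["eqp"] for item in equipment if item["region"] == region}
    hours = Counter()
    for event in events:
        if event["planned"] or event["eqp"] not in in_region:
            continue  # ignore planned outages and other regions
        hours[event["eqp"]] += event["hours"]
    return hours.most_common()  # worst offenders first


ranked = downtime_histogram(equipment, downtime_events, "Gulf")
for eqp, total in ranked:
    # One '#' per 2.5 hours gives a crude text histogram.
    print(f"{eqp:6} {'#' * round(total / 2.5)} {total}")
```

The join and the filter are the easy part. Getting two real systems to agree on what an equipment code means is where the work hides.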
Will the LucidWorks approach to Big Data capture customers who want to make sense of Big Data? Will Elasticsearch make inroads? My hunch is that Big Data will come under the influence of the systems built to deal with flows of real time data from disparate sources, including audio and video. Most of these firms use open source search and retrieval tools as a utility.
Maana appears to be positioning itself to be a key player in Big Data access. I will wait to see which horses make it to the finish line.
Stephen E Arnold, September 26, 2015
September 22, 2015
Exalead is Dassault Systèmes’s big data software targeted specifically at businesses. Exalead offers innovative data discovery and analytics solutions to manage information in real time across various servers and generate insightful reports to make better, faster decisions. It is the big data solution of choice for many businesses across various industries. The Exalead blog shares that “PricewaterhouseCoopers Is Launching Its Information Management Application, Based on Exalead CloudView.”
PricewaterhouseCoopers (PwC) analyzed the amount of time users spent trying to locate, organize, and disseminate information. When users spend time on information management, they lose two valuable resources: time and money. PwC designed Pulse, a search and information tool, as a solution to the problem.
“The EXALEAD CloudView software solution from Dassault Systèmes facilitates the rapid search and use of sources of structured and unstructured information. In existence since 2007, this enterprise information management concept was integrated initially in other software applications. Since it was reworked as EXALEAD CloudView, the configuration of the queries has become easier and they are processed much faster. Furthermore, the results of the searches are more precise, significantly reducing the number of duplicates and the time wasted managing them. PwC has deliberately decided to roll out Pulse on an international scale gradually, in order to generate plenty of enthusiasm amongst users. A business case is prepared for each country on the basis of its needs, the benefits and the potential savings. PwC also intends to make the content in Pulse accessible by other internal systems (e.g., the project workspaces), to integrate the sources, and to make the search function even smarter.”
Pulse is supposed to cut costs so that the resources can be reinvested in more fruitful avenues. One interesting aspect to note is that PwC did not build the Pulse upgrade; Exalead provided the plumbing.
September 21, 2015
I read “Big Data Falls Off the Hype Cycle.” Fascinating. A term without definition has sparked ruminations about why a mid tier consulting firm does not define Big Data as hyperbole.
The write up states:
“Big Data” joins other trends dropped into obscurity this year including: decision management, autonomous vehicles, prediction markets, and in-memory analytics. Why are terms dropped?
The article scoots forward to answer this question. The answers, for those of you familiar with a multiple choice test, include:
Sometimes because they are too obvious. For example in-memory analytics was dropped because no one was actually pursuing out-of-memory analytics. Autonomous vehicles because “it will not impact even a tiny fraction of the intended audience in its day-to-day jobs”. Some die and are forgotten because they are deemed to have become obsolete before they could grow to maturity. And Big Data, well, per Gartner “data is the key to all of our discussion, regardless of whether we call it “big data” or “smart data.” We know we have to care, so it is moot to make an extra point of it here.”
The write up then offers:
When I first took a stab at making a definition I concluded that Big Data was really more about a new technology in search of a problem to solve. That technology was NoSQL DBs and it could solve problems in all three of those Vs. Maybe we should have just called it NoSQL and let it go at that. Not to worry. I’m sure that calling things “Big Data” will stick around for a long time even if Gartner wants us not to.
I have a different take. My hunch is that the hype cycle is a marketing and lead generation vehicle for a mid tier consulting firm. When the leads no longer flow and the “objective studies” no longer sell, a fresh approach is needed.
Big Data as a concept is no longer hype. That’s reassuring. Perhaps progress is retarded by buzzwords, jargon, and thrashing for revenues?
Stephen E Arnold, September 21, 2015
September 21, 2015
Have you heard the one about how dark data hides within an organization’s servers and holds potential business insights? Wait, you did not? Then where have you been for the past three years? Datameer posted an SEO heavy post on its blog called, “Shine Light On Dark Data.” The post features the same redundant song and dance about how dark data retained on servers holds valuable customer trends and business patterns that can put organizations ahead of the competition.
One new fact is presented: IDC reports that 90% of digital data is dark. That is a very interesting fact and spurs information specialists to action to get a big data plan in place, but then we are fed this tired explanation:
“This dark data may come in the form of machine or sensor logs that when analyzed help predict vacated real estate or customer time zones that may help businesses pinpoint when customers in a specific region prefer to engage with brands. While the value of these insights are very significant, setting foot into the world of dark data that is unstructured, untagged and untapped is daunting for both IT and business users.”
The post ends on some less than thorough advice to create an implementation plan. There are other guides on the Internet that better prepare a person to create a big data action guide. The post’s only purpose is to serve as a search engine bumper for Datameer. While Datameer is one of the leading big data software providers, one would think they wouldn’t post a “dark data definition” post this late in the game.
September 15, 2015
French semantic tech firm Mondeca has their own research arm, Mondeca Labs. Their website seems to be going for a playful, curiosity-fueled vibe. The intro states:
“Mondeca Labs is our sandbox: we try things out to illustrate the potential of Semantic Web technologies and get feedback from the Semantic Web community. Our credibility in the Semantic Web space is built on our contribution to international standards. Here we are always looking for new challenges.”
The page links to details on several interesting projects. One entry we noticed right away is for an inference engine; they say it is “coming soon,” but a mouse click reveals that no info is available past that hopeful declaration. The site does supply specifics about other projects; some notable examples include linked open vocabularies, a SKOS reader, and a temporal search engine. See their home page, above, for more.
Established in 1999, Mondeca has delivered pragmatic semantic solutions to clients in Europe and North America for over 15 years. The firm is based in Paris, France.
Cynthia Murrell, September 15, 2015
September 15, 2015
The Internet is a hotbed for crime and its perpetrators, and Europol is one of the main organizations that fights it head on. One of the problems that Europol faces is the lack of communication between law enforcement agencies and private industry. In a landmark agreement that will most likely be followed by others, The Inquirer reports “Europol and FireEye Have Aligned To Fight The International Cyber Menace.”
FireEye and Europol have signed a Memorandum of Understanding (MoU) under which they will exchange information, so law enforcement agencies and private industry will be able to share information in an effort to fight the growing prevalence of cyber crime. Europol is usually the only organization that disseminates information across law enforcement agencies. FireEye is eager to help open the communication channels.
” ‘The threat landscape is changing every day and organizations need to stay one step ahead of the attackers,’ said Richard Turner, president for EMEA at FireEye. ‘Working with Europol means that, as well as granting early access to FireEye’s threat intelligence, FireEye will be able to respond to requests for assistance around threats or technical indicators of compromise in order to assist Europol in combating the ever increasing threat from cyber criminals.’ ”
The MoU will allow for the exchange of information about cyber crime so the two parties can aid each other in prevention and in analyzing attack methods. The Inquirer, however, suspects that information will only be shared one way. It does not explain which direction, though. The MoU is going to be a standard between Big Data companies and law enforcement agencies. Law enforcement agencies are notorious for being outdated and understaffed; relying on information and software from private industry will increase cyber crime prevention.