February 27, 2014
The article titled “How to Integrate Multiple Data Types Into Hadoop With Datameer” on the Datameer blog delves into the staff’s favorite features of the program. Besides a small amount of gushing, the article offers insight into the workings of Datameer and metaphors for its applications. The author calls Datameer’s ability to combine various data sources its best feature, citing how easily users can merge datasets. The article states,
“One of the most compelling features in Datameer, in my opinion, is its ability to bring in multiple different data sources — structured, unstructured, from traditional databases, cloud sources, local files, etc. — and then be able to combine those datasets quickly and easily. You can think of Datameer as the single spreadsheet to bring previously siloed data sources together into a single view. I’ve created the below video to show you just how easy it is to do in our data integration wizard.”
The article offers an example from Datameer’s own staff: the marketing office pulls daily from both Salesforce and Marketo, two separate data sources, and Datameer is set to integrate the data automatically into a single, more powerful dataset. The article also notes that if your preferred data source does not appear in the screenshot demos, Datameer offers plug-ins for custom connections.
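To make the idea concrete, here is a quick sketch of the sort of join Datameer’s wizard performs, written in pandas. This is our illustration, not Datameer’s code; the file names and columns are invented.

```python
# Illustrative only: joining two previously siloed marketing exports.
# File names and column names are hypothetical.
import pandas as pd

salesforce = pd.read_csv("salesforce_leads.csv")  # e.g., email, account, deal_stage
marketo = pd.read_csv("marketo_activity.csv")     # e.g., email, campaign, clicks

# Combine the silos into a single view, keyed on a shared field
combined = salesforce.merge(marketo, on="email", how="outer")
combined.to_csv("marketing_combined.csv", index=False)
```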
Chelsea Kerwin, February 27, 2014
February 26, 2014
Is big data the key to boosting Africa’s economic prowess? IBM seems to think so, and it is sending in its AI ambassador Watson to help with the continent’s development challenges. Watson is IBM’s natural language processing system that famously won Jeopardy in 2011. Now, Phys.org announces that “IBM Brings Watson to Africa.” The $100 million initiative is known as Project Lucy, named after the skeleton widely considered the earliest known human ancestor (Australopithecus, to be specific), discovered in Africa in 1974. (I would be remiss if I did not mention that an older skeleton, Ardipithecus, was found in 1994; there is still no consensus on whether this skeleton is really a “human ancestor,” though many scientists believe it is. But I digress.)
The write-up tells us:
“Watson technologies will be deployed from IBM’s new Africa Research laboratory providing researchers with a powerful set of resources to help develop commercially-viable solutions in key areas such as healthcare, education, water and sanitation, human mobility and agriculture.
“To help fuel the cognitive computing market and build an ecosystem around Watson, IBM will also establish a new pan-African Center of Excellence for Data-Driven Development (CEDD) and is recruiting research partners such as universities, development agencies, start-ups and clients in Africa and around the world. By joining the initiative, IBM’s partners will be able to tap into cloud-delivered cognitive intelligence that will be invaluable for solving the continent’s most pressing challenges and creating new business opportunities.”
IBM expects that with the help of its CEDD, Watson will be able to facilitate data collection and analysis on social and economic conditions in Africa, identifying correlations across multiple domains. The first two areas on Watson’s list are healthcare and education, both realms where improvement is sorely needed. The Center will coordinate with IBM’s 12 laboratories around the world and its new Watson business unit. (Wait, Watson now has its own business unit?) See the article for more on this hopeful initiative.
Cynthia Murrell, February 26, 2014
February 26, 2014
One of our favorite data outfits has been profiled at the British legal news site The Lawyer in, “The London Startup Giving Meaning to Big Data.” Our own Stephen E. Arnold did an extensive interview with the firm’s director Mats Bjore last November for his excellent Search Wizards Speak series. Though much briefer than that piece, the Lawyer write-up emphasizes one of this company’s key advantages: its commitment to connecting the dots between data sources. That focus has led clients to seek out Silobreaker for data-related security work. We’re told:
“Silobreaker did not specialise in cyber security from the start. Rather, cyber security came to it in the form of some of the largest US hardware and software companies looking to gain insight into threat intelligence data they had gathered.
“The company believes that because many organisations operate in siloed environments there is a disconnect between data sources – customer or financial information, social media data, market analyses and so on. Companies and governments need to inject some sense into their information by bringing all those sources together.
“[…]Co-founder and CEO Kristofer Månsson says, ‘Governments and companies need us to give the information they have some context. The services we provide – geopolitical analysis, monitoring of global events or situational awareness through social media, for example – are not part of a cyber security company’s traditional offering. But we’re still a cyber company by association because of what we do.’”
Besides Silobreaker’s skilled and effective data integration, we also like them for their constant innovation and their ability to see things from the end-user’s perspective. Even greater praise: The ArnoldIT team uses Silobreaker for our intelligence-related work. Launched in 2005, Silobreaker is headquartered in London. They serve clients in corporate, government, military, and financial services realms.
Cynthia Murrell, February 26, 2014
February 21, 2014
The conventional wisdom is that data derived from social media has great value. Social media is fast overtaking traditional media, with 70% of adults in the US using Facebook and 63% of all Facebook users visiting the site at least once a day. That doesn’t even begin to address Twitter, LinkedIn, or niche social networks like Ravelry.
What users share on social networks is unstructured data, and IT’s challenge is to extract business value from that unstructured data. But what is that data worth? Loraine Lawson considered the question in her recent blog post “Do Businesses Really Value Social Media Data?” for IT BusinessEdge.
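As a toy illustration of what extracting structure from such data can mean, the snippet below pulls hashtags and mentions out of free-text posts. The posts are invented; real pipelines are far more elaborate.

```python
# Toy sketch: turning unstructured social posts into structured records.
import re

posts = [
    "Loving the new release! #bigdata #analytics @vendorco",
    "Anyone else seeing downtime? @vendorco #fail",
]

records = [
    {
        "text": post,
        "hashtags": re.findall(r"#(\w+)", post),
        "mentions": re.findall(r"@(\w+)", post),
    }
    for post in posts
]

for record in records:
    print(record)
```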
Lawson notes that Dun & Bradstreet has entered into a partnership with business analytics vendor FirstRain that will allow D&B to integrate unstructured social data into its existing enterprise data products at no additional cost. Forbes’ Ben Kepes reported the development, and his column prompted Lawson to wonder,
“Social media data that has been integrated and given to clients…for free? As Kepes notes, that could be interpreted as ‘an admission that enterprises aren’t buying into the idea of unstructured data’s value proposition on its own.’”
On the other hand, Oracle will be happy to sell you a solution to leverage social media data. It’s clear the marketplace hasn’t quite reached consensus on the value of unstructured data.
Laura Abrahamsen, February 21, 2014
February 20, 2014
Style Intelligence, the business intelligence platform from InetSoft, released version 11.5 late in 2013. According to the Passionned Group post “New Release of Inetsoft Includes Hadoop Connector,” the new data source connector will
“enable users to add data from the Apache Hive data warehouse to their dashboards and analysis in the same easy way they connect any other data source.”
The update also features a complete restyle of the user interface, which is built around a self-service dashboard for data reporting. Style Intelligence is built on an open-standards SOA architecture and delivers data reports from both structured and unstructured sources.
In other words, InetSoft has seen which way Big Data is going, and it wants to make sure its own product can work with what is fast becoming the industry standard.
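InetSoft’s connector is proprietary, but for a feel of what treating Hive as just another data source involves, here is a minimal sketch using the PyHive library. The host, database, and table are assumptions for illustration.

```python
# Minimal sketch of querying Apache Hive like any other data source.
# Assumes a reachable HiveServer2 instance; all names are invented.
from pyhive import hive

conn = hive.Connection(host="hive.example.com", port=10000, database="default")
cursor = conn.cursor()
cursor.execute("SELECT region, SUM(sales) FROM orders GROUP BY region")
for region, total in cursor.fetchall():
    print(region, total)
```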
Laura Abrahamsen, February 20, 2014
February 19, 2014
The press release on ThomsonReuters.com titled “Thomson Reuters Cortellis Data Fusion Addresses Big Data Challenges by Speeding Access to Critical Pharmaceutical Content” announces the embrace of big data by revenue-hungry Thomson Reuters. The new addition to the suite of drug development technologies will offer users a more intuitive interface for analyzing large volumes of data. Chris Bouton, General Manager at Thomson Reuters Life Sciences, is quoted in the article,
“Cortellis Data Fusion gives us the ability to tie together information about entities like diseases and genes and the connections between them. We can do this from Cortellis data, from third party data, from a client’s internal data or from all of these at the same time. Our analytics enable the client to then explore these connections and identify unexpected associations, leading to new discoveries… driving novel insights for our clients is at the very core of our mission…”
The changes at Thomson Reuters are the result of the company’s acquisition of Entagen, according to the article. That company is a leader in the field of semantic search and has been working with biotech and pharmaceutical companies, offering both development services and navigation software. Cortellis Data Fusion promises greater insights and better control over the data Thomson Reuters holds, while maintaining enterprise information security.
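As a rough analogy for the entity linking Bouton describes (and only an analogy, not Thomson Reuters’ implementation), one can model diseases, genes, and their connections as a graph and then walk the associations. The entities and sources below are invented.

```python
# Rough analogy: disease-gene connections from multiple sources as a graph.
import networkx as nx

g = nx.Graph()
g.add_edge("Alzheimer's disease", "APOE", source="internal data")
g.add_edge("Alzheimer's disease", "APP", source="Cortellis data")
g.add_edge("APOE", "hyperlipidemia", source="third-party data")

# Explore connections around one entity, regardless of which silo supplied them
for neighbor in g.neighbors("APOE"):
    print("APOE --", neighbor)
```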
Chelsea Kerwin, February 19, 2014
February 17, 2014
The article titled “Elasticsearch Debuts Marvel To Deploy And Monitor Its Open Source Search And Data Analytics Technology” on TechCrunch provides insight into Marvel, which the article calls a “deployment management and monitoring solution.” Elasticsearch is a technology for extracting information from structured and unstructured data, and its users include big names such as Netflix, Verizon, and Facebook. The article explains how Marvel will work to manage Elasticsearch,
“Enter Marvel, Elasticsearch’s first commercial offering, that makes it easy to run search, monitor performance, get visual views in real time and take action to fix things and improve performance. Marvel allows Elasticsearch system operators, who manage the technology at companies like Foursquare, see their Elasticsearch deployments in action, initiate instant checkup, and access historical data in context. Potential systems issues can be spotted and resolved before they become problems, and troubleshooting is faster. Pricing starts at $500 per five nodes.”
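Marvel itself is a commercial product, but the kind of checkup it automates can be approximated against Elasticsearch’s standard REST endpoints. A minimal sketch, assuming a local node on the default port:

```python
# Minimal sketch of the health checkup Marvel automates, using
# Elasticsearch's public REST API. Assumes a node at localhost:9200.
import requests

health = requests.get("http://localhost:9200/_cluster/health").json()
print(health["status"], health["number_of_nodes"], health["active_shards"])

# Per-node JVM stats are the raw material Marvel charts over time
stats = requests.get("http://localhost:9200/_nodes/stats/jvm").json()
for node in stats["nodes"].values():
    print(node["name"], node["jvm"]["mem"]["heap_used_percent"])
```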
Elasticsearch reported revenue growth of over 400% in 2013, and Marvel will only further its popularity. Already a user-friendly and lightweight technology, Elasticsearch is targeting developers interested in real-time visibility into their data. Marvel may be great news for Elasticsearch and its users, but it is certainly bad news for competitor Lucid Imagination.
Chelsea Kerwin, February 17, 2014
February 12, 2014
The article titled “How To Do Predictive Analytics with Limited Data” from Datameer on Slideshare suggests that Limited Data may replace Big Data in importance. The idea of “semi-supervised learning” is presented as a way to handle the difficulties associated with creating predictions from limited data, such as expense, manageability, and simply missing key data. The overview states,
“As it turns out, recent research on machine learning techniques has found a way to deal effectively with such situations with a technique called semi-supervised learning. These techniques are often able to leverage the vast amount of related, but unlabeled data to generate accurate models. In this talk, we will give an overview of the most common techniques including co-training regularization. We first explain the principles and underlying assumptions of semi-supervised learning and then show how to implement such methods with Hadoop.”
The presentation summarizes possible approaches to semi-supervised learning and the assumptions it is possible to make about unlabeled data (these include the clustering, low-density, and manifold assumptions). It also covers the concepts of Label Propagation and Nearest Neighbor Join. However, as inviting as it is to forget Big Data and switch to predictive analytics with Limited Data, the suggestion may sound too much like Bayes-Laplace.
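For readers who want to experiment, scikit-learn ships a label propagation implementation. Here is a minimal sketch on synthetic data, with unlabeled points marked as -1; it is our illustration, not code from the Datameer talk.

```python
# Minimal label propagation sketch: ten labels spread to 190 unlabeled points.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.semi_supervised import LabelPropagation

X, y = make_moons(n_samples=200, noise=0.1, random_state=0)

rng = np.random.RandomState(0)
labels = np.full(len(y), -1)                        # -1 marks unlabeled points
known = rng.choice(len(y), size=10, replace=False)
labels[known] = y[known]                            # keep only 10 true labels

model = LabelPropagation().fit(X, labels)
mask = labels == -1
print("accuracy on unlabeled points:", (model.transduction_[mask] == y[mask]).mean())
```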
Chelsea Kerwin, February 12, 2014
February 11, 2014
The article on PRNewswire titled “Attivio and Quant5 Partner to Bring Fast and Reliable Predictive Customer Analytics to the Cloud” explains the partnership between the two analytics innovators. Aimed at producing information from data without the hassle of a team of data scientists, the partnership promises to create insights that companies can act on. It responds to the growing frustration some companies face in gleaning useful information from huge amounts of data. The article explains,
“Attivio built its business around the core principle that integrating big data and big content should not require expensive mainframe legacy systems, handcuffing service agreements, years of integration and expensive data scientists. Attivio enterprise customers experience business-changing efficiency, sales and competitive results within 90 days. Similarly, Quant5 arose from the understanding that businesses need simple, elegant solutions to address difficult and complex marketing challenges. Quant5 customers experience increased revenues, reduced customer churn and an affordable and fast path to predictive analytics.”
The possibility of indirect sales, following in the footsteps of Autonomy and Endeca, does seem to be part of the 2014 tactics. The Attivio-Quant5 solutions are offered in five major areas of concern: Lead & Opportunity Scoring, Customer Segmentation, Targeted Offers, Product Usage, and Product Relationships.
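To give a flavor of one of those areas, here is a hypothetical customer segmentation sketch built on scikit-learn’s k-means. The data and features are invented; this is not Attivio or Quant5 code.

```python
# Hypothetical customer segmentation: cluster customers by spend and frequency.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(42)
# Invented features per customer: [annual_spend, purchases_per_year]
customers = np.vstack([
    rng.normal([200, 2], [50, 1], size=(50, 2)),     # occasional buyers
    rng.normal([5000, 40], [800, 5], size=(50, 2)),  # heavy buyers
])

segments = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
    StandardScaler().fit_transform(customers)
)
print(np.bincount(segments))  # size of each segment
```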
Chelsea Kerwin, February 11, 2014
January 31, 2014
Sometimes a company can grow too fast for its own good. Take the case of DigitalOcean, which eWeek describes in its piece, “Scrubbing Data a Concern in the Digital Ocean Cloud.” It was recently discovered that the cloud hosting firm was not automatically scrubbing user data after every deletion of a virtual machine (VM) instance, which is not good for security. Apparently, the young company once scrubbed after each VM destroy request, but changed that policy as its growth ballooned.
Writer Sean Michael Kerner tells us:
“As Digital Ocean’s utilization went up, the company found that the scrubbing activity was degrading performance and decided to make it an option that API users needed to manually activate. [DigitalOcean CEO Moisey] Uretsky told eWEEK that even though the data scrubbing has an impact, it is now a cost that his company will bear.
“Digital Ocean grew very quickly in 2013, to at least 7,000 Web-facing servers in June 2013, up from only 100 in December 2012, according to Netcraft. One of the reasons for the rapid rise has been Digital Ocean’s aggressive pricing, which starts at $5 for a server with 512MB of memory and a 20GB solid-state drive for a month of cloud service.”
At least the company is taking responsibility for, and learning from, the mistake. Not only is DigitalOcean now faithfully scrubbing every deleted VM instance in sight, but Uretsky also specified that his company is hastening to make other changes based on customer feedback. He also pledged not to reveal customer data to third parties. The imprudent scrub-optional policy only affected certain DigitalOcean API users, and it does not appear from the article that any programmers were harmed. Headquartered in New York City, DigitalOcean graduated from the TechStars startup accelerator program in 2012.
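The eWeek piece does not detail DigitalOcean’s mechanism, but scrubbing generically means overwriting freed storage so the next tenant cannot read stale data. A toy sketch of zero-filling a released disk image, offered only as an illustration of the idea:

```python
# Toy illustration of scrubbing: overwrite a released disk image with zeros.
# Generic sketch, not DigitalOcean's implementation; the path is hypothetical.
import os

def scrub(path, chunk=1024 * 1024):
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        written = 0
        while written < size:
            n = min(chunk, size - written)
            f.write(b"\x00" * n)
            written += n
        f.flush()
        os.fsync(f.fileno())

# scrub("/var/lib/vm/deleted-instance.img")
```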
Cynthia Murrell, January 31, 2014