We Are Without a Paddle on Growing Data Lakes

January 18, 2018

The pooling of big data is commonly known as a “data lake.” While the approach was first met with excitement, it is beginning to look like a problem, as we learned in a recent InfoWorld story, “Use the Cloud to Create Open, Connected Data Lakes for AI, Not Data Swamps.”

According to the story:

A data scientist will quickly tell you that the data lake approach is a recipe for a data swamp, and there are a few reasons why. First, a good amount of data is often hastily stored, without a consistent strategy in place around how to organize, govern and maintain it. Think of your junk drawer at home: Various items get thrown in at random over time, until it’s often impossible to find something you’re looking for in the drawer, as it’s gotten buried.

This disorganization leads to the second problem: users are often not able to find the dataset once ingested into the data lake.

So, how does one take aggregate data from a stagnant swamp to a lake one can traverse? According to Scientific Computing, the secret lies in separating the search function into two pieces: finding and searching. Combine this thinking with InfoWorld’s advice to use the cloud, and suddenly these massive swamps can be drained.
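The finding-versus-searching split is easy to sketch: finding consults a metadata catalog to learn which datasets exist and what they contain, while searching scans the contents of a dataset already found. A minimal, hypothetical Python sketch (all dataset names and tags are invented for illustration):

```python
# Minimal sketch of the "finding vs. searching" split for a data lake.
# Dataset names, tags, and records below are hypothetical examples.

catalog = {
    "sales_2017": {"owner": "finance", "tags": ["revenue", "quarterly"]},
    "clickstream": {"owner": "web", "tags": ["events", "raw"]},
}

datasets = {
    "sales_2017": ["Q1 revenue up", "Q2 revenue flat"],
    "clickstream": ["user 1 clicked home", "user 2 clicked cart"],
}

def find(tag):
    """Finding: consult the catalog's metadata, never the data itself."""
    return [name for name, meta in catalog.items() if tag in meta["tags"]]

def search(name, term):
    """Searching: scan the contents of one already-found dataset."""
    return [record for record in datasets[name] if term in record]

for name in find("revenue"):
    print(name, search(name, "Q1"))
```

Without the catalog step, every query must scan every dataset, which is exactly the junk-drawer problem the InfoWorld piece describes.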

Patrick Roland, January 18, 2018

Amazon Cloud Injected with AI Steroids

January 17, 2018

Amazon, Google, and Microsoft are huge cloud computing rivals. Amazon wants to keep up with the competition, says Fortune, in the article “Amazon Reportedly Beefing Up Cloud Capabilities In The Cloud.” Amazon is “beefing up” its cloud performance by injecting it with more machine learning and artificial intelligence. The online retail giant is doing this by teaming up with AI-based startups Domino Data Lab and DataRobot.

Individuals mostly use cloud computing for backups and the ability to access their files from anywhere. Businesses use it to run their applications and store data, but as cloud computing becomes more standard, they increasingly want to run machine learning tasks and big data analysis as well.

Amazon’s new effort is code-named Ironman and is aimed at completing tasks for companies focused on insurance, energy, fraud detection, and drug discovery, The Information reported. The services will be offered to run on graphics processing chips made by Nvidia as well as so-called field programmable gate array chips, which can be reprogrammed as needed for different kinds of software.

Nvidia and other high-performance chip makers, such as Advanced Micro Devices and Intel, are ecstatic about the competition because it means more cloud operators will purchase their products. Amazon Web Services is one of the company’s fastest-growing areas and continues to bring in profits.

Whitney Grace, January 17, 2018

Cloud Computing Resources: Cost Analysis for Machine Learning

December 8, 2017

Information about the cost of performing a specific task in a cloud computing setup can be tough to get. Reliable, cross-platform, apples-to-apples cost analyses are even more difficult to obtain.

A tip of the hat to the author of “Machine Learning Benchmarks: Hardware Providers.” The article includes some useful data about the costs of performing tasks on the cloud services available from Amazon, Google, Hetzner, and IBM.

My suggestion is to make a copy of the article.

The big surprise: Amazon was the high-cost service; Google was less expensive.

One downside: No Microsoft costs.
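For readers who want to reproduce this kind of apples-to-apples comparison, the arithmetic is simple: cost per task is the hourly instance price times the wall-clock hours the benchmark takes on that instance. A sketch with hypothetical providers, prices, and runtimes (not the article’s actual benchmark figures):

```python
# Apples-to-apples cloud cost math: dollars per benchmark run is just
# hourly instance price times wall-clock hours to finish the task.
# Providers, prices, and runtimes are hypothetical placeholders,
# not the cited article's figures.

benchmarks = {
    "provider_a": {"usd_per_hour": 3.06, "hours": 2.0},
    "provider_b": {"usd_per_hour": 2.48, "hours": 2.2},
}

def cost_per_run(usd_per_hour, hours):
    """Total dollars to complete one benchmark run."""
    return usd_per_hour * hours

# Cheapest-first ranking: a pricier hourly rate can still win if the
# hardware finishes the task fast enough.
for name, spec in sorted(benchmarks.items(),
                         key=lambda kv: cost_per_run(**kv[1])):
    print(f"{name}: ${cost_per_run(**spec):.2f} per run")
```

The ranking step is the point: a provider with the higher hourly rate can still be cheaper per task if its hardware finishes sooner, which is why per-hour price lists alone mislead.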

Stephen E Arnold, December 8, 2017

Healthcare Analytics Projected to Explode

November 21, 2017

There are many factors driving the growing demand for healthcare analytics: pressure to lower healthcare costs, demand for more personalized treatment, the emergence of advanced analytic technology, and the impact of social media. PR Newswire takes a look at how the market is expected to explode in the article, “Healthcare Analytics Market To Grow At 25.3% CAGR From 2013 To 2024: Million Insights.” Other important factors that influence healthcare costs are errors in medical products, workflow shortcomings, and, possibly the biggest, the need for cost-effective measures that do not compromise care.

Analytics are supposed to be able to help and/or influence all of these issues:

Based on the component, the global healthcare analytics market is segmented into services, software, and hardware. Services segment held a lucrative share in 2016 and is anticipated to grow steady rate during the forecast period. The service segment was dominated by the outsourcing of data services. Outsourcing of big data services saves time and is cost effective. Moreover, Outsourcing also enables access to skilled staff thereby eliminating the requirement of training of staff.

Cloud-based delivery is anticipated to grow and become the most widespread analytics platform for healthcare. It allows remote access, avoids complicated infrastructure, and offers real-time data tracking. Adopting analytics platforms helps curb the rising problems, from cost to workforce to treatment, that the healthcare industry faces now and will face in the future. While these systems are being implemented, the harder question is how readily workers can be properly trained to use them.
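A 25.3% CAGR compounds dramatically over the 2013–2024 forecast window. A quick back-of-the-envelope check of what that headline figure implies:

```python
# Back-of-the-envelope check: what does a 25.3% CAGR from 2013 to 2024
# imply for total market growth over the forecast period?
cagr = 0.253
years = 2024 - 2013          # 11 years of compounding
multiplier = (1 + cagr) ** years
print(f"~{multiplier:.1f}x market growth over {years} years")
```

In other words, the press release is projecting a market roughly twelve times its 2013 size by 2024, which puts the word “explode” in perspective.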

Whitney Grace, November 21, 2017

MongoDB Position upon Filing IPO

November 9, 2017

This article at Datamation, “MongoDB’s Mongo Moment,” suggests MongoDB is focused on the wrong foe. As the company filed for its $100 million IPO, its CEO Dev Ittycheria observed that competitor Oracle is “vulnerable” because it has lost appeal to developers. However, writer Matt Asay asserts developers never were very fond of Oracle, and that MongoDB’s real competition is AWS (Amazon Web Services). He posits:

As mentioned, however, the real question isn’t about MongoDB’s impact on Oracle, any more than MySQL had a material impact on Oracle. No, the question is how relevant MongoDB is to the growing population of modern applications. Quite simply: this is where the action is. As VMware executive (and former MongoDB executive) Jared Rosoff reasons, ‘Old workloads grew one database server at a time. New workloads add tens or hundreds of servers at a time.’

Indeed, as MongoDB vice president of cloud products Sahir Azam told me in an interview, ‘We see a higher percentage of legacy RDBMS moving to MongoDB. Tens of billions of spend that has traditionally gone to Oracle and other SQL vendors is now moving to open source RDBMS and MongoDB with app refreshes and new apps.’

Mongo has a significant advantage over AWS, writes Asay, in the flexibility it offers developers. He also notes the increased spending power developers now enjoy within enterprises should help the company. One potential pitfall: Mongo spends heavily on marketing, which could cause investors to shy away. On the whole, however, Asay believes MongoDB is navigating a shifting market wisely. See the article for more on the company’s approach and some criticisms it has received. Founded in 2007, MongoDB is based in New York City and employs over 800 workers in locations around the world.

Cynthia Murrell, November 9, 2017

Big Data Less Accessible for Small and Mid-Size Businesses

October 31, 2017

Even as the term “Big Data” grows stale, small and medium-sized businesses (SMBs) are being left behind in today’s data-driven business world. The SmartData Collective examines the issue in “Is Complexity Strangling the Real-World Benefits of Big Data for SMB’s?” Writer Rehan Ijaz supplies this example:

Imagine a local restaurant chain fighting to keep the doors open as a national competitor moves into town. The national competitor will already have a competent Cloud Data Manager (CDM) in place to provide insight into what should be offered to customers, based on their past interactions. A multi-million-dollar technology is affordable, due to scale, for a national chain. The same can’t be said for a smaller, mom and pop type restaurant. They’ve relied on their gut instinct and hometown roots to get them this far, but it may not be enough in the age of Big Data. Large companies are using their financial muscle to get information from large data sets, and take targeted action to outmaneuver local competitors.

Pointing to an article from Forbes, Ijaz observes that the main barrier for these more modestly-sized enterprises is not any hesitation about the technology itself, but rather a personnel issue: their existing marketing employees were not hired for their IT prowess, and even the most valuable data requires analysis to be useful. Few SMBs are eager to embrace the cost and disruption of hiring data scientists and reorganizing their marketing teams; they have to be sure it will be worth the trouble.

Ijaz hopes that the recent increase in scalable, cloud-based analysis solutions will help SMBs with these challenges. The question, he notes, is whether it is too late for many SMBs to recover from their late foray into Big Data.

Cynthia Murrell, October 31, 2017

HP Enterprise Spins Software Division into Micro Focus International

October 23, 2017

It would seem that the saga of HP’s lamented 2011 Autonomy acquisition is now complete—Reuters announces, “Hewlett Packard Enterprise to Complete Software Spin-Off.” Reporter Salvador Rodriguez explains:

The enterprise software businesses, which include the widely used ArcSight security platform, have been merged with Micro Focus International Plc (MCRO.L), a British software company. HPE was formed when the company once known as Hewlett-Packard split into HPE and HP Inc in November 2015.

The spin-off comes as HPE adjusts to the rapid shift of corporate computing to cloud services offered by the likes of Amazon.com Inc (AMZN.O) and Microsoft Corp (MSFT.O). HPE aims to cater specifically to customers running services both on their own premises and in the cloud, said Ric Lewis, senior vice president of HPE’s cloud software group, in an interview.

The spin-off marks the end of HP’s unhappy tangle with Autonomy, which it acquired for $11 billion in an aborted effort to transform HP into an enterprise software leader. The ink was barely dry on the much-criticized deal when the company took an $8.8 billion writedown on it.

But wait, the story is not over quite yet: the legal case that began when HP sued Autonomy’s chief officers continues. Apparently, that denouement is now HPE’s to handle. As for Micro Focus, Rodriguez reports it will now be run by former HPE Chief Operating Officer Chris Hsu, who plans to focus on growth through acquisitions. Wait… wasn’t that what started this trouble in the first place?

Cynthia Murrell, October 23, 2017

Equifax Hack Has Led to Oracle Toughening Up

October 19, 2017

According to a timely piece at SearchOracle, Oracle has muscled up in response to its recent troubles, as described in the article “Machine Learning and Analytics Among Key Oracle Security Moves.”

This comes on the heels of the infamous Equifax hack, which exploited a weakness in Apache Struts. To its credit, Oracle has owned up to the problem and made it public that it is not going to wilt in the face of criticism. In fact, the company is doubling down:

Oracle’s effort to help IT teams reprioritize their defenses, he said, takes the form of a new unified model for organizing data, rolled out as part of an updated Oracle Management Cloud suite. Advanced machine learning and analytics will enable automated remediation of flaws like Struts…

The story continues:

(Oracle’s) approach to machine learning is uniquely its own, in the sense that it is being delivered as a core enhancement to existing offerings, and not as a stand-alone technology that is personalized by a mascot or nickname — a la Einstein from Salesforce or Watson from IBM.

We like that Oracle isn’t trying to throw the baby out with the bathwater here. We agree there is a lot to like, and a complete overhaul would not be the solution. Via analytical improvements, we suspect Oracle will recover from the Equifax snafu and be stronger for it. The company certainly sounds focused on exactly that.

Patrick Roland, October 19, 2017

The Cloud Needs eDiscovery Like Now

October 16, 2017

Cloud computing has changed the way home and enterprise systems store and access data. One of the problems with cloud computing, however, is the lack of a powerful eDiscovery tool. There are search tools for the cloud, but eDiscovery tools help users make rhyme and reason of their content. Compare the Cloud reports that there is a new eDiscovery tool for the cloud: “KrolLDiscovery Brings End-To-End eDiscovery To The Cloud With Nebula.” Nebula is the name of KrolLDiscovery’s eDiscovery tool; it is an upgrade of eDirect365, building on that software’s processing and review capabilities.

Nebula was designed with a user-friendly eDiscovery approach that simplifies otherwise complex tasks.  Nebula is also a web-based application and it can be accessed from most browsers and mobile devices.  The benefit for Windows users is that it can be deployed within Windows Azure to bring scalability and rapid deployment capabilities.

KrolLDiscovery is proud of their newest product:

‘We are excited for the future of Nebula,’ said Chris Weiler, President and CEO of KrolLDiscovery. ‘Expanding our eDiscovery capabilities to the cloud is a benefit to our multi-national and international clients as they can now process, store and access their data across the globe. All the while, we are dedicated to providing the same industry-leading service we are known for by our clients.’

Nebula was designed to improve how users interact with and use their content on a cloud-based system. Cloud computing has a real-time, portable air about it, but its weaknesses lie in lag and security. Perhaps Nebula will address the former, making that weakness a mere shadow of the past.

Whitney Grace, October 16, 2017

Amazon Factoids: Match Game for Google, IBM, and MSFT?

September 18, 2017

I am not sure if the data in this Amazon write-up are accurate. Navigate to “Prime Day 2017 – Powered by AWS” and make your own decision. I noted these “factoids” about Amazon’s cloud performing an Olympic-winning deadlift:

Block Storage – Use of Amazon Elastic Block Store (EBS) grew by 40% year-over-year, with aggregate data transfer jumping to 52 petabytes (a 50% increase) for the day and total I/O requests rising to 835 million (a 30% increase). The team told me that they loved the elasticity of EBS, and that they were able to ramp down on capacity after Prime Day concluded instead of being stuck with it.

NoSQL Database – Amazon DynamoDB requests from Alexa, the Amazon.com sites, and the Amazon fulfillment centers totaled 3.34 trillion, peaking at 12.9 million per second. According to the team, the extreme scale, consistent performance, and high availability of DynamoDB let them meet needs of Prime Day without breaking a sweat.

Stack Creation – Nearly 31,000 AWS CloudFormation stacks were created for Prime Day in order to bring additional AWS resources on line.

API Usage – AWS CloudTrail processed over 50 billion events and tracked more than 419 billion calls to various AWS APIs, all in support of Prime Day.

Configuration Tracking – AWS Config generated over 14 million Configuration items for AWS resources.

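Readers who want to check Amazon’s percentages can back-compute the implied prior-year figures with simple division; a quick sketch using the EBS numbers quoted above:

```python
# Sanity check on the EBS factoid: derive the implied Prime Day 2016
# figures from the 2017 totals and the stated year-over-year increases.

def prior_year(current, pct_increase):
    """If `current` represents a `pct_increase` jump, return last year's value."""
    return current / (1 + pct_increase)

transfer_pb = prior_year(52, 0.50)      # 52 PB was "a 50% increase"
io_requests_m = prior_year(835, 0.30)   # 835M I/O requests was "a 30% increase"

print(f"Implied 2016 EBS data transfer: {transfer_pb:.1f} PB")
print(f"Implied 2016 EBS I/O requests: {io_requests_m:.0f} million")
```

The arithmetic implies roughly 35 petabytes transferred and about 642 million I/O requests the prior year, figures readers can compare against Amazon’s 2016 write-up if they are so inclined.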
Is Amazon reminding customers or competitors that it does more than sell books and buy grocery stores? Is Amazon doing PR?

Stephen E Arnold, September 18, 2017
