Fragmented Data: Still a Problem?

January 28, 2019

Digital transitions are a major shift for organizations. The shift includes new technology and better ways to serve clients, but it also includes massive amounts of data. All organizations with a successful digital implementation rely on data. Too much data, however, can hinder an organization’s performance. IT Pro Portal explains why mass data fragmentation is a major issue in the article, “What Is Mass Data Fragmentation, And Why Are IT Leaders So Worried About It?”

The biggest question is: what exactly is mass data fragmentation? I learned:

“We believe one of the major culprits is a phenomenon called mass data fragmentation. This is essentially just a technical way of saying, ’data that is siloed, scattered and copied all over the place’ leading to an incomplete view of the data and an inability to extract real value from it. Most of the data in question is what’s called secondary data: data sets used for backups, archives, object stores, file shares, test and development, and analytics. Secondary data makes up the vast majority of an organization’s data (approximately 80 per cent).”

The article compares secondary data to an iceberg: most of it is hidden beneath the surface. The poor visibility leads to compliance and vulnerability risks, in other words, security issues that put the entire organization at risk. Most organizations, however, view their secondary data as a storage bill, a compliance risk (at least that is good), and a giant headache.

When organizations were surveyed about the amount of secondary data they have, the results showed multiple copies of the same data spread across cloud and on-premises locations. IT teams are expected to manage the secondary data across all the locations, but without the right tools and technology the task is unending, unmanageable, and the root of more problems.
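To make "multiple copies of the same data" concrete, here is a minimal sketch of how duplicate copies might be found: hash each object's content and group identical hashes across locations. The location names, paths, and the in-memory `stores` inventory are made up for illustration; a real inventory would come from whatever backup, archive, and cloud systems an organization actually runs.

```python
import hashlib
from collections import defaultdict


def find_duplicate_copies(stores):
    """Group objects by content hash across all storage locations.

    `stores` is a hypothetical inventory: {location_name: {path: content_bytes}}.
    Returns {sha256_digest: [(location, path), ...]} for content stored more than once.
    """
    by_digest = defaultdict(list)
    for location, objects in stores.items():
        for path, content in objects.items():
            digest = hashlib.sha256(content).hexdigest()
            by_digest[digest].append((location, path))
    # Keep only content that appears in more than one place
    return {d: copies for d, copies in by_digest.items() if len(copies) > 1}


# Illustrative, made-up inventory spanning on-premises and cloud locations
stores = {
    "on_prem_backup": {"/backups/q3.csv": b"id,total\n1,100\n"},
    "cloud_archive": {
        "archive/q3-copy.csv": b"id,total\n1,100\n",
        "archive/notes.txt": b"unique content",
    },
    "test_dev": {"fixtures/q3.csv": b"id,total\n1,100\n"},
}

duplicates = find_duplicate_copies(stores)
for digest, copies in duplicates.items():
    print(digest[:12], copies)
```

Hashing content rather than comparing file names catches copies that were renamed along the way, which is usually the case with secondary data.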

If organizations managed their mass data fragmentation efficiently, they would improve their bottom line, reduce costs, and reduce security risks. The more unsecured access points to sensitive data there are, the greater the risk of hacking and stolen information.

Whitney Grace, January 28, 2019

Relatives Got You Down? Check Out BigQuery and Redshift

December 25, 2018

I read “Redshift Vs BigQuery: What Are The Factors To Consider Before Choosing A Data Warehouse.” With Oracle on the ropes and database technology chugging along, why pay attention to old school solutions?

The article sets out to compare and contrast BigQuery (one of the Google progeny known to have consorted with a certain Mr. Dremel) and Amazon’s Redshift. Amazon has more database products and services than I can keep track of. But Redshift is one of them, and it is important if an intelware company uses AWS and the Redshift technology.

Which system is more “flexible”? I learned:

In the case of Redshift, if anything goes kaput during a transaction, Amazon Redshift allows users to perform roll-back to ensure that data get backs to the consistent state. BigQuery works on the principle of append-only data and its storage engine strictly follows this technique. This becomes a major disadvantage to the user when something goes wrong during the transaction process, forcing them to restart from the beginning or specific point. Another key point is that duplicating data in BigQuery is hard to achieve and costly. Both the technologies have reservations regarding insertion of streaming data, with Redshift taking edge by guaranteeing storage of data with additional care from the user. On the other hand, BigQuery supports de-duplication of streaming data in the most effective way by using time window.
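The quoted passage mentions de-duplicating streaming data with a time window. The sketch below shows the general idea only; it is not BigQuery's implementation, and the event shape `(timestamp, event_id, payload)` and the 60-second window are assumptions chosen for illustration.

```python
def dedupe_within_window(events, window_seconds=60):
    """Yield events, dropping repeats of an id seen within the time window.

    `events` is an iterable of (timestamp_seconds, event_id, payload) tuples,
    assumed to arrive in timestamp order. An id is accepted again once more
    than `window_seconds` have passed since it was last accepted.
    Note: this simple version never evicts old ids, so memory grows with
    the number of distinct ids.
    """
    last_accepted = {}
    for ts, event_id, payload in events:
        previous = last_accepted.get(event_id)
        if previous is None or ts - previous > window_seconds:
            last_accepted[event_id] = ts
            yield ts, event_id, payload


# Illustrative stream: id "a" is retried at t=10 (dropped) and resent at t=70 (kept)
stream = [(0, "a", "x"), (10, "a", "x"), (20, "b", "y"), (70, "a", "x")]
kept = list(dedupe_within_window(stream))
print(kept)
```

A window keeps the bookkeeping bounded in principle, at the price that a duplicate arriving after the window closes is treated as new data.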

The write up points out:

As compared to BigQuery, Redshift is considerably more expensive costing $0.08 per GB, compared to BigQuery which costs $0.02 per GB. However, BigQuery offers only storage and not queries. The platform charges separately for queries based upon processed data at $5/TB. As BigQuery lacks indexes and various analytical queries, the scanning of data is a huge and costly process. In most cases, users opt for Amazon Redshift as it is predictable, simple and encourages data usage and analytics.
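Taking the quoted figures at face value ($0.08 per GB for Redshift, $0.02 per GB for BigQuery storage, $5 per TB scanned for BigQuery queries), a rough comparison can be worked out. The quote does not state the billing period, and the workload numbers below are hypothetical, so treat this as arithmetic on the article's numbers rather than a pricing guide.

```python
# Rates as quoted in the article (billing period not stated there)
REDSHIFT_PER_GB = 0.08
BIGQUERY_STORAGE_PER_GB = 0.02
BIGQUERY_QUERY_PER_TB = 5.00


def redshift_cost(stored_gb):
    """Cost under the quoted flat per-GB Redshift rate."""
    return stored_gb * REDSHIFT_PER_GB


def bigquery_cost(stored_gb, scanned_tb):
    """Quoted BigQuery storage rate plus the per-TB charge for data scanned."""
    return stored_gb * BIGQUERY_STORAGE_PER_GB + scanned_tb * BIGQUERY_QUERY_PER_TB


def break_even_scanned_tb(stored_gb):
    """TB scanned at which BigQuery's total matches Redshift's for the same data."""
    return stored_gb * (REDSHIFT_PER_GB - BIGQUERY_STORAGE_PER_GB) / BIGQUERY_QUERY_PER_TB


# Hypothetical workload: 1,000 GB stored, 10 TB scanned by queries
stored_gb, scanned_tb = 1000, 10
print(f"Redshift: ${redshift_cost(stored_gb):.2f}")
print(f"BigQuery: ${bigquery_cost(stored_gb, scanned_tb):.2f}")
print(f"Break-even scan volume: {break_even_scanned_tb(stored_gb):.1f} TB")
```

On these numbers BigQuery stays cheaper until queries scan about 12 TB against the 1,000 GB stored, which is why the lack of indexes the article mentions matters: every full scan moves the bill toward that line.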

Which is “better”? Not surprisingly, both are really swell. Helpful. But the Beyond Search goose was curious about:

  • Performance
  • Latency for different types of queries
  • Programming requirements

But swell is fine.

Stephen E Arnold, December 25, 2018

Data Science Gets Political

November 20, 2018

With the near ubiquitous use of big data science in every industry short of rock hunting, it was inevitable that there would be blowback. Recently, many tech companies began to feel some political heat due to their involvement with immigration agencies. We learned more from a recent Mercury News story, “Bay Area Cities May Boycott Tech Giants Contracting With ICE.”

According to the story:

“The policy comes as the local immigration debate shifts toward several prominent tech companies — including Palo Alto’s Palantir Technologies, Vigilant Solutions in Livermore and Amazon, which have been criticized for contracting with federal immigration agencies. Last week, advocates descended on Salesforce’s annual conference in San Francisco with an 14-foot-tall cage symbolizing ICE detention to protest the company’s contract with Customs and Border Protection.”

If this sounds a little farfetched or even unlikely, pay close attention to similar actions in Europe. There, when people pushed back against the intersection of politics and big data, it began to impact finances. And when pocketbooks begin to suffer, you can guarantee companies take notice. We don’t yet know if the same will happen in America, but we have a hunch this issue won’t vanish quietly.

Patrick Roland, November 20, 2018

Oracle: Grousing about Amazon and Wrestling with Revenue Alligators

November 14, 2018

One of my erstwhile fans sent me a link to a video allegedly revealing Larry Ellison’s deep disappointment with Amazon. Yep, Amazon, an online store with a bundle of database systems. You can view the video here.

News is news. But it seems that some time has passed since Oracle rolled out major technology announcements. What’s happened to Endeca by the way? Seeking Alpha’s “The Reason(s) Why Oracle’s Growth Story Is Crumbling” is semi news, and the write up raises the question, “What is happening with Oracle?”

Oracle’s quarterly earnings are down and the company’s growth is shrinking faster than the polar ice caps. Oracle might have made a mistake combining its cloud business with its on-premise business. This move led to a drop in Oracle’s stock price:

“Several SA contributors have provided their take on those earnings, though, in my view, this piece by Shock Exchange puts it quite succinctly: Oracle’s cloud growth may have peaked. Indeed, Oracle’s Fiscal Q4 2018 cloud revenue of $1.57B was $200M below the Wall Street consensus, while 31% growth paled in comparison to SAP’s (SAP) 40% and Microsoft’s (MSFT) 53% for the same segment. For perspective, Oracle’s cloud revenue growth was 66% just a year ago.”

Despite the poor returns this year, Oracle stock is only a little off from its highest point, so the company is surfing along. Perhaps Amazon is a rallying point for the Oracle faithful?

Whitney Grace, November 14, 2018

Amazon: Global Takeover to Leverage the Cloud

November 6, 2018

From bookstores to grocery stores to even video stores, we have gotten used to the idea that Amazon is impossible to stop once it begins in a new market. However, some folks are worried about a market that Amazon has been involved with for a while. We learned more from a recent TechCrunch story: “Commons Clause Stops Open-Source Abuse.”

According to the story:

“Amazon takes Redis (the most loved database in StackOverflow’s developer survey), gives very little back, and runs it as a service, re-branded as AWS Elasticache. Many other popular open-source projects including, Elasticsearch, Kafka, Postgres, MySQL, Docker, Hadoop, Spark and more, have similarly been taken and offered as AWS products.

“To be clear, this is not illegal. But we think it is wrong, and not conducive to sustainable open-source communities.”

Sadly, open-source lovers can stand up and yell, but we have a feeling it won’t do much good. Amazon is far too strong to do anything but steamroll ahead the way it always has. Look, for example, at how it has even recently begun dipping its toe in the motor oil business. Clearly, there is no safe haven, even open-source, from this titan.

Patrick Roland, November 6, 2018

Stunning Revelation about Maximizing Data Value

October 11, 2018

Quite a stunning revelation appeared in “Your Business Can’t Get Maximum Value Out of Your Data If It’s Not Clean, Says Talend.” “Never before has anyone involved in digital information stumbled upon this insight,” said Tibby Dogg, the Beyond Search data guru. “Imagine. Data have to be consistent, timely, and accurate. Who knew?”

The Beyond Search research team noted this statement as equally revelatory:

…Many companies have a lot of data but are unable to use it effectively. “It’s in many locations, it’s inconsistent, it’s in bad formats, and people can’t make use of it,” Tuchen [a Talend expert] explains.

Why are reliable data needed? That decades old mystery has now been solved:

“If companies can correct all the errors and get a consistent number — not five versions, say — if they do that well, they can start to figure out based on what you’ve bought what are you likely to buy? What should we recommend? What are the most effective sales and marketing campaigns? Should we do more of those?” – Talend expert

What’s the fix?

Data governance

The Talend expert does not define data governance because everyone knows exactly what that is.

Quite a brilliant insight about data and how to rectify errors. Keep in mind that Talend is ready to solve data problems.

Yep. Act now. No one knows these secrets.

Stephen E Arnold, October 11, 2018

PR Coup? Russia Wields Blockchain for Good

September 7, 2018

We think this is the most interesting use of blockchain yet. Crypto-currency news site BitNovosti reports, “Siberian Federal University Suggests Using Blockchain for Making Fair Waiting Lists for Kindergartens.” The article tells us:

“The existing technologies used for creating waiting lists for kindergartens do not provide full protection against data distortion when it comes to listing order, time of registration, etc. But blockchain guarantees the immutability of sequence of names in waiting lists. The blockchain technology precludes the possibility of adding false data blocks or removing any blocks already in the chain, as these procedures would be easily traceable within the system. Neither one can add any unjustified data elements to the structure, nor remove any data elements which it already incorporates. All sorts of unauthorized interference with the system are completely ruled out, as all the wrongdoings aimed at bringing in changes to the database are preventable by means of comparing them with other copies of the same blockchain stored by other participants in the system. The technology does not presuppose any central server, as databases are distributed between all the system’s users. Thus, a blockchain-based system is absolutely transparent, whereas the insertion of new information into it is impossible without the users’ consensus.”
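The immutability claim in the quote rests on hash chaining: each entry stores the hash of the one before it, so changing or reordering an earlier entry breaks every later link. The sketch below is a minimal single-copy hash chain, not a full blockchain (there is no consensus or distribution between participants) and not the university's system; the field names and sample families are invented for illustration.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(index, name, registered_at, prev_hash):
    """Hash an entry's contents together with the previous entry's hash."""
    record = json.dumps([index, name, registered_at, prev_hash]).encode("utf-8")
    return hashlib.sha256(record).hexdigest()


def append_entry(chain, name, registered_at):
    """Append a waiting-list entry linked to the current end of the chain."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    index = len(chain)
    chain.append({
        "index": index,
        "name": name,
        "registered_at": registered_at,
        "prev_hash": prev_hash,
        "hash": entry_hash(index, name, registered_at, prev_hash),
    })


def is_valid(chain):
    """Recompute every hash and link; any edit, insertion, or reorder fails."""
    prev_hash = GENESIS_HASH
    for index, entry in enumerate(chain):
        expected = entry_hash(index, entry["name"], entry["registered_at"], prev_hash)
        if (entry["index"] != index
                or entry["prev_hash"] != prev_hash
                or entry["hash"] != expected):
            return False
        prev_hash = entry["hash"]
    return True


# Hypothetical waiting list
waiting_list = []
append_entry(waiting_list, "Family A", "2018-09-01T09:00")
append_entry(waiting_list, "Family B", "2018-09-01T09:05")
append_entry(waiting_list, "Family C", "2018-09-02T14:30")
print(is_valid(waiting_list))  # True

# Someone tries to move Family B ahead of Family A
tampered = [dict(entry) for entry in waiting_list]
tampered[0]["name"], tampered[1]["name"] = tampered[1]["name"], tampered[0]["name"]
print(is_valid(tampered))  # False
```

A determined tamperer could recompute all the hashes in one copy, which is why the quoted article stresses comparison against the copies held by other participants.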

This is one in a series of projects in Russia that have embraced the use of blockchain technology to benefit society. Other uses include managing social-services payments and plans for patient control over medical records. More applications are being planned in areas like education, charity, and adoption.

Cynthia Murrell, September 5, 2018

IBM Embraces Blockchain. Watson Watches

August 10, 2018

IBM recently announced the creation of LedgerConnect, a Blockchain-powered banking service. This is an interesting move for a company that previously seemed to waver on whether it wanted to associate with a technology most famous for its links to cryptocurrency. However, the pairing actually makes sense, as we discovered in a recent IT Pro Portal story, “IBM Reveals Support Blockchain App Store.”

According to an IBM official:

“On LedgerConnect financial institutions will be able to access services in areas such as, but not limited to, know your customer processes, sanctions screening, collateral management, derivatives post-trade processing and reconciliation and market data. By hosting these services on a single, enterprise-grade network, organizations can focus on business objectives rather than application development, enabling them to realize operational efficiencies and cost savings across asset classes.”

This comes in addition to recent news that some of the biggest banks on the planet are already using Blockchain for a variety of needs. This includes the story that the Agricultural Bank of China has started issuing large loans using the technology. In fact, out of the 26 publicly owned banks in China, nearly half are using Blockchain. IBM looks conservative when you think of it like that, which is just where IBM likes to be. Watson, we believe, is watching, able to answer questions about the database du jour.

Patrick Roland, August 10, 2018

Silos Are a Natural Consequence of Information: Learn to Love Them

July 30, 2018

How To Eradicate Unnecessary Data Silos

A piece at the SmartDataCollective explains “How to Eliminate Silos in Company-Wide Data Analytics.” Writer Larry Alton explains:

“Silos emerge when a cluster of individuals in your company (usually within a specific department) have trouble communicating with, or collaborating with another cluster of individuals in your company (usually within another department). In some ways, this is a natural result of building a company; if you want your sales team to focus on sales and your marketing team to focus on marketing, eventually, it will be difficult for your sales and marketing staff to collaborate on a mutual problem. But if you want your company’s data to be streamlined, accessible, and impactful to your organization’s bottom line, you’ll need to eliminate these silos, or at least mitigate their development.”

The piece lists the reasons silos are to be avoided and we agree, in general, with Alton’s points. However, we observe that data isolation by department is required in some sectors—intelligence, law enforcement, and pharmaceuticals, for example. Alton offers specific advice in his list, “How to Break Silos Down,” so see the piece for that info.

The problem, however, is that data silos are a fact of life in many organizations. Examples range from the 23andMe data now shared with a major pharmaceutical company to information in the possession of an attorney allegedly bound by confidentiality obligations. The idea that federating a wide range of data is a natural condition goes against individual and corporate behavior.

Talk about data silos is one thing. Delivering a giant data lake with open access to those with permission to view the data is another. When a new project gets off the ground, how are the data handled? The answer, “In a silo.” Toss in a government requirement for secrecy or a corporate rule about secret drug research, and you have silos.

Who doesn’t want silos?

Cynthia Murrell, July 30, 2018

IBM and a University Tie Up or Tie Down

July 26, 2018

I wanted to comment about the resuscitation of IBM’s cancer initiative at the Veterans Administration. But that’s pure Watson, and I think Watson has become old news.

A more interesting “galactico” initiative at IBM is blockchain.

What’s bigger than Watson?

Blockchain. Well, that’s the hope.

IBM is grasping tightly to blockchain technology, this time through an academic partnership, we learn in CoinDesk’s piece, “IBM Teams with Columbia to Launch Blockchain Research Center.” Located on the Manhattan campus of Columbia University, the center hopes to speed the development of blockchain apps and cultivate education initiatives. Writer Wolfie Zhao elaborates:

“A dedicated committee comprised of both Columbia faculty members and IBM research scientists will start reviewing proposals for blockchain ‘curriculum development, business initiatives and research programs’ later this year. In addition, the center will advise on regulatory issues for startups in the blockchain space and provide internship opportunities to improve technical skills for students and professionals with an interest in the tech.”

Zhao also notes this move fits into a larger trend:

“The announcement marks the latest effort by the blockchain industry to invest in a top-tier university in the U.S. to accelerate blockchain understanding and adoption. As reported by CoinDesk in June, San Francisco-based distributed ledger startup Ripple said it will invest $2 million in blockchain research initiatives in the University of Texas at Austin in the next five years, as part of its pledge to invest $50 million in worldwide institutions.”

For those who are interested in the University of Texas at Austin’s Blockchain Initiative, there is more information here, via the university’s McCombs School of Business. Ripple, by the way, was founded in 2012 specifically to capitalize on blockchain technology. Though it is indeed based in San Francisco, the company also maintains offices in New York City and Atlanta.

Perhaps IBM will just buy university research departments before Amazon, Facebook, and Google consume the blockchain academic oxygen?

Cynthia Murrell, July 26, 2018
