BigQuery Equals Big Data Transfers for Google
March 16, 2018
Google provides hundreds of services for its users; these include YouTube, AdWords, DoubleClick Campaign Manager, and more. Google, however, is mainly used as a search engine, and all of the content on its other services is fed into the search algorithm so it can be queried. In order for all of the content to be searchable, it needs to be dumped and mined. That requires a lot of processing power, so what does Google use? According to the Smart Data Collective article “Big Query Service: Next Big Thing Unveiled By Google On Big Data,” the answer is BigQuery.
Google and big data have not been in the news together for a while, but the BigQuery Data Transfer Service shows how the company is moving away from SaaS. How exactly does this work?
According to a Google blog post, the new service automates the migration of data from these apps into BigQuery on a scheduled, managed basis. So far, so good: the service will support data transfers from AdWords, DoubleClick Campaign Manager, DoubleClick for Publishers, YouTube Content and Channel Owner Reports, and so forth. As soon as the data gets to BigQuery, users can begin querying it immediately. With Google Cloud Dataprep, users can not only clean and prepare the data for analysis but also analyze other data alongside the information kept in BigQuery.
The data moves from the apps within 24 hours, and BigQuery customers can schedule their own data deliveries so they occur regularly. Existing BigQuery customers include Trivago and Zenith.
The article then turns into a press release for other Google services related to machine learning and explains how Google is the leading company in the industry. It is simply an advertisement for cloud migration and yet another Google service.
Whitney Grace, March 16, 2018
Visualization Aims to Be Huge in 2018
March 9, 2018
All the data in the world won’t do you much good if users can’t visualize it. This has been true of computing since Steve Jobs was working out of a garage. But with today’s onslaught of big data, it’s more important than ever. Luckily, visualization is going to be huge in the coming year, according to a recent Business Wire article, “IHS Markit Identifies the Top Eight Tech Trends for 2018.”
According to the story, the two best trends are:
Trend #5: Ubiquitous video
The growing use of screens and cameras across multiple consumer- and enterprise-device categories, along with increasingly advanced broadcast, fixed and mobile data networks, is powering an explosion in video consumption, creation, distribution and data traffic. More importantly, video content is increasingly expanding beyond entertainment into industrial applications for medical, education, security and remote controls, as well as digital signage.
Trend #6: Computer vision
The increasing importance of computer vision is directly tied to the mega-trend of digitization that has been playing out in the industrial, enterprise and consumer segments. The proliferation of image sensors, as well as improvements in image processing and analysis, are enabling a broad range of applications and use cases including industrial robots, drone applications, intelligent transportation systems, high-quality surveillance, and medical and automotive.
Perhaps nowhere will this intersection of big data and visualization be bigger than with AI. Experts are ready for artificial intelligence to become user-friendly and they all say it’ll be through visualization. Just wait to see what the new year brings.
Patrick Roland, March 9, 2018
Governance: Now That Is a Management Touchstone for MBA Experts
February 27, 2018
I read “Unlocking the Power of Today’s Big Data through Governance.” That headline is quite a lab-grown meat wiener of a statement: “unlocking,” “power,” “Big Data,” and “governance.” Yep, IDG, the outfit which cannot govern its own agreements with the people the firm pays to make the IDG experts so darned smart. (For the back-story, check out this snapshot of governance in action.)
What’s the write up with the magical word governance about?
Instead of defining “governance,” I learn what governance is not; to wit:
Data governance isn’t about creating a veil of secrecy around data
I have zero idea what this means. Back to the word “governance.” Google and Wikipedia define the word in this way:
Governance is all of the processes of governing, whether undertaken by a government, market or network, whether over a family, tribe, formal or informal organization or territory and whether through the laws, norms, power or language of an organized society.
Okay, governing. What’s governing mean? Back to the GOOG. Here’s one definition which seems germane to MBA speakers:
control, influence, or regulate (a person, action, or course of events).
The essay drags out the chestnuts about lots of information. Okay, I think I understand because Big Data has been touted for many years. Now, mercifully I assert, the drums are beating out the rhythm of “artificial intelligence” and its handmaiden “algos,” the terrific abbreviation some of the marketing jazzed engineers have coined. Right, algos, bro.
What’s the control angle for Big Data? The answer is that “data governance” will deal with:
- Shoddy data
- Incomplete data
- Off point data
- Made up data
- Incorrect data
Presumably these thorny issues will yield to a manager who knows the ins and outs of governance. I suppose there are many experts in governance; for example, the fine folks who have tamed content chaos with their “governance” of content management systems or the archiving mavens who have figured out what to do with tweets at the Library of Congress. (The answer is to not archive tweets. There you go. Governance in action.)
The article suggests a “definitive data governance program.” Right. If one cannot deal with backfiles, changes to the data in the archives, and the new flows of data, how does one do the “definitive governance program” thing? The answer is, “Generate MBA baloney and toss around buzzwords.” Check out the list of tasks which, in my experience, are difficult to accomplish even when resources are available and the organization has a can-do attitude:
- Document data and show its lineage.
- Set appropriate policies, and enforce them.
- Address roles and responsibilities of everyone who touches that data, encouraging collaboration across the organization.
These types of tasks are the lifeblood of consultants who purport to have the ability to deliver the near impossible.
What happens if we apply the guidelines in the Governance article to the data sets listed in “Big Data And AI: 30 Amazing (And Free) Public Data Sources For 2018”? In my experience, the cost of normalizing the data is likely to be out of reach for most organizations. Once these data have been put in a form that permits machine-based quality checks, the organization has to figure out what questions the data can answer with a reasonable level of confidence. Getting over these hurdles then raises the question, “Are these data up to date?” And, if the data are stale, “How do we update the information?” There are, of course, other questions, but the flag waving about governance operates at an Ivory Tower level. Dealing with data takes place with one’s knees on the ground and one’s hands in the dirt. If the public data sources are not pulling the hay wagon, what’s the time, cost, and complexity of obtaining original data sets, validating them, and whipping them into shape for use by an MBA?
You know the answer: “This is not going to happen.”
Here’s a paragraph which I circled in Oscar Mayer wiener pink:
One of the more significant, and exciting, changes in data governance has been the shift in focus to business users. Historically, data has been a technical issue owned by IT and locked within the organization by specific functions and silos. But if data is truly going to be an asset, everyday users—those who need to apply the data in different contexts—must have access and control over it and trust the data. As such, data governance is transforming from a technical tool to a business application. And chief data officers (CDOs) are starting to see the technologies behind data governance as their critical operating environment, in much the same way SAP serves CFOs, and Salesforce supports CROs. It is rare to find an opportunity to build a new system of record for a market.
Let’s look at this low calorie morsel and consider some of its constituent elements. (Have you ever seen wieners being manufactured? Fill in that gap in your education if you have not had the first hand learning experience.)
First, business users want to see a pretty dashboard, click on something that looks interesting in a visualization, and have an answer delivered. Most of the business people I know struggle to determine whether the data in their systems are accurate, and they have limited expertise in the mathematical processes which churn away to display an “answer.”
The reference to SAP is fascinating, but I think of IBM-type systems as somewhat out of step with the more sophisticated tools available to deal with certain data problems. In short, SAP is an artifact of an earlier era, and its lessons, even when understood, have been inadequate in the era of real time data analysis.
Let me be clear: Data governance is a management malarkey. Look closely at organizations which are successful. Peer inside their data environments. When I have looked, I have seen clever solutions to specific problems. The cleverness can create its own set of challenges.
The difference between a Google and a Qwant, a LookingGlass Cyber and IBM i2, or Amazon and Wal-Mart is not Big Data. It is not the textbook definition of “governance.” Success has more to do with effective problem solving on a set of data required by a task. Google sells ads and deals with Big Data to achieve its revenue goals. LookingGlass addresses chat information for a specific case. Amazon recommends products in order to sell more products.
Experts who invoke governance on a broad scale as a management solution are disconnected from the discipline required to identify a problem and deal with data required to solve that problem.
Few organizations can do this with their “content management systems,” their “business intelligence systems,” or their “product information systems.” Why? Talking about a problem is not solving a problem.
Governance is wishful thinking and not something that is delivered by a consultant. Governance is an emergent characteristic of successful problem solving. Governance is not paint; it is not delivered by an MBA and a PowerPoint; it is not a core competency of jargon.
In Harrod’s Creek, governance is getting chicken to the stores in the UK. Whoops. That management governance is not working. So much in modern business does not work very well.
Stephen E Arnold, February 27, 2018
Can PressCoin Keep News from Going Fake?
February 23, 2018
Fake news is a topic that has everyone on all sides of the aisle concerned. However, one innovative idea that brings big data to bear on fake news might solve the problem. We learned more from a recent story in The Next Web, “PressCoin: The Largest Crowdfunding Effort to Address the News Crisis.”
According to the story:
As the GDP of the PressCoin economy grows, the value of tokens will rise.
And blockchain keeps it honest. Our hope is to grow a decentralized media network reaching 100 mln engaged ‘users/prosumers’ in five years time, and to meaningfully address 10 percent of the news media industry within a couple of years of that.
PressCoin’s business strategy rests on these legs:
- Shared Design Philosophy– Open Collaboration, Partnership, and Decentralization
- Shared Technology Infrastructure– Seamless media technology cloud, underlying big-data systems, advanced APIs for engagement and analytics
- Shared Business Services– Consumer Data Intelligence, Monetization, Enterprise Sales, Ecosystem Partnerships, Strategic Relations
- Shared Developer Network– This fertile playground for agile experiments in the news/media/journalism sphere
- Shared Fiat/Crypto Financial Services Infrastructure– Built on Cointype
- Shared Venture Arm– Foster disruption within the ecosystem
While PressCoin is just getting off the ground, big data is already being used to sort out whether news can be trusted. One interesting use has been research suggesting that fake news might not have had that big an impact on the 2016 election. This could be the dawn of some very insightful times.
Patrick Roland, February 23, 2018
Big Data and Net Freedom in China Make a Complicated Relationship
February 21, 2018
One of China’s hottest new apps uses a big data engine unlike anything most of us can imagine; however, that horsepower is getting the company in trouble. We learned more in a recent Slashdot piece, “Toutiao, One of China’s Most Popular News Apps, is Discovering the Risks Involved in Giving People Exactly What They Want Online.”
The piece draws on a New York Times article, which says:
Now the company is discovering the risks involved, under China’s censorship regime, in giving the people exactly what they want. The makers of the popular news app Jinri Toutiao unveiled moves this week to allay rising concerns from the authorities.
Last week, the Beijing bureau of China’s top internet regulator accused Toutiao of “spreading pornographic and vulgar information” and “causing a negative impact on public opinion online,” and ordered that updates to several popular sections of the app be halted for 24 hours. In response, the app’s parent company, Beijing Bytedance Technology, took down or temporarily suspended the accounts of more than 1,100 bloggers that it said had been publishing “low-quality content” on the app. It also replaced Toutiao’s “Society” section with a new section called “New Era,” which is heavy on state media coverage of government decisions.
Toutiao is the vanguard of a growing movement in China. For years, citizens knew they were being tracked by the government, but now they are beginning to demand privacy. We certainly hope they can get there but are mighty skeptical. Good luck!
Patrick Roland, February 21, 2018
Palantir: Accused of Hegelian Contradictions
January 29, 2018
I bet you have not thought about Hegel since you took that required philosophy course in college. Well, Hegel and his “contradictions” are central to “WEF 2018: Davos, Data, Palantir and the Future of the Internet.”
I highlighted this passage from the essay:
Data is the route to security. Data is the route to oppression. Data is the route to individual ideation. Data is the route to the hive mind. Data is the route to civic wealth. Data is the route to civic collapse.
Thesis, antithesis, synthesis in action, I surmise.
The near term objective is synthesis. I assume this is the “connecting the dots” approach to finding what one needs to know.
I learned:
The stakes for big data couldn’t be bigger.
Okay, a categorical statement in our fast-changing, diverse economic and political climate. Be afraid seems to be the message.
Palantir’s base of operations in Davos is described in the write up as “a pimped up liquor store.” Helpful and highly suggestive too.
The conclusion of the essay warranted a big red circle:
So next time you hear the names Palantir or Alex Karp, stop what you’re doing and pay attention. The future – your future – is under discussion. Under construction. This little first draft of history of which you’ve made it to the end (congratulations and thanks) – the history of data – is of a future that will in time come to be seen for what it is: digital that truly matters.
Several observations:
- The author wants me to believe that Palantir is not a pal.
- The big data thing troubles the author because Palantir is one of the vendors providing next generation information access.
- The goal of making Palantir into something unique is best accomplished by invoking Fancy Dan ideas.
I would suggest that knowledge about companies like Gamma Group FinFisher, Shoghi, Trovicor, and some other interesting non US entities might put Palantir in perspective. Palantir has an operational focus; some of the other vendors perform different information services.
Palantir is an innovator, but it is part of a landscape of data intercept and analysis organizations. I could make a case that Palantir is capable but some companies in Europe and the East are actually more technologically advanced.
But these outfits were not at Davos. Why? That’s a good question. Perhaps they were too busy with their commercial and government work. My hunch is that a few of these outfits were indeed “there”, just not noticed by the expert who checked out the liquor store.
Stephen E Arnold, January 29, 2018
We Are Without a Paddle on Growing Data Lakes
January 18, 2018
The pooling of big data is commonly known as a “data lake.” While this technique was first met with excitement, it is beginning to look like a problem, as we learned in a recent InfoWorld story, “Use the Cloud to Create Open, Connected Data Lakes for AI, Not Data Swamps.”
According to the story:
A data scientist will quickly tell you that the data lake approach is a recipe for a data swamp, and there are a few reasons why. First, a good amount of data is often hastily stored, without a consistent strategy in place around how to organize, govern and maintain it. Think of your junk drawer at home: Various items get thrown in at random over time, until it’s often impossible to find something you’re looking for in the drawer, as it’s gotten buried.
This disorganization leads to the second problem: users are often not able to find the dataset once ingested into the data lake.
So, how does one turn aggregated data from a stagnant swamp into a lake one can traverse? According to Scientific Computing, the secret lies in separating the search function into two pieces: finding and searching. When you combine this thinking with InfoWorld’s logic of using the cloud, suddenly these massive swamps are drained.
Patrick Roland, January 18, 2018
Google Tries Like Crazy to End Extreme Content Controversy
January 16, 2018
Google is having a tough time lately. When it purchased YouTube, few thought extremist videos and wonky children’s programming would be its most concerning headaches. But its solutions remain strained, as we discovered in a recent Verge story, “YouTube Has Removed Thousands of Videos from Extremist Cleric Anwar Al-Awlaki.”
Google removed hundreds of al-Awalaki’s videos in 2010 which directly advocated violence, following the conviction of Roshonara Choudhry, a radicalized follower who stabbed British MP Stephen Timms earlier that year. At the time, a YouTube spokesperson cited the site’s guidelines against inciting violence. But al-Awalaki posted tens of thousands of other videos, and in subsequent years, was cited as an influence in other notable terrorist attacks at Fort Hood, the Boston Marathon, San Bernardino, and Orlando, Florida.
This comes on the heels of another Verge story with a similar issue, “YouTube Says it Will Crack Down on Bizarre Videos Targeting Children.”
“We’re in the process of implementing a new policy that age restricts this content in the YouTube main app when flagged,” said Juniper Downs, YouTube’s director of policy. “Age-restricted content is automatically not allowed in YouTube Kids.” YouTube says that it’s been formulating this new policy for a while, and that it’s not rolling it out in direct response to the recent coverage.
Google is trying to do better, but it seems like it is fighting off an avalanche with a snow shovel. Luckily, as the Washington Post points out, the United States leads the world in terms of big data. One can hope that a solution lies in there somewhere, but good luck predicting what it will be.
Patrick Roland, January 17, 2018
One of Big Data’s Giants Accused of Big Time Fraud
January 15, 2018
Palantir, one of the biggest names in big data, has been praised for its innovative solutions since it began in 2004. However, it has been getting attention for all the wrong reasons lately, as we saw in a recent Deal Street Asia story, “Palantir Holder Says Company Sabotaged Stock Sale to Chinese.”
One of Palantir Technologies Inc.’s early investors accused the data-mining startup of sabotaging his attempt to sell his $60 million stakes to a Chinese company so directors and executives could enrich themselves by selling their stock instead.
Marc Abramowitz, a 63-year-old lawyer and investor, contends that when Palantir executives got wind of his offer to sell his stock to Chinese private equity firm CDH Investments Fund Management Co., they sunk the deal by offering to sell their shares to CDH instead, according to a lawsuit filed Thursday in Delaware. Palantir’s campaign to spoil Abramowitz’s sale demonstrates the Silicon Valley company’s “willingness to intentionally interfere with shareholder transactions in an effort…”
It may be tough to prove this in court, however. Palantir is famous for its secrecy, though that may become a thing of the past if it goes public. Either way, this is an interesting look at the cutthroat world of big data and the things people may do to stay on top.
Patrick Roland, January 15, 2018
Big Data Logic Turning Government on Its Ear
January 3, 2018
Can the same startup spirit that powers so many big data companies disrupt the way the government operates? According to a lot of experts, that’s exactly what is happening. We discovered more in a recent Nextgov article, “This Company is Trying to Turn Federal Agencies into Startups.”
According to the story:
BMNT Partners, a Palo Alto-based company, is walking various government agencies through the process of identifying pressing problems and then creating teams that compete against each other to design the best solution. The best of those products might warrant future investments from the agency.
The process begins when an agency presents BMNT with an array of problems it faces internally; BMNT staff helps them narrow down the problem scope, conduct market research to identify the problems that could pique interest from commercial companies, and then track down experts within the agency who can evaluate the solutions. BMNT also helps agencies create various teams of three or four employees who can start building minimum viable products. Newell explained those employees often are selected from the pool within the chief information officer’s or chief technology officers’ staffs.
This seems like a very plausible avenue. Federal agencies are already embracing machine learning and AI, so why not move a little further in this direction? We are looking forward to seeing how this pans out, but chances are this is something the government cannot ignore.
Patrick Roland, January 3, 2018