Blockchain Quote to Note: The Value of Big Data as an Efficient Error Reducer

September 6, 2017

I read “Blockchains for Artificial Intelligence: From Decentralized Model Exchanges to Model Audit Trails.” The foundation of the write-up is that blockchain technology can be used to bring more control to data and models. The idea is an interesting one. I spotted a passage tucked into the lower 20 percent of the article which I judged to be a quote to note. Here’s the passage I highlighted:

as you added more data — not just a bit more data but orders of magnitude more data — and kept the algorithms the same, then the error rates kept going down, by a lot. By the time the datasets were three orders of magnitude larger, error was less than 5%. In many domains, there’s a world of difference between 18% and 5%, because only the latter is good enough for real-world application. Moreover, the best-performing algorithms were the simplest; and the worst algorithm was the fanciest. Boring old perceptrons from the 1950s were beating state-of-the-art techniques.

Bayesian methods date from the 18th century and work well. Despite Laplacian and Markovian bolt-ons, the drift problem bedevils some implementations. The solution? Pump in more training data, and the centuries-old techniques work like a jazzed millennial with a bundle of venture money.
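The article offers no code, but the scaling claim in the quote is easy to demonstrate. Below is a minimal sketch, assuming scikit-learn and a synthetic dataset (both my choices, not the article’s): the same 1950s-style perceptron is trained on ever-larger slices of data while the holdout set stays fixed, so any drop in error comes from data volume alone.

```python
# Minimal sketch: hold the algorithm constant, scale the training data,
# and watch the holdout error fall. Uses scikit-learn's Perceptron; the
# dataset and the sample sizes are illustrative, not from the article.
from sklearn.datasets import make_classification
from sklearn.linear_model import Perceptron
from sklearn.model_selection import train_test_split

# One synthetic problem, one fixed holdout set for a fair comparison.
X, y = make_classification(n_samples=120_000, n_features=40,
                           n_informative=20, random_state=0)
X_pool, X_test, y_pool, y_test = train_test_split(
    X, y, test_size=20_000, random_state=0)

# Same algorithm at every scale; only the data grows.
for n in (100, 1_000, 10_000, 100_000):
    clf = Perceptron(random_state=0).fit(X_pool[:n], y_pool[:n])
    error = 1.0 - clf.score(X_test, y_test)
    print(f"n={n:>7,}  holdout error={error:.1%}")
```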

Care to name a large online outfit which may find this an idea worth nudging forward? I don’t think it will be Verizon Oath or Tronc.

Stephen E Arnold, September 6, 2017

Big Data Visualization the Open Source Way

August 10, 2017

Though Big Data was hailed in a big way, it has yet to gain full steam because of a shortage of talent. Companies working in this domain are taking another swipe by offering visualization tools for free.

The Customize Windows, in an article titled List of Open Source Big Data Visualization Tools, observes:

There are a growing number of websites which write about Big Data and cloud computing and spread wrong information to sell other paid things.

Many industries have tried the freemium route to attract talent and promote their products. For instance, Linux vendor Penguin Computing offered its product to users for free. The move sparked interest among users who wanted to try something other than Windows and macOS.

The move created a huge base of Linux users and also attracted talent to promote research and development.

Big Data players, it seems, are following the same strategy by offering data visualization tools for free, which they will monetize later. All that is needed now is patience.

Vishal Ingole, August 10, 2017

Big Data Can Reveal Darkest Secrets

August 8, 2017

Surveys have long been used by social and commercial organizations to collect data. However, with easy access to Big Data, it seems people lie on surveys more than previously thought.

In an article by Seth Stephens-Davidowitz published by The Guardian and titled Everybody Lies: How Google Search Reveals Our Darkest Secrets, the author says:

The power in Google data is that people tell the giant search engine things they might not tell anyone else. Google was invented so that people could learn about the world, not so researchers could learn about people, but it turns out the trails we leave as we seek knowledge on the internet are tremendously revealing.

According to the author, the impersonal anonymity of the Internet and its ease of access are primary reasons why users reveal their darkest secrets to Google (in the form of queries).

Big Data, which is data scoured from various sources, can be a reliable source of information. For instance, surveys say that around 10% of American men are gay; Big Data, however, suggests that only 2-3% of men actually are. For more interesting insights on Big Data, courtesy of Google, read the article here.

Vishal Ingole, August 8, 2017

Big Data Too Is Prone to Human Bug

August 2, 2017

Conventional wisdom says Big Data, being the realm of machines, is immune to human behavioral traits like discrimination. Data scientists, however, suggest otherwise.

In an article published by PHYS.ORG titled Discrimination, Lack of Diversity, and Societal Risks of Data Mining Highlighted in Big Data, the author says:

Despite the dramatic growth in big data affecting many areas of research, industry, and society, there are risks associated with the design and use of data-driven systems. Among these are issues of discrimination, diversity, and bias.

The crux of the problem is the way data is mined and processed and the way decisions are made. At every step, humans need to be involved to tell machines how each of these processes is executed. If the person guiding the system is biased, those biases are bound to seep into the subsequent processes in some way.
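To make the mechanism concrete, here is a minimal sketch (my own illustration, not from the article): historical approval decisions carry a human bias, the sensitive attribute is dropped before training, and the model reproduces the disparity anyway through a correlated proxy feature.

```python
# Minimal sketch (my illustration, not from the article): a model trained
# on decisions made by a biased process reproduces the bias, even when
# the sensitive attribute is excluded, because a proxy feature leaks it.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 20_000
group = rng.integers(0, 2, n)              # sensitive attribute (0 or 1)
skill = rng.normal(0.0, 1.0, n)            # legitimate signal
zipcode = group + rng.normal(0.0, 0.5, n)  # proxy correlated with group

# Biased historical labels: equal skill, but group 0 is penalized.
approved = skill - 0.8 * (group == 0) + rng.normal(0.0, 0.3, n) > 0

# Train WITHOUT the group column; the proxy leaks it back in.
X = np.column_stack([skill, zipcode])
pred = LogisticRegression().fit(X, approved).predict(X)

for g in (0, 1):
    print(f"group {g}: predicted approval rate {pred[group == g].mean():.1%}")
```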

Apart from decisions like granting credit, human resources, which is also being automated, may have diversity issues. The fundamental problem remains the same in this case too.

Big Data was touted as the next big thing and may turn out to be so, but most companies have yet to figure out how to utilize it. Streamlining the processes and making them efficient would be the next step.

Vishal Ingole, August 2, 2017

Machine Learning Does Not Have the Mad Skills

July 25, 2017

Machine learning and artificial intelligence are computer algorithms that promise to revolutionize the industry, but The Register explains there is a problem with launching them: “Time To Rethink Machine Learning: The Big Data Gobble Is OFF The Menu.” The technology industry claims that 50 percent of organizations plan to transform themselves with machine learning, but the real figure is less than 15 percent.

The machine learning revolution has supposedly started, but in reality the starting cannon has only just been fired and the technology has not been implemented. The problem is that while companies want to use machine learning, they are barely getting off the ground with big data, and machine learning is much harder. Organizations do not have workers with the skills to launch machine learning, and the tech industry as a whole has a huge demand for skilled workers.

Part of this inaction comes down to the massive gap between ML (and AI) myth and reality. As David Beyer of Amplify Partners puts it: ‘Too many businesses now are pitching AI almost as though it’s batteries included.’ This is dangerous because it leads companies to either over-invest (and then face a tremendous trough of disillusionment), or to steer clear when the slightest bit of real research reveals that ML is very hard and not something the average Python engineer is going to spin up in her spare time.

Organizations also do not have enough data to make machine learning feasible, and they lack the corporate culture to do the experimentation machine learning requires.

This article shares a story that we have read many times before. The tech industry gets excited about the newest shiny object, the object explodes in popularity, and then everyone realizes that the business world is not ready to implement the technology.

Whitney Grace, July 25, 2017

Big Data in Biomedicine

July 19, 2017

The biomedical field, which is replete with unstructured data, is set to take a giant leap toward standardization with the Biological Text Mining Unit.

According to PHYS.ORG, in an article titled Researchers Review the State-Of-The-Art Text Mining Technologies for Chemistry, the author states:

Being able to transform unstructured biomedical research data into structured databases that can be more efficiently processed by machines or queried by humans is critical for a range of heterogeneous applications.

Scientific data has a fixed vocabulary, which makes standardization and indexing easier. However, most big names in Big Data and enterprise search are concentrating their efforts on e-commerce.

Hundreds of new compounds are discovered every year. If the data pertaining to these compounds were made available to other researchers, advances in this field would be rapid. The major hurdle is that the data is in an unstructured format, which the Biological Text Mining Unit’s standards intend to overcome.
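The article does not describe the unit’s pipeline, but the basic move from unstructured text to structured records can be sketched in a few lines. Everything below (the compound list, the dose pattern, the record layout) is a hypothetical illustration, not the actual system:

```python
# Toy sketch of the idea, not the Biological Text Mining Unit's actual
# pipeline: pull chemical mentions and measured doses out of free text
# into structured records that a database could index and query.
import re

abstract = ("Treatment with aspirin (50 mg/kg) reduced inflammation, "
            "while ibuprofen (25 mg/kg) showed a weaker effect.")

# Hypothetical dictionary of known compounds followed by a dose.
KNOWN_COMPOUNDS = ["aspirin", "ibuprofen", "caffeine"]
pattern = re.compile(
    r"\b(%s)\b\s*\((\d+(?:\.\d+)?)\s*(mg/kg)\)" % "|".join(KNOWN_COMPOUNDS),
    re.IGNORECASE)

records = [
    {"compound": m.group(1).lower(),
     "dose": float(m.group(2)),
     "unit": m.group(3)}
    for m in pattern.finditer(abstract)
]
print(records)
# [{'compound': 'aspirin', 'dose': 50.0, 'unit': 'mg/kg'},
#  {'compound': 'ibuprofen', 'dose': 25.0, 'unit': 'mg/kg'}]
```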

Vishal Ingole, July 19, 2017

The Big Problems of Big Data

June 30, 2017

Companies are producing volumes of data. However, no fully functional system is able to provide actionable insights to decision makers in real time. Bayesian methods might pave the way for solution seekers.

In an article published by PHYS.ORG and titled Advances in Bayesian Methods for Big Data, the author says:

Bayesian methods provide a principled theory for combining prior knowledge and uncertain evidence to make sophisticated inference of hidden factors and predictions.

Though the methods of data collection have improved, analyzing and presenting actionable insights in real time is still a big problem for Big Data adopters. Human intervention is required at almost every step, which defeats the entire purpose of an intelligent system. Hopefully, Bayesian methods can resolve these issues. Experts have been reluctant to adopt Bayesian methods because they are slow and do not scale. However, with recent advances in machine learning, the methods might work.
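The quoted definition maps directly onto the simplest Bayesian workhorse, the conjugate Beta-Binomial update. The sketch below is illustrative only; the click-through scenario and the numbers are my assumptions, not from the article:

```python
# Minimal sketch of the quoted idea: a Beta-Binomial model combines
# prior knowledge with uncertain evidence to update a belief.
from scipy import stats

# Prior knowledge: we believe a click-through rate is around 5%.
alpha_prior, beta_prior = 5, 95          # Beta(5, 95), mean = 0.05

# Uncertain evidence: 1,000 impressions, 72 clicks.
clicks, impressions = 72, 1_000

# Conjugate update: posterior is Beta(alpha + clicks, beta + misses).
alpha_post = alpha_prior + clicks
beta_post = beta_prior + (impressions - clicks)
posterior = stats.beta(alpha_post, beta_post)

print(f"posterior mean: {posterior.mean():.3f}")
print(f"95% credible interval: {posterior.interval(0.95)}")
```

The appeal for Big Data is that the update is incremental: each new batch of evidence folds into the posterior, which becomes the prior for the next batch.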

Vishal Ingole, June 30, 2017

HPE IDOL Released with Natural Language Processing Capabilities Aimed at Enterprise-Level Tasks

June 16, 2017

The article titled Hewlett Packard Enterprise Enriches HPE IDOL Machine Learning Engine With Natural Language Processing on SDTimes discusses the enhancements to HPE IDOL. The challenges in creating an effective interactive experience based on Big Data for enterprise-class inquiries stem from the sheer complexity of the inquiries. Additional issues arise around context, specificity, and source validation. The article examines the new and improved model:

HPE Natural Language Question Answering deciphers the intent of a question and provides an answer or initiates an action drawing from an organization’s own structured and unstructured data assets, in addition to available public data sources to provide actionable, trusted answers and business critical responses… HPE IDOL Natural Language Question Answering is a core feature of the new HPE IDOL 11.2 software release that features four key capabilities for natural language processing for the enterprise.

These capabilities are the IDOL Answer Bank (with pre-set reference questions), Fact Bank (with structured and unstructured data extraction abilities), Passage Extract (for text-based summaries), and Answer Server (for question analysis and integration of the other three areas). The goal is natural conversation between people and computers, an “information exchange.” The four capabilities work together to deliver a complex answer with the utmost accuracy and relevance.
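HPE does not publish the internal flow in the article, but the division of labor suggests an orchestration pattern along the lines of the hypothetical sketch below. Every class and method name here is invented for illustration; none of it is HPE’s actual API:

```python
# Purely hypothetical sketch of how four QA components like IDOL's might
# compose; all names are invented for illustration, not HPE's API.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Answer:
    text: str
    source: str
    confidence: float

class AnswerBank:
    """Pre-set reference questions with curated answers."""
    def __init__(self):
        self.curated = {"what is our refund window?":
                        Answer("30 days", "answer_bank", 0.99)}
    def lookup(self, question: str) -> Optional[Answer]:
        return self.curated.get(question.lower())

class FactBank:
    """Structured facts extracted from enterprise data."""
    def query(self, question: str) -> Optional[Answer]:
        return None  # stub: would consult a structured store

class PassageExtract:
    """Text-based summaries pulled from unstructured documents."""
    def extract(self, question: str) -> Answer:
        return Answer("Best matching passage...", "passage_extract", 0.6)

class AnswerServer:
    """Analyzes the question and integrates the other three areas."""
    def __init__(self):
        self.bank = AnswerBank()
        self.facts = FactBank()
        self.passages = PassageExtract()
    def ask(self, question: str) -> Answer:
        # Prefer curated answers, then structured facts, then passages.
        return (self.bank.lookup(question)
                or self.facts.query(question)
                or self.passages.extract(question))

print(AnswerServer().ask("What is our refund window?"))
```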

Chelsea Kerwin, June 16, 2017

The Big Dud

April 24, 2017

Marketers periodically need a fancy term to sell technologies to large companies. Big Data and Hadoop were such terms. After years of marketing, adopters have yet to see any results, let alone any ROI.

Datanami recently published an article titled Hadoop Has Failed Us, Tech Experts Say in which the author says:

Many companies still run mainframe applications that were originally developed half a century ago. But thanks to better mousetraps like S3 (for storage) and Spark (for processing), Hadoop will be relegated to niche and legacy statuses going forward.

One of the primary concerns with Hadoop is that only a handful of people know how to use it. For data scientists to make head or tail of the data, precise querying and mining need to be done. The dearth of experts, however, is hampering the efforts of companies that want to make Big Data work for them. Other frameworks are trying to overcome the problems posed by Hadoop, but many companies have already adopted it and are stuck with it. And just like many fads, Big Data might fade into oblivion.
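The article names its “better mousetraps” only in passing, but the S3-plus-Spark pattern looks roughly like the sketch below. The bucket path, schema, and column names are placeholders; this assumes PySpark is installed and S3 credentials are configured:

```python
# Minimal PySpark sketch of the "S3 for storage, Spark for processing"
# pattern the article mentions. The bucket path and column names are
# placeholders, not real data.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("s3-aggregation").getOrCreate()

# Read straight from object storage -- no HDFS cluster to maintain.
events = spark.read.json("s3a://example-bucket/events/*.json")

# A typical aggregation that once meant hand-written MapReduce jobs.
daily = (events
         .groupBy(F.to_date("timestamp").alias("day"), "event_type")
         .agg(F.count("*").alias("events"))
         .orderBy("day"))

daily.show()
spark.stop()
```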

Vishal Ingole, April 24, 2017

AI Might Not Be the Most Intelligent Business Solution

April 21, 2017

Big data was the buzzword a few years ago, but now artificial intelligence is the tech jargon of the moment. While big data was a more plausible solution for companies trying to mine information from their digital data, AI is proving difficult to implement. Forbes discusses AI difficulties in the article, “Artificial Intelligence Is Powerful Stuff, But Difficult To Scale To Real-Life Business.”

There is a lot of excitement brewing around machine learning and AI business possibilities, but while the technology is ready for use, workers are not. People need to be prepped and taught how to use AI and machine learning technology; without the proper lessons, the technology will hurt a company’s bottom line. The problem comes from companies rolling out digital solutions without changing the way they conduct business. Workers cannot just adapt to changes instantly. They need to feel like they are part of the solution instead of being shifted to the side by the latest technological trend.

Dr. David Bray, CIO for the Federal Communications Commission, said:

The growth of AI may shift thinking in organizations. ‘At the end of the day, we are changing what people are doing,’ Bray says. ‘You are changing how they work, and they’re going to feel threatened if they’re not bought into the change. It’s almost imperative for CIOs to really work closely with their chief executive officers, and serve as an internal venture capitalist, for how we bring data, to bring process improvements and organizational performance improvements – and work it across the entire organization as a whole.’

Artificial intelligence and machine learning are an upgrade not only to a company’s technology but also to how the company conducts business. Business processes will need to be updated to integrate the new technology, as will the ways workers use and interact with it. Businesses will continue to face problems if they think that changing the technology, but not their procedures, is the final solution.

Whitney Grace, April 21, 2017
