Universal Text Translation Is the Next Milestone for AI

February 9, 2018

As the globe gets smaller, individuals are increasingly in contact with people who do not speak their language, or find themselves reading information written in a foreign one. Programs like Google Translate are flawed at best, and it is clear this is a niche waiting to be filled. With the rise of AI, it looks like that is about to change, according to a recent GCN article, “IARPA Contracts for Universal Text Translator.”

According to the article:

The Intelligence Advanced Research Projects Activity is a step closer to developing a universal text translator that will eventually allow English speakers to search through multilanguage data sources — such as social media, newswires and press reports — and retrieve results in English.


The intelligence community’s research arm awarded research and performance monitoring contracts for its Machine Translation for English Retrieval of Information in Any Language program to teams headed by leading research universities paired with federal technology contractors.


Intelligence agencies, said IARPA project managers in a statement in late December, grapple with an increasingly multilingual, worldwide data pool to do their analytic work. Most of those languages, they said, have few or no automated tools for cross-language data mining.

This sounds like a very promising opportunity to get everyone speaking the same language. However, we think there is still a lot of room for error. We are hedging our bets on Unbabel’s AI translation software, which is backed up by human editors. (They raised $23M, so they must be doing something right.) That human angle seems to be the hinge on which success will turn for someone in this rich field.

Patrick Roland, February 9, 2018

Data Governance Is the Hot Term in Tech Now

February 5, 2018

Data governance is a headache many tech companies have to juggle. With all the advances in big data and search, how can we possibly make sense of this rush of information? Thankfully, there are new data governance advances that aim to help. We learned more from a recent TopQuadrant story, “How Does SHACL Support Data Governance?”

According to the story:

SHACL (SHAPES Constraint Language) is a powerful, recently released W3C standard for data modeling, ontology design, data validation, inferencing and data transformation. In this post, we explore some important ways in which SHACL can be used to support capabilities needed for data governance.

Below, each business capability or value relevant to data governance is introduced with a brief description, followed by an explanation of how the capability is supported by SHACL, accompanied by a few specific examples from the use of SHACL in TopBraid Enterprise Data Governance.
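
To make that less abstract, here is a minimal sketch of SHACL-style validation using the open source rdflib and pyshacl Python packages. This is our own toy example, not one from the TopQuadrant post; the ex: namespace, the DataAsset class, and the steward property are all hypothetical.

```python
from rdflib import Graph
from pyshacl import validate

# Toy data graph: a data asset record that is missing its required steward.
data = Graph().parse(data="""
    @prefix ex: <http://example.com/ns#> .
    ex:customerTable a ex:DataAsset ;
        ex:assetName "Customer Table" .
""", format="turtle")

# Toy shapes graph: every ex:DataAsset must have exactly one ex:steward.
shapes = Graph().parse(data="""
    @prefix sh: <http://www.w3.org/ns/shacl#> .
    @prefix ex: <http://example.com/ns#> .
    ex:DataAssetShape a sh:NodeShape ;
        sh:targetClass ex:DataAsset ;
        sh:property [
            sh:path ex:steward ;
            sh:minCount 1 ;
            sh:maxCount 1 ;
        ] .
""", format="turtle")

conforms, _report_graph, report_text = validate(data, shacl_graph=shapes)
print(conforms)     # False -- the steward assignment is missing
print(report_text)  # human-readable validation report
```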

So, governance is a great way for IT and business to communicate better and wade through the data. Others are starting to take notice, and SHACL is not the only solution. In fact, there is a wealth of options available; you just have to know where to look. Regardless, your business is going to have to take governance seriously, and it is better to start sooner rather than later.

Patrick Roland, February 5, 2018

Averaging Information Is Not Cutting It Anymore

January 16, 2018

Here is something interesting that comes on the heels of the headline “People From Around The Globe Met For The First Flat Earth Conference” and reports that white supremacists are gaining more power.  Frontiers Media shares “Rescuing Collective Wisdom When The Average Group Opinion Is Wrong,” an article that pokes fun at the fanaticism running rampant in the news.  Beyond the fanaticism in the news, there is a real concern with averaging when it comes to data science and other fields that rely heavily on data.

The article breaks down the different ways averaging is used and the theorems used to justify it.  The introduction is a bit wordy, but it sets the tone:

The total knowledge contained within a collective supersedes the knowledge of even its most intelligent member. Yet the collective knowledge will remain inaccessible to us unless we are able to find efficient knowledge aggregation methods that produce reliable decisions based on the behavior or opinions of the collective’s members. It is often stated that simple averaging of a pool of opinions is a good and in many cases the optimal way to extract knowledge from a crowd. The method of averaging has been applied to analysis of decision-making in very different fields, such as forecasting, collective animal behavior, individual psychology, and machine learning. Two mathematical theorems, Condorcet’s theorem and Jensen’s inequality, provide a general theoretical justification for the averaging procedure. Yet the necessary conditions which guarantee the applicability of these theorems are often not met in practice. Under such circumstances, averaging can lead to suboptimal and sometimes very poor performance. Practitioners in many different fields have independently developed procedures to counteract the failures of averaging. We review such knowledge aggregation procedures and interpret the methods in the light of a statistical decision theory framework to explain when their application is justified. Our analysis indicates that in the ideal case, there should be a matching between the aggregation procedure and the nature of the knowledge distribution, correlations, and associated error costs.
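
To make the failure mode concrete, here is a toy simulation of our own (not from the paper): when the experts’ errors are independent, averaging cancels the noise nicely, but when they share a correlated bias, the crowd average stays wrong together.

```python
import numpy as np

rng = np.random.default_rng(42)
true_value = 100.0
n_trials, n_experts = 10_000, 25

# Case 1: independent errors -- averaging washes the noise out.
independent = true_value + rng.normal(0, 10, size=(n_trials, n_experts))

# Case 2: a bias shared by every expert in a trial -- averaging cannot remove it.
shared_bias = rng.normal(15, 5, size=(n_trials, 1))
correlated = true_value + shared_bias + rng.normal(0, 10, size=(n_trials, n_experts))

for label, samples in [("independent errors", independent), ("correlated bias", correlated)]:
    crowd_average = samples.mean(axis=1)
    rmse = np.sqrt(np.mean((crowd_average - true_value) ** 2))
    print(f"{label:>18}: RMSE of the crowd average = {rmse:.2f}")
```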

Understanding how data can be corrupted is half the battle of figuring out how to correct the problem.  This is one of the complications related to artificial intelligence and machine learning.  One example is trying to build sentiment analysis engines.  These require terabytes of data, and the Internet provides an endless supply, but the usual result is that the sentiment analysis engines end up racist, misogynist, and all-around trolls.  It might lead to giggles, but it does not produce very accurate results.

Whitney Grace, January 16, 2018

Blurring the Line Between Employees and AI

January 4, 2018

Using artificial intelligence to monitor employees is a complicated business. While some employers aim to improve productivity and make work easier, others may have other intentions. That’s the thesis of the recent Harvard Business Review story, “The Legal Risks of Monitoring Employees Online.”

According to the story:

Companies are increasingly adopting sophisticated technologies that can help prevent the intentional or inadvertent export of corporate IP and other sensitive and proprietary data.


Enter data loss prevention, or “DLP” solutions, that help companies detect anomalous patterns or behavior through keystroke logging, network traffic monitoring, natural language processing, and other methods, all while enforcing relevant workplace policies. And while there is a legitimate business case for deploying this technology, DLP tools may implicate a panoply of federal and state privacy laws, ranging from laws around employee monitoring, computer crime, wiretapping, and potentially data breach statutes. Given all of this, companies must consider the legal risks associated with DLP tools before they are implemented and plan accordingly.

While it is undeniable that some companies will use technology to monitor employees, the same machine learning and AI can also help employees improve. Consider this story about how AI is forcing human intelligence to evolve and strengthen itself rather than weaken. This is a story we’re watching closely because these two camps will likely only create a deeper divide.

Patrick Roland, January 4, 2018

Humans Living Longer but Life Quality Suffers

December 28, 2017

Here is an article that offers some thoughts worth pondering.  The Daily Herald published “Study: Americans Are Retiring Later, Dying Sooner And Sicker In Between”.  It takes a look at how Americans are forced to retire at later ages than their parents because the retirement age keeps getting pushed up.  Putting off retirement ideally allows people to set aside more money for when they do stop working.  The problem, however, is that retirees are not able to enjoy their golden years; instead, they are forced to continue working in some capacity or deal with health problems.

Despite living in one of the world’s richest countries with some of the best healthcare, Americans have seen their health deteriorate over the past decade.  Here are some numbers to make you cringe:

University of Michigan economists HwaJung Choi and Robert Schoeni used survey data to compare middle-age Americans’ health. A key measure is whether people have trouble with an “activity of daily living,” or ADL, such as walking across a room, dressing and bathing themselves, eating, or getting in or out of bed. The study showed the number of middle-age Americans with ADL limitations has jumped: 12.5 percent of Americans at the current retirement age of 66 had an ADL limitation in their late 50s, up from 8.8 percent for people with a retirement age of 65.

Also, Americans’ brains are rotting, with an 11 percent increase in dementia and other cognitive declines among people aged 58 to 60.  Researchers are not quite sure what is causing the decline in health, but they, of course, have plenty of speculation: alcohol abuse, suicide, drug overdoses, and, the current favorite, increased obesity.

The real answer likely involves multiple factors, such as genes, lifestyle, stress, environment, and diet.  All of these things come into play.  Despite poor health, we can count on more medical technology advances in the future.  The aging population may be the testing ground, and the results may improve the golden years of their grandchildren.

Whitney Grace, December 28, 2017

New York Begins Asking If Algorithms Can Be Racist

December 27, 2017

The whole point of algorithms is to be blind to everything except data. However, it is becoming increasingly clear that in the wrong hands, algorithms and AI could have a very negative impact on users. We learned more in a recent ACLU post, “New York Takes on Algorithm Discrimination.”

According to the story:

A first-in-the-nation bill, passed yesterday in New York City, offers a way to help ensure the computer codes that governments use to make decisions are serving justice rather than inequality.


Algorithms are often presumed to be objective, infallible, and unbiased. In fact, they are highly vulnerable to human bias. And when algorithms are flawed, they can have serious consequences.


The bill, which is expected to be signed by Mayor Bill de Blasio, will provide a greater understanding of how the city’s agencies use algorithms to deliver services while increasing transparency around them. This bill is the first in the nation to acknowledge the need for transparency when governments use algorithms…

This is a very promising step toward solving a very real problem. From racist coding to discriminatory AI, this is a topic that is creeping into the national conversation. We hope others will follow in New York’s footsteps and find ways to prevent this injustice from going further.

Patrick Roland, December 27, 2017

Data Analysis Startup Primer Already Well-Positioned

December 22, 2017

A new startup believes it has something unique to add to the AI data-processing scene, we learn from VentureBeat’s article, “Primer Uses AI to Understand and Summarize Mountains of Text.” The company’s software automatically summarizes (what it considers to be) the most important information from huge collections of documents. Filters then allow users to drill into the analyzed data. Of course, the goal is to reduce or eliminate the need for human analysts to produce such a report; whether Primer can soar where others have fallen short on this tricky task remains to be seen. Reporter Blair Hanley Frank observes:

Primer isn’t the first company to offer a natural language understanding tool, but the company’s strength comes from its ability to collate a massive number of documents with seemingly minimal human intervention and to deliver a single, easily navigable report that includes human-readable summaries of content. It’s this combination of scale and human readability that could give the company an edge over larger tech powerhouses like Google or Palantir. In addition, the company’s product can run inside private data centers, something that’s critical for dealing with classified information or working with customers who don’t want to lock themselves into a particular cloud provider.
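
Primer’s pipeline is proprietary, so the sketch below is only a generic extractive-summarization baseline of our own, not the company’s method: score each sentence by the frequency of the words it contains and keep the top few in their original order.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
             "that", "for", "on", "with", "as", "are", "was", "be"}

def extractive_summary(text: str, max_sentences: int = 3) -> str:
    """Return the highest-scoring sentences, preserving their original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
    freq = Counter(words)

    def score(sentence: str) -> float:
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / (len(tokens) or 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in chosen)
```

A production system would layer entity extraction, deduplication across thousands of documents, and report generation on top of a baseline like this.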

Primer is sitting pretty with $14.7 million in funding (from the likes of Data Collective, In-Q-Tel, Lux Capital, and Amplify Partners) and, perhaps more importantly, a contract with In-Q-Tel that connects them with the U.S. Intelligence community. We’re told the software is being used by several agencies, but that Primer knows not which ones. On the commercial side, retail giant Walmart is now a customer. Primer emphasizes they are working to enable more complex reports, like automatically generated maps that pinpoint locations of important events. The company is based in San Francisco and is hiring for several prominent positions as of this writing.

Cynthia Murrell, December 22, 2017

Search System from UAEU Simplifies Life Science Research

December 21, 2017

Help is on hand for scientific researchers tired of being bogged down in databases in the form of a new platform called Biocarian. The Middle East’s ITP.net reports, “UAEU Develops New Search Engine for Life Sciences.” Semantic search is the key to the more efficient and user-friendly process. Writer Mark Sutton reports:

The UAEU [United Arab Emirates University] team said that Biocarian was developed to address the problem of large and complex databases for healthcare and life science, which can result in researchers spending more than a third of their time searching for data. The new search engine uses Semantic Web technology, so that researchers can easily create targeted searches to find the data they need in a more efficient fashion. … It allows complex queries to be constructed and entered, and offers additional features such as the capacity to enter ‘facet values’ according to specific criteria. These allow users to explore collated information by applying a range of filters, helping them to find what they are looking for quicker.
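
The ‘facet values’ mentioned above are easy to picture with a small sketch of our own (a generic illustration, not Biocarian’s actual interface): each facet is simply a field-value filter layered on top of the result set.

```python
records = [
    {"title": "BRCA1 expression study", "organism": "human", "assay": "RNA-seq", "year": 2016},
    {"title": "Mouse liver proteome",   "organism": "mouse", "assay": "mass-spec", "year": 2017},
    {"title": "Human gut microbiome",   "organism": "human", "assay": "16S", "year": 2017},
]

def apply_facets(items, **facets):
    """Keep only the records whose fields match every requested facet value."""
    return [r for r in items if all(r.get(k) == v for k, v in facets.items())]

print(apply_facets(records, organism="human", year=2017))
# -> [{'title': 'Human gut microbiome', ...}]
```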

Project lead Nazar Zaki expects that simplifying the search process will open up this data to many talented researchers (who don’t happen to also be computer-science experts), leading to significant advances in medicine and healthcare. See the article for more on the Biocarian platform.

Cynthia Murrell, December 21, 2017

Plan for 100,000 Examples When Training an AI

December 19, 2017

Just what is the magic number when it comes to the amount of data needed to train an AI? See VentureBeat’s article, “Google Brain Chief: Deep Learning Takes at Least 100,000 Examples” for an answer. Reporter Blair Hanley Frank cites Jeff Dean, a Google senior fellow, who spoke at this year’s VB Summit. Dean figures that supplying 100,000 examples gives a deep learning system enough to work with for most types of data. Frank writes:

Dean knows a thing or two about deep learning — he’s head of the Google Brain team, a group of researchers focused on a wide-ranging set of problems in computer science and artificial intelligence. He’s been working with neural networks since the 1990s, when he wrote his undergraduate thesis on artificial neural networks. In his view, machine learning techniques have an opportunity to impact virtually every industry, though the rate at which that happens will depend on the specific industry. There are still plenty of hurdles that humans need to tackle before they can take the data they have and turn it into machine intelligence. In order to be useful for machine learning, data needs to be processed, which can take time and require (at least at first) significant human intervention. ‘There’s a lot of work in machine learning systems that is not actually machine learning,’ Dean said.

Perhaps poetically, Google is using machine learning to explore how best to perform this non-machine-learning work. The article points to a couple of encouraging projects, including Google DeepMind’s AlphaGo, which seems to have mastered the ancient game of Go simply by playing against itself.

Cynthia Murrell, December 19, 2017

Everyone Should Know the Term Cognitive Computing

December 19, 2017

Cognitive computing is a term everyone in the AI world should already be familiar with. If not, it’s time for a crash course. This is the DNA of machine learning and it is a fascinating field, as we learned from a recent Information Age story, “RIP Enterprise Search – AI-Based Cognitive Insight is the Future.”

According to the story:

The future of search is linked directly to the emergence of cognitive computing, which will provide the framework for a new era of cognitive search. This recognizes intent and interest and provides structure to the content, capturing more accurately what is contained within the text.


Context is king, and the four key (NOTE: We only included the most important two) elements of context detection are as follows:


Who – which user is looking for information? What have they looked for previously and what are they likely to be interested in finding in future? Who the individual is key as to what results are delivered to them.
What – the nature of the information is also highly important. Search has moved on from structured or even unstructured text within documents and web pages. Users may be looking for information in any number of different forms, from data within databases and in formats ranging from video and audio, to images and data collected from the internet-of-things (IOT).
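
As a rough sketch of how those two signals might feed a ranking function (our own hypothetical example, not from the Information Age piece), consider re-scoring results by the user’s interests (“who”) and the preferred content type (“what”):

```python
from dataclasses import dataclass, field

@dataclass
class UserContext:
    """The 'who': a hypothetical profile built from past searches."""
    interests: set = field(default_factory=set)

@dataclass
class Item:
    """The 'what': content of any kind -- document, video, audio, IoT data."""
    title: str
    media_type: str
    topics: set = field(default_factory=set)

def contextual_score(item: Item, user: UserContext, preferred_types: set) -> float:
    # A real engine would start from a text-relevance score; here we model
    # only the two context signals described above.
    who_boost = len(item.topics & user.interests)
    what_boost = 1.0 if item.media_type in preferred_types else 0.0
    return who_boost + what_boost

user = UserContext(interests={"machine learning", "governance"})
results = [
    Item("Board meeting video", "video", {"governance"}),
    Item("IoT sensor feed", "iot", {"telemetry"}),
    Item("ML governance whitepaper", "document", {"machine learning", "governance"}),
]
ranked = sorted(results, key=lambda r: contextual_score(r, user, {"document"}), reverse=True)
print([r.title for r in ranked])
```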

Who and what are incredibly important, but that might be putting the cart before the horse. First, we must convince CEOs how important AI is to their business…any business. Thankfully, folks like Huffington Post are already ahead of us and rallying the troops.

Patrick Roland, December 19, 2017
