April 26, 2017
Amazon aims to insert itself into every aspect of daily life, and the newest way it does so is with its digital assistant, Alexa. Reuters reports in “Amazon Rolls Out Chatbot Tools In Race To Dominate Voice-Powered Tech” how Amazon plans to expand Alexa’s development. The retail giant recently released the technology behind Alexa to developers, so they can build chat features into apps.
Amazon is eager to gain dominance in voice-controlled technology. Apple and Google both reign supreme when it comes to talking computers, chatbots, and natural language processing. Amazon has a huge reach, perhaps even greater than Apple and Google, because people have come to rely on it for shopping. Chatbots have a notorious history of being useless, and Microsoft’s Tay even turned into a racist, chauvinist program.
The new development tool is called Amazon Lex, and it is hosted in the cloud. Alexa is already deployed in millions of homes, and it is fed a continuous data stream that is crucial to the AI’s learning:
Processing vast quantities of data is key to artificial intelligence, which lets voice assistants decode speech. Amazon will take the text and recordings people send to apps to train Lex – as well as Alexa – to understand more queries.
That could help Amazon catch up in data collection. As popular as Amazon’s Alexa-powered devices are, such as Echo speakers, the company has sold an estimated 10 million or more.
Amazon Alexa is a competent digital assistant, able to respond to vocal commands and even offer voice-only shopping via Amazon. As noted, Alexa’s power rests in its data collection and its ability to learn natural language processing. Bitext takes a similar approach but relies on trained linguists to build its analytics platform.
Whitney Grace, April 26, 2017
April 17, 2017
Math is objective, right? Not really. Developers of artificial intelligence systems, what I call smart software, rely on what they learned in math school. If you have flipped through math books ranging from the Googler’s tome on artificial intelligence, Artificial Intelligence: A Modern Approach, to the musings of the ACM’s journals, you see the same methods recycled. Sure, the algorithms are given a bath and their whiskers are cropped. But underneath that show dog’s sleek appearance is a familiar pooch. K-means. We have k-means. Decision trees? Yep, decision trees.
What happens when developers feed content into Rube Goldberg machines constructed of mathematical procedures known and loved by math wonks the world over?
The answer appears in “Semantics Derived Automatically from Language Corpora Contain Human Like Biases.” The headline says it clearly, “Smart software becomes as wild and crazy as a group of Kentucky politicos arguing in a bar on Friday night at 2:15 am.”
Biases are expressed and made manifest.
The article in Science reports with considerable surprise it seems to me:
word embeddings encode not only stereotyped biases but also other knowledge, such as the visceral pleasantness of flowers or the gender distribution of occupations.
Ah, ha. Smart software learns biases. Perhaps “smart” correlates with bias?
The canny whiz kids who did the research crawfish a bit:
We stress that we replicated every association documented via the IAT that we tested. The number, variety, and substantive importance of our results raise the possibility that all implicit human biases are reflected in the statistical properties of language. Further research is needed to test this hypothesis and to compare language with other modalities, especially the visual, to see if they have similarly strong explanatory power.
Yep, nothing like further research to prove that when humans build smart software, “magic” happens. The algorithms manifest biases.
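For readers who want to see what an embedding "association" looks like in practice, here is a minimal sketch in the spirit of the association test the Science paper describes. The three-dimensional vectors are invented for illustration, not taken from any real embedding, and the function names are my own assumptions. Real word embeddings have hundreds of dimensions and are learned from large corpora.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def association(word_vec, pleasant, unpleasant):
    """Mean similarity to pleasant words minus mean similarity to unpleasant words."""
    mean_p = sum(cosine(word_vec, p) for p in pleasant) / len(pleasant)
    mean_u = sum(cosine(word_vec, u) for u in unpleasant) / len(unpleasant)
    return mean_p - mean_u

# Toy vectors, made up for this example only.
toy = {
    "flower": [0.9, 0.1, 0.2],
    "insect": [0.1, 0.9, 0.2],
    "love": [0.8, 0.2, 0.1],
    "pleasure": [0.9, 0.0, 0.3],
    "filth": [0.2, 0.8, 0.1],
    "agony": [0.0, 0.9, 0.3],
}

pleasant = [toy["love"], toy["pleasure"]]
unpleasant = [toy["filth"], toy["agony"]]

flower_score = association(toy["flower"], pleasant, unpleasant)
insect_score = association(toy["insect"], pleasant, unpleasant)

print("flower leans pleasant:", flower_score > 0)
print("insect leans pleasant:", insect_score > 0)
```

With these toy numbers the flower vector sits closer to the pleasant words and the insect vector closer to the unpleasant ones. The same arithmetic applied to names or occupations is how a learned bias shows up as a number.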
What the write up did not address is a method for developing less biased smart software. Is such a method beyond the ken of computer scientists?
To get more information about this question, I asked one of the world leaders in the field of computational linguistics, Dr. Antonio Valderrabanos, the founder and chief executive officer at Bitext. Dr. Valderrabanos told me:
We use syntactic relations among words instead of using n-grams and similar statistical artifacts, which don’t understand word relations. Bitext’s Deep Linguistics Analysis platform can provide phrases or meaningful relationships to uncover more textured relationships. Our analysis will provide better content to artificial intelligence systems using corpuses of text to learn.
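The contrast Dr. Valderrabanos draws can be illustrated with a small sketch. The bigram code below is ordinary n-gram extraction. The relation triples are hand-written for this one sentence and are purely hypothetical. They are not output from Bitext's platform or from any parser, and the relation labels are my own.

```python
def ngrams(tokens, n=2):
    """Return all contiguous n-token windows as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def linked(pairs, a, b):
    """True if any pair (or window) contains both words."""
    return any({a, b} <= set(p) for p in pairs)

sentence = "the service at the hotel was not good"
tokens = sentence.split()
bigrams = ngrams(tokens, 2)

# Hypothetical (head, relation, dependent) triples, written by hand
# to stand in for what a syntactic analysis might produce.
relations = [
    ("good", "subject", "service"),
    ("good", "negation", "not"),
    ("service", "location", "hotel"),
]
relation_pairs = [(head, dep) for head, _rel, dep in relations]

# Adjacent-word windows never see "service" and "good" together.
bigram_links = linked(bigrams, "service", "good")
# The syntactic relation connects them directly, five words apart.
relation_links = linked(relation_pairs, "service", "good")

print("bigrams link service/good:", bigram_links)
print("relations link service/good:", relation_links)
```

The point of the sketch is only that a fixed window of adjacent words misses the connection between the subject and its negated description, while a relation-based representation states it outright.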
Bitext’s approach is explained in the exclusive interview which appeared in Search Wizards Speak on April 11, 2017. You can read the full text of the interview at this link and review the public information about the breakthrough DLA platform at www.bitext.com.
It seems to me that Bitext has made linguistics the operating system for artificial intelligence.
Stephen E Arnold, April 17, 2017
April 11, 2017
On a recent trip to Madrid, Spain, I was able to arrange an interview with Dr. Antonio Valderrabanos, the founder and CEO of Bitext. The company has its primary research and development group in Las Rozas, the high-technology complex a short distance from central Madrid. The company has an office in San Francisco and a number of computational linguists and computer scientists in other locations. Dr. Valderrabanos worked at IBM in an adjacent field before moving to Novell and then making the jump to his own start up. The hard work required to invent a fundamentally new way to make sense of human utterance is now beginning to pay off.
Dr. Antonio Valderrabanos, founder and CEO of Bitext. Bitext’s business is growing rapidly. The company’s breakthroughs in deep linguistic analysis solve many difficult problems in text analysis.
Founded in 2008, the firm specializes in deep linguistic analysis. The systems and methods invented and refined by Bitext improve the accuracy of a wide range of content processing and text analytics systems. What’s remarkable about the Bitext breakthroughs is that the company supports more than 40 different languages, and its platform can support additional languages with sharp reductions in the time, cost, and effort required by old-school systems. With the proliferation of intelligent software, Bitext, in my opinion, puts the digital brains in overdrive. Bitext’s platform improves the accuracy of many smart software applications, ranging from customer support to business intelligence.
In our wide ranging discussion, Dr. Valderrabanos made a number of insightful comments. Let me highlight three and urge you to read the full text of the interview at this link. (Note: this interview is part of the Search Wizards Speak series.)
Linguistics as an Operating System
One of Dr. Valderrabanos’ most startling observations addresses the future of operating systems for increasingly intelligent software and applications. He said:
Linguistic applications will form a new type of operating system. If we are correct in our thought that language understanding creates a new type of platform, it follows that innovators will build more new things on this foundation. That means that there is no endpoint, just more opportunities to realize new products and services.
Better Understanding Has Arrived
Some of the smart software I have tested is unable to understand what seem to be very basic instructions. The problem, in my opinion, is context. Most smart software struggles to figure out the knowledge cloud which embraces certain data. Dr. Valderrabanos observed:
Search is one thing. Understanding what human utterances mean is another. Bitext’s proprietary technology delivers understanding. Bitext has created an easy to scale and multilingual Deep Linguistic Analysis or DLA platform. Our technology reduces costs and increases user satisfaction in voice applications or customer service applications. I see it as a major breakthrough in the state of the art.
If he is right, the Bitext DLA platform may be one of the next big things in technology. The reason? As smart software becomes more widely adopted, the need to make sense of data and text in different languages becomes increasingly important. Bitext may be the digital differential that makes the smart applications run the way users expect them to.
Snap In Bitext DLA
Advanced technology like Bitext’s often comes with a hidden cost. The advanced system works well in a demonstration or a controlled environment. When that system has to be integrated into “as is” systems from other vendors or from a custom development project, difficulties can pile up. Dr. Valderrabanos asserted:
Bitext DLA provides parsing data for text enrichment for a wide range of languages, for informal and formal text and for different verticals to improve the accuracy of deep learning engines and reduce training times and data needs. Bitext works in this way with many other organizations’ systems.
When I asked him about integration, he said:
No problems. We snap in.
I am interested in Bitext’s technical methods, and I am not alone. In the last year, Dr. Valderrabanos has signed deals with companies like Audi, Renault, a large mobile handset manufacturer, and an online information retrieval company.
When I thanked him for his time, he was quite polite. But he did say, “I have to get back to my desk. We have received several requests for proposals.”
Las Rozas looked quite a bit like Silicon Valley when I left the Bitext headquarters. Despite the thousands of miles separating Madrid from the US, interest in Bitext’s deep linguistic analysis is surging. Silicon Valley has its charms, and now it has a Bitext US office for what may be the fastest growing computational linguistics and text analysis system in the world. Worth watching this company, I think.
For more about Bitext, navigate to the firm’s Web site at www.bitext.com.
Stephen E Arnold, April 11, 2017
March 24, 2017
Will search-and-discovery firm Diffeo’s recent acquisition give it the edge? Yahoo Finance shares, “Diffeo Acquires Meta Search and Launches New Offering.” Startup Meta Search developed a local computer and cloud search system that uses smart indexing to assign index terms and keep the terms consistent. Diffeo provides a range of advanced content processing services based on collaborative machine intelligence. The press release specifies:
Diffeo’s content discovery platform accelerates research analysts by applying text analytics and machine intelligence algorithms to users’ in-progress files, so that it can recommend content that fills in knowledge gaps — often before the user thinks of searching. Diffeo acts as a personal research assistant that scours both the user’s files and the Internet. The company describes its technology as collaborative machine intelligence.
Diffeo and Meta’s services complement each other. Meta provides unified search across the content on all of a user’s cloud platforms and devices. Diffeo’s Advanced Discovery Toolbox displays recommendations alongside in-progress documents to accelerate the work of research analysts by uncovering key connections.
Meta’s platform integrates cloud environments into a single keyword search interface, enabling users to search their files on all cloud drives, such as Dropbox, Google Drive, Slack and Evernote all at once. Meta also improves search quality by intelligently analyzing each document, determining the most important concepts, and automatically applying those concepts as ‘Smart Tags’ to the user’s documents.
This seems like a promising combination. Founded in 2012, Diffeo made Meta Search its first acquisition on January 10 of this year. The company is currently hiring. Meta Search, now called Diffeo Cloud Search, is based in Boston.
Cynthia Murrell, March 24, 2017
March 21, 2017
Have you ever watched a crawfish (sometimes called a crawdad or a crayfish) get away from trouble? The freshwater crustaceans can go backwards. Members of the Astacidae can be found in parts of the south, so you will have to wander in a Georgia swamp to check out the creature’s behavior.
The point is that crawfish go backwards to protect themselves and achieve their tiny lobster-like goals. Big time consultants also crawfish in order to sell more work and provide “enhanced” insight into a thorny business or technical problem other consultants have created.
To see this in action, navigate to “The Conundrum of Big Data.” A super consultant explains that Big Data is not exactly the home run, silver bullet, or magic potion some lesser consultants said Big Data would be. I learned:
Despite two decades of intensive IT investment in data [mining] applications, recent studies show that companies continue to have trouble identifying metrics that can predict and explain performance results and/or improve operations. Data mining, the process of identifying patterns and structures in the data, has clear potential to identify prescriptions for success but its wide implementation fails systematically. Companies tend to deploy ‘unsupervised-learning’ algorithms in pursuit of predictive metrics, but this automated [black box] approach results in linking multiple low-information metrics in theories that turn out to be improbably complex.
Big surprise. For folks who are not trained in the nuts and bolts of data analysis and semi fancy math, Big Data is a giant vacuum cleaner for money. The cash has to pay for “experts,” plumbing, software, and more humans. The outputs are often fuzzy wuzzy probabilities which more “wizards” interpret. Think of a Greek religious authority looking at the ancient equivalent of road kill.
The write up cites the fizzle that was Google Flu Trends. Cough. Cough. But even that sneeze could be fixed with artificial intelligence. Yep, when smart humans make mistakes, send in smart software. That will work.
In my opinion, the highlight of the write up was this passage:
When it comes to data, size isn’t everything because big data on their own cannot just solve the problem of ‘insight’ (i.e. inferring what is going on). The true enablers are the data-scientists and statisticians who have been obsessed for more than two centuries to understand the world through data and what traps lie in wait during this exercise. In the world of analytics (AaaS), it is agility (using science, investigative skills, appropriate technology), trust (to solve the client’s real business problems and build collateral), and ‘know-how’ (to extract intelligence hidden in the data) that are the prime ‘assets’ for competing, not the size of the data. Big data are certainly here but big insights have yet to arrive.
Yes. More consulting is needed to make those payoffs arrive. But first, hire more advisers. What could possibly go wrong? Cough. Sneeze. One goes forwards with Big Data by going backwards for more analysis.
Stephen E Arnold, March 21, 2017
March 13, 2017
It will require training Canada’s youth in design and the arts, as well as STEM subjects, if that country is to excel in today’s big-data world. That is the advice of a trio of academic researchers in that country, Patricio Davila, Sara Diamond, and Steve Szigeti, who declare, “There’s No Big Data Without Intelligent Interface” at the Globe and Mail. The article begins by describing why data management is now a crucial part of success throughout society, then emphasizes that we need creative types to design intuitive user interfaces and effective analytics representations. The researchers explain:
Here’s the challenge: For humans, data are meaningless without curation, interpretation and representation. All the examples described above require elegant, meaningful and navigable sensory interfaces. Adjacent to the visual are emerging creative, applied and inclusive design practices in data “representation,” whether it’s data sculpture (such as 3-D printing, moulding and representation in all physical media of data), tangible computing (wearables or systems that manage data through tactile interfaces) or data sonification (yes, data can make beautiful music).
Infographics is the practice of displaying data, while data visualization or visual analytics refers to tools or systems that are interactive and allow users to upload their own data sets. In a world increasingly driven by data analysis, designers, digital media artists, and animators provide essential tools for users. These interpretive skills stand side by side with general literacy, numeracy, statistical analytics, computational skills and cognitive science.
We also learn about several specific projects undertaken by faculty members at OCAD University, where our three authors are involved in the school’s Visual Analytics Lab. For example, the iCity project addresses transportation network planning in cities, and the Care and Condition Monitor is a mobile app designed to help patients and their healthcare providers better work together in pursuit of treatment goals. The researchers conclude with an appeal to their nation’s colleges and universities to develop programs that incorporate data management, data numeracy, data analysis, and representational skills early and often. Good suggestion.
Cynthia Murrell, March 13, 2017
March 10, 2017
I read “The Much-Needed Business Facet for Modern Data Integration.” The write up presents some useful information. Not many of the “go fast and break things” crowd will relate to some of the ideas and suggestions, but I found the article refreshing.
What does one do to make modern data centric activities sort of work? The answers are ones that I have found many more youthful wizards often elect to ignore.
Here they are:
- Do data preparation. Yikes. Normalization of data. I have fielded this question in the past, “Who has time for that?” Answer: Too few, gentle reader. Too few.
- Profile the data. Another gasp. In my experience it is helpful to determine what data are actually germane to the goal. Think about the polls for the recent
- Create data libraries. Good idea. But it is much more fun to just recreate data sets. Very Zen like.
- Have rules which are now explained as “data governance.” The jargon does not change the need for editorial and data guidelines.
- Take a stab at data quality. This is another way of saying, “Clean up the data.” Even whiz bang modern systems are confused with differences like I.B.M and International Business Machines or numbers with decimal points in the incorrect place.
- Get colleagues in the game. This is a good idea, but in many organizations in which I have worked “team” is spelled “my bonus.”
Useful checklist. I fear that those who color unicorns will not like the dog work which accompanies implementing the ideas. That’s what makes search and content processing so darned interesting.
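The data quality item in the checklist mentions two concrete snags: name variants like I.B.M and International Business Machines, and numbers with the decimal point in the wrong place. Here is a minimal sketch of how one might catch both. The alias table, the function names, and the expected price range are all assumptions made up for illustration. A real project would maintain a much larger, curated alias list and domain-specific range rules.

```python
import re

# Hypothetical alias table: squashed spelling -> canonical name.
ALIASES = {
    "ibm": "International Business Machines",
    "internationalbusinessmachines": "International Business Machines",
}

def normalize_company(name):
    """Map known spelling variants of a company name to one canonical form."""
    key = re.sub(r"[^a-z0-9]", "", name.lower())
    # Unknown names are returned trimmed but otherwise unchanged.
    return ALIASES.get(key, name.strip())

def suspect_decimal(value, low, high):
    """Flag a value that is out of range but would fit after shifting
    the decimal point one or two places in either direction."""
    if low <= value <= high:
        return False
    return any(low <= value * factor <= high for factor in (0.1, 0.01, 10, 100))

names = ["I.B.M", "IBM", "International Business Machines", " Acme Corp "]
cleaned = [normalize_company(n) for n in names]

# Assume, for this example, that prices should fall between 1 and 100.
prices = [19.99, 1999.0, 42.0]
flagged = [p for p in prices if suspect_decimal(p, 1, 100)]

print(cleaned)
print(flagged)
```

The decimal check only flags a value for a human to review. It does not silently "fix" the number, which fits the checklist's point about having editorial and data guidelines.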
Stephen E Arnold, March 10, 2017
March 10, 2017
Monopolies aren’t just for telecoms and zipper manufacturers. The Nation reveals a much scarier example in its article, “5 Corporations Now Dominate Our Privatized Intelligence Industry.” Reporter Tim Shorrock outlines the latest merger that brings us to this point, one between Pentagon & NSA contractor Leidos Holdings and a division of Lockheed Martin called Information Systems and Global Solutions. Shorrock writes:
The sheer size of the new entity makes Leidos one of the most powerful companies in the intelligence-contracting industry, which is worth about $50 billion today. According to a comprehensive study I’ve just completed on public and private employment in intelligence, Leidos is now the largest of five corporations that together employ nearly 80 percent of the private-sector employees contracted to work for US spy and surveillance agencies.
Yes, that’s 80 percent. For the first time since spy agencies began outsourcing their core analytic and operational work in the late 1990s, the bulk of the contracted work goes to a handful of companies: Leidos, Booz Allen Hamilton, CSRA, SAIC, and CACI International. This concentration of ‘pure plays’—a Wall Street term for companies that makes one product for a single market—marks a fundamental shift in an industry that was once a highly diverse mix of large military contractors, small and medium technology companies, and tiny ‘Beltway Bandits’ surrounding Washington, D.C.
I should mention that our beloved leader, Stephen E Arnold, used to work as a gopher for one of these five companies, Booz Allen Hamilton. Shorrock details the reasons such concentrated power is a problem in the intelligence industry, and shares the profile he has made on each company. He also elaborates on the methods he used to analyze the shadowy workforce they employ. (You’ll be unsurprised to learn it can be difficult to gather data on intelligence workers.) See the article for those details, and for Shorrock’s discussion of negligence by the media and by Congress on this matter. We can agree that most folks don’t seem to be aware of this trend, or of its potential repercussions.
Cynthia Murrell, March 10, 2017
March 9, 2017
The idea that software can make sense of information is a powerful one. Many companies tout the capabilities of their business processes, analytical tools, and staff to look at data and get a sense of the future. The vast majority of these firms have tools and methods which provide useful information.
What happens when a person who did not take a course in analytics learns about the strengths and limitations of these systems?
Answer: You get some excitement.
I read “Big Data’s Power Is Terrifying. That Could Be Good News for Democracy.” The main idea, that companies with nifty analytic systems and methods can control life, is magnetic. Lots of folks want to believe that a company’s analyses can have a significant impact on elections, public opinion, and maybe the stock market.
The write up asserts:
Online information already lends itself to manipulation and political abuse, and the age of big data has scarcely begun. In combination with advances in cognitive linguistics and neuroscience, this data could become a powerful tool for changing the electoral decisions we make. Our capacity to resist manipulation is limited.
My view is that one must not confuse the explanations from marketing mavens, alarmists, and those who want to believe that Star Trek is “real” with what today’s systems can do. Firms like Cambridge Analytica and others generate reports. In fact, companies have been using software to figure out what’s what for many years.
What’s interesting is that folks learn about these systems and pick up the worn ball and carry it down field while screaming, “Touchdown.”
Sorry. The systems don’t warrant that type of excitement. Reality is less exciting. Probabilities are useful, not reality. But why not carry the ball? It is easier than learning what analytics firms do.
Stephen E Arnold, March 9, 2017
March 2, 2017
Maybe this is fake news? Maybe. Navigate to “Big Blue’s Big Blunder: IBM Accidentally Hands Over Root Access to Its Data Science Servers.” When I read the title, my first reaction was, “Hey, Yahoot is back in the security news.” Wrong.
According to the write up, which I assume to be exposing the “truth”:
IBM left private keys to the Docker host environment in its Data Science Experience service inside freely available containers. This potentially granted the cloud service’s users root access to the underlying container-hosting machines – and potentially to other machines in Big Blue’s Spark computing cluster. Effectively, Big Blue handed its cloud users the secrets needed to potentially commandeer and control its service’s computers.
IBM hopped to it. Two weeks after the stumble was discovered, IBM fixed the problem.
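The stumble described above, private keys left inside files that ship to users, is the sort of thing a simple pre-release scan can sometimes catch. The sketch below is my own illustration and makes no claim about how the flaw was found or how IBM fixed it. It walks a directory tree (for instance, an unpacked container image) and reports files containing a PEM private key header. The function name and the 64 KB read limit are assumptions.

```python
import os
import re
import tempfile

# Matches PEM headers such as "-----BEGIN PRIVATE KEY-----" and
# "-----BEGIN RSA PRIVATE KEY-----".
KEY_MARKER = re.compile(rb"-----BEGIN (?:[A-Z0-9]+ )*PRIVATE KEY-----")

def find_private_keys(root):
    """Return sorted paths under root whose first 64 KB contain a PEM private key header."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    head = handle.read(65536)
            except OSError:
                continue  # unreadable file: skip rather than abort the scan
            if KEY_MARKER.search(head):
                hits.append(path)
    return sorted(hits)

# Small demonstration on a throwaway directory with fake file contents.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "notes.txt"), "w") as fh:
        fh.write("nothing secret here\n")
    with open(os.path.join(root, "leftover.pem"), "w") as fh:
        fh.write("-----BEGIN RSA PRIVATE KEY-----\nnot-a-real-key\n")
    found = [os.path.basename(p) for p in find_private_keys(root)]

print(found)
```

A scan like this only catches keys stored in the standard PEM text format. Keys in other encodings, or secrets such as passwords and tokens, need other checks.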
The write up includes this upbeat statement, attributed to the person using a demo account which exposed the glitch:
I think that IBM already has some amazing infosec people and a genuine commitment to protecting their services, and it’s a matter of instilling security culture and processes across their entire organization. That said, any company that has products allowing users to run untrusted code should think long and hard about their system architecture. This is not to imply that containers were poorly designed (because I don’t think they were), but more that they’re so new that best practices in their use are still being actively developed. Compare a newer-model table saw to one decades old: The new one comes stock with an abundance of safety features including emergency stopping, a riving knife, push sticks, etc, as a result of evolving culture and standards through time and understanding.
Bad news. Good news.
Let’s ask Watson about IBM security. Hold that thought, please. Watson is working on health care information. And don’t forget the March 2017 security conference sponsored by those security pros at IBM.
Stephen E Arnold, March 2, 2017