The Algorithm to Failure
April 12, 2017
Algorithms have practically changed the way the world works. However, this nifty code also has its limitations that lead to failures.
In a whitepaper published by Cornell University titled Failures of Deep Learning, authors Shai Shalev-Shwartz, Ohad Shamir, and Shaked Shammah say:
It is important, for both theoreticians and practitioners, to gain a deeper understanding of the difficulties and limitations associated with common approaches and algorithms.
The whitepaper touches on four pain points of Deep Learning, which is based on algorithms. The authors propose remedial measures that could possibly overcome these impediments and lead to better AI.
Eminent personalities like Stephen Hawking, Bill Gates and Elon Musk have, however, warned against advancing AIs. Google in the past had abandoned robotics as the machines were becoming too intelligent. What now remains to be seen is who will win in the end: commercial interests or unfounded fear?
Vishal Ingole, April 12, 2017
Whose Message Is It Anyway?
April 11, 2017
Instant messaging service provider WhatsApp is in a quandary. While privacy of its users is of utmost importance to them, where do they draw the line if it’s a question of national security?
In an editorial published in The Telegraph titled “WhatsApp Accused of Giving Terrorists ‘a Secret Place to Hide’ as It Refuses to Hand over London Attacker’s Messages,” the writer says: “The Government was considering legislation to force online firms to take down extremist material, but said it was time for the companies to ‘recognise that they have a responsibility’ to get their own house in order.”
Apps like WhatsApp offer end-to-end encryption for messages sent over their networks. This makes it impossible (?) for anyone to intercept and read them, even technicians at WhatsApp. On numerous occasions, WhatsApp, owned by Facebook, has come under fire for protecting its users’ privacy. In this particular incident, the London attacker Ajao used WhatsApp to send a message to someone. While Scotland Yard wants access to the messages sent by the terrorist, WhatsApp says its hands are tied.
The editorial also says that social media networks are no longer tech companies; rather, they are turning into publishing companies, so the onus is on them to ensure that radical materials are removed from their networks. Who will ultimately win the battle remains to be seen, but right now, WhatsApp seems to have the edge.
Vishal Ingole, April 11, 2017
HonkinNews for April 11, 2017, Now Available
April 11, 2017
This week’s HonkinNews video program leads with information about Bitext, a company providing breakthrough deep linguistic analysis solutions. In order to put the comments of Dr. Antonio Valderrabanos in perspective, HonkinNews takes a look at the “promo” article discussing IBM’s cognitive computing activities. There is one key difference highlighted in HonkinNews: IBM talks jargon in recycled marketing language and Bitext’s CEO talks about the company’s rapid growth and licensing deals with companies like Audi, Renault, and one of the largest players in the mobile device and mobile services market. The program also looks at the remarkable 9,000 word Fortune Magazine article about Palantir Technologies’ interaction with US government procurement agencies. The very long article does not describe Palantir’s technical innovations nor does the Fortune analysis explain why using commercial off-the-shelf software for intelligence work makes sense. News about the Dark Web Notebook team’s three presentations at the prestigious TechnoSecurity & Digital Forensics Conference in June 2017 complements a special offer for the only handbook to Dark Web investigations available. For discount information, check out the links displayed in the video. The video also takes a look at the new Yahoo. Once the transformation of Yahoo into Oath, with a punctuation mark no less, takes place, the Yahoo yodel will become a faint auditory memory. Does the HonkinNews item trigger an auditory memory? Watch this week’s video to find out. You can watch the video at this link.
Kenny Toth, April 11, 2017
Bitext: Exclusive Interview with Antonio Valderrabanos
April 11, 2017
On a recent trip to Madrid, Spain, I was able to arrange an interview with Dr. Antonio Valderrabanos, the founder and CEO of Bitext. The company has its primary research and development group in Las Rozas, the high-technology complex a short distance from central Madrid. The company has an office in San Francisco and a number of computational linguists and computer scientists in other locations. Dr. Valderrabanos worked at IBM in an adjacent field before moving to Novell and then making the jump to his own startup. The hard work required to invent a fundamentally new way to make sense of human utterance is now beginning to pay off.
Dr. Antonio Valderrabanos, founder and CEO of Bitext. Bitext’s business is growing rapidly. The company’s breakthroughs in deep linguistic analysis solve many difficult problems in text analysis.
Founded in 2008, the firm specializes in deep linguistic analysis. The systems and methods invented and refined by Bitext improve the accuracy of a wide range of content processing and text analytics systems. What’s remarkable about the Bitext breakthroughs is that the company supports more than 40 different languages, and its platform can support additional languages with sharp reductions in the time, cost, and effort required by old-school systems. With the proliferation of intelligent software, Bitext, in my opinion, puts the digital brains in overdrive. Bitext’s platform improves the accuracy of many smart software applications, ranging from customer support to business intelligence.
In our wide ranging discussion, Dr. Valderrabanos made a number of insightful comments. Let me highlight three and urge you to read the full text of the interview at this link. (Note: this interview is part of the Search Wizards Speak series.)
Linguistics as an Operating System
One of Dr. Valderrabanos’ most startling observations addresses the future of operating systems for increasingly intelligent software and applications. He said:
Linguistic applications will form a new type of operating system. If we are correct in our thought that language understanding creates a new type of platform, it follows that innovators will build more new things on this foundation. That means that there is no endpoint, just more opportunities to realize new products and services.
Better Understanding Has Arrived
Some of the smart software I have tested is unable to understand what seem to be very basic instructions. The problem, in my opinion, is context. Most smart software struggles to figure out the knowledge cloud which embraces certain data. Dr. Valderrabanos observed:
Search is one thing. Understanding what human utterances mean is another. Bitext’s proprietary technology delivers understanding. Bitext has created an easy to scale and multilingual Deep Linguistic Analysis or DLA platform. Our technology reduces costs and increases user satisfaction in voice applications or customer service applications. I see it as a major breakthrough in the state of the art.
If he is right, the Bitext DLA platform may be one of the next big things in technology. The reason? As smart software becomes more widely adopted, the need to make sense of data and text in different languages becomes increasingly important. Bitext may be the digital differential that makes the smart applications run the way users expect them to.
Snap In Bitext DLA
Advanced technology like Bitext’s often comes with a hidden cost. The advanced system works well in a demonstration or a controlled environment. When that system has to be integrated into “as is” systems from other vendors or from a custom development project, difficulties can pile up. Dr. Valderrabanos asserted:
Bitext DLA provides parsing data for text enrichment for a wide range of languages, for informal and formal text and for different verticals to improve the accuracy of deep learning engines and reduce training times and data needs. Bitext works in this way with many other organizations’ systems.
When I asked him about integration, he said:
No problems. We snap in.
I am interested in Bitext’s technical methods. In the last year, he has signed deals with companies like Audi, Renault, a large mobile handset manufacturer, and an online information retrieval company.
When I thanked him for his time, he was quite polite. But he did say, “I have to get back to my desk. We have received several requests for proposals.”
Las Rozas looked quite a bit like Silicon Valley when I left the Bitext headquarters. Despite the thousands of miles separating Madrid from the US, interest in Bitext’s deep linguistic analysis is surging. Silicon Valley has its charms, and now it has a Bitext US office for what may be the fastest growing computational linguistics and text analysis system in the world. Worth watching this company, I think.
For more about Bitext, navigate to the firm’s Web site at www.bitext.com.
Stephen E Arnold, April 11, 2017
Google: The Male Female Thing
April 10, 2017
I have fond memories of my high school’s science club. My hunch is that some Google-type companies do too.
I look back and remember the days of Donald Jackson, who, with his brother Bernard, published an article in a peer reviewed astronomy journal. Those guys were fixated on the moon. Go figure.
There was a canny lad named Phil Herbst, who shifted to fuzzy science with his interest in anthropology. Misguided. Anthropology. Who cares about that?
There was Steve Connett, who was into electrical engineering and the goodies which that required his parents to provide.
And the others? Males. Every one of them.
I don’t recall any females in the science club. Super smart Hope Davis, one of the females in my advanced physics class, had perfect pitch, a knack for mathematics, and a well founded disdain for the males in the science club.
My experience with her as a lab partner is that she was smarter than most of the fellows who gathered a couple of times a month to discuss explosives, corrosive chemical compounds, circuits which could terminate certain creatures with a zap, and the other nifty things the dozen or so regulars found fascinating.
Why was science club in the rust belt in 1958 a no go zone for really smart people like Hope Davis?
My favorite line from the motion picture “Revenge of the Nerds” is, “Nerds.” Poetic.
My answer is that the males in my science club were not exactly hot social items. Although I was the dumbest person in the club, I shared three qualities with the real brainiacs in the group:
- Zero awareness of females and their abilities. I was an only child, had zero exposure to females outside of class, and lived within my own weird little world of books and model airplanes.
- My notion of conversation was my ability to repeat almost anything I read verbatim. (Alas, as I age, that wonderful automatic function does not work as well as it did. But when it was in high gear, absolutely no female in any of my classes wanted to speak with me. Who wanted a fat, nearsighted meatware audio book for a friend?)
- I was deeply uncomfortable around anyone not in the odd ball special classes my high school offered for students who seemed to get A grades and did not participate in [a] sports, [b] school governance, [c] social activities like parties and dances, and [d] activities understood by the high school administrators.
I thought of my high school science club when I read “Google Accused of ‘Extreme’ Gender Pay Discrimination by US Labor Department.” I quite like the word “extreme.” Quite charged and suggestive. I learned:
Google has discriminated against its female employees, according to the US Department of Labor (DoL), which said it had evidence of “systemic compensation disparities”.
Making a leap from the particular allegation against Google to a fuzzy swath of California, the real journalists, who are struggling with their own demons, state:
The explosive allegation against one of the largest and most powerful companies in Silicon Valley comes at a time when the male-dominated tech industry is facing increased scrutiny over gender discrimination, pay disparities and sexual harassment.
Does the word “extreme” up the ante?
Battle in the Clouds
April 10, 2017
The giants of the tech world are battling fiercely to dominate the Cloud services industry. Amazon, however, is still in pole position as the first entrant, followed by Microsoft, Google and IBM.
The Street, in an in-depth report titled “How Amazon, Microsoft, Google and IBM Battle for Dominance in the Cloud,” says:
Amazon Web Services, or AWS, is the indisputable leader, with a breadth of services and clients ranging from blue chips such as Coca Cola (KO) and General Electric (GE) to app-economy stalwarts like Netflix (NFLX), Tinder and Lyft. Microsoft and Google are closing the features gap, even if they are far behind on market share.
So far, these technology giants are fighting to corner the IaaS market. Amazon, with AWS, clearly dominates this space. Microsoft, because of its inherent advantage of B2B software already running across major corporations, has it easy, but not easy enough to topple Amazon. Google and IBM are vying for the remaining market share.
Apart from IaaS, PaaS is going to be the next frontier on which the Cloud battles will be fought, the report states. Consolidation is a distant possibility, considering that the warriors involved are too big to be acquired. With most services on par, innovation will be the key to gaining and sustaining ground in this business.
Vishal Ingole, April 10, 2017
Creative Commons Eludes Copyright With Free Image Search
April 7, 2017
One scandal that plagues the Internet is improper usage and citation of digital images. Photographs, art, memes, and GIFs are stolen on a daily basis and original owners are often denied compensation or credit. Most of the time, usage is completely innocent; other times it is blatant theft. If you need images for your Web site or project, but do not want to be sent a cease and desist letter or slammed with a lawsuit, check out the Creative Commons, a community where users post photos, art, videos, and more free of copyright control as long as you give credit to the original purveyor. Forbes wrote that, “Creative Commons’ New Search Engine Makes It Easy To Find Free-To-Use Images.”
The brand new Creative Commons search engine is something the Internet has waited for:
The Creative Commons search engine gives you access to over nine million images drawn from 500px, Flickr, the Metropolitan Museum of Art, the New York Public Library and the Rijksmuseum. You can search through all or any combination of these collections. You can also constrain your search to titles, creators, tags or any combination of the three. Finally, you can limit your search to images that you can modify, adapt or build upon as you see fit, or that are free to use for commercial purposes.
Creative Commons is a wonderful organization and copyright tool that allows people to share their work with others while receiving proper credit. It is also a boon for others who need photos and video to augment their own work. My only question is: why did it take so long for the Creative Commons to make this search engine?
Whitney Grace, April 7, 2017
IBM: Recycling Old Natural Language Assertions
April 6, 2017
I have ridden the natural language processing unicycle a couple of times in the last 40 years. In fact, for a company in Europe I unearthed from my archive NLP white papers from outfits like Autonomy Software and Siderean Software among others. The message is the same: Content processing from these outfits can figure out the meaning of a document. But accuracy was a challenge. I slap the word “aboutness” on these types of assertions.
Don’t get me wrong. Progress is being made. But the advances are often incremental and delivered at the subsystem level of larger systems. A good example is the remarkable breakthrough technology of Madrid, Spain-based Bitext. The company’s Deep Linguistic Analysis Platform solves a very difficult problem when an outfit like a big online service has to figure out the who, what, when, and where in a flood of content in 10, 20, or 30 or more languages. The cost of using old-school systems is simply out of reach even for companies with billions in the bank.
I read “Your Machine Used to Crunch Numbers. Now It Can Chew over What They Mean, Too.” The write up appeared in the normally factual online publication “The Register.” The story, in my opinion, sucks in IBM marketing speak and makes some interesting assertions about what Lucene, home brew scripts, and acquired technology can deliver. In my experience, “aboutness” requires serious proprietary systems and methods. Language is hard, no matter what one believes when Google converts 400 words of Spanish into semi-okay English.
In the article I was told:
This makes sense, because the branches of AI gaining most traction today – machine learning and deep learning – typically have non-deterministic outputs. They’re “fuzzy”, producing confidence scores relating to their inputs and outputs. This makes AI-based analytics systems good at analyzing the kind of data that has sprung up since the early 2000s; particularly social media posts.
Well, sort of. There are systems which can identify from unstructured text in many languages the actor, the action, and the outcome. In addition, these systems can apply numerical recipes to identify items of potential interest to an analyst or another software system. The issue is error rate. Many current entity tagging systems stumble badly when it comes to accuracy.
But IBM has been nosing around NLP and smart software for a long time. Do you remember Data Fountain or Dr. Jon Kleinberg’s CLEVER system? These are important, but they too were suggestive, not definitive approaches.
The write up tells me via Debbie Landers, IBM Canada’s vice president of Cognitive Solutions:
“People are constantly buying security products to fix a problem or get a patch to update something after it’s already happened, which you have to do, but that’s table stakes,” he says. Machine learning is good at spotting things as they’re happening (or in the case of predictive analytics, beforehand). Their anomaly detection can surface the ‘unknown unknowns’ – problems that haven’t been seen before, but which could pose a material threat. In short, applying this branch of AI to security analytics could help you understand where attackers are going, rather than where they’ve been. What does the future hold for analytics, as we get more adept at using them? Solutions are likely to become more predictive, because they’ll be finding patterns in empirical data that people can’t spot. They’ll also become more context-aware, using statistical modeling and neural networks to produce real-time data that correlates with specific situations.
My reaction to this write up is that IBM is “constantly” thrashing for a way to make Watson-type services a huge revenue producer for IBM. From recipes to cancer, from education to ever more spectacular assertions about what IBM technology can do—IBM is demonstrating that it cannot keep up with smart software embedded in money making products and mobile services.
Is this a promotional piece? Yep, The Reg even labels it as such with this tag:
See. A promo, not fake news exactly. It is clear that IBM is working overtime with its PR firm and writing checks to get the Watson meme in many channels, including blogs.
Beyond Search wants to do its part. However, my angle is different. Look around for innovative companies engaged in smart software and closing substantive deals. Compare the performance of these systems with that of IBM’s solutions, if you can arrange an objective demonstration. Then you will know how much of IBM’s content marketing carpet bombing falls harmlessly on deaf ears and how many payloads hit a cash register and cause it to pay out cash. (A thought: A breakthrough company in Madrid may be a touchstone for those who are looking for more than marketing chatter.)
Stephen E Arnold, April 6, 2017
Google Management Methods: The Car Thing
April 6, 2017
I like to track Google’s behavior for management insights. I noted what might be a great moment in Google’s management methods in the article “Google Paid Its Self-Driving Car Boss $120 Million and Then He Left for Uber.” The write up told me in sterling journalistic prose:
Embattled engineer Anthony Levandowski collected $120 million from Google, despite involvement with at least one start-up that would ultimately compete with the company, according to new legal filings. Levandowski was already trying to staff up his competing start-up, Otto, while he worked at Google — but he waited until he got his payout to make the details of Otto public, a lawsuit said.
If this is accurate, I surmise that:
- Alphabet Google management reviewed Mr. Levandowski’s performance, spoke to some of his colleagues, and maybe asked a peer or two for some information about Mr. Levandowski; for example, Is he a team player? Is he moving the ball down the field?
- The personnel review, prepared by the smart Googlers, flashed green and someone in management okayed a bonus of $120 million. Most employees have a spending authority limit. Few employees can sign off on $1.2 million and then hit the volleyball court. Accountability in action?
- Mr. Levandowski, if the write up is on the money, was recruiting people and doing other sort of work related things. I assume this involved email, phone calls, and face-to-face interviews. Obviously Mr. Levandowski’s activities did not seem untoward.
My observation is that the Google management method involves some procedures which fuzzify what is going on. My thought is that this is the augmented reality, Google Glass approach to management. Errors in perception can be rectified in court.
McKinsey, Bain, BCG, and other blue chip management consulting firms have a great deal to learn from Google. I wonder if a book about Google’s management methods will become available on the Google Play store soon. Here’s hoping.
Stephen E Arnold, April 6, 2017
Yikes! Google Skeptics Amp Up
April 6, 2017
Beyond providing search, email, office suite services, and not doing any evil, another of Google’s goals is to ramp up its search speed. Media Post shares via its Search Marketing Daily column that “Search Experts Skeptical Of Google Amp Updates.” Although Google’s Accelerated Mobile Pages (AMP) project might make it easier to access the original URL from search results, companies that rely on mobile search for marketing and advertising are not happy with it.
AMP reduces a Web site’s functionality by caching the content, and Google prioritizes AMP pages in search results. Companies are losing potential clients when they are unable to display their wares in the growing mobile market. It also does not bode well for Google, which draws a significant profit from ad revenue. Why would Google hinder its own clients? It is all in an effort to make the end user’s Google mobile search experience better.
The clients want to forgo the AMP experience:
‘If load times and user experience is really the issue here, then Google should prioritize based on load speed,’ wrote Yee Cheng Chin. ‘An AMP site with tons of images isn’t necessarily better than a simple minimal static page Web site served over CDN. I also want to use Google to look for relevant content, not whether a website conforms to Google’s own proprietary standards when searching.’ Chin, along with others, simply want to know how to disable the feature.
End users are frustrated as well because AMP changes the original URL’s content and does not always show what would be available on a full page.
The load times might be fast, pages are easier to read, but original intent and content are lost. What is the solution? Wait for technology to be upgraded enough to handle the original Web pages and bigger screens.
Whitney Grace, April 6, 2017