Universal Text Translation Is the Next Milestone for AI
February 9, 2018
As the globe gets smaller, individuals are in more contact with people who don’t speak their language, or they are reading information written in a foreign language. Programs like Google Translate are flawed at best, and it is clear this is a niche waiting to be filled. With the rise of AI, it looks like that is about to happen, according to a recent GCN article, “IARPA Contracts for Universal Text Translator.”
According to the article:
The Intelligence Advanced Research Projects Activity is a step closer to developing a universal text translator that will eventually allow English speakers to search through multilanguage data sources — such as social media, newswires and press reports — and retrieve results in English.
The intelligence community’s research arm awarded research and performance monitoring contracts for its Machine Translation for English Retrieval of Information in Any Language program to teams headed by leading research universities paired with federal technology contractors.
Intelligence agencies, said IARPA project managers in a statement in late December, grapple with an increasingly multilingual, worldwide data pool to do their analytic work. Most of those languages, they said, have few or no automated tools for cross-language data mining.
This sounds like a very promising opportunity to get everyone speaking the same language. However, we think there is still a lot of room for error. We are hedging our bets on Unbabel’s AI translation software, which is backed up by human editors. (They raised $23M, so they must be doing something right.) That human angle looks like the hinge on which success in this rich field will turn.
Patrick Roland, February 9, 2018
And the Greatest Tech Headline of 2018 Is…
February 8, 2018
Short honk: In my morning news flow, I spotted what may be the greatest headline of 2018 (at least in the first five weeks of the year). Here it is:
Google Gave the World Powerful AI Tools, and the World Made Porn with Them
You can read the naked truth in a revealing story in Quartz. Try this link.
Strip away the veneer about AI. Breathless prose ignites one’s passion for tech thrills. With Reddit doing the censorship thing, this is “real” news, maybe just organic. (I almost typed another word with an s and an m.)
Stephen E Arnold, February 8, 2018
Financial Research: Rumblings Get Louder
February 8, 2018
Regulations are causing small tremors in the high altitude research business. I read “U.S. Asset Managers Shake Up Equity Research as Banks Cut Back.” The write up offered several pieces of intelligence which might be considered “real” news.
First, outfits with money to invest and “churn” are hiring people who know specific things; for example, a former product manager at a company manufacturing gear related to artificial intelligence. No MBA needed was the takeaway for me.
Second, big money outfits have cut back on buying research. According to the article, one big money executive stopped buying bank research and learned “that he could live without most of it.”
Third, I highlighted this headache-inducing statement for the providers of high end research:
Major global investment banks slashed their equity research budgets from a peak of $8.2 billion in 2008 to $3.4 billion in 2017, according to Frost Consulting. McKinsey projects the top 10 banks will cut those budgets by another 30 percent in the near term…
My question, “What happens to the Investext business?” Another one: “What acquisitions will big money companies make in order to deal with the changes in research?”
Worth watching.
Stephen E Arnold, February 8, 2018
The Future of Social Media is Old School
February 8, 2018
Before social media, the only way to express yourself online was via a mostly anonymous patchwork of blogs and sites that could not go viral, because virality didn’t exist yet. Oddly, some bright minds are returning to this model with txt.fyi, a platform where you can post anything you want without it being indexed by search engines. This old-fashioned message board was examined in a recent Wired article, “This Stripped-Down Blogging Tool Exemplifies Antisocial Media.”
“I wanted something where people could publish their thoughts without any false game of social manipulation, one-upmanship, and favor-trading,” he says. This is what I found so interesting about his creation. Its antivirality doesn’t necessarily prevent a post from becoming wildly popular. (A txt.fyi URL shared on, say, Facebook could perhaps go viral.) But its design favors messages to someone, not everyone.
[The inventor] discovered someone using txt.fyi to write letters to a deceased relative. It was touching and weirdly human, precisely the sort of unconventional expression we used to see a lot more of online. But today we sand down those rough edges, those barbaric yawps, in the quest for social spread. Even if you don’t want to share something, Medium or Tumblr or Snapchat tries to make you. They have the will to virality baked in.
This is a neat idea and might have a longer shelf life than you’d think. That’s because we are firm believers that every good idea on the internet gets retooled for awfulness. (Reddit, anyone?) This quasi-dark web blogging approach is almost certain to be used for nefarious purposes and will become a tool for hate speech and crime.
Patrick Roland, February 8, 2018
The Appification of Search: Dr. Frankenstein Is Back in the Innovation Basement
February 7, 2018
When I need information, I want to define my area of interest. I want to select a database which is likely to contain relevant information. I want to receive results and short summaries. I want to work through the content which conforms to my query. Time consuming and difficult work. But that’s how I roll down the information highway.
I noted a write up on Google’s blog, “The Keyword.” The story (or marketing piece) tells me that when I look for an airline flight, I will be able to book that flight from the search results.
Sounds like a great idea.
As I stated in the opening paragraph, I want to work through results. In the case of looking for a flight, I want to check different departure and return dates, available airports, number of stops, layover times, etc.
Once I locate a particular flight, I check the cost of that flight using different online services.
The reason? I have been flying around for more than a half century, and I have learned how an uninformed decision can set up an overnight in February in the Minneapolis-St. Paul airport. Believe me, that’s not a great place to sleep as the snow falls and the meeting in Fargo becomes essentially impossible.
The write up states:
We’re evolving the way our hotel search works on smartphones to help users explore options and make decisions on their smallest screens. The new hotel search experience includes better price filtering, easier-to-find amenity information and the ability to book right from Google.
Some of the folks looking for flights will find convenience and a small screen ideal for their needs.
Not for me.
I do not trust one-stop shops. I do not trust aggregators. I do not trust information assembled when ad dollars may be fluttering like those Minnesota snowflakes. I have learned that Southwest flights and some European carriers’ data require a visit to the airline’s Web site. Some human travel agents still consolidate tickets for wild and crazy “groups.”
But my principal concern is that online trust is no longer an operating assumption for me. Unless I slog through the data, I lack the information necessary for an informed decision.
Appification of search is one more shift away from locating information, processing it, and making an informed decision.
Thank you, Mother Google. But no. I don’t want search results to be an app. I want search results to be one component of data collection and a precursor to analysis. Also, I like a big screen.
Stephen E Arnold, February 7, 2018
Startup Success: Cleverness and Lady Luck
February 7, 2018
Right now a game-changing startup is begging for funding. That’s a given. But it is just as likely that the same company is being completely ignored. It’s a common story that the biggest asset for startups is luck, which was wonderfully illustrated by a recent Quartz story, “Google’s Early Failure to Sell Itself Shows Why We Can’t Recognize Good Ideas.”
According to the investor who wrote Sergey Brin his second check, and who advised the founders to give up on a failed plan to license Google:
“It’s very hard to get anyone else to adopt your baby. I told them, ‘You have to raise your baby yourself.’ They came back some months later, and I don’t think they said I was right, but they’d decided to start their own company because nobody was interested in their baby.”
This has always been the case. These babies that tech gurus design often don’t find sympathetic investors. It’s often like hearing news of a brilliant musician who went unnoticed because of bad luck or a beautiful movie that fell through the cracks. It’s timing and luck and networking, and it’s been like this for as long as anyone can remember. Quora was asking how big a role luck plays in startup success way back in 2010. The results are about what you would expect: Lady Luck often picks her dates without much thought.
Patrick Roland, February 7, 2018
Searching Video and Audio Files is Now Easier Than Ever
February 7, 2018
While text-based search has been honed to near perfection in recent years, video and audio search still lags. However, a few companies are beginning to chip away at this problem. One that recently caught our attention was vidDistill, a company that distills YouTube videos into an indexed list.
According to their website:
vidDistill first gets the video and captions from YouTube based off of the URL the user enters. The caption text is annotated with the time in the video the text corresponds to. If manually provided captions are available, vidDistill uses those captions. If manually provided captions are not available, vidDistill tries to fall back on automatically generated captions. If no captioning of any sort is available, then vidDistill will not work.
Once vidDistill has the punctuated text, it uses a text summarization algorithm to identify the most important sentences of the entire transcript of the video. The text summarization algorithm compresses the text as much as the user specifies.
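vidDistill does not say which summarization algorithm it uses. Purely as an illustration, here is a minimal Python sketch of one simple extractive approach (scoring each sentence by the average frequency of its words across the transcript) applied to timestamped captions. The caption format of (start_seconds, sentence) pairs, the names `summarize` and `tokenize`, the tiny stopword list, and the `keep_ratio` parameter are our own assumptions, not vidDistill’s code or API.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
             "that", "this", "for", "on", "with", "as", "we", "you", "i"}

def tokenize(text):
    """Return lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]

def summarize(captions, keep_ratio=0.3):
    """Keep the highest-scoring sentences from a timestamped transcript.

    captions: list of (start_seconds, sentence) pairs (assumed format).
    keep_ratio: fraction of sentences to keep, standing in for the
                user-chosen compression level vidDistill describes.
    Returns the kept pairs in their original (chronological) order.
    """
    if not captions:
        return []
    # Word frequencies over the whole transcript.
    freq = Counter(w for _, s in captions for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        # Average rather than total, so long sentences are not automatically favored.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    n_keep = max(1, round(len(captions) * keep_ratio))
    ranked = sorted(range(len(captions)),
                    key=lambda i: score(captions[i][1]), reverse=True)
    chosen = sorted(ranked[:n_keep])  # restore the order the sentences were spoken in
    return [captions[i] for i in chosen]

if __name__ == "__main__":
    captions = [
        (0.0, "Welcome to the channel."),
        (4.5, "Search engines index text by building an inverted index."),
        (11.2, "An inverted index maps each term to the documents containing it."),
        (18.0, "Thanks for watching."),
    ]
    for start, sentence in summarize(captions, keep_ratio=0.5):
        print(f"{start:6.1f}s  {sentence}")
```

Because each kept sentence carries its caption timestamp, the summary doubles as an index into the video, which matches the “indexed list” the company describes.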
It was interesting and did what it claimed. However, we wish users could search for a word and have the index bring it up, so they could skip directly to specific parts of a video. This has been done quite well for audio. A service called Happy Scribe, which is aimed at journalists transcribing audio notes, takes an audio file and (for a small fee) transcribes it to text, which can then be searched. It’s pretty elegant and fairly accurate, depending on the audio quality. We could see vidDistill applying this approach to great success.
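The keyword jump we wish for is not hard to sketch once captions carry timestamps. Below is a minimal Python illustration, assuming the same hypothetical (start_seconds, text) caption pairs: a small inverted index maps each word to the caption start times where it appears, and a query returns the times of captions containing every query word. The names `build_index`, `search_captions`, and `mmss` are ours; this is not how vidDistill or Happy Scribe actually work.

```python
import re
from collections import defaultdict

def words(text):
    """Lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def build_index(captions):
    """Map each word to the sorted start times (seconds) of captions containing it.

    captions: list of (start_seconds, text) pairs (assumed format).
    """
    index = defaultdict(list)
    for start, text in captions:
        for w in set(words(text)):  # set() so a repeated word is recorded once per caption
            index[w].append(start)
    for times in index.values():
        times.sort()
    return index

def search_captions(index, query):
    """Return start times of captions that contain every word in the query."""
    terms = words(query)
    if not terms:
        return []
    # .get() avoids inserting empty entries into the defaultdict.
    hits = set(index.get(terms[0], []))
    for t in terms[1:]:
        hits &= set(index.get(t, []))
    return sorted(hits)

def mmss(seconds):
    """Format a time in seconds as m:ss for display."""
    m, s = divmod(int(seconds), 60)
    return f"{m}:{s:02d}"

if __name__ == "__main__":
    captions = [
        (0.0, "Welcome to the channel."),
        (75.0, "Now we build an inverted index."),
        (130.0, "The inverted index answers queries fast."),
    ]
    idx = build_index(captions)
    print([mmss(t) for t in search_captions(idx, "inverted index")])
```

Each returned time is a point a player could seek to, which is the “skip directly to specific parts of a video” behavior described above.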
Patrick Roland, February 7, 2018
Online Giants: Not into Sunshine
February 6, 2018
Two quick items.
The first comes from Thomson Reuters (now in the process of adapting to its financial reality). The write up is “Germany Opens Anti-Trust Probe into Online Advertising.” The German regulators are referenced as the “cartel” office. And the bone of contention? Facebook and Google get lots of online advertising money. Major publishing companies have either been squeezed by the US high school science club companies or missed the U-Bahn completely.
The second is more an academic and, therefore, considered opinion than “real” news. The article is “Big Tech’s Bid to Control FOIA.” The main point is that companies like the Silicon Valley science club outfit Facebook want to keep tax incentives and other information out of bright light.
It seems that some government agencies, like Germany’s cartel office, want to know more about US online giants. And maybe some of the online giants want to keep information about their business dealings in a low light situation.
Which will win? I suppose one can turn to Paradise Lost for guidance. But there was that pesky sequel Paradise Regained. Now one has to figure out who or what is making a heaven of hell and a hell of heaven.
Stephen E Arnold, February 6, 2018
German Scientists Find Freedom Of Search
February 6, 2018
A storm had been brewing in Germany over scientists’ access to expensive academic journals. The dispute had more to do with search and rights than with science, and the publisher stood up and did something shocking. It did…the right thing. We learned more in a recent Nature story, “Germany vs Elsevier: universities win temporary journal access after refusing to pay fees.”
According to the story:
The Dutch publishing giant Elsevier has granted uninterrupted access to its paywalled journals for researchers at around 200 German universities and research institutes that had refused to renew their individual subscriptions at the end of 2017.
The institutions had formed a consortium to negotiate a nationwide licence with the publisher. They sought a collective deal that would give most scientists in Germany full online access to about 2,500 journals at about half the price that individual libraries have paid in the past. But talks broke down and, by the end of 2017, no deal had been agreed. Elsevier now says that it will allow the country’s scientists to access its paywalled journals without a contract until a national agreement is hammered out.
This is a victory not just for the scientists but for freedom of information. We applaud Elsevier for putting aside profit (temporarily) in favor of people. We wish more companies and governments would take this example to heart.
Patrick Roland, February 6, 2018
DarkCyber for February 6, 2018, Now Available
February 6, 2018
The Beyond Search DarkCyber video program for February 6, 2018, is now available. You can view the program on YouTube or on Vimeo. This week’s program reveals that the go-to system for purchasing military-grade weapons is Telegram, the messaging app. Lebanon’s surveillance program has been exposed. After years of covert operation, human error allowed researchers to characterize the operation. White hat and black hat techniques were used by the Middle Eastern country. Haven, a software app attributed to Edward Snowden, promises protection from third-party access to a mobile phone. DarkCyber tested the app and found that it could transmit data back to the app’s creator. The program also reviews some of the investigative techniques used to locate the operator of a Dark Web pornography site. In addition to analysis of Dark Web traffic, investigators matched behavioral patterns to Surface Web sources and examined linguistic behaviors to track down users. You can view the video from the Beyond Search main page at this link.
Kenny Toth, February 6, 2018