September 19, 2014
Machines know how to read in the sense that they have been programmed to recognize letters and numbers. They do not, however, comprehend what they are “reading,” and they cannot regurgitate it for users. Google’s Research Blog post “Teaching Machines To Read Between The Lines (And A New Corpus With Entity Salience Annotations)” explains how the search engine giant is using the New York Times Annotated Corpus to teach machines entity salience. Entity salience basically means a machine can comprehend what it is “reading,” locate the required information, and put it to use. The New York Times Annotated Corpus is a large dataset of 1.8 million articles spanning twenty years. If a machine can learn salience from anything, it would be this collection.
Entity salience is determined by term ratios and indexing backed by the Knowledge Graph. For each document, the system records a salience indicator, byte offsets, an entity index, the entity’s mention count as determined by a coreference system, and other information needed to digest the document.
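The general idea can be sketched in a few lines. The toy scorer below combines a mention-frequency ratio with how early an entity first appears, two of the simplest salience signals. It is illustrative only: Google’s actual pipeline relies on a coreference system and the Knowledge Graph, and the weights and function names here are my own invention.

```python
from collections import Counter

def salience_scores(mentions, doc_length):
    """Crude entity-salience sketch: combine mention frequency with
    how early the entity first appears in the document.
    `mentions` is a list of (entity, byte_offset) pairs."""
    counts = Counter(entity for entity, _ in mentions)
    first_seen = {}
    for entity, offset in mentions:
        first_seen.setdefault(entity, offset)
    scores = {}
    for entity, count in counts.items():
        freq = count / len(mentions)                        # term-ratio signal
        earliness = 1.0 - first_seen[entity] / doc_length   # earlier mention, higher salience
        scores[entity] = 0.5 * freq + 0.5 * earliness
    return scores

mentions = [("WNBA", 10), ("coach", 40), ("WNBA", 120), ("WNBA", 300)]
print(salience_scores(mentions, doc_length=1000))
```

A frequently and early mentioned entity like “WNBA” outscores a passing mention like “coach,” which is the intuition the annotated corpus lets researchers train and test against real editorial judgments.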
The system does work better with proper nouns:
“Since our entity resolver works better for named entities like WNBA than for nominals like “coach” (this is the notoriously difficult word sense disambiguation problem, which we’ve previously touched on), the annotations are limited to names.”
Google’s Peter Norvig was asked:

“What is one of the most-often overlooked things in machine learning that you wished more people would know about or would study more? What are some of the most interesting data science projects Google is working on?”
Norvig responded that the problems vary with the project you are working on and that Google has many data science projects underway, but he offered nothing specific.
Machine reading remains a work in progress. In short, machines are going to school.
September 15, 2014
If you follow the HP Autonomy firefights, you will enjoy “Autonomy Deal Fallout ‘More Extreme’ Than Hoped, Says HP’s UK Boss Andy Isherwood”:
“In spite of HP’s allegations that Lynch and senior Autonomy management inflated revenues with phantom deals and hidden low-margin sales, Isherwood liked what he found. Technologically at least it was a good buy, he insists…. We’re seeing clearly a lot more customer buying so it’s not an issue with the product.”
I also noted the positive signal of one percent revenue growth. Mr. Isherwood asserts:
Despite the decline in outsourcing revenues, a global trend, HP surprised Wall Street last month with 1pc revenue growth – its first in a dozen quarters – on the back of increasing share of the PC market. The UK picture was better still, with only outsourcing in decline. After falling last year overall sales here are on track to grow 7pc, says Isherwood.
With this attaboy and positive financials, what went wrong with Autonomy? As the article notes:
Unsurprisingly, he has only good things to say about the leadership of Whitman, Lynch’s nemesis. She is nearly two years into a five-year plan to turn the HP oil tanker, with increased investment in research and development, and a focus on the big trends of cloud computing, mobile working and big data as part of an attempt to turn HP’s scale and diversity to its advantage. “HP is a broad-based company,” says Isherwood. “Meg understood that immediately. At that time we had said we were going to hive off the PC business, but she came in and said ‘no’, the power is in the broad portfolio.”
Despite the praise and no hint of management culpability, HP still wants its money back. Interesting.
Stephen E Arnold, September 15, 2014
September 12, 2014
A criminal hiding in a foreign land for over a decade may begin to feel sure he has escaped the long arm of U.S. law. Today’s technology, however, has rendered that sense of security false for at least one wanted suspect. We learn from NakedSecurity that “Facial Recognition Software Leads to Arrest After 14-Year Manhunt.”
Neil Stammer, of New Mexico, was charged with some very serious offenses back in 1999, but escaped while out on bond. Writer Lisa Vaas reports:
“The case went cold until January 2014, when FBI Special Agent Russ Wilson was assigned the job of fugitive coordinator in Albuquerque, New Mexico. Wilson created a new wanted poster for Stammer and posted it onto FBI.gov in hopes of generating tips.
“A special agent with the Diplomatic Security Service (DSS) – a branch of the US Department of State whose mission includes protecting US Embassies and maintaining the integrity of US visa and passport travel documents – was testing new facial recognition software designed to uncover passport fraud when he decided, ‘on a whim,’ to use the software on FBI wanted posters.
“A match showed up between Stammer’s wanted poster and a passport photo issued under a different name. Suspecting fraud, the DSS agent contacted the FBI. The tip soon led Wilson to Nepal, where Stammer was living under the name Kevin Hodges and regularly visiting the US Embassy there to renew his tourist visa.”
Apparently, Stammer/Hodges had gotten comfortable in Nepal, teaching English. An FBI agent observed that the suspect seemed quite surprised when a joint operation with the Nepalese government led to his location and arrest.
Though the facial-recognition search that produced this arrest was performed “on a whim,” local and federal law-enforcement agencies across the country are using or considering such software. Vaas emphasizes that these implementations are being made in the absence of any standardized best practices, though some are currently being crafted by the National Telecommunications & Information Administration.
Cynthia Murrell, September 12, 2014
September 11, 2014
I read “The Revolutionary Technique That Quietly Changed Machine Vision Forever.” The main idea is that having software figure out what an image “is” has become a slam dunk. Well, most of the time.
The write up from the tech cheerleaders at Technology Review says, “Machines are now almost as good as humans at object recognition.”
A couple of niggling points: there is that phrase “almost as good,” and then there is the phrase “object recognition.”
Read the write up and then answer these questions:
- Is the method ready to analyze imagery fed by a drone to a warfighter during a live fire engagement?
- Is the system able to classify a weapon in a manner meaningful to a field commander?
- Can the system discern cancerous tissue from non-cancerous tissue in an image output from a medical imaging system?
- Does the method recognize objects in an image like the one shown below?
Image by Stephen E Arnold, 2013
If you submit this image to Google’s image recognition system, you get street scenes, not a person watching activities through an area cordoned off by government workers.
Google thinks the surveillance image is just like the scenes shown above. Note Google does not include observers or the all important police tape.
The write up states:
In other words, it is not going to be long before machines significantly outperform humans in image recognition tasks. The best machine vision algorithms still struggle with objects that are small or thin such as a small ant on a stem of a flower or a person holding a quill in their hand. They also have trouble with images that have been distorted with filters, an increasingly common phenomenon with modern digital cameras.
In science fiction stories, however, this stuff works. Lab progress is not real world application progress.
Stephen E Arnold, September 11, 2014
September 4, 2014
I read “Google Backed Calico to Launch $1.5 Billion Aging Research Center.” The idea of wellness is a good one, but the concept of life extension does not match up with information retrieval. Even as Google marginalizes blog search, the company’s other initiatives are fascinating. Google has not been able to diversify its revenue stream away from search based advertising, but it has certainly diversified its science projects. From Loon balloons to investments in quantum computing, Google’s activities remind me of a high school science fair on steroids.
I learned that this new venture, which joins Google’s delivery drone investments, is focused on:
The new San Francisco Bay Area facility will focus on drug discovery and early drug development for diseases like neurodegeneration and cancer. Calico’s larger aim is lifespan extension.
What does this bode for good old fashioned relevant search results? More ads and less relevance is one possibility. Search, I fear, is parked on an access road to the information highway.
Stephen E Arnold, September 4, 2014
September 4, 2014
We have been monitoring Sail Labs, and the company has been quiet on the news front for a long time. At the beginning of July, Sail Labs posted this press release: “Sail Labs Announces Availability Of Release Version 2013-2 And Media Mining Indexer 6.3.” The company is a leading provider of speech technology; Sail is an acronym for “speech artificial intelligence and language lab.” It is located in Vienna, Austria and has a strong commitment to open source.
The new upgrades improve the already popular media mining client and media mining server. They include newly supported languages, API documentation, a time range filter for search and retrieval of workflow data items, scheduling of Web sites without an RSS feed, and support for new cloud features. The list of product enhancements is even longer, covering search, ontology creation, and more treats for open source users:
“Improve performance of Suggest Open Source Information item for Report view, by applying result paging
Show count of referenced and suggested Open Source Information item in chapter tree of reports”
It is great that Sail Labs is still creating quality speech technology products. Too bad the company does not have the same US presence as Nuance.
Whitney Grace, September 04, 2014
September 1, 2014
A video on Snapzu.com titled The Computer That’s Smarter Than YOU & I offers an explanation of Watson, IBM’s supercomputer. It opens at the dawn of civilization and traces humankind’s constant innovation since. With the creation of the microchip, modern technology really began to ramp up, and the video asks (somewhat rhetorically) what the next great technological innovation will be. The answer: the reasoning computer. The video shows a demo of the supercomputer trying to understand the pros and cons of the sale of violent video games. Watson worked through the topic as follows:
“Scanned approximately 4 million Wikipedia articles. Returning ten most relevant articles. Scanned all three thousand sentences in top ten articles. Detected sentences which contain candidate claims. Identified borders of candidate claims. Assessed pro and con polarity of candidate claims. Constructed demo speech… the sale of violent video games should be banned.”
Watson went on to list its reasons for choosing this stance, such as “exposure to violent video games results in increased physiological arousal.” It also offered a refutation: that the link between the games and actual violent action has not been proven. The computer’s ability to reason on its own, without human aid, is touted as the truly exciting innovation. Meanwhile, we are still waiting for a publicly accessible demo.
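The pipeline Watson narrates (retrieve relevant text, detect candidate claims, assess pro and con polarity) can be caricatured in a few lines. The keyword-cue classifier below is my own naive stand-in for the polarity step; IBM’s system uses trained models, and these cue lists are invented purely for illustration.

```python
# Naive sketch of the last step in the demo's pipeline: assessing the
# pro/con polarity of candidate claims about a motion. Real systems use
# trained classifiers, not keyword lists; everything here is illustrative.

PRO_CUES = {"results in", "increased", "leads to", "causes"}
CON_CUES = {"not been proven", "no evidence", "fails to", "does not"}

def claim_polarity(claim):
    """Label a claim 'pro', 'con', or 'neutral' by counting cue phrases."""
    text = claim.lower()
    pro = sum(cue in text for cue in PRO_CUES)
    con = sum(cue in text for cue in CON_CUES)
    if pro > con:
        return "pro"
    if con > pro:
        return "con"
    return "neutral"

print(claim_polarity("Exposure to violent video games results in "
                     "increased physiological arousal"))   # pro
print(claim_polarity("The link between games and violence has "
                     "not been proven"))                   # con
```

The toy version gets the two example claims from the demo right, but the hard part Watson tackles is upstream: finding the candidate claims in three thousand sentences in the first place.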
Chelsea Kerwin, September 01, 2014
August 20, 2014
Google is famous for its very curious research arm, and now the company has published its favorite findings of 2013. We learn of the generous gesture from eWeek’s “Google Shares Research Findings with Scientific World,” where writer Todd R. Weiss discusses the roundup originally posted on the Google Research blog. It is a very interesting list and worth checking out in full. What caught my eye were the reports on machine learning and natural language processing. Weiss writes:
“Machine learning is a continuing topic, as seen in papers including … the paper ‘Efficient Estimation of Word Representations in Vector Space,’ which looks at a ‘simple and speedy method for training vector representations of words,’ according to the post.
“’The resulting vectors naturally capture the semantics and syntax of word use, such that simple analogies can be solved with vector arithmetic. For example, the vector difference between “man” and “woman” is approximately equal to the difference between “king” and “queen,” and vector displacements between any given country’s name and its capital are aligned,’ the post read.”
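That vector arithmetic is easy to demonstrate with hand-built toy vectors. The 2-D vectors below are contrived so the man/woman offset matches the king/queen offset; real word2vec embeddings are learned from text and typically have hundreds of dimensions, so treat this purely as a sketch of the idea.

```python
import numpy as np

# Toy 2-D "word vectors" contrived so the gender offset is consistent.
# Real word2vec vectors are learned, typically 100-300 dimensions.
vectors = {
    "man":   np.array([1.0, 0.0]),
    "woman": np.array([1.0, 1.0]),
    "king":  np.array([3.0, 0.2]),
    "queen": np.array([3.0, 1.2]),
    "paris": np.array([7.0, 5.0]),
}

def nearest(query, exclude):
    """Return the vocabulary word closest to `query` by cosine similarity."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max((w for w in vectors if w not in exclude),
               key=lambda w: cos(vectors[w], query))

# king - man + woman lands on queen, the analogy the post describes
analogy = vectors["king"] - vectors["man"] + vectors["woman"]
print(nearest(analogy, exclude={"king", "man", "woman"}))  # queen
```

Excluding the query words themselves is a standard trick, since the nearest neighbor of the analogy vector is often one of the inputs.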
Weiss next turns to natural language processing with the report, “Token and Type Constraints for Cross-Lingual Part-of-Speech Tagging.” He quotes the paper:
“Constructing part-of-speech taggers typically requires large amounts of manually annotated data, which is missing in many languages and domains. In this paper, we introduce a method that instead relies on a combination of incomplete annotations projected from English with incomplete crowd-sourced dictionaries in each target language. The result is a 25 percent error reduction compared to the previous state of the art.”
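As the abstract describes it, the method combines two weak signals: type constraints (a dictionary of plausible tags per word) and token constraints (tags projected across aligned bilingual text). The sketch below shows one simple way the two could be intersected; the Spanish example, the dictionary, and the fallback tag set are all invented for illustration and do not reflect the paper’s actual model.

```python
# Toy combination of type constraints (crowd-sourced tag dictionary) and
# token constraints (tags projected from aligned English text). All data
# here is invented; the paper uses a trained model, not this intersection.

tag_dictionary = {                 # type constraints: allowed tags per word
    "la":     {"DET", "PRON"},
    "casa":   {"NOUN"},
    "blanca": {"ADJ", "NOUN"},
}
ALL_TAGS = {"NOUN", "VERB", "ADJ", "DET", "PRON"}

def constrain(tokens, projected):
    """For each token, narrow its candidate tags using both signals.
    `projected` holds a tag projected from English, or None if missing."""
    allowed = []
    for word, proj in zip(tokens, projected):
        candidates = tag_dictionary.get(word, ALL_TAGS)
        if proj in candidates:      # projection agrees with the dictionary
            candidates = {proj}     # token constraint pins the tag down
        allowed.append(candidates)
    return allowed

print(constrain(["la", "casa", "blanca"], ["DET", "NOUN", None]))
```

Where the projected tag and the dictionary agree, the tag is pinned down; where the projection is missing or incomplete, the dictionary still narrows the search, which is how incomplete annotations and incomplete dictionaries compensate for each other.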
The article concludes by noting that Google is no stranger to supporting the research community, pointing to its App Engine for Research Awards program. It also notes that the company grants academics access to Google infrastructure for research purposes. Will all this generosity help Google in the PR arena?
Cynthia Murrell, August 20, 2014
August 20, 2014
Bubble? What bubble? ZDNet informs us that “Salesforce Acquired Big Data Startup RelateIQ” for a sum approaching $400 million. The deal will be Salesforce’s second-largest acquisition, following its purchase of “marketing cloud” outfit ExactTarget last year for $2.5 billion. Reporter Natalie Gagliordi writes:
“According to a document filed Friday with the Securities and Exchange Commission, Salesforce will pay up to $390 million for the Palo Alto, California-based startup, which provides relationship intelligence via data science and machine learning. RelateIQ will become a Salesforce subsidiary, the filing says.
“On its website, RelateIQ says it’s built ‘the world’s first Relationship Intelligence platform’ that redefines the world of CRM. In a nutshell, the platform captures sales data from email, calendars and smartphone calls and social media to provide insights in real time.”
Relationship intelligence, eh? That’s indeed a new one (outside the discipline of sociology, anyway). RelateIQ launched in 2011, based out of Palo Alto. In nearby San Francisco, Salesforce was launched in 1999 by a former Oracle exec. Now its success in cloud-based customer-relationship-management solutions has it operating offices around the world. Will the spending spree pay off?
Cynthia Murrell, August 20, 2014
August 14, 2014
Short honk: I don’t have too much to say about “Gartner: Internet of Things Has Reached Hype Peak.” Wow will have to suffice. The diagram in the article is amazing as well. A listicle is pretty darned limited when compared to a plotting of buzzwords from a consulting firm that vies with McKinsey, Bain, Boston Consulting, and Booz for respect. Another angle on this article is that it is published by a company that has taken a frisky approach to other folks’ information. For some background, check out “Are HP, Google, and IDC Out of Square.” I wanted to assemble a list of the buzzwords in the Network World article, but even for my tireless goslings, the task was too much. I could not figure out what the legends on the x and y axes meant. Do you know what a “plateau of productivity” is? I am not sure what “productivity” means unless I understand the definition in use by the writer.
One fact jumps out for me:
“As enterprises embark on the journey to becoming digital businesses, they will leverage technologies that today are considered to be ‘emerging’,” said Hung LeHong, vice president and Gartner fellow. “Understanding where your enterprise is on this journey and where you need to go will not only determine the amount of change expected for your enterprise, but also map out which combination of technologies support your progression.”
The person making this statement probably has a good handle on the unpleasantness of a legal dispute. For some color, please see “Gartner Magic Quadrant in the News: NetScout Matter.”
Stephen E Arnold, August 14, 2014