Bitvore: The AI, Real Time, Custom Report Search Engine
May 16, 2017
Just when I thought information access had slumped quietly through another week, I read this article in the capitalist tool you know as Forbes, the content marketing machine:
This AI Search Engine Delivers Tailored Data to Companies in Real Time.
This write up struck me as more interesting than the most recent IBM Watson prime time commercial about smart software for zealous professional basketball fans or Lucidworks’ (really?) acquisition of the interface builder Twigkit. Forbes Magazine’s write up did not point out that the company seems to be channeling Palantir Technologies; for example, Jeff Curie, the president, refers to employees as Bitvorians. Take that, you Hobbits and Palanterians.
A Bitvore 3D data structure.
The AI, real time, custom report search engine is called Bitvore. Here in Harrod’s Creek, we recognized the combination of the computer term “bit” with one of our favorite morphemes, “vore,” as in carnivore, omnivore, or the vegan-sensitive herbivore.
Voice Search and Big Data: Defining Technologies for 2017
April 20, 2017
I read “Voice Search and Data: The Two Trends That Will Shape Online Marketing in 2017.” If the story is accurate, figuring out what people say and making sense of data (lots of data) will create new opportunities for innovators.
The article states:
Advancements in voice search and artificial intelligence (AI) will drive rich answers that will help marketers understand the customer intent behind I-want-to-go, I-want-to-know, I-want-to-buy and I-want-to-do micro-moments. Google has developed algorithms to cater directly to the search intent of the customers behind these queries, enabling customers to find the right answers quickly.
My view is that the article is correct in its assessment.
Where the article and I differ boils down to search engine optimization. The idea that voice search and Big Data will make fooling the relevance algorithms of Bing, Google, and Yandex a windfall for search engine optimization experts is partially true. Marketing whiz kids will do and say many things to deliver results that do not answer my query or meet my expectation of a “correct” answer.
My view is that the proliferation of systems which purport to understand human utterances in text and voice-to-text conversions will reveal that error rates of 60 to 75 percent are not good enough. Errors can be buried in long lists of results. They can be sidestepped if a voice-enabled system works from a set of rules confined to a narrow topic domain.
Open the door to natural language parsing, and the error rates which once were okay become a liability. In my opinion, this will set off a scramble among companies struggling to get their smart software to provide information that customers accept and use repeatedly. Fail and customer turnover can be a fatal knife wound to the heart of an organization. The cost of replacing a paying customer is high. Companies need to keep the customers they have with technology that helps keep paying customers smiling.
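To make the narrow-domain escape hatch concrete, here is a minimal sketch in Python of the rule-confined approach mentioned above. The domain (a pizza-ordering assistant), the intents, and the keyword patterns are all hypothetical; the point is that when only a handful of intents are possible, even an error-laden transcript usually carries enough signal to match the right rule.

```python
import re

# Hypothetical sketch: a voice-enabled system confined to a narrow topic
# domain (here, pizza ordering). With only a few possible intents, noisy
# voice-to-text output can still be matched by a few robust keywords.
RULES = [
    (re.compile(r"\b(order|want|get)\b.*\bpizza\b", re.I), "ORDER_PIZZA"),
    (re.compile(r"\b(where|track)\b.*\border\b", re.I),    "TRACK_ORDER"),
    (re.compile(r"\bcancel\b", re.I),                      "CANCEL_ORDER"),
]

def classify(transcript: str) -> str:
    """Map a (possibly error-laden) transcript to a domain intent."""
    for pattern, intent in RULES:
        if pattern.search(transcript):
            return intent
    return "CLARIFY"  # ask the user to rephrase instead of guessing

# Even a garbled transcription still lands on the right intent.
print(classify("uh wanna get a peperoni pizza please"))  # ORDER_PIZZA
```

The design choice is the same one the paragraph describes: a system that can say “I did not understand” within a tiny domain avoids the liability of confidently wrong open-ended parsing.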
What companies are able to provide higher accuracy linguistic functions? There are dozens of companies which assert that their systems can extract entities, figure out semantic relationships, and manipulate content in a handful of languages.
The problem with most of these systems is that certain, very widely used methods collapse when high accuracy is required for large volumes of text. The shortcut is to use numerical tricks, and some of those tricks create disconnects between the information the user requests or queries and the results the system displays. Examples range from the difficulties of tuning the Autonomy Dynamic Reasoning Engine to figuring out how in the heck Google Home arrived at a particular song when the user wanted something else entirely.
Our suggestion is that instead of emailing IBM to sign a deal for that company’s language technology, you might have a more productive result if you contact Bitext. This is a company which has been on my mind. I interviewed the founder and CEO (an IBM alum as I learned) and met with some of the remarkable Bitext team.
I am unable to disclose Bitext’s clients. I can suggest that if you fancy a certain German sports car or use one of the world’s most popular online services, you will be bumping into Bitext’s Digital Linguistic Analysis platform. For more information, navigate to Bitext.com.
The data I reviewed suggested that Bitext’s linguistic platform delivers accuracy significantly better than some of the other systems’ outputs I have reviewed. How accurate? Good enough to get an A in my high school math class.
Stephen E Arnold, April 20, 2017
Cortana Becomes an MD
April 17, 2017
Smartphone assistants like Microsoft’s Cortana are only good for verbal Internet searches, but they can be made smarter with an infusion of machine learning and big data. According to Neowin, Microsoft is adding NLP and AI to Cortana and sending it to medical school: “The UK’s Health Services Now Relies On Cortana Intelligence Suite To Read Medical Research.”
Microsoft takes a lot of flak for their technology, but they do offer comprehensive solutions that do amazing things…when they work. The UK Health Services will love and hate their new Cortana Intelligence Suite. It will be utilized to read and catalog medical research to alert medical professionals to new trends in medicine:
Researching and reading can consume medical professionals’ time, stealing a valuable resource from patients.
That’s why the UK’s National Institute for Health and Care Excellence (NICE) is now relying on Microsoft’s Cortana Intelligence Suite for sifting through medical data. NICE uses machine-learning algorithms to look at published medical research, categorize it, and feed it to volunteer citizen scientists, who then re-categorize and process it. This leaves researchers time to go through the final data, interpret and understand it, without having to waste time on the way. It also forms a virtuous cycle, whereby the citizen scientists feed the computer algorithm data and improve it, and the computer algorithm feeds the volunteers better data, speeding up their work.
Medical professionals need to be aware of current trends and how medical research is progressing, but the sheer number of papers and the amount of information available are impossible to keep up with. Cortana can smartly pare down the data and transform it into digestible, useful material.
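The quoted passage describes a classic human-in-the-loop arrangement. The sketch below is schematic, not NICE’s actual pipeline: a trivially simple keyword model proposes categories, a “volunteer” corrects them, and the corrections feed back into the model. All names and the toy data are invented for illustration.

```python
from collections import Counter, defaultdict

# Schematic sketch of the "virtuous cycle" described above (not NICE's
# actual system): machine pass, volunteer pass, feedback pass.

class KeywordModel:
    def __init__(self):
        self.word_labels = defaultdict(Counter)  # word -> label counts

    def predict(self, text):
        votes = Counter()
        for word in text.lower().split():
            votes.update(self.word_labels[word])
        return votes.most_common(1)[0][0] if votes else "uncategorised"

    def learn(self, text, label):
        for word in text.lower().split():
            self.word_labels[word][label] += 1

def virtuous_cycle(model, papers, human_review, rounds=2):
    for _ in range(rounds):
        for paper in papers:
            proposal = model.predict(paper)            # machine categorizes
            corrected = human_review(paper, proposal)  # volunteer re-categorizes
            model.learn(paper, corrected)              # correction improves model
    return model

# Toy usage: the "volunteer" fixes anything mentioning tumours.
fix = lambda paper, label: "oncology" if "tumour" in paper else label
model = virtuous_cycle(KeywordModel(),
                       ["tumour growth study", "flu vaccine trial"], fix)
print(model.predict("new tumour paper"))  # oncology
```

Each round the model hands the volunteers better proposals, and the volunteers hand the model better labels, which is the cycle the article describes.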
Whitney Grace, April 17, 2017
A Peek at the DeepMind Research Process
April 14, 2017
Here we have an example of Alphabet Google’s organizational prowess. Business Insider describes how “DeepMind Organises Its AI Researchers Into ‘Strike Teams’ and ‘Frontiers’.” Writer Sam Shead cites a report by Madhumita Murgia in the Financial Times. He writes:
Exactly how DeepMind’s researchers work together has been something of a mystery but the FT story sheds new light on the matter. Researchers at DeepMind are divided into four main groups, including a ‘neuroscience’ group and a ‘frontiers’ group, according to the report. The frontiers group is said to be full of physicists and mathematicians who are tasked with testing some of the most futuristic AI theories. ‘We’ve hired 250 of the world’s best scientists, so obviously they’re here to let their creativity run riot, and we try and create an environment that’s perfect for that,’ DeepMind CEO Demis Hassabis told the FT. […]
DeepMind, which was acquired by Google in 2014 for £400 million, also has a number of ‘strike teams’ that are set up for a limited time period to work on particular tasks. Hassabis explained that this is what DeepMind did with the AlphaGo team, who developed an algorithm that was able to learn how to play Chinese board game Go and defeat the best human player in the world, Lee Se-dol.
Here’s a write-up we did about that significant AlphaGo project, in case you are curious. The creative-riot approach Shead describes is in keeping with Google’s standard philosophy on product development—throw every new idea at the wall and see what sticks. We learn that researchers report on their progress every two months, and team leaders allocate resources based on those reports. Current DeepMind projects include algorithms for healthcare and energy scenarios.
Hassabis launched DeepMind in London in 2010, where offices remain after Google’s 2014 acquisition of the company.
Cynthia Murrell, April 14, 2017
The Algorithm to Failure
April 12, 2017
Algorithms have practically changed the way the world works. However, this nifty code also has limitations that lead to failures.
In a whitepaper posted on Cornell University’s arXiv server, authored by Shai Shalev-Shwartz, Ohad Shamir, and Shaked Shammah and titled Failures of Deep Learning, the authors say:
It is important, for both theoreticians and practitioners, to gain a deeper understanding of the difficulties and limitations associated with common approaches and algorithms.
The whitepaper touches on four pain points of deep learning. The authors propose remedial measures that could possibly overcome these impediments and lead to better AI.
Eminent personalities like Stephen Hawking, Bill Gates, and Elon Musk have, however, warned against advancing AI. Google in the past abandoned robotics as the machines were becoming too intelligent. What remains to be seen is who will win in the end: commercial interests or unfounded fear?
Vishal Ingole, April 12, 2017
Intelligence Researchers Pursue Comprehensive Text Translation
March 27, 2017
The US Intelligence Advanced Research Projects Agency (IARPA) is seeking programmers to help develop a tool that can quickly search text in over 7,000 languages. ArsTechnica reports on the initiative (dubbed Machine Translation for English Retrieval of Information in Any Language, or MATERIAL) in the article, “Intelligence Seeks a Universal Translator for Text Search in Any Language.” As it is, it takes time to teach a search algorithm to translate each language. For the most-used tongues, this process is quite well along, but not so for “low-resource” languages. Writer Sean Gallagher explains:
To get reliable translation of text based on all variables could take years of language-specific training and development. Doing so for every language in a single system—even to just get a concise summary of what a document is about, as MATERIAL seeks to do—would be a tall order. Which is why one of the goals of MATERIAL, according to the IARPA announcement, ‘is to drastically decrease the time and data needed to field systems capable of fulfilling an English-in, English-out task.’
Those taking on the MATERIAL program will be given access to a limited set of machine translation and automatic speech recognition training data from multiple languages ‘to enable performers to learn how to quickly adapt their methods to a wide variety of materials in various genres and domains,’ the announcement explained. ‘As the program progresses, performers will apply and adapt these methods in increasingly shortened time frames to new languages.’
Interested developers should note candidates are not expected to have foreign-language expertise. Gallagher notes that IARPA plans to publish their research publicly; he looks forward to wider access to foreign-language documents down the road, should the organization meet their goal.
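As a thought experiment, here is the shape an “English-in, English-out” pipeline might take. This is purely illustrative: the translate, search, and summarize helpers below are crude stand-ins I invented, not anything from the IARPA announcement.

```python
# Illustrative sketch of an "English-in, English-out" task. The three
# helpers are invented stand-ins, not a real MT or retrieval system.

def translate(text, target):
    # Stand-in: a real system would invoke a machine translation model.
    return text

def search(query, documents):
    # Stand-in: naive keyword overlap in place of in-language retrieval.
    terms = query.lower().split()
    return [d for d in documents if any(t in d.lower() for t in terms)]

def summarize(text):
    # Stand-in: truncation in place of an actual English summary.
    return text[:80]

def english_in_english_out(query_en, corpora):
    """English query in, English summaries out, across many languages."""
    results = []
    for language, documents in corpora.items():
        query_xx = translate(query_en, target=language)  # cross the language gap
        for doc in search(query_xx, documents):          # retrieve in-language
            results.append(summarize(translate(doc, target="en")))
    return results

corpora = {"sw": ["mafuriko storm report ..."], "tl": ["bagyo storm bulletin ..."]}
print(english_in_english_out("storm report", corpora))
```

MATERIAL’s hard part is, of course, everything the stand-ins skip: doing the translation and retrieval steps reliably for thousands of low-resource languages with little training data.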
Cynthia Murrell, March 27, 2017
MBAs Under Siege by Smart Software
March 23, 2017
The article titled Silicon Valley Hedge Fund Takes Over Wall Street With AI Trader on Bloomberg explains how Sentient Technologies Inc. plans to take the human error out of the stock market. Babak Hodjat co-founded the company and spent the past 10 years building an AI system capable of reviewing billions of pieces of data and learning trends and techniques to make money by trading stocks. The article states that the system is based on evolution:
According to patents, Sentient has thousands of machines running simultaneously around the world, algorithmically creating what are essentially trillions of virtual traders that it calls “genes.” These genes are tested by giving them hypothetical sums of money to trade in simulated situations created from historical data. The genes that are unsuccessful die off, while those that make money are spliced together with others to create the next generation… Sentient can squeeze 1,800 simulated trading days into a few minutes.
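For readers who want to see the shape of such a scheme, here is a minimal genetic-algorithm sketch following the generate-test-splice loop the patents describe. The one-parameter trading “gene” and the toy fitness function are my inventions, not Sentient’s simulator.

```python
import random

# Minimal genetic-algorithm sketch of the loop quoted above. The
# one-parameter "gene" and toy fitness are stand-ins, not Sentient's code.

def simulated_profit(gene, prices):
    """Toy fitness: trade on price jumps bigger than the gene's threshold."""
    cash, held = 0.0, False
    for prev, cur in zip(prices, prices[1:]):
        if not held and cur - prev > gene["threshold"]:
            cash -= cur; held = True   # buy
        elif held and prev - cur > gene["threshold"]:
            cash += cur; held = False  # sell
    return cash + (prices[-1] if held else 0.0)

def splice(a, b):
    """Combine two successful genes, with a small mutation."""
    child = {"threshold": random.choice([a["threshold"], b["threshold"]])}
    child["threshold"] += random.gauss(0, 0.1)
    return child

def evolve(prices, population=50, generations=20):
    genes = [{"threshold": random.uniform(0.0, 3.0)} for _ in range(population)]
    for _ in range(generations):
        genes.sort(key=lambda g: simulated_profit(g, prices), reverse=True)
        survivors = genes[: population // 2]   # unsuccessful genes die off
        children = [splice(*random.sample(survivors, 2))
                    for _ in range(population - len(survivors))]
        genes = survivors + children           # spliced next generation
    return max(genes, key=lambda g: simulated_profit(g, prices))

historical = [random.gauss(100, 2) for _ in range(500)]  # simulated history
print(evolve(historical))
```

The attraction of the approach is exactly what the quote claims: the simulator can compress years of trading days into minutes, so trillions of candidate strategies can be tested and culled.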
Hodjat believes that handing the reins over to a machine is wise because it eliminates bias and emotions. But outsiders wonder whether investors will be willing to put their trust entirely in a system. Other hedge funds like Man AHL rely on machine learning too, but nowhere near to the extent of Sentient. As Sentient brings in outside investors later this year, the success of the platform will become clearer.
Chelsea Kerwin, March 23, 2017
Yandex Incorporates Semantic Search
March 15, 2017
Apparently ahead of a rumored IPO launch, Russian search firm Yandex is introducing “Spectrum,” a semantic search feature. We learn of the development from “Russian Search Engine Yandex Gets a Semantic Injection” at the Association of Internet Research Specialists’ Articles Share pages. Writer Wushe Zhiyang observes that, though Yandex claims Spectrum can read users’ minds, the tech appears to be a mix of semantic technology and machine learning. He specifies:
The system analyses users’ searches and identifies objects like personal names, films or cars. Each object is then classified into one or more categories, e.g. ‘film’, ‘car’, ‘medicine’. For each category there is a range of search intents. [For example] the ‘product’ category will have search intents such as buy something or read customer reviews. So we have a degree of natural language processing, taxonomy, all tied into ‘intent’, which sounds like a very good recipe for highly efficient advertising.
But what if a search query has many potential meanings? Yandex says that Spectrum is able to choose the category and the range of potential user intents for each query to match a user’s expectations as closely as possible. It does this by looking at historic search patterns. If the majority of users searching for ‘gone with the wind’ expect to find a film, the majority of search results will be about the film, not the book.
‘As users’ interests and intents tend to change, the system performs query analysis several times a week,’ says Yandex. This amounts to Spectrum analysing about five billion search queries.
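Here is a toy illustration of the historic-pattern idea (my invented numbers, not Yandex’s data): if past users who typed ‘gone with the wind’ overwhelmingly clicked film results, the result page can be apportioned to match that distribution.

```python
from collections import Counter

# Toy illustration (invented numbers, not Yandex data): apportion ten
# result slots for an ambiguous query according to what past users chose.
history = Counter({"film": 720, "book": 210, "music": 70})

def result_quota(history, slots=10):
    total = sum(history.values())
    return {category: round(slots * count / total)
            for category, count in history.items()}

print(result_quota(history))  # {'film': 7, 'book': 2, 'music': 1}
```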
Yandex has been busy. The site recently partnered with VKontakte, Russia’s largest social network, and plans to surface public-facing parts of VKontakte user profiles, in real time, in Yandex searches. If the rumors of a plan to go public are true, will these added features help make Yandex’s IPO a success?
Cynthia Murrell, March 15, 2017
The Human Effort Behind AI Successes
March 14, 2017
An article at Recode, “Watson Claims to Predict Cancer, but Who Trained It To Think,” reminds us that even the most successful AI software was trained by humans, using data collected and input by humans. We have developed high hopes for AI, expecting it to help us cure disease, make our roads safer, and put criminals behind bars, among other worthy endeavors. However, we must not overlook the datasets upon which these systems are built, and the human labor used to create them. Writer (and CEO of DaaS firm Captricity) Kuang Chen points out:
The emergence of large and highly accurate datasets have allowed deep learning to ‘train’ algorithms to recognize patterns in digital representations of sounds, images and other data that have led to remarkable breakthroughs, ones that outperform previous approaches in almost every application area. For example, self-driving cars rely on massive amounts of data collected over several years from efforts like Google’s people-powered street canvassing, which provides the ability to ‘see’ roads (and was started to power services like Google Maps). The photos we upload and collectively tag as Facebook users have led to algorithms that can ‘see’ faces. And even Google’s 411 audio directory service from a decade ago was suspected to be an effort to crowdsource data to train a computer to ‘hear’ about businesses and their locations.
Watson’s promise to help detect cancer also depends on data: decades of doctor notes containing cancer patient outcomes. However, Watson cannot read handwriting. In order to access the data trapped in the historical doctor reports, researchers must have had to employ an army of people to painstakingly type and re-type (for accuracy) the data into computers in order to train Watson.
Chen notes that more and more workers in regulated industries, like healthcare, are mining for gold in their paper archives—manually inputting the valuable data hidden among the dusty pages. That is a lot of data entry. The article closes with a call for us all to remember this caveat: when considering each new and exciting potential application of AI, ask where the training data is coming from.
Cynthia Murrell, March 14, 2017
Yandex Finally Catches the Long-Tailed Queries
March 7, 2017
One of the happiest moments in a dog’s life is when, after countless hours spinning in circles, it catches its tail. Dogs wag for joy, even though they are chomping on their own happiness. When search engines were finally programmed to handle long-tail queries, that is, queries with many words, such as a question, people’s happiness was akin to a dog catching its tail. Google released RankBrain to handle long-winded and natural language queries, and now Yandex has released its own algorithm to handle questions, as reported in “Yandex Launches New Algorithm Named Palekh To Improve Search Results For Long-Tail Queries” from the AIRS Association.
Yandex is Russia’s most-used search engine, and in order to improve the user experience, it released Palekh to better process long-tail queries. Palekh, like RankBrain, will bring the search engine closer to understanding natural language, or the common vernacular. Yandex decided on the name Palekh because the Russian city of the same name has a firebird on its coat of arms. The firebird has a long tail, so the name fits perfectly.
Yandex handles more than 100 million queries per day that fall under the long-tail umbrella. When asked if Yandex based Palekh on RankBrain, Yandex only responded that the two algorithms are similar in their purposes. Yandex also uses machine learning and neural networks to build a smarter search engine:
Yandex’s Palekh algorithm has started to use neural networks as one of 1,500 factors of ranking. A Yandex spokesperson told us they have “managed to teach our neural networks to see the connections between a query and a document even if they don’t contain common words.” They did this by “converting the words from billions of search queries into numbers (with groups of 300 each) and putting them in 300-dimensional space — now every document has its own vector in that space,” they told us. “If the numbers of a query and numbers of a document are near each other in that space, then the result is relevant,” they added.
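To make the quoted idea concrete, here is a toy version of vector-space matching. Yandex reportedly learns 300-dimensional vectors from billions of queries; the hand-made 3-dimensional vectors below are mine, invented purely to show how a query and a document can be “near” without sharing a single word.

```python
import math

# Toy version of the vector-space idea in the quote: relevance comes from
# nearness in the space, not shared words. These hand-made 3-d vectors
# stand in for the learned 300-d vectors Yandex describes.
vectors = {
    "movie about space travel": [0.9, 0.1, 0.2],  # the query
    "interstellar film review": [0.8, 0.2, 0.3],  # relevant, no common words
    "apple pie recipe":         [0.1, 0.9, 0.1],  # irrelevant
}

def cosine(a, b):
    """Cosine similarity: 1.0 means pointing the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

query = vectors["movie about space travel"]
for doc in ("interstellar film review", "apple pie recipe"):
    print(doc, round(cosine(query, vectors[doc]), 2))
# interstellar film review 0.98  <- near the query, hence relevant
# apple pie recipe 0.24          <- far away, hence not
```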
Yandex is one of Google’s biggest rivals and it does not come as a surprise that they are experimenting with algorithms that will expand machine learning and NLP.
Whitney Grace, March 7, 2017