August 5, 2014
Watson, fresh from its recipe innovations at Bon Appétit, is on the move…again. From the game show to the hospital, Watson has been demonstrating its expertise in the most interesting venues.
I read “A Room Where Executives Go to Get Help from IBM’s Watson.” The subtitle is an SEO dream: “Researchers at IBM are testing a version of Watson designed to listen and contribute to business meetings.” I know IBM has loads of search and content processing capability. In addition to the gems cranked out by Dr. Jon Kleinberg and Dr. Ramanathan Guha, IBM has oodles of acquisitions in the search and content processing sector. Do you know about Clementine? Are you familiar with iPhrase? Have you explored Cybertap’s indexing and search function with your local IBM representative? What about Vivisimo? What about the search functions in DB2, FileNet, and OmniFind regardless of its incarnation? Whew. That’s a lot of search and content processing horsepower. I think most of that power remains in the barn.
Watson is not in the barn. Watson is a raging bull. Watson is, I believe, something special. Based on open source technology plus home brew wizardry, Watson is a next-generation information retrieval world beater. The idea is that Watson is trained in a manner similar to the approach used by Autonomy in 1996. Then that indexed content is whipped into a question answering system. Hapless chefs, litigation wary physicians, and now risk averse MBAs can use Watson to make better decisions or answer really tough questions.
I know this to be true because Technology Review tells me so. Whatever MIT-tinged Technology Review says is pretty darned solid. Here’s a passage I noted:
Everything said in the room can be instantly transcribed, providing a detailed record of any meeting, and allowing the system to listen out for commands addressed to “Watson.” Those commands can be simple requests for information of the kind you might type into a search box. But Watson can also take a more active role in a discussion. In a live demonstration, it helped researchers role-playing as executives to generate a short list of companies to acquire.
The write up explains that a little bit of preparation is required. There’s the pesky training, which is particularly annoying when the topic of the meeting is, “The DOJ attorneys are here to discuss the depositions” or “We have a LOCA at the reactor. Everyone to my conference room now.” I suppose most business meetings are even more exciting.
Technology Review points out that the technology has a tough time converting executive speech to text. Watson uses the text as fodder for the indexing and parsing required to pass queries to the internal subsystems which then tap into Watson for answers. The natural language query and automatic query refinement functions seem to work well for game show questions and for discerning uses of tamarind. For a LOCA meeting or discussion of a deposition, Watson may need a bit more work.
I find the willingness of major “real” news outlets to describe Watson in juicy write ups an indication of the esteem in which IBM is held. My view is a bit different. I am not sure the Watson group at IBM knows how to generate substantial revenues. The folks have to make some progress toward $1 billion in revenue and then grow that revenue to a modest $10 billion in five or six years.
The fact that outfits in search and content processing have failed to hit more modest benchmarks for decades is irrelevant. The only search company that I know has generated billions is Google. Keep in mind that those billions come from online advertising. HP bought Autonomy for $11 billion in the hopes of owning a Klondike. IBM wisely went with open source technology and home grown code.
But the eventual effect of both HP’s and IBM’s approach will be more modest revenues. HP makes a name for itself via litigation and IBM is making a name for itself with demonstrations and some recipes.
Search and content processing, whether owned by a large company or a small one, faces some credibility, marketing, revenue, technology, and profit challenges. I am not sure a business triathlete can complete the course at this time. Talk is just so much easier than getting over or around the course intact.
Stephen E Arnold, August 5, 2014
July 21, 2014
The article titled Text Analytics Company Linguamatics Boosts Enterprise Search with Semantic Enrichment on MarketWatch discusses the launch of I2E Semantic Enrichment from Linguamatics. The new release allows for the mining of a variety of texts, from scientific literature to patents to social media. It promises faster, more relevant search for users. The article states,
“Enterprise search engines consume this enriched metadata to provide a faster, more effective search for users. I2E uses natural language processing (NLP) technology to find concepts in the right context, combined with a range of other strategies including application of ontologies, taxonomies, thesauri, rule-based pattern matching and disambiguation based on context. This allows enterprise search engines to gain a better understanding of documents in order to provide a richer search experience and increase findability, which enables users to spend less time on search.”
Whether it is spinning semantics for search or search spun for semantics, Linguamatics has made its technology available to tens of thousands of users of enterprise search. Representative John M. Brimacombe was straightforward in his comments about the disappointment surrounding enterprise search, but optimistic about I2E. It is currently being used by many top organizations, as well as the Food and Drug Administration.
Chelsea Kerwin, July 21, 2014
July 3, 2014
If you are interested in the utility of open source information, you will want to pay particular attention to the disappearing content triggered by the EU’s right to be forgotten. Information is hard to find if the index has been scrubbed. I thought about the “disappearing” of information when I read “Out of Band.” The write up states:
Crowdsourcing and the wealth of networks are terms that are in vogue. What the government generally, and the secret world particularly, refuse to acknowledge is that information is a team sport and nature bats last. The government is only as good as its ability to do outreach, and if it relies on lies, nature—reality—will always reveal the truth at some future date.
Interesting point. However, when the most used source of information is filtering information, open source access becomes more important. With a single point of access, the reality becomes what’s findable. Will information access expand? Mr. Steele points out:
For the secret world, only a million-dollar custom-made shim will do, and they won’t notice if the beltway bandit sells them a piece of a beer can claiming it is the custom shim. I cannot overstate the ignorance and inattentiveness of today’s contracting officers and contracting officer technical representatives in the secret world.
In my view, his perspective applies to both commercial indexes and to government information methods. Fascinating. I keep wondering if Google is now the information government.
Stephen E Arnold, July 3, 2014
June 26, 2014
The EasyAsk for Magento solution has allowed Sonic Sense to deliver a much richer user experience with visual Search-as-you-Type, natural language search with highly accurate results and dynamic relevant navigation.
EasyAsk is a better choice than Solr, according to the write up:
“Sonic Sense is another shining example of the dramatic improvements in customer experience that EasyAsk delivers for Magento or any e-commerce site,” said Craig Bassin, EasyAsk CEO. “EasyAsk’s solution is head and shoulders above the SOLR option and other third party search solutions for Magento Enterprise which is proven by the results at Sonic Sense and dozens of Magento customers flocking to EasyAsk.”
I navigated to www.sonicsearch.com and ran some queries. I will boil down my experience to one representative query, and invite you to run your own queries to make sure I did not miss a key point.
My test query was “audio mixer recorder.” I received three results pages. The results on the first page did include audio mixer with recording functions. However, the results on pages 2 and 3 were not relevant. This type of query relaxation allows a company to display more results, giving the impression of a hefty line up of products.
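Query relaxation is easy to picture in code. What follows is a minimal sketch under stated assumptions: the catalog entries are invented, and the `search` function is a hypothetical illustration, not a description of how EasyAsk or Sonic Sense works. A strict pass keeps only the items that contain every query term. A relaxed pass then pads the list with items that match just some of the terms.

```python
# Toy catalog; the product names are invented for illustration only.
CATALOG = [
    "portable audio mixer recorder",
    "eight channel audio mixer",
    "handheld voice recorder",
    "microphone cable",
]


def search(query, catalog=CATALOG):
    """Return (strict, relaxed) result lists for a query.

    strict:  items that contain every query term (an AND match).
    relaxed: items that contain at least one term but not all of them,
             ordered by how many query terms they share.
    """
    terms = query.lower().split()
    strict, relaxed = [], []
    for item in catalog:
        words = set(item.lower().split())
        hits = sum(1 for term in terms if term in words)
        if hits == len(terms):
            strict.append(item)
        elif hits > 0:
            relaxed.append((hits, item))
    # Stable sort keeps catalog order among items with equal hit counts.
    relaxed.sort(key=lambda pair: pair[0], reverse=True)
    return strict, [item for _, item in relaxed]


strict, relaxed = search("audio mixer recorder")
print("strict: ", strict)
print("relaxed:", relaxed)
```

In this toy setup, a results display that appends the relaxed list to the strict one looks fuller, but everything after the strict matches satisfies only part of the query.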
However, the faceted navigation function did not work. On page three, when I clicked on the option for the two products between $1 and $100, the system did not return a results page.
Response time struck me as sluggish. I did not expect Amazon-type displays, but I found myself wondering about the suitability of the Sonic Sense infrastructure to the demands of the search system.
For more information about EasyAsk, a natural language search system once owned by Progress Software, navigate to www.easyask.com.
Stephen E Arnold, June 26, 2014
June 9, 2014
The estimable IDC published “An AI Milestone: Chatbot Passes Turing Test by Posing as 13-Year-Old Boy.” I assume that the writer was compensated and the IDC issued a contract for the write up. Isn’t that the way IDC operates most of the time?
Well, maybe. More interesting to me than the tap dancing of the big outfits on their way to revenue is a story that points out computers can fool humans. Humans fool humans, so it makes sense that humans will want computers to fool humans too.
According to the “real journalist” story:
At an event on Saturday at the Royal Society in London, a conversation program running on a computer called Eugene Goostman was able to convince more than a third of the judges that it was human. It marks the first time that any machine has passed the Turing Test proposed in 1950 by Alan Turing, regarded as the father of artificial intelligence (AI), according to the university, which organized the event.
Good for Eugene.
My view is that search engines already fool humans, effectively and frequently. A user assumes that the results from a free Web search will be timely, accurate, and objective. Like IDC’s approach to its authors’ content, the assumptions are sometimes not in line with reality.
Run queries on Bing, Google, and Yandex. What do you get? On the test queries I present in my lectures, getting through the advertising and self serving content takes a lot of work.
I assume that Eugene’s impact will make it more difficult to get information that answers a user’s question with what might be called “real” information.
Artificial intelligence is artificial. Fooling one third of the judges is not as impressive as fooling most people who look for information in a major Web search system and get filtered, skewed, distorted, and pay to play results.
Progress is not an illusion. Like much in today’s go go world, magic happens. Few are the wiser. When you read a document with an “expert’s name” on it, you may be reading the words of another person who is trampled upon. Exciting. Eugene, good work fooling humans.
Stephen E Arnold, June 9, 2014
May 30, 2014
Does silicon have taste buds? Do algorithms sniff the essence of Kentucky barbecue?
I read a darned amazing article called “I Tasted BBQ Sauce Made By IBM’s Watson, And Loved It.” The write up reports that IBM and partner Co.Design used the open source, home grown code, and massive database to whip up a recipe for grilling. IBM is going whole hog with the billion dollar baby Watson, which is supposed to be one of IBM’s revenue fountains any day now.
According to the write up, which may or may not have the ingredients of a “real” news story:
Most BBQ sauces start with ingredients like vinegar, tomatoes, or even water, but IBM’s stands out from the get go. Ingredient one: White wine. Ingredient two: Butternut squash. The list contains more Eastern influences, such as rice vinegar, dates, cilantro, tamarind (a sour fruit you may know best from Pad Thai), cardamom (a floral seed integral to South Asian cuisine) and turmeric (the yellow powder that stained the skull-laden sets of True Detective) alongside American BBQ sauce mainstays molasses, garlic, and mustard.
And most important for the grillin’ fans in Harrod’s Creek, the author used the Watson concoction on tofu. I am not sure that the folks in Harrod’s Creek know what tofu is. I do know that the idea of creating a barbecue sauce without bourbon in it is a culinary faux pas. Splash tamarind on a couple of dead squirrels parked above the coals, and the friends of Daniel Boone may skin the offender and think about grillin’ something larger than a squirrel.
The author who is scoring the tofu and broccoli treat reports:
I test it again and again. Finally I just slather my plate in the stuff. It’s delicious–the best way I can describe it is as a Thai mustard sauce, or maybe the middle point between a BBQ sauce and a curry. Does that sound gross? I assure you that it isn’t…But as I mop my plate of the last drips of Bengali Butternut BBQ Sauce, contemplating the difference between a future in which computers addict us to the next Lean Cuisine and one where they attempt to eradicate us with Terminators, Napoleon’s old adage comes to mind: An army marches on its stomach. He–or that–who controls our stomachs controls it all.
Yes. From game show win to a tofu topping, IBM Watson is redefining search, corporate strategy, and the vocabulary of cuisine for tofu and broccoli lovers. Kentucky freshly killed and skinned grilled squirrel may not benefit.
If you suggest that vendors of information retrieval technology have lost their keen marketing edge, you are not in touch with butternut squash and reality. Should the digital chefs put Kentucky bourbon in Bengali Butternut BBQ Sauce? Myron Mixon, the winningest man in barbecue, may say, “That’s what I am talkin’ for my whole hog.” Could IBM sponsor the barbecue cook off program? Mr. Mixon may be a lover of tamarind and tofu too.
Stephen E Arnold, May 30, 2014
May 20, 2014
I wanted to associate Cognos with Cognea. Two different things. IBM’s Watson unit, according to “IBM Watson Acquires Artificial Intelligence Startup Cognea,” is beefing up its artificial intelligence capabilities. Facebook, Google, and other outfits are embracing the dreams of artificial intelligence like it is 1981 when Marvin Weinberger was giving talks about AI’s revolutionizing information processing. I have lost track of Marvin, although I recall his impassioned polemics, 30 years after hearing him lecture. Unfortunately I remain skeptical about “artificial intelligence” because Watson, as I understood the pitch after Jeopardy, was already super smart. I suppose Cognea can add some marketing credibility to Watson. That system is curing disease and performing wonders for the insurance industry, if I embrace the IBM public relations’ flow.
In my lectures about the Big O problem, I point out that many of today’s smartest systems (Search2, to name one) implement clever methods to make well known numerical recipes run like a teenager who just gulped three cans of Jolt Cola followed by a Red Bull energy drink.
The reality is that there are more sophisticated mathematical tools available. The problem is that the systems available cannot exploit these algorithmic methods. I am pretty confident that Cognea tells a great story. I am even more confident that IBM will do the “Vivisimo” thing with whatever technology Cognea actually has. Without a concrete demo, benchmarks, and independent evaluations, I will remain skeptical about “a cognitive computing and conversational artificial intelligence platform.”
I am far more interested in the Cybertap technology that IBM acquired and seems to be keeping under wraps. Cybertap works. Artificial intelligence, well, it depends on how one defines “artificial” and “intelligence” doesn’t it?
Stephen E Arnold, May 20, 2014
May 20, 2014
One tech-savvy skeptic channels his frustration with Age-of-Aquarius bromides into what he calls his “New Age [BS] Generator.” With a mouse click, the application assembles random esoteric phrases into sentences that do sound remarkably like something one might see from modern mystics. Blogger seb pearce explains he was inspired while watching Deepak Chopra debate philosophy:
“After sitting through hours of New Age rhetoric, I decided to have a crack at writing code to generate it automatically and speed things up a bit. I cobbled together a list of New Age buzzwords and cliché sentence patterns and this is the result.
“You’ll get some profound-sounding nonsense here, too.
“So, what is this for? Put it on your website as placeholder text. Print it out as a speech for your yoga class and see if anyone can guess a computer wrote it. Use it to write the hottest new bestseller in the self-help section, or give false hope to depressed friends and family members.”
Pearce admits that he was not the first to put forth such a generator. While he was working on his mockery, another page called “The Enigmatic Wisdom of Deepak Chopra” was launched; he encourages us to try both generators. The popular sites indicate that pearce is not the only one tired of philosophical clichés.
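The recipe the blogger describes, a list of buzzwords dropped into cliché sentence patterns, can be sketched in a few lines. This is a minimal illustration and not pearce's actual code: the word lists, the patterns, and the function names `generate_sentence` and `generate_text` are all invented for the example.

```python
import random

# Illustrative vocabulary; these lists are invented, not taken from the site.
NOUNS = ["consciousness", "energy", "the cosmos", "intuition"]
VERBS = ["transcends", "awakens", "nurtures", "reveals"]
ADJECTIVES = ["quantum", "infinite", "sacred", "unbounded"]

# Cliché sentence patterns with slots to be filled from the lists above.
PATTERNS = [
    "{noun} {verb} {adj} {noun2}.",
    "To walk the path is to embrace {adj} {noun}.",
    "Only {adj} {noun} {verb} {noun2}.",
]


def generate_sentence(rng):
    """Fill one randomly chosen pattern with randomly chosen buzzwords."""
    pattern = rng.choice(PATTERNS)
    sentence = pattern.format(
        noun=rng.choice(NOUNS),
        noun2=rng.choice(NOUNS),
        verb=rng.choice(VERBS),
        adj=rng.choice(ADJECTIVES),
    )
    # Capitalize only the first character; leave the rest untouched.
    return sentence[0].upper() + sentence[1:]


def generate_text(sentence_count, seed=None):
    """Join several generated sentences; a seed makes the output repeatable."""
    rng = random.Random(seed)
    return " ".join(generate_sentence(rng) for _ in range(sentence_count))


print(generate_text(3))
```

Passing a fixed `seed` yields the same paragraph on every run, which is handy for placeholder text. Leaving it out gives fresh nonsense each time.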
Cynthia Murrell, May 20, 2014
April 22, 2014
Hakia has been a little quiet lately, but that doesn’t mean the upstart search engine isn’t still gaining fans. We found a really enthusiastic review in a recent Christiano Kwena post, “Proof! Hakia.com Works Better than Google Search on Long Tail Keyphrases.”
According to Kwena:
If you are searching using natural language phrases, then I urge you to check out Hakia.com. You can still revert back to Google for some other searches, but if you have a 10 word phrase that you are searching for, then the big Giant Google will likely take you round and round.
Actually, things aren’t so quiet around Hakia headquarters. According to a recent PR Newswire piece, Hakia partnered with FLOW to work on social media marketing. According to one exec, “We are excited that Flow has chosen to integrate [Hakia] into its social commerce platform. We expect many other technology innovators to move in this direction.” We think the world of Hakia and look forward to them making routine splashes again. This is one of the sharpest enterprise search companies on the block and always worth watching.
Patrick Roland, April 22, 2014
April 2, 2014
OpenCalais is an open source project that creates rich semantic data by using natural language processing and other analytical methods through a Web service interface. That is a simple explanation for a piece of powerful software. OpenCalais was originally part of ClearForest, but Thomson Reuters acquired the project in 2007. Instead of marketing OpenCalais as proprietary software, Reuters allowed it to remain open. OpenCalais has since become valued open source metadata software that is used on everything from blogs to specialized museum collections.
There are many notables who use OpenCalais and a sample can be found on “The List Of OpenCalais Implementations Grows.”
OpenCalais is excited about the new additions to the list:
“Add 10 to the list of innovative sites and services that use OpenCalais to reduce costs, deliver compelling content experiences and mine the social web for insight. See our press release for more details on each. We are thrilled to recognize the following new sites and services that are changing the way we engage with news and the social Web. They join a growing number of others in media, publishing, blogging, and news aggregation who use OpenCalais.”
Among them are The New Republic, Al Jazeera’s English blogging news networks, Slate Magazine’s blogging network, and I*heart* Sea. Not only do news Web sites use OpenCalais, but news aggregation apps do as well, including Feedly, DocumentCloud, and OpenPublish. Expect the list to grow even longer and consider OpenCalais for your own metadata solution.