Blogs May Be Training Input for AI Systems
April 16, 2010
The Montréal Gazette ran an interesting story, “‘Mundane’ Blogs Could Help Train Artificial-Intelligence Computers: Researcher.” I think of blogs as marketing vehicles, not instructional material. That goes to show how little I know. For me, the key passage in the write up was:
For Andrew Gordon, there’s no such thing as a boring blog — even if it chronicles making breakfast or walking to work. A research scientist at the University of Southern California’s Institute for Creative Technologies, he’s heading a new project with the ambitious aim of archiving every English-language blog entry posted online — a million of them a day — in hopes of using this vast database to teach artificial-intelligence computers about real life. “People write about the mundane aspects of their daily life, and for me, personally, I find it incredibly interesting,” he says.
This line of research falls within what has been called “a formalization of common sense.”
Stephen E Arnold, April 16, 2010
No one paid for this post.
Wither Nervana?
April 16, 2010
I received a call about a company in the Seattle area. The firm is Nervana, founded in 2001. If you are one of the lucky folks attending the Gilbane conference in San Francisco, you can hear a talk by Nervana’s founder, Nosa Omoigui. Nervana focused on semantics and natural language processing. My Overflight files have a meaty collection of information about this company’s technology. Nervana received funding and ramped up its marketing in 2006. It pushed into the processing of health and medical content and then refocused its efforts on the processing of résumés. The firm’s Web site is online at www.nervana.com, but the news section has not been updated since November 2006. I continue to track the firm because Mr. Omoigui is involved with the Youth for Technology Foundation, which has a presence in Louisville, Kentucky.
What’s important about Nervana is that the company’s trajectory shows how a very bright entrepreneur in the field of content processing has positioned what is, in my opinion, a quite interesting technical system. The firm describes its technology as “a unique technology that allows knowledge workers to ask questions naturally within the context of their meanings.” A LinkedIn description adds:
“Nervana, Inc. provides knowledge discovery solutions for companies. Its solutions enable knowledge based workers to find, correlate, and retrieve the information from repositories both inside and outside their enterprise. The company’s products include Drug Discovery that provides Medline, life sciences news, and life sciences Web content for research and development teams; Business Discovery, which offers Medline, life sciences news, general news, patents, and life sciences Web; IP Discovery that enables users to discover and retrieve information from the United States, European, Japanese, and other worldwide patents; and Premium Discovery for enterprise customers to manage their in-house information. It also offers project management, logistics, pre-configuration, onsite installation, informatics consulting, and documentation services. Nervana was founded in 2001 and is headquartered in Seattle, Washington.”
My notes show that one of Nervana’s funding sources is now involved in a company that seems to use the original Nervana logo. This firm is Dipiti. In February 2008, SeattlePI ran a story titled “Dipiti, a Search Engine for Message Boards.” Dipiti seems to have gone offline, and its site now redirects to Hot Shopper.
Nervana’s trajectory suggests that next generation content processing has huge potential. Management and investors tried a number of different markets. The other thought that struck me is that the words and phrases used to describe the firm’s technology are as fresh today as they were during the firm’s marketing push in 2006. Next generation content processing evokes considerable market interest. Shortly before it repositioned, Nervana was named a “Hot 100” company and touted some major clients, including Procter & Gamble. (In my opinion, lists of “hot” companies may not be valid indicators of a firm’s health.)
This is an interesting case example of the challenges facing some types of technologies.
Stephen E Arnold, April 16, 2010
A freebie.
Siri and Its Virtual Assistant
April 16, 2010
The idea of having our own personal assistant to handle all the mundane stuff in life sounds intriguing to most of us. No more flipping through the Yellow Pages, making routine phone calls, or even searching on Google (gasp!): it sounds like a fantasy land akin to The Jetsons. However, Siri has released a new phone application that appears to do all those things and more. “Siri Launches Virtual Personal Assistant for iPhone 3GS” announces the release of the company’s virtual personal assistant, which can purchase theatre tickets, call for a taxi, or make restaurant reservations with just a vocal prompt. Born out of SRI International’s CALO (Cognitive Assistant that Learns and Organizes) project, Siri uses advanced technologies to enable intelligent, context-aware, question-and-answer interaction. The wide variety of Web services and APIs available allows Siri to get things done. These are things you no longer have to do yourself.
Melody K. Smith, April 16, 2010
IBM and Open Source
April 16, 2010
The idea that IBM was an open source outfit struck me as silly when I first heard about its 2005 patent pledge. I enjoyed the excellent article “IBM Breaks the Taboo and Betrays Its Promise to the FOSS Community.” You will want to read Florian Mueller’s write up and make up your own mind. The information presented does not surprise me. IBM has big revenue and may be one of the “too big to fail” outfits. The company has shifted from software to consulting, and it is now traveling well-worn paths: its online ambitions, deals with telecommunications companies, and baloney about the economics of mainframes. This passage caught my attention:
This proves that IBM’s love for free and open source software ends where its business interests begin. In market segments where IBM has nothing to lose, open source comes in handy and the developer community is courted and cherished. In an area in which IBM generates massive revenues (an estimated $25 billion annually just on mainframe software sales!), any weapon will be brought into position against open source. Even patents, which represent to open source what nuclear arms are in the physical world.
I think open source is one of the important trends in play at this time. Companies have wrapped themselves in open source robes. Beneath the robes is the same old entity. Same motives. Same goals. Same belief that marketing can create a corporate reality. Well, maybe IBM is protecting its mainframe assets and still loves open source?
Stephen E Arnold, April 16, 2010
A freebie.
Jazzing Up Bing for Spring
April 15, 2010
Microsoft seems to be using the Sgt. York approach with Bing, adding and tweaking features one at a time until it has a complete, dynamic package.
Some people wonder, however, whether this approach leaves users confused and uninterested, as TechNewsWorld.com reported in its article “Bing’s New Bells and Whistles Could Leave Searchers’ Heads Ringing.” Let’s face it: we live in a world that thrives on immediate gratification. Patience may be a virtue, but it is not a highly valued one these days. Is it wise to bank on users still being engaged, or even still present, by the time the product has peaked?
The newest feature making the biggest impact, at least out in the blogosphere, is the integration of Foursquare into Bing Maps results. According to Bing Group Product Manager Todd Schwartz,
“Selecting the Foursquare Map App in Bing Maps and zooming in to Greenwich Village will get you tips that show you what locals are saying about the hot spots in that area,” Schwartz wrote. “It’s like an interactive day planner, designed to help find the best things to do in that area. And if you have questions, you can always contact users through foursquare to get the inside scoop.”
The addition of Foursquare certainly signals that Microsoft is looking ahead to Bing’s mobile applications. However, consumers are currently happy with Google and the search features they find there. It will take a lot of incentive and fantastic features to persuade users to consider switching search engines. Microsoft seems to have devoted lots of time and attention to making just that happen with Bing. A commitment to update and refine Bing several times a year is a significant departure from Microsoft’s past approach.
This approach seems familiar, however. It is a tried and true Apple strategy. Release a product, capture your audience’s attention, and upgrade features frequently until you have a loyal customer base that pays close attention to any new product you release. Can we say iPod? And now iPad?
But will the effort be noticeable in market share and profits? It appears that Microsoft and Yahoo! are fighting over slices of a pie that doesn’t grow, so whatever one gains comes at the expense of the other.
Melody K. Smith, April 15, 2010
Note: Post was not sponsored.
eBook Sales to Grow
April 15, 2010
In a report from Goldman Sachs, analysts predicted that e-books would drive growth in U.S. book sales. “U.S. Book Sales to Increase on E-Books, Goldman Says” included this statement: “Apple’s share of the e-book market will surge to 33 percent in 2015 from 10 percent this year.” Amazon, it seems, will see its share of e-book sales decline to 28 percent from 50 percent. Will e-books remain books, or will they morph into interactive media? Will authors be able to create products that appeal to users of new devices like the Apple iPad? If publishers have to invest in software development, will increased production costs put further pressure on author royalties?
Stephen E Arnold, April 15, 2010
Unsponsored post.
Google Adds to Its Real Time Search Services
April 15, 2010
Short honk: Every Google watcher on the planet has documented Google’s most recent real-time search services. I just want to capture the date (April 14, 2010) and the links to the “official” announcements. For the blog post about the new service, navigate to “Replay It.” For the experimental find-a-person-to-follow service, point your browser to Google Follow Finder. Both services are likely to be used by a small percentage of Google.com users. These are advanced search features, and most users still bang in two or three words and look at the top results. One important aspect of Follow Finder is that it appears to be running on Google App Engine. The service could be useful for marketing and intelligence purposes.
Stephen E Arnold, April 15, 2010
Unsponsored post.
SSN Minute: Tactics to Strategy Available
April 15, 2010
Strategic Social Networking’s video for April 15, 2010, is now available. The subject is “Social Media Tactics to a Social Media Strategy,” written and presented by David Thimme, ArnoldIT.com’s social media analyst. The video runs about two minutes. You can access the video by navigating to http://ssnblog.com and clicking on the SSN Minute logo.
Stephen E Arnold, April 15, 2010
This is a sponsored link.
Automatic Translation Percolates
April 15, 2010
SDL provides global information management services to organizations. The firm is active in automated translation, Web content management, structured content technologies, and eCommerce. I learned that SDL reported some of the findings from its study of machine translation. The factoids in this write up come from “SDL Reveals Results of its Automated Translation Survey”:
- Only 28 percent of the 228 people in the sample are using or plan to use automated translation, a surprisingly low figure. (My anecdotal information suggested that machine translation use was well over 50 percent. Part of this perception comes from comments made about the uptake of Google’s free translation service and its automated method in the Chrome browser.)
- Respondents who do not use machine translation cite quality as the reason. (Human translators do handle slang better, particularly in casual communications and some types of informal writing. But based on our tests at ArnoldIT.com, machine translation works quite well on scientific, technical, and medical information and on certain types of formal business writing; for example, a proposal that conforms to a specific set of technical guidelines set forth in a statement of work.)
- More than half of the respondents wanted to hook machine translation to human translators in order to improve quality.
The article said:
SDL invested in machine translation in 2001 and launched its Knowledge-based Translation System in 2004. SDL publishes over 7 billion words of content through its automated translation systems every year. A 300 strong team of computational linguists, project managers and post-editors has been human post-editing machine translation for global clients for over 6 years. SDL sees the success of machine translation as being through its integration in the translation process. SDL integrates machine translation technology with consulting services, desktop translation memory and enterprise translation management systems in hosted, on-premise or SaaS environments to suit the needs of global business.
Beyond Search believes that the volume of content to be translated from source to target language makes human translation more problematic. One police department has more translators than a major Federal agency. The reason is that certain languages lack a sufficient number of translators. Google’s free system does not rely on humans, but the company invites users to submit a better translation than the one Google produces.
Costs of human translation continue to rise. It is an interesting situation: content volume is increasing, human translators are becoming more expensive, and free services are available.
Stephen E Arnold, April 15, 2010
Unsponsored post.
Google and Disruption: Will It Work Tomorrow?
April 15, 2010
Editor’s Note: The text in this article is derived from the notes Stephen E Arnold prepared for his keynote talk on April 15, 2010. He delivered this speech as part of Slovenian Information Days in Portoroz, Slovenia.
Thank you, Mr. Chairman. I am most grateful for the opportunity to address this group and offer some observations about Google and its disruptive tactics.
I started tracking Google’s technical inventions in 2002. A client, now out of business, asked me to indicate if “Google really had something solid.”
My analysis included a platform diagram and a list of markets that Google was likely to disrupt. I captured three ideas in my 2005 monograph “The Google Legacy”, which is still timely and available from Infonortics Ltd. in Tetbury, Glos.
The three ideas were:
First, Google had figured out how to add computing capacity, including storage, using mostly commodity hardware. I estimated the cost in 2002 dollars at about one-third of what companies like Excite, Lycos, Microsoft, and Yahoo were paying.
Second, Google had solved the problem of text search for content on Web pages. Google’s engineers were using that infrastructure to deliver other types of services. In 2002, there were rumors that Google was experimenting with services that ranged from email to an online community and messaging system. One person, whose name I have forgotten, pointed out that MOMA, Google’s internal network, was the test bed for this type of service.
Third, Google was not an invention company. Google was an applied research company. The firm’s engineers, some of whom came from Sun Microsystems and AltaVista.com, were adept at plucking discoveries from university research computing tests and hooking them into systems that improved on what most companies used for their applications. The genius lay in focus, selection, and integration.
Google is an information factory, a digital Rouge River construct. Raw materials enter at one end and higher value information products and services come out at the other end of the process.
In my second Google monograph, funded in part by another client, I built upon my research into technology and summarized Google’s patent activities between 2004 and mid 2007. Google Version 2.0: The Calculating Predator, also published by Infonortics Ltd., disclosed several interesting facts about the company.