Milward from Linguamatics Wins 2010 Evvie Award

April 28, 2010

The Search Engine Meeting, held this year in Boston, is one of the few events that focuses on the substance of information retrieval, not the marketing hyperbole of the sector. Now entering its second decade, the conference draws speakers who tackle challenging subjects. This year’s talks included “Universal Composable Indexing” by Chris Biow, Mark Logic Corporation; “Innovations in Social Search” by Jeff Fried, Microsoft; and “From Structured to Unstructured and Back Again: Database Offloading” by Gregory Grefenstette, Exalead, along with a dozen other important topics.

From left to right: Sue Feldman, Vice President, IDC; Dr. David Milward; Liz Diamond; Stephen E. Arnold; and Eric Rogge, Exalead.

Each year, the best paper is recognized with the Evvie Award. The “Evvie” was created in honor of Ev Brenner, one of the pioneers in machine-readable content. After a distinguished career at the American Petroleum Institute, Ev served on the planning committee for the Search Engine Meeting and contributed his insights to many search and content processing companies. One of the questions I asked after each presentation was, “What did Ev think?” I valued Ev Brenner’s viewpoint, as did many others in the field.

The winner of this year’s Evvie award is David R. Milward, Linguamatics, for his paper “From Document Search to Knowledge Discovery: Changing the Paradigm.” Dr. Milward said:

Business success is often dependent on making timely decisions based on the best information available. Typically, for text information, this has meant using document search. However, the process can be accelerated by using agile text mining to provide decision-makers directly with answers rather than sets of documents. This presentation will review the challenges faced in bringing together diverse and extensive information resources to answer business-critical R&D questions in the pharmaceutical domain. In particular, it will outline how an agile NLP-based approach for discovering facts and relationships from free text can be used to leverage scientific knowledge and move beyond search to automated profiling and hypothesis generation from millions of documents in real time.

Dr. Milward has 20 years’ experience of product development, consultancy and research in natural language processing. He is a co-founder of Linguamatics, and designed the I2E text mining system which uses a novel interactive approach to information extraction. He has been involved in applying text mining to applications in the life sciences for the last 10 years, initially as a Senior Computer Scientist at SRI International. David has a PhD from the University of Cambridge, and was a researcher and lecturer at the University of Edinburgh. He is widely published in the areas of information extraction, spoken dialogue, parsing, syntax and semantics.

Presenting this year’s award were Eric Rogge, Exalead, and Liz Diamond, niece of Ev Brenner. The award winner received a recognition award and a check for $500. A special thanks to Exalead for sponsoring this year’s Evvie.

The judges for the 2010 Evvie were Dr. David Evans (Evans Research), Sue Feldman (IDC), and Jill O’Neill (NFAIS).

Congratulations, Dr. Milward.

Stuart Schram IV, April 28, 2010

Sponsored post.

New Search and Old Boundaries

April 28, 2010

Yesterday in my talk at a conference I pointed out that for many people, the Facebook environment will cultivate new species of information retrieval. Understandably the audience listened politely and converted my observations into traditional information retrieval methods. Several of the people with whom I spoke pointed out that the Facebook information was findable only with a programmatic query via the Facebook application programming interfaces or by taking a Facebook feed and processing it. The idea that “search” now spans silos, includes structured and unstructured data, and delivers actionable results describes what some organizations want. There are challenges, of course. These include:

  • Mandated silos of information; for example, in certain situations, mash ups and desiloization are prohibited for legal or practical reasons
  • The costs of shifting from inefficient, expensive methods to more informed methods; for example, the costs of data transformation can be onerous. I have talked with individuals who point out that data transformation can consume significant sums of money and that these expenditures are often inadequately budgeted. One result is a slowdown or cutback in the behind-the-scenes preparatory work
  • Business processes that have emerged from convention, user behavior, or the refinement of a system over time. When “data” are meshed with such a business process, the marriage is a less-than-happy one. Data-centric thinking can be blunted when juxtaposed with certain traditional business processes and methods.

In short, the new world can be envisioned, based on speculation, or assembled from fragmentary reports from the field. I can imagine the intrepid 16th century navigators understanding why innovators have to push forward into a new and unknown world. One reminder of the hazards is the assertion that an estimated 358 million personal data records have been leaked since 2005.

The Guardian article “Facebook Privacy Hole ‘Lets You See Where Strangers Plan to Go’” provides an example of one challenge. The point of the write up is that the Facebook social network has a “privacy hole”. The Guardian says:

Some people report that they are able to see the public “events” that Facebook users have said they will attend – even if the person is not a “friend” on the social network…The implications of being able to find out the movements of any of the 400m people on Facebook are potentially wide-ranging – although the flaw does not seem to apply to every user, or every event. Yee says that the simplest way to prevent your name appearing in such lists is to put “not attending” against any event you are invited to.

As the Facebook approach to finding information captures users, the barriers between new types of information and the uses to which those information objects can be put come down. In a social space, the issue is personal privacy. In an organizational space, the issue is the security of information assets.

As young people enter the workforce, they bring a comfort level with Facebook-type systems markedly different from mine. I think organizations are largely unable to control effectively what some employees do with online services. Telework, mobile devices, and smart phones present a management and information challenge.

The lowering of information barriers and the efforts to dissolve silos further reduce an organization’s control of information and its knowledge of the uses to which that information may be put.

Let’s step back.

First, ineffective search and content processing systems exist, so organizations need ways to address the costs and inefficiencies of incumbent systems. Web services and fresh approaches to indexing content seem to be solutions to findability problems in some situations.

Second, employees—particularly those comfortable with pervasive connectivity and social methods of obtaining information—do what works for them. These methods are not necessarily controllable or known to some employers. An employee can use a personal smart phone to ask “friends” a question. After all, what are friends for?

Third, vendors want to describe their systems using words and phrases that connote ways to solve findability problems. Talking about merged data and collaboration may be what’s needed to close a deal.

When these three ingredients are mixed, the result is a security and information control challenge that is only partially understood.

Is it possible to deliver a next generation information experience and minimize the risks from such a system? Sure, but there will be surprises along the route. Whether it is Mr. Zuckerberg’s schedule or insights into the Web browsing habits of government employees, there will be unexpected and important insights about these systems. The ability to use a search interface to obtain reports is increasing. Are the privacy and security controls lagging behind?

Stephen E Arnold, April 28, 2010

Unsponsored post.

Oracle Shape Shifts Search

April 28, 2010

Oracle’s SES10g search system and the native search functions in PL/SQL provide licensees with ways to locate information. Oracle has been moving to leapfrog the problems of traditional search, and the article “Florida State U Transforms Reporting with Business Intelligence” may provide a glimpse of what Oracle will do to prevent search vendors from poaching. The article says:

Online Management of Networked Information, or OMNI, the university’s name for its ERP system, consists of an enterprise portal, financials, HR and payroll, and enterprise performance management. That word, “online,” is revealing. Beginning in March 2008 Florida State has been in a continuing process to move off of third-party BI vendor and legacy tools for reporting and to open up its data systems to 1,500 active users on campus through several analytics tools provided in Oracle Business Intelligence Suite Enterprise Edition Plus (OBIEE). These include a query system; dashboards; Microsoft Office-integrated analytics; and “ibots,” e-mail alerts sent when specific user-set conditions in the data occur.

Two points. First, the article focuses on the financial payback, estimated at no less than $360,000. Second, the integrated system delivers needed information in context, courtesy of Oracle. Forget search. The new system tells users about important events.

What happens to search? It becomes a utility, not the main event. I keep hearing rumors that Oracle is thinking about buying a traditional search and content processing vendor. That may be true. The story to watch is Oracle’s use of cost savings and desirable new features to deemphasize traditional search. Will the approach work? To many organizations that $360,000 savings looks tempting enough to make the setup sticker shock lose its impact.

Stephen E Arnold, April 28, 2010

Unsponsored post.

Google and Its Nation State Mistakes

April 28, 2010

Wired Magazine’s “Word War III: Google vs. Government” makes clear that the worm is turning. Five years ago, Google was wonderful. Now Wired identifies three mistakes in Google’s handling of a letter from data protection authorities in nine countries. The fact that Google is publicly excoriated for making mistakes is proof enough for me. One mistake is the use of “weasel words” in the responses. The second goof, according to Wired, is the company’s dismissal of the issue of privacy. And, the third is the lack of action or as Wired says, the nine countries hopefully will “shut up and go away.” I think Google is the cat’s pajamas. Wired does not share my viewpoint. How many others share Wired’s viewpoint is the key question.

Stephen E Arnold, April 27, 2010

Unsponsored post.

Google and Travel

April 28, 2010

For a number of years, Google has processed a query like SFO LGA as a city pair. One of the results is a table that allows one click access to travel reservations. Well, not exactly one click as you will see when you try the query. Close enough. The story is that Google has an interest in acquiring a travel plumbing company called ITA. You can get the scoop in “Google Rumor Puts Focus on ITA.” My take on this alleged deal was, “Got it.”

When I read the Gerson Lehrman Group’s “Travel Game Changer – What Does Google’s Potential acquisition of ITA Mean?” I realized the gap between my “got it” and the GLG analyst’s view; that is, Google buys ITA and gets some nifty plumbing.

I certainly appreciate the ITA plumbing, but my “got it” considered these factors:

  • ITA is a keystone company; that is, it supports a number of related entities and their operations. Google’s getting into satellite imagery pivoted on a similar keystone type of deal.
  • ITA has credibility; that is, ITA translates to “travel” in some potentially desirable market segments.
  • ITA has engineers; that is, these engineers bring domain knowledge and skills to the Google.

What happens in this type of deal is that Google’s disruptive potential is increased. Just as the satellite imagery deal caused some excitement at Microsoft when it took place, the ITA deal may have similar impact. This type of deal is not a deal based exclusively on technology, no matter how impressive. Google has some technology that can perform somewhat similar functions. The ITA technology may, like the dMark technology, be Google-ized over time if the deal actually takes place.

I find the GLG analysis interesting, but I am not sure it positions the deal in the Googley world. I did like the embedded “hire me” ad. (Not quite Google grade but an interesting touch in an “objective” informational post in my opinion.)

Stephen E Arnold, April 28, 2010

Unsponsored post.

SAS and Social Media

April 28, 2010

The social media bandwagon rolls on. I read “SAS Aims to Make a Splash in Social Media Analytics” and realized that even large firms cannot ignore Facebook’s impact. True, there are many social media companies, but Facebook has emerged as the go-to service, threatening to eclipse even Twitter. The story says:

SAS says its technology can identify influencers within social networks, quantify their impact and from that forecast the future volume of social media conversations. The ultimate aim is to predict what impact these conversations will have on a business so companies can allocate relevant resources, create “what-if” scenarios and correlate key marketing metrics like brand preference, web traffic, online campaign effectiveness and media mix.

IBM SPSS will be quick to respond. Statistics could become even more fun.

Stephen E Arnold, April 28, 2010

Unsponsored post.

High Speed Fiber May Not Be a Must Have

April 27, 2010

Ars Technica’s “16% of US Homes Can Now Get Fiber, but Deployments Slowing” surprised me. I eagerly await high speed fiber in Harrod’s Creek, Kentucky. The links I have now are not up to the task of loading piggy Web pages, downloading “must see” videos, or acquiring the software updates that vendors hose at my netbook with regularity. The article reports:

less than a third of those homes that can order a fiber connection have actually done so. Verizon’s take rate has been decent but not earth-shattering…

What happens to the next generation, rich media services from vendors such as Amazon, Apple, Google, and Netflix, among others, if the necessary connections are not in place? The revenue models built on the assumption that US consumers want super high speed connections may have to be reworked. Maybe Google’s approach is more of a test than a market challenge? Google has some interesting rich media capabilities which, it appears to me, Google has not rolled out. The time may not be propitious for a really big play in high speed to the home. Could the opportunity be in a Meraki type of high speed wireless service?

Stephen E Arnold, April 27, 2010

Unsponsored post.

Just Systems and Stilo Team for XML Management

April 27, 2010

XML (Extensible Markup Language) is one of those buzzwords that cause folks to nod knowingly. Do these folks share a common understanding of XML, which has more flavors than a Ben & Jerry’s ice cream store? When the subject turns to XML management, my hunch is that the likelihood of a common understanding tilts south. “JustSystems and Stilo International Partner to Enable Enterprises to Manage XML Content” describes a tie up between the two firms this way:

JustSystems, the largest independent software vendor in Japan and a worldwide leader in XML and information management technologies, and Stilo International, the leading provider of automated XML content conversion solutions and associated technologies, are pleased to announce a partnership that lets the enterprise effectively manage content in XML through the integration of Stilo Migrate v2 and JustSystems XMetaL Enterprise 6.0. Migrate v2, also announced today, is the latest version of Stilo’s on-demand XML content conversion and transformation service.

XML can be an unwieldy beastie. How unwieldy? Well, whipping around source files and converting them to well-mannered XML is interesting. Then managing these verbose instances adds an extra level of excitement, which gives some traditional database administrators an opportunity to spend weekends at the office.

You can get more information about JustSystems and Stilo on their respective Web sites. With JustSystems anchored in Japan and Stilo in the UK, the teaming makes marketing sense. Now will the tie up produce the anticipated upticks in sales and revenues? The tie up will be a fascinating story to watch.

Stephen E Arnold, April 27, 2010

Unsponsored post.

The UK Guardian Shares Its Facebook Love

April 27, 2010

In general, I agree with the Guardian’s “Why Facebook’s Open Graph Idea Must Be Taken Seriously.” The Guardian sees Facebook as “the new Google.” The write up adds:

The sheer scale of Facebook and the extra ease with which its vast number of users can spread links, applications, bits of videos and snippets of news across all manner of digital platforms is, in one sense, awe-inspiring, but on the other hand raises a whole new set of issues to grapple with. To completely spurn out of hand the incredible reach this platform could offer would simply be madness, at a time when finding a relevant audience for news and other content is the biggest challenge. But to hand over all of this activity wholesale to Facebook suggests that within five minutes there will be another head-scratching session as media executives hunt for the teaspoons from the family silver Google left behind.

My view is that Google has been driven by the spirit of a math club. Facebook is different. There is the math club element in the form of former Xooglers who labor in the Facebook vineyards. But there is a different mix of insight, brilliance, and approach in Facebook. A new Google does not mean that the problems exposed by the Google system will go away. New problems and opportunities will “layer up.” Excitement ahead.

Stephen E Arnold, April 27, 2010

Unsponsored post.

Bing Costs

April 27, 2010

Short honk: “Bing Loses More Money as Microsoft Chases Google” highlighted an aspect of Microsoft’s recent financials I had overlooked. The key passage for me was:

During Microsoft’s fiscal third quarter, which ended March 31, the Online Services Division, or OSD, reported a 12 per cent increase in revenue, which rose to US$566 million on the back of higher advertising revenue. That wasn’t enough to offset a surge in operating expenses during the period. The division’s quarterly loss grew by 73 per cent to $713 million, compared to a loss of $411 million during the same period last year.

The loss is not material in the scope of Microsoft’s revenues from its other business units. The recent Facebook play may pay big dividends. As search shifts from Web site text to other types of information objects, Microsoft may be poised to reverse this reported loss.

Stephen E Arnold, April 27, 2010

Unsponsored post.
