Microsoft Powerset Could Unseat Google

July 8, 2008

You may find this essay stimulating. I did. Rebecca Sato’s essay “Microsoft Acquires Powerset: Why a Semantic Web Will Be Smarter, Faster & All-Around Better” is remarkable. Please navigate to The Daily Galaxy and get the inside scoop on the future of the Web. For example, Ms. Sato writes:

Microsoft’s acquisition of Powerset signals a the building of a future when the entire world will likely have access to virtual “software agents” who will “roam” across the Web, making our travel arrangements, doctor’s appointments and basically taking care of all the day-to-day hassles for humankind. It’s a great vision, but it will never be achieved with today’s current Internet.

My take on Ms. Sato’s thesis is that today, users must struggle with text documents that require the user to figure out what’s important. The future is smarter software, richer indexing, and more dimensionality for the information. Ms. Sato acknowledges that Powerset-type functions are in their early stages. I agree.

Let me offer two observations:

Smart software can be resource intensive. As a result, semantic systems may have to start small and grow as the computing resources become available. To me, this means that semantic systems may be confined to modest roles, often as utilities or special purpose operations. If this happens, semantic systems may take years to deliver on their potential.
Semantic technology may find itself playing catch up to search systems that use smart shortcuts. For example, user tagging may provide acceptable payoffs without the complexity and cost of semantic systems. If this happens, the search revolution may be people power, not smart software.

Agree? Disagree? Let me know.

Stephen Arnold, July 8, 2008

Written by Stephen E. Arnold · Filed Under News, Search, Semantic, Text processing | 1 Comment

SurfRay AB Update

July 6, 2008

In the first two editions of Enterprise Search Report, I profiled Mondosoft, a Microsoft-centric search system. I gave it a favorable review. As with most Microsoft-centric products, performance can become an issue unless the system is properly resourced. By the time I started work on the third edition of ESR in 2006, I had heard rumors of some changes underway at the firm. By late 2007, Mondosoft became part of SurfRay, a Danish search conglomerate. I found the search system implemented for the Vatican quite interesting. Hit boosting and multi-lingual support added zest to what could have been a sinfully bad (no pun intended) search experience. You can try it here.

In 2004, Mondosoft caught my attention because it was one of the first search vendors to offer analytics for licensees. Mondosoft, when deployed in a SharePoint environment, brought much needed usage data into the SharePoint picture. Instead of flying blind, Mondosoft gave the system administrator useful information about user actions. With Mondosoft’s analytics, SharePoint sites could be tuned to improve the user’s experience. Microsoft talked about SharePoint user experience; Mondosoft delivered technology that addressed user experience.

Mondosoft then acquired Ontolica, a company that made better use of SharePoint metadata and generated other useful tags. With Ontolica 3.2 installed and properly resourced, a SharePoint administrator could provide a useful set of hot links related to the user’s query. Microsoft delivered a blunt instrument; Ontolica provided a precision tool.

SurfRay’s product line includes an advanced, multi-lingual search engine suite with three components: [a] MondoSearch, [b] BehaviorTracking, and [c] InformationManager; SurfRay’s Speed Index search and retrieval system; and Ontolica Search for SharePoint, providing business intelligence on information creation, search, retrieval and use. SurfRay also owns technology that can speed up searches of traditional relational database tables. In addition, SurfRay provides consulting services to its licensees. Plus, the company offers SurfRay XP search for Xerox’s multifunction document systems.

SurfRay/Mondosoft customers include Bosch, Burger King Corporation, Coleman, Hilton Hotels, Honeywell Process Solutions, Microsoft, Overnight Transportation, People’s Bank, Shell Oil, Siemens, SimCorp, The Swiss Army, TDC, The Vatican Holy See and United Technologies. SurfRay’s CEO and founder is Martin Veise. The president of the company is Steffen Saxil.

SurfRay has offices in New York, Stockholm, Bangkok and Copenhagen. You can learn more about the company here.

Stephen Arnold, July 6, 2008

Written by Stephen E. Arnold · Filed Under Enterprise, Online (general), Search, Semantic, Text processing | Comments Off on SurfRay AB Update

Google: Another Angle on Question Answering

July 5, 2008

On July 3, 2008, the USPTO published US20080160490. Applications do not equal real products and services. Many people remind me that patent applications are the busy work of misguided engineers, flights of fancy, or bar bets among engineers to see who can fool the USPTO. The application was filed on March 22, 2007, and published about 15 months later, pretty snappy for this fine US government entity. The paperwork was herded along by Google’s legal eagles at Fish & Richardson, a law firm operating from the warm and sunny Minneapolis, Minnesota.

If you are curious, you may want to take a look at “Seeking Answers to Questions”. The buzz about Powerset’s marvelous semantic search engine has many folks twittering. You may want to visit Hakia.com here to see an all-software approach. Then check out Yahoo’s help system here which seems to share some similarities with what Google describes. You can find other question answering systems, including InQuira’s implementation for Honda, Semantra’s system, et al.

Here’s what Google says its invention does:

A computer-implemented method of seeking answers to questions comprises receiving one or more questions from users seeking answers, maintaining an inventory of pending questions to be answered, and transmitting a question from the pending question inventory to a network location determined to be topically relevant to the transmitted question based on the content of the network location.

Pretty mundane, right? If so, then why are two Google wizards–Udi Manber and Benedict Gomes–wasting their time and Google’s money with this approach to question answering?

The social aspects of this invention are interesting. The human inputs hook into the Google infrastructure. There are hints of Google’s method for figuring out what’s good and what’s less good expressed as “knowledgeable users”, Google’s desire to build knowledge bases as it does with phonemes, and Google’s interest in hooking traffic into Web sites for the purpose of selling advertising. The notion of experts collaborating with experts struck me as a broader implementation of the types of operations one can achieve with appropriate resources via Tacit Software’s system for an enterprise.

This invention caught my attention because it expresses the meta-nature of some of Google’s other recent innovations. Google is chugging to knit existing intelligent sub systems into integrated fabrics of functionality.

I find this invention amusing because, as Microsoft pursues Google with Xerox PARC technology that iterates down to meaning via machine processes, Google is exploring how to integrate human smarts, Google fancy math, and finer-grained advertising opportunities for advertisers. Judge for yourself if this expresses a holistic approach to information. The patent application is only 13 pages of crystal clear Google legalese and engineering explication. Agree? Disagree? Let me know.

Stephen Arnold, July 5, 2008

Written by Stephen E. Arnold · Filed Under Database, Google, News, Search, Semantic, Technology | Comments Off on Google: Another Angle on Question Answering

Yahoo’s Semantic Search Still Available

July 3, 2008

In the firestorm of publicity burning through blogland, Yahoo’s semantic search system has been marginalized. I admit, the URL is not the easiest to remember: http://www.yr-bcn.es/demos/microsearch/. The moniker Microsearch seems to be intended to tell the astute user that Yahoo processes microformat information. A microformat is a Web-based data formatting approach that seeks to re-use existing content as metadata.

The site is labeled a demonstration, and the Yahoo logo is visible in a funereal black, which I quite like. The service is called Microsearch. The system supports RDFa marked-up pages plus some other semantic formats. Yahoo says:

Microsearch is a richer search experience combining traditional search results with metadata extracted from web [sic] pages. At the moment your Yahoo! Search is enriched in three ways: [a] by showing ‘smart’ snippets that summarize the metadata inside the page and allow to take action without actually visiting the page; [b] by showing map and timeline views that aggregate metadata from various pages, [c] by showing pages related to the current result.

I had to dig a bit to find the explicit connection with the Semantic Web, but the site offers a version of semantic search. Yahoo includes a link to the Semantic Web page at the World Wide Web Consortium.

Let’s look at the system. Yahoo provides some suggested queries, but I prefer my own.

My first query was “enterprise search”. The system returned the following result page:

The map was visually arresting, but it was irrelevant to the query and the result set. I looked at the results and was surprised to find Microsoft was the number two result. The other results were okay. The same query on Google returned more Microsoft links. My conclusion was that the “semantic” feature on Yahoo worked about as well as regular Google. The other conclusion I drew was that Microsoft is working hard to come up at the top of the results list for the word pair “enterprise search”. Too bad I don’t think of Microsoft and enterprise search as sector leaders.

My second query was for the phrase “Michael Lynch Autonomy”. Here’s what Microsearch displayed:

For this query, the map did not render. I assumed that the system would show me the location of Autonomy’s headquarters in the United Kingdom. Sigh. Microsearch is at version 1.4 on July 3, 2008, and whizzy features should be working. The results were stale. The top ranked hit was a 2006 interview. My recollection is that the Financial Times ran an essay by Mr. Lynch a few days ago. Alas, the system seems unable to factor time into its results ranking. News stories often carry time and date data, and News XML includes explicit tags for these data. I ran the same query on standard Google. Google returned the results set more quickly than Yahoo. Google’s results were poor. The first hit was to someone other than Autonomy’s Mike Lynch. The other hits were more stale than Yahoo’s. Autonomy may want to emulate Microsoft’s search engine optimization push.
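One way a ranking function could factor time into its results is to discount each hit's relevance score by the age of the document. The sketch below is an illustration only, not a description of Yahoo's or Google's method; the half_life_days parameter, the relevance scores, and the publication dates in the sample hits are all made up for the example.

```python
from datetime import date

def recency_score(relevance, published, today, half_life_days=30.0):
    """Halve a hit's relevance for every half_life_days of age."""
    age_days = max((today - published).days, 0)
    return relevance * 0.5 ** (age_days / half_life_days)

def rank(hits, today):
    """Sort (title, relevance, published) tuples by time-discounted score."""
    return sorted(hits, key=lambda h: recency_score(h[1], h[2], today), reverse=True)

# Hypothetical hits: an older page with a higher raw score and a recent one.
hits = [
    ("2006 interview", 0.9, date(2006, 5, 1)),
    ("Financial Times essay", 0.7, date(2008, 6, 30)),
]
ranked = rank(hits, today=date(2008, 7, 3))
```

With these invented numbers the recent essay outranks the 2006 interview even though its raw relevance is lower. A shorter half-life suits news; a longer one suits reference material.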

Observations

The semantic features of Microsearch did not appear front and center. The mapping function did not work. Yahoo performed as well as market leader Google. To be fair, Google’s results were not too good and Yahoo hit that benchmark.

Agree? Disagree? Let me know.

Stephen Arnold, July 3, 2008

Written by Stephen E. Arnold · Filed Under News, Online (general), Semantic, Text processing, Yahoo | 2 Comments

Answering Questions: Holy Grail or Wholly Frustrating

July 2, 2008

The cat is out of the bag. Microsoft has acquired Powerset for $100 million. You can read the official announcement here. The most important part of the announcement to me was:

We know today that roughly a third of searches don’t get answered on the first search and first click…These problems exist because search engines today primarily match words in a search to words on a webpage [sic]. We can solve these problems by working to understand the intent behind each search and the concepts and meaning embedded in a webpage [sic]. Doing so, we can innovate in the quality of the search results, in the flexibility with which searchers can phrase their queries, and in the search user experience. We will use knowledge extracted from webpages [sic] to improve the result descriptions and provide new tools to help customers search better.

I agree. The problem is that delivering on these results is akin to an archaeologist finding the Holy Grail. In my experience, delivering “answers” and “better results” can be wholly frustrating. Don’t believe me? Just take a look at what happened to AskJeeves.com or any of the other semantic / natural language search systems. Yet doubt is not evident in the dozens of posts about this topic on Techmeme.com this morning.

So, I’m going to offer a different view. I think the same problems will haunt Microsoft as it works to integrate Powerset technology into its various Live.com offerings.

Answering Questions: Circa 1996

In the mid 1990s, Ask Jeeves differentiated itself from the search leaders with its ability to answer questions. Well, some questions. The system worked for this query which I dredged from my files:

What’s the weather in Chicago, Illinois?

At the time, the approach was billed as natural language processing. Google does not maintain comprehensive historical records in its public-facing index. But you can find some information about the original system here or in the Wikipedia entry here.

How did a start up in the mid-1990s answer a user’s questions online? Computers were slow by today’s standards and expensive. Programming was time consuming. There were no tools comparable to Python or Web services. Bandwidth was expensive, and modems chugged along south of 56 kilobits per second, eagerly slowing down in the course of a dial up session.

I have no inside knowledge about AskJeeves.com’s technology, but over the years, I have pieced together some information that allows me to characterize how AskJeeves.com delivered NLP (natural language processing) magic.

Humans.

AskJeeves.com compiled a list of frequently asked questions. Humans wrote answers. Programmers put data into database tables. Scripts parsed the user’s query and matched it to the answers in the tables. The real magic, from my point of view, was that AskJeeves.com updated the weather table, so when the system received my query “What is the weather in Chicago, Illinois?”, the system would pull the data from the weather table and display an answer. The system also showed links to weather sites in case the answer part was incorrect or not what the user wanted.

Over time, AskJeeves.com monitored what questions users asked and added these to the system.

What happened when the system received a query that could not be matched to a canned answer in a data table? The system picked the closest question to what the user asked and displayed that answer. So a question such as “What is the square of aleph zero plus N?” generated an answer along the lines of “The Cubs won the pennant in 1918?” or some equally crazy answer.
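The matching behavior described above can be sketched in a few lines. This is a minimal illustration, not AskJeeves.com's actual code; the table contents, the function names, and the word-overlap scoring used for the "closest question" fallback are all assumptions made for the example.

```python
import re

# Hypothetical canned-answer table; a real system would hold far more rows.
CANNED_ANSWERS = {
    "what is the weather in chicago illinois": "weather table lookup: Chicago, IL",
    "who won the pennant in 1918": "The Cubs won the pennant in 1918.",
}

def normalize(query):
    """Lowercase the query, expand "what's", and strip punctuation."""
    text = query.lower().replace("’", "'").replace("what's", "what is")
    return " ".join(re.findall(r"[a-z0-9]+", text))

def answer(query):
    """Return the exact canned answer, else the answer to the closest question."""
    key = normalize(query)
    if key in CANNED_ANSWERS:
        return CANNED_ANSWERS[key]
    words = set(key.split())
    # Fallback: pick the stored question sharing the most words with the query.
    closest = max(CANNED_ANSWERS, key=lambda q: len(words & set(q.split())))
    return CANNED_ANSWERS[closest]
```

Calling answer("What’s the weather in Chicago, Illinois?") hits the table directly. Calling answer("What is the square of aleph zero plus N?") falls through to the closest stored question and returns the weather answer, because the fallback always returns something, however unrelated.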

AskJeeves.com discovered several facts about its approach to natural language processing:

Humans were expensive. AskJeeves.com burned cash. The company tried to apply its canned question answering system to customer support and ended up part of the Barry Diller empire. Humans can answer questions, but the expense of paying humans to craft templates, create answer tables, and code the system was too high then and remains cash hungry today.
Humans asked questions but did not really mean what they asked. Humans are perverse. A question like “What’s a good bar in San Francisco?” can go off the rails in many ways. For example, what type of bar does the user require? Biker, rock, blue collar? What’s San Francisco? Mission, Sunset, or Powell Street? The problem with answering questions, then, is that humans often have a tough time formulating the right question.
Information changes. The answer today may not be the answer tomorrow. A system, therefore, has to have some way of knowing what the “right” answer is in the moment. As it turns out, the notion of “real time”–that is, accurate information at this moment–is an interesting challenge. In terms of stock prices, the “now quote” costs money. The quote from yesterday’s closing bell is free. Not only is it tricky to keep the index fresh; having current information may also impose additional costs.

This mini-case sheds light on two challenges in natural language processing.

Written by Stephen E. Arnold · Filed Under Enterprise, Feature, Google, Microsoft, Online (general), Semantic, Text processing | Comments Off on Answering Questions: Holy Grail or Wholly Frustrating

ZDNet Says, Powerset Won’t Change the Search Equation

July 2, 2008

Larry Dignan has another good essay, “Microsoft’s Search Plan: It’s about Semantics and Possibly for Naught”. You can read the full essay here. Mr. Dignan believes that Microsoft gets some smart people and maybe a boost. He concludes:

However, Microsoft can reinvent search, but it’s still running up a natural Google monopoly. The analogy here is Windows: Microsoft didn’t have the best operating system on the planet. It just had the best positioned one. In search, the tables are turned in Google’s favor. I don’t see how Powerset will change that equation.

He is correct and diplomatic. My view is that semantic technology may help Microsoft with certain narrow functions. But applying the Powerset technology across the 12 billion Web pages that Microsoft says it has indexed will take some clever engineering. Semantic technology has to operate on the source content and figure out what the heck the user means. Google uses short cuts even though it has some serious semantic brainpower at the Googleplex. It is not just technology; it is plumbing that can be scaled economically and operated with tight cost controls.

Microsoft has money, but I am not sure it has enough time. The Google keeps lumbering forward. Microsoft has to find a way to jump over Google and take the high ground. Catching up won’t work. This is the calculus of Microsoft’s search challenge.

Stephen Arnold, July 2, 2008

Written by Stephen E. Arnold · Filed Under Google, Microsoft, News, Online (general), Search, Semantic | Comments Off on ZDNet Says, Powerset Won’t Change the Search Equation

Search: An Old Taxi with a Faux Cow Hide Interior

July 2, 2008

The last time I was in a big city I hailed a taxi. What a clunker. It smelled of fast food, incense, and hot plastic. One fender was dented and the curb side door would not open. The window would not go down. “She dead,” smiled the driver. The interior of the taxi had a set of blinking lights popular at holiday times. The taxi was a mess, but the faux cow interior was unusual. The lights were working.

Thanks to ABC Australia for the photo. The original is here. http://www.abc.net.au/news/newsitems/200610/s1770336.htm

I have been clicking and scanning the opinions about the Microsoft Powerset deal. Scanning the links at Congoo.com, Megite.com, and Techmeme.com will take a long time. I have been a slacker, clicking at random and looking for some substantive news.

Why is search like a lousy taxi with a useless faux cow hide interior?

My thought for this evening is that search is string matching. The other functions are ways to:

Make it easier for a busy person who does not have time or the desire to read a traditional document; that is, a multi page report.
Show the user what is available and push the user toward that information. The user, who doesn’t want to make this effort, will let the software do the work.
Support a user who is not too swift when it comes to thinking about abstract digital data.
Reduce the time a user spends fumbling for information.
Put training wheels on a worker who forgets work processes the way I forget where I put my automobile keys five minutes ago.

What’s happening is that key word search, string matching, and its kissing cousin Boolean are the lousy taxi. Good enough but not too pleasant.

The cow interior for search consists of these types of enhancements:
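To make key word search and its kissing cousin Boolean concrete, here is a minimal sketch of term matching over an inverted index, with Boolean AND and OR. The sample documents and the function names are invented for the illustration; production engines add stemming, ranking, and a great deal more.

```python
import re
from collections import defaultdict

def build_index(docs):
    """Map each lowercase term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term].add(doc_id)
    return index

def search_and(index, *terms):
    """Documents containing every term."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()

def search_or(index, *terms):
    """Documents containing at least one term."""
    return set().union(*(index.get(t.lower(), set()) for t in terms))

# Hypothetical three-document corpus.
docs = {
    1: "Enterprise search is complicated",
    2: "Semantic search figures out what you mean",
    3: "Enterprise dashboards look like speedometers",
}
index = build_index(docs)
```

Here search_and(index, "enterprise", "search") matches only document 1, while search_or(index, "semantic", "dashboards") matches documents 2 and 3. Nothing in the index knows what any term means, which is the point of the taxi comparison.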

Assisted navigation, a fancy term for Use For and See also references
Clustering, putting like things together in a folder or under a heading
Discovery, an interface that provides an overview of information
Semantic search, a system that figures out what you mean when you type a two word query
Natural language processing, a term that now means answering a question, assuming that someone takes the time to think up a question and type it into a search box
Dashboards, a report that has panels or containers, each containing different information. Some dashboards look like speedometers with text; others can be quite fanciful.
Access to metadata about what person in an organization gets the most email about a specific technical issue. This type of monitoring and analysis is now called social search because surveillance is not politically correct in many circles.

You get the idea.

Possible impacts

Let’s consider the consequences.

First, enterprise search is complicated. Today I spoke with an enthusiastic and young professional. The call touched upon creating a plan for enterprise search. Like most organizations, this outfit has three separate enterprise search systems. None work all that well, so the phone rings. This is a common situation, and I am not too optimistic that enterprise search will work very well when there are competing factions each with a favorite search engine to support. Adding whizzy new functionality adds to the cost and complexity, and I am not convinced users want to do much more than find the needed information and move on to another task.

Written by Stephen E. Arnold · Filed Under Enterprise, Feature, Online (general), Search, Semantic | Comments Off on Search: An Old Taxi with a Faux Cow Hide Interior

Autonomy PR Coup: The Financial Times Essay

July 1, 2008

I gave up on the hard copy of the Financial Times. The Pearson operations wizards could not do reliable daily delivery in rural Kentucky. Fortunately, a colleague in a more civilized part of the world sent me a copy of “Embracing the Friend, Taming the Beast–Web 2.0 in the Enterprise” by Sir Michael Lynch, Ph.D., Founder and Chief Executive Officer of Autonomy. You can try to find the article on the Financial Times’s Web site here. (Note: The FT is working to correct the search sins in its past, but it is a work in progress. You can also read the good summary by Oliver Marks, a ZDNet columnist, here.)

The thrust of Dr. Lynch’s essay is Autonomy, not Web 2.0. The idea is that a host of technologies–social networks, folksonomies, wikis, blogs, Web services–are finding their way into organizations. The key point of Dr. Lynch’s essay was for me:

Next-generation solutions are available to help enterprises organize, manage and regulate user-generated content in a secure, consistent and scalable manner to ensure that employees benefit from instant access to relevant information and that brand integrity is properly protected. Such solutions bring conceptual understanding and an unprecedented level of automation to content management and address liabilities by continually reading entries, spotting problematic content and removing it in real-time. In addition, they can automatically reconcile tags that differ but are close in meaning, or actually provide the level of specificity needed in the enterprise that social methods struggle to deliver.

If you review other write ups about Dr. Lynch’s essay, you will find emphasis placed on the whizzy technologies enumerated above. My view is that the essay sets forth Autonomy’s value proposition for its enterprise software, IDOL or the Integrated Data Operating Layer.

The public relations coup is that this outstanding positioning piece appears about six weeks after the Cazenove report about Autonomy. This is not a public report, but I wrote a short note about the report and its assertions here. As I understood the analysis, Autonomy has to make its business model perform at peak efficiency and create an environment in which Autonomy’s acquisitions can generate strong growth.

The difference between my reading of this excellent essay and Mr. Marks’ interpretation boils down to the difficulty of pinpointing exactly which business some vendors focus their sales efforts on. To illustrate: Mr. Marks refers to Autonomy as a CMS vendor. “CMS” means to me “content management system”. I do not think of Autonomy as a primary provider of content management solutions. The company makes a strong case on its Web site, and in some of its presentations that I have witnessed, that it is a leader in search with strong competency in video search, fraud detection, and eDiscovery. The blurring of what Autonomy “does” makes it difficult for some potential customers to know in which software category to place Autonomy.

My reading is informed by my knowledge of Autonomy’s search technology. The paragraph I highlighted above says to me, “Autonomy can contribute significantly to a world in which user-generated content exists with other types of information.”

So, I say, “PR coup.” I anticipate similar “essays” from Endeca, Microsoft Fast, and possibly Oracle. Allowing Autonomy to define the terms for “search” cedes some market influence to Autonomy. The editor of the Financial Times will be invited to some interesting lunches as vendors and their public relations professionals lobby to place another “essay” in front of Financial Times’s readers.

Stephen Arnold, July 1, 2008

Written by Stephen E. Arnold · Filed Under Enterprise, News, Search, Semantic, Text processing | Comments Off on Autonomy PR Coup: The Financial Times Essay

Business Week: Microsoft’s Search Moves Analyzed

July 1, 2008

Catherine Holahan’s “Microsoft’s Plan B for Search” popped into my news reader this morning. The interesting essay–almost a business school-type write up–appears on the Business Week Web site here. I think the story will appear in some form in the hard copy magazine, but I read the online version this morning, July 1, 2008.

Ms. Holahan looks at the alleged Powerset buy out by Microsoft. The “Plan B” is acquiring additional search technologies in the aftermath of Redmond’s failed Yahoo deal. Her analysis is closely reasoned, so it is difficult for me to summarize the argument.

I did find one point of particular interest; that is:

Rather than focus on creating one consumer-facing site capable of answering any query, like Google has, Microsoft has split its search engine into specific categories—a comparison-shopping engine, Microsoft Live Cashback; a travel search engine, Farecast; and a health-specific search engine, health.live.com. Today, semantic search engines do best with such category-specific searches, which help them to scan a smaller set of pages in detail. Scanning the entire Web in that much detail is difficult to do quickly.

Business Week has done a good job of explaining that Microsoft has a more fractionalized approach to search than Google. Keep in mind, however, that Google is not a single piece of digital cloth. There are different search mechanisms in operation at Google; specifically, the search system used for Google Base differs from the search system used for the search box on Google.com. The Google Search Appliance is also moving in its own direction.

In general, I applaud Ms. Holahan for identifying the different initiatives within Microsoft. She has also identified two other interesting “semantic engines”. The first is Hakia, a company that offers a “Compare Hakia” function here and the Berggi Search for mobile devices other than the BlackBerry or iPhone, which limits the market somewhat. Hakia is working to generate the type of buzz that Powerset’s team found so effortless. She also mentions Expert System, based in Modena, Italy, and founded in 1989. The firm has beefed up its US presence with a new president and a more focused public relations campaign. You can learn more about Expert System here. Expert System has gained some traction for its software components in the mobile search market and has a lower profile in North America than Italy.

Observations

The buzz about semantic search is gaining pitch and volume. My view is that semantic search is not an end in itself; it is a component of a search system. Vendors of semantic search are likely to find warmer welcomes as utilities or refinement functions within larger constellations of information retrieval methods. I guess I don’t buy the notion of “semantic search”.
The key difference between Google and Microsoft boils down to the fact that Google has been working on its infrastructure for a decade. Without a honking big super computer, semantic technology is tough to implement on [a] large amounts of content and [b] content that changes frequently. The well known problem of updating indexes becomes quite challenging.
Fragmented search is not necessarily a bad thing. But when there are many different search systems, costs become a problem quickly. Each system requires its own technology, engineers, and infrastructure. Google–while not homogeneous–avoided the “pushcart full of junk” approach taken by Yahoo. Microsoft, with its purchase of Fast Search & Transfer, may be unconsciously following the Yahoo model. Google’s approach of greater, not less, search homogeneity is the lower cost path. I was surprised Business Week’s B-school analysis of Microsoft’s Plan B ignored cost as a factor. Cost is a very big deal in search, which is the reason search vendors crash and burn. There’s no money to buy fuel.

Agree? Disagree? Use the comments section of this Web log to inform me of my intellectual short comings.

Stephen Arnold, July 1, 2008

Written by Stephen E. Arnold · Filed Under Enterprise, Microsoft, News, Online (general), Search, Semantic, Text processing | Comments Off on Business Week: Microsoft’s Search Moves Analyzed

Powerset Nails Search: A Very Bold Assertion

June 29, 2008

Chris Gaylord, writing for the Christian Science Monitor, updates a May 2008 essay, and emphasizes this point:

Google has been a bit dismissive of semantic search, preferring (for now at least) its quick keyword approach. But this Microsoft news puts a lot of weight – and $100 million – behind the notion that web users want to ask questions to a search engine, not just feed it keyword clues. We have yet to see if Microsoft will keep the Powerset name or, more likely, integrate the technology into its Live Search. That site certainly needs some help. The company has fought a losing battle against Google and Yahoo for years now. Despite its best efforts and even cash incentives, Microsoft has not been able to distinguish itself. Offering a strong semantic search option is a good way to reboot the challenge.

You can read the full document here.

You may recall that the original Ask Jeeves answered questions. Humans figured out answers, put them in a file, and the Ask Jeeves system converted the user’s query to a form that could be matched against the canned answers. The buzz about this surged in the late 1990s, but the cost of the Ask Jeeves approach was high, and in my view, the system did not work very well.

The desire of information retrieval mavens to take a question, any question, and have software answer it makes some folks darned excited. The technology to answer questions continues to advance, and it is possible to get answers from a number of different systems. I have participated in meetings where smart people much more enthusiastic than I argued about the importance of having a system answer a question.

I have written about NLP or natural language processing in the first three editions of the Enterprise Search Report, and I added some information in my April 2008 Beyond Search study for Gilbane Group. Let me offer some observations:

I don’t type queries into search engines. I prefer Boolean statements and point-and-click interfaces that let me “see” what’s in an indexed corpus. My experience is that typing questions is not too popular, nor is the notion of chopping text from an article and letting a search system find “more like this”. I have an installation of the Brainware trigram system, and it is useful–far more useful to me than asking “When did Columbus discover America?” if indeed he did. No NLP system can make much sense of a short query in the context of archaeological research about pre-Kit Columbus visitors to the North American landmass. Nope, that type of question answering will take a bit more lab work.
NLP imposes considerable computational load on both the document indexing subsystem and the query processing subsystem. I saw an impressive set of PowerPoint slides at the 2007 Bear Stearns Internet conference, and I fiddled with the Wikipedia demonstration in 2008. What I have not seen is proof that Powerset’s amalgam of Xerox technology and its proprietary code scales. Without scaling, NLP is likely to remain interesting but of little use to me.
Microsoft, like Yahoo, is now in the business of collecting search technologies. There are two “flavors” of SharePoint search. There is the Fast Search & Transfer technology. There is the whizzy new Live.com search. There is search in XP, in Vista, in SQL Server, and probably other search technologies I don’t know about. Toss in Powerset. What the collection resembles is a yard sale, not an exhibit of Etruscan tomb art at the British Museum. Search has to be more than a yard sale in its design, architecture, and technical framework. The cost of integrating this stuff is more than my checkbook can support.

I appreciate the enthusiasm for Microsoft becoming more competitive. Let us not forget that Google has been doing pretty much the same thing–its one-trick pony show–for a decade. With Google holding two thirds of the market for Web search, Microsoft has some work to do to become number two in search. Google continues to seep into the enterprise via osmosis. Let’s face facts. Customers have to buy from Google. Google is not very good at sales, customer support, or communicating what its gizmos can do. Microsoft is a good sales organization, but it is watching Google challenge its enterprise revenue the way spilled ink spreads on a white tablecloth. And Google has serious semantic technology, which is a widget in a larger data management solution at Google.

Keep cheerleading for Microsoft. Just keep the challenges of NLP in mind. Agree? Disagree? Let me know so I can learn what I don’t now know.

Stephen Arnold, June 30, 2008

Written by Stephen E. Arnold · Filed Under News, Search, Semantic, Text processing | Comments Off on Powerset Nails Search: A Very Bold Assertion

Stephen E. Arnold monitors search, content processing, text mining and related topics from his high-tech nerve center in rural Kentucky. He tries to winnow the goose feathers from the giblets. He works with colleagues worldwide to make this Web log useful to those who want to go "beyond search". Contact him at sa [at] arnoldit.com. His Web site with additional information about search is arnoldit.com.
