Cognition Rolls Out Semantic Medline

July 30, 2008

Resource Shelf reports that Cognition Technologies has indexed Medline content with its semantic search system. The new service is free, and you can try it yourself at http://www.semanticmedline.com/. Remember that you will be searching abstracts, not the full text of medical documents.

You can read the Resource Shelf story here. The point that jumped out at me was:

[This is] a new free service that enables complex health and life science material to be rapidly and efficiently discovered with greater precision and completeness using natural language processing (NLP) technology.

Cognition Technologies, like Hakia, develops semantic search and content processing systems. You can find out more about the company here. The company also offers a demonstration of its content processing applied to the Wikipedia. You can access that service here.

Stephen Arnold, July 30, 2008

Intel Chases the Cloud a Second Time

July 30, 2008

I wrote about Convera’s present business in vertical search here because I heard that Intel was going to chase clouds again. But before we look at the new deal with Hewlett Packard (the ink company), Yahoo (goodness knows what its business is now), and Intel, let’s go back in time.

Remember in late 2000 when Intel signed a deal with Excalibur? Probably not. Convera was the result of a fusion of Intel’s multimedia unit and Excalibur Technologies. When this deal took form, Intel had 10 data centers.

An Intel executive at the time was quoted in Tabor Communications DSstar saying:

We are creating a global network of Internet data centers with the goal of becoming a leader in world-class Internet application hosting and e-Commerce services, said Mike Aymar, president, Intel Online Services. The opening of a major Internet data center in Virginia is a key step toward this goal. We’ll bring our reliable and innovative approach to hosting customers running mission-critical Internet applications, both in the U.S. and around the world.

Part of the deal included the National Basketball Association. Intel and Convera would stream NBA games. These deals were complex and anticipated the online video boom that is now taking place. The problem was that Intel jumped into this game with Convera technology that was, shall we say, immature. In less than a year, the deal blew up. The NBA terminated its relationship with Convera. By the time the dust and lawsuits settled, the total price tag of this initiative was in the hundreds of millions of dollars.

Outside of a handful of Wall Street analysts and data center experts, few people know that Intel anticipated the cloud, made a play, muffed the bunny, and faded quietly into the background until today.

Intel is back again and demonstrating that it still doesn’t have a knack for picking the right partners. The big news is that Intel, HP, and Yahoo are going to tackle cloud computing. The approach is to allow academic researchers to collaborate with industry on projects. The companies will create an experimental network. In short, risk is reduced and the costs spread across the partners. You can read Thomson Reuters’ summary here.

Will the chip giant’s Cloud Two initiative work?

Sure, anything free will garner attention among academics and corporate researchers. Will the test spin money for the ink vendor and the confused online portal? Probably not.

keystone kops

Rounding up more cloud computing suspects.

But there’s another angle I want to discuss briefly.

Intel pumped money in Endeca, a well-regarded search and content processing company. You can refresh your memory about that $10 million investment here.

Is there a connection between this investment in Endeca and today’s cloud computing announcement from Intel? I believe there is. Intel is making chips with CPU cycles to spare. Few applications saturate the processors. With even more cores on a single die coming, software and applications are lagging far behind the chips’ capabilities.


Funnelback CTO Interview Now Available

July 29, 2008

Dr. David Hawking, the chief technical officer of Funnelback, has joined the search and content processing company full time. Dr. Hawking is well known among the information retrieval community. His students have joined Google and Microsoft Research. Dr. Hawking’s interview with ArnoldIT.com is now available as part of the Search Wizards Speak series at www.arnoldit.com/search-wizards-speak.

Dr. Hawking said that Funnelback, now in version 8, delivers search ranking quality and tunability, geospatial query processing, folksonomy tagging of search results, streamlined set up and configuration, customizable work flows, and a software as a service option. In short, Funnelback is a capable enterprise search solution.

Located in Canberra, Australia, the Funnelback system has a number of high profile clients in Australia and New Zealand. The company also has clients in the United Kingdom and Canada.

Dr. Hawking said,

Funnelback includes an intuitive Web based administration interface for configuration, user interface customization and viewing query reports. No programming skills are required for the majority of configuration tasks, but deeper integrations can be achieved by developing specific interfaces to work with various enterprise application such as content management systems or portal applications.

The next release of Funnelback will appear in the first half of 2009. The company has plans to expand into other countries, but Dr. Hawking would not reveal specific plans for new offices. He hinted that Funnelback is working on solutions for vertical markets. The company already has a vertical implementation for one of Australia’s law enforcement agencies. That project has been well received by the users.

You can read the full text of the interview here. Information about the company is here.

Stephen Arnold, July 29, 2008

Google’s Publishing Baby Step

July 29, 2008

I have written about Knol, Google’s publishing technology, in Google Version 2.0. Outsell (a consulting firm) recycled some of my Google publishing research in the summer of 2007. I will have an update available from my UK publisher, Infonortics, Ltd., in Tetbury, Glou., in September 2008. If you want to read my take on Google’s publishing technology, you can snag a copy of Google Version 2.0 here. In my analysis, Knol is a publishing baby step, but it is an important one because it delivers two payoffs: [a] content to monetize and [b] inputs for Google’s smart software. I explain why Google wants to process quality content, not just Webby dogs and cats, in Google Version 2.0.

You may also want to read Andrew Lih’s “Google Know Wikipedia Comparison Faulty” analysis here. Mr. Lih does a good job of pointing out what Knol is and is not. Particularly useful to those confused about the competition Google faces, Mr. Lih’s identification of Google’s “real competition” is solid. The part of his essay I enjoyed was his “grading” of those who were covering the Knol story. He identifies who did poorly, those who were stuck in the mire of the bell curve, and the informed souls who received a gold star for excellence. I won’t spoil your fun, but you will find at the back of the class some names with which you will be familiar.

A happy quack to Mr. Lih.

Stephen Arnold, July 29, 2008

Financial Close Dance: Connotate and High Step Rumba

July 29, 2008

BobsGuide.com revealed on July 23, 2008, that High Step Capital (yep, a money outfit) is using Connotate’s agent technology in a clever new way. In the financial world, clever means finding a way to make money in today’s unsettled market.

BobsGuide.com reports:

By creating a group of agents that monitor real-time changes to information on multiple Web sites-for example, for monitoring the prices of electronic products from competing companies-and aggregating the results, a user can create a real-time feed of prices or other information… That feed is loaded into a database hosted by Connotate, which provides a Web portal for Jones to view the data online or to download the information into spreadsheets…
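The monitor-and-aggregate pattern in that passage can be sketched in a few lines. This is a minimal illustration, not Connotate's implementation: the `PriceFeed` class, its method names, and the shop names are all hypothetical, and the "agents" are reduced to plain calls that report what they saw. The only idea shown is that a feed records an update when a watched value changes and ignores repeat sightings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PriceUpdate:
    """One record in the aggregated feed (hypothetical structure)."""
    source: str
    product: str
    price: float


class PriceFeed:
    """Collects observations from many agents and keeps only the changes."""

    def __init__(self) -> None:
        self._last = {}    # (source, product) -> last price seen
        self.updates = []  # change records, in arrival order

    def observe(self, source: str, product: str, price: float) -> bool:
        """Record an agent's observation; return True only if the price changed."""
        key = (source, product)
        if self._last.get(key) == price:
            return False  # nothing new, so nothing is added to the feed
        self._last[key] = price
        self.updates.append(PriceUpdate(source, product, price))
        return True

    def lowest(self, product: str) -> Optional[PriceUpdate]:
        """Return the cheapest current offer for a product, if any source has one."""
        offers = [(p, s) for (s, prod), p in self._last.items() if prod == product]
        if not offers:
            return None
        price, source = min(offers)
        return PriceUpdate(source, product, price)


feed = PriceFeed()
feed.observe("shop-a", "camera", 199.0)
feed.observe("shop-a", "camera", 199.0)  # repeat sighting, ignored
feed.observe("shop-b", "camera", 189.0)
```

A real system would add the fetching, parsing, and scheduling that this sketch leaves out; the spreadsheet download mentioned in the quote would simply be an export of `updates`.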

You can read the full story “High Step Adds Connotate Data to Models” here. For more information about Connotate, you can visit the company’s Web site here, or you can buy a copy of my April 2008 study Beyond Search here.  Connotate competes with Relegence, a unit of America Online which is owned by Time Warner. You can read about Relegence here.

Why is this important?

Services that merge internal and external data are one of the Web 2.0 technologies that work and deliver fungible payoffs. Some Web 2.0 functions are nifty but tough to tie to a financial benefit.

Stephen Arnold, July 29, 2008

Opinion: Cuil, Google, and Microsoft

July 28, 2008

Before I go out and feed the geese on my pond in Harrods Creek, I wanted to offer several unsolicited comments about Microsoft, Cuil, and search.

First, now that Microsoft has its own search technologies, Fast Search & Transfer’s search technologies for the enterprise and the Web, and Powerset’s search technologies, does Cuil look cool?

This is a tough question, and I don’t think that Microsoft had much knowledge of the Cuil team and its work in search. My research suggests that work on Cuil began for real in 2007. The work profiles of the Cuil team are decidedly non-Microsoft. My thought is that Microsoft did not have a competitive profile about this company. My working hypothesis is that this search system struck Microsoft like a bolt from the blue.

Second, will Microsoft buy Cuil? This is a question that will probably garner some discussion at Microsoft. The Linux “heads” at Microsoft will probably resonate with the idea. Cuil incorporates some of the “beyond” Google technology that one can find at Exalead and now at Cuil. The architecture of these “beyond” Google operations might be quite useful to Microsoft. On the other hand, Microsoft is charging forward so hard with its own approach to massively parallel distributed systems that the “beyond” Google engineering would be a tough pill to swallow.

Third, will Cuil get traction? The answer is yes. My hypothesis is that the folks who flock to Cuil will be Google users, but the real impact of Cuil may well be taking orphaned or disaffected users from Ask.com, Live.com, and Yahoo.com search.

The short term impact on Google may be significant for several reasons:

  1. Cuil has poked a finger in Google’s eye with its user tracking policy. Simply stated, Cuil won’t build user and usage profiles that tie to an individual in a stateful session or to an individual assigned to a fine grained group of clusters in a stateless session. See my July August KMWorld feature for more about the data model of this type of tracking.
  2. Cuil hit Google with its larger index of 120 billion Web pages processed to Google’s 30 to 40 billion pages. Keep in mind that size doesn’t matter, but it is a public relations hook that could snare Googzilla around the ankles.
  3. Cuil includes bells and whistles that have not been released on the public Google system. For example, there are snazzier results displays, insets for suggested searches, and tabs to allow slicing results. Google has these features, but the GOOG keeps them under wraps. Right now, Cuil looks cooler (pun intended). The Cuil search page is black which even says “green”. Clever.

Google now has to sit quietly and watch Xooglers implement features that Google has had in the can for years. Interesting day for both Microsoft (Should we buy Cuil too?) and Google (What’s the next step for the Xooglers’ service?).

Stephen Arnold, July 28, 2008

Cuil Your Jets: Take Offs Are Easier than Landings

July 28, 2008

Digging through the rose petals is tough going. Cuil, it seems, has charmed those interested in Web search.

Balanced Comments Are Here

Among the more balanced commentaries are:

  • Search maven Danny Sullivan here who says, “Can any start-up search engine “be the next Google?” Many have wondered this, and today’s launch of Cuil (pronounced “cool’) may provide the best test case since Google itself overtook more established search engines.”
  • Michael Arrington, TechCrunch here, says, “Cuil does a good job of guessing what we’ll want next and presents that in the top right widget. That means Cuil saves time for more research based queries.”
  • David Utter, WebProNews here, says, “The real test for Cuil when it comes back will be how well it handles the niche queries people make all the time, expecting a solid result from very few words.”

Now, I don’t want to pull harder on this cool search stallion’s bit. I do want to offer several observations:

First, the size of an index doesn’t matter. If I am looking for the antidote to save a child’s life, the system need only return one result–the name of the antidote. The “size matters” problem surfaced decades ago when ABI / INFORM, a for fee database with a typical annual index growth of about 50,000 new records, found itself challenged as “too small” by a company called Management Contents. Predicasts jumped on the bandwagon. The number of entries in the index does not correlate to satisfying a user’s query. The size of the index provides very useful data which can be used to enhance a search result, but size in and of itself does not translate to “good results”. For example, on Cuil, run the query beyond search. You will see this Web log’s logo mapped to another site. This means nothing to me, but it shows that one must look beyond the excitement of a new system to explore.

Second, the key to consumer search engines is dealing with the average user who types 2.3 terms per query. The test query spears on Cuil returns the expected britney spears hits. Enter the term britny, and you get very similar results, but the graphics rotate, plucking an image from one site and mashing it into the “hit”. Enter the query “brittany” and you get zero hits for Ms. Spears, super star. The fuzzy spelling logic and the synonym expansion are not yet tailored for the average user, who can spell Ms. Spears more than 400 ways, if I recall a comment made by Googler Jeff Dean several years ago.
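To make the fuzzy spelling point concrete, here is a minimal sketch using Python's standard `difflib` module. It is not how Cuil or Google handles misspellings; the tiny `VOCAB` list, the `normalize` and `expand_query` helpers, and the 0.7 cutoff are assumptions chosen for illustration. A production engine would draw its vocabulary and thresholds from query logs.

```python
from difflib import get_close_matches

# Hypothetical vocabulary of known-good terms; an assumption for this sketch.
VOCAB = ["britney", "spears", "search"]


def normalize(term: str, cutoff: float = 0.7) -> str:
    """Map a possibly misspelled term to its closest vocabulary entry.

    Falls back to the lowercased term when nothing is similar enough.
    """
    term = term.lower()
    matches = get_close_matches(term, VOCAB, n=1, cutoff=cutoff)
    return matches[0] if matches else term


def expand_query(query: str) -> str:
    """Normalize every term in a short consumer-style query."""
    return " ".join(normalize(t) for t in query.split())


print(expand_query("britny spears"))
print(expand_query("brittany"))
```

With this cutoff both "britny" and "brittany" resolve to "britney". That also shows the trade-off: a user who really wants the region in France gets "corrected" too, which is why real systems weigh spelling similarity against what users actually click.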

Third, I turned on safe search and ran my “brittany” query. Here’s what I saw in the inset that allows me to search by category.

[Image: cuil playboy]

I like Playboy bunnies, and we have dozens of them hanging around the computer lab here in Harrods Creek. However, in some of the libraries in the Commonwealth of Kentucky, a safe search function that returns a hutch of Playboy bunnies can create some excitement.

Fourth, it is not clear to me what learnings from WebFountain, Dr. Patterson’s Google patent documents, and Mr. Monier’s learnings from the AltaVista.com/eBay/Google experiences have or have not found their way into this service. Search is a pretty difficult challenge as Microsoft’s struggles attest over the last 12 or 13 years. My hunch is that there are some facets to the intellectual property within Cuil that warrant a lawyer with a magnifying glass.

Net Net

I applaud the Cuil team for getting a service up and running. Powerset was slow out of the starting blocks and wrangled a pay day with a modest demo. Cuil, in a somewhat snappier way, launched a full service. Over the coming weeks and months, the issues of precision, recall, relevance, synonym expansion, and filters that surprise will be resolved.

I don’t want to suggest this is a Google killer for several reasons. First, I learned from a respected computer scientist that a Gmail address set up for a test and not released seemed to have been snagged in a Cuil crawl. Subsequent tests showed the offending email address was no longer in the index. My thought was that the distance between Cuil and Google might not be so great. Most of the Cuil team are Xooglers, and some share the Stanford computer science old school spirit. Therefore, I want to see exactly how close or far Cuil and Google are.

Second, the issue of using images from one site to illustrate a false drop on another site must be resolved. I don’t care, but some may. Here’s an example of this error for the query beyond search.

[Image: arnoldit art another site hit]

If this happens to a person more litigious than I, Cuil will be spending some of its remaining $33 million in venture funds to battle an aggrieved media giant. Google has learned how testy Viacom is over snippets of Beavis and Butt-head. Cuil may enjoy that experience as well.

To close, exercise Cuil. I will continue to monitor the service. I plan to reread Dr. Patterson’s Google patent documents this week as well. If you want to know what she invented when working for the GOOG, you can find an eight or nine page discussion of the inventions in Google Version 2.0. A general “drill down” notion is touched upon in these documents in my opinion.

And, keep in mind, the premise of The Google Legacy is that Google will be with us for a long time. Cuil is just one example of the Google “legacy”; that is, Xooglers who build on Google’s approach to cloud based computing services.

Stephen Arnold, July 28, 2008

Cool Discussion of Cuil

July 28, 2008

Xooglers Anna Patterson and Louth man Tom Costello (husband and wife brains behind Xift which sold to AltaVista.com and Recall), Louis Monier (AltaVista.com top wizard), and Russell Power (worked on TeraGoogle) teamed up to create a next-generation Google. Michael Liedtke’s New Search Engine Claims Three Times the Grunt of Google is worth reading. You can find one instance of the write up here.

TechCrunch wrote about Cuil in 2007. You can read that essay here. The key points in the TechCrunch write up were that Cuil can index Web content faster and more economically than Google. Venture funding was $33 million, which is a healthy chunk for search technology.

Mr. Liedtke pulls together some useful information. For me, the most interesting points in the write up were:

  • The Cuil index contains 120 billion Web pages.
  • Cuil is derived from an Irish name.
  • The search results will appear in a “magazine like format”, not a laundry list of results.
  • Google has looked the same for the last 10 years and will look the same in the next 10 years.

Although Dr. Patterson left Google in 2006, she authored several patent documents related to search. I profiled these documents in Google Version 2.0, and these provide some insight into how Dr. Patterson thinks about extracting meaning from content. The patent documents are available from the USPTO, and she is listed as the sole inventor on the patent applications.

Observations

If Cuil’s index contains 120 billion Web pages, it would be three times larger than Google’s Web page index of 40 billion Web pages and six times larger than Live.com 20 billion page index. Google has indexed structured data which makes the index far larger, but Google does not reveal the total number of items in its index. The “my fish was this big” approach to search is essentially meaningless without context.

The AltaVista.com connection via Louis Monier is important. A number of AltaVista.com engineers have not joined Google. One company–Exalead–has plumbing that meets or exceeds Google’s infrastructure. My thought is that Cuil will include innovations that Google cannot easily retrofit. Therefore, if Exalead has a killer infrastructure, it is likely that Cuil will have one too. As Mr. Liedtke’s article points out, Google has not changed search in a decade. This observation comes from Dr. Patterson and may have some truth in it. But as Google grows larger, radical change becomes more difficult no matter how many lava lamps there are in the Mountain View office.

The experience Dr. Costello gained in the Web Fountain work for IBM suggests that text analytics will get more than casual treatment. Analytics may play a far larger role in Cuil than it did in either Recall, Xift, or Google for that matter.

The knowledge DNA of the Cuil founders is important. There’s Stanford University, the University of Washington, and AltaVista.com. I make quick judgments about new search technology by looking for this type of knowledge fingerprint.

Other links you may find useful:

  • Cuil bios are here.
  • Independent Ireland write up about the company is here.
  • A run down of Xooglers who jumped ship is here and here.
  • A brief description of Xift is here.
  • Recall info is here. Scroll down to this headline “Recall Search through Past”
  • WebFountain architecture info is here. You have to download the section in which you have interest.

With $33 million in venture funding, it’s tough to determine if Cuil will compete with Google or sell out. This company is on my watch list. If you have information to share about Cuil, please, post it in the comments section.

Stephen Arnold, July 28, 2008

Amazon: Server to Server Chattiness

July 27, 2008

In general, observers are pleased with Amazon’s explanation about the recent outage. You can read what Amazon offered as the “inside scoop” here. Center Networks has a useful wrap up plus links to the company’s earlier comments about the Amazon outage. For me, the most interesting point in the Center Networks’ write up was its gentleness. The same thought wove itself through Profy.com’s take on the issue, reminding me that Amazon is offering a low cost service. You can read Profy’s view here.

Message passing in distributed, parallelized environments is an issue. Too many messages and the system chokes and simply quits working. Anyone remember the nCube’s problems? Too few messages, and the massively parallel system becomes like a sleep over for your daughter’s seven middle school chums.

Message passing is an issue in the older Microsoft data center architectures, which I wrote about in a series of three posts for this Web log. To solve the problem in 2006, if I interpret the Microsoft diagram I included in my write up, Microsoft “threw hardware at the problem”. You can read this essay here. Redmond then implemented a variety of exotic and expensive mechanisms to keep servers in sync, processes marching like the Duke University drill team, and SQL Server happily insulated from the choking hands of Microsoft’s own code ninjas.

Is there a better or at least less troublesome way to accomplish messaging? Yes, but these techniques require specialized functions in the operating system, changes to who sends what messages and how the “master” in a procedure talks to “slaves”, and a rethink about how to keep message traffic from converging on a single “master”. Chattiness does not do justice to the technical complexity of the problem.

You may want to navigate to Google.com and run a query on these terms: “Google File System”, “Chubby”, and “Jeffrey Dean”. The first term refers to add ons to Linux that Google implemented nine years ago. The second refers to one piece of Google plumbing for file locking and unlocking with references to the BigTable data management system. And “Dr. Dean” is a former AltaVista.com wizard who has become one of the people given the job of explaining how Google tackled messaging and related problems since 1999. I summarize most of the broad ideas in The Google Legacy (Infonortics 2005), but reading the primary source information can be illuminating. Google’s solution is not without its weaknesses, of course. “Chattiness” is not one of these vulnerabilities.

In terms of total operations cost at Amazon, the AWS services get a tiny slice of the action. Amazon is using its infrastructure to squeeze value from its plumbing. I anticipate other issues going forward, and Amazon will address them. Over time, AWS will resolve “chattiness”. Perhaps the next problem will be minimized and repositioned as “an under cooked failed soufflé”? Wordsmithing is not engineering in my opinion. Agree? Disagree? Help me understand Amazon’s explanation of “offline”.
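For readers who want a feel for why traffic converging on a single “master” hurts, here is a back-of-the-envelope sketch. It is not Amazon’s, Microsoft’s, or Google’s design; the function names and the fan-out figure are assumptions for illustration. It compares how many messages one node must absorb per reporting round when every worker talks to one master versus when reports pass through a tree of intermediate coordinators.

```python
def single_master_load(workers: int) -> int:
    """Every worker reports straight to one master, which absorbs them all."""
    return workers


def tree_load(workers: int, fanout: int) -> int:
    """In a reporting tree, no coordinator hears from more than `fanout` children."""
    return min(workers, fanout)


def tree_depth(workers: int, fanout: int) -> int:
    """Number of tree levels needed so that the leaves can cover all workers."""
    depth = 0
    capacity = 1
    while capacity < workers:
        capacity *= fanout
        depth += 1
    return depth


workers, fanout = 1000, 10
print(single_master_load(workers))  # messages hitting the lone master
print(tree_load(workers, fanout))   # worst-case messages at any tree node
print(tree_depth(workers, fanout))  # extra hops paid for the lighter load
```

The trade is plain in the numbers: the tree cuts the load on any one machine by two orders of magnitude in this example, but each report now crosses several hops, and every hop is another place for a delay or a failure.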

Stephen Arnold, July 28, 2008

AOL: What about Relegence

July 27, 2008

I am no longer surprised by the deep cloud of unknowing that reduces visibility in pundits’ reports. Someone (who shall remain nameless) sent me a report from a well known search “expert” (who shall also remain nameless) reporting on the changes at America Online.

AOL fumbled some of their opportunities in the last three years. Now the company is the darling of the analysts’ eye because the company is dumping services. I have a partner who is a former Naval officer. He remarked, “When you jettison stuff, you want to try and stay afloat.” Not much more deep thinking is required to understand why AOL Pictures, BlueString, and an online back up service called Xdrive are history. None has much traction compared to Flickr, YouTube.com, and literally dozens of cloud-based back up services. Here’s a link to a reasonably good summary of the received wisdom about AOL’s actions.

The report I mentioned earlier talked about solid AOL services; namely, email, instant messaging, and chat rooms. The consultant generated a laundry list of other AOL services, which you can find here without paying a consultant to prepare a custom study for you.

One interesting service, which is named “Money”, is reasonably useful. In fact, for some types of company research, one can argue that it is as good as either Yahoo Finance or Google’s Yahoo Finance clone, aptly named Google Finance. I heard that one of the female engineers who worked on Yahoo Finance jumped to Google to work on Google Finance, but that may be Silicon Valley chatter. There are some similarities.

AOL has muffed the bunny on its promotion of its service. First, navigate to http://money.aol.com. There are a number of point and click options. These range from changing the page layout to scanning through categories of information, blogs, headlines, and videos.

In November 2006, AOL acquired Relegence Corporation, originally started in Israel. This company–essentially unknown and untracked in the search and content processing world of New York punditry–developed technology that monitors, formats, and displays real time content streams. The company’s approach presaged the Connotate system (now the object of much love from Goldman Sachs) and Exegy (in deep mind meld with the financial and military intelligence sectors) in 1999.

toolbar

This is the Relegence tool bar. Users have one click access to news and other features.

In 2006, I thought that The Relegence Corporation was a very solid real-time financial services news engine, providing market and business intelligence to global buy-side and sell-side institutions. The company had an R&D center in Israel. In 2004, Relegence hooked up with search vendor X1, but that tie up has dropped off my radar. Relegence’s automated infrastructure aggregated relevant structured and unstructured information from internal resources and external research, including blogs, Web sites, email and thousands of third-party sources in English and other languages. Relegence could deliver customized news in real-time to any communications device. When I looked at Relegence at the time of the AOL deal, I thought that Relegence was in a position to give InfoDesk, founded by a former Bell Labs wizard Sterling Stites, a run for the money.

