More on Search and Multicore Processors

January 20, 2009

I poked around my collection of open source Google papers. I wanted to see if there was more information about bottlenecks in multicore CPUs, but from the Google side of the street. As it happens, Mark D. Hill and Michael R. Marty wrote “Amdahl’s Law in the Multicore Era.” You can find a copy of this document from the Google Labs papers site or by clicking here. Mr. Hill is a professor at a Big 10 university; Mr. Marty is an engineer at Google. Like many of Google’s technical papers, the authors assume a working knowledge of certain computer processes. The paper makes clear that stuffing lots of cores on a piece of silicon does not translate to faster performance. For example, today’s chips are symmetric. The Googlers suggest asymmetric cores might be more useful. But–and there always is a “but”–the software must be adapted to the needs of the more efficient multicore design. If you use symmetric cores and traditional methods of pushing data around, bottlenecks choke performance. There are other issues identified in the paper as well. I will leave it to you to work through the analysis and the list of other problems that require more engineering attention. After working through this paper, I had a question: “If Google is thinking about asymmetric multicore processors, will Google edge into the CPU business?”

The company does motherboards, routers, and cooling devices now. Note: Amdahl’s Law means that one fast gizmo won’t deal with bottlenecks created by other gizmos in the system. More information about Amdahl’s Law is here.
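The arithmetic behind Amdahl’s Law is short enough to sketch. The snippet below is an illustration only: the function name and the 90 percent parallel fraction are assumptions chosen for the example, not figures from the Hill and Marty paper. It uses the standard form of the law, speedup = 1 / ((1 - p) + p / n), where p is the fraction of work that can run in parallel and n is the number of cores.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup predicted by Amdahl's Law for a fixed-size workload."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    # The serial part gains nothing from extra cores.
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Assumed workload: 90 percent of the work can run in parallel.
    for cores in (1, 2, 4, 16, 256):
        print(f"{cores:>4} cores -> {amdahl_speedup(0.9, cores):.2f}x")
```

With a 10 percent serial portion, the speedup can never reach 10x no matter how many cores are added, which is the bottleneck point the note above makes.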

Stephen Arnold, January 19, 2009

IBM Lotus Notes: In the Cloud but Can I Find Emails

January 20, 2009

IBM’s Lotus Notes has been splashed across my trusty feedreader today (January 19, 2009). IBM is either kicking its Lotus Notes sales activity up a notch or the original Ray Ozzie program is undergoing a rebirth. The search function for Lotus Notes has been interesting. At Ziff Communications, we were early adopters of Lotus Notes. In the 1980s it was tough for me to locate a specific email. Last time I tried to locate emails and attachments in a Notes repository, the job was still tough two decades later. There were some specialized searching tools such as Grapevine. I am not sure if this system from Grapevine Technologies is still in business. Today, I can index Notes repositories with third party tools. These work pretty well until I have to dig out a Notes archive, figure out what is what, and then go through the indexing and searching fire drill.

Now Lotus Notes is going to the cloud. You can read the story in the Industry Standard here. According to Lincoln Spector, LotusLive provides a signal for the future of this “platform”. Mr. Spector writes in “IBM Shoots for the Cloud with LotusLive but Notes Pricing Is a Mystery”:

After a year of public beta under the name BlueHouse, LotusLive was officially announced Monday at the Lotusphere conference in Orlando. Users can sign up and start using two LotusLive services, Meetings and Events. Meetings integrates audio and video conferencing and costs $48 to $99 per month depending on the number of participants, or 25 cents per minute. Events is intended to help users manage and host an online conference. In addition to the actual conferencing, it also handles registration and other chores. Meetings costs $99 per month or 30 cents per minute per guest.

Like an infomercial, I am going to say, “Wait. There’s more.” IBM and SAP have teamed to make a “smarter workforce”, according to eWeek. You can read Clint Boulton’s “IBM, SAP Ally on Alloy for Enterprise Collaboration” here. The new Alloy product combines Lotus Notes and SAP’s Business Suite. Now when two elephants with appetites for seven figure license deals team up, the result is going to be fascinating to watch.

The question that I had after reading these announcements was, “Okay, will I be able to search for a particular email and attachment in a way that is a marked improvement over the default string matching?” As the volume of email goes up, finding and managing email is particularly important.

Third party vendors such as Wave Software (here) and Coveo provide solutions. I can turn to Exalead, ISYS Search Software, and several other vendors for solutions as well.

But IBM is moving to the cloud with Lotus Notes, and I am not convinced that either IBM’s or SAP’s search and retrieval system is there yet. Announcements are fine, but when I need to locate an email, I want a low latency system that works. I don’t want to pay more money.

If anyone knows what I am missing with regard to findability, please, contribute a comment in the appropriate section of this Web log.

Stephen Arnold, January 20, 2009

Social Search Spoofing

January 20, 2009

I remain on the fence about social search. I did notice that a hardware and gadget manufacturer engaged in some social spoofing. I read at OverclockersClub.com that “Belkin Issues Apology for Paid Review Incident” here. The idea is pretty simple. Belkin enlisted folks to write positive posts about Belkin’s products. Happily supporting the activity was Amazon.com (owned in part by the world’s smartest man). Why not outsource social information management? I wonder if Google’s new “preferred site” function has any applications within Google? The function exists for a user, so Google itself can boost certain sites up as well. There is no OverclockersClub.com post on this subject. I find it an interesting idea to ponder. And, inside an organization, might not a coven of entitlement children cross post on the internal social system? If I were 22 again, I might think about this as a way to appear higher in the rankings when my superior searched for a project on which I worked. If it is social, it is bound to be spoofed. Agree? Disagree? Let me know. In the meantime, those rave reviews about Belkin products: that is what I call disinformation. Works pretty well until someone discovers the ploy.

Stephen Arnold, January 20, 2009

Kosmix: YAGK (Yet Another Google Killer)

January 20, 2009

Kosmix, like Cuil.com, has some fibrous tendrils that connect to the Google. Not surprisingly, the Kosmix system does not tackle the Google head on. Think of Kosmix as an automated portal for information. When I visit the site, I see what’s new, and I have “hot” topics to click and explore. I have trends. I have videos. In short, I get search without search. There is a search box, and it works reasonably well.

Kosmix splash page. An information portal for the 21st century.

One of the wizards behind Kosmix is Anand Rajaraman, who has considerable visibility in the Silicon Valley technology world. I have followed his Web log posts because he has demonstrated keen insight into the technical activities at Google. In December 2008 he wrote “Kosmix Adds Rocketfuel to Power Voyage of Exploration” here. Several points earned a place in my notes about search; to wit:

  • Kosmix raised an additional $20 million in financing
  • Google=Search+Find. But Kosmix=Explore+Browse
  • The system is based on algorithmic categorization technology.

A feature summary appears on the Kosmix Web log here.

Autonomy and Xerox in Tie Up

January 20, 2009

Big news in the world of content processing and search: Xerox and Autonomy have struck a deal. According to this news story on Forbes.com, “Xerox DocuShare Enters into OEM Agreement with Autonomy”, “The new license will allow Xerox to integrate Autonomy’s Intelligent Data Operating Layer (IDOL) technology into its DocuShare enterprise content management (ECM) platform.” DocuShare is a content management system. The IDOL server will be integrated into the existing DocuShare accounts worldwide.

David Smith, Xerox VP, said:

Content management technologies and services that help organizations save money, better manage content and improve efficiencies are essential in today’s business climate… The integration of Autonomy’s IDOL Server takes DocuShare’s ability to meet the needs of our global customer base to the next level.

Information about DocuShare is here. Information about Autonomy IDOL is here. The content management sector has been hit by Microsoft’s SharePoint push. Other CMS vendors have beefed up their search and content processing services to withstand the “good enough” system available at competitive rates from Microsoft and its resellers. For example, Interwoven has a deal with Vivisimo.

The challenge for Xerox will be to hold on to its existing customers. The opportunity for Autonomy is to make upsells for other Autonomy functionality. If this deal works, perhaps Xerox will step forward and acquire Autonomy. Autonomy has more than 16,000 licensees and a number of lucrative deals. Xerox has dabbled in search and content processing for many years. In fact, Microsoft licensed some of the Xerox search and content processing technology as part of Microsoft’s purchase of Powerset in 2008.

My question is, “What does Xerox know about Xerox PARC technology that prevents Xerox from using its own technology in the DocuShare product?” This raises another question, “Does Microsoft know that Xerox has sidestepped Xerox PARC technology for the Autonomy IDOL system?”

Autonomy has a strong business in litigation support. I wonder if Xerox Litigation Services will avail itself of the Autonomy technology to address some of the shortcomings in the Xerox eDiscovery offerings. I don’t have any color for the financial terms of the deal. If I get some substantive information, I will post it.

Stephen Arnold, January 20, 2009

Exalead: Enterprise Search Vendor Migration

January 20, 2009

Exalead has made strong progress in its efforts to expand its footprint in the U.S. Last year, the company held an online seminar (called Webinars by those under 30) with one of my colleagues, Miles Kehoe, a senior wizard at New Idea Engineering. I just learned that on January 21, 2009, the company will host another online seminar. The topic is Enterprise Search Vendor Migration. With some organizations finding that their 1990s solution is a mismatch for today’s information needs, moving from an incumbent system to a 21st century platform is a priority. Exalead has enlisted the help of Forrester, a large consulting firm, to explore this subject. You can register here. Exalead told me that it will make an audio version of the seminar available as well. You can get more information about Exalead here. One of the Beyond Search team will attend. Sounds like a good way to get information about adapting to today’s information challenges with a minimum of disruption.

Stephen Arnold, January 20, 2009

Search Spending Disambiguation

January 19, 2009

Search and content processing struggles with ambiguity; that is, figuring out the meaning of metaphors, homonyms, and other tricky parts of human utterance. The headline "Report: Search Spending Off 8 Percent in Q4" here is a headline that needs disambiguation. The "search" referenced is the online pay for placement and some other bits of the online advertising business. Beyond Search tracks spending in the enterprise search and content processing sector. Our update of the search and content processing vendors is prompting posts about companies that seem to have fallen by the wayside in enterprise search and enterprise content management. If you want a figure for how the financial crisis is affecting online advertising, the data reported by Greg Sterling are for you. For other slices of the industry, you will have to look elsewhere.

Stephen Arnold, January 19, 2009

Ad Age Advises Yahoo: Startling Strategic Counsel

January 19, 2009

I read this weekend that the top job openings require technical or scientific training. Imagine my surprise when Ad Age, a dead tree publication for the Liberal Arts and Master of Fine Arts crowd, published “Four Ways Yahoo Can Right Itself under New CEO Bartz.” You can read this remarkable article here. Keep in mind that Yahoo is a technology company. The products and services of Yahoo are based on software, systems, and other arcana that delight computer scientists and electrical engineers, leaving the art gallery and soft drink executives lost in a cloud of unknowing. Furthermore, if you have read my other commentaries about Yahoo, you know that the ills of Yahoo are a manifestation of a misalignment of technology and user needs. Fixing Yahoo, therefore, requires more than a public relations blitz and a handful of consultants to change the ad rate price schedule. Some of the Mad Ave ilk will point to the unsold Super Bowl TV spots and assert, “Yahoo needs to snap up these ad slots and make some brand impact.” Right, advertising online services on the Super Bowl will work just as it will for Ask.com’s sponsorship of NASCAR.

Abbey Klaassen, the Ad Age journalist, identifies four strategies for the Yahooligans.

First, Yahoo has to hang on to search. I am a bit fuzzy about what “search” is referenced. Yahoo has a cartload of search systems. My hunch is that Ad Age is thinking about Web search and ignoring the Flickr and Delicious systems, which may have more sizzle than the so so Web search. There’s also mail search, the search on the personal section, and so on. Ad Age is aware of the sports and finance information, but I wonder how much analysis is going on at Ad Age. Anyway, the idea is to keep “search”. Let’s assume that Yahoo is to keep its various forms of search.

Second, the recommendation is for Yahoo to “combine search and display data.” I have to admit that I am not sure what this means. Yahoo lacks a homogeneous system; therefore, combining any cluster of services means normalization, transformation, and manipulation of data. Yahoo had a project underway to rationalize some disparate data, but I am not sure if that is still underway or if it swam on rocks. Advertisers have been asking for access to specific slices of Yahoo demographics across services for a while. Yahoo can’t deliver these types of audiences because of technical issues. Yahoo is a technology company. If a service is not available, there’s a technical reason, not a managerial reason. If the cost of “fixing up” the system is too high, the service will not be available. Yahoo has not been able to focus its resources on certain technical problems because it has a GM problem; that is, GM knows what Toyota and Honda do to make autos. GM can’t change the culture nor can it amass the resources to implement the Toyota and Honda solutions. Yahoo’s engineers are smart. Some go to Google and become happy campers; for example, the Delicious.com founder. It’s not brains; it’s a fundamental technical problem exacerbated by cost and management.

Third, Ad Age wants Yahoo to sell “the Unilevers of the world”. My hunch is that this is a play that will require fixing search and audience data. It is going to be tough to repair the Yahoo-mobile unless one has the right parts. Yahoo is going to require the equivalent of a resto-mod rebuild on the jalopy before the Unilevers pump more cash into the Yahoo advertising opportunity.

Fourth, buy Hulu. Yahoo has been fooling around with video for a while. In case anyone missed the news, Google has managed to make YouTube.com the number two search engine. Hulu.com is also way behind the Googlers in terms of traffic. I grant that Hulu.com is better than Yahoo’s video services. Follow me on this line of reasoning: If Yahoo’s previous attempts to do video have been less than stellar, why will Yahoo handle Hulu.com better? Does anyone remember Finance Vision or the original content production push with Lloyd Braun’s return here? So, I assert that Yahoo’s ability to integrate an acquisition is questionable. Yahoo took years to integrate the Yahoo photo site into Flickr. Let’s assume that Yahoo does buy Hulu. Can Yahoo contribute to the service? At this time, whatever management expertise Yahoo has will be stretched trying to deal with the existing Yahoo technology and financial problems.

In short, I find the Ad Age counsel pretty interesting. It’s not wrong as Mad Ave thinking goes; it’s just from another dimension. I will stick with the reality of the goose pond in Harrod’s Creek, Kentucky.

Stephen Arnold, January 19, 2009

Etymon: Maybe Another Lost Search Vendor

January 19, 2009

Etymon Systems Inc. was founded in 1998. The company set out to apply information systems research to solve problems through innovative software and consulting. The company’s name means “the source word of a given word.” In 2005, the company alerted me to its text retrieval systems: Amberfish and Isearch.

At that time, I learned that:

Amberfish was general purpose text retrieval software, developed at Etymon by Nassib Nassar and distributed as open source software under the terms of version 2 of the GNU General Public License (GPL). Its distinguishing features are indexing/search of semi-structured text (i.e. both free text and multiply nested fields), built-in support for XML documents using the Xerces library, structured queries allowing generalized field/tag paths, hierarchical result sets (XML only), automatic searching across multiple databases (allowing modular indexing), TREC format results, efficient indexing, and relatively low memory requirements during indexing (and the ability to index documents larger than available memory). Z39.50 support was available. Other features included support for Boolean queries, right truncation, phrase searching, relevance ranking, support for multiple documents per file, incremental indexing, and easy integration with other UNIX tools.
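Two of the terms in that feature list, Boolean queries and right truncation, are easier to picture with a toy example. The sketch below is not Amberfish code and does not reflect how Amberfish works internally; the function names, the sample documents, and the trailing-asterisk syntax are all assumptions made for illustration. It builds a tiny inverted index in Python, treats a trailing `*` as right truncation (a prefix match), and implements a Boolean AND as a set intersection.

```python
from collections import defaultdict

# Hypothetical sample documents, keyed by document id.
DOCS = {
    1: "indexing semi-structured text",
    2: "index large documents",
    3: "relevance ranking of results",
}


def build_index(docs):
    """Map each lowercased, whitespace-separated term to the ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def lookup(index, term):
    """Return matching ids; a trailing '*' means right truncation (prefix match)."""
    term = term.lower()
    if term.endswith("*"):
        prefix = term[:-1]
        hits = set()
        for indexed_term, doc_ids in index.items():
            if indexed_term.startswith(prefix):
                hits |= doc_ids
        return hits
    return set(index.get(term, set()))


def boolean_and(index, *terms):
    """Return ids of documents that match every term."""
    results = [lookup(index, term) for term in terms]
    return set.intersection(*results) if results else set()


INDEX = build_index(DOCS)

if __name__ == "__main__":
    print(lookup(INDEX, "index*"))               # documents 1 and 2
    print(boolean_and(INDEX, "index*", "text"))  # document 1 only
```

A production engine would store the term list in a sorted structure so that prefix matches do not require scanning every term; the linear scan here keeps the example short.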

You can download from SourceForge.net a version of Amberfish here.

Isearch was:

open source text retrieval software developed in 1994 by Nassib Nassar at the Clearinghouse for Networked Information Discovery and Retrieval (CNIDR), which was funded by the National Science Foundation. Isearch was designed as a proof-of-concept software architecture for use in distributed information retrieval, known at the time as wide-area information systems, or WAIS. Isearch formed the text retrieval component of the Isite software, which was a complete prototype implementation of ANSI/NISO Z39.50 (ISO 23950)… The main features of Isearch included full text and field searching, relevance ranking, Boolean queries, and support for many document types such as HTML, mail folders, list digests, and text with SGML-style tags.

I had a link in my notes to a version of Isearch dated 2006. You can get that file here today (January 18, 2009). Nassib Nassar turns up as one of the people involved with this company. I had a pointer in my profile of this company to a technical paper about the company’s “grid” concept. You can locate this document here. Mr. Nassar’s blog here has not been updated since December 2005. The Renaissance Computing Institute lists Mr. Nassar on its Web site here.

I am inclined to move this company to my list of defunct search and content processing vendors. If anyone has information about the fate of Etymon, let me know.

Stephen Arnold, January 19, 2009

Unusual US Documents

January 19, 2009

A happy quack to the reader who called my attention to two sites. I haven’t checked the data on these two sites, and I don’t want to suggest the information is accurate. The first site is The Memory Hole here. The second site is Government Attic here. Both sites provide links and information about US government decisions, procedures, and activities. Government Attic’s information is indexed by Google. I’m not sure about Memory Hole at this time.

Stephen Arnold, January 19, 2009
