Quintura: Relationships with Hoops to Jump

September 9, 2009

A reader sent me a link to Quintura. I had looked at this system before and then turned my attention to more enterprise-centric vendors. I took another look at it this morning (September 7, 2009) and ran a query on the publicly accessible search system, which appears at the top of the Quintura interface. My earlier test had delivered a result in the form of a relationship map. In my mind, the father of the hyperbolic relationship map is Ramana Rao, the former Xerox PARC wizard. His map has been influential; it surfaces in SAS’s “discovery” interface and has pointed Cluuz.com toward its display of connections. I ran a test query for “Microsoft Bing”. The result I saw was:

[Screenshot: Quintura results for “Microsoft Bing”]

The results were useful. I then ran a more difficult query: “Microsoft Eugene Agichtein”. Dr. Agichtein is working in a little-known but quite significant area related to next-generation data management. Here is the Quintura display for this query:

[Screenshot: Quintura results for “Microsoft Eugene Agichtein”]

Source: http://www.quintura.com. No link to the results appears in the navigation bar in this Quintura page display.

None of the mapped items pertained to database, dataspace, or data management. I got proper nouns, and I know that the pointers to the people will, with some further research, eventually lead to useful information. But I kept getting proper nouns when I needed other words and phrases for this test query, so this set of discovered tags did little for me. Maybe a consumer would find the tags useful. I found them not particularly useful for the type of research I do.

I then ran the same query on Cluuz.com. Here is a portion of the Cluuz.com output, which uses the Yahoo search index:

[Screenshot: Cluuz.com results for “Microsoft Eugene Agichtein”]

Source: http://www.cluuz.com/Default.aspx?list=y&yahoo=y&q=microsoft%20Eugene%20Agichtein&q1=&r=10&s=1&sites=&format=&p=true&c=true&ph=true&e=true&a=true&d=true&dt=false&g=false&o=false&rt=&filt=

Bingo. The Cluuz.com system pointed me to a relationship at the University of Washington, to people, and to a concept. I was off to the research races.

I then tried the query on one of Google’s “in the wild” search demonstrations. I don’t recall how I got the system to generate this type of output, so I cannot provide much of a how-to in this write up. Here’s what Google delivered for the query “Microsoft Eugene Agichtein”:

[Screenshot: Google results for “Microsoft Eugene Agichtein”]

Source: http://www.google.com/search?hl=en&tbo=1&tbs=ww%3A1&q=Microsoft+Eugene+Agichtein&btnG=Search&tbo=1

Okay, better than Quintura.com’s output in my opinion but not as good as the Cluuz.com output.

What’s my take?

First, I don’t think most users find these types of relationship maps easy to use when they first encounter them. A more skilled researcher will be able to make sense of them. If the maps are too simple, like the Quintura and Google implementations, I think a list of suggestions may be more useful.

Second, in terms of what the systems found “related” to Dr. Eugene Agichtein, the Yahoo index processed by Cluuz.com was more useful. This tells me that Yahoo has a useful index and a lousy way of making the pointers available via Yahoo. The Canadian crowd at Cluuz.com makes Yahoo a more useful service. Too bad Yahoo has not signed a deal with Cluuz.com. Hopefully Cluuz.com will stay in business because I like their tools.

Third, the “regular” Google index has the information I wanted. I did have to use the Advanced Search panel to dig it out of the trillion-item Google index (give or take a hundred billion, of course). Google has a major accessibility problem right now. The weird thing that looks like a lawn sprinkler does not deliver for me.

So, if you want relationship maps, I suggest you use Cluuz.com and skip the flashy displays on other systems until these outfits crack the code successfully.

Stephen Arnold, September 8, 2009

Exclusive Interview: Gaviri Founder

September 8, 2009

One of the most interesting aspects of the Beyond Search Web log in general and the Search Wizards Speak feature in particular is learning about those who invent search systems.

I had an opportunity to speak with Emeka Akaezuwa in the spare student union at Drexel University not long ago. Dr. Akaezuwa struck me as a remarkable individual. First, he made the trek from his office near New York City to Philadelphia. Second, I learned that he used to work at Dow Jones & Co., laboring on the firm’s search and retrieval systems. Third, I found that he had a PhD and was contributing his time to help young students get their arms around computers and their potential.

You can read the full text of my interview with Dr. Akaezuwa in the Search Wizards Speak feature on ArnoldIT.com. Search Wizards Speak is the single most comprehensive collection of interviews with the movers and shakers in search and content processing.

Dr. Akaezuwa’s Universal SearchOS impressed me. I watched as he took his wristwatch, connected it to my laptop via a USB cable, and indexed my computer. I made the observation that some law enforcement and intelligence agencies might be interested in the technology. The SearchOS indexed my netbook in a matter of minutes. He said, “Now you can use the data in the Gaviri index to search for documents on your PC.”

SearchOS is available in other “flavors” as well. I have been testing the desktop version since we met at Drexel. One of its most useful features is the ability to “point” the indexing system at an archive of Outlook or Outlook Express email. The system makes those messages and their attachments searchable. Very useful.

The system can be deployed like other enterprise search systems. I asked Dr. Akaezuwa if the SearchOS system could cluster and generate facets. He laughed and told me, “The system is very versatile. Yes, we can make those features available to you if you want them.”

One point Dr. Akaezuwa made in his interview was:

SearchOS can index so many documents without additional servers because of its Sandbox, Distributed Indexing Architecture. Let us take a behind-the-firewall setting with one thousand users and maybe six servers. Each user indexes all the documents within their sphere of search influence (desktops, portable storage devices, etc.) using their PC or laptop. If we assume that each user has three million documents, we would index three billion documents. And if each of the six servers has fifty million documents, we would index 300 million documents on the servers. Indexing is distributed on each user’s machine.  By design, SearchOS can index as many documents as are available on a device. We did not test on the specific hardware configuration you mentioned, but SearchOS’ processing throughput on a dual core processor is about 3,000 documents a minute. Keep in mind that SearchOS does full-text, not partial-text, indexing so the number may be less if a system has many large-size documents (over 500 MB). The software has a CPU utilization throttle that allows a user or a sys admin to power-up or decelerate content processing throughput to match available system resources. SearchOS not only morphs to user contexts but also can be scaled to a device’s content processing capabilities. Device-scaling – which I did not mention as one of what sets us apart – is necessary given the array of systems – from resource-challenged PDAs to high-powered servers that SearchOS must run on.
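The figures in Dr. Akaezuwa’s answer can be checked with a little arithmetic. The sketch below is a back-of-envelope model, not Gaviri code: it assumes every desktop and every server indexes at the quoted dual-core rate of 3,000 documents a minute (the interview gives no server rate), and all constant and function names are hypothetical. It shows why distributing the work matters: elapsed time is bounded by the slowest machine rather than by the total document count.

```python
"""Back-of-envelope model of the SearchOS distributed indexing scenario.

Document counts and the 3,000 documents/minute rate come from the interview.
Applying that desktop rate to the servers is an assumption for illustration.
"""

USERS = 1_000
DOCS_PER_USER = 3_000_000
SERVERS = 6
DOCS_PER_SERVER = 50_000_000
DOCS_PER_MINUTE = 3_000  # quoted dual-core rate; very large files lower it


def total_documents() -> int:
    """Documents indexed across all desktops plus all servers."""
    return USERS * DOCS_PER_USER + SERVERS * DOCS_PER_SERVER


def minutes_to_index(doc_count: int, rate: int = DOCS_PER_MINUTE) -> float:
    """Minutes for one machine to index doc_count documents at rate."""
    return doc_count / rate


def sequential_minutes() -> float:
    """Elapsed time if a single machine had to index everything."""
    return minutes_to_index(total_documents())


def parallel_minutes() -> float:
    """Elapsed time when every machine indexes its own documents at once.

    The slowest machine (here, a server holding 50 million documents)
    sets the bound.
    """
    return max(minutes_to_index(DOCS_PER_USER), minutes_to_index(DOCS_PER_SERVER))


if __name__ == "__main__":
    print(f"Documents indexed: {total_documents():,}")
    print(f"One desktop: {minutes_to_index(DOCS_PER_USER) / 60:.1f} hours")
    print(f"Single machine, all documents: {sequential_minutes() / 60 / 24:,.0f} days")
    print(f"All machines in parallel: {parallel_minutes() / 60 / 24:.1f} days")
```

Run as written, the model reports 3,300,000,000 documents, about 16.7 hours per desktop, and roughly 11.6 days of elapsed time when all machines work in parallel, versus about 764 days for one machine doing everything. Throttling CPU use, as the interview describes, would simply lower `DOCS_PER_MINUTE` and stretch each figure proportionally.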

If you want more information about Dr. Akaezuwa’s system, navigate to http://www.gaviri.com. The full text of the interview is here.

Stephen Arnold, September 8, 2009

Google Books Becomes a Kinder, Gentler Thing

September 8, 2009

I enjoyed the Bloomberg story “Google Agrees to Give Europeans a Say in Books Deal.” I thought Google was reluctant to compromise, to show softness instead of knife edged logic. Maybe 1 + 1 does equal three in this brave new world. The story said:

“The challenge for EU policymakers is to ensure a regulatory framework which paves the way for a rapid roll-out of services, similar to those made possible in the U.S. by the recent settlement, to European consumers and to the European library and research communities,” the commissioners said in a joint statement today.

I found this comment interesting because I can see that it may suggest that Google is adding some friendly souls to the Google Books board.

But Google is going even farther. According to the Reuters story “Google Tells EU Online Books Make Web Democratic”, Google is doing good. The statement attributed to Dan Clancy, architect of the Google Books program, was interesting:

“We have seen a democratization of access to online information,” said Clancy, engineering director of Google Book Search. “You can discover information which you did not know was there,” he said. “It is important that these (out-of-print) books are not left behind. Google’s interest was in helping people to find the books.”

Isn’t that a royal “we”?

Stephen Arnold, September 7, 2009

Google Shuttles Vilified in San Francisco

September 8, 2009

I wrote a column for KMWorld about Google’s push into mass transit services. Google disclosed an invention for sending shuttles where they were needed. The traditional muni transportation systems in the US run buses on scheduled routes. Most bus systems are bleeding cash. Google’s shuttle system (I am simplifying here) allows a Googler to request a ride electronically. The driver gets a route and map update wirelessly. The Googler gets an SMS that tells him or her when the shuttle will arrive. Lovely. But the love fest in SF is ending if the story in the San Francisco Weekly is accurate. “Noe Valley and Mission Residents Say Google Shuttles Are Evil” reported:

Residents from the Mission and Noe Valley have vilified the company and its unmarked private shuttles that zip commuters to Google’s headquarters in Mountain View. Many Mission hipsters blame Google and its buses for gentrifying the neighborhood that prides itself on being artsy and eclectic. Next door in Noe Valley, residents are irked for different reasons. They complain that the buses, which are equipped with WiFi and air conditioning, are infecting their pristine neighborhood with congestion, noise, and pollution. “There are buses idling; we don’t want that even if they run partially on biofuel,” says Vicki Rosen, president of Upper Noe Neighbors.

Google has spicy sausage pains in Italy. Now the Mission and Noe crowd are grousing. Poor Googzilla. The honeymoon seems to be ending.

Stephen Arnold, September 8, 2009

Gmail Outage as Instructional Opportunity

September 8, 2009

I try to focus my attention on companies and pundits in the search and content processing sectors. I need to make a slight modification. The GigaOM write up “5 Things We Learned From the Gmail Outage” snagged my attention. I think I see the purpose of the article; that is, provide a semi-oracular spin on a high profile company’s technical problem. If I were younger, I suppose I would adopt a similar tone in an effort to curry favor with the Google and approach a subject that has caught the attention of some Gmail users. The core of the article for me was this comment:

But in the end, the fact of the outage wasn’t nearly as interesting as what it said about Google, about email and about us.

In the best spirit of English 101, Kevin Kelleher uses the fact of Gmail going offline as a way to extract some lessons about life in 2009. The five lessons disturbed me. Let me comment on three of them. I don’t want to imply that you should skip the original article. Far from it; you need to go to the water bucket yourself to drink.

First, I don’t like the implication that I have to get used to “outages”. When my family lived in Brazil in the 1950s, we had outages every day. The electricity worked a few hours a day. The water was on maybe two or three days a week. One does not get used to outages because outages require fundamental changes in the way one goes about certain tasks. I think it is defeatist to accept that Google cannot deliver a service without throwing a user’s or customer’s life into a tailspin. Outages are not “good enough”; they are unacceptable. This glib statement irritated me. What if a Gmail message contained information for a doctor treating the author’s loved one? Without the information, the doctor muffs the bunny and the author’s loved one dies. Is that something the author will get used to? Probably not too quickly, I surmise.

Second, big is bad. Excuse me? The consolidation of computing and information services is moving forward, particularly in the US. “Big is bad”, but for whom? Big seems to be the trajectory in the US, and if it were bad, why is bigness accelerating? The notion that “big” (the normal situation in telecommunications, automobiles, insurance, and, my favorite, financial services) is bad is a philosophical message. Reality is different, and I find taking a company’s technical weakness and turning it into a political statement a weird way to communicate with me. Do something to change the reality; don’t tell me that I learned a lesson.

Finally, chain reactions are part of networks. Nope, chain reactions are a characteristic of nuclear phenomena. A chain reaction triggers a sequence of events that continues until the state of matter changes. A network is a system, and failures can cascade. Furthermore, failures in network systems can be compartmentalized, remediated, and, in smart networks, worked around. The chain reaction is a fact of nature. The failure of a network-centric system is a consequence of careless engineering.

The lesson the Gmail failure taught me is easy to state: Google has to do better. Google should not be excused or given a free pass because “good enough” or “get used to it” is acceptable. Wrong. Google made itself a dominant outfit. Now it has to live up to its obligation. No one should excuse or make excuses for Google’s lousy engineering.

Stephen Arnold, September 8, 2009

The Pogue Problem, Maybe the Future Opportunity?

September 8, 2009

I am not a journalist. I don’t have the first clue about what goes on in journalism classes. I don’t think the university I attended had a journalism department. True, I worked at a newspaper and at a magazine publishing company, but I was more of a manager / nerd type, more concerned with cutting costs and generating revenue than writing about the hoedown at the local courthouse or the school board meeting. Now I write a Web log that is nearly 100 percent marketing beef. I do commercial work, but that is a very different way to monetize my knowledge, as modest as it may be.

As a result, I look at publishing in a way that is different from and often incomprehensible to those who are trained journalists. Here’s an example. TechCrunch published a very good story “Losing Its Religion: the New York Times Compromises”. I understand the point of view that a reputable, traditional newspaper should make the distinction between news and advertising. I think Mr. Arrington’s write up puts a nice cap on the “Pogue problem”. Mr. Pogue is an author, a lecturer, and a columnist. He gets very excited about Apple products. Mr. Arrington writes:

The NY Times ethics policy also says “When we first use facts originally reported by another news organization, we attribute them.” But in our experience that isn’t always the case. The one thing the NY Times has is its brand and its people. They aren’t first to stories but they generally get things right. Trying to hide conflicts of interest hurts that brand, particularly when they hide, hypocritically, behind an ethics statement that prohibits the behavior they’re hiding. It’s far better to keep everything in the open. Transparency is what’s important, not appearances.

I don’t disagree with Mr. Arrington’s viewpoint. I want to stretch an idea in a different direction.

What I see in the “Pogue problem” is an opportunity. In my view, the “ethics” that were spelled out when the New York Times was rolling in money have been marginalized. The New York Times does not have the money or the staff to ride herd on the people who generate content. My recollection is that the New York Times and, I believe, the Washington Post ran stories that were, in effect, mostly made up. In the good old days of the newspaper wars, the moguls would create news. The “ethics” that major papers enforced were largely a reaction to some of the more creative ways newspaper moguls handled the news in the good, old days. For many years, newspapering was lucrative. People read newspapers at the breakfast table and then on the subway or tram ride home from the mills outside Chicago and Philadelphia. Advertisers wanted to reach these people, and the newspaper was the only game in town for a while. Eventually radio and TV came along and newspapers jumped into these channels. Some were successful, like the Courier Journal & Louisville Times Co. where I worked for many years. The CJ&LT was a monopoly, and Mr. Bingham had a letter from some higher legal authority that said it was okay for the CJ&LT to own TV, radio, commercial databases, direct mail ham outfits, doorknob hanger distribution companies, and printing plants that handled the New York Times Magazine when it was done via rotogravure. Life was indeed good.

But those days are gone.

Newspapers have not been able to make the leap into the channel broadly described as “new media” or “the Internet”. Sure, there are some marginal successes like the Wall Street Journal Online, but I take the hard copy of the paper and I get spam every day urging me to subscribe again. The New York Times had a sweetheart deal with LexisNexis. Then the NYT pulled the plug, blew a million in royalties, and sank another dump truck of millions into its largely ineffective money-making online efforts. The CJ&LT made money online as early as 1981, yet when I talk with publishing executives, I get the “what do you know” treatment. Well, I know that the CJ&LT knew how to make money online because I was there and contributed to that effort. I have a tough time taking the feedback I get from publishers about online seriously. Clueless is the word I use to describe most of the meetings I attend.

Now back to the Pogue Problem. I think the idea that troubles some people is that Mr. Pogue is close to Apple, so his objectivity is skewed. Let’s think about the upside of this model. In fact, forget Mr. Pogue, let’s talk money.

First, if the newspaper or any other publishing company has a way to get money from people who have money, the company—if it is publicly traded—has an obligation to figure out how to take this money without running afoul of the law. This means that if it is necessary to create a new type of editorial product, then that product should be created. If the person who writes the auto column stuck behind sports in the Sunday New York Times gets a request to write about a particular car, why not figure out how to sell that “content hole”? Make the deal a transaction and take the money. This happens with consulting firms who sell slots in industry charts. It happens at trade shows where those who buy booth space get to be speakers. One trade show organizer told me, “I don’t know how I can get all the exhibitors a speaking slot on our program.” Google is predicated on selling messages to people. This model deserves greater consideration among traditional publishing business thinkers. Yep, sell advertising messages in the form of “news” and “opinion”.

Second, the problem is that publishing companies have reinvented themselves and their business model. The only problem is that the revenue part and the cost control part have slipped through their fingers. What’s left is a business method that does not match with today’s fast changing world. What were the ethics of the newspaper companies at the turn of the century in New York? What were the ethics of a Barry Bingham in the 1980s? Those times and their “ethics” don’t match up with today’s opportunities. Therefore, change the definition for “ethics” and explore and possibly seize certain opportunities.

Finally, the journalists who follow the rules often find themselves under great pressure. When costs get chopped, some journalists have to turn to new types of research or information collection methods. These folks write pretty good stories, but they use different tools and methods. I can’t get too excited when a journalist uses blogs instead of telephone interviews. When was the last time you were able to get someone on the phone straightaway? Even my wife’s phone rings to voicemail. For my kids, it is SMS or nothing. When these new methods collide with a journalist who is trying to do the best job possible given the constraints, I cannot get worked up when a story is off base or a personal view gets into the article. My thought is to find a way to accept these changes and charge for them.

In summary, the Pogue Problem should be explored as the Pogue Opportunity. How can these situations be monetized? If the demand is there, my thought is that information companies have to consider how to deal with the opportunities, not react reflexively. If new sources of revenue are not found, the problem takes care of itself. Change is needed; new classes of information products and services are needed; and fresh thinking has to be brought to this new opportunity space.

Just my opinion.

Stephen Arnold, September 8, 2009

Dorthy: Achievement Service Search

September 7, 2009

“What is your dream? What is your passion? What is your desire?” So asks new search site dorthy.com. It’s calling itself an “achievement service” and works by continuously filtering content, including social features, rather than depending on straight-line searches. You start with a statement that answers the above questions (Example: Vacation at Disney World). Dorthy generates a “dream page” that organizes articles, videos, photos, blogs, recommended RSS feeds, and even Tweets. Your search results are cross-referenced with other people’s to expand information access. You can even “like” items and interests, similar to the Facebook function.

Right now the service strikes me as somewhat fragmented, but it is an alpha, available at http://www.dorthy.com. You’ll need to spend quite a bit of time with it to find connections. It will need to gain traction among a significant user population to succeed or to get the notice of the search world at large, especially with Twitter and Facebook as big as they are. Will Dorthy reach the land of Oz? Who saw Facebook coming a couple of years ago? We’ll have to wait and see.

Jessica Bratcher, September 7, 2009

Google and Spicy Italian Sausage

September 7, 2009

Short honk: Google has indigestion. I think it is caused by gobbling some Italian data sausage. You can read “Google News Italia Probe Expands” and draw your own conclusions. The Web Pro News story provides an Italian legal document. Short story: Google is allegedly going to be investigated by Italian antitrust authorities. Is Google bigger than Italy? We will find out. Italy is a data outpost for Google. Will Italians find their data diet trimmed? Will Google indigestion go away?

Stephen Arnold, September 7, 2009

SharePoint Search and Twitter

September 7, 2009

End User SharePoint has provided a means to search Tweets within SharePoint. You may first want to read these two articles by following the two links below:

  1. Search Federation with SharePoint – Part 1
  2. Search Federation Part 2 – Customizing Results with SharePoint Designer

Then you will want to navigate to EUSP’s “Binary Free SharePoint Twitter Search Web Part”. EUSP has provided screen shots and a walk through of the method. For me the most interesting comment in the write up was:

Remove the web part connection created in the previous exercise before exporting!

Useful tip which can obviate the need for hours of hair pulling and Red Bull guzzling. Set aside an hour, maybe two, and then you will have Tweets in SharePoint. A happy quack to Woody Windischman who posts to The Sanity Point Web log.

Stephen Arnold, September 7, 2009

Explaining the Difference between Fast ESP and MOSS 2007 Again

September 7, 2009

When a company offers multiple software products to perform a similar function, I get confused. For example, I have a difficult time explaining to my 88-year-old father the differences among Notepad, WordPad, Microsoft Works’ word processing, Microsoft Word word processing, and the Microsoft Live Writer he watched me use to create this Web log post. I think it is an approach like the one the genius at Ragu spaghetti sauce used to boost sales of that condiment. When my wife sends me to the store to get a jar of Ragu spaghetti sauce, I have to invest many minutes figuring out which one I need. Am I the only male who cannot differentiate between Sweet Tomato Basil and Margherita? I think Microsoft has taken a different angle of attack, because when I acquired a Toshiba netbook, the machine had Notepad, WordPad, and Microsoft Works installed. I added a version of Office and also the Live Writer blog tool. Some of these were “free” and other products came with my MSDN subscription.

Now the same problem has surfaced with basic search. I read “FAST ESP versus MOSS 2007 / Microsoft Search Server” with interest. Frankly, I could not recall if I had read this material before, but quite a bit seemed repetitive. I suppose when trying to explain the differences among word processors, the listener hears a lot of redundant information as well.

The write up begins:

It took me some time but i figured out some differences between Microsoft Search Server / MOSS 2007 and Microsoft FAST ESP. These differences are not coming from Microsoft or the FAST company. But it came to my notice that Microsoft and FAST will announce a complete and correct list with these differences between the two products at the conference in Las Vegas next week. These differences will help me and you to make the right decisions at our customers for implementing search and are based on business requirements.

Ah, what’s different is that this is a preview of the “real” list of differences. Given the fact that the search systems available for SharePoint choke and gasp when the magic number of 50 million documents is reached, I hope that the Fast ESP system can handle the volume of information objects that many organizations have on their systems at this time.

The list in the Bloggix post numbers 14. Three interested me:

  1. Scalability
  2. Faceted navigation
  3. Advanced federation.

Several observations:

First, scalability is an issue with most search systems. Some companies have made significant technical breakthroughs to make adding gizmos painless and reasonably economical. Other companies have made the process expensive, time consuming, and impossible for the average IT manager to perform. I heard about EMC’s purchase of Kazeon. I thought I heard that someone familiar with the matter pointed to problems with the Fast ESP architecture as one challenge for EMC. In order to address the issue, EMC bought Kazeon. I hope the words about “scalability” are backed up with the plumbing required to deliver. Scaling search is a tough problem, and throwing hardware at hot spots is, at best, a very costly dab of Neosporin.

Second, faceted navigation exists within existing MOSS implementations. I think I included screenshots of faceted navigation in the last edition of the Enterprise Search Report I wrote in 2006 and 2007. There was a blue interface and a green interface. Both of these made it possible to slice and dice results by clicking on an “expert” identified by counting the number of documents a person wrote with a certain word in them. There were other facets available as well, although most were more sophisticated than the “expert” function. I hope that the “new” Fast ESP implements a more useful approach for users of Fast ESP. Of course, identifying, tagging, and linking facets across processed content requires appropriate computing resources. That brings us back to scaling, doesn’t it? Sorry.
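The “expert” facet described above amounts to counting, per author, the documents that contain a query word, then letting the user click an author to narrow the results. Here is a minimal Python sketch of that idea under stated assumptions; it is not Microsoft’s implementation, and the `Doc` record, the function names, and the simple word tokenizer are hypothetical choices for illustration.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    """Hypothetical processed document: an author plus its text."""
    author: str
    text: str


def words(text: str) -> set[str]:
    """Lowercased word tokens; a stand-in for a real linguistic pipeline."""
    return set(re.findall(r"\w+", text.lower()))


def expert_facet(docs: list[Doc], term: str, top_n: int = 3) -> list[tuple[str, int]]:
    """Rank authors by how many of their documents contain term."""
    term = term.lower()
    counts = Counter(d.author for d in docs if term in words(d.text))
    return counts.most_common(top_n)


def slice_by_author(docs: list[Doc], author: str) -> list[Doc]:
    """What a click on one facet value does: keep that author's documents."""
    return [d for d in docs if d.author == author]


if __name__ == "__main__":
    corpus = [
        Doc("Ana", "Search index scaling notes"),
        Doc("Ana", "Index tuning for SharePoint"),
        Doc("Bo", "Search facets and navigation"),
        Doc("Cy", "Quarterly budget"),
    ]
    print(expert_facet(corpus, "index"))  # [('Ana', 2)]
    print(len(slice_by_author(corpus, "Ana")))  # 2
```

Even this toy version rescans every document for each query term, which hints at why facets over millions of documents lean on counts computed at indexing time, and why they consume the computing resources the post mentions.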

Third, federation is a buzz word that means many different things because vendors define the term in quite distinctive ways. For example, Vivisimo federates, and it is or was at one time a metasearch system. The query went to different indexing services, brought back the results, deduplicated them, put the results in folders on the fly, and generated a results list. Another type of federation surfaces in the descriptions of business intelligence systems offered by SAS. The system blends structured and unstructured data within the SAP “environment”. Others are floating around as well, including the repository solutions from TeraText, which federates disparate content into one XML repository. What I find interesting is that Microsoft is not delivering plain “federation”, which is undefined. Microsoft is, according to the Bloggix post, on the trail of “advanced federation”. What the heck does that mean? The explanation is:

FAST ESP supports advanced federation including sending queries to various web search APIs, mixing results, and shallow navigation. MOSS only supports federation without mixing of results from different sources and navigation components, but showing them separately.

Okay, Vivisimo and SAP style for Fast ESP; basic tagging for MOSS. Hmm.
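The distinction in the Bloggix quote, mixing results from several sources versus showing each source separately, can be sketched in a few lines. This Python example assumes each source returns a ranked list of URLs and merges the lists round-robin with duplicate removal; that interleaving rule is an illustrative stand-in, since neither the post nor the quote says how Fast ESP ranks a mixed list. The function names and sample URLs are hypothetical.

```python
from itertools import zip_longest


def federate_separately(results_by_source: dict[str, list[str]]) -> dict[str, list[str]]:
    """Basic federation: each source keeps its own block of results."""
    return {source: list(hits) for source, hits in results_by_source.items()}


def federate_mixed(results_by_source: dict[str, list[str]]) -> list[str]:
    """Mixed federation: interleave ranked lists, dropping duplicate URLs.

    Takes rank 1 from every source, then rank 2, and so on; a hit already
    taken from an earlier source or rank is skipped.
    """
    merged: list[str] = []
    seen: set[str] = set()
    for tier in zip_longest(*results_by_source.values()):
        for hit in tier:
            if hit is not None and hit not in seen:  # None pads shorter lists
                seen.add(hit)
                merged.append(hit)
    return merged


if __name__ == "__main__":
    results = {
        "web": ["example.com/a", "example.com/b", "example.com/c"],
        "intranet": ["example.com/b", "intranet/d"],
    }
    print(federate_separately(results))
    print(federate_mixed(results))
    # ['example.com/a', 'example.com/b', 'intranet/d', 'example.com/c']
```

A metasearch system of the kind described for Vivisimo would also normalize scores and cluster the merged list into folders; the sketch shows only the mixing and deduplication steps that separate “advanced” federation from side-by-side display.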

To close, I think that the Fast ESP product is going to add a dose of complexity to the SharePoint environment. Despite Google’s clumsy marketing, the Google Search Appliance continues to gain traction in many organizations. Google’s solution is not cheap. People want it. I think Fast ESP is going to find itself in a tough battle for three reasons:

  1. Google is a hot brand, even within SharePoint shops
  2. Microsoft certified search solutions are better than Fast ESP based on my testing of search systems over the past decade
  3. The cost savings pitch is only going to go so far. CFOs eventually will see the bills for staff time, consulting services, upgrades, and search related scaling. In a lousy financial environment, money will be a weak point.

I look forward to the official announcement about Fast ESP; the $1.2 billion Microsoft spent for this company is now going to have to deliver. I find it unfortunate that the police investigation of alleged impropriety at Fast Search & Transfer has not been resolved. If a product is as good as Fast ESP was advertised to be, what went wrong with the company, its technology, and its customer relations prior to the Microsoft buy out? I guess I have to wait for more information on these matters. When you have a lot of different products with overlapping and similar services, the message I get is more like the Ragu marketing model, not the solving of customer problems in a clear, straightforward way. Sigh. Marketing, not technology, fuels enterprise search these days, I fear.

Stephen Arnold, September 7, 2009
