More on Search ROI

August 8, 2008

I usually agree with Deep Web Technologies’ commentaries. Sol Lederman has written an interesting essay, “Measuring Return on Search Investment.” You will want to read his analysis here. The point of his write up is that Judy Luther, president of Informed Strategies, wrote a white paper about ROI for libraries. The good news in Ms. Luther’s analysis, if I read Mr. Lederman’s summary correctly, is that academic libraries can show a return on investment. As a long time library user, I agree that an investment can pay many dividends.

I do want to push back a bit on library ROI. The sticking point is cost analysis. As long as an institution can chop up costs and squirrel them away, it is very difficult to know what an information service of any type costs. Libraries develop a budget. A tiny fraction of that budget goes for books, electronic information, and journals. Most of the money is sucked up by fixed costs such as salaries, maintenance, security, and other institutional overheads.

As a result, the “cost” of an information service is almost always the direct cost at a specific point in time for a specific service or product. Costs associated with figuring out what to buy, installing the product, the product’s share of the infrastructure, and other expenses are ignored. Consequently, a calculation that shows a specific return is not very useful.
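To see why the point-in-time direct cost misleads, consider a toy calculation. Every number below is invented for illustration:

```python
# Toy numbers (all invented) showing how counting only the direct cost
# inflates a library's search ROI.
license_fee = 20_000            # the "direct" cost usually reported
selection_study = 4_000         # figuring out what to buy
installation = 6_000            # deployment and integration labor
infrastructure_share = 10_000   # share of servers, space, power, support staff
annual_benefit = 30_000         # estimated value delivered per year

naive_roi = (annual_benefit - license_fee) / license_fee
full_cost = license_fee + selection_study + installation + infrastructure_share
loaded_roi = (annual_benefit - full_cost) / full_cost

print(f"ROI on direct cost only:  {naive_roi:.0%}")   # 50%
print(f"ROI on fully loaded cost: {loaded_roi:.0%}")  # -25%
```

Same service, same benefit; the sign of the “return” flips once the squirreled-away costs come back into the ledger.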

Without knowledge of both direct and indirect costs, the basic budget analysis is incomplete. Ignoring the “going forward” costs means that when problems occur, the expense can break the back of the library’s budget. Wacky ROI calculations, particularly where digital information and search are concerned, push libraries deeper into the budget swamp. Here in Kentucky, budgets for online information are now being cut. The looming problem is that chopping a direct cost allows the unmonitored and often unknown dependent costs to continue chewing away at the budget.

Libraries face severe budget pressure from these long ignored costs. They burn like an underground mine fire and, like an underground mine fire, are very difficult to control.

Stephen Arnold, August 8, 2008

Microsoft BrowseRank Round Up

August 8, 2008

Looking to compete with Google’s PageRank, BrowseRank is a Microsoft-developed method of computing page importance for use in Web search ranking.

The computations are based upon user behavior data and algorithms to “leverage hundreds of millions of users’ implicit voting on page importance.” (So says a Microsoft explanatory paper [http://research.microsoft.com/users/tyliu/files/fp032-Liu.pdf]). The whole point is to add “the human factor” to search to bring up more results people actually want to see.

On July 27, SEO Book posted a review/opinion [http://www.seobook.com/microsoft-search-browserank-research-reviewed], following Steve’s earlier BrowseRank post here [http://arnoldit.com/wordpress/2008/07/26/microsofts-browser-rank/]. Summary: While BrowseRank is a good idea, there are drawbacks, such as false returns caused by heavy social media traffic, link sites, and the like. Sites like Facebook, MySpace, and YouTube pop up high on the list, not because they have good, solid, popular information, but simply because they are high traffic. Microsoft will have to combine BrowseRank’s user feedback with other data to make it really useful. On the other hand, if Microsoft can collect this user data over the longer term, the information is more likely to pan out. For example, BrowseRank measures time spent on a site to help determine importance and relevance.

A blog post on WebProNews [http://www.webpronews.com/topnews/2008/07/28/browserank-the-next-pagerank-says-microsoft] on July 28 said flat out: “It shouldn’t be the links that come in, but the time spent browsing a relevant page, that should help determine where a page ranks for a given query.” So that idea lends some credence to BrowseRank’s plan. The next step is how Microsoft will acquire all that information – obviously through things like their Toolbar, but what else? (Let’s ignore, for now, screams about Internet browsing privacy.) If MSN’s counting on active participation from users, it won’t work. This blog post points out that “Google’s PageRank succeeds partially due to its invisibility.” And that’s what users expect.
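A toy sketch of the general idea, not Microsoft’s actual algorithm: estimate visit frequency from click transitions, then weight each page by how long users stay on it. The browsing data and dwell times below are invented; the real BrowseRank model, per the paper linked above, is a continuous-time Markov process over a user browsing graph.

```python
import numpy as np

# Toy user browsing data: observed click transitions and per-page mean
# dwell times in seconds. All numbers are invented for illustration.
clicks = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a"),
          ("b", "a"), ("c", "b"), ("a", "b")]
dwell = {"a": 40.0, "b": 5.0, "c": 90.0}

pages = sorted(dwell)
idx = {p: i for i, p in enumerate(pages)}
n = len(pages)

# Row-stochastic transition matrix of the click ("jump") chain.
# (Assumes every page has at least one outgoing click.)
P = np.zeros((n, n))
for src, dst in clicks:
    P[idx[src], idx[dst]] += 1.0
P /= P.sum(axis=1, keepdims=True)

# Power iteration for the stationary visit-frequency distribution.
pi = np.full(n, 1.0 / n)
for _ in range(100):
    pi = pi @ P

# Weight visit frequency by mean staying time, so pages users linger on
# gain importance -- the BrowseRank intuition.
score = pi * np.array([dwell[p] for p in pages])
score /= score.sum()
print(dict(zip(pages, score.round(3))))
```

Note how a high-traffic page with a five-second dwell time (page “b”) loses ground to a page users actually read; that is the corrective the critics say raw click counts lack.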

[Graphic: BrowseRank results, from Microsoft Research Asia]

For now, and granted there is only this small bit of information available, SEO Book’s opinion is that PageRank (Google’s product) has one up on Microsoft because it sorts informational links higher, connects them to Google’s advertising, and gives Google the ability to manipulate the information.

For more on Microsoft vs. Google: CNET put out a pretty substantial article [http://news.cnet.com/8301-1023_3-9999038-93.html] on July 25 discussing PageRank vs. BrowseRank and what Microsoft hopes to accomplish.


Autonomy: Another Week, Another Award

August 7, 2008

If I were a competing search system vendor, I would start to think of myself as a loser. Autonomy continues to win accolades for its information platform. Autonomy may be the winningest information platform vendor in history. The company’s most recent award is the 2008 IP Contact Center Technology Pioneer Award. You can read the full story here: “Autonomy etalk Receives the 2008 IP Contact Center Technology Pioneer Award: Customer Interaction Solutions Magazine Recognizes Qfiniti for Advanced IP Call Recording.”

According to the essay, an Autonomy executive said:

etalk is thrilled to be recognized for our advanced recording technology and the flexibility and scalability we provide to our clients. When you combine our next-generation IP recording technology with telephony solutions from the world’s top-rated vendors, our clients are rewarded with the most robust and reliable voice recording solutions on the market.

etalk is a system that can record voice conversations for enterprise contact centers and mission critical business environments. This solution offers full customer interaction recording for compliance, risk management, and quality.

More information is available at http://www.etalk.com/. The link in the news release returns a page not found error. I fixed this glitch for the two or three Beyond Search readers. I wonder if the author “spoke” the link and it was incorrectly parsed, or if the mistake was one of those human flubs like the ones I make so often.

Stephen Arnold, August 7, 2008

Looking for the Next Killer App: Moving Beyond Search

August 7, 2008

For years, the “next killer app” was email. Email, it turns out, is a headache. Younger folks are happy with instant messaging variants. SMS is okay. More “now” Twitter-like functions are better. As the giants of software were pumping millions into R&D and the venture crowd was trolling the corridors of universities, the “next killer app” arrived. According to the Pew Internet & American Life Project, search is the big dog. You can read the Pew story here. For me, the key point in the Pew data was “the number of those using a search engine on a typical day is pulling ever closer to the 60% of internet users who use email, arguably the Internet’s all-time killer app, on a typical day.”

So, how do I find an email a day old or older? Search. Gmail search works pretty well too. Yahoo’s email search is okay, just pokey on my connection.

With Google dominating search, what’s Google’s next killer app?

My research suggests that Google is poking its snout into information access and management. I call it “search on steroids”. Part of this effort is Google’s Programmable Search Engine. Another part is data management. Competitors need to crank up their innovation engines and figure out how to leapfrog Google. What’s “beyond search”? Competitors who want to catch Google may already be too late.

Stephen Arnold, August 7, 2008

SearchCloud Updated

August 7, 2008

SearchCloud.net has updated its search system. You can now adjust term weightings from the results page. If you are not familiar with this system, you will want to navigate to http://www.searchcloud.net. The idea is to enter a term and then select a font size for that term. The system puts the term in the cloud and makes an internal notation to weight that term in proportion to its font size. The larger the font, the more significant the term is to your query. Term weighting has long been available to search system administrators, but the function is usually excluded from user-facing controls. I wrote a profile of the company’s system, and you can read that essay here.
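Here is a rough Python sketch of how font-size-driven term weighting might work under the hood. The 12-pixel baseline, the boost function, and the frequency-based scoring are all assumptions for illustration; SearchCloud.net has not, to my knowledge, published its actual weighting formula.

```python
# Hypothetical mapping from the font size a user picks in the cloud to a
# query-term boost. 12px is treated as the neutral weight of 1.0.
def boost(font_px: int) -> float:
    return font_px / 12.0

# The cloud the user built: term -> chosen font size (invented example).
query_cloud = {"lucene": 36, "faceting": 18, "tutorial": 10}

docs = {
    "doc1": "lucene faceting tutorial for beginners",
    "doc2": "advanced lucene internals and scoring",
    "doc3": "a faceting tutorial with solr examples",
}

# Score = sum over query terms of (boost * term frequency in document).
def score(text: str) -> float:
    words = text.split()
    return sum(boost(px) * words.count(term) for term, px in query_cloud.items())

for name, text in sorted(docs.items(), key=lambda kv: -score(kv[1])):
    print(name, round(score(text), 2))
```

The point of the exercise: a big-font term dominates the ranking the way a must-have term should, without the user ever seeing a weighting syntax.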

Other changes in the update include:

  • Tweaks to the interface
  • A hint box
  • Results can be copied.

I will keep you posted about developments. A happy quack to those who support term weighting.

Stephen Arnold, August 7, 2008

Sprylogics’ CTO Zivkovic Talks about Cluuz.com

August 7, 2008

The popular media fawn over search and content processing companies whose modest demos work on limited result sets. Cluuz.com, a company I profiled here several weeks ago, offers heartier fare. The company uses Yahoo’s index to showcase its technology. You can take Cluuz.com for a test drive here. I was quite interested in the company’s approach because it uses Fancy Dan technology in a way that was immediately useful to me. Cluuz.com is a demonstration of Toronto-based Sprylogics International Inc. The company is traded on the TSX Venture Exchange under the symbol TSXV:SPY.

Because Sprylogics has roots in the intelligence community, unlocking the company took some work. Once I established contact with Alex Zivkovic, I was impressed with his responsiveness and his candor.

In the interview, you can read about the origins of the Cluuz.com service as well as some of the company’s other interesting content processing technology. The company offers a search system, but the real substance of the firm is how it processes content, even the Yahoo search index, into a significantly more useful form.

The Cluuz.com system puts on display the firm’s proprietary semantic graph technology. You can see relationships for specific subsets of relevant content. I often use the system to locate information about a topic and then explore the identified experts and their relationships. This feature saves me hours of work trying to find a connection between two people. Cluuz.com makes this a trivial task.
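A toy sketch of the underlying technique: treat entities that co-occur in a result snippet as linked, then search the graph for the shortest chain connecting two people. The snippets and names below are invented, and Cluuz.com’s proprietary semantic graph technology is certainly more sophisticated than this.

```python
from collections import defaultdict, deque

# Toy result snippets with pre-extracted entities (all invented).
snippet_entities = [
    {"Alice Wu", "Bob Tran"},
    {"Bob Tran", "Carol Diaz", "Acme Corp"},
    {"Carol Diaz", "Dan Okafor"},
]

# Undirected co-occurrence graph: entities sharing a snippet get an edge.
graph = defaultdict(set)
for entities in snippet_entities:
    for a in entities:
        for b in entities:
            if a != b:
                graph[a].add(b)

def connection(start: str, goal: str):
    """Breadth-first search for a shortest entity chain linking two names."""
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph[path[-1]] - seen:
            seen.add(nxt)
            queue.append(path + [nxt])
    return None

print(connection("Alice Wu", "Dan Okafor"))
# ['Alice Wu', 'Bob Tran', 'Carol Diaz', 'Dan Okafor']
```

Even this crude version shows why the feature saves hours: the chain between two people falls out of the graph instead of out of a stack of browser tabs.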

Mr. Zivkovic told me:

So, we have clustering. We have entity extraction. We have relationship analysis in a graph format. I want to point out that for enterprise applications, the Cluuz.com functions are significantly more rich. For example, a query can be run across internal content and external content. The user sees that the internal information is useful but not exactly on point. Our graph technology makes it easy for the user to spot useful information from an external source such as the Web in conjunction with the internal information. With a single click, the user can be looking into those information objects.

I probed into the “guts” of the system. Mr. Zivkovic revealed:

Our engineers have worked hard to perform multiple text processing operations in an optimized way. Our technology can, in most cases, process content and update the indexes in a minute or less. We keep the details of our method close to the vest. I can say that we use some of the techniques that you and others have identified as characteristic of high-speed systems like those at Google, for example.

You can read the full interview with Mr. Zivkovic in the Search Wizards Speak interview collection on the ArnoldIT.com Web site. The full text of this exclusive interview is here. A complete index of the interviews in this series is here.

Attivio: Active Intelligence Engine Version 1.2 Released

August 7, 2008

Attivio is on my radar. The company demonstrated its next-generation business intelligence system to me at the AIIM Show in April 2008. I liked what I saw. I interviewed the founder of Attivio, and you can read that transcript on the ArnoldIT.com Search Wizards Speak site here.

Now the company has released Version 1.2 of its Active Intelligence Engine, which means the gang in Wellesley, Massachusetts, is on the move. You can read the write up here.

Attivio–like some other newcomers–is not just a search or information access engine. Attivio has rethought the problem of getting information to employees who are under pressure or just in a hurry to get their kids from day care. I will dig into some of the new features later this month.

For now, let me highlight some of the new features of AIE 1.2. (I must admit when I pronounce AIE, I imagine the sound of a scream from competitors. Then after the scream dies down, I hear, “Why didn’t we implement those functions?”).

Four points about Version 1.2 caught my attention:

  1. AIE is positioned as a platform. The idea is that you can deploy quickly and build on top of the AIE system.
  2. A rich index that combines ease of use with the type of precision associated with structured query language statements (see the sketch after this list). To me, this means I can get what I need without trying to get a programmer’s attention or flipping through a SQL manual.
  3. Fast index updates and real time alerting.
  4. New connectors and support for 50 languages.
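To illustrate what item 2 might mean in practice, here is a hypothetical sketch of a query that mixes a ranked full-text clause with exact structured predicates. The class, field, and function names are invented; this is not Attivio’s API, just the flavor of the idea.

```python
from dataclasses import dataclass, field

# Hypothetical query object: one index serving both free-text relevance
# and SQL-style precision. NOT Attivio's actual API; names are invented.
@dataclass
class HybridQuery:
    text: str                                     # ranked full-text portion
    filters: dict = field(default_factory=dict)   # exact structured predicates

    def to_sql_like(self) -> str:
        where = " AND ".join(f"{k} = '{v}'" for k, v in self.filters.items())
        return (f"SELECT * FROM idx WHERE MATCH('{self.text}')"
                + (f" AND {where}" if where else ""))

q = HybridQuery(text="quarterly revenue forecast",
                filters={"region": "EMEA", "doc_type": "spreadsheet"})
print(q.to_sql_like())
# SELECT * FROM idx WHERE MATCH('quarterly revenue forecast')
#   AND region = 'EMEA' AND doc_type = 'spreadsheet'
```

The appeal is that the person in a hurry types the text part, and the system (or an administrator) supplies the precise predicates, with no SQL manual required.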

Attivio wants to deliver business intelligence without the hassles of most older business intelligence systems. The bottom line for me is that Attivio is focusing on basics: speed, ease of use, quick deployment, and platform extensibility. You can learn more about Attivio here.

Stephen Arnold, August 7, 2008

Hosted SharePoint Info

August 7, 2008

Network World’s Mitchell Ashley scooped most Microsoft watchers with “Microsoft Spills the Beans on Hosted Exchange / SharePoint”. Mr. Ashley tracked down Microsoft’s John Betz, Director of Product Management for Microsoft Business Online Services. The conversation, available as a podcast here, provides useful information about hosted SharePoint. Mr. Ashley tossed some high, soft, easy-to-field questions, but several points jumped out at me. These were:

  1. The cloud play is “Hosted by Microsoft and sold by partners”. Infrastructure is going to be one key ingredient in this new service stew.
  2. Pricing has a ceiling of $15 per user. Most folks will pay less. These prices strike me as “pulled from the clouds.”
  3. Microsoft “will make it all work together.” Active Directory will communicate with hosted Exchange and SharePoint. A “new tool will be provided.”
  4. Trade off for hosted Exchange and SharePoint–give up some control. “We make assumptions and settings on your behalf…. If you want customization, you need on premises Exchange and SharePoint.”
  5. The Service Level Agreement is for uptime, not transit time or any other network function.
  6. “We absolutely rely on partners. This is a great opportunity to sell an online service today and get paid forever.” Reason: Support comes from partner or local information technology group. Online services are for organizations that have an IT person on staff. “We’re delivering meat and potatoes. Our partners can put an embellishment upon these services.”

This is a very interesting chunk of information. A happy quack to Mr. Ashley.

Stephen Arnold, August 6, 2008

Google Search Appliance: Showing Some Fangs

August 6, 2008

Assorted wizards have hit the replay button for Google’s official description of the Google Search Appliance (GSA).

If you missed the official highlights film, here’s a recap:

  • $30,000 starting price, good for two years, “support”, and a 500,000 document capacity. The bigger gizmos can each handle 10 million documents. These work like Christmas tree lights. When you need more, just buy more GSAs and plug them in. This is the same type of connectivity “big Google” enjoys when it scales.
  • Group personalization; for example, marketing wizards see brochures-type information and engineers see documents with equations
  • Metadata extraction so you can search by author, department, and other discovered index points.

If you want to jump right into Google’s official description, just click here. You can even watch a video about Universal Search, which is Google’s way of dancing away from the far more significant semantic functionality that will be described in a forthcoming white paper from a big consulting firm. This forthcoming report, alas, costs money, and it even contains my name in very small type as a contributor. Universal Search was the PR flash created for Google’s rushed Searchology conference not long after an investment bank published a detailed report of a far larger technical search initiative (the Programmable Search Engine) within the Googleplex. True Google watchers will enjoy Google’s analysis of complexity. The title of the video is a bit of Googley humor because when it comes to enterprise or behind-the-firewall search, complexity is really not that helpful. Somewhere between 50 and 75 percent of the users of a search system are dissatisfied with that system. Complexity is one of the “problems” Google wants to resolve with its GSA.

When you buy the upscale versions of the GSA, you can implement fail over to another GSA. GSAs can be distributed geographically as well. The GSA comes with support for various repositories such as EMC Documentum. This means that the GSA can index Documentum content without custom coding. The GSAs support the OneBox API, which is an important component in Google’s enterprise strategy. With the OneBox API, a clever programmer can use the GSA to create Vivisimo-style federated search results, display live data from a Microsoft Exchange server so a “hit” on a person shows that person’s calendar, integrate Web and third-party commercial content with behind-the-firewall information, and perform other important content processing tasks.
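For flavor, here is a minimal sketch of what a OneBox provider can look like: a small HTTP endpoint the GSA queries, which replies with XML the appliance folds into its result page. The element names below only approximate the OneBox results schema, and the calendar lookup is invented; consult Google’s OneBox documentation before building against it.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

# Sketch of a OneBox provider: the GSA forwards the user's query here and
# merges the XML reply into its result page. Element names are approximate.
class OneBoxHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        query = parse_qs(urlparse(self.path).query).get("q", [""])[0]
        # Hypothetical lookup: a query on a person returns calendar data.
        xml = f"""<?xml version="1.0"?>
<OneBoxResults>
  <resultCode>success</resultCode>
  <MODULE_RESULT>
    <Title>Calendar for {query}</Title>
    <Field name="next_meeting">Staff sync, 10:00</Field>
  </MODULE_RESULT>
</OneBoxResults>"""
        self.send_response(200)
        self.send_header("Content-Type", "text/xml")
        self.end_headers()
        self.wfile.write(xml.encode("utf-8"))

if __name__ == "__main__":
    HTTPServer(("", 8080), OneBoxHandler).serve_forever()
```

The design point: the appliance handles crawling and ranking, while live systems like Exchange answer at query time through endpoints like this one.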

Google happily names some of its larger customers, including Adobe Systems, Kimberly-Clark, and Sunnybrook Health. The company does not, however, mention the deep penetration of the GSA into government agencies, police organizations, and universities.

Good “run the game plan” write ups are available from CNet here, my favorite TechCrunch with Eric Schonfeld’s readable touch here, and the “still hanging in there” eWeek write up here.

[Screenshot: splash page for the enterprise videos]

After registering for the enterprise videos, you will see this splash page. You can get more information about the upgrade to Version 5 of the GSA.

My Take

Now, here’s my take on this upgrade:

First, Google is responding to demands for better connectivity, more administrative control, and better security. With each upgrade to the GSA, Google has added features that have been available for a quarter century from outfits like Verity (now part of the Autonomy holdings). The changes are important because Google is often bad-mouthed for offering a poor enterprise search solution. With this release, I am not so sure the negatives competitors heap on these cheerful yellow boxes are warranted. This version of the GSA is better than most of the enterprise search appliances with which I am familiar and a worthy competitor where administrative and engineering resources are scarce.


IBM: Blurring Lotus Notes and Enterprise Content Management

August 6, 2008

Marketwatch appears to have picked up an IBM news release and posted the write up without any editorial massaging. You can read “IBM eDiscovery Software Helps Organizations Win the Compliance Battle” here. The purpose of the news release was to explain that IBM offers “Enterprise Content Management (ECM) software designed to help clients meet challenging legal discovery requirements.” A couple of years ago, I had to take a look at IBM’s content management services. These ranged from FileNet to applications deployed within WebSphere. In addition, IBM was actively involved in deploying Documentum, albeit in some remarkably interesting situations that feathered the nest of some legal eagles.

This IBM news release and the Marketwatch “news” story assert that IBM has “enterprise content management” which complements Lotus Notes. The “new” approach “helps organizations win the compliance battle.” Hmmm. I thought compliance was a requirement designed to assure that certain actions were taken. I don’t think of compliance as a battle, but I’m not IBM. I was just a fellow asked to figure out what IBM offered in the way of content management. The real news in this release is that IBM is pushing into eDiscovery, which is neither enterprise content management nor compliance work. eDiscovery is its own separate thing, but obviously IBM is setting me straight. The news story on Marketwatch said:

IBM’s eDiscovery software is the first to leverage a complete ECM platform to transform the process of eDiscovery by proactively managing electronically stored evidence. The new eDiscovery software integrates with IBM’s auto-classification and records management technology to help IT departments manage information for compliance and electronic discovery requests. IBM eDiscovery software also integrates with IBM’s content-centric business process management (BPM) capabilities to help organizations standardize, control and automate legal discovery workflows and enable third-party components as needed.

Let’s think about this. IBM has “new” software that:

  • Is Lotus Notes
  • Is Enterprise Content Management
  • Performs eDiscovery
  • Integrates business process management
  • Performs compliance requests
  • Manages electronically stored evidence
  • Hooks into automatic classification
  • Connects to records management.

I am a bit confused. Is one “new” product the digital equivalent of the entire stock list of AsSeenOnTV.com? I find this remarkable and pretty close to science fiction. But I am an addled goose in an acid rain soaked hollow in rural Kentucky. Those folks in Armonk and Almaden are sure capable innovators. I did not realize every buzz word in search and content processing could be squeezed into one meaty “news” release.

Stephen Arnold, August 6, 2008
