Deep Web Technologies’ Vertical Search for Business Information

January 13, 2009

In the early 1990s, Verity was the dominant enterprise search system. IBM’s confused approach to STAIRS and the complexity of STAIRS derivatives created a market opportunity. Verity took it. Verity’s founders have continued to innovate in search. I was delighted to speak with Abe Lederman (that interview is here) and learn about the innovations his company has made. Deep Web Technologies (DWT) tames the tangled world of US government scientific information. You can explore the Science.gov site here. Now, Mr. Lederman and his team have turned their attention to the needs of the person looking for substantive business information. The company’s new business search system–Biznar–débuted in October 2008.

DWT has identified about 60 business oriented Web sites and federates these sources in near real time. In addition to this core list, Biznar takes a user’s query and retrieves results from other Web indexing services. The system then blends the results, producing a results list that is designed to answer business questions. The select source list includes such publications as:

  • Business Week
  • Money Magazine
  • Motley Fool
  • US Patent & Trademark Office
  • Wall Street Journal.
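The federate-and-blend step described above can be sketched in a few lines of Python. This is an illustration only, not DWT's method: the two stub sources, their canned results, and the min-max score normalization are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for live sources. A real connector would send the
# query over HTTP; these return canned (title, raw_score) pairs, each on the
# source's own scoring scale.
def search_source_a(query):
    return [("Bankruptcy liability primer", 9.0), ("Chapter 11 basics", 4.0)]

def search_source_b(query):
    return [("Director liability in insolvency", 0.8), ("Creditor claims", 0.2)]

SOURCES = {"Source A": search_source_a, "Source B": search_source_b}

def normalize(hits):
    """Rescale one source's raw scores to 0..1 so sources can be compared."""
    if not hits:
        return []
    scores = [score for _, score in hits]
    low, high = min(scores), max(scores)
    span = (high - low) or 1.0  # avoid dividing by zero when all scores tie
    return [(title, (score - low) / span) for title, score in hits]

def federated_search(query):
    """Query every source in parallel, then blend into one ranked list."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, query) for name, fn in SOURCES.items()}
    blended = []
    for name, future in futures.items():
        for title, score in normalize(future.result()):
            blended.append((score, name, title))
    # Highest normalized score first; ties broken by source name, then title.
    return sorted(blended, key=lambda hit: (-hit[0], hit[1], hit[2]))

results = federated_search("bankruptcy liability")
for score, source, title in results:
    print(f"{score:.2f}  {source}: {title}")
```

Normalizing per source before sorting keeps a source with large raw scores from crowding out the others. A production system would likely weigh sources and content differently.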

Sample Query

Let’s look at a test query. I used Biznar to obtain information about “bankruptcy liability”. The system generated a result list with 1,706 entries. I ran the same query on Google.com, which returned a result list containing more than 9,400,000 results. Obviously no human could examine more than a tiny fraction of these 9,400,000 results. Google advertises that it is good by virtue of indexing a lot of content. Biznar focuses on a meaningful result set of 1,700 items.

But for most people, 1,700 items are too many. Biznar makes it easy to navigate the results. Look at the results page below:

[Screenshot: Biznar results page]

You see a two column display. The larger column presents a traditional results list with several useful enhancements:

  1. You see a star rating that provides an indication of the importance of the result for this specific query
  2. The source is displayed for each item; for example, Google Blog Search, Google Scholar, the New York Times, etc.
  3. The link includes a snippet of the content in the document that matches the query.


Federate NetWeaver and SharePoint

December 28, 2008

The new year approaches, and you have SAP NetWeaver and Microsoft SharePoint. You want to spend a few minutes making it possible to run one query and retrieve results from each system. Trivial? You bet. In case some of the steps are a tad uncertain, you will want to peruse the SAP white paper here. The title of this useful document is “Federated Search between SAP NetWeaver Enterprise Search and Microsoft Search Server 2008 Using OpenSearch and SSO.” The authors are SAP wizards Andre Fischer, Pedro Arrontes, and Holger Brucheit. The 15 page document is SAP centric, and the key is to use SAP’s OpenSearch interface. The paper assumes you know how this middleware and its method work. If you are fuzzy on OpenSearch particulars, the white paper provides links to other documents in the SAP technical library. If you want to jump right in, fire up NetWeaver and use the built in templates to specify where the data are and their format. The white paper assumes that you will be using SAP’s security and access control system, which might be incorrect if SAP plays a secondary role in your organization. The information for configuring SharePoint walks through the specific graphical interface settings to use and, thankfully, includes the scripts needed to make SharePoint play nice with NetWeaver. If you work through the white paper and your federating doesn’t federate, SAP has included some troubleshooting tips. Enjoy.
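For readers fuzzy on the mechanics, here is a rough sketch of what OpenSearch federation boils down to: each system publishes a URL template, and the federating side fills in the query terms. The `{searchTerms}`, `{count?}`, and `{startIndex?}` placeholders come from the OpenSearch 1.1 specification. The host names and paths below are invented for illustration and are not taken from the SAP white paper.

```python
import re
from urllib.parse import quote_plus

def fill_template(template, **params):
    """Substitute {name} and optional {name?} placeholders in a URL template."""
    def replace(match):
        name, optional = match.group(1), match.group(2)
        if name in params:
            return quote_plus(str(params[name]))
        if optional:
            return ""  # optional parameters may be left empty
        raise KeyError(f"missing required parameter: {name}")
    return re.sub(r"\{([A-Za-z:]+)(\?)?\}", replace, template)

# Hypothetical endpoints; substitute the templates your own systems publish.
TEMPLATES = {
    "netweaver": "https://sap.example.com/search?q={searchTerms}&count={count?}",
    "sharepoint": "https://moss.example.com/results.aspx?k={searchTerms}&start={startIndex?}",
}

# One query, one URL per system. Unused keyword arguments are simply ignored.
urls = {
    name: fill_template(template, searchTerms="quarterly forecast", count=10)
    for name, template in TEMPLATES.items()
}
for name, url in urls.items():
    print(name, url)
```

The sketch covers only the query side. Single sign-on and the parsing of the returned feed, which the white paper treats at length, are left out.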

Stephen Arnold, December 28, 2008

Google Needs Ideas, Says the Independent

December 5, 2008

After the London Times muffed the ball with its Microsoft Yahoo deal, the Independent is reporting that Google needs ideas. You can read the full text of the story here. According to “Google Staff Searching for Fresh Ideas” by Steve Foley, Google’s engineers have to spend less time on personal projects and more time finding ways to squeeze more juice from the Google data farm. For me the most important comment reminded me how dependent on advertising Google remains after such high profile revenue initiatives as enterprise search sales. Here’s what Mr. Foley said:

Google’s ad revenues are still growing at a rate of one-third a year, but just three years ago they were doubling annually, and analysts are forecasting the online advertising market will be little better than flat this year. Google makes effectively all its money selling ads next to its search results.

Do I believe that Google is indeed making big moves to remain in front of the tidal wave of financial misfortune? Nope. Not for a New York minute. I think Google is taking advantage of the present economic crisis to chop away some of the excesses of Google’s early hubris and unusual business decisions. The GOOG is in a much better position than some other companies. Google has many ways to turn on new revenue streams, and it has a significant lead in infrastructure. I think Google is like one of those mixed martial arts fight clubs. The weaker members of the team and some distractions are being eliminated.

Stephen Arnold, December 5, 2008

More on Search ROI

August 8, 2008

I usually agree with Deep Web Technologies’ commentaries. Sol Lederman has written an interesting essay “Measuring Return on Search Investment.” You will want to read his analysis here. The point of his write up is that Judy Luther, president of Informed Strategies, wrote a white paper about ROI for libraries. The good news in Ms. Luther’s analysis, if I read Mr. Lederman’s summary correctly, is that academic libraries can show a return on investment. As a long time library user, I agree that an investment can pay many dividends.

I do want to push back a bit on library ROI. The sticking point is cost analysis. As long as an institution can chop up costs and squirrel them away, it is very difficult to know what an information service of any type costs. Libraries develop a budget. A tiny fraction of that budget goes for books, electronic information, and journals. Most of the money is sucked up by fixed costs like salaries, maintenance, security, and other institutional overheads.

As a result, the “cost” of an information service is almost always the direct cost at a specific point in time for a specific service or product. Costs associated with figuring out what to buy, installing the product, the share of the infrastructure the product requires, and other costs are ignored. As a result, the calculation that shows a specific return is not too useful.
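A back-of-the-envelope sketch makes the point concrete. Every figure below is hypothetical, picked only to show how the arithmetic shifts once the ignored costs are counted. The cost categories follow the ones named above, and the simple formula, (benefit - cost) / cost, is one common way to state ROI, not necessarily the one used in Ms. Luther's white paper.

```python
# All figures are hypothetical and exist only to illustrate the arithmetic.
benefit = 150_000      # estimated annual value delivered by the service

direct_cost = 50_000   # the license fee: the number that usually gets reported
indirect_costs = {
    "figuring out what to buy": 15_000,
    "installing the product": 25_000,
    "share of infrastructure": 30_000,
    "other costs (staff time, training)": 40_000,
}

def roi(benefit, cost):
    """Return on investment as a fraction: (benefit - cost) / cost."""
    return (benefit - cost) / cost

full_cost = direct_cost + sum(indirect_costs.values())
roi_direct = roi(benefit, direct_cost)  # looks like a big win
roi_full = roi(benefit, full_cost)      # same service, all costs counted

print(f"ROI on direct cost only: {roi_direct:.0%}")
print(f"ROI on full cost:        {roi_full:.1%}")
```

With these made-up numbers, the direct-cost calculation reports a 200 percent return, while the full-cost calculation shows a loss of roughly 6 percent.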

Without a knowledge of the direct and indirect costs, the basic budget analysis is incomplete. Ignoring the “going forward” costs means that when problems occur, the costs can break the back of the library’s budget. Wacky ROI calculations, particularly where digital information and search are concerned, push libraries deeper into the budget swamp. Here in Kentucky, budgets for online information are now cut. The looming problem will be that chopping a direct cost allows the unmonitored and often unknown dependent costs to continue to chew away at the budget.

Libraries face some severe budget pressure from these long ignored costs. These burn like an underground mine fire, and like an underground mine fire, these costs are often very difficult to control.

Stephen Arnold, August 8, 2008

Federated Search: List of Presentations

July 21, 2008

Deep Web Technologies’ Web log showcases almost two dozen presentations about federated search. The list was compiled by Sol Lederman, one of the key figures at Deep Web Technologies. An interview with Abe Lederman appeared in the Search Wizards Speak series in June 2008 here. The company’s Web log is FederatedSearchBlog.com, and you will want to navigate to this useful list of presentations. Note that a couple of the referenced presentations are hosted on Slideshare, a third party service that some of the authors referenced by Deep Web Tech rely upon and that I find troublesome. Kudos to the Deep Web Tech team for their work.

Stephen Arnold, July 21, 2008

Vertical Search Resurgent

July 16, 2008

Several years ago, the mantra among some of my financial service clients was, “Vertical search.” What’s vertical search? It is two ideas rolled into one buzzword.

A Casual Definition

First, the content processed by the search system is about a particular topic. Different database producers define the scope of a database in idiosyncratic ways. In Compendex, an index of engineering information, you can find a wide range of engineering topics, covering many fields. You can find information about environmental engineering, which looks to me as if the article belongs in a database about chemistry. But in general, the processed information fits into a topical basket. Chemical Abstracts is about chemistry, but the span of chemistry is wide. Nevertheless, the guts of a vertical search engine is bounded content that is brought together in a generally useful topic area. When you look for information about travel, you are using a vertical search engine. For example, Orbitz.com and BookIt.com are vertical search engines.

Second, the content has to be searchable. So, vertical content collections require a search engine. Vertical content is often structured. When you look for a flight from LGA to SFO, you fill in dates, times, departure airport code, arrival airport code, etc. A parametric query is a fancy way of saying, “Training wheels for a SQL query.” But vertical content collections can be processed by the menagerie of text processing systems. When you query the Dr. Koop Web site, you are using the type of search system provided by Live.com and Yahoo.com.
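To see why a parametric query amounts to training wheels for SQL, consider this minimal sketch. The form fields map one-to-one onto a WHERE clause. The table layout, flight numbers, and times are invented for illustration and do not describe any travel site's actual system.

```python
import sqlite3

# An in-memory table of made-up flights standing in for a travel database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flights (flight_no TEXT, origin TEXT, destination TEXT, "
    "depart_date TEXT, depart_time TEXT)"
)
conn.executemany(
    "INSERT INTO flights VALUES (?, ?, ?, ?, ?)",
    [
        ("XX100", "LGA", "SFO", "2008-07-20", "08:15"),
        ("XX200", "LGA", "SFO", "2008-07-20", "17:40"),
        ("XX300", "JFK", "SFO", "2008-07-20", "09:00"),
        ("XX400", "LGA", "SFO", "2008-07-21", "08:15"),
    ],
)

def find_flights(origin, destination, depart_date):
    """Turn the values typed into the search form into a parameterized query."""
    sql = (
        "SELECT flight_no, depart_time FROM flights "
        "WHERE origin = ? AND destination = ? AND depart_date = ? "
        "ORDER BY depart_time"
    )
    return conn.execute(sql, (origin, destination, depart_date)).fetchall()

matches = find_flights("LGA", "SFO", "2008-07-20")
for flight_no, depart_time in matches:
    print(flight_no, depart_time)
```

The user never sees the SQL. The form constrains what can be asked, which is exactly what makes it easier than writing the query by hand.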

[Image: wheel]

Source: http://www.sonirodban.com/images/wheel.jpg

Google is a horizontal search engine, but it is also a vertical search engine. If you navigate to Google’s advanced search page, which is accessed by fewer than three percent of Google’s users, you will find links to a number of vertical search engines; for example, the Microsoft collection and the US government collection. Note: Google’s universal search is a bit of marketing swizzle that means Google can take a query and pass it across indexes for discrete collections. The results are pulled together, deduplicated, and relevance ranked. This is a function available from Vivisimo since 2000. Universal search Google style displays maps and images, but it is far from cutting edge technology save for one Google factor–scale.
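The pull together, deduplicate, and rank sequence can be sketched as follows. This is a toy illustration, not Google's or Vivisimo's implementation. The URLs and scores are made up, the scores are assumed to be comparable across collections, and the URL canonicalization is deliberately crude.

```python
from urllib.parse import urlsplit

def canonical(url):
    """Crude duplicate key: lowercase host minus 'www.', path minus trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")

def merge(result_lists):
    """Combine hits from several collections, keep the best-scoring copy
    of each duplicate, and rank the survivors by score."""
    best = {}
    for hits in result_lists:
        for url, score in hits:
            key = canonical(url)
            if key not in best or score > best[key][1]:
                best[key] = (url, score)
    return sorted(best.values(), key=lambda hit: hit[1], reverse=True)

# Hypothetical results from two discrete collections for the same query.
web = [("http://www.example.com/report/", 0.72), ("http://example.org/a", 0.40)]
news = [("http://example.com/report", 0.91), ("http://example.net/b", 0.55)]

merged = merge([web, news])
for url, score in merged:
    print(f"{score:.2f}  {url}")
```

The logic is simple. The hard part, as noted above, is doing it at Google's scale.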

Why am I writing about vertical search when the topic for me came and went years ago? In fact, at the height of the vertical search frenzy I dismissed the hype. Innovators, unaware of the vertical nature of commercial databases 30 years ago, thought something quite new was at hand. Wrong. Google’s horizontal information dominance forced other companies to find niches where Google was not doing a good job or any job for that matter.

Vertical search flashed on my radar today (July 15, 2008) when I flipped through the wonderful information in my tireless news reader.

Autonomy announced:

that Foundography, a subsidiary of Nexus Business Media Ltd, has selected Autonomy to power vertical search on its website (sic)  for IT professionals: foundographytech.com. The site enables business information users to access only the information they want and through Autonomy’s unique conceptual capabilities delivers an ‘already found’ set of results, providing pertinent information users may not have known existed. The site also presents a unique proposition for advertisers, providing conceptually targeted ad selling.


50 Niche Search Engines

June 28, 2008

Alisa Miller has compiled a list of 50 niche search engines. You can find the listing on Accredited Degrees here. Ms. Miller groups the search engines, which adds to the usefulness of her list. As I worked my way through the links, two of her finds struck me as useful:

  • Bookmatch provides search results from 3,300 sources with spam and silliness removed from Web log postings and news aggregators.
  • Congoo delivers results from news and other sources. The company claims a higher level of information. My test queries returned useful results.

A happy quack to Ms. Miller for her list.

Stephen Arnold, June 29, 2008

Coveo: Beyond a Billion Documents

June 3, 2008

Most licensees of enterprise search systems don’t know how many documents the system must index. Coveo can handle more than 1,000,000,000 documents.

Even fewer search system licensees know that many enterprise search systems have hard limits on how many documents a system can index before choking, sometimes expiring without warning. For example, Microsoft SharePoint has a hard limit significantly below the Coveo billion document target. Microsoft acquired Fast Search & Transfer, in part, to have a work around for this scaling problem.

Coveo’s G2B Information Access solutions deliver security, relevant results, and very strong ease of use. You can “snap in” Coveo to SharePoint, Documentum, and IBM FileNet environments without custom coding. For more information, navigate to the Coveo Web site. A free trial is available.

Stephen Arnold, June 3, 2008

Search Crystal

May 31, 2008

A colleague in the UK called my attention to a service I did not know about. A happy quack to my email helpers! The service is called Search Crystal. This is a difficult metasearch system to describe. You will have to navigate to the site, go through a basic registration form, and then wait until Search Crystal sends you an activation link.

Once you have access to the site, you enter a query in the search box. Search Crystal sends the query to Google, Ask.com, Exalead, Yahoo, and Microsoft. The results are processed and then the fun begins. Here’s the display that I was able to explore for my query “enterprise search”.

[Screenshot: Search Crystal display for “enterprise search”]

You can filter by search engine. The colors correspond to the results from a particular search engine. You can flip among displays. These include a “crystal” view that shows an icon to make it easy to see which result appeared in a specific search engine’s results. Every item on the display is a hot link. You can slice and dice the result list in myriad ways.

In the bull’s eye view, the top result is in the center of the display. SEO specialists will find this feature useful. The full version offers you a powerful research tool that lets you compare up to 500 results, add comments and share and compare saved crystals with friends and colleagues.

If you are a fan of rich interfaces, you will find a lot of love at Search Crystal. In my test queries, I found that I preferred the more traditional list view. My eyesight is simply not up to the task of reading the listings in the side control panels, the tag clouds, or the dense visual renderings.

You can search images, save result sets, share your Search Crystal views, and license the technology for use in your organization. You can visualize overlap in result sets and explore the Wikipedia via a Search Crystal. There are APIs so you can integrate the technology with a Web service like Flickr’s or an enterprise application if you are so inclined. I recall seeing a reference to a Facebook application as well. If you have a Web log, you can use the company’s widget to add the Search Crystal functions for your users.

The Search Crystal engineers certainly know their Flash technology. Check it out.

Stephen Arnold, May 31, 2008

Silobreaker: When Intelligence Officers Solve Their Own Info Problems

May 20, 2008

“The Holy Grail”, one former intelligence officer told me, “is to walk in my office and have what I need on my desk, on the computer monitor, and on the screen of my secure telephone.” (You can recognize these whizzy mobile phones because some have an extra light and other features to make it hard for the bad guy to listen in on the call.)

I forget that most people in the online business don’t have experience working in intelligence, the military and law enforcement. When I see an allegedly “hot new semantic search system”, I often take a cursory look and then walk on by. The reason is that the idea of searching is not where the action is for serious intelligence.

If you do a search on Mother Google, you will find more than 300,000 references to Silobreaker. To give you a benchmark, if you search for this Web log, you get about 230,000 references with most of them to a search engine optimization company with the same name. The point is that certain services or resources, no matter how useful, are tough to find unless you know exactly what to enter in the search box.

Let me illustrate. Here’s a screen shot of a system that has been available for several years.

[Screenshot: Silobreaker search results]

The query “semantic search” returned a main story, secondary items in smaller “newspaper” style boxes, an embedded live video from CeBIT, a bar chart about term frequency, and an “In Focus” section that provides the names of people and things the Silobreaker system identified as important. (If you look at the people in the “In Focus” box, you’ll see me (Stephen Arnold) identified despite my <230,000 Web log references in Google.)

Notice that Silobreaker’s default display is a report. The system delivers a synthesis of what’s important. There’s no result list. No single graphic gizmo floating in the browser without meaningful context. Silobreaker looks great but it contains a significant amount of go juice. Navigate here to explore the system yourself.

Silobreaker doesn’t do plain vanilla laundry lists. You can see a list of documents, but you see them in context; that is, a specific knowledge setting. You don’t have to ask, “What the heck does that mean?” Silobreaker presents the meaning of each item in a display.

Most of the search systems I see or get asked to review don’t do what I need done. I want to comment on a basic Silobreaker output and point out a few facts about the system. Once that housekeeping is done, I will make several observations in an effort to spark discussion about the sorry state of enterprise search and commercial business intelligence systems. For a reader who finds my criticism of the best that Silicon Valley has to offer offensive, stop reading now. If you want to see where the rubber meets the race track in the intelligence community, keep reading.
