Deep Web Technologies’ Vertical Search for Business Information
January 13, 2009
In the early 1990s, Verity was the dominant enterprise search system. IBM’s confused approach to STAIRS and the complexity of STAIRS derivatives created a market opportunity. Verity took it. Verity’s founders have continued to innovate in search. I was delighted to speak with Abe Lederman (that interview is here) and learn about the innovations his company has made. Deep Web Technologies (DWT) tames the tangled world of US government scientific information; you can explore the Science.gov site here. Now Mr. Lederman and his team have turned their attention to the needs of people looking for substantive business information. The company’s new business search system, Biznar, debuted in October 2008.
DWT has identified about 60 business-oriented Web sites and federates these sources in near real time. In addition to querying this core list, Biznar sends the user’s query to other Web indexing services and retrieves their results. The system then blends all of the results into a single list designed to answer business questions. The select source list includes such publications as:
- Business Week
- Money Magazine
- Motley Fool
- US Patent & Trademark Office
- Wall Street Journal
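The post does not describe how DWT’s connectors work, but the basic fan-out step of federated search can be sketched in Python. This is a minimal illustration, not DWT’s code: `query_source` is a hypothetical stand-in for a per-source connector, and the source names, titles, and scores are made up.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Result:
    source: str
    title: str
    url: str
    score: float  # relevance score on the source's own scale


async def query_source(name: str, query: str) -> list[Result]:
    """Hypothetical connector: pretend to query one remote source."""
    await asyncio.sleep(0.01)  # simulate network latency
    if name == "flaky":
        raise ConnectionError(f"{name} did not respond")
    return [
        Result(name, f"{query} ({name} #{i})", f"https://{name}.example/{i}", 1.0 / (i + 1))
        for i in range(2)
    ]


async def federate(query: str, sources: list[str]) -> list[Result]:
    """Send one query to every source concurrently and pool the answers."""
    outcomes = await asyncio.gather(
        *(query_source(s, query) for s in sources), return_exceptions=True
    )
    pooled: list[Result] = []
    for outcome in outcomes:
        if isinstance(outcome, Exception):
            continue  # one failing source should not sink the whole query
        pooled.extend(outcome)
    return pooled


results = asyncio.run(federate("bankruptcy liability", ["wsj", "businessweek", "flaky"]))
print(len(results))  # 4: two hits each from the two sources that answered
```

Blending the pooled lists into one ranked list is a separate step; this sketch only shows the concurrent query and the tolerance for a source that fails.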
Sample Query
Let’s look at a test query. I used Biznar to obtain information about “bankruptcy liability”. The system generated a result list with 1,706 entries. I ran the same query on Google.com, which returned more than 9,400,000 results. Obviously no human could examine more than a tiny fraction of them. Google advertises that it is good by virtue of indexing a lot of content; Biznar focuses on a meaningful result set of about 1,700 items.
But for most people, 1,700 items are too many. Biznar makes it easy to navigate the results. Look at the results page below:
You see a two-column display. The larger column presents a traditional results list with several useful enhancements:
- You see a star rating that provides an indication of the importance of the result for this specific query
- The source is displayed for each item; for example, Google Blog Search, Google Scholar, the New York Times, etc.
- The link includes a snippet of the content in the document that matches the query.
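Neither the star rating nor the snippet logic is documented in the post. As a hedged sketch, assuming each result already carries a relevance score normalized to the range 0 to 1, the star display and a simple keyword-in-context snippet might look like this (`star_rating` and `snippet` are hypothetical names, not Biznar functions):

```python
def star_rating(score: float, max_stars: int = 5) -> int:
    """Map a relevance score already normalized to [0, 1] onto 1..max_stars stars."""
    clamped = min(max(score, 0.0), 1.0)  # guard against out-of-range scores
    return 1 + round(clamped * (max_stars - 1))


def snippet(text: str, query: str, width: int = 40) -> str:
    """Return up to `width` characters on each side of the earliest query term in `text`."""
    lowered = text.lower()
    hits = [pos for pos in (lowered.find(t) for t in query.lower().split()) if pos != -1]
    if not hits:
        return text[: 2 * width]  # no match: fall back to the opening of the document
    first = min(hits)
    start, end = max(first - width, 0), min(first + width, len(text))
    # Mark truncation on either side with an ellipsis.
    return ("..." if start > 0 else "") + text[start:end] + ("..." if end < len(text) else "")


print(star_rating(0.8))  # 4
print(snippet("The court ruled on bankruptcy liability today.", "liability", width=10))
```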
Now let’s look at the left-hand column and its clustering of the results. With these point-and-click categories, I was able to focus on the specific subject germane to my query. I clicked on “Bankruptcy Court” and was able to browse a list of the 79 results directly related to this facet of my initial inquiry. Here’s what the categories looked like for this sample query:
Vivisimo’s Clusty.com offers a general-purpose metasearch engine that also clusters results on the fly. But DWT makes clustering more effective by adding important controls. For example, I can sort by date and limit my results to specific sources. These functions appear at the top of the main results page in this panel:
Mr. Lederman said, “Business information is plentiful. The problem is that while most search engines do a good job of indexing general business information, they don’t usually retrieve specific industry information across a wide range of topics. We designed Biznar to address this by searching quality business sources in real-time for the most current information.”
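The sort-by-date and limit-to-source controls described above amount to a filter followed by a sort over the merged result list. A minimal sketch, assuming a hypothetical `Hit` record; the sample titles and dates are invented for illustration:

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass
class Hit:
    source: str
    title: str
    published: date


def refine(hits: Iterable[Hit], sources: Optional[set[str]] = None, newest_first: bool = True) -> list[Hit]:
    """Keep only hits from the chosen sources (if any are given), then order them by date."""
    if sources is not None:
        hits = [h for h in hits if h.source in sources]
    return sorted(hits, key=lambda h: h.published, reverse=newest_first)


hits = [
    Hit("Wall Street Journal", "Chapter 11 filings rise", date(2008, 11, 3)),
    Hit("Google Blog Search", "Liability after bankruptcy", date(2008, 12, 20)),
    Hit("Business Week", "Creditors and the courts", date(2009, 1, 5)),
]
for hit in refine(hits, sources={"Wall Street Journal", "Business Week"}):
    print(hit.published, hit.source)
```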
Other Features
Biznar is based on DWT’s federating technology. If you are not familiar with the notion of federated search, think of running a single query across multiple content sources and getting a single, relevance ranked result list. DWT’s approach extends this process with some sophisticated innovations. For example, DWT notes the date of each document. Sorting by date is a matter of pointing and clicking. “Most systems don’t pay enough attention to time. For business information, the time data (DATES) are essential. One doesn’t want to make a decision based on stale information.” DWT asserts that it can obtain information from a source immediately.
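The post does not say how DWT produces its single relevance-ranked list (a reader asks the same question in the comments). One common technique, shown here as an assumption rather than as DWT’s method, is to min-max normalize each source’s scores onto a 0 to 1 scale and then sort the pooled results. The sources and scores below are made up:

```python
from collections import defaultdict


def merge_ranked(results: list[tuple[str, str, float]]) -> list[tuple[str, str, float]]:
    """Merge (source, title, raw_score) tuples into one list ranked by normalized score."""
    by_source: dict[str, list[tuple[str, float]]] = defaultdict(list)
    for source, title, raw in results:
        by_source[source].append((title, raw))

    merged = []
    for source, items in by_source.items():
        lo = min(raw for _, raw in items)
        hi = max(raw for _, raw in items)
        for title, raw in items:
            # Rescale within each source so different scoring scales become comparable;
            # a source whose scores are all identical maps everything to 1.0.
            norm = (raw - lo) / (hi - lo) if hi > lo else 1.0
            merged.append((source, title, norm))

    # Highest normalized score first; Python's sort is stable, so ties keep source order.
    merged.sort(key=lambda item: item[2], reverse=True)
    return merged


raw = [("A", "a1", 10), ("A", "a2", 5), ("A", "a3", 0), ("B", "b1", 0.9), ("B", "b2", 0.1)]
for source, title, norm in merge_ranked(raw):
    print(f"{norm:.2f} {source} {title}")
```

A known weakness of plain min-max merging is that every source’s best hit scores 1.0, however weak that source is overall, so a real system would likely need per-source weighting or a smarter fusion method.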
Another function is DWT’s ability to index content. The search takes place in real time where possible, and the system indexes only when that makes sense in a particular situation. DWT also makes content from premium vendors available. DWT’s system can tap into information that most Web indexing systems cannot process. “Few people realize that Google often indexes the HTML or XML and may not follow links deeper than three or four levels down.” The company told Beyond Search, “Our technology searches the index created by a government Web site, which is not limited to three or four levels. As a result, our Biznar service includes information that is not available in other public-facing search systems.”
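To see why crawl depth matters to the company’s claim, here is a toy breadth-first crawler with a depth limit, run over a made-up link graph. The graph, page names, and the limit of three are illustrative assumptions; the point is only that pages beyond the limit never reach the crawler, while a site’s own index can still list them.

```python
from collections import deque


def crawl(links: dict[str, list[str]], start: str, max_depth: int) -> set[str]:
    """Breadth-first crawl that never follows links more than max_depth hops from start."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        if depth == max_depth:
            continue  # depth limit reached: do not follow this page's links
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return seen


# Hypothetical site: each page links one level deeper.
links = {"home": ["reports"], "reports": ["2008"], "2008": ["q4"], "q4": ["filing-123"]}
site_index = {"home", "reports", "2008", "q4", "filing-123"}  # what the site's own search covers

crawled = crawl(links, "home", max_depth=3)
print(sorted(site_index - crawled))  # ['filing-123']
```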
Biznar is built upon DWT’s federated search system. Biznar takes full advantage of proprietary deduplication methods so you don’t see the same article multiple times. The system allows point and click sorting. The system suggests related content. You can limit a query to specific sources. If you want to access premium content from commercial publishers, DWT offers a for-fee service that gets you access to these high-value information sources.
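DWT’s deduplication methods are proprietary and not described. As a rough sketch of the general idea only, the code below treats two results as the same article when their normalized URLs or normalized titles match; `url_key`, `title_key`, and `dedupe` are hypothetical helpers, and the sample records are invented.

```python
import re
from urllib.parse import urlsplit


def url_key(url: str) -> str:
    """Normalize a URL: ignore the scheme, host case, a leading 'www.', and a trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower().removeprefix("www.")
    return host + parts.path.rstrip("/")


def title_key(title: str) -> str:
    """Normalize a title: lowercase it and collapse punctuation and spaces."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def dedupe(hits: list[dict[str, str]]) -> list[dict[str, str]]:
    """Keep the first copy of each article, matching on normalized URL or title."""
    seen_urls: set[str] = set()
    seen_titles: set[str] = set()
    unique = []
    for hit in hits:
        u, t = url_key(hit["url"]), title_key(hit["title"])
        if u in seen_urls or t in seen_titles:
            continue  # duplicate of an article already kept
        seen_urls.add(u)
        seen_titles.add(t)
        unique.append(hit)
    return unique


hits = [
    {"url": "https://www.nytimes.com/a/", "title": "Court Rules on Liability"},
    {"url": "http://nytimes.com/a", "title": "Court rules on liability"},
    {"url": "https://blogs.example.com/x", "title": "Court rules on liability!"},
    {"url": "https://wsj.com/b", "title": "Creditors Push Back"},
]
print(len(dedupe(hits)))  # 2
```

Matching on titles alone can merge distinct articles that share a generic headline, so a production system would presumably combine more signals than these two keys.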
Other operations available to you include:
- The ability to sort results by source, title, rank, author, or date (as mentioned above)
- Alerts which you can receive on a daily, weekly, or monthly basis. The Alerts are delivered via email or via an RSS feed
- Ability to select specific results to be saved, printed, or emailed
- Automatic clustering so you can “drill down” into search results by topic, author, publication, or date
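The post does not explain how Biznar (or Clusty) builds its clusters, and, as a commenter notes, a metasearch engine sees only titles and snippets rather than whole pages. A crude label-first sketch, offered as an assumption and not as either company’s algorithm: treat terms that appear in at least two snippets as cluster labels, and let each label collect the results that contain it. The stopword list and sample snippets are made up.

```python
import re
from collections import Counter

STOPWORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to"}


def cluster_by_terms(snippets: list[str], query: str, k: int = 3) -> dict[str, list[int]]:
    """Label-first clustering: frequent shared terms become labels; a result may sit in several clusters."""
    excluded = STOPWORDS | set(query.lower().split())
    doc_terms = [set(re.findall(r"[a-z]+", s.lower())) - excluded for s in snippets]
    # Document frequency: count each term at most once per snippet.
    df = Counter(term for terms in doc_terms for term in terms)
    labels = [term for term, count in df.most_common() if count >= 2][:k]
    return {label: [i for i, terms in enumerate(doc_terms) if label in terms] for label in labels}


snippets = [
    "Bankruptcy court ruling on director liability",
    "Court filing deadline extended",
    "Liability insurance and the court",
    "Insurance premiums rise",
]
print(cluster_by_terms(snippets, "bankruptcy liability"))
# {'court': [0, 1, 2], 'insurance': [2, 3]}
```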
You can also save your preferences so that Biznar runs queries and generates results that meet your specific requirements. The system also includes a feature that most vertical search systems lack: the results list shows the number of results from each source.
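Showing the number of results per source is, at heart, a count over the merged list, and the same count works for the author, publication, and date facets listed above. A minimal sketch with made-up records; `facet_counts` is a hypothetical name:

```python
from collections import Counter


def facet_counts(hits: list[dict], fields=("source", "author", "year")) -> dict[str, list[tuple]]:
    """For each field, count results per value, most frequent first (ties keep first-seen order)."""
    return {f: Counter(h[f] for h in hits if f in h).most_common() for f in fields}


hits = [
    {"source": "Wall Street Journal", "year": 2008, "author": "Smith"},
    {"source": "Business Week", "year": 2008},
    {"source": "Wall Street Journal", "year": 2007, "author": "Jones"},
]
print(facet_counts(hits)["source"])
# [('Wall Street Journal', 2), ('Business Week', 1)]
```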
Biznar’s Alert service allows a user to save a query and its preferences. The system then automatically generates alerts for that user. The queries run as often as the user requires, anywhere from once a day to once a month, and the alerts are delivered through email or RSS. Mr. Lederman said, “In my Biznar alerts, I find information that I don’t find on Google. I do all of my competitor research by setting up alerts on Biznar. All I have to do is check my email.”
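The post does not describe how alerts are implemented. One plausible design, sketched here with hypothetical names, stores the saved query with its frequency and the results already delivered, so each run reports only new items. “Monthly” is approximated as 30 days, the search function is a stand-in, and email or RSS delivery is left out.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Callable

# Assumed schedule; "monthly" is approximated as 30 days for simplicity.
INTERVALS = {
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
    "monthly": timedelta(days=30),
}


@dataclass
class Alert:
    query: str
    frequency: str  # "daily", "weekly", or "monthly"
    last_run: date
    seen: set[str] = field(default_factory=set)

    def is_due(self, today: date) -> bool:
        """An alert is due once its interval has elapsed since the last run."""
        return today - self.last_run >= INTERVALS[self.frequency]

    def run(self, today: date, search: Callable[[str], list[str]]) -> list[str]:
        """Re-run the saved query and return only URLs not delivered before."""
        fresh = [url for url in search(self.query) if url not in self.seen]
        self.seen.update(fresh)
        self.last_run = today
        return fresh


alert = Alert("acme corp competitors", "weekly", last_run=date(2009, 1, 1))
print(alert.is_due(date(2009, 1, 5)))  # False: only four days have passed
print(alert.run(date(2009, 1, 8), lambda q: ["u1", "u2"]))  # ['u1', 'u2']
print(alert.run(date(2009, 1, 15), lambda q: ["u1", "u2", "u3"]))  # ['u3']
```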
Because DWT performs a metasearch (sending your query to multiple search systems), you can either wait for each search engine to return results to your results page or stop the data harvesting and review the results that came back first. I prefer to wait until the result list builds, but several members of the Beyond Search team found that the “quick results” feature delivered the information they needed. The key point is that you control how the DWT system works for you. More information about the system is available here. Information Today’s description of the Biznar service is here. The system was favorably mentioned in the December 2008 Information Advisor newsletter, Volume 20, Number 12.
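The “quick results” option resembles waiting on a set of concurrent source queries with a timeout and cancelling whatever has not answered. A minimal asyncio sketch, where `slow_source` and the delays are invented for illustration and say nothing about DWT’s actual timing:

```python
import asyncio


async def slow_source(name: str, delay: float) -> list[str]:
    """Hypothetical connector that answers after `delay` seconds."""
    await asyncio.sleep(delay)
    return [f"{name}-result"]


async def quick_results(sources: list[tuple[str, float]], timeout: float) -> list[str]:
    """Return results from sources that finish within `timeout`; cancel the rest."""
    tasks = [asyncio.create_task(slow_source(name, delay)) for name, delay in sources]
    done, pending = await asyncio.wait(tasks, timeout=timeout)
    for task in pending:
        task.cancel()  # stop harvesting from slow sources
    await asyncio.gather(*pending, return_exceptions=True)  # let cancellations finish cleanly
    # Collect in the original source order for a stable display.
    return [r for task in tasks if task in done for r in task.result()]


print(asyncio.run(quick_results([("fast", 0.01), ("slow", 1.0)], timeout=0.2)))  # ['fast-result']
```

Passing `timeout=None` to `asyncio.wait` corresponds to the “wait until the result list builds” choice.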
About Deep Web
Deep Web Technologies (http://www.deepwebtech.com) creates custom, sophisticated federated search solutions for clients who demand precise, accurate results. Founded by industry thought leader and “deep web” pioneer Abe Lederman, Deep Web Technologies created the powerful Explorit Research Accelerator, software that searches, retrieves, aggregates and analyzes content from deep web databases – data that is inaccessible to general search engines. Serving Fortune 500 companies, the Science.gov Alliance, the U.S. Department of Energy, the Defense Technical Information Center, scitopia.org, WorldwideScience.org and a variety of research and library alliances, Deep Web Technologies has a solid reputation as the “researcher’s choice” for its advanced, agile information discovery tools. Deep Web Technologies is based in Santa Fe, New Mexico, where it has earned four Flying 40 Awards as one of the fastest growing high-tech companies.
Give Biznar a test drive. We found it useful and a definite improvement over the undifferentiated results in the higher-profile Web search systems. Keep in mind that a for-fee version can deliver premium content, so you can balance Web information with information from established commercial publishers of business information.
Stephen Arnold, January 13, 2009
Comments
7 Responses to “Deep Web Technologies’ Vertical Search for Business Information”
Thanks for this article. I have a question regarding the methods used by DWT to federate results. Usually when viewing federated results, the content sources from the remote data stores are not integrated into one result set. For example, if you are federating the results from, say, 20 sources, the application will need to query the 20 remote sources and will receive 20 different result sets. Any idea how the task of merging the distinct result sets into one result set is performed?
My next question is about the “clustered information” navigation displayed in the left-hand column of DWT’s results page. Isn’t this faceted navigation rather than clustered navigation? Clustered navigation, as displayed in Clusty, creates the navigation hierarchy directly from the unstructured content of the documents by analyzing the key concepts of the result set. Faceted navigation, by contrast, creates the hierarchy from structured metadata already residing in the documents. It would seem that the DWT example must be a faceted hierarchy, since this is a federated solution. In a strictly federated approach the index does not reside locally and therefore is not amenable to clustering analysis. Thanks again for the article.
No metasearch engine (including Clusty, which is also a metasearch engine) is able to do full clustering of results, because they don’t see the whole source of the page.
In that sense Clusty, Biznar or our PolyMeta search engines are very similar.
Thanks to all of the above for an edifying disquisition on arcane matters.
I loooooove all of this about Biznar:
* The ability to sort results by source, title, rank, author, or date
* Alerts which you can receive on a daily, weekly, or monthly basis. The Alerts are delivered via email or via an RSS feed
* Ability to specify specific results to be saved, printed, or emailed
* Automatic clustering so you can “drill down” into search results by topic, author, publication, or date
particularly the ability to sort by date and the email alerts. I have set up alerts for search engine news and Web 2.0 news, and they work like a dream. I work in a medical library, so I also love Biznar’s sibling, Mednar.
see also Masterseek business search http://www.masterseek.com
49,876,923 company profiles worldwide
Great article! It is worth noting that if you are looking for deep Web tools for finding resources beyond scientific information, you should check out http://www.virtualprivatelibrary.com, which offers over 50 different subject directories and search engines built and updated with proprietary bot technology.
Great article. Another vertical search engine worth checking out is LionSurfer: http://www.lionsurfer.com
LionSurfer is a human-powered directory-based search engine. Users submit URLs into categories in the directory. Using LionSurfer’s algorithm, the URLs are placed into the category that received the most submissions. LionSurfer is integrated with Google’s search technology to create the most useful tool for finding information on the web.
Why don’t we index deep Web content by the fields in the search form? After all, we search the deep Web site according to those fields.