Which Is Better? Abstract or Full Text Search?

November 26, 2010

Please bear with us while we present a short lesson in the obvious: “Users searching full text are more likely to find relevant articles than searching only abstracts.”  A recent BMC Bioinformatics research article written by Jimmy Lin titled “Is Searching Full Text More Effective than Searching Abstracts?” explores exactly that.

So maybe we opened with the conclusion, but here is some background information.  Since it is no longer an anomaly to view a full-text article online, the author set out to determine if it would be more effective to search full-text versus only the short but direct text of an abstract.  The results:

“Experiments show that treating an entire article as an indexing unit does not consistently yield higher effectiveness compared to abstract-only search. However, retrieval based on spans, or paragraphs-sized segments of full-text articles, consistently outperforms abstract-only search. Results suggest that highest overall effectiveness may be achieved by combining evidence from spans and full articles.”

Yep, at the end of the day, searching a bigger bank of words will in fact increase your likelihood of a hit, at least when the text is searched in paragraph-sized spans rather than as one undifferentiated article.  The upshot is that the future must bring some solutions with it.  Because full-text articles are longer and the digital archive waiting to be tamed keeps growing, Lin predicts that clusters of multiple machines as well as distributed text retrieval algorithms will be necessary to handle the search requirements effectively.  Wonder who will be first in line to provide these services…
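For readers who want to see the difference between abstract-only, whole-article, and span-based scoring, here is a minimal sketch. It is not the ranking method used in Lin's paper; it assumes a toy "term density" score (the fraction of words in a text that match the query), and the sample article, the query, and the function names are all invented for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def density(query, text):
    """Toy score: fraction of tokens in `text` that are query terms."""
    terms = set(tokenize(query))
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in terms) / len(tokens)


def span_score(query, paragraphs):
    """Score an article by its best paragraph-sized span."""
    return max((density(query, p) for p in paragraphs), default=0.0)


def combined_score(query, paragraphs, weight=0.5):
    """Blend span evidence with whole-article evidence (weight is an assumption)."""
    full = density(query, " ".join(paragraphs))
    return weight * span_score(query, paragraphs) + (1 - weight) * full


# Hypothetical article: the abstract never mentions the query terms.
article = {
    "abstract": "We study protein interactions in yeast.",
    "paragraphs": [
        "Background on yeast genetics and prior work.",
        "The kinase inhibitor reduced growth in treated cultures.",
    ],
}
query = "kinase inhibitor"

abstract_only = density(query, article["abstract"])
span = span_score(query, article["paragraphs"])
full = density(query, " ".join(article["paragraphs"]))
combined = combined_score(query, article["paragraphs"])
```

In this made-up case the abstract scores zero, the whole article scores low because the matching words are diluted by everything else, and the best paragraph scores highest. That dilution is one plausible reading of why spans did better than whole articles in the quoted results.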

Sarah Rogers, November 26, 2010

Freebie

Inforbix Poised to Shake Up Engineering Design Search

November 3, 2010

In an exclusive interview with ArnoldIT.com, Oleg Shilovitsky, co-founder and CEO of Inforbix, provides an in-depth look at his information retrieval system for engineering and product design. His firm Inforbix has been keeping a low profile and is now beginning to attract the attention of engineering professionals struggling with conventional data management tools for parts, components, assemblies, and other engineered pieces.

Most search systems are blind to the data locked in engineering design tools and systems. For example, in a typical manufacturing company, a traditional search system would index the content on an Exchange server, email, proposals in Word files, and maybe some of the content residing in specialized systems used for accounts payable or inventory. When these items are indexed, most are displayed in a hit list like a Google results page or in a point-and-click interface with hot links to documents that may or may not be related to the user’s immediate business information need.

But what about the specific part needed for a motor assembly? How does one locate the drawing? Where are the data about the item’s mean time before failure? The semantic relationships between bits of product data located in multiple silos are missing. The context of information related to components in a manufacturing and production process is either ignored, not indexable, or presented as a meaningless item number and a numerical value.

That’s the problem Mr. Shilovitsky and his team of engineers have solved. With basic key word retrieval now a commodity, specialized problems remain. As Mr. Shilovitsky told me, “I think maybe we have solved a problem for the first time. We make manufacturing and production related data available in context.”

In the interview conducted on November 1, 2010, Mr. Shilovitsky said:

In my view, the most valuable characteristics of future systems will be “flexibility” and “granularity”. The diversity of data in manufacturing organization is huge. You need to be flexible to be able to crack the information retrieval. On the other side, businesses are driven by values and ROI. So, to be able to have a granular solution (don’t boil the ocean) in order to address a particular business problem is a second important thing.

He added:

Our system foundation combines flexibility and granularity with a deep understanding of product data in engineering and manufacturing. One of the problems of product development is a uniqueness of organizational processes. Every organization runs their engineering and development shop differently. They are using the same tools (CAD, CAM, CAE, data management tools, or an ERP system), but the combination is unique.

To read the full text of this exclusive interview, navigate to this link. For more information about this ground-breaking approach to a tough information problem, point your browser to www.inforbix.com.

Stephen E Arnold, November 3, 2010

Freebie

A Tagged Future

October 7, 2010

“The Myth of the Universal Tag and the Future of Digital Data Collection” provides an interesting view of the challenges of tag-based deployment. The point of the write up by Ensighten is to use a tag management system. Ensighten offers such a system. There are other vendors in this market as well; for example, Access Innovations.

The white paper explains how a tag management system can create technologically strong and financially wise benefits for end users. This will be especially interesting to those who invest in digital measurement, advertising, and marketing solutions. Another sharp focus is clarifying the need for tag management systems and tag management solutions.

For your own copy of the white paper, click here for the Ensighten write up. You will have to fill out the form for a free copy. Sounds interesting, and may be worth your time to look.

Our view is that tags are now proliferating. In the traditional database world, too many tags can create problems. Users can get confused when a tag generates false drops. The management of tags becomes more complicated. Without control from the get-go, tags have a tendency to become muddled.

Search is tough, and indiscriminate tagging makes the job harder for the user. Uncontrolled tags are often one consequence of carelessness. Whether a manual system or automated system is used to generate tags, getting the tagging method under control is Job #1. Getting the tags wrong means significant costs down the road, assuming the organization has the appetite to fix a problem on the right level in an appropriate manner. Cosmetics applied by azurini and former journalists won’t do the job in our experience.

Stephen E Arnold, October 7, 2010

Freebie

Deconfusing Taxonomists and Ontologists

October 1, 2010

“Skills of a Classy Taxonomist” tackles a subject that some English majors turned search experts often sidestep. This article explains “taxonomists build hierarchies and ontologist[s] determine classes or categories.” The key point is that “ontologies are neat and unambiguous and taxonomies are a bit messy.”

The most interesting part of the article asserts that a faceted taxonomy is more useful than a plain vanilla taxonomy. With a faceted taxonomy, the author asserts, some challenges are confronted and resolved; specifically:

  • Clarify specific terms by situation or function
  • Ease long-term maintenance issues
  • Facilitate sharing and importing of taxonomies.
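To make the idea of facets concrete, here is a small sketch of one way a faceted scheme could be represented. It is not taken from the article or from any vendor's product; the facets, the terms, the parts, and the function names are all made up for illustration.

```python
# Hypothetical controlled vocabulary: each facet has its own list of allowed values.
FACETS = {
    "function": {"fastening", "sealing"},
    "material": {"steel", "rubber"},
}

# Hypothetical items, each tagged once per facet instead of being
# forced into a single slot in one big hierarchy.
ITEMS = {
    "hex bolt": {"function": "fastening", "material": "steel"},
    "o-ring": {"function": "sealing", "material": "rubber"},
    "washer": {"function": "sealing", "material": "steel"},
}


def validate(items, facets):
    """Return (item, facet, value) triples that fall outside the vocabulary."""
    problems = []
    for name, tags in items.items():
        for facet, value in tags.items():
            if value not in facets.get(facet, set()):
                problems.append((name, facet, value))
    return problems


def find(items, **criteria):
    """Return the names of items matching every facet=value pair given."""
    return sorted(
        name
        for name, tags in items.items()
        if all(tags.get(facet) == value for facet, value in criteria.items())
    )


steel_parts = find(ITEMS, material="steel")
steel_seals = find(ITEMS, material="steel", function="sealing")
```

Because each facet is a short, independent list, a term only has to be clear within its own facet, and a single facet can be maintained or swapped out without touching the others.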

If you want to reduce your taxonomy hassles, you will want to navigate to “The Taxonomy Blog”. The product that delivers the benefits referenced is from TopQuadrant. There are some other solutions that work as well; for example, Access Innovations’ system and, for the code savvy, some open source components.

My view is that most organizations talk about taxonomies and ontologies and then find that the cost and effort to sustain a project as language changes makes the effort expendable. That’s too bad, but economic realities often force “good enough” tagging on hapless users.

Stephen E Arnold, October 1, 2010

Freebie
