Microsoft SharePoint: Controlled Term Functionality

June 13, 2012

The article “SharePoint Search, Synonyms, Thesaurus, and You” provides a useful summary of Microsoft SharePoint’s native support for controlled term lists. Today, the buzzwords taxonomy and ontology are used to refer to term lists which SharePoint can use to index content. Term lists may consist of company-specific vocabulary, the names of people and companies with which a firm does business, or formal lists of words and phrases with “Use for” and “See also” cross references.

The importance of a controlled term list is often lost when today’s automated indexing systems process content. Almost any search system benefits when the content processing subsystem can use a controlled term list as well as the automated methods baked into the indexer.

In this TechGrowingPains write up, the author says:

A little known, and interesting, feature in SharePoint search is the ability to create customized thesaurus word sets. The word sets can either be synonyms, or word replacements, augmenting search functionality. This ability is not limited to single words; it can also be extended to specific phrases.

The article explains how controlled term lists can be used to assist a user in formulating a query. The method is called “replacement words”. The idea of suggesting terms is a good one, and many users find it a time saver when doing research. The synonym expansion function is mentioned as well. SharePoint can insert broader or substitute terms into a user’s query, which, depending on whether terms are expanded or replaced, increases or decreases the size of the result set.

The centerpiece of the article is a recipe for activating this functionality. A helpful code snippet is included as well.
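
The article’s snippet is SharePoint specific. As a rough, hypothetical illustration of the general mechanics only, the Python sketch below shows how synonym expansion and word replacement change a query before it runs; the term lists and function names are invented and are not SharePoint’s thesaurus format or the article’s code.

```python
# Illustrative sketch only: generic query-time synonym expansion and word
# replacement. The term lists below are hypothetical, not SharePoint config.

# Any term in a synonym set is expanded to the full set at query time.
SYNONYMS = {
    "laptop": {"laptop", "notebook"},
    "notebook": {"laptop", "notebook"},
}

# Replacement pairs swap the typed term for a preferred term or phrase.
REPLACEMENTS = {
    "IE": "Internet Explorer",
}

def rewrite_query(query: str) -> str:
    """Return the query with replacements applied and synonyms OR'ed in."""
    rewritten = []
    for term in query.split():
        if term in REPLACEMENTS:
            rewritten.append(REPLACEMENTS[term])
        elif term in SYNONYMS:
            rewritten.append("(" + " OR ".join(sorted(SYNONYMS[term])) + ")")
        else:
            rewritten.append(term)
    return " ".join(rewritten)

if __name__ == "__main__":
    print(rewrite_query("cheap laptop"))  # cheap (laptop OR notebook)
    print(rewrite_query("IE settings"))   # Internet Explorer settings
```

Note the different effects: expansion tends to widen the result set, while replacement simply substitutes a preferred term, which can narrow it.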

If you want additional technical support, let us know. Our Search Technologies team has deep experience in Microsoft SharePoint search and customization. We can implement advanced controlled term features in almost any SharePoint system.

Iain Fletcher, June 13, 2012

Autonomy Offers Automatic Classification and Taxonomy Generation

May 7, 2012

Conceptualizing how data is stored and organized in an age ruled by unstructured content and meta-tags can prove overwhelming. We found a useful source of information from Autonomy, which explains its Automatic Classification and Taxonomy Generation offering.

With their eye on functionality, IDOL’s classification solutions help users to circumvent issues that have arisen in a time of exponential data growth.

The offering includes Taxonomy Libraries, Automatic Categorization and Channels, and the Autonomy Collaborative Classifier. The company’s website clearly delineates how these elements work.

The website states the following information regarding Taxonomy Libraries:

“Built by experienced knowledge engineers using best practices learned through hundreds of consulting engagements, Autonomy taxonomies let organizations rapidly deploy industry-standard taxonomies that can be combined with your corporate taxonomies or easily customized to meet company and industry-specific requirements. Each Autonomy taxonomy is based on industry standards, and built using IDOL’s conceptual analysis that provides the highest level of accuracy.”

IDOL includes a variety of taxonomies, ranging from biotechnology to financial services: a comprehensive solution, indeed. Overall, IDOL seems equipped to eliminate the time-consuming manual intervention required in the past. But open source alternatives exist and should be considered by procurement teams.

Megan Feil, May 7, 2012

Sponsored by Ikanow

Protege 4.2 Now Available

May 5, 2012

Version 4.2 (beta) of Protégé from Stanford University is now available. The open source application serves as an ontology editor and knowledge-base framework. The product description states:

“The Protégé platform supports two main ways of modeling ontologies via the Protégé-Frames and Protégé-OWL editors. Protégé ontologies can be exported into a variety of formats including RDF(S), OWL, and XML Schema.

“Protégé is based on Java, is extensible, and provides a plug-and-play environment that makes it a flexible base for rapid prototyping and application development.

“Protégé is supported by a strong community of developers and academic, government and corporate users, who are using Protégé for knowledge solutions in areas as diverse as biomedicine, intelligence gathering, and corporate modeling.”

The editor can be customized to provide domain-friendly support for creating knowledge models and entering data. The National Library of Medicine supports Protégé’s biomedical ontologies and knowledge bases, which serve as national resources. The editor is a core component of The National Center for Biomedical Ontology.

Do taxonomy vendors face the open source ogre?

Cynthia Murrell, May 5, 2012

Sponsored by Ikanow

Will Harvard Library Jettison Paid Access Academic Journals?

May 3, 2012

In what could be another step toward knowledge failure, BoingBoing reports “Harvard Library to Faculty: We’re Going Broke Unless You Go Open Access.” Struggling with the high costs of academic journal access fees, the Harvard Library Faculty Advisory Council warns that it cannot sustain the library’s paid scholarly subscriptions.

There’s no doubt that these charges are out of control, and steadily encroaching on the budgets for other acquisitions. Writer Cory Doctorow quotes the Council’s Memorandum on Journal Pricing:

“Harvard’s annual cost for journals from these providers now approaches $3.75M. . . . Some journals cost as much as $40,000 per year, others in the tens of thousands. Prices for online content from two providers have increased by about 145% over the past six years, which far exceeds not only the consumer price index, but also the higher education and the library price indices.”

We understand that the library must control costs. It is unfortunate, however, that such knowledge may no longer be at students’ fingertips. The open access academic world is still sparsely populated, and the Council makes this plea in hope of a richer open access community in the future:

“It’s suggesting that faculty make their research publicly available, switch to publishing in open access journals and consider resigning from the boards of journals that don’t allow open access.”

Perhaps the scholarly open access options will grow, in time. In the meanwhile, it will be the students who miss out on key knowledge.

Cynthia Murrell, May 3, 2012

Sponsored by PolySpot

MuseGlobal and Info Library Team for Mobile Access to Libraries

May 3, 2012

As book retailers shut down amid the rise of e-books and tablets, many worry about what will become of our public libraries. Info Library and Information Solutions recently reported on a partnership that may provide a solution to this problem in the article “Muse Global and Info Library and Information Solutions on Mobile Search Platform.”

According to the write-up, MuseGlobal and Info Library and Information Solutions have come together to make libraries more mobile friendly by offering a custom mobile search platform using MuseGlobal’s cloud-based mobile search interface and platform.

Info Library and Information Solutions also brings quite a bit to the table. Kristina Bivens, MuseGlobal’s CEO, stated:

“Info Library and Information Solutions is well known for their end-user oriented product focus in delivering innovative technology solutions that help libraries serve, interact with and empower users with customizable, on-demand information discovery tools. The NOW platform clearly reflects this commitment and we are delighted to collaborate with Info Library and Information Solutions in extending the NOW platform’s offerings to bring together all of the library’s collections, third-party content, and custom services in one convenient mobile interface.”

MuseGlobal’s technology will give libraries a platform that is easy to implement and offers users access to the entire catalog without requiring additional time and resources. Sounds like an excellent idea to me.

Jasmine Ashton, May 3, 2012

Sponsored by Ikanow

Is the End Approaching for Commercial Metadata Vendors?

April 26, 2012

This is a very interesting move, one that may have implications for the organizations which sell library metadata. Joho the Blog reports, “‘Big Data for Books’: Harvard Puts Metadata for 12M Library Items into the Public Domain.” We learn from the write up:

“Harvard University has today put into the public domain (CC0) full bibliographic information about virtually all the 12M works in its 73 libraries. This is (I believe) the largest and most comprehensive such contribution. The metadata, in the standard MARC21 format, is available for bulk download from Harvard. The University also provided the data to the Digital Public Library of America’s prototype platform for programmatic access via an API. The aim is to make rich data about this cultural heritage openly available to the Web ecosystem so that developers can innovate, and so that other sites can draw upon it.”

Wow. Now, Harvard does ask users to respect community norms, like attributing sources of metadata. Blogger David Weinberger notes that licensing issues have held up the release of library metadata, and that this move makes the metadata of many, many of the most-used library items accessible.

What will happen next? Will the sellers of library metadata fight back?

Cynthia Murrell, April 26, 2012

Sponsored by PolySpot

OpenText Offers Content Auto Classification Solution

April 23, 2012

Open Text recently reported on a new transparent and defensible auto-classification solution designed for records managers in the release “Open-Text Auto Classification.”

According to the article, very few companies have a sound information governance strategy with appropriate records management services in place, and they therefore fail to dispose of unstructured content that is no longer in use.

In response to this issue, the article states:

“At the core, the issue is that content needs to be classified or understood in order to determine why it must be retained, how long it must be retained and when it can be dispositioned.  Managing the retention and disposition of information reduces litigation risk, it reduces discovery and storage costs, and it ensures organizations maintain regulatory compliance.”

The article goes on to explain why many end users fail to go through the tedious process of cataloging and managing their information and then advocates for Open Text’s auto classification solution. We found this article to be an interesting explanation of the Nstein technology with some new twists and recommend it as a great read for those interested in understanding records management more thoroughly.

Jasmine Ashton, April 23, 2012

Sponsored by PolySpot

Open Source Medical Controlled Term Management: Apelon DTS 4.0

March 26, 2012

Apelon Medical Terminology in Practice recently posted a news release introducing the latest version of its open source terminology management software, titled “Apelon Introduces Distributed Terminology System 4.0.”

According to the article, Apelon’s latest DTS is a comprehensive open source solution for the acquisition, management, and practical deployment of standardized healthcare terminologies. It is built on the JEE platform and allows for simplified integration into existing enterprise systems.

The article states:

DTS users easily manage the complete terminology lifecycle. The system provides the ability to transparently view, query, and browse across terminology versions. This facilitates the management of rapidly evolving standards such as SNOMED CT, ICD-10-CM, LOINC and RxNorm, and supports their use for longitudinal electronic health records. Local vocabularies, subsets and cross-maps can be versioned and queried in the same way, meaning that DTS users can tailor and adapt standards to their particular needs.

The growing volume of technical terminology in the medical industry that must be referenced quickly and accurately has driven the need for enhanced terminology management tools. The latest version of this software is easier to use than its predecessors and will help even more institutions integrate the latest decision support technologies into their daily work.

Jasmine Ashton, March 26, 2012

Sponsored by Pandia.com

Metadata: To the Roots!

February 15, 2012

According to the Computer Weekly article “Diving Deeper than Metadata, Down to ‘Contextual’ Metadata,” content management isn’t what it used to be. Social business tools have now entered the corporate world, and they are a crucial part of content management. The article asserts:

“Systems of record themselves now face the additional challenge of not only tracking a firm’s own processes, but also accommodating for what Forrester Research defines as “out of process” applications from third parties or those that only happen infrequently.”

This new form of analytics is referred to as contextual analytics. IBM uses Lucene search in its IBM Content Analytics program, which uses “annotators” to help define content management metatags. According to IBM: “Content analytics solutions can understand the meaning and context of human language and rapidly process information to improve knowledge-driven search and surface new insights from your enterprise content.”
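
As a generic sketch of the annotator idea only (the names and rules below are invented and do not reflect IBM Content Analytics’ actual interfaces), the Python fragment below runs two toy annotators over a passage and collects the metadata tags they emit.

```python
# Generic, hypothetical sketch of annotator-style tagging; not IBM's API.
import re

def date_annotator(text):
    # Tag ISO-style dates as a crude example of contextual metadata.
    return [("DATE", m.group()) for m in re.finditer(r"\d{4}-\d{2}-\d{2}", text)]

def product_annotator(text):
    # Tag terms drawn from a small controlled list (hypothetical values).
    controlled_terms = ["Lucene", "Content Analytics"]
    return [("PRODUCT", term) for term in controlled_terms if term in text]

def annotate(text, annotators):
    # Each annotator contributes its own (tag, value) pairs.
    tags = []
    for annotator in annotators:
        tags.extend(annotator(text))
    return tags

if __name__ == "__main__":
    sample = "Content Analytics indexed the filing dated 2012-02-15 using Lucene."
    print(annotate(sample, [date_annotator, product_annotator]))
```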

Looks like IBM is focused on digging deep and getting to the root of the problem. Digging is good but the scattering of service after service, solution after solution, strikes me as a trifle untidy.

April Holmes, February 15, 2012

Sponsored by Pandia.com

Exogenous Complexity 1: Search

January 31, 2012

I am now using the phrase “exogenous complexity” to describe systems, methods, processes, and procedures which are likely to fail due to outside factors. This initial post focuses on indexing, but I will extend the concept to other content centric applications in the future. Disagree with me? Use the comments section of this blog, please.

What is an outside factor?

Let’s think about value-adding indexing, content enrichment, or metatagging. The idea is that unstructured text contains entities, facts, bound phrases, and other identifiable items. A keyword search system is mostly blind to the meaning of a number in the form nnn nn nnnn, which in the United States is the pattern for a Social Security Number. There are similar patterns in Federal Express tracking numbers, financial account numbers, and other types of sequences. The idea is that a system will recognize these strings and tag them appropriately; for example:

nnn nn nnnn Social Security Number

Thus, a query for Social Security Numbers will return strings of digits matching the pattern. The same logic can be applied to certain entities. With the help of a knowledge base, Bayesian numerical recipes, and other techniques such as synonym expansion, the system can determine that a query for “Obama residence” should return “White House,” or that a query for “the White House” should return links to the Obama residence.
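
To make the tagging step concrete, here is a minimal Python sketch that flags strings matching a Social Security Number-style pattern. The pattern and tag name are illustrative only; production content processing pipelines layer knowledge bases and statistical methods on top of simple rules like this.

```python
# Minimal, illustrative pattern tagger; the regular expression and tag
# name are examples, not a production rule set.
import re

PATTERNS = {
    # nnn nn nnnn, allowing spaces or hyphens between the groups
    "SOCIAL_SECURITY_NUMBER": re.compile(r"\b\d{3}[- ]\d{2}[- ]\d{4}\b"),
}

def tag_entities(text):
    """Return (tag, matched_string) pairs for every pattern hit."""
    hits = []
    for tag, pattern in PATTERNS.items():
        hits.extend((tag, m.group()) for m in pattern.finditer(text))
    return hits

if __name__ == "__main__":
    print(tag_entities("Employee record 123 45 6789 was flagged."))
    # [('SOCIAL_SECURITY_NUMBER', '123 45 6789')]
```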

One wishes that value added indexing systems were as predictable as a kabuki drama. What vendors of next generation content processing systems participate in is a kabuki which leads to failure two thirds of the time. A tragedy? It depends on whom one asks.

The problem is that companies offering automated solutions for value-adding indexing, content enrichment, or metatagging are likely to fail for three reasons:

First, there is the issue of humans who use language in unexpected or what some poets call “fresh” or “metaphoric” ways. English is synthetic in that any string of sounds can be used in quite unexpected ways. Whether it is the use of the name of the fruit “mango” as a code name for software, or the conversion of a noun like information into a verb like informationize, which appears in Japanese government English language documents, the automated system may miss the boat. When the boat is missed, continued iterations try to arrive at the correct linkage, but as anyone who has used fully automated systems or who paid attention in math class knows, recovery from an initial error can be time consuming and sometimes difficult. Therefore, an automated system, no matter how clever, may find itself fooled by the stream of content flowing through its content processing work flow. The user pays the price because false drops mean more work and suggestions which are not just off the mark but also difficult for a human to figure out. You can get the inside dope on why poor suggestions are an issue in Thinking, Fast and Slow.

Read more
