Metadata on Unstructured Data Increases Findability
November 9, 2012
Big data has held the media spotlight long enough to dispel any initial suspicion that it was a passing trend. Now the headlines trumpet how to process the massive amounts of unstructured data flooding the internet and how to benefit from it.
Computer Weekly’s article “How to Manage Unstructured Data for Business Benefit” explains how the next data evolution will be harnessing the benefits of both unstructured and structured data:
“There is as much value in unstructured data in terms of what customers are thinking on the web and what businesses can derive from other organizations’ data. It requires an understanding of the type of information the business is looking for and the kinds of insights business managers are hoping to draw from the data. The more considered the query, and the more focused the search, the better the results. This rule applies to both structured and unstructured data.”
Applying metadata to unstructured data opens up a profound new way to increase the findability of enterprise content, but the right solution is mandatory for success. Businesses looking for secure search and enterprise accessibility will find Intrafind provides customized solutions that combine to organize, tag and ultimately reveal relevant information to users of their enterprise search solutions. Powerful tools like this provide flexible options for data processing that put the power to increase efficiency and ROI back in the hands of the user.
Jennifer Shockley, November 9, 2012
Sponsored by ArnoldIT.com, developer of Augmentext
Harsh Words for SKOS eXtension for Labels
November 8, 2012
In 2009, the W3C published the SKOS-XL (SKOS eXtension for Labels). Now, Voyages of the Semantic Enterprise asks, “Who Needs Skos-XL? Maybe No One.” What does writer Irene Polikoff have against the SKOS extension?
The post begins at the beginning, with an explanation of the Triangle of Reference and the concepts behind the open source SKOS. Polikoff goes on to describe the purpose of SKOS-XL: to allow concept labels to collect their own metadata. This, she says, unnecessarily complicates vocabulary management. She writes:
“Labels are not strings as in SKOS proper, but RDF resources with their own identity. Each label can have only one literal form; this is where the actual text string (the name) goes. The literal form is not one per Label per language as with SKOS’s constraint for assigning preferred labels, but one per Label. So, to accommodate different languages, different label resources must be created. At the same time, there can be multiple Label resources with the same literal form (for example, two different Label resources with the literal form ‘Mouse’). Even a simple SKOS-XL vocabulary is considerably bulkier than its SKOS alternative. Since SKOS-XL format takes far more space, storage, import/export and performance of search and query can become an issue for larger vocabularies.”
Labels can be linked to each other as well as to their concepts, the article notes, further increasing complexity. Also, the same text label may be applied to different entities, potentially leading to confusion. Furthermore, the write up points to a couple of specific integrity clashes between SKOS and SKOS-XL. See the article for more details.
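The size difference Polikoff describes can be illustrated with a minimal sketch. It assumes triples are modeled as plain Python tuples rather than with an RDF library, and the `ex:` identifiers and the German and French labels are hypothetical examples, not taken from her post. The `skos:` and `skosxl:` property names are the standard vocabulary terms.

```python
# Triples are plain (subject, predicate, object) tuples; no RDF library is used.
# The ex: identifiers are hypothetical; skos:, skosxl: and rdf: terms are standard.

def skos_label(concept, text, lang):
    """Plain SKOS: the label is a literal attached directly to the concept."""
    return [(concept, "skos:prefLabel", (text, lang))]

def skosxl_label(concept, label_id, text, lang):
    """SKOS-XL: the label is its own resource with exactly one literal form."""
    return [
        (concept, "skosxl:prefLabel", label_id),
        (label_id, "rdf:type", "skosxl:Label"),
        (label_id, "skosxl:literalForm", (text, lang)),
    ]

# One preferred label per language for a single concept.
labels = [("Mouse", "en"), ("Maus", "de"), ("Souris", "fr")]

skos_triples = []
skosxl_triples = []
for i, (text, lang) in enumerate(labels):
    skos_triples += skos_label("ex:mouse", text, lang)
    # Each language needs its own Label resource in SKOS-XL.
    skosxl_triples += skosxl_label("ex:mouse", f"ex:label{i}", text, lang)

print(len(skos_triples), len(skosxl_triples))  # prints: 3 9
```

In this toy case every label costs three statements instead of one, before any metadata is attached to the label resources themselves.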
Polikoff closes by offering to help readers who think SKOS-XL is their only choice for vocabulary management to find a simpler solution. Will many users agree it is wise to do so?
Cynthia Murrell, November 08, 2012
Sponsored by ArnoldIT.com, developer of Augmentext
Effective Knowledge Management Requires Enterprise Search
October 18, 2012
The post goes on to elaborate on another study with similar results:
“Not enough for you? Seven years ago, an article ran in NewScientist. It highlights a study done at King’s College London, that showed in today’s business setting, marked by emails, smart phone connections,– the connected 24×7 reality of today, the average IQ of an individual drops by about 10 points. The study went on to conclude, (and this is my favorite part), ‘Even smoking dope has less effect on your ability to concentrate on the task in hand.’”
Knowledge management is obviously powerful, but requires one to step back and consider available options and information. Enterprise search is a key ingredient of knowledge management, and Intrafind offers best-in-class practices for secure search, including semantic linking and intelligent tagging.
Andrea Hayden, October 18, 2012
Sponsored by ArnoldIT.com, developer of Augmentext
The Metaprocess Puzzle
September 29, 2012
BeyeNetwork suggests one reason metadata is not implemented comprehensively or well: “Lack of Metaprocess Information Impedes Ability to Collect Metadata.” Writer and database management expert Bill Inmon pins the lack of enterprise-wide metadata primarily on a lack of metaprocess information. Metaprocess covers high-level descriptive details about a process, like its name, the technology that houses it, its input and output, and algorithmic variables. It is pointless, Inmon insists, to attempt to understand a large organization’s information flow without this information.
Why is metaprocess information so hard to come by? The article explains:
“It resides in the old legacy code. In COBOL. In assembler. In AS/400 modules. In PL/1. In technology that has not seen the light of day in decades. Once there were technicians that could be hired to read and go through the old code. Today those technicians have retired or have been promoted to management positions. In another generation, it won’t even be possible to find anyone who understands these older technologies. And by that time, SQL and C++ will be the old legacy technologies of the day.”
How does one solve a metaprocess problem? What is the meta-metaprocess? Inmon doesn’t really have an answer to that. He does suggest that, since legacy code is a form of text, someone may someday find a way to coax this information from a text editor. Anyone up for the challenge?
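For the curious, here is a rough sketch of what treating legacy code as text might look like. Inmon does not propose a method, so everything here is an assumption: the `Metaprocess` record, the `extract_metaprocess` helper, the sample COBOL fragment, and the regular expressions are all hypothetical, and real legacy code would need far more than pattern matching. Algorithmic variables are left out entirely.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Metaprocess:
    """Hypothetical record for the descriptive details the article lists."""
    name: str = ""
    technology: str = ""
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)

# Made-up COBOL fragment, used only to exercise the patterns below.
SAMPLE_COBOL = """\
       IDENTIFICATION DIVISION.
       PROGRAM-ID. PAYROLL.
       PROCEDURE DIVISION.
           OPEN INPUT EMPLOYEE-FILE.
           OPEN OUTPUT PAYCHECK-FILE.
"""

def extract_metaprocess(source, technology="COBOL"):
    """Pull a process name and its input/output files out of source text."""
    record = Metaprocess(technology=technology)
    name = re.search(r"PROGRAM-ID\.\s+([A-Z0-9-]+)", source)
    if name:
        record.name = name.group(1)
    record.inputs = re.findall(r"OPEN\s+INPUT\s+([A-Z0-9-]+)", source)
    record.outputs = re.findall(r"OPEN\s+OUTPUT\s+([A-Z0-9-]+)", source)
    return record

record = extract_metaprocess(SAMPLE_COBOL)
print(record.name, record.inputs, record.outputs)
```

The sketch only handles the simplest one-file `OPEN` statements; it is meant to show the shape of the idea, not a working legacy-code parser.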
Cynthia Murrell, September 29, 2012
Sponsored by ArnoldIT.com, developer of Augmentext
WAND Inc Makes Integration Plans with Nintex Workflow Official
August 30, 2012
Last week WAND, Inc. announced its plans to integrate with Nintex Workflow. WAND is credited with making search work better through its various taxonomies, and Nintex is the world’s leader in SharePoint workflow. Separately they are powerhouses in their respective fields, and combined they seem to have a lot to offer each other. The article “WAND adds automatic tagging to Nintex Workflow” sheds some more light on what this match up could mean:
“The DataFacet Automatic Annotation custom action for Nintex Workflow allows a user to automatically tag documents with taxonomy metadata as part of a workflow process. Users who have DataFacet and Nintex Workflow …will be able take advantage of this custom action to control when a document is automatically tagged and base conditional actions on those tags. Workflows can be configured to tag documents with any terminology that is stored in the SharePoint Term Store, including terms from any of WAND’s Foundation Taxonomies.”
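The pattern described in the quote, tagging a document automatically and then branching on the tags, can be sketched in a few lines. This is not the DataFacet or Nintex API; the term list, the `auto_tag` and `route` functions, and the queue names are all hypothetical stand-ins, and simple whole-word matching is assumed in place of whatever classification the product actually performs.

```python
import re

# Hypothetical stand-in for terms held in a term store.
TERM_STORE = ["Invoice", "Purchase Order", "Contract"]

def auto_tag(text, terms=TERM_STORE):
    """Return the terms that occur in the text as whole words, ignoring case."""
    found = []
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(term)
    return found

def route(tags):
    """Conditional action based on the tags; the queue names are made up."""
    if "Contract" in tags:
        return "legal-review"
    if "Invoice" in tags or "Purchase Order" in tags:
        return "accounts-payable"
    return "general-inbox"

document = "Attached is the invoice for purchase order 4471."
tags = auto_tag(document)
print(tags, route(tags))  # prints: ['Invoice', 'Purchase Order'] accounts-payable
```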
If everything runs smoothly for these two, this seems like a decent match up that has some great potential. I think it is safe to say that we can be expecting some great things from these guys in the coming years.
Edie Marie, August 30, 2012
Sponsored by ArnoldIT.com, developer of Augmentext
New Release of OpenCalais
August 12, 2012
OpenCalais gets a new release … soon. It has not come out as of this writing, but Calais’s blog announces that “A New OpenCalais Release On the Way.” It sounds like an upgrade that any site covering current events should be looking forward to. The post describes the enhancements:
“On the user-facing side of the equation you’ll see a number of new entities, facts and events related primarily to politics and intra and international conflict. Doesn’t look like either of those will be going away soon – so we thought they were worth implementing. You’ll see new information in candidates, Party Affiliations, Arms Purchases and a number of others.
“In addition to these new items, we’ve also enhanced our SocialTags feature for greater accuracy – in fact, you’ll see a number of accuracy improvements across the board.”
For those unfamiliar with the OpenCalais Web Service, it is a nifty free tool (“powered by” Thomson Reuters) that automatically incorporates semantic metadata into content. The best way to see what it does is to paste a chunk of text, any text, into their Document Viewer. The tool will tease out and insert links for topics, social tags, entities (like organizations or industry terms), and events & facts. After you’ve played with that, check out the examples of ways the technology has been implemented on the Showcase page.
It is acceptable to use OpenCalais for commercial purposes, as long as users abide by the Terms of Use, but a Calais Professional version is also available for power users.
Cynthia Murrell, August 12, 2012
Sponsored by ArnoldIT.com, developer of Augmentext
Open Source Data Treasure Trove
August 2, 2012
Hungering for open source data? The H Open reports, “Data on 500,000 Open Source Projects Available.” The trove comes from Ohloh, a directory of open source projects maintained by Black Duck Software. The project data is available under the Creative Commons Attribution 3.0 license. The write up reveals:
“The company also made a RESTful API available that allows information about the projects to be queried. Ohloh analyses projects from around 5,000 repositories, including GitHub, SourceForge, Google Code, kernel.org, Eclipse, Mozilla, and Apache.
“To use the API, users first have to register for a key and they can then query metrics such as the number of active contributors and commits, the number of lines of code, the main programming language used, and licensing information for the project. Black Duck uses the Ohloh database to identify items such as particularly active new projects or licensing trends.”
The folks at Black Duck are also working on a new version of their code search for Ohloh, now in beta; they hope the search will help improve links between project code and metadata.
Since its founding in 2002, Black Duck Software has been dedicated to supplying strategy, products, and services to enable the adoption of open source software on the enterprise scale. In fact, they boast that they are the leading provider of such solutions worldwide.
Cynthia Murrell, August 2, 2012
Sponsored by ArnoldIT.com, developer of Augmentext
The Factiva Metadata Strategy Explained
June 15, 2012
Information Age recently reported on a new data management solution for Dow Jones & Company in the article “The Metadata Strategy Behind News Search Service Factiva.”
According to the article, the news search service, known as Factiva, allows users to keep track of the latest developments in their industries. Originally launched in 1999, the system has since evolved to reflect the changing information consumption habits of target customers.
As text analytics technology has progressed over time, so too have the concepts it can be used to define. Today, Factiva can identify whether a news story relates to a change of management at a company or a bankruptcy, and can encode as much in the metadata.
Greg Merkle, creative director at Dow Jones’ Enterprise Media Group, explains:
“As the amount of information that Factiva can derive from its metadata is increasing, its search results are evolving beyond a simple list of relevant stories. We are developing situational solutions for specific business cases, such as supply chain analysis: if you have a list of supplier you rely on, you can do searches that not only show the news, but extract the facts from the stories and present them as interactive dashboard.”
Dow Jones is just one of many companies that have a metadata strategy in place to deal with the increasing amount of data put out by search engines daily. Does your company have one?
Jasmine Ashton, June 15, 2012
Sponsored by PolySpot
Mondeca Updates Linked Open Vocabularies
June 11, 2012
Mondeca has updated their Linked Open Vocabularies (LOV). LOV’s goal is to help Web vocabulary users and managers access the broad ecosystem of linked open vocabularies in the Linked Data Cloud. The site’s About page explains:
“The vocabularies we are about are the many dialects (RDFS and OWL ontologies) used in the growing linked data Web. . . . Not only does linked data leverage a growing set of vocabularies, but vocabularies themselves rely more and more on each other through reusing, refining or extending, stating equivalences, declaring metadata.
“LOV objective is to provide easy access methods to this ecosystem of vocabularies, and in particular by making explicit the ways they link to each other and providing metrics on how they are used in the linked data cloud, help to improve their understanding, visibility and usability, and overall quality.”
A vocabulary is worthy of inclusion in the LOV dataset if it is expressed in one of the Semantic Web ontology languages (RDFS or some species of OWL); is published and freely available on the Web; is retrievable by content negotiation from its namespace URI; and is small enough to be easily integrated and re-used, in part or as a whole, by other vocabularies. See this page for more on the LOV dataset and features.
Mondeca is a leading provider of solutions for the management of advanced knowledge structures: ontologies, thesauri, taxonomies, terminologies, metadata repositories, knowledge bases, and linked open data. Their products and services help clients in Europe and North America boost their information retrieval, analysis, and usability. The firm was founded in 1999 and is based in Paris, France.
Cynthia Murrell, June 11, 2012
Sponsored by PolySpot
GigaOM Discovers the Power of Beyond Search
May 18, 2012
Short honk. We’re thrilled. In addition to Microsoft, numerous azure chip consultants, and various SEO experts, the phrase “beyond search” has been discovered by a “real” news outfit. Navigate to “Beyond Search: Twitter Joins the Discovery Wave.” The point is that one cannot read Twitter. Great insight. We look forward to more semantic appropriations. Perhaps a “beyond search” column, a mysteries of GigaOM, or – our favorite – the calculating predator 2012.
Stephen E Arnold, May 18, 2012
Sponsored by HighGainBlog, where “real” journalists don’t bother to seek inspiration.