January 22, 2014
Did you know that there was an open source version of ClearForest called Calais? Neither did we, until we read about it in an article posted on OpenCalais called, “Calais: Connect. Everything.” Along with a short instructional video is a text explanation of how the software works. The OpenCalais Web Service automatically creates rich semantic metadata, using natural language processing, machine learning, and other methods to analyze submitted content. A list of tags is generated and returned to the user for review, and the user can then apply them to other documents.
The metadata can be used in a variety of ways:
“The metadata gives you the ability to build maps (or graphs or networks) linking documents to people to companies to places to products to events to geographies to… whatever. You can use those maps to improve site navigation, provide contextual syndication, tag and organize your content, create structured folksonomies, filter and de-duplicate news feeds, or analyze content to see if it contains what you care about.”
The OpenCalais Web Service relies on a dedicated community to keep making progress and pushing the application forward. Calais takes the same approach as other open source projects, except this one is powered by Thomson Reuters.
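The analyze-extract-tag workflow the service describes can be sketched with a toy gazetteer-based extractor. This is purely illustrative: it does not reflect the actual OpenCalais API, and the gazetteer contents are our own stand-ins for the service's trained models.

```python
import re

# Hypothetical gazetteers standing in for OpenCalais's trained models.
GAZETTEERS = {
    "Company": {"Thomson Reuters", "ClearForest"},
    "City": {"New York", "London"},
}

def extract_tags(text):
    """Return sorted (entity_type, surface_form) pairs found in the text."""
    tags = []
    for entity_type, names in GAZETTEERS.items():
        for name in names:
            # Whole-word match so "London" is not found inside "Londonderry".
            if re.search(r"\b" + re.escape(name) + r"\b", text):
                tags.append((entity_type, name))
    return sorted(tags)

doc = "Thomson Reuters, headquartered in New York, acquired ClearForest."
print(extract_tags(doc))
```

The real service returns far richer output (relations, events, disambiguated entity URIs), but the shape is the same: submit content, get back a reviewable list of tags.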
January 10, 2014
When Netflix first launched, I read an article about how everyone’s individual movie tastes are different. No two are alike, and Netflix created an algorithm that managed to track each user’s queue down to the individual. It was scary and amazing at the same time. Netflix eventually decided to can the algorithm (or at least that is what they told us), but it remains a reminder that small traces of metadata can lead back to you. Threatpost, a Web site that tracks Internet security threats, reported on how “Stanford Researchers Find Connecting Metadata With User Names Is Simple.”
It has been claimed that anonymously generated user phone data cannot be traced back to an individual. Stanford researchers proved otherwise. The team started the Metaphone program, which collects data from volunteers with Android phones. The project gathers calls, text messages, and social network information so the Stanford Security Lab can study the connection between metadata and surveillance. The researchers selected 5,000 random numbers and were able to match 27 percent of them using Web sites people use every day.
The article states:
“ ‘What about if an organization were willing to put in some manpower? To conservatively approximate human analysis, we randomly sampled 100 numbers from our dataset, and then ran Google searches on each. In under an hour, we were able to associate an individual or a business with 60 of the 100 numbers. When we added in our three initial sources, we were up to 73,’ said Jonathan Mayer and Patrick Mutchler in a blog post explaining the results.”
The article also points out that if money were not a problem, the results would be even more accurate. Using a cheap data aggregator instead, the Stanford researchers accurately matched 91 out of 100 numbers. Data is not as protected or as anonymous as we thought. People are willing to share their whole lives on social media, but when security is mentioned they go bonkers over an issue like this? It is still a scary thought, but where is the line drawn between willingly shared information and privacy?
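The re-identification technique the researchers describe amounts to cross-referencing "anonymous" numbers against public sources. A minimal sketch, with an invented in-memory directory standing in for the Web sites and data aggregators they actually queried (all numbers and names here are made up):

```python
# A toy directory standing in for public Web sources (business listings,
# social profiles, people-search sites). Entirely fabricated data.
PUBLIC_DIRECTORY = {
    "555-0100": "Corner Pizza",
    "555-0111": "J. Smith",
    "555-0142": "City Clinic",
}

def match_rate(anonymized_numbers, directory):
    """Fraction of 'anonymous' numbers resolvable via public lookups."""
    matched = [n for n in anonymized_numbers if n in directory]
    return len(matched) / len(anonymized_numbers), matched

sample = ["555-0100", "555-0111", "555-0199", "555-0142"]
rate, matched = match_rate(sample, PUBLIC_DIRECTORY)
print(rate)  # 0.75 in this toy sample
```

The point of the study is that this lookup loop needs no special access: ordinary search engines got 60 percent, and a cheap aggregator pushed it past 90.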
Whitney Grace, January 10, 2014
December 26, 2013
For Obama’s 2012 re-election campaign, his team broke down data silos and moved all the data to a cloud repository. The team built Narwhal, a shared data store interface for all of the campaign’s applications. Narwhal was dubbed “Obama’s White Whale” because it is an almost mythical technology that federal agencies have been trying to develop for years. While Obama may be hanging out with Queequeg and Ishmael, there is a more viable solution for the cloud, says GCN’s article, “Big Metadata: 7 Ways To Leverage Your Data In the Cloud.”
Data silo migration may appear to be a daunting task, but it is not impossible to do. The article states:
“Fortunately, migrating agency data to the cloud offers IT managers another opportunity to break down those silos, integrate their data and develop a unified data layer for all applications. In this article, I want to examine how to design metadata in the cloud to enable the description, discovery and reuse of data assets in the cloud. Here are the basic metadata description methods (what I like to think of as the “Magnificent Seven” of metadata!) and how to apply them to data in the cloud.”
The list runs down seven considerations for moving to the cloud: identification, static and dynamic measurement, degree scales, categorization, relationships, and commentary. The only things standing in the way of trashing data silos are security and privacy. While this list is useful, it is pretty basic textbook information that applies to metadata in any situation. What makes it so special for the cloud?
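To see how basic these seven description methods are, here is a hedged sketch of a single cloud asset's metadata record with one field per method. The field names and the mapping are our own reading of the list, not the article's schema; the asset path is invented.

```python
from dataclasses import dataclass, field

# Hypothetical record mapping the article's seven metadata description
# methods onto fields; this is our own illustrative schema.
@dataclass
class AssetMetadata:
    identifier: str                     # identification (unique ID / URI)
    size_bytes: int                     # static measurement
    monthly_reads: int                  # dynamic measurement
    quality_score: float                # degree scale (0.0 to 1.0)
    categories: list = field(default_factory=list)      # categorization
    related_assets: list = field(default_factory=list)  # relationships
    notes: str = ""                     # commentary

record = AssetMetadata(
    identifier="s3://agency-bucket/reports/2013-q4.csv",
    size_bytes=1_048_576,
    monthly_reads=42,
    quality_score=0.9,
    categories=["budget", "quarterly"],
    related_assets=["s3://agency-bucket/reports/2013-q3.csv"],
    notes="Migrated from the finance silo in 2013.",
)
print(record.identifier)
```

Nothing here is cloud-specific, which is rather the point of our question above.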
Whitney Grace, December 26, 2013
October 27, 2013
If you are in need of a relatively painless way to obtain metadata, DocumentCloud might be your solution. Every uploaded document is run through OpenCalais, giving users access to the wealth of information mentioned in the documents. The service simplifies the search for people, places, and organizations in your documents and allows you to plot the dates they mention on a timeline that can be as specific or general as the user desires.
“Use our document viewer to embed documents on your own website and introduce your audience to the larger paper trail behind your story.
From our catalog, reporters and the public alike can find your documents and follow links back to your reporting. DocumentCloud contains court filings, hearing transcripts, testimony, legislation, reports, memos, meeting minutes, and correspondence. See what’s already in our catalog. Make your documents part of the cloud.”
If you prefer privacy, that is a built-in feature. If you prefer to publish, your documents become part of the landscape of primary sources in the DocumentCloud catalog. There is also a highlighting feature that accommodates both public annotations and more private organizational notes. Each note has its own URL, enabling users to show their readers the exact information they need.
Chelsea Kerwin, October 27, 2013
August 6, 2013
The rise of metadata is here, but will companies be able to harness its value? Concept Searching suggests that, across the board, the ROI has not been there. A recent article, “Solving the Inadequacies and Failures in Enterprise Search,” admonishes the laissez-faire approach some companies take toward enterprise search. The author advocates, instead, a hands-on information governance approach.
What the author calls a “metadata infrastructure framework” should be created, comprising automated intelligent metadata generation, auto-classification, and the use of taxonomies aligned with goals and mission.
According to the article:
“The need for organizations to access and fully exploit the use of their unstructured content won’t happen overnight. Organizations must incorporate an approach that addresses the lack of an intelligent metadata infrastructure, which is the fundamental problem. Intelligent search, a by-product of the infrastructure, must encourage, not hamper, the use and reuse of information and be rapidly extendable to address text mining, sentiment analysis, eDiscovery and litigation support. The additional components of auto-classification and taxonomies complete the core infrastructure to deploy intelligent metadata enabled solutions, including records management, data privacy, and migration.”
We wholeheartedly agree that investing in infrastructure is a necessity — across many areas, not just search. However, when it comes to a search infrastructure, we would be remiss not to mention the importance of security. Fortunately, there are solutions like the Cogito Intelligence API that offer risk-conscious businesses confidence in a solution already embedded with corporate security measures.
Megan Feil, August 6, 2013
July 26, 2013
We constantly read figures about the amount of time wasted searching for documents, and they are not pretty. When we stumbled upon DocumentCloud, we could not help but wonder if this type of service will help with the productivity and efficiency issues that are currently all too common.
The homepage takes potential users through the steps of what using DocumentCloud is like. First, users will have access to more information about their documents. Second, annotating and highlighting sections can be done with ease.
Finally, sharing work is possible:
“Everything you upload to DocumentCloud stays private until you’re ready to make it public, but once you decide to publish, your documents join thousands of other primary source documents in our public catalog. Use our document viewer to embed documents on your own website and introduce your audience to the larger paper trail behind your story. From our catalog, reporters and the public alike can find your documents and follow links back to your reporting. DocumentCloud contains court filings, hearing transcripts, testimony, legislation, reports, memos, meeting minutes, and correspondence.”
In summary, this is a service that will enable metadata to be produced for documents. If anyone needs us, we will be browsing the documents already in their catalog.
Megan Feil, July 26, 2013
July 26, 2013
The growing Web of linked data grows not only in volume of data but also in its set of vocabularies. We recently saw on the Open Knowledge Foundation’s site that Mondeca’s Linked Open Vocabularies (LOV), a collection of vocabulary spaces, has been updated.
Users are able to find vocabularies listed and individually described by metadata, classified by vocabulary spaces and interlinked using the dedicated vocabulary VOAF.
We learned more about what LOV sets out to do:
“Most popular ones form now a core of Semantic Web standards de jure (SKOS, Dublin Core, FRBR …) or de facto (FOAF, Event Ontology …). But many more are published and used. Not only linked data leverage a growing set of vocabularies, but vocabularies themselves rely more and more on each other through reusing, refining or extending, stating equivalences, declaring metadata. LOV objective is to provide easy access methods to this ecosystem of vocabularies, and in particular by making explicit the ways they link to each other and providing metrics on how they are used in the linked data cloud, help to improve their understanding, visibility and usability, and overall quality.”
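The "metrics on how they are used" that LOV promises boil down to counting reuse links between vocabularies. A minimal sketch with an invented reuse graph standing in for VOAF links (the vocabulary names are real prefixes, but the edges here are illustrative, not LOV's actual data):

```python
# Toy reuse graph standing in for VOAF links between vocabularies:
# each edge (A, B) means vocabulary A reuses or extends vocabulary B.
# The specific edges are invented for illustration.
REUSE_LINKS = [
    ("foaf", "dcterms"),
    ("event", "foaf"),
    ("ourvocab", "foaf"),
    ("ourvocab", "skos"),
]

def usage_counts(links):
    """Count how many vocabularies rely on each vocabulary (in-degree)."""
    counts = {}
    for _source, target in links:
        counts[target] = counts.get(target, 0) + 1
    return counts

print(usage_counts(REUSE_LINKS))  # foaf is reused twice in this toy graph
```

In-degree in this graph is a crude popularity metric; LOV's real statistics also weigh how vocabularies appear in published linked datasets.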
There are a myriad of ways that those interested can feed their inner controlled vocabulary demon. One of which is to suggest a new vocabulary to add to LOV.
Megan Feil, July 26, 2013
January 3, 2013
Information is the only global currency, and it is by no means a limited resource. The National Information Exchange Model (NIEM) Resource Database reflects this; NIEM was initially developed out of a desire for a government-wide, standards-based approach to exchanging information.
Twenty states found that there were too many bureaucratic policies involved in exchanging information across state and city government lines, and thus began NIEM. This effort became known as the Global Justice Information Sharing Initiative.
The website continues on the background of this project and the Department of Homeland Security‘s connection:
“Parallel to the GJXDM effort was the stand up of the U.S. Department of Homeland Security. The mention of metadata in the president’s strategy for homeland security in the summer of 2002 galvanized the homeland security community to begin working towards standardization. These collaborative efforts by the justice and homeland security communities—to produce a set of common, well-defined data elements for data exchange development and harmonization—lead to the beginnings of NIEM.”
While it is difficult not to find this interesting, at the end of the day this is a government initiative in a time of severe financial challenges, and we cannot help but wonder whether that will hamper efforts to push it forward. For now, take a look at the resource database while you can.
Megan Feil, January 03, 2013
December 10, 2012
The healthcare world continues its creep into the twenty-first century, and now Mondeca is lending a hand with the process. The French company’s Web site announces, “Mondeca Helps to Bring Electronic Patient Record to Reality.” Tasked with implementing healthcare management systems across France, that country’s healthcare agency, ASIP Santé, has turned to Mondeca for help. The press release describes the challenge:
“The task is a daunting one since most healthcare providers use their own custom terminologies and medical codes. This is due to a number of issues with standard terminologies: 1) standard terminologies take too long to be updated with the latest terms; 2) significant internal data, systems, and expertise rely on the usage of legacy custom terminologies; and 3) a part of the business domain is not covered by a standard terminology.
“The only way forward was to align the local custom terminologies and codes with the standard ones. This way local data can be automatically converted into the standard representation, which will in turn allow to integrate it with the data coming from other healthcare providers.”
The process began by aligning the standard terminology Logical Observation Identifiers Names and Codes (LOINC) with the related terminology common in Paris hospitals. Mondeca helped the effort with their expertise in complex organizational and technical processes, like setting up collaborative spaces and aligning and exporting terminology.
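The alignment step described above can be sketched as a mapping table from local codes to standard ones, with unmapped codes flagged for human review. The local codes below are invented; the two LOINC codes are real published identifiers, but the whole table is an illustration, not ASIP Santé's or Mondeca's actual data.

```python
# Hypothetical alignment table: local hospital lab codes -> LOINC codes.
# Local codes are invented; the LOINC targets are real published codes.
LOCAL_TO_LOINC = {
    "GLU-01": "2345-7",  # Glucose [Mass/volume] in Serum or Plasma
    "HGB-02": "718-7",   # Hemoglobin [Mass/volume] in Blood
}

def to_standard(local_records, alignment):
    """Convert local (code, value) observations to the standard coding."""
    standardized, unmapped = [], []
    for code, value in local_records:
        if code in alignment:
            standardized.append((alignment[code], value))
        else:
            unmapped.append(code)  # flag for terminologists to align
    return standardized, unmapped

records = [("GLU-01", 5.4), ("XYZ-99", 1.0)]
std, unmapped = to_standard(records, LOCAL_TO_LOINC)
print(std, unmapped)
```

Once local data is in the standard representation, it can be integrated with data from other providers, which is exactly the payoff the press release describes.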
Our question: Will doctors use these systems without introducing more costs and errors in the push for cost efficiency? Let us hope so.
Established in 1999, Mondeca serves clients in Europe and North America with solutions for the management of advanced knowledge structures: ontologies, thesauri, taxonomies, terminologies, metadata repositories, knowledge bases, and linked open data. The firm is based in Paris, France.
Cynthia Murrell, December 10, 2012
November 9, 2012
Big data has held the media spotlight long enough to surpass any initial thought that it was a passing trend. Now the headlines trumpet how to benefit from the massive amounts of unstructured data flooding the internet and how to process it.
Computer Weekly’s article, “How to Manage Unstructured Data for Business Benefit,” explains how the next data evolution will be harnessing the benefits of both unstructured and structured data:
“There is as much value in unstructured data in terms of what customers are thinking on the web and what businesses can derive from other organizations’ data. It requires an understanding of the type of information the business is looking for and the kinds of insights business managers are hoping to draw from the data. The more considered the query, and the more focused the search, the better the results. This rule applies to both structured and unstructured data.”
Applying metadata to unstructured data opens up a profound new way to increase the findability of enterprise content, but the right solution is mandatory for success. Businesses looking for secure search and enterprise accessibility will find that Intrafind provides customized solutions that organize, tag, and ultimately reveal relevant information to users of their enterprise search solutions. Powerful tools like this provide flexible options for data processing that put the power to increase efficiency and ROI back in the hands of the user.
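The findability gain from tagging unstructured content comes down to a simple structure: an inverted index from tags to documents. A minimal sketch with invented documents and tags (not Intrafind's implementation):

```python
from collections import defaultdict

# Toy documents with metadata tags, standing in for enterprise content.
DOCS = {
    "memo-1": ["budget", "2012"],
    "memo-2": ["budget", "hiring"],
    "report-9": ["hiring"],
}

def build_index(docs):
    """Inverted index: tag -> set of documents carrying that tag."""
    index = defaultdict(set)
    for doc_id, tags in docs.items():
        for tag in tags:
            index[tag].add(doc_id)
    return index

index = build_index(DOCS)
print(sorted(index["budget"]))  # ['memo-1', 'memo-2']
```

Without the tags, answering "which documents concern the budget?" means scanning every document; with them, it is a single lookup, which is the efficiency argument in a nutshell.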
Jennifer Shockley, November 9, 2012