April 8, 2015
Anyone interested in the mechanics behind image search should check out the description of PicSeer: Search Into Images from YangSky. The product write-up goes into surprising detail about what sets their “cognitive & semantic image search engine” apart, complete with comparative illustrations. The page’s translation seems to have been done either quickly or by machine, but don’t let the awkward wording in places put you off; there’s good information here. The text describes the competition’s approach:
“Today, the image searching experiences of all major commercial image search engines are embarrassing. This is because these image search engines are
- Using non-image correlations such as the image file names and the texts in the vicinity of the images to guess what are the images all about;
- Using low-level features, such as colors, textures and primary shapes, of image to make content-based indexing/retrievals.”
With the first approach, they note, trying to narrow the search terms is inefficient because the software is looking at metadata instead of inspecting the actual image; any narrowed search excludes many relevant entries. The second approach simply does not consider enough information about images to return the most relevant, and only the most relevant, results. The write-up goes on to explain what makes their product different, using as their example an endearing image of a smiling young boy:
“How can PicSeer have this kind of understanding towards images? The Physical Linguistic Vision Technologies have can represent cognitive features into nouns and verbs called computational nouns and computational verbs, respectively. In this case, the image of the boy is represented as a computational noun ‘boy’ and the facial expression of the boy is represented by a computational verb ‘smile’. All these steps are done by the computer itself automatically.”
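Stripped of the marketing language, the noun-plus-verb representation is easy to picture. Here is a toy sketch (entirely our own invention for illustration; YangSky's actual technology is proprietary and surely far richer):

```python
from dataclasses import dataclass

@dataclass
class ImageAnnotation:
    noun: str   # the recognized object, e.g. "boy"  (a "computational noun")
    verb: str   # the recognized action, e.g. "smile" (a "computational verb")

def _stem(word: str) -> str:
    # Crude stemmer so "smiles" folds onto "smile" and "boys" onto "boy".
    return word.lower().rstrip("s")

def matches(query: str, annotation: ImageAnnotation) -> bool:
    """True when every query term matches the image's noun or verb."""
    labels = {_stem(annotation.noun), _stem(annotation.verb)}
    return all(_stem(term) in labels for term in query.split())

photo = ImageAnnotation(noun="boy", verb="smile")
print(matches("boy smiles", photo))    # True: matched on image content
print(matches("boy filename", photo))  # False: no such label in the image
```

A metadata-only engine has no "smile" label to match against; that gap is exactly what the write-up is pointing at.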
See the write-up for many more details, including examples of how Google handles the “boy smiles” query. (Be warned: there is a very brief section about porn filtering that includes a couple of censored screenshots and adult keyword examples.) It looks like image search technology is progressing apace.
Cynthia Murrell, April 08, 2015
Stephen E Arnold, Publisher of CyberOSINT at www.xenky.com
January 13, 2015
Germany’s foreign intelligence arm (BND) refuses to be outdone by our NSA. The World Socialist Web Site reports, “German Foreign Intelligence Service Plans Real-Time Surveillance of Social Networks.” The agency plans to invest €300 million by 2020 to catch up to the (Snowden-revealed) capabilities of U.S. and U.K. agencies. The stated goal is to thwart terrorism, of course, but reporter Sven Heymann is certain the initiative has more to do with tracking political dissidents who oppose the austerity policies of recent years.
Whatever the motivation, the BND has turned its attention to the wealth of information to be found in metadata. Smart spies. Heymann writes:
“While previously, there was mass surveillance of emails, telephone calls and faxes, now the intelligence agency intends to focus on the analysis of so-called metadata. This means the recording of details on the sender, receiver, subject line, and date and time of millions of messages, without reading their content.
“As the Süddeutsche Zeitung reported, BND representatives are apparently cynically attempting to present this to parliamentary deputies as the strengthening of citizens’ rights and freedoms in order to sell the proposal to the public.”
“In fact, the analysis of metadata makes it possible to identify details about a target person’s contacts. The BND is to be put in a position to know who is communicating with whom, when, and by what means. As is already known, the US sometimes conducts its lethal and illegal drone attacks purely on the basis of metadata.”
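That “who is communicating with whom, when, and by what means” claim is easy to demonstrate in miniature. A toy sketch with invented records (nothing here resembles actual BND tooling), showing how a contact graph falls out of pure metadata with no message content at all:

```python
from collections import defaultdict
from datetime import datetime

# Each record is pure metadata: sender, receiver, channel, timestamp.
records = [
    ("alice", "bob",   "email", datetime(2015, 1, 2, 9, 15)),
    ("alice", "carol", "sms",   datetime(2015, 1, 2, 9, 40)),
    ("bob",   "carol", "email", datetime(2015, 1, 3, 18, 5)),
    ("alice", "bob",   "sms",   datetime(2015, 1, 4, 22, 1)),
]

# Contact graph: who talks to whom, how often, and by what means.
graph = defaultdict(list)
for sender, receiver, channel, when in records:
    graph[sender].append((receiver, channel, when))

for person, contacts in graph.items():
    partners = {receiver for receiver, _, _ in contacts}
    print(f"{person} contacted {sorted(partners)} ({len(contacts)} messages)")
```

Four rows of metadata already reveal a small social network, which is why the "we never read the content" framing is cold comfort.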
The article tells us the BND is also looking into the exploitation of newly revealed security weaknesses in common software, as well as tools to falsify biometric-security images (like fingerprints or iris scans). Though Germany’s intelligence agents are prohibited by law from spying on their own people, Heymann has little confidence that rule will be upheld. After all, the NSA faces a similar prohibition.
Cynthia Murrell, January 13, 2015
November 28, 2014
As the Internet grows and evolves, the features users expect from search and content management systems are changing. SearchContentManagement addresses the shift in “Semantic Technologies Fuel the Web Experience Wave.” As the title suggests, writer Geoffrey Bock sees this shift as opening a new area with a new set of demands: “web experience management” (WEM) goes beyond “web content management” (WCM).
The inclusion of metadata and contextual information makes all the difference. For example, the information displayed by an airline’s site should, he posits, be different for a user working at their PC, who may want general information, and someone using their phone in the airport parking lot, where they probably need to check their gate number or see whether their flight has been delayed. (Bock is disappointed that none of the airlines’ sites yet work this way.)
The article continues:
“Not surprisingly, to make contextually aware Web content work correctly, a lot of intelligence needs to be added to the underlying information sources, including metadata that describes the snippets, as well as location-specific geo-codes coming from the devices themselves. There is more to content than just publishing and displaying it correctly across multiple channels. It is important to pay attention to the underlying meaning and how content is used — the ‘semantics’ associated with it.
“Another aspect of managing Web experiences is to know when you are successful. It’s essential to integrate tracking and monitoring capabilities into the underlying platform, and to link business metrics to content delivery. Counting page views, search terms and site visitors is only the beginning. It’s important for business users to be able to tailor metrics and reporting to the key performance indicators that drive business decisions.”
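Bock's airline example reduces to a context switch on device and location. A minimal sketch, with illustrative fields and rules of our own (no airline's actual system works this way, as Bock laments):

```python
def select_content(device: str, at_airport: bool, flight: dict) -> str:
    """Pick which 'experience' to serve based on user context."""
    if device == "phone" and at_airport:
        # Traveler in the parking lot: surface gate and delay status first.
        return f"Gate {flight['gate']} (status: {flight['status']})"
    # Desktop user at home: general planning information.
    return f"{flight['origin']} to {flight['destination']}, departs {flight['departs']}"

flight = {"origin": "ORD", "destination": "SFO", "departs": "14:05",
          "gate": "B12", "status": "delayed 30 min"}
print(select_content("phone", True, flight))
print(select_content("desktop", False, flight))
```

The hard part, per the quote, is not this branch; it is getting the metadata and geo-codes that make the branch possible into the content layer in the first place.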
Bock supplies an example of one company, specialty-plumbing supplier Uponor, that is making good use of such “WEM” possibilities. See the article for more details on his strategy for leveraging the growing potential of semantic technology.
Cynthia Murrell, November 28, 2014
November 24, 2014
Here is an idea we have not heard of before: “Build Your Own Canto Metadata Webinar.” Canto is a company that specializes in digital asset management, and its award-winning Cumulus software is an industry favorite for managing taxonomies and metadata for digital content. People often forget how important metadata is to Web content:
“Metadata lets you do more with your digital content.
Metadata can save you from copyright lawsuits.
Metadata can speed your everyday workflow.”
The webinar is advertised as a way to help people understand what exactly metadata is, how they can harness it to their advantage, and how to encourage more people to use it. While anyone can teach a webinar about metadata, Canto is building the entire session around users’ questions; attendees will be able to tweet questions before and during the meeting.
The webinar is led by three metadata experts: Thomas Schleu, CTO and co-founder of Canto; Phoenix Von Lieven, Director of Professional Services, Americas, at Canto; and Danielle Forshtay, Publications Coordinator at Lockheed Martin.
“Build Your Own Canto Webinar” is an odd way to advertise an online class about metadata. Why is it called “build your own”? Will the attendees shape the class’s content entirely? It bears further investigation; attending the webinar on November 19 may answer these questions.
November 18, 2014
The article on CNN Money titled “Varonis Announces Metadata Framework Version 6, Including New Functionality for Four Varonis Solutions” explores the new features of Version 6. Varonis, a leading software provider, focuses on human-generated data that is unstructured and might include anything from spreadsheets to emails to text messages. The company boasts over 3,000 customers in fields as varied as healthcare, media, and financial services. The Varonis Metadata Framework has been refined over the last decade. The article describes it this way:
“[It is] a single platform on a unifying code base, purpose-built to tackle the many challenges and use cases that arise from the massive volumes of unstructured data files created and stored by organizations of all sizes. Currently powering five distinct Varonis products, the Varonis Metadata Framework intelligently extracts and analyzes metadata from customers’ vast, distributed unstructured data stores, and enables a variety of use cases, including data governance, data security, archiving, file synchronization, enhanced mobile data accessibility, search, and business collaboration.”
Exciting new features in Version 6 include a search API for DatAnswers, “bi-directional permissions visibility” for DatAdvantage to reduce operational overhead, and DatAlert reporting that reduces risk by showing where and when malware appears.
Chelsea Kerwin, November 18, 2014
January 22, 2014
Did you know that there is an open source version of ClearForest called Calais? Neither did we, until we read about it in the article posted on OpenCalais called “Calais: Connect. Everything.” Along with a short instructional video, there is a text explanation of how the software works. The OpenCalais Web Service automatically creates rich semantic metadata, using natural language processing, machine learning, and other methods to analyze submitted content. A list of tags is generated and returned to the user for review, and the user can then attach the tags to other documents.
The metadata can be put to use in a variety of ways:
“The metadata gives you the ability to build maps (or graphs or networks) linking documents to people to companies to places to products to events to geographies to… whatever. You can use those maps to improve site navigation, provide contextual syndication, tag and organize your content, create structured folksonomies, filter and de-duplicate news feeds, or analyze content to see if it contains what you care about.”
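Once the tags come back, building those maps is a simple inversion. A sketch over a canned result (the response shape below is simplified for illustration and is not OpenCalais's actual JSON schema): documents sharing an entity become "linked," which is exactly the graph the quote describes.

```python
from collections import defaultdict

# Hypothetical tag output for three documents; a real OpenCalais response
# is richer, but the linking step looks the same.
tagged_docs = {
    "doc1.txt": [("Person", "J. Smith"), ("Company", "Acme Corp")],
    "doc2.txt": [("Company", "Acme Corp"), ("City", "Chicago")],
    "doc3.txt": [("Person", "J. Smith"), ("City", "Chicago")],
}

# Invert: entity -> documents that mention it.
entity_index = defaultdict(set)
for doc, tags in tagged_docs.items():
    for entity_type, name in tags:
        entity_index[(entity_type, name)].add(doc)

for (etype, name), docs in sorted(entity_index.items()):
    if len(docs) > 1:
        print(f"{name} ({etype}) links {sorted(docs)}")
```

The same index drives the de-duplication and filtering uses the quote mentions: two feeds tagged with identical entities are likely the same story.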
The OpenCalais Web Service relies on a dedicated community to keep making progress and pushing the application forward. Calais takes the same approach as other open source projects, except this one is powered by Thomson Reuters.
January 10, 2014
When Netflix first launched, I read an article about how everyone’s individual movie tastes are different. No two are alike, and Netflix created an algorithm that managed to track each user’s queue down to the individual. It was scary and amazing at the same time. Netflix eventually decided to scrap the algorithm (or at least that is what they told us), but it remains a reminder that small traces of metadata can lead back to you. Threatpost, a Web site that tracks Internet security threats, reported on how “Stanford Researchers Find Connecting Metadata With User Names Is Simple.”
A claim has been made that anonymously generated user phone data cannot be traced back to an individual. Stanford researchers proved otherwise. The team started the MetaPhone program, which collects data from volunteers with Android phones. The project gathered calls, text messages, and social network information so the Stanford Security Lab could study how metadata connects to surveillance. The researchers selected 5,000 random numbers and were able to match 27 percent of them using Web sites people use every day.
The article states:
“‘What about if an organization were willing to put in some manpower? To conservatively approximate human analysis, we randomly sampled 100 numbers from our dataset, and then ran Google searches on each. In under an hour, we were able to associate an individual or a business with 60 of the 100 numbers. When we added in our three initial sources, we were up to 73,’ said Jonathan Mayer and Patrick Mutchler in a blog post explaining the results.”
The article also points out that if money were not a problem, the results would be even more accurate. Even using a cheap data aggregator, the Stanford researchers accurately matched 91 out of 100 numbers. Data is not as protected or as anonymous as we thought. People are willing to share their whole lives on social media, yet when security is mentioned they go bonkers over an issue like this. It is still a scary thought, but where is the line drawn between willingly shared information and privacy?
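The matching step itself is conceptually just a join against any public directory. A toy illustration with fabricated numbers (nothing below is the Stanford data):

```python
# "Anonymized" call metadata: just phone numbers, no names attached.
observed_numbers = ["555-0101", "555-0102", "555-0103", "555-0199"]

# Any public source (web searches, a cheap data aggregator) acts as the join key.
public_directory = {
    "555-0101": "Neighborhood Pizza Co.",
    "555-0102": "Dr. A. Jones, DDS",
    "555-0103": "J. Doe (personal)",
}

# Re-identification is a dictionary lookup, nothing more.
matched = {n: public_directory[n] for n in observed_numbers if n in public_directory}
match_rate = len(matched) / len(observed_numbers)
print(f"Re-identified {len(matched)} of {len(observed_numbers)} numbers ({match_rate:.0%})")
```

The researchers' point is that the directory already exists in public; the "anonymity" of a phone number lasts exactly as long as nobody bothers to look it up.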
Whitney Grace, January 10, 2014
December 26, 2013
For Obama’s 2012 re-election campaign, his team broke down data silos and moved all the data to a cloud repository. The team built Narwhal, a shared data store interface for all of the campaign’s applications. Narwhal was dubbed “Obama’s White Whale” because it is an almost mythical technology that federal agencies have been trying to develop for years. While Obama may be hanging out with Queequeg and Ishmael, there is a more viable solution for the cloud, says GCN’s article, “Big Metadata: 7 Ways To Leverage Your Data In the Cloud.”
Data silo migration may appear to be a daunting task, but it is not impossible to do. The article states:
“Fortunately, migrating agency data to the cloud offers IT managers another opportunity to break down those silos, integrate their data and develop a unified data layer for all applications. In this article, I want to examine how to design metadata in the cloud to enable the description, discovery and reuse of data assets in the cloud. Here are the basic metadata description methods (what I like to think of as the “Magnificent Seven” of metadata!) and how to apply them to data in the cloud.”
The list runs down seven considerations for moving to the cloud: identification, static and dynamic measurement, degree scales, categorization, relationships, and commentary. The only things standing in the way of trashing data silos are security and privacy. While this list is useful, it is pretty basic, textbook information that applies to metadata in any situation. What makes it so special for the cloud?
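For concreteness, the seven description methods could travel as fields on a single asset record. The field names below are our own reading of the list, not the GCN article's schema:

```python
from dataclasses import dataclass, field

@dataclass
class CloudAssetMetadata:
    # 1. Identification: a stable, unique handle for the asset.
    asset_id: str
    # 2-3. Static and dynamic measurement: fixed size vs. changing usage.
    size_bytes: int
    monthly_accesses: int
    # 4. Degree scale: a rated quality such as sensitivity.
    sensitivity: str          # e.g. "public", "internal", "restricted"
    # 5. Categorization: where it sits in a taxonomy.
    category: str
    # 6. Relationships: links to other assets.
    related_ids: list = field(default_factory=list)
    # 7. Commentary: free-text annotation.
    notes: str = ""

record = CloudAssetMetadata("asset-042", 10_485_760, 318,
                            "internal", "finance/reports",
                            ["asset-007"], "Q3 summary, supersedes asset-007")
print(record.asset_id, record.category)
```

Which rather supports the post's skepticism: there is nothing cloud-specific in the record itself; the cloud part is where it lives and who can query it.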
Whitney Grace, December 26, 2013
October 27, 2013
If you are in need of a relatively painless way to obtain metadata, DocumentCloud might be your solution. Every uploaded document is run through OpenCalais, giving users access to the wealth of information mentioned in them. It simplifies the search for people, places, and organizations in your documents and allows you to plot the dates they mention on a timeline that can be as specific or general as the user desires.
“Use our document viewer to embed documents on your own website and introduce your audience to the larger paper trail behind your story.
From our catalog, reporters and the public alike can find your documents and follow links back to your reporting. DocumentCloud contains court filings, hearing transcripts, testimony, legislation, reports, memos, meeting minutes, and correspondence. See what’s already in our catalog. Make your documents part of the cloud.”
If you prefer privacy, that is a built-in feature. If you prefer to publish, your documents become a part of the landscape of primary sources in the DocumentCloud catalog. There is also a highlighting feature that accommodates both public annotations and more private organizational notes. Each note has its own URL, enabling users to show their readers the exact information they need.
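The timeline feature amounts to bucketing documents by the dates mentioned in them. A rough sketch with made-up documents and extracted dates (not DocumentCloud's implementation):

```python
from collections import defaultdict
from datetime import date

# Dates that entity extraction found inside each document.
extracted_dates = {
    "hearing-transcript.pdf": [date(2013, 3, 4), date(2013, 6, 11)],
    "budget-memo.pdf":        [date(2013, 6, 2)],
    "court-filing.pdf":       [date(2012, 11, 30), date(2013, 6, 20)],
}

# Timeline at month granularity: the "as specific or general" knob
# is just the choice of bucket key.
timeline = defaultdict(set)
for doc, dates in extracted_dates.items():
    for d in dates:
        timeline[(d.year, d.month)].add(doc)

for (year, month), docs in sorted(timeline.items()):
    print(f"{year}-{month:02d}: {sorted(docs)}")
```

Zooming out to year granularity, or in to the day, only changes the bucket key, which is what makes the feature cheap once the dates are extracted.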
Chelsea Kerwin, October 27, 2013
August 6, 2013
The rise of metadata is here, but will companies be able to harness its value? Concept Searching suggests that, so far, ROI has been elusive across the board. A recent article, “Solving the Inadequacies and Failures in Enterprise Search,” admonishes the laissez-faire approach that some companies take toward enterprise search. The author advocates instead for a hands-on information governance approach.
The author calls for a “metadata infrastructure framework” comprising automated, intelligent metadata generation; auto-classification; and taxonomies aligned with goals and mission.
According to the article:
The need for organizations to access and fully exploit the use of their unstructured content won’t happen overnight. Organizations must incorporate an approach that addresses the lack of an intelligent metadata infrastructure, which is the fundamental problem. Intelligent search, a by-product of the infrastructure, must encourage, not hamper, the use and reuse of information and be rapidly extendable to address text mining, sentiment analysis, eDiscovery and litigation support. The additional components of auto-classification and taxonomies complete the core infrastructure to deploy intelligent metadata enabled solutions, including records management, data privacy, and migration.
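As a thought experiment, the auto-classification piece of such a framework can be sketched as keyword evidence scored against a taxonomy. The taxonomy and rules here are invented; real products like Concept Searching's use far richer statistical and conceptual methods:

```python
# Toy auto-classifier: assign taxonomy nodes by keyword evidence.
taxonomy = {
    "finance/reports": {"revenue", "quarterly", "forecast"},
    "legal/contracts": {"agreement", "party", "indemnify"},
    "hr/policies":     {"employee", "leave", "conduct"},
}

def auto_classify(text: str, min_hits: int = 2) -> list:
    """Return taxonomy nodes whose keywords appear at least min_hits times."""
    words = set(text.lower().split())
    return sorted(node for node, keys in taxonomy.items()
                  if len(words & keys) >= min_hits)

doc = "Quarterly revenue forecast for the board"
print(auto_classify(doc))  # ['finance/reports']
```

Even this crude version shows why the taxonomy must be "goal and mission aligned": the classifier can only ever produce labels the taxonomy already contains.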
We wholeheartedly agree that investing in infrastructure is a necessity, across many areas, not just search. However, when it comes to a search infrastructure, we would be remiss not to mention the importance of security. Fortunately, there are solutions like the Cogito Intelligence API that give risk-conscious businesses confidence by embedding corporate security measures.
Megan Feil, August 6, 2013