September 23, 2015
Here’s an interesting project: we received an announcement about funding for Pop Up Archive: Search Your Sound. A joint effort of the WGBH Educational Foundation and the American Archive of Public Broadcasting, the venture’s goal is nothing less than to make almost 40,000 hours of Public Broadcasting media content easily accessible. The American Archive, now under the care of WGBH and the Library of Congress, has digitized that wealth of sound and video. Now, the details are in the metadata. The announcement reveals:
“As we’ve written before, metadata creation for media at scale benefits from both machine analysis and human correction. Pop Up Archive and WGBH are combining forces to do just that. Innovative features of the project include:
*Speech-to-text and audio analysis tools to transcribe and analyze almost 40,000 hours of digital audio from the American Archive of Public Broadcasting
*Open source web-based tools to improve transcripts and descriptive data by engaging the public in a crowdsourced, participatory cataloging project
*Creating and distributing data sets to provide a public database of audiovisual metadata for use by other projects.
“In addition to Pop Up Archive’s machine transcripts and automatic entity extraction (tagging), we’ll be conducting research in partnership with the HiPSTAS center at University of Texas at Austin to identify characteristics in audio beyond the words themselves. That could include emotional reactions like laughter and crying, speaker identities, and transitions between moods or segments.”
The project just received almost $900,000 in funding from the Institute of Museum and Library Services. This loot is on top of the grant received in 2013, from the Corporation for Public Broadcasting, that got the project started. But will it be enough money to develop a system that delivers on-point results? If not, we may be stuck with something clunky, something that resembles the old Autonomy Virage, Blinkxx, Exalead video search, or Google YouTube search. Let us hope this worthy endeavor continues to attract funding so that, someday, anyone can reliably (and intuitively) find valuable Public Broadcasting content.
Cynthia Murrell, September 23, 2015
June 18, 2015
When it comes to enterprise technology these days, it is all about making software compliant for a variety of platforms and needs. Compliancy is the name of the game for Basho, says Diginomica’s article, “Basho Aims For Enterprise Operational Simplicity With New Data Platform.” Basho’s upgrade to its Riak Data Platform makes it more integration with related tools and to make complex operational environments simpler. Data management and automation tools are another big seller for NoSQL enterprise databases, which Basho also added to the Riak upgrade. Basho is not the only company that is trying to improve NoSQL enterprise platforms, these include MongoDB and DataStax. Basho’s advantage is delivering a solution using the Riak data platform.
Basho’s data platform already offers a variety of functions that people try to get to work with a NoSQL database and they are nearly automated: Riak Search with Apache Solr, orchestration services, Apache Spark Connector, integrated caching with Redis, and simplified development using data replication and synchronization.
“CEO Adam Wray released some canned comment along with the announcement, which indicates that this is a big leap for Basho, but also is just the start of further broadening of the platform. He said:
‘This is a true turning point for the database industry, consolidating a variety of critical but previously disparate services to greatly simplify the operational requirements for IT teams working to scale applications with active workloads. The impact it will have on our users, and on the use of integrated data services more broadly, will be significant. We look forward to working closely with our community and the broader industry to further develop the Basho Data Platform.’”
The article explains that NoSQL market continues to grow and enterprises need management as well as automation to manage the growing number of tasks databases are used for. While a complete solution for all NoSQL needs has been developed, Basho comes fairly close.
Whitney Grace, June 18, 2015
April 8, 2015
Anyone interested in the mechanics behind image search should check out the description of PicSeer: Search Into Images from YangSky. The product write-up goes into surprising detail about what sets their “cognitive & semantic image search engine” apart, complete with comparative illustrations. The page’s translation seems to have been done either quickly or by machine, but don’t let the awkward wording in places put you off; there’s good information here. The text describes the competition’s approach:
“Today, the image searching experiences of all major commercial image search engines are embarrassing. This is because these image search engines are
- Using non-image correlations such as the image file names and the texts in the vicinity of the images to guess what are the images all about;
- Using low-level features, such as colors, textures and primary shapes, of image to make content-based indexing/retrievals.”
With the first approach, they note, trying to narrow the search terms is inefficient because the software is looking at metadata instead of inspecting the actual image; any narrowed search excludes many relevant entries. The second approach above simply does not consider enough information about images to return the most relevant, and only most relevant, results. The write-up goes on to explain what makes their product different, using for their example an endearing image of a smiling young boy:
“How can PicSeer have this kind of understanding towards images? The Physical Linguistic Vision Technologies have can represent cognitive features into nouns and verbs called computational nouns and computational verbs, respectively. In this case, the image of the boy is represented as a computational noun ‘boy’ and the facial expression of the boy is represented by a computational verb ‘smile’. All these steps are done by the computer itself automatically.”
See the write-up for many more details, including examples of how Google handles the “boy smiles” query. (Be warned– there’s a very brief section about porn filtering that includes a couple censored screenshots and adult keyword examples.) It looks like image search technology progressing apace.
Cynthia Murrell, April 08, 2015
Stephen E Arnold, Publisher of CyberOSINT at www.xenky.com
January 13, 2015
Germany’s foreign intelligence arm (BND) refuses to be outdone by our NSA. The World Socialist Web Site reports, “German Foreign Intelligence Service Plans Real-Time Surveillance of Social Networks.” The agency plans to invest €300 million by 2020 to catch up to the (Snowden-revealed) capabilities of U.S. and U.K. agencies. The stated goal is to thwart terrorism, of course, but reporter Sven Heymann is certain the initiative has more to do with tracking political dissidents who oppose the austerity policies of recent years.
Whatever the motivation, the BND has turned its attention to the wealth of information to be found in metadata. Smart spies. Heymann writes:
“While previously, there was mass surveillance of emails, telephone calls and faxes, now the intelligence agency intends to focus on the analysis of so-called metadata. This means the recording of details on the sender, receiver, subject line, and date and time of millions of messages, without reading their content.
“As the Süddeutsche Zeitung reported, BND representatives are apparently cynically attempting to present this to parliamentary deputies as the strengthening of citizens’ rights and freedoms in order to sell the proposal to the public.”
“In fact, the analysis of metadata makes it possible to identify details about a target person’s contacts. The BND is to be put in a position to know who is communicating with whom, when, and by what means. As is already known, the US sometimes conducts its lethal and illegal drone attacks purely on the basis of metadata.”
The article tells us the BND is also looking into the exploitation of newly revealed security weaknesses in common software, as well as tools to falsify biometric-security images (like fingerprints or iris scans). Though Germany’s intelligence agents are prohibited by law from spying on their own people, Heymann has little confidence that rule will be upheld. After all, so is the NSA.
Cynthia Murrell, January 13, 2015
November 28, 2014
As the Internet grows and evolves, the features users expect from search and content management systems is changing. SearchContentManagement addresses the shift in “Semantic Technologies Fuel the Web Experience Wave.” As the title suggests, writer Geoffrey Bock sees this shift as opening a new area with a new set of demands — “web experience management” (WEM) goes beyond “web content management” (WCM).
The inclusion of metadata and contextual information makes all the difference. For example, the information displayed by an airline’s site should, he posits, be different for a user working at their PC, who may want general information, and someone using their phone in the airport parking lot, where they probably need to check their gate number or see whether their flight has been delayed. (Bock is disappointed that none of the airlines’ sites yet work this way.)
The article continues:
“Not surprisingly, to make contextually aware Web content work correctly, a lot of intelligence needs to be added to the underlying information sources, including metadata that describes the snippets, as well as location-specific geo-codes coming from the devices themselves. There is more to content than just publishing and displaying it correctly across multiple channels. It is important to pay attention to the underlying meaning and how content is used — the ‘semantics’ associated with it.
“Another aspect of managing Web experiences is to know when you are successful. It’s essential to integrate tracking and monitoring capabilities into the underlying platform, and to link business metrics to content delivery. Counting page views, search terms and site visitors is only the beginning. It’s important for business users to be able to tailor metrics and reporting to the key performance indicators that drive business decisions.”
Bock supplies an example of one company, specialty-plumbing supplier Uponor, that is making good use of such “WEM” possibilities. See the article for more details on his strategy for leveraging the growing potential of semantic technology.
Cynthia Murrell, November 28, 2014
November 24, 2014
Here is a unique idea that we have not heard about: “Build Your Own Canto Metadata Webinar.” Canto is a company that specializes in digital asset management and their award-winning Cumulus software is an industry favorite to manage taxonomies and metadata for digital content. People often forget how important metadata is Web content:
“Metadata lets you do more with your digital content.
Metadata can save you from copyright lawsuits.
Metadata can speed your everyday workflow.”
The webinar is advertised as way to help people understand what exactly metadata is, how people can harness it to their advantage, and how to engage more people into using it. While anyone can teach a webinar about metadata, Canto is building the entire session around users’ questions. They will be able to tweet questions before and during the meeting.
The webinar is led by three metadata experts: Thomas Schleu-CTO/Co-Founder of Canto, Phoenix Von Lieven-Director of Professional Services, Americas Cantos, and Danielle Forshtay-Publications Coordinator of Lockheed Martin. These experts will lend their knowledge to attendees.
“Build Your Own Canto Webinar” is an odd way to advertise an online class about metadata. Why is it called make your own? Are the attendees shaping the class’ content entirely? It does bear further investigation by attending the webinar on November 19.
November 18, 2014
The article on CNN Money titled Varonis Announces Metadata Framework Version 6, Including New Functionality For Four Varonis Solutions explores the new features of Version 6. Varonis, the leading software provider, focuses on human-generated data that is unstructured and might include anything from spreadsheets to emails to text messages. They can boast over 3,000 customers in fields as varied as healthcare, media and financial services. The Varonis MetaData Framework has been perfected over the last decade. The article describes it this way,
“ [It is ] a single platform on a unifying code base, purpose-built to tackle the many challenges and use cases that arise from the massive volumes of unstructured data files created and stored by organizations of all sizes. Currently powering five distinct Varonis products, the Varonis Metadata Framework intelligently extracts and analyzes metadata from customers’ vast, distributed unstructured data stores, and enables a variety of uses cases, including data governance, data security, archiving, file synchronization, enhanced mobile data accessibility, search, and business collaboration.”
Exciting new features in Version 6 include a search API for DatAnswers, “bi-directional permissions visibility” for DatAdvantage to reduce operational overhead, and reduced risk through DatAlert with the information of malware location and timing.
Chelsea Kerwin, November 18, 2014
January 22, 2014
Did you know that there was an open source version of ClearForest called Calais? Neither did we, until we read about it in the article posted on OpenCalais called, “Calais: Connect. Everything.” Along with a short instructional video, is a text explanation about how the software works. OpenCalais Web Service automatically creates rich semantic metadata using natural language processing, machine learning, and other methods to analyze for submitted content. A list of tags are generated and returned to the user for review and then the user can paste them onto other documents.
The metadata can be used in a variety of ways for improvement:
“The metadata gives you the ability to build maps (or graphs or networks) linking documents to people to companies to places to products to events to geographies to… whatever. You can use those maps to improve site navigation, provide contextual syndication, tag and organize your content, create structured folksonomies, filter and de-duplicate news feeds, or analyze content to see if it contains what you care about.”
The OpenCalais Web Service relies on a dedicated community to keep making progress and pushing the application forward. Calais takes the same approach as other open source projects, except this one is powered by Thomson Reuters.
January 10, 2014
When Netflix first launched I read an article about how everyone’s individual movie tastes are different. There are not any two alike and Netflix created an algorithm that managed to track each user’s queue down to the individual. It was scary and amazing at the same time. Netflix eventually decided to can the algorithm (or at least they told us), but it still leaves a thought that small traces of metadata can lead to you. The Threat Post, a Web site that tracks Internet security threats, reported on how “Stanford Researchers Find Connecting Metadata With User Names Is Simple.”
A claim has been made that user phone data anonymously generated cannot be tracked back to an individual. Stanford Researchers proved otherwise. The team started the Metaphone program that collects data from volunteers with Android phones. The project’s main point was to collect calls, text messages, and social network information for the Stanford Security Lab to connect metadata and surveillance. They selected 5,000 random numbers and were able to match 27% of the them using Web sites people user everyday.
The article states:
“ ‘What about if an organization were willing to put in some manpower? To conservatively approximate human analysis, we randomly sampled 100 numbers from our dataset, and then ran Google searches on each. In under an hour, we were able to associate an individual or a business with 60 of the 100 numbers. When we added in our three initial sources, we were up to 73,’ said Jonathan Mayer and Patrick Mutchler in a blog post explaining the results.”
The article also points out that if money was not a problem, then the results would be even more accurate. The Stanford Researchers users a cheap data aggregator instead and accurately matched 91 out of 100 numbers. Data is not as protected or as anonymous as we thought. People are willing to share their whole lives on social media, but when security is mentioned they go bonkers over an issue like this? It is still a scary thought, but where is the line drawn over willing shared information and privacy?
Whitney Grace, January 10, 2014
December 26, 2013
For Obama’s 2012 re-election campaign, his team broke down data silos and moved all the data to a cloud repository. The team built Narwhal, a shared data store interface for all of the campaigns’ application. Narwhal was dubbed “Obama’s White Whale,” because it is almost a mythical technology that federal agencies have been trying to develop for years. While Obama may be hanging out with Queequag and Ishmael, there is a more viable solution for the cloud says GCN’s article, “Big Metadata: 7 Ways To Leverage Your Data In the Cloud.”
Data silo migration may appear to be a daunting task, but it is not impossible to do. The article states:
“Fortunately, migrating agency data to the cloud offers IT managers another opportunity to break down those silos, integrate their data and develop a unified data layer for all applications. In this article, I want to examine how to design metadata in the cloud to enable the description, discovery and reuse of data assets in the cloud. Here are the basic metadata description methods (what I like to think of as the “Magnificent Seven” of metadata!) and how to apply them to data in the cloud.”
The list runs down seven considerations when moving to the cloud: identification, static and dynamic measurement, degree scales, categorization, relationships, and commentary. The only thing that stands in trashing data silos is security and privacy. While this list is useful it is pretty basic textbook information that is applied to metadata in any situation. What makes it so special for the cloud?
Whitney Grace, December 26, 2013