
TemaTres Open Source Vocabulary Server

November 3, 2015

The latest version of the TemaTres vocabulary server is now available, we learn from the company’s blog post, “TemaTres 2.0 Released.” Released under the GNU General Public License version 2.0, the web application helps manage taxonomies, thesauri, and multilingual vocabularies. It can be downloaded from SourceForge. Here’s what has changed since the last release:

*Export your vocabulary to Moodle: vocabularies can now be exported in Moodle Glossary XML format

*Metadata summary for each term and for the whole vocabulary (data about terms, relations, notes, total descendant terms, depth levels, etc.)

*New report: terms with mapping relations, terms by status, preferred terms, etc.

*New report: terms without notes or without a specific type of note

*Import user-defined note types (custom notes) using the tagged file format

*Bulk-select free terms and assign them to another term

*Improve utilities to take terminological recommendations from other vocabularies (more than 300:

*Update Zthes schema to Zthes 1.0 (Thanks to Wilbert Kraan)

*Export the whole vocabulary to Metadata Authority Description Schema (MADS)

*Fixed bugs and improved several functional aspects.

*Uses Bootstrap v3.3.4
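How TemaTres computes its new metadata summary is not described in the release notes. As a rough illustration only, the sketch below assumes a vocabulary held as a hypothetical mapping from each term to its narrower terms, and derives two of the listed statistics: total descendant terms and depth levels. The data and function names are invented for the example.

```python
from typing import Dict, List

# Hypothetical toy vocabulary: each term maps to its narrower terms.
# This is NOT TemaTres's data model; it only illustrates the statistics.
Vocabulary = Dict[str, List[str]]


def count_descendants(vocab: Vocabulary, term: str) -> int:
    """Count every term below `term` (children, grandchildren, and so on)."""
    total = 0
    seen = set()
    stack = list(vocab.get(term, []))
    while stack:
        child = stack.pop()
        if child in seen:  # skip terms already counted (guards against cycles)
            continue
        seen.add(child)
        total += 1
        stack.extend(vocab.get(child, []))
    return total


def depth_levels(vocab: Vocabulary, term: str) -> int:
    """Levels of hierarchy beneath `term`; assumes the hierarchy has no cycles."""
    children = vocab.get(term, [])
    if not children:
        return 0
    return 1 + max(depth_levels(vocab, c) for c in children)


vocab = {
    "Science": ["Biology", "Physics"],
    "Biology": ["Botany", "Zoology"],
    "Zoology": ["Entomology"],
    "Physics": [],
}

for term in ["Science", "Biology", "Physics"]:
    print(term, count_descendants(vocab, term), depth_levels(vocab, term))
# Science 5 3
# Biology 3 2
# Physics 0 0
```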

See the server’s SourceForge page, mentioned above, for the full list of features. Though as of this writing only 21 users had rated the product, all seemed very pleased with the results. The TemaTres website notes that running the server requires some other open source tools: PHP, MySQL, and an HTTP web server. It also specifies that, to update from version 1.82, users should keep the db.tematres.php file but replace the code. To update from TemaTres 1.6 or earlier, first log in as an administrator and update to version 1.7 through Menu -> Administration -> Database Maintenance.

Cynthia Murrell, November 3, 2015

Sponsored by, publisher of the CyberOSINT monograph

Braiding Big Data

October 26, 2015

An apt metaphor for big data is braiding. Braiding requires a person to take three or more locks of hair and weave them together in alternation. The end result is a clean, pretty hairstyle that keeps a person’s hair in place and off the face. Big data is like braiding because specially tailored software takes an unruly mess of data, including the combed and uncombed strands, and organizes it into a legible format. Perhaps this is why TopQuadrant named its popular big data software TopBraid. Read more about the software upgrade in “TopQuadrant Launches TopBraid 5.0.”

TopBraid Suite is an enterprise Web-based solution set that simplifies the development and management of standards-based, model-driven solutions focused on taxonomy, ontology, metadata management, reference data governance, and data virtualization. The newest upgrade for TopBraid builds on the current enterprise information management solutions and adds new options:

“ ‘It continues to be our goal to improve ways for users to harness the full potential of their data,’ said Irene Polikoff, CEO and co-founder of TopQuadrant. ‘This latest release of 5.0 includes an exciting new feature, AutoClassifier. While our TopBraid Enterprise Vocabulary Net (EVN) Tagger has let users manually tag content with concepts from their vocabularies for several years, AutoClassifier completely automates that process.’ “


The AutoClassifier makes it easier to add and edit tags before making them part of the production tag set. Other new features cover TopBraid Enterprise Vocabulary Net (TopBraid EVN), TopBraid Reference Data Manager (RDM), TopBraid Insight, and the TopBraid platform, including improvements in internationalization and a new component for increasing system availability in enterprise environments, TopBraid DataCache.

TopBraid might be the solution an enterprise system needs to braid its data into style.

Whitney Grace, October 26, 2015

Sponsored by, publisher of the CyberOSINT monograph

Funding Granted for American Archive Search Project

September 23, 2015

Here’s an interesting project: we received an announcement about funding for Pop Up Archive: Search Your Sound. A joint effort of the WGBH Educational Foundation and the American Archive of Public Broadcasting, the venture’s goal is nothing less than to make almost 40,000 hours of Public Broadcasting media content easily accessible. The American Archive, now under the care of WGBH and the Library of Congress, has digitized that wealth of sound and video. Now, the details are in the metadata. The announcement reveals:

As we’ve written before, metadata creation for media at scale benefits from both machine analysis and human correction. Pop Up Archive and WGBH are combining forces to do just that. Innovative features of the project include:

*Speech-to-text and audio analysis tools to transcribe and analyze almost 40,000 hours of digital audio from the American Archive of Public Broadcasting

*Open source web-based tools to improve transcripts and descriptive data by engaging the public in a crowdsourced, participatory cataloging project

*Creating and distributing data sets to provide a public database of audiovisual metadata for use by other projects.

“In addition to Pop Up Archive’s machine transcripts and automatic entity extraction (tagging), we’ll be conducting research in partnership with the HiPSTAS center at University of Texas at Austin to identify characteristics in audio beyond the words themselves. That could include emotional reactions like laughter and crying, speaker identities, and transitions between moods or segments.”

The project just received almost $900,000 in funding from the Institute of Museum and Library Services. This loot is on top of the grant received in 2013, from the Corporation for Public Broadcasting, that got the project started. But will it be enough money to develop a system that delivers on-point results? If not, we may be stuck with something clunky, something that resembles the old Autonomy Virage, Blinkx, Exalead video search, or Google YouTube search. Let us hope this worthy endeavor continues to attract funding so that, someday, anyone can reliably (and intuitively) find valuable Public Broadcasting content.

Cynthia Murrell, September 23, 2015

Sponsored by, publisher of the CyberOSINT monograph

Basho Enters Ring With New Data Platform

June 18, 2015

When it comes to enterprise technology these days, it is all about making software compatible with a variety of platforms and needs. Compatibility is the name of the game for Basho, says Diginomica’s article, “Basho Aims For Enterprise Operational Simplicity With New Data Platform.” Basho’s upgrade to its Riak data platform makes it easier to integrate with related tools and simplifies complex operational environments. Data management and automation tools are another big seller for NoSQL enterprise databases, and Basho added those to the Riak upgrade as well. Basho is not the only company trying to improve NoSQL enterprise platforms; others include MongoDB and DataStax. Basho’s advantage is delivering a solution using the Riak data platform.

Basho’s data platform already offers, in nearly automated form, a variety of functions that people try to get working with a NoSQL database: Riak Search with Apache Solr, orchestration services, an Apache Spark connector, integrated caching with Redis, and simplified development using data replication and synchronization.

“CEO Adam Wray released some canned comment along with the announcement, which indicates that this is a big leap for Basho, but also is just the start of further broadening of the platform. He said:

‘This is a true turning point for the database industry, consolidating a variety of critical but previously disparate services to greatly simplify the operational requirements for IT teams working to scale applications with active workloads. The impact it will have on our users, and on the use of integrated data services more broadly, will be significant. We look forward to working closely with our community and the broader industry to further develop the Basho Data Platform.’”

The article explains that the NoSQL market continues to grow and that enterprises need both management and automation to handle the growing number of tasks databases are used for. While no complete solution for all NoSQL needs has been developed, Basho comes fairly close.

Whitney Grace, June 18, 2015

Sponsored by, publisher of the CyberOSINT monograph

Progress in Image Search Tech

April 8, 2015

Anyone interested in the mechanics behind image search should check out the description of PicSeer: Search Into Images from YangSky. The product write-up goes into surprising detail about what sets their “cognitive & semantic image search engine” apart, complete with comparative illustrations. The page’s translation seems to have been done either quickly or by machine, but don’t let the awkward wording in places put you off; there’s good information here. The text describes the competition’s approach:

“Today, the image searching experiences of all major commercial image search engines are embarrassing. This is because these image search engines are

  1. Using non-image correlations such as the image file names and the texts in the vicinity of the images to guess what are the images all about;
  2. Using low-level features, such as colors, textures and primary shapes, of image to make content-based indexing/retrievals.”

With the first approach, they note, trying to narrow the search terms is inefficient because the software is looking at metadata instead of inspecting the actual image; any narrowed search excludes many relevant entries. The second approach simply does not consider enough information about images to return the most relevant, and only the most relevant, results. The write-up goes on to explain what makes the product different, using as its example an endearing image of a smiling young boy:
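To make the second, low-level-features approach concrete, here is a minimal sketch (not PicSeer’s code, and not any commercial engine’s) that indexes images by a coarse color histogram and compares them with histogram intersection. Images are assumed to be plain lists of RGB tuples, and the bin count is arbitrary. The toy example shows the weakness the write-up describes: two pictures with the same colors arranged differently score as identical, because the histogram says nothing about what the picture depicts.

```python
from collections import Counter
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel 0-255


def color_histogram(pixels: List[Pixel], bins_per_channel: int = 4) -> Counter:
    """Quantize each channel into coarse bins; return each bin's share of the pixels."""
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return Counter({bin_: n / total for bin_, n in counts.items()})


def histogram_intersection(h1: Counter, h2: Counter) -> float:
    """Similarity in [0, 1]: the pixel share the two histograms have in common."""
    return sum((min(h1[k], h2[k]) for k in h1.keys() & h2.keys()), 0.0)


# Hypothetical toy "images": blue sky over blue water, the same blues with the
# regions swapped, and a red car. Pixel order stands in for spatial layout.
sky_over_sea = [(30, 60, 200)] * 50 + [(20, 40, 150)] * 50
sea_over_sky = [(20, 40, 150)] * 50 + [(30, 60, 200)] * 50
red_car = [(200, 20, 20)] * 100

print(histogram_intersection(color_histogram(sky_over_sea), color_histogram(sea_over_sky)))  # 1.0
print(histogram_intersection(color_histogram(sky_over_sea), color_histogram(red_car)))  # 0.0
```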

“How can PicSeer have this kind of understanding towards images? The Physical Linguistic Vision Technologies have can represent cognitive features into nouns and verbs called computational nouns and computational verbs, respectively. In this case, the image of the boy is represented as a computational noun ‘boy’ and the facial expression of the boy is represented by a computational verb ‘smile’. All these steps are done by the computer itself automatically.”

See the write-up for many more details, including examples of how Google handles the “boy smiles” query. (Be warned: there’s a very brief section about porn filtering that includes a couple of censored screenshots and adult keyword examples.) It looks like image search technology is progressing apace.

Cynthia Murrell, April 08, 2015

Stephen E Arnold, Publisher of CyberOSINT at

German Spies Eye Metadata

January 13, 2015

Germany’s foreign intelligence arm (BND) refuses to be outdone by our NSA. The World Socialist Web Site reports, “German Foreign Intelligence Service Plans Real-Time Surveillance of Social Networks.” The agency plans to invest €300 million by 2020 to catch up to the (Snowden-revealed) capabilities of U.S. and U.K. agencies. The stated goal is to thwart terrorism, of course, but reporter Sven Heymann is certain the initiative has more to do with tracking political dissidents who oppose the austerity policies of recent years.

Whatever the motivation, the BND has turned its attention to the wealth of information to be found in metadata. Smart spies. Heymann writes:

“While previously, there was mass surveillance of emails, telephone calls and faxes, now the intelligence agency intends to focus on the analysis of so-called metadata. This means the recording of details on the sender, receiver, subject line, and date and time of millions of messages, without reading their content.

“As the Süddeutsche Zeitung reported, BND representatives are apparently cynically attempting to present this to parliamentary deputies as the strengthening of citizens’ rights and freedoms in order to sell the proposal to the public.”

“In fact, the analysis of metadata makes it possible to identify details about a target person’s contacts. The BND is to be put in a position to know who is communicating with whom, when, and by what means. As is already known, the US sometimes conducts its lethal and illegal drone attacks purely on the basis of metadata.”
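The claim that metadata alone reveals “who is communicating with whom, when, and by what means” is easy to demonstrate. The sketch below uses invented records carrying only envelope fields (sender, receiver, channel, timestamp), builds a contact graph without touching any message content, and lists a person’s most frequent contacts and the timeline of one relationship. It is a generic illustration, not a description of BND or NSA tooling; all names are hypothetical.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Record:
    """Envelope metadata only; no subject line or message body is stored."""
    sender: str
    receiver: str
    channel: str  # means of communication, e.g. "email" or "phone"
    timestamp: datetime


def contact_graph(records: List[Record]) -> Dict[str, Counter]:
    """Map each party to a Counter of the parties they communicated with."""
    graph: Dict[str, Counter] = defaultdict(Counter)
    for r in records:
        # Count the link in both directions: who talks with whom.
        graph[r.sender][r.receiver] += 1
        graph[r.receiver][r.sender] += 1
    return graph


def top_contacts(graph: Dict[str, Counter], person: str, n: int = 3) -> List[Tuple[str, int]]:
    """The person's most frequent contacts, highest count first."""
    return graph.get(person, Counter()).most_common(n)


def timeline(records: List[Record], a: str, b: str) -> List[Tuple[datetime, str]]:
    """When, and by what means, two parties were in contact, in time order."""
    return sorted((r.timestamp, r.channel) for r in records if {r.sender, r.receiver} == {a, b})


# Invented example records.
records = [
    Record("alice", "bob", "email", datetime(2015, 1, 5, 9, 0)),
    Record("bob", "alice", "phone", datetime(2015, 1, 5, 21, 30)),
    Record("alice", "carol", "email", datetime(2015, 1, 6, 8, 15)),
    Record("alice", "bob", "email", datetime(2015, 1, 7, 23, 45)),
]
graph = contact_graph(records)
print(top_contacts(graph, "alice"))  # [('bob', 3), ('carol', 1)]
print([channel for _, channel in timeline(records, "alice", "bob")])  # ['email', 'phone', 'email']
```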

The article tells us the BND is also looking into the exploitation of newly revealed security weaknesses in common software, as well as tools to falsify biometric-security images (like fingerprints or iris scans). Though Germany’s intelligence agents are prohibited by law from spying on their own people, Heymann has little confidence that rule will be upheld. After all, so is the NSA.

Cynthia Murrell, January 13, 2015

Sponsored by, developer of Augmentext

Now Entering the Age of Web Experience Management

November 28, 2014

As the Internet grows and evolves, the features users expect from search and content management systems are changing. SearchContentManagement addresses the shift in “Semantic Technologies Fuel the Web Experience Wave.” As the title suggests, writer Geoffrey Bock sees this shift as opening a new area with a new set of demands: “web experience management” (WEM), which goes beyond “web content management” (WCM).

The inclusion of metadata and contextual information makes all the difference. For example, the information displayed by an airline’s site should, he posits, be different for a user working at their PC, who may want general information, and someone using their phone in the airport parking lot, where they probably need to check their gate number or see whether their flight has been delayed. (Bock is disappointed that none of the airlines’ sites yet work this way.)
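Bock’s airline example boils down to choosing content by context. Here is a minimal rule-based sketch under loudly stated assumptions: the context fields (device type, and whether the user is near the departure airport) and the content block names are hypothetical. A real WEM platform would derive such signals from device metadata and geo-codes, as the article notes, and would likely use far richer rules.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Context:
    device: str         # hypothetical values: "desktop" or "phone"
    near_airport: bool  # hypothetically derived from the device's geo-code


def select_content(ctx: Context) -> List[str]:
    """Return hypothetical content blocks to display, most urgent first."""
    if ctx.device == "phone" and ctx.near_airport:
        # Traveler in the airport parking lot: only time-critical information.
        return ["gate_number", "flight_status"]
    if ctx.device == "phone":
        return ["flight_status", "check_in", "bookings"]
    # Desktop user: general information and planning tools.
    return ["fare_search", "destinations", "bookings", "flight_status"]


print(select_content(Context("phone", True)))   # ['gate_number', 'flight_status']
print(select_content(Context("desktop", False)))  # ['fare_search', 'destinations', 'bookings', 'flight_status']
```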

The article continues:

“Not surprisingly, to make contextually aware Web content work correctly, a lot of intelligence needs to be added to the underlying information sources, including metadata that describes the snippets, as well as location-specific geo-codes coming from the devices themselves. There is more to content than just publishing and displaying it correctly across multiple channels. It is important to pay attention to the underlying meaning and how content is used — the ‘semantics’ associated with it.

“Another aspect of managing Web experiences is to know when you are successful. It’s essential to integrate tracking and monitoring capabilities into the underlying platform, and to link business metrics to content delivery. Counting page views, search terms and site visitors is only the beginning. It’s important for business users to be able to tailor metrics and reporting to the key performance indicators that drive business decisions.”

Bock supplies an example of one company, specialty-plumbing supplier Uponor, that is making good use of such “WEM” possibilities. See the article for more details on his strategy for leveraging the growing potential of semantic technology.

Cynthia Murrell, November 28, 2014

Sponsored by, developer of Augmentext

Make Your Own Metadata Webinar

November 24, 2014

Here is a unique idea we had not heard of before: “Build Your Own Canto Metadata Webinar.” Canto is a company that specializes in digital asset management, and its award-winning Cumulus software is an industry favorite for managing taxonomies and metadata for digital content. People often forget how important metadata is to Web content:

“Metadata lets you do more with your digital content.

Metadata can save you from copyright lawsuits.

Metadata can speed your everyday workflow.”

The webinar is advertised as a way to help people understand what exactly metadata is, how they can harness it to their advantage, and how to engage more people in using it. While anyone can teach a webinar about metadata, Canto is building the entire session around users’ questions. Attendees will be able to tweet questions before and during the meeting.

The webinar is led by three metadata experts: Thomas Schleu, CTO and co-founder of Canto; Phoenix Von Lieven, Director of Professional Services, Americas, at Canto; and Danielle Forshtay, Publications Coordinator at Lockheed Martin. These experts will lend their knowledge to attendees.

“Build Your Own Canto Webinar” is an odd way to advertise an online class about metadata. Why is it called “build your own”? Are the attendees shaping the class’s content entirely? The question bears further investigation by attending the webinar on November 19.

Whitney Grace, November 24, 2014
Sponsored by, developer of Augmentext

Version 6 of Varonis Metadata Framework to Be Generally Available by Year’s End

November 18, 2014

The CNN Money article “Varonis Announces Metadata Framework Version 6, Including New Functionality For Four Varonis Solutions” explores the new features of Version 6. Varonis, a leading software provider, focuses on human-generated data that is unstructured and might include anything from spreadsheets to emails to text messages. It can boast over 3,000 customers in fields as varied as healthcare, media, and financial services. The Varonis Metadata Framework has been perfected over the last decade. The article describes it this way:

“ [It is ] a single platform on a unifying code base, purpose-built to tackle the many challenges and use cases that arise from the massive volumes of unstructured data files created and stored by organizations of all sizes. Currently powering five distinct Varonis products, the Varonis Metadata Framework intelligently extracts and analyzes metadata from customers’ vast, distributed unstructured data stores, and enables a variety of uses cases, including data governance, data security, archiving, file synchronization, enhanced mobile data accessibility, search, and business collaboration.”

Exciting new features in Version 6 include a search API for DatAnswers, “bi-directional permissions visibility” for DatAdvantage to reduce operational overhead, and reduced risk through DatAlert, which supplies information on malware location and timing.

Chelsea Kerwin, November 18, 2014

Sponsored by, developer of Augmentext

Learn About the Open Source Alternative to ClearForest

January 22, 2014

Did you know that there was an open source version of ClearForest called Calais? Neither did we, until we read about it in the article posted on OpenCalais called “Calais: Connect. Everything.” Along with a short instructional video, the page offers a text explanation of how the software works. The OpenCalais Web Service automatically creates rich semantic metadata, using natural language processing, machine learning, and other methods to analyze submitted content. A list of tags is generated and returned to the user for review, and the user can then paste them into other documents.

The metadata can be put to use in a variety of ways:

“The metadata gives you the ability to build maps (or graphs or networks) linking documents to people to companies to places to products to events to geographies to… whatever. You can use those maps to improve site navigation, provide contextual syndication, tag and organize your content, create structured folksonomies, filter and de-duplicate news feeds, or analyze content to see if it contains what you care about.”
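The “maps” in the quote can be pictured as a graph linking documents to the entities tagged in them. The sketch below does not call the OpenCalais API; the tag sets are invented stand-ins for what an extraction service might return. It only shows how, once entity tags exist, related documents fall out of shared entities, which is the basis for uses such as navigation, contextual syndication, and de-duplication.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical entity tags, standing in for an extraction service's output.
doc_entities: Dict[str, Set[str]] = {
    "doc1": {"Thomson Reuters", "London", "acquisition"},
    "doc2": {"Thomson Reuters", "ClearForest"},
    "doc3": {"London", "acquisition", "Olympics"},
}


def entity_index(docs: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Invert document -> entities into entity -> documents."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc, entities in docs.items():
        for entity in entities:
            index[entity].add(doc)
    return index


def related_documents(doc: str, docs: Dict[str, Set[str]],
                      index: Dict[str, Set[str]]) -> List[Tuple[str, int]]:
    """Rank other documents by how many entities they share with `doc`."""
    shared: Dict[str, int] = defaultdict(int)
    for entity in docs[doc]:
        for other in index[entity]:
            if other != doc:
                shared[other] += 1
    # Most shared entities first; ties broken alphabetically for stable output.
    return sorted(shared.items(), key=lambda kv: (-kv[1], kv[0]))


index = entity_index(doc_entities)
print(related_documents("doc1", doc_entities, index))  # [('doc3', 2), ('doc2', 1)]
```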

The OpenCalais Web Service relies on a dedicated community to keep making progress and pushing the application forward. Calais takes the same approach as other open source projects, except this one is powered by Thomson Reuters.

Whitney Grace, January 22, 2014
Sponsored by, developer of Augmentext
