Data Darkness

May 28, 2015

According to Datameer, organizations do not use a large chunk of their data, and this unused data is commonly referred to as “dark data.” “Shine Light On Dark Data” explains that organizations are trying to dig out their dark data and use it for business intelligence or, in more recent terms, big data. Dark data is created by back-end business processes as well as by regular business activities. It is usually kept on a storage silo in a closet and retained only for compliance audits.

Dark data has a lot of hidden potential:

“Research firm IDC estimates that 90 percent of digital data is dark. This dark data may come in the form of machine or sensor logs that when analyzed help predict vacated real estate or customer time zones that may help businesses pinpoint when customers in a specific region prefer to engage with brands. While the value of these insights are very significant, setting foot into the world of dark data that is unstructured, untagged and untapped is daunting for both IT and business users.”

The article suggests making a plan to harness dark data, but it offers little guidance on approaching such a project beyond tailoring it specifically to dark data: identify the sources, use Hadoop to mine them, and test the results against other data sets.

This article is really a puff piece that highlights dark data without going into much detail about it. It seems to forget the biggest movement in IT of the past three years: big data!

Whitney Grace, May 28, 2015

Sponsored by, publisher of the CyberOSINT monograph


Altiar And dtSearch Combine

May 27, 2015

Sometimes when items are combined they create something even better, such as Oreos and peanut butter, Disney and Marvel, and Netflix and original series. EContentMag alerted us that a new team-up is underway between two well-known companies. The press release title says it all: “Altiar Cloud-Based ECM Platform Is Embedding The dtSearch Engine.” Altiar is an enterprise content management platform designed specifically for Microsoft Azure. The popular dtSearch platform has been searching through terabytes since 1991 and is regarded as a powerful search tool. Embedding dtSearch into the Altiar core will make it a more powerful ECM.

Altiar is a popular ECM, and dtSearch can only improve it:

“A cloud-based service, Altiar includes rapid setup, scalability, and storage. It can accept any type of file, from PowerPoint to streaming video, as well as providing a host of tools and services to create custom content pages, newsletters, personal zones, and the like. The platform lets users not only access content from any connected device, but also manage, share, and track content, including features like email alerts.”

Microsoft is not a main player in cloud computing, and Microsoft Azure is supposed to drive more customers its way. Anything that improves the platform, like Altiar’s improved search, will make it more appealing.

Whitney Grace, May 27, 2015

Hadoop Has Accessories

May 25, 2015

ZDNet’s article, “Why Hadoop Is Hard, And How To Make It Easier,” alludes to the idea that Hadoop was going to disappear at some point. We don’t know about you, but the open source big data platform has a huge support community, and hundreds, if not thousands, of companies have deployed Hadoop. The article argues otherwise, citing a recent Gartner survey that found only 26 percent of the corporate world is actively using it.

One of the biggest roadblocks for Hadoop is that it is designed for specialists to tinker with; it is not an enterprise tool. That might change when Microsoft releases SQL Server 2016. With the new server, Microsoft will add PolyBase, which bridges Hadoop and SQL Server. Microsoft still dominates enterprise systems, and when this upgrade becomes available, Hadoop will be a more viable enterprise option.

What is the counterpoint?

“It’s also a counterpoint to the interpretation of Gartner’s survey that says Hadoop is somehow languishing. What’s languishing is the Enterprise’s willingness to invest in a new, premium skill set, and the low productivity involved in working with Hadoop through its motley crew of command-line shells and scripting languages. A good data engine should work behind the scenes and under the covers, not in the spotlight.”

So, once again, enterprise systems need to be updated, much as Hadoop needs to be augmented with add-ons that make it more accessible, such as mature analytics tools, DBMS abstraction layers, and Hadoop-as-a-Service cloud offerings.

Whitney Grace, May 25, 2015

Peruse Until You Are Really Happy

May 22, 2015

Have you ever needed to quickly locate a file that you just know you made, but were unable to find it on your computer, cloud storage, tablet, smartphone, or company shared drive? Even worse is when your search query does not pick up any of your keywords! What are you supposed to do then? VentureBeat might have the answer to your problems, as explained in the article “Peruse Is A New Natural Language Search Tool For Your Dropbox And Box Files.” Peruse is a search tool that lets users find their files and information by phrasing queries the way they naturally talk.

Natural language querying is already a big market for business intelligence software, but it is not as common in file sharing services. Peruse is a startup whose software can search Dropbox and Box accounts using a regular question. If you ask, “Where is the marketing data from last week?” the software can pull up the information for you without you even opening the file. Right now, Peruse can only find information in spreadsheets, but the company is working on expanding the supported file types.

“The way we index these files is we actually look at them visually — it understands them in a way a person would understand them,” said [co-founder and CEO Luke Gotszling], who is showing off Peruse…”

Peruse’s goal is to change the way people use document search. Document search has remained largely unchanged since 1995; twenty years later, Gotszling believes it is time for a big change. Gotszling is right: document search has stayed the same, while Web search changes every day.

Whitney Grace, May 22, 2015

Preserves Online Information

May 18, 2015

Today’s information seekers use the Internet the way some of us used reference books growing up. Unlike the paper tomes on our dusty bookshelves, however, websites can change their content without so much as a by-your-leave. Suggestions for preserving online information can be found in “Create Publicly Available Web Page Archives with” at

Writer Martin Brinkmann begins by listing several local options familiar to many of us. There’s Ctrl-s, of course, and assorted screenshot-saving methods. Website archivers like Httrack perform their own crawls and save the results to the user’s local machine. Remotely, automatically creates snapshots of prominent sites, but users cannot control the results. Enter Brinkmann writes: is a free service that helps you out. To use it, paste a web address into the form on the services main page and hit submit url afterwards. The service takes two snapshots of that page at that point in time and makes it available publicly. The first takes a static snapshot of the site. You find images, text and other static contents included while dynamic contents and scripts are not. The second snapshot takes a screenshot of the page instead. An option to download the data is provided. Note that this downloads the textual copy of the site only and not the screenshot. A Firefox add-on has been created for the service which may be useful to some of its users. It creates automatic snapshots of every web page that you bookmark in the web browser after installation of the add-on.”

Wow, don’t set and forget that Firefox option! In fact, the article cautions, be mindful of the public availability of every snapshot; Brinkmann reasonably suggests the tool could benefit from a password feature. Still, this could be an option to preserve important (but, for the prudent, impersonal) information found online.

Cynthia Murrell, May 18, 2015

Behind The Google X Doors

May 18, 2015

Google X is Google’s top-secret laboratory, where the company develops new, innovative technology projects. The main purpose behind Google X is to make technology more adaptable and useful and to improve people’s lives. Google Glass was one of its projects, as is Project Loon, in which giant, high-altitude balloons are released into the sky to bring Internet service to rural areas. Also, do not forget the driverless car. eWeek has listed “10 Bold Google X Projects Aiming For Tech Breakthroughs,” exploring the new wonders that could eventually be available to you or me.

Are you interested in cleaner, renewable energy? So are the folks at Makani Power, a Google X project that builds wind turbines and makes them airborne using kites. The airborne turbines generate energy for human consumption. While energy is important for modern life, health is a big issue too.

Google X has four projects dedicated to learning more about the human body and disease. One is a contact lens that measures glucose levels in tears, so diabetics will not have to prick themselves with needles to check their sugar levels. The Baseline Study project analyzes medical information and uses genomics to define what the human body actually is. This project’s goal is to predict major diseases before their onset. Lift Labs, acquired in 2014, invented a spoon device that counteracts the tremors of Parkinson’s disease. The most astounding project seems straight out of a science-fiction novel:

“Google X is in the nanoparticles business. The company in October unveiled a platform that uses nanoparticles to detect disease. In January, it followed that up with the announcement of the creation of synthetic skin as a proof-of-concept to show what nanoparticle technology might achieve in human biology and health.”

Nanoparticles?  Self-driving cars? Wind turbines on kites?  What will Google X work on next?

Whitney Grace, May 18, 2015

Popular and Problematic Hadoop

May 15, 2015

We love open source on principle, and Hadoop is indeed an open-source powerhouse. However, any organization considering a Hadoop system must understand how tricky implementation can be, despite the hype. A pair of writers at GCN asks and answers the question, “What’s Holding Back Hadoop?” The brief article reports on a recent survey of data management pros by data-researcher TDWI. Reporters Troy K. Schneider and Jonathan Lutton explain:

“Hadoop — the open-source, distributed programming framework that relies on parallel processing to store and analyze both structured and unstructured data — has been the talk of big data for several years now.  And while a recent survey of IT, business intelligence and data warehousing leaders found that 60 percent will [have] Hadoop in production by 2016, deployment remains a daunting task. TDWI — which, like GCN, is owned by 1105 Media — polled data management professionals in both the public and private sector, who reported that staff expertise and the lack of a clear business case topped their list of barriers to implementation.”

The write-up supplies a couple of bar graphs of survey results, including the top obstacles to implementation and the primary benefits of going to the trouble. Strikingly, only six percent of respondents say there’s no Hadoop in their organizations’ foreseeable future. Though not covered in the GCN write-up, the full 43-page report includes word on best practices and implementation trends; it is available for download (registration required).

Cynthia Murrell, May 15, 2015

The Latest SharePoint News from Ignite

May 14, 2015

The Ignite conference in Chicago has answered many of the questions that SharePoint users have been curious about for months now. Among them was the question of release timing and features for the latest iteration of SharePoint. CMS Wire gives a rundown in its article, “What’s Up With SharePoint? #MSIgnite.”

The article sums up the biggest news:

“Microsoft will continue to enhance the core offerings in the on-premises edition. It will also continue to develop SharePoint Online and update it as quickly as the updates are available. A preview version of SharePoint 2016 will be made available later this summer, with a beta version expected by the end of the year . . . In an afternoon session entitled Evolution of SharePoint Overview and Roadmap, the duo gave a rough outline of Microsoft’s plans, albeit without precise delivery dates.”

Having had to push back delivery dates once already, Microsoft is likely hesitant to announce anything solid until development is final. As for the new version’s features, Microsoft is focusing on user experience, extensibility, and SharePoint management. The inclusion of user experience should be a welcome change for many. To stay in touch with developments as they become available, keep an eye on, and particularly his feed devoted to SharePoint. Stephen E. Arnold has made a lifelong career out of all things search, and he has a knack for distilling the “need to know” facts that keep an organization on track.

Emily Rae Aldridge, May 14, 2015

The Forgotten List of Telegraph

May 13, 2015

Technology experts and information junkies in the European Union are in an uproar over a ruling that forces Google to remove specific information from search results. The “right to be forgotten” policy upheld by the EU is supposed to help people who want “inadequate, irrelevant, or no longer relevant” information removed from Google search results. Many news outlets in Europe have been affected, including the United Kingdom’s Telegraph. The Telegraph has been keeping a list, “Telegraph Stories Affected By ‘EU Right To Be Forgotten’,” of all its stories that Google has been forced to remove from search results.

According to the article, Google has received over 250,000 requests to remove information. Some of these requests concern stories published by the Telegraph. While many oppose the “right to be forgotten,” including the House of Lords, others still uphold the policy:

“But David Smith, deputy commissioner and director of data protection for the Information Commissioner’s Office (ICO), hit back and claimed that the criticism was misplaced, ‘as the initial stages of its implementation have already shown.’ ”

Many of the “to be forgotten” requests concern people with criminal pasts and misdeeds that color them in a bad light. The Telegraph’s content might be removed from Google, but the paper is keeping a long, long list on its website. Read the stories there or head on over to the US Google website; freedom of the press still holds true here.

Whitney Grace, May 13, 2015

The Philosophy of Semantic Search

May 13, 2015

The article “Taking Advantage of Semantic Search NOW: Understanding Semiotics, Signs, & Schema” on LunaMetrics delves into semantics on a philosophical and linguistic level as well as with regard to business. The author traces the emergence of semantic search, beginning with Ray Kurzweil’s interest in machines learning meaning as opposed to simpler keyword search. To help readers fully grasp this concept, the author provides a brief refresher on Saussure’s semiotics.

“a Sign is comprised of a signifier, or the name of a thing, and the signified, what that thing represents… Say you sell iPad accessories. ‘iPad case’ is your signifier, or keyword in search marketing speak. We’ve abused the signifier to the utmost over the years, stuffing it onto pages, calculating its density with text tools, jamming it into title tags, in part because we were speaking to robot who read at a 3-year-old level.”

To create meaning, we must go beyond merely adding a price tag and a picture to create a sign. The article argues for schema: adding some indication of who and what the thing is for. The author, Michael Bartholow, has a background in linguistics, marketing, and search engine optimization. His article ends by asking when linguists, philosophers, and humanists will be invited into the conversation with businesses, perhaps making him a true visionary in a field populated by data engineers with tunnel vision.

Chelsea Kerwin, May 13, 2015
