
The History of ZyLab

February 10, 2016

Big data was a popular buzzword a few years ago, which made it seem like a brand new innovation. The eDiscovery process, however, has been around for several decades; recent technology advancements have allowed it to take off and be implemented in more industries. While many big data startups have sprung up, ZyLab, a leading innovator in eDiscovery and information governance, started its big data venture in 1983. ZyLab created a timeline detailing its history called “ZyLab’s Timeline Of Technical Ingenuity.”

Even though ZyLab was founded in 1983 and introduced the ZyIndex, its big data products did not really take off until the 1990s, when personal computers became an indispensable industry tool. In 1995, ZyLab made history when its software was used in the OJ Simpson and Unabomber investigations. Three years later it introduced text search in images, which is now a standard feature of search engines.

Things really began to take off for ZyLab in the 2000s, as technology advanced to the point where companies could easily create and store data, which in turn produced masses of unstructured data. Advanced text analytics were added in 2005, and ZyLab made history again by becoming the standard for United Nations war crimes tribunals.

From 2008 onward, ZyLab’s milestones were more technological: creating the ZyIMAGE SharePoint connector and Google Web search engine integration, introducing the ZyLab Information Management Platform, becoming the first to offer integrated machine translation in eDiscovery, adding audio search, and incorporating true native visual search and categorization.

ZyLab continues to deliver both historic and market innovations in eDiscovery and big data.


Whitney Grace, February 10, 2016
Sponsored by, publisher of the CyberOSINT monograph

Jargon Watch: De-Risking

February 8, 2016

“De-Risking Technology Projects” presents some interesting factoids; for example: “Fewer than one in three software projects present successful outcomes.”

The factoid comes from a mid tier consulting firm’s “Chaos” report. The diligent folks who did the research analyzed 50,000 projects.

But the hook which snagged me was the use of the term “de-risking.” The idea is that one takes an assignment at work, works on it, and keeps one’s job even if the project goes down in flames.

How can this state of regular paycheck nirvana be achieved? The write up offers some advice which is obvious and probably has been embraced by those who crank out a collapsing bridge or a search and content processing system which cannot locate information or keep pace with inflows of content.

Here are the tips in case you napped during one of your business school lectures:

  • Balance scope and time available
  • Figure out how and what to deliver
  • Design and implement the solution
  • Prioritize simplicity and performance

Now how does one get from high rates of failure to success?

Let’s consider implementing a search, content processing, and discovery solution. Most of the information access systems which I have examined deliver disappointment. Years ago I reported on the satisfaction which users of enterprise search systems reported. The rate of dissatisfaction fell somewhere between 55 and 75 percent of users.

This means that if only about one third of enterprise software projects like search and content processing succeed, even the projects which survive crank out astounding numbers of users who are not happy with the deployed system.
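The arithmetic can be sketched as a rough back-of-the-envelope calculation. It takes the "fewer than one in three" factoid quoted above as a success rate of one third and assumes, purely for illustration, that the 55 to 75 percent dissatisfaction range applies evenly to the users of projects which survive. The function and variable names are hypothetical.

```python
from fractions import Fraction


def satisfied_share(success_rate, dissatisfaction_rate):
    """Chance that a project both succeeds and leaves a given user satisfied.

    Assumption: the two rates are independent, which neither the
    "Chaos" factoid nor the satisfaction figures establish.
    """
    return success_rate * (1 - dissatisfaction_rate)


success = Fraction(1, 3)  # upper bound from "fewer than one in three"
best = satisfied_share(success, Fraction(55, 100))   # 55 percent unhappy
worst = satisfied_share(success, Fraction(75, 100))  # 75 percent unhappy

print(f"{float(worst):.1%} to {float(best):.1%}")
```

Under these assumptions, somewhere between roughly one in twelve and one in seven projects ends with a given user both served by a working system and happy with it.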

The question “How does one make an enterprise search and content processing system a success?” calls into question the products, interfaces, and functionality of many vendors’ work.

My view is that users cope. The belief that information access technology is making corporate work a joy is widely held. Like some other beliefs, reality may not match up.

Wonder why vendors are embracing open source technology? It is part of the de-risking approach. Let others figure out how to fix this stuff.

Does de-risking deliver excellence? In my experience, nope. Jargon is a means of closing a deal. Making something work for its users is a different challenge.

Stephen E Arnold, February 8, 2016

How Often Do You Use Vocal Search

February 8, 2016

Vocal search is an idea from the future: you give a computer a query and it returns relevant information. However, vocal search has become an actual “thing” with mobile assistants like Siri and Cortana and with built-in NLP engines on newer technology. I enjoy using vocal search because it saves me from having to type my query on a tiny keyboard, but when I’m in a public place I don’t use it for privacy reasons. Search Engine Watch asks the question, “What Do You Need To Know About Voice Search?” and provides answers to more questions about vocal search.

Northstar Research conducted a study which found that 55 percent of US teens use vocal search, while only 41 percent of US adults do. An even funnier fact is that 56 percent of US adults use the search function only because it makes them feel tech-savvy.

Vocal Search is extremely popular in Asia due to the different alphabets.  Asian languages are harder to type on a smaller keyboard.  It is also a pain on Roman alphabet keyboards!

Tech companies are currently working on new innovations with vocal search.  The article highlights how Google is trying to understand the semantic context behind queries for intent and accuracy.

“Superlatives, ordered items, points in time and complex combinations can now be understood to serve you more relevant answers to your questions…These ‘direct answers’ provided by Google will theoretically better match the more natural way that people ask questions in speech rather than when typing something into a search bar, where keywords can still dominate our search behaviour.”

It translates to a quicker way to access information and answer common questions without having to type on a keyboard.  Now it would be a lot easier if you did not have to press a button to activate the vocal search.

Whitney Grace, February 8, 2016

Bing Searches for Continuous Development

February 5, 2016

I read “Microsoft Shifts Bing Search Engine To ‘Continuous’ Development Cycle.” Frankly I had never considered the frequency of Bing updates. I do pay attention when Microsoft relies on Baidu or Yandex for search. I may or may not notice when Bing “hides” its shopping service. I have given up trying to locate Microsoft academic search and trying to figure out how to eliminate pop culture references from a Bing results set. In short, I know about Bing, but I don’t think about Bing unless I read articles like “Bing Search for Android Gets New Design and Lots of Bugs in Latest Update.”

Recently Bing realized that it was not making modifications to the site quickly enough. I learned:

The Bing team has openly stated that it was finding its deployment cycle was limiting innovation.

The idea is that Bing will just get better more quickly. Okay, that sounds good. I learned also:

Some people call this learning to fail fast i.e. get features tested and only keep the stuff that works.

I took another look at the write up. The author is a “contributor” to Forbes. Does this mean that the write up is an advertorial? That’s okay, but the conclusion left me scratching my head:

Quite why Bing isn’t the new Google is another topic altogether. Microsoft may never challenge the search giant’s simplicity, functionality and query intelligence – or it might, we don’t know. What we do know is that software updates have to work a whole lot faster than they used to and only the successful ‘code shops’ will now follow this pattern.

My thoughts on why Bing lags behind Google boil down to:

  1. The Bing index strikes me as less robust than Google’s
  2. The Bing system does not deliver results that give me access to content on sites which are smaller and often quite difficult to find via the Bing tools.

Google is not perfect, so I rely on, Yandex, and other systems. Bing is not a second choice for me. Speed of code changes is, like many of my Bing search query results, irrelevant.

Stephen E Arnold, February 5, 2016

It’s Official: Facebook and the Dark Web

February 5, 2016

A piece from Nextgov suggests just how ubiquitous the Dark Web could become. Published as “Facebook is giving users a new way to access it on the ‘Dark Web’,” this article tells us “a sizeable community” of Facebook’s users are also Dark Web users; Facebook has not released exact figures. Why are people using the Dark Web for everyday internet browsing purposes? The article states:

“Facebook’s Tor site is one way for people to access their accounts when the regular Facebook site is blocked by governments—such as when Bangladesh cut off access to Facebook, its Messenger and Whatsapp chat platforms, and messaging app Viber for about three weeks in November 2015. As the ban took effect, the overall number of Tor users in Bangladesh spiked by about 10 times, to more than 20,000 a day. When the ban was lifted, the number dropped back to its previous level.”

Public perception of the darknet is changing. If any metric lends credibility to the idea that the Dark Web is increasingly used for mainstream purposes, it is Facebook adding a .onion address. Individuals’ desire for security and for uninterrupted, expansive internet access will only add to the Dark Web’s user base. While the Silk Road-type element is sure to remain as well, it will be interesting to see how things evolve.


Megan Feil, February 5, 2016

Elasticsearch Works for Us 24/7

February 5, 2016

Elasticsearch is one of the most popular open source search applications and it has been deployed for personal as well as corporate use.  Elasticsearch is built on another popular open source application called Apache Lucene and it was designed for horizontal scalability, reliability, and easy usage.  Elasticsearch has become such an invaluable piece of software that people do not realize just how useful it is.  Eweek takes the opportunity to discuss the search application’s uses in “9 Ways Elasticsearch Helps Us, From Dawn To Dusk.”

“With more than 45 million downloads since 2012, the Elastic Stack, which includes Elasticsearch and other popular open-source tools like Logstash (data collection), Kibana (data visualization) and Beats (data shippers) makes it easy for developers to make massive amounts of structured, unstructured and time-series data available in real-time for search, logging, analytics and other use cases.”

How is Elasticsearch being used? The Guardian’s readers use it daily to interact with content, Microsoft Dynamics ERP and CRM use it to index and analyze social feeds, it powers Yelp, and here is a big one: Wikimedia uses it to power the well-loved and well-used Wikipedia. We can already see how much of an impact Elasticsearch makes on our daily lives without our being aware of it. Other companies that use Elasticsearch for our and their benefit are Hotels Tonight, Dell, Groupon, Quizlet, and Netflix.

Elasticsearch will continue to grow as an inexpensive alternative to proprietary software, and the number of Web services and companies that use it will only increase.

Whitney Grace, February 5, 2016

Bing Clocks Search Speed

February 4, 2016

Despite attempts to improve Bing, it still remains the laughing stock of search engines.  Google has run it over with its self-driving cars multiple times.   DuckDuckGo tagged it as the “goose,” outran it, and forced Bing to sit in the proverbial pot.  Facebook even has unfriended Bing.  Microsoft has not given up on its search engine, so while there has been a list of novelty improvements (that Google already did or copied not long after their release) it has a ways to go.

Windows Central tells about the most recent Bing development: a bandwidth speed test in “Bing May Be Building A Speed Test Widget Within Search Results.” Now that might be a game changer for a day, until Google releases its own version. Usually to test bandwidth, you have to search for a Web site that provides the service. Bing might do it on command within every search results page. Not a bad idea, especially if you want to see how quickly your Internet runs, how long it takes to process your query, or if you are troubleshooting your Internet connection.
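For readers wondering what such a widget measures, the arithmetic behind a download speed test is simple: bytes transferred, converted to bits, divided by elapsed seconds. The sketch below is a minimal illustration under assumed numbers; the function name and the sample values are hypothetical and say nothing about how Bing implements its test.

```python
def throughput_mbps(bytes_transferred: int, elapsed_seconds: float) -> float:
    """Return throughput in megabits per second (1 Mb = 1,000,000 bits)."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    bits = bytes_transferred * 8  # network speeds are quoted in bits
    return bits / elapsed_seconds / 1_000_000


# Hypothetical sample: 25 MB downloaded in 4 seconds.
print(throughput_mbps(25_000_000, 4.0))  # 50.0
```

A real test would time an actual transfer and repeat it several times, since a single sample is easily skewed by caching or a busy network.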

The bandwidth test widget is not available just yet:

“A reader of the site Kabir tweeted a few images displaying widget like speed test app within Bing both on the web and their phone (in this case an iPhone). We were unable to reproduce the results on our devices when typing ‘speed test’ into Bing. However, like many new features, this could be either rolling out or simply A/B testing by Microsoft.”

Keep your fingers crossed that Microsoft releases a useful and practical widget. If not, just go to Google and search for “bandwidth test.”


Whitney Grace, February 4, 2016

Multimedia Data Mining

February 3, 2016

I read “Knowledge Discovery using Various Multimedia Data Mining Technique.” The write up is an Encyclopedia Britannica type summary of the components required to make sense of audio and video.

I noted this passage:

In this paper, we addressed data mining for multimedia data such as text, image, video and audio. In particular, we have reviewed and analyzed the multimedia data mining process with different tasks. This paper also described the clustering models using video for multimedia mining.

The methods used by the systems the author considered are the same numerical recipes which most search vendors know, love, and rely upon while ignoring the known biases of the methods: regression, time series, etc.

My take away is that talk about making sense of the flood of rich media is a heck of a lot easier than processing the video uploaded to Facebook and YouTube in a single hour.

The write up does not mention companies working in this farm yard. There are some nifty case studies to reference as well; for example, Exalead’s video search and my touchstone, Google YouTube and Google Video Search. Blinkx (spun out of Autonomy, a semi famous search outfit) is a juicy tale as well.

In short, if you want to locate videos, you have to use multiple tools, ask people where a video may be found, or code your own solution.

Stephen E Arnold, February 3, 2016

The Enterprise and Online Anonymity Networks

February 3, 2016

An article entitled “Tor and the enterprise 2016 – blocking malware, darknet use and rogue nodes” from Computer World UK discusses the inevitable enterprise concerns related to anonymity networks. Tor, The Onion Router, has gained steam with mainstream internet users in the last five years. According to the article,

“It’s not hard to understand that Tor has plenty of perfectly legitimate uses (it is not our intention to stigmatise its use) but it also has plenty of troubling ones such as connecting to criminal sites on the ‘darknet’, as a channel for malware and as a way of bypassing network security. The anxiety for organisations is that it is impossible to tell which is which. Tor is not the only anonymity network designed with ultra-security in mind, The Invisible Internet Project (I2P) being another example. On top of this, VPNs and proxies also create similar risks although these are much easier to spot and block.”

The conclusion this article draws is that technology can only take the enterprise so far in mitigating risk. Its suggestion is to rely on penalties for running unauthorized applications, but this seems to be a short-sighted solution if the popularity of anonymity networks rises.


Megan Feil, February 3, 2016

The Encrypted Enterprise Search

February 3, 2016

Another enterprise software distributor has taken the leap into a proprietary encrypted search engine. Computer Technology Review informs us that “VirtualWorks Releases Its Encrypted Enterprise Search Platform ViaWorks Built On Hitachi Technology.” VirtualWorks’s enterprise search platform is called ViaWorks, and the company’s decision to release an encrypted search engine comes after a rise in data security breaches and growing concern about how to prevent such attacks. We will not even mention how organizations want to move to the cloud but are fearful of hacking. More and more organizations, from in-person and Internet shopping to banking, healthcare, government, and even libraries, use self-service portals that rely on personal information to complete tasks. All of these portals can be hacked, so trade organizations and the government are instituting new security measures.

Everyone knows, however, that basic rules and a firewall are not enough to protect sensitive information. That is why companies like VirtualWorks stay one step ahead of the game with a product like ViaWorks, built on Hitachi’s Searchable Encryption technology. ViaWorks is a highly encrypted platform that does not sacrifice speed and accuracy for security.

“ViaWorks encrypted enterprise search features are based on AES, a worldwide encryption standard established by NIST; special randomization process, making the encrypted data resistant to advanced statistical attacks; with key management and encryption APIs that store encryption keys securely and encrypt the original data.  ViaWorks provides key management and encryption APIs that store encryption keys securely and encrypt the original data, respectively. Users determine which field is encrypted, such as index files, search keyword or transaction logs.”

VirtualWorks has already deployed ViaWorks in beta tests within healthcare, government, insurance, and finance. Moving information to the cloud saves money, but it brings security risk and slow search. A commercial encrypted search engine paired with cloud computing limits the cyber risk.


Whitney Grace, February 3, 2016
