Super Secretive Google DeepMind Open Sources AI Rocket Science

December 6, 2016

Take that, IBM. And you, Microsoft: I see you and raise you more smart software. The high-stakes poker game for control of the burgeoning smart software market is getting exciting and expensive. The Google either bursts with confidence, or it fears that outfits like IBM and Microsoft are poised to run the table.

Navigate to the “real” news story “Google DeepMind Makes AI Training Platform Publicly Available.” You will learn that:

DeepMind is putting the entire source code for its training environment — which it previously called Labyrinth and has now renamed as DeepMind Lab — on the open-source depository GitHub, the company said Monday. Anyone will be able to download the code and customize it to help train their own artificial intelligence systems. They will also be able to create new game levels for DeepMind Lab and upload these to GitHub.

Is the move a response to auto maker, dreamer, and launcher of expensive fireworks Elon Musk’s puttering in smart software? The “real” news story said:

OpenAI, a rival research shop set up by billionaire entrepreneur Elon Musk, venture capitalist Peter Thiel and Sam Altman, a founder of Silicon Valley startup accelerator Y Combinator, made its own AI training platform, called OpenAI Gym, available to the public in April [2016]. On Monday it also announced that it was making public an interface called Universe that lets an AI agent “use a computer like a human does: by looking at screen pixels and operating a virtual keyboard and mouse,” the company said in a statement. In short, it’s a go-between that lets an AI system learn the skills needed to play games or operate other applications. Researchers can use tools in OpenAI’s Gym to measure how these agents perform.
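For readers who have not poked at these platforms, the core idea is a loop: the environment hands the agent an observation, the agent picks an action, and the environment returns a reward. The sketch below mimics the interface of the original (2016) OpenAI Gym, where `reset()` returns an observation and `step(action)` returns `(observation, reward, done, info)`; later releases changed these signatures. The `CoinFlipEnv` class and both policies are hypothetical toys written for illustration, and the code does not import Gym or DeepMind Lab.

```python
import random

class CoinFlipEnv:
    """Hypothetical toy environment mimicking the classic Gym interface:
    reset() -> observation, step(action) -> (observation, reward, done, info)."""

    def __init__(self, episode_length=10, seed=0):
        self.episode_length = episode_length
        self.rng = random.Random(seed)
        self.t = 0
        self.state = 0

    def reset(self):
        self.t = 0
        self.state = self.rng.randint(0, 1)  # the "screen" is a single bit
        return self.state

    def step(self, action):
        reward = 1.0 if action == self.state else 0.0  # reward matching the bit
        self.t += 1
        self.state = self.rng.randint(0, 1)
        done = self.t >= self.episode_length
        return self.state, reward, done, {}


def run_episode(env, policy):
    """Run one episode and return the total reward, the number a
    benchmark would use to compare agents."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done, _info = env.step(policy(obs))
        total += reward
    return total

perfect = run_episode(CoinFlipEnv(seed=42), policy=lambda obs: obs)
contrarian = run_episode(CoinFlipEnv(seed=42), policy=lambda obs: 1 - obs)
print(perfect, contrarian)  # 10.0 0.0
```

Measuring "how these agents perform" boils down to comparing totals like these across many episodes.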

Why pay for artificial intelligence to perform the handful of tasks that the Harvard Business Review documented in the helpful write up “What Artificial Intelligence Can and Can’t Do Right Now”?

Perhaps free software will allow the capabilities of smart software to achieve the heights of wonder the marketers envision? How does one spell lock in?

Stephen E Arnold, December 6, 2016

Iran-Russia Ink Pact for Search Engine Services

November 28, 2016

Owing to geopolitical differences, countries like Iran are turning to like-minded nations such as Russia for technological development. A Russian diplomat posted in Iran recently announced that Russia’s home-grown search engine, Yandex, will offer its services to the people of Iran.

The Financial Tribune, in a news report titled “Yandex to Arrive Soon,” said:

Last October, Russian and Iranian communications ministers Nikolay Nikiforov and Mahmoud Vaezi respectively signed a deal to expand bilateral technological collaborations. During the meeting, Russian Ambassador Vaezi said, “We are familiar with the powerful Russian search engine Yandex. We agreed that Yandex would open an office in Iran. The system will be adapted for the Iranian people and will be in Persian.”

Iran has traditionally been an extremist nation at the center of numerous international controversies, which indirectly bar American corporations from doing business in this hostile territory. Russia, on the other hand, which is seen as a foe of the US, stands to gain from these sour relations.

As of now, .com and domains owned by Yandex are banned in Iran, but with the MoU signed, that will change soon. There is another interesting point to be observed in this news piece:

Looking at, an official reportedly working for IRIB purchased the website, according to a domain registration search.  DomainTools, a portal that lists the owners of websites, says Mohammad Taqi Mozouni registered the domain address back in July.

Technically, and by internationally accepted practice, no individual or organization may, without the necessary permissions, own a domain name under any extension that belongs to a company which has already carved out a niche for itself online. It is thus worth pondering what prompted a Russian search engine giant to let a foreign governmental agency acquire its domain name.

Vishal Ingole, November 28, 2016
Sponsored by, publisher of the CyberOSINT monograph

Android Has No Competition in Mobile OS Market

November 23, 2016

Google’s Android OS currently powers about 88 percent of the world’s smartphones, leaving a minuscule 12.1 percent to Apple’s iOS and the remaining 0.3 percent to Windows Mobile, BlackBerry OS, and Tizen.

IBTimes, in an article titled “Android Rules! 9 out of Every 10 Phones Run Google’s OS,” says:

Google’s Android OS dominated the world by powering 88 percent of the world’s smartphone market in the third quarter of 2016. This means 9 out of every 10 mobile phones in the world are using Android, while the rest rely on iOS or other mobile OS such as BlackBerry OS, Tizen and Windows Phone.

The growth occurred despite falling smartphone shipments. China and Africa, which were big markets, have been performing poorly for the last three quarters. Android’s gain can thus be attributed to the fact that Android is an open source system that any device manufacturer can use.

Despite being the clear leader, the mobile OS is full of bugs and other inherent problems, as the article points out:

Android platform is getting overcrowded with hundreds of manufacturers, few Android device vendors make profits, and Google’s new Pixel range is attacking its own hardware partners that made Android popular in the first place.

At present, Samsung, Huawei, Oppo, and Vivo are the leading Android phone makers. However, Google recently unveiled Pixel, its flagship phone for the premium category. Does this mean that Google has its eyes set on the premium handset market? Only time will tell.

Vishal Ingole, November 23, 2016

Missing Some Stuff

November 9, 2016

I love US government Web sites. They come and then they fade. The new kid on the block is The idea is that the US government has created a portal for open source software. Here are the entities whose code is available to anyone able to navigate to this link.

  • Agriculture
  • Commerce
  • EPA
  • Energy
  • Executive Office of the President
  • GSA (home of 18F)
  • Labor
  • NASA
  • National Archives and Records Administration
  • OPM (yep, the security conscious folks)
  • Treasury
  • Veterans Affairs (what are you looking at, cupcake?)

I did notice some interesting gaps; for example, does the Department of Defense have open source software? Well, maybe not. We do include a pointer to more than 100 useful programs in the forthcoming Dark Web Notebook. Want to reserve a copy? Write benkent2020 at yahoo dot com and we’ll put your name on the list.

Stephen E Arnold, November 9, 2016

Solr: The Prestigious Bossie Winner

September 30, 2016

Beyond Search learned that open source search and retrieval solution Solr won a Bossie Award. The outfit involved in the awards said that Solr was “a trusted and mature search engine technology.” Big outfits using Solr include Zappos, Comcast, and DuckDuckGo.

Also bringing home an award was Elasticsearch, which, like Solr, is built on Lucene. The description of Elasticsearch pointed out:

As part of the ELK stack (Elasticsearch, Logstash, and Kibana, all developed by Elasticsearch’s creators, Elastic), Elasticsearch has found its killer app as an open source Splunk replacement for log analysis.
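What does "log analysis" with Elasticsearch look like in practice? Much of it is aggregation: count log lines by some field over a time window. The sketch below only builds a query body and sends nothing. It assumes logs were indexed by Logstash, which writes an `@timestamp` field by default; the `status` field name is hypothetical and would usually need to be an unanalyzed (keyword) field for a terms aggregation to work. The body would typically be POSTed to an index pattern's `_search` endpoint.

```python
import json

def status_breakdown_query(window="now-1h", status_field="status"):
    """Build an Elasticsearch query body counting log lines per status value
    over a recent time window. 'status' is a hypothetical field name."""
    return {
        "size": 0,  # return only aggregation buckets, not individual log lines
        "query": {"range": {"@timestamp": {"gte": window}}},
        "aggs": {"by_status": {"terms": {"field": status_field}}},
    }

body = status_breakdown_query()
print(json.dumps(body, indent=2))
```

This is the sort of "top N by field" question a Splunk user asks daily, which helps explain the replacement pitch.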

Users of Lucene include Microsoft and LinkedIn. (What’s the problem with SharePoint Search? What prevents Microsoft from using Fast Search & Transfer technology in lieu of open source search?)

Why are Solr and Lucene the go-to search utilities? Free? Actual bug fixes and not excuses? No licensing leg shackles? Did I mention free?

Stephen E Arnold, September 30, 2016

The Uncertain Fate of OpenOffice

September 27, 2016

We are in danger of losing a popular open source alternative to the Microsoft Office suite, we learn from the piece “Lack of Volunteer Contributors Could Mean the End for OpenOffice” at Neowin. Could this be the fate of open source search as well?

Writer William Burrows observes that few updates for OpenOffice have emerged of late, only three since 2013, and that the last stable point release came out about a year ago. More strikingly, it took a month to patch a major security flaw over the summer, Burrows reports. He goes on to summarize OpenOffice’s 14-year history, culminating in Oracle’s donation of the project to Apache in 2011. It appears to have been downhill from there. The article tells us:

It was at this point that a good portion of the volunteer developer base reportedly moved onto the forked LibreOffice project. Since becoming Apache OpenOffice, activity on project has diminished significantly. In a statement by Dennis Hamilton, the project’s volunteer vice president, released in an email to the mailing list it was suggested that “retirement of the project is a serious possibility” citing concerns that the current team of around six volunteer developers who maintain the project may not have sufficient resources to eliminate security vulnerabilities. There is still some hope for OpenOffice, though, with some of the contributors suggesting that discussion about a shutdown may be a little premature, and that attracting new contributors is still possible.

In fact, OpenOffice was downloaded over 29 million times last year, so obviously it still has a following. LibreOffice is currently considered more successful, but that could change if OpenOffice manages to attract a resurgence of developers willing to contribute to the project. Any volunteers?

Cynthia Murrell, September 27, 2016
There is a Louisville, Kentucky Hidden Web/Dark Web meet up on September 27, 2016.
Information is at this link:



Paris Police Face Data Problem in Google Tax Evasion Investigation

September 20, 2016

Google has been under scrutiny for suspected tax evasion. Yahoo published a brief piece updating us on the investigation: “Data analysis from Paris raid on Google will take months, possibly years: prosecutor.” French police raided Google’s office in Paris, taking the tax avoidance inquiry to a new level. The raid follows mounting pressure across Europe to stop multinational corporations from using their worldwide presence to pay less tax. Financial prosecutor Eliane Houlette is quoted as stating:

“We have collected a lot of computer data,” Houlette said in an interview with Europe 1 radio, TV channel iTele and newspaper Le Monde, adding that 96 people took part in the raid. “We need to analyze (the data) … (it will take) months, I hope that it won’t be several years, but we are very limited in resources.” Google, which said it is complying fully with French law, is under pressure across Europe from public opinion and governments angry at the way multinationals exploit their global presence to minimize tax liabilities.

While big data search technology exists, government and law enforcement agencies may not have the funds to use it. Or perhaps they are simply unaware of open source solutions. If nothing else, Houlette’s comments show the need for increased focus on upgrading systems for real-time and rapid data analysis.

Megan Feil, September 20, 2016

Faster Text Classification from Facebook, the Social Outfit

August 29, 2016

I read “Faster, Better Text Classification.” Facebook’s artificial intelligence team has made available some of its whizzy code. The software may be a bit of a challenge to the vendors of proprietary text classification software, but Facebook wants to help everyone. Think of the billion-plus Facebook users who need to train an artificially intelligent system with one billion words in 10 minutes. You may want to try this on your Chromebook, gentle reader.

I learned:

Automatic text processing forms a key part of the day-to-day interaction with your computer; it’s a critical component of everything from web search and content ranking to spam filtering, and when it works well, it’s completely invisible to you. With the growing amount of online data, there is a need for more flexible tools to better understand the content of very large datasets, in order to provide more accurate classification results. To address this need, the Facebook AI Research (FAIR) lab is open-sourcing fastText, a library designed to help build scalable solutions for text representation and classification.

What does the Facebook text classification code deliver as open sourciness? I learned:

FastText combines some of the most successful concepts introduced by the natural language processing and machine learning communities in the last few decades. These include representing sentences with bag of words and bag of n-grams, as well as using subword information, and sharing information across classes through a hidden representation. We also employ a hierarchical softmax that takes advantage of the unbalanced distribution of the classes to speed up computation. These different concepts are being used for two different tasks: efficient text classification and learning word vector representations.
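The hierarchical softmax trick in the quoted passage is easier to grasp with numbers. Instead of scoring every class, the model walks a binary tree from the root to the correct class, one yes/no decision per level. If the tree is a Huffman tree built from class frequencies (the approach the fastText paper describes), common classes sit near the root and cost fewer decisions. The sketch below computes only the tree depths; the class names and counts are invented for illustration, and no fastText code is used.

```python
import heapq
import itertools

def huffman_depths(class_counts):
    """Return each class's depth in a Huffman tree built from its count.
    In a hierarchical softmax, a class at depth d costs d binary decisions."""
    tie = itertools.count()  # tie-breaker so heapq never compares the label lists
    heap = [(count, next(tie), [label]) for label, count in class_counts.items()]
    heapq.heapify(heap)
    depth = {label: 0 for label in class_counts}
    while len(heap) > 1:
        c1, _, labels1 = heapq.heappop(heap)  # two least frequent subtrees
        c2, _, labels2 = heapq.heappop(heap)
        for label in labels1 + labels2:
            depth[label] += 1  # merging pushes every member one level deeper
        heapq.heappush(heap, (c1 + c2, next(tie), labels1 + labels2))
    return depth

# Invented, deliberately unbalanced class distribution
counts = {"sports": 800, "politics": 100, "tech": 50, "cooking": 30, "opera": 20}
depths = huffman_depths(counts)
avg = sum(counts[k] * depths[k] for k in counts) / sum(counts.values())
print(depths)  # {'sports': 1, 'politics': 2, 'tech': 3, 'cooking': 4, 'opera': 4}
print(avg)     # 1.35
```

A flat softmax would score all five classes for every training example; here the average example needs 1.35 binary decisions. The more skewed the classes, the bigger the saving.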

The write up details some of the benefits of the code; for example, its multilingual capabilities and its accuracy.
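Subword information is one plausible reason such a model copes with many languages: a rare or inflected word still shares character n-grams with words seen in training. The sketch below shows two of the feature ideas from the quoted passage. The first is character n-grams of a word wrapped in `<` and `>` boundary markers, the scheme in the fastText papers (which also keep the whole word as a feature; omitted here). The second is a bag of words plus word bigrams, each hashed into a fixed number of buckets. `zlib.crc32` stands in for the hash fastText actually uses, and the bucket count is an illustrative value.

```python
import zlib

def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of a word wrapped in '<' and '>' boundary markers."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams

def hashed_features(text, buckets=2_000_000):
    """Bag of words plus bag of word bigrams, each feature hashed into a
    fixed number of buckets so the feature table has a bounded size."""
    words = text.lower().split()
    bigrams = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return [zlib.crc32(f.encode("utf-8")) % buckets for f in words + bigrams]

print(char_ngrams("where", 3, 3))  # ['<wh', 'whe', 'her', 'ere', 're>']
print(len(hashed_features("open source search is free")))  # 9 (5 words + 4 bigrams)
```

In a classifier, each feature index would look up a learned vector, and the vectors for a sentence would be averaged before the softmax layer.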

What will other do-gooders like Amazon, Google, and Microsoft do in response to Facebook’s generosity? My thought is that more text processing software will find its way to open source green pastures.

What will the for fee vendors peddling proprietary classification systems do? Here’s a short list of ideas I had:

  1. Pivot to become predictive analytics companies and seek new rounds of financing
  2. Pretend that open source options are available but not good enough for real world tasks
  3. Generate white papers and commission mid tier consulting firms to extol the virtues of their innovative, unique, high speed, smart software
  4. Look for another line of work in search engine optimization, direct sales for a tool and die company, or check out Facebook.

Stephen E Arnold, August 29, 2016

LucidWorks Bet on Spark. Now What?

August 8, 2016

Many clear nights ago, Lucid Imagination offered an open source enterprise search solution. Presidents came. Presidents went. Lucid Imagination morphed into LucidWorks. I promptly referred to the company in this way: Lucid works, really?

The firm embraced Spark and did a not-unexpected pirouette into a Big Data outfit. I know. I know. Lucid Imagination is a company anchored in keyword search, but this is the 21st century. Pirouettes are better than mere pivots, so Big Data it is.

I read “Big Data Brawlers: 4 Challengers to Spark” and the write up triggered some thoughts about LucidWorks. Really.

The point of the story is to identify four open source solutions which do what Spark allegedly does so darned well. Each of these challengers:

  • Scales
  • Handles Big Data (whatever that means)
  • Exploits cheap memory so there are no sluglike disk writes
  • Does the old school batch processing thing

What are the “challengers” to Spark? Here are the contenders:

  • Apache Apex. Once proprietary, now open source, the software does micro-batching for almost, sort of real-time functions
  • Heron. Another real time solution with spouts and bolts. Excited?
  • Apache Flink. This is an open source library with a one two punch: It does the Flink stuff and the Spark stuff.
  • Onyx. This is a distributed computation system written in Clojure, which will appeal to the JVM folks.
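"Micro-batching for almost, sort of real time" deserves a concrete picture. Rather than handling each event the instant it arrives, the system collects a small batch and processes it as a unit, trading a little latency for throughput. The sketch below is a generic illustration, not the API of Apex, Spark Streaming, or any product above. It batches by count for simplicity, whereas real systems usually batch by time window, and the error-counting job is hypothetical.

```python
from itertools import islice

def micro_batches(events, batch_size):
    """Group a (possibly unbounded) stream into small fixed-size batches."""
    it = iter(events)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

def count_errors_per_batch(events, batch_size):
    """Hypothetical job: count 'error' events in each micro-batch."""
    return [sum(1 for e in batch if e == "error")
            for batch in micro_batches(events, batch_size)]

stream = ["ok", "error", "ok", "error", "error", "ok", "ok"]
print(count_errors_per_batch(stream, 3))  # [1, 2, 0]
```

Shrink the batch and the results arrive sooner; grow it and the per-batch overhead is spread over more events. That dial is where the "sort of" in "sort of real time" lives.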

What do these Spark alternatives have to do with LucidWorks, really? I think there is going to be one major impact. LucidWorks will have to spend or invest in supporting whatever becomes the next big thing. Recommind hit a glass ceiling with its business model. LucidWorks may be bumping into the open source skylight. Instead of being stopped, LucidWorks has to keep investing to keep pace with what the community driven folks generate with little thought to the impact on companies trying to earn a living with open source.

Stephen E Arnold, August 8, 2016

Amazon AWS Jungle Snares Some Elasticsearch Functions

July 1, 2016

Elastic’s Elasticsearch has become one of the go to open source search and retrieval solutions. Based on Lucene, the system has put the heat on some of the other open source centric search vendors. However, search is a tricky beastie.

Navigate to “AWS Elasticsearch Service Woes” to get a glimpse of some of the snags which can poke holes in one’s ripstop hiking garb. The problems are not surprising. One does not know what issues will arise until a search system is deployed and the lucky users are banging away with their queries or a happy administrator discovers that Button A no longer works.

The write up states:

We kept coming across OOM issues due the JVMMemoryPresure spiking and inturn the ES service kept crapping out. Aside from some optimization work, we’d more than likely have to add more boxes/resources to the cluster which then means more things to manage. This is when we thought, “Hey, AWS have a service for this right? Let’s give that a crack?!”. As great as having it as a service is, it certainly comes with some fairly irritating pitfalls which then causes you to approach the situation from a different angle.

One approach is to use index templates to handle shard management in AWS Elasticsearch. The write up provides sample templates. The fix does not address every issue, however. The article also links to a reindexing tool called es-tool.
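The article's own templates are not reproduced here. As a rough idea of what "templates for shard management" means, an index template fixes settings such as shard and replica counts for every new index whose name matches a pattern. The sketch below builds such a body and prepares, but does not send, a PUT to the legacy `_template` API. The endpoint URL, template name, pattern, and counts are all hypothetical. The `template` key is the pre-6.0 pattern field (Elasticsearch 6.0 and later use `index_patterns`), and a real AWS domain may also require an access policy or signed requests.

```python
import json
import urllib.request

def shard_template(pattern, shards, replicas):
    """Index template body fixing shard and replica counts for every new
    index whose name matches `pattern` (pre-6.0 'template' key)."""
    return {
        "template": pattern,
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        },
    }

def build_put_request(endpoint, name, body):
    """Prepare (but do not send) a PUT to the legacy _template API."""
    return urllib.request.Request(
        url=f"{endpoint}/_template/{name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

# Hypothetical domain endpoint and template name
req = build_put_request("https://search-example.us-east-1.es.amazonaws.com",
                        "logs", shard_template("logs-*", shards=2, replicas=1))
print(req.get_method(), req.full_url)
```

Sending it would be a matter of `urllib.request.urlopen(req)`. Templates only apply to indices created afterward, which is why a reindexing tool enters the story.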

The most interesting comment in the article in my opinion is:

In hindsight I think it may have been worth potentially sticking with and fleshing out the old implementation of Elasticsearch, instead of having to fudge various things with the AWS ES service. On the other hand it has relieved some of the operational overhead, and in terms of scaling I am literally a couple of clicks away. If you have large amounts of data you pump into Elasticsearch and you require granular control, AWS ES is not the solution for you. However if you need a quick and simple Elasticsearch and Kibana solution, then look no further.

My takeaway: think through the strengths and weaknesses of Amazon AWS before chopping through the Bezos cloud jungle.

Stephen E Arnold, July 1, 2016
