
Microsoft and the Open Source Trojan Horse

March 30, 2016

Quite a few outfits embrace open source. There are a number of reasons:

  1. It is cheaper than writing original code
  2. It is less expensive than writing original code
  3. It is more economical than writing original code.

The article “Microsoft is Pretending to be a FOSS Company in Order to Secure Government Contracts With Proprietary Software in ‘Open’ Clothing” reminded me that there is another reason.

No kidding.

I know that IBM has snagged Lucene and waved its once magical wand over the information access system and pronounced, “Watson.” I know that deep inside the kind, gentle heart of Palantir Technologies, there are open source bits. And there are others.

The write up asserted:

For those who missed it, Microsoft is trying to EEE GNU/Linux servers amid Microsoft layoffs; selfish interests of profit, as noted by some writers [1,2] this morning, nothing whatsoever to do with FOSS (there’s no FOSS aspect to it at all!) are driving these moves. It’s about proprietary software lock-in that won’t be available for another year anyway. It’s a good way to distract the public and suppress criticism with some corny images of red hearts.

The other interesting point I highlighted was:

reject the idea that Microsoft is somehow “open” now. The European Union, the Indian government and even the White House now warm up to FOSS, so Microsoft is pretending to be FOSS. This is protectionism by deception from Microsoft and those who play along with the PR campaign (or lobbying) are hurting genuine/legitimate FOSS.

With some government statements of work requiring “open” technologies, Microsoft may be doing what other firms have been doing for a while. See points one to three above. Microsoft is just late to the accountants’ party.

Why not replace the SharePoint search thing with an open source solution? What about the $1.2 billion MSFT paid for the fascinating Fast Search & Transfer technology in 2008? It works really well, right?

Stephen E Arnold, March 30, 2016

Elasticsearch for Text Analysis

March 29, 2016

Short honk: Put your code hat on. “Mining Mailboxes with Elasticsearch and Kibana” walks a reader through using open source technology to do text analysis. The example under the microscope is email, but the method will work for any text corpus ingested by Elasticsearch. The write up includes code samples and enough explanation to get the Elastic system moving forward. Visualizations are included. These make it easy to spot certain trends; for example, the top recipients of the email analyzed for the tutorial. Worth a look.
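The “top recipients” view mentioned above is the kind of result an Elasticsearch terms aggregation returns. The sketch below is not code from the tutorial. It assumes the To: header was indexed into a keyword field named `to.keyword` (a hypothetical mapping) and shows the request body. It also includes a local Python tally over raw messages that approximates the same per-document counts for a small mailbox. Both function names are invented for illustration.

```python
import json
from collections import Counter
from email import message_from_string
from email.utils import getaddresses


def top_recipients_query(field="to.keyword", size=10):
    """Build a search body asking only for the most frequent values of `field`.

    "to.keyword" is a hypothetical field name; match it to your own mapping.
    """
    return {
        "size": 0,  # skip the hits themselves; only the buckets are wanted
        "aggs": {
            "top_recipients": {
                "terms": {"field": field, "size": size},
            },
        },
    }


def top_recipients_local(raw_messages, n=10):
    """Tally recipients over raw RFC 822 messages.

    Each address counts at most once per message, which mirrors how a terms
    aggregation counts documents rather than raw occurrences.
    """
    counts = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        addresses = {
            addr.lower()
            for _, addr in getaddresses(msg.get_all("To", []))
            if addr
        }
        counts.update(addresses)
    return counts.most_common(n)


if __name__ == "__main__":
    print(json.dumps(top_recipients_query(), indent=2))
    sample = [
        "To: Alice <alice@example.com>, bob@example.com\n\nhello",
        "To: alice@example.com\n\nagain",
    ]
    print(top_recipients_local(sample, n=2))
```

Setting `"size": 0` at the top level is a common choice when only the buckets matter, since it keeps the response small; Kibana’s bar charts are built from the same bucket data.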

Stephen E Arnold, March 29, 2016

Search as a Framework

March 26, 2016

A number of search and content processing vendors suggest their information access system can function as a framework. The idea is that search is more than a utility function.

If the information in the article “Abusing Elasticsearch as a Framework” is spot on, a non-search vendor may have taken an important step toward turning that assertion into reality.

The article states:

Crate is a distributed SQL database that leverages Elasticsearch and Lucene. In it’s infant days it parsed SQL statements and translated them into Elasticsearch queries. It was basically a layer on top of Elasticsearch.

The idea is that the framework reuses Elasticsearch’s discovery, master election, replication, and similar functions along with Lucene’s search and indexing operations.

Crate, the framework, is a distributed SQL database “that leverages Elasticsearch and Lucene.”
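Here is a toy translator that makes the “layer on top of Elasticsearch” idea concrete. It turns a narrow slice of SQL into an Elasticsearch query body: equality filters joined by AND, plus an optional LIMIT. It is a hypothetical illustration, not Crate’s parser. The function name `sql_to_es` is invented here, and the `term` queries assume the filtered fields are indexed as keywords or numbers.

```python
import re

# Toy grammar: SELECT cols FROM table [WHERE col = literal [AND ...]] [LIMIT n]
_SELECT = re.compile(
    r"^\s*SELECT\s+(?P<cols>.+?)\s+FROM\s+(?P<table>\w+)"
    r"(?:\s+WHERE\s+(?P<where>.+?))?"
    r"(?:\s+LIMIT\s+(?P<limit>\d+))?\s*;?\s*$",
    re.IGNORECASE,
)
# One equality condition: a single-quoted string literal or an integer
_COND = re.compile(r"^\s*(\w+)\s*=\s*(?:'([^']*)'|(-?\d+))\s*$")


def sql_to_es(sql):
    """Translate a tiny SQL subset into (index_name, Elasticsearch body)."""
    m = _SELECT.match(sql)
    if not m:
        raise ValueError(f"unsupported statement: {sql!r}")
    body = {}
    cols = [c.strip() for c in m.group("cols").split(",")]
    if cols != ["*"]:
        body["_source"] = cols  # column projection becomes source filtering
    filters = []
    if m.group("where"):
        for cond in re.split(r"\s+AND\s+", m.group("where"), flags=re.IGNORECASE):
            c = _COND.match(cond)
            if not c:
                raise ValueError(f"unsupported condition: {cond!r}")
            field, text, number = c.groups()
            value = text if text is not None else int(number)
            filters.append({"term": {field: value}})  # exact match, no scoring
    body["query"] = {"bool": {"filter": filters}} if filters else {"match_all": {}}
    if m.group("limit"):
        body["size"] = int(m.group("limit"))
    return m.group("table"), body
```

For example, `sql_to_es("SELECT name FROM users WHERE city = 'Berlin' LIMIT 5")` yields the index name `users` and a body with a `bool` filter, a `_source` projection, and a `size` of 5. Anything outside that subset, such as joins, ranges, or OR, raises `ValueError`. That boundary is roughly where the hard work of a real SQL layer begins.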

Stephen E Arnold, March 26, 2016

Reviews on Dark Web Email Providers Shared by Freedom Hacker

February 10, 2016

The Dark Web has many layers of sites and services, as the onion metaphor behind the .onion extension suggests. “List of secure Dark Web email providers in 2016,” recently published on Freedom Hacker, details and reviews the Dark Web email providers currently available. These services, which typically offer both free and pro account versions, let users send email without relying on any third-party services. That even means you can forget any hidden Google scripts, fonts, or trackers. According to this piece,

“All of these email providers are only accessible via the Tor Browser, an anonymity tool designed to conceal the end users identity and heavily encrypt their communication, making those who use the network anonymous. Tor is used by an array of people including journalists, activists, political-dissidents, government-targets, whistleblowers, the government and just about anyone since it’s an open-source free tool. Tor provides a sense of security in high-risk situations and is often a choice among high-profile targets. However, many use it day-to-day as it provides identity concealment seamlessly.”

We are intrigued by the proliferation of these services and their users. While the article does not report usage numbers, the author’s review of a top five list of email services suggests there are enough services available to warrant reviews. Equally interesting will be the response from companies on the clearweb, the .com and other regular sites. Not to mention how the government and intelligence agencies will interact with this burgeoning ecosystem.


Megan Feil, February 10, 2016

Sponsored by, publisher of the CyberOSINT monograph


More Open Source Smart Software

January 15, 2016

The gift giver this time is Baidu. Navigate to “Baidu Open-Sources Its WARP-CTC Artificial Intelligence Software.” Baidu’s software implements a method called connectionist temporal classification, or CTC. Is the innovation from the Middle Kingdom? Nah. Switzerland. You know, the country where Einstein whacked away with his so-so computational skills.

According to the write up:

The CTC approach involves recurrent neural networks (RNNs), an increasingly common component used for a type of AI called deep learning. Recurrent nets have been shown to work well even in noisy environments.
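The sketch below shows what CTC computes, using the standard CTC forward (alpha) recursion in plain Python. The probability of a label sequence is the sum over every frame-by-frame alignment that collapses to that label once repeated symbols are merged and blanks are removed. This is not Baidu’s WARP-CTC code, which targets GPUs and works in log space. The sketch assumes index 0 is the blank and that the inputs are already-softmaxed probabilities. A brute-force enumerator is included to check the recursion on tiny inputs.

```python
import itertools
import math

BLANK = 0  # assumption: index 0 of the output alphabet is the CTC blank


def ctc_forward(probs, label, blank=BLANK):
    """P(label | probs) via the CTC forward (alpha) recursion.

    probs: T rows, each a list of per-symbol probabilities (already softmaxed).
    label: target symbol indices, none equal to `blank`.
    """
    ext = [blank]
    for sym in label:
        ext += [sym, blank]  # blank-interleaved label, length 2L + 1
    S, T = len(ext), len(probs)
    alpha = [0.0] * S
    alpha[0] = probs[0][blank]  # a path may start with a blank...
    if S > 1:
        alpha[1] = probs[0][ext[1]]  # ...or with the first label symbol
    for t in range(1, T):
        new = [0.0] * S
        for s in range(S):
            total = alpha[s]  # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]  # advance by one position
            # skip over a blank only between two different labels
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new
    # valid paths end on the last symbol or on the trailing blank
    return alpha[-1] + (alpha[-2] if S > 1 else 0.0)


def collapse(path, blank=BLANK):
    """CTC mapping: merge repeated symbols, then drop blanks."""
    return [k for k, _ in itertools.groupby(path) if k != blank]


def ctc_brute_force(probs, label, blank=BLANK):
    """Sum over every alignment path; exponential, for checking only."""
    T, V = len(probs), len(probs[0])
    total = 0.0
    for path in itertools.product(range(V), repeat=T):
        if collapse(path, blank) == list(label):
            total += math.prod(probs[t][path[t]] for t in range(T))
    return total
```

The blank symbol is what lets CTC distinguish a genuinely doubled label (1, blank, 1) from one symbol held across several frames (1, 1). It is also why the skip transition is disallowed between identical labels.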

Have at the code, gentle reader. The link is

Stephen E Arnold, January 14, 2016

Open Source Data Management: It Is Now Easy to Understand

January 10, 2016

I read “16 for 16: What You Must Know about Hadoop and Spark Right Now.” I like the “right now.” Urgency. I am not sure I feel too much urgency at the moment. I will leave that wonderful feeling to the executives who have sucked in venture money and have to find a way to generate revenue in the next 11 months.

The article runs down the basic generalizations associated with each of these open source data management components:

  • Spark
  • Hive
  • Kerberos
  • Ranger/Sentry
  • HBase/Phoenix
  • Impala
  • Hadoop Distributed File System (HDFS)
  • Kafka
  • Storm/Apex
  • Ambari/Cloudera Manager
  • Pig
  • Yarn/Mesos
  • Nifi/Kettle
  • Knox
  • Scala/Python
  • Zeppelin/Databricks

The list tells me two things. First, open source data tools are proliferating. Second, quite a few committed developers will be needed to keep these projects afloat.

The write up is not content with this shopping list. The intrepid reader will have an opportunity to learn a bit about:

  • Kylin
  • Atlas/Navigator

As the write up swoops to its end point, I learned about some open source projects which are a bit of a disappointment; for example, Oozie and Tez.

The key point of the article is that Google’s MapReduce, which is now pretty long in the tooth, has been effectively marginalized.
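For anyone who has not met the programming model in question, MapReduce splits a job into three steps. A map step emits key/value pairs, a shuffle groups the values by key, and a reduce step combines each group. The single-process word count below is only a sketch of those three phases, and its function names are illustrative. Real MapReduce jobs spread the phases across a cluster and spill intermediate data to disk.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine every value seen for one key."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence within one process."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

The on-disk handoff between phases is the main reason in-memory engines such as Spark tend to outpace classic MapReduce on iterative work.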

The Balkanization of data management is evident. The challenge will be to use one or more of these technologies to make some substantial revenue flow.

What happens if a company jumps on the wrong bandwagon as it leaves the parade ground? I would suggest that it may be more like a Pig than an Atlas. The investors will change from Rangers looking for profits to Pythons ready to strike. A Spark can set fire to some hopes and dreams in the Hive. Poorly constructed walls of Databricks can come falling down. That will be an Oozie.

Dear old Oracle, DB2, and SQL Server will just watch.

Stephen E Arnold, January 10, 2016

Short Honk: Hadoop Ecosystem Made Clear

January 3, 2016

Love Hadoop? Love all things Hadoopy? You will want to navigate to “The Hadoop Ecosystem Table.” You get categories of Hadoopiness with examples of the Hadoop amoebae. You can see where Spark or Kudu “fits.” Need some document data model options? The table will deliver: ArangoDB and more. Useful stuff.

Stephen E Arnold, December 30, 2015

The Importance of Google AI

December 23, 2015

According to Business Insider, we’ve all been overlooking something crucial about Google. Writer Lucinda Shen reports, “Top Internet Analyst: There Is One Thing About Google that Everyone Is Missing.” Shen cites an observation by prominent equity analyst Carlos Kirjner. She writes:

“Kirjner, that thing [that everyone else is missing] is AI at Google. ’Nobody is paying attention to that because it is not an issue that will play out in the next few quarters, but longer term it is a big, big opportunity for them,’ he said. ‘Google’s investments in artificial intelligence, above and beyond the use of machine learning to improve character, photo, video and sound classification, could be so revolutionary and transformational to the point of raising ethical questions.’

“Even if investors and analysts haven’t been closely monitoring Google’s developments in AI, the internet giant is devoted to the project. During the company’s third-quarter earnings call, CEO Sundar Pichai told investors the company planned to integrate AI more deeply within its core business.”

Google must be confident in its AI if it is deploying it across all its products, as reported. Shen recalls that the company made waves back in November, when it released the open-source AI platform TensorFlow. Is Google’s AI research about to take the world by storm?


Cynthia Murrell, December 23, 2015


Open Source Survey: One Big Surprise about Code Management

November 23, 2015

I read “Awfully Pleased to Meet You: Survey Finds Open Source Needs More Formal Policies.”

The fact that eight out of 10 outfits in the sample were using open source software was no surprise. The sponsor of the survey is open source centric.

The point I highlighted was:

According to the study, less than 42% of organizations maintain a IT Asset Management (ITAM) style inventory of open source components.

When I read this, I thought, “Who keeps track of the open source components?”

The answer in more than half the companies in the sample was, “Huh? What?”
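The sketch below gives a rough idea of what a starting inventory could look like in a Python environment. It uses the standard library’s `importlib.metadata` to list installed distributions with their versions and declared licenses. It is a first pass built on assumptions, not a Black Duck style scan. It sees only the packages installed in the current interpreter, and the free-text License field is frequently blank or inconsistent. The function name `open_source_inventory` is hypothetical.

```python
from importlib import metadata


def open_source_inventory():
    """Return one record per installed distribution, sorted by name."""
    seen = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if not name or name.lower() in seen:
            continue  # skip broken metadata and duplicates found on sys.path
        seen[name.lower()] = {
            "name": name,
            "version": dist.version,
            # "License" is free text and is often empty or vague
            "license": dist.metadata.get("License") or "UNKNOWN",
        }
    return sorted(seen.values(), key=lambda rec: rec["name"].lower())


if __name__ == "__main__":
    for rec in open_source_inventory():
        print(f'{rec["name"]:30} {rec["version"]!s:12} {rec["license"][:40]}')
```

A list like this only answers “what is here.” The governance the survey calls for would add ownership, approval status, and vulnerability tracking on top of it.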

I circled this point:

Shipley [Black Duck top dog] has also added the following comment, “In the results this year, it has become more evident that companies need their management and governance of open source to catch up to their usage. This is critical to reducing potential security, legal, and operational risks while allowing companies to reap the full benefits OSS provides.”

Are companies spending money with commercial open source plays in order to buy management? If that is the case, the successful commercial open source outfit will be the one that can deliver management, not the one with the technology and trends the marketers at certain commercial open source companies hype.

Stephen E Arnold, November 23, 2015

Lucidworks: Another $21 Million in Funding

November 19, 2015

Lucidworks (an eight-year-old “start up” founded in 2007) has raised an additional $21 million in funding. According to Crunchbase, the total funding injected into the open source centric company is now $53 million.

The news release “Lucidworks Announces $21 Million in Series D Funding” states:

Lucidworks, the chosen search solution for leading brands and organizations around the world, today announced $21 million in new financing. Allegis Capital led the round with participation from existing investors Shasta Ventures and Granite Ventures. Lucidworks will use the funds to accelerate its product-focused mission enabling companies to translate massive amounts of data into actionable business intelligence.

The statement included this observation attributed to Spencer Tall, Allegis Capital:

Lucidworks has proven itself, not only by providing the software and solutions that businesses need to benefit from Lucene/Solr search, but also by expanding its vision with new products like Fusion that give companies the ability to fully harness search technology suiting their particular customers. We fully support Lucidworks, not only for what it has achieved to date — disruptive search solutions that offer real, immediate benefits to businesses — but for the promising future of its product technology.

Lucidworks, formerly Lucid Imagination, competes with Elastic. Companies from IBM to OpenSearchServer offer solutions which compete in the same market sector. Elastic’s funding is in the $104 million range.

The horses are away from the starting gate. Will the winner be the steed with the best jockey? Stay tuned, because the track is muddy.

Stephen E Arnold, November 19, 2015
