Open Source DeepDive Now Available

January 14, 2015

IBM’s Watson has some open-source competition. As EE Times reports in “DARPA Offers Free Watson-Like Artificial Intelligence,” DARPA’s DeepDive is now a freely available alternative to the famous machine-learning AI. Both systems have their roots in the same DARPA-funded project. According to DeepDive’s primary programmer, Christopher Re, while Watson is built to answer questions, DeepDive’s focus is on extracting a wealth of structured data from unstructured sources. Writer R. Colin Johnson informs us:

DeepDive incorporates probability-based learning algorithms as well as open-source tools such as MADlib, Impala (from [Cloudera]), and low-level techniques, such as Hogwild, some of which have also been included in Microsoft’s Adam. To build DeepDive into your application, you should be familiar with SQL and Python.

“Underneath the covers, DeepDive is based on a probability model; this is a very principled, academic approach to build these systems, but the question for us was, ‘Could it actually scale in practice?’ Our biggest innovations in DeepDive have to do with giving it this ability to scale,” Re told us.

For the future, DeepDive aims to be proven in other domains. “We hope to have similar results in those domains soon, but it’s too early to be very specific about our plans here,” Re told us. “We use a RISC processor right now, we’re trying to make a compiler, and we think machine learning will let us make it much easier to program in the next generation of DeepDive. We also plan to get more data types into DeepDive.”

It sounds like the developers are just getting started. DeepDive and its installation instructions are available for download from the project’s website.

Cynthia Murrell, January 14, 2015

Sponsored by, developer of Augmentext

Love Open Source? Good News and News

January 13, 2015

I read “Top 10 FOSS Legal Developments of 2014.” A legal eagle generated the listicle. Despite my skepticism about birds of this feather, the list has some good news and (to put it positively) news for the open source movement.

The good news is that folks from courts to government agencies are paying attention to free and open source software. The “news” news is that use of open source “by commercial companies expands.” The write up states:

We have discussed in the past how many large companies are using FOSS as an explicit strategy to build their software. Jim Zemlin, Executive Director of the Linux Foundation, has described this strategic use of FOSS as external “research and development.” His conclusions are supported by Gartner who noted that “the top tech companies are still spending tens of billions of dollars on software research and development, the smart ones are leveraging open source for 80 percent of the code and spending their money on the remaining 20 percent, which represents their program’s ‘special sauce.’” The scope of this trend was emphasized by Microsoft’s announcement that it was “open sourcing” the .NET software framework (this software is used by millions of developers to build and operate websites and other large online applications).

The other item of “news” news is that the dust-up between Oracle and Google over Java for Android continues. Who wants to risk a similar legal action? The answer to that question will help inform your assessment of the “news.”

I interpreted the information to suggest that open source is increasingly commercial. Good news or just news?

Stephen E Arnold, January 14, 2015

FOSS Supporters: Sharing Development Costs the 21st Century Way

January 11, 2015

In the good old days, proprietary software was funded by the company owning the technology, shareholders/investors, and “partners” sucked into the “pay to work with us” model perfected by outfits like IBM.

I read “Big Names Like Google Dominate Open Source Funding.” One of the points that I gleaned from the write up is that a handful of larger commercial firms support certain open source projects. The data are based on various records and incomplete data sets. Also, the data presumably do not include statements from Eastern European open source contributors who are polishing their résumés or from college professors working to create their own bit of financial heaven.

The article includes a graphic that identifies some of the big supporters of open source. There are some names I did not recognize, like Credativ and 10gen. But a few jumped out at me; namely, Google, IBM, and that bastion of management excellence, Hewlett-Packard.

I formulated three thoughts after working through the admittedly flawed analysis included in Network World, a publication that I view with healthy skepticism.

First, with large companies funding open source projects, the cost of R&D has been pushed down and shared. This is good for big outfits, which can get out of the business of supporting software that is essentially a utility.

Second, open source is less about community and more about getting folks jobs and opportunities to set up an “open source consulting services company.” When you take a look at LucidWorks (Really?), you see folks trying to emulate pure consulting firms. But open source search is just one example of this model of using “free stuff” to sell expensive engineering. This works on a small scale, but when you try to pump up a “free software” company to the size of Red Hat, that taxes the management capabilities of some whizzy Silicon Valley types.

Third, open source does not always result in free and open source software. Consider IBM’s approach. By repackaging Lucene and attributing serious juju to search, the company hopes to build a $10 billion business in 48 to 60 months. Not gonna happen. IBM faces many challenges, but those infected with spreadsheet fever twiddle the numbers to create a fictional world. Is Google really free and open source? What about Google Earth? And Hewlett-Packard, bless its management heart, is not quite the model of open source goodness shareholders want.

No surprises in the write up, but the change in what once seemed like a good idea does not trouble Network World. Open source sounds great and offers a way out of massive, continuing investments in maintaining certain types of software. That money can be better used to create proprietary extensions that customers have to pay for.

Stephen E Arnold, January 11, 2015

Shades of Ray Kurzweil: Watson to Crack Ageing

January 11, 2015

I am not too keen on immortality. My view is that stuff dies. Age appropriate behavior means accepting the lot of mortal man.

But some folks want to extend their lives; others hope to live forever like the nano-stuff creatures in Alastair Reynolds’ novels.

I associate the live longer and collect stock options approach with Ray Kurzweil, the Google big thinker and music inventor. Well, I learned something in “IBM Watson’s Lab to Tackle Aging Issues.” Now Watson, with its chugging heart of Lucene, has lifetimes of revenue to generate before some activist investors put a bit in this pony’s mouth.

The write up says:

IBM Korea will build a cognitive computing center in Seoul to help tackle an aging society with technology. “IBM submitted a letter of intent to the Seoul Metropolitan Government last month to set up a Watson lab to study smart-aging technology,” IBM Korea said…

I found this statement remarkable because IBM has not turned Lucene and home-grown scripts into a multi-billion-dollar revenue stream. On the other hand, it has helped the delis close to the IBM Watson facility in Manhattan prosper.

IBM has taken major steps to develop Watson as a new business line for future success. Watson has made achievements in diagnostic medicine and cancer treatment.

The approach involves the phone and microwave company Samsung and various universities, start ups, and public relations professionals in South Korea.

I assume more details will be revealed in Technology Review, a publication that covers Watson’s twists and turns in exquisite, marketing detail.

If you want to get on the anti-ageing train, board in South Korea. Like the projected $10 billion in revenue from a Lucene-based system, let me know how those crow’s feet fly.

Stephen E Arnold, January 11, 2015

Ranking Countries by Data Openness

January 5, 2015

One of the largest bastions of the open source community recently published an article that ranks countries around the world by how much of their data is open for public access: “The Global ‘Open’ Pulse From The 2014 Open Data Index.” The information is pulled from Open Knowledge’s 2014 Open Data Index. According to the numbers, governments are not as open as they should be: the share of data sets that qualify as open is down to 11% from 15%.

“The OKF defines “open” in the context of this report as a data set which adheres to the open definition standard as open. The current definition of “open” per can be summarized as ‘open data and content can be freely used, modified, and shared by anyone for any purpose.’ “

There was progress in 2014, however. The United Kingdom is the most open. France and India rose in the rankings. The number of countries included in the index went from sixty to ninety-seven. The US is 70% open, dropping to eighth place from second in 2013. Africa, Asia, and the Middle East are improving their numbers.

Open Knowledge’s entire goal is to increase the amount of information about government activities, so people can exercise their rights. What is disappointing is that while many more countries are showing up on the list, they are not living up to the definition of “open.”

Whitney Grace, January 5, 2015

LucidWorks (Really?) Wants to Kill Splunk (Really?)

December 22, 2014

Let’s hear it for originality. LucidWorks (really?) is not content to watch Elasticsearch’s lead in the open source enterprise search sector. LucidWorks (really?) seeks to distinguish itself by committing metaphorical murder of Splunk, one of the go-to log file centric solutions. What makes life more interesting is that the murder, which seems quite improbable, is patricide. The president of LucidWorks (really?) is a former Splunk employee.

Now that’s the stuff of Greek tragedy recast as Silicon Valley silliness. Navigate to “LucidWorks Preps Solr Stack as Splunk Killer.” Note that LucidWorks (really?) is not yet a Splunk or anything else Richard Speck. If Recorded Future-type systems were to process this statement, I am not sure it would warrant more than a two percent probability. But here’s the plan:

SiLK “is a solution that relies on open core components that organizations can use to manage log data at scale,” said Will Hayes, LucidWorks chief product officer. The SiLK package combines Apache Lucene/Solr with a number of open-source analysis tools, namely Apache Flume, LogStash and Kibana.

LucidWorks will play catch-up to Elasticsearch’s open source offering. Why catch up when you can try semantically questionable marketing ploys?

I think the dearth of marketing creativity is illustrative of the absence of fresh ideas at LucidWorks (really?). One thing is certain: use of the term “murder” will mark LucidWorks (really?) in an interesting way.

Hitting revenue targets, retaining staff, and innovating would be my preferred approach to this open source enterprise search company’s future. But if murder is the company’s game: “Book ’em, Danno.” Marketing silliness.

Stephen E Arnold, December 22, 2014


On Commercial vs Open Source Databases

December 22, 2014

Perhaps we should not be surprised that MarkLogic’s Chet Hayes urges caution before adopting an open-source data platform. His article, “Thoughts on How to Select Between COTS and Open Source” at Sys-Con Media, can be interpreted as a defense of his database company’s proprietary approach. (For those unfamiliar with the acronym, COTS stands for commercial off-the-shelf.) Hayes urges buyers to look past initial cost and consider other factors in three areas: technical, cultural, and, yes, financial.

In the “technical” column, Hayes asserts that determining whether a given solution will meet an organization’s needs is more complex than a simple side-by-side comparison of features would suggest; we are advised to check the fine print. “Cultural” refers here to taking workers’ skill sets into consideration. Companies usually do this with their developers, Hayes explains, but often overlook the needs of the folks in operational support, who might appreciate the more sophisticated tools built into a commercial product. (No mention is made of the middle ground, where we find third-party products designed to add such tools to Hadoop distributions.)

In his comments on financial impact, Hayes basically declares: It’s complicated. He writes:

“Organizations need to look at the financial picture from a total-cost perspective, looking at the acquisition and development costs all the way through the operations, maintenance and eventual retirement of the system. In terms of development, the organization should understand the costs associated with using a COTS provided tool vs. an Open Source tool.

“[…] In some cases, the COTS tool will provide a significant productivity increase and allow for a quicker time to market. There will be situations where the COTS tool is so cumbersome to install and maintain that an Open Source tool would be the right choice.

“The other area already alluded to is the cost for operations and maintenance over the lifecycle of project. Organizations should take into consideration existing IT investments to understand where previous investments can be leveraged and the cost incurred to leverage these systems. Organizations should ask whether the performance of one or the other allow for a reduced hardware and deployment footprint, which would lead to lower costs.”

These are all good points, and organizations should indeed do this research before choosing a solution. Whether the results point to an open-source solution or to a commercial option depends entirely upon the company or institution.
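Hayes’s total-cost argument can be made concrete with a little arithmetic. The sketch below is a minimal illustration, not Hayes’s method: it assumes each platform’s costs fit five buckets drawn from his list (acquisition, development, yearly operations, yearly maintenance, retirement) and that yearly costs stay flat. The `LifecycleCosts` class, the `total_cost_of_ownership` function, and every dollar figure are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LifecycleCosts:
    """Hypothetical cost inputs for one candidate platform (all figures illustrative)."""
    acquisition: float         # licenses or subscriptions paid up front
    development: float         # building the application on the platform
    annual_operations: float   # hardware, hosting, support staff per year
    annual_maintenance: float  # upgrades, patches, vendor or consultant fees per year
    retirement: float          # migration and decommissioning at end of life


def total_cost_of_ownership(costs: LifecycleCosts, years: int) -> float:
    """Sum every lifecycle phase, not just the purchase price (flat yearly costs assumed)."""
    recurring = (costs.annual_operations + costs.annual_maintenance) * years
    return costs.acquisition + costs.development + recurring + costs.retirement


# Made-up numbers: a COTS product with a license fee but cheaper upkeep,
# versus an open source tool with no license fee but higher staffing costs.
cots = LifecycleCosts(acquisition=200_000, development=100_000,
                      annual_operations=50_000, annual_maintenance=30_000,
                      retirement=20_000)
open_source = LifecycleCosts(acquisition=0, development=180_000,
                             annual_operations=70_000, annual_maintenance=60_000,
                             retirement=20_000)

for years in (2, 5):
    print(years, total_cost_of_ownership(cots, years),
          total_cost_of_ownership(open_source, years))
```

With these invented figures, the open source option costs less over two years (460,000 versus 480,000) but more over five (850,000 versus 720,000), with break-even at 2.4 years. The point is Hayes’s: the answer depends on the planning horizon and on each organization’s own numbers, not on the license fee alone.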

Cynthia Murrell, December 22, 2014


New Azure Search Compared to Veteran Solr

December 19, 2014

Wondering how the new search function in Microsoft’s Azure stacks up against the open-source search solution Solr? Sys-Con Media gives us a side-by-side comparison in “Solr vs Azure Search.” It is worth noting that Azure Search is still in beta, so such a comparison might look different down the line. Writer Srinivasan Sundara Rajan sets the stage for his observations:

“The following are the some of the aspects in the usage of Solr in enterprises against that of Azure Search. As the open source vs commercial software is a religious debate, the intent is not aimed at the argument, as the most enterprises define their own IT Policies between the choice of Open Source vs commercial products and same sense will prevail here also, the below notes are meant for understanding the new Azure service in the light of an existing proven search platform.”

Rajan’s chart describes usage of each platform in four areas: installation and setup, schema, loading, and searching. Naturally, each platform has its advantages and disadvantages; see the article for specifics. The write-up summarizes:

“Azure Search tries to match the features of Solr in most aspects, however Solr is a seasoned search engine and Azure Search is in its preview stage, so some small deficiencies may occur in the understanding and proper application of Azure Search. However there is one area where the Azure Search may be a real winner for enterprises, which is ‘Scalability & Availability’…. Azure Search, really makes scalability a much simpler thing.”

As Microsoft continues to develop Azure Search, will it surpass Solr in areas besides scalability? Stay tuned.

Cynthia Murrell, December 19, 2014


Cisco Relies on OpenSOC through GitHub When it Comes to Big Data

December 10, 2014

The article on Enterprise Networking Planet titled “Cisco Goes Open-Source for Big Data Analytics” discusses the change for Cisco with some of the company’s executives. Annie Ballew, Solutions Architect in the Cisco Security Business Group, mentions that OpenSOC is not actually a Security Information and Event Management system but rather should be considered “big data technology for security analytics.” OpenSOC is freely available through GitHub. The article states:

“While the OpenSOC project itself is open-source, Cisco is already leveraging the technology in its commercial products. ‘OpenSOC is currently included in our Managed Threat Defense services offering where it is installed, implemented and fully operationalized,’ Ballew said. Cisco launched its Managed Threat Defense service in April. That service manages and monitors logs as well as a customer’s security event lifecycle. Ballew added that OpenSOC is also integrated with various other Cisco security components such as Sourcefire FirePower NGIPS, Sourcefire AMP, and ThreatGrid.”

The article also remarks on the importance of Elasticsearch to OpenSOC. The Kibana project provides the dashboard for the open-source Elasticsearch project, and Cisco admits that it works with Elasticsearch, but currently that relationship is only through Kibana. Cisco has worked with open source before, so perhaps it should be no surprise that it turned to OpenSOC to meet its security demands when it comes to big data.

Chelsea Kerwin, December 10, 2014


Open Source Business Intelligence Tools: A Narrow View

December 2, 2014

Last week, a person with considerable experience in business intelligence told me that interest in open source software applicable to intelligence purposes was evident in South America. I poked around and came across “5 Open Source business intelligence Tools.” I was hoping to learn about open source real-time translation tools, geo-coding components, and old-school search software that hooked into some next-generation analytics and visualization components.

Wait for it.

I was disappointed. The write up presented a short list of open source systems that are well known to me. I need more than short comments about Jaspersoft, Pentaho, BIRT, RapidMiner, and SpagoBI. The article mentions three other business intelligence tools: Knime, Tactic, and ERP BI. All good, but not enough for my needs.

One reason vendors of proprietary business intelligence systems continue to capture the attention of some organizations is that the open source community develops in some areas of the barnyard and not others. What about Elasticsearch, Ikanow, and a number of other sources of quite useful open source software that can make significant contributions to business intelligence? (I am tempted to mention some US government open source contributions like NiFi too.) I think an information gap exists.

Stephen E Arnold, December 2, 2014
