Open Source Search Gets Confusing

April 3, 2014

Elasticsearch is the favored open source search application and many startups have built their own products on top of the platform, increasing competition among the startups. InfoWorld lets us know that the competition is about to get stiffer in the article, “Logstash Steps Up As Splunk’s Latest Challenger.”

Splunk offers many big data solutions, including security, analytics, application management, and cloud services. The article explains that Logstash is part of a component stack that also includes Kibana and Elasticsearch. It is used to log data and can be configured to a user’s needs. It is an Apache-licensed open source endeavor and costs less (it is either free or sold through paid support plans). Elasticsearch has commercialized Logstash through its Marvel product.

It does not appear that Logstash is a direct competitor, but the article explains:

“So far, the biggest distinction between Splunk and its competition is how they’re productized. Splunk’s a proprietary item, but with the emphasis on it being a product and not simply a technology stack. The competition still largely consists of open source stacks rather than actual services, but it’s clear the gap between what Splunk offers at a cost and what others offer for free is closing.”

Another new service pressures Lucid Imagination and other search vendors to create a response, which also makes investors impatient as Elasticsearch surges forward with bigger and better ideas. Search vendors are lost in the middle as they try to be competitive and earn a profit at the same time. Kudos to Elasticsearch and open source applications.

Whitney Grace, April 03, 2014
Sponsored by, developer of Augmentext

OpenCalais Has Big Profile Users

April 2, 2014

OpenCalais is an open source project that creates rich semantic data by using natural language processing and other analytical methods through a Web service interface. That is a simple explanation for a piece of powerful software. OpenCalais was originally part of ClearForest, but Thomson Reuters acquired the project in 2007. Instead of marketing OpenCalais as proprietary software, Reuters allowed it to remain open. OpenCalais has since become valued open source metadata software that is used on everything from blogs to specialized museum collections.

There are many notables who use OpenCalais and a sample can be found on “The List Of OpenCalais Implementations Grows.”

OpenCalais is excited about the new additions to the list:

“Add 10 to the list of innovative sites and services that use OpenCalais to reduce costs, deliver compelling content experiences and mine the social web for insight. See our press release for more details on each. We are thrilled to recognize the following new sites and services that are changing the way we engage with news and the social Web. They join a growing number of others in media, publishing, blogging, and news aggregation who use OpenCalais.”

Among them are The New Republic, Al Jazeera’s English blogging news networks, Slate Magazine’s blogging network, and I*heart* Sea. Not only do news Web sites use OpenCalais, but news aggregation apps do as well, including Feedly, DocumentCloud, and OpenPublish. Expect the list to grow even longer and consider OpenCalais for your own metadata solution.

Whitney Grace, April 02, 2014

Elasticsearch: 70:30 Odds as the Next Big Thing in Search

March 28, 2014

We learned on March 26, 2014, that the German search vendor Intrafind has been looking for the next big thing. The company may have found it, and we expect that this low profile vendor will be plugging into the Elasticsearch power cable. Wikipedia already has, joining hundreds of other firms looking for a solution to doggy indexing in some other open source centric solutions.

Elasticsearch repackager SearchBlox has rolled out Version 8 of its hosted Elasticsearch system, according to Timo Selvaraj, Co-Founder/VP Product Management of SearchBlox.

As if these two recent developments were not enough, GovWizely, a Washington, DC engineering services firm, has added Elasticsearch to its arsenal. GovWizely, operated by Erik S. Arnold (yep, that’s my boy), has moved adroitly to capitalize on the surging interest in Elasticsearch’s high performance system.

Contrast Elasticsearch’s rise as the go to open source enterprise search system with the struggles of other open source search vendors and some commercial outfits. LucidWorks has ingested $2 million in venture funding, according to Crunchbase. Elasticsearch has received $34 million in funding. Parity, right?

Not so “fast”. (A gentle nod to the fascinating proprietary system shoe horned by Microsoft into SharePoint.) Elasticsearch seems to be catching up to LucidWorks or winning the critical struggle for developers. Here’s the Elasticsearch pitch:


Understated and quiet, according to my engineering team. If the developments at Intrafind, SearchBlox, and Adhere Solutions, among others, are an early warning system, Elasticsearch certainly could be the “next big thing” in search, enterprise and otherwise.

What’s this mean for the proprietary and non open sourcey vendors like Coveo, Funnelback, Lexmark ISYS, and Hewlett Packard? I would suggest that these firms’ management have to adapt to what appears to be an emergent and disruptive force in information processing. If Elasticsearch does emulate the growth of the pre HP Autonomy, the millions of venture funding pumped into search funding and search acquiring may never be repaid. Chilling thought for some stakeholders who may have jumped on the wrong horse and seem compelled to continue to feed the nag fresh, expensive, non recoverable “clover.” (Think millions in hard cash funding with little to show that a payback is imminent or even possible.)


GitHub Search: Handy for Some Amazon Sportiness

March 24, 2014

GitHub, an open sourcey operation, is in the news again. Navigate to “AWS Urges Developers to Scrub GitHub of Secret Keys.” ITNews reports that some math club members—sorry, open source folks—have “inadvertently exposed their log-in credentials.”

The write up points out that a search of GitHub “for AWS keys returns almost 10,000 results.” The article notes:

GitHub is a community site where developers post their code and allow collaboration from other interested devs. The problem is developers aren’t taking enough care to ensure their credentials are properly protected.
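For readers who want to check their own working copy before pushing, here is a minimal sketch, not a tool from AWS, GitHub, or the ITNews article. It assumes that long-term AWS access key IDs follow the commonly documented shape of "AKIA" plus 16 uppercase letters or digits; the helper names `find_key_ids` and `scan_tree` are hypothetical, and a pattern match is only a candidate, not proof of a live credential.

```python
import re
from pathlib import Path

# Assumption: long-term AWS access key IDs look like "AKIA" followed by
# 16 uppercase letters or digits. Other credential types are not covered.
ACCESS_KEY_ID = re.compile(r"\b(AKIA[0-9A-Z]{16})\b")


def find_key_ids(text):
    """Return (line_number, key_id) pairs for each candidate key ID in text."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in ACCESS_KEY_ID.finditer(line):
            hits.append((number, match.group(1)))
    return hits


def scan_tree(root):
    """Hypothetical helper: scan every file under root, skipping .git internals."""
    report = {}
    for path in Path(root).rglob("*"):
        if not path.is_file() or ".git" in path.parts:
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file; skip rather than abort the scan
        hits = find_key_ids(text)
        if hits:
            report[str(path)] = hits
    return report
```

Calling `scan_tree(".")` from the top of a repository returns a dictionary keyed by file path. Any key that turns up should be treated as compromised and rotated, since removing it from the latest commit does not remove it from the repository history.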

With the management issues at GitHub, perhaps the company evidences some of the fissures in the open source approach to life, business practices, and, of course, search?

Stephen E Arnold, March 24, 2014

Open Source Management: A Work in Progress

March 20, 2014

I have attended a couple of “open source” events over the years. Most of the attendees are male, serious, bright, and similar to the fellows in my advanced high school math class and our math club.

The few women present were notable because there were so darned few of them. I attended the first Lucene Revolution with two exceptionally competent females, one a law librarian and one a PhD in operations management.

My recollection is that no one from Lucid, the sponsoring organizations, or the general attendance group paid either of them much, if any, attention, even when I introduced them.

As I recall, one of the then-senior executives of Lucid Imagination (now Lucid Works) blew off suggestions made by my PhD colleague. It was not what the Lucid person said. It was the facial expression that communicated, “Wow, do I have to listen to yet another idea from a PhD from Kentucky? I have better things to do with my really valuable time.”

I found the meeting amusing.

The female PhD did not share my point of view. Eighteen months later, that male Silicon Valley “superstar” was sucked into Lucid’s revolving door and spit into the ever sunny Silicon Valley job market. My team and I moved on, concluding that at least one open source search company was pretty much like my high school classmates in the math club. Others? Who knows? Who cares?

So what?

Well, I read a fascinating East Coasty article in the April Harper’s Magazine. The story is “The Office and its Ends”, a book extract from Cubed: A Secret History of the Workplace. Harper’s is into the Trotula recycling approach to content. Nikil Saval and his publisher Doubleday are, no doubt, thrilled by the East Coasty endorsement. Book sales are the name of the game.

Here’s the passage I noted. The extract is describing the workplace at GitHub, where much search source code resides. The GitHub passage begins on page 14. You will have to snag a hard copy at the library or on a newsstand, even though these are getting hard to find in rural Kentucky. Good hunting, gentle reader.

GitHub seems to be a case example of how to do the workplace.

The hook is, in my opinion:

Chacon [the GitHub CIO and a founder] described this [the GitHub workplace approach] as having developed from the open source model: ‘You have all these projects that you can work on, and people choose the crossover of what they’re good at… Leadership can be ephemeral.’

No doubt about leadership ephemerality in open source companies. The whizzing of the revolving door can be discerned in Harrod’s Creek, Kentucky.

This passage struck me as one to underline:

Yet Scott Chacon, one of the company’s founders and its current CIO, kept referring to the value of employees’ being able to ‘serendipitously encounter’ one another throughout the workday. When I [Mr. Saval] asked Chacon how this was supposed to occur if most of the staff wasn’t actually required to come into the office, he explained that he wanted these encounters to be rare, once every month or two, and to ‘deeper interactions.’…’That’s way more valuable to me than ‘I saw this person when I was going to the bathroom,’ or ‘I had to wait in line behind them when I was waiting for food.’ It seemed to me [Mr. Saval] a valid rebuke to the lazier ideas that proliferated in office-design-speak around the world.

I think Mr. Saval sees GitHub as a model for other companies to emulate. There you go. A model for alleged harassment.

By chance, I came across a CNNMoney article “GitHub Suspends Founder over Gender Harassment Claims.” I have no idea if CNN was able to put sufficient resources into researching GitHub because most of the “news” efforts are directed at a missing airplane story. Nevertheless, I will assume the write up is semi-accurate. Here’s the snippet I noted:

“I’ve been harassed by ‘leadership’ at GitHub for two years,” she wrote. “I’m incredibly happy to moving to join a more healthy work environment, with a team who doesn’t tolerate harassment of their peers.”

I circled this passage as well:

It’s hardly the first time a female entrepreneur has pointed out sexism in tech. Last year, tech developer Adria Richards posted to Twitter after taking offense to a sexual reference made by male attendees at tech conference PyCon. One of the men who made the reference was fired, and in a bizarre twist, Richards was also fired for “publicly shaming the offenders.” In another incident at annual tech conference TechCrunch Disrupt, entrepreneurs came under fire for pitching controversial apps…

Several observations:

  1. I wonder how Mr. Saval perceives this situation. I am not sure the GitHub workplace is where I want my daughter to work. If Mr. Saval has a daughter, a wife, a female cousin, I wonder if he would use his connections at GitHub to get one of these females a job.
  2. I wonder if the Lucid Imagination former executive is aware that my PhD colleague could have interpreted his treatment of her as untoward behavior. My hunch is that the disconnect between this Silicon Valley warrior and an African American PhD was so great that bridging the gap was impossible. I wonder if the fellow from Lucid Imagination even knew there was a gap.
  3. What does this Janus-like approach at GitHub say about open source management methods? I have a few ideas, but I will tuck them in my pocket for now.

To wrap up, the East Coasty approach to open source is intriguing. How will other open source companies manage? Will the guy-centric math club approach change? At my 50th high school reunion, the math club folks sat by themselves. Some behaviors are consistent through time, I believe.

The major challenge open source faces is management. I will clutch this assertion until someone demonstrates that whiz bang, I’m too busy, my plane is late methods really do deliver value to stakeholders and employees. With venture funding pouring into “open source plays”, how will these companies generate sufficient revenue to pay off the investors? Do Facebook, Google, IBM, and Yahoo have sufficient resources to buy every open source start up?

A decade ago even Google was smart enough to admit that it needed adult supervision. Even with an adult on the job, Google is a case study cornucopia; for example, the alleged relationship between a Google founder and a Glass marketer. Ample evidence appears to exist that high tech management has not found its sweet spot outside of the high school math club. If tech is the future of America’s industrial performance and open source software is the heir to proprietary software, when will management manage? One hopes in time to prevent the alleged unfortunate problems at GitHub from becoming more widespread.

Stephen E Arnold, March 20, 2014

Download A Free TemaTres Pack

March 11, 2014

Despite the dubious quality of the blog Home-Education. Free Download., they do make an interesting point with the post “TemaTres Pack.” Other than a link to a questionable download Web site, there is nothing in the post. What sort of knowledge can a user glean from a blog that was obviously made to house content and make a few cents on a dollar for the creator?

TemaTres is a legitimate open source vocabulary server developed to manage and exploit dictionaries, taxonomies, thesauri, and other formal representations of knowledge. It can also be downloaded from SourceForge; trust that over the above link.

Open source is a key player in technology and software development. Proprietary and open source are ingrained with each other and it is difficult to discern where the line is drawn, except when money comes into play. This link to a TemaTres download begs the question: what do free downloads do to the business models of Smartlogic, Mondeca, and other vocabulary management firms?

Companies are built on the entire premise of developing software to manage information, control vocabulary lists, and present it in a useful form. Open source is a boon to users, but is TemaTres going to dampen these companies’ profits? It is possible, but open source lacks the organization that comes with paying its developers, and it sometimes cannot offer a robust solution without an IT professional.

Whitney Grace, March 11, 2014

For Big Insights Try A Big Download From IBM

March 10, 2014

IBM might not be the first name when it comes to open source, but they experiment in that area and they have offered a free, downloadable version of BigInsights. On IBM’s developerWorks page, IBM InfoSphere BigInsights Quick Start Edition can be downloaded without any strings. It was made available to anyone who wants to experience enterprise level features, play with Hadoop, and figure out what it can be used for.

IBM describes InfoSphere BigInsights Quick Start Edition as:

“IBM InfoSphere BigInsights Quick Start Edition is a free, downloadable non-production version of BigInsights that enables new solutions that cost effectively turn large, complex volumes of data into insight by combining Apache Hadoop, (including the MapReduce framework and the Hadoop Distributed File Systems), with unique, enterprise-ready technologies and capabilities from across IBM, including Big SQL, text analytics and BigSheets.”

Can IBM use the word “big” to explain its product even more? Yes, they can, because they forgot to include big data solutions. This is, of course, a sales gimmick to entice people to buy the professional edition, but it has the open source benefits, especially in customer support and the IBM name.

Whitney Grace, March 10, 2014

Splunk: The Run Up May Have Hit a Glass Ceiling

March 1, 2014

I read “Splunk’s Q4 Expenses Run Hot as It Adds Salespeople.” I think of Splunk as a search and data access system that helps make sense of log files. I know that Splunk does more, but once I get an idea in my head, it is sometimes overly persistent.

The write up presented some interesting information.

  1. Splunk is running up its expenses.
  2. Some of the expenses are related to hiring sales people to make sales (obviously).
  3. Other costs were related to marketing a “hot” company’s wares.

Splunk is confident that the losses are anomalous.

I am not sure I agree. The simple reason is that Splunk’s success has given developers the idea that open source software can do what Splunk does better, faster, and cheaper. Usually, one has to pick two of these attributes.

But—and this is a big “but”—the thorn in Splunk’s side is Elasticsearch. The open source search system works wonders on some of the data that Splunk embraced. The Elasticsearch outfit is flush with cash from its recent round of funding. Even the azure chip “real journalist” operation at InfoWorld called Elasticsearch “hip.”

Other, probably less “hip” competitors like Lucid Works (formerly Lucid Imagination) want in on the Splunk game. Lucid wants to partner; Elasticsearch wants to let its legions of developer fanatics take the company wherever the Elasticsearch technology makes sense.

In my opinion, Splunk has a developer perception problem. I am not sure hiring sales people and pumping money into marketing is going to blunt the short and mid term impact of the Elasticsearch juggernaut.

Stephen E Arnold, March 1, 2014

Quote to Note: Open Source Is a Little Pregnant

February 27, 2014

I came across “Why Is Atom Closed Source?” The thread had a very interesting statement from mojombo. I quote:

Atom won’t be closed source, but it won’t be open source either. It will be somewhere inbetween, making it easy for us to charge for Atom while still making the source available under a restrictive license so you can see how everything works. We haven’t finalized exactly how this will work yet. We will have full details ready for the official launch.

Several years ago I gave a talk and used this diagram to illustrate the spectrum of open source search software:


Some of my information explaining the diagram turned up in an azure chip consulting firm report. Well, that’s how the semi straight consulting firms work.

The point of the diagram is that open source software is on a path to be commercial software. The open source cheerleaders deny this trend. I, on the other hand, submit that the Atom quote makes it pretty darned clear that being a little pregnant is not much different from having a commercial baby. Open source is increasingly a marketing ploy with lipstick.

Stephen E Arnold, February 27, 2014

Log Files: Search, Short Cuts, and Low Costs

February 26, 2014

I read “Splunk Feels the Heat from Stronger, Cheaper Open Source Rivals.” InfoWorld is up to its old tricks again. Log files have been around for decades. Many organizations allow more recent entries to overwrite previous log files. I know that some people believe that this practice has gone the way of the dodo. Well, would you like to buy a bridge?

For those who keep log files and want to figure out what treasures nestle therein, an outfit has marketed an expensive “search” system. Splunk is the darling of many information technology gurus. In Washington, DC, I am surprised when laborers in the Federal vineyard do not sport a Splunk tattoo.

IDC’s view is that there is change rolling down the road. The write up points out that Splunk is no longer limited. Like most information access systems, the company has expanded. In fact, the wizards at IDC parrot the jargon: Analytics. Here’s the passage I noted:

Splunk started strong and has only grown stronger as it’s branched out to become a wide-ranging analytics platform. But the free version of Splunk is quite limited, and the enterprise version’s pricing is based on the amount of data indexed, which adds up to prohibitive costs for some.

The important factoid is, in my opinion, cost. Most organizations want to reduce costs for some little understood information tasks. Making heads or tails out of the ever burgeoning and frequently overwritten log files may be at the top of the budget tightening list.

IDC, truly an expert in open source software, points out that “open source competition has been emerging in the background.” I suppose that’s why IDC is selling at $3,500 a whack analyses of open source such as this gem produced in part by IDC’s wizards. See Report 237410. Who wrote that? Worth a look I suppose.

The angle is that Graylog2 and Elasticsearch are chasing after Splunk. I am not sure if this is old news, good news, or silly news. What’s clear is that InfoWorld is covering open source and not emphasizing its deep research.

Cost control is a subtle point. I am delighted that the write up creeps up on one of the central attributes of open source software: No license fees. But what of the costs of installing, tuning, and maintaining the open source solution? Ah, not included in the write up. If you pony up $3,500 for an IDC open source report, I assume more substance is provided. Who wrote those IDC open source reports like 237410? Was it an IDC analyst, marketer, or reporter? Did the information come from another source?

Anyway, good PR for Elasticsearch. Bad PR for Splunk.

Stephen E Arnold, February 26, 2014
