March 11, 2014
Despite the dubious quality of the blog Home-Education. Free Download., it does raise an interesting point with the post “TemaTres Pack.” Other than a link to a questionable download Web site, there is nothing in the post. What sort of knowledge can a user glean from a blog that was obviously made to house content and make a few cents on a dollar for the creator?
TemaTres is a legitimate open source vocabulary server developed to manage and exploit dictionaries, taxonomies, thesauri, and other formal representations of knowledge. It can also be downloaded at SourceForge; trust that over the above link.
Open source is a key player in technology and software development. Proprietary and open source software are intertwined, and it is difficult to discern where the line is drawn, except when money comes into play. This link to a TemaTres download raises the question: what do free downloads do to the business models of Smartlogic, Mondeca, and other vocabulary management firms?
Companies are built on the entire premise of developing software to manage information, control vocabulary lists, and present the results in a useful form. Open source is a boon to users, but is TemaTres going to dampen these companies’ profits? It is possible, but open source lacks the organization that comes from paying its developers, and it sometimes cannot offer a robust solution without an IT professional.
March 10, 2014
IBM might not be the first name that comes to mind when it comes to open source, but the company experiments in that area and has offered a free, downloadable version of BigInsights. On IBM’s developerWorks page, IBM InfoSphere BigInsights Quick Start Edition can be downloaded without any strings attached. It was made available to anyone who wants to experience enterprise-level features, play with Hadoop, and figure out what it can be used for.
IBM describes InfoSphere BigInsights Quick Start Edition as:
“IBM InfoSphere BigInsights Quick Start Edition is a free, downloadable non-production version of BigInsights that enables new solutions that cost effectively turn large, complex volumes of data into insight by combining Apache Hadoop, (including the MapReduce framework and the Hadoop Distributed File Systems), with unique, enterprise-ready technologies and capabilities from across IBM, including Big SQL, text analytics and BigSheets.”
Can IBM use the word “big” to explain its product even more? Yes, they can, because they forgot to include big data solutions. This is, of course, a sales gimmick to entice people to buy the professional edition, but it has the open source benefits, especially in customer support and the IBM name.
March 1, 2014
I read “Splunk’s Q4 Expenses Run Hot as It Adds Salespeople.” I think of Splunk as a search and data access system that helps make sense of log files. I know that Splunk does more, but once I get an idea in my head, it is sometimes overly persistent.
The write up presented some interesting information.
- Splunk is running up its expenses
- Some of the expenses are related to hiring sales people to make sales (obviously)
- Other costs were related to marketing a “hot” company’s wares.
Splunk is confident that the losses are anomalous.
I am not sure I agree. The simple reason is that Splunk’s success has given developers the idea that open source software can do what Splunk does better, faster, and cheaper. Usually, one has to pick two of these attributes.
But—and this is a big “but”—the thorn in Splunk’s side is Elasticsearch. The open source search system works wonders on some of the data that Splunk embraced. The Elasticsearch outfit is flush with cash from its recent round of funding. Even the azure chip “real journalist” operation at InfoWorld called Elasticsearch “hip.”
Other, probably less “hip” competitors like Lucid Works (formerly Lucid Imagination) want in on the Splunk game. Lucid wants to partner; Elasticsearch wants to let its legions of developer fanatics take the company wherever the Elasticsearch technology makes sense.
In my opinion, Splunk has a developer perception problem. I am not sure hiring sales people and pumping money into marketing is going to blunt the short and mid term impact of the Elasticsearch juggernaut.
Stephen E Arnold, March 1, 2014
February 27, 2014
I came across “Why Is Atom Closed Source?” The thread had a very interesting statement from mojombo. I quote:
Atom won’t be closed source, but it won’t be open source either. It will be somewhere inbetween, making it easy for us to charge for Atom while still making the source available under a restrictive license so you can see how everything works. We haven’t finalized exactly how this will work yet. We will have full details ready for the official launch.
Several years ago I gave a talk and used this diagram to illustrate the spectrum of open source search software:
Some of my information explaining the diagram turned up in an azure chip consulting firm report. Well, that’s how the semi straight consulting firms work.
The point of the diagram is that open source software is on a path to be commercial software. The open source cheerleaders deny this trend. I, on the other hand, submit that the Atom quote makes it pretty darned clear that being a little pregnant is not much different from having a commercial baby. Open source is increasingly a marketing ploy with lipstick.
Stephen E Arnold, February 27, 2014
February 26, 2014
I read “Splunk Feels the Heat from Stronger, Cheaper Open Source Rivals.” InfoWorld is up to its old tricks again. Log files have been around for decades. Many organizations allow more recent entries to overwrite previous log files. I know that some people believe that this practice has gone the way of the dodo. Well, would you like to buy a bridge?
For those who keep log files and want to figure out what treasures nestle therein, an outfit has marketed an expensive “search” system. Splunk is the darling of many information technology gurus. In Washington, DC, I am surprised when laborers in the Federal vineyard do not sport a Splunk tattoo.
IDC’s view is that there is change rolling down the road. The write up points out that Splunk is no longer limited. Like most information access systems, the company has expanded. In fact, the wizards at IDC parrot the jargon: Analytics. Here’s the passage I noted:
Splunk started strong and has only grown stronger as it’s branched out to become a wide-ranging analytics platform. But the free version of Splunk is quite limited, and the enterprise version’s pricing is based on the amount of data indexed, which adds up to prohibitive costs for some.
The important factoid is, in my opinion, cost. Most organizations want to reduce costs for some little understood information tasks. Making heads or tails out of the ever burgeoning and frequently overwritten log files may be at the top of the budget tightening list.
IDC, truly an expert in open source software, points out that “open source competition has been emerging in the background.” I suppose that’s why IDC is selling at $3,500 a whack analyses of open source such as this gem produced in part by IDC’s wizards. See Report 237410. Who wrote that? Worth a look I suppose.
The angle is that Graylog2 and Elasticsearch are chasing after Splunk. I am not sure if this is old news, good news, or silly news. What’s clear is that InfoWorld is covering open source and not emphasizing its deep research.
Cost control is a subtle point. I am delighted that the write up creeps up on one of the central attributes of open source software: No license fees. But what of the costs of installing, tuning, and maintaining the open source solution? Ah, not included in the write up. If you pony up $3,500 for an IDC open source report, I assume more substance is provided. Who wrote those IDC open source reports like 237410? Was it an IDC analyst, marketer, or reporter? Did the information come from another source?
Anyway, good PR for Elasticsearch. Bad PR for Splunk.
Stephen E Arnold, February 26, 2014
February 17, 2014
I did a series of reports about open source search. Some of these were published under mysterious circumstances by that leader of the azure chip consultants, IDC. You can see the $3,500 per report offers on the IDC site. Hey, I am not getting the money, but that’s what some of today’s go go executives do. The list of titles appears below my signature.
Elasticsearch, a system that is based on Lucene, evolved after the still-in-use Compass system. What seems to have happened in the last six months is one of those singularities that Googlers seek.
In January 2014, GigaOM, a “real news” outfit, reported that Elasticsearch had moved from free and open source to a commercial model. You can find that report in “6 million Downloads Later, Elasticsearch Launches a Commercial Product.” The write up equates lots of downloads with commercial success. Well, I am not sure that I accept that. I do know that Elasticsearch landed an additional $24 million in series B funding if Silicon Angle’s information is correct. Elasticsearch is now armed with more money than the aging and repositioning Lucid Works (originally Lucid Imagination) has. (An interview with one of the founders of Lucid Imagination, the precursor of Lucid Works, is at http://bit.ly/1gvddt5. Mr. Krellenstein left Lucid Imagination abruptly shortly after this interview appeared.)
I noted that in February 2014, InfoWorld, owned by the publisher of the $3,500 report about Elasticsearch, called the company “ultra hip.” I don’t see many search companies, proprietary or open source, called “hip.” The article is “Ultra Hip Elasticsearch Hits Commercial Release.” The write up asserts (although I wonder who provided the content):
Elasticsearch was originally spun off from the Compass project, an open source Java search engine framework, back in 2004, in an effort to create a highly scalable search solution. Built on top of the well-known and popular Lucene library from the Apache Software Foundation, Elasticsearch adds such features as multitenancy, sharding, faceted search, and a JSON-based REST API. This feature set puts it in competition with the Solr project as a complete search solution built on top of Lucene.
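For readers who have not seen what a “JSON-based REST API” looks like in practice, here is a minimal sketch in Python. It only builds a search request and never sends it. The `/<index>/_search` endpoint and the `match` query are standard Elasticsearch features; the node address, the index name `logs`, and the field name `message` are assumptions made up for illustration.

```python
import json
from urllib.request import Request


def build_search_request(base_url, index, field, text, size=10):
    """Build (but do not send) an HTTP request for a full-text match query."""
    # Elasticsearch exposes search at /<index>/_search and accepts a JSON body.
    url = f"{base_url.rstrip('/')}/{index}/_search"
    body = {"query": {"match": {field: text}}, "size": size}
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical node address, index name, and field name.
request = build_search_request(
    "http://localhost:9200", "logs", "message", "timeout error"
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would send it to a running node. The point is that a query is just JSON over HTTP, which is a large part of the developer appeal.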
The statement does not hit what I thought are the main points of the Elasticsearch initiative. Let me fill in the blanks. Perhaps an azure chip consultant can use these to whip up another $3,500 report?
February 7, 2014
Whether Autonomy’s product success is true or false, as proprietary software it comes with a large price tag. The average small business or user cannot afford to purchase HP Autonomy’s IDOL Crawler. Open source is the best alternative, but for the longest time you could not get software comparable to IDOL Crawler. Norconex says that has changed in the article, “An Open Source Crawler For Autonomy IDOL.” Norconex released an HP Autonomy IDOL Committer for its open source Web crawler Norconex HTTP Collector.
The HTTP Collector is available on GitHub. The developer encourages people to download it and contribute to the project. Its features are mostly the same as those of the HP Autonomy HTTP Connector.
The article states:
“Most key features of HP Autonomy HTTP Connector are available in Norconex HTTP Collector, including document changes detection on incremental crawls and purging documents from IDOL for deleted web pages. New ones are introduced, such as having different hit interval at different time of the day and the ability to overwrite pretty much every part of the web crawling flow with your own implementation logic. The IDOL Committer has been tested on diverse public and internal web sites with great performance.”
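The change detection and purging mentioned in the quote can be illustrated with a generic checksum comparison. This is a sketch of the general technique only, not Norconex’s implementation. The function names and the data layout (a dictionary of URL to checksum kept between crawls) are assumptions for illustration.

```python
import hashlib


def checksum(content):
    """Return a stable fingerprint of a page body."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def plan_incremental_crawl(previous, fetched):
    """Compare stored checksums with freshly fetched pages.

    previous: dict mapping URL -> checksum saved by the last crawl
    fetched:  dict mapping URL -> page body from the current crawl
    Returns (to_index, to_delete, new_state).
    """
    new_state = {url: checksum(body) for url, body in fetched.items()}
    # New or modified pages must be sent to the index again.
    to_index = sorted(
        url for url, digest in new_state.items() if previous.get(url) != digest
    )
    # Pages that disappeared from the site should be purged from the index.
    to_delete = sorted(url for url in previous if url not in new_state)
    return to_index, to_delete, new_state


# First crawl: everything is new. Second crawl: /a is gone, /b changed, /c is new.
_, _, state = plan_incremental_crawl({}, {"/a": "one", "/b": "two"})
changed, removed, state = plan_incremental_crawl(
    state, {"/b": "TWO", "/c": "three"}
)
print(changed, removed)
```

On the second pass `changed` is `['/b', '/c']` and `removed` is `['/a']`. A real crawler would also have to tell a deleted page apart from a failed fetch, which this sketch ignores.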
We can learn from the open source community that if there is not a piece of software you want, all you have to do is wait until a developer makes it or you can take the initiative to do it yourself.
Whitney Grace, February 07, 2014
January 22, 2014
If you are thinking about building applications based on topic maps and do not feel like shelling out money for proprietary software, then do not look any further than Ontopia! Ontopia is an open source tool suite with features such as an ontology designer, a full-featured query language, web services points, database storage, and an instance data editor. There are many more powerful tools available with Ontopia outlined here.
Ontopia has been an on-going project in the open source community for over a decade and has an interesting history:
“The product suite is highly mature. Ontopia 1.0 was released in June 2001, and we are now nearing the release of Ontopia 5.1. Ontopia has been in production use in a number of commercial projects on three continents for many years now, and the core engine has been very stable over most of that period. Ontopia is open source and released under the Apache License 2.0. The entire product is released as open source. There are no proprietary add-ons, which are necessary to run it, or to make it suitable for an enterprise setting. Commercial support, however, is available.”
A developer community that has been attached to the project for years keeps up Ontopia, and there are new participants from Europe. If you are curious about recent activity with Ontopia, the developers keep a page on Google Code, and they also recently updated the Web site’s design.
January 22, 2014
Did you know that there was an open source version of ClearForest called Calais? Neither did we, until we read about it in the article posted on OpenCalais called “Calais: Connect. Everything.” Along with a short instructional video, there is a text explanation of how the software works. The OpenCalais Web Service automatically creates rich semantic metadata using natural language processing, machine learning, and other methods to analyze submitted content. A list of tags is generated and returned to the user for review, and then the user can paste them onto other documents.
The metadata can be used in a variety of ways for improvement:
“The metadata gives you the ability to build maps (or graphs or networks) linking documents to people to companies to places to products to events to geographies to… whatever. You can use those maps to improve site navigation, provide contextual syndication, tag and organize your content, create structured folksonomies, filter and de-duplicate news feeds, or analyze content to see if it contains what you care about.”
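The “maps” described in the quote boil down to an inverted index from tags to documents. The sketch below shows that idea in Python. It does not call the OpenCalais service, and the tags are hand-written assumptions for illustration, not real OpenCalais output.

```python
from collections import defaultdict


def build_entity_map(tagged_docs):
    """Invert document -> tags into tag -> set of documents."""
    entity_map = defaultdict(set)
    for doc_id, tags in tagged_docs.items():
        for tag in tags:
            entity_map[tag].add(doc_id)
    return entity_map


def related_documents(tagged_docs, doc_id):
    """Return the other documents that share at least one tag with doc_id."""
    entity_map = build_entity_map(tagged_docs)
    related = set()
    for tag in tagged_docs.get(doc_id, ()):
        related |= entity_map[tag]
    related.discard(doc_id)
    return sorted(related)


# Hypothetical tags, written by hand for this example.
docs = {
    "post-1": {"Company:Splunk", "Technology:Elasticsearch"},
    "post-2": {"Company:Splunk", "Company:IDC"},
    "post-3": {"Company:IBM"},
}
print(related_documents(docs, "post-1"))
```

Here `post-1` and `post-2` are linked because both mention Splunk, which is the sort of connection one could use for site navigation or de-duplicating a news feed.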
The OpenCalais Web Service relies on a dedicated community to keep making progress and pushing the application forward. Calais takes the same approach as other open source projects, except this one is powered by Thomson Reuters.
January 13, 2014
Here is an open-source solution for the search crowd: check out the beta version of LogicPull, available at GitHub, for some content magic. The tool lets one create advanced interviews for end users, then feeds their answers to document templates.
The description elaborates:
“LogicPull was initially developed to save time and money creating the many legal documents needed for a court proceeding. It has since expanded to handle the assembly of PDF, DocX, RTF and XML documents for any project. It is a cloud based automated document assembly service. We give you the tools to quickly create an advanced question and answer interview to be completed by an end user, which in turn creates an answer set to be combined with a template to produce documents.
*Multiple Document Formats Supported
*Create Complex Branching Logic
*Keep your Data and Documents in the Cloud
*Save Progress on Client Interviews
*Attach Custom Templates to Guided Interviews
*Preview your Work Before it Goes Live
*Send Processed Documents Automatically”
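The core idea in that description, an answer set merged into a template, can be sketched with Python’s standard `string.Template`. This is not LogicPull’s code or API. The template text, the field names, and the `assemble` helper are assumptions made up for illustration.

```python
from string import Template

# Hypothetical template text with $placeholders for interview answers.
LETTER = Template(
    "Dear $client_name,\n"
    "Your hearing in $court is scheduled for $hearing_date.\n"
)


def assemble(template, answers):
    """Merge an answer set into a template; fail loudly on a missing answer."""
    try:
        return template.substitute(answers)
    except KeyError as missing:
        raise ValueError(f"interview answer not provided: {missing}") from None


# Hypothetical answer set collected from an end user interview.
answers = {
    "client_name": "A. Client",
    "court": "Small Claims Court",
    "hearing_date": "1 May",
}
print(assemble(LETTER, answers))
```

Using `substitute` rather than `safe_substitute` is deliberate: a legal document with an unfilled placeholder is worse than an error message.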
Naturally, the GitHub entry lists system and software requirements for running LogicPull, as well as links to demos, an installation tutorial, and an article on building the solution logically. You can also look through the FAQs, known issues, envisioned improvements, and other key info. One point to note: in order to use the full version of LogicPull, one must register. However, at the time of this writing, the site is a victim of its success: so many folks have registered recently that sign-up is currently disabled. Let us hope it will re-open soon.
Cynthia Murrell, January 13, 2014