October 21, 2014
If you want to avoid the hassle of some proprietary search engines, you may want to take a look at this case study about ElasticSearch. Navigate to “Building Scalable Search from Scratch with ElasticSearch.” The author works through his process for putting ElasticSearch to work in content space with a variety of information; for example, products, text collections, and user information.
What makes this write up useful is the logical layout of the article and the inclusion of a requirements summary, block diagrams, and code snippets.
This type of solid user support is one reason ElasticSearch is outpacing some open source search competitors like LucidWorks and Nutch.
Highly recommended. (As far as I can tell, no mid tier consulting firm has surfed on this content. Dave Schubmehl, this may be an opportunity.)
Stephen E Arnold, October 21, 2014
October 15, 2014
There is a presentation “Kicking the Bukkit: Anatomy of an Open Source Meltdown” by Ryan Michela, a developer with experience in open source. Over several years, an open source game project rose and fell. I am not too interested in open source games. At the end of the Slideshare document, there are five reasons an open source game project failed.
Let me summarize these and encourage you to work through the full 55 slide deck. How many of these issues may have an impact on open source search systems? Keep in mind that commercial enterprises like Attivio and IBM make use of open source technology.
- Inclusion of decompiled code in an open source project
- License issues
- Tie ups within the community before a project gains momentum
- No contributor license agreement
- Disgruntled developers in the community.
The presentation includes a quote that I noted:
It only takes one unhappy developer to kill an unprotected project.
Is there an open source search company vulnerable to one or more of these issues? I can name a couple. I wonder if these firms’ funding sources are concerned about their investment “kicking the bucket”?
Stephen E Arnold, October 15, 2014
October 14, 2014
The article on Linux Insider titled “Dan Allen and Sarah White: Documentation Dearth Dooms Open Source Projects” discusses the work of entrepreneurs Allen and White. The pair have focused on encouraging and aiding software developers in “superior documentation” for open source software. The article includes an interview with White and Allen explaining the function of their program, called Asciidoctor. Allen states in the interview,
“What we have done with Asciidoctor is make the documentation something of value. We do that by, number one, rewarding the writer. For most software developers of open source software, whatever documentation that is created gets published on the website. So we show the developer how the content looks on a Web page displayed in Asciidoctor. When the software developer sees how minor the content is, that triggers motivation to fill in the gaps.”
According to Sarah White, software developers have had a “stunning” response to the motivation to improve documentation (which includes, White notes, improvements to the homepage and to training materials). Since their start in November, White claims that there has been a tremendous influx of clients interested in making the sort of improvements that White and Allen offer. In the future, White is particularly interested in ensuring that all documentation is integrated to render well on different types of devices, particularly mobile screens.
Chelsea Kerwin, October 14, 2014
October 8, 2014
I read “12 Open Source CRM Options.” I think of CRM as a synonym for customer experience or CRM as an easy way to suck down a top salesperson’s contact list when he or she heads to greener pastures. I know. I am shortsighted.
The write up surprised me because I did not know there were a dozen open source CRM solutions, components, or widgets. I assumed there were the big buck systems from Oracle and Salesforce.com. I was uninformed.
I had heard of SugarCRM because one of the proprietary search vendors supports the system. I had not heard of:
Vtiger (a variant of SugarCRM), SuiteCRM, Fat Free CRM, Odoo, Zurmo, EspoCRM, SplendidCRM, OpenCRX, X2Engine, Concourse Suite, or CentraView.
Well, there you go.
My reaction to this basket of “suites” is that search is going to be part of the offering. When the baked in solution falls short, then the licensees will look for more robust solutions. For me, that means taking a look at the open source search solutions. ElasticSearch and Sphinx Search come to mind, but there are others.
I would not be too keen to license one of the proprietary search systems for three reasons:
- Try open source and, if it works, the money can be used for other things: raises or hiring a tastier consultant
- There are satisfactory information retrieval solutions that run from the cloud, on premises, or in a hybrid mode
- The hassles of integrating an open source and a proprietary system can be sidestepped. Integration is never a walk in the park, but it seems that open source begets open source.
Stephen E Arnold, October 8, 2014
October 6, 2014
You can download the most recent version of ElasticSearch via the link in the ElasticSearch blog. Navigate to http://bit.ly/1uxnOfN and click the download button. Changes include a fix for shard recovery and corruption occurring when a licensee upgrades an old index.
Stephen E Arnold, October 1, 2014
September 29, 2014
Navigate to “Postgres Full Text Search Is Good Enough.” I first heard this argument at a German information technology conference a few years ago. The idea is surprisingly easy to understand. As long as a user can bang in a couple of key words, scan a result list, and locate information that the user finds helpful, the job is done. The search results may consist of flawed or manipulated information. The search results may be off point for the user’s query when evaluated by old fashioned methods such as precision and recall. The user may be dumb and rely on what the user finds accurate.
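For readers who have not bumped into the old fashioned methods, here is a minimal Python sketch of how precision and recall are usually computed for one query. The function name and the document identifiers are made up for illustration; they do not come from the Postgres article.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: document ids the search system returned
    relevant:  document ids a human judged relevant to the query
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = retrieved_set & relevant_set  # relevant documents that were returned

    # Precision: what share of the returned documents are relevant.
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    # Recall: what share of the relevant documents were returned.
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical example: four results returned, three documents judged relevant.
results = ["doc1", "doc2", "doc3", "doc4"]
judged_relevant = ["doc2", "doc4", "doc7"]
precision, recall = precision_recall(results, judged_relevant)
```

A good enough system can score poorly on both numbers and still leave the user satisfied, which is the point of the argument.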
This write up explains the good enough approach in terms of PostgreSQL, a useful open source Codd type data management system. Please, note. I am not uncomfortable with good enough search. I understand that when the herd stampedes, it is not particularly easy to stop the run. Prudence suggests that one take cover.
Here’s the guts of the write up:
What do I mean by ‘good enough’? I mean a search engine with the following features:
- Ranking / Boost
- Support Multiple languages
- Fuzzy search for misspelling
- Accent support
Luckily PostgreSQL supports all these features.
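To make three of the listed features concrete (ranking with a boost, fuzzy matching for misspellings, and accent handling), here is a rough sketch in plain Python using only the standard library. It is not how PostgreSQL implements these features and it is not code from the article; the function names, the title boost of 2.0, the 0.8 similarity cutoff, and the sample documents are all assumptions made for illustration. Multi-language support is left out.

```python
import difflib
import unicodedata


def fold_accents(text):
    """Strip accents so that 'café' and 'cafe' compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def tokenize(text):
    """Accent-fold, lowercase, and split on whitespace."""
    return fold_accents(text).lower().split()


def score(query, title, body, title_boost=2.0, cutoff=0.8):
    """Add title_boost for each query term found in the title, 1.0 for the body.

    A term counts as found when some word is at least `cutoff` similar to it,
    which tolerates small misspellings.
    """
    total = 0.0
    for term in tokenize(query):
        for words, weight in ((tokenize(title), title_boost), (tokenize(body), 1.0)):
            if difflib.get_close_matches(term, words, n=1, cutoff=cutoff):
                total += weight
    return total


def search(query, docs):
    """Return titles of matching documents, highest score first."""
    scored = [(score(query, d["title"], d["body"]), d["title"]) for d in docs]
    ranked = sorted(scored, key=lambda pair: -pair[0])
    return [title for s, title in ranked if s > 0]


# Hypothetical documents for illustration only.
docs = [
    {"title": "Café menu", "body": "coffee and pastries"},
    {"title": "Search notes", "body": "a cafe near the office"},
    {"title": "Unrelated", "body": "nothing to see here"},
]
```

With these sample documents, both `search("cafe", docs)` and the misspelled `search("cafee", docs)` rank the title match ahead of the body match and drop the unrelated document. A database does this work with indexes rather than a loop over every word, so treat the sketch as a description of behavior, not of performance.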
The write up contains some useful code snippets to make use of search features. The discussion of full text search is coherent and addresses a vast swath of content. Note that proprietary vendors have tilled acres of marketing earth and fertilizer to convert search into a mind boggling range of functions.
This article includes code snippets to tackle full text within PostgreSQL.
Querying is included as well. Again, code snippets are included. (My teenage advisors said, “Very useful snippets.” Okay. Good.)
The write up concludes:
We have seen how to build a decent multi-language search engine based on a non-trivial document. This article is only an overview but it should give you enough background and examples to get you started with your own….Postgres is not as advanced as ElasticSearch and SOLR but these two are dedicated full-text search tools whereas full-text search is only a feature of PostgreSQL and a pretty good one
Reasonable observation. Worth reading.
If you are a vendor of proprietary search technology, there will be more individuals infused with the spirit of open source, not fewer. How many experts are there for proprietary systems? Fewer than the cadres of open source volk I surmise.
Stephen E Arnold, September 29, 2014
September 29, 2014
I read “Tibco Sells Out to Private Equity in $4.3bn Deal with Vista Equity Partners.” I found Tibco interesting when I saw the servers used to power Yahoo News a number of years ago. The company is now owned by accountants and MBAs. I learned in the write up:
Tibco was founded in 1997 by its current chairman and CEO Vivek Ranadive. It was a pioneer of message-oriented middleware, particularly for the financial sector, which enables information to be pushed to multiple recipients at precisely the same time. However, Tibco’s expensive high-end proprietary software is under attack from open source in the form of the Advanced Message Queuing Protocol (AMQP), which promises not just lower-cost message queuing software, but also inter-operability between different vendors’ implementations of the open-source standard.
My recollection is that Tibco’s “information bus” made some of the old line outfits uncomfortable. Perhaps IBM? If the write up is accurate, open source is claiming a proprietary vendor.
How long will proprietary enterprise search vendors be able to keep the open source predators away? If the financial market gets the willies, over hyped proprietary systems are likely to face high seas. Some swimmers drown in rough water even though the marketers insist the sun is shining.
Stephen E Arnold, September 29, 2014
September 26, 2014
The race for commodity pricing in cloud computing is underway. I read an article, which I assume is semi-accurate, called “Microsoft Azure Sees Big Price Reductions: Competition Is Good.” “Good” is often a relative term.
For those looking for low cost cloud computing that delivers Azure functions, lower prices mean that Amazon- and Google-type prices may be too high.
For a vendor trying to pitch an information retrieval system to a Microsoft centric outfit, the falling prices may mean that Azure Search is not just good enough. It is a deal. The only systems that can be less expensive are those one downloads from an open source repository or one that a hard worker codes herself.
The write up states:
Microsoft has announced, in a blog post, that it will be slashing the cost of some of its Azure cloud services from October 1st….customers buying through Enterprise agreements will enjoy even lower prices. The rate card currently shows 63 services being reduced by up to about 40%.
For enterprise search vendors chasing SharePoint licensees with promises of better, faster, and cheaper—the move by Microsoft is likely to be of interest.
I anticipate that search vendors will scramble even harder than ever. Furthermore, I look forward to even more outrageous assertions about the value of content processing. As an example, check out this set of assertions about an open source based system that has been scrambling for purchase on the sales mountain for six or seven years.
Stephen E Arnold, September 26, 2014
September 24, 2014
I read “Red Hat CEO Announces a Shift from Client-Server to Cloud Computing.” With Red Hat the poster child for the economic viability of an open source business model, this shift seems to mark a break with Red Hat’s past focus.
The article reports:
In case you haven’t gotten the point yet, Whitehurst [Red Hat big gun] states, “We want to be the undisputed leader in enterprise cloud.” In Red Hat’s future, Linux will be the means to a cloud, not an end unto itself.
No problem with this move. Most of the organizations with which I have contact bemoan the cost of on premises computing. The cloud, as I understand their MBA-tinged reasoning, is cheaper. Cutting back on staff, eliminating the expensive weekend triage sessions with engineers who charge more than roving physicians in New Jersey, and escaping the hassles of human resources professionals who complain about body shops, background checks, and turnover: these themes surface.
The move should be okay for Red Hat. The company is moving in a new direction. Existing customers will be okay for the foreseeable future.
On a related note, I was scanning one of the less and less heavily visited LinkedIn enterprise search bulletin boards. What did I see? A brave soul was looking for a hosted version of Solr, presumably for its facets and perceived zippy performance.
In one of the comments, an “expert” mentioned Lucid Works, which invokes from me the thought, “Really?” The “expert” said that the Lucid Works cloud offering was no longer available.
I suppose this is an example of contrarianism, but if the statement were true, maybe Lucid Works knows something that has eluded Red Hat? Interesting question. My hunch is that Red Hat knows what it is doing.
Stephen E Arnold, September 23, 2014
September 24, 2014
Though many news sites allow ads to more or less (depending on the site) blend in with their real articles, this native advertising is usually easy enough to spot if you know what you’re looking for. Still, it can put a crimp in one’s skimming speed. Now, Google engineer Ian Webster offers the open source AdDetector, a browser plug-in that makes such “stories” more obvious. The plug-in is currently available for Chrome and Firefox. The description states:
“AdDetector reveals articles with corporate sponsors. This browser plugin puts a red banner above articles that may appear unbiased but are actually ads or press releases. Its goal is to improve transparency in media and on the web. Trusted by 14,000+ people, AdDetector spots ads in over 100 top newspapers and online publications. More sites are being added daily. If you’d like to see a site added, tweet, email, or use this form.”
The page includes screenshots of its banners in action. The software works by detecting sponsor markings on these pages, many of which are not visible to readers. There is no word on the plug-in’s error rate, but it seems bound to smooth the path for news speed-readers like me.
Cynthia Murrell, September 24, 2014