Hadoop Caught in Loops
July 4, 2009
Dana Blankenhorn’s “Who Will Control Hadoop?” here raised an important question. His focus was narrow, but I considered his question in a broader context. Mr. Blankenhorn asked:
Do too many Hadoops spoil the code?
In a narrow sense, my view is let many flowers bloom. When the world was less fluid, flaky, and financially challenged, many efforts seemed like a good idea. Now, I am not so sure. Mr. Blankenhorn said:
But some reporters are beginning to ask who is really in charge of Hadoop. Is it Apache or Yahoo? Was Yahoo’s distribution a diss of Facebook, which previously developed its own Hadoop SQL, called Hive? Most projects have a community and a commercial arm. Hadoop’s importance has drawn a number of corporate sponsors to separately deliver their implementations. Microsoft, Yahoo, Google, and Facebook all have their own takes on Hadoop, alongside Apache and Cloudera. All these various Hadoops can be seen as a positive or a negative. As a positive, there is growth and momentum for the framework. As a negative, there are many organizations pulling Hadoop in different directions.
In a broad context, the value of open source software is that many hands work to create a foundation stone that is not proprietary, not unstable, and not subject to the whims of a corporate titan. On the other hand, fragmentation of an important technology makes some folks wary of open source.
The way online works is to reward one company with a virtual monopoly. This is a natural consequence of costs and user behavior. The problem is that when one outfit is in control, that organization follows the well worn path of profit and benefit maximization. That can’t be helped either.
In short, I think the same type of financial meltdown that has trashed some individuals’ plans for the future is likely to take place again. Tricky stuff, indeed.
Stephen Arnold, July 4, 2009
Lucid Imagination Offers Connectors to Lucene Solr Systems
June 30, 2009
Lucid Imagination now resells ISYS Search Software file filters to offer content access capability to Lucene/Solr open source search systems. This clever move has the happy side effect of allowing Lucid to market the filters, a set of .dlls (dynamic link libraries) normally used in retail products for text extraction, to its own customers, effectively stretching its Lucene/Solr search product into the pay-for-service enterprise data field. It’s a streamlined effort designed to be significantly cheaper than competitor connectors, and it gets around one of the barriers to broader uptake of open source search technology. Most commercial search vendors do not unbundle their connectors and often use them to justify higher price tags. This deal may take a lot of wind out of their sails. Lucid will offer five categories of content filters, available separately or in any combination, so a company can customize based on its search needs. Beyond Search was surprised that a commercial search vendor is unbundling its technology. The plus is that it gets the Australian company’s foot in the door to the open source market. Meanwhile, Lucid is on the move to strengthen its position bridging the gap between open source and commercial software and will be signing up other commercial software components so Lucene/Solr users can build more robust search solutions.
Jessica Bratcher, June 30, 2009
Lucid Meet Up: Open Source Search Draws Crowd
June 23, 2009
I was in San Francisco the day of the open source Lucene meet up sponsored by Lucid Imagination. The New Idea Engineering Web log wrote a useful summary of what transpired. You can find “Impressions of First Lucene / Solr Meet Up” on the Enterprise Search Blog. Keep in mind that the founders of the Enterprise Search Blog liked the study “Successful Enterprise Search Management” Martin White and I wrote. People who like what I do may have unusual tolerance for addled geese. You have been warned.
I noted the upside and downside of a technical meet up, but I wanted to know more. I chased down David Fishman, one of the spark plugs for Lucid Imagination. You can read an interview with one of the founders of Lucid Imagination, Marc Krellenstein, in the ArnoldIT.com “Search Wizards Speak” series.
I came away from my discussion with Mr. Fishman more than a little impressed. Some of the items that remained pinned to my brain’s search bulletin board warrant sharing.
First, open source is hot. Few information technology professionals want to go to a meeting about search without first hand information about Apache Lucene (http://lucene.apache.org/) and Solr.
Second, Lucid Imagination (www.lucidimagination.com) is gaining traction with its industrial strength approach to the open source search technology that promises relief from the seven figure licensing fees imposed by some of the high profile search and retrieval vendors.
The meet up brought together almost 50 engineers and programmers on June 3. Featured speakers included Grant Ingersoll of Lucid Imagination and the Apache Lucene project development team, as well as Erik Hatcher, author of Lucene in Action, a member of the Apache Lucene project development team, and, with Ingersoll, a co-founder of Lucid Imagination. Jason Rutherglen and Jake Mannix of LinkedIn talked about how they’ve implemented search at the core of their cutting edge social network. Other speakers addressed a wide range of deep search questions, including numeric search, aka Trie Range queries. Avi Rappoport, a search consultant, talked about the approach to “stop words”, encouraging search application developers not to ignore words like “the”, “in”, and the like given the power of today’s compute resources to deal with such nuances.
Back to Lucid: Grant Ingersoll’s talk focused on innovations in Solr 1.4, the forthcoming release of the search platform built around the Lucene Search engine. While there are a good number of important new features, including Trie-range queries for better searching of numeric data, and advanced replication and better logging for improved scalability and deployment, that’s just the latest in a string of enterprise grade innovations that the open source community has rolled together, closing the gap with many, if not most, of the meaningful technology features of commercial enterprise search software. Erik Hatcher spoke about a new search engine for search developers (http://search.lucidimagination.com) that Lucid sponsors for the community, using Lucene and Solr technology to plow through the abundant discussions and technical info created over the years — providing faster troubleshooting and education than programmers could get before.
There were three takeaways from the meeting, according to David Fishman, who does marketing for Lucid Imagination. The breadth and depth of the search problem set means that it’s not going to be solved by one company or one set of people; the active, engaged open source community is constantly adding and innovating new features, putting them through their paces, and pushing the frontier faster than any single company could.
The technology upon which open source search rests is as good or maybe better than some of the commercial products’ code base. Many hands and many eyes mean that the gotchas hiding in some of the high profile brands’ products are not going to jump out and bite an administrator.
That demand is real: innovative companies as different as IBM, Zappos, Netflix, LinkedIn, Digg, AOL, MySpace, Apple, Comcast Interactive and more have all built mission critical search services at the core of their business using this technology. The people who came to this meet up, and to one just like it two weeks earlier in Reston, Virginia (http://www.meetup.com/NOVA-Lucene-Solr-Meetup/), are part of that rapidly accelerating adoption curve. There’s no need to call a salesperson or schedule a demo to get started; the community lowers the barriers to experimentation and participation.
Not least important is what wasn’t covered, said Fishman. Innovation is half the battle; the other, reliability. As Mark Bennett observed on his blog, this meet up was not the crowd that keeps datacenter and IT managers sleeping soundly through the night. Commercial grade reliability comes from a commercial-grade company with the expertise to help get it working and keep it working. And having talked to the Lucid Imagination team, I can say they not only “get” search. They “get” service level agreements. That may be one reason why they’re in the business of offering commercial grade support for these technologies.
To sum up, what strikes me as new is that Lucid’s pool of engineers is available to help — many of them, the same engineers who help write the code and manage the innovations with the Apache Lucene community. What the IT guys get by working with Lucid is the combination of innovation with peace of mind and better control of customization and maintenance.
My hunch is that a company with a search system is going to invest in professional services for support no matter what search solution it deploys. Even if open source makes it easy to get search, it takes expertise to get search right.
If I know Marc Krellenstein, the Lucid Imagination team will be able to deliver that expertise at competitive rates. Certainly, the range of companies represented suggests that open source search is moving toward center stage.
Can open source search gain traction in the enterprise? In some organizations, the answer is, “Yes.”
Open source search is here and Lucene/Solr promises to push beyond simple search and retrieval.
Stephen Arnold, June 23, 2009
Free Not Free
June 21, 2009
Techradar’s “Why Free Web Services Aren’t Really Free” surprised me. I have been chugging along with the idea that when I navigated to Google’s search page, ran a query, and reviewed results I was using a free service. I have a free Yahoo email account, and I assumed that because I don’t pay Yahoo money, that email service was free. I learned that if I navigate to a freeware site and download a free application, that download service is not providing me with a free service.
I just learned that these services are not free. I needed to shift my thinking because I am not paying money, providing a PayPal cash transfer, or spitting out my credit card number.
Techradar’s point is that “free” does not mean that usage is without a cost. I was thinking about my Econ class in my freshman year at university. The use of certain words separates an econ major from the lowly math or science wonk. Econ revels in notions of “cost” and “value”.
Techradar asserted:
But from a free software perspective, we’re fighting a dangerous battle. Trading one closed-source app for another gets us nowhere, even if the new app happens to come from Google. Yes, the company does appear to have a bottomless source of storage and bandwidth, but would you feel happy recommending Google Docs to your friends if it were run by Microsoft? It’d be just as free and just as featureful – but somehow people have been fooled into thinking that Google can be as proprietary as it wants and we ought not to care. Tim O’Reilly’s classic speech, “The Open Source Paradigm Shift”, makes it clear that the commoditization of operating systems is imminent, with the next war being fought in the web app space. In 10 years’ time your desktop computer will almost certainly run nearly all your programs over the internet, with your OS being a relatively thin shell that fires up a web browser and points you towards the net. If, in that time, all we’ve done is trade offline closed-source apps for online closed-source apps, then everything we’re fighting for will be worthless. I don’t think anyone wants to see that happen.
Ah, I get it. “Lock in”, “rip and replace cost”, and “value of service”.
And why is lock in bad? It seems to be working in a number of business sectors. If I want to fix my Honda, I have to buy my part from Honda because it will in my experience fit, be less of a hassle if it fails, and can be installed by a Honda technician. The benefit of lock in may be utility or a greater good or a perceived value.
Should software and services be different? Customize an open source system using a third party and you have lock in. Tough to get away from this “value” notion and some other economic forces. Maybe “free” is bad for me because I want to pay so I have one throat to choke or at least a number to call when there is a problem.
Stephen Arnold, June 21, 2009
CIA Investment Arm Taps Open Source Search
June 16, 2009
In Washington, DC, yesterday (June 15, 2009), I learned that In-Q-Tel, the investment arm of the US intelligence community, has given a salute to open source. Lucid Imagination, a commercial open source company dedicated to supporting Apache Lucene and Solr search technologies, received a strategic investment from In-Q-Tel.
With the investment, Lucid Imagination will support the US intel community by providing advanced access to Lucene and Solr search solutions.
I learned that this partnership will further accelerate Lucid Imagination’s efforts towards facilitating and broadening the adoption of Lucene/Solr technology within the U.S. Intelligence Community, in addition to the commercial market. Lucid Imagination is proud to be a part of IQT’s efforts at bringing innovative technologies to the U.S. intelligence community.
Lucene/Solr-based search has become one of the fastest-growing segments in the enterprise search market over the past three years, with more than 9,000 downloads per day and over 4,000 organizations using the open source software for some of industry’s most intensive search needs. Many organizations are replacing costly proprietary licensed search software products with Lucene and Solr because, in addition to lower TCO, the pair offers the most flexible and scalable architecture for developing highly sophisticated full-text search applications.
Previously, companies using Lucene/Solr relied primarily on the open source community for training, documentation and technical support, a fact that kept many others from using or considering the technology. With Lucid Imagination, a well-funded commercial entity offering certified distributions of Lucene and Solr, SLA-based support subscriptions, training, high-level consulting and value-added software, both new and existing users now have access to enterprise-grade support and services to optimize their enterprise search efforts. Lucid Imagination’s Web site is designed to serve as a “knowledge portal” for the Lucene/Solr user community, with information and resources to help application developers build and deploy Lucene/Solr-based solutions in a more efficient and cost-effective manner.
In my view, the In-Q-Tel investment has several interesting aspects.
First, the US government appears to perceive significant potential for Lucene/Solr adoption among its government partners. Several of these entities have already been using the technology and require more reliable, predictable support and services for better risk management. Lucid Imagination provides the commercial backing IQT’s partners need to deploy Lucene/Solr in mission-critical applications.
Second, in today’s business climate, open source solutions provide one way to tap into a broad community and its programming expertise. Important innovations often bubble up from open source, thus reducing the time between a good idea and a concrete implementation of a function or feature. Some proprietary systems impose a “time friction” on licensees.
Third, open source allows some applications to reduce or eliminate what I call the “one way street” that commercial software often requires of licensees. Flexibility can deliver both financial and technical advantages in my opinion. In my experience, it can be time consuming and expensive to “get information out” of some commercial systems or expensive to figure out how to tap or repurpose a proprietary content processing system’s outputs.
Lucid Imagination is a commercial company dedicated to Apache Lucene technology. The company provides free certified distributions, commercial-grade support, training, high-level consulting and value-added software for Lucene and Solr. Lucid Imagination’s goal is to serve as a central resource for the entire Lucene community and make them more productive. Lucid Imagination is a privately held venture-funded company. Investors in Lucid include Granite Ventures and Walden International. More information is available at www.lucidimagination.com.
Stephen Arnold, June 16, 2009
SAP and Open Source
June 13, 2009
Glyn Moody and I seem to be on a similar frequency. I like his work. The article “SAP: Open Source’s Friend or Foe?” is an excellent example. Glyn Moody tackled SAP’s reluctance to cozy up to open source. He wrote:
For an outfit that calls itself “the world’s largest business software company”, the German software giant SAP is relatively little-known in the open source world.
Now SAP wants to support the Eclipse Foundation. Glyn Moody reported:
There are many well-known benefits that accrue from mandating open source for European contracts – level playing-field, absence of lock-in, ease of moving between suppliers etc. More generally, it creates a bigger software commons that everyone can draw upon - not just companies, whether giants like SAP, or small startups, but educational establishments too (an important but often-overlooked sector). Companies that have adopted a mixed model can simply re-jig their product line, offering wholly open source versions for European government consumption, and making money through their proprietary add-ons elsewhere; adoption by Europe would be a huge marketing boost, making it much easier to do this. And if they won’t adapt to the situation, that creates an opportunity for new players who *are* willing to do so. That’s not the only place where SAP’s attitude to open source is ambiguous, to put it mildly.
My hunch is that SAP is feeling pressure from its fee boosts, customer push back, and escalating demands for cash from SAP engineers who have to keep the complex SAP solutions up and running. Innovation, well, that’s another hungry mouth begging for euros too.
In my opinion, SAP has to find a magic hat and start pulling bunnies from it. I am not sure open source will yield the bunnies SAP needs.
Stephen Arnold, June 13, 2009
Yahoo Hadoop
June 11, 2009
Short honk: The Yahoo Developer Blog Network published “Announcing the Yahoo Distribution of Hadoop” here. IBM and Yahoo teamed on a version of OmniFind that is free. When I installed the system, I found it had a document limit. Bummer. Will the Hadoop distribution have a limit of some type that’s not in the Apache version? With all of the commercial pressures on Yahoo, what business is the company pursuing: online ads, banner ads, open source software, for fee email, or any of the other services available from the splash page? I’m puzzled. Anyone care to direct my thinking?
Stephen Arnold, June 11, 2009
Microsoft Partners Flock to Lotus Notes
June 7, 2009
Here’s the headline that brought a smile to my face: “200+ Microsoft Partners Per Month Flocking to Sell IBM Lotus Foundations Appliance”. You can read the scoop on the KLKN TV Web site here. Take a close look. The “story” is a news release. The author of the “story” / news release (!) is IBM. The document reports some “facts” and “ideas” that I find at odds with my experience. Keep in mind that I am an addled goose and lack the sophistication of the IBM PR team, but consider these points:
- “According to Microsoft business partners, sales of Microsoft’s Small Business Server (SBS) software bundle have slowed due to lack of innovation and partner dissatisfaction with their inability to add solutions. In challenging economic times, Microsoft partners are looking for an alternative that provides SMB customers with more collaboration computing power for less money and more reliability.”
- “…Some Microsoft partners have expressed disenchantment with Microsoft’s new strategy of battling Linux encroachment in the SMB market by offering a skeletal version of Windows Server. Named “Microsoft Foundation Windows Server,” it can be positioned as a loss leader to up-sell customers a variety of other Microsoft products. Seeing rising customer demand for lower cost, more secure, and open source alternatives, many Microsoft partners are looking to sell Linux-based solutions.”
- “We chose Lotus Foundations from IBM for its high availability and ease of maintainability,” said Jim Abraham, Managing Partner of Line Fifty Software. “Lotus Foundations also helps companies keep costs down due to its reliability, all-in-one office software suite, ease of use and maintainability. Additionally, since our solution is Windows-based, the ability to run Windows applications alongside Linux applications makes the Foundations environment a complete SMB solution.”
Strong stuff and I didn’t even mention the pat on the head for Linux and the implicit sideswipe of the Microsoft limo. My thoughts about this type of “story” are that [a] someone needs to run the IBM news releases by an individual with some experience in the real world of partner management. Partners invest time and money and risk losing the support of the mothership for getting out of bounds. I think some of the folks quoted in this news story may have an opportunity for a heart to heart chat with an interested party; [b] the Lotus Appliance does not work like bread crumbs thrown to pigeons. “Flock” does not strike the right chord with me. The number 200 sounds odd to me when I consider the companies cited in the “news story”. I wonder if digging around would reveal that these are firms who represent a range of different products and signed up to resell the Lotus Appliance because fees were waived or some inducement offered; and [c] what the heck is a puff piece like this one doing on a TV station’s Web site. Where’s the New York Times’s coverage? Where’s the story in Network World? Not too many Microsoft resellers will be checking out the KLKN Web site for Lotus news I surmise. I wonder if I can find the original release by searching the IBM Web site? Doing that now. Answer: Nope. IBM is as adept at Lotus Appliance PR as it is at search it seems.
Stephen Arnold,
A Consulting Firm Borrows from Kubler-Ross
May 22, 2009
Elisabeth Kubler-Ross’s widely read study On Death and Dying, published in the 1960s, introduced those struggling with the loss of a loved one to the stages of grief. I read her book when my friend Alan Boise, who worked at the National Institutes of Health, faced a battle with a rare form of leukemia. I found the information helpful, and since that time I have been reluctant to appropriate a useful representation of human behavior for frivolous purposes. I am an addled goose, but I keep certain activities in a private corner of my swimming hole. If you want to reacquaint yourself with Kubler-Ross’s work, you can order your copy here.
If you want to see how an azure chip consulting firm handles the idea of “stages”, you will want to read Matt Asay’s “Forrester’s Five Phases of Open Source Success” here. Mr. Asay reported that this consultancy converts the stages of grief into the stages of open source success. I anticipate that most people will find the approach amusing, perhaps instructive. After all, open source must move from denial that community supported software is of much value to acceptance of products such as Lucene from Lucid Imagination.
Open source has an important contribution to certain information technology challenges. I am pleased with open source. I am not so comfortable with the appropriation and inversion of the Kubler-Ross metaphor. In fact, the metaphor makes me uncomfortable, not with open source software, but with the associations the “stages” evoke. The addled goose’s opinion is that the azure chip consulting firm’s reach has exceeded its grasp.
Stephen Arnold, May 22, 2009
Open Source Surprise
May 12, 2009
CNet here reported that up to 24 percent of software purchases are open source. Are the data 99.9 percent accurate? No. Are the data instructive? Yep. The reason is that a decade ago, open source would have been almost unmeasurable. Matt Asay wrote:
Today, if you look at the most successful open-source businesses, none of them pass the ideologues’ unrealistic and counterproductive “100-percent freedom” litmus test. Not a single one of them.
What is emerging is a new business model, and one that cannot be ignored.
Stephen Arnold, May 12, 2009


