Lucid Imagination Conference Full of Insights
May 21, 2012
The powerful advantages of open source search solutions are still new to many who would embrace them. That’s one of the conclusions to be drawn from O’Reilly Radar’s “Lucene Conference Touches Many Areas of Growth in Search.” Presenters at Lucid Imagination’s recent conference, Lucene Revolution, detailed those advantages as well as new developments in the field.
Sign-up stats indicate that many of the attendees were new to Lucene and Solr, with about a third having used them for less than one year. It sounds like there’s a lot of room for the technologies to grow.
There is more information in the article than I can go into here, so you might want to check it out for yourself. Writer and conference attendee Andy Oram shares some of the highlights regarding big data:
“Mark Davis did a fast-pace presentation on the use of Solr along with Hadoop, and systems hosting GPUs at the information processing firm Kitenga. A RESTful API from LucidWorks Enterprise gives Solr access to Hadoop to run jobs. Glenn Engstrand described how Zoosk, The “Romantic Social Network,” keeps slow operations on the update side of the operation so that searches can be simple and fast. As in many applications, Solr at Zoosk pulls information from MySQL. Other tools they use include the High-speed ObjectWeb Logger (HOWL) to log transactions and RabbitMQ for auto-acknowledge messages. HOWL is also useful for warming Solr’s cache with recent searches, because certain operations flush the cache.”
We welcome another big data development revealed at the conference: LucidWorks Big Data platform, now in Beta, will allow users to manage Solr schemas without having to configure and certify the local tools. Now, there’s a time saver. Lucid vows that the platform can handle any “volume, variety, and velocity” of content.
Auto-completion warranted its own presentation, wherein Sudarshan Gaikaiwari focused on geospatially informed results. Geohashes are used to retrieve geospatial info. They encode cells of a grid laid over the world as short strings, where shorter strings represent larger regions and adding characters narrows the search to a smaller area. Using these, applications can suggest auto-completed terms local to the user, like a nearby museum or restaurant.
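To make the prefix idea concrete, here is a minimal sketch of the standard public geohash encoding in Python. It is not the code shown at the conference; the function name, the sample places, and the `suggest_nearby` helper are our own hypothetical illustrations. The encoder halves the longitude and latitude ranges in turn and emits one base-32 character for every five bits, so a longer hash always begins with the shorter hash for the same point.

```python
# Standard geohash base-32 alphabet (digits plus letters, omitting a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=9):
    """Encode a latitude/longitude pair as a geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0          # bits collected for the current character
    bit_count = 0
    use_lon = True    # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


def suggest_nearby(typed, user_lat, user_lon, places, cell_chars=5):
    """Hypothetical helper: names that start with the typed text and
    fall in the same geohash cell as the user."""
    cell = geohash_encode(user_lat, user_lon, cell_chars)
    return [
        name
        for name, lat, lon in places
        if name.lower().startswith(typed.lower())
        and geohash_encode(lat, lon, cell_chars) == cell
    ]


# Made-up sample data: (name, latitude, longitude).
PLACES = [
    ("Museum of Local History", 57.6492, 10.4075),
    ("Museum of Far Away", -33.8688, 151.2093),
    ("Mussel Shack", 57.6490, 10.4070),
]

print(suggest_nearby("mus", 57.64911, 10.40744, PLACES))
```

One caveat with this shortcut: two points that sit close together but on opposite sides of a cell boundary will not share a prefix, so a real service would probably check the neighboring cells as well.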
Oram notes that Apache’s Lucene is probably the most popular independent search engine. Written in Java, it can be used for nearly any full-text search application, particularly cross-platform. The engine boasts low memory requirements, fast incremental indexing, and an array of query types.
Conference sponsor Lucid Imagination is the commercial company for Lucene and its search server Solr. The company crafts robust scalable search solutions that make the most of the open source technology. Lucid prides itself on making open source search accessible and easy to learn. These search gurus recently moved to new digs in Redwood City, CA.
Cynthia Murrell, May 21, 2012
Sponsored by HighGainBlog
Talend Updates Open Studio Applications
May 19, 2012
Talend’s Open Studio platform offers more business intelligence and big data with an enhancement: master data management. The H Open describes the updates found in the most recent version in “Talend Updates Data Tools to 5.1.0.”
Based on open source Eclipse, the Open Studio environment hosts Talend’s Data Integration, Big Data, Data Quality, Master Data Management, and Enterprise Service Bus (ESB). A user-friendly GUI allows users to define processes. The write up specifies that the updates give Open Studio:
“. . .enhanced XML mapping and support for XML documents in its SOAP, JMS, File and Mom components. A new component has also been added to help manage Kerberos security. Open Studio for Data Quality has been enhanced with new ways to apply an analysis on multiple files, and the ability to drill down through business rules to see the invalid, as well as valid, records selected by the rules.
“ESB and Open Studio for ESB appear to be the most revised of the products, with the release notes documenting improvements to the REST and SOAP services, an improved route builder, and improvements to the runtime system . . . . Open Studio for Master Data Management has seen enhancements in the development environment, with searching and filtering available as ways to view an entity, and in the web user interface with improvements in visual cues, easier image storage and resizable sliding panels.”
Talend ESB and Big Data are under the Apache 2.0 License. Open Studio for ESB, Data Integration, Data Quality, and MDM are under the GPLv2.
Talend is a leading open source vendor, providing middleware for both data management and application integration. The company was already a leader in open source data management when its 2010 acquisition of Sopera boosted its standing in the open source middleware market. The company takes pride in providing powerful and flexible open solutions for all sorts of organizations, great and small.
Cynthia Murrell, May 19, 2012
Sponsored by PolySpot
Flax Offers Alternative to dtSearch
May 18, 2012
An apparent alternative to dtSearch, with modifications for scale, positional information, and other factors, has been developed.
Flax, self-proclaimed Open Source Search Specialists, has developed an alternative to the closed source text retrieval software. According to a blog post on the company site, “An Open Source Replacement for the dtSearch Closed Source Search Engine,” the new search engine was built for a client who needed the query language to match the old system. The catch? The source code is not available for public download.
The article describes the development:
“First, we developed a new Lucene Analyzer that speaks the same syntax as dtSearch, allowing us to index text input. On the search side we have a Lucene QueryParser that shares this syntax. To make it easier to use we’ve wrapped the whole lot in a modified Solr server. As we needed some features of very recent Lucene code, our modifications are based on a patch to Lucene trunk (and so the source code isn’t for the faint hearted – if you need it let us know, but we’re not currently providing it for download).”
A downloadable WAR file is available from Flax’s downloads area.
Flax states that this project shows it is possible, and even faster and more economically sound, to move to an open source alternative, regardless of investment in current search. We agree and are interested to see further developments from Flax and others who are willing to tackle the project.
Andrea Hayden, May 18, 2012
Sponsored by PolySpot
Attivio Offers a Spin on the Big Data Bandwagon
May 18, 2012
Attivio, a software company specializing in enterprise search solutions and unified information access, recently published an informative blog post called “How We Handle Open Source.”
According to the post, there are many open source products, particularly those put out by the Java community, that are responsible for some of today’s hottest technology trends. Unfortunately, there are also a lot of bugs and functional gaps. So it is necessary that companies have a system in place to handle these issues.
When Attivio encounters an issue with open source code, the company follows a series of steps to check for possible solutions and then:
“Lastly, we strive to create a formal ticket, patch and test for contribution back to the open source community. In all fairness, this is our weakest part of the process, but one we are striving to improve. Many of our changes are small in nature and fix either esoteric edge cases or general code cleanliness like the thread example I mentioned above, but most changes are still useful for the wider community.”
Every company is different, but what they all have in common is the need for a troubleshooting plan to address the issues that can come from using open source products. Attivio’s solution is an excellent example of this.
Jasmine Ashton, May 18, 2012
Sponsored by PolySpot
Microsoft Joined by AOL in Outercurve Support
May 16, 2012
AOL is joining Microsoft in its open source ways. ZDNet reports, “AOL Joins Microsoft as Sponsor of Outercurve Foundation.” Outercurve facilitates the exchange of code between the open source community and corporations. It has been supported by Microsoft since that company launched it in 2009 (under the original name CodePlex Foundation), and has now attracted the backing of AOL. Apparently, the move was easier than setting up their own foundation. Mary Jo Foley writes:
“According to a blog post, AOL is becoming a sponsor so it can transfer its internal open-source projects to the Foundation, ‘which eliminates the complexity of creating, funding and managing a separate foundation,’ in the words of Erynn Petersen, AOL SVP of Paid Services Engineering. ‘Outercurve sponsorship also will make it simpler for our partners to contribute to AOL-sponsored open source projects,’ Petersen added.”
Foley asked whether the choice had anything to do with last month’s sale of hundreds of AOL patents to Microsoft, and was assured there’s no connection. (By the way, Microsoft is reselling a number of these patents to Facebook. Interesting.)
A 501(c)(6) non-profit, the Outercurve Foundation is resolved to complement other open source foundations rather than compete with them. It provides organizations with details like software IP management and project development governance in order to encourage collaboration and spur faster results.
Cynthia Murrell, May 16, 2012
Sponsored by PolySpot
Open Source Search: Momentum Building
May 10, 2012
It has happened.
The self-appointed experts have discovered open source search, reveals CIO in “Wide-Open Search.” With exponentially growing amounts of data to contend with, organizations from Twitter and Facebook to the Library of Congress are turning to open source solutions. Such groups, Stacy Collett writes:
“. . . venture into the seemingly untamed world of open-source search applications, not just for the cost savings, but also for the ability to customize and modify applications quickly. Plus, open source has an active community that can help solve related problems.”
All true. Collett points to Lucene, supported commercially by Lucid Imagination, as her open source example, which seems like a good choice to us. She emphasizes that Lucene is a formidable application built for enterprises with sophisticated search needs. Smaller-scale tools based on Lucene are also available, like Elasticsearch.
Lucid Imagination provides an enterprise open source search solution as well as consulting and engineering services. Lucene Solr leads the field in independent enterprise search platforms, with 200,000 to 300,000 downloads per month. As other search application vendors get snapped up by the giant companies, Lucid relies on adaptability. The write up informs us:
“Lucid Imagination plans to move into the business intelligence and data warehousing spaces and enable integration with big-data technologies, [Lucid CEO Paul] Doscher says. ‘If you put traditional data warehouse or business intelligence-type applications on top of Hadoop, in some instances, it’s almost like trying to take this manhole cover of opportunity and shove it through a garden hose,’ he says.”
Nice metaphor.
We’re okay with Lucid, but the mid-tier consultants. . . . Well, mid-tier exists for a reason. You can get profiles of key open source search vendors for free by clicking on the Profiles link at our sister information service, OpenSearchNews.com.
Cynthia Murrell, May 10, 2012
Sponsored by HighGainBlog
Open Web Analytics and Ikanow as a Key Resource
May 10, 2012
Innovative analytics companies are hot on the market right now. “Enterprise Applications: 10 Hot Data, Web Analytics Companies That You Should Know” raises some interesting points, but we found the rundown of important companies incomplete.
The article tells us the following about Open Web Analytics:
Open Web Analytics is an open-source Web analytics software written in PHP and that uses a MySQL database, which makes it compatible to run with an AMP solution stack on various Web servers. OWA is comparable to Google Analytics, though OWA is a server software one can install and run, while Google Analytics is a software service offered by Google. OWA supports tracking with WordPress and MediaWiki, two popular Web site frameworks.
We think that chatter about analytics is interesting. However, analytics is often tossed out as a buzzword and not precisely defined. The reality of analytics is that a user must know what data are in the set or sample. The validity of the data or the probability of accuracy is important. Finally, the specific numerical recipe followed to generate an output must be selected from many available recipes. Each numerical recipe has limitations and specific utility. Analytics, therefore, is a discipline, not something that outputs a chart or fancy graph.
We have found that Ikanow is a company in the open source analytics sector which warrants attention. You can get more information about Ikanow at www.ikanow.com. Dig deeper than a rundown of companies which skims the surface of an important, dynamic market space.
Stephen E Arnold, May 10, 2012
Sponsored by HighGainBlog
Free DataStax Open Source Search Profile Now Available
May 8, 2012
OpenSearchNews.com has posted another open source search engine profile. Like the two previous profiles—Basho Riak and Doculibre Constellio—the discussion of the search system is available for one week. Navigate to the OpenSearchNews.com profile page and click the link for the DataStax report.
OpenSearchNews.com is a service of ArnoldIT, which provides information and analysis of selected open source search solutions. Many proprietary vendors of search are reluctant to admit that open source search is becoming an increasingly disruptive force in the enterprise market.
According to Stephen E Arnold, “This series of profiles is designed to make it easy for an individual to get basic information about open source search. We want to provide these profiles at no cost because access to the information is more important than monetary considerations. Self appointed experts have not stepped forward to cover this important market sector. The team at Beyond Search has taken an important step forward.”
For more information about ArnoldIT, the publisher of these profiles, navigate to www.arnoldit.com and www.augmentext.com.
Don C. Anderson, May 8, 2012
Sponsored by ArnoldIT
Is Android Slipping From Google’s Grasp?
May 8, 2012
It is a tale of fragmentation and control: BetaNews declares, “Google Has Lost Control of Android.” The extensive article examines the ways in which writer Joe Wilcox says open source distribution of the Android platform is hurting the company. He asserts:
“Forrester Research predicts that proprietary Android will surpass the Google Android ecosystem by 2015. Stated differently, Google’s open-source mobile platform risks fracturing into multiple fatally fragmented Android ecosystems. Not one but many. There is little time for Google to demonstrate decisive leadership that can keep the ecosystem largely intact. . . .
“Google’s problem: Two partners are overwhelming successful, while the majority limp along, and one hurts the entire Android ecosystem. Apple is now the least of concerns. Putting Amazon and Samsung in their place is more important.”
Why are Amazon and Samsung such thorns in Google’s side? For its part, Amazon has customized its Android platform to direct users into its retail world, not Google’s. For example, it delivers its own products and services over Google’s even if, say, the address for Google Play is typed directly into the Kindle’s browser. Sneaky.
Samsung has hijacked the Android environment on its Galaxy Tab and on most of its smartphones, controlling the user experience. The Samsung skin is so thick, Wilcox says, that users lucky enough to get an upgrade to Ice Cream Sandwich won’t be able to see much of a difference.
See the article for more in-depth discussion of Android’s fragmentation, the need for Google to exert control, and Wilcox’s suggestions for the company. Interesting reading (cute pictures, too).
Cynthia Murrell, May 8, 2012
Protege 4.2 Now Available
May 5, 2012
Version 4.2 (beta) of Protégé from Stanford University is now available here. The open source application serves as an ontology editor and knowledge-base framework. The product description states:
“The Protégé platform supports two main ways of modeling ontologies via the Protégé-Frames and Protégé-OWL editors. Protégé ontologies can be exported into a variety of formats including RDF(S), OWL, and XML Schema.
“Protégé is based on Java, is extensible, and provides a plug-and-play environment that makes it a flexible base for rapid prototyping and application development.
“Protégé is supported by a strong community of developers and academic, government and corporate users, who are using Protégé for knowledge solutions in areas as diverse as biomedicine, intelligence gathering, and corporate modeling.”
The editor can be customized to provide domain-friendly support for creating knowledge models and entering data. The National Library of Medicine supports Protégé’s biomedical ontologies and knowledge bases, which serve as national resources. The editor is a core component of The National Center for Biomedical Ontology.
Do taxonomy vendors face the open source ogre?
Cynthia Murrell, May 5, 2012
Sponsored by Ikanow


