Lucene Revolution Call for Papers
March 3, 2011
In “Lucid Imagination Searching for Lucene Revolution Presenters,” CMSWire announced the search for conference speakers:
“Lucid Imagination, the commercial provider of services and support for Solr/Lucene, has opened its search for presenters for the second Lucene Revolution conference, scheduled for May 25-26 in San Francisco. The annual Lucene Revolution, the largest U.S. conference focused on open source enterprise search, brings together developers and industry thought leaders to discuss the use of Solr/Lucene.”
See information about Lucid Imagination here.
Cynthia Murrell, March 3, 2011
Freebie
ZL Systems and TREC
August 13, 2010
I don’t write anything about TREC, the text retrieval conference “managed” by NIST (US Department of Commerce’s National Institute of Standards and Technology). The participants in the “tracks”, as I understand the rules, may not use the data for Madison Avenue-style cartwheels and reality distortion exercises.
The TREC work is focused on what I characterize as “interesting academic exercises.” Over the years, the commercial marketplace has moved in directions that are different from the activities for the TREC “tracks”. A TREC exercise is time consuming and expensive. The results are difficult for tire kickers to figure out. In the last three years, the commercial market has been moving in a manner different from academic analyses. You may recall my mentioning that Autonomy had 20,000 customers and that Microsoft SharePoint has tens of millions of licensees. Each license contains search technology and cultivates a fiercely competitive ecosystem to “improve” findability in SharePoint. Google is chugging along without much worry about what’s happening outside of the Googleplex unless it involves Apple, money, and lawyers. In short, research is one thing. Commercial success is quite another.
I was, therefore, interested to see “Study Finds that E-Discovery Using Enterprise-Wide Search Improves Results and Reduces Costs.” The information about this study appeared in the ZL Technologies’ blog The Modern Archivist in June 2010. You can read the story “New Scientific Paper for TREC Conference”, which was online this morning (August 10, 2010). In general, information about TREC is hard to find. Folks who post links to TREC presentations often find that the referenced document is a very short item or no longer available. However, you can download the full “scientific paper” from the TREC Web site.
The point of the ZL write up is summarized in this passage:
Using two fully-independent teams, ZL tested the increased responsiveness of the enterprise-wide approach and the results were striking: The enterprise-wide search yielded 77 custodians and 302 responsive email messages, while the custodian approach failed to identify 84% of the responsive documents.
The goose translates this to mean that there’s no shortcut when hunting for information. No big surprise to the goose, but probably a downer to those who like attention deficit disorder search systems.
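For readers who want the arithmetic behind that 84 percent, here is a rough sketch. It assumes, and the ZL write up does not confirm this, that the 302 messages found by the enterprise-wide search are the full responsive set against which the custodian approach was scored. The variable names are mine, not ZL’s.

```python
# Rough arithmetic for the ZL figures quoted above.
# ASSUMPTION: the 302 responsive messages are the full responsive set and the
# 84% miss rate is measured against that set. The write up does not say so.
responsive_total = 302
missed_fraction = 0.84

custodian_recall = 1 - missed_fraction
custodian_found = round(responsive_total * custodian_recall)
custodian_missed = responsive_total - custodian_found

print(f"Custodian approach recall: {custodian_recall:.0%}")
print(f"Messages found (approx.): {custodian_found}")
print(f"Messages missed (approx.): {custodian_missed}")
```

If the assumption holds, the custodian approach turned up roughly one responsive message in six.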
So what’s a ZL Technologies? The company says:
[It] provides cutting-edge enterprise software solutions for e-mail and files archiving for regulatory compliance, litigation support, corporate governance, and storage management. ZL’s Unified Archive, offers a single unified platform to provide all the above capabilities, while maintaining a single copy and a unified policy across the enterprise. With a proven track record and enterprise clients which include top global institutions in finance and industry, ZL has emerged as the specialized provider of large-scale email archiving for eDiscovery and compliance.
Some information about TREC 2010 appears in “TREC 2010 Web Track Guidelines”. The intent is to describe one “track”, but the information provides some broader information about what’s going on for 2010. The “official” home page for TREC may be useful to some Beyond Search readers.
For more TREC information, you will have to attend the conference or contact TREC directly. The goose is now about to get his feathers ruffled about the availability of presentations that point out that search and retrieval has a long journey ahead.
Reality is often different from what the marketers present in my opinion.
Stephen E Arnold, August 12, 2010
Freebie
Lucene Revolution Conference Details
July 15, 2010
The Beyond Search team received an interesting news release from a reader in San Francisco. We think the information reveals the momentum that is building for open source search. Here’s the story as we received it:
San Mateo, Calif. – July 14, 2010 – Lucid Imagination, the commercial company for Apache Lucene and Solr open source search technologies, is pleased to announce speakers for Lucene Revolution, the first-ever conference in the US devoted to open source search. The conference will take place October 7-8, 2010 at the Hyatt Harborside, Boston, Massachusetts. Lucene Revolution is a groundbreaking event that drives broad participation in open source enterprise search, creating opportunities for developers, technologists and business leaders to explore the disruptive new benefits that open source enterprise search makes possible, in a fresh, energetic and forward thinking format.
The diverse and widespread adoption of Lucene/Solr for enterprise search applications is reflected by the broad range of speakers at the event, such as:
- Cisco Systems: Satish Gannu
- eHarmony: Joshua Tuberville
- LinkedIn: John Wang
- Sears: David Oliver
- The McClatchy Company: Martin Streicher
- The Smithsonian: Ching-Hsien Wang
- Twitter: Michael Busch
Conference speakers represent a cross-section of Lucene/Solr adoption – including new media, ecommerce, embedded search applications, content management, social media, and security and intelligence – spanning the broad spectrum of production-class enterprise search implementations, all of whom leverage the power and economics of Lucene/Solr innovation.
Other industry thought leaders participating and sharing their insights into open source enterprise search include Hadley Reynolds (Research Director, Search & Digital Marketplace Technologies, IDC) and Stephen E. Arnold (Beyond Search; Managing Partner, ArnoldIT).
Over the two days of the conference there are over 30 sessions scheduled in a variety of different formats: technical presentations, use cases, panel discussions, and Q&A sessions. In addition there will be an “un-conference” the evening of October 7, where attendees can present lightning talks and take part in hands-on community coding efforts.
Registration for Lucene Revolution is now open for the conference at: http://www.lucenerevolution.com/register. A full list of speakers, along with a complete conference agenda, is available at http://www.lucenerevolution.com/agenda.
If you are not familiar with Lucid, here’s a snapshot:
Lucid Imagination is the commercial company dedicated to Apache Lucene technology. The company provides value-added software, documentation, commercial-grade support, training, high-level consulting, and free certified distributions, for Lucene and Solr. Lucid Imagination’s goal is to serve as a central resource for the entire Lucene community and search marketplace, to make enterprise search application developers more productive. Customers include AT&T, Sears, Ford, Verizon, Elsevier, Zappos, The Motley Fool, Macy’s, Cisco, HP, The Guardian and many other household names. Lucid Imagination is a privately held venture-funded company. Investors include Granite Ventures, Walden International, In-Q-Tel and Shasta Ventures. To learn more please visit www.lucidimagination.com.
Goslings Constance Ard and Dr. Tyra Oldham will be attending. Should be useful. Certainly more timely than the plethora of SharePoint and gasping one-size-fits-all programs. Honk.
Stephen E Arnold, July 15, 2010
Sponsored post.
Real Time Search Systems, Part 1
June 21, 2010
Editor’s note: For those in the New Orleans real time search lecture and the Madrid semantic search talk, I promised to make available some of the information I discussed. Attendees are often hungry to have a take away, and I want to offer a refrigerator magnet, not the cruise ship gift shop. This post will provide a summary of the real time information services I mentioned. The group focuses on content processed from such services as Facebook, Twitter, blogs, and other geysers of digital confetti. A subsequent blog post will present the basics of my draft taxonomy of real time search. I know that most readers will kick the candy bar wrapper into the gutter. If you are one of the folks who picks up the taxonomy, a credit line would make the addled goose feel less like a down pillow and more like a Marie Antoinette pond ornament.
What’s Real Time Search?
Ah, gentle reader, real time search is marketing baloney. Life has latency. You call me on the phone and days, maybe weeks go by, and I don’t return the call. In the digital world, you get an SMS and you think it was rocketed to you by the ever vigilant telecommunications companies. Not exactly. In most cases, unless you conduct a laboratory test between mobiles on different systems, capturing the transmit time, the receiving time, and other data points such as time of day, geolocation, etc., you don’t have a clue what the latency between sending and receiving is. Isn’t it easier to assume that the message was sent instantly? When you delve into other types of information, you may discover that what you thought was real time is something quite different. The “check is in the mail” applies to digital information, index updating, query processing, system response time, and double talk from organizations too cheap or too disorganized to do much of anything quickly. Thus, real time is a slippery fish.
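The laboratory test described above boils down to logging when each message was sent and when it arrived, then doing the subtraction. Here is a minimal sketch in Python. The timestamps are invented for illustration, not measurements from any carrier or search system, and the function name is hypothetical.

```python
from datetime import datetime
from statistics import mean, median


def latency_seconds(sent_iso: str, received_iso: str) -> float:
    """Return the delay in seconds between a send and a receive timestamp."""
    sent = datetime.fromisoformat(sent_iso)
    received = datetime.fromisoformat(received_iso)
    return (received - sent).total_seconds()


# HYPOTHETICAL sample data: (time sent, time received) for three test messages.
samples = [
    ("2010-06-21T09:00:00", "2010-06-21T09:00:04"),
    ("2010-06-21T12:30:00", "2010-06-21T12:30:11"),
    ("2010-06-21T18:15:00", "2010-06-21T18:17:30"),
]

latencies = [latency_seconds(sent, received) for sent, received in samples]

print("Per-message latency (s):", latencies)
print("Median latency (s):", median(latencies))
print("Mean latency (s):", mean(latencies))
print("Worst case (s):", max(latencies))
```

One slow message drags the mean far from the median, which is one reason a single “real time” number from a vendor tells you little.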
Real Time Search Systems
Why do I use the phrase “real time”? I don’t have a better phrase at hand. Vendors yap about real time and a very, very few explain exactly what their use of the phrase means. One outfit that deserves a pat on the head is Exalead. The company explains that in an organization, most information is available to an authorized user no more than 15 minutes after the Exalead system becomes aware of the data. That’s fast, and it beats the gym shorts of many other vendors. I would love to pinpoint the turtles, but my legal eagle cautions me that this type of sportiness will get me a yellow card. “Figure it out for yourself” is the sad consequence.
Here’s the list of the systems I identified in my lectures. I don’t work for any of these outfits, and I use different services depending on my specific information needs. You are, therefore, invited to run sample queries on these services or turn to one of the “real” journalists for their take. If you have spare cash and found yourself in the lower quartile of your math class, you may find that an azure chip consultant is just what you need to make it in the crazy world of online information.
- Collecta – www.collecta.com – Venture funded with another infusion of $5 million
- Crowdeye – www.crowdeye.com – Former Microsoft employees’ start up
- DailyRT – http://dailyrt.com
- Ice Rocket – www.icerocket.com – Funded by Mark Cuban
- ITPints – www.itpints.com – Single entrepreneur
- Leapfish – www.leapfish.com – Metasearch, controversy over alleged link fraud, backed by DotNext; integrates Topsy.com
- Newslookup – www.newslookup.com – Founded 2000; regions and categories; open source engine, DataparkSearch
- OneNewsPage – www.onenewspage.com – Live access to top news and analysis
- Red Tram – www.redtram.com – Russian service, broad coverage in nine languages, including Chinese. Based in Cyprus
- Scoopler – www.scoopler.com – Y Combinator funded
- Topsy – www.topsy.com – Ignition Partners and other VC firms
- Twazzup – www.twazzup.com – Self funded start up. Don’t confuse this with Exalead’s Tweepz.com
- Tweetmeme – www.tweetmeme.com – Part of Fav.or.it in the UK via angel funding
- Yauba – www.yauba.com – IIT and UC Berkeley, privacy safe
In my lectures I made four points about these types of real-time search services.
First, each of these services did at the time of my talks deliver more useful and comprehensive results than the “real time search” services from the Big Gals in the Web search game; namely, Google, Microsoft Bing, and Yahoo. Yahoo, I pointed out, doesn’t do real time search itself. Yahoo has a deal with the OneRiot.com outfit. The service is useful and I suppose I could stick it in the list above, but I am just cutting and pasting from the PowerPoint decks I used as crutches and dogs in my lecture.
First U.S. Open Source Search Conference
May 17, 2010
The first-ever conference focused on addressing the business and development aspects of open source search will take place October 7-8, 2010 at the Hyatt Harborside in Boston.
Dubbed Lucene Revolution by its sponsor, Lucid Imagination, the commercial company dedicated to Apache Lucene technology, this inaugural event promises a full, forward-thinking agenda, creating opportunities for developers, technologists and business leaders to explore the benefits that open source enterprise search makes possible.
In addition to in-depth training provided by Lucid Imagination professionals, there will be two days of content rich talks and presentations by Lucene and Solr open source experts. Working on the program will be Stephen E. Arnold, author and consultant.
Those interested in learning more about the conference and submitting a proposal for a talk can navigate to http://lucenerevolution.com/. The deadline for submissions is June 23, 2010. Individuals are encouraged to submit proposals for papers and talks that focus on categories including enterprise case studies, cloud-based deployment of Lucene/Solr, large-scale search, and data integration.
The Lucene Revolution conference comes just after the success of the sold-out Apache Lucene EuroCon 2010 in Prague, also sponsored by Lucid Imagination and the single largest gathering of open source search developers to date.
Melody K. Smith, May 16, 2010
Note: ArnoldIT.com paid me to write this.
Lucene Solr Developer Event in Prague Arrives
May 4, 2010
Lucid Imagination is hosting a developer event called Apache Lucene EuroCon in Prague from May 18-21. Insiders tell me they have attracted over 120 attendees so far, a real feat in these travel-constrained times. Some reasons might be rising interest in and adoption of Lucene/Solr, the vibrant European developer community, and the gap left by the cancellation of the 2010 Apache EuroCon.
According to the conference Web site:
Apache Lucene EuroCon 2010 is the first dedicated Lucene and Solr User Conference in Europe. This conference provides professional training on Lucene and Solr as well as a unique opportunity to learn from the search experts in two educational tracks.
The event will include 2-day Lucene and Solr Boot camp trainings, user case studies, and technical deep dives, along with keynotes from Eric Gries, Lucid’s CEO, Stephen Dunn from the Guardian, and Zack Urlocker, previously EVP at MySQL.
Stephen E Arnold, May 4, 2010
No one paid me to write this. Maybe someday!
Mr. Google Goes to Washington
December 14, 2009
On Monday, December 14, 2009, I will be delivering a 10 minute talk about Google and its impact on the US government. Now I can’t cover too much in 10 minutes, but I want to hit three of the points I will be making. If you are in DC and want to hit the conference, you can get more information at http://government25.com/.
As an introduction, I want to point out that since Mr. Brin made his famous trip in sneakers and a black T shirt several years ago, the Google has leveled up. The Google’s presence sports quite a few folks who can get the Google story across. The top brass at Google also snag those nifty White House luggage tags and cuff links. So, Mr. Google has gone to Washington, and the Googlers are learning to play the Beltway game.
Three points:
First, most people—including Googlers like my pal Cyrus—don’t have a good sense of what Google’s reality is. The problem is like the one a fish has in a fish bowl. The larger world is mostly a blur. Details are tough to discern. The result is that Google can position itself as a Web search company for the masses or as a vital tool for defense mapping. It is quite difficult to locate a person who can express the “is-ness” of Google. The reason? Google’s top 200 wizards want to manage perception. Anyway, detailed explanations require a person to have a Googler’s intellect. Most of the people with that brainpower already work at Google. Therefore, why try to teach the average mobile device user the “is-ness” of Google?
Second, since 2006, the Google has been accelerating its push into various business sectors. You know about telecommunications. You know about content, mostly because the global publishing community has been asleep at the switch, allowing the Google shinkansen to blast on through without stopping. There are five or six other sectors largely unaware that Googzilla is on its way to their fertile fields. This means 2010 will witness more Google disruptions. So, fasten your seatbelt. If you work in one of the somnambulant sectors, get your résumé in order. Wal*Mart may be hiring.
Third, Google’s Lego approach to products and services means that Google can out-innovate most companies. Sure, there are a couple of outfits that have an edge on the Google. Example: Facebook. But in general, Google can move quickly, which means that competitors, customers, and partners alike are almost always off balance. The lack of balance means that the Google can do pretty much whatever it wants. Once folks react, the Google has moved forward. The opportunities just keep on coming while competitors waste time, resources, and energy trying to deal with where Google has been.
Bottom line: Google is going to have a major impact on the US government starting with the fiscal year beginning on October 1, 2010. In a word: unstoppable.
Stephen E. Arnold, December 14, 2009
I wish to disclose to the USGS that Google is like the San Andreas fault. The Google runs through seven business sectors, not California. Oh, I was not paid by the sponsor of this conference to give a talk. We did a horse trade or a goose trade. I suppose this means I was compensated to think up this analysis, give a talk, and write this self-serving, tongue-in-cheek article. So be it. I am a shameless shill for myself.
Government 2.5: Traditional Information Technology Evolves
December 7, 2009
I have just returned from my endnote at the International Online Conference in London. On December 14, 2009, I will be taking one of the 10 trends for 2010 from my London UK talk and expanding on the idea of dataspaces, not databases. Most governmental entities are anchored in traditional database technology. Although state of the art in the 1970s, the RDBMS framework is ill suited for the rigors of Government 2.5 information.
I will be attending the CoolBlue Government 2.5 conference in Washington, DC, on December 14 and 15, 2009. You can get full details about the conference from the program’s Web site.
You can get a glimpse of what’s in my talk. Just search this Web log for the term “dataspace”, and you will get some background information. The dataspace technology is one of Google’s crown jewels, and it is a core capability little known outside of a small circle of wizards. You can see a tiny fragment of the dataspace technology in action if you navigate to the Google Wave information page and do some exploration.
My remarks created quite a stir in London on Thursday, December 3, 2009, and I anticipate a similar reaction in Washington on December 14, 2009. Googlers are largely unaware of the dataspace technology, how it embraces the Google programmable search engine, and the company’s push to become the Semantic Web.
I will be linking these technologies to likely government use cases. If you want to talk after the event, just write me at seaky2000 at yahoo dot com. I will make time to visit with Government 2.5 attendees.
Stephen Arnold, December 7, 2009
Oyez, oyez, I want to alert the mayor of Washington, DC, that I was not paid to write this blatant self promotion or mention the CoolBlue conference. I think the conference’s PR manager will buy me a Diet Pepsi. I have my Web feet crossed.
London Online: The Missing Trends
December 6, 2009
The endnote at the International Online Conference succeeded in getting insightful comments from the panelists and eliciting probing questions from the audience. The downside was that the 90 minute session covered four of the 10 trends advertised in the program. The four trends discussed in the endnote were rising Google pressure, more use of XML, a surge in rich media for core information exchange, and more security safety nets with increased user surveillance likely.
In response to several emails from attendees, here are the missing six trends:
Trend 5: Libraries will be under increasing budget pressure. As a result, interest in lower-cost, cloud-based solutions will rise sharply in 2010. One consequence will be more financial woes for library vendors, including commercial database producers.
Trend 6: More demands for timely data. Although not real-time indexing and content delivery, the newer services will strive to reduce latency (staleness) of information available to users in an organization.
Trend 7: Mobile search will become more important. The impact on the length of certain types of textual information will be significant. Those without fast network connections will be unable to access the rich media that will become a larger percentage of the information on offer.
Trend 8: Even if the economic climate improves in 2010, there will be increasing financial pressure on information, search, and content processing companies. Content management and enterprise search vendors will be particularly vulnerable. Neither CMS nor search can “explain” precisely their benefits, so marketing, not technical excellence, will mean the difference between survival and a buy out or extinction.
Trend 9: Open source will gain traction. Traditional vendors will have to deal with the financial and technical payoffs open source offers. In some organizations, open source will become an acceptable alternative to certain software systems. At the same time, open source vendors will monetize their services. Confusion and contention will increase.
Trend 10: Regulation will become more oppressive. In 2010, the Wild West of the Internet will be brought under the control of the authorities.
Have a trend to add? Use the comments feature of this Web log.
Stephen Arnold, December 6, 2009
Yep, I was paid to be at the Incisive show by Incisive. Nope, I was not paid to write my view of the trends in 2010. Deal with it.
MarkLogic and Its XML Briefing Draw Crowds at London Online
December 4, 2009
Usually I ignore the exhibit areas at trade shows. I don’t know anyone any longer, and the average age of most of the people in the booths is about one third of my 65 years. I did make a sweep through the Incisive International Online Show but I had my progress impeded yesterday. The reason was that the MarkLogic briefings given every hour or so created a mini-traffic jam.
Overflow crowds participated in the MarkLogic technical briefings at the International Online Show, December 1 to 3, 2009, in London, UK.
The briefings drew crowds that overflowed the space allocated for attendees. I asked one of the XML wizards, “What’s with the big crowd?” The MarkLogic wizard replied, “Our MarkLogic server briefing is selling like cold drinks at a football match.” MarkLogic knows its XML and its metaphors. The interest in XML MarkLogic style makes clear that where there is technical magnetism, there is a crowd.
Stephen Arnold, December 4, 2009
I want to disclose to the Food & Drug Administration that I was not paid by MarkLogic to write this article. I was not able to get a booth giveaway when I stopped to ask about the reason for the interest in the XML server lectures. I have to find a way to get some cash for my photographic expertise.