Tweets with Pickles: DataSift and Its Real Time Recipe
September 25, 2010
We have used Tweetmeme.com to see what Twitter users are doing right now. The buzzword “real time” has usurped “right now,” but that’s the magic of folks born between 1968 and 1978.
DataSift combines some nifty plumbing with an original scripting language for filtering 800 tweets a second. The system can ingest and filter other types of content, but as a Twitter partner, DataSift is in the Twitterspace at the moment.
Listio describes the service this way:
DataSift gives developers the ability to leverage cloud computing to build very precise streams of data from the millions and millions of tweets sent everyday. Tune tweets through a graphical interface or through its bespoke programming language. Streams consumable through our API and real-time HTTP. Comment upon and rank streams created by the community. Extend one or more existing streams to create super streams.
The idea is that a user will be able to create a filter that plucks content, patterns such as Social Security Numbers, and metadata such as the user’s handle and geographic data. With these items, the system generates a tweet stream that matches the parameters of the filter. The language is called the “Filtered Stream Definition Language,” and you can see an example of its lingo below:
RULE “33e3891a3aebad56f962bb5e7ae4dc94” AND twitter.user.followers_count > 1000
The sample appears to reference a stored rule by its identifier and then narrow the stream to users with more than 1,000 followers. A full explanation of the syntax appears in the story “FSDL”.
You can find an example on the DataSift blog, which is more accessible than the videos and third party write-ups about a service that is still mostly under wraps.
The wordsmiths never rest. Since I learned about DataSift, the service has morphed into “cloud event processing.” As a phrase for Google indexing, this one is top notch. As a description, however, it obfuscates the filter, storage, and analysis aspects of DataSift, so I don’t really like “cloud event processing” or the acronym CEP. Once again, I am in the minority.
The system’s storage component is called “Pickles.” The filters can cope with irrelevant hash tags and deal with such Twitter variables as name, language, location, profiles, and followers, among others. There are geospatial tricks as well: one can specify a radius around a location or string together multiple locations and get tweets from people close to bankrupt Blockbuster stores in Los Angeles.
The system is what I call a next generation content processing service. Perched in the cloud, DataSift deals with the content flowing through the system. To build an archive, the filtered outputs have to be written to a storage service like Pickles. Once stored, clever users can slice and dice the data to squeeze gems from the tweet stream.
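The recipe in that paragraph is easy to sketch. Here is a minimal example of the pattern: pull a filtered stream over real-time HTTP and append the output to local storage for later slicing and dicing. The filter string is DataSift’s published sample from above; the endpoint URL, the parameter name, and the one-JSON-object-per-line response shape are my assumptions for illustration, not DataSift’s documented API.

    import json
    import requests  # third-party HTTP library

    # DataSift's published sample rule; everything else here is hypothetical.
    FILTER_RULE = 'RULE "33e3891a3aebad56f962bb5e7ae4dc94" AND twitter.user.followers_count > 1000'
    STREAM_URL = "https://stream.example.com/filtered"  # made-up endpoint

    def archive_stream(url, rule, outfile):
        # Consume the real-time HTTP stream and write each tweet to storage.
        resp = requests.get(url, params={"filter": rule}, stream=True, timeout=30)
        resp.raise_for_status()
        with open(outfile, "a", encoding="utf-8") as fh:
            for line in resp.iter_lines():
                if not line:
                    continue  # skip keep-alive blank lines
                tweet = json.loads(line)
                fh.write(json.dumps(tweet) + "\n")  # one JSON record per line

    archive_stream(STREAM_URL, FILTER_RULE, "pickles_archive.jsonl")

Once the records are on disk, the slicing and dicing can happen with whatever analytics tooling the clever user prefers.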
The service seems on track to become available in October or November 2010. A graphical interface is on tap, a step that most next generation content processing systems have to take. No one wants to deal with an end user who sets up his own outputs and makes fine decisions based on a statistically challenged view of his or her handiwork.
For more information point your browser at www.datasift.net.
Stephen E Arnold, September 25, 2010
Search Industry Spot Changing: Risks and Rewards
September 20, 2010
I want to pick up a theme that has not been discussed from our angle here in Harrod’s Creek. Marketers can change the language in news releases, on company blogs, and in PowerPoint pitches with a few keystrokes. For many companies, this is the preferred way to shift from pitching a one-size-fits-all search solution described as a platform or framework to pitching specific products. I don’t want to identify any specific companies, but you will be able to recognize them as these firms load up on Google AdWords, do pay-to-play presentations at traditional conferences, and pump out information about their new products. To see how this works, just turn off Google Instant and run the query “enterprise search,” “customer support,” or “business intelligence.” You can get some interesting clues from this exercise.
Source: http://jason-thomas.tumblr.com/
Enterprise search, as a discipline, is now undergoing the type of transformation that hit suppliers to the US auto industry last year. There is consolidation, outright failure, and downsizing for survival. The auto industry needs suppliers to make cars. But when people don’t buy the US auto makers’ products, the dominoes fall.
What are the options available to a company with a brand based on the notion of “enterprise search” and wild generalizations such as “all your information at your fingertips”? As it turns out, the options are essentially those available to the suppliers to the US auto industry:
- The company can close its doors. A good example is Convera.
- The search vendor can sell out, ideally at a very high price. A good example is Fast Search & Transfer SA.
- The search vendor can focus on a specific solution; for example, indexing FAQs and other information for customer support. A good example is Open Text.
- The vendor can dissolve back into an organization and emerge with a new spin on the technology. An example is Google and its Google Search Appliance.
- The search vendor can just go quiet and chase work as a certified integrator to a giant outfit like Microsoft. Good examples are the firms who make “snap ins” for Microsoft SharePoint.
- The search vendor can grab a market’s catchphrase like “business intelligence” and say “me too.” An example is Palantir.
- The search vendor can morph into open source and go for a giant infusion of venture funding.
Now there is nothing wrong with any of these approaches. I have worked on some projects and used many of the tactics identified above as rivets in an analysis.
What I learned is that saying enterprise search technology is now a solution has an upside and a downside. I want to capture my thoughts about each before they slip away from me. My motivation is the acceleration in repositioning that I have noticed in the last two weeks. Search vendors are kicking into overdrive with some interesting moves, which we will document here. We are thinking about creating a separate news service to deal with some of the non-search aspects of what we think is a key point in the evolution of search, content processing, and information retrieval.
The Upside of Repositioning One-Size-Fits-All Search
Let me run down the facets of this viewpoint.
First, as I said above, repositioning is easy. No major changes have to be made except for the MBA-style, Madison Avenue explanation of what the company is doing. I see more and more focused messages. A vendor explains that a solution can deliver an on-point answer to a big problem. A good example is the search vendors who are processing blogs and other social content for “meaning” that illuminates how a product or service is perceived. This is existing technology trimmed and focused on a specific body of content, specific outputs from specific inputs, and reports that a non-specialist can understand. No big surprise that search vendors are in the repositioning game as they try to pick up the scent of revenues like my neighbor’s hunting dog.
US Government and Its New IT Directions
September 14, 2010
The U.S. Government is shedding its old clothes for new ones that fit the new technology. The Obama Administration wants agencies to be transparent and innovative and has directed the U.S. General Services Administration (GSA) to implement the “Open Government” initiative, which in turn created the Office of Citizen Services and Innovative Technologies (OCSIT).
The CRMBuyer interview “Making Change Happen Every Day: Q&A With GSA’s David McClure” reports OCSIT associate administrator David McClure’s comment that “OCSIT is rapidly becoming a leader in the use of new media, Web 2.0 technologies, and cloud computing, to proactively make government agencies and services more accessible to the public.” According to Mr. McClure, by operating at the “enterprise level,” the GSA aims to accelerate the adoption of technologies, including mobile applications, and to improve search engine capabilities in order to increase customer interaction and gain efficiencies. We concur with Mr. McClure, who believes enhancing citizen participation in government will pay dividends on technology investments. But with IBM hired to add agility, we are not sure the GSA will be the swiftest runner on the track team.
Why are there so many separate search systems? Would using one indexing system be more efficient?
Is IBM the swiftest cat in the nature preserve?
Leena Singh, September 14, 2010
Freebie
Fair Search Rankings: SEO and Its Sins Come Home to Roost
September 7, 2010
You will be reading a lot from the search engine optimization crowd in the coming weeks. SEO means getting a site on the first page of Google results no matter what. The “no matter what” part means tricks, which Web indexing systems try to de-trick. Both sides are in a symbiotic relationship. The poor goofs with Web sites that pitch a pizza parlor have zero chance to get traffic. An elaborate dance takes place among the engineers who tweak algorithms to make sure that when I enter the query “white house”, I get the “right” white house.
A 1,000-calorie-plus Krispy Kreme burger of Texas indigestion is on the menu for the Google if the Associated Press’s story is spot on. Source: http://new.wxerfm.com/blogs/post/bolson/2010/aug/06/krispy-kreme-burger/
You know the one with the President of the country where Google and Microsoft have headquarters. If you are another “white house,” you can hire some SEO azurini and trust that these trial-and-error experts can improve your ranking in Google, Bing, Ask, or another search system. But most of the SEO stuff does not work reliably, so the Web site owner gets to buy ads or pay for traffic. Quite an ecosystem.
Now the game may be officially declared the greatest thing in marketing since the invention of the sandwich boards advertising bars in Times Square, or it may be trashed as a scam on hapless Web site owners. The first hint of a potential rainy day is “Texas Opens Inquiry into Google Search Results.” I don’t quote from the AP. The goose is nervous about folks who get too eager to chase feathered fowl with legal eagles. I also am getting more and more careful about my enthusiasm for things Googley.
I don’t have much of a comment, and I have only one observation. Add one more Krispy Kreme-sized problem to the Paul Allen patent extravaganza, the Oracle dust-up, the Facebook chase, and the dissing of Google TV. I thought Google’s lousy summer was over. Is September 2010 going to trump Google’s June, July, and August 2010? It may. Quite a Labor Day in a state noted for its passion for justice, Texas style.
Stephen E Arnold, September 7, 2010
Freebie
Facebooky Curated Search
September 7, 2010
I read “Facebook Now Displaying All Liked News Articles In Search Results.”
So here is the cost equation.
Indexing every wacky thing a spider can reach? Or why not index just the stuff that members click on or flag as Facebooky? Improve relevance and get out of the 1998 mentality toward search. In my opinion, the Googley approach is way expensive. The Facebooky approach is cheaper and probably better. The Facebook method may not emulate the wild and crazy CNN approach to news, but Facebook humans are doing some heavy lifting. Facebook sucks in whatever rings the Facebook members’ chimes. The result is Facebooky Curated Search, or FCS. Add a consonant and a vowel and FCS becomes a serious stick in the eye for the Google.
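To put the cost equation in concrete terms, here is a toy sketch of the two approaches. The fields and numbers are invented for illustration; neither company publishes its pipeline.

    # Toy comparison: index everything a spider fetches versus only member-flagged items.
    crawled = [
        {"url": "a.example.com", "likes": 0},
        {"url": "b.example.com", "likes": 42},
        {"url": "c.example.com", "likes": 3},
    ]

    spider_index = crawled                                   # the Googley way: index it all
    curated_index = [d for d in crawled if d["likes"] > 0]   # the Facebooky way: index what members flag

    print(len(spider_index), len(curated_index))  # 3 versus 2: a smaller, human-vetted corpus

Every document a member ignores is a document Facebook never has to crawl, parse, index, or store.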
Toss in the Facebook targeted ad and something interesting begins to take form.
Yep, I know the azurini see Facebook as a lower form of digital life. Yep, I know the SEO English majors can’t understand why anyone would search Facebook for news. It’s not search, right? Wrong. FCS may be a harbinger (sorry, big word, gentle reader) of a larger threat to finding information.
Why search?
There may be a Facebook app for that. Then what? A horror too awful to contemplate: Google’s traditional search becomes less vital to the Facebook set. Now of what does FCS remind me? Oh, I have an idea: Facebook crushes search. You thought something else?
Stephen E Arnold, September 7, 2010
Open Source At The Smithsonian
September 3, 2010
ResourceShelf.com received several emails from people wondering about the technology used to power the Smithsonian’s popular Collection Search Center catalog. A new article titled “What Search Technology Is The Smithsonian Collection Search Center Catalog Using?” answers that question. The article says, “Bottom line. It was built using open-source technology.”
The museum needed a system capable of supporting a wide range of documents and objects. In the end, the Smithsonian selected the open source Lucene/Solr indexing software for the project, which has given the institution a flexible and scalable indexing environment. The Smithsonian has also enhanced its online display by programming in a Java environment.
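For readers who have not poked at Lucene/Solr, a query against such an index is a plain HTTP request to Solr’s standard select handler. A minimal sketch; the host, core name, and field names here are hypothetical, but the request parameters and response shape are standard Solr:

    import requests  # third-party HTTP library

    resp = requests.get(
        "http://localhost:8983/solr/collections/select",  # made-up server and core
        params={"q": "title:lincoln", "rows": 10, "wt": "json"},
    )
    resp.raise_for_status()
    for doc in resp.json()["response"]["docs"]:
        print(doc.get("id"), doc.get("title"))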
This is a major coup for the open source Lucene/Solr software. We’ll be paying close attention. As budget pressures increase for certain types of organizations, open source search solutions may get more attention. With search vendors morphing into the great marketing hyperbole dimension, Lucene/Solr may be the down-to-earth solution that fills a void. If you want to download a Lucene/Solr system, navigate to Lucid Imagination.
Stephen E Arnold, September 3, 2010
Freebie
Mango Thrives in the Warmth of Solr
September 2, 2010
The Mango library catalog helps users search library holdings for a particular book, video, or CD by an identifier such as an ISBN, ISSN, or call number, or by criteria like keyword, title, author, and location. The Mango statistics measure the end user’s interaction with the Web browser; for example, text messaging, using the folders, or searching for articles. The Florida Center for Library Automation Web site news item “Mango is now Solr-Powered!” states that the Mango catalogs will now run on the production servers and use the Solr software.
The news item introduces a new term, “Solango,” described as “the combination of the Mango discovery interface with the open source Solr indexing software published by the Apache Software Foundation.” Solango will replace the Endeca software and become a fully independent discovery platform able to ingest numerous data sources with no record limits, providing Mango with a powerful new open source indexing, faceting, and search engine.
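Faceted browsing of the sort Endeca popularized maps onto Solr’s built-in faceting parameters. Below is a minimal sketch of a faceted catalog query; the core and field names are my invention, while the facet parameters and the facet_counts response section are standard Solr:

    import requests  # third-party HTTP library

    resp = requests.get(
        "http://localhost:8983/solr/mango/select",   # made-up server and core
        params={
            "q": "dickens",                          # keyword search
            "facet": "true",                         # enable Solr faceting
            "facet.field": ["format", "location"],   # hypothetical facet fields
            "rows": 0,                               # facet counts only, no records
            "wt": "json",
        },
    )
    resp.raise_for_status()
    print(resp.json()["facet_counts"]["facet_fields"])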
Open source could put more of a squeeze on already strapped library vendors.
Leena Singh, September 2, 2010
Exclusive Interview: Charlie Hull, FLAX
September 1, 2010
Cambridge, England, has been among the leaders in the open source search sector. One of the companies based there is Flax. The firm offers the FLAX search system and a range of professional services for clients and those who wish to use FLAX. Mr. Hull will be one of the speakers at the upcoming Lucene Revolution conference, and I sought his views about open source search.
Charlie Hull, FLAX
Two years ago, Mr. Hull participated in a spirited discussion with me about the future of enterprise search. I learned about the firm’s clients, which include Cambridge University, IBuildings, and MyDeco, among others. After our “debate,” I learned that Mr. Hull had worked with the Muscat team; that search system provided access to a wide range of European content in English and other languages. Dr. Martin Porter’s Muscat system was forward looking and broke new ground, in my opinion. With the surge of interest in open source search, I found Mr. Hull’s comments quite interesting. The full text of the interview appears below:
Why are you interested in open source search?
I first became interested in search over a decade ago, while working on next-generation user interfaces for a Bayesian web search tool. Search is increasingly becoming a pervasive, ubiquitous feature – but it’s still being done so badly in many cases. I want to help change that. With open source, I firmly believe we’re seeing a truly disruptive approach to the search market, and a coming of age of some excellent technologies. I’m also pretty sure that open source search can match and even surpass commercial solutions in terms of accuracy, scalability and performance. It’s an exciting time!
What is your take on the community aspect of open source search?
On the positive side, a collaborative, community-based development method can work very well and lead to stable, secure and high-performing software with excellent support. However it all depends on the ‘shape’ of the community, and the ability of those within it to work together in a constructive way – luckily the open source search engines I’m familiar with have healthy and vibrant communities.
Commercial companies are playing what I call the “open source card.” Won’t that confuse people?
There are some companies who have added a drop of open source to their largely closed source offering – for example, they might have an open source version with far fewer features as tempting bait. I think customers are cleverer than this and will usually realize what defines ‘true’ open source – the source code is available, all of it, for free.
Those who have done their research will have realized true open source can give unmatched freedom and flexibility, and will have found companies like ourselves and Lucid Imagination who can help with development and ongoing support, to give a solid commercial backing to the open source community. They’ll also find that companies like ourselves regularly contribute code we develop back to the community.
What’s your take on the Oracle Google Java legal matter with regards to open source search?
Well, the Lucene engine is of course based on Java, but I can’t see any great risk to Lucene from this spat between Oracle and Google, which seems mainly to be about Oracle wanting a slice of Google’s Android operating system. I suspect that (as ever) the only real beneficiaries will be the lawyers…
What are the primary benefits of using open source search?
Freedom is the key one – freedom to choose how your search project is built, how it works and its future. Flexibility is important, as every project will need some amount of customization. The lack of ongoing license fees is an important economic consideration, although open source shouldn’t be seen as a ‘cheap and basic’ solution – these are solid, scalable and high performing technologies based on decades of experience. They’re mature and ready for action as well – we have implemented complete search solutions for our customers, scaling to millions of documents, in a matter of days.
When someone asks you why you don’t use a commercial search solution, what do you tell them?
The key tasks for any search solution are indexing the original data, providing search results and providing management tools. All of these will require custom development work in most cases, even with a closed source technology. So why pay license fees on top? The other thing to remember is anything could happen to the closed source technology – it could be bought up by another company, stuck on a shelf and you could be forced to ‘upgrade’ to something else, or a vital feature or supported platform could be discontinued…there’s too much risk. With open source you get the code, forever, to do what you want with. You can either develop it yourself, or engage experts like us to help.
What about integration? That’s a killer for many vendors in my experience.
Why so? Integrating search engines is what we do at Flax day-to-day – and since we’ve chosen highly flexible and adaptable open source technology, we can do this in a fraction of the time and cost. We don’t dictate to our customers how their systems will have to adapt to our search solution – we make our technology work for them. Whatever platform, programming language or framework you’re using, we can work with it.
How do people reach you?
Via our Web site at http://www.flax.co.uk – we’re based in Cambridge, England but we have customers worldwide. We’re always happy to speak to anyone with a search-related project or problem. You’ll also find me in Boston in October of course!
Thank you.
Stephen E Arnold, September 1, 2010
Freebie
Google and Its Yahoo Style Acquisitions
August 25, 2010
I don’t want to beat a dead Googzilla. But Google had a product search service that involved scanning catalogs. That went away, but I interpreted the effort as a way for the Google to learn about scanning, page fix-ups, and indexing page images to an acceptable level of accuracy. Google then rolled out Froogle, which I thought was pretty clever. Not as good as the services from Pricewatch.com or Amazon.com, but I found it helpful for certain types of product research. Froogle was deemed too frisky, so the shopping product was renamed Google Shopping. Along the path from Catalog to Shopping, the Google integrated a shopping cart, which I like more than Amazon’s “one click” approach. Poor Amazon keeps forgetting that I like the one-click approach. I get the privilege of going through a bunch of screens, clicking and entering items of data, in order to turn on one click. Then, without further ado, Amazon turns off one click for me. How thoughtful. Google figured out how not to annoy me with its Checkout. In my three Google monographs, I mentioned other features of Google’s shopping capabilities. These ranged from the bar code function to the predictive cuteness disclosed in Google’s open source documents.
I believed and still believe that Google’s in-house technology is sufficient for the Google to convert the Google Shopping service into a world beater.
Apparently I am wrong.
Google bought Like.com, which is – care to guess? – a shopping service. Point your browser thingy at “Google Buys Like.com” for the received wisdom about this deal from Fortune Magazine. Time Warner sure does understand online information, right? Here’s the key passage from the Fortune write-up:
I [Fortune’s author] think the ~$100 million+ Like.com pick-up is an even bigger indication that Google wants to be an eCommerce platform. Google won’t be a fulfillment house but they’ll happily take an affiliate cut of links they send to vendors. And, even if Google casts aside Like.com’s affiliate business, Google still stands to make a lot of money advertising against the (30%) higher CPC rates that shopping sites can pull in. From a technology standpoint, Like.com’s image recognition/comparison engine can not only power shopping, it can also help in its Image Search product, which just recently saw a significant update. Google has other experimental products like Goggles that could also benefit from the technology.
Okay.
My take is different:
- Google seems to be buying companies in the hope that the technology, customers, and staff will give Google a turbo boost in a sector in which Apple and Amazon, among others, are doing a darned good job getting my money. I don’t turn to Google’s Shopping service as frequently as I used to. Am I alone? Well, this deal seems to hint that I am not the only person ignoring Mother Google for products.
- In the eCommerce sector, Google has not mounted much of a product offering. Google hired an expert in eCommerce, but so far there is not much to show except this acquisition. I have seen zero use of the product functionality disclosed in the Guha Programmable Search Engine patent documents. Lots of weapons, no attack of significance that I have experienced.
- Google’s in-house engineering teams may start to get the idea that their work can’t cut the mustard. Edmond Fallot’s black currant mustard, please! Google’s acquisitions seem to duplicate, not complement, technology Google has disclosed in its technical papers and patent applications. Maybe this stuff Google invented does not work, or does not work in today’s market? Scary.
- Google’s customers may be tired of waiting. I know that I don’t think of Google when I am looking for network cables. I go to Amazon, check out its prices, and then run a query across deal sites. A cable is a cable no matter what Monster insists is true.
Bottom line: Google has cash and has not yet diversified its revenue streams. The old saw no longer cuts for me. The notion that these acquisitions increase Google’s ad revenue does not get the job done. If the online ad market softens due to a bold action from Facebook or a less clumsy offering from Apple, the Google may have to do more than collect companies Yahoo style. Google has to do something. Microsoft and Yahoo are now up and jogging.
Maybe Google is a hybrid of the “old” Microsoft and the “pre-Semel” Yahoo? Interesting thought in my opinion.
Stephen E Arnold, August 25, 2010
Freebie
Exclusive Interview: Satish Gannu, Cisco Systems Inc.
August 24, 2010
I made my way to San Jose, California, to find out about Cisco Systems and its rich media initiatives. Once I located Cisco Way, a reminder of the company’s influence in the heart of Silicon Valley, I knew I would be able to connect with Satish Gannu, a director of engineering in Cisco’s Media Experience and Analytics Business Unit. Mr. Gannu leads the development team responsible for Cisco Pulse, a method for harnessing the collective expertise of an organization’s workforce. The idea is to apply next generation technology to the workplace in order to make it quick and easy for employees to find the people and information they need to get their work done “in an instant.”
I had heard that Mr. Gannu is exploring the impact of video proliferation in the enterprise. Rich media require industrial-strength, smart network devices and software, both business sectors in which Cisco is one of the world’s leading vendors. I met with Mr. Gannu in Cisco’s Building 17 cafeteria (appropriate because Mr. Gannu has worked at Cisco for 17 years). Before tackling rich media, he served as a director of engineering in Cisco’s Security Technology Group. I did some poking around with my Overflight intelligence system and picked up signals that he is responsible for media transcoding, a technology that can bring some vendors’ network devices to their knees. Cisco’s high performance systems handle rich media. Mr. Gannu spearheads Cisco’s search and speech-to-text activities. He is giving a spotlight presentation at the October 7-8, 2010, Lucene Revolution conference in Boston, Massachusetts. The conference is sponsored by Lucid Imagination.
Satish Gannu, Director of Engineering, Cisco Systems Inc.
The full text of my interview with Mr. Gannu appears below:
Thanks for taking the time to talk with me.
No problem.
I think of Cisco as a vendor of sophisticated networking and infrastructure systems and software. Why is Cisco interested in search?
We set off to do the Pulse project in order to turn people’s communications into a mechanism for finding the right people in your company. To find people, we asked: how do people communicate what they know? People communicate what they know through documents: a Web page, an email, a Word document, a PDF, and now video. Video is big for Cisco.
Videos are difficult to consume or even find. The question we wanted to answer was, “Could we build a business-savvy recommendation engine?” We wanted to develop a way to learn from user behavior and then recommend videos to people, not just in an organization but in other settings as well. We wanted to make videos more available for people to consume. Video is the next big thing in digital information, moving from YouTube to the enterprise world. In many ways, video represents a paradigm shift. Video content takes a lot of storage space. We think that video is also difficult to consume and difficult to find. In search, we have always worked from a document-based view. We are now expanding the idea of a document from text to rich media. We want to make video findable, browseable, and searchable. Obviously the network infrastructure must be up to the task. So rich media is a total indexing and search challenge.
Is there a publicly-accessible source of information about Cisco’s Pulse project?
Yes. I will email you the link and you may insert it in this interview. [Click here for the Pulse information.]
No problem. Are you using open source search technology at Cisco?
Yes, we believe a lot in the wisdom of the crowds. The idea that a community and some of the best minds can work together to develop and enhance search technology is appealing to us. We also like the principle that we should not invent something that is already available.
I know you acquired Jabber. Is it open source?
Yes, in late 2008 Cisco bought the company called Jabber. The engineers had developed a presence and messaging protocol and software. Cisco is also active in the Open Social platform.
Would you briefly describe Open Social?
Sure. “Open Social” is a platform with a set of APIs developed by a community of social networking developers and vendors to structure and expose social data over the network; see opensocial.org. We’ve adopted Open Social to expose the social data interfaces in our product for use by our customers, leveraging both the standardization and the innovation of this process to make corporate data available within organizations in a predictable, easy-to-use platform.
Why are you interested in Lucene/Solr?
We talked to multiple companies, and we decided that Lucene and Solr were the best search options. As I said, we didn’t want to reinvent the wheel. We looked at available Lucene builds. We read the books. Then we started working with Lucid. Our hands-on testing validated the software. We learned how mature it is. The road map for things which are coming up was important to us.
What do you mean?
Well, we had some specific ideas in mind. For example, we wanted to build certain extensions on top of basic Lucene. With the road map, open source gives us an opportunity to create our own intellectual property on top of Lucene/Solr.
Like video?
Yes, but I don’t want to get into too much detail. Lucene for video search is different. With rich media sources we worry about how to transcribe the content, and then we have to get into how the system can implement relevancy and things like that.
One assumption we made is that people speak at a rate of two to three words per second. So when we were doing tagging, we could estimate the length of the transcript and the size of the document from the duration of the video.
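That rule of thumb turns a video’s running time into an expected transcript size. A back-of-the-envelope sketch of the arithmetic; this is my illustration, not Cisco’s code:

    # Estimate transcript size from the two-to-three-words-per-second assumption.
    def estimated_transcript_words(duration_seconds, words_per_second=2.5):
        return round(duration_seconds * words_per_second)

    # A ten-minute (600 second) video yields roughly 1,200 to 1,800 words of transcript.
    print(estimated_transcript_words(600, 2.0))  # 1200
    print(estimated_transcript_words(600, 3.0))  # 1800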
That’s helpful. What are the primary benefits of using Lucene/Solr?
One of our particular interests is figuring out how we can make it easy for people in an organization to find a person with expertise or information in a particular field. At Cisco, then, how our systems help users find people with specific expertise is core to our product.
So open source gives us the advantage of understanding what the software is doing. Then we can build on top of those capabilities. That is how we determined which system to choose.
Does the Lucene/Solr community provide useful developments?
Yes, that’s the wisdom of the crowds. In fact, the community is one of the reasons open source is thriving. In my opinion, the community is a big positive for us. In our group, we use Open Social too. At Cisco, we are part of the enterprise Open Social consortium, and we play an active role in it. We also publish an open source API.
I encourage my team to be active participants in that community and to contribute. Many at Cisco are contributing certain extensions. We have added these on top of Open Social. We are giving our perspective to the community from our Pulse learnings. We are doing the same type of thing for Lucene/Solr.
My view is that if useful open source code is out there, everyone can make the best use of it. And if a developer is using open source, there is the opportunity to make some enhancement on top of the existing code. It is possible to create your own intellectual property around open source too.
How has Lucid Imagination contributed to your success in working with Solr/Lucene?
We are not Lucene experts. We needed to know what is possible, what is not possible, and what the caveats are. The insight we got from consulting with Lucid Imagination helped open our eyes to the possibilities. That clinical knowledge is essential.
What have you learned about open source?
That’s a good question. Open source doesn’t always come for free. We need to keep that in mind. One can get the software at no cost, but like other software, one needs to maintain it and keep it up to date.
Where’s Lucid fit in?
Without Lucid, we would have to send an email to the community and wait for somebody to respond. Now I ping Lucid.
Can you give me an example?
Of course. If I have 20,000 users, I can have 100 million terms in one shard. If I need to scale this to 100,000 users and put up five shards, how do I handle these shards so that each is localized? What is the method for determining relevancy of hits in a result set? I get technical input from Lucid on these types of issues.
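One common way to keep each user’s data localized is to hash the user identifier to a shard so the same user always lands in the same place. The following is my own minimal sketch of that idea, not Mr. Gannu’s or Lucid’s method:

    import hashlib

    NUM_SHARDS = 5

    def shard_for(user_id, num_shards=NUM_SHARDS):
        # Hash the user id so a user's documents always route to one shard,
        # keeping that user's terms localized for indexing and querying.
        digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
        return int(digest, 16) % num_shards

    print(shard_for("user-12345"))  # stable shard assignment for this user

Note that growing from one shard to five reshuffles every assignment under this simple scheme; consistent hashing is the usual fix, and that is exactly the kind of question one takes to the experts.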
When someone asks you why you don’t use a commercial search solution, what do you tell them?
I get this question a lot. In my opinion, the commercial search systems are often a black box. We occasionally want to use this type of system. In fact, we do have a couple of other related products which use commercial search technologies.
But for us, analysis of context is the core. Context is what the search is about. And when we looked at the code, we realized that how we use this functionality is central to our work. How we find people is one example of what we need. We need an open system. For a central function, the code cannot be a black box. Open source meets our need.
Thank you. How can a reader contact you?
My email is sgannu at cisco dot com.
Stephen E Arnold, August 24, 2010
Sponsored post