Exclusive Interview with Kapow Software Founder
December 14, 2010
Our sister information service, Search Wizards Speak, published an exclusive interview with Stefan Andreasen, the founder of Kapow Software. You can read the full text of the discussion on the ArnoldIT.com Web site.
Kapow is a fast-growing company. The firm offers tools and services for what is called data integration. Other ways to characterize the firm’s impressive technology include data fusion, mashups, ETL (jargon for extracting, transforming and loading data from one system to another), and file conversion and slicing and dicing. The technology works within a browser and can mobile-enable any application, integrate cloud applications, and migrate content from a source to another system.
In the interview, Mr. Andreasen said about the spark for the company:
As soon as we started building the foundational technology at Kapow.net in Denmark, I knew we were on to something special that had broad applicability far beyond that company. For one, the Web was evolving rapidly from an information-hub to a transaction-hub where businesses required the need to consolidate and automate millions of cross-application transactions in a scalable way. Also, Fortune 1000 companies were then and, as you know, even more so today, turning to outsourced consultants and hordes of manual workers to do the work that this innovation could do instantly.
On the subject of car manufacturer Audi’s use of the Kapow technology, he added:
In one use case, Audi, the automobile manufacturer, was able to eliminate dependencies, streamline their engineering process, and minimize the time-to-market on their new A8 model. Audi employs Katalyst to integrate data for their state of the art navigation system, called MMI, which combines Google Earth with real-time data about weather, gas prices, and other travel information, customizing the driver’s real-time experience according to their location and taste preferences. In developing the navigation system, Audi had relied on application providers to write custom real-time APIs compatible with the new Audi system. After months of waiting for the APIs and just two weeks away from the car launch date, Audi sought Kapow’s assistance. Katalyst was able to solve their problem quickly, wrapping their data providers’ current web applications into custom APIs and enabling Audi to meet their target launch date. By employing Kapow, Audi is now able to quickly launch the car in regional markets because Katalyst enables the Audi engineers to easily change and integrate new data sources for each market, in weeks rather than months.
For more information about Kapow, navigate to www.kapowsoftware.com. The full text of the interview is at http://www.arnoldit.com/search-wizards-speak/kapow.html.
Kenneth Toth, December 14, 2010
Freebie
Digital Reasoning Unleashes Synthesys Version 3
December 6, 2010
Our sister publication covers the dynamic world of data fusion and next generation analytics. I wanted to call your attention to an interview with Tim Estes, the founder of Digital Reasoning. The company has announced a new version of the firm’s Synthesys product. You can read a complete, far ranging interview with Mr. Estes in the Search Wizards Speak series at this link. Our analyses of the Digital Reasoning technology are most encouraging.
Here’s a snippet of the interview’s contents from the Inteltrax story which ran earlier today:
“Synthesys V3.0 provides a horizontally scalable solution for entity identification, resolution, and analysis from unstructured and structured data behind the firewall,” Estes said when asked about Digital Reasoning’s new offering. “Our customers are primarily in the defense and intelligence market at this point so we have focused on an architecture that is pure software and can run on a variety of server architectures.” In addition, the program is rife with features that are miles beyond previous versions. “We’ve enhanced and improved the core language processing in dramatic ways. For example, there is more robustness against noisy and dirty data. And we have provided better analytics quality. We have also integrated fully with Hadoop for horizontal scale. We probably have one of the most flexible and scalable text processing architectures on the market today.”
While the company still works heavily with the government, Synthesys technology will benefit several other fields. “We are getting a good bit of interest from companies that need what I call ‘big data analytics’ for financial services, legal eDiscovery, health care, and media tasks.” For example, the program “can identify the who and the what, map the connections, and deliver the key insights.” Estes continues, “instead of clicking on links and scanning documents for information, Synthesys Version 3.0 moves the user from reading a ranked or filtered set of documents to a direct visual set of facts and relationships that are all linked back to the key contexts in documents or databases. One click and the user has the exact fact. Days and hours become minutes and seconds.”
EasyAsk: Exclusive Interview with Craig Bassin
November 22, 2010
EasyAsk was one of the first search vendors who demonstrated access to structured and unstructured data from a single interface. The firm is now under new ownership, and I wanted to get an update about the company and its technology.
Last week, I was able to talk with Craig Bassin, a former partner in an investment firm. Mr. Bassin is now pushing EasyAsk forward and made his excitement about the company, its technology, and future palpable. The full text of my interview with Mr. Bassin is available in the Search Wizards Speak series at this link.
I want to highlight two points from the one hour interview because each provides useful insight into a company that can compete with such firms as Endeca as well as vendors of technology to organizations struggling with information retrieval.
First, Mr. Bassin calls attention to the EasyAsk natural language processing method. He said:
While EasyAsk also supports the navigational style of search we go much further in helping customers find what they want quickly. EasyAsk’s natural language approach allows buyers to enter an entire descriptive phrase of exactly what they want. The natural language and semantic processing engine understands the context of the search and returns accurate results on the first page, greatly increasing conversion rates. With EasyAsk, customers can choose how they want to find products, and they will find them faster… EasyAsk enables e-commerce websites to always return search results, reducing the number of lost visitors.
My take is that NLP technology is getting more attention now that the limitations of key word searching and laundry lists of results are better known. (In fact, my column each month for KMWorld will address the use of NLP and semantic technology in the enterprise starting in January 2011.)
Second, I probed Mr. Bassin about EasyAsk’s enterprise solution. He told me:
As you well know, the typical enterprise search product is geared towards allowing users to search unstructured or semi-structured data using keywords to find documents they need. This is good when a user is looking for a specific document, like a contract or performance review. EasyAsk Business Edition addresses a completely different problem – giving casual users faster, easier access to corporate data. At our core, EasyAsk is all about Natural Language linguistic processing, that is, understanding the ‘intent’ of any given question, or query. We’ve extended our intuitive search capability into corporate data allowing users to search, analyze, report and share results. … We designed EasyAsk for casual business users who need immediate access to data so they can make informed decisions improving their ability to increase sales, service customers and execute operational processes. And, they can’t wait a few weeks for IT or a data analyst to get them a custom report.
To learn more about EasyAsk, navigate to www.easyask.com. You can read other interviews in the Search Wizards Speak series at this link.
Stephen E Arnold, November 22, 2010
Freebie but there is always hope…
Inforbix Poised to Shake Up Engineering Design Search
November 3, 2010
In an exclusive interview with ArnoldIT.com, Oleg Shilovitsky, co-founder and CEO of Inforbix, provides an in-depth look at his information retrieval system for engineering and product design. His firm Inforbix has been operating in a low profile and is now beginning to attract the attention of engineering professionals struggling with conventional data management tools for parts, components, assemblies, and other engineered pieces.
Most search systems are blind to the data locked in engineering design tools and systems. For example, in a typical manufacturing company, a traditional search system would index the content on an Exchange server, email, proposals in Word files, and maybe some of the content residing in specialized systems used for accounts payable or inventory. When these items are indexed, most are displayed in a hit list like a Google results page or in a point-and-click interface with hot links to documents that may or may not be related to the user’s immediate business information need.
But what about the specific part needed for a motor assembly? How does one locate the drawing? Where are the data about the item’s mean time before failure? The semantic relationships between bits of product data located in multiple silos are missing. The context of information related to components in a manufacturing and production process is either ignored, not indexable, or presented as a meaningless item number and a numerical value.
That’s the problem Mr. Shilovitsky and his team of engineers have solved. With basic key word retrieval now a commodity, specialized problems exist. As Mr. Shilovitsky told me, “I think maybe we have solved a problem for the first time. We make manufacturing and production related data available in context.”
In the interview conducted on November 1, 2010, Mr. Shilovitsky said:
In my view, the most valuable characteristics of future systems will be “flexibility” and “granularity”. The diversity of data in manufacturing organization is huge. You need to be flexible to be able to crack the information retrieval. On the other side, businesses are driven by values and ROI. So, to be able to have a granular solution (don’t boil the ocean) in order to address a particular business problem is a second important thing.
He added:
Our system foundation combines flexibility and granularity with a deep understanding of product data in engineering and manufacturing. One of the problems of product development is a uniqueness of organizational processes. Every organization runs their engineering and development shop differently. They are using the same tools (CAD, CAM, CAE, data management tools, or an ERP system), but the combination is unique.
To read the full text of this exclusive interview, navigate to this link. For more information about this ground-breaking approach to a tough information problem, point your browser to www.inforbix.com.
Stephen E Arnold, November 3, 2010
Freebie
Exclusive Interview: Mats Bjore, Silobreaker
September 23, 2010
In some follow up work for the Lucene Revolution Conference, I spoke on Tuesday, September 21, 2010, with Mats Bjore, a former consultant at the blue chip firm McKinsey. After we discussed our respective talks at the upcoming open source conference sponsored by Lucid Imagination, I asked about the Silobreaker technology. Mats explained that there were some new developments that would be discussed at the upcoming Boston conference.
If you have not used Silobreaker.com, you will want to point your browser at www.silobreaker.com. When you click on the interface, you get an instant report. You can run a more traditional query, but Silobreaker uses proprietary methods to highlight the most important stories, provide visualizations related to the story and entities mentioned, and link to related content. The public Silobreaker is a demonstration of the more robust enterprise solution available from the firm. Silobreaker is in use at a number of interesting client facilities in the US and elsewhere.
I captured our conversation using the handy Skype recorder add in. The full text of our conversation appears below.
Hi, Mats, it’s good to talk with you again. I keep hearing about Silobreaker, so you are getting quite a bit of attention in the business intelligence space. What’s new with Silobreaker in the last few months?
Yes, we are getting quite a bit of media attention. As you know, the primary objective of launching the free news search engine was to showcase our technology to enterprise users and to make them see how easily a Silobreaker solution could be tweaked to fit their domain and requirements. The Silobreaker Enterprise Software Suite (“SESS”) was successfully installed last year as the core engine for the Swedish Armed Forces’ new news intelligence system and we are just about to release a SaaS product online called Silobreaker Premium that is specifically aimed at business and government agency users who don’t need or want a standalone installation. We already have some US clients as pilot clients.
Silobreaker’s splash screen at www.silobreaker.com
How do you describe Silobreaker at this stage in its development?
We’ve come a long way, yet have an exciting product roadmap ahead of us. But most importantly, we have finally reached some milestones in terms of commercial robustness and viability with the platform. Silobreaker Premium will be an exciting new product in the marketplace. Also, since our technology and team are highly customizable, our clients’ and users’ demands are the most important guide for our development.
What new services have you introduced that you can talk about?
As I said, Silobreaker Premium is the new product for us this year, but we also develop a lot of integrated entity and content management functions for clients that want to have integrated Intelligence analytical tools.
What new functions are available to commercial licensees?
We think Silobreaker Premium is a powerful enterprise product for professional media-monitoring, early warning, risk management, intelligence and decision support.
Available as SaaS (Software as a Service) in a single intuitive and secure user interface, you are able to define monitoring targets, trigger content aggregation, perform analyses, and display results in customized dashboards, e-mail alerts and by auto-generated reports.
What else can the system do?
Let me list a few of the functions. You can set up watch lists and collaborate with colleagues. Also, it is easy to monitor news, reports, multimedia and social media. Clients can track big movers in the news by heat tools and other analytics. A user can easily save and export findings straight into third party applications. We make it very easy to mix and match free and premium content.
What’s the pricing?
Good question for sure. Silobreaker Premium will be priced with a single monthly flat fee per enterprise to allow and encourage large user groups within an organization to use the service regardless of the number of queries, monitoring agents, dashboards, watch lists, alerts, or reports.
There has been quite a bit of “noise” about Palantir and Recorded Future. I thought Silobreaker provided similar functions. Is that correct?
That is correct. I think conceptually we are very similar in what we are trying to deliver to our customers, but there are also some noticeable differences. We crawl different databases, we use different search methodologies, and as companies we are different in size and our pricing differs. Also I believe that from an analyst perspective Silobreaker, in its customized versions, can provide tools that encompass the whole intelligence process at a price that enables even large organizations to deploy our systems to everyone. We believe in Silobreaking also when it comes to collaboration.
And silobreaking means what?
Most organizations have “walls” between units. Information in one silo may not be easily accessible to authorized users in other silos. So, our product is a “silobreaker.”
I like the name. My view is that PR, venture capitalists, and the name “Google” blow some technologies up like a Macy’s Thanksgiving Day balloon. What anchors the Silobreaker approach? Don’t give me PR talk, okay?
No problem. Our independence and our beliefs make Silobreaker unique. We are not VC-financed and have managed to build the business through our own money and customer revenues. That may mean that things have taken a bit longer, but it shows that what we do is for real, which is far away from the many “hype today gone tomorrow” companies that we’ve seen in passing over the last few years. We also anchor all we do in a strong belief that information overload is not evil but a reassuring consequence of freedom and innovation, and that it is the ability to refine this overload and extract benefits from it that truly creates the “killer app” that everybody needs.
Let’s assume I am a busy person. I have to make decisions and I don’t have much time. What do I have to do to get a custom output from Silobreaker?
Not much. Our Premium users typically do two things to generate custom output. Firstly, they create one or several watch lists. This could be people, products, companies or anything else they are interested in – or a list of favorite publications. Such lists can then be used to make queries across all our tools and features or to customize dashboards, email alerts and reports.
What happens if a new content stream becomes available? Say, for example, the Tumblr micro-blogging service. What is required to intake that content and include it in my results? Is there an open source component or API for Silobreaker?
We support many different types of content. At the moment we will add open sources on request which are added easily through RSS/Atom feeds or through crawling the site. As a general rule, we do not allow users to add sources themselves. Having said that, though, Premium users can add “internal content” through an upload facility, enabling them to mix internal reports and posts with external content.
I find some visualizations pretty but either confusing, meaningless, or downright misleading. What has Silobreaker done to make information more quickly apprehendable? I guess this is called the UX or user experience?
We actually believe that graphics and visualizations should play as big a role for text-mining as they do for numerical analysis. However, I agree with you that usability becomes a big issue in order to make users understand what the visualizations are showing and how they can be used for more in-depth analysis. That is something we are working on all the time, but users must also realize that keyword-based queries generating just lists of search hits can never be the way forward for search, so we hope they are open-minded about these new ways of presenting results.
As you look ahead, what are the two or three big changes you anticipate in next generation information access?
The focus on “how many hits at what speed” feels very much like a first generation feature and hasn’t really helped with information overload. Context, analysis, and query customizations will be the challenges for next generation algorithms and services.
How can a reader explore Silobreaker?
Silobreaker.com is free and anyone is welcome to a free trial of Silobreaker Premium. Just get in touch.
If a person wants more information, what’s the best way to get that information or contact you?
Contact us directly at sales@silobreaker.com or call our sales director Anders Kasberg at +46 (0) 8 662 3230.
See you in Boston and then in Bethesda the following week, okay?
Yes.
Stephen E Arnold, September 23, 2010
Freebie. The last time I was in Sweden I got herring. I want a taco.
Exclusive Interview: Quentin Gallivan, Aster Data
September 22, 2010
In the last year or two, a new type of data management opportunity has blossomed. I describe this sector as “big data analytics”, although the azure chip consultants will craft more euphonious jargon. One of the most prominent companies in the big data market is Aster Data. The company leverages BigTable technology (closely associated with Google) and moves it into the enterprise. The company has the backing of some of the most prestigious venture firms; for example, Sequoia Capital and Institutional Venture Partners, among others.
Aster Data, therefore, is one of the flagships in big data management and big data analysis for data-driven applications. Aster Data’s nCluster is the first MPP data warehouse architecture that allows applications to be fully embedded within the database engine to enable fast, deep analysis of massive data sets.
The company offers what it calls an “applications-within” approach. The idea is to allow application logic to exist and execute with the data itself. Termed a “Data-Analytics Server,” Aster Data’s solution effectively utilizes Aster Data’s patent-pending SQL-MapReduce together with parallelized data processing and applications to address the big data challenge. Companies using Aster Data include Coremetrics, MySpace, comScore, Akamai, Full Tilt Poker, and ShareThis. Aster Data is headquartered in San Carlos, California.
I spoke with Quentin Gallivan, the company’s new chief executive officer, on Tuesday, September 22. Mr. Gallivan made a number of interesting points. He told me that data within the enterprise is “growing at a rate of 60% a year.” Even more interesting, data within Internet-centric organizations was growing at “100% a year.”
I asked Mr. Gallivan about the key differentiator for Aster Data. Data management and chatter about “big data” pepper the information that flows to me from vendors each day. He said:
Aster Data’s solution is unique in that it allows complete processing of analytic applications ‘inside’ the Aster Data MPP database. This means you can now store all your data inside of Aster Data’s MPP database that runs on commodity hardware and deliver richer analytic applications that are core to improving business insights and providing more intelligence on your business. To enable richer analytic applications we offer both SQL and MapReduce. I think you know that MapReduce was first created by Google and provides a rich parallel processing framework. We run MapReduce in-database but expose it to analysts via a SQL-MapReduce interface. The combination of our MPP DBMS and in-database MapReduce makes it possible to analyze and process massive volumes of data very fast.
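For readers who have not worked with MapReduce, here is a minimal single-process sketch of the programming model Mr. Gallivan refers to. It is plain Python, not Aster Data’s SQL-MapReduce interface; the function names and the click records are hypothetical and exist only for illustration. A map step emits key and value pairs, a shuffle step groups them by key, and a reduce step collapses each group.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Emit one (user, 1) pair per click record. The record layout is assumed."""
    user, _page = record
    yield user, 1


def shuffle(pairs):
    """Group emitted values by key, as a framework would do between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def map_reduce(records):
    """Run map, shuffle, and reduce in one process over the given records."""
    pairs = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(pairs)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


# Hypothetical sample data: (user, page) click records.
clicks = [("ann", "/home"), ("bob", "/cart"), ("ann", "/cart")]
print(map_reduce(clicks))
```

In a parallel system the map and reduce calls run on many nodes at once, which is where the speed on large data sets comes from; this sketch runs them one after another.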
In the interview he describes an interesting use case for Barnes & Noble, one of Aster Data’s high profile clients. You can read the full text of the interview in the ArnoldIT.com Search Wizards Speak service by clicking this link. For a complete list of interviews with experts in search and content processing click here. Most of the azure chip consultants recycle what is one of the largest collections of free information about information retrieval in interview form available at this time.
Stephen E Arnold, September 22, 2010
Freebie. Maybe another Jamba juice someday?
Exclusive Interview with Steve Cohen, Basis Technology
September 21, 2010
The Lucene Revolution is a few weeks away. One of the featured speakers is Steve Cohen, the chief operating officer of Basis Technology. Long a leader in language technology, Basis Technology has ridden a rocket ship of growth in the last few years.
Steve Cohen, COO, Basis Technology
I spoke with Steve about his firm and its view of open source search technology on Monday, September 20, 2010. The full text of the interview appears below:
Why are you interested in open source search?
The open source search movement has brought great search technology to a much wider audience. The growing Lucene and Solr community provides us with a sophisticated set of potential customers, who understand the difference that high quality linguistics can make. Historically we have sold to commercial search engine customers, and now we’re able to connect with – and support – individual organizations who are implementing Solr for documents in many languages. This also provides us with the opportunity to get one step closer to the end user, which is where we get our best feedback.
What is your take on the community aspect of open source search?
Of course, open source only works if there is an active and diverse community. This is why the Apache Foundation has stringent rules regarding the community before they will accept a project. “Search” has migrated over the past 15 years from an adjunct capability plugged onto the side of database-based systems to a foundation around which high performance software can be created. This means that many products and organizations now depend on a great search core technology. Because they depend on it they need to support and improve it, which is what we see happening.
What’s your take on the commercial interest in open source?
Our take, as a mostly commercial software company, is that we absolutely want to embrace and support the open source community – we employ Apache committers and open source maintainers for non-Apache projects – while providing (selling) technology that enhances the open source products. We also plan to convert some of our core technology to open source projects over time.
What’s your view on the Oracle Google Java legal matter with regards to open source search?
The embedded Java situation is unique and I don’t think it applies to open source search technology. We’re not completely surprised, however, that Oracle would have a different opinion of how to manage an open source portfolio than Sun did. For the community at-large this is probably not a good thing.
What are the primary benefits of using open source search?
I’ll tell you what we hear from customers and users: the primary benefits are avoiding vendor lock-in and gaining flexibility. There have been many changes in the commercial vendor landscape over the fifteen years we’ve been in this business, and customers feel like they’ve been hurt by changes in ownership and whole products and companies disappearing. Search, as we said earlier, is a core component that directly affects user experience, so customizing and tuning performance to their application is key. Customers want all of the other usual things as well: good price, high performance, support, etc.
When someone asks you why you don’t use a commercial search solution, what do you tell them?
We do partner with commercial search vendors as well, so we like to present the benefits of each approach and let the customer decide.
What about integration? That’s a killer for many vendors in my experience.
Our exposure to integration is on the “back end” of Lucene and Solr. Our technology plugs in to provide linguistic capabilities. Since we deliver a reliable connector between our technology and the search engine this hasn’t been much of a problem.
How does open source search fit into Basis’ product/service offerings?
Our product, Rosette, is a text analysis toolkit that plugs into search tools like Solr (or the Lucene index engine) to help make search work well in many languages. Rosette prepares tokens for the search index by segmenting the text (which is not easy in some languages, like Chinese and Japanese), using linguistic rules to normalize the terms to enhance recall, and also providing enhanced search and navigation capabilities like entity extraction and fuzzy name matching.
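The segment-then-normalize step can be illustrated with a short Python sketch. This is not Rosette’s API; the function names are hypothetical, and the regular expression used for segmentation is a deliberate simplification. It works for space-delimited text but would not segment Chinese or Japanese, which is why specialized linguistic tools exist.

```python
import re
import unicodedata


def normalize(token):
    """Fold Unicode compatibility forms and case so variants index alike."""
    return unicodedata.normalize("NFKC", token).casefold()


def tokenize(text):
    """Split on non-word characters, then normalize each token.

    Assumption: the text is whitespace- or punctuation-delimited.
    """
    return [normalize(t) for t in re.findall(r"\w+", text)]


# Full-width and upper-case variants collapse to the same index term.
print(tokenize("Ｆｕｌｌ-width Solr, SOLR!"))
```

Because both the indexed text and the query pass through the same normalization, a query for “solr” also finds documents that contain “SOLR,” which is the recall gain described above.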
How do people reach you?
Our Web site, at www.basistech.com, contains details on our various products and services, or people can write to info@basistech.com or call +1-617-386-2090.
Stephen E Arnold, September 21, 2010
Sponsored post
More from Sematext
September 15, 2010
After the publication of Lucene in Action, 2nd edition, I wanted more information about Otis Gospodnetic’s company Sematext. Mr. Gospodnetic, who is a busy professional, agreed to an interview with me last week. The full text of that interview is now available as part of the ArnoldIT.com Search Wizards Speak series. You can read the full text at this link.
Mr. Gospodnetic will be a speaker at the October Lucene Revolution Conference in Boston, Massachusetts. Space at the conference is limited due to the influx of early registrations. My hunch is that those who have an interest in open source search want an opportunity to hear from professionals like Mr. Gospodnetic. There are 25 or 30 other experts on the program, which makes the conference one of the few places the best minds in open source search and content processing can be tapped for their insights and knowledge.
You will want to read the full interview. However, I wanted to flag two comments offered by Mr. Gospodnetic.
First, his firm offers engineering and consulting services to organizations. His approach struck me as interesting:
We primarily provide our knowledge and expertise in dealing with volume “problems”, be that data volume or request (search/query) volume. In addition, we have experience with tools that are designed to work well in high data (change) volume environments. For example, for our search customers we regularly design highly distributed search backend on top of Lucene or Solr or other search solutions that involve index sharding and distributed search or index replication, or both. While we focus on Lucene and Solr on the search side of our business, we are constantly looking at and evaluating new search technologies. In a recent engagement we looked beyond Lucene/Solr and, after evaluating several other solutions (although all based on Lucene!), decided to go with a solution that turned out to be more appropriate for the customer.
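Index sharding and distributed search, as mentioned in the quote, can be sketched in a few lines of Python. This is a toy illustration, not Sematext’s design and not how Lucene or Solr implement it; the shard count, the term-count scoring, and all function names are assumptions. Each document is routed to one shard by a hash of its ID, every shard answers the query independently, and the partial results are merged into one ranked list.

```python
import heapq
import zlib

NUM_SHARDS = 3  # assumed shard count for the illustration


def shard_for(doc_id):
    """Route a document to a shard with a stable hash of its ID."""
    return zlib.crc32(doc_id.encode("utf-8")) % NUM_SHARDS


def build_shards(docs):
    """Split a {doc_id: text} collection into per-shard token stores."""
    shards = [dict() for _ in range(NUM_SHARDS)]
    for doc_id, text in docs.items():
        shards[shard_for(doc_id)][doc_id] = text.lower().split()
    return shards


def search_shard(shard, term, k):
    """Return the top-k (score, doc_id) hits from one shard.

    Scoring is a raw term count, a stand-in for a real relevance function.
    """
    hits = ((tokens.count(term), doc_id) for doc_id, tokens in shard.items())
    return heapq.nlargest(k, (h for h in hits if h[0] > 0))


def distributed_search(shards, term, k=2):
    """Query every shard, then merge the partial results into a global top-k."""
    term = term.lower()
    partial = [hit for shard in shards for hit in search_shard(shard, term, k)]
    return [doc_id for _score, doc_id in heapq.nlargest(k, partial)]


# Hypothetical documents.
docs = {"a": "solr solr lucene", "b": "lucene", "c": "solr hadoop", "d": "hadoop"}
shards = build_shards(docs)
print(distributed_search(shards, "solr"))
```

In a real deployment each shard lives on its own server and the per-shard searches run in parallel; replication adds copies of each shard so that query volume can be spread across them.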
Many vendors focus “in”. Sematext continues to look “out” when it comes to information retrieval.
Second, he identified the three major trends in search. He told me:
Large-scale data processing (think Lucene/Solr and Hadoop family of products), distributed everything (think Solr, Nutch, Hadoop, HBase…), learning from data (think machine learning, Mahout…).
This statement makes clear that Mr. Gospodnetic sees open source in general and search in particular as having an important role to play in the months and years ahead. Some of the mid tier consulting firms have been slow to recognize the impact open source software is having. I am confident that in 2011 there will be many “experts” rushing forward to document what Mr. Gospodnetic, Lucid Imagination, and others have been doing for several years. Better late than never I think.
For more of Mr. Gospodnetic’s comments, click here.
Stephen E Arnold, September 15, 2010
Sponsored post
Exclusive Interview: Charlie Hull, FLAX
September 1, 2010
FLAX, based in Cambridge, England, has been among the leaders in the open source search sector. The firm offers the FLAX search system and a range of professional services for clients and those who wish to use FLAX. Charlie Hull will be one of the speakers in the upcoming Lucene Revolution Conference, and I sought his views about open source search.
Charlie Hull, FLAX
Two years ago, Mr. Hull participated in a spirited discussion about the future of enterprise search. I learned about the firm’s clients, which include Cambridge University, IBuildings, and MyDeco, among others. After our “debate,” I learned that Mr. Hull worked with the team behind Muscat, a search system which provided access to a wide range of European content in English and other languages. Dr. Martin Porter’s Muscat system was forward looking and broke new ground in my opinion. With the surge of interest in open source search, I found his comments quite interesting. The full text of the interview appears below:
Why are you interested in open source search?
I first became interested in search over a decade ago, while working on next-generation user interfaces for a Bayesian web search tool. Search is increasingly becoming a pervasive, ubiquitous feature – but it’s still being done so badly in many cases. I want to help change that. With open source, I firmly believe we’re seeing a truly disruptive approach to the search market, and a coming of age of some excellent technologies. I’m also pretty sure that open source search can match and even surpass commercial solutions in terms of accuracy, scalability and performance. It’s an exciting time!
What is your take on the community aspect of open source search?
On the positive side, a collaborative, community-based development method can work very well and lead to stable, secure and high-performing software with excellent support. However it all depends on the ‘shape’ of the community, and the ability of those within it to work together in a constructive way – luckily the open source search engines I’m familiar with have healthy and vibrant communities.
Commercial companies are playing what I call the “open source card.” Won’t that confuse people?
There are some companies who have added a drop of open source to their largely closed source offering – for example, they might have an open source version with far fewer features as tempting bait. I think customers are cleverer than this and will usually realize what defines ‘true’ open source – the source code is available, all of it, for free.
Those who have done their research will have realized true open source can give unmatched freedom and flexibility, and will have found companies like ourselves and Lucid Imagination who can help with development and ongoing support, to give a solid commercial backing to the open source community. They’ll also find that companies like ourselves regularly contribute code we develop back to the community.
What’s your take on the Oracle Google Java legal matter with regards to open source search?
Well, the Lucene engine is of course based on Java, but I can’t see any great risk to Lucene from this spat between Oracle and Google, which seems mainly to be about Oracle wanting a slice of Google’s Android operating system. I suspect that (as ever) the only real beneficiaries will be the lawyers…
What are the primary benefits of using open source search?
Freedom is the key one – freedom to choose how your search project is built, how it works and its future. Flexibility is important, as every project will need some amount of customization. The lack of ongoing license fees is an important economic consideration, although open source shouldn’t be seen as a ‘cheap and basic’ solution – these are solid, scalable and high performing technologies based on decades of experience. They’re mature and ready for action as well – we have implemented complete search solutions for our customers, scaling to millions of documents, in a matter of days.
When someone asks you why you don’t use a commercial search solution, what do you tell them?
The key tasks for any search solution are indexing the original data, providing search results and providing management tools. All of these will require custom development work in most cases, even with a closed source technology. So why pay license fees on top? The other thing to remember is anything could happen to the closed source technology – it could be bought up by another company, stuck on a shelf and you could be forced to ‘upgrade’ to something else, or a vital feature or supported platform could be discontinued…there’s too much risk. With open source you get the code, forever, to do what you want with. You can either develop it yourself, or engage experts like us to help.
What about integration? That’s a killer for many vendors in my experience.
Why so? Integrating search engines is what we do at Flax day-to-day – and since we’ve chosen highly flexible and adaptable open source technology, we can do this in a fraction of the time and cost. We don’t dictate to our customers how their systems will have to adapt to our search solution – we make our technology work for them. Whatever platform, programming language or framework you’re using, we can work with it.
How do people reach you?
Via our Web site at http://www.flax.co.uk – we’re based in Cambridge, England but we have customers worldwide. We’re always happy to speak to anyone with a search-related project or problem. You’ll also find me in Boston in October of course!
Thank you.
Stephen E Arnold, September 1, 2010
Freebie
Exclusive Interview: Satish Gannu, Cisco Systems Inc.
August 24, 2010
I made my way to San Jose, California, to find out about Cisco Systems and its rich media initiatives. Once I located Cisco Way, the company’s influence in the heart of Silicon Valley, I knew I would be able to connect with Satish Gannu, a director of engineering in Cisco’s Media Experience and Analytics Business Unit. Mr. Gannu leads the development team responsible for Cisco Pulse, a method for harnessing the collective expertise of an organization’s workforce. The idea is to apply next generation technology to the work place in order to make it quick and easy for employees to find the people and information they need to get their work done “in an instant.”
I had heard that Mr. Gannu is exploring the impact of video proliferation in the enterprise. Rich media require industrial-strength, smart network devices and software, both business sectors in which Cisco is one of the world’s leading vendors. I met with Mr. Gannu in the Cisco Building 17 cafeteria (appropriate because Mr. Gannu has worked at Cisco for 17 years). Before tackling rich media, he served as Director of Engineering in Cisco’s Security Technology Group. I did some poking around with my Overflight intelligence system and picked up signals that he is responsible for media transcoding, a technology that can bring some vendors’ network devices to their knees. Cisco’s high performance systems handle rich media. Mr. Gannu spearheads Cisco’s search and speech-to-text activities. He is giving a spotlight presentation at the October 7-8, 2010, Lucene Revolution Conference in Boston, Massachusetts. The conference is sponsored by Lucid Imagination.
Satish Gannu, Director of Engineering, Cisco Systems Inc.
The full text of my interview with Mr. Gannu appears below:
Thanks for taking the time to talk with me.
No problem.
I think of Cisco as a vendor of sophisticated networking and infrastructure systems and software. Why is Cisco interested in search?
We set off to do the Pulse project in order to turn people’s communications into a mechanism for finding the right people in your company. To find people, we asked how people communicate what they know. People communicate what they know through documents: a web page, an email, a Word document, a PDF, and now video. Video is big for Cisco.
Videos are difficult to consume or even find. The question we wanted to answer was, “Could we build a business-savvy recommendation engine?” We wanted to develop a way to learn from user behavior and then recommend videos to people, not just in an organization but in other settings as well. We wanted to make videos more available for people to consume. Video is the next big thing in digital information, moving from YouTube into the enterprise world. In many ways, video represents a paradigm shift. Video content takes a lot of storage space. We think that video is also difficult to consume and difficult to find. In search, we’ve always worked from a document-based view. We are now expanding the idea of a document from text to rich media. We want to make video findable, browseable, and searchable. Obviously the network infrastructure must be up to the task. So rich media is a total indexing and search challenge.
Is there a publicly-accessible source of information about Cisco’s Pulse project?
Yes. I will email you the link and you may insert it in this interview. [Click here for the Pulse information.]
No problem. Are you using open source search technology at Cisco?
Yes, we believe a lot in the wisdom of the crowds. The idea that a community and some of the best minds can work together to develop and enhance search technology is appealing to us. We also like the principle that we should not invent something that is already available.
I know you acquired Jabber. Is it open source?
Yes, in late 2008 Cisco bought the company called Jabber. The engineers had developed a presence and messaging protocol and software. Cisco is also active in the Open Social Platform.
Would you briefly describe Open Social?
Sure. “Open Social” is a platform with a set of APIs developed by a community of social networking developers and vendors to structure and expose social data over the network, at opensocial.org. We’ve adopted Open Social to expose the social data interfaces in our product for use by our customers, leveraging both the standardization and the innovation of this process to make corporate data available within organizations in a predictable, easy-to-use platform.
Why are you interested in Lucene/Solr?
We talked to multiple companies, and we decided that Lucene and Solr were the best search options. As I said, we didn’t want to reinvent the wheel. We looked at available Lucene builds. We read the books. Then we started working with Lucid. Our hands-on testing actually validated the software. We learned how mature it is. The road map for things that are coming up was important to us.
What do you mean?
Well, we had some specific ideas in mind. For example, we wanted to do certain extensions on top of basic Lucene. With the road map, open source gives us an opportunity to do our own intellectual property on top of Lucene/Solr.
Like video?
Yes, but I don’t want to get into too much detail. Lucene for video search is different. With rich media sources we worry about how to transcribe it, and then we have to get into how the system can implement relevancy and things like that.
One assumption we made is that people speak at a rate of two to three words per second. So when we were doing tagging, we could calculate the length of the transcript and size of the document.
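The speaking-rate assumption lends itself to a quick back-of-the-envelope calculation. The sketch below is illustrative only and is not Cisco's implementation: the function name `estimate_transcript_words` and the 10-minute example are hypothetical; only the two to three words per second range comes from Mr. Gannu's answer.

```python
def estimate_transcript_words(duration_seconds, words_per_second=(2.0, 3.0)):
    """Return a (low, high) estimate of the number of words in a transcript.

    The default rate range reflects the two-to-three-words-per-second
    assumption quoted in the interview; everything else is illustrative.
    """
    low_rate, high_rate = words_per_second
    return int(duration_seconds * low_rate), int(duration_seconds * high_rate)


# Hypothetical example: a 10-minute (600-second) video.
low, high = estimate_transcript_words(600)
print(f"Expected transcript length: {low} to {high} words")
```

Under this assumption, a 10-minute video would yield a transcript of roughly 1,200 to 1,800 words, which gives an indexer a rough idea of document size before transcription finishes.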
That’s helpful. What are the primary benefits of using Lucene/Solr?
One of our particular interests is figuring out how we can make it easy for people in an organization to find a person with expertise or information in a particular field. At Cisco, then, how our systems help users find people with specific expertise is core to our product.
So open source gives us the advantage of understanding what the software is doing. Then we can build on top of those capabilities. That’s how we determine which one to choose.
Does the Lucene/Solr community provide useful developments?
Yes, that’s the wisdom of the crowds. In fact, the community is one of the reasons open source is thriving. In my opinion, the community is a big positive for us. In our group, we use Open Social too. At Cisco, we are part of the enterprise Open Social consortium, and we play an active role in it. We also publish an open source API.
I encourage my team to be active participants in that and contribute. Many at Cisco are contributing certain extensions. We have added these on top of Open Social. We are giving our perspective to the community from our Pulse learnings. We are doing the same type of things for Lucene/Solr.
My view is that if useful open source code is out there, everyone can make the best utilization of it. And if a developer is using open source, there is the opportunity for making some enhancement on top of the existing code. It is possible to create your own intellectual property around open source too.
How has Lucid Imagination contributed to your success in working with Solr/Lucene?
We are not Lucene experts. We needed to know what is possible, what is not possible, and what the caveats are. The insight we got from consulting with Lucid Imagination helped open our eyes to the possibilities. That clinical knowledge is essential.
What have you learned about open source?
That’s a good question. Open source doesn’t always come for free. We need to keep that in mind. One can get open source software. Like other software, one needs to maintain it and keep it up to date.
Where does Lucid fit in?
Without Lucid, we would have to send an email to the community and wait for somebody to respond. Now I ping Lucid.
Can you give me an example?
Of course. If I have 20,000 users, I can have 100 million terms in one shard. If I need to scale this to 100,000 users and put up five shards, how do I handle these shards so that each is localized? What is the method for determining relevancy of hits in a result set? I get technical input from Lucid on these types of issues.
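The figures in this answer can be worked through, and the "localized" requirement can be illustrated with a simple routing rule. The sketch below is an assumption-laden illustration, not Cisco's or Lucid Imagination's method: it assumes terms grow linearly with users, and it uses a stable hash of a hypothetical user ID so that all of one user's data lands on the same shard. The names `shard_for_user` and `terms_per_shard` are invented for this example.

```python
import hashlib


def shard_for_user(user_id, num_shards):
    """Map a user ID to a shard number with a stable hash (illustrative routing rule)."""
    digest = hashlib.sha1(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def terms_per_shard(users, terms_per_user, num_shards):
    """Average terms per shard, assuming terms scale linearly with users."""
    return users * terms_per_user // num_shards


# Figures from the interview: 20,000 users produce 100 million terms in one shard.
terms_per_user = 100_000_000 // 20_000  # 5,000 terms per user

# Scaling to 100,000 users across five shards, under the linear-growth assumption.
print(terms_per_shard(100_000, terms_per_user, 5))

# The same hypothetical user always routes to the same shard.
print(shard_for_user("user-42", 5) == shard_for_user("user-42", 5))
```

Under the linear-growth assumption, five shards for 100,000 users each hold about 100 million terms, the same load as the original single shard. Simple modulo hashing reshuffles most users when the shard count changes, which is one reason a production design might prefer a different routing scheme.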
When someone asks you why you don’t use a commercial search solution, what do you tell them?
I get this question a lot. In my opinion, the commercial search systems are often a black box. We occasionally want to use this type of system. In fact, we do have a couple of other related products which use commercial search technologies.
But for us, analysis of context is the core. Context is what the search is about. When we looked at the code, we realized that how we use this functionality is central to our work. How we find people is one example of what we need. We need an open system. For a central function, the code cannot be a black box. Open source meets our need.
Thank you. How can a reader contact you?
My email is sgannu at cisco dot com.
Stephen E Arnold, August 24, 2010
Sponsored post