Exclusive Interview with MaxxCat

April 15, 2009

I spoke with Jim Jackson on April 14, 2009. MaxxCat is a search and content processing vendor delivering appliance solutions. The full text of the interview appears below:

Why another appliance to address a search and content processing problem?

At MaxxCat, we believe that from the performance and cost perspectives, appliance-based computing provides the best overall value. The GSA and Google Mini are the market leaders, but provide only moderate performance at a high price point. We believe that by continuously obsessing about performance in the three major dimensions of search (volume of data, speed of retrieval, and crawl/indexing times), our appliances will continue to improve. Software-only solutions cannot match the performance of our appliances. Nor can software-only or general purpose hardware approaches provide the scaling, high availability, or ease of use of a gray-box appliance. From an overall cost perspective, even free software such as Lucene may end up being more expensive than our drop-in-and-use appliance.


Jim Jackson, MaxxCat

A second factor that is growing more important is the ease of integration of the appliance. Many of our customers have found unique and unexpected uses for our appliances that would have been very difficult to implement with black-box architectures like Google’s. Our entry-level appliance can be set up in 3 minutes, comes with a quick start guide that is only 12 pages long, and can be administered from two simple browser pages. That’s it! Conversely, software such as Lucene has to be downloaded, configured, installed, understood, and matched with suitable hardware. This is typically followed by a steep learning curve and consulting fees from experts who are involved in getting a working solution, which sometimes doesn’t work or won’t scale.

But just because the appliance features easy integration, this does not mean that complex tasks cannot be accomplished with it. To aid our customers in integrating our appliances with their computing environments, we expose most of the features of the appliance through a web API. The appliance can be started, stopped, backed up, queried, pointed at content, monitored via SNMP, and reported upon by external applications. This greatly eases the burden on developers who wish to customize the output, crawl behavior, and integration points with our appliance. Of course this level of customization is available with open source software solutions, but at what price? And most other hardware appliances do not expose the hardware and operating system to manipulation.
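
The interview does not document the API itself, so the fragment below is only a minimal sketch, assuming hypothetical endpoint paths and parameter names, of how an external application might drive an appliance that exposes this kind of web API.

```python
# Minimal sketch only: the host, endpoint paths, and parameters below are
# hypothetical illustrations, not MaxxCat's published API.
import urllib.parse
import urllib.request

APPLIANCE = "http://maxxcat-appliance.example.com"

def appliance_call(path, **params):
    """Issue a GET against an assumed appliance endpoint and return the body."""
    url = f"{APPLIANCE}{path}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")

# The kinds of operations the interview describes, expressed as API calls:
print(appliance_call("/api/status"))                                  # machine status
appliance_call("/api/crawl", action="start", collection="intranet")   # point at content
print(appliance_call("/api/query", q="annual report"))                # run a search
```

An external monitoring or reporting tool could wrap calls like these on whatever schedule suits the site.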

Throughput becomes an issue eventually. What are the scaling options you offer?

Throughput is our major concern. Even our entry-level appliance offers impressive performance using, for the most part, general purpose hardware. We have developed a micro-kernel architecture that scales from our SB-250 appliance all the way through our six enterprise models. Our clustering technology has been built to deliver performance over a wide range of the three dimensions that I mentioned before. Some customers have huge volumes of data that are updated and queried relatively infrequently. Our EX-5700 appliance runs the MaxxCAT kernel in a horizontal, front-facing cluster mode sitting on top of our proprietary SAN: storage heavy, with adequate performance for retrieval. Other customers may have very high search volumes on relatively smaller data sets (under 1 exabyte). In this case, the MaxxCAT kernel runs the nodes in a stacked cluster for maximum parallelism of retrieval. Same operating system, same search hardware, same query language, same configuration files, etc., but two very different applications. Both are heavy usage cases, but heavy in different dimensions. So I guess the point I am trying to make is that you can say a system scales, but does it scale well in all dimensions, or can you just throw storage on it? The MaxxCAT is the only appliance that we know of that offers multiple clustering paradigms from a single kernel. And by the way, with the simple flick of a switch on one of the two administration screens I mentioned before, the clusters can be converted to H/A, with symmetric load balancing, automatic fault detection, recovery, and failover.

Where did the idea for the MaxxCat solution originate?

MaxxCat was inspired by the growing frustration with the intrinsic limitations of the GSA and Google Mini. We were hearing lamentations in the marketplace with respect to pricing, licensing, uptime, performance, and integration. So…we seized the opportunity to build a very fast, inexpensive enterprise search capability that was much more open and easier to integrate using the latest web technologies and general purpose hardware. Originally, we had conceived it as a single stand-alone appliance, but as we moved from alpha development to beta we realized that our core search kernel and algorithms would scale to much more complex computing topologies. This is why we began work on the clustering, H/A, and SAN interfaces that have resulted in the EX-5000 series of appliances.

What’s a use case for your system?

I am going to answer your question twice, for the same price. One of our customers had an application in which they had to continuously scan literally hundreds of millions of documents for certain phrases as part of a service that they were providing to their customers, and marry that data with a structured database. The solution they had in place before working with us was a cobbled-together mishmash of SQL databases, expensive server platforms, and proprietary software. They were using Microsoft SQL Server to do full text searching, which is a performance disaster. They had queries running on very high-end Dell quad-core servers maxed out with memory that were taking 22 hours to process. Our entry-level enterprise appliance is now doing those same queries in under 10 minutes, but the excitement doesn’t stop there. Because our architecture is so open, they were able to structure the output of the MaxxCAT into SQL statements that were fed back into their application and eliminate six pieces of hardware and two databases. And now, for the free, second answer. We are working with a consortium of publishers who all have very large volumes of data, but in widely varying formats, locations, and platforms. By using a MaxxCAT cluster, we are able to provide these customers, not customers from different divisions of the same company, but different companies, with unified access to their pooled data. So the benefits in both of these cases are performance, economy, time to market, and ease of implementation.

Where did the name “MaxxCat” come from?

There are (at least) three versions of the story, and I do not feel empowered to arbitrate between the factions. The acronym comes from Content Addressable Technology, an old CS/EE term. Most computer memories work by presenting the memory with an address, and the memory retrieves the content. Our system works in reverse: the system is presented with content, and the addresses are found. A rival group, consisting primarily of Guitar Hero players, claims that the name evokes a double-x fast version of the Unix ‘cat’ command (wouldn’t MaxxGrep have been more appropriate?). And the final faction, consisting primarily of our low-level programmers, claims that the name came from a very fast female cat named Max who sometimes shows up at our offices. I would make as many enemies as friends if I were to reveal my own leanings. Meow.

What’s the product line up today?

Our entry-level appliance is the SB-250, which starts at a price point of $1,995. It can handle up to several million web pages or documents, depending upon size. None of our appliances have artificial license restrictions based upon silly things like document counts. We then have six models of our EX-5000 enterprise appliances that are configured with ever-increasing numbers of nodes, storage, and throughput. We really try to understand a customer’s application before making a recommendation, and prefer to do proofs of concept with the customer’s actual data, because, as any good search practitioner can tell you, the devil is in the data.

What is the technical approach of your search and content processing system?

We are most concerned with performance, scalability, and ease of use. First of all, we try to keep things as simple as possible, and if complexity is necessary, we try to bury it in the appliance, rather than making the customer deal with it. A note on performance: our approach has been to start with general purpose hardware and a basic Linux configuration. We then threw out most of Linux and built our own operating system that attempts to take advantage of every small detail we know about search. A general purpose Linux machine has been designed to run databases, run graphics applications, handle network routing and sharing, interface with a wide range of devices, and so forth. It is sort of good at all of them, but not built from the ground up for any one of them. This fact is part of the beauty of building a hardware appliance dedicated to one function: we can throw out most of the operating system that does things like network routing, process scheduling, and user accounting, and make the hardware scream through only the things that are pertinent to search. We are also obsessive about what may seem to be picayune details to most other software developers. We have meetings where each line of code is reviewed and developers are berated for using one more byte or one more cycle than necessary. If you watch the picoseconds, the microseconds will take care of themselves.

A lot of our development methodology would be anathema to other software firms. We could not care less about portability or platform independence. Object oriented is a wonderful idea, unless it costs one extra byte or cycle. We literally have search algorithms so obscure that they take advantage of the endianness of the platform. When we want to do something fast, we go back to Knuth, Salton, and Hartmanis, rather than reading about the latest and greatest on the net. We are very focused on keeping things small, fast, and tight. If we have a choice between adding a feature or taking one out, it is nearly unanimous to take it out. We are all infected with the joy of making code fast and small. You might ask, “Isn’t that what optimizing compilers do?” You would be laughed out of our building. Optimizing compilers are not aware of the meta algorithms, the operating system threading, the file system structure, and the like. We consider an assembler a high level programming tool, sort of. Unlike Microsoft operating systems, which keep getting bigger and slower, we are on a quest to make ours smaller and faster. We are not satisfied yet, and maybe we won’t ever get there. Hardware is changing really fast too, so the opportunities continue.

How has the Google Search Appliance affected the market for your firm’s appliance?

I think that the marketing and demand generation done by Google for the GSA is helping to create demand and awareness for enterprise search, which helps us. Usually, especially on the higher end of the spectrum, people who are considering a GSA will shop around a little, or when they come back with the price tag, their boss will tell them, “What??? Shop this!” They are very happy when they find out about us. What we share with Google is a belief in box-based search (they advocate a totally closed black box; we have a gray-box philosophy where we hide what you don’t need to know about, but expose what you do). Both of our companies have realized the benefits of dedicating hardware to a special task using low cost, mass produced components to build a platform. Google offers massive brand awareness and a giant company (dare I say bureaucracy). We offer our customers a higher performing, lower cost, extensible platform that makes it very easy to do things that are very difficult with the Google Mini or GSA.

What hooks / services does your API offer?

Every function that is available from the browser-based user interface is exported through the API. In fact, our front end runs on top of the API, so customers who are so inclined can rewrite or reorganize the management console. Using the API, detailed machine status can be obtained. Things such as core temperature, queries per minute, available disk space, current crawl stats, errors, and console logs are all at the user’s fingertips. Furthermore, collections can be added, dropped, scheduled, and downloaded through the API. Our configuration and query languages are simple, text-based protocols, and users can use text editors or software to generate and manipulate the control structures. Don’t like how fast the MaxxCAT is crawling your intranet, or when? Control it with external scheduling software. We don’t want to build that and make you learn how to use it. Use Unix cron for that if that’s what you like and are used to. For security reasons, do you want to suspend query processing during non-business hours? No problem. Do it from a browser or do it from a mainframe.
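
To make the cron point concrete, here is a hedged sketch of a script that suspends and resumes query processing through an assumed API endpoint; an administrator could run it from Unix cron rather than from any scheduler built into the appliance. The URL and parameter names are invented for illustration.

```python
#!/usr/bin/env python3
# Hypothetical sketch: toggle query processing via an assumed appliance
# endpoint. The URL and parameter names are illustrative only.
#
# Example crontab entries to suspend queries at 7 PM and resume at 7 AM:
#   0 19 * * 1-5  /usr/local/bin/maxxcat_queries.py suspend
#   0 7  * * 1-5  /usr/local/bin/maxxcat_queries.py resume
import sys
import urllib.request

APPLIANCE = "http://maxxcat-appliance.example.com"

def set_query_state(state):
    """Ask the appliance to suspend or resume query processing."""
    url = f"{APPLIANCE}/api/query_processing?state={state}"
    with urllib.request.urlopen(url) as response:
        print(response.read().decode("utf-8"))

if __name__ == "__main__":
    set_query_state(sys.argv[1] if len(sys.argv) > 1 else "resume")
```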

We also offer a number of protocol connectors to talk to external systems: HTTP, HTTPS, NFS, FTP, and ODBC. We can import the most common document formats, and we provide a mechanism for customers to integrate additional format connectors. We have licensed a very slick technology for indexing ODBC databases. A template can be defined to generate pages from the database, and the template can be included in the MaxxCAT control file. When it is time to update, say, the invoice collection, the MaxxCAT can talk directly to the legacy system, pull the required records (or those that have changed, or any other SQL-selectable parameters), and format them as actual documents prior to indexing. This takes a lot of work off of the integration team. Databases are traditionally tricky to index, but we really like this solution.
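
The template language itself is not shown in the interview, so the following sketch only mimics the idea: select the changed rows from a database and render each one through a page template so it can be indexed as an ordinary document. The table, field names, and template format are invented, and standard-library SQLite stands in for the ODBC source.

```python
# Illustration only: SQLite stands in for an ODBC source, and the "template"
# is a plain Python format string rather than MaxxCAT's template language.
import sqlite3

TEMPLATE = """<html><head><title>Invoice {invoice_id}</title></head>
<body><h1>Invoice {invoice_id}</h1>
<p>Customer: {customer}</p><p>Amount: {amount}</p></body></html>"""

def rows_to_documents(db_path, since):
    """Yield (filename, html) pairs for invoice rows changed since a date."""
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row
    cur = conn.execute(
        "SELECT invoice_id, customer, amount FROM invoices WHERE updated_at > ?",
        (since,),
    )
    for row in cur:
        fields = {key: row[key] for key in row.keys()}
        # Each row becomes a crawlable document prior to indexing.
        yield f"invoice_{fields['invoice_id']}.html", TEMPLATE.format(**fields)
```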

With respect to customizing output, we emit a standard JSON object that contains the result and provide a simple templating language to format those results. If users want to integrate the results with SSIs or external systems, it is very straightforward to pass this data around and to manipulate it. This is one area where we excel against Google, which only provides a very clunky XML output format that is server based and hard to work with. Our appliance can literally become a subroutine in somebody else’s system.
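
The appliance’s JSON schema is not documented here, so the result object below is an invented example; the point is only to show how little code it takes to pass such output along to another system or render it through a simple template.

```python
# The result shape below is assumed for illustration; it is not the
# appliance's documented schema.
import json

sample_response = json.loads("""
{
  "query": "annual report",
  "total": 2,
  "results": [
    {"title": "2008 Annual Report", "url": "http://intranet/ar-2008.pdf", "score": 0.97},
    {"title": "2007 Annual Report", "url": "http://intranet/ar-2007.pdf", "score": 0.85}
  ]
}
""")

# A trivial "template": render each hit as a line another system could consume.
for hit in sample_response["results"]:
    print(f"{hit['score']:.2f}\t{hit['title']}\t{hit['url']}")
```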

What new features and functions have been added since the last point release of your product?

Our 3.2 OS (not yet released) will provide improved indexing performance, a handful of new API methods, and, most exciting for us, a template-based ODBC extractor that should make pulling data out of SQL databases a breeze for our customers. We also have toggle-switch H/A scheduled, but that may take a little more time to make it completely transparent to users.

Consolidation and vendors such as SurfRay going out of business seem to be a feature of the search sector. How will these business conditions affect your company?

Another strange thing about MaxxCAT, in addition to our iconoclastic development methods, is our capital structure. Unlike most technology companies, especially young ones, we live off of revenue, not equity infusions. And we carry no debt. So we are somewhat insulated from the current downturn in the capital markets, and intend to survive on customers, not investors. Our major focus is to make our appliances better and faster. Although we like to be involved in the evaluation process with our customers, in all but the most difficult cases we prefer to hand off the implementation to partners who are familiar with our capabilities and who can bring in-depth enterprise search know-how into the mix.

Where do I go to get more information?

Visit www.maxxcat.com or email sales@maxxcat.com.

Stephen Arnold, April 15, 2009

Bob Boiko, Exclusive Interview

April 9, 2009

The J Boye Conference will be held in Philadelphia, May 5 to May 7, 2009. Attendees can choose from a number of special interest tracks. These include strategy and governance, Intranet, Web content management, SharePoint, user experience, and eHealth. You can get more information about this conference here.

One of the featured speakers is Bob Boiko, author of Laughing at the CIO and a senior lecturer at the University of Washington iSchool. Peter Sejersen spoke with Mr. Boiko about the upcoming conference and information management today.


Why is it better to talk about “Information Management” than “Content Management”?

Content is just one kind of information. Document management, records management, asset management, and a host of other “managements,” including data management, all deal with other worthy forms of information. While the objects differ between managements (CM has content items, DM has files, and so on), the principles are the same. So why not unite as a discipline around information rather than fracture because you call them records and I call them assets?

Who should be responsible for the information management in the organization?

That’s a hard question to answer outside of a particular organizational context. I can’t tell you who should manage information in *your* organization. But it seems to me in general that we already have *Information* Technology groups and Chief *Information* Officers, so they would be a good place to start. The real question is whether the people with those titles are ready to embrace the full spectrum of activities that their titles imply.

What is your best advice to people working with information management?

Again, advice has to vary with the context. I’ve never found two organizations that needed the same specific advice. However, we can all benefit from this simple idea. If, as we all seem to believe, information has value, then our first requirement must be to find that value and figure out how to quantify it in terms of both user information needs and organizational goals.  Only then should we go on to building systems that move information from source to destination because only then will we know what the right sources and destinations are.

Your book “Laughing at the CIO” has a catchy title, but have you ever laughed at your CIO yourself?

I don’t, actually. But it is always amazing to me how many nervous (and not so nervous) snickers I hear when I say the title. The sad fact is that a lot of the people I interact with don’t see their leadership as relevant. Many (but definitely not all) IT leaders forget, or never knew, that there is an I to be led as well as a T. It’s not malicious; it has just never been their focus. I gave the book that title in an attempt to make it less ignorable to IT leaders. Once a leader (or would-be leader) picks the book up, I hope it helps them build a base of strength and power based on the strategic use of information as well as technology.

Why are you speaking at a Philadelphia web conference organized by a company based in Denmark?

Janus and his crew are dynamite organizers. They know how to make a conference much more than a series of speeches. They have been connecting professionals and leaders with each other and with global talent for a long time. Those Danes get it and they know how to get you to get it too.

Peter Sejersen, J Boye. April 9, 2009

Exclusive Interview with David Pogue

April 8, 2009

This year’s most exciting conference for online professionals in Philadelphia is now only four weeks away. In addition to top-notch speakers like David Pogue, the networking opportunities at a J. Boye conference are excellent.

One attendee said, “What I like about the J. Boye Conferences is that they bring together industry experts and practitioners over high-quality content that seems to push participants’ professional limits and gets everyone talking. So if you want to learn – but participate as well – consider joining us in Philadelphia this May.”

Instead of product pitches, the speakers at a J. Boye conference deliver substance. For example, among the newest confirmed case studies are Abercrombie & Fitch, Foreign Affairs and International Trade Canada, Pan American Health Organization, Hanley Wood and Oxford University (UK).

For a preview of what you will experience, here’s an exclusive interview with David Pogue, technology expert and New York Times journalist. Sign up here and secure one of the remaining seats.

Why is Google so much more used than its competitors?

Mostly because it’s better. Fast, good, idiotproof, uncluttered, ubiquitous. There’s also, at this point, a “McDonald’s factor” happening. That is, people know the experience, it’s the same everywhere they go, there’s no risk. They use Google because they’ve always used Google. It would be very hard, therefore, for any rival to gain traction.


David Pogue, one of the featured speakers at JBoye 09 in Philadelphia May 5 to 7, 2009.

When will Gmail become the preferred email solution for organizations?

August 3, 2014. But seriously, folks. Nobody can predict the future of technology. Also, I’m sure plenty of organizations use it already, and it’s only picking up steam. Gmail is becoming truly amazing.

Will Google buy Twitter – and what will it mean if they do?

I don’t know if they’ll buy it; nobody does. It would probably mean very little except guaranteed survival for Twitter, perhaps with enhancements along the way. That’s been Google’s pattern (for example, when it bought GrandCentral).

Why is it so hard for organizations to get a grip on user experience design?

The problems include lack of expertise, limited budget (there’s an incentive to do things cheaply rather than properly), and lack of vision. In other words, anything done by committee generally winds up less elegant than something done by a single, focused person who knows what he’s doing.

Why are you speaking at a Philadelphia web conference organized by a Denmark-based company?

Because they obviously have excellent taste. 🙂

Stephen Arnold, April 8, 2009

Adhere Solutions: Sticky Solutions and Connectors

March 31, 2009

I like Adhere Solutions’ software. I should. The company was conceived by my son, Erik S. Arnold. He once worked with the goslings, but he flew the coop to Chicago and now services clients worldwide with his sticky solutions and connectors technology. Stuart Schram IV, one of ArnoldIT’s top geese, interviewed Erik Arnold. The full text of the conversation appears below. After the interview, you can read the full text of the Adhere Solutions news release about its newest product.


Erik S. Arnold, Adhere Solutions. Quite Googley and reliable, I wish to add.

What’s an Adhere?

Adhere Solutions is a Google Enterprise Partner providing products and services that help businesses create solutions based on Google and other cloud computing technologies. We have an experienced team of consultants to help our customers leverage Google’s Enterprise products (Search, Maps, Apps) to create business applications that improve access to information, communication, and collaboration. Adhere will complement Google’s enterprise products with other software and services to meet clients’ needs. Using Google as a foundation delivers applications faster and cheaper than traditional enterprise software approaches, while making end users happy. Few managers understand how they can create high-end solutions leveraging Google technologies.

Why are you providing connectors?

Connectors are an important piece of the puzzle for taking advantage of Google technologies. For the GSA, they allow users to search across different sources of information inside an enterprise. I call the Google Search Appliance a “SaaS in the Box,” because you can do sophisticated things with it if you leverage its APIs. However, you do have to have a good deal of search expertise to use the advanced capabilities.
Adhere Solutions wants to make it easy for GSA customers to index their enterprise data, and our connectors bridge the gap between the GSA and internal content stored in databases, document management systems, etc. This approach is the same as that of other enterprise software solutions, but there customers are shielded from it by expensive professional services and setup fees. We want to educate the marketplace that they can use the GSA to perform these functions with connectors at a lower cost.

What’s a typical use case for your software?

Good question. I think that connectors in a search environment are easier to understand. We have a customer at a government agency that wishes to index a Documentum system with Google. Our connector extracts the data from Documentum, processes the data, and feeds it to the GSA. This process takes place on a server that sits outside of the GSA.
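
As a rough sketch of that extract, process, and feed pipeline (the Documentum extraction is stubbed out, and the feed format is heavily simplified; the endpoint, field names, and document structure are assumptions, not the real GSA feeds protocol or the connector’s internals):

```python
# Sketch of an extract -> process -> feed pipeline running outside the GSA.
# The extraction is a stub and the feed XML is heavily simplified; the
# endpoint, field names, and structure are assumptions.
import urllib.request

FEED_URL = "http://gsa.example.com:19900/xmlfeed"  # assumed appliance feed endpoint

def extract_from_documentum():
    """Stub standing in for pulling records out of the source repository."""
    return [{"url": "http://docs.example.com/policy-001",
             "title": "Policy 001",
             "body": "Full text extracted from the repository..."}]

def to_feed_xml(records):
    """Wrap each record in a simplified feed document."""
    items = "".join(
        f'<record url="{r["url"]}"><title>{r["title"]}</title>'
        f'<content>{r["body"]}</content></record>'
        for r in records
    )
    return f"<feed>{items}</feed>"

def push_to_gsa(xml):
    request = urllib.request.Request(
        FEED_URL, data=xml.encode("utf-8"), headers={"Content-Type": "text/xml"}
    )
    with urllib.request.urlopen(request) as response:
        return response.status

if __name__ == "__main__":
    push_to_gsa(to_feed_xml(extract_from_documentum()))
```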


A major reason for our investment in connectors, though, has to do with improvements to Google Apps. Google recently announced its visualization tools (http://googleenterprise.blogspot.com/2009/03/charts-charts-charts.html), so it is now possible to send selected enterprise data into Google Apps and have access to real-time visualization of your enterprise data. This, to me, is groundbreaking. I think it is a very cost-efficient way to create business intelligence applications in a Google interface.

Can you deliver custom connectors?

We can build custom connectors, but we tend to license our connectors from established software vendors. Connecting into enterprise systems is not new; it is just that, until now, no one has packaged a high-end connector suite for the GSA. For lack of a better term, Adhere Solutions is more of an integrator than a software company. We use existing high quality products whenever we can.

How does a connector differentiate you from other GSA specialists?

Adhere Solutions is unique in that everyone involved has many years of enterprise search experience. Our goal as a company is to introduce Google into higher end search procurements. While the Google Search Appliance is easy to get up and running, it is not uncommon to need help with basic search tasks. What is easy for Google is not easy for everyone. There are many fine GSA specialists who can help with basic setups, but we see ourselves as unique in delivering Google for high end solutions.

How do people reach you?

Write me: erik at adheresolutions dot com or call. Our number is 800 799 0520.

Here’s the full text of the Adhere Solutions news release:

Adhere Solutions Expands Its All Access Connector Suite For the Google Search Appliance to Include Enterprise Content Management Systems

Businesses now can provide employees greater access to enterprise data through the Google Search Appliance’s popular interface

Chicago, IL — March 31, 2009 — Today, Adhere Solutions, a certified Google Enterprise Partner, announced that its All Access Connector for the Google Search Appliance includes instant connectivity to over 30 popular enterprise content management systems, including EMC Documentum and eRoom, IBM FileNet, Lotus Notes, Interwoven TeamSite and WorkSite, Microsoft SharePoint, Open Text, Oracle Stellent, Xerox DocuShare, and many more.
Adhere Solutions’ connector suite for the Google Search Appliance allows users to find information stored in disparate data sources and applications with Google’s user interface. This relieves users from having to search separately within each application and information repository. The Google Search Appliance combined with the All Access Connector empowers companies to unify information access efficiently and helps users quickly find the information they need to perform their jobs effectively.

“Users don’t particularly know or care about the subtleties of universal search vs. federated search – their mission is not to search, but rather to find. They are also not terribly interested in knowing WHY they cannot search for certain information,” said Dan Keldsen, noted findability expert, Co-founder and Principal at Information Architected (www.InformationArchitected.com). “If factors in their findability frustrations have been because Google ‘couldn’t get there from here’ – the odds just significantly improved that the Google Search Appliance will be able to search across ALL of your information, rather than the ‘web native’ content Google is known for.”

Indexing connectors for enterprise content management systems are the newest addition to Adhere Solutions’ All Access Connector for the Google Search Appliance, which already includes federated search access to over 5,400 internal and external databases, repositories, subscription content sources, data feeds and business intelligence applications.  With this addition Adhere Solutions delivers a suite of secure connectors to reduce the complexity and cost of searching across enterprise data repositories.

“Many organizations struggle with how to unlock their data when they have multiple content and document management solutions dispersed throughout their organization,” said Erik Arnold, Co-founder and President of Adhere Solutions. “We want every manager, IT or otherwise, to know that we enable the Google Search Appliance to provide enterprise search better, cheaper, and faster than other approaches.”

About Adhere Solutions

Adhere Solutions is a Google Enterprise Partner providing products and services that help businesses increase productivity through the accelerated adoption of Google and other technologies. Adhere’s experienced team of consultants helps customers leverage Google’s Enterprise Search products, Google Maps, and Google Apps to create business applications that improve access to information, communication, and collaboration.
For more information on Adhere Solutions products or services visit the company’s Web site at www.adheresolutions.com or write info@adheresolutions.com.

###

If you would like more information on this topic, or to schedule an interview with Erik Arnold, please contact Amy DiNorscio at (312) 380-5772 or write to pr@adheresolutions.com.

Stuart Schram IV, March 31, 2009

EntropySoft: Exclusive Interview with Nicolas Maquaire, CEO

March 25, 2009

A search engine or content processing system is deaf and dumb without a connector to a content source. Most text processing systems include software connectors (sometimes called “filters” or “adapters”) to process flat text such as the ASCII generated by a simple text editor. But plain text makes up a small part of the content stored on an organization’s file servers, workstations, and computers. In order to index content from a legacy AS/400 system running the Ironsides enterprise resource planning system, a specialized software connector is required. Writing these connectors is tricky. EntropySoft is a content integration company. The firm has a strong competency in creating software to perform a range of content manipulations; for example, transforming an XML file into a file type required by another business process or enterprise system. Mr. Maquaire spoke with Stephen E. Arnold of ArnoldIT.com on March 24, 2009, about EntropySoft’s software and services.

Nicolas Maquaire, the chief executive officer of EntropySoft, described his company this way:

EntropySoft is a connector factory. We have more than 30 read/write connectors for unstructured data, possibly the biggest portfolio on the market. Our connectors enable most of the features of popular content-centric applications such as Alfresco, IBM FileNet P8, Hummingbird DM, Interwoven TeamSite, IBM Lotus Quickplace, Microsoft SharePoint, etc. The extensive support of features and the size of the connector portfolio make this technology perfect OEM material for many software industries. On top of the read/write connectors, EntropySoft has two technological layers (Content ETL and Content Federation) that are also available as OEM components.

A number of the world’s leading search and content processing companies use EntropySoft’s connectors. Examples include Coveo, Exalead, and Image Integration Systems.

Mr. Maquaire, in an exclusive interview with ArnoldIT.com’s Search Wizards Speak series, said:

The market for content integration is complex. Building a single connector for a specific use case seems nonsensical to us. If you develop many connectors, interoperability then becomes reality. Thanks to its more than 30 (and growing!) connectors, EntropySoft is becoming a one-stop-shopping point for connectivity and interoperability. For the past four years, EntropySoft has acquired valuable knowledge of all popular content-centric systems. EntropySoft connectors have been market-tested for years. They are put to work daily in critical business conditions, and EntropySoft’s unique in-house testing system allows fast implementation of customer-driven connector improvements.

You can read the full text of the Maquaire interview on the ArnoldIT.com Web site here. The interview is number 37 in this series. The interviews provide one of the most useful bodies of information about enterprise search and content processing available at this time. Search Wizards Speak is available as a service to organizations and information professionals worldwide. Knowledge about search and content processing increases the payoff from an investment in information retrieval.

Marc Krellenstein Interview: Inside Lucid Imagination

March 17, 2009

Open source search is gaining more and more attention. Marc Krellenstein, one of the founders of Lucid Imagination, a search and services firm, talked about the company’s technology with Stephen E. Arnold, ArnoldIT.com. Mr. Krellenstein was the innovator behind Northern Light’s search technology, and he served as the chief technical officer for Reed Elsevier, where he was responsible for search.

In an exclusive interview, Mr. Krellenstein said:

I started Lucid in August, 2007 together with three key Lucene/Solr core developers – Erik Hatcher, Grant Ingersoll and Yonik Seeley – and with the advice and support of Doug Cutting, the creator of Lucene, because I thought Lucene/Solr was the best search technology I’d seen. However, it lacked a real company that could provide the commercial-grade support and other services needed to realize its potential to be the most used search software (which is what you’d expect of software that is both the best core technology and free). I also wanted to continue to innovate in search, and believed it is easier and more productive to do so if you start with a high quality, open source engine and a large, active community of developers.

Mr. Krellenstein’s technical team gives the company solid open source DNA. With financial pressures increasing and many organizations expressing dissatisfaction with mainstream search solutions, Lucid Imagination may be poised to enjoy rapid growth.

Mr. Krellenstein added:

I think most search companies that fail do so because they don’t offer decisively better and affordable software than the competition and/or can’t provide high quality support and other services. We aim to provide both and believe we are already working with the best and most affordable software. Our revenue comes not only from services such as training but also from support contracts and from value-add software that makes deploying Lucene/Solr applications easier and makes the applications better.

You can read the full text of the interview on the ArnoldIT.com Web site here. Search Wizards Speak is a collection of 36 candid interviews with movers and shakers in search, content processing, and business intelligence. Instead of reading what consultants say about a company’s technology, read what the people who developed the search and content processing systems say about their systems. Interviews may be reprinted and distributed without charge. Attribution and a back link to ArnoldIT.com and the company whose executive is featured in the interview are required. Stephen E. Arnold provides these interviews as a service to those interested in information retrieval.

Stephen Arnold, March 17, 2009

EveryZing: Exclusive Interview with Tom Wilde, CEO

March 16, 2009

Tom Wilde, CEO of EveryZing, will be one of the speakers at the April 2009 Boston Search Engine Meeting. To meet innovators like Mr. Wilde, click here and reserve your space. Unlike “boat show” conferences that thrive on walk-in gawkers, the Boston Search Engine Meeting is all content muscle.

EveryZing is a “universal search and video SEO” (vSEO) firm, and it recently launched MediaCloud, the Internet’s first cloud-based computing service for generating and managing metadata. Considered the “currency” of multimedia content, metadata includes the speech transcripts, time-stamped tags, categories/topics, named entities, geo-location data, and tagged thumbnails that comprise the backbone of the interactive web.

With MediaCloud, companies across the Web can post live or archived feeds of video, audio, image and text content to the cloud-based service and receive back a rich set of metadata.  Prior to MediaCloud and the other solutions in EveryZing’s product suite — including ezSEARCH, ezSEO, MetaPlayer and RAMP — discovery and publishing of multimedia content had been restricted to the indexing of just titles and tags.  Delivered in a software-as-a-service package, MediaCloud requires no software to purchase, install or maintain.  Furthermore, customers only pay for the processing they need, while obtaining access to a service that has virtually unlimited scalability to handle even large content collections in near real-time. The company’s core intellectual property and capabilities include speech-to-text technology and natural language processing.
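
EveryZing’s response format is not published in this article, so the structure below is purely illustrative of the kind of metadata package described: a time-stamped transcript plus tags and entities for a submitted clip, with a helper that turns it into the jump-to points a video player could use.

```python
# Purely illustrative structure; MediaCloud's actual response format is not
# documented in this article.
clip_metadata = {
    "clip_id": "example-001",
    "transcript": [
        {"start": 12.4, "end": 15.1, "text": "welcome to the quarterly earnings call"},
        {"start": 95.0, "end": 98.7, "text": "revenue grew eight percent year over year"},
    ],
    "tags": ["earnings", "revenue"],
    "entities": [{"type": "ORG", "text": "Acme Corp"}],
    "geo": {"lat": 42.36, "lon": -71.06},
}

def jump_points(metadata, term):
    """Return playback offsets (seconds) where a term occurs in the transcript."""
    return [seg["start"] for seg in metadata["transcript"] if term in seg["text"]]

print(jump_points(clip_metadata, "revenue"))  # -> [95.0]
```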

Harry Collier (Infonortics Ltd) and I spoke with Mr. Wilde on March 12, 2009. The full text of our interview with him appears below.

Will you describe briefly your company and its search / content processing technology?

EveryZing originally spun out of BBN Technologies in Cambridge, MA. BBN was truly one of the godfathers of the Internet, and developed the email @ protocol among other breakthroughs. Over the last 20 years, the US Government has spent approximately $100 million with BBN on speech-to-text and natural language processing technologies. These technologies were spun out in 2006, and EveryZing was formed. EveryZing has developed a unique Media Merchandising Engine that is able to connect audio and video content across the web with the search economy. By generating high quality metadata from audio and video clips, processing it with our NLP technology to automatically “tag” the content, and pushing it through our turnkey publishing system, we are able to make this content discoverable across the major search engines.

What are the three major challenges you see in search / content processing in 2009?

1) Indexing and discovery of audio and video content in search; 2) deriving structured data from unstructured content; 3) creating better user experiences for search and navigation.

What is your approach to problem solving in search and content processing?

Well, yes, meaning that all three are critical. However, the key is to start with the user expectation. Users expect to be able to find all relevant content for a given key term from a single search box. This is generally known as “universal search.” This requires that all content formats can be easily indexed by the search engines, be they web search engines like Google or Yahoo or site search engines. Further, users want to be able to alternately search and browse content at will. These user expectations drive how we have developed and deployed our products. First, we have the best audio and video content processing in the world. This enables us to richly mark up these files and make them far more searchable. Second, our ability to auto-tag the content makes it eminently more browsable. Third, developing a video search result page that behaves just like a text result page (i.e., keyword in context, sortability, relevance tuning) means users can more easily navigate large video result sets. Finally, plumbing our metadata through the video player means users can search within videos and jump to the precise points in those videos that are relevant to their interests. Combining all of these efforts means we can deliver a great user experience, which in turn means more engagement and consumption for our publishing partners.

Search / content processing systems have been integrated into such diverse functions as business intelligence and customer support. Do you see search / content processing becoming increasingly integrated into enterprise applications?

Yes, absolutely. Enterprises are facing a growing pile of structured and unstructured content, as well as an explosion in multimedia content with the advent of telepresence, WebEx, videoconferencing, distance learning, etc. At the same time, they face increasing requirements around discovery and compliance that require them to be able to index all of this content. Search is rapidly gaining the same stature as databases and document management systems as a core platform.

Microsoft acquired Fast Search & Transfer. SAS acquired Teragram. Autonomy acquired Interwoven and Zantaz. In your opinion, will this consolidation create opportunities or shut doors?

Major companies are increasingly looking to vendors with deep pockets and bench strength around support and R&D.  This has driven some rapid market consolidation.  However, these firms are unlikely to be the innovators, and will continue to make acquisitions to broaden their offerings.  There is also a requirement to more deeply integrate search into the broader enterprise IT footprint, and this is also driving acquisitions.

Multi-core processors provide significant performance boosts. But search / content processing often faces bottlenecks and latency in indexing and query processing. What’s your view on the performance of your system or systems with which you are familiar?

Yes, CPU power has directly benefited search applications. In the case of EveryZing, our cloud architecture takes advantage of quad-core computing so we can deliver triple-threaded processing on each box. This enables us to create multiple quality-of-service tiers so we can optimize our system for latency or throughput, and do it on a customer-by-customer basis. This wouldn’t be possible without advances in computing power.

Graphical interfaces and portals (now called composite applications) are making a comeback. Semantic technology can make point and click interfaces more useful. What other uses of semantic technology do you see gaining significance in 2009?

Semantic analysis is core to our offering. Every clip we process is run through our NLP platform, which automatically extracts tags and key concepts. One of the great struggles publishers face today is having the resources to adequately tag and title all of their video assets. They are certainly aware of the importance of doing this, but are seeking more scalable approaches. Our system can use both unsupervised and supervised approaches to tagging content for customers.

Where can I find more information about your products, services, and research?

Our Web site is www.everyzing.com.

Harry Collier, Infonortics, Exclusive Interview

March 2, 2009

Editor’s Note: I spoke with Harry Collier on February 27, 2009, about the Boston Search Engine Meeting. The conference, more than a decade into in-depth explorations of search and content processing, is one of the most substantive search and content processing programs. The speakers have come from a range of information retrieval disciplines. The conference organizing committee has attracted speakers from the commercial and research sectors. Sales pitches and recycled product reviews are discouraged. Substantive presentations remain the backbone of the program. Conferences about search, search engine optimization, and Intranet search have proliferated in the last decade. Some of these shows focus on the “soft” topics in search and wrap the talks with golf outings and buzzwords. The attendee learns about “platinum sponsors” and can choose from sales pitches disguised as substantive presentations. The Infonortics search conference has remained sharply focused and content centric. One attendee told me last year, “I have to think about what I have learned. A number of speakers were quite happy to include equations in their talks.” Yep, equations. Facts. Thought-provoking presentations. I still recall the tough questions posed to Larry Page (Google) after his talk at the 1999 conference. He argued that truncation was not necessary, and several in attendance did not agree with him. Google has since implemented truncation. Financial pressures have forced some organizers to cancel some of their 2009 information-centric shows: Gartner, the Magazine Publishers Association, and the Newspaper Publishers Association, to name three. Infonortics continues to thrive with its reputation for delivering content plus an opportunity to meet some of the most influential individuals in the information retrieval business. You can learn more about Infonortics here. The full text of the interview with Mr. Collier, who resides in the Cotswolds with an office in Tetbury, Gloucestershire, appears below:

Why did you start the Search Engine Meeting? How is it different from other search and SEO conferences?

The Search Engine Meeting grew out of a successful ASIDIC meeting held in Albuquerque in March 1994. The program was organized by Everett Brenner and, to everyone’s surprise, that meeting attracted record numbers of attendees. Ev was enthusiastic about continuing the meeting idea, and when Ev was enthusiastic he soon had you on board. So Infonortics agreed to take up the Search Engine Meeting concept and we did two meetings in Bath in England in 1997 and 1998, then moved thereafter to Boston (with an excursion to San Francisco in 2002 and to The Netherlands in 2004). Ev set the tone of the meetings: we wanted serious talks on serious search domain challenges. The first meeting in Bath already featured top speakers from organizations such as WebCrawler, Lycos, InfoSeek, IBM, PLS, Autonomy, Semio, Excalibur, NIST/TREC and Claritech. And ever since we have tried to avoid areas such as SEO and product puffs and to keep to the path of meaty, research talks for either search engine developers, or those in an enterprise environment charged with implementing search technology. The meetings tread a line between academic research meetings (lots of equations) and popular search engine optimization meetings (lots of commercial exhibits).


Pictured from the left: Anne Girard, Harry Collier, and Joan Brenner, wife of Ev Brenner, who chaired the first conference in 1997. Each year the best presentation at the conference is recognized with the Evvie, an award named in his honor.

There’s a great deal of confusion about the meaning of the word “search.” What’s the scope of the definition for this year’s program?

Yes, “search” is a meaty term. When you step back, searching, looking for things, seeking, hoping to find, hunting, etc., are basic activities for human beings, be it seeking peace, searching for true love, trying to find an appropriate carburetor for an old vehicle, or whatever. We tend now to have a fairly catholic definition of what we include in a Search Engine Meeting. Search, and the problems of search, remains central, but we are also interested in areas such as data or text mining (extracting sense from masses of data) as well as visualization and analysis (making search results understandable and useful). We feel the center of attention is moving away from “can I retrieve all the data?” to “how can I find help in making sense out of all the data I am retrieving?”

Over the years, your conference has featured big companies like Autonomy, start-ups like Google in 1999, and experts from very specialized fields such as Dr. David Evans and Dr. Liz Liddy. What pulls speakers to this conference?

We tend to get some of the good speakers, and most past and current luminaries have mounted the speakers’ podium of the Search Engine Meeting at one time or another. These people see us as a serious meeting where they will meet high quality professional search people. It’s a meeting without too much razzmatazz; we only have a small, informal exhibition, no real sponsorship, and we try to downplay the commercialized side of the search world. So we attract a certain class of person, and these people like finding each other at a smaller, more boutique-type meeting. We select good-quality venues (which is one reason we have stayed with the Fairmont Copley Plaza in Boston for many years), we finance and offer good lunches and a mixer cocktail, and we select meeting rooms that are ideal for an event of 150 or so people. It all helps networking and making contacts.

What kind of people should attend this conference? Is it for scientists, entrepreneurs, or marketing people?

Our attendees usually break down into around 50 percent people working in the search engine field and 50 percent those charged with implementing enterprise search. Because of Infonortics’ international background, we have a pretty high international attendance compared with most meetings in the United States: many Europeans, Koreans, and other Asians. I’ve already used the word “serious,” but this is how I would characterize our typical attendee. They take lots of notes; they listen; they ask interesting questions. We don’t get many academics; Ev Brenner was always scandalized that not one person from MIT had ever attended the meeting in Boston. (That has not changed up until now.)

You have the reputation for delivering a content rich program. Who assisted you with the program this year? What are the credentials of these advisor colleagues?

I like to work with people I know, with people who have a good track record. So ever since the first Infonortics Search Engine Meeting in 1997 we have relied upon the advice of people such as you, David Evans (who spoke at the very first Bath meeting), Liz Liddy (Syracuse University), and Susan Feldman (IDC). And over the past nine years or so my close associate, Anne Girard, has provided non-stop research and intelligence as to what is topical, who is up-and-coming, and who can talk on what. These five people are steeped in the past, present, and future of the whole world of search and information retrieval and bring a welcome sense of perspective to what we do. And, until his much lamented death in January 2006, Ev Brenner was a pillar of strength, tough-minded and with a 45-year track record in the information retrieval area.

Where can readers get more information about the conference?

The Infonortics Web site (www.infonortics.eu) provides one-click access to the Search Engine Meeting section, with details of the current program, access to pdf versions of presentations from previous years, conference booking form and details, the hotel booking form, etc.

Stephen Arnold, March 2, 2009

Attivio’s Sid Probstein: An Exclusive Interview

February 25, 2009

I caught up with Sid Probstein, Attivio’s engaging chief technologist, on February 23, 2009. Attivio is a new breed of information company. The company combines a number of technologies to allow its licensees to extract more value from structured and unstructured information. Mr. Probstein is one of the speakers at the Boston Search Engine Meeting, a show that is now recognized as one of the most important venues for those serious about search, information retrieval, and content processing. You can register to attend this year’s conference here. Too many conferences feature confusing multi-track programs, cavernous exhibit halls, and annoyed attendees who find that the substance of the program does not match the marketing hyperbole. When you attend the Boston Search Engine Meeting, you have opportunities to talk directly to influential experts like Mr. Probstein. The full text of the interview appears below.

Will you describe briefly your company and its search / content processing technology? If you are not a company, please, describe your research in search / content processing.

Attivio’s Active Intelligence Engine (AIE) is powering today’s critical business solutions with a completely new approach to unifying information access. AIE supports querying with the precision of SQL and the fuzziness of full-text search. Our patent-applied-for query-side JOIN() operator allows relational data to be manipulated as a database would manipulate it, but in combination with full-text operations like fuzzy search, fielded search, Boolean search, etc. Finally, our ability to save any query as an alert, and thereafter have new data trigger a workflow that may notify a user or update another system, brings a sorely needed “active” component to information access.
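
AIE’s actual query syntax is not shown in the interview; the snippet below uses an invented, SQL-flavored syntax purely to illustrate the idea of a relational join combined with fuzzy full-text conditions in a single query.

```python
# Conceptual sketch only: the query string uses an invented syntax to show a
# relational JOIN combined with fuzzy full-text operators. It is not Attivio's
# real query language.
EXAMPLE_QUERY = """
SELECT customer.name, ticket.subject
FROM customer JOIN ticket ON customer.id = ticket.customer_id
WHERE FULLTEXT(ticket.body, 'billing~ OR refund*')   -- fuzzy and wildcard terms
  AND customer.region = 'EMEA'
"""

def submit(query_text):
    """Stub standing in for posting the query to an information access engine."""
    print("would submit:", query_text)

submit(EXAMPLE_QUERY)
```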

By extending enterprise search capabilities across documents, data, and media, AIE brings deeper insight to business applications and Web sites. AIE’s flexible design enables business and technology leaders to speed innovation through rapid prototyping and deployment, which dramatically lowers risk, an important consideration in today’s economy. Systems integrators, independent software vendors, corporations, and government agencies partner with Attivio to automate information-driven processes and gain competitive advantage.

What are the three major challenges you see in search / content processing in 2009?

May I offer three plus a bonus challenge?

First, understanding structured and unstructured data; currently most search engines don’t deal with structured data as it exists; they remove or require removal of the relationships. Retaining these relationships is the key challenge and a core value of information access.

Second, switching from the “pull” model in which end-users consume information, to the “push” model in which end-users and information systems are fed a stream of relevant information and analysis.

Third, being able to easily and rapidly construct information access applications. The year-long implementation cycle simply won’t cut it in the current climate; after all, that was the status quo for the past five years: long, challenging implementations, as search was still nascent. In 2009 what took months should take weeks. Also, the model has to change. Instead of trying to determine exactly how to build your information access strategy (the classic “aim, fire” approach, which often misses!), the new model is to “fire” and then “aim, aim, aim”: correct your course and learn as you go so that you ultimately produce an application you are delighted with.

I also want to mention supporting complex analysis and enrichment of many different forms of content, for example, identifying fields that are important from a search perspective, and detecting relationships between pieces of content, or entire silos of content. This is key to breaking down silos, something leading analysts agree will be a major focus in enterprise IT starting in 2011.

With search / content processing decades old, what have been the principal barriers to resolving these challenges in the past?

There are several hurdles. First, the inverted index structure has not traditionally been able to deal with relationships, just terms and documents. Second, the lack of tools to move data around, as opposed to simply obtaining content, has been a barrier for enterprise search in particular. There has not been an analog to “ETL” in the unstructured world. (The “connector” standard is about getting data, not moving it.) Finally, the lack of a truly dynamic architecture has meant having to re-index when changing configuration or adding new types of data to the index, and a lack of support for rapid updates has led to a proliferation of paired search engines and databases.

With the rapid change in the business climate, how will the increasing financial pressure on information technology affect search / content processing?

Information access is critically important during a recession. Every interaction with the customer has the potential to cause churn. Reducing churn is less costly by far than acquiring new customers. Good service is one of the keys to retaining customers, and a typical cause of poor service is … poor information access. A real life example: I recently rolled over my 401(k). I had 30 days to do it, and did it on the 28th day via phone. On the 29th day someone else from my financial services firm called back and asked me if I wanted to roll my 401(k) over. This was quite surprising. When asked why the representative didn’t know I had done it the day before, they said, “I don’t have access to that information.” The cost of that information access problem was two phone calls: the second rollover call, and then another call back from me to verify that I had, in fact, rolled over my 401(k).

From the internal perspective of IT, demand to turn around information access solutions will be higher than ever. The need to show progress quickly has never been greater, so selecting tools that support rapid development via iteration and prototyping is critically important.

Search / content processing systems have been integrated into such diverse functions as business intelligence and customer support. Do you see search / content processing becoming increasingly integrated into enterprise applications?

Search is an essential feature in almost every application used to create, manage or even analyze content. However, in this mode search is both a commodity and a de facto silo of data. Standalone search and content processing will still be important because it is the best way to build applications using data across these silos. A good example here is what we call the Agile Content Network (ACN). Every content management system (CMS) has at least minimal search facilities. But how can a content provider create new channels and micro-sites of content across many incompatible CMSs? Standalone information access that can cut across silos is the answer.

Google has disrupted certain enterprise search markets with its appliance solution. The Google brand creates the idea in the minds of some procurement teams and purchasing agents that Google is the only or preferred search solution. What can a vendor do to adapt to this Google effect?

It is certainly true that Google has a powerful brand. However, vendors must promote transparency and help educate buyers so that they realize, on their own, the fit or non-fit of the GSA. It is also important to explain how your product differs from what Google does and how those differences apply to the customer’s needs for accessing information. Buyers are smart, and the challenge for vendors is to communicate and educate about needs, goals and the most effective way to attain them.

A good example of the Google brand blinding customers to their own needs is detailed in the following blog entry: http://www.attivio.com/attivio/blog/317-report-from-gilbane-2008-our-take-on-open-source-search.html

As you look forward, what are some new features / issues that you think will become more important in 2009? Where do you see a major break-through over the next 36 months?

I think that there continue to be no real standards around information access. We believe that older standards like SQL need to be updated with full-text capabilities. Legacy enterprise search vendors have traditionally focused on proprietary interfaces or driving their own standards. This will not be the case for the next wave of information access companies. Google and others are showing how powerful language modeling can be. I believe machine translation and various multi-word applications will all become part of the landscape in the next 36 months.

Mobile search is emerging as an important branch of search / content processing. Mobile search, however, imposes some limitations on presentation and query submission. What are your views of mobile search’s impact on more traditional enterprise search / content processing?

Mobile information access is definitely emerging in the enterprise. In the short term, it needs to become the instrument by which some updates are delivered – as alerts – and in other cases it is simply a notification that a more complex update – perhaps requiring a laptop – is available. In time mobile devices will be able to enrich results on their own. The iPhone, for example, could filter results using GPS location. The iPhone also shows that complex presentations are increasingly possible.
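
As a hedged illustration of that GPS point (the field names and coordinates below are invented for the example), a device could post-filter an ordinary result list by distance from the user’s current location:

```python
# Sketch of GPS-based result filtering on a mobile device.
from math import radians, sin, cos, asin, sqrt

def km_between(lat1, lon1, lat2, lon2):
    """Great-circle distance in km via the haversine formula."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

def nearby(results, here, radius_km=2):
    """Keep only results within radius_km of the user's location."""
    lat, lon = here
    return [r for r in results
            if km_between(lat, lon, r["lat"], r["lon"]) <= radius_km]

results = [
    {"title": "Back Bay branch", "lat": 42.3503, "lon": -71.0810},
    {"title": "Cambridge kiosk", "lat": 42.3736, "lon": -71.1097},
]
# Only the nearby branch survives the filter.
print(nearby(results, here=(42.3499, -71.0780)))
```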

Ultimately, the mobile device, like the desktop, the call center, the digital home, and the brick-and-mortar store kiosk, is just another access and delivery channel. Getting the information flow for each to work consistently while taking advantage of the intimacy of the medium (e.g. GPS information for mobile) is the future.

Where can I find more information about your products, services, and research?

The best place is our Web site: www.attivio.com.

Stephen Arnold, February 25, 2009

Exclusive Interview, Martin Baumgartel, From Library Automation to Search

February 23, 2009

For many years, Martin Baumgartel worked for a unit of T-Mobile. His experience spans traditional information retrieval and next-generation search. Stephen Arnold and Harry Collier interviewed Mr. Baumgartel on February 20, 2009. Mr. Baumgartel is one of the featured speakers at the premier search conference this spring, where you will be able to hear his lecture and meet with him during the networking and post-presentation breaks. The Boston Search Engine Meeting attracts the world’s brightest minds and most influential companies to an “all content” program. You can learn more about the conference, the tutorials, and the speakers at the Infonortics Ltd. Web site. Unlike other conferences, the Boston Search Engine Meeting limits attendance in order to facilitate conversations and networking. Register early for this year’s conference.

What’s your background in search?

When I entered the search arena in the 1990s, I came from library automation. Back then, it was all about indexing algorithms and relevance ranking, and I did research to develop a search engine. During eight years at T-Systems, we analyzed the situation in large enterprises in order to provide the right search solution. This included, increasingly, the integration of semantic technologies. Given the present hype about semantic technologies, a focus of current projects has been to determine which approach or product can deliver in specific search scenarios. A related problem is to identify the underlying principles of user interface innovations in order to know what’s going to work (and what’s not).

What are the three major challenges you see in search / content processing in 2009?

Let me come at this in a non-technical way. There are plenty of challenges awaiting algorithmic solutions, but I see more important challenges here:

  1. Identifying the real objectives and fighting myths. Implementing internal search has not become any easier for an organization today. There are numerous internal stakeholders, paired with very high user expectations (they want the same quality as Internet search, only better, more tailored to their work situation, and without advertising…). Keeping the analysis sharp becomes difficult in an orchestra of opinions, in particular when familiar brand names get involved (“Let’s just take Google internally; that will do.”)
  2. Avoiding false simplicity. Although many CIOs claim they have “cleaned up” their intranets, enterprise search remains complex, both technologically and in terms of successful management. Tackling the problem with a self-proclaimed simple solution (plug in, ready, go) will provide search, but perhaps not the search solution needed, and with hidden costs, especially in the long run. At the other extreme, a design that is too complex – with the purchase of dozens of connectors – is likely to burst your budget.
  3. Attention. Recently, I have heard a lot about how the financial crisis will affect search. In my view, the effects only reinforce the challenge of drawing enough management attention to search to make sure it is treated like other core assets. Some customers might slow down the purchase of SAP add-on modules or postpone a migration to the next version of their backup software, but the status of those solutions among CIOs will remain high and unquestioned.

With search / content processing decades old, what have been the principal barriers to resolving these challenges in the past?

There’s no unique definition of the “enterprise search problem” as there would be for a math theorem. Therefore, you find somewhat amorphous definitions of what is to be solved. Take the scope of content to be searched: everything internal? And nothing external? Another obstacle is the widespread belief in shortcuts. A popular example: let’s just index the content in our internal content management system; the other content sources are irrelevant. That way, the concept of completeness in the search result set is sacrificed. But search can be as grueling as a marathon: you need endurance and there are no shortcuts. If you take a shortcut, you’ve failed.

What is your approach to problem solving in search and content processing?

Smarter software, definitely, because the challenges in search (and there are more than three) are attracting programmers and innovators to come up with new solutions. But, in general, my approach is to keep my cool: assess the situation, analyze the tools and environment, design the solution, and explain it clearly. In the process, interfaces sometimes have to be improved in order to trim them down to fit the corporate intranet design.

With the rapid change in the business climate, how will the increasing financial pressure on information technology affect search / content processing?

We’ll see how far the consolidation process goes. Perhaps we’ll see discontinued search products where we initially didn’t expect it. Also, the relationship asked about in the next question might be affected: software companies are unlikely to cut back on the core features of their products, but integrated search functions may well be identified for the scalpel.

Search / content processing systems have been integrated into such diverse functions as business intelligence and customer support. Do you see search / content processing becoming increasingly integrated into enterprise applications?

I’ve seen it the other way around: customer support managers told me (the search person) that the built-in search tool is OK, but that they would like to look up additional information from some other internal applications. I don’t believe that built-in search will replace standalone search. The term “built-in” tells you that the main purpose of the application is something else. No surprise, then, that the user interface was designed for that main purpose and, as a consequence, will not address the typical needs of search.

Google has disrupted certain enterprise search markets with its appliance solution. What can a vendor do to adapt to this Google effect?

To address this Google effect, a vendor should point out where it differs from Google and why those differences matter.

But I see Google as a significant player in enterprise search, if only for the mindset of procurement teams you describe in your question.

As you look forward, what are some new features / issues that you think will become more important in 2009?

The issue of cloudsourcing will gain traction. As a consequence, small and medium-sized enterprises (and not only they) will discover that they need not invest in in-house content management and collaboration applications but can use a hosted service instead. This is when you need more than a “behind the firewall” search, because content will be scattered across multiple clouds (a CRM cloud, an Office cloud). I’m not sure whether we will see a breakthrough there within 36 months, but the sooner the better.
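
A rough sketch of what search across multiple clouds implies (the source functions below are stand-ins, not real vendor APIs): fan the query out to each hosted service, then merge and de-duplicate the results:

```python
# Federated search sketch: query several hosted sources and merge hits.

def search_crm_cloud(query):
    # Stand-in for a hosted CRM service's search API.
    return [{"source": "crm", "id": "c-9", "title": f"Account note about {query}"}]

def search_office_cloud(query):
    # Stand-in for a hosted office/document service's search API.
    return [{"source": "office", "id": "o-3", "title": f"Spreadsheet mentioning {query}"}]

SOURCES = [search_crm_cloud, search_office_cloud]

def federated_search(query):
    merged, seen = [], set()
    for source in SOURCES:
        for hit in source(query):
            key = (hit["source"], hit["id"])
            if key not in seen:          # de-duplicate across sources
                seen.add(key)
                merged.append(hit)
    return merged

print(federated_search("renewal"))
```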

Where can I find more information about your services and research?

http://www.linkedin.com/in/mbaumgartel

Stephen E. Arnold, www.arnoldit.com/sitemap.html and Harry Collier, www.infonortics.com
