SearchBlox: Built on Apache Lucene

December 16, 2010

One of my two or three readers sent me a snippet of information from a LinkedIn user group. The write up referenced a search system from SearchBlox Software. As you may know, ArnoldIT.com contributed to the Lucene Revolution conference in October 2010. That for-fee job provided us with quite a bit of insight into the open source software world in general, and the open source search world, in particular.

SearchBlox makes version 6.1 of its “out-of-the-box enterprise search solution” available via a download link on the company’s Web site. You can get your copy at this link. I found the direct download very refreshing. In the last six months I have noticed that it is tough to locate a download link on some sites. Sure, there are links, but these often pitch software and services in which I have zero interest.

SearchBlox gets a big attaboy for its approach.

The system is available for Windows, Unix, and Mac OS X. There is a cloud version available, which I find particularly interesting. A number of search vendors “talk” about the cloud, but some of the companies’ products were conceptualized in the mid 1990s. Talk does not mean that the cloud implementations are ready for prime time. SearchBlox seems fine. The product can be configured to do Web site search, Intranet search, and eDiscovery. My interest is enterprise search, and I will focus my comments on that feature. If you want to explore the other two uses, have at it.

The administrative tools are clear and comparatively easy to use. I am not sure my lawyer could get the system up and running, but for us, no problems. The administrative interface looks like this:

The features include:

Integrated crawlers. The system can handle filesystems, RSS feeds, assorted file types, and most Internet content. If you want more connectors, you can contact the company or contact one of the connector vendors. If you are a clever lad or lass, you can code your own. Reverse engineering connectors for certain file types may require permission from the vendor using the proprietary file types.
Multi-lingual support. This is important but most US organizations are quite happy with English with other languages supported when constituents, partners, or customers complain.
Enterprise support. This is where SearchBlox hopes to establish a relationship with a firm. Like other open source vendors, fees are charged for technical, professional, and engineering work. The base fee is $5,000, but I suggest checking with SearchBlox to get the pricing estimate for the services and support you specifically require. Cloud costs have to be worked up as a price quote which is becoming standard at many firms working with Amazon.

The company provides a chunk of info for developers and some documentation. The blog featured a price comparison with the Google Mini at http://www.searchblox.com/blog-2. The pricing of the Google Search Appliance could be misleading, however. To get a Google Search Appliance able to process 30 million documents can be expensive. You are looking at six figures. Hot fail over devices for the GSA add to the cost.

The company is based in Richmond, Virginia, which is not too far from Washington, DC. The company says, “Over 300 customers in 30 countries use SearchBlox to power their website, intranet and custom search. SearchBlox Software, Inc. was founded in 2003.”

As with other open source software, assess your technical expertise before diving into Lucene/Solr waters.

Worth a look.

Stephen E Arnold, December 16, 2010

Freebie

Written by Stephen E. Arnold · Filed Under News, Open source, Search, Technology, Text processing | 2 Comments

Quote to Note: Netflix Is Albania

December 16, 2010

I saw this quote in my hard copy of the dear old New York Times. The reference page is Section B1 (National Edition) Business page. The article with the alleged statement is “Time Warner Views Netflix As a Fading Star.”

Here’s the alleged quote, attributed to Jeffrey Bewkes, Time Warner executive:

“It’s a little bit like, is the Albanian army going to take over the world. I don’t think so.”

The mystery pronoun “it” refers to the success of Netflix, the streaming video service that recently put its goodies in the hands of Amazon’s cloud system. Yep, that’s the company whose cloud service went offline recently. The “take over the world” is ambiguous, but I interpreted the phrase to mean that Netflix (Albania) would not be able to control the real media industry.

I didn’t see a reference in the story to Apple, whose online system, embodied in hardware, software, and iTunes, has had a reasonably significant impact on the music sector.

I really admire the metaphor. Netflix as Albania. That’s one place where I found the immigration procedure quite interesting. Lovely yard care in the smaller cities’ residential neighborhoods as well.

Stephen E Arnold, December 16, 2010

Freebie

Written by Stephen E. Arnold · Filed Under Business strategy, News, Quotation, Rich media, Technology | Comments Off on Quote to Note: Netflix Is Albania

Leaks Becoming a River

December 16, 2010

“Openleaks Set to Rival WikiLeaks for Business” announces that one of WikiLeaks’ former employees is opening a new, rival company. In sum: “Openleaks will be a ‘service provider for third parties that want to be able to accept material from anonymous sources’ and will be based in Germany.” The third party aspect distinguishes it from WikiLeaks: Openleaks will be an intermediary and will not host the information for the public. As these types of sites increase, governments are finding that the ability to gather electronic information is a two-way street: they can gather information on citizens, but citizens also can find ways to gather it themselves. And with the lack of current laws for adequately prosecuting Julian Assange, these kinds of leaks are not likely to be dammed up any time soon.

Alice Wasielewski, December 16, 2010

Freebie

Written by Stephen E. Arnold · Filed Under Business strategy, Government, Legal matters, News, Online (general), Publishing, Search | 1 Comment

Ant Tech: Not So New

December 16, 2010

Short honk: “Next Generation of Algorithms Inspired by Problem-Solving Ants” talked about swarming algorithms and hungry ants. If you are interested in swarming ants, you will find that ants are clever beasties. The write up reminded me of the Inferno search method, developed by NuTech Solutions. I did some work for the company eight years ago. The NuTech method involved swarming algorithms applied to search and retrieval. Just wanted to remind my two or three readers that information that seems so fresh and novel is often not that. NuTech was realigned and the Inferno search system shelved. The math wizard behind the product moved to Australia. Inferno did not need live ants, just algorithms that implemented certain numerical recipes based in part on what is called mereology.

Stephen E Arnold, December 16, 2010

Freebie

Written by Stephen E. Arnold · Filed Under Analytics, News, Search, Technology, Text processing | Comments Off on Ant Tech: Not So New

OCLC-SkyRiver Dust Up

December 16, 2010

In the excitement of the i2 Ltd. legal action against Palantir, I put the OCLC – SkyRiver legal hassle aside. I was reminded of the library wrestling match when I read “SkyRiver Challenges OCLC as Newest LC Authority Records Node.” I don’t do too much in libraries at this time. But OCLC is a familiar name to me; SkyRiver not so much. The original article about the legal issue appeared in Library Journal on July 29, 2010, “SkyRiver and Innovative Interfaces File Major Antitrust Lawsuit against OCLC.” Libraries are mostly about information access. Search would not have become the core function if it had not been for libraries’ early adoption of online services and their making online access available to patrons. In the days before the wild and wooly Web, libraries were harbingers of the revolution in research.

Legal battles are not unknown in the staid world of research, library services, and traditional indexing and content processing activities. But a fight between a household name like OCLC and a company with which I had modest familiarity is news.

Here’s the key passage from the Library Journal write up:

Bibliographic services company SkyRiver Technology Solutions recently announced that it had become an official node of the Name Authority Cooperative Program (NACO), part of the Library of Congress’s (LC) Program for Cooperative Cataloging. It’s the first private company to provide this service, which was already provided by the nonprofit OCLC—SkyRiver’s much larger competitor in the bibliographic services field—and the British Library. Previously, many institutions have submitted their name authority records via OCLC. But SkyRiver’s new status as a NACO node allows it to provide the service, once exclusive to OCLC in the United States, to its users directly.

For me, this is a poke in the eye for OCLC, an outfit that used me on a couple of projects when General K. Wayne Smith was running a very tight operation. I don’t know how management works at OCLC, but I think any action by the Library of Congress is going to trigger some meetings.

SkyRiver sees OCLC as acting in a non-competitive way. Now the Library of Congress has blown a kiss at SkyRiver. Looks like the library landscape, already ravaged by budget bulldozers, may be undergoing another change. I think the outline of the mountain range where the work is underway spells out the word “Monopoly.” Nah, probably my imagination.

Stephen E Arnold, December 16, 2010

Freebie

Written by Stephen E. Arnold · Filed Under Business strategy, Indexing, Legal matters, Library automation, News, Search | Comments Off on OCLC-SkyRiver Dust Up

Yolink from TigerLogic

December 16, 2010

TigerLogic offers a number of data and content solutions. The company (originally named Blyth Holdings, then Omnis Technology, and then Raining Data) uses proprietary methods to normalize data. The company refers to its method as Pick Universal Data Model (Pick UDM). The Pick UDM is a component across the XDMS and MDMS product lines. The approach looks similar to those used by other XML-centric transformation and access methods.

The company’s newest product is a Facebook user’s solution to the problem of aggregating FB content in one display. PostPost, a real-time Facebook newspaper, is described this way on the TigerLogic Web site:

PostPost enables users to quickly skim relevant passages of text shared by their Facebook friends and sort shared content by type. To access PostPost, users simply login using Facebook Connect, and in a matter of seconds, all shared links, pictures, videos, articles from their Facebook friends will populate the front page of their personal paper.

You can see a video and obtain more information at www.postpost.com.


We learned about the firm’s Yolink product this summer. Yolink extracts information from behind links and inside documents. On the Yolink Web site, you can see examples of outputs from the system. The content sources include Craigslist, Google Patent Search, and Wikipedia.

Wikipedia included this comment sourced from CNet.com:

Yolink searches within the pages of your engine’s results to find your search terms in context. Go beyond the links. Search Web pages and discover information conventional search tools may have never revealed. In addition to mining content on a webpage, yolink will mine all of the links on that page for information relevant to your search. Yolink highlights information in the context of its original Web page and on the right side of your browser. Eliminating the need to bounce between multiple windows. Share your findings effortlessly by clicking on the save and share link. An email message containing your valuable information and the original Web page address is instantly created and ready to send, or save in folders for future use. Go beyond conventional search and find commands. Yolink allows you to search lengthy reference manuals, PDFs, legal documents, contracts, and news sites quickly and effortlessly. Yolink is especially helpful with a multi-word search, because it can extract all of the relevant content surrounding any of your search terms and display it all at once.

Yolink is a unit of TigerLogic. The company develops software and solutions for creating and improving software applications. In addition to Yolink, the company offers XML Data Management Servers (XDMS), Multidimensional Database Management Systems (MDMS) and Rapid Application Development (RAD) software tools.

We think that the emergence of Facebook centric content aggregation tools is an interesting development. Search without navigating to a FB page is part of the “search without search” shift some vendors are advocating.

Stephen E Arnold, December 16, 2010

Freebie

Written by Stephen E. Arnold · Filed Under Facebook, News, Online (general), Search, Social | Comments Off on Yolink from TigerLogic

Microsoft Wins USDA Deal

December 15, 2010

Short honk: Not too exciting for the GOOG. Microsoft allegedly won a cloud services deal for the US Department of Agriculture. You can get the details in “Cloud Power or Cloud Compromises: You Choose.” Let’s assume the news item is accurate.

Is it time for Google to reassess its approach to selling to the US government?

Questions to answer include:

What’s Microsoft doing that Google is not?
Is Google’s reputation creating an invisible “shields up” for procurement teams?
Does taking legal action against US government procurements generate apprehension?
Does the US procurement method stress Google’s ability to follow directions no matter how addled?

I don’t know the answers to these questions. Someone may want to tackle them.

Stephen E Arnold, December 15, 2010

Freebie

Written by Stephen E. Arnold · Filed Under Business strategy, Google, Government, News | Comments Off on Microsoft Wins USDA Deal

Repositioning 2011: The Mad Scramble

December 15, 2010

Yep, the new year fast approaches. Time to turn one’s thoughts to vendors of search, content processing, data fusion, text mining, and—who could forget?—knowledge management. In the last two weeks, I have done several live-and-in-person briefings about ArnoldIT.com’s views on enterprise search and related disciplines.

Today enterprise search has become what I call an elastic concept. It is stretched over a baker’s dozen of quite divergent information retrieval concepts. Examples range from the old bugaboo of many companies, customer support, to the effervescence of knowledge management. The concept stretches between the hard realities of the costs of supporting actual customers and the frothy topping of “knowledge”.

Several trends are pushing through the fractured landscape of information retrieval. Like earthquakes, the effects can vary significantly depending on one’s position at the time of the event.

Source: http://www.sportsnet.ca/gallery/2009/12/30/scramble_gal_640.jpg

Search can be looked at in different ways. One can focus on a particular problem; for example, content management system repositories. The challenge is to find information in these systems. One would think that after years of making Web pages, the problem would be solved. Apparently not. CMS with embedded search stubs trigger some grousing in most of the organizations with which I am familiar. Search works, just not exactly as the users expect. A vendor of search technology can position the search solution as one that makes it easy for users to locate information in a CMS. This is, of course, the pitch of numerous Microsoft Certified Gold resellers of various types of search solutions, utilities, and work arounds. This is an example of a search market defined by the type of enterprise system that creates a retrieval problem.

Other problems for search crop up when specific rules and regulations mandate a particular type of information processing. One example is the eDiscovery market. Anyone can be sued, and eDiscovery systems have to make content findable, but the users of an eDiscovery system have quite particular needs. One example is bookkeeping so that the time and search process can be documented and provided upon request under certain conditions.

Social media has created a new type of problem. One can take a specific industry sector such as the Madison Avenue crowd and apply information technology to the social media problem. The idea is for a search system to “harvest” data from social content sources like Facebook or Twitter, process the text which can be ambiguous, and generate information about how the people creating Facebook messages or tweets perceive a product, person, ad, or some other activity for the advertising team. The idea is that search unlocks hidden information. The Mad Ave crowd thinks in terms of nuggets of information that will allow the ad team to upsell the advertiser. Search is doing search work but the object of the exercise is to make sense out of content streams that are too voluminous for a single person to read. This type of search market—which may not be classic search and retrieval at all—is closer to what various intelligence agencies want software to do to transcribed phone calls, email, and general information from a range of sources.

Let’s stop with the examples of information access problems already. There are more information access problems than at any other time, and I want to move on to the impact of these quite diverse problems upon vendors in 2011.

Now let’s take a vendor that has a search system that can index Word documents, email, and content found in most office environments. Nothing tricky like product specifications, chemical structures, or the data in the R&D department’s lab notebooks. For mainstream search, here is the problem:

Commoditization

Right now (no pun on the vendor of customer support solutions by the way) anyone can download an open source search solution. It helps if the person downloading Lucene, Solr, or one of the other open source solutions has a technical bent. If not, a local university’s computer science department can provide a student to do the installation and get the system up and running. If the part time contracting approach won’t work, you can hire a company specializing in open source to do the work. There are dozens of these outfits bouncing around.

Written by Stephen E. Arnold · Filed Under Business strategy, Enterprise, Feature, Search, Technology | 1 Comment

Exclusive Interview: Brian Pinkerton

December 15, 2010

Introduction

At a recent conference, there was much buzz about consulting firms’ opinions about enterprise search. I spoke with several people who expressed surprise at the “rankings”. For example, one high-profile firm pronounced Vivisimo as the top vendor in enterprise search. Vivisimo positions itself as an “information optimization” company. I am not sure what that means, but it is clear that “enterprise search” is not the company’s main focus. Nevertheless, Vivisimo is number one.

Okay, but Vivisimo started life as a company with on-the-fly clustering. Then Vivisimo morphed into a vendor of federated search. Next Vivisimo dabbled in government contracts. After an executive shake up and an infusion of venture capital, Vivisimo emerged as an “information optimization” company. The phrase is as confusing as Google’s “contextual discovery.”

What are these marketers talking about? The answer is making sales and no-calorie marketing jargon. The consulting firms know a sales opportunity exists when user dissatisfaction with enterprise search is chugging along in the 50 to 70 percent range. Yes, most users of an enterprise “findability” system are unhappy. Procurement teams are, therefore, busy because most companies are looking for a search silver bullet.

To cater to those looking for a quick, simple way to solve an enterprise information access problem, consultants and advisors offer impressionistic write ups. Madison Avenue works fine when selling toothpaste. Apply that method to the very tough problem of information retrieval, and you end up with confusion, rising costs, and unhappy users.

Let me give you another example that surfaced in my conversations with vendors in London at the December International Online Conference. I learned that one consulting firm named Endeca as the top dog in enterprise search. I am okay with that assertion as long as there are some data to back up the claim. When I hear the name “Endeca”, I think of eCommerce as the core strength. The system can be applied to other information problems, but when I recall Endeca’s patent applications, I think about eCommerce, not discovery and data fusion.

Perhaps some search firms are more adept at social engineering than software engineering? Are some search advisors doing Madison Avenue-type thinking, not engineering analyses?

I don’t have any quibble with consulting firms who peg Autonomy as Number One. The revenue alone makes the difference between Autonomy and other information access vendors evident. Last time I saw Andrew Kanter, the chief operating officer for the vendor of meaning-based computing solutions, I asked him, “When will Autonomy break the $1.0 billion in revenue barrier?” He told me and an audience of about 175 people that Autonomy “was only $900 million.” Yep, $900 million, which is orders of magnitude greater than most of the 300 vendors whose information retrieval technology I track. IBM, Google, Microsoft, and Oracle do not provide search revenue detail in their financial reports. So on revenue Autonomy has a valid claim to the Number One position in enterprise search.

Consulting Firms Want to Sell Work, Not Expose Warts

Consulting firms—particularly those confined to the mid-tier below the McKinseys, the Bains and the Booz Allens and above the independent experts—have to feed their firms’ revenue hunger. Consulting is an expensive business because full time employees have to be kept billable. Making sales, therefore, is more important than objectivity in my experience.

What mid tier consulting firm sales professional wants to irritate an IBM, Google, Microsoft, or Oracle? Big companies, therefore, are often graded on the curve. Is it not easier to rubber stamp search systems from these Big Four vendors? Get along, go along is perhaps the motto in certain situations.

One consequence of the pressure to make sales is that consulting firms have to back certain horses. The idea is to focus on commercial vendors who are likely to have an appetite for buying and paying for the services of the consulting firm.

Somewhat surprisingly, most of the consulting firms’ search analyses fumble the ball when it comes to open source search; namely, Lucene/Solr, FLAX, Tesuji, and others. The fact is that organizations like Cisco Systems, eHarmony, LinkedIn, MTV, and Twitter, among others, are relying on open source “findability” solutions, in particular Lucene/Solr. Open source search is now a viable option for many organizations, and the deprecation of Lucene/Solr is surprising to me.

The bottom line is that most search vendor league tables are suspect. Unfortunately, these league tables are viewed as fact.

On December 10, 2010, I wanted to get an open source technologist to talk about open source search and how that option is perceived by marketing organizations masquerading as independent analysts.

The Interview

I spoke with Dr. Brian Pinkerton, Lucid Imagination’s vice president of product development. Brian has a Ph.D. in Computer Science & Engineering and started his work career as a senior software engineer at NeXT. He then developed WebCrawler, the Web’s first comprehensive search engine.

Brian Pinkerton, VP Product Development, Lucid Imagination

Since then he has been Technical Architect at AOL (which acquired WebCrawler), VP of Engineering and Chief Scientist at Excite, Principal Architect at A9, Director of Search at Technorati, and co-founder and President of Minimal Loop, whose technology was acquired by Scout Labs, where Brian was VP of Engineering.

Today (December 15, 2010) Lucid Imagination is announcing the general availability of its LucidWorks Enterprise product, which is available for free download. The product is described as a search solution development platform built on open source Apache Lucene/Solr.

The full text of my interview with Brian appears below:

Several consulting firms have issued analyses of the enterprise search market. I noted that open source search in general and Lucid Imagination in particular were not highlighted as top candidates for the enterprise. Why is open source search put on the bench?

Economics, primarily. Because customers spend huge amounts of money on commercial packages, a small industry has grown up to support and encourage such decisions. This process is naturally set up to ignore disruptive technologies, especially ones that are price-disruptive. The consulting firms don’t work for free: getting prominent placement in a report usually costs money. Who’s paying that fee for open source? Another important reason is the market: developers, not IT managers, are the main adopters of open source solutions, while IT execs are the main consumers of the fancy reports.

Large organizations rely on consultants’ reports. In your opinion are these reports accurate?

It’s hard to comment on these reports because the methods are not always transparent. These consultants spend a lot of time talking to vendors and customers, and draw some conclusions based on that. Many of them have been at it for a while, and they survive by providing useful insights. One useful thing to note, though, is that their conclusions are biased by those they talk to and their target audience: the IT exec. If you’re one of those, I’m sure you like the reports. If you’re a developer, you might not.

How is Lucid Imagination productizing open source search?

We have released a product, LucidWorks Enterprise, that extends Lucene/Solr with features commonly needed by commercial customers. We focus on providing technology that will make open source Lucene/Solr more accessible to more people. For instance, user interfaces that simplify getting started, or APIs that are specifically targeted to the way enterprises build and integrate applications today.

For example, we extend Solr with RESTful interfaces for configuration; that provides developers with the ability to integrate it more easily. We also simplify functions that could be built from open source, but are more convenient to take as ready-made features. Finally, we add features that 99.9% of software developers probably can’t create easily from scratch, such as our Click Scoring framework, which boosts search results selected most often by users.

Furthermore, open source projects are really good at broad innovation, transparency, and easy access. But the communities around open source projects are not support organizations, so many vendors help companies adopting open source with timely expert support. That’s another one of the things we do at Lucid.
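
The Click Scoring idea mentioned in this answer, boosting results that users select most often, can be sketched in a few lines. This is only an illustration under stated assumptions, not LucidWorks code: the function name `click_boost`, the logarithmic damping, the `weight` parameter, and the sample documents are all hypothetical.

```python
import math


def click_boost(results, clicks, weight=0.5):
    """Re-rank (doc_id, base_score) pairs using recorded click counts.

    Hypothetical formula: boosted = base * (1 + weight * log(1 + clicks)).
    The logarithm keeps a heavily clicked document from swamping relevance.
    """
    boosted = []
    for doc_id, base_score in results:
        n = clicks.get(doc_id, 0)  # documents never clicked get no boost
        boosted.append((doc_id, base_score * (1.0 + weight * math.log1p(n))))
    # Highest boosted score first.
    return sorted(boosted, key=lambda pair: pair[1], reverse=True)


# Made-up relevance scores from the engine and made-up click counts.
results = [("doc-a", 1.0), ("doc-b", 0.9), ("doc-c", 0.5)]
clicks = {"doc-b": 40, "doc-c": 3}
ranked = click_boost(results, clicks)
print([doc_id for doc_id, _ in ranked])
```

With these sample numbers the frequently clicked doc-b overtakes doc-a, while the lightly clicked doc-c stays last. A production system would also have to decide how to age old clicks and how to avoid rewarding whatever happens to sit at the top of the list.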

What steps have you taken to ensure the stability of the open source search product you offer?

We take the latest, most stable innovations from the open source development tree (known as ‘trunk’) and provide rigorous integration testing, as well as regular, stable releases driven by customer opportunities. We follow strict software engineering principles and use a quality-driven release process to build LucidWorks Enterprise. And we provide maintenance fixes and releases for our product in timely fashion to customers.

Proprietary search vendors emphasize that their approach ensures that licensees get timely bug fixes and updates. Is this a valid statement? What does Lucid Imagination provide a customer who wants timely bug fixes and updates?

I think both open-source vendors and commercial software suppliers provide timely bug fixes and updates. On the open-source side, it’s an interesting challenge because some bugs are fixed nearly instantly by the open source community, but they are not packaged in a way that a production customer can easily consume. Production customers want bug-fix-only branches of the software, not bug fixes accompanied by the latest feature innovations that happened to be committed at the same time. We insulate our customers from the open-source volatility by releasing stable, bug-fix-only branches for our production customers.

Search technology has fragmented into a mind numbing number of implementations such as an appliance, cloud or hosted search, on premises search, and combinations of methods. How does Lucid Imagination’s search product fit into this fragmented solutions landscape?

LucidWorks Enterprise is a product that spans the range from software appliance to developer toolkit. Customers new to search can deploy it in a turnkey fashion, while more sophisticated customers can dive under the hood and build a complex application around it. A key secret to great search is how well it fits the business it is meant to serve — in fact, this is true of any application, particularly custom built apps. We believe that anyone who needs better than ‘adequate’ search results will want to build their search solution, and we created LucidWorks Enterprise to provide the best, lowest cost, most scalable platform for building that search solution.

Microsoft SharePoint provides a search solution. Microsoft offers the Fast technology for a more robust solution. What does Lucid Imagination provide to a SharePoint licensee wanting an enhanced search solution?

We will release a robust SharePoint solution in the first two quarters of 2011 and enable anyone to use LucidWorks Enterprise to search their SharePoint data alongside data from other common sources. One of the open questions about the new SharePoint solution is how long Microsoft will support Fast’s integration with anything but SharePoint.

Many search vendors offer faceted search; that is, the system generates hot links to related or supporting content. What is Lucid Imagination’s approach to faceted search?

Both LucidWorks Enterprise and Solr provide faceting support on every query that enables users to refine their results. Faceting is most obviously useful in eCommerce, though a wide variety of applications also take advantage of the feature. LucidWorks Enterprise and Solr support efficient and scalable faceting on any field, providing human-readable labels and accurate facet counts for the top facets. One of the important considerations for large collections is the degree to which faceting works in a distributed configuration. In LucidWorks Enterprise and Solr, faceting is supported seamlessly in distributed situations, offering the full performance at scale.
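
For readers who have not built faceting, here is a minimal sketch of the idea: count the values of a field across the matching documents, then sum the per-shard counts to get totals for a distributed index. It is plain Python, not Solr's implementation; the helper names `facet_counts` and `merge_shards` and the sample documents are assumptions made for the illustration.

```python
from collections import Counter


def facet_counts(docs, field):
    """Count how often each value of `field` occurs in one shard's matches."""
    return Counter(doc[field] for doc in docs if field in doc)


def merge_shards(shard_counts, top_n=2):
    """Sum the per-shard counts and return the top facets with totals."""
    total = Counter()
    for counts in shard_counts:
        total.update(counts)  # Counter.update adds counts together
    return total.most_common(top_n)


# Two hypothetical shards, each holding part of the matching result set.
shard_1 = [{"brand": "acme"}, {"brand": "acme"}, {"brand": "zenith"}]
shard_2 = [{"brand": "acme"}, {"brand": "zenith"}, {"brand": "orbit"}]

top = merge_shards([facet_counts(shard_1, "brand"),
                    facet_counts(shard_2, "brand")])
print(top)
```

The sketch merges complete counts from every shard, so the totals are exact. A real distributed engine usually exchanges only each shard's top values and then has to do extra work to keep the merged counts accurate, which is the hard part the answer alludes to.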

Would you describe a customer support use case for Lucid Imagination search? What are some common themes?

Because we have a diverse base of customers, we see a wide range of search applications. One common theme is relevance tuning: for instance, customers who need help tying certain results to certain queries, or just better optimizing the algorithms built with Solr & Lucene to deliver the right results. Another common theme, and one that I personally enjoy helping customers with, is performance. We had one customer who replaced a commercial search engine with Solr, reducing their median query response time from 30 seconds to about four seconds without our help. We then helped them reduce that by another factor of eight, to a median query response time of under half a second.
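
The arithmetic behind that performance story is worth spelling out. The figures below come straight from the answer and are approximate ("about four seconds", "under half a second"), so treat the result as a rough check rather than a benchmark; the variable names are mine.

```python
# Median query response times quoted in the interview, in seconds.
before_solr = 30.0    # commercial engine
after_solr = 4.0      # after the customer's own switch to Solr (approximate)
tuning_factor = 8.0   # further improvement credited to tuning help

after_tuning = after_solr / tuning_factor        # 0.5 seconds
switch_speedup = before_solr / after_solr        # 7.5x from the switch alone
overall_speedup = before_solr / after_tuning     # 60x end to end

print(after_tuning, switch_speedup, overall_speedup)
```

Taken together, the two steps amount to roughly a sixtyfold reduction in median response time.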

With open source search gaining acceptance within large companies like Cisco and high demand Web applications like Twitter, why are the consulting firms giving open source and Lucid so little attention?

One reason is that it’s coming up really, really fast — and they may not see it coming. Also, open source adoption is often driven by a broad, diffuse population of developers. The developers don’t generally put much stock in what the analysts say, if they’re even aware of the reports to begin with. And on the flip side, the analysts are paying attention to their own customers, CIOs and vendor salespeople, who may not know how the work is really getting done.

What do you suggest a procurement team do to evaluate fully an open source search solution such as the one Lucid Imagination offers?

I think they need to make sure their company is comfortable with creating their own applications; it’s not a passive technology, but one that can be actively used to drive competitive advantage. In looking at vendors, find one that can offer a solution that grows as their needs and skills grow: from something simple in the beginning to something fully customizable as they become more sophisticated consumers. And most importantly, they should look for a company with the depth and expertise to provide training, support, and consulting to help them harness the full scope of search innovation. Finally, they should do the math compared to what they might pay for a comparable implementation with a commercial enterprise search vendor. In many cases, they’re already spending many times what it would cost them to buy an open source-based solution. Sometimes they’ll pay more just for the annual maintenance — excluding consulting and license fees — than for a complete subscription for LucidWorks Enterprise.

In several of the recent analyses of enterprise search systems I have reviewed, I learned about such companies as Sinequa, Fabasoft, and Expert System, all examples of firms that have zero profile in many organizations. In your opinion, why are these types of search vendors given so much attention in the search market?

I can imagine that the marketing guys at such organizations are always happy to talk to industry analysts. I spend my time mainly talking to customers and developers.

How can one get more information about Lucid Imagination and its open source enterprise search solution?

Our Web site www.lucidimagination.com is full of information about our product, LucidWorks Enterprise, and other information about the open source technologies, Lucene and Solr. We also have case studies that show how customers are building applications and products with Solr, Lucene, and LucidWorks Enterprise. And I always recommend downloading our product, now available free to developers, and taking it for a spin.

ArnoldIT Comment

My view about consulting firms’ analyses of search and content processing vendors has evolved over the last two years. The economic downturn has put pressure on most of the companies that sell technical advice. Since the 2008 financial storm roiled commercial waters, certain advisory firms have shifted from independent analyses to what generates revenue for the consulting firms.

Many of the consulting firms’ reports are white papers or marketing material. The problem is that search is a particularly difficult technical field. Selecting a search system is often a difficult challenge for a procurement team. There are numerous, complex factors to consider.

Consulting firms offer “advice” about which system or systems are the “best” at a particular function. The problem is that writing about search is different from implementing search. It is easier to describe what a search vendor asserts in a demo. It is harder to take that solution and solve a real-world problem in a Microsoft SharePoint environment or in a setting where numerous mission critical applications operate in a standalone manner.

If you are looking for a search solution, you will need to develop a “tight spec” and then investigate the options that match specific requirements. Few organizations have the time or resources to test multiple systems before making a decision about what search system to license.

The need for information about search creates an opportunity for independent firms to provide information, often at a hefty fee. In my experience, selecting a search system requires an approach close to the one that Martin White and I set forth in our 2009 book Successful Enterprise Search Management, published by Galatea in the UK.

We suggest that procurement teams become familiar with the available literature about search. Then a methodical process of assessment and evaluation can be followed. The shortcut often leads to the all-too-common complaints about a search system: users cannot locate needed information, and user satisfaction plummets.

Stephen E Arnold, December 15, 2010

Sponsored

Written by Stephen E. Arnold · Filed Under Enterprise, Interview, News, Open source, Search, Technology, Text processing | Comments Off on Exclusive Interview: Brian Pinkerton

Online Shopping Reduces Hassles of the Mall

December 15, 2010

As reported on adage.com, Ad Age and Ipsos Observer recently conducted a survey studying the holiday buying habits of American consumers, and the results do not exactly bode well for physical retail stores as compared to virtual ones. The article detailed:

“Shoppers hunting for deals and convenience increasingly turned online during the Thanksgiving selling season. Coremetrics reported sales jumps of 19.4 percent on Cyber Monday, 9 percent on Black Friday and 28 percent on Thanksgiving Day when many retailers started rolling out their Black Friday deals early. These numbers far outpaced the nearly flat sales at physical store locations.”

Per the survey, daily discount websites such as Groupon give shoppers an incentive to head off to tangible storefronts in the name of getting a good deal. These sites even home in on members’ specific interests based on location, demographics, and stated preferences. While this may seem like a silver lining for the brick-and-mortar locales, the article states that only the consumer is benefiting. “SymphonyIRI Group data show that coupons are not driving incremental sales. They are more likely to offer discounts to those already planning to buy, thereby cutting at the margins for retailers.”

Something else new to the shopping arena is the plethora of iPhone apps suited for this purpose, ranging from instant price comparison programs and barcode scanners to online coupon books that eliminate the need for printing and clipping. Easy just got even easier. So I guess with the Internet, ‘the customer is king’ takes on a whole new meaning.

Sarah Rogers, December 15, 2010

Freebie

Written by Stephen E. Arnold · Filed Under Business strategy, Consumer, ECommerce, News, Online (general), Technology | Comments Off on Online Shopping Reduces Hassles of the Mall


Stephen E. Arnold monitors search, content processing, text mining and related topics from his high-tech nerve center in rural Kentucky. He tries to winnow the goose feathers from the giblets. He works with colleagues worldwide to make this Web log useful to those who want to go "beyond search". Contact him at sa [at] arnoldit.com. His Web site with additional information about search is arnoldit.com.
