A Free Pass for Open Source Search?

February 11, 2010

Dateline: Harrod’s Creek, February 11, 2010

I read Gavin Clarke’s “Microsoft Drops Open Source Birthday Gift with Fast Lucidly Imaginative?” I think the point of the story, “a free pass” to “open source search providers like Lucid Imagination,” is interesting. However, I am not willing to accept “free pass,” which is a variant of the “free lunch” in my opinion.

Here’s my view from the pleasant clime of snowy Harrod’s Creek.

First, in my opinion, most of the Fast Search & Transfer licensees bought into the “one size fits all” approach to search: facets, reports, access to structured and unstructured data, etc. As many of these licensees discovered, the cost of making Fast’s search technology deliver on the marketing PowerPoints was high. Furthermore, some like me learned how difficult it was for certain licensees to get the moving parts in sync quickly. Fast ESP consisted, prior to the Microsoft buy out, of keyword search, semantics from a team in Germany, third-party magic from companies like Lexalytics, home brew code from Norwegian wizards, and outright acquisitions for publishing and content management functionality. Wisely, many search vendors have learned to steer clear of the path that Fast Search & Transfer chopped through the sales wilderness. This means that orphaned Fast Search licensees may be looking at procurements that narrow the scope of search and content processing systems. In fact, there are only a handful of vendors who are now pitching the “kitchen sink” approach to search.

Image: no free lunch

Source: http://www.graceforlife.com/uploaded_images/no_free_lunch-772769.jpg

Second, open source search solutions are not created equal. Some are tool kits; others are ready-to-run systems. Lucid Imagination has a good public relations presence in certain places; for example, San Francisco. For those who monitor the search space, there are some other open source vendors that may provide some options. I particularly like the open source version of Lucene available from Tesuji.eu. Ah, never heard of the outfit, right? I also find the FLAX system available from Lemur Consulting useful. I think the issues with Fast Search & Transfer are not going to be resolved by ringing up a single vendor and saying, “We’re ready to go with your open source solution.” The more prudent approach is going to be understanding what the differences among various open source search solutions are and then determining if an organization’s specific requirements match up to one of these firms’ service offerings. Open source, therefore, requires some work, and I don’t think a knee-jerk reaction or a sweeping statement that the Microsoft announcement will deliver a “free pass” is accurate.


Quote to Note: Dick Brass on MSFT Innovation

February 6, 2010

I met Dick Brass many years ago. He left Oracle and joined Microsoft to contribute to a confidential initiative. Mr. Brass worked on the ill-fated Microsoft tablet, which Steve Jobs has reinvented as a revolutionary device. I am not a tablet guy, but one thing is certain. Mr. Jobs knows how to work public relations. Mr. Brass published an article in the New York Times, and it captured the attention of Microsoft and millions of readers who enjoyed Mr. Brass’s criticism of his former employer. I have no opinion about Microsoft, its administrative methods, or its ability to innovate. I did find a quote to note in the write up:

Microsoft is no longer considered the cool or cutting edge place to work. There has been a steady exit of its best and brightest. (“Microsoft’s Creative Destruction”, the New York Times, February 4, 2010, Page 25, column 3, National Edition)

Telling because if smart people don’t work at a company, that company is likely to make less informed decisions than an organization with smarter people. This applies in the consulting world. There are blue chip outfits like McKinsey, Bain, and BCG. Then there are lesser outfits which I am sure you can name because these companies “advertise”, have sales people who “sell” listings, and invent crazy phrases to create buzz and sales. I am tempted to differentiate Microsoft with a reference to Apple or Google, but I will not. Oh, why did I not post this item before today? The hard copy of my New York Times was not delivered until today. Speed is important in today’s information world.

The quote nails it.

Stephen E Arnold, February 7, 2010

No one paid me to write this, not a single blue chip consulting firm, not a single savvy company. I will report this lack of compensation to the experts at the IRS, which is gearing up for the big day in April.



Interviews
Inside Search: Raymond Bentinck of Exalead, Part 2

This is the second part of the interview with Raymond Bentinck of Exalead.

Isn’t this bad marketing?

No. This makes business sense. Traditional search vendors who may claim to have thousands of customers tend to use only a handful of well managed references. This is a direct result of customers choosing technology based on these overblown marketing claims and these claims then driving requirements that the vendor’s consultants struggle to deliver. The customer, who is then far from happy with the results, doesn’t do reference calls and ultimately becomes disillusioned with search in general or with the vendor specifically. Either way, they end up moving to an alternative.

I see this all the time with our clients that have replaced their legacy search solution with Exalead. When we started, we were met with much skepticism from clients that we could answer their information retrieval problems. It was only after doing Proof of Concepts and delivering the solutions that they became convinced. Now that our reputation has grown, organizations realize that we do not make unsubstantiated claims and do stick by our promises.

What about the shift to hybrid solutions? An appliance or an on premises server, then a cloud component, and maybe some fairy dust thrown in to handle the security issues?

There is a major change that is happening within Information Technology at the moment driven primarily by the demands placed on IT by the business. Businesses want to vastly reduce the operational cost models of IT provision while pushing IT to be far more agile in their support of the business. Against this backdrop, information volumes continue to grow exponentially.

The push towards areas such as virtual servers and cloud computing is one aspect of reducing the operational cost models of information technology provision. It is fundamental that software solutions can operate in these environments. It is surprising, however, to find that many traditional search vendors’ solutions do not even work in a virtual server environment.

Isn’t this approach going to add costs to an Exalead installation?

No, because another aspect of this is that software solutions need to be designed to make the best use of available hardware resources. When Exalead provided a solution to the leading classified ads site Fish4.co.uk, unlike the legacy search solution we replaced, not only were we able to deploy a solution that met and exceeded their requirements but we reduced the cost of search to the business by 250 percent. A large part of this was around the massively reduced hardware costs associated with the solution.

What about making changes and responding quickly? Many search vendors simply impose a six month or nine month cycle on a deployment. The client wants to move quickly, but the vendor cannot work quickly.

Agility is another key factor. In the past, an organization may implement a data warehouse. This would take around 12 to 18 months to deploy and would cost a huge amount in hardware, software and consultancy fees. As part of the deployment the consultants needed to second guess the questions the business would want to ask of the data warehouse and design these into the system. After the 12 to 18 months, the business would start using the data warehouse and then find out they needed to ask different types of questions than were designed into the system. The data warehouse would then go through a phase of redevelopment which would last many more months. The business would evolve… making more changes and the cycle would go on and on.

With Exalead, we are able to deploy the same solution in a couple months but significantly there is no need to second guess the questions that the business would want to ask and design them into the system.

This is the sort of agile solution that businesses have been pushing their IT departments to deliver for years. Businesses that do not provide agile IT solutions will fall behind their competitors and be unable to react quickly enough when the market changes.

One of the large UK search vendors has dozens of niche versions of its product. How can that company keep each of these specialty products up to date and working? Integration is often the big problem, is it not?

The founders of Exalead took two years before starting the company to research what worked in search and why the existing search vendors’ products were so complex. This research led them to understand that the search products that were on the marketplace at the time all started as quite simple products designed to work on relatively low volumes of information and with very limited functional capabilities. Over the years, new functionality has been added to the solutions to keep abreast of what competitors have offered but because of how the products were originally engineered they have not been clean integrations. They did not start out with this intention but search has evolved in ways never imagined at the time these solutions were originally engineered.

Wasn’t one of the key architects part of the famous AltaVista.com team?

Yes. In fact, both of the founders of Exalead were from this team.

What kind of issues occur with these overly complex products?

As you know, this has caused many issues for both vendors and clients. Changes in one part of the solution can cause unwanted side effects in another part. Trying to track down issues and bugs can take a huge amount of time and expense. This is a major factor as to why we see the legacy search products on the market today that are complex, expensive and take many months if not years to deploy even for simple requirements.

Exalead learned from these lessons when engineering our solution. We have an architecture that is fully object-orientated at the core and follows a service-oriented architecture (SOA). It means that we can swap in and out new modules without messy integrations. We can also take core modules such as connectors to repositories and instead of having to re-write them to meet specific requirements we can override various capabilities in the classes. This means that the majority of the code that has gone through our quality-management systems remains the same. If an issue is identified in the code, it is a simple task to locate the problem and this issue is isolated in one area of the code base. In the past, vendors have had to rewrite core components like connectors to meet customers’ requirements and this has caused huge quality and support issues for both the customer and the vendor.
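
Editor’s note: The idea of overriding a capability in a connector class, rather than rewriting the connector, can be illustrated with a minimal sketch. This is not Exalead’s code or API. Every class and method name below (RepositoryConnector, FileShareConnector, SecureFileShareConnector, crawl, fetch, is_allowed, normalize) is hypothetical and invented for illustration only.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, Iterator, List


class RepositoryConnector(ABC):
    """Hypothetical base connector. The shared, already-tested logic lives here."""

    def crawl(self) -> List[Dict]:
        # Shared pipeline: fetch, filter, normalize. Subclasses never rewrite this.
        return [self.normalize(doc) for doc in self.fetch() if self.is_allowed(doc)]

    @abstractmethod
    def fetch(self) -> Iterator[Dict]:
        """Yield raw documents from the repository."""

    def is_allowed(self, doc: Dict) -> bool:
        # Default capability: index everything.
        return True

    def normalize(self, doc: Dict) -> Dict:
        # Default capability: keep an id and trimmed text.
        return {"id": str(doc["id"]), "text": doc.get("text", "").strip()}


class FileShareConnector(RepositoryConnector):
    """Hypothetical connector for one repository type (an in-memory stand-in)."""

    def __init__(self, docs: Iterable[Dict]):
        self._docs = list(docs)

    def fetch(self) -> Iterator[Dict]:
        return iter(self._docs)


class SecureFileShareConnector(FileShareConnector):
    """Customer-specific variant: overrides only the security check."""

    def __init__(self, docs: Iterable[Dict], allowed_groups: Iterable[str]):
        super().__init__(docs)
        self._allowed = set(allowed_groups)

    def is_allowed(self, doc: Dict) -> bool:
        # Index a document only if it shares a group with the allowed set.
        return bool(self._allowed & set(doc.get("groups", [])))


docs = [
    {"id": 1, "text": " budget memo ", "groups": ["hr"]},
    {"id": 2, "text": "design spec", "groups": ["eng"]},
]
everything = FileShareConnector(docs).crawl()
eng_only = SecureFileShareConnector(docs, ["eng"]).crawl()
```

In this sketch the customer-specific connector changes one method, and the crawl pipeline it inherits is untouched, which is the property the answer above describes.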

What about integration? That’s a killer for many vendors in my experience.

The added advantage of this core engineering work means that for Exalead integration is a simple task. For example, building new secure connectors to new repositories can be performed in weeks rather than months. Our engineers can take this time saved to spend on adding new and innovative capabilities into the solution rather than spending time worrying about how to integrate a new function without affecting the 1001 other overlaying functions.

Without this model, legacy vendors have to continually provide point-solutions to problems that tend to be customer-specific leading to a very expensive support headache as core engineering changes take too long and are too hard to deploy.

I heard about a large firm in the US that has invested significant sums in retooling Lucene. The solution has been described on the firm’s Web site, but I don’t see how that engineering cost is offset by the time to market that the fix required. Do you see open source as a problem or a solution?

I do not wake up in the middle of the night worrying about Lucene if that is what you are thinking! I see Lucene used in places that typically have large engineering teams to protect, or by consultants more interested in making lots of fees through its complex integration. Neither adds value to the company in, for example, reducing costs or increasing revenue.

Organizations that are interested in providing cost effective richly functional solutions are in increasing numbers choosing solutions like Exalead. For example, The University of Sunderland wanted to replace their Google Search Appliance with a richer, more functional search tool. They looked at the marketplace and chose Exalead for searching their external site, their internal document repositories plus providing business intelligence solutions over their database applications such as student attendance records. The search on their website was developed in a single day including the integration to their existing user interface and the faceted navigation capabilities. This represented not only an exceptionally quick implementation, far in excess of any other solution on the marketplace today but it also delivered for them the lowest total cost of ownership compared to other vendors and of course open-source.

In my opinion, Lucene and other open-source offerings can offer a solution for some organizations but many jump on this bandwagon without fully appreciating the differences between the open source solution and the commercially available solutions either in terms of capability or total cost. It is assumed, wrongly in many instances, that the total cost of ownership for open source must be lower than the commercially available solutions. I would suggest that all too often, open source search is adopted by those who believe the consultants who say that search is a simple commodity problem.

What about the commercial enterprise that has had several search systems and none of them capable of delivering satisfactory solutions? What’s the cause of this? The vendors? The client’s approach?

I think the problem lies more with the vendors of the legacy search solutions than with the clients. Vendors have believed their own marketing messages and when customers are unsatisfied with the results have tended to blame the customers for not understanding how to deploy the product correctly or, in some cases, the third party or system integrator responsible for the deployment.

One client of ours told me recently that with our solution they were able to deliver in a couple months what they failed to do with another leading search solution for seven years. This is pretty much the experience of every customer where we have replaced an existing search solution. In fact, every organization that I have worked with that has performed an in-depth analysis and comparison of our technology against any search solution has chosen Exalead.

In many ways, I see our solution as not only delivering on our promises but also delivering on the marketing messages that our competitors have been promoting for years but failing to deliver in reality.

So where does Exalead fit? The last demo I received showed me search working within a very large, global business process. The information just appeared? Is this where search is heading?

In the year 2000, and every year since, a CEO of one of the leading legacy search vendors made a claim that every major organization would be using their brand of meaning based search technology within two years.

I will not be as bold as him but it is my belief that in less than five years’ time the majority of organizations will be using search based applications in mission critical applications.

For too long software vendors have been trying to convince organizations, for example, that it was not possible to deploy mission critical solutions such as a 360 degree customer view, Master Data Management, Data Warehousing or business intelligence solutions in a couple months, with no user training, with up-to-the-minute information, with user friendly interfaces, with a low cost per query covering millions or billions of records of information.

With Exalead this is possible and we have proven it in some of the world’s largest companies.

How does this change the present understanding of search, which in my opinion is often quite shallow?

Two things are required to change the status quo.

Firstly, a disruptive technology is required that can deliver on these requirements and secondly businesses need to demand new methods of meeting ever greater business requirements on information.

Today I see both these things in place. Exalead has proven that our solutions can meet the most demanding of mission critical requirements in an agile way and now IT departments are realizing that they cannot support their businesses moving forward by using traditional technologies.

What do you see as the trends in enterprise search for 2010?

Last year was a turning point around Search Based Applications. With the world-wide economy in recession, many companies have put projects on hold until things were looking better. With economies still looking rather weak but projects not being able to be left on ice for ever, they are starting to question the value of utilizing expensive, time consuming and rigid technologies to deliver these projects.

Search is a game changing technology that can deliver more innovative, agile and cheaper solutions than using traditional technologies. Exalead is there to deliver on this promise.

Search, a commodity solution? No.

Editor’s note: You can learn more about Exalead’s search enabled applications technology and method at the Exalead Web site.

Stephen E Arnold, February 4, 2010

I wrote this post without any compensation. However, Mr. Bentinck, who lives in a far off land, offered to buy me haggis, and I refused this tasty bribe. Ah, lungs! I will report the lack of payment to the National Institutes of Health, an outfit concerned about alveoli.
Profiles
Vyre: Software, Services, Search, and More

A happy quack to the reader who sent me a link to Vyre, whose catchphrase is “dissolving complexity.” The last time I looked at the company, I had pigeon holed it as a consulting and content management firm. The news release my reader sent me pointed out that the company has a mid market enterprise search solution that is now at version 4.x. I am getting old, or at least too sluggish to keep pace with content management companies that offer search solutions. My recollection is that Crown Point moved in this direction. I have a rather grim view of CMS because software cannot help organizations create high quality content or at least what I think is high quality content.

The Wikipedia description of Vyre matches up with the information in my archive:

VYRE, now based in the UK, is a software development company. The firm uses the catchphrase “Enterprise 2.0” to describe its enterprise solutions for business. The firm’s core product is Unify. The Web based service allows users to build applications and content management. The company has technology that manages digital assets. The firm’s clients in 2006 included Diageo, Sony, Virgin, and Lowe and Partners. The company has reinvented itself several times since the late 1990s doing business as NCD (Northern Communication and Design), Salt, and then Vyre.

You can read the Wikipedia summary here. You can read a 2006 Butler Group analysis here. My old link worked this evening (March 5, 2009), but click quickly. In my files I had a link to a Vyre presentation but it was not about search. Dated 2008, you may find the information useful. The Vyre presentations are here. The link worked for me on March 5, 2009. The only name I have in my archive is Dragan Jotic. Other names of people linked to the company are here. Basic information about the company’s Web site is here. Traffic, if these data are correct, seems to be trending down. I don’t have current interface examples. The wiki for the CMS service is here. (Note: the company does not use its own CMS for the wiki. The wiki system is from MediaWiki. No problem for me, but I was curious about this decision because the company offers its own CMS system.) You can get a taste of the system here.

image

Administrative Vyre screen.

After a bit of poking around, it appears that Vyre has turned up the heat on its public relations activities. The Seybold Report here presented a news story / news release about the search system here. I scanned the release and noted this passage as interesting for my work:

…version 4.4 introduces powerful new capabilities for performing facetted and federated searching across the enterprise. Facetted search provides immediate feedback on the breakdown of search results and allows users to quickly and accurately drill down within search results. Federated search enables users to eradicate content silos by allowing users to search multiple content repositories.

Vyre includes a taxonomy management function with its search system, if I read the Seybold article correctly. I gravitate to the taxonomy solution available from Access Innovations, a company run by my friends and colleagues Marje Hlava and Jay Ven Eman. Their system generates ANSI standard thesauri and word lists, which is the sort of stuff that revs my engine.

Endeca has been the pioneer in the enterprise sector for “guided navigation” which is a synonym in my mind for faceted search. Federated search gets into the functions that I associate with Bright Planet, Deep Web Technologies, and Vivisimo, among others. I know that shoving large volumes of data through systems that both facetize content and federate it is computationally intensive. Consequently, some organizations are not able to put the plumbing in place to make these computationally intensive systems hum like my grandmother’s sewing machine.
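
For readers who want to see the two ideas side by side, here is a minimal sketch in Python. It is a toy, not Vyre’s or Endeca’s implementation. The function names (federated_search, facet_counts, drill_down, make_repo), the two in-memory repositories, and the document fields are all assumptions made up for illustration. Federated search sends one query to several repositories and merges the hits; faceting counts the hits by a metadata field so a user can drill down.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List

Doc = Dict[str, str]
SearchFn = Callable[[str], List[Doc]]


def federated_search(query: str, repositories: Dict[str, SearchFn]) -> List[Doc]:
    """Send one query to every repository and merge the hits, tagging each with its source."""
    merged: List[Doc] = []
    for name, search in repositories.items():
        for hit in search(query):
            merged.append({**hit, "source": name})
    return merged


def facet_counts(hits: Iterable[Doc], field: str) -> Counter:
    """Count how many hits fall under each value of one metadata field."""
    return Counter(hit[field] for hit in hits if field in hit)


def drill_down(hits: Iterable[Doc], field: str, value: str) -> List[Doc]:
    """Keep only the hits whose field matches the facet value the user picked."""
    return [hit for hit in hits if hit.get(field) == value]


def make_repo(docs: List[Doc]) -> SearchFn:
    """Hypothetical stand-in for a real repository: a case-insensitive title match."""
    def search(query: str) -> List[Doc]:
        return [d for d in docs if query.lower() in d["title"].lower()]
    return search


repos = {
    "cms": make_repo([
        {"title": "Search pricing", "type": "page"},
        {"title": "Brand assets", "type": "image"},
    ]),
    "dam": make_repo([{"title": "Search logo", "type": "image"}]),
}
hits = federated_search("search", repos)
by_type = facet_counts(hits, "type")
images = drill_down(hits, "type", "image")
```

Even in this toy, every query touches every repository and every hit is counted once per facet field, which hints at why the plumbing gets expensive at enterprise volumes.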

If you are in the market for a CMS and asset management company’s enterprise search solution, give the company’s product a test drive. You can buy a report from UK Data about this company here. I don’t have solid pricing data. My notes to myself record the phrase, “Sensible pricing.” I noted that the typical cost for the system begins at about $25,000. Check with the company for current license fees.

Stephen Arnold, March 6, 2009

Microsoft and Mikojo Trigger Semantic Winds across Search Landscape

January 28, 2010

Semantic technology is blowing across the search landscape again. The word “semantic” and its use in phrases like “semantic technology” have a certain trendiness. When I see the word, I think of smart software that understands information in the way a human does. I also think of computationally sluggish processes and the complexity of language, particularly in synthetic languages like English. Google has considerable investment in semantic technology, but the company wisely tucks it away within larger systems and avoids the technical battles that rage among different semantic technology factions. You can see Google’s semantic operations tucked within the Ramanathan Guha inventions disclosed in February 2007. Pay attention to the discussion of the system and method for “context”.

image

Gale force winds from semantic technology advocates. Image source: http://www.smh.com.au/ffximage/2008/11/08/paloma_wideweb__470x289,0.jpg

Microsoft’s Semantic Puff

Other companies are pushing the semantic shock troops forward. I read yesterday Network World’s “Microsoft Talks Up Semantic Search Ambitions.” The article reminded me that Fast Search & Transfer SA offered some semantic functionality which I summarized in the 2006 version of the original Enterprise Search Report (the one with real beef, not tofu inside). Microsoft also purchased Powerset, a company that used some of Xerox PARC’s technology and its own wizardry to “understand” queries and create a rich index. The Network World story reported:

With semantic technologies, which also are being referred to as Web 3.0, computers have a greater understanding of relationships between different information, rather than just forwarding links based on keyword searches. The end game for semantic search is “better, faster, cheaper, essentially,” said Prevost, who came over to Microsoft in the company’s 2008 acquisition of search engine vendor Powerset. Prevost is still general manager of Powerset. Semantic capabilities get users more relevant information and help them accomplish tasks and make decisions, said Prevost.

The payoff is that software understands humans. Sounds good, but it does little to alter the startling dominance of Google in general Web search and the rocket like rise of social search systems like Facebook. In a social context humans tell “friends” about meaning or better yet offer an answer or a relevant link. No search required.

I reported about the complexities of configuring the enterprise search system that Microsoft offers for SharePoint in an earlier Web log post. The challenge is complexity and the time and money required to make a “smart” software system perform to an acceptable level in terms of throughput in content processing and for the user. Users often prefer to ask someone or just use what appears in the top of a search results list.


Enterprise Search Deployment Time

January 14, 2010

Our Overflight service snagged a news item in May 2009. The title was “Airbus Licenses Vivisimo Velocity Search Platform”. The release was good news for Vivisimo and straightforward, saying:

Vivisimo (Vivisimo.com), a leader in enterprise search, has entered into a major agreement with aircraft manufacturer Airbus for the license of the Vivisimo Velocity Search Platform. The license covers the corporate-wide intranet for Airbus and some extranet services for Airbus customers, indexing up to two petabytes of data for more than 50,000 users.  Vivisimo had already provided search for a group within Airbus before winning the company’s broader corporate business in a competitive setting. In a solution proof of concept, Vivisimo Velocity demonstrated its capability to handle the complexity of Airbus’ many data repositories while respecting the company’s various security parameters.

When I read this, I thought that Airbus made a wise decision. A deployment and an evaluation process was used. That’s smart. Most organizations license an engine and then plunge ahead.

The news item I received in my email this morning was equally clear. “Airbus Lifts Off Vivisimo Velocity to Provide More than 50,000 Users the Power of Search” states:

Vivisimo (Vivisimo.com), a leader in enterprise search, today announced the successful installation of its award-winning Vivisimo Velocity Search Platform with the world’s leading aircraft manufacturer Airbus.  Through this deployment, Velocity is powering search across its corporate-wide intranet and its customers, indexing up to two petabytes of data for more than 50,000 users.

After a quote the news release said:

In less than one month since the completed installation of Velocity, search has become the fastest growing application on the customer portal (AirbusWorld) homepage in terms of usage, which has resulted in increased page views.

I think the uptake information is good news for Airbus users and for Vivisimo. The other upside of my having these two statements is that it is possible to calculate roughly the time required for a prudent organization to move from decision to deploy to actual availability of the search service. The deal was signed in May 2009, and the system went online about January 2010. That means that after the trial period, another six months was required to deploy the system.

Several observations:

  • Appliance vendors have indicated that their solution requires less time. One vendor pegs the deployment time in a matter of days. Another suggested a month for a complicated installation.
  • The SaaS search vendors have demonstrated a deployment time of less than four hours for one test we ran for a governmental unit. Other vendors have indicated times in the days to two week periods, depending on the complexity of the installation. The all time speed champ is Blossom.com, which we used for the Threat Open Source Information Gateway project.
  • System centric vendors with solutions that snap into SharePoint, for example, have indicated an installation time of a half day to as much as a week, depending on the specific SharePoint environment.
  • Tool kit vendors typically require weeks or months to deploy an enterprise search system. However, in certain situations like a search system for a major publishing company’s online service, the time extended beyond six months.

What’s this mean? Vivisimo’s installation time is on a par with other high profile systems’ deployment times. The reason is that the different components must be integrated with the clients’ systems. In addition, certain types of customization—not always possible with appliances or SaaS solutions—are like any other software set up. Tweaking takes time.

With Google’s emphasis on speed, the Google Search Appliance is positioning itself to be a quicker install than some of the high profile enterprise systems.

What’s this mean? It looks to me that one group of vendors and services can deliver speedier installations. Other vendors offset speed with other search requirements. Beyond that obvious statement, I will have to think about the cost implications of deployment time.

Stephen E. Arnold, January 14, 2010

No one paid me to write this short article. Why would anyone pay me? It’s been 65 years of financial deprivation. I think I have to report this monetary fact to the Social Security folks.

Lazarus, Azure Chip Consultants, and Search

January 8, 2010

A person called me today to tell me that a consulting firm is not accepting my statement “Search is dead”. Then I received a spam email that said, “Search is back.” I thought, “Yo, Lazarus. There be lots of dead search vendors out there. Example: Convera.”

Who reports that search has risen? An azure chip consultant! Here’s what raced through my addled goose brain as I pondered the call and the “search is back” T shirt slogan:

In 2006, I was sitting on a pile of research about the search market sector. The data I collected included:

  • Interviews with various procurement officers, search system managers, vendors, and financial analysts
  • My own profiles of about 36 vendors of enterprise search systems plus the automated content files I generate using the Overflight system. A small scale version is available as a demo on ArnoldIT.com
  • Information I had from my work as a systems engineering and technical advisor to several governments and their search system procurement teams
  • My own experience licensing, testing, and evaluating search systems for clients. (I started doing this work after we created The Point (Top 5% of the Internet) in 1993 and sold it to Lycos, a unit of CMGI. I figured I should look into what Lycos was doing so I could speak with authority about its differences from BRS/Search, InQuire, Dialog (RECON), and IBM STAIRS III. I had familiarity with most of these systems through various projects in my pre-Point (Top 5% of the Internet) life.)
  • My Google research funded by the now-defunct Bear Stearns outfit and a couple of other well heeled organizations.

What was clear in 2006 was the following:

First, most of the search system vendors shared quite a bit of similarity. Despite the marketing baloney, the key differentiators among the flagship systems in 2006 were minor. Examples range from their basic architecture to their use of stemming to the methods of updating indexes. There were innovators, and I pointed out these companies in my talks and various writings, including the three editions of the Enterprise Search Report I wrote before I fell ill in February 2007 and quit doing that big encyclopedia type publication. These similarities made it very clear to me that innovation for enterprise search was shifting from the plain old key word indexing of structured records available since the advent of RECON and STAIRS to a more freeform approach with generally lousy relevance.

Get information access wrong, and some folks may find a new career. Source: http://www.seeing-stars.com/Images/ScenesFromMovies/AmericanBeautyMrSmiley%28BIG%29.JPG

Second, the more innovative vendors were making an effort in 2006 to take a document and provide some sort of context for it. Without a human indexer to assign a classification code to a document that is about marketing but does not contain the word “marketing”, this was rocket science. But when I examined these systems, there were two basic approaches which are still around today. The first was to use statistical methods to put documents together and make inferences, and the other was a variation on human indexing but without humans doing most of the work. The idea was that a word list would contain synonyms. There were promising demonstrations of software methods that could “read” a document, but these were piggy and of use only where money was no object.

Third, the Google approach, which used social methods (that is, a human clicking on a link), was evident but not migrating to the enterprise world. Google was new, but to make its 2006 method hum, lots of clicks were needed. In the enterprise, most documents never get clicked, so the 2006 Google method was truly lousy. Google has made improvements, mostly by implementing the older search methods, not by pushing the envelope as it has been doing with its Web search and dataspace efforts.

Fourth, most of the search vendors were trying like the dickens to get out of a “one size fits all” approach to enterprise search. Companies making sales were focusing on a specific niche or problem and selling a package of search and content searching that solved one problem. The failure of the boil the ocean approach was evident because user satisfaction data from my research funded by a government agency and other clients revealed that about two thirds of the users of an enterprise search system were dissatisfied or very dissatisfied with that search system. The solution, then, was to focus. My exemplary case was the use of the Endeca technology to allow Fidelity UK sales professionals to increase their productivity with content pushed to them using the Endeca system. The idea was that a broker could click on a link and the search results were displayed. No searching required. ClearForest got in the game by analyzing the dealer warranty repair comments. Endeca and ClearForest were harbingers of focus. ClearForest is owned by Thomson Reuters and in the open source software game too.

When I wrote the article in Online Magazine for Barbara Quint, one of my favorite editors, I explained these points in more detail. But it was clear that the financial pressures on Convera, for example, and the difficulty some of the more promising vendors like Entopia were having made the thin edge of survival glint in my desk lamp’s light. Autonomy by 2006 had shifted from search and organic growth to inorganic growth fueled by acquisitions that were adjacent to search.

Google Apps: The Microsoft View

December 17, 2009

Navigate to “The Grill: Microsoft’s Chris Capossela on Google, Twitter and that Blue Screen of Death.” One of the questions threw me; for example, “How similar was working at a restaurant with working at Microsoft?” Okay. For me the most interesting passage was:

Google seems to be doing Google Docs in part just to hurt your revenue. It is making some enterprises reassess the value they are getting from Office, especially if they don’t do any customizations or line-of-business apps. How do you convince CIOs there is value there?

Take a people process like an annual performance review. They are usually written in Word, but the end result goes off in some HR system like PeopleSoft or SAP. Budgeting is another very horizontal process. Most companies feel a lot of pain around the workflow and approval processes. They would love for Office to be more seamlessly integrated into their PeopleSoft system or SAP systems. Another good example is Accenture. They’ve written a lot of apps around making SharePoint the Facebook of their company. Traditional skills repositories, where people are supposed to update their skills into a line-of-business app, often struggle despite their over-designed back-end because it’s not a part of anyone’s daily process. With SharePoint, their consultants can articulate what they’re working on in a more unstructured way. The People Search in SharePoint becomes their expertise finder. It feels like a real social networking tool.

Interesting to me.

Stephen E. Arnold, December 17, 2009

I wish to disclose to the General Services Administration that this is an uncompensated post. (As well it should be.)

Circumvallation: Reed Elsevier and Thomson as Vercingetorix

November 27, 2009

Google Scholar Gets Smart in Legal Information

One turkey received a presidential pardon. Other turkeys may not be so lucky on November 26, 2009, when the US celebrates Thanksgiving. I am befuddled about this holiday. There are not too many farmers in Harrod’s Creek. The fields contain the abandoned foundations of McMansions that the present economic meltdown has left like Shelley’s statue of Ozymandias. The “half buried in the sand” becomes half built homes in the horse farm.

As Kentuckians in my hollow give thanks for a day off from job hunting, I am sitting by the goose pond trying to remember what I read in my copy of Caesar’s De Bello Gallico. I know Caesar did not write this memoir, but his PR bunnies did a pretty good job. I awoke this morning thinking about the connection between the battle of Alesia and what is now happening to the publishing giants Reed Elsevier and Thomson Reuters. The trigger for this mental exercise was Google’s announcement that it had added legal content to Google Scholar.

What’s Vercingetorix got to do with Google, Lexis, and Westlaw? Think military strategy. Starvation, death, surrender, and ritual killing. Just what today’s business giants relish.

Google has added the full text of US federal cases and state cases. The coverage of the federal cases, district and appellate, is from 1924 to the present. US state cases cover 1950 to the present. Additional content will be added; for example, I have one source that suggested that the Commonwealth of Virginia Supreme Court will provide Google with CD ROMs of cases back to 1924. Google, according to this source, is talking with other sources of US legal information and may provide access to additional legal information as well. What are these sources? Possibly Public.Resource.Org and possibly Justia.org, among others.

The present service includes:

  • The full text of the legal document
  • Footnotes in the legal document
  • Page numbers in the legal document
  • Page breaks in the legal document
  • Hyperlinks in the legal document to cases
  • A tab to show how the case was cited in other documents
  • Links to non legal documents that cite a case.

You can read various pundits, mavens, and azure chip consultants’ comments on this Google action at this link.

You may want to listen to the podcast TWIL, specifically the November 23, 2009, show, on which Google Scholar was discussed for about a half hour. You can find that discussion on iTunes. Just search for TWIL and download the program “Social Lubricants and Frictions.”

On the surface, the Google push into legal information is a modest amount of data in terms of Google’s daily petabyte flows. The service is easy to use, but the engineering required to provide access to the content strikes me as non-trivial. Content transformation is an expensive proposition, and the cost of fiddling with legal information is one of the primary reasons commercial online services have had to charge hefty fees to look at what amounts to taxpayer supported, public information.

The good news is that the information is free and easily accessible, even from an iPhone or other mobile device. The Google service does the standard Google animal tricks of linking, displaying content with minimal latency, and updating new content within a minute or so of that content becoming available to Google’s software Dyson vacuum cleaner.

So what?

This service is similar to others I have written about in my three Google monographs. Be aware. My studies are not Sergey-and-Larry-eat-pizza books. I look at the Google open source technical and business information. I ignore most of what Google’s wizards “say” in public. These folks are “running the game plan” and add little useful information for my line of work. Your mileage may differ. If so, stop reading this blog post and hunt down a cheerful non-fiction Google book by a real live journalist. That’s not my game. I am an addled goose.

Now let me answer the “so what”.

First, the Google legal content is an incremental effort for the Google. This means that Google’s existing infrastructure, staff, and software can handle the content transformation, parsing, indexing, and serving. No additional big-buck investment is needed. In fact, I have heard that the legal content project, like Google News, was accomplished in the free time for play that Google makes available to its full time professionals. A bit of thought should make clear to you that commercial outfits who have to invest to handle legal content in a Google manner have a cost problem right out of the starting blocks.

Second, Google is doing content processing that should be the responsibility of the US government. I know. I know. The US government wants to create information and not compete with commercial outfits. But the outfits manipulating legal information have priced it so that most everyday Trents and Whitneys cannot afford to use these commercial services. Even some law firms cannot afford these services. Pro bono attorneys don’t have enough money to buy yellow pads to help their clients. Even kind hearted attorneys have to eat before they pay a couple of hundred bucks to run a query on the commercial online services from publicly traded companies out to give their shareholders a great big financial payday. Google is operating like a government when it processes legal information and makes it available without direct charge to the user. The monetization takes place but on a different business model foundation. That also spells T-R-O-U-B-L-E for the commercial online services like Lexis and Westlaw.

Louisville Meet Up Lights Up Ali Center

November 5, 2009

ArnoldIT.com’s TheSeed2020 meet up for women- and minority-owned businesses was a hit. The event, held at the Muhammad Ali Center in downtown Louisville, Kentucky, attracted more than 80 people. The purpose of the event was to explore the effectiveness of social media marketing. The ArnoldIT.com team (Stuart Adams, Esq., Don Anderson, Constance Ard, Shaun Livingston, Keisha Mabry, Rob Redmon, Tony Safina, and Stuart Schram) used Facebook.com, Twitter.com, and email to announce the event. The event’s Web site was produced using the SquareSpace.com service. You can look at the information about the events and peruse each of the presentations at http://www.theseed2020.com. A short video about the event will be made available in the next two weeks and posted on the ArnoldIT.com Web site.

Dr. Emeka Akaezuwa and Tess (ArnoldIT.com’s SharePoint expert) argue about the technical nuances of SQL Server 10.

I noted a number of presentations that were outstanding. I want to highlight Dr. Emeka Akaezuwa’s talk about the principles that have guided him through his operation of the successful Gaviri Technologies software company and his role in the Global Literacy Project. The crowd listened with rapt attention as Dr. Akaezuwa described his journey from Nigeria to his PhD in computer science from Rutgers University to his running a software company, raising a family, and working one month each year in Africa to make it possible for children to learn to read. More information about Dr. Akaezuwa’s company Gaviri is here. Information about the GLP foundation is here. Among many wonderful talks, his set a benchmark at TheSeed2020. I was disappointed that local Louisville business reporters did not avail themselves of the opportunity to speak with Dr. Akaezuwa and other presenters at this event. Their loss in my opinion.

Some of the ArnoldIT.com team. Back row, left to right: Keisha Mabry, MBA and Constance Ard, MLS. Front row, left to right: Don Anderson, Dr. Emeka Akaezuwa, and Stuart Adams, Esq.

Key findings from the event were:

  1. Social media is as labor intensive as more traditional marketing methods. With carefully tailored social media messages, it is possible to reach a larger number of potential attendees than with more traditional methods.
  2. The cost of mounting a social media campaign is the time required to prepare the various messages and materials. An organization jumping into social media marketing without the skill and appropriate human resources may find that the new tools may not be an automatic home run. The ArnoldIT.com motto “Nothing worthwhile comes easy” is a message to consider.
  3. The people tracking social media messages who attend the event are definitely technically aware and computer oriented. The companies represented at the event had individuals in their firm who understood and used social media. One surprise was that a number of the conversations among attendees were about information, search, and online marketing. The program was designed to represent a wide range of businesses, but technology was a unifying factor among the audience.
  4. Sponsors who expect a traditional trade show set up will have to learn new ways of engaging attendees. The emphasis was upon face-to-face conversations and a good social presence. Wallflowers are as forlorn at a meet up as they were at a grade school dance. Attendees were engaging. The two sponsors of the program were out of their element.
  5. Attendees appreciated the opportunity to learn and network. Unlike traditional trade shows where grousing is the conference sport, the attendees at this event were enthusiastic. One person told me that the evening was “fun”; another said, “Joyful”. I learned that these comments made me happy to have had the opportunity to support the event.

One attendee (a minority, female PhD in point of fact) pointed out that there was a single male minority giving a speech. There were two guys. I suppose I will have to muster the strength to speak to Ms Ard and Ms Mabry about their bias toward smart, high-powered, successful females.

To wrap up, ArnoldIT.com has refined its social media communications methods. If your firm wants to move forward with a well-organized, effective meet up, contact seaky2000 at yahoo dot com. The managers of the ArnoldIT.com meet up service are Keisha Mabry, MBA, and Constance Ard, MLS.

I will post a link to the video for the event when it becomes available.

Stephen Arnold, November 5, 2009

I paid myself to write this article about my own business. I even pay the people whom I thanked for their outstanding work. To whom do I report this crass marketing work? Maybe I can email the White House. In case this is not clear, this post is an advertisement, a pitch, a shameless effort to hype my colleagues, and a boastful message about a job well done. Too bad the Louisville business associations could not make this type of business program part of their agenda. Guess those outfits are too busy with more important activities than highlighting individuals who are thriving in a lousy economy. Ooops. I am supposed to disclose, not criticize the status quo. Wow, I am sorry.

European Search Vendor Round Up

September 16, 2009

Updated at 8:29 am, September 17, 2009, to 23 vendors

I received a call from a very energetic, quite important investment wizard from a “big” financial firm yesterday. Based in Europe, the caller was having a bad hair day, and he seemed pushy, almost angry. I couldn’t figure out why he was out of sorts and why he was calling me. I asked him. He said, “I read your Web log and you annoy me with your poor coverage of European search vendors.”

I had to admit that I was baffled. I mentioned the companies that I tracked. But he wanted me to do more. I pointed out that the Web log is a marketing vehicle and he can pay me to cover his favorite investment in search. That really set him off. He wanted me to be a journalist (whatever that meant) and provide more detailed information about European vendors. And for free.

Right.

After the call, I took a moment and went through my files to see which European vendors I have mentioned and the general impression I have of each of these companies. The table below summarizes the companies I have either profiled in my for fee studies or the companies I have mentioned in this diary / marketing Web log. You may disagree with my opinions. I know that the azure chip consultants at Gartner, Ovum, Forrester, and others certainly do. But that’s understandable. The addled geese here in Harrod’s Creek actually install systems and test them, a step that most of the azure chip crowd just don’t have time for because of their exciting work to generate enough revenue to keep the lights on, advise clients, and conduct social network marketing events. Just my opinion, folks. I am entitled to those despite the widespread belief that I should be in the Happy Geese Retirement Home.

Vendor | Function | Opinion
Autonomy | Search and eDiscovery | One of the key players in content processing; good marketing
Bitext | Semantic components | Impressive technology
Brox | Open source semantic tools | Energetic, marketing centric open source play
Empolis GmbH | Information management and business intel | No cash tie with Attensity
Exalead | Next generation application platform | The leader in search and content processing technology
Expert System | Semantic toolkit | Works; can be tricky to get working the way the goslings want
Fast ESP | Enterprise search, business intelligence, and everything else | Legacy of a police investigation hangs over the core technology
InfoFinder | Full featured enterprise search system | My contact in Europe reports that this is a European technology. Listed customers are mostly in Norway.
Interse Scan Jour | SharePoint enterprise search alternative | Based in Copenhagen, the Interse system adds useful access functions to SharePoint; sold in Dec 2008
Intellisearch | Enterprise search; closed US office | Basic search positioned as a one size fits all system
Lemur Consulting | Flax is a robust enterprise search system | I have written positively about this system. Continues to improve with each release of the open source engine.
Lexalytics | Sentiment analysis tools | A no cash merger with a US company and UK based Infonics
Linguamatics | Content processing focused on pharma | Insists that it does not have a price list
Living-e AG | Information management | No cash tie with Attensity
Mindbreeze | Another SharePoint snap in for search | Trying hard; interface confusing to some goslings
Neofonie | Vertical search | Founded in the late 1990s, created Fireball.de
Ontoprise GmbH | Semantic search | The firm’s semantic Web infrastructure product, OntoBroker, is at Version 5.3
Pertimm | Enterprise search | Now positioned as information management
PolySpot | Enterprise search with workflow | Now at Version 4.8, search, work flow, and faceted navigation
SAP Trex | Search tool in NetWeaver; works with R/3 content | Works; getting long in the tooth
Sinequa | Enterprise search with workflow | Now at Version 7, the system includes linguistic tools
Sowsoft | High speed desktop search | Excellent, lightweight desktop search
SurfRay | Now focused on SharePoint | Uncertain; emerging from some business uncertainties
Temis | Content processing and discovery | Original code and integrated components
Tesuji | Lucene enterprise search | Highly usable and speedy; recommended for open source installations

Updated at 8:29 am Eastern, September 17, 2009

Microsoft Fast for Portals

August 17, 2009

Author’s Note: The images in this Web log post are the property of Microsoft Corp. I am capturing my opinion based on a client’s request to provide feedback about “going with Fast for SharePoint” versus a third party solution from a Microsoft Certified Partner. If you want happy thoughts about Microsoft, Fast ESP, and search in SharePoint environments, look elsewhere. If you want my opinions, read on. Your mileage may vary. If you have questions about how the addled goose approaches these write ups, check out the editorial policy here.

Introduction

Portals are back. The idea that a browser provides a “door” to information and applications is hot again. I think. You can view a video called “FAST: Building Search Driven Portals with Microsoft Office SharePoint Server 2007 and Microsoft Silverlight” to get the full story. I went back through my SharePoint search links. I focused on a presentation given in 2008 by two Microsoft Fast engineers, Jan Helge Sageflåt and Stein Danielsen.

After watching the presentation for a second time, I formed several impressions of what seems to be the general thrust of the Microsoft Fast ESP search system. I have heard reports that Microsoft is doing a full court press to get Microsoft-centric organizations to use Fast ESP as the industrial strength search system.

Let me make several observations about the presentation by the Microsoft Fast engineers and then conclude with a suggestion that caution and prudence may be fine dinner companions before one feasts on Fast ESP. Portals are not a substitute for making it easy for employees to locate the item of information needed to answer a run-of-the-mill business information need.

Observations about the 2008 Demo

First, the presentation focuses on building interfaces and making connections to content in SharePoint. Most organizations want to connect to the content scattered on servers, file systems, and enterprise application software data stores. That is job one or it was until the financial meltdown. Now organizations want to acquire, merge, search, and tap into social content. Much of that information has a short shelf life. The 2008 presentation did not provide me with evidence that the Microsoft Fast ESP system could:

  • Acquire large flows of non-SharePoint content
  • Process that information without significant latency
  • Identify the plumbing needed to handle flows of real time content from RSS feeds and the new / updated content from a SharePoint system.
