Quote to Note: Dick Brass on MSFT Innovation

February 6, 2010

I met Dick Brass many years ago. He left Oracle and joined Microsoft to contribute to a confidential initiative. Mr. Brass worked on the ill-fated Microsoft tablet, which Steve Jobs has reinvented as a revolutionary device. I am not a tablet guy, but one thing is certain: Mr. Jobs knows how to work public relations. Mr. Brass published an article in the New York Times, and it captured the attention of Microsoft and millions of readers who enjoyed Mr. Brass’s criticism of his former employer. I have no opinion about Microsoft, its administrative methods, or its ability to innovate. I did find a quote to note in the write up:

Microsoft is no longer considered the cool or cutting edge place to work. There has been a steady exit of its best and brightest. (“Microsoft’s Creative Destruction”, the New York Times, February 4, 2010, Page 25, column 3, National Edition)

Telling, because if smart people don’t work at a company, that company is likely to make less informed decisions than an organization with smarter people. This applies in the consulting world. There are blue chip outfits like McKinsey, Bain, and BCG. Then there are lesser outfits, which I am sure you can name because these companies “advertise”, have sales people who “sell” listings, and invent crazy phrases to create buzz and sales. I am tempted to differentiate Microsoft with a reference to Apple or Google, but I will not. Oh, why did I not post this item before today? The hard copy of my New York Times was not delivered until today. Speed is important in today’s information world.

The quote nails it.

Stephen E Arnold, February 7, 2010

No one paid me to write this, not a single blue chip consulting firm, not a single savvy company. I will report this lack of compensation to the experts at the IRS, which is gearing up for the big day in April.



Featured
Microsoft and Mikojo Trigger Semantic Winds across Search Landscape

Semantic technology is blowing across the search landscape again. The word “semantic” and its use in phrases like “semantic technology” has a certain trendiness. When I see the word, I think of smart software that understands information in the way a human does. I also think of computationally sluggish processes and the complexity of language, particularly in synthetic languages like English. Google has considerable investment in semantic technology, but the company wisely tucks it away within larger systems and avoids the technical battles that rage among different semantic technology factions. You can see Google’s semantic operations tucked within the Ramanathan Guha inventions disclosed in February 2007. Pay attention to the discussion of the system and method for “context”.

[Image: Gale force winds from semantic technology advocates. Image source: http://www.smh.com.au/ffximage/2008/11/08/paloma_wideweb__470x289,0.jpg]

Microsoft’s Semantic Puff

Other companies are pushing the semantic shock troops forward. Yesterday I read Network World’s “Microsoft Talks Up Semantic Search Ambitions.” The article reminded me that Fast Search & Transfer offered some semantic functionality, which I summarized in the 2006 version of the original Enterprise Search Report (the one with real beef, not tofu inside). Microsoft also purchased Powerset, a company that used some of Xerox PARC’s technology and its own wizardry to “understand” queries and create a rich index. The Network World story reported:

With semantic technologies, which also are being referred to as Web 3.0, computers have a greater understanding of relationships between different information, rather than just forwarding links based on keyword searches. The end game for semantic search is “better, faster, cheaper, essentially,” said Prevost, who came over to Microsoft in the company’s 2008 acquisition of search engine vendor Powerset. Prevost is still general manager of Powerset. Semantic capabilities get users more relevant information and help them accomplish tasks and make decisions, said Prevost.

The payoff is that software understands humans. Sounds good, but it does little to alter the startling dominance of Google in general Web search and the rocket-like rise of social search systems like Facebook. In a social context, humans tell “friends” about meaning or, better yet, offer an answer or a relevant link. No search required.

I reported about the complexities of configuring the enterprise search system that Microsoft offers for SharePoint in an earlier Web log post. The challenge is complexity and the time and money required to make a “smart” software system perform to an acceptable level in terms of throughput in content processing and for the user. Users often prefer to ask someone or just use what appears in the top of a search results list.

Read more »
Interviews
Inside Search: Raymond Bentinck of Exalead, Part 2

This is the second part of the interview with Raymond Bentinck of Exalead.

Isn’t this bad marketing?

No. This makes business sense. Traditional search vendors who may claim to have thousands of customers tend to use only a handful of well managed references. This is a direct result of customers choosing technology based on these overblown marketing claims and these claims then driving requirements that the vendor’s consultants struggle to deliver. The customer, who is then far from happy with the results, doesn’t do reference calls and ultimately becomes disillusioned with search in general or with the vendor specifically. Either way, they end up moving to an alternative.

I see this all the time with our clients that have replaced their legacy search solution with Exalead. When we started, we were met with much skepticism from clients that we could answer their information retrieval problems. It was only after doing proofs of concept and delivering the solutions that they became convinced. Now that our reputation has grown, organizations realize that we do not make unsubstantiated claims and do stick by our promises.

What about the shift to hybrid solutions? An appliance or an on premises server, then a cloud component, and maybe some fairy dust thrown in to handle the security issues?

There is a major change that is happening within Information Technology at the moment driven primarily by the demands placed on IT by the business. Businesses want to vastly reduce the operational cost models of IT provision while pushing IT to be far more agile in their support of the business. Against this backdrop, information volumes continue to grow exponentially.

The push towards areas such as virtual servers and cloud computing are aspects of reducing the operational cost models of information technology provision. It is fundamental that software solutions can operate in these environments. It is surprising, however, to find that many traditional search vendors’ solutions do not even work in a virtual server environment.

Isn’t this approach going to add costs to an Exalead installation?

No, because another aspect of this is that software solutions need to be designed to make the best use of available hardware resources. When Exalead provided a solution to the leading classified ads site Fish4.co.uk, unlike the legacy search solution we replaced, not only were we able to deploy a solution that met and exceeded their requirements but we reduced the cost of search to the business by 250 percent. A large part of this was around the massively reduced hardware costs associated with the solution.

What about making changes and responding quickly? Many search vendors simply impose a six month or nine month cycle on a deployment. The client wants to move quickly, but the vendor cannot work quickly.

Agility is another key factor. In the past, an organization may implement a data warehouse. This would take around 12 to 18 months to deploy and would cost a huge amount in hardware, software and consultancy fees. As part of the deployment the consultants needed to second guess the questions the business would want to ask of the data warehouse and design these into the system. After the 12 to 18 months, the business would start using the data warehouse and then find out they needed to ask different types of questions than were designed into the system. The data warehouse would then go through a phase of redevelopment which would last many more months. The business would evolve… making more changes and the cycle would go on and on.

With Exalead, we are able to deploy the same solution in a couple months but significantly there is no need to second guess the questions that the business would want to ask and design them into the system.

This is the sort of agile solution that businesses have been pushing their IT departments to deliver for years. Businesses that do not provide agile IT solutions will fall behind their competitors and be unable to react quickly enough when the market changes.

One of the large UK search vendors has dozens of niche versions of its product. How can that company keep each of these specialty products up to date and working? Integration is often the big problem, is it not?

The founders of Exalead took two years before starting the company to research what worked in search and why the existing search vendors’ products were so complex. This research led them to understand that the search products on the marketplace at the time all started as quite simple products designed to work on relatively low volumes of information and with very limited functional capabilities. Over the years, new functionality has been added to the solutions to keep abreast of what competitors have offered, but because of how the products were originally engineered, these have not been clean integrations. They did not start out with this intention, but search has evolved in ways never imagined at the time these solutions were originally engineered.

Wasn’t one of the key architects part of the famous AltaVista.com team?

Yes. In fact, both of the founders of Exalead were from this team.

What kind of issues occur with these overly complex products?

As you know, this has caused many issues for both vendors and clients. Changes in one part of the solution can cause unwanted side effects in another part. Trying to track down issues and bugs can take a huge amount of time and expense. This is a major factor as to why we see the legacy search products on the market today that are complex, expensive and take many months if not years to deploy even for simple requirements.

Exalead learned from these lessons when engineering our solution. We have an architecture that is fully object-oriented at the core and follows an SOA approach. It means that we can swap in and out new modules without messy integrations. We can also take core modules such as connectors to repositories and, instead of having to rewrite them to meet specific requirements, override various capabilities in the classes. This means that the majority of the code that has gone through our quality-management systems remains the same. If an issue is identified in the code, it is a simple task to locate the problem, and the issue is isolated in one area of the code base. In the past, vendors have had to rewrite core components like connectors to meet customers’ requirements, and this has caused huge quality and support issues for both the customer and the vendor.
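Mr. Bentinck does not show Exalead’s actual connector API, so treat the following as a minimal, hypothetical Python sketch of the override pattern he describes, with invented class and method names: a base connector carries the quality-assured logic, and a customer-specific subclass overrides only the capability that must change.

```python
# Hypothetical illustration of the override pattern described above.
# Class and method names are invented; this is not Exalead's connector API.

class RepositoryConnector:
    """Base connector: shared, quality-assured fetch, security, and mapping logic."""

    def fetch(self, since):
        # Stand-in for the shared crawl logic common to all repositories.
        return [{"id": "doc-1", "text": "sample text", "acl": ["staff"]}]

    def apply_security(self, document):
        # Default: pass the repository's native ACL through untouched.
        return document

    def to_index_record(self, document):
        return {"id": document["id"], "body": document["text"], "acl": document["acl"]}


class CustomCrmConnector(RepositoryConnector):
    """Customer-specific variant: only the security step is overridden;
    the tested fetch and mapping code is reused as-is."""

    def apply_security(self, document):
        document["acl"] = sorted(set(document["acl"]) | {"crm_readers"})
        return document


connector = CustomCrmConnector()
for doc in connector.fetch(since=None):
    print(connector.to_index_record(connector.apply_security(doc)))
```

The point of the pattern is that a customer-specific change stays isolated in one small override rather than in a rewritten connector.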

What about integration? That’s a killer for many vendors in my experience.

The added advantage of this core engineering work means that for Exalead integration is a simple task. For example, building new secure connectors to new repositories can be performed in weeks rather than months. Our engineers can take this time saved to spend on adding new and innovative capabilities into the solution rather than spending time worrying about how to integrate a new function without affecting the 1001 other overlaying functions.

Without this model, legacy vendors have to continually provide point-solutions to problems that tend to be customer-specific leading to a very expensive support headache as core engineering changes take too long and are too hard to deploy.

I heard about a large firm in the US that has invested significant sums in retooling Lucene. The solution has been described on the firm’s Web site, but I don’t see how that engineering cost is offset by the time to market that the fix required. Do you see open source as a problem or a solution?

I do not wake up in the middle of the night worrying about Lucene, if that is what you are thinking! I see Lucene in places that have large engineering teams to protect, or where consultants are more interested in making lots of fees through its complex integration, neither of which adds value to the company in, for example, reducing costs or increasing revenue.

Organizations that are interested in providing cost effective, richly functional solutions are in increasing numbers choosing solutions like Exalead. For example, the University of Sunderland wanted to replace their Google Search Appliance with a richer, more functional search tool. They looked at the marketplace and chose Exalead for searching their external site and their internal document repositories, plus providing business intelligence solutions over their database applications such as student attendance records. The search on their website was developed in a single day, including the integration to their existing user interface and the faceted navigation capabilities. This represented not only an exceptionally quick implementation, far in excess of any other solution on the marketplace today, but it also delivered for them the lowest total cost of ownership compared to other vendors and, of course, open source.

In my opinion, Lucene and other open-source offerings can offer a solution for some organizations but many jump on this bandwagon without fully appreciating the differences between the open source solution and the commercially available solutions either in terms of capability or total cost. It is assumed, wrongly in many instances, that the total cost of ownership for open source must be lower than the commercially available solutions. I would suggest that all too often, open source search is adopted by those who believe the consultants who say that search is a simple commodity problem.

What about the commercial enterprise that has had several search systems and none of them capable of delivering satisfactory solutions? What’s the cause of this? The vendors? The client’s approach?

I think the problem lies more with the vendors of the legacy search solutions than with the clients. Vendors have believed their own marketing messages and, when customers are unsatisfied with the results, have tended to blame the customers for not understanding how to deploy the product correctly or, in some cases, the third party or system integrator responsible for the deployment.

One client of ours told me recently that with our solution they were able to deliver in a couple months what they failed to do with another leading search solution for seven years. This is pretty much the experience of every customer where we have replaced an existing search solution. In fact, every organization that I have worked with that has performed an in-depth analysis and comparison of our technology against any search solution has chosen Exalead.

In many ways, I see our solution as not only delivering on our promises but also delivering on the marketing messages that our competitors have been promoting for years but failing to deliver in reality.

So where does Exalead fit? The last demo I received showed me search working within a very large, global business process. The information just appeared? Is this where search is heading?

In the year 2000, and every year since, a CEO of one of the leading legacy search vendors made a claim that every major organization would be using their brand of meaning based search technology within two years.

I will not be as bold as him, but it is my belief that in less than five years’ time the majority of organizations will be using search based applications in mission critical applications.

For too long software vendors have been trying to convince organizations, for example, that it was not possible to deploy mission critical solutions such as a 360 degree customer view, Master Data Management, Data Warehousing or business intelligence solutions in a couple months, with no user training, with up-to-the-minute information, with user friendly interfaces, and with a low cost per query covering millions or billions of records of information.

With Exalead this is possible and we have proven it in some of the world’s largest companies.

How does this change the present understanding of search, which in my opinion is often quite shallow?

Two things are required to change the status quo.

Firstly, a disruptive technology is required that can deliver on these requirements and secondly businesses need to demand new methods of meeting ever greater business requirements on information.

Today I see both these things in place. Exalead has proven that our solutions can meet the most demanding of mission critical requirements in an agile way and now IT departments are realizing that they cannot support their businesses moving forward by using traditional technologies.

What do you see as the trends in enterprise search for 2010?

Last year was a turning point for Search Based Applications. With the world-wide economy in recession, many companies put projects on hold until things looked better. With economies still looking rather weak and projects not able to be left on ice forever, companies are starting to question the value of using expensive, time consuming and rigid technologies to deliver these projects.

Search is a game changing technology that can deliver more innovative, agile and cheaper solutions than using traditional technologies. Exalead is there to deliver on this promise.

Search, a commodity solution? No.

Editor’s note: You can learn more about Exalead’s search enabled applications technology and method at the Exalead Web site.

Stephen E Arnold, February 4, 2010

I wrote this post without any compensation. However, Mr. Bentinck, who lives in a far off land, offered to buy me haggis, and I refused this tasty bribe. Ah, lungs! I will report the lack of payment to the National Institutes of Health, an outfit concerned about alveoli.
Profiles
Vyre: Software, Services, Search, and More

A happy quack to the reader who sent me a link to Vyre, whose catchphrase is “dissolving complexity.” The last time I looked at the company, I had pigeonholed it as a consulting and content management firm. The news release my reader sent me pointed out that the company has a mid-market enterprise search solution that is now at version 4.x. I am getting old, or at least too sluggish to keep pace with content management companies that offer search solutions. My recollection is that Crown Point moved in this direction. I have a rather grim view of CMS because software cannot help organizations create high quality content, or at least what I think is high quality content.

The Wikipedia description of Vyre matches up with the information in my archive:

VYRE, now based in the UK, is a software development company. The firm uses the catchphrase “Enterprise 2.0” to describe its enterprise solutions for business. The firm’s core product is Unify. The Web based service allows users to build applications and manage content. The company has technology that manages digital assets. The firm’s clients in 2006 included Diageo, Sony, Virgin, and Lowe and Partners. The company has reinvented itself several times since the late 1990s, doing business as NCD (Northern Communication and Design), Salt, and then Vyre.

You can read the Wikipedia summary here. You can read a 2006 Butler Group analysis here. My old link worked this evening (March 5, 2009), but click quickly. In my files I had a link to a Vyre presentation, but it was not about search. Dated 2008, you may find the information useful. The Vyre presentations are here. The link worked for me on March 5, 2009. The only name I have in my archive is Dragan Jotic. Other names of people linked to the company are here. Basic information about the company’s Web site is here. Traffic, if these data are correct, seems to be trending down. I don’t have current interface examples. The wiki for the CMS service is here. (Note: the company does not use its own CMS for the wiki. The wiki system is MediaWiki. No problem for me, but I was curious about this decision because the company offers its own CMS system.) You can get a taste of the system here.

[Image: Administrative Vyre screen.]

After a bit of poking around, it appears that Vyre has turned up the heat on its public relations activities. The Seybold Report presented a news story / news release about the search system here. I scanned the release and noted this passage as interesting for my work:

…version 4.4 introduces powerful new capabilities for performing facetted and federated searching across the enterprise. Facetted search provides immediate feedback on the breakdown of search results and allows users to quickly and accurately drill down within search results. Federated search enables users to eradicate content silos by allowing users to search multiple content repositories.

Vyre includes a taxonomy management function with its search system, if I read the Seybold article correctly. I gravitate to the taxonomy solution available from Access Innovations, a company run by my friends and colleagues Marje Hlava and Jay Ven Eman. Their system generates ANSI standard thesauri and word lists, which is the sort of stuff that revs my engine.

Endeca has been the pioneer in the enterprise sector for “guided navigation”, which is a synonym in my mind for faceted search. Federated search gets into the functions that I associate with Bright Planet, Deep Web Technologies, and Vivisimo, among others. I know that shoving large volumes of data through systems that both facetize and federate content is computationally intensive. Consequently, some organizations are not able to put the plumbing in place to make these computationally intensive systems hum like my grandmother’s sewing machine.
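To make the computational point concrete, here is a toy Python sketch of facet counting with invented fields and documents; a production system has to do this across millions of processed records for every query, which is where the plumbing bill comes from.

```python
from collections import Counter, defaultdict

# Invented documents standing in for processed content.
results = [
    {"title": "Q3 sales plan", "department": "marketing", "year": 2009},
    {"title": "Budget memo", "department": "finance", "year": 2009},
    {"title": "Campaign brief", "department": "marketing", "year": 2008},
]

def facet_counts(docs, fields):
    """Count how many hits fall under each value of each facet field."""
    counts = defaultdict(Counter)
    for doc in docs:
        for field in fields:
            counts[field][doc[field]] += 1
    return counts

print(facet_counts(results, ["department", "year"]))
# {'department': Counter({'marketing': 2, 'finance': 1}),
#  'year': Counter({2009: 2, 2008: 1})}
```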

If you are in the market for a CMS and asset management company’s enterprise search solution, give the company’s product a test drive. You can buy a report from UK Data about this company here. I don’t have solid pricing data. My notes to myself record the phrase, “Sensible pricing.” I noted that the typical cost for the system begins at about $25,000. Check with the company for current license fees.

Stephen Arnold, March 6, 2009
Latest News
Mobile Devices and Their Apps: Search Gone Missing

VentureBeat’s “A Pretty Chart of Top Apps for iPhone, Android, BlackBerry” shocked me. Not a little. Quite a bit. You will want to look at the top apps f

Lazarus, Azure Chip Consultants, and Search

January 8, 2010

A person called me today to tell me that a consulting firm is not accepting my statement “Search is dead”. Then I received a spam email that said, “Search is back.” I thought, “Yo, Lazarus. There be lots of dead search vendors out there. Example: Convera.”

Who reports that search has risen? An azure chip consultant! Here’s what raced through my addled goose brain as I pondered the call and the “search is back” T shirt slogan:

In 2006, I was sitting on a pile of research about the search market sector. The data I collected included:

  • Interviews with various procurement officers, search system managers, vendors, and financial analysts
  • My own profiles of about 36 vendors of enterprise search systems plus the automated content files I generate using the Overflight system. A small scale version is available as a demo on ArnoldIT.com
  • Information I had from my work as a systems engineering and technical advisor to several governments and their search system procurement teams
  • My own experience licensing, testing, and evaluating search systems for clients. (I started doing this work after we created The Point (Top 5% of the Internet) in 1993 and sold it to Lycos, a unit of CMGI. I figured I should look into what Lycos was doing so I could speak with authority about its differences from BRS/Search, InQuire, Dialog (RECON), and IBM STAIRS III. I had familiarity with most of these systems through various projects in my pre-Point (Top 5% of the Internet) life.)
  • My Google research funded by the now-defunct Bear Stearns outfit and a couple of other well heeled organizations.

What was clear in 2006 was the following:

First, most of the search system vendors shared quite a bit of similarity. Despite the marketing baloney, the key differentiators among the flagship systems in 2006 were minor. Examples range from their basic architecture to their use of stemming to the methods of updating indexes. There were innovators, and I pointed out these companies in my talks and various writings, including the three editions of the Enterprise Search Report I wrote before I fell ill in February 2007 and quit doing that big encyclopedia type publication. These similarities made it very clear to me that innovation for enterprise search was shifting from the plain old keyword indexing of structured records available since the advent of RECON and STAIRS to a more freeform approach with generally lousy relevance.

[Image: Get information access wrong, and some folks may find a new career. Source: http://www.seeing-stars.com/Images/ScenesFromMovies/AmericanBeautyMrSmiley%28BIG%29.JPG]

Second, the more innovative vendors were making an effort in 2006 to take a document and provide some sort of context for it. Without a human indexer to assign a classification code to a document that is about marketing but does not contain the word “marketing”, this was rocket science. But when I examined these systems, there were two basic approaches, which are still around today. The first was to use statistical methods to put documents together and make inferences; the other was a variation on human indexing but without humans doing most of the work. The idea was that a word list would contain synonyms. There were promising demonstrations of software methods that could “read” a document, but these were piggy and of use only where money was no object.
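As a rough, hypothetical illustration of the second approach, the sketch below uses an invented synonym word list to tag a document that is about marketing even though the word “marketing” never appears in it. The systems of that era were far more elaborate, but the principle is the same.

```python
# Illustrative only; the vocabulary, threshold, and document are invented.

CONTROLLED_VOCABULARY = {
    "marketing": {"campaign", "branding", "promotion", "advertising"},
    "finance": {"budget", "forecast", "revenue", "audit"},
}

def assign_classification(text, vocabulary, threshold=2):
    """Assign a category code when enough synonyms from its word list appear."""
    lowered = text.lower()
    codes = []
    for code, synonyms in vocabulary.items():
        hits = sum(1 for term in synonyms if term in lowered)  # crude substring match
        if hits >= threshold:
            codes.append(code)
    return codes

doc = "The branding team launched a new promotion and an advertising push."
print(assign_classification(doc, CONTROLLED_VOCABULARY))  # ['marketing']
```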

Third, the Google approach, which used social methods—that is, a human clicking on a link—was evident but not migrating to the enterprise world. Google was new, but to make their 2006 method hum, lots of clicks were needed. In the enterprise, most documents never get clicked, so the 2006 Google method was truly lousy. Google has made improvements, mostly by implementing the older search methods, not by pushing the envelope as it has been doing with its Web search and dataspace efforts.

Fourth, most of the search vendors were trying like Dickens to get out of a “one size fits all” approach to enterprise search. Companies making sales were focusing on a specific niche or problem and selling a package of search and content processing that solved one problem. The failure of the boil-the-ocean approach was evident because user satisfaction data from my research, funded by a government agency and other clients, revealed that about two thirds of the users of an enterprise search system were dissatisfied or very dissatisfied with that search system. The solution, then, was to focus. My exemplary case was the use of the Endeca technology to allow Fidelity UK sales professionals to increase their productivity with content pushed to them using the Endeca system. The idea was that a broker could click on a link and the search results were displayed. No searching required. ClearForest got in the game by analyzing dealer warranty repair comments. Endeca and ClearForest were harbingers of focus. ClearForest is owned by Thomson Reuters and is in the open source software game too.

When I wrote the article in Online Magazine for Barbara Quint, one of my favorite editors, I explained these points in more detail. But it was clear that the financial pressures on Convera, for example, and the difficulty some of the more promising vendors like Entopia were having made the thin edge of survival glint in my desk lamp’s light. Autonomy by 2006 had shifted from search and organic growth to inorganic growth fueled by acquisitions that were adjacent to search.

Read more

Can Microsoft and Its Petascale Financial Services Mining Project Succeed?

December 1, 2009

The goslings and I were chattering and quacking in Harrod’s Creek. One of our cousins was killed and eaten for an American holiday. What a way for our beloved friend and colleague to go: deep fried in an oil drum behind the River Creek Inn.

As we were recalling the best moments in Theodore the Turkey’s life, we discussed the likelihood of Microsoft’s petascale content mining project hitting a home run. The idea, as we addled geese understand it, is that Microsoft wants to process lots of content and generate high value insights for the money crazed MBAs in the world’s leading financial institutions.

The project tackles a number of tough technical problems; for example, getting around the inherent latency in petascale systems, dealing with the traditional input output balkiness of Windows plumbing, and crunching enough data with sufficient accuracy to make the exercise worth the time of the financial client. You may find my earlier post germane.

Other outfits are in this game as well. Some are focused on the hardware / firmware / software side like Exegy. Others provide toolkits like Kapow Technologies. Some Beltway Bandits operate low profile content filtering systems for governmental and commercial clients. And there is the old nemesis, Googzilla, happily chewing through one trillion documents every few days. Finally, some of the financial institutions themselves have pumped support into outfits like Connotate. Even the struggling Time Warner owns some nifty technology in the Relegence unit. So, what’s new?

Three thoughts as I prepare to comment about the push into perceived real time processing at the International Online Show:

  1. The cost of slashing latency with any type of content is going to be one expensive proposition. Not even some governments have the cash to muscle up a server with terabytes of RAM. Yep, terabytes.
  2. Figuring out what process left another process gasping for air requires some programmers who can plow through code looking for an errant space, an undefined variable, or a bit of an Assembler hack that pushed when it should have popped.
  3. Latency outside the span of control of the system can render some outputs just plain wrong. Delay is bad; bad outputs are even worse.

If you have not been tracking Microsoft’s big initiatives, you may want to spend some more time grinding through the ACM and other scholarly papers such as “Towards Loosely Coupled Programming on Petascale Systems” and poking around on the Microsoft Web site. To find useful stuff, I use the Google Microsoft index. If you aren’t familiar with it, check it out here.

I wonder if this stuff will be part of SharePoint 2011 and available as a Microsoft Fast ESP plug in?

Stephen Arnold, December 1, 2009

Yes, oh, yes. Let me disclose to the National Institute of Standards and Technology that I was not paid to write this humorous essay. Consider it a name day present. If I am late, that’s latency. If I am early, that’s predictive output.

Microsoft and the Cloud Burger: Have It Your Way

November 19, 2009

I am in lovely and organized Washington, DC, courtesy of MarkLogic. The MarkLogic events pull hundreds of people, so I go where the action is. Some of the search experts are at a search centric show, but search is a bit yesterday in my opinion. There’s a different content processing future, and I want to be prowling that busy boulevard, not sitting alone on a bench in the autumn of a market sector.

The MarkLogic folks wanted me to poke my nose into its user meeting. That was a good experience. And now I am cooling my heels for a Beltway Bandit client. I have my watch and my wallet. With peace of mind, I thought I would catch up on my newsreader goodies.

I read with some surprise “Windows Server’s Plan to Move Customers Back Off the Cloud” in Betanews. As I understand the news story, Microsoft wants its customers to use the cloud, the Azure service. Then, when fancy strikes, the customer can license on premises software and populate big, hot, expensive to maintain servers in the licensee’s own data center. I find the “have it your own way” approach appealing. I was under the impression that the future was the cloud. If I understand this write up, the cloud is not really the future. The “future” is the approach to computing that has been here since I took my first computer programming class in 1963 or so.

I found this passage in the article interesting:

“If you write your code for Windows Server AppFabric, it should run on Windows Azure,” said Ottaway, referring to the new mix-and-match composite applications system for the IIS platform. “What we are delivering in 2010 is a CTP [community technology preview] of AppFabric, called Windows Azure AppFabric, where you should be able to take the exact same code that you wrote for Windows Server AppFabric, and with zero or minimal refactoring, be able to put it up on Windows Azure and run it.” AppFabric for now appears to include a methodology for customers to rapidly deploy applications and services based on common components. But for many of these components, there will be analogs between the on-Earth and off-Earth versions, if you will, such that all or part of these apps may be translated between locales as necessary.

Note the “shoulds”. Also, there’s a “may be”. Great. What does this “have it your own way” mean for enterprise search?

First, I don’t think that the Fast ESP system is going to be as adept as either Blossom, Exalead, or Google at indexing and serving results from the cloud for enterprise customers. The leader in this segment is not Google. I would give the nod to Blossom and Exalead. There’s no “should” with these systems. Both deliver.

Second, the latency for a hybrid application when processing content is going to be an interesting challenge for those brave enough to tackle the job. I recall some issues with other vendors’ hybrid systems. In fact, these performance problems were among the reasons that these vendors are not exactly thriving today. Sorry, I cannot mention names. Use your imagination or sift through the articles I have written about long gone vendors.

Third, Microsoft is working from established code bases and adding layers—wrappers, in my opinion—to these existing chunks of code. That’s an issue for me because weird stuff can happen. Yesterday one Internet service provider told me that his shop was sticking with SQL Server 2000. “We have it under control”, he said. With new layers of code, I am not convinced that those building a cloud and on premises solution using SharePoint 2010 and the “new” Fast ESP search system are going to have stress free days.

In short, more Microsoft marketing messages sound like IBM’s marketing messages. Come to think of it, hamburger chains have a similar problem. I think this play is jargon for finding ways to maximize revenues, not efficiencies for customers. When I go to a fast food chain, no matter what I order, the stuff tastes the same and delivers the same health benefits. And there’s a “solution accelerator.” I will have pickles with that. Just my opinion.

Stephen Arnold, November 19, 2009

I hereby disclose to the Internal Revenue Service and the Food and Drug Administration that this missive was written whilst waiting for a client to summon me to talk about topics unrelated to this post. This means that the write up is a gift. Report it as such on your tax report and watch your diet.

Analyst Underestimates Impact of Microsoft and Google in Enterprise Search

October 2, 2009

I have written a report for the Gilbane Group and think highly of the firm’s analysis. However, a recent news item forwarded to me by a reader of this Web log triggered some conversation in Harrod’s Creek today. The article was “Competition among Search Vendors,” and it was dated in early August 2009. The article included this assertion:

This additional news item indicates that Microsoft is still trying to get their search strategy straightened out with another new acquisition, Applied Discovery Selects Microsoft FAST for Advanced E-Discovery Document Search. E-discovery is a hot market in legal, life sciences and financial verticals but firms like ISYS, Recommind, Temis, and ZyLab are already doing well in that arena. It will take a lot of effort to displace those leaders, even if Microsoft is the contender. Enterprises are looking for point solutions to business problems, not just large vendors with a boatload of poorly differentiated products.

In the last three months, I have noticed an increase in the marketing activity from quite a few enterprise search and content processing vendors. On one hand, there are SharePoint surfers. These are vendors who don’t want to be an enterprise solution. The vendors are content to license their technology to some of the 100 million SharePoint licensees. Forget search. SharePoint is a market that can be harvested.

On the other hand, there are vendors in the process of changing their stripes, and not just once. Some companies have moved from intelligence applications to marketing to sentiment to call center applications. I have to tell you. I don’t know if some of these companies are even in the search business any more.

Looming over the entire sector are the shadows of Google and Microsoft. Each has a different strategy for sucking revenue blood from the other. Smaller vendors are going to need agility to avoid getting hurt when these two semi-clueless giants rampage across the enterprise to do battle with one another.

The notion that niches will support the more than 300 companies in the search and content processing market may be optimistic. I wanted to use a screen shot from the TV show Fantasy Island, but I won’t. A number of search vendors are gasping for oxygen now. I am keen to keep a sunny outlook on life. But when it comes to search and content processing, realism is useful. Just ask some of the musical chair executives making the rounds among the search and content processing companies.

What will Google and Microsoft do to get shelf space in the enterprise? What do big companies with deep pockets typically do? Smaller companies can’t stay in this winner take all game.

Stephen Arnold, October 2, 2009

Exclusive Interview with SurfRay President

September 29, 2009

SurfRay has come roaring back into the search and content processing sector. SurfRay, like many other companies, had to tighten its belt and straighten its tie in the wake of the global financial turmoil. With the release of new versions of Ontolica (a search system that snaps into SharePoint) and MondoSearch (a platform independent Web site search solution), SurfRay is making sales again. ArnoldIT.com spoke with Søren Pallesen about the firm’s new products. In a wide ranging interview, Mr. Pallesen, a former Gartner Group consultant, said:

SurfRay’s mission is to deliver tightly packaged search software solutions for businesses to provide effective search for both internal and external users. By packaged we mean easy to try, install, and use. Our vision is to be our customer’s first choice for packaged enterprise search solutions and to become the third largest search solution provider in the world measured on number of paying business customers by 2012. The last six months have been an exciting time for SurfRay. I took over as CEO; we significantly increased investment in product development and began an ambitious expansion of the organization. This has paid off. SurfRay is profitable, and we have released new versions of our core products. Ontolica is now in version 4.0, including a new suite of reporting and analytics, and MondoSearch 5.4 is in beta for a Q4 release. As a profitable company we are in the fortunate position to be able to fund our own growth, and we are expanding in North America, among other things by hiring more sales people and forming a Search Expert Center in Vancouver, Canada, that will serve customers across the Americas. We are also expanding in Europe, most recently with the formation of SurfRay UK and Ireland, allowing us to expand sales and support with local people on the ground in this important European market.

When asked about the difference between MondoSearch and Ontolica, Mr. Pallesen told me:

Customers that buy our products typically fall into a number of usage scenarios. Simply put, Ontolica solves search problems inside the firewall and MondoSearch outside the firewall. Firstly, customers with SharePoint implementations look for enhanced search functionality and turn to our Ontolica for SharePoint product. Secondly, businesses that do not use SharePoint but need an internal search solution for an intranet, file servers, email, applications, and other sources buy Ontolica Express and use it in combination with Microsoft Search Server Express for a simple single server installation or Microsoft Search Server for multiple load balanced server installations. Thirdly, customers with the need for robust and highly configurable Web site search buy MondoSearch. It is especially popular with businesses that want to implement up-selling and cross-selling on their search results page.

You can read the full text of the interview in the Search Wizards Speak series on ArnoldIT.com. For more information about SurfRay, visit the company’s revamped Web site at http://www.surfray.com.

Stephen Arnold, September 29, 2009

Explaining the Difference between Fast ESP and MOSS 2007 Again

September 7, 2009

When a company offers multiple software products to perform a similar function, I get confused. For example, I have a difficult time explaining to my 88 year old father the differences among Notepad, WordPad, Microsoft Works’ word processing, Microsoft Word’s word processing, and the Microsoft Live Writer he watched me use to create this Web log post. I think it is an approach like the one the genius at Ragu spaghetti sauce used to boost sales of that condiment. When my wife sends me to the store to get a jar of Ragu spaghetti sauce, I have to invest many minutes figuring out what the heck is the one I need. Am I the only male who cannot differentiate between Sweet Tomato Basil and Margherita? I think Microsoft has taken a different angle of attack because when I acquired a Toshiba netbook, the machine had Notepad, WordPad, and Microsoft Works installed. I added a version of Office and also the Live Writer blog tool. Some of these were “free” and other products came with my MSDN subscription.

Now the same problem has surfaced with basic search. I read “FAST ESP versus MOSS 2007 / Microsoft Search Server” with interest. Frankly I could not recall if I had read this material before, but quite a bit seemed repetitive. I suppose when trying to explain the differences among word processors, the listener hears a lot of redundant information as well.

The write up begins:

It took me some time but i figured out some differences between Microsoft Search Server / MOSS 2007 and Microsoft FAST ESP. These differences are not coming from Microsoft or the FAST company. But it came to my notice that Microsoft and FAST will announce a complete and correct list with these differences between the two products at the conference in Las Vegas next week.These differences will help me and you to make the right decisions at our customers for implementing search and are based on business requirements.

Ah, what’s different is that this is a preview of the “real” list of differences. Given the fact that the search systems available for SharePoint choke and gasp when the magic number of 50 million documents is reached, I hope that the Fast ESP system can handle the volume of information objects that many organizations have on their systems at this time.

The list in the Bloggix post numbers 14. Three interested me:

  1. Scalability
  2. Faceted navigation
  3. Advanced federation.

Several observations:

First, scalability is an issue with most search systems. Some companies have made significant technical breakthroughs to make adding gizmos painless and reasonably economical. Other companies have made the process expensive, time consuming, and impossible for the average IT manager to perform. I heard about EMC’s purchase of Kazeon. I thought I heard that someone familiar with the matter pointed to problems with the Fast ESP architecture as one challenge for EMC. In order to address the issue, EMC bought Kazeon. I hope the words about “scalability” are backed up with the plumbing required to deliver. Scaling search is a tough problem, and throwing hardware at hot spots is, at best, a very costly dab of Neosporin.

Second, faceted navigation exists within existing MOSS implementations. I think I included screenshots of faceted navigation in the last edition of the Enterprise Search Report I wrote in 2006 and 2007. There was a blue interface and a green interface. Both of these made it possible to slice and dice results by clicking on an “expert”, identified by counting the number of documents a person wrote with a certain word in them. There were other facets available as well, although most were more sophisticated than the “expert” function. I hope that the “new” Fast ESP implements a more useful approach for its users. Of course, identifying, tagging, and linking facets across processed content requires appropriate computing resources. That brings us back to scaling, doesn’t it? Sorry.

Third, federation is a buzz word that means many different things because vendors define the term in quite distinctive ways. For example, Vivisimo federates, and it is, or was at one time, a metasearch system. The query went to different indexing services, brought back the results, deduplicated them, put the results in folders on the fly, and generated a results list. Another type of federation surfaces in the descriptions of business intelligence systems offered by SAS. The system blends structured and unstructured data within the SAP “environment”. Others are floating around as well, including the repository solutions from TeraText, which federates disparate content into one XML repository. What I find interesting is that Microsoft is not delivering “federation”, which is undefined. Microsoft is, according to the Bloggix post, on the trail of “advanced federation”. What the heck does that mean? The explanation is:

FAST ESP supports advanced federation including sending queries to various web search APIs, mixing results, and shallow navigation. MOSS only supports federation without mixing of results from different sources and navigation components, but showing them separately.

Okay, Vivisimo and SAP style for Fast ESP; basic tagging for MOSS. Hmm.
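For readers who have not seen metasearch-style federation up close, here is a small, hypothetical Python sketch of the Vivisimo-style flow described above: send the query to several sources, merge the lists, and deduplicate. The source functions, result shapes, and deduplication rule are invented placeholders, not any vendor’s actual API.

```python
# Hypothetical federation sketch; all sources and result shapes are stand-ins.

def search_intranet(query):
    return [{"url": "http://intranet/policy", "title": "Travel policy"}]

def search_crm(query):
    return [{"url": "http://crm/account/42", "title": "Account 42 notes"},
            {"url": "http://intranet/policy", "title": "Travel policy"}]

def federated_search(query, sources):
    """Fan the query out to every source, then merge and deduplicate the hits."""
    seen, merged = set(), []
    for source in sources:
        for hit in source(query):
            if hit["url"] not in seen:  # crude deduplication on URL only
                seen.add(hit["url"])
                merged.append(hit)
    return merged

print(federated_search("travel policy", [search_intranet, search_crm]))
```

Doing this at enterprise scale, with security trimming and mixed result ranking, is where the “advanced” part presumably earns its keep.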

To close, I think that the Fast ESP product is going to add a dose of complexity to the SharePoint environment. Despite Google’s clumsy marketing, the Google Search Appliance continues to gain traction in many organizations. Google’s solution is not cheap. People want it. I think Fast ESP is going to find itself in a tough battle for three reasons:

  1. Google is a hot brand, even within SharePoint shops
  2. Microsoft certified search solutions are better than Fast ESP based on my testing of search systems over the past decade
  3. The cost savings pitch is only going to go so far. CFOs eventually will see the bills for staff time, consulting services, upgrades, and search related scaling. In a lousy financial environment, money will be a weak point.

I look forward to the official announcement about Fast ESP; the $1.2 billion Microsoft spent for this company is now going to have to deliver. I find it unfortunate that the police investigation of alleged impropriety at Fast Search & Transfer has not been resolved. If a product is as good as Fast ESP was advertised to be, what went wrong with the company, its technology, and its customer relations prior to the Microsoft buy out? I guess I have to wait for more information on these matters. When a company has a lot of different products with overlapping and similar services, the message I get is more like the Ragu marketing model, not the solving of customer problems in a clear, straightforward way. Sigh. Marketing, not technology, fuels enterprise search these days I fear.

Stephen Arnold, September 7, 2009

Microsoft Fast for Portals

August 17, 2009

Author’s Note: The images in this Web log post are the property of Microsoft Corp. I am capturing my opinion based on a client’s request to provide feedback about “going with Fast for SharePoint” versus a third party solution from a Microsoft Certified Partner. If you want happy thoughts about Microsoft, Fast ESP, and search in SharePoint environments, look elsewhere. If you want my opinions, read on. Your mileage may vary. If you have questions about how the addled goose approaches these write ups, check out the editorial policy here.

Introduction

Portals are back. The idea that a browser provides a “door” to information and applications is hot again. I think. You can view a video called “FAST: Building Search Driven Portals with Microsoft Office SharePoint Server 2007 and Microsoft Silverlight” to get the full story. I went back through my SharePoint search links. I focused on a presentation given in 2008 by two Microsoft Fast engineers, Jan Helge Sageflåt and Stein Danielsen.

After watching the presentation for a second time, I formed several impressions of what seems to be the general thrust of the Microsoft Fast ESP search system. I have heard reports that Microsoft is doing a full court press to get Microsoft-centric organizations to use Fast ESP as the industrial strength search system.

Let me make several observations about the presentation by the Microsoft Fast engineers and then conclude with a suggestion that caution and prudence may be fine dinner companions before one feasts on Fast ESP. Portals are not a substitute for making it easy for employees to locate the item of information needed to answer a run-of-the-mill business information need.

Observations about the 2008 Demo

First, the presentation focuses on building interfaces and making connections to content in SharePoint. Most organizations want to connect to the content scattered on servers, file systems, and enterprise application software data stores. That is job one or it was until the financial meltdown. Now organizations want to acquire, merge, search, and tap into social content. Much of that information has a short shelf life. The 2008 presentation did not provide me with evidence that the Microsoft Fast ESP system could:

  • Acquire large flows of non-SharePoint content
  • Process that information without significant latency
  • Identify the plumbing needed to handle flows of real time content from RSS feeds and the new / updated content from a SharePoint system.

Read more

Microsoft Embraces Scale

August 4, 2009

The year was 2002. A cash-rich, confused outfit paid me to write a report about Google’s database technology. In 2002, Google was a Web search company with some good buzz among the alleged wizards of Web search. Google did not have much to say when its executives gave talks. I recall an exchange between me and Larry Page at the Boston Search Engine Meeting in 1999. The topic? Truncation. Now that has real sizzle among the average Web surfer. I referenced an outfit called InQuire, which supported forward truncation. Mr. Page asserted that Google did not have to fool around with truncation. The arguments bored even those who were search experts at the Boston meeting.

I realized then that Google had some very specific methods, and those methods were not influenced by the received wisdom of search as practiced at Inktomi or Lycos, to name two big players in 2000. So I began my research by looking for differences between conventional practice and what Google engineers were revealing in their research papers. I compiled a list of differences. I won’t reference my Google studies, because in today’s economic climate, few people are buying $400 studies of Google or much else for that matter.

I flipped through some of the archives I have on one of my back up devices. I did a search for the word “scale”, and I found that it was used frequently by Google engineers and also by Google managers. Scale was a big deal to Google from the days of BackRub, according to my notes. BackRub did not scale. Google, scion of BackRub, was engineered to scale.

The reason, evident to Messrs. Brin and Page in 1998, was that the problem with existing Web search systems was that the operators ran out of money for exotic hardware needed to keep pace with the two rapidly generating cells of search: traffic and new / changed content. The stroke of genius, as I have documented in my Google studies, was that Google tackled the engineering bottlenecks. Other search companies such as Lycos lived with the input output issues, the bottlenecks of hitting the disc for search results, and updating indexes by brute force methods. Not the Google.

Messrs. Brin and Page hired smart men and women whose job was to “find a solution”. So engineers from AltaVista, Bell Labs, Sun Microsystems, and other places where bright folks get jobs worked to solve these inherent problems. Without solutions, there was zero chance that Google could avoid the fate of the Excites, the OpenText Web index, and dozens of other companies that had no way to grow without consuming the equivalent of a gross domestic product for hardware, disc space, bandwidth, chillers, and network devices.

Google’s brilliance (yes, brilliance) was to resolve in a cost effective way the technical problems that were deal breakers for other search vendors. AltaVista was a pretty good search system, but it was too costly to operate. When the Alpha computers were online, you could melt iron ore, so the air conditioning bill was a killer.

Keep in mind that Google has been working on resolving bottlenecks and plumbing problems for more than 11 years.

I read “Microsoft’s Point Man on Search—Satya Nadella—Speaks: It’s a Game of Scale” and I shook my head in disbelief. Google operates at scale, but scale is a consequence of Google’s solutions to getting results without choking a system with unnecessary disc reads. Scale is a consequence of using dirt cheap hardware that is mostly controlled by smart software interacting with the operating system and the demands users and processes make on the system. Scale is a consequence of figuring out how to get heat out of a rack of servers and replacing conventional uninterruptable power supplies with on-motherboard batteries from Walgreen’s to reduce electrical demand, heat, and cost. Scale comes from creating certain proprietary bits of hardware AND software to squeeze efficiencies out of problems caused by the physics of computer operation.

If you navigate to Google and poke around you will discover “Publications by Googlers”. I suggest that anyone interested in Google browse this list of publications. I have tried to read every Google paper, but as I age, I find I cannot keep up. The Googlers have increased their output of research into plumbing and other search arcana by a factor of 10 since I first began following Google’s technical innovations. Here’s one example to give you some context for my comments about Mr. Nadella’s remarks, reported by All Things Digital; to wit: “Thwarting Virtual Bottlenecks in Multi-Bitrate Streaming Servers” by Bin Liu and Raju Rangaswami (academics) and Zoran Dimitrijevic (Googler). Yep, there it is in plain English—an innovation plus hard data that shows that Google’s system anticipates bottlenecks. Software makes decisions to avoid these “virtual bottlenecks.” Nice, right? The bottlenecks imposed by the way computers operate and the laws of physics are identified BEFORE they take place. The Google system then changes its methods in order to eliminate the bottleneck. Think about that the next time you wait for Oracle to respond to a query across a terabyte set of data tables or you wait as SharePoint labors to generate a new index update. Google’s innovation is predictive analysis and automated intervention. This explains why it is sometimes difficult to explain why a particular Web page declined in a Google set of relevance ranked results. The system, not humans, is adapting.
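The paper’s models are far more involved than anything I can reproduce here. Purely as a toy, hypothetical illustration of the general idea of spotting a bottleneck before it happens and intervening automatically (and emphatically not Google’s actual method), consider:

```python
# Toy sketch of predictive intervention; the load model and thresholds are invented.

def predict_next_load(recent_loads):
    """Naive linear extrapolation from the last two samples."""
    if len(recent_loads) < 2:
        return recent_loads[-1]
    return recent_loads[-1] + (recent_loads[-1] - recent_loads[-2])

def route_request(recent_loads, capacity, spare_server):
    """Shift work away from a server BEFORE its predicted load reaches capacity."""
    if predict_next_load(recent_loads) >= capacity:
        return spare_server  # intervene ahead of the bottleneck, not after it
    return "primary"

print(route_request([60, 75, 90], capacity=100, spare_server="replica-2"))  # replica-2
```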

I understand the frustration that many Google pundits, haters, and apologists express to me. But if you take the time to read Google’s public statements about what it is doing and how it engineers its systems, the Google is quite forthcoming. The problem, as I see it, has two parts. First, Googlers write for those who understand the world as Google does. Notice the language of the “Thwarting” paper. Have you thought about multi-bitrate streaming servers in a YouTube.com type of environment? YouTube.com has lots of users and streams a lot of content. The problem is that Google’s notion of clarity is shown in the statement below:

[The equation from the “Thwarting” paper is not reproduced here.]

Second, very few people in the search business deal with the user loads that Google experiences. Looking up the location of one video and copying it from one computer to another is trivial. Delivering videos to a couple of million people at the same time is a different class of problem. So, why read the “Thwarting” paper? The situation described does not exist for most search companies or streaming media companies. The condition at Google is, by definition, an anomaly. Anomalies are not what make most information technology companies’ hearts go pitter patter more quickly. Google has to solve these problems or it is not Google. A company that is not Google does not have these Google problems. Therefore, Google solves problems that are irrelevant to 99 percent of the companies in the content processing game.

Back to Mr. Nadella. This comment sums up what I call the Microsoft Yahoo search challenge:

Nadella does acknowledge in the video interview here that Microsoft has not been able to catch up with Google and talks about how that might now be possible.

I love the “might”. The thoughts that went through my mind when I worked through this multimedia article from All Things Digital were:

  1. Microsoft had access to similar thinking about scale in 1999. Microsoft hired former AltaVista engineers, but the Microsoft approach to data centers is a bit like the US Navy’s approach to aircraft carriers. More new stuff has been put on a design that has remained unchanged for a long time. I have written about Microsoft’s “as is” architecture in this Web log, with snapshots of the approach at three points in time.
  2. Google has been unchallenged in search for 11 years. Google has an “as is” infrastructure capable of supporting more than 2,200 queries per second as well as handling the other modest tasks such as YouTube.com, advertising, maps, and enterprise applications. In 2002, Google had not figured out how to handle high load reads and writes because Google focused on eliminating disc reads and gating writes. Google solved that problem years ago. (A minimal sketch of the read caching and write gating idea appears after this list.)
  3. Microsoft has to integrate the Yahoo craziness into the Microsoft “as is”, aircraft carrier approach to data centers. The affection for Microsoft server products is strong, but adapting to Yahoo search innovations will require some expensive, time consuming engineering.

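To make item 2 less abstract, here is a minimal, hypothetical sketch of “eliminate disc reads, gate writes”. The class name and the batch size are my own illustration, not Google’s implementation; reads come out of memory, and writes are buffered and flushed in bulk.

```python
# A minimal, hypothetical sketch of "eliminate disc reads, gate writes":
# serve reads from memory and flush writes in batches instead of one at a time.
# This illustrates the idea only; it is not Google's implementation.

class GatedStore:
    def __init__(self, backing_store, batch_size=1000):
        self.backing_store = backing_store   # the slow, disc-bound layer
        self.cache = dict(backing_store)     # reads come from RAM, not disc
        self.pending_writes = {}
        self.batch_size = batch_size

    def read(self, key):
        # No disc read on the hot path.
        return self.pending_writes.get(key, self.cache.get(key))

    def write(self, key, value):
        # Writes are "gated": buffered and applied in bulk.
        self.pending_writes[key] = value
        if len(self.pending_writes) >= self.batch_size:
            self.flush()

    def flush(self):
        self.backing_store.update(self.pending_writes)  # one bulk disc write
        self.cache.update(self.pending_writes)
        self.pending_writes.clear()

# Usage: thousands of writes become a handful of bulk flushes.
store = GatedStore({"doc1": "old"}, batch_size=2)
store.write("doc1", "new")
store.write("doc2", "fresh")   # triggers one batched flush
print(store.read("doc1"))      # -> "new", served from memory
```
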
In short, I am delighted that Mr. Nadella has embraced scale. Google is becoming more like a tortoise, but I think there was a fable about the race between the tortoise and the hare. Google’s reflexes are slowing. The company has a truckload of legal problems. New competitors like Collecta.com are running circles around Googzilla. Nevertheless, Microsoft has to figure out the Google problem before the “going Google” campaign bleeds revenue and profits from Microsoft’s pivotal business segments.

My hunch is that Microsoft will run out of cash before dealing the GOOG a disabling blow.

Stephen Arnold, August 4, 2009

Rethinking the Microsoft Corp Search Cost Burden

July 31, 2009

I am waiting to give a talk. I have been flipping through hard copy newspapers and clicking around to see what is happening as I cool my heels. Three items caught my attention. Both the New York Times and the Wall Street Journal reported that the Yahoo deal is good for Yahoo. Maybe? What I think is that Microsoft assumed the cost burden of Yahoo’s search operation. Since my analysis of Yahoo’s search costs in 2006, I have gently reminded folks that Yahoo had a growing cost problem and its various management teams could not do much about these costs. So Yahoo focused on other matters, and few people brought the focus back on infrastructure, staff, and duplicative search systems.

Now Microsoft has assumed this burden.

I scanned John Gruber’s “Microsoft’s Long, Slow Decline”, and I noted this comment:

Microsoft remains a very profitable company. However, they have never before reported year-over-year declines like this, nor fallen so short of projected earnings. Something is awry.

Dead on. What is missing is thinking about the challenge Microsoft has in search. My thoughts are:

First, Microsoft has to make headway with its $1.2 billion investment in enterprise search. I think the uptake of SharePoint will produce some slam dunk sales. But competitors in the enterprise search sector know that SharePoint is big and fuzzy, and many Microsoft centric companies have here and now problems. I think there is a real possibility for Microsoft to cut the price of Fast ESP or just give it away if the client buys enough CALs for other Microsoft products. What I wonder is, “How will Microsoft deal with the punishing engineering costs a complex content processing and search system imposes during its first six to 18 months in a licensee’s organization?” Microsoft partners may know SharePoint, but I don’t think many know Fast ESP. Then there is the R&D cost to keep pace with competitors in search, content processing, and the broader field of findability. Toss in business intelligence and you have a heck of a cost anchor.

Second, Bing.com may get access to Yahoo’s Ford F-150 filled with software. But integrating Yahoo technology with Microsoft technology is going to be expensive. There are other costs as well; for example, Microsoft bought Powerset and some legacy technology from Xerox PARC. Layer on the need for backward compatibility and you have another series of cost black holes.

Finally, there are the many different search technologies that Microsoft has deployed and must presumably rationalize. Fast ESP has a better SQL query method than SQL Server. Will customers get both SQL Server and Fast ESP, or will there be more product variants? Regardless of the path forward, there are increased costs.

Now back to Mr. Gruber’s point: reversing a long, slow decline requires innovation and marketing of the highest order. I think the cost burden imposed by search will be difficult for Microsoft to control. Furthermore, I hypothesize:

  • Google will become more aggressive in the enterprise sector. Inflicting several dozen wounds may be enough to addle Microsoft and erode its profitability in one or two core product lines.
  • Google’s share of the Web search market may erode, but not overnight. The Googlers won’t stand still, and the deal with Yahoo strikes me as chasing the Google of 2006, not the Google of 2009.
  • Of the more than 200 competitors in enterprise search and content processing, I am confident that a half dozen or more will find ways to suck cash from Microsoft’s key accounts because increasingly people want solutions, not erector sets.

In short, Microsoft’s cost burdens come at a difficult time for the company. Microsoft and Yahoo managers have their work cut out for them.

Stephen Arnold, July 31, 2009
