Google and Disruption: Will It Work Tomorrow?

April 15, 2010

Editor’s Note: The text in this article is derived from the notes Stephen E Arnold prepared for his keynote talk on April 15, 2010. He delivered the speech as part of Slovenian Information Days in Portoroz, Slovenia.

Thank you, Mr. Chairman. I am most grateful for the opportunity to address this group and offer some observations about Google and its disruptive tactics.

I started tracking Google’s technical inventions in 2002. A client, now out of business, asked me to indicate if “Google really had something solid.”

My analysis showed a platform diagram and a list of markets that Google was likely to disrupt. I captured three ideas in my 2005 monograph "The Google Legacy", which is still timely and available from Infonortics Ltd. in Tetbury, Glos.

The three ideas were:

First, Google had figured out how to add computing capacity, including storage, using mostly commodity hardware. I estimated the cost in 2002 dollars as about one-third of what companies like Excite, Lycos, Microsoft, and Yahoo were paying.

Second, Google had solved the problem of text search for content on Web pages. Google’s engineers were using that infrastructure to deliver other types of services. In 2002, there were rumors that Google was experimenting with services that ranged from email to an online community / messaging system. One person, whose name I have forgotten, pointed out that Google’s internal network MOMA was the test bed for this type of service.

Third, Google was not an invention company. Google was an applied research company. The firm’s engineers, some of whom came from Sun Microsystems and AltaVista.com, were adept at plucking discoveries from university research computing tests and hooking them into systems that were improvements on what most companies used for their applications. The genius was focus and selection and integration.


Google is an information factory, a digital Rouge River construct. Raw materials enter at one end and higher value information products and services come out at the other end of the process.

In my second Google monograph, funded in part by another client, I built upon my research into technology and summarized Google’s patent activities between 2004 and mid-2007. Google Version 2.0: The Calculating Predator, also published by Infonortics Ltd., disclosed several interesting facts about the company.


Quote to Note: Dick Brass on MSFT Innovation

February 6, 2010

I met Dick Brass many years ago. He left Oracle and joined Microsoft to contribute to a confidential initiative. Mr. Brass worked on the ill-fated Microsoft tablet, which Steve Jobs has reinvented as a revolutionary device. I am not a tablet guy, but one thing is certain. Mr. Jobs knows how to work public relations. Mr. Brass published an article in the New York Times, and it captured the attention of Microsoft and millions of readers who enjoyed Mr. Brass’s criticism of his former employer. I have no opinion about Microsoft, its administrative methods, or its ability to innovate. I did find a quote to note in the write up:

Microsoft is no longer considered the cool or cutting edge place to work. There has been a steady exodus of its best and brightest. (“Microsoft’s Creative Destruction”, the New York Times, February 4, 2010, Page 25, column 3, National Edition)

Telling, because if smart people don’t work at a company, that company is likely to make less informed decisions than an organization with smarter people. This applies in the consulting world. There are blue chip outfits like McKinsey, Bain, and BCG. Then there are lesser outfits which I am sure you can name because these companies “advertise”, have sales people who “sell” listings, and invent crazy phrases to create buzz and sales. I am tempted to differentiate Microsoft with a reference to Apple or Google, but I will not. Oh, why did I not post this item before today? The hard copy of my New York Times was not delivered until today. Speed is important in today’s information world.

The quote nails it.

Stephen E Arnold, February 7, 2010

No one paid me to write this, not a single blue chip consulting firm, not a single savvy company. I will report this lack of compensation to the experts at the IRS, which is gearing up for the big day in April.



Featured
Microsoft and Mikojo Trigger Semantic Winds across Search Landscape

Semantic technology is blowing across the search landscape again. The word "semantic" and its use in phrases like "semantic technology" have a certain trendiness. When I see the word, I think of smart software that understands information in the way a human does. I also think of computationally sluggish processes and the complexity of natural language, particularly a language like English. Google has considerable investment in semantic technology, but the company wisely tucks it away within larger systems and avoids the technical battles that rage among different semantic technology factions. You can see Google's semantic operations tucked within the Ramanathan Guha inventions disclosed in February 2007. Pay attention to the discussion of the system and method for "context".


Gale force winds from semantic technology advocates. Image source: http://www.smh.com.au/ffximage/2008/11/08/paloma_wideweb__470×289,0.jpg

Microsoft’s Semantic Puff

Other companies are pushing the semantic shock troops forward. Yesterday I read Network World's "Microsoft Talks Up Semantic Search Ambitions." The article reminded me that Fast Search & Transfer SA offered some semantic functionality, which I summarized in the 2006 version of the original Enterprise Search Report (the one with real beef, not tofu inside). Microsoft also purchased Powerset, a company that used some of Xerox PARC's technology and its own wizardry to "understand" queries and create a rich index. The Network World story reported:

With semantic technologies, which also are being referred to as Web 3.0, computers have a greater understanding of relationships between different information, rather than just forwarding links based on keyword searches.  The end game for semantic search is “better, faster, cheaper, essentially,” said Prevost, who came over to Microsoft in the company’s 2008 acquisition of search engine vendor Powerset. Prevost is still general manager of Powerset.  Semantic capabilities get users more relevant information and help them accomplish tasks and make decisions, said Prevost.

The payoff is that software understands humans. Sounds good, but it does little to alter the startling dominance of Google in general Web search and the rocket-like rise of social search systems like Facebook. In a social context, humans tell "friends" about meaning or, better yet, offer an answer or a relevant link. No search required.

In an earlier Web log post, I reported on the complexities of configuring the enterprise search system that Microsoft offers for SharePoint. The challenge is complexity: the time and money required to make a "smart" software system perform at an acceptable level, both in content processing throughput and for the user. Users often prefer to ask someone or just use what appears at the top of a search results list.

Interviews
Inside Search: Raymond Bentinck of Exalead, Part 2

This is the second part of the interview with Raymond Bentinck of Exalead.

Isn’t this bad marketing?

No. This makes business sense. Traditional search vendors who may claim to have thousands of customers tend to use only a handful of well managed references. This is a direct result of customers choosing technology based on overblown marketing claims, and of those claims then driving requirements that the vendor's consultants struggle to deliver. The customer, who is then far from happy with the results, doesn't do reference calls and ultimately becomes disillusioned with search in general or with the vendor specifically. Either way, they end up moving to an alternative.

I see this all the time with our clients that have replaced their legacy search solution with Exalead. When we started, we were met with much skepticism from clients that we could answer their information retrieval problems. It was only after doing proofs of concept and delivering the solutions that they became convinced. Now that our reputation has grown, organizations realize that we do not make unsubstantiated claims and do stick by our promises.

What about the shift to hybrid solutions? An appliance or an on-premises server, then a cloud component, and maybe some fairy dust thrown in to handle the security issues?

There is a major change that is happening within Information Technology at the moment driven primarily by the demands placed on IT by the business. Businesses want to vastly reduce the operational cost models of IT provision while pushing IT to be far more agile in their support of the business. Against this backdrop, information volumes continue to grow exponentially.

The push towards areas such as virtual servers and cloud computing is an aspect of reducing the operational cost models of information technology provision. It is fundamental that software solutions can operate in these environments. It is surprising, however, to find that many traditional search vendors' solutions do not even work in a virtual server environment.

Isn’t this approach going to add costs to an Exalead installation?

No, because another aspect of this is that software solutions need to be designed to make the best use of available hardware resources. When Exalead provided a solution to the leading classified ads site Fish4.co.uk, unlike the legacy search solution we replaced, not only were we able to deploy a solution that met and exceeded their requirements but we reduced the cost of search to the business by 250 percent. A large part of this was around the massively reduced hardware costs associated with the solution.

What about making changes and responding quickly? Many search vendors simply impose a six month or nine month cycle on a deployment. The client wants to move quickly, but the vendor cannot work quickly.

Agility is another key factor. In the past, an organization may implement a data warehouse. This would take around 12 to 18 months to deploy and would cost a huge amount in hardware, software and consultancy fees. As part of the deployment the consultants needed to second guess the questions the business would want to ask of the data warehouse and design these into the system. After the 12 to 18 months, the business would start using the data warehouse and then find out they needed to ask different types of questions than were designed into the system. The data warehouse would then go through a phase of redevelopment which would last many more months. The business would evolve… making more changes and the cycle would go on and on.

With Exalead, we are able to deploy the same solution in a couple months but significantly there is no need to second guess the questions that the business would want to ask and design them into the system.

This is the sort of agile solution that businesses have been pushing their IT departments to deliver for years. Businesses that do not provide agile IT solutions will fall behind their competitors and be unable to react quickly enough when the market changes.

One of the large UK search vendors has dozens of niche versions of its product. How can that company keep each of these specialty products up to date and working? Integration is often the big problem, is it not?

The founders of Exalead took two years before starting the company to research what worked in search and why the existing search vendors' products were so complex. This research led them to understand that the search products on the marketplace at the time all started as quite simple products designed to work on relatively low volumes of information and with very limited functional capabilities. Over the years, new functionality has been added to the solutions to keep abreast of what competitors offered, but because of how the products were originally engineered, these have not been clean integrations. The vendors did not start out with this intention, but search has evolved in ways never imagined at the time these solutions were originally engineered.

Wasn’t one of the key architects part of the famous AltaVista.com team?

Yes. In fact, both of the founders of Exalead were from this team.

What kind of issues occur with these overly complex products?

As you know, this has caused many issues for both vendors and clients. Changes in one part of the solution can cause unwanted side effects in another part. Trying to track down issues and bugs can take a huge amount of time and expense. This is a major factor as to why we see the legacy search products on the market today that are complex, expensive and take many months if not years to deploy even for simple requirements.

Exalead learned from these lessons when engineering our solution. We have an architecture that is fully object-oriented at the core and follows service-oriented architecture (SOA) principles. It means that we can swap new modules in and out without messy integrations. We can also take core modules, such as connectors to repositories, and instead of having to rewrite them to meet specific requirements we can override various capabilities in the classes. This means that the majority of the code that has gone through our quality-management systems remains the same. If an issue is identified in the code, it is a simple task to locate the problem, and the issue is isolated in one area of the code base. In the past, vendors have had to rewrite core components like connectors to meet customers' requirements, and this has caused huge quality and support issues for both the customer and the vendor.
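Bentinck's point about overriding connector capabilities in classes maps to a familiar object-oriented pattern. The sketch below is illustrative only, with invented Python class names rather than Exalead's actual API: a tested core pipeline stays fixed while a customer-specific connector overrides just the pieces that differ.

```python
# Minimal sketch of the override-a-connector pattern described above.
# Class and method names are invented for illustration; this is not Exalead's API.

from abc import ABC, abstractmethod
from typing import Dict, Iterable


class RepositoryConnector(ABC):
    """Core connector: fetch records from a source and map them to index fields."""

    @abstractmethod
    def fetch(self) -> Iterable[Dict]:
        """Yield raw records from the source repository."""

    def transform(self, record: Dict) -> Dict:
        """Default field mapping; subclasses override only what differs."""
        return {"title": record.get("title", ""), "body": record.get("body", "")}

    def crawl(self) -> Iterable[Dict]:
        # The tested core pipeline stays the same for every connector.
        for record in self.fetch():
            yield self.transform(record)


class TicketSystemConnector(RepositoryConnector):
    """Customer-specific connector: only the differing pieces are overridden."""

    def __init__(self, tickets: Iterable[Dict]):
        self.tickets = tickets

    def fetch(self) -> Iterable[Dict]:
        return self.tickets

    def transform(self, record: Dict) -> Dict:
        doc = super().transform(record)
        doc["priority"] = record.get("priority", "normal")  # extra facet field
        return doc


if __name__ == "__main__":
    connector = TicketSystemConnector(
        [{"title": "Login fails", "body": "Cannot reach SSO endpoint", "priority": "high"}]
    )
    print(list(connector.crawl()))
```

The design payoff is the one claimed in the interview: the shared crawl logic goes through quality management once, and a customer-specific change touches only the subclass.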

What about integration? That’s a killer for many vendors in my experience.

The added advantage of this core engineering work is that for Exalead integration is a simple task. For example, building new secure connectors to new repositories can be performed in weeks rather than months. Our engineers can spend the time saved adding new and innovative capabilities to the solution rather than worrying about how to integrate a new function without affecting the 1,001 other overlapping functions.

Without this model, legacy vendors have to continually provide point solutions to problems that tend to be customer-specific, leading to a very expensive support headache, as core engineering changes take too long and are too hard to deploy.

I heard about a large firm in the US that has invested significant sums in retooling Lucene. The solution has been described on the firm’s Web site, but I don’t see how that engineering cost is offset by the time to market that the fix required. Do you see open source as a problem or a solution?

I do not wake up in the middle of the night worrying about Lucene, if that is what you are thinking! I see Lucene in places that typically have large engineering teams to protect, or where consultants are more interested in making lots of fees through its complex integration. Neither adds value to the company by, for example, reducing costs or increasing revenue.

Organizations that are interested in providing cost-effective, richly functional solutions are in increasing numbers choosing solutions like Exalead. For example, the University of Sunderland wanted to replace its Google Search Appliance with a richer, more functional search tool. It looked at the marketplace and chose Exalead for searching its external site and its internal document repositories, plus providing business intelligence solutions over its database applications such as student attendance records. The search on the university's Web site was developed in a single day, including the integration with the existing user interface and the faceted navigation capabilities. This represented not only an exceptionally quick implementation, far quicker than any other solution on the marketplace today, but it also delivered the lowest total cost of ownership compared to other vendors and, of course, open source.

In my opinion, Lucene and other open source offerings can offer a solution for some organizations, but many jump on this bandwagon without fully appreciating the differences between the open source solution and the commercially available solutions, either in terms of capability or total cost. It is assumed, wrongly in many instances, that the total cost of ownership for open source must be lower than for the commercially available solutions. I would suggest that all too often, open source search is adopted by those who believe the consultants who say that search is a simple commodity problem.

What about the commercial enterprise that has had several search systems and none of them capable of delivering satisfactory solutions? What’s the cause of this? The vendors? The client’s approach?

I think the problem lies more with the vendors of the legacy search solutions than with the clients. Vendors have believed their own marketing messages and, when customers are unsatisfied with the results, have tended to blame the customers for not understanding how to deploy the product correctly or, in some cases, the third party or systems integrator responsible for the deployment.

One client of ours told me recently that with our solution they were able to deliver in a couple months what they failed to do with another leading search solution for seven years. This is pretty much the experience of every customer where we have replaced an existing search solution. In fact, every organization that I have worked with that has performed an in-depth analysis and comparison of our technology against any search solution has chosen Exalead.

In many ways, I see our solution as not only delivering on our promises but also delivering on the marketing messages that our competitors have been promoting for years but failing to deliver in reality.

So where does Exalead fit? The last demo I received showed me search working within a very large, global business process. The information just appeared? Is this where search is heading?

In the year 2000, and every year since, a CEO of one of the leading legacy search vendors made a claim that every major organization would be using their brand of meaning based search technology within two years.

I will not be as bold as he was, but it is my belief that in less than five years' time the majority of organizations will be using search based applications for mission critical work.

For too long, software vendors have been trying to convince organizations, for example, that it was not possible to deploy mission critical solutions such as a 360-degree customer view, master data management, data warehousing, or business intelligence in a couple of months, with no user training, with up-to-the-minute information, with user friendly interfaces, and with a low cost per query covering millions or billions of records of information.

With Exalead this is possible and we have proven it in some of the world’s largest companies.

How does this change the present understanding of search, which in my opinion is often quite shallow?

Two things are required to change the status quo.

Firstly, a disruptive technology is required that can deliver on these requirements; secondly, businesses need to demand new methods of meeting ever greater business requirements on information.

Today I see both these things in place. Exalead has proven that our solutions can meet the most demanding of mission critical requirements in an agile way and now IT departments are realizing that they cannot support their businesses moving forward by using traditional technologies.

What do you see as the trends in enterprise search for 2010?

Last year was a turning point for Search Based Applications. With the world-wide economy in recession, many companies put projects on hold until things were looking better. With economies still looking rather weak and projects unable to be left on ice forever, companies are starting to question the value of utilizing expensive, time consuming and rigid technologies to deliver these projects.

Search is a game changing technology that can deliver more innovative, agile and cheaper solutions than using traditional technologies. Exalead is there to deliver on this promise.

Search, a commodity solution? No.

Editor’s note: You can learn more about Exalead’s search-enabled applications technology and method at the Exalead Web site.

Stephen E Arnold, February 4, 2010

I wrote this post without any compensation. However, Mr. Bentinck, who lives in a far off land, offered to buy me haggis, and I refused this tasty bribe. Ah, lungs! I will report the lack of payment to the National Institutes of Health, an outfit concerned about alveoli.
Profiles
Vyre: Software, Services, Search, and More

A happy quack to the reader who sent me a link to Vyre, whose catchphrase is "dissolving complexity." The last time I looked at the company, I had pigeonholed it as a consulting and content management firm. The news release my reader sent me pointed out that the company has a mid-market enterprise search solution that is now at version 4.x. I am getting old, or at least too sluggish to keep pace with content management companies that offer search solutions. My recollection is that Crown Point moved in this direction. I have a rather grim view of CMS because software cannot help organizations create high quality content, or at least what I think is high quality content.

The Wikipedia description of Vyre matches up with the information in my archive:

VYRE, now based in the UK, is a software development company. The firm uses the catchphrase "Enterprise 2.0" to describe its enterprise solutions for business. The firm's core product is Unify. The Web-based service allows users to build applications and manage content. The company has technology that manages digital assets. The firm's clients in 2006 included Diageo, Sony, Virgin, and Lowe and Partners. The company has reinvented itself several times since the late 1990s, doing business as NCD (Northern Communication and Design), Salt, and then Vyre.

You can read the Wikipedia summary here. You can read a 2006 Butler Group analysis here. My old link worked this evening (March 5, 2009), but click quickly. In my files I had a link to a Vyre presentation, but it was not about search. It is dated 2008, but you may find the information useful. The Vyre presentations are here. The link worked for me on March 5, 2009. The only name I have in my archive is Dragan Jotic. Other names of people linked to the company are here. Basic information about the company's Web site is here. Traffic, if these data are correct, seems to be trending down. I don't have current interface examples. The wiki for the CMS service is here. (Note: the company does not use its own CMS for the wiki. The wiki system is MediaWiki. No problem for me, but I was curious about this decision because the company offers its own CMS system.) You can get a taste of the system here.


Administrative Vyre screen.

After a bit of poking around, it appears that Vyre has turned up the heat on its public relations activities. The Seybold Report presented a news story / news release about the search system here. I scanned the release and noted this passage as interesting for my work:

…version 4.4 introduces powerful new capabilities for performing facetted and federated searching across the enterprise. Facetted search provides immediate feedback on the breakdown of search results and allows users to quickly and accurately drill down within search results. Federated search enables users to eradicate content silos by allowing users to search multiple content repositories.

Vyre includes a taxonomy management function with its search system, if I read the Seybold article correctly. I gravitate to the taxonomy solution available from Access Innovations, a company run by my friends and colleagues Marje Hlava and Jay Ven Eman. Their system generates ANSI-standard thesauri and word lists, which is the sort of stuff that revs my engine.

Endeca has been the pioneer in the enterprise sector for "guided navigation", which is a synonym in my mind for faceted search. Federated search gets into the functions that I associate with Bright Planet, Deep Web Technologies, and Vivisimo, among others. I know that shoving large volumes of data through systems that both facetize content and federate it is computationally intensive. Consequently, some organizations are not able to put the plumbing in place to make these computationally intensive systems hum like my grandmother's sewing machine.
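To make the computational point concrete, here is a toy facet computation in Python. For every query the engine must tally metadata values across the entire result set, not just hand back a hit list; the field names and documents below are invented for illustration.

```python
# Toy facet computation: for a set of search hits, count how many results fall
# under each value of each metadata field ("facet"). Real systems do this over
# millions of hits per query, which is part of the cost discussed above.

from collections import Counter, defaultdict

hits = [
    {"title": "Q3 report", "type": "pdf", "department": "finance"},
    {"title": "Onboarding", "type": "doc", "department": "hr"},
    {"title": "Q4 forecast", "type": "pdf", "department": "finance"},
]

def facet_counts(results, fields):
    counts = defaultdict(Counter)
    for doc in results:
        for field in fields:
            counts[field][doc.get(field, "unknown")] += 1
    return counts

for field, counter in facet_counts(hits, ["type", "department"]).items():
    print(field, dict(counter))
# type {'pdf': 2, 'doc': 1}
# department {'finance': 2, 'hr': 1}
```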

If you are in the market for a CMS and asset management company’s enterprise search solution, give the company’s product a test drive. You can buy a report from UK Data about this company here. I don’t have solid pricing data. My notes to myself record the phrase, “Sensible pricing.” I noted that the typical cost for the system begins at about $25,000. Check with the company for current license fees.

Stephen Arnold, March 6, 2009
Latest News
Mobile Devices and Their Apps: Search Gone Missing

VentureBeat’s “A Pretty Chart of Top Apps for iPhone, Android, BlackBerry” shocked me. Not a little. Quite a bit. You will want to look at the top apps f

Lazarus, Azure Chip Consultants, and Search

January 8, 2010

A person called me today to tell me that a consulting firm is not accepting my statement "Search is dead." Then I received a spam email that said, "Search is back." I thought, "Yo, Lazarus. There be lots of dead search vendors out there. Example: Convera."

Who reports that search has risen? An azure chip consultant! Here’s what raced through my addled goose brain as I pondered the call and the “search is back” T shirt slogan:

In 2006, I was sitting on a pile of research about the search market sector. The data I collected included:

  • Interviews with various procurement officers, search system managers, vendors, and financial analysts
  • My own profiles of about 36 vendors of enterprise search systems plus the automated content files I generate using the Overflight system. A small scale version is available as a demo on ArnoldIT.com
  • Information I had from my work as a systems engineering and technical advisor to several governments and their search system procurement teams
  • My own experience licensing, testing, and evaluating search systems for clients. (I started doing this work after we created The Point (Top 5% of the Internet) in 1993 and sold it to Lycos, a unit of CMGI. I figured I should look into what Lycos was doing so I could speak with authority about its differences from BRS/Search, InQuire, Dialog (RECON), and IBM STAIRS III. I had familiarity with most of these systems through various projects in my pre-Point life.)
  • My Google research, funded by the now-defunct Bear Stearns outfit and a couple of other well-heeled organizations.

What was clear in 2006 was the following:

First, most of the search system vendors shared quite a bit of similarity. Despite the marketing baloney, the key differentiators among the flagship systems in 2006 were minor. Examples range from their basic architecture to their use of stemming to the methods of updating indexes. There were innovators, and I pointed out these companies in my talks and various writings, including the three editions of the Enterprise Search Report I wrote before I fell ill in February 2007 and quit doing that big encyclopedia-type publication. These similarities made it very clear to me that innovation for enterprise search was shifting from the plain old keyword indexing of structured records available since the advent of RECON and STAIRS to a more freeform approach with generally lousy relevance.
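For readers who have not looked under the hood, a crude Python sketch shows the sort of feature that counted as a differentiator: stemming reduces query and document terms to a shared root so variant word forms match. This is a toy suffix-stripper, far simpler than the Porter-style stemmers real engines use.

```python
# Toy suffix-stripping stemmer, for illustration only: "indexing", "indexes",
# and "indexed" are all reduced to the shared root "index" so they match at
# query time. Production engines use much more careful rules than this.

def crude_stem(word: str) -> str:
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

for term in ("indexing", "indexes", "indexed", "index"):
    print(term, "->", crude_stem(term))
# indexing -> index, indexes -> index, indexed -> index, index -> index
```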


Get information access wrong, and some folks may find a new career. Source: http://www.seeing-stars.com/Images/ScenesFromMovies/AmericanBeautyMrSmiley%28BIG%29.JPG

Second, the more innovative vendors were making an effort in 2006 to take a document and provide some sort of context for it. Without a human indexer to assign a classification code to a document that is about marketing but does not contain the word "marketing", this was rocket science. But when I examined these systems, there were two basic approaches, which are still around today. The first was to use statistical methods to put documents together and make inferences; the other was a variation on human indexing but without humans doing most of the work. The idea was that a word list would contain synonyms. There were promising demonstrations of software methods that could "read" a document, but they were piggy and of use only where money was no object.
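The word-list approach can be pictured in a few lines of Python. The synonym table below is invented for the example; the point is that a document which never uses the word "marketing" still lands in the marketing bucket without a human indexer.

```python
# Toy sketch of the word-list approach: expand index terms with synonyms so a
# document that never contains "marketing" is still retrievable under that
# concept. The synonym list is made up for illustration.

SYNONYMS = {
    "advertising": ["marketing"],
    "promotion": ["marketing"],
    "branding": ["marketing"],
}

def index_terms(text):
    terms = set(text.lower().split())
    for term in list(terms):
        terms.update(SYNONYMS.get(term, []))
    return terms

doc = "New promotion and branding campaign for the fall line"
print("marketing" in index_terms(doc))  # True, even though the word never appears
```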

Third, the Google approach, which used social methods—that is, a human clicking on a link—was evident but not migrating to the enterprise world. Google's method was new, but to make the 2006 version hum, lots of clicks were needed. In the enterprise, most documents never get clicked, so the 2006 Google method was truly lousy. Google has made improvements, mostly by implementing the older search methods, not by pushing the envelope as it has been doing with its Web search and dataspace efforts.

Fourth, most of the search vendors were trying like the dickens to get out of a "one size fits all" approach to enterprise search. Companies making sales were focusing on a specific niche or problem and selling a package of search and content processing that solved one problem. The failure of the boil-the-ocean approach was evident because user satisfaction data from my research, funded by a government agency and other clients, revealed that about two thirds of the users of an enterprise search system were dissatisfied or very dissatisfied with that search system. The solution, then, was to focus. My exemplary case was the use of Endeca technology to allow Fidelity UK sales professionals to increase their productivity with content pushed to them by the Endeca system. The idea was that a broker could click on a link and the search results were displayed. No searching required. ClearForest got in the game by analyzing dealer warranty repair comments. Endeca and ClearForest were harbingers of focus. ClearForest is owned by Thomson Reuters and is in the open source software game too.

When I wrote the article in Online Magazine for Barbara Quint, one of my favorite editors, I explained these points in more detail. But it was clear that the financial pressures on Convera, for example, and the difficulty some of the more promising vendors like Entopia were having made the thin edge of survival glint in my desk lamp’s light. Autonomy by 2006 had shifted from search and organic growth to inorganic growth fueled by acquisitions that were adjacent to search.


Microsoft Worries about Intellectual Arteriosclerosis

January 1, 2010

At my advanced age, I think a 50-year-old is a spring chicken. Someone in their 40s is a chick. Anyone younger is a beak poking through a shell. Not at Microsoft. Based on the article "At Microsoft, Is Age More Than Just a Number?", someone at Microsoft allegedly believes the company is losing touch with youth. The idea is that a young engineer comes to Microsoft and then hightails it somewhere else quickly. It costs a big company a lot of money to recruit a young employee, train them, and make them productive. There is a high cost to churn, just as there is in the mobile phone subscription business. To get payback from young workers, the company has to keep these lads and lasses around and productive for several years. As maddening as youth are, these folks are important to the future of an organization. The most interesting sentence in the write up, in my opinion, was:

“My research has shown that 401ks, salaries and other forms of monetary compensation are less important to Generation Y retention than fruitful collaboration with peers, recognition of work, opportunities for growth and the idea of “being a part of something”. These young employees are less averse to change and will tirelessly seek environments that promote these activities, leaving those that don’t.”

The quote is from a wizard at Microsoft. Am I alone in thinking that Google has a magnetic lure for young wizards? Those who can’t get hired at the Google can look at Apple, maybe Amazon, or even the wacky land of Facebook.com. If Microsoft has a bunch of spring chickens, these folks may not be ready for the retirement home, but they may be just the ticket for Googzilla to run over on the information highway. The spring chickens may cross the road to fix a glitch in SharePoint and then find themselves under the wheels of Messrs. Brin’s and Page’s Hummer.

Stephen E. Arnold, January 1, 2010

Another freebie. I must report this sad state of affairs to the National Park Service. A dead animal in a national park is its responsibility.

Circumvallation: Reed Elsevier and Thomson as Vercingetorix

November 27, 2009

Google Scholar Gets Smart in Legal Information

One turkey received a presidential pardon. Other turkeys may not be so lucky on November 26, 2009, when the US celebrates Thanksgiving. I am befuddled about this holiday. There are not too many farmers in Harrod’s Creek. The fields contain the abandoned foundations of McMansions that the present economic meltdown has left like Shelley’s statue of Ozymandias. The “half buried in the sand” becomes half-built homes on the horse farm.

As Kentuckians in my hollow give thanks for a day off from job hunting, I am sitting by the goose pond trying to remember what I read in my copy of Caesar’s De Bello Gallico. I know Caesar did not write this memoir, but his PR bunnies did a pretty good job. I awoke this morning thinking about the connection between the battle of Alesia and what is now happening to the publishing giants Reed Elsevier and Thomson Reuters. The trigger for this mental exercise was Google’s announcement that it had added legal content to Google Scholar.


What’s Vercingetorix got to do with Google, Lexis, and Westlaw? Think military strategy. Starvation, death, surrender, and ritual killing. Just what today’s business giants relish.

Google has added the full text of US federal cases and state cases. The coverage of the federal cases, district and appellate, is from 1924 to the present. US state cases cover 1950 to the present. Additional content will be added; for example, I have one source that suggested that the Commonwealth of Virginia Supreme Court will provide Google with CD-ROMs of cases back to 1924. Google, according to this source, is talking with other sources of US legal information and may provide access to additional legal information as well. What are these sources? Possibly Public.Resource.Org and possibly Justia.org, among others.

The present service includes:

  • The full text of the legal document
  • Footnotes in the legal document
  • Page numbers in the legal document
  • Page breaks in the legal document
  • Hyperlinks in the legal document to cases
  • A tab to show how the case was cited in other documents
  • Links to non legal documents that cite a case.

You can read various pundits, mavens, and azure chip consultants’ comments on this Google action at this link.

You may want to listen to a podcast called TWIL; the November 23, 2009, show discussed Google Scholar for about a half hour. You can find that discussion on iTunes. Just search for TWIL and download the program “Social Lubricants and Frictions.”

On the surface, the Google push into legal information is a modest amount of data in terms of Google’s daily petabyte flows. The service is easy to use, but the engineering required to provide access to the content strikes me as non-trivial. Content transformation is an expensive proposition, and the cost of fiddling with legal information is one of the primary reasons commercial online services have had to charge hefty fees to look at what amounts to taxpayer-supported, public information.

The good news is that the information is free and easily accessible, even from an iPhone or other mobile device. The Google service does the standard Google animal tricks of linking, displaying content with minimal latency, and updating new content within a minute or so of that content becoming available to Google’s software Dyson vacuum cleaner.

So what?

This service is similar to others I have written about in my three Google monographs. Be aware. My studies are not Sergey-and-Larry-eat-pizza books. I look at the Google open source technical and business information. I ignore most of what Google’s wizards “say” in public. These folks are “running the game plan” and add little useful information for my line of work. Your mileage may differ. If so, stop reading this blog post and hunt down a cheerful non-fiction Google book by a real live journalist. That’s not my game. I am an addled goose.

Now let me answer the “so what”.

First, the Google legal content is an incremental effort for the Google. This means that Google’s existing infrastructure, staff, and software can handle the content transformation, parsing, indexing, and serving. No additional big-buck investment is needed. In fact, I have heard that the legal content project, like Google News, was accomplished in the free time for play that Google makes available to its full time professionals. A bit of thought should make clear to you that commercial outfits who have to invest to handle legal content in a Google manner have a cost problem right out of the starting blocks.

Second, Google is doing content processing that should be the responsibility of the US government. I know. I know. The US government wants to create information and not compete with commercial outfits. But the outfits manipulating legal information have priced it so that most everyday Trents and Whitneys cannot afford to use these commercial services. Even some law firms cannot afford these services. Pro bono attorneys don’t have enough money to buy yellow pads to help their clients. Even kind hearted attorneys have to eat before they pay a couple hundred bucks to run a query on the commercial online services from publicly traded companies out to make their shareholders have a great big financial payday. Google is operating like a government when it processes legal information and makes it available without direct charge to the user. The monetization takes place but on a different business model foundation. That also spells T-R-O-U-B-L-E for the commercial online services like Lexis and Westlaw.


Coveo Expresso Breaks New Ground in Information Access

November 9, 2009

Coveo, a leading provider of enterprise search technology and information access solutions, recently unveiled a free, entry-level enterprise search solution, Coveo Expresso™ Beta.  Coveo’s new solution places the power of enterprise information access in the hands of employees everywhere, at no cost, for up to 50 users. The free version of the Expresso content processing system can index one million desktop files and email items as well as 100,000 Intranet documents.  Licenses can be expanded at minimal cost to as many as 250 users, five million desktop files and email items, and one million SharePoint and file share documents, just by typing a new access code. Administrators simply add new email accounts and SharePoint or file share documents within the intuitive administrative interface. Coveo Expresso is available for immediate download at www.coveo.com/expresso.

Laurent Simoneau, President and CEO, told Beyond Search:

Although enterprise search solutions have been available for nearly a decade, most are built on legacy systems that are difficult to implement and have not lived up to the promise of intuitive, secure and comprehensive information access across information silos. We want to re-educate businesses about the ease and simplicity with which enterprise search should work, as our customers can attest. Coveo Expresso does that—and takes enterprise search one step further with ubiquitous access interfaces such as the Coveo Outlook Sidebar or the desktop floating search bar, which provide guided, faceted search where employees ‘live’—in their email interface or on their PC/laptop. We’ve been testing this feature for a number of months with our current customers and have found it to be one of the biggest boosts to productivity for all employees, regardless of their roles.

Features

The free download features a number of Coveo innovations, including:

  • Cross-enterprise Email Search, for 50 email accounts, including PST files and attachments, on desktops and in servers for up to 1 million total items.
  • The Coveo Outlook Sidebar, the industry’s first true enterprise search Outlook plug-in, which provides sophisticated features such as conversation folding, related conversations, related people, related attachments, and the ability to search any indexed content without leaving Outlook, as well as the ability to launch advanced search with guided navigation through search facets.
  • The Coveo Desktop Floating Search bar, enabling guided searches without leaving the program in which the user is working.
  • Enterprise Desktop Search, including always-on indexing for 50 PCs/laptops.
  • Mobile access via BlackBerries for 50 users.

The Expresso Interface

Search results appear in a clean, well-organized panel display.



Differentiation: The New Enterprise Search Barrier

October 30, 2009

I don’t know one tree from another. When someone points out a maple and remarks that it is a sugar maple, I have no clue about a maple and even less information about a sugar maple. A lack of factual foundation means that I know nothing about trees. Sure, I know that most trees are green and that I can cut one down and burn it. But I don’t own a chain saw, so that general information means zero in the real world.

Now consider the clueless minions who have to purchase an enterprise search system. The difference between my tree knowledge and their search knowledge is easy to point out. Both of us are likely to become confused. To me, trees look alike. To the search procurement team, search systems look alike.

I received an announcement about a search system (nameless, of course) which asserted:

[The vendor’s product] is the first mobile enterprise search server to enable secure ‘anywhere’ access to data that resides across all information sources, including individual desktops, email stores, file shares, external sites and enterprise applications. Leveraging the [vendor’s product] Enterprise Server as its backbone, [the vendor’s product] Anywhere is capable of delivering secure, immediate access to any browser-enabled device, from an iPhone to a Blackberry and beyond.

I find that this write up is very similar to the Coveo email search solution, which offers mobile access as one of its features, plus a number of other bells and whistles.

I can document many other similarities in the way in which search vendors describe their products. In fact, I identified a phrase first used by Endeca in 2003 or 2004 as a key element in Microsoft’s marketing of its SharePoint search systems. My recollection is the phrase in question is “user experience.” Endeca may have snagged it somewhere just as Mozart plucked notes from his contemporaries.

Confusion among search vendors is easy. Many recycle words, phrases, and buzzwords, hoping that their spin will win customers. One thing is certain. Vendors have the azure chip consultants in a tizzy. One prominent azure chip outfit in New York has pegged Google as a laggard and a product that has yet to make its appearance as a leader.

Procurement teams? Baffled for sure. Differentiation is needed, but it doesn’t come by recycling another vendor’s marketing collateral or relying on the azure chip crowd to cook up a new phrase to baffle the paying customers, or some of the paying customers.

Vendors, differentiate. Don’t imitate.

Stephen Arnold, October 30, 2009

A former Ziffer bought me dinner this week. Does that count as compensation? I deserve more.

Coveo Discloses Client Wins in Q209

August 14, 2009

Coveo is a technology company with some interesting products. I learned about the firm when I poked into the origins of the desktop search system called Copernic. The firm flashed on my radar with a snap-in solution for SharePoint. I saw a demonstration of email search that provided features I had heard other vendors describe. Coveo implemented them; for example, maintaining a complete email archive for the user’s desktop computer so if he or she lost a mobile device, the mail was recoverable.

Getting information out of Coveo has not been easy for me. I received a link to a Marketwire article that provided me with some useful information, and I wanted to snag it before the data gets buried in the digital avalanche that cascades into the goose pond each day.

Coveo disclosed several interesting customer wins:

  • Goodrich Corporation, a Fortune 500 company
  • Odyssey America, an insurance firm
  • The Doctor’s Company, an insurer of physicians and surgeons.

Coveo also formed alliances with New Idea Engineering and a number of other integrators around the world.

A happy quack to Coveo and a wing flap to the person at Coveo who provided this information.

Stephen Arnold, August 14, 2009

MarkLogic: The Shift Beyond Search

June 5, 2009

Editor’s note: I gave a talk at a recent user group meeting. My actual remarks were extemporaneous, but I did prepare a narrative from which I derived my speech. I am reproducing my notes so I don’t lose track of the examples. I did not mention specific company names. The Successful Enterprise Search Management (SESM) reference is to the new study Martin White and I wrote for Galatea, a publishing company in the UK. MarkLogic paid me to show up and deliver a talk, and the addled goose wishes other companies would turn to Harrod’s Creek for similar enlightenment. MarkLogic is an interesting company because it goes “beyond search”. The firm addresses the thorny problem of information architecture. Once that issue is confronted, search, reports, repurposing, and other information transformations become much more useful to users. If you have comments or corrections to my opinions, use the comments feature for this Web log. The talk was given in early May 2009, and the Tyra Banks example is now a bit stale. Keep in mind this is my working draft, not my final talk.

Introduction

Thank you for inviting me to be at this conference. My topic is “Multi-Dimensional Content: Enabling Opportunities and Revenue.” A shorter title would be repurposing content to save and make money from information. That’s an important topic today. I want to make a reference to real time information, present two brief cases I researched, offer some observations, and then take questions.

Let me begin with a summary of an event that took place in Manhattan less than a month ago.

Real Time Information

America’s Next Top Model wanted to add some zest to its popular television reality program. The idea was to hold an audition for short models, not the lanky male and female prototypes with whom we are familiar.

The short models gathered in front of a hotel on Central Park South. In a matter of minutes, the crowd began to grow. A police cruiser stopped, and the two officers were watching a full-fledged mêlée in progress, complete with swinging shoulder bags, spike heels, and hair spray. Every combatant was five feet six inches tall or shorter.

The officers called for the SWAT team, but the police were caught by surprise.

I learned in the course of the nine months of research for the new study written by Martin White (a UK-based information governance expert) and myself that a number of police and intelligence groups have embraced one of MarkLogic’s systems to prevent this type of surprise.

Real-time information flows from Twitter, Facebook, and other services are, at their core, publishing methods. The messages may be brief, less than 140 characters or about 12 to 14 words, but they pack a wallop.


MarkLogic’s slicing and dicing capabilities open new revenue opportunities.

Here’s a screenshot of the product about which we heard quite positive comments. This is MarkMail, and it makes it possible to take content from real-time systems such as mail and messaging, process it, and use that information to create opportunities.

Intelligence professionals use the slicing and dicing capabilities to generate intelligence that can save lives and reduce to some extent the type of reactive situation in which the NYPD found itself with the short models disturbance.

Financial services and consulting firms can use MarkMail to produce high value knowledge products for their clients. Publishing companies may have similar opportunities to produce high grade materials from high volume, low quality source material.


Search 2010: Five Game Changers

May 7, 2009

Editor’s Note: This is the outline of Stephen Arnold’s comments at the “debate” session of the Boye 09 Conference in Philadelphia, Pennsylvania, on May 6, 2009. The actual talk will be informal, and these notes are part of the preparation for that talk.

Introduction

Thank you for inviting me to share my ideas with you. I remember that WC Fields had a love-hate relationship with Philadelphia. Approaching the Curtis Building, where we are meeting, I realized that much of the old way of doing business has changed. I don’t have time to dig too deeply into the many content challenges organizations face. If the publisher of the Saturday Evening Post were with us this afternoon, I think Mr. Curtis would have a difficult time explaining why his successful business was marginalized; that is, pushed aside, made into an artifact like the Liberty Bell down the street.

I have been asked to do a “Search 2010” talk twice this year. Predicting the future in today’s troubled economic environment is difficult. Nevertheless, I want to identify five trends in the next 20 minutes. I will try to take a position on each trend to challenge the panelists’ thinking and stimulate questions from you in the audience.

Let’s dive right in. Here are the five trends:

  1. Darwinism and search
  2. Real time search
  3. Google’s enterprise push
  4. Microsoft’s enterprise search
  5. Open source

I want to comment on each, offer a couple of examples, and try to come at these subjects in a way that highlights what my research for Google: The Digital Gutenberg revealed as substantive actions in search.

Search and Darwin

The search sector is in a terrible position. The term “search” has been devalued. Few people know what the word means, yet most people say, “I am pretty good at search.” That confidence is an illusion. The search sector is a tough nut to crack. Well-known companies such as Mondosoft and Ontolica found themselves purchased by an entrepreneur. That company restructured, and now the “old” Mondosoft has been reincarnated, but it is not clear that the new owners will make a success of the business. Delphes, a specialist vendor in Québec, failed. Attensity orchestrated a roll up with two German firms to become more of a force in marketing. A promising system in the Netherlands called Teezir was closed when I visited its office in November 2009. I hear rumors about search vendors who are chasing funding frequently, but I don’t want to mention the names of some of these well known firms in this forum. Not long ago, the high profile Endeca sought support in the form of investments from Intel and SAP’s venture arm. At Oracle, the Secure Enterprise Search 10g product has largely disappeared. The strong survive, which means big players like Google and Microsoft are going to fight for the available revenue.

Real Time Search

What is it? The first thing to say is that real time search is a terrible phrase. Riches await the person who crafts a more appropriate buzzword. The notion is that messages from a service like Twitter fly around in their 140-character glory. The Twitter search system at http://search.twitter.com and the developers who use the Twitter API make it easy to find or see information. Good examples are the services at http://www.twitturly.com and http://www.tweetmeme.com. You look at Tweets (the name for Twitter messages) and you scan the listings on these services. Real time search blends geospatial and mobile operations. Push, not keyword search, complements scanning a list of suggested hits. The mode of user interaction is not keyword search. This is an important distinction.


“Search” means look at or scan. “Search” does not mean type keywords and hunt through a results list. It is possible to send a Tweet to everyone on Twitter or to those who follow you and ask a question. You may get an answer, but the point is that the word “search” does not explain the value of this type of system for business intelligence or marketing, for example. If you run a search with the keyword of a company like Google or Yahoo, you can get information which may or may not be accurate or useful. You will see what’s happening “now”, which is the meaning of “real time”.
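One way to picture the “scan, don’t query” mode is a simple push filter: messages stream past, and anything mentioning a watched company is surfaced without the user typing a search. The Python sketch below uses an invented watch list and invented sample messages; it is not the Twitter API, just an illustration of push versus keyword hunting.

```python
# Sketch of the push-style scan described above: short messages stream by, and
# mentions of tracked companies are surfaced to the reader as they arrive.
# The watch list and sample messages are made up for illustration.

WATCH_LIST = {"google", "yahoo"}

def push_mentions(message_stream, watch_list=WATCH_LIST):
    for message in message_stream:
        words = {w.strip(".,!?").lower() for w in message.split()}
        if words & watch_list:
            yield message  # pushed to the reader; no query was typed

stream = [
    "Lunch was great today",
    "Google just changed its results page layout",
    "Anyone else seeing Yahoo mail outages?",
]

for hit in push_mentions(stream):
    print(hit)
```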


