Headlong into an Abyss
July 17, 2008
Right before the 4th of July, my phone rang. A very enthusiastic person had to speak with me. I hate the phone. Since February 2007, my hearing has gone downhill.
The chipper caller explained that a major organization had a problem. In five minutes, I learned that this outfit had three content processing systems. Each one combined search, collaboration, and content processing functions.
The problem was that no one could find anything. Right after the holiday, I opened my mail program, and there sat several plump PDF files stuffed full of baloney about requirements, guidelines, billing, and other administrivia. The problem boiled down to a request for suggestions about making the three systems work happily together so employees could find information.
I thought about this situation and sent an email message telling Ms. Chipper, “No bid. Tx, Steve”. I do this a lot.
In this essay, I want to run down the four reasons I steer clear of outfits that are ready to take a header off a cliff into the search abyss.
Too Many Toys
The organization has money and buys search toys. No one plays with the search toys, but there is a person who thinks that the organization should play with the search toys.
Buy, Try, Buy, Try
Organizations unable to get one system working just buy another one. One common reason is a change in management, which means any organizational intelligence about search and content processing is lost. One big drug company got a new president, and he mandated a new system. Who really knows how to make a search system work? No one, so buy another one. Maybe it will work. I call this crazy procurement, and it is a sure sign of a dysfunctional organization.
Silo Wars
Multiple search systems can also be a consequence of units that refuse to cooperate. If unit A wants one system, well, unit B wants a different one. This baffles me, because neither system allows a user to access content in one query. When a person tells me, “We need federated search,” I know that this is a silo war situation. The hope is that finding a way to take one query, send it to three or more separate search systems, and return concatenated results will save the day. Not likely. The silo barons will find a way to keep their information, thank you.
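To make the silo-war version of federated search concrete, here is a minimal Python sketch of the naive approach described above: one query fans out to several separate systems and the result lists are simply concatenated. The backends are hypothetical stand-ins, not any vendor's API. Nothing in this sketch reconciles duplicates, relevance scores, or permissions across the silos.

```python
from typing import Callable, Dict, List

# A "search system" here is any callable that takes a query string and
# returns a list of result titles. These are hypothetical stand-ins for
# the separate silo systems, not any vendor's API.
SearchBackend = Callable[[str], List[str]]


def federated_search(query: str, backends: Dict[str, SearchBackend]) -> List[str]:
    """Send one query to every backend and concatenate the raw results.

    Nothing is deduplicated, re-ranked, or checked against permissions:
    each silo's list is appended in turn, labeled with the silo's name.
    """
    results: List[str] = []
    for name, search in backends.items():
        for hit in search(query):
            results.append(f"{name}: {hit}")
    return results


if __name__ == "__main__":
    silos: Dict[str, SearchBackend] = {
        "unit_a": lambda q: [f"A memo about {q}"],
        "unit_b": lambda q: [f"B memo about {q}", f"A memo about {q}"],
    }
    for line in federated_search("budget", silos):
        print(line)
```

Run as written, the same “A memo” shows up twice, once per silo, which is exactly the kind of result list users grumble about.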
Blinkx Actions and Rumors
July 17, 2008
Blinkx is in the news again. As you may know, Blinkx is a video search service. Beta News reported on July 16, 2008, that the company is taking offensive and defensive action. You can read Jacqueline Emigh’s story “Rumored Google Target Blinkx Teams with Microsoft, Others” here.
Ms. Emigh cites another Web log. The paragraph of interest to me was:
“Speculation is rising around a potential acquisition of Blinkx, a video search engine, by either Google or News Corp. (and possibly Yahoo),” noted a blog posting in May by Heather Dougherty, director of research at the Hitwise market research firm.
Blinkx reports that you can search over 26 million hours of video from Google Video, YouTube, MetaCafe, and others.
Among Blinkx’ new deals are tie ups with:
You can learn more about Blinkx here. The Register offers a different angle here.
I am not sure what to make of this story. In 2005, Blinkx was an independent company according to this posting in Search Engine Watch. I have heard that the company is linked to Autonomy, but I need to look into this alleged tie up more closely.
The phrase “sharpened its defense and offense” continues to puzzle me. Does Blinkx want to be purchased by Google or Yahoo? Does Blinkx want to remain independent? More information is needed.
Working Hypotheses
My working hypotheses are:
- This is a PR play, designed to increase visibility of Blinkx and focus attention on what are partnering deals, probably not big money makers yet.
- Blinkx is a hot property, and the company’s management is aggressively positioning the search engine for acquisition.
- Video search is a money pit, and the company is pursuing tie ups in hopes of finding a formula to boost revenues and get the stock moving upwards.
The honking goose will fly over its available information and update this Blinkx coverage. My earlier essay about Blinkx is here.
Stephen Arnold, July 17, 2008
Enterprise Search: It’s Easy but Work Is Never Done
July 17, 2008
The Burton Group caught my attention a couple of years ago with its report describing Microsoft as a superplatform. I liked the term, but the report struck me as overly enthusiastic in favor of Microsoft’s server products.
I was surprised when I saw part one of Margie Semilof’s interview with two Burton Group consultants, Guy Creese and Larry Cannell. These folks were described as experts in content management, a discipline with a somewhat checkered history in the pantheon of enterprise software applications. You can read the first part of the interview here. The interview carries a July 15, 2008, date, and I am capturing my personal thoughts on July 16, 2008. That’s my mode of operation, a euro short and a day late. Also, I am not enthusiastic about CMS experts making the jump to enterprise search expertise. The leap can be made, but it’s like jumping from the frying pan into the fire.
The interview contains a rich vein of intellectual gold or what appears to me to be sort of gold. I jotted down two points made by the Burton experts, and I want to offer some color around them. When you read the interview, your conclusions and takeaways will probably differ from mine. I am an opinionated goose, so if that bothers you, quit reading now.
Let me address two points.
First, this question and answer surprised me:
Question: How much development work is required with search technology?
Answer by Guy Creese, Burton Group expert in content management: It’s pretty easy… Usually a company is up and running and can see most of its documents without trouble.
Yikes. Enterprise search dissatisfies anywhere from half to two thirds of a system’s users. Enterprise search systems are among the most troublesome enterprise applications to set up, optimize, and maintain. Even the Google Search Appliance, one of the most toaster-like search solutions, takes some effort to get into fighting shape. Customization requires expertise with the OneBox API. “Seeing documents” and finding information are two quite different functions in my experience.
Second, this question and answer ran counter to the research I conducted for the first three editions of Enterprise Search Report (2004-2006) and my most recent study Beyond Search (2008).
Question: Search technology has some care and feeding involved. How do companies organize the various tasks?
Answer by Guy Creese, Burton Group expert in content management: This is not onerous. Companies don’t have huge armies [to do this work], but someone has to know the formats, whether to index, how quickly they refresh. If no one worries about this, then search becomes less effective. So beyond the eye candy, you have to know how to maintain and adjust your search.
“Not onerous” runs counter to the data I have gathered in surveys and focus groups. “Formats” implies transformation, and transformation can be difficult and expensive. Hooking search into work processes requires analysis and then customization of search functions. Search that processes content in content management systems often requires specialized set up, particularly when the search system indexes duplicate or versioned documents. Rich text processing, a highly desirable function, can wander off the beaten path unless customization and tuning are performed.
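Here is a minimal Python sketch of one small piece of that specialized set up: collapsing versioned and duplicate CMS documents before they reach the index. The Doc record, its doc_id and version fields, and the lowercase-collapse-whitespace-then-hash rule are assumptions for illustration, not any CMS or search vendor's method. Production systems usually also need near-duplicate detection, which this sketch does not attempt.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Doc:
    doc_id: str   # hypothetical CMS identifier shared by all versions of a document
    version: int  # higher number means newer version
    text: str


def fingerprint(text: str) -> str:
    """Hash the text after lowercasing and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def prepare_for_index(docs: List[Doc]) -> List[Doc]:
    # Step 1: keep only the newest version of each CMS document.
    latest: Dict[str, Doc] = {}
    for doc in docs:
        current = latest.get(doc.doc_id)
        if current is None or doc.version > current.version:
            latest[doc.doc_id] = doc

    # Step 2: drop documents whose normalized text matches one already kept.
    seen: Set[str] = set()
    kept: List[Doc] = []
    for doc in sorted(latest.values(), key=lambda d: d.doc_id):
        fp = fingerprint(doc.text)
        if fp not in seen:
            seen.add(fp)
            kept.append(doc)
    return kept


if __name__ == "__main__":
    docs = [
        Doc("p1", 1, "Old travel policy"),
        Doc("p1", 2, "New travel policy"),
        Doc("p2", 1, "new  TRAVEL policy"),  # a copy saved under another ID
        Doc("p3", 1, "Expense rules"),
    ]
    for doc in prepare_for_index(docs):
        print(doc.doc_id, doc.version)
```

The sample run keeps version 2 of p1 and p3, and drops p2 because its text matches p1 once case and spacing are ignored.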
Observations
First, only a handful of people have a solid understanding of enterprise search. Miles Kehoe, one of the Verity wizards, is the subject of a Search Wizards Speak interview that will be published on ArnoldIT.com on July 21, 2008. His company, New Idea Engineering, has considerable expertise in search, and you can read his views on what must be done to ensure a satisfactory deployment. Another expert is my son, Erik Arnold, whose company, Adhere Solutions, specializes in customizing and integrating the Google Search Appliance into enterprise environments. To my knowledge, neither Mr. Kehoe nor Mr. Arnold characterizes search as a “pretty easy” task. In fact, I can’t recall anyone in my circle of professional acquaintances describing enterprise search as “pretty easy.”
Second, I am concerned that content management systems are expanding into applications and functions that are not germane to these systems’ capabilities. For example, CMS needs search. Interwoven has struck a deal with Vivisimo to provide search that “just works” to Interwoven customers. Vivisimo has worked hard to create a seamless experience, but, based on my sources, the initial work was not “pretty easy”. In fact, Interwoven had a mixed track record in delivering search before hooking up with Vivisimo. CMS vendors are also asserting that their systems are social. Well, CMS allows different people to index a document. I think that’s a social and collaborative function. But social software to me suggests Digg, Twitter, and Mahalo type functionality. Implementing these technologies in Broadvision (if it is still paddling upstream) or Vignette might take some doing.
Third, SharePoint (a favorite of Burton if I recall the superplatform document) is a polymorphic software system. Once it was a CMS. Now it is a collaboration platform just like Exchange. I think these are marketing words slapped on servers which are positioned to make sales, not solve problems. SharePoint includes a search function, which is improving. But deploying a robust search system within SharePoint is hard in my experience. I prefer third-party software from such companies as ISYS Search Software. ISYS, along with Coveo, offers systems that are indeed much easier to deploy, configure, and maintain than SharePoint. But planning and experience with SharePoint are still necessary.
I look forward to the second part of this interesting interview with CMS experts about enterprise search. Agree? Disagree? Quack back.
Stephen Arnold, July 17, 2008
Google: Web Search Market Share Increasing, Again
July 16, 2008
Silicon.com ran Stephen Shankland’s essay “Google’s Search Share Continues to Creep Up.” You can read the full text here. In the flurry of news about Google-Viacom, Google-Microsoft-Yahoo, and Google everywhere, I missed this point:
Its share increased from 68.29 per cent in May to 69.17 per cent in June, the analyst firm said. Over the same period, Yahoo! dropped from 19.95 per cent to 19.62 per cent and Microsoft dropped from 5.89 per cent to 5.46 per cent.
In my little world, a company that gains share while under such intense media scrutiny and in the midst of legal hassles on a number of fronts is interesting. In fact, I can’t recall reading about a company increasing market share while coming under attack from many different quarters, in different countries, and in different technical areas.
I looked up the term monopoly to refresh my memory. Wiktionary provided this memory nudge:
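For readers who like the arithmetic spelled out, this short Python snippet computes the May to June change in percentage points using only the figures quoted above. The rounding to two decimals simply hides floating point noise.

```python
# May and June 2008 search share figures (per cent), as quoted above.
SHARES = {
    "Google": (68.29, 69.17),
    "Yahoo!": (19.95, 19.62),
    "Microsoft": (5.89, 5.46),
}


def point_change(may: float, june: float) -> float:
    """Change in percentage points, rounded to two decimals to hide float noise."""
    return round(june - may, 2)


if __name__ == "__main__":
    for company, (may, june) in SHARES.items():
        print(f"{company}: {point_change(may, june):+.2f} points")
```

It prints +0.88 points for Google, -0.33 for Yahoo!, and -0.43 for Microsoft. Google’s gain is larger than Yahoo! and Microsoft’s combined loss of 0.76 points, so the other 0.12 points came from the rest of the field.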
A situation in which solely one company exclusively provides a particular product or service, dominating that market and generally exerting powerful control over it; An exclusive control over the trade or manufacture of a commodity; A company dominating a market in one of the above manners.
If one is a stickler, the loophole is the word “exclusively”. Google is not the only provider of no-charge Web search. I can choose to use Live.com, Yahoo.com, Baidu.com (or have one of my interns tackle Chinese for me), and hundreds of other services which I list here.
In Google Version 2.0, which you can learn more about here, I don’t spend much time discussing Google’s domination of Web search. I accept that hegemony as the status quo. After all, Google’s been grinding forward for 10 years without significant opposition except from Baidu.com and Yandex.com and maybe a couple of other companies operating outside of the ken of US pundits. In North America, despite lots of hand waving and public relations, no vendor has been able to focus the technical effort on search to leapfrog Google. As a result, Google has had time to improve its operation and increase its lead as Mr. Shankland points out.
The real challenge that Google presents is that its Web search and ad business is so dominant that getting a better or clearer view of the company is very difficult. In my research, I have learned that Google has an application platform. Search and advertising are just two very successful applications running on the Google infrastructure. This means that Google could enter other markets at low incremental cost.
The company is moving into other markets now, but taking baby steps and following an unorthodox approach. Much has been made of the departure of Google executives who have cashed out or grown tired of the idiosyncrasies of life at the GOOG. Some of the recent information I have gathered suggests that Google is adapting to its work force changes. Those adaptations are likely to make it even more difficult to predict what Google will do next. Consider that:
- Google is hiring smart, young engineers and giving them freedom to make decisions as long as there are data to back up those decisions. This approach increases risk to a certain degree and it also makes surprises more likely.
- Google’s innovation process of pushing out products and services has slowed, which reflects more management discipline. Nevertheless, if a beta generates a positive result, Google flows resources to that service in response to clicks. The addition of “voting” to search results is a current example of this Darwinian behavior.
- Google is concatenating its services in roll-up patent applications. Google is becoming more seamless, which reduces complexity to some degree and allows faster response to market conditions.
Google’s not the exclusive provider yet. But unless the competition gets into gear, only lawyers and Google’s own management missteps will prevent Google’s extending its reach.
Stephen Arnold, July 16, 2008
Vertical Search Resurgent
July 16, 2008
Several years ago, the mantra among some of my financial service clients was, “Vertical search.” What’s vertical search? It is two ideas rolled into one buzzword.
A Casual Definition
First, the content processed by the search system is about a particular topic. Different database producers define the scope of a database in idiosyncratic ways. In Compendex, an index of engineering information, you can find a wide range of engineering topics, covering many fields. You can find articles about environmental engineering, some of which look to me as if they belong in a database about chemistry. But in general, the processed information fits into a topical basket. Chemical Abstracts is about chemistry, but the span of chemistry is wide. Nevertheless, the guts of a vertical search engine is bounded content that is brought together in a generally useful topic area. When you look for information about travel, you are using a vertical search engine. For example, Orbitz.com and BookIt.com are vertical search engines.
Second, the content has to be searchable. So, vertical content collections require a search engine. Vertical content is often structured. When you look for a flight from LGA to SFO, you fill in dates, times, departure airport code, arrival airport code, etc. A parametric query is a fancy way of saying, “Training wheels for a SQL query.” But vertical content collections can also be processed by the menagerie of text processing systems. When you query the Dr. Koop Web site, you are using the type of search system provided by Live.com and Yahoo.com.
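The “training wheels” idea can be sketched in a few lines of Python with the standard library’s sqlite3 module: the form fields become bound parameters in a fixed SQL statement, so the user never writes SQL. The flights table, its columns, and the sample flight numbers are hypothetical; no travel site’s actual schema is implied.

```python
import sqlite3
from typing import List, Tuple

Flight = Tuple[str, str, str, str]


def find_flights(conn: sqlite3.Connection, depart: str, arrive: str, date: str) -> List[Flight]:
    # The form fields are bound to "?" placeholders in a fixed statement,
    # so the user fills in boxes and never writes (or injects) SQL.
    sql = (
        "SELECT flight_no, depart_code, arrive_code, depart_date "
        "FROM flights "
        "WHERE depart_code = ? AND arrive_code = ? AND depart_date = ? "
        "ORDER BY flight_no"
    )
    return conn.execute(sql, (depart, arrive, date)).fetchall()


def demo_db() -> sqlite3.Connection:
    """Build a tiny in-memory table of hypothetical flights."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE flights (flight_no TEXT, depart_code TEXT, "
        "arrive_code TEXT, depart_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO flights VALUES (?, ?, ?, ?)",
        [
            ("XX100", "LGA", "SFO", "2008-07-20"),
            ("XX200", "LGA", "SFO", "2008-07-21"),
            ("XX300", "JFK", "SFO", "2008-07-20"),
        ],
    )
    return conn


if __name__ == "__main__":
    print(find_flights(demo_db(), "LGA", "SFO", "2008-07-20"))
```

The design choice is the point: the vendor controls the query shape, and the user only supplies values for the slots.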
Google is a horizontal search engine, but it is also a vertical search engine. If you navigate to Google’s advanced search page, which is accessed by fewer than three percent of Google’s users, you will find links to a number of vertical search engines; for example, the Microsoft collection and the US government collection. Note: Google’s universal search is a bit of marketing swizzle that means Google can take a query and pass it across indexes for discrete collections. The results are pulled together, deduplicated, and relevance ranked. This is a function available from Vivisimo since 2000. Universal search, Google style, displays maps and images, but it is far from cutting edge technology save for one Google factor: scale.
Why am I writing about vertical search when the topic came and went for me years ago? In fact, at the height of the vertical search frenzy I dismissed the hype. Innovators, unaware of the vertical nature of commercial databases 30 years ago, thought something quite new was at hand. Wrong. Google’s horizontal information dominance forced other companies to find niches where Google was not doing a good job or any job for that matter.
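The merge step in that note (pull results together, deduplicate, relevance rank) can be sketched in Python as follows. This illustrates the general technique only; it is not Google’s or Vivisimo’s implementation. It assumes each collection already returns URLs with scores on a shared 0 to 1 scale, and making scores from different indexes comparable is the hard part, which is not shown.

```python
from typing import Dict, List, Tuple

# (url, score) pairs; scores are assumed to be pre-normalized to 0..1.
Hit = Tuple[str, float]


def merge_results(per_collection: Dict[str, List[Hit]]) -> List[Hit]:
    """Pool hits from several collections, deduplicate, and rank."""
    best: Dict[str, float] = {}
    for hits in per_collection.values():
        for url, score in hits:
            # Deduplicate by URL, keeping the highest score any collection gave it.
            if url not in best or score > best[url]:
                best[url] = score
    # Rank by score, highest first; break ties alphabetically for stable output.
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    results = merge_results({
        "web": [("example.com/a", 0.9), ("example.com/b", 0.5)],
        "images": [("example.com/b", 0.7), ("example.com/c", 0.6)],
    })
    for url, score in results:
        print(f"{score:.1f}  {url}")
```

Compared with plain concatenation, example.com/b appears once, with the better of its two scores.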
Vertical search flashed on my radar today (July 15, 2008) when I flipped through the wonderful information in my tireless news reader.
Autonomy announced:
that Foundography, a subsidiary of Nexus Business Media Ltd, has selected Autonomy to power vertical search on its website (sic) for IT professionals: foundographytech.com. The site enables business information users to access only the information they want and through Autonomy’s unique conceptual capabilities delivers an ‘already found’ set of results, providing pertinent information users may not have known existed. The site also presents a unique proposition for advertisers, providing conceptually targeted ad selling.
SharePoint Faithful: Infrastructure Updates
July 16, 2008
On July 15, 2008, Microsoft announced three updates that have an impact on our favorite Swiss Army knife for the enterprise. You can download the Infrastructure Update for Microsoft Office Servers (KB951297) (Download X86, Download X64). These updates provide the features previously shipped in Search Server 2008 and Search Server 2008 Express. I know this gets confusing, but for the moment, SharePoint and search are in sync. Among the features are “federated search” which allows the different repositories to be indexed and results displayed in a single results list. Vivisimo offered this function in 2000, and it is a positive step for Microsoft to offer this function. A spiffy dashboard is provided. We have not verified the assertion that the whole SharePoint ecosystem delivers performance improvements. We know that to goose SharePoint, the drill is to scale up and out. The problem, of course, is that budgets are shrinking. Cloud-based SharePoint solutions are starting to look mighty appealing.
Here’s a screen capture of the federated results. Suggestions appear in the right hand column. The central display includes Web pages and docx files. There is no on the fly clustering in this display.
Microsoft has posted some knowledge base articles to explain other enhancements to the SharePoint infrastructure; for example:
Description of the Infrastructure Update for Microsoft Office Servers (KB951297)
Fixes Included in the Infrastructure Update for Microsoft Office Servers (KB953750)
You will want to scan the SharePoint Team blog. Not surprisingly, there are some installation procedures that must be followed. We clicked on the links, but as of 7 pm Eastern time on July 15, 2008, not all the links were active. Check this link to see if the necessary knowledge base articles are available.
Take it from me. Read these documents. A hitch in the git along can ruin a sysadmin’s weekend.
Stephen Arnold, July 16, 2008
Google and Yahoo: Alleged Conspirators
July 15, 2008
Marketwatch reported that Brad Smith, a Microsoft executive, asserted that Silicon Valley’s two most interesting search companies conspired against the Redmond giant. The venue was a hearing before the U.S. Senate Committee on the Judiciary. The scope of hearings begins with one issue and then wanders to and fro.
Jeffry Bartash wrote “Microsoft Takes Aim at Yahoo CEO”, and you should read the full story here. Joseph Weisenthal reported on the hearing in which the assertion was made. You can find the gory details via PaidContent.org on the Washington Post’s Web site here. Both write-ups agree on the escalating war of words in the Microsoft-Yahoo matter. You can find other summaries of the hearing in Google News and news aggregator sites.
The most interesting comment to me about this war of words comes from Mr. Bartash:
One thing that seems clear is that the traditional corpus of antitrust law isn’t well suited for this market.
Yes, and that’s why nothing particularly constructive or informed will come from high-tech companies’ executives, lawyers, and elected officials mixing it up in a cavernous room with aides passing notes, photographers crawling on their knees, and the Senate elite asking questions prepared by a group of 20-somethings with Potomac fever.
Assume that the hearing results in a decision that triggers a legal matter. The outcome enters the unusual world of litigation. Normal rules don’t apply, and the decision can often baffle those involved in the process.
I like Microsoft’s approach. Cut through the technology, the talk of online ad methodology, and marketing. The accusation, if indeed a real accusation, simplifies Microsoft’s position. Making something simple is a key skill. In my view, Microsoft has seized the high ground. Now Google and Yahoo have to take the hill.
Stephen Arnold, July 15, 2008
Update: July 15, 2008, 5 pm: My news reader cheerfully delivered Erick Schonfeld’s “Google’s Talking Points for Today’s Antitrust Hearings: The Only One Who Won’t Like Our Yahoo Deal Is Microsoft”. This is pretty useful information because TechCrunch provides an insight into Google’s attitude. The comments to the base essay are also helpful. I liked SteveR’s comment that Microsoft is now on the flip side of “an anti trust case”.
Google’s NLP in the Address Bar
July 15, 2008
The USPTO published US7401072, “Named URL Entry”. Awarded to Google, the patent discloses a system for performing natural language search on words typed in a browser’s navigation bar. The idea is that when Google Toolbar, Google ig, or a Google-friendly browser is installed on a user’s system, a user can type queries in the navigation bar, not just the search box.
How does this magic work? You will want to read the patent application. My initial thought was that the user would have a stateful Google session running; for example, Google “ig”, Google Docs, or the Google Toolbar. As I thought about this invention, I wondered, “Will Google introduce its own browser?”
I tried to dig up some useful information about the inventors of this disclosed system and method. What I found was slim pickings. John Piscitello (former Product Manager, Google Video) seems to have left the search giant. Xuefu Wang and Breen Hagan are mysteries to me. And Simon Tong, Senior Research Scientist at Google, leaves few biographical traces in content indexed by public search engines.
I find the lack of information about Dr. Tong interesting. He is mentioned in more than a dozen Google patent documents, which qualifies him as a genuine Google wizard. Dr. Tong has received several Google awards for contributions to the firm; for example, the Google Founders’ Award. He does play ping pong very well and enjoys photography. Beyond those facts and his ties to Stanford’s Daphne Koller, I don’t know much about his technical contributions to Google. He did figure as a co-inventor on what I consider a very important Google invention; namely, Large Scale Machine Learning Systems and Methods, 7222127, May 22, 2007. If you have not reviewed this patent document, a half hour with this disclosure may be helpful in understanding Google’s approach to computational intelligence.
My research suggests that when Dr. Tong’s name is on a Google patent document, that document warrants close attention. Almost as interesting is the impact of this invention if Google brings out its own browser. The notion of a walled garden exerts its charms on many because of the control it delivers along with the joys within.
Stephen Arnold, July 15, 2008
Shooing Away Legal Eagles
July 15, 2008
This lawyer stuff is not to my liking. I read the Google Web log post “The Law and Your Privacy: An Update” and I shivered. I recall my debate coach in high school criticizing my linking of a broad generalization with a personal argument. He used a fancy Greek term I didn’t understand. What I did understand was his telling me that I would lose if my opponent drove a truck through the argument and over me. You can read the Google Web log post here.
Then I scanned Michael Arrington’s “Google/Viacom Agree to Preserve User Anonymity in Data Shakedown”, and I thought my debate coach would have smiled that chilling grin he used when he knew he had a winner. Mr. Arrington summarizes what the various legal eagles worked out, and then he added this remark with which I agree:
I for one have no further objections to this data being handed over from a privacy standpoint, although I still urge Viacom to stop the endless litigation and consider more innovative business models around their content.
By the way, the TechCrunch post includes the court document issued by Judge Stanton.
I have had the opportunity to serve as a consultant in some legal matters, and I think my health issues in early 2007 were blowback from the stress of these jobs. Once a legal process starts moving, in my opinion, the matter takes on a life of its own. The logic is exactly that honed in university rhetoric courses and then refined in law school.
Once inside the argument, the reasoning becomes sharply faceted. For those without much experience getting ripped by one of the winningest high school debate coaches in Illinois’ history, you have to keep fingers out of the moving machinery. A misstep can mean real trouble.
Mr. Arrington’s point is one that speaks clearly and wisely. Throwing money at a crusade is fine and dandy for a while, but most legal skirmishes can be resolved by talking. In this matter, I know I cannot tell who is “right” and who is “wrong”. Once the legal drag line starts ripping earth, you see the big gouges and piles of dirt. It’s pretty tough to envision what was there before the process began.
In my view, big media ignored the grassroots revolution that has been building. I wrote an essay about Napster years ago in which I portrayed media’s view of that service as the spawn of Satan. The angle I took, as I recall, was that the people doing the bulk of the file sharing, copying, and trading were the sons and daughters of the big media people and guys like me. I recall watching students at a major university engaging in mass audio ripping and sharing when I visited my son’s dorm. Now there are young people who just don’t understand why everyone doesn’t behave as they do.
Today, with traditional media business models failing, controlling ephemeral zeros and ones is pretty tough. No, not tough. Impossible. Google bought YouTube.com and with it Google acquired traffic and an opportunity to monetize. The traffic is still there, but the service has been difficult to monetize. Meanwhile savvy young people have seized upon online media as their next big thing.
Young people see online video as the way to snag chunks of crunchy information. I have done several research projects probing into young people’s attitudes about online information. I don’t understand what I learned, but I can sum it up easily:
Online is the standard TV-telephone-juke box-hang out super enabler. I am a dinosaur. I think it is admirable to preserve traditions. A crusade, as long as one is not killed, can be enervating. However, with each passing year, the number of young people who are going about their business oblivious to the needs of courts, Google, Viacom, and adults in general grows. I would assert that if Google and Viacom did not exist, the inexorable march of executives’ children would continue into the fluid, copyright-indifferent digital future.
Based on my research, in my opinion, it is more productive to work out a pragmatic solution and turn one’s attention to serving the needs of this growing and soon to be dominant majority consumer mass. I have a wonderful gold Waterman fountain pen, a relic from my Booz, Allen & Hamilton days in the late 1970s and early 1980s. I don’t use it. I can write this essay on paper, and most people can read my writing. But I have had to adjust to the new world. My expensive Waterman is sleeping in its velvet-lined storage box, and I don’t think I will use it again, ever.
Adjust and adapt. Otherwise the rising tide of young people’s behavior will marginalize BOTH new companies like Google and the old guard like Viacom.
Agree? Disagree? So educate me, already.
Stephen Arnold, July 15, 2008
Xobni: Email Search and More
July 15, 2008
The process of locating an email message remains, ah, shall I say, uneven. I have seen demos of some nifty email search from Coveo, the Canadian search and content processing company that is expanding its product portfolio and its top line revenue.
Xobni, an email extender, now integrates with Facebook. You can read about this function here. I am not a fan of social search, and I think that this type of function in an organization can deliver some surprises to senior management unless certain precautions are taken.
A client asked me if Xobni could be used as an alternative to Clearwell Systems. I have described Clearwell’s approach to content processing here. I am working on a more thorough analysis of Xobni now. My hypothesis is that Xobni is designed for the average email user. Clearwell, on the other hand, is tailored to the needs of attorneys and law librarians, among other specialists, working on legal matters.
Xobni does provide email search, but its reach extends to email organization, inbox management, and social functions. Xobni has a Googler on the staff and venture money in its bank account.
You can download a demo of the product here. Xobni runs on Microsoft Windows and requires that you have Outlook 2003 or 2007 installed.
The company has a nifty demo here.
One of the two or three people who read this Web log alerted me to Xobni’s embedded entity extraction function. Xobni can parse emails and pull out phone numbers, among other entities. The software features a function that threads people together. This function is somewhat similar to Clearwell Systems’ email threading operation.
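A rough Python sketch of the phone-number piece of entity extraction follows. It is not Xobni’s method. It is a deliberately simple regular expression that assumes North American formats such as (502) 555-0123, 502-555-0123, or 502.555.0123, and it normalizes every match to one form. Real extractors handle international numbers, extensions, and surrounding context.

```python
import re
from typing import List

# A deliberately simple pattern for North American numbers written as
# (502) 555-0123, 502-555-0123, or 502.555.0123.
PHONE_RE = re.compile(r"\(?\b(\d{3})\)?[-.\s]?(\d{3})[-.\s](\d{4})\b")


def extract_phone_numbers(body: str) -> List[str]:
    """Return distinct phone numbers in first-seen order, as NNN-NNN-NNNN."""
    found: List[str] = []
    for area, exchange, line in PHONE_RE.findall(body):
        number = f"{area}-{exchange}-{line}"
        if number not in found:
            found.append(number)
    return found


if __name__ == "__main__":
    email_body = "Call me at (502) 555-0123 or 502.555.0199. Again: 502-555-0123."
    print(extract_phone_numbers(email_body))
```

The sample prints two numbers, not three, because the repeated number is normalized and deduplicated.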
What I find interesting about Xobni is that sophisticated text processing operations are finding their way into what are consumer applications or mainstream business applications.
One risk to Xobni is that Microsoft embeds similar functions into the next release of Outlook. My experience suggests that Xobni is positioning itself to be purchased, possibly by Microsoft.
My concern with any application written for Outlook is that the personal store management issues loom large. Security simply does not exist when most users can copy a PST file and have a go at browsing email, sometimes another person’s.
Take a look at Xobni. There will be more interesting uses of text processing functions.
Stephen Arnold, July 15, 2008