SharePoint Placemat

June 28, 2008

Microsoft SharePoint and I got to know one another several years ago. Via a referral, a Microsoft Gold Certified Partner wanted my team and me to run some tests on a SharePoint application. We got everything running, wrote our report, and the Gold Certified Partner paid promptly.

After the project, one of my colleagues remarked, “SharePoint is really complex.” We put the idea aside until someone emailed us a SharePoint placemat. A copy of this remarkable diagram is available if you want to look at it. You can find it at SharePointSearch.com here.

Here is a thumbnail of the full diagram, but I strongly urge you to download the full-size version. Is it a joke of some type? My colleagues and I saw something similar from a Microsoft partner in New Zealand a year ago, but this placemat is a triumph of sorts. The company that prepared the diagram is Impac Systems Engineering.

[Image: the Impac Systems Engineering SharePoint placemat]

The complexity of search in general and SharePoint in particular is an interesting topic. Search can be quite a challenge. One recent example is the inability of Internet Explorer to open a SharePoint document. You can read more here and download a fix here. Embedding search in a content and collaboration system with data management features may push the software to its limits.

CleverWorkArounds.com has an essay called “Why Do SharePoint Projects Fail?” You can look at Part 5 here. I was unable to locate the other installments of this discussion, however. (Part 3 is here.) For me, three main points bear on the almost-funny placemat diagram:

  1. The skills required to implement SharePoint include “IIS, Windows Server, TCP/IP & networks, SQL Server 2005 Advanced Administration, Firewalls, Proxies, Active Directory, Authentication, Security, IT Infrastructure Design, Hardware, Performance Monitoring, Capacity Planning, Workflow, IE, Firefox, Office Client tools, ASP.NET, HTML, JavaScript, AJAX, XSL, XSLT, Exchange/SMTP, Clustering, NLB, SANs, Backup Solutions, Single Sign on, Monitoring & Troubleshooting, Global Deployments, Dev, Test, Staging, Production – Staged deployments, ITIL, Virtualization.”
  2. “SharePoint is complex and the products it relies on are also complex. In the wrong infrastructure/architect hands, this can cause costly problems.”
  3. “… if there is not a certain degree of discipline around change management, configuration management, procedures, standards and guidelines to administrators, users, site owners and developers, bad things will happen.”

These points underscore the problem with “boil the ocean” systems. The fire needed to get water sufficiently hot to cook eggs can consume the pot, leading to a big mess.

Observations

I took another look at the placemat diagram and reread Part 3 and Part 5 of the essay “Why Do SharePoint Projects Fail?” Let me offer several observations from my dirt floor cabin in the hills of rural Kentucky:

First, SharePoint is a beast. Enterprise search is a monster. What will the progeny of these two behemoths be like? My opinion is that it will be tough to see through the red ink flooding some SharePoint projects. Toss in a hugely complex system such as Fast Search & Transfer’s Enterprise Search Platform, and you have a very interesting challenge to resolve.

Second, complexity is Miracle-Gro for consultants. SharePoint is complex, and it will probably only get more complicated. In my experience, Microsoft software becomes efflorescent quickly.

Finally, SharePoint may deliver a system that is out of step with cloud-based services. SharePoint as a hosted or cloud-based service is generating some buzz. However, will the latency present in many on-premises installations become a bigger issue when the system is delivered as a service? My view is that latency, more than security or data confidentiality, will bog down a SaaS implementation of SharePoint.

SharePoint is hugely successful. I heard that there are more than 65,000 licenses in North America alone. The SharePoint market is a tempting one for a company like Google to consider as ripe for an alternative.

Stephen Arnold, June 27, 2008

Microsoft Powerset: Is There a Role for Amazon?

June 27, 2008

On May 10, 2008, I offered some thoughts about Microsoft’s alleged interest in Powerset. You can find this bit of goose quacking here.

In case you missed the flurry of articles, essays, and opinion pieces, more rumors of a Microsoft Powerset tie-up are in the wind. Matt Marshall ignited this story with his write-up “Microsoft to Buy Semantic Search Engine Powerset for $100 Million Plus”. You must read it here. The most interesting statement in the essay is:

Google has generally dismissed Powerset’s semantic, or “natural language” approach as being only marginally interesting, even though Google has hired some semantic specialists to work on that approach in limited fashion.

My research for Bear Stearns last year revealed that Google has more than “some specialists” working on semantic issues. Alas, that document, “Google’s Semantic Web: the Radical Change Coming to Search and the Profound Implications to Yahoo! & Microsoft”, is no longer easily available. There is some information about the work of Dr. Ramanathan Guha in my Google Version 2.0 study, but the publisher insists on charging people for the analysis of Dr. Guha’s five patent applications. Each of these comes at pieces of the semantic puzzle in quite innovative ways. If Dr. Guha’s name does not ring a bell, he worked on the documents that set forth the so-called Semantic Web.

So, according to this statement by Mr. Marshall, Google is not too keen on Powerset-style semantics. I agree, and I will get to the reasons in the Observations section of this essay.

The story triggered a wave of comments. You can find very useful link trails at Techmeme.com and Megite.com. The one essay you will want to read is Michael Arrington’s “Microsoft to Buy Powerset? Not Just Yet.” By the time you read this belated write-up, there will be more information available. I enjoy Mr. Arrington’s writing, and his point about the Powerset user interface is dead accurate. We must remember that users are creatures of habit, and the user community seems to like typing a couple of words, hitting the Enter key, and accepting the first three or four Google results as pretty darn good.

[Image: Powerset]

Semantic technology is very important. Martin White and I are working on a new study, and at this point it appears that semantic technology is something that belongs out of sight. Semantic technology can improve results, but like my late grandmother’s girdle and garters, a best-left-unseen fixture of my childhood, the direct experience is appropriate only for a select few.

An Amazon Connection?

My interest in a Microsoft Powerset deal pivots on some information that I believe has a kernel of truth buried in it. Earlier this year, I learned that Microsoft had a keen interest in Amazon’s database technology. Actually, the interest was not in the Oracle database that sits, like a black widow spider, at the center of Amazon’s Web, but in the wrapper that Amazon allegedly uses to keep direct access to the Oracle tables from creating technical problems.

Amazon had ventured into new territory, tapping graduate students from the Netherlands, open source, specialist vendors, and internal Amazon wizards to build its present infrastructure. Amazon has apparently succeeded in creating a Google-like infrastructure at a fraction of the cost of Google’s own infrastructure. Amazon also has fewer engineers and more commercial sense than Google.

In the last 18 months, Amazon has pushed into cloud computing, Amazon Web Services, and jump-starting a wide range of start-ups needful of a sugar daddy. I recently wrote about Zoomii.com, one innovator surfing on the Amazon Web Services “wave”. You can read that essay here.

Microsoft needs a NASCAR engine for its online business. Microsoft is building data centers. But compared to Amazon and Google, Microsoft’s data centers are a couple of steps behind, based on my research work.

At one meeting in Seattle, I heard that Microsoft was “quite involved” with Amazon. When I probed the speaker for details, the engineer quickly changed the subject.

Powerset, if my sources are correct (which I often doubt), is using Amazon Web Services for some of its processing. If true, we have an interesting possibility that Microsoft may be pulled into an even closer relationship with Amazon.

I am one of the people who thought that Microsoft would be better able to compete in the post-Google world if Microsoft bought Amazon. Now let me get to my thinking, and, as always, I invite comments. First, Microsoft would gain Amazon’s revenue and technical know-how. Arguably these assets could provide a useful platform for a larger presence in the online world.

Second, Microsoft gains the cloud-based infrastructure that Amazon has up and running. From my point of view, this approach makes more sense than trying to whip Windows Server and SQL Server into shape. The Live.com services could run on Amazon or, alternatively, the whopping big Microsoft data centers could provide more infrastructure for Amazon. An added benefit is that Microsoft’s engineers, despite the company’s spotty reputation for engineering, seem to me to be more disciplined than Amazon’s engineers. I have heard that Amazon pivots on teams small enough to be fed with a pizza. While good for lone ranger programmers, the resulting code can be tough to troubleshoot because each team can do what it needs to do to resolve a problem. The approach may be cheaper in the short run, but in my opinion it creates the risk of a cost time bomb: a problem can be tough to isolate and then fix, and every minute of downtime translates to a loss in credibility or revenue.


IBM’s Vertical Search Engine for Research Papers: As Disappointing as IBM Planetwide Search

June 26, 2008

I want to pick up the thread of my discussion of IBM’s Planetwide search system. IBM offers a vertical search system for its research publications. If you are not familiar with this system, you can access it here: http://domino.research.ibm.com/library/cyberdig.nsf/index.html.

The default search page features fields. I assume that IBM believes anyone looking for IBM research information feels comfortable specifying authors, restricting reports by geographic region, and narrowing a query to a title or abstract.

[Image: the IBM research search form]

The first query I ran was “dataspace”, an approach to data management that dates from the 1990s. The query returned a null set just like my query for a WebFountain document on IBM Planetwide. No suggestions. No “did you mean”. No training wheels in the form of “See Also” references.

The second query was one of my favorites, “programmable search engine”. IBM did quite a bit of research related to this technical notion in 2004 and 2005. Again, a null set.

My third query was for Ramanathan Guha, one of the wizards involved in defining bits and pieces of the Semantic Web. Again a null set. Zero hits. I was surprised because Ramanathan Guha worked at IBM Almaden before he went to Google and promptly filed five patent applications on the same day in 2005.

My fourth query was for “Semantic Web.” I was not too hopeful. I was zero for three in the basic query department. The system generated a page of results.

[Image: IBM research results for the query “Semantic Web”]

When I scanned this list, I noticed three quirks:

  1. I could not figure out the relevance logic in this list. The first hit does not have “semantic web” in its title, but the phrase appears in the abstract. The date is 2005. The paper references the Semantic Web, yet its focus is on two IBM-ish notions, Model-Driven Architecture (MDA) and Ontology Definition Metamodel (ODM).
  2. Newer documents appeared deep in the result list; for example, Kamal Bhattacharya, Cagdas Gerede, Richard Hull, Rong Liu, and Jianwen Su (2007), “Towards formal analysis of artifact-centric business process models”, report RC24282. I could not find a way to sort by date.
  3. A document that I thought was relevant was even deeper in the result list. The title, the abstract, and the paper itself evidenced numerous references to semantics and concepts germane to the query. After examining the paper, I wondered if the IBM system was putting the most relevant documents at the foot of the results list, not the top. Furthermore, there were no 2008 documents on this subject, and I could not figure out exactly what was in this collection.

I clicked on the hot link for recent news. The most recent news was dated 2007, but the system offered me a hot link to 2008 news. I was expecting the news to be displayed in reverse chronological order, with the most recent news at the head of the page and the older news at the foot. Nope. I clicked on the hot link for 2008 news, and the system displayed this page:

[Image: the IBM research 2008 news page]

At this point, I lost enthusiasm for running queries for papers from IBM research using the search system that one search pundit described to me as “quite good”.

I navigated to Google and entered this query: IBM Almaden research +”Ramanathan Guha”. Google responded in 0.23 seconds with 78 hits. The first three were:

[Image: the first Google results for the Guha query]

My searching skills are not too good. I am getting old. I eat squirrel stew. My logo is a silly goose. I wear bunny rabbit ears before erudite audiences in New York. Nevertheless, the IBM search system for its research papers is not too useful. I will stick with Googzilla. IBM may want to try Google’s free custom search engine and at least deliver pretty good results instead of the disappointments I experienced. IBM-ers, agree or disagree? Search pundits, weigh in. Maybe I am missing something. Time to go shoot squirrels with my water pistol. That would be more productive than trying to find information with the IBM research vertical search engine.

Stephen Arnold, June 26, 2008

IBM Search: A Trial of Patience for Customers

June 25, 2008

A quick question: what is the URL for IBM’s public Web search? Ah, you did not know that IBM had a Web search system. I did. IBM’s crawler once paid a quick visit to my Web site years ago. You can use this service yourself. Navigate to http://www.ibm.com/search. The service is called IBM Planetwide.

Let us run a test query. My favorite test query is for an IBM server called the PC704. I once owned two of these four-processor Pentium Pro machines. For years I wanted to upgrade the memory to a full gigabyte, so I became a regular Sherlock Holmes as I tried to find memory I could afford.

Here are the results for this query PC704.

[Image: IBM Planetwide results for the query PC704]

The screen shot is difficult to read, but there is one result: a reference in an IBM technical manual. Let us click on the link. We get a link to a manual about storage subsystems. I know that IBM discontinued the PC704, but the fact that there is no archive of technical information about this system is only slightly less baffling than the link to the storage documentation.

Let’s try another query. Navigate to http://www.ibm.com. We are greeted with a different splash screen, an option to “sign in”, and a search box. Let’s run a new query, “text mining”. The system responds with a laundry list of results. The first five hits are primarily research documents. The second page of the results has links to two IBM text mining systems, IBM TAKMI and IBM Text Mining Server. TAKMI is another research link, and the Text Mining Server is on the IBM developer Web site.

I don’t know about you, but I received one hit for PC704 and quite a few research hits for text mining. Where is the product information?

Let us persist. I know that IBM had a product called WebFountain. I want information about that product. I enter the single word, WebFountain, and the IBM system responds with 152 results. The documentation links figure prominently as well as pointers to information about a WebFountain appliance and architecture for a large-scale text analytics system.

Result 13 seemed to be on target. Here is what the Planetwide system showed me:

IBM – WebFountain – United States
WebFountain is a new text analytics technology from IBM’s Research division that analyzes millions of pages of data weekly.
URL: http://www-304.ibm.com/jct03004c/businesscenter/vent…

And here is the Web page this link displays.

[Image: the Web page behind the WebFountain result]

Stepping Back

What have these three queries revealed?

  1. Despite the cratering of prices for storage devices, IBM does not maintain an archive of information about its older systems. The single hit for the string PC704 was to a book about storage. The string PC704 probably appears in this technical manual, but the system’s precision and recall disappointed me.
  2. The second query for text mining generated more than 3,000 hits. My inspection of the results suggested to me that IBM was indexing technical information. Some of the documents appeared to be as old as the PC704, which was not in the index. The results provided no context for the bound phrase and delivered, to me, unsatisfactory precision. Recall was better than the single hit for PC704, however. To me, irrelevant hits are not much better than one hit.
  3. The third query for an IBM product called WebFountain generated hits to research reports, documentation, and a Web site about WebFountain. Unfortunately, the link was active but there were no data displayed on the Web page.

All in all, IBM’s Planetwide search is pretty lousy for me. Your mileage may vary, of course.


ProQuest Dialog: An Optimistic View

June 24, 2008

I talked with a colleague who described to me Outsell’s view of the ProQuest Dialog deal. I have not seen the report, but you can order a copy here. Note that the research company’s URL is outsellinc.com. The URL outsell.inc belongs to an automotive company with the same name.

As I understood my colleague, Outsell sees a positive gain. First, libraries, Dialog’s revenue bulwark, will win; ProQuest is a company focused on library information. Second, the buyers got a good deal, paying considerably less than Dialog’s previous owners paid. My recollection is that with each sale of Dialog, the seller lost money; from a high of $400 million, this deal rings in well south of that, probably a fire sale price. Third, users benefit because ProQuest is able to leverage the terabytes of abstracted, bibliographic, full text, and semi-structured data on the Dialog computers.

[Image: a sample Dialog Blue Sheet record]

This is a snippet of a Dialog Blue Sheet. It spells out the details of a database. Can you figure it out? If you can, then you can pay as much as $100 per query to access these types of data. The market for this information is inelastic; that is, you can raise the price, but you will not be able to boost revenues. A few people will pay anything to get these data. Most people will go elsewhere.

The Optimist’s View

I can accept a positive spin on the deal. However, there may be some factors that the financial wizards with the sharp pencils may not be able to control:

  1. Libraries are strapped for cash. The lifeblood of information companies that sell to libraries is the standing order. Under budget pressure, standing orders get closer scrutiny each year. In head-to-head competitions, clever pricing can win a one-year deal. The problem is that there is no renewal and thus no revenue base. With only a few big players selling online information, it may be tough to pump up revenues.
  2. Information users, like the New South Wales students who will be exposed to Google, may find Dialog-style online information as archaic as my son did when I introduced him to online research in lieu of the Readers’ Guide to Periodical Literature in 1982. Dialog is simply not in tune with the bloggy, real timey, and Webby world of online.
  3. ProQuest and its parent have never been at the cutting edge of technology. Online today has to deal with scaling, commodity hardware, and fast cycle programming. Perhaps ProQuest’s technologists are as good as the engineers at Google? My thought is that ProQuest may be stretched to the limit dealing with an online system with its roots in the late 1960s. The cost to get modern may be beyond the reach of the new owners.


Connectors: Rounding Up Some Definitions

June 22, 2008

I received an email this morning (June 22, 2008). The writer asked, “Are connectors the same as filters?” As I walked the world’s most wonderful dog, I considered this question. This short essay is a summary of my thoughts. If you have other concepts and definitions to add, please use the comments section to share them with me and the three other readers of this Web log. Oops. There may be four readers. I sent a link to my father, and he often looks at what I write.

Connectors

Let us look at what Google provides as a definition. Enter the query “define:connectors” and Google returns nine definitions. A quick scan of the links and the text snippets provides a useful starting point; specifically:

A growing collection of libraries that abstract the interfaces of specific hardware or enterprise integration methods. (More here)

Google offers a number of related phrases to assist me, but none of these seem to relate to enterprise search, content processing, or text analytics.
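In the enterprise search world, the term is usually used to mean a small piece of code that knows how to log into a repository, enumerate its items, and hand each one to the indexing pipeline. Here is a hypothetical Java interface, offered only as an illustration of that idea; it is not drawn from any vendor’s product:

    // A hypothetical connector interface. A connector abstracts the interface
    // of a specific repository (file share, Lotus Notes, Documentum, a
    // database) so the indexing pipeline does not need to know how each
    // system stores its content.
    import java.io.InputStream;
    import java.util.Iterator;

    public interface RepositoryConnector {
        // Open a session with the source system (credentials, endpoint, etc.).
        void connect(String endpoint, String user, String password) throws Exception;

        // Enumerate the items the indexing pipeline should fetch.
        Iterator<String> listDocumentIds() throws Exception;

        // Return one item's raw bytes. A filter downstream turns the bytes
        // into text and metadata the search engine can index, which is why
        // connectors and filters are related but not the same thing.
        InputStream fetch(String documentId) throws Exception;

        // Close the session.
        void close();
    }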

[Image: pyrite, also known as fool’s gold]

Is this gold or fool’s gold? Without a formal method for testing, even experienced rock hounds may not know what the substance is. In the same way, software can deliver a valuable function or a lower value operation.

File Conversion

Language is tricky, and business English is a slippery type of language. I know, for example, that some companies provide file conversion tools. So, what is the meaning of file conversion? For a definition, I turn to one of the vendors offering file conversion software. I enter the query “stellent file conversion” into Google and get a pointer to Stellent’s Dynamic Conversion Process. This is a function that takes content from a Web page or other source and makes it Web viewable.

I recalled licensing software under the name “Outside In”, and my recollection was that Stellent bought this company and continued to sell the product. My recollection is that Stellent’s software components could take a file in one format, such as XyWrite III+, and convert it to Microsoft Rich Text Format. Few people today need to convert XyWrite files, but I have heard that the US House of Representatives still has some XyWrite files kicking around.

I consulted my digital archive and located this explanation of the software. I am going to paraphrase the description to pull out the key point: the technology gives developers the ability to “view, filter and convert more than 225 file formats without using native applications.”

After a bit of poking around, I located a description of Outside In on the Oracle Web site here. Oracle purchased Stellent, and you can license the software from Oracle. The most recent version of the Outside In software performs a number of functions important to file conversion, which seems to be the main thrust of Oracle’s description of the Outside In technology. Specifically, Oracle says:

  • Clean Content—Identifies and scrubs risky hidden data from Microsoft Office documents
  • Content Access—Extracts text and metadata from more than 400 file types
  • File ID—Quickly and accurately identifies file types
  • HTML Export—Converts files into HTML rendering embedded graphics as a GIF, JPEG, or PNG
  • Image Export—Converts files into TIFF, JPEG, BMP, GIF, or PNG images
  • PDF Export—Converts files into PDF without native applications or 3rd party libraries
  • Search Export—Converts files into one of four formats designed specifically for search
  • Viewer—Renders high-fidelity views of files and allows printing, copy/paste, and annotations
  • XML Export—Converts and normalizes files into XML that defines properties, content, and structure

Oracle’s checklist provides a good roundup of the bits and pieces that comprise file conversion functions. It seems that we have a definition of sorts. Note: Oracle provides a useful 2007 white paper to help you navigate through the sub-concepts embedded in the Outside In system here.
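To make one of those line items concrete, here is a tiny sketch of the general idea behind “File ID”: look at the first bytes of a file (its “magic number”) rather than trusting the extension. This is not the Outside In API, just an illustration of the kind of operation that bullet describes; the signatures checked below are the well known ones for PDF, the OLE compound file used by older Office formats, and ZIP containers:

    import java.io.FileInputStream;
    import java.io.IOException;

    public class FileId {
        // Identify a file type from its leading bytes instead of its extension.
        public static String identify(String path) throws IOException {
            byte[] head = new byte[8];
            try (FileInputStream in = new FileInputStream(path)) {
                if (in.read(head) < 4) {
                    return "Too short to identify";
                }
            }
            if (head[0] == '%' && head[1] == 'P' && head[2] == 'D' && head[3] == 'F') {
                return "PDF";
            }
            if ((head[0] & 0xFF) == 0xD0 && (head[1] & 0xFF) == 0xCF) {
                return "OLE compound file (legacy Microsoft Office)";
            }
            if (head[0] == 'P' && head[1] == 'K') {
                return "ZIP container (possibly Office Open XML)";
            }
            return "Unknown";
        }

        public static void main(String[] args) throws IOException {
            System.out.println(identify(args[0]));
        }
    }

A production library handles hundreds of formats and the ambiguous cases; the point here is only that identification, extraction, and export are separate operations bundled under the label “file conversion.”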

File conversion: software that performs a number of separate operations to change a file in one format into another format. The purpose of file conversion is to eliminate the need to open a native application such as XyWrite to export a file in a different format.

But what about information in a database like IBM’s DB2, Microsoft’s SQL Server, and Oracle’s database? Well, these file types are widely used in organizations, and it is easy for a database administrator to export a relational database as a comma separated value (CSV) file. Also, many systems can “read” database files or database reports. But I have heard that these features do not work on certain types of information stored in a database; for example, the database contains row and column headings that are not plain English, or the cells in the database are filled with numerical strings that are codes.

One work-around is to write a report, query the database, save the answer to the query as HTML or XML, and then process those HTML or XML files as individual documents. But that seems like a great deal of work. What happens to those cryptic row and column headings? What does the report do to make the values in the cells understandable to a human?
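Here is a minimal sketch of that work-around, assuming a JDBC driver, a hypothetical orders table, and a made-up lookup table that translates cryptic column codes into plain English labels. It queries the database and writes one small XML document per row so a content processing system can treat each record like an ordinary document (XML escaping and error handling are omitted to keep the sketch short):

    import java.io.FileWriter;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.ResultSetMetaData;
    import java.sql.Statement;
    import java.util.Map;

    public class RowsToXml {
        // Hypothetical translations for cryptic row and column headings.
        static final Map<String, String> LABELS = Map.of(
                "CUST_TYP_CD", "customer type",
                "ORD_STAT_CD", "order status");

        public static void main(String[] args) throws Exception {
            String jdbcUrl = args[0]; // e.g. a DB2, SQL Server, or Oracle JDBC URL
            try (Connection c = DriverManager.getConnection(jdbcUrl);
                 Statement s = c.createStatement();
                 ResultSet rs = s.executeQuery("SELECT * FROM orders")) {
                ResultSetMetaData md = rs.getMetaData();
                int row = 0;
                while (rs.next()) {
                    // One XML file per row, so the search system indexes each record separately.
                    try (FileWriter out = new FileWriter("record-" + (++row) + ".xml")) {
                        out.write("<record>\n");
                        for (int i = 1; i <= md.getColumnCount(); i++) {
                            String code = md.getColumnName(i);
                            String label = LABELS.getOrDefault(code, code);
                            out.write("  <field name=\"" + label + "\">" + rs.getString(i) + "</field>\n");
                        }
                        out.write("</record>\n");
                    }
                }
            }
        }
    }

The cryptic headings and coded cell values still have to be translated somewhere; in this sketch that knowledge lives in the LABELS map, which is exactly the sort of work the questions above are getting at.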

So we do not need file conversion. We need another process. What is it called?


Text Analytics Summit Summary Sparks UIMA Thoughts

June 22, 2008

Seth Grimes posted a useful series of links about the Text Analytics Summit, held in Boston the week of June 16, 2008. You can read his take on the conference here. I was not at the conference; I was on the other side of the country at the Gilbane shindig. To make up for my non-attendance, I have been reading about the summit.

From what I can deduce from the Web log posts, the conference attracted the Babe Ruths and Ty Cobbs of text analysis, a market that nestles between enterprise search and business intelligence. I am not too certain about the boundaries of either of these markets, but text analytics is polymorphic and can appear searchy or business intelligency depending upon the context.

I clicked through the links Mr. Grimes provides, and I recommend that you spend a few minutes with each of the presentations. I learned a great deal. Please review his short essay.

One point stuck in my mind. The purpose of this essay is to call your attention to this comment and offer several observations about its implications for those who want to move beyond key word retrieval. Keep in mind that I am offering my opinion.

Here’s the comment. Mr. Grimes writes:

I’ll conclude with one disappointing surprise on the technical front, that UIMA — the Unstructured Information Management Architecture, an integration framework created by IBM and released several years ago as open source to the Apache — has not been more broadly accepted. IBM software architect Thomas Hampp spoke about his company’s use of the framework in the OmniFind Analytics edition, but Technology Panel participants said that their companies — Attensity (David Bean), Business Objects (Claire Thomas), Clarabridge (Justin Langseth), Jodange (Larry Levy), and SPSS (Olivier Jouve) — simply do not perceive user demand for the interoperability that UIMA can offer.

My understanding of this statement, and of the supporting evidence in the form of high-profile industry executives, is that an open standard developed by IBM has little, if any, market traction. In short, if the UIMA standard were gasoline, your automobile would not run or would just sputter along.

Let us assume that this lack of UIMA demand is accurate. Now I know this is a big assumption, and I am confident that an IBM wizard will tell me that I am wrong. Nevertheless, I want to follow this assumption in the next part of the essay.

Possible Causes

[Please, keep in mind that I am offering my opinion in a free Web log. If you have not read the editorial policy for this Web log, click on the About link on any page of Beyond Search. Some readers forget that I am using this Web log as a journal and a container for the information that does not appear in my for fee reports and my paid writings such as my monthly column in KMWorld. Some folks are reading my musings and ignoring or forgetting what I am trying to capture for myself in these posts. Check out the disclaimer here.]

What might be causing the lack of interest in UIMA, which, as you know, is an open source framework to allow different software gizmos to talk to one another? For a more precise definition of UIMA, you can give the IBM search engine a whirl or click this Wikipedia link, http://en.wikipedia.org/wiki/UIMA.
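For readers who have never seen UIMA in practice, here is a minimal sketch of an annotator, assuming the Apache UIMA Java SDK. The ProductMention type mentioned in the comments is hypothetical and would be declared in a type system descriptor; the point is only that every component in a pipeline reads and writes the same Common Analysis Structure (CAS), which is the interoperability UIMA promises:

    import org.apache.uima.analysis_component.JCasAnnotator_ImplBase;
    import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
    import org.apache.uima.jcas.JCas;

    // A bare-bones UIMA annotator. Any UIMA-compliant engine can run it in a
    // pipeline next to components from other vendors because they all share
    // the CAS rather than each other's proprietary data structures.
    public class ProductMentionAnnotator extends JCasAnnotator_ImplBase {

        @Override
        public void process(JCas jcas) throws AnalysisEngineProcessException {
            String text = jcas.getDocumentText();
            int start = text.indexOf("WebFountain"); // trivial stand-in for real text analysis
            if (start >= 0) {
                // A real annotator would create a typed annotation generated from
                // the type system descriptor and add it to the CAS indexes, e.g.:
                //   ProductMention m = new ProductMention(jcas, start, start + 11);
                //   m.addToIndexes();
                // Downstream components (classifiers, indexers) then read it back.
            }
        }
    }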

Here is my short list of the causes for the UIMA excitement void. I am not annoyed with IBM. I own IBM servers, but I want to pick up Mr. Grimes’s statement and perform a thought experiment. If this type of writing troubles you, please click away from Beyond Search. Also, I am reacting to a comment about IBM, but I want to use IBM as an example of any large company’s standards or open source initiative.

First, IBM is IBM. IBM has an obligation to its shareholders to deliver growth. Therefore, IBM’s promulgating a standard is, in some way large or small, a means to sell IBM products and services. Maybe potential UIMA users are not interested in the upsell that may follow.

Second, open source and standards have proven to be incredibly useful. Maybe IBM needs to put more effort into educating partners, vendors, and customers about UIMA? Or maybe IBM has invested in UIMA, found that marketing did not produce the expected results, and moved on.

Third, maybe today IBM lacks clout in the search and content processing sector. In 1960, IBM could dictate what was hot and what was not. UIMA’s underwhelming penetration might be evidence that the IBM of today lacks the moxie the company enjoyed almost a half century ago.

And one fourth possibility is that no one really wants to embrace UIMA. Enterprise software is not a level playing field. The vendor wants to own the customer, locking out any other vendor who might siphon dollars from that customer. IBM and other enterprise vendors want to build walls, not create open doors.

I have several other thoughts on my list, but these four provide insight into my preliminary thinking.

Observations

Now let’s consider the implications of these four points, assuming, of course, that I am correct.

  1. Big companies and standards do not blend as well as a peanut butter and jelly sandwich; the two ingredients may not yet be fully in harmony. Big companies want money, and open standards do not have the revenue-to-risk ratio that makes financial officers comfortable.
  2. Open source is hard to control. Vendors and buyers want control. Vendors want to control the technology. Buyers want to control risk. Open source may reduce the vendor’s control over a system and buyers lose control over the risk a particular open source system introduces into an enterprise.
  3. Open source appeals to those willing to break with traditional information technology behavior. IBM, despite its sporty standards garb, is a traditional vendor selling traditional solutions. Open source is making headway, but it is most successful when youthful blood flows through the enterprise. Maybe UIMA needs more time for the old cows to leave the stock pen?

What is your view? Is your organization ready to embrace UIMA, big company standards, and open source? Agree? Disagree? Let me know.

Stephen Arnold, June 22, 2008

When Search News Isn’t News, It’s Disinformation

June 21, 2008

One of my short essays triggered a number of anonymous emails (the best kind), nasty phone calls (an interesting diversion for me) and Web log comments of varying clarity. The flash point item is here.

I have another project to complete, but this reaction to my short news item about Autonomy winning a search contract from a library in Lyon, France, kept nagging at me. As far as I can tell, Autonomy issued a news release about a renewed license agreement, not a new win.

I want to step back. I am not interested in Autonomy’s deal in Lyon. I am interested in the broader topic of what is new in enterprise search. In a conference call this afternoon, one of the people on the call asked me, “What did you learn from the 18 interviews in the Search Wizards Speak series?”

The answer I gave was, “There were only three or four innovations that struck me as new.”

Let’s consider my comment. I talked with 18 “search wizards” over a period of four months. I can identify only three or four new developments.

I know that the companies for whom these wizards work have generated news releases. Some pump out publicity every two weeks. Others punch the PR button six or seven times a year. What are these companies announcing that I don’t consider news?

Here is my short list:

  1. New software versions. This is an item of interest, but it is not going to be picked up by Computerworld.
  2. Added features or functions. A popular innovation is “social”. The idea is fuzzy but seems to mean that a system user can add index tags or attach a note for any other person who accesses a document or report.
  3. Deal wins. The vendor lands a contract and issues a news release saying, “We won this big deal.” Shareholders and competitors have more interest in these than I do.
  4. New hires. This is legitimate news, but the enterprise search industry lacks a Wall Street Journal to gather the executive changes which light the marketing fires at Booz, Allen & Hamilton and McKinsey & Company. Those firms write to a new hire and say, “Congratulations. We can help you be successful.” Competitors and insurance sales people salivate over such announcements.

What happens when this type of news is diluted with multiple releases of the same information? What is the impact of pumped up version announcements which contain only bug fixes and a couple of add ons? What is the cumulative effect of repeated executive hiring announcements?

My thought is that enterprise search vendors are engaging in disinformation. I don’t think the consumers of the disinformation are potential purchasers, stakeholders, or employees of the company issuing the release.

Nope.

The folks who gobble up enterprise search information are the executives at other enterprise search companies. The search industry, which is under siege by customers and companies offering higher value solutions, is talking to itself.

I grew up in a small town. Information circulated quickly and was chock full of gossip, half truths, and insinuations. The intelligence was parochial; that is, the small town’s thought processes were honed for baloney processing.

When hard data from the “outside world” arrived, few knew how to interpret it or put it in its appropriate context. I think the enterprise search sector is close to becoming the equivalent of a dead end town on the edge of the prairie, about 150 miles from a city with a million people.

Enterprise search vendors’ efforts to make sales, build buzz, differentiate themselves, and puff up their achievements are the equivalent of a digital peacock spreading its tail feathers. Other peacocks notice, but no one else knows what the heck the squawks and flash mean.

Enterprise search is drifting close to disinformation. The marketing is filled with metaphors and homiletic assurances. The news is often not news; it is the peacock squawk and tail shaking. The licensees are wising up. Users of enterprise search systems are grousing. IT departments are unable to deal with some of the search systems because they are too complex. Options are now available.

What’s the fix? I don’t think there is a magic wand that can address disinformation. Customers will decide. Vendors may be too busy news releasing to one another to notice that the buyers have licensed enterprise applications with search baked in or settled for a plug-and-play solution. When the search vendors’ conversation lapses, their world may have changed without their noticing.

Stephen Arnold, June 21, 2008

Boomers and Millennials: Implications for Enterprise Search

June 20, 2008

Enterprise Search and the Age Gap

Employees, contractors, and consultants are becoming younger. For enterprise search, aging Baby Boomers are leaving the work force and younger employees are moving in.

En route from San Francisco to the less civilized environs of rural Kentucky, I made a list of the differences between Millennials and Baby Boomers. Millennials are all digital all the time. Baby Boomers have luggage stuffed with printed books, paper calendars, and blank notebooks in which one letter at a time can be written using a pencil. For simplicity, I will call the Millennials the younger workers, and the Baby Boomers the aging dinosaurs. Keep in mind that you may be 25 and as mired in books and microfilm as an ossified Baby Boomer. The categories are not absolute. The two-part division is intended to make it easy for me to communicate my thoughts about the changes wrought upon search as Baby Boomers become the minority in organizations and Millennials become the majority.

I want to alert you that anyone under the age of 35 will probably be annoyed at my thoughts. But this is a Web log, and I am going to capture these notions before I pass out from the brutalities of a red-eye flight seated next to the lavatory. In short, another red-eye, another Web log essay about enterprise search from a different angle.


The generational differences mark a clean break with the key word search and retrieval systems of the past and point to the more sophisticated and complex information access solutions that more youthful enterprise system users require.

Seven Differences between a Young Professional and a Near Retirement Professional

Difference 1: Under 35s don’t read anything long. I have the impression that the under-35 enterprise search user wants short, chunky information from search systems. Systems that return long documents that have to be printed out, annotated, and studied are not what these users want from their information access systems. Over 55s (yes, I am generalizing) may not like long documents either, but I for one will slog through this stuff. There may be gold in those hills, I think.

Difference 2: Under 35s want search suggestions, assisted navigation, Use For references, and See Also hints. Over 55s like me don’t have much resistance to formulating a query, scanning results, reformulating the query, scanning results again, and finally narrowing the result set to a useful collection of documents which can then be reviewed one by one. I love shortcuts, but research is research.

Difference 3: Under 35s seem to have the uncanny ability to do several electronic tasks at once. At the Gilbane conference I watched a professional journalist listen to a speaker, send messages on a BlackBerry, and chat with the person sitting next to her. I am lucky if I can listen to the speaker; forget the digital activity. Over 55s are less adept multitaskers. The reason the BART train was speeding and crashed into a stopped train appears to have been a young driver who was chatting on a mobile phone while controlling the subway train. I prefer single task focus to avoid collisions.


Fast Search: Is This a Real License Document?

June 19, 2008

I was updating my Fast Search & Transfer files. Over the last two months, I have noticed that certain information has been removed from the Fast Search Web site. I ran my crawler scripts and saw in the hit list this link:

http://contracts.onecle.com/findwhat/fast.lic.shtml

I am not familiar with onecle.com, so I ran a whois search. The owner of the domain is listed with some data that are not particularly helpful. The title of the Web site to which the domain points is “California MCLE, CLE and Continuing Legal Education.” The purpose of the Web site seems to be to provide sample agreements for people to examine.

What interested me is this document, which you can view if you click here. It appears to me, and I am not an attorney and have zero qualifications to do anything other than obey the law, that this agreement sets forth the terms of a deal between FindWhat.com and Fast Search & Transfer. FindWhat.com has become MIVA, which is a vendor of pay-per-click software and services.

[Image: the Onecle splash screen]

The Onecle splash screen. Access to the site is here.

Because I am not certain of the provenance of this sample license agreement, I will not reproduce it here. However, there were three parts of the agreement that I found interesting. Let me highlight each of these points and then offer several observations about my understanding of each. You will need to print out the entire agreement or have it on your screen as you read my post. I am not going to quote more than a sentence from the source document, just as I would have done in the 7th grade when I wrote my first term paper. Miss Soapes was quite negative about using work by another as one’s own.

The Three Points

1. Maintenance and Support

The $7 million license fee may be bogus. I heard that Fast Search licensing began in the $175,000 to $250,000 range and could go higher depending on the customer’s requirements, so $7 million seems high to me. Set that questionable number aside. The agreement says in Section 4, Maintenance and Support:

Customer shall purchase maintenance and support services from FAST with respect to all software licensed hereunder for a three year term

The fee, if I understand this correctly, is the starting point. Additional fees will be assessed to maintain the system and support it. Since I learned that most enterprise software vendors charge anywhere from 15 to 25 percent of the license fee for maintenance and support, I think the cost of the installation jumps significantly. At the low end of Fast Search’s reported pricing, a 20 percent charge on a $250,000 license adds another $50,000 before any customization work begins. I also believe that customization of the system adds to the cost.

[Image: a segment of the alleged Fast Search license agreement]

A segment of the alleged Fast Search license agreement. Full document is here.

What baffles me is that Fast Search stumbled into financial difficulty because revenue did not flow into the company quickly enough to offset its costs. My thought is that customers signed a deal and then balked at paying when the system could not be set up, made operational quickly, and supported at the specified rates. Whatever the license fee, I think it was not enough to free Fast Search from financial pressure.

2. Schedule B: Service Level Objectives

Fast Search spells out that the customer must try to figure out what the problem is. Okay, that’s reasonable. However, Fast Search then limits who can contact Fast Search about the problem. You will find this language in Schedule B, Paragraph A. The point that hit me is that if the “standard support” is not what the customer needs, then the licensee can sign up for “Premium Support”. Again, more charges to make a system work. Furthermore, Fast Search offers “Resolution Objectives”. When I read these, I concluded that Fast Search may not be able to fix some problems; therefore, a work-around may be provided. Some work-arounds can take up to a month. My thought was that if I am using Fast Search to generate revenue for my company, I cannot be offline or down for a month. I would say, “This software is supposed to work, right?” I am not certain that procurement teams poke their noses into the legal documents for an enterprise search acquisition. An attorney unfamiliar with information access systems might overlook these nuances of support. When a problem arises, I can see that it would reach a flash point quickly as the procurement team tries to get the system working, only to be told that the caller is not on the list and that the fix may take a month.

3. Schedule D and Schedule E: Customer Competitors and Fast Search Competitors

My thought, when I saw these lists, was that they are not timely, which casts doubt on the authenticity of this sample license agreement. On the other hand, I wondered whether a licensee’s legal department would review an agreement such as this and routinely update the “you cannot work with these people” lists. Now that Microsoft owns Fast Search, I think Fast Search licensees need to revisit their agreements, and I assume that the Microsoft-Fast Search team will be contacting licensees to update these lists. What I found interesting is that Fast Search listed Microsoft as a competitor, along with Google, Yahoo, Verity, Autonomy, Convera, and Endeca. It will be interesting to see whether an updated sample license agreement becomes available.

Observations

I have two observations about this agreement, which may not be a legitimate contract:

First, the fees seem designed to produce significant revenue. That is okay as long as the system works. When the system does not work, the fees become an issue. Big companies with big bills owed to Fast Search may quit paying. The alleged financial difficulties may be a result of big companies not paying their bills.

Second, I will be most interested in any changes in Fast Search’s pricing and business policies under Microsoft ownership. The changes may reveal the approach Microsoft will take with the Fast ESP technology.

If you have insights, or simply wish to disagree with me, use the comments section on the Web log.

Stephen Arnold, June 19, 2008
