Connectors: Rounding Up Some Definitions
June 22, 2008
I received an email this morning (June 22, 2008). The writer asked, “Are connectors the same as filters?” As I walked the world’s most wonderful dog, I considered this question. This short essay summarizes my thoughts. If you have other concepts and definitions to add, please use the comments section to share them with me and the three other readers of this Web log. Oops. There may be four readers. I sent a link to my father, and he often looks at what I write.
Connectors
Let us look at what Google provides as a definition. Enter the query “define:connectors” and the Google returns nine definitions. A quick scan of the links and the text snippets provides a useful starting point; specifically:
A growing collection of libraries that abstract the interfaces of specific hardware or enterprise integration methods. (More here)
Google offers a number of related phrases to assist me, but none of these seem to relate to enterprise search, content processing, or text analytics.
Is this gold or fool’s gold? Without a formal method for testing, even experienced rock hounds may not know what the substance is. Software can deliver a valuable function or a lower-value operation.
File Conversion
Language is tricky, and business English is a slippery type of language. I know, for example, that some companies provide file conversion tools. So, what is the meaning of file conversion? For a definition, I turn to one of the vendors offering file conversion software. I enter the query “stellent file conversion” into Google and get a pointer to Stellent’s “Dynamic Conversion Process.” This is a function that takes content from a Web page or other source and makes it Web viewable.
I recalled licensing software under the name “Outside In”, and my recollection was that Stellent bought this company and continued to sell the product. As I remember it, Stellent’s software components could take a file in one format, such as XyWrite III+, and convert it to Microsoft Rich Text Format. Few people today need to convert XyWrite files, but I heard that the US House of Representatives still has some XyWrite files kicking around.
I consulted my digital archive and located this explanation of the software. I am going to paraphrase the description to pull out the key point: the technology gives developers the ability to “view, filter and convert more than 225 file formats without using native applications.”
After a bit of poking around I located a description of Outside In on the Oracle Web site here. Oracle purchased Stellent, and you can license the software from Oracle. The most recent version of the Outside In software performs a number of functions important to file conversion, which seems to be the main thrust of Oracle’s description of the Outside In technology; specifically, Oracle says:
- Clean Content—Identifies and scrubs risky hidden data from Microsoft Office documents
- Content Access—Extracts text and metadata from more than 400 file types
- File ID—Quickly and accurately identifies file types
- HTML Export—Converts files into HTML rendering embedded graphics as a GIF, JPEG, or PNG
- Image Export—Converts files into TIFF, JPEG, BMP, GIF, or PNG images
- PDF Export—Converts files into PDF without native applications or 3rd party libraries
- Search Export—Converts files into one of four formats designed specifically for search
- Viewer—Renders high-fidelity views of files and allows printing, copy/paste, and annotations
- XML Export—Converts and normalizes files into XML that defines properties, content, and structure
Oracle’s checklist provides a good roundup of the bits and pieces that make up file conversion functions. It seems that we have a definition of sorts. Note: Oracle provides a useful 2007 white paper to help you navigate through the subconcepts embedded in the Outside In system here.
File conversion: software that performs a number of separate operations to change a file in one format to another format. The purpose of file conversion is to eliminate the need to open a native application such as XyWrite to export a file in a different format.
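The Outside In feature list suggests how a file conversion system splits into separate operations: identify the format (the “File ID” step), then hand the bytes to a format-specific converter. The Python sketch below illustrates that split under stated assumptions. It is not Oracle’s API; the signature table, the function names, and the single HTML-to-text converter are hypothetical, and a commercial product handles hundreds of formats with far deeper checks.

```python
from html.parser import HTMLParser

# Hypothetical signature table: a few well-known leading-byte ("magic number")
# patterns. Real products recognize hundreds of formats with deeper checks.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"PK\x03\x04", "zip"),  # also the container for .docx, .xlsx, .pptx
    (b"{\\rtf", "rtf"),
]


def identify(data: bytes) -> str:
    """Guess a file type from its leading bytes (the "File ID" step)."""
    for magic, kind in SIGNATURES:
        if data.startswith(magic):
            return kind
    # HTML has no fixed signature, so fall back to a simple text sniff.
    head = data[:512].lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"
    return "unknown"


class _TextCollector(HTMLParser):
    """Collects non-empty text nodes; as a simplification, it does not skip
    <script> or <style> content."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        if data.strip():
            self.parts.append(data.strip())


def html_to_text(data: bytes) -> str:
    parser = _TextCollector()
    parser.feed(data.decode("utf-8", errors="replace"))
    parser.close()
    return " ".join(parser.parts)


# Dispatch table: one converter per identified format (only HTML here).
CONVERTERS = {"html": html_to_text}


def convert_to_text(data: bytes) -> str:
    """Identify the format, then route the bytes to the matching converter."""
    kind = identify(data)
    converter = CONVERTERS.get(kind)
    if converter is None:
        raise ValueError(f"no converter registered for format: {kind}")
    return converter(data)


print(convert_to_text(b"<html><body><h1>Title</h1><p>Body text</p></body></html>"))
# prints: Title Body text
```

In this design, supporting another format means registering one more signature and one more converter; the hard work in a real product lies inside each converter.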
But what about information in a database like IBM’s DB2, Microsoft’s SQL Server, and Oracle’s database? Well, these systems are widely used in organizations, and it is easy for a database administrator to export the contents of a relational table as a comma-separated values file, what is called the CSV format. Also, many systems can “read” database files or database reports. But I have heard that these features do not work on certain types of information stored in a database; for example, when the row and column headings are not plain English or the cells are filled with numerical strings that are codes.
One workaround is to write a report, query the database, save the answer to the query as HTML or XML, and then process those HTML or XML files as individual documents. But that seems like a great deal of work. What happens to those cryptic row and column headings? What does the report do to make the values in the cells understandable to a human?
We don’t need file conversion. We need another process. What is it called?
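The report-and-export workaround can be sketched in a few lines of Python. The sketch uses the standard-library sqlite3 module as a stand-in for DB2, SQL Server, or Oracle, and the table, column names, and code tables are hypothetical. Note where the real effort hides: someone still has to supply the plain-English labels and the meaning of every code.

```python
import sqlite3
from xml.sax.saxutils import escape

# Hypothetical lookups a person must supply: cryptic column names mapped to
# plain-English labels, and numeric codes mapped to their meanings.
COLUMN_LABELS = {"cust_id": "Customer ID", "st_cd": "Status", "rgn": "Region"}
CODE_TABLES = {
    "st_cd": {"1": "Active", "2": "Suspended", "9": "Closed"},
    "rgn": {"10": "North America", "20": "Europe"},
}


def rows_to_documents(conn, query):
    """Run a query and yield one small, human-readable HTML document per row."""
    cursor = conn.execute(query)
    columns = [desc[0] for desc in cursor.description]
    for row in cursor:
        items = []
        for col, value in zip(columns, row):
            label = COLUMN_LABELS.get(col, col)
            text = str(value)
            # Replace a code with its meaning when the column has a lookup table.
            text = CODE_TABLES.get(col, {}).get(text, text)
            items.append(f"<dt>{escape(label)}</dt><dd>{escape(text)}</dd>")
        yield "<html><body><dl>" + "".join(items) + "</dl></body></html>"


# Demo with an in-memory table standing in for an enterprise database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accts (cust_id INTEGER, st_cd INTEGER, rgn INTEGER)")
conn.executemany("INSERT INTO accts VALUES (?, ?, ?)", [(1001, 1, 10), (1002, 9, 20)])
docs = list(rows_to_documents(conn, "SELECT cust_id, st_cd, rgn FROM accts ORDER BY cust_id"))
print(docs[0])
```

Each yielded string can then be handed to a search system as an individual document, which is exactly the step the workaround relies on.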
Text Analytics Summit Summary Sparks UIMA Thoughts
June 22, 2008
Seth Grimes posted a useful series of links about the Text Analytics Summit, held in Boston the week of June 16, 2008. You can read his take on the conference here. I was not at the conference. I was on the other side of the country at the Gilbane shindig. To make up for my nonattendance, I have been reading about the summit.
From what I can deduce from the Web log posts, the conference attracted the Babe Ruths and Ty Cobbs of text analysis, a market that nestles between enterprise search and business intelligence. I am not too certain about the boundaries of either of these markets, but text analytics is polymorphic and can appear searchy or business intelligency depending upon the context.
I clicked through the links Mr. Grimes provides, and I recommend that you spend a few minutes with each of the presentations. I learned a great deal. Please review his short essay.
One point stuck in my mind. The purpose of this essay is to call your attention to this comment and offer several observations about its implications for those who want to move beyond key word retrieval. Keep in mind that I am offering my opinion.
Here’s the comment. Mr. Grimes writes:
I’ll conclude with one disappointing surprise on the technical front, that UIMA — the Unstructured Information Management Architecture, an integration framework created by IBM and released several years ago as open source to the Apache — has not been more broadly accepted. IBM software architect Thomas Hampp spoke about his company’s use of the framework in the OmniFind Analytics edition, but Technology Panel participants said that their companies — Attensity (David Bean), Business Objects (Claire Thomas), Clarabridge (Justin Langseth), Jodange (Larry Levy), and SPSS (Olivier Jouve) — simply do not perceive user demand for the interoperability that UIMA can offer.
My understanding of this statement and the supporting evidence in the form of high profile industry executives is that an open standard developed by IBM has little, if any, market traction. In short, if the UIMA standard were gasoline, your automobile would not run or just sputter along.
Let us assume that this lack of UIMA demand is accurate. Now I know this is a big assumption, and I am confident that an IBM wizard will tell me that I am wrong. Nevertheless, I want to follow this assumption in the next part of the essay.
Possible Causes
[Please keep in mind that I am offering my opinion in a free Web log. If you have not read the editorial policy for this Web log, click on the About link on any page of Beyond Search. Some readers forget that I am using this Web log as a journal and a container for information that does not appear in my for-fee reports and my paid writings, such as my monthly column in KMWorld. Some folks are reading my musings and ignoring or forgetting what I am trying to capture for myself in these posts. Check out the disclaimer here.]
What might be causing the lack of interest in UIMA, which, as you know, is an open source framework that allows different software gizmos to talk to one another? For a more precise definition of UIMA, you can give the IBM search engine a whirl or click this Wikipedia link, http://en.wikipedia.org/wiki/UIMA.
Here is my short list of the causes for the UIMA excitement void. I am not annoyed with IBM. I own IBM servers, but I want to pick up Mr. Grimes’s statement and perform a thought experiment. If this type of writing troubles you, please click away from Beyond Search. Also, I am reacting to a comment about IBM, but I want to use IBM as an example of any large company’s standards or open source initiative.
First, IBM is IBM. IBM has an obligation to its shareholders to deliver growth. Therefore, any standard IBM promulgates is, in ways large or small, a way to sell IBM products and services. Maybe potential UIMA users are not interested in the upsell that may follow.
Second, open source and standards have proven to be incredibly useful. Maybe IBM needs to put more effort into educating partners, vendors, and customers about UIMA. Or maybe IBM invested in UIMA, found that marketing did not produce the expected results, and moved on.
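To make “software gizmos talk to one another” concrete, here is a simplified Python sketch of the idea behind UIMA: independently written components share one analysis structure and exchange results as typed annotations over the same text. Real Apache UIMA is primarily a Java framework with its own Common Analysis Structure (CAS), type system, and XML descriptors; none of that API is reproduced here. Every class, function, and the tiny gazetteer below are hypothetical illustrations.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    type_name: str  # agreed-upon type, e.g. "Token" or "Organization"
    begin: int      # character offsets into the shared text
    end: int


@dataclass
class AnalysisStructure:
    """A shared container: the document text plus every component's annotations."""
    text: str
    annotations: list = field(default_factory=list)

    def add(self, ann: Annotation) -> None:
        self.annotations.append(ann)

    def select(self, type_name: str) -> list:
        return [a for a in self.annotations if a.type_name == type_name]

    def covered_text(self, ann: Annotation) -> str:
        return self.text[ann.begin:ann.end]


# Component from hypothetical "vendor A": marks word tokens.
def tokenizer(cas: AnalysisStructure) -> None:
    for match in re.finditer(r"\w+", cas.text):
        cas.add(Annotation("Token", match.start(), match.end()))


# Component from hypothetical "vendor B": reads vendor A's tokens, tags known names.
GAZETTEER = {"IBM": "Organization", "Apache": "Organization"}


def entity_tagger(cas: AnalysisStructure) -> None:
    for token in cas.select("Token"):
        word = cas.covered_text(token)
        if word in GAZETTEER:
            cas.add(Annotation(GAZETTEER[word], token.begin, token.end))


def run_pipeline(text: str, components) -> AnalysisStructure:
    cas = AnalysisStructure(text)
    for component in components:
        component(cas)
    return cas


cas = run_pipeline("IBM gave UIMA to Apache.", [tokenizer, entity_tagger])
print([cas.covered_text(a) for a in cas.select("Organization")])  # ['IBM', 'Apache']
```

The tagger works only because both components agree on what a “Token” annotation is. That shared contract is the kind of interoperability the panelists Mr. Grimes quotes say their customers are not asking for.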
Third, maybe today IBM lacks clout in the search and content processing sector. In 1960, IBM could dictate what was hot and what was not. UIMA’s underwhelming penetration might be evidence that the IBM of today lacks the moxie the company enjoyed almost a half century ago.
A fourth possibility is that no one really wants to embrace UIMA. Enterprise software is not a level playing field. The vendor wants to own the customer, locking out any other vendor who might suck dollars from the company owning a customer. IBM and other enterprise vendors want to build walls, not create open doors.
I have several other thoughts on my list, but these four provide insight into my preliminary thinking.
Observations
Now let’s consider the implications of these four points, assuming, of course, that I am correct.
- Big companies and standards do not blend as well as peanut butter and jelly. The two ingredients may not yet be fully in harmony. Big companies want money, and open standards do not have the revenue-to-risk ratio that makes financial officers comfortable.
- Open source is hard to control. Vendors and buyers want control. Vendors want to control the technology. Buyers want to control risk. Open source may reduce the vendor’s control over a system and buyers lose control over the risk a particular open source system introduces into an enterprise.
- Open source appeals to those willing to break with traditional information technology behavior. IBM, despite its sporty standards garb, is a traditional vendor selling traditional solutions. Open source is making headway, but it is most successful when youthful blood flows through the enterprise. Maybe UIMA needs more time for the old cows to leave the stock pen?
What is your view? Is your organization ready to embrace UIMA, big company standards, and open source? Agree? Disagree? Let me know.
Stephen Arnold, June 22, 2008
Microsoft Fast: A 45-Day Innovation Cycle Yields a Web Part
June 21, 2008
Microsoft announced Fast ESP Search Web Parts for SharePoint. You can read the full announcement here. Microsoft said, “Using these Web Parts and Site Template SharePoint administrators will be able to quickly and easily build FAST ESP-based search sites inside SharePoint 2007 by simply dropping in and configuring the appropriate components.” You can download the Web parts from www.codeplex.com/espwebparts.
The announcement says, “Some of the FAST ESP search capabilities that can be exposed within SharePoint Server 2007 using these Web Parts include:
- Search Box Web Part — Search box for query term submission and includes “did you mean” functionality for query correction
- Result List Web Part — Displays search results and supports sorting, pagination, and navigator-based filtering
- Navigator Web Part — Displays dynamic navigators that profile search results across a set of pre-defined dimensions and allow users to refine the search through navigation clicks
- Breadcrumb Web Part — Displays the search term(s) and list of navigators used to obtain the current result set.”
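The Navigator and Breadcrumb Web Parts implement what is usually called faceted navigation. Here is a minimal Python sketch of that technique, assuming a hypothetical list of records with two predefined dimensions. It does not use FAST ESP, SharePoint, or the CodePlex Web Parts, and the naive substring search stands in for a real query engine.

```python
from collections import Counter

# Hypothetical indexed records; a real engine would supply these from its index.
DOCUMENTS = [
    {"title": "Q2 sales report", "doctype": "spreadsheet", "dept": "Sales"},
    {"title": "Sales kickoff deck", "doctype": "presentation", "dept": "Sales"},
    {"title": "HR handbook", "doctype": "document", "dept": "HR"},
    {"title": "Sales compensation plan", "doctype": "document", "dept": "HR"},
]
DIMENSIONS = ("doctype", "dept")  # the "pre-defined dimensions"


def search(query):
    """Naive stand-in for a search engine: case-insensitive title match."""
    q = query.lower()
    return [d for d in DOCUMENTS if q in d["title"].lower()]


def navigators(results):
    """Profile the current result set: value counts per dimension."""
    return {dim: Counter(r[dim] for r in results) for dim in DIMENSIONS}


def refine(results, selections):
    """Keep results matching every (dimension, value) the user has clicked."""
    return [r for r in results if all(r[dim] == val for dim, val in selections)]


def breadcrumb(query, selections):
    """Show the query plus the navigators used, in click order."""
    return " > ".join([f'"{query}"'] + [f"{dim}: {val}" for dim, val in selections])


hits = search("sales")
print(navigators(hits)["dept"])          # Counter({'Sales': 2, 'HR': 1})
clicks = [("dept", "Sales")]
print([r["title"] for r in refine(hits, clicks)])
print(breadcrumb("sales", clicks))       # "sales" > dept: Sales
```

Because the counts are recomputed from whatever result set is current, the navigators profile the results rather than showing a fixed menu.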
Can your mom integrate Fast ESP with SharePoint? Probably not. Will more innovations flow from the Microsoft and Fast ESP teams? Almost certainly.
Microsoft will have to kick its enterprise search activity up a notch. SharePoint is popular, but it is creating information access challenges that Microsoft Certified Gold partners stand ready to remediate. In fact, for ease of use, assisted navigation, and better scaling, Microsoft Fast has to leapfrog Coveo, Exalead, ISYS Search Software, and other firms with snap-in solutions. Most of these outfits are more nimble than Microsoft. So winning in search does not mean killing Google. Microsoft Fast must swat annoying up-and-coming vendors who are quick and clever, a challenging combination for a firm that releases a mostly pre-existing Web part in a month and a half.
Stephen Arnold, June 21, 2008
When Search News Isn’t News, It’s Disinformation
June 21, 2008
One of my short essays triggered a number of anonymous emails (the best kind), nasty phone calls (an interesting diversion for me) and Web log comments of varying clarity. The flash point item is here.
I have another project to complete, but this reaction to my short news item about Autonomy winning a search contract from a library in Lyon, France, kept nagging at me. As far as I can tell, Autonomy issued a news release about a renewed license agreement, not a new win.
I want to step back. I am not interested in Autonomy’s deal in Lyon. I am interested in the broader topic of what is new in enterprise search. In a conference call this afternoon, one of the people on the call asked me, “What did you learn from the 18 interviews in the Search Wizards Speak series?”
The answer I gave was, “There were only three or four innovations that struck me as new.”
Let’s consider my comment. I talked with 18 “search wizards” over a period of four months. I can identify only three or four new developments.
I know that the companies for whom these wizards work have generated news releases. Some pump out publicity every two weeks. Others punch the PR button six or seven times a year. What are these companies announcing that I don’t consider news?
Here is my short list:
- New software versions. This is an item of interest, but it is not going to be picked up by Computerworld.
- Added features or functions. A popular innovation is “social”. The idea is fuzzy but seems to mean that a system user can add index tags or attach a note for any other person who accesses a document or report.
- Deal wins. The vendor lands a contract and issues a news release saying, “We won this big deal.” Shareholders and competitors have more interest in these than I do.
- New hires. This is legitimate news, but the enterprise search industry lacks a Wall Street Journal to gather the executive changes which light the marketing fires at Booz, Allen & Hamilton and McKinsey & Company. These firms write to a new hire and say, “Congratulations. We can help you be successful.” Competitors and insurance sales people salivate over such announcements.
What happens when this type of news is diluted with multiple releases of the same information? What is the impact of pumped up version announcements which contain only bug fixes and a couple of add ons? What is the cumulative effect of repeated executive hiring announcements?
My thought is that enterprise search vendors are engaging in disinformation. I don’t think the consumers of the disinformation are potential purchasers, stakeholders, or employees of the company issuing the release.
Nope.
The folks who gobble up enterprise search information are the executives at other enterprise search companies. The search industry, which is under siege by customers and companies offering higher value solutions, is talking to itself.
I grew up in a small town. Information circulated quickly and was chock full of gossip, half truths, and insinuations. The intelligence was parochial; that is, the small town’s thought processes were honed for baloney processing.
When hard data from the “outside world” arrived, few knew how to interpret it or put it in its appropriate context. I think the enterprise search sector is close to becoming the equivalent of a dead end town on the edge of the prairie about 150 miles from a city with a million people.
Enterprise search vendors’ efforts to make sales, build buzz, differentiate themselves, and puff up their achievements are the equivalent of a digital peacock spreading its tail feathers. Other peacocks notice, but no one else knows what the heck the squawks and flash mean.
Enterprise search is drifting close to disinformation. The marketing is filled with metaphors and homiletic assurances. The news is often not news; it is the peacock squawk and tail shaking. The licensees are wising up. Users of enterprise search systems are grousing. IT departments are unable to deal with some of the search systems because they are too complex. Options are now available.
What’s the fix? I don’t think there is a magic wand that can address disinformation. Customers will decide. Vendors may be too busy news releasing to one another to notice that the buyers have licensed enterprise applications with search baked in or settled for a plug-and-play solution. When the search vendors’ conversation lapses, their world may have changed without their noticing.
Stephen Arnold, June 21, 2008
Business Intelligence: Revenues Drift Down
June 21, 2008
The business intelligence revolution has come and seems to be headed for knee surgery. The nimbleness is gone. If Information Week has it right, business intelligence in the US needs help to get back on the revenue treadmill. Mary Hayes Weier’s essay “Business Intelligence Software Growth Shows Dramatic Drop in U.S.” is here. Ms. Weier’s key point is, “Sales growth of BI software in the United States…sputtered to just five percent.”
BI, as aficionados call it, is the stuff that fires the synapses of smart managers. That may be true, but the market is dominated by a handful of companies that seem intent on sucking the oxygen from the market, stunting smaller BI vendors, and competing with one another.
Software giants like IBM, Microsoft, and Oracle seem to be moving to a super sized version of Microsoft Office only this time the suite contains application servers, databases, analytics, and search. Will this strategy of delivering a dump truck filled with software help or hurt business intelligence?
My thought upon reading Ms. Weier’s essay was that on-premises systems of such complexity will hasten the emergence of cloud-based solutions. The reasons are easy to identify:
- Information technology departments are no longer able to budget reliably. The slightest glitch can chew through a budget. More complexity means more glitches and less control and predictability in IT spending. Solution? Shift to the cloud and a price list.
- Vendors will find their marketing assurances losing efficacy. Customers cannot afford systems that do not work as advertised. For many years, licensees have been reluctant to grouse. That is beginning to change. Even stage managed vendor trade shows are becoming tough to hold to the party line. Going forward, licensees may become more vocal in their criticism of software and pricing policies. Buggy software “sells” consulting services until licensees get savvy about this ploy.
- Smaller firms may find it easier to explore alternative delivery, pricing, and support models. A small vendor has a narrow margin of error. On the other hand, a smaller outfit can make a change to a business quickly. When there are large numbers of competitors, one or two of these outfits may find the keys to the kingdom. The giant firms will be unable to adapt quickly and in effect become more vulnerable.
The business intelligence wave has come, hit the shore, and is now receding. Companies want to make decisions based on data, and the winners will be firms who can make complexity less painful from the cloud. My hypothesis is that a shake up is coming. It may take many years, but the dominant companies will be the BI equivalent of Toyota and Honda surrounded by a small number of specialists. Who do you think will emerge as the BI winners? I am going to put my money on the GOOG for these reasons: big league analytics in an easy to use package, cloud capability, and big data to complement the puny data sets most companies crunch for their current BI analyses.
Stephen Arnold, June 21, 2008
Boomers and Millennials: Implications for Enterprise Search
June 20, 2008
Enterprise Search and the Age Gap
Employees, contractors, and consultants are becoming younger. For enterprise search, the shift matters: aging boomers are leaving the work force and younger employees are moving in.
En route from San Francisco to the less civilized environs of rural Kentucky, I made a list of the differences between Millennials and Baby Boomers. Millennials are all digital all the time. Baby Boomers have luggage stuffed with printed books, paper calendars, and blank notebooks in which one letter at a time can be written using a pencil. For simplicity, I will call the Millennials the younger workers, and the Baby Boomers the aging dinosaurs. Keep in mind that you may be 25 and as mired in books and microfilm as an ossified Baby Boomer. The categories are not absolute. The two part division is intended to make it easy for me to communicate my thoughts about the changes wrought upon search as Baby Boomers become the minority in organizations and Millennials become the majority.
I want to alert you that anyone under the age of 35 will probably be annoyed at my thoughts. But this is a Web log, and I am going to capture these notions before I pass out from the brutalities of a red-eye flight seated next to the lavatory. In short, another red eye, another Web log essay about enterprise search from a different angle.
The generational differences mark a clean break with the key word search and retrieval systems of the past and point to the more sophisticated and complex information access solutions that more youthful enterprise system users require.
Seven Differences between a Young Professional and a Near Retirement Professional
Difference 1: Under 35s don’t read anything long. I have the impression that the under 35 enterprise search user wants short, chunky information from search systems. Systems that return long documents that have to be printed out, annotated, and studied are not what users of search systems want from their information access systems. Over 55s (yes, I am generalizing) may not like long documents, but I for one will slog through this stuff. There may be gold in those hills, I think.
Difference 2: Under 35s want to have search suggestions, assisted navigation, Use For references, and See Also hints. Over 55s like me don’t have much resistance to formulating a query, scanning results, reformulating the query, scanning results, and finally narrowing the result set to a useful collection of documents which can then one-by-one be reviewed. I love shortcuts, but research is research.
Difference 3: Under 35s seem to have the uncanny ability to do several electronic tasks at once. At the Gilbane conference I watched a professional journalist listen to a speaker, send messages on a BlackBerry, and chat with the person sitting next to her. I am lucky if I can listen to the speaker; forget the digital activity. Over 55s are less adept multitaskers. The BART crash, in which a speeding train struck a stopped train, appears to have been caused by a young driver who was chatting on a mobile phone while controlling the train. I prefer single-task focus to avoid collisions.
Another Google Should: Buy the Associated Press
June 20, 2008
I enjoy “Google should” essays. Google has money, technology, the number one global brand, and the ability to move like a ninja. Wired’s Web log carries Betsy Schiffman’s interesting essay “Forget the New York Times: Google Should Buy The AP”. You can read it here.
The idea offers the Associated Press one way to jump off the tracks and avoid the fate of a coin placed on a rail so the wheels can flatten it. Playing on train tracks is fun; letting the wheels of the locomotive rework a penny is a sudden transformation.
The most interesting point in her essay for me is this statement: “The flip side of the equation is that web companies are picking up where the newspapers left off.”
That nails it.
The implications for enterprise search are significant. More and more organizations want to create a Folger’s blend for their Intranet or behind-the-firewall search users. For-fee content has been available to organizations for many years. Now, why bother? Even high-value information such as financial data is becoming more findable. Stock traders need their fancy Bloomberg terminals and Reuters data. But for a snapshot of a competitor, Google Finance works well for me. (Yahoo, I fear, may be slipping off my radar due to organic issues at that company.)
In June 2007 I made the suggestion that the Associated Press should find a way to “surf on Google”. Perhaps the tie up between Google and AP should become more formal, as Ms. Schiffman suggests. She and I are on the same insight vibe. The Google is more than Web search and advertising. I am more convinced than ever that my describing the company as a “supranational corporation” is an understatement. The GOOG is our own informational revolution. Instead of sitting in the Black Country in England, we are in Data Country. News is one piece of raw material in this new world.
Stephen Arnold, June 20, 2008
English Invade France: Autonomy Snares Lyon Library
June 20, 2008
Autonomy, arguably one of the top two or three vendors of enterprise search, landed a big fish. In fact, the company snagged the second biggest library in France, the Bibliothèque Municipale de Lyon. You can read the full story on CityAM here. (No quotes. The AP sabre rattling echoes in my ears.)
Why is this important? Three reasons:
- France has some serious search, content processing, and text analytics vendors. The ones that merit a close look in any search bake off include Exalead and PolySpot, but there are others about whom I have written in Beyond Search and this Web log.
- Libraries are strapped for cash, so these organizations look for a vendor that offers text firepower and a good deal. My sources tell me that Autonomy’s pencil sharpening carried the day which, when combined with a video search capability, melted French hearts the way the sun softens camembert.
- User expectations for search are soaring. At the same time, dissatisfaction with search systems is rising as well. Lyon’s technologists have sent a bright signal that Autonomy can deliver a better solution, one that will leave users with big grins.
So Autonomy has invaded Lyon. The company will work overtime to make this sales win the foundation of other attempts to win business. I will be monitoring the Lyon implementation, the reaction of the French technologists, and the number of wins that Autonomy can achieve.
Stephen Arnold
June 20, 2008
Update 1
A helpful (though reticent) reader has alerted me that the Autonomy Lyon win is not new. You could have fooled me. Here’s the news release that I saw on June 16, 2008, and I certainly thought “news” meant “news”. To my aging self, the news release appeared to originate in Cambridge, UK and San Francisco, California, and I understood it to report a new win. My anonymous email writer pointed out that the library had been a Verity customer and this was a multi year extension. My anonymous writer suggested I do more research before commenting about news releases. Point well taken, but I’m trying to link actions in search to the needs of users, not analyze the veracity of what appears on PRNewswire. At my age, it is a habit (obviously a bad one) to assume that a “news release” contains news.
Stephen Arnold, June 20, 2008, 8:50 am
Gilbane 2008: Three Things I Learned
June 20, 2008
The Gilbane 2008 conference ended. Attendees seemed happy. In fact, as the exhibit tear down began, some attendees were sitting at tables in the registration area exchanging business cards and planning enterprise application moves. At times, the snap of business cards rivaled the noise in and around a Las Vegas 21 table.
I lost track of the sessions into which I poked my beak. Knitting together what I heard from earnest lecturers, in the break chats, and from exhibitors who smiled continuously for two days, I learned three things.
First, content management systems designed to end hassles with content for the Web and other outputs don’t work very well. One person told me, “CMS. My goodness, what a disaster.” I don’t think this youthful-looking person was exaggerating. CMS is one of those software systems that are supposed to allow an organization that has neither publishing work processes nor people who can write very well to do both and generate content automatically.
Second, enterprise search does not work very well. For the first time, a number of different people were talking about the problems of search, but what I heard boiled down to two issues: [a] users don’t use the system and [b] the system is tough to fix. One person told me, “I was surprised how many people admitted that their search systems were not what the companies thought they were buying.” Popular silver bullets and amulets included taxonomies, social search, and semantics. All incantations to keep search evils at bay.
Third, consultants feasted on the attendees looking for silver bullets. The UK outfit 451 and Gartner were making sales left and right. Lesser souls were also dragging in nets filled with prospects. I steered clear of the consultants because the flashing white teeth and the broad smiles frightened me.
Quite a lesson for me.
Stephen Arnold
June 20, 2008
Newspapers: Descent from Mt. Olympus Continues
June 19, 2008
No references or quotes from the Associated Press for me. The news, however, is easy to find, and it has grave implications for enterprise search systems dependent on for-fee content from traditional publishers.
A quick read of these stories makes clear that unless traditional news organizations staunch the bleeding, traditional newspapers may be reduced to a shadow of their present selves. This is more startling than the Subway fast food weight loss advertisements. Newspapers may become even trimmer than the once-chubby Jared.
You can start your learning about the financial plight at these two links:
- Editor & Publisher, “Behind the McClatchy Layoffs”. One-second summary: Ads go away.
- Bloomberg, “Tribune, MediaNews May Wind Up in Default as Ad Sales Evaporate”. One-second summary: Default possible.
Enterprises depending on branded news may want to increase their intake of unbranded news available via the Internet. I heard that Factiva is stepping up its professional services activity. Good move. Content licensing may be in for some rough water.
Stephen Arnold, June 19, 2008