Attensity and Tremendous Momentum
October 3, 2008
With the economy in the US stumbling along, I found Attensity’s September 30, 2008, “Momentum” news release intriguing. The information issued by the analytics company is here. I had to struggle to decipher some of the jargon. For example, First Person Intelligence. This is a product name with a trademark. The idea is that email or phone calls from a customer are analyzed by Attensity. The resulting insights yield information about a particular customer; hence, First Person Intelligence. You can see FPI in action by clicking here. The company won an award called the Stevie. If you are curious or you want to enter to compete to snag the 2009 award, click here. I think I know what text analytics is, so I jumped to VoC. The acronym means “voice of the customer.” I think the notion is that a company pays attention to emails, call center notes, and survey data. I’m not certain if VoC is a subset of FPI or if VoC is the broader concept and FPI is a subset of VoC.
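If you strip away the trademarked labels, the VoC idea reduces to routing and tagging inbound customer text. Here is a minimal sketch of that notion; the categories and keyword lists are my own inventions for illustration, and real systems like Attensity’s rely on deep linguistic extraction, not keyword lookups:

```python
# Toy "voice of the customer" tagger: assign inbound messages to care
# categories by keyword overlap. Purely illustrative; production text
# analytics parses grammar and meaning rather than matching words.
CATEGORIES = {
    "billing": {"invoice", "charge", "refund", "billed"},
    "outage": {"down", "outage", "offline", "unavailable"},
    "praise": {"great", "thanks", "love", "excellent"},
}

def tag_message(text: str) -> list[str]:
    """Return the sorted list of categories whose keywords appear."""
    words = set(text.lower().split())
    return sorted(cat for cat, kws in CATEGORIES.items() if words & kws)

messages = [
    "I was billed twice, please issue a refund",
    "Your site has been down all morning",
]
tags = [tag_message(m) for m in messages]
```

Even this toy shows why the vendors’ jargon obscures a simple pitch: route the angry email to the right queue before the customer calls again.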
The core of the news release is that Attensity has landed some major accounts. Customer names are tough to come by, so you may want to note these organizations that have licensed the Attensity technology but hopefully not the jargon:
- JetBlue
- Royal Bank of Canada
- Travelocity
For me, the most useful part of the company-written article was this passage:
The text analytics market is rapidly moving out of the early adopter stage. Industry analyst firm Hurwitz & Associates estimates an annual growth rate for this market at 30 to 50 percent. According to a survey conducted last year by the firm, the largest growth area is in customer care-related applications. In fact, over 70 percent of the companies surveyed that had deployed, or were considering deploying the technology, cited customer care as a key application area.
The growth rate does not match my calculation, which pegs growth at a more leisurely 10 to 18 percent on an annual basis. The Hurwitz organization is much larger than this single goose operation. Endangered species like this addled goose are more conservative, and its estimates in a grim financial market are less optimistic than other consultants’ and analysts’.
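For readers who want to see how far apart the two estimates drift, here is a quick back-of-the-envelope compounding check. The starting market size is arbitrary; only the growth rates come from the figures above:

```python
# Compare what two annual growth estimates imply over three years,
# starting from an arbitrary $100 million market (illustrative only).
def project(base: float, rate: float, years: int) -> float:
    """Compound `base` at `rate` (0.30 means 30 percent) for `years` years."""
    return base * (1 + rate) ** years

base = 100.0                            # illustrative, $ millions
hurwitz_low = project(base, 0.30, 3)    # about 219.7
goose_high = project(base, 0.18, 3)     # about 164.3
```

After three years, even the low end of the Hurwitz range implies a market roughly a third larger than the high end of my estimate. That is not a rounding difference; it is a different industry.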
In my Beyond Search study for the Gilbane Group, published in April 2008, I gave Attensity high marks. Its deep extraction technology yields useful metadata. Since my early 2008 analysis, Attensity has worked hard to productize its system. Call centers are a market segment in need of help. Most companies want to contain support costs.
In my opinion, Attensity’s technology is better than its explanation of its products and those products’ names. I wonder if the addition of marketers to a technology-centric company is a benefit or a drawback. Thoughts?
Stephen Arnold, October 3, 2008
Silobreaker: Mary Ellen Bates’ Opinion Is on Target
September 30, 2008
Mary Ellen Bates is one sharp information professional. She moved from Washington, DC, to the sunny clime in Colorado. The shift from the nation’s capital (the US crime capital) to the land of the Prairie Lark Finch has boosted her acumen. Like me, she finds much goodness in the Silobreaker.com service. (You can read an interview with one of the founders of Silobreaker.com here.) Writing in the September number of Red Orbit here she said:
What Silobreaker does particularly well is provide you with visual displays of information, which enable you to spot trends or relationships that might not be initially obvious. Say, for example, you want to find out about transgenic research. Start with what Silobreaker calls the “360[degrees] search,” which looks across its indexes, including fields for entities (people, companies, locations, organizations, industries, and keywords), news stories, YouTube videos, blog postings, and articles.
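The “360” search Ms. Bates describes amounts to running one query across several entity indexes at once. A toy sketch of that idea follows; the records and field names are invented, and Silobreaker’s actual implementation is, of course, not public:

```python
# Toy federated entity search: one query, checked across a document's
# title and its entity fields, loosely mimicking a "360 search" that
# looks across indexes of people, companies, and keywords.
docs = [
    {"title": "Transgenic maize trial", "people": ["J. Doe"],
     "companies": ["AgriCo"], "keywords": ["transgenic", "maize"]},
    {"title": "Airline merger talk", "people": ["A. Smith"],
     "companies": ["JetBlue"], "keywords": ["merger"]},
]

def search_360(query: str) -> list[str]:
    """Return titles of documents matching the query in any field."""
    q = query.lower()
    hits = []
    for doc in docs:
        fields = [doc["title"]] + doc["people"] + doc["companies"] + doc["keywords"]
        if any(q in f.lower() for f in fields):
            hits.append(doc["title"])
    return hits
```

The value, as Ms. Bates notes, is not the lookup itself but the cross-field view: the same query surfaces the company, the person, and the video without the user knowing which index holds what.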
If you want to try Silobreaker yourself, click here. With Ms. Bates in the wilds of Colorado and me in a hollow in rural Kentucky, I am gratified that news about next-generation information services reaches us equally. A happy quack to Silobreaker and Ms. Bates.
Stephen Arnold, September 30, 2008
Dow Jones and Automatic Taxonomy Generation
September 30, 2008
An eager beaver reader (I only have two or three) sent me a link to “Taxonomies for Human Vs Auto-Indexing.” The author of the Synaptica Central write up is Wendy Lim. She is summarizing or reproducing information attributed to Heather Hedden. From a bibliographic angle, I think a tad more work could be done to make clear who was writing what, where, and when. But that’s an old, failed database goose quacking about the brilliant work done by “experts” decades younger than I. Quack. Quack.
You can read the September 26, 2008, write up here. The article is about a Taxonomy Bootcamp. After a bit of sleuthing, I discovered that this is an add on to some Information Today trade shows. The bootcamp, as I understand it, is an intellectual Camp Lejeune except that the attendees skip the push ups, the 5 am wake up calls, and the 20 mile runs. Over a period of two or three days, taxonomy recruits emerge battle ready, honed to deal with the intellectual rigors of creating taxonomies.
A real taxonomy. Source: www.nnf.org.na
The word “taxonomy” is more popular than “enterprise search” and for good reason. Enterprise search has emerged from organizations with a bold 4F stamped on its fitness report. After hours, maybe months of work, and some hefty bills to pay, enterprise search customers are looking for a way to kill the enterprise search enemy. That’s where a taxonomy comes in. I’m no expert in taxonomies. I know I was involved in creating taxonomies for some once-hot commercial databases like ABI / INFORM, Business Dateline, General Business File, Health Reference Center, and the 1993 Web directory Point (Top 5% of the Internet). What those experiences taught me was that I don’t know too much about taxonomies or classification systems in general for that matter. I keep in touch with people who do know; for example, Marje Hlava at Access Innovations, Barbara Quint (Searcher Magazine), Marydee Ojala (Online Magazine), Ulla de Stricker (De Stricker & Associates), and other specialists. I get nervous when a 20- or 30-something explains that taxonomies are no big deal or that a business process can crack a taxonomy problem or a certain vendor’s software can auto-magically create a taxonomy.
A Synaptica Central tag cloud.
In my experience, the truth is not to be found in any one solution. In fact, the reality of taxonomies is that the concept has gained traction because of fundamental errors in planning and deploying information access systems. I don’t think a taxonomy can retrofit stupid, short sighted decisions. For that reason, I steer clear of most taxonomy discussions because after working with these beasts for more than 30 years, I understand their unpredictable behavior.
Stephen Arnold, September 30, 2008
Expert System: Morphing into an Online Advertising Tool Vendor
September 28, 2008
Several years ago, YourAmigo (an Australian search and content processing vendor) shifted from enterprise search to search engine optimization. I stopped following the company because I have zero interest in figuring out how to get traffic to my Web site or my Web log. Now Expert System has rolled out what it calls Cogito Advertiser. A brief write up appeared in DMReview.com when I was in Europe. You can read that article here.
The new service, according to DMReview.com:
automatically analyzes Web pages to identify the most relevant topics and extract the main themes included in the text. It classifies content by assigning the category related to the text in real time, based on an optimized taxonomy and high precision. By processing the text, it collects all useful data in an output format structured to be uploaded into a database and directly integrates it with the ad server.
Expert System has some interesting technology. The idea is that software that can “understand” will be able to do a better job of keyword identification than a human, often fresh out of college with a vocabulary flush with “ums”, “ers”, and “you knows”.
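Reduced to its skeleton, the pipeline DMReview.com describes is term extraction, categorization, and a structured record for the ad server. Here is a crude sketch; frequency counting stands in for Expert System’s semantic “understanding”, and the stopword and category lists are invented for the example:

```python
from collections import Counter

# Illustrative word lists -- not Expert System's taxonomy.
STOPWORDS = {"the", "a", "and", "of", "to", "is", "in"}
CATEGORY_TERMS = {"travel": {"flight", "hotel"}, "finance": {"loan", "rates"}}

def analyze_page(text: str) -> dict:
    """Extract top terms, assign a category, and emit a record an ad
    server could load. Real semantic systems parse meaning, not counts."""
    words = [w for w in text.lower().split() if w not in STOPWORDS]
    top_terms = [w for w, _ in Counter(words).most_common(3)]
    category = next(
        (cat for cat, terms in CATEGORY_TERMS.items() if terms & set(words)),
        "general",
    )
    return {"category": category, "terms": top_terms}
```

The structured output is the point: the ad server does not care how the category was derived, only that every page arrives with one attached.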
You can learn more about the company here. As the financial and competitive pressures mount, I expect other vendors to repackage their technology in an effort to tap into more rapidly growing markets with shorter buying cycles than enterprise search typically merits.
Stephen Arnold, September 28, 2008
Taxonomy: Silver Bullet or Shallow Puddle
September 27, 2008
Taxonomy is hot. One of my few readers sent me a link to Fumsi, a Web log that contains a two part discussion of taxonomy. I urge you to read this post by James Kelway, whom I don’t know. You can find the article here. The write up is far better than most of the Webby discussions of taxonomies. After a quick pass at nodes and navigation, he jumps into information architecture requiring fewer than 125 words. The often unreliable Wikipedia discussion of taxonomy here chews up more than 6,000. Brevity is the soul of wit, and whoever contributed to the Wikipedia article must be SWD; that is, severely wit deprived.
Take a look at the Google Trends chart I generated at 8 pm on Friday, September 26, 2008. Taxonomy is generating more Google traffic than the now mud crawler enterprise search, though it is not as popular as “CMS”, the shorthand for content management system. But “taxonomy” is a specialist concept that seems to be moving into the mainstream. At the just concluded Information Today trifecta conference featuring search, knowledge management (whatever that is), and streaming media, taxonomy was a hot topic. At the Wednesday roof top cocktail, where I worked on my tan in the 90 degree ambient air temperature, I was asked four times about taxonomies. I know I worked on commercial taxonomies and controlled vocabularies for databases, but I learned from those years of experience that taxonomies are really tough, demanding, time consuming intellectual undertakings. I thought I was pretty good at making logical, coherent lists. Then I met the late Betty Eddison and the very active Marje Hlava. These two pros taught me a thing or 50.
In the dumper is the red line which maps “enterprise search” popularity. The blue line is the up and coming taxonomy popularity. The top line is the really popular, yet hugely disappointing, content management term traffic.
I heard people who have been responsible for failed search systems and non functional content management systems asking, “Will a taxonomy improve our content processing?” The answer is, “Sure, if you get an appropriate taxonomy.” I then excuse myself and head to the bar man for a Diet 7 Up. The kicker, of course, is “appropriate”. Figuring out what’s appropriate and then creating a taxonomy that users will actually exploit directly or indirectly is tough work. But today, you can learn how to do a taxonomy in a 40 minute presentation or, if you are really studious, a full eight hour seminar.
I remember talking with Betty Eddison and Marje Hlava about their learning how to craft appropriate taxonomies. Marje just laughed and turned to her business partner who also burst out laughing. Betty smiled and in her deep, pleasant voice said, “A life time, kiddo.” She called me “kiddo”, and I don’t think anyone else ever did. Marje Hlava chimed in and added, “Well, Jay [her business partner] and I have been at it for two life times.” I figured out pretty quickly that building “appropriate” taxonomies required more than persistence and blissfully ignorant confidence.
Why are taxonomies perceived as the silver bullet that will kill the vampire search or CMS system? A vampire system is one that will suck those working on it into endless nights and weekends and then gobble available budget dollars. In my opinion, here are the top five reasons:
- The notion of a taxonomy as a quick fix is easy to understand. Most people think of a taxonomy as the equivalent of the Dewey Decimal system or the Library of Congress subject headings and think, “How tough can this taxonomy stuff be?” After a couple of runs at the problem, the notion of a quick fix withers and dies.
- Vendors of lousy enterprise search systems wriggle off the hook by asserting, “You just need a taxonomy and then our indexing system will be able to generate an assisted navigation interface.” This is the search equivalent of “The check is in the mail.”
- CMS vendors, mired in sluggish performance, lost information, and users who can’t find their writings, can suggest, “A taxonomy and classification module makes it much easier to pinpoint the marketing collateral. If you search for a common term, our system displays those documents with that common term. Yes, a taxonomy will do the trick.” This is the same as “Let’s do lunch” repeated every week to a person whom you know but with whom you don’t want to talk for more than 30 seconds on a street corner in mid town Manhattan.
- A shill at a user group meeting–now called a “summit”–praises the usefulness of the taxonomy in making it easier for users to find information. Vendors work hard to get a system that works and win over the project manager. Put on center stage and pampered by the vendor’s PR crafts people, the star customer presents a Kodachrome version of the value of taxonomies. Those in the audience often swallow the tale the way my dog Tess goes after a hot dog that falls from the grill. There’s not much thinking in Tess’s actions either.
- Vendors of “automated” taxonomy systems demonstrate how their software chops a tough problem down to size in a matter of hours or days. Stuff in some sample content and the smart algorithms do the work of Betty Eddison and Marje Hlava in a nonce. Not on your life, kiddo. The automated systems really are not 100 percent automatic. The training corpus is tough to build. The tuning is a manual task. The smart software needs dummies like me to fiddle. Even more startling to licensees of automatic taxonomy systems is that you may have to buy a third party tool from Access Innovations, Marje Hlava’s company, to get the job done. That old phrase “If ignorance is bliss, hello, happy” comes to mind when I hear vendors pitch the “automated taxonomy” tale.
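A bare-bones sketch makes the two manual steps visible. Nothing here reflects any vendor’s actual algorithm; the corpus, the overlap rule, and the need for human labels are my own illustration:

```python
# "Automatic" taxonomy building, reduced to its skeleton: software can
# group documents by shared vocabulary, but a human must still supply
# the training corpus and name the nodes -- the two manual steps the
# sales demo glosses over.
corpus = {  # manual step 1: someone has to assemble and vet this
    "doc1": {"loan", "mortgage", "rates"},
    "doc2": {"mortgage", "refinance"},
    "doc3": {"seeds", "maize", "yield"},
}

def group_by_overlap(docs: dict) -> list[set]:
    """Merge documents whose term sets overlap into crude clusters."""
    clusters: list[tuple[set, set]] = []  # (doc ids, union of terms)
    for doc_id, terms in docs.items():
        for ids, vocab in clusters:
            if vocab & terms:
                ids.add(doc_id)
                vocab |= terms
                break
        else:
            clusters.append(({doc_id}, set(terms)))
    return [ids for ids, _ in clusters]

clusters = group_by_overlap(corpus)
# manual step 2: a human names the nodes, e.g. "lending", "agriculture"
```

The software finds the clusters; it cannot tell you whether “lending” belongs under “finance” or “consumer services” in your organization. That judgment is the lifetime of work Betty and Marje laughed about.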
I assume that some readers may violently disagree with my view of 21st century taxonomy work. That’s okay. Use the comments section to teach this 65 year old dog some new tricks. I promise I will try to learn from those who bring hard data. If you make assertions, you won’t get too far with me.
Stephen Arnold, September 27, 2008
IBM: Another New Search System from Big Blue
September 27, 2008
IBM announced its eDiscovery Analyzer. You can read the IBM news release on the MarketWatch news release aggregation page here. Alternatively you can put up with the sluggish response of IBM.com and read more details here. You won’t be able to locate this page using IBM.com’s search function. The eDiscovery Analyzer had not been indexed when I ran the query at 7:30 pm on September 27, 2008. I * was * able to locate the page using Google.com. If I were the IBM person running site search, I would shift to Google, which works.
The eDiscovery Analyzer, according to Big Blue:
… provides conceptual search and analysis of cases created by IBM eDiscovery Manager.
Translating: eDiscovery Manager assists with legal discovery, a formal investigation governed by court rules and conducted before trial, and internal investigations on possible violations of company policies, by enabling users to search e-mail documents that were archived from multiple mailboxes or Mail Journaling databases into a central repository. You license eDiscovery Manager, the bits and pieces needed to make it go, and then you license the brand new eDiscovery Analyzer component.
I believe that this is the current interface for the “new” IBM eDiscovery Analyzer. Source: IBM’s Information Management Software IBM eDiscovery Analyzer 2.1 marketing collateral.
You will need FileNet, IBM’s aging content management system. The phrase I liked best in the IBM write up was, “[eDiscovery Analyzer] is easy to deploy and use, Web 2.0 based interface requires minimal user training.” I’m not sure about the easy to deploy assertion. And the system has to be easy to use because the intended users are attorneys. In my experience, which is limited, legal eagles are not too excited about complicated technology unless it boosts their billable hours. You can run your FileNet add in on AIX (think IBM servers) or Windows (think lots of servers).
You can read about IBM’s search and discovery technology here. You can tap into such “easy to deploy” systems as classification, content analysis, OmniFind search, and, if you are truly fortunate, DB2, IBM’s user friendly enterprise database management system. You might want to have a certified database administrator, an expert in SQL, and an IBM-trained optimization engineer on hand in case you run into problems with these user friendly systems. If these systems leave you with an appetite for more sophisticated functions, click here to learn about other IBM search and discovery products. You can, for example, read about four different versions of OmniFind and learn how to buy these products.
Remember: look for IBM products by searching Google. IBM.com’s search system won’t do the job. Of course, IBM’s enterprise eDiscovery Analyzer is a different animal, and I assume it works. By the way, when you try to download the user guide, you get to answer a question about the usefulness of the information * before * you have received the file. I conclude that IBM prefers users who are able to read documents without actually having the document.
Stephen Arnold, September 27, 2008
Linguamatics Sells Bayer CropScience
September 27, 2008
My newsreader snagged this item, which I found interesting. The little-known Linguamatics (a content processing company based in the UK) retained its deal with the warm and friendly Bayer CropScience. The Linguamatics’ technology is called I2E, and Bayer has been using the I2E system since the summer of 2007. In September, Bayer CropScience decided to renew its license and process patent documents, scientific and technical information, and perform knowledge discovery. (I must admit I am not sure how one discovers knowledge, but I will believe the article that you can find here.)
For me, this small news item was interesting for several reasons. First, for many years a relatively small number of companies had been granted access to the inner circle of European pharma. I find it refreshing that after two centuries, upstarts like Linguamatics are able to follow in the footsteps of Temis and other firms who have worked to make sales in these somewhat conservative companies. “Conservative” might not be the correct word. Computational chemists are a fun-loving group. One computational chemist told me last October in Barcelona that computational chemists were pharma’s equivalent to Brazilian football fans. On the off chance that a clinical trial goes off the rails, some pharma players prefer keeping “knowledge” quite undiscovered until an “issue” can be resolved.
A representative I2E results display. © Linguamatics, 2008.
Second, Linguamatics–a company I profiled after significant bother and effort–is profiled in my April 2008 study Beyond Search, published by the Gilbane Group. You can learn more about this study here because ferreting out information about I2E is not the walk in the park that I expected from a content processing company with a somewhat low profile. Linguamatics has some interesting technology, and I surmise that the uses of the system are somewhat more sophisticated and useful to Bayer CropScience than “discovering knowledge”.
Finally, Bayer CropScience is a subsidiary of the influential Bayer AG, an outfit with an annual turnover of about US$8.0 billion, give or take a billion because of the sad state of the dollar on the international market. My hunch is that if the CropScience deal feels good, other units of this chemical and pharmaceutical giant will learn to love the I2E system.
Stephen Arnold, September 27, 2008
TeezIR BV: Coquette or Quitter
September 26, 2008
For my first visit to Utrecht, once a bastion of Catholicism and now a Rabobank stronghold, I wanted to speak with interesting companies engaged in search and content processing. After a little sleuthing, I spotted TeezIR, a company founded in November 2007. When I tried to track down one of the principals–Victor Van Tol, Arthus Van Bunningen, and Thijs Westerveld–I was stonewalled. I snagged a taxi and visited the firm’s address (according to trusty Google Maps) at Kanaalweg 17L-E, Building A6. I made my way to the second floor but was unable to rouse the TeezIR team. I am hesitant to say, “No one was there”. My ability to peer through walls after a nine hour flight is limited.
I asked myself, “Is TeezIR playing the role of a coquette or has the aforementioned team quit the search and content processing business?” I still don’t know. At the Hartmann conference, no one had heard of the company. One person asked me, “How did you find out about the company?” I just smiled my crafty goose grin and quacked in an evasive manner.
The trick was that one of my two or three readers of this Web log sent me a snippet of text and asked me if I knew of the company:
Proprietary, state-of-the-art technology is information retrieval and search technology. Technology is built up in “standardized building blocks” around search technology.
So, let’s assume TeezIR is still in business. I hope this is true because search, content processing, and the enterprise systems dependent on these functions are in a sorry state. Cloud computing is racing toward traditional on premises installations the way hurricanes line up to smash the American southeast. There’s a reason cloud computing is gaining steam–on premises installations are too expensive, too complicated, and too much of a drag on a struggling business. I wanted to know if TeezIR was the next big thing.
My research revealed that TeezIR had some ties to the University of Twente. One person at the Hartmann conference told me that he thought he heard that a company in Ede had been looking for graduate students to do some work in information retrieval. Beyond that tantalizing comment, I was able to find some references to Antal van den Bosch, who has expertise in entity extraction. I found a single mention of Luuk Kornelius, who may have been an interim officer at TeezIR and at one time a laborer in the venture capital field with Arengo (no valid link found on September 16, 2009). Other interesting connections emerged from TeezIR to Arjen P. de Vries (University of Twente), Thomas Roelleke (once hooked up with Fredhopper), and Guido van’t Noordende (security specialist). Adding these names to the management team here, TeezIR looked like a promising start up.
Since I was drawing a blank on getting people affiliated with TeezIR to speak with me, I turned to my own list of international search engines here, and I began the thrilling task of hunting for needles in haystacks. I tell people that research for me is a matter of running smart software. But for TeezIR, the work was the old-fashioned variety.
Overview
Here’s what I learned:
First, the company seemed to focus on the problem of locating experts. I grudgingly must call this a knowledge problem. In a large organization, it can be hard to find a colleague who, in theory, knows an answer to another employee’s question. Here’s a depiction of the areas in which TeezIR is (was?) working:
Second, TeezIR’s approach is (was?) to make search an implicit function. Like me, the TeezIR team realized that by itself search is a commodity, maybe a non starter in the revenue department. Here’s how TeezIR relates content processing to the problem of finding experts:
Eaagle Text Processing Swoops In
September 26, 2008
Eaagle Software announced the availability of Full Text Mapper (FTM), a desktop software program that provides analysis of unstructured data. Eaagle Software brings together advanced text mining technology and desktop computing. ‘Our philosophy is that text mining and data analysis tools should be easy-to-use and not require any particular skills,’ states Yves Kergall, president and CEO of Eaagle. ‘Our software doesn’t require any setup or predefinition to begin discovering knowledge. Simply highlight the information, launch FTM, and instantly visualize your data to begin your analysis…it is that easy.’ You can read the full news story here. For more information about Eaagle, navigate to the company’s Web site here. A single user license is about $4,000.
Stephen Arnold, September 26, 2008
Knol Understanding
September 23, 2008
Slate’s Farhad Manjoo’s “Why Google’s Online Encyclopedia Will Never Be as Good as Wikipedia” takes a somewhat frosty stance toward Knol. You can read his interesting essay here. For me the most significant point was this one:
Knol is a wasteland of such articles: text copied from elsewhere, outdated entries abandoned by their creators, self-promotion, spam, and a great many old college papers that people have dug up from their files. Part of Knol’s problem is its novelty. Google opened the system for public contribution just a couple months ago, so it’s unreasonable to expect too much of it at the moment; Wikipedia took years to attract the sort of contributors and editors who’ve made it the amazing resource it is now.
Knol is one of those Google products that appear and seem to have little or no overt support. I agree. I would like to make three comments:
- Knol may be a way for Google to get content for itself first and then secondarily for its users. Google wants information, and Knol is a different mechanism for information acquisition. Assuming that it is a Wikipedia clone may only be partially correct.
- Knol, like many other Google services, does not appear to have a champion. As a result, Knol evolves slowly or not at all. Knol may be another way for Google to determine interest, learn about authors who are alleged experts, and determine if submitted content validates or invalidates other data known to Google.
- Knol may be part of a larger grid or data ecosystem. As a result, looking at it out of context and comparing it to a product with which it may not be designed to compete might be a partially informed approach.
Based on my analysis of the Google JotSpot acquisition and the still youthful Knol service, I’m not prepared to label Knol or describe it as either a success or failure. In my opinion, Knol is a multi purpose beta. Its principal value may be in the enterprise, not the consumer space. But for me, I have too little data and an incomplete understanding of how the JotSpot “plumbing” is implemented; therefore, I am neutral. What’s your view?
Stephen Arnold, September 23, 2008