Boston Search Engine Meeting and Exalead
April 9, 2010
The Evvie Award recognizes outstanding work in the field of search and content processing. Ev Brenner, one of the original founders of the Boston Search Engine Meeting, emphasized the need to acknowledge original research and innovative thinking. After Mr. Brenner died, the Boston Search Engine Meeting, then owned by a company in the UK, instituted the Evvie award. This year, Exalead, one of the leaders in search-based applications, and ArnoldIT.com are sponsoring the award. In addition to a cash recognition of $1,000, the recipient receives the Evvie shown below.
For more information about the premier search and content processing conference, navigate to the Search Engine Meeting Web site. You can review the program and pre-conference activities.
For more information about Exalead, navigate to the Exalead Web site. You can see a demonstration of the Exalead system on the ArnoldIT.com site here and you can explore next generation search and content processing innovations at Exalead’s “labs” site.
For more information about the award, click here.
Stephen E Arnold, April 9, 2010
This post is sponsored by ArnoldIT.com, Exalead, and Information Today, Inc.
AIIM Report on Content Analytics
March 30, 2010
A happy quack to the reader who sent me a link available from the Allyis Web site for the report “Content Analytics – Research Tools for Unstructured Content and Rich Media”. If you are trying to figure out what about 600 AIIM members think about the changing nature of information analysis, you will find this report useful. I flipped through the 20 pages of data from what strikes me as a somewhat biased sample of enterprise professionals. Your mileage may vary, of course. One quick example. In Figure 4, “How would you rate your ability to research across the following content types” (page 7), the respondents report that they are pretty good at searching customer support logs. The respondents are also confident of their ability to search “case files” and “litigation and legal reports.” My research suggests that these three areas are real problems in most organizations. I am not sure how this sample interprets their organizations’ capabilities, but I think something is wacky. How can, for example, a general business employee assess the ease with which litigation content can be researched? Lawyers are the folks with the expertise. At any rate, another flashing yellow light is the indication that the respondents have a tough time searching for press articles and news along with collateral, brochures, and publications. This is pretty common content, and an outfit that can search “case files” should be able to locate a brochure. Well, maybe not?
There were three findings that I found interesting, but I am not ready to bet my bread crust on the solidity of the data.
First, consider Figure 14: “What are your spending plans for the following areas in the next 12 months?” The top dog is enterprise search – application. This finding should give some search vendors the idea to market to the AIIM membership.
Second, respondents, according to the Key Findings, can find information on the Web more easily than they can find information within their organization. This matches what Martin White and I reported in our 2009 study Successful Enterprise Search Management. It is clear that this finding underscores the wackiness in Figure 4, page 7.
Finally, the Conclusion, page 15 states:
The benefits of investment in Finance and ERP systems have only come to the fore with the increasing power of Business Intelligence (BI) reporting tools and the insight they provide for business managers. In the same way, the benefits of Content Management systems can be much more heavily leveraged by the use of Content Analytics tools.
I don’t really understand this paragraph. Finance has been stretched with the present economic climate. ERP is a clunker. Content management systems are often quite problematic. So what’s the analysis? How about cost overruns?
I tucked the study into my reference file. You may want to do the same. If the Allyis link goes dead, you can get the report directly from AIIM but you may have to join the association.
Stephen E Arnold, March 31, 2010
Like the report, a freebie.
SAS Teragram in Marketing Push
March 25, 2010
Two readers on two different continents sent me links to write ups about SAS Teragram. As you may know, SAS has been a licensee of the Inxight technology for various text processing operations. Business Objects bought Inxight, and then SAP bought Business Objects. I was told a year or so ago that there was no material change in the way in which SAS worked with Inxight. Not long after I heard that remark, SAS bought the little-known specialist content processing firm, Teragram. Teragram, founded by Yves Schabes and a fellow academic, landed some big clients for the firm’s automated text processing system. These clients included the New York Times and, I believe, America Online.
Teragram has integrated its software with Apache Lucene, and the company has rolled out what it calls a Sentiment Analysis Manager. The idea behind sentiment analysis is simple. Process text such as customer emails and flag the ones that are potential problems. These “problems” can then be given special attention.
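The mechanics can be illustrated with a minimal sketch. This is my own toy example, not Teragram’s method; the keyword list and threshold are invented purely for illustration:

```python
# Toy sentiment flagging: count negative keywords in each email and
# flag messages that cross a threshold. Real systems such as
# Teragram's use linguistic analysis, not a bare word list.

NEGATIVE = {"refund", "broken", "cancel", "angry", "worst", "lawsuit"}

def flag_problem_emails(emails, threshold=2):
    """Return the emails whose negative-keyword count meets the threshold."""
    flagged = []
    for text in emails:
        score = sum(1 for w in text.lower().split()
                    if w.strip(".,!?") in NEGATIVE)
        if score >= threshold:
            flagged.append(text)
    return flagged

emails = [
    "Thanks for the quick shipment, works great",
    "Worst service ever. I want a refund and will cancel my account",
]
print(flag_problem_emails(emails))  # flags only the second email
```

The flagged messages would then be routed to a human for the “special attention” described above.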
The first news item I received from a reader was a pointer to a summary of an interview with Dr. Schabes on the Business Intelligence network. Like ZDNet and Fierce Media, these are pay-for-coverage services. The podcasts usually reach several hundred people and the information is recycled in print and as audio files. The article was “Teragram Delivers Text Analytics Solutions and Language Technologies.” You can find a summary in the write up, but the link to the audio file was not working when I checked it out (March 24, 2010, at 8 am Eastern). The most interesting comment in the write up in my opinion was:
Business intelligence has evolved from a field of computing on numbers to actually computing on text, and that is where natural language processing and linguistics comes in… Text is a reflection of language, and you need computational linguistics technologies to be able to turn language into a structure of information. That is really what the core mission of our company is to provide technologies that allow us to treat text at a more elaborate level than just characters, and to add structure on top of documents and language.
The second item appeared as “SAS Text Analytics, The Last Frontier in the Analysis of Documents” in Areapress. The passage in that write up I noted was this list of licensees:
Associated Press, eBay, Factiva, Forbes.com, Hewlett Packard, New York Times Company, Reed Business Information, Sony, Tribune Interactive, WashingtonPost.com, Wolters Kluwer, Yahoo! and the World Bank.
I am not sure how up to date the list is. I heard that the World Bank recently switched search systems. For more information about Teragram, navigate to the SAS Web site. Could this uptick in SAS Teragram marketing be another indication that making sales is getting more difficult in today’s financial climate?
Stephen E Arnold, March 25, 2010
A no fee write up. I will report this sad state of affairs to the IMF, which it appears is not clued in like the World Bank.
IBM and Its Do Everything Strategy
March 24, 2010
I read an unusual interview with Steve Mills. The story was “Q&A: IBM’s Steve Mills on Strategy, Oracle, and SAP.” What jumped out at me was that there was no reference to Google that I noticed. Odd. Google seems to be ramping up in the enterprise sector and poised to compete with just about everyone in the enterprise software and services market. When I noticed this, I decided to work through the interview to see what the rationale was for describing companies that are struggling with many “push back” issues from customers, resellers, and partners. The hassles Oracle is now enduring with regard to open source and the SAP service pricing fluctuations are examples of companies struggling to deal with changing market needs.
Please read the original interview because I am comfortable highlighting only three comments in a blog post.
First, Mr. Mills said:
Our technology delivers important elements of the solution, but there are often third-party application companies that add to that solution. No one vendor delivers everything required. The average large business, if you went into their compute centers around the world, runs 50,000 to 60,000 programs that are part of 2,000 to 4,000 unique applications.
Yes, and it is the cost and complexity of the IT infrastructure in those companies today that are creating pressures on the CFO, the users, and stakeholders. IBM’s engineers helped create the present situation, and the company is now in a position where those customers are likely to look for lower cost, different types of options. If I have a broken auto, would I go to the mechanic who failed to make the repair on an earlier visit? I would seek a new mechanic, but perhaps IBM’s cash-rich customers don’t think the way I do.
Second, Mr. Mills offered this “fact”:
But in the enterprise, for every dollar invested in ERP, there will be five dollars of investment made around that ERP package to get it fully implemented, integrated, scaled and running effectively.
My view is that these dinosaur-like applications are likely to be put under increasing pressure by new hires. The younger engineers are more comfortable with certain approaches to computing. Over time, the IBM “factoid” will be converted into a question like, “If we shift to Google Apps, perhaps we could save some money?” The answer would require verification, but if the savings are accurate, the implications for Oracle and SAP are significant. I think IBM will either have to buy its way into the cloud and “try to make up the revenue delta” on volume or find itself in the same boat as other “old style” enterprise software vendors.
Third, Mr. Mills stated:
It’s money. That’s the No. 1 motivator. And money is not a single-dimensional factor because there’s short-term money, long-term money and money described in broader value terms versus the cost of a product. The surrounding costs are far in excess of products. Every month, customers convert from Oracle to DB2. Why do they do that? Well, Oracle is expensive. Oracle tries to use pricing power to capture a customer and then get the customer to keep on paying. Oracle raises its prices constantly. Oracle does not provide a strong support infrastructure. There are many customers who have decided to move away from Oracle across a variety of products because of those characteristics.
I agree. The implication is that IBM is a low cost option. Well, maybe in some other dimension which the addled goose cannot perceive. My view is that time, value, and cost will conspire to create a gravity well into which the IBM-like companies will be sucked. IBM’s dalliance with open source, its adherence to its services model, and its reliance on acquisitions to generate revenue may lose traction in the future.
And finding stuff in IBM systems? Not mentioned. Also, interesting.
I don’t know when, but IBM’s $100 billion in revenue needs some oxygen going forward. The race is not a marathon. It’s more like a 200 or 440. Maybe Google will be in the race? Should be interesting.
Stephen E Arnold, March 24, 2010
No pay for this write up. I will report this to the GSA who has tapped IBM to build its next generation computing infrastructure. I think IBM will be compensated for this necessary work.
Google Bombshell: Alleged Links to Intelligence Services
March 22, 2010
I was plonking along looking at ho hum headlines when I spotted “Chinese Media Hits Out at Google, Alleges Intelligence Links”. The addled goose does not know anything about this source or about the subject of the article. But the addled goose is savvy enough to know that if this story is true, it is pretty darned important. The main point of the story in the Economic Times / India Times is:
Xinhua said in an editorial: “Some Chinese Internet users who prefer to use Google still don’t realize perhaps that due to the links between Google and the American intelligence services, search histories on Google will be kept and used by the American intelligence agencies.”
Okay, that’s interesting. Several years ago, I heard a talk by a citizen in Washington, DC who made a similar comment. My recollection is that Google was pretty darned mad. I wondered if the citizen in Washington, DC was right or wrong. If another source comes up with more detail, the story becomes much more interesting.
Chinese intelligence agents are pretty savvy. And the Ministry of State Security is one of the best. I can’t remember whether Section 6 is the go-to bunch, but perhaps more information will surface.
Stephen E Arnold, March 22, 2010
A freebie. I will report non payment to DC Chief of Police who is really clued into Google’s activities in Washington.
Attensity in PR Full court Press
March 2, 2010
Risking the quacking of the addled goose, Attensity sent me a link to its “new” voice of the customer service. I have been tracking Attensity’s shift from deep extraction for content processing to customer support for a while. I posted on the GlobalETM.com site a map of search sectors, and Attensity is wisely focusing on customer support. You can read the “new” information about customer support at the company’s VOC Community Advantage page. The idea is to process content to find out if customers are a company’s pals. Revenues and legal actions can also be helpful indicators.
What interested me was the link to the Attensity blog post. “Leveraging Communities through Analytic Engines” presents an argument that organizations have useful data that can yield insights. I found this passage interesting:
Analytical engines cannot stop at simply producing a report for each community; they have to become a critical part of the platform used by the organizations to interact with and manage their customers. This platform will then integrate the content generated by all channels and all methods the organization uses to communicate, and produce great insights that can be analyzed for different channels and segments, or altogether. This analysis, and the subsequent insights, yield far more powerful customer profiles and help the organization identify needs and wants faster and better. Alas, the role of analytical engines for communities is not to analyze the community as a stand-alone channel, although there is some value on that as a starting point, but to integrate the valuable data from the communities into the rest of the data the organization collects and produce insights from this superset of feedback.
Now this is an interesting proposition. The lingo sounds a bit like that cranked out by the azure chip crowd, but that is what many search and content processing vendors do now: wordsmithing.
An “analytical engine” – obviously one like Attensity’s – is an integration service. In my opinion this elevation of a component of text processing to a much larger and vital role sounds compelling. The key word for me is “superset”. This notion of taking a component and popping it up a couple of levels is what a number of vendors are pursuing. Search is not finding. Search is a user experience. Metatagging is not indexing. Metatagging is the core function of a content management system.
I understand the need to make sales, and as my GlobalETM.com diagram shows, the effort is leading to marketing plays that focus on positioning search and content processing technologies as higher value solutions. From a marketing point of view, this makes sense. The problem is that most vendors are following this path. What happens is that the technical plumbing does one or two things quite well and then some other things not so well.
Many vendors run into trouble with connectors or performance or the need for new coding to “hook” services together. Setting Attensity aside, how many search and content processing vendors have an architecture that can scale economically, quickly, and efficiently? In my experience, scaling, performance, and flexibility – not the marketing lingo – make the difference. Just my opinion.
Stephen E Arnold, March 2, 2010
No one paid me to write this. I suppose I have to report poverty to the unemployment folks. Ooops. Out of money like some of the search and content processing vendors.
Twitter and Mining Tweets
February 21, 2010
I must admit. I get confused. There is Twitter, TWIT (a podcast network), TWIST (a podcast from another me-too outfit), and “tweets”. If I am confused, imagine the challenge for text processing and then analyzing short messages.
Without context, a brief text message can be opaque to someone my age; for example, “r u thr”. Other messages say one thing, “at the place, 5” and mean to an insider “Mary’s parents are out of town. The party is at Mary’s house at 5 pm.”
When I read “Twitter’s Plan to Analyze 100 Billion Tweets”, several thoughts struck me:
- What took so long?
- Twitter is venturing into some tricky computational thickets. Analyzing tweets (the word given to 140 character messages sent via Twitter and not to be confused with “twits”, members of the TWIT podcast network) is not easy.
- Non US law enforcement and intelligence professionals will be paying a bit more attention to the Twitter analyses because Twitter’s own outputs may be better, faster, and cheaper than setting up exotic tweet subsystems.
- Twitter makes clear that it has not analyzed its own data stream, which surprises me. I thought these young wizards were on top of data flows, not sitting back and just reacting to whatever happens.
According to the article, “Twitter is the nervous system of the Web.” This is a hypothetical, and I am not sure I buy that assertion. My view is that Google’s more diverse data flows are more useful. In fact, the metadata generated by observing flows within Buzz and Wave are potentially a leapfrog. Twitter is a bit like one of those Faith Popcorn-type of projects. Sniffing is different from getting the rare sirloin in a three star eatery in Lyon.
The write up points out that Twitter will use open source tools for the job. There are some juicy details of how Twitter will process the traffic.
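To give a feel for what even a simple pass over tweet text involves, here is a toy sketch of hashtag and mention counting. This is my own illustration, not Twitter’s pipeline; the real job runs on open source big data tools distributed across clusters, at a scale of 100 billion messages:

```python
import re
from collections import Counter

# Toy tweet analysis: tally hashtags and @mentions across a stream.
# Illustrative only; a production pipeline must also cope with far
# messier input and vastly larger volumes.

TAG_RE = re.compile(r"#(\w+)")
MENTION_RE = re.compile(r"@(\w+)")

def tally(tweets):
    """Return (hashtag_counts, mention_counts) for a list of tweets."""
    tags, mentions = Counter(), Counter()
    for tweet in tweets:
        text = tweet.lower()
        tags.update(TAG_RE.findall(text))
        mentions.update(MENTION_RE.findall(text))
    return tags, mentions

tweets = ["at the place, 5 #party", "r u thr @mary #party"]
tags, mentions = tally(tweets)
print(tags.most_common(1))  # the most frequent hashtag
```

Even this trivial counting says nothing about what “at the place, 5” actually means to the insiders, which is exactly the hard part noted above.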
A useful write up.
Stephen E Arnold, February 22, 2010
No one paid me to write this article. I will report non payment to the Department of Labor, where many are paid for every lick of work.
Global ETM Dives into Enterprise Search Intelligence
February 18, 2010
Stephen E Arnold has entered into an agreement with Global Enterprise Technology Management, an information and professional services company in the UK. Mr. Arnold has created a special focus page about enterprise search on the Global ETM Web site. The page is now available, and it features:
- A summary of the principal market sectors in enterprise search and content processing. More than a dozen sectors are identified. Each sector is plotted in a grid using Mr. Arnold’s Knowledge-Value Methodology. You can see at a glance which search sectors command higher and lower Knowledge Value to organizations. (Some of the Knowledge Value Methodology was described in Martin White’s and Stephen E. Arnold’s 2009 study Successful Enterprise Search Management.)
- A table provides a brief description of each of the search market sectors and includes hot links to representative vendors with products and services for that respective market sector. More than 30 vendors are identified in this initial special focus page.
- The page includes a selected list of major trends in enterprise search and content processing.
Mr. Arnold will be adding content to this Web page on a weekly schedule.
Information about GlobalETM is available from the firm’s Web site.
Stuart Schram IV, February 18, 2010
I am paid by ArnoldIT.com, so this is a for-fee post.
Google Flashes Star Trek Gizmo
February 8, 2010
Short honk: In 2006 one of my partners and I made a series of presentations to Big Telecommunications Companies. After about 15 minutes of introductory comments, I perceived the reaction as my bringing a couple of dead squirrels into the conference room, chopping them up, and building a fire with the telco executives’ billfolds. Chilly and hostile are positive ways to describe the reaction to my description of Google’s telecommunications related technologies. Fortunately I got paid, sort of like a losing gladiator getting buried in 24 BCE in a mass grave.
You can see telco woe when you read and think about the story in the Herald Sun, “Google Leaps Barrier with Translator Phone.” The story apparently surfaced in the paywalled London Times, but the info leaked into the world in which I live via Australia. The key point in the write up was the sentence:
If it [a Google phone with automatic translation] worked, it could eventually transform communication among speakers of the world’s 6,000-plus languages.
Well, if it worked, it means that the Googlers’ voice search, machine translation, and low latency distributed computing infrastructure will find quite a few new customers in my opinion. Think beyond talking, which is obviously really important. I wonder if entertainment executives can see what the telco executives insisted was impossible in 2006.
One president of a big cellular company in the chilly Midwest said in a very hostile tone as I recall, “Google can’t do telecommunications. It’s an ad company. We’re a telecommunications company. There’s a difference.”
Oh, is there? Bits are bits in my experience. I used to watch Star Trek, and I assert some Googlers did too.
Stephen E Arnold, February 8, 2010
No one paid me to write this. I will report non payment to the FCC, a really great entity.
An Attensity About Face?
February 3, 2010
Update, February 3, 2010, 9 pm Eastern. A person suggested that this administrative move is designed to get around one or more procurement guidelines. Sounds reasonable, but if the marketing push were ringing the cash register, would such a shift be necessary? – Stephen E Arnold
I learned that the Attensity Group has set up a business unit to sell to the Federal government. I thought Attensity’s roots were in selling to the Federal government and that the company’s diversification into marketing was a way to break free of the stereotypical vendor dependent on US government projects. Guess I was wrong again.
A reader sent me a link to this January 28, 2010, news release “Attensity Government Systems Launches as a Wholly Owned US Subsidiary of Attensity Group.” I noted this passage in the news release:
AGS offers a unique combination of the world’s leading semantic technologies: Attensity Group’s full offering of semantic engines and applications along with Inxight technologies from SAP BusinessObjects. Government agencies can now leverage — for the first time – the powerful capabilities enabled by the combination of Inxight’s multi-lingual advanced entity and event extraction with that of Attensity Group’s patented Exhaustive Extraction. Exhaustive Extraction automatically identifies and transforms the facts, opinions, requests, trends and trouble spots in unstructured text into structured, actionable intelligence and then connects it back to entities – people, places and things. This new combined solution provides researchers with the deepest and broadest capabilities for identifying issues hidden in mountains of unstructured data — inside emails, letters, social media sites, passenger manifests, websites, and more.
In my experience, this is a hybrid play. Along with consulting and engineering services, Attensity will make its proprietary solutions available.
According to Attensity, AGS, short for Attensity Government Systems:
provides semantic technologies and software applications that enable government agencies to quickly find, understand, and use information trapped in unstructured text to drive critical decision-making. AGS solutions pre-integrate nouns (entities) together with verbs, combining leading semantic technologies, such as Inxight ThingFinder, with Attensity’s unique exhaustive extraction and other semantic language capabilities. This creates a unique capability to see important relationships, create link analysis charts, easily integrate with other software packages, and connect the dots in near real-time when time is of the essence. The comprehensive suite of commercial off-the-shelf applications includes intelligence analysis, social media monitoring, voice of the citizen, automated communications response and routing, and the industry’s most extensive suite of semantic extraction technologies. With installations in intelligence, defense and civilian agencies, Attensity enables organizations to better track trends, identify patterns, detect anomalies, reduce threats, and seize opportunities faster.
I did a quick check of my files on Inxight. Similar functionality may be part of the Powerset technology that Microsoft acquired. My hunch is that Attensity wants to go after government contracts with a broader offering than its own deep extraction technology. The play makes sense, but I wonder if it will confuse the ad execs who use Attensity technology for quite different purposes than some US government agencies.
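For readers who want a concrete sense of what “entity extraction” means, here is a drastically simplified sketch. The patterns are mine and invented for illustration; Inxight ThingFinder and Attensity’s Exhaustive Extraction rely on deep linguistic analysis, not regular expressions:

```python
import re

# Naive entity extraction: pull multi-word capitalized names and
# long-form dates out of raw text. A caricature of what real
# extraction engines do with full linguistic analysis.

NAME_RE = re.compile(r"\b(?:[A-Z][a-z]+ )+[A-Z][a-z]+\b")
DATE_RE = re.compile(
    r"\b\d{1,2} (?:January|February|March|April|May|June|"
    r"July|August|September|October|November|December) \d{4}\b")

def extract(text):
    """Return a dict of candidate names and dates found in the text."""
    return {"names": NAME_RE.findall(text),
            "dates": DATE_RE.findall(text)}

sample = "Yves Schabes met analysts in New York on 28 January 2010."
print(extract(sample))
```

Linking the extracted items back to the people, places, and things they denote, across emails, manifests, and social media, is where the real engineering lives.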
Will Attensity be a front runner in this about face, or will the company build out other specialized business units? I can see a customer support unit coming from a vendor, maybe Attensity, maybe not? The bottom line is that search and content processing vendors are scrambling in order to avoid what some business school egg heads call “commoditization.”
Stephen E Arnold, February 3, 2010
No one paid me to write about vendors selling to the US government. I will report this to the US government, maybe the GAO just to show that I am intrinsically responsible.