Microsoft’s SharePoint in a Post Chrome World

September 17, 2008

CNet ran an interesting story on September 9, 2008 with the fetching title “Microsoft’s Response to Chrome. SharePoint.” The author was Matt Asay, a fellow whose viewpoint I enjoy. For me, the key point of the article, which you can read here, was:

Microsoft, then, has not been sitting still, waiting to be run over by Google. It has been quietly spreading SharePoint throughout enterprises. SharePoint opens up enterprise data to Microsoft services, running in Microsoft’s browser. Unlike Google, however, Microsoft already has an impressive beachhead in the enterprise. It’s called Office, and most enterprises are addicted to it. In sum, if Google is aiming for Windows, it’s going to lose, because the table stakes are much higher. For Microsoft, the game is SharePoint. For the rest of the industry, including Google, the response needs to be content standardization.

The battle between Google and Microsoft pivots on content. SharePoint is Microsoft’s content standardization play. I think this argument is interesting, but a handful of modest issues nagged at me when I read the article:

  1. SharePoint is a complicated collection of “stuff”. You can check out the SharePoint placemat here. Complexity may be the major weakness of SharePoint.
  2. SharePoint search is a work in progress. If you have lots of content, even if it is standardized, the native SharePoint search function is pretty awful. I find it even more awful when I have to configure it, chase down aberrant security settings, and mud wrestle SQL Server performance. I think this is an iceberg issue for Microsoft. The marketing shows the top; the tech folks see what’s hidden. It’s not pretty.
  3. Google’s approach to content standardization is different from the SharePoint approach Mr. Asay describes. The GOOG wants software to transform and manipulate content. The organization can do what it wants to create information. Googzilla can handle it, make it searchable, and even repurpose it with one of its “publishing” inventions disclosed in patent documents.

I hear Mr. Asay. I just don’t think SharePoint is the “shields up” that Microsoft needs to deal with Google in the enterprise. Agree? Disagree? Help me learn, please.

Stephen Arnold, September 10, 2008

Attensity and BzzAgent: What’s the Angle

September 14, 2008

Attensity made a splash in the US intelligence community after 2001. A quick review of Attensity’s news releases suggests that the company began shifting its marketing emphasis from In-Q-Tel related entities to the enterprise in 2004-2005. By 2006, the company was sharpening its focus on customer support. Now Attensity offers a wider range of technologies to organizations that want to use its tools to deal with their customers.

In August 2008, the company announced that it had teamed up with the oddly named BzzAgent, a specialist in word of mouth media, to provide insights into consumer conversations. You can learn more about WOM–that is, word of mouth marketing–at the company’s Web site here.

The Attensity technology makes it possible for BzzAgent to squeeze meaning out of email or any other text. With the outputs of the Attensity system, BzzAgent can figure out whether a product is getting marketing lift or down draft. Other functionality provides beefier metrics to buttress BzzAgent’s technology.

The purpose of this post is to ask a broader question about content processing and text analytics. To close, I want to offer a comment about the need to find places to sell rocket science information technology.

Why Chase Customer Support?

The big question is, “Why chase customer support?” Call centers, self service Web sites, and online bulletin board systems have replaced people in many organizations. In an effort to slash the cost of support, organizations have outsourced help to countries with lower wages than the organization’s home country. In an interesting twist of fate, Indian software outsourcing firms are sending some programming and technical work back to the US. Atlanta has been a beneficiary of this reverse outsourcing, according to my source in the Peach State.

Attensity’s technology performs what the company once described as “deep extraction.” The idea is to iterate through source documents. The process outputs metadata, entities, and a wide range of data that one can slice, dice, chart, and analyze. Attensity’s technology is quite advanced, and it can be tricky to optimize to get the best performance from the system on a particular domain of content.
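Attensity’s actual pipeline is proprietary, but a toy sketch may help make the idea concrete: iterate over documents, pull out entities and simple metadata, and emit records an analyst can slice, chart, and analyze. The patterns, field names, and sample document below are my illustrative assumptions, not Attensity’s method.

```python
# Toy sketch of extraction-style processing: iterate over documents, pull out
# simple "entities" and metadata, and emit analyzable records.
# Patterns, field names, and the sample document are illustrative assumptions.
import re
from dataclasses import dataclass, field
from typing import List

@dataclass
class ExtractionRecord:
    doc_id: str
    emails: List[str] = field(default_factory=list)
    money: List[str] = field(default_factory=list)
    word_count: int = 0

def extract(doc_id: str, text: str) -> ExtractionRecord:
    return ExtractionRecord(
        doc_id=doc_id,
        emails=re.findall(r"[\w.+-]+@[\w-]+\.[\w.]+", text),
        money=re.findall(r"\$\d+(?:\.\d{2})?", text),
        word_count=len(text.split()),
    )

docs = {"ticket-42": "Please refund $80.00 to jane.doe@example.com for the overcharge."}
records = [extract(doc_id, text) for doc_id, text in docs.items()]
print(records[0])
```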

Customer support appears to be a niche that functions like a hamburger to a hungry fly buzzing around tailgaters at the college football game. Customer support, despite vendors’ efforts to reduce costs and keep customers happy, has embraced every conceivable technology. There are the “live chat” telepresence services. These work fine until the company realizes that customers may be in time zones where the company is not open for business. There are the smart systems like the one Yahoo deployed using InQuira’s technology. To see how this works, navigate to Yahoo help central, type the question “How do I cancel premium email?”, and check out the answers. There are even more sophisticated systems deployed using tools from such companies as RightNow. This firm offers work flow tools and consulting to improve customer support services and operations.

The reason is simple–customer support remains a problem, or as the marketers say, “An opportunity.” I know that I avoid customer support whenever possible. Here’s a typical example. Verizon sent me a flier that told me I could reduce my monthly wireless broadband bill from $80 to $60. It took a Web site visit and six telephone calls to find out that the lower price came with a five gigabyte bandwidth cap. Not only was I stressed by the bum customer support experience, I was annoyed at what I perceived, rightly or wrongly, as the duplicity of the promotion. Software vendors jump at the chance to license Verizon a better mousetrap. So far, costs may have come down for Verizon, but this mouse remains far away from the mouse trap.

The new spin on customer support rotates around one idea: find out stuff *before* the customer calls, visits the Web site, or fires up a telepresence session.

That’s where Attensity’s focus narrows its beam. Attensity’s rocket science technology can support zippy new angles on customer support; for example, BzzAgent’s early warning system.

What’s This Mean for Search and Content Processing?

For me that is the $64 question. Here’s what I think:

  1. Companies like Attensity are working hard to find niches where their text analytics tools can make a difference. By signing licensing deals with third parties like BzzAgent, Attensity gets some revenue and shifts the cost of sales to BzzAgent’s team.
  2. Embedding or inserting Attensity’s technology into BzzAgent’s systems deemphasizes or possibly eliminates the brand “Attensity” from the customers’ radar. Licensing deals deliver revenue with a concomitant loss of identity. Either way, text analytics moves from center stage to a supporting role.
  3. The key to success in Attensity’s marketing shift is getting to the new customers first. A stampede is building as other search and content processing vendors follow a very similar strategy. Saturation will lower prices, making the customer support sector less attractive to text processing companies than it is now. ClearForest was an early entrant, but now the herd is arriving.

The net net for me is that Attensity has been nimble. What will the arrival of other competitors in the customer support and call center space mean for this niche? My hunch is that search and content processing is quickly becoming a commodity. Companies just discovering the customer support market will have to displace established vendors such as InQuira and Attensity.

Search and content processing certainly appear to be headed rapidly toward commoditization unless the vendor can come up with a magnetic value add.

Stephen Arnold, September 14, 2008

eDiscovery: Speed Bumps Annoy Billing Attorneys

September 12, 2008

A happy quack to my Australian reader who called “eDiscovery Performance Still a Worry” to my attention. The article by Greg McNevin appeared on the IDM.net.au Web site on September 10, 2008. The main point of the write up is that 60 percent of those polled about their organization’s eDiscovery litigation support system said, “Dog slow.” The more felicitous wording chosen by Mr. McNevin was:

The survey also found that despite 80 percent of organisations claiming to have made an investment in IT to address discovery challenges, 60 percent of respondents think their IT department is not always able to deliver information quickly enough for them to do their legal job efficiently.

The survey was conducted by Dynamic Markets, which polled 300 in-house legal eagles in the UK, Germany, and the Netherlands. My hunch is that the 60 percent figure may well apply in North America as well. My own research unearthed the fact that two thirds of the users of enterprise search systems were dissatisfied with those systems. The 60 percent score matches up well.

In my view, the larger implication of this CommVault study is that when it comes to text and content processing, more than half the users go away annoyed or use the system whilst grumbling and complaining.

What are vendors doing? There’s quite a bit of activity in the eDiscovery arena. More gladiators arrive to take the place of those who fall on their swords, get bought as trophies, or die at the hands of another gladiator. Sadly, the activity does not address the issue of speed. In this context, “speed” is not three millisecond response time. “Speed” means transforming content, updating indexes, and generating the reports needed to figure out what information is where in the discovered material.

Many vendors are counting on Intel to solve the “speed” problem. I don’t think faster chips will do much, however. The “speed” problem is that eDiscovery relies on a great many processes. Lawyers, in general, focus on what’s required to meet a deadline. There’s little reason for them to trouble their keen legal minds with such details as content throughput, malformed XML, flawed metatagging, and trashed indexes after an index update.

eDiscovery’s dissatisfaction score mirrors the larger problems with search and content processing. There’s no fix coming that will convert a grim black and white image to a Kodachrome version of reality.

Stephen Arnold, September 12, 2008

Search: Google’s 10 Percent Problem

September 11, 2008

I love it when Google explains the future of search. Since Google equals search for more than 70 percent of the users in North America and even more outside the US, the future of search means Google. And what does Google’s helpful Web log here tell us:

So what’s our straightforward definition of the ideal search engine? Your best friend with instant access to all the world’s facts and a photographic memory of everything you’ve seen and know. That search engine could tailor answers to you based on your preferences, your existing knowledge and the best available information; it could ask for clarification and present the answers in whatever setting or media worked best. That ideal search engine could have easily and elegantly quenched my withdrawal and fueled my addiction on Saturday.

The “universal search” play announced at the hastily conceived Searchology news conference–anyone remember that?–has fallen by the wayside. I have wondered if the BearStearns’ publication of the Google Programmable Search Engine report and the suggestion that Google may be angling to become the Semantic Web spawned that Searchology program.

I don’t think search is a 10 percent problem for Google. The problem is bandwidth, regulations, traffic, and the market. After digging through Google’s technical papers and patent documents, I have reached the conclusion that the GOOG has the basics in place for next-generation search; for example:

  • Search without search
  • Dossier generation
  • Predictive content assembly
  • Integration of multiple functions because “search” is simply a way station on the path to solving a problem.

Most of the search pundits getting regular paychecks for now from mid level consulting firms assert that we are at the first step or Day One of a long journey with regard to search. Sorry, mid range MBAs. Search–key word variety–has been nailed. Meeting the needs of the herd searcher–nailed. Personalization of results–nailed.

What’s next are search-based solutions. The reason that vendors are chasing niches like eDiscovery and call center support is simple: these are problems that can be addressed in part by information access.

Meanwhile the GOOG sits in its lair and ponders when and how to release to maximum advantage the PSE, dataspaces, “I’m feeling doubly lucky”, and dozens of other next generation search goodies, including social. Keep in mind that the notion of clicks is a social function. Google’s been social since the early days of BackRub.

There you have it. Google has a 10 percent challenge. In my opinion, that last 10 percent will be tough. Lawyers and other statistically messy non-algorithmic operations may now govern Googzilla’s future. If you want links to these Google references, you can find them here. My rescue boxer Tess needs special medical attention, so you have to buy my studies for the details. Sorry. Rescue boxers come before free Web log readers. Such is life. Sigh.

Stephen Arnold, September 11, 2008

Google Chrome: What’s It Mean

September 10, 2008

Author’s Note: this post is speculation about the “meaning” of chrome.

Over the weekend, I spoke with a colleague who was interested in the metaphor behind Google’s choice of the word chrome as the name for the beta of the Google browser. There’s a firestorm of controversy raging over what that Google browser is. I want to steer clear of that discussion. I have written about Google’s technology elsewhere and concluded in 2005 that Google is now building applications for its infrastructure. The browser is just an application, which means that it is not “just” a browser.

Back to our conversation: chrome is an interesting choice. I argued that the meaning of “chrome” was a bright, shiny surface, tougher than the lower grade compound to which it is applied. I was thinking of the bumpers on my restored 1973 Grandville convertible, which gets an awesome five miles to the gallon.

The first metaphor, then, is a shiny, hard surface. Could Google Chrome make the innards of Google more attractive? If so, it follows that the surface would protect the underlying parts. Makes sense to me. I think this “meaning” works quite well.

Chrome also is an alternative name for the Oxygene programming language. Based on Object Pascal, Chrome is adept at lambda expressions. Could the meaning of chrome be a reference to the functions of this specialized programming language? I think this is an outlier. More information about this language is available here.

Chrome carries the connotation of bright colors and hyper reality. The source for this interpretation is Kodak Kodachrome transparency film. John Evans, a professional photographer based in Pittsburgh, told me, “Kodachrome makes nuclear power plants look good.” Maybe? I do like the suggestion of heightening reality. Could Google Chrome heighten the reality of a browser experience?

Chrome is a fictional mutant character in Marvel Comics’ Universe. I often refer to Google as Googzilla. I must admit I have a predisposition to this “meaning” of chrome.

Chrome refers to music. There’s an XM Radio channel by that name, an album by Trace Adkins, who is popular in rural Kentucky, and a track on Debbie Harry’s album Koo Koo.

What does this tell us? Not much I fear.

Stephen Arnold, September 14, 2008

First Search Mini-Profile: Stratify

September 9, 2008

Beyond Search has started its search and content processing mini-profile series.

The first profile is about Stratify, and you can read it here.

The goal is to publish each week a brief snapshot of selected search and content processing vendors. The format of each profile will be a short essay that covers the background of the system, its principal features, strengths, weaknesses, and an observation. The idea inspiring each profile is to create a basic summary. Each vendor is invited to post additional information, links, and updates. On a schedule yet to be determined, each mini-profile will be updated and the comments providing new information deleted. This approach allows a reasonable trade off between editorial control and vendor supplements. We will try to adhere to the weekly schedule. Our “Search Wizards Speak” series has been well received, and we will add interviews, but the interest in profiles has been good. Remember: you don’t need to write me “off the record” or, even worse, call me to provide insights, updates, and emendations. Please use the comments section for each profile. I have other work to do, and while I enjoy meeting new people via email and the phone, the volume of messages to me is rising rapidly. Enjoy the Stratify post. You will find the profiles under the “Profile” tab on the splash page for the Web log. I will post a short news item when a new profile becomes available. Each profile will be indexed with the key word “profile”.

Stephen Arnold, September 9, 2008

Text Processing: Why Servers Choke

September 6, 2008

Resource Shelf posted a link to a Hewlett Packard Labs paper. Great find. You can download the HP write up here (verified at 7 pm Eastern on September 5, 2008). The paper argues that an HP innovation can process text at the rate of 100 megabytes per second per processor core. That’s quite fast. The value of the paper for me was that the authors of “Extremely Fast Text Feature Extraction for Classification and Indexing” have done a thorough job of providing data about the performance of certain text processing systems. If you’ve been wondering how slow Lucene is, this paper gives you some metrics. The data seem to suggest that Lucene is a very slow horse in a slow race.

Another highlight of George Forman and Evan Kirshenbaum’s write up was this statement:

Multiple disks or a 100 gigabit Ethernet feed from many client computers may certainly increase the input rate, but ultimately (multi-core) processing technology is getting faster faster than I/O bandwidth is getting faster. One potential avenue for future work is to push the general-purpose text feature extraction algorithm closer to the disk hardware. That is, for each file or block read, the disk controller itself could distill the bag-of-words representation and then transfer only this small amount of data to the general-purpose processor. This could enable much higher indexing or classification scanning rates than is currently feasible. Another potential avenue is to investigate varying the hash function to improve classification performance, e.g. to avoid a particularly unfortunate collision between an important, predictive feature and a more frequent word that masks it.

When I read this, two thoughts came to mind:

  1. Search vendors counting on new multi core CPUs to solve performance problems won’t get the speed ups needed to make some systems process content more quickly. Bad news for one vendor whose system I just analyzed for a company convinced that performance is a strategic advantage. In short, slow loses.
  2. As more content is processed and short cuts are taken, hash collisions can reduce the usefulness of the value-added processing: a query returns unexpected results. Much of the HP speed up is a series of short cuts, and short cuts can undermine what matters most to the user–getting the information needed to meet a need. A small sketch of the collision risk follows this list.
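To make the collision point concrete, here is a minimal sketch of hashed “bag of words” feature extraction. The tokenizer, hash function, and bucket count are assumptions for illustration; they are not the HP authors’ implementation.

```python
# Minimal sketch of hashed bag-of-words feature extraction.
# Tokenizer, hash function, and bucket count are illustrative assumptions.
import re
from collections import Counter

NUM_BUCKETS = 2 ** 10  # deliberately small so collisions are plausible

def hashed_features(text: str) -> Counter:
    """Map each token to a hash bucket; the per-bucket counts form the feature vector."""
    counts = Counter()
    for token in re.findall(r"[a-z]+", text.lower()):
        counts[hash(token) % NUM_BUCKETS] += 1
    return counts

doc = "jobless claims rise as unemployed workers file new claims"
print(hashed_features(doc).most_common(3))

# The shortcut's risk: distinct words that land in the same bucket become
# indistinguishable downstream, so a rare, predictive term can be masked by a
# frequent one -- the "unfortunate collision" the paper mentions.
buckets = {}
for token in set(re.findall(r"[a-z]+", doc.lower())):
    buckets.setdefault(hash(token) % NUM_BUCKETS, []).append(token)
collisions = {b: toks for b, toks in buckets.items() if len(toks) > 1}
print("colliding tokens:", collisions or "none in this tiny example")
```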

I urge you to read this paper. Quite a good piece of work. If you have other thoughts about this paper, please, share them.

Stephen Arnold, September 6, 2008

Intel and Search

September 5, 2008

True, this is a Web log posting, but I am interested in search thoughts from Intel or its employees. I found the post “Why I Will Never Own an Electronic Book” interesting. I can’t decide whether the post is suggestive or naive. You can read the post by Clay Breshears here. On the surface, Mr. Breshears is pointing out that ebook readers’ search systems are able to locate key words. He wants these generally lousy devices to sport NLP or natural language processing. The portion of the post that caught my attention was:

We need better natural language processing and recognition in our search technology.  Better algorithms along with parallel processing is going to be the key.  Larger memory space will also be needed in these devices to hold thesaurus entries that can find the link between “unemployed” and “jobless” when the search is asked to find the former but only sees the latter.  Maybe, just maybe, when we get to something like that level of sophistication in e-book devices, then I might be interested in getting one.

Intel invested some money in Endeca. Endeca gets cash, and it seems likely that Intel may provide Endeca with some guidance with regard to Intel’s next generation multi core processors. In 2000, Intel showed interest in getting into the search business with its exciting deal with Convera. I have heard references to Intel’s interest in content processing. The references touch upon the new CPUs’ computational capability. Most of this horsepower goes unused, and the grape vine suggests that putting some content pre-processing functions in an appliance, firmware, or on the CPU die itself might make sense.

This Web log post may be a one-off comment. On the other hand, this ebook post might hint at other, more substantive conversations about search and content processing within Intel. There’s probably nothing to these rumors, but $10 million signals a modicum of interest from my vantage point in rural Kentucky.

Stephen Arnold, September 5, 2008

Why Dataspaces Matter

August 30, 2008

My posts have been whipping super-wizards into action. I don’t want to disappoint anyone over the long American “end of summer” holiday. Let’s consider a problem in information retrieval and then answer in a very brief way why dataspaces matter. No, this is not a typographical error.

Set Up

A dataspace is somewhat different from a database. Databases can sit within a dataspace, but a dataspace can also encompass other information objects, garden variety metadata, and new types of metadata which I like to call meta metadata, among other things. These are represented in an index. For our purpose, we don’t have to worry about the type of index. We’re going to look up something in any of the indexes that represent our dataspace. You can learn more about dataspaces in the IDC report #213562, published on August 28, 2008. It’s a for fee write up, and I don’t have a copy. I just contribute; I don’t own these analyses published by blue chip firms.

Now let’s consider an interesting problem. We want to index people, figure out what those people know about, and then generate results to a query such as “Who’s an expert on Google?” If you run this query on Google, you get a list of hits like this.

[Screenshot: Google results for the query “Google expert”]

This is not what I want. I require a list of people who are experts on Google. Does Live.com deliver this type of output? Here’s the same query on the Microsoft system:

[Screenshot: Live.com results for the same query]

Same problem.

Now let’s try the query on Cluuz.com, a system that I have written about a couple of times. Run the query “Jayant Madhavan” and I get this:

[Screenshot: Cluuz.com results for the query “Jayant Madhavan”]

I don’t have an expert result list, but I have a wizard and direct links to people Dr. Madhavan knows. I can make the assumption that some of these people will be experts.

If I work in a company, the firm may have the Tacit system. This commercial vendor makes it possible to search for a person with expertise. I can get some of this functionality in the baked in search system provided with SharePoint. The Microsoft method relies on the number of documents a person known to the system writes on a topic, but that’s better than nothing; a crude sketch of this document-counting approach appears below. I could, if I were working in a certain US government agency, use the MITRE system that delivers a list of experts. The MITRE system is not one whose screen shots I can show, but if you have a friend in a certain government agency, maybe you can take a peek.
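For what it’s worth, here is that document-counting approach in miniature: score each author by how many of his or her documents mention the topic. The sample data and scoring rule are my own illustrative assumptions, not SharePoint’s actual implementation.

```python
# Crude document-count expertise scoring: rank authors by how many of their
# documents mention the topic. Illustrative assumption, not SharePoint's method.
from collections import Counter

documents = [
    {"author": "analyst_a", "text": "Notes on Google dataspaces and structured data."},
    {"author": "analyst_a", "text": "Dataspaces versus conventional databases."},
    {"author": "analyst_b", "text": "Quarterly sales figures for the region."},
]

def experts_for(topic: str, docs) -> list:
    scores = Counter()
    for doc in docs:
        if topic.lower() in doc["text"].lower():
            scores[doc["author"]] += 1
    return scores.most_common()

print(experts_for("dataspaces", documents))
# [('analyst_a', 2)] -- a count of documents, which is not the same as expertise
```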

None of these systems really do what I want.

Enter Dataspaces

The idea for a dataspace is to process the available information. Some folks call this transformation, and it really helps to have systems and methods to transform, normalize, parse, tag, and crunch the source information. It also helps to monitor the message traffic for some of that meta metadata goodness. An example of meta metadata comes from email. I want to index who received the email, who forwarded the email to whom and when, and any cutting or copying of information from the email into other documents, along with the people who have access to said information. You get the idea. Meta metadata is where the rubber meets the road in determining what’s important regarding information in a dataspace.
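A hypothetical record for one email might look like the sketch below. The field names and sample events are my assumptions about the kind of meta metadata described above, not a Google or IDC specification.

```python
# Hypothetical "meta metadata" record for a single email: who received it,
# who forwarded it to whom and when, and where its contents were copied.
# Field names and sample values are illustrative assumptions.
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Tuple

@dataclass
class EmailMetaMetadata:
    message_id: str
    recipients: List[str]
    forwards: List[Tuple[str, str, datetime]] = field(default_factory=list)  # (from, to, when)
    excerpts: List[Tuple[str, List[str]]] = field(default_factory=list)      # (target doc, who can read it)

record = EmailMetaMetadata(
    message_id="msg-001",
    recipients=["analyst@example.com"],
    forwards=[("analyst@example.com", "manager@example.com", datetime(2008, 8, 29, 9, 15))],
    excerpts=[("q3-plan.doc", ["manager@example.com", "finance@example.com"])],
)

# An index over records like this lets a dataspace answer "who saw this
# information and where did it travel?" rather than just "which documents match?"
print(record)
```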

Read more

Dataspaces Analysis Available

August 29, 2008

IDC, the research giant near Boston, has issued for its paying customers “Google: A Push Beyond Databases”. The write up is part of the firm’s Technology Assessment series. Sue Feldman, the IDC search and content processing lead analyst and industry expert, is the lead author. I provided some background rowing. The result is a useful first look at a Google initiative that has been rolling along since 2006. The 12-page document provides a brief definition of dataspaces, a timeline of key events, and several peeks into the technology and its applications. Ms. Feldman and I teamed up to outline some of the implications that we identified. If you want a copy of this document, you will have to contact IDC for document #213562. If your company has an IDC account, you can obtain the document directly. If you wish to purchase a copy of this report, navigate to http://www.idc.com/ and click on the “Contact” link. As with my BearStearns’ Google analyses, I am not able to release these documents. I’m sure others know about dataspaces, but I find the topic somewhat fresh and quite suggestive.

This report is particularly significant in light of Google’s making its “golden oldie” technology MapReduce available to Aster Data and Greenplum. You can read about this here. Last year, I spoke with representatives of IBM and Oracle. I asked about their perceptions of Google in the database and data management business. Representatives of both companies assured me that Google was not interested in this business. Earlier this year, via a government client, I learned that IBM’s senior managers see Google as a company that is fully understood by the top brass of the White Plains giant. My thought is that it must be wonderful to know so much about Google, its deal for MapReduce, and now the dataspace technology before anyone else learns of these innovations. The dataspace write up, therefore, will be of interest to those who lack the knowledge and insight of IBM and Oracle wizards.

Stephen Arnold, August 29, 2008
