Google: With Maturity Cometh Fear

September 15, 2008

CIOL News reported on September 13, 2008, “Google Mobile Chief Says Can’t Afford a Dud.” You can read the story by Yinka Adegoke and Eric Auchard here. The peg for the write up is that a Googler (Andy Rubin, director of mobile platforms) told folks that Android had to be a success. Not long ago, Google would roll out a beta and walk away carefree. Now, it seems, the company recognizes that a foul up with Android might chip one of Googlzilla’s fangs. CIOL News does a good job of summarizing the promise, the disappointments, and the status of Android. For me, the most important statement in the article was this passage:

Google plans its own software store, called Android Market. “It’s not necessarily the operating system software that is the unifying factor, it is the marketplace,” Rubin said. Unlike Apple, Google does not expect to generate revenue by selling applications or to share revenue with partners. “We made a strategic decision not to revenue share with the developers. We will basically pass through any revenue to the carrier or the developer,” said Rubin.

I found this interesting, but a trifle off center with some of the research I have done for my two Google studies here. Let me highlight three thoughts and invite you to purchase a copy of my studies to get more detail.

First, Google’s telephony related inventions span a wide range of technologies. While the marketplace is important, the investment Google has made in its telco inventions suggests that the marketplace may be the current focus, not the only focus, particularly over a span of years.

Second, Google, like Microsoft, is behind the eight ball when it comes to Apple. The iPhone is a game changer, and the ecosystem Apple has in place, which is already generating money, has momentum. Google and Microsoft have words and some devices that are not yet in the iPhone’s league.

Third, mobile is a big deal, and I found a number of patent documents that suggest that Google is headed down the path to a walled garden. Right now, I don’t think that aspect of the Google strategy has been analyzed fully. The battle, therefore, may not be the one that most pundits write about; namely, Google and Microsoft. There are other wars to fight and soon.

Agree? Disagree? Help me learn.

Stephen Arnold, September 15, 2008

Future of Business Intelligence

September 15, 2008

Chris Webb penned a thoughtful and interesting article about the future of business intelligence, “Google, Panorama, and the Future of BI,” here. A number of the comments touch upon delivering business intelligence from the cloud. Take a look at his write up. For me the most interesting point was:

It [cloud based business intelligence] all depends on how quickly the likes of Google and Microsoft (which is supposedly going to be revealing more about its online services platform soon) can deliver usable online apps; they have the deep pockets to be able to finance these apps for a few releases while they grow into something people want to use…

What struck me about this comment is that it suggests that the future of business intelligence will be determined by two companies who are not particularly well known for their business intelligence offerings. What becomes of SAP, SAS, and SPSS (just to name the companies whose names begin with “s”)?

What do you think? A two horse race or a couple of nags not sure where the race track is? Let me know.

Stephen Arnold, September 15, 2008

Privacy: One of Google’s Seven Deadly Sins?

September 15, 2008

The Register, CNet, and other Web information services were abuzz over Google’s clarification of its data retention policy. The story originated on CNet in an article keyboarded by Chris Soghoian here. The story was titled “Debunking Google’s Log Anonymization Propaganda.” In a real coup, Mr. Soghoian elicited a response from the GOOG that will be a favorite of mine for many years to come. Mr. Soghoian asked Google for clarification, and the GOOG replied:

After nine months, we will change some of the bits in the IP address in the logs; after 18 months we remove the last eight bits in the IP address and change the cookie information. We’re still developing the precise technical methods and approach to this, but we believe these changes will be a significant addition to protecting user privacy…. It is difficult to guarantee complete anonymization, but we believe these changes will make it very unlikely users could be identified…. We hope to be able to add the 9-month anonymization process to our existing 18-month process by early 2009, or even earlier.
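To make the phrase “remove the last eight bits” concrete, here is a minimal sketch, assuming IPv4 addresses and nothing more than the Python standard library. It is not Google’s code; the company says it is still working out the precise technical method.

```python
import ipaddress

def anonymize_ip(ip_string: str) -> str:
    """Zero the last eight bits of an IPv4 address, e.g. 203.0.113.87 -> 203.0.113.0."""
    ip = ipaddress.ip_address(ip_string)
    if isinstance(ip, ipaddress.IPv4Address):
        # Masking the final octet leaves only a /24 network identifier in the log.
        return str(ipaddress.ip_network(f"{ip}/24", strict=False).network_address)
    # IPv6 is left untouched here; the 2008 discussion concerned IPv4 logs.
    return ip_string

print(anonymize_ip("203.0.113.87"))  # prints 203.0.113.0
```

Note that a masked address still narrows a user to one of at most 256 addresses on a single network, and the cookie identifier rides along untouched for 18 months, which is the point the Register hammers on below.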

The Register picked up the story here in “Google’s Privacy Reform Is a Hoax.” For me the most interesting point in that article was:

What Google plans on doing means that it will still be able to track its users’ web search histories longer than nine months. And if, as one might be forgiven for suspecting, Google never clears users’ cookie identifiers, then it can track them forever. Without clearing its users’ cookie identifiers, Google’s widely praised, supposed “reform” of its individually identifying data retention practices is meaningless, and no true reform.

I am now making a catalog of Google’s Seven Deadly Sins. Privacy is definitely a candidate, or should I consider mendacity? Watch this Web log for my decision and the other six sins. I may need to expand the limit in this Zeta Function. Stay tuned.

Stephen Arnold, September 15, 2008

More HP Search Related Information

September 14, 2008

One of my two or three readers sent along some kind words about Hewlett Packard. According to this professional, HP has been writing interesting white papers about search for a number of years. I dipped into the company’s Web site, and it seemed to me that HP was turning up the heat on search and content processing. I wanted to pass along one white paper, recommended by your fellow reader, that I found quite interesting. The subject is Lucene, the open source search engine that lurks at the heart of the IBM Yahoo “free” search system. The paper, by Mark Butler and James Rutherford, is “Distributed Lucene: A Distributed Free Text Index for Hadoop.” Free is good. Hadoop is better. Distributed is the best. The paper became available in June 2008, and you can download it by navigating to http://www.hpl.hp.com/techreports/2008/HPL-2008-64.pdf. I have had quite a bit of trouble locating information on the HP Web sites, so I can’t guarantee that this link will be valid for months. I verified it on September 9, 2008. The paper is useful, but I particularly liked Section 1.2.6, “Current Limitations”. Enjoy, and a happy quack to the canny Beyond Search reader who submitted this link. The goose loves you. Your auto’s paint job is safe–for now.

Stephen Arnold, September 14, 2008

Attensity and BzzAgent: What’s the Angle

September 14, 2008

Attensity made a splash in the US intelligence community after 2001. A quick review of Attensity’s news releases suggests that the company began shifting its marketing emphasis from In-Q-Tel related entities to the enterprise in 2004-2005. By 2006, the company was sharpening its focus on customer support. Now Attensity is offering a wider range of technologies to organizations that want to use its tools to deal with their customers.

In August 2008, the company announced that it had teamed up with the oddly named BzzAgent to provide insights into consumer conversations. BzzAgent is a specialist in word of mouth media. You can learn more about WOM–that is, word of mouth marketing–at the company’s Web site here.

The Attensity technology makes it possible for BzzAgent to squeeze meaning out of email or any other text. With the outputs of the Attensity system, BzzAgent can figure out whether a product is getting marketing lift or downdraft. Other functionality provides beefier metrics to buttress BzzAgent’s technology.

The purpose of this post is to ask a broader question about content processing and text analytics. To close, I want to offer a comment about the need to find places to sell rocket science information technology.

Why Chase Customer Support?

The big question is, “Why chase customer support?” Call centers, self service Web sites, and online bulletin board systems have replaced people in many organizations. In an effort to slash the cost of support, organizations have outsourced help to countries with lower wages than the organization’s home country. In an interesting twist of fate, Indian software outsourcing firms are sending some programming and technical work back to the US. Atlanta has been a beneficiary of this reverse outsourcing, according to my source in the Peach State.

Attensity’s technology performs what the company once described as “deep extraction.” The idea is to iterate through source documents. The process outputs metadata, entities, and a wide range of data that one can slice, dice, chart, and analyze. Attensity’s technology is quite advanced, and it can be tricky to optimize to get the best performance from the system on a particular domain of content.
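To make that iterate-and-output pattern concrete, here is a toy sketch in Python. Everything in it is invented for illustration: the product pattern, the sentiment word lists, and the field names have nothing to do with Attensity’s actual engine, which does far more than keyword matching.

```python
import re
from dataclasses import dataclass, field

# Toy illustration of the "deep extraction" idea: iterate over source documents
# and emit structured records (entities plus simple sentiment cues) that
# downstream reporting tools can slice, dice, chart, and analyze.

@dataclass
class ExtractionRecord:
    doc_id: int
    products: list = field(default_factory=list)
    sentiment_cues: list = field(default_factory=list)

PRODUCT_PATTERN = re.compile(r"\b(Widget\s?X|Gizmo\s?Pro)\b", re.IGNORECASE)
POSITIVE = {"love", "great", "recommend"}
NEGATIVE = {"broken", "refund", "terrible"}

def extract(documents):
    for doc_id, text in enumerate(documents):
        record = ExtractionRecord(doc_id=doc_id)
        record.products = PRODUCT_PATTERN.findall(text)
        words = set(re.findall(r"[a-z']+", text.lower()))
        record.sentiment_cues = sorted((words & POSITIVE) | (words & NEGATIVE))
        yield record

emails = [
    "I love my Widget X and recommend it to everyone.",
    "The Gizmo Pro arrived broken; I want a refund.",
]
for rec in extract(emails):
    print(rec)
```

The point of the sketch is the shape of the output, not the linguistics: each document goes in as raw text and comes out as a structured record that BzzAgent-style reporting tools can count, chart, and trend.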

Customer support appears to be a niche that functions like a hamburger to a hungry fly buzzing around tailgaters at the college football game. Customer support, despite vendors’ efforts to reduce costs and keep customers happy, has embraced every conceivable technology. There are the “live chat” telepresence services. These work fine until the company realizes that customers may be in time zones where the company is not open for business. There are the smart systems like the one Yahoo deployed using InQuira’s technology. To see how this works, navigate to Yahoo help central, type this question “How do I can premium email?”, and check out the answers. There are even more sophisticated systems deployed using tools from such companies as RightNow. This firm includes work flow tools and consulting to improve customer support services and operations.

The reason is simple–customer support remains a problem, or as the marketers say, “An opportunity.” I know that I avoid customer support whenever possible. Here’s a typical example. Verizon sent me a flier that told me I could reduce my monthly wireless broadband bill from $80 to $60. It took a Web site visit and six telephone calls to find out that the lower price came with a five gigabyte bandwidth cap. Not only was I stressed by the bum customer support experience, I was annoyed at what I perceived rightly or wrongly as the duplicity of the promotion. Software vendors jump at the chance to license a better mousetrap to Verizon. So far, costs may have come down for Verizon, but this mouse remains far away from the mouse trap.

The new spin on customer support rotates around one idea: find out stuff *before* the customer calls, visits the Web site, or fires up a telepresence session.

That’s where Attensity’s focus narrows its beam. Attensity’s rocket science technology can support zippy new angles on customer support; for example, BzzAgent’s early warning system.

What’s This Mean for Search and Content Processing?

For me that is the $64 question. Here’s what I think:

  1. Companies like Attensity are working hard to find niches where their text analytics tools can make a difference. By signing licensing deals with third parties like BzzAgent, Attensity gets some revenue and shifts the cost of sales to BzzAgent’s team.
  2. Attensity’s embedding or inserting its technology into BzzAgent’s systems deemphasizes or possibly eliminates the brand “Attensity” from the customers’ radar. Licensing deals deliver revenue with a concomitant loss of identity. Either way, text analytics moves from center stage to a supporting role.
  3. The key to success in Attensity’s marketing shift is getting to the new customers first. A stampede is building from other search and content processing vendors to follow a very similar strategy. Saturation will lower prices, which will have the effect of making the customer support sector less attractive to text processing companies than it is now. ClearForest was an early entrant, but now the herd is arriving.

The net net for me is that Attensity has been nimble. What will the arrival of other competitors in the customer support and call center space mean for this niche? My hunch is that search and content processing is quickly becoming a commodity. Companies just discovering the customer support market will have to displace established vendors such as InQuira and Attensity.

Search and content processing certainly appear to be headed rapidly toward commoditization unless the vendor can come up with a magnetic value add.

Stephen Arnold, September 14, 2008

Google: How Many Achilles’ Heels

September 14, 2008

The essay “Another Step to Protect User Privacy” on the Official Google Blog triggered a wave of commentary on Web logs and news services. I’m writing this commentary on September 9, 2008, but it will post on Sunday, September 14, 2008, as I fly to the Netherlands for a conference. I won’t rehash the arguments stated ably and well in the hundreds of links available on various news aggregation sites. Instead, I want to highlight the key sentence in the post and then offer several observations. For me the key sentence was:

We’ll anonymize IP addresses on our server logs after 9 months.

The rule of thumb in online information is that the most current data are the most useful. In any online system, historical data are useful as a base. The action is in the most recent data. I’ve seen this in the most accessed documents in a commercial database, in our Point (Top 5% of the Internet) service in the mid 1990s, and I see it now on this Web log. This means that nine months is a long time when it comes to log and usage data. Think of the baseline of data as a bank filled with gold bricks. The current and timely data are the load bearing ore. Once processed, the high value component can be put in the vault and tapped when needed.
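Here is a back-of-the-envelope way to see why nine months is a long time, assuming, purely for illustration, that the analytic value of a log record decays exponentially with a 30 day half life. The half life figure is invented; the shape of the curve is the point.

```python
# Illustrative freshness model: value halves every 30 days (an assumed figure).
def remaining_value(age_days: float, half_life_days: float = 30.0) -> float:
    return 0.5 ** (age_days / half_life_days)

for months in (1, 3, 9, 18):
    age = months * 30
    print(f"{months:>2} months old: {remaining_value(age):.4%} of original value")
```

On that assumption, a nine month old record retains less than a fifth of a percent of its original value, and the 18 month old data retains essentially nothing.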

You don’t need me to reiterate that the issues of privacy and security are important. Both are intertwined, and I am uniformly critical of online systems that don’t pay much attention to these issues. Martin White and I have a new monograph nearing completion, and we are collaborating on the security section. I won’t repeat our arguments in detail. A one word summary is “Important”.

Privacy and security, therefore, are an Achilles’ heel for many companies, including Google. Google gets headlines because people have been slow to realize what the company has been building for upwards of a decade. Messrs. Brin and Page started with BackRub, learned from AltaVista.com, benefited from the portal craze, borrowed the Overture ad model, and befuddled everyone with lava lamps. Now folks are starting to realize that Google is a different kind of company. I won’t even say “I told you so.” My Google studies made this clear years ago.

The thoughts in my addled goose brain are the following:

  1. Google is not a search and advertising company. Those are applications running on what Google is. The company is the 21st century version of Ma Bell, US Steel, and Standard Oil. The problem is that those outfits were confined to one nation state. Google is supra national; that is, it’s operating across nation states. This makes it tough to regulate.
  2. Security and privacy are one point of vulnerability, but one-off challenges won’t make much difference.
  3. Google’s diffusion of its original ethos is another Achilles’ heel. In the last year, the company has been dinged for going in many directions with little apparent focus. I’m not so sure. Google’s quite good at misdirection.
  4. Google’s now in the public eye, and the company is finding itself having to reverse directions, often quickly. The license agreement for Chrome is one example. The change in user data retention is another.

How many Achilles’ heels does Google have? I refer to Google as Googzilla and have since late 2004, and Googzilla, like any lizard, stands on four legs. That means that there are four key vulnerabilities. So far, none of the charges directed at Google have aimed at these weaknesses. As long as the critics target Google’s tough, protective hide, there is little chance of [a] leap frogging Google’s technology or [b] knocking out one of its four legs.

Stephen Arnold, September 14, 2008

For SharePoint and Dot Net Fans: The London Stock Exchange Case

September 13, 2008

Cyber cynic Steven J. Vaughan-Nichols wrote “London Stock Exchange Suffers Dot Net Crash”. You should click here and read this well-written post. Do it now, gentle readers. The gist of the story is that LSE, with the help of Accenture and Microsoft, built a near real time system running on lots of Hewlett Packard upscale servers, Windows Server 2003, and my old pal, SQL Server 2000. The architecture was designed to run really fast, a feat my team has never achieved with Windows Server or SQL Server without lots of tricks and lots of scale up and scale out work. The LSE crashed. For me the most significant statement in the write up was:

Sorry, Microsoft, .NET Framework is simply incapable of performing this kind of work, and SQL Server 2000, or any version of SQL Server really, can’t possibly handle the world’s number three stock exchange’s transaction load on a consistent basis. I’d been hearing from friends who trade on the LSE for ages about how slow the system could get. Now, I know why.

Why did I find this interesting? Three reasons:

  1. There’s a lot of cheerleading for Microsoft SharePoint. This LSE melt down is a reminder that even with experts and resources, the Dot Net / Windows Server / SQL Server triumvirate gets along about as well as Pompey, Crassus, and Caesar. Pretty exciting interactions with this group.
  2. Microsoft is pushing hard on cloud computing. If the LSE can’t stay up, what does that suggest for mission critical enterprise applications in Microsoft’s brand new data centers, running on similar hardware and using the same triumvirate of software?
  3. Speed and Dot Net are not like peanut butter and jelly or ham and eggs. Making Microsoft software go fast requires significant engineering work and sophisticated hardware. The speed ups don’t come in software, file systems, or data management methods. Think really expensive engineering year in and year out.

I know there are quite a few Dot Net fans out there. We have it running on one of our servers. Are your experiences like mine, generally good? Or are your experiences like the LSE’s, less than stellar? Oh, Mr. Vaughan-Nichols asserts that the LSE is starting to use Linux on its hardware.

Stephen Arnold, September 13, 2008

Salesforce InStranet and Sinequa Connection

September 13, 2008

In August 2008, Salesforce.com paid about $31.0 million for InStranet, according to eWeek here. Salesforce.com is the high flying cloud computing vendor. InStranet, founded in 1999, was generating a fraction of Salesforce.com’s revenue. InStranet’s principal business is providing customer support and contact center systems and tools. You can find more information, including white papers, here. InStranet calls its solutions “multi channel knowledge applications.” The idea is that a customer searches for information or looks at suggestions for links that can resolve a customer support issue.

Why am I writing about this modest acquisition that has been overlooked by most Web logs and trade news services? The answer is that Sinequa, the French search and content processing vendor, provides the search and finding technology included in some of InStranet’s customer support solutions. After four weeks of waiting, Sinequa learned that its deal with InStranet would not be affected by the Salesforce.com buy out of InStranet. You can read Bios Magazine’s report here.

Will Salesforce.com leverage Sinequa’s search and content processing technology? I will keep you posted.

More information about Sinequa is here.

Stephen Arnold, September 13, 2008

Google and Content: Way Back in 1999

September 13, 2008

Nine years ago Google was a search engine, right? If you said, “Yes,” you were not wrong but not correct either. Google was worrying about static Web pages and ways to inject content into those Web pages. The idea was to get “some arbitrary input” from the user, and Google would take it from there. In 1999, Google’s wizards were working on ways to respond to user actions, create methods to assemble pertinent content likely to match the user’s need, and generate a Web page with that disparate information.
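A toy sketch of that pattern might look like the following. The source names, the items, and the one line matching rule are mine, not Google’s; the patent describes the idea in far more general terms.

```python
# Toy sketch of the 1999-era idea: take some arbitrary user input, pull
# pertinent items from several disparate sources, and assemble one Web page.
# Sources, items, and the matching rule are invented for illustration.

SOURCES = {
    "news":     ["Cubs win opener", "Market rallies", "New browser released"],
    "listings": ["Cubs tickets, Section 101", "Vintage browser posters"],
    "archive":  ["1999: the portal craze", "A short history of the Cubs"],
}

def assemble_page(user_input: str) -> str:
    terms = set(user_input.lower().split())
    rows = []
    for source, items in SOURCES.items():
        for item in items:
            # Keep any item that shares at least one term with the user input.
            if terms & set(item.lower().split()):
                rows.append(f"<li>[{source}] {item}</li>")
    return "<html><body><ul>\n" + "\n".join(rows) + "\n</ul></body></html>"

print(assemble_page("Chicago Cubs"))
```

Swap in crawled news, scanned books, and advertisements for the toy lists, and you have the outline of a page that goes beyond a laundry list of links.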

Why do I care about Google in 1999? Two reasons:

  1. Google was thinking “publishing” type thoughts a long time ago
  2. A patent with information on the specific system and method just popped out of the USPTO’s extremely efficient system.

The patent in question is US 7,424,478 B2. Google filed the application in 2000, received patent 6,728,705, and now this most recent incarnation has issued. The title is “System and Method for Selecting Content for Displaying over the Internet Based upon Some User Input.” With the recent release of Chrome, the notion of assembling and publishing content from disparate sources is somewhat analogous to what Ziff Communications Co. used to do when it was publishing magazines or what its database units did when generating job opportunities in its General Business File product.

With Google scanning books and newspapers, it seems logical that it would take user input and assemble a Web page that goes beyond a laundry list. For me, the importance of this invention is that the GOOG was thinking these thoughts before it had much search traffic or money. Postscript: the mock screen shots are fun as well. You can see the sites that were catching Google’s attention almost a decade ago. Anyone remember Go.com?

Stephen Arnold, September 13, 2008

Search: A Failure to Communicate

September 12, 2008

At lunch today, the ArnoldIT.com team embraced a law librarian. For Mongolian beef, this information professional agreed to talk about indexing. The conversation turned to the grousing that lawyers do when looking for information. I remembered seeing a cartoon that captured the problem we shelled, boiled, and deviled during our Chinese meal.

[Cartoon: “failure to communicate”]

Source: http://www.i-heart-god.com/images/failure%20to%20communicate.jpg

Our lunch analysis identified three constituencies in a professional services organization. We agreed that narrowing our focus to consultants, lawyers, financial mavens, and accountants was an easy way to put egg rolls in one basket.

First, we have the people who understand information. Think indexing, consistent tagging for XML documents, consistent bibliographic data, the credibility of the source, and other nuances that escape my 86-year-old father when he searches for “Chicago Cubs”.

Second, we have the information technology people. The “information” in their title is a bit of misdirection that leads to a stir fry of trouble. IT pros understand databases and file types. Once data are structured and normalized, the job is complete. Algorithms can handle the indexing and the metadata. When a system needs to go faster, the fix is to buy hardware. If it breaks, the IT pros tinker a bit and then call in an authorized service provider.

Third, we have the professionals. These are the ladies and gentlemen who have trained to master a specific professional skill; for example, legal eagle or bean counter. These folks are trapped within their training. Their notions of information are shaped by their deadlines, crazed clients, and crushing billability.

Here’s where the search system or content processing system begins its rapid slide to the greasy bottom of the organization’s wok.

  1. No one listens or understands the other players’ definition of “information”.
  2. The three players, unable to get their points across, clam up and work to implement their vision of information.
  3. The vendors, hungry for the licensing deal, steer clear of this internal collision of ignorant, often supremely confident souls.
  4. The system is a clunker, doing nothing particularly well.

Enter the senior manager or the CFO. Users are unhappy. Maybe the system is broken and a big deal is lost or a legal matter goes against the organization. The senior manager wants a fix. The problem is that unless the three constituents go back to the definition of information and carry that common understanding through requirements, to procurement, to deployment, not much will change.

Like the old joke says, “Get me some new numbers or I will get a new numbers guy.” So, heads may roll. The problem remains the same. The search and content processing system annoys a majority of its users. Now, a question for you two or three readers, “How do we fix this problem in professional services organizations?”

Stephen Arnold, September 12, 2008
