Google’s Udi Manber on Search Quality

June 18, 2008

The Googlers were out in force, chipper and explaining to the 150 or so attendees of the Gilbane Group’s annual content management conference.

The key factor driving Google forward, asserted Dr. Manber, is that users have rising expectations. Google, therefore, must use smart software, innovate, and scale. In 2007, Google tweaked its PageRank algorithm more than 450 times. Google works to keep bureaucracy at a minimum, empowering engineers to make necessary changes.

PageRank changes are not based on hunches. Extensive data analysis underlies tweaks.

The 21st century, asserted Dr. Manber, is about understanding people; that is, social interactions. The starting point for analysis is user intent. Queries are diverse; for example, “hairstyles for ears that stick out” or “i’m going to win the lottery”.

Like other search systems, Google looks terms up in its index. Then Google uses other functions in order to determine intent; for example, time, place, context, and user information from “individualized Google,” if available.

You can see this in action. Run the queries “GM cars” then “GM food”. Google returns different results for each query even though the acronym GM appears in each query.

User expectations are now growing quickly. Google, therefore, must innovate and continue to scale.

Some development features were referenced, but these were not active in “regular” Google when I ran these sample queries. The presentation was well received and triggered a flurry of questions about site search and universal or federated search. Attendees applauded enthusiastically. The Googley magic was working today.

Stephen Arnold, June 18, 2008

The LinkedIn Bet: $1 Billion Social Valuation

June 18, 2008

The chatter about the LinkedIn valuation of $1 billion is choking my trusty RSS readers. The voice that reached me was Om Malik’s; his comments are here. The essay is “Is LinkedIn worth $1 Billion.” Mr. Malik makes two points that warrant highlighting in the midst of the cacophony:

  • The notion that smart money has picked a winner may be suspect.
  • The per subscriber valuation is generous.

Mr. Malik nails this financial optimism as out of step with the company’s performance.

There are three other factors that Mr. Malik’s must-read essay surfaced in my mind:

  1. Social networks can be gamed. My experience with LinkedIn suggests that the controls on abuse are not as fine-grained as they should be.
  2. The layers of fees are annoying to me, and I suspect that others will find that invitations often carry along obligations they don’t want.
  3. In a deteriorating economy, referrals are indeed important. However, LinkedIn often wobbles into probes for intelligence in the form of questions from people whom I don’t know and marketing in the form of thinly disguised pitches.

These three factors when combined with Mr. Malik’s analysis suggest an optimistic valuation. “Social” is hot. I am not convinced that today’s flag carriers will be tomorrow’s winners.

Stephen Arnold, June 17, 2008

Business Intelligence: Turmoil and Change Loom

June 18, 2008

Fern Halper, a member of the Hurwitz & Associates team, wrote “Text Analytics and the Predictive Enterprise” on June 13, 2008. The story appeared on IT Analyses, and I just saw it.

Ms. Halper makes two points about text analysis. She is talking about analytics vendor SPSS, but her comments apply across the business intelligence spectrum.

First, she makes it clear that text contributes to business intelligence. Structured data and text yield useful insights. The idea is that mining both is more meaningful.

Second, she asserts that analysis of Web logs and other social information can add value to traditional business intelligence activities.

SPSS, SAS Institute, Business Objects (now part of SAP), Clarabridge, and other vendors share somewhat similar views.

My hunch is that market friction is going to become more evident as IBM, Microsoft, and Oracle increase their analytics efforts. Business intelligence, like search, is moving downmarket and to some extent becoming a utility function.

My research into frustration with enterprise search shined a light into a formerly dark corner of an increasingly important function. Business intelligence also has annoyed users with its complexity, hard-to-understand reports, and lack of “average manager” interfaces.

What’s this mean?

My thought is that head-to-head competition will increase. Business intelligence vendors will find themselves pressured to keep their clients from drifting toward analytics solutions bundled with other enterprise applications from the likes of IBM, Microsoft, and Oracle. In addition, traditional business intelligence vendors have to figure out how to keep newcomers like Attensity (deep extraction) and Aster Data (data management) from making sales in organizations where there once was a traditional business intelligence monopoly.

For many years, competition between the SAS Institute and SPSS was governed by the type of rules that once governed duels with pistols. Business Objects brought more Madison Avenue sizzle to business intelligence. Now lines are blurring between high-end, specialist business intelligence and what I call “baked in BI” from IBM, Microsoft, and Oracle. Add to this the upstarts arriving with zippier technology and a hunger for making sales. The result is an uptick in competitiveness.

Companies today need to find ways to keep customers and squeeze meaning from available data. Search on its own does not deliver what an organization needs. Crunching numbers does not deliver. Text analytics does not deliver. Organizations need all three functions to be available and usable by the average manager.

With this problem getting more attention, a hybrid solution is needed. With a lucrative payoff for the company that cracks this problem, accelerating change is not just likely; significant disruption awaits us in business intelligence. Who could profit from this increased turmoil? I think Google may be a factor going forward. Are hosted crunching, customers wanting ease of use, canned analytics and APIs, and social data the ingredients for a new enterprise recipe from the GOOG?

Stephen Arnold, June 17, 2008

Tag Clouds for the Enterprise

June 18, 2008

One of the Web 2.0 functions that cause Enterprise 2.0 champions’ adrenaline to surge is tag clouds. Digital Inspiration has an excellent essay about these here. The examples are worth the visit. The most useful information in the Web log post is the link to Wordle. You will discover that Wordle is not designed for industrial-strength tag cloud generation. You may find other tools more useful.

Tag clouds and other text processing visualizations are available in a number of commercial text processing systems, including Attensity’s and Megaputer’s products.
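For readers who want to see what sits underneath a tag cloud, here is a minimal sketch in Python. It is my own illustration, not how Wordle, Attensity, or Megaputer do it. The function name `tag_weights`, the tiny stop-word list, and the font size range are all assumptions chosen for the example. The idea is simply to count terms and scale each count to a font size.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on"}


def tag_weights(text, top_n=10, min_size=10, max_size=36):
    """Return alphabetized (term, font_size) pairs for the top_n most frequent terms."""
    words = re.findall(r"[a-z']+", text.lower())
    # Skip stop words and very short tokens before counting.
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    top = counts.most_common(top_n)
    if not top:
        return []
    highest = top[0][1]
    lowest = top[-1][1]
    span = highest - lowest
    sized = []
    for term, n in top:
        if span == 0:
            # All terms are equally frequent, so give them the same size.
            size = max_size
        else:
            # Linear scaling of the count into the font size range.
            size = min_size + (n - lowest) * (max_size - min_size) // span
        sized.append((term, size))
    # Tag clouds are usually displayed alphabetically, with size carrying the weight.
    return sorted(sized)


if __name__ == "__main__":
    for term, size in tag_weights("search search search enterprise enterprise tags"):
        print(term, size)
```

The linear scaling is the simplest choice. With skewed term counts, a logarithmic scale may keep one dominant term from dwarfing the rest.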

Stephen Arnold, June 18, 2008

Gilbane Chats Up a Silly Goose: The Arnold Interview

June 18, 2008

On Wednesday, June 18, 2008, I will be interviewed in front of an audience completely unaware of why a fellow from Harrod’s Creek, Kentucky, is sitting on a stage answering questions. No one is more baffled than I. Based on my knowledge of the big city, I anticipate confusion, torpor, and indifference to my comments.

In this essay, which will become available on June 18, 2008, the curious will have a reference document that summarizes my thoughts on issues about which I may be asked. There has been no dry run for this interview. The last one in which I participated–the Associated Press’s invitation-only gathering last year–left the audience with little appetite for food. Some found the beverage table a more welcome destination.

Anticipated Question 1: What’s “beyond search” mean?

In research conducted by me and others, about two-thirds of the users of an enterprise search system are dissatisfied with that system. “Beyond search” implies that we have to move to another approach because what is now available in the organizations that I and the other researchers have investigated is not well liked. Due to the cost of some systems, annoying two-thirds of the users is tantamount to getting a D or an F on a report card.

Anticipated Question 2: What’s “behind the firewall search” mean?

I wrote about the search elephant here. Many different functions involving information access are made available to an employee, contractor, or authorized user. The idea is that “behind the firewall search” is not public; it is made available by an organization to a select group of users. The “search elephant” refers to the many different ways in which search is understood and perceived within an organization.

Anticipated Question 3: Why are there so many search vendors and more coming each day?

There is a belief that existing systems are not tapping into what I have estimated to be a $2.5 billion market for information access in the enterprise. Entrepreneurs and people with money look at Google and think, “We should be able to make gains like that in the enterprise market.” I also think that the market itself is trying to figure out the search elephant. Buyers don’t know what is needed. When entrepreneurs, money, and confused customers with severe information access problems come together, we have the type of market place that exists today.

Anticipated Question 4: What about Microsoft and Fast Search & Transfer?

I understand that it is business as usual at Microsoft and Fast Search. For Microsoft, this means trying to get 10,000 motorboats to go in roughly the same direction. For Fast Search, the company continues to license its Enterprise Search Platform and service customers. There are many bits of grit in the working parts where Microsoft and Fast Search mesh. It is too soon to tell if these inhibitors are trivial or whether the machine will sputter, maybe stop. What I tell people is to ignore the Microsoft-Fast Search tie up, and get a solution for a SharePoint environment that works. There are good choices ranging from a lower cost solution like dtSearch to a competitively priced system from Coveo, Exalead, ISYS Search Software, or another Microsoft Certified vendor.

Anticipated Question 5: What’s the impact of the Google Search Appliance?

Many vendors will tell you that Google has delivered a second-class system. That’s not exactly true. With the OneBox API, Google has a very solid solution. The impact is that Google has about 10,000 enterprise customers. These are sales made, in many cases, under the noses of incumbent vendors. Google’s a player in the enterprise market and a serious one. I have uncovered one impactful bit of research at Google that could–note, I said, could–change the search landscape. I have tried to ask Google about this development, but the GOOG thinks I do not merit its attention. Too bad for me, I guess.

Anticipated Question 6: What’s the impact of text processing, semantic search, and other new technologies on enterprise search?

These are hot terms that will open doors. Some vendors will make sales because of their ability to mesh trendy concepts with more traditional search.

Stephen Arnold, June 18, 2008

Dialog Information Services: Can a New Owner Revive a Former Online Giant?

June 17, 2008

A flurry of voice mail greeted me when I landed in San Francisco a short time ago. Thomson Reuters, according to the folks feeding me rumors, is looking to sell Dialog Information Services. Dialog was once the Google of online. Long ago, of course. In the late 1970s and early 1980s, online dirt paths ran through Dialog Information Services, once a crown jewel of Lockheed.

Dialog hosted commercial databases, charged users for access, and shared the money with the specialists who created such files as ABI / INFORM (business information) and Investext (analyst reports). The customers were not average folks. The users were trained information professionals who mastered the syntax of the naked Dialog command line. When I used Dialog, I paid to get a password. I paid to connect to a dial up network. I paid to see a bibliographic record. I paid to see an abstract. I paid to print out on thermal paper my search results. After getting a Dialog bill, it was easy for me to make the jump to Internet research. I did not get whacked $1 an abstract or more when I wanted to research a topic. I have had Dialog bills for a single research session that hit $300 in 1980 dollars.

Thomson Reuters is run by financial wizards with sharp pencils. Selling this property makes sense. I am not the only person addicted to research who found the online charges motivation to find an alternative like the Internet and Google Scholar.

Commercial online services have been hard hit in recent years by a double-whammy.

First, trained information professionals, the core market, have watched their budgets squeezed. The for-fee services, therefore, have to fight to keep their numbers up.

Second, the Internet has become the first stop for many people looking for information, including me. I no longer use for-fee services. The most timely information is available elsewhere, and I think that a specialized service for a user community used to looking for information on no-charge Internet sites faces an even more difficult future.

Potential Buyers

Who will be sufficiently bold to buy Dialog for several hundred million dollars, maybe more? Here are a few possibilities:

  • Cambridge Scientific Abstracts. This company has been gobbling up good properties and questionable information companies for several years. More debt should make life exciting if CSA is the buyer.
  • Ebsco Electronic Publishing. This is a unit of the privately held EB Stephens Company. I heard that Ebsco just inked a deal with Endeca to breathe life into Ebsco’s own online service. Ebsco Electronic Publishing has a parent with deep pockets, and it is possible that buying Dialog delivers more reach.
  • An investment bank. Buyout masters see gold where others, like me, see dust bunnies. Dialog will be tough to spiff up and resell.
  • LexisNexis. This online legal service has not had an easy time with non-law content since it lost its exclusive for the New York Times. Parent Reed Elsevier will have to be reading tea leaves to find a way to make this combination work technically and financially. If LexisNexis is the buyer, I think the management team may know something about traditional online I’m missing.

My hunch is that Bethesda-based Cambridge Scientific Abstracts will make the leap.

The rumor, of course, may be false. But let’s look at the upside and downside of buying an online company with its roots in the mainframe world.

Upside

Dialog, along with LexisNexis and Westlaw (also a Thomson Reuters property), has a good customer base among Fortune 500 companies. Also, Dialog has more than 500 commercial databases online and ready to go.

Downside

I think the big issue is the cost of operating this traditional business. Add to that the lack of enthusiasm youthful online searchers have for for-fee services and you have a growth problem.

When I get more substantive information, I will pass it along.

Stephen Arnold, June 17, 2008

Update 1, June 17, 2008 9 am Eastern

A reader who wishes to remain anonymous reminded me of these points:

  • Dialog has more content than any other service.
  • Dialog’s interface could learn from Ovid. He writes: “[What is needed is an] OvidSP-type interface to all Dialog databases. You can see how that looks by clicking on “Journal Articles Buy Now” on www.ovid.com, then on “Main Search Page” on the top toolbar. This pay-per-full-text-view interface for the medical field is a large subset of the complete commercial product.”

Let me offer some comments about these useful remarks.

First, Dialog has content. One issue that challenges the present owner and the to-be owner, if there is one with sufficient appetite for risk, is scale. Dialog’s content would be more useful if it were possible to query, analyze, and report across the data. Let me give you one example. At this time, it is difficult for me to manipulate the Investext reports. I have to download chunks and then assemble them. If I want to look for relationships, I have to download other results and then process the data in a separate application on my own system. No problem for me, but this is a challenge for others. Can Dialog do a “dataspace” query? Nope, not unless the buyer is Google or one of the data management companies with the technology and the ability to scale a commercial service. I agree that Dialog has terabytes of data, but in its historical form, those data are increasingly valueless to me. Making Dialog into a golden goose is out of reach for most of the online companies with which I am familiar. I hope to be surprised and see Dialog’s terabytes of individual datum atoms become useful information. My hunch is that the cost and technical complexity of as-is Dialog will make progress slow, expensive, and difficult.

With regard to the interface, I agree. Online services have not been particularly good with interface and user experience. Ovid is better than some, but not as good as some of the more innovative systems that I have seen and, in one case, profiled in my Silobreaker.com write ups in this Web log, which you can locate using the Web log search box on the Beyond Search splash page. The reason commercial interfaces to for-fee content are not so good boils down to two factors: [a] The money comes from experienced or expert users, so the interfaces are overly complex. The Google approach is ignored in favor of too many choices, options, and features. [b] The decision making process at for-fee information companies is hamstrung by legacy systems, contracts that limit what can and cannot be done with content, and limited awareness of the outside world. Like traditional publishing, for-fee database operations find themselves isolated. Remember, in the pre-Web world, these outfits were in the driver’s seat. Now most are waiting for the bus to come pick them up.

End update 1

Mark Logic: Content Applications Fuel Company’s Growth

June 17, 2008

Mark Logic provides information access and delivery solutions that accelerate the creation of content applications. Customers across a range of industries rely on Mark Logic to repurpose content and deliver that information through channels. Some vendors describe this suite of functions as an enterprise publishing system.

The company has been growing at a furious pace. Dave Kellogg, a former Business Objects executive, said:

Mark Logic… is a database management system built to natively manage XML documents and optimized for handling vast numbers of them (I mean hundreds of terabytes) with high performance. It’s a read/write system. It has a query language (XQuery). It has transactions and logging. You can use it, by itself–without the need to bolt it on to either a relational database or an application server–as the basis for content applications.

The company’s customers include Oxford University Press, O’Reilly Media, and the Congressional Quarterly. The company builds relationships with its customers. Mr. Kellogg says, “Our philosophy is to sell solutions to problems and avoid the stereotypical “drive-by” technology sale, where companies dump the software in the parking lot and leave.”

The full interview appears as part of the Search Wizards Speak series published by ArnoldIT.com. You can read the transcript of the interview with Mr. Kellogg here. The index to the full series of interviews is here.

Stephen Arnold, June 17, 2008

Microsoft Plans European Search Center

June 17, 2008

A news release popped into my mailbox this morning (June 17, 2008). The surprising announcement was that Microsoft will set up a search technology center in Europe. You can read the full announcement here.

(Access this story quickly; PR Newswire is one of the organizations whose content can be difficult to track down. Access to some content may require you to pay a fee. This link worked at 6:46 am on June 17, 2008.)

According to the announcement:

“The new center will be designed to help accelerate Microsoft’s investments in Live Search and disrupt the search and advertising marketplace to the benefit of both the consumer and the advertiser, in line with Microsoft’s recent announcement in the U.S. of Live Search cashback.”

What is startling to me is that Microsoft has not set a location for the search center. With the acquisition of Fast Search & Transfer, I thought that Trondheim, Norway, would have been the default choice. Microsoft has a big operation in Cambridge, England, as well. The news release does say, “We’re already doing some great work in Europe in the enterprise search space through our January 2008 acquisition of Fast Search & Transfer SA…”

Fast Search’s AllTheWeb.com Web site was arguably the only challenger to Google’s Web index prior to Fast Search’s selling the AllTheWeb.com site to Overture. My experience with Fast Search is that its Web crawling and indexing system is one of the firm’s core strengths. Furthermore, Fast Search had developed its own advertising technology, also sold to Overture. But the company reentered online advertising with internal work and the acquisition of Platfood.com.

One big question remains unanswered: Why not turn to Fast Search, a company that successfully challenged Google for Web search?

One thing is clear. Microsoft is making an effort to close the gap with Google. Perhaps the research center will do the trick. But the center will not open until 2009. In the fast-moving Web world, the timing to me seems leisurely.

Stephen Arnold, June 17, 2008

Google: From the Disruptor to the Disrupted

June 17, 2008

I am a fan of ReadWriteWeb, and I found the essay by Bernard Lunn quite interesting. Mr. Lunn has identified the “11 Search Trends that May Disrupt Google.” ReadWriteWeb.com makes it easy to locate its articles, so you can track this story down easily.

The essay lists factors that may be moving Google into a different role: from disruptor to a company that is itself disrupted. On the whole, I agree with the ReadWriteWeb analysis. Of particular importance is the notion of “start ups using a new outsourced infrastructure.” Powerset is an example of a company taking a different approach. I have heard that Powerset makes use of Amazon Web Services, and I think this is an important aspect of the company to monitor if my information is accurate.

The other point that I found on target is the impact tagging may have upon Google. Not long ago Vivisimo announced that its system made it possible for a user to add a tag–that is, index term–to an item in a result list. Tagging is becoming one of the everyday activities for those who write Web logs. The Semantic Web has been slow in coming, but I think the “social tagging” function may be providing some opportunities that search engines, including Google, have yet to exploit fully.

I would add one other point to the factors that are likely to influence Google–the challenge of size. Google is now 10 years old, and it is getting big enough to encounter the friction that plagues any large organization. Google, therefore, changes more slowly even though certain innovations make users gasp. A competitor can exploit Google’s own inertia but that competitor must take care to stay clear of Google’s momentum.

A happy quack for a useful and thought-provoking write up, ReadWriteWeb!

Stephen Arnold, June 17, 2008

Google and Mobile Search

June 17, 2008

I have been a fan of Ars Technica. Jacqui Cheng has an excellent essay about Google and mobile search. My eyes are too lousy for me to be enamored of small screens and smaller type. But I am out of the mainstream. Ms. Cheng reports that Google has captured 60 percent of the mobile search market. You should read the full story here.

But it is early in the mobile search world. Mobile search brings new challenges because “regular” search does not work too well. I prefer to get answers, pick choices from lists, or just accept the default link. I am lazy but the form factor inhibits me. Other demographics will behave differently. Over time, the mobile search market will undergo some rapid evolution. I will start paying more attention to this area.

Stephen Arnold, June 17, 2008
