Web Search: Picking Sides from the Bleachers

June 5, 2011

One of the interesting characteristics of fans is that their shouts can inspire the athletes in the game. Here in rural Kentucky, fans can also focus on one another. Instead of the usual Southern civility, shouting matches or fisticuffs can break out. The players continue playing as the “game within the game” unfolds.

image

Fans cheer, but whether the noise alters the outcome of the game is a matter for a PhD dissertation, not a career.

The Gold Team

I read “How Facebook Can Put Google Out of Business.” The write up takes a premise set up by Googler Eric Schmidt, who, until recently, was the CEO of the company. The PR-inspired mea culpa positioned Mr. Schmidt as the person responsible for Google’s failures in social media. I recall seeing references to social functions in Google’s patent documents even before Google’s purchase of Orkut in 2003, and Orkut’s trajectory has been quite interesting. As you may know, the path wandered through a legal thicket, toured the more risk-filled environs of Brazil, and ended up parked next to the railroad tracks near the Googleplex in Mountain View.

The TechCrunch article pointed out that Facebook has detailed information about its 500 or 600 million “members”. The idea is that Facebook can leverage the information about these members to create a more compelling “finding” system.

I suppose I can nitpick about the write up, but it presents information that I have touched upon in this Web log for a couple of years. When I read the article, my reaction was, “I thought everyone already knew this.”

The Blue Team

Then I read “The Silliest Idea Ever: Facebook Going After Google In Search.” This write up used a rhetorical technique that I have long employed; namely, taking a contrary position in order to highlight certain features of an issue. In my experience, the approach annoys 30-somethings who have memorized an elevator pitch and want to get back to Call of Duty or their iPhone. However, I enjoy the intellectual exercise and will continue the practice.

The main premise of the “Silliest Idea Ever” is that competing with Google in search is expensive, Google is a moving target, and other types of disruption will influence what happens between Google and Facebook in search. You should read the original write up to get the full freight of meaning.

Read more

Google, Mobile, and Money: Can We Discern a Pattern, Connect Some Dots?

May 31, 2011

I woke up early this morning, mostly because the crows decided to have a post-Memorial Day celebration here in the hollow near Harrod’s Creek. Beautiful birds. Often their discourse reminds me of data about the success of Android, the lack of success at RIM, and the slow start Microsoft’s Windows Phone 7 has had. And Apple? Well, even the crows have iPhones in Kentucky.

What I found interesting was more data about the success/failure of Android and Apple in the mobile game. “Nielsen: Android’s Lead Over iOS May Have Stopped Growing” reports that Android is popular “but no more than it was in March [2011].” You can work through the numbers, which are based on Nielsen’s survey results. Note that Nielsen is hedging its bets on its results. My experience is that such results are often driven by the needs of marketing and sales and not so much by what I want to know.

image

I want to connect the dots, but I am not sure what’s happening. Source: http://corknuts.tumblr.com/

Here’s the passage I noted on my trusty iPad:

Read more

The Web, Blogs, and the Reed Effect

May 31, 2011

There was a blip in the blogosphere about the infusion of capital into the big, firm information arteries of GigaOM, founded by Om Malik. Even the trend-tracking Mashable covered the story in “Tech Blog GigaOM Shifts Focus to Premium Content.”

The money apparently flowed from Reed Elsevier Ventures, with some other investors betting on the blog news and analysis service. The founder added some cash to the pot, as did Alloy Ventures. The funding flies in the face of the well-received Business Insider presentation about how traditional media companies can behave more like start ups.

image

Traditional professional publishers push prices to the peak of Mount Tolerance. As long as revenues do not decline, the number of customers is irrelevant. Remember the concept of elasticity in pricing from Econ 100?

This is an interesting development for three reasons:

First, although the Huffington Post hit the jackpot, the GigaOM investment is suggestive. What I see is that the GigaOM content play is interesting but not yet at the Huffington Post level. Investors hope to reach that benchmark in money magnetism so outfits like AOL will acquire GigaOM for an even juicier pay day.

Second, the shift signals more trouble for the advertiser-supported model of publishing. Google Adsense seems to be losing some steam, and pitching vendors to support a blog is expensive and time consuming. With more cash, GigaOM can follow in the footsteps of more traditional publishing, consulting, and analysis businesses. Get subscribers, sell reports, and cherry pick other money making opportunities as they come along. Sounds like a plan to me. For outfits like Google, the river of money may behave like Lake Hamoun and dry up. The Reed Effect, in my view, is pushing prices to the heights. If customers want the information, those customers can pay.
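
The Econ 100 elasticity point from the caption above is worth a quick illustration. Here is a minimal sketch with entirely hypothetical numbers, nothing from Reed’s actual books: when demand is inelastic, a publisher can raise prices, shed customers, and still grow revenue.

    # Price elasticity of demand: E = (% change in quantity) / (% change in price).
    # All figures below are hypothetical, for illustration only.
    price, customers = 1000.0, 500      # a $1,000 subscription with 500 subscribers
    elasticity = -0.4                   # inelastic demand: |E| < 1
    hike = 0.20                         # raise the price 20 percent

    new_price = price * (1 + hike)                        # 1200.0
    new_customers = customers * (1 + elasticity * hike)   # 460.0

    print(price * customers)            # 500000.0 in revenue before the hike
    print(new_price * new_customers)    # 552000.0 after: fewer customers, more revenue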

Read more

Mapping the New Landscape of Enterprise Search

May 23, 2011

What has happened to enterprise search? In a down economy, confusion among potential licensees has increased, based on the information I gathered for my forthcoming The Landscape of Enterprise Search, to be published by Pandia in June 2011. The price for the 186-page report is $20 US or 15 euros. Pandia and I decided that the information in the report should be available to those wrestling with enterprise search. With some “experts” charging $500 and more for brief, pay-to-play studies, our approach is to provide substantive information at a very competitive price point.

In this completely new report, my team and I compress a complex subject into a manageable 150 pages of text. There are 30 pages of supplementary material, which you can use as needed. The core of the report is an eyes-wide-open analysis of six key vendors: Autonomy, Endeca, Exalead, Google, Microsoft, and Vivisimo.

image

You may recall that in the 2004 edition of the Enterprise Search Report, I covered about two dozen vendors. By the time I completed the third edition (the last one I wrote), the coverage had swelled to more than 28 vendors and to an unwieldy 600-plus pages of text.

In this new Landscape report, the publisher, my team, and I focused on the companies most often included in procurement reviews. With more than 200 vendors offering enterprise search solutions, there are 194 vendors who could argue that their system is better, faster, and cheaper than the systems discussed in Landscape. That may be true, but including a large number of vendors makes for another unwieldy report. I know from conversations with people who call me asking about another “encyclopedia of search” that most people want two or three profiles of search vendors. We maintain profiles for about 50 systems, and we track about 300 vendors in our in-house Overflight system.

My team and I have tried to make clear the key points about the age and technical aspects of each vendor’s search solution. I have also focused on explaining what the systems can and cannot do. If you want information that will strike you as new and different, you will want a copy of my new Landscape report.

image

Are you lost in the alchemist’s laboratory? This is a place where unscientific methods and fiddling take precedence over facts. Little wonder that when “experts” explain enterprise search, there is no “lead into gold” moment. There is a mess. The New Landscape of Search helps you avoid the alchemists’ approach. Facts help reduce the risk in procuring an enterprise search solution.

Read more

Search: An Information Retrieval Fukushima?

May 18, 2011

Information about the scale of the horrific nuclear disaster in Japan at the Fukushima Daiichi nuclear complex is now becoming more widely known.

Expertise and Smoothing

My interest in the event is the engineering of a necklace of old-style reactors and the problems the LOCA (loss of coolant accident) triggered. The nagging thought I had was that today’s nuclear engineers understood the issues with the reactor design, the placement of the spent fuel pool, and the risks posed by an earthquake. After my years in the nuclear industry, I am quite confident that engineers articulated these issues. However, technical information gets “smoothed” and simplified. The complexities of nuclear power generation are well known, at least in engineering schools. Nuclear engineers are often viewed as odd ducks by the civil engineers and mechanical engineers. A nuclear engineer has to do the regular engineering stuff of calculating loads and looking up data in hefty tomes. But the nukes also need grounding in chemistry, physics, and math, lots of math. Then the engineer who wants to become a certified professional nuclear engineer has some other hoops to jump through. I won’t bore you with the details, but the end result of the process produces people who can explain clearly a particular process and its impacts.

image

Does your search experience emit signs of trouble within?

The problem is that art history majors, journalists, failed Web masters, and even Harvard and Wharton MBAs get bored quickly. The details of a particular nuclear process make zero sense to someone more comfortable commenting on the color of Mona Lisa’s gown. So “smoothing” takes place. The ridges and outcrops of scientific and statistical knowledge get simplified. Once a complex situation has been smoothed, the need for hard expertise is diminished. With these simplifications, the liberal arts crowd can “reason” about risks, costs, upsides, and downsides.

image

A nuclear fallout map. The effect of a search meltdown extends far beyond the boundaries of a single user’s actions. Flawed search and retrieval has major consequences, many of which cannot be predicted with high confidence.

Everything works in an acceptable or okay manner until there is a LOCA or some other problem like a stuck valve or a crack in a pipe in a radioactive area of the reactor. Quickly the complexities, risks, and costs of the “smoothed problem” reveal the fissures and crags of reality.

Web search and enterprise search are now experiencing what I call a Fukushima event. After years of contentment with finding information, suddenly the dashboards are blinking yellow and red. Users are unable to find the information needed to do their job or something as basic as locate a colleague’s telephone number or office location. I have separated Web search and enterprise search in my professional work.

I want to depart for a moment and consider the two “species” of search as a single process before the ideas slip away from me. I know that Web search processes publicly accessible content, has the luxury of ignoring servers with high latency, and filters content to create an index that meets the vendor’s needs, not the users’ needs. I know that enterprise search must handle diverse content types, must cope with security and access controls, and must perform more functions than one of those two-inch-wide Swiss Army knives on sale at the airport in Geneva. I understand. My concern is broader in this write up. Please, bear with me.

Read more

Google and Search

May 11, 2011

Over the last five days, I have been immersed in conversations about Google and its public Web search system. I am not able to disclose the people with whom I have spoken. However, I want to isolate the issues that surfaced in my face-to-face and telephone conversations and offer some observations about the role of traditional Web sites. In fact, one of the participants directed my attention to the post “Google Panda=Disaster.” I don’t think the problem is Panda. I think a more fundamental change has taken place, and Google’s methods are simply out of sync with the post-shift environment. But hope is not lost. At the end of this write up, I provide a way for you to learn about a different approach. Sales pitch? Sure, but a gentle one.

Relevance versus Selling Advertising

The main thrust of the conversations was that Google’s Web search is degrading. I have not experienced this problem, but the three groups with whom I spoke have. Each had different data to show that Google’s method of handling its publicly accessible Web site had changed.

First, one vendor reported that traffic to the firm’s Web site had dropped from 2,000 uniques per month to 100. The Web site is informational. There is a widget that displays headlines from the firm’s Web log. The code is clean and the site is not complex.

Second, another vendor reported that content from the firm’s news page was appearing on competitors’ Web sites. More troubling, the content was appearing high in a Google results list. However, the creator of the content found that the stories from the originating Web site were buried deep in the Google results list. The point is that others were recycling original content and receiving a higher ranking than the source of the original content.

image

Traditional Web advertising depicted brilliantly by Ken Rockwell. See his work at http://www.kenrockwell.com/canon/compacts/sd880/gallery-10.htm

Third, another company found that its core business was no longer appearing in a Google results list for a query about the type of service the firm offered. Instead, the company was turning up in an unrelated or, at best, secondary results list.

I had no answer to the question each firm asked me, “What’s going on?”

Through various contacts, I pieced together a picture that suggests Google itself may not know what is happening. One source indicated that the core search team responsible for the PageRank output is doing its work much as it has for the last 12 years. Googlers responsible for selling advertising were not sure what changes were being made in the core search team’s algorithm tweaks. Not surprisingly, most people are scrutinizing search results, fiddling with metatags and other aspects of a Web site, and then checking to see what happened. The approach is time consuming and, in my opinion, very much like that of the person who plugs a token into a slot machine and hits the jackpot. There is great excitement at the payoff, but the process is not likely to work on the next go-round.

Net net: I think there is a communications filter (intentional or unintentional) between the group at Google working to improve relevance and the sales professionals at Google who need to sell advertising. On one hand, this is probably healthy because many organizations put a wall between certain company functions. On the other hand, if Adwords and Adsense are linked to traffic and that traffic is highly variable, some advertisers may look to alternatives. Facebook’s alleged 30 percent share of the banner advertising market may grow if the efficacy of Google’s advertising programs drops.

Read more

Tracking: Does It Matter?

May 11, 2011

A news story broke this week that was difficult for many to ignore; it seems our beloved iPhones and iPads are paying us the same attention we lavish on them. It turns out these Apple devices keep an internal log of every cell tower or hot spot they connect to, in essence creating a map of the user’s movements for as long as ten months. It gets better. The log file is highly visible and unencrypted, making it accessible to anyone with your device in their hands.
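
How visible is “highly visible”? The reporting describes an ordinary SQLite database copied out of the device or its iTunes backup. Here is a minimal sketch of what a curious examiner might run, assuming the widely reported layout (a consolidated.db file with a CellLocation table holding latitude, longitude, and a Mac-epoch timestamp; the file and column names come from press accounts, not my own testing):

    import sqlite3
    from datetime import datetime, timedelta

    # consolidated.db: the unencrypted location log pulled from an iTunes backup.
    conn = sqlite3.connect("consolidated.db")
    rows = conn.execute(
        "SELECT Timestamp, Latitude, Longitude "
        "FROM CellLocation ORDER BY Timestamp DESC LIMIT 5")

    MAC_EPOCH = datetime(2001, 1, 1)  # iOS stores seconds since 2001-01-01
    for ts, lat, lon in rows:
        print(MAC_EPOCH + timedelta(seconds=ts), lat, lon)

No password, no decryption, no forensic toolkit: a stock database client and a dozen lines of script.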

image

Getting the scent. Source: http://www2.journalnow.com/news/2011/feb/07/wsweat01-beagle-found-in-a-jiffy-by-tracking-dogs-ar-760887/

This news stems from a couple of British programmers who stumbled upon said “secret” location file. In the midst of the melee that ensued from outraged consumers and lawmakers alike, I was directed to a Bloomberg article titled “Researcher: iPhone Location Data Already Used By Cops”.

Interestingly enough, a rendition of this same story was covered by the press months ago, only featured in a different light, courtesy of an individual studying forensic computing. Per the write-up: “In a post on his blog, he explains that the existence of the location database—which tracks the cell phone towers your phone has connected to—has been public in security circles for some time.

While it’s not widely known, that’s not the same as not being known at all. In fact, he has written and presented several papers on the subject and even contributed a chapter on the location data in a book that covers forensic analysis of the iPhone.”

Read more

New Spin for OmniFind: Content Analytics

May 2, 2011

IBM has dominated my thinking with its bold claims for Watson. In the blaze of game show publicity, I lost track of the Lucene-based search system OmniFind 9.x. My Overflight system alerted me to “Content Analytics Starter Pack.” According to the April 2011 announcement:

The Starter Pack offers an advanced content analytics platform with Content Analytics and industry-leading, knowledge-driven enterprise search with OmniFind Enterprise Edition in a combined package. IBM Content Analytics with Enterprise Search empowers organizations to search, assess, and analyze large volumes of content in order to explore and surface relevant insight quickly to gain the most value from their information repositories inside and outside the firewall.

The product allows IBM licensees to:

  • Find relevant enterprise content more quickly
  • Turn raw text into rapid insight from content sources internal and external to your enterprise
  • Customize rapid insight to industry and customer specific needs
  • Enable deeper insights through integration to other systems and solutions.

At first glance, I thought IBM Content Analytics V2.2 was one program. It is not. OmniFind Enterprise Edition 9.1 has one set of hardware requirements at http://goo.gl/Wie0X, and the analytics component has another set at http://goo.gl/5J1ox. In addition, there are specific software requirements for each product.

The “new” product includes “improved support for content assessment, Cognos® Business Intelligence, and Advanced Case Management.”

image

Is IBM’s bundling of analytics and search a signal that the era of traditional search and retrieval has officially ended? Base image source: www.awesomefunnyclever.com

When you navigate to http://goo.gl/he3NR, you can see the different configurations available for this combo product.

What’s the pricing? According to IBM, “The charges are unchanged by this announcement.” The pricing seems to be based on processor value units, or PVUs. Without a link, I am a bit at sea with regard to pricing. IBM does point out:

For clarification, note that if for any reason you are dissatisfied with the program and you are the original licensee, you may obtain a refund of the amount you paid for it, if within 30 days of your invoice date you return the program and its PoE to the party from whom you obtained it. If you downloaded the program, you may contact the party from whom you acquired it for instructions on how to obtain the refund. For clarification, note that for programs acquired under the IBM International Passport Advantage Agreement, this term applies only to your first acquisition of the program.

Read more

Google and Mobile: Will the Pass from Web to Mobile Search Be Smooth?

April 25, 2011

Over the bunny weekend, I spoke with two people about the direction the Web is moving. In those informal conversations, I learned some interesting factoids. First, the Web today is different from the Web of five, even two, years ago. One person used the word “ephemeral” to describe much of the information that is available. I thought that “ephemeral” applied to Twitter “tweets” and some of the short content posted in the comments sections of blogs and other social media. As I learned, this definition is too narrow. The ephemeral nature of the Web applies to such content types as:

  • Dynamic Web pages such as those produced by airline ticket or hotel reservation systems. The content, which is mostly availability and price, changes with each screen refresh.
  • Junk pages that someone produces until the pages stop attracting traffic, often leaving no trace anywhere. To see an example, navigate to Webspace.com.
  • Test Web sites or blogs put up and then abandoned. To see an example, navigate to Captain Roy. The Web page stays behind, but the blog and its content are temporary.

I did not agree with the person’s approach to ephemera, but I did agree with the perception that the texture of information available via the Web is quite different today from what it was a few years back.

image

Can Google’s Web search pass the baton to Google mobile search without losing cadence, speed, or control?

The second conversation focused on the notion of the volume of data. I had heard some astounding and unsubstantiated claims about the rate of growth of digital information. One person told me that Web and organizational content was doubling every two months. This person was the president of a trendy software company, so I zipped my lip. But on the call over the weekend, a person who shall remain anonymous asserted, “Web content doubles every 72 hours.” Again, I did not push the issue, but that is a heck of a statement.

Two observations:

There is a lot of digital information, and some of it is clearly not intended to be substantive. Persistence, if it does occur, is accidental or irrelevant to the person creating the information. Other content is machine generated, like the Webspace.com “page”, and is little more than a placeholder or a way to generate ad revenue or click throughs.

Finding information in today’s environment is not particularly easy. The general purpose Web search engines like Bing.com and Google.com are able to provide pointers to more traditional Web content. To locate information that appears in a tweet, I have to exert considerable effort. For companies with distinct names, my Overflight service works okay, but some outfits have names that make it almost impossible to find them without lots of false drops to games, rock and roll, or other content that has appropriated a word, phrase, or semantic space. Examples include Brainware, Stratify, and Thunderstone.

Mobile search is the primary means of finding information for many people. On my trip to Hong Kong at the end of March 2011, I watched people in public spaces like the Starbucks at the giant mall near the central rapid transit station. There were a few laptops and iPads, but the majority of the people were using mobile devices. A similar uptake is evident in most big cities. Here in Harrod’s Creek, there are precious few people, so the one person using a clunky laptop at the Dairy Queen is out of the mainstream.

In the business section of my printed edition of the New York Times today (April 25, 2011), I read “Google, a Giant in Mobile Search, Seeks New Ways to Make It Pay.” The “it”, of course, is mobile search in particular and, more generally, mobile online information access. You may be able to read the story online, but the links often go dead. More ephemera, I suppose. Try this one, but no guarantees: http://goo.gl/Ebpnz.

Read more

Google, Traffic, English 101, and an Annoying Panda

April 21, 2011

I read a snippet on my iPad and then the full story in the hard copy of the Wall Street Journal: “Sites Retool for Google Effect.” You can find this story on page B4 of the version that gets tossed in the wet grass in Harrod’s Creek, Kentucky. Online? Not too sure anymore. This link may work. But, then again, maybe not.

The point of the story is that Google has changed its method of determining relevance. A number of sites, mostly unfamiliar to me, made the point that Google’s rankings are important to businesses. One example was One Way Furniture, an outfit that operates in Melville, New York. Another was M2commerce LLC, an office supply retailer in Atlanta, Georgia. My takeaway from the story is that these sites’ owners are going to find a way to deliver content that Google perceives as relevant.

image

A panda attack. Some Web site owners suffer serious wounds. Who are these Web site owners trying to please? Google or their customers? Image source: http://tomdoerr.wordpress.com/2011/03/25/whos-in-the-house-panda-in-da-house/

I don’t want to be too much like my auto mechanic here in Harrod’s Creek, but what about the customer? My thought is that an outfit posting information should ask, “What does our customer need to make an informed decision?” The Wall Street Journal story left me with the impression, which is probably incorrect, that the question has become, “What do I need to create so Google will reward me with a high Google rank?”

For many years I have been avoiding search engine optimization. When I explained how some of Google’s indexing “worked” on lecture tours for my 2004-2005 Google monograph, The Google Legacy, pesky SEO kept popping up. Google has done a reasonable job of explaining how its basic voting mechanism works. For those of you who are fans of Jon Kleinberg, you know that Google was influenced to some extent by Clever. There are other touch points in the Backrub/Google PageRank methods disclosed in the now famous PageRank patent. Not familiar with that document? You can find a reasonable summary on Wikipedia or in my The Google Legacy.
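
For readers who have not waded through the patent or the summaries, the basic voting mechanism fits in a few lines of code. Here is a minimal sketch of the original power-iteration idea, not Google’s production system and certainly not the layered 2011 version I discuss below:

    import numpy as np

    def pagerank(links, damping=0.85, iters=50):
        """Toy PageRank: links[i, j] = 1 when page j links to page i."""
        n = links.shape[0]
        out = links.sum(axis=0)          # each page's outbound link count
        out[out == 0] = 1                # crude fix for dangling pages
        M = links / out                  # a page splits its vote among its outlinks
        rank = np.full(n, 1.0 / n)
        for _ in range(iters):
            rank = (1 - damping) / n + damping * M @ rank
        return rank

    # Three pages: 0 and 1 both link to page 2; page 2 links back to page 0.
    A = np.array([[0, 0, 1],
                  [0, 0, 0],
                  [1, 1, 0]], dtype=float)
    print(pagerank(A))                   # page 2 collects the most "votes"

Everything Google has added since, the layers in the sweater-and-coat metaphor below, sits on top of a core no more exotic than this.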

If we flash forward from 1996, 1997, and 1998 to the present, quite a bit has happened to relevance ranking in the intervening 13 to 15 years. First, note that we are talking about more than a decade. The guts of PageRank remain, but the method has been treated the way my mother treated a cold day. She used to put on a sweater. Then she put on a light jacket. After adding a scarf, she donned her heavy wool coat. Underneath, it was still my mom, but she added layers of “stuff” to keep her warm.

image

All wrapped up, just slow moving with reduced vision. Layers have an operational downside.

That’s what has happened, in part, to Google. The problem with technology is that if you build a giant facility, it becomes difficult, time consuming, and expensive to tear big chunks of that facility apart and rebuild it. The method of change taught in MBA class is to draw a couple of boxes, babble a few buzzwords, get a quick touch of Excel fever, and then head to the squash court. The engineering reality is that the MBA diagrams get implemented incrementally. Eventually the desired rebuild is accomplished, but at any point there is a lot of the original facility still around. If you took an archaeology class for something other than the field trips, you know that humans leave foundations, walls, and even gutters in place. The discarded material is then recycled into the “new” building.

How does this apply to Google? It works the same way.

How significant are the changes that Google has made in the last few months? The answer is, “It depends.”

Google has to serve a number of different constituencies. Each constituency has to be kept happy, and the “gravity” of each constituency carefully balanced. Algorithms, even Google algorithms, are still software. Software, even smart software that scurries to a lookup table to get a red-hot value or weight, is chock full of bugs, unknown dependencies, and weird actions that trigger volleyball games or some other mind-clearing activity.

image

Google has to make progress and keep its different information “packages” in balance and hooked up.

The first constituency is the advertiser. I know you think that giant companies care about “you” and “your Web site”, but that is just not true. I don’t care about individuals who have trouble using the comments section of this blog. If a user can’t figure something out, what am I supposed to do? Call WordPress and tell them to fix the comments function because one user does not know how to fill in a Web form? I won’t do that. WordPress won’t do that. I am not confident you, gentle reader, would do that. Google will fiddle with its relevance method only because there are some very BIG reasons to take a step as risky and uncharted as slapping another layer of functionality on top of the aging PageRank method. My view is that Google is concerned enough to fool with the plumbing because it is aware that the golden goose of Adwords and Adsense is honking in a manner that signals distress. No advertisers, no Google. Pretty simple equation, but that’s one benefit of living in rural Kentucky. I can only discern the obvious.

Read more
