Beyond Search’s Search Function Back On Track

August 10, 2008

I have had many positive comments about the search function for my Web log “Beyond Search”. Last week, we had reports of current postings not appearing in the index. Our hosting company had in place a method to block certain clickstreams when certain conditions were detected by the hosting company’s automated systems. The increasing demand for access to the site and the additional content indexed by the Blossom search system caused a slowdown in “Beyond Search.” The hosting company, Blossom.com, and my engineering team have resolved the problem. Thanks for your patience. Blossom.com’s Web log indexing system continues to delight me. If you are looking for a search system for a Web site or a Web log, please navigate to http://www.blossom.com and check out the company. Feel free to mention that Beyond Search is happy. I’m sufficiently happy to award the Blossom.com team three happy quacks. We’re back to normal, but my normal may be different from your normal. Anyway, you can search for posts about SearchCloud, Sprylogics, and of course my favorite, SharePoint. Enjoy.

Stephen Arnold, August 10, 2008

Search Fundamentals: Cost

August 10, 2008

Set aside the fancy buzz words like taxonomies, natural language processing, and automatic classification. I want to relate one anecdote from a real-life conversation last week and then review five search fundamentals.

Anecdote

I’m sitting in a fancy conference room near Tyson’s Corner. The subject is large-scale information systems, not search. But search was assumed to be a function that would be available to the larger online system. And that’s where the problem with search fundamentals became a time bomb. The people in the room assumed that search was not a problem. One could send an email to one of the 300 vendors in the search and content processing market, negotiate a licensing deal, install the software, and move on to more important activities. After all, search was a mud flap on a very exotic sports car. Who gets excited about mud flaps?

The situation is becoming more and more common. I think it is a consequence of Googling. Most of the people with whom I meet in North America use Google for general Web search. The company’s name has become a verb, and the use of Google is becoming more ubiquitous each day. If I open Firefox, I have a Google search box available at all times.

If Google works, how hard can search be?

Five Fundamentals

I have created a table that lists five search fundamentals. Feel free to scan it, even recycle it in your search procurement background write ups. I want to make a few comments about each fundamental and then wrap up this essay with what seems to me to be an obvious caution. Table after jump.

Read more

NTT Video Search

August 10, 2008

Video search evokes thoughts of Autonomy and Google, rarely NTT, the Japanese communications giant. According to DigitalBroadcasting.com, NTT has a robust media search technology cleverly named Robust Media Search. You can read about the invention here. The write up is a news release, so it has a pro-NTT spin. Imagine that.

NTT is no newcomer to search. The company has been pecking away at media search, based on proprietary NTT technologies, since 1996.

Video search is important. In my own research, I find that many organizations plop a 20-something in front of a Web cam and capture minutes of video about a topic. Google, the search wizards, are among the worst offenders. Google records presentations accompanied by almost unreadable visuals about many topics. To find the occasional gem, I have to let the video drone as I listen for an interesting fact. The video search engines are not too good. There are false drops and often the need to run the entire video to locate the single point referenced in the search system. Grrr. I hate video.

Will the NTT invention make my life easier as I try to cope with the rising tide of rich media? If you want to learn more about NTT’s technology, you can navigate here and deal with the registration process. I refused to do this.

Here’s what I have pieced together about this new search technology:

  • Search is one component of a number of rich media services. These range from distribution to digital fingerprinting. More about this is here.
  • The wizard identified with the search technology is Takayuki Kurozumi, Ph.D. Bio here.
  • A field trial with BayTSP began in April 2008. More about the trial is here. Information about BayTSP is here. The “TSP” is an acronym for “track, secure, and protect”.

I have not been able to locate public information about the outcome of this test. Based on my experience with Japanese search systems, the technology may find its way into a network service. The “search” does not necessarily mean that I will be able to look for a video. The “search” may be a function for a copyright holder to locate and track video or audio content used without permission of the copyright holder.

Stephen Arnold, August 10, 2008

Ads Soften, Google Crumbles: Doomsday Approaches

August 10, 2008

Investor’s Business Daily’s Pete Barlas reports that online advertising may be crumbling. His story “Survey Indicates Economy Even Taking Toll On Search Ads” relies on data from a Covario survey that supports the assertion “search ad spending last quarter rose by the smallest percentage since at least the start of 2007.” You can read the Investor’s Business Daily article here. In the best tradition of good news and bad news, Mr. Barlas reviews both sides of Covario’s research findings. I’m a bad news type, and my mind considered the impact of sharply lower ad spending on Google. Because Google has the lion’s share of the online ad market, any downturn kicks Googzilla in the shin. With a big enough downturn, Google could experience periostitis. The problem is not fatal, but it may slow the giant, giving Microsoft and other competitors an opportunity to make headway.

Stephen Arnold, August 10, 2008

Google: More Cause for Doubt

August 10, 2008

SFGate.com offers interesting views of business actions. The article “The AOL Flub Has Analysts Revisiting Google” delivers on two counts. First, Ryan Kim summarizes Google’s admission that its investment in America Online has lost value, lots of value. Second, the write up rekindles the argument that Google’s attempts at diversification have failed. You can read the August 9, 2008, story here. Mr. Kim revisits Google’s scattershot product development and reminds the reader that Google has been distracted by investments in companies such as YouTube.com, which has become a magnet for litigation and a challenge to monetize. Google may have overpaid for such properties as DoubleClick.com, and it has gobbled small companies and done nothing to make them grow. More troublesome is Google’s interest in technologies unrelated to its core business; for example, energy and space travel. For me the most important point in the article was this statement:

“Other than search, what has Google done right? They have 1,001 products in beta, but what’s been successful?” Chowdhry [an analyst quoted by Mr. Kim] asked. “There has been a sequence of missteps and failures, and this is not the end. They miscalculated the valuation of AOL, and this is the first time they’re admitting to it.”

Google has a dominant position in Web search and advertising. The company has a track record of success in online advertising. Is it now time to reassess Google as a company with a single business model and little else?

Stephen Arnold, August 10, 2008

Sinequa Inks OEM Deal with Oxaproc

August 9, 2008

Sinequa’s information retrieval solution will be integrated by Oxalys Technologies into Oxaproc, according to ITRNews.com. Oxaproc is an e-procurement system. Antoine Renard, director of development at Oxalys, said that ease of integration was a factor in the company’s decision to license Sinequa CS. OEM deals are highly prized among search and content processing vendors. A typical deal involves up front cash and a royalty. Once an information retrieval engine has been embedded in an enterprise application, ripping and replacing can give engineers and customers migraines. You can learn more about Oxalys here. Specific information about Oxaproc e-procurement is here. Details about Sinequa are here. You can read the interview with Sinequa’s top gun, Jean Ferré, here.

Stephen Arnold, August 9, 2008

Intel: What Business Is It In?

August 9, 2008

Intel’s push in cloud computing strikes me as a “me too” response to a customer rebellion that is brewing. Maintaining servers, struggling with heat and power consumption costs, and the mind-numbing wackiness of enterprise software fuel the shift. Intel, in a search for more revenue, is looking for a two-fer.

Intel wants to grow its revenue, particularly in its semiconductor business, and Intel wants a bigger piece of the action in cloud computing. Can Intel perform this trick? This is a difficult question to answer. Now Intel seems to be probing other markets as well.

On August 8, 2008, Intel surprised me with its release of its Summary Statistics Library. You can read the Web log post by Dmitry Kabaev here. You can download the library here. There is also an installation guide available from the download page. You can choose either the Linux or the Windows library. There are two low key requests for your email, but as far as I could tell, I was able to suck down the libraries without registering. If you want to participate in Intel forums, you will have to cough up some information, but I register my dog, who seems quite happy to ignore his email.

The stats pack is part of Intel’s Whatif.Intel.com initiative. Intel wants to be a good open source citizen, and it is an excellent way to allow developers to start mud wrestling with programming for massively parallel systems. Intel is upfront about this point, describing the library as “a set of algorithms for parallel processing of multi-dimensional datasets. It contains functions for initial analysis of raw data which allow investigating structure of datasets and get their basic characteristics, estimates, and internal dependencies.”

You can whack on data sets with:

  • Basic statistics. Algebraic and central moments up to 4th order, skewness, kurtosis, variation coefficient, quantiles and order statistics.
  • Estimation of Dependencies. Variance-covariance/correlation matrix, partial variance-covariance/correlation matrix, pooled/group variance-covariance/correlation matrix.
  • Data with Outliers. The Intel® Summary Statistics Library contains a tool for detection of outliers in a dataset. Also the library allows computing robust estimates of the covariance matrix and mean in presence of outliers.
  • Missing Values. Data which contains missing values can be effectively processed using modern algorithms implemented in the package.
  • Out-of-Memory Datasets.  Many algorithms of the library support data which cannot fit into the physical memory processing huge data arrays in portions. Specifically, variance-covariance matrix estimators, algebraic and central moments, skewness, kurtosis, and variation coefficient can process a dataset in portions.
  • Various Data Storage Formats. The Intel Summary Statistics Library supports in-rows and in-columns storage formats for datasets, full and packed format for variance-covariance matrix.

The libraries support C and Fortran90/95.
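The “in portions” idea in the list above is easy to illustrate. What follows is a minimal Python sketch, not Intel’s code and not the library’s API: the class name `RunningStats` and the function `summarize` are my own hypothetical names. It assumes the data arrives as a sequence of chunks and uses the well-known pairwise merge formula to keep a running mean and variance without holding the whole dataset in memory.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass
class RunningStats:
    """Count, mean, and sum of squared deviations for the data seen so far.

    Hypothetical illustration; this is not the Intel library's interface.
    """
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of (x - mean)^2

    def update(self, chunk: Sequence[float]) -> None:
        """Fold one portion of the dataset into the running summary."""
        if not chunk:
            return
        # Summarize the chunk on its own first.
        c_n = len(chunk)
        c_mean = sum(chunk) / c_n
        c_m2 = sum((x - c_mean) ** 2 for x in chunk)
        # Merge the chunk summary into the running summary.
        total = self.n + c_n
        delta = c_mean - self.mean
        self.mean += delta * c_n / total
        self.m2 += c_m2 + delta * delta * self.n * c_n / total
        self.n = total

    @property
    def variance(self) -> float:
        """Sample variance (n - 1 in the denominator)."""
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

    @property
    def variation_coefficient(self) -> float:
        """Standard deviation divided by the mean."""
        if not self.mean:
            return float("nan")
        return self.variance ** 0.5 / self.mean


def summarize(chunks: Iterable[Sequence[float]]) -> RunningStats:
    """Process a dataset one portion at a time."""
    stats = RunningStats()
    for chunk in chunks:
        stats.update(chunk)
    return stats
```

A call such as `summarize([[2, 4, 4], [4, 5], [5, 7, 9]])` gives the same mean and variance as a single pass over all eight values, which is the property that lets a large file be read in pieces. Higher moments such as skewness and kurtosis can be merged in a similar way, but I have left them out to keep the sketch short.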

Intel has invested in Endeca, and I don’t think this is a casual greenfield seeding. Endeca’s technology performs some interesting processes on structured and unstructured content. I see no overt evidence that Intel is moving into information retrieval. I am tracking announcements like this stats pack as part of my research effort to figure out how Endeca figures in Intel’s plans.

While I root around for information, download the statistics libraries. My quick look revealed some useful work by Intel’s engineers, who merit a happy quack.

Stephen Arnold, August 9, 2008

Google’s Universe Is the Schrodinger Cat’s Meow

August 8, 2008

After a delightful flight delay, I sat down and scanned my email. A helpful reader sent me a link to “Schrodinger-Like PageRank Equation and Localization in the WWW.” (I am getting weird results when I try to insert the correct character in Schrodinger. The spelling is a quick zig zag around this problem.) You can read the essay on arXiv.org here. The research involves a number of experts, including some nuclear physicists and a Yahoo researcher. The assertion is:

PageRank can be expressed in terms of a wave function obeying a Schrodinger-like equation.

If true, Google’s PageRank can be calculated using the type of math that a garden variety physics student uses to pass a third year physics course. You can read more about this assertion at:

Several thoughts crossed my mind as I worked through these materials:

First, the assertion requires verification. The implication of the Web documents is that Google’s PageRank can be replicated with less computation. The implications of this for a company like Yahoo are significant. Adding a lightweight PageRank value to Yahoo’s index could–note the could–improve its query matching.

Second, if true, Google may have to become more open about the many factors it uses to make the PageRank method more useful than a method based on the maths referenced by the European researchers.

Third, Google’s image of the unassailable leader in Web search gets a scorch mark.
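For readers who want a reference point for what is being approximated, here is a minimal Python sketch of the textbook PageRank calculation by power iteration. It is not Google’s production method and it is not the Schrodinger-like formulation from the paper. The function name `pagerank`, the damping value of 0.85, and the tiny link graphs are my own assumptions for illustration.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 200) -> Dict[str, float]:
    """Textbook PageRank by power iteration on a small link graph.

    `links` maps each page to the pages it links to. Illustrative only.
    """
    # Collect every page that appears as a source or as a target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Rank held by pages with no outgoing links is spread evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {p: base for p in pages}
        # Each page passes a damped, equal share of its rank to its targets.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
        change = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if change < tol:
            break
    return rank
```

Calling `pagerank({"a": ["c"], "b": ["c"], "c": []})` ranks page `c` above `a` and `b`, because both of them link to it. The repeated passes over every link are where the computing cost comes from on a Web-scale graph, which is why a cheaper way to reach the same values would matter.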

I will revisit this subject when I am not sated with the luxury of air travel. More later.

Stephen Arnold, August 9, 2008

Company Profiles Coming to Beyond Search

August 8, 2008

I talked with the team working on this Web log today at lunch. After I bought everyone super burritos, I was able to gather some ideas for making the Beyond Search Web site more useful to me, the team, and the two or three readers out there.

The Search Wizards Speak series on ArnoldIT.com has been well received. Several of the interviews have been recycled and turned up in Web logs in lands far from rural Kentucky and our lone “authentic” Mexican restaurant. One of the people working there has a non-hill folk accent, so the Cantina Kentucky must be muy authentico.

The idea that emerged between mouthfuls of “authentic” burritos was to post one or two page profiles of the companies mentioned in the stories in the Web log. I thought the idea was pretty awful, but the burrito-sated Beyond Search team thought it was wonderful.

Here’s the plan.

I have developed on a restaurant napkin a rough outline for what should be included in each of the company profiles. A team member or one of the writers who work on this Web log will write the profile. I have gigabytes of info about search, and I will let the lucky journalist grind through these data and then tap other sources.

Each profile will have a comments section. If you want to add information or correct an error, use the comments form. Once a year, we will roll the comments into the baseline profile. In this way, you can get some basic information about the companies mentioned in the Web log. You can also update or correct the basic entry.

I think we will be cutting and pasting company information from search vendors’ Web sites. I am thinking about adding my unique stamp to each write up with my personal “likes and dislikes” for each system. My attorney says he wants to think about this “likes and dislikes” stuff, so stay tuned on that point.

Keep in mind that I do really meaty analyses of companies in the search and content processing business. The profiles, like the interviews in Search Wizards Speak, will provide some useful information but the juicy stuff will not be included.

So what’s juicy?

Well, I just completed ripping through Endeca’s patent documents. I have identified some upsides and downsides to the inventions disclosed. I have then worked through the publicly available information about Endeca, made a couple of calls, and thought about what I have learned. That type of detail is not going to be in these free two-page profiles. Some lucky or silly outfit is going to have to pay me for the slog through the golden prose of lawyers and engineers. The prose makes Henry James’s novels look like the script to the new Batman movie.

I want to post a couple of test profiles and invite comments. I will go slowly at first, but if I can get the kinks worked out, my goal is to have one profile every week or two.

One of the burrito eaters suggested I sell profiles to companies who want the Beyond Search team to write about a specific firm. I am a greedy goose, but I want to put that idea on the back burner until I figure out if this is a feasible activity. There’s a lot of email and chasing required to get an interview completed. I’m not sure about search company profiles. The idea of money is easier to experience than the actual process of squeezing a beet for nectar.

Watch this Web log for a link to the first profile. I’m thinking next week. Comments? Suggestions? Let me know in the comments section below this article.

Stephen Arnold, August 8, 2008

Will Microsoft Bring Home the Gold in the SharePoint Olympics?

August 8, 2008

The Olympics are underway. If you have any questions, you will want to navigate to the Beijing Organizing Committee for the Olympic Games’ portal here. Ooops. That’s not the SharePoint site, and this MSDN article “SharePoint Server 2007 Powers Beijing 2008 Olympic Games” does not include a link to the SharePoint site. You can read this post, dated August 5, 2008, here. The screenshot featured on the site does not look like any of the pages on the “official” site at http://en.beijing2008.cn/.

Here’s the “official” site’s look and feel:

[Image: official Olympics site]

And here’s the screen shot of Microsoft SharePoint and its “official” site:

[Image: Microsoft screenshot of the SharePoint site]

I think I have figured out what’s going on, but it would be nice if the MSDN post contained links to pages, not screenshots without a url or trackback link. You can navigate to a July 2008 case study here and learn more about this high profile opportunity for SharePoint. Here’s the architecture diagram for the Microsoft system:

[Image: architecture diagram]

Compared to the SharePoint placemat diagram here, it seems to me that this Olympics’ diagram is a simplified schematic.

One oddity is that the drop down box that one uses to specify the viewer’s country is tough to control. The video won’t play until you click on the country, but the scroll function is somewhat immature. The video is displayed on the NBColympics.com Web site, and I was puzzled by the design of that page.

A happy quack to the SharePoint team. Nothing but smooth sailing for the next couple of weeks.

Stephen Arnold, August 8, 2008
