Trust: Rhymes with Rust and Like Rust, Trust Erodes

August 6, 2011

We don’t do philosophy. But Discover Magazine has examined the idea of “trust” in “The Slow Decline of Trust over Time.” The article is full of graphs that illustrate trends in trust from 1972 to 2010, as reported annually by the General Social Survey at the University of California, Berkeley.

Writer Razib Khan explains,

“I realized that the General Social Survey has 2010 results available. This means that I could check any changes in public trust and confidence from 2008 to 2010! . . . . It seems that my intuition was wrong in that American society had slouched toward more general distrust.”

Well, no, not between 2008 and 2010, but it has gradually eroded since the survey was begun in ’72. Khan broke the results down by a number of factors, and the one that interests me regards “confidence in scientific community.” It shows that, since 2000, that confidence has gone down.

I wonder, does this mean that people are also losing confidence in Web and enterprise search technologies? This might be a factor in the future of search.

What’s trust have to do with search? Three points:

First, if a search system does not process a comprehensive, cohesive collection of content, the researcher will not get what’s called precision and recall. What comes out are results that do not represent the on-point information that matches the user’s query. Distortion can enter search results in many ways. Most users “trust” search systems. That’s probably not a great idea.

Second, if the search system lacks an editorial policy which makes an attempt to winnow disinformation from information, then the search system and its index can be distorted by certain actions. Search engine optimization experts know many ways to get a search system to display content which may not match the user’s query or the more fuzzy notion of “intent”.

Finally, as costs crush even the big boys of search, decisions may be made by humans or algorithms to introduce efficiencies. Costs may fall, but the index may deliver results which are wide of the mark, with the distance of the miss undetectable to all but an expert.

In short, once search systems generate distorted information, trust is what makes this situation persist. Most users will ask, “What’s the difference?” If you don’t know the answer to this question, trust those search systems. Life will be just fine.
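
For readers who want the first point in concrete terms, here is a minimal sketch of how precision and recall are usually computed for a single query. The document IDs and the function name are hypothetical, made up for illustration; no particular search system is implied.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = share of retrieved documents that are relevant
    recall    = share of relevant documents that were retrieved
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical IDs: four documents actually answer the query,
# and the system returns five, only two of which are on point.
relevant = {"d1", "d2", "d3", "d4"}
retrieved = ["d1", "d2", "d9", "d10", "d11"]

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A user looking only at the five results has no way to see the two relevant documents that never surfaced, which is why a miss can be hard to detect.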

Cynthia Murrell, August 6, 2011

Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search

Harmony Search Algorithm: a Jazz Legacy

August 6, 2011

Will you Harry Me? describes an unusual algorithm in “Neat algorithms—Harmony Search.” It’s based on principles of jazz musicians. What could be better?

The write up explains:

“The central idea is that when trying to solve some given optimization problem, you have some set of input variables that can be evaluated for their quality, and you want to know what inputs produce the best quality. Metaheuristic algorithms try to find this global optimum using some strategy which is better than brute force. For problems where it is hard to decipher why changing an input changes the quality (and thus the optimal solution isn’t very obvious), these algorithms are extremely useful. Harmony search and its siblings in this category do not guarantee that the globally optimal solution will be found, but often they do find it, and they are often much more efficient than an exhaustive brute force search of all input combinations.”

The article includes a nifty demo that illustrates the concept. It also provides CoffeeScript code examples.

Writer Harry Brundage provides an example of a problem with which harmony search can help: he needs to know how much time he should spend studying and how much sleeping in order to get the best grade possible. The results are shown as a heat map. If you are a tech type, you may want to click and explore.
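
For the tech types, here is a minimal Python sketch of the general harmony search idea, applied to a study-versus-sleep problem. It is not Mr. Brundage's CoffeeScript code. The `grade` function is a made-up stand-in with its peak at 6 hours of study and 8 hours of sleep, and the parameter values (memory size, `hmcr`, `par`, `bandwidth`) are assumptions chosen for illustration.

```python
import random


def harmony_search(quality, bounds, memory_size=10, hmcr=0.9, par=0.3,
                   bandwidth=0.2, iterations=5000, seed=0):
    """Maximize quality(x) over the box given by bounds.

    hmcr: chance of reusing a value from the harmony memory
    par:  chance of nudging a reused value ("pitch adjustment")
    """
    rng = random.Random(seed)
    # Start with a memory of random candidate solutions ("harmonies").
    memory = [[rng.uniform(lo, hi) for lo, hi in bounds]
              for _ in range(memory_size)]
    for _ in range(iterations):
        candidate = []
        for i, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:
                # Borrow this variable from a remembered harmony...
                value = rng.choice(memory)[i]
                if rng.random() < par:
                    # ...and sometimes shift it slightly.
                    value += rng.uniform(-bandwidth, bandwidth)
            else:
                # Otherwise improvise a completely new value.
                value = rng.uniform(lo, hi)
            candidate.append(min(hi, max(lo, value)))
        # Keep the new harmony only if it beats the worst one in memory.
        worst = min(memory, key=quality)
        if quality(candidate) > quality(worst):
            memory[memory.index(worst)] = candidate
    return max(memory, key=quality)


def grade(hours):
    """Hypothetical grade: best at 6 hours of study and 8 hours of sleep."""
    study, sleep = hours
    return 100 - (study - 6) ** 2 - (sleep - 8) ** 2


best = harmony_search(grade, bounds=[(0, 12), (0, 12)])
print(f"study={best[0]:.2f}h sleep={best[1]:.2f}h grade={grade(best):.2f}")
```

As the quoted passage notes, nothing here guarantees the global optimum. On a smooth, single-peaked function like this one the search settles near the peak, but a bumpier quality function may need different settings or more iterations.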

Cynthia Murrell, August 6, 2011

Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search

Google Conspiracy or Poorly Designed Web Sites?

August 5, 2011

It’s got to be tough being alpha dog. At least, it seems that way for Google, which has one of the largest, most-used search engines in the world. With a slew of patent infringement lawsuits pending and several states looking into antitrust issues associated with the top dog, yet another company is complaining about Google’s business practices, as explained in the article “Local Business Site Challenges Google Ranking” on SiliconValley.com.

How search engines determine ranking is a closely guarded secret, a series of algorithms that can make or break websites, depending on where they fall in the rankings. This is precisely what ShopCity is complaining about. According to the small company, Google is “manually monkeying” with the rankings so that ShopCity sites appear lower than competing Google-owned sites.

Google asserts that ShopCity sites are low in the ranking because…well, they are basically bad sites. While ShopCity admits they are still working on building several of their sites (meaning they know their sites are rotten), many of the sites in the Bay Area, like ShopPaloAlto and ShopPleasanton, are alive and stuffed full of helpful and legitimate information. They believe those sites should be higher up in the rankings, as they are on Yahoo!

“Search industry expert Danny Sullivan, editor in chief of Search Engine Land, said such suspicions about a site as small as ShopPaloAlto.com are “ludicrous. If that was what (Google) was worried about, you would never find Yelp,” a formidable competitor for Google that offers restaurant reviews and business listings, Sullivan said. But Sullivan said Google should be able to differentiate between higher-quality ShopCity sites such as the Bay Area sites, and placeholder sites waiting until ShopCity makes partnerships with local groups for listings.”

Is ShopCity going to be just another flea on Google’s back, or will something come from their claims? Coincidentally, after the FTC inquiry was announced, ShopCity’s Bay Area sites jumped in Google rankings, causing a 400% increase in traffic, but then plummeted back to page seven of search results after only three weeks. A Google-imposed penalty for outside complaints is the official explanation.

Catherine Lamsfuss, August 5, 2011

Sponsored by Quasar CA, your source for informed financial advisory services

Rich Media Search May Become Expensive and Slow

August 5, 2011

Bandwidth hogs, watch out! ReadWriteWeb warns, “AT&T to Start Data Throttling, How Will It Affect Users?”

The impending throttle will begin on October first. AT&T 3G users who have “unlimited” data plans (hah!) will see their speeds artificially reduced if they reach a certain bandwidth threshold. Just what that threshold will be is still a mystery, but writer Dan Rowinski dug up some details:

“9to5Mac gives some guidelines on to what kind of usage will achieve reaching the throttling threshold. The site says 12,000 emails or website visits, four streaming movies or five hours of streaming music. That all makes sense except for the last bit, which may be a typo as five hours of music certainly will not eat anywhere near 2.5 GB of data that is expected to cue the throttling.”
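
Mr. Rowinski’s doubt about the music figure is easy to check with rough arithmetic. The sketch below assumes a 128 kbps audio stream and decimal units (1 GB = 1,000 MB). Both are our assumptions, not figures from AT&T or 9to5Mac, and the function name is made up for illustration.

```python
def stream_megabytes(bitrate_kbps, hours):
    """Approximate data used by a constant-bitrate stream, in megabytes."""
    bits = bitrate_kbps * 1000 * hours * 3600
    return bits / 8 / 1_000_000


ASSUMED_MUSIC_KBPS = 128        # assumption: a typical streaming bitrate
THRESHOLD_MB = 2.5 * 1000       # the 2.5 GB figure cited in the quote

music_mb = stream_megabytes(ASSUMED_MUSIC_KBPS, 5)
hours_to_threshold = THRESHOLD_MB / stream_megabytes(ASSUMED_MUSIC_KBPS, 1)

print(f"5 hours of music: {music_mb:.0f} MB")
print(f"hours of music needed to reach 2.5 GB: {hours_to_threshold:.1f}")
```

Under those assumptions, five hours of music comes to 288 MB, roughly a ninth of the threshold, and it would take about 43 hours of listening to hit 2.5 GB. A higher bitrate shortens that, but not by enough to make five hours plausible.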

AT&T helpfully points to some activities that tend to gulp down data: streaming video, remote web camera apps, sending large files (like uploading to cloud storage), and online gaming. In other words, everything that makes the Web what it is today.

Bottom line: tiered data plans (you know that’s where this leads, right?) are a money machine and AT&T wants to have its share. Ironically, better search leads to more data flow, so more search is good for AT&T; what’s good for AT&T is good for America.

Consumers who just let background processes update, download rich media without much thinking, and gobble up chunky online apps will be paying a lot for their data gluttony. Users will just have to cope.

What are the implications for rich media search? “Free” will come with a price. Welcome to the new datasphere!

Cynthia Murrell, August 5, 2011

Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search

IBM Sets New File Scanning Record

August 5, 2011

IBM’s announcements fascinate us. The company releases information about products, services, and inventions and then we don’t hear too much about them. We still are waiting for a live demo of the search prowess of Watson. We think indexing Wikipedia would be a good start, but it seems that Watson has developed an interest in medicine. No problem. We’re patient. (No pun intended.)

We liked the write up “IBM System Scans 10 Billion Files in 43 Minutes” from TechEye.net. That beats IBM’s own previous record, set back in 2007. Writer Matthew Finnegan elaborates:

“IBM has successfully scanned 10 billion files in just 43 minutes, opening the doors to access of zettabytes of information storage. This means a massive improvement on the previous record, a relatively sluggish one billion files scanned in three hours.”

Changes credited for the success include relying on a single platform data environment and management task simplification. Also, an algorithm was devised that maximized use of all ten eight-core systems in the General Parallel File System. Researchers expect this accomplishment to point the way to ever greater data management efficiency in the future.
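
To put the two records on the same footing, the reported figures can be converted to files per second. The sketch below uses only the numbers quoted above. The per-core figure assumes the work was spread evenly over all 80 cores, which is our simplification and not something IBM states.

```python
def files_per_second(files, minutes):
    """Average scan rate implied by a total file count and elapsed time."""
    return files / (minutes * 60)


new_rate = files_per_second(10_000_000_000, 43)      # 10 billion in 43 minutes
old_rate = files_per_second(1_000_000_000, 3 * 60)   # 1 billion in three hours
speedup = new_rate / old_rate

# Assumption: load spread evenly over ten eight-core systems.
per_core = new_rate / (10 * 8)

print(f"new record:  {new_rate:,.0f} files/sec")
print(f"old record:  {old_rate:,.0f} files/sec")
print(f"speedup:     {speedup:.1f}x")
print(f"per core:    {per_core:,.0f} files/sec")
```

That works out to roughly 3.9 million files per second against about 93,000 before, a speedup of about 42 times. It is still a comparison of IBM against IBM.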

Our view is that this seems like a lot of files, but without a comparison against some other vendors of high speed file access, we interpret the number as similar to Amazon’s reporting of how successful Amazon Web Services is. We think Amazon is successful, but the metrics are tough to anchor to something to which we can relate. IBM is, it appears, emulating Amazon’s approach to unanchored metrics.

Our question: when will we see these different and amazing technologies in Watson? When will we see a third party analysis of file scanning speed or better yet, an article from a customer detailing the method and payoff from IBM’s remarkable technology?

Cynthia Murrell, August 5, 2011

Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search

Are Panda and Android Influencing One Another?

August 5, 2011

I admit that the Panda initiative to improve Google search results and the Android mobile operating system juggernaut are an odd couple. What gave me the idea was the article “Windows Phone Revenue ‘Abysmal’, Still Better Than Android.” The article explains that the Windows Phone is not generating a great deal of revenue in terms of Microsoft’s cash throughput. Here’s the passage that caught my attention:

Nick Eaton of the Seattle PI calls this “abysmal” and depending on how you look at it, perhaps. Compared to Xbox, sure. Compared to Android? Not so much. After all, Google makes $0 from Android sales, though they do take in some money through the limited advertising on the phone. In that sense, making money off of the OS is not Google’s goal, but market saturation is. The same is the same for Microsoft at this point. While they do charge for licenses, it’s not exactly an area of revenue for them, nor are they banking on it (pun alert). However, neither was Xbox which took 5 years to turn a profit (and after losing billions). [Emphasis added.]

My thought is that Panda may be a significant step because the changes are designed to keep traditional online Web ad revenues pumping cash. My hunch is that my juxtapositions are often off the bull’s eye. On the other hand, my idea is anchored in what seems to be a simple assertion in the Phone Revenue Abysmal write up.

I am now at lunch (August 1, 2011) and here are the three points from the goslings (my code word for my colleagues):

  1. Panda and Android are not related. Google will monetize Android at some point in the future.
  2. Panda is more important as a signal that Google has work to do with its core relevance method.
  3. Google has to get more money from Panda because Android and other mobile devices force a different approach to search.

I am going to stick with the revenue issue and I like the point about changing search behavior. Stay tuned.

Stephen E Arnold, August 5, 2011

Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search

A Hypothetical 10-Server SharePoint Farm

August 5, 2011

Are you curious about how to increase your circuit bumper crop? Or how to deploy the newest network streams to water content libraries and documents? Or how about syncing web site uploads so that they will rise and fall with the sun, making your SharePoint chores easier?

If so, the SharePoint Developer Team Blog has published “Services Running in a Multi-Server SharePoint Farm” to describe the possible configured farm-scoped instantiations (CFSIs) and service instances on a hypothetical 10-server farm. They have a graphic that shows all the configurations, maps, and layout of the farmyard. You really need to see it in order to understand how everything fits together.

Diagnostics Services run on all servers. There are five front-end web servers to run the Web Service Application Service. An entire server is dedicated to search and two are dedicated to Business Data Connectivity. The author says:

“When the SharePoint Foundation databases are on a dedicated server, as in this case, SharePoint Foundation need not be installed on that server. The Database Service is just a wrapper for the SQL Server service running on the database server. Hence, SharePoint Foundation code is not running on the dedicated database server. The service and its instance appears in the figure because it is represented in the object model with the SPDatabaseService and SPDatabaseServiceInstance classes.”

There’s a lot going on here and it would be amazing to see one of these farms in reality. We challenge you to make one!

Since an entire server is dedicated to search, use SurfRay Ontolica to enhance the users’ experience.

Whitney Grace, August 5, 2011

Sponsored by SurfRay, developers of Ontolica for SharePoint

Search Innovation: The Apodora Snake in the Google Garden of Eden

August 4, 2011

I wrote about the mercurial nature of innovation in one of our longer postings this week. You can find my “Innovation” essay here. I summarized the findings of work in which I was involved 40 years ago. The main point was that big companies spend a significant amount for innovation but find that innovation is tough.

Will this snake find its way to Google? The apodora papuana. Source: http://morelia-viridis.winnerbb.com/t2343-apodora-papuana

Now think about the information in “Science Fair Gold Medalist, 17, Invents Better Way to Search Internet.” The news story describes Mr. Schiefer, who is a teenager, and his approach to searching certain social media. Think Twitter messages, which are short, cryptic, and often context free. Here’s the passage I noted:

Seventeen-year-old Nicholas Schiefer has found a better way to search small documents, such as tweets and Facebook statuses – all for his Grade 11 science fair project. The Pickering resident created an algorithm to filter through, and find relevant information. Created using linear algebra and discrete math, his algorithm is named “Apodora” after a python species with extraordinary search capabilities.

The Globe and Mail report includes several interesting quotes from the young search wizard. Here are three I marked and filed for future recycling:

  • The genius in Facebook was not so much algorithmic, but in the social aspect of the network. What [Mr. Zuckerberg] managed to create very well was a desire. In search in general, we already have the desire to search. The technology is trying to catch up to what people expect.
  • My algorithm tries to follow connections further. Connections that are close are deemed more valuable. In theory, it follows connections to an infinite degree. One thing which I really liked about my algorithm is that it didn’t rely on my hand coding almost anything. The computer was able to infer that certain words were related.
  • It’s been shown that people are increasingly reading shorter and shorter documents.
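
The second quote describes following connections between words and valuing close connections more than distant ones. The sketch below is not Apodora, whose details the news story does not give. It is a generic illustration of that one idea: words that share a short document are linked, and a word's relatedness to the query term decays with each extra hop. The `decay` and `max_hops` values, the function names, and the sample "tweets" are all assumptions.

```python
from collections import defaultdict, deque


def build_graph(documents):
    """Link every pair of words that appear together in a short document."""
    graph = defaultdict(set)
    for doc in documents:
        words = set(doc.lower().split())
        for word in words:
            graph[word] |= words - {word}
    return graph


def relatedness(graph, start, decay=0.5, max_hops=3):
    """Score words by hop distance from start: decay ** hops.

    Breadth-first search guarantees each word is scored at its
    shortest distance, so closer connections score higher.
    """
    scores = {}
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        word, hops = frontier.popleft()
        if hops == max_hops:
            continue
        for neighbor in graph.get(word, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                scores[neighbor] = decay ** (hops + 1)
                frontier.append((neighbor, hops + 1))
    return scores


# Hypothetical two-word "tweets".
tweets = ["python snake", "snake reptile", "reptile zoo"]
scores = relatedness(build_graph(tweets), "python")
print(scores)
```

Nothing is hand coded about which words are related. “Zoo” ends up weakly related to “python” only because a chain of shared documents connects them. The hop limit is a practical cutoff in this sketch, whereas Mr. Schiefer says his method follows connections indefinitely in theory.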

Several thoughts. The use of a snake’s name reminded me that industry giants can be bitten unexpectedly. Search is a work in progress, difficult, and sufficiently expansive to permit numerical recipes to deliver potentially tasty results. Google will probably hire the lad.

So what? As I said in my “Innovation” essay, innovation is often easier to buy than cultivate at home.

Stephen E Arnold, August 4, 2011

Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search

Thoughts from an Industry Leader: Margie Hlava, Access Innovations

August 4, 2011

Here are some astute observations on the direction of enterprise search from someone who knows what she’s talking about. Library Technology Guides points to an interview with Margie Hlava, president of Access Innovations, in “Access Innovations founder and industry pioneer talks about trends in taxonomy and search.”

Ms. Hlava’s 33 years in the search industry informed her observations on current trends, three of which she sees as significant: Cloud and Software as a Service (SaaS) computing, term mining, and the demand for metadata.

The move to the Cloud and SaaS computing demands more of our hardware, not less, Hlava insists. In particular, broadband networks are struggling to keep up. One advantage of the shift is a declining need to navigate labyrinths of hardware, software, and even internal politics on the client side. Other pluses are the motion toward increased data sharing and service enhancement. Also, more ways to maintain security and intellectual property rights are on the horizon.

Term mining is “a process involving conceptual extraction using thesaurus terms and their synonyms with a rule-base, then looking for occurrences to create more detailed data maps,” according to Hlava. Her company leverages this concept to make the most of clients’ large data sets. She is interested in new angles like mashups, data fusion, visualization, linked data, and personalization, but with a caveat: success in all these depends on the quality of the data itself. “Rotten data gives rotten results.”
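
To give a feel for the thesaurus part of that definition, here is a toy sketch that maps synonyms to a preferred term and counts occurrences in a text. It is not Access Innovations’ method or tooling, and it leaves out the rule base entirely. The thesaurus entries, the function name, and the sample sentence are invented for illustration.

```python
import re
from collections import Counter

# Hypothetical thesaurus: preferred term -> variants and synonyms.
THESAURUS = {
    "enterprise search": ["enterprise search", "intranet search"],
    "taxonomy": ["taxonomy", "taxonomies", "controlled vocabulary"],
}


def mine_terms(text, thesaurus):
    """Count whole-phrase occurrences of each variant under its preferred term."""
    lowered = text.lower()
    counts = Counter()
    for preferred, variants in thesaurus.items():
        for variant in variants:
            pattern = r"\b" + re.escape(variant) + r"\b"
            counts[preferred] += len(re.findall(pattern, lowered))
    # Report only the concepts that actually occur.
    return {term: n for term, n in counts.items() if n}


sample = ("Good taxonomies improve intranet search. "
          "A taxonomy is a controlled vocabulary.")
term_counts = mine_terms(sample, THESAURUS)
print(term_counts)
```

Even in a toy, Ms. Hlava’s caveat applies: the output is only as good as the thesaurus and the text fed into it.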

Ms. Hlava regards taxonomies and other metadata enrichment as the way to bring efficiency to our searches. In that realm, the benefits have only begun:

“In terms of taxonomies and search, ‘I think we have just scratched the surface. With good data, our clients are in a good position to do an incredible array of new and interesting things. Good taxonomies take everything to the next level, forming the basis of not only mashups, but also author networks, project collaborations, deeper and better information retrieval,’ she concluded.”

Wise words from a wise woman. We look forward to observing these predictions take shape as the search industry moves forward. The interview with Margie Hlava can be read in full here.

Access Innovations offers a wide range of content management services. The company has been building its semantic-based solutions for over thirty years and prides itself on its unique tool set and experienced personnel.

Stephen E Arnold, August 4, 2011

Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search

New Countries for Yahoo-Microsoft Search Alliance

August 4, 2011

Yahoo’s partnership with Microsoft is in the driver’s seat, as Search Engine Journal explains in “Yahoo Unrolls Search Alliance to 6 New Countries.” The deal has Microsoft supporting Yahoo by managing the mechanics of the search engine and providing search advertisements. However, Yahoo is remitting transition costs and a percentage of ad revenue. Writer Rob D. Young notes:

“One of the most clear things is that the search alliance will become less costly once it’s complete. At that point, Yahoo will be able to drop its back-end support in countries where Microsoft hasn’t yet taken the reigns, and transition costs will no longer be deducted from the total company income. So it’s good news for Yahoo that the transition to Microsoft has completed in another six regions.”

Argentina, Chile, Colombia, New Zealand, Peru, and Venezuela are the new areas, while more in Europe and Asia are on their way. Yahoo search is being customized for each region. Full migration should be completed by the end of this year.

The company’s second quarter earnings report confirms that these transitions are crucial to its bottom line. Bing has been in the news lately, but we think that Bing will persist for the foreseeable future. Microsoft cannot concede search advertising to Google, at least not yet.

Cynthia Murrell, August 4, 2011

Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search
