Harmony Search Algorithm: a Jazz Legacy

August 6, 2011

Will you Harry Me? describes an unusual algorithm in “Neat algorithms—Harmony Search.” It’s based on principles of jazz musicians. What could be better?

The write up explains:

“The central idea is that when trying to solve some given optimization problem, you have some set of input variables that can be evaluated for their quality, and you want to know what inputs produce the best quality. Metaheuristic algorithms try to find this global optimum using some strategy which is better than brute force. For problems where it is hard to decipher why changing an input changes the quality (and thus the optimal solution isn’t very obvious), these algorithms are extremely useful. Harmony search and its siblings in this category do not guarantee that the globally optimal solution will be found, but often they do find it, and they are often much more efficient than an exhaustive brute force search of all input combinations.”

The article includes a nifty demo that illustrates the concept. It also provides CoffeeScript code examples.

Writer Harry Brundage provides an example of a problem with which harmony search can help: he needs to know how much time he should spend studying and how much sleeping in order to get the best grade possible. The results are shown as a heat map. If you are a tech type, you may want to click and explore.
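For the tech types, here is a minimal Python sketch of the general harmony search idea applied to a study-versus-sleep problem. It is not Mr. Brundage's CoffeeScript code. The `grade` function, the parameter names (`hmcr`, `par`, `bandwidth`), and their values are our own assumptions for illustration, with the best grade arbitrarily placed at six hours of study and eight hours of sleep.

```python
import random

def harmony_search(quality, bounds, memory_size=10, hmcr=0.9, par=0.3,
                   bandwidth=0.5, iterations=2000, seed=0):
    """Maximize quality(inputs) where each input stays within its bounds."""
    rng = random.Random(seed)
    # Harmony memory: a small pool of random starting candidates.
    memory = [[rng.uniform(lo, hi) for lo, hi in bounds]
              for _ in range(memory_size)]
    for _ in range(iterations):
        candidate = []
        for i, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:
                # Reuse a value remembered from an earlier candidate...
                value = rng.choice(memory)[i]
                if rng.random() < par:
                    # ...and sometimes nudge it slightly (pitch adjustment).
                    value += rng.uniform(-bandwidth, bandwidth)
                    value = min(hi, max(lo, value))
            else:
                # Otherwise improvise a fresh random value.
                value = rng.uniform(lo, hi)
            candidate.append(value)
        # Replace the worst remembered candidate if the new one is better.
        worst = min(memory, key=quality)
        if quality(candidate) > quality(worst):
            memory[memory.index(worst)] = candidate
    return max(memory, key=quality)

def grade(inputs):
    # Hypothetical quality function, not taken from the article.
    study, sleep = inputs
    return 100 - (study - 6) ** 2 - (sleep - 8) ** 2

best = harmony_search(grade, [(0, 12), (0, 12)])
print("hours of study and sleep:", [round(v, 2) for v in best])
```

As the quoted passage warns, nothing here guarantees the global optimum; the search simply tends to drift toward it while evaluating far fewer combinations than a brute force sweep.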

Cynthia Murrell, August 6, 2011

Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search

Google Conspiracy or Poorly Designed Web Sites?

August 5, 2011

It’s got to be tough being alpha dog. At least, it seems that way for Google, which has one of the largest, most-used search engines in the world. With a slew of patent infringement lawsuits pending and several states looking into anti-trust issues associated with the top dog, yet another company is complaining about Google’s business practices, as explained in the article “Local Business Site Challenges Google Ranking” on SiliconValley.com.

How search engines determine ranking is a closely guarded secret, a series of algorithms that can make or break websites, depending on where they fall in the rankings. This is precisely what ShopCity is complaining about. According to the small company, Google is ‘manually monkeying’ with the rankings in order for ShopCity sites to appear lower than Google-owned competing sites.

Google asserts that ShopCity sites are low in the ranking because…well, they are basically bad sites. While ShopCity admits they are still working on building several of their sites (meaning they know their sites are rotten), many of the sites in the Bay Area, like ShopPaloAlto and ShopPleasanton, are alive and stuffed full of helpful and legitimate information. They believe those sites should be higher up in the rankings, as they are on Yahoo!

“Search industry expert Danny Sullivan, editor in chief of Search Engine Land, said such suspicions about a site as small as ShopPaloAlto.com are ‘ludicrous. If that was what (Google) was worried about, you would never find Yelp,’ a formidable competitor for Google that offers restaurant reviews and business listings, Sullivan said. But Sullivan said Google should be able to differentiate between higher-quality ShopCity sites such as the Bay Area sites, and placeholder sites waiting until ShopCity makes partnerships with local groups for listings.”

Is ShopCity going to be just another flea on Google’s back, or will something come from their claims? Coincidentally, after the FTC inquiry was announced, ShopCity’s Bay Area sites jumped in Google rankings, causing a 400% increase in traffic, but then plummeted back to page seven of search results after only three weeks. A Google-imposed penalty for outside complaints is the official explanation.

Catherine Lamsfuss, August 5, 2011

Sponsored by Quasar CA, your source for informed financial advisory services

Rich Media Search May Become Expensive and Slow

August 5, 2011

Bandwidth hogs, watch out! ReadWriteWeb warns, “AT&T to Start Data Throttling, How Will It Affect Users?”

The impending throttle will begin on October first. AT&T 3G users who have “unlimited” data plans (hah!) will see their speeds artificially reduced if they reach a certain bandwidth threshold. Just what that threshold will be is still a mystery, but writer Dan Rowinski dug up some details:

“9to5Mac gives some guidelines on to what kind of usage will achieve reaching the throttling threshold. The site says 12,000 emails or website visits, four streaming movies or five hours of streaming music. That all makes sense except for the last bit, which may be a typo as five hours of music certainly will not eat anywhere near 2.5 GB of data that is expected to cue the throttling.”
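A quick back-of-the-envelope check supports the writer's suspicion about the music figure. The sketch below assumes a 128 kbit/s music stream, which is our assumption and not a number from the article; real services vary. The 2.5 GB threshold is the one quoted above, treated as decimal gigabytes.

```python
# Assumption: a 128 kbit/s music stream; actual bitrates differ by service.
BITRATE_BITS_PER_SECOND = 128_000
# The 2.5 GB threshold quoted above, in decimal gigabytes.
THRESHOLD_BYTES = 2.5e9

def stream_bytes(hours, bitrate=BITRATE_BITS_PER_SECOND):
    """Bytes transferred by streaming for the given number of hours."""
    return hours * 3600 * bitrate / 8

five_hours_mb = stream_bytes(5) / 1e6
hours_to_threshold = THRESHOLD_BYTES * 8 / BITRATE_BITS_PER_SECOND / 3600

print(f"Five hours of music: about {five_hours_mb:.0f} MB")
print(f"Hours of music to reach 2.5 GB: about {hours_to_threshold:.0f}")
```

Under that assumption, five hours of music comes to roughly 288 MB, and it would take more than 40 hours of streaming to reach 2.5 GB.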

AT&T helpfully points to some activities that tend to gulp down data: streaming video, remote web camera apps, sending large files (like uploading to cloud storage), and online gaming. In other words, everything that makes the Web what it is today.

Bottom line: tiered data plans (you know that’s where this leads, right?) are a money machine and AT&T wants to have its share. Ironically, better search leads to more data flow, so more search is good for AT&T; what’s good for AT&T is good for America.

Consumers who just let background processes update, download rich media without much thinking, and gobble up chunky online apps will be paying a lot for their data gluttony. Users will just have to cope.

What are the implications for rich media search? “Free” will come with a price. Welcome to the new datasphere!

Cynthia Murrell, August 5, 2011


IBM Sets New File Scanning Record

August 5, 2011

IBM’s announcements fascinate us. The company releases information about products, services, and inventions and then we don’t hear too much about them. We still are waiting for a live demo of the search prowess of Watson. We think indexing Wikipedia would be a good start, but it seems that Watson has developed an interest in medicine. No problem. We’re patient. (No pun intended.)

We liked the write up “IBM System Scans 10 Billion Files in 43 Minutes” from TechEye.net. The result beats IBM’s own previous record, set back in 2007. Writer Matthew Finnegan elaborates:

“IBM has successfully scanned 10 billion files in just 43 minutes, opening the doors to access of zettabytes of information storage. This means a massive improvement on the previous record, a relatively sluggish one billion files scanned in three hours.”

Changes credited for the success include relying on a single platform data environment and management task simplification. Also, an algorithm was devised that maximized use of all ten eight-core systems in the General Parallel File System. Researchers expect this accomplishment to point the way to ever greater data management efficiency in the future.
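To put the two records on a common footing, the quoted figures can be converted to files per second. This is simple arithmetic on the numbers reported above, not a benchmark; the function name is ours.

```python
def files_per_second(files, minutes):
    """Average scan rate for a run of the given size and duration."""
    return files / (minutes * 60)

new_rate = files_per_second(10_000_000_000, 43)   # 10 billion files in 43 minutes
old_rate = files_per_second(1_000_000_000, 180)   # 1 billion files in three hours
speedup = new_rate / old_rate

print(f"{new_rate:,.0f} files/s vs {old_rate:,.0f} files/s, about {speedup:.0f}x")
```

That works out to roughly 3.9 million files per second against roughly 93,000, or about a 42-fold improvement in rate.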

Our view is that this seems like a lot of files, but without a comparison against some other vendors of high speed file access, we interpret the number as similar to Amazon’s reporting of how successful Amazon Web Services is. We think Amazon is successful, but the metrics are tough to anchor to something to which we can relate. IBM is, it appears, emulating Amazon’s approach to unanchored metrics.

Our question: when will we see these different and amazing technologies in Watson? When will we see a third party analysis of file scanning speed or better yet, an article from a customer detailing the method and payoff from IBM’s remarkable technology?

Cynthia Murrell, August 5, 2011


Are Panda and Android Influencing One Another?

August 5, 2011

I admit that the Panda initiative to improve Google search results and the Android mobile operating system juggernaut make an odd couple. What gave me the idea was the article “Windows Phone Revenue ‘Abysmal’, Still Better Than Android.” The article explains that the Windows Phone is not generating a great deal of revenue in terms of Microsoft’s cash throughput. Here’s the passage that caught my attention:

Nick Eaton of the Seattle PI calls this “abysmal” and depending on how you look at it, perhaps. Compared to Xbox, sure. Compared to Android? Not so much. After all, Google makes $0 from Android sales, though they do take in some money through the limited advertising on the phone. In that sense, making money off of the OS is not Google’s goal, but market saturation is. The same is the same for Microsoft at this point. While they do charge for licenses, it’s not exactly an area of revenue for them, nor are they banking on it (pun alert). However, neither was Xbox which took 5 years to turn a profit (and after losing billions). [Emphasis added.]

My thought is that Panda may be a significant step because the changes are designed to keep traditional online Web ad revenues pumping cash. My hunch is that my juxtapositions are often off the bull’s eye. On the other hand, my idea is anchored in what seems to be a simple assertion in the Phone Revenue Abysmal write up.

I am now at lunch (August 1, 2011) and here are the three points from the goslings (my code word for my colleagues):

  1. Panda and Android are not related. Google will monetize Android at some point in the future.
  2. Panda is more important as a signal that Google has work to do with its core relevance method.
  3. Google has to get more money from Panda because Android and other mobile devices force a different approach to search.

I am going to stick with the revenue issue and I like the point about changing search behavior. Stay tuned.

Stephen E Arnold, August 5, 2011


Search Innovation: The Apodora Snake in the Google Garden of Eden

August 4, 2011

I wrote about the mercurial nature of innovation in one of our longer postings this week. You can find my “Innovation” essay here. I summarized the findings of work in which I was involved 40 years ago. The main point was that big companies spend a significant amount for innovation but find that innovation is tough.

Will this snake find its way to Google? The apodora papuana. Source: http://morelia-viridis.winnerbb.com/t2343-apodora-papuana

Now think about the information in “Science Fair Gold Medalist, 17, Invents Better Way to Search Internet.” The news story describes Mr. Schiefer, who is a teenager, and his approach to searching certain social media. Think Twitter messages, which are short, cryptic, and often context free. Here’s the passage I noted:

Seventeen-year-old Nicholas Schiefer has found a better way to search small documents, such as tweets and Facebook statuses – all for his Grade 11 science fair project. The Pickering resident created an algorithm to filter through, and find relevant information. Created using linear algebra and discrete math, his algorithm is named “Apodora” after a python species with extraordinary search capabilities.

The Globe and Mail report includes several interesting quotes from the young search wizard. Here are three I marked and filed for future recycling:

  • The genius in Facebook was not so much algorithmic, but in the social aspect of the network. What [Mr. Zuckerberg] managed to create very well was a desire. In search in general, we already have the desire to search. The technology is trying to catch up to what people expect.
  • My algorithm tries to follow connections further. Connections that are close are deemed more valuable. In theory, it follows connections to an infinite degree. One thing which I really liked about my algorithm is that it didn’t rely on my hand coding almost anything. The computer was able to infer that certain words were related.
  • It’s been shown that people are increasingly reading shorter and shorter documents.

Several thoughts. The use of a snake’s name reminded me that industry giants can be bitten unexpectedly. Search is a work in progress, difficult, and sufficiently expansive to permit numerical recipes to deliver potentially tasty results. Google will probably hire the lad.

So what? As I said in my “Innovation” essay, innovation is often easier to buy than cultivate at home.

Stephen E Arnold, August 4, 2011


Thoughts from an Industry Leader: Margie Hlava, Access Innovations

August 4, 2011

Here are some astute observations on the direction of enterprise search from someone who knows what she’s talking about. Library Technology Guides points to an interview with Margie Hlava, president of Access Innovations, in “Access Innovations founder and industry pioneer talks about trends in taxonomy and search.”

Ms. Hlava’s 33 years in the search industry inform her observations on current trends, three of which she sees as significant: Cloud and Software as a Service (SaaS) computing, term mining, and the demand for metadata.

The move to the Cloud and SaaS computing demands more of our hardware, not less, Hlava insists. In particular, broadband networks are struggling to keep up. One advantage of the shift is a declining need to navigate labyrinths of hardware, software, and even internal politics on the client side. Other pluses are the motion toward increased data sharing and service enhancement. Also, more ways to maintain security and intellectual property rights are on the horizon.

Term mining is “a process involving conceptual extraction using thesaurus terms and their synonyms with a rule-base, then looking for occurrences to create more detailed data maps,” according to Hlava. Her company leverages this concept to make the most of clients’ large data sets. She is interested in new angles like mashups, data fusion, visualization, linked data, and personalization, but with a caveat: success in all these depends on the quality of the data itself. “Rotten data gives rotten results.”

Ms. Hlava regards taxonomies and other metadata enrichment as the way to bring efficiency to our searches. In that realm, the benefits have only begun:

“In terms of taxonomies and search, ‘I think we have just scratched the surface. With good data, our clients are in a good position to do an incredible array of new and interesting things. Good taxonomies take everything to the next level, forming the basis of not only mashups, but also author networks, project collaborations, deeper and better information retrieval,’ she concluded.”

Wise words from a wise woman. We look forward to observing these predictions take shape as the search industry moves forward. The interview with Margie Hlava can be read in full here.

Access Innovations offers a wide range of content management services. The company has been building its semantic-based solutions for over thirty years and prides itself on its unique tool set and experienced personnel.

Stephen E Arnold, August 4, 2011


New Countries for Yahoo-Microsoft Search Alliance

August 4, 2011

Yahoo’s partnership with Microsoft is in the driver’s seat, as Search Engine Journal explains in “Yahoo Unrolls Search Alliance to 6 New Countries.” The deal has Microsoft supporting Yahoo by managing the mechanics of the search engine and providing search advertisements. However, Yahoo is remitting transition costs and a percentage of ad revenue. Writer Rob D. Young notes:

“One of the most clear things is that the search alliance will become less costly once it’s complete. At that point, Yahoo will be able to drop its back-end support in countries where Microsoft hasn’t yet taken the reigns, and transition costs will no longer be deducted from the total company income. So it’s good news for Yahoo that the transition to Microsoft has completed in another six regions.”

Argentina, Chile, Colombia, New Zealand, Peru, and Venezuela are the new areas, while more in Europe and Asia are on their way. Yahoo search is being customized for each region. Full migration should be completed by the end of this year.

The company’s second quarter earnings report confirms that these transitions are crucial to its bottom line. Bing has been in the news lately, but we think that Bing will persist for the foreseeable future. Microsoft cannot concede search advertising to the Google, at least not yet.

Cynthia Murrell, August 4, 2011


Quote to Note from British Telecom

August 4, 2011

Quote to note: The source is a story in the quite enjoyable The Register. The article with the gem was “BT on Site Blocking: Every Case Will Need a Court Order.”

“We believe in an open internet – we won’t do any other blocking,” he told us. “We will never stop our customers getting to any service they want to get to. Unless a court orders us to.”

A keeper.

Stephen E Arnold, August 4, 2011
