Why SEO Is in a Bind

April 4, 2011

In New York City, I gave a breezy 15-minute lecture about “content with intent.” The main point was that traditional search engine optimization methods are now under attack. For one thing, the Web indexing systems have had to admit that SEO distorts results lists. Examples range from links to auto-generated placeholder pages such as the one at www.usseek.com to links to sites unrelated to the user’s query.

Google has made significant changes to its method of relevance ranking; the infamous Panda update to its ranking algorithm has been widely covered.

Blekko.com’s approach has been more direct. The company introduced filtering of sites. For more information about the Blekko method, read “Blekko Banning Some Content Farm Sites.”

The larger problem can be seen by running a free comparison on www.compete.com. Enter the URLs for Bing, Facebook, Google, Twitter, and Yahoo in the search box on this page. If the traffic from Facebook and Twitter is combined, the traffic winner in the future will not be a traditional Web search engine. Keep in mind that Compete.com’s data may differ from the data your Web analytics system uses.


SEO experts and service providers may find themselves hemmed in by changes such as Google’s Panda algorithm tweak.

The real problem for traditional search engine optimization service providers comes from a combination of factors, not a single factor. While Google’s Panda update has disrupted some Web sites’ traffic, a number of other forces are also altering the shape of SEO. These include:

  • A social system that allows a user to recommend a good source of information performs a spontaneous and, in most cases, no-cost editorial function or curation activity. A human has filtered sources of information and is flagging a particular source with a value judgment. The individual judgment may be flawed, but over time the social method will provide a useful pool of information sources. Delicious.com was an early example of how this type of system worked.
  • The demographics of users are changing. As younger people enter the datasphere, these individuals are likely to embrace different types of information retrieval methods. Traditional Web search is similar to running a query on a library’s online public access catalog. The new demographic uses mobile devices and often has a different view of the utility of a search box.
  • SEO methods have interacted with outfits that generate content tailored to what users look for. When Lady Gaga is hot, content farms produce information about Lady Gaga. Over the last five years, producing content that tracks what people are searching for has altered search results. The content may be fine, but the search engines’ relevance ranking methods often skew the results, making a user spend more time digging through a results list.
  • Google, as well as other online search systems, is essentially addicted to online advertising revenue. Despite the robust growth of online advertising, Google has to find a way to generate the revenue it needs to support its brute force Web indexing and search system AND keep its stakeholders happy. With search results getting less relevant, advertisers may think twice about betting on Google’s PageRank and Oingo-based AdWords system.


The FTC, Google and the Buzz

March 30, 2011

I read “Google Will Face Privacy Audits For The Next 20 Long Years.” The Federal Trade Commission has under its umbrella the mechanism to trigger privacy audits of Google’s practices for the next 20 years. Okay. Two decades. The matter fired off the launch pad in February 2010 and, if the story is spot on, landed with a ruling in March 2011. Here’s the passage that caught my attention:

As the FTC put it, “Although Google led Gmail users to believe that they could choose whether or not they wanted to join the network, the options for declining or leaving the social network were ineffective.”

I think this means that Google’s judgment was found lacking. The notion of just doing something and apologizing if that something goes wrong works in some sectors. The method did not seem to work in this particular situation, however.

I noted this passage in the article:

Google has formally apologized for the whole mess, saying “The launch of Google Buzz fell short of our usual standards for transparency and user control—letting our users and Google down.”

Yep. Apologies. More about those at the Google blog. Here’s the passage of Google speak I found fascinating:

User trust really matters to Google.

For sure. No, really. You know. Really. Absolutely.

I am not sure I have an opinion about this type of “decision”. What strikes me is that if a company cannot do exactly what it wants, that company may be hampered to some degree. On the other hand, a government agency which requires a year to make a decision seems to be operating at an interesting level of efficiency.

What about the users? Well, does either of the parties to this legal matter think about the user? My hunch is that Google wants to get back to the business of selling ads. The FTC wants to move on to weightier matters. The user will continue with behaviors that fascinate economists and social scientists.

In a larger frame, other players move forward creating value. Web indexing, ads, and US government intervention may ultimately have minimal impact at a 12-month remove. Would faster, more stringent action have made a more significant difference? Probably, but not now.

Maybe Google and the FTC will take Britney Spears’s advice:

“My mum said that when you have a bad day, eat ice-cream. That’s the best advice.”

A modern-day Li Kui for sure. For sure. No, really.

Stephen E Arnold, March 30, 2011

Freebie unlike some of life’s activities

OpenText Joins Semantic Web Race

March 25, 2011

Nstein, the Quebec-based content management vendor recently acquired by Open Text, announced the release of a new version of its popular Semantic Navigation software in a notice on the company’s blog, “Open Text Semantic Navigation Now Available.” The write-up presented a lengthy laundry list of features and functions.

Boiling the article down to a sentence or two proved difficult. We believe that OpenText now offers a crawling and indexing system that supports faceted navigation. But there is an important twist. The semantic tool has a search engine optimization and sentiment analysis component as well. The article asserts:

[A licensee can] enrich content–including huge volumes of uncategorized content–by automatically analyzing and tagging it with metadata to help discern relevant and insightful keywords, topics, summaries, and sentiments.
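
To make the claim concrete, here is a minimal Python sketch of what metadata enrichment of this kind involves. This is emphatically not OpenText’s implementation; the stop word and sentiment lists are my own toy assumptions.

    # Toy sketch of automatic metadata enrichment: pick candidate
    # keywords by frequency and compute a crude sentiment score.
    # Not OpenText's method; the word lists are invented for illustration.
    import re
    from collections import Counter

    STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "with"}
    POSITIVE = {"good", "great", "useful", "insightful", "relevant"}
    NEGATIVE = {"bad", "poor", "useless", "irrelevant"}

    def enrich(text):
        words = re.findall(r"[a-z]+", text.lower())
        terms = [w for w in words if w not in STOPWORDS]
        keywords = [term for term, _ in Counter(terms).most_common(5)]
        sentiment = sum((w in POSITIVE) - (w in NEGATIVE) for w in words)
        return {"keywords": keywords, "sentiment": sentiment}

    print(enrich("A great semantic navigation release with useful tagging."))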

The list of features and functions is lengthy. Additional information is available at this link, but you will need an OpenText user name and password to access the content.

If the product performs according to the descriptions in the source article, a number of OpenText’s competitors will face stiff competition.

Stephen E Arnold, March 25, 2011

Freebie

Access Innovations and IEEE Team Up

March 20, 2011

Access Innovations has cultivated a solid relationship with the Institute of Electrical and Electronics Engineers, the foundation of which seems to be their Data Harmony software series.

Access Innovations is one of the leaders in indexing, controlled vocabulary development, and taxonomies. For IEEE, Access Innovations brings a long, successful track record in helping organizations develop thesauri and controlled vocabularies. The company also has proprietary software which can perform automatic content tagging.

IEEE, which is responsible for close to a third of the technical publications circulated around the globe, has now sought the firm’s help in revamping how its Xplore library catalogues the massive amounts of data stored within.

Access Innovations said:

To complete the latest project, Access Innovations used an implementation of Data Harmony Metadata Extractor to determine the article’s content type and then built an improved rules base to identify content types in order for each type to be indexed in a specific way using the IEEE Thesaurus.
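
As a rough illustration of the “rules base” idea, consider the Python sketch below. The rules and field names are hypothetical; Data Harmony’s actual rules are proprietary.

    # Hypothetical sketch: classify an article's content type with simple
    # rules, then index each type in its own way. Not Data Harmony's code.
    def classify(article):
        title = article.get("title", "").lower()
        if article.get("conference") or "proceedings" in title:
            return "conference-paper"
        if article.get("issue"):
            return "journal-article"
        return "technical-report"

    INDEXING_RULES = {
        "conference-paper": lambda a: {"terms": a["title"].split(), "venue": a["conference"]},
        "journal-article": lambda a: {"terms": a["title"].split(), "issue": a["issue"]},
        "technical-report": lambda a: {"terms": a["title"].split()},
    }

    article = {"title": "Adaptive Signal Processing Proceedings", "conference": "ICASSP"}
    kind = classify(article)
    print(kind, INDEXING_RULES[kind](article))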

Access Innovations’ system gives users the ability to identify and extract information from the source, compiling a fresh record in the process. This marks yet another lucrative venture for the 33-year-old company, which services a variety of academic institutions and government agencies.

Micheal Cory, March 20, 2011

Freebie

Facebook, Semantic Search, and Bad News for the Key Word Crowd

March 16, 2011

You can wade through the baloney from the pundits, satraps, and poobahs. I will cut to the chase. Facebook can deliver a useful search service without too many cartwheels. There are three reasons. (If you want to complain, that’s what the comments section of the blog permits. Spare me personal email and LinkedIn comments.)

First, there are upwards of 500 million users who spend more time in Facebook doing Facebook things than I would have ever believed. I don’t do “social,” but 500 million or more people see me as a dinosaur watching the snowflakes. Fine.

Second, Facebook users stuff links into their posts, pages, wall crannies, and everywhere else in the Facebook universe they can. This bunch of URLs is a selection filter that is of enormous value to Facebook users. Facebook gets real people stuffing in links without begging, paying, or advertising. The member-screened and identified links just arrive.

Third, indexing the content on the pages to which the links refer produces an index that is different from, and for some types of content more useful to Facebook members than, laundry lists, decision engine outputs, or faceted results from any other system. Yep, “any other.” That situation has not existed since the GOOG took the learnings of the key word crowd, bought Oingo, and racked up the world’s biggest online advertising and search engine optimization operation in the history of digital mankind.
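
A back-of-the-envelope Python sketch of the filtering effect (my toy data, not anything Facebook has disclosed): count the distinct members sharing each URL and rank sources by that vote.

    # Members sharing links act as a no-cost editorial filter. Toy data only;
    # Facebook's actual signals are not public.
    from collections import defaultdict

    shares = [
        ("alice", "http://example.com/a"),
        ("bob",   "http://example.com/a"),
        ("bob",   "http://example.com/b"),
        ("carol", "http://example.com/a"),
    ]

    voters = defaultdict(set)
    for member, url in shares:
        voters[url].add(member)

    for url in sorted(voters, key=lambda u: len(voters[u]), reverse=True):
        print(len(voters[url]), url)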

Navigate to “New Facebook Patent: the Huge Implications of Curated Search” and learn Bnet’s view of a patent document. I am not as excited about the patent as the Bnet outfit is, but it is interesting. If one assumes that the patent contributes to the three points I identified above, Facebook gets a boost.

But my view is that Facebook does not need much in the way of a boost from semantics or any other hot trend technology. Facebook is sitting on a search gold mine. When Facebook does release its index of member-provided sources, four things will take place over a period of “Internet” time.

  1. The Google faces a competitor able to index at lower cost. Google, remember, is a brute force operation. Facebook is letting the members do the heavy lifting. A lower cost index of Facebook-member-vetted content is going to be a threat. The threat may fizzle, but a threat it will be to the Google.
  2. Users within Facebook can do “search” where Facebook members prefer to be. This means that Facebook advertising offers some interesting opportunities not lost on the Xooglers who now work at Facebook and want a gigantic payday for themselves. Money can inspire certain types of innovation.
  3. Facebook is closed. The “member” thing is important to keep in mind. The benefits of stateful actions are many, and you don’t need me to explain why knowing who a customer is, who the customer’s friends are, and what the customer does is important. But make the customer a member and you get some real juice.
  4. Facebook competitors will have to find a way to deal with the 500 million members and fast. Facebook may not be focused on search, but whatever the company does will leverage the membership, not the whizzy technology.

Bottom line: Facebook has an opportunity in search whether it does laundry lists, facets, semantics, or any combination of methods. My question, “When will Facebook drop its other social shoe?”

Stephen E Arnold, March 16, 2011

Freebie unlike the ads big companies will want to slap into Facebook outputs for its members

Metadata Are Important. Good to Know.

March 16, 2011

I read “When it Comes to Securing and Managing Data, It’s all about the Metadata.” The goslings and I have no disagreement about the importance of metadata. We do prefer words and phrases like controlled term lists, controlled vocabularies, classification systems, indexing, and geotagging. But metadata is hot so metadata the term shall be.

There is a phrase that is useful when talking about indexing and the sorts of things in our preferred terms list. That phrase is “editorial policy.” Today’s pundits, former English majors, and unemployed Webmasters like the word “governance.” I find the word disconcerting because “governance” is unfamiliar to me. The word is fuzzy and, therefore, ideal for the poobahs who advise organizations unable to find content on the reasons for the lousy performance of one or more enterprise search systems.

The article gallops through these concepts. I learned about the growing issue of managing and securing structured and semi-structured data within the enterprise. (Isn’t this part of security?) I learned that collaborative content technologies are on the increase (which is an echo of locking a file that several people edit in an authoring system).

I did notice this factoid:

IDC forecasts that the total digital universe volume will increase by a factor of 44 in 2020. According to the report, unstructured data and metadata have an average annual growth rate of 62 percent. More importantly, high-value information is also skyrocketing. In 2008, IDC found that 22 to 33 percent of the digital universe was high-value information (data and content that are governed by security, compliance and preservation obligations). Today, IDC forecasts that high-value information will comprise close to 50 percent of the digital universe by the end of 2020.
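
A quick compound-growth sanity check on those figures in Python, assuming (my assumption, not the article’s) that the factor of 44 spans 2009 through 2020:

    # IDC: total digital universe grows 44x; unstructured data grows 62%/year.
    # Check what annual rate a 44x increase over 11 years implies.
    years = 11
    overall = 44 ** (1 / years)            # implied annual growth of the whole
    unstructured = 1.62 ** years           # 62%/year compounded over the span
    print(f"implied overall growth: {overall - 1:.0%} per year")
    print(f"62% per year over {years} years: about {unstructured:.0f}x")

The check prints roughly 41 percent per year for the whole and about 202x for unstructured data. If the 62 percent figure holds, unstructured data and metadata outgrow the digital universe as a whole several times over.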

There you go. According to the article, metadata framework technology is a large part of the answer to this problem: collecting user and group information, permissions information, access activity, and sensitive content indicators.

My view is to implement an editorial policy for content. Skip the flowery and made-up language. Get back to basics. That would be what I call indexing, a component addressed in an editorial policy. Leave the governance to the government. The government is so darn good at everything it undertakes.

Stephen E Arnold, March 16, 2011

Freebie

Are Precision and Recall Making a Comeback?

March 15, 2011

Microsoft-centric BA Insight has explored two touch points of traditional information retrieval: precision and recall. These have quite specific meanings to those who care about the details of figuring out which indexing method actually delivers useful results. The Web world and most organizations care not a whit about fooling around with this equation:

precision = (number of relevant documents retrieved) / (total number of documents retrieved)

And recall. This is another numerical recipe that causes the procurement team’s eyes to glaze over:

recall = (number of relevant documents retrieved) / (total number of relevant documents in the collection)

I was interested to read The SharePoint and FAST Search Experts Blog’s “What is the Difference Between Precision and Recall?” This is a very basic question for determining the relevance of query search results.

Equations aside, precision is the percentage of retrieved documents that are relevant, and recall is the percentage of relevant documents that are retrieved. In other words, when you have a search that’s high in precision, your results list will have a large percentage of items relevant to what you typed in, but you may also be missing a lot of the relevant items in the collection.

With a search that is high in recall, your results list will contain more of what you’re searching for, but it will also have a lot of irrelevant items. The post points out that determining the usefulness of search results is actually simpler than this sounds:

“The truth is, you don’t have to calculate relevance to determine how SharePoint or FAST search implementation is performing.  You can look at a much more telling KPI.  Are users actually finding what they are looking for?”

The problem, in my opinion, is that most enterprise search deployments lack a solid understanding of the corpus to be processed. As a result, test queries are difficult to run in a “lab-type” setting. A few random queries are close enough for horseshoes. The cost and time required to benchmark a system and then tune it for optimal precision and recall is a step usually skipped.
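
For the curious, the computation itself is not the hard part; the relevance judgments are what cost money. A minimal Python sketch, with hypothetical judgments:

    # Compute precision and recall for test queries against hand-built
    # relevance judgments. The documents and judgments are invented.
    def precision_recall(retrieved, relevant):
        hits = len(retrieved & relevant)
        precision = hits / len(retrieved) if retrieved else 0.0
        recall = hits / len(relevant) if relevant else 0.0
        return precision, recall

    judgments = {"content farm": {"d1", "d4", "d7"}}    # docs known to be relevant
    results = {"content farm": {"d1", "d2", "d4"}}      # docs the engine returned

    for query, relevant in judgments.items():
        p, r = precision_recall(results[query], relevant)
        print(f"{query}: precision={p:.2f} recall={r:.2f}")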

Kudos to BA Insight for bringing up the subject of precision and recall. My view is that the present environment for enterprise search puts more emphasis on point-and-click interfaces and training wheels for users who lack the time, motivation, or expertise to formulate effective queries. Even worse, the content processed by the index is usually an unexplored continent. There are more questions like “Why can’t I find that PowerPoint?” than shouts of Eureka! Just my opinion.

Stephen E Arnold, March 15, 2011

Freebie

Google Speeds Tweet Information

March 14, 2011

If you can say one thing about Google, it is that the company likes to do things for itself. Soshable reports “Forget Indexing Tweets: Google Is Pulling Them Directly from the API.” Google launched Caffeine last year as a tool for real-time Web indexing with a heavy emphasis on social media.

Google used to display tweets from people’s accounts, but now we have learned the company is pulling them directly from Twitter’s API, thus reducing latency. Our source said:

“Most tweets are eventually indexed – some within minutes, some within hours or even days. These Tweets are being presented in their raw form prior to being indexed. The Tweets themselves are not being used in search results through this new method. They will be indexed separately and can then appear in searches as their own listings, but this is different. Just as with Google’s “Real-time” search, this feature is a fire hose.”

Once tweets are indexed, they can be added to search results as individual listings. One might think this is a new endeavor, but it’s not. It’s only a quicker way for Google to provide real-time information, but it is a fact to keep in your frontal memory.
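
The distinction the source draws, raw presentation now versus indexed listings later, can be sketched generically in Python (the stream below is a stand-in, not Twitter’s or Google’s actual API):

    # Conceptual sketch: items from a firehose-style stream are shown
    # immediately in raw form, then indexed separately for later searches.
    import time

    def firehose():
        for i in range(3):
            yield {"id": i, "text": f"tweet {i}", "ts": time.time()}

    index = []
    for tweet in firehose():
        print("shown immediately:", tweet["text"])   # low-latency raw display
        index.append(tweet)                          # indexed later as its own listing

    print("indexed listings:", [t["text"] for t in index])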

Google continues to make speed a differentiator. In addition to reducing latency for Twitter content, the Chrome Version 10 browser has been positioned as “faster” as well.

Whitney Grace, March 14, 2011

Freebie

Google Search Algorithm: A New Direction?

March 11, 2011

Content, content, content. There is a lot of bad information, some so-so information, and not much high value information available on the public Web. The challenge is to pinpoint the high value information. For Google, the challenge is to identify the high value information and keep the Adwords revenue flowing.

After reading “Google’s New Algorithm Puts Content in the Driver’s Seat,” that word content remained entrenched in my consciousness. The author made some compelling points as he discussed Google’s new algorithm and the role content plays in several aspects of online activity. Citing a noticeable, though not complete, improvement in the results of high-value search requests, the author expressed both praise for the new formula and relief at what he sees as an overdue shift in the approach to commerce.

One passage I noted was:

“Give them valuable content.  Free.  Give them plenty of it.”

This certainly seems like sound advice. But I want the information that is germane to my query. Who wants to click through a laundry list of links to find what is needed to meet an information need? I don’t.

Google’s PageRank pivoted a decade ago on the importance of links and a site’s rank. Link popularity works for Lady Gaga. Other types of queries may require content that lacks high click or link scores.
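
For readers who want the mechanics, here is a minimal power-iteration PageRank over a toy link graph, written in Python. Google’s production signals go far beyond this, but the sketch shows why heavily linked pages float to the top:

    # Minimal PageRank power iteration over a four-page toy graph.
    links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    pages = list(links)
    rank = {p: 1 / len(pages) for p in pages}
    damping = 0.85

    for _ in range(50):
        new = {p: (1 - damping) / len(pages) for p in pages}
        for p, outs in links.items():
            for q in outs:
                new[q] += damping * rank[p] / len(outs)
        rank = new

    # "d" has no inbound links, so it ends up with the lowest rank.
    print(sorted(rank.items(), key=lambda kv: -kv[1]))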

Maybe I am sensitive to coincidences. Google’s change to its method comes on the heels of some legal issues related to indexing and results ranking. Is Google trying to improve relevance, manage some push back, or generate some positive public relations? I don’t have the answers to these questions.

Micheal Cory, March 11, 2011

Google and Yelp: The Future of Content Peeks around the PR Baloney

March 7, 2011

My personal view is that content is undergoing a bit of chemical change. The idea that authors are important seems to be eroding. The new value is an online system that generates content from software, volunteers, paid contributors, and sucking goodies from different places. There is no single word that encapsulates these trends. I wish there were. With the right word I could explain what is at the core of the unsolvable problem for Google and Yelp. You can get the core of the hassle in “Google Issues Ultimatum to Yelp: Free Content or No Search Indexing.” One interesting comment in the write up was this passage:

The issue has been ongoing for several years. However, Stoppelman said there is no answer to it at the moment, while Google maintains the same position.

I thought Google had the world’s smartest employees. Yelp has some sharp folks as well. No solution. So we have a new example of an unsolved problem. The Yelp Conundrum is right up there with the Goldbach conjecture. Well, that suggests that neither company is as smart as I thought it was or both companies have what I call a “power” problem.

Yelp is performing in an area where Google is not doing too well. Google wants the Yelp content and will remove Yelp from its index unless Yelp buckles under. When I read this story I thought about Google’s position when China pulled a power play. Now Google seems to be throwing its traffic power around.

Interesting.

With Bing and Yahoo accounting for 12 percent of Web search and Google most of the other traffic, Google’s position strikes an interesting chord with me.

Let’s assume that Google arbitrarily excludes Yelp. What does this say about objectivity in indexing? What does this make clear about Google’s ability to adjust an index manually? No matter how smart Google’s software is, the decision to block or otherwise hamper Yelp tells me quite a bit.

And what about Yelp? Is the company playing hardball because a potential tie up fell through? Is Yelp the most recent casualty of Google’s effort to expand via acquisition, not organic growth?

Too bad I am not a lawyer. I would have an informed opinion. As an observer from far-off Kentucky, I see a glimpse of the future of the new content environment.

Stephen E Arnold, March 7, 2011

Freebie
