Google and the Perils of Posting

October 21, 2011

I don’t want to make a big deal out of a simple human mistake from a button click. I just had eye surgery, and it is a miracle that I can [a] find my keyboard and [b] make any function on my computers work.

However, I did notice this item this morning and wanted to snag it before it magically disappeared due to mysterious computer gremlins. The item in question is “Last Week I Accidentally Posted”, via Google Plus at this url. I apologize for the notation style, but Google Plus posts come with the weird use of the “+” sign which is a killer when running queries on some search systems. Also, there is no title, which means this is more of a James Joyce type of writing than a standard news article or even a blog post from the addled goose in Harrod’s Creek.

To get some context you can read my original commentary in “Google Amazon Dust Bunnies.” My focus in that write up is squarely on the battle between Google and Amazon, which I think is a more serious confrontation than the unemployed English teachers, aging hippies turned consultants, and the failed yet smarmy Web masters who have reinvented themselves as “search experts” think.

Believe me, Google versus Amazon is going to be interesting. If my research is on the money, the problems between Google and Amazon will escalate to, and may surpass, the tension that exists between Google and Oracle, Google and Apple, and Google and Viacom. (Well, Viacom may be different because that is a personal and business spat, not just big companies trying to grab the entire supply of apple pies in the cafeteria.)

In the Dust Bunnies write up, I focused on the management context of the information in the original post and the subsequent news stories. In this write up, I want to comment on four aspects of this second post about why Google and Amazon are both so good, so important, and so often misunderstood. If you want me to talk about the writer of these Google Plus essays, stop reading. The individual’s name which appears on the source documents is irrelevant.

1. Altering or Idealizing What Really Happened

I had a college professor, Dr. Philip Crane, who told us in history class in 1963, “When Stalin wanted to change history, he ordered history textbooks to be rewritten.” I don’t know if the anecdote is true or not. Dr. Crane went on to become a US congressman, and you know how reliable those folks’ public statements are. What we have in the original document and this apologia is a rewriting of history. I find this interesting because the author could use other methods to make the content disappear. My question: “Why not?” And, “Why revisit what was a pretty sophomoric tirade involving a couple of big companies?”

2. Suppressing Content with New Content

One of the quirks of modern indexing systems such as Baidu, Jike, and Yandex is that once content is in the index, it can persist. As more content on a particular topic accretes “around” an anchor document, the document becomes more findable. What I find interesting is that despite the removal of the original post, the secondary post continues to “hook” to discussions of that original post. In fact, the snippet I quoted in “Dust Bunnies” comes from a secondary source. I have noted and adapted to “good stuff” disappearing as a primary document; the only evidence of the document’s existence is then the secondary references. As these expand, the original item becomes more visible and more difficult to suppress. In short, the author of the apologia is ensuring the findability of the gaffe. Fascinating to me.
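To make the accretion effect concrete, here is a minimal sketch in Python. The document titles and terms are hypothetical, and no real indexing system is this crude, but the principle holds: delete the anchor document, and the secondary discussions still answer the query and point back to the gaffe.

```python
# Minimal sketch of the "accretion" effect: hypothetical documents and terms,
# not any real engine's implementation.
from collections import defaultdict

documents = {
    "anchor_post": "google amazon dust up accidental post",           # later deleted
    "commentary_1": "commentary on the accidental google amazon post",
    "commentary_2": "another look at the deleted google amazon post",
}

def build_index(docs):
    """Build a tiny inverted index: term -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():
            index[term].add(doc_id)
    return index

# The author removes the original item...
del documents["anchor_post"]
index = build_index(documents)

# ...but a query on the topic still returns the secondary discussions,
# which in turn point back to the original gaffe.
hits = index["google"] & index["amazon"] & index["post"]
print(sorted(hits))   # ['commentary_1', 'commentary_2']
```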

3. Amazon: A Problem for Google


Gain Power, Lose Control? A Search Variant

October 20, 2011

The future of technology, as always, is fascinating: personal virtual assistants, customized search results, and big changes to information appliances. However, the future that Silicon Valley giants like Apple, Google, and Facebook are creating will bring a mix of changes, both unique benefits and some bad results.

It seems that the more advanced and powerful technology becomes, the more control users lose. We learn more in Datamation’s article, “How Apple, Google and Facebook Will Take Away Your Control,” which tells us:

“The more advanced this technology becomes, the bigger the decisions we’ll rely on them to make for us. Choices we now make will be “outsourced” to an unseen algorithm. We’ll voluntarily place ourselves at the mercy of thousands of software developers, and also blind chance. We will gain convenience, power and reliability. But we will lose control.”

Personal computers will no longer need to be maintained or customized. Personal assistants, like the iPhone 4S’s Siri, will place our words in context and learn what we “want.” Search algorithms will continue to customize results to user attributes and actions.
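For readers who want a feel for what “customizing to user attributes” means in practice, here is a toy sketch. The profile, topics, and boost weight are invented for illustration; this is not any vendor’s actual ranking formula.

```python
# Toy illustration of result personalization: hypothetical user profile and
# scoring weights, not a real ranking algorithm.
results = [
    {"title": "Budget tablets reviewed", "topics": {"hardware", "deals"}, "base_score": 0.72},
    {"title": "Enterprise search trends", "topics": {"search", "enterprise"}, "base_score": 0.70},
    {"title": "Siri and voice assistants", "topics": {"mobile", "assistants"}, "base_score": 0.65},
]

user_profile = {"mobile", "assistants", "deals"}   # inferred from past clicks

def personalized_score(result, profile, boost=0.1):
    # Each topic shared with the profile nudges the item up the list.
    overlap = len(result["topics"] & profile)
    return result["base_score"] + boost * overlap

ranked = sorted(results, key=lambda r: personalized_score(r, user_profile), reverse=True)
for r in ranked:
    # The Siri item, despite the lowest base score, rises to the top for this user.
    print(round(personalized_score(r, user_profile), 2), r["title"])
```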

Is the gain of convenience and reliability that we get from these shiny new toys worth it? Or is the shine just a distraction from the fact that we lose all control in search and technological decision making? I am not so sure the good will outweigh the bad in this scenario, but I fear that we may be stuck in the cycle.

Andrea Hayden, October 20, 2011

Sponsored by Pandia.com

Protected: Into the Future of SharePoint with a Smooth Sail

October 12, 2011

This content is password protected.

Lucid Imagination: Open Source Search Reaches for Big Data

September 30, 2011

We are wrapping up a report about the challenges “big data” poses to organizations. Perhaps the most interesting outcome of our research is that there are very few search and content processing systems which can cope with the digital information required by some organizations. Three examples merit listing before I comment on open source search and “big data”.

The first example is the challenge of filtering information required by organizations and produced within the organization by its staff, contractors, and advisors. We learned in the course of our investigation that the promises of processing updates to Web pages, price lists, contracts, sales and marketing collateral, and other routine information are largely unmet. One of the problems is that the disparate content types have different update and change cycles. The most widely used content management system, based on our research results, is SharePoint, and SharePoint is not able to deliver a comprehensive listing of content without significant latency. Fixes are available, but these are engineering tasks which consume resources. Cloud solutions do not fare much better, once again due to latency. The bottom line is that for information produced within an organization, employees are mostly unable to locate information without a manual double check. Latency is the problem. We did identify one system which delivered documented latency across disparate content types of 10 to 15 minutes. The solution is available from Exalead, but the other vendors’ systems were not able to match this performance in putting fresh, timely information produced within an organization in front of system users. Shocked? We were.
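A rough way to picture the latency issue is to compare the time a document changes with the time it becomes findable, per content type. The file names and timestamps below are hypothetical; the 10 to 15 minute figure mentioned above is the result of exactly this kind of measurement.

```python
# Back-of-the-envelope index latency check with hypothetical timestamps.
from datetime import datetime

# When each item was last changed (hypothetical).
changed = {
    "price_list.xlsx":   datetime(2011, 9, 30, 9, 0),
    "contract_417.docx": datetime(2011, 9, 30, 9, 5),
    "intranet_page":     datetime(2011, 9, 30, 9, 10),
}

# When each item actually became findable in the index (hypothetical).
indexed = {
    "price_list.xlsx":   datetime(2011, 9, 30, 9, 12),
    "contract_417.docx": datetime(2011, 9, 30, 11, 40),
    "intranet_page":     datetime(2011, 9, 30, 9, 21),
}

for doc, changed_at in changed.items():
    latency = indexed[doc] - changed_at
    print(f"{doc}: {latency}")   # the contract lags by hours, not minutes
```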


Reducing latency in search and content processing systems is a major challenge. Vendors often lack the resources required to solve a “hard problem,” so “easy problems” are positioned as the key to improving information access. Is latency a popular topic? A few vendors do address the issue; for example, Digital Reasoning and Exalead.

Second, when organizations tap into content produced by third parties, the latency problem becomes more severe. There is the issue of the inefficiency and scaling of frequent index updates. But the larger problem is that once an organization “goes outside” for information, additional variables are introduced. In order to process the broad range of content available from publicly accessible Web sites or the specialized file types used by certain third party content producers, connectors become a factor. Most search vendors obtain connectors from third parties. These work pretty much as advertised for common file types such as Lotus Notes. However, when one of the targeted Web sites, such as a commercial news service or a third-party research firm, makes a change, the content acquisition system cannot acquire content until the connectors are “fixed”. No problem, as long as the company needing the information is prepared to wait. In my experience, broken connectors mean another variable. Again, no problem unless critical information needed to close a deal is overlooked.
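To show why a change at the source stalls acquisition, here is a simplified sketch of connector logic. The page layouts and the parsing rule are hypothetical; real connectors are far more elaborate, but the failure mode is the same: the source changes, and nothing flows into the index until someone fixes the connector.

```python
# Sketch of a hypothetical connector tied to one specific markup convention.
def parse_news_item(html: str) -> dict:
    """Extract the story body from a (hypothetical) publisher page layout."""
    if "<div class='story-body'>" not in html:
        # The publisher changed its template; the connector must be "fixed"
        # before any new content can be acquired.
        raise ValueError("unrecognized layout: connector needs updating")
    body = html.split("<div class='story-body'>")[1].split("</div>")[0]
    return {"body": body.strip()}

old_layout = "<html><div class='story-body'>Deal closes Friday.</div></html>"
new_layout = "<html><article>Deal closes Friday.</article></html>"

print(parse_news_item(old_layout))      # works as advertised
try:
    parse_news_item(new_layout)          # source changed, acquisition stalls
except ValueError as err:
    print("acquisition halted:", err)
```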


Relevance: Once Ignored, Now a Core Issue

September 23, 2011

The Google recipe for its Web site placement order for searches is closely guarded despite the company’s open-source policy. The article “Google Discusses Their Algorithm Change Process” on Search Engine Journal explains the lengthy and arduous process Googlers must go through in the quest for search engine impact.

Google explains that it must guard the algorithms to keep the manipulation of its numerical recipes, which contain mathematical formulas and secret sauce, within Google-defined boundaries. In fact, an entire industry has grown up around trying to crack Google’s search algorithms in an effort to bolster one’s relevance in Google’s recipe. Google isn’t just sitting around; rather, the company is constantly updating and tweaking its algorithms. We learned:

Each idea is based on improving the user experience, but not every idea actually shows a positive user impact; while over 500 changes were made last year, over 20,000 experiments were conducted in that same time period. The key takeaway is that, while it’s a good idea to pay attention to experiments, only a small cut will ever become a part of the standard – and, with 500 changes a year, even those alterations are subject to reversal.

With many changes occurring behind the curtain, how are Web masters who want users to find their content supposed to achieve findability? Although 500 changes may be made in a year, not all of them (hardly any at all) have an impact on the majority of site rankings. The sites which may be affected are, we have heard, on their own. Google does not offer support, but it does provide a useful list of guidelines.

The few changes that do impact some sites can pack a wallop. The search engine optimization industry typically responds with suggestions, ideas, and workarounds. Google then changes its algorithm, and the process begins again.

What’s the fix? Our conclusion is that one may have to “go Google”. Embrace all things Google, and buy AdWords. Trying to outwit Google may be a time consuming and unrewarding activity.

Catherine Lamsfuss, September 23, 2011

Sponsored by Pandia.com

Search Is a Major Project: Does Doom Await?

September 22, 2011

Datamation ran a long article called “Why Major IT Projects are More Likely to Fail Than Any Others,” which informs us of a study published by Oxford University which found that major IT projects are twenty times more likely to fail than other IT projects. On average, these larger projects ran 27 percent over budget and took 55 percent longer to complete than originally planned. One in six eventually spirals out of control.

Silicon.com’s article “Five ways to stop your IT projects spiralling out of control and overbudget” describes why this is the case and details tips for controlling projects. The article states:

The risk of tech projects going rogue is down to IT being harder to manage than other business disciplines, according to Bent Flyvbjerg, BT professor and founding chair of major program management at Oxford University. ‘Our theory is this is because IT projects are less tangible, they are also more complex,’ he told silicon.com.

Is it possible that a main culprit behind this phenomenal statistic is complexity? Are information technology companies attempting to develop elaborate plans for the newest and the best and aiming too high? I think it’s very likely. Perhaps if developers could simplify their ideas and end the game of outperforming each other, more IT projects would be completed.

Search deployments often become expensive headaches, but it may not just be the peculiarities of search or search vendors, integrators, or staff. The problem may reside in the fact that the complexity of the undertaking is overlooked, ignored, or not understood. Too bad. Some search vendors take the blame for a problem created by a business process, not technology.

When I spoke with Stephen E Arnold, publisher of Beyond Search, he told me:

Software and systems are complex. The environments into which engineers insert these things is complex. Complexity adds to complexity. Costs sky rocket and systems don’t work particularly well. Vendors often take the blame for problems caused by casual, uninformed, or inappropriate business processes used to scope a project and spell out requirements. Search falls victim to these issues just like enterprise resource planning, accounting, and document management systems.

Quite a challenge awaits those responsible for a large scale project, it seems.

Andrea Hayden, September 22, 2011

Sponsored by Pandia.com

Protected: SharePoint and Product Lifecycle Management

September 21, 2011

This content is password protected.

Protected: An X Ray of the SharePoint User Subsystem

September 16, 2011

This content is password protected.

Kroll in the UK and Its Content Technology

September 14, 2011

The recent disturbances in London have led UK Prime Minister David Cameron to reach across the pond to consult Kroll Chairman and former American police chief William Bratton on preventing gang-related violence and building safer communities. There’s nothing like an outside US expert to come to the aid of our British cousins.

Altegrity, a specialized law enforcement training company and owner of Kroll, quoted Mr. Bratton in an August 12 media release:

I would certainly be in a position to discuss the contemporary American experience and my work in these areas – in particular the successes that created real reductions in gang-related crime in Boston, New York and most recently in Los Angeles, where we also saw significant improvements in the relations between the police and the city’s diverse communities. There are many lessons from these experiences that I believe are relevant to the current situation in England.

Based on this release, Mr. Bratton appears confident in his abilities to solve the world’s security concerns. We hope that UK police and civilians are equally secure in the role that his company takes in dispelling the violence affecting their country. If you want some basic information about the types of search and content processing tools that Mr. Bratton brings to his engagements, navigate to the interview with former Kroll wizard David Chaplin here. This is quite impressive technology.

Jasmine Ashton, September 14, 2011

Sponsored by Pandia.com

Hlava on Machine Assisted Indexing

September 8, 2011

On September 7, 2011, I interviewed Margie Hlava, president and co-founder of Access Innovations. Access Innovations has been delivering professional taxonomy, indexing, and consulting services to organizations worldwide for more than 30 years. In our first interview, Ms. Hlava discussed the need for standards and the costs associated with flawed controlled term lists and some loosely formed indexing methods.

In this podcast, I spoke with her about her MAI, or machine assisted indexing, technology. The idea is that automated systems can tag high volume flows of data in a consistent manner. The “big data” challenge often creates significant performance problems for some content processing systems. MAI balances high speed processing with the ability to accommodate the inevitable “language drift” that is a natural part of human content generation.
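As a rough illustration of the machine assisted indexing idea, not Access Innovations’ actual MAI implementation, here is a toy controlled vocabulary in Python. Newer surface terms map to the same preferred tag, which is one simple way a system can absorb “language drift.”

```python
# Toy sketch of machine assisted indexing: a controlled vocabulary plus synonym
# entries so drifting surface terms still map to the same preferred tag.
# Purely illustrative; not the MAI product discussed in the interview.
controlled_vocabulary = {
    "cell phone": "Mobile Devices",
    "smartphone": "Mobile Devices",     # newer term, same concept
    "taxonomy":   "Knowledge Organization",
    "term list":  "Knowledge Organization",
}

def assign_tags(text: str) -> set:
    """Return the preferred tags for every vocabulary phrase found in the text."""
    text = text.lower()
    return {tag for phrase, tag in controlled_vocabulary.items() if phrase in text}

doc = "Smartphone users expect the taxonomy behind search to be invisible."
print(assign_tags(doc))   # both 'Mobile Devices' and 'Knowledge Organization'
```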

In this interview, Ms. Hlava discusses:

  • The value of a neutral format so that content and tags can be easily repurposed
  • The importance of metadata enrichment, which allows an indexing process to capture the nuances of meaning as well as the tagging required to allow a user to “zoom” to a specific location in a document, pinpoint the entities in a document, and generate automated summaries of documents
  • The role of an inverted index versus the tagging of records with a controlled vocabulary.

One of the key points is that flawed indexing contributes to user dissatisfaction with some search and retrieval systems. She said, “Search is like standing in line for a cold drink on a hot day. No matter how good the drink, there will be some dissatisfaction with the wait, the length of the line, and the process itself.”

You can listen to the second podcast, recorded on August 31, 2011, by pointing your browser to http://arnoldit.com/podcasts/. You can get additional information about Access Innovations at this link. The company publishes Taxodiary, a highly regarded Web log about indexing and taxonomy related topics.

Stephen E Arnold, September 8, 2011

Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search
