Google and Search

May 11, 2011

Over the last five days, I have been immersed in conversations about Google and its public Web search system. I am not able to disclose the people with whom I have spoken, but I want to isolate the issues that surfaced in those face-to-face and telephone conversations and offer some observations about the role of traditional Web sites. In fact, one of the participants directed my attention to the post “Google Panda=Disaster.” I don’t think the problem is Panda. I think a more fundamental change has taken place, and Google’s methods are simply out of sync with the post-shift environment. But hope is not lost. At the end of this write-up, I provide a way for you to learn about a different approach. Sales pitch? Sure, but a gentle one.

Relevance versus Selling Advertising

The main thrust of the conversations was that Google’s Web search is degrading. I have not experienced this problem, but the three groups with whom I spoke have. Each had different data to show that Google’s method of handling their publicly accessible Web site has changed.

First, one vendor reported that traffic to the firm’s Web site had dropped from 2,000 uniques per month to 100. The Web site is informational. There is a widget that displays headlines from the firm’s Web log. The code is clean and the site is not complex.

Second, another vendor reported that content from the firm’s news page was appearing on competitors’ Web sites. More troubling, the content was appearing high in a Google results list. However, the creator of the content found that the stories from the originating Web site were buried deep in the Google results list. The point is that others were recycling original content and receiving a higher ranking than the source of the original content.


Traditional Web advertising depicted brilliantly by Ken Rockwell. See his work at http://www.kenrockwell.com/canon/compacts/sd880/gallery-10.htm

Third, a company found that its core business no longer appeared in a Google results list for a query about the type of service the firm offered. Instead, the company was turning up in an unrelated or, at best, secondary results list.

I had no answer to the question each firm asked me, “What’s going on?”

Through various contacts, I pieced together a picture that suggests Google itself may not know what is happening. One source indicated that the core search team responsible for the PageRank output is doing its work much as it has for the last 12 years. Googlers responsible for selling advertising were not sure what changes were going on in the core search team’s algorithm tweaks. Not surprisingly, most people are scrutinizing search results, fiddling with metatags and other aspects of a Web site, and then checking to see what happened. The approach is time consuming and, in my opinion, very much like the person who plugs a token into a slot machine and hits the jackpot. There is great excitement at the payoff, but the process is not likely to work on the next go-round.

Net net: I think there is a communications filter (intentional or unintentional) between the group at Google working to improve relevance and the sales professionals at Google who need to sell advertising. On one hand, this is probably healthy, because many organizations put a wall between certain company functions. On the other hand, if AdWords and AdSense are linked to traffic and that traffic is highly variable, some advertisers may look to other alternatives. Facebook’s alleged 30 percent share of the banner advertising market may grow if the efficacy of Google’s advertising programs drops.


ZyLAB Audio Search

May 11, 2011

It’s like semantic search for audio files: Allvoipnews announces “ZyLAB Launches Audio Search Bundle.” The eDiscovery company’s product allows you to search your enterprise’s audio using speech analytics:

Company officials said that the desktop software product transforms audio recordings into a phonetic representation of the way in which words are pronounced. The investigators are able to search for dictionary terms, however also proper names, company names, or brands without the need to ‘re-ingest’ the data.
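ZyLAB has not published the details of its phonetic engine, so the snippet below is a deliberately crude stand-in, not ZyLAB’s method: it illustrates the general idea of matching on pronunciation rather than spelling using the classic Soundex code.

```python
def soundex(word):
    """Encode a word as a 4-character Soundex code (a simple phonetic hash)."""
    groups = {"bfpv": "1", "cgjkqsxz": "2", "dt": "3",
              "l": "4", "mn": "5", "r": "6"}

    def digit(ch):
        for letters, d in groups.items():
            if ch in letters:
                return d
        return ""  # vowels and h, w, y carry no code

    word = word.lower()
    first = word[0].upper()
    encoded = []
    prev = digit(word[0])
    for ch in word[1:]:
        d = digit(ch)
        if d and d != prev:
            encoded.append(d)
        if ch not in "hw":  # h and w do not reset the previous code
            prev = d
    return (first + "".join(encoded) + "000")[:4]

def phonetic_search(query, transcript_words):
    """Return transcript words whose phonetic code matches the query's."""
    target = soundex(query)
    return [w for w in transcript_words if soundex(w) == target]
```

A query for “Smith” against a noisy transcript would also surface the variant spelling “smyth,” because both encode to the same Soundex value, which is the kind of spelling-independent retrieval phonetic audio search aims for.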

Kudos to ZyLAB. With this product, the company is pushing ahead of Microsoft Fast and Google. That’s no small feat. However, Exalead has offered audio and video search for several years.

Cynthia Murrell, May 11, 2011

Military Could Benefit From Universal Translator App

May 3, 2011

According to the Wired.com article “Psst, Military: There’s Already a Universal Translator in the App Store,” the US military hopes to get money from Congress to conduct research on the development of a universal translation device for soldiers in the field. “BOLT, the Boundless Operational Language Translation, will be so sophisticated it can understand foreign slang. Robust Automatic Translation of Speech — yes, RATS — will know the difference between speech that needs translating and background noise to discard.”

However, the SpeechTrans app, available on the iPhone and iPad, will “record a spoken phrase you want translated, choose your foreign language, and the app will speak it back to you, all while displaying both versions of your text on the screen.” Some might say that it lacks the sophistication and advancement of similar army systems, but it is a thorough translation program that is ready and available for soldiers to use right now, when they need it most.

As an added bonus, it costs a mere $19.99. Maybe the army should look in the App Store before its next million-dollar research project. The Apple App Store claims to have an app for anything and everything, and it looks like it does. I think online searchers may benefit as well.

April Holmes, May 3, 2011

Freebie

Exalead Embraces SWYM or “See What You Mean”

May 3, 2011

In late April 2011, I spoke with Francois Bourdoncle, one of the founders of Exalead. Exalead was acquired by Dassault Systèmes in 2010. The French firm is one of the world’s premier engineering and technology products and services companies. I wanted to get more information about the acquisition and probe the next wave of product releases from Exalead, a leader in search and content processing. Exalead introduced its search-based applications approach. Since that shift, the firm has experienced a surge in sales. Organizations such as the World Bank and PricewaterhouseCoopers (IBM) have licensed the Exalead Cloudview platform.

I wanted to know more about Exalead’s semantic methods. In our conversation, Mr. Bourdoncle told me:

We have a number of customers that use Exalead for semantic processing. Cloudview has a number of text processing modules that we classify as providing semantic processing. These are: entity matching, ontology matching, fuzzy matching, related terms extraction, categorization/clustering and event detection among others. Used in combination, these processors can extract arbitrary sentiment, meaning not just positive or negative, but also along other dimensions as well. For example, if we were analyzing sentiment about restaurants, perhaps we’d want to know if the ambiance was casual or upscale or the cuisine was homey or refined.
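Exalead’s processors are proprietary, but Mr. Bourdoncle’s restaurant example suggests how sentiment along arbitrary dimensions might work in miniature: score cue words against a small lexicon per dimension. The dimensions and lexicons below are invented for illustration, not taken from Cloudview:

```python
# Hypothetical lexicons: each sentiment "dimension" maps cue words to a
# score along that axis (negative = one pole, positive = the other).
DIMENSIONS = {
    "ambiance": {"cozy": -1, "casual": -1, "relaxed": -1,
                 "elegant": 1, "upscale": 1},
    "cuisine":  {"homey": -1, "hearty": -1,
                 "refined": 1, "delicate": 1},
}

def score_review(text):
    """Score a review along each dimension by averaging matched cue words."""
    words = text.lower().split()
    result = {}
    for dim, lexicon in DIMENSIONS.items():
        hits = [lexicon[w] for w in words if w in lexicon]
        result[dim] = sum(hits) / len(hits) if hits else 0.0
    return result
```

Scoring “an elegant upscale room with homey hearty cooking” would report ambiance leaning upscale and cuisine leaning homey, i.e., two independent axes rather than a single positive/negative verdict.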

When I probed about future products and services, Mr. Bourdoncle stated:

I cannot pre-announce future product plans, I will say that Dassault Systèmes has a deep technology portfolio. For example, it is creating a prototype simulation of the human body. This is a non-trivial computer science challenge. One way Dassault describes its technology vision is “See-What-You-Mean”. Or SWYM.

For the full text of the April 2011 interview with Mr. Bourdoncle, navigate to the ArnoldIT.com Search Wizards Speak subsite. For more information about Exalead, visit www.exalead.com.

Stephen E Arnold, May 3, 2011

No money but I was promised a KYFry the next time I was in Paris.

New Spin for OmniFind: Content Analytics

May 2, 2011

IBM has dominated my thinking with its bold claims for Watson. In the blaze of game show publicity, I lost track of the Lucene-based search system OmniFind 9.x. My Overflight system alerted me to “Content Analytics Starter Pack.” According to the April 2011 announcement:

The Starter Pack offers an advanced content analytics platform with Content Analytics and industry-leading, knowledge-driven enterprise search with OmniFind Enterprise Edition in a combined package. IBM Content Analytics with Enterprise Search empowers organizations to search, assess, and analyze large volumes of content in order to explore and surface relevant insight quickly to gain the most value from their information repositories inside and outside the firewall.

The product allows IBM licensees to:

  • Find relevant enterprise content more quickly
  • Turn raw text into rapid insight from content sources internal and external to your enterprise
  • Customize rapid insight to industry and customer specific needs
  • Enable deeper insights through integration to other systems and solutions.

At first glance, I thought IBM Content Analytics V2.2 was one program. I noticed that the OmniFind Enterprise Edition 9.1 has one set of hardware requirements at http://goo.gl/Wie0X and another set of hardware requirements for the analytics component at http://goo.gl/5J1ox. In addition, there are specific software requirements for each product.

The “new” product includes “improved support for content assessment, Cognos® Business Intelligence, and Advanced Case Management.”


Is IBM’s bundling of analytics and search a signal that the era of traditional search and retrieval has officially ended? Base image source: www.awesomefunnyclever.com

When you navigate to http://goo.gl/he3NR, you can see the different configurations available for this combo product.

What’s the pricing? According to IBM, “The charges are unchanged by this announcement.” The pricing seems to be based on processor value units, or PVUs. Without a link, I am a bit at sea with regard to pricing. IBM does point out:

For clarification, note that if for any reason you are dissatisfied with the program and you are the original licensee, you may obtain a refund of the amount you paid for it, if within 30 days of your invoice date you return the program and its PoE to the party from whom you obtained it. If you downloaded the program, you may contact the party from whom you acquired it for instructions on how to obtain the refund. For clarification, note that for programs acquired under the IBM International Passport Advantage Agreement, this term applies only to your first acquisition of the program.


Ducks and Alphas: Wolfram Alpha and DuckDuckGo Unite

April 25, 2011

“Wolfram|Alpha and DuckDuckGo Partner on API Binding and Search Integration,” touts Wolfram Alpha’s own blog. Both organizations have brought something unique to the search universe, so we’re interested to see what comes of this. Will the pairing be more agile than Google, the Godzilla of search (Googzilla)?

Wolfram|Alpha’s Computational Knowledge Engine not only retrieves data but crunches it for you—very useful, if you phrase your query well. Play with that here.

DuckDuckGo’s claim to fame is that they don’t track us; privacy champions like that. A lot. The site provides brief info, say from a dictionary or Wikipedia, as well as related topics at the top of the results page. It’s also blissfully free of advertising clutter. Check that out here.

According to the Wolfram Alpha blog, they are combining the Wolfram|Alpha functionality with the DuckDuckGo search:

So what does this new partnership mean for you? If you are a DuckDuckGo user, you’ll start to notice expanded Wolfram|Alpha integration. DuckDuckGo will start adding more Wolfram|Alpha functionality and datasets based on users’ suggestions. If there’s a specific topic area you’d like to see integrated into DuckDuckGo, your suggestions are welcome.

And for developers, DuckDuckGo will maintain the free Wolfram|Alpha API Perl binding. With that, you can integrate Wolfram|Alpha into your application. Keep in mind that InQuira and Attensity are “products” of similar tie-ups.
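The Perl binding is the maintained route, but under the hood the Wolfram|Alpha API is a parameterized HTTP request. As a rough sketch in Python, the function below composes a request URL for the v2 query endpoint without sending it; the endpoint path and parameter names (`appid`, `input`, `format`) reflect my understanding of the public API at the time and should be treated as assumptions.

```python
from urllib.parse import urlencode

# Endpoint and parameter names follow Wolfram|Alpha's v2 REST API as
# documented circa 2011; treat them as assumptions, not gospel.
API_ENDPOINT = "http://api.wolframalpha.com/v2/query"

def build_query_url(app_id, query, fmt="plaintext"):
    """Compose a Wolfram|Alpha API request URL (no request is sent here)."""
    params = urlencode({"appid": app_id, "input": query, "format": fmt})
    return f"{API_ENDPOINT}?{params}"

# Example: build_query_url("YOUR-APP-ID", "population of France")
# An application would fetch this URL and parse the XML <pod> results.
```

A hypothetical `YOUR-APP-ID` stands in for the developer key the service issues; the point is only that any language with an HTTP client, not just Perl, can talk to the engine.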

We’ll enjoy watching the progress of this hybrid beast.

Cynthia Murrell, April 25, 2011

Freebie

Is IBM Reshaping Its Approach to Enterprise Search?

April 25, 2011

IBM is a mysterious and baffling outfit to me. One day I get a call from eager IBMers panting to find out what I know about the vendors in enterprise search, content processing, and semantics. Then weeks, maybe months, go by before an IBM person emails me a message like “We’ve been really busy” or “We don’t have a very big budget, but maybe you could talk for free.” The classic IBM input I had this year is from a person who agreed to participate in a Search Wizards Speak interview via email. Months after the deadline, I was given an excuse similar to those I heard when I was a freshman in college and a classmate explained that his mother and his dog had died on the same day.


A better search or a more complex guitar? Source: http://www.heirloomradio.com/history.htm

Imagine my surprise when I received a link to a story from Yomiuri Online. “Natural Language Analysis Software, IBM Japan” contained what may be a compass reading on IBM’s enterprise search strategy. In a nutshell, IBM may be hooking together a content analytics component with the Lucene-based OmniFind Enterprise Edition 9.1. Instead of offering what I can download from Apache or Lucid Imagination, IBM has grafted on text analytics.

The product becomes available on April 26, 2011, in Japan. IBM Content Analytics with Enterprise Search mashes up text mining software and information retrieval software. For good measure, IBM includes natural language analysis technology.

The other shocker, if the person translating the article was accurate, is that IBM will compete aggressively on price. I am not sure how IBM prices its products in Japan, but the software could, for all practical purposes, be free. IBM makes its money on hardware and services, with services becoming increasingly important, in my opinion.

The product will handle social content, the unstructured data that plagues customer service operations, and email, among other source and file types. The system classifies content and outputs analytics, which may mean anything from a simple frequency count to a more elaborate SPSS type of function. If prices are indeed low, my hunch is that the SPSS-type horsepower will not be present in full royal wedding regalia.

Some questions:

  • Will this approach make IBM a bigger contender in enterprise search? No. IBM may be trying to carve a new niche for itself but Autonomy and Exalead are already there.
  • Will this play explain the role of Watson or what IBM is doing with the dozens of analytics companies it has acquired? No.
  • Is this a new trend in enterprise search? No.
  • Will IBM continue to make sales to organizations who want to “go IBM”? Yep.

Vendors have been trying to distance themselves from the word “search” for years. In a sense, IBM is just late to the party. But with its financial resources and clout, tardiness may not matter.

Stephen E Arnold, April 25, 2011

Freebie unlike IBM professional services or a technical roll for a FRU.

The Semantic Web as it Stands

April 16, 2011

Semantic search for the enterprise is here, but the semantic web remains the elusive holy grail. “Semantic Web: Tools You Can Use” gives an overview of the existing state of semantic technology and what is needed to get a true semantic web off the ground.

Tim Berners-Lee was the first to articulate what the semantic web would be like, and his vision of federated search is still sorely missing from reality. Federated search queries several disparate resources simultaneously (as when you search several different library databases at once). Windows 7 supports federated search, but it is still not common throughout the web. The W3C (World Wide Web Consortium) has developed standards to support semantic web infrastructure, including SPARQL, RDF, and OWL, and Google, Yahoo, and Bing are starting to use semantic metadata and support W3C standards like RDF.
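Berners-Lee’s federated search idea can be sketched in a few lines: fan one query out to several sources concurrently and merge the results. The two “sources” below are toy stand-ins invented for illustration; a real deployment would issue network queries to remote databases.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy stand-ins for two disparate library databases.
def search_catalog(query):
    data = {"rdf": ["RDF Primer", "RDF Schema"]}
    return data.get(query, [])

def search_archive(query):
    data = {"rdf": ["Early RDF drafts"]}
    return data.get(query, [])

def federated_search(query, sources):
    """Send the query to every source concurrently and merge the hit lists."""
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        result_lists = pool.map(lambda source: source(query), sources)
    merged = []
    for results in result_lists:
        merged.extend(results)
    return merged
```

The thread pool matters in practice: the total latency is roughly that of the slowest source rather than the sum of all of them, which is what makes querying many repositories at once tolerable.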

Semantic software is able to analyze and describe the meaning of data objects and their inter-relationships, while resolving language ambiguities such as homonyms or synonyms, as long as standards are followed. This has practical applications with things like shopping comparisons. If standards are followed and semantic metadata is provided by the merchants themselves, online shoppers can compare products without all the inaccuracies and out-of-date information currently plaguing third-party shopping comparison sites.
There are some tools, platforms, prewritten components, and services currently available to make semantic deployment easier and somewhat less expensive. Jena is an open-source Java framework for building semantic Web applications, and Sesame is an open-source framework for storing, inferencing, and querying RDF data. Lexalytics produces a semantic platform that contains general ontologies that can be fine-tuned by service provider partners for specific business domains and applications. Revelytix sells a knowledge-modeling tool called Knoodl.com, a wiki-based framework that helps a wide variety of users collaboratively develop a semantic vocabulary for domain-specific information residing on different web sites. Sinequa’s semantic platform, Context Engine, provides semantic infrastructure that includes a generic semantic dictionary that can translate between various languages and can also be customized with business-specific terms. Thomson Reuters provides Machine Readable News, which collects, analyzes, and scores online news for sentiment (public opinion), relevance, and novelty, and OpenCalais, which creates open metadata for submitted content.
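The W3C stack these tools build on rests on RDF’s triple data model. As a rough, non-authoritative illustration of what Jena and Sesame industrialize, here is a toy in-memory triple store with SPARQL-style wildcard matching; the triples themselves are invented example data.

```python
# A toy triple store: each fact is a (subject, predicate, object) tuple,
# the data model RDF uses. None acts as a wildcard, playing the role a
# SPARQL variable would. Jena and Sesame provide industrial versions with
# persistence, inference, and a real query language.
TRIPLES = [
    ("camera_x", "type", "Camera"),
    ("camera_x", "price", "299"),
    ("camera_y", "type", "Camera"),
    ("camera_y", "price", "349"),
]

def match(pattern, triples=TRIPLES):
    """Return all triples matching an (s, p, o) pattern; None matches anything."""
    return [t for t in triples
            if all(p is None or p == v for p, v in zip(pattern, t))]
```

A shopping-comparison query like “all prices of anything typed Camera” becomes two pattern matches joined on the subject, which is exactly the kind of cross-merchant comparison the article says shared metadata would enable.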

Despite all these advances for the use of the semantic web in the enterprise, general, widespread use of the semantic web remains elusive, and no one can predict exactly when that will change:

“In a 2010 Pew Research survey of about 895 semantic technology experts and stakeholders, 47% of the respondents agreed that Berners-Lee’s vision of a semantic Web won’t be realized or make a significant difference to end users by the year 2020. On the other hand, 41% of those polled predicted that it would. The remainder did not answer that query.”

Semantic technology for the enterprise is not only here today but is growing by about 20% a year, according to IDC. That kind of semantic technology is a much smaller beast to tame. When it comes to the World Wide Web, there is still no widespread support of W3C standards and common vocabularies, which is why more people said no than yes in the survey mentioned above. Generalized web searches are difficult because each site has its own largely proprietary ontology instead of a shared and open taxonomy. Sometimes it is difficult to overcome such differences even between sectors of the same business.

However, certain industries are starting to come under pressure from customers or industry groups and have responded by creating standardized ontologies. GoodRelations is one such e-commerce ontology, used by BestBuy.com, Overstock.com, and Google. This kind of technique has not become widespread because of the costs and slow payoff involved. It is a catch-22: businesses don’t want to jump on the bandwagon because there is no critical mass yet, but the real benefits won’t arrive until a large number of businesses participate. Things like product categories are often unique to a business, and achieving universal standardization is akin to a nightmare, but there still needs to be consensus on using some type of W3C standard of categorization to satisfy customers. And, with more and more bogus information proliferating on the web, semantics become not only convenient but essential for finding the right information.

I think the fundamental question this article leaves us with is whether we have the standards we need or whether the current standards are the stepping-off point to something new. SGML was fine in its day, but it didn’t get very far. HTML cherry-picked some of the basic ideas of SGML and added linking, and the World Wide Web was born. Now HTML 5 is re-introducing some of the ideas of SGML that were lost. Maybe HTML can continue to evolve, or maybe someone will cherry-pick its best ideas and create something (almost) entirely new. Another issue is all the work it takes to create the metadata, no matter what the standards. Flickr and Facebook have made user tagging into a fun activity, but for the semantic web to really function, machines need to do most of the work. Will this all be figured out by 2020? Survey says no, but who knows?

Alice Wasielewski, April 16, 2011

Twitter Firehose News

April 15, 2011

There is a tweak to the Twitter and MediaSift partnership. You can read about it in the DataSift write-up “Twitter Partnership.”

MediaSift and Twitter have agreed to a partnership that has the potential to change how marketers and companies understand conversations about their products, as well as how they choose to market them to target audiences. By utilizing the advanced DataSift software, they are able to break down “tweets” into a language that is easily understandable and searchable, and the service remains quite cost effective with its “pay per use” subscription. The article said:

As a company we have been very fortunate to have access to the Twitter Firehose for quite some time. This has enabled us over the past two years to refine our thinking, leading to the incarnation of DataSift.

DataSift compiles multiple social media feeds and additional data sets to create a common abstract layer which provides meaningful insight into much of the chaotic and unstructured data from those outlets. It took nearly 18 months to complete the DataSift platform, but it has already seen a huge outpouring of company and marketing support, with more than a billion requests per month.
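DataSift’s actual schema is not public, so the field names below are hypothetical, but the “common abstract layer” idea reduces to two steps: normalize heterogeneous feed items into flat records, then filter them. A minimal sketch:

```python
# Field names ("user", "screen_name", "text") are hypothetical stand-ins;
# DataSift's real schema and filtering language differ.
def normalize(raw_tweet):
    """Reduce a nested raw tweet dict to a flat, searchable record."""
    return {
        "author": raw_tweet.get("user", {}).get("screen_name", ""),
        "text": raw_tweet.get("text", "").lower(),
    }

def filter_stream(raw_tweets, keyword):
    """Yield normalized records whose text mentions the keyword."""
    kw = keyword.lower()
    for raw in raw_tweets:
        record = normalize(raw)
        if kw in record["text"]:
            yield record
```

A marketer tracking a brand name would point `filter_stream` at the firehose and receive only the uniform records that mention the product, which is the kind of searchable layer the write-up describes.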

Important stuff for the real time crowd.

Leslie Radcliff, April 15, 2011

Freebie

Improving Health via Analytics and a Competition

April 14, 2011

We have been poking around in health care information for about eight months. We have an exclusive briefing that covers, among other things, what we call the “shadow FBI.” If you are curious about this shadow FBI angle, shoot us a note at seaky2000 at yahoo dot com. One of the goslings will respond. While you wait for our return quack, consider the notion of a competition to improve health care information in order to make health care better.

The article “Competition promises better health care” stated:

The goal of the prize is to develop a predictive algorithm that can identify patients who will be admitted to the hospital within the next year, using historical claims data.

According to the latest survey from the American Hospital Association, more than 70 million people in the United States alone will be admitted to a hospital this year. The Heritage Provider Network believes that it can change all of that. The HPN will hold a two-year competition that will award $3 million to the team that creates an algorithm that accurately predicts how many days a person will spend in the hospital over the next year.

An algorithm that can predict how many days a person will spend in the hospital can help doctors create new, more effective care plans that “nip it in the bud” if there are any causes for concern. If successful, the algorithm could help lower the cost of care while reducing the number of hospitalizations.

This will result in increasing the health of patients while decreasing the cost of care. In short, a winning solution will change health care delivery as we know it – from an emphasis on caring for the individual after they get sick to a true health care system.

HPN believes that an incentive based competition is the way to achieve the big breakthroughs that are needed to begin redeveloping America’s health care system.
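Any winning entry will be far more sophisticated, but the competition’s core task, predicting hospital days from historical claims, can be illustrated in miniature with a one-feature least-squares fit. The training pairs below are invented toy data, not Heritage claims records:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y ~ a*x + b with a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

# Toy training data: (claims filed last year, hospital days this year).
claims = [0, 2, 4, 6, 8]
days = [0, 1, 2, 3, 4]

a, b = fit_line(claims, days)
predicted = a * 10 + b  # expected days for a patient with 10 claims
```

Real entries would use thousands of features, regularization, and ensembles; the sketch only shows the shape of the problem: learn a mapping from past claims to future utilization, then score new patients with it.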

Leslie Radcliff, April 14, 2011

Freebie
