Search Engine Optimization Meets Semantic Search

August 19, 2008

I’ve been sitting in the corn fields of Illinois for the last six days. I have been following the SES (Search Engine Strategies) Conference via the Web. If you have read some of my previous posts about the art of getting traffic to a Web page, you know my views of SEO. In a word, “baloney.” Web sites without content want to get traffic. The techniques used range from link trading to meta tag spamming. With Google the venturi for 70 percent of Web search, SES is really about spoofing Google. Google goes along with this stuff because the people without traffic will probably give AdWords a go when the content-free tricks don’t work reliably.

I was startled when I read the summary of the panel “Semantic Search: How Will It Change Our Lives?” The write up I saw was by Thomas McMahon, and it seemed better than the other posts I looked at this evening. You can read it here. The idea behind the panel is that “semantic search” goes beyond key words.

This has implications for people who stuff content free Web pages with index terms. Google indexes using words, and sometimes the meta tags play a role as well. If semantic search catches on, people will not search by key words; people will ask questions. The idea is that instead of typing Google +“semantic Web” +Guha, I would type, “What are the documents by Ramanathan Guha that pertain to the semantic Web?” The fellow helped write the standard document several years ago. He’s a semantic Web guru, maybe the Yoda of the semantic Web?

image

Source: http://www.kimrichter.com/Blog/uploaded_images/snakeoil_1-794216.jpg

Participating in this panel were Powerset (Xerox PARC technology plus some original code), Hakia (original technology and a robust site), Ask.com (I’m not sure where it falls on the semantic scale since the rock band wizard from Rutgers cut out), and Yahoo (poor, fragmented Yahoo).

The elephant in the room but not on the panel is Google, a serious omission in my opinion. Microsoft R&D has some hefty semantic talent as well, also not on the panel.

In my opinion the semantic revolution is going to make life more difficult for the SEO folks. Semantic methods require content. Content free Web sites are going to be struggling for traffic unless several actions are taken:

  1. Create original, compelling information. I just completed an analysis of a successful company’s Web site. It was content free. It had zero traffic. The short cut to traffic is content. The client lacks the ability to create content and doesn’t understand that people who create content charge money for their skills. If you don’t have content, go to item two below.
  2. Buy ads. Google’s traffic is sufficiently high that an ad with appropriate key words will get some hits. Buying ads is something SES attendees understand. Google understands it. You may need to pump $20,000 per month into Googzilla’s maw, but you will get traffic.
  3. Combine items one and two.
  4. Buy a high traffic Web site and shoehorn a message into it. There are some tasty morsels available. Go direct and eliminate the hassle and delay of building an audience. Acquire one.

Most SEO consulting is snake oil and expensive snake oil at that. The role of semantic methods will be similar to plumbing. It is important, but like the pipes that carry water, I don’t have to see them. The pipes perform a function. Semantics and SEO are a bit of an odd couple.

Stephen Arnold, August 19, 2008

Search Engine Plumbing Revealed

August 19, 2008

Explaining search is a very difficult business. I want to recommend “The Linear Algebra Behind Search Engines” by Amy Langville. The discussion was developed several years ago and is now available without charge on MathDL, a service of the Mathematical Association of America’s Digital Library. You can find the excellent write up here. Dr. Langville does include equations, something most publishers quickly delete from most books and reports about search and content processing. Useful comments and explanatory material set this essay apart. If you are interested in the inner workings of some of the search methods in use today, this is must-read material. Two–yes, two–happy quacks to Dr. Langville and her excellent work. Now on the College of Charleston team, Dr. Langville has a Ph.D. in operations research from North Carolina State University. She is a recipient of the multi year CAREER Award from the National Science Foundation. She has a new book about Google’s PageRank method in the works.
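For readers who want a taste of the math before opening the essay, here is a minimal sketch of one widely used linear algebra idea in retrieval: treat each document and each query as a vector of term counts and rank documents by the cosine of the angle between them. This is not taken from Dr. Langville's write up. The function names, the whitespace tokenizer, and the three sample documents are assumptions made up for illustration.

```python
import math
from collections import Counter

def term_vector(text):
    """Count lowercase word occurrences; a crude stand-in for real indexing."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank(query, docs):
    """Return (score, doc_id) pairs, best match first."""
    q = term_vector(query)
    scored = [(cosine(q, term_vector(text)), doc_id) for doc_id, text in docs.items()]
    return sorted(scored, reverse=True)

# Made-up documents for illustration only.
docs = {
    "d1": "semantic web standards and metadata",
    "d2": "buying ads for web traffic",
    "d3": "linear algebra for search engines",
}
results = rank("semantic web", docs)
```

A document sharing both query words outranks one sharing a single word, and a document sharing none scores zero. Production systems add term weighting and much else, but the geometry is the same.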

Stephen Arnold, August 19, 2008

Tuesday Trivia: Misspell Google And

August 19, 2008

If you enter the address www.goole.com, you will end up at “the town and port of Goole.” Goole helpfully provides a number of links to Ask.com. Thanks to the reader who called this to my attention. Watch your spelling, gentle reader. I would put more than a single advert on this page, however.

Stephen Arnold, August 19, 2008

GraphOn Vs Google

August 18, 2008

Patents are complicated. Software patents are even more complicated. GraphOn, a publicly traded company with the motto “Fast and Secure Application Access”, asserts that Google has infringed on GraphOn patents. Forbes Magazine has a good summary here. GraphOn’s technology includes systems and methods for cloud-based services. One bone of contention pertains to data management.

The GraphOn organization has pressed claims against Juniper Networks, AutoTrader.com, and other high profile outfits. Some of Google’s highest profile services may be affected, including Google Base and Google AdWords. Google has a number of patents for its systems and methods. A partial list of these is available at ArnoldIT.com here. Some of the information from my study of selected Google inventions may be located by navigating here and entering the phrase Google patents in the search box. I do maintain a relatively complete listing of Google’s patent documents, but this information is available only to my clients. If you are interested in accessing these data, write me at seaky2000 at yahoo dot com for more information. My Google Version 2.0 reviews a number of Google’s patent documents, including some references to Google’s approach to data management, publishing, and a number of innovation drivers; that is, inventions in which Sergey Brin or Larry Page play a role. Keep in mind that I am not a legal eagle. My discussion of these inventions is intended to share my findings about how certain Google innovations enable certain applications. As Google’s influence grows, legal charges are likely to increase as well. Google has a number of legal matters underway, some involving data management systems and methods. Patent litigation is slow and expensive. Information will dribble out, which makes it difficult to know exactly what’s happening. What’s clear is that GraphOn believes it has a strong case based on its patents:

  • 6,324,538, Automated on-line information service and directory, particularly for the world wide web
  • 6,850,940, Automated on-line information service and directory, particularly for the world wide web
  • 7,028,034, Method and apparatus for providing a dynamically-updating pay-for-service web site
  • 7,269,591, Method and apparatus for providing a pay-for-service web site

You can get more information about each of these from the search system at the US Patent & Trademark Office. Remember to check your query syntax. It must match the sample searches in order to get goodies from the USPTO’s wonderful system. I am making no warranties or guaranties about these references. You will need to verify these numbers and titles yourself.

The ZDNet discussion of this issue is here.

Stephen Arnold, August 18, 2008

SharePoint: Custom Search Scopes

August 18, 2008

A reader sent me a link to SearchWinIT at TechTarget.com. The article explains how to “Create Custom Global Search Scopes in Microsoft SharePoint 2007.” The author is Natalya Voskresenskaya, and you can read the full text here. A “search scope” is a narrowing function. It’s somewhat like setting up a collection of documents and then routing specific users’ queries to that collection. The idea is that the content in the scope (“collection”) will be more appropriate. For example, the marketing department needs access to content from two departments and the documents reside in specific folders. A scope allows a user in marketing to get hits from that specific subset of content. SharePoint has other documents in its index, but the marketing person sees documents from that scope. The article does a good job of explaining the procedure to set up a scope.
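To make the “narrowing function” idea concrete, here is a minimal sketch in plain Python. It is not SharePoint code and does not use any SharePoint API. The scope name, the folder paths, the `Doc` class, and the `search` function are all hypothetical, invented only to show how a scope restricts hits to a subset of an index.

```python
from dataclasses import dataclass

@dataclass
class Doc:
    path: str
    text: str

# Hypothetical scope definition: a scope name mapped to folder prefixes.
SCOPES = {
    "marketing": ("/sales/collateral/", "/product/briefs/"),
}

# Hypothetical index contents.
INDEX = [
    Doc("/sales/collateral/pricing.doc", "pricing overview for partners"),
    Doc("/product/briefs/roadmap.doc", "roadmap and pricing notes"),
    Doc("/legal/contracts/nda.doc", "pricing terms under NDA"),
]

def search(query, scope=None):
    """Return paths of docs containing every query word, limited to a scope."""
    words = query.lower().split()
    prefixes = SCOPES[scope] if scope else None
    hits = []
    for doc in INDEX:
        # Skip documents that live outside the folders named by the scope.
        if prefixes and not doc.path.startswith(prefixes):
            continue
        tokens = doc.text.lower().split()
        if all(w in tokens for w in words):
            hits.append(doc.path)
    return hits
```

Calling `search("pricing")` returns all three documents, while `search("pricing", scope="marketing")` leaves out the legal contract. The index still holds that document; the marketing user simply does not see it.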

Stephen Arnold, August 18, 2008

Microsoft SharePoint: Almost to the Podium

August 18, 2008

I don’t pay much attention to Adobe. The outfit lost me when it went nowhere with FrameMaker (ideal for long technical reports) and put steroids in InDesign, a tool for people with a Master of Fine Arts degree. I use Foxit, abandoning the bloated Acrobat. I get along okay with GIMP because the upgrades to Photoshop take too much time for me to figure out. Fiddling with weird little controls is fine for a 16 year old, not so fine for a 64 year old with lousy eyesight.

I did read an article on JD/Adobe, a Web log by an Adobe guru, and I urge you to read it as well. In “NBCOlympics.com aftereffects”, Mr. John Dowdell does a very good job of summarizing some of the knowns about the Microsoft / NBC Olympics service. You will want to save the article. I think others will respond to it in the near future. The story is here.

For me, the most interesting point in the write up is this comment:

Microsoft was heard as saying “we’re bringing Olympics to the world”, and only later people realized this was a US-only deal. Linux users were cut out, as were Mac/PPC owners. Then 10% of US broadband folks were cut out atop that. Microsoft would have drawn less criticism were they a little more realistic in setting expectations.

Another critical view of the coverage appears in LiveSide.net’s Web log. That article is here.

When I read this, three thoughts went through my mind:

  • The shading of the universality of the Silverlight-based service bothered me. The reality of what was available versus what was suggested to be available illustrates the schism between engineers and marketers in the Microsoft organization.
  • The exclusion of Mac and Linux owners is, for me, at odds with Microsoft’s relatively recent emphasis upon “playing well with others.” My Macs sat deaf and blind to the Olympic events I wanted to see. As a result, as a consumer, I was angry. I like table tennis, and I wasn’t going to get up at 4 am and watch to see if the events would be broadcast on “regular” TV. I want table tennis Olympics style my way.
  • The strategic implications of the Microsoft information in Mr. Dowdell’s write up are significant if he is on the money. Why? SharePoint is positioned as collaboration, content management, search, and the next Microsoft enterprise operating system or some such wild vision. The reality is that SharePoint cannot be equally adept at each of these quite different functionalities. Just as the Olympics programs were not available on the platforms named by Mr. Dowdell, SharePoint will not be the universal slicing and dicing machine the marketers suggest.

I want to pick up the thread of this idea in a subsequent addled goose post. For the moment, please, read Mr. Dowdell’s essay and look at the comments.

Stephen Arnold, August 18, 2008

Email or Search: Which Wins the Gold

August 18, 2008

My son (Erik Arnold) runs a nifty services firm called Adhere Solutions. He’s hooked up with Google, and he views the world through Googley eyes. I (Stephen Arnold) run the addled goose outfit ArnoldIT. Google does not know I exist, and if Googzilla did, the Mountain View giant would make a duvet from my tail feathers.

The setting. We’re sitting in a cafeteria. The subject turns to which is the killer application for today’s 20 something. Is it email (the Brett Favre of online) or is it search (the Michael Phelps of cloud services)? My son and I play this argument MP3 file frequently, and our wives have set down specific rules for these talks. First, we have to be by ourselves. Second, we have to knock off the debate after 30 minutes or so. Erik and I can extend analytic discussions of digital theory over years, and we have marching orders to knock that off.

Here’s the argument. Erik asserts that search is the new killer app. I agree, but I tell him I want to make a case for email as long as I can extend it to SMS and newer services under the category Twitterish. He agrees.

My Argument: Messaging

Messaging is communications. Search is finding and discovering. Therefore, the need to communicate is higher on the digital needs scale than simple finding. With services that allow me to call, text, create mini blogs, and broadcast brief Tweets, I am outputting and receiving messages that are known to be:

  • Important. I don’t text a client to tell her what I had for lunch in the wonderful cafeteria. Grilled cheese as it turns out. Important to me, but to no one else. I send important messages that have an instrumentality.
  • Timely. I control the time delivery, matching urgency with medium. I sent a fax last week. What a hassle, but the message warranted a fungible copy, not urgent delivery. I want to dial in the “time” function, not leave it to chance or to some other authority.
  • Content rich. I write baloney, but I wouldn’t write baloney unless it was important to me and to the recipient of one of my messages, articles, or 350 page studies.

In conclusion, messaging–particularly electronically implemented messaging–is the killer app. Search is useful, just not one to one, one to many, many to one, or many to many communications. By definition, search is not timely, is of uncertain importance, and is often not content rich due to format, editorial policy, or the vapidity of the data.

My Son’s Argument

Messaging is not necessarily digital. Though crucial, when we talk about an online killer app, it’s not email. The killer app must deliver a function that we can’t duplicate in the analogue world. For that reason search is the killer application for the 21st century. Here’s why:

Read more

Facets ‘Lite’: Discovery Navigation for Thunderbird

August 18, 2008

David Huynh, a research scientist at MIT, posted a brief description of Seek 1.0 in March 2008. This software plug in allows a user to locate information in Thunderbird email. In eCommerce and enterprise search, Endeca has been successful positioning itself as one of the leaders in point-and-click interfaces. The idea is that during content processing, the system identifies concepts, entities, and relationships. A user has the option of plugging a word into a search box or browsing categories or other objects displayed. The user can scan a list of hot links, click on one, and begin examining information. Key word search is useful, but if the user does not know the terms to use, the browse feature becomes a useful way to locate information.
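The browse-by-category idea can be sketched in a few lines of Python. This is not how Seek or Endeca is implemented. The message fields, the sample data, and the `facet_counts` and `narrow` helpers are assumptions invented to show the general mechanics: count the values in each category, let the user click one, then recount over what remains.

```python
from collections import Counter

# Made-up messages; a real mail client would supply these fields.
MESSAGES = [
    {"sender": "erik", "year": 2008, "has_attachment": True},
    {"sender": "erik", "year": 2007, "has_attachment": False},
    {"sender": "client", "year": 2008, "has_attachment": True},
    {"sender": "client", "year": 2008, "has_attachment": False},
]

def facet_counts(messages, fields):
    """For each field, count how many messages carry each value."""
    return {f: Counter(m[f] for m in messages) for f in fields}

def narrow(messages, **selected):
    """Keep only messages matching every selected facet value."""
    return [m for m in messages
            if all(m[k] == v for k, v in selected.items())]

# Counts shown to the user before any click.
counts = facet_counts(MESSAGES, ["sender", "year"])

# The user clicks "2008"; the remaining facets are recounted.
from_2008 = narrow(MESSAGES, year=2008)
recount = facet_counts(from_2008, ["sender"])
```

The user never types a key word. The counts themselves tell the user what is in the mailbox, which is the point of a discovery interface.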

The Seek 1.0 component, according to Dr. Huynh’s Web log here, is “an extension for Mozilla Thunderbird that provides faceted browsing features to let you search through your email more efficiently.” Commercial systems can be expensive. Dr. Huynh’s is available here. Endeca is most likely aware of Dr. Huynh’s activities, and Dr. Huynh lists one of Endeca’s research scientists in his “blogroll”.

Here’s a snippet of the interface:

image

After installing the component, navigate to the Thunderbird Tools menu and click on Seek. You are good to go.

Dr. Huynh says:

It is thus important that everyone be able to deal with data themselves: gather data, sift through data, integrate data, interpret data, make informed conclusions, and present their findings to their peers and to the world.

For me the importance of Seek is that the system is sufficiently lightweight to run on most notebook computers. Furthermore, the interface integrates well with Thunderbird, so users don’t have to understand metadata to make use of the system. Finally, for now, the system is making discovery interfaces available to a broader range of email users.

Is there a downside? The system does take some time to process content. I didn’t notice significant latency, but I have a fire breather and you may have an asthmatic gizmo. We have not subjected the component to crash recovery testing; that is, is it possible to restore indexes in the event of a problem? We will get to that in the days ahead. Finally, there are a number of commercial systems gearing up to enhance, improve, and search email. At this point it’s not clear whether these services will confuse users, which could create traction problems for interesting projects like Seek.

A happy quack to Dr. Huynh and the rest of the technical Jedi knights at the MIT Haystack Group. If you want to know more about Dr. Huynh, his CV is here.

Transinsight: Bio-Science Search

August 17, 2008

Earlier this year, I watched several “webinars” (man, I hate that term) about life science search. One company was in Denmark. Another outfit was in Michigan. A third company was the German firm Transinsight. Semantic content processing allows assisted navigation to complement the search box. The idea is that a user will recognize useful information. A key word search puts the burden on the user to find the “right” query to get the system to disgorge the needed information.

The company has a demo to showcase its technology. GoPubMed here allows you to locate information without entering and refining queries. The interface offers some useful options; for example, here are the discovered topics and statistics for 1,000 documents about oncology.

stats display

The company’s customers include Elsevier, BASF, Unilever, and the Max-Planck-Institut for Biochemistry, among others. The privately held firm has revenues estimated to be about $3.0 million per year. Venture funding has been provided by High Tech Gruenderfonds.

On August 15, 2008, Transinsight announced a deal with Abcam, a specialist in antibodies and reagents, to develop a search solution for antibody targets. You can read more about Abcam here. In today’s search lingo, the new service will be a “vertical search system.” A news release about the new system is here.

The important points about Transinsight and its announcement include:

  • The semantic technology originated in Germany
  • The system pushes beyond the point and click interfaces available for less specialized content with the addition of the illustrated statistics function in the screenshot
  • The technology is an appropriate use for the six or seven synonyms for gene name. Although complex, the application is not a “boil the ocean solution”.
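The synonym point in the last item can be illustrated with a minimal Python sketch. This is not Transinsight's method. The alias groups, the `build_lookup`, `expand`, and `matches` helpers, and the whitespace matching are assumptions for illustration. One alias group uses a well known pair of names, and the other is entirely hypothetical.

```python
def build_lookup(groups):
    """Map every alias to the full set of names in its group."""
    lookup = {}
    for names in groups:
        normalized = {n.lower() for n in names}
        for n in normalized:
            lookup[n] = normalized
    return lookup

GROUPS = [
    ["TP53", "p53"],           # a well known alias pair
    ["GENE-X", "GX1", "GX"],   # hypothetical names, for illustration only
]
LOOKUP = build_lookup(GROUPS)

def expand(query):
    """Expand each query word to all known aliases; unknown words pass through."""
    expanded = []
    for word in query.lower().split():
        expanded.append(sorted(LOOKUP.get(word, {word})))
    return expanded

def matches(query, text):
    """True when every query word, or one of its aliases, appears in the text."""
    tokens = set(text.lower().split())
    return all(tokens & set(group) for group in expand(query))
```

A user who types one name for a gene gets documents that use another, without having to know the alternatives. With six or seven synonyms per gene, that burden would otherwise fall on the person crafting the query.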

A happy quack to Transinsight and the Beyond Search reader who provided the link to Transinsight.

Stephen Arnold, August 17, 2008

The ‘Search Is Dead’ Question

August 17, 2008

New Idea Engineering and I cooperate to produce a list of utilities helpful to those working with search and content processing. I want to build on the August 4, 2008, post “‘Enterprise Search Dead’ or Just Misunderstood?” Keep in mind that I don’t disagree with the points in the post. For me, the important point in the article was this statement about the fact that organizations have multiple search and content processing systems:

The real trick is to glue these technologies together not into a single giant searchable index, but to combine them together logically so the user does not need to know where to look for specific content. We, like many others, call this Federated Search.

I am in favor of federation, aggregation, and simplification. My concern is that the costs associated with multiple systems, multiple “looks” at information, and multiple “cooks in the kitchen” will be difficult to control. Costs matter today. Tomorrow costs will be even more important. Here’s why:

  1. As search becomes pervasive, costs will chug along, controls will be lax, and then the bills arrive. Few managers can survive cost time bombs like those associated with search. A “time bomb” is a “do whatever it takes” weekend when the system goes down or a cost review by a new chief financial officer who puts a ceiling on information technology expenditures and triggers a melt down.
  2. Multiple indexes of the same document are okay as long as the document is not undergoing rapid change. In certain organizations, change is frequent and often pretty darn wacky. Out of sync information retrieval systems can be a gold mine for legal discovery. Figuring out which index is the “right” one may be an issue in some situations.
  3. Multiple systems indexing content within the organization can choke the internal network. Running several systems to update an index may degrade network performance.
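The federation idea and the out of sync risk in item two can be sketched together in Python. This is an illustration under stated assumptions, not a description of any vendor's federated search. The two stand-in back ends, the `(doc_id, version)` result shape, and the function names are all hypothetical.

```python
def federated_search(query, systems):
    """Query each system and merge hits by document id.

    `systems` maps a system name to a search function returning
    a list of (doc_id, version) pairs. Both are hypothetical.
    """
    merged = {}
    for name, search_fn in systems.items():
        for doc_id, version in search_fn(query):
            merged.setdefault(doc_id, {})[name] = version
    return merged

def out_of_sync(merged):
    """Return ids of documents whose indexed versions disagree across systems."""
    return sorted(doc_id for doc_id, seen in merged.items()
                  if len(set(seen.values())) > 1)

# Stand-in back ends; real systems would sit behind network calls.
def mail_index(query):
    return [("contract-17", 3), ("memo-2", 1)]

def file_index(query):
    return [("contract-17", 4), ("plan-9", 2)]

merged = federated_search("contract", {"mail": mail_index, "files": file_index})
stale = out_of_sync(merged)
```

The user sees one merged result set and does not need to know which system holds what. The same merge step also exposes the problem: here the two systems hold different versions of one document, and someone has to decide which index is the “right” one. Note that every federated query fans out to every system, which is where the cost and network load in items one and three come from.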

Most information technology managers assume that today’s software and hardware can handle any demand. The problem is that many of today’s systems increase complexity and risk. The ready availability of low cost, fire breathing servers removes inhibitions. The result is system promiscuity and projects that look great in a PowerPoint presentation but fail miserably in the crucible of doing everyday work.

If search is not dead, we need to retire it and move up a level. Let’s give users a way to access information that makes most users happy. That’s not what today’s systems deliver. Most users are unhappy with the search systems available to them for behind the firewall search.

Stephen Arnold, August 17, 2008
