Google Interview Worth Reading

March 25, 2009

The interview with Alfred Spector in ComputerWorld is interesting for what it says and what it omits. You can find the article “The Grill: Google’s Alfred Spector on the Hot Seat” here. This is a three-part interview. Mr. Spector is billed as Google’s vice president of research. For me, the most interesting comment was:

Do you have plans to go after that huge body of information on the Internet that is not currently searched? There is stuff on the Web, the so-called Deep Web, that is only “materialized” when a particular query is given by filling fields in a form. Since crawlers only follow HTML links, they cannot get to that “hidden” content. We have developed technologies to enable the Google crawler to get content behind forms and therefore expose it to our users. In general, this kind of Deep Web tends to be tabular in nature. It covers a very broad set of topics. It’s a challenge, but we’ve made progress.

I would hope so. Google has Drs. Guha and Halevy chugging away, or had them chugging away, on this problem. Furthermore, Google bought Transformic, a company that most of the Google pundits have paid scant attention to. Yep, Googzilla is making progress. Just plonking along with the fellow who worked on the semantic Web standards and the chap who invented the information manifold. I enjoy Google understatement.
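For readers who want to see the form problem in miniature, here is a rough Python sketch. It is not Google's method, which the interview does not describe. It only illustrates the general idea Mr. Spector mentions: a link-following crawler sees no URL for the content behind a form, so something has to fill in the fields and generate the URLs. The class `FormFinder`, the function `candidate_urls`, the sample page, and the guessed values are all hypothetical, and the sketch handles only simple GET forms.

```python
from html.parser import HTMLParser
from itertools import product
from urllib.parse import urlencode, urljoin


class FormFinder(HTMLParser):
    """Collect GET forms and the names of their query-style fields."""

    def __init__(self):
        super().__init__()
        self.forms = []        # each entry: {"action": str, "fields": [str]}
        self._current = None   # the form being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and (attrs.get("method") or "get").lower() == "get":
            self._current = {"action": attrs.get("action") or "", "fields": []}
        elif self._current is not None and tag in ("input", "select"):
            kind = (attrs.get("type") or "text").lower()
            is_query_field = tag == "select" or kind in ("text", "search")
            if attrs.get("name") and is_query_field:
                self._current["fields"].append(attrs["name"])

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None


def candidate_urls(page_url, html, guesses):
    """Build one URL per combination of guessed values for each GET form."""
    finder = FormFinder()
    finder.feed(html)
    urls = []
    for form in finder.forms:
        if not form["fields"]:
            continue
        target = urljoin(page_url, form["action"])
        for combo in product(guesses, repeat=len(form["fields"])):
            query = urlencode(dict(zip(form["fields"], combo)))
            urls.append(f"{target}?{query}")
    return urls


# Hypothetical page with a single search form.
PAGE = (
    '<form action="/search" method="get">'
    '<input type="text" name="make">'
    '<input type="submit" value="Go">'
    "</form>"
)
urls = candidate_urls("http://example.com/cars/index.html", PAGE, ["ford", "honda"])
```

The hard part, which this sketch skips entirely, is choosing values worth submitting and deciding which of the resulting pages deserve a place in an index.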

Stephen Arnold, March 24, 2009

Google Copies from Ask.com

March 25, 2009

Newsvine.com ran a story bylined by Michael Liedtke, a journalist working for the Associated Press. I am fearful of quoting anything from an AP story, but I think I can convey the gist of the story “Google Draws upon Rival Ideas with Search Changes” here. The idea is that Google’s suggested queries were inspired–Mr. Liedtke uses the word “popularized”–by Ask.com, the search engine of NASCAR. When I read this, I laughed. Suggested searches are not exactly an innovation. I looked in my files and found references to clustering dating back a decade or more. I recall a clustering effort coded by the Information Industry Award recipient Howard Flank in 1981. The difference between the early attempts at clustering and what Google introduced boils down to one word–scale. Mr. Flank’s effort would not run on the machines available to us at the proprietary Lockheed Dialog company. My thought is that the Google has a practice of working through innovation from computer labs and research papers, learning, and using its clever methods to implement useful functions on Google’s scale. Was Ask.com the inspiration for Google’s engineers? A more likely influence was Dr. Salton’s 1978 paper “Generation and Search of Clustered Files”. Need a copy? Click here. I have zero relationship with Googzilla, an outfit wishing I was a roasted goose. But I was taken aback by the suggestion that the GOOG turned to the search engine of NASCAR for inspiration.

Stephen Arnold, March 25, 2009

EntropySoft: Exclusive Interview with Nicolas Maquaire, CEO

March 25, 2009

A search engine or content processing system is deaf and dumb without a connector to a content source. Most text processing systems include these software connectors (sometimes called “filters” or “adaptors”) to process flat text such as the ASCII generated by a simple text editor. But plain text makes up a small part of the content stored on an organization’s file servers, workstations, and computers. In order to index content from a legacy AS/400 system running the Ironsides enterprise resource planning system, a specialized software connector is required. Writing these connectors is tricky. EntropySoft is a content integration company. The firm has a strong competency in creating software to perform a range of content manipulations; for example, content transformation of an XML file into a file type required by another business process or enterprise system. Mr. Maquaire spoke with Stephen E. Arnold, ArnoldIT.com on March 24, 2009, about EntropySoft’s software and services.
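To make the connector idea concrete, here is a minimal Python sketch. It is not EntropySoft's API or architecture. Every name in it (`Document`, `Connector`, `PlainTextConnector`, `XmlRecordConnector`, `build_index`) is hypothetical. The assumption is simply that each connector turns one kind of source into the same plain document shape, so the indexing code never needs to know where the content came from.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass
class Document:
    """The common shape every connector hands to the indexer."""
    doc_id: str
    text: str


class Connector(ABC):
    @abstractmethod
    def fetch(self):
        """Yield Document objects from the underlying source."""


class PlainTextConnector(Connector):
    """Handles flat text, keyed by file name (hypothetical in-memory source)."""

    def __init__(self, files):
        self.files = files

    def fetch(self):
        for name, raw in self.files.items():
            yield Document(doc_id=name, text=raw.strip())


class XmlRecordConnector(Connector):
    """Flattens each <record> element of an XML export into plain text."""

    def __init__(self, xml_source, record_tag="record"):
        self.xml_source = xml_source
        self.record_tag = record_tag

    def fetch(self):
        root = ET.fromstring(self.xml_source)
        for i, rec in enumerate(root.iter(self.record_tag)):
            text = " ".join(t.strip() for t in rec.itertext() if t.strip())
            yield Document(doc_id=rec.get("id", f"record-{i}"), text=text)


def build_index(connectors):
    """Toy inverted index: word -> set of document ids."""
    index = {}
    for connector in connectors:
        for doc in connector.fetch():
            for word in doc.text.lower().split():
                index.setdefault(word, set()).add(doc.doc_id)
    return index


index = build_index([
    PlainTextConnector({"memo.txt": "Invoice overdue"}),
    XmlRecordConnector(
        '<orders><record id="po-7"><item>Invoice</item><qty>3</qty></record></orders>'
    ),
])
```

A query for "invoice" now finds both the memo and the XML order record. A real connector also has to deal with security, metadata, and write-back, which is where the tricky part comes in.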

Nicolas Maquaire, the chief executive officer of EntropySoft, described his company this way:

EntropySoft is a connector factory. We have more than 30 read/write connectors for unstructured data, possibly the biggest portfolio on the market. Our connectors enable most of the features of popular content-centric applications such as Alfresco, IBM FileNet P8, Hummingbird DM, Interwoven TeamSite, IBM Lotus Quickplace, Microsoft SharePoint etc… The extensive support of features and the size of the connector portfolio make this technology perfect OEM material for many software industries. On top of the read / write connectors, EntropySoft has two technological layers (Content ETL and Content Federation) that are also available as OEM components.

A number of the world’s leading search and content processing companies use EntropySoft’s connectors. Examples include Coveo, Exalead, and Image Integration Systems.

Mr. Maquaire, in an exclusive interview with ArnoldIT.com’s Search Wizards Speak series, said:

The market for content integration is complex. Building a single connector for a specific use case seems nonsensical to us. If you develop many connectors, interoperability then becomes reality. Thanks to its more than 30 (and growing!) connectors, EntropySoft is becoming a one-stop-shopping point for connectivity and interoperability. For the past four years, EntropySoft has acquired valuable knowledge on all popular content-centric systems. EntropySoft connectors have been market-tested for years. EntropySoft connectors are put to work daily in critical business conditions, and EntropySoft unique in-house developed testing system allows fast implementation of customer-driven connectors improvements.

You can read the full text of the Maquaire interview on the ArnoldIT.com Web site here. The interview is number 37 in this series. The interviews provide one of the most useful bodies of information about enterprise search and content processing available at this time. The Search Wizards Speak series is available as a service to organizations and information professionals worldwide. Knowledge about search and content processing increases the payoff from an investment in information retrieval.

Google Slowing Down, Sitting on the Sidelines

March 24, 2009

IDG has been showing some zip. Two articles caught my attention because both point out vulnerabilities in Google, a formidable company. You must read both of these articles. They are:

  • The ComputerWorld story “Pentaho and Amazon.com Deliver BI to the Cloud” here. The story reported that Amazon, the cloud computing retailer, hooked up with Pentaho. The goal is to deliver business intelligence. How is this germane to Google? In my opinion, Google is not in this game. The company’s failure to respond to Amazon’s cloud computing challenge underscores the fact that Google is not as nimble as Amazon. I was hoping that Eric Lai would have pointed out that Google is simply not at this dance.
  • The IDG news service story “Google Apps Missing Enterprise Social-Networking Revolution” here. This story was distributed by Reuters, and it pointed out that Google’s Orkut is not hooked into Google Apps.

Is Google falling behind? In my view, Google is the cat’s meow. Still, I think some Google watchers can make a case that the GOOG is not able to keep pace with some of its more nimble rivals. IDG seems to be on top of this issue.

Stephen Arnold, March 24, 2009

ISYS Search Software: Google Patent Collection

March 24, 2009

You will want to take a look at the ISYS Search Software demonstration here. The company took my collection of Google patent documents from 1998 to December 2008 and processed them. You can run a key word query, click on the names of people, and explore this window into Google’s technology hot house via the ISYS Search Version 9. When you locate a patent document that interests you, a single click will display the PDF of the patent document. You can browse the drawings and claims with the versatile ISYS system at your beck and call.

I have used the ISYS Search Software since Version 3.0. The system delivers high speed document processing, high speed query processing, and a raft of features. For more information about ISYS Version 9, click here. I have been critical of search systems for more than two decades. ISYS Search Software’s engineers have listened to me, and I know from experience that the teams in Crow’s Nest and in Denver have a long term commitment to their customers and to implementing useful features with each release.

Highly recommended. More information about ISYS Search Software is at http://www.isys-search.com/

Stephen Arnold, March 24, 2009

Microsoft Load Balancing: All or Nothing

March 24, 2009

I found this technical tip quite interesting. I am not a fan of either – or approaches. Your mileage may vary. Click here to read “Windows Network Load Balancing – Don’t run with the defaults!” from the At Scale Web log. The topic is an important one, load balancing. The tip is to disable the affinity mode, which is the default. At Scale recommends that we “set every single one of your servers in the NLB cluster to non-affinity mode and you are golden.” Either – or. Let me know if it works. I prefer a different architecture with good old dedicated load balancers, but I am an addled goose with little tolerance for some of the Microsoft software load balancing excitement.
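Why would affinity matter? A toy Python simulation gives the flavor. It is not the algorithm Windows NLB actually uses. The assumption is only the documented behavior: with affinity on, every connection from one client IP address lands on the same host, and with affinity off, the client port also figures in the choice. The functions `pick_host` and `distribute`, the hash, and the two proxy addresses are hypothetical.

```python
import hashlib
from collections import Counter


def pick_host(key, host_count):
    """Map a key to a host number with a stable hash (stand-in for NLB's own)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % host_count


def distribute(connections, host_count, affinity):
    """Count connections per host, keyed by IP alone or by IP plus port."""
    counts = Counter({host: 0 for host in range(host_count)})
    for ip, port in connections:
        key = ip if affinity else f"{ip}:{port}"
        counts[pick_host(key, host_count)] += 1
    return counts


# Hypothetical traffic: 2,000 connections arriving through just two proxies.
connections = [
    (ip, port)
    for ip in ("10.0.0.1", "10.0.0.2")
    for port in range(40000, 41000)
]

with_affinity = distribute(connections, host_count=4, affinity=True)
without_affinity = distribute(connections, host_count=4, affinity=False)

print("affinity on: ", dict(with_affinity))
print("affinity off:", dict(without_affinity))
```

With affinity on, two busy proxies can keep at most two of the four hosts working while the others idle. With affinity off, the same traffic spreads across the cluster, at the cost of losing any per-client stickiness an application may depend on.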

Stephen Arnold, March 25, 2009

Big Media to Google, Make Us Number One Again

March 23, 2009

I enjoyed Steve Rubel’s article “Media Companies Ask Google to Favor Their Content Over Blogs” here. He presented the argument originally set forth in Ad Age and added some useful comments. For me, the most interesting was:

A neutral Google is a good Google. They should continue to deliver an algorithm that rewards the highest quality sources that have earned a following, interest and links from other sources. If the media companies don’t want Google to favor bloggers, why not just stop linking to them or use no follow tag? That may over time, erode their Google Juice. However, I suspect most realize it’s too late to put the genie back in the bottle.

Neutral? Hmmm.

However, I wanted to ask several questions. I don’t want to forget them:

  1. Why are the big media outfits so confident that their information should be at the top of a Google results list? When I run a query about Google or Microsoft technology, I skip big media write ups and look for solid information in technical papers or from specialist sources.
  2. What’s driving this proposal at this time? My hunch is that after a decade of ignoring Google and even longer hoping that online would behave like information on paper or on 1950s broadcast TV, the big media folks realize that they are marginalized. The savior is Google. Google is not a religion, so why not pay Google for placement?
  3. How can informed people perceive Google as objective? Run a query for enterprise search. Who is at the top of the results list? Why is this entry at the top of the results list? Why are pointers to my Google patent search buried in the Google search results? I must admit that the notion that Google is objective is a novel one to me.

I liked Mr. Rubel’s analysis for the most part. I think his write up will spark a number of comments.

Stephen Arnold, March 23, 2009

The Guardian’s Observer Sees Trickiness in Google

March 23, 2009

I am an addled goose and my prose does not flow trippingly on the tongue. I do enjoy British humor. If I did not, I would not have been able to sustain my 25-year friendship with my British publisher.

If you enjoy the Oxbridge way, you will find some enjoyment in Robert McCrum’s “Is Google Committing Theft – or Ushering in a Bright New Age?” here.

Poor Googzilla. Its alleged copyright transgressions continue to provide ammunition to its critics. But Google gets off with a slap on its snout. Mr. McCrum targets “a pop academic” named James Boyle. He was, if I am understanding Mr. McCrum’s argument, used as an example of “a rallying cry to the Googletariat: ‘Nerds of the world unite, you have nothing to lose but your urls.’”

I quite like the “nerds of the world” phrase. I don’t care too much for Googletariat. I hope Mr. Boyle experiences joy when he realizes that his book The Public Domain is the focus of Mr. McCrum’s wit.

The only issue I have with Mr. McCrum’s write up is that he works for a dead tree outfit that may face some challenges that wit cannot resolve. I suppose the financial mavens at the Guardian / Observer could use Google to look for some ideas. Well, maybe not? A manual search of the shelves in the basement of Blackwell’s in Oxford would probably be more comfortable.

Stephen Arnold, March 22, 2009

Intelligenx Powers Mapaspublicar.com

March 23, 2009

Intelligenx and Publicar SA recently launched http://www.mapaspublicar.com, a new local search application that focuses on the country of Colombia. Publicar, a leading multimedia content provider in Latin America, is using Intelligenx’s search platform with advanced mapping features and Google API to serve up 3,000 points of interest (e.g., museums, hospitals, schools) and detailed information about businesses in Colombia. The new web site functions like Google Maps to search for Colombia locations, all through a Spanish language interface. It’s another step by Intelligenx, http://www.intelligenx.com, which has more than 10 years’ experience in local search advertising, to create business with more directory publishers and local search providers. The plan is to expand coverage of mapaspublicar.com to eight countries in Latin America.

Jessica W. Bratcher, March 22, 2009

Apple Google: The Arrogant to Teach the Arrogant

March 23, 2009

I found the CNet article “What Google Should Learn from Apple” here amusing. In my opinion, some of the CNet articles are he-said she-saids. Mr. Matyszczyk’s approach appeals to me. He used a Google arts and crafts person’s resignation as a springboard to this idea: Data crushes artistic instinct. I think he was right. For me, the most interesting observation in the article was:

The fact is that human beings are astoundingly, depressingly, maddeningly human. Which makes them irrational, contradictory, capricious and, sometimes, just plain nuts. These aspects are the hardest for engineers to get their talents around because, one hopes, they are impossible for engineers to get their talents around.

I have just one thought which I wish to capture. Is this the arrogant leading the arrogant?

Stephen Arnold, March 22, 2009
