Google Base Tip

April 23, 2009

Google Base is not widely known among the suits who prowl up and down Madison Avenue. For those who are familiar with Google Base, the system is a portent of Googzilla’s data management capabilities. You can explore the system here. Ryan Frank’s “Optimizing Your Google Base Feeds” here provides some useful information for those who have discovered that Google Base is a tool for Google employment ads, real estate, and other types of structured information. Mr. Frank wrote:

It is also important to note that Google Base uses the information from Base listings for more than just Google OneBox results. This data may also be displayed in Google Product Search (previously Froogle), organic search results, Google Maps, Google Image Search and more. That adds up to a variety of exposure your site could potentially receive from a single Google Base listing.

Interesting, right? Read the rest of his post for some useful information about this Google service.

Stephen Arnold, April 23, 2009

Personalized Network Searching: Google after People Search

April 22, 2009

The hounds of the Internet are chasing Google’s “Search for Me on Google”. I can’t add to that outpouring of insight about technology that is exciting today but dated by Google time standards. I can, however, direct your attention to US 7,523,096, “Methods and Systems for Personalized Network Searching.” You can download this patent from the USPTO. The document was published on April 21, 2009, and was filed on December 3, 2003. You may want to read the background of the invention and scan the claims. The diagrams are standard Google fare, leaving much to the reader who must bring an understanding of other Google subsystems to the analysis. To put the Search on Me discussion into context, here’s the abstract for the granted patent, now almost six years old:

Systems and methods for personalized network searching are described. A search engine implements a method comprising receiving a search query, determining a personalized result by searching a personalized search object using the search query, determining a general result by searching a general search object using the search query, and providing a search result for the search query based at least in part on the personalized result and the general result. The search engine may utilize ratings or annotations associated with the previously identified uniform resource locator to locate and sort results.
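For readers who prefer code to claims language, here is a minimal Python sketch of the flow the abstract describes: run the query against a personalized search object and a general search object, then build one result list from both. This is an illustration only, not Google's implementation. The toy indexes, the Hit class, the example URLs, and the personal_weight boost are all assumptions; the abstract does not say how the two results are combined.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    url: str
    score: float


def search(index: dict[str, dict[str, float]], query: str) -> list[Hit]:
    """Look up a query in a toy index that maps query -> {url: score}."""
    return [Hit(url, score) for url, score in index.get(query, {}).items()]


def personalized_search(query, personal_index, general_index, personal_weight=2.0):
    """Blend hits from a personal index and a general index into one ranked list.

    personal_weight is a hypothetical boost for URLs the user has already
    rated or annotated; it is not taken from the patent.
    """
    combined: dict[str, float] = {}
    for hit in search(general_index, query):
        combined[hit.url] = combined.get(hit.url, 0.0) + hit.score
    for hit in search(personal_index, query):
        combined[hit.url] = combined.get(hit.url, 0.0) + personal_weight * hit.score
    # Highest blended score first; ties broken by URL for a stable order.
    ranked = sorted(combined.items(), key=lambda item: (-item[1], item[0]))
    return [Hit(url, score) for url, score in ranked]


# Made-up data: the general index prefers the car page, but the user
# previously rated the zoo page.
general = {"jaguar": {"cars.example/jaguar": 0.9, "zoo.example/jaguar": 0.6}}
personal = {"jaguar": {"zoo.example/jaguar": 0.5}}

results = personalized_search("jaguar", personal, general)
for hit in results:
    print(hit.url, round(hit.score, 2))
```

In this toy data the zoo page outranks the car page once the user's own rating is folded in, which is the practical effect the abstract points to.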

This is an important invention attributed to Stephen Lawrence and Greg Badros. Both have made substantive contributions to Google in the past. You may want to examine the current people search and then check out the dossier invention that I have written about elsewhere. There are some interesting enhancements to the core dossier technology in the future. My assertion is that Google moves slowly. When these “innovations” roll out, some are surprised. The GOOG leaves big footprints in my experience. Where’s Pathfinder when one needs him?

Stephen Arnold, April 22, 2009

GEFCO and Exalead: Win International Prize for Innovation

April 21, 2009

Congratulations to GEFCO, and by extension, Exalead, for winning the Grand Prix et Trophée de l’innovation prize in recognition of innovation in business information management. The trophy was presented on April 7, 2009, by CIO-online.com, Le Monde Informatique and IT News Info. There’s a video of the awards at http://www.trophees-cio.com/ and a PDF profile of the winners and projects at CIO Online.

A leading European provider of vehicle transport, logistics, and other transportation services, GEFCO earned its award thanks to Exalead, a leader of search based business application solutions and information access in the enterprise and on the web. GEFCO won the CIO-online.com trophy for its new vehicle track and trace service built on Exalead’s CloudView platform. (You can read about CloudView here.)

GEFCO uses Exalead CloudView to drive a search based application engine and real time operational tools for reporting, query, and analysis of the database of vehicles delivered, logistics, and spare parts management.

ArnoldIT.com interviewed Paul Doscher, U.S. CEO of Exalead, in January 2009, and Mr. Doscher spoke of their partnership with GEFCO then. He stated:

GEFCO is using Exalead to track their vehicles. GEFCO’s new ‘Track and Trace’ application is built upon Exalead’s flagship platform that offers powerful search functionality and can provide up-to-the-minute information from an extremely large data set.

You can read the entire interview on the Search Wizards Speak service here.

Jessica Bratcher, April 21, 2009

Semantic Roll Up: The Effect of Financial Compression

April 21, 2009

A flurry of emails arrived today about the tie up among several companies with good reputations but profiles that are lower than those enjoyed by Autonomy and Endeca. You can read the official news announcement here about the deal among Attensity, Empolis GmbH, and Living-e AG. The conflation is called The Attensity Group. Here’s a snapshot of each company based on the information I ratted out of my files in the midst of new carpet, painting, and hanging new boxer dog pictures:

  • Attensity. Deep text processing. Started in the intel community. Probed marketing. Acted as ring master for the tie up.
  • Empolis GmbH. (Link was dead when I checked it on April 20, 2009.) A distribution and archiving system and file based content transformation. Orphaned after parent Bertelsmann faced up to the realities facing the dead tree crowd. Now positions itself in knowledge management.
  • Living-e AG. Provides software products that enable efficient information exchange. Web content management, behavior analysis. Founded in 2003 as WebEdition Software GmbH.

The news release refers to the deal as a “market powerhouse”. This is the type of phrase that gets me to push the goslings to the computer terminals to do some company monitoring.

It’s too early for me to make a call about the product line up the company will offer. Should be interesting. Some pundits will make an attempt to presage the future. Not this silly goose. The customers will decide, not the mavens.

Stephen Arnold, April 21, 2009

Google and Guha: The Semantic Steamroller

April 17, 2009

I hear quite a lot about semantic search. I try to provide some color on selected players. By now, you know that I recycle in this Web log, and this article is no exception. The difference is that few people pay much attention to patent documents. In general, these are less popular than a printed dead tree daily paper, but in my opinion quite a bit more exciting. But that’s what makes me an addled goose, and you a reader of free Web log posts.

You will want to snag a copy of US20090100036 from our ever efficient USPTO. Please, read the instructions for running a query on the USPTO system. I don’t provide free support for public facing, easy to use, elegant interfaces such as that available from the Federal government.

The “eyes” of Googzilla. From US20090100036, Figure 21, Cyrus, in case you want to see what your employer is doing these days.

The title of the document is “Methods and Systems for Classifying Search Results to Determine Page Elements” by a gaggle of Googlers, one of whom is Ramanathan Guha. If you read my Google Version 2.0 or the semantic white paper I wrote for Bear Stearns when it was respected and in business, you know that Dr. Guha is a bit of a superstar in my corner of the world. The founder of Epinions.com and a blue chip wizard with credentials (Semantic Web RDF, Babelfish, Open Directory, etc.) that will take away the puffery of newly minted search consultants, Dr. Guha invented, wrote up, and filed five major inventions. These five set forth the Programmable Search Engine. You will have to chase down one of my for fee writings to get more detail about how the PSE meshes with Google’s data management inventions. If you are IBM or Microsoft, you will remind me that patents are products and that Google is not doing anything particularly new. I love those old eight track tapes, don’t you?

The new invention is the work of Tania Bedrax-Weiss, Patrick Riley, Corin Anderson, and Ramanathan Guha. His name is spelled “Ramanthan” in the patent snippet I have. Fish & Richardson, Google’s go-to search patent attorney, may have submitted it correctly in October 2007, but it emerged from the USPTO on April 16, 2009, with the spelling error.

The application is a 33 page long document, which is beefy by Google’s standards. Google dearly loves brevity, so the invention is pushing into Gone with the Wind length for the GOOG. The Fish & Richardson synopsis said:

This invention relates to determining page elements to display in response to a search. A method embodiment of this invention determines a page element based on a search result. The method includes: (1) determining a set of result classifications based on the search result, wherein each result classification includes a result category and a result score; and (2) determining the page element based on the set of result classifications. In this way, a classification is determined based on a search result and page elements are generated based on the classification. By using the search result, as opposed to just the query, page elements are generated that corresponds to a predominant interpretation of the user’s query within the search results. As result, the page elements may, in most cases, accurately reflect the user’s intent.
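A toy Python sketch may help with the two numbered steps in that synopsis: (1) turn the search results into classifications, each a category plus a score, and (2) pick a page element from those classifications. Everything specific here is an assumption made for illustration: the category labels, the PAGE_ELEMENTS mapping, the share-of-relevance scoring, and the 0.5 threshold. The application's own scoring and "signal" weighting are not reproduced.

```python
from collections import defaultdict

# Hypothetical mapping from a result category to the page element to show.
PAGE_ELEMENTS = {
    "health": "health_refinement_links",
    "shopping": "product_onebox",
}


def classify_results(results):
    """Step 1: build (category, score) classifications from the search results.

    Each result is a (category, relevance) pair. A category's score is the
    share of total relevance it accounts for (an assumed scoring rule).
    """
    totals = defaultdict(float)
    for category, relevance in results:
        totals[category] += relevance
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    return sorted(
        ((cat, total / grand_total) for cat, total in totals.items()),
        key=lambda pair: (-pair[1], pair[0]),
    )


def choose_page_element(classifications, threshold=0.5):
    """Step 2: pick a page element only if one category clearly predominates."""
    if not classifications:
        return None
    category, score = classifications[0]
    if score < threshold:
        return None
    return PAGE_ELEMENTS.get(category)


# Made-up results for one query: two health pages and one shopping page.
results = [("health", 0.9), ("health", 0.7), ("shopping", 0.4)]
classifications = classify_results(results)
element = choose_page_element(classifications)
print(classifications, element)
```

The point of working from the results rather than the query alone is visible even in the toy: the page element follows whichever interpretation dominates what actually came back.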

Got that? If you did not, you are not alone. The invention makes sense in the context of a number of other Google technical initiatives ranging from the non hierarchical clustering methods to the data management innovations you can spot if you poke around Google Base. I noted classification refinement, snippets, and “signal” weighting. If you are in the health biz, you might want to check out the labels in the figures in the patent application. If you were at my lecture for Houston Wellness, I described some of Google’s health related activities.

On the surface, you may think, “Page parsing. No big deal.” You are not exactly right. Page parsing at Google scale, the method, and the scores complement Google’s “dossier” function about which Sue Feldman and I wrote in our September 2008 IDC client only report. This is IDC paper 213562.

What does a medical information publisher need with those human editors anyway?

Stephen Arnold, April 17, 2009

True Knowledge: Semantic Search System

April 16, 2009

A happy quack to the readers who sent me a link to this ZDNet Web log post called “True Knowledge API Lies at the Heart of Real Business Model” here. I had heard about True Knowledge (“The Internet Answer Engine”) a while back, but I tucked away the information until a live system became available. I had heard that the computer scientist spark plug of True Knowledge (William Tunstall-Pedoe) has been working on the technology for about 10 years. The company’s Web site is www.trueknowledge.com, and it contains some useful information. You can sign up for a beta account, read Web log posts, and get some basic information about the system.

About one year ago, the Financial Times’s Web log here reported:

Another Semantic Web company looking for cash: William Tunstall-Pedoe of True Knowledge says he needs $10m in venture capital to back the next stage of his Cambridge (UK)-based company, which is trying to build a sort of “universal database” on the Web.

In April 2009, the company is raising its profile with an API that allows developers to make Web sites smarter.

Interface. © True Knowledge

The company said:

True Knowledge is a pioneer in a new class of Internet search technology that’s aimed at dramatically improving the experience of finding known facts on the Web. Our first service – the True Knowledge Answer Engine – is a major step toward fulfilling a longstanding Internet industry goal: providing consumers with instant answers to complex questions, with a single click.

The company’s proprietary technology allows a user to ask questions and get an answer. Quite a few companies have embraced the “semantic” approach to content processing. The reason is that traditional search engines require that the person with the question find the magic combination that delivers what’s needed. The research done by Martin White and my team, among others, makes clear that about two thirds of the users of a key word search system come away empty handed, annoyed, or both. True Knowledge and other semantic-centric vendors see significant opportunities to improve search and generate revenue.

Architecture block diagram. © True Knowledge

Paul Miller, the author of the ZDNet article, wrote:

True Knowledge is certainly interesting, and frequently impressive. It remains to be seen whether a Platform proposition will set them firmly on the road to riches, or if they’ll end up finding more success following the same route as Powerset and getting acquired by an existing (enterprise?) search provider.

ZDNet wrote a similar article in July 2007 here. Venture Beat mentioned True Knowledge here in July 2008 in a story that referenced Cuil.com (former Googlers) and Powerset (now part of Microsoft’s search cornucopia). Hakia.com was not mentioned even though at that time in 2008, Hakia.com was ramping up its PR efforts. Venture Beat mentioned Metaweb, another semantic start up that obtained $42 million in 2008, roughly eight times the funding of True Knowledge. (Metaweb’s product is Freebase, an open, shared database of the world’s information. More here.) You will want to read Venture Beat’s April 13, 2009, follow up story about True Knowledge here. This article contains an interesting influence diagram.

I don’t know enough about the appetite of investors for semantic search systems to offer an opinion. What I found interesting was:

  • The company has roots in Cambridge University where computational approaches are much in favor. With Autonomy and Lemur Consulting working in the search sector, Cambridge is emerging as one of the hot spots in search.
  • The language and word choice used to describe the system here reminded me of some Google research papers and the work of Jennifer Widom at Stanford University. If there are some similarities, True Knowledge may be more than a question answering system.
  • The company received an infusion of $4.0 million in a second round of funding completed in mid 2008. Octopus Ventures provided an earlier injection of $1.2 million in 2007.
  • The present push is to make the technology available to developers so that the semantic system can be “baked in” to other applications. The notion is a variant of that used in the early days of Verity’s OEM and developer push in the late 1980s. The API account is offered without charge.
  • There’s a True Knowledge Facebook page here.

I recall seeing references to a private beta of the system. I can’t locate my notes from my 2007 trips to the UK, but I think that may have been the first time I heard about the system. I did locate a link to a demo video here, dated late 2007. That video explains that the information is represented in a way “that computers can understand”. I made a note to myself about this because this type of function in 2007 was embodied in the Guha inventions for the Google Programmable Search Engine.

The API allows systems to ask questions. The developer can formulate a query and see the result. Once the developer has the query refined, the True Knowledge system makes it easy for the developer to include the service in another application. The idea, I noted, was to make enterprise software systems smarter. The system performs reasoning and inference. The system generates answers and a reading list. The system can handle short queries, performing accurate disambiguation; that is, figuring out what the user meant. The system also makes it possible for a user to provide information to the system, in effect a Wikipedia type of function. The approach is a clever way for the user to teach the True Knowledge system.

RapidMiner: Open Source Data Mining

April 11, 2009

A happy quack to the reader who reminded me that Google Apps supports Java. If you are interested in data mining, you may want to catch up with RapidMiner, an open source data mining system. RapidMiner drinks Java, so you may want to think about ways to make use of Google Apps and RapidMiner. The person who wrote me wanted some information about this idea.

My April 2009 column for KMWorld talks about Google Apps, but I don’t have any information about hooking RapidMiner into Google Apps. In fact, I had not thought about it.

RapidMiner is “the world-wide leading open-source data mining solution due to the combination of its leading-edge technologies and its functional range. Applications of RapidMiner cover a wide range of real-world data mining tasks.” There is an enterprise version plus consulting services available.

You can download the RapidMiner community edition here. The documentation is quite good. You can snag a copy of those documents here. The community edition offers a number of features, and it is extensible. Here’s an example of a data output from RapidMiner:

rapidminer

You can find a useful discussion by Michael Wurst of the open source version at Nemoz.org here. This write up provides some useful examples that show one way to hook RapidMiner into a Java application. What is quite useful is the code sample for using the text classifier on a chunk of text. RapidMiner’s classification component is called RapidMinerTextClassifier.

There are some limitations to the Google Apps implementation of Java, but I think the person who wrote me has an interesting idea. The notion of combining sophisticated RapidMiner operations with Google Apps struck me as interesting. If you have any interesting examples of this type of hybridization, use the comments section of this Web log to pass along the information.

Stephen Arnold, April 11, 2009

Cirilab: Entity Extraction

April 6, 2009

I took a quick look at Cirilab in order to update my files about entity extraction vendors.

Cirilab develops practical search, retrieval and categorization software designed to increase organizational productivity by effectively harnessing key knowledge resources. Cirilab offers a range of advanced analysis and organization applications and tools.

I learned about the company when another consultant sent me links to several online demonstrations of Cirilab’s technology. I located an older but useful discussion of the Cirilab technology here. You can explore a Wikipedia entry about Winston Churchill here and a document navigator of Sir Winston’s writings here. The engine generating these demos is called the KGE or Knowledge Generation Engine. The idea is that KGE can process unstructured text and generate insights into that text.

Source: http://www.cirilab.com/TSMAP/Cirilab_Library/Literature/Winston_Churchill/WikiKMapPage/index.htm

The company’s enterprise solutions include vertical builds of the KGE:

  • Publishing. The Web Ready Publishing service allows an organization to take unstructured data in WordPerfect, Word, Adobe PDF, HTML, and even Text files, and publish it in a Web Ready Publishing format so that it is instantly available to your customers in a thematically navigable format.
  • Pharma. Cirilab can “read” the documents and therefore allow “mining” of existing data.
  • Legal. KGE permits discovery of information.
  • Security and intelligence. Cirilab products provide unique insights into this information not otherwise available.

The company offers a range of desktop products. These are excellent ways to learn about the features and functions of Cirilab’s KGE system.

More recently, Cirilab has succeeded in developing and bringing to market a core suite of technologies known as KOS (Knowledge Object Suite) based on its Multidimensional Semantic Spatial Indexing Technology.

You can register and receive a free, thematic map of your Web site. The company is located in Ottawa, Ontario. You can get more information here.

Stephen Arnold, April 6, 2009

Google Leximo Tie Up

April 2, 2009

Leximo is a social dictionary; specifically, “a Multilingual User Collaborated Dictionary that lets you search, discover and share your words with the World.” Google snapped up the company. You can read the Leximo manifesto here. One of the tenets is:

Open community-based and user-friendly functions promote participation, accountability and trust.

What’s Google need a dictionary for? In my opinion, the GOOG wants a flow of new words plus definitions to fatten up its existing knowledgebases. I am confident the idealism of Leximo will persist at the GOOG.

Stephen Arnold, April 2, 2009

Coveo Lands a Whale-Sized Search Deal

March 31, 2009

Bell Mobility and Coveo, a leading provider of information access and search solutions for the enterprise, announced an exclusive next-generation search and access tool called Enterprise Search from Bell.

Powered by Coveo’s patented search and index technology, Enterprise Search from Bell offers business clients comprehensive search capability on their BlackBerry smartphones, including full mobile access to information contained within their Microsoft Exchange server accounts and even across their entire corporate information technology systems.

Wade Oosterman, President of Bell Mobility and Chief Brand Officer for Bell, said:

Enterprise Search from Bell is the only mobile business tool available that provides clients with such instant mobile access to critical enterprise content via their BlackBerry devices. Our partnership with Coveo is another example of Bell’s dedication to delivering data solutions that meet the evolving needs of mobile business. Enterprise Search from Bell provides business users with the ability to securely search and retrieve any information within their Microsoft Exchange server accounts, including emails, attachments content, calendars, tasks and contacts via their BlackBerry devices. With this capability, users can access the precise information they need within seconds, when they need it, without having to know where the document was previously stored.

Louis Tetu, Coveo’s Executive Chairman, said:

Search technology is by far one of the most promising information technology investments for enabling workforce productivity across the enterprise. Mobile search solutions provide employees with all the relevant information they need from any location. In contributing our expertise to Bell Mobility, we are helping to drive innovation in their mobile business solutions, which is part of our initiative to be a player in the ‘smartphones for business revolution’ market trend.

The service features Coveo’s user-friendly interface. The new enterprise search service from Bell offers numerous possibilities to business users who work with large amounts of rapidly changing information. Executives, sales professionals, account managers, professional services providers, customer care, call center representatives, IT administrators, project managers and human resources, legal and engineering professionals are provided instant access to the information they require to perform their roles more effectively. As a result, this drives improved productivity and enables higher levels of self-service across the workforce.

Bell Mobility’s partnership with Coveo is unique in the market as it combines the mass distribution reach of a market-leading national mobile carrier with an innovative enterprise search solution to offer customers in all areas of Canada the ability to drive better business performance across their workforce in a rapid and economical deployment model.

For further information on Enterprise Search from Bell click here. For more information about Coveo, click here. To read the interview with Laurent Simoneau, the search expert driving Coveo’s technology, click here.

A happy quack to the Coveo team from the goslings in Harrod’s Creek.

Stephen Arnold, March 31, 2009
