Thetus Savanna

November 9, 2009

Directions Magazine published “Thetus Unveils the Savanna Analysis Solution”. Thetus describes itself as “a pioneer of semantic knowledge modeling and discovery software”. The Savanna product, according to the company:

… provides users with a model-centric environment that is optimized for analysis involving multiple perspectives, confidence and detailed lineage tracking. The solution provides extension points at every level of the architecture, allowing customers to adapt models, analysis tasks and user experience to meet their individual needs.

The Savanna technology uses flexible knowledge models uniquely suited to cultural, geo-cultural and Human Terrain analysis. The Savanna framework includes out-of-the-box connectors to leading providers of content management, entity extraction, geospatial analysis and temporal analysis products including MarkLogic, Janya, MetaCarta, and ESRI. These integrations deliver a new level of deployment speed and ease to customers and enable Savanna to address a broad range of structured and unstructured data typical of today’s intelligence process.

For more information, navigate to www.thetus.com.

A freebie, pure and simple. Grrr.

LexisNexis Jumps on Semantic Bandwagon

October 15, 2009

Pure Discovery, a Dallas based search and content processing company, has landed a mid-sized tuna, LexisNexis. Owned by publishing giant Reed Elsevier, LexisNexis faces some strong downstream water. The $1 billion plus operation is paddling its dugout canoe upstream. Government agencies, outfits like Gov Resources, and the Google are offering products and services that address the squeals from law firms. What is the cause of the legal eagle squeaks? The cost of running searches on the commercial online services like LexisNexis and Westlaw, among others like Questel. Clients are putting caps on some law firm expenditures. Even white shoe outfits in New York and Chicago are feeling the pinch.

I saw one short news item about this tie up in an article in Search Engine Watch.

Patent searching is a particularly exciting field of investigation. If you click over to the responsive USPTO, you can search patents for free. Tip: Print out the search hints before you begin. I am not sure who is responsible for this wonderful search system, but it is a wonder.

Semantic technology along with other sophisticated content processing tools can make life a little – notice the word “little” – easier for those conducting patent research. Even the patent examiners have to use third party systems because the corpus of the USPTO is a bit like a buggy without a horse in my opinion.

The company that LexisNexis tapped to provide its semantic technology is Pure Discovery in Dallas, Texas. I had one reference to the firm in my Overflight service and that was to an individual named Adam Keys, Twitter name therealadam. Mr. Keys left Pure Discovery in 2006 after two years at the company. I had a handwritten note to the effect that venture funding was provided in part by Zon Capital Partners in Princeton, New Jersey. I have little detail about how the Pure Discovery system works.

Here’s a description of the company I pulled from Zon’s Web site:

Pure Discovery (Dallas, TX) has developed enterprise semantic web software. Its offering combines automated semantic discovery with a peer networking architecture to transform static networks into dynamic ecosystems for knowledge discovery.

I snagged a few items from the firm’s Web site.

The product line up consists of KnowledgeGraph products. These include the PD BrainLibrary (“BrainLibrary is a breakthrough technology that harnesses the collective intelligence of organizations and their people in ways that have never been possible before”), PD Transparent Concept Search (“PD Concept Search has completely removed the top off the black box and for the first time ever, users are not only able to see what has been learned by the system, but also use our QueryCloud application to control it.”), PD QueryCloud Visual Query Generator (“QueryCloud then lets users control what terms or phrases are used, not used, emphasized or de-emphasized. All with the simple click of a button.”), PD Clustering (“PD Clustering dynamically orders similar documents into clusters enabling users to browse data by semantically related groups rather than looking at each individual document. PD Clustering is fast enough to cluster even the largest of document populations with a benchmark of over 80 million pages clustered in a 48 hr period on a single machine.”), and PD Near-Dupe Identification (“PureDiscovery’s Near-Dedupe Identification Engine provides instant value to any application by detecting and grouping near duplicate documents. Identifying documents with these slight variances results in dramatic savings in time wasted looking at the same document again and again.”). This information is from the Pure Discovery Web site here.
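I have no detail on how Pure Discovery detects near duplicates, so the following is only a generic sketch of one common approach: break each document into overlapping word "shingles" and group documents whose shingle sets have a high Jaccard overlap. The function names, the three-word shingle size, and the 0.7 threshold are my assumptions for illustration, not anything taken from the company.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_near_duplicates(docs, threshold=0.7, k=3):
    """Greedily group documents whose shingle sets overlap at or above threshold.

    Each document joins the first group whose representative (its first
    member) is similar enough; otherwise it starts a new group.
    Returns a list of groups, each a list of document indexes.
    """
    groups = []  # list of (representative_shingles, [document indexes])
    for index, text in enumerate(docs):
        doc_shingles = shingles(text, k)
        for representative, members in groups:
            if jaccard(doc_shingles, representative) >= threshold:
                members.append(index)
                break
        else:
            groups.append((doc_shingles, [index]))
    return [members for _, members in groups]
```

A pairwise comparison like this would not scale to the 80 million pages the company cites; production systems typically hash the shingles (MinHash or similar) so that candidates can be found without comparing every pair.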

The company also offers its Transparent Concept Search Query Cloud.

The software is available for specific vertical markets and niches; for example, litigation support, “human capital management” (maybe human resources or knowledge management?), intellectual property, and homeland security and defense.

These are sophisticated functions. I look forward to examining the LexisNexis patent documents using this new tool. Perhaps LexisNexis has found a software bullet to kill the beasties chewing into its core business. If not, LexisNexis will face that rushing torrent without a paddle.

As more information flows to me, I will update this write up.

Stephen Arnold, October 15, 2009
I wrote this short post without so much as a thank you from anyone.

Predictive Content from DailyPerfect

October 13, 2009

A happy quack to the reader who sent me a link to DailyPerfect. The company says:

[DailyPerfect] is a showcase for our innovative personalization technology, which can predict a user’s interests through an automated semantic analysis of publicly available information. Our predictive content engine will generate a personalized news feed customized just for you.

The company profile says:

This news site is a showcase for our innovative personalization technology, which is able to predict a user’s interests through an automated semantic analysis of publicly available information on the web and minimal or no input from the user. Yes, it’s pretty clever. You can find out a little more about how we do this right here. We have been operating in stealth/closed-beta mode for over a year and are pleased to be opening our flagship news site to the public. Give it a try here, and let us know what you think! The DailyPerfect project was initiated by Ambient Sound Investments (ASI, www.asi.ee) and Curonia Research (www.curonia.com). After being hatched at ASI’s Incubator, DailyPerfect has attracted a group of visionaries, startup veterans and developers to commercialize our progressive approach to behavioral targeting. The team is led by our CEO Louis Kanganis, Co-Founder Asko Seeba – the former Engineering Manager at Skype – and Co-Founder and CTO Ahti Heinla – a partner at Ambient Sound Investments and the former Lead Architect at Skype.

One application of the service is to use it to create custom news feeds.

Stephen Arnold, October 13, 2009

Google Lifts Free Translation to a New Level

October 1, 2009

Here in Harrod’s Creek, a foreign language is a person who speaks with a New York accent. For other folks, a foreign language means information on a Web site presented in a language other than the one mummy and daddy spoke. Google has announced on its corporate Web log that Google’s quite good translation system has flexed its muscles. You should read the Google’s own words in “Translate Your Web Site with Google: Expand Your Audience Globally”. The basic idea is simple, but the scale and scope are Googley. The statement revealed:

Today, we’re happy to announce a new Web site translator gadget powered by Google Translate that enables you to make your site’s content available in 51 languages. Now, when people visit your page, if their language (as determined by their browser settings) is different than the language of your page, they’ll be prompted to automatically translate the page into their own language. If the visitor’s language is the same as the language of your page, no translation banner will appear.

Nifty. Three comments:

  1. There may be some bugs in Google Apps that are getting hammered each day, but the Google is obviously prepared to direct some computational horsepower at translation of lots of stuff.
  2. The Yahoo translation service looks like a 90 pound weakling and the for fee services are going to have to do some creative thinking. The Google can marginalize the market leaders in translation software with a bit flip in my opinion.
  3. The service unlocks content that has been mostly inaccessible to me. I am a happy goose.

The service is not perfect.

  • Cnet points out that the service is a quick gist.
  • Search Engine Journal reports that a user must paste a snippet of code in his / her Web site.
  • Silicon Taps reveals that Google supports just 51 languages.
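The banner rule in Google's announcement (offer translation only when the visitor's browser language differs from the page language) is simple enough to sketch. This is not Google's code. I am assuming the browser language arrives as an HTTP Accept-Language header value and that only the primary language subtag matters; the function names are hypothetical.

```python
def primary_language(tag):
    """Reduce a language tag such as 'en-US' to its primary subtag 'en'."""
    return tag.strip().lower().split("-")[0]


def preferred_language(accept_language):
    """Pick the highest-weighted language from an Accept-Language header value.

    Returns None when the header is empty or contains only the wildcard '*'.
    """
    best_tag, best_q = None, -1.0
    for part in accept_language.split(","):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        weight = 1.0  # a language listed without q= has full weight
        params = params.strip()
        if params.startswith("q="):
            try:
                weight = float(params[2:])
            except ValueError:
                weight = 0.0
        if tag.strip() != "*" and weight > best_q:
            best_tag, best_q = primary_language(tag), weight
    return best_tag


def should_offer_translation(accept_language, page_language):
    """True when the visitor's preferred language differs from the page's."""
    visitor = preferred_language(accept_language)
    return visitor is not None and visitor != primary_language(page_language)
```

With this sketch, a visitor sending "fr-FR,fr;q=0.9,en;q=0.8" to an English page would see the banner, and a visitor sending "en-US,en;q=0.9" would not.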

What about Microsoft? I suppose nifty graphics and more UX will be sufficient to close the gap between Bing.com and Google.com. Well, maybe not close the gap that much.

Stephen Arnold, October 1, 2009

Google Base on Death Row?

September 30, 2009

Garrett Rogers’ “Google Base No Longer for Products” reported that changes are underway for a Google service that few people know about. He said:

It’s hard to tell if Google is actually thinking of pulling the plug on Google Base or not — I’m thinking they are going to let it die a slow death. The reason I think that is because they are actually replacing their Google Base Blog with the Google Merchant Blog.

Google Base has been an interesting beta. If you have not explored the service, you may want to hurry. If Mr. Rogers is correct, the employment ads, the mixed bag of content, and the real estate listings may be removed from the service.

In my opinion, Google Base made clear some interesting Google functions; for example, the ability to ingest content and place it into one of Google’s data management systems. Some of the features of Google Base struck me as providing test beds for specific data processing functions. The user was not the focal point of Google Base.

My view is that Google Base might be on death row, but its underlying technology is alive and changing to fit into the rapidly evolving dataspace functionality the Google has been working on since the company made a strategic acquisition in 2006. I can hear now, “All your base are belong to us.” Do you?

Stephen Arnold, September 30, 2009

Exclusive Interview with SurfRay President

September 29, 2009

SurfRay has come roaring back into the search and content processing sector. SurfRay, like many other companies, had to tighten its belt and straighten its tie in the wake of the global financial turmoil. With the release of new versions of Ontolica (a search system that snaps into SharePoint) and MondoSearch (a platform independent Web site search solution), SurfRay is making sales again. ArnoldIT.com spoke with Søren Pallesen about the firm’s new products. In a wide ranging interview, Mr. Pallesen, a former Gartner Group consultant, said:

SurfRay’s mission is to deliver tightly packaged search software solutions for businesses to provide effective search for both internal and external users. With Packaged we mean easy to try, install and use. Our vision is to be our customer’s first choice for packaged enterprise search solutions and to become the world’s third largest search solution provider in the world measured on number of paying business customers by 2012. The last six months have been an exciting time for SurfRay. I took over as CEO; we significantly increased investment in product development and an ambitious expansion of the organization. This has paid off. SurfRay is profitable, and we have released new versions of our core products. Ontolica is now in version 4.0, including a new suite of reporting and analytics, and MondoSearch 5.4 is in beta for a Q4 release. As a profitable company we are in the fortunate position to be able to fund our own growth and we are expanding in North America among other by hiring more sales people as well as formation of a Search Expert Center in Vancouver, Canada that will serve customers across the Americas. We are also expanding in Europe most recently with formation of SurfRay UK and Ireland, allowing us to expand sales and support with local people on the ground in this important European market.

When asked about the difference between MondoSearch and Ontolica, Mr. Pallesen told me:

Customers that buy our products typically fall into a number of usage scenarios. Simply put Ontolica solves search problems inside the firewall and MondoSearch outside the firewall. Firstly customers with SharePoint implementations look for enhanced search functionality, and turn to our Ontolica for SharePoint product. Secondly, businesses that do not use SharePoint but have the need for an internal search solution on an intranet, file servers, across email, applications and other sources buy Ontolica Express and use it in combination with Microsoft Search Server Express for simple single server installation or Micro Search Server for multiple load balanced server installations. Thirdly, customers with the need for robust and highly configurable web site search buy MondoSearch. Especially popular with businesses that want to implement up- and cross selling on their search results page.

You can read the full text of the interview in the Search Wizards Speak series on ArnoldIT.com. For more information about SurfRay, visit the company’s revamped Web site at http://www.surfray.com.

Stephen Arnold, September 29, 2009

Google on Path to Becoming the Internet

September 28, 2009

I thought I made Google’s intent clear in Google Version 2.0. The company provides a user with access to content within the Google index. The inventions reviewed briefly in The Google Legacy and in greater detail in Google Version 2.0 explain that information within the Google data management system can be sliced, diced, remixed, and output as new information objects. The analogy is similar to what an MBA does at Booz, McKinsey, or any other rental firm for semi-wizards. Intakes become high value outputs. I was delighted to read Erick Schonfeld’s “With Google Places, Concerns Rise that Google Just Wants to Link to Its Own Content.” The story makes clear that folks are now beginning to see that Google is a digital Gutenberg and is a different type of information company. Mr. Schonfeld wrote:

The concerns arise, however, back on Google’s main search page, where Google is indexing these Places pages. Since Google controls its own search index, it can push Google Places more prominently if it so desires. There isn’t a heck of a lot of evidence that Google is doing this yet, but the mere fact that Google is indexing these Places pages has the SEO world in a tizzy. And Google is indexing them, despite assurances to the contrary. If you do a search for the Burdick Chocolate Cafe in Boston, for instance, the Google Places page is the sixth result, above results from Yelp, Yahoo Travel, and New York Times Travel. This wouldn’t be so bad if Google wasn’t already linking to itself in the top “one Box” result, which shows a detail from Google Maps. So within the top ten results, two of them link back to Google content.

Directories are variants of vertical search. Google is much more than rich directory listings.

Let me give one example, and you are welcome to snag a copy of my three Google monographs for more examples.

Consider a deal between Google and a mobile telephone company. The users of the mobile telco’s service run a query. The deal makes it possible for the telco to use the content in the Google system. No query goes into the “world beyond Google”. The reason is that Google and the telco gain control over latency, content, and advertising. This makes sense. Let’s assume that this is a deal that Google crafts with an outfit like T Mobile. Remember: this is a hypothetical example. When I use my T Mobile device to get access to the T Mobile Internet service, the content comes from Google with its caches, distributed data centers, and proprietary methods for speeding results to a device. In this example, as a user, I just want fast access to content that is pretty routine; for example, traffic, weather, flight schedules. I don’t do much heavy lifting from my flakey BlackBerry or old person hostile iPhone / iTouch device. Google uses its magical ability to predict, slice, and dice to put what I want in my personal queue so it is ready before I know I need the info. Think “I am feeling doubly lucky”, a “real” patent application by the way. T Mobile wins. The user wins. The Google wins. The stuff not in the Google system loses.

Interesting? I think so. But the system goes well beyond directory listings. I have been writing about Dr. Guha, Simon Tong, Jeff Dean, and the Halevy team for a while. The inventions, systems and methods from this group have revolutionized information access in ways that reach well beyond local directory listings.

The Google has been pecking away for 11 years and I am pleased that some influential journalists / analysts are beginning to see the shape of the world’s first trans national information access company. Google is the digital Gutenberg and well into the process of moving info and data into a hyper state. Google is becoming the Internet. If one is not “in” Google, one may not exist for a certain sector of the Google user community. Googleo ergo sum.

Stephen Arnold, September 28, 2009

Yebol Web Search: Semantics, Facets, and More

September 28, 2009

“Do We Really Need Another Search Engine?” is an article about Yebol. Yebol is another search engine. The write up included this description of the new system:

According to its developers, “Yebol utilizes a combination of patented algorithms paired with human knowledge to build a Web directory for each query and each user.  Instead of the common ‘listing’ of Web search queries, Yebol automatically clusters and categorizes search terms, Web sites, pages and contents.” What this actually means is that Yebol uses a combination of methods – web crawlers and algorithms combined with human intelligence – to produce a “homepage” for each and every search query. For example, search Bell Canada in Yebol and, instead of a Google-style listing of results, you’re presented with a “homepage” that provides details about Bell’s various enterprises, executives, competitors as well as a host of other information including recent Tweets that mention Bell.

The site at http://www.yebol.com includes the phrase “knowledge based smart search.” I ran a query for Google and received a wealth of information: links, facets, hot links to Google Maps, etc.

[Image: Yebol results]

My search for dataspace, on the other hand, was not particularly useful. I anticipate that the service will become more robust in the months ahead.

The PC World write up about Yebol said:

At launch, Yebol can provide categorized results for more than 10 million search terms. According to the company it intends to provide results for ‘every conceivable search term’ in the next three to six months.

The founder, Hongfeng Yin, was a senior data mining researcher on the Yahoo! Data Mining Research team, where he built the core behavioral targeting technologies and products which generate hundreds of millions in revenue. Prior to Yahoo, he was a software manager and senior staff software engineer with KLA-Tencor. He worked for several years on noetic sciences and human think theory with professor Dai Ruwei and professor Tsien Hsue-shen (Qian Xuesen) at the Chinese Academy of Sciences. He has a Ph.D. in Computer Science from Concordia University, Canada, and a master’s degree from Huazhong University of Science and Technology, China. Hongfeng has multiple patents on search engines, behavioral targeting and contextual targeting.

The Yebol launch news release is here. The challenge will be to deliver a useful service without running out of cash. The use of patented algorithms is a positive. Combining these recipes with human knowledge can be tricky and potentially expensive.

Stephen Arnold, September 28, 2009

Guha’s Most Recent Patent: Enhanced Third Party Control

September 24, 2009

I am a big fan of Ramanathan Guha’s engineering. From his work on the Programmable Search Engine in 2007 to this most recent invention, he adds some zip to Google’s impressive arsenal of smart methods. You may want to take a look at US 7,593,939, filed in March 2007, a few weeks after his five PSE inventions went to the ever efficient USPTO. This invention is “Generating Specialized Search Results in Response to Patterned Queries”:

Third party content providers can specify parameters for generating specialized search results in response to queries matching specific patterns. In this way, a generic search website can be enhanced to provide specialized search results to subscribed users. In one embodiment, these specialized results appear on a given user’s result pages only when the user has subscribed to the enhancements from that particular content provider, so that users can tailor their search experience and see results that are more likely to be of interest to them. In other embodiments the specialized results are available to all users.

What I find interesting is that this particular method nudges the ball forward for third party content providers so certain users can obtain information enhancements. The system makes use of Google’s “trust server,” answers questions, and generates a new type of top result for a query. The invention provides additional color for Dr. Guha’s semantic systems and methods which nest comfortably within the broader dataspace inventions discussed at length in Google: The Digital Gutenberg. For a more detailed explanation of the invention, you can download the open source document from the USPTO or another US patent provider. When will Google make a “Go Guha” T shirt available? Oh, for those of you new to my less-than-clear explanation of Google’s technology, you can find the context for this third party aspect of Google’s PSE and publishing / repurposing semantic system in my Google Version 2.0, just click on Arnold’s Google studies. This invention makes explicit the type of outputs a user may receive from the exemplary system referenced in this open source document. This invention is more substantive than “eye candy” user experience as defined by Microsoft and light years ahead of the Yahoo “interface” refresh I saw this morning. The Google pushes ahead in search technology as others chase.
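To make the abstract's idea concrete, here is a toy sketch of the "subscribed users" embodiment: providers register query patterns, and a specialized result is produced only when the query matches a pattern and the user subscribes to that provider. This is my illustration, not the patent's implementation. The Enhancement class, the regular-expression patterns, and the provider names are all hypothetical, and the "trust server" is not modeled.

```python
import re
from dataclasses import dataclass


@dataclass
class Enhancement:
    provider: str        # hypothetical third-party content provider name
    pattern: re.Pattern  # query pattern the provider registered
    template: str        # text of the specialized result, filled from the match


def specialized_results(query, enhancements, subscriptions):
    """Return (provider, result) pairs for enhancements the user subscribes to.

    An enhancement fires only when the whole query matches its pattern.
    """
    results = []
    for enhancement in enhancements:
        if enhancement.provider not in subscriptions:
            continue  # user has not opted in to this provider
        match = enhancement.pattern.fullmatch(query.strip())
        if match:
            text = enhancement.template.format(**match.groupdict())
            results.append((enhancement.provider, text))
    return results


# Example registrations (made-up providers and patterns).
ENHANCEMENTS = [
    Enhancement(
        "flights-example",
        re.compile(r"flights? from (?P<origin>\w+) to (?P<dest>\w+)", re.I),
        "Flights {origin} -> {dest}",
    ),
    Enhancement(
        "weather-example",
        re.compile(r"weather in (?P<city>\w+)", re.I),
        "Forecast for {city}",
    ),
]
```

The "available to all users" embodiment mentioned in the abstract would amount to dropping the subscription check.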

Stephen Arnold, September 23, 2009

Two Additions to Euro Search Vendor List

September 22, 2009

Readers have continued to shoot buckshot at my list of European search vendors. I appreciate the input and I am adding two vendors to the list.

The first is Exorbyte. The second is Silobreaker.

Exorbyte, founded in 2000, is a privately-held company. The firm is based in Switzerland, not far from Zurich. The firm says that its search technology is focused on “high-performance approximate search and data matching solutions for online ecommerce, directories and data quality applications.” The company offers Web extraction functions as part of its technology suite. The search function complements the firm’s navigation features to support database, directory, and catalog search. More information is available from the firm’s Web site.

Silobreaker, a company I have written about in my studies and in this Web log, continues to gain features and functions. The firm’s search system is speedy, but what sets the company apart is its ability to generate relationship maps, display data on topics in actionable reports, and provide widgets that make it easy to add specific Silobreaker functions to third-party applications or customized implementations of the Silobreaker system. The company told me:

Silobreaker is a search service for news and current affairs that aims to provide more relevant results to the user than what traditional search and aggregation engines have been offering so far. Instead of returning just lists of articles matching a search query, Silobreaker finds people, companies, organizations, topics, places and keywords; understands how they relate to each other in the news flow, and puts them in context through graphical results in its intuitive user interface.

More information is available from the Silobreaker Web site.

The vendor table addition rows are:

Vendor | Function | Opinion
--- | --- | ---
Exorbyte | Ecommerce and database search | The firm has a strong following for database and directory search. Blue chip clients.
Silobreaker | Search plus intelligence analysis | The company’s system processes content in real time and generates actionable reports on people, events, or concepts.

Let me know of other vendors to include on this list.

Stephen Arnold, September 22, 2009
