Semantic Universe Sighted

January 31, 2009

A happy quack to the reader who alerted me to this Yahoo News story here about the Semantic Universe. According to Tony Shaw, Editor of Semantic Universe Network, “The semantic community needs a vehicle to communicate the comprehensive business applications and benefits of semantic technology, as well as a better way to connect developers, customers, entrepreneurs and investors. Semantic Universe Network will be that vehicle.” Sponsorship opportunities are available too. You can get additional information from the Web site here. The goslings at Beyond Search wish the information service well.

Stephen Arnold, January 31, 2009

Exalead: Moving the Front Line

January 26, 2009

A happy quack to the reader in California who sent me an update on Exalead. In the last 10 days, I have received a steady flow of news. The company continues to make headway in the US market.

exalead logo

The company has announced CloudView OEM Edition 5.0. This is a version of the product that can be embedded in third-party applications. The product has been designed for independent software vendors and software-as-a-service providers. The OEM edition includes performance improvements with tweaks to make embedding easier and quicker. The product, as I understand it, can be used to add search and sophisticated content processing functions to email, CMS, call center, and other information-centric applications.

Paul Doscher, CEO of Exalead, said:

As the use of traditional Web and Web 2.0 technologies including wikis, instant messaging, social networking, and collaboration has proliferated within the enterprise, users have come to expect the same simplicity, speed, and scale from their enterprise software providers. The challenge for ISVs is to provide that same experience in their search capabilities without sacrificing the security and precision required for enterprise use. Exalead CloudView OEM Edition helps them deliver on that challenge.

(Note: you can read an exclusive January 2008 Beyond Search interview with Mr. Doscher here.)

Features of the new product include:

  • Ability to deal with petabytes of data
  • Aggregation, collation, and normalization of data from disparate structured and unstructured sources; for example, HTML, Microsoft Office documents and other files scattered across corporate servers, data located at SaaS providers, active and archived e-mail, relational data, proprietary application data, etc.
  • Support for fuzzy and precise relevancy
  • Small CPU and disk footprints
  • Scalability to handle spikes
  • High peak user concurrency
  • Support for existing interfaces, security models, and data sources
  • Multi-language support.

In my April 2008 Gilbane Group report Beyond Search I highlighted Exalead’s architectural advantage. Based on my research, Exalead and Google tackle scaling and performance in somewhat similar ways. (Note: the founder of Exalead was a senior AltaVista.com engineer. You can read an interview with François Bourdoncle here.)


Word and C#: Now That’s User Friendly

January 25, 2009

This item is not directly about search, but it has to do with content creation. Close enough. The headline that stopped me in my web footed tracks was, “Take the Pain Out of Creating Word Documents by Using C# and XML” here. To be a little fair, Chris Bennett is writing to developers. But the inclusion of a reference to Microsoft Word almost guarantees that non-developers will see his write up. The idea is to put a chunk of code on a server that converts whatever is on a Web page to a Word file format. He provides a clear explanation. I particularly liked the detail for the “transformation method” but you may not be as keen on reading scripts as I am. You can convert an XML file into a Word document using XSLT.

My thought was to provide the XHTML or XML and a link that spits out a PDF. You will see this approach in action when my new Google patent search system becomes available. One of the new vendors of super fast search technology will provide the stallion for my service. Watch for more details. We’re shooting to release in the first week of February 2009. No C# required either. Free service. More Google open source information. With the world finally discovering Ramanathan Guha, I thought it would be useful to provide access to documents for a method that’s approaching the age of four. We don’t want the pundits to rush too quickly toward understanding Google. That effort would take time away from figuring out how to convert a Web page to a Word file.
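For readers who want to see the general shape of an XML-to-Word conversion, here is a minimal sketch. It is not Mr. Bennett's C# and XSLT code. It swaps in Python's standard library and builds the output tree directly instead of running a stylesheet. The input format (a `<page>` element containing `<para>` elements) and the function name `xml_to_wordml` are hypothetical, invented for illustration. The output targets the Word 2003 flat XML format (WordprocessingML).

```python
import xml.etree.ElementTree as ET

# Namespace used by Word 2003's flat XML format (WordprocessingML).
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"
ET.register_namespace("w", W_NS)


def xml_to_wordml(source_xml: str) -> str:
    """Map each <para> of a hypothetical <page> document to a Word paragraph."""
    page = ET.fromstring(source_xml)
    doc = ET.Element(f"{{{W_NS}}}wordDocument")
    body = ET.SubElement(doc, f"{{{W_NS}}}body")
    for para in page.iter("para"):
        # A Word paragraph (w:p) holds runs (w:r), which hold text (w:t).
        p = ET.SubElement(body, f"{{{W_NS}}}p")
        r = ET.SubElement(p, f"{{{W_NS}}}r")
        t = ET.SubElement(r, f"{{{W_NS}}}t")
        t.text = "".join(para.itertext()).strip()
    # The processing instruction asks Windows to open the file with Word.
    header = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<?mso-application progid="Word.Document"?>\n'
    )
    return header + ET.tostring(doc, encoding="unicode")


# Assumed sample input; a real page would need its own element mapping.
sample = "<page><para>Hello from a Web page.</para><para>Second paragraph.</para></page>"
word_xml = xml_to_wordml(sample)
```

An XSLT stylesheet would express the same mapping declaratively, with one template per source element. The sketch only shows what the transformation has to produce.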

Stephen Arnold, January 25, 2009

Ah, Ha, Guha: Semantics Are Coming

January 23, 2009

One of my three or four readers pointed out that Google Watch here ran “Google CEO Hints at Semantic, Contextual Search” here. What’s interesting is that the pundits and mavens are finally realizing that the GOOG wants to be the Semantic Web. It is old news. The core technology was disclosed by Google in February 2007. I covered it in detail in Google Version 2.0. And in 2007 Bear Stearns recycled some of my information in an analyst’s note that dived into the financial implications of the Google Semantic Web play. If you want to know the nitty gritty of the Google Semantic Web play, snag a copy of Google Version 2.0 here. The study, published in mid 2007, remains timely. There’s a reason I chose this cover for the monograph. That’s a black panther that’s tough to see clearly. Apt metaphor for Google’s play in the Semantic Web space in my opinion.

Google Version 2.0 cover

Stephen Arnold, January 23, 2009

New Google Study Announced

January 21, 2009

In July 2007, I vowed, “No more Google studies.” I was tired. Now I am just about finished with my third analysis of Google’s technology and business strategy. The two are intertwined. My publisher (Harry Collier, Infonortics Ltd.) has posted some preliminary information here about the forthcoming monograph, Google: The Digital Gutenberg. If you are curious how a Web search engine can be a digital Gutenberg, you will find this analysis of Google’s newest information technology useful. None of the information in this monograph has appeared in the more than 1,200 posts on this Web log, in my two previous Google studies, nor in my more than 200 publicly available articles, columns, and talks.

In short, the monograph will contain new information.

If you are involved in traditional media as a distributor, producer, content creator, aggregator, reseller, indexer, or user, you will find the monograph useful. You may get a business idea or two. If you are the nervous type, the monograph will give you some ideas on which to chew. This study represents more than one year of research and analysis. I don’t pay much attention to the received wisdom about Google. I do focus almost exclusively on the open source information about Google’s technology using journal articles, presentations, and patent documents. The result is a look at Google that is quite different from the “Google is an advertising agency” approach that continues to dominate discourse. Even the recent chatter about Google’s semantic technology is old hat if you read my previous Google monographs. In short, I think this third study provides a solid look at what Google will be unveiling in the period between mid 2009 and the end of 2010. Here are the links to my two earlier studies.

  • The Google Legacy. Describes how Google’s search system became an application platform. You know this today, but my analysis appeared in early 2005.
  • Google Version 2.0. Explores Google’s semantic technology and the company’s innovations that greased the skids for applications, enterprise solutions, and disintermediation of commercial database publishers. A recent podcast broke the old news just a few days ago. Suffice it to say that most pundits were unaware of the scope and scale of Google’s semantic innovations. Cluelessness is reassuring, just not helpful when trying to assess a competitive threat in my opinion.

I don’t have the energy to think about a fourth Google study, but this trilogy does provide a reasonably comprehensive view of Google’s technical infrastructure. I know from feedback from Googlers that the information about some of Google’s advanced technology is not widely known among Google’s rank and file employees. Google’s top wizards know, but these folks are generally not too descriptive about Google’s competitive strengths. Most pundits are happy to get a Google mouse pad or maybe a Google baseball hat. Not me. I track the nitty gritty and look past the glow of the lava lamps. I don’t even like Odwalla strawberry banana juice.

Stephen Arnold, January 21, 2009

Kosmix: YAGK (Yet Another Google Killer)

January 20, 2009

Kosmix, like Cuil.com, has some fibrous tendrils that connect to the Google. Not surprisingly, the Kosmix system does not tackle the Google head on. Think of Kosmix as an automated portal for information. When I visit the site, I see what’s new. I have “hot” topics to click and explore. I have trends. I have videos. In short, I get search without search. There is a search box, and it works reasonably well.

kosmix splash

Kosmix splash page. An information portal for the 21st century.

One of the wizards behind Kosmix is Anand Rajaraman, who has considerable visibility in the Silicon Valley technology world. I have followed his Web log posts because he has demonstrated keen insight into the technical activities at Google. In December 2008 he wrote “Kosmix Adds Rocketfuel to Power Voyage of Exploration” here. Several points earned a place in my notes about search; to wit:

  • Kosmix raised an additional $20 million in financing
  • Google=Search+Find. But Kosmix=Explore+Browse
  • The system is based on algorithmic categorization technology.

A feature summary appears on the Kosmix Web log here.


Autonomy and Xerox in Tie Up

January 20, 2009

Big news in the world of content processing and search: Xerox and Autonomy have struck a deal. According to this news story on Forbes.com, “Xerox DocuShare Enters into OEM Agreement with Autonomy”, “The new license will allow Xerox to integrate Autonomy’s Intelligent Data Operating Layer (IDOL) technology into its DocuShare enterprise content management (ECM) platform.” DocuShare is a content management system. The IDOL server will be integrated into the existing DocuShare accounts worldwide.

David Smith, Xerox VP, said:

Content management technologies and services that help organizations save money, better manage content and improve efficiencies are essential in today’s business climate… The integration of Autonomy’s IDOL Server takes DocuShare’s ability to meet the needs of our global customer base to the next level.

Information about DocuShare is here. Information about Autonomy IDOL is here. The content management sector has been hit by Microsoft’s SharePoint push. Other CMS vendors have beefed up their search and content processing services to withstand the “good enough” system available at competitive rates from Microsoft and its resellers. For example, Interwoven has a deal with Vivisimo.

The challenge for Xerox will be to hold on to its existing customers. The opportunity for Autonomy is to make upsells for other Autonomy functionality. If this deal works, perhaps Xerox will step forward and acquire Autonomy. The vendor has more than 16,000 licensees and a number of lucrative deals. Xerox has dabbled in search and content processing for many years. In fact, Microsoft licensed some of the Xerox search and content processing technology as part of Microsoft’s purchase of Powerset in 2008.

My question is, “What does Xerox know about Xerox PARC technology that prevents Xerox from using its own technology in the DocuShare product?” This raises another question, “Does Microsoft know that Xerox has sidestepped Xerox PARC technology for the Autonomy IDOL system?”

Autonomy has a strong business in litigation support. I wonder if Xerox Litigation Services will avail itself of the Autonomy technology to address some of the shortcomings in the Xerox eDiscovery offerings. I don’t have any color for the financial terms of the deal. If I get some substantive information, I will post it.

Stephen Arnold, January 20, 2009

Cognition Technologies: Gospels Demo Available

January 18, 2009

Cognition Technologies has put up a new demo that allows users to search the Gospels of the Bible. The system has processed the books of Matthew, Mark, Luke, and John. You can find the demo from the Cognition Technologies’ home page at http://www.cognition.com or here.

The company selected this corpus to showcase Cognition Technologies’ ability to deal with metaphorical language. The natural language processing system permits queries by words, phrases, and questions. The company said:

We have worked very hard to show companies interested in semantically-enabling their technologies that Cognition’s technology understands language and concept nuance.

Biblical texts are difficult to parse and tag. Keep this in mind when you look at the demo. The easiest content to process is tidy, scientific, technical, and medical content chock full of jargon. Texts like the Gospels are stuffed with fuzziness, concepts, and metaphors.

For more information about the Cognition Technologies’ system, you can explore the firm’s Web site or you can read an analysis in the Gilbane Group’s study “Beyond Search” here.

Stephen Arnold, January 18, 2009

Oracle, Semantics, and Search

January 17, 2009

Secure Enterprise Search (SES10g) has dropped off my radar screen. Nothing new at Oracle World last fall. I did attend an Oracle briefing at one of the lame duck conferences I hit last year, but recently–zip. I knew that Oracle had explored a tie up with Siderean Software, a now quiet company near Los Angeles. I also picked up some intel about a conversation and test with the Bitext wizards but nothing lately.

I read in Semantic Focus here that Oracle is moving forward with semantics. The article “Semantic Data Storage in Oracle” here is worth a read. I found the information encouraging, but the write up prompted me to do some addled goose type thinking. If you are familiar with this Web log, you know that the “addled goose” phrase signals some questions and a few observations.

The point of the article was to tell me that Oracle’s mothership (the Oracle 11g database) provides a platform to store the semantic Webby stuff called RDF and OWL data. RDF, in case you have forgotten, is semantic Web speak for Resource Description Framework. It is a framework for describing and interchanging metadata. More info is here. OWL is not part of the Hooters logo. The acronym means Web Ontology Language. More info is here. For me the most important comment was:

It [Oracle 11g] allows efficient storage, loading and querying of semantic data. Queries are enhanced by adding relationships (ontologies) to data and evaluated on the basis of semantics. Data storage is in the form of RDF triples (Subject, Predicate, Object) and can scale up to millions of triples. The triples stored in the semantic data store are modeled as a graphed structure. All the data is stored in a single central schema allowing access to users for loading and querying data.
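To make the “RDF triples (Subject, Predicate, Object)” idea in the quoted passage concrete, here is a minimal in-memory sketch. It assumes nothing about Oracle’s actual storage or query interface. The `TripleStore` class, its method names, and the sample data are all hypothetical. It shows triples held in one central collection, pattern queries with wildcards, and the same data viewed as a graph.

```python
from collections import defaultdict


class TripleStore:
    """Toy store: every fact is a (subject, predicate, object) tuple."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return sorted triples matching the pattern; None is a wildcard."""
        return sorted(
            t for t in self.triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )

    def as_graph(self):
        """View the triples as a graph: subject -> list of (predicate, object) edges."""
        graph = defaultdict(list)
        for s, p, o in sorted(self.triples):
            graph[s].append((p, o))
        return dict(graph)


# Illustrative data only; these names are not Oracle identifiers.
store = TripleStore()
store.add("Oracle11g", "stores", "RDF")
store.add("Oracle11g", "stores", "OWL")
store.add("RDF", "describes", "metadata")

stores = store.match(subject="Oracle11g", predicate="stores")
graph = store.as_graph()
```

A linear scan like this would not scale to millions of triples. A production store would index on subject, predicate, and object, which is the kind of work the article credits to the database.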

Now my questions:

  1. Where does Secure Enterprise Search fit into this semantic data picture?
  2. With performance an issue, how will the inclusion of potentially verbose information affect retrieval?
  3. What tools will Oracle provide to make use of these new data types?

We’ve been stuffing all sorts of information into database management systems for years. Maybe I am missing something, but I don’t see the type of breakthrough that companies like Aster Data and InfoBright are delivering whether the data are or are not “semantic”. One final question: What’s going on with SES10g?

Stephen Arnold, January 17, 2009

Exalead Profile Now Available

January 14, 2009

The Enterprise Search Report is no more. Thank goodness. A good idea in 2003 when work on the first edition began, the tome became an antique. I wrote the first three editions. I don’t know who did the fourth. With the coming of the new year, the rights to the information in the Enterprise Search Reports, 1st, 2nd, and 3rd editions, came back to me. I will be creating profiles based on my research into more than 50 vendors. At its peak the ESR only contained 30 profiles.

The first profile in the new, free Beyond Search Report series–an analysis of Exalead–is now available on the ArnoldIT.com Web site here. It runs about 11 pages and includes information about Exalead’s search system. I have enough information for a supplement about Exalead’s newest technology, and I will try to get that posted in the next couple of weeks as well.

I will work through my files and publish a profile every week or two. I have not worked out the full publication schedule yet, but I will get that done once I become more familiar with the new format.

There is no charge for these analyses. If you find an error, or if there is something in a profile with which you don’t agree–use the comments section of this Web log to provide your ideas and facts. I try to deliver a zero error document, but I have been writing about companies for a long time. Changes occur frequently, so you may find some variance between what’s in my free report and what the company’s sales rep tells you tomorrow.

beyond search report logo

The new logo. The Beyond Search goose is a proud mommy. Tess, however, was annoyed. She wanted a canine to identify these free reports.

Keep in mind that some of the information I have about vendors will not appear in the profiles. If you want more information about a vendor, you can write me at seaky2000 at yahoo dot com and ask for a price quote for a more detailed report. I try to track down pricing and patent information, for example, but I don’t put this information in these free profiles. I want to be helpful, but I don’t want to end up as a Wal*Mart greeter. I have to sell some proprietary reports to survive.

Part of my method is to give the vendor an opportunity to comment on my analyses. These profiles are objective, so a vendor may not agree with some of my points. That’s okay. I just don’t want to be sued by 20 somethings who take umbrage at a 65 year old’s view of a search or content processing company. What vendors say and what the software does are two very different things in my opinion.

The combination of the interviews in the Search Wizards Speak series plus this Web log plus the Beyond Search profiles with a nifty new logo makes it easy for a person interested in enterprise search to get smart without spending $1,000 or more for a report that is outdated the minute it becomes available.

Stephen Arnold, January 14, 2009
