Forbes Taps Belief Networks for Semantics

September 15, 2009

There are a number of what I would call publisher-centric information services. Examples range from Relegence.com (a unit of AOL, formerly Time Warner) to DayLife.com (funded in part by the New York Times Co.). Another outfit is Belief Networks. The Beyond Search team learned last week that Forbes.com will be using technology from Belief Networks, which specializes in semantic intelligence and predictive analytics, to power advanced search on its Web site. Belief Networks' packages set up a semantic search that returns relevant advertising and content listings, including real-time social network entries and Twitter conversations. Forbes says it's trying to "enrich" the Web site experience and "engage" its readers. People go to Forbes.com looking for up-to-date or even before-the-date money- and business-focused subject matter. That's why Forbes is looking to upgrade reader access to real-time discovery and tracking of both structured and unstructured content. The Belief Networks method reminded us of the original Oingo service (which changed its name to Applied Semantics). Google acquired Oingo / Applied Semantics and made good use of the technology in a number of Google services. Perhaps Forbes will enjoy a similar Googley success?

Jessica Bratcher, September 15, 2009

LexisNexis and Recommind Tie Up

August 26, 2009

LexisNexis continues to look for ways to boost its revenues. The efforts are interesting because the company continues to retrace its steps in an effort to crack the code. LexisNexis, like Westlaw, faces a mini revolt. Some law firm clients are asking the firms to do legal work for fixed prices, to accept reduced rates with ceilings on costs, and to be more creative in reducing the costs of certain legal work. This is bad news for the commercial legal information companies. The reason? The companies delivering US legal information depend to some degree on taxi meter pricing. The idea is that the legal researcher pays for time and other variables for system access. Not surprisingly, for a patent matter, a legal researcher can run up a four or five figure bill. In the good old days, the law firms' clients would pay up. Today, some clients are balking. Once again the traditional business model runs headlong into the new realities of business.

What’s the fix? LexisNexis has tried a tie up with Microsoft to put research in Microsoft Word. Did you activate the feature? I didn’t. LexisNexis has tried to diversify into fraud, content analysis, and risk. Do you think of LexisNexis when I say these words? I didn’t think so. LexisNexis has tried different angles of attack on search, law firm software services, and Web access.

The financial pressure continues to mount.

I just learned that knowledge management is the next revenue Petri dish. "LexisNexis and Recommind to Deliver New Knowledge Management Capabilities for Law Firms" announced this new venture. The story reported:

The new offering integrates Lexis Search Advantage content and services accessed through lexis.com with MindServer Search, Recommind’s enterprise search platform. It provides a one-stop destination combining access to documents and information from both a firm’s internal sources as well as trusted LexisNexis® content, delivering search results that are more complete, efficient and actionable.

The rundown of benefits is pretty much what one would expect: information integration, better research, etc.

The proof of the pudding will be revenue. LexisNexis is straying from its core competency of delivering commercial grade legal information. Will knowledge management generate enough cash to put LexisNexis back on the fast growth track? In my opinion, the company is in a race. Some government entities are making more legal related information available online. Attorneys looking for ways to cut costs are likely to flock to these free services. Another challenge is the interest in lower cost professional information services like FastCase.

Recommind shifted from a legal niche strategy to an enterprise search strategy. Does this tie up mean that Recommind is returning to its legal niche or diversifying courtesy of LexisNexis?

Interesting moves by both companies. Each firm has search technology. Now search and content have morphed into knowledge management. Does anyone know what knowledge means? Does anyone know what management means? The phrase strikes me as “old school”.

Stephen Arnold, August 26, 2009

Somat Engineering: Delivering on the Obama Technology Vision

August 24, 2009

I fielded an email from an engaging engineer named Mark Crawford. Most of those who contact me get a shrill honk that means, "The addled goose does not want to talk with you." Mr. Crawford, an expert in the technology required to make rockets reach orbit and vehicles avoid collisions, said, "One of the top demo companies in San Francisco listened." I asked, "Was it TechCrunch?" Mr. Crawford said, "I cannot comment on that." So with an SF demo showcase interested, I figured, "Why not get a WebEx myself?"

Quite a surprise. I wrote a dataspace report for IDC in September 2008. No one really cared. I then included dataspace material in my new Google monograph, Google: The Digital Gutenberg. No one cared. I was getting a bit gun shy about this dataspace technology. You can get a reasonable sense of the thinking behind dataspace technology by reading the chapter in Digital Gutenberg which is available without charge from my publisher in the UK. Click here to access this discussion of the concept of dataspaces.

Mr. Crawford’s briefing began, “We looked at how we could create a dataspace that brings together information for a government agency, an organization, or a small group of entrepreneurs. We took a clean sheet of paper and built a system that bridges the gap between what people want to do and the various islands of technology most enterprises use to get their knowledge sharing act together.”


Mr. Crawford, along with his partner Arpan Patel, and some of Somat Engineering’s information technology team, built a system that President Obama himself would love. Said Mr. Crawford, “We continue to talk with our government partners and people like the demo showcase. There seems to be quite a bit of excitement about our dataspace technology.”

I wanted to put screenshots and links in this write up, but when I followed up with Somat Engineering, a 60 person multi-office professional services firm headquartered in Detroit, Michigan, I was told, “Sit tight. You are one of the first to get a glimpse at our dataspace system.”

I challenged Mr. Crawford because Somat designs bridges, roads, and other tangible entities. It is not a software company. Mr. Crawford chuckled:

Sure, we work with bridges and smart transportation systems. What we have learned is that engineers in our various offices build bridges among information items. Our dataspace technology was developed to build bridges across the gaps in data. Without our dataspace technology, we could not design the bridges you drive on. Unlike some software companies, our dataspace technology was a solution to a real problem. We did not develop software and then have to hunt for a problem to solve. Without our technology, we could not deliver the award winning engineering Somat puts out each and every day.

Good answer. A real software solution to a real world problem – bridging gaps among and between disparate information. Maybe that is what turned the crank at the analyst’s shop. Refreshing and pragmatic.

However, I did get the okay to provide some highlights for you and one piece of information that may be of considerable interest to my two or three readers who live in the Washington, DC area.

First, the functions:

  1. Somat has woven Microsoft and Google functions together into one seamless system. Other functions can be snapped in to make information sharing and access intuitive and dead simple.
  2. The service allows a team to create the type of sharing spaces that Lotus has been delivering in a very costly and complicated manner. The Somat approach chops down the cost per user and eliminates the trademarked complexity of the Lotus solutions.
  3. The system integrates with whatever security methods a licensing organization requires. This means that the informal security of Active Directory works as well as the most exotic dongle-based methods that are popular in some sectors.

The second piece of news is that the public demonstration of this Somat technology will take place in Washington, DC, at the National Press Club on September 23, 2009. I have only basic details at the moment. The program begins at 9 am sharp and ends promptly at 11 am. There will be a presentation by the president of Somat, a demonstration of the dataspace technology, a presentation by long-time open source champion Robert Steele, president of OSS Inc., and a technology review by Adhere Solutions, the US government's contact point for Google technology. A question and answer session will be held. After my interrogation of Mr. Crawford, he extended an invitation to me to participate in that Q&A session.


Bridging information pathways. The key to Somat’s engineering method.

Somat’s choice of the National Press Club venue was, according to Mr. Crawford:

The logical choice for Somat. As a minority owned engineering and professional services company, we see our dataspace technology as one way to deliver on President Obama's technical vision. We think that dataspaces can address many of the transparency challenges that the Obama administration is tackling head on. Furthermore, we know from our work on certain government projects that US agencies can use our dataspace approach to reduce costs and chop out the "wait" time in knowledge centric projects.

Based on what I saw on Friday, August 21, 2009, the San Francisco tech analysts were right on the money. I believe that Somat’s dataspace solution will be one of the contenders in this year’s high profile demo event.  My thought is that if you want to deal with integrated information in a way that works, you will want to navigate to the Somat registration link to attend this briefing.  If you want to talk to me about my view, drop me an email at seaky2000 at yahoo dot com. (The NPC charges for coffee, etc., so there is a nominal fee to deal with this situation.)

Somat has a remarkable technology.  It touches upon such important information requirements as access to disparate information, intuitive “no training required” interfaces, and real time intelligence.

For more information about Somat, visit the company’s Web site. The addled goose asserts, “Important technology that bridges the gaps between Google, Microsoft and other information systems.”

Stephen Arnold, August 24, 2009

Smart Wiki Search

August 23, 2009

A happy quack to the reader who sent me a link to Smart Wiki Search. There is a good description of what the system does to identify related queries. Click a related query and the system displays a page of Wikipedia information germane to your original query. I tested the query "Julius Caesar" and got useful links shown below:

smart wiki

The about section of the service explains the nuts and bolts of the system:

Smart Wiki Search uses the link structure of Wikipedia to calculate which concepts each page is associated with. It is easy to see why looking at links can help group pages by concepts. For example, pages about mathematics have a lot of links to (and from) other pages about mathematics. Pages about the Apollo moon landing have a lot of links to pages about NASA and pages about the moon, etc. More specifically, Smart Wiki Search uses the so-called eigen decomposition of the Wikipedia link transition matrix. Eigen decomposition provides a number of special vectors, called eigenvectors, and their corresponding eigenvalues. These vectors are special because even a relatively small number of eigenvectors having the largest eigenvalues can capture all the most important properties of the link structure.
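The quoted description can be made concrete with a toy sketch. This is not Smart Wiki Search's actual code; it is a minimal illustration, using an invented six-page link graph, of how the top eigenvectors of a link transition matrix separate pages into concept clusters:

```python
import numpy as np

# Toy link graph: pages 0-2 link among themselves (one concept cluster),
# pages 3-5 link among themselves (another cluster).
adjacency = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
], dtype=float)

# Row-normalize into a link transition matrix (each row sums to 1).
transition = adjacency / adjacency.sum(axis=1, keepdims=True)

# Eigendecomposition; keep the eigenvectors whose eigenvalues have the
# largest magnitudes -- these capture the dominant link structure.
eigenvalues, eigenvectors = np.linalg.eig(transition)
order = np.argsort(-np.abs(eigenvalues))
top = eigenvectors[:, order[:2]].real

def embedding(page):
    """Coordinates of a page in the reduced eigenvector space."""
    return top[page]

# Pages in the same cluster land at (nearly) the same coordinates.
```

Calling `embedding(0)` and `embedding(1)` returns essentially identical vectors, while `embedding(3)` lands elsewhere, which is the grouping-by-concept effect the about page describes.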

Give the system a spin. Graduate students and those writing research papers are likely to find this content domain specific search system useful.

Stephen Arnold, August 23, 2009

Stephen Arnold

Semantic Sig.ma

August 3, 2009

We’ve found a tool that offers results from multiple and varied (and sometimes unrelated) Web sites by using semantic analysis of embedded site data. It’s called Sig.ma, http://sig.ma/. What it returns is a mashup, a bulk treatment of data that may or may not help you. As the site explains: “It might be noisy but you can spot gems, e.g. interesting description differences in different sources.” Sig.ma can be a data browser, a widget to paste into e-mail or HTML, or a semantic API.

Sig.ma is also interactive and capable of learning; if you delete an item, it will process that info to exclude that data next time. The catch right now is that only pages exposing RDF, RDFa, or Microformats will appear. So while Google spiders the Web like crazy, Sig.ma is delving into more specific caches of information. Does this mean Google is lagging behind the services underpinning Sig.ma (Sindice, Kkam, and Boss)? We'll see what the future holds for semantic search.
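To see what "pages exposing RDFa" means in practice, here is a deliberately minimal sketch of the kind of embedded metadata a Sig.ma-style aggregator can harvest. The markup and the `dc:` property names are invented examples, and real RDFa processing involves far more (prefix resolution, subjects, literals taken from element text):

```python
from html.parser import HTMLParser

class RdfaPropertyExtractor(HTMLParser):
    """Collect (property, content) pairs from RDFa-annotated markup."""

    def __init__(self):
        super().__init__()
        self.triples = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) tuples.
        attrs = dict(attrs)
        if "property" in attrs and "content" in attrs:
            self.triples.append((attrs["property"], attrs["content"]))

# Hypothetical RDFa-annotated page fragment.
html = '''
<div about="http://example.org/doc">
  <span property="dc:title" content="Semantic Search Notes"></span>
  <span property="dc:creator" content="J. Bratcher"></span>
</div>
'''

extractor = RdfaPropertyExtractor()
extractor.feed(html)
# extractor.triples -> [('dc:title', 'Semantic Search Notes'),
#                       ('dc:creator', 'J. Bratcher')]
```

A page without such annotations yields nothing, which is exactly the "catch" described above: only sites that expose structured data show up in the mashup.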

Jessica Bratcher, August 3, 2009

Big Data, Big Implications for Microsoft

July 17, 2009

In March 2009, my Overflight service picked up a brief post in the Google Research Web log called “The Unreasonable Effectiveness of Data.” The item mentioned that three Google wizards wrote an article in the IEEE Intelligent Systems journal called “The Unreasonable Effectiveness of Data.” You may be able to download a copy from this link.

On the surface this is a rehash of Google’s big data argument. The idea is that when you process large amounts of data with a zippy system using statistical and other mathematical methods, you get pretty good information. In a very simple way, you know what the odds are that something is in bounds or out of bounds, right or wrong, even good or bad. Murky human methods like judgment are useful, but with big data, you can get close to human judgment and be “right” most of the time.

When you read the IEEE write up, you will want to pay attention to the names of the three authors. These are not just smart guys; these are individuals who are having an impact on Google's leapfrog technologies. There's lots of talk about Bing.com and its semantic technology. These three Googlers are into semantics and quite a bit more. The names:

  • Alon Halevy, former Bell Labs researcher and the thinker answering to some degree the question, "What's after relational databases?"
  • Peter Norvig, the fellow who wrote the standard textbook on computational intelligence and smart software
  • Fernando Pereira, the former chair of Penn’s computer science department and the Andrew and Debra Rachleff Professor.

So what do these three Googlers offer in their five page “expert opinion” essay?

First, large data makes smart software smart. This is a reference to the Google approach to computational intelligence.

Second, big data can learn from rare events. Small data and human rules are not going to deliver the precision that one gets from algorithms and big data flows. In short, costs for getting software and systems smarter will not spiral out of control.
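The rare-event argument can be illustrated with a toy bigram counter. The corpora below are hypothetical, not the authors' data; the point is simply that a small corpus never witnesses the rare pair (so only a hand-written rule could handle it), while a large enough corpus learns it directly:

```python
from collections import Counter

def bigram_model(sentences):
    """Count word bigrams: a crude stand-in for statistical learning."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        counts.update(zip(words, words[1:]))
    return counts

# Invented corpora: the large one is simply big enough to contain
# one rare event that the small one never sees.
small_corpus = ["the cat sat", "the dog ran"] * 10
large_corpus = small_corpus * 1000 + ["the axolotl sat"]

small_counts = bigram_model(small_corpus)
large_counts = bigram_model(large_corpus)

rare = ("the", "axolotl")
# small_counts[rare] == 0; large_counts[rare] == 1
```

No rule was written for the rare bigram; scale alone supplied the evidence, which is the cost-control point the essay makes.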

Third, the Semantic Web is a non-starter, so another method – semantic interpretation – may hold promise. By implication, if semantic interpretation works, Google gets better search results plus other benefits for users.

Conclusion: dataspaces.

See, Google is up front and clear when explaining what its researchers are doing to improve search and other knowledge centric operations. What are the implications for Microsoft? Simple. The big data approach is not used in the Powerset method applied to Bing in my opinion. Therefore, Microsoft has a cost control issue to resolve with its present approach to Web search. Just my opinion. Your mileage may vary.

Stephen Arnold, July 17, 2009

Semantic Search Revealed

July 14, 2009

I read “Semantic Search round Table at the Semantic Technology Conference” in ZDNet Web logs. Paul Miller, the author of the write up, did a good job, including snippets from the participants in the round table. In order to get a sense of the companies, the topics covered, and the nuances of the session, please, read the original. I want to highlight three points that jumped out at me:

First, I saw that there was a lot of talk about semantics, but I did not come away from the participants’ comments with a sense that a single definition was in play. Two quick examples:

  • One participant said, 'It means different things'. Okay, but once again we have "wizards" talking about search in general and semantic search in particular and I am forced to deal with ambiguity. "Different things" means absolutely zero to me. True, I am an addled goose, but my warning lights started flashing.
  • The Googler (artificial intelligence guru Dr. Peter Norvig) put my feathers back in place. He is quoted as saying, ‘Different types of answers are appropriate for different types of questions…’. That’s okay, but I think that definition should have been the operating foundation for the entire session.

Second, the wrap up of the article focused on Bing.com. Now Bing incorporates Powerset, according to what I have read. But Bing.com is a variation on the types of search results that have been available from such companies as Endeca for a while and from newcomers like Kosmix. The point I wanted to have addressed is what specific use is being made of semantics in each of the search and content processing systems represented in the roundtable discussion. Unreasonable? Sure, but facts are better than generalities and faux politesse.

Finally, I did not learn much about search. Nothing unusual in that. Innovation was not what the participants were conveying in their comments.

Bottom line: great write up, disappointing information.

Stephen Arnold, July 14, 2009

Oracle, Publishing, and XSQL

July 14, 2009

I am a big fan of the MarkLogic technology. A reader told me that I should not be such a fan boy. That’s a fair point, but the reader has not worked on the same engagements I have. As a result, the reader has zero clue about how the MarkLogic technology can resolve some of the fundamental information management, access, and repurposing issues that some organizations face. I am all for making parental type suggestions. I give them to my dog. They don’t work because the dog does not share my context.

The same reader who wanted me to be less supportive of MarkLogic urged me to dig into Oracle’s capabilities in Oracle XSQL, which I know something about because XSQL has been around longer than MarkLogic has.

Now Oracle is a lot like IBM. The company is under pressure because its core business lights up the radar of its licensees’ chief financial officer every time an invoice arrives. Oracle is in the software, consulting, open source, and hardware business. Sure, Oracle may not want to make SPARC chips, but until those units of Sun Micro are dumped, Oracle is a hardware outfit. Like I said, “Like IBM.”

MarkLogic has been growing rapidly. The last time I talked with MarkLogic’s tech team, it was clear to me that the company was thriving. New hires, new clients, and new technologies—these added to the buzz about the company. Then MarkLogic nailed another round of financing to fuel its growth. Positive signs.

Oracle cannot sit on its hands and watch a company that is just up Highway 101 expand into a data management sector right under Oracle’s nose. Enter Oracle XSQL, which is Oracle’s answer to MarkLogic Server.

The first document I examined was “XSQL Pages Publishing Framework” from the Oracle 9i/XML Developer’s Kits Guide. I printed out my copy, but you can locate an online instance on the Oracle West download site. I am not sure if you will have to register. Parts of Oracle recognize me; other parts want me to set up a new account. Go figure. Also, Oracle has published a book about XSQL, and you can learn more about that from eBooksLab.com. You can also snag a Wiley book on the subject: Oracle XSQL: Combining SQL, Oracle Text, XSLT, and Java to Publish Dynamic Web Content (2003). A Google preview is available as well. (I find this possibly ironic because I think Wiley is a MarkLogic licensee but I might be wrong about that.)
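For readers unfamiliar with XSQL, the core idea is merging SQL query results into XML for publishing. A rough, Oracle-free approximation in Python — sqlite3 stands in for Oracle, and the ROWSET/ROW element names mimic Oracle's canonical output conventions — looks like this:

```python
import sqlite3
import xml.etree.ElementTree as ET

# sqlite3 stands in for Oracle here; the real XSQL framework runs
# query tags embedded in an XML page template against the database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER, title TEXT)")
conn.executemany("INSERT INTO articles VALUES (?, ?)",
                 [(1, "Oracle and XML"), (2, "Publishing Pipelines")])

def publish(sql):
    """Render a query's rows as an XML fragment, XSQL-style."""
    root = ET.Element("ROWSET")
    cursor = conn.execute(sql)
    columns = [d[0] for d in cursor.description]
    for row in cursor:
        row_el = ET.SubElement(root, "ROW")
        for name, value in zip(columns, row):
            ET.SubElement(row_el, name.upper()).text = str(value)
    return ET.tostring(root, encoding="unicode")

xml_out = publish("SELECT id, title FROM articles ORDER BY id")
```

The resulting XML can then be pushed through an XSLT stylesheet to produce Web pages, PDFs, and so on — the "publishing framework" half of the name.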

Oracle has an Oracle BI Publisher Web log that provides information about the use of XSQL. The most recent post I located was a June 11, 2009, write up but the link pointed to “Crystal Fallout” dated May 22, 2009. Scroll to the bottom of this page because the results are listed in chronological order, with the most recent write up at the bottom of the stack. The first article, dated May 3, 2006, is interesting. “It’s Here: XML Publisher Enterprise Is Released” by Tim Dexter provides a run down of the features of this XSQL product. A download link is provided, but it points to a registration process. I terminated the process because I wasn’t that interested in having an Oracle rep call me.

I found “BI Publisher Enterprise 10.1.3.2. Comes Out of Hiding” interesting as well. The notion that an Oracle product cannot be found underscores another aspect of Oracle’s messaging. From surprising chronological order to hiding a key product, Oracle XSQL seems to be on the sidelines in my opinion.

An August 31, 2007 post “A Brief History of BIP” surprised me. The enterprise publishing project was not a main development effort. It evolved out of frustration with circa 2007 Oracle tools. Mr. Dexter wrote:

Three years later and the tool has come a long way … we still have a long way to go of course. But you’ll find it in EBS, PeopleSoft, JDE, BIEE as a standalone product, integrated with APEX and maybe even bundled with the database one day – its a fun ride, exhausting but fun.

This statement, if accurate, pegs one part of XSQL in 2004. (I apologize that the links point to the long list of postings, but Oracle’s system apparently cannot link to a single Web log post on a separate Web page. Annoying, I know. MarkLogic’s system provides such fine grain control with a mouse click, gentle reader.)

When we hit 2009, posts begin to taper off. A new release—10.1.3.3.3—was announced in May 2008. The interesting posts described the method of tapping into External Data Engines (Part 1, May 13, 2008, and Part 2, May 15, 2008).


The flow seems somewhat non-intuitive to me, even after reading two detailed Web log posts.

An iPhone version of Publisher became available on July 17, 2008.

In August 2008, Version 10.1.3.4 was released. The principal features, as I understand them, were:

  • Integration with Oracle Enterprise Performance Management Workspace
  • Integration with Oracle “Smart Space”
  • Support for multidimensional data sources, including Hyperion Essbase, SQL Server, and SAP Business Information Warehouse (!)
  • Usability and operation enhancements which seem to eliminate the need to write scripts for routine functions
  • Support for triggers
  • Enhanced Web services support
  • A Word template builder
  • Support for BEA Web Logic, JBoss, and Mac OS X.

Another release came out in April 2009. This one was 10.1.3.4.1 and focused on enhancements. When I scanned the list of changes, most of these modifications looked like bug fixes to me. In April 2009, Tim Dexter explained a migration gotcha. I read this as a pretty big glitch in one Oracle service integrating with another Oracle service.

Stepping back, I am left with the impression that XSQL and this product are not the mainstream interest of "big" Oracle. In fact, if I had to decide between Oracle's XSQL and MarkLogic's offering, I would not hesitate in selecting MarkLogic's solution for these reasons:

  1. MarkLogic has one mission: facilitate content and information management. The company is not running an XQuery side show. The company runs an XQuery main event.
  2. The MarkLogic server generates pages that make it easy to produce crunchy content. The Oracle system produces big chunks of content that are difficult to access and print out. Manual copying and pasting is necessary to extract information from the referenced Web log.
  3. The search function in MarkLogic works. Search in Oracle is slow and returns unpredictable results. I encountered this problem when trying to figure out whether “search” means “Ultra Search” or “SES”.

So, I appreciate the feedback about my enthusiasm for MarkLogic. I think my judgment is sound. Go with an outfit that does something well, not something that is a sideline.

Stephen Arnold, July 14, 2009

Lingospot: Technology for Publishers

July 10, 2009

I have been thinking about the problem facing traditional publishing companies. Some pundits suggest that publishers need to solve their problems with technology. My view is that publishers think about technology in a way that makes sense for their company’s business and work processes. I have come across two companies who have been providing publishers with quite sophisticated technology. The management of these two firms’ clients gets the credit. The technology vendors are enablers.

I provided some recent information about Mark Logic Corporation in my summary of my presentation to the Mark Logic user’s group a few weeks ago. You can refresh your memory about Mark Logic’s technology by scanning the company’s Web site.

The other company is Lingospot. Among its customers are Forbes Magazine and the National Newspaper Association. The company offers hosted software solutions. Publishing companies comprise some of Lingospot’s customers, but the firm has deals with marketing firms as well.

The company describes its technology in this way:

Lingospot’s patented topic recognition and ranking technology is the result of more than eight years of development and four years of machine learning. To date, our algorithm has identified over 30 million unique topics drawn from more than two billion pages that we have crawled and analyzed. During the last four years, we have collected over five billion data points on such topics, including the context in which readers have chosen to interact with each topic. What does all this mean for our clients? By partnering with Lingospot you have access to the leading topic recognition, extraction and ranking technology, as well as the accumulated machine learning of our platform. This translates into a more engaging experience for your readers and substantially higher metrics and revenue for you.

My understanding of this technology is that Lingospot can automatically generate topic pages from a client's content and then handle syndication of the content. The Lingospot system works with text, images, and video.
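Lingospot's algorithm is proprietary, but the general shape of topic recognition and ranking can be sketched with a naive frequency-based stand-in. The sample documents are invented, and the real system layers machine learning and reader-interaction signals on top of anything this simple:

```python
import re
from collections import Counter

def extract_topics(documents, top_n=3):
    """Rank candidate topics (capitalized phrases) by corpus frequency.

    A toy stand-in for commercial topic recognition: find capitalized
    multi-word phrases and rank them by how often they recur.
    """
    pattern = re.compile(r"\b(?:[A-Z][a-z]+)(?:\s[A-Z][a-z]+)*\b")
    counts = Counter()
    for doc in documents:
        counts.update(pattern.findall(doc))
    return [topic for topic, _ in counts.most_common(top_n)]

# Invented sample documents.
docs = [
    "National Newspaper Association members met in Los Angeles.",
    "Forbes Magazine covered the Los Angeles event.",
    "Los Angeles startups pitched Forbes Magazine editors.",
]
topics = extract_topics(docs)
# topics -> ['Los Angeles', 'Forbes Magazine', 'National Newspaper Association']
```

The ranked topics would then seed the automatically generated topic pages the vendor describes; the hard part, which this sketch skips, is disambiguation and learning which topics readers actually click.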

The company is based in Los Angeles. Founder Nikos Iatropoulos was involved with several other marketing-centric companies. He worked for Credit Suisse and, like the addled goose, did a stint with Booz, Allen & Hamilton. His co-founder is Gerald Chao, who is the company's chief technical officer. Prior to founding Lingospot, Gerald served as a Senior Programmer at WebSciences International and as a programmer at IBM. Gerald holds an MS in Computer Science and a PhD in statistical natural language processing, both from UCLA.

Publishers are embracing technology. My hunch is that the business models need adjustment.

Stephen Arnold, July 10, 2009

Google Gestation Period

July 7, 2009

I went through my notes about the Guha patent documents. These were published in February 2007. Bear Stearns published my analysis of these documents in May 2007. I am not sure these are available to the public, but I did describe the Programmable Search Engine invention in my Google Version 2.0 study, which came out in September 2007. The Google Squared service and its query "digital camera" replicates the exemplary item in the Guha patent document. Several observations:

  1. My 2005 assertion holds: the Google gestation period is about four years. There is a two-year ramp period inside the firm during which time the technology is shaped and then, if deemed patentable, submitted to the USPTO and other patent bodies.
  2. After the patent document is published, as with the Guha February 2007 PSE patents, a two-year maturing and deployment process begins.

The appearance of the Google Squared service as a beta marks the Darwinian field testing. The age of semantics is now officially underway. You can read about Google's methods in my trilogy The Google Legacy (2005), Google Version 2.0 (2007), and Google: The Digital Gutenberg (2009). The 2007 and 2009 studies provide some research data germane to those who want to surf on Google. Yep, that's the source of my "wave" analogies and the injunction at the end of my Google talks to "surf on Google".

What’s next? Wait for my newest monograph on time in search and content. I find it easier to let research and content analysis illuminate the would and could of the GOOG.

Stephen Arnold, July 7, 2009
