Funnelback 8: New Version Now Available

June 8, 2008

Funnelback, a search and content processing system, has released Version 8 with a number of new features and enhancements. Formerly Panoptic, the system now supports Microsoft SharePoint, Lotus Notes, and mainstream content management systems such as EMC Documentum and Interwoven. (For search history buffs, you can see a demo of the original Panoptic system here.)

You can now generate point-and-click interfaces. Like Vivisimo, Funnelback makes it possible for a user to add a tag to a document. The system can process structured data and index data behind a Web form, and it has added support for Chinese, Japanese, Korean, and Thai. Funnelback can be installed on premises or deployed in a software as a service (SaaS) model.

You can get more information at the company’s Web site. I profiled the Panoptic / Funnelback system in the third edition of the Enterprise Search Report. I can’t recall if that profile was retained for the current edition of Enterprise Search Report. The company has a number of customers in Canada and the UK, but its profile in the United States was modest. You can access a client list here.

You can see the system in action at the Australian job search site CareerOne here. You can enter a free text concept like “Web developer” and narrow your focus via point-and-click drop down boxes. Funnelback has implemented a browse feature, which some vendors call guided navigation or assisted navigation. Whatever the concept’s buzz word, users like this feature.

There’s an implementation of the system’s capabilities on the Australian Securities Exchange site. You can use the text search method, or interact via point-and-click, ticker symbols, or role-based views. You may recall that role-based views are a feature of Microsoft’s next-generation Dynamics’ systems. Funnelback seems to be ahead of Microsoft in this approach to complex information retrieval. You can see the Funnelback Financial Planner view of Australian Securities Exchange data here.

The company has roots in academia (Australian National University, I believe) like many other search and content processing systems. My take on the original Panoptic system and the newer Funnelback system was that it represented a good value. The drawback is one that many non-US companies face when trying to make a sale in the American market. Procurement teams like to have a local presence for a product that has brand recognition among senior managers. I’ve heard rumors that Funnelback will open a US office, but I have no confirmation that this is true. I will keep you posted. In the meantime, check out the system.

Stephen Arnold, June 7, 2008

The Semantic Chimera

June 8, 2008

GigaOM has a very good essay about semantic search. What I liked was the inclusion of screen shots of results of natural language queries–that is, queries without Boolean operators. Two systems indexing Wikipedia are available in semantic garb: Cognition here and Powerset here. (Note: there is another advanced text processing company called Cognition Technologies whose url is www.cognitiontech.com. Don’t confuse these two firms’ technologies.) GigaOM does a good job of making posts findable, but I recommend navigating to the Web log immediately.

Nitin Karandikar reviews both Cognition’s and Powerset’s approach, so I don’t need to rehash that material. For me the most important statement in the essay is this one:

There are still queries (especially when semantic parsing is not involved) in which Google results are much better than [sic] either Powerset or Cognition.

Let me offer several observations about semantic technology applied to constrained domains of content like the Wikipedia:

  1. Semantic technology is extremely important in text processing. By itself, it is not a silver bullet. A search engine vendor can say, “We use semantic technology”. The payoff, as the GigaOM essay makes clear, may not be immediately evident. Hence, the “Google is better” type statement.
  2. Semantic technology is in many search systems, just not given center stage. Like Bayesian maths, semantic technology is part of the search engine vendors’ toolkits. Semantic technology delivers very real benefits in functions from disambiguation to entity extraction. As this statement implies, there are many different types of semantics in the semantic technology spectrum. Picking the proper chunk of semantic technology for a particular process is complicated stuff, and most search engine vendors don’t provide much information about what they do, where they get the technology, or how the engineers determined which semantic widget to use in the first place. In my experience, the engineers arrive at their job with academic and work experience. Those factors often play a more important part than rigorous testing.
  3. Google has semantic technology in its gun sights. In February 2007, information became available about Google’s programmable search engine, which has semantics in its plumbing. These patent applications state that Google can discern context from various semantic operations. Google–despite its sudden willingness to talk in fora about its universal search and openness–doesn’t say much about semantics and for good reason. It’s plumbing, not a service. Google has pretty good plumbing, and its results are relevant to many users. Google doesn’t dwell on the nitty gritty of its system. It’s a secret ingredient and no user really cares. Users want answers or relevant information, not a lab demo of a single text processing discipline.
  4. Most users don’t want to type more than 2.2 words in a query. Forget typing well formed queries in natural language. Users expect the system to understand what is needed and the situation into which the information fits. Semantic technology, therefore, is an essential component of figuring out meaning and intention. Properly functioning semantic processes produce an answer. The GigaOM essay makes it clear that when the answers are not comprehensive, on point, or what the user wanted, semantic technology is just another buzz word. Semantic technology is incredibly important, just not as an explicit function for the user to access.

I talk about semantic technology, linguistic technologies, and statistical technologies in this Web log and in my new study for the Gilbane Group. The bottom line is that search doesn’t pivot on one approach. Marketers have a tough time explaining how their systems work, and these folks often fall back on simplifications that blur quite different things. Mash ups are good in some contexts, but in understanding how Powerset integrates a licensed technology from Xerox PARC and how that differs from Cognition’s approach, simplifications are of modest value.

In my experience, a company which starts out as statistics only quickly expands the system to handle semantics and linguistics. The reason–there’s no magic formula that makes search work better. Search systems are dynamic, and the engineers bolt new functions on in the hope of finding something that will convert a demo into a Google killer. That has not happened yet, but it will. When a better Google emerges, describing it as a semantic search system will not tell the entire story. Plumbing that runs compute intensive processes to crunch log data, and smart software, are important too.

A demo is not a scalable commercial system. By definition a service like Google’s incorporates many systems and methods. Search requires more than one buzz word. You may also find the New York Times’s Web log post by Miguel Helft about Powerset helpful. It is here.

Stephen Arnold, June 8, 2008

Mobile Projection: Truly Stunning

June 7, 2008

Information Week reporter K.C. Jones reported an iSuppli estimate that stunned me. The title of the story is “Wireless Social Networking To Generate $2.5 Trillion By 2020.” You can read it here, but hurry; news has a peculiar way of becoming hard to find a day or two after the story appears on a Web site.

iSuppli–a company in the business of providing applied market intelligence–projected that wireless social networking products, services, applications, components, and advertising will generate more than $2.5 trillion in revenue by 2020. I think that’s 12 zeros.

I’m not sure I know what wireless social networking is but if iSuppli is correct–consultants and research firms are rarely off base–it’s a great opportunity for entrepreneurs who catch the wave. Wow, $2.5 trillion in 12 short years. I thought I had seen some robust estimates from Forrester, Gartner, 451, and ComScore, but the iSuppli projection is a keeper.

Stephen Arnold, June 7, 2008

Lexalytics: Stepping Up Its Marketing

June 7, 2008

Lexalytics is a finalist in the annual MIXT (Massachusetts Innovation & Technology Exchange) awards. Lexalytics has also revamped its Web site. The company now makes it easy to download a trial of its text analytics software. The trial is limited to 50 documents, but you can generate a list of entities and summaries of the processed documents. The most interesting function is the trial’s ability to display a sentiment score for a document. In effect, you can tell if opinion is running for or against a product.

The company’s system performs three functions on collections of content. The content can be standard office files such as Word or PowerPoint documents. The system can ingest Web log content and RSS streams as well. Once installed, the system outputs:

  • The sentiment and tone from a text source
  • The names of the people, companies, places or other entities in processed content 
  • Any hot themes in a text source.
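For readers who want a feel for how a sentiment score might be computed, here is a minimal lexicon-based sketch. The word lists, weighting scheme, and function name are my own illustrative inventions, not Lexalytics’ actual method.

```python
# Hypothetical sketch of lexicon-based sentiment scoring, the general kind of
# function a sentiment trial exposes. Word lists and weighting are invented.

POSITIVE = {"good", "great", "useful", "fast", "reliable"}
NEGATIVE = {"bad", "slow", "broken", "buggy", "poor"}

def sentiment_score(text: str) -> float:
    """Return a score in [-1.0, 1.0]; below zero means opinion runs against."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    pos = sum(1 for w in words if w in POSITIVE)
    neg = sum(1 for w in words if w in NEGATIVE)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total

print(sentiment_score("The product is great and fast, but the manual is poor."))
# Two positive hits, one negative: score of about 0.33, mildly favorable.
```

Commercial systems use far richer linguistic models (negation handling, phrase-level tone), but the input-to-score shape of the function is the same.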

Lexalytics has provided technology to other search and content processing companies; for example, Northern Light and Fast Search & Transfer, to name two. A happy quack to the Lexalytics team for the MIXT recognition. You can learn more about the company here.

Stephen Arnold, June 7, 2008

Inside the Microsoft Mind

June 6, 2008

The Washington Post‘s Peter Whoriskey did a bang up job with his interview of Steve Ballmer, the lead dog for the Microsoft pack. Traditional media can make it tough to locate an electronic version of a story, so click here immediately and read the article.

I don’t want to spoil your fun, so I won’t recycle or paraphrase the statements Mr. Whoriskey captured. There was one comment that stuck in my mind:

I have no clue what [Google is] up to. It’s very hard for me to understand what they are up to. . . . I don’t know what Google’s angle is because it sometimes looks like Google wants to become a telecommunications company. And yet that may not be right. But that recent thing where they went in with Sprint and WiMax guys is very confusing to me. I think it’s very confusing to a number of telecommunications companies, as well.

This statement is particularly revealing to me. I have a modest bit of experience with Microsoft, both in the pre-Google days (before 1998) and the post-Google days (1999 to 2007). When I worked on a couple of tiny jobs as a sub sub contractor to the Redmond machine, the focus across the people whom I met was pretty clear. In fact, the people used the phrase Microsoft agenda to refer to Windows, Office, and servers. The “agenda” meant sell licenses, get organizations drinking the Microsoft-flavored Kool-Aid, and “put a computer on every desk.”

The post-Google period can be summarized for me in one word: diffused. The original “agenda” expanded in a number of ways. These decisions have been documented in hundreds of books, articles, and Web log posts. Let me mention a few and then move on to my observations: MSN, Zune, Xbox, WebTV, and UMBC. Promising businesses to be sure: “agenda” changers all.

Mr. Ballmer’s statement about his not understanding what “they are up to” is revelatory. The “they” is Google. I wonder if the “confusing” part of Google is a reflection of Microsoft itself.

What my research suggests is that Google is moving in a deliberate way and has been since its initial public offering. As the company has grown, there are more Google initiatives, but these are of almost zero incremental cost to Google. Most Google innovations are software that a code wizard loads on the Google super computer. If there are clicks, Google cares. If there are no clicks, there’s no cost or revenue loss, just learning what doesn’t work.

I have documented Google’s approach in my two studies, The Google Legacy (plumbing) and Google Version 2.0 (mathematical methods). You can buy copies of these here. Others have followed in my footsteps and in many cases gone far beyond my individual, early research about Google.

I can sum up six years of research and hundreds of hours of conversations about Google in one word: disruption. Google disrupts and then looks for advantages. “Look for” is a bit too proactive. What Google does is let the clicks guide them.

Microsoft is facing a disruptive strategy hooked to a different business model. Verizon feels the disruptive force. Traditional publishers sense that Google is “coming”. I look forward to more information from the mind of Microsoft as it wrestles with Google’s digging in and getting comfortable in some of Microsoft’s markets.

Stephen Arnold, June 6, 2008

Search Wizard Starts New Venture

June 6, 2008

Years ago I examined search technology developed by a teenage whiz named Judd Bowman. You can read about his background here. Mr. Bowman and an equally talented Taylor Brockman had devised a way around memory access bottlenecks that hobbled other search companies’ performance. Mr. Bowman founded Pinpoint, which became Motricity. With a keen interest in search, Motricity provided a range of technology to a number of high profile clients, including Motorola.

Mr. Bowman’s new venture is PocketGear, formerly a unit of Motricity. There’s not much information available at this time. Based on my knowledge of Mr. Bowman’s interest in search, the company will offer mobile search and content services. The venture warrants close observation, particularly with regard to mobile on device applications and cloud based search and retrieval.

Note: the Charlotte News Observer article quotes me and cites my for-fee work for investors in Messrs. Bowman and Brockman’s first company, Pinpoint.

Stephen Arnold, June 6, 2008

Google: Tighter Time Controls

June 6, 2008

Valleywag, one of the Web logs I enjoy immensely, reports that Google’s 20 percent free time for personal projects policy may be changing. You can read the original news story here.

The key point for me was this observation:

What we hear from Googlers is that supervisors are cracking down on use of 20 percent time when employees’ main projects are behind schedule. A sensible management move, but against the spirit of 20 percent time, which was meant to liberate creative employees from meddling middle management.

Google is now a decade old and rocketing forward in many business sectors. The implication is that Google needs more productivity. The flip side is that the idea of having the equivalent of one day each week to work on projects that interest a Googler is now public relations.

My sources tell me that this is not a change in policy, just a reaction to work load. If I learn more, I will let you know. I calculated that the 20 percent rule, if applied to 12,000 engineers with an average salary of $130,000 per year including benefits, added hundreds of millions of dollars in research costs to the company. This cost does not appear as part of Google’s “regular” R&D activity, but the approach has produced some interesting innovations for the company.
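The arithmetic behind that estimate is easy to check; the head count and salary figures are the ones I cited above, and the fraction is simply one day in five.

```python
# Back-of-envelope check of the 20 percent time cost estimate:
# 12,000 engineers, $130,000 average fully loaded salary, one day in five.
engineers = 12_000
avg_salary = 130_000  # per year, including benefits
fraction = 0.20       # the "20 percent time" share of the work week

implied_cost = engineers * avg_salary * fraction
print(f"${implied_cost:,.0f} per year")  # roughly $312,000,000 per year
```

About $312 million a year: squarely in the “hundreds of millions” range.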

Stephen Arnold, June 6, 2008

Microsoft: Role-Based Approach to Enterprise Apps

June 6, 2008

Colin Barker, ZDNet UK, wrote an interesting article “Microsoft Launches Connected, Role-Based CRM.” You will want to read the full story here. The key idea is that Dynamics AX 2009 (one of the different flavors of customer relationship management software Microsoft sells) supports roles. The idea is that a user, once assigned a role, interacts with the system from the point of view of the role. The article quotes Microsoft’s Gary Turner, who makes this point:

This is different from the way in which ERP systems have worked in the past, where everyone has one ‘vanilla’ front end… A chief executive will look at the information differently from someone in marketing or whatever. Your needs and requirements will be different.

The system also supports direct connections to eBay (the troubled online retailer) and PayPal. The system, if I understand Mr. Turner correctly, supports smartphone access. The default Dynamics user-facing interface is a dense, detailed beastie. Presumably, the smartphone interface will be stripped down to fit the smartphone screen real estate. Support for Microsoft’s business intelligence tools is included.

Why’s this important in search?

My research indicates that role-based interfaces may be one of Microsoft’s weapons as it tries to expand the market for its different enterprise systems. Applied to search, each user would “see” an interface and search results tailored to his or her role. This personalization of the system allows Microsoft to shift from a one-size-fits-all interface to a more specialized approach to a complex system.
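To make the role-based idea concrete, here is a minimal sketch of filtering one result set per user role. The roles, documents, and field names are invented for illustration; Microsoft has published no implementation details for role-based search.

```python
# Illustrative sketch of role-based result filtering. The roles, documents,
# and the "audience" field are invented, not drawn from Dynamics or SharePoint.

RESULTS = [
    {"title": "Q2 revenue summary", "audience": {"executive", "finance"}},
    {"title": "Campaign click-through report", "audience": {"marketing"}},
    {"title": "All-hands meeting notes", "audience": {"executive", "marketing", "finance"}},
]

def results_for_role(role: str) -> list[str]:
    """Return only the titles a given role would see in its tailored view."""
    return [r["title"] for r in RESULTS if role in r["audience"]]

print(results_for_role("marketing"))
# ['Campaign click-through report', 'All-hands meeting notes']
```

The same one result set yields a different slice per role, which is the essence of the “each user sees an interface tailored to his or her role” pitch.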

Despite announcements about the integration of Fast Search & Transfer with Microsoft’s own search technology, there is little hard information available about role-based interfaces. In my opinion, competitors can offer similar functionality if the feature gets traction with customers.

Oh, the other products in the Dynamics lineup are Dynamics NAV, Dynamics GP and Dynamics SL. I have difficulty keeping each straight in my mind. Microsoft’s preference for multiple versions of products (five flavors of Vista, SharePoint’s ESS and MOSS, and four ERP systems) sends me to Google’s Microsoft search here to keep track of the differences. I rely on Google to locate Microsoft information. Response seems quicker and the index appears to be refreshed more frequently.

Stephen Arnold, June 6, 2008

Enterprise Information’s Missing Pieces

June 5, 2008

In 2001, I found myself on a panel talking about electronic information and enterprise search. The venue was Internet World. That’s right, the once dominant trade show for the brave new world of online.

I’m not sure how I ended up on the program, but I recall I was there, facing an audience of 250 people. Put the word “Internet” on a hand lettered sign in a diner’s window and a crowd would gather. The Internet has evolved but the missing pieces in the information puzzle are still with us.

Here’s an image from my PowerPoint deck.

puzzle pieces

Web log graphics are “crunched” and the result is difficult for me to read. Let me highlight each of these nine pieces of the enterprise information puzzle.

  1. Graphical editor
  2. Database engine
  3. Version controls
  4. Site manipulation tools (that is, publishing tools)
  5. Personalization tools
  6. Search engine
  7. Administrative interface
  8. Usage tracking
  9. Security services

Nothing is missing. The nine elements are identified in the graphic, and in your own organization you have each of these functions up and running. Some puzzle pieces work better than others. These are complex sub systems and functions. Variability and unevenness are to be expected.

My point in 2001 was that each of these pieces was not fitted to the others. The parts are there, but until they are integrated across different sub systems and functions, the puzzle is incomplete. In fact, you don’t even have a decent picture of what the integrated results will look like.


Touching Lightly on a Killer Task

June 5, 2008

A colleague sent me a link to a white paper written by Bikram Sankar Das, the head of Tata Consultancy Services Business Intelligence and Performance Management practice in the UK and Ireland. Tata is a massive conglomerate known for outsourcing and buying aging automobile companies. Its consulting unit’s tag line is “Experience certainty”.

I enjoy reading white papers about business intelligence and content processing. Good papers give me useful anecdotes for my talks. Not-so-hot papers are less useful. “Business Intelligence and SOA: Making the Jump” tips toward the useful side. The wrap up struck me as the strongest section of the paper; to wit:

At the same time, there are still a number of challenges to be faced. One of the defining characteristics of BI-PM is its hunger for accurate information. As users become more and more accustomed to relying on analytical tools, their demand for new kinds of data capture and analysis increases. This leads to rapid database growth, and accelerating demand for storage capacity – pushing up costs and clashing with green IT policies.

The author makes an important point: users are going to grouse unless the systems deliver heterogeneous information properly parsed and sorted in a timely way. When systems don’t deliver what the marketers promise, users won’t use the system. Bad things happen when users get cranky and find other ways to get the information needed to do their jobs.

The weaker part of the paper is the hippity-hop over the problem of data transformation. As much as one-third of an information technology budget can be consumed fiddling with data so a fancy-Dan system can do its song and dance act. The author put my teeth on edge when he wrote:

Instead of a centralized information store, a federated approach can work well. With this approach, although the information is stored in a number of different databases, the databases themselves share a common protocol for information exchange through an ‘information bus’ – making it simple to compare and analyse data from different sources. To create a successful federated infrastructure, metadata must be carefully standardized across all systems, and a data/information governance model must be adopted across the entire organization. This can often necessitate a cultural change in the process of information creation, storage and consumption.

We’re talking big money, mucho time, and quite a bit of work to deliver standardized information to a business intelligence system. There simply is neither the money nor the programming resources to crunch through the large amounts of digital information. Users don’t know about these costs, nor do they care.
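A toy example suggests why “carefully standardized” metadata hides so much work. Every field name, date format, and record below is invented to stand in for two incompatible source systems.

```python
# Toy illustration of metadata standardization across two federated sources.
# The source record layouts and the target schema are invented for this sketch.
from datetime import date

def normalize_crm(rec: dict) -> dict:
    """Pretend CRM system: stores 'CustName' and US-style M/D/YYYY dates."""
    month, day, year = (int(p) for p in rec["CreatedOn"].split("/"))
    return {"customer": rec["CustName"].strip().title(),
            "created": date(year, month, day).isoformat()}

def normalize_erp(rec: dict) -> dict:
    """Pretend ERP system: stores 'customer_name' and ISO dates already."""
    return {"customer": rec["customer_name"].strip().title(),
            "created": rec["created_date"]}

records = [normalize_crm({"CustName": "ACME corp ", "CreatedOn": "6/5/2008"}),
           normalize_erp({"customer_name": "acme corp", "created_date": "2008-06-05"})]
print(records)
# Both sources now yield {'customer': 'Acme Corp', 'created': '2008-06-05'}
```

Two source systems need two hand-built transformations for a two-field schema; multiply by dozens of systems and hundreds of fields, and the budget figures above start to look plausible.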

The hurdle next-generation business intelligence systems must get over is data transformation. A failure to explain the costs and complexities of this set of tasks fertilizes the ground for user revolt to take root.

Judge for yourself. You can download the essay here. The author minimizes what may be the most complicated work required for next-generation business intelligence.

Stephen Arnold, June 6, 2008

