Sprylogics’ CTO Zivkovic Talks about Cluuz.com
August 7, 2008
The popular media fawn over search and content processing companies with modest demos that work on limited result sets. Cluuz.com, a company I profiled here several weeks ago, offers heartier fare. The company uses Yahoo’s index to showcase its technology. You can take Cluuz.com for a test drive here. I was quite interested in the company’s approach because it uses Fancy Dan technology in a way that was immediately useful to me. Cluuz.com is a demonstration service of Toronto-based Sprylogics International Inc. The company trades on the TSX Venture Exchange under the symbol TSXV:SPY.
Because the company has roots in the intelligence community, unlocking Sprylogics took some work. Once I established contact with Alex Zivkovic, I was impressed with his responsiveness and his candor.
You can read about the origins of the Cluuz.com service as well as some of the company’s other interesting content processing technology. The company offers a search system, but its real substance is the way it processes content, even the Yahoo search index, into a significantly more useful form.
The Cluuz.com system puts on display the firm’s proprietary semantic graph technology. You can see relationships for specific subsets of relevant content. I often use the system to locate information about a topic and then explore the identified experts and their relationships. This feature saves me hours of work trying to find a connection between two people. Cluuz.com makes this a trivial task.
Mr. Zivkovic told me:
So, we have clustering. We have entity extraction. We have a relationship analysis in a graph format. I want to point out that for enterprise applications, the Cluuz.com functions are significantly richer. For example, a query can be run across internal content and external content. The user sees that the internal information is useful but not exactly on point. Our graph technology makes it easy for the user to spot useful information from an external source such as the Web in conjunction with the internal information. With a single click, the user can be looking into those information objects.
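Sprylogics keeps its method close to the vest, so what follows is not its code. It is a minimal Python sketch, under my own assumptions, of the general idea behind spotting a connection between two people: treat entities extracted from the same document as linked, then run a breadth-first search for the shortest chain between two names. The documents, entity names, and the functions `build_cooccurrence_graph` and `connection_path` are hypothetical.

```python
from collections import defaultdict, deque
from itertools import combinations

def build_cooccurrence_graph(docs):
    """docs: iterable of sets of entity names extracted from each document.
    Two entities are linked if they appear in the same document."""
    graph = defaultdict(set)
    for entities in docs:
        for a, b in combinations(sorted(entities), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def connection_path(graph, start, goal):
    """Breadth-first search: the shortest chain of linked entities, or None."""
    if start == goal:
        return [start]
    previous = {start: None}          # remembers how each entity was reached
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in sorted(graph.get(node, ())):
            if neighbor in previous:
                continue
            previous[neighbor] = node
            if neighbor == goal:
                # Walk back from the goal to the start, then reverse.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None

# Hypothetical entity sets, one per retrieved document.
docs = [
    {"Alice", "Acme Labs"},
    {"Acme Labs", "Bob"},
    {"Bob", "Carol", "Fractal Conference"},
]
graph = build_cooccurrence_graph(docs)
print(connection_path(graph, "Alice", "Carol"))
```

Breadth-first search returns the chain with the fewest hops, which is usually the connection a researcher wants to check first. A production system would weight links and prune noisy entities; this sketch does neither.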
I probed into the “guts” of the system. Mr. Zivkovic revealed:
Our engineers have worked hard to perform multiple text processing operations in an optimized way. Our technology can, in most cases, process content and update the indexes in a minute or less. We keep the details of our method close to the vest. I can say that we use some of the techniques that you and others have identified as characteristic of high-speed systems like those at Google, for example.
You can read the full interview with Mr. Zivkovic in the Search Wizards Speak interview collection on the ArnoldIT.com Web site. The full text of this exclusive interview is here. A complete index of the interviews in this series is here.
Attivio: Active Intelligence Engine Version 1.2 Released
August 7, 2008
Attivio is on my radar. The company demonstrated its next-generation business intelligence system to me at the AIIM Show in April 2008. I liked what I saw. I interviewed the founder of Attivio, and you can read that transcript on the ArnoldIT.com Search Wizards Speak site here.
Now the company has released Version 1.2 of its Active Intelligence Engine, which means the gang in Wellesley, Massachusetts, is on the move. You can read the write up here.
Attivio, like some other newcomers, is not just a search or information access engine. Attivio has rethought the problem of getting information to employees who are under pressure or just in a hurry to get their kids from day care. I will dig into some of the new features later this month.
For now, let me highlight some of the new features of AIE 1.2. (I must admit that when I pronounce AIE, I imagine the sound of a scream from competitors. Then, after the scream dies down, I hear, “Why didn’t we implement those functions?”)
Four points about Version 1.2 caught my attention:
- AIE is positioned as a platform. The idea is that you can deploy quickly and build on top of the AIE system.
- A rich index that combines ease of use with the type of precision associated with structured query language statements. To me, this means I can get what I need without trying to get a programmer’s attention or spending time flipping through a SQL manual.
- Fast index updates and real-time alerting.
- New connectors and support for 50 languages.
Attivio wants to deliver business intelligence without the hassles of most older business intelligence systems. The bottom line for me is that Attivio is focusing on basics: speed, ease of use, quick deployment, and platform extensibility. You can learn more about Attivio here.
Stephen Arnold, August 7, 2008
Search Options: Betting in a Down Economy
August 5, 2008
Paula Hane, who writes for Information Today, the same outfit paying me for my KMWorld column, has a very interesting rundown of search engine options here. I agree with most of her points, and I think highly of the search systems that she has flagged as alternatives to Google.
But I want to take another look at search options and, true to my rhetorical approach, take the opposite side of the argument. Ms. Hane, who knows me, has remarked on this aspect of my way of looking at information. Keep in mind that I am not critical of her or of Information Today. I want to be paid for my most recent column, which covers Google’s geospatial services and will be my next column for KMWorld.
Here goes. Let’s get ready to rumble.
First, search is no longer “about search”. Search has become an umbrella term for what I see as the next supranational monopoly. If you are looking for information, you probably use search less than 20 percent of the time. Most people locate information by asking someone or browsing through whatever sources are at hand. Search seems to be the number one way to get information, but many people navigate directly to sites where the answer (an answer) may be found. I routinely field phone calls from sharp MBAs who prefer to be told something, not hunt for it.
Second, fancy technology is neither new nor fancy. Google has some rocket science in its bakery. The flour and the yeast date from 1993. Most of the zippy “new” search systems are built on “algorithms”. Some of Autonomy’s technology reaches back to the 18th century. Other companies just recycle functions that appear in books of algorithms. What makes something “new” is putting the pieces together in a delightful way. Fresh, yes. New, no. Software lags algorithms and hardware. With fast and cheap processors, some “old” algorithms can be used in the types of systems Ms. Hane identifies; for example, Hakia, Powerset, etc. Google is not inventing “new” things; Google is cleverly assembling bits and pieces that are often well known to college juniors taking a third-year math class.
Third, semantics, like natural language processing, is a hot notion. My view is that semantics work best in the plumbing. Language is slippery, and the semantic tools in use today add some value, but often the systems need human babysitters. No one, including me, types well-formed questions into a search box. I type two or three words, hit enter, and start looking at hits in the result list.
Fourth, social search sounds great. Get 200 smart people to be your pals and you can ask them for information. We do this now, or at least well connected people do. As soon as you open up a group to anyone, the social content can be spoofed. I understand the wisdom of crowds, and I think the idea of averaging guesses for the number of jelly beans in a jar is a great use of collective intelligence. For specialized work, let me ask a trusted expert in the subject. I don’t count jelly beans too often, and I don’t think you do either. Social = spoof.
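The jelly bean point can be made concrete. This Python sketch, with made-up guesses, shows how a handful of spoofers who join an open group can drag the crowd's average. The guesses and the `crowd_estimate` helper are hypothetical, and the median is included only to show that some aggregates resist spoofing better than others.

```python
from statistics import mean, median

def crowd_estimate(guesses):
    """Return the (mean, median) of a list of guesses."""
    return mean(guesses), median(guesses)

# Hypothetical honest guesses for the number of jelly beans in a jar.
honest = [820, 950, 1010, 1100, 990, 1050, 870, 1200, 940, 1070]

# Hypothetical spoofers join the open group and all submit the same inflated number.
spoofed = honest + [5000, 5000, 5000]

print("honest crowd: ", crowd_estimate(honest))
print("spoofed crowd:", crowd_estimate(spoofed))
```

Ten honest guesses average 1,000 beans. Three spoofed entries nearly double the mean, while the median moves only to 1,050. Arithmetic is not a cure, though; enough spoofers will move any aggregate.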
Fifth, use a search system because a company pays you. Sorry, I don’t think this is a sustainable business model. Search is difficult. Search requires that a habit be formed. If the pay angle worked, the company would find that it becomes too expensive. Paying for search works only because not many people search to get paid. When a person searches, there’s a reason. Getting a few pennies is not going to make me change my habits.
What’s this mean for Google competitors?
My contrarian analysis implies:
- Competitors have to leapfrog Google. So far no one has been able to pull this off. Maybe some day. Just not today or for the foreseeable future.
- Google is not a search system. It’s an application platform. Search with the search box is just one application of the broader Google construct.
- Google will lose its grip on search. As companies get larger, those companies lose their edge. This is happening to Google now. Look at how many of its services have no focus. Talk to a company that wants to get customer support. Google is losing its luster, and this means that the “next big thing” could come from a person who is Googley, just not working at Google.
So, Ms. Hane, what are we going to do with lame duck search solutions in a world dominated by a monopolistic supranational corporation that’s working on its digital arteriosclerosis? Dear reader, what do you say? Agree with me? Agree with Ms. Hane? Have another angle? Let me know.
Stephen Arnold, August 5, 2008
Vivisimo: Organizations Need a Search Strategy
August 3, 2008
Vivisimo, a company benefiting from the missteps of better-known search vendors, has a new theme for its Fall sales push. Jerome Pesenti, chief scientist for Vivisimo, delivered a lecture called “Thinking Outside the (Search) Box”. The company issued a news release about the need for an organization to have an enterprise search strategy in order to prove the return on investment for a search system. What is remarkable is that scientists are now providing consulting guidance, much as Eric Schmidt offers opinions about how other companies should innovate here. MBAs, accountants, and lawyers have long been the business gurus to whom challenged organizations turned for illumination. Now, a Ph.D. in math or a hard science provides the foundation for giving advice and counsel. Personally, I think that scientists have a great deal to offer many of today’s befuddled executives. You will want to download the presentation here. You will have to register. I think that the company will use the names to follow up for marketing purposes, but no one has contacted me since I registered as Ben Kent, a name based on the names of beloved pets.
Is Vivisimo’s ROI Number Right?
For me, the key point in the Vivisimo guidance (I am paraphrasing, so your take may differ from mine) is that an organization needs to consider user needs when embarking on an enterprise search procurement. Mr. Pesenti reveals that Modine Manufacturing saved an estimated $3.5 million with a search strategy and the Vivisimo Velocity system. You can learn more about Modine here. The company had about $1.8 billion in revenue in 2008, and it may punch through the $2.0 billion barrier in 2009. I know that savings are important, but when I calculated the savings as a percent of revenue, I got a small number. The payoff from search seems modest, but the $3.5 million is “large” in terms of the actual license fee and the estimated ROI. My thought is that if a mission-critical system yields less than one percent return on investment, I would ask these questions:
- How much did the search system cost fully loaded; that is, staff time, consultants, license fees, and engineering?
- What’s the ongoing cost of maintaining and enhancing a search system; that is, when I project costs outwards for two years, a reasonable life for enterprise software in a fast moving application space, what is that cost?
- How can I get my money back? What I want as a non-scientific consultant and corporate executive is a “hard” number directly tied to revenue or significant savings. If I am running a $2.0 billion per year company, I need a number that does more than twiddle the least significant digits. I need hundreds of millions to keep my shareholders happy and to keep my country club membership.
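The Modine arithmetic is easy to check. This Python sketch divides the estimated $3.5 million in savings by the revenue figures cited above. The `roi` helper uses a hypothetical fully loaded cost of $1.5 million, because the actual cost was not disclosed; substitute your own number from the first two questions.

```python
def savings_share_of_revenue(savings, revenue):
    """Return savings as a percentage of annual revenue."""
    return 100.0 * savings / revenue

def roi(savings, fully_loaded_cost):
    """Classic ROI: net gain divided by cost (1.0 means 100 percent)."""
    return (savings - fully_loaded_cost) / fully_loaded_cost

MODINE_SAVINGS = 3.5e6      # estimated savings reported in the Vivisimo presentation
REVENUE_2008 = 1.8e9        # approximate 2008 revenue cited in the text
REVENUE_2009 = 2.0e9        # the $2.0 billion mark Modine may reach in 2009
HYPOTHETICAL_COST = 1.5e6   # assumption: the real fully loaded cost is unknown

print(f"Share of 2008 revenue:  {savings_share_of_revenue(MODINE_SAVINGS, REVENUE_2008):.3f}%")
print(f"Share of $2.0B revenue: {savings_share_of_revenue(MODINE_SAVINGS, REVENUE_2009):.3f}%")
print(f"ROI on a hypothetical $1.5M cost: {roi(MODINE_SAVINGS, HYPOTHETICAL_COST):.0%}")
```

The savings work out to roughly 0.19 percent of 2008 revenue and 0.175 percent of $2.0 billion. Even a healthy ROI on the search project itself leaves the company-level number in the least significant digits.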
Enterprise search vendors continue to wrestle with the ROI (MBA speak for proving that spending X returns Y cash) for content processing. Philosophically, search makes good business sense. In most organizations, an employee can’t do “work” unless he or she can find electronic mail, locate an invoice, or unearth the contract for a customer who balks at paying his bill. One way to measure the ROI of search is the approach used by Sue Feldman and her colleagues. Ms. Feldman, a pretty sharp thinker, focuses on time; that is, an employee who requires 10 minutes to locate a document by rooting through paper folders costs the company 10 minutes’ worth of salary. Replace the paper with a search system from one of the hundreds of vendors selling information retrieval, and you can chop that 10 minutes down to one minute, maybe less.
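Here is a minimal Python sketch of a time-based estimate in the spirit of the approach described above. It is my illustration, not Ms. Feldman's published model, and every input (lookups per day, working days, loaded hourly cost, head count) is a hypothetical placeholder.

```python
def annual_time_savings(minutes_before, minutes_after, lookups_per_day,
                        working_days, hourly_cost, employees):
    """Dollar value of the minutes saved per year when a lookup gets faster.

    Assumes every saved minute is worth the employee's loaded hourly cost,
    which is the key (and debatable) premise of a time-based ROI.
    """
    minutes_saved = (minutes_before - minutes_after) * lookups_per_day * working_days
    return minutes_saved / 60.0 * hourly_cost * employees

# Hypothetical inputs: 10 minutes cut to 1 minute, one lookup a day,
# 230 working days, $40 per hour loaded cost, 1,000 employees.
value = annual_time_savings(10, 1, 1, 230, 40.0, 1000)
print(f"${value:,.0f} per year")
```

The sketch also shows why the number is soft: it counts saved minutes as cash, but saved minutes do not show up on the income statement unless they turn into revenue or reduced costs.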
This is the land of search costs. What’s your return on investment when you wade into this muck?
Problems with ROI for Utility Functions
The problem with any method of calculating ROI for a non-fungible service that incurs ongoing costs is that accounting systems don’t capture the costs. In the US government, costs are scattered hither and yon, and not too many government executives work very hard to pull “total costs” together. In my experience, corporate cost analysis is somewhat similar. When I look at the costs reported by Amazon, I have a tough time figuring out how Mr. Bezos spends so little to build such a big online and search system. The costs are opaque to me, but I suppose MBA mavens can figure out what he spends.
The problem that search, content processing, and text analytics vendors can’t solve is demonstrating the value of investments in these complex information retrieval technologies. Even in tightly controlled, narrowly defined deployments of search systems, costs are tough to capture. Consider the investment special operations groups make in search systems. The cost is usually reported in a budget as the license fee, plus maintenance and some hardware. The actual cost is unknown. Here’s why. How do you capture the staff cost of fixing a glitch in a system that must absolutely stay online? That extraordinary cost disappears into a consulting or engineering budget. In some organizations, an engineer works overtime and bills the 16 hours to a project or maybe to a broad category called “overtime”. Magnify this across a year of operations for a troubled search system, and those costs exist but are often disassociated from the search system. Here’s an example. The search system kills a network device during a usage spike. The search system’s network infrastructure may be outsourced, and the engineer records the time as “network troubleshooting.” The link to the search system is lost; therefore, the cost is not accrued to the search system.
In one search deployment, the first-year operating cost was about $300,000. By the seventh year, the costs had risen to $23.0 million. What’s the ROI on this installation? No one wants to gather the numbers and explain these costs. The standard operating procedure among vendors and licensees is to chop up the costs and sweep them under the rug.
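To illustrate how the link gets lost, here is a Python sketch of a tiny cost ledger. Each entry records the budget line a cost was booked to and, hypothetically, the system that actually caused it; in most real ledgers only the first field exists. The `LedgerEntry` type, the `totals_by` helper, and all entries and amounts are made up.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class LedgerEntry:
    description: str
    booked_to: str    # the budget line the cost was recorded under
    root_cause: str   # the system that actually caused the cost (rarely recorded)
    amount: float

def totals_by(entries, field):
    """Sum entry amounts grouped by the named field."""
    totals = defaultdict(float)
    for entry in entries:
        totals[getattr(entry, field)] += entry.amount
    return dict(totals)

# Hypothetical year of costs for one search system.
entries = [
    LedgerEntry("License fee", "search", "search", 250_000),
    LedgerEntry("Maintenance", "search", "search", 50_000),
    LedgerEntry("Weekend overtime, 16 hours", "overtime", "search", 2_400),
    LedgerEntry("Replace failed network device", "network troubleshooting", "search", 18_000),
    LedgerEntry("Outside consultant", "engineering", "search", 40_000),
]

print("Booked to search:   ", totals_by(entries, "booked_to")["search"])
print("Caused by search:   ", totals_by(entries, "root_cause")["search"])
```

Totaled by budget line, the search system appears to cost $300,000. Totaled by root cause, the same hypothetical entries come to $360,400; the difference is hidden in overtime, network, and engineering budgets.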
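The growth implied by those two figures is worth computing. This Python sketch assumes, for simplicity, smooth compound growth between year one and year seven (six annual steps); real costs probably arrived in lumps. The `implied_annual_growth` helper is my own.

```python
def implied_annual_growth(first_cost, last_cost, periods):
    """Compound annual growth rate that turns first_cost into last_cost."""
    return (last_cost / first_cost) ** (1.0 / periods) - 1.0

# Year 1 to year 7 is six compounding periods.
growth = implied_annual_growth(300_000, 23_000_000, periods=6)
print(f"Implied annual cost growth: {growth:.1%}")
```

The answer is a bit over 100 percent a year; on these assumptions, the cost of operating the system roughly doubled every year for six years.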
Stanford TAP: Google Cool that Trails Cuil
July 31, 2008
In the period from 2000 to 2002, Dr. Ramanathan Guha, with the help of various colleagues and students at Stanford, built a demonstration project called TAP. You can download a PowerPoint presentation here. I verified this link on July 30, 2008. Frankly, I was surprised that this useful document was still available.
TAP was a multi-organization research effort. Participants included IBM, Stanford, and Carnegie Mellon University.
Why am I writing about information that is at least six years old? The ideas set forth in the PowerPoint were not feasible when Dr. Guha formulated them. Today, the computational power of multi core processors coupled with attractive price-performance ratios for storage makes the demos from 2002 possible in 2008.
TAP was a project set up to unify islands of XML from disparate Web services. TAP also brushed against automatic augmentation of human-generated Web content. Working with Dr. Guha was Rob McCool, one of the developers of the common gateway interface. Mr. McCool worked at Yahoo, and he may still be at that company. Were he to leave Yahoo, he might want to join some of his former colleagues at Google or a similar company.
Now back to 2002.
One of TAP’s ambitious goals was to “make the Web a giant distributed database.” The reason for this effort was to bring “the Internet to programs”. The Web, however, is messy. One problem is that “different sites have different names for the same thing.” TAP wanted to develop a system and method in which descriptions, not editors, choreograph the integration.
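The naming problem above can be sketched in a few lines. The Python below is my own illustration, not TAP code: each site's field names are mapped onto a shared vocabulary, and two records are treated as the same thing when their identifying properties match, even though their names differ. The site vocabularies, the ISBN-style key, the placeholder values, and the helpers `normalize`, `same_thing`, and `merge` are hypothetical.

```python
# Hypothetical mapping from each site's field names to one shared vocabulary.
SCHEMA_MAP = {
    "site_a": {"title": "name", "isbn_13": "isbn", "writer": "author"},
    "site_b": {"book_name": "name", "ISBN": "isbn", "list_price": "price"},
}

# Properties that, taken together, identify one thing regardless of its name.
IDENTIFYING = ("isbn",)

def normalize(site, record):
    """Rename a site's fields to the shared vocabulary; drop unmapped fields."""
    mapping = SCHEMA_MAP[site]
    return {mapping[key]: value for key, value in record.items() if key in mapping}

def same_thing(a, b, keys=IDENTIFYING):
    """True when both descriptions agree on every identifying property."""
    return all(key in a and key in b and a[key] == b[key] for key in keys)

def merge(a, b):
    """Combine two descriptions; values already in `a` win on conflicts."""
    merged = dict(a)
    for key, value in b.items():
        merged.setdefault(key, value)
    return merged

# Placeholder records: different names, same identifying property.
record_a = {"title": "Search Strategies", "isbn_13": "978-0-00-000000-0", "writer": "A. Author"}
record_b = {"book_name": "Search Strategies, 2nd printing", "ISBN": "978-0-00-000000-0", "list_price": "20.00"}

a = normalize("site_a", record_a)
b = normalize("site_b", record_b)
if same_thing(a, b):
    print(merge(a, b))
```

Once the records are reconciled by description rather than by an editor matching names by hand, a program can query the merged data as if it came from one database, which is the TAP goal quoted above.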
The payoff for this effort, according to Dr. Guha and Mr. McCool, is that “good infrastructures have waves of applications.” I think this is a very important point for two reasons:
- The infrastructure makes the semantic functions possible and then the infrastructure supports “waves of applications”.
- The outputs of the system described are new combinations of information, different ways to slice data, and new types of queries, particularly those related to time.
Here’s a screen shot of TAP augmenting a query run on Google.
The augmented results appear to the left of the results list. These are sometimes described as “facets” or “assisted navigation hot links”. I find this type of enhancement quite useful. I can and do scan result lists. I find overviews of the retrieved information and other information in the system helpful. When well executed, these augmentations are significant time savers.
Keep in mind that when this TAP work was done, Dr. Guha did not work at Google. Mr. McCool was employed at Stanford. Yet the demo platform was Google. I also find it interesting that the presentation emphasizes this point: “We need [an] infrastructure layer for semantics.”
Let me conclude with three questions:
- Google was not directly mentioned as participating in this project, yet the augmented results were implemented using Google’s plumbing. Why is this?
- The notion of fueling waves of applications seems somewhat descriptive of Google’s current approach to enhancing its system. Are semantic functions one enabler of Google’s newer applications?
- When will Google implement these enhanced features in its interface? As recently as yesterday, the Cuil.com interface was described as more up to date than Google’s. Google had functionality in 2002 or shortly thereafter that goes beyond what Cuil.com shows today.
Let me close with a final question. What’s Google waiting for?
Stephen Arnold, July 31, 2008
Cluuz.com: Useful Interface Enhancements
July 31, 2008
Cluuz.com is one of the search companies tapping Yahoo’s search index. Cluuz.com has introduced some useful interface changes. I will be digging into this system in future write-ups, but I want to call your attention to one of the innovations I found useful. (My first Cluuz.com write-up is here.)
Navigate to Cluuz.com here. Enter your query. You will see a result screen that looks like the one for my query “fractal frameworks”.
The three major changes shown in this screenshot are:
- Entities appear in the tinted area above the graphic. My test queries suggested to me that Cluuz.com was identifying the most important entities in the result set.
- A top-ranked link with selected images. Each image is a hot link. I could tell quickly that the top-ranked document included the type of technical diagram that I typically want to review.
- A selected list of other entities and concepts.
Cognition Rolls Out Semantic Medline
July 30, 2008
Resource Shelf reports that Cognition Technologies has indexed Medline content with its semantic search system. The new service is free, and you can try it yourself at http://www.semanticmedline.com/. Remember that you will be searching abstracts, not the full text of medical documents.
You can read the Resource Shelf story here. The point that jumped out at me was:
[This is] a new free service that enables complex health and life science material to be rapidly and efficiently discovered with greater precision and completeness using natural language processing (NLP) technology.
Cognition Technologies, like Hakia, develops semantic search and content processing systems. You can find out more about the company here. The company also offers a demonstration of its content processing applied to Wikipedia. You can access that service here.
Stephen Arnold, July 30, 2008
Funnelback CTO Interview Now Available
July 29, 2008
Dr. David Hawking, the chief technical officer of Funnelback, has joined the search and content processing company full time. Dr. Hawking is well known among the information retrieval community. His students have joined Google and Microsoft Research. Dr. Hawking’s interview with ArnoldIT.com is now available as part of the Search Wizards Speak series at www.arnoldit.com/search-wizards-speak.
Dr. Hawking said that Funnelback, now in version 8, delivers search ranking quality and tunability, geospatial query processing, folksonomy tagging of search results, streamlined setup and configuration, customizable workflows, and a software as a service option. In short, Funnelback is a capable enterprise search solution.
Funnelback, based in Canberra, Australia, has a number of high-profile clients in Australia and New Zealand. The company also has clients in the United Kingdom and Canada.
Dr. Hawking said,
Funnelback includes an intuitive Web based administration interface for configuration, user interface customization and viewing query reports. No programming skills are required for the majority of configuration tasks, but deeper integrations can be achieved by developing specific interfaces to work with various enterprise applications such as content management systems or portal applications.
The next release of Funnelback will appear in the first half of 2009. The company has plans to expand into other countries, but Dr. Hawking would not reveal specific plans for new offices. He hinted that Funnelback is working on solutions for vertical markets. The company already has a vertical implementation for one of Australia’s law enforcement agencies. That project has been well received by the users.
You can read the full text of the interview here. Information about the company is here.
Stephen Arnold, July 29, 2008
Google’s Publishing Baby Step
July 29, 2008
I have written about Knol, Google’s publishing technology, in Google Version 2.0. Outsell (a consulting firm) recycled some of my Google publishing research in the summer of 2007. I will have an update available from my UK publisher, Infonortics, Ltd., in Tetbury, Glos., in September 2008. If you want to read my take on Google’s publishing technology, you can snag a copy of Google Version 2.0 here. In my analysis, Knol is a publishing baby step, but it is an important one because it delivers two payoffs: [a] content to monetize and [b] inputs for Google’s smart software. In Google Version 2.0, I explain why Google wants to process quality content, not just Webby dogs and cats.
You may also want to read Andrew Lih’s “Google Knol Wikipedia Comparison Faulty” analysis here. Mr. Lih does a good job of pointing out what Knol is and is not. His identification of Google’s “real competition” is solid and particularly useful to those confused about the competition Google faces. The part of his essay I enjoyed most was his “grading” of those who covered the Knol story. He identifies who did poorly, who was stuck in the mire of the bell curve, and which informed souls received a gold star for excellence. I won’t spoil your fun, but at the back of the class you will find some names with which you are familiar.
A happy quack to Mr. Lih.
Stephen Arnold, July 29, 2008
Opinion: Cuil, Google, and Microsoft
July 28, 2008
Before I go out and feed the geese on my pond in Harrods Creek, I wanted to offer several unsolicited comments about Microsoft, Cuil, and search.
First, now that Microsoft has its own search technologies, Fast Search & Transfer’s search technologies for the enterprise and the Web, and Powerset’s search technologies, does Cuil look cool?
This is a tough question, and I don’t think that Microsoft had much knowledge of the Cuil team and its work in search. My research suggests that work on Cuil began in earnest in 2007. The work profiles of the Cuil team are decidedly non-Microsoft. My thought is that Microsoft did not have a competitive profile of this company. My working hypothesis is that this search system struck Microsoft like a bolt from the blue.
Second, will Microsoft buy Cuil? This is a question that will probably garner some discussion at Microsoft. The Linux “heads” at Microsoft will probably resonate with the idea. Cuil incorporates some of the “beyond” Google technology that one can also find at Exalead. The architecture of these “beyond” Google operations might be quite useful to Microsoft. On the other hand, Microsoft is charging forward with its own approach to massively parallel distributed systems, so the “beyond” Google engineering would be a tough pill to swallow.
Third, will Cuil get traction? The answer is yes. My hypothesis is that the folks who flock to Cuil will be Google users, but the real impact of Cuil may well be taking orphaned or disaffected users from Ask.com, Live.com, and Yahoo.com search.
The short term impact on Google may be significant for several reasons:
- Cuil has poked a finger in Google’s eye with its user tracking policy. Simply stated, Cuil won’t build user and usage profiles that tie to an individual in a stateful session or to an individual assigned to a fine-grained group of clusters in a stateless session. See my July/August KMWorld feature for more about the data model of this type of tracking.
- Cuil hit Google with its larger index: 120 billion Web pages processed compared with Google’s 30 to 40 billion. Keep in mind that size doesn’t matter, but it is a public relations hook that could snare Googzilla around the ankles.
- Cuil includes bells and whistles that have not been released on the public Google system. For example, there are snazzier results displays, insets for suggested searches, and tabs that allow slicing of results. Google has these features, but the GOOG keeps them under wraps. Right now, Cuil looks cooler (pun intended). The Cuil search page is black, which even says “green”. Clever.
Google now has to sit quietly and watch Xooglers implement features that Google has had in the can for years. Interesting day for both Microsoft (Should we buy Cuil too?) and Google (What’s the next step for the Xooglers’ service?).
Stephen Arnold, July 28, 2008