Google Mini Signals Maxi Change

May 16, 2008

In San Francisco earlier this week, I spent some time with one of my tech pals. In the course of the conversation, we talked about the lousy margins on hardware, even the flashiest gear from HP, IBM, and Sun. He said, “Too much cost, not enough fast cash.” He also told me that Google was going to trim its line of Google Search Appliances.

Yesterday, TechCrunch–an information life support device for my aging self in rural Kentucky–said much the same thing. Mark Henderson’s “Rumor: Google to Launch Hosted Site Search, Ditch Mini” appeared on May 15, 2008. One point that jumped out at me was:

It’s not exactly clear what this decision means for the enterprise search industry, but it won’t be surprising if Google does indeed come out with a cloud-based solution.

The comments from my friend and Mr. Henderson’s blockbuster mesh with what I have learned from people using Google’s custom search. Custom search is a no-charge way to get Google search for your Web site. We use it as one search option for ArnoldIT.com’s Web log, “Beyond Search”. We’ve tested the function and found that it works with near-zero latency and spiders tirelessly, often picking up changes to test custom search pages in less than 15 minutes.

Why do we think this “mini” change signals a “maxi” shift? There are three reasons:

Google isn’t in the hardware business. Google’s wizards love hardware, and the company has patent applications that are stuffed with fans, racks, and other gizmos. But hardware equals support, and if there’s one thing less exciting to a Googler than attending a lecture on ancient Greek pottery, it’s dealing with a flesh-and-blood customer.

Google moves in surprisingly small, incremental steps for a giant company. Any shift to a cloud-based service is no casual decision. Folks, we have a signal.

The Google Search Appliance is a beast of burden. To get the most out of the OneBox API and deliver the functionality that customers are discovering is possible, more robust devices are needed. The “blue” Mini was a black sheep compared to the “yellow” GB (Google Box) siblings.

Companies that dismiss Google’s enterprise ambitions are certainly free to continue emulating the ostrich. The more strategically minded may want to increase their fly-bys of the Googleplex. The enterprise market with its billions appeals to Google’s financial officer.

Stephen Arnold, May 15, 2008

Data Harmony Update a Suite Release

May 16, 2008

Access Innovations Inc., a data management systems company, is releasing version 3.4 of its Data Harmony software suite, and it sounds like a sweet deal.

The five-component software is used to make and maintain taxonomies, thesauri, and indexing systems. Data Harmony focuses on accuracy, precision, and repeatability in its search results, an emphasis that receives a happy quack from the ArnoldIT mascot.

The major updates include more than 30 new features and revised documentation (to keep you in tune). The company says current users will recognize the same look and feel of the program and appreciate “friendlier and more functional features.”

President and Chairman Marjorie M.K. Hlava said the upgrade comes courtesy of user requests and suggestions. It’s refreshing to find a tech company making such efforts to rework a good product and actually making it better. We find Ms. Hlava’s old-fashioned, hands-on, we-care approach welcome at a time when software vendors do better PR than coding. The full list of the Data Harmony enhancements for 3.4 can be found here.

Jessica Bratcher, May 16, 2008

Enterprise Search Vendors’ Taglines

May 16, 2008

A colleague in San Francisco asked me on May 14, 2008, “How do the search engine vendors position themselves?”

I told him that I would think about the question on the luxurious red-eye flight from SFO to Detroit. I did. I worked through the files on my trusty laptop and compiled a list of the taglines for some of the vendors whom I monitor. The list is not exhaustive, but I had data about a couple of dozen companies in the behind-the-firewall search business.

The table below provides a summary of the taglines. These are quite interesting, and I was surprised at the different approaches taken to explaining the companies’ systems. For example, I liked the taglines that echoed Caesar’s “I came, I saw, I conquered” (Veni, vidi, vici). SchemaLogic says, “Find. Use. Protect.” Thetus asserts, “Find. Assess. Fit. Understand.” Lexalytics crafts, “Discover. Understand. Act.”

Several of the companies use active or instrumental catchphrases. Brainware, a spin out from a German content management company, uses, “Intelligence unleashed.” I thought of a tiger pursuing me through the Louisville Zoo. And InQuira says, “Harvest knowledge.” Nstein, a company that has undergone accelerated evolution,

Less creative influences put a damper on marketing passion in these slogans. Panoptic (now Funnelback) gently offers, “Internet and Enterprise Search.” Almost matching the Australian company’s tagline is Fast Search & Transfer’s “The business of search.” Clearforest matches these in understatement with its “Text Analytics Solutions.” ZyLAB comes close too, saying, “Information Access Solutions.”

Other companies use the tagline as an elevator speech on a diet. For example, Endeca, flush with investments from Intel and SAP, states, “Innovative Software to Help People Explore, Analyze, and Understand Information.” Not to be outdone in the pitch department is ISYS Search Software’s “Enterprise Search Solutions for Real People Doing Business in the Real World.” (I like the “real” part of this statement because some of the taglines are a bit abstract.) Stratify (formerly Purple Yogi) strikes a Zen-like note: “Focus on the Matter of eDiscovery with Peace of Mind.” When I repeat this five times, my heart rate slows and my blood pressure drops.

Other vendors assert, in a nice way of course, that their system is Numero Uno in the search-and-retrieval sector. Open Text, a company with as many search technologies as Microsoft, declares itself “The Content Experts.” And Dieselpoint opines, “The Leader in Search & Navigation Technology.”

A small number of vendors drift into the poetic. Exegy uses repetition and alliteration to explain its super-fast appliance: “Extreme Speed. Extreme Insight.” Or consider SurfRay (owner of Mondosoft and Speed of Mind) and its rhythmic “We Move People to Discover.” Note that SurfRay itself, a relative newcomer to search, describes itself this way, “Pioneers in Enterprise Search and Behavior Analytics.” Strong stuff and sure to catch the attention of Autonomy, working overtime to catch up with the “Don’t be evil” Googlers.


Google Translate

May 15, 2008

The Google Search Appliance is a pretty nifty gizmo when you know how to “pimp” your GSA with the OneBox API. On May 15, 2008, the GOOG confirmed what I heard at the Where 2.0 conference yesterday afternoon: Google Translate now handles another 10 languages. You can read the Googlers’ official announcement here.

Why mention this on Beyond Search with its narrow editorial scope? Well, in addition to cross-language searches, you get to play with the AJAX language API. For the clever kids at Adhere Solutions, you can use some of this translation goodness to allow a language-challenged person to read a document written in another language by a colleague halfway across the world.

These baby-step announcements can be overlooked unless you keep your ears attuned to the sound of big Googzilla paws advancing toward the enterprise. What will Google tell me about this “interesting” expansion of Google Translate? I’m persona non grata, so my queries fall on deaf but wealthy ears. I can hear those claws scraping across the pavement of Shoreline Drive.

Stephen Arnold, May 15, 2008

Content Transformation: A Challenge that Won’t Go Away

May 15, 2008

We live in a world of Web 2.0 and Web 3.0 goodness. At the Where 2.0 conference in Burlingame, California on May 14, 2008, I overheard this snippet of conversation:

We had everything working, but when we imported content, the system crashed. I reinstalled. I checked the config files. It still crashed. I have to open each file, resave it as an RTF, and import them one at a time. Grrrr.

Sound familiar?

I have heard this complaint many times before. In our content-savvy, XML-ized era, moving a source file into a content processing system should be trivial. The content processing system can extract entities. It can metatag. Some can slice, dice, and cook a chicken. But unless the system can intake content and transform it to something that the content processing subsystem understands, the system is dead in the water. Even worse, the text processing system only processes some of the source documents. In certain mission critical applications, kicking out documents is a no-no. Not only is the manual manipulation expensive, it’s time consuming. In those minutes or hours of fiddling, potentially significant data are not available to the analysts. What does missing information cost? Well, it depends on your work situation. In the Wall Street world, investment information can turn a win into a loss in a millisecond. In certain military applications, the information may mean the difference between health and harm.


Transforming a square into a circle or a circle into a square looks easy. With a triangle and a compass you can create the two objects. It’s the intermediate steps that become tricky for an artist or a budding mathematician.

What is file or data transformation? In its simplest form, you have a file in Microsoft Word 2007 format, and you want to “transform” or change the file into a format recognized by another system’s import filter. So, one approach would be to open the file in Word 2007, click File and then Save As, select RTF (Rich Text Format), and save the file. You can then allow your search or content processing system to suck the file into the conversion subsystem and turn the RTF into whatever target output format the filter generates. In a more sophisticated form, you take an unstructured document or a database table, and you transform it into some file type that your system can process. A more interesting task is to convert a file into a file with a comparable structure; for instance, take an SGML instance and convert it to HTML. Some search system vendors include filters and transformation tools with their system. Others provide an application programming interface. The idea is that you will write a script to perform whatever conversion you require, handle entities in an appropriate manner, and preserve the information and metadata (if available) throughout the process.
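To make the "write a script" idea concrete, here is a minimal sketch in Python. It assumes a made-up source vocabulary (`<doc>`, `<meta>`, `<title>`, `<para>`) and uses well-formed XML as a stand-in for SGML, which the standard library does not parse. The function names are hypothetical, and no vendor's filter works exactly this way. The sketch carries metadata across, re-escapes entities for the target format, and sets aside documents that fail to parse instead of crashing the whole import.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical source markup, invented for this sketch.
SAMPLE = """<doc>
  <meta name="author" value="A. Writer"/>
  <title>Q2 Results &amp; Outlook</title>
  <para>Revenue rose 5% &lt; forecast.</para>
</doc>"""


def transform(source: str) -> str:
    """Convert the hypothetical <doc> markup into a small HTML page."""
    root = ET.fromstring(source)  # raises ET.ParseError on malformed input
    lines = ["<html>", "<head>"]
    # The parser decodes entities; html.escape re-encodes them for HTML.
    title = root.findtext("title", default="")
    lines.append(f"<title>{html.escape(title)}</title>")
    # Preserve metadata by mapping each <meta> entry to an HTML meta tag.
    for meta in root.findall("meta"):
        name = html.escape(meta.get("name", ""), quote=True)
        value = html.escape(meta.get("value", ""), quote=True)
        lines.append(f'<meta name="{name}" content="{value}">')
    lines += ["</head>", "<body>"]
    for para in root.findall("para"):
        lines.append(f"<p>{html.escape(para.text or '')}</p>")
    lines += ["</body>", "</html>"]
    return "\n".join(lines)


def transform_batch(sources):
    """Convert every document; collect failures instead of stopping."""
    converted, rejected = {}, {}
    for name, text in sources.items():
        try:
            converted[name] = transform(text)
        except ET.ParseError as err:
            rejected[name] = str(err)  # keep the reason for later review
    return converted, rejected
```

Calling `transform_batch({"good": SAMPLE, "bad": "<doc><para>oops</doc>"})` converts the first document and reports the second in the rejected dictionary, so a person sees which files were kicked out and why.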

Let’s take a quick look at several transformation challenges and then step back to consider what steps you can follow to minimize these problems. Before jumping into the causes, keep in mind that as much as 30 percent of an information technology department’s budget is consumed by transformation costs. This astounding number surfaced in a presentation given by a Google engineer in 2007. If that number seems high, you can knock it down to a more acceptable 10 or 20 percent. The point is that fiddling with data when moving it from one system and format to another is a common task. Any transformation activity can go off the tracks.

New Contract for Clarabridge

May 15, 2008

Clarabridge, a “customer experience management vendor,” recently scored a posh client in Gaylord Hotels, which wants to use text analysis to review customer satisfaction surveys. Keeping millionaires happy requires technology.

Under the contract, Clarabridge will install its content mining platform at Gaylord properties. The goal: to relate textual commentary to a satisfaction scale. Clarabridge’s product dumps data extracted from unstructured text into a star schema to build the associated fact tables, just like progenitor-once-removed MicroStrategy, the business intelligence company that passed on its reporting, analysis, and monitoring solutions DNA.
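For readers who have not met a star schema, the sketch below shows the general shape using Python's built-in sqlite3 module: small dimension tables surrounding one fact table, with each fact row tying a comment to a property, a topic, and a score. Every table name, column name, and sample row here is invented for illustration. This is not Clarabridge's schema or data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables describe the "who" and "what" of each fact.
CREATE TABLE dim_property (property_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_topic (topic_id INTEGER PRIMARY KEY, label TEXT);
-- The fact table holds one row per topic extracted from a comment.
CREATE TABLE fact_comment (
    comment_id INTEGER PRIMARY KEY,
    property_id INTEGER REFERENCES dim_property(property_id),
    topic_id INTEGER REFERENCES dim_topic(topic_id),
    satisfaction INTEGER  -- made-up 1 to 5 scale
);
""")

# Invented sample rows; real rows would come from the text analysis step.
conn.executemany("INSERT INTO dim_property VALUES (?, ?)",
                 [(1, "Hotel A"), (2, "Hotel B")])
conn.executemany("INSERT INTO dim_topic VALUES (?, ?)",
                 [(1, "check-in"), (2, "dining")])
conn.executemany("INSERT INTO fact_comment VALUES (?, ?, ?, ?)",
                 [(1, 1, 1, 2), (2, 1, 2, 5), (3, 2, 1, 4), (4, 1, 1, 4)])


def average_by_property_and_topic(connection):
    """Join the fact table to its dimensions and average the scores."""
    query = """
    SELECT p.name, t.label, AVG(f.satisfaction)
    FROM fact_comment f
    JOIN dim_property p ON p.property_id = f.property_id
    JOIN dim_topic t ON t.topic_id = f.topic_id
    GROUP BY p.name, t.label
    ORDER BY p.name, t.label
    """
    return connection.execute(query).fetchall()


rows = average_by_property_and_topic(conn)
```

The appeal of the layout is that once free-text comments are reduced to fact rows, ordinary reporting queries like the one above can slice satisfaction by property, topic, or any other dimension.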

Clarabridge has a client list that includes big names Marriott, The Gap, H&R Block and more – making it quite unlikely that it will suffer a stock crash like MicroStrategy did ($333 to $1 – ouch!) in 2001. Some pundits assert that Clarabridge is a company that will challenge Attensity (www.attensity.com), a low-profile, fast-growing text analytics company headed by David Bean.

Gaylord, owner and operator of four vast and lavish resort hotel properties, receives tens of thousands of guest commentaries through a Web-based survey covering its Opryland (Nashville, Tenn.), Palms (Orlando, Fla.), Texan (Dallas/Fort Worth), and National (Washington, D.C./Maryland) properties. While polled information is fairly straightforward, the information gained in the “other comments” box at the end of a survey is expensive and difficult to quantify and make useful when humans do the work. Clarabridge’s platform will change all that.

At Clarabridge’s web site, you can download their white papers, case studies, industry resources and more.

Jessica Bratcher, May 15, 2008

Semantra and Conversational Analytics

May 15, 2008

Semantra is a “pioneering developer of conversational analytics software”, or so it says in the news release a helpful person sent me.

The company’s “conversational analytics” application pushes “beyond key word search” because a user can use “common language commands to retrieve specific information from back end databases”. You can read the Semantra announcement here: www.semantra.com/library/Semantra%202.0%20GA%20FINAL.pdf

The lingo “common language commands” means natural language processing or NLP. A number of vendors have embraced this approach in order to [a] eliminate the need for a specialist to intermediate between an end user with a question and the database with the answers and [b] allow faster interaction with a database. After all, in business intelligence, the idea is to get the information quickly. Calling up an SAS or SPSS analyst, waiting for that person to understand what’s needed, create the queries, pull down the data cube, and provide that chunk of info to a manager on a deadline is generally viewed as a problem.

What’s interesting about the Semantra approach is that its tool is designed for Microsoft Dynamics CRM. Microsoft’s push into CRM or customer relationship management has been erratic. To make the situation more interesting, Microsoft is working to move Dynamics (an unhappy amalgam of several products) into the Live.com or “cloud” environment. Semantra is hoping that Microsoft’s CRM offerings will generate even greater demand for third-party tools that tame the Dynamics beastie.

ArnoldIT.com analyzed the Dynamics product and technology late in 2007 and found that it was even more complex than Microsoft SharePoint Search. Given the multiple products that make up SharePoint Search, we were surprised to find that the Dynamics team had bested the SharePoint team on this important yardstick. The Dynamics product lineup consists of Microsoft’s own technology plus Axapta, Great Plains, Navision, and Solomon components. These are mixed and matched into a somewhat complex suite of products.

We wish Semantra great success with its system. There will be strong demand for a product that can simplify the Microsoft CRM system. You can get more information about Semantra at www.semantra.com. The splash page for Microsoft Dynamics is at www.microsoft.com/dynamics. If you are interested in the ArnoldIT.com analysis of the Dynamics suite, contact seaky2000 at Yahoo dot com. The report costs US$125 via online payment for a password-protected PDF.

Stephen Arnold, May 15, 2008

Vertica and Cloud-Based Business Intelligence

May 15, 2008

The IDG news service reported on May 12, 2008, that Vertica Systems will offer business intelligence as a service. You can read the complete IDG story here. Please navigate to it quickly, since some IDG items can become tough to locate a few days after they appear. The computing horsepower will be provided by Amazon. Vertica will use the EC2 (Elastic Compute Cloud) infrastructure introduced by Amazon in August 2006.

Vertica, another column-oriented database shop, sees an opportunity for hosted and software as a service products. Smaller firms often lack the resources to install industrial-strength business intelligence systems on premises.

The pricing for the service begins at $2,000 per month for 500 gigabytes of data. You can read the Amazon Web Services catalog entry here.

In the meantime, Amazon has worked hard to build out its Web services. I’ve heard that the company has embraced Hadoop (an open source framework modeled on Google’s file system and MapReduce work) and Xen (another open source solution). Amazon has experienced some technical hiccups but has recovered quickly.

Amazon’s putting significant effort into its Web services, and Vertica’s use of the EC2 service will be an interesting one to watch. Amazon’s cloud services have beaten Google and other firms to the punch. One Google source pointed out to me, though, that Google is able to learn from Amazon’s efforts. The implication is that Google can watch and wait until the market is “right” for Google to make a move. When it comes to infrastructure investments, Amazon’s spending lags behind Google’s. Amazon also has a leaner technical team. If Google enters this sector in a major way, Amazon’s technologists will have an opportunity to demonstrate their superiority to Google’s cloud-centric engineering.

I’m going to watch the Vertica service. If successful, it may spark a strong run-up for Amazon. Then Vertica will have to make the math work. A typical Vertica on-premises installation costs about $125,000. So, Vertica will have to make up the difference on volume, since the cloud service is likely to generate less revenue per customer. If support and customization costs rise, Vertica may find that getting the math to work could be tricky. Meanwhile, Google watches and learns.
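A back-of-the-envelope sketch of that math, using only the two figures quoted in this post: the $2,000 per month entry price and the roughly $125,000 on-premises installation. It is revenue-only arithmetic. It ignores Amazon's fees, support costs, larger data tiers, and any recurring on-premises maintenance revenue, so treat the function names and the result as illustrative assumptions, not a forecast.

```python
import math

ON_PREMISES_PRICE = 125_000  # dollars, "about $125,000" per the post
HOSTED_MONTHLY_FEE = 2_000   # dollars per month, entry tier per the post


def months_to_match(on_premises_price, monthly_fee):
    """Months one hosted customer must stay to equal one on-premises sale."""
    return on_premises_price / monthly_fee


def hosted_customers_needed(on_premises_price, monthly_fee, months):
    """Hosted customers needed over `months` to equal one on-premises sale."""
    return math.ceil(on_premises_price / (monthly_fee * months))


break_even_months = months_to_match(ON_PREMISES_PRICE, HOSTED_MONTHLY_FEE)
customers_per_year = hosted_customers_needed(
    ON_PREMISES_PRICE, HOSTED_MONTHLY_FEE, months=12
)
```

On these assumptions, a single entry-tier customer would need to stay a little over five years (62.5 months) to match one on-premises deal, or Vertica would need six such customers for a year. That is the "make up the difference on volume" problem in numbers.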

Stephen Arnold, May 14, 2008

The Library of Congress and Semantic Search

May 14, 2008

The buzz about semantic search is rising. Powerset’s demonstration using Wikipedia data has triggered interest in searching in more intuitive ways. I received a news item about Semantra http://www.semantra.com, another player in this search market segment.

The Library of Congress is in the game too.

There’s an interesting news item “Semantic Search the Library of Congress”. To see how the US government approaches “beyond search”, navigate to http://lcsh.info/sh95000541. Once you have this URL in your browser’s address bar, you can open a new window and use this URL to get a list of LCCNs to search semantically: http://lcsh.info/.

The search result is a list of Use For terms, Narrower Terms (each of which is a hot link to more terms), the LC Classification, the date the entry was created, the date the entry was modified, and a link to the Concept URI.

You will want to navigate to ProgrammableWeb.com http://www.programmableweb.com/api/library-of-congress-subject-headings and check out their explanation.

Based on this demonstration, today’s semantic search engines are not likely to be challenged in a meaningful way by a US government initiative any time soon.

Stephen Arnold, May 14, 2008

Collective Intelligence Anthology Available

May 14, 2008

The ArnoldIT.com mascot admires the new collection of essays by Mark Tovey. Collective Intelligence: Creating a Prosperous World at Peace, published by the Earth Intelligence Network in Oakton, Virginia (ISBN-13: 978-0-9715661-6-3), contains more than 50 essays by analysts, consultants, and intelligence practitioners. You can obtain a copy from the publisher, Amazon, or your bookseller.


The ArnoldIT mascot completed reading the 600-page book with remarkable alacrity for a duck.

The collection of essays is likely to find many readers among those interested in the social phenomena of networks. Many of the essays, including the one I contributed, talk about information retrieval in our increasingly interconnected world.

This essay will provide a synopsis of my contribution, “Search–Panacea or Play. Can Collective Intelligence Improve Findability”, which I wrote shortly before completing “Beyond Search: What to Do When Your Search System Doesn’t Work”. My essay begins on page 375.

Social Search

The dominance of Google forces other vendors to look for a way over, under, around, or through its grip on Web search. The vendor landscape now offers search and content processing systems that arguably do a better job of manipulating XML (Extensible Markup Language) content, figuring out who knows whom (the social graph initiative), and identifying the “real” meaning of content (semantic search). There are more than 100 vendors who have technology that offers, if one believes the marketing collateral and conference presentations, a way to squeeze more information from information.

Social search is the name given to an information retrieval system that incorporates one or more of these functions:

  1. Users can suggest useful sites. Examples: Delicious.com and StumbleUpon.com
  2. The system discovers relationships between and among processed documents and links. Examples: Powerset.com and Kartoo Visu
  3. The system analyzes information, extracts entities, and identifies individuals and their relationships. Examples: i2 Ltd (now part of ChoicePoint) and Cluuz.com
  4. The system monitors user behavior and uses the data to guide relevance, spidering, and other system functions. Examples: public Web indexing companies

There are other types of social functions, but these provide sufficient salt and pepper for this information side dish. The reason I say side dish is that social functions are not going to displace the traditional functions on which they are based. Social search has been in the mainstream from the moment i2 Ltd. introduced its workbench product to the intelligence community more than a decade ago. “Social” functions, then, are a recent add-on to the main diet in information retrieval.

Old Statistics and Cheap, Powerful Computers

What’s overlooked in the rush to find a Google “killer” is that the new companies are using some well-known technologies. For example, the inner workings of Autonomy’s “black box” are somewhat dependent on the work of a slightly unusual Englishman, Thomas Bayes. Mr. Bayes left the world a couple of centuries ago, but his math has been a staple in college statistics courses for many years. How to deploy Bayesian techniques on a large scale is, therefore, not exactly a secret to the thousands of mathematicians who followed his proofs in pursuit of their baccalaureate.
