Data Harmony Update a Suite Release
May 16, 2008
Access Innovations Inc., a data management systems company, is releasing version 3.4 of its Data Harmony software suite, and it sounds like a sweet deal.
The five-component software is used to make and maintain taxonomies, thesauri, and indexing systems. Data Harmony focuses on accuracy, precision, and repeatability in its search results, an emphasis that receives a happy quack from the Arnold IT mascot.
The major updates include more than 30 new features and revised documentation (to keep you in tune). The company says current users will recognize the same look and feel of the program and appreciate “friendlier and more functional features.”
President and Chairman Marjorie M.K. Hlava said the upgrade comes courtesy of user requests and suggestions. It’s refreshing to find a tech company making such efforts to rework a good product and actually making it better. We find Ms. Hlava’s old-fashioned, hands-on, we-care approach most welcome at a time when software vendors do better PR than coding. The full list of the Data Harmony enhancements for 3.4 can be found here.
Jessica Bratcher, May 16, 2008
Enterprise Search Vendors’ Taglines
May 16, 2008
A colleague in San Francisco asked me on May 14, 2008, “How do the search engine vendors position themselves?”
I told him that I would think about the question on the luxurious red-eye flight from SFO to Detroit. I did. I worked through the files on my trusty laptop and compiled a list of the taglines for some of the vendors whom I monitor. The list is not exhaustive, but I had data about a couple of dozen companies in the behind-the-firewall search business.
The table below provides a summary of the taglines. These are quite interesting, and I was surprised at the different approaches taken to explaining the companies’ systems. For example, I liked the taglines that echoed Caesar’s “I came, I saw, I conquered” (Veni, vidi, vici). SchemaLogic says, “Find. Use. Protect.” Thetus asserts, “Find. Assess. Fit. Understand.” Lexalytics crafts, “Discover. Understand. Act.”
Several of the companies use active or instrumental catchphrases. Brainware, a spin out from a German content management company, uses, “Intelligence unleashed.” I thought of a tiger pursuing me through the Louisville Zoo. And InQuira says, “Harvest knowledge.” Nstein, a company that has undergone accelerated evolution,
Less creative influences put a damper on marketing passion in these slogans. Panoptic (now Funnelback) gently offers, “Internet and Enterprise Search.” Almost matching the Australian company’s tagline is Fast Search & Transfer’s “The business of search.” Clearforest matches these in understatement with its “Text Analytics Solutions.” ZyLAB comes close too, saying, “Information Access Solutions.”
Other companies use the tagline as an elevator speech on a diet. For example, Endeca, flush with investments from Intel and SAP, states, “Innovative Software to Help People Explore, Analyze, and Understand Information.” Not to be outdone in the pitch department is ISYS Search Software’s “Enterprise Search Solutions for Real People Doing Business in the Real World.” (I like the “real” part of this statement because some of the taglines are a bit abstract.) Stratify (formerly Purple Yogi) strikes a Zen-like note: “Focus on the Matter of eDiscovery with Peace of Mind.” When I repeat this five times, my heart rate slows and my blood pressure drops.
Other vendors assert, in a nice way of course, that their system is Numero Uno in the search-and-retrieval sector. Open Text, a company with as many search technologies as Microsoft, declares itself “The Content Experts.” And Dieselpoint opines, “The Leader in Search & Navigation Technology.”
A small number of vendors drift into the poetic. Exegy uses repetition and alliteration to explain its super-fast appliance: “Extreme Speed. Extreme Insight.” Or consider SurfRay (owner of Mondosoft and Speed of Mind) and its rhythmic “We Move People to Discover.” Note that SurfRay itself, a relative newcomer to search, describes itself this way: “Pioneers in Enterprise Search and Behavior Analytics.” Strong stuff and sure to catch the attention of Autonomy, working overtime to catch up with the “Don’t be evil” Googlers.
Google Translate
May 15, 2008
The Google Search Appliance is a pretty nifty gizmo when you know how to “pimp” your GSA with the One Box API. On May 15, 2008, the GOOG confirmed what I heard at the Where 2.0 conference yesterday afternoon: Google Translate now handles another 10 languages. You can read the Googlers’ official announcement here.
Why mention this on Beyond Search with its narrow editorial scope? Well, in addition to cross-language searches, you get to play with the AJAX language API. For the clever kids at Adhere Solutions, you can use some of this translation goodness to allow a language-challenged person to read a document written in another language by a colleague halfway across the world.
These baby-step announcements can be overlooked unless you keep your ears attuned to the sound of big Googzilla paws advancing toward the enterprise. What will Google tell me about this “interesting” expansion of Google Translate? I’m persona non grata, so my queries fall on deaf but wealthy ears. I can hear those claws scraping across the pavement of Shoreline Drive.
Stephen Arnold, May 15, 2008
Content Transformation: A Challenge that Won’t Go Away
May 15, 2008
We live in a world of Web 2.0 and Web 3.0 goodness. At the Where 2.0 conference in Burlingame, California on May 14, 2008, I overheard this snippet of conversation:
We had everything working, but when we imported content, the system crashed. I reinstalled. I checked the config files. It still crashed. I have to open each file, resave it as an RTF, and import them one at a time. Grrrr.
Sound familiar?
I have heard this complaint many times before. In our content-savvy, XML-ized era, moving a source file into a content processing system should be trivial. The content processing system can extract entities. It can metatag. Some can slice, dice, and cook a chicken. But unless the system can intake content and transform it to something that the content processing subsystem understands, the system is dead in the water. Even worse, the text processing system only processes some of the source documents. In certain mission critical applications, kicking out documents is a no-no. Not only is the manual manipulation expensive, it’s time consuming. In those minutes or hours of fiddling, potentially significant data are not available to the analysts. What does missing information cost? Well, it depends on your work situation. In the Wall Street world, investment information can turn a win into a loss in a millisecond. In certain military applications, the information may mean the difference between health and harm.
Transforming a square into a circle or a circle into a square looks easy. With a triangle and a compass you can create two objects. It’s the intermediate steps that become tricky for an artist or a budding mathematician.
What is file or data transformation? In its simplest form, you have a file in Microsoft Word 2007 format, and you want to “transform” or change the file into a format recognized by another system’s import filter. So, one approach would be to open the file in Word 2007, click on File Save As, select RTF (Rich Text Format), and save the file. You can then allow your search or content processing system to suck the file into the conversion subsystem and turn the RTF into whatever target output format the filter generates. In a more sophisticated form, you take an unstructured document or a database table, and you transform it into some file type that your system can process. A more interesting task is to convert a file into a file with a comparable structure; for instance, take an SGML instance and convert it to HTML. Some search system vendors include filters and transformation tools with their system. Others provide an application programming interface. The idea is that you will write a script to perform whatever conversion you require, handle entities in an appropriate manner, and preserve the information and metadata (if available) throughout the process.
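To make the "write a script" idea concrete, here is a minimal Python sketch of one such conversion. It assumes the text and metadata have already been pulled out of the source file, and it wraps them in a simple XML envelope. The function name and the element names (document, meta, field, body) are invented for illustration; a real import filter defines its own schema.

```python
import xml.etree.ElementTree as ET

def to_import_xml(body_text, metadata):
    """Wrap extracted text and its metadata in a simple XML envelope.

    The element names used here are hypothetical, not any vendor's format.
    """
    doc = ET.Element("document")
    meta = ET.SubElement(doc, "meta")
    for name, value in sorted(metadata.items()):
        field = ET.SubElement(meta, "field", name=name)
        field.text = str(value)
    body = ET.SubElement(doc, "body")
    body.text = body_text
    # ElementTree escapes &, <, and > on output, so the reserved
    # characters in the source text survive as entities.
    return ET.tostring(doc, encoding="unicode")

xml_out = to_import_xml(
    "Profit & loss for Q1 <draft>",
    {"author": "J. Smith", "source_format": "docx"},
)
```

Parsing `xml_out` again returns the original text and metadata unchanged, which is the property you want from any transformation step: nothing lost, nothing mangled.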
Let’s take a quick look at several transformation challenges and then step back to consider what steps you can follow to minimize these problems. Before jumping into the causes, keep in mind that as much as 30 percent of an information technology department’s budget is consumed by transformation costs. This astounding number surfaced in a presentation given by a Google engineer in 2007. If that number seems high, you can knock it down to a more acceptable 10 or 20 percent. The point is that fiddling with data when moving it from one system and format to another is a common task. Any transformation activity can go off the tracks.
New Contract for Clarabridge
May 15, 2008
Clarabridge, a “customer experience management vendor,” recently scored a posh client in Gaylord Hotels, which wants to use text analysis to review customer satisfaction surveys. Keeping millionaires happy requires technology.
Under the contract, Clarabridge will install its content mining platform at Gaylord properties. The goal: to relate textual commentary to a satisfaction scale. Clarabridge’s product dumps data extracted from unstructured text into a star schema to make associated fact tables, just like progenitor-once-removed MicroStrategy, the business intelligence company that passed on its reporting, analysis, and monitoring solutions DNA.
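For readers who have not met a star schema, here is a rough Python sketch using the standard sqlite3 module. It is not Clarabridge’s actual design. The table names, column names, and sample rows are all made up for illustration: one fact table of scored comments sits in the middle, with small dimension tables for property and topic around it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables (hypothetical names) describe the "who" and "what".
CREATE TABLE dim_property (property_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_topic (topic_id INTEGER PRIMARY KEY, label TEXT);
-- The fact table holds one row per extracted comment topic and score.
CREATE TABLE fact_comment (
    property_id INTEGER REFERENCES dim_property(property_id),
    topic_id INTEGER REFERENCES dim_topic(topic_id),
    satisfaction INTEGER  -- illustrative scale: 1 (unhappy) to 5 (delighted)
);
-- Invented sample data, not real survey results.
INSERT INTO dim_property VALUES (1, 'Opryland'), (2, 'Texan');
INSERT INTO dim_topic VALUES (1, 'check-in'), (2, 'dining');
INSERT INTO fact_comment VALUES
    (1, 1, 2), (1, 2, 5), (2, 1, 4), (2, 2, 3), (1, 1, 4);
""")

# Average satisfaction per topic: join the fact table to one dimension.
rows = conn.execute("""
    SELECT t.label, AVG(f.satisfaction)
    FROM fact_comment f JOIN dim_topic t USING (topic_id)
    GROUP BY t.label ORDER BY t.label
""").fetchall()
```

Once free-text comments are reduced to rows like these, the “other comments” box can be sliced by property, topic, or date with ordinary reporting tools.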
Clarabridge has a client list that includes big names such as Marriott, The Gap, and H&R Block, making it quite unlikely that it will suffer a stock crash like MicroStrategy did ($333 to $1, ouch!) in 2001. Some pundits assert that Clarabridge is a company that will challenge Attensity (www.attensity.com), a low-profile, fast-growing text analytics company headed by David Bean.
Gaylord, owner and operator of four vast and lavish resort hotel properties, receives tens of thousands of guest commentaries through its Opryland (Nashville, Tenn.), Palms (Orlando, Fla.), Texan (Dallas/Fort Worth), and National (Washington, D.C./Maryland) properties in a Web-based survey. While polled information is fairly straightforward, the information gained in the “other comments” box at the end of a survey is expensive and difficult for humans to quantify and make useful. Clarabridge’s platform will change all that.
At Clarabridge’s web site, you can download their white papers, case studies, industry resources and more.
Jessica Bratcher, May 15, 2008
Sybase Jumps into the Content Processing Appliance Fray
May 13, 2008
Sybase announced on May 12, 2008, the roll out of its Sybase Analytic Appliance. The hardware is an IBM Power System preconfigured with Sybase IQ, Sybase PowerDesigner, and MicroStrategy 8. The idea is to eliminate the fiddly tasks associated with setting up a data and content processing system, so that a customer gets the benefits of a custom-built enterprise data warehouse in a ready-to-deploy device.
Sybase IQ is the column-oriented Sybase database engine. Column databases offer a performance boost over traditional row-oriented relational databases for analytic queries that read a few columns across many rows. Sybase PowerDesigner is a model-driven tool intended to reduce the pain of building report requirements, models, and related tasks. MicroStrategy 8 is a business intelligence system.
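A toy Python sketch shows why the column layout helps with analytic queries. This is only an illustration of the general idea, not how Sybase IQ is implemented, and the sample records are invented.

```python
# Row-oriented layout: each record is stored together.
rows = [
    {"region": "east", "units": 10, "price": 2.5},
    {"region": "west", "units": 4, "price": 3.0},
    {"region": "east", "units": 7, "price": 2.0},
]

# Column-oriented layout: each column is stored as its own array.
columns = {
    "region": [r["region"] for r in rows],
    "units": [r["units"] for r in rows],
    "price": [r["price"] for r in rows],
}

# An aggregate over one column touches only that column's array ...
total_units_columnar = sum(columns["units"])
# ... whereas the row layout walks every whole record to reach one field.
total_units_rowwise = sum(r["units"] for r in rows)
```

Both totals are the same; the difference is how much data must be read to get there. On disk, with millions of rows and dozens of columns, skipping the unneeded columns is where the speedup comes from.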
The cost for the system is based on the data volume. The information I saw quoted an introductory price of $27,000 per terabyte of data. The design of the appliance allows “snap in” scaling. There are three versions of the appliance, and the prices rise as you move from the starter to standard to enterprise version. You can buy the device from Sybase, MicroStrategy, or mLogica (a systems integrator).
Appliances can be criticized for their limited functionality. Sybase has done a good job of providing a bundle that gives the licensee considerable freedom to configure the device and manipulate data. Compared to other industrial-strength appliances, Sybase has an attractive launch price point. You will need to determine your data volume and data change rate in order to determine which appliance version is appropriate for your organization.
Stephen Arnold, May 13, 2008
Intelligenx Discloses Referrals Fuel Rapid Growth
May 12, 2008
In an exclusive interview, Iqbal and Zubair Talib, senior managers of Intelligenx, reveal that referrals have fueled the company’s rapid growth. Intelligenx has a leadership position in directory and “yellow page” search in South Africa, South America, and elsewhere. The company’s profile, despite its US headquarters in suburban Washington, DC, is modest.
The father-son team said:
It seems that our international clients are actively talking about our technology at international conferences. We can always do a better job of marketing, but we put our customers first. Sales occur because people come to us and say, “We want to license your system”… we maintained certain relationships among an elite group of scientists and engineers. We never signed up to give marketing talks at the marketing-oriented venues. Our success comes because certain people understand our technology and recognize that it delivers scale, speed, performance, data management today. Our technology is our marketing.
Unlike search and content processing firms who issue news releases when a Web site signs on to use a well-known search engine or when a vendor announces for the second or third time a reseller deal, Intelligenx keeps innovating and selling.
The company’s system offers almost all of the features associated with the best-known vendors in the search market sector. The Talibs said:
Intelligenx was first to market with technology that offered a true full-text search with what many people call faceted or assisted search results. To achieve this functionality, performance under heavy loads is the prevailing challenge and simply put, our Discovery Engine® solves the problem in what we think is a most elegant fashion. “Facets” or “guided navigation” are not just a “checkbox” on a feature matrix but an underlying central philosophy in our technology, the company, and in the development of our system.
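For readers unfamiliar with faceted search, a bare-bones Python sketch of the concept follows. It is not Intelligenx’s Discovery Engine; the listings, field names, and `search` function are invented for illustration. A query returns both the hits and a count of values for each facet field, and picking a facet value narrows the result set.

```python
from collections import Counter

# Invented directory listings for illustration only.
listings = [
    {"name": "Ace Plumbing", "category": "plumber", "city": "Durban"},
    {"name": "Best Pipes", "category": "plumber", "city": "Cape Town"},
    {"name": "City Plumbing Supplies", "category": "hardware", "city": "Cape Town"},
    {"name": "Delta Electric", "category": "electrician", "city": "Durban"},
]

def search(query, selected=None):
    """Return matching listings plus value counts for each facet field."""
    selected = selected or {}
    hits = [
        item for item in listings
        if query.lower() in item["name"].lower()
        and all(item[field] == value for field, value in selected.items())
    ]
    facets = {
        field: Counter(item[field] for item in hits)
        for field in ("category", "city")
    }
    return hits, facets

hits, facets = search("plumbing")
narrowed, _ = search("plumbing", {"city": "Durban"})
```

Here the query “plumbing” returns two listings with facet counts for category and city, and selecting the Durban facet narrows the list to one. The hard part the Talibs describe is doing this counting at full-text scale under heavy load, which this sketch does not attempt.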
You can read about the company’s new stream processing of information, what the Talibs call “cluster flow”. In addition to near real time index updating, additional metadata are generated without adding latency to the system. Another interesting feature of the Intelligenx system is that a licensee can provide its sales people with a real time view of what advertisements are germane to a popular query. The sales person is able to show a prospective advertiser a live report of traffic and the payoff from an advertisement in a specific context.
The company’s technology offers an alternative to the better-known MarkLogic system and the specialist firm, Dieselpoint.
You can read the entire interview on the ArnoldIT.com Web site. The full text of the interview is part of the Search Wizards Speak feature. The exclusive interview is the 13th in this series of first-person accounts of the origin and functionality of important search and content processing systems. Click here to read the interview.
Powerset Available
May 12, 2008
Navigate to Powerset.com and try out the much-publicized Web search system. Using proprietary technology plus third-party components, Powerset is a semantic search system. The system differentiates itself with fact extraction (Factz, in Powerset jargon), direct links to definitions, and a summary / outline view. A big yellow sticky note says that Powerset is searching Wikipedia articles, but my test queries returned useful information in the results list in default mode; for example, the name of Tropes Zoom, a system I had heard about but never seen. A quick Google search allowed me to pinpoint Semantic Knowledge as a company with a technology of this name. I’m not sure Powerset envisioned my use of its system as a front end for Google, but that use jumped out at me. Check it out and let me know if you think it is better than Google, Hakia, or Exalead. These are systems that contain a dollop of semantic sauce. Hopefully the company will provide a larger content index either by spidering the Web or via a metasearch like Vivisimo’s.
Stephen Arnold, May 12, 2008
Google: A Brace of Media Analyzer Inventions
May 11, 2008
On May 8, 2008, the USPTO, an outstanding organization with a stellar search system, published two Google patent applications. US2008/0107337 is “Methods and Systems for Analyzing Data in Media Material Having Layout” and US2008/0107338 is “Media Material Analysis of Continuing Article Portions”. You can download these here.
Both inventions, for which Google is the assignee, pertain to figuring out what’s important and what’s not on Web pages. Companies that scan hard copy and convert those images to machine-readable ASCII use some tricks but a great deal of brute force to figure out what’s information and what’s advertising or other dross.
The inventions’ systems and methods can also be applied to other types of images converted to a machine-readable form; for example, a PDF that consists of the PDF wrapper and the TIFF image in the wrapper. I know that commercial database publishers are on top of Google’s innovations in content processing, so this is old news to the wizards at ProQuest, Reed Elsevier, and Thomson Reuters. But others in the less rarified atmosphere may find these disclosures interesting. Two patent documents stumbling through the USPTO’s hallowed halls are not an accident of fate.
Stephen Arnold, May 11, 2008
Semantic Web: Useful Links
May 10, 2008
Advancing Insights posted a list of useful links for “Web 3.0, RDF, and the Semantic Web”. A content goose squawk for Jim Wilde for the links. Clicking through these documents is instructive. If you follow Google’s activities in the semantic space, you can see why Google has pushed forward with its programmable search engine.
Invented by a former IBM Almaden scientist, the PSE or programmable search engine could, if deployed on a large scale by Google, make Google the de facto “hub” for semantic processing. You can download one of the Google PSE documents by navigating to the USPTO’s awesome Web site and searching for US2007 00386616, filed on April 10, 2005, and published on February 15, 2007.
Stephen Arnold, May 9, 2008


