Mark Logic and Basis Technology
October 13, 2008
Browsing the Basis Technology Web site revealed an October 7, 2008, news release about a Basis Technology and Mark Logic tie-up. You can read the news release here or here. Basis Technology licenses text and content processing components and systems. The Basis Technology announcement says “Rosette Entity Extractor provides advanced search and text analytics for MarkLogic Server 4.0.” Mark Logic, as I have noted elsewhere in this Web log, is one of the leading providers of XML server technology. The new version can store, manage, search, and deliver content in a variety of forms to individual users, other enterprise systems, or devices. REX (shorthand for Rosette Entity Extractor) can identify people, organizations, locations, numeric strings such as credit card numbers, email addresses, geographic data, and other items such as dates in unstructured or semi-structured content. I don’t have details on the deal. My take is that Mark Logic wants to put its XML engine into gear and drive into market spaces not now well served by other vendors’ XML systems. Enterprise search is dead. Long live more sophisticated information and data management systems. Search will be tucked into these solutions, but it is no longer the focal point of the system. I am pondering the impact of this announcement on other XML vendors and on such companies as Microsoft Fast Search.
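To make the entity extraction idea concrete, here is a minimal, hypothetical Python sketch of pattern-based extraction for a few of the simpler entity types mentioned above (email addresses, dates, and credit-card-like number strings). It illustrates the general technique only; it does not use Basis Technology’s API, and a production system such as REX relies on statistical models and linguistic analysis rather than a handful of regular expressions.

```python
import re

# Illustrative patterns only; real entity extractors use statistical models
# and language-specific analysis, not a few regular expressions.
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "date": re.compile(
        r"\b(?:\d{1,2}[/-]\d{1,2}[/-]\d{2,4}|"
        r"(?:January|February|March|April|May|June|July|August|"
        r"September|October|November|December)\s+\d{1,2},\s+\d{4})\b"),
    "card_number": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def extract_entities(text):
    """Return a dict mapping entity type to the strings found in the text."""
    return {label: pattern.findall(text) for label, pattern in PATTERNS.items()}

sample = "Contact jdoe@example.com by October 13, 2008 about card 4111 1111 1111 1111."
print(extract_entities(sample))
```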
Stephen Arnold, October 13, 2008
The Financial Times Rediscovers Text Mining
October 11, 2008
On October 8, 2008, the Financial Times, part of the group that owned Madame Tussaud’s wax museum until 1998, published Alan Cane’s “New Techniques Find Meanings in Words.” Click “fast” because locating Financial Times’s news stories can be an interesting exercise. You can read this “news” in the Financial Times, a traditional publishing company with the same type of online track record as the Wall Street Journal and the New York Times. The premise of Mr. Cane’s article is that individuals need information about people, places, and things. Apparently Mr. Cane is unfamiliar with the work of i2 in Cambridge, England, Linguamatics, and dozens of other companies in the British Commonwealth alone actively engaged in providing systems that parse content to discern and make evident information of this type. Nevertheless, Mr. Cane reviews the ideas of Sinequa, Google, and Autonomy. You can read about these companies and their “new” technology in this Web log. For me, the most interesting comment in this write up was this passage attributed in part to Charles Armstrong, CEO of Trampoline Systems, a company with which I am not familiar:
“The rise of Web 2.0 in the consumer world alerted business to the role that social contacts and networks play. When you are dealing with a project that requires a particular knowledge, you look for the person with the knowledge, not a document.” Mr Armstrong says Trampoline [Systems]’s search engine is the first to analyse not just the content of documents but the professional networks of those connected to the documents.
There are three points in this snippet that I noted on my trusty yellow pad:
- Who is Charles Armstrong?
- What is the connection between the specious buzzword “Web 2.0” and entity extraction? I recall Dr. Ramana Rao talking about entity extraction in the mid-1980s. Before that, various government agencies had systems that would identify “persons of interest”. Vendors included ConQuest Technologies, acquired by Excalibur, and even earlier there were saved queries running against content in the Dialog and LexisNexis files. Anyone remember the command UD=9999 from 1979?
- What’s with the “Web 2.0” and the “first”? You can see this type of function on public demonstration sites at www.cluuz.com and www.silobreaker.com. You can also ring your local Kroll OnTrack office, and if you have the right credentials, you can see this type of operation in its industrial-strength form.
Here’s what I found:
- CRM Magazine named Trampoline Systems a rising star in 2008
- Charles Armstrong, a Cambridge grad, is an “ethnographer turned technology entrepreneur.” The company Trampoline Systems was founded in 2003 to “build on his research into how small communities distribute information to relevant recipients.” Ah, the angle is the blend of entity extraction and alerts. Not really new, but more of an angle on what Mr. Armstrong wants to deliver to licensees. Source: here. You can read the Wikipedia profile here. His LinkedIn profile carries this tag: “Ethnographer gone wrong” here. His Web log is here.
- Craig McMillan is the technology honcho. According to the Trampoline Web site here, he is a veteran of Sun Microsystems where he “led the technical team building the Identrus Global Trust Network identity assertion platform [and] led the technical team for a new enterprise integration and meta-directory platform.” Source: here. I found it interesting that the Fast Forward Web log, the official organ of the pre-Microsoft Fast Search & Transfer, wrote about Mr. McMillan’s work in early 2007 here in a story called “Trampoline Systems: Rediscovering the Lost Art of Communications.” The Fast Forward article identifies Raytheon, the US defense outfit, as a “pilot”. Maybe Fast Search should have purchased this company before financial issues thrust Fast Search into the maw of Microsoft?
- I located an Enron Explorer here. This seems to be a demo of some of the Trampoline functionality. But the visualizer was not working on October 10, 2008.
- The core products are packaged as the Sonar Suite. You can view a demo of a Tacit Software-like system here. You can download a demo of the system here. The graphics look quite nice, but entity precision, relevance, throughput, and query response time are where the rubber meets the road. A nice touch is that the demos are available for Macs and PCs. With a bit of clicking from the Trampoline Systems’ home page, you can explore the different products the company offers.
- Web Pro News published a useful write up about the company in 2006; it is here.
Charles Armstrong’s relationships as identified by the Canadian company Cluuz.com appear in the diagram below. You can recreate this map by running the query “Charles Armstrong” + Trampoline on Cluuz.com. The URL for the map below is http://www.cluuz.com/ClusterChart.aspx?req=633592276174800000&key=9
This is Cluuz.com’s relationship map of Charles Armstrong, CEO of Trampoline Systems. “New” is not the word I would use to describe either the Cluuz.com or the Trampoline relationship visualization function. Both have interesting approaches, but the guts of this type of map have been around for a couple of decades.
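For readers curious about what sits under this type of map, the core idea is simple and old: pull names out of documents, then link names that appear together. Below is a minimal, hypothetical Python sketch of that co-occurrence approach. It is not the Cluuz.com or Trampoline Systems implementation; real products add entity extraction, deduplication, ranking, and visualization on top, and the document set here is invented for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Toy corpus: each document is reduced to the set of entities found in it.
# A real system would produce these sets with an entity extractor.
documents = [
    {"Charles Armstrong", "Trampoline Systems", "Craig McMillan"},
    {"Charles Armstrong", "Trampoline Systems", "Raytheon"},
    {"Craig McMillan", "Sun Microsystems"},
]

def build_cooccurrence_graph(docs):
    """Count how often each pair of entities appears in the same document."""
    edges = defaultdict(int)
    for entities in docs:
        for a, b in combinations(sorted(entities), 2):
            edges[(a, b)] += 1
    return edges

# The weighted edges are what a relationship map such as Cluuz.com's draws.
for (a, b), weight in sorted(build_cooccurrence_graph(documents).items(),
                             key=lambda item: -item[1]):
    print(f"{a} -- {b}: {weight}")
```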
Let me be clear: I am intrigued by the Trampoline Systems’ approach. There’s something there. The FT article doesn’t pull the cart, however. I am, therefore, not too thrilled with the FT’s write up, but that’s my opinion to which I am entitled.
Make up your own mind. Please, read the Financial Times article. You will get some insight into why traditional media struggles to explain technology. Neither the editors nor the journalist takes the time or has the expertise to figure out what’s “new” and what’s not. My hunch is that Trampoline does offer some interesting features. Ripping through some contacts with well-known companies and jumping to the “new” assertion calls into question the understanding of the subjects about which the UK’s top journalists write. Agree? Disagree? Run a query on FT.com for “Trampoline Systems” before you chime in, please.
Stephen Arnold, October 10, 2008
Data Mining: A Bad Report Card
October 9, 2008
Two readers sent me a link to reports about the National Research Council’s study findings about data mining. Declan McCullagh’s “Government Report: Data Mining Doesn’t Work Well” for CNet is here. BoingBoing’s write up of the report, the most colorful of the bunch, is here. The title is certainly catchy: “Data Mining Sucks: Official Report.” The only problem with the study’s findings is that I don’t believe the results. I had a stake in a firm responsible for a crazy “red, yellow, green” flagging system for a Federal agency. The data mining system worked like a champ. What did not work was the government agency responsible for the program and the data stuffed into the system. Algorithms are numerical recipes. Some work better than others, but in most cases, the math in data mining is pretty standard. Sure, there are some fancy tricks, but these are not deep, dark secrets locked in Descartes’ notebooks. The math is taught in classes that dance majors and social science students never, ever consider taking. Cut through the math nerd fog, and the principles can be explained.
I am also suspicious that nothing reassures a gullible reader more than a statement that something is broken. I don’t think I am going to bite that worm nestled on a barbed hook. Clean data, off-the-shelf algorithms, reasonably competent management, and appropriate resources–data mining works. Period. Fumble the data, the management, and the resources–data mining outputs garbage. To get a glimpse of data mining that works, click here. Crazy stuff won’t work. Pragmatic stuff works really well. Keep that in mind after reading the NRC report.
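Since I claim the principles can be explained without math nerd fog, here is a minimal, hypothetical Python sketch of the kind of “red, yellow, green” flagging I mean: score a record against a few weighted indicators and bucket the score. The indicators, weights, and thresholds are invented for illustration; the real system I refer to was far more elaborate, and its problems were the data and the management, not the arithmetic.

```python
# Invented indicator weights and thresholds; illustration only.
WEIGHTS = {
    "cash_transaction_over_10k": 3.0,
    "mismatched_address": 2.0,
    "new_account": 1.0,
}

def score(record):
    """Sum the weights of the indicators present in the record."""
    return sum(weight for key, weight in WEIGHTS.items() if record.get(key))

def flag(record):
    """Bucket the score into green / yellow / red."""
    s = score(record)
    if s >= 4.0:
        return "red"
    if s >= 2.0:
        return "yellow"
    return "green"

print(flag({"cash_transaction_over_10k": True, "new_account": True}))  # red
print(flag({"mismatched_address": True}))                              # yellow
print(flag({}))                                                        # green
```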
Stephen Arnold, October 9, 2008
MarkLogic 4.0: A Next-Generation Content System
October 7, 2008
Navigate to KDNuggets here, and you can see a lineup of some of the content processing systems available on October 2, 2008. The list is useful, but it is not complete. There are more than 350 vendors in my files, and each asserts that it has a must-have content system. Most of these systems suffer from one or more drawbacks; for example, scaling is problematic or just plain expensive, repurposing the information is difficult, or modifying the system requires lots of fiddling.
MarkLogic 4.0 addresses a number of these common shortcomings in text processing and XML content manipulation. The company says the product “accelerates the creation of content applications.” With the most recent release of its flagship server product, MarkLogic offers a content platform, not a content utility. Think of most content processing companies as tug boats; MarkLogic 4.0 is an ocean-going vessel with speed, power, and range. When I spoke with MarkLogic’s engineers, I learned that the ideas for enhancements to MarkLogic 3.2, the previous release, originated with MarkLogic users. One engineer said, “Our licensees have discovered new uses for the system. We have integrated into the code base functions and operations that our customers told us they need to get the most value from their information. Version 4.0 is a customer driven upgrade. We just did the coding for them.”
Most text processing systems, including XML databases, are useful but limited in power and scope. The MarkLogic 4.0 system is an ocean-going vessel among harbor-bound craft.
You can learn quite a bit about the functionality of MarkLogic in this Dr. Dobb’s interview with Dave Kellogg, CEO of this Sequoia-backed firm. The Dr. Dobb’s interview is here.
MarkLogic is an ocean-going vessel amid smaller boats. The product is an XML server, and it offers search, analytics, and jazzy features such as geospatial querying. For example, I can ask a MarkLogic system for information about a specific topic within a 100-mile radius of a particular city. But the core of MarkLogic 4.0 is an XML database. When textual information or data are stored in MarkLogic 4.0, slicing, dicing, reporting, and reshaping information produces solutions, not results lists.
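To show the kind of radius query I mean, here is a minimal, hypothetical Python sketch of a great-circle (haversine) filter over documents tagged with latitude and longitude. It illustrates the concept only; the document set and coordinates are invented, and MarkLogic 4.0 exposes its geospatial search through its own query functions and indexes, not through code like this.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_MILES = 3959.0

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in miles."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))

# Toy document set; a real server would consult a geospatial index instead.
documents = [
    {"title": "Louisville flood report", "lat": 38.25, "lon": -85.76},
    {"title": "Chicago transit study", "lat": 41.88, "lon": -87.63},
]

def within_radius(docs, center_lat, center_lon, radius_miles):
    """Keep only the documents whose coordinates fall inside the radius."""
    return [d for d in docs
            if haversine_miles(d["lat"], d["lon"], center_lat, center_lon) <= radius_miles]

# Documents within 100 miles of Louisville, Kentucky.
print(within_radius(documents, 38.25, -85.76, 100))
```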
According to Andy Feit, vice president, MarkLogic is “a mix of native XML handling, full-text search engines, and state-of-the-art DBMS features like time-based queries, large-scale alerting, and large-scale clustering.” The new release adds important functionality, including:
- Geospatial support for common geospatial markup standards plus an ability to display data on polygons such as state boundaries or a salesperson’s region. The outputs, or geospatial mash-ups, are hot linked to make drill-down a one-click operation.
- Push operations such as alerts sent to a user’s mobile phone, or triggers that fire when a content change occurs and, in turn, launch a separate application. The idea is to automate content and information operations in near real time, not leave it up to the system user to run a query and find the important new information. (A minimal sketch of this trigger pattern appears after this list.)
- Embedded entity enrichment functionality including support for Chinese, Russian and other languages
- Improved support for third party enterprise entity extraction engines or specialized systems. For example, the new version ships with direct support for TEMIS’s content processing systems for health and medical, financial services, and pharmaceutical content. MarkLogic calls its approach “an open framework.”
- Mobile device support. A licensee can extract data from MarkLogic, and built-in operations will format those data for the user’s device. Location services become more fluid and require less developer time to implement.
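Here is a minimal, hypothetical Python sketch of the push/trigger pattern mentioned in the list above: stored queries are checked against each new or changed document, and a match fires an action such as an alert. The queries and callbacks are invented for illustration; MarkLogic implements alerting and triggers inside the server, not with code like this.

```python
# Minimal alerting sketch: stored queries are checked against each new or
# changed document, and a callback (send an email, start a workflow) fires
# on a match. This shows the pattern, not MarkLogic's actual trigger API.

alerts = [
    {"query": "credit card", "action": lambda doc: print(f"ALERT: {doc['id']}")},
    {"query": "acquisition", "action": lambda doc: print(f"NOTIFY: {doc['id']}")},
]

def on_document_change(doc):
    """Called whenever a document is added or updated."""
    text = doc["text"].lower()
    for alert in alerts:
        if alert["query"] in text:
            alert["action"](doc)

on_document_change({"id": "doc-42", "text": "The acquisition closed on Friday."})
```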
The new release of MarkLogic manipulates XML quickly. In addition to performance enhancements to the underlying XML data management system, MarkLogic supports the XQuery 1.0 standard. Users of earlier versions of MarkLogic Server can continue to use these systems alongside Version 4.0. According to Mr. Feit, “Some vendors require a lock step upgrade when a new release becomes available. At MarkLogic, we make it possible for a licensee to upgrade to a new version incrementally. No recoding is required. Version 4, for example, supports earlier versions’ query language and scripts.”
Powerset’s Approach to Search
October 6, 2008
Powerset was acquired by Microsoft for about $100 million in June 2008. I haven’t paid too much attention to what Microsoft has done or is doing with the Powerset semantic, natural language, latent semantic indexing, et al. system it acquired. A reader sent me a link to Jon Udell’s Web log interview that focuses on Powerset. If you want to know more about how Microsoft will leverage the aging Xerox Parc technology, you will want to click here to get an introduction to the Perspectives interview conducted on September 30, 2008, with Scott Prevost. You will need to install Silverlight, or you can read the interview transcript here.
I can’t summarize the lengthy interview. For me, three points were of particular interest:
- The $100 million bought Powerset, but Microsoft had to then license the Xerox Parc technology. You can get some “inxight” into the functions of the technology by exploring the SAP/ Business Objects’ information here.
- The Powerset technology can be used with both structured and unstructured information.
- Microsoft will be doing more work to deliver “instant answers”.
A happy quack to the reader who sent me this link, and two quacks for Mr. Udell for getting some useful information from Scott Prevost. I am curious about the roles of Barney Pell (Powerset founder) and Ron Kaplan (Powerset CTO and former Xerox Parc wizard) in the new organization. If anyone can shed light on this, you too will warrant a happy quack.
Stephen Arnold, October 6, 2008
Exalead’s High Performance Platform: CloudView
October 5, 2008
It’s no secret. When I profiled Exalead in one of the first three editions of Enterprise Search Report that I wrote, I likened the company’s plumbing to Google’s. The DNA of AltaVista.com influenced Google and Exalead. For most 20-somethings, AltaVista.com was one of a long line of pre-Google flops. That, like prognostications about Web 3.0, is not exactly on target.
The AltaVista.com search system was a demonstration of several interesting technologies developed by Digital Equipment Corporation’s engineers over many years. First, there was the multi-core processor that ran hotter than the blood of a snorting bull in Pamplona. Second, there was the nifty manipulation of memory. In fact, that memory manipulation allowed Oracle performance in the system I played with to zip right along in the mid-1990s, as I recall. And the DEC engineers were able to index the Internet, with its latency and flawed HTML, so that a query was processed and a results list displayed quickly on my dial-up modem in 1996. I even have a copy of AltaVista desktop search, one of the first of these scaled-down search systems intended to make files in hierarchical systems findable. On my bookshelf is a copy of Eric and Deborah Ray’s AltaVista Search Revolution. Louis Monier wrote the foreword. He used to work at Google, and what few people know is that Mr. Monier lured the founder of Exalead to work on the AltaVista.com project. Like I said, the DNA of AltaVista influenced Google and Exalead. Some AltaVista engineers were not happy campers after DEC was acquired by Compaq and Hewlett Packard then acquired Compaq. In the fury of HP’s efforts to become really big, tiny AltaVista.com was an orphan, an unwanted annoyance clamoring for hardware, money, engineering, and a business model.
François Bourdoncle–unlike Louis Monier, Jeff Dean, Sanjay Ghemawat, and Simon Tong, among others–did not join Google. In 2000, he set up Exalead to build a next-generation information access and content processing system. What I find interesting is that just as the trajectory of Google in Web search was affected by the AltaVista.com “gravity,” Exalead’s trajectory in content processing was also touched by the AltaVista.com experiment.
A result list from Exalead’s Web search system. Try it here.
When M. Bourdoncle founded Exalead, he wanted to resolve some of AltaVista’s known weaknesses. For example, the heat issues associated with the DEC Alpha chips were one problem. Another was rapid scaling using commodity hardware, not hand-crafted components that take months to obtain.
Exalead now has, according to the company’s Web site, more than 170 licensees. Earlier this week (October 1, 2008), Exalead announced CloudView, a new version of the company’s platform, along with new software features.
Paula Hane, Information Today, provided this rundown of the new Exalead features:
Unlimited scalability and high performance
Business-level tuning and management of the search experience
Streamlined administration UI
Full traceability within the product
WYSIWYG configuration of indexing and search workflows
Advanced configuration management system (with built-in version control)
Improvements in the relevancy model
Provision for additional connectors with simple and advanced APIs for third-party implementations
You can read her “Exalead Offers a Cloud(y) View of Information Access” here. The article provides substantive, useful information. For example, Ms. Hane reports:
One large [Exalead] customer in the U.K. can’t say enough good things about the choice of Exalead—its search solution was up and running in just 3 months. “After performing an extensive three-month technical evaluation of the major enterprise search software vendors we found that Exalead had the best technology, vision and ability to fulfill our demanding requirements,” says Peter Brooks-Johnson, product director of Rightmove, a fast-growing U.K. real estate Web site. “Not only does Exalead require minimal hardware to work effectively, but Exalead has a strong, accessible support team and a culture that takes pride in its customer implementations.”
(Note: A happy quack to Ms. Hane, whom I am quoting shamelessly in this Web log post.)
Phil Muncaster’s “Exalead Claims Enterprise Search Boost” here does a good job of explaining what’s coming from this Paris-based information access company. For me the most significant point in the write up was this passage:
The new line features a streamlined user interface, improved relevancy and the ability to extend business intelligence applications to textual search…
In my investigation of search company technology, I learned that Exalead’s ability to scale is comparable to Google’s. As Mr. Muncaster noted, the forthcoming version of the Exalead software–called CloudView–will put Exalead squarely in the business intelligence sector of the content processing market.
You can get more information about Exalead here. A fact sheet is also available here. Exalead’s Web index is available at www.exalead.com.
I have to wrangle a trip to Paris and learn more about Exalead. I hear the food is okay in Paris. The French have a strong tradition in math as well. I remember such trois étoiles innovators as Descartes, Mersenne, Poincaré, and Poisson. In my opinion, Microsoft should have acquired Exalead, not Fast Search & Transfer. Exalead is a next generation system; it scales; and it is easily “snapped in” to enterprise environments, including those dependent on SharePoint. I think Exalead is a company I want to watch more closely.
Stephen Arnold, October 5, 2008
The Goose Quacks: Arnold Endnote at Enterprise Search Summit
October 4, 2008
Editor’s Note: This is a file with a number of screen shots. If you are on a slow connection, skip this document.
Once again I was batting last. I arrived the day before my talk from Europe, and I wasn’t sure what time it was or what day it was. In short, the addled goose was more off-kilter than I had been in the Netherlands for my keynote at the Hartmann Utrecht conference and my meetings in Paris squished around the Utrecht gig.
I poked my head into about half of the sessions. I heard about managing search, taxonomies, business intelligence, and product pitches disguised as analyses. I’m going to be 65; I was tired; and I had heard similar talks a few days earlier in Europe. The challenges facing those involved with search are reaching a boiling point.
After dipping into the presentations, including the remarkable Ahead in the Clouds talk by Dr. Werner Vogels, top technical gun at Amazon, and some business process management razzle dazzle, I went back to the drawing board for my talk. I had just reviewed usage data that revealed that Google’s lead in Web search was nosing towards 70 percent of the search traffic. I also had some earlier cuts at the traffic data for the Top 50 Web sites. In the two hours before my talk, I fiddled with these data and produced an interesting graph of the Web usage. I did not use it in my talk, sticking with my big images snagged from Flickr. I don’t put many words on PowerPoint slides. In fact, I use them because conference organizers want a “paper”. I just send them the PowerPoint deck and give my talk using a note card which I hold in my hand or put on the podium in front of me. I hate PowerPoints.
Here’s the chart I made to see how the GOOG was doing in terms of Microsoft and Yahoo.
Source: http://blogs.zdnet.com/ITFacts/
The top six sites are where the action is. The other 44 sites are in the “long tail”. In this case, the sites outside the top six have few options for getting traffic. The 44 sites accounted in August 2008 for a big chunk of the calculated traffic, but no single site is likely to make it into the top six quickly. Google sits on top of the pile and seems to be increasing its traffic each month. Google monetizes its traffic reasonably well, so it has generated $18 billion or so in the last 12 months.
In the enterprise search arena, I have only “off the record” sources. These ghostly people tell me that Google has:
- Shipped 24,600 Google Search Appliances. For comparison, Fast Search & Transfer, prior to its purchase by Microsoft, had somewhere in the neighborhood of 2,500 enterprise search platform licensees. Now, of course, Fast Search has access to the 100 million happy SharePoint customers. Who knows what the Fast Search customer count is now? Not me.
- Become the standard for mapping in numerous government agencies, including those who don’t have signs on their buildings
- Been signing up as many as 3,000 Google Docs users per day, excluding the 1.5 million school children who will be using Google services in New South Wales, Australia.
I debated how to spin these data. I decided to declare, “Google has won the search battle in 2008 and probably in 2009.” Not surprisingly, the audience was disturbed by my assertion. Remember, I did not parade these data. I used pictures like this one to make my point. This illustration shows a frustrated enterprise search customer setting fire to the vendor’s software disks, documentation, and one surly consultant:
How did I build up to the conclusion that Google has won the 2008-2009 search season? Here are the main points and some of the illustrations I used in my talk.
Cognos 8: Blurring Business Intelligence and Search
October 4, 2008
The death of enterprise search and the wobblies pulling down content management systems (CMS) are not well understood by licensees–yet. In the months going forward, the growing financial challenges in North America and Western Europe will take a toll on spending for information technology. The strong interest (based on my analysis of the clicks on the articles on this Web site) suggests that some folks are thinking hard about the utility of open source search systems and lower-cost alternatives to the seven-figure price tags on some of the high profile search systems. I can’t mention these firms by name. My attorney is no fun at all. You can identify these vendors by going to almost any Web search system and keying the phrase “enterprise search” or “information access”. You can figure out the rest of the information from these results pages.
IBM baffles me. The company offers more information products and services than any other firm I track. Each year I try to sort out the product and service names. This year I noticed this information buried deep in one of the news stories about the new version of Cognos 8. My source is here.
My hunch is that IBM is creating a new map for business intelligence. On that map, IBM will point out the big X where the real high value payoff may be found. Here’s the pertinent passage from the IBM Cognos news release:
IBM’s recent CEO and CIO surveys have found unstructured corporate information such as user files, customer comments, medical images, Web and rich media content to be growing at 63%. The explosive growth of this type of business information has pushed the convergence of the BI and Search categories. It has created demand for new BI search capabilities to provide quick and easy access to both ranked and relevant BI content and unstructured information. Newly updated, IBM Cognos 8 Go! Search v4 lets any business user extend the decision-making capabilities of IBM Cognos 8 BI by securely accessing and dynamically creating BI content using simple key-word search criteria. The software works with popular enterprise search applications such as IBM OmniFind Enterprise Edition, Google, Yahoo and Autonomy so users can see structured, trusted BI content and unstructured data such as Word documents and PDF’s in the same view within a familiar interface. Users can search all fully-indexed metadata as well as titles and descriptions within a report. Search-assisted authoring and exploration gives them options to refine queries or analyze data cubes based on search terms. These capabilities speed access to the most relevant business information regardless of naming similarities between reports, helps business users quickly refine queries as required and frees IT from constantly re-creating commonly used reports. This leaves IT with more time for strategic business initiatives. The software is completely integrated with the web-based administration and security parameters set by IT administrators for IBM Cognos 8 BI. This integration provides a centralized, efficient approach to administration and security and effectively addresses two common areas of concern for resource-constrained IT departments, who want to provide more autonomy to business users, but need a single administration point and assurance that corporate authentication policies will be maintained. ‘These new enhancements to our Go! Portfolio provide business-driven performance information to help each area of the organization strategically manage the information that is most pertinent to them,’ said Leah MacMillan, vice president, product marketing, Cognos, an IBM Company. ‘Both the business and IT gain more autonomy whether employees are in the office searching, monitoring and analyzing business outcomes or on the road looking for new business updates or geographically relevant information.’ The IBM Cognos 8 Go! Portfolio of software is a key component of IBM’s Information Agenda, a new approach consisting of industry-specific software and consulting services geared to helping customers use information as a strategic asset across their businesses. [Emphasis added]
Let me deconstruct this passage using my addled goose methods.
Intel’s Interest in Medical Terminology Translation
October 4, 2008
Intel continues to be a slippery fish when it comes to search and content processing. The ill-fated Convera deal burned through millions in the early 2000s. Earlier this year, Intel pumped cash into Endeca, one of the two high-profile enterprise search vendors known for their ecommerce and information access systems. (The other vendor is Autonomy. Fast Search & Transfer seems to be shifting from a vendor to an R&D role, but its trajectory remains unclear to me.)
Intel has one engineer thinking about language. The posting on an Intel Software Network Web log, “Designing for Gray Scale: Under the Hood of Medical Terminology Translation,” is suggestive. The author is Joshua Painter, who identifies himself with Intel. You can read this post here. Translation of scientific, technical, and medical terminology is somewhat easier than translating general business writing. The task is still difficult, particularly when a large pharmaceutical company wants to monitor references to a drug’s formal and casual names in English and non-English document sets.
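To make the monitoring problem concrete, here is a minimal, hypothetical Python sketch that normalizes a drug’s formal and casual names (including a non-English variant) to a single canonical term before matching documents. The names and mappings are invented for illustration; real medical terminology services are built on controlled vocabularies and standards, not hand-made dictionaries.

```python
import re

# Invented example vocabulary: variant names mapped to one canonical term.
TERMINOLOGY = {
    "acetylsalicylic acid": "aspirin",
    "asa": "aspirin",
    "aspirina": "aspirin",   # casual / non-English variant
    "aspirin": "aspirin",
}

def find_canonical_terms(text):
    """Return the canonical terms whose variants appear in the text."""
    found = set()
    lowered = text.lower()
    for variant, canonical in TERMINOLOGY.items():
        if re.search(r"\b" + re.escape(variant) + r"\b", lowered):
            found.add(canonical)
    return found

documents = [
    "Patient reported taking ASA daily.",
    "La aspirina se utiliza para el dolor.",
    "No relevant medication mentioned.",
]

for doc in documents:
    print(doc, "->", find_canonical_terms(doc))
```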
Mr. Painter’s write up concerns standards; specifically, “data standards in enabling interoperability in healthcare.” For me the interesting passage in this write up was:
An architecture for Health Information Exchange must accommodate choice and dealing with change – it must be designed for grayscale. This includes choice of medical vocabularies, messaging standards, and other terminology interchange considerations. In my last post I introduced the notion of a Common Terminology Services to deliver a set of capabilities in this space. In this post, I will discuss a technical architecture for enabling this.
The word grayscale, I think, means fuzziness. Intel makes these tantalizing pieces of information available, and I continue to watch for them. My hunch is that Intel wants to put some content centric operations in silicon. Imagine. Endeca on a multi core chip. So far this is speculation, but it is clear that juiced hardware can deliver some impressive content processing performance boosts. Exegy’s appliance demonstrates the value of this hardware angle.
Stephen Arnold, October 4, 2008
Endeca Pursues Publishers
October 3, 2008
MarkLogic has been making headway in the world of publishing. I know that I have predicted the demise of traditional newspaper, magazine, and book publishers, but there is life in a number of publishing sectors. Publishers–spurred by amateur journalists like this addled goose and fast changing Web companies like Google–have been increasingly open to new technology. Nstein, a former content processing vendor, has worked hard to reposition some of its technology specifically for the publishing industry. Now Endeca is hopping on the bandwagon. One of the early entrants from the search and content processing sector was Fast Search & Transfer. Fast Search acquired a company in Utah and created a remarkable PowerPoint presentation showing Fast ESP (enterprise search platform) as the foundation of a next-generation newspaper. I’m not sure what happened to that initiative since Microsoft gobbled up Fast Search and turned Oslo’s engineers into the heart of Redmond’s search innovation effort.
Endeca, therefore, made a well-considered move years ago to tailor its technology to the needs of publishers. I heard that the company has more than 150 publishing clients. You can read about the services in the company’s news release here or a boiled-down version from Customer Interaction Solutions here. According to Endeca’s Steve Papa:
Media and publishing represents one of Endeca’s largest and fastest growing areas of focus. Web and mobile platforms, once seen as a required complement to traditional print and broadcast mediums, have rapidly become the primary area for new product creation and revenue growth. We’re working closely with our most innovative clients and partners to develop next-generation offerings that deliver a differentiated cross-medium experience, simplify the re-use of content across media platforms, and create new opportunities to monetize text, audio and video assets.
The question becomes, “With more search and content processing vendors chasing publishing companies, will the vendors be able to deliver enough value to warrant the high license fees some vendors charge?” Price competition may force some of the smaller, less well-known vendors to park on the side of the information highway hoping another ride comes along. “Value”, as I use the term, means that these potent systems scale economically, deliver good performance, and accommodate change without requiring a Roman legion of programmers. In my experience, publishers often lack a good understanding of the problems their own content creates for them. Publishers often don’t want search; publishers want the ability to create new information products from existing content. The ideal system delivers what publishers call “content repurposing” without requiring expensive, vain, and erratic human editors. Publishers would prefer life without equally expensive, vain, and erratic authors if possible. Publishing looks like an ideal market, but in some ways it is a difficult sector in which to gain traction and make sales. Sci-tech publishers want to “own” a solution so competitors can’t enjoy the benefits of a level playing field.
You can learn more about Endeca here.
Stephen Arnold, October 3, 2008