Silobreaker to Roll Out Report Feature

June 4, 2010

Short honk: We learned from a reader in Europe that Silobreaker plans to roll out a report feature in the fall of 2010. Silobreaker delivers high-value information in an easy-to-digest format. If you are not familiar with the company, navigate to http://www.silobreaker.com/. You can use the free service and upgrade to the industrial-strength version once you get a feel for the depth of the offering. Silobreaker supports both single- and multi-dimension queries. When you become a paying customer, you can configure a custom report on a topic of interest and specify daily, weekly, or monthly updates.

Stephen E Arnold, June 4, 2010

Freebie although I have been assured a fish treat the next time I track down a Silobreaker executive. Promises, promises. I would settle for an answer to my email queries.

Exalead Cloudview Lets Fingers Do the Walking and Caring

June 4, 2010

Yellow Pages Group selected Exalead CloudView to collect customer sentiment information for its phone application, Urbanizer. This innovative product is billed as the first restaurant recommendation application that aligns with the emotional element of consumer decision making.

Sys-Con Media reports in “Urbanizer iPhone Application Uses Exalead CloudView to Collect Customer Sentiment Data” that the new phone application allows users to choose from a selection of pre-defined moods or use Urbanizer’s equalizer function to create a custom mood based on combinations of cuisine, ambiance, and service categories. Exalead’s CloudView search-based application platform is embedded in the Urbanizer application architecture and uses semantic extraction capabilities to distill sentiment from the unstructured consumer comments posted to Urbanizer.

The advanced semantic technology that Exalead brings to the table seems to be reshaping the digital content landscape. CloudView collects data from virtually any source, in any format, and transforms it into structured business information that can be directly searched and queried.
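
Exalead does not disclose the internals of its semantic extraction pipeline, so treat the following as a toy illustration of the general idea, not CloudView's method. The lexicon, the category vocabularies, and the proximity heuristic are all invented for the example:

```python
# Toy lexicon-based sentiment extraction from restaurant comments.
# Everything here (lexicon, categories, window size) is hypothetical;
# CloudView's semantic extraction is far more sophisticated.
SENTIMENT = {"great": 1, "delicious": 1, "cozy": 1, "friendly": 1,
             "slow": -1, "rude": -1, "bland": -1, "noisy": -1}
CATEGORIES = {"cuisine": {"food", "dish", "menu", "delicious", "bland"},
              "ambiance": {"decor", "music", "cozy", "noisy"},
              "service": {"waiter", "staff", "friendly", "rude", "slow"}}

def score_comment(text: str) -> dict:
    """Turn one free-text comment into structured per-category scores."""
    words = [w.strip(".,!?") for w in text.lower().split()]
    scores = {cat: 0 for cat in CATEGORIES}
    for i, word in enumerate(words):
        polarity = SENTIMENT.get(word)
        if polarity is None:
            continue
        # Credit the sentiment word to any category whose vocabulary
        # appears within two words (a crude stand-in for semantic parsing).
        window = set(words[max(0, i - 2): i + 3])
        for cat, vocab in CATEGORIES.items():
            if window & vocab:
                scores[cat] += polarity
    return scores

print(score_comment("The food was delicious but the staff seemed rude."))
# {'cuisine': 1, 'ambiance': 0, 'service': -1}
```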

Melody K. Smith, June 4, 2010

A freebie but maybe a Coca lite when I am next in Paris?

SAS Text Analytics and Teragram

May 28, 2010

I received a call about Teragram, the text processing company that SAS acquired a couple of years ago. I did a quick Overflight check and realized that I had not documented the absorption of Teragram into SAS. Teragram’s technology is alive and well, but SAS now positions the content processing technology as a component of SAS Text Analytics. The product line has its own subsite within SAS.com; you can locate the details at http://www.sas.com/text-analytics/.

Another important point is that SAS Text Analytics includes four components. The first is the SAS Enterprise Content Categorization function. The system parses content, identifies entities, and creates metadata along with category rules.
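
SAS does not publish its rule syntax on the subsite, but the flavor of rule-based categorization is easy to sketch. Here is a hypothetical example; the categories and patterns are invented, and this is not SAS's actual rule language:

```python
import re

# Hypothetical category rules: a category fires when any of its patterns
# matches the document. Invented for illustration; not SAS's rule syntax.
RULES = {
    "Mergers & Acquisitions": [r"\bacquir\w*\b", r"\bmerger\b", r"\btakeover\b"],
    "Litigation": [r"\blawsuit\b", r"\bplaintiff\b", r"\bsettlement\b"],
}

def categorize(document: str) -> list:
    """Return every category whose rules match the document."""
    text = document.lower()
    return [category for category, patterns in RULES.items()
            if any(re.search(pattern, text) for pattern in patterns)]

print(categorize("The plaintiff alleged the takeover breached the merger terms."))
# ['Mergers & Acquisitions', 'Litigation']
```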

The second function is SAS Sentiment Analysis. A number of companies are competing in this sector. The SAS approach sucks in emails, tweets, and other documents. The system identifies various subjective shades in the source content.

SAS Text Miner now includes both text and data mining operations. The system is not one of those Web 2.0 “it is really easy” solutions. It is easy to use, but to put “easy” in context, you will need programming and statistical savvy along with solid data-set-building skills.
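
To put some meat on the “statistical savvy” remark: most text mining starts from a weighted document-term matrix. A minimal TF-IDF sketch follows; the toy corpus is invented, and SAS Text Miner's internals are, of course, more elaborate:

```python
import math
from collections import Counter

# Minimal TF-IDF over a toy corpus: the document-term weighting that
# underlies much text mining. Illustrative only.
docs = ["search engines index documents",
        "text mining needs clean documents",
        "statistics drives text mining"]

tokenized = [doc.split() for doc in docs]
doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
n_docs = len(docs)

def tfidf(tokens):
    """Weight each term by frequency in the doc and rarity in the corpus."""
    tf = Counter(tokens)
    return {term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in tf.items()}

for i, tokens in enumerate(tokenized):
    print(i, {term: round(weight, 3) for term, weight in tfidf(tokens).items()})
```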

The SAS Ontology Management solution provides a centralized method for keeping index terms and metatags consistent. Sounds easy, but this type of consistency is the difference between useful and useless information. SharePoint lacks this type of functionality. You have been given a gentle reminder about consistent tagging, dear SharePoint user.
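
Here is what consistent tagging looks like in miniature: a controlled vocabulary that collapses variant index terms onto one preferred term. The vocabulary below is hypothetical, and the dictionary lookup is a stand-in for a real ontology manager:

```python
# Hypothetical controlled vocabulary: variant terms map to one preferred
# term so every document gets tagged consistently.
PREFERRED = {
    "acetylsalicylic acid": "aspirin",
    "asa": "aspirin",
    "myocardial infarction": "heart attack",
    "mi": "heart attack",
}

def normalize_tags(raw_tags):
    """Collapse free-form tags onto the controlled vocabulary."""
    return sorted({PREFERRED.get(tag.lower(), tag.lower()) for tag in raw_tags})

# Two indexers, two vocabularies, one consistent result.
print(normalize_tags(["ASA", "Myocardial Infarction"]))   # ['aspirin', 'heart attack']
print(normalize_tags(["aspirin", "heart attack"]))        # ['aspirin', 'heart attack']
```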

SAS has a blog focused on text analytics, “The Text Frontier”, but the last time I checked, the blog’s most recent update was posted in March 2010.

Bottom line: Teragram is alive and well, just part of SAS Text Analytics.

Stephen E Arnold, May 28, 2010

Freebie

Sybase Touts Search Prowess

May 27, 2010

“Sybase IQ Update Strengthens Database Query, Search Features” is, I suppose, a response of sorts to my opinion that SAP has not made a commitment to search. With TREX and some Endeca support, SAP seems content to rely upon third parties to make content findable within a sprawling SAP construct.

The story trots out an azure chip consultant to dash around the circus ring. I love those azure chip “we” statements. Right. The core of the announcement is that Sybase IQ 15.2 can search unstructured content and perform such tricks as computing term frequency. Yep, that’s text analytics.

The passage that caught my attention was this:

“Sybase IQ is best known for its extreme performance, allowing decision-makers to analyze business trends, predict outcomes, and revise strategies, often in a matter of seconds,” Joydeep Das, Director of Product Management, Data Warehousing and Analytics at Sybase, said in a statement. “With Sybase IQ 15.2, enterprises are now able to analyze previously untapped sources of information, such as web content and email, to deliver smarter answers across structured and unstructured data.”

What is Sybase IQ? I dug through my Overflight file on this product and jotted down these points:

  • This is a column-based database. The column approach stacks data vertically, not in the Excel-like horizontal row format. Arguments between the row-store and column-store camps are esoteric. In a nutshell, certain types of processes are faster with the column approach; see the sketch after this list. Keep in mind that it is a relational database, and some RDBMS jockeys can make row stores into columnar structures.
  • For certain types of data, reads are faster. Some data warehouse jockeys swear by the column method. Sun was a cheerleader, but we know what happened to Sun, so that may not be the endorsement it once was. Keep in mind that Sybase itself was acquired after compiling an interesting financial track record, but those offices are cool looking.
  • The company’s definition of data federation confused me. Does Sybase perform a Mark Logic type of function, creating a repository? Does Sybase work like the original Vivisimo federating method? What happens if I need to see the source? Sybase keeps its indexes within its own system, and I am not sure how IQ queries external databases, makes sure security is observed, and then returns results without getting confused about who provided what and which data item is visible to a particular user. But azure chip folks are happy to parrot “federated”, cash their check, and move on to the next client.
  • You can, if you hurry, download the Sybase description of IQ’s architectural strengths at this link. I checked it on May 26, 2010, and it was valid. Wait too long, and the PDF may be unavailable.
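
To make the row-versus-column distinction concrete, here is a toy sketch. It has nothing to do with Sybase IQ's actual storage engine; it just shows why scanning one field is cheaper when each column lives in its own contiguous array:

```python
# Toy contrast of row-oriented and column-oriented layouts. Illustrative
# only; Sybase IQ's storage engine is far more elaborate.
rows = [("acme", 2009, 1_200_000),
        ("acme", 2010, 1_500_000),
        ("zenith", 2010, 900_000)]

# Columnar layout: one contiguous array per field.
columns = {
    "customer": [r[0] for r in rows],
    "year":     [r[1] for r in rows],
    "revenue":  [r[2] for r in rows],
}

# Summing one field: the row store walks every full tuple; the column
# store reads only the single array it needs, so the scan touches less data.
total_from_rows = sum(r[2] for r in rows)
total_from_columns = sum(columns["revenue"])
assert total_from_rows == total_from_columns == 3_600_000
```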

With Sybase in a new home, a product at Version 15 will have an opportunity to show how it can grow and contribute to the SAP franchise. With SAP pursuing inorganic growth, the expectations and the timeline will be key factors, in my opinion. The database sector has some dominant players, with upstarts like Google ready to enter the fray. Can Sybase challenge IBM, Microsoft, Oracle, and the NoSQL crowd? I hope so.

Stephen E Arnold, May 27, 2010

Freebie

IBM Search Technology

May 25, 2010

Before I headed West last week, I participated in a discussion about IBM search technology. No one at the lunch meeting worked at IBM, but, hey, IBM is a giant in software and services and each person had a viewpoint.

One surprising factoid emerged from the chatter, and I wanted to snag it before it flew away like my first female goose friend. (She left for New York, abandoning the joys of rural Illinois for the bright lights of the big city. She probably ended up working at IBM in Armonk.)

IBM has a mini Web site embedded within its sprawling digital Uzbekistan. The page is “Enterprise Search Technology”; the subtitle is “Innovation Matters”. You can navigate directly to the page by clicking this link. Finding the page took some work, but you are welcome to experience the thrill of the hunt via IBM.com if you have some spare time.

IBM describes Trevi, an intranet search system. The system incorporates six technologies, illustrated in the diagram below:

[Diagram: the six component technologies of IBM’s Trevi intranet search system. Source: IBM, 2010.]

The factoid: the page seems to be an island in time. The featured researcher, Marcus Fontoura, offers some comments about problems in searching. A click returns a 404 error.

Interesting.

Stephen E Arnold, May 25, 2010

Freebie.

Exalead and Dassault Tie Up, Users Benefit

May 24, 2010

A happy quack to the reader who alerted us to another win by Exalead.

Dassault Systèmes (DS) (Euronext Paris: #13065, DSY.PA), one of the world leaders in 3D and Product Lifecycle Management (PLM) solutions, announced an OEM agreement with Exalead, a global software provider in the enterprise and Web search market. As a result of this partnership, Dassault will deliver discovery and advanced PLM enterprise search capabilities within the Dassault ENOVIA V6 solutions.

The Exalead CloudView OEM edition is dedicated to ISVs and integrators who want to differentiate their solutions with high-performing and highly scalable embedded search capabilities. Built on an open, modular architecture, Exalead CloudView uses minimal hardware but provides high scalability, which helps reduce overall costs. Additionally, Exalead’s CloudView uses advanced semantic technologies to analyze, categorize, enhance and align data automatically. Users benefit from more accurate, precise and relevant search results.

This partnership with Exalead demonstrates the ability of the ENOVIA V6 PLM solutions to serve as an open federation, indexing, and data warehouse platform for process and user data for customers across multiple industries. Dassault Systèmes PLM users will benefit from the Exalead-powered ENOVIA V6 solutions’ capacity to handle large data volumes, enabling PLM enterprise data to be easily discovered, indexed, and made instantly available for real-time search and intelligent navigation. Non-experts will be able to access PLM know-how and knowledge with the simplicity and performance of the Web in scalable, online collaborative environments. Moreover, PLM creators and collaborators will be able to find IP instantly in generic, business, product, and social content and turn it into actionable intelligence.

Stephen E Arnold, May 22, 2010

Freebie.

Twitter Sentiments: A Search Variant?

May 13, 2010

Quite a suggestive write up from DNA India called “Twitter Sentiments May Soon Replace Public Opinion Polls.” According to the write up, “combing Twitter for data can be as good a way of researching opinions as conducting an actual poll.” Instead of working through a traditional survey process with sampling, instrument drafting, and instrument testing, just comb the tweets. The notion of searching through data sets for a nugget gets replaced with an instant answer. For me the key point in the write up was:

Noah Smith, assistant professor of language technologies and machine learning in the School of Computer Science, said that the findings suggest that analyzing the text found in streams of tweets could become a cheap, rapid means of gauging public opinion on at least some subjects. He, however, warned that tools for extracting public opinion from social media text are still crude and social media remain in their infancy, so the extent to which these methods could replace or supplement traditional polling is still unknown.

What is the make-up of a Twitter message sample? Noise. That’s an understatement. Nevertheless, the idea is interesting and shows how “informazation” is becoming a significant method.
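
The mechanics the researchers describe are straightforward to sketch: score each tweet crudely, then aggregate into a daily time series. Everything below is hypothetical, including the lexicon and the tweets; the actual Carnegie Mellon work used far richer methods:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sketch of tweet-stream "polling": crude per-tweet sentiment
# aggregated into a daily series. Lexicon and tweets are stand-ins.
POSITIVE = {"good", "great", "hope"}
NEGATIVE = {"bad", "awful", "fear"}

def tweet_score(text: str) -> int:
    words = set(text.lower().split())
    return len(words & POSITIVE) - len(words & NEGATIVE)

tweets = [("2010-05-01", "great news on jobs"),
          ("2010-05-01", "awful day for the market"),
          ("2010-05-02", "hope things get good soon")]

daily = defaultdict(list)
for day, text in tweets:
    daily[day].append(tweet_score(text))

for day in sorted(daily):
    # The mean is a noisy proxy for public mood on that day.
    print(day, mean(daily[day]))
```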

Stephen E Arnold, May 13, 2010

Freebie.

Google Gets Guha Patent

May 12, 2010

Short honk: I learned today that Google received a patent for Ramanathan Guha’s 2005 invention “Aggregating Context Data for Programmable Search Engines”, US 7,716,199. Will the other PSE inventions find their way out of the USPTO’s cave of winds? This “aggregation” invention is significant, so the fate of the other Guha inventions may not matter.

Stephen E Arnold, May 12, 2010

Freebie.

News Flash: Data Mining May Not Be an Information Cure All

May 7, 2010

Technology can work wonders. Technology is supposed to make it easier for downsized organizations to perform with agility and alacrity. I am “into” technology, but I understand that the minimum-wage workers at airline counters and financial institutions operate within systems assumed to work as intended. These systems, in my opinion, work neither at the level of answering a simple question like “Is the flight on time?” nor at the more sophisticated level of “Where did this wire transfer come from?”

Why, then, is it a surprise that technology handles less familiar tasks with glitches or outright breakdowns? Still, I was surprised to read “NY Plot Highlights Limitations of Data Mining.” There were three reasons:

  1. The writer for Network World expresses gentle surprise that predictive systems don’t work well when applied to the actions of a single person. Network World documents lots of system glitches, so the gentle surprise is not warranted. (The arithmetic behind this point appears in the sketch after this list.)
  2. The story plants the seed that we have no choice but to rely on fancy content processing systems. Are there other options? None, if you rely on this article’s analysis. In my experience there are indeed options, but these are conveniently nudged to the margins.
  3. The dancing around data mining is specious. Text processing is a Rube Goldberg machine built with software. Get the assumptions, the inputs, or the algorithms wrong to even a slight degree, and guess what? The outputs are likely to be wrong.
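
The arithmetic behind point one is the base-rate problem. The numbers below are invented for the sketch; the point is that even a very accurate classifier drowns a rare signal in false positives:

```python
# Hypothetical base-rate arithmetic: screening a huge population for a
# vanishingly rare behavior. All numbers invented for illustration.
population = 300_000_000        # records screened
actual_plotters = 300           # assumed true positives in the population
sensitivity = 0.99              # P(flagged | plotter)
false_positive_rate = 0.001     # P(flagged | innocent)

flagged_true = actual_plotters * sensitivity
flagged_false = (population - actual_plotters) * false_positive_rate

precision = flagged_true / (flagged_true + flagged_false)
print(f"Total flags raised: {flagged_true + flagged_false:,.0f}")  # ~300,297
print(f"Chance a flag is a real plotter: {precision:.4%}")         # ~0.099%
```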

Here’s the passage I found interesting:

That fact is likely to provide more fodder for those who question the effectiveness of using data mining approaches to uncover and forecast terror plots. Since the terror attacks of Sept. 11, the federal government has spent tens of millions of dollars on data mining programs and behavioral surveillance technologies that are being used by several agencies to identify potential terrorists. The tools typically work by searching through mountains of data in large databases for unusual patterns of activity, which are then used to predict future behavior. The data is often culled from dozens of sources including commercial and government databases and meshed together to see what kind of patterns emerge.

In my experience, humans and text processing must work in an integrated way. Depend only on technology and the likelihood of getting actionable information that is immediately useful goes down. Even Google asks humans to improve on its machine translation outputs. Smart software may not be so smart.

Stephen E Arnold, May 7, 2010

Unsponsored post.

Milward from Linguamatics Wins 2010 Evvie Award

April 28, 2010

The Search Engine Meeting, held this year in Boston, is one of the few events that focus on the substance of information retrieval, not the marketing hyperbole of the sector. Now in its second decade, the conference features speakers who tackle challenging subjects. This year’s speakers addressed such topics as “Universal Composable Indexing” (Chris Biow, Mark Logic Corporation), “Innovations in Social Search” (Jeff Fried, Microsoft), and “From Structured to Unstructured and Back Again: Database Offloading” (Gregory Grefenstette, Exalead), plus a dozen other important topics.

[Photo, from left to right: Sue Feldman, Vice President, IDC; Dr. David Milward; Liz Diamond; Stephen E. Arnold; and Eric Rogge, Exalead.]

Each year, the best paper is recognized with the Evvie Award. The “Evvie” was created in honor of Ev Brenner, one of the pioneers in machine-readable content. After a distinguished career at the American Petroleum Institute, Ev served on the planning committee for the Search Engine Meeting and contributed his insights to many search and content processing companies. One of the questions I asked after each presentation was, “What did Ev think?” I valued Ev Brenner’s viewpoint, as did many others in the field.

The winner of this year’s Evvie Award is David R. Milward of Linguamatics for his paper “From Document Search to Knowledge Discovery: Changing the Paradigm.” Dr. Milward said:

Business success is often dependent on making timely decisions based on the best information available. Typically, for text information, this has meant using document search. However, the process can be accelerated by using agile text mining to provide decision-makers directly with answers rather than sets of documents. This presentation will review the challenges faced in bringing together diverse and extensive information resources to answer business-critical R&D questions in the pharmaceutical domain. In particular, it will outline how an agile NLP-based approach for discovering facts and relationships from free text can be used to leverage scientific knowledge and move beyond search to automated profiling and hypothesis generation from millions of documents in real time.
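
The shift from documents to answers is easier to see with a toy fact extractor. The pattern below is a crude regex stand-in; I2E uses real natural language processing, so treat this only as the shape of the idea:

```python
import re

# Toy pattern-based fact extraction from pharma-style sentences. A crude
# stand-in for NLP-driven systems like I2E: the output is structured
# facts, not a ranked list of documents.
PATTERN = re.compile(
    r"(?P<drug>\w+)\s+(?P<relation>inhibits|activates|binds)\s+(?P<target>[\w-]+)",
    re.IGNORECASE)

corpus = [
    "Our assay confirms imatinib inhibits BCR-ABL in vitro.",
    "Prior work showed metformin activates AMPK in hepatocytes.",
]

for sentence in corpus:
    for match in PATTERN.finditer(sentence):
        # Emit a (subject, relation, object) fact instead of a document hit.
        print((match["drug"], match["relation"].lower(), match["target"]))
# ('imatinib', 'inhibits', 'BCR-ABL')
# ('metformin', 'activates', 'AMPK')
```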

Dr. Milward has 20 years’ experience of product development, consultancy and research in natural language processing. He is a co-founder of Linguamatics, and designed the I2E text mining system which uses a novel interactive approach to information extraction. He has been involved in applying text mining to applications in the life sciences for the last 10 years, initially as a Senior Computer Scientist at SRI International. David has a PhD from the University of Cambridge, and was a researcher and lecturer at the University of Edinburgh. He is widely published in the areas of information extraction, spoken dialogue, parsing, syntax and semantics.

Presenting this year’s award were Eric Rogge, Exalead, and Liz Diamond, niece of Ev Brenner. The winner received a recognition award and a check for $500. A special thanks to Exalead for sponsoring this year’s Evvie.

The judges for the 2010 Evvie were Dr. David Evans (Evans Research), Sue Feldman (IDC), and Jill O’Neill (NFAIS).

Congratulations, Dr. Milward.

Stuart Schram IV, April 28, 2010

Sponsored post.
