Attivio’s Sid Probstein: An Exclusive Interview

February 25, 2009

I caught up with Sid Probstein, Attivio’s engaging chief technologist, on February 23, 2009. Attivio is a new-breed information company. The company combines a number of technologies to allow its licensees to extract more value from structured and unstructured information. Mr. Probstein is one of the speakers at the Boston Search Engine Meeting, a show that is now recognized as one of the most important venues for those serious about search, information retrieval, and content processing. You can register to attend this year’s conference here. Too many conferences feature confusing multitrack programs, cavernous exhibit halls, and annoyed attendees who find that the substance of the program does not match the marketing hyperbole. When you attend the Boston Search Engine Meeting, you have opportunities to talk directly to influential experts like Mr. Probstein. The full text of the interview appears below.

Will you describe briefly your company and its search / content processing technology? If you are not a company, please, describe your research in search / content processing.

Attivio’s Active Intelligence Engine (AIE) is powering today’s critical business solutions with a completely new approach to unifying information access. AIE supports querying with the precision of SQL and the fuzziness of full-text search. Our patent-applied-for query-side JOIN() operator allows relational data to be manipulated as a database would, but in combination with full-text operations like fuzzy search, fielded search, Boolean search, etc. Finally, our ability to save any query as an alert, and thereafter have new data trigger a workflow that may notify a user or update another system, brings a sorely needed “active” component to information access.
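To make the idea of joining relational data with full-text matches at query time more concrete, here is a minimal Python sketch. It is not Attivio’s JOIN() operator or any part of the AIE API; the sample data and the names `build_index` and `search_join` are hypothetical, and the matching is a plain exact-term lookup rather than fuzzy search.

```python
from collections import defaultdict

# Hypothetical sample data: unstructured documents plus a relational table.
documents = {
    1: {"customer_id": 10, "text": "Customer reports billing error on invoice"},
    2: {"customer_id": 11, "text": "Question about rollover paperwork"},
    3: {"customer_id": 10, "text": "Follow-up on billing dispute"},
}
customers = {
    10: {"name": "Acme Corp", "region": "East"},
    11: {"name": "Globex", "region": "West"},
}


def build_index(docs):
    """Map each lowercased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for term in doc["text"].lower().split():
            index[term].add(doc_id)
    return index


def search_join(index, docs, table, term, **filters):
    """Full-text match on `term`, joined at query time to rows of `table`.

    The join key (customer_id) stays on the document, so the relationship
    is kept rather than flattened away at indexing time.
    """
    results = []
    for doc_id in sorted(index.get(term.lower(), ())):
        row = table.get(docs[doc_id]["customer_id"])
        if row is not None and all(row.get(k) == v for k, v in filters.items()):
            results.append((doc_id, row["name"]))
    return results


index = build_index(documents)
hits = search_join(index, documents, customers, "billing", region="East")
```

Here `hits` holds the documents that mention “billing” and belong to a customer in the East region, which is the kind of mixed full-text and relational question the answer above describes.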

By extending enterprise search capabilities across documents, data and media, AIE brings deeper insight to business applications and Web sites. AIE’s flexible design enables business and technology leaders to speed innovation through rapid prototyping and deployment, which dramatically lowers risk, an important consideration in today’s economy. Systems integrators, independent software vendors, corporations and government agencies partner with Attivio to automate information-driven processes and gain competitive advantage.

What are the three major challenges you see in search / content processing in 2009?

May I offer three plus a bonus challenge?

First, understanding structured and unstructured data; currently most search engines don’t deal with structured data as it exists; they remove or require removal of the relationships. Retaining these relationships is the key challenge and a core value of information access.

Second, switching from the “pull” model in which end-users consume information, to the “push” model in which end-users and information systems are fed a stream of relevant information and analysis.

Third, being able to easily and rapidly construct information access applications. The year-long implementation cycle simply won’t cut it in the current climate; after all, that was the status quo for the past five years: long, challenging implementations, as search was still nascent. In 2009 what took months should take weeks. Also, the model has to change. Instead of trying to determine exactly how to build your information access strategy (the classic “aim, fire” approach, which often misses!), the new model is to “fire” and then “aim, aim, aim”: correct your course and learn as you go so that you ultimately produce an application you are delighted with.

I also want to mention supporting complex analysis and enrichment of many different forms of content. For example: identifying important fields, from a search perspective; detecting relationships between pieces of content, or entire silos of content. This is key to breaking down silos, something leading analysts agree will be a major focus in enterprise IT starting in 2011.
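The “push” model from the second challenge, in which a saved query fires when matching data arrives, can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the `AlertRegistry` class and its methods are hypothetical names, matching is an all-terms-present check, and the “workflow” is just a callback.

```python
class AlertRegistry:
    """Holds saved queries and pushes newly ingested items that match them."""

    def __init__(self):
        self._alerts = []  # list of (required terms, callback) pairs

    def save_query(self, terms, callback):
        """Save a query (a set of required terms) as a standing alert."""
        self._alerts.append(({t.lower() for t in terms}, callback))

    def ingest(self, text):
        """Check new content against every saved query; return alerts fired."""
        words = set(text.lower().split())
        fired = 0
        for terms, callback in self._alerts:
            if terms <= words:  # every required term appears in the new item
                callback(text)
                fired += 1
        return fired


notifications = []
registry = AlertRegistry()
registry.save_query(["rollover", "completed"], notifications.append)
registry.ingest("Customer 401k rollover completed by phone")
registry.ingest("Customer asked about fees")
```

Only the first ingested item lands in `notifications`; the consumer never had to run the query again, which is the difference between pull and push.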

With search / content processing decades old, what have been the principal barriers to resolving these challenges in the past?

There are several hurdles. First, the inverted index structure has not traditionally been able to deal with relationships; just terms and documents. Second, a lack of tools to move data around, as opposed to simply obtaining content, has been a barrier for enterprise search in particular. There has not been an analog to “ETL” in the unstructured world. (The “connector” standard is about getting data, not moving it.) Finally, the lack of a truly dynamic architecture has meant having to re-index when changing configuration or adding new types of data to the index; also, a lack of support for rapid updates has led to a proliferation of paired search engines and databases.

With the rapid change in the business climate, how will the increasing financial pressure on information technology affect search / content processing?

Information access is critically important during a recession. Every interaction with the customer has the potential to cause churn. Reducing churn is less costly by far than acquiring new customers. Good service is one of the keys to retaining customers, and a typical cause of poor service is … poor information access. A real life example: I recently rolled over my 401K. I had 30 days to do it, and did so on the 28th day via phone. On the 29th day someone else from my financial services firm called back and asked me if I wanted to roll my 401K over. This was quite surprising. When I asked why the representative didn’t know I had done it the day before, the answer was “I don’t have access to that information”. The cost of that information access problem was two phone calls: the second rollover call, and then another call back from me to verify that I had, in fact, rolled over my 401K.

From the internal perspective of IT, demand to turn around information access solutions will be higher than ever. The need to show progress quickly has never been greater, so selecting tools that support rapid development via iteration and prototyping is critically important.

Search / content processing systems have been integrated into such diverse functions as business intelligence and customer support. Do you see search / content processing becoming increasingly integrated into enterprise applications?

Search is an essential feature in almost every application used to create, manage or even analyze content. However, in this mode search is both a commodity and a de facto silo of data. Standalone search and content processing will still be important as it is the best way to build applications using data across these silos. A good example here is what we call the Agile Content Network (ACN). Every content management system (CMS) has at least minimal search facilities. But how can a content provider create new channels and micro-sites of content across many incompatible CMSs? Standalone information access that can cut across silos is the answer.

Google has disrupted certain enterprise search markets with its appliance solution. The Google brand creates the idea in the minds of some procurement teams and purchasing agents that Google is the only or preferred search solution. What can a vendor do to adapt to this Google effect?

It is certainly true that Google has a powerful brand. However, vendors must promote transparency and help educate buyers so that they realize, on their own, the fit or non-fit of the GSA. It is also important to explain how what your product does is different from what Google does and how those differences apply to the customers’ needs for accessing information. Buyers are smart, and the challenge for vendors is to be sure to communicate and educate about needs, goals and the most effective way to attain them.

A good example of the Google brand blinding customers to their own needs is detailed in the following blog entry: http://www.attivio.com/attivio/blog/317-report-from-gilbane-2008-our-take-on-open-source-search.html

As you look forward, what are some new features / issues that you think will become more important in 2009? Where do you see a major break-through over the next 36 months?

I think that there continue to be no real standards around information access. We believe that older standards like SQL need to be updated with full-text capabilities. Legacy enterprise search vendors have traditionally focused on proprietary interfaces or driving their own standards. This will not be the case for the next wave of information access companies. Google and others are showing how powerful language modeling can be. I believe machine translation and various multi-word applications will all become part of the landscape in the next 36 months.

Mobile search is emerging as an important branch of search / content processing. Mobile search, however, imposes some limitations on presentation and query submission. What are your views of mobile search’s impact on more traditional enterprise search / content processing?

Mobile information access is definitely emerging in the enterprise. In the short term, it needs to become the instrument by which some updates are delivered – as alerts – and in other cases it is simply a notification that a more complex update – perhaps requiring a laptop – is available. In time mobile devices will be able to enrich results on their own. The iPhone, for example, could filter results using GPS location. The iPhone also shows that complex presentations are increasingly possible.

Ultimately, the mobile device, like the desktop, call center, digital home, and brick-and-mortar store kiosk, is an access and delivery channel. Getting the information flow for each to work consistently while taking advantage of the intimacy of the medium (e.g. GPS information for mobile) is the future.
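As a rough illustration of filtering results by GPS location, the Python sketch below keeps only results within a given distance of the device. It assumes each result carries latitude and longitude fields; the sample results, the function names, and the 50 km radius are all hypothetical, and distance is computed with the standard haversine formula on a spherical Earth.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def filter_nearby(results, lat, lon, max_km):
    """Keep only results located within max_km of the device position."""
    return [r for r in results
            if haversine_km(lat, lon, r["lat"], r["lon"]) <= max_km]


# Hypothetical search results that carry coordinates.
results = [
    {"title": "Boston branch", "lat": 42.3601, "lon": -71.0589},
    {"title": "New York branch", "lat": 40.7128, "lon": -74.0060},
]
nearby = filter_nearby(results, 42.36, -71.06, 50)
```

With the device located in Boston, `nearby` contains only the Boston result.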

Where can I find more information about your products, services, and research?

The best place is our Web site: www.attivio.com.

Stephen Arnold, February 25, 2009

YAGG: Google Talk

February 24, 2009

Tweets and posts are flying by about an alleged phishing exploit for Google email. Mashable reports here that another issue may be poking its snout into hapless users’ lives. Adam Ostrow wrote:

Gmail is now being attacked by a phishing scam that is spreading like wildfire.

If true, YAGG strikes again. You get a message, “check me out”, with a link to a tinyurl. Click the puppy and you go to “a site called ViddyHo.” Lucky you. Your contacts get an email. Nifty. Love those tiny urls which mask the destination url.

Stephen Arnold, February 24, 2009

Amazon: Outage Reported

February 24, 2009

The old US of A’s computing infrastructure seems to be showing that it ain’t what it used to be. ComputerWorld’s Sumner Lemon wrote here “Amazon Search Engine Suffers Brief Outage.” I have not been too thrilled with some of the features of the A9 system. But my quibbles are minor compared to the search system’s not working. Search is the means by which Amazon generates the bulk of its money. The vaunted cloud services are still modestly sized French fries at the Amazon revenue feast. The system was down for about an hour, but, hey, cloud services are supposed to have rock solid uptime, this addled goose thought.

Stephen Arnold, February 24, 2009

Google: Yet Another Google Glitch

February 24, 2009

YAGG (a new acronym pronounced like “gag” as in choke) has been coined by the goslings in the mine drainage pond. The addled goose has little to add to the Washington Post’s headline “Trouble in the Clouds: Gmail Turns into Gfail” here. Xooglers are sending nasty grams to other Googlers, when Gmail and MOMA work, obviously. Glitches are becoming very Vista-like in the opinion of the addled goose. The reasons offered by mainstream dead tree publications omit such interesting causes as:

  • Googlers are smart, but the size of the company has made the culture susceptible to the Microsoft product management disease
  • Dependencies within the system are usually trapped by Google’s compile time checks and the peer quality assurance project, but as more Googlers become too busy, little errors can grow up to be big mistakes. Google has not created an Orkut class issue, but the Gmail issue is more immediate
  • Problems are evident in such unrelated areas as ad metrics, malware flagging, and today mail.

Too bad there is no competitor in a position to challenge the GOOG. A decade of indifference has created a culture of failure among Google’s direct competitors and now a soupçon is evident to the addled goose in some Google functions. Just my honking opinion. I don’t have a fix. The future is evident to me for some Google services. I can see that vista before me. Can you? More pointedly, can you see your Gmail?

Stephen Arnold, February 24, 2009

Another Stunner: Enterprise Software, Downright Louse Ridden

February 24, 2009

Every once in a while, the editors at CNet allow or encourage friskiness. A good example is Matt Asay’s “Why Enterprise Software Is So Shockingly Bad” here. One would think that in the present economic meltdown, one would want to rub oil on the vendors’ shoulders and say, “You are so wonderful to me.” Guess not. Mr. Asay does the commercial equivalent of the now retired “Golden Fleece Award” given by a feisty senator for government fumbles. Mr. Asay identifies four issues courtesy of a person named Michael Nygard. I don’t feel comfortable repeating these in full. I would like to highlight one of the reasons why enterprise software is not too good and then offer a comment. Mr. Asay picks up Mr. Nygard’s point that enterprise vendors have captive audiences. I agree. I don’t think the word captive does the method justice. In my opinion, companies are given a taste of the software. Like a junkie hooking middle school kids, the first experience is really good. The middle school kid takes the bait and then finds himself or herself hooked and in the thrall of the dealer who controls how much good feeling results from the addict’s behavior. The notion is “lock in”. The method is addiction. Once an enterprise is hooked, the enterprise cedes responsibility to the vendor. In some cases, the vendor works with a consultant. No matter. The client can’t escape even if he or she wants to. What’s the trajectory of this approach? Well, check out the sorry state of information technology budgets. Look at the dissatisfaction with existing CRM, CMS, or enterprise search systems. Scary thought to this addled goose.

Stephen Arnold, February 24, 2009

More on the Publisher Who Violated Copyright

February 24, 2009

A quick update: believe it or not, the publisher whose staff emailed me another author’s complete work with a transmittal note saying “Of interest” stunned me. One of this company’s writers actually posted a story about how awful this action was. Amazing. The publishers I trust (Gilbane, Galatea, Infonortics) were radio silent, and for good reason. None of these outfits is disorganized, clueless, or disrespectful of copyright. Amazing. No wonder so many organizations are floundering in today’s economic climate. From banking to publishing, from automobile sales to advertising production: tricks, angles, short cuts, and disregard for appropriate behavior are everywhere. No, I won’t name this outfit. The goose does not need legal eagles darkening the sky above my pond filled with mine run off.

Stephen Arnold, February 24, 2009

ISYS Taps Smith as President of Its Americas Division

February 24, 2009

ISYS Search Software, http://www.isys-search.com/, which specializes in enterprise search, has named Bob Smith president of its Americas division. He brings to ISYS more than 20 years of software sales and management experience; the hiring signals another company investment in growth. The company’s enterprise search suite here offers solutions for the desktop, the Web, and e-mail, plus an SDK and more, all out of the box. The company was already showing record sales with its new aggressive growth plans. To his credit, Smith has shown some real savvy in the past: He last worked at Format Dynamics, nailing down the company’s first major commercial deals, and at Ping Identity, he grew company income by $14 million in 24 months. ISYS Search Software is making big moves in the funding game, and its future’s looking bright.

Jessica W. Bratcher, February 24, 2009

Googler Allegedly Says, Aloha, Suckers

February 24, 2009

Dead tree outfits and high tech are not like peanut butter and jelly. I don’t know if this interesting item that was published in the Los Angeles Times’s Web log here is accurate. Nevertheless, I find it suggestive of the changing attitude of some big time news outfits toward Google and, if it is true, of the dissension or dissatisfaction that seems to be more evident within and around the Google halo. The headline is a catchy “Googler Goodbye E-Mail: So Long, Suckers. I’m Out”. For me, the most interesting item about Jason Shugars’ e-mail, if true, was this passage from the LA Times:

But Jason Shugars worked at Google, whose off-center corporate culture is more forgiving than that of your average buttoned-down investment bank. In the rest of his goodbye, Shugars, a senior sales compliance specialist, reminisced about workplace moments that included putting cake down his pants at a sales conference, stealing a boss’ $8,000 leather couch and singing “Hit Me Baby One More Time” in a miniskirt and braids.

I quite liked the cake down the pants, but on reflection the cross dressing riff was most memorable. The LA Times’s article concluded: “There are, quite simply, no rules.” Not too reassuring for enterprise software procurement teams in my opinion.

Stephen Arnold, February 24, 2009

Microsoft FAST 5.x Scripting Vulnerability

February 24, 2009

I have not verified this news item posted by Security Focus here. If the allegation is true, then “FAST ESP is prone to a cross-site scripting vulnerability because it fails to sufficiently sanitize user-supplied data.” The alleged vulnerability was identified by Kentaro Ohshima.
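For readers unfamiliar with the term, “sanitizing user-supplied data” usually means escaping it before it is written into a page. The Python sketch below shows the general technique with the standard library’s `html.escape`; it is not FAST ESP code and says nothing about how the alleged flaw arises there. The function name and the sample query are hypothetical.

```python
import html


def render_results_header(query: str) -> str:
    """Build a results-page header with the user's query safely escaped."""
    # html.escape converts <, >, &, and quotes to entities, so the browser
    # displays the query as text instead of interpreting it as markup.
    return f"<p>Results for: {html.escape(query)}</p>"


# A query string that would inject a script if echoed back unescaped.
unsafe_query = '<script>alert("x")</script>'
page = render_results_header(unsafe_query)
```

The resulting `page` contains no literal `<script>` tag; the query text is shown to the user as plain characters.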

Stephen Arnold, February 24, 2009

Twitter Security: An Oxymoron

February 24, 2009

PCWorld’s Joan Goodchild wrote an interesting article about Twitter’s security issues here. She identifies three potential areas of concern. First, a url shortener can send a hapless user to an unknown and potentially harmful location. Second, she identifies a lack of email authentication. And, third, my favorite: Twitter can be useful to those who want to “follow” a person. The addled goose is confident that these three issues do not exhaust the security vulnerabilities. The goose does not directly Twitter, send tweets, or fiddle with Twitter ecosystem tools. Those who follow the goose often want to cook it. Could Twitter users get their geese cooked?

Stephen Arnold, February 24, 2009
