
Pingar for Personalized Search

November 2, 2009

A happy quack to the reader who alerted me to Pingar, a SharePoint-centric content processing vendor with offices in New Zealand. The company, founded in 2006, announced its enterprise search solution for SharePoint in mid-October 2009. According to the company:

“The intelligent enterprise search tool that can be embedded into the upcoming release of Microsoft SharePoint Server 2010 will take the browsing out of browsing,” says Pingar’s co-founder and Managing Director Peter Wren-Hilton.

I stumbled on the notion of taking “the browsing out of browsing”. I use a browser to browse. If I don’t want to browse, I use another method. Nevertheless, the system, according to the company:

[the] solution goes inside data documents, finds the content the user is seeking and then places it into a dynamically generated PDF or XPS document, rather than just presenting a list of links like the traditional search model. Pingar’s solution also sorts the search into categories to minimize reading times.


Pingar.com

In a Pingar report, a hit includes a back link to the original source document.

According to the Tauranga Eastern Link Newsletter:

Pingar has developed dynamic software to create an ‘intelligent’ search engine, which enables users to type in a specific question and get an exact answer. Pingar’s new offices will be in Hong Kong’s prestigious Science & Innovation Park, close to one of its key partners as well as a range of potential customers in China and Asia. Sharon-May McCrostie, New Zealand’s Trade Commissioner to Hong Kong, endorses Pingar’s move into the Chinese market. “Pingar has a revolutionary, clever way of search that will transform so many industry sectors, including publishing and monetizing online content. It has been very exciting for us to see a New Zealand company taking on the world and making real inroads…


Google and Cloud Puffs

November 2, 2009

Editor’s Note: I am summarizing a conference session talk. It could be construed as a pitch for Google. That’s not what Beyond Search does, so this summary is all my own. Just a heads up. Jessica Bratcher

Google’s Michael Lock, director of Americas Sales & Operations, Google Enterprise, gave a spirited talk called “Top 5 Lessons Learned Selling and Marketing Cloud-based Computing” at the SIIA OnDemand 2009 conference, http://www.siia.net/OnDemand/2009/default.asp, in San Jose, CA, on Friday. An admitted former software salesman, Lock had a lot to say, including Cloud=Good, Software/Middleware=Bad. It was a stark statement, and he made a really great argument for apps out in the cloud.

Let me summarize Lock’s five lessons as related to moi. I’m a cloud-based computing Google apps user. Why?

  • It’s a free storage system. There’s only so much memory on my laptop. I hate my external hard drive because it was expensive, it’s clunky, and Vista won’t let me copy stuff off of it. I’ve been through at least five thumb drives in the past year. It’s comforting to know that if my laptop were stolen or if my hard drive fizzled again, my stuff would not be lost along with it.
  • It’s fast to access from anywhere, and in some cases whether I have an Internet connection or not. I can be at home, at Panera, in an airport, or at my mother’s.
  • It’s password-secured. As long as I have that password, I can access my stuff from anywhere and from computers not my own. Not that I have a pressing need for data security, but I don’t have to deal with security updates. Every. Other. Day.
  • It’s part of a cheap (or for me, an individual, free!) suite of interrelated products. You know how expensive a Microsoft Office Suite is. Ouch.
  • I don’t have to deal with learning/using Office or similar software to make my work happen. Lock mentioned the agony of upgrading from Microsoft Office 2003 to 2007. I felt that pain keenly.

Lock made two strong points in favor of using/buying cloud-based computing related to that fifth bullet: “Legacy vendors will fight to prevent this with their very lives… Microsoft Office generates 16 to 18 billion dollars… they will throw mud, say it’s not secure, say it’s not functional.”

FYI: There are more than 20 million users on Google Apps, from government to higher education to small enterprise businesses that don’t even have offices, servers, or shopping cart software. His other point was that the pace of innovation in the cloud is accelerating. Google had 97 major feature releases in 2009, 68 in 2008. How many major updates has Microsoft had since 2000? Lock said Google has an enterprise vision to make the cloud-based apps broader, deeper, more functional, simpler to use, highly extensible, massively scalable–the figurative sky is not the limit.

Jessica Bratcher, November 2, 2009

Dear Fish & Wildlife Service, I, Stephen Arnold, paid Ms. Bratcher for this write up.

Metadata Now Fair Game

November 2, 2009

The US legal system has spoken. I saw the ZDNet UK story “Watch Out, Your Metadata Is Showing” and chuckled. Not long ago in goose years, legal eagles realized that the Word fast save function preserved text that had once been in a document. Sending a document with fast save activated could allow the curious to see bits and pieces of the document that were believed to have been deleted. Exciting stuff. Now the Arizona supreme court, according to Simon Bisson and Mary Branscombe, “has decided that the metadata of a document is governed by the same rules as the document.”

With value-added indexing coming to most SharePoint systems, there will be some interesting discussions about which metadata are the document’s metadata and which metadata are part of another, broader system. If you read vendors’ enthusiastic descriptions of what their smart software will assign to documents, users, and system processes, you will enter an interesting world. How exciting will it be? Consider a document that has metadata such as date of creation, file format, and the name of the author. Now consider a document that has metadata pertaining to the “aboutness” of the document, who looked at it, who made which change and when, and who opened it and for how long. Interesting stuff in my opinion. The courts will be entering data space soon, and I think that journey will be difficult. Next up? A metadata specialist at your local Top 10 law firm. Get your checkbook ready.

Stephen Arnold, November 2, 2009

I say, no pay.

Fast to Integrate with SharePoint

November 2, 2009

Overflight’s SharePoint search container has been an empty can for weeks. Today I saw a link to the Microsoft Enterprise Search Blog’s “Fast Meets SharePoint – What’s Coming in Search for SharePoint 2010”. Microsoft has owned Fast Search & Transfer since April 2008. Since that time, there’s been a Web part and lots of speculation. The “legacy” Fast ESP customers seem to have the impression that Microsoft will support those systems for years. I heard that the commitment was for a decade. The mystery has been the union of Fast ESP with SharePoint. In my opinion, both software systems are bundles of subsystems, and the complexities of each separate product are familiar to me and my goslings here in Harrod’s Creek, Kentucky.

The October 28, 2009, write up contains a number of interesting points. Let me run down those that struck me as significant to my work. Your mileage may vary because the points in the Microsoft blog post are a feature run down, not substantive information about the “new” Fast ESP for SharePoint.

First, the SharePoint conference was sold out. This reminds me that training people to use Microsoft products is a big business. Obviously this is an important point because it focuses on what matters (conference attendance), not on search, it seems to me.

Second, there was a flashback to a conference held in February 2009. That tells me that there’s not much new in the way of SharePoint / Fast ESP news.

Third, there is a reminder to me that SharePoint is a work in progress. I think someone told me that it is the next operating system from Microsoft. The Fast component will be called Fast Search Server 2010 for SharePoint. Now the story gets interesting. Here’s what’s coming:

  • A content processing pipeline. Most enterprise search systems use intake, content processing, query processing, and administrative controls. Frankly I don’t know how this “pipeline” will differ from other search systems crafted in the late 1990s, when the core of Fast Search was built.
  • Metadata extraction. The idea of identifying concepts to help a user find documents or other content objects “about” a topic is not new. The hitch in the existing systems is that metadata extraction imposes a performance hit on a system. Some metadata systems do “deep extraction”, which takes significant computational time. Some vendors use the terms metadata and facets interchangeably. I will be interested to see if Fast ESP will bring the product into line with the systems available from such companies as Coveo, Exalead, and Ontolica, among others. (Some of the terms used to describe the approach remind me of Endeca’s current approach to extending its system.)
  • Structured data search. I find interesting the notion of merging unstructured text with data from database tables or from third-party applications that use an RDBMS to house their data. Attivio, Clarabridge, and other firms are working in this niche now. There are significant challenges related to data transformation. In fact, data transformation can chew up significant resources *before* the search system can begin intake and processing.
  • Visual search. This is the Bing 3D interface. I am not sure how that will fly because it does not seem to be a component of text processing, which remains the thorn in the side of the SharePoint user.
  • Advanced linguistics. My recollection is that Fast Search & Transfer had a shop in Germany that worked on certain linguistic functions. Some linguistic manipulations require set up and testing. Out-of-the-box linguistic functions are available from some vendors like Basis Tech, but there is a lot of work that must be done to get these systems working so that the outputs match the needs of some users.
  • Best bets. This is a variation of what I think of as Google’s “I’m feeling lucky”. These best bets work when there are sufficient data to make the recommendations useful. If the Fast ESP system uses a simple metric such as the number of documents “about” a topic written by a SharePoint user, I don’t think the best bet will be particularly helpful. More sophisticated methods require big data to generate useful results. Sophisticated methods operating on too small a data flow return bad bets in my experience.
  • Development platform. The original Fast ESP system was coded in a range of languages. Each time an issue was discovered in the installation of Fast on my watch, new scripts had to be written and inserted into the Fast ESP system. My recollection is that Fast was not a homogeneous system. My guess is that the development system will be Microsoft’s own programming tools, which include its scripting language and the Visual Studio line of products. It’s easy to talk about a development platform, but the reality of complex systems like Fast ESP may require considerable creativity to achieve certain objectives.
  • Customization. User profiles have been around a long time. The SharePoint version of Fast will make use of available flags as well as information about a user or the groups to which a user belongs in order to present a Google “ig” type of interface. (“IG” means “individualized Google”, which is a newer version of the MyYahoo feature.)
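The feature list above names stages that most enterprise search systems share: intake, content processing (including metadata extraction), indexing, and query processing. As a rough sketch only, assuming a toy keyword map stands in for real metadata extraction, the Python below wires those stages together and adds a naive best-bets function built on the simple metric mentioned under “Best bets” (documents about a topic per author). It declines to recommend anything when too few documents support the bet. None of these names are Fast or SharePoint APIs; TOPIC_KEYWORDS, best_bet, and min_docs are hypothetical.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field

# Hypothetical toy vocabulary for assigning "aboutness" tags; a real system
# would use entity extraction or a taxonomy, not a hand-made keyword map.
TOPIC_KEYWORDS = {
    "search": {"search", "index", "query"},
    "sharepoint": {"sharepoint", "webpart", "moss"},
}


@dataclass
class Document:
    doc_id: str
    author: str
    text: str
    metadata: dict = field(default_factory=dict)


def intake(raw_items):
    """Intake stage: wrap raw (id, author, text) tuples as Document objects."""
    return [Document(doc_id, author, text) for doc_id, author, text in raw_items]


def extract_metadata(doc):
    """Content processing stage: tag a document with the topics it is 'about'."""
    words = set(doc.text.lower().split())
    doc.metadata["topics"] = sorted(
        topic for topic, keywords in TOPIC_KEYWORDS.items() if words & keywords
    )
    return doc


def build_index(docs):
    """Indexing stage: map each lowercase term to the ids of documents containing it."""
    index = defaultdict(set)
    for doc in docs:
        for term in doc.text.lower().split():
            index[term].add(doc.doc_id)
    return index


def query(index, term):
    """Query processing stage: return sorted ids of documents containing the term."""
    return sorted(index.get(term.lower(), set()))


def best_bet(docs, topic, min_docs=3):
    """Naive best bet: the author with the most documents about a topic.

    Returns None when fewer than min_docs documents carry the topic, because a
    recommendation drawn from too little data is likely to be a bad bet.
    """
    counts = Counter(d.author for d in docs if topic in d.metadata.get("topics", []))
    if sum(counts.values()) < min_docs:
        return None
    return counts.most_common(1)[0][0]


# Usage with three tiny hypothetical documents.
raw = [
    ("d1", "ann", "SharePoint search tips"),
    ("d2", "ann", "tuning the search index"),
    ("d3", "bob", "SharePoint webpart basics"),
]
docs = [extract_metadata(d) for d in intake(raw)]
index = build_index(docs)
print(query(index, "search"))                # ['d1', 'd2']
print(best_bet(docs, "search"))              # None: only 2 documents about search
print(best_bet(docs, "search", min_docs=2))  # ann
```

The min_docs threshold is the whole point of the sketch: with only two documents “about” search, the default setting refuses to make a bet, which mirrors the concern that best bets computed over too small a data flow mislead users.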

The last point in the write up is the one that took my breath away. Earlier today I wrote about Autonomy’s assertion about “infinite scalability”. Now I read “extreme scale and performance” for the SharePoint / Fast search system. Keep in mind that “extreme” implies some non-standard behavior. With Microsoft’s up and out approach to scaling, the phrase “extreme scale and performance” means that lots and lots of hardware will be needed to handle large content flows. In fact, only a handful of systems we have tested this year can deal with petascale content flows. I want to mention the little-known Perfect Search as one example of a company that has nailed big data with a very modest hardware footprint. I will have to test the new Fast system before I can accept this “extreme” assertion.

Three thoughts crossed my mind as I worked through this SharePoint / Fast blog post from Microsoft:

First, why wasn’t the Fast ESP for SharePoint rolled out at the SharePoint conference? A delay suggests that something was not in sync. I wondered, “Is this not an interesting business strategy for a search vendor to implement?”

Second, these Microsoft blog posts recycle the same old information without adding any substantive new data. Where’s the block diagram? Where are the sample or default interfaces? Where’s the list of methods?

Third, I keep thinking about the interactions between two complex systems. Who is going to have the time, money, patience, and management support to get these beasties to cuddle in a sleeping bag? With an alleged 100 million SharePoint licenses, those Microsoft Certified Professionals will be eager to give the work a try.

Billing for consulting services ahead for some search experts!

Stephen Arnold, November 2, 2009

Microsoft routed a question to me last week, but no money was forthcoming. After this article, I am a gone goose.

JustSystems ConceptBase

November 1, 2009

My Overflight system reported that JustSystems Corp. and two partners (Foz Co. and IBM Japan) have teamed up to create a search appliance. The product is called “ConceptBase Enterprise Search Appliance”. The product will be available during the week of November 2, 2009.

ConceptBase offers powerful document search and summarization capabilities for managing and using corporate information infrastructures. ConceptBase is offered in two versions: ConceptBase 20, for managing information on the PC desktop, and ConceptBase 1000, for managing information at the corporate enterprise level. ConceptBase won the 1998 Software Product of the Year award from the Software Information Center (SOFTIC), a bureau of Japan’s Ministry of International Trade and Industry (MITI).


I have sketchy details. JustSystems has been working with search wizard Dr. David Evans of Clairvoyance for a number of years. JustSystems has been active in the XML tools market.

As I understand the appliance, the user “just” plugs the system into an existing network. After configuration, the system indexes documents, file servers, and Web server content. The box ships with connectors for database files and Lotus Notes.

Stephen Arnold, November 1, 2009

Years ago I did some work for Just Systems. I spent that money on dog food. This article earned me zippo.

Social Media Talks but Not Stats

November 1, 2009

“84% of Social Media Programs Don’t Measure ROI” provides some insight into the hopes of social media fans. Lots of talk but little data. For me, the most interesting passage was:

… Some professionals or companies are adopting social technologies without having a way to actually measure how effective or useful the measures actually are. In fact, 84% of respondents said they don’t currently measure the ROI (return on investment) of their social media programs.

Those Enterprise 2.0 champions are looking for a silver bullet to fix the problems of their Enterprise 1.0 predecessors.

Stephen Arnold, November 1, 2009

To the House Appropriations Committee: No funds were appropriated to pay me to write this item.

Autonomy Sparks Arcpliance

November 1, 2009

Autonomy has been busy. After receiving a pat on the back from IDC, Autonomy introduced its digital archiving appliance. Appliances apply toaster think to complex software tasks. For example, if you want to crunch real-time flows of financial information, Exegy has an appliance for you. If you want to index the data written by an enterprise backup system, give Index Engines a jingle. If you want to index an organization’s content, ring up Adhere Solutions and get a Google Search Appliance. You get the idea. Complex task encapsulated in a search toaster.

Autonomy’s appliance, described by CMSWire in “Autonomy Releases Arcpliance, IDOL-Based Digital Archiving Appliance”, is “a new tool to further enhance their established cloud-based and on-premise archiving solutions.” As I worked through the glowing write up, I noted this interesting passage:

While their Intelligent Digital Operating Layer (IDOL) automatically “understands” how to manage each piece of content, Arcpliance works to archive it without the headache. Developed as a response to shortcomings in Storage Area Networks, Arcpliance utilizes Autonomy’s special split-cell architecture. The grid-based design is reportedly “infinitely scalable” and also combines the power of Autonomy’s Digital Safe, making the tool enterprise friendly.

What I think this means is that Autonomy’s software is smart, a bit like an educated, context-sensitive, intuitive human analyst. Autonomy’s approach eliminates the problems that other types of digital archiving systems bring to the table. The smart software uses a “split-cell architecture”, an approach with which I have zero experience. The Autonomy solution, which runs on premises in an organization’s data center or from the cloud, uses a “grid-based design”. Again I lack the expertise to comment on this approach. However, I understand that the method is “infinitely scalable.” I do recall learning in 8th grade math class that the notion of infinity is pretty big and that thinking about infinity can drive some folks up a wall, an infinite wall I might add. So Autonomy’s ability to deploy a system that is infinitely scalable raises a bit of a logical pickle, but I think that phrase is a bit of overenthusiastic purple prose. If not, the Arcpliance is brushing shoulders with the big aleph (ℵ). I would imagine the demo is interesting indeed.

More information is available from Autonomy.

Stephen Arnold, November 1, 2009

The Department of Agriculture needs to know I received no fodder for this article.

Some Brainware Confusion

November 1, 2009

Brainware offers a system that ingests paper at one end and makes the digital information searchable. The company competes with ZyLAB and other end-to-end content processing companies as well as some eDiscovery firms. The company’s name is an unusual one, and I was surprised to read “Brainware India Launches New Website for Its SEO Division.” I did a bit of looking and found that the two Brainwares are not obviously related. My thought is that in a confused market such as search, a distinctive name is necessary. One of these Brainwares may want to find a way to differentiate itself to avoid the type of confusion the addled goose noted. I read quite a few vendor news releases, and I have to tell you that I have a difficult time figuring out exactly which functions are real and which are vaporware. When the names confuse, the situation for me is almost hopeless.

Stephen Arnold, November 1, 2009

To the Kentucky State Police: A freebie.
