
InQuira Antecedents: Answerfriend and Electric Knowledge

May 26, 2012

I have had to look up the antecedents for InQuira again. I wanted to create this post to make it easy to reference these two firms which were combined to create InQuira. InQuira was acquired by Oracle Corp. in that company’s push to address its long-standing search and content processing issues. I have in my Overflight system the 2006 InQuira marketing collateral which, I noticed, provides a crib sheet for the many enterprise search vendors piling into the customer support segment. What’s interesting is that customer support is one of the sectors where open source search is getting some attention.

The antecedents of InQuira were:

  • Answerfriend. The company had software which could understand text. In 2000, the company landed Accenture as a customer. Answerfriend pivoted on its natural language processing technology. Allegedly Answerfriend could handle both structured and unstructured data. Sound familiar in 2012?
  • Electric Knowledge Inc. This also was an NLP shop. The technology was based on computational linguistic technology. This company had licensed its technology to Bank of America, an outfit which has had a long history of trying to find a search system which meets its requirements.

InQuira was created in 2002. The notion of hooking together two separate vendors to do the 1+1=3 thing has been used more recently by Lexalytics and Attensity.

At one time, InQuira was the answer system used by Yahoo’s customer support service. I encountered this when I tried to cancel a Yahoo service. The InQuira service was not too helpful to me. I just killed the credit card and solved the problem.

The marketing pitch of InQuira is as fresh today as it was in 2002. How much progress has there been in search and content processing in the last decade? Could the marketing collateral for a 2002 Oldsmobile be used without any changes? Probably not. Search has a limited supply of jargon, and it gets recycled endlessly in my opinion.

Stephen E Arnold, May 26, 2012

Sponsored by Polyspot

Three Metasearch Vulnerabilities and DuckDuckGo

May 25, 2012

I read “The Digital Skeptic: DuckDuckGo Cooks Google’s Goose.” I am okay with online cheerleading. I like to use metasearch systems like DuckDuckGo, but my favorite Ez2Ask.com went away. Ixquick is okay, but each of these systems has three vulnerabilities. I want to highlight them before my addled goose brain forgets them. It is possible that those experts writing about metasearch or federating systems will want to consider these points. One or two might make the analysis a little tastier, sort of like pâté from a force fed goose.

First, metasearch engines take a query and send it to a third-party index. The results come back and the results are ideally deduped, relevance ranked, and displayed for the user. Some metasearch systems perform a number of value adding functions. These include putting the hits in folders, which was Vivisimo’s claim to fame. Others parse the results by source type and display them in groups, a function which EZ2Ask.com offered while it was going full throttle from its redoubt in southern France. But when the third party indexes charge money to pull results or just block the metasearch engine, the party is over. Vivisimo built a crawler in order to have an original index for some applications. Most metasearch systems just hope that the third party index won’t change the rules. Anyone remember the original BOSS service and its flexibility? So, vulnerability one is losing a source of hits. No hits, reduced utility. Less utility means less traffic.

Second, when queries are sent to third party indexes, there is latency. There are tricks to mask the latency, but the fact is that in certain situations, the metasearch engine is either presenting a partial result set or one that is just slow to render. So vulnerability two is a performance headache for the metasearch crowd.
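The latency trade off can be made concrete with a small sketch. It assumes a metasearch front end that sends the query to every third party index in parallel and, when a deadline passes, shows whatever has come back. The function `fan_out` and the two stand-in backends `fast_index` and `slow_index` are hypothetical names for illustration; they do not describe how DuckDuckGo, Ixquick, or any other engine actually works.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def fan_out(query, backends, deadline_s):
    """Query every backend in parallel and keep what arrives before the deadline.

    backends maps a source name to a callable that takes the query string
    and returns a list of hits. Returns (hits, missing), where missing lists
    the sources that were too slow or raised an error.
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(backends)))
    futures = {pool.submit(fn, query): name for name, fn in backends.items()}
    done, _ = wait(futures, timeout=deadline_s)
    pool.shutdown(wait=False)  # do not block the user on the stragglers

    hits, missing = [], []
    for fut, name in futures.items():  # insertion order keeps output stable
        if fut not in done:
            missing.append(name)       # too slow: partial result set
            continue
        try:
            hits.extend(fut.result())
        except Exception:
            missing.append(name)       # the source blocked us or failed
    return hits, sorted(missing)


# Hypothetical stand-ins for third party indexes.
def fast_index(query):
    return [f"fast:{query}"]


def slow_index(query):
    time.sleep(0.5)  # simulates a sluggish upstream index
    return [f"slow:{query}"]
```

Calling `fan_out("hp lynch", {"fast": fast_index, "slow": slow_index}, deadline_s=0.1)` returns the fast source's hits and reports the slow source as missing. A short deadline gives a snappy but partial result list. A long deadline gives a complete list that is slow to render.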

Third, deduplication. For some queries, the Web indexes will bang the same drum and loudly. A query for Hewlett Packard Lynch will generate many duplicate and near duplicate hits. The metasearch system must have a way to winnow the most egregious duplicates from the results list and quickly. Slow deduping or no deduping is bad. Partial deduping may be acceptable, but there is a trade off. So, vulnerability three is a results list which contains many identical or similar stories.
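One common way to winnow near duplicates is to compare overlapping word sequences (shingles) and drop a hit when it overlaps too much with one already kept. The sketch below is a minimal illustration of that general technique, not the method any particular metasearch engine uses. The names `shingles`, `jaccard`, and `dedupe` and the 0.6 threshold are assumptions chosen for the example.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word sequences in text, ignoring case and punctuation."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Overlap of two shingle sets: 1.0 means identical, 0.0 means disjoint."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def dedupe(hits, threshold=0.6):
    """Keep the first hit of each cluster of near-identical snippets."""
    kept, kept_shingles = [], []
    for hit in hits:
        current = shingles(hit)
        if any(jaccard(current, seen) >= threshold for seen in kept_shingles):
            continue  # too similar to something already in the list
        kept.append(hit)
        kept_shingles.append(current)
    return kept
```

The threshold is where the trade off lives. Set it low and distinct stories get thrown out. Set it high and the list fills with rewrites of the same wire story. The loop compares each hit against every kept hit, which is fine for a page of results but would need a faster scheme for large result sets.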

Why do a metasearch engine if there are vulnerabilities cheerfully overlooked in the “Cooks Google’s Goose” write up?

  1. Metasearch is a heck of a lot cheaper to pull off than brute force search.
  2. Users often prefer the convenience of having one system “pull together” what the user perceives as the most relevant content.
  3. Metasearch allows a marketer to engage in the type of promotion that produces the “Cooks Google’s Goose” article.

As an addled goose, I try not to be too confused about metasearch. Are you?

Stephen E Arnold, May 25, 2012

Sponsored by Polyspot

Big Outfits Buy Search Vendors: Does Chaos Commence?

May 25, 2012

I don’t want to mention any specifics in this write up. I have a for-fee Overflight on the subject. I do want to highlight some of the preliminary thoughts the goslings and I collected before creating our client-focused analysis. This write up was sparked by the recent news that the founder of Autonomy, which HP acquired for $10 billion, is seeking new opportunities after eight months immersed in the HP way. See “Hewlett-Packard Can’t Say It Wasn’t Warned about Autonomy.” This write up contained a remarkable statement, even when measured against the work of other “real” journalists:

Some will say this is a classic case of an entrepreneurial business being bought by a hulking, bureaucratic institution which failed to integrate it and failed to understand its culture. Others will say HP, desperate to do a deal, simply overpaid for a company that was going to struggle to maintain its sales and earnings momentum and was deluded about its abilities. Certainly warnings about the latter were there for HP to see before it handed over all that cash. Here’s what Marc Geall, a Deutsche Bank analyst who used to work at Autonomy, said in October 2010 about the business model: “…investment in the business has lagged revenues… [which] could affect customer satisfaction towards the product and the value it delivers.” He went on to warn that Autonomy’s service business was “too lean” and that it “risks falling short of standards demanded by customers”. All of which prompted Geall to question whether the company needed to change its business model – “traditionally, software companies have needed to change their business models at around $1bn in revenues”.

Yep, now the issues are easy to identify: the brutal cost of customer support, the yawning maw of research and development, the time and cost of customizing a system. The problem is that these issues have been identified. However, senior managers looking for the next big thing are extremely confident of their business and technical acumen. Search is a slam dunk. Heck, I can find what I want in Google. How tough can it be to find that purchase order? That confidence may work in business school, but it has not worked in the wild-and-crazy world of enterprise search and content processing.

Think back to the notable search acquisitions over the last few years. Here are some to jump start your memory:

  • IBM in 2005 and 2006 purchased iPhrase (a MarkLogic precursor with semantic components) and Language Analysis Systems (a next generation content processing vendor)
  • Microsoft acquired Powerset and Fast Search & Transfer in the 2008 to 2009 period. Both vendors had next-generation systems with semantic, natural language processing, and other near-magical capabilities
  • Oracle acquired TripleHop in 2005, focused on its less-and-less visible Secure Enterprise Search line up (SES10g and SES11g), then went on a buying spree to snap up InQuira (actually the company formed when two weaker players, Answerfriend Inc. and Electric Knowledge Inc., merged in 2002 or 2003), RightNow (which uses the Q-Go natural language processing system purchased in 2010 or 2011), and Endeca, an established search vendor with technology dating from the late 1990s
  • SAP snagged some search functions with its NetWeaver buy in 2004 which coexisted in a truce of sorts with the SAP TREX system. When SAP bought Business Objects in 2007, the company inherited Inxight Software, a text analytics vendor with assorted wizardry explained in buzzwords by marketing mavens.

So what have we learned from these buy outs by big companies? Here are the observations:

First, search and content processing does not behave the way other types of software learn to sit, come, and roll over. The MBAs, lawyers, and accountants issue commands like good organizational team players. The enterprise search and content processing crowd listens to the management edicts with bemusement. Everyone thinks search is a slam dunk. How tough can a utility function be? Well, let me remind you, gentle reader, search is pretty darned difficult. Unlike a cloud service for managing contacts, search is not one thing. Furthermore, those who have to use search are generally annoyed because systems have since 1970 failed to generate answers. Search outputs create more work. Usually the outputs are mostly wide of the mark. Big companies want to sell a software product or service that solves a problem like what is the back log for the Midwestern region or when did I last call Mr. Jones? The big companies don’t get this type of system when they buy, often for a premium, companies which purport to make content findable, smart, and accessible. So we have a situation in which a sales presentation whets the appetite of the big company executive who perceives himself or herself as an expert in search. Then when anticipation is at its peak, the sales person closes the deal. In the aftermath, the executives realize that search just does not follow the groove of an accounting system, a videoconferencing system, or a security system. Panic sets in, and you get crazy actions. IBM pretty much jettisoned its search systems and fell in love with open source Lucene / Solr. Good enough was a lot better than trying to figure out the mysteries of proprietary search and how to pay for the brutal research and development costs search requires.

Second, search is a moving target. I find that as recently as my meetings with sleek MBAs from six major financial firms, search was assumed to be a no brainer. Google has figured out search. Move on. When I asked the group how many considered themselves experts in search, everyone replied, “Yes.” I submit that none of these well-paid movers-and-shakers are very good at search and retrieval. Few of them have the time or patience for old fashioned research. Most get information from colleagues, via phone calls which include “I have a hard stop in five minutes”, and emails sent to people whom they have met at social functions or at conferences. Search is not looking up a phone number. Search is not slamming the name of a company into Google. Search is not wandering around midtown Manhattan with an iPhone displaying the location of a pizza joint. Search is whatever the user wishes to find, access, know, or learn at any point in time and in any context. Google is okay at some search functions. Other vendors are okay at others. The problem is that virtually all search and retrieval solutions are okay. People have been trying for about 50 years to deliver responses to queries that are what the user requires. Most systems dissatisfy more than half their users and have for 50 years. A big company buying a next generation search system wants these problems solved. The big company wants to close deals, get client access licenses, or cloud transactions for queries. But the big companies don’t get these things, so the MBAs, lawyers, and accountants are really confused. Confused people make crazy decisions. You get the idea.

Third, search does not mean search. Search technology includes figuring out which words to index in a document. Search does a miserable job of indexing videos unless the video audio track is converted to ASCII and then that ASCII is indexed. Even with this type of content processing system, search does not deliver a usable output. What a user gets is garbled snippets and maybe the opportunity to look at a video to figure out if the information is relevant. Search includes figuring out what a user wants before the user asks the question or even knows what the question is. One company is collecting millions in venture money to achieve this goal. Good luck on that. Search includes providing outputs that answer an employee’s specific question. Most systems provide a horseshoe type of result; that is, the search vendor wants points for getting close to the answer. Employees who have to click, scan, close, and repeat the process are not amused. The employee wants the Smith invoice from April, not increased risk of carpal tunnel problems. The poobahs who acquire search companies want none of these excuses. The poobahs want sales. What search acquisitions generate are increased costs, long sales cycles, and much friction. Marketers overstate and search systems routinely under deliver.

Who cares?

Another enterprise search train wreck. The engineer was either an MBA, an accountant, or a lawyer. No big deal. Just get another search train. How tough can it be to run a search system? Thanks to http://www.eccchistory.org/CCRailroads.htm

Well, the executives selling big companies a search and content processing system just want the money. After years of backbreaking effort to generate revenues, the founders usually figure out that there are easier ways to earn a living. If the founders don’t bail out, they get a new job or become a guru at a venture capital firm.

Attivio Signs TCPlus as a Partner

May 25, 2012

The folks at Attivio must be pleased with their most recent success. MMD Newswire reveals, “TCPlus and Attivio Sign Partner Agreement for Australia and Switzerland.” Network component vendor TCplus Datennetz is adding Attivio’s Active Intelligence Engine (AIE) to its wares. AIE is a unified information access platform that goes beyond traditional data warehousing and enterprise search solutions. The press release emphasizes:

“Attivio AIE freely integrates and presents structured data (databases) together with unstructured content (documents, SharePoint, web content, email, etc.) enabling customers to know not only ‘what’ is happening, but also gain context to analyze ‘why’ it is happening. Organizations that implement AIE empower their business users to easily access and analyze all relevant enterprise information to identify new business solutions and opportunities that might otherwise go undiscovered.

“Attivio AIE offers the most accessible and standards-based approach to analytics of any UIA platform.”

AIE is compatible with SQL as well as with leading business intelligence and analytic platforms. The product has garnered several awards.

Headquartered in Newton, MA, Attivio also has offices in the UK and Germany. The company has made it its mission to unravel the unstructured data conundrum. It has partnered with prominent OEM/service providers and solution providers as well as technology vendors like TCplus Datennetz.

Founded in 2004, TCplus Datennetz sells and installs active and passive network components. They pride themselves on their expertise and their close relationships with their customers.

Cynthia Murrell, May 25, 2012

Sponsored by PolySpot

SAP Big Blue Rides Hana

May 25, 2012

The University of Kentucky’s business intelligence team has had to make some adjustments after the school implemented SAP’s HANA system. ComputerWorld declares, “For Univ. of Kentucky, SAP’s HANA is ‘Disruptive’.” Writer Patrick Thibodeau, punning on the term “disruptive technology,” notes that the University is (purposely) using HANA to restructure its BI system to better analyze student retention.

The new in-memory systems like HANA pull data from RAM instead of from hard disks. Speed and relative simplicity are the advantages, but these systems do require a hardware investment. In this case, Dell provided the hardware and developed the school’s student retention data models.

HANA is only a year old, and questions about its longevity are still in the air. Part of the issue is the hardware question: should organizations deploy on the tried and true x86 system or go with an engineered system, like IBM’s new PureSystems? Thibodeau writes:

“Engineered systems offer performance gains, meaning faster time to realize value and ‘less cumbersome’ management, said Alys Woodward, a research director at IDC. On the other hand, ‘software on commodity hardware reduces vendor lock-in and enables the use of cheaper components,’ said Woodward.

“How SAP HANA ‘will play in the broader marketplace — outside SAP’s core install base — against Oracle Exadata and IBM engineered systems, depends to some extent on how these two opposing concepts will play out,’ said Woodward.”

So, x86 or engineered, take your pick. If you are considering HANA, though, the write up notes that you should make sure it will do what you want before buying the pricey software. It will not, for example, make up for poor data quality. It is also more likely to be worth the cost and effort in an organization where business requirements change frequently than in one with a more static environment.

Cynthia Murrell, May 25, 2012

Sponsored by PolySpot

ZyLAB Embraces Predictive and Concept Searching

May 25, 2012

The CodeZed blog recently reported on the automated classification of legal documents in the article “Technology Assisted Review, Concept Search and Predictive Coding: The Limitations & Risks.”

According to the article, artificial intelligence and machine learning have been around since the 1980s, but a recent US ruling regarding the use of machine learning technology in legal review has stirred up trouble in the eDiscovery community. As a result of this ruling, one can expect eDiscovery software buyers to demand Predictive Coding, Concept Search, or other capabilities related to Technology Assisted Review (TAR) far more often.

When discussing some of the detriments of machine learning and artificial intelligence, the article states:

“Machine-learning requires significant set-up involving training and testing the quality of the classification model (aka the classifier), which is a time consuming and demanding task that requires at least the manual tagging and evaluation of both the training and the test set by more than one party (in order to prevent biased opinions). Testing has to be done according to best practice standards used in the information retrieval community (e.g. see the proceedings of the TREC conferences organized by the NIST). Deviation from such standards will be challenged in courts. This is time consuming and expensive and should be factored into the cost-benefit analysis for the approach.”

So the short of it is, before using Technology Assisted Review make sure that you do your research and figure out what is best for your business.

Jasmine Ashton, May 25, 2012

Sponsored by PolySpot

Palantir Receives Seventh Round of Funding

May 24, 2012

Palantir is in the money again, TechCrunch informs us in “Palantir Technologies Nabs $56M in New Funding, SEC Filing Shows.” According to the article, this is the seventh round of venture capital funding for the data management company.

What is the company doing with this money?

That’s a lot of investment. What are these folks inventing? Writer Colleen Taylor doesn’t seem quite sure, noting that TechCrunch has requested more information from the company. She plans to update her post when she gets a response. For now, she writes:

“The company provides high-powered software platforms that let users integrate, visualize, and analyze large quantities of data. Perhaps most importantly, Palantir specifically has targeted its products to two sectors that need to parse large amounts of classified information, and require super solid security: Government and finance. The company counts governmental organizations such as the FBI and financial institutions such as JP Morgan as customers. Palantir has doubled in size each year since it was founded, according to its website.”

With founding members from such promising pools as PayPal alumni and Stanford computer science grads, Palantir launched in 2004. Its two products, Palantir Government and Palantir Finance, work with a wide range of data types. The company is based in Palo Alto, CA, and has offices in Virginia, New York, and London. Despite its growth, Palantir strives to retain its startup attitude and maintain the highest of standards.

But. . . just what are they working on now? Try turning back to the TechCrunch article to see whether Taylor got her answers. Other companies are able to push forward without sucking tens of millions in cash. Check out www.ikanow.com and www.digitalreasoning.com.

Cynthia Murrell, May 24, 2012

Sponsored by PolySpot

Wolff Howls, The Facebook Is Failing

May 24, 2012

I read “The Facebook Fallacy.” The point of the write up is that online advertising is doomed. Upbeat. Clever. And it certainly seems to be spot on in the wake of the slow sinking of Facebook shares.

Mr. Wolff asserts:

I don’t know anyone in the ad-Web business who isn’t engaged in a relentless, demoralizing, no-exit operation to realign costs with falling per-user revenues, or who isn’t manically inflating traffic to compensate for ever-lower per-user value.

I quite like the word “humper”. It adds some interesting connotations to the person engaged in selling advertising. What does “humper” call to your mind? Keep your thoughts to yourself; otherwise, an online advertiser may insert an advertisement into your once-private life.

The killer sentence in the write up, in my opinion, was this one:

The growth of its user base and its ever-expanding page views means an almost infinite inventory to sell. But the expanding supply, together with an equivocal demand, means ever-lowering costs. The math is sickeningly inevitable. Absent an earth-shaking idea, Facebook will look forward to slowing or declining growth in a tapped-out market, and ever-falling ad rates, both on the Web and (especially) in mobile. Facebook isn’t Google; it’s Yahoo or AOL.

I put the juicy bit in bold. I enjoyed the poignant reference to the value of a New York Times online subscriber, but let’s think a moment about the reality of Facebook.

First, the social trend does not have much impact on me. But for some, Facebook is a must-have application or service. However, Facebook is oozing forward. The company is likely to undergo changes. My view is that the changes will be slow, so the demise of the Facebook blob will take some time.

Second, the problem online advertising faces is in some ways similar to the problem traditional advertising faces. Audiences phase change without warning. The truisms which allowed my account representative from Ketchum McLeod & Grove don’t work too well in today’s wonky business climate. In the absence of proven methods for making sales, there is a desperation marketing phenomenon which I find interesting. Nothing much works, and I don’t think Facebook will crack the code. However, there are enough PT Barnum opportunities to keep the business afloat for a while.

Third, the present financial climate jeopardizes Facebook and a number of other businesses. I am far more concerned about the social consequences of cutting the financial lifelines to those who depend on government largesse to survive. One can advertise and market like the Dickens. If potential customers don’t buy, there is a larger problem.

I don’t have a horse in this race. I don’t care what happens to Facebook or any of the Web outfits. I am reluctant to cry “wolff”.

Stephen E Arnold, May 24, 2012

Sponsored by Polyspot

The Challenges for Microsoft SharePoint Integrators

May 22, 2012

I don’t care too much about outfits who surf on other companies’ software. Been there. Done that. In my experience with Infozen, an outfit with which I was affiliated during the wild and crazy “index the Federal government” years, I learned:

  1. Integrators and resellers take advantage of clients who lack the expertise, time, and management acumen to get a job done in a cost effective manner during normal work hours
  2. Partners, integrators and resellers sell what generates money. Investing in research and development is a PowerPoint or Keynote slide, not a business practice. Clients pay for the resellers and integrators to solve a problem. If the solution works, the integrator or reseller will resell the solution, emphasizing that it is an invention.
  3. Integrators and resellers are trying to avoid the “pay to play” model enforced by a number of software giants. A good way to determine if the outfit requires integrators or resellers to pony up hard cash for the privilege of selling enterprise software is to look for print advertising in various trade publications.
  4. Integrators and resellers use a tie up as an occasion for a news release. A good example is the “Oracle Endeca Getting Started Partner Guide.”

At a recent briefing I gave in New York, I had an occasion to talk to a very energetic investment type. I picked up three signals about the Microsoft SharePoint reseller and partner ecosystem. Like most information floating around after 6 pm in Manhattan, I suspect there is mostly baloney in the observations. But I wanted to snag them before they slipped from my flawed short term memory bank:

First, it seems that Microsoft is not putting much wood behind Fast Search & Transfer technology. I believe the phrase the MBA squirrel used was “end of life.” If true, the $1.2 billion and messy Fast situation may be in the midst of a rethink. What will Microsoft do? With the juicy search companies gobbled up, Microsoft may have to pull some rabbits out of its many hats. Open source, non US search and content processing vendors, making a cake from its own search ingredients, leveraging Powerset and other technologies?

Second, some Microsoft partners are starting to “go off the reservation.” In the free blog, I do not want to mention names. I learned that one prominent Microsoft Certified Partner had quietly embraced non Microsoft technologies. The “quietly” suggests to me that Microsoft could choke off a flow of sales leads if the shift caused big waves. The reason to “go off the reservation” boiled down to the sense that some Microsoft centric shops were starting to demonstrate “fee fatigue.” What do resellers do when revenue from Old Faithful slows? Resellers and integrators look for what will sell.

Third, after decades of having a sure-fire business model, some partners and integrators see that alternatives exist and may be worth exploring. Examples include cloud alternatives to on premises Microsoft solutions or – hang on to your hat – open source solutions.

The impact of the lousy financial climate is taking a toll on some Microsoft centric vendors. The toll will be more burdensome going forward. In short, integrators and resellers are in play.

Stephen E Arnold, May 22, 2012

Sponsored by Polyspot

Scoop: Is It a Surprise That Google and Microsoft Target Amazon?

May 22, 2012

Okay, “real” journalists are causing my blood pressure medicine to work overtime. I did not know that Amazon was a big deal. I am delighted that a major “real” news outfit reported for the first time in the history of mankind this insight: “Scoop: Google, Microsoft Both Targeting Amazon with New Clouds.” The insight which knocked me on my tail feathers was:

Google and Microsoft are two cloud providers that should have Amazon Web Services shaking a bit, in a way Rackspace and the OpenStack haven’t yet been able to. Google and Microsoft both have the engineering chops to compete with AWS technically, and both have lots of experience dealing with both developers and large companies. More importantly, both seem willing and able to compete with AWS on price — a big advantage for AWS right now as its economies of scale allow it to regularly slash prices for its cloud computing services.

Even though we have provided some insight to our hopelessly befuddled investment bank clients, we totally missed the fact that Amazon had a cloud service, that Google and Microsoft seem to be playing a me too game, and that Amazon is rolling out new services.

How could the goslings have failed me? We thought Amazon was really a purveyor of hard backed books and diapers? I expect that the financial outfits who pay us to analyze the more subtle aspects of companies engaged in online will be firing us in the next minute or two. Now I know my IQ is below 70, not even “dull normal.”

I suppose I can become a WalMart greeter.

Stephen E Arnold, May 22, 2012

Sponsored by no one. I mean, who would pay money to an outfit that did not know that Google and Microsoft were interested in cloud revenue?
