Three Metasearch Vulnerabilities and DuckDuckGo

May 25, 2012

I read “The Digital Skeptic: DuckDuckGo Cooks Google’s Goose.” I am okay with online cheerleading. I like to use metasearch systems like DuckDuckGo, but my favorite, EZ2Ask.com, went away. Ixquick is okay, but each of these systems has three vulnerabilities. I want to highlight them before my addled goose brain forgets them. It is possible that those experts writing about metasearch or federating systems will want to consider these points. One or two might make the analysis a little tastier, sort of like pâté from a force-fed goose.

First, metasearch engines take a query and send it to a third-party index. The results come back and the results are ideally deduped, relevance ranked, and displayed for the user. Some metasearch systems perform a number of value adding functions. These include putting the hits in folders, which was Vivisimo’s claim to fame. Others parse the results by source type and display them in groups, a function which EZ2Ask.com offered while it was going full throttle from its redoubt in southern France. But when the third party indexes charge money to pull results or just block the metasearch engine, the party is over. Vivisimo built a crawler in order to have an original index for some applications. Most metasearch systems just hope that the third party index won’t change the rules. Anyone remember the original BOSS service and its flexibility? So, vulnerability one is losing a source of hits. No hits, reduced utility. Less utility means less traffic.
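A minimal sketch of the merge step may help here. It assumes each third-party index returns a ranked list of URLs and combines them with reciprocal rank fusion, a common rank-aggregation technique that I am swapping in for illustration; the post does not say how any particular metasearch engine ranks. The source names, URLs, and the constant `k=60` are all hypothetical.

```python
from collections import defaultdict


def fuse_rankings(result_lists, k=60):
    """Merge ranked URL lists from several sources with reciprocal rank fusion.

    result_lists maps a source name to its URLs in ranked order.
    k is a damping constant; 60 is a conventional choice, assumed here.
    """
    scores = defaultdict(float)
    for urls in result_lists.values():
        for rank, url in enumerate(urls, start=1):
            scores[url] += 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda u: (-scores[u], u))


# Hypothetical result lists from two third-party indexes.
results = {
    "index_a": ["http://a.example/1", "http://b.example/2", "http://c.example/3"],
    "index_b": ["http://b.example/2", "http://d.example/4"],
}
merged = fuse_rankings(results)

# Vulnerability one: if index_a blocks the metasearch engine, only index_b remains.
after_block = fuse_rankings({"index_b": results["index_b"]})
```

A URL returned by both sources rises to the top, which is the "pull together" effect users like. Drop a source and the merged list simply shrinks.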

Second, when queries are sent to third party indexes, there is latency. There are tricks to mask the latency, but the fact is that in certain situations, the metasearch engine is either presenting a partial result set or one that is just slow to render. So vulnerability two is a performance headache for the metasearch crowd.
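One of the tricks can be sketched as a deadline: query every source in parallel and show whatever has come back when the clock runs out. The backends below are stand-ins that just sleep, and the names, delays, and the 0.2 second deadline are assumptions for illustration, not measurements of any real service.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def fake_backend(name, delay, hits):
    """Stand-in for a remote index: sleeps, then returns canned hits."""
    time.sleep(delay)
    return name, hits


def federated_search(backends, deadline=0.2):
    """Query every backend in parallel; keep whatever answers before the deadline."""
    pool = ThreadPoolExecutor(max_workers=len(backends))
    futures = [pool.submit(fake_backend, *backend) for backend in backends]
    done, not_done = wait(futures, timeout=deadline)
    # Do not block on stragglers; they finish in the background.
    pool.shutdown(wait=False)
    answered = dict(future.result() for future in done)
    return answered, len(not_done)


# Hypothetical sources: one quick, one slow.
backends = [
    ("fast_index", 0.01, ["hit-1", "hit-2"]),
    ("slow_index", 0.6, ["hit-3"]),
]
partial, missing = federated_search(backends)
```

The trade-off is the one described above: a short deadline gives a fast page with a partial result set, and a long deadline gives a complete page that is slow to render.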

Third, deduplication. For some queries, the Web indexes will bang the same drum and loudly. A query for Hewlett Packard Lynch will generate many duplicate and near duplicate hits. The metasearch system must have a way to winnow the most egregious duplicates from the results list and quickly. Slow deduping or no deduping is bad. Partial deduping may be acceptable, but there is a trade off. So, vulnerability three is a results list which contains many identical or similar stories.
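A rough sketch of near-duplicate winnowing, assuming only headlines are compared: break each headline into two-word shingles and drop any hit whose Jaccard overlap with an already kept hit reaches a threshold. The headlines and the 0.4 threshold are made up for illustration. Production systems typically compare more than titles and use faster signatures.

```python
import re


def shingles(text, n=2):
    """Return the set of n-word shingles for a lowercased, punctuation-free title."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < n:
        return {tuple(words)}
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def dedupe(titles, threshold=0.4):
    """Keep a title only if it is not too similar to any title already kept."""
    kept = []
    for title in titles:
        signature = shingles(title)
        if all(jaccard(signature, shingles(k)) < threshold for k in kept):
            kept.append(title)
    return kept


# Made-up headlines standing in for a noisy result list.
hits = [
    "Lynch leaves Hewlett Packard",
    "Lynch Leaves Hewlett-Packard!",
    "Lynch leaves Hewlett Packard after eight months",
    "Unrelated story about metasearch",
]
unique = dedupe(hits)
```

This pairwise comparison is quadratic in the number of hits, which is the speed-versus-thoroughness trade off in miniature: a looser threshold or fewer comparisons is faster but lets more near duplicates through.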

Why do a metasearch engine if there are vulnerabilities cheerfully overlooked in the “Cooks Google’s Goose” write up?

  1. Metasearch is a heck of a lot cheaper to pull off than brute force search.
  2. Users often prefer the convenience of having one system “pull together” what the user perceives as the most relevant content.
  3. Metasearch allows a marketer to engage in the type of promotion that produces the “Cooks Google’s Goose” article.

As an addled goose, I try not to be too confused about metasearch. Are you?

Stephen E Arnold, May 25, 2012

Sponsored by PolySpot

Big Outfits Buy Search Vendors: Does Chaos Commence?

May 25, 2012

I don’t want to mention any specifics in this write up. I have a for-fee Overflight on the subject. I do want to highlight some of the preliminary thoughts the goslings and I collected before creating our client-focused analysis. This write up was sparked by the recent news that the founder of Autonomy, which HP acquired for $10 billion, is seeking new opportunities after eight months immersed in the HP way. See “Hewlett-Packard Can’t Say It Wasn’t Warned about Autonomy.” This write up contained a remarkable statement, even when measured against the work of other “real” journalists:

Some will say this is a classic case of an entrepreneurial business being bought by a hulking, bureaucratic institution which failed to integrate it and failed to understand its culture. Others will say HP, desperate to do a deal, simply overpaid for a company that was going to struggle to maintain its sales and earnings momentum and was deluded about its abilities. Certainly warnings about the latter were there for HP to see before it handed over all that cash. Here’s what Marc Geall, a Deutsche Bank analyst who used to work at Autonomy, said in October 2010 about the business model: “…investment in the business has lagged revenues… [which] could affect customer satisfaction towards the product and the value it delivers.” He went on to warn that Autonomy’s service business was “too lean” and that it “risks falling short of standards demanded by customers”. All of which prompted Geall to question whether the company needed to change its business model – “traditionally, software companies have needed to change their business models at around $1bn in revenues”.

Yep, now the issues are easy to identify: the brutal cost of customer support, the yawning maw of research and development, the time and cost of customizing a system. The problem is that these issues have been identified. However, senior managers looking for the next big thing are extremely confident of their business and technical acumen. Search is a slam dunk. Heck, I can find what I want in Google. How tough can it be to find that purchase order? That confidence may work in business school, but it has not worked in the wild-and-crazy world of enterprise search and content processing.

Think back to the notable search acquisitions over the last few years. Here are some to jump start your memory:

  • IBM in 2005 and 2006 purchased iPhrase (a MarkLogic precursor with semantic components) and Language Analysis Systems (a next generation content processing vendor)
  • Microsoft acquired Powerset and Fast Search & Transfer in the 2008 to 2009 period. Both vendors had next-generation systems with semantic, natural language processing, and other near-magical capabilities
  • Oracle acquired TripleHop in 2005, focused on its less-and-less visible Secure Enterprise Search line up (SES10g and SES11g), then went on a buying spree to snap up InQuira (actually the company formed when two weaker players, Answerfriend Inc. and Electric Knowledge Inc., merged in 2002 or 2003), RightNow (which uses the Q-Go natural language processing system purchased in 2010 or 2011), and Endeca, an established search vendor with technology dating from the late 1990s
  • SAP snagged some search functions with its NetWeaver buy in 2004, which coexisted in a truce of sorts with the SAP TREX system. When SAP bought Business Objects in 2007, the company inherited Inxight Software, a text analytics vendor with assorted wizardry explained in buzzwords by marketing mavens.

So what have we learned from these buy outs by big companies? Here are the observations:

First, search and content processing does not behave the way other types of software learn to sit, come, and roll over. The MBAs, lawyers, and accountants issue commands like good organizational team players. The enterprise search and content processing crowd listens to the management edicts with bemusement. Everyone thinks search is a slam dunk. How tough can a utility function be? Well, let me remind you, gentle reader, search is pretty darned difficult. Unlike a cloud service for managing contacts, search is not one thing. Furthermore, those who have to use search are generally annoyed because systems have since 1970 failed to generate answers. Search outputs create more work. Usually the outputs are mostly wide of the mark. Big companies want to sell a software product or service that solves a problem like what is the back log for the Midwestern region or when did I last call Mr. Jones? The big companies don’t get this type of system when they buy, often for a premium, companies which purport to make content findable, smart, and accessible. So we have a situation in which a sales presentation whets the appetite of the big company executive who perceives himself or herself as an expert in search. Then when anticipation is at its peak, the sales person closes the deal. In the aftermath, the executives realize that search just does not follow the groove of an accounting system, a videoconferencing system, or a security system. Panic sets in, and you get crazy actions. IBM pretty much jettisoned its search systems and fell in love with open source Lucene / Solr. Good enough was a lot better than trying to figure out the mysteries of proprietary search and how to pay for the brutal research and development costs search requires.

Second, search is a moving target. I find that as recently as my meetings with sleek MBAs from six major financial firms, search was assumed to be a no brainer. Google has figured out search. Move on. When I asked the group how many considered themselves experts in search, everyone replied, “Yes.” I submit that none of these well-paid movers-and-shakers are very good at search and retrieval. Few of them have the time or patience for old fashioned research. Most get information from colleagues, via phone calls which include “I have a hard stop in five minutes”, and emails sent to people whom they have met at social functions or at conferences. Search is not looking up a phone number. Search is not slamming the name of a company into Google. Search is not wandering around midtown Manhattan with an iPhone displaying the location of a pizza joint. Search is whatever the user wishes to find, access, know, or learn at any point in time and in any context. Google is okay at some search functions. Other vendors are okay at others. The problem is that virtually all search and retrieval solutions are okay. People have been trying for about 50 years to deliver responses to queries that are what the user requires. Most systems dissatisfy more than half their users and have for 50 years. A big company buying a next generation search system wants these problems solved. The big company wants to close deals, get client access licenses, or cloud transactions for queries. But the big companies don’t get these things, so the MBAs, lawyers, and accountants are really confused. Confused people make crazy decisions. You get the idea.

Third, search does not mean search. Search technology includes figuring out which words to index in a document. Search does a miserable job of indexing videos unless the video audio track is converted to ASCII and then that ASCII is indexed. Even with this type of content processing system, search does not deliver a usable output. What a user gets is garbled snippets and maybe the opportunity to look at a video to figure out if the information is relevant. Search includes figuring out what a user wants before the user asks the question or even knows what the question is. One company is collecting millions in venture money to achieve this goal. Good luck on that. Search includes providing outputs that answer an employee’s specific question. Most systems provide a horseshoe type of result; that is, the search vendor wants points for getting close to the answer. Employees who have to click, scan, close, and repeat the process are not amused. The employee wants the Smith invoice from April, not increased risk of carpal tunnel problems. The poobahs who acquire search companies want none of these excuses. The poobahs want sales. What search acquisitions generate are increased costs, long sales cycles, and much friction. Marketers overstate and search systems routinely under deliver.

Who cares?

Another enterprise search train wreck. The engineer was either an MBA, an accountant, or a lawyer. No big deal. Just get another search train. How tough can it be to run a search system? Thanks to http://www.eccchistory.org/CCRailroads.htm

Well, the executives selling big companies a search and content processing system just want the money. After years of backbreaking effort to generate revenues, the founders usually figure out that there are easier ways to earn a living. If the founders don’t bail out, they get a new job or become a guru at a venture capital firm.


Learning SharePoint and PowerShell is Worth It

May 25, 2012

In “Getting to Grips with PowerShell,” Robert Schifreen continues his SharePoint 2010 odyssey series by taking a closer look at the importance of PowerShell. The powerful scripting language may leave many users baffled, but Schifreen explains the benefits of being well-versed in PowerShell:

Although you can do all your SharePoint admin through the web interface via something called Central Administration, getting to grips with doing things in PowerShell is well worth the investment in time. For example, instead of sitting in Central Admin creating dozens of departmental site areas, you can take a CSV file of your department names and quickly turn it into a PowerShell script that does the job for you. If the resulting site structure then doesn’t look quite right, just alter the script, delete all the sites you created (using PowerShell, naturally), and run the script again. So being a good SharePoint admin means learning SharePoint and PowerShell. That’s just the start.
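The CSV-to-script idea can be sketched in a few lines. This is a hedged illustration written in Python that emits PowerShell text, not Schifreen's own script. It assumes a CSV with a `Department` column, a hypothetical base URL, and the Team Site template `STS#0`. `New-SPWeb` and `Remove-SPWeb` are SharePoint 2010 cmdlets, but check the parameters against your own farm before running anything generated this way.

```python
import csv
import io


def department_sites(csv_text, base_url):
    """Yield (url, display_name) for each non-empty 'Department' row."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = (row.get("Department") or "").strip()
        if name:
            slug = "-".join(name.lower().split())
            yield f"{base_url}/{slug}", name


def create_script(csv_text, base_url, template="STS#0"):
    """Build one New-SPWeb line per department."""
    return "\n".join(
        "New-SPWeb -Url '{0}' -Name '{1}' -Template '{2}'".format(
            url, name.replace("'", "''"), template  # double quotes for PowerShell
        )
        for url, name in department_sites(csv_text, base_url)
    )


def delete_script(csv_text, base_url):
    """Build the matching Remove-SPWeb lines so a bad run can be undone."""
    return "\n".join(
        f"Remove-SPWeb -Identity '{url}' -Confirm:$false"
        for url, _ in department_sites(csv_text, base_url)
    )


# Hypothetical input and site collection URL.
DEPARTMENTS = "Department\nHuman Resources\nFinance\n"
BASE = "http://sharepoint.example/sites/dept"
print(create_script(DEPARTMENTS, BASE))
print(delete_script(DEPARTMENTS, BASE))
```

Generating the delete script from the same CSV is what makes the "alter, delete, run again" loop cheap. Department names containing characters that are not valid in a URL would need extra handling that this sketch omits.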

Schifreen also suggests investing in a decent book on the topic, and if you want enterprise-level search, he suggests adding books on FAST, InfoPath, Business Intelligence connectivity services, Visual Studio, SharePoint Designer, and more. While enterprise search is an investment, there are out-of-the-box solutions that can save you valuable training and setup resources.

To bypass the need for some expensive or time-consuming training, consider a third party solution like Fabasoft Mindbreeze, which extends the capabilities of your SharePoint system. Their Web Parts based information pairing capabilities give you powerful searches and a complete picture of your business information, allowing you to get the most out of your enterprise search investments. And your end users will benefit from the fast and intuitive search with clearly displayed results and simple navigation.

Fabasoft Mindbreeze Enterprise gains each employee two weeks per year through focused finding of data (IDC Studies). That is an invaluable competitive advantage in business, and it improves employee satisfaction as well.

Mindbreeze’s intuitiveness means less training required. They also have tutorials and wikis that are easy to use and more efficient. Here you can browse Mindbreeze’s support tools for users, including videos, FAQs, wikis, and other training options. Check out the full suite of solutions at Fabasoft Mindbreeze.

Philip West, May 25, 2012

Sponsored by Pandia.com

What Differentiates Facebook from Google and HP?

May 25, 2012

It would seem to state the obvious to say that Facebook is different from Google and HP. Just how it stands apart could be more interesting, though, particularly from the viewpoint of the social giant’s own fearless leader. ZDNet reports, “Mark Zuckerberg on How Facebook is Different from Google, HP.”

Going public can have a way of changing a company; suddenly, shareholders must be placated, and it can easily become all about short-term profits. Zuckerberg insists, however, that his company will forever be all about “the social mission,” as writer Emil Protalinski put it. Recently, Zuckerberg emphasized this priority in a comparison with two of Facebook’s biggest competitors:

“I think the biggest difference between Facebook and other companies is how focused we are on our mission … Different companies care about different things. There are companies that care about, just really care about having the biggest market cap. Or there are companies that are really into process or the way they do things. Hewlett Packard, right? The thing that you always hear about them is ‘the HP Way.’ … Google, I think, is very tied to their culture — they really love that. For us, it is the mission: building a company that makes the world more open and connected. The articulation of that has, I think, changed over time. But that’s really been, like, the belief the whole time.”

Will investors appreciate the boy genius’ attitude? Some have already expressed disappointment in his wardrobe. Protalinski points out that anyone investing in Facebook is ultimately investing in Zuckerberg and his vision; stockholders would do well to give the man room to keep doing what he does. Comfortably dressed, even.

Cynthia Murrell, May 25, 2012

Sponsored by PolySpot

Jetbox Introduces Enticing Solution to Old Problems

May 25, 2012

Finding the right balance between easy to use and robust enough to handle a company’s data is a problem with which many traditional product lifecycle management providers struggle.  A new company, Jetbox, has just recently entered the PLM field and promises that their solution is unlike any before.  The article, “Introducing Jetbox(TM), Inc., a Company that Has Reinvented the Way PLM Is Sold, Deployed, Used, and Maintained”, on Market Watch, announces all the ways Jetbox is not the same old, same old when it comes to PLM solutions.

The article describes the company:

“Included in its ground-breaking software is iC5(TM) Turbo, a comprehensive sales and implementation toolset that dramatically slashes the time and eliminates the expertise required to sell, plan, install, configure, document, test, deploy, train, use, support, and upgrade a PLM system. This powerful software toolset helps companies eliminate the need for a customized solution and a team of consultants, keeping within the constraints of an out-of-the-box approach.”

Jetbox’s approach to making PLM accessible to more people within an enterprise, while keeping costs affordable and making it easy to integrate with existing platforms, is exactly the direction PLM needs to be moving.  Other providers, like Inforbix, are also realizing that a PLM solution that is too difficult to operate and requires a lot of money to install and maintain does more harm than good.  We look forward to what Jetbox and similar companies come up with next.

Catherine Lamsfuss, May 25, 2012

Will Quora Funding Pay Amazon Invoice?

May 25, 2012

Hmm, does this mean AWS is not free? Popular Q&A site Quora just raised a hefty chunk of change, but Business Insider reveals that “A Lot of Quora’s $50 Million Is Going Straight to Amazon.” The write up explains:

“Amazon.com, besides its vast online store, also rents out computing power to startups like Quora, so they don’t have to buy servers and lease space in data centers themselves. The division is called Amazon Web Services, and one of its key offerings is the Elastic Compute Cloud, or EC2.

“‘We project a large portion of this money to go to EC2 and other AWS bills,’” wrote D’Angelo. “‘It might be replaced by whatever the most appropriate place for us to run our infrastructure is in the future but as of today it’s looking like EC2.’”

I know I’m unfamiliar with the details, but on the surface it doesn’t sound like that deal lives up to AWS’ “low cost” promise. Perhaps Quora should shop around a bit more? Just a thought.

Amazon won’t be getting all of Quora’s cash, though. The company will soon be hiring, and will also save some for a rainy day. This team seems quite thrifty—they still have about half of their first financing round socked away. Very prudent.

Quora began work on their product in 2009, and launched their beta in early 2010. Their innovative system curates content on personal home pages so that users can easily find what is relevant to them—information from those who share their interests, or those with experience in the subject at hand. Last year, the company received the TechCrunch Award for Best New Startup or Product of 2010.

Cynthia Murrell, May 25, 2012

Sponsored by PolySpot

Attivio Signs TCplus as a Partner

May 25, 2012

The folks at Attivio must be pleased with their most recent success. MMD Newswire reveals, “TCPlus and Attivio Sign Partner Agreement for Australia and Switzerland.” Network component vendor TCplus Datennetz is adding Attivio’s Active Intelligence Engine (AIE) to its wares. AIE is a unified information access platform that goes beyond traditional data warehousing and enterprise search solutions. The press release emphasizes:

“Attivio AIE freely integrates and presents structured data (databases) together with unstructured content (documents, SharePoint, web content, email, etc.) enabling customers to know not only ‘what’ is happening, but also gain context to analyze ‘why’ it is happening. Organizations that implement AIE empower their business users to easily access and analyze all relevant enterprise information to identify new business solutions and opportunities that might otherwise go undiscovered.

“Attivio AIE offers the most accessible and standards-based approach to analytics of any UIA platform.”

AIE is compatible with SQL as well as with leading business intelligence and analytic platforms. The product has garnered several awards.

Headquartered in Newton, MA, Attivio also has offices in the UK and Germany. The company has made it its mission to unravel the unstructured data conundrum. It has partnered with prominent OEM/service providers and solution providers as well as technology vendors like TCplus Datennetz.

Founded in 2004, TCplus Datennetz sells and installs active and passive network components. They pride themselves on their expertise and their close relationships with their customers.

Cynthia Murrell, May 25, 2012

Sponsored by PolySpot

SAP Big Blue Rides Hana

May 25, 2012

The University of Kentucky’s business intelligence team has had to make some adjustments after the school implemented SAP’s HANA system. ComputerWorld declares, “For Univ. of Kentucky, SAP’s HANA is ‘Disruptive’.” Writer Patrick Thibodeau, punning on the term “disruptive technology,” notes that the University is (purposely) using HANA to restructure its BI system to better analyze student retention.

The new in-memory systems like HANA pull data from RAM instead of from hard disks. Speed and relative simplicity are the advantages, but these systems do require a hardware investment. In this case, Dell provided the hardware and developed the school’s student retention data models.

HANA is only a year old, and questions about its longevity are still in the air. Part of the issue is the hardware question: should organizations deploy on the tried and true x86 system or go with an engineered system, like IBM’s new PureSystems? Thibodeau writes:

“Engineered systems offer performance gains, meaning faster time to realize value and ‘less cumbersome’ management, said Alys Woodward, a research director at IDC. On the other hand, ‘software on commodity hardware reduces vendor lock-in and enables the use of cheaper components,’ said Woodward.

“How SAP HANA ‘will play in the broader marketplace — outside SAP’s core install base — against Oracle Exadata and IBM engineered systems, depends to some extent on how these two opposing concepts will play out,’ said Woodward.”

So, x86 or engineered, take your pick. If you are considering HANA, though, the write up notes that you should make sure it will do what you want before buying the pricey software. It will not, for example, make up for poor data quality. It is also more likely to justify the cost and effort where business requirements change frequently than in an organization with a more static environment.

Cynthia Murrell, May 25, 2012

Sponsored by PolySpot

ZyLAB Embraces Predictive and Concept Searching

May 25, 2012

The CodeZed blog recently reported on the automated classification of legal documents in the article “Technology Assisted Review, Concept Search and Predictive Coding: The Limitations & Risks.”

According to the article, artificial intelligence and machine learning have been around since the 1980s, but a recent US ruling regarding the use of machine learning technology in legal review has stirred up trouble in the eDiscovery community. As a result of this ruling, one can expect eDiscovery software buyers to require Predictive Coding, Concept Search, or other Technology Assisted Review (TAR) capabilities far more often.

When discussing some of the drawbacks of machine learning and artificial intelligence, the article states:

“Machine-learning requires significant set-up involving training and testing the quality of the classification model (aka the classifier), which is a time consuming and demanding task that requires at least the manual tagging and evaluation of both the training and the test set by more than one party (in order to prevent biased opinions). Testing has to be done according to best practice standards used in the information retrieval community (e.g. see the proceedings of the TREC conferences organized by the NIST). Deviation from such standards will be challenged in courts. This is time consuming and expensive and should be factored into the cost-benefit analysis for the approach.”
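To make the testing step concrete, here is a small sketch of the kind of measurement the information retrieval community relies on: precision, recall, and F1 computed on a manually tagged test set. The labels below are invented for illustration, and the article does not prescribe these exact metrics or any acceptance threshold.

```python
def precision_recall_f1(predicted, actual):
    """Compare classifier output with reviewer tags (True means 'responsive')."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)        # correctly flagged
    fp = sum(1 for p, a in pairs if p and not a)    # flagged but not responsive
    fn = sum(1 for p, a in pairs if not p and a)    # responsive but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Invented test set: eight documents tagged by reviewers, then classified.
reviewer_tags = [True, True, True, False, False, True, False, False]
classifier_out = [True, True, False, False, True, True, False, False]

precision, recall, f1 = precision_recall_f1(classifier_out, reviewer_tags)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

The arithmetic is trivial. The expense the article warns about is in producing `reviewer_tags`: every entry is a document that more than one party had to read and tag by hand.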

So the short of it is, before using Technology Assisted Review make sure that you do your research and figure out what is best for your business.

Jasmine Ashton, May 25, 2012

Sponsored by PolySpot

SharePoint Options for iPhone and iPad

May 24, 2012

Mobile devices are moving from a business necessity to a formal mandate.  While apps exist for most major functions, tackling mobile access to SharePoint is a bigger creature to tame.  We may have a breaking development in, “Use SharePoint on your iPad or iPhone,” by Dave Johnson.

Collaborating with an iPad in a corporate environment is sometimes challenging, because the ubiquitous tablet hasn’t had the same access to SharePoint as laptops and desktops. Sharing a document, then, has meant sending attachments, and iPads were blocked from accessing the vast stores of corporate documents already archived in SharePoint. Now harmon.ie delivers substantially the same SharePoint experience as your desktop PC provides.

The free harmon.ie app offers an upgrade to a premium edition for a slim $20.  And while accessing an existing SharePoint infrastructure via an iPad or iPhone is now possible, the usability and efficiency are still in question.  How much work can actually be accomplished by forcing SharePoint to fit the mobile mold?

For heavy mobile users, we would recommend a smart third-party solution that integrates a mobile offering into their primary solution.  Fabasoft Mindbreeze Mobile is one such option.

Smartphones and tablets are constant companions, indispensable in the business world. Information needs to be able to be exchanged at all times and wherever you are. Easily. Quickly. Securely.  Fabasoft Mindbreeze Mobile makes company data available on all mobile devices, regardless of whether you have a BlackBerry®, iPhone®, Windows Phone or Android™ Smartphone or a tablet such as the Apple iPad, Samsung Chromebook/GalaxyTab or Blackberry Playbook. You can act independently and freely – yet always securely. Irrespective of what format the data is in.

Perhaps most importantly, security is never compromised with the compliant, industry-vetted Mindbreeze solution.  So while some apps will let you access your SharePoint installation, Fabasoft Mindbreeze is built to make your mobile access fully functional.

Emily Rae Aldridge, May 24, 2012

Sponsored by Pandia.com
