The Decline of PCs and Search?
November 4, 2012
I worked through “The Slow Decline of PCs and the Fast Rise of Smartphones/Tablets Was Predicted in 1993.” The main point is that rocket scientist, cook, and patent expert Nathan P. Myhrvold anticipated the shift from desktop computers to more portable form factors. Years earlier, I remember a person from Knight Ridder pitching a handheld gizmo which piggybacked on the Dynabook concept. When looking for accurate forecasts and precedents, those with access to a good library, commercial databases, and the Web can ferret out many examples of the Nostradamus approach to research. I am all for it. Too many people today do not do hands-on research. Any exercise of this skill is to be congratulated.
Here’s the main point of the write up in my opinion:
His memo is amazingly accurate. Note that his term “IHC” (Information Highway Computer) could be roughly equated with today’s smartphone or tablet device, connecting to the Internet via WiFi or a cellular network. In his second last paragraph, Myhrvold predicts the winners will be those who “own the software standards on IHCs” which could be roughly equated with today’s app stores, such as those on iOS (Apple), Android (Google, Amazon) and Windows 8 (Microsoft). The only thing you could say he possibly didn’t foresee would be the importance of hardware design in the new smartphone and tablet industry.
Let’s assume that Mr. Myhrvold was functioning in “I Dream of Jeannie” mode. Now let’s take that notion of a big change coming quickly and apply it to search. My view is that traditional key word search was there and then—poof—without a twitch of the soothsayer’s nose, search was gone.
Look at what exists today:
- Free search which can be downloaded from more than a dozen pretty reliable vendors plus the Apache Foundation. Install the code and you have state-of-the-art search, facets, etc. (a sketch of what the Apache option looks like appears after this list).
- Business intelligence. This is search with grafted on analytics. I think of this as Frankensearch, but I am old and live in rural Kentucky. What do you expect?
- Content processing. This is data management with some search functions and a bunch of parsing and tagging. Indexing is good, but the cost of humans is too high for many government intelligence organizations. So automation is the future.
- Predictive search. This is the Google angle. You don’t need to do anything, including think too much. The system does the tireless nanny job.
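Assuming the Apache option means Solr, here is a minimal sketch of a faceted query against a local instance. The core name ("docs") and the "category" field are invented for illustration; swap in whatever your schema actually uses.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a local Apache Solr instance with a core named "docs"
# whose schema includes a "category" field. Names are illustrative.
SOLR_SELECT = "http://localhost:8983/solr/docs/select"

params = urllib.parse.urlencode({
    "q": "enterprise search",
    "wt": "json",
    "facet": "true",
    "facet.field": "category",
    "rows": 10,
})

with urllib.request.urlopen(SOLR_SELECT + "?" + params) as response:
    results = json.loads(response.read())

# Print the hits, then the facet counts Solr computed for the category field.
for doc in results["response"]["docs"]:
    print(doc.get("id"), doc.get("title"))

# Solr returns facets as a flat [value, count, value, count, ...] list.
facets = results["facet_counts"]["facet_fields"]["category"]
for value, count in zip(facets[::2], facets[1::2]):
    print(value, count)
```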
So is search in demise mode? Yep. Did anyone predict it? I would wager one thin dime that any number of azure chip consultants will have documents in their archives showing that the death of search was indeed predicted. One big outfit killed a “magic carpet tile” charting the search industry and then brought it back.
So perhaps search is not dead after all. Maybe it was Mark Twain who said, “The reports of my death have been greatly exaggerated.” Just like PCs, mainframes, and key word search?
Stephen E Arnold, November 4, 2012
The Fragmentation of Content Analytics
October 29, 2012
I am in the midst of finalizing a series of Search Wizards Speak interviews with founders or chief technology officers of some interesting analytics vendors. Add to this work the briefings I have attended in the last two weeks. Toss in a conference which presented a fruit bowl of advanced technologies which read, understand, parse, count, track, analyze, and predict who will do what next.
Wow.
From a distance, the analytics vendors look the same. Up close, each is distinct. Pick up the wrong shard, and a cut finger or worse may result.
Who would have thought that virtually every company engaged in indexing would morph into next-generation, Euler-crazed, Gauss-loving number crunchers? If the names Euler and Gauss do not resonate with you, you are in for tough sledding in 2013. Math speak is the name of the game.
There are very good reasons for repackaging Vivisimo as a big data and analytics player. I chose Vivisimo because I have used it as an example of IBM’s public relations mastery. The company developed a deduplication feature which was, and I assume still is, pretty darned good. Then Vivisimo became a federated search system, nosing into territory staked out by Deep Web Technologies. Finally, when IBM bought Vivisimo for about $20 million, the reason was big data and similarly bright, sparkling marketing lingo. I wanted to mention Hewlett Packard’s recent touting of Autonomy as an analytics vendor or Oracle’s push to make Endeca a business analytics giant. But IBM gets the nod. Heck, it is a $100 billion a year outfit. It can define an acquisition any way it wishes. I am okay with that.
The Google Search Appliance Adds Bells and Whistles
October 18, 2012
A version of this article appears on the www.citizentekk.com Web site.
The Google Search Appliance is getting along in years. A couple of weeks ago (October 2012), Google announced that Version 7.0 of the Google Search Appliance was available for the GB-7007 and GB-9009 models. The features of the new release are, in my opinion, long overdue. Among them are two highly desirable enhancements: better security controls and faceted browsing. But the killer feature is support for the Google Translate application programming interface.
Microsoft will have to differentiate the now aging SharePoint Search 2013 from the Google Search Appliance. Why? GSA Version 7 can be plugged into a SharePoint environment, and the system will, without much fuss, index the SharePoint content. Plug and play is not what SharePoint Search 2013 delivers. The fast deployment of a GSA remains one of its killer features. Simplicity and ease of use are important. When one adds Google magic, the GSA Version 7 can be another thrust at Microsoft’s enterprise business.
See http://www.bluepoint.net.au/google-search/gsa-product-model
Google has examined competitive search solutions and, in my opinion, made some good decisions. For example, a user may add a comment to a record displayed in a results list. The idea of allowing enterprise users to add value to a record was a popular feature of Vivisimo Velocity. But since IBM acquired Vivisimo, that company has trotted down the big data trail.
Endeca has for more than 12 years offered licensees of its systems point-and-click navigation. An Endeca search solution can slash the time it takes for a user to pinpoint content related to a query. Google has made the GSA more Endeca-like while retaining the simplified deployment which characterizes an appliance solution.
As I mentioned in the introduction, one of the most compelling features of the Version 7 GSAs is direct support for Google Translate. Organizations increasingly deal with mixed language documents. Product and market research will benefit from Google’s deep support of languages. At last count, Google Translate supported more than 60 languages, excluding Latin and Pig Latin. Now Google is accelerating its language support due to its scale and data sets. Coupled with Google’s smart software, the language feature may be tough for other vendors to match.
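For readers curious about what Translate integration looks like at the API level, here is a minimal sketch against the public Translate v2 REST endpoint. The API key is a placeholder, and the GSA’s internal integration is not documented at this level, so treat this as an illustration of the underlying service rather than of the appliance itself.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a valid Google Translate API v2 key. The endpoint and
# response shape below match the public v2 REST API.
API_KEY = "YOUR_API_KEY"  # hypothetical placeholder
ENDPOINT = "https://www.googleapis.com/language/translate/v2"

def translate(text, source, target):
    params = urllib.parse.urlencode(
        {"key": API_KEY, "q": text, "source": source, "target": target}
    )
    with urllib.request.urlopen(ENDPOINT + "?" + params) as response:
        payload = json.loads(response.read())
    return payload["data"]["translations"][0]["translatedText"]

print(translate("Guten Tag", source="de", target="en"))  # -> "Good day"
```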
Enterprise searchers want to be able to examine a document quickly. To meet this need, Google has implemented in-line document preview. A user can click on a hit and see a rendering of the document without having to launch the native application. A PDF in a results list appears without the seconds-long wait for Adobe Reader or Foxit to fetch and display the document.
What’s not to like? The GSA GB-7007 and GB-9009 deliver the most-wanted features for making content searchable regardless of source. If a proprietary file type must be indexed, Google provides developers with enough information to get the content into a form which the GSA can process. Failing that, Google partners and third-party vendors can deliver specialized connectors quickly.
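For proprietary content, the documented route is the GSA Feeds Protocol: an XML document pushed to the appliance over HTTP. Here is a minimal sketch; the host, datasource name, and record URL are invented for illustration, and Google’s own sample feed client uses the same multipart POST shown here.

```python
import requests  # third-party library; pip install requests

# A minimal content feed per the GSA Feeds Protocol. The appliance
# accepts feeds as a multipart HTTP POST on port 19900. Hostnames,
# the datasource name, and the record URL are all invented.
feed_xml = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE gsafeed PUBLIC "-//Google//DTD GSA Feeds//EN" "">
<gsafeed>
  <header>
    <datasource>legacy_reports</datasource>
    <feedtype>incremental</feedtype>
  </header>
  <group>
    <record url="http://intranet.example.com/report-42"
            mimetype="text/html" action="add">
      <content>Text extracted from the proprietary file goes here.</content>
    </record>
  </group>
</gsafeed>"""

response = requests.post(
    "http://gsa.example.com:19900/xmlfeed",
    files={
        "feedtype": (None, "incremental"),
        "datasource": (None, "legacy_reports"),
        "data": ("feed.xml", feed_xml, "text/xml"),
    },
)
print(response.status_code, response.text)  # the GSA replies "Success"
```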
How a Sitemap Can Enhance a Web Presence
October 17, 2012
Business2Community covers the importance of Web site indexing in its piece, “How To Build Your Own Sitemap in Five Minutes, and Why You Need To.” In it, the author discusses different processes for creating an effective sitemap. He begins with an introduction:
When you ask most business owners and beginning online marketers what a ‘Sitemap’ is, you usually get two responses. ‘What’s that?’ or ‘That’s just too complicated for us.’ Sitemaps for your website aren’t impossible to make, and they certainly aren’t a waste of time. To understand why you need to make your own Sitemap today, you need to understand what they are and how they work.
The author then goes on to recommend various tools and techniques for effectively creating a sitemap. However, there are other solutions which not only generate sitemaps automatically but also crawl and index an organization’s site to enable effective Web site search. One highly awarded option is Fabasoft Mindbreeze InSite. Fabasoft Mindbreeze takes the guesswork out of indexing and mapping, delivering strong results with little effort. Explore how Fabasoft Mindbreeze might enhance your organization’s online presence today.
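For readers who want to see how small a sitemap really is, here is a minimal generator following the sitemaps.org 0.9 protocol. The page list is invented; a real generator would walk the site or query its CMS.

```python
from xml.sax.saxutils import escape

# Invented pages for illustration: (URL, last-modified date).
pages = [
    ("http://www.example.com/", "2012-10-01"),
    ("http://www.example.com/about", "2012-09-15"),
    ("http://www.example.com/products", "2012-10-10"),
]

# One <url> entry per page, per the sitemaps.org 0.9 protocol.
entries = "\n".join(
    "  <url>\n    <loc>%s</loc>\n    <lastmod>%s</lastmod>\n  </url>"
    % (escape(loc), lastmod)
    for loc, lastmod in pages
)

sitemap = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
    + entries + "\n</urlset>\n"
)

with open("sitemap.xml", "w") as handle:
    handle.write(sitemap)
```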
Emily Rae Aldridge, October 17, 2012
Sponsored by ArnoldIT.com, developer of Augmentext.
Automation to Cure Duplicate Content Issues
October 15, 2012
Search Engine Land is shining a light on a common Web site search problem: duplicate content. Read the full report in “An Automated Tool To Eliminate Duplicate Content Issues.”
The author begins:
BloomReach announced a new software product named Dynamic Duplication Reduction (DDR) that aims to eliminate duplicate content issues on web sites. Typically, software tools are known to cause duplicate content issues but this tool promises to reverse it. The tool deeply crawls your web pages and continuously interprets all content on a site. It will automatically discover and act on duplicate pages.
They say an ounce of prevention is worth a pound of cure, and in this case the prevention needed is effective Web site indexing. Fabasoft Mindbreeze InSite quickly crawls and indexes all Web site content, delivering search results based on relevance. InSite even corrects misspellings and prevents duplication. Fabasoft Mindbreeze is a longstanding leader in third-party solutions for the enterprise, and InSite is quickly becoming the icing on the cake for this industry leader.
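Neither vendor documents its internals, but one common way to flag near-duplicate pages is to compare sets of word “shingles” (overlapping n-grams) with the Jaccard coefficient. A minimal sketch of that generic technique, with invented page text:

```python
# A generic near-duplicate check, not a description of BloomReach's
# or Mindbreeze's actual internals.

def shingles(text, n=3):
    """Return the set of overlapping n-word shingles in the text."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a, b):
    """Jaccard coefficient: overlap of two sets relative to their union."""
    return len(a & b) / len(a | b) if a | b else 0.0

page_one = "Our widget ships in three colors and two sizes for every budget."
page_two = "Our widget ships in three colors and two sizes for any budget."

similarity = jaccard(shingles(page_one), shingles(page_two))
print("similarity: %.2f" % similarity)  # high scores flag likely duplicates
```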
Emily Rae Aldridge, October 15, 2012
Sponsored by ArnoldIT.com, developer of Augmentext.
Get A Comprehensive Search Strategy Plan from Aspire
October 12, 2012
People tend to doubt the power of a good search application. They take it for granted that all out-of-the-box and Internet search engines are as accurate as Google (merely the most visible in the public eye). The truth of the matter is that most businesses are losing productivity because they have not harnessed the true potential of search. Search Technologies, a leading IT company that specializes in search engine implementation, managed services, and consulting, is the innovator behind Aspire:
“Aspire is a powerful framework and application platform for acquiring both structured and unstructured data from just about any content source, processing / enriching that content, and then publishing it to the search engine or business analytics tool of your choice.”
Aspire uses a built-in indexing pipeline and proprietary code maintained to Search Technologies’ high standards. It is based on Apache Felix, the leading open source implementation of the OSGi standard. OSGi is a module system for Java supported by IT companies worldwide. Aspire can gather documents from a variety of sources, including relational databases, SharePoint, file systems, and many more. The metadata is captured and can then be enriched, combined, reformatted, or normalized to whatever the business needs before it is submitted to search engines, document repositories, or business analytics applications. Aspire performs content processing that cleans and repackages data for findability.
“Almost all structured data is originally created in a tightly controlled or automated way.
By contrast, unstructured content is created interactively by individual people, and is infinitely variable in its format, style, quality and structure. Because of this, content processing techniques that were originally developed to work with structured data simply cannot cope with the unpredictability and variability of unstructured content.”
By implementing a content processing application like Aspire, unstructured content is “scrubbed,” then enriched, for better search results. Most commercial search engines do not have filters which separate relevant content from junk. The results displayed to the user are thus of poor quality and of little to no use. Vendors try to resolve the problem with custom coding and updates for every new data source that pops up, which is tedious. Aspire sidesteps that custom coding treadmill by performing automated metadata extraction and manipulation outside the search engine.
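To make the “scrubbing” concrete, here is a toy sketch of the acquire, enrich, publish pattern that frameworks like Aspire implement. Aspire itself is a Java/OSGi system configured in XML, so none of the names below reflect its real API; this is only the shape of the pipeline.

```python
# A toy acquire -> enrich -> publish pipeline. All names are invented.

def acquire(source):
    """Pull raw records from a content source (stubbed here)."""
    yield {"id": "doc-1", "title": "  Q3 REPORT ", "body": "Revenue rose."}

def enrich(record):
    """Normalize fields and add metadata before indexing."""
    record["title"] = record["title"].strip().title()
    record["word_count"] = len(record["body"].split())
    return record

def publish(record, index):
    """Hand the processed record to a search engine or analytics tool."""
    index[record["id"]] = record

index = {}
for raw in acquire("sharepoint://finance"):
    publish(enrich(raw), index)

print(index["doc-1"])
```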
As powerful as commercial search engines are, they can often lack the refined quality one gets from a robust ISV. Aspire does not follow the same search technology path as its competitors; rather, it is a new, original solution designed to give clients a comprehensive search strategy which improves productivity, organization, and data management.
Remember: Search Technologies is sponsoring a meetup at the October 2012 Enterprise Search Summit. More information is available at http://www.meetup.com/DC-Metro-Enterprise-Search-Network/
Iain Fletcher, October 12, 2012
Sponsored by ArnoldIT.com, developer of Augmentext
Salesforce Incorporates Coveo Enterprise Search
September 22, 2012
ITWorldCanada announces, “Coveo Brings Enterprise Search to Salesforce.com.” The Canadian company will contribute its indexing engine and business intelligence tools to the Salesforce.com cloud. Coveo for Salesforce, which can pull together, index, and analyze unstructured data from multiple sources, will be fully integrated into the popular online customer relationship management (CRM) platform.
The write up tells us:
“Louis Tetu, CEO of Coveo, said the product is the first tool of its kind that is integrated directly into Salesforce. ‘We are enabling an entirely new paradigm to federate information on demand,’ he said. ‘And that paradigm means that we don’t have to move data, we’re just pointing…secure indexes to that information.’
“Users of the technology that need information delivered in real-time, such as customer-facing companies, will be able to get it rapidly — within 100 milliseconds — he added. This will help solve the common problem of consumers dealing with contact centres that cannot pull up their information in a reasonable period of time.”
Yes, that is a real plus. Tetu went on to emphasize that this is no small development: his company has conquered the considerable challenges of operating securely in the cloud. He mentions that they also make a special effort to ensure new users can dive in as easily as possible.
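Federation in Mr. Tetu’s sense usually means querying several secure indexes in parallel and merging the pointers, not copying the data. A minimal sketch of that pattern, with invented sources and scores; Coveo’s actual connectors and security model are far more involved.

```python
from concurrent.futures import ThreadPoolExecutor

# Invented stand-ins for per-repository indexes. Each returns
# (document pointer, relevance score) pairs for a query.
def query_crm(q):
    return [("crm:case-88", 0.91)]

def query_email(q):
    return [("mail:msg-1024", 0.84)]

def query_wiki(q):
    return [("wiki:faq-7", 0.62)]

sources = [query_crm, query_email, query_wiki]

# Query every index in parallel; the data never leaves its repository.
with ThreadPoolExecutor(max_workers=len(sources)) as pool:
    result_sets = list(pool.map(lambda fn: fn("billing dispute"), sources))

# Merge and rank the pointers from all sources.
merged = sorted((hit for hits in result_sets for hit in hits),
                key=lambda pair: pair[1], reverse=True)
print(merged)
```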
Coveo was founded in 2005 by some members of the team which developed Copernic Desktop Search. Coveo takes pride in solutions that are agile and easy to use yet scalable, fast, and efficient.
Cynthia Murrell, September 22, 2012
Sponsored by ArnoldIT.com, developer of Augmentext
No Wonder Search Is Broken. Software Does Not Work.
September 17, 2012
Several years ago, I ran across a Microsoft-centric podcast hosted by an affable American, Scott Hanselman. At the time, he worked for a company developing software for the enterprise. Then I think he started working at Microsoft, and I lost track of him.
I read “Everything’s Broken and Nobody’s Upset.” The author was Scott Hanselman, who is “a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee.”
The article is a list of bullet points. Each bullet point identifies a range of software problems. Some of these were familiar; for example, iPhoto choking on large numbers of pictures on my wife’s new Mac laptop. Others were unknown to me; for example, the lousy performance of Gmail. Hopefully Eric Brewer, co-founder of Inktomi, can help improve the performance of some Google services.
The problems Mr. Hanselman identifies can be fixed. He writes:
Here we are in 2012 in a world of open standards on an open network, with angle brackets and curly braces flying at gigabit speeds and it’s all a mess. Everyone sucks, equally and completely.
- Is this a speed problem? Are we feeling we have to develop too fast and loose?
- Is it a quality issue? Have we forgotten the art and science of Software QA?
- Is it a people problem? Are folks just not passionate about their software enough to fix it?
I think it’s all of the above. We need to care and we need the collective will to fix it.
My reaction was surprise. I know search, content processing, and Fancy Dan analytics do not work as advertised, as expected, or, in some cases, very well despite the best efforts of rocket scientists.
The notion that the broad world of software is broken was an interesting one. Last week, I struggled with a client who could not explain what its new technology actually delivered to a user. The reason was that the words the person used did not match what the new software widget actually did. Maybe the rush to come up with clever marketing catchphrases is more important than solving a problem for a user?
In the three disciplines we monitor—search, content processing, and analytics—I do not have a broad method for remediating “broken” software. My team and I have found that the approach outlined by Martin White and me in Successful Enterprise Search Management is simply ignored by those implementing search. I can’t speak for Martin, but my experience is that the people who want to implement a search, content processing, or analytics system demonstrate certain characteristics. These are not universally shared, but I have gathered the most frequent actions and statements over the last year. The reasons for lousy search-related systems:
- Short cuts only, please. A consultant explained that buying third-party components was cheaper, quicker, and easier than examining the existing search-related system.
- Something for nothing. The idea is that a free system is going to save the day.
- New is better. The perception is that a new system from a different vendor will solve the findability problem because it is different.
- We are too busy. The belief that talking to the users of a system is a waste of time. The typical statement can be summarized: “Users don’t know what they want or need.”
- No appetite for grunt work. This is an entitlement problem: figuring out metrics like content volume, working through content normalization issues, and reviewing candidate term lists is seen as someone else’s job or as too hard.
- No knowledge. This is a weird problem caused in part by point-and-click interfaces and predictive systems like Google’s. Those who should know about search-related issues do not; therefore, education is needed. But, like recalcitrant sixth graders, those involved do not muster the effort required to learn.
- Looking for greener pastures. Many of those working on search-related projects hope to jump to a different, higher paying job in the organization or to leave the company to do a start up. As a result, search-related projects are irrelevant to them.
The problem in search, therefore, is not the technology. Most of the systems are essentially the same as those which have been available for decades. Yes, decades. Precision and recall remain in the 80 percent range. Predictive systems chop data sets down to more usable chunks, but prediction is a hit-and-miss game. Automated indexing requires a human to keep the system on track.
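For those keeping score at home, the two metrics are simple set arithmetic: precision is the share of retrieved documents which are relevant, and recall is the share of relevant documents which were retrieved. A minimal example with invented document sets:

```python
# Precision and recall for a single query. The document IDs are invented.
retrieved = {"d1", "d2", "d3", "d4", "d5"}   # what the engine returned
relevant = {"d1", "d2", "d3", "d7", "d9"}    # what it should have returned

true_positives = retrieved & relevant
precision = len(true_positives) / len(retrieved)   # 3/5 = 0.60
recall = len(true_positives) / len(relevant)       # 3/5 = 0.60

print("precision %.0f%%, recall %.0f%%" % (precision * 100, recall * 100))
```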
The problem is anchored in humans: their knowledge, their ability to prioritize search-related tasks, their willingness to learn. Net net: software is not getting much better, but it is prettier than a blinking dot on a VAX terminal. Better? Nah. Upset? Nope. There are distractions and Facebook pals to provide assurances that everything is A-OK.
Stephen E Arnold, September 17, 2012
Sponsored by Augmentext
IBM and Its Predictive Analytics Push
September 12, 2012
I prefer to examine the plumbing of search and content processing systems. What is becoming increasingly obvious to me is that many of the “new” business intelligence and eDiscovery vendors are licensing technology and putting a different user interface on what is a collection of components.
Slap on visualization and some game-like controls and you have “big data analytics.” Swizzle around the decades-old technology from Oracle, and you still find the Oracle database system. Probe the Hadoop vendors, and you find fancy dancing away from the batch orientation of the NoSQL data management framework. Check out the indexing subsystems, and you find third parties with a handful of customers licensing their technology to a “wrapper company.”
The phrase “wrapper company” and the product approach of “wrapper bundles” are now described in some clever marketing lingo. The notions of federation, real time, and distributed data are woven into systems which predict, permit discovery, and allow users to find answers to questions they did not know to ask.
Everything sounds so “beyond search.” I think many licensees and prospects react to the visualizations in the demos. The promise that a business professional can use these systems without knowing about the underlying data, programming, or statistical methods is what sells. Who wants to pay for a person to babysit a system and write custom reports? Chop that headcount because the modern systems are “smart.”
Next generation analytics systems are, like enterprise search, comprised of many moving parts. For most professionals, the “moving parts” are of little interest and even less frequently scrutinized. Users want answers or information without having to do much more than glance at a visual display. The ideal system says, “Hello, Dave, here’s what you need to know right now.”
The IBM Ad
I noted an advertisement in the Wall Street Journal on September 10, 2012, page A20. The advertiser was IBM. The full-page ad featured the headline, “We Used to Schedule Repairs.” The idea is that smart software monitors complex systems and proactively finds, repairs, and notifies before a system fails.
Sounds fantastic.
The ad asserts:
“Fixing what will break next, first. Managing [the client’s] infrastructure proactively rather than reactively has helped the utility reduce its customer calls by 36 percent.”
The argument concludes:
“Replacing intuition with analytics. No one knows your organization’s millions of moving parts better than you. But now with IBM predictive maintenance, you can spend less time and fewer resources repairing things either too early or too late, and more time focusing your attention on what happens next.”
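Strip away the ad copy and the core mechanic such claims rest on is straightforward: estimate each asset’s failure risk and schedule repairs in descending order of risk. A toy sketch with invented scores; IBM’s actual system is presumably far more elaborate.

```python
# Invented failure-risk scores, as a real system might estimate from
# sensor data and maintenance history.
assets = {
    "transformer-12": 0.82,
    "pump-usw-3": 0.15,
    "feeder-line-7": 0.64,
}

# "Fixing what will break next, first": repair in descending risk order.
repair_queue = sorted(assets.items(), key=lambda kv: kv[1], reverse=True)
for name, risk in repair_queue:
    print("schedule %s (estimated failure risk %.0f%%)" % (name, risk * 100))
```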
The ad points to an IBM landing page where snappy visualizations, the phrase “smarter analytics,” and a video round out the supplemental information.
Observations
Three observations:
- IBM has the resources to launch a major promotion of its predictive analytics capabilities. The footprint of IBM in this concept space may boost interest in analytics. However, smaller firms will have to differentiate themselves and offer the type of benefits and customer references IBM employs.
- The approach of the copy in the ad is to make predictive analytics synonymous with smart management and cost effective systems. Many of the analytics companies struggle to articulate a clear value proposition like this.
- The notion of making a smarter information technology department fits into IBM’s broader message of a smarter planet, city, government, etc. Big ideas like this are certainly easier to grasp than the nitty gritty, weaknesses, and costs of computationally canned methods.
For smaller analytics vendors, it is game on.
Stephen E Arnold, September 12, 2012
Sponsored by Augmentext
More on Marketing Confusion in Big Data Analytics
September 11, 2012
Search vendors are like squirrels dodging traffic. Some make it across the road safely. Others? Well, there is a squirrel heaven, I assume. Which search vendors will survive the speeding tractor trailers hauling big data, analytics, and visualization to customers famished for systems which make sense of information? I don’t know. No one really knows.
What is fascinating is to watch the Darwinian process at work among vendors of search and content processing. TextRadar’s “Content Intelligence: An Unexpected Collision Is Coming” makes clear that there are quite a few companies not widely known in the financial and health care markets. Some of these companies have opportunities to make the leap from government contract work to commercial work for Fortune 1000 companies.
But what about more traditional search vendors?
I received in the snail mail a copy of Oracle Magazine, September/October 2012. The article which caught my attention was “New Questions, Fast Answers.” The information was in the form of an interview between Rich Schwerin, an Oracle Magazine writer, and Paul Sonderegger, senior director of analytics at Oracle. Mr. Sonderegger was the chief strategist at Endeca, which is now part of the Oracle family of companies.
I have followed Endeca since I first learned about the company in 1999, 13 years ago. Like many traditional search vendors, the underlying technical concepts of Endeca date from the salad days of key word search. Endeca’s innovation was to use concepts, either human-assigned or generated by software, to group related information. The idea was that a user could run a query and then click on concepts to “discover” information not in the explicit key word match. Endeca dubbed the function “guided navigation” and applied the approach to eCommerce as well as search across the type of information found in a company. The core of the technology was the Endeca MDEX engine. At the time of Endeca’s market entrance, there were only a handful of companies competing for enterprise search and eCommerce. Since then, the field has narrowed in one sense, with the big name companies acquired by larger firms, and broadened in another: there are hundreds of vendors offering search, but the majority of these companies use different words to describe indexing and search.
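Conceptually, guided navigation is little more than counting the concept tags on the documents which match a query and offering the counts as clickable refinements. A stripped-down sketch with invented documents; Endeca’s MDEX engine is, of course, far more sophisticated.

```python
from collections import Counter

# Invented result set: documents with concept tags, as a taxonomy or
# an auto-classifier might assign them.
results = [
    {"title": "Sensor overview", "tags": ["hardware", "iot"]},
    {"title": "Sensor pricing", "tags": ["hardware", "pricing"]},
    {"title": "Firmware notes", "tags": ["software", "iot"]},
]

# Count tag occurrences across the hits; each count becomes a
# clickable refinement next to the results list.
facets = Counter(tag for doc in results for tag in doc["tags"])
for tag, count in facets.most_common():
    print("%s (%d)" % (tag, count))   # e.g. "iot (2)" becomes a nav link
```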
One Endeca executive (Peter Bell) told me in 2005 that the company had been growing at 100 percent each year since 2002. At the time of the Oracle buy out, I estimated that Endeca had hit about $150 million in revenues. Oracle paid about $1.1 billion for the company, or, if my estimate is accurate, roughly seven times annual revenues. Endeca was a relative bargain compared to Hewlett Packard’s purchase of Autonomy for about $10 billion. Autonomy, founded a few years before Endeca, had reached about $850 million in annual revenues, so the multiple on revenues was greater than in the Endeca deal. The point is that these two search giants ranked one and two in enterprise search revenues. Both companies emphasized their technologies’ ability to handle structured and unstructured information. Both Autonomy and Endeca offered business intelligence solutions. In short, both companies had capabilities which some of the newcomers mentioned in the TextRadar article are now touting as fresh and innovative. One key point: it took Endeca a dozen years to hit $150 million, and now Oracle has to generate more revenue from the aging Endeca technology. HP has the same challenge with Autonomy, of course. Revenue generation, in my opinion, has been time consuming and difficult. Of the hundreds of vendors past and present, only two have broken the $150 million revenue barrier. Google and Microsoft would be quick to point out that their search systems are far larger, but these are special cases because it is difficult to unwrap search revenues from other revenue streams.
What does Mr. Sonderegger say in this Oracle Magazine interview? Let me highlight three points and urge you to read the full text of his remarks.
Easy Access
First, business users do not know how to write queries, so “guided navigation” services are needed. Mr. Sonderegger noted:
There has to be some easy way to explore, some way to search and navigate as easily as you do on an e-commerce site.
Most of the current vendors of analytics and findability systems seem to have made the leap from point-and-click to snazzy visualizations. The Endeca angle is that users want to discover and navigate. The companies referenced in the TextRadar story want to make the experience visual, almost video-game-like.