Is Duck Duck Go a No Go?
July 28, 2013
I’m sorry to say that I agree with Brian Mayer wholeheartedly when he explains, “I Used DuckDuckGo for a Week and Had to Switch Back. Here’s Why.” In his blog, Notes, the busy entrepreneur says he was prompted to give the Google alternative another try upon recent revelations about government snooping, since DuckDuckGo famously does not track users’ search terms. The exercise reinforced for the blogger just how much better Google is at delivering relevant results. He writes:
“Now, I love that DuckDuckGo doesn’t track searches. In terms of their commitment to privacy and their users, I don’t think there’s a better option. And I love that there’s an alternative for people concerned about their data being collected. But it took me only a week using DuckDuckGo to appreciate the little things that Google does that still make it a far superior product.”
Mayer lists some of those “little” things: Google is faster; it keeps up with current events (returning more timely results); it refuses to index sites containing code errors (!); and it knows which Wikipedia articles are worth pulling up. He concludes:
“I tried, and for the things that matter to me, it seems that Google is just a better experience. I hope DuckDuckGo improves the product, because eventually I would love to switch back. But philosophical alignment isn’t enough to get me to use an inferior product.”
I can corroborate Mayer’s account; I have had a similarly fraught relationship with this waterfowl. I still use it if I’m looking up something sensitive, like health or money stuff. For the most part, though, I am also waiting for the duck to improve. At least I know I’m not waiting alone.
Cynthia Murrell, July 28, 2013
Sponsored by ArnoldIT.com, developer of Augmentext
Big O Explained: Why Systems Are Alike?
July 27, 2013
In several of my recent lectures, I pointed out that most end users cannot differentiate among search systems. The comment made about these systems is often, “Why can’t these systems be like Google?” I concluded that the similarity of requests suggests that systems are essentially identical.
One reason is that training in university and the “use what works” approach in the real world produce search, content processing, and analytics systems that are pretty much indistinguishable. There are differences, but these can be appreciated only when a person takes the systems apart. Even then, differences are difficult to explain; for example, why a threshold value in System A is 15 percent lower than in System B. When dealing with sketchy data, the difference is usually irrelevant.
Another reason is that today’s systems are struggling to cope with operations that stretch the capabilities of even the most robust systems. Developers have to balance what the engineering plan wants to do with what can be done in a reasonable amount of time on an existing system.
Enter Big O.
You may want to take a look at “Big O Notation Explained by a Self-Taught Programmer.” I found the write up interesting and clear. The main point in my opinion is:
Consider this function:
    def all_combinations(the_list):
        results = []
        for item in the_list:
            for inner_item in the_list:
                results.append((item, inner_item))
        return results

This matches every item in the list with every other item in the list. If we gave it an array [1,2,3], we’d get back [(1,1) (1,2), (1,3), (2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (3, 3)]. This is part of the field of combinatorics (warning: scary math terms!), which is the mathematical field which studies combinations of things. This function (or algorithm, if you want to sound fancy) is considered O(n^2). This is because for every item in the list (aka n for the input size), we have to do n more operations. So n * n == n^2.
Below is a comparison of each of these graphs, for reference. You can see that an O(n^2) function will get slow very quickly where as something that operates in constant time will be much better.
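To make the quoted point concrete, here is a minimal sketch of my own; it is not from the write up. It reuses the quoted all_combinations function and adds two helpers, count_pairs and first_item, which are names I made up for illustration. The idea is simply to count how many pairs come back as the list grows and to contrast that with an operation that does the same amount of work for any input size.

```python
def all_combinations(the_list):
    """Pair every item with every item in the list (itself included): O(n^2)."""
    results = []
    for item in the_list:
        for inner_item in the_list:
            results.append((item, inner_item))
    return results


def first_item(the_list):
    """Constant-time contrast (hypothetical helper): one lookup for any list size."""
    return the_list[0]


def count_pairs(n):
    """Hypothetical helper: number of pairs produced for a list of n items."""
    return len(all_combinations(list(range(n))))


# Each tenfold increase in input size yields a hundredfold increase in work.
for n in (10, 100, 1000):
    print(n, count_pairs(n))
```

Growing the list by a factor of 10 grows the output by a factor of 100, while first_item does one lookup whether the list holds ten items or ten million. That gap is what the quoted passage means by "slow very quickly."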
Net net: Developers have to do what works. Search and related content processes are complex. In order to get the work done, search systems have embraced “what works.” Over time, we get undifferentiable systems.
Disagree? Use the comments section to explain.
Stephen E Arnold, July 27, 2013
Sponsored by Xenky
Amazon, Losses, and Search
July 26, 2013
I followed the flow of stories about Amazon’s jump in sales (up 20+ percent) and the loss of a pittance ($7 million). A year ago, I slogged through a report about Amazon’s technology for one of my clients. I think this outfit lost its funding and the senior managers are now taking some time off to recharge their batteries. I also completed my August/September column for Information Today. This is one of the for fee articles I write, so it is quite different from the information I catalog in Beyond Search. The articles are substantive; Beyond Search is my public collection of abstracts, ideas, and hypotheses. Many readers, including some challenged azure chip consultants, confuse the for fee articles with Beyond Search. Well, what can I do to help them? I am content with the difference between “free” and “for fee.” That’s what counts for me.
Where will the fracture occur? Amazon is an enterprise operating under stress with a range of “pressures” operating on the enterprise.
One story, “Jeff Bezos Doesn’t Care What You Think about Amazon’s Quarterly Earnings,” caught my attention on two levels. On the obvious financial stratum, the loss is merely an investment. The MBA idea is that if you spend wisely today, things will, if you are the right kind of executive, work out in the longer run. On the second stratum, Amazon is rolling down the side lanes in a bowling alley. I think these channels are called, in the parlance of the bowling superstars, the gutter. The notion is that once the ball gets in a gutter it goes straight ahead and misses the pins.
Amazon, like Google, is now in the Sam Walton sphere. In order to serve the largest possible audience, costs are the key issue. Not surprisingly, coincident with the Amazon financial reports, a lone Amazon person wrote “Brutal Letter to Jeff Bezos Says Way to Succeed at Amazon Is ‘Be a Pretty Girl or a Dude Who Uses Liberal Amounts of Axe.’” I don’t know if the write up is accurate (who knows what article is accurate these days?). Here’s the snippet I highlighted:
… There will always be an endless supply of replacements, and they will be paid less since the pay rate of the team decreased with every new batch of hires. My replacement will probably work really hard for about six months, and then realize that they are cruising towards a dead end. They might start caring a little less. For the full letter, click here.
My interest is search and content processing. In my Information Today column, which will be online in a couple of weeks, I point out that Amazon is in the for-fee search game. I also point out that Amazon, as far as I know, is the first search lazy Susan. The idea is that if you don’t like one search, you can choose another vendor who is offering its search / content processing system on the Amazon cloud.
The approach is interesting because the Amazon search system is immature. Check out the file types supported. Look at the pricing approach. Examine the features in comparison with a system like LucidWorks or some other enterprise class service. What will you discover? I cover that in my for fee column. A hint is that Amazon can learn a great deal watching behaviors. I find this approach quite intriguing.
Now if we look at these three points, I see a connection of sorts between losses/investments, cost cutting at the human knowledge layer, and the creation of a system which informs Amazon about search and content processing services. Amazon may be on a path to create what might become the WalMart of enterprise search. Google tried this approach in appliance form.
Will the resulting information retrieval services improve findability? Jury’s still out. But the pursuit of the mass market has some interesting vectors which may work at cross purposes.
Stephen E Arnold, July 26, 2013
UK University Relies on Funnelback
July 25, 2013
Search solution firm Funnelback has achieved a spot on the U.K. government’s list of cloudy vendors, we learn from the Sales Information at the HM Government G-Cloud site. Commitments to savings and transparency prompted the agency to publish this list of cloud-services vendors, which includes cost information. The introduction explains:
“As part of G-Cloud’s commitment to make central government savings by encouraging a shift to cloud computing commodity services, and our equal commitment to transparency, we publish details of all public sector spend through the G-Cloud frameworks. Details of the projected savings enabled by the Programme can be found here.
“All suppliers on the G-Cloud frameworks are obliged to provide monthly reports of invoiced sales to the Government Procurement Service (GPS). Once the data has been validated, we then publish updated figures on to this page on a monthly basis.”
We have taken interest in Funnelback before, and were happy to spot it in the list (its platform was used at the University of Surrey). The Australian enterprise-search provider grew from technology developed by scientific research agency CSIRO. Funnelback was established in 2005, and was bought by U.K. content management outfit Squiz in 2009. Their memorable moniker combines the names of two Australian spiders, the funnel-web and the redback.
Cynthia Murrell, July 25, 2013
Rumored Acquisition May Put Baidu on Defensive
July 24, 2013
Now this is an interesting development. Search Engine Watch’s Jennifer Slegg points to a rumor about China’s massive search engine market in, “Chinese Search Engine Qihoo to Buy Sogou for $1.4 Billion.” She writes:
“The Chinese search engine space just got a lot more interesting with Qihoo 360 reportedly purchasing Sohu’s Sogou search engine.
“If the report from DoNews (via The Next Web) is accurate, this deal will effectively combine the second and third largest search engines in China, which could have a significant impact on Baidu’s huge market share. . . .
“Qihoo 360 launched its own search engine in August of last year, and is second only to Baidu in terms of market share in China. Purchasing Sogou would mean the company would have nearly 25 percent of the search market share compared to Baidu’s eroding market share, which is now slightly under 70 percent.”
Is Baidu worried? Qihoo 360 launched its own engine just last year, and acquisition of the popular Sogou would mean a merger of China’s second- and third-largest search engines. Some expect the deal, rumored to be in the neighborhood of $1.4 billion, will soon be officially announced.
Cynthia Murrell, July 24, 2013
Kapow Reinforces It Is a Big Data Platform
July 21, 2013
Short honk: Data integration, like search, is expanding. We noted a news release called “Kapow Software Quarterly Revenue Rises as Newly Acquired Customer Bookings and Subscriptions Fuel Growth.” The news release explains that a privately held firm is growing. The important point for me was this phrase: “a leading Big Data solution provider.”
The news release explains:
The Kapow Enterprise Big Data Integration Platform enables companies to integrate any cloud or on-premise data source using Kapow Software’s patented, intelligent integration workflows and Synthetic APIs™. Once the critical data is found and surgically extracted, Kapow Enterprise 9.2 delivers timely information to the workforce in an easily consumable form called Kapow Kapplets™ through an enterprise app library offering called the Kapow KappZone™. KappZones can be easily branded and distributed for employees to discover and use on any computing device they choose.
The Kapow Web site points out that the company’s business includes:
- Content integration
- Content migration
- Legacy application integration
- Enterprise search.
The company also offers three products: Katalyst, plus the aforementioned Kapplets and KappZone. I find this semantic embrace fascinating and indicative of a trend in which vendors pretty much do anything related to information which is, it seems, Big Data.
Stephen E Arnold, July 21, 2013
Autonomy: A New Kind of Search?
July 20, 2013
Autonomy was founded in 1996. That was 17 years ago. In my upcoming KMWorld column for August/September, I point out that search, content processing, and even analytics have been consistent for many years. There are a number of reasons for the “sameness” of systems and the corresponding difficulty prospects have in differentiating one system from another.
Perhaps I am off base. Search systems, content processing systems, and analytics systems are very, very different. I am looking at outdated notions such as precision and recall. I am missing the point that search is about interface, “smart” software which knows what I want based on my past behavior, and mobile computing demands search apps which just present information. No information retrieval baloney required like a carefully crafted Boolean query.
I read with interest and my acknowledged lack of expertise “Analytics for Human Information: Enterprise Search in the age of Big Data.” In one article, I learned that HP Autonomy delivers analytics and search in a big data world. More interesting was this phrase “a new kind of search is here.” Okay, after 17 years, I am open to innovation even though I see more and more similarity.
The article asserts:
Here at HP Autonomy, we think the market is hungry for a more open and comprehensive approach to solving big data access problems. So we are excited to be launching a promotion program called Enterprise Search Rescue to help Microsoft FAST and Oracle Endeca customers migrate to Autonomy IDOL quickly and seamlessly. Everyone deserves a search technology that can solve tomorrow’s challenges.
My recollection is that when HP acquired Autonomy a number of Autonomy vendors offered demonstrations and programs to “rescue” Autonomy customers from HP. Oracle Endeca is cutting some of its prices and the founders have moved on to other interests. Microsoft Fast is a money machine for consultants, but rumors swirl that changes are coming.
What we have, then, is Autonomy reinventing itself to provide an alternative to Endeca (founded in 1999) and Fast Search & Transfer (founded in 1997).
Am I alone in finding it somewhat amusing to see these aging search systems trying to capture one another’s customers? Are there less proprietary solutions available; for example, perhaps an Autonomy licensee could implement LucidWorks and gain some advantages?
Net net. Yep, I think many organizations are hungry for findability solutions which work, do not cost millions, and can cope with today’s information tasks. I read a news release last week that pointed out no new search system has been patented by the USPTO in the last five years. You can find that story here.
When is “new” new?
Stephen E Arnold, July 20, 2013
RainStor Claims Hadoop Secure Even for Large Banks
July 20, 2013
The article titled RainStor Adds Enterprise-Grade Security, Search to Hadoop on ITWorld discusses the database specialist’s answer to the Big Data problem. What problem, you ask? When your clients number among the world’s largest banks, security and speedy search are of paramount importance. The article explains,
“When you put Hadoop into production, especially if you’re a telco or a large investment or retail bank, you suddenly have to think about the sensitivity and importance of the data,” says John Bantleman, CEO of RainStor. “If you lose a webclick, nobody cares. But if you allow unauthorized users access to high-value data … the requirements are just so much more rigorous. You need good authentication. You need to manage encryption keys and have an understanding around how the data is used.”
RainStor’s data compression technology reduces the storage footprint by up to 97%, and the company believes its enterprise-grade security and search for Hadoop will solve past problems. Data encryption, data masking, audit trail and tamper proofing are all new security features. The search aspect was also a priority (another search Hadoop play). RainStor claims that its search capability performs at speeds 10 to 100 times faster than standard SQL by quickly dividing data into subsets, which analysts can further explore.
Chelsea Kerwin, July 20, 2013
Finding an Optical Character Recognition Program
July 19, 2013
I ran the following query for a client project yesterday: “OCR programs.”
I passed the query to Google, Yandex, and Bing in that order. What did I find?
Google returned 11 ads and 10 hits, plus one set of news items and one set of related search suggestions. Several of the links pointed me to downloads which were too confusing to try. The other links pointed to information ranging from Google Groups to commercial companies’ products.
Here’s what Yandex delivered to me:
No ads and mostly general information, including a hit to TextBridge which is no longer current.
And Bing?
There were five ads, related searches in two places, and links to mostly “free” programs and general information sites.
The reason this is an important series of examples is that I have been reading some of the articles about Google’s somewhat disappointing earnings results. The numbers are huge, but when most search and content processing companies are struggling for growth, Google is the Sir Lancelot of search vendors. If Google can’t grow quickly, what does that say about Google’s business strategy, about other search and content processing companies, and the US economy? My takeaway is not much different from that expressed in USA Today. Yes, USA Today, what one of my goslings calls “McPaper.”
The story is “Google Earnings Clipped in Mobile Headwinds.” The main point is, in my opinion:
Concerns continue about so-called cost-per-click prices that advertisers pay Google for Internet-search advertising.
And then:
Google’s average cost-per-click, which includes clicks related to ads served on Google sites and the sites of its network members, decreased about 6% in the quarter compared with a year ago. Analysts had predicted prices would drop about 3% in the period.
JackBe Embraces SharePoint with Presto Release
July 19, 2013
An article on Business Wire titled JackBe Presto Makes SharePoint Real-Time for the Enterprise reports on the software provider, JackBe. JackBe provides intuitive dashboards that organize Big Data. Presto Add-On for SharePoint, the most recent version of their software, allows users to build apps and dashboards with a familiar interface. The article explains,
“Presto Add-On for SharePoint enables users to query Presto-connected data within SharePoint, using SharePoint Search. In addition, the solution’s new “FAST Enterprise Search” Wires block provides a simple drag-and-drop search experience using FAST, SharePoint’s popular enterprise search capability. Powered by Wires, Presto’s “point-click-mash” visual assembly tool, Presto Add-On for SharePoint enables mashing of multiple FAST search results with support for keyword and FAST Query Language (FQL) queries. This allows users to easily combine data from multiple sources, lists and queries into single, meaningful data visualizations.”
Not only the FAST Search block but also several other new Wires blocks are included in the recent upgrade. SharePoint List Add Item, SharePoint List, SharePoint List Merge, SharePoint Search and External Content Adapter are all Wires blocks that will enable reading and replying to data sources and solving List ID issues. We can’t help but notice that just as other vendors are exiting SharePoint, JackBe jumps in full throttle.
Chelsea Kerwin, July 19, 2013