Replacing dtSearch is Easier than it Sounds

August 5, 2013

dtSearch is an interesting topic. Certainly once considered a high-water mark for text retrieval systems, it has mostly fallen off the cultural radar. However, that hasn’t stopped one industrious company from… replacing it? We learned more from a recent Flax article, “An Open Source Replacement for the dtSearch Closed Source Search Engine.”

According to the story:

…we developed a new Lucene Analyzer that speaks the same syntax as dtSearch, allowing us to index text input. On the search side we have a Lucene QueryParser that shares this syntax. To make it easier to use we’ve wrapped the whole lot in a modified Solr server. As we needed some features of very recent Lucene code, our modifications are based on a patch to Lucene trunk.

Our best response here is, well, whoopee. Saying you’ve replaced dtSearch is like Chevy claiming it has replaced the horse and buggy with its 2014 model. Frankly, we weren’t aware of too many people still using that software. For goodness’ sake, a Google search brought up only a single news piece. Chances are most people moved on a long time ago, so we will be stunned to hear about anyone jumping for joy because of this open source option.

Patrick Roland, August 05, 2013

Sponsored by ArnoldIT.com, developer of Beyond Search

Autonomy ArcSight Tackles Security

August 2, 2013

HP Autonomy is chasing the Oracle SES angle: security for search. We took a look at the company’s pages about HAVEn, Autonomy’s latest big data platform. Regarding the security feature, ArcSight Logger, the company promises:

“With HP ArcSight Logger you can improve everything from compliance and risk management to security intelligence to IT operations to efforts that prevent insider and advanced persistent threats. This universal log management solution collects machine data from any log-generating source and unifies the data for searching, indexing, reporting, analysis, and retention. And in the age of BYOD and mobility, it enables you to comprehensively manage an increasing volume of log data from an increasing number of sources.”

More information on HAVEn can be found in the YouTube video, “Brian Weiss Talks HAVEn: Inside Track with HP Autonomy.” At the 1:34 mark, Autonomy VP Weiss briefly describes how ArcSight analyzes the data itself, from not only inside but also outside an enterprise, for security clues. For example, a threatening post in social media might indicate a potential cyber-attack. It is an interesting approach. Can HP make this a high revenue angle?

Cynthia Murrell, August 02, 2013

Sponsored by ArnoldIT.com, developer of Augmentext

Enterprise Partnership Announced

August 2, 2013

The shift to unified information access is occurring throughout the enterprise search market. To make that shift more seamless and effective, Attivio has partnered with Capax Global. Read all about the partnership in the article, “Capax Global and Attivio Announce Strategic Reseller Partnership.”

The article begins:

“Attivio, creator of the award-winning Active Intelligence Engine (AIE), has formed a strategic reseller partnership with Capax Global, a recognised leader in enterprise search and critical business technology consulting. The partnership addresses the changing needs of Capax Global’s customers as they deal with the widespread shift from traditional enterprise search to unified information access (UIA).”

Unified information access addresses both Big Data and unstructured data. Users are looking for a way to intuitively interact with their data in a way that produces meaning but does not disregard the user experience. LucidWorks, and other value-added open source enterprise providers, seek these same objectives through the use of open source infrastructure. LucidWorks relies on the power of Apache Lucene Solr to keep its customers satisfied at a low cost of both time and money.

Emily Rae Aldridge, August 2, 2013

Sponsored by ArnoldIT.com, developer of Beyond Search

Has Oracle Swallowed Endeca?

August 1, 2013

Oracle is making headlines in the likes of ZDNet and other sources, with some good news and some bad news. The ZDNet article, “25 Striking Things Oracle Said On Its Q4 Earnings Call,” covers both. Up first is the bad news: quarterly earnings fell short of expectations.

However, there was some good news. In just the fourth quarter, Oracle added 500 new SaaS customers — noting that they are bigger and growing faster than Workday.

President Mark Hurd offered the following reassurance that Oracle will not be slowing down:

“‘Next week we will be announcing technology partnerships with the most largest and important SaaS companies and infrastructure companies in the cloud,’ Hurd said. ‘And they will be using and committing to our technology for years to come…Hurd’s ‘startling series of announcements’ will ‘reshape the cloud’ and perception of it. He mentioned Salesforce, NetSuite and Microsoft.”

Anyone remember when Oracle acquired Endeca back in 2011? It appears that Endeca might be getting marginalized.

Megan Feil, August 01, 2013

Sponsored by ArnoldIT.com, developer of Beyond Search

Nanotechnology and Search are Ignored Part of Computing Future

July 30, 2013

The future of computing is here! That’s because it looks like the future of computing is the same as its past twenty years. Everywhere we see people talking about innovation, they seem to be missing some key instruments that will likely shape our next computing decade. Such was the case with a recent Fred Wu article, “The Future of Computer Programmers – An Interview with Yukihiro ‘Matz’ Matsumoto.”

According to Matsumoto:

“I believe in the foreseeable future the computing industry is still going to advance based on Moore’s law. Although, it is possible that in the next year or two quantum computers become a practical reality, in that case it will change everything! *chuckles* On a serious note, according to Moore’s law, the cost of computing will decrease and the performance and capacity of computing will increase – this basic principle is unlikely to change.”

Sorry, but cheaper computers aren’t a revelation. Nobody ever seems to focus on how nanotechnology and search will undoubtedly reshuffle the deck. Probably because A) it’s hard to determine just how radical a shift we will see; and B) both teeter on the privacy issues that have been so thorny. We can only hope journalists stop burying their heads in the sand about the real future.

Patrick Roland, July 30, 2013

Sponsored by ArnoldIT.com, developer of Beyond Search

Search Factoid from Research Moz

July 29, 2013

I saw “Global Enterprise Search Market to 2016: Latest Industry Analysis, Strategies, Survey, Size, Share, Growth Trends, and Forecast Research Report Available at Research Moz.” The news release explains that Research Moz has completed a study of the enterprise search market, making an effort to cover every possible angle. The report, unlike other analyses, purports to cover the Middle East and what I used to think of as the Pacific Rim.

I navigated to Research Moz and learned that the report is 58 pages in length. The most fascinating item in the news release, in my view, was:

Global Enterprise Search market to grow at a CAGR of 12.98 percent over the period 2012-2016.

If the robust growth rate is accurate, the search and content processing firms working hard to cover their payroll can look forward to a brighter future. The information available to me suggests that search is fracturing, making growth estimates difficult. The fastest growing sectors like military intelligence are less than forthcoming about the size of the contracts awarded by various nation states. In addition, the sharp uptake of open source search solutions continues to have an impact on some commercial vendors. Companies which sell services to support information retrieval are, in my view, consulting and engineering firms, not vendors of search solutions.
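To put the quoted 12.98 percent figure in perspective, here is a rough sketch of what that compound annual growth rate implies over the forecast window. It assumes that 2012 to 2016 means four annual compounding periods; the news release does not spell this out, and the function name `compound_growth` is purely illustrative.

```python
def compound_growth(cagr_percent: float, years: int) -> float:
    """Return the total growth multiple after compounding for `years` periods."""
    return (1 + cagr_percent / 100) ** years


# Assumption: 2012 to 2016 is treated as four annual compounding periods.
multiple = compound_growth(12.98, 4)

print(f"Growth multiple: {multiple:.3f}")                  # market size relative to 2012
print(f"Cumulative growth: {(multiple - 1) * 100:.1f}%")   # total growth over the window
```

Under that assumption, the market would end the period at roughly 1.6 times its 2012 size. A different reading of the window (for example, five periods) would change the result.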

Research Moz also offers reports on other global markets; for example, pet food.

More information is available at http://www.researchmoz.us/global-enterprise-search-market-2012-2016-report.html. Pricing information was not available.

Stephen E Arnold, July 29, 2013

Sponsored by Xenky

Marketers! OdinText Can Help You

July 28, 2013

I saw a flurry of links to a news release titled “New Patented Text Analytics Analytics Approach [sic]” about a text analytics package. The company receiving the patent is OdinText / Anderson Analytics. The company asserts that it provides a text analytics system for market research professionals. I was intrigued by an “analytics analytics approach.”

The news story describes US 8,475,498, “Natural Language Text Analytics.” The abstract states:

A method of text analytics includes filtering a plurality of unfiltered records having unstructured data into at least a first group and a second group. The first group and said second group each include at least two records and the first group is different than the second group. The method includes determining a first proportion of occurrence for a term by comparing a first number of records having at least one occurrence of the term in the first group to a first total number of records in the first group, determining a second proportion of occurrence for the term by comparing a second number of records having at least one occurrence of the term in said second group to a second total number of records in the second group, and comparing the first proportion of occurrence to the second proportion of occurrence to yield a resultant comparison occurrence.
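The core of the abstract can be sketched in a few lines. This is only an illustration of the arithmetic described, not the patented method or OdinText's implementation: the sample records, the regular-expression tokenizer, and the use of a simple difference as the "resultant comparison" are all assumptions, since the abstract does not specify them.

```python
import re


def proportion_of_occurrence(records, term):
    """Fraction of records that contain the term at least once."""
    term = term.lower()
    hits = sum(
        1 for text in records
        if term in re.findall(r"[a-z0-9']+", text.lower())  # assumed tokenizer
    )
    return hits / len(records)


def compare_term(group_a, group_b, term):
    """Return both proportions and their difference (assumed comparison)."""
    p_a = proportion_of_occurrence(group_a, term)
    p_b = proportion_of_occurrence(group_b, term)
    return p_a, p_b, p_a - p_b


# Hypothetical records, already filtered into two different groups
# of at least two records each, as the abstract requires.
satisfied = ["great price and fast shipping", "price was too high", "love the color"]
unsatisfied = ["slow shipping", "shipping box arrived damaged", "fair price", "shipping took weeks"]

p_a, p_b, diff = compare_term(satisfied, unsatisfied, "shipping")
print(f"satisfied: {p_a:.2f}, unsatisfied: {p_b:.2f}, difference: {diff:+.2f}")
```

In this made-up data, "shipping" appears in one of three records in the first group and three of four in the second, so the term is more characteristic of the second group.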

Anderson Analytics’ Web site says:

We Focus on Getting Accurate and Relevant Data. Quality research starts with quality data, and the best answers come from well thought out questions. Whether we are working with internal business data or gathering primary research, we make sure that projects are of correct and sufficient scope to accurately address the business need.

I scanned the document and thought about Ramanathan Guha’s programmable search engine and context server invention; for example, US 8,316,040 and its related inventions from 2007 forward. The Guha system and method are quite different from the Odin/Anderson system and method.

If you are an NLP-savvy marketer, you may want to take a closer look at OdinText. The system “overcomes, alleviates, and/or mitigates one or more of the aforementioned [references a list of known NLP search problems] and other deleterious effects of prior art.”

Google and Dr. Guha, you may have some work to do.

Stephen E Arnold, July 28, 2013

Sponsored by Xenky

Is Duck Duck Go a No Go?

July 28, 2013

I’m sorry to say that I agree with Brian Mayer wholeheartedly when he explains, “I Used DuckDuckGo for a Week and Had to Switch Back. Here’s Why.” In his blog, Notes, the busy entrepreneur says he was prompted to give the Google alternative another try upon recent revelations about government snooping, since DuckDuckGo famously does not track users’ search terms. The exercise reinforced for the blogger just how much better Google is at delivering relevant results. He writes:

“Now, I love that DuckDuckGo doesn’t track searches. In terms of their commitment to privacy and their users, I don’t think there’s a better option. And I love that there’s an alternative for people concerned about their data being collected. But it took me only a week using DuckDuckGo to appreciate the little things that Google does that still make it a far superior product.”

Mayer lists some of those “little” things: Google is faster; it keeps up with current events (returning more timely results); it refuses to index sites containing code errors (!); and it knows which Wikipedia articles are worth pulling up. He concludes:

“I tried, and for the things that matter to me, it seems that Google is just a better experience. I hope DuckDuckGo improves the product, because eventually I would love to switch back. But philosophical alignment isn’t enough to get me to use an inferior product.”

I can corroborate Mayer’s account; I have had a similarly fraught relationship with this water fowl. I still use it if I’m looking up something sensitive, like health or money stuff. For the most part, though, I am also waiting for the duck to improve. At least I know I’m not waiting alone.

Cynthia Murrell, July 28, 2013

Sponsored by ArnoldIT.com, developer of Augmentext

Big O Explained: Why Systems Are Alike?

July 27, 2013

In several of my recent lectures, I pointed out that most end users cannot differentiate among search systems. The comment made about these systems is often, “Why can’t these systems be like Google?” I concluded that the similarity of requests suggests that systems are essentially identical.

One reason is that training in university and the “use what works” approach in the real world produce search, content processing, and analytics systems that are pretty much indistinguishable. There are differences, but these can be appreciated only when a person takes the systems apart. Even then, differences are difficult to explain; for example, why a threshold value in System A is 15 percent lower than in System B. When dealing with sketchy data, the difference is usually irrelevant.

Another reason is that today’s systems are struggling to cope with operations that stretch the capabilities of even the most robust systems. Developers have to balance what the engineering plan wants to do with what can be done in a reasonable amount of time on an existing system.

Enter Big O.

You may want to take a look at “Big O Notation Explained by a Self-Taught Programmer.” I found the write up interesting and clear. The main point in my opinion is:

Consider this function:

    def all_combinations(the_list):
        results = []
        for item in the_list:
            for inner_item in the_list:
                results.append((item, inner_item))
        return results

This matches every item in the list with every other item in the list. If we gave it an array [1,2,3], we’d get back [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (3, 3)]. This is part of the field of combinatorics (warning: scary math terms!), which is the mathematical field which studies combinations of things. This function (or algorithm, if you want to sound fancy) is considered O(n^2). This is because for every item in the list (aka n for the input size), we have to do n more operations. So n * n == n^2.

Below is a comparison of each of these graphs, for reference. You can see that an O(n^2) function will get slow very quickly whereas something that operates in constant time will be much better.
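The growth the quoted passage describes is easy to see by counting results rather than timing anything. The sketch below reuses the quoted `all_combinations` function and adds a hypothetical constant-time helper, `first_item`, which is not in the original write up and is included only for contrast.

```python
def all_combinations(the_list):
    """Pair every item with every item, including itself: O(n^2) work."""
    results = []
    for item in the_list:
        for inner_item in the_list:
            results.append((item, inner_item))
    return results


def first_item(the_list):
    """Hypothetical constant-time contrast: one step regardless of list size."""
    return the_list[0]


# Multiplying the input size by ten multiplies the quadratic output by one hundred.
for n in (10, 100, 1000):
    pairs = all_combinations(list(range(n)))
    print(f"n = {n:>4}: {len(pairs):>7} pairs, first_item still takes one step")
```

For search systems, this is the practical point: an operation that compares every document against every other document becomes unworkable long before a constant-time lookup even notices the corpus has grown.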

Net net: Developers have to do what works. Search and related content processes are complex. In order to get the work done, search systems have embraced “what works.” Over time, we get undifferentiable systems.

Disagree? Use the comments section to explain.

Stephen E Arnold, July 27, 2013

Sponsored by Xenky

Amazon, Losses, and Search

July 26, 2013

I followed the flow of stories about Amazon’s jump in sales (up 20+ percent) and the loss of a pittance ($7 million). A year ago, I slogged through a report about Amazon’s technology for one of my clients. I think this outfit lost its funding and the senior managers are now taking some time off to recharge their batteries. I also completed my August/September column for Information Today. This is one of the for fee articles I write, so it is quite different from the information I catalog in Beyond Search. The articles are substantive; Beyond Search is my public collection of abstracts, ideas, and hypotheses. Many readers, including some challenged azure chip consultants, confuse the for fee articles with Beyond Search. Well, what can I do to help them? I am content with the difference between “free” and “for fee.” That’s what counts for me.

Where will the fracture occur? Amazon is an enterprise operating under stress with a range of “pressures” operating on the enterprise.

One story, “Jeff Bezos Doesn’t Care What You Think about Amazon’s Quarterly Earnings,” caught my attention on two levels. On the obvious financial stratum, the loss is merely an investment. The MBA idea is that if you spend wisely today, things will, if you are the right kind of executive, work out in the longer run. On the second stratum, Amazon is rolling down the side lanes in a bowling alley. I think these channels are called, in the parlance of the bowling superstars, the gutter. The notion is that once the ball gets in a gutter it goes straight ahead and misses the pins.

Amazon, like Google, is now in the Sam Walton sphere. In order to serve the largest possible audience, costs are the key issue. Not surprisingly, coincident with the Amazon financial reports, a lone Amazon person wrote “Brutal Letter to Jeff Bezos Says Way to Succeed at Amazon Is ‘Be a Pretty Girl or a Dude Who Uses Liberal Amounts of Axe.’” I don’t know if the write up is accurate (who knows what article is accurate these days?). Here’s the snippet I highlighted:

… There will always be an endless supply of replacements, and they will be paid less since the pay rate of the team decreased with every new batch of hires. My replacement will probably work really hard for about six months, and then realize that they are cruising towards a dead end. They might start caring a little less.

My interest is search and content processing. In my Information Today column, which will be online in a couple of weeks, I point out that Amazon is in the for-fee search game. I also point out that Amazon, as far as I know, is the first search lazy Susan. The idea is that if you don’t like one search, you can choose another vendor who is offering its search / content processing system on the Amazon cloud.

The approach is interesting because the Amazon search system is immature. Check out the file types supported. Look at the pricing approach. Examine the features in comparison with a system like LucidWorks or some other enterprise class service. What will you discover? I cover that in my for fee column. A hint is that Amazon can learn a great deal watching behaviors. I find this approach quite intriguing.

Now if we look at these three points, I see a connection of sorts between losses/investments, cost cutting at the human knowledge layer, and the creation of a system which informs Amazon about search and content processing services. Amazon may be on a path to create what might become the WalMart of enterprise search. Google tried this approach in appliance form.

Will the resulting information retrieval services improve findability? Jury’s still out. But the pursuit of the mass market has some interesting vectors which may work at cross purposes.

Stephen E Arnold, July 26, 2013

Sponsored by Xenky
