Google Puts Some Effort into the Google Search Appliance

February 12, 2014

Last I knew, the Google Search Appliance (GSA) had trimmed its product line, eliminated the impulse buy option for the Mini, and kept the price at the higher end of the appliance market.

I learned over the last two years that Google has placed more than 60,000 GSAs in organizations. I have no idea if the number is valid, but if it is, the GSA is one of the top dogs in enterprise search. I also heard that there was a small team working on the GSA and an even smaller team handling customer support. Google pushes functions to resellers who deal with the customers. Google outsources manufacturing of the GSA. Most important, Google seems to have an off-again, on-again interest in on-premises search. The future, as I understand it, is the cloud. The GSA is, in my opinion, an anachronism in the Nest, X Labs, and Android-Chrome world. But, hey, I have been wrong before. I once asserted that basic search should not be a challenge for most organizations. Wow, did I get that wrong! Jail time, lawsuits, and DARPA’s almost admission that search is not working notwithstanding.

The GSA has been around almost a decade. Version 7.2 is “a leader in the Gartner Enterprise Search MQ.” I certainly don’t doubt the word of an estimable azure chip consulting firm. No, no, no.

The new version, according to Google, delivers:

  • Metadata sorting. A function available in the 1983 version of Fulcrum Technologies’ system
  • Language translation. A function available from Delphes in the 1990s
  • A document preview function. iPhrase in 1999 delivered this feature
  • Entity recognition. Verity implemented this function in the 1980s
  • Dynamic navigation. Endeca rolled out this feature in 1998

In my opinion, the GSA is catching up to innovations available for many years from other vendors. Comparing the EPI Thunderstone and Maxxcat appliances to the GSA emphasizes that the GSA is not quite at parity with other products in the channel.

According to “Google Updates Enterprise Search Appliance Tool,”

The GSA 7.2 update comes more than a year after the firm upgraded the GSA to version 7.0, and builds on the features included in that update. The most notable includes the ability to improve the way data can be indexed with key attributes, such as author name, or the date it was created.

How much does a GSA cost? According to the US government’s GSAadvantage.gov, a 36 month license for a GB 7007 is $69,296 for 500,000 documents. Have more documents? Pay for an upgrade. However, I can use a hosted service like Blossom Software to index my content for about $2,400 per month. I can use the low cost dtSearch solution for $160 per seat. I can download an open source solution and do it myself.

For an organization with 20 million documents to index, the cost of the GSA solution noses into HP Autonomy territory. Too rich for my blood, and I think that lower cost appliance vendors will see the Google Search Appliance as a lead generator.
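The arithmetic behind that comparison can be sketched roughly as follows. This is a back-of-the-envelope calculation, not a quote: the dollar figures come from the post, but the linear scaling of the GSA license to 20 million documents is purely an assumption (real pricing tiers may differ), and the post does not say how many documents the hosted figure covers, so the two totals are not strictly comparable.

```python
# Figures quoted in the post
GSA_LICENSE_USD = 69_296       # 36-month license for a GB 7007
GSA_DOCS = 500_000             # documents covered by that license
LICENSE_MONTHS = 36
HOSTED_USD_PER_MONTH = 2_400   # "about $2,400 per month" for a hosted service
TARGET_DOCS = 20_000_000       # the 20 million document scenario


def gsa_cost_linear(target_docs):
    """ASSUMPTION: license cost scales linearly with document count."""
    return GSA_LICENSE_USD * target_docs / GSA_DOCS


gsa_per_doc = GSA_LICENSE_USD / GSA_DOCS                 # dollars per document
hosted_36_months = HOSTED_USD_PER_MONTH * LICENSE_MONTHS  # same 36-month window
gsa_20m = gsa_cost_linear(TARGET_DOCS)

print(f"GSA cost per document:       ${gsa_per_doc:.4f}")
print(f"Hosted service, 36 months:   ${hosted_36_months:,}")
print(f"GSA at 20M docs (linear):    ${gsa_20m:,.0f}")
```

Under the linear assumption, the 20 million document case lands in the millions of dollars over three years, which is the sense in which the price "noses into" the territory of the high-end vendors.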

I wonder if those azure chip consultants have licensed the GSA to handle their Intranet information retrieval tasks?

Stephen E Arnold, February 12, 2014

Advice on Making the Most of Limited Data

February 12, 2014

The article How To Do Predictive Analytics with Limited Data from Datameer on SlideShare suggests that Limited Data may replace Big Data in importance. The idea of “semi-supervised learning” is presented as a way to handle the difficulties of making predictions from limited data, such as expense, manageability, and simply missing key data. The overview states,

“As it turns out, recent research on machine learning techniques has found a way to deal effectively with such situations with a technique called semi-supervised learning. These techniques are often able to leverage the vast amount of related, but unlabeled data to generate accurate models. In this talk, we will give an overview of the most common techniques including co-training regularization. We first explain the principles and underlying assumptions of semi-supervised learning and then show how to implement such methods with Hadoop.”

The presentation summarizes possible approaches to semi-supervised learning and the assumptions it is possible to make about unlabeled data (these include the clustering, low density, and manifold assumptions). It also covers the concepts of Label Propagation and Nearest Neighbor Join. However, as inviting as it is to forget Big Data and switch to predictive analytics with Limited Data, the suggestion may sound too much like Bayes-Laplace.
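For readers who have not met label propagation, a minimal single-machine sketch may help. This is not Datameer's Hadoop implementation; the function name, the one-dimensional toy data, and the RBF affinity with `sigma=1.0` are all illustrative assumptions. The idea is that a handful of labeled points pass their labels to nearby unlabeled points, which relies on the clustering assumption mentioned above.

```python
import math


def label_propagation(points, labels, sigma=1.0, iterations=100):
    """Toy label propagation (hypothetical helper, not from the talk).

    points: list of floats; labels: class id per point, or None if unlabeled.
    Returns a predicted class id for every point.
    """
    classes = sorted({lab for lab in labels if lab is not None})
    n = len(points)

    # RBF affinity between every pair of distinct points
    weights = [
        [
            math.exp(-((points[i] - points[j]) ** 2) / (2 * sigma ** 2)) if i != j else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]
    # Row-normalise affinities into transition probabilities
    trans = [[w / sum(row) for w in row] for row in weights]

    def clamped(i):
        # One-hot distribution for a labeled point
        return [1.0 if labels[i] == c else 0.0 for c in classes]

    uniform = [1.0 / len(classes)] * len(classes)
    dist = [clamped(i) if labels[i] is not None else list(uniform) for i in range(n)]

    for _ in range(iterations):
        updated = []
        for i in range(n):
            if labels[i] is not None:
                updated.append(clamped(i))  # labeled points never change
            else:
                # Unlabeled points take the weighted average of their neighbours
                updated.append([
                    sum(trans[i][j] * dist[j][k] for j in range(n))
                    for k in range(len(classes))
                ])
        dist = updated

    return [classes[max(range(len(classes)), key=lambda k: d[k])] for d in dist]


# Two well separated groups, one labeled example in each
points = [0.0, 0.2, 0.4, 0.6, 5.0, 5.2, 5.4, 5.6]
labels = [0, None, None, None, 1, None, None, None]
predicted = label_propagation(points, labels)
print(predicted)
```

With only two labeled examples, every unlabeled point ends up with the label of the group it sits in. On real data the result depends heavily on the affinity function and on whether the clustering assumption actually holds.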

Chelsea Kerwin, February 12, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

Major Topics for the World Economic Forum Predicted by Digimind

February 12, 2014

The article titled What to Expect From the World Economic Forum 2014 on Digimind’s Blog predicts the major themes of the WEF in advance of its start. The WEF is an independent organization committed to stimulating communication and engagement among political, business and academic leaders worldwide. The content processing company, Digimind, explains what can be expected based on the trending buzzwords in Digimind Social’s word cloud. The article states,

“As the business, political, academic and other leaders of society descend on the Swiss Alpine town today, we can exclusively reveal that the Global Risks talks, focusing on the global economy, are being buzzed about the most online. With words like ‘Rich,’ ‘Poor,’ and ‘Inequality’ appearing frequently… it is not hard to deduce that the global wealth divide and income inequality will feature heavily, especially given Oxfam’s recent report…claiming that the 85 richest people on the planet own nearly half of global wealth.”

Other key concepts include Madrid’s mayor Ana Botella, a divisive politician whom many Spanish Twitter users have lambasted for incompetence. Besides income inequality, the major issue at hand, Iran also figures on the list, along with Syria and Oxfam. As these topics are addressed, Digimind promises to update its audience. This bold move by Digimind invites speculation about the accuracy of its predictions.

Chelsea Kerwin, February 12, 2014

Digital Reasoning and Paragon Science Promote Natural Language Processing and Graph Analysis

February 12, 2014

The presentation on SlideShare titled Got Chaos? Extracting Business Intelligence from Email with Natural Language Processing and Dynamic Graph Analysis discusses the work by Digital Reasoning and Paragon Science. Digital Reasoning asserts that it is an Oracle for human language data. There are color-coded sentences that illustrate the abilities of Natural Language Processing, from recognizing people and location words to entities related to a single concept and associated entities. The presentation consists of many equations, but the overview explains,

“In this presentation, O’Reilly author and Digital Reasoning CTO Matthew Russell along with Dr. Steve Kramer, founder and chief scientist at Paragon Science, discuss how Digital Reasoning processed the Enron corpus with its advanced Natural Language Processing (NLP) technology – effectively transforming it into building blocks that are viable for data science. Then, Paragon Science used dynamic graph analysis inspired from particle physics to tease out insights from the data.”

Ultimately, the point of the entire process was to gain a better understanding of how the Enron catastrophe could be avoided in other enterprises. It is difficult to say whether Digital Reasoning is imitating IBM Watson or if IBM Watson is imitating Digital Reasoning. At any rate, it sounds familiar. Didn’t Autonomy, TeraText, and other firms push into this sector decades ago?

Chelsea Kerwin, February 12, 2014

Las Vegas Hosts SharePoint Conference

February 12, 2014

Although the nation is currently locked in what seems to be an endless snowstorm, believe it or not, spring is coming. And with spring comes the conference season. One highly anticipated annual conference is SPC14, which will be held this year in Sin City. Read more in the PR Web release, “Intlock Hits Sin City for the Biggest SharePoint Event of the Year: SPC14.”

The article begins:

“With Q1 of 2014 coming to its peak, Intlock will be joining the SharePoint buzz and crowds at SPC14 in Las Vegas from March 3-6, 2013, the largest and most comprehensive of SharePoint conferences held worldwide.”

Intlock is just one of many third party vendors that will be there to showcase the functionality that can be achieved when smart third party solutions are added to a solid SharePoint infrastructure. Stephen E. Arnold, a longtime leader in search and SharePoint, will be keeping an eye on the conference and its major announcements. His information service, ArnoldIT.com, makes SharePoint a common topic for coverage, so SharePoint users should check in frequently for the latest in SharePoint news and tips.

Emily Rae Aldridge, February 12, 2014

DARPA Hints That Search Fails

February 11, 2014

One of my two or three readers sent me a link to “DARPA-BAA-14-21: Memex.” The item is interesting because it reaches back to the idea of Vannevar Bush, sidesteps the use of the word “Memex” by a search vendor once operating in the United Kingdom, and provides pretty clear proof that DARPA is not happy with search. You can dig into the details at https://www.fbo.gov/utils/view?id=32c351ba7850360e140a29f363819052.

US government content has some interesting characteristics. One of the most interesting is that items like DARPA-BAA-14-21 appear without context. For example, there is not a hint, nary a whisper of In-Q-Tel’s investments in search and content processing. Years ago, I heard at an intel conference that In-Q-Tel funds promising companies but few of these deliver operational payoffs. You can see a list of In-Q-Tel investments at https://www.iqt.org/portfolio/. Some of these companies deliver darned interesting demonstration systems. Others have offered solutions that were eventually abandoned. Others are like Fourth of July fireworks; that is, the financial support and walk arounds provide the type of show that some decision makers perceive as progress and purposeful action.

The net net is that this DARPA item underscores that today’s information retrieval systems are not appropriate for the future needs of DARPA. For me, this is one indication that my assertion about the troubled state of information retrieval is accurate.

Perhaps the funding, the TREC tests, and the DARPA solicitation will yield a payoff for operational personnel. “Perhaps” is a bit soft even if the devalued dollars are real. Our research offers some interesting facts suggesting that finding information today is more difficult than it was five years ago.

Stephen E Arnold, February 11, 2014

Attivio and Quant5 Partner to Meet Challenges of Data Analytics

February 11, 2014

The article on PRNewswire titled Attivio and Quant5 Partner to Bring Fast and Reliable Predictive Customer Analytics to the Cloud explains the partnership between the two analytics innovators. Aimed at producing information from data without the hassle of a team of data scientists, the partnership promises to effectively create insights that companies will be able to act on. The partnership responds to the growing frustration some companies face with gleaning useful information from huge amounts of data. The article explains,

“Attivio built its business around the core principle that integrating big data and big content should not require expensive mainframe legacy systems, handcuffing service agreements, years of integration and expensive data scientists. Attivio enterprise customers experience business-changing efficiency, sales and competitive results within 90 days. Similarly, Quant5 arose from the understanding that businesses need simple, elegant solutions to address difficult and complex marketing challenges. Quant5 customers experience increased revenues, reduced customer churn and an affordable and fast path to predictive analytics.”

The possibility of indirect sales following in the footsteps of Autonomy and Endeca does seem to be a part of the 2014 tactics. The Attivio and Quant5 solutions are offered in five major areas of concern: Lead & Opportunity Scoring, Customer Segmentation, Targeted Offers, Product Usage and Product Relationships.

Chelsea Kerwin, February 11, 2014

Free Statistics Text from Computer Science TA

February 11, 2014

The Probability and Statistics Cookbook from Matthias Vallentin is a free statistics text. The creator, Vallentin, is a doctoral student at UC Berkeley who works with Vern Paxson in his studies of computer science. While there, Vallentin has worked as a teaching assistant in an undergraduate computer security course. Vallentin also works for the International Computer Science Institute. His research in network intrusion and network forensics began in his undergraduate career in Germany. The “cookbook” is explained in the article,

“The cookbook aims to be language agnostic and factors out its textual elements into a separate dictionary. It is thus possible to translate the entire cookbook without needing to change the core LaTeX source and simply providing a new dictionary file. Please see the github repository for details. The current translation setup is heavily geared to Roman languages, as this was the easiest way to begin with. Feel free to make the necessary changes to the LaTeX source to relax this constraint.”

The overview provides screenshots that make it clear the cookbook is more interested in the mathematical crux than in elaborate clarifications. The author is open to pull requests in order to lengthen the cookbook, but in the meantime the LaTeX source code can be found on GitHub.

Chelsea Kerwin, February 11, 2014

About Those Recommended Story Links at News Sites

February 11, 2014

Ever wonder about the origin of the often incongruous titles listed at the bottom of real news stories online? The Washington Post declares, “You’ll Never Believe How Recommended Stories Are Generated on Otherwise Serious News Sites.” Not surprisingly, these “recommended” stories are outright click bait. The ones that appear are based on a combination of the user’s online activity and standards the news sites chose to place (or not) on the content. This means serious stories are often followed by silly and salacious links, which are made to look like real articles from whatever site you’re visiting at the time. I appreciate our head Goose for refusing to attach such distractions to the work of me and my fellow Beyond Search writers.

These pieces of writing which show up all over the Web are, for the most part, the work of two outfits, Taboola and Outbrain. Both companies launched in Israel and are now headquartered in New York. Reportedly, Taboola expects to reap $100 million in revenue this year from these tactics. They aren’t the only ones to profit, either. The news sites get a cut, of course, and those “articles” are often links to marketer-crafted promotional content. Yet another brick removed from the wall between advertising and useful information. The practice’s longevity may be limited, though. The (real!) article reports:

“The algorithms are still a long way from knowing that you really don’t want to read the story ‘No Way These Celebrities Are 60-Plus!,’ a Taboola recommendation on HuffingtonPost.com on Thursday. In fact, digital media analyst Ken Doctor, author of the Newsonomics blog, said news-recommendation engines are ‘degenerating’ in efficiency as they serve up more links from more publishers seeking readers and traffic. ‘Early on, the click-through rates [by readers] were as high as 6 to 8 percent, because the [recommended] stories were relevant to the stories on the [host] site,’ he said. ‘But there’s been a junking up that does a disservice to the reader. There’s too much “click bait” that has no relation to the actual readers of the site.'”

Apparently, Outbrain’s Lisa LaCour would disagree that her industry has seen its best days. She says her company is working to offer recommendations that users really, truly want to read, “but we’re not there yet.” Will they get there, or has this racket just about run its course? We’ll see, but in the meantime these companies, and the news sites, are raking in the bucks pennies at a time. That’s one weird trick!

Cynthia Murrell, February 11, 2014

Bill Clinton is Keynote Speaker at Upcoming SharePoint Event

February 11, 2014

Who says that IT isn’t showy, or a game of personalities? SharePoint is dispelling this myth with their latest announcement. Read all the details in the WinBeta article, “Microsoft Welcomes Bill Clinton as the Keynote Speaker at SharePoint Conference 2014.”

The article begins:

“The next one [conference] up on the schedule is the SharePoint Conference, which kicks off on Monday March 3rd in Las Vegas, and runs through the 6th. Today the company announces a very special keynote speaker will be getting things started at 8:30am that Monday. President Bill Clinton will take the stage to kick things off.”

And while Bill Clinton may be a powerful personality, Stephen E. Arnold is a longtime leader in search and a more authoritative voice when it comes to SharePoint. Arnold shares his knowledge on his information service, ArnoldIT.com. Stay tuned for all of the latest SharePoint tips and tricks from a professional point of view.

Emily Rae Aldridge, February 11, 2014
