The Feivi Arnstein Interview: Founder of SearchLion

August 2, 2011

On August 1, 2011, I had an opportunity to talk with Feivi Arnstein, founder of SearchLion. SearchLion provides a browser-based interface that looks like a Google-influenced Web search system. The home page for SearchLion presents an interesting description: “The new way to search. Welcome to the 21st century Web search.” The system makes it easy to narrow a query to specific types of content; for example, Web content, images, news, blogs, and Twitter messages.

SearchLion takes an approach quite different from the brute force keyword method used by the early Web search systems. In fact, the tagline for the service is “The New Way to Search.” To make certain a user understands the new direction the company is taking, the splash page offers the greeting, “Welcome to 21st century Web search.”

I ran queries on the system, which offers relevance ranked search results from Google and Yahoo. I found the output useful. When I clicked on the Open button next to an entry in the results list, the system displayed in the browser a preview of the Web page. In addition, other hits related to the result I “opened” are listed in the right-hand column of the display.

image

Source: www.searchlion.com

When I spoke with Mr. Arnstein, I was curious about the inspiration for the interface, which puts the focus on content, not ads. The idea for the content centric interface was, according to Mr. Arnstein, a result of his work in the financial services sector. Screens for traders, for example, are filled with information important to the task at hand. He said:

My first professional background was as a Technical Futures trader. I spent several years making a living day trading equity futures from my own private office. When you trade equities, you use software which makes use of every inch of screen space. So, for example, you can have a screen which is evenly split into four equity charts. The concept is simple: the more data you can access on the screen, the more productive you will be. I was accustomed to the efficiency of trading software. I realized that when searching and browsing the web, there were big parts of the screen going to waste. So I sought to find ways to use the available screen space to give the user more data.

He noted:

We think this fosters switching back and forth which is time consuming and can be confusing to many users. If you can have results and the source both on the same screen, our research suggests that users can find what they are looking for much more quickly. In addition to opening the live sites, you can also save your searches together with the live sites. When you then load a search from your saved list, the live sites open automatically. We’ve used the same concepts with our MultiView features. Instead of the live Web site, MultiView uses the blank areas of the page to show you a different type of search result; for example, images, news, videos, etc.

The technical challenges were “interesting”, according to Mr. Arnstein. He added:

When showing more information, your browser will be using more resources. It took a lot of work and innovation to make sure the user gets his additional information, whether the live sites or the various types of results and still be extremely fast.

You can read the full interview with Mr. Arnstein on the ArnoldIT.com subsite, Search Wizards Speak. The SearchLion site is at http://www.searchlion.com.

Stephen E Arnold, August 2, 2011

Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search

Webmasters Vocalize about the Google Panda Update

August 2, 2011

The race to the top is tough – especially in the race to Google’s top search rankings. “Google: Ask Yourself These 23 Questions if Panda Impacted Your Website” reports on the woes of some webmasters who’ve lost revenue due to Google’s rollout of the Panda Update, the algorithm change aimed at identifying low-quality pages and sites.

The article reports on a recent blog post by Google’s Amit Singhal outlining 23 questions webmasters should ask themselves about the quality of their own sites. But blog respondents also had something to say:

Singhal also notes that since Panda rolled out, Google has rolled out more than a dozen additional tweaks. But that doesn’t matter to a few people who have already commented on Singhal’s post, noting a very obvious flaw that Google still hasn’t conquered: scrapers are outranking the original content in many cases.

So what’s Google aiming at here? Could they be trying to give their AdWords a boost? AdWords, the main advertising product and source of revenue for the information giant, is the reason you see sponsored links with your Google search. An advertiser pays for trigger words.

When a user searches on Google, ads for the relevant words are shown as the sponsored links. So what does this mean for the little guy with high-quality Web content?

We’ll have to wait and see as Google continues to tweak and webmasters continue to vocalize through blog posts. The question we keep discussing at lunch is, “Are these changes intended to boost ad revenues or help the average Google searcher?”

Philip West, August 2, 2011

Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search

Social Content Feed Tool from Know about It

August 2, 2011

When your Facebook, Twitter, and other social streams become convoluted, you might miss out on that link, photo, or music video you would’ve loved. You’ll never know – until now…maybe. Marshall Kirkpatrick looks at the new start-up, Know About It, in “New Service Sniffs out Secret Gems from across Your News Feeds.”

The service brings in all your subscribed content from major social networks, then offers a number of different ways to sort what it finds. My favorite is the filter called “Potentially Missed” – links from people who don’t share a lot of links.

Know About It explains on its Web site that it collects all the links passing through your social streams and performs a “bunch of analysis on each one to determine which are most likely to be of interest to you.”

Sounds helpful. The idea of sorting all your inbound information in a variety of ways is appealing. You can also look at the service’s recommendations based on your expressed interest or get a personalized email digest.

Mr. Kirkpatrick has not yet tested the service but likes the idea. What isn’t mentioned? Privacy. So what is the ‘bunch of analysis’ and where do all those links end up? Advertisers? Time will tell whether the start-up is successful. But with the social web growing and moving at a relentless pace, social media users who want to sort their feeds likely won’t mind too much. We think these types of tools are likely to grow in importance as free real time search becomes a difficult service to monetize.

Philip West, August 2, 2011

Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search

DotNetNuke and the New SharePoint Additions

August 2, 2011

This release has been discussed for a long time in blogs and feeds, but now users can finally rejoice! CMSWire reports in “DotNetNuke 6.0: Revamped User Interface, Cloud, and SharePoint Integrations” that the new version has shipped. The first thing you will notice about the new DNN 6.0 is the redesigned user interface, which makes for a more seamless flow for the eye: pure digital eye candy. DNN 6.0 was also rebuilt in C#. The article states:

DNN 6 has a much more modern interface, but it not only looks good, it’s also much faster to build and deploy a Web site using it. With all three editions of DNN (community, professional and enterprise) a site template is shipped with many of the features you would see in a typical Web site.

Users will no longer have to scroll within a browser; web pages have been replaced with pop-up boxes and modal windows. A Pages Administration feature was added to the interface to make it easier to manage and create new pages in your web site. Other new features include integration with Windows Azure, Snowcovered.com, cloud storage, and ecommerce. The best news for the SharePoint crowd is that the Enterprise edition includes a connector that provides one-way synchronization from SharePoint to DNN. It ensures that only the most recent version of a document is uploaded to the DNN web site. Definitely check out DNN 6.0 along with SurfRay Ontolica to improve your web site’s search and webpage management.

Whitney Grace, August 2, 2011

Sponsored by SurfRay, developers of Ontolica for SharePoint

Inteltrax: Top Stories, July 25 to July 29

August 1, 2011

Inteltrax, the data fusion and business intelligence information service, captured three key stories germane to search this week. Each deals with some of the surprising negative news found in the analytics industry, and each offers a lesson others can learn from. The system powering the Inteltrax service is called Augmentext.

Perhaps the most shocking tale of self-destruction we’ve seen in a while, “US Army is Not an Analytic Superpower,” detailed how this defense branch spent over $2 billion in taxpayer money on an analytic tool that never worked, when private companies could have been contracted for pennies on the dollar.

Another sort-of David and Goliath story, “SAS Falling Behind in the Cloud,” detailed how one-time business intelligence superpower SAS has rested on its laurels and, in the process, become a joke in the competitive and lucrative world of cloud-based analytics.

Finally, we served up a cautionary tale to those believing everything they read with “Parallel NFS Barely on the Radar.” This was a story of warning, as the company in question got some great press for its software, but has almost no history to back it up, which made us incredibly suspicious.

These three stories are, thankfully, the exception and not the rule. Every day we are wowed by news of analytics and business intelligence helping practically every business imaginable. However, there are always rotten eggs, even during an impressive time of growth. That’s why we’re here, to help readers sort out the good and the bad and make more informed decisions.

Follow the Inteltrax news stream by visiting www.inteltrax.com

Patrick Roland, Editor, Inteltrax, August 1, 2011

Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search

Oracle Updates SES11g

August 1, 2011

We wanted to mention to our Oracle fans the update to Secure Enterprise Search (SES). Users will want to upgrade to 11g Release 1 (11.1.2.2), which can be downloaded at the link above.

First the token bullet list of “what’s new” straight from the Web site:

  • All platforms available for download, including Windows 64-bit
  • Oracle Access Manager integration for crawler and search application
  • Autovue CAD file support
  • Custom lexers and stop words lists, on per-data source granularity

It’s nice to see that the add-on is ready to cooperate with Oracle’s own Autovue; including drawings in an index is a must for several industries.  Provided it proves functional, adding more flexibility with the stoplist should increase accuracy and weed out those pesky repetitive user-specific terms.

I scanned the release notes; no surprises here (a patch is a patch is a patch). There are several known issues but, save a few exceptions, the workarounds are adequately documented. Watch out for a possible compatibility issue with IPv6. Keep in mind that Oracle bought a natural language search engine with its InQuira purchase. NLP seems to be an interest of Oracle.

Sarah Rogers, August 1, 2011

Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search

Vivisimo Granted Patent for Clustering

August 1, 2011

Great news for Vivisimo! A news release titled “Vivisimo Receives U.S. Patent for Clustering System and Method” reported:

“Before remix clustering, Vivisimo was the first tech firm to introduce on-the-fly clustering, which allowed users to see their search query results arranged in topic folders. This unique feature gave users the ability to review a list of similar results associated with their search. The invention of remix clustering took enterprise search to a whole new experience, allowing both consumers and employees to see what other subtler topics are connected to their search.”

If you haven’t had the pleasure of receiving neatly categorized search results courtesy of remix clustering, check out Yippy (formerly Clusty).

Rattling off a client list including Cisco Systems, NASA, the German Intelligence body, the Institute of Physics, the National Library of Medicine, the American Association for the Advancement of Science, et al. lends some serious credibility to Vivisimo; needless to say these aren’t your everyday Google users or trend surfers. If remix clustering is preferred by those in the business of information, that’s as good an endorsement as any.

So well done and congratulations, Vivisimo. I look for the clearing of this hurdle to spawn more innovation in the future or litigation, which seems to be important to many organizations.

Never hurts the ol’ pockets, either.

Sarah Rogers, August 1, 2011

Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search

Solr Deep Paging Fix

August 1, 2011

After being spoiled by modern technology, let’s face it: who has three seconds to spare?  This ultimately is the question posed in the “Deep Paging Problem” post on the Solr Enterprise Search blog, which presents an interesting performance tweak for the open source system.

Querying data buried deep in the information banks can be a bit hairy. Even the search giant Google stands at arm’s length from the problem, only returning 90 pages or so of results. If Solr is asked to retrieve the 500th document from an index, it must cycle through each of the first 499 documents to grab it. What can be done to save valuable time as well as ease the strain on the system, you ask?

Here enters the power of filters, handy from cigarettes to spreadsheets and nearly everything in between. The author asserts:

“The idea is to limit the number of documents Lucene must put in the queue. How to do it? We will use filters to help us, so in Solr we will use the fq parameter. Using a filter will limit the number of search results. The ideal size of the queue would be the one that is passed in the rows parameter of query. … The solution … is making two queries instead of just one – the first one to see how limiting is our filter thus using rows=0 and start=0, and the second is already adequately calculated.”

So use the two saved seconds in searching to write that down. The approach takes two queries: a first one, with rows=0 and start=0, to check how many results the filter leaves, and a second, with an adjusted offset, to return the desired elements. For a useful example of the code in action, check the original post linked above.
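The two-query idea can be sketched in a few lines of Python. This is a minimal illustration, not the code from the original post: the integer `price` sort field, the boundary value, and the helper names are assumptions made for the example. Only `q`, `fq`, `sort`, `start`, and `rows` are actual Solr request parameters. The sketch builds the parameter sets without contacting a server.

```python
def count_query(q, sort_field, boundary):
    """First request: rows=0 and start=0, so only the hit count is read.

    The filter selects the documents that sort BEFORE the boundary.
    (sort_field is assumed to be an integer field, sorted ascending.)
    """
    return {
        "q": q,
        "fq": f"{sort_field}:[* TO {boundary - 1}]",
        "sort": f"{sort_field} asc",
        "start": 0,
        "rows": 0,
    }


def page_query(q, sort_field, boundary, start, rows, skipped):
    """Second request: filter away the `skipped` leading documents and
    shift the offset down by the same amount.

    `skipped` is the hit count reported by the first request.
    """
    if skipped > start:
        raise ValueError("filter removes documents from the requested page")
    return {
        "q": q,
        "fq": f"{sort_field}:[{boundary} TO *]",
        "sort": f"{sort_field} asc",
        "start": start - skipped,
        "rows": rows,
    }


def queue_size(params):
    """Number of entries the result queue must hold for one request."""
    return params["start"] + params["rows"]


# Hypothetical case: fetch the 500th document (offset 499) when the
# first request reported 450 documents below the boundary.
plain = {"q": "*:*", "sort": "price asc", "start": 499, "rows": 1}
first = count_query("*:*", "price", 9000)
second = page_query("*:*", "price", 9000, start=499, rows=1, skipped=450)
print(queue_size(plain), queue_size(second))
```

The boundary has to be chosen so that the filter does not cut into the requested page, which is why the sketch raises an error when more documents are skipped than the original offset allows.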

Sarah Rogers, August 1, 2011

Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search

SharePoint Licensees May Want to Check Out Daytona

August 1, 2011

Here’s an interesting open source item from Microsoft, which may at some point have an impact on the traditional Microsoft server line up. ZDNet reports that “Microsoft rolls out ‘Daytona’ MapReduce runtime for Windows Azure.” Project Daytona, developed by the company’s eXtreme Computing Group, taps the compute and storage services of Azure to perform analytics in the cloud. Writer Mary-Jo Foley quotes the explanation from Microsoft’s research site:

“Using Daytona, a user can submit a model, such as a data-analytics or machine-learning algorithm, written as a map-and-reduce function to the Daytona service for execution on Windows Azure. The Daytona runtime will coordinate the execution of the map-and-reduce tasks that implement the algorithm across multiple Azure virtual machines.”
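For readers new to the pattern, here is a rough idea of what “written as a map-and-reduce function” means. The Python sketch below is not the Daytona API, which the quoted text does not describe. The function names and the single-process driver are assumptions for illustration; a real runtime would spread the same two functions across many virtual machines.

```python
from collections import defaultdict


def map_fn(record):
    """Map step: emit a (key, value) pair for every word in one record."""
    for word in record.lower().split():
        yield word, 1


def reduce_fn(key, values):
    """Reduce step: combine all values emitted for one key."""
    return key, sum(values)


def run_local(records, map_fn, reduce_fn):
    """Toy single-process driver standing in for a distributed runtime."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)  # "shuffle": group values by key
    return dict(reduce_fn(key, values) for key, values in groups.items())


counts = run_local(["search is hard", "cloud search"], map_fn, reduce_fn)
print(counts)
```

The user supplies only the map and reduce functions; the runtime handles the grouping and the scheduling.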

So, will this development obsolete the company’s own SQL Server? What will happen with SharePoint cloud implementations using the traditional Microsoft server line up?

No answers yet.

Microsoft is looking to the future with this one, and is banking heavily on continued migration to the Cloud. I, for one, don’t trust others to manage my data and have trouble imagining that no one else shares my suspicions. I’ve been wrong before, though, and the trend does seem to be growing unchecked. We’ll see.

If you want to make your SharePoint implementation more useful, check out the SurfRay Ontolica search solution. The work.

Cynthia Murrell, August 1, 2011

Sponsored by SurfRay, developers of Ontolica.
