SearchBlox 7.0 Available

June 7, 2012

SearchBlox’s blog invites us to “Compare SearchBlox 7.0 vs. Solr.” Okay, so I am to compare. I wonder how? There is no side-by-side comparison set up here, nor is there any link to one. Hmm . . . I guess I am expected to do the legwork.

Misleading headline aside, the write up does thoroughly describe SearchBlox’s new version 7.0 in relation to rival Solr. It reads:

“SearchBlox 7 is a (free) enterprise solution for website, ecommerce, intranet and portal search. The new 7.0 version makes it easy to add faceted search without the hassles of managing a schema and scales horizontally without any manual configuration or external software/scripts. SearchBlox enables you to achieve term, range and date based faceted search without manually maintaining a schema file as in Solr. SearchBlox enables to have distributed indexing and searching abilities without using any separate scripts/programs as in SolrCloud. SearchBlox provides on demand dynamic faceting of fields without specifying them through a config or script.”

The software also sports a Web-based administrator’s console. Unlike Solr, SearchBlox indexes custom meta fields without the need to specify custom fields or setup within the schema.xml file. It also supports: multiple indexes out of the box; indexing of custom content and multiple content types; and the specification of facets at runtime (as opposed to requiring a prior definition). Another nifty feature lets you add or remove SearchBlox servers from a cluster without the need to restart or stop the servers.
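For readers who want to see the Solr side of that comparison, below is a minimal sketch of a term and range facet request using Solr’s standard facet parameters. The host, core, and field names are assumptions for illustration; note that in Solr the faceted fields must already be declared in schema.xml, which is exactly the manual step SearchBlox says it eliminates.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical Solr core and field names, for illustration only.
SOLR_URL = "http://localhost:8983/solr/products/select"

params = {
    "q": "*:*",
    "wt": "json",
    "facet": "true",
    "facet.field": "category",          # term facet: counts per category value
    "facet.range": "price",             # range facet over a numeric field
    "f.price.facet.range.start": "0",
    "f.price.facet.range.end": "500",
    "f.price.facet.range.gap": "100",
}

with urllib.request.urlopen(SOLR_URL + "?" + urllib.parse.urlencode(params)) as resp:
    counts = json.load(resp)["facet_counts"]

# Solr returns term facets as a flat [value, count, value, count, ...] list.
print(counts["facet_fields"]["category"])
```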

Perhaps SearchBlox 7.0 outpaces Solr in these metrics because it is built on top of that Apache product. SearchBlox Software was founded in 2003 and is based in Richmond, VA.

Cynthia Murrell, June 7, 2012

Sponsored by PolySpot

Get a Comprehensive Search Solution for SharePoint from Fabasoft Mindbreeze

June 4, 2012

In “SharePoint Log: When Databases Rebel,” Robert Schifreen looks at how one user can generate 16 gigabytes of logs in just three months. The article is the ninth part of a larger SharePoint 2010 series at the ZDNet blog chronicling a SharePoint deployment.

Schifreen has this to say about navigating the growing amounts of data:

Microsoft markets a separate SharePoint add-on product called FAST Search, and likes to imply that no successful SharePoint installation is complete without it. In practice, from what I have read, it seems that FAST is unnecessary unless you have tens of millions of documents to index. Otherwise, SharePoint’s out-of-the-box indexing system will crawl the full text of all your documents (you’ll need to download a free ifilter, as it’s called, to crawl PDF files) perfectly well.

But he goes on to add:

There’s a handful of things missing from the standard search, such as having the number of hits displayed in brackets within the search results page, and there are no thumbnail previews of search results, but nothing that is sufficiently must-have to warrant the added expense or complication of learning yet another Microsoft technology.

We know SharePoint is a complex and beneficial system for content management, but we also know there are gaps in the out-of-the-box search feature. But you don’t have to learn a new Microsoft technology or settle for less. Consider a third-party solution developed and devoted specifically to search, like Fabasoft Mindbreeze. Its Web Parts-based information pairing capabilities give you powerful searches and a complete picture of your business information, allowing you to get the most out of your enterprise search investments. And your end users will benefit from fast and intuitive search with clearly displayed results and simple navigation.

Creating relevant knowledge means processing data in a comprehensible form and utilizing relations between information objects. Data is sorted according to type and relevance. The enterprise search for professionals.

Mindbreeze’s intuitiveness also means less training is required. The company provides tutorials and wikis that are easy to use and efficient, and you can browse Mindbreeze’s support tools for users, including videos, FAQs, wikis, and other training options. Check out the full suite of solutions at Fabasoft Mindbreeze.

Philip West, June 4, 2012

Sponsored by Pandia.com

Forward Search 2.7 Arrives

June 1, 2012

The eagerly anticipated newest version of Forward Search has finally been released. According to “Release of Forward Search 2.7,” extensive testing has been done over the past several weeks to ensure the functionality of new features and improvements. The company is secure in its opinion that this is the most stable, flexible, and versatile Forward Search version ever.

The highlighted backend features and improvements are as follows:

“Facet Counted Search – Count of each found result, Faster Numerical Range Query – Returns hits within specified range, Type-ahead improvement – Support for selected fields and sorting by frequency, Improved HTML5 support – New filtering options for extended custom fields, WebService improvement – Json-returning search interface, Web Crawler – Now supports partial crawl, and Indexing – Complete re-indexing of an index.”

Some of the new administration client services are:

“New Atom-Feed News reader – relaying news from the Forward Search Partner Portal, added support for the above backend features, and an improved interface for related control element editing.”

Forward Search is a Microsoft partner that offers enterprise search for solutions including content management, intranets, databases, document repositories, and OEM software. The company currently works with over 35 partners in nine countries, providing backing and support that enables corporations to handle large amounts of unstructured data and create success within their client circles using content management solutions based on Microsoft technology such as EPiServer, Sitecore, and Umbraco.

Jennifer Shockley, June 1, 2012

Sponsored by PolySpot

Lucid Imagination Previews Solr 4

May 30, 2012

With the first alpha release of Solr 4 promised to us soon, Lucid Imagination posts “Solr 4 Preview: SolrCloud, NoSQL, and More.” Solr 4 is full of features that enhance existing Solr applications. It also paves the way for new applications by further blurring the distinction between full text search and NoSQL.

SolrCloud is the code name for the largest set of features. These promise easy scalability for Solr as well as distributed indexing with support for real-time get, optimistic locking, and durable updates.

Solr 4 incorporates Apache’s robust distributed coordination project, ZooKeeper. This tool stores the Solr configuration as well as cluster metadata such as hosts, collections, shards, and replicas. The post describes how distributed coordination works in Solr 4:

“When a new node is brought up, it will automatically be assigned a role such as becoming an additional replica for a shard. A bounced node can do a quick ‘peer sync’ by exchanging updates with its peers in order to bring itself back up to date. New nodes, or those that have been down too long, recover by replicating the whole index of a peer while concurrently buffering any new updates.

“An update can be sent to any node in the cluster, and it’s automatically forwarded to the correct node and immediately replicated to a number of other nodes to enable fault tolerance, high availability, and query scalability. Likewise, queries may be sent to any node in a cluster and they will automatically be routed to the correct nodes and load balanced across replicas.”
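To make the “send it to any node” idea concrete, here is a rough Python sketch under assumed host and collection names: one document is indexed through one node and queried through another, with SolrCloud handling the forwarding, replication, and routing described above.

```python
import json
import urllib.parse
import urllib.request

# Two hypothetical nodes in the same SolrCloud cluster.
NODE_A = "http://solr1:8983/solr/collection1"
NODE_B = "http://solr2:8983/solr/collection1"

# Index via node A: the update is forwarded to the correct shard leader
# and replicated to the other replicas automatically.
doc = [{"id": "invoice-42", "title_t": "Smith invoice, April"}]
req = urllib.request.Request(
    NODE_A + "/update?commit=true",
    data=json.dumps(doc).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
urllib.request.urlopen(req)

# Query via node B: the request is routed across shards and load
# balanced over replicas before the merged results come back.
query = urllib.parse.urlencode({"q": "title_t:invoice", "wt": "json"})
with urllib.request.urlopen(NODE_B + "/select?" + query) as resp:
    print(json.load(resp)["response"]["numFound"])
```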

Solr 4 is packed with other new features, like pivot faceting, pseudo-fields, and a spell checker that can work from the main index, to name just a few. See the write up for more.

Lucid Imagination is the commercial company for Lucene and its search server Solr. The company crafts robust scalable search solutions that make the most of the open source technology. Lucid prides itself on making open source search accessible and easy to learn for clients worldwide, many of which are industry heavyweights. These search gurus recently moved to new digs in Redwood City, CA.

Cynthia Murrell, May 30, 2012

Sponsored by PolySpot

Semantic Keyword Research

May 29, 2012

Keyword research is the time-tested, reliable way to locate information on the Internet and in databases. There have been many changes to the way people use keyword research; some have stayed around, and others have disappeared into the invisible web faster than a spambot hits a web site. The Search Engine Journal has come up with “5 Tips for Conducting Semantic Keyword Research,” which argues that users “must recognize the semantic nature of the search engines’ indexing behaviors.”

For those without a dictionary handy, semantics refers to the meaning or interpretation of a word or phrase. When a user types a phrase into a search engine, it uses indexing (akin to browsing through a list of synonyms) to find other pertinent results.


A happy quack to http://languagelog.ldc.upenn.edu
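To make the synonym analogy concrete, here is a toy Python sketch of query expansion. The thesaurus is hand-built for illustration; a real engine derives such relations statistically from its index.

```python
# A tiny, hand-built thesaurus standing in for a search engine's
# semantic index; real systems learn these relations from data.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "affordable", "budget"],
}

def expand_query(query: str) -> list[str]:
    """Expand each query term with its known synonyms."""
    terms = []
    for word in query.lower().split():
        terms.append(word)
        terms.extend(SYNONYMS.get(word, []))
    return terms

print(expand_query("cheap car"))
# ['cheap', 'inexpensive', 'affordable', 'budget', 'car', 'automobile', 'vehicle']
```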

So how do the tips measure up? Tip #1 has users create a list of “level 1” core keywords, aka a list of subject keywords. This is the first step in any research project, and most people will be familiar with it if they have completed elementary school. Pretty basic, but it builds the foundation for an entire project. Tip #2 delves further by having users expand the first list with more supporting keywords that are not necessarily tied to the main keyword but are connected to others on the list. Again, another elementary research tip: reach out and expand.

Tip #3 moves us away from the keyword lists and tells users to peruse their results and see what questions they can answer. After users find what can be answered, they make another list detailing their findings (so we didn’t step that far away from lists).

Tip #4 explains how to combine tips #1-3, allowing users to outline their research and then write an article on the topic. Lastly, Tip #5 is a fare-thee-well: good luck, and write interesting content:

“One final tip for incorporating semantically-related keywords into your website’s content…  Building these varied phrases into your web articles should help eliminate the stilted, unpleasant content that results from trying to stuff a single target keyword into your text a certain number of times.

However, it’s still important to focus on using your new keyword lists to write content that’s as appealing to your readers as it is to the search engines.  If Google’s recent crackdowns on Web spam are any indication of its future intentions, it’s safe to say that the best long-term strategy is to use semantic keywords to enhance the value of your copy – without letting its optimization eclipse the quality of the information you deliver to your website visitors.”

What have we got here? Are the tips useful? Yes, they are, but they do not bring new material to keyword searching. As mentioned earlier, these steps are taught as the very basics of elementary research: make a keyword list about your topic, find associated terms, read what you got, then write the report. It is true that many schools and higher education institutions do not teach the basics, so self-styled researchers lack these fundamental skills. People also tend to forget the beginner’s steps. These are two common mishaps that make articles like this necessary, but the more seasoned researcher will simply intone, “Duh!”

Whitney Grace, May 29, 2012

Sponsored by PolySpot

Big Outfits Buy Search Vendors: Does Chaos Commence?

May 25, 2012

I don’t want to mention any specifics in this write up. I have a for-fee Overflight on the subject. I do want to highlight some of the preliminary thoughts the goslings and I collected before creating our client-focused analysis. This write up was sparked by the recent news that the founder of Autonomy, which HP acquired for $10 billion, is seeking new opportunities after eight months immersed in the HP way. See “Hewlett-Packard Can’t Say It Wasn’t Warned about Autonomy.” This write up contained a remarkable statement, even when measured against the work of other “real” journalists:

Some will say this is a classic case of an entrepreneurial business being bought by a hulking, bureaucratic institution which failed to integrate it and failed to understand its culture. Others will say HP, desperate to do a deal, simply overpaid for a company that was going to struggle to maintain its sales and earnings momentum and was deluded about its abilities. Certainly warnings about the latter were there for HP to see before it handed over all that cash. Here’s what Marc Geall, a Deutsche Bank analyst who used to work at Autonomy, said in October 2010 about the business model: “…investment in the business has lagged revenues… [which] could affect customer satisfaction towards the product and the value it delivers.” He went on to warn that Autonomy’s service business was “too lean” and that it “risks falling short of standards demanded by customers”. All of which prompted Geall to question whether the company needed to change its business model – “traditionally, software companies have needed to change their business models at around $1bn in revenues”.

Yep, now the issues are easy to identify: the brutal cost of customer support, the yawning maw of research and development, the time and cost of customizing a system. The problem is that these issues have been identified many times before. Nevertheless, senior managers looking for the next big thing remain extremely confident of their business and technical acumen. Search is a slam dunk. Heck, I can find what I want in Google. How tough can it be to find that purchase order? That confidence may work in business school, but it has not worked in the wild-and-crazy world of enterprise search and content processing.

Think back to the notable search acquisitions over the last few years. Here are some to jump-start your memory:

  • IBM in 2005 and 2006 purchased iPhrase (a MarkLogic precursor with semantic components) and Language Analysis Systems (a next-generation content processing vendor)
  • Microsoft acquired Powerset and Fast Search & Transfer in the 2008 to 2009 period; both vendors had next-generation systems with semantic, natural language processing, and other near-magical capabilities
  • Oracle acquired TripleHop in 2005, focused on its less-and-less visible Secure Enterprise Search line up (SES10g and SES11g), then went on a buying spree to snap up InQuira (formed when two weaker players, Answerfriend Inc. and Electric Knowledge Inc., merged in 2002 or 2003), RightNow (which uses the Q-Go natural language processing system purchased in 2010 or 2011), and Endeca (an established search vendor with technology dating from the late 1990s)
  • SAP snagged some search functions with its NetWeaver buy in 2004, which coexisted in a truce of sorts with the SAP TREX system. When SAP bought Business Objects in 2007, the company inherited Inxight Software, a text analytics vendor with assorted wizardry explained in buzzwords by marketing mavens.

So what have we learned from these buy outs by big companies? Here are the observations:

First, search and content processing does not behave the way other types of software do, learning to sit, come, and roll over. The MBAs, lawyers, and accountants issue commands like good organizational team players. The enterprise search and content processing crowd listens to the management edicts with bemusement. Everyone thinks search is a slam dunk. How tough can a utility function be? Well, let me remind you, gentle reader, search is pretty darned difficult. Unlike a cloud service for managing contacts, search is not one thing. Furthermore, those who have to use search are generally annoyed because systems have, since 1970, failed to generate answers. Search outputs create more work. Usually the outputs are mostly wide of the mark. Big companies want to sell a software product or service that solves a problem like “What is the backlog for the Midwestern region?” or “When did I last call Mr. Jones?” The big companies don’t get this type of system when they buy, often at a premium, companies which purport to make content findable, smart, and accessible. So we have a situation in which a sales presentation whets the appetite of the big company executive who perceives himself or herself as an expert in search. Then, when anticipation is at its peak, the sales person closes the deal. In the aftermath, the executives realize that search just does not follow the groove of an accounting system, a videoconferencing system, or a security system. Panic sets in, and you get crazy actions. IBM pretty much jettisoned its search systems and fell in love with open source Lucene / Solr. Good enough was a lot better than trying to figure out the mysteries of proprietary search and how to pay for the brutal research and development costs search requires.

Second, search is a moving target. As recently as my meetings with sleek MBAs from six major financial firms, I found that search was assumed to be a no-brainer. Google has figured out search. Move on. When I asked the group how many considered themselves experts in search, everyone replied, “Yes.” I submit that none of these well-paid movers-and-shakers are very good at search and retrieval. Few of them have the time or patience for old fashioned research. Most get information from colleagues, via phone calls which include “I have a hard stop in five minutes,” and via emails sent to people whom they have met at social functions or at conferences. Search is not looking up a phone number. Search is not slamming the name of a company into Google. Search is not wandering around midtown Manhattan with an iPhone displaying the location of a pizza joint. Search is whatever the user wishes to find, access, know, or learn at any point in time and in any context. Google is okay at some search functions. Other vendors are okay at others. The problem is that virtually all search and retrieval solutions are merely okay. People have been trying for about 50 years to deliver responses to queries that are what the user requires. Most systems dissatisfy more than half their users and have for 50 years. A big company buying a next generation search system wants these problems solved. The big company wants to close deals, get client access licenses, or cloud transactions for queries. But the big companies don’t get these things, so the MBAs, lawyers, and accountants are really confused. Confused people make crazy decisions. You get the idea.

Third, search does not mean search. Search technology includes figuring out which words to index in a document. Search does a miserable job of indexing videos unless the video audio track is converted to ASCII and then that ASCII is indexed. Even with this type of content processing system, search does not deliver a usable output. What a user gets is garbled snippets and maybe the opportunity to look at a video to figure out if the information is relevant. Search includes figuring out what a user wants before the user asks the question or even knows what the question is. One company is collecting millions in venture money to achieve this goal. Good luck on that. Search includes providing outputs that answer an employee’s specific question. Most systems provide a horseshoes type of result; that is, the search vendor wants points for getting close to the answer. Employees who have to click, scan, close, and repeat the process are not amused. The employee wants the Smith invoice from April, not an increased risk of carpal tunnel problems. The poobahs who acquire search companies want none of these excuses. The poobahs want sales. What search acquisitions generate are increased costs, long sales cycles, and much friction. Marketers overstate, and search systems routinely underdeliver.
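To pin down what “figuring out which words to index” involves at the simplest level, here is a toy inverted-index sketch in Python; real systems layer tokenization rules, stemming, stop words, and relevance ranking on top of this skeleton.

```python
from collections import defaultdict

def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token.strip(".,;:!?")].add(doc_id)
    return index

docs = {
    "d1": "Smith invoice from April.",
    "d2": "April sales report.",
}
index = build_index(docs)
print(sorted(index["april"]))  # ['d1', 'd2'] -- both documents mention April
```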

Who cares?

Another enterprise search train wreck. The engineer was either an MBA, an accountant, or a lawyer. No big deal. Just get another search train. How tough can it be to run a search system? Thanks to http://www.eccchistory.org/CCRailroads.htm

Well, the executives selling big companies a search and content processing system just want the money. After years of backbreaking effort to generate revenues, the founders usually figure out that there are easier ways to earn a living. If the founders don’t bail out, they get a new job or become a guru at a venture capital firm.


Lucid Imagination Conference Full of Insights

May 21, 2012

The powerful advantages of open source search solutions are still new to many who would embrace them. That’s one of the conclusions to be drawn from O’Reilly Radar’s “Lucene Conference Touches Many Areas of Growth in Search.” Presenters at Lucid Imagination’s recent conference, Lucene Revolution, detailed those advantages as well as new developments in the field.

Sign-up stats indicate that many of the attendees were new to Lucene and Solr, with about a third having experienced them for less than one year. It sounds like there’s a lot of room for the technologies to grow.

There is more information in the article than I can go into here, so you might want to check it out for yourself. Writer and conference attendee Andy Oram shares some of the highlights regarding big data:

“Mark Davis did a fast-paced presentation on the use of Solr along with Hadoop, and systems hosting GPUs at the information processing firm Kitenga. A RESTful API from LucidWorks Enterprise gives Solr access to Hadoop to run jobs. Glenn Engstrand described how Zoosk, the ‘Romantic Social Network,’ keeps slow operations on the update side of the operation so that searches can be simple and fast. As in many applications, Solr at Zoosk pulls information from MySQL. Other tools they use include the High-speed ObjectWeb Logger (HOWL) to log transactions and RabbitMQ for auto-acknowledge messages. HOWL is also useful for warming Solr’s cache with recent searches, because certain operations flush the cache.”

We welcome another big data development revealed at the conference: the LucidWorks Big Data platform, now in beta, will allow users to manage Solr schemas without having to configure and certify the local tools. Now, there’s a time saver. Lucid vows that the platform can handle any “volume, variety, and velocity” of content.

Auto-completion warranted its own presentation, wherein Sudarshan Gaikaiwari focused on geospatially informed results. Geohashes are used to retrieve geospatial info. They represent the world’s grid as arbitrary strings, where shorter strings represent larger regions and adding characters narrows the search to a smaller area. Using these, applications can suggest auto-completed terms local to the user, like a nearby museum or restaurant.
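Here is a minimal Python sketch of that idea as we understand it. The encoder follows the standard geohash algorithm; the places, coordinates, and prefix length are made-up illustrations.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # the geohash alphabet

def geohash(lat: float, lon: float, precision: int = 6) -> str:
    """Encode a point as a geohash; nearby points share a prefix."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    code, ch, bit, even = [], 0, 0, True
    while len(code) < precision:
        rng = lon_rng if even else lat_rng       # alternate lon/lat bits
        val = lon if even else lat
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:
            ch = (ch << 1) | 1
            rng[0] = mid
        else:
            ch <<= 1
            rng[1] = mid
        even = not even
        bit += 1
        if bit == 5:                             # 5 bits per base32 character
            code.append(BASE32[ch])
            ch, bit = 0, 0
    return "".join(code)

# Hypothetical suggestion store: (geohash cell, suggestion text).
places = [
    (geohash(37.7952, -122.4028, 5), "Transamerica Pyramid"),
    (geohash(37.8087, -122.4098, 5), "Pier 39"),
    (geohash(40.7484, -73.9857, 5), "Empire State Building"),
]

user_cell = geohash(37.7983, -122.4021, 5)  # a user in San Francisco
prefix = user_cell[:4]                      # shorter prefix = wider area
print([name for cell, name in places if cell.startswith(prefix)])
# ['Transamerica Pyramid', 'Pier 39'] -- the New York entry is filtered out
```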

Oram notes that Apache’s Lucene is probably the most popular independent search engine. Written in Java, it can be used for nearly any full-text search application, particularly cross-platform ones. The engine boasts low memory requirements, fast incremental indexing, and an array of query types.

Conference sponsor Lucid Imagination is the commercial company for Lucene and its search server Solr. The company crafts robust scalable search solutions that make the most of the open source technology. Lucid prides itself on making open source search accessible and easy to learn. These search gurus recently moved to new digs in Redwood City, CA.

Cynthia Murrell, May 21, 2012

Sponsored by HighGainBlog

Silobreaker Serves Swiss Soldiery

May 15, 2012

Switzerland’s Department of Defense, Civil Protection, and Sport (DDPS) will soon be relying on Silobreaker’s considerable data management chops, MarketWire announces in “Silobreaker Delivers Enterprise Software to Swiss Armed Forces.” The software underpins a turnkey open source intelligence (OSINT) solution provided by LearningWell, a Swedish integration consulting firm.

The write up quotes Kristofer Månsson, Silobreaker’s CEO:

“Situational awareness and contextual insight are essential and time-critical requirements for any corporate or governmental organization today. Yet, users are drowning in information, and cutting through the proliferation of content from both traditional and social media represents significant challenges, which cannot be met by the use of conventional search methods. We are very pleased that customers keep recognizing our products as leading edge for analytical and sense-making purposes, as well as for the efficiency of their prompt implementations.”

Yes, the “big data” phenomenon is placing big demands on organizations everywhere. Silobreaker’s Enterprise Software Suite is a robust tool for making sense of it all. It covers the workflow from beginning (back-end content aggregation, indexing, classification, and storage) to end (front-end search, filtering, analysis, visualization, user collaboration, report generation, and decision support). The software handles both structured and unstructured content with aplomb and manages data from both inside and outside sources, both horizontally and vertically.

Founded in 2005, Silobreaker has headquarters in London and Stockholm. Their solutions facilitate teamwork across user groups through strong single user platforms that provide information aggregation, analytical tools, and collaboration features. Besides the aforementioned Enterprise Software Suite, the company offers Silobreaker Premium, a powerful intelligence and media monitoring SaaS tool for corporate, financial, NGO and government agency users. Silobreaker’s products help many private, corporate, academic, financial and government organizations worldwide with intelligence, media-monitoring, risk management, and early warning capabilities.

Cynthia Murrell, May 15, 2012

Sponsored by PolySpot

Yahoo, Flubs, and an Azure Chip Consulting Firm

May 12, 2012

The addled goose steers clear of icebergs. But Yahoo, flubs, and an azure chip consulting firm keep appearing in my Overflight system. The most recent item to catch my attention was “Heidrick & Struggles Slaps Back at Thompson’s Yahoo in Blame Game Over ResuMess.” In terms of Web indexing, this headline is a keeper. I am not sure how many hits “resumess” had prior to this article, but it will be a zingy word going forward.

The point of this write up is that an azure chip consulting firm in the business of recruiting blue-chip or maybe azure chip executives defended itself and its professionalism. Here’s the passage in the “real” news story I noted:

[Scott] Thompson [the CEO with the flub on his bio] did not name the firm, but he was clearly referring to Heidrick & Struggles, which handled that placement. It was also working on the Yahoo CEO search, after the Silicon Valley Internet giant fired its former CEO Carol Bartz last fall. But, because it had originally placed Thompson at eBay, the firm did not work on his hiring at Yahoo.

Ah, the same firm, Heidrick & Struggles, was involved with eBay and Yahoo. Some questions:

  1. What did the headhunting firm have in its files about Mr. Thompson? Perhaps an “old” version of Mr. Thompson’s curriculum vitae?
  2. Did anyone request a transcript from Mr. Thompson’s college? If so, who and when? What did the transcript reveal?
  3. Why did the azure chip consulting firm write a letter without some hard data? I have been in meetings in which highly paid consultants arrived armed with stacks of “facts,” clippings, data, and interview notes. Why not present some of this information?

A mistake happened somewhere along the line. As a curious type of person, I was hoping for some more substance to what is a most interesting affair. Oh, I graduated from Bradley University with a major in poetry. Now I am an addled goose floating in a pond filled with mine runoff. Iambic pentameter, or perhaps something with a Catullus dactylic hexameter. I should have applied for a job at eBay or Yahoo in my youth. Engineers, MBAs, accountants, and movie moguls have not fared particularly well. A spondee to you, gentle reader. A struggle, one might say.

Stephen E Arnold, May 12, 2012

Sponsored by HighGainBlog

Swiftype: A Challenge to Google, SearchBlox, and Blossom

May 9, 2012

The SEO crowd and the newly minted open source search experts are usually insensitive to the challenge of site search. Now an outfit called Swiftype wants to displace Google and its site search / custom search, SearchBlox (an open source Web site indexing service), and Blossom (one of the leaders in hosted search for Web sites and organizations).

Background

“Site search” means indexing a public facing Web site and making the content findable. For the Beyond Search information service, we have two systems at work so we can point out the differences to those who ask us for our professional opinion. The search box visible at the top of the page runs a visitor’s query against an index of the content in the ArnoldIT.com domain. When you enter a query for “mysteries of online”, this is the results list:

(Screenshot: site search results for the query “mysteries of online.”)

When you scroll below the picture of our engineers in action, you will see a second search box labeled “Google Custom Search.” This is a variant of Google’s site search service. When you run the query “mysteries of online”, you get this results list:

(Screenshot: Google Custom Search results for the query “mysteries of online.”)

The Google index follows the links within the content of Beyond Search so you get a broader results set.

Which is better? There is no better in site search. One can find the answer to one’s question or not. We use the two systems to show that in some cases a narrow result set delivers higher precision; in other situations, one trades off precision for broader recall.
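For the record, precision is the fraction of returned results that are relevant, and recall is the fraction of all relevant documents that get returned. A quick sketch with made-up result sets shows the trade-off:

```python
def precision_recall(returned: set[str], relevant: set[str]) -> tuple[float, float]:
    """Compute (precision, recall) for one query's result set."""
    hits = returned & relevant
    return len(hits) / len(returned), len(hits) / len(relevant)

relevant = {"d1", "d2", "d3", "d4"}            # made-up ground truth

narrow = {"d1", "d2"}                          # site-only index: fewer, tighter hits
broad = {"d1", "d2", "d3", "d7", "d8", "d9"}   # link-following index: wider net

print(precision_recall(narrow, relevant))      # (1.0, 0.5)  high precision, lower recall
print(precision_recall(broad, relevant))       # (0.5, 0.75) lower precision, higher recall
```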

Swiftype’s Play

The company has rolled out a Web site search service and an application programming interface for developers. You can view a demonstration of the service at www.swiftype.com. Setup is easy, and features include:

  • Auto-complete
  • A generated code snippet to put the search system in a Web page
  • “Immediate” indexing
  • Analytics showing the most popular queries

The company offers a quick start guide, information about the REST API, and information about crawler meta tags. The meta tags allow the developer to direct the crawler to index a site in a specific manner.
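To give a feel for the developer workflow, here is a hypothetical Python sketch of calling a hosted site-search REST API. The endpoint, parameter names, and response shape are invented for illustration; they are not Swiftype’s documented interface.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint and parameters -- not Swiftype's documented API.
API = "https://api.hosted-search.example.com/v1/search"

params = {"engine": "my-site", "q": "mysteries of online", "per_page": 10}
with urllib.request.urlopen(API + "?" + urllib.parse.urlencode(params)) as resp:
    results = json.load(resp)

# Assumed response shape: {"records": [{"title": ..., "url": ...}, ...]}
for record in results.get("records", []):
    print(record.get("title"), record.get("url"))
```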

The company has been funded by YCombinator and is located in San Francisco. The service is now in public beta and is free. The fee for the service, when it exits beta, will be based on the amount of API traffic it generates.

“Y Combinator-Backed Swiftype Builds Site Search That Doesn’t Suck” provides a positive review of the beta service. The article asserts:

Among other things, Swiftype is supposedly easy to integrate with Tumblr — our own MG Siegler has added it to his blog ParisLemon. In other words, there’s virtually no technical work required from the publisher — something else that distinguishes Swiftype from the various other search products and open source libraries out there. At the same time, companies who want a little more control can access Swiftype through its APIs.

Our view is that search in general presents a number of challenges. Site search is one subset of a broader information retrieval issue. For site search, we think that Swiftype deserves a look and a head-to-head comparison with other services. Unfortunately, after 50 years of innovation in search and retrieval, there is room for improvement in findability. Give the Swiftype system a whirl.

Stephen E Arnold, May 9, 2012

Sponsored by IKANOW
