Blossom Search for Web Logs
September 5, 2008
Over the summer, several people have inquired about the search system I use for my WordPress Web log. Well, it’s not the default WordPress engine. Since I wrote the first edition of Enterprise Search Report (CMSWatch.com), I have had developers providing me with search and content processing technology. We’ve tested more than 50 search systems in the last year alone. After quite a bit of testing, I decided upon the Blossom Software search engine. This system received high marks in my reports about search and content processing. You can learn more about the Blossom system by navigating to www.blossom.com. Founded by a former Bell Laboratories scientist, Dr. Alan Feuer, Blossom search works quickly and unobtrusively to index content from Web sites, behind-the-firewall collections, and hybrid collections.
You can try the system by navigating to the home page for this Web log and entering the search phrase “search imperative” in quotes.
When you run this query, you will see that the search terms are highlighted in red. The bound phrase is easily spotted. The key-word-in-context snippet makes it easy to decide whether I want to read the full article or just the extract.
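For readers curious about how a key-word-in-context snippet with highlighting can be produced, here is a minimal Python sketch. It is not Blossom's implementation, which is not described in this post. The function name `kwic_snippet`, the `width` parameter, and the bracket markers standing in for red highlighting are all assumptions made for illustration.

```python
import re

def kwic_snippet(text, phrase, width=30, mark=("[", "]")):
    """Return a snippet around the first match of phrase, or None.

    Hypothetical helper: the brackets in `mark` stand in for the red
    highlighting a real results page would apply.
    """
    match = re.search(re.escape(phrase), text, flags=re.IGNORECASE)
    if match is None:
        return None
    # Keep up to `width` characters of context on each side of the match.
    start = max(0, match.start() - width)
    end = min(len(text), match.end() + width)
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return (
        prefix
        + text[start:match.start()]
        + mark[0] + match.group(0) + mark[1]
        + text[match.end():end]
        + suffix
    )

print(kwic_snippet("The search imperative is real.", "search imperative", width=4))
# prints: The [search imperative] is ...
```

The snippet keeps the matched phrase intact and trims the surrounding text, which is what lets a reader judge a hit without opening the full post.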
Web log content baffles some search engines. For example, recent posts may not appear. The reason is that the index updating cycle is sluggish. Blossom indexes my Web site on a daily basis, but you can specify the update cycle appropriate to your users’ needs and your content. I update the site at midnight each day, so a daily update allows me to find the most recent posts when I arrive at my desk in the morning.
The data management system for WordPress is a bit tricky. Our tests of various search engines identified three issues that came up when third-party systems were launched at my WordPress Web log:
- Some older posts were not indexed. The issue appeared to be the way in which WordPress handles the older material within its data management system.
- Certain posts could not be located. The posts were indexed, but the engines applied a default OR to phrase queries and displayed too many results. With more than 700 posts on this site, that lack of precision was not much help to me.
- Current posts were not indexed. Our tests revealed several issues. The content was indexed, but the indexes did not refresh. The cause appeared to be the traffic to the site. Another likely issue was WordPress’ native data management setup.
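The second issue in the list above, a default OR applied to a phrase query, can be illustrated with a small Python sketch. The sample posts, the `tokenize` helper, and both matcher functions are hypothetical and invented for this example; they do not describe how any of the tested engines work internally.

```python
import re

# Hypothetical sample posts; the text is invented for illustration only.
POSTS = {
    "post-1": "The search imperative for enterprise teams",
    "post-2": "A moral imperative to fix indexing",
    "post-3": "Search engines and Web logs",
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def or_match(query, document):
    """True if the document contains ANY query term (default OR)."""
    doc_terms = set(tokenize(document))
    return any(term in doc_terms for term in tokenize(query))

def phrase_match(query, document):
    """True only if the query terms appear adjacent and in order."""
    q, d = tokenize(query), tokenize(document)
    if not q:
        return False
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))

def search(query, matcher):
    """Return the sorted IDs of posts accepted by the given matcher."""
    return sorted(pid for pid, text in POSTS.items() if matcher(query, text))

print(search("search imperative", or_match))      # ['post-1', 'post-2', 'post-3']
print(search("search imperative", phrase_match))  # ['post-1']
```

With only three posts the difference is small. Across several hundred posts, the OR interpretation returns every post that mentions either word, which is why the bound phrase matters for precision.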
As we worked on figuring out search for Web logs, two other issues became evident. The first was redundant hits, since there are multiple paths to the same content. The second was incorrect time stamps, since all of the content is generated dynamically. Blossom has figured out a way to make sense of the dates in Web log posts, a good thing from my point of view.
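One common way to suppress redundant hits when several URLs lead to the same content is to fingerprint the text of each hit and keep only the first hit per fingerprint. The Python sketch below shows that general technique under stated assumptions. It is not a description of what Blossom does; the example.com URLs, the `fingerprint` helper, and the `dedupe_hits` function are hypothetical.

```python
import hashlib

def fingerprint(body):
    """Hash the body after collapsing whitespace and lowercasing it."""
    normalized = " ".join(body.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def dedupe_hits(hits):
    """Keep the first (url, body) pair seen for each distinct body."""
    seen = set()
    unique = []
    for url, body in hits:
        key = fingerprint(body)
        if key not in seen:
            seen.add(key)
            unique.append((url, body))
    return unique

# Hypothetical hits: the same post reached through two different paths.
hits = [
    ("http://example.com/2008/09/05/some-post/", "Blossom search for Web logs"),
    ("http://example.com/?p=123", "Blossom  Search for Web logs\n"),
    ("http://example.com/2008/09/04/other-post/", "A different post"),
]

for url, _ in dedupe_hits(hits):
    print(url)
# prints the first and third URLs only
```

An exact hash only catches copies that are identical after normalization. Pages that wrap the same post in different navigation or sidebars would need the post body extracted first, or a fuzzier comparison.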
The Blossom engine operates for my Web log as a cloud service; that is, there is no on-premises installation of the Blossom system. An on-premises system is available. My preference is to have the search and query processing handled by Blossom in its data centers. These data centers deliver low-latency response and feature failover, redundancy, and distributed processing.
The glitches we identified to Blossom proved to be no big deal for Dr. Feuer. He made adjustments to the Blossom crawler to finesse the issues with WordPress’ data management system. The indexing process is lightweight and does not choke my available bandwidth. In fact, traffic to the Web log continues to rise, and the Blossom demand for bandwidth has remained constant.
We have implemented this system on a site run by a former intelligence officer, which is not publicly accessible. The reason I mention this is that some cloud-based search systems cannot conform to the security requirements of Web sites with classified content and their log-in and authentication procedures.
The ArnoldIT.com site, which is the place for my presentations and occasional writings, is also indexed and searched with the Blossom engine. You can try some queries at http://www.arnoldit.com/sitemap.html. Keep in mind that the material on this Web site may be lengthy. ArnoldIT.com is an archive and digital brochure for my consulting services. Several of my books, which are now out of print, are available on this Web site as well.
Pricing for the Blossom service starts at about $10 per month. If you want to use the Blossom system for enterprise search, a custom price quote will be provided by Dr. Feuer.
If you want to use the Blossom hosted search system on your Web site, for your Web log, or your organization, you can contact either me or Dr. Alan Feuer by emailing or phoning:
- Stephen Arnold seaky2000 at yahoo dot com or 502 228 1966.
- Dr. Alan Feuer arf at blossom dot com
Dr. Feuer has posted a landing page for readers of “Beyond Search”. If you sign up for the Blossom.com Web log search service, “Beyond Search” gets a modest commission. We use this money to buy bunny rabbit ears and paté. I like my logo, but I love my paté.
Click here for the Web log search order form landing page.
Bloggers who mention Beyond Search when they sign up for the Blossom service receive a discount. A happy quack to the folks at Blossom.com for an excellent, reasonably priced, efficient search and retrieval system.
Stephen Arnold, September 5, 2008