Taxodiary: At Last a Taxonomy News Service
August 3, 2010
I have tried to write about taxonomies, ontologies, and controlled term lists. I will be the first to admit that my approach has been to comment on the faux pundits, the so-called experts, and the azurini (self-appointed experts in metatagging and indexing). The problem with the existing content flowing through the datasphere is that it is uninformed.
What makes commentary about tagging informed? Three attributes. First, I expect those who write about taxonomies to have built commercially successful systems to manage term lists, and I expect those term lists to be in wide use and to conform to standards from ISO, ANSI, and similar outfits. Second, I expect those running the company to have broad experience in tagging for serious subjects, not the baloney that smacks of search engine optimization and snookering humans and algorithms with their alleged cleverness. Third, I expect the systems used to build taxonomies and to manage classification schemes and term lists to work; that is, a user can figure out how to get information out of a system relevant to his / her query.
Splash page for the Taxodiary news and information service.
How rare are these attributes?
Darned rare. When I worked on ABI/INFORM, Business Dateline, and the other database products, I relied on two people to guide my team and me. The first person is Betty Eddison, one of the leaders in indexing. May she rest in indexing heaven where SEO is confined to Hell. Betty was one of the founders of InMagic, a company on whose board I served for several years. Top notch. Care to argue? Get ready for a rumble, gentle reader.
The second person was Margie Hlava. Now Ms. Hlava, like Ms. Eddison, is one of the top guns in indexing. In fact, I would assert that she is on my yardstick either at the top or holds the top spot in this discipline. Please, keep in mind that her company Access Innovations and her partner Dr. Jay ven Eman are included in my reference to Ms. Hlava. How good is Ms. Hlava? Very good saith the goose.
Wikipedia as Open Source Search Reference
August 3, 2010
Short honk: It is commonplace to utilize Wikipedia for most research projects, but remember that Wikipedia is also a valuable reference point of information for open source coding developments and search engines. http://en.wikipedia.org/wiki/List_of_search_engines lists nearly twenty open source engine lists, and it includes a number of updates.
In typical Wikipedia style, all entries are conveniently organized in outline form, with each heading containing a URL for quick access. Entries are categorized by content, information type, model (includes open source, semantic, social, metasearch, visual, appliances, desktop engines, and usenet), ‘based on’ listings, and even defunct search engines.
When searching by model for open source search engines, you’ll find a wealth of info on Lucene, Nutch, Sphinx, and Zettair. Lucene’s article http://en.wikipedia.org/wiki/Lucene is reasonably well-researched, providing cross-references to other open source code. The Lucene Revolution conference in Boston, Mass. on October 7 and 8, 2010, is an excellent event if you are interested in open source search.
Brett Quinn, August 3, 2010
Comparison Highlights Lucene
August 3, 2010
Vik Singh has posted a thorough and impartial comparative analysis of selected search engines. Singh used his own testing code, and kept the playing field level by not changing any numerical tuning parameters. He summarizes by saying:
Based on these preliminary results and anecdotal information I’ve collected from the web and people in the field (with more emphasis on the latter), I would probably recommend Lucene (which is an IR library – use a wrapper platform like Solr w/ Nutch if you need all the search dressings like snippets, crawlers, servlets) for many vertical search indexing applications – especially if you need something that runs decently well out of the box (as that’s what I’m mainly evaluating here) and community support.
Lucene earned a perfect 5/5 for support–highest of all tested platforms. (You can download Lucene/Solr at Lucid Imagination.)
As an IT professional, you are always on the lookout for ways to cut costs, and you also know that software licenses aren’t getting any cheaper, particularly for popular pro-sumer products such as Photoshop and Dreamweaver. http://www.osalt.com hosts a treasure trove of free, high-quality open source alternatives designed to save you time and money and still deliver a first-rate final product. By choosing an open source product, the user obtains a number of advantages compared to commercial products. Besides the fact that open source is always available for free, it is a transparent application, in that you are invited behind the scenes to view all source code and thereby to suggest improvements to the product. Furthermore, every product is covered by a large dedicated network, or community, that is more than willing to answer any questions you may have. http://www.osalt.com is definitely worth bookmarking.
Brett Quinn, August 3, 2010
Yahoo Japan Acts Googley
August 3, 2010
Yahoo Japan has decided to give the search engine nod to Google. On a slow news day, the snubbing of Microsoft caught our attention in Harrod’s Creek. The war of words has erupted. Concerned over a monopoly, Microsoft lead attorney Brad Smith called the decision anti-competitive and has compared it with a failed deal earlier within the U.S. Yahoo market place. Google has replied, saying it was due to good old fashioned fair business practices. “Google Outdeals Microsoft in Japan” said:
Under the terms of the new alliance, Yahoo will use Google’s search and advertising platform technology to power its site, matching Google’s superior tech with its own, highly popular content portals. In Japan, Google hasn’t quite enjoyed the success it has elsewhere around the world, trailing Yahoo in search dominance. This new deal makes it the cock of the walk; according to the New York Times, Google and Yahoo together comprise 90.5 percent of the Japanese search market. (If you’re wondering why Yahoo would cut against Microsoft like this, the answer is that Yahoo is actually a minority owner in its own Japanese property; the biggest shareholder is the cell phone company SoftBank.)
Being able to provide Yahoo Japan with the necessary search engine technology will only broaden Google’s reach throughout the world. Google is already the largest search engine available, due in part to ad revenues associated with keyword searches. These ads are projected to have a 27 percent increase over the next three years in Japan alone.
With Google expanding into yet another region, Yahoo Japan is poised to shift into high gear. Will this be a long, drawn out fight among the competitors? Yep. How much did this deal cost Google? How much will this deal cost Microsoft? No data yet.
Glenn Black, August 3, 2010
Exclusive Interview: Bill McQuaide, Black Duck Software
August 3, 2010
The Lucene Revolution is 10 weeks away. October 7th and 8th, 2010, to be exact. One of the companies participating is Black Duck Software. The company’s Open Source Resource Center http://www.blackducksoftware.com/oss has been of considerable utility to me in my work on the Lucene/Solr centric conference, as has its code search Web site koders.com. Black Duck also offers a free version of its enterprise code search product, Code Sight. The company has a wide range of for-fee software products. If you work with open source, you will find that the firm’s Black Duck Suite may be indispensable to you and your development team for managing and controlling the myriad open source components in your code base.
I wanted to know more about this company, so I contacted Bill McQuaide, Black Duck’s Executive Vice President of Products and Strategy. Mr. McQuaide has 20 years of technology experience and executive leadership spanning engineering, product management, marketing and business development. He comes to Black Duck after spending 10 years with RSA Security during which time the company experienced rapid growth. RSA Security was acquired by EMC Corporation in 2006. He most recently served as a Senior Vice President of the Enterprise Solutions Group and Corporate Development. His experience includes four years at Hewlett-Packard, where he led the Product Management and Channel Development teams for the company’s Technical Systems Division. The full text of our interview appears below:
Bill McQuaide, Black Duck Software’s Executive Vice President of Products and Strategy.
My logo is a goose. Yours is a black duck. What’s Black Duck Software?
There is a story behind the name of the company. It goes back to our founder, Doug Levin, who found and nursed a black duck back to health when he was seven. He raised the duck as a pet. Doug decided to name the company Black Duck Software. Of course black ducks are reputed to be very intelligent animals. They are hard to decoy and very aware of their surroundings.
A goose is definitely less intelligent than a duck.
Maybe, but we see the black duck as a metaphor for how we run the company. We pay attention to details. We are aware of the larger enterprise IT and open source communities in which we operate. And we work smart. We have smart people. We capture information about open source software projects being worked on by other smart people. And we design our products and services for smart companies.
You have an extensive background in commercial software? RSA, HP, and Data General, don’t you?
That’s right.
So why open source? That’s a radical departure, isn’t it?
It was a natural move. We needed to collect data about open source projects for our business. We help development organizations manage and control open source and other externally sourced software, and, of course, we use open source in our products.
Lucene/Solr?
Yes. I think it is pretty clear that open source software is a huge source of innovation. We track more than 250,000 open source projects from thousands of Internet sites, and we maintain a free code search engine, Koders.com, which tens of thousands of developers use every day to search for open source software and other Web-downloadable code.
Koders contains billions of lines of code written in over 30 languages, code that’s available under at least 28 different OSS software licenses. Koders is a powerful tool for developers looking for reusable open source code, methods, examples, algorithms and more. Code re-use makes sense – why re-invent the wheel? And it makes developers more efficient, and lets them focus on the more interesting aspects of software development, the innovation their companies need to succeed.
What have been the payoffs from this business decision?
Our involvement with open source has many benefits. We build better products, our time to market and innovation are improved, and we are more in tune with where the industry is moving – what types of software are leading innovation (mobile, health care information technology, and cloud solutions, for example.)
Any special challenges?
There are those who think Black Duck sells fear chiefly because our products help companies identify OSS in their code bases. What we really do is help our customers do the right thing. We believe we do more to help corporate users of OSS comply with the license obligations than any other organization. Most people want to do the right thing, and that’s not about fear, uncertainty, and doubt. It is being pragmatic. Our products ensure customers know what’s in their code, and automate the management of OSS use, so companies can ensure they are complying with the terms of the various open source software licenses in their code. We “design in” compliance making it easier for companies to innovate using open source.
You have a number of high profile clients? Are you able to characterize a typical use case?
Absolutely. How about search?
That would be great.
Search is really about delivering on intent, as one expert has observed. Koders.com is a perfect example of open source search in action. Say a developer is working on an application. He or she needs a compiler, but why write yet another compiler? It’s potentially a waste of time and a distraction away from more critical work. The developer’s intent is to find a compiler that meets a set of criteria. Koders delivers by searching for a compiler and returning results that meet the developer’s needs.
Search also has to be fast, return relevant results, be scalable, easy to use, and have access to many sources of information, in many formats. Lucene was originally developed to search text; the Apache Solr project extends Lucene with faceted search; other OSS search engines search for images, crawl, parse, or search within the firewall (like Black Duck Code Sight.)
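To make the idea of faceted search concrete, here is a minimal in-memory sketch. It is not the Solr or Lucene API; the sample records, field names, and function names are all hypothetical and chosen only for illustration. The idea is simply to run a query, then count how the matching results break down by the values of a chosen field.

```python
from collections import Counter

# Hypothetical sample records; titles and field names are illustrative only.
DOCS = [
    {"title": "tiny compiler", "language": "C", "license": "GPL"},
    {"title": "compiler toolkit", "language": "Java", "license": "Apache"},
    {"title": "json parser", "language": "Java", "license": "MIT"},
    {"title": "compiler tests", "language": "C", "license": "MIT"},
]


def search(docs, term):
    """Return the documents whose title contains `term` as a whole word."""
    term = term.lower()
    return [d for d in docs if term in d["title"].lower().split()]


def facet_counts(docs, field):
    """Count how many documents carry each value of `field`."""
    return Counter(d[field] for d in docs)


def faceted_search(docs, term, field):
    """Run the query, then compute facet counts over the hits only."""
    hits = search(docs, term)
    return hits, facet_counts(hits, field)


if __name__ == "__main__":
    hits, counts = faceted_search(DOCS, "compiler", "language")
    for value, count in sorted(counts.items()):
        print(f"{value}: {count}")
```

A production engine computes these counts from its index rather than by scanning records, but the output a user sees (matches grouped by language, license, and so on) has the same shape.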
I want you to look in your crystal ball for a moment. What do you envision as future uses of open source search at your company going forward?
My crystal ball is usually blurry.
Mine doesn’t work at all. What’s ahead in your opinion?
We have just released Black Duck Code Sight, a search engine that indexes code across an organization’s source code repositories to enable fast search and navigation at enterprise scale. Code Sight (which is available as a free download at http://blackducksoftware.com/code-sight/download or as an enterprise-scale offering), improves developer productivity and software quality, supports standardization, and enhances compliance. Developers can find code quickly and leverage existing and approved components within their code bases whether they are internally written, open source, or come from other external sources.
We think the Black Duck Suite provides the most comprehensive search capability possible for developers. Inside the company they can search all internal repositories and catalogs. Across the Internet, they can search all known OSS projects and code search sites, searching source code and project information (metadata).
Search improves the development process, because developers find what they need fast – they can search the Internet or do an internal search to quickly locate a project or file that contains the desired code; they can debug faster, because search makes it easy to find the same code fragments; they can find security problems like “password=”; and search helps break down information silos. Version proliferation can be controlled, because you can find duplicate bugs in branched code.
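The “password=” example can be sketched in a few lines. This is a rough illustration of the general technique (pattern search over source text), not how Code Sight or Koders works; the regular expression, the function name, and the sample files are assumptions made up for the example.

```python
import re

# Hypothetical pattern: flags lines that assign to something named "password".
CREDENTIAL_PATTERN = re.compile(r"password\s*=", re.IGNORECASE)


def find_credential_lines(files):
    """Return (file name, line number, line) for each line matching the pattern.

    `files` maps a file name to that file's text content.
    """
    hits = []
    for name, text in sorted(files.items()):
        for number, line in enumerate(text.splitlines(), start=1):
            if CREDENTIAL_PATTERN.search(line):
                hits.append((name, number, line.strip()))
    return hits


# Made-up file contents standing in for a source code repository.
SAMPLE_FILES = {
    "db.py": 'host = "localhost"\npassword = "hunter2"\n',
    "util.py": "def add(a, b):\n    return a + b\n",
}

if __name__ == "__main__":
    for name, number, line in find_credential_lines(SAMPLE_FILES):
        print(f"{name}:{number}: {line}")
```

A simple pattern like this will produce false positives (for example, a comparison such as `password == x` also matches), so the hits are a starting point for review rather than a verdict.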
In your opinion, what makes open source a viable option for the enterprise or development world?
Open source is about freedom to use the best code to solve the problem at hand, and the power that comes from having a large community of coders supporting you. For enterprises, OSS means increased development productivity and velocity, and frees developers up to work on projects that are differentiating and valuable. As part of a multi-source development model – where OSS is combined with other externally sourced code and home-grown code – it’s hard to beat on a cost, time-to-solution or innovation level.
One final question, if I may? Where can a reader get more information about your firm?
Check out http://blackducksoftware.com. Make sure to visit the Open Source Resource Center, look into Code Sight, and don’t forget the Koders.com search site.
Stephen E Arnold, August 3, 2010
Freebie
Azurini Lock In Analysis Baffles the Goose
August 3, 2010
I know, I know. Consulting firms have to be “real” and “objective” and “mavenesque.” I accept that. But the write up “Burton Group: Avoid Office 2010 Lock-In, Stick with Office 2007” wowed me. Microsoft buys lots of consulting, research, and advice. As a result, those who want to get jobs with the Redmond fun lovers often find a way to put a honey colored light on almost any product, service, or initiative. How many raves did I read about Vista? How many times have I heard about the wonders of MSN, now Live something? How many times have I heard experts explain the impact of Microsoft’s mobile strategy, its search strategy, its social strategy, its cloud strategy, and other strategies? The addled goose sure does not generate $70 billion a year in revenue and Microsoft does. So, guess who is really smart? Time’s up. Microsoft.
But a consulting firm criticizing Microsoft albeit somewhat indirectly? That is amazing, and it means to me that maybe the fondness Microsoft once felt for Burton has faded. Maybe Burton no longer loves Microsoft? Maybe there are other forces in play? Who knows.
What is clear is that Burton suggests an organization that embraces Office 2010 may be a candidate for lock in. Lock in means that a vendor calls the shots, not the client. The only way to get free is to break out. In fact, that’s one of the appeals of open source software. An organization using open source software believes it has more freedom than when chained to a giant SharePoint installation, an even bigger Microsoft Exchange construct, and the 40 other servers that Microsoft has on offer.
My view is that Microsoft is not the only enterprise software vendor looking to get shelf space and then become a monoculture in a client organization. Does IBM seek to monopolize hardware, software, and services? In my experience, you better understand the way Big Blue operates before your local IBM vice president gets a temporary office down the hall from your company’s president. Same with the Google.
So what strikes me as interesting is not the lock in angle. That’s old news. The criticism of a big outfit like Microsoft has caught my attention. Is one of the azurini changing colors?
Stephen E Arnold, August 3, 2010
Freebie
IBM OmniFind 9.1: Trouble for Some Search Partners?
August 2, 2010
IBM has embraced open source. Now before you wade through the links for the new IBM OmniFind 9.1 search system, let me own up to a previous error. I did not believe that IBM would do much to make open source search a key part of the firm’s software strategy. I was wrong. IBM did or people like Mike McCandless did. Second, the decision to use Lucene and wrap IBM’s product strategy and pricing around it pretty much means that some of IBM’s favored enterprise search vendors are going to find themselves sitting home when IBM makes certain sales calls. Third, the IBM pricing strategy does not mean that enterprise search IBM-style is free. The idea is that IBM will be able to chase after Microsoft without the legacy of the $1.3 billion investment in Fast Search & Transfer, the legal and police muddle, and the mind boggling task of converting Fast into the broader vista of SharePoint. (Do you think my reference to “vista” evokes the Windows 7 predecessor? Silly you.)
Here’s what we have based on my poking around.
You get to license connectors. These puppies will be saddled with IBM pricing elements. This means that it will be tough for a customer to compare what he/she paid with what another customer paid. Bad for competitors too, but that’s a secondary issue compared to generating revenue. Run a query for part number BFG04CML. The adapters work with the UIMA standard.
You get to pay for the multi language option. Same pricing deal as connectors.
There is an email search component, which is available as “IBM OmniFind Personal E-Mail Search” or IOPES. This works with Lotus Notes and Microsoft Outlook. IBM sales engineers may be able to bundle up the bits and pieces needed to stop outfits like the not well known Isys Search Software outfit from Australia from selling search to a Lotus Notes customer.
The security model reminds me of Oracle’s SES11g approach. You get a system and then get to buy components. Same pricing model again.
You can license a classification model. Same pricing mechanism.
If you already have an OmniFind search installation, you have to reindex after working through the update procedure. That sequence is too complex for a blog post, and if anyone wants a summary, I charge for it. The darned method was not particularly easy to locate on the IBM Web site. Sorry, I run a business.
You can still handle collections, but you have to set these up via the administrative interface or the configuration files.
If you have a bunch of IBM servers running OmniFind, you have to update each one in the search system. Have fun.
There is a Web crawler available, and I think our test showed that it called itself UFOcrawler.
For more information about OmniFind 9.1 click this link. Be patient. The new color is green which evokes the cost of the add ons and components. Nevertheless, bad news for some commercial search and content processing vendors accustomed to IBM’s throwing them bones. IBM is now eating those bones in my opinion. The sauce is open source. Tasty too.
Stephen E Arnold, July 30, 2010
Lockheed Drops Open Source Bomb
August 2, 2010
Used to creating explosions itself, one company is helping make a big bang in the open source world. Lockheed Martin, America’s biggest defense contractor, recently unveiled its own company-based open source social media program, Eureka Streams. eWeek broke the story in its article, “Lockheed Martin Launches Open-Source Social Networking Project.” Lockheed built Eureka Streams hoping to create something that “represents a new communication experience for knowledge workers, empowering them to pick and choose the channels of news, information and conversation.” The most impressive part of the entire project is its use of open source technology: “our framework is written in Java and is based on open source projects such as GWT, Hibernate, Spring, and Lucene, to name a few,” one designer said. Seeing the big boys embrace open source technology is a major bump in visibility and vitality for these explosive programs.
Pat Roland, August 2, 2010
Freebie
Storm Warnings for OneRiot?
August 2, 2010
Short honk: Search and content processing vendors face a tough market. Some outfits have figured out how to make money; two examples are Autonomy and Exalead. Google is an ad agency and not knocking the socks off the enterprise search crowd. Microsoft is stuffing search into everything in hopes of getting CALs, so its efforts don’t line up with what other outfits do. IBM is in the open source search horde. Real time search outfits like OneRiot.com had an angle, but if “Layoffs, Reshuffle at OneRiot” is on the money, OneRiot.com may be struggling. I like OneRiot.com. I don’t want to put it on the list with former search engines along with Convera, Delphes, and Entopia. Here’s hoping, but there are four outfits in Europe hanging by a thread. Tough times the 2010s.
Stephen E Arnold, August 2, 2010
Freebie
Arnold / Oldham Podcast on Process Monitoring
August 2, 2010
Dr. Tyra Oldham, president of LAND CC, an engineering services firm, spoke with Stephen E Arnold in an ArnoldIT.com podcast about process monitoring. The topics covered included manufacturing, business, and software processes. The need for monitoring in real time is going up because the cost of a failure can be catastrophic.
Dr. Tyra Oldham, founder and president of LAND Construct. Dr. Oldham holds an MBA with a focus on information technology management.
Dr. Oldham and Stephen Arnold discuss these ideas and touch upon the innovative software available from IGear, a company that is redefining monitoring for production and manufacturing operations. You can listen to the podcast via the ArnoldIT.com Podcast page at http://arnoldit.com/podcasts/. The program runs 15 minutes. Information about Dr. Oldham is here.
Ken Toth, August 2, 2010
Sponsored post