CMS, Search, and Signal Flares
March 14, 2010
In my first Internet monograph—Internet 2000: The Path to the Total Network, published in 1994 by Infonortics Ltd., Tetbury, Glos—I discussed the challenges Web information posed. One of my points was:
The Internet is different from print, video, or facsimile because it can incorporate elements of each medium in real time.
Content management systems focused on making HTML Web pages and making it possible for non programmers to create a page, ftp it to a server, and handle the various scripting issues that arose from the hackathon that HTML triggered. Since I wrote that sentence in 1994, the point-and-click browser model has supplanted other types of computer interfaces. The simplicity for the user insulates the user from the complexity beneath the surface. Even the phrase “code behind” baffles most Internet users with whom I speak.
Content management vendors have responded in one or more ways:
- Some have stuck to the “it’s really easy” method. When the customer discovers that CMS is not easy, the vendor moves on to a new town. This 19th century frontier entrepreneurship works as long as there is a “new” next town. But as Americans have learned, once one hits that Manifest Destiny barrier, life gets tougher.
- Some CMS wizards have tried to beef up their CMS to handle the increasingly complex features and functions. These systems work when the client has enough money, computing expertise, and stamina to see the job through. Not surprisingly, once a six or seven figure job is done, no one is too eager to reengineer the system to handle the “next big thing.” So the system just keeps doing what it is doing until the company does a rip-and-replace, which is another six or seven figure job. When these jobs go off the rails, then litigation often results.
- Some CMS vendors shift gears and become something that is more narrowly defined. Examples range from customer support content management to certain types of eDiscovery work. The idea is that replacing those glittering generalities with more narrowly defined functions makes it possible for the company to survive or successfully sell itself.
- Go open source and hope that the halo about “community” puts Neosporin on the infected wounds of what was originally code written for a single client and then boldly marketed as a “solution”.
- Mix and match.
When I read “Latest MySource Matrix Release Includes Funnelback Search Integration for Superior Search Capability”, I thought about the long journey that CMS vendors have traveled since making and managing Web pages became the equivalent of the Oklahoma Land Rush for some 19th century type entrepreneurs.
Funnelback is a search engine that is now part of Squiz, “a supported open source solutions company”. Funnelback is a search and retrieval system that was nurtured in an Australian university and research Petri dish.
The key point in the write up was:
MySource Matrix has been integrated with purpose-built Funnelback binaries incorporating powerful features for improving search results such as Contextual Navigation, Featured Pages, Type Formats and spelling suggestions. The Funnelback Search Page asset has been expanded to make it easy to implement these features. Scripts are available within the Funnelback package which can be configured to update the index, giving the administrator control over the frequency with which the indexer is run, according to the amount of content being indexed and its dynamic requirements.
My take on this is that the open source CMS created a situation in which some users were not able to locate content. The addition of search as a utility bolsters the CMS. My hunch is that CMS is morphing into a “portal” or “platform” play. Will this make users happy? I don’t know. The recent work we have done suggests that users cannot articulate what they want or need when it comes to content creation and management.
I am delighted that search is being added to a CMS. I am not confident that search alone can address the many hurdles that a CMS must jump over. Most people are not in the content producing business, nor are most CMS users programmers. Software that tries to facilitate both processes in a world that is shifting to rich media has a big job to tackle.
CMS is, in my opinion, increasingly a problem. Consultants are reinventing themselves. Roll ups are taking place. Open source solutions are proliferating. In short, CMS is and is likely to continue to be a black eye in the enterprise software sector.
Stephen E Arnold, March 14, 2010
No one paid me to write about content management systems. I will report non payment to the GSA, which has a heck of a content management system.
Lucid Hits $16 Million in Funding
March 13, 2010
Short honk: I saw an item in the San Jose Business Journal about Lucid Imagination’s Series B funding. The story “Open Source Search Startup Lucid Imagination Raises $10M” said:
[The] new investor Shasta Ventures of Menlo Park was joined by existing San Francisco-based investors Granite Ventures and Walden International.
Strong interest in open source search contributed to the funding, I believe.
Stephen E Arnold, March 13, 2010
No one paid me to write this meaty, fact filled news item. Because I reference open source, I will report non payment for the article to the White House where “open” is a key notion.
Open Source Tactics
March 7, 2010
Open source software has some teeth. In the enterprise search sector, Lemur Consulting continues to gain ground. Most recently, the firm’s wizards have been providing substantive comments about the role of semantic methods in content processing.
The story “iPhone Lessons from Google’s Nexus One” adds another factoid to my open source note card. The main idea is that Google’s open source play with the Android operating system, while not a slam dunk, has produced what the article calls “a really good device.” The article points out that the Nexus One underscores the challenge Apple faces; for example, improving the screen resolution, giving the user a more flexible “home screen”, notifications (real time info), multitasking, a combined inbox with multiple email accounts, and similar tweaks.
I liked the article, and I realized that with Android available as open source and, to date, Google’s less mom-like approach to developers and applications, Google’s open source play may put some worms into Apple’s core.
The kidney punch thrown by Google at Apple is reMail, a mobile search tool. Google has made that software open source. Mobile search is another weakness for the iPhone, and here comes Google with another open source thrust. See “reMail Is Now Open Source.” But the killer play is what Google calls device seeding. Yep, become an Android developer and get an Android phone for “free”. See “Google’s Device Seeding Program Underway—Free Phones Heading Out to Developers Soon.”
Stephen E Arnold, March 7, 2010
No one paid me to write this. Since I mention open source, I will disclose non payment to 1600 Pennsylvania, where open source has some supporters.
The Sun Case Study: Pertinent to Google?
March 4, 2010
I read the Sun case study “How Sun’s Need to Control the Code Cost Them the Company” and wondered if the information pertains to Oracle and to a lesser extent to Google. The author, Jeremy Allison, is a Googler who used to work at Sun. In addition to Mr. Allison, Eric Schmidt is a former Sun Microsystems employee. I once heard that Sun contributed quite a few employees to Google, but I don’t have any hard numbers. (If anyone does, please, use the comments section of this blog to share them.)
You will want to read the quite good write up because it combines useful business insights with a touch of humor. Rare in the world of ZDNet blog posts in my opinion. As I worked through the case study, I kept wondering “Why now?” and “For whom is this written?” I have to tell you I kept thinking about Oracle and Google even though Mr. Allison was writing his opinions and made clear that his views were not those of Google.
Fair enough.
First, I noticed the discussion of how the proprietary Sparc chip was a dog compared to Intel’s CPUs. When I read this, I realized that Oracle is requiring that each salesperson sell one of the big, honking servers along with the standard quota. If Oracle keeps the Sparc chip, Oracle is going to find itself selling systems that may be hard pressed to deliver the performance customers demand. Even worse, if an Oracle customer goes off the reservation, that customer may discover that commodity boxes and non relational databases are the cure for those performance woes. That would be bad news for Oracle in my opinion.
Second, I twitched my pinfeathers when I read the discussion of Sun and open source code. You must read this sequence including the “Have you ever kissed a girl?” email thread. In this exchange I see a warning (gentle, of course) about what problems Oracle may face in the open source community and how a company like Microsoft might find itself in a heap of trouble from a more savvy open source outfit. Could Google be more clever about “open”? Mr. Allison does not explore this idea, but I have a hunch that the lessons of Sun apply to both Microsoft and Oracle.
Finally, if you did not fall asleep in one of those required literature classes in college, you may be able to read even more between the words and the lines in this essay. Fascinating.
Stephen E Arnold, March 4, 2010
No one paid me to write this opinion as a blog post. I know I have to report whether my blog posts are commercials or the random thoughts of an addled goose. Ah, random. I will report this to the Clerk of the Supreme Court, an institution whose judgments are never random even though I see some of them that way.
Open Source: Magic or Dirty Carpet?
February 25, 2010
I have to give the Guardian a pat on the back. I try to ignore open source, and my feedreader keeps routing me open source articles. I ignore most of them. The Google-spider food headline, “When Using Open Source Makes You an Enemy of the State”, arrested me (no pun intended). The main idea is that copyright and intellectual property have another mini-storm front brewing. The key passage pivots on a person named Andres Guadamuz, a law professor in Scotland. The Guardian reported:
Guadamuz has done some digging and discovered that an influential lobby group is asking the US government to basically consider open source as the equivalent of piracy – or even worse.
You can read the original article to get the scoop. In a nutshell, legal eagles in the US (home of the sticky tort with spaghetti noodles) want to make life tough for open source. The addled goose has not figured out the “secret” ACTA treaty and now he is nervous about open source.
I am using Windows to write this post, and I think I will watch this issue. My thought is that life must have been simpler in the 4th century.
Stephen E Arnold, February 25, 2010
No one paid me to write this. Since I reference the era cheerfully tagged “Dark Ages,” I will report non payment for my work to the National Archives. Whoops. The US only goes back a couple of centuries. Well, shucks.
Search and the Open Source Card
February 22, 2010
A happy quack to the reader who sent me a link to Michael Tiemann’s “How Open Source Software Can Save the ICT Industry One Trillion Dollars per Year.” You can find the seven page document at http://regmedia.co.uk/2010/02/18/tiemann_cost_of_development_paper.pdf. When the paper was written in the fall of 2009, Michael Tiemann was the President of the Open Source Initiative and Vice President of Open Source Affairs at Red Hat. This firm is one of the highest profile commercial enterprises to have built a business on open source software. You can get the rosy financial news by searching Google Finance for RHT.
When I read the paper, I found myself in general agreement. But Red Hat is in the operating system and middleware business. For companies eager to chop down the license fees charged by commercial software companies, Red Hat’s approach is a must-have play.
My interest is search and content processing, and I think that many organizations are struggling to define search. If the news flowing from companies like Lemur Consulting and Lucid Imagination is accurate, some commercial search vendors no longer get a chance to compete. The outfits happy with Red Hat, JBoss, and other open source software are likely to hop on the Lucene / Solr bandwagon.
You can get a very upbeat picture of the benefits of open source software in Mr. Tiemann’s white paper. So if you want to make a case to go open source, you will want to download the document and tuck it in your “Sources” file.
There are some interesting “factoids” in the paper; for instance:
- A reminder that most commercial software installations end up as train wrecks.
- Costs and unnecessary expenses continue to escalate for organizations relying on commercial software.
- Proprietary software inhibits innovation.
But what about search?
Let me identify what I think is an interesting trend regarding open source and commercial vendors of search and content processing systems.
First, I have noted that one company has cut a deal with a commercial enterprise to make “connectors” available to the open source licensees. Connectors are the code widgets that allow one type of content such as Lotus Notes email to be indexed by a third-party system such as Lucene. This merging of commercial and open source suggests to me that for certain types of software, the open source community does not provide what many organizations need. After all, what good is a search system if it cannot index information in a widely used email system like Lotus Notes? I am not suggesting that the rosy picture painted by Mr. Tiemann is incorrect, but I think this is an interesting open source gap. Perhaps it will be filled by Red Hat?
Second, a number of high profile companies are offering open source operating systems. One notable example is a large search vendor’s operating system for mobile devices. If I were a struggling mobile company, I would certainly look closely at an open source, no-fee operating system. One would think that such a mobile operating system would sweep through the telecommunications industry like wildfire. What I learned last week was that Motorola was giving the for fee Windows 7 Mobile a very close look. Why? If the open source mobile operating system has a fraction of the payoffs referenced in Mr. Tiemann’s essay, why hook up with a very proprietary outfit like Microsoft? What does Motorola know that I don’t know?
Third, a number of vendors are talking about such Frankencode approaches as “support for open systems”, “full embrace of standards,” and “our APIs are open”. What do these phrases mean? On the surface, these vendors of proprietary systems seem to be leading me down the open source path. However, are these vendors using language in a way to lure the red fox to the steel trap?
Fourth, a very large outfit has figured out how to run Linux on its mainframes. What’s the purpose of this technical cartwheel? If I “buy” a mainframe, won’t the margins be sustained by boosting the price of those funny little connectors that mainframes use to hold drives in the DASD or the truly weird cables needed to hook certified gizmo A to certified gizmo B?
My hunch is that open source is a significant trend in software. Some of the success of open source is driven by those who want to create software to hold down costs and operate in a manner that to some degree reduces the brutal costs associated with certain commercial software products.
I think there is a big marketing and PR play underway as well. The use of the phrases “open source” and “support standards” sounds pretty good. Get the software into the company. When the organization’s boss figures out that the existing tech staff cannot make the open source software work as everyone believed it would, then the consulting engineers are ready to pounce.
My view is that one needs to bring the same discipline to defining requirements, testing software, and performing financial analyses regardless of the software type. This means that commercial and open source adherents will have to prove that their products and services can stand and deliver.
Without that discipline, “open source” is little more than a buzzword like “social media.”
Stephen E Arnold, February 22, 2010
No one paid me to write about open source. Because open source is “free” and I was not compensated, I am at a loss to know to whom to report my financial lapse. Maybe the Department of Treasury is the outfit in charge? Treasury knows money or at least how to print it I believe.
A Free Pass for Open Source Search?
February 11, 2010
Dateline: Harrod’s Creek, February 11, 2010
I read Gavin Clarke’s “Microsoft Drops Open Source Birthday Gift with Fast Lucidly Imaginative?” I think the story’s point about “a free pass” to “open source search providers like Lucid Imagination” is interesting. However, I am not willing to accept “free pass”, a variant of the “free lunch” in my opinion.
Here’s my view from the pleasant clime of snowy Harrod’s Creek.
First, in my opinion, most of the Fast Search & Transfer licensees bought into the “one size fits all” approach to search: facets, reports, access to structured and unstructured data, etc. As many of these licensees discovered, the cost of making Fast’s search technology deliver on the marketing PowerPoints was high. Furthermore, some like me learned how difficult it was for certain licensees to get the moving parts in sync quickly. Fast ESP consisted, prior to the Microsoft buy out, of keyword search, semantics from a team in Germany, third-party magic from companies like Lexalytics, home brew code from Norwegian wizards, and outright acquisitions for publishing and content management functionality. Wisely, many search vendors have learned to steer clear of the path that Fast Search & Transfer chopped through the sales wilderness. This means that orphaned Fast Search licensees may be looking at procurements that narrow the scope of search and content processing systems. In fact, there are only a handful of vendors who are now pitching the “kitchen sink” approach to search.
Second, open source search solutions are not created equal. Some are tool kits; others are ready-to-run systems. Lucid Imagination has a good public relations presence in certain places; for example, San Francisco. For those who monitor the search space, there are some other open source vendors that may provide some options. I particularly like the open source version of Lucene available from Tesuji.eu. Ah, never heard of the outfit, right? I also find the FLAX system available from Lemur Consulting useful. I think the issues with Fast Search & Transfer are not going to be resolved by ringing up a single vendor and saying, “We’re ready to go with your open source solution.” The more prudent approach is going to be understanding what the differences among various open source search solutions are and then determining if an organization’s specific requirements match up to one of these firms’ service offerings. Open source, therefore, requires some work, and I don’t think a knee jerk reaction or a sweeping statement that the Microsoft announcement will deliver a “free pass” is accurate.
Lucene and Integrated Log Data
February 5, 2010
You may find “Into the Cloud: How Search Unlocks Log Metadata to Visualize Your Business Process” interesting if you are an open source technology maven. The idea is that different applications generate log files. When these log files are aggregated, the information that can be searched reveals insights about a business, customers, system issues, etc. The participants are Boomi and Lucid Imagination. Boomi is the “integration cloud company”. You can get more information at www.boomi.com. Lucid Imagination is the company that creates a build of Lucene and Solr that is current, complete, and ready to install. Lucid sells engineering services, and I have a hunch some services will be required to deliver unlocked log data.
After listening to the program, I had several questions:
First, the notion of integrating log files is a good one but I wondered how long it takes to suck big log files, determine deltas, and then update the indexes.
The second question pivots on the usefulness of search for log file analysis. In my experience, we have had to jump through hoops to concatenate certain query results, perform sub queries, and then crunch data. The bigger the log files, the more work these steps were.
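To make the "determine deltas" step in the first question concrete, here is a minimal sketch of one common way to do it: remember a byte offset per log file and read only what was appended since the last pass. This is not how Boomi or Lucid Imagination do it; the podcast does not say. The function name `read_new_lines` and the `offsets` dictionary are hypothetical, and the returned lines would still have to be handed to whatever indexer is in use.

```python
import os
from typing import Dict, List


def read_new_lines(path: str, offsets: Dict[str, int]) -> List[str]:
    """Return complete lines appended to `path` since the previous call.

    `offsets` maps a file path to the byte position already processed.
    It is updated in place so the next call starts where this one ended.
    """
    start = offsets.get(path, 0)
    # Assumption: a file shorter than the saved offset was rotated or truncated.
    if os.path.getsize(path) < start:
        start = 0
    with open(path, "rb") as handle:
        handle.seek(start)
        data = handle.read()
    # Keep only complete lines; a partial trailing line waits for the next pass.
    end = data.rfind(b"\n") + 1
    offsets[path] = start + end
    return data[:end].decode("utf-8", errors="replace").splitlines()
```

The cost of each pass is proportional to the new bytes, not the full file, which is why the size of the delta matters more than the size of the log. Re-indexing and the query-side work described in the second question are separate costs that this sketch does not address.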
Listen to the podcast. The idea is interesting, and I think the market uptake on this idea will be the proof of the pudding.
Stephen E Arnold, February 5, 2010
No one paid me to listen to the podcast or write this article. Too bad. I will report this failure to get paid to the Department of Labor. Too bad I am not a child. I could report myself for unfair practices.
Embedding Lucene
January 31, 2010
The goslings and I participated in a search conference call last week. One of the topics du jour is Lucene. The open source search system continues to fascinate certain government procurement teams and those looking for a low-cost way to provide users with a search-and-retrieval system. The enthusiasm for Lucene and Solr goes up as the age of the information technology professionals decreases. Whatever universities are putting in the Red Bull sold in computer science departments seems to trigger a Lucene / Solr craving.
In the course of the conversation, I mentioned embedding Lucene in commercial software. The advantages ranged from low cost to sidestepping the blow-back from customers. The blow back occurs when the users of software want a feature not in the OEM “stub” embedded in a system or gizmo. The fix is to buy the full version of the software. The “stub” is a good enough chunk of functionality, but it won’t do the fancy back flips some users want when looking for information.
Lucene can be extended as long as the outfit doing the embedding has some Lucene experts on staff or access to a consultant able to keep appointments, complete work on time and in budget, and write code that works. The example I gave was the Lucene within Scribovox.com.
Scribovox is software that performs such tricks as converting a podcast to text. You can get more information about the product at http://www.scribvox.com. The information I referenced came from a June 17, 2009 Scribovox design document called “Integration with Social Networks.” I found the information in this write up quite useful, and you can download a copy of the paper from this link.
The author of the paper is Patrick Nicholas. He discusses some interesting ideas; for example:
- Flow diagrams for processing real time content
- A useful architecture diagram
- A discussion of indexing and summarization
- Some information about Amazon EC2, MapReduce and Hadoop.
If you are serious about open source, I would tuck this document in your bag of tricks. The time estimation puts search and semantics into perspective. Useful for the azure chip crowd since most don’t have too much, if any, oil under their fingers from removing the fuel injection unit from a search system.
Stephen E Arnold, January 31, 2010
A freebie. No one paid me to write this. I will report this charitable act to the boss at the National Cathedral on Wisconsin Avenue, in Washington, DC.
Palantir Describes Lucene Searching with a Twist
January 27, 2010
If you do work in law enforcement, financial services, or intelligence (business or governmental), chances are high that you know about Palantir. The firm provides sophisticated data analysis and analytics tools for industrial-strength information jobs.
In August 2009 and October 2009, the company published a two-part discussion of its approach to search and retrieval. I had occasion to update my file about Palantir technology, and I reviewed these two write ups. Both appeared in the Palantir Web log, and I thought that the information was relevant to some of the issues I am working on in 2010.
The first article is “Palantir: Search with a Twist (Part One: Memory Efficiency).” In that write up, the company points out that it uses the “venerable Java search engine Lucene.” Ah, open source, I thought. Palantir’s engineers encountered some limitations in Lucene and needed to work around these. The article explains that Palantir addressed Lucene’s approach to accumulating search results with a priority queue, streaming through results and inserting into the queue, and returning the set of results in the priority queue. The first article provides a useful summary of the Palantir method.
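The priority queue pattern mentioned above can be sketched in a few lines. This is a generic illustration of streaming through scored hits and keeping only the best k in a bounded heap. It is not Lucene's code and not Palantir's modification, and the function name `top_k` and the (doc_id, score) tuple layout are my assumptions.

```python
import heapq
from typing import Iterable, List, Tuple


def top_k(scored_hits: Iterable[Tuple[int, float]], k: int) -> List[Tuple[int, float]]:
    """Return the k highest-scoring (doc_id, score) pairs, best first."""
    if k <= 0:
        return []
    heap: List[Tuple[float, int]] = []  # min-heap; the weakest kept hit sits at heap[0]
    for doc_id, score in scored_hits:
        entry = (score, doc_id)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif entry > heap[0]:
            # The new hit beats the weakest kept hit, so swap it in.
            heapq.heapreplace(heap, entry)
    # Drain the queue into best-first order for the caller.
    return [(doc_id, score) for score, doc_id in sorted(heap, reverse=True)]


hits = [(1, 0.2), (2, 0.9), (3, 0.5), (4, 0.7), (5, 0.1)]
print(top_k(iter(hits), 3))  # [(2, 0.9), (4, 0.7), (3, 0.5)]
```

The queue never holds more than k entries, so memory depends on the number of results requested, not on the number of hits streamed past it. That is the property that makes memory efficiency a sensible theme for the first Palantir article.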
The second article is “Palantir: Search with a Twist (Part Two: Real-Time Indexing and Security).” This write up explains two approaches Palantir explored to deal with what the company calls “leaking information; namely that there’s data on this object that the user making the query is not privy to.” The write up says:
Given this problem, there are two approaches one can take: [1] Store all the information needed to decide which labels are visible to the user running the query and then use only the visible labels when calculating the relevance of a match. Note that is a pretty expensive operation. [2] Don’t use the length of match to compute relevance. Note that skipping a relevance calculation is, obviously, a very cheap thing do. Which do we do? Both.
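The two approaches in the quoted passage can be contrasted with a toy sketch. Everything here is an assumption for illustration: the `Label` class, the `allowed_groups` field, and the "query length divided by label length" relevance formula are hypothetical stand-ins, not Palantir's data model or scoring.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Label:
    text: str
    allowed_groups: FrozenSet[str]  # hypothetical access-control field


def visible_labels(labels: Iterable[Label], user_groups: FrozenSet[str]) -> List[Label]:
    """Keep only the labels that at least one of the user's groups may see."""
    return [label for label in labels if label.allowed_groups & user_groups]


def score_visible_only(labels: Iterable[Label], query: str, user_groups: FrozenSet[str]) -> float:
    """Approach 1: length-of-match relevance computed over visible labels only."""
    q = query.lower()
    if not q:
        return 0.0
    best = 0.0
    for label in visible_labels(labels, user_groups):
        if q in label.text.lower():
            # Toy relevance: how much of the label the query covers.
            best = max(best, len(q) / len(label.text))
    return best


def score_flat(labels: Iterable[Label], query: str) -> float:
    """Approach 2: ignore match length; any match gets the same score."""
    q = query.lower()
    if not q:
        return 0.0
    return 1.0 if any(q in label.text.lower() for label in labels) else 0.0
```

Approach 1 needs the permission data at scoring time, which is the expense the quote mentions. Approach 2 does no per-label permission work, and its score carries no information about how long a hidden label is. Whether the object should be returned at all is a separate access check that this sketch leaves out.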
I recommend that anyone wrestling with Lucene take a look at these two articles. A third installment has been promised, but I have not yet seen it.
Stephen E Arnold, January 27, 2010
A free search engine warrants a free post. No one paid me to write this. I will report this sad fact to the Department of Labor.


