Simplexo: Another Open Source Enterprise Search Platform

September 11, 2008

A satisfied reader alerted me to the Simplexo open source search announcement today. Simplexo offers its search system on the open source plan. If you use the engine, you can pay Simplexo to customize, support, and tune the system. The business model strikes me as quite similar to Lemur Consulting’s approach described here.

The article “Simplexo Launches Open Source Enterprise Search Platform” by Steve Evans appeared in CBROnline.com here. Simplexo offers a Butler Group (a Datamonitor Company) “audit” for download here.

According to Mr. Evans’ write up:

The software is capable of searching through unstructured data such as email, word processor documents, images, text files and spreadsheets, as well structured data including databases, payroll, HR systems, and SAP.

Mr. Evans identifies one interesting method used by Simplexo. He observed:

Simplexo Enterprise uses the indexing capabilities of databases and other legacy software and therefore does not need to index this data. It only indexes unstructured data, reducing the amount of resources taken up by search indexes.
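The approach Mr. Evans describes can be sketched in a few lines. This is a hedged illustration only, not Simplexo's code: the `employees` table, the sample documents, and the function names are all hypothetical. The sketch builds its own inverted index for unstructured text and hands structured lookups to the database, which answers them from an index it already maintains.

```python
import sqlite3
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase token to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for token in text.lower().split():
            index[token].add(name)
    return index


def federated_search(term, index, conn):
    """Combine hits from our own text index with hits from the database."""
    unstructured_hits = sorted(index.get(term.lower(), set()))
    # The database resolves this lookup itself; no copy of the table
    # is added to the search index.
    rows = conn.execute(
        "SELECT name FROM employees WHERE department = ? ORDER BY name",
        (term,),
    ).fetchall()
    structured_hits = [row[0] for row in rows]
    return {"unstructured": unstructured_hits, "structured": structured_hits}


# Hypothetical sample data, for illustration only.
docs = {
    "memo.txt": "Payroll schedule for the finance team",
    "notes.txt": "Meeting notes about search",
}
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT)")
conn.execute("CREATE INDEX idx_department ON employees (department)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Ann", "finance"), ("Bob", "sales")],
)

index = build_inverted_index(docs)
results = federated_search("finance", index, conn)
print(results)
```

The point of the design is that the search layer stores nothing about the structured rows. Only the free text consumes index space.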

The Butler “audit” noted that Simplexo opened for business in September 2008 and is a start up. The company generates revenue by supporting and customizing its open source search system. The product analysis seemed a bit sketchy, which is not surprising. The Butler “auditor” reported that the system can index two terabytes of data in about five hours. I urge you to download and read the Butler “audit”. I don’t want to recycle that firm’s information for a new company whose technology is unfamiliar to me. You can absorb the “audit” yourself and decide if the system is right for you.
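To put the indexing figure in perspective, here is a rough back-of-the-envelope calculation. It assumes decimal terabytes (10^12 bytes) and a constant rate over the five hours; the “audit” summary above specifies neither, so treat the result as an approximation.

```python
TERABYTE = 10**12  # assumption: decimal terabytes, not binary tebibytes


def throughput_mb_per_second(total_bytes, hours):
    """Average indexing rate in megabytes (10^6 bytes) per second."""
    seconds = hours * 3600
    return total_bytes / seconds / 10**6


rate = throughput_mb_per_second(2 * TERABYTE, 5)
print(f"{rate:.0f} MB/s")  # prints "111 MB/s"
```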

More information about the company is available at www.simplexo.com or click here.

My view on open source search engines is that this is becoming a sector with a number of options for the organization interested in this approach. I mentioned Lemur Consulting. I have also written about Tesuji here. You can also take a look at my write up about Lucene here. These open source options are selective, not comprehensive.

I am neutral on open source search solutions. If you have the technical resources, open source can deliver excellent results. If you are not comfortable with open source, then you may be better served by running a try-before-you-buy analysis and then a bake off. Let your data collection guide you.

Stephen Arnold, September 11, 2008

The 451 Group’s SharePoint Data

September 10, 2008

A happy quack to the reader who called “Old News Department: Continued Growth for SharePoint” to my attention. “Too Much Information” is a Web log published by the 451 Group. (The number 451 echoes the science fiction story and reminds us of the temperature at which paper ignites on earth under “normal” conditions. A book burning is, therefore, a 451 event.) You can read the original 451 “take” here. I quite like the “old news” angle. I’m a specialist in a number of “old” postings. The reader wanted my comment on this piece of data picked up from Microsoft via a news release cited in the “Old News Department: Continued Growth for SharePoint” article; to wit:

Microsoft claimed $800m in SharePoint revenue (in a press release) last year for fiscal 2007, so 30% growth puts 2008 revenue at $1.04 billion, 35% growth puts it at $1.08 billion.  The company also made a rather vague announcement in March the SharePoint Conference and via a press release that it had surpassed the $1 billion revenue mark.  At that point, we dug into it to find the $1 billion number was for the rolling twelve-month period.

The 451 Group pointed out that the numbers were mushy. In my experience, most numbers related to software companies’ revenues and customers are indeed soft. I still don’t know the final numbers on the Bear Stearns fiasco, the Enron scam, or the US government’s budget for software licenses at the General Services Administration. Therefore, it’s a safe bet that SharePoint numbers will be squishy too.
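The arithmetic in the 451 Group’s quote is easy to verify. The sketch below applies the two growth rates to the $800 million base, using integer math so no rounding creeps in; the function name is mine, not the 451 Group’s.

```python
def project_revenue(base_millions, growth_percent):
    """Apply a whole-number growth percentage using integer arithmetic."""
    return base_millions * (100 + growth_percent) // 100


fiscal_2007 = 800  # millions of dollars, per the quoted press release
low = project_revenue(fiscal_2007, 30)
high = project_revenue(fiscal_2007, 35)
print(low, high)  # prints "1040 1080"
```

Both projections land just above $1 billion, which matches the figures quoted above.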

Let’s assume, however, that SharePoint is a multi billion dollar product. Further, let’s accept the idea that there are upwards of 100 million SharePoint licenses in the wild. And, let’s embrace the notion that SharePoint is Microsoft’s next generation operating system. If these assumptions are correct within a range of plus or minus 20 percent, here’s my take on the growth of SharePoint:

  1. The incredibly wild and wacky world of content management is going to face a nuclear winter. Already discredited in many organizations, content management systems, like key word enterprise search systems, don’t work, disappoint their users, and are incredibly expensive to operate. SharePoint may not be the best cookie in the batch, but Microsoft is making it easy and economical to get SharePoint and “do” content management. Interwoven, Documentum, Ektron, and the rest of the CMS crowd will have to do some fancy dancing to keep their revenues flowing and stakeholders wearing happy faces.
  2. SharePoint itself is going to be a big consulting business. For the most part, SharePoint works when one doesn’t ask too much of the system. Two or three people can share and collaborate. The search function is pretty awful, but that can be fixed with a quick phone call to ISYS Search Software, an outfit whose software we just tested. Watch for this write up as a feature on September 15, 2008.
  3. The Microsoft ecosystem is going to follow the trajectory of the mainframe ecosystem or the Oracle database ecosystem. The environment will change but the micro climates will persist within organizations for a long time. Certified Professionals will fight tooth and nail to keep SharePoint and their jobs.

The net net on SharePoint for me is that software is following the consolidation route traveled by the auto companies. Chrysler, Ford, and GM are not competitive. These giants are suffering financial emphysema. Death can be postponed, but none of these companies will be doing much more than walking slowly to the local convenience store to buy a microwaved burrito.

Users are going to be the losers. SharePoint is a complex system. It hogs resources. The scale up and out model becomes too costly for most organizations. I think of SharePoint as the digital equivalent of train travel in the US. Yes, one can do it, but the journey is filled with uncertainties. When the train breaks down, the passenger has little choice but to wait until the repairs are made and the journey can resume. When the trip is over, passengers step off the Amtrak thankful to have arrived and eager to put the experience behind them.

And CMS? It won’t survive in its present form. Most CMS vendors will struggle to survive on the margin of the SharePoint ecosystem and have to fight off predators hungry for the customers CMS companies have been able to retain. The phrase “nasty, brutish, and short” comes to mind. Squishy numbers or not, SharePoint is the cat’s pajamas.

Stephen Arnold, September 10, 2008

Google: Employing Lawyers by the Score

September 10, 2008

A Walt Disney wizard is on the job for the US government. The issue, well stated by Jeff Jarvis, is “We Hate Success.” You can read his take on the latest and more threatening legal challenge to Google in its 10 year history here. The hook for his write up is the increasing heat directed at Google’s feet about “its growing dominance in advertising.” I agree with Mr. Jarvis when he wrote:

I’ve long argued that we do, indeed, need competition in the ad market but it’s not going to come from regulation. It’s going to come from getting off our asses and creating those competitors. I said that we need an open-source ad marketplace. Nobody’s heeded that advice.

I would like to add several comments about my perception and Mr. Jarvis’ regarding Google.

First, I do not believe that advertisers, telcos, and publishers understand what Google has built, what it is doing, or how the physics of Google operates in the cloud chamber businesses these business sectors try to enforce. Without understanding, there’s precious little hope of developing an effective, appropriate response to Google. These industries are watching Google’s digital beams punch holes in their semi closed business environments. Left to its own devices, Google will vaporize the “walls”. Then what?

Second, the competitors have watched Google through search colored glasses with some frou frou trim called online advertising. Competitors have assumed that with traffic, ad dollars would flow to them. So, the perception of what was needed to respond to Google was wrong in 1998 and it is wrong today–a decade later. That’s a pretty long time to misdiagnose a problem. At this point, there is no single company with sufficient resources to leap frog Google. I make this point in my 2007 study Google Version 2.0. That’s the reason that the billions spent by Microsoft haven’t delivered. The company is not investing enough. Meanwhile, Google keeps on widening its lead in technology, customer traffic, and advertising. You will have to read my study to find out who and what can get past the GOOG.

Third, local regulation won’t do the job. Where is Google selling advertising? In what country is the money booked? What is Google selling?

I am not sure that an auction, operated from servers somewhere in the cloud and arguably not in the US, and delivering a message on behalf of a person or company to an unknown user who happens to have an interest in a subject is going to be easy to limit. Google is not a local newspaper selling a fungible product with a guaranteed circulation, a physical product, and specific customers who are betting that the newspaper can catch the interest of a person wanting to buy a used car.

In short, lawyers will make a great deal of money chasing Google and its monopoly. The problem is that Google is a supra national entity dealing in zeros and ones. I was on the fringes of the AT&T break up in the late 1970s and early 1980s. That was trivial compared to dealing with the actions of individuals who do what ad agencies used to do by themselves on servers routing the work hither and yon.

The definition of terms will generate enough billable hours to keep a legion of attorneys gainfully employed for a long time. When a decision is reached, the question becomes, “Will Google be the same at that point in time as it was when the legal engine started running?” I don’t think so.

Stephen Arnold, September 10, 2008

Redshift: With Google It Depends From Where You Observe

September 10, 2008

My research suggests that opportunities, money, and customers are rushing toward Google. Competitors–like publishers–are trying to rush away, but the “gravitational pull” is too great. Traditional publishers don’t have the escape velocity to break away. Which is this: a redshift or a blueshift?

Dr. Greg Papadopoulos, Sun Microsystems wizard, gave a talk at the 2007 Analyst Summit (summit is an over used word in the conference universe in my opinion) called “Redshift: The Explosion of Massive Scale Systems.” I think much of the analysis is right on, but the notion of a “redshift” (not a misspelling) applies to rushing away from something, not rushing toward something. You can download a copy of this interesting presentation here. (Verified on September 9, 2008).

Dr. Papadopoulos referenced Google in this lecture in 2007. For the purposes of this post, I will think of his remarks as concerning Google. I’m a captive of my own narrow research. I think that’s why this presentation nagged at my mind for a year. Today, reading about hadron colliders and string theory, I realized that it depends on where one stands when observing Doppler effects. From my vantage point, I don’t think Google was a redshift. You can brush up on this notion by scanning the Wikipedia entry, which seems okay to me, but I am no theoretical physicist. I did work at a nuclear engineering firm, but I worked on goose feathers, not gluons and colors. From what I recall, when the object speeds away from the observer, you get the “red shift”. When the object rushes towards the observer, you get the blue shift. Redshift means the universe is expanding when one observes certain phenomena from earth. Blueshift means something is coming at you. Google is pretty darn blue to my eyes.
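For readers who want the convention pinned down, here is a minimal sketch of the standard definition: the shift z is the observed wavelength minus the emitted wavelength, divided by the emitted wavelength. A positive z is a redshift (source receding); a negative z is a blueshift (source approaching). The 656.3 nm value is the hydrogen-alpha line; the observed wavelengths are made-up numbers for illustration.

```python
def doppler_shift(emitted_nm, observed_nm):
    """Return (z, label), where z = (observed - emitted) / emitted."""
    z = (observed_nm - emitted_nm) / emitted_nm
    if z > 0:
        label = "redshift"   # wavelength stretched: source moving away
    elif z < 0:
        label = "blueshift"  # wavelength compressed: source approaching
    else:
        label = "no shift"
    return z, label


# Illustrative observed wavelengths only; not measurements of anything.
receding = doppler_shift(656.3, 700.0)
approaching = doppler_shift(656.3, 600.0)
print(receding[1], approaching[1])  # prints "redshift blueshift"
```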

The Papadopoulos presentation contains a wealth of interesting and useful data. I am fighting the urge to cut, paste, borrow, and recycle. But there are three points that warrant a comment.

Read more

Google Chrome: What’s It Mean

September 10, 2008

Author’s Note: this post is speculation about the “meaning” of chrome.

Over the weekend, I spoke with a colleague who was interested in the metaphor behind Google’s choice of the word chrome as the name for the beta of the Google browser. There’s a firestorm of controversy raging over what that Google browser is. I want to steer clear of that discussion. I have written about Google’s technology elsewhere and concluded in 2005 that Google is now building applications for its infrastructure. The browser is just an application, which means that it is not “just” a browser.

Back to our conversation: chrome is an interesting choice. I argued that the meaning of “chrome” was a bright, shiny surface, tougher than the lower grade compound to which it is applied. I was thinking of the bumpers on my restored 1973 Grandville convertible, which gets an awesome five miles to the gallon.

The first metaphor, then, is a shiny, hard surface. Could Google Chrome make the innards of Google more attractive? If so, then, it follows that the surface would protect the underlying parts. Makes sense to me. I think this “meaning” works quite well.

Chrome also is an alternative name for the Oxygene programming language. Based on Object Pascal, Chrome is adept at lambda expressions. Could the meaning of chrome be a reference to the functions of this specialized programming language? I think this is an outlier. More information about this language is here.

Chrome carries the connotation of bright colors and hyper reality. The source for this interpretation is Kodak Kodachrome transparency film. John Evans, a professional photographer based in Pittsburgh, told me, “Kodachrome makes nuclear power plants look good.” Maybe? I do like the suggestion of heightening reality. Could Google Chrome heighten the reality of a browser experience?

Chrome is a fictional mutant character in Marvel Comics’ Universe. I often refer to Google as Googzilla. I must admit I have a predisposition to this “meaning” of chrome.

Chrome refers to music. There’s an XM Radio channel by that name, an album by Trace Adkins, who is popular in rural Kentucky, and a track on Debbie Harry’s album Koo Koo.

What does this tell us? Not much I fear.

Stephen Arnold, September 14, 2008

Autonomy Marries the New Ma Bell

September 9, 2008

North American media outlets were choked with information about Google’s scanning newspapers. The news that AT&T inked a deal with Autonomy received modest coverage. What made it to my newsreader was AT&T’s decision to license Yahoo for mobile search.

What’s the Autonomy deal?

Based on the sketchy information I have, Autonomy licensed its search and content processing technology to AT&T. The deal is worth several million dollars and follows on another mega-deal with Home Box Office (HBO). As other search and content vendors thrash for sales, Autonomy continues its “mega deal” method. I expect that Autonomy’s share price will perk up a tad. Maybe Oracle or some other super platform should bite the bullet and buy Autonomy. Autonomy can close deals in a dismal economy; some vendors cannot.

Stephen Arnold, September 9, 2008

GooNews: Google Dooms Some Commercial Database Publishers

September 9, 2008

I have been mired in family business about 90 miles south of Chicago. I was unfortunately unable to add my two cents to the Web wave of comments about Google’s scanning newspapers. Anyone remember University Microfilms, the outfit that put newspapers on–yuck–microfilm?  Techmeme and Megite have dozens of posts about Google scanning newspapers, and I doubt that my telling you that Google is supplementing its book scanning activities will add much to your day.

My angle on this announcement by Google here is rotated about six degrees off the Web buzz.

First, you can kiss most commercial database publishers good bye as great investments. Customers are tired of paying through the nose for “real” databases. The idea is that Google makes “toy” databases. Wrong. Google is collecting information and making it available with a business model that allows searching for free. Google’s business model is a big earth mover grinding down traditional media. Most traditional media mavens hear crunching but have not connected the noise with the footfalls of the GOOG.

Second, you can ignore those Monday Night Football ads from Thomson Reuters. There were more buzz words about intelligent information and professionals than I could process. Advertising is not going to sell search queries that cost anywhere from $5 to $500 per query. Yes, $500. Fire up Derwent. Hunt for Google patents. Poke around for prior art and let me know how much you pay to search and save your results. Google Patents may not be perfect, but access is free. Ads on Monday Night Football won’t sell searches on WestLaw–ever.

Third, the yip yap of competitors, advertisers, and Google critics won’t make a single iota of difference to what Google is doing. I have been documenting for clients and for readers of my monographs that Google is a supra national enterprise. So tell me, “Who is going to regulate Google?” One wealthy wizard screamed at me when I hinted that Google could fold its tent and move to another country without much downtime. When I suggested Russia and mentioned Mr. Brin’s interest in going into space, the wealthy wizard foamed at the mouth. I think he threw a pencil at me. If GooNews wipes out companies in the archived news business, to whom does one complain?

In short, GooNews is the start of a new era at Google. I dubbed the company Googzilla in 2005. No one paid much attention. Bet those folks at ProQuest and Newsbank are perking up now. Agree? Disagree? Help me learn. Just bring facts.

Stephen Arnold, September 9, 2008

First Search Mini-Profile: Stratify

September 9, 2008

Beyond Search has started its search and content processing mini-profile series.

The first profile is about Stratify, and you can read it here.

The goal is to publish each week a brief snapshot of selected search and content processing vendors. The format of each profile will be a short essay that covers the background of the system, its principal features, strengths, weaknesses, and an observation. The idea inspiring each profile is to create a basic summary. Each vendor is invited to post additional information, links, and updates. On a schedule yet to be determined, each mini-profile will be updated and the comments providing new information deleted. The system allows a reasonable trade off between editorial control and vendor supplements. We will try to adhere to the weekly schedule. Our “Search Wizards Speak” series has been well received, and we will add interviews, but the interest in profiles has been good. Remember. You don’t need to write me “off the record” or even worse call me to provide insights, updates, and emendations. Please, use the comments section for each profile. I have other work to do. Although I enjoy meeting new people via email and the phone, the volume of messages to me is rising rapidly. Enjoy the Stratify post. You will find the profiles under the “Profile” tab on the splash page for the Web log. I will post a short news item when a new profile becomes available. Each profile will be indexed with the key word “profile”.

Stephen Arnold, September

Stratify: Discovery System 3.x

September 9, 2008

by Nicholas E. Stover for Beyond Search

Basics

Stratify, formerly Purple Yogi, provides eDiscovery services intended to help attorneys minimize the resources needed to analyze and manage documents. The company is owned by Iron Mountain, a diversified records management company. More specifically, technology provided by Stratify aids in the search, retrieval, and management of information required for a legal matter. Founded in 1999, the company has offices in Mountain View, California, Boston, Massachusetts, and Bangalore, India. Stratify can be found on the Web at www.stratify.com.

Product

Stratify Discovery System 3.x employs statistical and other methods to identify named people, places, and proper nouns from any collection of documents, including content from network file servers, content management systems, Web sites, and data fields. The indexed information and metadata are categorized into the system’s taxonomy, which can be modified by a licensee. Stratify provides a key word search and retrieval system to allow attorneys and paralegals to locate information processed by the system.

Stratify includes several interesting features; for example, the ability to process documents in multiple languages; text mining functions; automatic creation of dynamic taxonomies; discovery interfaces for reporting and analysis; and functions to identify relationships between and among people, locations, organizations and topics.

Customers

The firm’s customers include law firms, corporate legal departments, and US intelligence agencies. Thomson Reuters uses the system. Stratify helps reduce the need for human indexers. Other customers include NASA, the Department of Education, and Canada’s Department of Foreign Affairs.

Strengths

The major advantage of Stratify is that the company’s system has been optimized for eDiscovery and manipulation of information generated in legal matters. The company was an early entrant in the text mining sector. The firm’s product now includes visualization tools. The utility of these ways of viewing query results will vary from licensee to licensee. Now included are “heat maps” (to show the user hot spots), neighbor maps (showing topics or entities that have a direct, non-mediated relationship to each other), and network graphs (sometimes called social graphs) to help identify direct and indirect relationships between entities.

Weaknesses

Stratify systems begin in the $100,000 range. This license fee does not include customization or additional engineering services required by the licensee. Over the years, the system benefits from the involvement of a subject matter expert. An organization with little or no experience with enterprise search is not likely to understand or benefit from the software.

Price

Current pricing for the Stratify 3.x system begins at about $100,000. A custom price quote is required.

Summary

The Stratify software search-and-retrieval system lacks some of the case management functions that certain competitors are now bundling with their eDiscovery systems. Performance can be an issue if the system is not properly resourced.

Web site: www.stratify.com

Chrome: What It Isn’t

September 8, 2008

The writing is a bit salty, but I found Ted Dziuba’s post at http://teddziuba.com/2008/09/a-web-os-are-you-dense.html quite interesting. Mr. Dziuba asserts that Chrome is not a Web operating system. I agree. The most interesting comment in his article “A Web OS. Are You Dense?” was:

The “Web Operating System” just highlights how much journalists don’t know about computers.

I must admit that I enjoyed this post. A happy quack to Mr. Dziuba. More, please. You have an invitation to contribute to my wimpy Web log as well.

Stephen Arnold, September 8, 2008
