
Hadoop Caught in Loops

July 4, 2009

Dana Blankenhorn’s “Who Will Control Hadoop?” here raised an important question. His focus was narrow, but I considered his question in a broader context. Mr. Blankenhorn asked:

Do too many Hadoops spoil the code?

In a narrow sense, my view is let many flowers bloom. When the world was less fluid, flaky, and financially challenged, many efforts seemed like a good idea. Now, I am not so sure. Mr. Blankenhorn said:

But some reporters are beginning to ask who is really in charge of Hadoop. Is it Apache or Yahoo? Was Yahoo’s distribution a diss of Facebook, which previously developed its own Hadoop SQL, called Hive? Most projects have a community and a commercial arm. Hadoop’s importance has drawn a number of corporate sponsors to separately deliver their implementations. Microsoft, Yahoo, Google, and Facebook all have their own takes on Hadoop, alongside Apache and Cloudera. All these various Hadoops can be seen as a positive or a negative. As a positive, there is growth and momentum for the framework. As a negative, there are many organizations pulling Hadoop in different directions.

In a broad context, the foundation stone of open source software is that many hands work to create something that is not proprietary, not unstable, and not subject to the whims of a corporate titan. On the other hand, fragmentation of an important technology makes some folks wary of open source.

The way online works is to reward one company with a virtual monopoly. This is a natural consequence of costs and user behavior. The problem is that when one outfit is in control, that organization follows the well worn path of profit and benefit maximization. That can’t be helped either.

In short, I think the same type of financial meltdown that has trashed some individuals’ plans for the future is likely to take place again. Tricky stuff, indeed.

Stephen Arnold, July 4, 2009

Performance Fireworks: Microsoft Fast Fizzles, Google Explodes

July 4, 2009

I was sitting in an airport, and I clicked on a link for Microsoft Fast ESP. A video ran and presented me with a couple of professional fellows talking about Microsoft Fast search. The video was interesting, but I went back and snagged one screen frame from the presentation because it struck me as a way to explain the distance between the performance of Microsoft Fast and the performance of Google’s system. Now performance data for search systems is a murky area. I don’t want to get into a squabble about something being five times faster. The difference here makes a point, and I will leave it to Googlers and Microsofties to post corrected performance data in the Comments section of this Web log, assuming those companies’ professionals have time to read the thoughts of the addled goose.

First, the Microsoft data. Here’s the screenshot, and I want you to notice that the performance that is presented is five to 20 queries per second. That is pretty modest for a performance threshold even for a Microsoft team in Charlotte, North Carolina, where I have heard the pace of life is on par with Harrod’s Creek.

fast performance

Source: http://www.youtube.com/watch?v=kTbcCNby8xE

I ask you to click here to look at the performance data I calculated for Google. The key point is that if the Google data are reasonably accurate, the Google is cranking along at about 1,700 queries per second. Even Yahoo appears to perform better than Microsoft Fast. See my write up here.

That’s a big gap. Assume the Google data are off by a factor of four. The Google is then handling about 400 queries per second. If we boost the top Microsoft Fast figure by a factor of four, from 20 queries per second to 80 queries per second, the Google still appears to be the speed demon.
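The handicap arithmetic above can be checked in a few lines. This is a minimal sketch using only the figures quoted in this post (about 1,700 queries per second for Google, five to 20 for Microsoft Fast, and the factor of four); the function and variable names are mine, chosen for illustration.

```python
# Figures quoted in the post; the factor of four is the post's own thought experiment.
GOOGLE_QPS = 1700          # estimated Google queries per second
FAST_QPS_RANGE = (5, 20)   # Microsoft Fast range shown in the video frame
HANDICAP = 4               # penalty for Google, bonus for Microsoft Fast


def handicapped_gap(google_qps, fast_qps, factor):
    """Return (reduced Google rate, boosted Fast rate, ratio between them)."""
    google_low = google_qps / factor
    fast_high = fast_qps * factor
    return google_low, fast_high, google_low / fast_high


google_low, fast_high, ratio = handicapped_gap(
    GOOGLE_QPS, FAST_QPS_RANGE[1], HANDICAP
)
print(f"Google divided by {HANDICAP}: {google_low:.0f} queries per second")
print(f"Fast multiplied by {HANDICAP}: {fast_high} queries per second")
print(f"Remaining gap: {ratio:.1f}x")
```

Dividing 1,700 by four gives 425, which the post rounds down to 400. Even with the handicap applied in both directions, the Google figure stays roughly five times the boosted Microsoft Fast figure.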

If you want performance fireworks, my thought is that the Google is the fire cracker if the data are correct.

Stephen Arnold, July 4, 2009

Vivisimo Lands HCPro Deal

July 3, 2009

Vivisimo has a new client. HCPro, a health care regulation and revenue cycle management company, will use the Velocity platform to power MedicareFind.com. That Web site offers definable search of a comprehensive database of Medicare rules, regulations, and CMS documents governing reimbursement, a critical tool for many companies in the health care business. According to a press release here, “Velocity’s ease of implementation, flexible user interface and social search features were key business drivers in selecting Vivisimo to power MedicareFind.com.” MedicareFind.com is a part of HCPro Inc., a portfolio company of Halyard Investments. Halyard is a private equity firm with more than $600 million of capital under management focused on investing in media, communications, and business services companies. If this is the type of company Vivisimo is getting contracts with, it may be an even bigger player in search very soon.

Jessica Bratcher, July 3, 2009

Concept Searching Update

July 3, 2009

Founded in 2002, Concept Searching provides licensees with search, auto-classification, taxonomy management and metadata tagging solutions. You can download a fact sheet about the privately held firm here. The software can be used on an individual user’s computer or mounted on servers to deliver enterprise solutions. The company’s secret sauce is its statistical metadata generation and classification method. The technology uses concept extraction and compound term processing to facilitate access to unstructured information. The company operates from Stevenage in Hertfordshire. A list of the Concept Searching offices is here.

The company emphasizes the value of lateral thinking, and its approach to content analysis implements numerical recipes to find these insights and linkages within unstructured text.

When I updated my profile for this company earlier this year, I noted that the firm had signed Portal Solutions, a company that focuses on things Microsoft. The idea is to make it possible for a user to search for “insider dealing” and retrieve documents where that bound phrase does not appear but a related phrase such as “insider trading” does appear. This type of system appeals to intelligence officers and financial analysts. Concept Searching’s methods generated lists of related topics. You can see an example of the system in action by navigating to this page. I ran several test queries and the interface provided useful information and suggestions about other related content in the processed corpus. A screen shot of the output appears below:

concept hmso

Concept Searching is a Microsoft and Fast Search partner. The idea is that Concept Searching’s technology complements and in some cases extends the search and content processing services in Microsoft products. In May 2009, the company sponsored a best practices site for Microsoft SharePoint. The deal involves a number of companies, including SchemaLogic, KnowledgeLake, and K2 Technologies among others. The site is supposed to go live in the next couple of weeks, but I don’t have a URL or a date at this time.

The company had a busy May, signing deals with Allianz Global Investors, Directory, and AT&T Government Solutions.

For me, the most interesting system that Concept Searching offers is its ability to generate and classify terms found in SharePoint documents into a taxonomy. The company has prepared a brief video that demonstrates this functionality. You can find the video here. The company’s approach does not require a separate index. Microsoft Enterprise Search can use the outputs of the Concept Searching system. I noted two “uniques” in the narrative to the video, and I remain skeptical about categorical affirmatives. I think the bound phrase extraction and the close integration with SharePoint are benefits. I just bristle when I hear “unique”, which means the one and only anywhere in the world. Broad assertion in my experience.

concept searching block diagram

Concept Searching’s president, Martin Garland, said here:

Our intellectual property is still unique as we are the only statistical search technology able to identify multi-word patterns within text and insert these patterns directly into the index at ingestion or creation time. We call this “Compound Term Processing”.
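To make the idea of multi-word patterns in an index concrete, here is a toy sketch. It is not Concept Searching’s method, which the company does not describe in detail here. I assume a simple stand-in: any adjacent word pair that appears at least twice across the corpus is treated as a compound term and added to an inverted index alongside the single words. The names `tokenize`, `find_compound_terms`, `build_index`, and the sample documents are all hypothetical.

```python
from collections import Counter, defaultdict

PUNCT = ".,;:!?\"'()"


def tokenize(text):
    """Lowercase the text and strip surrounding punctuation from each word."""
    words = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in words if w]


def find_compound_terms(docs, min_count=2):
    """Return adjacent word pairs seen at least min_count times across docs.

    A raw frequency threshold is an assumption standing in for whatever
    statistical measure a real system would use.
    """
    pair_counts = Counter()
    for text in docs.values():
        words = tokenize(text)
        pair_counts.update(zip(words, words[1:]))
    return {pair for pair, n in pair_counts.items() if n >= min_count}


def build_index(docs, compounds):
    """Build an inverted index of single words plus detected compound terms."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        words = tokenize(text)
        for word in words:
            index[word].add(doc_id)
        for pair in zip(words, words[1:]):
            if pair in compounds:
                index[" ".join(pair)].add(doc_id)
    return index


docs = {
    1: "Regulators opened an insider trading inquiry.",
    2: "The insider trading case went to trial.",
    3: "A trading floor insider spoke to reporters.",
}
compounds = find_compound_terms(docs)
index = build_index(docs, compounds)
print(sorted(index["insider trading"]))  # documents where the phrase occurs
print(sorted(index["insider"]))          # documents where the single word occurs
```

The point of the sketch is the difference between the two lookups: the third document contains both words but not the phrase, so it matches the single-word entry and not the compound entry.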

Last week I sat in a briefing given by a member of Microsoft’s enterprise search team. I thought I heard descriptions of functions that struck me as quite similar to those performed by Concept Searching and such companies as Interse in Copenhagen, Denmark.

I think it will be fruitful to watch what features and functions are baked into the upcoming Microsoft Fast ESP version of the old Fast Search & Transfer system. Remember: the roots of Fast Search stretch deep to 1997, a year before Google poked its nose from the Stanford baby crib.

Partners like Concept Searching have invested significant resources in Microsoft technologies. Will Microsoft respect these investments, or will Microsoft, in an effort to recoup its $1.23 billion investment, take a hard line toward such companies as Concept Searching?

I am on the fence regarding this issue.

Stephen Arnold, July 3, 2009

UFC 2010: HTML 5, Air, and Silverlight

July 3, 2009

Mary Jo Foley opened my eyes to a new unlimited online fighting battle in 2010. Her story with a lamentably cryptic headline appeared on June 11, 2009 as “Microsoft .Net RIA Services: Not until 2010.” You can find the article here. Her story revealed that Microsoft will try to push its Rich Internet Application technology into the market in 2010. She wrote:

.Net RIA Services is designed to allow coders to bring together the .Net programming model with Microsoft’s Silverlight competitor to Adobe Flash. Microsoft made a Community Technology Preview (CTP) of the technology available in March, but didn’t provide any final availability information.

The RIA acronym means stuff like Adobe Flash and Google’s HTML 5 methods. The idea is that a computing device with an Internet connection can look and feel like a traditional application, a DVD player, or an immersive game. The end of shrink-wrap software, and of the money machine that made Microsoft and Adobe the big dogs each is today, is likely to be a whine and a stumble, a limp along and not a footrace.

I want to capture my thoughts about the dust up:

  1. I think Adobe is the weakest of the three combatants in the UFC 2010 digital slugfest. Adobe’s pushing the envelope with its license fees now. The sudden spate of security problems coupled with the balky nature of some Adobe Air implementations means that whatever cash Adobe has will not be enough to cope with the GOOG and the Softies.
  2. The Google team has a quasi-open source angle. The Microsoft team wants everyone to get with the Windows agenda, memorize it, and live it. This is a toss up because Google has been stumbling of late with regard to security, government regulations, and that old annoyance copyright. Microsoft is Microsoft, so it is a force no matter how wacky the Silverlight code may be.
  3. The financial climate, despite the sunny news from TV commentators, looks bleak to me. As a result, each of these UFC 2010 fighters will be ready to rumble. I think fingers in the eyes, low blows, and blows to the back of the neck will be entertaining tactics to watch.

In short, Ms. Foley reminded me to make time in 2010 for this traveling road show.

Stephen Arnold, July 3, 2009

SAP: Dinosaurs Resist Extinction

July 2, 2009

Kelly Fiveash’s “SAP Hits On Demand SaaS Button to Avoid Extinction” here reminded me that I had in my write up pile a comment about the German software giant’s latest reflex action. Ms. Fiveash wrote:

SAP, in a spectacular U-turn, has leapt on board the software-as-a-service bandwagon - the company confirmed its new selling strategy yesterday [June 10, 2009]. The German software giant, which was speaking at an On-Demand conference in Amsterdam on Wednesday, said it will launch SaaS functionality add-ons for its existing Business Suite ERP customers soon. It will wedge open the door to its Large Enterprise on-Demand product, to allow companies to bolt on SAP’s web offerings with their core, on-site or hosted ERP platforms.

I think SAP is one of those companies that merits close observation. The company is a variant of the IBM approach to software and services; that is, big, complex, expensive, and an exemplar of the “take your medicine” method. The SAP TREX search system is interesting, but I don’t see much about it. I track TREX in my Overflight service (sorry, this part of the service is not available for free at this time). I did a write up about TREX in one of the three editions of Enterprise Search Report I wrote. I did not include the system in my 2008 Beyond Search because I just wasn’t hearing much about it. I continue to follow the SAP outfit because it pumped cash into Endeca via its venture unit a year or so ago. I wondered if SAP execs recognized that Endeca required similar upfront consulting for its search and content processing system. The SAP system is front loaded in the same way, and both SAP and Endeca avoid offering bargain basement pricing on enterprise systems.

Now I learn that after a run at raising some fees, SAP is embracing SaaS or Software as a Service which is a more trendy name than timesharing.

Dennis Howlett’s “European SaaS Vendors: Not Quite Comfortable in Their Skins” here made this point in his June 10, 2009 article:

you have John Wookey’s announcement of SAP’s saas plans. Confused or not, it speaks volumes that SAP chose to make the public announcement to the industry itself. It was greeted with muted acceptance with some muttering that it was defensive while others immediately thought ‘cost.’

I have a slightly different view; specifically:

  • SAP is struggling with two financial challenges. The first is the money sucked into SAP’s black hole of engineering. The company has to spend to keep the quite interesting collection of systems and subsystems working for today’s customers. Second, the company has to find a way to fund research that gets the SAP systems out of the dinosaur trap and into the Googzilla type of low cost engineering mode that Messrs. Brin and Page use. Even Amazon has figured out that open source and commodity hardware are a way to control costs. (Amazon reliability is another issue, however.)
  • SAP’s customers are happy because the system is up and running, business procedures are understood by licensees’ employees, and senior management just pays for engineering support and upgrades. The big invoices are behind the company. Happy days!
  • Competitors like Salesforce.com and the Google are not deaf, blind, and mute to the opportunities the IBMs, Microsofts, Oracles, and SAPs create. So, SAP with its juicy client base and “interesting financial challenges” chugs along with a system creaking under complexity, almost immune to substantive change.

I think sudden shifts like the SaaS “love” are little more than signals that an era is ending. I keep watching for similar indicators from IBM, Microsoft, and Oracle. I wonder which of these three will follow in the footsteps of the SAP dinosaur?

Stephen Arnold, July 2, 2009

Polyspot: Version 4.8 Released

July 2, 2009

Speed is the name of the game in search, and Polyspot is keeping its hand in the pot. The French company just released an update of its enterprise search product, V4.8. The new version “speeds faceted search and navigation” and is designed to accurately retrieve relevant information intuitively. Polyspot works by extracting information, metadata, and tags while indexing data repositories, and then it accesses those indexes to speed the search process. The product allows parametric search and faceted navigation while fine tuning results through widgets and tag clouds. There are also filters to allow for combinations of search terms. V4.8 also has an Administrative Console that facilitates facet definition, calibration of a host of widgets, and display options. You can check out the product here.
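For readers who have not seen how metadata extracted at index time supports faceted navigation, here is a toy sketch. It does not describe Polyspot’s implementation. I assume a simple in-memory model in which each record carries a few metadata fields, and each (field, value) pair maps to a set of record ids. The record data and the names `build_facet_index`, `filter_by_facets`, and `facet_counts` are made up for illustration.

```python
from collections import Counter, defaultdict


def build_facet_index(records):
    """Map each (field, value) pair to the ids of the records carrying it."""
    facet_index = defaultdict(set)
    for rec_id, metadata in records.items():
        for field, value in metadata.items():
            facet_index[(field, value)].add(rec_id)
    return facet_index


def filter_by_facets(facet_index, all_ids, selections):
    """Intersect the id sets for every selected (field, value) pair."""
    result = set(all_ids)
    for selection in selections:
        result &= facet_index.get(selection, set())
    return result


def facet_counts(records, ids, field):
    """Count the values of one field across the current result set."""
    return Counter(records[i][field] for i in ids if field in records[i])


# Hypothetical records standing in for metadata extracted during indexing.
records = {
    1: {"type": "pdf", "dept": "legal", "year": 2008},
    2: {"type": "pdf", "dept": "sales", "year": 2009},
    3: {"type": "doc", "dept": "legal", "year": 2009},
    4: {"type": "pdf", "dept": "legal", "year": 2009},
}
facet_index = build_facet_index(records)
hits = filter_by_facets(
    facet_index, records, [("type", "pdf"), ("dept", "legal")]
)
print(sorted(hits))
print(facet_counts(records, hits, "year"))
```

Because the id sets are built once at index time, narrowing by a facet is a set intersection rather than a rescan of the documents, which is the general reason precomputed metadata can speed this kind of navigation.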

Jessica Bratcher, July 2, 2009

Monitoring, Snooping, and Search

July 2, 2009

Every time I mention to an audience of information professionals the value of monitoring information flow, I see lots of rolling eyes and disgusted looks. Too bad. Snooping, monitoring, and search are fast friends. Don’t believe it? Click here and read “IT Staff Snooping on Colleagues on Rise: Survey”. Tarmo Virki summarized a number of data points. Among those that I found interesting was this factoid: One third of IT professionals abuse administrative passwords. The findings of the Cyber-Ark study are almost identical to those of a study run in 2008. The passage that jumped out at me was:

Cyber-Ark said the most common areas respondents indicated they access are HR records, followed by customer databases, M&A plans, layoff lists and lastly, marketing information.

Troubled? Concerned? More information appeared in the original article. Accurate? A spoof?

Stephen Arnold, June 14, 2009

Oracle Salesforce Rumor: A Summer Thriller

July 1, 2009

I heard chatter at the Gilbane conference in San Francisco on June 4, 2009. I did not know the slick, 20 something who was explaining over his Pop Tart that Oracle was interested in Salesforce.com. Now the story “pops” into my feed reader with a Reuters’ logo, a byline for Jim Finkle, and the rumor elevated to the status of mainstream media “story”. You can try to locate the Reuters’ story “Sales Force CEO Downplays Chatter of Sale to Oracle” but I have had some 404s of late. These Reuters’ stories are too valuable to be left where my feed reader first pointed. Go figure. Anyway, Mr. Finkle wrote:

Salesforce.com Inc Chief Executive Marc Benioff downplayed persistent speculation that bigger rival Oracle Corp may buy his Web-based software company. Oracle CEO Larry Ellison was an early investor and one-time board member in San Francisco-based Salesforce but Benioff told Reuters on Monday [June 29, 2009]: “If he wanted to buy it, he would have.”

A couple of thoughts flapped through the addled goose’s tiny brain:

  • Google has been a cheerleader for Salesforce.com for quite a while. Google, however, has not made overt moves to acquire Salesforce.com. If Oracle shows interest, might that urge Googzilla to snap up Salesforce.com along with its real sales team and its customers?
  • Despite Mr. Ellison’s investment in Salesforce.com, I have sensed some cattiness about Salesforce.com’s success with its off premises, cloud based service. Even though Oracle beats at the heart of the Salesforce.com system, the model challenges Oracle’s on premises approach. A purchase might lead to some sudden changes in Salesforce.com. I think of this management approach as oncology management.
  • With a great deal of cash slopping around in some investment firms’ wallets, if Salesforce.com is in play, there may be some left field buyers in the game.

Nothing like a buy out rumor to add zest to the summer financial drama. My hunch is that this thriller may have a touch of Hollywood, however. Whatever happens, I think Google benefits. That company’s search and glue code makes contributions to both Oracle and Salesforce.com. Neither company has a search system that rises above unsalted popcorn. Google may end up a winner by providing search and other services no matter how the script unfolds.

Stephen Arnold, July 1, 2009

Lucid Imagination Offers Connectors to Lucene Solr Systems

June 30, 2009

Lucid Imagination now resells ISYS Search Software file filters to offer content access capability to Lucene/Solr open source search systems. This clever move has the happy side effect of allowing Lucid to market the filters, a set of .dlls (dynamic link libraries) normally used in retail products for text extraction, to its own customers, effectively stretching its Lucene/Solr search product into the pay-for-service enterprise data field. It’s a streamlined effort designed to be significantly cheaper than competitor connectors, and it gets around one of the barriers to broader uptake of open source search technology. Most commercial search vendors do not unbundle their connectors and often use them to justify higher price tags. This deal may take a lot of wind out of their sails. Lucid will offer five categories of content filters, available separately or in any combination, so a company can customize based on its search needs. Beyond Search was surprised that a commercial search vendor is unbundling its technology. The plus is that it gets the Australian company’s foot in the door to the open source market. Meanwhile, Lucid is on the move to strengthen its position bridging the gap between open source and commercial software and will be signing up other commercial software components so Lucene/Solr users can build more robust search solutions.

Jessica Bratcher, June 30, 2009
