Simplexo: Single Click Enterprise Search

April 11, 2009

I wrote about Simplexo in September 2008. You can read that article here. I received a question about open source search vendors last week. As a result, I updated my files. Here’s a summary of the information I added to my database.

Simplexo is a privately held company in the UK. The company says here that it “is focused on delivering a new experience in enterprise search.” The product is available in two versions:

  • An open source version. This version can be used in-house by developers without charge under the Open Source GPL2 licence.
  • A commercial version 9.x. This is a Windows Service application fully-supported via an annual subscription.

The company’s licensing statement is here. The 2008 Butler Group write up is here.

The company supplied one of the first applications to BSkyB for secure online voting.

The features of the product are interesting. The system can acquire and index structured and unstructured information. The approach is to maintain two indexes. One allows access to the structured data from Oracle, SAP or other enterprise applications. The second index handles the unstructured information. One nice feature is the system’s ability to search other indexes, including Microsoft Search Server and Autonomy’s indexes. The company asserts that throughput can hit 10 million documents in 24 hours. I don’t have the basic server configuration for this throughput rate, but most organizations running SharePoint environments will have to look for a solution once the documents in the SharePoint environment hit 50 million.
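To put the vendor’s throughput figure in perspective, here is a back-of-the-envelope sketch in Python. It assumes the claimed 10 million documents in 24 hours is a sustained rate, which Simplexo has not confirmed, and the function names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def docs_per_second(docs_per_day: int) -> float:
    """Convert a documents-per-day rate into documents per second."""
    return docs_per_day / SECONDS_PER_DAY


def days_to_index(total_docs: int, docs_per_day: int) -> float:
    """Days needed to index total_docs at a constant daily rate."""
    return total_docs / docs_per_day


# Assumption: the vendor's claimed rate holds steadily for the whole run.
claimed_rate = 10_000_000

print(round(docs_per_second(claimed_rate), 1))    # about 115.7 documents per second
print(days_to_index(50_000_000, claimed_rate))    # 5.0 days for a 50 million document collection
```

If the claim holds, a 50 million document SharePoint collection would take roughly five days for a first full pass. Real rates depend on hardware and document size, neither of which the company has disclosed.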

A user can specify parameters, use Boolean, or run free text queries. The system supports phrase searching. The system can be integrated into Lotus Notes and Microsoft Office. Security features include encryption, single sign on, and user authentication.

Simplexo says that the system can support more than 600,000 simultaneous users. Another interesting claim is that queries execute six times faster than queries run on Google’s enterprise system. Queries are processed in parallel against the two indexes.
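Simplexo has not published how its parallel query processing works. As a rough illustration only, the Python sketch below sends one query to two toy in-memory indexes at the same time and merges the hits. The index contents, identifiers, and function names are all hypothetical and are not Simplexo’s API.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the structured and unstructured indexes.
STRUCTURED_INDEX = {
    "invoice": ["oracle:row-17", "sap:doc-204"],
    "customer": ["oracle:row-3"],
}
UNSTRUCTURED_INDEX = {
    "invoice": ["mail:msg-88"],
    "contract": ["file:contract.doc"],
}


def search_index(index, term):
    """Return the hits for a single term from one index (empty list if none)."""
    return list(index.get(term.lower(), []))


def federated_search(term):
    """Query both indexes concurrently and return structured hits first."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        structured = pool.submit(search_index, STRUCTURED_INDEX, term)
        unstructured = pool.submit(search_index, UNSTRUCTURED_INDEX, term)
        return structured.result() + unstructured.result()


print(federated_search("invoice"))
```

A production system would also need relevance ranking and security trimming across the two result sets. The sketch shows only the fan-out and merge step.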

A desktop version for Outlook will be available later in 2009. You can register to download the software when it becomes available. The desktop information page is here. The news release about the desktop version is here.

Stephen Arnold, April 11, 2009

RapidMiner: Open Source Data Mining

April 11, 2009

A happy quack to the reader who reminded me that Google Apps supports Java. If you are interested in data mining, you may want to catch up with RapidMiner, an open source data mining system. RapidMiner drinks Java, so you may want to think about ways to make use of Google Apps and RapidMiner. The person who wrote me wanted some information about this idea.

My April 2009 column for KMWorld talks about Google Apps, but I don’t have any information about hooking RapidMiner into Google Apps. In fact, I had not thought about it.

RapidMiner is “the world-wide leading open-source data mining solution due to the combination of its leading-edge technologies and its functional range. Applications of RapidMiner cover a wide range of real-world data mining tasks.” There is an enterprise version plus consulting services available.

You can download the RapidMiner community edition here. The documentation is quite good. You can snag a copy of those documents here. The community edition offers a number of features, and it is extensible. Here’s an example of a data output from RapidMiner:

[Image: RapidMiner data output]

You can find a useful discussion by Michael Wurst of the open source version at Nemoz.org here. This write up provides some useful examples that show one way to hook RapidMiner into a Java application. What is quite useful is the code sample for using the text classifier on a chunk of text. RapidMiner’s classification component is called RapidMinerTextClassifier.

There are some limitations to the Google Apps implementation of Java, but I think the person who wrote me has an interesting idea. The notion of combining sophisticated RapidMiner operations with Google Apps struck me as interesting. If you have any interesting examples of this type of hybridization, use the comments section of this Web log to pass along the information.

Stephen Arnold, April 11, 2009

Twitter: Lover with Two Suitors

April 11, 2009

Network World’s “Twitter Yet Another Microsoft-Google Battleground” here made clear that Twitter is like a cute guy with two aggressive females interested in going to the mall. The Network World story references a cartoon but there’s not much humor in the write up. The article summarizes the tit for tat approach that each company has offered Twitter’s top dogs. The most interesting part of the article in my opinion was this passage:

Google and Microsoft need to be careful so that in their zeal to win the prize, they don’t sabotage it instead. Microsoft has seen the worth of its Facebook stake plummet, now that Facebook’s overall worth has dropped from $15 billion at the time of its deal to just $3 billion to $5 billion today. And Google’s MySpace deal hasn’t panned out to be all that lucrative for the search giant either. The fact that Google’s purchase of Twitter turned out to be either wrong, or very premature, is actually a good thing. Rather than purchasing an outright stake in Twitter now, perhaps it would be wiser to let it grow a bit–all while testing various ad schemes and monetization plans. The risk is that once Twitter hits paydirt, its eventual purchase price will skyrocket. But the downside is that Google may pay a pretty penny for Twitter now, only to see its investment disappear (think Dodgeball, Jaiku, etc.).

Let me offer a couple of observations.

First, jumping into a bidding war in a lousy economy raises an interesting question, “What happens if Twitter turns out to be the next big thing?” The anxiety might be causing the stomachs of the wizards to churn. If the analysis of the Prisoner’s Dilemma is correct, both of these outfits may be throwing bouquets in the hopes of “winning”.

Second, time is the enemy of both Google and Microsoft in this emerging real time search space. Twitter continues to become more useful. The Twitter jobs service here is a glimpse of its utility. A delay could create the same situation Google faces with Windows on the desktop or Microsoft faces with Google’s Web search. Once an outfit gets a bite on the market, shaking the pit bull loose can be tough.

My view is that sales of Tums will go up in Mountain View and Redmond until the wizards open Pandora’s Box.

Stephen Arnold, April 11, 2009

Appearances Are Deceiving

April 11, 2009

Peter Kafka has a wonderful post here. The title tells the tale: “AP Exec: ‘To the Untrained Eye It Looks Like We’re Stupid’”. Do you think? In my opinion, the most interesting comment in the article was this paragraph:

On the confusing message that the AP presented to the world this week: Guilty as charged, says Kennedy [an Associated Press senior manager]. But he argues that his group has indeed given some thought to what it’s doing, even if it hasn’t communicated that clearly to date.

Wow. No, I don’t think appearances are deceiving in this case. What you see is what you get.

Stephen Arnold, April 11, 2009

Wired Explains Why the Children of Publishers Are a Problem

April 11, 2009

Now the remaining hands at the downsized Wired did not say that. I wrote a headline that expresses why the dead tree crowd is paddling against the current at Niagara Falls. First, click to this Wired story here: “Teens Love Aggregation and ‘Free’, Newspaper Study Finds”. Second, consider this snippet from the article:

“Not only are teens not rushing to pay for content, but they also struggle to envision in what realm they would need to pay for content,” said the study, conducted for the NAA by Northwestern University’s Media Management Center. They are less interested in news brands than a site’s usability and depth of content. “Ask teens where they find news, and they typically say Yahoo!, Google, AOL or MSN,” the study said. “Sometimes, they mean Yahoo! and other times they mean Yahoo! News; sometimes they mean Google, the search bar, and other times they mean Google News or iGoogle. And sometimes they say MSN but mean MSNBC.com.”

The problem seems to be what I call demographic. The children of the traditional media giants are the termites in the old media’s business model. I wonder how the media companies will deal with that. Ground them and cut off online access? I heard a rumor that William Gates banned Apple iPhones and iPods from his house. I suppose that works too.

Stephen Arnold, April 11, 2009

Microsoft and Proprietary Chips

April 10, 2009

Stacey Higginbotham’s “Is Microsoft Turning Away from Commodity Server?” here reminded me of a client study I did five or six years ago. Sony was working on a proprietary chip for the PS3. IBM was involved, and I documented the graphics method which built upon IBM technology. In short order, Microsoft and Nintendo signed up with IBM to use its generic chip design for their next generation game devices. Sony ran into three problems. First, costs went through the roof. Sony did not have a core competency in chip design and fabrication, and it was evident even in the sketchy technical information my Overflight service dug out.

Second, the yield on chips is a tricky issue. Without getting into why a yield goes wrong, I focused on the two key factors: time and cost overruns. The costs were brutal, eventually forcing Sony to change its fabrication plans. The time is a matter of public record. Microsoft beat the PS3 to market, and Sony is starting to recover now. We’re talking years of lost revenue, not days or weeks or months.

Third, the developers were stuck in limbo. With new chips, new programming tools and procedures were needed. Without a flow of chips, developers were flying blind. The problem became critical, and when the PS3 launched, the grousing of developers about the complexity of programming the new chip joined with complaints from fanboys that games were in short supply.

Compatibility, availability, and affordability joined the chorus.

Ms. Higginbotham’s article summarized what is known about Microsoft’s alleged interest in creating its own chips for its own servers. The motivator for Microsoft, if I read Ms. Higginbotham’s article correctly, is related to performance. One way to get performance is to get zippier hardware. With faster CPUs and maybe other custom chips, the performance of Microsoft software would improve more than it would by using Intel or AMD CPUs. (Google uses both.)

For me, the most interesting point in her write up was:

The issue of getting software performance to scale linearly with the addition of more cores has become a vexing problem. Plus, as data center operators look for better application performance without expending as many watts, they are experimenting with different kinds of processors that may be better-suited to a particular task, such as using graphics processors for Monte Carlo simulations.

She did not draw any parallels with the Sony chip play. I will:

  1. The Sony Ken Kutaragi chip play provides a good lesson about the risks of rolling your own chips. Without a core competency across multiple disciplines, I think the chance for a misstep is high. Maybe Microsoft is just researching this topic? That’s prudent. Jumping into a proprietary chip may come, but some ramp up may be needed.
  2. Google does many proprietary things. The performance of Google’s system is not the result of a crash project. Time is of the essence because the GOOG is gaining momentum, not losing it. Therefore, the Sony “time problem” with regard to the Xbox may translate into years of lost opportunity. Chip designs are running into fundamental laws of physics, so software solutions may reduce the development time.
  3. The performance problem will not be resolved by faster hardware. Multiple changes are needed across the computing system. There are programming slow downs because tools have to generate zippy code for high speed devices. Most of the slow downs are not caused by calculations. Moving data is the problem. Inefficient designs and code combine with known bottlenecks to choke high performance systems, including those at Google. As the volume of data increases, the plumbing has to be scalable, stable, and dirt cheap. Performance problems are complex and expensive to resolve. Fixes often don’t work, which makes the system slower. Nice, right? Need more data? Ask a SharePoint administrator about the cost and payoff of her last SharePoint scaling exercise.

My view is that one hire does not a chip fab make. Microsoft’s analysts have ample data to understand the costs of custom chip design and fabrication. Google requires immediate attention and rapid, purposeful progress on the part of Microsoft’s engineers. Time is the real enemy now. Without a meaningful competitor, Google seems to enjoy large degrees of freedom.

Stephen Arnold, April 10, 2009

Digital Gutenberg Study Completed

April 10, 2009

Infonortics Ltd. received the manuscript for Google: The Digital Gutenberg yesterday, April 10, 2009. The monograph is the third in my series of Google analyses. The topics addressed in this new study include:

  • Google’s content automation methods
  • A discussion of dataspace functions, the report or dossier system, and the content-that-follows system
  • A description of Google’s increasing impact on education, scholarly publishing, and commercial online

The information in the study comes from open sources such as Google’s presentations, technical reports, and US government filings to the SEC and USPTO. I have revised and updated some of the information I wrote for Bear Stearns, Trust Company of the West, and IDC for this study as well as included completely new material that, as far as I know, has not been described in detail elsewhere. I am often asked, “Does Google cooperate with you and provide information?” The answer is, “No.” The Google ignores me, making sure my “authoritative” score is near the bottom of the barrel. I have remarked on many occasions that Google would like to see this goose cooked. Google professionals off the record express their surprise at what their employer is doing. Google is not into opening its technical kimono for researchers of my ilk. Compartmentalization is useful I suppose.

Why Google and Publishing

I narrowed the focus to publishing for three reasons:

First, Google finds itself in the news because some newspapers have become critical of Google’s pointing to content produced by third parties. What I have tried to do is explain that Google’s technology processes information and provides access. One of my findings is that Google has shown considerable restraint in the use of its inventions. If my research data are correct, Google could be more active as a content generator than it has chosen to be. Google, for this reason, has “potential energy”; that is, without much additional investment, the company could produce more content objects.

Second, Google’s technical infrastructure plus its software adds up to create a “digital Gutenberg”; that is, an individual could create a Knol (fact based essay on a subject), create a business listing in another Google service, and create a Web log on the Knol’s topic. The “author” or user uses Google as a giant information factory. Inputs go in and traffic “finds” the information. There are different ways to monetize this manufacturing and distribution system. Google has created its own version of Ford’s River Rouge integrated facility.

Third, Google is following what users click on. As a result, it is important to track the demographic behaviors of Google customers, advertisers, licensees, and users. The users, not Google management, help determine where Google goes and what Google does. Competitors who attempt to predict Google’s next action are likely to be off base unless those analyses are anchored in demographic and usage data. Another finding is that Google is relying on demographics to carry its “River Rouge” and “digital Gutenberg” capabilities into different markets.

Google did not open its kimono to me. Open source intelligence methods yielded the data in this study. You can see one of my tools here.

Differences in Digital Gutenberg

In my first two studies, I explained in detail Google’s systems and methods. I include a couple of Google equations in this new study. I make brief references to patent documents and technical papers, but my editor and I have worked to make this study more accessible to the general business reader. I lack the capacity to write a “Sergey and Larry eat pizza” monograph. Frankly, technology, not pizza, interests me. I suppose I am as mechanistic and data centric as some Googlers.

Also, I don’t take sides. Google is neither good nor evil. The companies affected by Google’s waves of innovation are just average companies. Google, however, thrives on sophisticated technology and data. In my encounters with Googlers, most would prefer to talk about a function instead of the color of a sofa. The companies criticizing Google lack Google’s techno-centrism. I point out that Google’s actions and public statements make perfect sense to someone who is Googley. Those same statements, when heard by those who operate mostly from subjective information, come across as arrogant or, in some cases, pretty wacky.

The conclusion to the study is a discussion of one of Google’s most important initiatives in its 10-year history: the Google App Engine. That surprised some of the people whom I asked to read early drafts of the manuscript. The App Engine is the culmination of many thousands of hours of engineering, and it will make its presence felt across the many business sectors into which Google finds itself thrust.

You can see an early version of the study’s table of contents here. (And, yes, I know the Chinese “invented movable wood block printing”. I used “Gutenberg” as a literary convenience.)

Who Should Read This Monograph?

My mom never read any of my monographs. She looked at my first study, written decades ago, and said, “Dull.” Today, I am still writing dull stuff, but the need to understand what is happening and will happen in electronic information is escalating.

At a minimum, I think the contents of the Digital Gutenberg would be of interest to companies who are engaged in traditional media; that is, publishing, video and motion picture production, and broadcasting. Others who may find the monograph a useful reference may include:

  • Analysts, consultants, and pundits who track Google
  • Google’s competitors and soon-to-be competitors
  • Lawyers who are on the prowl for Google-related information
  • Entrepreneurs who want to find out how to “surf on Google”
  • Government regulators eager to find out whether the existing net of regulations has hooked Google
  • People who want to work at Google because some of Google’s most exciting innovations are not well known.

Yahoo Takes a One, Two Punch

April 10, 2009

The Wall Street Journal here and Search Engine Land here reported that Yahoo lost some toolbar deals. The idea is that a PC comes bundled with crapware. When the consumer fires up the new computer, the crapware delivers toolbars, antivirus, and other quasi-useful applications. I have heard that toolbar peddlers pay big bucks (sometimes upfront and sometimes based on a quota) to get on new PCs. To lose a deal can be a problem, but I am not sure if the problem is a big one. Anyway, the WSJ and SEL suggest that Yahoo could lose search share because it lost a deal with Hewlett Packard and Acer. Yahoo has other problems that may affect traffic. Are these one, two punches soft or hard? Too soon to tell.

Stephen Arnold, April 10, 2009

Search Costs: Clouds Come Lower

April 10, 2009

IT-Analysis published Laurie McCabe’s “Will CPAs Bring the Cloud to Earth for SMBs?” You can read the story here. The hook for the story was CPA chatter. I would imagine that “chatter” to a CPA is fairly tame stuff, but I may be wrong. MBAs were once considered harmless, but since the financial meltdown, MBAs are downright lethal. The write up is about two accounting groups’ decision to support Intacct for their customers. I never heard of Intacct, but I assume QuickBooks has. Ms. McCabe wrote:

Not only does this alliance pose a strong threat to Intuit QuickBooks’ dominance in the small business accounting market, it has the potential to pull SMBs into cloud computing in vast numbers. Intacct, AICPA and CPA2Biz did a lot of homework beforehand, including research that showed online accounting solutions boost productivity by as much as 50%. By dramatically reducing the need for travel, and the necessity of exchanging paper and email files, CPAs have more time to spend providing guidance to clients to help them improve financial performance and decision-making.

Too bad for QuickBooks, but the green eyeshade set believes that cloud-based applications like accounting make financial sense. Do you think? When the bean counters figure out how to save money, it makes little difference what the info tech folks say. Blossom.com, one of the most successful cloud search vendors, is probably quite happy with the CPAs’ newfound ability to see the clouds.

Stephen Arnold, April 10, 2009

Google Apps: Googzilla’s Fangs

April 10, 2009

ComputerWorld has an important story here. The URL is a doozy so click quick. I sense a 404 in your future if you delay. The title “Google Working to Add Every Last Service to Apps” is a categorical affirmative. If you recall your college logic class, categorical affirmatives are tough to make stick, particularly when these are applied to the GOOG. The subtitle is the ball-peen hammer: “Exec Offers Up Plans in Colorful Tweet.” Google reveals that it will attack the enterprise with the muscular App Engine, not the kick-sand-in-its-face Google Search Appliance. Instead of a news conference in New York, the GOOG sends out a Twitter message. I think it is safe to say that the GOOG is banking on the demographics of the Twitter generation to get the message. ComputerWorld’s writers quoted various gurus as allegedly saying:

“While this strategy creates a certain ‘shock and awe’ factor in the developer and geek world, this still leaves certain large enterprise requirements unanswered, such as role-based administration and records management capabilities,” he said. “I think this strategy strengthens Google Apps within its core constituency — the [small and midsize business] market. SMBs will love the increasingly Swiss Army knife capabilities of Google Apps.”

My thought is that Google’s enterprise search group knew exactly what it was doing. Furthermore, Google’s demographic card is a component of the surround and seep strategy. Traditional marketers are not Googley. In my opinion, Google is content to blaze its own trail to the enterprise and the crown jewels of IBM, Microsoft, and Oracle. Just my opinion.

Stephen Arnold, April 10, 2009
