Microsoft Search Infrastructure

June 25, 2009

Short honk: Microsoft has hired some Yahoo wizards, and now information about Autopilot has popped into my RSS reader. You can read Codename Windows’ write-up about this program and ask yourself, “How long will it take to complete this project?” and “What is its cost?” and “How much time will be needed to catch up, then pass Google in this engineering sector?” I don’t have answers, but Google built plumbing first, then search. Microsoft seems to be approaching the problem the other way around if this information is on target. Just my opinion.

Stephen Arnold, June 25, 2009

IBM Challenges Google by Reacting, Not Leading

June 25, 2009

Internet News ran a very interesting article called “IBM Takes on Google with Social Cloud Apps.” I revealed earlier that IBM rebuffed a certain government group with a statement along the lines that IBM knew in early 2008 what Google was doing. You can read my short write-up in which this confidence is discussed. Alex Goldman wrote:

Scoring an upset over Google Apps was IBM’s LotusLive Connections, which is so new it’s not available until June 30, 2009, but the IBM product won the Enterprise 2.0 Cloud Computing Buyers’ Choice Award. LotusLive Connections offers profiles that list employees’ expertise so that others can find them, blogs so that experts can share knowledge and learn from each other, add a dogear to bookmarks and share information, Activities for project collaboration, and brings it all together in one unified home page. IBM’s social cloud software is fully integrated. Other LotusLive cloud services include LotusLive Engage for collaboration, LotusLive Meeting for voice and video conferencing, LotusLive Events for registration and ticketing of large meetings, LotusLive Notes Web mail, and LotusLive iNotes file sharing.

The issue in my opinion is that IBM is reacting to Google. If you know what a partner / competitor is doing, you get ahead of that competitor. Reacting suggests that control has been ceded. Google is search. IBM is vulnerable in a core function.

Stephen Arnold, June 25, 2009

A Glimpse of the Google Collaborative Plumbing

June 19, 2009

On June 18, 2009, the ever-efficient US Patent & Trademark Office published US2009/0157608, “Online Content Collaboration Model,” a patent document filed by the Google in December 2007. With Wave in demo mode, I found this document quite interesting. Your mileage may vary because you may see patent documents as flukes, hot air, or legal eagle acrobatics. I am not in concert with that type of thinking, but if you are, navigate to one of the Twitter search engines. That content will be more comfortable.

The inventors were two Googlers, William Strathearn and Michael McNally, neither identified as part of the Australian team responsible for Wave. I like to build little family trees of Googlers who work on certain projects. Mr. Strathearn seems to have worked on the Knol team, which works on collaboration and knowledge sharing. Mr. McNally, another member of the Knol team, has written a Knol about himself, which is at this time (June 19, 2009) online as a unit of knowledge.

The two Googlers wrote:

A collaborative editing model for online content is described. A set of suggested edits to a version of the online content is received from multiple users. Each suggested edit in the set relates to the same version. The set of suggested edits is provided to an authorized editor, who is visually notified of differences between the version of the content and the suggested edits and conflicts existing between two or more suggested edits. Input is received from the editor resolving conflicts and accepting or rejecting suggested edits in the set. The first version of the content is modified accordingly to generate a second version of the content. Suggested edits from the set that were not accepted nor rejected and are not in conflict with the second version are carried over and can remain pending with respect to the second version.

What’s happening is that the basic editorial system for Knol and other Google products gets visual cues, enhanced workflow, and some versioning moxie.

Figure 2 from US2009/0157608, illustrating the knol collaboration workflow
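
The workflow in the abstract is concrete enough to sketch. Below is a minimal Python model of how I read it: suggested edits pile up against one version, an authorized editor resolves conflicts and accepts or rejects, and unresolved, non-conflicting suggestions carry over to the next version. Every class and function name is my own invention for illustration; the patent document publishes no code or API.

    from dataclasses import dataclass, field

    @dataclass
    class SuggestedEdit:
        author: str
        region: tuple            # (start, end) span of the version the edit touches
        replacement: str
        status: str = "pending"  # pending -> accepted, rejected, or carried over

    @dataclass
    class Version:
        number: int
        content: str
        pending: list = field(default_factory=list)

    def conflicts(a, b):
        # Two suggestions conflict when their spans overlap.
        return a.region[0] < b.region[1] and b.region[0] < a.region[1]

    def apply_review(version, accepted):
        # Apply the editor's accepted edits back to front so that earlier
        # offsets stay valid, producing the second version of the content.
        content = version.content
        for edit in sorted(accepted, key=lambda e: e.region[0], reverse=True):
            start, end = edit.region
            content = content[:start] + edit.replacement + content[end:]
            edit.status = "accepted"
        new_version = Version(version.number + 1, content)
        # Suggestions neither accepted nor rejected, and not in conflict
        # with an accepted edit, remain pending against the new version.
        # (A production system would also remap their offsets.)
        for edit in version.pending:
            if edit.status == "pending" and not any(conflicts(edit, a) for a in accepted):
                new_version.pending.append(edit)
        return new_version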

Is this a big deal? Well, I think that some of the big content management players will be interested in Google’s methodical enhancement of its primitive CMS tools. I also think that those thinking of Wave as a method for organizing communications related to a project might find these systems and methods suggestive as well.


IBM Equals Cost and Complexity

June 16, 2009

I had heard that this PR push was coming. That’s the reason I posted the story detailing the steps required to connect OmniFind to other IBM software. If you don’t recall that post and the eight Web pages of technical procedures and code snippets, you can read “Teaching IBM OmniFind to Index IBM’s Portal Document Manager Content” or my other Web log posts about IBM’s technology.

The New York Times’s “I.B.M. to Help Clients Fight Cost and Complexity” is a Big Bertha information blast, and I was delighted to see the story getting such strong pick up and play. Disinformation is a wonderful thing in the opinion of the addled goose.

The story, by Steve Lohr, stated:

In the cloud market, I.B.M. plans to take a tailored approach. The hardware and software in its cloud offerings will be meant for specific computing chores. Just as Google runs a computing cloud optimized for Internet search, I.B.M. will make bespoke clouds for computing workloads in business. Its early cloud entries, to be announced on Monday, follow that model. One set of offerings is focused on streamlining the technology used by corporate software developers and testers, which can consume 30 percent or more of a company’s technology resources.

Mr. Lohr concluded:

I.B.M.’s cloud strategy, the company said, is the culmination of 100 prototype projects with companies and government agencies over the last year, and its research partnership with Google. “The information technology infrastructure is under stress already, and the data flood is just accelerating,” said Samuel J. Palmisano, I.B.M.’s chief executive. “We’ve decided that how you solve that starts by organizing technology around the workload.”

Several comments:

  1. Nary a mention of IBM’s previous cloud initiatives. I was hoping to read about the IBM Internet dial-up service or the grid system that I learned about from a person in West Virginia (definitely a hotbed of massively parallel computing). I was hoping for a reference to the early cloud system used inside IBM for its own technical information center. No joy.
  2. Complexity is not reduced with cloud computing. If anything, data interchange and access become more complex, particularly if the IBM customer has other hosted services plus a vegetable medley of mainframe, midrange, and client-server IBM gear. Perhaps hooking this stuff up and reducing latency without spending the equivalent of the GNP of Switzerland should have warranted a comment?
  3. IBM is a trend surfer. It is becoming more and more dependent on engineering and professional services. I was looking for a comment, maybe a hint of doubt about whether the IBM cloud push will assist companies now, not at some vague time in the near future.

Will IBM run a full-page ad about its new cloud services in the newspaper? I don’t know, but I will be looking for one. An ad would be a nice complement to the story I just read. Just my opinion, Big Blue and Gray Lady. Just my opinion.

Stephen Arnold, June 16, 2009

Intel and the Cloud. Wow.

June 16, 2009

Please read Dave Asprey’s “What Intel Can Teach Google about the Cloud.” I was surprised. Mr. Asprey wrote:

But these cloud compute providers, liberated from the shackles of Moore’s law, can’t grow network speeds as quickly as they can add servers, creating exactly the same problem that CPU vendors faced when their CPUs grew faster than the system bus. It’s getting worse, too — according to the lesser-known Nielsen’s Law, Internet bandwidth grows at an annual rate of 50 percent, compared with compute capacity, which grows at 60 percent, meaning that over a 10-year time period, computer power grows 100X, but bandwidth grows at 57X. Ouch. So what did Intel and AMD do when faced with the same problem? They looked for a fix they could apply quickly.  The quick fix was to add a cache to the processor, which allowed the CPU to run at full speed and store results in temporary memory until they could move across the slower system bus. It also allowed them to keep selling faster processors while they tackled the longer-term project of improving standards for bus speeds.

Three comments:

  1. Intel tried a cloud play with some expensive data centers and a deal with Convera. Intel demonstrated that it could not deliver.
  2. Physics is indeed the problem for servers. The Google is dabbling in a range of interesting engineering to compensate for those issues. Caching is one solution, not the only solution (see the sketch after this list). On-CPU caches can introduce latency, and on-die data to’ing and fro’ing reduces some fancy multicore gizmos to piggies.
  3. Look at the approach of Perfect Search in Orem, Utah. Mr. Asprey’s assessment does not apply to that firm’s engineering approach. Big oversight in my opinion.
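
To make the cache fix concrete, here is a minimal read-through cache in Python. The shape is the one Mr. Asprey describes: repeat requests are served from fast local memory, and the slow channel, whether a system bus or a network hop, is crossed only on a miss. The names and the simulated delay are illustrative, not from Intel, AMD, or any cloud vendor.

    import time
    from collections import OrderedDict

    def slow_fetch(key):
        # Stand-in for a trip across the slow channel (bus or network).
        time.sleep(0.1)
        return "value-for-%s" % key

    class ReadThroughCache:
        def __init__(self, capacity=1024):
            self.capacity = capacity
            self.store = OrderedDict()

        def get(self, key):
            if key in self.store:
                self.store.move_to_end(key)      # hit: mark as recently used
                return self.store[key]
            value = slow_fetch(key)              # miss: pay the slow-channel cost
            self.store[key] = value
            if len(self.store) > self.capacity:
                self.store.popitem(last=False)   # evict the least recently used
            return value

Swap slow_fetch for a network call and the arithmetic in the quoted passage explains why providers reach for this fix first: a cache ships immediately, while bus and bandwidth standards improve on a much slower clock.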

Stephen Arnold, June 16, 2009

Amazon and the Unexpected

June 11, 2009

I opened my new Kindle Two after my Kindle One disappeared from my briefcase during a talk at the Gilbane Conference on June 4, 2009. I was disappointed that the display wasn’t much of an improvement. The addled goose’s eyes are not what they used to be. I wondered if the God of Geese would come to my aid. When I read Rich Miller’s “Lightning Strike Triggers Amazon EC2 Outage” here, I quacked, “Whoops.” Maybe the God of Geese was paying attention to my Kindle Two injunction. Mr. Miller wrote:

Some customers of Amazon’s EC2 cloud computing service were offline for more than four hours Wednesday night after an electrical storm damaged power equipment at one of the company’s data centers.

Mr. Miller did not raise these questions:

  • What is the reliability of cloud computing in general if lightning, hardly an unexpected weather event, can kill a major data center?
  • Who pays for the loss of revenue to customers when a vendor’s engineering is not able to handle something that Ben Franklin was able to manage?
  • What steps will cloud vendors take to prevent their “enterprise ready” systems from running to the cellar when a storm passes through?

If a cloud system is delivering search and text analysis for a mission critical application, won’t customers start thinking about the benefits of on-premises installations? Losing a service for a photo archive is one thing. Losing a system tied to more significant business operations seems quite different to me.

Stephen Arnold, June 11, 2009

Clearpace RainStor Supports Queries

June 11, 2009

A happy quack to the reader in Australia who alerted me to an outfit called Clearpace Software. According to the company’s Web site, Clearpace is

a software company that provides data archive store solutions for the long-term retention of structured data within the enterprise. Clearpace has become a pioneer in the database archiving market by providing archive stores that are the optimal destination for inactive data that has been removed from production systems. The Clearpace NParchive software enables organizations with large and growing data estates to massively reduce the cost and complexity of storing historical information while making archived data easily accessible for regulatory, legal and business purposes. Using NParchive, companies are able to store as much as 60x more historical information on commodity hardware.

The angle that interested me was that Clearpace includes a query tool with its system. The idea is that a Clearpace client can search the data in the Clearpace RainStor archive. Here’s what the company says about the RainStor cloud storage service:

RainStor is a cloud-based archiving service for simply, securely and cost-effectively preserving historical structured data. The RainStor archive service enables companies to send an unlimited amount of inactive data from databases or event logs to a hosted storage platform where it can be retained and searched on demand. RainStor compresses data by 40x before transferring encrypted data files to the cloud, providing rapid load times and reduced storage cost, while also supporting full SQL access to the archived data files using industry standard reporting tools and interfaces. RainStor is delivered on a Software-as-a-Service (SaaS) basis, leveraging cloud infrastructure. The RainStor cloud archive service requires no upfront investment offering a pay-as-you-use model based on the volume of raw data that is sent to the cloud. Rainstor is provided as a service by Clearpace Software.
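
A quick back-of-the-envelope check of those numbers, using a hypothetical volume of my own choosing rather than a Clearpace figure:

    raw_tb = 10.0            # hypothetical: 10 TB of inactive data sent to the archive
    stored_tb = raw_tb / 40  # the claimed 40x compression
    print(stored_tb)         # 0.25 TB actually sits in the cloud

Note the billing wrinkle in the company’s own description: storage shrinks 40x, but the pay-as-you-use fee is metered on the raw volume sent, not the compressed volume stored.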

You can read the company’s news release here.

I don’t have too much information about the search function. My questions will focus on latency and the user interface.

Stay tuned.

Stephen Arnold, June 11, 2009

Every Cloud Has a Tin Lining

June 9, 2009

I found the article “Microsoft Exec Sees Lower Margins from ‘Cloud’” suggestive. You can read the article here (I hope), because this is the type of document that can come back and nibble at one’s ankles. The idea is that selling cloud services yields thinner margins than selling shrink-wrapped software. The article reported:

Microsoft Corp’s chief software architect said on Thursday the profit margins on providing online services — broadly known as cloud computing — would likely yield a lower profit margin than the company’s existing software business. “The margins on services are not like the margins on software, so it (cloud computing) will increase our profit and it will increase our revenue, but you won’t have that margin,” said Ray Ozzie on Thursday at a Silicon Valley technology event.

Several observations:

  1. If Microsoft deploys a cloud-based enterprise search solution, the payback on the $1.2 billion purchase price, the engineering rework, and the marketing of the Fast ESP system may take a longer time to get into the black.
  2. Stakeholders looking for a jet boost to the MSFT share price get their feet placed in a bucket of ice water.
  3. If the MSFT assertion is accurate, cost control becomes a much more significant issue for MSFT going forward in a lousy economy.

Stephen Arnold, June 8, 2009

Clouds Part, Truth Peeks Out

June 8, 2009

Ned Batchelder’s “Real World Cloud Computing” is worth reading. You can find the article here. The information is a summary of key and interesting observations about cloud computing made by the top guns at six start-ups. I don’t want to spoil your enjoyment of the original post. I would like to quote one passage to motivate you to read the article:

Here are five points about Amazon’s cloud services:

  • “Can’t send lots of emails, since you need a spam-white listed server
  • Disk I/O in the cloud is a lot slower than in real machines (“punishingly slow”).
  • Want a db server with 32Gb RAM? Amazon doesn’t offer it.
  • Want dual hardware load balancers? Amazon doesn’t have it.
  • PCI (credit card) compliance is a problem: use a 3rd-party cart or PayPal instead of doing it yourself in the cloud.”

Very useful.

Stephen Arnold, June 8, 2009

Exalead’s Vision for Enterprise Search

June 4, 2009

I had a long conversation with Exalead’s director of marketing, Eric Rogge. We covered a number of topics, but one of his comments seemed particularly prescient. Let me summarize my understanding of his view of the evolution of search and offer several comments.

First, Exalead is a company that provides a high performance content processing system. I profiled the company in the Enterprise Search Report, Beyond Search, and Successful Enterprise Search Management. Furthermore, I use the company’s search system for my intelligence service Overflight, which you can explore on the ArnoldIT.com Web site. Although I am no expert, I do know quite a bit about Exalead and how it enables my competitive intelligence work.

Second, let me summarize my understanding of Mr. Rogge’s view of what search and content processing may become in the next six to 12 months. The phrase that resonated with me was “Search Based Applications.” The idea, as I understand it, is to put search and content processing into a work process. The “finding” function meshes with specific tasks, enables them, and reduces the “friction” that makes finding information such an expensive, frustrating experience.

Mr. Rogge mentioned several examples of Exalead’s search-based applications approach. The company has a call center implementation and an online advertising implementation. He also described a talent management solution that combines search with traditional booking agency operations. The system manipulates image portfolios and allows the agency to eliminate steps and the paper that once was required.

The company’s rich media system handles digital asset management, an area of increasing importance. Keeping track of rich media objects in digital form requires a high-speed, easy-to-use system. Staff using a digital asset management system have quite different needs and skill levels. Due to the fast pace of most media companies, training is not possible. A photographer and a copyright specialist have to be able to use the system out of the box.

But the most interesting implementation of the SBA architecture was the company’s integration of the Exalead methods into a global logistics company. The client needs information to tell a customer where a shipment is and when it will arrive. The Exalead system handles 5GB of structured data to track up to 1M shipments daily. Those using the system have a search box, topics and clients a click away, and automated reports that contain the most recent information. Updating of the information occurs multiple times each hour.

Finally, my view of his vision is quite positive. I know from my research that most people are not interested in search. What matters is getting the information required to perform a task. The notion of a search box that provides a way for the user to key a word or two and get an answer is desirable. But in most organizations, users of systems want the information to be “there”. That’s the reason that lists of topics or client names are important. After all, if a person looks up a particular item or entity several times a day, the system should just display that hot link. The notion of Web pages or displays that contain the results of a standing query is powerful. Users understand clicking on a link and seeing a “report” that mashes up information from various sources.
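
The standing query idea is simple enough to sketch. In the minimal Python illustration below, saved queries are re-run on a schedule and their results published under stable names, so a user clicks a topic or client link instead of typing a search. All names are hypothetical; this is not Exalead’s API.

    import time

    SAVED_QUERIES = {
        "acme-shipments": "client:ACME status:in-transit",
        "overdue":        "status:overdue",
    }

    def run_query(query_string):
        # Stand-in for a call to the search engine's query service.
        return ["result for [%s] at %s" % (query_string, time.strftime("%H:%M"))]

    def refresh_reports():
        # Re-run every standing query and publish each result set under a
        # stable name; the user interface simply links to these names.
        return {name: run_query(q) for name, q in SAVED_QUERIES.items()}

    reports = refresh_reports()         # run on a schedule, several times an hour
    print(reports["acme-shipments"])    # the user clicks a link; no query typed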

Exalead is winning enterprise deals in the US and Europe. My hunch is that the notion of the SBA will be one that makes intuitive sense to commercial enterprises, government agencies, and not-for-profit organizations. More important, the Exalead system works.

Stephen Arnold, June 5, 2009
