Social Networking Is Hot: Users Love It and So Do Intelligence Professionals

July 14, 2008

Once I fought through the pop up ad and the request to provide information about myself (two reasons I am growing to hate the InfoWorld Web site), I was able to read Paul Krill’s essay “Enterprises Become the Battleground for Social Networking.” Mr. Krill explains that social networking services are gaining traction outside the consumer market. MySpace.com, Facebook.com, Bebo.com, and dozens of other services make it easy to connect with friends in cyberspace. Citing a number of industry authorities and thought leaders, Mr. Krill provides a useful run down of the benefits of social networking in commercial organizations, not-for-profit outfits, and governmental agencies. Interest in social networking is rising.

The most interesting portion of the essay is the comments from an individual identified as MattRhodes. Mr. MattRhodes is a supporter of Gartner Group and its report on social networking. He writes:

… businesses aren’t making the use they should do of social communication. That consumers are getting more and more used to social networking and other social tools is well known by those of us who work in the industry. The reasons are simple – they actually offer a new and different way of communicating.

This assertion is indeed true. Also true is the interest in social networking. Technologies and services that work in the consumer Web migrate into organizations as well. Social networking, therefore, is going to play an important part in the information technology mix.

Amidst this violent agreement among Mr. Krill, Mr. MattRhodes, and me, there lurk some flashing yellow warning signals. In my opinion, some issues to ponder include:

  • Social networking provides a potent monitoring tool. Employees, users, indeed, anyone using the watched system can be tracked. Intelligence can be extracted. Individuals taking actions that are counter to the organization’s interest can be identified and appropriate action taken. The essence of social networking is not collaboration; social networking generates useful user behavior data and potentially more useful metadata (see the sketch after this list).
  • Organizations have secrets. Social networking systems add doors and windows through which secrets can escape or be watched. Most organizations have security provisions, but actual security is breachable. Automated security systems that eliminate tedious permission set up by a security professional make it possible to reduce certain costs. The flip side is that most organizations have flawed security procedures, and the information technology department does what it can with its available resources. The security for certain social networking services can be a time bomb. No one knows the problem is there until the bomb goes off. Damage, depending on the magnitude of the bomb, can be insignificant or horrific.
  • New employees, comfortable with the mores of the evolving social networking world, bring different values and behaviors to online activity. Granted, some new hires will be gung ho and sing the company song each morning. Other new hires will take an informal approach to mandates about what information to share. Are you familiar with the actual behavior of graduates of one of India’s prestigious high schools? I think this approach will characterize some of the new hires’ use of social networking.
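
To make the first bullet concrete, here is a tiny, hypothetical sketch (my own invention, not anything from Mr. Krill’s essay). Interaction metadata alone, who contacted whom, is enough to surface a watch list without reading a single message; the names and the log below are invented.

```python
from collections import defaultdict

# Hypothetical interaction log: (sender, recipient) pairs harvested from a social system.
interactions = [
    ("alice", "bob"), ("alice", "carol"), ("bob", "carol"),
    ("dave", "mallory"), ("dave", "mallory"), ("erin", "mallory"),
]

PERSONS_OF_INTEREST = {"mallory"}   # invented label for the example

contacts = defaultdict(set)
for sender, recipient in interactions:
    contacts[sender].add(recipient)
    contacts[recipient].add(sender)

# Anyone connected to a person of interest surfaces from the metadata alone.
flagged = {user for user, peers in contacts.items()
           if peers & PERSONS_OF_INTEREST and user not in PERSONS_OF_INTEREST}
print(sorted(flagged))   # ['dave', 'erin']
```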

To repeat: I think social networking and its underlying technology are important. I see many benefits. My experience suggests that those who cheerlead may want to spend a bit more time in the library reading about the security vulnerabilities of real-time, fluid social functions. There’s a reason undercover agents make “friends” with persons of interest. The important relationships are not focused on finding a fourth for a golf outing.

Stephen Arnold, July 14, 2008

Microsoft Yahoo: Search Realities

July 14, 2008

The Wall Street Journal, the New York Times, and Reuters have covered the most recent Microsoft Yahoo mating dance in excruciating detail. If you have not seen these three media giants’ take on the Yahoo snub of Microsoft’s and Mr. Carl Icahn’s most recent offers, just navigate to one of these links:

  • New York Times here (but you have to register): Angle is a shift from saber rattling to escalating conflict
  • Reuters here: Angle is “guaranteed ad revenue” for five years
  • Wall Street Journal here: Angle is impasse that will lead to an “incredible dance”

You can explore links galore on Techmeme.com and Megite.com. I can’t add much to these reports of this ménage à trois. I would like to point out that when some sort of deal goes through, search gains a new urgency. Here’s why:

  1. Google faces a real pit bull in the legal squabble with Viacom. Based on my research findings, Google may for the first time face a perfect storm: lousy economy, escalating annoyance from developers over the Apps flap, and the privacy monsoon unleashed with the YouTube usage data decision. Now is the time to strike Google, but if the internecine warfare continues, Microsoft may miss this opportunity to deal a potentially devastating blow to the GOOG.
  2. Yahoo is in disarray. Open source is a great idea. Cutting deals with Google is a great idea. The problem is that when one looks at the long term impact of these great ideas, the great ideas undermine the foundation of Yahoo. Better shore up that foundation before the basement fills with water and takes down the entire shotgun house.
  3. Capturing headlines is not the same as making money. Microsoft itself needs to concentrate its forces, set priorities, and get down to business with regard to [a] Web search and [b] enterprise search. The senior management of any organization has a finite amount of attention and energy. Whatever is available needs to be focused on closing the gap with Googzilla and making gains in the severely fragmented enterprise search sector.

No doubt business school case writers are sharpening their pencils. Unless Microsoft can resolve this Yahoo business, the company may miss its chance at the brass ring. Google can settle with Viacom, mend its fences, and rebuild its lead with regard to Microsoft. Agree? Disagree? Help me fill in the gaps in my understanding.

Stephen Arnold, July 14, 2008

Microsoft: 1999 to 2008

July 14, 2008

I have written one short post and two longer posts about Microsoft.com’s architecture for its online services. You can read each of these essays by clicking on the titles of the stories:

I want to urge each of my two or three Web log readers to validate my assertions. Not only am I an addled goose, I am an old goose. I make errors as young wizards delight in reminding me. On Friday, July 11, 2008, two of my engineers filled some gaps in my knowledge about X++, one of Microsoft’s less well-known programming languages.

[Image: the perils of complexity]

The diagram shows how complexity increases when systems are designed to support solutions that do not simplify the design. Source: http://www.epmbook.com/complexity.gif

Stepping Back

As I reflected upon the information I reviewed pertaining to Microsoft.com’s online architecture, several thoughts bubbled to the surface of my consciousness:

First, I believe Microsoft’s new data centers and online architecture share DNA with those 1999 data centers. Microsoft is not embracing the systems and methods in use at Amazon, Google, and even the hapless Yahoo. Microsoft is using its own “dog food”. While commendable, the bottlenecks have not been fully resolved. Microsoft uses scale up and scale out to make systems keep pace with user expectations of response time. One engineer who works at a company competing with Microsoft told me: “Run a query on Live.com. The response times in many cases are faster than ours. The reason is that Microsoft caches everything. It works, but it is expensive.”
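
The caching point deserves a concrete illustration. Below is a minimal sketch of a read-through cache in front of a slow data store; it is my own toy example, not Microsoft’s code. The warm read is fast, but every cached object occupies memory that someone has to buy and power.

```python
import time
from functools import lru_cache

def fetch_from_database(key: str) -> str:
    """Stand-in for a slow back-end query (hypothetical)."""
    time.sleep(0.05)  # pretend the RDBMS round trip costs 50 ms
    return f"value-for-{key}"

# Cache "everything" up to a fixed number of entries.
@lru_cache(maxsize=100_000)
def fetch(key: str) -> str:
    return fetch_from_database(key)

if __name__ == "__main__":
    start = time.perf_counter()
    fetch("homepage")                     # cold: pays the 50 ms
    cold = time.perf_counter() - start

    start = time.perf_counter()
    fetch("homepage")                     # warm: served from memory
    warm = time.perf_counter() - start

    print(f"cold read: {cold*1000:.1f} ms, warm read: {warm*1000:.3f} ms")
    print(fetch.cache_info())             # hits, misses, current cache size
```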

Second, Microsoft lacks a cohesive, modern code base. With each upgrade, legacy code and baked-in features and functions are dragged along. A good example is SQL Server. Although rewritten since the good old days with Sybase, SQL Server is not the right tool for peta-scale data manipulation chores. Alternatives exist, and Amazon and Yahoo are using them. Microsoft is sticking with its RDBMS engine, and it is very expensive to replicate, cluster, back up with standby hardware, and keep in sync. The performance challenge remains even though the user experience seems as good as, if not better than, the competition’s. In my opinion, the reliance on this particular “dog food” is akin to building a wooden power boat with unseasoned wood.

Third, in each of the essays, Microsoft’s own engineers emphasize the cost of the engineering approaches. There is no emphasis on slashing costs. The emphasis is on spending money to get the job done. In my opinion, spending money to solve problems via the scale up and scale out approach is okay as long as there are barrels of cash to throw at the problem. The better approach, in my opinion, is to engineer solutions that make scaling and performance as economical as possible and to direct investment at finding ways to leapfrog over the well-known, long-standing problems: the Codd database model, inefficient and latency-inducing message passing, dedicated hardware for specific functions and applications and then replicating those clusters, and, finally, extra hardware that is, in effect, sitting like an idle railroad car until needed. What happens when the money for these expensive approaches becomes less available?
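
As a purely hypothetical back-of-the-envelope model (every number below is invented, none comes from Microsoft), here is the kind of arithmetic that makes the “throw hardware at it” approach expensive: capacity is bought in server-sized chunks, brand name boxes cost far more than commodity ones, and a redundancy factor stands in for the idle railroad-car hardware.

```python
import math

def cluster_cost(capacity_needed: int,
                 capacity_per_server: int,
                 price_per_server: float,
                 redundancy_factor: float = 1.5) -> float:
    """Rough hardware cost to hit a capacity target (hypothetical model)."""
    servers = math.ceil(capacity_needed / capacity_per_server)
    return servers * redundancy_factor * price_per_server

TARGET = 1_000  # arbitrary units of query capacity

# Invented figures: big-iron scale-up boxes versus commodity scale-out boxes.
scale_up = cluster_cost(TARGET, capacity_per_server=100, price_per_server=80_000)
scale_out = cluster_cost(TARGET, capacity_per_server=10, price_per_server=3_000)

print(f"scale up:  ${scale_up:,.0f}")
print(f"scale out: ${scale_out:,.0f}")
```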


Microsoft.com in 2006

July 13, 2008

In late 2006, I had to prepare a report assessing a recommendation made to a large services firm by Microsoft Consulting. One of the questions I had to try and answer was, “How does Microsoft set up its online system?” I had the Jim Gray diagram which I referenced in this Web log essay “Microsoft.com in 1999”. To be forthright, I had not paid much attention to Microsoft because I was immersed in my Google research.

I poked around on various search systems and MSDN and eventually found a diagram that purported to explain the layout of Microsoft’s online system. The information appeared in a PowerPoint presentation by Sunjeev Pandey, Senior Director, Microsoft.com Operations, and Paul Wright, Technology Architect Manager, Microsoft.com Operations. On July 13, 2008, the presentation was available here. The PowerPoint itself does not appear in the Live.com index. I cannot guarantee that this link will remain valid. Important documents about Microsoft’s own architecture are disappearing from MSDN and other Microsoft Web sites. I am reluctant to post the entire presentation even though it does not carry a Microsoft copyright.

I want to spell out the caveats. Some new readers of this Web log assume that I am writing news. I am not. The information in this essay is from June 2006, possibly a few months earlier. Furthermore, as I get new information, I reserve the right to change my mind. This means that I am not asserting absolutes. I am capturing my ideas as if I were Samuel Pepys writing in the 17th century. You want real news? Navigate elsewhere.

My notes suggest that Messrs Pandey and Wright prepared a PowerPoint deck for use in a Web cast about Microsoft’s own infrastructure. These Web casts are available, but my Verizon wireless service times out when I try to view them. You may have better luck.

Microsoft.com in 2006

Here is a diagram from the presentation “Microsoft.com: Design for Resilience. The Infrastructure of www.microsoft.com, Microsoft Update, and the Download Center.” The title is important because the focus is narrow compared to the bundle of services explained in Mr. Gray’s Three Talks PowerPoint deck and in Steven Levi and Galen Hunt’s “Challenges to Building Scalable Services.” In a future essay, I will comment on this shift. For now, let’s look at what Microsoft.com’s architecture may have been in mid-2006.

[Image: Microsoft.com architecture, mid-2006]

This architecture represents a more robust approach. Between 1995 and 2006, the number of users rose from 30,000 per day to about 17 million per day. In 2001, the baseline operating system was Windows 2000. The shift to Microsoft’s 64-bit operating system took place in 2005, a year in which (if Messrs Pandey and Wright are correct) Microsoft.com experienced some interesting challenges. For example, international network service was disrupted in May and September of 2005. More tellingly, Microsoft was subject to Denial of Service attacks and experienced network failures in April and May of 2005. Presumably, the mid-2006 architecture was designed to address these challenges.

The block diagram makes it clear that Microsoft wanted to deploy an architecture in 2006 that provided excellent availability and better performance via caching. The drawbacks are those that were part of the DNA of the original 1999 design: higher costs due to the scale up and scale out model, the use of name brand, top quality hardware, and the complexity of the system. You can see four distinct tiers in the architecture.

Information has to move from the Microsoft Corp. network to the back end network tier. Then the information must move from the back end to the content delivery tier. Due to the “islands” approach that now includes distributed data centers, the information must propagate across data centers. Finally, the most accessed data or the highest priority information must be made available to the Akamai and Savvis “edge of network” systems. Microsoft, presumably to get engineering expertise and exercise better control of costs, purchased two adjoining data centers from Savvis in mid-2007 for about $200 million. (Note: for comparison purposes, keep in mind that Microsoft’s San Antonio data center cost about $600 to $650 million.)
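
To suggest the kind of plumbing hiding in that paragraph, here is a toy sketch of content propagation through tiers; it is my own illustration, not Microsoft’s design. Everything lands in the back end, everything replicates to the regional data centers, and only the hottest items get pushed to the edge caches.

```python
from collections import defaultdict

# Hypothetical tiers, loosely modeled on the four-tier picture described above.
BACK_END = {}                       # master copy of every item
DATA_CENTERS = ["us", "europe", "japan"]
EDGE_NODES = ["akamai", "savvis"]

replicas = defaultdict(dict)        # tier name -> {item id: content}
access_counts = defaultdict(int)    # item id -> request count

def publish(item_id: str, content: str) -> None:
    """Write to the back end, then copy to every regional data center."""
    BACK_END[item_id] = content
    for dc in DATA_CENTERS:
        replicas[dc][item_id] = content

def refresh_edge(top_n: int = 2) -> None:
    """Push only the most requested items out to the edge caches."""
    hottest = sorted(access_counts, key=access_counts.get, reverse=True)[:top_n]
    for node in EDGE_NODES:
        replicas[node] = {item: BACK_END[item] for item in hottest}

if __name__ == "__main__":
    for item in ("windows-update.cab", "homepage.html", "kb-article-42.html"):
        publish(item, f"<content of {item}>")
    access_counts.update({"windows-update.cab": 9000, "homepage.html": 7000,
                          "kb-article-42.html": 12})
    refresh_edge()
    print(sorted(replicas["akamai"]))   # only the two hottest items
```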


Google Wants the World’s Knowledge–Yep, All of It

July 13, 2008

ABC News featured Gregory Lamb’s essay “Could Google Monopolize Human Knowledge.” If you are interested in Google’s scanning project or the prospect of Google monopolizing knowledge, click here.

Between you and me, I don’t know what knowledge is, so I think I am supposed to be flexible in interpreting the title of Mr. Lamb’s essay.

The core of the argument is that Google has cash. The company is scanning books in a shoddy way. And Microsoft, once involved in this expensive game, dropped out. Brewster Kahle is scanning books and looking for funding to continue with his scanning project.

So, if Google keeps on scanning, Google will have page images, crappy ASCII versions of the source documents, and lots of users. I am not doing justice to Mr. Lamb’s analysis.

One point that encapsulated the argument was:

So far, Google isn’t aggressively trying to make money off its book pages, though a few ads and links to buy hard copies from the publisher do appear. Keeping users inside Google’s online “universe” seems to be the company’s long-term motive.

The operative phrase is “long-term motive”. I know that the “don’t be evil” catchphrase clashes with the company’s obligations to Wall Street and stakeholders. In fact, Mr. Lamb cites academics’ annoyance that Google’s optical character recognition sucks. That’s a useful fact because it underscores the lack of understanding some–maybe ABC News, journalists, and University of Virginia professors–bring to commercial information processing.

For several years, I labored in the vineyards at Bell+Howell. The company was one of the leaders in scanning. Converting a paper document to an image file and its accompanying ASCII text is tricky. I am not going into the mechanics of bursting (chopping up source documents in order to get pages that can be fed by the stack into a scanning device), making sure the pages are not misfed and therefore crooked, checking the order of the auto-numbered images to make sure that the images are OCR’ed in the correct order, and verifying that a group of scanned images comprising a logical document are properly linked to the ASCII file. This stuff is trivial and too bothersome for amateurs to explore.
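
As a small illustration of the bookkeeping involved, here is a hypothetical sketch (the file naming and directory layout are my inventions) that checks whether a batch of auto-numbered page images is complete and whether each image has a companion OCR text file.

```python
import re
from pathlib import Path

PAGE_PATTERN = re.compile(r"page_(\d{4})\.tif$")   # e.g. page_0001.tif (invented naming)

def audit_scan_batch(folder: str) -> None:
    folder = Path(folder)
    numbers = []
    for image in sorted(folder.glob("page_*.tif")):
        match = PAGE_PATTERN.search(image.name)
        if not match:
            continue
        n = int(match.group(1))
        numbers.append(n)
        # Every page image should have a companion OCR text file.
        if not (folder / f"page_{n:04d}.txt").exists():
            print(f"missing OCR text for {image.name}")

    # Gaps in the numbering usually mean a misfeed or a lost page.
    expected = set(range(min(numbers), max(numbers) + 1)) if numbers else set()
    for missing in sorted(expected - set(numbers)):
        print(f"missing page image: page_{missing:04d}.tif")

if __name__ == "__main__":
    audit_scan_batch("scans/book_0001")   # hypothetical path
```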

The core issue is that libraries lack the funds to manage their vertical file collections. When an author dies or a photographer falls out of a helicopter, the heirs often gather up manuscripts, notebooks, negatives, and pictures. A quick trip to the local university library allows the heirs to get a tax deduction and get the stuff out of the kids’ garage. Libraries are choking on this primary material. The Library of Congress has a warehouse full of important primary material, and it lacks the funds to catalog and make the hard copy materials available to researchers. Scanning of important materials such as those found in the American Memory project is not funded by government money. The librarians have to do fund raising.

University libraries are in worse financial shape. Public libraries, if you can believe it, are farther down the drain.

And publishers? These folks are fighting for survival. If a bright young Radcliffe Institute of Advanced Study post doc gets the idea to scan a book on the publisher’s back list, our take charge, eager beaver will be flipping burgers at the McDonald’s on Times Square.

Let’s review some facts, always painful to folks like those in the news business:

  1. Scanning sucks. Optical character recognition sucks more. Fixing lousy ASCII requires human editors because software still is not infallible. 97% accuracy means three errors per 100 words. If an insect gets trapped in the scanner, accuracy can be adversely affected because the source image has a big bug over the text. The OCR engine can’t figure out what’s under the bug, so 97% drops to 96%. The fix is fuzzy algorithms, trigrams (see the sketch after this list), and other tricks to make lousy ASCII useful. I have been in the information processing business for a long time. OCR sucks less today, but it still sucks.
  2. Scanning is expensive. If Google quits scanning, who is going to do the work and pay the bill? My hunch is that if we asked graduate school professors to work one day a week to scan the primary material in their institution’s library, the request would be received with derisive scorn. Scanning is messy, dirty, tedious, fraught with errors, and dull, dull work. Operating a scanner and performing the human sheep herding is tough work. Volunteers from the UVa?
  3. Google is using the book project in several ways. The good news is that making book search available is useful to scholars. If you look at the fees levied by our friends at Ebsco, ProQuest, Reed Elsevier, and Thomson Reuters, Google’s “free” looks pretty good to me. The bad news is that few people outside of Google understand what the book scanning project provides to Google. And I am not going to include that item in a free Web log post. Google isn’t scanning because it’s cheap. There are technical and economic reasons the company is investing in the project, haphazard as it is.
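
Since I mentioned trigrams in the first item above, here is a minimal sketch of the idea, a toy of my own and not any vendor’s code: break words into three-character chunks and use the overlap to guess which dictionary word a garbled OCR token was supposed to be.

```python
def trigrams(word: str) -> set:
    padded = f"  {word.lower()} "          # pad so short words still yield trigrams
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

def similarity(a: str, b: str) -> float:
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)     # Jaccard overlap of trigram sets

def best_match(garbled: str, dictionary: list) -> str:
    return max(dictionary, key=lambda word: similarity(garbled, word))

if __name__ == "__main__":
    dictionary = ["knowledge", "monopolize", "library", "scanner", "publisher"]
    for token in ("kn0wledge", "rnonopolize", "librarv"):   # typical OCR damage
        print(token, "->", best_match(token, dictionary))
```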

Perhaps Kirtas Technologies, maker of five and six digit scanning and OCR systems, will dip into the company’s vast cash surplus and do the job right? The reality is that Kirtas won’t scan one page unless it is doing a demo or getting paid to do the work. It’s easy to criticize; it is harder to do this work when you have to write the checks from the Kirtas bank account. Based on my information, Google has a bit more financial headroom than Kirtas.

Observations

  1. Mr. Lamb has a well-written essay that contributes to the spate of essays bashing Google. This is a mini trend, and I think the criticism will increase as it dawns on pundits that Google has been beavering away without competition for a decade. Now that the company is the dominant search system and primary online advertising engine, it’s time to point out the many flaws in Google. Sorry, OCR is what it is. Google is what it is.
  2. The complexity of certain Google activities is not well understood. Overlooking the economics of scanning is an important omission in the essay. A question to ask is, “If not Google, who?” I don’t have many names on my short list. Obviously the Bill and Melinda Gates Foundation wasn’t ready to pick up the thrown Microsoft ball.
  3. Google is a very different enterprise. I marvel at how Wall Street sees Google in terms of quarterly ad revenue. I am amazed at analyses of one tiny Google initiative. Google is a game changer, and the book project is a tiny component in a specific information initiative. Do any of the Beyond Search Web log readers know what it is? If you do, use the comments section to fill me in.

Google has reader pull. I look forward to more “flows like water” analyses of the GOOG. Over time, one of the reports will further our understanding of Googzilla. Film at 11.

Stephen Arnold, July 13, 2008

SaaS Analysis

July 13, 2008

Peter Laird’s “Oracle, IBM, SAP, Microsoft, Intuit and the SaaS Revolution” is a business analysis you will want to scan, download, and save for reference. Unlike most Web log postings–especially the ones generated by the addled goose of Beyond Search–Mr. Laird digs into a comparison with objective descriptions, strengths, and weaknesses. He also provides useful links to Web sites and Web logs with related information. You can read the full posting on Sys-Con’s Oracle Journal here.

I have minimal interest in SAP unless the company’s fabulously expensive “boil the ocean” software brings a company to the brink of insolvency. Intuit is too tainted by the weird upgrades and my memories of trying to make its line items match the ones a client wanted.

I found the analysis of Oracle, IBM, and Microsoft quite good. For a free report, Mr. Laird deserves a pat on the back. Heck, I will buy him a two-piece crispy meal at the Kentucky Fried Chicken just up the paved road from where I live.

One comment bit me on the arm and hasn’t let go. The information appears in his write up about IBM. The remark is not Mr. Laird’s. He is quoting a pundit named Jeff Nolan, who opines:

IBM lacks the business apps necessary to execute on an effective SaaS strategy.

This is a strong statement and, in my opinion, accurate. I am baffled by IBM’s initiatives. Several years ago, a Wall Street maven told me that IBM was building a giant grid computer system. The headquarters was in West Virginia as I recall. I scanned Mr. Laird’s comments about SaaSpace.com, Applications on Demand, and (my favorite) Blue Business Cloud (quite a metaphor of unhappiness perhaps). Despite my digging into IBM’s enterprise search and text mining products and services, these were news to me.

I realize that IBM is in the $100 billion range. I even own two NetFinity 5500s. Both have been running for years with minimal hassle. But I cannot explain, without quite a bit of work, why IBM’s products and services are ill-defined. The confusion is not intentional. I have a difficult time seeing IBM focusing as Salesforce.com does and introducing a service that can be explained in a 60-second elevator pitch.

If you have links to a clear, concise explanation of IBM’s many search and text mining initiatives, please, post the links. For now, IBM is lagging behind Microsoft, which may be hard to swallow if you are a well-paid super smart engineer working on Blue Business Cloud. IBM also believes it has Google figured out.

Send me those links and read Mr. Laird’s report.

Stephen Arnold, July 13, 2008

Microsoft.com in 1999

July 12, 2008

In my previous essay about Jim Gray’s Three Talks in 1999, I mentioned that he and his team had done an excellent job of summarizing trends in data center design, online infrastructure options, and cost analysis of power and name brand hardware. If you have not read that essay, I invite you to take a look at it here. You may want to download the PowerPoint here. The document does not carry a copyright mark, but I am reluctant to post it for my readers. Please, keep in mind that Microsoft can remove this document at any time. One of the baseline papers referenced in this 1999 Three Talks document is no longer available, and I have a resource working on tracking it down now.

I invite you to look at this diagram. I apologize for the poor quality of the graphic, but I am using an image in Mr. Gray’s 1999 presentation which has been crunched by the WordPress program. I will make some high level observations, and you will be able to download the 1999 PowerPoint and examine the image in that document.

[Image: Gray diagram, 1998]

I want to keep the engineering jargon to a minimum. Half of my two to four Web log regulars are MBAs, and I have been asked to clarify or expand on a number of technical concepts. I will not provide that “deep dive” in my public Web log. Information of that type appears in my for-fee studies. If this offends you, please, stop reading. I have to make a decision about what is placed on the Web log as general information and what goes in the studies that pay for the blood-sucking leeches who assist me in my research.

The Diagram: High-Level Observations

The set up of Microsoft.com in 1999–if Mr. Gray’s diagram is accurate–shows islands of two types. First, there are discrete data centers; for example, the European Data Center, the Japan Data Center, and Building 11. Each of these appears to be a microcosm of the larger set up used in North America. The European and Japan Data Centers are identical in the schematic. I took this to mean that Microsoft had a “cookie cutter” model. This is a good approach, and it is one used by many online services today. Instead of coming up with a new design for each data center, a standard plan is followed. Japan is connected to the Internet with a high speed OC3 line. The European Data Center connection is identified as Ethernet. When you print out Mr. Gray’s Three Talks presentation, you will see that details of the hardware and the cost of the hardware are provided. For example, in the Japan Data Center, the SQL Server cluster uses two servers with an average cost of $80,000. I know this number seems high, but Microsoft is using brand name equipment, a practice which the material I have reviewed suggests continues in 2008.
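
The “cookie cutter” idea is easy to express in code. Here is an illustrative sketch, my own invention and not Mr. Gray’s, of one standard data center plan stamped out per region, with only the uplink varying. The $80,000 SQL Server figure comes from the deck; the web server count and price are invented.

```python
from dataclasses import dataclass

@dataclass
class DataCenterPlan:
    """One standard build, stamped out per region (illustrative numbers only)."""
    region: str
    uplink: str
    web_servers: int = 4
    sql_cluster_nodes: int = 2
    cost_per_sql_node: float = 80_000.0   # brand name hardware, per the 1999 deck

    def hardware_cost(self) -> float:
        # Invented web server price; the point is the identical layout, not the math.
        return self.sql_cluster_nodes * self.cost_per_sql_node + self.web_servers * 25_000

centers = [
    DataCenterPlan(region="Japan", uplink="OC3"),
    DataCenterPlan(region="Europe", uplink="Ethernet"),
]

for dc in centers:
    print(f"{dc.region}: uplink={dc.uplink}, est. hardware ${dc.hardware_cost():,.0f}")
```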

Second, there is a big island–a cluster of machines that provide database services. For example, there are “Live SQL Servers” with an average cost of $83,000, SQL Consolidators at a cost of $83,000, and a feeder local area network to hook these two SQL Server components together. I interpret this approach as a pragmatic way to reduce latency when hitting the SQL Server data stores for reading data and to reduce the bottlenecks that can occur when writing to SQL Server. Appreciate that in 1999, SQL Server lacked many of the features in the forthcoming SQL Server update. Database access is a continuing problem even today. In my opinion, relational databases or RDBMS are not well suited to handle the spikes that accompany online access. Furthermore, there is no provision I can see in this schematic for distributing database reads across data centers. We will return to the implications of this approach in a moment.
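
To show what distributing database reads across data centers could look like in principle, here is a toy sketch of my own; nothing like it appears in the 1999 design. Writes fan out from the primary to every regional replica, and each read is served from the replica closest to the user.

```python
# Hypothetical replica placement; the 1999 diagram shows no such read distribution.
REPLICAS = {"north_america": {}, "europe": {}, "japan": {}}
PRIMARY = "north_america"

def write(key: str, value: str) -> None:
    """Accept the write at the primary and push the same value to every regional replica."""
    for store in REPLICAS.values():
        store[key] = value

def read(key: str, user_region: str) -> str:
    """Serve the read from the replica nearest the user, falling back to the primary."""
    region = user_region if user_region in REPLICAS else PRIMARY
    return REPLICAS[region].get(key, REPLICAS[PRIMARY].get(key, ""))

if __name__ == "__main__":
    write("kb-article-42", "How to reinstall the widget")
    print(read("kb-article-42", "japan"))    # served locally, no trans-Pacific hop
```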

Third, notice that there are separate clusters of servers in an even bigger island, probably a big data center. Each performs a specific function. For example, there is a search cluster identified as “search.microsoft.com” and an ActiveX cluster identified as “activex.microsoft.com”. Presumably in a major data center, or possibly two data centers connected by a high speed line in North America, the servers are hard wired to perform specific functions. The connections among the servers in the data centers use a fiber ring, a very sophisticated and (in 1999 dollars) expensive technology, or more precisely the Fiber Distributed Data Interface. (FDDI is a 100 Mbps fiber optic LAN. It is an ANSI standard. It accommodates redundancy.) Microsoft’s own definition here says:

[The acronym] stands for Fiber Distributed Data Interface, a high-speed (100 Mbps) networking technology based on fiber optic cable, token passing, and a ring topology.

To me, the set up is pragmatic, but it suggests putting everything in one, maybe two, places. In 1999, demand was obviously lower than it is today. With servers under one roof, administration was simplified. In the absence of automated server management systems, technicians and engineers had to perform many tasks by walking up to a rack, pulling out the keyboard, and directly interacting with the servers.

Finally (there are many other points that can be explored, of course), note that one FDDI ring connects the primary node (not a good word, but the diagram shows the FDDI rings in this type of set up) to a secondary FDDI ring. Some services, such as home.microsoft.com and support.microsoft.com, are mirrored. Others, such as premium.microsoft.com and “ftp://ftp.microsoft.com”, are not.


Traditional Media: Will the World End in Fire or Ice?

July 12, 2008

Search systems index content. In the hoary past, name brand magazines, journals, and tabloids were the sources of choice. With the Guardian paying $30 million for a Web log, established media companies like the Guardian Media Group still have some fight in them. You can read the many views on this $30 million deal on BoomTown here. Word smither-ette par excellence Kara Swisher provides the meat that fattens many Web content recyclers like moi. I enjoyed PaidContent.org’s coverage of itself here. One sentence stuck in my short term memory as a nice summary of how a big outfit perceives itself:

We’re not keen on strategic investing because my position is very much a GMG – Guardian Media Group – view of the world, which is that with strategic investments and minority share holdings… you can’t really drive anything … you don’t really feel like it’s part of you and they can take up a lot of time and they can take a lot of trouble and you don’t have much impact.

Almost overlooked was some other news about traditional media. My favorite dead trees’ publication–the New York Times–published Richard Pérez-Peña’s essay “In Deepening Ad Decline, Sales Fall 8% at Magazines.” You can read the full text here. I don’t need to quote a sentence or two. The headline says it all.

My take on the sale of PaidContent.org is that it is great news for the owners. My take on traditional news and magazine publishing companies is that these outfits are a bit like sheep. Now that a couple of sheep are moving, the rest of the sheep will follow. Their smooth-talking business development managers will be the sheep dogs to keep the herd together and the skeptics away.

Well, too late for me. When the dust from Web log acquisition settles, the traditional newspapers and magazine companies will still be rolling up their pants and pantsuits as the red ink rises. Owning a Web log is not the same thing as making it work in a traditional publishing company’s business model.

Agree? Disagree? (Remember I used to work at a newspaper company and a large New York publishing outfit.) Data are more useful than sending me questions in a personal email (seaky 2000 @ yahoo dot com) or calling me and working really hard to change my mind.

Data make it clear: odds of long-term success in a Webby world are long. Bet on the Cubs instead.

Stephen Arnold, July 12, 2008

Open Text Closes on Spicer Slice

July 11, 2008

Open Text acquired privately-held Spicer Corp. You can read Christian Daems’s “Open Text Acquires Division of Spicer Corporation” here.

You may be wondering, “What’s an Open Text?” The company is a player in enterprise search. Among its search properties are an SGML database and search system, the Fulcrum search and retrieval system, BRS Search (a variant of IBM’s original STAIRS mainframe search system), and Information Dimensions’ BASIS data management and search system. In the first edition of Enterprise Search Report, I provided some background information on these systems, and I don’t know if the 4th edition, which I did not write, retained my original baseline on the company.

Open Text was a category leader in collaboration. I recall seeing a demonstration of the system in Washington, DC, many years ago. LiveLink is a content management, collaboration, and search platform. The company hopped on the email search and management bandwagon as soon as news of corporate fraud gained momentum.

What’s a Spicer? According to the Open Text news release, which you can read in its entirety here, the tasty part that Open Text bought is the

division that specializes in file format viewer solutions for desktop applications, integrated business process management (BPM) systems, and reprographics.

Spicer provides file viewing software. Instead of launching a third-party application to view a file in a results list, the Spicer technology displays the file without recourse to the native application. Advantages include speed because a native application like Adobe Acrobat is a fat little piggy, chomping memory and time. The other advantage is an opportunity to step away from Stellent’s Outside In viewing technology, which is getting more expensive with each license cycle. Spicer also has some security functions that Open Text wants. You can read the full Spicer story here.

This acquisition accompanies Open Text’s purchase of Corbis eMotion. This is an electronic media management tool, primarily used to keep track of images. Could Open Text be contemplating a push into enterprise publishing systems to compete with IBM and Hewlett Packard? If so, Open Text may want to buy Nstein and beef up its tagging capability.

What’s the connection with enterprise search? Not much in my opinion.

Open Text has become a mini-IBM, offering a range of products, services, and features. My thought is that search technology is not delivering the slices of bacon that Open Text’s management and stakeholders want. Furthermore, the competition in email and litigation support is increasing. The core content management system customers are pushing back because CMS is a mess for many customers and vendors. Upstarts like Brainware and ZyLAB are pushing into accounts once viewed as captive to Open Text’s unit managers. The collection of search technologies is difficult to explain, expensive to maintain, and confusing to some of the new Open Text hires whom I have encountered at trade shows this year.

Open Text, after Research in Motion and Coveo, is a darling of the Canadian high-tech sector. The Canadian government doesn’t want another Delphes-like misfire to tarnish the reputation Industry Canada and provincial governments work hard to communicate.

In my opinion, the Open Text buying spree delivers these benefits:

  1. Customers who can be given an opportunity to buy more Open Text products and services
  2. Media buzz which translates to investor communications
  3. Filling in gaps in order to make repositioning easier and more credible if the CMS push back becomes more aggressive.

I am probably an addled goose, but I find Open Text’s messaging about search muddled. Click this link to see a search for “search retrieval” on Open Text’s Web site with its own search system. I hope you find the results crystal clear. I don’t.

My working hypothesis is that when companies buy a number of search technologies, the cost of explaining each system is high, maybe as expensive as maintaining, supporting, and enhancing the menagerie of search systems.

Yahoo fell into this swamp. IBM is in the swamp as well. Microsoft has just waded in with a pack of search technologies. If my research is on target, the more search technologies a company collects, the less effective the company becomes in search, content processing, and text processing.

I think companies need to manage the search brands, messaging, and costs; otherwise, cost control becomes very difficult. Even worse, no customer knows what to buy when for which particular search problem. In my own experience, the engineers who have to keep these complex search systems working and in fighting trim are not given the time, resources, and freedom to make Rube Goldberg devices hum like a Toyota computer controlled welding machine. Customers want search to work, and my research suggests for most users search is a source of dissatisfaction.

With enterprise search getting close to a commodity function, some drastic MBA-type positioning is needed right after a finance type with a sharp pencil tallies the cost of search roll ups.

Agree? Disagree? Help me learn.

Stephen Arnold, July 11, 2008

More Artificial Intelligence: This Time Search

July 11, 2008

I remember when InfoWorld was a big, fat tabloid. I had to keep two subscriptions going because otherwise I would be summarily dropped. So, my dog at the time–Kelsey Benjamin–got one, and my now deceased partner, Ken Toth, got the other one. It was easy to spoof the circulation folks who had me fill out forms. I used to check my company size as $10 to $20 million and assert that I bought more than $1 million in networking gear.

Paul Krill wrote “Artificial Intelligence Tied to Search Future”, which appeared on July 12, 2008, on the InfoWorld Web site. You can read the story here. (Search is not a core competency of most publishing companies, so you may have to enlist the help of a gumshoe if this link goes dead quickly.)

The point of the well-written essay is that an IBM wizard asserts that artificial intelligence will be instrumental in advanced text processing.

No disagreement from me on that assertion. What struck me as interesting was this passage from the essay:

“We’re going to see in the next five years next-generation search systems based on things like Open IE (Information Extraction),” Etzioni said. Open IE involves techniques for mapping sentences to logical expressions and could apply to arbitrary sentences on the Web, he said.

The Etzioni referenced in the passage is none other than Oren Etzioni, director of the Turing Center at the University of Washington.

Why is this important?

Google and Microsoft hire the junior wizards from this institution, pay them pretty well, and let them do stuff like develop systems that use artificial intelligence. The only point omitted from the article is that smart software has been part of the plumbing at Google for a decade, but Google prefers the term “janitors” to “smartbots”. Microsoft in 1998 was aware of smart software, and the Redmonians have been investing in artificial intelligence for quite a while.

My point is that AI is not new, and it is not in disfavor among wizards. AI has been in disfavor among marketers and pundits. The marketers avoid the term because it evokes the image of SkyNet in Terminator. SkyNet is smart software that wants to kill all humans. The pundits over hyped AI years ago, discovered that smart software was useful in aircraft control systems (yawn) and in determining what content to cache on Akamai’s content delivery network servers (bigger yawn).

Now AI is back with zippier names–the essay includes a raft of them, and you can dig through the long list on page 2 of Mr. Krill’s essay. More important, the applications are ones that may mean something to an average Web surfer.

I must admit I don’t know what this means, however:

Etzioni emphasized more intelligent Internet searching. “We’re going to see in the next five years next-generation search systems based on things like Open IE (Information Extraction),” Etzioni said. Open IE involves techniques for mapping sentences to logical expressions and could apply to arbitrary sentences on the Web, he said.

If you know, use the comments section for this Web log to help me out. In the meantime, run a Google query from www.google.com/ig. There’s AI under the hood. Few take the time to lift it and look. Some of the really neat stuff is coming from Dr. Etzioni’s former students just as it has for the last decade at Google.
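
For what it is worth, here is my guess at what “mapping sentences to logical expressions” might look like at the toy level: a few hand-picked relation patterns that pull (subject, relation, object) triples out of simple declarative sentences. This is my own illustration, not Dr. Etzioni’s method, and real Open IE systems are far more general.

```python
import re

# Toy pattern: "X <relation phrase> Y" for a handful of hand-picked relations.
RELATION = re.compile(
    r"^(?P<subj>[A-Z][\w\s]*?)\s+"
    r"(?P<rel>acquired|is headquartered in|was founded by|develops)\s+"
    r"(?P<obj>[\w\s.]+?)\.?$"
)

def extract_triples(sentences):
    """Return (subject, relation, object) triples from simple declarative sentences."""
    triples = []
    for sentence in sentences:
        match = RELATION.match(sentence.strip())
        if match:
            triples.append((match["subj"].strip(), match["rel"], match["obj"].strip()))
    return triples

if __name__ == "__main__":
    sample = [
        "Google acquired YouTube.",
        "Microsoft is headquartered in Redmond.",
        "The Turing Center develops open information extraction systems.",
    ]
    for triple in extract_triples(sample):
        print(triple)
```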

Stephen Arnold, July 10, 2008
