Email or Search: Which Wins the Gold

August 18, 2008

My son (Erik Arnold) runs a nifty services firm called Adhere Solutions. He’s hooked up with Google, and he views the world through Googley eyes. I (Stephen Arnold) run the addled goose outfit ArnoldIT. Google does not know I exist, and if Googzilla did, the Mountain View giant would make a duvet from my tail feathers.

The setting: we’re sitting in a cafeteria. The subject turns to the killer application for today’s 20-something. Is it email (the Brett Favre of online) or search (the Michael Phelps of cloud services)? My son and I play this argument MP3 file frequently, and our wives have set down specific rules for these talks. First, we have to be by ourselves. Second, we have to knock off the debate after 30 minutes or so. Erik and I can extend analytic discussions of digital theory over years, and we have marching orders to knock that off.

Here’s the argument. Erik asserts that search is the new killer app. I agree, but I tell him I want to make a case for email as long as I can extend it to SMS and newer services under the category “Twitterish.” He agrees.

My Argument: Messaging

Messaging is communications. Search is finding and discovering. Therefore, the need to communicate is higher on the digital needs scale than simple finding. With services that allow me to call, text, create mini blogs, and broadcast brief Tweets, I am outputting and receiving messages that are known to be:

  • Important. I don’t text a client to tell her what I had for lunch in the wonderful cafeteria. Grilled cheese, as it turns out. Important to me, but to no one else. I send important messages that have an instrumentality.
  • Timely. I control the delivery time, matching urgency with medium. I sent a fax last week. What a hassle, but the message warranted a tangible copy, not urgent delivery. I want to dial in the “time” function, not leave it to chance or to some other authority.
  • Content rich. I write baloney, but I wouldn’t write baloney unless it were important to me and to the recipient of one of my messages, articles, or 350-page studies.

In conclusion, messaging, particularly electronically implemented messaging, is the killer app. Search is useful, but it is not one-to-one, one-to-many, many-to-one, or many-to-many communication. By definition, search is not timely, is of uncertain importance, and is often not content rich due to format, editorial policy, or the vapidity of the data.

My Son’s Argument

Messaging is not necessarily digital. Though messaging is crucial, when we talk about an online killer app, email is not it. The killer app must deliver a function that we can’t duplicate in the analogue world. For that reason, search is the killer application for the 21st century. Here’s why:

Read more

Microsoft Cloud Economics

August 17, 2008

Richi Jennings is an independent consultant and writer, specializing in email, spam, blogging, and Linux. His article “On Microsoft Online Services” is worth reading. You can find it here. His assertion is that Microsoft’s pricing for its online services will weaken the service. Mr. Jennings identifies information technology managers’ lack of knowledge about the cost of running machines and software on premises. He notes:

vendors would tell potential purchasers that they [the vendors] could provide the service for less money than it was currently costing to run it in-house, but when it came time to actually quote for the service, most IT managers simply didn’t believe it cost them that much.

The point is that basic knowledge of what enterprise software costs may be a factor in the success or failure of cloud services. He contrasts Microsoft’s online service pricing with Google’s. Google is less expensive. A happy quack to Mr. Jennings for this analysis.

Stephen Arnold, August 17, 2008

Wired Weighs in about Google and Privacy

August 16, 2008

Much of the information in the article by Ryan Singel in Wired here has been floating around at conferences and in lunch conversations for almost a month. Mr. Singel, in his “Google Privacy Practices Worse than ISP Snooping, AT&T Charges”, pulls together threads about AT&T’s view of Google here. You will want to read the article. For me, the most interesting point was this quote from the reassembling Ma Bell:

AT&T does not at this time engage in practices that allow it to track a consumer’s search and browsing activities across multiple unrelated websites for the purpose [of] developing a profile of a particular consumer’s online behavior.

Permit me to offer several personal observations about the notion of monitoring by companies that intermediate digital flows:

  1. Monitoring can be defined narrowly or broadly. The fact is that monitoring is performed at multiple points by multiple parties. Without precise definitions, assertions about what an intermediary does or does not do are subject to interpretation.
  2. Intermediaries want to know about users for the purpose of “owning” the customer. In the present environment, security and ad monitoring are “in addition to,” not “instead of,” a long-standing characteristic of intermediaries: gathering information in order to “serve” customers better.
  3. Today any intermediary can use a variety of mechanisms to monitor, track, and exploit tracking data. These data can be fine-grained; that is, tied to a specific user with a stateful session. Alternatively, an anonymous user can be placed in one or more clusters and then be “refined” as more data arrive; see the sketch after this list.
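To make point three concrete, here is a minimal Python sketch of how an anonymous visitor might be dropped into broad clusters and refined as events arrive. The cluster names, trigger terms, and threshold are my inventions for illustration; no intermediary’s actual pipeline is this simple.

```python
# A minimal sketch of point 3: an anonymous visitor lands in broad
# clusters and is "refined" as tracking events arrive. The cluster
# names, trigger terms, and threshold are invented for illustration.
from collections import Counter

CLUSTERS = {
    "travel": {"flight", "hotel", "rental"},
    "finance": {"loan", "mortgage", "broker"},
    "sports": {"scores", "olympics", "playoffs"},
}

class AnonymousProfile:
    """Accumulates tracking signals for one cookie or session ID."""

    def __init__(self, visitor_id: str) -> None:
        self.visitor_id = visitor_id
        self.signals = Counter()

    def observe(self, query_terms: list[str]) -> None:
        # Each observed query nudges the profile toward matching clusters.
        terms = set(query_terms)
        for name, triggers in CLUSTERS.items():
            self.signals[name] += len(triggers & terms)

    def clusters(self, min_signal: int = 2) -> list[str]:
        # The profile sharpens as more data arrive; weak guesses drop out.
        return [c for c, n in self.signals.most_common() if n >= min_signal]

profile = AnonymousProfile("cookie-1234")
profile.observe(["cheap", "flight", "paris"])
profile.observe(["hotel", "rental", "paris"])
print(profile.clusters())  # ['travel'] once enough signal accumulates
```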

Wired has taken an important step. More information about the data models in use for usage data is needed. More information about the tracking and usage methods available to large intermediaries is also needed. Finally, with the emergence of “janitor” technology that can automatically clean up ambiguities, more information about this suggestive innovation is needed as well. I want more information, not just assertions.

Stephen Arnold, August 16, 2008

The Future of Search Layer Cake

August 14, 2008

Yesterday I contributed a short essay about the future of search. I thought I was being realistic for the readers of AltSearchEngines.com, a darn good Web log in my opinion. I also wanted to be friskier than the contributions from SearchEngineLand.com and Hakia.com. I’m not an academic, and I’m not in the search engine business. I do competitive technical analysis for a living. Search is a side interest, and prior to my writing the Enterprise Search Report, no one had taken a comprehensive look at a couple dozen of the major vendors. I now have profiles on 52 companies, and I’m adding a new one in the next few days. I don’t pay much attention to the university information retrieval community because I’m not smart enough to figure out the equations anymore.

From the number of positive and negative responses that have flowed to me, I know I wasn’t clear about my focus on behind-the-firewall search and Google’s enterprise activities. This short post is designed to put my “layer cake” image into context. If you want to read the original essay on AltSearchEngines.com, click here. To refresh your memory, here’s the diagram, which in one form or another I have been using in my lectures for more than a decade. I’m a lousy teacher, and I make mistakes. But I have a wealth of hands-on experience, and I have the research under my belt from creating and maintaining the 52 profiles of companies engaged in commercial search, content processing, and text analytics.

[Diagram: the future of search “layer cake”]

I’ve been through many search revolutions, and this diagram explains how I perceive those innovations. Furthermore, the diagram makes clear a point that many people do not fully understand until the bills come in the mail: over time, search gets more expensive. A lot more expensive. The reason is that each “layer” is not necessarily a system from a single vendor. The layers show that an organization rarely rips and replaces existing search technology. So, no matter how lousy a system, there will be two or three or maybe a thousand people who love the old system. But there may be one person or 10,000 who want different functionality. The easy path for most organizations is to buy another search solution or an “add in” or “add on” that in theory brings the old system closer to the needs of new users or to different business requirements.
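To make the bills-in-the-mail point concrete, here is a back-of-envelope sketch in Python. Every dollar figure is invented; the compounding, not the numbers, is the point.

```python
# The layer cake in miniature: each layer adds a license fee and an
# annual upkeep bill, and the old layers keep billing because nothing
# gets ripped and replaced. All dollar figures are invented.
layers = [
    # (layer, one-time license, annual upkeep)
    ("legacy keyword engine",     150_000, 30_000),
    ("taxonomy add-on",            60_000, 15_000),
    ("entity-extraction add-in",   80_000, 20_000),
    ("new 'semantic' front end",  200_000, 50_000),
]

sunk = 0
run_rate = 0
for name, license_fee, upkeep in layers:
    sunk += license_fee    # nothing gets ripped and replaced
    run_rate += upkeep     # every retained layer keeps costing money
    print(f"after {name}: ${sunk:,} sunk, ${run_rate:,}/year to run")
```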

Read more

Daticon’s Invenio

August 14, 2008

eDiscovery continues to bubble despite the lousy economy in North America. Several weeks ago we started the update procedure for our profiles of eDiscovery vendors. I made a mental note to post a short item about Daticon, a company supporting organizations engaged in electronic discovery. You can learn more about this company here. What interests me is the firm’s search technology, called Invenio. The technology is based on a neural network, and when I reviewed the system, some of its features reminded me of an outfit called Dolphin Search, but I may be wrong on this point. If Invenio is Dolphin Search, let me know.

Invenio is integrated with Daticon’s management tools. These tools are among the most fine-grained I have seen. Once deployed, a manager can track most of the metrics associated with processing, reviewing, and screening email, documents, and other content associated with eDiscovery processes.

Here’s a representative display of system metrics.

[Screenshot: Invenio system metrics dashboard]

There are similarities between Daticon’s approach and that of other eDiscovery specialists such as Stratify and Zantaz. Daticon bundles eDiscovery with workflow, data capture, metrics, and a range of content processing functions.

The search and content processing system supports concept searching, duplicate detection and removal, email threading, non-text objects, and case management tools. Essentially this is a case management function that allows analysis of the activities associated with a matter.
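Daticon does not disclose how Invenio implements these functions, so here is a generic, minimal Python sketch of just one of them: exact-duplicate removal via content hashing. This is the textbook approach, not necessarily the method inside Invenio.

```python
# A generic sketch of one listed function, exact-duplicate removal via
# content hashing. Textbook approach, not Daticon's disclosed method.
import hashlib

def dedupe(documents: dict[str, bytes]) -> tuple[dict[str, bytes], list[str]]:
    """Return the unique documents plus the IDs dropped as duplicates."""
    seen: dict[str, str] = {}        # content digest -> first document ID
    unique: dict[str, bytes] = {}
    dropped: list[str] = []
    for doc_id, body in documents.items():
        digest = hashlib.sha256(body).hexdigest()
        if digest in seen:
            dropped.append(doc_id)   # byte-identical to an earlier file
        else:
            seen[digest] = doc_id
            unique[doc_id] = body
    return unique, dropped

docs = {
    "msg-001.eml": b"Please review the contract.",
    "msg-002.eml": b"Please review the contract.",  # exact duplicate
    "msg-003.eml": b"Revised contract attached.",
}
unique, dropped = dedupe(docs)
print(sorted(unique), dropped)  # ['msg-001.eml', 'msg-003.eml'] ['msg-002.eml']
```

Real eDiscovery systems go further with near-duplicate detection and email threading, but the review-cost savings come from the same idea: never make a human read the same content twice.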

The company makes an interesting series of demonstrations available. I did not have to register to get walk-throughs of the Invenio system. Try them yourself by clicking here.

Stephen Arnold, August 14, 2008

Autonomy Lands Fresh Content Tuna

August 13, 2008

Northcliffe Media, a unit of the Daily Mail and General Trust, publishes a score of newspapers, mostly in the UK. Circulation is nosing towards a million. Northcliffe also operates a little more than two dozen subscription and ad-supported weekly regional tabloids and produces 60 ad-supported free weekly shopper-type publications. The company also cranks out directories, runs a printing business, and is in the newsstand business. The company, whose tag line is “At the heart of all things local”, has international interests as well. Despite the categorical affirmative “all”, I don’t think Northcliffe operates in Harrod’s Creek, Kentucky. Perhaps it should?

Autonomy, the Cambridge-based search vendor, and Okana (an Autonomy partner) have landed the Northcliffe Media search business: a big content tuna. Okana describes big clients like Northcliffe as “platinum”, not “tuna”, but I wanted to keep my metaphor consistent, and Okana rhymes with tuna.

Okana was Autonomy’s “Innovative Partner of the Year” in 2007. Okana says, “Based around proven architectures, Okana’s range of Autonomy appliance products provide instantly deployable and scaleable [sic] solutions for Information Discovery, Monitoring, and Investigation.Compliance.”

Okana’s description of its offering as an “appliance” was new information to me. Also, my research suggests that many Autonomy IDOL installations benefit from training the IDOL system. If training is required, then I ask, “What, if any, trade-off is required for an instant Autonomy IDOL deployment?” If anyone has details for me, please use the comments section of this Web log.

Autonomy’s IDOL (Intelligent Data Operating Layer) will index an initial 40 million documents and then process new content. The new system will use Autonomy’s “meaning-based computing” for research, information retrieval, and trend spotting.

You can read the complete Autonomy news release here. Once again Autonomy is making sales as word reaches me of competitors running into revenue problems. Autonomy’s a tuna catcher while others seem fated to return to port with empty holds.

Stephen Arnold, August 13, 2008

The Future of Search? It’s Here and Disappointing

August 13, 2008

AltSearchEngines.com, an excellent Web log and information retrieval news source, tapped the addled goose (me, Stephen E. Arnold) for some thoughts about the future of search. I’m no wizard, being a befuddled fowl, but I did offer several hundred words on the subject. I even contributed one of my semi-famous “layers” diagrams. These are important because each layer represents a slathering of computational crunching. The result is an incremental boost to the underlying search system’s precision, recall, interface outputs, and overall utility. The downside is that as the layers pile up, so do complexity and its girlfriend, cost. You can read the full essay and look at the diagram here. A happy quack to the AltSearchEngines.com team for [a] asking me to contribute an essay and [b] having the moxie to publish the goose feathers I generate. The message in my essay is no laughing matter. The future of search is here, and in many ways it is deeply disappointing and increasingly troubling to me. For an example, click here.

Stephen Arnold, August 13, 2008

MarkLogic: The Army’s New Information Access Platform

August 13, 2008

You probably know that the US Army has nicknames for its elite units: Screaming Eagles, Big Red One, and my favorite, “Hell on Wheels.” Now some HUMINT, COMINT, and SIGINT brass may create a MarkLogic unit with its own flash. Based on the early reports I have, the MarkLogic system works.

Based in San Carlos (next to Google’s Postini unit, by the way), MarkLogic announced that the US Army Combined Arms Center (CAC) at Ft. Leavenworth, Kansas, has embraced MarkLogic Server. BCKS, shorthand for the Army’s Battle Command Knowledge System, will use this next-generation content processing and intelligence system for the Warrior Knowledge Base. Believe me, when someone wants to do you and your team harm, access to the most timely, on-point information is important. If Napoleon were based at Ft. Leavenworth today, he would have this unit report directly to him. Information, the famous general is reported to have said, is nine-tenths of any battle.

Ft. Leavenworth plays a pivotal role in the US Army’s commitment to capture, analyze, share, and make available information from a range of sources. MarkLogic’s technology, which has the Department of Defense Good Housekeeping Seal of Approval, delivers search, content management, and collaborative functions.


An unclassified sample display from the US Army’s BCKS system. Thanks to MarkLogic and the US Army for permission to use this image.

The system applies metadata based on the DOD Metadata Specification (DDMS). Content is managed automatically through metadata properties such as the “Valid Until” date, following the schema standard adopted by the DOD community. MarkLogic Server manages the workflow until the file is transferred to archives or deleted by the content manager. MarkLogic points to savings in time and money. My sources tell me that the system can reduce the risk to service personnel. So, I’m going to editorialize and say, “The system saves lives.” More details about BCKS are available here. Dot Mil content does move, so click today. I verified this link at 0719, August 13, 2008.
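The real work happens inside MarkLogic Server, presumably in XQuery against DDMS elements. As a language-neutral illustration of the retention decision described above, here is a minimal Python sketch; the field names and the routing rule are my assumptions, not the Army’s configuration.

```python
# A sketch of the retention decision described above: content carries a
# DDMS-style "Valid Until" property, and expired files are archived or
# deleted. Field names and routing rule are assumptions; the production
# system does this inside MarkLogic Server, not in Python.
from datetime import date, datetime

def route_document(metadata: dict) -> str:
    """Decide what the workflow does with one managed document."""
    valid_until = datetime.strptime(metadata["valid_until"], "%Y-%m-%d").date()
    if date.today() <= valid_until:
        return "keep"              # still valid: remains searchable
    # Expired: archive if flagged for retention, otherwise queue the
    # file for deletion by the content manager.
    return "archive" if metadata.get("retain") else "delete"

doc = {"title": "Convoy lessons learned",
       "valid_until": "2008-06-30",
       "retain": True}
print(route_document(doc))  # 'archive' once the Valid Until date passes
```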

Read more

Is There a Mainframe in Your Future

August 13, 2008

Brian Womack’s article “Big Iron Anything But Rusty For Mainframe Pioneer IBM” brought a tear to my eye. Writing in Investor’s Business Daily here, Mr. Womack says:

IBM says revenue for its mainframe business rose 32% in the second quarter compared with a year earlier, easily outpacing overall sales growth of 13%. A big driver was February’s launch of IBM’s next-generation mainframe line, the z10, its first big upgrade since 2004. IBM spent about $1.5 billion on the new line.

The core of the article is an interview with David Gelardi, a 52-year-old mainframer. I don’t want to spoil your fun. I love mainframers who explain why big iron is as trendy as Heidi Klum’s catchphrase, “One day you’re in. And one day you’re out.” For example, consider this comment by Mr. Gelardi:

If I take (1,500 Intel) servers . . . and put them on a single mainframe, I’ll have no performance problems whatsoever. But I’m taking all of that workload that was on 1,500 separate servers and consolidating them on one mainframe. While it may be a million-dollar machine and up, it’s actually cheaper than those 1,500 servers.

This is pretty compelling data. I wonder whether Google is aware of what it might gain if it were to abandon its decade of effort with commodity servers. Google and IBM are best buddies now. Maybe IBM will convince the GOOG to change its ways? Is there a mainframe in your future?
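Before you answer, here is a back-of-envelope check on Mr. Gelardi’s claim in Python. Only the 1,500-server count and the “million-dollar machine and up” price come from the article; every other figure is my assumption.

```python
# A back-of-envelope check on the consolidation claim. Only the
# 1,500-server count and the "million-dollar machine and up" price come
# from the article; every other figure is an assumption.
servers = 1_500
server_capex = 3_000               # assumed cost per commodity server
server_opex_per_year = 1_000       # assumed power, space, admin per server
mainframe_capex = 2_000_000        # "a million-dollar machine and up"
mainframe_opex_per_year = 300_000  # assumed

years = 3
scale_out = servers * (server_capex + server_opex_per_year * years)
big_iron = mainframe_capex + mainframe_opex_per_year * years

print(f"1,500 servers over {years} years: ${scale_out:,}")  # $9,000,000
print(f"one mainframe over {years} years: ${big_iron:,}")   # $2,900,000
```

Under these invented numbers big iron wins handily; nudge the per-server figures down and it loses. The sensitivity to costs nobody tracks is the real story.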

Stephen Arnold, August 13, 2008

More Search without Search

August 13, 2008

Google wizard Stephen R. Lawrence and sub-wizard Omar Khan invented what I, probably too simplistically, characterize as a metadata vacuum cleaner. Useful for mobile devices, this addition to Google’s “search without search” arsenal is quite interesting to me. The invention is disclosed in US7,412,708, granted on August 12, 2008, with the title “Methods and Systems for Capturing Information.” If you are interested in how Google can deliver information before a user types a query, or in what type of data Google captures, you will want to read this 14-page document. Think email addresses and more.

The invention is not new, which is important: the GOOG is slow to integrate whizzy new monitoring technology into its public-facing systems. This invention was filed on March 31, 2004. Figuring nine to 12 months of work before the filing, I think this is an important chunk of Google’s metadata vacuum cleaner. I cover a number of these inventions in Google Version 2.0. I discussed one exemplary data model for usage tracking data in my for-money July/August column for KMWorld. I won’t rehash those documents in this Web log article. You can download a copy of the document from the good old USPTO here. Study those syntax examples. That wonderful USPTO search engine is a treat to use.

What’s this invention do? Here’s the official legal eagle and engineer description:

Systems and methods for capturing information are described. In one embodiment, an event having an associated article is identified, article data associated with the article is identified, and a capture score for the event is determined based at least in part on article data. Article data can comprise, for example, one or a combination of a location of the article, a file-type of the article, and access data for the article. Event data associated with the event is compiled responsive at least in part to a comparison of the capture score and a threshold value.
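In plain terms, and at the risk of oversimplifying a patent claim, here is a toy Python sketch of the mechanism: score an event from its article’s location, file type, and access data, and compile event data only when the score clears a threshold. The weights, categories, and threshold are my inventions; the patent does not disclose Google’s actual values.

```python
# The quoted claim reduced to a toy: score an event from its article's
# location, file type, and access data; compile event data only when
# the score clears a threshold. Weights, categories, and the threshold
# are invented; the patent does not disclose Google's actual values.
LOCATION_WEIGHT = {"inbox": 0.5, "desktop": 0.3, "temp": 0.0}
FILE_TYPE_WEIGHT = {"email": 0.4, "document": 0.3, "cache": 0.1}

def capture_score(article: dict) -> float:
    score = LOCATION_WEIGHT.get(article["location"], 0.1)
    score += FILE_TYPE_WEIGHT.get(article["file_type"], 0.1)
    score += min(article["access_count"], 10) * 0.05  # access data
    return score

def maybe_capture(event: dict, threshold: float = 0.8):
    """Compile event data only when the capture score beats the threshold."""
    score = capture_score(event["article"])
    if score < threshold:
        return None  # event judged not worth capturing
    return {"event": event["kind"],
            "score": round(score, 2),
            "article_location": event["article"]["location"]}

event = {"kind": "open",
         "article": {"location": "inbox", "file_type": "email",
                     "access_count": 7}}
print(maybe_capture(event))  # score 1.25 clears 0.8, so the event is kept
```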

The GOOG’s Gmail plumbing may need some patch-ups, but once those pinhole leaks are soldered, US7,412,708 portends some remarkable predictive services. I can barely type on my mobile phone’s keyboard now, so Google knows that I will be one of the people eager to let it anticipate my needs. I wonder if there’s a link analysis routine running across those extracted metadata. I think I need to reread this patent document one more time. Join me?

Stephen Arnold, August 13, 2008

