Open Source: Another Pre-Quake Tremor

June 10, 2008

Dzone (Javalobby department) has an excellent interview that suggests an unexpected open source casualty: software tools. The Web log post is chock full of useful information plus some revelatory screen shots. The post runs to four pages, with links and comments. The subject of the interview is John De Goes, president of N-Brain, a firm that creates UNA, a source code editor. UNA at this time is free. The idea is that a better mousetrap is free, and it may put increasing pressure on the companies in the tools business.

Another key in this nice piece of work is a comment from a user of the open source tool UNA:

UNA is a special platform. Anyone who knows how I code and run projects understand[s] how bold a statement that is for me. Why? I very much believe in the solo hacking til it works. UNA is about group – real time collab. I usually hate group collab on code and design because the communication and miscommunication gets in the way. UNA is different because the collaboration is weirdly seamless and actually real time – you all see the same things, you chat inline, code completion just works, everything is tracked, and never once does the group feature take precedence over just coding. …I sure hope the Visual Studio, Netbeans, Eclipse, Zend, Codeworks, and Nusphere folks pay attention to this and either integrate or buy N-brain[‘s technology]. Seriously, the system is that cool.

I interpreted the interview and this biting observation as meaning that open source programming tools are likely to take increasingly large bites of the proprietary software tools market. Why’s this important? Lucene, Nutch, FLAX, and other open source search systems are likely to have a similar impact over time. Read this interview, please.

Stephen Arnold, June 10, 2008

Amazon and Cloud Computing

June 10, 2008

Intelligent Enterprise’s very good journalist Doug Henschen has an excellent interview with Amazon’s Adam Selipsky here. The interview is meaty, running to three sections. (Snag this now. Content can be tricky to find after a day or two.) Mr. Henschen has made an effort to capture the detail that many editors deem irrelevant. This interview is a keeper if you are a collector of Amazon business thought.

Two key points struck me as I was reading Amazon’s VP of product management’s answers to Mr. Henschen’s questions. Let me highlight these two points and then offer several observations.

The first point is this comment by Mr. Selipsky:

I’d also say that part of the whole point of computing in the cloud is that you don’t have to worry about where the resources are. If you decide to do enterprise backup in S3 — which is a great application that we’re seeing more and more of, by the way — do you really need to know exactly where and when those copies are replicated and how our replication works? That’s time you could be spending on something else. What you want to know is that we’re never going to lose the data, that your people are going to be able to access the data whenever they want, and that it’s secure.

As I understand this comment, Mr. Selipsky wants to shift a customer’s concern from the physical location of data to comfort that “we’re never going to lose the data.” I recall that one of my college professors explained the notion of categorical affirmatives and categorical negatives. In my aging Kentucky brain I have boiled down that learning to one precept: Never say never. I think Mr. Selipsky is saying “never”.

The second point is this exchange between Mr. Henschen and Mr. Selipsky. I have marked Mr. Henschen’s question with the label “question” and Mr. Selipsky’s response as “answer”:

Question: Would you say that Internet-based, customer-facing businesses like Amazon are at the center of the target and that enterprises business might be ceded, one day, to cloud vendors that specialize in enterprise services?

Answer: I would absolutely not say that. We had Fortune 500 companies using Amazon Web Services literally from the day we launched S3. It’s true that the majority of our early usage was from smaller companies and startups because they tend to be a little more risk tolerant. We thought it would take several years to get to the enterprise stage in a meaningful way, but we’ve actually been rather surprised at how quickly enterprise adoption and interest have accelerated.

As I understand Mr. Selipsky’s remark, Amazon is prepared to compete with firms yet to enter the cloud computing arena. For example, there’s the loose tie up between Google and Salesforce.com. There’s the rumor of interest in utility services from Verizon, AT&T, and other telecommunications companies. I’ve heard chatter that Cisco and Intel, as unlikely as this seems to me, are thinking about cloud computing. Most of these companies have the potential to make a monopoly play in cloud computing. Also, I think I see another categorical in Mr. Selipsky’s remark: “absolutely”. I find this an interesting word choice in a market sector that is in its infancy.

Observations

Let me wrap up with several unasked for remarks:

  1. I find categorical affirmatives and negatives jejune. Maybe the word choice is colloquial and unintentional, but I think it shows the shallowness of Amazon’s positioning of its Web services. It’s tough for me to accept absolutes and categoricals when so much is uncertain in this cloud computing space.
  2. My recollection is that Amazon’s core service suffered service outages on Friday, June 6, 2008, and then again on Monday, June 9, 2008. The information about the problem has been sparse. Regardless of the cause, these recent problems suggest that Amazon’s engineering has some details to which it must attend.
  3. Certain services naturally tend toward monopolies. For example, I don’t buy my electric power in rural Kentucky from the cheapest vendor. I use the only vendor, a non-US outfit with control of the local power company. I think cloud computing may share some DNA with the pre-break up AT&T; that is, it will make economic sense for one company to control the market. Because cloud computing is an engineering problem, I think the winner will be a company that can do great engineering economically. A system failure doesn’t translate in my mind to great engineering.

Agree? Disagree? Let me know. Use the comments section to push back or provide additional insight. This is a Web log, so weigh in.

Report about Amazon technical issue from Network World is here.

Stephen Arnold, June 10, 2008

Google Economics: Innovation + Scalability = Success

June 10, 2008

Eric Schmidt’s talk at the tony Economic Club of Washington caught the attention of journalists, Web log authors, and assorted pundits. You can find a plethora of links and news stories on Google News, Topix.net, and NewsNow.co.uk. The write up that caught my eye is the one in the Los Angeles Times’s Web log. An essay written by Jim Puzzanghera, whom I don’t know, struck me as the pick of the litter. The write up is quite long, and I urge you to download it here and read it. (My experience with the search functions on traditional media’s Web sites is generally negative. If the link goes dead, that’s par for the course.)

Two points in the article caught my attention. Let me highlight each and then offer several observations. Now, the hot stuff in the article by Mr. Puzzanghera.

First, Mr. Puzzanghera quotes Mr. Schmidt as saying:

It is possible to build a culture around innovation. It is possible to build a culture around leadership. And it is possible to build a culture around optimism. Google is an example, but by no means the only example, of a culture that can be built based on relatively scalable principles. We could run our country this way. We could run the world this way.

My understanding of this statement is that Google is an innovator interested in “scalable principles”. A “scalable principle” as I understand it means that something can get big and then bigger. Knitting together the notions of innovation and scaling, we get a mathematician’s view of how to run the railroad. The only downside to fusing these two ideas is that there’s not much room for folks who don’t want to innovate (most businesses and the US government to cite two examples) or scale (people fearful of getting too big because that’s a lot of work). I can see the audience attendees shifting in their chairs and looking at one another as if to ask, “What’s this fellow talking about?” Mr. Schmidt is providing a clear statement of what makes Google tick. The problem is that most Economic Club members don’t understand Google as anything other than a Web search company and ad company.

The second point is the last sentence of Mr. Puzzanghera’s article. In reference to Mr. Schmidt’s statement “Let’s be revolutionaries”, Mr. Puzzanghera writes:

Those sound like the words of someone who might be considering a run for higher office one day, assuming Google isn’t running everything by then.

What struck me about this comment is that Google’s senior manager is using the lingo of a diplomat as shaped by a speech advisor who wants to leave an audience with a call to action. The “Let’s be revolutionaries” line is a new twist to Google’s public persona. Mr. Puzzanghera nails it. Google is positioning itself and maybe its management team to take a more proactive leadership role. If I’m right, Google is going to be doing more talking about its technology, its implications, and its way of doing business.

Observations

Google’s spring transparency offensive continues into early summer. Googlers, previously a secretive cabal, are turning up on podcasts (Steve Gillmor landed the big Google API fish Mark Lucovsky) and Web logs (Datawocky’s Anand Rajaraman shared some Googley insights from artificial intelligence guru Peter Norvig). An associate in Israel spotted Sergey Brin chatting about green energy and data here. Even this goose’s Web log received a comment from an alleged Googler about the company’s transparency here.

Several other thoughts triggered by Mr. Puzzanghera’s write up are warranted:

  1. Google is talking more, and I think as this information becomes more widely available, public perception of Google may evolve, and quickly. The one-dimensional view of Google as a search company in the business of selling online ads may give way to a multi-dimensional view of the GOOG.
  2. The leadership message is a signal to me that Google wants to move from the shadows into the mainstream of business and political influence. After the missteps in Washington when one of the Googlers visited senators and congressmen in a sparkly T-shirt and sneakers, Google is starting to understand how the non-mathematical world works. That’s a big shift in the last three years.
  3. The greater openness leaves the company open to charges of doing too many things. Microsoft may be the beneficiary of Google’s chattiness. Assume Google continues to explain its view of business, leadership, economics as a blend of innovation and scalability. Google becomes more vulnerable to criticism.

How will Google deal with the consequences of talking more? Let me know your thoughts.

Stephen Arnold, June 10, 2008

Panorama: A Google Surfer

June 9, 2008

A colleague in the UK sent me the text of a Datamonitor Computerwire with the title “Google Sparks Analytics as a Service Push”. I tracked down a version of the item sent to me here. I have a sneaking suspicion that the news item will not be available online very long. Companies with eight syllables in their names charge for their information.

The key point in the write up from my point of view is:

Google partner Panorama Software in March unveiled a set of analytics gadgets for Google Docs, Google’s personal productivity and collaboration tools that are offered on a software-as-a-service model. On the back of that development will soon come Panorama’s PowerApps, an analytical engine for the web or OLAP 2.0, which when released later this year will enable ISVs and software developers to build and extend analytical applications using the power of cloud computing. The platform will offer APIs to create OLAP cubes as well as deliver and create customized reports from within Google applications.

Panorama Software is one of a large number of vendors in the analytics business. Unlike many of those firms, Panorama has embraced Googzilla. In my opinion, Panorama’s management has figured out what wave to ride, particularly when it comes to enterprise applications delivered from the cloud.

One interesting fact about Panorama is that it sold its OLAP technology to Microsoft in 1996. You can obtain Panorama’s system as SQL Server Analysis Services, which is integrated into the SQL Server database platform. The company’s embrace of Google suggests that Panorama’s management has found another perfect wave. Google seems content to let companies that “get it” surf the Google-generated opportunities at least for now. You can learn about Panorama’s services here.

Stephen Arnold, June 10, 2008

More Funny Numbers: Enterprise 2.0

June 9, 2008

The Industry Standard’s article “Momentum, Some Confusion Mark Enterprise 2.0” does a good job explaining why buzz words are dragging information technology managers into a quagmire. You can find the very good article here. A number of points jumped out at me. I want to highlight two that struck me as particularly important and offer one observation.

The first point is this bit of information in the middle of the write up:

A recent study by the research firm AIIM found that while nearly half of respondents said Enterprise 2.0 is “imperative or significant to corporate goals and objectives,” 74 percent said they have “only a vague familiarity or no clear understanding” of the concept.

I found this downright amazing.

The second point is this fact about a market that people have “only a vague familiarity or no clear understanding” of:

And one prediction has the space ready to explode. Forrester Research recently said that enterprise spending on technologies such as social-networking platforms and RSS feeds will grow to US$4.6 billion by 2013, from $764 million in 2008.

That’s a lot of revenue from a concept most people, including this rural Kentucky boy, don’t understand.
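
A quick way to see how steep that forecast is: the sketch below computes the compound annual growth rate implied by the two Forrester figures quoted above. The function name is hypothetical, and the five compounding years (2008 to 2013) are an assumption; the article does not describe how Forrester built its estimate.

```python
def implied_annual_growth(start_value, end_value, years):
    """Return the compound annual growth rate linking two values.

    Solves end_value = start_value * (1 + rate) ** years for rate.
    """
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Figures quoted from the article: $764 million in 2008, $4.6 billion by 2013.
# Treating the span as five compounding years is an assumption.
forrester_rate = implied_annual_growth(764e6, 4.6e9, 2013 - 2008)
print(f"Implied annual growth: {forrester_rate:.1%}")
```

Under those assumptions the forecast requires growth of roughly 43 percent a year, sustained for five years.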

My observation: economic pressures are forcing conference organizers to go to great lengths to create buzz. In a deteriorating economic climate, I think the score is attendees 0, conference marketers 2.

Stephen Arnold, June 10, 2008

Exalead: Enterprise Search Heats Up

June 9, 2008

Exalead, founded by former AltaVista.com guru François Bourdoncle, has named a US president for the privately held search and content processing company. Paul Doscher will work from Exalead’s San Francisco office. Mr. Doscher is an experienced executive. He will tackle Exalead’s OEM strategy, assist the online business market launch, and consolidate the firm’s activities in business-to-business markets.

Mr. Doscher joins Exalead from Jaspersoft, which under his management became the worldwide leader in the open source business intelligence market. This expertise, combined with nearly thirty years’ experience in the IT industry at companies including VMware and Oracle, adds muscle to the fast-growing Exalead.

Exalead has been growing at double-digit rates for the last two years. Its platform, exalead one:search, is software that scales from desktop to data center entry points. With it, Exalead enables industrial information access that incorporates structured and unstructured data, as well as internal and external content, so individuals and organizations alike can retrieve and use information effectively.

The firm has a platform that shares many of the characteristics of low-cost scaling and high performance with Google. You can find more information about Exalead here. In my April 2008 study for the Gilbane Group, Exalead was named a “company to watch”.

Stephen Arnold, June 10, 2008

Clearwell: Another eDiscovery Platform

June 9, 2008

The giant Thomson Reuters owns an outfit called Thomson Litigation Consulting. Thomson Litigation Consulting, in turn, recommends systems to its law firm customers. The consulting unit of Thomson Reuters earned some praise for its recommendation to DLA Piper, a firm that had a need for fast-cycle eDiscovery. You can read the effusive write up as reported on Law.com here:

Clearwell processed all 570,000 e-mail messages and attachments within our deadline of five days, providing enough time for analysis, review and production of the data. Clearwell’s incremental processing capabilities enabled TLC to start the analysis process for initial custodians within 25 minutes. The platform’s communication flow analysis enabled the legal team to quickly find all e-mails sent to specific individuals and to specific organizations (domains) within a confined date range. Clearwell’s organizational discovery automatically identified all variations of a custodian’s e-mail address, ensuring that no data for a custodian was missed.

A happy quack to Thomson Litigation Consulting and to the happy, happy client. With as many as two-thirds of the users of search and content processing systems dissatisfied, it is gratifying to know that there are success stories. The question is, “What’s a Clearwell?” The purpose of this short article is to provide some basic information about this system and make several observations about the niche strategy in search and content processing.

[Screen shot: Clearwell email thread]

This is a screen shot of the Clearwell interface to see a thread or chain of related emails. The attorney can use the system to move forward and backward in the email chain. A new query can be launched. A point-and-click interface allows the attorney to filter the processed content by project, name, and other filters. The interface automatically saves an attorney’s query.

What’s a Clearwell?

The metaphor implied by the name of the company is to see into a deep, dark pit. The idea is that technology can illuminate what’s hidden.

The company is backed by Sequoia Capital, Redpoint Ventures, DAG Ventures, and Northgate Capital. In short, the firm has “smart money”. “Smart money” opens doors, presumably to secretive outfits like the Thomson Corporation. Clearwell conducted a Webinar with Google, which illustrates the company’s ability to hook up with the heavy hitters in online to educate companies about eDiscovery.

As one of the investors describes the company, Clearwell

delivers a new level of analysis of information contained in corporate document and email systems. As the first e-discovery 2.0 solution, Clearwell is poised to capitalize on this emerging market, which we expect to become a multi-billion dollar industry with the next few years.

In a nutshell, the company bundles content processing, analytics, and work flow into a product that is tailored to the needs of eDiscovery. “eDiscovery” is the term applied to figuring out what’s in the gigabytes of digital email, Word files, and depositions generated in the course of a legal matter. eDiscovery means that a researcher tries to know what is in the discovered information so the lawyers know what they don’t know.

The company, unlike a generalized enterprise search platform, focuses its technology on specific markets unified by each market’s need to perform eDiscovery. These markets are:

  • Corporate security. Think email analysis.
  • Law firms. Grinding through information obtained in the discovery process.
  • Service providers. Data centers, ISPs, telcos processing content for compliance.
  • Government. Generally I associate the government with surveillance and intelligence operations.

Technology

There are more than 300 companies in the text processing business. I track about 12 firms focusing on the eDiscovery angle. I published a short list of some vendors as a general reference to readers of this Web log here.

The key differentiator for Clearwell is that it is a platform; that is, the customer does not have to assemble a random collection of Lego blocks into a system. Clearwell arrives, installs its system, and provides any technical assistance. For law firms in a time crunch, the Clearwell appliance is packaged as a solution that is:

  • Transparent, which means another attorney can figure out what produced a particular result
  • Easy to use, which matters because attorneys aren’t technical wizards
  • Able to handle different types of documents and languages, including misspellings
  • Capable of not missing a key document; missing one is a bad thing when the opposing attorney did not miss it.

How does this work?

Clearwell ships an appliance that can be up and running in less than a half hour, maybe longer if the law firm doesn’t have a full-time system administrator. A graphical administration utility allows the collection or corpus to be identified to the system. Clearwell then processes the content and makes it available to authorized users.

The appliance implements the Electronic Discovery Reference Model which is a methodology supported by about 100 firms. The idea is that EDRM standardizes the eDiscovery process so an opposing attorney has a shot at figuring out where “something” comes from.

As part of the content processing, Clearwell generates entities, metadata, and indexes. One key feature of the system is that Clearwell automatically links emails into threads. An attorney can locate an email of interest and then follow the Clearwell thread through the email processed by the system. Before Clearwell, a human had to make notes about related emails. Other systems provide similar functionality. Brainware, for example, offers similar features, and it is possible to use Recommind and Stratify in this way. The idea is that Clearwell is an “eDiscovery toaster”. Lawyers understand toasters; lawyers don’t understand complex search and content processing systems.
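
To make the threading idea concrete, here is a minimal sketch of one common way to link emails into threads: follow each message’s reply pointer back to the first message in the chain. This is not Clearwell’s method, which the company does not disclose here. The dictionary keys `id` and `in_reply_to` are assumed stand-ins for the standard Message-ID and In-Reply-To email headers, and the function name is hypothetical.

```python
from collections import defaultdict


def build_threads(messages):
    """Group message ids into threads keyed by the id of each thread's root.

    Each message is a dict with an "id" and an optional "in_reply_to".
    """
    parent = {m["id"]: m.get("in_reply_to") for m in messages}

    def root_of(msg_id):
        seen = set()
        # Walk up the reply chain until the parent is missing from the
        # corpus; the seen set stops the walk if malformed headers loop.
        while parent.get(msg_id) in parent and msg_id not in seen:
            seen.add(msg_id)
            msg_id = parent[msg_id]
        return msg_id

    threads = defaultdict(list)
    for m in messages:
        threads[root_of(m["id"])].append(m["id"])
    return dict(threads)


sample = [
    {"id": "a"},
    {"id": "b", "in_reply_to": "a"},
    {"id": "c", "in_reply_to": "b"},
    {"id": "d"},
]
print(build_threads(sample))
```

A production system would also have to cope with missing headers, forwarded copies, and subject-line matching, which is where the real engineering effort goes.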

The technical components of the Clearwell system include:

  • Deduplication
  • Support for multiple languages
  • Entity extraction
  • On-the-fly classification
  • Canned analytics to count number of references to entities
  • Basic and advanced search.
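
The first item on that list, deduplication, is easy to illustrate. The sketch below drops exact duplicates by hashing each document’s text after collapsing whitespace and case. It is an assumption about how a simple deduplication pass could work, not a description of Clearwell’s implementation, and the function name is hypothetical.

```python
import hashlib


def dedupe(documents):
    """Return documents with exact duplicates removed, keeping first copies.

    Two documents count as duplicates when their text matches after
    whitespace is collapsed and case is ignored.
    """
    seen = set()
    unique = []
    for doc in documents:
        normalized = " ".join(doc.split()).lower()
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


print(dedupe(["Hello  world", "hello world", "Other"]))
```

Near-duplicate detection (the same email with a different signature block, for example) needs fuzzier techniques than an exact hash.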

The system can be configured to allow an authorized user to add a tag or a flag so a particular document can be reviewed by another person. This function is generally described as a “social search” operation. It is little more than an interface to permit user-assigned index terms.

One of the most common requests made of enterprise search systems is a case function; that is, the ability to keep track of information related to a particular matter. Case operations are quite complex, and the major search platforms make it possible for the licensee to code these functions themselves. In effect, mainstream search systems don’t do case management operations out of the box.

Clearwell does. My review of the system identified this function as one of the most useful operations baked into the appliance. Case management means keeping track of who looked at what and when. In addition, the case management system bundles information about content and operations in one tidy package.

The Clearwell case function includes these features:

  • Analytics which can be used for time calculations, verifying that a person who was supposed to review a document did in fact open the document
  • Ability to handle multiple legal matters
  • Function to permit tags and categories to be set for different legal matters
  • User management tools
  • Audit trails.
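
As a rough illustration of what “keeping track of who looked at what and when” involves, here is a minimal audit trail sketch. The class, its method names, and the event fields are all hypothetical; the sketch shows only the idea of logging events per legal matter and then checking whether an assigned reviewer actually opened a document.

```python
from datetime import datetime, timezone


class AuditTrail:
    """Append-only log of user actions, scoped by legal matter."""

    def __init__(self):
        self.events = []

    def record(self, matter, user, document, action, when=None):
        """Store one event; the timestamp defaults to the current UTC time."""
        self.events.append({
            "matter": matter,
            "user": user,
            "document": document,
            "action": action,
            "when": when or datetime.now(timezone.utc),
        })

    def has_opened(self, matter, user, document):
        """Report whether the user opened the document within the matter."""
        return any(
            e["matter"] == matter
            and e["user"] == user
            and e["document"] == document
            and e["action"] == "open"
            for e in self.events
        )


trail = AuditTrail()
trail.record("matter-1", "alice", "doc-17", "open")
print(trail.has_opened("matter-1", "alice", "doc-17"))
print(trail.has_opened("matter-1", "bob", "doc-17"))
```

Even this toy version hints at why bolting case management onto a general-purpose search platform takes real development time: every query, view, and tag has to pass through the log.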

Attempting to implement these features with an enterprise search platform is virtually a six-month job, not one that can be accomplished in a day or less.

Observations

Clearwell is an example of how a start up can look at a crowded field like enterprise search and content processing, identify points of pain, and build a business providing a product that makes the pain bearable. Clearwell’s technology, like most search vendors’, is not unique; that is, other companies provide similar functions. What sets the company apart is the packaging of the technology for the target market. Clearwell’s technical acumen is evident in the case management functions and the useful exposure of threaded emails.

Other points that impressed me are:

  • An appliance. I like appliances because I don’t have to build anything. Search is such a basic need in organizations, why should I build a search system? I don’t build a toaster.
  • Bundled software. Clearwell–unlike Exegy, Google, and Thunderstone–delivers a usable application out of the box. Index Engines comes close with its search-back ups solution. But Clearwell is the leader in the appliance-that-works niche in search at this time.
  • Smart money. When investors with a track record bet on a company, I think it’s worth paying attention.

I don’t have a confirmation on the cost of the appliance. My hunch is that it will be competitive with one-year fees from Autonomy, Endeca, and Fast Search (Microsoft) which is to say a six-figure number. If you have solid prices for Clearwell, use the comments section of the Web log to share that information. Please, check out the company at ClearwellSystems.com.

Stephen Arnold, June 9, 2008

Deep Web Tech’s Abe Lederman Interviewed

June 9, 2008

Abe Lederman, one of the founders of Verity, created Deep Web Technologies to provide “one-stop access to multiple research resources.” By 1999, Deep Web Technologies offered a system that performed “federated search.” Mr. Lederman defines “federated search” as a system that “allows users to search multiple information sources in parallel.” He added in his interview with ArnoldIT.com:

Results are retrieved, aggregated, ranked and deduped. This doesn’t seem too difficult, but trust me it’s much harder than one might think. Deep Web started out building federated search solutions for the Federal government. We run some highly visible public sites such as Science.gov, WorldWideScience.org and Scitopia.org. We have expanded our market in the last few years and sell to corporate libraries as well as academic libraries.

Mr. Lederman believes that Google’s “forms” technology to index the content of dynamic Web sites is flawed.

Mr. Lederman said:

Deep Web goes out and in real-time sends out search requests to information sources. Each such request is equivalent to a user going to the search form of an information source and filling the form out. Google is attempting to do something different. Using automated tools Google is filling out forms that when executed will retrieve search results which can then be downloaded and indexed by Google. This effort has a number of flaws, including automated tools that fill out forms with search terms and retrieve results will only work on a small subset of forms. Google will not be able to download every document in a database as it is only going to be issuing random or semi-random queries.
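
The federated search steps Mr. Lederman lists (query sources in parallel, then aggregate, rank, and dedupe) can be sketched in a few lines. This is an illustration under stated assumptions, not Deep Web Technologies’ code: each source is a hypothetical callable that takes a query and returns hits as dictionaries with `url` and `score` keys, and scores from different sources are assumed to be comparable, which in practice they rarely are.

```python
from concurrent.futures import ThreadPoolExecutor


def federated_search(query, sources):
    """Query every source in parallel, then merge, dedupe, and rank the hits."""
    if not sources:
        return []
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        # pool.map preserves the order of sources in its results.
        result_lists = list(pool.map(lambda source: source(query), sources))

    best = {}
    for hits in result_lists:
        for hit in hits:
            key = hit["url"]
            # Dedupe by URL, keeping the highest-scoring copy of each hit.
            if key not in best or hit["score"] > best[key]["score"]:
                best[key] = hit
    return sorted(best.values(), key=lambda h: h["score"], reverse=True)


def source_a(query):
    return [{"url": "u1", "score": 0.9}, {"url": "u2", "score": 0.4}]


def source_b(query):
    return [{"url": "u2", "score": 0.7}, {"url": "u3", "score": 0.5}]


for hit in federated_search("solar energy", [source_a, source_b]):
    print(hit["url"], hit["score"])
```

The hard parts Mr. Lederman alludes to sit outside this sketch: slow or failing sources, result formats that differ from source to source, and relevance scores that cannot be compared directly.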

In the exclusive interview, Mr. Lederman reveals a new feature. He calls it “smart clustering.” Search results within a cluster are displayed in rank order.

You can read the full text of the interview on the ArnoldIT.com Web site in its Search Wizards Speak series. The interview with Mr. Lederman is the 17th interview with individuals who have had an impact on search and content processing. Search Wizards Speak provides an oral history in transcript form of the origin, functions, and positioning of commercial search and text processing systems.

The interview with Mr. Lederman is here. The index of previous interviews is here.

Stephen Arnold, June 9, 2008

Adaptive Search

June 9, 2008

Technology Review, a publication affiliated with the Massachusetts Institute of Technology, has an important essay by Erica Naone about adaptive computing. Her story, “Adapting Websites [sic] to Users”, available here, provides a useful run down of high-profile sites that change what’s displayed for a particular user based on what actions the user takes on a Web page. I found the screen shots of a prototype British Telecom service particularly useful. When a large, fuzzy telecommunications company embraces autonomous computing on a Web site, I know a technology has arrived. Telcos enforce rigorous testing of even trivial technology to make certain an errant chunk of code won’t kill the core system.

For me, the most interesting point in the article is a quotation Ms. Naone attributes to John Hauser, a professor at MIT’s business school; to wit:

Suddenly, you’re finding the website [sic] is easy to navigate, more comfortable, and it gives you the information you need. The user, he says, shouldn’t even realize that the website [sic] is personalized.

User Confusion?

I recall my consternation when one of the versions of Microsoft software displayed reduced menus based on my behaviors. The first time I encountered this change in appearance, I was confused. Then I rooted around in the guts of the system to turn off the adaptive function. I have a visual memory that allows me to recall locations, procedures, and methods using that eidetic ability. Once I see something and then it changes, it throws off a wide range of automatic mental processes. In college, I recall seeing an updated version of an economics book, and I could pinpoint which charts had been changed, and I found one with an error almost 20 years after taking the course.

[Schematic: simplified autonomous function]

This is a schematic I prepared of a simplified autonomous computing process. Note that the core system represented by the circle receives inputs from external and internal processes and sources. The functions in the circular area are, therefore, able to adapt to information about different environmental factors.

Adaptive displays, for me, are a problem. If you want to sell products or shape information for those without this eidetic flaw, adaptive Web pages are for you.

As I thought about the implications of this on-the-fly personalization, I opened a white paper sent to me by a person whom I met via the comments section of my Web log “Beyond Search.”

Microsoft Active in the Field Too

The essay is “What Is Autonomous Search?”, and it is a product of Microsoft’s research unit. The authors are Youssef Hamadi, Eric Monfroy, and Frédéric Saubion. Each author has an academic affiliation, and I will let you download the paper and sort out its provenance. You can locate the paper here.

In a nutshell, the paper makes it clear that Microsoft wants to use autonomous techniques to make certain types of search smarter. The idea is a deeper application of algorithms and methods that morph a Web page to suit a user’s behaviors. Applied to search, autonomous functions monitor information, machine processes, and user behaviors via log files. When something significant changes, the system modifies a threshold or setting in order to respond to a change.

The system automatically makes inferences. A simple example might be a surge in information and clicks on a soccer player; for example, Gomez. The system would note this name and automatically note that Gomez was associated with the German Euro 2008 team. Relevance is automatically adjusted. Other uses of the system range from determining what to cache to what relationships can be inferred about users in a geographic region.
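
A toy version of the surge example shows the basic mechanics: watch a signal, keep a running baseline, and flag the moment the signal jumps well above it. The sketch below is not taken from the Microsoft paper. The class name, the smoothing factor, and the surge ratio are all assumptions chosen for illustration.

```python
class SurgeMonitor:
    """Track per-term click counts and flag sudden surges.

    The baseline for each term is an exponential moving average, so the
    surge threshold adapts as normal traffic levels drift.
    """

    def __init__(self, smoothing=0.2, surge_ratio=3.0):
        self.smoothing = smoothing      # weight given to the newest count
        self.surge_ratio = surge_ratio  # multiple of baseline that counts as a surge
        self.baseline = {}

    def observe(self, term, clicks):
        """Record one interval's click count; return True if it is a surge."""
        previous = self.baseline.get(term)
        if previous is None:
            # First sighting: set the baseline, nothing to compare against.
            self.baseline[term] = float(clicks)
            return False
        surged = clicks > self.surge_ratio * previous
        self.baseline[term] = (
            (1 - self.smoothing) * previous + self.smoothing * clicks
        )
        return surged


monitor = SurgeMonitor()
for count in (10, 12, 100):
    print(count, monitor.observe("gomez", count))
```

A real system would act on the flag, for example by boosting documents that link the surging name to related entities, and would need safeguards so a single noisy interval does not reshape relevance.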

Google: Automatic with a Human Wrangler Riding Herd

Not surprisingly, Google has a keen interest in autonomous functions. What is interesting is that in the short essay I wrote about Peter Norvig’s conversation with Anand Rajaraman here, Dr. Norvig–now on a Google leave of absence–emphasized Google’s view of automated functions. As I understand what Mr. Rajaraman wrote, Google wants to use autonomous techniques, but Google wants to keep some of its engineers’ hands on the controls. Some autonomous systems can run off the tracks and produce garbage.

I can’t name the enterprise search systems with this flaw, but those search systems that emphasize automated processes that run after ingesting training sets are prone to this problem. The reason is that the thresholds determined by processing the training sets don’t apply to new information entering the system. A simple thought experiment reveals why this happens.

Assume you have a system designed to process information about skin cancer. You assemble a training set of skin cancer information. The search and retrieval system generates good results on test queries; for example, precision and recall scores in the 85 percent range. You turn the system loose on content that is now obtained from Web sites, professional publishers, and authors within an organization. The terminology differs from author to author. The system–anchored in a training set–cannot handle the diffusion of terms or even properly resolve new terms; for example, a new treatment methodology from a different research theater. Over time, the system works less and less well. Training autonomous systems is a tricky business, and it can be expensive.
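The thought experiment can be put into a few lines of Python. This is a rough sketch under my own assumptions: the document ids, the vocabulary, and the helper names are invented for illustration and do not describe any vendor’s system. It shows how precision and recall are computed for one query and how one might measure the share of terms in new content that the training set never saw.

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for one query from collections of document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def unseen_term_rate(training_vocabulary, new_document_terms):
    """Fraction of terms in a new document that never appeared in the training set."""
    terms = list(new_document_terms)
    if not terms:
        return 0.0
    unseen = [t for t in terms if t not in training_vocabulary]
    return len(unseen) / len(terms)


# Hypothetical training vocabulary and a new document using newer terminology.
training_vocabulary = {"melanoma", "lesion", "biopsy", "excision"}
new_doc = ["melanoma", "immunotherapy", "checkpoint", "lesion"]

p, r = precision_recall(retrieved=[1, 2, 3, 4], relevant=[2, 3, 4, 5, 6])
drift = unseen_term_rate(training_vocabulary, new_doc)
```

A rising unseen-term rate is one cheap warning sign that the thresholds learned from the training set no longer fit the content coming in, and that a human needs to retrain or retune.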

Google’s approach, therefore, bakes in an expensive human process to keep the “smart” algorithms from becoming dumber over time. The absent-mindedness of an Albert Einstein is a quirk. A search system that becomes stupid is a major annoyance.

You can read more about Google’s approach to intelligent algorithms by sifting through the papers on the subject here. If you enjoy patent applications and view their turgid, opaque prose as a way to peek under Google’s kimono, I recommend that you download US2008/0022267. This invention by H. Bruce Johnson, Jr. and Joel Webber discloses how a smart system can handle certain programming chores at Google. The idea is that busy, bright Googlers shouldn’t have to do certain coding manually. An autonomous system can handle the job. The method involves the types of external “looks” and internal “inputs” that appear in the Microsoft paper by Hamadi, Monfroy, and Saubion.

Observations

I anticipate more public discussion of autonomous computing systems and methods in the near future. Because the technology is out of sight, it is out of mind. It does have some interesting implications for broader social computing issues as well as enterprise search; for example:

  1. Control. Some users–specifically, me–want to control what I see. If there are automatic functions, I want to see the settings and have the ability to fiddle the dials. Denied that, I will spend considerable time and energy trying to get control of the system. If I can’t, then I will take steps to work around the automated decisions.
  2. Unexpected costs. Fully automated systems can go off the rails. In the enterprise search arena, a licensee must be prepared to retrain an automatic system or assign an expensive human to ride herd on the automated functions. Most search vendors provide administrative interfaces to allow a subject matter expert to override or input a correction. Even Google in its new site search and revamped Google Mini allows a licensee to weight certain values such as time.
  3. Suspicion of nefarious intent. When a system operates autonomously, how is a user to know that a particular adjustment has been made to “help” the user? Could the adjustment be made to exploit a psychological weakness of the user? Digital used car sales professionals could become popular citizens in the Internet community.
  4. Ineffective regulation. Government officials may have a difficult time understanding autonomous systems and methods. As a result, the wizards of autonomous computing operate without any effective oversight.

The concern I have is that “big data” makes autonomous computing work reasonably well. It follows that the company with the “biggest data” and the ability to crunch those data will dominate. In effect, autonomous computing may set the stage for an enterprise that takes the best pieces of the US Steel, the Standard Oil, and J.P. Morgan models to build a new type of monopoly. Agree? Disagree? Use the comments section to let me know your thoughts.

Stephen Arnold, June 9, 2008

Chicago Tribune Online: Why Old Print Subscribers Will Hate the Online Edition

June 8, 2008

I don’t spend much time writing about user interface or usability. My 86-year-old father, however, forced me to confront the interface for the Chicago Tribune Online. This essay has a search angle, but the majority of my comments apply to the interface for the Chicago Tribune Online. Now if you search Google for “Chicago Tribune Online”, the first hit is the Chicago Tribune’s main Web site. There is no direct link to the electronic edition for subscribers. You can find this service, which requires a user name and password, here. An 86-year-old person doesn’t file email like his 64-year-old son or the 12-year-old who lives in the neighborhood.

My father prints out important email. This makes it tricky for him to type in the URL and enter his user name and password (a helpful eight letters and digits, all in upper case, so it’s impossible for him to discern whether the zero is a number or an “oh” for the letter).

Why does this matter?

Yesterday (June 6, 2008) I set up an icon that contained sufficient pixie dust to send him to the electronic edition and log him in automatically. This morning he called to tell me that he had nuked his icon. I dutifully explained in an email, which he would print out, how to navigate to the page, enter the user name, enter the eight-character password (remember there are two possibilities for the zero), click the “save user name and password” option, and access the Sunday newspaper.

Essentially these steps are beyond his computing ability, visual acuity, and keyboarding skills.

Does the Chicago Tribune care? My view is that whoever designed the access Web page gave little thought to the needs of my father. Why should these 20-somethings? Their world is one in which twitching icons and subtle interfaces with designer colors are irrelevant.

There’s one other weirdness about the log in page for the electronic edition of the Chicago Tribune. My father has a big flat screen, and I set it for 800 by 600 pixels so he can read the text. The problem with this size is that most Web pages, including the ones for this Beyond Search Web log, are designed for larger displays. I use three displays–two for the Windows machine and one big one for the Mac. Linux machines get cast-off monitors, which we often unplug once the machine is running because no one “uses” the Linux machines perched in front of the boxes.

Not my father. He gets up close and personal. The failure to design for my father is understandable. Life would be easier if people were perpetually 21. Here’s the full text of the help tips in the email the Chicago Tribune sent my father:

Getting started with your Chicago Tribune electronic subscription:

  1. To view a story, photo, or advertisement click the item on the full-page image (left side of your screen). It will enlarge on the right side of your screen for easier reading.
  2. Use the pull-down lists located in the top center to navigate through which section and page you would like to view.
  3. Use “Advanced Search” on the top center area of the window to find a specific article.
  4. Use the buttons on the right to email or print each page. Use the buttons on the left to set up email alerts through e-notify and download articles or the entire paper as a PDF.
  5. For more help on all the features, just click on the “Help” button found near the top left under the Chicago Tribune logo.

So, here’s what my father sees when he clicks on the electronic edition link on the 800 x 600 display in his browser:

[Screenshot: the Chicago Tribune electronic edition at 800 x 600]

I had trouble figuring out what button and what option was described in the “help” with the registration email. Know why? The log in information requires my father to scroll to the left and then down. There is no visible clue about the log in.
