Gilbane Chats Up a Silly Goose: The Arnold Interview

June 18, 2008

On Wednesday, June 18, 2008, I will be interviewed in front of an audience completely unaware of why a fellow from Harrod’s Creek, Kentucky, is sitting on a stage answering questions. No one is more baffled than I. Based on my knowledge of the big city, I anticipate confusion, torpor, and indifference to my comments.

In this essay, which will become available on June 18, 2008, the curious will have a reference document that summarizes my thoughts on issues about which I may be asked. There has been no dry run for this interview. The last one in which I participated–the Associated Press’s invitation-only gathering last year–left the audience with little appetite for food. Some found the beverage table a more welcome destination.

Anticipated Question 1: What’s “beyond search” mean?

In research conducted by me and others, about two-thirds of the users of an enterprise search system are dissatisfied with that system. “Beyond search” implies that we have to move to another approach because what is now available in the organizations that I and the other researchers have investigated is not well liked. Given the cost of some systems, annoying two-thirds of the users is tantamount to getting a D or an F on a report card.

Anticipated Question 2: What’s “behind the firewall search” mean?

I wrote about the search elephant here. Many different functions involving information access are made available to an employee, contractor, or authorized user. The idea is that “behind the firewall search” is not public; an organization makes it available to a select group of users. The “search elephant” refers to the many different ways in which search is understood and perceived within an organization.

Anticipated Question 3: Why are there so many search vendors and more coming each day?

There is a belief that existing systems are not tapping into what I have estimated to be a $2.5 billion market for information access in the enterprise. Entrepreneurs and people with money look at Google and think, “We should be able to make gains like that in the enterprise market.” I also think that the market itself is trying to figure out the search elephant. Buyers don’t know what is needed. When entrepreneurs, money, and confused customers with severe information access problems come together, we have the type of market place that exists today.

Anticipated Question 4: What about Microsoft and Fast Search & Transfer?

I understand that it is business as usual at Microsoft and Fast Search. For Microsoft, this means trying to get 10,000 motorboats to go in roughly the same direction. For Fast Search, the company continues to license its Enterprise Search Platform and service customers. There are many bits of grit in the working parts where Microsoft and Fast Search mesh. It is too soon to tell if these inhibitors are trivial or whether the machine will sputter, maybe stop. What I tell people is to ignore the Microsoft-Fast Search tie up, and get a solution for a SharePoint environment that works. There are good choices ranging from a lower cost solution like dtSearch to a competitively priced system from Coveo, Exalead, ISYS Search Software, or another Microsoft Certified vendor.

Anticipated Question 5: What’s the impact of the Google Search Appliance?

Many vendors will tell you that Google has delivered a second-class system. That’s not exactly true. With the OneBox API, Google has a very solid solution. The impact is that Google has about 10,000 enterprise customers. These are sales made, in many cases, under the noses of incumbent vendors. Google’s a player in the enterprise market and a serious one. I have uncovered one impactful bit of research at Google that could–note, I said, could–change the search landscape. I have tried to ask Google about this development, but the GOOG thinks I do not merit their attention. Too bad for me, I guess.

Anticipated Question 6: What’s the impact of text processing, semantic search, and other new technologies on enterprise search?

These are hot terms that will open doors. Some vendors will make sales because of their ability to mesh trendy concepts with more traditional search.

Stephen Arnold, June 18, 2008

Microsoft’s Web Search Strategy Revealed: The Scoble Goldberg Interview

June 16, 2008

Online video does not match my mode of learning. Robert Scoble, a laurel leaf wearer in the new world of video and text Web logs, conducted an interview with Brad Goldberg.

The interview is part of the Fast Company videos, and it is available here. The interview is remarkable, and I urge you to spend 31 minutes and listen to Brad Goldberg, General Manager of Microsoft Search Business Group.

The interview reveals useful information about the time line for Microsoft to capture market share from Google and Microsoft’s ideas for differentiating itself from Google in Web search.

Surprisingly, there were no references that I could pick up to enterprise search, nor was there any indication that Mr. Goldberg was aware of the Fast Search & Transfer Web search technology which was quite good. As you may know, Fast Search withdrew from Web search in 2003, selling its AllTheWeb.com Web index to Overture. Yahoo gobbled Overture and used bits and pieces of the Fast Search technology recently. The “auto suggest” feature is still available from Yahoo’s AllTheWeb.com site. My tests suggest that today’s AllTheWeb.com uses the Yahoo Search index built by the Slurp crawler and the Fast Search technology for some of the bells and whistles on the site. The news search function is actually quite useful. If you are not familiar with it, you can try it here.

During the interview, Mr. Goldberg uses some sample queries to illustrate his claims about Live.com’s search performance, precision, and recall. I ran the “Paris” query on each of these systems, and I ran comparative queries on this Web log as well. After the interview, I took a look at the 2005 analysis of mainstream Web search systems here so I could gauge how much change has taken place in the last three years. Quick impression: Not much. You may want to perform similar as-you-listen tests. It is easy to see what search system responds most quickly, how the search results differ, and the features that each system makes available.

Three points in Mr. Goldberg’s remarks stuck in my mind. I want to mention each of these and then offer a few observations. Judging from the edgy comments to some of my essays, I want you to know that you may not agree with me. That’s okay with me. Please, use the comments section to set me straight. Providing some facts to go along with your push back is helpful to me.

Key Points for Me

1. Parity or Microsoft’s Relevancy Is As Good as Google’s

Mr. Goldberg asserted that the major search services were at parity in terms of relevance and coverage. I found this notion somewhat difficult to comprehend. The data about Web search market share undermine any argument about parity, which means, according to my understanding of the word, “equality” or “equivalence”. I have had difficulty interpreting comments by whiz kids before, so I may be off base. My thought was that Google continues to gain market share at the expense of both Microsoft and Yahoo. The dis-parity is significant because Google, according to data mavens, accounts for 60 percent or more of user queries in the US. In Europe, the market share is higher. US search systems do not hold commanding leads in China, Korea, and other Eastern markets.

Should parity mean visual appearance, yes, Microsoft is looking more like Google. Here is the result of one of my test queries: “real estate baltimore maryland”.

[Images: Google search results and Live Search results]

On the surface these look alike. Closer inspection reveals that Google includes a canned form so I can narrow my result by location and property type. Google eliminates a step in looking for real estate in Baltimore. Microsoft’s result does not offer this feature, preferring to show “related searches”. I like the Google approach. I don’t make much use of machine-generated related queries. I have specialized tools to discern relationships in result sets.

Search Rumor Round Up, Summer 2008

June 14, 2008

I am fortunate to receive a flow of information, often completely wacky and erroneous, in my redoubt in rural Kentucky. The last six months have been a particularly rich period. Compared to 2007, 2008 has been quite exciting.

I’m not going to assure you that these rumors have any significant foundation. What I propose to do is highlight several of the more interesting ones and offer a broader observation about each. My goal is to provide some context for the ripples that are shaking the fabric of search, content processing, and information retrieval.

The analogy to keep in mind is that we are standing on top of a jello dessert like this one.

[Image: a jello dessert]

The substance itself has a certain firmness. Try to pick it up or chop off a hunk, and you have a slippery job on your hands. Now, the rumors:

Rumor 1: More Consolidation in Search

I think this is easy to say, but it is tough to pull off in the present economic environment. Some companies have investors who have pumped millions into a search and content processing company. These kind souls want their money back. If the search vendor is publicly traded, the set up of the company or its valuation may be a sticky wicket. There have been some stunning buy outs so far in 2008. The most remarkable was Microsoft’s purchase of Fast Search & Transfer. SAS snapped up the little-known Teragram. But the wave of buy outs across the more than 300 companies in the search and content processing sector has not materialized.

Rumor 2: Oracle Will Make a Play in Enterprise Search

I receive a phone call or two a month asking me about Oracle SES10g. (When you access the Oracle Web site, be patient. The system was sluggish for me on June 14, 2008.) The drift of these calls boils down to one key point, “What’s Oracle’s share of the enterprise search market?” The answer is that its share can be whatever Oracle’s accountants want it to be. You see, Oracle SES10g is linked to the Oracle relational database and other bits and pieces of the Oracle framework. Oracle’s acquisitions in search and retrieval, from Artificial Linguistics more than a decade ago to Triple Hop in more recent times, have given Oracle capability. As a superplatform, Oracle is a player in search. So far this year, Oracle has been moving forward slowly. An experiment with Bitext here and a deployment with Siderean Software there. Financial mavens want Oracle to start acquiring search and content processing companies. There are rumors, but so far no action, and I don’t expect significant changes in the short term.

Goo Jit Su: Google’s Art of Soft Force in Competitive Fights

June 13, 2008

Note to PR mavens. This is an essay based on my personal opinions. Please, don’t call me to set me straight. The author wears bunny rabbit ears. Thank you for your attention.

I have a friend who is a Georgia Tech computer wizard. I don’t think he went to class; he just took tests and aced them. But like me, he’s logged a number of years on his disc drives. But I recall fondly his many references to various martial arts. He was fascinated by aikido, the art of soft force. He even introduced me to his sensei before the two of these unathletic looking lads went off to the Times Square subway station in the hopes of having a street gang try to mug them. Quite a duo: A math wizard and an umpteenth degree black belt from somewhere west of Marina del Rey.

The idea of “soft” fighting is that you use your opponent’s force to defeat the opponent. I remember one day when my son was in high school. My friend asked me, “Will your son wrestle me?”

Now, my son was quite a good high school footballer and quite fit. He had muscles where I didn’t know one could have muscles. When he arrived home from school, I said, “Howie wants to wrestle you. Please, don’t hurt him. We have to get the system running tonight, and I don’t have time to take him to the hospital.” “Sure,” he said.

My son smiled and then without warning grabbed my friend’s arm and twisted it–or tried to twist it. This Georgia Tech engineer who looked like a Georgia Tech engineer, not a street fighter, turned toward my son and gently put him to the ground. My son went for a tackle and ended up in the marigolds. “That’s it,” my wife said. “You guys get out of my flowers.” My son asked my friend, “How do you do that?”

My friend said, “Ah, grasshopper, you need to study aikido with my sensei. The secret is to use your energy to achieve my ends. It is strength from soft force. It is power without effort.”

I thought it was baloney. But that “power without effort” idea stuck in my mind. I also quite liked the phrase soft force. I thought the silliness of dojos, pajamas, and strength with minimal effort was poppycock. But there it was: My fit son gently deflected and controlled by my Georgia Tech pal and his grasshopper parody from the old TV show Kung Fu.

Then I made the connection between my friend, a math and computer whiz, and Google. I realized that the GOOG was practicing its own black art of Goo Jit Su.

[Image: Googzilla flipping an opponent]

This is an illustration of Googzilla, dressed in traditional garb designed to make US wrestlers chuckle, using “soft force” to throw an opponent into a tizzy. Notice that Googzilla expends little effort. The opponent is headed for a shock with his energy redirected against him. Googzilla seems to be lowering the opponent to the ground almost gently. Appearances can be deceiving.

Let me explain.

Google has demonstrated for the second time in less than a year its mastery of a new form of “soft” force. I call this form of fighting “Goo Jit Su”. Instead of defining it and using those cute line drawings that show how to kill an opponent with the crane or other animal inspired technique, let me give you two examples of Goo Jit Su.

Verizon

Google is a peanut compared to Verizon. It’s not just revenue. Verizon is big. It has the AT&T pre-Judge-Green DNA in its digital marrow. Verizon understands lobbying. Verizon knows how to win government contracts. Verizon knows how to squeeze money from its customers. I heard that in Washington, DC, even the drug dealers pay their Verizon wireless bills on time. No reason to annoy Mother Verizon.

Verizon’s approach to business combat is similar to extreme martial arts–anything goes. There’s one objective: triumph.

Google pulled its Goo Jit Su on Verizon. Without Google expending any effort beyond some letter writing and hiring familiar lobbyist type drones, Verizon agreed to open its wireless spectrum. I don’t have a clue what “open” means, but my time as a Bell Labs contractor, my work at Bellcore, and my USWest Web work taught me that “open” is not what phone companies do. AT&T defined “open” in one way–AT&T’s way. Verizon’s agreeing to open spectrum is tantamount to one of the Mt. Rushmore faces turning up in the Poconos.

How did Google achieve this feat with little cost, modest effort, and generally disorganized PR? The answer, “Goo Jit Su.” Google used the force of Verizon the way my friend turned a collision with my son into a romp.

Mobile Search

June 10, 2008

I try to steer clear of mobile search. The notion is broad and like most terms used to describe information retrieval the phrase mobile search is frequently undefined. The idea, I assume, is that everyone knows what mobile search is.

I asked my neighbor what mobile search was, and he said, “I just use my phone for calls.” Functions like sending a query to Yahoo’s mobile service aren’t used very often by me, not at all by him, and probably not by you, gentle reader, either.

But if you get text or graphic information on a mobile device, it’s mobile search. Most pundits feel that this definition is close enough for horse shoes. The problem is that it is the equivalent of cutting a cherry pie with a Husqvarna 455 Rancher chain saw, a popular model here in the hills of Kentucky.

[Image: mobile search disappoints]

This is a photograph of a Beyond Search programmer expressing dissatisfaction with the mobile search function on an Apple iPhone and a Treo 650. “Both are terrible,” says ArnoldIT.com’s chief technical officer.

The USA Today business section ran this front page story on June 10, 2008: “Are Google, Yahoo the Next Dinosaurs?” I couldn’t find the story on USAToday’s Web site. If it does appear online, I think this is the link that will display it for you. If you can’t locate this story online, you may have to hunt for a tree-unfriendly printed version.

The story, written by Leslie Cauley, is that “many [vendors are] on the hunt for a way to cash in on wireless search.” The idea is that no one, not even Google, Microsoft, or Yahoo, has cracked the code for mobile search. The “dinosaur” part is a bit of color. The notion is that because neither Google nor Yahoo has cracked the code for mobile search, these two firms could be left in the dust by younger, more hip innovators. Ergo: Google and Yahoo become the brontosauri of online with regard to mobile search. Ms. Cauley mentions an up-and-coming company called Medio, careful to explain that this is just one interesting company among many. You can read more about Medio here. Could Medio be the next Google?

Because mobile devices are more plentiful than other types of computers, whoever cracks the code can make boat loads of cash selling ads to mobile phone device users running search. I’m not going to cite USAToday’s statistics. I have heard that Gannett takes a dim view of old researchers tapping into their high-value statistical data captured in bar charts without data tables.

I urge you to buy “America’s newspaper”; make Gannett’s accountants happy.

The challenges of mobile search are formidable. There are established business models ossified in the American telecommunications industry. There are device issues; namely, screens smaller than the 48 inches of flat panel I have in front of me at this moment, lousy keyboards, and users who aren’t too keen on taking time to paw through a laundry list of results.

Clearwell: Another eDiscovery Platform

June 9, 2008

The giant Thomson Reuters owns an outfit called Thomson Litigation Consulting. Thomson Litigation Consulting, in turn, recommends systems to its law firm customers. The consulting unit of Thomson Reuters earned some praise for its recommendation to DLA Piper, a firm that had a need for fast-cycle eDiscovery. You can read the effusive write up as reported on Law.com here:

Clearwell processed all 570,000 e-mail messages and attachments within our deadline of five days, providing enough time for analysis, review and production of the data. Clearwell’s incremental processing capabilities enabled TLC to start the analysis process for initial custodians within 25 minutes. The platform’s communication flow analysis enabled the legal team to quickly find all e-mails sent to specific individuals and to specific organizations (domains) within a confined date range. Clearwell’s organizational discovery automatically identified all variations of a custodian’s e-mail address, ensuring that no data for a custodian was missed.

A happy quack to Thomson Litigation Consulting and to the happy, happy client. With as many as two-thirds of the users of search and content processing systems dissatisfied, it is gratifying to know that there are success stories. The question is, “What’s a Clearwell?” The purpose of this short article is to provide some basic information about this system and make several observations about the niche strategy in search and content processing.

[Image: Clearwell email thread view]

This is a screen shot of the Clearwell interface to see a thread or chain of related emails. The attorney can use the system to move forward and backward in the email chain. A new query can be launched. A point-and-click interface allows the attorney to filter the processed content by project, name, and other filters. The interface automatically saves an attorney’s query.

What’s a Clearwell?

The metaphor implied by the name of the company is to see into a deep, dark pit. The idea is that technology can illuminate what’s hidden.

The company is backed by Sequoia Capital, Redpoint Ventures, DAG Ventures, and Northgate Capital. In short, the firm has “smart money”. “Smart money” opens doors, presumably to secretive outfits like the Thomson Corporation. Clearwell conducted a Webinar with Google, which illustrates the company’s ability to hook up with the heavy hitters in online to educate companies about eDiscovery.

As one of the investors describes the company, Clearwell

delivers a new level of analysis of information contained in corporate document and email systems. As the first e-discovery 2.0 solution, Clearwell is poised to capitalize on this emerging market, which we expect to become a multi-billion dollar industry with the next few years.

In a nutshell, the company bundles content processing, analytics, and work flow into a product that is tailored to the needs of eDiscovery. “eDiscovery” is the term applied to figuring out what’s in the gigabytes of digital email, Word files, and depositions generated in the course of a legal matter. eDiscovery means that a researcher tries to know what is in the discovered information so the lawyers know what they don’t know.

The company, unlike a generalized enterprise search platform, focuses its technology on specific markets unified by each market’s need to perform eDiscovery. These markets are:

  • Corporate security. Think email analysis.
  • Law firms. Grinding through information obtained in the discovery process
  • Service providers. Data centers, ISPs, telcos processing content for compliance
  • Government. Generally I associate the government with surveillance and intelligence operations.

Technology

There are more than 300 companies in the text processing business. I track about 12 firms focusing on the eDiscovery angle. I published a short list of some vendors as a general reference to readers of this Web log here.

The key differentiator for Clearwell is that it is a platform; that is, the customer does not have to assemble a random collection of Lego blocks into a system. Clearwell arrives, installs its system, and provides any technical assistance. For law firms in a time crunch, the Clearwell appliance is packaged as a solution that is:

  • Transparent which means another attorney can figure out what produced a particular result
  • Easy to use which means attorneys aren’t technical wizards
  • Able to handle different types of documents and languages, including misspellings
  • Capable of not missing a key document which is a bad thing when the opposing attorney did not miss a document.

How does this work?

Clearwell ships an appliance that can be up and running in less than a half hour, maybe longer if the law firm doesn’t have a full-time system administrator. A graphical administration utility allows the collection or corpus to be identified to the system. Clearwell then processes the content and makes it available to authorized users.

The appliance implements the Electronic Discovery Reference Model which is a methodology supported by about 100 firms. The idea is that EDRM standardizes the eDiscovery process so an opposing attorney has a shot at figuring out where “something” comes from.

As part of the content processing, Clearwell generates entities, metadata, and indexes. One key feature of the system is that Clearwell automatically links emails into threads. An attorney can locate an email of interest and then follow the Clearwell thread through the email processed by the system. Before Clearwell, a human had to make notes about related emails. Other systems provide similar functionality. Brainware, for example, offers similar features, and it is possible to use Recommind and Stratify in this way. The idea is that Clearwell is an “eDiscovery toaster”. Lawyers understand toasters; lawyers don’t understand complex search and content processing systems.
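The threading idea can be illustrated with a toy example. This is a minimal sketch, not Clearwell's method: it assumes each message arrives as a dictionary with hypothetical `id`, `in_reply_to`, and `sent` fields, and it groups messages by walking each reply back to the earliest message it can find in the collection.

```python
from collections import defaultdict


def build_threads(messages):
    """Group message ids into threads keyed by the id of the thread's root."""
    # Map every message id to the id it replies to (None for a new thread).
    parent = {m["id"]: m.get("in_reply_to") for m in messages}

    def root_of(msg_id):
        seen = set()
        # Walk upward while the parent is a message we actually hold;
        # the 'seen' set guards against malformed reply loops.
        while parent.get(msg_id) in parent and msg_id not in seen:
            seen.add(msg_id)
            msg_id = parent[msg_id]
        return msg_id

    threads = defaultdict(list)
    # Sort by send time so each thread reads in chronological order.
    for m in sorted(messages, key=lambda m: m["sent"]):
        threads[root_of(m["id"])].append(m["id"])
    return dict(threads)


# Hypothetical sample data: one three-message chain and one stand-alone email.
sample = [
    {"id": "m3", "in_reply_to": "m2", "sent": 3},
    {"id": "m1", "in_reply_to": None, "sent": 1},
    {"id": "m4", "in_reply_to": None, "sent": 4},
    {"id": "m2", "in_reply_to": "m1", "sent": 2},
]
threads = build_threads(sample)
```

A reply whose parent is missing from the collection simply becomes the root of its own thread, which is roughly the situation an attorney faces when a custodian's mailbox is incomplete.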

The technical components of the Clearwell system include:

  • Deduplication
  • Support for multiple languages
  • Entity extraction
  • On-the-fly classification
  • Canned analytics to count number of references to entities
  • Basic and advanced search.
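Deduplication, the first item in the list, is easy to picture with a small example. The sketch below is an assumption about one common approach, not a description of how Clearwell does it: each document body is normalized, hashed, and kept only if that hash has not been seen before. The function name and the `(doc_id, body)` input shape are hypothetical.

```python
import hashlib


def deduplicate(documents):
    """Return the ids of the first document seen for each distinct body."""
    seen = set()
    unique_ids = []
    for doc_id, body in documents:
        # Collapse whitespace and case so trivially different copies match.
        normalized = " ".join(body.split()).lower()
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique_ids.append(doc_id)
    return unique_ids


# Hypothetical sample: "b" is a copy of "a" with different spacing and case.
kept = deduplicate([
    ("a", "Please review the  attached contract."),
    ("b", "please review the attached contract."),
    ("c", "Lunch on Friday?"),
])
```

Exact hashing of this kind only catches identical text after normalization; near-duplicates such as forwarded emails with added signatures need a fuzzier comparison.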

The system can be configured to allow an authorized user to add a tag or a flag so a particular document can be reviewed by another person. This function is generally described as a “social search” operation. It is little more than an interface to permit user-assigned index terms.

One of the most common requests made of enterprise search systems is a case function; that is, the ability to keep track of information related to a particular matter. Case operations are quite complex, and the major search platforms make it possible for the licensee to code these functions themselves. In effect, mainstream search systems don’t do case management operations out of the box.

Clearwell does. My review of the system identified this function as one of the most useful operations baked into the appliance. Case management means keeping track of who looked at what and when. In addition, the case management system bundles information about content and operations in one tidy package.

The Clearwell case function includes these features:

  • Analytics which can be used for time calculations, verifying that a person who was supposed to review a document did in fact open the document
  • Ability to handle multiple legal matters
  • Function to permit tags and categories to be set for different legal matters
  • User management tools
  • Audit trails.

Attempting to implement these features with an enterprise search platform is virtually a six month job, not one that can be accomplished in a day or less.

Observations

Clearwell is an example of how a start up can look at a crowded field like enterprise search and content processing, identify points of pain, and build a business providing a product that makes the pain bearable. Clearwell’s technology, like most search vendors’, is not unique; that is, other companies provide similar functions. What sets the company apart is the packaging of the technology for the target market. Clearwell’s technical acumen is evident in the case management functions and the useful exposure of threaded emails.

Other points that impressed me are:

  • An appliance. I like appliances because I don’t have to build anything. Search is such a basic need in organizations, why should I build a search system? I don’t build a toaster.
  • Bundled software. Clearwell–unlike Exegy, Google, and Thunderstone–delivers a usable application out of the box. Index Engines comes close with its search-back ups solution. But Clearwell is the leader in the appliance-that-works niche in search at this time.
  • Smart money. When investors with a track record bet on a company, I think it’s worth paying attention.

I don’t have a confirmation on the cost of the appliance. My hunch is that it will be competitive with one-year fees from Autonomy, Endeca, and Fast Search (Microsoft) which is to say a six-figure number. If you have solid prices for Clearwell, use the comments section of the Web log to share that information. Please, check out the company at ClearwellSystems.com.

Stephen Arnold, June 9, 2008

Chicago Tribune Online: Why Old Print Subscribers Will Hate the Online Edition

June 8, 2008

I don’t spend much time writing about user interface or usability. My 86-year-old father, however, forced me to confront the interface for the Chicago Tribune Online. This essay has a search angle, but the majority of my comments apply to the interface for the Chicago Tribune Online. Now if you search Google for “Chicago Tribune Online”, the first hit is the Chicago Tribune’s main Web site. There is no direct link to the electronic edition for subscribers. You can find this service, which requires a user name and password, here. An 86-year-old person doesn’t file email like his 64-year-old son or the 12-year-old who lives in the neighborhood.

My father prints out important email. This makes it tricky for him to type in the url, enter his user name and password (a helpful eight letters and digits all in upper case so it’s impossible for him to discern whether the zero is a number or an “oh” for the letter).

Why does this matter?

I set up yesterday (June 6, 2008) an icon that contained sufficient pixie dust to send him to the electronic edition and log him in automatically. This morning he called to tell me that he had nuked his icon. I dutifully explained in an email, which he would print out, how to navigate to the page, enter the user name, enter the eight digit password (remember there are two possibilities for the zero), click the “save user name and password option” and access the Sunday newspaper.

Essentially these steps are beyond his computing ability, visual acuity, and keyboarding skills.

Does the Chicago Tribune care? My view is that whoever designed the access Web page gave little thought to the needs of my father. Why should these 20 somethings? Their world is one in which twitching icons and subtle interfaces with designer colors are irrelevant.

There’s one other weirdness about the log in page for the electronic edition of the Chicago Tribune. My father has a big flat screen, and I set it for 800 by 600 pixels so he can read the text. The problem with this size is that most Web pages, including the ones for this Beyond Search Web log, are designed for larger displays. I use three displays–two for the Windows machine and one big one for the Mac. Linux machines get cast off monitors which we often unplug once the machine is running because no one “uses” the Linux machines perched in front of the boxes.

Not my father, he gets up close and personal. The failure to design for my father is understandable. Life would be easier if people were perpetually 21. Here’s the full text of the help tips in the email the Chicago Tribune sent my father:

Getting started with your Chicago Tribune electronic subscription:

  1. To view a story, photo, or advertisement click the item on the full-page image (left side of your screen). It will enlarge on the right side of your screen for easier reading.
  2. Use the pull-down lists located in the top center to navigate through which section and page you would like to view.
  3. Use “Advanced Search” on the top center area of the window to find a specific article.
  4. Use the buttons on the right to email or print each page. Use the buttons on the left to set up email alerts through e-notify and download articles or the entire paper as a PDF.
  5. For more help on all the features, just click on the “Help” button found near the top left under the Chicago Tribune logo.

So, here’s what my father sees when he clicks on the electronic edition link on the 800 x 600 display in his browser:

[Image: Chicago Tribune electronic edition on an 800 x 600 display]

I had trouble figuring out what button and what option was described in the “help” with the registration email. Know why? The log in information requires my father to scroll to the left and then down. There is no visible clue about the log in.

Enterprise Information’s Missing Pieces

June 5, 2008

In 2001, I found myself on a panel talking about electronic information and enterprise search. The venue was Internet World. That’s right, the once dominant trade show for the brave new world of online.

I’m not sure how I ended up on the program, but I recall I was there, facing an audience of 250 people. Put the word “Internet” on a hand lettered sign in a diner’s window and a crowd would gather. The Internet has evolved but the missing pieces in the information puzzle are still with us.

Here’s an image from my PowerPoint deck.

[Image: puzzle pieces slide from the PowerPoint deck]

Web log graphics are “crunched” and the result is difficult for me to read. Let me highlight each of these nine pieces of the enterprise information puzzle.

  1. Graphical editor
  2. Database engine
  3. Version controls
  4. Site manipulation tools (that is, publishing tools)
  5. Personalization tools
  6. Search engine
  7. Administrative interface
  8. Usage tracking
  9. Security services

Nothing is missing. The nine elements are identified in the graphic, and in your own organization you have each of these functions up and running. Some puzzle pieces work better than others. These are complex sub systems and functions. Variability and unevenness are to be expected.

My point in 2001 was that each of these pieces was not fitted to the others. The parts are there, but until integration takes place across the different sub systems and functions, the puzzle is incomplete. In fact, you don’t even have a decent picture of what the integrated results will look like.

SolveIT: Fancy Math

June 5, 2008

Several years ago I found myself in a meeting. I was paid to attend a session in North Carolina; otherwise, I wouldn’t have gone to Charlotte. The city is too sophisticated for this Kentuckian.

In the meeting, a soft-spoken mathematician, his son, a couple of cousins, and maybe an uncle explained sparse sets, assigning probabilities to boundaries, and ant algorithms. As I struggled to dredge definitions about these concepts from my admittedly poor memory, the soft-spoken mathematician asked me a word problem. A waiter had 12 customers and ended up with an extra dollar. Why? I just sat there and looked my normal stupid self.

Later, he explained that his inspiration was a mathematician named Stanisław Leśniewski. Okay, early 20th century wizard. That was the end of my knowledge. Puzzles are the key to learning math, he told me. In his spare time, this fellow has set up a Web site to make this concept more widely known. You can see it here.

I had no clue who these fellows were, but I was getting paid to listen so I listened.

A Super Guru: Who Says He’s Just a Regular Guy

The super guru is a fellow named Zbigniew Michalewicz, a highly regarded mathematician everywhere except in Harrod’s Creek. The relatives were also mathematicians. The crowd could finish one another’s sentences and equations. Math, it turns out, is something that runs in the Michalewicz family and has for decades.

Dr. Michalewicz is an expert in genetic algorithms and data structures. When added together, the mathematical recipe yields evolution programs. You can read more about this approach to some tough data problems in Genetic Algorithms + Data Structures = Evolution Programs, published by Springer-Verlag, ISBN: 3-540-60676-9. No, your local book store won’t stock it. Amazon does.

The group sold its US enterprise and Dr. Michalewicz and a family member or two moved to Australia.

After losing track of these fellows, I learned that Dr. Michalewicz, his son, and a handful of mathematical gurus set up shop as SolveIT Software. Click here to navigate to the company’s Web site.

The new company uses new math to solve old problems. The company is in the business of delivering “adaptive business intelligence”. The company’s range of technology is remarkable, and it may be meaningless to you unless you took a couple of advanced math classes; for example:

  • Agent-based systems
  • Ant systems (my favorite)
  • Evolutionary strategies
  • Evolutionary programming
  • Fuzzy systems
  • Genetic algorithms
  • Neural networks
  • Rough sets (great stuff!)
  • Swarm intelligence
  • Simulated annealing (does with math to data what oil quenching does to low-grade steel)
  • Tabu search (I have no clue what this numerical method yields).

You can figure out most of these notions by dipping into Stuart Russell and Peter Norvig’s Artificial Intelligence: A Modern Approach or E. J. Borowski and J. M. Borwein’s Web-Linked Dictionary of Mathematics. (Note: there is a subtle difference between the Norvig approach and the Michalewicz method. Google uses humans. Humans play an optional role in the Michalewicz recipes. No big deal, but you can explore the differences yourself by reading each guru’s text book.)

A Case Example

Equations are not likely to raise my Google ranking. Let me describe an outcome of Dr. Michalewicz’s skills.

Here’s the set up. You are Ford, Honda, or Toyota. Each week you get a couple of thousand lease cars back. You want to sell the cars quickly. You want to minimize how much you have to spend to truck these white elephants to a location where a particular model will sell. Pink convertibles don’t fly in Nome, Alaska, but are hot items in Scottsdale, Arizona. Your resale team would rather go to a bowling convention than work Excel models.

You want to maximize return, minimize expenses, and get the decisions out of your resale team’s “instinct” and into something fungible like a SolveIT solution.

SolveIT’s analysts beaver their way through the data, the work flow, and the exogenous factors that you and your resale team did not consider. The company builds, from its mathematical Lego blocks, a computerized system that prints out a map and report telling your sales team where to ship which car.
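For readers who want to see the shape of the problem, here is a toy sketch in Python. It is not SolveIT’s method, which I have not seen in code and which relies on far heavier math. Every name and number below (the three cars, the three markets, the prices, the trucking costs, and the one-car-per-market capacity) is invented for illustration. The approach is plain brute force: try every assignment of cars to markets and keep the one with the highest net. That works for three cars and would be hopeless for a couple of thousand.

```python
from itertools import product

# Hypothetical figures, invented for illustration only.
# Expected sale price for each car in each market, in dollars.
EXPECTED_PRICE = {
    "pink convertible": {"Scottsdale": 21000, "Nome": 12000, "Paducah": 15000},
    "pickup truck": {"Scottsdale": 18000, "Nome": 19000, "Paducah": 20000},
    "sedan": {"Scottsdale": 14000, "Nome": 13000, "Paducah": 14500},
}
# Hypothetical cost to truck one car to each market, in dollars.
SHIPPING_COST = {"Scottsdale": 900, "Nome": 3500, "Paducah": 400}
# Hypothetical limit on how many cars each market can absorb this week.
CAPACITY = {"Scottsdale": 1, "Nome": 1, "Paducah": 1}


def net_return(car, market):
    """Expected sale price minus the cost of trucking the car there."""
    return EXPECTED_PRICE[car][market] - SHIPPING_COST[market]


def best_plan():
    """Try every car-to-market assignment and return the best one.

    Returns a (plan, total_net) pair, where plan maps car -> market.
    """
    cars = list(EXPECTED_PRICE)
    markets = list(SHIPPING_COST)
    best_total, best = None, None
    for choice in product(markets, repeat=len(cars)):
        # Skip plans that send more cars to a market than it can absorb.
        if any(choice.count(m) > CAPACITY[m] for m in markets):
            continue
        total = sum(net_return(c, m) for c, m in zip(cars, choice))
        if best_total is None or total > best_total:
            best_total, best = total, dict(zip(cars, choice))
    return best, best_total


if __name__ == "__main__":
    plan, total = best_plan()
    for car, market in plan.items():
        print(f"Send the {car} to {market}")
    print(f"Expected net: ${total}")
```

With these made-up numbers, the best plan sends the convertible to Scottsdale, the truck to Nome, and the sedan to Paducah, for a net of $49,700. Placing the cars one at a time in the order listed, each in its best open market, would put the truck in Paducah and strand the sedan in Nome, for $49,200. Looking at the whole lot at once is the point, and it is the part that gets hard when the lot is a couple of thousand cars.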

You use the SolveIT system for a couple of months, and you notice that your expenses go down and your net goes up. SolveIT removed the guess work and let the “fancy math” do the heavy lifting. When I spoke with the company several years ago, one beta client was generating cash positives in six figures within six weeks.

Like most sophisticated companies run by serious math geeks, there’s not much information available on the company’s Web site. I did dig through my files, and I found an example of the company’s outputs. Keep in mind that this diagram is probably out of date, but it will give you a flavor of what the SolveIT operation does.

The system “shows” the resale team where certain cars will sell. Then the system prints out a report that says, “Send the pink convertible to Chicago and the truck to Paducah.” The math does the heavy lifting. The resale team looks at simple diagrams. The math remains safely hidden away.

[Image: SolveIT optimizer output]

Observations

SolveIT is one of a handful of companies pushing the envelope in analytics. If you want to tap into some serious math, contact this company. I have one tip. Don’t ask, “How does this work?” The explanation requires a solid foundation in traditional mathematics and post-doctoral work in set theory. How complicated is the math? I found in my files one example which I had to scan and convert to an image. I kept it as a reminder of how little I know about the next big things in mathematics; for example, in my notes I had this pair of statements:

If these statements speak to you, then you can dig more deeply into the SolveIT systems and methods.

Based on my personal experience with Dr. Michalewicz, he’s a capable mathematical thinker. For more about his company’s approach to problem solving, you will find useful How to Solve It: Modern Heuristics, also by Springer-Verlag. You can get a copy here.

Stephen Arnold, June 6, 2008

Ontos: a Text Processing Company, Not a Weapon

June 5, 2008

In a conference call yesterday (June 4, 2008), someone mentioned “Ontos”. Another person asked, “What’s an Ontos?” I answered, “An anti-tank vehicle.” What I remembered about the Ontos is that it was a tank loaded down with so many weapons that a turtle was speedier. Big laugh. Ontos is a company engaged in text and content processing with a product called ObjectSpark. To fill in the void in my knowledge, I navigated to the GOOG, plugged in “Ontos” and found a link to a 2001 article in Intelligent Enterprise, a very good Web site now that the print magazine has been put out to pasture. You can read the description here.

The company’s English language Web site is at www.ontos.com. The product line up no longer relies on the ObjectSpark name. You can license:

  • OntosMiner, which “analyzes natural language text. It recognizes objects and their relations and adds them as annotations to the related text parts. The technology is based on semantic rules, i.e. NLP (Natural Language Processing). It uses ontologies to define the area of interest.”
  • LightOntos for Workgroups, which “helps to organize and search information and documents. It allows the user to process and annotate PDF, Word, RTF, Text or HTML files using OntosMiner.”
  • Ontos SOA, which “realizes the whole cycle of semantic-syntactical processing, management and analysis of unstructured information located in the Internet and large corporative data banks.”
  • TAIS Ontos, which is “created as an Application Package using ORACLE technologies and Java. The system uses a semantic designed for building and maintaining object oriented databases. Additional components are effective engines for the search of explicit and hidden relations between objects. A visualization environment (interface) supports the analysts when analyzing a domain of interest. The product is adapted for the segment of law enforcing structures and attributed to the class of anti-criminal analytical systems”

The display of tagged text uses color to identify specific elements. When I saw this display, it reminded me of the output from Inxight Software’s text processing system.

[Image: Ontos tagged-text mark up display]

The company’s Russian partner–ZAO AviComp Services–participated in the recent German technical extravaganza, CEBIT 2008.

You will find a handful of white papers on the Ontos Web site. I found “Ontos Solutions for the Semantic Web” quite interesting and informative. You can download it here.

I wasn’t able to locate any pricing or licensing information. If you have some of these data points, please use the comment form below this essay to share the information with other readers. My email to the company went unanswered.

Based on my clicking through the Web site, you might want to take a look at this system. The white papers and technical descriptions use the buzz words that other vendors bandy about. The one drawback to a system that lacks a high profile in the US is this question, “Does the system meet US security guidelines?” My hunch is that the system is industrial strength; otherwise, the Brussels customer would not have signed a deal to use the Ontos technology.

Stephen Arnold, June 5, 2008
