SeeWhy: Real Time Business Intelligence without Search
July 6, 2008
SeeWhy came on my radar with its “no search” marketing angle. I poked around and was, at first, confused. The company appeared to occupy a no-man’s-land between search engine optimization and business intelligence that I avoid. A quick look revealed that the company has a business event system with some interesting twists.
Real Time and My Concern with the Phrase
“Real time” has been promoted from technical impossibility to buzzword. The general notion of “real time” among computer scientists is that simultaneity across linked systems is impossible outside of the bizarre world of high-energy physics. No matter how minute, latencies exist, even if they are measured in picoseconds. But to a marketer, “real time” connotes a softer, gentler world far from the “batch oriented” or human-intermediated world familiar to most professionals.
Now, real time is coming to the enterprise. Exegy, based in St. Louis, Missouri, offers an appliance that can ingest content at rates measured in megabytes per second and spit out processed content with very little latency. To achieve this, Exegy has done some hardware engineering, but the gizmo works. When you shift to “real time” on the types of servers found in a trucking company or a consulting firm, where capital investment is mostly out of the question, the results are not in Exegy’s league.
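A quick way to see that latency never vanishes is to measure it. The following minimal Python sketch is my own illustration, not anything from Exegy or SeeWhy. It timestamps messages handed from one thread to another on a single machine using `time.perf_counter_ns()`. Even with no network involved, the measured delays are small but, in practice, greater than zero.

```python
import queue
import threading
import time


def measure_handoff_latency(n_messages: int = 100) -> list[int]:
    """Pass timestamped messages from one thread to another and return
    each message's observed latency in nanoseconds."""
    channel: queue.Queue = queue.Queue()
    latencies: list[int] = []

    def consumer() -> None:
        for _ in range(n_messages):
            sent_ns = channel.get()
            # Latency = time the message was received minus time it was sent.
            latencies.append(time.perf_counter_ns() - sent_ns)

    worker = threading.Thread(target=consumer)
    worker.start()
    for _ in range(n_messages):
        # The payload is simply the send time, taken from the same clock.
        channel.put(time.perf_counter_ns())
    worker.join()
    return latencies


if __name__ == "__main__":
    samples = measure_handoff_latency()
    print(f"min {min(samples)} ns, max {max(samples)} ns")
```

The exact numbers vary from machine to machine and run to run; the point is only that the minimum is never a true zero.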
Let me be clear: to deliver near real time content processing Exegy style, you need specialized infrastructure. The average Dell server is not able to deliver no matter how insistent Bill Trucking Company’s information technology consultant becomes.
A number of text and content processing companies are asserting that their systems operate in “real time”. They don’t. Against this background, let’s look at one interesting company. I will not comment on this firm’s emphasis on real time processing, preferring to provide some basic information about this single firm and then offering, as a wrap up, a handful of generalized observations.
SeeWhy Software: Operational Business Intelligence
SeeWhy is one of the first “open source” real time Business Intelligence platforms for the event driven enterprise. SeeWhy continuously analyzes and interprets streams of individual business events to alert you immediately to opportunities and risks and to enable everyday decisions to be automated.
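SeeWhy’s own engine is not documented in this post, so the following is only a generic illustration of what continuously analyzing a stream of business events and raising an alert can mean in practice. It is a minimal Python sketch. The `Event` and `WindowAlert` names, the sliding-window rule, and the thresholds are all my assumptions, not SeeWhy’s API.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Event:
    timestamp: float  # seconds since some reference point
    kind: str         # hypothetical event type, e.g. "order_cancelled"


class WindowAlert:
    """Fire when too many events of one kind arrive within a sliding
    time window. Window size and threshold are illustrative assumptions."""

    def __init__(self, kind: str, window_s: float, threshold: int) -> None:
        self.kind = kind
        self.window_s = window_s
        self.threshold = threshold
        self._recent: deque = deque()  # timestamps of matching events

    def observe(self, event: Event) -> bool:
        """Process one event; return True if the alert condition is met."""
        if event.kind != self.kind:
            return False
        self._recent.append(event.timestamp)
        # Drop timestamps that have fallen out of the window.
        while self._recent and event.timestamp - self._recent[0] > self.window_s:
            self._recent.popleft()
        return len(self._recent) >= self.threshold


if __name__ == "__main__":
    alert = WindowAlert("order_cancelled", window_s=60.0, threshold=3)
    for t in (0.0, 10.0, 20.0):
        fired = alert.observe(Event(t, "order_cancelled"))
    print(fired)  # True: three cancellations within 60 seconds
```

Each event is handled as it arrives, rather than in a nightly batch, which is the contrast the “real time” BI pitch draws.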
The marketing angle that snared my attention.
Incorporated in 2003 by BI industry veteran Charles Nicholls, SeeWhy is backed by several venture capital investors, including LogiSpring, Pentech Ventures, and Delta Partners, plus a handful of private investors. SeeWhy is headquartered in Windsor, United Kingdom.
Charles Nicholls, founder and CEO, said here:
I began to ponder on the Business Intelligence industry with all its unfulfilled promise, often long on vision and short on delivery. The more that you challenge the status quo, the faster that you can see the opportunities to make the world a better place. It was this process that started me on a journey that led inevitably to create SeeWhy.
The basic premise of the company is summarized in this diagram from “In Search of Insight,” a 43-page document from Mr. Nicholls:
The Web 2.0 Angle
You can download “In Search of Insight,” a monograph about the company’s approach to business intelligence, here. There is no annoying registration; thank you, SeeWhy.
Email Analysis
July 5, 2008
This summer I have been asked about email analysis on two different occasions. In order to respond to these requests, I had to grind through my archive of email-related information. I wrote about Clearwell Systems and its approach earlier this year. You can read this essay here.
I cannot reproduce the information my paying customers received. I can take a representative company–in this case, Stratify, a unit of Iron Mountain–and show you two different screen shots. These layouts and representations are the property of Stratify, and I am including them in this essay for two reasons:
- Stratify has been one of the early players in text analytics. First as Purple Yogi and then as Stratify, the company engaged in the difficult missionary marketing needed to turn nonbelievers into believers.
- The company has gained some traction in the legal market, which, in the US, is a booming sector. The problems of the economy translate into a harvest of riches for some legal firms. Email is a big deal in discovery, and few have the resources to get a human to read all the baloney that zooms around an organization involved in a legal matter.
The Problem
You know the problem. Email was once ASCII shot between two people on Arpanet. Today email is the bane of the knowledge worker. The volume is high. The storage systems antiquated. The attachments madden the sane. The people using email forget that the messages live on different servers and can, in the process of discovery, be copied to a storage device and delivered to the attorney or attorneys who have to find something germane to the legal matter in the terabytes of digital data.
To summarize the challenges:
- Email volume (lots of it, maybe a billion messages in a mid-sized organization every year)
- Email attachments (tough to find the “right” one)
- Email crashes (restores don’t always work, which you probably know first hand)
- Email sent as if it were a one-time, secret communication
- Email with recipients who, by definition, have some relationship.
For a lawyer, email is good and bad. It’s good if one finds a smoking gun or better yet a gun in the act of shooting. It’s bad if the bullets are coming from the opposing side’s legal eagles, worse if the bullet shoots a legal eagle out of the sky with a slug through the brain.
Ergo: email is a big, big deal in the information world of litigation.
The Solution
The fix is obvious: search. Actually, to be precise, the conundrums of email invite text processing, text analytics, link analysis, relationship extraction, entity extraction, and other nifty methods.
The basics of email analysis are simple on the surface and more complicated under the hood, out of sight of non-technical types like lawyers: [a] copy email to a fast storage device, [b] tell the email analysis program to index the email, [c] run key word searches or browse the outputs, [d] make notes, print out email, and read individual documents of interest, [e] repeat, taking care to bill for the time. (That’s the best part of email analysis. It’s quicker than manual methods, but the systems need a babysitter. Those operating these systems can bill without working up too much of a mental headache. Automated processes do make some legal thinking less painful. The best part is billing for this less stressful time.)
What do these systems show the user? The illustration below shows a Stratify search screen. Stratify has probably updated the interface since I obtained this screen shot, but the main features are what interest us. Take a look at what the Stratify system user sees when analyzing processed email:
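Steps [b] and [c] above are the technical core. As a rough sketch only, and not Stratify’s method, here is a tiny Python inverted index that maps each word to the messages containing it and answers Boolean AND keyword queries. The function names, message IDs, and texts are made up for illustration; production systems add attachment parsing, stemming, ranking, and storage that scales to terabytes.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(messages: dict[str, str]) -> dict[str, set[str]]:
    """Step [b]: map each word to the set of message IDs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for msg_id, body in messages.items():
        for word in tokenize(body):
            index[word].add(msg_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Step [c]: return IDs of messages containing every query word."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


if __name__ == "__main__":
    mail = {
        "msg-001": "Q3 invoices need revising",
        "msg-002": "Lunch on Friday?",
        "msg-003": "Revised Q3 invoices attached",
    }
    idx = build_index(mail)
    print(sorted(search(idx, "q3 invoices")))  # ['msg-001', 'msg-003']
```

The index is built once; after that, each query touches only the posting sets for its words instead of rereading every message, which is why indexing comes before searching in the workflow.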
Stratify’s email visualization
The principal features of this display are:
- Simplicity. You don’t want to confuse attorneys
- A picture showing people and their relationships as discerned by the system. Remember, an email can be sent to a person unrelated to a subject, either by accident or for some other reason such as a “this is what I am doing” courtesy
- Links on the right hand panel to make it easy for the user to poke around by sender, topic, etc.
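The relationship picture in such displays has to come from somewhere. One simple way to approximate it, assumed here for illustration and not drawn from Stratify documentation, is to count sender-recipient pairs in message headers and keep only pairs that exchange mail repeatedly. That screens out the accidental or courtesy recipients mentioned above. The message format and function names are hypothetical.

```python
from collections import Counter


def build_contact_graph(messages: list[dict]) -> Counter:
    """Count how often each pair of addresses appears together.

    Each message is assumed to be a dict with a "from" address and a
    "to" list; a real system would parse full To, Cc, and Bcc headers.
    """
    edges: Counter = Counter()
    for msg in messages:
        sender = msg["from"].lower()
        for recipient in msg["to"]:
            # Sort the pair so A->B and B->A accumulate on one undirected edge.
            pair = tuple(sorted((sender, recipient.lower())))
            edges[pair] += 1
    return edges


def strong_ties(edges: Counter, min_messages: int) -> list[tuple]:
    """Keep pairs that exchanged at least min_messages messages,
    filtering out one-off or accidental recipients."""
    return sorted(pair for pair, count in edges.items() if count >= min_messages)


if __name__ == "__main__":
    msgs = [
        {"from": "alice@example.com", "to": ["bob@example.com"]},
        {"from": "bob@example.com", "to": ["alice@example.com"]},
        {"from": "alice@example.com", "to": ["bob@example.com", "carol@example.com"]},
    ]
    print(strong_ties(build_contact_graph(msgs), 2))
    # [('alice@example.com', 'bob@example.com')]
```

Carol received one message and drops out at a threshold of two; whether that is the right cutoff is a judgment call for whoever reviews the collection.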
Let’s assume that the email is one part of a discovered collection of information. For that case, Stratify provides a richer interface. It includes the bells and whistles that justify the Stratify system’s price tag, which runs to six figures in case you want to license the system.
Google Bashing: Now Googzilla Is Playing Catch Up
July 5, 2008
Elise Ackerman’s essay “Google Finds Itself Playing Catch Up” is another installment in the continuing saga of Google’s losing its youthfulness. You can read the full story here. I am fearful of quoting anything from a newspaper because the Associated Press has intimidated the squawking goose.
My take on her July 5, 2008, analysis is that Google is getting its rump roasted in the mobile phone sector. The reasons range from industry giants taking such un-Googley actions as buying Symbian to forming the LiMo Foundation, owned by 18 telcos. You can learn more about LiMo here. This is a link to Wikipedia, but when I scanned it, the entry seemed reasonably objective. Telco-related information can be difficult to winnow for information versus disinformation.
Based on my reading of her essay, Ms. Ackerman concludes that Google has to contend with competition, technical issues, and its own desire to make a Gphone that is similar to the iPhone.
My thoughts on Google and mobile phones have been described in both of my Google studies, which you can learn about here. I also took money to dig into Google telephony technology for a couple of telcos. My audience communicated in verbal and non-verbal ways, “Google is a Web search and ad company. Google is not going to be able to play in the mobile telephony space.” Fortunately I got paid before sharing my research findings with the poobahs of phones.
Let me offer several observations on the Google-telco showdown:
- Google has been fooling around with mobile technology for at least nine years. Do a patent search or check out the technical journals using a commercial online engineering database. The company has been thinking about telephony, quality of service, and alternative transmission technologies over time.
- Google is not a fast mover. The company gives an impression of joyful spontaneity, but at its core, Google is conservative in many ways, preferring to test, obtain data, test, refine, obtain more data, and then determine if anyone cares. “Care” as I use the term means “click” or “advertise”.
- Google operates via “pull”, not “push”. Most companies, including telcos, decide what the customer will get and then shove it down the marketing pipeline. By comparison, the “pull” model, when combined with test-analyze-refine-test, can look pretty ineffectual. Caution is advised before tagging Google as “ineffectual”.
- Google’s “play” may not be easily translated into a one-to-one response to existing products and services. Android is a work in progress. Voice search is a component of a knowledge base. Google’s deals with various companies and partners are better viewed as variants of what I call the “Google dating model”. You can learn about this by checking out how Google works with Salesforce.com. Google is dating, not yet marrying Salesforce.com, which might happen.
When assessing a Google challenge in telephony, then, a different perspective on the day-to-day activities is needed. Getting perspective on Googzilla is hard for some companies, some analysts, some competitors, and some partners.
My thought is that it is too soon to label the GOOG as a winded has-been in the great telephony race. Perhaps we are watching a young Googzilla running wind sprints. The Googzilla may be in training, not playing for real yet.
Stephen Arnold, July 5, 2008
Google: Another Angle on Question Answering
July 5, 2008
On July 3, 2008, the USPTO published US20080160490. Applications do not equal real products and services. Many people remind me that patent applications are the busy work of misguided engineers, flights of fancy, or bar bets among engineers to see who can fool the USPTO. The application was filed on March 22, 2007, and published about 15 months later, pretty snappy for this fine US government entity. The paperwork was herded along by Google’s legal eagles at Fish & Richardson, a law firm operating from the warm and sunny Minneapolis, Minnesota.
If you are curious, you may want to take a look at “Seeking Answers to Questions”. The buzz about Powerset’s marvelous semantic search engine has many folks twittering. You may want to visit Hakia.com here to see an all-software approach. Then check out Yahoo’s help system here which seems to share some similarities with what Google describes. You can find other question answering systems, including InQuira’s implementation for Honda, Semantra’s system, et al.
Here’s what Google says its invention does:
A computer-implemented method of seeking answers to questions comprises receiving one or more questions from users seeking answers, maintaining an inventory of pending questions to be answered, and transmitting a question from the pending question inventory to a network location determined to be topically relevant to the transmitted question based on the content of the network location.
Pretty mundane, right? If so, then why are two Google wizards–Udi Manber and Benedict Gomes–wasting their time and Google’s money with this approach to question answering?
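The quoted claim boils down to three moves: receive questions, keep an inventory of pending ones, and send each to a network location judged topically relevant from that location’s content. The quoted text does not say how relevance is computed. The Python sketch below assumes plain word overlap (Jaccard similarity) as a stand-in, and the function names, questions, and URLs are hypothetical. It is an illustration of the claim’s shape, not Google’s method.

```python
import re
from collections import deque


def terms(text: str) -> set[str]:
    """Lowercased word set; a crude stand-in for real topic analysis."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def relevance(question: str, page_text: str) -> float:
    """Jaccard overlap between the question's words and the page's words."""
    q, p = terms(question), terms(page_text)
    union = q | p
    return len(q & p) / len(union) if union else 0.0


def route_pending(pending: deque, locations: dict[str, str]) -> list[tuple[str, str]]:
    """Drain the pending-question inventory, sending each question to the
    location whose content scores highest. `locations` maps a URL to that
    page's text and must not be empty."""
    routed = []
    while pending:
        question = pending.popleft()
        best_url = max(locations, key=lambda url: relevance(question, locations[url]))
        routed.append((question, best_url))
    return routed


if __name__ == "__main__":
    inventory = deque([
        "How do I repot a bonsai tree?",
        "Which carburetor fits a 1967 Mustang?",
    ])
    sites = {
        "http://bonsai.example.org": "bonsai tree care repot prune soil",
        "http://mustang.example.org": "mustang carburetor restoration 1967 parts",
    }
    for question, url in route_pending(inventory, sites):
        print(url)
    # http://bonsai.example.org
    # http://mustang.example.org
```

The interesting part of the application is what happens after routing: the humans at the chosen location supply the answer, which this sketch does not attempt to model.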
The social aspects of this invention are interesting. The human inputs hook into the Google infrastructure. There are hints of Google’s method for figuring out what’s good and what’s less good expressed as “knowledgeable users”, Google’s desire to build knowledge bases as it does with phonemes, and Google’s interest in hooking traffic into Web sites for the purpose of selling advertising. The notion of experts collaborating with experts struck me as a broader implementation of the types of operations one can achieve with appropriate resources via Tacit Software’s system for an enterprise.
This invention caught my attention because it expresses the meta-nature of some of Google’s other recent innovations. Google is chugging to knit existing intelligent subsystems into integrated fabrics of functionality.
I find this invention amusing because, as Microsoft pursues Google with Xerox PARC technology that iterates down to meaning via machine processes, Google is exploring how to integrate human smarts, Google fancy math, and finer-grained opportunities for advertisers. Judge for yourself whether this expresses a holistic approach to information. The patent application is only 13 pages of crystal clear Google legalese and engineering explication. Agree? Disagree? Let me know.
Stephen Arnold, July 5, 2008
Architecture of a Database System Available for Download
July 5, 2008
I was looking for information about MySQL and SharePoint, and I came across a useful monograph. You can download a copy of the 2007 “Architecture of a Database System” by Joseph Hellerstein, Michael Stonebraker, and James Hamilton. Mr. Hamilton is involved in Microsoft’s new data center initiatives, and he has a solid grounding in databases. His co-authors are professors. At the time the monograph was published, Mr. Hellerstein was at the University of California-Berkeley and Mr. Stonebraker at the Massachusetts Institute of Technology. On July 4, 2008, the complete monograph was located here. I scanned the 180-page document and came away with the impression that it provides a thorough review of database technologies. I did not spot much information about Google’s system, but I zipped through the PDF quickly and may have missed the references. You may want to snag a copy before it disappears.
Stephen Arnold, July 5, 2008
India Times Goes Googley
July 4, 2008
“Life at the Googleplex” appeared in my news reader on July 4, 2008. You can read the IndiaTimes Infotech essay here. (Note: this site’s search system is awful. I had to find missing chunks of the story in the site’s cache.) Why did this essay catch my attention? Well, it is a high-water mark in dressing Googzilla as the world’s most wonderful mom. I read:
The company’s carefully assembled university campus like environment — lava lamps, massage chairs, free gourmet food courts — has been the subject of saturation media coverage.
Then the author giddily reveals:
There are Snack Rooms packed with bins of various cereals, gummy bears, M&Ms, toffee, licorice, cashew nuts, yogurt, carrots, fresh fruit and other snacks. Also stacked are dozens of different drinks including fresh juice, soda and make-your-own cappuccino. In fact, there’s a rule within Google: that there must be food within 100 feet of every employee.
The only hint of a negative in this star-struck writer’s Googley moment is this statement:
But despite its much-lionised and benign corporate credo ‘Do No Evil’, and a self-projected work culture that would appear to border on bohemian anarchism, Google gives media and technology giants from Redmond to New York to Minato sleepless nights. For the conspiracy theorists, Google’s takeover of the world is imminent.
Google gets the type of treatment usually reserved for Trump buildings and millionaires who buy big pink Pontiacs. Google’s taking over the world sounds like a pretty good thing to the author of this encomium.
Stephen Arnold, July 4, 2008
More Google Coding Goodies
July 4, 2008
Google is pushing its technical peanut ahead a millimeter at a time. You can read about Google’s C++ Testing Framework here. The Googley know that debugging sucks, so anything that helps a programmer write cleaner code “sucks less”. Why is this important? Google is building a very solid, publicly accessible development platform. Most organizations’ information technology departments are not staffed with Google types. But in a few years, hordes of college graduates with programming skills and Google juice in their veins will be entering the work force. Taken together, these programming services, features, and functions add up to a phase change. You can’t see it unless you piece together the fragments. Google’s system makes it tough to find the pieces. Forget making a picture. Well, this is another piece. Google says:
It will take you about 10 minutes to learn the basics and get started. Stay tuned to this blog for helpful Google Test information in upcoming Testing on the Toilet episodes.
Google is writing to a very specific audience which does not need to be told explicitly that C++ testing is now quicker and easier. Do you know how to translate Google speak? Better learn.
Stephen Arnold, July 4, 2008
Fast Cash, Faster Crash
July 4, 2008
On July 3, 2008, Erick Schonfeld summarized the continuing saga of Fast Search & Transfer’s fastest move ever. The story “Did the Enron of Norway Pull a Fast One on Microsoft? More Details about the Mess at Fast Search & Transfer” is here.
The story is quite thorough, according to my sources in Norway, and there is little I can add to the TechCrunch write up.
I would like to highlight one point, provide the links to my analysis of the Fast Search saga, and offer several observations about the nature of enterprise search. Before I start, take a look at this graphic because this is the wild bobsled ride that many vendors are queued to take:
Once a vendor starts down the sales bobsled run, it is tough to stop. The vendor has to ride to the bottom of the hill, hoping that he will not crash, risking serious injury and maybe death.
The Key Point for Me
After reading the TechCrunch essay, one segment gnawed at me; specifically:
…It [Microsoft’s paying $1.2 billion for Fast Search & Transfer] does point to a certain blindness on the part of Microsoft, or at least a willingness to look the other way, in its obsessive quest to become a player in search (see Yahoo and Powerset). It also raises questions about Fast’s underlying search technology. If Fast was having trouble closing deals for its products, how good can its technology really be?
Yes, this is the key question. The Fast Search & Transfer core technology was purpose-built to index static Web sites. At the time Google started operations, AltaVista.com was an orphan, quickly losing its leadership position because of the voracious appetite for computing resources that public Web search engines have. The mantra is “Feed me computing resources or die”.
Fast Search offered a Web site called AllTheWeb.com, and it was pretty good. At the time of 9/11, the AllTheWeb.com news indexing system was among the first to have reasonably timely information. Fast Search made a fateful decision in 2002 which led to Fast Search & Transfer’s exiting the Web indexing business. Fast Search sold its Web indexing business to Overture for $70 million with more money promised if certain goals were achieved. Fast Search took the money and focused on enterprise search.
As I recall my conversations with Fast Search & Transfer executives, when I was involved in the Fast Search deployment for a government project, the rationale for the decision was that enterprise search was a great opportunity. Fast Search’s executives suggested to me that the company could move quickly to dominate the search market. At the time, there was little reason to doubt the confidence of the Fast Search team. A Fortune 50 company was backing the Fast Search system in the government-wide indexing program. In the 2002-2003 time period, not many systems could demonstrate an index of 40 million documents. Even today, licensees of search systems do not grasp the hurdles that indexing large amounts of text puts in front of an organization. I have written extensively about this elsewhere, and I have little to add about the ignorance of search scaling that continues to plague organizations.
Business Intelligence: Growth, but Is It Really Delivering?
July 4, 2008
The fireworks have started in rural Kentucky. Oh, wait. That’s the neighbors firing shotguns at squirrels. Think of it as a way to give squirrels a fighting chance.
Amidst the gunfire, I was chugging through my trusty news reader and came across two stories about business intelligence. Both are well written and, in a way, complementary.
The IDC Business Intelligence Study
The first essay was by a solid journalist, Doug Henschen, who writes for Intelligent Enterprise. “IDC Report Sees Steady Growth for BI, Pent-Up Demand for Analytics” summarizes data about the business intelligence market, or “BI” for short. You can read the full essay here. (Note: the URL is a complex one, which often means a story can be tough to locate after a few days. Read Mr. Henschen’s article promptly, please.)
The essay is lengthy and difficult to summarize. Mr. Henschen crams a large amount of information into this two-page post. For me, the most important point in the article was:
Another technology seeing increased demand is text mining… with applications blossoming in areas such as voice-of-the-customer analysis. Vendors including Business Objects, SAS and SPSS have responded with recent acquisitions and product releases aimed at combining text mining and data mining techniques. The two camps of structured and unstructured data analysis remain very separate. It’s important for vendors to respond because if the products aren’t there, it makes it harder for practitioners to invest in the technology. [Some minor edits for readability made. SEA]
This observation underscores the assault on enterprise search vendors that users and business intelligence vendors are now making. Enterprise search is in a “circle the wagons” mode with significant pressure on high profile vendors from many quarters. Now business intelligence vendors see an opportunity to push applications that may be perceived as higher value.
One of the highlights of the essay is its charts. Mr. Henschen has reproduced graphics, presumably from the for-fee report. Here’s an example:
So business intelligence is growing. Good news aboard a sinking economic ship.
Searching Microsoft with a Google Shortcut
July 4, 2008
A happy quack to Mitch Tulloch at IT World for a useful script for Windows users. Mr. Tulloch offers a script that directs Google to search the Microsoft content collection. If you are not familiar with this index, navigate to Google.com. Then click on “Advanced Search”. Scroll to the bottom of the page and click on “Microsoft”. Google will now use a special Microsoft index for your query. If you often look for information about Microsoft, you will want Mr. Tulloch’s script. Oh, why do Mr. Tulloch and I use Google.com to search for Microsoft information? The Microsoft search feature does not do as good a job with the content as Google does. Don’t believe me? Try locating information about Microsoft data center scaling using http://search.live.com. Now try the same query on Google. See for yourself. You can find the script here.
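The script linked in this post is not reproduced here. As a rough illustration of the same idea, here is a hypothetical Python helper that builds a Google query URL restricted with Google’s `site:` operator. Note the assumption: it uses `site:support.microsoft.com` rather than the special Microsoft index described above, so results are limited to pages on that one domain.

```python
from urllib.parse import urlencode


def microsoft_kb_search_url(query: str, site: str = "support.microsoft.com") -> str:
    """Build a Google search URL that limits results to one site
    by appending the site: operator to the query."""
    return "https://www.google.com/search?" + urlencode({"q": f"{query} site:{site}"})


if __name__ == "__main__":
    print(microsoft_kb_search_url("data center scaling"))
    # https://www.google.com/search?q=data+center+scaling+site%3Asupport.microsoft.com
```

`urlencode` handles the escaping (spaces become `+`, the colon becomes `%3A`), so the query can be pasted into a browser or handed to a launcher script unchanged.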
Stephen Arnold, July 4, 2008
http://www.itworld.com/internet/53438/script-make-kb-searches-easier