Salesforce, InStranet, and Sinequa Connection
September 13, 2008
In August 2008, Salesforce.com paid about $31.0 million for InStranet, according to eWeek here. Salesforce.com is the high-flying cloud computing vendor. InStranet, founded in 1999, was generating a fraction of Salesforce.com’s revenue. InStranet’s principal business is providing customer support and contact center systems and tools. You can find more information, including white papers, here. InStranet calls its solutions “multi channel knowledge applications.” The idea is that a customer searches for information or looks at suggested links that can resolve a customer support issue.
Why am I writing about this modest acquisition that has been overlooked by most Web logs and trade news services? The answer is that Sinequa, the French search and content processing vendor, provides the search and finding technology included in some of InStranet’s customer support solutions. After four weeks of waiting, Sinequa learned that its deal with InStranet would not be affected by the Salesforce.com buyout of InStranet. You can read Bios Magazine’s report here.
Will Salesforce.com leverage Sinequa’s search and content processing technology? I will keep you posted.
More information about Sinequa is here.
Google and Content: Way Back in 1999
September 13, 2008
Nine years ago Google was a search engine, right? If you said, “Yes,” you were not wrong, but you were not entirely correct either. Google was worrying about static Web pages and ways to inject content into those Web pages. The idea was to get “some arbitrary input” from the user, and Google would take it from there. In 1999, Google’s wizards were working on ways to respond to user actions, methods to assemble pertinent content likely to match the user’s need, and processes to generate a Web page with that disparate information.
Why do I care about Google in 1999? Two reasons:
- Google was thinking “publishing” type thoughts a long time ago.
- A patent with information on the specific system and method just popped out of the USPTO’s extremely efficient system.
The patent in question is US 7,424,478 B2. Google filed the application in 2000, received patent 6,728,705, and has now been granted this most recent incarnation. The title is “System and Method for Selecting Content for Displaying over the Internet Based upon Some User Input.” With the recent release of Chrome, the notion of assembling and publishing content from disparate sources is somewhat analogous to what Ziff Communications Co. used to do when it was publishing magazines or what its database units did when generating job opportunities in its General Business File product.
With Google scanning books and newspapers, it seems logical that it would take user input and assemble a Web page that goes beyond a laundry list. For me, the importance of this invention is that the GOOG was thinking these thoughts before it had much search traffic or money. Postscript: the mock screen shots are fun as well. You can see the sites that were catching Google’s attention almost a decade ago. Anyone remember Go.com?
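Just for fun, here is a toy sketch of the general idea as I read the invention: take some arbitrary user input, pull pertinent items from disparate sources, and generate a Web page on the fly. The sources, matching logic, and page layout below are my own inventions for illustration, not Google’s actual method.

```python
# Toy illustration only: invented sources and naive keyword matching, not
# Google's method. The point is the shape of the idea: user input in, an
# assembled page of disparate content out.

SOURCES = {
    "news": ["Chrome browser released", "Cubs win at Wrigley"],
    "books": ["A history of the Chicago Cubs"],
    "web": ["Chicago Cubs official site", "Chrome download page"],
}

def assemble_page(user_input: str) -> str:
    """Build a simple HTML page from items that mention the user's terms."""
    terms = user_input.lower().split()
    sections = []
    for source, items in SOURCES.items():
        hits = [item for item in items if any(t in item.lower() for t in terms)]
        if hits:
            sections.append(f"<h2>{source}</h2>" +
                            "".join(f"<p>{hit}</p>" for hit in hits))
    return (f"<html><body><h1>Results for: {user_input}</h1>"
            + "".join(sections) + "</body></html>")

if __name__ == "__main__":
    print(assemble_page("Chicago Cubs"))
```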
Stephen Arnold, September 13, 2008
Search: A Failure to Communicate
September 12, 2008
At lunch today, the ArnoldIT.com team embraced a law librarian. For Mongolian beef, this information professional agreed to talk about indexing. The conversation turned to the grousing that lawyers do when looking for information. I remembered seeing a cartoon that captured the problem we shelled, boiled, and deviled during our Chinese meal.
Source: http://www.i-heart-god.com/images/failure%20to%20communicate.jpg
Our lunch analysis identified three constituencies in a professional services organization. We agreed that narrowing our focus to consultants, lawyers, financial mavens, and accountants was an easy way to put egg rolls in one basket.
First, we have the people who understand information. Think indexing, consistent tagging for XML documents, consistent bibliographic data, the credibility of the source, and other nuances that escape my 86-year-old father when he searches for “Chicago Cubs”.
Second, we have the information technology people. The “information” in their title is a bit of misdirection that leads to a stir fry of trouble. IT pros understand databases and file types. Once data are structured and normalized, the job is complete. Algorithms can handle the indexing and the metadata. When a system needs to go faster, the fix is to buy hardware. If it breaks, the IT pros tinker a bit and then call in an authorized service provider.
Third, we have the professionals. These are the ladies and gentlemen who have trained to master a specific professional skill; for example, the legal eagle or the bean counter. These folks are trapped within their training. Their notions of information are shaped by their deadlines, crazed clients, and crushing billability.
Here’s where the search system or content processing system begins its rapid slide to the greasy bottom of the organization’s wok.
- No one listens to or understands the other players’ definitions of “information”.
- The three players, unable to get their points across, clam up and implement their own visions of information.
- The vendors, hungry for the licensing deal, steer clear of this internal collision of ignorant, often supremely confident souls.
- The system is a clunker, doing nothing particularly well.
Enter the senior manager or the CFO. Users are unhappy. Maybe the system is broken, a big deal is lost, or a legal matter goes against the organization. The senior manager wants a fix. The problem is that unless the three constituents go back to the definition of information and carry that common understanding through requirements, procurement, and deployment, not much will change.
Like the old joke says, “Get me some new numbers or I will get a new numbers guy.” So, heads may roll. The problem remains the same. The search and content processing system annoys a majority of its users. Now, a question for you two or three readers: “How do we fix this problem in professional services organizations?”
Stephen Arnold, September 12, 2008
A Useful SharePoint Functional Diagram
September 12, 2008
Eric Johnson, enterprise architect, invested some time in creating a SharePoint functional breakout. Most of the SharePoint diagrams I have seen are too complicated for my goose-sized brain. You can look at his “MOSS Functional Breakout” here and snag a copy of this useful diagram. His Web log post identifies a couple of individuals who have “borrowed” his diagram without permission. I don’t want to make this error, nor do I want you to use it without appropriate attribution. Here’s a low resolution version of the schematic. The credit line is Eric Johnson, formerly of Microsoft, now an enterprise architect. You can reach him here.
A happy quack to Mr. Johnson and a little goose present on the windshield of those who used this diagram without permission or attribution.
Stephen Arnold, September 12, 2008
eDiscovery: Speed Bumps Annoy Billing Attorneys
September 12, 2008
A happy quack to my Australian reader who called my attention to “eDiscovery Performance Still a Worry”. The article by Greg McNevin appeared on the IDM.net.au Web site on September 10, 2008. The main point of the write up is that 60 percent of those polled about their organization’s eDiscovery litigation support system said, “Dog slow.” The more felicitous wording chosen by Mr. McNevin was:
The survey also found that despite 80 percent of organisations claiming to have made an investment in IT to address discovery challenges, 60 percent of respondents think their IT department is not always able to deliver information quickly enough for them to do their legal job efficiently.
The survey was conducted by Dynamic Markets, which polled 300 in-house legal eagles in the UK, Germany, and the Netherlands. My hunch is that the 60 percent figure may well apply in North America as well. My own research unearthed the fact that two-thirds of the users of enterprise search systems were dissatisfied with those systems. The 60 percent score matches up well.
In my view, the larger implication of this CommVault study is that when it comes to text and content processing, more than half the users go away annoyed or use the system whilst grumbling and complaining.
What are vendors doing? There’s quite a bit of activity in the eDiscovery arena. More gladiators arrive to take the place of those who fall on their swords, get bought as trophies, or die at the hands of another gladiator. Sadly, the activity does not address the issue of speed. In this context, “speed” is not three-millisecond response time. “Speed” means transforming content, updating indexes, and generating the reports needed to figure out what information is where in the discovered information.
Many vendors are counting on Intel to solve the “speed” problem. I don’t think faster chips will do much, however. The “speed” problem is that eDiscovery relies on a great many processes. Lawyers, in general, care about what’s required to meet a deadline. There’s little reason for them to trouble their keen legal minds with such details as content throughput, malformed XML, flawed metatagging, and trashed indexes after an index update.
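To make the chip argument concrete, here is a minimal sketch in Python. The stage names and timings are invented for illustration; the point is that when “speed” is the sum of many acquisition, transformation, indexing, and reporting steps, doubling processor speed shaves only the compute-bound slices.

```python
# Invented stage names and timings, purely to illustrate why faster chips
# alone do not fix eDiscovery "speed": only CPU-bound stages benefit.

PIPELINE = [
    # (stage, minutes, cpu_bound)
    ("acquire content", 40.0, False),   # mostly I/O and network
    ("transform formats", 25.0, True),
    ("extract metadata", 15.0, True),
    ("update indexes", 30.0, False),    # mostly disk and locking
    ("generate reports", 10.0, True),
]

def total_minutes(cpu_speedup: float = 1.0) -> float:
    """Sum the stage times; only CPU-bound stages shrink with faster chips."""
    return sum(minutes / cpu_speedup if cpu_bound else minutes
               for _, minutes, cpu_bound in PIPELINE)

if __name__ == "__main__":
    print(f"baseline: {total_minutes():.0f} minutes")            # 120 minutes
    print(f"2x faster chips: {total_minutes(2.0):.0f} minutes")  # 95 minutes, not 60
```

Under these made-up numbers, doubling chip speed buys roughly a 20 percent improvement end to end, not the halving of wait time a lawyer staring at a deadline might expect.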
eDiscovery’s dissatisfaction score mirrors the larger problems with search and content processing. There’s no fix coming that will convert a grim black and white image to a Kodachrome version of reality.
Stephen Arnold, September 12, 2008
Convera Ties Up with Fast Search
September 12, 2008
I have to admit that I was surprised by this headline: “Convera and Fast, a Microsoft Subsidiary Announce Extended Business Relationship.” You can read the story from MarketWatch here. (Note: this is a wacky url and I have a hunch the source will 404 unless you click in a timely manner.) After Fast Search released its Web Part for SharePoint, I was expecting a tapering off in Fast Search & Transfer innovation. Was I surprised? Yes, a tie up between Convera (formerly Excalibur Technologies) and Microsoft’s enterprise search unit is news. Convera, like Fast Search, seems to have gone through a rough patch on the information superhighway. There was the marriage to and divorce from Intel and the NBA several years ago. Then there was a decline in revenues. Finally, Convera exited the enterprise search business to focus on building vertical search engines for publishers. My recollection is that the company’s revenues continue to dog paddle. Fast, as you may know, is alleged to have made a math error in its financial reports in the 24 months prior to its sale to Microsoft in April 2008 for $1.2 billion. I thought opposites attracted, and these companies seem to have some interesting similarities. I recall that Autonomy bought a chunk of Convera. Then Fast Search jumped in and bought another piece of Convera. There’s a history between Fast Search and Convera, and obviously the management teams of both companies get along swimmingly.
According to the write up that appeared on September 11, 2008:
Convera will integrate FAST ESP(TM) search capabilities into its hosted vertical search solution for publishers. Publishers will be able to configure and customize an integrated site search and vertical search solution using Convera’s Publisher Control Panel (PCP). Additionally, Convera will extend FAST AdMomentum across the Convera search platform for publishers. Customers of the Convera professional network would be able to leverage targeted search and contextual ad inventory supplied by Microsoft and distributed from the Convera platform.
I certainly hope this deal becomes a 1 + 1 = 3 for their respective stakeholders. Fast Search has faded into the background at Microsoft from my vantage point. Convera had, in all honesty, dropped completely off my radar. The Intel and NBA blow-ups waved me off the company years ago. My take on this deal is that I am not sure it will generate significant revenue unless Convera has found a way to out-Google Google. This seems unlikely to me. Agree? Disagree? Help me learn.
Stephen Arnold, September 12, 2008
Search: Google’s 10 Percent Problem
September 11, 2008
I love it when Google explains the future of search. Since Google equals search for more than 70 percent of the users in North America and even more outside the US, the future of search means Google. And what does Google’s helpful Web log here tell us:
So what’s our straightforward definition of the ideal search engine? Your best friend with instant access to all the world’s facts and a photographic memory of everything you’ve seen and know. That search engine could tailor answers to you based on your preferences, your existing knowledge and the best available information; it could ask for clarification and present the answers in whatever setting or media worked best. That ideal search engine could have easily and elegantly quenched my withdrawal and fueled my addiction on Saturday.
The “universal search” play announced at the hastily conceived Searchology news conference (anyone remember that?) has fallen by the wayside. I have wondered if Bear Stearns’ publication of the Google Programmable Search Engine report and the suggestion that Google may be angling to become the Semantic Web spawned that Searchology program.
I don’t think search is a 10 percent problem for Google. The problem is bandwidth, regulations, traffic, and the market. After digging through Google’s technical papers and patent documents, I have reached the conclusion that the GOOG has the basics in place for next-generation search; for example:
- Search without search
- Dossier generation
- Predictive content assembly
- Integration of multiple functions because “search” is simply a way station on the path to solving a problem.
Most of the search pundits getting regular paychecks, for now, from mid-level consulting firms assert that we are at the first step or Day One of a long journey with regard to search. Sorry, mid-range MBAs. Search of the key word variety has been nailed. Meeting the needs of the herd searcher? Nailed. Personalization of results? Nailed.
What comes next are these search-based solutions. The reason that vendors are chasing niches like eDiscovery and call center support is simple: these are problems that can be addressed in part by information access.
Meanwhile the GOOG sits in its lair and ponders when and how to release, to maximum advantage, the PSE, dataspaces, “I’m feeling doubly lucky”, and dozens of other next-generation search goodies, including social. Keep in mind that the notion of clicks is a social function. Google’s been social since the early days of BackRub.
There you have it. Google has a 10 percent challenge. In my opinion, that last 10 percent will be tough. Lawyers and other statistically messy non-algorithmic operations may now govern Googzilla’s future. If you want links to these Google references, you can find them here. My rescue boxer Tess needs special medical attention, so you have to buy my studies for the details. Sorry. Rescue boxers come before free Web log readers. Such is life. Sigh.
Stephen Arnold, September 11, 2008
Tribune Says: Google’s Automated Indexing Not Good
September 11, 2008
I have been a critic of Sam Zell’s Tribune since I tangled with the site for my 86-year-old father. You can read my negative views of the site’s usability, its indexing, and its method of displaying content here.
Now on with my comments on this MarketWatch story titled “Tribune Blames Google for Damaging News Story” by John Letzing, a good journalist in my book. Mr. Letzing reports that Google’s automated crawler and indexing system could not figure out that a story from 2002 was old. As a result, the “old” story appeared in Google News and the stock of United Airlines took a hit. The Tribune, according to the story, blames Google.
Hold your horses. This problem is identical to the folks who say, “Index my servers. The information on them is what we want indexed.” As soon as the index goes live, these same folks complain that the search engine has processed ripped-off music, software from mysterious sources, Cub Scout fund-raising materials, and some content I don’t want to mention in a Web log. How do I know? I have heard this type of rationalization many times. Malformed XML, duplicate content, and other problems mean content mismanagement, not bad indexing by a search system.
Most people don’t have a clue what’s on their public facing servers. The content management system may be at fault. The users might be careless. Management may not have policies or may not create an environment in which those policies are observed. Most people don’t know that “dates” are assigned and may not correlate with the “date” embedded in a document. In fact, some documents contain many dates. Entity extraction can discover a date, but when there are multiple dates, which date is the “right one”? What’s a search system supposed to do? Well, search systems process what’s exposed on a public facing server or a source identified in the administrative controls for the content acquisition system.
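A tiny Python sketch makes the ambiguity concrete. The page snippet and the pattern are my own inventions for illustration, not how Google News actually works: simple extraction turns up several plausible dates, and nothing in the text tells the crawler which one to treat as the publication date.

```python
import re

# Invented page snippet and pattern, for illustration only (not Google's method).
page = """
Copyright 2002 Tribune Company. Originally published December 10, 2002.
Retrieved from the archive on September 8, 2008. Page generated 09/09/2008.
"""

# Match "Month DD, YYYY" and "MM/DD/YYYY" style dates.
pattern = r"(?:[A-Z][a-z]+ \d{1,2}, \d{4})|(?:\d{2}/\d{2}/\d{4})"
dates = re.findall(pattern, page)

print(dates)
# ['December 10, 2002', 'September 8, 2008', '09/09/2008']
# Extraction finds all three; deciding which one is "the" date of the story
# is a judgment call the markup does not make for the crawler.
```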
Blaming a software system for lousy content management is a flashing yellow sign that says to me “Uninformed ahead. Detour around problem.”
Based on my experience with indexing content managed by people who were too busy to know what was on their machines, I think blaming Google is typical of the level of understanding in traditional media about how automated or semi-automated systems work. Furthermore, when I examined the Tribune’s for-fee service referenced in my description above, it was clear that the level of expertise brought to bear on this service was, in my opinion, rudimentary.
Traditional media is eager to find fault with Google. Yet some of these outfits use automated systems to index content and cut headcount. The indexing generated by these systems is acceptable, but there are errors. Some traditional publishers not only index in a casual manner; they also charge for each query. A user may have to experiment in order to find relevant documents. Each search puts money in the publisher’s pocket. The Tribune charges for an online service that is essentially unusable by my 86-year-old father.
If a Tribune company does not know what’s on its servers and exposes those servers on the Internet, the problem is not Google’s. The problem is the Tribune’s.
Stephen Arnold, September 11, 2008
tyBit: Zero Click Fraud
September 11, 2008
I’m getting my bobbin loaded and squirting the treadle on my sewing machine. It’s almost time for Project Runway, my favorite television show. I put down my can of 3-in-1 oil and scanned my newsreader for gewgaws. What did I see? A story in the prestigious Forbes Magazine about a new search engine called tyBit. I put down my bobbin and picked up my mouse. The paragraph in the Business Wire story on the lofty Forbes.com Web site, available here, said:
tyBit is the only Internet search solution that eliminates click fraud for its advertisers and provides itemized billing for all advertising dollars spent. It is also a no-cost private label search engine for traditional media so they can win back their advertisers, subscribers and revenue.
I navigated to the tyBit Web site, which was new to me, and saw this splash page complete with my tyBit “man”.
I ran my favorite query “ArnoldIT Google” and received this list of results:
I was happy. The first hit pointed to something I had written.
I then ran an image search on the query “arnoldit” and saw this:
There I was in bunny rabbit ears in 1981 and in 2007 with my lifetime achievement award for ineptitude. Happy again.
But I clicked on the ad labeled “Get free advertising now.” I don’t do advertising. No one hires me anyway. I clicked on the ad, hit back, and then clicked again. What do you know? Click fraud; that is, the click with no intent to buy. In fact, I did it seven or eight times until I decided that the zero click fraud assertion did not apply to house ads on queries about “ArnoldIT Google.”
The site indexes images, video, news, music, “local”, and “shop”. I found a link to sign up for tyBit mail. I did not click on each of these links. Project Runway awaits. The Forbes.com write up provides some metrics about the company:
- More than 6,000 advertisers test the click fraud technology
- The site averages 2.1 million searches per day and handled 50 million searches in August 2008
- One advertiser got more than 40 leads.
Sounds good. My suggestion is to read the Forbes.com write up, explore the tyBit site here, and make up your own mind. Google’s dominance does not seem to intimidate Clarence Briggs, CEO of tyBit. I have added this company to my watch list. Lots of search innovation out there right now there is, there is.
Stephen Arnold, September 11, 2008
Google: EULA Loop
September 11, 2008
The Australian PC World ran a news story I found interesting: “Google Claims License to User Content in Multiple Products.” You can read Grant Gross’s article here. The news peg is that the “ownership” language in the Chrome license agreement appears in other Google EULAs as well. Mr. Gross mentions these products:
- Picasa
- Blogger
- Google Docs
- Google Groups
You and I need to verify Mr. Gross’s assertions. If true, my observation that this type of language is not accidental may not be wide of the mark. Mr. Gross reports that some folks find these EULAs sufficient cause to work around Google services. Google, on the other hand, seems to suggest, “We’re really great guys and won’t take anyone’s content.”
For me the most interesting comment in the write up was:
…the copyright terms that still exist in Picasa, Blogger and other Google applications would allow the company to use its customers’ content to promote the Google service. That could allow Google to use the content in live product demonstrations, for example, or in some promotional materials…
If true, I need to do some hard thinking about what and what not to do via Google services. If false, it’s Google all the way. Right now, I’m on the fence. I think this is a safe place to be until this EULA issue becomes clearer to me.
Stephen Arnold, September 11, 2008