Google Odds: A Possible Search First
January 13, 2010
Amidst the furor of the Google – China issue, I noticed that most of the pundits ignored the global disruptive power of a Google decision. I may be one of the few—maybe the only addled goose—pointing out that Google operates like a nation-state, not a garden variety company. Another example of Google’s significance popped up in my Overflight service this morning. PaddyPower.com, an online wagering operation, issued a news release with the headline “Bookie Calls Google for Chinese Takeaway.” The company has put odds on Google’s action. Here’s the relevant passage:
Bookies Paddy Power are offering odds of 3/1 that Internet giant Google will follow through on it’s threat to quit China before 2012. The harsh warning by the worlds biggest search engine was sparked after the illegal hacking of Chinese Gmail accounts and comes amid increasing tensions between the US and China over Internet censorship. Any move by Google to quit China will no doubt comes as good news for China’s leading search engine, Baidu, that currently enjoys a 60 percent share of the Chinese Internet search market. Paddy Power are quoting odds of 10/11 that their market share will increase to 65 per cent by the end of 2010. Paddy Power said “China is obviously a massive potential market for Google so it will be interesting to see what the long-term strategic impact will be should they effectively give two fingers to the Chinese government and jump ship”
Three to one. If I were a betting goose, maybe?
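For anyone who wants to turn the bookie's fractions into percentages: fractional odds of a/b imply a break-even probability of b/(a+b). The sketch below is illustrative only. The function name is my own invention, and it ignores whatever margin the bookmaker builds into the price, so treat the percentages as rough.

```python
from fractions import Fraction


def implied_probability(odds: str) -> Fraction:
    """Convert fractional odds such as '3/1' into the implied probability.

    Odds of a/b pay a units of profit on a stake of b, so the bet breaks
    even when the event happens with probability b / (a + b).
    """
    profit, stake = (int(part) for part in odds.split("/"))
    return Fraction(stake, profit + stake)


# The two prices quoted in the Paddy Power release.
quotes = [
    ("Google quits China before 2012", "3/1"),
    ("Baidu reaches a 65 per cent share by end of 2010", "10/11"),
]

for label, odds in quotes:
    probability = implied_probability(odds)
    print(f"{label}: {odds} implies about {float(probability):.1%}")
```

On these numbers, 3/1 works out to one chance in four, and 10/11 to a bit better than even money.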
Stephen E. Arnold, January 13, 2010
No one paid me to write this news item. Since it relates to wagering, I will report it to one of the many lottery commissions. Now which state governs geese?
PolySpot Lands Crédit Agricole SA
January 13, 2010
PolySpot, a French systems development company, has landed the Economic Research Department of Crédit Agricole SA as a customer. The system will be used with the financial institution’s bilingual intranet portal. The story I saw appeared in Communauté Finance Opérationnelle. The PolySpot system will provide:
- Access to structured and unstructured data
- Theme suggestions
- Simple and advanced search options
- Programmable Custom Alerts
- Sort options
- Faceted navigation (grouping results by different criteria)
- Access rights management
- Stored query support.
You can get more information about PolySpot’s search and content processing system at www.polyspot.com/. You can read an interview with a PolySpot executive in the ArnoldIT.com Overflight service.
Stephen E Arnold, January 13, 2010
Nope, an unpaid post. When I am in Paris, I hide out in the flea market at Porte de Clignancourt. I will report this to the CRS shortly.
Stratify Software India
January 13, 2010
Some interesting information about Stratify, a unit of Iron Mountain, surfaced in a job posting for an engineer in Bangalore. In India, Stratify does business as Stratify Software India Pvt Ltd. The part of the advert that caught my attention was this description of Stratify as a Software as a Service company. Here’s the snippet I found interesting:
Stratify is a Product company which provides electronic discovery or unstructured Data mining solutions through Software as a Service Model. We are a fully owned subsidiary of Iron Mountain, the world’s largest Data Storage, protection and Recovery company with $3 Billion revenue. We are market leaders in our space and have registered 25-30% growth last year and 70% per annum growth in the previous 4 years. We have mostly Fortune-1000 companies as our clients. Iron Mountain, our parent company, has more than 13,000 employees.
Stratify, originally Purple Yogi, came on my radar as a text and content processing company. Now the firm is a provider of electronic discovery or unstructured data mining solutions. I think the growth of Iron Mountain is a useful factoid as well.
Stephen E. Arnold, January 13, 2010
A freebie. I suppose this disclosure falls under the purview of the ExIm Bank, to which I shall report the fact that I got no money for this item. Don’t you feel better knowing I wrote this because I have only a small pond in which to swim?
Search Vendors Working the Content Food Chain
January 13, 2010
In the last six months, I have noticed that three companies are making an effort to respond to ZyLAB’s success in the end-to-end content processing sector. There has been some uninformed and misleading discussion of search and content processing companies’ shift to vertical market solutions. I think this view distorts what some vendors are doing; namely, when one company finds a way to make sales, the other vendors pile into the Volkswagen. This is not so much “imitation as flattery”. What is happening is that sales are tough to make. When a company finds an angle, the stampede is on. In a short period of time, an underserved sector in search and content processing has more people stomping around than Lady Gaga.
Let’s go back in history, a subject that most of the poobahs, azure chip consultants, and self appointed experts avoid. The idea that certain actions have surfaced before is no fun. Identifying a “new” trend is easier, particularly when the trend spotter’s “history” extends to his / her last Google query.
The Mobius strip is non-orientable, just like search solutions that provide end-to-end solutions. A path on a Mobius strip can be twice as long as the original strip of paper. That’s a good way for me to think about end-to-end search and content processing systems. Costs follow a similar trajectory as well.
In the dim mists of time, one of the first outfits to offer an end-to-end solution to content acquisition, indexing, and search was, believe it or not, Excalibur. The first demonstration I received of the Excalibur RetrievalWare technology included scanning, conversion of the scanned image’s text to ASCII, indexing of the ASCII for an image, and search. The information processed in that demonstration was a competitor’s marketing collateral. There were online search systems, but these were mostly small scale systems due to the brutal costs of indexing large domains of HTML. A number of companies were pushing forward with the idea of integrated scanning systems. Sure, in the 1990s you could buy a high end scanner and software. But in order to build a system that minimized the fiddly human touch, you had to build the missing components yourself. Excalibur hooked up with resellers of high end scanners from companies like Bell+Howell, Fujitsu, and others. The notion of taking a scanned image, performing optical character recognition on the page image in memory, and then indexing that ASCII was a relatively new method. UMI (a unit of Bell+Howell) had a sophisticated production process to do this work. Big outfits like Thomson were interested in this type of process because lots of information in the early 1990s was still in hard copy form. To make a long story short, the Excalibur engineers were among the first to create a commercial product that mostly worked, well, sort of. The indexing was an issue. Excalibur embarked on a journey that required enhancing the RetrievalWare product and generating ready-to-use controlled vocabularies for specific business sectors like defense and banking. As you may know, Excalibur’s original vision did not work, so the company morphed into a search and content processing company with a focus on business intelligence. The firm renamed itself Convera.
The origins of the company were mostly ignored as the Convera package of services chased government work and commercial accounts like Intel and the National Basketball Association (data center SaaS functions for the former and video searching for the hoopsters). When those changes did not work out too well, Convera refocused to become a for fee version of the free Google custom search engine. That did not work out too well either, and the company has been semi-dissolved.
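The scan, convert to ASCII, index, and search sequence from that early demonstration can be sketched in a few lines. This is a toy under loud assumptions, not a description of how RetrievalWare worked: the "scanned pages" are plain strings, the `ocr_to_ascii` function is a hypothetical stand-in for real optical character recognition, and the index is a bare inverted index with AND-only matching.

```python
import re
from collections import defaultdict


def ocr_to_ascii(scanned_page: str) -> str:
    """Hypothetical OCR stand-in: a real system would read pixels.

    Here the 'scan' is already text, so we only drop non-ASCII characters.
    """
    return scanned_page.encode("ascii", errors="ignore").decode("ascii")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of page ids that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for page_id, scan in pages.items():
        for term in tokenize(ocr_to_ascii(scan)):
            index[term].add(page_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the page ids that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    return set.intersection(*(index.get(term, set()) for term in terms))


# Made-up pages standing in for a box of scanned paper.
pages = {
    "box1-p1": "Competitor marketing collateral: search that scales",
    "box1-p2": "Invoice for scanner maintenance",
}
index = build_index(pages)
print(search(index, "marketing collateral"))
```

The fiddly parts the write up mentions (OCR errors, controlled vocabularies, human cleanup) are exactly what this sketch leaves out, which is where the costs pile up.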
Why’s this important?
First, the history shows that end-to-end processing is not new. As with many of the hot search innovations, I find the discoveries of the azure chip crowd a “been there, done that” experience. Processing paper and making it searchable is a basic way to approach certain persistent problems.
Second, the synopsis of the Excalibur trajectory makes clear that senior managers of search and content processing companies scramble, following well worn paths. The constant repositioning and restating of what a technology allegedly does is a characteristic of search and content processing.
Third, the shifts and jolts in the path of the Excalibur / Convera entity are predictable. The template is:
- Start with a problem
- Integrate
- Sell
- Engineer fixes on the fly
- Fail
- Identify a new problem
- Rinse, repeat.
What has popped out of my Overflight intel system is that law firms are now looking for a solution to a persistent information problem; that is, when a legal matter fires up, most search systems work just fine with content in electronic form. The hitch is that a great deal of paper is produced. If something exists in digital form and one law firm must provide that information to another law firm, some law firms convert the digital information to paper, slap on a code, and have FedEx deliver boxes of paper. The law firm receiving this paper no longer has the luxury of paying minions to grind through the paper. The new spin on the problem is that the law firm’s information technology people want to buy a hardware-software combination that allows a box of paper to be put in one end, with the steps between the hard copy and the searchable, electronic instance of the documents magically completed.
Well, that’s the idea. Some of the arabesques that vendors slap on this quite difficult problem include:
- Audit records so a law firm knows who looked at what when and for how long
- A billing method. Law firms want to do invoices, of course
- A single point solution so there is “one throat to choke”.
What the companies want is what Excalibur asserted it had almost 20 years ago.
ZyLAB, under the firm hand of Johann Scholtes (a former Dutch naval officer), has made inroads in this market sector. You can read an interview with him in the Search Wizards Speak series, so I won’t recycle that information in this write up.
Autonomy was quick to move to build out its end-to-end solutions for law firms and other clients with a paper and digital content problem. In fact, Autonomy just received an award for its end-to-end eDiscovery platform.
Brainware offers a similar system. That company, a couple of years ago, told me that it had to add staff to handle the demand for its scanning and search solution. Among the firm’s largest customers were law firms and, not surprisingly, the Federal government. You can read an interview with a Brainware executive (who is an attorney) in the Search Wizards Speak series.
I learned that Recommind has inked a deal with Daeja Image Systems for its various document processing software components. The idea is to be able to provide an end-to-end solution to law firms, government agencies, and other outfits that need a system that provides access to paper based content and digital content.
Let’s step back.
What this addled goose sees in these recent announcements is that the “new” is little more than a rediscovery that law firms have not yet cracked the back of the paper to digital job and been able to get a search system that provides access to the source material. Sure, there were solutions 20 years ago, but those solutions don’t meet a continuing need. Notice that this problem has been around for a long time, and I don’t think the present crop of solutions will solve the problem fully.
Search Merging with CMS
January 13, 2010
When you have a CMS “hammer”, you have the opportunity to see an information problem as something that can be pounded with CMS. Let me be upfront. Most organizations are not in the information business. The idea that Big O’s tires in Kentucky is an information company is not just silly; it’s a financially imprudent assertion. Big O’s is a retail operation that sells tires and services. The company’s Web site is a marketing effort, but when you need tires for your Hummer with a gun mount, you have to haul on over to the closest Big O’s, pony up cash, and get your tires mounted, balanced, and bolted on. Sure, information is important to the Big O operation, but like many other businesses, Big O’s moves tires. Information is an enabler, sort of a digital lubricant. A person dressed up in a Daniel Boone outfit holding a sign that says, “50% off Tires. Today only.” is information. But the pointy end of the business is selling tires.
Just hop right into the CMS tanning bed. It will make you look and feel great. Oh, there may be some risks, but what’s more important? Looking great or becoming a human Blutwurst?
When I read CMS Wire’s short article “MySource Matrix” I was surprised that search is becoming part of CMS. Yikes. CMS, content management systems, refers to a bunch of software components that perform integrated content operations for Web sites. There are document management systems that help nuclear power plants keep track of engineering change orders. And there are really expensive enterprise publishing systems from Hewlett Packard and StreamServe that manage and output certain types of enterprise information. I grant that when you can’t find a document, you can’t do much with any of these systems. So, search is a utility. Search in any of these three types of content systems often is not particularly good. Vendors license “stubs” and stick them in CMS and related systems so that when more features are needed, the vendors can turn on the taxi meter. Software cannot put an editorial sense into an organization. Humans have to do that, and humans often are not able to perceive the problem or its optimal solution when basking in the vendor’s tanning salon.
Here’s the passage from Squiz that caught my attention:
They’ve [Squiz, Funnelback, and MySource Matrix] chosen this direction because they see the lines between CMS and search blurring, where some projects may need search-based vertical applications rather than starting with a separate CMS and search library. According to Morgan [Squiz executive], this approach will reduce integration costs and increase access to data across an organization.
Note: Squiz owns the Funnelback search system. You can see this in action on the Australian Resource Centre for Healthcare Innovation or ARCHI.
Most CMS, DMS, and enterprise publishing systems are complicated beasties. Each has a contribution to make to certain organizations, but the path to a functioning, easy to maintain content system can be a long, difficult one. In my experience, CMS means managing a Web site. CMS has been stretched into DMS territory, and some of the vendors with the biggest marketing horn have floundered and ended up chum for the M&A crowd. The document management systems that focus on a specific content purpose like the aforementioned ECOs work well, but one needs to have a records management specialist handy. The enterprise publishing systems are not widely known outside of certain market sectors. These cost a lot of money and suffer from one fatal flaw in my opinion. Most lack an information infrastructure service or foundation. No foundation, the structure built on it is dicey.
This notion of having everything in one place so anyone can edit, repurpose, and search is a great idea. Today, the cost of achieving that utopia can be high, both in time and money.
I can see the direction this marketing angle will lead. Thank goodness I am old and won’t have to deal with the wackiness these big marketing ideas unleash on cash strapped organizations struggling to keep their systems from breaking the bank each time those systems crash. There’s a lot of opportunity in content, but fuzzy thinking may not be what Boards of Directors and CFOs want.
Stephen E Arnold, January 13, 2010
I want to disclose to the Office of Management and Budget that I was not paid to point out the financial issues of fuzzy thinking. I bet this article was a surprise to them. Don’t Federal content and document management systems work like spinning tops?
Competitive Intel about Google
January 13, 2010
If you are interested in what people say about Google, you will want to become a user of Aqute Intelligence: Google. In addition to being quite helpful, the service is offered without charge and does not have any annoying features. You can scan a list of Aqute’s favorite items. I found the round up of links to the Nexus One an easy way to follow the customer support issues related to the device. One feature that is unique is “Google Employees in the News”. You can see the information for the period from December 21, 2009, to January 4, 2010. I find that my work in Google patent applications often requires a quick check to determine if the Google inventor is still in the Google engineering line up. One recent example was a patent document with Anna Patterson’s name. Dr. Patterson founded the Cuil.com system, and I needed to see if she had surfaced as a Google employee since her departure. A happy quack to the Aqute team.
Stephen E Arnold, January 13, 2010
Nope, no one paid me to write this. I would like to suggest I did it out of the goodness of my heart, but this is a marketing and sales blog. I will report it to the National Cancer Institute anyway.
Google, the Nation State: What Does the Flag Look Like?
January 12, 2010
Short honk: The Google is deleting China. The German Justice Minister figures out that Google is global. The French want to tax the Googlers. My view is that the GOOG is one heck of a country. Yep, country, not a company. Maybe the algorithm crowd will get a seat at the UN?
Stephen E Arnold, January 13, 2010
A freebie. Gee, I don’t know to whom to report this. Maybe the United Nations?
Free Complex Products: Sign of Revenue Starvation?
January 12, 2010
Here is a hypothetical. You are sitting in the airport. A young woman sits next to you and examines the engineering diagram for a CPU with 1,000 cores. You ask her, “What’s that?” She says, “The basic building block of Google’s hyper grid architecture.” You look at her, glance at the diagram, and wonder what the heck she just explained to you.
That’s information without context. You don’t know what you looked at. You don’t know what a hyper grid is. You don’t know why Google needs an architecture. In short, you get a peek into Messrs. Brin and Page’s techno-think, and it’s worth bupkis.
I learned from one of my two or three readers that Dialog Information Services was giving away free online searching for a couple of months. That is big news to me. ProQuest, a commercial database publisher, bought Dialog from Thomson Reuters and aims to make it pay off big time.
The story “Dialog Offers Free Searching of Selected Cengage Files Through March 30th” appeared in a publication called the Resource Shelf. Aimed at info pros, the Resource Shelf covers some of the machinations in the world of former online superstars like LexisNexis and the aforementioned Dialog Information Services. (If you resonate with ss cc=7600 AND ESOP AND UD=9999, you know this service is not what it once was. Today’s online searcher has little interest in where data originates, editorial policies, and command line searching on systems written in part in PL/1 and running on big honking machines.)
“Cengage” refers to a spin out of Thomson Reuters. Yep, the same Thomson Reuters that sold Dialog to ProQuest. If you are like me, it seems that Thomson Reuters got out of the commercial online business and now its former kissing cousins are teaming up to pump up usage of these commercial databases.
Most online users today don’t think too much about paying $0.25 to print a title while paying connect time to outfits like Tymnet. Lawyers, at least some lawyers, still do this. Patent researchers still fork over big money to look at special databases containing publicly accessible patent data. Certain chemists love—absolutely love—searching for chemical structures using the Chem Abs super service.
The key point in the Resource Shelf’s write up was, in my opinion, this segment:
“During this time all DialUnits, Connect Time, and Alert Profile charges will be waived to allow customers to search these files and create and run Alerts profiles at no charge. Output pricing such as full formats and Alert prints will be charged at current rates.”
What this means is that “free” applies to the part of the service that does not generate the big bucks. What makes money for commercial online services? Online types (looking at a record in electronic form) and offline prints (getting a hard copy of the search results). Note that if your query returns zero useful results, the traditional customer friendly approach is to charge for the privilege of finding out that there was no useful information in the database you queried. Lawyers love this sort of “zero results is good information” thinking. Most people don’t.
In fact, the commercial online services have to get out of their historical approach to online and find a way to attract new users and generate meaningful new revenue. Here’s why:
- Google is good enough and free.
- Commercial database publishers are in a market squeeze arguably more problematic than the mostly inept content management system vendors find themselves. (CMS doesn’t work too well either in my opinion.)
- Enterprise software vendors are putting code shims in place to provide certain high value information within other enterprise applications. Data.gov might be immature, but it sure is suggestive.
- The costs of running an outdated infrastructure with data that change less frequently than I paint the log cabin in which I live don’t match the real time demands of 20 somethings.
- The expense of creating a commercial database is creeping up. The mom and pop shops cannot compete with the more sophisticated operators like Ebsco, one of the big dogs in the high value information business.
Add this up and what do you get? Not too much. I think it is a marketing play that communicates that both Dialog and Cengage may be grasping at straws. Why? Go back to the hypothetical with which I started this write up. Most people won’t know what they are looking at when they run queries against these databases:
80 Aerospace/Defense Markets & Technology
13 Business & Management Practices
88 Business A.R.T.S.SM
479 Company Intelligence
275 Computer Database
18 F&S Index
583 Globalbase
149 Health & Wellness Database
150 Legal Resource Index™
47 Magazine Database
75 Management Contents
570 Marketing & Advertising Reference Service
111 National Newspaper Index
211 Newsearch
649 Newswire ASAP
160 PROMT (1972-1989)
148 Trade & Industry Database
93 TableBase
So what’s the number? That’s the Dialog file number, which you enter with the command “b 160”. See what I mean? Giving away free information without context means zero, maybe less than zero because a potential customer may be turned off when paying for outputs.
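For readers who never sat at a Dialog prompt, the number-to-file relationship can be sketched as a simple lookup. The file numbers and names come straight from the list above (minus the trademark marks); the dictionary and function names are my own, and the only Dialog syntax assumed is the “b” (begin) command quoted in this write up.

```python
# File numbers and names copied from the list of free Cengage files above.
DIALOG_FILES = {
    80: "Aerospace/Defense Markets & Technology",
    13: "Business & Management Practices",
    88: "Business A.R.T.S.",
    479: "Company Intelligence",
    275: "Computer Database",
    18: "F&S Index",
    583: "Globalbase",
    149: "Health & Wellness Database",
    150: "Legal Resource Index",
    47: "Magazine Database",
    75: "Management Contents",
    570: "Marketing & Advertising Reference Service",
    111: "National Newspaper Index",
    211: "Newsearch",
    649: "Newswire ASAP",
    160: "PROMT (1972-1989)",
    148: "Trade & Industry Database",
    93: "TableBase",
}


def begin_command(file_number: int) -> str:
    """Build the command that opens a file, e.g. 'b 160' for PROMT."""
    if file_number not in DIALOG_FILES:
        raise KeyError(f"File {file_number} is not in the free-searching list")
    return f"b {file_number}"


def describe(file_number: int) -> str:
    """Pair the command with the file name a newcomer would need to know."""
    return f"{begin_command(file_number)}  ->  {DIALOG_FILES[file_number]}"


print(describe(160))
```

The lookup gives a newcomer the name behind the number. It does not tell that newcomer what is in the file or why it matters, and that missing context is the point.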
I don’t go to the real “library” any more. I use my own online files and free online resources. That’s the problem. When someone from the commercial database world does not use his own products, something has changed. Free won’t bring back the bad old days of traditional online services.
The field of battle has shifted. The free offer underscores how much of a gap exists between the former giants of online and the new services from other vendors. There’s no 1,000 processor architecture available that can alter a business model that worked 30 years ago.
Stephen E Arnold, January 12, 2010
I state that I was not paid to point out that “free” comes with a tiny footnote that explains the not free part of this Dialog offer. I shall report this pitiable excuse for a non-compensated write up to the US Postal Service, another traditional institution hit by new technology and changing user behavior. Where has all the junk mail gone?
Google Compromises When Necessary
January 12, 2010
Which is easier: ask for permission or apologize for an action? The article “China Writers Say Google Ready to Settle Book Row” suggests that the apologize tactic is operative. Competitors and governments find themselves reacting to Google. Unless there is a good reason, Google will just keep being Google. The Google book story is even more interesting. The Android open source software has found its way into the Barnes & Noble Nook. Spring Designs allegedly will roll out a $350 eReader running Android in February 2010. My take is that the Google wants to keep on scanning and will apologize only when its feet are held to the heat of a country’s legal furnace. Those Google watchers who are tracking the phone may want to keep their eyes peeled for other Android stickiness. Oh, one more thing: Google does not seem to go with the ask for permission approach.
Stephen E Arnold, January 12, 2010
Oyez, oyez. I wish I could reveal that I received a bag of goose feed for this short write up. Alas, alas, cried the goose. I will report this to the Occupational Safety & Health Administration (OSHA), which wants a hale and hearty goose.
Copyright and the Generation Angle
January 12, 2010
Short honk: If you are into the copyright battles now underway, you may find “The Copyright Bubble” interesting. At a minimum, it makes clear that a demographic blip is part of the problem. The thought I had was that parents of copyright ignorers may want to take another run at changing their children’s behaviors. If a child, now 25, embraces the “copyright bad” mentality, perhaps more aggressive parental action is needed. The implication of the write up is that when old folks head to the traditional heaven where the Internet does not work, copyright will be okay. Failing that, the author of “The Copyright Bubble” may be correct.
Stephen E Arnold, January 12, 2010
I know the disclosure ruling became effective on December 1, 2009. It is the new year and I still am writing this baloney for no money. I think I will reveal this fact to the Marine Mammal Commission.