Chemse: Not Chem Abs but Useful and Free
January 12, 2010
Most of the pundits ignore the real world search problems. Ternary phase diagrams, math recipes, and chemical information to name three. But what you don’t know makes it easy to point out that content management is a great business and search is really simple. Dorkiness from pundits aside, the real world tackles information retrieval in subject areas the search poobahs rarely trek. For those interested in chemicals, chemical suppliers, and chemical industry news, check out Chemse, a free chemical search engine. About Us reported:
Our vision is the creation of a 100% transparence over the chemical market. Therefore we want to be an established partner for your daily business activities. We achieve this with sharing our experience, strong network, willingness for continuous improvement, a good team and with a steady improvement of our existing and new information.
A search for calcium permanganate returned vendors and this surprise:
Yes, a chemical structure. That’s a trick that the Google has not yet added to their online service.
The site delivers a useful service. I can locate a seller of the chemical, scan news, and register. Registration is free and delivers some additional services:
- Search history
- Predefine one’s inquiry text
- Keep an inquiry history
- See mail addresses for certain entities
- Manage inquiries sent to a vendor.
More details as I locate them.
Stephen E Arnold, January 12, 2010
A freebie. I think I will report this sad fact to the National Nuclear Security Administration.
Another Stab at the Cost of Finding Documents
January 12, 2010
Some folks watch a few professionals flounder when looking for information. Others guesstimate how much time is required to locate a needed document. Fresh Business Thinking quotes a wizard, offering factoids like:
SMEs spend approx. 3 months a year looking for documents. (SME is a small-sized or mid-sized organization)
87% of respondents spend up to 2 hours every day looking for documents (on average, one hour of a person’s time is worth £86.61 so that’s £173.22 per person per day wasted across the UK!) (This is the easy route to a cost estimate and probably not a number to take to the bank.)
93% of people surveyed think they waste time looking for documents every day (I want to meet the other seven percent and find out their methods)
On average 46.93% of documents handled by SMEs are still paper based which is incredibly dangerous should there be a fire or flood. (Paper equals danger. I prefer “risk”)
I don’t doubt these figures, but it would have been helpful to get a bit more information about the size of the sample.
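The arithmetic behind the quoted cost figure is easy to check. Here is a minimal sketch in Python that uses only the numbers in the quote, plus one assumption of mine: an eight-hour working day. The function names are made up for illustration.

```python
# Back-of-envelope check of the quoted figures.
HOURLY_RATE_GBP = 86.61   # quoted value of one hour of a person's time
HOURS_PER_DAY = 2         # quoted upper bound on daily search time
WORKDAY_HOURS = 8         # assumption: an eight-hour working day


def daily_search_cost(hours: float, rate: float) -> float:
    """Cost of the time spent looking for documents in one day."""
    return round(hours * rate, 2)


def share_of_workday(hours: float, workday: float) -> float:
    """Fraction of the working day spent searching."""
    return hours / workday


cost = daily_search_cost(HOURS_PER_DAY, HOURLY_RATE_GBP)
share = share_of_workday(HOURS_PER_DAY, WORKDAY_HOURS)
months = share * 12  # the same fraction expressed as months per year

print(f"£{cost:.2f} per person per day")
print(f"{share:.0%} of the day, or about {months:.0f} months a year")
```

Under the eight-hour assumption, the two-hour figure and the three-months-a-year figure describe the same 25 percent. The sketch says nothing about whether the hourly rate or the survey sample is sound.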
The thought these figures triggered was, “No wonder there is such high dissatisfaction with enterprise search systems.” If I spent 25 percent of my time hunting, I would have less time for thinking. We had to locate a single file last used in 1998. It took 30 minutes, which included snarfing through storage devices that had to be reconnected. Search systems have to meet business user needs. The goslings and I are lucky; we have the pick of the litter when it comes to search systems. In fact, that is the secret: multiple tools indexing the same corpus. You would be surprised by the difference in search results across systems, each indexing the same corpus. I know I was when I discovered this a decade ago.
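One way to put a number on the difference between systems is to run the same query on each and measure how much the top results overlap. The sketch below is illustrative only. The document IDs and both result lists are made up, and Jaccard overlap is my choice of measure, not a method used by any particular vendor.

```python
def overlap_at_k(results_a, results_b, k=10):
    """Jaccard overlap of the top-k document IDs returned by two systems."""
    top_a, top_b = set(results_a[:k]), set(results_b[:k])
    if not top_a and not top_b:
        return 1.0  # two empty result lists agree trivially
    return len(top_a & top_b) / len(top_a | top_b)


def only_in_first(results_a, results_b, k=10):
    """Top-k documents from the first system that the second missed, in rank order."""
    top_b = set(results_b[:k])
    return [doc for doc in results_a[:k] if doc not in top_b]


# Hypothetical result lists for one query against the same corpus.
system_a = ["doc-17", "doc-03", "doc-42", "doc-08", "doc-21"]
system_b = ["doc-03", "doc-99", "doc-17", "doc-55", "doc-60"]

print(overlap_at_k(system_a, system_b, k=5))
print(only_in_first(system_a, system_b, k=5))
```

A low overlap does not say which system is right. It says that a user relying on one tool never sees what the other surfaced.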
But most organizations, or at least those in this sample, could not find the sand in the Mojave Desert. Are the vendors at fault? The procurement teams? The individual users? My thought is that each group shares responsibility for the waste that finding imposes on organizations and individual users of search systems.
Change in 2010? Nope.
Stephen E Arnold, January 12, 2010
I disclose to the General Services Administration’s purchasing group that I was not paid to write this article.
Demand Media in the Hot Seat
January 12, 2010
Bulk content producers are the latest next generation content business that irritates the poobahs in “real” publishing companies. The DNA of Demand Media marries MySpace.com with writers hungry to get their name in the Google index and earn some cash achieving this goal. That’s a potent combination. I must admit I don’t see much to raise my feathers. The database business is built along similar lines. Demand Media adds one or two twists; namely:
- Writers create content, post it to Demand Media’s Pluck system, index the story, and get paid. No annoying human editors slow down the process.
- Media companies buy these stories, slap their label on them, and sell them in other publishing entities. Demand Media lists some of the media superstars who find a cheap source of good enough content ideal for their readers and the CFO’s blood pressure.
- New investors include some media savvy outfits; that is, smart money from Goldman Sachs, 3i Group, Generation Partners, Oak Investment Partners and Spectrum Equity Investors.
You can see a list of some of the Demand Media customers in the graphics on the Demand Media Web site. USA Today seems to be a customer of some Demand Media services.
When I read “Demand Media May Be Bad for Social Media but Not for Journalism”, I was surprised that the criticism of a company producing content would extend to social media. Here’s the passage that caught my attention:
I believe Demand Media is more of a threat to social media communications than it is to journalism and journalistic standards because of the kind of content it provides and what it does by providing search optimized content for corporate sites and evergreen content for the news industry.
With SEO undergoing seismic shocks, content has become a hot commodity for some Webmasters. Most companies can’t write too well or quickly. Demand Media has a method that generates several thousand stories a day. The company allegedly optimizes the content, although I think this type of numerical recipe is like the Macbeth witches’ brew. A great stage effect. A site with zero content can obtain Demand Media content. The inclusion of content might be just what the PageRank doctor ordered. Who knows?
Demand Media is not a “real” publisher. The coinage of the term “content farm” was a stroke of genius because it made it possible to discuss blogs, semi real publishing, and real publishing in a more interesting way. The idea is that a blog may be written by a few people but we know that blogs are not “real journalism”, according to some experts. The “real publishing company” engages in a centuries old tradition of picking content, paying a writer to write to the needs of the market as the publishing company sees them. This top down approach is where the notion of “knowing best” and “quality writing” emanates. The “content farm” is some weird beast that is worse than a blog and concerned less about “quality” than a traditional publishing company.
Baloney. The “content farm” is an information factory just like units of the Bureau of National Affairs, LexisNexis, West Publishing, and most commercial database companies. But “content farm” is way more suggestive than an “information factory”. What’s happening is that a phrase is making it possible to have a discussion about a business method that has been around a long, long time.
What’s happening is that certain high volume content production methods are being applied to different markets and at a price point that is appealing to some customers and to Webmasters who need a way to get traffic to a Web site.
Lots of talk over a business method that has been around for decades in electronic information. This is one more example of folks not knowing what they don’t know. In this case, the chatter is interesting but unlikely to halt the shift from top down content generation methods to alternatives.
The question I am considering: What happens when Google deploys its automated content generation technologies? I am looking forward to another big argument because when Google moves it will be too late to do anything about smart software that can write news and other types of documents with zero humans. None. How about that for low cost production? Demand Media pays humans and Google won’t have to. That will be something to see unfold.
Stephen E Arnold, January 12, 2010
A freebie. Thank goodness there seems to be an endless supply of regulatory authorities concerned about who pays me to write a blog article. Today my boss is the Department of Agriculture, a fine group.
SharePoint Sunday: Microsoft Fast Tuning Document
January 11, 2010
Microsoft has published in the XPS format “Optimize Search Relevance with Microsoft FAST Search Server 2010 for SharePoint (Beta)”. If you don’t have the XPS viewer installed, you will have to download that Adobe Acrobat inspired program from Microsoft.com/downloads. If you can’t locate the file, you can use Google to bing you to the appropriate link. (I had to fiddle a bit to get the XPS file to render because the download for XP would not run. My Windows 7 machine was more compliant.)
What interested me about this document is that it addresses relevance performance issues for the new search system. Even more interesting is that Fast Search & Transfer has been licensing software to the enterprise since 2004. The performance challenges of ESP have become familiar friends to some administrators. The fact that a 23-page technical white paper is required * before * the actual product ships furrowed my brow.
I know that many Certified Microsoft Professionals salivate when these types of opportunities arise. I don’t think that many Chief Financial Officers will be as gleeful. Money is required to address certain search performance issues. The Fast ESP system, like other complex, older search architectures, poses some significant hurdles for the CFO who wants engineers to sprint to finish a search optimization job and the related bits of craft necessary to tune content processing, index refreshes, and the burden of certain content that must be crunched as soon as the info arrives.
The document reminds me that I cannot get too frisky when thinking about or using the information in the white paper. I will do my best to keep my friskiness under control. I think you should read the document and let it speak for itself.
Let me highlight some of the tips that caught my attention. First, the white paper makes clear that one may want to “prevent indexing irrelevant SharePoint content.” That’s interesting advice. The user does not know what he or she needs. A search that is selective or winnows down what is in the index therefore means that a user may not find what he or she seeks, may not have a complete set of pertinent information, or will have to supplement an online search with the old fashioned, real expensive approach: the Easter egg hunt. The trend in my experience is to index as much information as is available. Exclusion of information is tricky, and I for one don’t want to explain that the irrelevant content I did not process is exactly what was needed to close a big deal or get a fact verified. I can apply the same concern to the statement “Encourage archiving and deleting old content.” I know that old content may not be frequently accessed. But if the needed information is “old”, should the user be denied knowledge that germane content is indeed available?
The paper then shifts to running the Fast Search connector. A connector is a code shim that hooks content to the content processing system. My son’s company, Adhere Software, is in the connector business, and my understanding is that multiple connectors are required because organizations have diverse content types. But if there is one Fast Search connector, one has to tune it. The settings strike to the heart of the performance of a content processing system. The idea is not to crawl for new or updated content frequently. This is at odds with some companies’ desire to have the most timely information in the search system. If you get the crawl wrong, in my experience the users email, asking, “Where is that document?” One of the flaws in enterprise search is that the basic content acquisition method is at odds with the expectations of the users. Exalead, for example, delivers content within a 12 to 15 minute window. The other vendors struggle to match this timeliness. I don’t think the “new” version of Fast Search will be much different. Exalead and a couple of other outfits are the present speed champs in the enterprise. The code base in Exalead is “newer” than that in Fast Search, which dates from the late 1990s.
Relevance is a tricky topic. In order to generate results that are useful to a user, enterprise systems require more care and feeding than some vendors reveal during the run up to contract signing. The section “Tune relevance in Fast Search for SharePoint” makes clear that term lists and manual promotion or demotion of certain information are needed. Hit boosting is important and today’s boosted document may be tomorrow’s demoted document. The hands on part of a search system like Fast Search is often a matter of trial and error. An unexpected result can create quite a bit of excitement. This type of tuning is expensive. The dependencies within the Fast Search system often create a need to revise the changes. The tuning segment is several pages long, and I suggest you read it, considering the cost implications of the recommendations that you as the search manager will have to carry out yourself or hire specialists to perform. Plan to spend quite a bit of time with the staging server implementation of Fast Search. Making relevancy changes while the plane is in the air can be exciting. I find drill level setting particularly enervating.
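To make manual promotion and demotion concrete, here is a generic sketch of hit boosting in Python. It is not the Fast Search API and not the method in the white paper. The document IDs, scores, and multipliers are hypothetical, and the approach of multiplying each score by a hand-set factor is my assumption about the simplest way such a control could work.

```python
def apply_boosts(hits, boosts):
    """Re-rank (doc_id, score) hits using manual multipliers.

    A multiplier above 1.0 promotes a document; one below 1.0 demotes it.
    Documents without an entry keep their original score.
    """
    adjusted = [(doc_id, score * boosts.get(doc_id, 1.0)) for doc_id, score in hits]
    return sorted(adjusted, key=lambda hit: hit[1], reverse=True)


# Hypothetical engine output for one query, best first.
hits = [("policy-2008", 0.90), ("policy-2010", 0.80), ("memo-old", 0.70)]

# Hypothetical tuning decisions made on a staging server.
boosts = {"policy-2010": 1.5, "policy-2008": 0.5}

for doc_id, score in apply_boosts(hits, boosts):
    print(doc_id, round(score, 2))
```

Even in this toy case the demoted document drops below one that nobody touched. Side effects of that kind are why each change has to be rechecked against real queries, and rechecking is where the tuning hours go.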
The linguistic relevance tuning is an important exercise. The idea is that some of the nifty features like suggested content and forgiveness for poor spelling require some manual experimentation and adjustment. I don’t know too many SharePoint administrators who have a deep linguistic and semantic background. I know I don’t, and we do this stuff for a living. I rely on specialists, but these can be tough to find. A fiddly mistake can wreak havoc with the search system itself.
The white paper devotes a page and a half to custom search applications. As you might expect, quite a bit of useful detail was excluded to make the custom search application fit in a page and a half. The inclusion of the topic revealed to me that Microsoft is making an effort to minimize the complexity of creating a useful search enabled application. I know from many years in the search field that few SharePoint administrators possess the expertise to handle the challenges a search based application presents. Some SharePoint administrators have confidence in their ability. This confidence can undergo some alteration when a simple job becomes a death march that seems to have no end. If one does not know what one does not know, the search based application will send that person to boot camp quickly.
Several observations:
- Microsoft is going to give SharePoint licensees the idea that tuning Fast Search is not a big deal. I think that Fast Search tuning will become a big deal and very quickly. The complexity of Fast Search cannot be minimized or swept under the rug no matter how many azure chip consultants assert that life is simple. Not in Fast Search land I opine.
- The white paper is a shopping list of tasks. The code samples are helpful, but the guidance is not deep. This means that a SharePoint administrator following the steps outlined in this white paper will find himself or herself with quite a technical challenge. What makes life exciting is that the dependencies within and among the Fast Search “control knobs” are not documented. Fast Search is not an iPhone app that one can master in a matter of minutes. Tuning Fast Search is challenging technical work.
- The omission of references to third party experts who will be needed to handle certain tasks such as the controlled terms operations sets the stage for a big surprise. Linguistic components are not intuitive, and the work requires experts who know how to set up term lists that deliver meaningful results to the user.
My view is that Microsoft is beginning to understand what a challenge Fast Search will be to the average SharePoint administrator. Enterprise search appliance and purpose built search systems vendors will see the broad deployment of Fast Search as the best marketing for their competitive products.
Pretty exciting stuff.
Stephen E. Arnold, January 11, 2010
No one paid me to write this, but I believe that if I were actually a salesperson I could have found a couple of outfits eager to have me slap their SharePoint search solution on the blog page displaying this essay. Also, no dough. I will report this to the administrator of Walter Reed Hospital. My idea is that those under stress may seek remediation at that fine facility.
The Economy Fosters Interesting Hook Ups: Open Text and Oracle
January 11, 2010
A happy quack to the article on Fierce Content Management with a title guaranteed to catch my attention: “Open Text Releases Oracle Application Compatibility Tools.” Open Text is a search and SGML indexing company that has morphed over the years. The company’s approach to growing revenues may have inspired Autonomy in the acquisition approach to revenue growth. The idea was that Open Text bought product lines and companies. With each purchase, Open Text grew larger. I have lost track of the company through my own lack of interest in roll ups, but at one time Open Text offered several search systems (BASIS, BRS/Search, Fulcrum, the original SGML search features, and probably others). In addition, Open Text had its own collaboration and content management system called LiveLink and then added the RedDot system to complement the firm’s purchase of Vignette. I try to steer clear of content management because organizations want software to generate gold from lead. Publishing companies have a tough time generating content, and in my experience, an engineering firm or a construction company expects software to create golden prose. The reality is that systems like BASIS for structured data search and report generation are complex beasties. Information Dimensions had trouble getting enough dough to invest in the BASIS system. A firm like Open Text appears to have no problems garnering sufficient money and technical talent to maintain not one search system but three or four. Furthermore Open Text must have enough cash left over to deal with the upgrades and bug fixes required by the RedDot and Vignette systems plus the other technology Open Text has acquired as it follows the roll up trail.
Oracle, on the other hand, is a database company that has systems that compete directly with Open Text’s. The Oracle database performs search, content management, and related tricks. In addition, Oracle created a Fusion product line that reduces the complexity of hooking a third party application into an Oracle system. Oracle has a school bus of acquisitions as well. These range from the text processing system from Triple Hop, its own search system which has a low, low profile these days, to the Siebel CRM products and the even more fascinating Oracle Social CRM offerings. Database management systems have been the foundation of the Oracle revenue, and like Google, the company has not been able to diversify its revenues from one flagship product in my opinion. Oracle has tried with the nCube and now with the Sun Microsystems’ play.
The fact that these two outfits with similar growth strategies are teaming up is one indication of how the enterprise software market has changed. Today competitors hook up. Even more interesting is that in the story about the application compatibility tools this passage appears:
Rich Buchheim, Vice President of Oracle Solutions at Open Text, says the fact that Open Text works together with Oracle (which itself has a content management solution), is not contradictory at all, and that his company works with several major vendors in this fashion. “Enterprises today are inherently heterogeneous. Open Text provides better access to content from across the enterprise. As the largest independent provider of ECM, Open Text has the significant advantage of being able to provide tight integration with all the leading enterprise platforms including Oracle, SAP and Microsoft.”
That may be true but when the pie is getting smaller, corporate rivals are often reluctant to share available revenue. The notion of “lock in” and control of the customer are two primal drives in the enterprise software world. My view of this deal is that it is not much news. The Content Connector is now in version 3.0, so this is not a new product. Furthermore, the Fusion technology that Oracle pushed has suffered from staff adjustments, and there are a number of options available to make the sale more challenging.
With the world of enterprise software in flux, I wonder if both of these companies will be able to pump up their profits in 2010. I keep visualizing Oracle executives sharing the six inch mini cherry pie with Open Text and vice versa. Who will team up next? IBM and Microsoft, Apple and Google? Who knows. The economic environment is causing some interesting behavior.
Stephen E Arnold, January 10, 2010
Oyez, oyez, this is a freebie. I was not able to get anyone including my dog Tess to show interest in the machinations of two companies not known as fleet of foot. Ah, “fleet”. I must report this lack of payment to the US Coast Guard, happily housed in DHS, not the US Navy. Another interesting tie up.
Google User Monitoring Described
January 11, 2010
I steer clear of explaining the methods Google uses to track user behavior. Whenever I mention noticing hover time, I get weird looks. “How Google Collects Data about You and the Internet” provides an accessible summary of the basics of Google monitoring technology. Most people don’t care much about Google’s monitoring of systems, advertisers, and such esoterica as context tags. For me the most interesting comment in the write up was:
An interesting observation when using these tools is that in many cases information can be found for everything except for Google’s own products. For example, Ad Planner and Trends for Websites don’t show site statistics for Google sites, but you can find information about any other sites.
The reason this passage struck me as useful is that it points out that Google presents a view of data that the company selects. The notion of managing information is a useful one to keep in mind when one reads about Google in the writings of poobahs, pundits and mavens who observe Google the way I learned about the Venus fly trap in biology.
Stephen E. Arnold, January 10, 2010
A freebie. I shall report this to the National Regional Research Laboratory manager in Peoria, Illinois, where the soy bean was converted from food into latex paint components.
The Skiff May Not Be a Magazine Industry Life Preserver
January 11, 2010
TechCrunch’s “Can the Skiff Save the Magazine Industry?” strikes a good balance between hope and reality. Media giant Hearst, like other publishing outfits, is looking for a digital alternative to paper. If you are going to float an industry, you need one heck of a life preserver. TechCrunch said:
Skiff isn’t in this game to make hardware. They’re in it to save their industry from imminent demise. Magazines, as they exist now, are expensive artifacts of an industrial process that has been refined over the past century. They are eye-catching pieces of typographic art and they contain some of the best writing of any generation in a package that appears on a monthly basis and is sold or mailed to readers in paper format. But – and here’s where Skiff comes in – there is no perceived value in print magazines anymore. They are expensive to produce and circulation is falling drastically, resulting in a panic in the industry. The solution? Subsidized ereaders and reading services that will keep the subscription model from failing. The Skiff is the first of these efforts and, if first mover advantage remains true, they may be the winner.
Maybe. But the problems of the magazine industry were evident when Bill Ziff, a pretty sharp magazine guy, decided that magazines were not a good business in the early 1990s. An e reader is not going to reverse what seems to be an irreversible decline. TechCrunch identifies the obvious problems: the environmental costs of paper, declining circulation, and advertisers who want more than a 32 page publication mailed to a person like me who rarely looks at magazines any more.
I would add to the TechCrunch comments one point that few like to talk about; namely, the killers of the magazine industry are the children of publishers, lawyers, and other white collar types. Magazines fail to enchant the teens in my dentist’s waiting room. Last visit, the three teens waiting for a check up ignored the pile of magazines with big names like National Geographic and Motor Trend. The teens were thumbing and finger tapping in tune with a digital world that has little interest in magazines.
My hunch is that a magazine on a dedicated reading device won’t create too many new magazine readers. A percentage of the upper income, 40 somethings will buy a reading device. What the magazine needs is readers, not gizmos. The problem is that the children of upper class, white collar workers are not following in their parents’ footsteps. My son (University of Virginia) and wife (Duke Law) subscribe to the New Yorker. I do too. I like the cartoons and some of the essays but not as many as I did when I was younger. I can’t figure out some of the ads either. The last time my son and I were in an airport together we skipped the news stand. Neither of us buys magazines while traveling. The offerings are expensive, out of date, and an extra thing to carry.
When I commuted from Louisville to New York and San Francisco, I used to load up on the magazines that looked interesting. Now the cover stories don’t have much magnetic appeal.
Maybe government subsidies will work? Maybe magazines can be operated by a charity? Maybe magazines can be owned by Warren Buffett who funds them out of a sense of social significance? Maybe governments will tax Google and use the money to keep ink-on-paper publishing afloat?
The Kindle clones won’t have the buoyancy the magazine industry needs and needs quickly.
Stephen E Arnold, January 11, 2010
Yep, another freebie. I suppose I need to report this to the US Department of Navy, an outfit with some skill in keeping giant things afloat.
The Open Approach at Google Allegedly Baffles Microsoft
January 11, 2010
I read the Motley Fool’s “Why Microsoft Is Wrong About Google” and found the write up pretty good. I had a couple of conversations today that reminded me how easy it is for some very informed people to misread Google smoke signals. The messages Google sends are clear, content rich, and direct if you are tuned into Google’s developer and technical messages. If you are looking at Google marketing baloney, you have many reasons to be confused.
Who is able to perceive reality: The fish in the tank or the observers of the tank? What about the outfit that owns Sea World? Who is really able to perceive the one true reality? Viewpoint is important when trying to figure out what a specific Google action is intended to accomplish.
The Motley Fool points out that Microsoft is one of the outfits that may not be reading the technical smoke signals. The comment in the write up I found most telling was:
Big G doesn’t necessarily want to win the phone wars on its own. If these phones stoke the fires under its partners and flat-out competitors (non-Android), that’s plenty good enough. Better products mean more customers, which begets more browsing and more clicks on Google ads. You see where I’m going with this. In short, Google isn’t doing the same thing as Cisco at all, and has no desire whatsoever to emulate Apple. Google wins if anybody wins — particularly the consumer.
This is a pretty solid observation, but it does not hook the Android to the larger world in which Google moves. Here’s the analogy. Most pundits, azure chip consultants, and self appointed satraps look at the here and now and say, “I see what Google is doing.” The problem is perspective. Google is not in a fish bowl. Google owns a fish bowl.
Getting creatures into the fish bowl is the name of the game. Once in the Google fish bowl, the inhabitants have a tough time figuring out life beyond the glass walls. I can’t see very well underwater, and the folks in the fish bowl can’t see very well either.
As a result, the Android is one way of putting inhabitants in the Google fish bowl. Open source is one method but not the only method. The objective is to become the owner not of a fish bowl but an aquarium, maybe dozens of aquaria. That’s the perspective. The Android is not an Apple play. It is operating at a much higher level of abstraction, which may be difficult for Google to achieve.
I admire the Googlers for chugging away in relative obscurity as the experts try to compare Google to another company’s products. Won’t work. Google is a domain, not a one trick pony.
Stephen E. Arnold, January 11, 2010
I suppose the reference to fish means that I have to report that I was not paid to write this article. Okay, I will snail mail the Fish & Wildlife execs.
Quote to Note: Apple Exec Turned Googler Comment
January 10, 2010
“Live from Las Vegas: Google VP of Engineering Andy Rubin” provides one view of the Google Android play. The statement is attributed to Andy Rubin, founder of Danger, a mobile computing company. Here is the quote:
No one’s breathing down your neck, he says. No one’s trying to upsell you.
Consumers may perceive the Nexus One’s marketing one way. Telco executives don’t feel Google’s breath. Google is standing on the doorstep drooling. Google’s customer support unit may feel more heat than those valiant workers anticipated.
Stephen E Arnold, January 10, 2010
A freebie. Due to the references to heavy breathing I will report my unpaidness to the director of the National Zoo in Washington, DC.
More Advice from the Bleachers for Google
January 10, 2010
Google wants to get in the power business. I thought Google owned a power generation facility in Finland. The company has a lot of experience in the power game. Eric Schmidt in a talk I heard a couple of years ago reported that Google’s data centers sucked as much power as a small town. Now the pundits have a new horse to ride.
Read “What Does Google Want To Be When (and If) It Grows Up?” in Bnet’s tech news service. The guts of the article is that Google is getting into too many businesses. As a result, the company may find itself in a business pickle. Google’s competitors are rooting for the Google to waste its resources, lose focus, and die a quick death. The pundit from Bnet wants the Google to get back on track.
I have quite a loud quack when I read these suggestions from the sidelines. Let’s assume that Google screws up its latest big thing. I think there will be general celebration from the companies disrupted and threatened by Google. The power generation industry is struggling. From hard to control costs to belching pollutants, the power utility industry in the US and other countries faces a big problem—money.
Advice to Google from folks who see only the here and now is likely to be weird, maybe incorrect, advice. Publishers should be urging the Google to self destruct.
Stephen E Arnold, January 10, 2010
A freebie. I will report this to the Federal consulting crowd across from DC power centers.