Autonomy Lands Fresh Content Tuna
August 13, 2008
Northcliffe Media, a unit of the Daily Mail and General Trust, publishes a score of newspapers mostly in the UK. Circulation is nosing towards a million. Northcliffe also operates a little more than two dozen subscription and ad-supported weekly regional tabloids and produces 60 ad-supported free weekly shopper-type publications. The company also cranks out directories, runs a printing business, and is in the newsstand business. The company whose tag line is “At the heart of all things local” has international interests as well. Despite the categorical affirmative “all”, I don’t think Northcliffe operates in Harrod’s Creek, Kentucky. Perhaps it should?
Autonomy, the Cambridge-based search vendor, and Okana (an Autonomy partner) have landed the Northcliffe Media search business; thus, a big content tuna. Okana describes big clients like Northcliffe as “platinum” not “tuna” but I wanted to keep my metaphor consistent and Okana rhymes with tuna.
Okana was Autonomy’s “Innovative Partner of the Year” in 2007. Okana says, “Based around proven architectures, Okana’s range of Autonomy appliance products provide instantly deployable and scaleable [sic] solutions for Information Discovery, Monitoring, and Investigation.Compliance.”
This description of Okana’s offering an “appliance” was new information for me. Also, my research suggests that many Autonomy IDOL installations benefit from training the IDOL system. If training is required, then I ask, “What if any trade off is required for an instant Autonomy IDOL deployment?” If anyone has details for me, please, use the comments section of this Web log.
Autonomy’s IDOL (Intelligent Data Operating Layer) will index an initial 40 million documents and then process new content. The new system will use Autonomy’s “meaning-based computing” for research, information retrieval and trend spotting.
You can read the complete Autonomy news release here. Once again Autonomy is making sales as word reaches me of competitors running into revenue problems. Autonomy’s a tuna catcher while others seem fated to return to port with empty holds.
Stephen Arnold, August 13, 2008
The Future of Search? It’s Here and Disappointing
August 13, 2008
AltSearchEngines.com–an excellent Web log and information retrieval news source–tapped the addled goose (me, Stephen E. Arnold) for some thoughts about the future of search. I’m no wizard, being a befuddled fowl, but I did offer several hundred words on the subject. I even contributed one of my semi-famous “layers” diagrams. These are important because each layer represents a slathering of computational crunching. The result is an incremental boost to the underlying search system’s precision, recall, interface outputs, and overall utility. The downside is that as the layers pile up so does complexity and its girl friend, costs. You can read the full essay and look at the diagram here. A happy quack to the AltSearchEngines.com team for [a] asking me to contribute an essay and [b] having the moxie to publish the goose feathers I generate. The message in my essay is no laughing matter. The future of search is here and in many ways, it is deeply disappointing and increasingly troubling to me. For an example, click here.
Stephen Arnold, August 13, 2008
MarkLogic: The Army’s New Information Access Platform
August 13, 2008
You probably know that the US Army has nicknames for its elite units. Screaming Eagle, Big Red One, and my favorite “Hell on Wheels.” Now some HUMINT, COMINT, and SIGINT brass may create a MarkLogic unit with its own flash. Based on the early reports I have, the MarkLogic system works.
Based in San Carlos (next to Google’s Postini unit, by the way), MarkLogic announced that the US Army Combined Arms Center or CAC in Ft. Leavenworth, Kansas, has embraced MarkLogic Server. BCKS, shorthand for the Army’s Battle Command Knowledge System, will use this next-generation content processing and intelligence system for the Warrior Knowledge Base. Believe me, when someone wants to do you and your team harm, access to the most timely, on point information is important. If Napoleon were based at Ft. Leavenworth today, he would have this unit report directly to him. Information, the famous general is reported to have said, is nine tenths of any battle.
Ft. Leavenworth plays a pivotal role in the US Army’s commitment to capture, analyze, share, and make available information from a range of sources. MarkLogic’s technology, which has the Department of Defense Good Housekeeping Seal of Approval, delivers search, content management, and collaborative functions.
An unclassified sample display from the US Army’s BCKS system. Thanks to MarkLogic and the US Army for permission to use this image.
The system applies metadata based on the DOD Metadata Specification (DDMS). The content is managed automatically by applying metadata properties such as the ‘Valid Until’ date. The system uses the schema standard used by the DOD community. The MarkLogic Server manages the work flow until the file is transferred to archives or deleted by the content manager. MarkLogic points to savings in time and money. My sources tell me that the system can reduce the risk to service personnel. So, I’m going to editorialize and say, “The system saves lives.” More details about the BCKS are available here. Dot Mil content does move, so click today. I verified this link at 0719, August 13, 2008.
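For readers who want to picture how a ‘Valid Until’ property can drive content handling, here is a minimal sketch in Python. It is my own illustration, not MarkLogic’s or the Army’s code. The Record class, its field names, and the “keep” and “review” labels are assumptions I made up for the example, not DDMS element names.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Record:
    """Hypothetical content record; the field names are illustrative only."""
    title: str
    valid_until: Optional[date]  # None means no expiry date was assigned


def next_action(record: Record, today: date) -> str:
    """Return 'keep' while the record is still valid, 'review' once it has expired."""
    if record.valid_until is None or today <= record.valid_until:
        return "keep"
    # Expired content is handed to the content manager, who archives or deletes it.
    return "review"


if __name__ == "__main__":
    memo = Record("Convoy lessons learned", date(2008, 8, 1))
    print(next_action(memo, date(2008, 8, 13)))
```

The point of the sketch is that the decision hangs on one metadata property, so no one has to open the file to know whether it is stale.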
Vorsite Connectors for SharePoint
August 13, 2008
A helpful reader alerted me to Vorsite’s connectors for SharePoint. Based in Seattle, the company has a core competency in SharePoint. The person who wrote me alerted me to connectors; for example, the code widget that hooks Documentum to SharePoint is called “v-Pass for Documentum”. Once installed, a SharePoint user can search the contents of a Documentum content management system repository. The company also offers Active Results for Microsoft Search. This product “extends Microsoft Search capability.” With ActiveResults, a SharePoint user can “e-mail a document, tag it or send it to a records vault without leaving the search results user interface.” Earlier this month, the company bundled a number of tools, including the Documentum connector and Active Results. You can learn more here. The company’s Web site is here. If you are wedded to SharePoint and need to connect to Documentum or other content management systems, give them a call. A happy quack to the reader who alerted me to this firm’s connectors.
Stephen Arnold, August 13, 2008
Is There a Mainframe in Your Future?
August 13, 2008
Brian Womack’s article “Big Iron Anything But Rusty For Mainframe Pioneer IBM” brought a tear to my eye. Writing in Investor’s Business Daily here, Mr. Womack says:
IBM says revenue for its mainframe business rose 32% in the second quarter compared with a year earlier, easily outpacing overall sales growth of 13%. A big driver was February’s launch of IBM’s next-generation mainframe line, the z10, its first big upgrade since 2004. IBM spent about $1.5 billion on the new line.
The core of the article is an interview with David Gelardi, a 52-year-old mainframer. I don’t want to spoil your fun. I love mainframers who explain why big iron is as trendy as Heidi Klum’s catchphrase, “One day you’re in. And one day you’re out.” For example, consider this comment by Mr. Gelardi:
If I take (1,500 Intel) servers . . . and put them on a single mainframe, I’ll have no performance problems whatsoever. But I’m taking all of that workload that was on 1,500 separate servers and consolidating them on one mainframe. While it may be a million-dollar machine and up, it’s actually cheaper than those 1,500 servers.
This is pretty compelling data. I wonder if Google is aware of what it might gain if it were to abandon its decade of effort with commodity servers? Google and IBM are best buddies now. Maybe IBM will convince the GOOG to change its ways? Is there a mainframe in your future?
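If you want to poke at Mr. Gelardi’s arithmetic, here is a back-of-the-envelope sketch. I assume a flat $1,000,000 purchase price, the low end of his “million-dollar machine and up”, and I ignore power, floor space, software licenses, and staff, all of which matter in a real comparison. The function name is mine.

```python
def break_even_cost_per_server(mainframe_cost: float, servers_replaced: int) -> float:
    """Per-server price at which the servers and the mainframe cost the same to buy."""
    return mainframe_cost / servers_replaced


if __name__ == "__main__":
    # Assumed figures: $1,000,000 mainframe, 1,500 servers consolidated (from the quote).
    per_server = break_even_cost_per_server(1_000_000, 1_500)
    print(f"Break-even purchase price: ${per_server:,.2f} per server")
```

On purchase price alone, the servers would have to cost about $667 apiece to match the mainframe. The claim therefore rests on operating costs, which the quote does not itemize.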
Stephen Arnold, August 13, 2008
More Search without Search
August 13, 2008
Google wizard Stephen R. Lawrence and sub-wizard Omar Khan invented what I probably too simplistically characterize as a meta-data vacuum cleaner. Useful for mobile devices, this addition to Google’s “search without search” arsenal is quite interesting to me. The invention is disclosed in US7,412,708, granted on August 12, 2008, with the title “Methods and Systems for Capturing Information.” If you are interested in how Google can deliver information before a user types a query or what type of data Google captures, you will want to read this 14 page document. Think email addresses and more.
The invention is not new, which is important. The GOOG is slow in integrating whizzy new monitoring technology in its public-facing systems. This invention was filed on March 31, 2004. Figure nine to 12 months of work, I think that this is an important chunk of Google’s metadata vacuum cleaner. I cover a number of these inventions in Google Version 2.0. I discussed one exemplary data model for usage tracking data in my for-money July August column for KMWorld. I won’t rehash those documents in this Web log article. You can download a copy of the document from the good, old USPTO here. Study those syntax examples. That wonderful USPTO search engine is a treat to use.
What’s this invention do? Here’s the official legal eagle and engineer description:
Systems and methods for capturing information are described. In one embodiment, an event having an associated article is identified, article data associated with the article is identified, and a capture score for the event is determined based at least in part on article data. Article data can comprise, for example, one or a combination of a location of the article, a file-type of the article, and access data for the article. Event data associated with the event is compiled responsive at least in part to a comparison of the capture score and a threshold value.
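The abstract boils down to “score the event, compare the score to a threshold, then decide whether to compile the event data.” Here is a minimal sketch of that logic. It is my illustration, not Google’s method. The weights, the “cache” test on location, the 0.6 threshold, and the greater-than-or-equal comparison are all assumptions; the abstract names only the three kinds of article data and the comparison itself.

```python
from dataclasses import dataclass


@dataclass
class ArticleData:
    """The three kinds of article data named in the abstract."""
    location: str      # where the article lives, e.g. a folder path
    file_type: str     # e.g. "email", "doc", "tmp"
    access_count: int  # stand-in for "access data"


# Hypothetical weights; the patent abstract does not publish any.
FILE_TYPE_WEIGHTS = {"email": 0.5, "doc": 0.4, "tmp": 0.0}


def capture_score(article: ArticleData) -> float:
    """Combine location, file type, and access data into one score."""
    score = FILE_TYPE_WEIGHTS.get(article.file_type, 0.1)
    if "cache" not in article.location.lower():
        score += 0.2  # assumed bonus for articles outside cache folders
    score += min(article.access_count, 5) * 0.05  # cap the access contribution
    return score


def should_capture(article: ArticleData, threshold: float = 0.6) -> bool:
    """Compile event data only when the score meets the (assumed) threshold."""
    return capture_score(article) >= threshold


if __name__ == "__main__":
    mail = ArticleData("/home/me/mail/inbox", "email", 3)
    junk = ArticleData("/tmp/browser/cache/x1", "tmp", 0)
    print(should_capture(mail), should_capture(junk))
```

A gate like this keeps the vacuum cleaner from sucking up every temp file on a device, which matters when the device is a phone with limited storage and bandwidth.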
The GOOG’s Gmail plumbing may need some patch ups, but once those pin hole leaks are soldered, US7,412,708 portends some remarkable predictive services. I can’t type on my mobile phone’s keyboard now. Google knows that I will be one of the people eager to let Google anticipate my needs. I wonder if there’s a link analysis routine running across those extracted metadata. I think I need to reread this patent document one more time. Join me?
Stephen Arnold, August 13, 2008
Gmail Gfail Update
August 12, 2008
An estimated 20 million Gmail users received 502 errors upon login on August 11, causing a huge furor. Both personal and hosted apps accounts were affected. People in the United States, Canada, and India reported the problem and even a Google employee said the company’s corporate e-mail account was down. (You may want to read our earlier, opinionated post here. Google News’s own run down of stories is here but may be gone soon as well. Click quick.)
Google posted a comment about the August 11, 2008, outage: “Since about 2 p.m. Pacific Time today, many Gmail users have been unable to access their email. We are very sorry for this interruption in service. The issue is being caused by a temporary outage in the contacts system used by Gmail which is preventing Gmail from loading properly. We are starting to roll out a fix now and hope to have the problem resolved as quickly as possible. Even though you may not be able to get to your inbox right now, your mail is safe, including new incoming messages.”
The first help discussion post from the Gmail Guide team appeared at 5:31 p.m. At 7:35 p.m., Gmail Guide stated all accounts should be accessible. Although a bit slow on the uptake, Gmail communicated as it went along, which surprised many armchair and credentialed commentators. 502 errors appear to be a fairly regular occurrence, looking at the discussion group notifications at http://groups.google.com/group/Gmail-Help-Announcements-and-Alerts-en/topics.
Gmail Product Manager Todd Jackson actually posted an apology on the official Gmail blog, giving a tiny explanation of the problem: “The issue was caused by a temporary outage in our contacts system that was preventing Gmail from loading properly.” Translation: Address books are mucking up the works. He also said: “We’re conducting a full review of what went wrong and moving quickly to update our internal systems and procedures accordingly. We don’t usually post about problems like this on our blog, but we wanted to make an exception in this case since so many people were impacted.” Translation: We hear the several million people screaming at us.
A long, technical explanation wasn’t issued. But just so you have an idea about where the problem occurred: the Contacts data is based on an API that ties contacts to your entire account, not just Gmail. With the API you can “synchronize Google contacts with contacts on a mobile device, maintain relationships between people in social applications (Facebook, MySpace, etc.), give users the ability to communicate directly with their friends from external applications using phone, email, and IM.”
So that Contacts function has its Googley-tendrils extending into complex places. Maybe it’s like a long-tailed cat in a room full of rocking chairs, and this particular tentacle just got squished?
Jessica Bratcher, August 12, 2008
SharePoint’s Bottlenecks: Databases Table Makes It Clear Now
August 12, 2008
Tucked away on the Microsoft MSDN Web site is this document, “Databases Table”. Sadly there is no author. I want to congratulate the person for what must have taken days to compile. This lengthy document gathers together and explains to some degree the database tables upon which the SharePoint edifice stands.
As useful as this enumeration is, for me the best information in the MSDN Web log entry is this reassuring statement:
Modifying the database schema or database structures is not supported. Changes that you make to the database contents may be overwritten when you install updates or service packs for Windows SharePoint Services, or when you upgrade an installation to the next product version.
I like to fiddle. I am officially on notice to keep my hex editor away from SharePoint’s database table. Not only will I lose my table changes, I can blow away my data. Nice to know.
Stephen Arnold, August 12, 2008
Googzilla Stumbles Then Apologizes (Gasp!)
August 12, 2008
I recall an earnest Googler in May 2008 accusing me of creating a “fake” screen shot from a Google patent document. I am confident that the Googler, bright smile, direct gaze, was supremely confident that as a Googley person, he was right. Well, he was wrong. The school-mother who crafted his supreme self-confidence and his fraternity brothers who fawned over his brilliance has his head in his patootie. I followed up with an email–not Gmail, thank my lucky stars. I provided the patent document number, and I invited him to give me a buzz to talk about what I called Google’s “profiling” service. The “invention” disclosed in a publicly available document converts a query such as “Michael Jackson” or some other proper noun into a dossier. Yep, just like the type of intelligence reports that wizards at McKinsey & Co. generate for their well-groomed clients.
Every time Google–an outfit I have dubbed Googzilla in honor of the giant, dangerous, but rubber suited monster from the Japanese horror films I loved in my youth–stumbles, I think of Googlers. I recall a situation a couple of years ago when a Googler showed up for a talk at the International Online Show in London. On the panel was a fellow who had worked at the original AltaVista.com. The Googler ran through a canned PowerPoint which was unfamiliar to him. His bright smile and earnest gaze did little to disguise his failure to look at the program, the title of his talk, or how the program would be orchestrated by yours truly. Well, the AltaVista.com wizard gutted and roasted the Googler to the delight of the crowd. At that time, I had a financial incentive to get the Googler off the roasting spit and back into his seat. I failed. The AltaVista.com wizard enjoyed the bar-b-que.
I have reserved comment about the increasing friability of Google operations that must read and write data. In my 2005 study The Google Legacy, I tried in my non-Googley way to explain that the engineers responsible for Google had focused on really fast read speeds. In fact, the company built upon the learnings of AltaVista.com and other information retrieval scientists to use commodity drives to deliver lightning fast performance using a wide range of engineering insights, clever techniques, and elbow grease to resolve known bottlenecks in serving queries. My publisher reports a spurt of interest in my Google studies. We hypothesized that now almost four years after the first analysis appeared, some folks are figuring out that Google has designs on more than online advertising. Also, I work through some of Google’s vulnerabilities such as a digital Achilles’ heel.
My hunch is that my analysis of Google’s weaknesses is now of interest because Googzilla appears to have feet of clay when read-write functions overwhelm the allocated resources.
Why am I reminding people of Google’s focus on reads and Google’s somewhat arrogant attitude toward competitors, customers, and, of course, addled me?
Easy.
Here are some links to bring you up to date on Google’s most recent online outage:
- Google’s very own apology. Click here to read the sincere “We Feel Your Pain, and We’re Sorry”. Yep, I believe this.
- TechCrunch’s report, including useful updates that pinpoint when Gmail went south and when the Googlers figured out what went wrong. Access this good write up here. I love the whale illustration, but I would have used my Googzilla art. The report includes the error message with another “We’re sorry but your Gmail account is currently experiencing errors” line. The wording is remarkable because an account experiences errors only because the system is not working.
- Rafe Needleman’s post which was a subtle reminder that some youthful thinkers rely on a communications medium more immediate (but almost as reliable) than Gmail. You can read his post here.
News aggregators have hundreds of stories about this failure, and I will leave it to you to click through the links on Daily Rotation, PopURLs, and Megite.
My take on this problem is that Google’s architecture does some things well (serve result sets, track user behavior, sell ads) and others not so well (Gmail, Android, Knol). The decreasing interval between failure and visible loss of service is encouraging to some competitors, I surmise. Microsoft, despite its slow start, has delivered a reasonably solid Olympics service. So, Microsoft stays online; Google doesn’t. That should bring some smiles to the Microsoft faces.
Stephen Arnold, August 12, 2008
Virtualization for MOSS (SharePoint 2007): Sort of. Maybe. Some Day
August 12, 2008
Ketaanhs wrote “What Is the Support for Virtualization for MOSS (SharePoint 2007)?” You can read the full post on MSDN Web logs here. First, the good news. Mr. Ketaanhs provides links to hard-to-find Microsoft documentation about virtualization; for example, KB897615 which explains a gotcha for everyone but premier support buyers. He has also scoured these fine expository documents for crucial SharePoint virtualization information. So, if you are a lucky MOSS licensee and want to use virtualization to maximize your “scale up and scale out” investments, jump to MSDN and download these files.
Second, the bad news. There is no solid information about support of virtualization in SharePoint 2007. There’s speculation, and Mr. Ketaanhs writes:
So currently as of Aug 08th 2008 we are awaiting an Official statement to come out in few weeks, until then I *assume* Microsoft Support may provide Commercial Reasonable Support.
Virtualization is one of the hot trends in server rooms. With upwards of 65 million SharePoint users, some of those IT managers would like to virtualize, squeeze more mileage from their hardware, and increase the performance of SharePoint when it processes documents, performs queries, and generates those tasty Web 2.0-style interfaces.
In my opinion, Microsoft continues with some of its pre-Ozzie code synchronization policies. Microsoft marketers hype virtualization and cook up zippy new product names like Hyper-V. Licensees, on the other hand, don’t have what they need to do substantive quarterly planning. Not good.
Stephen Arnold, August 12, 2008