Daticon’s Invenio

August 14, 2008

eDiscovery continues to bubble despite the lousy economy in North America. Several weeks ago we started the update procedure for our eDiscovery vendors. I made a mental note to post a short item about Daticon, a company supporting organizations engaged in electronic discovery. You can learn more about this company here. What interests me is the firm’s search technology, called Invenio. The technology is based on a neural network, and when I reviewed the system, some of its features reminded me of an outfit called Dolphin Search, but I may be wrong on this point. If Invenio is Dolphin Search, let me know.

Invenio is integrated with Daticon’s management tools. These tools are among the most fine grained I have seen. Once deployed, a manager can track most of the metrics associated with processing, reviewing, and screening email, documents, and other content associated with eDiscovery processes.

Here’s a representative display of system metrics.

dashboard

There are similarities between Daticon’s approach and that of other eDiscovery specialists such as Stratify and Zantaz. Daticon bundles eDiscovery with a work flow, data capture, metrics, and a range of content processing functions.

The search and content processing system supports concept searching, duplicate detection and removal, email threading, non-text objects, and case management tools. Essentially this is a case management function that allows analysis of activities associated with a matter.
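Daticon does not disclose how Invenio detects duplicates. A common approach in eDiscovery tooling is to hash normalized document text so near-identical copies collide; here is a minimal sketch of that general technique, my assumption, not Invenio's actual method:

```python
import hashlib

def fingerprint(text: str) -> str:
    """Hash normalized text so trivially different copies collide."""
    normalized = " ".join(text.lower().split())  # collapse whitespace, ignore case
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def deduplicate(documents):
    """Keep the first document seen for each fingerprint."""
    seen, unique = set(), []
    for doc in documents:
        fp = fingerprint(doc)
        if fp not in seen:
            seen.add(fp)
            unique.append(doc)
    return unique

docs = ["Meet at 3 PM.", "meet  at 3 pm.", "Lunch moved to noon."]
print(deduplicate(docs))  # the second message is dropped as a duplicate
```

Production systems typically go further, hashing per-custodian or using fuzzy ("near-dedupe") similarity rather than exact matching.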

The company makes an interesting series of demonstrations available. I did not have to register to get walk throughs of the Invenio system. Try them yourself by clicking here.

Stephen Arnold, August 14, 2008

Lexalytics and Infonic Go Beyond Sentiment and Get Hitched

August 14, 2008

I learned about Lexalytics when I was researching Fast Search & Transfer. While I was writing the Enterprise Search Report, Fast Search introduced a function that would report on the sentiment in documents or email. The idea is particularly important in customer support. A flow of email that turns sour can be identified by sentiment analysis software. Fast Search’s approach was interesting to me because it tied into the company’s alert feature.

Founded in 2000, Infonic here is a publicly traded company (previously named Corpora plc). Infonic is listed on the UK’s AIM market under the ticker LON:IFNC. The company offers geo-replication and document management solutions. The firm also develops text analytics and sentiment technology. The firm’s Geo-Replicator software uses data compression and synchronization technology to replicate data between servers and laptops and server to server. The firm’s Document Manager software permits scanning, search, and retrieval of processed content. The company’s text analytics software product is called Sentiment.

At the end of July 2008, the two companies announced that the sentiment units would be merged. The new unit will be based in the UK and named Lexalytics Limited. I profiled the company in my new study for the Gilbane Group here. Lexalytics software performs entity extraction, sentiment analysis, document summarization and thematic extraction. Information about Lexalytics is here.

According to the two companies,

The rationale behind combining the businesses is to pool the expertise and complementary products of the parties in this specialist area and to drive joint growth in sales, utilizing Infonic’s global sales capabilities.

The new company has a value estimated at $40 million. Jeff Catlin, founder of Lexalytics, will be the managing director of the new company.

Sentiment analysis is moving to the mainstream. The addled goose wishes the new sentimental outfits good luck. Oh, one final point: watch for more consolidation in the text analytics space. The market is a frosty place for some search and content processing vendors at this time.

Stephen Arnold, August 14, 2008

Scaling SharePoint

August 14, 2008

We were looking for information about scaling SharePoint to handle a big job like the Olympics. One of my team located “Case Study: Using Microsoft Office SharePoint Server to Implement a Large-Scale Content Storage Scenario with Rapid Search Availability.” The authors are Paul J. Learning (Microsoft Consulting Services), Russ Houberg (KnowledgeLake), and Andy Hopkins (Microsoft). The download page is here. PDF, DOC, and DOCX versions are available. The document, updated in June 2008, is over 100 pages in length, so summarizing it is not possible. The information is pertinent to most enterprise SharePoint installations.

For me, one of the most useful sections was the discussion of server topology, which begins on page 21. The design of the servers and the sheer horsepower recommended for the exemplary installation were, in my opinion, quite interesting. Brand name hardware and other high end infrastructure components underscore the need to have appropriate resources when scaling SharePoint. Here’s a representative server topology diagram that I think is first rate:

clip_image002

Page 34 from Case Study by Houberg, et al.

I also found the paper’s discussion of SQL Server allocations, which begins on page 41, stuffed full of useful data. Download the paper today. A happy quack to the Microsoft team for a job well done.

Stephen Arnold, August 14, 2008

Autonomy Lands Fresh Content Tuna

August 13, 2008

Northcliffe Media, a unit of the Daily Mail and General Trust, publishes a score of newspapers, mostly in the UK. Circulation is nosing toward a million. Northcliffe also operates a little more than two dozen subscription and ad-supported weekly regional tabloids and produces 60 free, ad-supported weekly shopper-type publications. The company also cranks out directories, runs a printing business, and is in the newsstand business. The company, whose tag line is “At the heart of all things local,” has international interests as well. Despite the categorical affirmative “all,” I don’t think Northcliffe operates in Harrod’s Creek, Kentucky. Perhaps it should?

Autonomy, the Cambridge-based search vendor, and Okana (an Autonomy partner) have landed the Northcliffe Media search business; thus, a big content tuna. Okana describes big clients like Northcliffe as “platinum,” not “tuna,” but I wanted to keep my metaphor consistent, and Okana rhymes with tuna.

Okana was Autonomy’s “Innovative Partner of the Year” in 2007. Okana says, “Based around proven architectures, Okana’s range of Autonomy appliance products provide instantly deployable and scaleable [sic] solutions for Information Discovery, Monitoring, and Investigation.Compliance.”

This description of Okana’s offering as an “appliance” was new information to me. Also, my research suggests that many Autonomy IDOL installations benefit from training the IDOL system. If training is required, then I ask, “What, if any, trade-off is required for an instant Autonomy IDOL deployment?” If anyone has details for me, please use the comments section of this Web log.

Autonomy’s IDOL (Intelligent Data Operating Layer) will index an initial 40 million documents and then process new content. The new system will use Autonomy’s “meaning-based computing” for research, information retrieval, and trend spotting.

You can read the complete Autonomy news release here. Once again Autonomy is making sales as word reaches me of competitors running into revenue problems. Autonomy’s a tuna catcher while others seem fated to return to port with empty holds.

Stephen Arnold, August 13, 2008

The Future of Search? It’s Here and Disappointing

August 13, 2008

AltSearchEngines.com–an excellent Web log and information retrieval news source–tapped the addled goose (me, Stephen E. Arnold) for some thoughts about the future of search. I’m no wizard, being a befuddled fowl, but I did offer several hundred words on the subject. I even contributed one of my semi-famous “layers” diagrams. These are important because each layer represents a slathering of computational crunching. The result is an incremental boost to the underlying search system’s precision, recall, interface outputs, and overall utility. The downside is that as the layers pile up, so do complexity and its girlfriend, cost. You can read the full essay and look at the diagram here. A happy quack to the AltSearchEngines.com team for [a] asking me to contribute an essay and [b] having the moxie to publish the goose feathers I generate. The message in my essay is no laughing matter. The future of search is here, and in many ways it is deeply disappointing and increasingly troubling to me. For an example, click here.

Stephen Arnold, August 13, 2008

MarkLogic: The Army’s New Information Access Platform

August 13, 2008

You probably know that the US Army has nicknames for its elite units. Screaming Eagle, Big Red One, and my favorite “Hell on Wheels.” Now some HUMINT, COMINT, and SIGINT brass may create a MarkLogic unit with its own flash. Based on the early reports I have, the MarkLogic system works.

Based in San Carlos (next to Google’s Postini unit, by the way), MarkLogic announced that the US Army Combined Arms Center or CAC in Ft. Leavenworth, Kansas, has embraced MarkLogic Server. BCKS, shorthand for the Army’s Battle Command Knowledge System, will use this next-generation content processing and intelligence system for the Warrior Knowledge Base. Believe me, when someone wants to do you and your team harm, access to the most timely, on point information is important. If Napoleon were based at Ft. Leavenworth today, he would have this unit report directly to him. Information, the famous general is reported to have said, is nine tenths of any battle.

Ft. Leavenworth plays a pivotal role in the US Army’s commitment to capture, analyze, share, and make available information from a range of sources. MarkLogic’s technology, which has the Department of Defense Good Housekeeping Seal of Approval, delivers search, content management, and collaborative functions.

img 813a

An unclassified sample display from the US Army’s BCKS system. Thanks to MarkLogic and the US Army for permission to use this image.

The system applies metadata based on the DOD Metadata Specification (DDMS). The content is managed automatically by applying metadata properties such as the ‘Valid Until’ date. The system uses the schema standard used by the DOD community. The MarkLogic Server manages the work flow until the file is transferred to archives or deleted by the content manager. MarkLogic points to savings in time and money. My sources tell me that the system can reduce the risk to service personnel. So, I’m going to editorialize and say, “The system saves lives.” More details about BCKS are available here. Dot Mil content does move, so click today. I verified this link at 0719, August 13, 2008.
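The ‘Valid Until’ behavior described above, content automatically routed to archive or deletion once its metadata expires, can be sketched as a simple triage over dated records. This is my illustration of the general idea, not MarkLogic’s implementation; the field names are hypothetical:

```python
from datetime import date

def triage(records, today):
    """Split records into active and expired based on a 'valid_until' property."""
    active, expired = [], []
    for rec in records:
        (active if rec["valid_until"] >= today else expired).append(rec)
    return active, expired

records = [
    {"id": "doc-1", "valid_until": date(2009, 1, 1)},
    {"id": "doc-2", "valid_until": date(2008, 6, 30)},
]
active, expired = triage(records, today=date(2008, 8, 13))
print([r["id"] for r in expired])  # ['doc-2'] would be archived or deleted
```

In a real deployment the expiry check would run server-side against the DDMS metadata, with the expired set handed to an archiving or deletion workflow rather than returned to the caller.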


Vorsite Connectors for SharePoint

August 13, 2008

A helpful reader alerted me to Vorsite’s connectors for SharePoint. Based in Seattle, the company has a core competency in SharePoint. The person who wrote me pointed out the connectors; for example, the code widget that hooks Documentum to SharePoint is called “v-Pass for Documentum”. Once installed, a SharePoint user can search the contents of a Documentum content management system repository. The company also offers Active Results for Microsoft Search. This product “extends Microsoft Search capability.” With Active Results, a SharePoint user can “e-mail a document, tag it or send it to a records vault without leaving the search results user interface.” Earlier this month, the company bundled a number of tools, including the Documentum connector and Active Results. You can learn more here. The company’s Web site is here. If you are wedded to SharePoint and need to connect to Documentum or other content management systems, give them a call. A happy quack to the reader who alerted me to this firm’s connectors.

Stephen Arnold, August 13, 2008

Is There a Mainframe in Your Future?

August 13, 2008

Brian Womack’s article “Big Iron Anything But Rusty For Mainframe Pioneer IBM” brought a tear to my eye. Writing in Investor’s Business Daily here, Mr. Womack says:

IBM says revenue for its mainframe business rose 32% in the second quarter compared with a year earlier, easily outpacing overall sales growth of 13%. A big driver was February’s launch of IBM’s next-generation mainframe line, the z10, its first big upgrade since 2004. IBM spent about $1.5 billion on the new line.

The core of the article is an interview with David Gelardi, a 52-year-old mainframer. I don’t want to spoil your fun. I love mainframers who explain why big iron is as trendy as Heidi Klum’s catchphrase, “One day you’re in. And one day you’re out.” For example, consider this comment by Mr. Gelardi:

If I take (1,500 Intel) servers . . . and put them on a single mainframe, I’ll have no performance problems whatsoever. But I’m taking all of that workload that was on 1,500 separate servers and consolidating them on one mainframe. While it may be a million-dollar machine and up, it’s actually cheaper than those 1,500 servers.

This is pretty compelling data. I wonder if Google is aware of what it might gain if it were to abandon its decade of effort with commodity servers? Google and IBM are best buddies now. Maybe IBM will convince the GOOG to change its ways? Is there a mainframe in your future?
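Mr. Gelardi’s consolidation claim is easy to check with back-of-the-envelope arithmetic. The per-server price below is my assumption, not a figure from the article; the mainframe price is his “million-dollar machine and up”:

```python
servers = 1500
cost_per_server = 2500        # assumed commodity x86 server price, USD
mainframe_cost = 1_000_000    # "a million-dollar machine and up"

server_total = servers * cost_per_server
print(server_total)                    # 3750000
print(mainframe_cost < server_total)   # True: the mainframe wins on hardware alone
```

Of course, hardware acquisition is only part of total cost; power, cooling, floor space, and administration would shift the comparison in either direction depending on workload.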

Stephen Arnold, August 13, 2008

More Search without Search

August 13, 2008

Google wizard Stephen R. Lawrence and sub-wizard Omar Khan invented what I, probably too simplistically, characterize as a metadata vacuum cleaner. Useful for mobile devices, this addition to Google’s “search without search” arsenal is quite interesting to me. The invention is disclosed in US7,412,708, granted on August 12, 2008, with the title “Methods and Systems for Capturing Information.” If you are interested in how Google can deliver information before a user types a query or what type of data Google captures, you will want to read this 14 page document. Think email addresses and more.

The invention is not new, which is important: the GOOG is slow to integrate whizzy new monitoring technology into its public-facing systems. This invention was filed on March 31, 2004. Figure nine to 12 months of work before that filing; I think this is an important chunk of Google’s metadata vacuum cleaner. I cover a number of these inventions in Google Version 2.0. I discussed one exemplary data model for usage tracking data in my for-money July-August column for KMWorld. I won’t rehash those documents in this Web log article. You can download a copy of the document from the good, old USPTO here. Study those syntax examples. That wonderful USPTO search engine is a treat to use.

What’s this invention do? Here’s the official legal eagle and engineer description:

Systems and methods for capturing information are described. In one embodiment, an event having an associated article is identified, article data associated with the article is identified, and a capture score for the event is determined based at least in part on article data. Article data can comprise, for example, one or a combination of a location of the article, a file-type of the article, and access data for the article. Event data associated with the event is compiled responsive at least in part to a comparison of the capture score and a threshold value.
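Stripped of the legal language, the claim describes scoring an event from properties of its associated article and compiling event data only when the score clears a threshold. A toy rendering of that logic follows; the weights, feature values, and threshold are my guesses for illustration, not anything disclosed by Google:

```python
# Hypothetical weights for the article data the claim enumerates:
# location, file type, and access data.
WEIGHTS = {"location": 0.2, "file_type": 0.3, "access": 0.5}

def capture_score(article):
    """Combine article data into a single score, per the claim's description."""
    score = 0.0
    if article.get("location") == "local":            # locally stored articles count more
        score += WEIGHTS["location"]
    if article.get("file_type") in {"email", "doc"}:  # favored file types
        score += WEIGHTS["file_type"]
    # Access data: scale recent access count into [0, 1].
    score += WEIGHTS["access"] * min(article.get("access_count", 0) / 10, 1.0)
    return score

def should_capture(article, threshold=0.4):
    """Compile event data only when the capture score clears the threshold."""
    return capture_score(article) >= threshold

print(should_capture({"location": "local", "file_type": "email", "access_count": 5}))  # True
```

The interesting part is not the arithmetic but the inputs: the claim explicitly lists article location, file type, and access data, which is why the patent reads like a blueprint for passive usage monitoring.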

The GOOG’s Gmail plumbing may need some patch ups, but once those pinhole leaks are soldered, US7,412,708 portends some remarkable predictive services. I can’t type on my mobile phone’s keyboard now. Google knows that I will be one of the people eager to let Google anticipate my needs. I wonder if there’s a link analysis routine running across those extracted metadata. I think I need to reread this patent document. Join me?

Stephen Arnold, August 13, 2008

Gmail Gfail Update

August 12, 2008

An estimated 20 million Gmail users received 502 errors upon login on August 11, causing a huge furor. Both personal and hosted apps accounts were affected. People in the United States, Canada, and India reported the problem, and even a Google employee said the company’s corporate e-mail account was down. (You may want to read our earlier, opinionated post here. Google News’s own rundown of stories is here but may be gone soon as well. Click quick.)

Google posted a comment about the August 11, 2008, outage: “Since about 2 p.m. Pacific Time today, many Gmail users have been unable to access their email. We are very sorry for this interruption in service. The issue is being caused by a temporary outage in the contacts system used by Gmail which is preventing Gmail from loading properly. We are starting to roll out a fix now and hope to have the problem resolved as quickly as possible. Even though you may not be able to get to your inbox right now, your mail is safe, including new incoming messages.”

The first help discussion post from the Gmail Guide team appeared at 5:31 p.m. At 7:35 p.m., Gmail Guide stated all accounts should be accessible. Although a bit slow on the uptake, Gmail communicated as it went along, which surprised many armchair and credentialed commentators. 502 errors appear to be a fairly regular occurrence, judging by the discussion group notifications at http://groups.google.com/group/Gmail-Help-Announcements-and-Alerts-en/topics.

Gmail Product Manager Todd Jackson actually posted an apology on the official Gmail blog, giving a tiny explanation of the problem: “The issue was caused by a temporary outage in our contacts system that was preventing Gmail from loading properly.” Translation: Address books are mucking up the works. He also said: “We’re conducting a full review of what went wrong and moving quickly to update our internal systems and procedures accordingly. We don’t usually post about problems like this on our blog, but we wanted to make an exception in this case since so many people were impacted.” Translation: We hear the several million people screaming at us.

A long, technical explanation wasn’t issued. But just so you have an idea about where the problem occurred: the Contacts data is based on an API that ties contacts to your entire account, not just Gmail. With the API you can “synchronize Google contacts with contacts on a mobile device, maintain relationships between people in social applications (Facebook, MySpace, etc.), give users the ability to communicate directly with their friends from external applications using phone, email, and IM.”

So that Contacts function has its Googley-tendrils extending into complex places. Maybe it’s like a long-tailed cat in a room full of rocking chairs, and this particular tentacle just got squished?

Jessica Bratcher, August 12, 2008
