Google and Data Object Visualization

June 30, 2009

The USPTO published US7555471 B2 on June 30, 2009. The Beyond Search goslings think this is a reasonably important Google disclosure. The inventors include one super Googler and a clutch of other Google rock star engineers. Andrew Hogue is a Googler to watch. If you find his official Google page opaque, try this link. He and his band of engineers have received a patent for “Data Object Visualization.” Don’t get too excited about the graphics. The system and method apply to a core Google system for cleaning up discrepancies in fact tables. If you are a fan of Dilbert, this is the invention that describes one of Google’s smartest agents with the official descriptor “janitor”. How smart is the janitor? Smart enough to make dataspaces closer to reality. The USPTO system is sluggish today, so you can get info from FreePatentsOnline.com or one of the other services that provide access to these public documents. I love that janitor lingo too. Googley humor for big time inventions makes clear that the 11 year old Google still possesses math club whimsy. Those examples for atomic mass and volcano are equally illuminating.

Stephen Arnold, June 30, 2009

DARPA: We Want to Be Like Google

June 30, 2009

Short honk: I think that the US government should emulate Google. I am not critical of DARPA (a unit of the Department of Defense) and its new project. You can read about it in the CNet story “Reading Machine to Snoop on Web.” If you click this link, you can probably, for a short period of time, access the statement of work for this little project. The first money chunk seems to be north of $20 million, a fraction of what it will cost to replicate some of these “as is” features of the Google. Enjoy.

Stephen Arnold, June 30, 2009

The Search Horn of Plenty

June 27, 2009

Bundling is starting to poke its nose into enterprise search. The idea is that one buys an end-to-end solution. Each vendor defines “end”, of course. The customer gets a bundle, in effect, a digital horn of plenty. David Neal’s “Autonomy Launches Social Media Contact Solution” may be a search harbinger of things to come. Autonomy has been among the most agile vendors when it comes to packaging search in ways that strike a chord with customers and journalists. (Keep in mind that the addled goose is not a journalist.)

The release is a module within Autonomy’s Meaning Based Marketing Suite, and brings the web directly into the contact centre, according to the firm. AIMO comprises Autonomy’s TeamSite web content management system, the Optimost advanced analytics marketing solution and its IDOL server platform for search and information processing. The application of web content management and analytics within IDOL means that companies can understand and act on detailed customer input in real time, the firm claimed.

Microsoft, based on rain drops of information falling on the goose pond in Harrod’s Creek, is also in the bundling business. Details of Microsoft’s approach are not as crisp as Autonomy’s package, but the broad outlines are starting to be visible through the torrent of marketing and PR about Microsoft’s search systems.

Autonomy, however, is a trend aware company. Its approach warrants watching.

Stephen Arnold, June 27, 2009

Seeing Crime the Professional Way

June 25, 2009

Most visualization is gratuitous. Where graphic representations are useful is in law enforcement. Crimes are committed by people. People have to be someplace. Putting the data, the people, and the events together makes it possible to “see” patterns. Visualization of crime related information is often a short cut to resource deployment, anticipatory planning, and budget management. You can see some open source, no cost examples in the article “20 Visualizations to Understand Crime”.

Flowing Data did a good job on this write up. Recommended for those who wonder about the value of monitoring. Adding real time data to these visualizations is a very useful innovation that some organizations are now exploring.

Stephen Arnold, June 25, 2009

Text Mining and Predicting Doom

June 23, 2009

The New Scientist does not cover the information retrieval sector. Occasionally the publication runs an article like “Email Patterns Can Predict Impending Doom” which gets into a content processing issue. I quite liked the confluence of three buzz words in today’s ever thrilling milieu: “predict”, “email”, and “doom”. What’s the New Scientist’s angle? The answer is that as tension within an organization increases, communication patterns in email can be discerned via text mining. The article hints that analysis of email is tough with privacy a concern. The article offers a suggestive reference to an email project at Yahoo, but provides few details. With monitoring of real time data flows available to anyone with an Internet connection, message patterns seem to be quite useful to those lucky enough to have the tools needed to ferret out the nuggets. Nothing about fuzzification of data, however. Nothing about which vendors are leaders in the space except for the Yahoo and Enron comments. I think there is more to be said on this topic.

Stephen Arnold, June 23, 2009

Track Folks Down via People Search Systems

June 14, 2009

You too can be a private eye. A happy quack to the reader who alerted me to a list of 25 search engines that can help you find a person. “25 Free People Search Engines to Find Anyone in the World” is quite useful. I learned about some systems that the goslings did not have in our list. A couple of quick examples and then you can navigate to Findermind.com and snag the full listing:

  1. Tweepz—Looks very strong
  2. Private Eye—Like having Peter Gunn at your side
  3. Criminal Searches—Very, very useful

Add all 25 to your bookmarks.

Stephen Arnold, June 14, 2009

Autonomy and Social Media

May 28, 2009

Exalead, according to my search archive, was the first vendor out of the blocks with a social search function. You can find that system on the Exalead Labs’s Web page here. I recall a demo from IBM several years ago in which social network analysis of content was featured, but I have a tough time keeping track of what trend Big Blue surfs from quarter to quarter.

Autonomy is now in the social game as well. You can read Phil Muncaster’s “Autonomy Launches Social Media Analysis Tool” here. The new system is called Autonomy Interwoven Social Media Analysis, which, like other Autonomy products, integrates the IDOL technology.

For me, the most interesting part of the write up was:

The technology uses clustering, pattern matching techniques and probabilistic modeling to understand sentiment, and can present marketers with a richer and more contextual set of data than traditional keyword spotting tools may be able to, according to Autonomy.

You can locate more information at www.autonomy.com.

Stephen Arnold, May 28, 2009

Digital Reef Makes Microsoft Fast Work

May 25, 2009

I puzzled over “Digital Reef Partners with FAST, Helps Manage SharePoint Content” in CMSWire here. The article covers a number of content functions that I try to keep separate; for example, unstructured data, “out of the box support for eDiscovery, compliance, Office SharePoint Server management, data security, and storage initiatives”, and analytic tools. Oh, I almost omitted manipulation of structured data. Who provides this happy family of services? Digital Reef. You can read more about this company here. The company asserts that it handles these different functions. My view is that the company knows how to tame SharePoint and implement Fast Search’s ESP “out of the box”. In my experience, prior to the acquisition of Fast Search & Transfer, implementing Fast ESP as an “out of the box” solution was time consuming, difficult, and required a Fast engineer with email and phone access to senior Fast Search wizards in Boston and Oslo. Dark days ahead for third party vendors of alternatives to Microsoft SharePoint services.

Stephen Arnold, May 25, 2009

Data Tables Contain Deleted Data. Yikes. Revelation.

May 21, 2009

First it was spies on Facebook. Then it was the LA Times spoofed via a year old Prop 8 story. Now – news flash – the issue is privacy on social networking sites. Yikes. What a scoop. Sky News in the UK published “Fears over Privacy on Social Networking Sites” here. The intrepid news hounds at Sky News reported:

Researchers from the University of Cambridge say that many social networking sites maintain copies of user photos even after users delete them.

I wonder if the wizards in the groves of academe figured out that quite a bit of other information and data lurk on these sites. In fact, unless the indexes have been rebuilt, my hunch is that my team could find some interesting stuff not searchable but available to those poking around with forensic savvy.

I am waiting for one of these intrepid reporters to define “delete” and “remove”.

Stephen Arnold, May 22, 2009

Early Days for Information Management

May 21, 2009

In the last two weeks, I have been crisscrossing the United States. On last night’s fab flight from Philadelphia to Louisville, I watched the lights and thought about the comments I heard about data management. I have to mask the clients with whom I spoke and fuzzify the language, but I think I can communicate several key points.

Search Is a Symptom, Not the Cause

One idea that hooked me was an observation about search and the turmoil and confusion it creates and leaves behind once a new system is up and running. Search is not the problem. Search is a manifestation of the organization’s broader information management situation. If information management is lousy, then search will be lousy as well. The problem is that fixing information management in an organization under financial pressure is a big job. Furthermore, it involves change, which is often resisted when job loss and shifts in work responsibilities are likely. It’s much easier to slap in a new search system and move on. Unfortunately, search gets another black eye and a vendor can be criticized, sometimes in a scathing manner, because the information management approach was flawed, broken, or nonexistent.

[Image: Fatigue or diabetes?]

No Clue about Volume

Most of the people with whom I spoke sang one verse from one hymnal, “We have no clue about our data. We don’t know how much info we have. We are lost in bits. We are lost in bits. We are clueless.”

Most of the savvy information technology professionals know that the volume of digital information is increasing. The problem is that no one knows exactly how fast, what to do with the emails and documents, or how to keep track of what’s where. The Abbott and Costello routine “Who’s on First?” anticipates the statements about the hassle information volume poses.

One doesn’t need a degree in information science to recognize that if you can’t collect digital information, you don’t have much of a chance answering this question: “Are you sure we don’t have that document?” Finding is now becoming a must have function. The Catch 22 is that because most organizations don’t have a grasp on the amount of data in the organization or where an item is, search becomes a bit tougher.

[Image: How big is the information task?]
