Inside the Tokamak, Part 3: The Green Spheres of Community

April 4, 2008

In the second part of this essay, I explored the notion of context. The shortcomings of key word search and retrieval are easy to identify once we think in terms of what the user needs to do his job or accomplish a task. But context is larger than a single user. It spills into other areas as well, and it gains significance when interacting with messages and community.

We’re ready to tackle one of today’s hottest ideas–community. I loathe the term social software, but English is what it is, and I can’t figure common usage. I will stick with the word community, and you can substitute social software, so this essay seems more in step with the times. You can see where the community function sits in this schematic:

expanded gray bar

When the Internet was unknown to the auto mechanic, community, not technology, allowed Internet Protocol to work. The early Internet and its precursor, the Advanced Research Projects Agency’s ARPANET, were for a nerdy in-crowd. I was lucky. The University of Illinois in Chambana was a player in this game. But for all practical purposes, Internet access when I started college was for an elite group. Flash forward four decades, and the Internet is dependent on people communicating. The surge of interest in point-and-click services like MySpace.com and Facebook.com defines millions of people’s Internet experience.

Read more

Search Technology’s Nose under Our Tent?

April 4, 2008

FCW (Federal Computer Week) explains that search technology is on the US government’s radar. You will want to read the full story “Agencies Grapple with Search and Discovery” by Michael Hardy on the FCW Web site. The point in the story that jumped out at me is:

The situation will only grow more complicated, [Jason] Baron [director of litigation at the National Archives and Records Administration] said. To date, most of the attention to electronically stored information has centered on e-mail, text chat logs and similar common tools. But it can also include voice mail, electronic calendars, instant messages, video conferences, posts to wikis and blogs, and virtual worlds such as Second Life, he said.

This FCW story complements the information (which is sketchy at best) about certain search vendors’ cooperating with US intelligence agencies. A representative story appeared on SFGate (home of the San Francisco Chronicle) on March 30, 2008.

Stephen Arnold, April 4, 2008

Content Analyst and dtSearch Combo Product Announced

April 4, 2008

Content Analyst, a text processing company with DNA from the US intelligence community, released its Conceptual Search and Text Analytics software Version 3.2. This release incorporates dtSearch’s search-and-retrieval system. dtSearch, which has offices in Bethesda, Maryland, has offered a solid search-and-retrieval system for single users, developers, and organizations since 1991.

The combo product delivers key word and conceptual search. The release also offers licensees clustering and cross-language support. Based in Reston, Virginia, Content Analyst’s technology can be used to generate taxonomies and produce summaries of documents.

Content Analyst–like Groxis, Recommind, and Vivisimo–is making a move from a niche market into the broader market of behind-the-firewall search applications.

Digimind Says, “Bonjour America”

April 3, 2008

Digimind, a French marketing intelligence and content processing systems company, has opened offices and a subsidiary in Boston. Chris Hote, Ph.D., will head up the operation.

The company’s flagship product is Digimind Evolution. The company asserts that it “is the only global competitive intelligence platform.” What sets Digimind apart from other text processing companies is the scope of the firm’s platform, which delivers intelligence software as a service.

The company reports that it has tallied nine consecutive years of profitable growth. The company says its market intelligence systems have more than 60,000 users worldwide. More information is available at the company’s English language Web site at www.digimind.com. French readers will find www.digimind.fr adds detail to the information available in English.

Stephen Arnold, April 3, 2008

Inside the Tokamak, Part 2: The Red Spheres of Context

April 3, 2008

In the first part of this essay, I drew a parallel between the plasma inside a tokamak device and an organization’s information environment. The idea is that in an organization, new technologies and increasing pressure to work smarter change what users expect a search and retrieval system to deliver. In this second installment, we look at four additional digital ions and electrons that are “going critical” with regard to information access.

Let’s begin by revisiting the diagram, paying particular attention to the 12 spheres inside the diagram’s central “gray boundary”.

expanded gray bar

The outer two stacks of “yellow spheres” and “purple spheres” exert pressure on users, vendors, and organizations. As the individual yellow and purple spheres expand, the activity inside the “gray boundary” increases. When dealing with non-linear phenomena, it is difficult to predict what will give way and what will surge to dominance. There is considerable uncertainty within the “gray boundary”.

Perhaps you have experienced this yourself. In my work in the last five years, I have documented the increasing dissatisfaction users express about their search and retrieval systems. Some comments are delivered with hope: for example, “I wish the system would let me retrieve what I need regardless of which department has the data”. Other comments are more earthy, “Management has no idea how frustrated I am with this stupid system.” In my work in New York, I have seen 20-somethings staring at a search results display with frustration and anger clouding their otherwise pampered features.

You may want to click on the diagram to see the labels of the “red spheres” more clearly. As you recall, I prepared this diagram more than five years ago, so it is long in the tooth. But it serves as a useful starting point for our exploration of the forces transforming search from a nice-to-have function to a must-have service.

The Red Spheres

There are four “red spheres” in this stack of digital ions and electrons. As is my wont, I’ll comment on each briefly. To sum up this second installment, I want to offer some additional comments about the “search” sphere. The label for the “red spheres” is contextual.

Read more

Microsoft’s Search Wrangler Identified

April 2, 2008

Satya Nadella, according to the Hindustan Times, is Microsoft’s top gun in search. In a story by Priya Ganapati, we learn:

Nadella, senior vice-president for Microsoft’s search, portal and advertising platform group, is trying hard to focus on business as usual in Microsoft’s online division even as he faces an uncertain future. In other words, the MSN portal and the Live Search business, and big bets on getting advertising revenues from services now rest on Nadella’s shoulders.

A graduate of Mangalore University and an alumnus of Manipal Institute of Technology, Nadella is described as “technical”. He’s a manager who can “go really deep technically”. According to Mary Jo Foley, a Microsoft pundit, “he’s a real straight shooter”. But Nadella may find himself reporting to Brian McAndrews, the former boss of aQuantive, which Microsoft acquired in May 2007.

This is one job to watch because, whoever is “search wrangler” for Microsoft, a shoot-out with Google requires serious weaponry. Beyond Search wonders if Yahoo’s big guns can add the fire power Microsoft needs to bag the Googzilla of search.

Stephen Arnold, April 2, 2008

Inside the Information Tokamak, Part 1: The Blue Spheres of Messaging

April 2, 2008

I’ve always enjoyed the tokamak, a machine that produces a toroidal magnetic field for confining a plasma. (A plasma is, for those who cut physics class to enjoy a spring day, an ionized gas containing an approximately equal number of positive ions and electrons.) Zap this puppy, and you get interesting phenomena. Here’s one example on a slightly larger scale than your local university’s physics lab.

image

Source: http://ocw.mit.edu/NR/rdonlyres/Global/7/77E722FA-4A00-476D-9D4A-3F86C9BDA2B3/0/chp_sun_plasma.jpg

So what does nuclear physics have to do with behind-the-firewall search? Actually, quite a lot if you have a poetic side to your curious self.

I am living in a digital tokamak. Instead of ions and electrons, I am bombarded by the information particles shown in the diagram below:

expanded gray bar

This is a diagram prepared in 2003. I am using it “as is” despite its flaws. If you want to recycle the diagram, please coordinate with me.

If you read my earlier post about the “gray bar”, you know that the “yellow spheres” and the “purple spheres” exert pressure on an organization’s information environment. The three new sets of spheres in blue, red, and green are what’s inside the “gray bar” in this diagram.

Read more

Nettlesome Google Story Won’t Die

April 1, 2008

Rumors about Google’s cooperation with US government agencies come and go. In the last week, more details have percolated about the tie up between the search giant and the intelligence community.

An interesting recycling of the current crop of rumors appeared on April 1, 2008, on the Times of India’s Web site. The Times’s unsigned article states:

In the most innovative service, for which Google equipment provides the core search technology, agents are encouraged to post intelligence information on a secure forum, which other spies are free to read, edit, and tag, like the online encyclopedia Wikipedia.

Beyond Search has no information to prove or disprove the assertions in the Times’s article. If true, a public relations dust up can add to the increasingly negative stance taken toward Google for its loss of key staff to Facebook.com and its seemingly weakening grip on online advertising.

Stephen Arnold, April 1, 2008

ZyLAB Opens NY Office

March 31, 2008

Search and content processing company ZyLAB opened an office in New York City in Rockefeller Center. ZyLAB describes its suite of technology as an “Information Access Platform,” a positioning shift that other vendors are emulating.

Dr. Johannes Scholtes, president of ZyLAB North America LLC, told Beyond Search, “We understand that as our client base continues to grow that we need to expand our corporate presence to ensure our customer service remains first-class.”

Beyond Search, a new study published by the Gilbane Group, profiles ZyLAB’s innovative technology. The company blends scanning, rich content processing, and search to allow licensees to manipulate a range of structured and unstructured data whether in hard copy or electronic form.

More information about the company is located at www.zylab.com.

Stephen Arnold, March 31, 2008

Key Word Search Vendors: Panting Laggards

March 31, 2008

In September 2003, I gave an invited lecture at LANL, an acronym for Los Alamos National Laboratory for those of you who don’t keep up with some of the US government’s most interesting research nomenclature. I poked around my digital warehouse today when I saw an announcement that a major search-and-retrieval vendor was now officially in the “information access business”. I used to work for Ziff Communications Co., and we owned an outfit called Information Access Co. That was a great company name, but the whole shooting match was sold to the giant Thomson Corporation and the name Information Access fell into disuse, or so I thought.

I marvel at the “back from the dead” quality certain terminology demonstrates. IAC, as Information Access was known for more than 15 years, allowed a person to search for electronic information. The idea was a good one, and IAC had revenues of more than $100 million at the time of the sale. The idea was simple. We used bibliographic records or what today would be called “structured metadata”, full text of articles or what today would be called content, and proprietary scripts to generate reports or what today would be called business intelligence. The user of our General Business File product in 1990 would pick from a menu of options; for example, look for a job. Then the user would pick from one of the major cities whose employment opportunities we indexed (now tagged) and the system would display job openings. A mouse click sent the report to the printer, and we had happy users. We sold more than 1,000 of these systems in less than nine months in 1990. Considering each system was in the $20,000 plus range, the General Business File would be a success in our Googley world.

The LANL group wanted to know about the future of search and “The Information Implications of Social Software”. Now in 2003, there wasn’t the popular awareness of social software because MySpace.com, Facebook.com, the Web 2.0 “revolution”, and AJAX were dreams or oddities known to a handful of code bangers.

One of the key points in my presentation was that “information access” was an umbrella term for a bundle of activities and functions. These separate entities were now able to interact to form new, often quite surprising products and services. Social software–which I defined as the use of network technology for communication, collaboration, and combination–was a terrible term, but we were stuck with it. (To learn more about my annoyance with information terminology, Searcher Magazine is running a feature story that updates my 1999 article and my year 2000 article about technology convergence. Sorry. I don’t have a publication date yet, but the editor, Barbara Quint, is working on my lousy prose now.)

Take a look at one diagram from my lecture. Keep in mind that I prepared this five years ago, but for our purpose it is, I hope, useful to you.

growthzones

Someone complained that I was copyrighting my work on this Web log. Okay, I won’t put the copyright symbol on this graphic. If you want to recycle my work, please, send me an email and get permission. I get annoyed when certain individuals borrow with neither attribution nor permission. Right, Mr. Hermans?

Let’s take a quick tour of this diagram, and then I will close with some observations about the “panting laggard” that is behind-the-firewall search.

Yellow Spheres

Notice the “yellow spheres”. You may have to click on the small image in order to read the notations on this diagram. The heading is “Enabling”. The idea is that each of the “yellow spheres” represents a category of technology that makes online information more useful. For example, “Converting Creating Content” refers to content authoring and content transformation. Behind-the-firewall systems have to take different file types and homogenize them so the system can manipulate them. If a search or content processing system can’t “read” a file, the system won’t process it. The idea, then, is to get the content regardless of its form and format into the search and content processing system. The bottom “yellow ball” is labeled “Spidering, Indexing, and Searching”. You recognize these ideas because 90 percent of a search vendor’s sales pitch talks about this “yellow ball”. In terms of this diagram, it’s easy to see that these three operations–spidering, indexing, and search–are just a cog in a much larger system. Vendors who pitch you about these three features are “panting laggards”. These vendors are almost out of the race and almost certainly won’t win in the long run in my opinion.
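To show how small this “yellow ball” really is, here is a minimal sketch in Python of the indexing and key word searching operations. It is an illustration under stated assumptions, not any vendor’s implementation. Spidering is left out, and the `documents` dictionary stands in for whatever a spider would fetch. The names `tokenize`, `build_index`, and `search`, along with the sample memos, are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric key words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Build an inverted index: key word -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every key word in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


# Hypothetical sample content standing in for spidered files.
documents = {
    "memo-1": "Quarterly sales report for the Boston office",
    "memo-2": "Engineering drawings for the new sales kiosk",
    "memo-3": "Video of the Boston engineering review",
}
index = build_index(documents)
print(sorted(search(index, "boston sales")))  # prints ['memo-1']
```

A few dozen lines cover the basics, which is one way to see why these three operations alone are a commodity. Everything else in the diagram, from file conversion to analysis, sits outside this sketch.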

Purple Spheres

The “purple spheres” are identified as “Analysis”. Each of these four spaces is now mainstream. Vendors offer these services because each is easier for a manager to assess in terms of a payoff. Few people in an organization want to see laundry lists of information. Filtering eliminates information when rules, methods, or user-defined specifications say, in effect, “I don’t want information about enterprise search. I want information about predictive analytics.” Clustering is a catch-all term. In it reside classification, grouping, categorization, and anything to do with today’s idées du jour–taxonomies and ontologies. The idea is that the system groups similar documents in a meaningful way. If you don’t know what you really want to review, you scan the category labels and browse the results. The third “purple sphere” is data mining. Companies like SPSS and SAS Institute are familiar to you if you took advanced statistics in college. These companies are now in the business of text processing and offering a burgeoning array of features and functions designed to whip unstructured content into shape. SAS Institute bought Teragram, and their PR team told me that SAS will become an “enterprise search company”. I detest this term, but the move is a good one. SAS wants to chop up text, pull out the juicy bits, count them, crunch them, and generate reports for users. The final “purple sphere” is labeled “static / video imaging”. Most organizations are awash in digital information, but most of that is text. Not for long will it be text. “Going forward”, I said in 2003, “behind-the-firewall search systems will have to come to grips with the information-charged binary files–chemical structures, engineering drawings, audio recordings, and video.” Now five years later, only Autonomy has a reasonable solution to video. The other data types remain “outside” the behind-the-firewall system vendors’ capabilities.
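The filtering and grouping ideas in the first two “purple spheres” can be sketched in a few lines of Python. This is a rough illustration only. Commercial clustering relies on statistical similarity, while the sketch below swaps in simple rule-based labeling by key phrase. The function names `passes_filter` and `group_by_label`, the label rules, and the sample documents are all hypothetical.

```python
from collections import defaultdict


def passes_filter(text, wanted, unwanted):
    """Keep a document only if it mentions a wanted phrase and no unwanted one."""
    lowered = text.lower()
    if any(phrase in lowered for phrase in unwanted):
        return False
    return any(phrase in lowered for phrase in wanted)


def group_by_label(documents, labels):
    """Assign each document to every category label whose key words it mentions."""
    groups = defaultdict(list)
    for doc_id, text in documents.items():
        lowered = text.lower()
        matched = [
            label
            for label, keywords in labels.items()
            if any(keyword in lowered for keyword in keywords)
        ]
        # Documents matching no rule land in a catch-all group.
        for label in matched or ["uncategorized"]:
            groups[label].append(doc_id)
    return dict(groups)


# Hypothetical sample content and rules.
documents = {
    "a": "Predictive analytics for retail pricing",
    "b": "Enterprise search buyer's guide",
    "c": "Predictive analytics and enterprise search compared",
    "d": "Lunch menu",
}
kept = [
    doc_id
    for doc_id, text in documents.items()
    if passes_filter(text, ["predictive analytics"], ["enterprise search"])
]
labels = {
    "analytics": ["analytics", "statistics"],
    "search": ["search", "retrieval"],
}
groups = group_by_label(documents, labels)
print(kept)
print(groups)
```

The filter mirrors the “I don’t want enterprise search, I want predictive analytics” rule, and the grouped output is the list of category labels a user would scan and browse.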

Gray Bar

The “gray bar” was intended to be a spectrum. My lousy Photoshop skills produced this blah “gray bar”. The idea is that “Enabling” and “Analysis” are two distinct types of pressure on search and content processing opportunities. As the “yellow spheres” get bigger, they will exert pressure on the folks in the “gray bar”. Similarly, as the “purple spheres” exert their influence on users, a catalytic reaction occurs in the “gray bar”. In 2003, I identified three significant changes in the way employees will interact with digital information.

First, instead of a search box, people looking for information want some sort of information finder “landing page”. For want of a better term, I used the word portal for the notion of gaining access to information in a search and content processing system.

Second, I identified the shift from getting laundry lists of “hits” to a type of collaborative work. Vendors often forget that documents are created by people, unless you are lucky enough to live inside some hyper-advanced culture like Google’s. But the GOOG is an anomaly, so think about your company. You want to accomplish a work task. Many work tasks require working with one or more colleagues. So, the world of search and retrieval becomes an enabler of collaborative interaction.

Third, the search system is a means of keeping track of what’s been done and how information has changed. In my new study, Beyond Search, published by the Gilbane Group, I talk about one of Google’s most interesting data management acquisitions in 2006. (A discussion of this company and its technology appears in Beyond Search.) This company was working in this type of hyper-search space, and if Google does more than launch betas, the technology could revolutionize its enterprise applications division. The point is that search is simply one facet of a much more significant set of processes coming about as the “yellow spheres” and the “purple spheres” expand and change the “pressure” for next-generation applications.

Going Nuclear at LANL

To wrap up, I was making explicit that key word search was a dead end. The action was in the “yellow spheres” and the “purple spheres”. As these various functional and technical areas grew more robust and fell in price, the notion of key words would become irrelevant to the real opportunities in the “gray bar”.

In my discussion of the prescient Sagemaker technology here, I make it clear that the flabby key word search had shortcomings that were well known a decade ago. Now many leaders in search and retrieval are repositioning themselves–actually distancing themselves–from key word search. Not only is it a commodity, but the financial difficulties of some of the highest profile vendors make it clear that generating revenue is not easy to do. You can snag Lucene (discussed here) or Flax (discussed here) and save yourself some money.

The LANL folks were not thrilled with my talk. I thought some in the audience would explode. Webmasters and government marketers had just completed a redesign of the LANL Web site. Key word search was offered, but it was slow as molasses. I think it’s been improved now. None of the functions I identified as important in the “gray bar” were available on LANL’s public-facing or employee-only Web sites.

These wizards invited a guy from rural Kentucky, and I did the intellectual equivalent of tracking mud on their white carpet. Competition for clicks among the national labs is fierce. LANL, long the number one research facility, had suffered some security disappointments and the wily wizards at Oak Ridge National Lab had rolled out a niftier Web site. Believe it or not, a high-traffic Web site makes a difference at budget time on Capitol Hill. Here I was making a mess of the new white carpet. I turned in my fancy badge and high-tailed it back to Kentucky.

Most vendors of search and content processing systems have been slow to provide the functionality shown on my amateurish diagram. These vendors are now charging forward with new positioning, new buzzwords, and new ways to explain the benefits of their systems. Like the out-of-shape athlete, some of these folks are coming into our offices looking much the worse for wear. Most are “panting laggards”–not fit for serious information access duty and several years too late.

Stephen Arnold, April 1, 2008
