Federation: Big Need, Still a Challenge
April 25, 2008
In May 2001, I gave a talk at one of the first Web Search Universities. The audience was baffled by my talk, which I called “Vertical Search Engines: System-Initiated Information Retrieval”. I recall that no one knew what I was talking about. Sigh. Story of my life.
Organizational Reality
Here’s the core diagram from this talk:
This is a clip art silo and it is a basic feature of the enterprise. This silo does not hold corn; it is a metaphor for the information technology department. IT operates in its own world or space. The engineers and computer wizards stick to themselves, use their own jargon, and occasionally snort at the antics of a 20-something in the marketing department.
Here’s another diagram from my 2001 lecture. This diagram shows a company as a collection of silos. I know that people in organizations are part of one big family, everyone is on the same team, and everyone is in the same foxhole. This all-too-common company setup appears below:
Each of these silos has its own information. Even in organizations with an effective IT infrastructure, there are nooks and crannies stuffed with digital information. It may be a laptop that a manager carries back and forth, a USB drive, or a Google Search Appliance tucked in a corner of the marketing department where “competitive intelligence” is kept for the use of the marketing mavens.
Now let’s look at this 2001 diagram:
The red, blue, and yellow areas cut across the silos. Even the most disorganized commercial entities don’t want to have multiple accounting, personnel, and management systems. In 2001, it was common sense. Today, it’s regulated, and the MBAs, financial geniuses, and lawyers running the show don’t want to follow Enron’s executives to a country club prison.
Organizations may be flat, disorganized, ad hoc or just wacky. But the four types share one characteristic: the intersection of silos and horizontal flows creates dust bins of data and content. Federating systems have to find the appropriate “stuff”, ensure security, and make the “stuff” findable to authorized users. This was very hard in 2001 and is only slightly less challenging today.
Three Issues
What does this old diagram have to do with the subject of federation?
First, in order to federate content, you have to deal with two issues unrelated to the technology that you choose for indexing, search, and content processing. One is security, which includes regulatory requirements and confidentiality. It’s obvious that if you index whatever sits on your servers behind your company’s firewall, you are going to expose content about which you know little or nothing. Problems lie in wait until the right query surfaces the information, and then the craziness begins. Politics is the other non-tech issue. Humans want control, and horizontal functions crash directly into the silos.
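To make the exposure problem concrete, here is a minimal sketch of filtering search results by access rights before anyone sees them. It is an illustration, not a description of any vendor's system. The `Document` class, the `allowed_groups` field, and the sample index are hypothetical, and the policy of withholding any document that lacks access information is my assumption.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Document:
    doc_id: str
    silo: str
    text: str
    # Hypothetical access list: groups permitted to see this document.
    allowed_groups: frozenset = field(default_factory=frozenset)


def search(index, query, user_groups):
    """Return only the matching documents that the user is allowed to see."""
    terms = query.lower().split()
    hits = []
    for doc in index:
        body = doc.text.lower()
        if not all(term in body for term in terms):
            continue
        # Deny by default: a document with no access list is never shown.
        if doc.allowed_groups & user_groups:
            hits.append(doc)
    return hits


index = [
    Document("hr-1", "personnel", "Salary review for the marketing team",
             frozenset({"hr"})),
    Document("mk-1", "marketing", "Marketing plan and salary survey summary",
             frozenset({"marketing", "hr"})),
    # Content nobody has classified, the kind that sits in a dust bin.
    Document("xx-1", "unknown", "Salary spreadsheet found on a file share"),
]

marketing_hits = [d.doc_id for d in search(index, "salary", frozenset({"marketing"}))]
hr_hits = [d.doc_id for d in search(index, "salary", frozenset({"hr"}))]
```

All three documents match the query “salary”, but the marketing user sees one of them and the unclassified spreadsheet reaches no one. Deciding who belongs in each group is the political part, and no code settles that.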
Second, in order to federate you need to access the information in the silos, homogenize it, and deduplicate it. You have to figure out how to index the information and data in a way that is meaningful to the silo owner who created the data and to the folks in the other silos who don’t have much knowledge of the jargon used in another unit. When an organization operates across time zones and national boundaries, the problem jumps up a notch or two. You don’t have to be married to a Harvard MBA to conclude that federation is a tough technical problem.
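A rough sketch of the homogenize and deduplicate steps follows. The field names, the sample silos, and the choice of an exact-match fingerprint are all assumptions made for illustration. Real content needs near-duplicate detection and a vocabulary mapping that goes well beyond renaming fields.

```python
import hashlib
import re


def normalize(record, field_map):
    """Map a silo-specific record onto shared field names."""
    return {common: record.get(local, "") for common, local in field_map.items()}


def fingerprint(text):
    """Hash the text after removing case and whitespace differences."""
    canonical = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def federate(silos):
    """Merge records from several silos, keeping one copy of each body.

    silos is a list of (silo_name, field_map, records) tuples.
    """
    seen = {}
    for silo_name, field_map, records in silos:
        for record in records:
            doc = normalize(record, field_map)
            key = fingerprint(doc["body"])
            if key in seen:
                # Same content already indexed: remember where else it lives.
                seen[key]["sources"].append(silo_name)
            else:
                doc["sources"] = [silo_name]
                seen[key] = doc
    return list(seen.values())


# Hypothetical silos that label the same kind of content differently.
silos = [
    ("marketing", {"title": "headline", "body": "copy"},
     [{"headline": "Q1 pricing", "copy": "New price list  effective March 1."}]),
    ("accounting", {"title": "subject", "body": "memo"},
     [{"subject": "Pricing memo", "memo": "new price list effective March 1."},
      {"subject": "Payroll calendar", "memo": "Paychecks issue on the 15th."}]),
]

merged = federate(silos)
```

The price list appears once in `merged`, with both silos recorded as sources, and the payroll calendar appears once. Note that the surviving title is whichever silo was processed first, which is exactly the sort of decision a silo owner will want to argue about.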
Third, you have to get the workflow figured out. Search and content processing systems are programmatic. Machines can collide with human activities. Let me give you an example. If your indexing subsystem is aggressive, you might “kill” accounting’s ability to generate paychecks or run the IT department ragged trying to restore servers that your indexing robot overloaded.
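One common way to keep an indexing robot from flattening a server is to cap the request rate per host. The sketch below assumes a hypothetical `fetch` function and a hypothetical `PoliteCrawler` class. The clock and sleep functions are passed in so the demonstration runs instantly with a fake clock; in real use the defaults apply.

```python
import time


class PoliteCrawler:
    """Fetch from each host no faster than max_per_second requests."""

    def __init__(self, fetch, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self.fetch = fetch
        self.min_interval = 1.0 / max_per_second
        self.clock = clock
        self.sleep = sleep
        self.last_request = {}  # host -> time of the most recent fetch

    def get(self, host, path):
        now = self.clock()
        last = self.last_request.get(host)
        if last is not None:
            wait = self.min_interval - (now - last)
            if wait > 0:
                # Too soon for this host: pause before hitting it again.
                self.sleep(wait)
                now = self.clock()
        self.last_request[host] = now
        return self.fetch(host, path)


# Demonstration with a fake clock so nothing actually sleeps.
fake_time = [0.0]
pauses = []


def fake_clock():
    return fake_time[0]


def fake_sleep(seconds):
    pauses.append(seconds)
    fake_time[0] += seconds


crawler = PoliteCrawler(lambda host, path: host + path, max_per_second=2,
                        clock=fake_clock, sleep=fake_sleep)
results = [
    crawler.get("accounting", "/ledger"),
    crawler.get("accounting", "/payroll"),  # second hit on the same host waits
    crawler.get("hr", "/policies"),         # a different host is not delayed
]
```

A fixed rate is the simplest possible policy. A production system would more likely back off when the server slows down and stay away entirely during the payroll run, which means someone has to ask accounting when that is.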
Federating Realities
To wrap up this quick look at federation, let me offer several observations:
- Federation, while challenging technically, also poses political and procedural issues
- Politics, not technology, may be the major barrier to get through when implementing a federated search system
- Security and access controls can make or break the system
- Working through the non-technical aspects of a federated search system takes time and resources (money)
A number of vendors assert that their systems deliver federated search. That’s true. Keep in mind that you have a great deal of work to do to make the vendor’s system work in your organization. If you get something wrong or rush the job, you can end up with a non-functioning system or one that users dislike.
This message fell on deaf ears in 2001. Let me know if it is ringing loud and clear in 2008.
Stephen Arnold, April 25, 2008
Comments
2 Responses to “Federation: Big Need, Still a Challenge”
You’re right! And I must say, I can relate to your frustrations. It’s hard to get the lemmings in the world of IT to look at technologies for what they are. They are now blinded by the mass marketing dollars of massive technology conglomerates. The interesting observation is that as technology markets grow, the appropriate choice of the right technology applied to the problem becomes less likely, as large companies with obsolete prior-generation technologies use their marketing dollars to milk the markets and customers as long and as far as they can. I encourage you to see what we have been working on for 7 years. Our CTO, Andrew Scherpbier, was the original author of ht://DIG, the first open source GPL “federated” web search engine, which he wrote 20 years ago while trying to solve distributed search across 50 web servers at the university. It is now used by most of the major universities as a source code base to teach web search. Corporate search and data management is quite a different issue, with many dynamics that differ from web search. If you are interested, I’d be more than happy to email you our whitepaper.
Kind Regards,
Bob Brown
Chairman & CEO
BlackBall, Inc.
I appreciate your taking time to share your thoughts. I would like to see the whitepaper. Email it to seaky2000 at yahoo dot com.
Thank you,
Stephen Arnold