Virtual Servers: It Is Recrawl and Reindex Time

September 22, 2008

The malarkey about virtualization has many information technology professionals courting chimeras. Some virtualization is good. For example, we have a couple of quad-core, four-gigabyte servers that are four to five times faster on our benchmark tests than the aged Netfinity 5500s we retired. The new servers have the moxie to run virtualization software. No problems so far. In fact, chopping boxes into separate virtual servers makes sense and is tame compared to some of the technologies that arrive at our office door.

Virtual storage, however, is another kettle of fish. In our experience, the complex directory structures spawned by SharePoint and certain enterprise applications are hard to manage even on conventional storage. When these structures are mixed with virtual storage, we have encountered some excitement. We test software, so our trashed files provide us with useful data, not long weekends and sleepless nights.

On September 19, 2008, InfoWorld called attention to some of the issues virtual storage drags along with the snappy marketing messages and rah rahs for cheaper administration. "Virtual Server Backups Prone to Failure, Survey Finds" makes clear that virtual solutions are not without problems. The InfoWorld write-up reports on a survey that asserts more than half of virtual server backups don't restore. The article has some other data, but I want to focus only on the backups not restoring.

Here's the problem. Search is a storage-intensive application. The indexes can be big. If an index doesn't start out big, in a matter of months it gets big. Logs get big. When a search or content processing system crashes or an index update corrupts the master index, an administrator turns to the backup system. If the search system is using a whizzy new virtual storage system, there is a good chance the backup won't work. The problem is that rebuilding the index is not always a five-minute or even a five-hour job.

Recrawling and reindexing can be tricky. Systems that perform significant content processing can crunch for a day, maybe more, generating metadata. Our suggestion is to skip virtual storage for search and content processing systems. Already have one? You may want to devirtualize, and quickly.
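Whatever storage sits under a search system, the only way to know a backup will restore is to restore it somewhere and check it. The Python below is a minimal sketch of one such check, assuming the index is a directory of files and that a test restore has already been written to a separate location. It compares SHA-256 digests file by file and reports anything missing, extra, or different. The function names `tree_digests` and `compare_trees` are hypothetical, not part of any search product.

```python
import hashlib
from pathlib import Path


def tree_digests(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            h = hashlib.sha256()
            with path.open("rb") as f:
                # Read in 1 MB chunks so large index segments do not exhaust memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[path.relative_to(root).as_posix()] = h.hexdigest()
    return digests


def compare_trees(original: Path, restored: Path) -> list[str]:
    """Return sorted relative paths that are missing, extra, or changed in the restored copy."""
    a = tree_digests(original)
    b = tree_digests(restored)
    # A path present on only one side yields None on the other, so it counts as a mismatch.
    return [name for name in sorted(a.keys() | b.keys()) if a.get(name) != b.get(name)]
```

For example, `compare_trees(Path("/srv/search/index"), Path("/mnt/restore-test/index"))` (both paths hypothetical) returns an empty list only when every file came back byte for byte. The comparison is meaningful only if the live index is not being updated while the check runs, for instance against a snapshot taken at backup time.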

Stephen Arnold, September 22, 2008
