Accelerating XML Parsing

December 4, 2008

I have received a number of comments about the high-speed indexing referenced in the interview with Perfect Search. One reader asked me to call attention to the open source XML parser VTD-XML. The acronym stands for Virtual Token Descriptor for eXtensible Markup Language. The suite of open source software may not meet the needs of some content processing applications, because a high volume of large documents imposes additional work on the developer. However, for database-type and other kinds of records, the method can eliminate redundant parsing, which is computationally expensive. One reader sent me a link to a useful description of VTD-XML. The links to this write up by James Zhang appear below. The original series, “Index XML Documents with VTD-XML,” was published by SOA World Magazine, whose URL is www.soa.sys-con.com. (Note: Sys-con has republished at least one of the articles from this Beyond Search Web log.) The explanation of the method is in five parts. The first section provides a general description and the last section spells out the performance improvements:

  1. How to turn the indexing capability on in your application here
  2. Part 2 here — Sample code
  3. Part 3 here — Sample code
  4. Part 4 here — A discussion of application scenarios
  5. Part 5 here — The benchmark table

The conclusion to the write up made this point:

It’s not uncommon that those overheads [redundant parsing of XML] account for 80%-90% or more of the total CPU cycles of running the application. VTD-XML obliterates those overheads since there’s not much overhead left to optimize. Using VTD-XML as a parser reduces XML parsing overhead by 5x-10x. Next VTD-XML’s incremental update uniquely eliminates the roundtrip overhead of updating XML. Moreover, this article shows VTD-XML’s innovative non-blocking, stateless XPath engine significantly outperforming Jaxen and Xalan. With the addition of the indexing capability, XML parsing has now become “optional.”
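For readers who want a feel for why a saved index can make reparsing unnecessary, here is a toy sketch in Python. It is an illustration of the general idea only, not VTD-XML's record format or API. The function names (`build_index`, `pack_index`, `unpack_index`, `read_token`), the two-integer record layout, and the sample document are all assumptions made up for this example, and the regular expression handles only flat leaf elements without attributes.

```python
import re
import struct

# Toy pattern (an assumption for this sketch): matches leaf elements such as
# <name>text</name>, with no attributes and no nested children.
LEAF = re.compile(rb"<(\w+)>([^<]*)</\1>")

def build_index(xml_bytes):
    """Scan the document once and record (offset, length) of each leaf's text."""
    return [(m.start(2), m.end(2) - m.start(2)) for m in LEAF.finditer(xml_bytes)]

def pack_index(index):
    """Serialize the records as fixed-width integers so they can be stored."""
    return b"".join(struct.pack("<II", off, length) for off, length in index)

def unpack_index(blob):
    """Reload the records without scanning the XML text again."""
    return [struct.unpack_from("<II", blob, i) for i in range(0, len(blob), 8)]

def read_token(xml_bytes, record):
    """Fetch a value by slicing the original buffer; nothing is reparsed."""
    off, length = record
    return xml_bytes[off:off + length].decode("utf-8")

doc = b"<rec><id>42</id><name>Ada</name></rec>"
saved = pack_index(build_index(doc))   # done once, then stored with the document
values = [read_token(doc, r) for r in unpack_index(saved)]   # later runs reuse it
```

The point of the sketch is that the second and later runs touch only the stored offsets and the original bytes. A real parser has to handle attributes, nesting, namespaces, and encodings, which this toy ignores.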

A happy quack to the reader who called the VTD-XML method to my attention.

Stephen Arnold, December 4, 2008
