Xerox Devices Play Fast and Loose with Numbers
September 2, 2013
Some Xerox scanners and copiers, specifically those in its WorkCentre line, have been found to alter the documents they scan. The compression technology meant to save memory is making users question their own recall by swapping out similar-looking numerals. The ensuing havoc can be anything from mildly annoying to downright dangerous, depending on the document in question. German blogger D. Kriesel goes into the technical details in “Xerox Scanners/Photocopiers Randomly Alter Numbers in Scanned Documents.”
From the post, it looks like Kriesel was part of the group that originally discovered the problem while reviewing some construction plans they were working on. (Hooray, due diligence!) He has worked hard to reproduce the error, and documents his observations with plenty of screenshots. He writes (and amends):
“There seems to be a correlation between font size, scan dpi used. I was able to reliably reproduce the error for 200 DPI PDF scans w/o OCR, of sheets with Arial 7pt and 8pt numbers. Overall it looks like some sort of compression algorithm using patches more than once (I think I could even identify some equally-pixeled eights).
“Edit: It seems that the above thought was not that wrong at all. Several mails I got suggest that the Xerox machines use JBIG2 for compression. Even though the specification only cover the JBIG2 decompression, in reality, there is often created a dictionary of image patches found to be ‘similar’. Those patches then get reused instead of the original image data, as long as the error generated by them is not ‘too high.’ Makes sense.”
Kriesel goes on to note that the JBIG2 compression standard gives “no guarantee that parts of the scanned image actually come from the corresponding place on the paper.” So, as long as the bit of the image looks right, the software goes with it. That may be just fine for purely aesthetic applications, or even number-free text (if there is such a thing), but numbers kinda need to adhere to the original.
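To make the patch-reuse idea concrete, here is a toy Python sketch of dictionary-based patch substitution. It is not the JBIG2 encoder and not Xerox's firmware. The function names (hamming, compress, decompress), the 5x5 glyph bitmaps, and the max_error threshold are all made up for illustration. The only thing taken from Kriesel's description is the general mechanism: a patch gets replaced by an earlier "similar" patch whenever the difference between them is not "too high."

```python
# Toy illustration only: a patch dictionary with a similarity threshold.
# All names and bitmaps here are hypothetical, not taken from JBIG2 or Xerox.

def hamming(a, b):
    """Count the pixels that differ between two equally sized bitmaps."""
    return sum(pa != pb
               for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))

def compress(patches, max_error):
    """Store each patch as an index into a dictionary of earlier patches.

    A patch reuses an existing entry if it differs from that entry by at
    most max_error pixels; otherwise it is added as a new entry.
    """
    dictionary, indices = [], []
    for patch in patches:
        for i, known in enumerate(dictionary):
            if hamming(patch, known) <= max_error:
                indices.append(i)  # "similar enough": reuse the old patch
                break
        else:
            dictionary.append(patch)
            indices.append(len(dictionary) - 1)
    return dictionary, indices

def decompress(dictionary, indices):
    """Rebuild the page from dictionary entries, not from the original pixels."""
    return [dictionary[i] for i in indices]

# Two crude 5x5 glyphs that differ in only two pixels.
SIX = (
    " ### ",
    "#    ",
    "#### ",
    "#   #",
    " ### ",
)
EIGHT = (
    " ### ",
    "#   #",
    " ### ",
    "#   #",
    " ### ",
)
```

With these made-up glyphs, compress([SIX, EIGHT, SIX], max_error=2) keeps a single dictionary entry, so decompress hands back three sixes and the eight is silently gone. Setting max_error=0 keeps both glyphs and the round trip is exact. The threshold is the whole story: too loose, and a clean-looking digit from elsewhere on the page gets pasted in.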
As Kriesel points out, the snafu raises serious questions about Xerox’s quality-control process. He continues to add information to this post as the issue develops. Will the tech company adequately address the error?
Cynthia Murrell, September 02, 2013