PDF Search

June 23, 2010

You can pinpoint PDF files in Google via its advanced search option or by typing filetype:pdf after your query. Too much work? Navigate to http://www.pdfpick.com/. The service limits the query to those wonderful PDF files. My acquaintance with PDFs began at Ziff in the late 1980s. I think I had to kick the tires of what was then called “Trapeze”. Over the last 20 years I have watched the file format become the sleek, well-formed, round, firm, and fully packed wonder that it is. Bound phrases? Forget it. Snappy rendering? Forget it. Malware safe? Forget it. Tools for limiting file validity by time or number of opens? Forget it. Universally searchable? Forget it. Autoscaling on mobile devices? Forget it. Users who know what a TIFF wrapper is? Forget it. Nevertheless, PDFs are part of the landscape. If you want to limit your query to this file type, give PDFPick a try.
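For those who would rather script the filetype:pdf trick than type it, here is a minimal Python sketch. The function names and the sample query are made up for illustration, and the code assumes Google's ordinary web search address with the query passed in the q parameter. It only builds the search URL; it does not fetch or parse any results.

```python
from urllib.parse import urlencode


def pdf_query(terms: str) -> str:
    """Append the filetype:pdf operator to a plain query string."""
    return f"{terms} filetype:pdf"


def google_search_url(terms: str) -> str:
    """Build a search URL for the query (assumes the usual q= parameter)."""
    return "https://www.google.com/search?" + urlencode({"q": pdf_query(terms)})


if __name__ == "__main__":
    # Hypothetical sample query; paste the printed URL into a browser.
    print(google_search_url("enterprise search"))
```

Nothing here is specific to PDFs: swapping the operator value for another file extension should work the same way, provided the search engine supports that file type.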

Stephen E Arnold, June 23, 2010

Freebie

Comments

3 Responses to “PDF Search”

  1. sperky undernet on June 23rd, 2010 3:41 am

    Google Adds OCR for PDF Files and Images
    http://googlesystem.blogspot.com/2010/06/google-adds-ocr-for-pdf-files-and.html

    Question: If, as described in this blog post, the service is imperfect, does this have ramifications for imperfect results when searching within PDF files on universal Google search too? Or can the “Did you mean” feature work both ways: if I search “meaningful dent in the universe” filetype:pdf, can Google also produce the correct result even if Google reads the PDF as “meaningful dem in the universe”?

    I tried this and got no result, but then I also searched “make a dent in the universe” filetype:pdf and got no result either, so I assume either this was not an indexed PDF file or it was a private file uploaded for the exercise.

    If anyone from the googlesystem.blogspot.com team reads this, or someone else tries this exercise with an indexed PDF file, an answer here will be appreciated. The ramifications either way are significant, assuming my guess is on the mark.

  2. Marc Arenstein on June 24th, 2010 8:00 am

    http://www.googlelabs.com/show_details?app_key=agtnbGFiczIwLXd3d3ITCxIMTGFic0FwcE1vZGVsGIQpDA

    Have you considered using “Google Suggest” as an internal fix for producing better results when searching PDF documents whose imperfect OCR contains broken or hyphenated words and/or typos? To the best of my knowledge, based on my own initial testing, this is a huge gap with potentially huge ramifications, and I would guess it affects most searches that draw on PDF documents. Hypothetically, it could be corrected by including hits that contain the OCR imperfections, which would lead the user, invisibly, to the PDF documents he or she had no idea were being “missed” anyway.
    If true, this suggests that, in this case alone, there are a lot of potentially big fish the user is not catching.

    ——————————————————————————————————-

    extra detail:

    Try any PDF: run it through OCR, then look for broken words or typos in the OCR output and search for one as part of a phrase in the universal search engine. Google cannot find it!

    Also, if you search for the corrected version of the phrase, and it is unique (that is, it does not also appear in correct form elsewhere in that document’s OCR), you won’t find that either…

  3. Richard on June 27th, 2010 7:06 am

    That site looks good, but frankly I find http://zxdrive.com way better for searching for documents on the internet. It has way more options, and besides the PDF file search, it can find Acrobat, Doc, Word, Excel, PowerPoint and Flash files as well.
