Participant
September 22, 2016
Question

Converting Multiple Scanned PDFs into Excel


We have a LOT of scanned PDFs at work (about 8,400) that we need to export into Excel. There are a few tricky parts, and I'm trying to find out whether there's a way to automate all of this. These are all individual PDFs, but I could combine them into one multipage PDF if needed; that shouldn't be an issue. The major issue is that we only want a small portion of each PDF, the top left corner that has some basic information (name and address), exported to an Excel spreadsheet. We do not need the rest of the PDF. They are all single-page PDFs, if that helps with automating this. Does anyone know if that can be done?


1 reply

CtDave
Participating Frequently
September 23, 2016

Remember that the output of a scanner is an image; it contains no actual "text".

OCR of the scanned image can provide renderable text (hidden or visible glyphs depending on which method of OCR is used).

The OCR output can be exported. Don't expect 100% recognition of the bitmap images of the characters.

Pulling only a "piece" of the OCR output would be something to ask about in the JavaScript sub-forum.
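For anyone who would rather script this outside Acrobat, here is a rough Python sketch of the "crop the corner, then OCR only that piece" idea. This is not the Acrobat JavaScript route mentioned above. It assumes the open-source pdf2image (with Poppler) and pytesseract (with Tesseract) packages are installed. The crop fractions, the "first line is the name" rule, and all function names are illustrative guesses, so adjust them to your actual page layout. It writes a CSV, which Excel opens directly.

```python
import csv
from pathlib import Path


def top_left_box(width, height, frac_w=0.5, frac_h=0.25):
    """Return a (left, upper, right, lower) pixel box for the top-left corner.

    The fractions are assumptions; tune them to where the name/address sits.
    """
    return (0, 0, int(width * frac_w), int(height * frac_h))


def split_name_address(ocr_text):
    """Assume the first non-empty OCR line is the name, the rest the address."""
    lines = [ln.strip() for ln in ocr_text.splitlines() if ln.strip()]
    if not lines:
        return "", ""
    return lines[0], ", ".join(lines[1:])


def ocr_top_left(pdf_path, dpi=300):
    """Rasterize page 1, crop the top-left corner, and OCR only that region."""
    # Third-party imports are kept local so the helpers above work without them.
    from pdf2image import convert_from_path  # needs Poppler installed
    import pytesseract                       # needs the Tesseract binary

    page = convert_from_path(str(pdf_path), dpi=dpi, first_page=1, last_page=1)[0]
    region = page.crop(top_left_box(*page.size))
    return pytesseract.image_to_string(region)


def export_folder(pdf_dir, out_csv):
    """Write one row (file, name, address) per PDF in pdf_dir to a CSV file."""
    with open(out_csv, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["file", "name", "address"])
        for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
            name, address = split_name_address(ocr_top_left(pdf))
            writer.writerow([pdf.name, name, address])
```

Calling something like export_folder("scans", "names.csv") would process a folder of single-page PDFs (both names are placeholders). Since OCR will not be 100% accurate, plan on spot-checking the resulting rows.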

Be well...