rafx79
Known Participant
March 12, 2015
Question

Extract text from placed PDF file.


Hi!

I'm wondering if there's a way to extract the text from a placed PDF or EPS file.


1 reply

Legend
March 16, 2015

If the export formats won't help, I fear there is nothing straightforward.

Short of parsing the files (both formats are documented), you can render them through a custom graphics port.

You'll have to translate glyph IDs to characters, distances to word breaks, etc., and you will still miss bitmap text under flattened transparencies.
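
To give a rough idea of the glyph-to-text step, here is a minimal sketch in Python rather than InDesign SDK code. It assumes you have already collected positioned glyphs for one line from your own renderer or parser. The `Glyph` record, the `to_unicode` table, and the `gap_ratio` threshold are all hypothetical names chosen for illustration; none of them come from the InDesign or PDF Library APIs.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    glyph_id: int   # font-specific glyph index (hypothetical record layout)
    x: float        # left edge of the glyph on the baseline, in points
    width: float    # advance width, in points


def glyphs_to_text(glyphs, to_unicode, font_size, gap_ratio=0.25):
    """Rebuild one line of text from positioned glyphs.

    to_unicode maps glyph IDs to characters (assumed to be built from the
    font's encoding data). A space is inserted whenever the horizontal gap
    between two glyphs exceeds gap_ratio * font_size; the 0.25 default is
    an arbitrary starting point, not a documented value.
    """
    out = []
    prev_end = None
    for g in sorted(glyphs, key=lambda g: g.x):
        if prev_end is not None and g.x - prev_end > gap_ratio * font_size:
            out.append(" ")
        # Unmapped glyphs become U+FFFD so the loss is visible in the output.
        out.append(to_unicode.get(g.glyph_id, "\ufffd"))
        prev_end = g.x + g.width
    return "".join(out)


# Made-up sample data: two words separated by a 4 pt gap at 10 pt font size.
to_unicode = {1: "H", 2: "i", 3: "t", 4: "h", 5: "e", 6: "r"}
line = [
    Glyph(1, 0.0, 7.0), Glyph(2, 7.0, 3.0),
    Glyph(3, 14.0, 3.5), Glyph(4, 17.5, 5.5), Glyph(5, 23.0, 5.0),
    Glyph(6, 28.0, 4.0), Glyph(5, 32.0, 5.0),
]
print(glyphs_to_text(line, to_unicode, font_size=10.0))
```

A real implementation would also have to group glyphs into lines, handle rotated or vertical text, and cope with fonts that carry no usable character mapping, which is why this is not straightforward.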

rafx79 (Author)
Known Participant
March 18, 2015

Thanks Dirk.

I thought that might be the case. Expanding on your great idea of using export: in Acrobat I can export the PDF to a text file, but I can't find a way to do that with the export formats exposed in InDesign. Maybe I could use the PDF Library SDK within my InDesign plug-in to export the text, and then read the output so that I can use it for my purpose?